Reginald McLean

About Me

I am currently an Applied Research Scientist on the Trust and Safety team at the Alberta Machine Intelligence Institute (Amii). Before Amii, I completed my PhD at Toronto Metropolitan University under the supervision of Nariman Farsad and Isaac Woungang. My PhD work examined how to improve the capabilities of multi-task reinforcement learning agents in scenarios where a single policy must accomplish multiple tasks. Previously, I completed my MSc in Computer Science at Brock University under the supervision of Beatrice Ombuki-Berman, and I received my BSc in Computer Science from Trent University.

Past Experience

Before joining Amii, I was a part-time lecturer in the Computer Science Department at Brock University. Previously, I was an intern at Royal Bank of Canada, where I supported their technical infrastructure using AIOps methods. Upon the completion of my MSc, I was the Lead Machine Learning Developer at Castle Ridge Asset Management.

Publications

* Indicates equal contributions.

Meta-World+: An Improved, Standardized, RL Benchmark

Reginald McLean*, Evangelos Chatzaroulas*, Luc McCutcheon, Frank Röder, Tianhe Yu, Zhanpeng He, K.R. Zentner, Ryan Julian, J.K. Terry, Isaac Woungang, Nariman Farsad, Pablo Samuel Castro

Accepted at Conference on Neural Information Processing Systems 2025.

Accepted at International Conference on Machine Learning (ICML) 2025, Workshop on CODEML: Championing Open-source DEvelopment in Machine Learning (Spotlight, Oral).

Meta-World is widely used for evaluating multi-task and meta-reinforcement learning agents, which are challenged to master diverse skills simultaneously. Since its introduction, however, there have been numerous undocumented changes which inhibit a fair comparison of algorithms. This work strives to disambiguate these results from the literature, while also leveraging the past versions of Meta-World to provide insights into multi-task and meta-reinforcement learning benchmark design. Through this process, we release an open-source version of Meta-World that has full reproducibility of past results, is more technically ergonomic, and gives users more control over the tasks that are included in a task set.

Multi-Task Reinforcement Learning Enables Parameter Scaling

Reginald McLean*, Evangelos Chatzaroulas*, J.K. Terry, Isaac Woungang, Nariman Farsad, Pablo Samuel Castro

Accepted at Reinforcement Learning Conference, 2025.

Outstanding Paper on Scientific Understanding in Reinforcement Learning.

Multi-task reinforcement learning (MTRL) aims to endow a single agent with the ability to perform well on multiple tasks. Recent works have focused on developing novel sophisticated architectures to improve performance, often resulting in larger models; it is unclear, however, whether the performance gains are a consequence of the architecture design itself or the extra parameters. We argue that gains are mostly due to scale by demonstrating that naively scaling up a simple MTRL baseline to match parameter counts outperforms the more sophisticated architectures, and these gains benefit most from scaling the critic over the actor. Additionally, we explore the training stability advantages that come with task diversity, demonstrating that increasing the number of tasks can help mitigate plasticity loss. Our findings suggest that MTRL's simultaneous training across multiple tasks provides a natural framework for beneficial parameter scaling in reinforcement learning, challenging the need for complex architectural innovations.

Overcoming State and Action Space Disparities in Multi-Domain, Multi-Task Reinforcement Learning

Reginald McLean, Kai Yuan, Isaac Woungang, Nariman Farsad, Pablo Samuel Castro

Accepted at Morphology-Aware Policy and Design Learning Workshop @ CoRL 2024 (Spotlight, Oral).

Current multi-task reinforcement learning (MTRL) methods have the ability to perform a large number of tasks with a single policy. However, when attempting to interact with a new domain, the MTRL agent would need to be re-trained due to differences in domain dynamics and structure. Because of these limitations, we are forced to train multiple policies even though tasks may have shared dynamics, which requires more samples and is thus sample inefficient. In this work, we explore the ability of MTRL agents to learn in various domains with various dynamics by simultaneously learning in multiple domains, without the need to fine-tune extra policies. In doing so, we find that an MTRL agent trained in multiple domains achieves an increase in sample efficiency of up to 70% while maintaining the overall success rate of the MTRL agent.

Video-Language Critic: Transferable Reward Functions for Language-Conditioned Robotics

Minttu Alakuijala, Reginald McLean, Isaac Woungang, Nariman Farsad, Samuel Kaski, Pekka Marttinen, Kai Yuan

Accepted at Transactions on Machine Learning Research.

Accepted at Workshop on Language and Robot Learning: Language as an Interface @ CoRL 2024.

Natural language is often the easiest and most convenient modality for humans to specify tasks for robots. However, learning to ground language to behavior typically requires impractical amounts of diverse, language-annotated demonstrations collected on each target robot. In this work, we aim to separate the problem of what to accomplish from how to accomplish it, as the former can benefit from substantial amounts of external observation-only data, and only the latter depends on a specific robot embodiment. To this end, we propose Video-Language Critic, a reward model that can be trained on readily available cross-embodiment data using contrastive learning and a temporal ranking objective, and use it to score behavior traces from a separate actor. When trained on Open X-Embodiment data, our reward model enables 2x more sample-efficient policy training on Meta-World tasks than a sparse reward only, despite a significant domain gap. Using in-domain data but in a challenging task generalization setting on Meta-World, we further demonstrate more sample-efficient training than is possible with prior language-conditioned reward models that are either trained with binary classification, use static images, or do not leverage the temporal information present in video data.

Swarm Based Algorithms for Neural Network Training

Reginald McLean, Beatrice Ombuki-Berman, Andries P. Engelbrecht

Accepted at 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC).

The purpose of this paper is to compare the abilities and deficiencies of various swarm-based algorithms for training artificial neural networks. This paper uses seven algorithms, seven regression problems, sixteen classification problems, and four bounded activation functions to compare algorithms with regard to loss, accuracy, hidden unit saturation, and overfitting. It was found that particle swarm optimization was the top algorithm for regression problems based on loss, while the firefly algorithm was the top algorithm for classification problems when examining accuracy and loss. The ant colony optimization and artificial bee colony algorithms caused the least amount of hidden unit saturation, with the bacterial foraging optimization algorithm producing the least amount of overfitting.