The Mathematics of Reinforcement Learning and ExperimentaL sciences (MARLEL) group is an international, interdisciplinary research network that has grown organically from long-term collaborations between co-authors and colleagues. While not formally affiliated with a single institution, the group brings together researchers who share a common vision and scientific affinity, and who have worked together over the years to advance both the theory and the practice of reinforcement learning.
As its name suggests, the group’s central focus is to advance the core mathematical foundations of Reinforcement Learning, with particular attention to the interplay between sequential decision theory, optimization, and statistical learning.
Beyond its theoretical dimension, the group also promotes cross-disciplinary dialogue between reinforcement learning and the experimental sciences, in particular life sciences—such as agriculture, biology, and medicine—where learning-by-interaction paradigms naturally align with the experimental process itself. Through these connections, the group aims to develop methods that are both mathematically rigorous and scientifically impactful.
From a broader philosophical standpoint, the group views Reinforcement Learning as the mathematics of interaction and decision-making, a fundamental complement to the generative side of Artificial Intelligence. Whereas Generative AI focuses on modeling and creating data from existing knowledge, Decisional AI, embodied by reinforcement learning, addresses the active process of exploring, experimenting, and optimizing in uncertain environments. This makes it a natural theoretical counterpart to experimental science itself, where knowledge emerges from iterative cycles of hypothesis, intervention, and observation.
By deepening the mathematical understanding of these adaptive processes, the group seeks to build bridges between formal theory and empirical inquiry, fostering a view of intelligence not only as a capacity to infer from data, but as an ability to learn through action: to decide, test, and evolve knowledge through experimentation.
Collaborators
- Odalric-Ambrym Maillard (PI) — Inria, Université de Lille, France
- Shubhada Agrawal — Indian Institute of Science, Bangalore, India
- Audrey Durand — Université Laval, Québec, Canada
- Aditya Gopalan — Indian Institute of Science, Bangalore, India
- Anders Jonsson — Universitat Pompeu Fabra, Barcelona, Spain
- Ronald Ortner — Montanuniversität Leoben, Austria
- Mohammad Sadegh Talebi — University of Copenhagen, Denmark
Research topics
The group aims to produce high-quality contributions on the theoretical, algorithmic, and applied aspects of online learning, bandit theory, and reinforcement learning, with particular emphasis on models that incorporate structure, hierarchy, uncertainty, or complex feedback. The overall goal is to advance our understanding of how agents can learn efficiently from sequential interaction, limited information, and structured environments. Our main interests include, but are not limited to, the following topics:
1. Foundations of Reinforcement Learning Theory
- Average-reward reinforcement learning and linearly-solvable MDPs
- Structured or communicating MDPs, specialized decision processes, and reward-machine models
- Exploration principles with provable guarantees; uncertainty-aware planning
- Risk-sensitive and safety-oriented RL, coherent risk measures, CVaR analysis
- Sample-complexity analysis in model-based and model-free settings
- Confidence sets and concentration inequalities for structured dynamics
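As a small illustration of the last point above, a standard Hoeffding bound gives the kind of confidence interval from which such confidence sets are typically built; the notation below (the empirical mean, the sample size n, and the confidence level δ) is generic and introduced only for this sketch.

```latex
% Hoeffding-type confidence interval for i.i.d. rewards X_1,...,X_n in [0,1]
% with mean \mu and empirical mean \hat{\mu}_n:
% with probability at least 1 - \delta,
\[
  \hat{\mu}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
  \qquad
  \bigl|\hat{\mu}_n - \mu\bigr| \;\le\; \sqrt{\frac{\log(2/\delta)}{2n}} .
\]
```

Confidence sets over transition dynamics are typically obtained from analogous, though more involved, concentration bounds applied to estimated transition probabilities.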
2. Online Learning and Bandit Theory
- Stochastic, adversarial, corrupted, multimodal, and structured multi-armed bandits
- Regret minimization, problem-dependent analysis, optimal lower bounds (a minimal illustration follows this list)
- Best-arm identification, top-two selection, and sample-efficient exploration
- Linear, generalized linear, and high-dimensional bandits with structured action spaces
- Partial monitoring, hybrid feedback models, and active learning paradigms
- Bandits under model misspecification, latent structure, or corrupted signals
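As a small, self-contained illustration of the regret-minimization topic above, the following Python sketch runs the classical UCB1 index policy on a synthetic Bernoulli bandit and tracks its expected (pseudo-)regret. It is a didactic example rather than code from the group; the arm means, horizon, and the helper function `ucb1` are hypothetical choices made only for this illustration.

```python
# Minimal sketch: UCB1 regret minimization on a stochastic Bernoulli bandit.
import math
import random


def ucb1(means, horizon, seed=0):
    """Run UCB1 on a Bernoulli bandit with the given arm means.

    Returns the cumulative expected (pseudo-)regret after `horizon` pulls.
    """
    rng = random.Random(seed)
    n_arms = len(means)
    counts = [0] * n_arms      # number of pulls per arm
    sums = [0.0] * n_arms      # sum of observed rewards per arm
    best_mean = max(means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1        # pull each arm once to initialize
        else:
            # Optimism in the face of uncertainty:
            # empirical mean plus an exploration bonus sqrt(2 log t / n).
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best_mean - means[arm]   # expected regret increment

    return regret


if __name__ == "__main__":
    print(ucb1([0.3, 0.5, 0.7], horizon=10_000))
```

The exploration bonus shrinks as an arm is pulled more often, which is the optimism-based trade-off that underlies the logarithmic regret guarantees studied in this line of work.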
3. Offline, Continual, and Transfer Reinforcement Learning
- Offline RL with structured models, regular decision processes, and language-guided metrics
- Continual offline RL, non-stationary learning, and transfer across tasks or domains
- Representations for hierarchical policy spaces and long-horizon evaluation
- Reliable off-policy learning, critic robustness, and generalization guarantees
- Policy adaptation and fine-tuning of pretrained or foundation models
4. Planning, Hierarchical Models, and Representation Learning
- Reward-machine-induced structure, hierarchical decomposition, temporal abstractions
- Learning bisimulation metrics, state representation, and optimal transport distances
- Generalized planning with object-centric, pointer-based, or learned abstractions
- Monte-Carlo Tree Search under uncertainty, value estimation, and exploration bonuses
- Representation learning for model reduction and sample-efficient planning
5. Robustness, Safety, and Privacy
- Differentially private reinforcement learning and bandit algorithms
- Robust learning under adversarial or corrupted feedback
- Sequential testing, adaptive stopping rules, and reliable agent evaluation
- Safe exploration strategies and performance-guaranteed policy updates
6. Learning in Structured or Application-Driven Domains
- Wireless communication systems, radio optimization, resource scheduling
- Graph-based spatial prediction, Gaussian-process methods, and multi-modal data fusion
- Agriculture, industrial processes, and scientific computing with sequential agents
- Agroecology and experimental life sciences, large-scale or massified experimentation
- Biodiversity-preserving strategies and data-driven ecological management
- Education-oriented adaptive experimentation and high-dimensional decision models
- Health and rehabilitation applications with harmonized data sources