About
Clinical decision making often draws on diverse data sources, including but not limited to images, text reports, electronic health records (EHRs), continuous physiological signals, and genomic profiles; yet most AI systems deployed in healthcare remain unimodal. Integrating diverse medical modalities presents significant challenges, including data heterogeneity, data scarcity, missing or asynchronous modalities, variable data quality, and the lack of standardized frameworks for aligned representation learning. This workshop brings the EurIPS community together to tackle the problem of fusing these heterogeneous inputs into interpretable, coherent patient representations that mirror the holistic reasoning of clinicians. We aim to bring together machine learning researchers, clinicians, and industry partners dedicated to the theory, methods, and translation of multimodal learning in healthcare.
Goals
In this workshop, we aim to:
- Advance methods for learning joint representations from images, text, signals, and genomics.
- Investigate foundation-model pretraining at scale on naturally paired modalities.
- Address robustness, fairness, and missing-modality issues unique to healthcare fusion.
- Foster clinician–ML collaboration and outline translational paths to deployment.
Key Info
Important Dates
Our Call for Papers is now open!
Please note that all deadlines are Anywhere on Earth (AoE).
- Submission Deadline: October 15, 2025
- Acceptance Notification: October 31, 2025
- Camera-Ready Submission Deadline: November 15, 2025
- Workshop Date: December 6, 2025
Call for Papers
Authors are invited to submit 4-page abstracts on topics relevant to multimodal representation learning in healthcare. These include, but are not limited to, vision-language models for radiology, temporal alignment of multimodal ICU streams, graph and transformer architectures for patient data fusion, cross-modal self-supervised objectives, and multimodal benchmarks with fairness and bias analysis.
Submission
- Submission site: OpenReview
- Format: NeurIPS 2025 template
- Length: max 4 pages excluding references
- Review: Double-blind
- Anonymization: Required; ensure that no names or affiliations appear in any part of the submission, including code
All accepted papers will be published on the workshop website. Please note that the workshop is non-archival: there will be no formal proceedings.
Poster Format
All accepted workshop papers will be presented as physical posters during the MMRL4H@EurIPS 2025 workshop in Copenhagen.
- Size: A1
- Orientation: Portrait
- Printing: Authors are responsible for printing and bringing their posters
- Poster sessions: opportunities for in-depth discussion and networking with attendees and invited speakers
Accepted Contributions
EmoSLLM: Parameter-Efficient Adaptation of LLMs for Speech Emotion Recognition
by Hugo Thimonier, Antony Perzo, Renaud Seguier
PDF · OpenReview
Position: Real-World Clinical AI Requires Multimodal, Longitudinal, and Privacy-Preserving Corpora
by Azmine Toushik Wasi, Shahriyar Zaman Ridoy
PDF · OpenReview
When are radiology reports useful for training medical image classifiers?
by Herman Bergström, Zhongqi Yue, Fredrik D. Johansson
PDF · OpenReview
MIND: Multimodal Integration with Neighbourhood-aware Distributions
by Hanwen Xing, Christopher Yau
PDF · OpenReview
NAP: Attention-Based Late Fusion for Automatic Sleep Staging
by Alvise Dei Rossi, Julia van der Meer, Markus Schmidt, Claudio L. A. Bassetti, Luigi Fiorillo, Francesca D. Faraci
PDF · OpenReview
Are Large Vision Language Models Truly Grounded in Medical Images? Evidence from Italian Clinical Visual Question Answering
by Federico Felizzi, Olivia Riccomi, Michele Ferramola, Francesco Andrea Causio, Manuel Del Medico, De Vita Vittorio, Lorenzo De Mori, Alessandra Piscitelli, Pietro Eric Risuleo, Bianca Destro Castaniti, Antonio Cristiano, Alessia Longo, Luigi De Angelis, Mariapia Vassalli, Marcello Di Pumpo
PDF · OpenReview
Multimodal Alignment for Synthetic Clinical Time Series
by Arinbjörn Kolbeinsson, Benedikt Kolbeinsson
PDF · OpenReview
iMML: A Python package for multi-modal learning with incomplete data
by Alberto López, John Zobolas, Tanguy Dumontier, Tero Aittokallio
PDF · OpenReview
POEMS: Product of Experts for Interpretable Multi-omic Integration using Sparse Decoding
by Mihriban Kocak Balik, Pekka Marttinen, Negar Safinianaini
PDF · OpenReview
Multi-Omic Transfer Learning for the Diagnosis & Prognosis of Blood Cancers
by Leonardo P.A. Biral, Sandeep Dave
PDF · OpenReview
From Binning to Joint Embeddings: Robust Numeric Integration for EHR Transformers
by Maria Elkjær Montgomery, Mads Nielsen
PDF · OpenReview
A tutorial on discovering and quantifying the effect of latent causal sources of multimodal EHR data
by Marco Barbero Mota, Eric Strobl, John M Still, William W. Stead, Thomas A Lasko
PDF · OpenReview
Multi-Modal AI for Remote Patient Monitoring in Cancer Care
by Yansong Liu, Ronnie Stafford, Pramit Khetrapal, Huriye Kocadag, Graca Carvalho, Patricia de Winter, Maryam Imran, Amelia Snook, Adamos Hadjivasiliou, D Vijay Anand, Weining Lin, John Kelly, Yukun Zhou, Ivana Drobnjak
PDF · OpenReview
VenusGT: A Trajectory-Aware Graph Transformer for Rare-Cell Discovery in Single-Cell Multi-Omics
by Natalia Sikora, Rebecca Rees, Sean Righardt Holm, Hanchi Ren, Lewis W. Francis
PDF · OpenReview
Virtual Breath-Hold (VBH) for Free-Breathing CT/MRI: Segmentation-Guided Fusion with Image-Signal Alignment
by Rian Atri
PDF · OpenReview
Towards Multimodal Representation Learning in Paediatric Kidney Disease
by Ana Durica, John Booth, Ivana Drobnjak
PDF · OpenReview
A learning health system in Neurorehabilitation as a foundation for multimodal patient representation
by Thomas Weikert, Eljas Roellin, Diego Paez-Granados, Chris Easthope Awai
PDF · OpenReview
Schedule
| Time | Session |
|---|---|
| 9:00–9:15 | Opening Remarks |
| 9:15–9:35 | Speaker Session I - Gunnar Rätsch |
| 9:40–10:00 | Speaker Session I - Sonali Parbhoo |
| 10:00–10:10 | Oral Session I - MIND: Multimodal Integration with Neighbourhood-aware Distributions |
| 10:10–10:20 | Oral Session I - Are Large Vision Language Models Truly Grounded in Medical Images? Evidence from Italian Clinical Visual Question Answering |
| 10:20–11:00 | Coffee & Poster Session I |
| 11:00–11:20 | Speaker Session II - Bianca Dumitrascu |
| 11:25–11:45 | Speaker Session II - Rajesh Ranganath |
| 11:45–12:30 | Panel (all speakers) - "Translating Multimodal ML to the Bedside" |
| 12:30–14:00 | Lunch & Networking |
| 14:00–14:20 | Speaker Session III - Desmond Elliott |
| 14:25–14:45 | Speaker Session III - Stephanie Hyland |
| 14:45–14:55 | Oral Session II - When are radiology reports useful for training medical image classifiers? |
| 14:55–15:05 | Oral Session II - POEMS: Product of Experts for Interpretable Multi-omic Integration using Sparse Decoding |
| 15:05–15:45 | Coffee & Poster Session II |
| 15:45–16:15 | Open Q&A |
| 16:15–16:30 | Closing Remarks & Awards |
Confirmed Speakers

Desmond Elliott
University of Copenhagen, DK

Sonali Parbhoo
Imperial College London, UK

Stephanie Hyland
Microsoft Research, UK

Rajesh Ranganath
New York University, USA

Bianca Dumitrascu
Columbia University, USA

Gunnar Rätsch
ETH Zurich, CH
Organizers

Stephan Mandt
Associate Professor, UC Irvine, USA

Ece Özkan Elsen
Assistant Professor, University of Basel, CH

Samuel Ruiperez-Campillo
PhD Student, ETH Zurich, CH

Thomas Sutter
Postdoctoral Researcher, ETH Zurich, CH

Julia Vogt
Associate Professor, ETH Zurich, CH

Nikita Narayanan
PhD Student, Imperial College London, UK
Contact
- General inquiries: mmrl4h@gmail.com
