Call for Papers
Important Dates
We will follow the review-process dates suggested by CVPR 2026.
| Date | Milestone |
|---|---|
| Mar 01, 2026 (AoE) | Workshop Paper Submission Deadline |
| Mar 19, 2026 | Workshop Paper Notification |
| Apr 10, 2026 | Program, Camera-Ready, and Videos Uploaded |
Paper Submission and Acceptance
We welcome technical, position, or perspective papers related to the topics outlined below. All submissions must be written in English, follow the official CVPR proceedings format, and adhere to the double-blind review policy.
- Tiny or Short Papers (2–4 pages) - We invite concise papers that present implementations and evaluations of unpublished but insightful ideas, moderate yet self-contained theoretical analyses, follow-up experiments, re-analyses of prior work, or new perspectives on existing research.
- Regular Papers (up to 8 pages, including figures and tables) - We encourage submissions introducing original methods, novel research visions, applications, or discussions of open challenges in multimodal learning.
We accept both archival and non-archival submissions; authors should indicate their preferred track at submission time.
A Best Paper Award will be presented based on reviewer scores and the workshop committee’s evaluation.
All accepted papers will be presented as posters during the workshop, and some of them will be selected for short oral presentations. Poster sessions will be conducted onsite with dedicated time for interactive discussions. For remote attendees, we will offer a virtual poster gallery and live Q&A channels to ensure inclusive engagement.
Topics and Themes
We welcome all relevant submissions in the area of multimodal learning, with an emphasis on any-to-any multimodal intelligence, including:
- Multimodal Representation Learning
- Multimodal Transformation
- Multimodal Synergistic Collaboration
- Benchmarking and Evaluation for Any-to-Any Multimodal Learning
Other topics include, but are not limited to:
- Unified multimodal foundation and agentic models
- Representation learning for embodied and interactive systems
- Integration of underexplored modalities and cognitive perspectives on multimodal perception and reasoning
About
The recent surge of multimodal large models has brought rapid progress in connecting language, vision, audio, and beyond. Yet most existing systems remain constrained to fixed modality pairs and lack the flexibility to generalize or reason across arbitrary combinations. The Any-to-Any Multimodal Learning workshop aims to explore systems that can understand, align, transform, and generate across any set of modalities. We organize the discussion around three pillars: representation learning, transformation, and collaboration.
For the latest papers and datasets, please refer to the Awesome-Any-to-Any-Generation repository, which is regularly updated and offers useful resources for preparing your submission.
Speakers
Tentative Schedule
| Time | Schedule | Speaker |
|---|---|---|
| Morning Schedule | | |
| TBD | Introduction and opening remarks | - |
| TBD | Keynote Talk 1 | TBD |
| TBD | Keynote Talk 2 | TBD |
| TBD | Oral Presentations | - |
| TBD | Coffee Break | - |
| TBD | Keynote Talk 3 | TBD |
| TBD | Keynote Talk 4 | TBD |
| TBD | Poster Session 1 (Interactive) + Virtual Gallery | - |
| TBD | Lunch Break | - |
| Afternoon Schedule | | |
| TBD | Keynote Talk 5 | TBD |
| TBD | Keynote Talk 6 | TBD |
| TBD | Poster Session 2 (Interactive) + Virtual Gallery | - |
| TBD | Coffee Break | - |
| TBD | Keynote Talk 7 | TBD |
| TBD | Panel Discussion | TBD |
| TBD | Closing Remarks + Best Paper Award | TBD |