
Call is OPEN
Deadline: 1st of May 2026
Notification of acceptance: 12th of May 2026
Camera-ready submissions: 22nd of May 2026
For this upcoming workshop, we invite original submissions presenting innovative ideas, creative approaches, and rigorous methodologies that advance the state of the art in motion analysis, in the form of:
- Extended abstract papers of 1-2 pages (references do not count toward the page limit), formatted in IEEE conference style. Authors may optionally include a supplementary video to accompany their submission.
- Standalone videos with a maximum length of 3 minutes.
Selected extended abstracts and videos will have the opportunity to be archived on this website and will be promoted to a broader audience through various media channels.
Authors of accepted workshop papers will be required to give a spotlight teaser presentation, accompanied by a poster presentation during the interactive session at the coffee break.
Please submit your paper/video via the following link
Outline and Objectives
The perception, reconstruction, and synthesis of human motion have long been central topics in computer vision. Over the past decade, remarkable progress in vision-based human motion understanding has been enabled by the availability of large-scale datasets and the rise of powerful foundation models trained on them. These developments have substantially advanced our ability to model human pose, dynamics, and interaction from visual input alone.
Yet, human motion is inherently multimodal. It is not only seen but also felt and heard, and it can be measured with a variety of devices. Recent research has increasingly explored the integration of diverse sensing modalities, from wearable devices such as IMUs and insoles to non-visual signals like WiFi and sound. This multimodal shift opens new possibilities for building richer, more holistic, context-aware representations of human behavior, while also posing open challenges in cross-sensor alignment, temporal reasoning, and data-efficient learning. Moreover, each sensing modality comes with its own limitations. Thus, there is a growing need to connect multimodal sensing and motion understanding within a unified framework.
The Workshop on Multimodal Human Motion Analysis (MOMA) aims to catalyze this integration. Bringing together researchers from robotics, multimodal learning, and perception, it provides a forum to discuss new methodologies, benchmarks, and frameworks for robust, generalizable, and ethically aligned motion understanding. The workshop focuses on two complementary areas: multimodal perception, covering unified representations, temporal reasoning, and data-efficient learning for action analysis, and embodied, human-centered intelligence, addressing foundation models, edge-efficient deployment, and responsible evaluation. Through invited talks and panel discussions, MOMA highlights emerging directions and fosters interdisciplinary dialogue toward real-world, human-centered motion understanding.
Topics of interest
The topics covered in the workshop include, but are not limited to:
- Multimodal human action and behavior analysis from visual, depth, inertial, and physiological data.
- Cross-sensor fusion and alignment for motion understanding.
- Multimodality and robustness in human motion analysis.
- Temporal reasoning and long-term modeling of human activities and interactions.
- Advances in human motion representations for multimodal human motion understanding.
- Human-centric foundation and generative models (e.g., diffusion models, transformers, LLMs).
- Self-supervised, weakly supervised, and unsupervised learning methods for data-efficient and cross-domain generalization.
- Advances in edge-deployable and energy-efficient AI models for real-time human sensing.
- Robustness to occlusions, crowded scenes, and domain or subject variability.
- Responsible and human-centered evaluation: fairness, bias mitigation, privacy, and transparency.
- Applications in healthcare, rehabilitation, sports performance, workplace safety, Extended Reality (XR), and robotics.
Invited Speakers
Jianfei Yang
Talk Title: Multimodal Foundation Model for Language-Grounded Human Sensing and Reasoning
Bio: Jianfei Yang is an Assistant Professor at Nanyang Technological University (NTU), where he leads the Multimodal AI and Robotic Systems (MARS) Lab. His research focuses on Human-Centric Physical AI and Embodied AI, integrating multimodal sensing, foundation models, and robotics for real-world applications such as human sensing, activity understanding, and intelligent interaction.
Ronald Poppe
Talk Title: Temporal Coordination in Fine-Grained Analysis of Parent-Child Interactions
Bio: Ronald Poppe is an associate professor in the Information and Computing Sciences Department of Utrecht University. His research interests center around the analysis of human (interactive) behavior from videos and other sensors, with applications in media analysis and generation, and in the clinical domain. He received a Ph.D. from the University of Twente, The Netherlands (2009) and was a visiting researcher at the Delft University of Technology, Stanford University, and University of Lancaster. He is a senior member of the IEEE.
Thomas Ploetz
Talk Title: Sensor-Based Human Activity Recognition as the Basis for Effective Health and Wellbeing Assessments
Bio: Thomas Ploetz is a Computer Scientist with expertise and decades of experience in Pattern Recognition and Machine Learning research (PhD from Bielefeld University, Germany). His core research lies in the field of wearable and ubiquitous computing, with a specific focus on computational behavior analysis: the automated analysis of what people are doing and how this changes over time, based on multimodal time series data captured using a range of sensors that are either body-worn or integrated into the built environment. He works as a Professor of Computing at the School of Interactive Computing at the Georgia Institute of Technology in Atlanta, USA, where he leads the Computational Behavior Analysis research lab (cba.gatech.edu).
Suining Henry He
Talk Title: Human-Mobility Interaction: A Multimodal Tale of Micromobility
Bio: Suining Henry He is an Associate Professor (with Tenure) at the School of Computing, University of Connecticut (UConn), where he has been on the faculty since 09/2019, previously as a Tenure-Track Assistant Professor. He leads UConn's Ubiquitous and Urban Computing Lab. Before joining UConn, he worked as a postdoctoral research fellow at the Real-Time Computing Lab (RTCL), University of Michigan. His research interests include Human-centered AI, GeoAI, and AI of Things.
Organizers
Olivia Nocentini
Postdoctoral Researcher at the Italian Institute of Technology
e-mail: olivia.nocentini@iit.it

Rishabh Dabral
Research Group Leader at the Max Planck Institute for Informatics
e-mail: rdabral@mpi-inf.mpg.de

Niaz Ahmad
Postdoctoral Research Fellow in the CVIS Lab at Toronto Metropolitan University

Marta Lorenzini
Senior Technician at the Italian Institute of Technology
e-mail: marta.lorenzini@iit.it

Arash Ajoudani
Director of the Human-Robot Interfaces and Interaction Laboratory at the Italian Institute of Technology
e-mail: arash.ajoudani@iit.it
Acknowledgement
This work was supported by the Italian Workers’ Compensation Authority INAIL within the VIVA project.