Multimodal machine learning is one of the fastest-growing areas of machine learning. Often seen as one of the holy grails of AI, it is concerned with the joint modeling of multiple modalities to better capture natural phenomena. My research in this area includes new theoretical justifications for multimodal learning, as well as novel empirical methods for fusion, alignment, and co-learning.

  • Theoretical and Empirical Foundations of Multimodality

  • Neural Models of Fusion

  • Structured Prediction

  • Fast Stochastic Inference Models under Uncertainty

  • Multimodal Active Learning, Meta-Learning and Reinforcement Learning

Machine Learning and Multimodality


Allowing neural models to build priors and beliefs from natural interactions is among the core challenges in AI. Communication, reasoning, and interaction with humans and the environment require in-depth study of computational models, as well as novel resources.

  • Multimodal Language Modeling

  • Artificial Social Intelligence

  • Multimodal Causal Prediction

  • Multimodal Commonsense

Multimodal Communication and Reasoning


Multimodal sensing is an essential part of AI. From advanced neural models to explainable statistics, multimodal sensing bridges the gap between AI and the real world. My research in this area focuses on visual sensing of the human face and body, recognition of auditory cues, and social perception. A particular focus of my work is deploying such technology on low-resource robots and in low-resource environments.

  • Face and Body Sensing

  • Audio Source Separation and Localization

  • Sentiment, Emotion and Personality Recognition

  • Embedded Sensing

Multimodal Sensing