Vision and Language & Multi-modal Learning: zero/few-shot learning, representation learning, continual learning.
Visual question answering, cross-modal retrieval, multi-hop reasoning.
Synthetic Data Generation for Compositionality and Privacy Protection: simulated environments that provide a safe, controlled setting in which agents can learn.
Virtual playgrounds that allow systems to experience and interact with 3D space.
Dynamic Evaluations and Real-World Applications: data distributions and mitigation of spurious correlations.
Assessing the performance and effectiveness of models under varying conditions.
Can Hallucination Correction Improve Video-Language Alignment?
Lingjun Zhao, Mingyang Xie, Paola Cascante-Bonilla, Hal Daumé III, Kwonjoon Lee. February 2025. [bibtex]
Natural Language Inference Improves Compositionality in Vision-Language Models.
Paola Cascante-Bonilla, Yu Hou, Yang Trista Cao, Hal Daumé III, Rachel Rudinger. The Thirteenth International Conference on Learning Representations (ICLR 2025), Singapore, April 2025. [project page] [bibtex]