Multimodal Commonsense Reasoning
Date: 2026/01/15
Academic Seminar: Multimodal Commonsense Reasoning
Speaker: Zhecan James Wang, Postdoctoral Research Fellow in Computer Science in the UCLA-NLP group
Time: 9:00 a.m., January 15, 2026 (Beijing Time)
Location: Room 454, Longbin Building
Abstract
My previous work has focused on enabling AI models to achieve human-level commonsense reasoning through two complementary avenues. The first enhances reasoning capabilities by extracting and integrating fine-grained, multimodal knowledge, emphasizing the acquisition of contextual information and its incorporation into complex reasoning processes. The second addresses model reliability from three perspectives: prediction consistency, transparent (or explainable) reasoning steps, and faithful performance in biased or ambiguous scenarios. By leveraging such detailed, multimodal knowledge, AI models can improve their reasoning, robustness, and interpretability, thereby strengthening human trust and understanding in the human-AI relationship. Building on these foundations, my future research will continue to advance more generalized and human-centered AI, exploring areas such as real-world learning, multimodal math reasoning, security in reasoning, agent-based learning, embodied learning, interactive learning with human feedback, and AI for science, social good, and beyond.
Biography
I am Zhecan (James) Wang, a Postdoctoral Research Fellow in Computer Science in the UCLA-NLP group, where I work under the guidance of Prof. Kai-Wei Chang (張凱崴, Amazon Scholar, Sloan Fellow) and Prof. Nanyun Peng. I earned my Ph.D. in Computer Science from Columbia University (2019-2024), mentored by Prof. Shih-Fu Chang (張世富, Dean of the Engineering School, National Academy of Engineering member). My research focuses on Natural Language Processing, Vision-Language Understanding, Multimodal Reasoning, Neural-Symbolic Learning, Trustworthy and Explainable AI, and Human-Centered AI, with applications extending to applied science and beyond.
During my academic and professional journey, I have contributed significantly to DARPA projects, serving as a principal contributor and coordinator for the Machine Commonsense (MCS) and ECOLE (Environment-driven Conceptual Learning) programs. These roles involved leading collaborative efforts with top institutions, resulting in state-of-the-art performance on benchmarks such as VCR, VQA v2, and OKVQA. I also achieved first place in the ICCV Microsoft Global MS-Celeb-1M Challenge, guiding a team of 12 students, and led our team's submissions to the DARPA MCS Visual Commonsense Reasoning leaderboard.
In industry, I have conducted impactful research at Google DeepMind’s Gemini Team under Dr. Quoc V. Le, at Microsoft Research under Dr. Lu Yuan, and at other esteemed organizations such as the MIT Media Lab, Xpeng Motors led by Dr. Yandong Guo (郭彦东, VP@OPPO), the NUS LV Lab led by Prof. Jiashi Feng (冯佳时), and the Panasonic AI Lab. My research contributions include 17 top-tier conference papers (first or co-first author on 10 of them), 7 workshop papers, 8 AI-related patents (7 in the U.S., 1 in China), and 1300+ Google Scholar citations. My collaborations span academia and industry, engaging with 17 professors across 12 institutions. My work has been featured by PaperWeekly, AI2, DARPA, 新智源, and 量子位, earning recognition in both academic and industrial AI communities.