Agent S: An Open Agentic Framework that Uses Computers Like a Human. Saaket Agashe*, Jiuzhou Han*, Shuyu Gan, Jiachen Yang, Ang Li, Xin Eric Wang. ICLR 2025
Multimodal Situational Safety. Kaiwen Zhou*, Chengzhi Liu*, Xuandong Zhao, Anderson Compalas, Dawn Song, Xin Eric Wang. ICLR 2025
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos. Xuehai He, Weixi Feng, Kaizhi Zheng, Yujie Lu, Wanrong Zhu, Jiachen Li, Yue Fan, Jianfeng Wang, Linjie Li, Zhengyuan Yang, Kevin Lin, William Yang Wang, Lijuan Wang, Xin Eric Wang. ICLR 2025
EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing. Kaizhi Zheng, Xiaotong Chen, Xuehai He, Jing Gu, Linjie Li, Zhengyuan Yang, Kevin Lin, Jianfeng Wang, Lijuan Wang, Xin Eric Wang. ICLR 2025
LLM-Coordination: Evaluating and Analyzing Multi-Agent Coordination Abilities in Large Language Models. Saaket Agashe, Yue Fan, Anthony Reyna, Xin Eric Wang. Findings of NAACL 2025
2024
Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding. Yue Fan, Lei Ding, Ching-Chen Kuo, Shan Jiang, Yang Zhao, Xinze Guan, Jie Yang, Yi Zhang, Xin Eric Wang. EMNLP 2024
Active Listening: Personalized Question Generation in Open-Domain Social Conversation with User Model Based Prompting. Kevin Bowden, Yue Fan, Winson Chen, Wen Cui, Davan Harrison, Xin Eric Wang, Marilyn Walker. Findings of EMNLP 2024
Multimodal Procedural Planning via Dual Text-Image Prompting. Yujie Lu, Pan Li, Zhiyu Chen, Wanrong Zhu, Xin Eric Wang, William Yang Wang. Findings of EMNLP 2024
FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation. Xuehai He, Jian Zheng, Jacob Zhiyuan Fang, Robinson Piramuthu, Mohit Bansal, Vicente Ordonez, Gunnar A Sigurdsson, Nanyun Peng, Xin Eric Wang. Transactions on Machine Learning Research (TMLR) 2024
Discfusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners. Xuehai He, Weixi Feng, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Pradyumna Narayana, Sugato Basu, William Yang Wang, Xin Eric Wang. Transactions on Machine Learning Research (TMLR) 2024
SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing. Jing Gu, Yilin Wang, Nanxuan Zhao, Wei Xiong, Qing Liu, Zhifei Zhang, He Zhang, Jianming Zhang, HyunJoon Jung, Xin Eric Wang. ECCV 2024
NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models. Gengze Zhou, Yicong Hong, Zun Wang, Xin Eric Wang, Qi Wu. ECCV 2024
Muffin or Chihuahua? Challenging Large Vision-Language Models with Multipanel VQA. Yue Fan, Jing Gu, Kaiwen Zhou, Qianqi Yan, Shan Jiang, Ching-Chen Kuo, Xinze Guan, Xin Eric Wang. ACL 2024
ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models. Kaiwen Zhou, Kwonjoon Lee, Teruhisa Misu, Xin Eric Wang. ACL 2024
Navigation as Attackers Wish? Towards Building Byzantine-Robust Embodied Agents under Federated Learning. Yunchao Zhang, Zonglin Di, Kaiwen Zhou, Cihang Xie, Xin Eric Wang. NAACL 2024
ComCLIP: Training-Free Compositional Image and Text Matching. Kenan Jiang*, Xuehai He*, Ruize Xu, Xin Eric Wang. NAACL 2024
2023
Photoswap: Personalized Subject Swapping in Images. Jing Gu, Yilin Wang, Nanxuan Zhao, Tsu-Jui Fu, Wei Xiong, Qing Liu, Zhifei Zhang, He Zhang, Jianming Zhang, HyunJoon Jung, Xin Eric Wang. NeurIPS 2023
LayoutGPT: Compositional Visual Planning and Generation with Large Language Models. Weixi Feng*, Wanrong Zhu*, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Xuehai He, Sugato Basu, Xin Eric Wang, William Yang Wang. NeurIPS 2023
LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation. Yujie Lu, Xianjun Yang, Xiujun Li, Xin Eric Wang, William Yang Wang. NeurIPS 2023
R2H: Building Multimodal Navigation Helpers that Respond to Help Requests. Yue Fan, Jing Gu, Kaizhi Zheng, Xin Eric Wang. EMNLP 2023
Collaborative Generative AI: Integrating GPT-k for Efficient Editing in Text-to-Image Generation. Wanrong Zhu, Xinyi Wang, Yujie Lu, Tsu-Jui Fu, Xin Eric Wang, Miguel Eckstein, William Yang Wang. EMNLP 2023
Parameter-Efficient Cross-lingual Transfer of Vision and Language Models via Translation-based Alignment. Zhen Zhang, Jialu Wang, Xin Eric Wang. Findings of EMNLP 2023
Aerial Vision-and-Dialog Navigation. Yue Fan, Winson Chen, Tongzhou Jiang, Chun Zhou, Yi Zhang, Xin Eric Wang. Findings of ACL 2023
T2IAT: Measuring Valence and Stereotypical Biases in Text-to-Image Generation. Jialu Wang, Xinyue Gabby Liu, Zonglin Di, Yang Liu, Xin Eric Wang. Findings of ACL 2023
ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation. Kaiwen Zhou, Kaizhi Zheng, Connor Pryor, Yilin Shen, Hongxia Jin, Lise Getoor, Xin Eric Wang. ICML 2023
Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis. Weixi Feng, Xuehai He, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Pradyumna Narayana, Sugato Basu, Xin Eric Wang, William Yang Wang. ICLR 2023
Neuro-Symbolic Procedural Planning with Commonsense Prompting. Yujie Lu, Weixi Feng, Wanrong Zhu, Wenda Xu, Xin Eric Wang, Miguel Eckstein, William Yang Wang. ICLR 2023
Multimodal Graph Transformer for Multimodal Question Answering. Xuehai He, Xin Eric Wang. EACL 2023
Visualize Before You Write: Imagination-Guided Open-Ended Text Generation. Wanrong Zhu, An Yan, Yujie Lu, Wenda Xu, Xin Eric Wang, Miguel Eckstein, William Yang Wang. EACL 2023
ImaginE: An Imagination-Based Automatic Evaluation Metric for Natural Language Generation. Wanrong Zhu, Xin Eric Wang, An Yan, Miguel Eckstein, William Yang Wang. EACL 2023
Parameter-efficient Model Adaptation for Vision Transformers. Xuehai He, Chunyuan Li, Pengchuan Zhang, Jianwei Yang, Xin Eric Wang. AAAI 2023
Athena 3.0: Personalized Multimodal Chatbot with Neuro-symbolic Dialogue Generators. Yue Fan, Kevin K. Bowden, Wen Cui, Winson Chen, Vrindavan Harrison, Angela Ramirez, Saaket Agashe, Xinyue Gabby Liu, Neha Pullabhotla, Nan Qiang, Jeshwanth Bheemanpally, Sugam Garg, Marilyn Walker, Xin Eric Wang. Alexa Prize SocialBot Grand Challenge 5 Proceedings 2023
SlugJARVIS: Multimodal Commonsense Knowledge-based Embodied AI for SimBot Challenge. Jing Gu, Kaizhi Zheng, Kaiwen Zhou, Yue Fan, Xuehai He, Jialu Wang, Zonglin Di, Xin Eric Wang. Alexa Prize SimBot Challenge Proceedings 2023
2022
CPL: Counterfactual Prompt Learning for Vision and Language Models. Xuehai He, Diji Yang, Weixi Feng, Tsu-Jui Fu, Arjun Akula, Varun Jampani, Pradyumna Narayana, Sugato Basu, William Yang Wang, Xin Eric Wang. EMNLP 2022
VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation. Kaizhi Zheng, Xiaotong Chen, Odest Chadwicke Jenkins, Xin Eric Wang. NeurIPS 2022
FedVLN: Privacy-preserving Federated Vision-and-Language Navigation. Kaiwen Zhou, Xin Eric Wang. ECCV 2022
Language-Driven Artistic Style Transfer. Tsu-Jui Fu, Xin Eric Wang, William Yang Wang. ECCV 2022
Understanding Instance-Level Impact of Fairness Constraints. Jialu Wang, Xin Eric Wang, Yang Liu. ICML 2022
Imagination-Augmented Natural Language Understanding. Yujie Lu, Wanrong Zhu, Xin Eric Wang, Miguel Eckstein, William Yang Wang. NAACL 2022
Diagnosing Vision-and-Language Navigation: What Really Matters. Wanrong Zhu, Yuankai Qi, Pradyumna Narayana, Kazoo Sone, Sugato Basu, Xin Eric Wang, Qi Wu, Miguel Eckstein, William Yang Wang. NAACL 2022
Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning. Juncheng Li, Junlin Xie, Long Qian, Linchao Zhu, Siliang Tang, Fei Wu, Yi Yang, Yueting Zhuang, Xin Eric Wang. CVPR 2022
M3L: Language-based Video Editing via Multi-Modal Multi-Level Transformer. Tsu-Jui Fu, Xin Eric Wang, Scott Grafton, Miguel Eckstein, William Yang Wang. CVPR 2022
Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions. Jing Gu, Eliana Stefani, Qi Wu, Jesse Thomason, Xin Eric Wang. ACL 2022
Assessing Multilingual Fairness in Pretrained Multimodal Representations. Jialu Wang, Yang Liu, Xin Eric Wang. Findings of ACL 2022
Interpretable Research Replication Prediction via Variational Contextual Consistency Sentence Masking. Tianyi Luo, Rui Meng, Xin Eric Wang, Yang Liu. Findings of ACL 2022
2021
Are Gender-Neutral Queries Really Gender-Neutral? Mitigating Gender Bias in Image Search. Jialu Wang, Yang Liu, Xin Eric Wang. EMNLP 2021
VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation. Linjie Li, Jie Lei, Zhe Gan, Licheng Yu, Yen-Chun Chen, Rohit Pillai, Yu Cheng, Luowei Zhou, Xin Eric Wang, William Yang Wang, Tamara Lee Berg, Mohit Bansal, Jingjing Liu, Lijuan Wang, Zicheng Liu. NeurIPS 2021
Visual Question Rewriting for Increasing Response Rate. Jiayi Wei, Xilian Li, Yi Zhang, Xin Eric Wang. SIGIR 2021
Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation. Wanrong Zhu, Xin Eric Wang, Tsu-Jui Fu, An Yan, Pradyumna Narayana, Kazoo Sone, Sugato Basu, William Yang Wang. EACL 2021
L2C: Describing Visual Differences Needs Semantic Understanding of Individuals. An Yan, Xin Eric Wang, Tsu-Jui Fu, William Yang Wang. EACL 2021
2020
Closing the Loop Between Language and Vision for Embodied Agents. Xin Wang. UC Santa Barbara
SSCR: Iterative Language-Based Image Editing via Self-Supervised Counterfactual Reasoning. Tsu-Jui Fu, Xin Eric Wang, Scott Grafton, Miguel Eckstein, William Yang Wang. EMNLP 2020
Towards Understanding Sample Variance in Visually Grounded Language Generation: Evaluations and Observations. Wanrong Zhu☆, Xin Eric Wang, Pradyumna Narayana, Kazoo Sone, Sugato Basu, William Yang Wang. EMNLP 2020
Learning to Stop: A Simple yet Effective Approach to Urban Vision-Language Navigation. Jiannan Xiang☆, Xin Eric Wang, William Yang Wang. Findings of EMNLP 2020
Environment-agnostic Multitask Learning for Natural Language Grounded Navigation. Xin Eric Wang*, Vihan Jain*, Eugene Ie, William Yang Wang, Zornitsa Kozareva, Sujith Ravi. ECCV 2020
Counterfactual Vision-and-Language Navigation via Adversarial Path Sampling. Tsu-Jui Fu, Xin Eric Wang, Matthew Peterson, Scott Grafton, Miguel Eckstein, William Yang Wang. ECCV 2020
Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation. Juncheng Li, Xin Wang, Siliang Tang, Haizhou Shi, Fei Wu, Yueting Zhuang, William Yang Wang. CVPR 2020
REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments. Yuankai Qi, Qi Wu, Peter Anderson, Xin Wang, William Yang Wang, Chunhua Shen, Anton van den Hengel. CVPR 2020
Vision-Language Navigation Policy Learning and Adaptation. Xin Wang, Qiuyuan Huang, Asli Celikyilmaz, Jianfeng Gao, Dinghan Shen, Yuan-Fang Wang, William Yang Wang, Lei Zhang. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2020
Generative Adversarial Zero-Shot Relational Learning for Knowledge Graphs. Pengda Qin, Xin Wang, Wenhu Chen, Chunyun Zhang, Weiran Xu, William Yang Wang. AAAI 2020
2019
TIGEr: Text-to-Image Grounding for Image Caption Evaluation. Ming Jiang, Qiuyuan Huang, Lei Zhang, Xin Wang, Pengchuan Zhang, Zhe Gan, Jana Diesner, Jianfeng Gao. EMNLP-IJCNLP 2019
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research. Xin Wang*, Jiawei Wu*, Junkun Chen, Lei Li, Yuan-Fang Wang, William Yang Wang. ICCV 2019
Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation. Xin Wang, Qiuyuan Huang, Asli Celikyilmaz, Jianfeng Gao, Dinghan Shen, Yuan-Fang Wang, William Yang Wang, Lei Zhang. CVPR 2019
MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment. Da Zhang, Xiyang Dai, Xin Wang, Yuan-Fang Wang, Larry S. Davis. CVPR 2019
Self-Supervised Dialogue Learning. Jiawei Wu, Xin Wang, William Yang Wang. ACL 2019
Self-Supervised Learning for Contextualized Extractive Summarization. Hong Wang, Xin Wang, Wenhan Xiong, Mo Yu, Xiaoxiao Guo, Shiyu Chang, William Yang Wang. ACL 2019
Towards Generating Long and Coherent Text with Multi-Level Latent Variable Models. Dinghan Shen, Asli Celikyilmaz, Yizhe Zhang, Liqun Chen, Xin Wang, Jianfeng Gao, Lawrence Carin. ACL 2019
Extract and Edit: An Alternative to Back-Translation for Unsupervised Neural Machine Translation. Jiawei Wu, Xin Wang, William Yang Wang. NAACL 2019
Learning to Compose Topic-Aware Mixture of Experts for Zero-Shot Video Captioning. Xin Wang, Jiawei Wu, Da Zhang, Yu Su, William Yang Wang. AAAI 2019
2018
Look Before You Leap: Bridging Model-Free and Model-Based Reinforcement Learning for Planned-Ahead Vision-and-Language Navigation. Xin Wang*, Wenhan Xiong*, Hongmin Wang, William Yang Wang. ECCV 2018
XL-NBT: A Cross-lingual Neural Belief Tracking Framework. Wenhu Chen, Jianshu Chen, Yu Su, Xin Wang, Dong Yu, Xifeng Yan, William Yang Wang. EMNLP 2018
No Metrics Are Perfect: Adversarial Reward Learning for Visual Storytelling. Xin Wang*, Wenhu Chen*, Yuan-Fang Wang, William Yang Wang. ACL 2018
S3D: Single Shot multi-Span Detector via Fully 3D Convolutional Network. Da Zhang, Xiyang Dai, Xin Wang, Yuan-Fang Wang. BMVC 2018
Video Captioning via Hierarchical Reinforcement Learning. Xin Wang, Wenhu Chen, Jiawei Wu, Yuan-Fang Wang, William Yang Wang. CVPR 2018
Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning. Xin Wang, Yuan-Fang Wang, William Yang Wang. NAACL-HLT 2018
2017
Multimodal Transfer: A Hierarchical Deep Convolutional Neural Network for Fast Artistic Style Transfer. Xin Wang, Geoffrey Oxholm, Da Zhang, Yuan-Fang Wang. CVPR 2017
Deep Reinforcement Learning for Visual Object Tracking in Videos. Da Zhang, Hamid Maei, Xin Wang, Yuan-Fang Wang. Tech report 2017