Publications
-
arXiv preprints
- Andrew Szot*, Bogdan Mazoure*, Omar Attia, Aleksei Timofeev, Harsh Agrawal, Devon Hjelm, Zhe Gan, Zsolt Kira and Alexander Toshev “From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons”, 2024. PDF
- Enrico Fini*, Mustafa Shukor*, Xiujun Li, Philipp Dufter, Michal Klein, David Haldimann, Sai Aitharaju, Victor Guilherme Turrisi da Costa, Louis Béthune, Zhe Gan, Alexander Toshev, Marcin Eichner, Moin Nabi, Yinfei Yang, Joshua Susskind and Alaaeldin El-Nouby* “Multimodal Autoregressive Pre-training of Large Vision Encoders”, 2024. PDF / Code
- Haotian Zhang*, Mingfei Gao*, Zhe Gan*, Philipp Dufter, Nina Wenzel, Forrest Huang, Dhruti Shah, Xianzhi Du, Bowen Zhang, Yanghao Li, Sam Dodge, Keen You, Zhen Yang, Aleksei Timofeev, Mingze Xu, Hong-You Chen, Jean-Philippe Fauconnier, Zhengfeng Lai, Haoxuan You, Zirui Wang, Afshin Dehghan, Peter Grasch and Yinfei Yang “MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning”, 2024. PDF
- Zhangheng Li, Keen You, Haotian Zhang, Di Feng, Harsh Agrawal, Xiujun Li, Mohana Prasad Sathya Moorthy, Jeff Nichols, Yinfei Yang and Zhe Gan “Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms”, 2024. PDF
- Ruohong Zhang, Bowen Zhang, Yanghao Li, Haotian Zhang, Zhiqing Sun, Zhe Gan, Yinfei Yang, Ruoming Pang and Yiming Yang “Improve Vision Language Model Chain-of-thought Reasoning”, 2024. PDF
- Hanrong Ye*, Haotian Zhang*, Erik Daxberger, Lin Chen, Zongyu Lin, Yanghao Li, Bowen Zhang, Haoxuan You, Dan Xu, Zhe Gan, Jiasen Lu and Yinfei Yang “MM-Ego: Towards Building Egocentric Multimodal LLMs”, 2024. PDF
- Hong-You Chen, Zhengfeng Lai, Haotian Zhang, Xinze Wang, Marcin Eichner, Keen You, Meng Cao, Bowen Zhang, Yinfei Yang and Zhe Gan “Contrastive Localized Language-Image Pre-Training”, 2024. PDF
- Zhengfeng Lai*, Vasileios Saveris*, Chen Chen, Hong-You Chen, Haotian Zhang, Bowen Zhang, Juan Lao Tebar, Wenze Hu, Zhe Gan, Peter Grasch, Meng Cao and Yinfei Yang “Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models”, 2024. PDF
- Mingze Xu*, Mingfei Gao*, Zhe Gan, Hong-You Chen, Zhengfeng Lai, Haiming Gang, Kai Kang and Afshin Dehghan “SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models”, 2024. PDF / Code
- Elmira Amirloo*, Jean-Philippe Fauconnier*, Christoph Roesmann*, Christian Kerl, Rinu Boney, Yusu Qian, Zirui Wang, Afshin Dehghan, Yinfei Yang, Zhe Gan and Peter Grasch “Understanding Alignment in Multimodal LLMs: A Comprehensive Study”, 2024. PDF
- Xiujun Li, Yujie Lu, Zhe Gan, Jianfeng Gao, William Yang Wang and Yejin Choi “Text as Images: Can Multimodal Large Language Models Follow Printed Instructions in Pixels?”, 2024. PDF / Project page
- Yusu Qian, Hanrong Ye, Jean-Philippe Fauconnier, Peter Grasch, Yinfei Yang and Zhe Gan “MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs”, 2024. PDF / Code
-
2024
- Brandon McKinzie*, Zhe Gan*, Jean-Philippe Fauconnier, Sam Dodge, Bowen Zhang, Philipp Dufter, Dhruti Shah, Xianzhi Du, Futang Peng, Floris Weers, Anton Belyi, Haotian Zhang, Karanjeet Singh, Doug Kang, Ankur Jain, Hongyu Hè, Max Schwarzer, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan Du, Tao Lei, Sam Wiseman, Guoli Yin, Mark Lee, Zirui Wang, Ruoming Pang, Peter Grasch, Alexander Toshev and Yinfei Yang “MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training”, European Conf. on Computer Vision (ECCV), 2024. PDF
- Yusu Qian, Haotian Zhang, Yinfei Yang and Zhe Gan “How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts”, Neural Information Processing Systems (NeurIPS), Workshop on Safe Generative AI, 2024. PDF / Code
- Yuhui Zhang, Brandon McKinzie, Zhe Gan, Vaishaal Shankar and Alexander Toshev “Pre-trained Language Models Do Not Help Auto-regressive Text-to-Image Generation”, Conf. on Empirical Methods in Natural Language Processing (EMNLP), 2024. PDF (Oral)
- Haotian Zhang*, Haoxuan You*, Philipp Dufter, Bowen Zhang, Chen Chen, Hong-You Chen, Tsu-Jui Fu, William Yang Wang, Shih-Fu Chang, Zhe Gan and Yinfei Yang “Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models”, Conference on Language Modeling (COLM), 2024. PDF
- Keen You, Haotian Zhang, Eldon Schoop, Floris Weers, Amanda Swearngin, Jeffrey Nichols, Yinfei Yang and Zhe Gan “Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs”, European Conf. on Computer Vision (ECCV), 2024. PDF / Code
- Zhengfeng Lai*, Haotian Zhang*, Bowen Zhang, Wentao Wu, Haoping Bai, Aleksei Timofeev, Xianzhi Du, Zhe Gan, Jiulong Shan, Chen-Nee Chuah, Yinfei Yang and Meng Cao “VeCLIP: Improving CLIP Training via Visual-enriched Captions”, European Conf. on Computer Vision (ECCV), 2024. PDF / Code
- Jialian Wu, Jianfeng Wang, Zhengyuan Yang, Zhe Gan, Zicheng Liu, Junsong Yuan and Lijuan Wang “GRiT: A Generative Region-to-text Transformer for Object Understanding”, European Conf. on Computer Vision (ECCV), 2024. PDF / Code
- Haoxuan You*, Haotian Zhang*, Zhe Gan, Xianzhi Du, Bowen Zhang, Zirui Wang, Liangliang Cao, Shih-Fu Chang and Yinfei Yang “Ferret: Refer and Ground Anything Anywhere at Any Granularity”, Int. Conf. Learning Representations (ICLR), 2024. PDF / Code (Spotlight, Top 5% among all submissions)
- Tsu-Jui Fu, Wenze Hu, Xianzhi Du, William Yang Wang, Yinfei Yang and Zhe Gan “Guiding Instruction-based Image Editing via Multimodal Large Language Models”, Int. Conf. Learning Representations (ICLR), 2024. PDF / Code (Spotlight, Top 5% among all submissions)
- Ajay Jaiswal, Zhe Gan, Xianzhi Du, Bowen Zhang, Zhangyang Wang and Yinfei Yang “Compressing LLMs: The Truth is Rarely Pure and Never Simple”, Int. Conf. Learning Representations (ICLR), 2024. PDF / Code
- Wentao Wu*, Aleksei Timofeev*, Chen Chen, Bowen Zhang, Kun Duan, Shuangning Liu, Yantao Zheng, Jonathon Shlens, Xianzhi Du, Zhe Gan and Yinfei Yang “MOFI: Learning Image Representations from Noisy Entity Annotated Images”, Int. Conf. Learning Representations (ICLR), 2024. PDF / Code
- Jaemin Cho, Linjie Li, Zhengyuan Yang, Zhe Gan, Lijuan Wang and Mohit Bansal “Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation”, Computer Vision and Pattern Recognition (CVPR), Workshop on the Evaluation of Generative Foundation Models, 2024. PDF / Project page
-
2023
- Chunyuan Li*, Zhe Gan*, Zhengyuan Yang*, Jianwei Yang*, Linjie Li*, Lijuan Wang and Jianfeng Gao “Multimodal Foundation Models: From Specialists to General-Purpose Assistants”, Foundations and Trends in Computer Graphics and Vision, 2023. PDF (A survey book on multimodal foundation models)
- Yuhui Zhang, Brandon McKinzie, Zhe Gan, Vaishaal Shankar and Alexander Toshev “Pre-trained Language Models Do Not Help Auto-regressive Text-to-Image Generation”, Neural Information Processing Systems (NeurIPS), Workshop on I Can't Believe It's Not Better, 2023. PDF
- Yi-Lin Sung, Linjie Li, Kevin Lin, Zhe Gan, Mohit Bansal and Lijuan Wang “An Empirical Study of Multimodal Model Merging”, Conf. on Empirical Methods in Natural Language Processing (Findings of EMNLP), 2023. PDF / Code
- Xueyan Zou*, Zi-Yi Dou*, Jianwei Yang*, Zhe Gan, Linjie Li, Chunyuan Li, Xiyang Dai, Harkirat Behl, Jianfeng Wang, Lu Yuan, Nanyun Peng, Lijuan Wang, Yong Jae Lee and Jianfeng Gao “Generalized Decoding for Pixel, Image, and Language”, Computer Vision and Pattern Recognition (CVPR), 2023. PDF / Code / Project page
- Zhengyuan Yang, Jianfeng Wang, Zhe Gan, Linjie Li, Kevin Lin, Chenfei Wu, Nan Duan, Zicheng Liu, Ce Liu, Michael Zeng and Lijuan Wang “ReCo: Region-Controlled Text-to-Image Generation”, Computer Vision and Pattern Recognition (CVPR), 2023. PDF / Code
- Jinghao Zhou, Li Dong, Zhe Gan, Lijuan Wang and Furu Wei “Non-Contrastive Learning Meets Language-Image Pre-Training”, Computer Vision and Pattern Recognition (CVPR), 2023. PDF
- Linjie Li, Zhe Gan, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Ce Liu and Lijuan Wang “LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling”, Computer Vision and Pattern Recognition (CVPR), 2023. PDF / Code
- Tsu-Jui Fu*, Linjie Li*, Zhe Gan, Kevin Lin, William Yang Wang, Lijuan Wang and Zicheng Liu “An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling”, Computer Vision and Pattern Recognition (CVPR), 2023. PDF / Code
- Chenglei Si, Zhe Gan, Zhengyuan Yang, Shuohang Wang, Jianfeng Wang, Jordan Boyd-Graber and Lijuan Wang “Prompting GPT-3 To Be Reliable”, Int. Conf. Learning Representations (ICLR), 2023. PDF / Code
- Bingbing Wen, Zhengyuan Yang, Jianfeng Wang, Zhe Gan, Bill Howe and Lijuan Wang “InfoVisDial: An Informative Visual Dialogue Dataset by Bridging Large Multimodal and Language Models”, arXiv preprint, 2023. PDF
- Zixin Zhu*, Yixuan Wei*, Jianfeng Wang, Zhe Gan, Zheng Zhang, Le Wang, Gang Hua, Lijuan Wang, Zicheng Liu and Han Hu “Exploring Discrete Diffusion Models for Image Captioning”, arXiv preprint, 2023. PDF / Code
-
2022
- Zhe Gan, Linjie Li, Chunyuan Li, Lijuan Wang, Zicheng Liu and Jianfeng Gao “Vision-Language Pre-training: Basics, Recent Advances, and Future Trends”, Foundations and Trends in Computer Graphics and Vision, 2022. PDF / Link (A survey book on vision-language pre-training)
- Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu and Lijuan Wang “GIT: A Generative Image-to-text Transformer for Vision and Language”, Transactions on Machine Learning Research (TMLR), 2022. PDF / Code (Our new multimodal foundation model, which sets new SOTA on 12 benchmarks spanning a diverse set of image/video captioning and QA tasks)
- Chenfei Wu*, Jian Liang*, Xiaowei Hu, Zhe Gan, Jianfeng Wang, Lijuan Wang, Zicheng Liu, Yuejian Fang and Nan Duan “NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis”, Neural Information Processing Systems (NeurIPS), 2022. PDF / Webpage 1 / Webpage 2 / GitHub / Twitter / YouTube
- Zi-Yi Dou*, Aishwarya Kamath*, Zhe Gan*, Pengchuan Zhang, Jianfeng Wang, Linjie Li, Zicheng Liu, Ce Liu, Yann LeCun, Nanyun Peng, Jianfeng Gao and Lijuan Wang “Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone”, Neural Information Processing Systems (NeurIPS), 2022. PDF / Code / Webpage
- Sheng Shen*, Chunyuan Li*, Xiaowei Hu*, Yujia Xie, Jianwei Yang, Pengchuan Zhang, Zhe Gan, Lijuan Wang, Lu Yuan, Ce Liu, Kurt Keutzer, Trevor Darrell, Anna Rohrbach and Jianfeng Gao “K-LITE: Learning Transferable Visual Models with External Knowledge”, Neural Information Processing Systems (NeurIPS), 2022. PDF / Code (Oral)
- Zhengyuan Yang, Zhe Gan, Jianfeng Wang, Xiaowei Hu, Faisal Ahmed, Zicheng Liu, Yumao Lu and Lijuan Wang “UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling”, European Conf. on Computer Vision (ECCV), 2022. PDF / Code (Oral, Top 2.7% among all submissions)
- Tianlong Chen, Yu Cheng, Zhe Gan, Jianfeng Wang, Lijuan Wang, Jingjing Liu and Zhangyang Wang “Adversarial Feature Augmentation and Normalization for Visual Recognition”, Transactions on Machine Learning Research (TMLR), 2022. PDF / Old / Code
- Zi-Yi Dou, Yichong Xu, Zhe Gan, Jianfeng Wang, Shuohang Wang, Lijuan Wang, Chenguang Zhu, Pengchuan Zhang, Lu Yuan, Nanyun (Violet) Peng, Zicheng Liu and Michael Zeng “An Empirical Study of Training End-to-End Vision-and-Language Transformers”, Computer Vision and Pattern Recognition (CVPR), 2022. PDF / Code
- Xiaowei Hu, Zhe Gan, Jianfeng Wang, Zhengyuan Yang, Zicheng Liu, Yumao Lu and Lijuan Wang “Scaling Up Vision-Language Pre-training for Image Captioning”, Computer Vision and Pattern Recognition (CVPR), 2022. PDF / Code
- Kevin Lin*, Linjie Li*, Chung-Ching Lin*, Faisal Ahmed, Zhe Gan, Zicheng Liu, Yumao Lu and Lijuan Wang “SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning”, Computer Vision and Pattern Recognition (CVPR), 2022. PDF / Code
- Zhiyuan Fang, Jianfeng Wang, Xiaowei Hu, Lin Liang, Zhe Gan, Lijuan Wang, Yezhou Yang and Zicheng Liu “Injecting Semantic Concepts into End-to-End Image Captioning”, Computer Vision and Pattern Recognition (CVPR), 2022. PDF
- Zhengyuan Yang, Zhe Gan, Jianfeng Wang, Xiaowei Hu, Yumao Lu, Zicheng Liu and Lijuan Wang “An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA”, Proc. American Association of Artificial Intelligence (AAAI), 2022. PDF / Slides (Oral, Leaderboard #1 on OK-VQA as of Nov. 4, 2021)
- Zhe Gan, Yen-Chun Chen, Linjie Li, Tianlong Chen, Yu Cheng, Shuohang Wang, Jingjing Liu, Lijuan Wang and Zicheng Liu “Playing Lottery Tickets with Vision and Language”, Proc. American Association of Artificial Intelligence (AAAI), 2022. PDF / Slides (Oral)
- Jinghui Chen, Yu Cheng, Zhe Gan, Quanquan Gu and Jingjing Liu “Efficient Robust Training via Backward Smoothing”, Proc. American Association of Artificial Intelligence (AAAI), 2022. PDF
- Tsu-Jui Fu, Linjie Li, Zhe Gan, Kevin Lin, William Yang Wang, Lijuan Wang and Zicheng Liu “VIOLET: End-to-End Video-Language Transformers with Masked Visual-token Modeling”, arXiv preprint, 2022. PDF / Code
- Yixin Nie*, Linjie Li*, Zhe Gan, Shuohang Wang, Chenguang Zhu, Michael Zeng, Zicheng Liu, Mohit Bansal and Lijuan Wang “MLP Architectures for Vision-and-Language Modeling: An Empirical Study”, arXiv preprint, 2022. PDF
- Jianfeng Wang, Xiaowei Hu, Zhe Gan, Zhengyuan Yang, Xiyang Dai, Zicheng Liu, Yumao Lu and Lijuan Wang “UFO: A UniFied TransfOrmer for Vision-Language Representation Learning”, arXiv preprint, 2022. PDF
-
2021
- Tianlong Chen, Yu Cheng, Zhe Gan, Lu Yuan, Lei Zhang and Zhangyang Wang “Chasing Sparsity in Vision Transformers: An End-to-End Exploration”, Neural Information Processing Systems (NeurIPS), 2021. PDF / Code
- Xiaohan Chen, Yu Cheng, Shuohang Wang, Zhe Gan, Jingjing Liu and Zhangyang Wang “The Elastic Lottery Ticket Hypothesis”, Neural Information Processing Systems (NeurIPS), 2021. PDF / Code
- Tianlong Chen, Yu Cheng, Zhe Gan, Jingjing Liu and Zhangyang Wang “Data-Efficient GAN Training Beyond (Just) Augmentations: A Lottery Ticket Perspective”, Neural Information Processing Systems (NeurIPS), 2021. PDF / Code
- Boxin Wang*, Chejian Xu*, Shuohang Wang, Zhe Gan, Yu Cheng, Jianfeng Gao, Ahmed Hassan Awadallah and Bo Li “Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models”, Neural Information Processing Systems (NeurIPS), Datasets and Benchmarks Track, 2021. PDF / Benchmark and Leaderboard (Oral)
- Linjie Li*, Jie Lei*, Zhe Gan, Licheng Yu, Yen-Chun Chen, Rohit Pillai, Yu Cheng, Luowei Zhou, Xin Eric Wang, William Yang Wang, Tamara Lee Berg, Mohit Bansal, Jingjing Liu, Lijuan Wang and Zicheng Liu “VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation”, Neural Information Processing Systems (NeurIPS), Datasets and Benchmarks Track, 2021. PDF / Starter Code / Leaderboard and Challenge
- Junya Chen, Zhe Gan, Xuan Li, Qing Guo, Liqun Chen, Shuyang Gao, Tagyoung Chung, Yi Xu, Belinda Zeng, Wenlian Lu, Fan Li, Lawrence Carin and Chenyang Tao “Simpler, Faster, Stronger: Breaking The log-K Curse On Contrastive Learners With FlatNCE”, Neural Information Processing Systems (NeurIPS), Workshop on Self-Supervised Learning, 2021. PDF
- Linjie Li, Jie Lei, Zhe Gan and Jingjing Liu “Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models”, Int. Conf. on Computer Vision (ICCV), 2021. PDF / Dataset (Oral, Top 3% among all submissions)
- Chen Zhu, Yu Cheng, Zhe Gan, Furong Huang, Jingjing Liu and Tom Goldstein “MaxVA: Fast Adaptation of Stepsizes by Maximizing Observed Variance of Gradients”, European Conf. Machine Learning (ECML), 2021. PDF / Code
- Xiaohan Chen, Yu Cheng, Shuohang Wang, Zhe Gan, Zhangyang Wang and Jingjing Liu “EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets”, Association for Computational Linguistics (ACL), 2021. PDF / Code (Oral)
- Shuohang Wang, Luowei Zhou, Zhe Gan, Yen-Chun Chen, Yuwei Fang, Siqi Sun, Yu Cheng and Jingjing Liu “Cluster-Former: Clustering-based Sparse Transformer for Question Answering”, Findings of Association for Computational Linguistics (Findings of ACL), 2021. PDF (Leaderboard #1 on NaturalQuestions as of Sep. 27, 2020)
- Jie Lei*, Linjie Li*, Luowei Zhou, Zhe Gan, Tamara L. Berg, Mohit Bansal and Jingjing Liu “Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling”, Computer Vision and Pattern Recognition (CVPR), 2021. PDF / Code (Oral with 3 Strong Accepts, Best Student Paper Honorable Mention)
- Liqun Chen*, Dong Wang*, Zhe Gan, Jingjing Liu, Ricardo Henao and Lawrence Carin “Wasserstein Contrastive Representation Distillation”, Computer Vision and Pattern Recognition (CVPR), 2021. PDF
- Shuyang Dai, Zhe Gan, Yu Cheng, Chenyang Tao, Lawrence Carin and Jingjing Liu “APo-VAE: Text Generation in Hyperbolic Space”, North American Chapter of the Association for Computational Linguistics (NAACL), 2021. PDF
- Boxin Wang, Shuohang Wang, Yu Cheng, Zhe Gan, Ruoxi Jia, Bo Li and Jingjing Liu “InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective”, Int. Conf. Learning Representations (ICLR), 2021. PDF / Slides / Poster / Code (Leaderboard #1 on Adversarial NLI as of Oct. 9, 2020)
- Siyang Yuan*, Pengyu Cheng*, Ruiyi Zhang, Weituo Hao, Zhe Gan and Lawrence Carin “Improving Zero-Shot Voice Style Transfer via Disentangled Representation Learning”, Int. Conf. Learning Representations (ICLR), 2021. PDF
- Yuwei Fang*, Shuohang Wang*, Zhe Gan, Siqi Sun and Jingjing Liu “FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding”, Proc. American Association of Artificial Intelligence (AAAI), 2021. PDF / Code / Slides / Blog (Leaderboard #1 on XTREME and XGLUE as of Sep. 8, 2020)
- Wenhu Chen, Zhe Gan, Linjie Li, Yu Cheng, William Wang and Jingjing Liu “Meta Module Network for Compositional Visual Reasoning”, Winter Conf. on Applications of Computer Vision (WACV), 2021. PDF / Code (Best Student Paper Honorable Mention)
- Luowei Zhou, Jingjing Liu, Yu Cheng, Zhe Gan and Lei Zhang “CUPID: Adaptive Curation of Pre-training Data for Video-and-Language Representation Learning”, arXiv preprint, 2021. PDF
- Linjie Li, Zhe Gan and Jingjing Liu “A Closer Look at the Robustness of Vision-and-Language Pre-trained Models”, arXiv preprint, 2021. PDF / Slides (SOTA on 7 VQA robustness benchmarks as of April 23, 2021)
- Yuwei Fang, Shuohang Wang, Zhe Gan, Siqi Sun, Jingjing Liu and Chenguang Zhu “Accelerating Real-Time Question Answering via Question Generation”, arXiv preprint, 2021. PDF
- Dong Wang, Yuewei Yang, Chenyang Tao, Zhe Gan, Liqun Chen, Fanjie Kong, Ricardo Henao and Lawrence Carin “Proactive Pseudo-Intervention: Contrastive Learning For Interpretable Vision Models”, arXiv preprint, 2021. PDF
- Minhao Cheng, Zhe Gan, Yu Cheng, Shuohang Wang, Cho-Jui Hsieh and Jingjing Liu “Adversarial Masking: Towards Understanding Robustness Trade-off for Generalization”, OpenReview, 2021. PDF
-
2020
- Zhe Gan, Yen-Chun Chen, Linjie Li, Chen Zhu, Yu Cheng and Jingjing Liu “Large-Scale Adversarial Training for Vision-and-Language Representation Learning”, Neural Information Processing Systems (NeurIPS), 2020. PDF / Code-I / Code-II / Slides / Poster / Blog (Spotlight, Top 4% among all submissions; SOTA on 6 Vision+Language tasks)
- Siqi Sun, Zhe Gan, Yu Cheng, Yuwei Fang, Shuohang Wang and Jingjing Liu “Contrastive Distillation on Intermediate Representations for Language Model Compression”, Conf. on Empirical Methods in Natural Language Processing (EMNLP), 2020. PDF / Blog / Code
- Shuohang Wang, Yuwei Fang, Siqi Sun, Zhe Gan, Yu Cheng, Jing Jiang and Jingjing Liu “Cross-Thought for Sentence Encoder Pre-training”, Conf. on Empirical Methods in Natural Language Processing (EMNLP), 2020. PDF / Code
- Yue Dong, Shuohang Wang, Zhe Gan, Yu Cheng, Jackie Chi Kit Cheung and Jingjing Liu “Multi-Fact Correction in Abstractive Text Summarization”, Conf. on Empirical Methods in Natural Language Processing (EMNLP), 2020. PDF / Slides / Blog
- Linjie Li*, Yen-Chun Chen*, Yu Cheng, Zhe Gan, Licheng Yu and Jingjing Liu “HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training”, Conf. on Empirical Methods in Natural Language Processing (EMNLP), 2020. PDF / Slides / Code / Blog (SOTA on 8 Video+Language datasets, Leaderboard #1 on TVR and TVC as of Sep. 15, 2020)
- Yizhe Zhang*, Guoyin Wang*, Chunyuan Li, Zhe Gan, Chris Brockett and Bill Dolan “POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training”, Conf. on Empirical Methods in Natural Language Processing (EMNLP), 2020. PDF / Code / Demo
- Yuwei Fang, Siqi Sun, Zhe Gan, Rohit Pillai, Shuohang Wang and Jingjing Liu “Hierarchical Graph Network for Multi-hop Question Answering”, Conf. on Empirical Methods in Natural Language Processing (EMNLP), 2020. PDF / Code (Leaderboard #1 on HotpotQA as of Dec. 1, 2019)
- Yu Cheng, Zhe Gan, Yizhe Zhang, Oussama Elachqar, Dianqi Li and Jingjing Liu “Contextual Text Style Transfer”, Findings of Empirical Methods in Natural Language Processing (Findings of EMNLP), 2020. PDF
- Yi Wei, Zhe Gan, Wenbo Li, Siwei Lyu, Ming-Ching Chang, Lei Zhang, Jianfeng Gao and Pengchuan Zhang “MagGAN: High-Resolution Face Attribute Editing with Mask-Guided Generative Adversarial Network”, Asian Conf. on Computer Vision (ACCV), 2020. PDF
- Shuyang Dai, Yu Cheng, Yizhe Zhang, Zhe Gan, Jingjing Liu and Lawrence Carin “Contrastively Smoothed Class Alignment for Unsupervised Domain Adaptation”, Asian Conf. on Computer Vision (ACCV), 2020. PDF
- Jize Cao, Zhe Gan, Yu Cheng, Licheng Yu, Yen-Chun Chen and Jingjing Liu “Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models”, European Conf. on Computer Vision (ECCV), 2020. PDF (Spotlight, Top 5% among all submissions)
- Yen-Chun Chen*, Linjie Li*, Licheng Yu*, Ahmed El Kholy, Faisal Ahmed, Zhe Gan, Yu Cheng and Jingjing Liu “UNITER: UNiversal Image-TExt Representation Learning”, European Conf. on Computer Vision (ECCV), 2020. PDF / Code (SOTA on 13 Vision+Language Datasets/Tasks, Leaderboard #1 on VCR and NLVR2 as of Sep. 2019)
- Yu Cheng, Zhe Gan, Yitong Li, Jingjing Liu and Jianfeng Gao “Sequential Attention GAN for Interactive Image Editing”, ACM International Conference on Multimedia (ACMMM), 2020. PDF
- Pengyu Cheng, Weituo Hao, Shuyang Dai, Jiachang Liu, Zhe Gan and Lawrence Carin “CLUB: A Contrastive Log-ratio Upper Bound of Mutual Information”, Int. Conf. Machine Learning (ICML), 2020. PDF / Code
- Liqun Chen, Zhe Gan, Yu Cheng, Linjie Li, Lawrence Carin and Jingjing Liu “Graph Optimal Transport for Cross-Domain Alignment”, Int. Conf. Machine Learning (ICML), 2020. PDF / Code
- Jiacheng Xu, Zhe Gan, Yu Cheng and Jingjing Liu “Discourse-Aware Neural Extractive Text Summarization”, Association for Computational Linguistics (ACL), 2020. PDF / Blog / Code
- Yen-Chun Chen, Zhe Gan, Yu Cheng, Jingzhou Liu and Jingjing Liu “Distilling Knowledge Learned in BERT for Text Generation”, Association for Computational Linguistics (ACL), 2020. PDF / Slides / Blog / Code
- Ruiyi Zhang, Changyou Chen, Zhe Gan, Wenlin Wang, Dinghan Shen, Guoyin Wang, Zheng Wen and Lawrence Carin “Improving Adversarial Text Generation by Modeling the Distant Future”, Association for Computational Linguistics (ACL), 2020. PDF / Old
- Yandong Li, Yu Cheng, Zhe Gan, Licheng Yu, Liqiang Wang and Jingjing Liu “BachGAN: High-Resolution Image Synthesis from Salient Object Layout”, Computer Vision and Pattern Recognition (CVPR), 2020. PDF / Code
- Jingzhou Liu, Wenhu Chen, Yu Cheng, Zhe Gan, Licheng Yu, Yiming Yang and Jingjing Liu “VIOLIN: A Large-Scale Dataset for Video-and-Language Inference”, Computer Vision and Pattern Recognition (CVPR), 2020. PDF / Code
- Ruiyi Zhang, Changyou Chen, Zhe Gan, Zheng Wen, Wenlin Wang and Lawrence Carin “Nested-Wasserstein Self-Imitation Learning for Sequence Generation”, Artificial Intelligence and Statistics (AISTATS), 2020. PDF / Workshop on Bayesian Deep Learning, NeurIPS 2019
- Chen Zhu, Yu Cheng, Zhe Gan, Siqi Sun, Tom Goldstein and Jingjing Liu “FreeLB: Enhanced Adversarial Training for Natural Language Understanding”, Int. Conf. Learning Representations (ICLR), 2020. PDF / Code (Spotlight, Leaderboard #1 on GLUE, ARC Easy/Challenge and CommonsenseQA as of Sep. 2019)
- Wenlin Wang, Hongteng Xu, Zhe Gan, Bai Li, Guoyin Wang, Liqun Chen, Qian Yang, Wenqi Wang and Lawrence Carin “Graph-Driven Generative Models for Heterogeneous Multi-Task Learning”, Proc. American Association of Artificial Intelligence (AAAI), 2020. PDF / Poster / Slides (Spotlight)
- Junjie Hu, Yu Cheng, Zhe Gan, Jingjing Liu, Jianfeng Gao and Graham Neubig “What Makes A Good Story? Designing Composite Rewards for Visual Storytelling”, Proc. American Association of Artificial Intelligence (AAAI), 2020. PDF / Poster / Code (Spotlight)
-
2019
- Wenlin Wang, Chenyang Tao, Zhe Gan, Guoyin Wang, Liqun Chen, Xinyuan Zhang, Ruiyi Zhang, Qian Yang, Ricardo Henao and Lawrence Carin “Improving Textual Network Learning with Variational Homophilic Embeddings”, Neural Information Processing Systems (NeurIPS), 2019. PDF
- Siqi Sun, Yu Cheng, Zhe Gan and Jingjing Liu “Patient Knowledge Distillation for BERT Model Compression”, Conf. on Empirical Methods in Natural Language Processing (EMNLP), 2019. PDF / Poster / Code
- Huazheng Wang, Zhe Gan, Xiaodong Liu, Jingjing Liu, Jianfeng Gao and Hongning Wang “Adversarial Domain Adaptation for Machine Reading Comprehension”, Conf. on Empirical Methods in Natural Language Processing (EMNLP), 2019. PDF / Poster / Code
- Dianqi Li, Yizhe Zhang, Zhe Gan, Yu Cheng, Chris Brockett, Ming-Ting Sun and Bill Dolan “Domain Adaptive Text Style Transfer”, Conf. on Empirical Methods in Natural Language Processing (EMNLP), 2019. PDF / Code
- Ming Jiang, Qiuyuan Huang, Lei Zhang, Xin Wang, Pengchuan Zhang, Zhe Gan, Jana Diesner and Jianfeng Gao “TIGEr: Text-to-Image Grounding for Image Caption Evaluation”, Conf. on Empirical Methods in Natural Language Processing (EMNLP), 2019. PDF / Poster / Code
- Linjie Li, Zhe Gan, Yu Cheng and Jingjing Liu “Relation-Aware Graph Attention Network for Visual Question Answering”, Int. Conf. on Computer Vision (ICCV), 2019. PDF / Supp / Poster / Code
- Zhe Gan, Yu Cheng, Ahmed El Kholy, Linjie Li, Jingjing Liu and Jianfeng Gao “Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog”, Association for Computational Linguistics (ACL), 2019. PDF / Poster
- Liyiming Ke, Xiujun Li, Yonatan Bisk, Ari Holtzman, Zhe Gan, Jingjing Liu, Jianfeng Gao, Yejin Choi and Siddhartha Srinivasa “Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation”, Computer Vision and Pattern Recognition (CVPR), 2019. PDF / YouTube / Code (Oral)
- Yitong Li, Zhe Gan, Yelong Shen, Jingjing Liu, Yu Cheng, Yuexin Wu, Lawrence Carin, David Carlson and Jianfeng Gao “StoryGAN: A Sequential Conditional GAN for Story Visualization”, Computer Vision and Pattern Recognition (CVPR), 2019. PDF / Slides / Poster / Code
- Wenlin Wang, Zhe Gan, Hongteng Xu, Ruiyi Zhang, Guoyin Wang, Dinghan Shen, Changyou Chen and Lawrence Carin “Topic-Guided Variational Autoencoders for Text Generation”, North American Chapter of the Association for Computational Linguistics (NAACL), 2019. PDF (Oral)
- Liqun Chen, Yizhe Zhang, Ruiyi Zhang, Chenyang Tao, Zhe Gan, Haichao Zhang, Bai Li, Dinghan Shen, Changyou Chen and Lawrence Carin “Improving Sequence-to-Sequence Learning via Optimal Transport”, Int. Conf. Learning Representations (ICLR), 2019. PDF / Code
- Qiuyuan Huang*, Zhe Gan*, Asli Celikyilmaz, Dapeng Wu, Jianfeng Wang and Xiaodong He “Hierarchically Structured Reinforcement Learning for Topically Coherent Visual Story Generation”, Proc. American Association of Artificial Intelligence (AAAI), 2019. PDF (Spotlight)
-
2018
- Yizhe Zhang, Michel Galley, Jianfeng Gao, Zhe Gan, Xiujun Li, Chris Brockett and Bill Dolan “Generating Informative and Diverse Conversational Responses via Adversarial Information Maximization”, Neural Information Processing Systems (NeurIPS), 2018. PDF / Blog / Code
- Liqun Chen, Shuyang Dai, Chenyang Tao, Dinghan Shen, Zhe Gan, Haichao Zhang, Yizhe Zhang and Lawrence Carin “Adversarial Text Generation via Feature-Mover's Distance”, Neural Information Processing Systems (NeurIPS), 2018. PDF / Code
- Xinyuan Zhang, Ricardo Henao, Zhe Gan, Yitong Li and Lawrence Carin “Multi-Label Learning from Medical Plain Text with Convolutional Residual Models”, Machine Learning for Healthcare (MLHC), 2018. PDF (Spotlight)
- Yunchen Pu, Shuyang Dai, Zhe Gan, Weiyao Wang, Guoyin Wang, Yizhe Zhang, Ricardo Henao and Lawrence Carin “JointGAN: Multi-Domain Joint Distribution Learning with Generative Adversarial Nets”, Int. Conf. Machine Learning (ICML), 2018. PDF / Supp / Poster / Slides / Code
- Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang and Xiaodong He “AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks”, Computer Vision and Pattern Recognition (CVPR), 2018. PDF / Poster / Code / Blog
- Wenlin Wang, Zhe Gan, Wenqi Wang, Dinghan Shen, Jiaji Huang, Wei Ping, Sanjeev Satheesh and Lawrence Carin “Topic Compositional Neural Language Model”, Artificial Intelligence and Statistics (AISTATS), 2018. PDF
- Yunchen Pu, Martin Renqiang Min, Zhe Gan and Lawrence Carin “Adaptive Feature Abstraction for Translating Video to Text”, Proc. American Association of Artificial Intelligence (AAAI), 2018. PDF
-
2017
- Zhe Gan*, Liqun Chen*, Weiyao Wang, Yunchen Pu, Yizhe Zhang, Hao Liu, Chunyuan Li and Lawrence Carin “Triangle Generative Adversarial Networks”, Neural Information Processing Systems (NeurIPS), 2017. PDF / Poster / Code
- Yunchen Pu, Weiyao Wang, Ricardo Henao, Liqun Chen, Zhe Gan, Chunyuan Li and Lawrence Carin “Adversarial Symmetric Variational Autoencoder”, Neural Information Processing Systems (NeurIPS), 2017. PDF
- Yunchen Pu, Zhe Gan, Ricardo Henao, Chunyuan Li, Shaobo Han and Lawrence Carin “VAE Learning via Stein Variational Gradient Descent”, Neural Information Processing Systems (NeurIPS), 2017. PDF
- Yizhe Zhang, Dinghan Shen, Guoyin Wang, Zhe Gan, Ricardo Henao and Lawrence Carin “Deconvolutional Paragraph Representation Learning”, Neural Information Processing Systems (NeurIPS), 2017. PDF / Code
- Zhe Gan, Yunchen Pu, Ricardo Henao, Chunyuan Li, Xiaodong He and Lawrence Carin “Learning Generic Sentence Representations Using Convolutional Neural Networks”, Conf. on Empirical Methods in Natural Language Processing (EMNLP), 2017. PDF / Slides / Code (Oral)
- Yizhe Zhang, Zhe Gan, Kai Fan, Zhi Chen, Ricardo Henao, Dinghan Shen and Lawrence Carin “Adversarial Feature Matching for Text Generation”, Int. Conf. Machine Learning (ICML), 2017. PDF / Supp / Slides / Poster / Code
- Yizhe Zhang, Changyou Chen, Zhe Gan, Ricardo Henao and Lawrence Carin “Stochastic Gradient Monomial Gamma Sampler”, Int. Conf. Machine Learning (ICML), 2017. PDF / Supp / Slides / Poster / Code
- Zhe Gan*, Chunyuan Li*, Changyou Chen, Yunchen Pu, Qinliang Su and Lawrence Carin “Scalable Bayesian Learning of Recurrent Neural Networks for Language Modeling”, Association for Computational Linguistics (ACL), 2017. PDF / Supp / Slides / Code (Oral)
- Zhe Gan, Chuang Gan, Xiaodong He, Yunchen Pu, Kenneth Tran, Jianfeng Gao, Lawrence Carin and Li Deng “Semantic Compositional Networks for Visual Captioning”, Computer Vision and Pattern Recognition (CVPR), 2017. PDF / Slides / Slides2 / Poster / Poster2 / Video / Code (Spotlight)
- Chuang Gan, Zhe Gan, Xiaodong He, Jianfeng Gao and Li Deng “StyleNet: Generating Attractive Visual Captions with Styles”, Computer Vision and Pattern Recognition (CVPR), 2017. PDF / Data
- Zhe Gan, P.D. Singh, Ameet Joshi, Xiaodong He, Jianshu Chen, Jianfeng Gao and Li Deng “Character-level Deep Conflation for Business Data Analytics”, Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2017. PDF / Slides / Code
- Yin Xian, Yunchen Pu, Zhe Gan, Liang Lu and Andrew Thompson “Adaptive DCTNet for Audio Signal Classification”, Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2017. PDF
- Qinliang Su, Xuejun Liao, Chunyuan Li, Zhe Gan and Lawrence Carin “Unsupervised Learning with Truncated Gaussian Graphical Models”, Proc. Association for the Advancement of Artificial Intelligence (AAAI), 2017. PDF (Oral)
-
2016
- Yizhe Zhang, Zhe Gan and Lawrence Carin “Generating Text via Adversarial Training”, Workshop on Adversarial Training, NeurIPS 2016. PDF / Code
- Yin Xian, Yunchen Pu, Zhe Gan, Liang Lu and Andrew Thompson “Modified DCTNet for Audio Signals Classification”, Journal of the Acoustical Society of America, 2016. Link
- Yunchen Pu, Zhe Gan, Ricardo Henao, Xin Yuan, Chunyuan Li, Andrew Stevens and Lawrence Carin “Variational Autoencoder for Deep Learning of Images, Labels and Captions”, Neural Information Processing Systems (NeurIPS), 2016. PDF / Poster
- Jiaming Song, Zhe Gan and Lawrence Carin “Factored Temporal Sigmoid Belief Networks for Sequence Learning”, Int. Conf. Machine Learning (ICML), 2016. PDF / Supp / Poster / Slides
- Chunyuan Li, Andrew Stevens, Changyou Chen, Yunchen Pu, Zhe Gan and Lawrence Carin “Learning Weight Uncertainty with Stochastic Gradient MCMC for Shape Classification”, Computer Vision and Pattern Recognition (CVPR), 2016. PDF / Supp / Poster / Slides (Spotlight)
- Changyou Chen, David Carlson, Zhe Gan, Chunyuan Li and Lawrence Carin “Bridging the Gap between Stochastic Gradient MCMC and Stochastic Optimization”, Artificial Intelligence and Statistics (AISTATS), 2016. PDF / Poster / Slides / Code (Oral)
-
2015
- Ricardo Henao, Zhe Gan, James Lu and Lawrence Carin “Deep Poisson Factor Modeling”, Neural Information Processing Systems (NeurIPS), 2015. PDF / Supp / Poster / Code
- Zhe Gan, Chunyuan Li, Ricardo Henao, David Carlson and Lawrence Carin “Deep Temporal Sigmoid Belief Networks for Sequence Modeling”, Neural Information Processing Systems (NeurIPS), 2015. PDF / Poster / Slides / Code
- Zhe Gan, Changyou Chen, Ricardo Henao, David Carlson and Lawrence Carin “Scalable Deep Poisson Factor Analysis for Topic Modeling”, Int. Conf. Machine Learning (ICML), 2015. PDF / Supp / Poster / Slides / Code
- Zhe Gan, Ricardo Henao, David Carlson and Lawrence Carin “Learning Deep Sigmoid Belief Networks with Data Augmentation”, Artificial Intelligence and Statistics (AISTATS), 2015. PDF / Supp / Poster / Code
- Zhe Gan, Xin Yuan, Ricardo Henao, Ephraim Tsalik and Lawrence Carin “Inference of Gene Networks Associated with the Host Response to Infectious Disease”, Chapter 13 of the book “Big Data over Networks”, Cambridge University Press, in press. Link / PDF / Code
Tutorial and Workshop
- Chunyuan Li, Zhe Gan, Haotian Zhang, Jianwei Yang, Linjie Li, Zhengyuan Yang, Kevin Lin, Jianfeng Gao and Lijuan Wang “Recent Advances in Vision Foundation Models”, Computer Vision and Pattern Recognition (CVPR), 2024. Tutorial Website
- Linjie Li, Zhe Gan, Chunyuan Li, Jianwei Yang and Zhengyuan Yang “Recent Advances in Vision Foundation Models”, Computer Vision and Pattern Recognition (CVPR), 2023. Tutorial Website
- Zhe Gan, Linjie Li, Chunyuan Li, Jianwei Yang, Pengchuan Zhang, Lijuan Wang, Zicheng Liu and Jianfeng Gao “Recent Advances in Vision-and-Language Pre-training”, Computer Vision and Pattern Recognition (CVPR), 2022. Tutorial Website
- Man Luo, Tejas Gokhale, Zhiyuan Fang, Pratyay Banerjee, Yezhou Yang, Chitta Baral, Damien Teney, Zhe Gan, Kenneth Marino, Tianlu Wang and Somak Aditya “O-DRUM: Workshop on Open-Domain Retrieval Under a Multi-Modal Setting”, Computer Vision and Pattern Recognition (CVPR), 2022. Workshop Website
- Zhe Gan, Chunyuan Li, Jianwei Yang and Pengchuan Zhang “Microsoft Vision+Language Summer Talk Series”, 2021. MSR Website / YouTube / Bilibili
- Peter Anderson, Yoav Artzi, Zhe Gan, Xiaodong He, Linjie Li, Jingjing Liu, Xin (Eric) Wang, Qi Wu and Luowei Zhou “From VQA to VLN: Recent Advances in Vision-and-Language Research”, Computer Vision and Pattern Recognition (CVPR), 2021. Tutorial Website
- Zhe Gan, Licheng Yu, Yu Cheng, Luowei Zhou, Linjie Li, Yen-Chun Chen, Jingjing Liu and Xiaodong He “Recent Advances in Vision-and-Language Research”, Computer Vision and Pattern Recognition (CVPR), 2020. Tutorial Website
- Peter Knees and Zhe Gan “The ACM Multimedia 2020 Interactive Arts Exhibition”. Website
PhD Dissertation
- Zhe Gan "Deep Generative Models for Vision and Language Intelligence", Duke University. PDF
© December 2024 Zhe Gan