I am a Researcher at Midea Group, AI Lab. My research interests primarily lie in the areas of model compression, multi-modal, and robotics.

I obtained my bachelor of science from the University of Toronto in 2020, majored in Statistics.

🔥 News

  • 2024.04: Our survey paper on Multi-modal safety is accepted at IJCAI Survey Track!
  • 2024.03: We release Mipha, a comprehensive overview of Multimodal Small Language Model. Our Mipha-3B outperforms LLaVA-1.5-13B on multiple benchmarks!
  • 2024.02:  🎉🎉 One papers on MLLM + Robotics have been accepted in CVPR 2024
  • 2024.02:  🎉🎉 Two papers on MLLM + Robotics have been accepted in ICRA 2024
  • 2024.01: :fire: We release a efficient multi-modal models called LLaVA-Phi, building upon Phi-2 and LLaVA.
  • 2024.01: Two papers accepted in AAAI 2024, one paper is presented as Oral (top 1.5%)
  • 2023.12: One paper accepted in ICASSP 2024 as Oral presentation (top 3%)
  • 2023.08: One paper accepted in ECML/PKDD 2024
  • 2023.06: Glad to annouce that our paper LogAnomaly is now the 4th most cited paper in IJCAI2019

đź“ť Selected Publications

† Equal Contributions, ‡ Corresponding Author

IJCAI 2024
sym

Safety of Multimodal Large Language Models on Image and Text, IJCAI 2024 Survey Track

Xin Liu, Yichen Zhu, Yunshi Lan, Chao Yang, Yu Qiao

CVPR 2024
sym

Retrieval-Augmented Embodied Agents, CVPR 2024

Yichen Zhu, Zhicai Ou, Xiaofeng Mou, Jian Tang

RAG + Robotics with cross-embodiment data.

Arxiv Jan 2024
sym

Mipha: A Comprehensive Overhaul of Multimodal Assistant with Small Language Models, Arxiv 2024

Minjie Zhu†, Yichen Zhu†, Xin Liu, Ning Liu, et al.

Analysis on how to make better multimodal small language models.

Arxiv Jan 2024
sym

LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model, Arxiv 2024

Yichen Zhu, Minjie Zhu, Zhicai Ou, Xiaofeng Mou, Jian Tang

An early version of multi-modal small language model under 3B.

Arxiv Jan 2024
sym

Query-Relevant Images Jailbreak Large Multi-Modal Models, Arxiv 2024

Xin Liu†, Yichen Zhu†, Yunshi Lan, Chao Yang, Yu Qiao

The first study on jailbreaking in large multi-modal models.

ICRA 2024
sym

Language-Conditioned Robotic Manipulation with Fast and Slow Thinking, ICRA 2024

Minjie Zhu†, Yichen Zhu†, Jinming Li, Junjie Wen, Zhiyuan Xu, Chaomin Shen, Yaxin Peng, Dong Liu, Feifei Feng, Jian Tang

ICRA 2024
sym

Object-Centric Instruction Augmentation for Robotic Manipulation, ICRA 2024

Junjie Wen†, Yichen Zhu†, Minjie Zhu, Jinming Li, Zhiyuan Xu, Chaomin Shen, Yaxin Peng, Dong Liu, Feifei Feng, Jian Tang

AAAI 2024, Oral
sym

Exploring Gradient Explosion in Generative Adversarial Imitation Learning: A Probabilistic Perspective, AAAI 2024, Oral (top 1.5%)

Wanying Wang†, Yichen Zhu†, Yirui Zhou, Chaomin Shen, Jian Tang, Zhiyuan Xu, Yaxin Peng, Yangchun Zhang

AAAI 2024
sym

Early Pruning with Self-Distillation for Efficient Model Compression, AAAI 2024

Chen Dong†, Ning Liu†, Yichen Zhu, Fachao Zhang, Dong Liu, Feifei Feng, Jian Tang

ICASSP 2024
sym

[When Training-Free NAS meets Vision Transformer: A Neural Tangent Kernel Perspective, ICASSP 2024, Oral (top 3%)]

Qiqi Zhou, Yichen Zhu‡

CVPR 2023
sym

ScaleKD: Distilling Scale-Aware Knowledge in Small Object Detector, CVPR 2023

Yichen Zhu, Qiqi Zhou, Ning Liu, Zhiyuan Xu, Zhicai Ou, Xiaofeng Mou, Jian Tang

NeurIPS 2023
sym

Teach Less, Learn More: On the Undistillable Classes in Knowledge Distillation, NeurIPS 2023, Spotlight (top 3%)

Yichen Zhu, Ning Liu, Zhiyuan Xu, Xin Liu, Weibin Meng, Louis Wang, Zhicai Ou, Jian Tang

ECCV 2022
sym

Label-guided auxiliary training improves 3d object detector, ECCV 2022

Yaomin Huang†, Xinmei Liu†, Yichen Zhu, Zhiyuan Xu, Chaomin Shen, Zhengping Che, Guixu Zhang, Yaxin Peng, Feifei Feng, Jian Tang