Shengcao Cao

Aligning Large Multimodal Models with Factually Augmented RLHF

Align large multimodal models using RLHF augmented with factual information to reduce hallucination.

Zhiqing Sun, Sheng Shen, Shengcao Cao, Haotian Liu, Chunyuan Li, Yikang Shen, Chuang Gan, Liang-Yan Gui, Yu-Xiong Wang, Yiming Yang, Kurt Keutzer, Trevor Darrell

TAMM: TriAdapter Multi-Modal Learning for 3D Shape Understanding

Propose a multi-modal learning framework for 3D shapes with two learning stages and three unified adapter modules.

Zhihao Zhang, Shengcao Cao, Yu-Xiong Wang

SOHES: Self-supervised Open-world Hierarchical Entity Segmentation

Learn to segment visual entities and their parts in an open world with pure self-supervision.

Shengcao Cao, Jiuxiang Gu, Jason Kuen, Hao Tan, Ruiyi Zhang, Handong Zhao, Ani Nenkova, Liang-Yan Gui, Tong Sun, Yu-Xiong Wang

HASSOD: Hierarchical Adaptive Self-Supervised Object Detection

Develop a fully self-supervised approach to object detection that understands part-to-whole composition and requires much lower training cost.

Shengcao Cao, Dhiraj Joshi, Liang-Yan Gui, Yu-Xiong Wang

Learning Lightweight Object Detectors via Multi-Teacher Progressive Distillation

Distill knowledge from an automatically designed sequence of teachers into a lightweight student object detector to mitigate the teacher-student capacity gap.

Shengcao Cao, Mengtian Li, James Hays, Deva Ramanan, Yu-Xiong Wang, Liang-Yan Gui

Contrastive Mean Teacher for Domain Adaptive Object Detectors

Unify contrastive learning for representation learning and Mean Teacher for domain adaptation into one general-purpose framework for learning unsupervised domain adaptive object detectors.

Shengcao Cao, Dhiraj Joshi, Liang-Yan Gui, Yu-Xiong Wang

Rethinking Transformer-Based Set Prediction for Object Detection

Analyze the slow convergence of DEtection TRansformer (DETR) and propose alternative solutions that improve its convergence and performance.

Zhiqing Sun, Shengcao Cao, Yiming Yang, Kris M. Kitani

Neighborhood-Aware Neural Architecture Search

Propose a novel neural architecture search formulation that encourages flatness in the architecture space, improving generalization of the searched architectures.

Xiaofang Wang, Shengcao Cao, Mengtian Li, Kris M. Kitani

Learnable Embedding Space for Efficient Neural Architecture Compression

Search for compressed network architectures in a learnable embedding space via Bayesian optimization, and find compressed networks that outperform hand-crafted architectures.

Shengcao Cao, Xiaofang Wang, Kris M. Kitani