Computer Vision

Few-shot Classification Guided by Generalization Error Bound

Recently, transfer learning has generated promising performance in few-shot classification by pre-training a backbone network on base …

Fan Liu, Sai Yang, Delong Chen, Huaxi Huang, Jun Zhou

Few-shot Classification Guided by Generalization Error Bound

Visual Instruction Tuning with Polite Flamingo

During visual instruction tuning of multi-modal LLM, we introduced a multi-modal response rewriter called “Polite Flamingo” to address the degeneration of response politness, which is a typical instance of the “multi-modal alignment tax”.

Delong Chen, Jianfeng Liu, Wenliang Dai, Baoyuan Wang. “Visual Instruction Tuning with Polite Flamingo”. In AAAI (2024).

Delong Chen, Jianfeng Liu, Wenliang Dai, Baoyuan Wang

RemoteCLIP: A Vision Language Foundation Model for Remote Sensing

We introduced RemoteCLIP, the first general-purpose vision-language foundation model for remote sensing. RemoteCLIP outperform previous image-text retrieval SoTA by 9.14% mean recall on RSICD dataset and by 8.92% on RSICD dataset. For zero-shot classification, our RemoteCLIP outperform CLIP baseline by up to 6.39% average accuracy on 12 downstream datasets.

Fan Liu, Delong Chen (joint first author), Qingyunguan Zhang et al. “RemoteCLIP: A Vision Language Foundation Model for Remote Sensing”. Arxiv Preprint (2023).

Fan Liu, Delong Chen, Zhangqingyun Guan, Xiaocong Zhou, Jiale Zhu, Jun Zhou

Few-shot Classification via Ensemble Learning with Multi-Order Statistics

We prove theoretically that leveraging ensemble learning on the base classes can correspondingly reduce the true error in the novel classes. Following this principle, a novel method named Ensemble Learning with Multi-Order Statistics (ELMOS) is proposed in this paper.

Sai Yang, Fan Liu, Delong Chen, Jun Zhou

MEP-3M: A Large-scale Multi-modal E-Commerce Products Dataset

We construct a large-scale Multi-modal E-commerce Products classification dataset MEP-3M, which consists of over 3 million products and 599 fine-grained product categories. Previsouly, the conference version of this paper won IJCAI 2021 LTDL Best Dataset Paper award.

Fan Liu, Delong Chen (joint first author), Xiaoyu Du, et al. “MEP-3M: A Large-scale Multi-modal E-Commerce Products Dataset”. In Pattern Recognition (2023).

Fan Liu, Delong Chen, Xiaoyu Du, Ruizhuo Gao, Feng Xu

A Review of Driver Fatigue Detection and Its Advances on the Use of RGB-D Camera and Deep Learning

In this review, we summarize the latest research findings and analyze the developmental trends of driver fatigue detection. We present the work on integration of RGB-D camera and deep learning, where Generative Adversarial Networks and multi-channel schemes are utilized to enhance the performance. [DOI]

Fan Liu, Delong Chen, Jun Zhou, Feng Xu

Deep Learning Based Single Sample Per Person Face Recognition: A Survey

We present a comprehensive survey on Single Sample Per Person (SSPP) Face Recognition. [DOI]

Fan Liu, Delong Chen, Fei Wang, Zewen Li, Feng Xu

ProtoCLIP: Prototypical Contrastive Language Image Pretraining

This paper proposed ProtoCLIP for improved representation grouping and enhanced robustness against modality gap in large-scale vision language pretraining. ProtoCLIP improved linear probing and zero-shot accuracy by 5.8% and 2.0%, and matched the performance of CLIP with 3×fewer epochs.

Delong Chen, Zhao Wu, Fan Liu, et al. “ProtoCLIP: Prototypical Contrastive Language Image Pretraining” In IEEE Transactions on Neural Networks and Learning Systems, TNNLS (2023).

Delong Chen, Zhao Wu, Fan Liu, Zaiquan Yang, Shaoqiu Zheng, Ying Tan, Erjin Zhou