Visual Instruction Tuning with Polite Flamingo

Abstract

We introduce "Polite Flamingo", a multi-modal response rewriter that addresses the degeneration of response politeness during visual instruction tuning of multi-modal LLMs, a typical instance of the "multi-modal alignment tax".

Publication
ArXiv Preprint [arXiv]
Delong Chen
陈德龙