Visual Instruction Tuning with Polite Flamingo


We introduce "Polite Flamingo", a multi-modal response rewriter that addresses the degeneration of response politeness during visual instruction tuning of multi-modal LLMs, a typical instance of the "multi-modal alignment tax".

ArXiv Preprint [arXiv]
Delong Chen
PhD Student at HKUST