Modality-invariant and Specific Prompting for Multimodal Human Perception Understanding