Task-Oriented Feature Compression for Multimodal Understanding via Device-Edge Co-Inference