Frame-to-Utterance Convergence: A Spectra-Temporal Approach for Unified Spoofing Detection