DMSC: Dynamic Multi-Scale Coordination Framework for Time Series Forecasting


Abstract

Time Series Forecasting (TSF) faces persistent challenges in modeling intricate temporal dependencies across different scales. Despite recent advances leveraging diverse decomposition operations and novel architectures based on CNN, MLP or Transformer, existing methods still struggle with static decomposition strategies, fragmented dependency modeling, and inflexible fusion mechanisms, limiting their ability to model intricate temporal dependencies. To address these three problems, we propose a novel Dynamic Multi-Scale Coordination Framework (DMSC) with an Embedded Multi-Scale Patch Decomposition block (EMPD), a Triad Interaction Block (TIB) and an Adaptive Scale Routing MoE block (ASR-MoE). Specifically, EMPD is designed as a built-in component that dynamically segments sequences into hierarchical patches with exponentially scaled granularities, eliminating predefined scale constraints through input-adaptive patch adjustment. TIB then jointly models intra-patch, inter-patch, and cross-variable dependencies within each layer’s decomposed representations. EMPD and TIB are jointly integrated into layers forming a multi-layer progressive cascade architecture, where coarse-grained representations from earlier layers adaptively guide fine-grained feature extraction in subsequent layers via gated pathways. ASR-MoE then dynamically fuses multi-scale predictions by leveraging specialized global and local experts with temporal-aware weighting. Comprehensive experiments on thirteen real-world benchmarks demonstrate that DMSC consistently achieves state-of-the-art (SOTA) performance and superior computational efficiency on TSF tasks.

1 Introduction

Time Series Forecasting (TSF) constitutes a pivotal capability across numerous fields such as energy consumption [1], [2], healthcare monitoring [3], [4], transportation scheduling [5], [6], weather forecasting [7], [8] and economics [9], [10]. Time series data exhibit intricate temporal dependencies, which include nonlinear relationships, multi-scale patterns (e.g., trends and seasonality), and dynamic variable couplings [11]. Due to the inherent non-stationarity [12] and complex interconnections [13], traditional approaches often fail to adequately capture the underlying patterns in time series data, making effective dependency modeling a critical challenge for enhancing predictive accuracy [14]. In recent years, deep learning has demonstrated a strong capability to capture complex dependency relationships in time series data, making it a formidable tool for TSF [15].

However, intricate dependencies in time series inherently manifest across multiple scales: coarser granularities typically encapsulate long-term trends while finer resolutions capture short-term fluctuations and periodic patterns. Conventional single-scale modeling approaches often fail to balance fine-grained local details and holistic global trends [16]. To effectively model multi-scale temporal dependencies, recent research has pioneered multi-scale modeling frameworks that concurrently extract global and local patterns, thereby mitigating the representational constraints inherent in single-scale approaches. However, previous implementations often employ moving-average downsampling [17], [18], which remains insufficient for capturing local semantic information. In contrast, more effective alternatives focus on patch-wise and variable-wise decomposition operations that decompose sequences across feature and temporal dimensions. Such decomposition operations better preserve both global trends and local semantics, and multi-length patches inherently facilitate the learning of multi-scale representations. Critically, prevailing methods suffer from two limitations: 1) reliance on fixed-scale decomposition strategies that lack dynamic scale adaptation; 2) fragmented processing of temporal dependencies and cross-variable interactions, which compromises holistic dependency modeling.

The primary advantage of multi-scale feature extraction lies in its capacity to mine complementary information across diverse temporal granularities and hierarchical levels, thereby overcoming the limitations of single-scale modeling. Following feature capture, the effective design of prediction heads for multi-scale fusion is equally important. Conventional approaches typically employ linear projection layers [19], additive combinations of multiple projections [17], or spectral amplitude-weighted aggregation [20]. However, these simplistic fusion methods exhibit fundamental inadequacies: they fail to leverage heterogeneous dominance patterns across multiple scales, disregard dynamic inter-scale importance, and incur quadratic growth of model parameters with increasing input lengths and scale counts, thereby severely compromising deployment efficiency [11].

To effectively model complex temporal dependencies, which inherently manifest across multiple patterns and scales, we propose the Dynamic Multi-Scale Coordination framework (DMSC) for TSF. This framework dynamically decomposes time series and extracts features across scales. It effectively captures intricate dependencies across temporal resolutions and cross-variable interactions. Meanwhile, it generates adaptive predictions based on learned representations via temporal-aware fusion. Specifically, DMSC processes intricate dependencies through a multi-layer progressive architecture and an Adaptive Scale Routing Mixture-of-Experts (ASR-MoE). The multi-layer progressive cascade architecture incorporates the Embedded Multi-Scale Patch Decomposition block (EMPD) and the Triad Interaction Block (TIB), in which EMPD transforms the input into 3D representations at varying granularities and TIB models heterogeneous dependencies within each layer. In this architecture, coarse-grained representations from shallow layers adaptively guide fine-grained feature extraction in subsequent layers via gated pathways. After multi-scale feature extraction, ASR-MoE performs temporal-aware weighted dynamic fusion for multi-scale prediction, thus enabling adaptive integration of cross-scale features to enhance forecasting accuracy. DMSC formulates a unified framework that incorporates dynamic patch decomposition, deeper models, and sparse MoE principles to achieve full-spectrum multi-scale coordination. This dynamic synergy enables the DMSC framework to achieve SOTA performance across 13 real-world benchmarks while maintaining high efficiency and low computational cost. Our contributions are as follows:

  • A novel multi-layer progressive cascade architecture is designed to jointly integrate the Embedded Multi-Scale Patch Decomposition block (EMPD) and the Triad Interaction Block (TIB) across layers. Unlike fixed-scale approaches, EMPD dynamically adjusts patch granularities with a lightweight network based on the temporal characteristics of the input data, and TIB jointly models intra-patch, inter-patch, and cross-variable dependencies through gated feature fusion to form a coarse-to-fine feature pyramid, where coarse-grained representations from earlier layers adaptively guide fine-grained extraction in subsequent layers via gated residual pathways.

  • The Adaptive Scale Routing Mixture-of-Experts (ASR-MoE) is proposed to resolve static fusion limitations in prediction. It establishes a hierarchical expert architecture to explicitly decouple long-term and short-term dependencies, in which global-shared experts capture common long-term temporal dependencies, while local-specialized experts model diverse short-term variations. Significantly, a temporal-aware weighting aggregator is designed to dynamically compute scale-specific prediction contributions with historical memory.

  • Extensive experiments demonstrate that DMSC achieves SOTA performance and superior efficiency across multiple TSF benchmarks.

2 Related Work

2.1 Time Series Forecasting Models

Existing deep models for TSF tasks can be broadly categorized into MLP-based, CNN-based, and Transformer-based architectures [21]. MLP-based models [22]–[24] typically capture temporal dependencies through predefined decomposition and fully-connected layers. While these models demonstrate efficiency in capturing intra-series patterns, they exhibit notable limitations in modeling long-term dependencies and complex inter-series relationships. In contrast, CNN-based models [25]–[27] leverage different convolutions to effectively capture local dependencies. Although they remain constrained in modeling global relationships, recent efforts have mitigated this through large-kernel convolutions [28] and frequency-domain analysis [20]. Meanwhile, Transformer-based models [29]–[31] have emerged as a powerful paradigm, leveraging self-attention mechanisms [32] to effectively capture long-range dependencies and persistent temporal relationships. However, Transformer-based models have faced substantial criticism in recent research for their permutation invariance and quadratic computational complexity [33], [34]. Despite significant progress, most existing deep architectures rely on single or fixed decomposition strategies and often model dependencies in a fragmented manner, lacking mechanisms for dynamic, adaptive multi-scale decomposition and coordinated dependency modeling.

2.2 Decomposition Operations in TSF

To effectively capture intricate temporal dependencies, numerous studies have adopted a multi-scale modeling perspective, employing diverse decomposition operations on time series. TimesNet [20] learns frequency-adaptive periods in the frequency domain and transforms series into 2D tensors, explicitly modeling intra-period and inter-period variations. TimeMixer [17] decomposes sequences into seasonal and trend components [30], then applies mixing strategies at different granularities to integrate multi-scale features. However, simple moving averages inadequately preserve local semantic patterns. Diverging from these operations, several methods focus on variable-wise and patch-wise disentanglement. PatchTST [35] segments series into subseries-level patches as input tokens and employs channel independence to capture local semantics. iTransformer [36] embeds entire series as single tokens to model extended global representations. TimeXer [37] hierarchically represents variables through dual disentanglement: variable-wise for cross-channel interactions and patch-wise for endogenous variables. However, these decomposition techniques are constrained by static or predefined scale settings, hindering their adaptability to complex temporal patterns.

Figure 1: The framework of DMSC. EMPD dynamically segments input series into hierarchical patches, TIB jointly models three dependencies, and ASR-MoE fuses multi-scale predictions adaptively.

2.3 Multi-Scale Prediction Operations in TSF

Beyond multi-scale decomposition operations, researchers have also investigated multi-scale operations in prediction heads. TimeMixer [17] employs dedicated projectors to generate scale-specific predictions. Diverging from simplistic additive aggregation, TimeMixer++ [16] utilizes frequency-domain amplitude analysis to perform weighted aggregation of projector outputs. TimeKAN [18] adopts frequency transformation and padding operations to unify multi-scale representations into identical dimensions, thereby enabling holistic scale integration. Current multi-scale fusion mechanisms predominantly rely on static weighting or fixed designs. They fail to dynamically prioritize the importance of different scales with intricate temporal dependencies. Different from these fragmented approaches, our proposed DMSC framework explores full-spectrum multi-scale coordination across embedding, extraction, and prediction stages, achieving both superior forecasting accuracy and high computational efficiency.

3 Dynamic Multi-Scale Coordination Framework

Time series forecasting addresses the fundamental challenge of predicting future values \(\mathbf{Y} = \{y_{t+1}, \ldots, y_{t+h}\} \in \mathbb{R}^{h\times{C}}\) from historical observations \(\mathbf{X} = \{x_1, \ldots, x_t\} \in \mathbb{R}^{t\times{C}}\), where \(h\) is the prediction horizon, \(t\) is the history horizon, and \(C\) is the number of variables. To model the complex multi-scale dependencies inherent in real-world time series, we propose the Dynamic Multi-Scale Coordination (DMSC) framework. As illustrated in Fig. 1, DMSC establishes full-spectrum multi-scale coordination across three key stages: embedding, feature extraction, and prediction, realized through the EMPD, TIB and ASR-MoE components.

3.1 Multi-Layer Progressive Cascade Architecture

The multi-layer progressive cascade architecture facilitates hierarchical feature learning through stacked EMPD-TIB units, forming a dynamic feature pyramid. Within this pyramid, coarse-grained representations from earlier layers adaptively guide fine-grained feature extraction in subsequent layers via gated pathways. Formally, given input \(\mathbf{X} \in \mathbb{R}^{C \times L}\), the \(l\)-th layer processes:

\[\label{eq:Layer32a} \mathcal{T} = \{\mathbf{F}_1, \mathbf{F}_2, \ldots, \mathbf{F}_l\},\tag{1}\] \[\label{eq:Layer32b} \mathbf{F}_l = \boldsymbol{TIB}_l(\mathbf{Z}_l),\tag{2}\] \[\label{eq:Layer32c} \mathbf{Z}_l = \boldsymbol{EMPD}_l(\mathcal{G}_l(\mathbf{F}_{l-1}) + \mathbf{X}),\tag{3}\] where \(\mathcal{T}\) denotes the set of hierarchical features generated by this architecture for ASR-MoE, with each \(\mathbf{F}_l \in \mathbb{R}^{C \times D}\) representing the output of the \(l\)-th layer. Here, \(\mathcal{G}_l\) represents a gated projection matrix that adaptively modulates the residual information flow. This cascade flow enables progressive refinement of multi-scale representations, where \(\mathbf{F}_{l-1}\) adaptively guides the computation of \(\mathbf{Z}_l\) with residual connections, and \(\boldsymbol{TIB}_l\) iteratively enhances representations by jointly modeling intra-patch, inter-patch, and cross-variable dependencies. By combining dynamic patch decomposition (EMPD) with triadic dependency modeling (TIB), our cascade architecture achieves input-aware progressive refinement of temporal granularities. This integration enables coherent modeling of both short-term dynamics and long-range trends while maintaining parameter efficiency.
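To make the cascade flow concrete, the following PyTorch sketch traces Eqs. (1)–(3) under simplifying assumptions: the EMPD and TIB bodies are replaced by linear placeholders (their actual designs are given in Secs. 3.2–3.3), the gated projection \(\mathcal{G}_l\) is realized as a sigmoid-gated linear map from \(D\) back to \(L\), and module names are illustrative rather than the released implementation.

```python
import torch
import torch.nn as nn

class CascadeLayer(nn.Module):
    """One EMPD-TIB unit; the EMPD/TIB bodies here are placeholders for the blocks of Secs. 3.2-3.3."""
    def __init__(self, seq_len: int, d_model: int):
        super().__init__()
        self.empd = nn.Linear(seq_len, d_model)   # stands in for EMPD_l (patching + projection)
        self.tib = nn.Linear(d_model, d_model)    # stands in for TIB_l (triad interaction modeling)
        self.gate = nn.Linear(d_model, seq_len)   # gated projection G_l from F_{l-1} back to length L

    def forward(self, x, prev):                   # x: [B, C, L], prev: [B, C, D] or None
        guided = x if prev is None else x + torch.sigmoid(self.gate(prev))  # Eq. (3): gated residual guidance
        z = self.empd(guided)                     # Z_l
        return self.tib(z)                        # F_l, Eq. (2)

class ProgressiveCascade(nn.Module):
    """Stacks EMPD-TIB units and collects the hierarchical feature set T (Eq. 1) for ASR-MoE."""
    def __init__(self, num_layers: int, seq_len: int, d_model: int):
        super().__init__()
        self.layers = nn.ModuleList(CascadeLayer(seq_len, d_model) for _ in range(num_layers))

    def forward(self, x):                         # x: [B, C, L]
        feats, prev = [], None
        for layer in self.layers:
            prev = layer(x, prev)                 # coarse-to-fine refinement guided by the previous layer
            feats.append(prev)
        return feats                              # T = {F_1, ..., F_l}

# usage: ProgressiveCascade(3, 96, 128)(torch.randn(8, 7, 96)) -> list of three [8, 7, 128] tensors
```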

3.2 Embedded Multi-Scale Patch Decomposition

Patch decomposition serves as a fundamental operation that transforms time series into structured representations, preserving local semantics while enabling hierarchical pattern discovery. Unlike conventional single or fixed-scale approaches, the EMPD block introduces an adaptive and hierarchical decomposition mechanism as a built-in component, dynamically adjusting patch granularities based on sequence characteristics. This approach eliminates rigid predefined patch configurations while natively integrating multi-scale decomposition into our framework.

Specifically, given an input time series tensor \(\mathbf{X} \in \mathbb{R}^{C\times L}\) (where \(C\) denotes the number of variates and \(L\) the sequence length), EMPD first computes a scale factor \(\alpha\) through a lightweight neural network: \[\label{eq:EMPD32factor} \alpha = \mathcal{N}_{\theta}(\Phi_{\mathrm{GAP}}(\mathbf{X})),\tag{4}\] where \(\mathcal{N}_{\theta}\) denotes a lightweight MLP with sigmoid activation, and \(\Phi_{\mathrm{GAP}}\) denotes a global average pooling operation that compresses the temporal dimension. The factor \(\alpha\in[0, 1]\) dynamically determines the base patch length, which subsequently undergoes exponential decay across layers as follows: \[\label{eq:EMPD32patch1} \mathbf{P}_{\text{base}} = [\mathbf{P}_\text{min} + \alpha\cdot(\mathbf{P}_\text{max} - \mathbf{P}_\text{min})],\tag{5}\] \[\label{eq:EMPD32patch2} \mathbf{P}_{l} = \mathrm{max}(\mathbf{P}_\text{min}, [\mathbf{P}_\text{base}/\tau^l]),\tag{6}\] where \(\mathbf{P}_\text{max}\) and \(\mathbf{P}_\text{min}\) denote the maximum and minimum patch bounds, respectively, and \(\mathbf{P}_{l}\) represents the patch length at the \(l\)-th layer. This design ensures that shallow layers process coarse-grained dependencies while deeper layers capture fine-grained ones. EMPD then applies replication padding to mitigate boundary effects, followed by a hierarchical patch unfolding operation: \[\label{eq:EMPD32unfold} \mathbf{X}_p^l = \mathrm{Unfold}_{P_l, S_l}(\mathrm{Padding}_{S_l}(\mathbf{X})),\tag{7}\] where \(\mathrm{Padding}_{S_l}(\cdot)\) denotes the replication padding operation with stride \(S_l = P_l / 2\) and \(\mathrm{Unfold}_{P_l, S_l}\) unfolds the padded sequence into patches of length \(P_l\) using a stride of \(S_l\). This operation produces a 3D patch tensor \(\mathbf{X}_p^l \in \mathbb{R}^{C\times{N_l}\times{P_l}}\). Finally, EMPD projects \(\mathbf{X}_p^l\) into a unified embedding space via a linear projection: \[\label{eq:EMPD32proj} \mathbf{Z}^l = \mathrm{Projector}(\mathbf{X}_p^l).\tag{8}\] This direct linear projection efficiently maps patch-wise temporal features into a unified embedding space, avoiding redundant flattening operations while preserving the hierarchical structure of the multi-scale patches.
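A minimal sketch of EMPD's data-dependent patching (Eqs. 4–8) is given below. Several choices are assumptions made for brevity rather than the paper's exact design: the scale network pools the input to a single scalar per sample and averages \(\alpha\) over the batch, the bounds and MLP width are illustrative, and variable-length patches are zero-padded to \(P_\text{max}\) so that one shared projector serves all patch lengths.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EMPD(nn.Module):
    """Sketch of Embedded Multi-Scale Patch Decomposition (Eqs. 4-8)."""
    def __init__(self, layer: int, d_model: int, p_min: int = 4, p_max: int = 48, tau: int = 2):
        super().__init__()
        self.layer, self.p_min, self.p_max, self.tau = layer, p_min, p_max, tau
        # Eq. (4): lightweight MLP with sigmoid on globally average-pooled input
        self.scale_net = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
        self.projector = nn.Linear(p_max, d_model)    # Eq. (8): shared projection to the embedding space

    def forward(self, x):                             # x: [B, C, L]
        alpha = self.scale_net(x.mean(dim=(1, 2)).unsqueeze(-1)).mean()   # scale factor in [0, 1]
        # Eqs. (5)-(6): base patch length, then exponential decay across layers
        p_base = int(self.p_min + alpha.item() * (self.p_max - self.p_min))
        p_l = max(self.p_min, p_base // self.tau ** self.layer)
        s_l = max(1, p_l // 2)                        # stride S_l = P_l / 2
        # Eq. (7): replication padding, then unfolding into overlapping patches
        x_pad = F.pad(x, (0, s_l), mode="replicate")
        patches = x_pad.unfold(dimension=-1, size=p_l, step=s_l)          # [B, C, N_l, P_l]
        patches = F.pad(patches, (0, self.p_max - p_l))                   # pad patches to a fixed width (assumption)
        return self.projector(patches)                                    # Z^l: [B, C, N_l, D]

# usage: EMPD(layer=0, d_model=128)(torch.randn(8, 7, 96)).shape -> torch.Size([8, 7, N_0, 128])
```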

3.3 Triad Interaction Block

The Triad Interaction Block (TIB) is designed to model heterogeneous dependencies within the multi-scale patch representations generated by EMPD, through the joint capture of intra-patch, inter-patch, and cross-variable interactions. Given the input tensor \(\mathbf{Z}^l \in\mathbb{R}^{C\times{N_l}\times{D}}\), TIB integrates these three complementary dependency types within a coherent framework. Specifically, TIB first processes intra-patch dependencies using depth-wise separable convolutions through the operation \(\mathbf{\zeta}_\text{intra}\) to capture fine-grained local patterns: \[\label{eq:TIB32intra} \mathbf{F}_{\text{intra}}^l = \mathbf{\zeta}_\text{intra}(\mathbf{Z}^l),\tag{9}\] where \(\mathbf{\zeta}_\text{intra}\) comprises a depth-wise convolution and a \(\mathrm{Conv1d}\) to capture local temporal contexts and project features back to the embedding space. This operation preserves intra-patch continuity while effectively extracting local representations.

To capture inter-patch dynamics, TIB then applies \(\mathbf{\zeta}_\text{inter}\), which employs dilated convolutions with adaptive pooling, to \(\mathbf{F}_{\text{intra}}^l\): \[\label{eq:TIB32inter} \mathbf{F}_{\text{inter}}^l = \mathbf{\zeta}_\text{inter}(\mathbf{F}_{\text{intra}}^l),\tag{10}\] where \(\mathbf{\zeta}_\text{inter}\) consists of dilated convolution and adaptive pooling operations, capturing broader temporal contexts and patch-level information without increasing computational cost.

TIB then models cross-variable dependencies by generating adaptive weighting coefficients \(\mathbf{G}^l\in \mathbb{R}^{{C}\times{1}}\) as follows: \[\label{eq:TIB32inter1} \mathbf{G}^l = \sigma\left(\mathrm{MLP}(\bar{\mathbf{F}}_{\text{inter}}^l)\right),\tag{11}\] \[\label{eq:TIB32inter2} \mathbf{F}_{\text{cross}}^l = \mathbf{G}^l \odot \mathbf{F}_{\text{inter}}^l,\tag{12}\] where \(\bar{\mathbf{F}}_{\text{inter}}^l\) denotes the global average of \(\mathbf{F}_{\text{inter}}^l\), and the sigmoid function \(\sigma\) acts as a feature-specific gating mechanism. \(\mathbf{G}^l\) is applied element-wise to \(\mathbf{F}_{\text{inter}}^l\) via the Hadamard product \(\odot\), adaptively scaling each feature’s contribution based on its global relevance and enabling context-aware cross-variable interactions.

Finally, the three dependency representations are adaptively fused using learned gating weights, and the resulting features are integrated with a residual connection and normalization: \[\label{eq:TIB32fuse} \mathbf{F}^l_{\text{fused}} = \sum_{i=1}^{3}\mathbf{G}^l_i\odot\mathbf{F}^l_i,\tag{13}\] \[\label{eq:TIB32out} \mathbf{F}^l_{\text{output}} = \mathrm{LayerNorm}(\mathbf{F}^l_{\text{fused}} + \mathbf{Z}^l),\tag{14}\] where \(\mathbf{G}^l_i\) represents the learned gating weight for the \(i\)-th representation \(\mathbf{F}^l_i \in \{\mathbf{F}_{\text{intra}}^l, \mathbf{F}_{\text{inter}}^l, \mathbf{F}_{\text{cross}}^l\}\). This architecture enables TIB to dynamically balance the contributions of different dependency types based on the temporal characteristics of the input data. Consequently, it effectively captures intricate dependencies and provides rich temporal features for multi-scale prediction.
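The sketch below mirrors the triad interactions of Eqs. (9)–(14) in simplified form: the intra-patch branch uses a depth-wise plus point-wise 1-D convolution along the patch axis, the inter-patch branch uses a length-preserving dilated convolution (the adaptive pooling step is omitted), and the fusion gates \(\mathbf{G}^l_i\) are reduced to a learned softmax over the three branches. Kernel sizes and layer widths are assumptions, not the paper's exact settings.

```python
import torch
import torch.nn as nn

class TIB(nn.Module):
    """Sketch of the Triad Interaction Block (Eqs. 9-14); kernel sizes and the fusion gate are simplified."""
    def __init__(self, d_model: int):
        super().__init__()
        # Intra-patch branch: depth-wise + point-wise convolution along the patch axis (Eq. 9)
        self.intra = nn.Sequential(
            nn.Conv1d(d_model, d_model, kernel_size=3, padding=1, groups=d_model),
            nn.Conv1d(d_model, d_model, kernel_size=1))
        # Inter-patch branch: length-preserving dilated convolution (Eq. 10)
        self.inter = nn.Conv1d(d_model, d_model, kernel_size=3, padding=2, dilation=2)
        # Cross-variable gate from globally averaged inter-patch features (Eq. 11)
        self.var_gate = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                      nn.Linear(d_model, 1), nn.Sigmoid())
        self.fuse_gate = nn.Parameter(torch.ones(3))   # gating weights over the three branches (Eq. 13)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, z):                              # z: [B, C, N, D]
        b, c, n, d = z.shape
        flat = z.reshape(b * c, n, d).transpose(1, 2)  # [B*C, D, N] for 1-D convolutions over patches
        f_intra = self.intra(flat)                     # fine-grained intra-patch patterns
        f_inter = self.inter(f_intra)                  # broader inter-patch dynamics
        f_intra = f_intra.transpose(1, 2).reshape(b, c, n, d)
        f_inter = f_inter.transpose(1, 2).reshape(b, c, n, d)
        gate = self.var_gate(f_inter.mean(dim=2)).unsqueeze(-1)   # G^l: per-variable gate, Eq. (11)
        f_cross = gate * f_inter                       # Eq. (12): Hadamard-scaled cross-variable features
        w = torch.softmax(self.fuse_gate, dim=0)
        fused = w[0] * f_intra + w[1] * f_inter + w[2] * f_cross  # Eq. (13)
        return self.norm(fused + z)                    # Eq. (14): residual connection + LayerNorm

# usage: TIB(128)(torch.randn(8, 7, 12, 128)).shape -> torch.Size([8, 7, 12, 128])
```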


Table 1: Long-term forecasting results. All the results are averaged from 4 different prediction lengths {96, 192, 336, 720}, and the look-back length is fixed to 96 for all baselines. A lower MSE or MAE indicates a better prediction, with the best in boldface and second in underline.
Models DMSC(Ours) TimeMixer iTransformer PatchTST Dlinear TimesNet Autoformer TimeXer PatchMLP TimeKAN AMD
Metric MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE
ETTh1 \(\boldsymbol{0.415}\) \(\boldsymbol{0.426}\) \(0.456\) \(0.444\) \(0.452\) \(0.446\) \(0.445\) \(0.445\) \(0.461\) \(0.457\) \(0.479\) \(0.466\) \(0.670\) \(0.564\) \(0.460\) \(0.452\) \(0.458\) \(0.445\) \(\underline{0.429}\) \(\underline{0.432}\) \(0.452\) \(0.439\)
ETTh2 \(\boldsymbol{0.355}\) \(\boldsymbol{0.383}\) \(\underline{0.372}\) \(\underline{0.400}\) \(0.384\) \(0.407\) \(0.383\) \(0.412\) \(0.563\) \(0.519\) \(0.411\) \(0.421\) \(0.488\) \(0.494\) \(0.374\) \(0.404\) \(0.406\) \(0.422\) \(0.389\) \(0.409\) \(0.616\) \(0.558\)
ETTm1 \(\boldsymbol{0.368}\) \(\boldsymbol{0.383}\) \(0.384\) \(0.398\) \(0.408\) \(0.412\) \(0.386\) \(0.400\) \(0.404\) \(0.408\) \(0.418\) \(0.418\) \(0.636\) \(0.534\) \(0.391\) \(0.395\) \(0.387\) \(0.398\) \(\underline{0.376}\) \(0.397\) \(0.394\) \(\underline{0.396}\)
ETTm2 \(\boldsymbol{0.268}\) \(\boldsymbol{0.317}\) \(0.278\) \(0.324\) \(0.292\) \(0.336\) \(0.288\) \(0.334\) \(0.354\) \(0.402\) \(0.291\) \(0.330\) \(0.502\) \(0.452\) \(\underline{0.277}\) \(\underline{0.323}\) \(0.287\) \(0.329\) \(0.282\) \(0.330\) \(0.288\) \(0.332\)
Electricity \(\boldsymbol{0.170}\) \(\boldsymbol{0.258}\) \(0.190\) \(0.280\) \(\underline{0.176}\) \(\underline{0.265}\) \(0.205\) \(0.295\) \(0.226\) \(0.319\) \(0.194\) \(0.293\) \(0.492\) \(0.523\) \(0.201\) \(0.275\) \(0.200\) \(0.296\) \(0.201\) \(0.290\) \(0.206\) \(0.287\)
Exchange \(\boldsymbol{0.336}\) \(\boldsymbol{0.391}\) \(0.359\) \(\underline{0.401}\) \(0.362\) \(0.406\) \(0.389\) \(0.417\) \(\underline{0.339}\) \(0.414\) \(0.430\) \(0.446\) \(0.539\) \(0.519\) \(0.398\) \(0.422\) \(0.383\) \(0.417\) \(0.384\) \(0.414\) \(0.366\) \(0.408\)
Weather \(\boldsymbol{0.233}\) \(\boldsymbol{0.269}\) \(0.245\) \(0.274\) \(0.261\) \(0.281\) \(0.258\) \(0.279\) \(0.265\) \(0.316\) \(0.256\) \(0.283\) \(0.348\) \(0.382\) \(\underline{0.242}\) \(\underline{0.272}\) \(0.252\) \(0.277\) \(0.244\) \(0.274\) \(0.271\) \(0.291\)
Traffic \(\boldsymbol{0.407}\) \(\boldsymbol{0.274}\) \(0.509\) \(0.307\) \(\underline{0.422}\) \(\underline{0.283}\) \(0.482\) \(0.309\) \(0.688\) \(0.428\) \(0.633\) \(0.334\) \(1.028\) \(0.605\) \(0.466\) \(0.287\) \(0.539\) \(0.364\) \(0.590\) \(0.373\) \(0.547\) \(0.345\)
Solar \(\boldsymbol{0.213}\) \(\boldsymbol{0.258}\) \(0.228\) \(0.279\) \(0.238\) \(\underline{0.263}\) \(0.249\) \(0.292\) \(0.330\) \(0.401\) \(0.268\) \(0.286\) \(0.727\) \(0.634\) \(\underline{0.226}\) \(0.269\) \(0.298\) \(0.301\) \(0.289\) \(0.321\) \(0.363\) \(0.337\)


Table 2: Short-term forecasting results. All the results are averaged from 4 different prediction lengths {12, 24, 48, 96}, and the look-back length is fixed to 96 for all baselines.
Models DMSC(Ours) TimeMixer iTransformer PatchTST Dlinear TimesNet Autoformer TimeXer PatchMLP TimeKAN AMD
Metric MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE
PEMS03 \(\boldsymbol{0.131}\) \(\boldsymbol{0.241}\) \(0.190\) \(0.288\) \(0.261\) \(0.328\) \(0.308\) \(0.368\) \(0.279\) \(0.377\) \(\underline{0.154}\) \(\underline{0.254}\) \(0.513\) \(0.517\) \(0.180\) \(0.280\) \(0.220\) \(0.289\) \(0.283\) \(0.357\) \(0.281\) \(0.368\)
PEMS04 \(\boldsymbol{0.129}\) \(\boldsymbol{0.244}\) \(0.230\) \(0.324\) \(0.913\) \(0.801\) \(0.365\) \(0.405\) \(0.295\) \(0.389\) \(\underline{0.138}\) \(\underline{0.250}\) \(0.472\) \(0.495\) \(0.327\) \(0.418\) \(0.203\) \(0.305\) \(0.297\) \(0.372\) \(0.327\) \(0.397\)
PEMS07 \(\underline{0.092}\) \(\underline{0.193}\) \(0.165\) \(0.256\) \(0.109\) \(0.214\) \(0.438\) \(0.422\) \(0.329\) \(0.394\) \(0.109\) \(0.216\) \(0.418\) \(0.466\) \(\boldsymbol{0.088}\) \(\boldsymbol{0.192}\) \(0.166\) \(0.262\) \(0.264\) \(0.346\) \(0.196\) \(0.318\)
PEMS08 \(\boldsymbol{0.162}\) \(\boldsymbol{0.228}\) \(0.264\) \(0.332\) \(0.199\) \(0.278\) \(0.347\) \(0.390\) \(0.403\) \(0.444\) \(\underline{0.198}\) \(\underline{0.236}\) \(0.603\) \(0.541\) \(0.206\) \(0.248\) \(0.244\) \(0.326\) \(0.337\) \(0.377\) \(0.427\) \(0.443\)

3.4 Adaptive Scale Routing Mixture-of-Experts

To address the limitations of static fusion mechanisms in multi-scale forecasting, we propose the Adaptive Scale Routing Mixture-of-Experts (ASR-MoE) block. This dynamic prediction head captures temporal dependencies at different horizons through a hierarchy of experts (global-shared and local-specialized experts) and adaptively aggregates scale-specific predictions based on temporal patterns.

Specifically, ASR-MoE incorporates two distinct expert groups to handle different temporal granularities. Global experts \(\mathcal{E}^g=\{\mathcal{G}_1, \mathcal{G}_2,..., \mathcal{G}_m\}\) capture common long-term dependencies through deeper networks, modeling persistent trends and long-periodic patterns that commonly exist in time series. Local experts \(\mathcal{E}^l=\{\mathcal{L}_1, \mathcal{L}_2,..., \mathcal{L}_n\}\) capture diverse short-term variations via shallower networks, detecting short-periodic patterns and high-frequency fluctuations that vary dynamically across stages. This explicit decoupling enables specialized handling of complex temporal dynamics while maintaining high parameter efficiency.

To compute the weighting for experts at each scale, a dynamic routing mechanism assigns input-dependent weights, thereby balancing the contributions of global and local dependencies. Given a scale-specific feature \(\mathbf{F}_l\), the router generates global weights \(\mathbf{\Omega}^G_l\) and local weights \(\mathbf{\Omega}^L_l\) as defined in Eqs. (16)–(17), and only the top-\(K\) local experts are activated via sparse routing: \[\label{eq:MoE322} \mathbf{\hat{\Omega}}^L_l, \mathcal{I} = \begin{cases} \mathbf{\Omega}^L_l, & \mathbf{\Omega}^L_l \in \mathrm{Top-K}(\mathbf{\Omega}^L_l, K) \\ 0, & \text{otherwise} \end{cases}\tag{15}\] \[\label{eq:MoE3211} \mathbf{\Omega}^G_l = \mathrm{Sigmoid}(\mathrm{MLP}(\mathbf{F}_l)),\tag{16}\] \[\label{eq:MoE321} \mathbf{\Omega}^L_l = \mathrm{Softmax}(\mathrm{MLP}(\mathbf{F}_l)),\tag{17}\] where \(\mathbf{\Omega}^G_l\) and \(\mathbf{\Omega}^L_l\) represent the global and local expert weight matrices for scale \(l\), respectively, and \(\mathcal{I}\) denotes the set of indices of the selected local experts. Finally, a temporal-aware weighting module fuses the outputs from all scales according to historical scale importance. The scale-specific predictions \(\hat{\mathbf{Y}}^l\) are fused using time-dependent weights:

\[\label{eq:MoE323} \mathbf{w} = \mathrm{Softmax}\left(\mathrm{MLP}\left(\bigoplus^L_{l=1}\phi(\mathbf{F}_l)\cdot\mathbf{w}_\text{hist}\right)\right),\tag{18}\] where \(\phi(\cdot)\) denotes temporal descriptors, \(\mathbf{w}_\text{hist}\) represents the historical weighting memory, and \(\bigoplus\) denotes the concatenation operation. The final prediction is obtained by integrating the multi-scale outputs: \[\label{eq:MoE324} \hat{\mathbf{Y}} = \sum_{l=1}^{L}\mathbf{w}_l\left(\sum_{m=1}^{M}{\mathbf{\Omega}^G_m\mathcal{G}_m}(\mathbf{F}_l)+\sum_{n\in{\mathcal{I}}}{\hat{\mathbf{\Omega}}^L_n\mathcal{L}_n}(\mathbf{F}_l)\right),\tag{19}\]

To ensure balanced expert utilization, ASR-MoE incorporates an auxiliary balance loss: \[\label{eq:MoE325} \mathcal{L}_{\text{balance}} = -\lambda\,\mathbb{E}\Big[\sum_j\mathbf{\Omega}_{j}\log\mathbf{\Omega}_j\Big].\tag{20}\] The overall loss function of DMSC is defined as follows: \[\label{eq:MoE326} \mathcal{L} = \mathcal{L}_{\text{pred}} + \mathcal{L}_{\text{balance}},\tag{21}\] where \(\mathcal{L}_{\text{pred}}\) denotes the Mean Squared Error (MSE) loss.

By integrating hierarchical expert specialization, dynamic routing, and temporal-aware weighting, ASR-MoE adaptively prioritizes relevant scales and experts, achieving a synergistic balance between long-term trend capture and short-term detail refinement in time series forecasting.
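A condensed PyTorch sketch of ASR-MoE is shown below. It follows Eqs. (15)–(20) with simplifications flagged as assumptions: the temporal descriptors \(\phi(\cdot)\) are reduced to a mean over variables, the historical weighting memory \(\mathbf{w}_\text{hist}\) is omitted from the scale router, and the expert depths, counts, and \(\lambda\) are illustrative defaults rather than the paper's settings.

```python
import torch
import torch.nn as nn

class ASRMoE(nn.Module):
    """Sketch of Adaptive Scale Routing MoE (Eqs. 15-20); expert sizes, counts, and lambda are illustrative."""
    def __init__(self, d_model: int, horizon: int, n_global: int = 2, n_local: int = 4,
                 top_k: int = 2, lam: float = 0.01):
        super().__init__()
        self.top_k, self.lam = top_k, lam
        # Global-shared experts: deeper networks for long-term dependencies
        self.global_experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, horizon))
            for _ in range(n_global))
        # Local-specialized experts: shallower networks for short-term variations
        self.local_experts = nn.ModuleList(nn.Linear(d_model, horizon) for _ in range(n_local))
        self.global_router = nn.Linear(d_model, n_global)    # sigmoid weights, Eq. (16)
        self.local_router = nn.Linear(d_model, n_local)      # softmax weights, Eq. (17)
        self.scale_router = nn.Linear(d_model, 1)             # scale weighting, Eq. (18) (w_hist omitted)

    def forward(self, feats):                  # feats: list of per-scale features F_l, each [B, C, D]
        preds, balance = [], feats[0].new_zeros(())
        for f in feats:
            w_g = torch.sigmoid(self.global_router(f))                    # [B, C, m]
            w_l = torch.softmax(self.local_router(f), dim=-1)             # [B, C, n]
            topv, topi = w_l.topk(self.top_k, dim=-1)                     # sparse Top-K routing, Eq. (15)
            w_l = torch.zeros_like(w_l).scatter(-1, topi, topv)
            out = sum(w_g[..., m:m + 1] * g(f) for m, g in enumerate(self.global_experts))
            out = out + sum(w_l[..., n:n + 1] * e(f) for n, e in enumerate(self.local_experts))
            preds.append(out)                                             # scale-specific prediction [B, C, H]
            q = w_l.clamp_min(1e-9)
            balance = balance - self.lam * (q * q.log()).sum(-1).mean()   # balance loss, Eq. (20)
        desc = torch.stack([f.mean(dim=1) for f in feats], dim=1)         # phi(F_l): [B, n_scales, D]
        scale_w = torch.softmax(self.scale_router(desc), dim=1)           # [B, n_scales, 1]
        y_hat = sum(scale_w[:, i].unsqueeze(-1) * pred for i, pred in enumerate(preds))   # Eq. (19)
        return y_hat, balance

# usage: ASRMoE(128, 96)([torch.randn(8, 7, 128) for _ in range(3)]) -> ([8, 7, 96] prediction, scalar loss)
```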

4 Experiments

To comprehensively evaluate the performance and effectiveness of the proposed DMSC framework, we conduct extensive experiments on 13 real-world TSF benchmarks across multiple domains and temporal resolutions.

4.0.1 Datasets.

For long-term forecasting, we conduct experiments on nine well-established benchmarks, including ETT (ETTh1, ETTh2, ETTm1, ETTm2), Electricity (ECL), Exchange, Traffic, Weather and Solar. For short-term forecasting, we utilize the PEMS (PEMS03, PEMS04, PEMS07, PEMS08) datasets.

4.0.2 Baselines.

We compare our framework with ten SOTA models for TSF, including Transformer-based models: iTransformer [36], PatchTST [35], Autoformer [30], TimeXer [37]; MLP-based models: DLinear [34], PatchMLP [33], AMD [38], TimeMixer [17]; a CNN-based model: TimesNet [20]; and a novel architecture, TimeKAN [18].

4.1 Main Results

The main results for long-term and short-term forecasting are presented in Table 1 and Table 2, where lower MSE and MAE values indicate superior forecasting performance. The proposed DMSC framework demonstrates consistent SOTA performance across all 13 benchmarks, achieving the lowest MAE and MSE in most experimental settings. These results confirm the robust effectiveness and generalizability of DMSC for both long-term and short-term forecasting tasks. Notably, multi-scale architectures like TimesNet and PatchMLP exhibit performance instability, which we attribute to their inflexible fusion mechanisms that fail to fully leverage the captured multi-scale features. The competitive performance of TimeXer and iTransformer underscores the effectiveness of patch-wise and variate-wise strategies for information representation, but their fixed decomposition schemes inherently constrain the models’ potential. Furthermore, compared to iTransformer and Dlinear, DMSC achieves considerable improvement on datasets with more variates (Traffic, PEMS), highlighting the necessity of joint dependency modeling. Collectively, the above observations validate the efficacy of DMSC in addressing the core challenges of time series forecasting.

4.2 Model Analysis

4.2.1 Ablation Study.

To rigorously evaluate the contribution and effectiveness of each component within DMSC, we conduct systematic ablation experiments on the Electricity, Weather, Traffic and Solar datasets. We first ablate each of the three core blocks individually. The results in Table 3 highlight the complementary synergy of the three blocks. We then perform further ablation analyses for individual components. For EMPD in Table 4, case (1) employs a predefined static patch size for the exponential decay. For TIB in Table 5, case (2) retains only intra-patch feature extraction without jointly modeling the three dependency representations, while case (3) removes the dynamic fusion mechanism \(\mathbf{F}^l_{\text{fused}}\) in TIB. For ASR-MoE in Table 6, case (4) replaces the prediction heads with a simple summation of multiple linear layers, and cases (5) and (6) eliminate the global experts and local experts, respectively. Based on these ablation studies, we have the following observations.

Table 3: Ablation results. All results are averaged from prediction lengths {96, 192, 336, 720}.
Dataset Electricity Weather Traffic Solar
MSE MAE MSE MAE MSE MAE MSE MAE
DMSC(Ours) \(\boldsymbol{0.170}\) \(\boldsymbol{0.258}\) \(\boldsymbol{0.233}\) \(\boldsymbol{0.269}\) \(\boldsymbol{0.407}\) \(\boldsymbol{0.274}\) \(\boldsymbol{0.213}\) \(\boldsymbol{0.258}\)
w/o EMPD \(0.181\) \(0.270\) \(0.248\) \(0.275\) \(0.503\) \(0.323\) \(0.248\) \(0.278\)
w/o TIB \(0.185\) \(0.273\) \(0.255\) \(0.278\) \(0.497\) \(0.321\) \(0.264\) \(0.293\)
w/o ASR-MoE \(0.196\) \(0.285\) \(0.251\) \(0.280\) \(0.503\) \(0.320\) \(0.271\) \(0.305\)
Table 4: Ablation study on EMPD to compare dynamic decomposition with static predefined decomposition strategies.
Case Electricity Weather Traffic Solar
MSE MAE MSE MAE MSE MAE MSE MAE
DMSC(Ours) \(\boldsymbol{0.170}\) \(\boldsymbol{0.258}\) \(\boldsymbol{0.233}\) \(\boldsymbol{0.269}\) \(\boldsymbol{0.407}\) \(\boldsymbol{0.274}\) \(\boldsymbol{0.213}\) \(\boldsymbol{0.258}\)
(1) Static decomp \(0.179\) \(0.271\) \(0.245\) \(0.272\) \(0.478\) \(0.313\) \(0.243\) \(0.276\)
Table 5: Ablation study on TIB to validate effectiveness of joint triad dependency modeling.
Case Electricity Weather Traffic Solar
MSE MAE MSE MAE MSE MAE MSE MAE
DMSC(Ours) \(\boldsymbol{0.170}\) \(\boldsymbol{0.258}\) \(\boldsymbol{0.233}\) \(\boldsymbol{0.269}\) \(\boldsymbol{0.407}\) \(\boldsymbol{0.274}\) \(\boldsymbol{0.213}\) \(\boldsymbol{0.258}\)
(2) Only \(\mathbf{F}_{\text{intra}}\) \(0.187\) \(0.276\) \(0.249\) \(0.275\) \(0.489\) \(0.315\) \(0.275\) \(0.298\)
(3) w/o \(\mathbf{F}^l_{\text{fused}}\) \(0.186\) \(0.277\) \(0.247\) \(0.273\) \(0.483\) \(0.314\) \(0.277\) \(0.299\)
Table 6: Ablation study on ASR-MoE to assess the importance of expert specialization (global/local) and dynamic fusion strategies for prediction results.
Case Electricity Weather Traffic Solar
MSE MAE MSE MAE MSE MAE MSE MAE
DMSC(Ours) \(\boldsymbol{0.170}\) \(\boldsymbol{0.258}\) \(\boldsymbol{0.233}\) \(\boldsymbol{0.269}\) \(\boldsymbol{0.407}\) \(\boldsymbol{0.274}\) \(\boldsymbol{0.213}\) \(\boldsymbol{0.258}\)
(4) Agg. heads \(0.194\) \(0.281\) \(0.254\) \(0.281\) \(0.510\) \(0.312\) \(0.278\) \(0.299\)
(5) w/o \(\mathcal{E}^g\) \(0.191\) \(0.280\) \(0.250\) \(0.279\) \(0.504\) \(0.324\) \(0.265\) \(0.293\)
(6) w/o \(\mathcal{E}^l\) \(0.199\) \(0.285\) \(0.243\) \(0.271\) \(0.488\) \(0.326\) \(0.275\) \(0.303\)

For EMPD, replacing the input-adaptive decomposition with either predefined static patches or raw data embedding leads to performance degradation. This demonstrates the necessity of adaptive patch-wise decomposition and its lightweight network for dynamic granularity adjustment. For TIB, replacing joint modeling with standard convolution layers or only intra-patch features results in performance drops, validating the necessity of comprehensive joint modeling of triadic dependencies. Similarly, ablating the dynamic fusion mechanism in TIB confirms its effectiveness in fusing heterogeneous dependencies, as static handling undermines holistic cross-scale dependency capture. For ASR-MoE, replacing the adaptive prediction mechanism with a single prediction head or summed multiple linear projections leads to substantial performance declines. This indicates that single heads inadequately utilize multi-scale features, while simplistic aggregation overlooks scale-specific dependency importance. Moreover, removing either global or local experts degrades performance, demonstrating the necessity of maintaining distinct experts for different temporal patterns. The elaborate design of ASR-MoE effectively harnesses diverse temporal dynamics through collective specialized expert utilization. Collectively, these ablation studies confirm that dynamic and multi-scale modeling is central to DMSC. All three blocks and the proposed progressive cascade architecture together enable robust multi-scale modeling and hierarchical feature learning, and the consistent performance drops from ablating any component validate the architectural integrity of DMSC.

Figure 2: Visualization of different experts on the ETTh1 dataset; the look-back and prediction lengths are set to 96.

4.2.2 ASR-MoE Expert Specialization.

To verify the specialization of ASR-MoE experts, we visualize their activation patterns across different temporal experts. As shown in Fig. 2, distinct specialization profiles emerge: global experts exhibit heightened sensitivity to persistent trends and long-periodic patterns, whereas local experts demonstrate acute responsiveness to short-periodic patterns and high-frequency fluctuations. This observed specialization demonstrates the adaptive acquisition of specialized knowledge by different experts.

4.2.3 Increasing Look-Back Length.

Theoretically, longer input sequences provide richer historical information, which can enhance the model’s ability to capture long-term trends and complex multi-scale dependencies. Our DMSC framework is specifically designed to leverage this characteristic, as its dynamic multi-scale coordination capability inherently facilitates adaptive modeling and effective information utilization across varying look-back lengths. We therefore evaluate DMSC with look-back lengths selected from {48, 96, 192, 336, 720} on the Electricity dataset. As shown in Fig. 3, DMSC consistently achieves improved performance with longer historical sequences while outperforming other baseline methods. This demonstrates its capacity to extract richer temporal representations from extended contexts and effectively integrate multi-scale temporal patterns.

Figure 3: Forecasting results with varying look-back length on Electricity dataset. Look-back lengths are set to {48, 96, 192, 336, 720}, and prediction length is fixed to 96.

4.2.4 Efficiency Analysis.

We rigorously evaluate the memory usage and training time of DMSC against other SOTA baselines on the Weather (21 variates) and Traffic (862 variates) datasets. As demonstrated in Fig. 4 and Fig. 5, DMSC consistently achieves superior efficiency in both memory usage and training time while maintaining competitive forecasting accuracy. Crucially, DMSC demonstrates near-linear scalability to long sequences, outperforming Transformer-based models with quadratic complexity and MLP-based approaches susceptible to parameter explosion. These efficiency gains are particularly pronounced when processing high-dimensional multivariate data (Traffic), where the cross-variable interaction in TIB avoids the computational overhead of exhaustive pairwise attention while preserving modeling capacity. DMSC thus establishes a balance of accuracy, latency, and memory usage for practical deployment scenarios.

Figure 4: Model efficiency analysis with a look-back length of 96 and a prediction length of 96 on the Weather (21 variates) dataset. Batch size is set to 128.
Figure 5: Model efficiency analysis with a look-back length of 96 and a prediction length of 96 on the Traffic (862 variates) dataset. Batch size is set to 16.

5 Conclusion

This paper presents the Dynamic Multi-Scale Coordination (DMSC) framework, a novel approach that advances time series forecasting through comprehensive dynamic multi-scale coordination. DMSC makes three key contributions: 1) the Embedded Multi-Scale Patch Decomposition block (EMPD) dynamically decomposes time series into hierarchical patches, eliminating predefined scale constraints through input-adaptive granularity adjustment; 2) the Triad Interaction Block (TIB) jointly models intra-patch, inter-patch, and cross-variable dependencies, forming a coarse-to-fine feature pyramid through progressive cascade layers; 3) the Adaptive Scale Routing MoE (ASR-MoE) dynamically fuses multi-scale predictions via temporal-aware weighting of specialized global and local experts. Extensive experiments on thirteen benchmarks demonstrate that DMSC achieves SOTA performance and superior efficiency. Future work will extend the framework to multi-task learning and explore optimizations for complex real-world environments, thereby enhancing its applicability across diverse scenarios.

6 Implementation Details

6.1 Dataset Descriptions

We conduct long-term forecasting experiments on 9 real-world datasets, including the ETT (ETTh1, ETTh2, ETTm1, ETTm2), Electricity (ECL), Exchange, Solar, Traffic, and Weather datasets. Meanwhile, we use the PEMS (PEMS03, PEMS04, PEMS07, PEMS08) datasets to conduct short-term forecasting experiments. The details of the datasets are listed in Table 7.

Table 7: Dataset Descriptions. Num is the number of variables. Dataset size is organized in (Train, Validation, Test).
Name Domain Length Num Prediction Length Dataset Size Freq. (min)
ETTh1 Temperature 14400 7 {96,192,336,720} (8545,2881,2881) 60
ETTh2 Temperature 14400 7 {96,192,336,720} (8545,2881,2881) 60
ETTm1 Temperature 57600 7 {96,192,336,720} (34465,11521,11521) 15
ETTm2 Temperature 57600 7 {96,192,336,720} (34465,11521,11521) 15
Electricity Electricity 26304 321 {96,192,336,720} (18317,2633,5261) 60
Exchange Exchange Rate 7588 8 {96,192,336,720} (5120,665,1422) 1440
Traffic Road Occupancy 17544 862 {96,192,336,720} (12185,1757,3509) 60
Weather Weather 52696 21 {96,192,336,720} (36792,5271,10540) 10
Solar-Energy Energy 52179 137 {96,192,336,720} (36601,5161,10417) 10
PEMS03 Traffic Flow 26208 358 {12,24,48,96} (15617,5135,5135) 5
PEMS04 Traffic Flow 16992 307 {12,24,48,96} (10172,3375,3375) 5
PEMS07 Traffic Flow 28224 883 {12,24,48,96} (16711,5622,5622) 5
PEMS08 Traffic Flow 17856 170 {12,24,48,96} (10690,3548,3548) 5

6.2 Metrics Details

To evaluate model performance for TSF, we utilize the mean squared error (MSE) and mean absolute error (MAE). The metrics are calculated as follows: \[\label{metrics32MSE} \boldsymbol{MSE} = \frac{1}{n}\sum_{i=1}^{n}(\mathbf{Y}_i - \hat{\mathbf{Y}}_i)^2,\tag{22}\] \[\label{metrics32MAE} \boldsymbol{MAE} = \frac{1}{n}\sum_{i=1}^{n}|\mathbf{Y}_i - \hat{\mathbf{Y}}_i|.\tag{23}\]
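For reference, a direct PyTorch rendering of these two metrics follows; it is a minimal sketch rather than the evaluation script used in the paper.

```python
import torch

def mse(y: torch.Tensor, y_hat: torch.Tensor) -> torch.Tensor:
    """Mean squared error, Eq. (22), averaged over all forecast points and variables."""
    return ((y - y_hat) ** 2).mean()

def mae(y: torch.Tensor, y_hat: torch.Tensor) -> torch.Tensor:
    """Mean absolute error, Eq. (23)."""
    return (y - y_hat).abs().mean()

# usage: y and y_hat of shape [n_windows, horizon, n_vars]
# mse(torch.zeros(4, 96, 7), torch.ones(4, 96, 7))  # -> tensor(1.)
```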

6.3 Implementation Details

All experiments are implemented in PyTorch and conducted on a single NVIDIA RTX 4090 24GB GPU. We use the Adam optimizer with an initial learning rate of \(10^{-3}\) and the L2 loss for model optimization. For DMSC, the number of progressive cascade layers is chosen from 2 to 5, and the embedding dimension is chosen from {64, 128, 256, 512}. The patch decay rate \(\tau\) is set to 2 for exponential decay. All experiments are built on the TimesNet framework, and all baselines are implemented with the configurations from their original papers and code.

7 Model Analysis

7.1 Model Complexity Analysis

The computational complexity of DMSC is dominated by three core components: EMPD, TIB, and ASR-MoE. EMPD hierarchically decomposes input sequences into exponentially scaled patches via unfolding and linear projection, yielding a lightweight \(O(C\cdot{L}\cdot{d_{\mathrm{model}}})\) complexity, where \(C\) is the number of variates, \(L\) is the sequence length, and \(d_{\mathrm{model}}\) is the embedding dimension. TIB employs depth-wise separable convolutions and dilated convolutions to model intra-patch, inter-patch, and cross-variable dependencies, maintaining near-linear complexity \(O(C\cdot{N}\cdot{d^2_\mathrm{model}})\) per layer (\(N\) is the patch count). ASR-MoE leverages sparse activation, routing each input to only the top-\(K\) experts (global and local), reducing fusion complexity from quadratic to \(O(B\cdot{C}\cdot{d_\mathrm{model}}\cdot(K+S))\) (\(B\): batch size, \(S\): shared experts). Collectively, DMSC achieves \(O(L)\) sequential scalability, outperforming Transformer-based models (\(O(L^2)\)) and MLP-based alternatives (\(O(L\cdot{d^2_{ff}})\)), while excelling in both long- and short-term forecasting tasks.

7.2 Hyperparameter Sensitivity Analysis

7.2.1 Embedding Dimension.

The embedding dimension determines the richness of feature representations. We evaluate the model’s performance on the ECL and Weather datasets with embedding dimensions \(d_{\mathrm{model}} \in \{64, 128, 256, 512\}\). The results in Fig. 6 (a) and Fig. 6 (b) show that the optimal dimension is 128 or 256: smaller embeddings impair multi-scale feature separation, while larger configurations yield marginal gains at higher memory cost, indicating dimensionality saturation.

Figure 6: Hyperparameter sensitivity analysis under different embedding dimensions, \(d_{\mathrm{model}} \in \{64, 128, 256, 512\}\). (a) Performance on the Electricity dataset; (b) performance on the Weather dataset. Look-back length is fixed to 96, and prediction lengths are set to {96, 192, 336, 720}.

Figure 7: Hyperparameter sensitivity analysis on the Weather dataset. (a) Performance under different patch length decay rates; (b) performance under different numbers of progressive cascade layers. Look-back length is fixed to 96, and prediction lengths are set to {96, 192, 336, 720}.

7.2.2 EMPD Patch Length Decay.

EMPD employs exponentially decaying patch lengths across layers to capture multi-scale dependencies. We evaluate the impact of the EMPD decay rate by testing \(\tau\in\{2, 3, 4\}\) on the Weather dataset. As shown in Fig. 7 (a), a decay factor of \(\tau=2\) achieves the best forecasting performance, as it balances the preservation of sufficient local detail in fine-grained layers with the extraction of broader trends in coarse-grained layers.

7.2.3 Number of Progressive Cascade Layers.

The number of progressive cascade layers (\(l\in\{1, 2, 3, 4, 5\}\)) influences hierarchical feature extraction. As shown in Fig. 7 (b), three layers achieve the best trade-off: shallower stacks (1–2 layers) fail to capture fine-grained interactions, while deeper configurations (4–5 layers) introduce substantially increased latency with diminishing performance returns.

8 Full Results

We evaluate DMSC on 13 real-world TSF benchmarks spanning diverse domains. Table 8 shows the full results of the long-term forecasting tasks on the ETT (ETTh1, ETTh2, ETTm1, ETTm2), Electricity (ECL), Exchange, Traffic, Weather and Solar datasets. Table 9 contains the full results of the short-term forecasting tasks on the PEMS (PEMS03, PEMS04, PEMS07, PEMS08) datasets. From the full results, we can see that DMSC achieves strong performance on most datasets and tasks.

Meanwhile, Table 10 shows the full results of the ablation study on the Electricity, Weather, Traffic and Solar datasets, demonstrating the effectiveness and contribution of each component in DMSC.


Table 8: Full results of long-term forecasting. Results are reported for 4 different prediction lengths {96, 192, 336, 720} together with their average, and the look-back length is fixed to 96 for all baselines. A lower MSE or MAE indicates a better prediction, with the best in \(\boldsymbol{boldface}\) and second in underline.
Models DMSC(Ours) TimeMixer iTransformer PatchTST Dlinear TimesNet Autoformer TimeXer PatchMLP TimeKAN AMD
MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE
ETTh1 96 \(\boldsymbol{0.370}\) \(\boldsymbol{0.395}\) \(0.379\) \(\underline{0.397}\) \(0.387\) \(0.405\) \(\underline{0.378}\) \(0.399\) \(0.397\) \(0.412\) \(0.415\) \(0.429\) \(0.589\) \(0.526\) \(0.386\) \(0.404\) \(0.393\) \(0.405\) \(0.387\) \(0.401\) \(0.394\) \(0.404\)
192 \(\boldsymbol{0.408}\) \(\boldsymbol{0.420}\) \(0.430\) \(0.429\) \(0.441\) \(0.436\) \(0.427\) \(0.429\) \(0.446\) \(0.441\) \(0.479\) \(0.466\) \(0.653\) \(0.551\) \(0.429\) \(0.435\) \(0.443\) \(0.434\) \(\underline{0.415}\) \(\underline{0.423}\) \(0.444\) \(0.432\)
336 \(\boldsymbol{0.421}\) \(\boldsymbol{0.433}\) \(0.493\) \(0.459\) \(0.494\) \(0.463\) \(0.468\) \(0.455\) \(0.489\) \(0.467\) \(0.517\) \(0.482\) \(0.715\) \(0.581\) \(0.484\) \(0.457\) \(0.486\) \(0.456\) \(\underline{0.453}\) \(\underline{0.443}\) \(0.485\) \(0.451\)
720 \(\underline{0.462}\) \(\boldsymbol{0.457}\) \(0.522\) \(0.493\) \(0.488\) \(0.483\) \(0.508\) \(0.497\) \(0.513\) \(0.511\) \(0.505\) \(0.490\) \(0.726\) \(0.601\) \(0.544\) \(0.513\) \(0.509\) \(0.487\) \(\boldsymbol{0.461}\) \(\underline{0.463}\) \(0.486\) \(0.472\)
Avg \(\boldsymbol{0.415}\) \(\boldsymbol{0.426}\) \(0.456\) \(0.444\) \(0.452\) \(0.446\) \(0.445\) \(0.445\) \(0.461\) \(0.457\) \(0.479\) \(0.466\) \(0.670\) \(0.564\) \(0.460\) \(0.452\) \(0.458\) \(0.445\) \(\underline{0.429}\) \(\underline{0.432}\) \(0.452\) \(0.439\)
ETTh2 96 \(\boldsymbol{0.275}\) \(\boldsymbol{0.329}\) \(0.290\) \(0.341\) \(0.301\) \(0.350\) \(0.295\) \(0.347\) \(0.341\) \(0.394\) \(0.316\) \(0.358\) \(0.443\) \(0.459\) \(\underline{0.284}\) \(\underline{0.337}\) \(0.311\) \(0.358\) \(0.291\) \(0.342\) \(0.397\) \(0.451\)
192 \(\boldsymbol{0.359}\) \(\boldsymbol{0.383}\) \(\underline{0.366}\) \(0.394\) \(0.380\) \(0.399\) \(0.378\) \(0.401\) \(0.482\) \(0.479\) \(0.415\) \(0.414\) \(0.500\) \(0.506\) \(\underline{0.366}\) \(\underline{0.391}\) \(0.404\) \(0.415\) \(0.376\) \(0.393\) \(0.501\) \(0.501\)
336 \(\boldsymbol{0.376}\) \(\boldsymbol{0.398}\) \(0.425\) \(0.433\) \(\underline{0.423}\) \(\underline{0.431}\) \(0.425\) \(0.442\) \(0.591\) \(0.541\) \(0.452\) \(0.448\) \(0.506\) \(0.502\) \(0.438\) \(0.438\) \(0.447\) \(0.451\) \(0.437\) \(0.443\) \(0.611\) \(0.563\)
720 \(0.412\) \(\boldsymbol{0.424}\) \(\boldsymbol{0.405}\) \(\underline{0.431}\) \(0.431\) \(0.447\) \(0.436\) \(0.456\) \(0.839\) \(0.661\) \(0.461\) \(0.463\) \(0.503\) \(0.509\) \(\underline{0.407}\) \(0.449\) \(0.462\) \(0.464\) \(0.451\) \(0.458\) \(0.956\) \(0.718\)
Avg \(\boldsymbol{0.355}\) \(\boldsymbol{0.383}\) \(\underline{0.372}\) \(\underline{0.400}\) \(0.384\) \(0.407\) \(0.383\) \(0.412\) \(0.563\) \(0.519\) \(0.411\) \(0.421\) \(0.488\) \(0.494\) \(0.374\) \(0.404\) \(0.406\) \(0.422\) \(0.389\) \(0.409\) \(0.616\) \(0.558\)
ETTm1 96 \(\boldsymbol{0.304}\) \(\boldsymbol{0.335}\) \(0.319\) \(0.359\) \(0.341\) \(0.376\) \(0.326\) \(0.366\) \(0.346\) \(0.374\) \(0.336\) \(0.375\) \(0.564\) \(0.506\) \(\underline{0.318}\) \(\underline{0.356}\) \(0.321\) \(0.362\) \(0.326\) \(0.365\) \(0.331\) \(0.363\)
192 \(\boldsymbol{0.346}\) \(\boldsymbol{0.364}\) \(\underline{0.360}\) \(0.384\) \(0.382\) \(0.396\) \(0.365\) \(0.387\) \(0.382\) \(0.391\) \(0.377\) \(0.395\) \(0.586\) \(0.516\) \(0.373\) \(0.389\) \(0.364\) \(\underline{0.381}\) \(\underline{0.360}\) \(0.384\) \(0.373\) \(0.382\)
336 \(\underline{0.372}\) \(\underline{0.392}\) \(0.395\) \(0.406\) \(0.420\) \(0.421\) \(0.392\) \(0.406\) \(0.415\) \(0.415\) \(0.418\) \(0.420\) \(0.679\) \(0.547\) \(0.412\) \(\boldsymbol{0.387}\) \(0.396\) \(0.404\) \(\boldsymbol{0.369}\) \(0.402\) \(0.405\) \(0.403\)
720 \(\underline{0.451}\) \(\underline{0.440}\) \(0.461\) \(0.444\) \(0.487\) \(0.456\) \(0.461\) \(0.443\) \(0.473\) \(0.451\) \(0.541\) \(0.481\) \(0.715\) \(0.567\) \(0.460\) \(0.450\) \(0.468\) \(0.443\) \(\boldsymbol{0.449}\) \(\boldsymbol{0.437}\) \(0.467\) \(0.437\)
Avg \(\boldsymbol{0.368}\) \(\boldsymbol{0.383}\) \(0.384\) \(0.398\) \(0.408\) \(0.412\) \(0.386\) \(0.400\) \(0.404\) \(0.408\) \(0.418\) \(0.418\) \(0.636\) \(0.534\) \(0.391\) \(0.395\) \(0.387\) \(0.398\) \(\underline{0.376}\) \(0.397\) \(0.394\) \(\underline{0.396}\)
ETTm2 96 \(\underline{0.174}\) \(\underline{0.256}\) \(0.179\) \(0.261\) \(0.186\) \(0.272\) \(0.184\) \(0.269\) \(0.193\) \(0.293\) \(0.188\) \(0.268\) \(0.554\) \(0.469\) \(\boldsymbol{0.172}\) \(\boldsymbol{0.254}\) \(0.176\) \(0.259\) \(0.176\) \(0.263\) \(0.187\) \(0.271\)
192 \(\boldsymbol{0.233}\) \(\boldsymbol{0.295}\) \(\underline{0.238}\) \(\underline{0.301}\) \(0.252\) \(0.312\) \(0.247\) \(0.307\) \(0.284\) \(0.361\) \(0.250\) \(0.306\) \(0.609\) \(0.497\) \(0.241\) \(0.302\) \(0.246\) \(0.304\) \(0.239\) \(\underline{0.301}\) \(0.251\) \(0.309\)
336 \(\boldsymbol{0.296}\) \(\boldsymbol{0.331}\) \(\underline{0.299}\) \(\underline{0.339}\) \(0.315\) \(0.351\) \(0.313\) \(0.354\) \(0.382\) \(0.429\) \(0.306\) \(0.341\) \(0.401\) \(0.409\) \(0.301\) \(0.340\) \(0.309\) \(0.344\) \(0.304\) \(0.346\) \(0.309\) \(0.346\)
720 \(\boldsymbol{0.370}\) \(\boldsymbol{0.385}\) \(0.395\) \(\underline{0.394}\) \(0.415\) \(0.408\) \(0.409\) \(0.406\) \(0.558\) \(0.525\) \(0.420\) \(0.405\) \(0.443\) \(0.433\) \(\underline{0.394}\) \(0.395\) \(0.417\) \(0.408\) \(0.410\) \(0.408\) \(0.407\) \(0.400\)
Avg \(\boldsymbol{0.268}\) \(\boldsymbol{0.317}\) \(0.278\) \(0.324\) \(0.292\) \(0.336\) \(0.288\) \(0.334\) \(0.354\) \(0.402\) \(0.291\) \(0.330\) \(0.502\) \(0.452\) \(\underline{0.277}\) \(\underline{0.323}\) \(0.287\) \(0.329\) \(0.282\) \(0.330\) \(0.288\) \(0.332\)
Electricity 96 \(\boldsymbol{0.138}\) \(\boldsymbol{0.223}\) \(0.161\) \(0.252\) \(\underline{0.148}\) \(\underline{0.241}\) \(0.181\) \(0.274\) \(0.211\) \(0.302\) \(0.163\) \(0.267\) \(0.232\) \(0.347\) \(0.241\) \(0.244\) \(0.167\) \(0.264\) \(0.177\) \(0.270\) \(0.185\) \(0.267\)
192 \(\underline{0.160}\) \(\boldsymbol{0.258}\) \(0.176\) \(0.269\) \(0.167\) \(0.248\) \(0.187\) \(0.280\) \(0.211\) \(0.305\) \(0.184\) \(0.284\) \(0.363\) \(0.447\) \(\boldsymbol{0.159}\) \(\underline{0.260}\) \(0.181\) \(0.276\) \(0.185\) \(0.276\) \(0.190\) \(0.272\)
336 \(\boldsymbol{0.167}\) \(\boldsymbol{0.253}\) \(0.193\) \(0.283\) \(0.179\) \(\underline{0.271}\) \(0.204\) \(0.296\) \(0.223\) \(0.319\) \(0.196\) \(0.297\) \(0.599\) \(0.595\) \(\underline{0.177}\) \(0.276\) \(0.203\) \(0.303\) \(0.201\) \(0.292\) \(0.205\) \(0.288\)
720 \(\underline{0.213}\) \(\underline{0.299}\) \(0.232\) \(0.316\) \(\boldsymbol{0.208}\) \(\boldsymbol{0.298}\) \(0.246\) \(0.328\) \(0.258\) \(0.351\) \(0.232\) \(0.325\) \(0.775\) \(0.701\) \(0.229\) \(0.321\) \(0.251\) \(0.341\) \(0.241\) \(0.323\) \(0.246\) \(0.321\)
Avg \(\boldsymbol{0.170}\) \(\boldsymbol{0.258}\) \(0.190\) \(0.280\) \(\underline{0.176}\) \(\underline{0.265}\) \(0.205\) \(0.295\) \(0.226\) \(0.319\) \(0.194\) \(0.293\) \(0.492\) \(0.523\) \(0.201\) \(0.275\) \(0.200\) \(0.296\) \(0.201\) \(0.290\) \(0.206\) \(0.287\)
Exchange 96 \(\boldsymbol{0.082}\) \(\boldsymbol{0.201}\) \(\underline{0.087}\) \(\underline{0.204}\) \(0.088\) \(0.208\) \(0.094\) \(0.213\) \(0.098\) \(0.233\) \(0.115\) \(0.242\) \(0.158\) \(0.290\) \(0.094\) \(0.214\) \(0.094\) \(0.217\) \(0.092\) \(0.212\) \(0.088\) \(0.208\)
192 \(\boldsymbol{0.174}\) \(\underline{0.298}\) \(\underline{0.177}\) \(\boldsymbol{0.295}\) \(0.180\) \(0.303\) \(0.182\) \(0.303\) \(0.186\) \(0.325\) \(0.216\) \(0.333\) \(0.299\) \(0.406\) \(0.182\) \(0.303\) \(0.187\) \(0.311\) \(0.180\) \(0.300\) \(0.182\) \(0.305\)
336 \(\boldsymbol{0.315}\) \(\boldsymbol{0.405}\) \(0.328\) \(\underline{0.414}\) \(0.331\) \(0.418\) \(0.347\) \(0.426\) \(\underline{0.325}\) \(0.434\) \(0.375\) \(0.444\) \(0.470\) \(0.511\) \(0.384\) \(0.448\) \(0.342\) \(0.424\) \(0.352\) \(0.430\) \(0.332\) \(0.417\)
720 \(\underline{0.772}\) \(\boldsymbol{0.659}\) \(0.847\) \(0.692\) \(0.848\) \(0.695\) \(0.931\) \(0.724\) \(\boldsymbol{0.746}\) \(0.663\) \(1.012\) \(0.765\) \(1.228\) \(0.869\) \(0.932\) \(0.724\) \(0.908\) \(0.715\) \(0.912\) \(0.715\) \(0.861\) \(\underline{0.701}\)
Avg \(\boldsymbol{0.336}\) \(\boldsymbol{0.391}\) \(0.359\) \(\underline{0.401}\) \(0.362\) \(0.406\) \(0.389\) \(0.417\) \(\underline{0.339}\) \(0.414\) \(0.430\) \(0.446\) \(0.539\) \(0.519\) \(0.398\) \(0.422\) \(0.383\) \(0.417\) \(0.384\) \(0.414\) \(0.366\) \(0.408\)
Weather 96 \(\underline{0.160}\) \(\underline{0.210}\) \(0.164\) \(\underline{0.210}\) \(0.176\) \(0.216\) \(0.174\) \(0.215\) \(0.195\) \(0.252\) \(0.172\) \(0.221\) \(0.301\) \(0.364\) \(\boldsymbol{0.157}\) \(\boldsymbol{0.205}\) \(0.168\) \(0.214\) \(0.163\) \(0.210\) \(0.194\) \(0.236\)
192 \(\underline{0.207}\) \(\underline{0.250}\) \(0.208\) \(0.251\) \(0.225\) \(0.257\) \(0.221\) \(0.256\) \(0.239\) \(0.299\) \(0.220\) \(0.260\) \(0.335\) \(0.385\) \(\boldsymbol{0.204}\) \(\boldsymbol{0.248}\) \(0.215\) \(0.255\) \(0.209\) \(0.251\) \(0.239\) \(0.271\)
336 \(\boldsymbol{0.253}\) \(\boldsymbol{0.284}\) \(0.264\) \(0.292\) \(0.281\) \(0.299\) \(0.280\) \(0.297\) \(0.282\) \(0.333\) \(0.280\) \(0.302\) \(0.352\) \(0.376\) \(0.264\) \(0.293\) \(0.272\) \(0.295\) \(\underline{0.263}\) \(\underline{0.291}\) \(0.290\) \(0.306\)
720 \(\boldsymbol{0.313}\) \(\boldsymbol{0.332}\) \(0.343\) \(\underline{0.343}\) \(0.361\) \(0.353\) \(0.357\) \(0.349\) \(0.345\) \(0.381\) \(0.353\) \(0.350\) \(0.405\) \(0.401\) \(0.343\) \(\underline{0.343}\) \(0.351\) \(0.346\) \(\underline{0.341}\) \(0.344\) \(0.362\) \(0.352\)
Avg \(\boldsymbol{0.233}\) \(\boldsymbol{0.269}\) \(0.245\) \(0.274\) \(0.261\) \(0.281\) \(0.258\) \(0.279\) \(0.265\) \(0.316\) \(0.256\) \(0.283\) \(0.348\) \(0.382\) \(\underline{0.242}\) \(\underline{0.272}\) \(0.252\) \(0.277\) \(0.244\) \(0.274\) \(0.271\) \(0.291\)
Traffic 96 \(\boldsymbol{0.389}\) \(\boldsymbol{0.259}\) \(0.476\) \(0.292\) \(\underline{0.393}\) \(\underline{0.269}\) \(0.459\) \(0.299\) \(0.712\) \(0.438\) \(0.593\) \(0.317\) \(0.663\) \(0.403\) \(0.428\) \(0.271\) \(0.513\) \(0.352\) \(0.598\) \(0.382\) \(0.546\) \(0.346\)
192 \(\boldsymbol{0.405}\) \(\boldsymbol{0.258}\) \(0.501\) \(0.301\) \(\underline{0.412}\) \(\underline{0.277}\) \(0.469\) \(0.303\) \(0.662\) \(0.417\) \(0.618\) \(0.327\) \(0.915\) \(0.557\) \(0.447\) \(0.280\) \(0.509\) \(0.350\) \(0.579\) \(0.365\) \(0.529\) \(0.335\)
336 \(\boldsymbol{0.398}\) \(\underline{0.286}\) \(0.514\) \(0.314\) \(\underline{0.424}\) \(\boldsymbol{0.283}\) \(0.483\) \(0.309\) \(0.669\) \(0.419\) \(0.642\) \(0.341\) \(1.217\) \(0.704\) \(0.472\) \(0.289\) \(0.533\) \(0.360\) \(0.572\) \(0.361\) \(0.540\) \(0.339\)
720 \(\boldsymbol{0.437}\) \(\boldsymbol{0.290}\) \(0.545\) \(0.320\) \(\underline{0.459}\) \(\underline{0.301}\) \(0.518\) \(0.326\) \(0.709\) \(0.437\) \(0.679\) \(0.350\) \(1.317\) \(0.755\) \(0.517\) \(0.307\) \(0.599\) \(0.395\) \(0.609\) \(0.384\) \(0.576\) \(0.358\)
Avg \(\boldsymbol{0.407}\) \(\boldsymbol{0.274}\) \(0.509\) \(0.307\) \(\underline{0.422}\) \(\underline{0.283}\) \(0.482\) \(0.309\) \(0.688\) \(0.428\) \(0.633\) \(0.334\) \(1.028\) \(0.605\) \(0.466\) \(0.287\) \(0.539\) \(0.364\) \(0.590\) \(0.373\) \(0.547\) \(0.345\)
96 \(\boldsymbol{0.187}\) \(\boldsymbol{0.231}\) \(\underline{0.196}\) \(0.263\) \(0.207\) \(\underline{0.237}\) \(0.216\) \(0.274\) \(0.290\) \(0.378\) \(0.223\) \(0.256\) \(0.552\) \(0.524\) \(0.198\) \(0.244\) \(0.239\) \(0.272\) \(0.248\) \(0.302\) \(0.310\) \(0.312\)
192 \(\boldsymbol{0.216}\) \(\boldsymbol{0.261}\) \(0.245\) \(0.279\) \(0.242\) \(\underline{0.264}\) \(0.250\) \(0.294\) \(0.320\) \(0.398\) \(0.262\) \(0.272\) \(0.696\) \(0.605\) \(\underline{0.226}\) \(0.270\) \(0.291\) \(0.298\) \(0.291\) \(0.321\) \(0.348\) \(0.330\)
336 \(\boldsymbol{0.223}\) \(\boldsymbol{0.269}\) \(\underline{0.235}\) \(0.287\) \(0.251\) \(\underline{0.274}\) \(0.265\) \(0.302\) \(0.353\) \(0.415\) \(0.287\) \(0.299\) \(0.816\) \(0.677\) \(0.239\) \(0.281\) \(0.325\) \(0.316\) \(0.307\) \(0.332\) \(0.399\) \(0.355\)
720 \(\boldsymbol{0.226}\) \(\boldsymbol{0.271}\) \(\underline{0.236}\) \(0.286\) \(0.251\) \(0.276\) \(0.266\) \(0.298\) \(0.357\) \(0.413\) \(0.298\) \(0.318\) \(0.844\) \(0.731\) \(0.242\) \(\underline{0.282}\) \(0.336\) \(0.321\) \(0.310\) \(0.329\) \(0.393\) \(0.349\)
Avg \(\boldsymbol{0.213}\) \(\boldsymbol{0.258}\) \(0.228\) \(0.279\) \(0.238\) \(\underline{0.263}\) \(0.249\) \(0.292\) \(0.330\) \(0.401\) \(0.268\) \(0.286\) \(0.727\) \(0.634\) \(\underline{0.226}\) \(0.269\) \(0.298\) \(0.301\) \(0.289\) \(0.321\) \(0.363\) \(0.337\)


Table 9: Full results of short-term forecasting. Results are reported for four prediction lengths {12, 24, 48, 96}, with the look-back length fixed to 96 for all baselines. A lower MSE or MAE indicates a better prediction, with the best result in boldface and the second best underlined.
Models DMSC (Ours) TimeMixer iTransformer PatchTST DLinear TimesNet Autoformer TimeXer PatchMLP TimeKAN AMD
Metric MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE
12 \(\boldsymbol{0.066}\) \(\boldsymbol{0.171}\) \(0.084\) \(0.194\) \(0.069\) \(\underline{0.175}\) \(0.105\) \(0.216\) \(0.122\) \(0.245\) \(0.088\) \(0.195\) \(0.224\) \(0.346\) \(\underline{0.068}\) \(0.179\) \(0.118\) \(0.191\) \(0.095\) \(0.210\) \(0.106\) \(0.225\)
24 \(\underline{0.092}\) \(\boldsymbol{0.203}\) \(0.130\) \(0.244\) \(0.099\) \(0.210\) \(0.198\) \(0.296\) \(0.202\) \(0.320\) \(0.118\) \(0.224\) \(0.492\) \(0.513\) \(\boldsymbol{0.089}\) \(\underline{0.204}\) \(0.204\) \(0.231\) \(0.166\) \(0.281\) \(0.174\) \(0.294\)
48 \(\boldsymbol{0.136}\) \(\boldsymbol{0.251}\) \(0.218\) \(0.317\) \(0.164\) \(0.275\) \(0.472\) \(0.466\) \(0.334\) \(0.428\) \(0.169\) \(0.268\) \(0.392\) \(0.459\) \(\underline{0.137}\) \(\underline{0.253}\) \(0.213\) \(0.314\) \(0.314\) \(0.393\) \(0.333\) \(0.417\)
96 \(\boldsymbol{0.230}\) \(\underline{0.339}\) \(0.327\) \(0.398\) \(0.711\) \(0.651\) \(0.458\) \(0.493\) \(0.459\) \(0.517\) \(\underline{0.239}\) \(\boldsymbol{0.330}\) \(0.944\) \(0.749\) \(0.427\) \(0.483\) \(0.347\) \(0.421\) \(0.558\) \(0.543\) \(0.510\) \(0.536\)
Avg \(\boldsymbol{0.131}\) \(\boldsymbol{0.241}\) \(0.190\) \(0.288\) \(0.261\) \(0.328\) \(0.308\) \(0.368\) \(0.279\) \(0.377\) \(\underline{0.154}\) \(\underline{0.254}\) \(0.513\) \(0.517\) \(0.180\) \(0.280\) \(0.220\) \(0.289\) \(0.283\) \(0.357\) \(0.281\) \(0.368\)
12 \(\boldsymbol{0.078}\) \(\boldsymbol{0.183}\) \(0.105\) \(0.216\) \(0.766\) \(0.709\) \(0.116\) \(0.230\) \(0.147\) \(0.272\) \(\underline{0.092}\) \(\underline{0.202}\) \(0.211\) \(0.341\) \(0.293\) \(0.397\) \(0.109\) \(0.204\) \(0.107\) \(0.222\) \(0.131\) \(0.256\)
24 \(\boldsymbol{0.102}\) \(\boldsymbol{0.215}\) \(0.168\) \(0.280\) \(0.799\) \(0.728\) \(0.216\) \(0.314\) \(0.225\) \(0.340\) \(\underline{0.111}\) \(\underline{0.224}\) \(0.394\) \(0.471\) \(0.308\) \(0.409\) \(0.129\) \(0.248\) \(0.178\) \(0.294\) \(0.196\) \(0.311\)
48 \(\boldsymbol{0.147}\) \(\boldsymbol{0.261}\) \(0.270\) \(0.359\) \(1.041\) \(0.882\) \(0.503\) \(0.489\) \(0.356\) \(0.437\) \(\underline{0.152}\) \(\underline{0.266}\) \(0.429\) \(0.463\) \(0.339\) \(0.425\) \(0.213\) \(0.326\) \(0.329\) \(0.409\) \(0.344\) \(0.420\)
96 \(\boldsymbol{0.190}\) \(\underline{0.316}\) \(0.377\) \(0.439\) \(1.045\) \(0.886\) \(0.623\) \(0.586\) \(0.453\) \(0.505\) \(\underline{0.197}\) \(\boldsymbol{0.308}\) \(0.853\) \(0.703\) \(0.367\) \(0.441\) \(0.361\) \(0.441\) \(0.572\) \(0.563\) \(0.638\) \(0.602\)
Avg \(\boldsymbol{0.129}\) \(\boldsymbol{0.244}\) \(0.230\) \(0.324\) \(0.913\) \(0.801\) \(0.365\) \(0.405\) \(0.295\) \(0.389\) \(\underline{0.138}\) \(\underline{0.250}\) \(0.472\) \(0.495\) \(0.327\) \(0.418\) \(0.203\) \(0.305\) \(0.297\) \(0.372\) \(0.327\) \(0.397\)
12 \(\boldsymbol{0.059}\) \(\boldsymbol{0.157}\) \(0.070\) \(0.173\) \(0.068\) \(0.169\) \(0.093\) \(0.206\) \(0.116\) \(0.241\) \(0.075\) \(0.179\) \(0.207\) \(0.335\) \(\underline{0.061}\) \(\underline{0.165}\) \(0.107\) \(0.178\) \(0.085\) \(0.199\) \(0.096\) \(0.223\)
24 \(\underline{0.078}\) \(\underline{0.179}\) \(0.109\) \(0.215\) \(0.087\) \(0.190\) \(0.195\) \(0.295\) \(0.209\) \(0.327\) \(0.083\) \(0.198\) \(0.314\) \(0.412\) \(\boldsymbol{0.071}\) \(\boldsymbol{0.177}\) \(0.112\) \(0.216\) \(0.149\) \(0.268\) \(0.231\) \(0.360\)
48 \(\underline{0.104}\) \(\underline{0.211}\) \(0.199\) \(0.296\) \(0.122\) \(0.231\) \(0.485\) \(0.469\) \(0.397\) \(0.456\) \(0.128\) \(0.235\) \(0.595\) \(0.553\) \(\boldsymbol{0.100}\) \(\boldsymbol{0.208}\) \(0.168\) \(0.282\) \(0.292\) \(0.383\) \(0.143\) \(0.272\)
96 \(\underline{0.128}\) \(\underline{0.226}\) \(0.283\) \(0.341\) \(0.159\) \(0.267\) \(0.979\) \(0.716\) \(0.592\) \(0.552\) \(0.150\) \(0.253\) \(0.556\) \(0.563\) \(\boldsymbol{0.120}\) \(\boldsymbol{0.221}\) \(0.275\) \(0.373\) \(0.531\) \(0.535\) \(0.312\) \(0.417\)
Avg \(\underline{0.092}\) \(\underline{0.193}\) \(0.165\) \(0.256\) \(0.109\) \(0.214\) \(0.438\) \(0.422\) \(0.329\) \(0.394\) \(0.109\) \(0.216\) \(0.418\) \(0.466\) \(\boldsymbol{0.088}\) \(\boldsymbol{0.192}\) \(0.166\) \(0.262\) \(0.264\) \(0.346\) \(0.196\) \(0.318\)
12 \(\boldsymbol{0.076}\) \(\boldsymbol{0.178}\) \(0.100\) \(0.208\) \(\underline{0.081}\) \(\underline{0.183}\) \(0.109\) \(0.222\) \(0.153\) \(0.259\) \(0.158\) \(0.192\) \(0.295\) \(0.391\) \(0.146\) \(0.198\) \(0.096\) \(0.206\) \(0.105\) \(0.219\) \(0.141\) \(0.262\)
24 \(0.124\) \(0.223\) \(0.171\) \(0.279\) \(\underline{0.118}\) \(\underline{0.222}\) \(0.205\) \(0.306\) \(0.238\) \(0.357\) \(\boldsymbol{0.112}\) \(\boldsymbol{0.219}\) \(0.345\) \(0.419\) \(0.171\) \(0.221\) \(0.144\) \(0.256\) \(0.179\) \(0.291\) \(0.243\) \(0.350\)
48 \(\boldsymbol{0.172}\) \(\underline{0.216}\) \(0.317\) \(0.380\) \(\underline{0.202}\) \(0.292\) \(0.493\) \(0.484\) \(0.473\) \(0.515\) \(0.231\) \(\boldsymbol{0.198}\) \(0.503\) \(0.495\) \(0.220\) \(0.270\) \(0.253\) \(0.351\) \(0.343\) \(0.405\) \(0.439\) \(0.476\)
96 \(\boldsymbol{0.278}\) \(\boldsymbol{0.296}\) \(0.468\) \(0.459\) \(0.395\) \(0.415\) \(0.582\) \(0.548\) \(0.748\) \(0.646\) \(\underline{0.291}\) \(0.338\) \(1.268\) \(0.857\) \(0.285\) \(\underline{0.303}\) \(0.482\) \(0.492\) \(0.720\) \(0.592\) \(0.886\) \(0.684\)
Avg \(\boldsymbol{0.162}\) \(\boldsymbol{0.228}\) \(0.264\) \(0.332\) \(0.199\) \(0.278\) \(0.347\) \(0.390\) \(0.403\) \(0.444\) \(\underline{0.198}\) \(\underline{0.236}\) \(0.603\) \(0.541\) \(0.206\) \(0.248\) \(0.244\) \(0.326\) \(0.337\) \(0.377\) \(0.427\) \(0.443\)


Table 10: Full results of the ablation study. Results are reported for four prediction lengths {96, 192, 336, 720}, with the look-back length fixed to 96 for all variants. A lower MSE or MAE indicates a better prediction, with the best result in boldface.
Design DMSC (Ours) w/o EMPD w/o TIB w/o ASR-MoE (1) static decomp (2) only \(\mathbf{F}_{intra}\) (3) w/o \(\mathbf{F}^l_{fused}\) (4) Agg. Heads (5) w/o \(\mathcal{E}^g\) (6) w/o \(\mathcal{E}^l\)
Metric MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE
96 \(\boldsymbol{0.138}\) \(\boldsymbol{0.223}\) \(0.155\) \(0.247\) \(0.155\) \(0.247\) \(0.175\) \(0.267\) \(0.151\) \(0.245\) \(0.161\) \(0.251\) \(0.157\) \(0.250\) \(0.166\) \(0.256\) \(0.160\) \(0.251\) \(0.171\) \(0.259\)
192 \(\boldsymbol{0.160}\) \(\boldsymbol{0.258}\) \(0.170\) \(0.259\) \(0.169\) \(0.259\) \(0.182\) \(0.275\) \(0.163\) \(0.254\) \(0.171\) \(0.261\) \(0.166\) \(0.257\) \(0.177\) \(0.266\) \(0.167\) \(0.259\) \(0.176\) \(0.264\)
336 \(\boldsymbol{0.167}\) \(\boldsymbol{0.253}\) \(0.179\) \(0.270\) \(0.186\) \(0.276\) \(0.192\) \(0.282\) \(0.176\) \(0.271\) \(0.188\) \(0.278\) \(0.186\) \(0.278\) \(0.194\) \(0.283\) \(0.192\) \(0.284\) \(0.202\) \(0.290\)
720 \(\boldsymbol{0.213}\) \(\boldsymbol{0.299}\) \(0.220\) \(0.305\) \(0.228\) \(0.311\) \(0.234\) \(0.317\) \(0.225\) \(0.312\) \(0.228\) \(0.312\) \(0.238\) \(0.321\) \(0.238\) \(0.319\) \(0.245\) \(0.325\) \(0.246\) \(0.328\)
Avg \(\boldsymbol{0.170}\) \(\boldsymbol{0.258}\) \(0.181\) \(0.270\) \(0.185\) \(0.273\) \(0.196\) \(0.285\) \(0.179\) \(0.271\) \(0.187\) \(0.276\) \(0.187\) \(0.277\) \(0.194\) \(0.281\) \(0.191\) \(0.280\) \(0.199\) \(0.285\)
96 \(\boldsymbol{0.160}\) \(\boldsymbol{0.210}\) \(0.161\) \(0.208\) \(0.173\) \(0.216\) \(0.167\) \(0.215\) \(0.160\) \(0.205\) \(0.166\) \(0.210\) \(0.167\) \(0.216\) \(0.173\) \(0.219\) \(0.170\) \(0.219\) \(0.164\) \(0.213\)
192 \(\boldsymbol{0.207}\) \(\boldsymbol{0.250}\) \(0.213\) \(0.253\) \(0.218\) \(0.255\) \(0.216\) \(0.260\) \(0.215\) \(0.253\) \(0.214\) \(0.254\) \(0.213\) \(0.254\) \(0.222\) \(0.260\) \(0.213\) \(0.254\) \(0.213\) \(0.252\)
336 \(\boldsymbol{0.253}\) \(\boldsymbol{0.284}\) \(0.269\) \(0.292\) \(0.274\) \(0.296\) \(0.271\) \(0.297\) \(0.264\) \(0.289\) \(0.269\) \(0.294\) \(0.266\) \(0.291\) \(0.271\) \(0.296\) \(0.273\) \(0.296\) \(0.256\) \(0.287\)
720 \(\boldsymbol{0.313}\) \(\boldsymbol{0.332}\) \(0.349\) \(0.346\) \(0.353\) \(0.343\) \(0.348\) \(0.349\) \(0.342\) \(0.340\) \(0.346\) \(0.343\) \(0.343\) \(0.332\) \(0.352\) \(0.348\) \(0.346\) \(0.344\) \(0.338\) \(0.331\)
Avg \(\boldsymbol{0.233}\) \(\boldsymbol{0.269}\) \(0.248\) \(0.275\) \(0.255\) \(0.278\) \(0.251\) \(0.280\) \(0.245\) \(0.272\) \(0.249\) \(0.275\) \(0.247\) \(0.273\) \(0.255\) \(0.281\) \(0.251\) \(0.278\) \(0.243\) \(0.271\)
96 \(\boldsymbol{0.389}\) \(\boldsymbol{0.259}\) \(0.479\) \(0.311\) \(0.486\) \(0.318\) \(0.484\) \(0.314\) \(0.443\) \(0.299\) \(0.461\) \(0.303\) \(0.459\) \(0.308\) \(0.494\) \(0.301\) \(0.494\) \(0.320\) \(0.451\) \(0.317\)
192 \(\boldsymbol{0.405}\) \(\boldsymbol{0.258}\) \(0.493\) \(0.316\) \(0.483\) \(0.314\) \(0.491\) \(0.313\) \(0.458\) \(0.300\) \(0.484\) \(0.311\) \(0.473\) \(0.309\) \(0.495\) \(0.308\) \(0.487\) \(0.317\) \(0.486\) \(0.329\)
336 \(\boldsymbol{0.398}\) \(\boldsymbol{0.286}\) \(0.503\) \(0.322\) \(0.490\) \(0.315\) \(0.503\) \(0.318\) \(0.471\) \(0.302\) \(0.493\) \(0.318\) \(0.485\) \(0.311\) \(0.507\) \(0.316\) \(0.505\) \(0.326\) \(0.479\) \(0.317\)
720 \(\boldsymbol{0.437}\) \(\boldsymbol{0.290}\) \(0.538\) \(0.341\) \(0.530\) \(0.337\) \(0.535\) \(0.334\) \(0.542\) \(0.349\) \(0.516\) \(0.327\) \(0.516\) \(0.328\) \(0.544\) \(0.321\) \(0.531\) \(0.334\) \(0.535\) \(0.342\)
Avg \(\boldsymbol{0.407}\) \(\boldsymbol{0.274}\) \(0.503\) \(0.323\) \(0.497\) \(0.321\) \(0.503\) \(0.320\) \(0.479\) \(0.313\) \(0.489\) \(0.315\) \(0.483\) \(0.314\) \(0.510\) \(0.312\) \(0.504\) \(0.324\) \(0.488\) \(0.326\)
96 \(\boldsymbol{0.187}\) \(\boldsymbol{0.231}\) \(0.221\) \(0.260\) \(0.232\) \(0.278\) \(0.242\) \(0.290\) \(0.208\) \(0.255\) \(0.237\) \(0.280\) \(0.240\) \(0.278\) \(0.241\) \(0.284\) \(0.226\) \(0.272\) \(0.234\) \(0.279\)
192 \(\boldsymbol{0.216}\) \(\boldsymbol{0.261}\) \(0.241\) \(0.273\) \(0.264\) \(0.295\) \(0.295\) \(0.324\) \(0.248\) \(0.281\) \(0.281\) \(0.298\) \(0.302\) \(0.316\) \(0.283\) \(0.303\) \(0.279\) \(0.304\) \(0.273\) \(0.303\)
336 \(\boldsymbol{0.223}\) \(\boldsymbol{0.269}\) \(0.264\) \(0.290\) \(0.283\) \(0.306\) \(0.266\) \(0.302\) \(0.261\) \(0.289\) \(0.289\) \(0.309\) \(0.301\) \(0.312\) \(0.289\) \(0.301\) \(0.276\) \(0.296\) \(0.305\) \(0.322\)
720 \(\boldsymbol{0.226}\) \(\boldsymbol{0.271}\) \(0.267\) \(0.288\) \(0.276\) \(0.293\) \(0.282\) \(0.304\) \(0.254\) \(0.280\) \(0.292\) \(0.303\) \(0.264\) \(0.289\) \(0.298\) \(0.306\) \(0.280\) \(0.299\) \(0.289\) \(0.306\)
Avg \(\boldsymbol{0.213}\) \(\boldsymbol{0.258}\) \(0.248\) \(0.278\) \(0.264\) \(0.293\) \(0.271\) \(0.305\) \(0.243\) \(0.276\) \(0.275\) \(0.298\) \(0.277\) \(0.299\) \(0.278\) \(0.299\) \(0.265\) \(0.293\) \(0.275\) \(0.303\)
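All entries in the tables above are MSE and MAE computed over the forecast window for each prediction length, with the per-dataset "Avg" row averaging across lengths. The snippet below is a minimal sketch of that metric computation; the array shapes, the z-score normalization assumption, and the placeholder random data are illustrative and are not the paper's evaluation code.

```python
import numpy as np

def mse_mae(y_true: np.ndarray, y_pred: np.ndarray):
    """MSE and MAE over a batch of forecast windows.

    Both arrays are assumed to have shape (num_windows, pred_len, num_variables)
    and to be z-score normalized, as is standard in TSF benchmarks.
    """
    err = y_pred - y_true
    return float(np.mean(err ** 2)), float(np.mean(np.abs(err)))

# Hypothetical usage: evaluate one model at each prediction length and then
# average, mirroring the per-length rows and the "Avg" rows in the tables.
pred_lens = [96, 192, 336, 720]          # {12, 24, 48, 96} for short-term tables
results = {}
for pl in pred_lens:
    y_true = np.random.randn(32, pl, 7)                   # placeholder ground truth
    y_pred = y_true + 0.1 * np.random.randn(32, pl, 7)    # placeholder forecasts
    results[pl] = mse_mae(y_true, y_pred)

avg_mse = np.mean([m for m, _ in results.values()])
avg_mae = np.mean([a for _, a in results.values()])
print(results, avg_mse, avg_mae)
```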

References↩︎

[1]
Alvarez, F. M.; Troncoso, A.; Riquelme, J. C.; and Ruiz, J. S. A. 2010. Energy time series forecasting based on pattern sequence similarity. IEEE Transactions on Knowledge and Data Engineering, 23(8): 1230–1243.
[2]
Guo, C.; Yang, B.; Andersen, O.; Jensen, C. S.; and Torp, K. 2015. Ecomark 2.0: empowering eco-routing with vehicular environmental models and actual vehicle fuel consumption data. GeoInformatica, 19(3): 567–599.
[3]
Tran, L.; Nguyen, M.; and Shahabi, C. 2019. Representation learning for early sepsis prediction. In 2019 Computing in Cardiology (CinC), 1–4. IEEE.
[4]
Wei, K.; Li, T.; Huang, F.; Chen, J.; and He, Z. 2022. Cancer classification with data augmentation based on generative adversarial networks. Frontiers of Computer Science, 16(2): 162601.
[5]
Guo, C.; Yang, B.; Hu, J.; Jensen, C. S.; and Chen, L. 2020. Context-aware, preference-based vehicle routing. The VLDB Journal, 29(5): 1149–1170.
[6]
Jin, K.; Wi, J.; Lee, E.; Kang, S.; Kim, S.; and Kim, Y. 2021. TrafficBERT: Pre-trained model with large-scale data for long-range traffic flow forecasting. Expert Systems with Applications, 186: 115738.
[7]
Bi, K.; Xie, L.; Zhang, H.; Chen, X.; Gu, X.; and Tian, Q. 2023. Accurate medium-range global weather forecasting with 3D neural networks. Nature, 619(7970): 533–538.
[8]
Wu, H.; Zhou, H.; Long, M.; and Wang, J. 2023. Interpretable weather forecasting for worldwide stations with a unified deep model. Nature Machine Intelligence, 5(6): 602–611.
[9]
Chen, Z.; Zheng, L.; Lu, C.; Yuan, J.; and Zhu, D. 2023. ChatGPT Informed Graph Neural Network for Stock Movement Prediction. SSRN Electronic Journal.
[10]
Yu, X.; Chen, Z.; Ling, Y.; Dong, S.; Liu, Z.; and Lu, Y. 2023. Temporal Data Meets LLM – Explainable Financial Time Series Forecasting.
[11]
Qiu, X.; Hu, J.; Zhou, L.; Wu, X.; Du, J.; Zhang, B.; Guo, C.; Zhou, A.; Jensen, C. S.; Sheng, Z.; and Yang, B. 2024. TFB: Towards Comprehensive and Fair Benchmarking of Time Series Forecasting Methods. Proceedings of the VLDB Endowment, 2363–2377.
[12]
Nason, G. P. 2006. Stationary and non-stationary time series. In Statistics in Volcanology. Geological Society of London.
[13]
Dagum, E. B.; and Bianconcini, S. 2016. Seasonal Adjustment Methods and Real Time Trend-Cycle Estimation, volume 8. Springer Nature.
[14]
Shao, Z.; Wang, F.; Xu, Y.; Wei, W.; Yu, C.; Zhang, Z.; Yao, D.; Sun, T.; Jin, G.; Cao, X.; Cong, G.; Jensen, C. S.; and Cheng, X. 2025. Exploring Progress in Multivariate Time Series Forecasting: Comprehensive Benchmarking and Heterogeneity Analysis. IEEE Transactions on Knowledge and Data Engineering, 291–305.
[15]
Chen, Z.; Ma, M.; Li, T.; Wang, H.; and Li, C. 2023. Long sequence time-series forecasting with deep learning: A survey. Information Fusion, 97: 101819.
[16]
Wang, S.; Li, J.; Shi, X.; Ye, Z.; Mo, B.; Lin, W.; Ju, S.; Chu, Z.; and Jin, M. 2025. TimeMixer++: A General Time Series Pattern Machine for Universal Predictive Analysis. arXiv:2410.16032.
[17]
Wang, S.; Wu, H.; Shi, X.; Hu, T.; Luo, H.; Ma, L.; Zhang, J. Y.; and Zhou, J. 2024. TimeMixer: Decomposable Multiscale Mixing for Time Series Forecasting. arXiv:2405.14616.
[18]
Huang, S.; Zhao, Z.; Li, C.; and Bai, L. 2025. TimeKAN: KAN-based Frequency Decomposition Learning Architecture for Long-term Time Series Forecasting. arXiv:2502.06910.
[19]
Li, C.; Li, M.; and Diao, R. 2025. TVNet: A Novel Time Series Analysis Method Based on Dynamic Convolution and 3D-Variation. arXiv:2503.07674.
[20]
Wu, H.; Hu, T.; Liu, Y.; Zhou, H.; Wang, J.; and Long, M. 2023. TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis. In 11th International Conference on Learning Representations, ICLR 2023.
[21]
Wang, Y.; Wu, H.; Dong, J.; Liu, Y.; Long, M.; and Wang, J. 2024. Deep Time Series Models: A Comprehensive Survey and Benchmark. arXiv:2407.13278.
[22]
Chi, H.; Liu, F.; Yang, W.; Lan, L.; Liu, T.; Han, B.; Cheung, W.; and Kwok, J. 2021. TOHAN: A one-step approach towards few-shot hypothesis adaptation. Advances in neural information processing systems, 34: 20970–20982.
[23]
Oreshkin, B. N.; Carpov, D.; Chapados, N.; and Bengio, Y. 2020. N-BEATS: Neural basis expansion analysis for interpretable time series forecasting. arXiv:1905.10437.
[24]
Yi, K.; Zhang, Q.; Fan, W.; Wang, S.; Wang, P.; He, H.; An, N.; Lian, D.; Cao, L.; and Niu, Z. 2023. Frequency-domain MLPs are more effective learners in time series forecasting. Advances in Neural Information Processing Systems, 36: 76656–76679.
[25]
Bai, S.; Kolter, J. Z.; and Koltun, V. 2018. An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling. arXiv:1803.01271.
[26]
Liu, M.; Zeng, A.; Chen, M.; Xu, Z.; Lai, Q.; Ma, L.; and Xu, Q. 2022. SCINet: Time Series Modeling and Forecasting with Sample Convolution and Interaction. In Advances in Neural Information Processing Systems, volume 35, 5816–5828.
[27]
Wang, H.; Peng, J.; Huang, F.; Wang, J.; Chen, J.; and Xiao, Y. 2023. MICN: Multi-scale Local and Global Context Modeling for Long-term Series Forecasting. In 11th International Conference on Learning Representations, ICLR 2023.
[28]
Luo, D.; and Wang, X. 2024. ModernTCN: A Modern Pure Convolution Structure for General Time Series Analysis. In The Twelfth International Conference on Learning Representations, 1–43.
[29]
Wang, J.; Xia, X.; Lan, L.; Wu, X.; Yu, J.; Yang, W.; Han, B.; and Liu, T. 2024. Tackling noisy labels with network parameter additive decomposition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(9): 6341–6354.
[30]
Wu, H.; Xu, J.; Wang, J.; and Long, M. 2021. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. Advances in neural information processing systems, 34: 22419–22430.
[31]
Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; and Zhang, W. 2021. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI conference on artificial intelligence, volume 35, 11106–11115.
[32]
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; and Polosukhin, I. 2017. Attention is all you need. Advances in neural information processing systems, 30.
[33]
Tang, P.; and Zhang, W. 2024. Unlocking the Power of Patch: Patch-Based MLP for Long-Term Time Series Forecasting. arXiv:2405.13575.
[34]
Zeng, A.; Chen, M.; Zhang, L.; and Xu, Q. 2023. Are transformers effective for time series forecasting? In Proceedings of the AAAI conference on artificial intelligence, volume 37, 11121–11128.
[35]
Nie, Y.; Nguyen, N. H.; Sinthong, P.; and Kalagnanam, J. 2023. A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. In 11th International Conference on Learning Representations, ICLR 2023.
[36]
Liu, Y.; Hu, T.; Zhang, H.; Wu, H.; Wang, S.; Ma, L.; and Long, M. 2024. iTransformer: Inverted Transformers Are Effective for Time Series Forecasting. arXiv:2310.06625.
[37]
Wang, Y.; Wu, H.; Dong, J.; Qin, G.; Zhang, H.; Liu, Y.; Qiu, Y.; Wang, J.; and Long, M. 2024. TimeXer: Empowering Transformers for Time Series Forecasting with Exogenous Variables. arXiv:2402.19072.
[38]
Hu, Y.; Liu, P.; Zhu, P.; Cheng, D.; and Dai, T. 2025. Adaptive Multi-Scale Decomposition Framework for Time Series Forecasting. arXiv:2406.03751.

  1. Corresponding author↩︎