TimeXer
- class pytorch_forecasting.models.timexer._timexer_v2.TimeXer(loss: Module, enc_in: int = None, hidden_size: int = 512, n_heads: int = 8, e_layers: int = 2, d_ff: int = 2048, dropout: float = 0.1, patch_length: int = 4, factor: int = 5, activation: str = 'relu', use_efficient_attention: bool = False, logging_metrics: list[Module] | None = None, optimizer: Optimizer | str | None = 'adam', optimizer_params: dict | None = None, lr_scheduler: str | None = None, lr_scheduler_params: dict | None = None, metadata: dict | None = None, **kwargs: Any)
Bases: TslibBaseModel
An implementation of the TimeXer model for v2 of pytorch-forecasting.
TimeXer empowers the canonical transformer with the ability to reconcile endogenous and exogenous information without any architectural modifications and achieves consistent state-of-the-art performance across twelve real-world forecasting benchmarks.
TimeXer uses patch-level representations for endogenous variables and variate-level representations for exogenous variables, with an endogenous global token acting as a bridge between the two. With this design, TimeXer can jointly capture intra-endogenous temporal dependencies and exogenous-to-endogenous correlations.
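The two token types described above can be sketched in plain PyTorch. This is an illustrative sketch only, not the library's internal code: the tensor sizes, the variable names, and the use of simple `nn.Linear` layers as embeddings are all assumptions made for the example.

```python
import torch
from torch import nn

# Hypothetical sizes, chosen only for illustration.
batch, context_length, n_exog = 2, 24, 3
patch_length, hidden_size = 4, 16

endog = torch.randn(batch, context_length)         # one endogenous (target) series
exog = torch.randn(batch, context_length, n_exog)  # exogenous covariates

# Patch-level tokens: cut the endogenous series into non-overlapping patches
# of length `patch_length`, then project each patch to `hidden_size`.
patches = endog.unfold(dimension=-1, size=patch_length, step=patch_length)
patch_tokens = nn.Linear(patch_length, hidden_size)(patches)  # (batch, 6, 16)

# One learnable global token is appended to the patch tokens (assumed layout).
global_token = nn.Parameter(torch.randn(1, 1, hidden_size))
endog_tokens = torch.cat(
    [patch_tokens, global_token.expand(batch, -1, -1)], dim=1
)  # (batch, 6 + 1, 16)

# Variate-level tokens: each exogenous series is embedded as a single token.
variate_tokens = nn.Linear(context_length, hidden_size)(exog.transpose(1, 2))
# variate_tokens: (batch, n_exog, 16)
```

With a context length of 24 and a patch length of 4, the endogenous series yields 6 patch tokens plus 1 global token, while each of the 3 exogenous series contributes exactly one token regardless of its length.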
- Parameters:
loss (nn.Module) – Loss function to use for training.
enc_in (int, optional) – Number of input features for the encoder. If not provided, it will be set to the number of continuous features in the dataset.
hidden_size (int, default=512) – Dimension of the model embeddings and hidden representations of features.
n_heads (int, default=8) – Number of attention heads in the multi-head attention mechanism.
e_layers (int, default=2) – Number of encoder layers in the transformer architecture.
d_ff (int, default=2048) – Dimension of the feed-forward network in the transformer architecture.
dropout (float, default=0.1) – Dropout rate for regularization. This is used throughout the model to prevent overfitting.
patch_length (int, default=4) – Length of each non-overlapping patch for endogenous variable tokenization.
factor (int, default=5) – Factor for the attention mechanism, controlling the number of keys and values.
activation (str, default='relu') – Activation function to use in the feed-forward network. Common choices are ‘relu’, ‘gelu’, etc.
use_efficient_attention (bool, default=False) – If set to True, the model uses PyTorch’s native, optimized Scaled Dot Product Attention implementation, which can reduce computation time and memory consumption for longer sequences. PyTorch automatically selects the backend (FlashAttention-2, Memory-Efficient Attention, or PyTorch’s own C++ implementation) based on the input properties, hardware capabilities, and build configuration.
logging_metrics (Optional[list[nn.Module]], default=None) – List of metrics to log during training, validation, and testing.
optimizer (Optional[Union[Optimizer, str]], default='adam') – Optimizer to use for training. Can be a string name or an instance of an optimizer.
optimizer_params (Optional[dict], default=None) – Parameters for the optimizer. If None, default parameters for the optimizer will be used.
lr_scheduler (Optional[str], default=None) – Learning rate scheduler to use. If None, no scheduler is used.
lr_scheduler_params (Optional[dict], default=None) – Parameters for the learning rate scheduler. If None, default parameters for the scheduler will be used.
metadata (Optional[dict], default=None) – Metadata for the model from TslibDataModule. This can include information about the dataset, such as the number of time steps, number of features, etc. It is used to initialize the model and ensure it is compatible with the data being used, including the split between endogenous (target) and exogenous covariates.
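As a rough illustration of what use_efficient_attention switches to, the sketch below compares an explicit softmax attention with PyTorch's `torch.nn.functional.scaled_dot_product_attention` (available in PyTorch 2.0 and later). The tensor shapes are assumptions: a per-head dimension of 64 corresponds to the default hidden_size of 512 split across 8 heads, and the sequence length is arbitrary. It does not reproduce the model's own attention module.

```python
import math

import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Assumed shapes: (batch, n_heads, sequence length, per-head dimension).
batch, n_heads, seq_len, head_dim = 2, 8, 16, 64
q = torch.randn(batch, n_heads, seq_len, head_dim)
k = torch.randn(batch, n_heads, seq_len, head_dim)
v = torch.randn(batch, n_heads, seq_len, head_dim)

# Reference computation: softmax(Q K^T / sqrt(d)) V, which materializes
# the full (seq_len x seq_len) score matrix.
scores = q @ k.transpose(-2, -1) / math.sqrt(head_dim)
manual = torch.softmax(scores, dim=-1) @ v

# Fused computation: PyTorch picks a backend based on inputs and hardware.
fused = F.scaled_dot_product_attention(q, k, v)

# Largest element-wise difference between the two results.
max_diff = (manual - fused).abs().max().item()
```

Both paths compute the same quantity up to floating-point error, so the flag should affect speed and memory use rather than the result.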
References
[1] https://arxiv.org/abs/2402.19072
[2] https://github.com/thuml/TimeXer
Notes
- [1] This implementation handles only continuous variables over the context length. Support for categorical variables will be added in the future.
- [2] The TimeXer model obtains many of its attributes from the TslibBaseModel class, a base class that implements much of the boilerplate code for metadata handling and model initialization.
Methods
forward(x) – Forward pass of the TimeXer model.