M Layer (Model)

Try the API v2 pre-release!

You are viewing the documentation for the v1 models. A new API, version 2, is in development.
Try it out before release: v2 Models | v2 API Reference
Caution: v2 is a work in progress and unstable. It is not yet production-ready.

Model parameters depend heavily on the dataset the model is intended for.

PyTorch Forecasting provides a .from_dataset() method for each model. It takes a TimeSeriesDataSet plus any additional parameters that cannot be derived directly from the dataset, such as learning_rate or hidden_size.
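The pattern behind .from_dataset() can be illustrated without the library. The sketch below uses hypothetical ToyDataset and ToyModel classes (neither is part of PyTorch Forecasting) to show the idea: values the dataset already knows are derived automatically, and only the remaining hyperparameters are passed explicitly.

```python
from dataclasses import dataclass


@dataclass
class ToyDataset:
    """Hypothetical stand-in for a TimeSeriesDataSet."""

    reals: list
    categoricals: list
    max_encoder_length: int


class ToyModel:
    """Hypothetical model; real models subclass BaseModel instead."""

    def __init__(self, n_features, encoder_length, learning_rate=1e-3, hidden_size=16):
        self.n_features = n_features
        self.encoder_length = encoder_length
        self.learning_rate = learning_rate
        self.hidden_size = hidden_size

    @classmethod
    def from_dataset(cls, dataset, **kwargs):
        # parameters that follow directly from the dataset
        derived = {
            "n_features": len(dataset.reals) + len(dataset.categoricals),
            "encoder_length": dataset.max_encoder_length,
        }
        # explicitly passed keyword arguments take precedence
        derived.update(kwargs)
        return cls(**derived)


dataset = ToyDataset(reals=["price", "volume"], categoricals=["store"], max_encoder_length=24)
model = ToyModel.from_dataset(dataset, learning_rate=0.03, hidden_size=32)
print(model.n_features, model.encoder_length, model.learning_rate, model.hidden_size)
```

With the library itself, the corresponding call typically has the shape TemporalFusionTransformer.from_dataset(training, learning_rate=0.03, hidden_size=16), where training is a TimeSeriesDataSet.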

Models can be tuned with optuna. For example, tuning of the TemporalFusionTransformer is implemented by optimize_hyperparameters().

Available Models

Here is an overview of the pros and cons of the implemented models:

Name

Authors

Covariates

Multiple targets

Regression

Classification

Probabilistic

Prediction intervals

Flexible History Length

Cold Start

Compute (1-5)

DecoderMLP

jdb78

x

x

x

x

x

x

x

x

1

DeepAR

jdb78

x

x

x

x

x

x

3

NBeatsKAN

Sohaib-Ahmed21

x

1

NBeats

jdb78

x

1

NHiTS

jdb78

x

x

x

x

x

1

RecurrentNetwork

jdb78

x

x

x

x

x

x

2

TemporalFusionTransformer

jdb78

x

x

x

x

x

x

x

x

4

TiDEModel

Sohaib-Ahmed21

x

x

x

x

x

3

TimeXer

PranavBhatP

x

x

x

x

x

3

xLSTMTime

muslehal, phoeenniixx

x

x

x

x

3

Implementing new architectures

Please see the Using custom data and implementing custom models tutorial and the extension templates to understand how to implement basic and more advanced models.

Every model should inherit from a base model in base.

class pytorch_forecasting.models.base._base_model.BaseModel(dataset_parameters: dict[str, Any] = None, log_interval: int | float = -1, log_val_interval: int | float = None, learning_rate: float | list[float] = 0.001, log_gradient_flow: bool = False, loss: Metric = SMAPE(), logging_metrics: ModuleList = ModuleList(), reduce_on_plateau_patience: int = 1000, reduce_on_plateau_reduction: float = 2.0, reduce_on_plateau_min_lr: float = 1e-05, weight_decay: float = 0.0, optimizer_params: dict[str, Any] = None, monotone_constraints: dict[str, int] = {}, output_transformer: Callable = None, optimizer='adam')[source]

BaseModel from which new timeseries models should inherit. The hparams of the created object will default to the parameters indicated in __init__().

The forward() method should return a named tuple with at least the entry prediction that contains the network’s output. See the function’s documentation for more details.
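To make the "named tuple with at least the entry prediction" concrete, here is a minimal sketch built on collections.namedtuple. The helper make_output is hypothetical and only mimics the role of to_network_output(); it is not the library's implementation, and plain lists stand in for tensors.

```python
from collections import namedtuple


def make_output(**results):
    """Hypothetical helper: bundle keyword results into a named tuple."""
    if "prediction" not in results:
        raise ValueError("the output needs at least a 'prediction' entry")
    Output = namedtuple("Output", list(results))
    return Output(**results)


# plain lists stand in for the tensors a real network would return
out = make_output(prediction=[0.1, 0.2], attention=[0.5, 0.5])
print(out.prediction)
print(out._fields)
```

Additional entries, such as the hypothetical attention field here, can travel alongside the prediction without changing code that only reads prediction.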

The idea of the base model is that common methods do not have to be re-implemented for every new architecture. The class is a [LightningModule](https://pytorch-lightning.readthedocs.io/en/latest/lightning_module.html) and follows its conventions. However, there are important additions:

  • You need to specify a loss attribute that stores the function to calculate the MultiHorizonLoss for backpropagation.

  • The from_dataset() method can be used to initialize a network using the specifications of a dataset. Often, parameters such as the number of features can be easily deduced from the dataset. Further, the method will also store how to rescale normalized predictions into the unnormalized prediction space. Override it to pass additional arguments to the __init__ method of your network that depend on your dataset.

  • The transform_output() method rescales the network output using the target normalizer from the dataset.

  • The step() method takes care of calculating the loss, logging additional metrics defined in the logging_metrics attribute, and plotting sample predictions. You can override this method to add custom interpretations or pass extra arguments to the network’s forward method.

  • The on_epoch_end() method can be used to calculate summaries of each epoch, such as statistics on the encoder length. It needs to return the outputs.

  • The predict() method makes predictions using a dataloader or dataset. Override it if you need to pass additional arguments to forward by default.
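As a rough illustration of the rescaling that transform_output() performs, the sketch below assumes a simple standardizing normalizer whose target_scale is a (center, scale) pair. The function rescale is hypothetical; actual normalizers may apply further transformations, and the library operates on tensors rather than lists.

```python
def rescale(normalized, target_scale):
    """Hypothetical helper: map normalized values back to the original scale.

    Assumes target_scale = (center, scale), i.e. the inverse of
    normalized = (value - center) / scale.
    """
    center, scale = target_scale
    return [value * scale + center for value in normalized]


predictions = rescale([-1.0, 0.0, 1.5], target_scale=(100.0, 20.0))
print(predictions)
```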

To implement your own architecture, it is best to go through the Using custom data and implementing custom models tutorial and to look at existing models to understand what might be a good approach.

Example

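from pytorch_forecasting.metrics import SMAPE
from pytorch_forecasting.models import BaseModel


class Network(BaseModel):

    def __init__(self, my_first_parameter: int = 2, loss=SMAPE()):
        # store the arguments of __init__ in self.hparams
        self.save_hyperparameters()
        super().__init__(loss=loss)

    def forward(self, x):
        # self.module is a placeholder for the layers of your network
        normalized_prediction = self.module(x)
        # rescale the normalized output into the unnormalized prediction space
        prediction = self.transform_output(prediction=normalized_prediction, target_scale=x["target_scale"])
        return self.to_network_output(prediction=prediction)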

BaseModel for timeseries forecasting from which to inherit.

Parameters:
  • log_interval (Union[int, float], optional) – Batches after which predictions are logged. If < 1.0, will log multiple entries per batch. Defaults to -1.

  • log_val_interval (Union[int, float], optional) – batches after which predictions for validation are logged. Defaults to None/log_interval.

  • learning_rate (float, optional) – Learning rate. Defaults to 1e-3.

  • log_gradient_flow (bool) – Whether to log gradient flow. This takes time and should only be done to diagnose training failures. Defaults to False.

  • loss (Metric, optional) – metric to optimize, can also be list of metrics. Defaults to SMAPE().

  • logging_metrics (nn.ModuleList[MultiHorizonMetric]) – list of metrics that are logged during training. Defaults to [].

  • reduce_on_plateau_patience (int) – patience after which the learning rate is reduced by the factor given in reduce_on_plateau_reduction. Defaults to 1000.

  • reduce_on_plateau_reduction (float) – reduction in learning rate when encountering plateau. Defaults to 2.0.

  • reduce_on_plateau_min_lr (float) – minimum learning rate for reduce on plateau learning rate scheduler. Defaults to 1e-5

  • weight_decay (float) – weight decay. Defaults to 0.0.

  • optimizer_params (Dict[str, Any]) – additional parameters for the optimizer. Defaults to {}.

  • monotone_constraints (Dict[str, int]) – dictionary of monotonicity constraints for continuous decoder variables mapping position (e.g. "0" for first position) to constraint (-1 for negative and +1 for positive, larger numbers add more weight to the constraint vs. the loss but are usually not necessary). This constraint significantly slows down training. Defaults to {}.

  • output_transformer (Callable) – transformer that takes network output and transforms it to prediction space. Defaults to None which is equivalent to lambda out: out["prediction"].

  • optimizer (str) – Optimizer, “ranger”, “sgd”, “adam”, “adamw” or class name of optimizer in torch.optim or pytorch_optimizer. Alternatively, a class or function can be passed which takes parameters as first argument and a lr argument (optionally also weight_decay). Defaults to “adam”.
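How the learning_rate, reduce_on_plateau_reduction and reduce_on_plateau_min_lr parameters interact can be worked through with plain arithmetic. The sketch assumes that each plateau divides the learning rate by reduce_on_plateau_reduction and that the result is clipped at reduce_on_plateau_min_lr. The function plateau_schedule is hypothetical and tracks neither validation loss nor patience.

```python
def plateau_schedule(learning_rate=1e-3, reduction=2.0, min_lr=1e-5, n_plateaus=8):
    """Hypothetical helper: learning rate after each successive plateau."""
    rates = [learning_rate]
    for _ in range(n_plateaus):
        # assumption: divide by the reduction factor, never drop below min_lr
        rates.append(max(rates[-1] / reduction, min_lr))
    return rates


rates = plateau_schedule()
for step, rate in enumerate(rates):
    print(f"after {step} plateaus: lr = {rate:.3e}")
```

Under these assumptions and the default values, the learning rate halves on every plateau and stops changing once it reaches the floor of 1e-5.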

Details and available models

See the API documentation for further details on available models:

models.deepar.DeepAR([cell_type, ...])

DeepAR: Probabilistic forecasting with autoregressive recurrent networks.

models.mlp.DecoderMLP([activation_class, ...])

MLP on the decoder.

models.nbeats.NBeats([stack_types, ...])

Initialize NBeats Model - use its from_dataset() method if possible.

models.nbeats.NBeatsKAN([stack_types, ...])

Initialize NBeatsKAN Model - use its from_dataset() method if possible.

models.nhits.NHiTS([output_size, ...])

Initialize N-HiTS Model - use its from_dataset() method if possible.

models.rnn.RecurrentNetwork([cell_type, ...])

Recurrent Network.

models.temporal_fusion_transformer.TemporalFusionTransformer([...])

Temporal Fusion Transformer for forecasting timeseries.

models.tide.TiDEModel(output_chunk_length, ...)

TiDE model for long-term time-series forecasting.

models.timexer.TimeXer(context_length, ...)

TimeXer model for time series forecasting with exogenous variables.

models.xlstm.xLSTMTime(input_size, ...[, ...])

xLSTMTime is a long‑term time series forecasting architecture built on the extended LSTM (xLSTM) design, incorporating either the scalar-memory stabilized LSTM (sLSTM) or the matrix-memory mLSTM variant.