Example Notebook for a basic vignette for `pytorch-forecasting v2` Model Training and Inference#

warning:: The “Data Pipeline” showcased here is part of an experimental rework of the pytorch-forecasting data layer, planned for release in v2.0.0. The API is currently unstable and subject to change without prior notice. This notebook serves as a basic demonstration of the intended workflow and is not recommended for use in production environments. Feedback and suggestions are highly encouraged — please share them in issue 1736.

In this notebook, we demonstrate how to train and evaluate the Temporal Fusion Transformer (TFT) using the new TimeSeries and DataModule API from the v2 pipeline. We can do this in 2 ways:

High-level package API: This approach handles data loading, dataloader creation, and model training internally. It provides a simple, scikit-learn-like fit → predict workflow. Users can still configure key training options (such as the trainer, callbacks, and training parameters) but cannot plug in fully custom trainer implementations or override internal pipeline logic.

Low-level 3-stage pipeline: This involves explicitly constructing:
- a TimeSeries object
- a DataModule
- the model (e.g., TFT)

This workflow is ideal if you need custom setups such as custom trainers, callbacks, or advanced data preprocessing. It requires a deeper understanding of how the three layers (TimeSeries, DataModule, and the model) interact, but offers maximum flexibility.

Create Synthetic data#

We generate a synthetic dataset with load_toydata, which creates a pandas DataFrame containing only numerical values, because the pipeline currently assumes numerical data.

[2]:

from pytorch_forecasting.data.examples import load_toydata

[3]:

num_series = 100  # Number of individual time series to generate
seq_length = 50  # Length of each time series
data_df = load_toydata(num_series, seq_length)
data_df.head()

[3]:

	time_idx	x	y	future_known_feature	static_feature
0	0	-0.030643	0.148280	1.000000	0.039213
1	1	0.148280	0.433029	0.995004	0.039213
2	2	0.433029	0.742511	0.980067	0.039213
3	3	0.742511	0.729270	0.955336	0.039213
4	4	0.729270	0.628604	0.921061	0.039213
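The printed rows hint at how the toy columns relate to each other: each y value equals the x value of the next time step, and future_known_feature matches cos(time_idx / 10) to six decimals. This is read off the five rows shown above, not taken from the load_toydata source, so treat it as an observation about this output only. A minimal check in plain Python (the helper names are made up for this illustration):

```python
import math

# Rows copied from the data_df.head() output:
# (time_idx, x, y, future_known_feature)
head_rows = [
    (0, -0.030643, 0.148280, 1.000000),
    (1, 0.148280, 0.433029, 0.995004),
    (2, 0.433029, 0.742511, 0.980067),
    (3, 0.742511, 0.729270, 0.955336),
    (4, 0.729270, 0.628604, 0.921061),
]


def target_is_next_x(rows):
    """True if y at step t equals x at step t + 1 for all consecutive rows."""
    return all(cur[2] == nxt[1] for cur, nxt in zip(rows, rows[1:]))


def known_feature_is_cosine(rows, tol=1e-6):
    """True if future_known_feature matches cos(time_idx / 10) within tol."""
    return all(abs(math.cos(t / 10) - known) <= tol for t, _, _, known in rows)


print(target_is_next_x(head_rows))         # True
print(known_feature_is_cosine(head_rows))  # True
```

If this pattern holds for the whole dataset, the forecasting task is a one-step-ahead prediction of x, which fits the max_prediction_length=1 setting used later.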

High-level API#

Steps#

Create the TimeSeries object.
Create configs for the model, datamodule, trainer, etc.
Create the model_pkg object.
Call model_pkg.fit and model_pkg.predict.

Create Dataset object#

TimeSeries returns the raw data as tensors.

Key arguments of the TimeSeries dataset:

data: DataFrame with sequence data.
time: Integer-typed column denoting the time index within data.
target: Column(s) in data denoting the forecasting target.
group: List of column names identifying a time series instance within data.
num: List of numerical features.
cat: List of categorical features.
known: List of features known in the future.
unknown: List of features not known in the future.
static: List of variables that do not change over time.
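The role lists overlap by design: num and cat describe a feature's type, while known, unknown, and static describe its availability over time. In the example used in this notebook, every feature has exactly one type and exactly one availability role. The sketch below checks that property for a set of lists. check_feature_roles is a hypothetical helper written for this illustration, not part of pytorch-forecasting, and the library itself may not enforce these constraints.

```python
def check_feature_roles(num, cat, known, unknown, static):
    """Return a list of problems found in the role lists (empty if none)."""
    problems = []
    typed = set(num) | set(cat)
    roles = [
        ("known", set(known)),
        ("unknown", set(unknown)),
        ("static", set(static)),
    ]

    # A feature should not be numerical and categorical at the same time.
    both = set(num) & set(cat)
    if both:
        problems.append(f"both numerical and categorical: {sorted(both)}")

    # Every typed feature should have exactly one availability role.
    for feature in sorted(typed):
        assigned = [name for name, members in roles if feature in members]
        if len(assigned) != 1:
            problems.append(f"{feature!r} has roles {assigned}, expected one")

    # Every feature with a role should also have a type.
    for name, members in roles:
        untyped = members - typed
        if untyped:
            problems.append(f"{name} features missing from num/cat: {sorted(untyped)}")
    return problems


# The lists used for the TimeSeries object in this notebook.
print(check_feature_roles(
    num=["x", "future_known_feature", "static_feature"],
    cat=["category", "static_feature_cat"],
    known=["future_known_feature"],
    unknown=["x", "category"],
    static=["static_feature", "static_feature_cat"],
))  # []
```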

[4]:

from pytorch_forecasting.data.timeseries import TimeSeries

[5]:

# create `TimeSeries` dataset that returns the raw data in terms of tensors
dataset = TimeSeries(
    data=data_df,
    time="time_idx",
    target="y",
    group=["series_id"],
    num=["x", "future_known_feature", "static_feature"],
    cat=["category", "static_feature_cat"],
    known=["future_known_feature"],
    unknown=["x", "category"],
    static=["static_feature", "static_feature_cat"],
)

/content/pytorch-forecasting/pytorch_forecasting/data/timeseries/_timeseries_v2.py:105: UserWarning: TimeSeries is part of an experimental rework of the pytorch-forecasting data layer, scheduled for release with v2.0.0. The API is not stable and may change without prior warning. For beta testing, but not for stable production use. Feedback and suggestions are very welcome in pytorch-forecasting issue 1736, https://github.com/sktime/pytorch-forecasting/issues/1736
  warn(

Create the configs#

[13]:

from sklearn.preprocessing import StandardScaler
from pytorch_forecasting.data.encoders import (
    NaNLabelEncoder,
    TorchNormalizer,
)
from pytorch_forecasting.metrics import MAE, SMAPE

Here we use EncoderDecoderTimeSeriesDataModule.

EncoderDecoderTimeSeriesDataModule key arguments:

time_series_dataset: TimeSeries dataset instance.
max_encoder_length: Maximum length of the encoder input sequence.
max_prediction_length: Maximum length of the decoder output sequence.
batch_size: Batch size for DataLoader.
categorical_encoders: Dictionary of categorical encoders.
scalers: Dictionary of feature scalers.
target_normalizer: Normalizer for the target variable.
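To make max_encoder_length and max_prediction_length concrete, the sketch below slices a single series into (encoder, decoder) windows. It is a simplified illustration of the encoder/decoder idea only: it assumes every window uses the full encoder length, and encoder_decoder_windows is a hypothetical function, not the DataModule's implementation, which may also handle shorter windows, padding, and train/validation/test splits.

```python
def encoder_decoder_windows(series, max_encoder_length, max_prediction_length):
    """Yield (encoder, decoder) slices for every full-length window."""
    window = max_encoder_length + max_prediction_length
    for start in range(len(series) - window + 1):
        split = start + max_encoder_length
        # Encoder: the observed history; decoder: the steps to forecast.
        yield series[start:split], series[split:start + window]


toy_series = list(range(50))  # stand-in for one series with 50 time steps
windows = list(encoder_decoder_windows(toy_series, 30, 1))

print(len(windows))         # 20
print(windows[0][1])        # [30]
print(len(windows[0][0]))   # 30
```

With 30 encoder steps and 1 prediction step, each sample asks the model to forecast one value from the 30 values that precede it.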

[14]:

datamodule_cfg = dict(
    max_encoder_length=30,
    max_prediction_length=1,
    batch_size=32,
    categorical_encoders={
        "category": NaNLabelEncoder(add_nan=True),
        "static_feature_cat": NaNLabelEncoder(add_nan=True),
    },
    scalers={
        "x": StandardScaler(),
        "future_known_feature": StandardScaler(),
        "static_feature": StandardScaler(),
    },
    target_normalizer=TorchNormalizer(),
)
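Each entry in scalers maps a column name to a scikit-learn StandardScaler, which centres a column and divides it by its population standard deviation. The plain-Python sketch below reproduces that arithmetic on the five x values printed earlier. The standardize function is written for this illustration; how and when the DataModule fits the scalers is not shown here.

```python
import statistics


def standardize(values):
    """Scale values to zero mean and unit variance (population std, ddof=0)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    scale = std or 1.0  # constant columns are only centred, not divided by zero
    return [(value - mean) / scale for value in values]


# The x column from the data_df.head() output above.
x_values = [-0.030643, 0.148280, 0.433029, 0.742511, 0.729270]
scaled = standardize(x_values)

print([round(value, 3) for value in scaled])
```

After scaling, the values have mean 0 and standard deviation 1, which keeps features with different ranges on a comparable scale for the network.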

We use the TFT model in this tutorial.

[15]:

model_cfg = dict(
    loss=MAE(),
    logging_metrics=[MAE(), SMAPE()],
    optimizer="adam",
    optimizer_params={"lr": 1e-3},
    lr_scheduler="reduce_lr_on_plateau",
    lr_scheduler_params={"mode": "min", "factor": 0.1, "patience": 10},
    hidden_size=64,
    num_layers=2,
    attention_head_size=4,
    dropout=0.1,
)
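The lr_scheduler_params above describe a reduce-on-plateau rule: multiply the learning rate by factor once the monitored loss has failed to improve for more than patience consecutive checks. The sketch below is a simplified version of that rule in plain Python. It ignores the improvement threshold, cooldown, and minimum learning rate that PyTorch's ReduceLROnPlateau also supports, and it assumes one scheduler check per epoch, which is an assumption about how the model steps the scheduler.

```python
import math


def plateau_schedule(losses, lr=1e-3, factor=0.1, patience=10):
    """Return the learning rate after each epoch under a simplified plateau rule."""
    best = math.inf
    bad_epochs = 0
    history = []
    for loss in losses:
        if loss < best:
            best, bad_epochs = loss, 0   # improvement: reset the counter
        else:
            bad_epochs += 1              # no improvement this epoch
        if bad_epochs > patience:
            lr *= factor                 # plateau detected: shrink the step size
            bad_epochs = 0
        history.append(lr)
    return history


# Five epochs without improvement: patience=10 is never exceeded.
print(plateau_schedule([1.0] * 5))
# Twelve flat epochs: the rate drops after the 11th non-improving check.
print(plateau_schedule([1.0] * 12)[-1])
```

Under these assumptions, a run of only a few epochs would finish before the scheduler can reduce the learning rate, so a smaller patience may be worth trying for short experiments.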

[16]:

trainer_cfg = dict(
    max_epochs=5,
    accelerator="auto",
    devices=1,
    enable_progress_bar=True,
    log_every_n_steps=10,
)

[17]:

from pytorch_forecasting.models.temporal_fusion_transformer._tft_pkg_v2 import (
    TFT_pkg_v2,
)

Create the `model_pkg` object#

The pkg class wraps the whole ML pipeline in pytorch-forecasting: we define the pkg object once and then call pkg.fit and pkg.predict to train the model and generate predictions.

[18]:

model_pkg = TFT_pkg_v2(
    model_cfg=model_cfg,
    trainer_cfg=trainer_cfg,
    datamodule_cfg=datamodule_cfg,
)

{'loss': MAE(), 'logging_metrics': [MAE(), SMAPE()], 'optimizer': 'adam', 'optimizer_params': {'lr': 0.001}, 'lr_scheduler': 'reduce_lr_on_plateau', 'lr_scheduler_params': {'mode': 'min', 'factor': 0.1, 'patience': 10}, 'hidden_size': 64, 'num_layers': 2, 'attention_head_size': 4, 'dropout': 0.1}

[ ]:

model_pkg.fit(dataset)  # You can also pass in a DataModule here

/content/pytorch-forecasting/pytorch_forecasting/data/data_module.py:129: UserWarning: EncoderDecoderTimeSeriesDataModule is part of an experimental rework of the pytorch-forecasting data layer, scheduled for release with v2.0.0. The API is not stable and may change without prior warning. For beta testing, but not for stable production use. Feedback and suggestions are very welcome in pytorch-forecasting issue 1736, https://github.com/sktime/pytorch-forecasting/issues/1736
  warn(
/content/pytorch-forecasting/pytorch_forecasting/models/base/_base_model_v2.py:64: UserWarning: The Model 'TFT' is part of an experimental reworkof the pytorch-forecasting model layer, scheduled for release with v2.0.0. The API is not stable and may change without prior warning. This class is intended for beta testing and as a basic skeleton, but not for stable production use. Feedback and suggestions are very welcome in pytorch-forecasting issue 1736, https://github.com/sktime/pytorch-forecasting/issues/1736
  warn(
INFO: GPU available: True (cuda), used: True
INFO: TPU available: False, using: 0 TPU cores
INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:
  | Name                  | Type               | Params | Mode
---------------------------------------------------------------------
0 | loss                  | MAE                | 0      | train
1 | encoder_var_selection | Sequential         | 709    | train
2 | decoder_var_selection | Sequential         | 193    | train
3 | static_context_linear | Linear             | 192    | train
4 | lstm_encoder          | LSTM               | 51.5 K | train
5 | lstm_decoder          | LSTM               | 50.4 K | train
6 | self_attention        | MultiheadAttention | 16.6 K | train
7 | pre_output            | Linear             | 4.2 K  | train
8 | output_layer          | Linear             | 65     | train
---------------------------------------------------------------------
123 K     Trainable params
0         Non-trainable params
123 K     Total params
0.495     Total estimated model params size (MB)
18        Modules in train mode
0         Modules in eval mode

INFO: `Trainer.fit` stopped: `max_epochs=5` reached.

Artifacts saved in: /content/pytorch-forecasting/checkpoints

PosixPath('/content/pytorch-forecasting/checkpoints/best-epoch=3-step=168.ckpt')
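The checkpoint file name carries some information about the run. Assuming Lightning's usual conventions (a zero-based epoch counter in the file name and one optimizer step per batch, with no gradient accumulation), the arithmetic below estimates the number of batches per epoch and the range of training-set sizes compatible with it. It also assumes the last partial batch is kept. These are assumptions about the setup, not values reported by the library.

```python
def steps_per_epoch(global_step, zero_based_epoch):
    """Average optimizer steps per completed epoch."""
    completed_epochs = zero_based_epoch + 1
    return global_step / completed_epochs


def sample_count_range(steps, batch_size):
    """Smallest and largest sample counts that yield `steps` batches
    when the last, possibly partial, batch is kept."""
    return (steps - 1) * batch_size + 1, steps * batch_size


# Values read from 'best-epoch=3-step=168.ckpt' and batch_size=32.
steps = steps_per_epoch(global_step=168, zero_based_epoch=3)
print(steps)                                 # 42.0
print(sample_count_range(int(steps), 32))    # (1313, 1344)
```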

Output#

The output of the TFT model is a dict with the key prediction:

y_pred["prediction"]: Tensor of shape (batch_size, prediction_length, output_size)

[ ]:

preds = model_pkg.predict(dataset, return_info=["index", "x", "y"])
# You can also pass in a DataModule or Dataloader here

/content/pytorch-forecasting/pytorch_forecasting/data/data_module.py:129: UserWarning: EncoderDecoderTimeSeriesDataModule is part of an experimental rework of the pytorch-forecasting data layer, scheduled for release with v2.0.0. The API is not stable and may change without prior warning. For beta testing, but not for stable production use. Feedback and suggestions are very welcome in pytorch-forecasting issue 1736, https://github.com/sktime/pytorch-forecasting/issues/1736
  warn(
INFO: 💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
INFO: GPU available: True (cuda), used: True
INFO: TPU available: False, using: 0 TPU cores
INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

[21]:

print("First Predicted Value:")
print("Index:", preds["index"][0].item())
print("Prediction:", preds["prediction"][0].item())
print("Actual:", preds["y"][0].item())

First Predicted Value:
Index: -0.0801810473203659
Prediction: 0.11192154139280319
Actual: -0.1557866632938385
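To relate the printed pair to the metrics configured earlier, the sketch below computes the absolute error (the per-sample term of MAE) and a symmetric absolute percentage error for this single prediction. The SMAPE formula used here, 2 * |prediction - actual| / (|prediction| + |actual|), is one common definition. Whether the library's SMAPE class matches it exactly (for example, with an added epsilon) is an assumption.

```python
def absolute_error(prediction, actual):
    """Per-sample term that MAE averages over all samples."""
    return abs(prediction - actual)


def symmetric_ape(prediction, actual):
    """2 * |prediction - actual| / (|prediction| + |actual|)."""
    return 2 * abs(prediction - actual) / (abs(prediction) + abs(actual))


# Values copied from the output above.
prediction = 0.11192154139280319
actual = -0.1557866632938385

print(round(absolute_error(prediction, actual), 6))  # 0.267708
print(round(symmetric_ape(prediction, actual), 6))   # 2.0
```

Because the prediction and the actual value have opposite signs, this definition of SMAPE reaches its upper bound of 2 for the sample, even though the absolute error is moderate.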

3-stage pipeline#

Steps#

Create TimeSeries Dataset object
Create DataModule object
Initialize, Train & Run Inference with the Model

Create Dataset & DataModule#

TimeSeries returns the raw data as tensors.
DataModule wraps the dataset, handles splits, preprocessing, and batching, and exposes metadata for model initialisation.

Initialize the Model#

We initialize the TFT model using the metadata provided by the DataModule. This metadata includes all required dimensional info for the encoder, decoder, and static inputs.

Train the Model#

We use a Trainer from PyTorch Lightning to train the model.

Run Inference#

After training, we can make predictions using the trained model.

1. Create the dataset#

We create a TimeSeries dataset instance that returns the raw data as tensors. This raw data is then passed to the data_module, which internally handles the dataloaders and preprocessing.

Key arguments of the TimeSeries dataset:

data: DataFrame with sequence data.
time: Integer-typed column denoting the time index within data.
target: Column(s) in data denoting the forecasting target.
group: List of column names identifying a time series instance within data.
num: List of numerical features.
cat: List of categorical features.
known: List of features known in the future.
unknown: List of features not known in the future.
static: List of variables that do not change over time.

[22]:

from pytorch_forecasting.data.timeseries import TimeSeries

[23]:

# create `TimeSeries` dataset that returns the raw data in terms of tensors
dataset = TimeSeries(
    data=data_df,
    time="time_idx",
    target="y",
    group=["series_id"],
    num=["x", "future_known_feature", "static_feature"],
    cat=["category", "static_feature_cat"],
    known=["future_known_feature"],
    unknown=["x", "category"],
    static=["static_feature", "static_feature_cat"],
)

/content/pytorch-forecasting/pytorch_forecasting/data/timeseries/_timeseries_v2.py:105: UserWarning: TimeSeries is part of an experimental rework of the pytorch-forecasting data layer, scheduled for release with v2.0.0. The API is not stable and may change without prior warning. For beta testing, but not for stable production use. Feedback and suggestions are very welcome in pytorch-forecasting issue 1736, https://github.com/sktime/pytorch-forecasting/issues/1736
  warn(

2. Create datamodule#

EncoderDecoderTimeSeriesDataModule key arguments:

time_series_dataset: TimeSeries dataset instance.
max_encoder_length: Maximum length of the encoder input sequence.
max_prediction_length: Maximum length of the decoder output sequence.
batch_size: Batch size for DataLoader.
categorical_encoders: Dictionary of categorical encoders.
scalers: Dictionary of feature scalers.
target_normalizer: Normalizer for the target variable.

[ ]:

from sklearn.preprocessing import StandardScaler
from pytorch_forecasting.data.data_module import EncoderDecoderTimeSeriesDataModule
from pytorch_forecasting.data.encoders import (
    NaNLabelEncoder,
    TorchNormalizer,
)

[ ]:

# create the `data_module` that handles the dataloaders and preprocessing
data_module = EncoderDecoderTimeSeriesDataModule(
    time_series_dataset=dataset,
    max_encoder_length=30,
    max_prediction_length=1,
    batch_size=32,
    categorical_encoders={
        "category": NaNLabelEncoder(add_nan=True),
        "static_feature_cat": NaNLabelEncoder(add_nan=True),
    },
    scalers={
        "x": StandardScaler(),
        "future_known_feature": StandardScaler(),
        "static_feature": StandardScaler(),
    },
    target_normalizer=TorchNormalizer(),
)

/content/pytorch-forecasting/pytorch_forecasting/data/data_module.py:129: UserWarning: EncoderDecoderTimeSeriesDataModule is part of an experimental rework of the pytorch-forecasting data layer, scheduled for release with v2.0.0. The API is not stable and may change without prior warning. For beta testing, but not for stable production use. Feedback and suggestions are very welcome in pytorch-forecasting issue 1736, https://github.com/sktime/pytorch-forecasting/issues/1736
  warn(

3. Initialise and train the model#

To initialise the model, you no longer have to pass arguments like encoder_cont, decoder_cont, etc., as they are calculated internally from the metadata property of EncoderDecoderTimeSeriesDataModule. You still have to pass other parameters such as loss, optimizer, etc.

model = TFT(
    loss=nn.MSELoss(),
    logging_metrics=[MAE(), SMAPE()],
    metadata=data_module.metadata,  # <-- crucial for model setup
    ...
)

The metadata includes:

max_encoder_length, max_prediction_length
number of continuous/categorical variables in encoder/decoder
number of static features

These are used to configure internal layers like encoder_cont, decoder_cat, etc.
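As a rough picture of the kind of information such metadata carries, the sketch below derives feature counts from the role lists used in this notebook. Everything here is hypothetical: the key names are invented for illustration, and the counting rule (the encoder sees known and unknown time-varying features, the decoder sees only known ones, and the target is not counted) is an assumption. The real metadata property may use different keys and count differently.

```python
def sketch_metadata(num, cat, known, unknown, static,
                    max_encoder_length, max_prediction_length):
    """Hypothetical metadata dict; key names are illustrative only."""
    num, cat, static = set(num), set(cat), set(static)
    # Assumption: encoder sees all time-varying features, decoder only known ones.
    encoder_features = (set(known) | set(unknown)) - static
    decoder_features = set(known) - static
    return {
        "max_encoder_length": max_encoder_length,
        "max_prediction_length": max_prediction_length,
        "encoder_cont": len(encoder_features & num),
        "encoder_cat": len(encoder_features & cat),
        "decoder_cont": len(decoder_features & num),
        "decoder_cat": len(decoder_features & cat),
        "static_cont": len(static & num),
        "static_cat": len(static & cat),
    }


meta = sketch_metadata(
    num=["x", "future_known_feature", "static_feature"],
    cat=["category", "static_feature_cat"],
    known=["future_known_feature"],
    unknown=["x", "category"],
    static=["static_feature", "static_feature_cat"],
    max_encoder_length=30,
    max_prediction_length=1,
)
print(meta["encoder_cont"], meta["encoder_cat"])  # 2 1
print(meta["decoder_cont"], meta["decoder_cat"])  # 1 0
print(meta["static_cont"], meta["static_cat"])    # 1 1
```

To see the actual structure, inspect data_module.metadata directly after constructing the DataModule.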

[ ]:

from pytorch_forecasting.metrics import MAE, SMAPE
from pytorch_forecasting.models.temporal_fusion_transformer._tft_v2 import TFT

[60]:

# Initialise the Model
model = TFT(
    loss=MAE(),
    logging_metrics=[MAE(), SMAPE()],
    optimizer="adam",
    optimizer_params={"lr": 1e-3},
    lr_scheduler="reduce_lr_on_plateau",
    lr_scheduler_params={"mode": "min", "factor": 0.1, "patience": 10},
    hidden_size=64,
    num_layers=2,
    attention_head_size=4,
    dropout=0.1,
    metadata=data_module.metadata,  # pass the metadata from the datamodule to the model
    # to initialise important params like `encoder_cont` etc
)

/content/pytorch-forecasting/pytorch_forecasting/models/base/_base_model_v2.py:64: UserWarning: The Model 'TFT' is part of an experimental reworkof the pytorch-forecasting model layer, scheduled for release with v2.0.0. The API is not stable and may change without prior warning. This class is intended for beta testing and as a basic skeleton, but not for stable production use. Feedback and suggestions are very welcome in pytorch-forecasting issue 1736, https://github.com/sktime/pytorch-forecasting/issues/1736
  warn(

We use a Trainer from PyTorch Lightning to train the model:

trainer = Trainer(max_epochs=5, ...)
trainer.fit(model, data_module)

The Trainer:

Pulls data from data_module
Handles device placement
Logs training progress and metrics

[ ]:

from lightning.pytorch import Trainer

[61]:

# Train the model
print("\nTraining model...")
trainer = Trainer(
    max_epochs=5,
    accelerator="auto",
    devices=1,
    enable_progress_bar=True,
    log_every_n_steps=10,
)

trainer.fit(model, data_module)

INFO: 💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
INFO: GPU available: True (cuda), used: True
INFO: TPU available: False, using: 0 TPU cores
INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:
  | Name                  | Type               | Params | Mode
---------------------------------------------------------------------
0 | loss                  | MAE                | 0      | train
1 | encoder_var_selection | Sequential         | 709    | train
2 | decoder_var_selection | Sequential         | 193    | train
3 | static_context_linear | Linear             | 192    | train
4 | lstm_encoder          | LSTM               | 51.5 K | train
5 | lstm_decoder          | LSTM               | 50.4 K | train
6 | self_attention        | MultiheadAttention | 16.6 K | train
7 | pre_output            | Linear             | 4.2 K  | train
8 | output_layer          | Linear             | 65     | train
---------------------------------------------------------------------
123 K     Trainable params
0         Non-trainable params
123 K     Total params
0.495     Total estimated model params size (MB)
18        Modules in train mode
0         Modules in eval mode

Training model...

INFO: `Trainer.fit` stopped: `max_epochs=5` reached.

Output#

The output of the TFT model is a dict with the key prediction:

y_pred["prediction"]: Tensor of shape (batch_size, prediction_length, output_size)

[ ]:

data_module.setup(stage="test")
test_dataloader = data_module.test_dataloader()

[ ]:

preds = model.predict(test_dataloader, return_info=["index", "x", "y"])

INFO: 💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
INFO: GPU available: True (cuda), used: True
INFO: TPU available: False, using: 0 TPU cores
INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

[ ]:

print("First Predicted Value:")
print("Index:", preds["index"][0].item())
print("Prediction:", preds["prediction"][0].item())
print("Actual:", preds["y"][0].item())

First Predicted Value:
Index: 0.11104673147201538
Prediction: -0.001255139708518982
Actual: 0.07348770648241043
