TSLib for v2 - Example notebook for full pipeline#

Basic imports for getting started#

This notebook is a basic vignette showing how to use the tslib data module with the TimeXer model in v2 of PyTorch Forecasting. The v2 API is experimental and unstable, so it may change without notice.

Feedback and suggestions on this pipeline are welcome in PR #1836.

[ ]:

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
import torch

from pytorch_forecasting.data.data_module import TslibDataModule
from pytorch_forecasting.data.encoders import (
    NaNLabelEncoder,
    TorchNormalizer,
)
from pytorch_forecasting.data.timeseries import TimeSeries
from pytorch_forecasting.models.timexer._timexer_v2 import TimeXer

Construct a time series dataset#

This step builds a TimeSeries object, which wraps a raw time series dataset and identifies the role of each feature column. The cell below first generates a synthetic sample dataset to work with.

[2]:

num_series = 100
seq_length = 50
data_list = []
for i in range(num_series):
    x = np.arange(seq_length)
    # Noisy sine wave; each series draws its own noise.
    y = np.sin(x / 5.0) + np.random.normal(scale=0.1, size=seq_length)
    category = i % 5
    static_value = np.random.rand()  # constant within a series
    # Stop one step early so that y[t + 1] always exists.
    for t in range(seq_length - 1):
        data_list.append(
            {
                "series_id": i,
                "time_idx": t,
                "x": y[t],  # current value, used as input
                "y": y[t + 1],  # next value, used as target
                "category": category,
                "future_known_feature": np.cos(t / 10),
                "static_feature": static_value,
                "static_feature_cat": i % 3,
            }
        )
data_df = pd.DataFrame(data_list)
data_df.head()

[2]:

	time_idx	x	y	future_known_feature	static_feature
0	0	0.177658	0.181124	1.000000	0.409581
1	1	0.181124	0.314081	0.995004	0.409581
2	2	0.314081	0.601934	0.980067	0.409581
3	3	0.601934	0.733805	0.955336	0.409581
4	4	0.733805	0.768843	0.921061	0.409581
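The loop above sets the target `y` at step t to the input `x` at step t + 1, which is also visible in the printed rows. The following is a minimal sketch of that one-step shift for a single series, written with array slicing instead of a loop. The seeded generator `rng` and the names `signal`, `x_col`, `y_col` and `rows_total` are illustrative and are not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so this sketch is reproducible
seq_length = 50
num_series = 100

# One noisy sine wave, generated in the same way as in the loop above.
signal = np.sin(np.arange(seq_length) / 5.0) + rng.normal(scale=0.1, size=seq_length)

# "x" holds the value at step t, "y" holds the value at step t + 1.
x_col = signal[:-1]
y_col = signal[1:]

# The target is the input shifted by one step, so y[t] == x[t + 1].
assert np.allclose(y_col[:-1], x_col[1:])

# Each series contributes seq_length - 1 rows to the data frame.
rows_total = num_series * (seq_length - 1)
print(rows_total)  # 4900
```

Because one row per series is lost to the shift, each series has 49 usable time steps rather than 50.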

Feature Categories and Definitions#

``time_idx``#

Definition: The temporal index column that orders observations chronologically
Example: Sequential time steps (0, 1, 2, …) or timestamps
Usage: Identifies the temporal ordering of data points within each time series

``target``#

Definition: The variable you want to predict/forecast
Example: Sales volume, stock price, temperature readings
Usage: The dependent variable that the model learns to forecast

``group``#

Definition: Categorical variables that identify different time series entities
Example: series_id, store_id, product_id, customer_id
Usage: Distinguishes between multiple time series in the dataset

``num``#

Definition: Numerical/continuous features used as model inputs
Example: Price, quantity, weather data, economic indicators
Usage: Continuous variables that provide numerical context for predictions

``cat``#

Definition: Categorical features that represent discrete classes or labels
Example: Product category, day of week, seasonal indicators, region
Usage: Discrete variables that provide categorical context for predictions

``known``#

Definition: Future values that are known at prediction time (exogenous variables)
Example: Holidays, planned promotions, scheduled events, calendar features
Usage: Information available for both historical and future periods

``unknown``#

Definition: Variables only available during training/historical periods
Example: Past weather conditions, historical prices, competitor actions
Usage: Features that help with training but aren’t available for future predictions

``static``#

Definition: Time-invariant features that remain constant for each time series
Example: Store size, product attributes, geographic location, customer demographics
Usage: Entity-specific characteristics that don’t change over time
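The categories above overlap: a column has a type (`num` or `cat`), an availability (`known` or `unknown`) and may also be `static`. Since every role is given as a list of column names, a misspelled name is easy to miss. The sketch below shows one way to check the lists against the available columns before building the dataset. `check_roles` is a hypothetical helper written for this illustration, not part of PyTorch Forecasting.

```python
def check_roles(columns, **roles):
    """Return {role: [names not found in columns]} for roles with problems."""
    available = set(columns)
    problems = {}
    for role, names in roles.items():
        missing = [name for name in names if name not in available]
        if missing:
            problems[role] = missing
    return problems


# Column names of the sample data frame built earlier in this notebook.
columns = [
    "series_id", "time_idx", "x", "y", "category",
    "future_known_feature", "static_feature", "static_feature_cat",
]

problems = check_roles(
    columns,
    group=["series_id"],
    num=["x", "future_known_feature", "static_feature"],
    cat=["category", "static_feature_cat"],
    known=["future_known_feature"],
    unknown=["x", "category"],
    static=["static_feature", "static_feature_cat"],
)
print(problems)  # {} because every name matches a column

# A name that is not a column is reported under its role.
print(check_roles(columns, num=["x", "not_a_column"]))  # {'num': ['not_a_column']}
```

With a data frame, `columns` could be taken from `data_df.columns`.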

[3]:

dataset = TimeSeries(
    data=data_df,
    time="time_idx",
    target="y",
    group=["series_id"],
    num=["x", "future_known_feature", "static_feature"],
    cat=["category", "static_feature_cat"],
    known=["future_known_feature"],
    unknown=["x", "category"],
    static=["static_feature", "static_feature_cat"],
)

/home/aryan/pytorch-forecasting/pytorch_forecasting/data/timeseries/_timeseries_v2.py:105: UserWarning: TimeSeries is part of an experimental rework of the pytorch-forecasting data layer, scheduled for release with v2.0.0. The API is not stable and may change without prior warning. For beta testing, but not for stable production use. Feedback and suggestions are very welcome in pytorch-forecasting issue 1736, https://github.com/sktime/pytorch-forecasting/issues/1736
  warn(

Initialise the `TslibDataModule` using the dataset#

This step initialises a basic data module built specifically for tslib models. It provides all the metadata required to train and use the tslib model of your choice. Refer to the implementation of TslibDataModule for more information.

[4]:

data_module = TslibDataModule(
    time_series_dataset=dataset,
    context_length=30,
    prediction_length=1,
    add_relative_time_idx=True,
    target_normalizer=TorchNormalizer(),
    categorical_encoders={
        "category": NaNLabelEncoder(add_nan=True),
        "static_feature_cat": NaNLabelEncoder(add_nan=True),
    },
    scalers={
        "x": StandardScaler(),
        "future_known_feature": StandardScaler(),
        "static_feature": StandardScaler(),
    },
    batch_size=32,
)

/home/aryan/pytorch-forecasting/pytorch_forecasting/data/_tslib_data_module.py:275: UserWarning: TslibDataModule is experimental and subject to change. The API is not stable and may change without prior warning.
  warnings.warn(

[5]:

data_module.metadata

[5]:

{'feature_names': {'categorical': ['category', 'static_feature_cat'],
  'continuous': ['x', 'future_known_feature', 'static_feature'],
  'static': ['static_feature', 'static_feature_cat'],
  'known': ['future_known_feature'],
  'unknown': ['x', 'category', 'static_feature', 'static_feature_cat'],
  'target': ['y'],
  'all': ['x',
   'category',
   'future_known_feature',
   'static_feature',
   'static_feature_cat'],
  'static_categorical': ['static_feature_cat'],
  'static_continuous': ['static_feature']},
 'feature_indices': {'categorical': [1, 4],
  'continuous': [0, 2, 3],
  'static': [],
  'known': [2],
  'unknown': [0, 1, 3, 4],
  'target': [0]},
 'n_features': {'categorical': 2,
  'continuous': 3,
  'static': 2,
  'known': 1,
  'unknown': 4,
  'target': 1,
  'all': 5,
  'static_categorical': 1,
  'static_continuous': 1},
 'context_length': 30,
 'prediction_length': 1,
 'freq': 'h',
 'features': 'MS'}
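In the output above, the `categorical`, `continuous`, `known` and `unknown` entries of `feature_indices` appear to be positions in the `feature_names['all']` list. The sketch below reproduces those four lists from the printed names. It is a reading of the printed metadata, not the data module's actual implementation, and it does not cover the `static` and `target` entries, which seem to follow a different convention.

```python
# Names copied from the printed metadata.
all_features = [
    "x", "category", "future_known_feature",
    "static_feature", "static_feature_cat",
]
groups = {
    "categorical": ["category", "static_feature_cat"],
    "continuous": ["x", "future_known_feature", "static_feature"],
    "known": ["future_known_feature"],
    "unknown": ["x", "category", "static_feature", "static_feature_cat"],
}

# Position of each feature in the combined feature list.
position = {name: i for i, name in enumerate(all_features)}

indices = {
    group: sorted(position[name] for name in names)
    for group, names in groups.items()
}
print(indices)
# {'categorical': [1, 4], 'continuous': [0, 2, 3], 'known': [2], 'unknown': [0, 1, 3, 4]}
```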

Initialise the model#

We shall try out two versions of this model: one trained with the point loss MAE() and one trained with QuantileLoss(), which predicts several quantiles per time step.

Let us quickly import the required packages for the next steps.

[ ]:

from pytorch_forecasting.metrics import MAE, SMAPE, QuantileLoss

[7]:

model1 = TimeXer(
    loss=MAE(),
    hidden_size=64,
    nhead=4,
    e_layers=2,
    d_ff=256,
    dropout=0.1,
    patch_length=4,
    logging_metrics=[MAE(), SMAPE()],
    optimizer="adam",
    optimizer_params={"lr": 1e-3},
    lr_scheduler="reduce_lr_on_plateau",
    lr_scheduler_params={
        "mode": "min",
        "factor": 0.5,
        "patience": 5,
    },
    metadata=data_module.metadata,
)

/home/aryan/pytorch-forecasting/pytorch_forecasting/models/base/_base_model_v2.py:61: UserWarning: The Model 'TimeXer' is part of an experimental reworkof the pytorch-forecasting model layer, scheduled for release with v2.0.0. The API is not stable and may change without prior warning. This class is intended for beta testing and as a basic skeleton, but not for stable production use. Feedback and suggestions are very welcome in pytorch-forecasting issue 1736, https://github.com/sktime/pytorch-forecasting/issues/1736
  warn(
/home/aryan/pytorch-forecasting/pytorch_forecasting/models/base/_tslib_base_model_v2.py:60: UserWarning: The Model 'TimeXer' is part of an experimental implementationof the pytorch-forecasting model layer for Time Series Library, scheduledfor release with v2.0.0. The API is not stableand may change without prior warning. This class is intended for betatesting, not for stable production use.
  warn(
/home/aryan/pytorch-forecasting/pytorch_forecasting/models/timexer/_timexer_v2.py:133: UserWarning: TimeXer is an experimental model implemented on TslibBaseModelV2. It is an unstable version and maybe subject to unannouced changes.Please use with caution. Feedback on the design and implementation iswelcome. On the issue #1833 - https://github.com/sktime/pytorch-forecasting/issues/1833
  warn.warn(
/home/aryan/pytorch-forecasting/pytorch_forecasting/models/timexer/_timexer_v2.py:179: UserWarning: Context length (30) is not divisible by patch length. This may lead to unexpected behavior, as sometime steps will not be used in the model.
  warn.warn(
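The last warning above is triggered because `context_length=30` is not a multiple of `patch_length=4`. The arithmetic can be sketched as follows, assuming non-overlapping patches and one additional global token per series, as described in the TimeXer paper. Both points are assumptions about the implementation, although the resulting size agrees with the `in_features=512` of the `FlattenHead` printed later in this notebook.

```python
context_length = 30
patch_length = 4
hidden_size = 64

# Number of full patches, and time steps that do not fit into any patch.
n_patches, leftover = divmod(context_length, patch_length)

# Assumption: one extra global token is appended to the patch tokens.
n_tokens = n_patches + 1

# Flattening all tokens gives the input width of the prediction head.
head_in_features = hidden_size * n_tokens

print(n_patches, leftover, head_in_features)  # 7 2 512
```

Under these assumptions, two of the 30 context steps are not used. A context length that is a multiple of the patch length, such as 28 or 32, would avoid the warning.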

[8]:

model2 = TimeXer(
    loss=QuantileLoss(quantiles=[0.1, 0.5, 0.9]),  # quantiles of 0.1, 0.5 and 0.9 used.
    hidden_size=64,
    nhead=4,
    e_layers=2,
    d_ff=256,
    dropout=0.1,
    patch_length=4,
    optimizer="adam",
    optimizer_params={"lr": 1e-3},
    lr_scheduler="reduce_lr_on_plateau",
    lr_scheduler_params={
        "mode": "min",
        "factor": 0.5,
        "patience": 5,
    },
    metadata=data_module.metadata,
)
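For intuition about what the quantile loss optimises, here is a minimal sketch of the standard pinball loss for a single observation. It is the textbook definition, not necessarily the exact formula or scaling used by `QuantileLoss` in PyTorch Forecasting, and `pinball_loss` is a name chosen for this illustration.

```python
def pinball_loss(target, prediction, quantile):
    """Textbook pinball (quantile) loss for one observation."""
    error = target - prediction
    return max(quantile * error, (quantile - 1) * error)


# For the 0.9 quantile, predicting too low costs more than predicting too high.
under = pinball_loss(target=1.0, prediction=0.5, quantile=0.9)
over = pinball_loss(target=1.0, prediction=1.5, quantile=0.9)
print(under > over)  # True

# For the 0.5 quantile, the loss is symmetric (half the absolute error).
print(pinball_loss(1.0, 0.5, 0.5) == pinball_loss(1.0, 1.5, 0.5))  # True
```

This asymmetry is what pushes the 0.1 output below and the 0.9 output above the median prediction.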

[9]:

from lightning.pytorch import Trainer

trainer1 = Trainer(
    max_epochs=5,
    accelerator="auto",
    devices=1,
    enable_progress_bar=True,
    enable_model_summary=True,
)

trainer2 = Trainer(
    max_epochs=4,
    accelerator="auto",
    devices=1,
    enable_progress_bar=True,
    enable_model_summary=True,
)

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs

Fit the trainer on the model and feed data using the data module#

[10]:

trainer1.fit(model1, data_module)

You are using a CUDA device ('NVIDIA GeForce RTX 4050 Laptop GPU') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name         | Type                   | Params | Mode
----------------------------------------------------------------
0 | loss         | MAE                    | 0      | train
1 | en_embedding | EnEmbedding            | 320    | train
2 | ex_embedding | DataEmbedding_inverted | 2.0 K  | train
3 | encoder      | Encoder                | 133 K  | train
4 | head         | FlattenHead            | 513    | train
----------------------------------------------------------------
136 K     Trainable params
0         Non-trainable params
136 K     Total params
0.546     Total estimated model params size (MB)
57        Modules in train mode
0         Modules in eval mode

/home/aryan/pytorch-forecasting/.venv/lib/python3.12/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:425: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=15` in the `DataLoader` to improve performance.
/home/aryan/pytorch-forecasting/.venv/lib/python3.12/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:425: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=15` in the `DataLoader` to improve performance.
/home/aryan/pytorch-forecasting/.venv/lib/python3.12/site-packages/lightning/pytorch/loops/fit_loop.py:310: The number of training batches (42) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.

`Trainer.fit` stopped: `max_epochs=5` reached.

Now let us train the model using QuantileLoss.

[11]:

trainer2.fit(model2, data_module)

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name         | Type                   | Params | Mode
----------------------------------------------------------------
0 | loss         | QuantileLoss           | 0      | train
1 | en_embedding | EnEmbedding            | 320    | train
2 | ex_embedding | DataEmbedding_inverted | 2.0 K  | train
3 | encoder      | Encoder                | 133 K  | train
4 | head         | FlattenHead            | 1.5 K  | train
----------------------------------------------------------------
137 K     Trainable params
0         Non-trainable params
137 K     Total params
0.550     Total estimated model params size (MB)
57        Modules in train mode
0         Modules in eval mode

/home/aryan/pytorch-forecasting/.venv/lib/python3.12/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:425: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=15` in the `DataLoader` to improve performance.
/home/aryan/pytorch-forecasting/.venv/lib/python3.12/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:425: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=15` in the `DataLoader` to improve performance.
/home/aryan/pytorch-forecasting/.venv/lib/python3.12/site-packages/lightning/pytorch/loops/fit_loop.py:310: The number of training batches (42) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.

`Trainer.fit` stopped: `max_epochs=4` reached.

Test the model#

[12]:

test_metrics = trainer1.test(model1, data_module)

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/home/aryan/pytorch-forecasting/.venv/lib/python3.12/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:425: The 'test_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=15` in the `DataLoader` to improve performance.

────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       Test metric             DataLoader 0
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
        test_MAE            0.13894660770893097
       test_SMAPE           0.40041154623031616
        test_loss           0.13894660770893097
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
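In the table above, `test_loss` equals `test_MAE` because `model1` was trained with `MAE()` as its loss. The sketch below shows how the two logged metrics can be computed on plain Python lists. The SMAPE variant used here (factor 2, values between 0 and 2) is one common definition and is an assumption; the library's implementation may differ in details such as a small epsilon in the denominator.

```python
def mae(targets, predictions):
    """Mean absolute error."""
    errors = [abs(t - p) for t, p in zip(targets, predictions)]
    return sum(errors) / len(errors)


def smape(targets, predictions):
    """Symmetric MAPE with factor 2 (assumed variant, range 0 to 2)."""
    terms = []
    for t, p in zip(targets, predictions):
        denominator = abs(t) + abs(p)
        # Treat a 0/0 term as a perfect prediction.
        terms.append(0.0 if denominator == 0 else 2 * abs(t - p) / denominator)
    return sum(terms) / len(terms)


targets = [1.0, 2.0]
predictions = [1.5, 2.0]
print(mae(targets, predictions))    # 0.25
print(smape(targets, predictions))  # 0.2
```

Because SMAPE divides by the magnitude of the values, it can be large for targets close to zero, which occur regularly in a sine wave.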

[13]:

model1.eval()

[13]:

TimeXer(
  (loss): MAE()
  (en_embedding): EnEmbedding(
    (value_embedding): Linear(in_features=4, out_features=64, bias=False)
    (position_embedding): PositionalEmbedding()
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (ex_embedding): DataEmbedding_inverted(
    (value_embedding): Linear(in_features=30, out_features=64, bias=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): Encoder(
    (layers): ModuleList(
      (0-1): 2 x EncoderLayer(
        (self_attention): AttentionLayer(
          (inner_attention): FullAttention(
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (query_projection): Linear(in_features=64, out_features=64, bias=True)
          (key_projection): Linear(in_features=64, out_features=64, bias=True)
          (value_projection): Linear(in_features=64, out_features=64, bias=True)
          (out_projection): Linear(in_features=64, out_features=64, bias=True)
        )
        (cross_attention): AttentionLayer(
          (inner_attention): FullAttention(
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (query_projection): Linear(in_features=64, out_features=64, bias=True)
          (key_projection): Linear(in_features=64, out_features=64, bias=True)
          (value_projection): Linear(in_features=64, out_features=64, bias=True)
          (out_projection): Linear(in_features=64, out_features=64, bias=True)
        )
        (conv1): Conv1d(64, 256, kernel_size=(1,), stride=(1,))
        (conv2): Conv1d(256, 64, kernel_size=(1,), stride=(1,))
        (norm1): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
        (norm2): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
        (norm3): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
    (norm): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
  )
  (head): FlattenHead(
    (flatten): Flatten(start_dim=-2, end_dim=-1)
    (linear): Linear(in_features=512, out_features=1, bias=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
)

[14]:

with torch.no_grad():
    test_batch = next(iter(data_module.test_dataloader()))
    x_test, y_test = test_batch
    y_pred = model1(x_test)

    print("Prediction:", y_pred["prediction"])

Prediction: tensor([[[ 0.1253]],

        [[ 0.2623]],

        [[ 0.4591]],

        [[ 0.6304]],

        [[ 0.7916]],

        [[ 0.9132]],

        [[ 1.0252]],

        [[ 1.1069]],

        [[ 1.1370]],

        [[ 1.1317]],

        [[ 1.0659]],

        [[ 0.9617]],

        [[ 0.8297]],

        [[ 0.6622]],

        [[ 0.5254]],

        [[ 0.3310]],

        [[ 0.1579]],

        [[-0.0506]],

        [[-0.1999]],

        [[ 0.0740]],

        [[ 0.2787]],

        [[ 0.4506]],

        [[ 0.6381]],

        [[ 0.7867]],

        [[ 0.9343]],

        [[ 1.0370]],

        [[ 1.1286]],

        [[ 1.1737]],

        [[ 1.1367]],

        [[ 1.0765]],

        [[ 0.9569]],

        [[ 0.8583]]])

[15]:

y_pred["prediction"].shape

[15]:

torch.Size([32, 1, 1])

Let us do the same for QuantileLoss predictions.

[16]:

test_metrics = trainer2.test(model2, data_module)

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/home/aryan/pytorch-forecasting/.venv/lib/python3.12/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:425: The 'test_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=15` in the `DataLoader` to improve performance.

────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       Test metric             DataLoader 0
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
        test_loss           0.07047828286886215
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

[17]:

model2.eval()

[17]:

TimeXer(
  (loss): QuantileLoss(quantiles=[0.1, 0.5, 0.9])
  (en_embedding): EnEmbedding(
    (value_embedding): Linear(in_features=4, out_features=64, bias=False)
    (position_embedding): PositionalEmbedding()
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (ex_embedding): DataEmbedding_inverted(
    (value_embedding): Linear(in_features=30, out_features=64, bias=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): Encoder(
    (layers): ModuleList(
      (0-1): 2 x EncoderLayer(
        (self_attention): AttentionLayer(
          (inner_attention): FullAttention(
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (query_projection): Linear(in_features=64, out_features=64, bias=True)
          (key_projection): Linear(in_features=64, out_features=64, bias=True)
          (value_projection): Linear(in_features=64, out_features=64, bias=True)
          (out_projection): Linear(in_features=64, out_features=64, bias=True)
        )
        (cross_attention): AttentionLayer(
          (inner_attention): FullAttention(
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (query_projection): Linear(in_features=64, out_features=64, bias=True)
          (key_projection): Linear(in_features=64, out_features=64, bias=True)
          (value_projection): Linear(in_features=64, out_features=64, bias=True)
          (out_projection): Linear(in_features=64, out_features=64, bias=True)
        )
        (conv1): Conv1d(64, 256, kernel_size=(1,), stride=(1,))
        (conv2): Conv1d(256, 64, kernel_size=(1,), stride=(1,))
        (norm1): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
        (norm2): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
        (norm3): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
    (norm): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
  )
  (head): FlattenHead(
    (flatten): Flatten(start_dim=-2, end_dim=-1)
    (linear): Linear(in_features=512, out_features=3, bias=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
)

[18]:

with torch.no_grad():
    test_batch = next(iter(data_module.test_dataloader()))
    x_test, y_test = test_batch
    y_pred = model2(x_test)

    print("Prediction:", y_pred["prediction"])

Prediction: tensor([[[-0.1025, -0.0489,  0.0900]],

        [[ 0.0680,  0.0936,  0.2504]],

        [[ 0.2310,  0.2605,  0.4298]],

        [[ 0.3604,  0.3968,  0.5679]],

        [[ 0.4935,  0.5408,  0.7165]],

        [[ 0.6274,  0.6697,  0.8745]],

        [[ 0.7192,  0.7940,  0.9812]],

        [[ 0.7555,  0.8650,  1.0313]],

        [[ 0.7602,  0.8706,  1.0427]],

        [[ 0.7532,  0.8524,  1.0308]],

        [[ 0.7003,  0.7784,  0.9995]],

        [[ 0.5987,  0.6807,  0.9390]],

        [[ 0.4757,  0.5814,  0.7966]],

        [[ 0.3432,  0.4587,  0.6614]],

        [[ 0.1659,  0.2931,  0.5039]],

        [[-0.0338,  0.0983,  0.3208]],

        [[-0.1989, -0.0829,  0.1821]],

        [[-0.3732, -0.2402,  0.0121]],

        [[-0.5151, -0.3600, -0.1606]],

        [[-0.0789, -0.0406,  0.0908]],

        [[ 0.0495,  0.0830,  0.2585]],

        [[ 0.2185,  0.2520,  0.4223]],

        [[ 0.3870,  0.4209,  0.5818]],

        [[ 0.5243,  0.5766,  0.7636]],

        [[ 0.6293,  0.6854,  0.8715]],

        [[ 0.7055,  0.7854,  0.9698]],

        [[ 0.7722,  0.8390,  1.0474]],

        [[ 0.8323,  0.9074,  1.0969]],

        [[ 0.8132,  0.8968,  1.1051]],

        [[ 0.6892,  0.8067,  1.0172]],

        [[ 0.5896,  0.7130,  0.9167]],

        [[ 0.4989,  0.5976,  0.8067]]])

[19]:

y_pred["prediction"].shape

[19]:

torch.Size([32, 1, 3])
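The shape above reads as (batch size, prediction length, number of quantiles). The sketch below shows how the median and an interval width could be extracted from such an output. It uses NumPy and the first three rows of the printed prediction, and it assumes that the last axis follows the order of the `quantiles` list passed to `QuantileLoss`, which is consistent with the increasing values in each printed row.

```python
import numpy as np

quantiles = [0.1, 0.5, 0.9]

# First three rows of the printed prediction, shape (3, 1, 3).
pred = np.array(
    [
        [[-0.1025, -0.0489, 0.0900]],
        [[0.0680, 0.0936, 0.2504]],
        [[0.2310, 0.2605, 0.4298]],
    ]
)

# Median forecast: the slice that belongs to the 0.5 quantile.
median = pred[..., quantiles.index(0.5)]

# Width of the interval between the 0.1 and 0.9 quantiles.
width = pred[..., -1] - pred[..., 0]

# Quantile outputs should be non-decreasing along the last axis.
is_ordered = bool(np.all(np.diff(pred, axis=-1) >= 0))

print(median.shape, is_ordered)  # (3, 1) True
```

The same indexing works on the `torch.Tensor` returned by the model, for example `y_pred["prediction"][..., 1]` for the median.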