{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# How to use custom data and implement custom models and metrics\n" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. _new-model-tutorial:\n", "\n", "Building a new model in PyTorch Forecasting is relatively easy. Many things are taken care of automatically\n", "\n", "* Training, validation and inference is automatically handled for most models - defining the architecture and hyperparameters is sufficient\n", "* Dataloading, normalization, re-scaling etc. is provided by the TimeSeriesDataSet\n", "* Logging training progress with multiple metrics including plotting examples is automatically taken care of\n", "* Masking of entries if different time series have different lengths is automatic\n", "\n", "However, there a couple of things to keep in mind if you want to make full use of the package. This tutorial first demonstrates how to implement a simple model and then turns to more complicated implementation scenarios.\n", "\n", "We will answer questions such as\n", "\n", "* How to transfer an existing PyTorch implementation into PyTorch Forecasting\n", "* How to handle data loading and enable different length time series\n", "* How to define and use a custom metric\n", "* How to handle recurrent networks\n", "* How to deal with covariates\n", "* How to test new models" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Building a simple, first model\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "For demonstration purposes we will choose a simple fully connected model. It takes a timeseries of size `input_size` as input and outputs a new timeseries of size `output_size`. 
You can think of this as `input_size` encoding steps and `output_size` decoding/prediction steps.\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import os\n", "import warnings\n", "\n", "warnings.filterwarnings(\"ignore\")\n", "\n", "os.chdir(\"../../..\")" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "torch.Size([20, 2])" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import torch\n", "from torch import nn\n", "\n", "\n", "class FullyConnectedModule(nn.Module):\n", "    def __init__(self, input_size: int, output_size: int, hidden_size: int, n_hidden_layers: int):\n", "        super().__init__()\n", "\n", "        # input layer\n", "        module_list = [nn.Linear(input_size, hidden_size), nn.ReLU()]\n", "        # hidden layers\n", "        for _ in range(n_hidden_layers):\n", "            module_list.extend([nn.Linear(hidden_size, hidden_size), nn.ReLU()])\n", "        # output layer\n", "        module_list.append(nn.Linear(hidden_size, output_size))\n", "\n", "        self.sequential = nn.Sequential(*module_list)\n", "\n", "    def forward(self, x: torch.Tensor) -> torch.Tensor:\n", "        # x of shape: batch_size x n_timesteps_in\n", "        # output of shape: batch_size x n_timesteps_out\n", "        return self.sequential(x)\n", "\n", "\n", "# test that the network works as intended\n", "network = FullyConnectedModule(input_size=5, output_size=2, hidden_size=10, n_hidden_layers=2)\n", "x = torch.rand(20, 5)\n", "network(x).shape" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "The above model is not yet a PyTorch Forecasting model, but it is easy to get there. As this is a simple model, we will use the :py:class:`~pytorch_forecasting.models.base_model.BaseModel`. This base class is a modified `LightningModule `_ with pre-defined hooks for training and validating time series models. 
The :py:class:`~pytorch_forecasting.models.base_model.BaseModelWithCovariates` will be discussed later in this tutorial.\n", "\n", "Either way, the main requirement is for the model to have a ``forward`` method.\n", "\n", ".. automethod:: pytorch_forecasting.models.base_model.BaseModel.forward\n", "   :noindex:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from typing import Dict\n", "\n", "from pytorch_forecasting.models import BaseModel\n", "\n", "\n", "class FullyConnectedModel(BaseModel):\n", "    def __init__(self, input_size: int, output_size: int, hidden_size: int, n_hidden_layers: int, **kwargs):\n", "        # saves arguments in signature to `.hparams` attribute, mandatory call - do not skip this\n", "        self.save_hyperparameters()\n", "        # pass additional arguments to BaseModel.__init__, mandatory call - do not skip this\n", "        super().__init__(**kwargs)\n", "        self.network = FullyConnectedModule(\n", "            input_size=self.hparams.input_size,\n", "            output_size=self.hparams.output_size,\n", "            hidden_size=self.hparams.hidden_size,\n", "            n_hidden_layers=self.hparams.n_hidden_layers,\n", "        )\n", "\n", "    def forward(self, x: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:\n", "        # x is a batch generated based on the TimeSeriesDataset\n", "        network_input = x[\"encoder_cont\"].squeeze(-1)\n", "        prediction = self.network(network_input)\n", "\n", "        # rescale predictions into target space\n", "        prediction = self.transform_output(prediction, target_scale=x[\"target_scale\"])\n", "\n", "        # We need to return a dictionary that at least contains the prediction.\n", "        # The parameter can be directly forwarded from the input.\n", "        # The conversion to a named tuple can be directly achieved with the `to_network_output` function.\n", "        return self.to_network_output(prediction=prediction)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "This is a very basic implementation that could be readily used for training. 
But before we add additional features, let's first look at how we pass data to this model, and only then initialize it.\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Passing data to a model\n" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. _passing-data:\n", "\n", "Instead of having to write our own dataloader (which can be rather complicated), we can leverage PyTorch Forecasting's :py:class:`~pytorch_forecasting.data.timeseries.TimeSeriesDataSet` to feed data to our model.\n", "In fact, PyTorch Forecasting expects us to use a :py:class:`~pytorch_forecasting.data.timeseries.TimeSeriesDataSet`.\n", "\n", "The data has to be in a specific format to be used by the :py:class:`~pytorch_forecasting.data.timeseries.TimeSeriesDataSet`. It should be a pandas ``DataFrame`` with a categorical column that identifies each series and an integer column that specifies the time of the record.\n", "\n", "Below, we create such a dataset with 30 observations: 10 for each of 3 time series." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
valuegrouptime_idx
0-0.12559700
10.32566801
2-0.26596202
30.13230503
40.16711704
50.48124105
6-0.11318806
7-0.08960907
80.02915608
9-0.18195009
100.15033410
110.42862411
12-0.13910612
13-0.08533413
14-0.24366814
150.05591315
160.30859116
170.14118317
180.23075918
190.17352819
200.22631520
21-0.34839021
220.06781622
23-0.07479423
240.05939624
250.30074525
26-0.34403226
27-0.08393427
28-0.34348128
29-0.38520229
\n", "
" ], "text/plain": [ " value group time_idx\n", "0 -0.125597 0 0\n", "1 0.325668 0 1\n", "2 -0.265962 0 2\n", "3 0.132305 0 3\n", "4 0.167117 0 4\n", "5 0.481241 0 5\n", "6 -0.113188 0 6\n", "7 -0.089609 0 7\n", "8 0.029156 0 8\n", "9 -0.181950 0 9\n", "10 0.150334 1 0\n", "11 0.428624 1 1\n", "12 -0.139106 1 2\n", "13 -0.085334 1 3\n", "14 -0.243668 1 4\n", "15 0.055913 1 5\n", "16 0.308591 1 6\n", "17 0.141183 1 7\n", "18 0.230759 1 8\n", "19 0.173528 1 9\n", "20 0.226315 2 0\n", "21 -0.348390 2 1\n", "22 0.067816 2 2\n", "23 -0.074794 2 3\n", "24 0.059396 2 4\n", "25 0.300745 2 5\n", "26 -0.344032 2 6\n", "27 -0.083934 2 7\n", "28 -0.343481 2 8\n", "29 -0.385202 2 9" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "import pandas as pd\n", "\n", "test_data = pd.DataFrame(\n", " dict(\n", " value=np.random.rand(30) - 0.5,\n", " group=np.repeat(np.arange(3), 10),\n", " time_idx=np.tile(np.arange(10), 3),\n", " )\n", ")\n", "test_data" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Converting it to a :py:class:`~pytorch_forecasting.data.timeseries.TimeSeriesDataSet` is easy:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "from pytorch_forecasting import TimeSeriesDataSet\n", "\n", "# create the dataset from the pandas dataframe\n", "dataset = TimeSeriesDataSet(\n", " test_data,\n", " group_ids=[\"group\"],\n", " target=\"value\",\n", " time_idx=\"time_idx\",\n", " min_encoder_length=5,\n", " max_encoder_length=5,\n", " min_prediction_length=2,\n", " max_prediction_length=2,\n", " time_varying_unknown_reals=[\"value\"],\n", ")" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "We can take a look at all the defaults and settings that were set by PyTorch Forecasting. 
These are all available as arguments to :py:class:`~pytorch_forecasting.data.timeseries.TimeSeriesDataSet`; see its documentation for all the details." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'time_idx': 'time_idx',\n", " 'target': 'value',\n", " 'group_ids': ['group'],\n", " 'weight': None,\n", " 'max_encoder_length': 5,\n", " 'min_encoder_length': 5,\n", " 'min_prediction_idx': 0,\n", " 'min_prediction_length': 2,\n", " 'max_prediction_length': 2,\n", " 'static_categoricals': [],\n", " 'static_reals': [],\n", " 'time_varying_known_categoricals': [],\n", " 'time_varying_known_reals': [],\n", " 'time_varying_unknown_categoricals': [],\n", " 'time_varying_unknown_reals': ['value'],\n", " 'variable_groups': {},\n", " 'constant_fill_strategy': {},\n", " 'allow_missing_timesteps': False,\n", " 'lags': {},\n", " 'add_relative_time_idx': False,\n", " 'add_target_scales': False,\n", " 'add_encoder_length': False,\n", " 'target_normalizer': GroupNormalizer(\n", " \tmethod='standard',\n", " \tgroups=[],\n", " \tcenter=True,\n", " \tscale_by_group=False,\n", " \ttransformation=None,\n", " \tmethod_kwargs={}\n", " ),\n", " 'categorical_encoders': {'__group_id__group': NaNLabelEncoder(add_nan=False, warn=True),\n", " 'group': NaNLabelEncoder(add_nan=False, warn=True)},\n", " 'scalers': {},\n", " 'randomize_length': None,\n", " 'predict_mode': False}" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset.get_parameters()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Now, we take a look at the output of the dataloader. 
The `x` it returns is fed to the model's `forward` method, which is why it is so important to understand it.\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "x = {'encoder_cat': tensor([], size=(4, 5, 0), dtype=torch.int64), 'encoder_cont': tensor([[[ 1.7401],\n", " [-0.6492],\n", " [-0.4229],\n", " [-1.0892],\n", " [ 0.1716]],\n", "\n", " [[-0.4229],\n", " [-1.0892],\n", " [ 0.1716],\n", " [ 1.2349],\n", " [ 0.5304]],\n", "\n", " [[-0.6492],\n", " [-0.4229],\n", " [-1.0892],\n", " [ 0.1716],\n", " [ 1.2349]],\n", "\n", " [[-1.5299],\n", " [ 0.2216],\n", " [-0.3785],\n", " [ 0.1862],\n", " [ 1.2019]]]), 'encoder_target': tensor([[ 0.4286, -0.1391, -0.0853, -0.2437, 0.0559],\n", " [-0.0853, -0.2437, 0.0559, 0.3086, 0.1412],\n", " [-0.1391, -0.0853, -0.2437, 0.0559, 0.3086],\n", " [-0.3484, 0.0678, -0.0748, 0.0594, 0.3007]]), 'encoder_lengths': tensor([5, 5, 5, 5]), 'decoder_cat': tensor([], size=(4, 2, 0), dtype=torch.int64), 'decoder_cont': tensor([[[ 1.2349],\n", " [ 0.5304]],\n", "\n", " [[ 0.9074],\n", " [ 0.6665]],\n", "\n", " [[ 0.5304],\n", " [ 0.9074]],\n", "\n", " [[-1.5116],\n", " [-0.4170]]]), 'decoder_target': tensor([[ 0.3086, 0.1412],\n", " [ 0.2308, 0.1735],\n", " [ 0.1412, 0.2308],\n", " [-0.3440, -0.0839]]), 'decoder_lengths': tensor([2, 2, 2, 2]), 'decoder_time_idx': tensor([[6, 7],\n", " [8, 9],\n", " [7, 8],\n", " [6, 7]]), 'groups': tensor([[1],\n", " [1],\n", " [1],\n", " [2]]), 'target_scale': tensor([[0.0151, 0.2376],\n", " [0.0151, 0.2376],\n", " [0.0151, 0.2376],\n", " [0.0151, 0.2376]])}\n", "\n", "y = (tensor([[ 0.3086, 0.1412],\n", " [ 0.2308, 0.1735],\n", " [ 0.1412, 0.2308],\n", " [-0.3440, -0.0839]]), None)\n", "\n", "sizes of x =\n", "\tencoder_cat = torch.Size([4, 5, 0])\n", "\tencoder_cont = torch.Size([4, 5, 1])\n", "\tencoder_target = torch.Size([4, 5])\n", "\tencoder_lengths = torch.Size([4])\n", "\tdecoder_cat = torch.Size([4, 2, 0])\n", 
"\tdecoder_cont = torch.Size([4, 2, 1])\n", "\tdecoder_target = torch.Size([4, 2])\n", "\tdecoder_lengths = torch.Size([4])\n", "\tdecoder_time_idx = torch.Size([4, 2])\n", "\tgroups = torch.Size([4, 1])\n", "\ttarget_scale = torch.Size([4, 2])\n" ] } ], "source": [ "# convert the dataset to a dataloader\n", "dataloader = dataset.to_dataloader(batch_size=4)\n", "\n", "# and load the first batch\n", "x, y = next(iter(dataloader))\n", "print(\"x =\", x)\n", "print(\"\\ny =\", y)\n", "print(\"\\nsizes of x =\")\n", "for key, value in x.items():\n", " print(f\"\\t{key} = {value.size()}\")" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "To understand it better, we look at documentation of the :py:meth:`~pytorch_forecasting.data.timeseries.TimeSeriesDataSet.to_dataloader` method:\n", "\n", ".. automethod:: pytorch_forecasting.data.timeseries.TimeSeriesDataSet.to_dataloader\n", " :noindex:" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "This explains why we had to first extract the correct input in our simple `FullyConnectedModel` above before passing it to our `FullyConnectedModule`.\n", "As a reminder:\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "def forward(self, x: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:\n", " # x is a batch generated based on the TimeSeriesDataset\n", " network_input = x[\"encoder_cont\"].squeeze(-1)\n", " prediction = self.network(network_input)\n", "\n", " # rescale predictions into target space\n", " prediction = self.transform_output(prediction, target_scale=x[\"target_scale\"])\n", "\n", " # We need to return a dictionary that at least contains the prediction\n", " # The parameter can be directly forwarded from the input.\n", " # The conversion to a named tuple can be directly achieved with the `to_network_output` function.\n", " return self.to_network_output(prediction=prediction)" ] }, { 
"attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "For such a simple architecture, we can ignore most of the inputs in `x`. You do not have to worry about moving tensors to specifc GPUs, [PyTorch Lightning](https://pytorch-lightning.readthedocs.io) will take care of this for you.\n", "\n", "Now, let's check if our model works. We initialize model always with their `from_dataset()` method with takes hyperparameters from the dataset, hyperparameters for the model and hyperparameters for the optimizer. Read more about it in the next section.\n" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Output(prediction=tensor([[-0.0175, -0.0045],\n", " [-0.0203, 0.0039],\n", " [-0.0128, 0.0033],\n", " [-0.0162, -0.0026]], grad_fn=))" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = FullyConnectedModel.from_dataset(dataset, input_size=5, output_size=2, hidden_size=10, n_hidden_layers=2)\n", "x, y = next(iter(dataloader))\n", "model(x)" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "If you want to know to which group and time index (at the first prediction) the samples in the batch link to, you can find out by using :py:meth:`~pytorch_forecasting.data.timeseries.TimeSeriesDataSet.x_to_index`:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
time_idxgroup
052
151
272
350
\n", "
" ], "text/plain": [ " time_idx group\n", "0 5 2\n", "1 5 1\n", "2 7 2\n", "3 5 0" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset.x_to_index(x)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Coupling datasets and models\n" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "You might have noticed that the encoder and decoder/prediction lengths (5 and 2) are already specified in the :py:class:`~pytorch_forecasting.data.timeseries.TimeSeriesDataSet` and we specified them a second time when initializing the model. This might be acceptable for such a simple model but will make it hard for users to understand how to map form the dataset to the model parameters in more complicated settings.\n", "This is why we should implement another method in the model: ``from_dataset()``. Typically, a user would always initialize a model from a dataset. The method is also an opportunity to validate that the dataset defined by the user is compatible with your model architecture.\n", "\n", "While the :py:class:`~pytorch_forecasting.data.timeseries.TimeSeriesDataSet` and all PyTorch Forecasting metrics support different length time series, not every network architecture does." 
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "class FullyConnectedModel(BaseModel):\n", " def __init__(self, input_size: int, output_size: int, hidden_size: int, n_hidden_layers: int, **kwargs):\n", " # saves arguments in signature to `.hparams` attribute, mandatory call - do not skip this\n", " self.save_hyperparameters()\n", " # pass additional arguments to BaseModel.__init__, mandatory call - do not skip this\n", " super().__init__(**kwargs)\n", " self.network = FullyConnectedModule(\n", " input_size=self.hparams.input_size,\n", " output_size=self.hparams.output_size,\n", " hidden_size=self.hparams.hidden_size,\n", " n_hidden_layers=self.hparams.n_hidden_layers,\n", " )\n", "\n", " def forward(self, x: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:\n", " # x is a batch generated based on the TimeSeriesDataset\n", " network_input = x[\"encoder_cont\"].squeeze(-1)\n", " prediction = self.network(network_input).unsqueeze(-1)\n", "\n", " # rescale predictions into target space\n", " prediction = self.transform_output(prediction, target_scale=x[\"target_scale\"])\n", "\n", " # We need to return a dictionary that at least contains the prediction.\n", " # The parameter can be directly forwarded from the input.\n", " # The conversion to a named tuple can be directly achieved with the `to_network_output` function.\n", " return self.to_network_output(prediction=prediction)\n", "\n", " @classmethod\n", " def from_dataset(cls, dataset: TimeSeriesDataSet, **kwargs):\n", " new_kwargs = {\n", " \"output_size\": dataset.max_prediction_length,\n", " \"input_size\": dataset.max_encoder_length,\n", " }\n", " new_kwargs.update(kwargs) # use to pass real hyperparameters and override defaults set by dataset\n", " # example for dataset validation\n", " assert dataset.max_prediction_length == dataset.min_prediction_length, \"Decoder only supports a fixed length\"\n", " assert dataset.min_encoder_length == 
dataset.max_encoder_length, \"Encoder only supports a fixed length\"\n", " assert (\n", " len(dataset.time_varying_known_categoricals) == 0\n", " and len(dataset.time_varying_known_reals) == 0\n", " and len(dataset.time_varying_unknown_categoricals) == 0\n", " and len(dataset.static_categoricals) == 0\n", " and len(dataset.static_reals) == 0\n", " and len(dataset.time_varying_unknown_reals) == 1\n", " and dataset.time_varying_unknown_reals[0] == dataset.target\n", " ), \"Only covariate should be the target in 'time_varying_unknown_reals'\"\n", "\n", " return super().from_dataset(dataset, **new_kwargs)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Now, let's initialize from our dataset:\n" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " | Name | Type | Params\n", "---------------------------------------------------------------\n", "0 | loss | SMAPE | 0 \n", "1 | logging_metrics | ModuleList | 0 \n", "2 | network | FullyConnectedModule | 302 \n", "3 | network.sequential | Sequential | 302 \n", "4 | network.sequential.0 | Linear | 60 \n", "5 | network.sequential.1 | ReLU | 0 \n", "6 | network.sequential.2 | Linear | 110 \n", "7 | network.sequential.3 | ReLU | 0 \n", "8 | network.sequential.4 | Linear | 110 \n", "9 | network.sequential.5 | ReLU | 0 \n", "10 | network.sequential.6 | Linear | 22 \n", "---------------------------------------------------------------\n", "302 Trainable params\n", "0 Non-trainable params\n", "302 Total params\n", "0.001 Total estimated model params size (MB)\n" ] }, { "data": { "text/plain": [ "\"hidden_size\": 10\n", "\"input_size\": 5\n", "\"learning_rate\": 0.001\n", "\"log_gradient_flow\": False\n", "\"log_interval\": -1\n", "\"log_val_interval\": -1\n", "\"logging_metrics\": ModuleList()\n", "\"loss\": SMAPE()\n", "\"monotone_constaints\": {}\n", "\"n_hidden_layers\": 2\n", "\"optimizer\": ranger\n", 
"\"optimizer_params\": None\n", "\"output_size\": 2\n", "\"output_transformer\": GroupNormalizer(\n", "\tmethod='standard',\n", "\tgroups=[],\n", "\tcenter=True,\n", "\tscale_by_group=False,\n", "\ttransformation=None,\n", "\tmethod_kwargs={}\n", ")\n", "\"reduce_on_plateau_min_lr\": 1e-05\n", "\"reduce_on_plateau_patience\": 1000\n", "\"reduce_on_plateau_reduction\": 2.0\n", "\"weight_decay\": 0.0" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from lightning.pytorch.utilities.model_summary import ModelSummary\n", "\n", "model = FullyConnectedModel.from_dataset(dataset, hidden_size=10, n_hidden_layers=2)\n", "print(ModelSummary(model, max_depth=-1))\n", "model.hparams" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Defining additional hyperparameters\n" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "So far, we have kept a wildcard ``**kwargs`` argument in the model initialization signature. We then pass these ``**kwargs`` to the :py:class:`~pytorch_forecasting.models.base_model.BaseModel` using a ``super().__init__(**kwargs)`` call. 
We can see which additional hyperparameters are available because they are all saved in the ``hparams`` attribute of the model:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\"hidden_size\": 10\n", "\"input_size\": 5\n", "\"learning_rate\": 0.001\n", "\"log_gradient_flow\": False\n", "\"log_interval\": -1\n", "\"log_val_interval\": -1\n", "\"logging_metrics\": ModuleList()\n", "\"loss\": SMAPE()\n", "\"monotone_constaints\": {}\n", "\"n_hidden_layers\": 2\n", "\"optimizer\": ranger\n", "\"optimizer_params\": None\n", "\"output_size\": 2\n", "\"output_transformer\": GroupNormalizer(\n", "\tmethod='standard',\n", "\tgroups=[],\n", "\tcenter=True,\n", "\tscale_by_group=False,\n", "\ttransformation=None,\n", "\tmethod_kwargs={}\n", ")\n", "\"reduce_on_plateau_min_lr\": 1e-05\n", "\"reduce_on_plateau_patience\": 1000\n", "\"reduce_on_plateau_reduction\": 2.0\n", "\"weight_decay\": 0.0" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model.hparams" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "While not required, it is worth passing these additional hyperparameters explicitly instead of implicitly in ``**kwargs`` to give the user transparency over them.\n", "\n", "They are described in detail in the :py:class:`~pytorch_forecasting.models.base_model.BaseModel` documentation.\n", "\n", ".. automethod:: pytorch_forecasting.models.base_model.BaseModel.__init__\n", "   :noindex:\n", "\n", "You can simply copy this docstring into your model implementation:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " BaseModel for timeseries forecasting from which to inherit from\n", "\n", " Args:\n", " log_interval (Union[int, float], optional): Batches after which predictions are logged. If < 1.0, will log\n", " multiple entries per batch. 
Defaults to -1.\n", " log_val_interval (Union[int, float], optional): batches after which predictions for validation are\n", " logged. Defaults to None/log_interval.\n", " learning_rate (float, optional): Learning rate. Defaults to 1e-3.\n", " log_gradient_flow (bool): If to log gradient flow, this takes time and should be only done to diagnose\n", " training failures. Defaults to False.\n", " loss (Metric, optional): metric to optimize, can also be list of metrics. Defaults to SMAPE().\n", " logging_metrics (nn.ModuleList[MultiHorizonMetric]): list of metrics that are logged during training.\n", " Defaults to [].\n", " reduce_on_plateau_patience (int): patience after which learning rate is reduced by a factor of 10. Defaults\n", " to 1000\n", " reduce_on_plateau_reduction (float): reduction in learning rate when encountering plateau. Defaults to 2.0.\n", " reduce_on_plateau_min_lr (float): minimum learning rate for reduce on plateua learning rate scheduler.\n", " Defaults to 1e-5\n", " weight_decay (float): weight decay. Defaults to 0.0.\n", " optimizer_params (Dict[str, Any]): additional parameters for the optimizer. Defaults to {}.\n", " monotone_constaints (Dict[str, int]): dictionary of monotonicity constraints for continuous decoder\n", " variables mapping\n", " position (e.g. ``\"0\"`` for first position) to constraint (``-1`` for negative and ``+1`` for positive,\n", " larger numbers add more weight to the constraint vs. the loss but are usually not necessary).\n", " This constraint significantly slows down training. 
Defaults to {}.\n", " output_transformer (Callable): transformer that takes network output and transforms it to prediction space.\n", " Defaults to None which is equivalent to ``lambda out: out[\"prediction\"]``.\n", " optimizer (str): Optimizer, \"ranger\", \"sgd\", \"adam\", \"adamw\" or class name of optimizer in ``torch.optim``\n", " or ``pytorch_optimizer``.\n", " Alternatively, a class or function can be passed which takes parameters as first argument and\n", " a `lr` argument (optionally also `weight_decay`). Defaults to\n", " `\"ranger\" `_.\n", " \n" ] } ], "source": [ "print(BaseModel.__init__.__doc__)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Classification\n" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Classification is a common task and can be easily implemented. In fact, we only have to change the target in our :py:class:`~pytorch_forecasting.data.timeseries.TimeSeriesDataSet` and adjust the number of prediction outputs to reflect the number of classes we want to predict. The changes for the :py:class:`~pytorch_forecasting.data.timeseries.TimeSeriesDataSet` are marked below." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
targetvaluegrouptime_idx
0B0.96715300
1A0.16529701
2B0.10974402
3A0.85084203
4C0.26409004
5A0.32398605
6B0.08549906
7A0.77299007
8C0.48427308
9C0.06574209
10C0.38706910
11A0.56454011
12B0.97942512
13C0.44959613
14C0.84480314
15C0.62255115
16C0.23227016
17C0.13269817
18A0.50196818
19C0.99766219
20C0.05438120
21C0.00659721
22B0.43417922
23A0.20202823
24A0.84301824
25B0.06882225
26C0.46217526
27B0.06395527
28C0.86186028
29B0.43856629
\n", "
" ], "text/plain": [ " target value group time_idx\n", "0 B 0.967153 0 0\n", "1 A 0.165297 0 1\n", "2 B 0.109744 0 2\n", "3 A 0.850842 0 3\n", "4 C 0.264090 0 4\n", "5 A 0.323986 0 5\n", "6 B 0.085499 0 6\n", "7 A 0.772990 0 7\n", "8 C 0.484273 0 8\n", "9 C 0.065742 0 9\n", "10 C 0.387069 1 0\n", "11 A 0.564540 1 1\n", "12 B 0.979425 1 2\n", "13 C 0.449596 1 3\n", "14 C 0.844803 1 4\n", "15 C 0.622551 1 5\n", "16 C 0.232270 1 6\n", "17 C 0.132698 1 7\n", "18 A 0.501968 1 8\n", "19 C 0.997662 1 9\n", "20 C 0.054381 2 0\n", "21 C 0.006597 2 1\n", "22 B 0.434179 2 2\n", "23 A 0.202028 2 3\n", "24 A 0.843018 2 4\n", "25 B 0.068822 2 5\n", "26 C 0.462175 2 6\n", "27 B 0.063955 2 7\n", "28 C 0.861860 2 8\n", "29 B 0.438566 2 9" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "classification_test_data = pd.DataFrame(\n", " dict(\n", " target=np.random.choice([\"A\", \"B\", \"C\"], size=30), # CHANGING values to predict to a categorical\n", " value=np.random.rand(30), # INPUT values - see next section on covariates how to use categorical inputs\n", " group=np.repeat(np.arange(3), 10),\n", " time_idx=np.tile(np.arange(10), 3),\n", " )\n", ")\n", "classification_test_data" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tensor([[1, 0],\n", " [2, 0],\n", " [0, 2],\n", " [2, 2]])" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from pytorch_forecasting.data.encoders import NaNLabelEncoder\n", "\n", "# create the dataset from the pandas dataframe\n", "classification_dataset = TimeSeriesDataSet(\n", " classification_test_data,\n", " group_ids=[\"group\"],\n", " target=\"target\", # SWITCHING to categorical target\n", " time_idx=\"time_idx\",\n", " min_encoder_length=5,\n", " max_encoder_length=5,\n", " min_prediction_length=2,\n", " max_prediction_length=2,\n", " time_varying_unknown_reals=[\"value\"],\n", " 
target_normalizer=NaNLabelEncoder(), # Use the NaNLabelEncoder to encode categorical target\n", ")\n", "\n", "x, y = next(iter(classification_dataset.to_dataloader(batch_size=4)))\n", "y[0] # target values are encoded categories" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext", "tags": [] }, "source": [ "The keyword argument ``target_normalizer`` is redundant here because the :py:class:`~pytorch_forecasting.data.timeseries.TimeSeriesDataSet` would have detected that a categorical target is used and therefore that a :py:class:`~pytorch_forecasting.data.encoders.NaNLabelEncoder` is required." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Now, we need to modify our implementation of the ``FullyConnectedModel``. In particular, we have to add one hyperparameter to the model: ``n_classes``, which determines how\n", "many classes there are to predict. Our model will produce a number for each class at each time step, each of which can be converted into a probability by applying a softmax over the last dimension. This means we need a total of ``n_decoder_timesteps x n_classes`` predictions. Further, we need to specify the default loss function, which we choose to be :py:class:`~pytorch_forecasting.metrics.CrossEntropy`."
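] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "The next cell is a minimal sketch in plain PyTorch. It does not use PyTorch Forecasting, and its sizes are made up purely for illustration. It shows how a flat network output of length `n_decoder_timesteps * n_classes` can be reshaped to `batch_size x n_decoder_timesteps x n_classes` and then turned into class probabilities with a softmax over the last dimension. The classification model defined afterwards applies the same reshape to its network output.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import torch\n", "\n", "# illustrative sizes (assumptions, not taken from the dataset above)\n", "batch_size, n_decoder_timesteps, n_classes = 4, 2, 3\n", "\n", "# the fully connected network emits one flat vector per sample\n", "flat_output = torch.randn(batch_size, n_decoder_timesteps * n_classes)\n", "\n", "# reshape to batch_size x n_decoder_timesteps x n_classes\n", "logits = flat_output.view(batch_size, n_decoder_timesteps, n_classes)\n", "\n", "# softmax over the last (class) dimension turns each time step into a probability vector\n", "probabilities = torch.softmax(logits, dim=-1)\n", "\n", "print(probabilities.shape)  # torch.Size([4, 2, 3])\n", "print(probabilities.sum(dim=-1))  # every entry is (numerically) 1"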
] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " | Name | Type | Params\n", "---------------------------------------------------------------\n", "0 | loss | SMAPE | 0 \n", "1 | logging_metrics | ModuleList | 0 \n", "2 | network | FullyConnectedModule | 346 \n", "3 | network.sequential | Sequential | 346 \n", "4 | network.sequential.0 | Linear | 60 \n", "5 | network.sequential.1 | ReLU | 0 \n", "6 | network.sequential.2 | Linear | 110 \n", "7 | network.sequential.3 | ReLU | 0 \n", "8 | network.sequential.4 | Linear | 110 \n", "9 | network.sequential.5 | ReLU | 0 \n", "10 | network.sequential.6 | Linear | 66 \n", "---------------------------------------------------------------\n", "346 Trainable params\n", "0 Non-trainable params\n", "346 Total params\n", "0.001 Total estimated model params size (MB)\n" ] }, { "data": { "text/plain": [ "\"hidden_size\": 10\n", "\"input_size\": 5\n", "\"learning_rate\": 0.001\n", "\"log_gradient_flow\": False\n", "\"log_interval\": -1\n", "\"log_val_interval\": -1\n", "\"logging_metrics\": ModuleList()\n", "\"loss\": CrossEntropy()\n", "\"monotone_constaints\": {}\n", "\"n_classes\": 3\n", "\"n_hidden_layers\": 2\n", "\"optimizer\": ranger\n", "\"optimizer_params\": None\n", "\"output_size\": 2\n", "\"output_transformer\": NaNLabelEncoder(add_nan=False, warn=True)\n", "\"reduce_on_plateau_min_lr\": 1e-05\n", "\"reduce_on_plateau_patience\": 1000\n", "\"reduce_on_plateau_reduction\": 2.0\n", "\"weight_decay\": 0.0" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from pytorch_forecasting.metrics import CrossEntropy\n", "\n", "\n", "class FullyConnectedClassificationModel(BaseModel):\n", " def __init__(\n", " self,\n", " input_size: int,\n", " output_size: int,\n", " hidden_size: int,\n", " n_hidden_layers: int,\n", " n_classes: int,\n", " loss=CrossEntropy(),\n", " **kwargs,\n", " ):\n", " # saves 
arguments in signature to `.hparams` attribute, mandatory call - do not skip this\n", " self.save_hyperparameters()\n", " # pass additional arguments to BaseModel.__init__, mandatory call - do not skip this\n", " super().__init__(**kwargs)\n", " self.network = FullyConnectedModule(\n", " input_size=self.hparams.input_size,\n", " output_size=self.hparams.output_size * self.hparams.n_classes,\n", " hidden_size=self.hparams.hidden_size,\n", " n_hidden_layers=self.hparams.n_hidden_layers,\n", " )\n", "\n", " def forward(self, x: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:\n", " # x is a batch generated based on the TimeSeriesDataset\n", " batch_size = x[\"encoder_cont\"].size(0)\n", " network_input = x[\"encoder_cont\"].squeeze(-1)\n", " prediction = self.network(network_input)\n", " # RESHAPE output to batch_size x n_decoder_timesteps x n_classes\n", " prediction = prediction.unsqueeze(-1).view(batch_size, -1, self.hparams.n_classes)\n", "\n", " # rescale predictions into target space\n", " prediction = self.transform_output(prediction, target_scale=x[\"target_scale\"])\n", "\n", " # We need to return a named tuple that at least contains the prediction.\n", " # The parameter can be directly forwarded from the input.\n", " # The conversion to a named tuple can be directly achieved with the `to_network_output` function.\n", " return self.to_network_output(prediction=prediction)\n", "\n", " @classmethod\n", " def from_dataset(cls, dataset: TimeSeriesDataSet, **kwargs):\n", " assert isinstance(dataset.target_normalizer, NaNLabelEncoder), \"target normalizer has to encode categories\"\n", " new_kwargs = {\n", " \"n_classes\": len(\n", " dataset.target_normalizer.classes_\n", " ), # ADD number of classes as encoded by the target normalizer\n", " \"output_size\": dataset.max_prediction_length,\n", " \"input_size\": dataset.max_encoder_length,\n", " }\n", " new_kwargs.update(kwargs) # use to pass real hyperparameters and override defaults set by dataset\n", " # 
example for dataset validation\n", " assert dataset.max_prediction_length == dataset.min_prediction_length, \"Decoder only supports a fixed length\"\n", " assert dataset.min_encoder_length == dataset.max_encoder_length, \"Encoder only supports a fixed length\"\n", " assert (\n", " len(dataset.time_varying_known_categoricals) == 0\n", " and len(dataset.time_varying_known_reals) == 0\n", " and len(dataset.time_varying_unknown_categoricals) == 0\n", " and len(dataset.static_categoricals) == 0\n", " and len(dataset.static_reals) == 0\n", " and len(dataset.time_varying_unknown_reals) == 1\n", " ), \"Only covariate should be in 'time_varying_unknown_reals'\"\n", "\n", " return super().from_dataset(dataset, **new_kwargs)\n", "\n", "\n", "model = FullyConnectedClassificationModel.from_dataset(classification_dataset, hidden_size=10, n_hidden_layers=2)\n", "print(ModelSummary(model, max_depth=-1))\n", "model.hparams" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "torch.Size([4, 2, 3])" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# passing x through model\n", "model(x)[\"prediction\"].shape" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Predicting multiple targets at the same time\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Training a model to predict multiple targets simultaneously is not difficult to implement. We can even employ mixed targets, i.e. a mix of categorical and continuous targets. The first step is to define a dataframe with multiple targets:\n" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " target1 target2 group time_idx\n", "0 0.914855 0.878801 0 0\n", "1 0.899952 0.945892 0 1\n", "2 0.343721 0.947703 0 2\n", "3 0.159121 0.594136 0 3\n", "4 0.938919 0.613615 0 4\n", "5 0.633740 0.664389 0 5\n", "6 0.301508 0.486869 0 6\n", "7 0.584205 0.761532 0 7\n", "8 0.688911 0.915995 0 8\n", "9 0.385333 0.453338 0 9\n", "10 0.563318 0.708893 1 0\n", "11 0.174396 0.960573 1 1\n", "12 0.946880 0.068241 1 2\n", "13 0.357571 0.349759 1 3\n", "14 0.963621 0.908603 1 4\n", "15 0.457152 0.711110 1 5\n", "16 0.773543 0.699747 1 6\n", "17 0.451517 0.743759 1 7\n", "18 0.960991 0.763686 1 8\n", "19 0.974321 0.666066 1 9\n", "20 0.436444 0.571486 2 0\n", "21 0.770266 0.410549 2 1\n", "22 0.030838 0.416753 2 2\n", "23 0.598430 0.700038 2 3\n", "24 0.516909 0.489514 2 4\n", "25 0.197944 0.042520 2 5\n", "26 0.992430 0.198223 2 6\n", "27 0.580234 0.051413 2 7\n", "28 0.615618 0.258444 2 8\n", "29 0.245929 0.293081 2 9" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "multi_target_test_data = pd.DataFrame(\n", " dict(\n", " target1=np.random.rand(30),\n", " target2=np.random.rand(30),\n", " group=np.repeat(np.arange(3), 10),\n", " time_idx=np.tile(np.arange(10), 3),\n", " )\n", ")\n", "multi_target_test_data" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "We can then simply pass a list to the ``target`` keyword of the :py:class:`~pytorch_forecasting.data.timeseries.TimeSeriesDataSet`. The class will choose reasonable defaults for normalizing the targets, but we can also specify the normalizer explicitly by assigning an instance of :py:class:`~pytorch_forecasting.data.encoders.MultiNormalizer` to the ``target_normalizer`` keyword. For fun, let's use a different normalization method for each target."
] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[tensor([[0.9610, 0.9743],\n", " [0.6889, 0.3853],\n", " [0.6337, 0.3015],\n", " [0.5802, 0.6156]]),\n", " tensor([[0.7637, 0.6661],\n", " [0.9160, 0.4533],\n", " [0.6644, 0.4869],\n", " [0.0514, 0.2584]])]" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from pytorch_forecasting.data.encoders import EncoderNormalizer, MultiNormalizer, TorchNormalizer\n", "\n", "# create the dataset from the pandas dataframe\n", "multi_target_dataset = TimeSeriesDataSet(\n", " multi_target_test_data,\n", " group_ids=[\"group\"],\n", " target=[\"target1\", \"target2\"], # USING two targets\n", " time_idx=\"time_idx\",\n", " min_encoder_length=5,\n", " max_encoder_length=5,\n", " min_prediction_length=2,\n", " max_prediction_length=2,\n", " time_varying_unknown_reals=[\"target1\", \"target2\"],\n", " target_normalizer=MultiNormalizer(\n", " [EncoderNormalizer(), TorchNormalizer()]\n", " ), # normalize each target with a different normalizer\n", ")\n", "\n", "x, y = next(iter(multi_target_dataset.to_dataloader(batch_size=4)))\n", "y[0] # target values are a list of targets" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Using multiple targets leads to a slightly different ``x`` and ``y`` of the :py:class:`~pytorch_forecasting.data.timeseries.TimeSeriesDataSet`'s dataloader.\n", "``y`` is still a tuple of target and weight, but the target is now a list of tensors. The same holds for the ``target_scale``, the ``encoder_target`` and the ``decoder_target`` in ``x``.\n", "\n", "For this reason, not every model is automatically suited to deal with multiple targets. However, it is (very often) fairly simple to extend a model to output a list of tensors (one per target) as opposed to just one tensor (for one target). 
We will now modify our ``FullyConnectedModel`` to work with one or more targets.\n", "\n", "As we use multiple targets, we need to define a loss function that can handle them. The :py:class:`~pytorch_forecasting.metrics.MultiLoss` is built exactly for that purpose. It also allows weighting the losses differently. Solely for demonstration purposes, we optimize the mean absolute error for the first target and the symmetric mean absolute percentage error for the second target. The error on the first target is weighted twice as heavily as the error on the second target." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " | Name | Type | Params\n", "---------------------------------------------------------------\n", "0 | loss | MultiLoss | 0 \n", "1 | logging_metrics | ModuleList | 0 \n", "2 | network | FullyConnectedModule | 374 \n", "3 | network.sequential | Sequential | 374 \n", "4 | network.sequential.0 | Linear | 110 \n", "5 | network.sequential.1 | ReLU | 0 \n", "6 | network.sequential.2 | Linear | 110 \n", "7 | network.sequential.3 | ReLU | 0 \n", "8 | network.sequential.4 | Linear | 110 \n", "9 | network.sequential.5 | ReLU | 0 \n", "10 | network.sequential.6 | Linear | 44 \n", "---------------------------------------------------------------\n", "374 Trainable params\n", "0 Non-trainable params\n", "374 Total params\n", "0.001 Total estimated model params size (MB)\n" ] }, { "data": { "text/plain": [ "\"hidden_size\": 10\n", "\"input_size\": 5\n", "\"learning_rate\": 0.001\n", "\"log_gradient_flow\": False\n", "\"log_interval\": -1\n", "\"log_val_interval\": -1\n", "\"logging_metrics\": ModuleList()\n", "\"loss\": MultiLoss(2 * MAE(), SMAPE())\n", "\"monotone_constaints\": {}\n", "\"n_hidden_layers\": 2\n", "\"optimizer\": ranger\n", "\"optimizer_params\": None\n", "\"output_size\": 2\n", "\"output_transformer\": MultiNormalizer(\n", "\tnormalizers=[EncoderNormalizer(\n",
"\tmethod='standard',\n", "\tcenter=True,\n", "\tmax_length=None,\n", "\ttransformation=None,\n", "\tmethod_kwargs={}\n", "), TorchNormalizer(method='standard', center=True, transformation=None, method_kwargs={})]\n", ")\n", "\"reduce_on_plateau_min_lr\": 1e-05\n", "\"reduce_on_plateau_patience\": 1000\n", "\"reduce_on_plateau_reduction\": 2.0\n", "\"target_sizes\": [1, 1]\n", "\"weight_decay\": 0.0" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from typing import List, Union\n", "\n", "from pytorch_forecasting.metrics import MAE, SMAPE, MultiLoss\n", "from pytorch_forecasting.utils import to_list\n", "\n", "\n", "class FullyConnectedMultiTargetModel(BaseModel):\n", " def __init__(\n", " self,\n", " input_size: int,\n", " output_size: int,\n", " hidden_size: int,\n", " n_hidden_layers: int,\n", " target_sizes: Union[int, List[int]] = [],\n", " **kwargs,\n", " ):\n", " # saves arguments in signature to `.hparams` attribute, mandatory call - do not skip this\n", " self.save_hyperparameters()\n", " # pass additional arguments to BaseModel.__init__, mandatory call - do not skip this\n", " super().__init__(**kwargs)\n", " self.network = FullyConnectedModule(\n", " input_size=self.hparams.input_size * len(to_list(self.hparams.target_sizes)),\n", " output_size=self.hparams.output_size * sum(to_list(self.hparams.target_sizes)),\n", " hidden_size=self.hparams.hidden_size,\n", " n_hidden_layers=self.hparams.n_hidden_layers,\n", " )\n", "\n", " def forward(self, x: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:\n", " # x is a batch generated based on the TimeSeriesDataset\n", " batch_size = x[\"encoder_cont\"].size(0)\n", " network_input = x[\"encoder_cont\"].view(batch_size, -1)\n", " prediction = self.network(network_input)\n", " # RESHAPE output to batch_size x n_decoder_timesteps x sum_of_target_sizes\n", " prediction = prediction.unsqueeze(-1).view(batch_size, self.hparams.output_size, 
sum(to_list(self.hparams.target_sizes)))\n", " # RESHAPE into list of batch_size x n_decoder_timesteps x target_sizes[i] where i=1..len(target_sizes)\n", " stops = np.cumsum(self.hparams.target_sizes)\n", " starts = stops - self.hparams.target_sizes\n", " prediction = [prediction[..., start:stop] for start, stop in zip(starts, stops)]\n", " if isinstance(self.hparams.target_sizes, int): # only one target\n", " prediction = prediction[0]\n", "\n", " # rescale predictions into target space\n", " prediction = self.transform_output(prediction, target_scale=x[\"target_scale\"])\n", "\n", " # We need to return a named tuple that at least contains the prediction.\n", " # The parameter can be directly forwarded from the input.\n", " # The conversion to a named tuple can be directly achieved with the `to_network_output` function.\n", " return self.to_network_output(prediction=prediction)\n", "\n", " @classmethod\n", " def from_dataset(cls, dataset: TimeSeriesDataSet, **kwargs):\n", " # By default only handle targets of size one here, categorical targets would be of larger size\n", " new_kwargs = {\n", " \"target_sizes\": [1] * len(to_list(dataset.target)),\n", " \"output_size\": dataset.max_prediction_length,\n", " \"input_size\": dataset.max_encoder_length,\n", " }\n", " new_kwargs.update(kwargs) # use to pass real hyperparameters and override defaults set by dataset\n", " # example for dataset validation\n", " assert dataset.max_prediction_length == dataset.min_prediction_length, \"Decoder only supports a fixed length\"\n", " assert dataset.min_encoder_length == dataset.max_encoder_length, \"Encoder only supports a fixed length\"\n", " assert (\n", " len(dataset.time_varying_known_categoricals) == 0\n", " and len(dataset.time_varying_known_reals) == 0\n", " and len(dataset.time_varying_unknown_categoricals) == 0\n", " and len(dataset.static_categoricals) == 0\n", " and len(dataset.static_reals) == 0\n", " and len(dataset.time_varying_unknown_reals)\n", " == 
len(dataset.target_names) # expect as many unknown reals as targets\n", " ), \"Only covariate should be in 'time_varying_unknown_reals'\"\n", "\n", " return super().from_dataset(dataset, **new_kwargs)\n", "\n", "\n", "model = FullyConnectedMultiTargetModel.from_dataset(\n", " multi_target_dataset,\n", " hidden_size=10,\n", " n_hidden_layers=2,\n", " loss=MultiLoss(metrics=[MAE(), SMAPE()], weights=[2.0, 1.0]),\n", ")\n", "print(ModelSummary(model, max_depth=-1))\n", "model.hparams" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Now, let's pass some data through our model and calculate the loss.\n" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Output(prediction=[tensor([[[0.6287],\n", " [0.6112]],\n", "\n", " [[0.5641],\n", " [0.5441]],\n", "\n", " [[0.6994],\n", " [0.6710]],\n", "\n", " [[0.5038],\n", " [0.4876]]], grad_fn=), tensor([[[0.6652],\n", " [0.4931]],\n", "\n", " [[0.6647],\n", " [0.4883]],\n", "\n", " [[0.6632],\n", " [0.4920]],\n", "\n", " [[0.6718],\n", " [0.4899]]], grad_fn=)])" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "out = model(x)\n", "out" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tensor(0.8016, grad_fn=)" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model.loss(out[\"prediction\"], y)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Using covariates\n" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Now that we have established the basics, we can move on to more advanced use cases, e.g. how we can make use of covariates, static and continuous alike. We can leverage the :py:class:`~pytorch_forecasting.models.base_model.BaseModelWithCovariates` for this. 
The difference from the :py:class:`~pytorch_forecasting.models.base_model.BaseModel` is a :py:meth:`~pytorch_forecasting.models.base_model.BaseModelWithCovariates.from_dataset` method that pre-defines hyperparameters for architectures with covariates.\n", "\n", ".. autoclass:: pytorch_forecasting.models.base_model.BaseModelWithCovariates\n", " :noindex:\n", " :members: from_dataset\n", " \n", "\n", "Here is the BaseModelWithCovariates docstring to copy from:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " Model with additional methods using covariates.\n", "\n", " Assumes the following hyperparameters:\n", "\n", " Args:\n", " static_categoricals (List[str]): names of static categorical variables\n", " static_reals (List[str]): names of static continuous variables\n", " time_varying_categoricals_encoder (List[str]): names of categorical variables for encoder\n", " time_varying_categoricals_decoder (List[str]): names of categorical variables for decoder\n", " time_varying_reals_encoder (List[str]): names of continuous variables for encoder\n", " time_varying_reals_decoder (List[str]): names of continuous variables for decoder\n", " x_reals (List[str]): order of continuous variables in tensor passed to forward function\n", " x_categoricals (List[str]): order of categorical variables in tensor passed to forward function\n", " embedding_sizes (Dict[str, Tuple[int, int]]): dictionary mapping categorical variables to tuple of integers\n", " where the first integer denotes the number of categorical classes and the second the embedding size\n", " embedding_labels (Dict[str, List[str]]): dictionary mapping (string) indices to list of categorical labels\n", " embedding_paddings (List[str]): names of categorical variables for which label 0 is always mapped to an\n", " embedding vector filled with zeros\n", " categorical_groups (Dict[str, List[str]]): dictionary of categorical 
variables that are grouped together and\n", " can also take multiple values simultaneously (e.g. holiday during octoberfest). They should be implemented\n", " as bag of embeddings\n", " \n" ] } ], "source": [ "from pytorch_forecasting.models.base_model import BaseModelWithCovariates\n", "\n", "print(BaseModelWithCovariates.__doc__)" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "We will now implement the model. A helpful module is the :py:class:`~pytorch_forecasting.models.nn.embeddings.MultiEmbedding`, which can be used to embed categorical features. It is compliant with the :py:class:`~pytorch_forecasting.data.timeseries.TimeSeriesDataSet`, i.e. it supports bags of embeddings, which are useful when multiple categories can occur at the same time, such as holidays. Again, we will create a fully connected network. It is easy to recycle our ``FullyConnectedModule``: we simply set ``input_size`` to the number of encoder time steps times the number of features instead of just the number of encoder time steps."
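] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "To make the feature count concrete, here is a minimal plain-PyTorch sketch using made-up sizes. It concatenates continuous encoder variables and categorical embeddings along the feature dimension and then flattens them into one input vector per sample. A single `nn.Embedding` stands in for `MultiEmbedding` here, so the cell illustrates the shape arithmetic rather than the library's own code path.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import torch\n", "from torch import nn\n", "\n", "# illustrative sizes (assumptions, not taken from a dataset)\n", "batch_size, encoder_length = 4, 5\n", "n_reals = 2  # number of continuous variables\n", "n_categories, embedding_size = 2, 3  # a single categorical variable\n", "\n", "# stand-ins for the encoder_cont and encoder_cat entries of a batch\n", "encoder_cont = torch.rand(batch_size, encoder_length, n_reals)\n", "encoder_cat = torch.randint(0, n_categories, (batch_size, encoder_length, 1))\n", "\n", "# a plain nn.Embedding is used here in place of MultiEmbedding\n", "embedding = nn.Embedding(n_categories, embedding_size)\n", "embedded = embedding(encoder_cat[..., 0])  # batch_size x encoder_length x embedding_size\n", "\n", "# concatenate along the feature dimension, then flatten time steps and features\n", "features = torch.cat([encoder_cont, embedded], dim=-1)\n", "n_features = n_reals + embedding_size\n", "network_input = features.view(batch_size, -1)\n", "\n", "# input_size of the fully connected network = encoder time steps x features\n", "assert network_input.shape == (batch_size, encoder_length * n_features)\n", "print(network_input.shape)  # torch.Size([4, 25])"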
] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "from typing import Dict, List, Tuple\n", "\n", "from pytorch_forecasting.models.nn import MultiEmbedding\n", "\n", "\n", "class FullyConnectedModelWithCovariates(BaseModelWithCovariates):\n", " def __init__(\n", " self,\n", " input_size: int,\n", " output_size: int,\n", " hidden_size: int,\n", " n_hidden_layers: int,\n", " x_reals: List[str],\n", " x_categoricals: List[str],\n", " embedding_sizes: Dict[str, Tuple[int, int]],\n", " embedding_labels: Dict[str, List[str]],\n", " static_categoricals: List[str],\n", " static_reals: List[str],\n", " time_varying_categoricals_encoder: List[str],\n", " time_varying_categoricals_decoder: List[str],\n", " time_varying_reals_encoder: List[str],\n", " time_varying_reals_decoder: List[str],\n", " embedding_paddings: List[str],\n", " categorical_groups: Dict[str, List[str]],\n", " **kwargs,\n", " ):\n", " # saves arguments in signature to `.hparams` attribute, mandatory call - do not skip this\n", " self.save_hyperparameters()\n", " # pass additional arguments to BaseModel.__init__, mandatory call - do not skip this\n", " super().__init__(**kwargs)\n", "\n", " # create embedder - can be fed with x[\"encoder_cat\"] or x[\"decoder_cat\"] and will return\n", " # dictionary of category names mapped to embeddings\n", " self.input_embeddings = MultiEmbedding(\n", " embedding_sizes=self.hparams.embedding_sizes,\n", " categorical_groups=self.hparams.categorical_groups,\n", " embedding_paddings=self.hparams.embedding_paddings,\n", " x_categoricals=self.hparams.x_categoricals,\n", " max_embedding_size=self.hparams.hidden_size,\n", " )\n", "\n", " # calculate the size of all concatenated embeddings + continuous variables\n", " n_features = sum(\n", " embedding_size for classes_size, embedding_size in self.hparams.embedding_sizes.values()\n", " ) + len(self.reals)\n", "\n", " # create network that will be fed with continuous variables and 
embeddings\n", " self.network = FullyConnectedModule(\n", " input_size=self.hparams.input_size * n_features,\n", " output_size=self.hparams.output_size,\n", " hidden_size=self.hparams.hidden_size,\n", " n_hidden_layers=self.hparams.n_hidden_layers,\n", " )\n", "\n", " def forward(self, x: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:\n", " # x is a batch generated based on the TimeSeriesDataset\n", " batch_size = x[\"encoder_lengths\"].size(0)\n", " embeddings = self.input_embeddings(x[\"encoder_cat\"]) # returns dictionary with embedding tensors\n", " network_input = torch.cat(\n", " [x[\"encoder_cont\"]]\n", " + [\n", " emb\n", " for name, emb in embeddings.items()\n", " if name in self.encoder_variables or name in self.static_variables\n", " ],\n", " dim=-1,\n", " )\n", " prediction = self.network(network_input.view(batch_size, -1))\n", "\n", " # rescale predictions into target space\n", " prediction = self.transform_output(prediction, target_scale=x[\"target_scale\"])\n", "\n", " # We need to return a named tuple that at least contains the prediction.\n", " # The parameter can be directly forwarded from the input.\n", " # The conversion to a named tuple can be directly achieved with the `to_network_output` function.\n", " return self.to_network_output(prediction=prediction)\n", "\n", " @classmethod\n", " def from_dataset(cls, dataset: TimeSeriesDataSet, **kwargs):\n", " new_kwargs = {\n", " \"output_size\": dataset.max_prediction_length,\n", " \"input_size\": dataset.max_encoder_length,\n", " }\n", " new_kwargs.update(kwargs) # use to pass real hyperparameters and override defaults set by dataset\n", " # example for dataset validation\n", " assert dataset.max_prediction_length == dataset.min_prediction_length, \"Decoder only supports a fixed length\"\n", " assert dataset.min_encoder_length == dataset.max_encoder_length, \"Encoder only supports a fixed length\"\n", "\n", " return super().from_dataset(dataset, **new_kwargs)" ] }, { "cell_type": "raw", 
"metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Here, we have used additional hooks available through the :py:class:`~pytorch_forecasting.models.base_model.BaseModelWithCovariates`, such as ``self.static_variables`` or ``self.encoder_variables``, which are readily determined from the hyperparameters. See the documentation of the :py:class:`~pytorch_forecasting.models.base_model.BaseModelWithCovariates` class for all available additions to the :py:class:`~pytorch_forecasting.models.base_model.BaseModel`.\n", "\n", "When the model receives its input `x`, you can use the hyperparameters, together with the additional variable lists provided by the :py:class:`~pytorch_forecasting.models.base_model.BaseModelWithCovariates`, to identify the different variables. This is important because ``x[\"encoder_cat\"].size(2) == x[\"decoder_cat\"].size(2)`` and ``x[\"encoder_cont\"].size(2) == x[\"decoder_cont\"].size(2)``. This means all variables are passed to both the encoder and the decoder, even those that the decoder must not use because they are not known in the future. The order of variables in ``x[\"encoder_cont\"]`` / ``x[\"decoder_cont\"]`` and ``x[\"encoder_cat\"]`` / ``x[\"decoder_cat\"]`` is determined by the hyperparameters ``x_reals`` and ``x_categoricals``. Consequently, you can identify, for example, the positions of all continuous decoder variables with ``[self.hparams.x_reals.index(name) for name in self.hparams.time_varying_reals_decoder]``." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Note that the model does not make use of the known covariates in the decoder. This is obviously suboptimal but beyond the scope of this tutorial. Anyway, let us create a new dataset with categorical variables and see how the model can be instantiated from it.\n" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " value group time_idx categorical_covariate real_covariate\n", "0 0.944604 0 0 a 0.405124\n", "1 0.640749 0 1 b 0.573697\n", "2 0.019133 0 2 b 0.253981\n", "3 0.749837 0 3 a 0.200379\n", "4 0.714824 0 4 a 0.297402\n", "5 0.349583 0 5 b 0.822654\n", "6 0.280392 0 6 a 0.857269\n", "7 0.333071 0 7 b 0.744103\n", "8 0.024681 0 8 b 0.084565\n", "9 0.339076 0 9 a 0.108766\n", "10 0.616364 1 0 b 0.965863\n", "11 0.650180 1 1 b 0.339208\n", "12 0.109087 1 2 b 0.840201\n", "13 0.502652 1 3 a 0.938904\n", "14 0.993959 1 4 a 0.730369\n", "15 0.671322 1 5 b 0.611059\n", "16 0.858479 1 6 b 0.885494\n", "17 0.178716 1 7 a 0.894173\n", "18 0.860691 1 8 b 0.987288\n", "19 0.749905 1 9 a 0.494003\n", "20 0.783317 2 0 a 0.176965\n", "21 0.756453 2 1 a 0.505112\n", "22 0.418974 2 2 b 0.151147\n", "23 0.161820 2 3 a 0.160465\n", "24 0.224116 2 4 b 0.504209\n", "25 0.799235 2 5 b 0.273152\n", "26 0.501007 2 6 b 0.151468\n", "27 0.963154 2 7 a 0.778906\n", "28 0.198955 2 8 b 0.016670\n", "29 0.172247 2 9 b 0.818567" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "import pandas as pd\n", "\n", "from pytorch_forecasting import TimeSeriesDataSet\n", "\n", "test_data_with_covariates = pd.DataFrame(\n", " dict(\n", " # as before\n", " value=np.random.rand(30),\n", " group=np.repeat(np.arange(3), 10),\n", " time_idx=np.tile(np.arange(10), 3),\n", " # now adding covariates\n", " categorical_covariate=np.random.choice([\"a\", \"b\"], size=30),\n", " real_covariate=np.random.rand(30),\n", " )\n", ").astype(\n", " dict(group=str)\n", ") # categorical covariates have to be of string type\n", "test_data_with_covariates" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " | Name | Type | Params\n", "--------------------------------------------------------------------------------------------\n", "0 | loss | SMAPE | 0 \n", "1 | 
logging_metrics | ModuleList | 0 \n", "2 | input_embeddings | MultiEmbedding | 11 \n", "3 | input_embeddings.embeddings | ModuleDict | 11 \n", "4 | input_embeddings.embeddings.group | Embedding | 9 \n", "5 | input_embeddings.embeddings.categorical_covariate | Embedding | 2 \n", "6 | network | FullyConnectedModule | 552 \n", "7 | network.sequential | Sequential | 552 \n", "8 | network.sequential.0 | Linear | 310 \n", "9 | network.sequential.1 | ReLU | 0 \n", "10 | network.sequential.2 | Linear | 110 \n", "11 | network.sequential.3 | ReLU | 0 \n", "12 | network.sequential.4 | Linear | 110 \n", "13 | network.sequential.5 | ReLU | 0 \n", "14 | network.sequential.6 | Linear | 22 \n", "--------------------------------------------------------------------------------------------\n", "563 Trainable params\n", "0 Non-trainable params\n", "563 Total params\n", "0.002 Total estimated model params size (MB)\n" ] }, { "data": { "text/plain": [ "\"categorical_groups\": {}\n", "\"embedding_labels\": {'group': {'0': 0, '1': 1, '2': 2}, 'categorical_covariate': {'a': 0, 'b': 1}}\n", "\"embedding_paddings\": []\n", "\"embedding_sizes\": {'group': (3, 3), 'categorical_covariate': (2, 1)}\n", "\"hidden_size\": 10\n", "\"input_size\": 5\n", "\"learning_rate\": 0.001\n", "\"log_gradient_flow\": False\n", "\"log_interval\": -1\n", "\"log_val_interval\": -1\n", "\"logging_metrics\": ModuleList()\n", "\"loss\": SMAPE()\n", "\"monotone_constaints\": {}\n", "\"n_hidden_layers\": 2\n", "\"optimizer\": ranger\n", "\"optimizer_params\": None\n", "\"output_size\": 2\n", "\"output_transformer\": GroupNormalizer(\n", "\tmethod='standard',\n", "\tgroups=[],\n", "\tcenter=True,\n", "\tscale_by_group=False,\n", "\ttransformation='relu',\n", "\tmethod_kwargs={}\n", ")\n", "\"reduce_on_plateau_min_lr\": 1e-05\n", "\"reduce_on_plateau_patience\": 1000\n", "\"reduce_on_plateau_reduction\": 2.0\n", "\"static_categoricals\": ['group']\n", "\"static_reals\": []\n", "\"time_varying_categoricals_decoder\": 
['categorical_covariate']\n", "\"time_varying_categoricals_encoder\": ['categorical_covariate']\n", "\"time_varying_reals_decoder\": ['real_covariate']\n", "\"time_varying_reals_encoder\": ['real_covariate', 'value']\n", "\"weight_decay\": 0.0\n", "\"x_categoricals\": ['group', 'categorical_covariate']\n", "\"x_reals\": ['real_covariate', 'value']" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create the dataset from the pandas dataframe\n", "dataset_with_covariates = TimeSeriesDataSet(\n", " test_data_with_covariates,\n", " group_ids=[\"group\"],\n", " target=\"value\",\n", " time_idx=\"time_idx\",\n", " min_encoder_length=5,\n", " max_encoder_length=5,\n", " min_prediction_length=2,\n", " max_prediction_length=2,\n", " time_varying_unknown_reals=[\"value\"],\n", " time_varying_known_reals=[\"real_covariate\"],\n", " time_varying_known_categoricals=[\"categorical_covariate\"],\n", " static_categoricals=[\"group\"],\n", ")\n", "\n", "model = FullyConnectedModelWithCovariates.from_dataset(dataset_with_covariates, hidden_size=10, n_hidden_layers=2)\n", "print(ModelSummary(model, max_depth=-1)) # print model summary\n", "model.hparams" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "To test that the model could be trained, pass a sample batch.\n" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Output(prediction=tensor([[0.6245, 0.5642],\n", " [0.6215, 0.5603],\n", " [0.6228, 0.5637],\n", " [0.6277, 0.5627]], grad_fn=))" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x, y = next(iter(dataset_with_covariates.to_dataloader(batch_size=4))) # generate batch\n", "model(x) # pass batch through model" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Implementing an autoregressive / recurrent model\n" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": 
"text/restructuredtext" }, "source": [ "Often, time series models are autoregressive, i.e. instead of making ``n`` predictions for all future steps in a single function call, they predict one step ahead ``n`` times. PyTorch Forecasting comes with a\n", ":py:class:`~pytorch_forecasting.models.base_model.AutoRegressiveBaseModel` and a :py:class:`~pytorch_forecasting.models.base_model.AutoRegressiveBaseModelWithCovariates` for such models.\n", "\n", ".. autoclass:: pytorch_forecasting.models.base_model.AutoRegressiveBaseModel\n", " :noindex:\n", "\n", "In this section, we will implement a simple LSTM model that could easily be extended to work with covariates. Note that because we do not handle covariates, lagged targets cannot be incorporated in this network. We use an implementation of the :py:class:`~pytorch_forecasting.models.nn.rnn.LSTM` that can handle zero-length sequences but otherwise fully mirrors the PyTorch-native implementation." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " | Name | Type | Params\n", "-----------------------------------------------\n", "0 | loss | SMAPE | 0 \n", "1 | logging_metrics | ModuleList | 0 \n", "2 | lstm | LSTM | 1.4 K \n", "3 | output_layer | Linear | 11 \n", "-----------------------------------------------\n", "1.4 K Trainable params\n", "0 Non-trainable params\n", "1.4 K Total params\n", "0.006 Total estimated model params size (MB)\n" ] }, { "data": { "text/plain": [ "\"dropout\": 0.1\n", "\"hidden_size\": 10\n", "\"learning_rate\": 0.001\n", "\"log_gradient_flow\": False\n", "\"log_interval\": -1\n", "\"log_val_interval\": -1\n", "\"logging_metrics\": ModuleList()\n", "\"loss\": SMAPE()\n", "\"monotone_constaints\": {}\n", "\"n_layers\": 2\n", "\"optimizer\": ranger\n", "\"optimizer_params\": None\n", "\"output_transformer\": GroupNormalizer(\n", "\tmethod='standard',\n", "\tgroups=[],\n", "\tcenter=True,\n", "\tscale_by_group=False,\n", 
"\ttransformation=None,\n", "\tmethod_kwargs={}\n", ")\n", "\"reduce_on_plateau_min_lr\": 1e-05\n", "\"reduce_on_plateau_patience\": 1000\n", "\"reduce_on_plateau_reduction\": 2.0\n", "\"target\": value\n", "\"target_lags\": {}\n", "\"weight_decay\": 0.0" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from torch.nn.utils import rnn\n", "\n", "from pytorch_forecasting.models.base_model import AutoRegressiveBaseModel\n", "from pytorch_forecasting.models.nn import LSTM\n", "\n", "\n", "class LSTMModel(AutoRegressiveBaseModel):\n", " def __init__(\n", " self,\n", " target: str,\n", " target_lags: Dict[str, Dict[str, int]],\n", " n_layers: int,\n", " hidden_size: int,\n", " dropout: float = 0.1,\n", " **kwargs,\n", " ):\n", " # arguments target and target_lags are required for autoregressive models\n", " # even though target_lags cannot be used without covariates\n", " # saves arguments in signature to `.hparams` attribute, mandatory call - do not skip this\n", " self.save_hyperparameters()\n", " # pass additional arguments to BaseModel.__init__, mandatory call - do not skip this\n", " super().__init__(**kwargs)\n", "\n", " # use version of LSTM that can handle zero-length sequences\n", " self.lstm = LSTM(\n", " hidden_size=self.hparams.hidden_size,\n", " input_size=1,\n", " num_layers=self.hparams.n_layers,\n", " dropout=self.hparams.dropout,\n", " batch_first=True,\n", " )\n", " self.output_layer = nn.Linear(self.hparams.hidden_size, 1)\n", "\n", " def encode(self, x: Dict[str, torch.Tensor]):\n", " # we need at least one encoding step because the target needs to be lagged by one time step\n", " # because we use the custom LSTM, we do not have to require encoder lengths of > 1\n", " # but can handle lengths of >= 1\n", " assert x[\"encoder_lengths\"].min() >= 1\n", " input_vector = x[\"encoder_cont\"].clone()\n", " # lag target by one\n", " input_vector[..., self.target_positions] = torch.roll(\n", " input_vector[..., 
self.target_positions], shifts=1, dims=1\n", " )\n", " input_vector = input_vector[:, 1:] # first time step cannot be used because of lagging\n", "\n", " # determine effective encoder lengths (the first time step is dropped because of lagging)\n", " effective_encoder_lengths = x[\"encoder_lengths\"] - 1\n", " # run through LSTM network\n", " _, hidden_state = self.lstm(\n", " input_vector, lengths=effective_encoder_lengths, enforce_sorted=False # passing the lengths directly\n", " ) # first output (the LSTM outputs) is not needed, only the hidden state\n", " return hidden_state\n", "\n", " def decode(self, x: Dict[str, torch.Tensor], hidden_state):\n", " # again lag target by one\n", " input_vector = x[\"decoder_cont\"].clone()\n", " input_vector[..., self.target_positions] = torch.roll(\n", " input_vector[..., self.target_positions], shifts=1, dims=1\n", " )\n", " # but this time fill in missing target from encoder_cont at the first time step instead of throwing it away\n", " last_encoder_target = x[\"encoder_cont\"][\n", " torch.arange(x[\"encoder_cont\"].size(0), device=x[\"encoder_cont\"].device),\n", " x[\"encoder_lengths\"] - 1,\n", " self.target_positions.unsqueeze(-1),\n", " ].T\n", " input_vector[:, 0, self.target_positions] = last_encoder_target\n", "\n", " if self.training: # training mode\n", " lstm_output, _ = self.lstm(input_vector, hidden_state, lengths=x[\"decoder_lengths\"], enforce_sorted=False)\n", "\n", " # transform into right shape\n", " prediction = self.output_layer(lstm_output)\n", " prediction = self.transform_output(prediction, target_scale=x[\"target_scale\"])\n", "\n", " # predictions have been rescaled by transform_output()\n", " return prediction\n", "\n", " else: # prediction mode\n", " target_pos = self.target_positions\n", "\n", " def decode_one(idx, lagged_targets, hidden_state):\n", " x = input_vector[:, [idx]]\n", " # overwrite at target positions\n", " x[:, 0, target_pos] = lagged_targets[-1] # take most recent target (i.e. 
lag=1)\n", " lstm_output, hidden_state = self.lstm(x, hidden_state)\n", " # transform into right shape\n", " prediction = self.output_layer(lstm_output)[:, 0] # take first timestep\n", " return prediction, hidden_state\n", "\n", " # make predictions which are fed into next step\n", " output = self.decode_autoregressive(\n", " decode_one,\n", " first_target=input_vector[:, 0, target_pos],\n", " first_hidden_state=hidden_state,\n", " target_scale=x[\"target_scale\"],\n", " n_decoder_steps=input_vector.size(1),\n", " )\n", "\n", " # predictions are already rescaled\n", " return output\n", "\n", " def forward(self, x: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:\n", " hidden_state = self.encode(x) # encode to hidden state\n", " output = self.decode(x, hidden_state) # decode leveraging hidden state\n", "\n", " return self.to_network_output(prediction=output)\n", "\n", "\n", "model = LSTMModel.from_dataset(dataset, n_layers=2, hidden_size=10)\n", "print(ModelSummary(model, max_depth=-1))\n", "model.hparams" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "We used the :py:meth:`~pytorch_forecasting.models.base_model.BaseModel.transform_output` method to apply the inverse transformation. It is also used under the hood for re-scaling/de-normalizing predictions and leverages the ``output_transformer`` to do so. The ``output_transformer`` is the ``target_normalizer`` as used in the dataset. 
When initializing the model from the dataset, it is automatically copied to the model.\n", "\n", "We can now check that both approaches deliver the same result in terms of prediction shape:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "prediction shape in training: torch.Size([4, 2, 1])\n", "prediction shape in inference: torch.Size([4, 2, 1])\n" ] } ], "source": [ "x, y = next(iter(dataloader))\n", "\n", "print(\n", " \"prediction shape in training:\", model(x)[\"prediction\"].size()\n", ") # batch_size x decoder time steps x 1 (1 for one target dimension)\n", "model.eval() # set model into eval mode to use autoregressive prediction\n", "print(\"prediction shape in inference:\", model(x)[\"prediction\"].size()) # should be the same as in training" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Using and defining a custom/non-trivial metric\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "To use a different metric, simply pass it to the model when initializing it (preferably via the `from_dataset()` method). 
For example, to use mean absolute error with our `FullyConnectedModel` from the beginning of this tutorial, type\n" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\"hidden_size\": 10\n", "\"input_size\": 5\n", "\"learning_rate\": 0.001\n", "\"log_gradient_flow\": False\n", "\"log_interval\": -1\n", "\"log_val_interval\": -1\n", "\"logging_metrics\": ModuleList()\n", "\"loss\": MAE()\n", "\"monotone_constaints\": {}\n", "\"n_hidden_layers\": 2\n", "\"optimizer\": ranger\n", "\"optimizer_params\": None\n", "\"output_size\": 2\n", "\"output_transformer\": GroupNormalizer(\n", "\tmethod='standard',\n", "\tgroups=[],\n", "\tcenter=True,\n", "\tscale_by_group=False,\n", "\ttransformation=None,\n", "\tmethod_kwargs={}\n", ")\n", "\"reduce_on_plateau_min_lr\": 1e-05\n", "\"reduce_on_plateau_patience\": 1000\n", "\"reduce_on_plateau_reduction\": 2.0\n", "\"weight_decay\": 0.0" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from pytorch_forecasting.metrics import MAE\n", "\n", "model = FullyConnectedModel.from_dataset(dataset, hidden_size=10, n_hidden_layers=2, loss=MAE())\n", "model.hparams" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Note that some metrics require a specific form of model prediction, e.g. quantile prediction assumes an output of shape `batch_size x n_decoder_timesteps x n_quantiles` instead of `batch_size x n_decoder_timesteps`. For the `FullyConnectedModel`, this means that we need to use a modified `FullyConnectedModule` network. 
Here `n_outputs` corresponds to the number of quantiles.\n" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "torch.Size([20, 2, 7])" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import torch\n", "from torch import nn\n", "\n", "\n", "class FullyConnectedMultiOutputModule(nn.Module):\n", " def __init__(self, input_size: int, output_size: int, hidden_size: int, n_hidden_layers: int, n_outputs: int):\n", " super().__init__()\n", "\n", " # input layer\n", " module_list = [nn.Linear(input_size, hidden_size), nn.ReLU()]\n", " # hidden layers\n", " for _ in range(n_hidden_layers):\n", " module_list.extend([nn.Linear(hidden_size, hidden_size), nn.ReLU()])\n", " # output layer\n", " self.n_outputs = n_outputs\n", " module_list.append(\n", " nn.Linear(hidden_size, output_size * n_outputs)\n", " ) # <<<<<<<< modified: replaced output_size with output_size * n_outputs\n", "\n", " self.sequential = nn.Sequential(*module_list)\n", "\n", " def forward(self, x: torch.Tensor) -> torch.Tensor:\n", " # x of shape: batch_size x n_timesteps_in\n", " # output of shape batch_size x n_timesteps_out x n_outputs\n", " return self.sequential(x).reshape(x.size(0), -1, self.n_outputs) # <<<<<<<< modified: added reshape\n", "\n", "\n", "# test that network works as intended\n", "network = FullyConnectedMultiOutputModule(input_size=5, output_size=2, hidden_size=10, n_hidden_layers=2, n_outputs=7)\n", "network(torch.rand(20, 5)).shape # <<<<<<<<<< instead of shape (20, 2), returning additional dimension for quantiles" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Using the above-defined ``FullyConnectedMultiOutputModule``, we could create a new model and use :py:class:`~pytorch_forecasting.metrics.QuantileLoss`. 
Note that you would have to align ``n_outputs`` with the number of quantiles in the :py:class:`~pytorch_forecasting.metrics.QuantileLoss` class, either manually or by making use of the ``from_dataset()`` method. If you want to switch back to a loss on a single output, such as :py:class:`~pytorch_forecasting.metrics.MAE`, simply set ``n_outputs=1``, as all PyTorch Forecasting metrics can handle the additional third dimension as long as it is of size 1." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Implement a new metric\n" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "To implement a new metric, you simply need to inherit from the :py:class:`~pytorch_forecasting.metrics.MultiHorizonMetric` and define the loss function. The :py:class:`~pytorch_forecasting.metrics.MultiHorizonMetric` handles everything from weighting to masking values for you. For example, the mean absolute error is implemented as" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "from pytorch_forecasting.metrics import MultiHorizonMetric\n", "\n", "\n", "class MAE(MultiHorizonMetric):\n", " def loss(self, y_pred, target):\n", " loss = (self.to_prediction(y_pred) - target).abs()\n", " return loss" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "You might notice the :py:meth:`~pytorch_forecasting.metrics.Metric.to_prediction` method. Generally speaking, it converts ``y_pred`` to a point prediction. By default, this means that it removes the third dimension from ``y_pred`` if there is one. For most metrics, this is exactly what you need.\n", "\n", "For custom :py:class:`~pytorch_forecasting.metrics.DistributionLoss` metrics, different methods need to be implemented.\n", "\n", ".. 
autoclass:: pytorch_forecasting.metrics.DistributionLoss\n", " :members: map_x_to_distribution, rescale_parameters\n", " :noindex:" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Model output cannot be readily converted to prediction\n" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Sometimes a network's ``forward()`` output does not trivially map to a prediction. For example, this is the case if you predict the parameters of a distribution, as is the case for all classes deriving from :py:class:`~pytorch_forecasting.metrics.DistributionLoss`. In particular, this means that you need to handle training and prediction differently. Converting the parameters to predictions is typically implemented by the metric's ``to_prediction()`` method.\n", "\n", "We will now study the case of the :py:class:`~pytorch_forecasting.metrics.NormalDistributionLoss`. It requires us to predict the ``mean`` and the ``scale`` of the normal distribution. We can do so by leveraging our ``FullyConnectedMultiOutputModule`` class that we used for predicting multiple quantiles." 
] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " | Name | Type | Params\n", "--------------------------------------------------------------------------\n", "0 | loss | NormalDistributionLoss | 0 \n", "1 | logging_metrics | ModuleList | 0 \n", "2 | network | FullyConnectedMultiOutputModule | 324 \n", "3 | network.sequential | Sequential | 324 \n", "4 | network.sequential.0 | Linear | 60 \n", "5 | network.sequential.1 | ReLU | 0 \n", "6 | network.sequential.2 | Linear | 110 \n", "7 | network.sequential.3 | ReLU | 0 \n", "8 | network.sequential.4 | Linear | 110 \n", "9 | network.sequential.5 | ReLU | 0 \n", "10 | network.sequential.6 | Linear | 44 \n", "--------------------------------------------------------------------------\n", "324 Trainable params\n", "0 Non-trainable params\n", "324 Total params\n", "0.001 Total estimated model params size (MB)\n" ] }, { "data": { "text/plain": [ "\"hidden_size\": 10\n", "\"input_size\": 5\n", "\"learning_rate\": 0.001\n", "\"log_gradient_flow\": False\n", "\"log_interval\": -1\n", "\"log_val_interval\": -1\n", "\"logging_metrics\": ModuleList()\n", "\"loss\": SMAPE()\n", "\"monotone_constaints\": {}\n", "\"n_hidden_layers\": 2\n", "\"optimizer\": ranger\n", "\"optimizer_params\": None\n", "\"output_size\": 2\n", "\"output_transformer\": GroupNormalizer(\n", "\tmethod='standard',\n", "\tgroups=[],\n", "\tcenter=True,\n", "\tscale_by_group=False,\n", "\ttransformation=None,\n", "\tmethod_kwargs={}\n", ")\n", "\"reduce_on_plateau_min_lr\": 1e-05\n", "\"reduce_on_plateau_patience\": 1000\n", "\"reduce_on_plateau_reduction\": 2.0\n", "\"weight_decay\": 0.0" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from copy import copy\n", "\n", "from pytorch_forecasting.metrics import NormalDistributionLoss\n", "\n", "\n", "class FullyConnectedForDistributionLossModel(BaseModel): # we inherit the 
`from_dataset` method\n", " def __init__(self, input_size: int, output_size: int, hidden_size: int, n_hidden_layers: int, **kwargs):\n", " # saves arguments in signature to `.hparams` attribute, mandatory call - do not skip this\n", " self.save_hyperparameters()\n", " # pass additional arguments to BaseModel.__init__, mandatory call - do not skip this\n", " super().__init__(**kwargs)\n", " self.network = FullyConnectedMultiOutputModule(\n", " input_size=self.hparams.input_size,\n", " output_size=self.hparams.output_size,\n", " hidden_size=self.hparams.hidden_size,\n", " n_hidden_layers=self.hparams.n_hidden_layers,\n", " n_outputs=2, # <<<<<<<< we predict two outputs for mean and scale of the normal distribution\n", " )\n", " self.loss = NormalDistributionLoss()\n", "\n", " @classmethod\n", " def from_dataset(cls, dataset: TimeSeriesDataSet, **kwargs):\n", " new_kwargs = {\n", " \"output_size\": dataset.max_prediction_length,\n", " \"input_size\": dataset.max_encoder_length,\n", " }\n", " new_kwargs.update(kwargs) # use to pass real hyperparameters and override defaults set by dataset\n", " # example for dataset validation\n", " assert dataset.max_prediction_length == dataset.min_prediction_length, \"Decoder only supports a fixed length\"\n", " assert dataset.min_encoder_length == dataset.max_encoder_length, \"Encoder only supports a fixed length\"\n", " assert (\n", " len(dataset.time_varying_known_categoricals) == 0\n", " and len(dataset.time_varying_known_reals) == 0\n", " and len(dataset.time_varying_unknown_categoricals) == 0\n", " and len(dataset.static_categoricals) == 0\n", " and len(dataset.static_reals) == 0\n", " and len(dataset.time_varying_unknown_reals) == 1\n", " and dataset.time_varying_unknown_reals[0] == dataset.target\n", " ), \"Only covariate should be the target in 'time_varying_unknown_reals'\"\n", "\n", " return super().from_dataset(dataset, **new_kwargs)\n", "\n", " def forward(self, x: Dict[str, torch.Tensor], n_samples: int = None) -> 
Dict[str, torch.Tensor]:\n", " # x is a batch generated based on the TimeSeriesDataset\n", " network_input = x[\"encoder_cont\"].squeeze(-1)\n", " prediction = self.network(network_input) # shape batch_size x n_decoder_steps x 2\n", " # we need to scale the parameters to real space\n", " prediction = self.transform_output(\n", " prediction=prediction,\n", " target_scale=x[\"target_scale\"],\n", " )\n", " if n_samples is not None:\n", " # sample from distribution\n", " prediction = self.loss.sample(prediction, n_samples)\n", " # The conversion to a named tuple can be directly achieved with the `to_network_output` function.\n", " return self.to_network_output(prediction=prediction)\n", "\n", "\n", "model = FullyConnectedForDistributionLossModel.from_dataset(dataset, hidden_size=10, n_hidden_layers=2)\n", "print(ModelSummary(model, max_depth=-1))\n", "model.hparams" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "You notice that not much changes. 
All the magic is implemented in the metric itself, which knows how to re-scale the network output to distribution \"parameters\" and how to transform these \"parameters\" into \"predictions\". Under the hood, this happens via the model's ``transform_output()`` method and the metric's ``to_prediction()`` method, respectively.\n", "\n", "We can now test that the network works as expected:" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tensor([2, 2, 2, 2])" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[\"decoder_lengths\"]" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "parameter prediction shape: torch.Size([4, 2, 4])\n", "sample prediction shape: torch.Size([4, 2, 200])\n" ] } ], "source": [ "x, y = next(iter(dataloader))\n", "\n", "print(\"parameter prediction shape: \", model(x)[\"prediction\"].size())\n", "model.eval() # set model into eval mode for sampling\n", "print(\"sample prediction shape: \", model(x, n_samples=200)[\"prediction\"].size())" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "To run inference, you can still use the :py:meth:`~pytorch_forecasting.models.base_model.BaseModel.predict()` method, since additional arguments are passed to the metric's ``to_quantiles()`` method via the ``mode_kwargs`` parameter. For example, we can execute the following line to generate 100 traces and subsequently calculate quantiles." 
] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "GPU available: True (mps), used: True\n", "TPU available: False, using: 0 TPU cores\n", "IPU available: False, using: 0 IPUs\n", "HPU available: False, using: 0 HPUs\n" ] }, { "data": { "text/plain": [ "torch.Size([12, 2, 7])" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model.predict(dataloader, mode=\"quantiles\", mode_kwargs=dict(n_samples=100)).shape" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext", "tags": [] }, "source": [ "The returned quantiles are determined by the quantiles defined in the loss function and can be modified by passing a list of quantiles to the loss at initialization.\n", "\n", "Note that the sampling in the network's ``forward()`` method is not strictly necessary here. However, for stochastic, autoregressive networks such as :py:class:`~pytorch_forecasting.models.deepar.DeepAR`, predictions should be made by passing ``n_samples=100`` directly to the ``predict()`` method. Samples should then either be aggregated with ``mode_kwargs=dict(use_metric=False)`` (added automatically) or extracted directly with ``mode=(\"raw\", \"prediction\")`` (equivalent to ``mode=\"samples\"`` in DeepAR)." 
] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[0.02, 0.1, 0.25, 0.5, 0.75, 0.9, 0.98]" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model.loss.quantiles" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[0.2, 0.8]" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "NormalDistributionLoss(quantiles=[0.2, 0.8]).quantiles" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Adding custom plotting and interpretation\n" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "PyTorch Forecasting supports plotting of predictions and interpretations. The figures can also be logged to TensorBoard to monitor training progress. Sometimes, the output of the network cannot be directly plotted together with the observed time series. In these cases (such as our ``FullyConnectedForDistributionLossModel`` from the previous section), we need to adapt the plotting function. Further, sometimes we want to visualize certain properties of the network every other batch or after every epoch. It is easy to make this happen with PyTorch Forecasting and the `LightningModule `_ on which the :py:class:`~pytorch_forecasting.models.base_model.BaseModel` is based." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "The :py:meth:`~pytorch_forecasting.models.base_model.BaseModel.log_interval` property provides a log interval that switches automatically between the hyperparameters ``log_interval`` and ``log_val_interval``, depending on whether the model is in training or validation mode. If it is larger than 0, logging is enabled, and if ``batch_idx % log_interval == 0`` for a batch, logging for that batch is triggered. 
You can even set it to a number smaller than 1, leading to multiple logging events during a single batch." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Log often whenever an example prediction vs actuals plot is created\n" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "One of the easiest ways to log a figure regularly is to override the :py:meth:`~pytorch_forecasting.models.base_model.BaseModel.plot_prediction` method, e.g. to add something to the generated plot.\n", "\n", "In the following example, we will add an additional line indicating attention to the logged figure:" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "\n", "\n", "def plot_prediction(\n", " self,\n", " x: Dict[str, torch.Tensor],\n", " out: Dict[str, torch.Tensor],\n", " idx: int,\n", " plot_attention: bool = True,\n", " add_loss_to_title: bool = False,\n", " show_future_observed: bool = True,\n", " ax=None,\n", ") -> plt.Figure:\n", " \"\"\"\n", " Plot actuals vs prediction and attention\n", "\n", " Args:\n", " x (Dict[str, torch.Tensor]): network input\n", " out (Dict[str, torch.Tensor]): network output\n", " idx (int): sample index\n", " plot_attention: whether to plot attention on secondary axis\n", " add_loss_to_title: whether to add loss to title. Defaults to False.\n", " show_future_observed: whether to show actuals for future. 
Defaults to True.\n", " ax: matplotlib axes to plot on\n", "\n", " Returns:\n", " plt.Figure: matplotlib figure\n", " \"\"\"\n", " # plot prediction as normal\n", " fig = super().plot_prediction(\n", " x, out, idx=idx, add_loss_to_title=add_loss_to_title, show_future_observed=show_future_observed, ax=ax\n", " )\n", "\n", " # add attention on secondary axis\n", " if plot_attention:\n", " interpretation = self.interpret_output(out)\n", " ax = fig.axes[0]\n", " ax2 = ax.twinx()\n", " ax2.set_ylabel(\"Attention\")\n", " encoder_length = x[\"encoder_lengths\"][idx]\n", " ax2.plot(\n", " torch.arange(-encoder_length, 0),\n", " interpretation[\"attention\"][idx, :encoder_length].detach().cpu(),\n", " alpha=0.2,\n", " color=\"k\",\n", " )\n", " fig.tight_layout()\n", " return fig" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "If you want to add a completely new figure, override the :py:meth:`~pytorch_forecasting.models.base_model.BaseModel.log_prediction` method." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Log at the end of an epoch\n" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Logging at the end of an epoch is another common use case. You might want to calculate additional results in each step and then summarize them at the end of an epoch. To do so, override the :py:meth:`~pytorch_forecasting.models.base_model.BaseModel.create_log` method to calculate the additional results and the ``on_epoch_end()`` hook to summarize them.\n", "\n", "In the example below, we first calculate some interpretation results (but only if logging is enabled) and add them to the ``log`` object for later summarization. In the ``on_epoch_end()`` hook, we take the list of saved results and\n", "use the ``log_interpretation()`` method (defined elsewhere in the model) to log a figure to TensorBoard." 
] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [], "source": [ "from pytorch_forecasting.utils import detach\n", "\n", "\n", "def create_log(self, x, y, out, batch_idx, **kwargs):\n", " # create the standard log\n", " log = super().create_log(x, y, out, batch_idx, **kwargs)\n", " # calculate interpretations etc. for later logging\n", " if self.log_interval > 0:\n", " interpretation = self.interpret_output(\n", " detach(out),\n", " reduction=\"sum\",\n", " attention_prediction_horizon=0, # attention only for first prediction horizon\n", " )\n", " log[\"interpretation\"] = interpretation\n", " return log\n", "\n", "\n", "def on_epoch_end(self, outputs):\n", " \"\"\"\n", " Run at epoch end for training or validation\n", " \"\"\"\n", " if self.log_interval > 0:\n", " self.log_interpretation(outputs)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Log at the end of training\n" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "A common use case is to log the final embeddings at the end of training. You can easily achieve this by leveraging the PyTorch Lightning ``on_fit_end()`` model hook. Override that method to log the embeddings.\n", "\n", "The following example assumes that ``input_embeddings`` is a dictionary-like object of embeddings that are being trained, such as the :py:class:`~pytorch_forecasting.models.nn.embeddings.MultiEmbedding` class. Further, it assumes that a hyperparameter ``embedding_labels`` exists (as automatically required and created by the :py:class:`~pytorch_forecasting.models.base_model.BaseModelWithCovariates` class)." 
] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "def on_fit_end(self):\n", " \"\"\"\n", " Run at the end of training\n", " \"\"\"\n", " if self.log_interval > 0:\n", " for name, emb in self.input_embeddings.items():\n", " labels = self.hparams.embedding_labels[name]\n", " self.logger.experiment.add_embedding(\n", " emb.weight.data.cpu(), metadata=labels, tag=name, global_step=self.global_step\n", " )" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Minimal testing of models\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Testing models is essential to detect problems early and iterate quickly. Some issues can only be identified after lengthy training, but many problems show up after one or two batches. PyTorch Lightning, on which PyTorch Forecasting is built, makes it easy to set up such tests.\n" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Every model should be trainable with some minimal dataset. Here is how:\n", "\n", "#. Define a dataset that works with the model. If it takes long to create, you can save it to disk with the :py:meth:`~pytorch_forecasting.data.timeseries.TimeSeriesDataSet.save` method and load it with the :py:meth:`~pytorch_forecasting.data.timeseries.TimeSeriesDataSet.load` method when you want to run tests. In any case, create a reasonably small dataset.\n", "\n", "#. Initialize your model with ``log_interval=1`` to test logging of plots, in particular the ``plot_prediction()`` method.\n", "\n", "#. Define a `PyTorch Lightning Trainer `_ and initialize it with ``fast_dev_run=True``. This ensures that only a single batch, rather than full epochs, is passed through the training and validation steps.\n", "\n", "#. 
Train your model and check that it executes.\n", "\n", "As an example, we test the ``FullyConnectedForDistributionLossModel`` defined earlier in this tutorial:" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "GPU available: True (mps), used: True\n", "TPU available: False, using: 0 TPU cores\n", "IPU available: False, using: 0 IPUs\n", "HPU available: False, using: 0 HPUs\n", "Running in `fast_dev_run` mode: will run the requested loop using 1 batch(es). Logging and checkpointing is suppressed.\n", "\n", " | Name | Type | Params\n", "--------------------------------------------------------------------\n", "0 | loss | NormalDistributionLoss | 0 \n", "1 | logging_metrics | ModuleList | 0 \n", "2 | network | FullyConnectedMultiOutputModule | 324 \n", "--------------------------------------------------------------------\n", "324 Trainable params\n", "0 Non-trainable params\n", "324 Total params\n", "0.001 Total estimated model params size (MB)\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "d6e5e14c57e443629b86be774042e631", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Training: 0it [00:00, ?it/s]" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "a67cad7761fc4a039cfa004973b94b34", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Validation: 0it [00:00, ?it/s]" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "`Trainer.fit` stopped: `max_steps=1` reached.\n" ] } ], "source": [ "from lightning.pytorch import Trainer\n", "\n", "model = FullyConnectedForDistributionLossModel.from_dataset(dataset, hidden_size=10, n_hidden_layers=2, log_interval=1)\n", "trainer = Trainer(fast_dev_run=True)\n", "trainer.fit(model, train_dataloaders=dataloader, val_dataloaders=dataloader)" ] } ], "metadata": { 
"kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.9" }, "vscode": { "interpreter": { "hash": "9aebce72564876525c4f775620217d3701f12ed8dccc94588028ba1e29a0a158" } } }, "nbformat": 4, "nbformat_minor": 4 }