NHiTS
- class pytorch_forecasting.models.nhits.NHiTS(output_size: int | List[int] = 1, static_categoricals: List[str] = [], static_reals: List[str] = [], time_varying_categoricals_encoder: List[str] = [], time_varying_categoricals_decoder: List[str] = [], categorical_groups: Dict[str, List[str]] = {}, time_varying_reals_encoder: List[str] = [], time_varying_reals_decoder: List[str] = [], embedding_sizes: Dict[str, Tuple[int, int]] = {}, embedding_paddings: List[str] = [], embedding_labels: Dict[str, ndarray] = {}, x_reals: List[str] = [], x_categoricals: List[str] = [], context_length: int = 1, prediction_length: int = 1, static_hidden_size: int | None = None, naive_level: bool = True, shared_weights: bool = True, activation: str = 'ReLU', initialization: str = 'lecun_normal', n_blocks: List[int] = [1, 1, 1], n_layers: int | List[int] = 2, hidden_size: int = 512, pooling_sizes: List[int] | None = None, downsample_frequencies: List[int] | None = None, pooling_mode: str = 'max', interpolation_mode: str = 'linear', batch_normalization: bool = False, dropout: float = 0.0, learning_rate: float = 0.01, log_interval: int = -1, log_gradient_flow: bool = False, log_val_interval: int | None = None, weight_decay: float = 0.001, loss: MultiHorizonMetric | None = None, reduce_on_plateau_patience: int = 1000, backcast_loss_ratio: float = 0.0, logging_metrics: ModuleList | None = None, **kwargs)
Bases: BaseModelWithCovariates
Initialize N-HiTS Model - use its from_dataset() method if possible.
Based on the article N-HiTS: Neural Hierarchical Interpolation for Time Series Forecasting. The network has been shown to increase accuracy by ~25% against NBeats and also supports covariates.
- Parameters:
hidden_size (int) – size of hidden layers; can range from 8 to 1024. Use 32-128 if no covariates are employed. Defaults to 512.
static_hidden_size (Optional[int], optional) – size of hidden layers for static variables. Defaults to hidden_size.
loss – loss to optimize. Defaults to MASE(). QuantileLoss is also supported.
shared_weights (bool, optional) – if True, weights of blocks are shared in each stack. Defaults to True.
naive_level (bool, optional) – if True, a naive forecast of the last observation is added at the beginning. Defaults to True.
initialization (str, optional) – Initialization method. One of [‘orthogonal’, ‘he_uniform’, ‘glorot_uniform’, ‘glorot_normal’, ‘lecun_normal’]. Defaults to “lecun_normal”.
n_blocks (List[int], optional) – list of blocks used in each stack (i.e. length of stacks). Defaults to [1, 1, 1].
n_layers (Union[int, List[int]], optional) – Number of layers per block or list of number of layers used by blocks in each stack (i.e. length of stacks). Defaults to 2.
pooling_sizes (Optional[List[int]], optional) – List of pooling sizes for input for each stack, i.e. higher means more smoothing of input. Using an ordering of higher to lower in the list improves results. Defaults to a heuristic.
pooling_mode (str, optional) – Pooling mode for summarizing input. One of [‘max’,’average’]. Defaults to “max”.
downsample_frequencies (Optional[List[int]], optional) – Downsample multiplier of output for each stack, i.e. higher means more interpolation at forecast time is required. Should be equal to or higher than pooling_sizes but smaller than or equal to prediction_length. Defaults to a heuristic to match pooling_sizes.
interpolation_mode (str, optional) – Interpolation mode for forecasting. One of [‘linear’, ‘nearest’, ‘cubic-x’] where ‘x’ is replaced by a batch size for the interpolation. Defaults to “linear”.
batch_normalization (bool, optional) – Whether to carry out batch normalization. Defaults to False.
dropout (float, optional) – dropout rate for hidden layers. Defaults to 0.0.
activation (str, optional) – activation function. One of [‘ReLU’, ‘Softplus’, ‘Tanh’, ‘SELU’, ‘LeakyReLU’, ‘PReLU’, ‘Sigmoid’]. Defaults to “ReLU”.
output_size – number of outputs (typically the number of quantiles for QuantileLoss and one target, or a list of output sizes; currently only point forecasts are allowed). Set automatically.
static_categoricals – names of static categorical variables
static_reals – names of static continuous variables
time_varying_categoricals_encoder – names of categorical variables for encoder
time_varying_categoricals_decoder – names of categorical variables for decoder
time_varying_reals_encoder – names of continuous variables for encoder
time_varying_reals_decoder – names of continuous variables for decoder
categorical_groups – dictionary whose values are lists of categorical variables that together form a new categorical variable, named by the dictionary key
x_reals – order of continuous variables in tensor passed to forward function
x_categoricals – order of categorical variables in tensor passed to forward function
hidden_continuous_size – default hidden size for processing continuous variables (similar to categorical embedding size)
hidden_continuous_sizes – dictionary mapping continuous input indices to sizes for variable selection (fallback to hidden_continuous_size if index is not in dictionary)
embedding_sizes – dictionary mapping (string) indices to tuple of number of categorical classes and embedding size
embedding_paddings – list of indices for embeddings which transform the zero’s embedding to a zero vector
embedding_labels – dictionary mapping (string) indices to list of categorical labels
learning_rate – learning rate
log_interval – log predictions every x batches; do not log if 0 or less, log interpretation if > 0. If < 1.0, will log multiple entries per batch. Defaults to -1.
log_val_interval – frequency with which to log validation set metrics, defaults to log_interval
log_gradient_flow – whether to log gradient flow; this takes time and should only be done to diagnose training failures
prediction_length – Length of the prediction. Also known as ‘horizon’.
context_length – Number of time units that condition the predictions. Also known as ‘lookback period’. Should be between 1 and 10 times the prediction length.
backcast_loss_ratio – weight of backcast in comparison to forecast when calculating the loss. A weight of 1.0 means that forecast and backcast losses are weighted the same (regardless of backcast and forecast lengths). Defaults to 0.0, i.e. no weight.
reduce_on_plateau_patience (int) – patience after which learning rate is reduced by a factor of 10
logging_metrics (nn.ModuleList[MultiHorizonMetric]) – list of metrics that are logged during training. Defaults to nn.ModuleList([SMAPE(), MAE(), RMSE(), MAPE(), MASE()])
**kwargs – additional arguments to BaseModel.
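The roles of pooling_sizes, downsample_frequencies and interpolation_mode can be illustrated with a small, dependency-free sketch. This is not the library's implementation: the helper names (max_pool, n_knots, linear_interpolate) are hypothetical, the knot-count formula is an assumption, and the endpoint-aligned interpolation used here may differ from what the model does internally.

```python
from typing import List


def max_pool(series: List[float], pooling_size: int) -> List[float]:
    """Summarize the input with the max over non-overlapping windows.

    A larger pooling_size gives a shorter, smoother view of the context.
    """
    return [
        max(series[i : i + pooling_size])
        for i in range(0, len(series), pooling_size)
    ]


def n_knots(prediction_length: int, downsample_frequency: int) -> int:
    """Number of coarse forecast values a stack would predict.

    Assumption for illustration: horizon divided by the downsample
    multiplier, with at least one value.
    """
    return max(prediction_length // downsample_frequency, 1)


def linear_interpolate(knots: List[float], prediction_length: int) -> List[float]:
    """Stretch a few coarse forecast values to the full horizon."""
    if len(knots) == 1:
        return knots * prediction_length
    if prediction_length == 1:
        return [knots[0]]
    out = []
    for t in range(prediction_length):
        # Position of step t on the knot axis, from 0 to len(knots) - 1.
        pos = t * (len(knots) - 1) / (prediction_length - 1)
        lo = int(pos)
        hi = min(lo + 1, len(knots) - 1)
        frac = pos - lo
        out.append(knots[lo] * (1 - frac) + knots[hi] * frac)
    return out
```

In this sketch, max_pool([1, 3, 2, 5, 4, 4, 0, 1], 4) reduces eight context values to [5, 4], and linear_interpolate([0.0, 2.0], 5) expands two coarse values to [0.0, 0.5, 1.0, 1.5, 2.0]. A stack with a high downsample frequency therefore predicts only a few values and relies on interpolation for the rest of the horizon.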
Methods
forward(x) – Pass forward of network.
from_dataset(dataset, **kwargs) – Convenience function to create network from TimeSeriesDataSet.
log_interpretation(x, out, batch_idx) – Log interpretation of network predictions in tensorboard.
plot_interpretation(x, output, idx[, ax]) – Plot interpretation.
step(x, y, batch_idx) – Take training / validation step.
- forward(x: Dict[str, Tensor]) → Dict[str, Tensor]
Pass forward of network.
- Parameters:
x (Dict[str, torch.Tensor]) – input from dataloader generated from TimeSeriesDataSet.
- Returns:
output of model
- Return type:
Dict[str, torch.Tensor]
- classmethod from_dataset(dataset: TimeSeriesDataSet, **kwargs)
Convenience function to create network from TimeSeriesDataSet.
- Parameters:
dataset (TimeSeriesDataSet) – dataset where sole predictor is the target.
**kwargs – additional arguments to be passed to the __init__ method.
- Returns:
NHiTS
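A minimal sketch of creating the network with from_dataset() follows. The column names ("time_idx", "series", "value") and the helper names are hypothetical, and the choice of identical minimum and maximum window lengths is an assumption about what the model expects; check the TimeSeriesDataSet documentation for your installed version before relying on it.

```python
from typing import Any, Dict


def fixed_length_kwargs(prediction_length: int, context_multiple: int = 3) -> Dict[str, Any]:
    """Window settings with identical min/max lengths.

    Assumption: the network is trained on fixed-length windows. The
    context is a multiple of the horizon, in line with the guidance
    for context_length.
    """
    context_length = context_multiple * prediction_length
    return dict(
        min_encoder_length=context_length,
        max_encoder_length=context_length,
        min_prediction_length=prediction_length,
        max_prediction_length=prediction_length,
    )


def build_nhits(data, prediction_length: int = 6):
    """Create an NHiTS network from a long-format pandas DataFrame.

    Assumes hypothetical columns "time_idx" (int), "series" (str)
    and "value" (float).
    """
    # Imported lazily so the helper above works without the heavy dependencies.
    from pytorch_forecasting import NHiTS, TimeSeriesDataSet

    dataset = TimeSeriesDataSet(
        data,
        time_idx="time_idx",
        target="value",
        group_ids=["series"],
        time_varying_unknown_reals=["value"],
        **fixed_length_kwargs(prediction_length),
    )
    # Extra keyword arguments are passed on to __init__.
    return NHiTS.from_dataset(dataset, hidden_size=64, learning_rate=1e-2)
```

Going through from_dataset() lets the dataset supply values such as context_length, prediction_length and the variable lists, so only modelling choices like hidden_size need to be passed by hand.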
- log_interpretation(x, out, batch_idx)
Log interpretation of network predictions in tensorboard.
- plot_interpretation(x: Dict[str, Tensor], output: Dict[str, Tensor], idx: int, ax=None) → Figure
Plot interpretation.
Plot two panels: prediction and backcast vs. actuals, and a decomposition of the prediction into the different block predictions, which capture different frequencies.
- Parameters:
x (Dict[str, torch.Tensor]) – network input
output (Dict[str, torch.Tensor]) – network output
idx (int) – index of sample for which to plot the interpretation.
ax (List[matplotlib axes], optional) – list of two matplotlib axes onto which to plot the interpretation. Defaults to None.
- Returns:
matplotlib figure
- Return type:
plt.Figure
- property covariate_size: int
Covariate size.
- Returns:
size of time-dependent covariates
- Return type:
int
- property n_stacks: int
Number of stacks.
- Returns:
number of stacks.
- Return type:
int
- property static_size: int
Static covariate size.
- Returns:
size of static covariates
- Return type:
int