class pytorch_forecasting.models.nhits.NHiTS(output_size: Union[int, List[int]] = 1, static_categoricals: List[str] = [], static_reals: List[str] = [], time_varying_categoricals_encoder: List[str] = [], time_varying_categoricals_decoder: List[str] = [], categorical_groups: Dict[str, List[str]] = {}, time_varying_reals_encoder: List[str] = [], time_varying_reals_decoder: List[str] = [], embedding_sizes: Dict[str, Tuple[int, int]] = {}, embedding_paddings: List[str] = [], embedding_labels: Dict[str, ndarray] = {}, x_reals: List[str] = [], x_categoricals: List[str] = [], context_length: int = 1, prediction_length: int = 1, static_hidden_size: Optional[int] = None, naive_level: bool = True, shared_weights: bool = True, activation: str = 'ReLU', initialization: str = 'lecun_normal', n_blocks: List[int] = [1, 1, 1], n_layers: Union[int, List[int]] = 2, hidden_size: int = 512, pooling_sizes: Optional[List[int]] = None, downsample_frequencies: Optional[List[int]] = None, pooling_mode: str = 'max', interpolation_mode: str = 'linear', batch_normalization: bool = False, dropout: float = 0.0, learning_rate: float = 0.01, log_interval: int = -1, log_gradient_flow: bool = False, log_val_interval: Optional[int] = None, weight_decay: float = 0.001, loss: Optional[MultiHorizonMetric] = None, reduce_on_plateau_patience: int = 1000, backcast_loss_ratio: float = 0.0, logging_metrics: Optional[ModuleList] = None, **kwargs)

Bases: BaseModelWithCovariates

Initialize N-HiTS Model - use its from_dataset() method if possible.

Based on the article N-HiTS: Neural Hierarchical Interpolation for Time Series Forecasting. The network has been shown to increase accuracy by ~25% compared to NBeats and also supports covariates.

Parameters

  • hidden_size (int) – size of hidden layers; can range from 8 to 1024. Use 32-128 if no covariates are employed. Defaults to 512.

  • static_hidden_size (Optional[int], optional) – size of hidden layers for static variables. Defaults to hidden_size.

  • loss – loss to optimize. Defaults to MASE(). QuantileLoss is also supported.

  • shared_weights (bool, optional) – if True, weights of blocks are shared in each stack. Defaults to True.

  • naive_level (bool, optional) – if True, a naive forecast of the last observation is added at the beginning. Defaults to True.

  • initialization (str, optional) – Initialization method. One of [‘orthogonal’, ‘he_uniform’, ‘glorot_uniform’, ‘glorot_normal’, ‘lecun_normal’]. Defaults to “lecun_normal”.

  • n_blocks (List[int], optional) – list of blocks used in each stack (i.e. length of stacks). Defaults to [1, 1, 1].

  • n_layers (Union[int, List[int]], optional) – Number of layers per block or list of number of layers used by blocks in each stack (i.e. length of stacks). Defaults to 2.

  • pooling_sizes (Optional[List[int]], optional) – List of pooling sizes for input for each stack, i.e. higher means more smoothing of input. Using an ordering of higher to lower in the list improves results. Defaults to a heuristic.

  • pooling_mode (str, optional) – Pooling mode for summarizing input. One of [‘max’, ‘average’]. Defaults to “max”.

  • downsample_frequencies (Optional[List[int]], optional) – Downsample multiplier of output for each stack, i.e. higher means more interpolation is required at forecast time. Should be greater than or equal to pooling_sizes but less than or equal to prediction_length. Defaults to a heuristic to match pooling_sizes.

  • interpolation_mode (str, optional) – Interpolation mode for forecasting. One of [‘linear’, ‘nearest’, ‘cubic-x’] where ‘x’ is replaced by a batch size for the interpolation. Defaults to “linear”.

  • batch_normalization (bool, optional) – Whether to carry out batch normalization. Defaults to False.

  • dropout (float, optional) – dropout rate for hidden layers. Defaults to 0.0.

  • activation (str, optional) – activation function. One of [‘ReLU’, ‘Softplus’, ‘Tanh’, ‘SELU’, ‘LeakyReLU’, ‘PReLU’, ‘Sigmoid’]. Defaults to “ReLU”.

  • output_size – number of outputs (typically number of quantiles for QuantileLoss and one target or list of output sizes but currently only point-forecasts allowed). Set automatically.

  • static_categoricals – names of static categorical variables

  • static_reals – names of static continuous variables

  • time_varying_categoricals_encoder – names of categorical variables for encoder

  • time_varying_categoricals_decoder – names of categorical variables for decoder

  • time_varying_reals_encoder – names of continuous variables for encoder

  • time_varying_reals_decoder – names of continuous variables for decoder

  • categorical_groups – dictionary where each value is a list of categorical variables that together form a new categorical variable, whose name is the key in the dictionary

  • x_reals – order of continuous variables in tensor passed to forward function

  • x_categoricals – order of categorical variables in tensor passed to forward function

  • hidden_continuous_size – default hidden size for processing continuous variables (similar to categorical embedding size)

  • hidden_continuous_sizes – dictionary mapping continuous input indices to sizes for variable selection (fallback to hidden_continuous_size if index is not in dictionary)

  • embedding_sizes – dictionary mapping (string) indices to tuple of number of categorical classes and embedding size

  • embedding_paddings – list of indices for embeddings which transform the zero’s embedding to a zero vector

  • embedding_labels – dictionary mapping (string) indices to list of categorical labels

  • learning_rate – learning rate

  • log_interval – log predictions every x batches; do not log if 0 or less, log interpretation if > 0. If < 1.0, will log multiple entries per batch. Defaults to -1.

  • log_val_interval – frequency with which to log validation set metrics, defaults to log_interval

  • log_gradient_flow – whether to log gradient flow; this takes time and should only be done to diagnose training failures

  • prediction_length – Length of the prediction. Also known as ‘horizon’.

  • context_length – Number of time units that condition the predictions. Also known as ‘lookback period’. Should be between 1-10 times the prediction length.

  • backcast_loss_ratio – weight of backcast in comparison to forecast when calculating the loss. A weight of 1.0 means that forecast and backcast loss is weighted the same (regardless of backcast and forecast lengths). Defaults to 0.0, i.e. no weight.

  • reduce_on_plateau_patience (int) – patience after which learning rate is reduced by a factor of 10

  • logging_metrics (nn.ModuleList[MultiHorizonMetric]) – list of metrics that are logged during training. Defaults to nn.ModuleList([SMAPE(), MAE(), RMSE(), MAPE(), MASE()])

  • **kwargs – additional arguments to BaseModel.
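The interplay of pooling_sizes, downsample_frequencies and interpolation_mode may be easier to see in a small sketch. The following is plain Python, not the library's code: the helper names max_pool and interpolate_linear are hypothetical, the number of coarse forecast points per stack is assumed to be prediction_length // downsample_frequency, and the interpolation is endpoint-aligned, which need not match the library's output numerically.

```python
from typing import List


def max_pool(series: List[float], pooling_size: int) -> List[float]:
    """Summarize the input with non-overlapping max pooling (cf. pooling_mode='max')."""
    return [
        max(series[i : i + pooling_size])
        for i in range(0, len(series), pooling_size)
    ]


def interpolate_linear(knots: List[float], prediction_length: int) -> List[float]:
    """Stretch a coarse forecast to prediction_length points (cf. interpolation_mode='linear')."""
    if len(knots) == 1 or prediction_length == 1:
        return [knots[0]] * prediction_length
    out = []
    for t in range(prediction_length):
        # Position of output step t on the coarse grid (endpoints aligned).
        pos = t * (len(knots) - 1) / (prediction_length - 1)
        left = min(int(pos), len(knots) - 2)
        frac = pos - left
        out.append(knots[left] * (1 - frac) + knots[left + 1] * frac)
    return out


# A larger pooling size smooths the encoder input more strongly.
context = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0]
pooled = max_pool(context, pooling_size=2)

# Assumption: a stack with downsample frequency 2 and prediction_length 4
# predicts 4 // 2 = 2 coarse points, which are then interpolated to 4 steps.
prediction_length = 4
downsample_frequency = 2
n_knots = max(prediction_length // downsample_frequency, 1)
coarse_forecast = [10.0, 16.0][:n_knots]  # stand-in for a block's output
forecast = interpolate_linear(coarse_forecast, prediction_length)
```

In this reading, stacks with large pooling sizes and downsample frequencies capture slow components of the series, while later stacks with smaller values add finer detail.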



Methods

forward(x)

Pass forward of network.

from_dataset(dataset, **kwargs)

Convenience function to create network from TimeSeriesDataSet.

log_interpretation(x, out, batch_idx)

Log interpretation of network predictions in tensorboard.

plot_interpretation(x, output, idx[, ax])

Plot interpretation.

step(x, y, batch_idx)

Take training / validation step.

forward(x: Dict[str, Tensor]) → Dict[str, Tensor]

Pass forward of network.

Parameters

  • x (Dict[str, torch.Tensor]) – input from dataloader generated from TimeSeriesDataSet.

Returns

output of model

Return type

Dict[str, torch.Tensor]

classmethod from_dataset(dataset: TimeSeriesDataSet, **kwargs)

Convenience function to create network from TimeSeriesDataSet.

Parameters

  • dataset (TimeSeriesDataSet) – dataset where sole predictor is the target.

  • **kwargs – additional arguments to be passed to __init__ method.
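A minimal usage sketch for from_dataset() follows. The column names series, time_idx and value, the toy data, and the helper names make_toy_rows and build_nhits are all hypothetical; the hyperparameter values are arbitrary. The sketch assumes fixed encoder and prediction lengths (minimum equal to maximum), and imports pandas and pytorch_forecasting lazily inside the builder so the data helper can be used without them.

```python
import math
from typing import Dict, List


def make_toy_rows(n_series: int = 2, n_steps: int = 60) -> List[Dict]:
    """Build a small long-format toy dataset (hypothetical column names)."""
    rows = []
    for s in range(n_series):
        for t in range(n_steps):
            rows.append(
                {"series": str(s), "time_idx": t, "value": math.sin(t / 5.0) + s}
            )
    return rows


def build_nhits(rows: List[Dict], context_length: int = 20, prediction_length: int = 5):
    """Create a TimeSeriesDataSet and derive an NHiTS model from it."""
    # Heavy dependencies are imported lazily, only when a model is actually built.
    import pandas as pd
    from pytorch_forecasting import NHiTS, TimeSeriesDataSet

    data = pd.DataFrame(rows)
    dataset = TimeSeriesDataSet(
        data,
        time_idx="time_idx",
        target="value",
        group_ids=["series"],
        # Assumption: encoder and prediction lengths are kept fixed.
        min_encoder_length=context_length,
        max_encoder_length=context_length,
        min_prediction_length=prediction_length,
        max_prediction_length=prediction_length,
        time_varying_unknown_reals=["value"],
    )
    # Dataset-dependent arguments (variable names, lengths, output size)
    # are inferred; the remaining keyword arguments go to __init__.
    model = NHiTS.from_dataset(dataset, hidden_size=64, learning_rate=1e-2)
    return dataset, model


rows = make_toy_rows()
```

Calling build_nhits(rows) would then return the dataset and a model whose context_length and prediction_length are taken from the dataset rather than passed by hand.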



log_interpretation(x, out, batch_idx)

Log interpretation of network predictions in tensorboard.

plot_interpretation(x: Dict[str, Tensor], output: Dict[str, Tensor], idx: int, ax=None) → Figure

Plot interpretation.

Plot two panels: prediction and backcast vs. actuals, and the decomposition of the prediction into the individual block predictions, which capture different frequencies.

Parameters

  • x (Dict[str, torch.Tensor]) – network input

  • output (Dict[str, torch.Tensor]) – network output

  • idx (int) – index of sample for which to plot the interpretation.

  • ax (List[matplotlib axes], optional) – list of two matplotlib axes onto which to plot the interpretation. Defaults to None.

Returns

matplotlib figure

Return type

Figure

step(x, y, batch_idx) → Dict[str, Tensor]

Take training / validation step.
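The backcast_loss_ratio parameter describes how backcast and forecast losses are weighted in a step. One plausible reading is sketched below; the function name combine_losses and the normalization ratio / (1 + ratio) are assumptions made for illustration, and the library's actual scaling may differ.

```python
def combine_losses(
    forecast_loss: float, backcast_loss: float, backcast_loss_ratio: float
) -> float:
    """Blend forecast and backcast losses with normalized weights (assumed scheme)."""
    if backcast_loss_ratio < 0:
        raise ValueError("backcast_loss_ratio must be non-negative")
    # Assumption: a ratio of 1.0 gives both losses the same weight (0.5 each),
    # and a ratio of 0.0 ignores the backcast entirely.
    backcast_weight = backcast_loss_ratio / (1.0 + backcast_loss_ratio)
    forecast_weight = 1.0 - backcast_weight
    return forecast_weight * forecast_loss + backcast_weight * backcast_loss


only_forecast = combine_losses(2.0, 4.0, backcast_loss_ratio=0.0)
equal_weight = combine_losses(2.0, 4.0, backcast_loss_ratio=1.0)
```

With the default ratio of 0.0 the result is the forecast loss alone; with 1.0 it is the plain average of the two losses.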

property covariate_size: int

Covariate size.

Returns

size of time-dependent covariates

Return type

int

property n_stacks: int

Number of stacks.

Returns

number of stacks.

Return type

int

property static_size: int

Static covariate size.

Returns

size of static covariates

Return type

int