pytorch_forecasting.models.temporal_fusion_transformer.TemporalFusionTransformer

class pytorch_forecasting.models.temporal_fusion_transformer.TemporalFusionTransformer(hidden_size: int = 16, lstm_layers: int = 1, dropout: float = 0.1, output_size: int | list[int] = 7, loss: MultiHorizonMetric = None, attention_head_size: int = 4, max_encoder_length: int = 10, static_categoricals: list[str] | None = None, static_reals: list[str] | None = None, time_varying_categoricals_encoder: list[str] | None = None, time_varying_categoricals_decoder: list[str] | None = None, categorical_groups: dict | list[str] | None = None, time_varying_reals_encoder: list[str] | None = None, time_varying_reals_decoder: list[str] | None = None, x_reals: list[str] | None = None, x_categoricals: list[str] | None = None, hidden_continuous_size: int = 8, hidden_continuous_sizes: dict[str, int] | None = None, embedding_sizes: dict[str, tuple[int, int]] | None = None, embedding_paddings: list[str] | None = None, embedding_labels: dict[str, ndarray] | None = None, learning_rate: float = 0.001, log_interval: int | float = -1, log_val_interval: int | float = None, log_gradient_flow: bool = False, reduce_on_plateau_patience: int = 1000, monotone_constraints: dict[str, int] | None = None, share_single_variable_networks: bool = False, causal_attention: bool = True, logging_metrics: ModuleList = None, mask_bias: float = -1000000000.0, **kwargs)

Temporal Fusion Transformer for forecasting timeseries.

Initialize via from_dataset() method if possible.

Implementation of the article "Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting".

Enhancements compared to the original implementation:

static variables can be continuous
multiple categorical variables can be summarized with an EmbeddingBag
encoder and decoder lengths can vary by sample
categorical embeddings are not transformed by the variable selection network (because it is a redundant operation)
variable dimensions in the variable selection network are scaled up via linear interpolation to reduce the number of parameters
non-linear variable processing in the variable selection network can be shared between decoder and encoder (not shared by default)
capabilities added through base model such as monotone constraints

Tune its hyperparameters with optimize_hyperparameters().
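The recommendation to initialize via from_dataset() can be sketched as follows. This is a minimal illustration, not a reference setup: the hyperparameter values are placeholders, `build_tft` and `TFT_KWARGS` are hypothetical names, and `training_dataset` is assumed to be an already constructed `TimeSeriesDataSet`. The seven quantiles are assumed to match the defaults of `QuantileLoss`, which is why `output_size` is 7.

```python
# Quantiles assumed to match the QuantileLoss defaults; seven values,
# which is why output_size defaults to 7.
QUANTILES = [0.02, 0.1, 0.25, 0.5, 0.75, 0.9, 0.98]

# Illustrative hyperparameters (placeholder values, not recommendations).
TFT_KWARGS = {
    "hidden_size": 16,
    "lstm_layers": 1,
    "dropout": 0.1,
    "attention_head_size": 4,
    "hidden_continuous_size": 8,
    "output_size": len(QUANTILES),  # one output per quantile
    "learning_rate": 1e-3,
    "reduce_on_plateau_patience": 4,
}


def build_tft(training_dataset):
    """Create a TemporalFusionTransformer from a TimeSeriesDataSet.

    `build_tft` is a hypothetical helper; variable names, embedding sizes
    and similar dataset-specific arguments are deduced by from_dataset().
    """
    # Imported lazily so the configuration above can be inspected
    # without pytorch_forecasting installed.
    from pytorch_forecasting import TemporalFusionTransformer
    from pytorch_forecasting.metrics import QuantileLoss

    return TemporalFusionTransformer.from_dataset(
        training_dataset,
        loss=QuantileLoss(quantiles=QUANTILES),
        **TFT_KWARGS,
    )
```

Passing only model hyperparameters and letting from_dataset() fill in the dataset-dependent arguments avoids having to keep lists such as x_reals and x_categoricals in sync with the data by hand.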

Parameters:

hidden_size (int, default=16) – hidden size of network which is its main hyperparameter. Can range from 8 to 512.
lstm_layers (int, default=1) – number of LSTM layers (2 is mostly optimal)
dropout (float, default=0.1) – dropout rate
output_size (int or list of int, default=7) – number of outputs (e.g. the number of quantiles for QuantileLoss with one target, or a list of output sizes for multiple targets).
loss (MultiHorizonMetric, default=QuantileLoss()) – loss function taking prediction and targets
attention_head_size (int, default=4) – number of attention heads (4 is a good default)
max_encoder_length (int, default=10) – length to encode, can be far longer than the decoder length but does not have to be
static_categoricals (list of str, optional) – names of static categorical variables
static_reals (list of str, optional) – names of static continuous variables
time_varying_categoricals_encoder (list of str, optional) – names of categorical variables for encoder
time_varying_categoricals_decoder (list of str, optional) – names of categorical variables for decoder
time_varying_reals_encoder (list of str, optional) – names of continuous variables for encoder
time_varying_reals_decoder (list of str, optional) – names of continuous variables for decoder
categorical_groups (dict or list of str, optional) – dictionary whose values are lists of categorical variables that together form a new categorical variable, which is the key in the dictionary
x_reals (list of str, optional) – order of continuous variables in tensor passed to forward function
x_categoricals (list of str, optional) – order of categorical variables in tensor passed to forward function
hidden_continuous_size (int, default=8) – default hidden size for processing continuous variables (similar to categorical embedding size)
hidden_continuous_sizes (dict of str to int, optional) – dictionary mapping continuous input indices to sizes for variable selection (falls back to hidden_continuous_size if the index is not in the dictionary)
embedding_sizes (dict of str to tuple of (int, int), optional) – dictionary mapping (string) indices to a tuple of number of categorical classes and embedding size
embedding_paddings (list of str, optional) – list of indices for embeddings which transform the zero's embedding to a zero vector
embedding_labels (dict of str to ndarray, optional) – dictionary mapping (string) indices to list of categorical labels
learning_rate (float, default=0.001) – learning rate
log_interval (int or float, default=-1) – log predictions every x batches; do not log if 0 or less, log interpretation if > 0. If < 1.0, will log multiple entries per batch.
log_val_interval (int or float, optional) – frequency with which to log validation set metrics, defaults to log_interval
log_gradient_flow (bool, default=False) – whether to log gradient flow; this takes time and should only be done to diagnose training failures
reduce_on_plateau_patience (int, default=1000) – patience after which learning rate is reduced by a factor of 10
monotone_constraints (Dict[str, int]) – dictionary of monotonicity constraints for continuous decoder variables mapping position (e.g. "0" for first position) to constraint (-1 for negative and +1 for positive, larger numbers add more weight to the constraint vs. the loss but are usually not necessary). This constraint significantly slows down training. Defaults to {}.
share_single_variable_networks (bool) – whether to share the single variable networks between the encoder and decoder. Defaults to False.
causal_attention (bool) – whether to attend only to previous timesteps in the decoder or also include future predictions. Defaults to True.
logging_metrics (nn.ModuleList[LightningMetric]) – list of metrics that are logged during training. Defaults to nn.ModuleList([SMAPE(), MAE(), RMSE(), MAPE()]).
mask_bias (float, optional) – Bias for the mask in ScaledDotProductAttention.forward, by default -1e9. Set to -float("inf") to allow mixed precision training.
**kwargs – additional arguments to BaseModel.
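The interaction between the attention mask and mask_bias can be illustrated with a small standard-library sketch. It assumes that masked attention scores are overwritten with mask_bias before the softmax; `masked_softmax` and `fits_float16` are hypothetical helpers written for this illustration, not part of the library. The sketch also shows why -1e9 is a problem in half precision: it lies outside the float16 range, while negative infinity is representable.

```python
import math
import struct


def masked_softmax(scores, mask, mask_bias=-1e9):
    """Softmax over scores where masked positions are replaced by mask_bias.

    mask[i] is True for positions that must not be attended to.
    At least one position has to stay unmasked.
    """
    biased = [mask_bias if m else s for s, m in zip(scores, mask)]
    top = max(biased)  # subtract the maximum for numerical stability
    exps = [math.exp(b - top) for b in biased]
    total = sum(exps)
    return [e / total for e in exps]


def fits_float16(value):
    """Return True if value can be stored as an IEEE 754 half-precision float."""
    try:
        struct.pack("<e", value)  # "e" is the half-precision format code
        return True
    except OverflowError:
        return False


# The last position is masked, e.g. a future decoder step under causal attention.
weights = masked_softmax([1.0, 2.0, 3.0], [False, False, True])
print(weights)

# -1e9 is outside the float16 range, negative infinity is not.
print(fits_float16(-1e9), fits_float16(float("-inf")))
```

In both cases the masked position receives zero attention weight; the choice of bias only matters for the numeric type in which the scores are stored.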

BaseModel for timeseries forecasting from which to inherit.

Parameters:

log_interval (Union[int, float], optional) – Batches after which predictions are logged. If < 1.0, will log multiple entries per batch. Defaults to -1.
log_val_interval (Union[int, float], optional) – batches after which predictions for validation are logged. Defaults to None/log_interval.
learning_rate (float, optional) – Learning rate. Defaults to 1e-3.
log_gradient_flow (bool) – Whether to log gradient flow; this takes time and should only be done to diagnose training failures. Defaults to False.
loss (Metric, optional) – metric to optimize, can also be list of metrics. Defaults to SMAPE().
logging_metrics (nn.ModuleList[MultiHorizonMetric]) – list of metrics that are logged during training. Defaults to [].
reduce_on_plateau_patience (int) – patience after which learning rate is reduced by a factor of 10. Defaults to 1000
reduce_on_plateau_reduction (float) – reduction in learning rate when encountering plateau. Defaults to 2.0.
reduce_on_plateau_min_lr (float) – minimum learning rate for reduce on plateau learning rate scheduler. Defaults to 1e-5
weight_decay (float) – weight decay. Defaults to 0.0.
optimizer_params (Dict[str, Any]) – additional parameters for the optimizer. Defaults to {}.
monotone_constraints (Dict[str, int]) – dictionary of monotonicity constraints for continuous decoder variables mapping position (e.g. "0" for first position) to constraint (-1 for negative and +1 for positive, larger numbers add more weight to the constraint vs. the loss but are usually not necessary). This constraint significantly slows down training. Defaults to {}.
output_transformer (Callable) – transformer that takes network output and transforms it to prediction space. Defaults to None which is equivalent to lambda out: out["prediction"].
optimizer (str) – Optimizer, “ranger”, “sgd”, “adam”, “adamw” or class name of optimizer in torch.optim or pytorch_optimizer. Alternatively, a class or function can be passed which takes parameters as first argument and a lr argument (optionally also weight_decay). Defaults to “adam”.
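How the three reduce_on_plateau_* parameters work together can be sketched with a simplified pure-Python simulation. It assumes that the learning rate is divided by reduce_on_plateau_reduction once the monitored loss has failed to improve for more than reduce_on_plateau_patience consecutive epochs, and that it never drops below reduce_on_plateau_min_lr. `simulate_plateau` is a hypothetical helper; the actual scheduler may differ in details such as improvement thresholds.

```python
def simulate_plateau(losses, lr=1e-3, patience=2, reduction=2.0, min_lr=1e-5):
    """Return the learning rate in effect after each epoch.

    Simplified reduce-on-plateau logic: divide lr by `reduction` when the
    loss has not improved for more than `patience` consecutive epochs.
    """
    best = float("inf")
    bad_epochs = 0
    history = []
    for loss in losses:
        if loss < best:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
        if bad_epochs > patience:
            lr = max(lr / reduction, min_lr)  # never go below min_lr
            bad_epochs = 0
        history.append(lr)
    return history


# The loss stalls at 0.9 from the second epoch onwards.
print(simulate_plateau([1.0, 0.9, 0.9, 0.9, 0.9, 0.9]))
```

With the default reduce_on_plateau_patience of 1000 the reduction effectively never triggers in typical training runs, so a smaller patience has to be set explicitly for the scheduler to have any effect.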

__init__(hidden_size: int = 16, lstm_layers: int = 1, dropout: float = 0.1, output_size: int | list[int] = 7, loss: MultiHorizonMetric = None, attention_head_size: int = 4, max_encoder_length: int = 10, static_categoricals: list[str] | None = None, static_reals: list[str] | None = None, time_varying_categoricals_encoder: list[str] | None = None, time_varying_categoricals_decoder: list[str] | None = None, categorical_groups: dict | list[str] | None = None, time_varying_reals_encoder: list[str] | None = None, time_varying_reals_decoder: list[str] | None = None, x_reals: list[str] | None = None, x_categoricals: list[str] | None = None, hidden_continuous_size: int = 8, hidden_continuous_sizes: dict[str, int] | None = None, embedding_sizes: dict[str, tuple[int, int]] | None = None, embedding_paddings: list[str] | None = None, embedding_labels: dict[str, ndarray] | None = None, learning_rate: float = 0.001, log_interval: int | float = -1, log_val_interval: int | float = None, log_gradient_flow: bool = False, reduce_on_plateau_patience: int = 1000, monotone_constraints: dict[str, int] | None = None, share_single_variable_networks: bool = False, causal_attention: bool = True, logging_metrics: ModuleList = None, mask_bias: float = -1000000000.0, **kwargs)

Methods

`__call__`(*args, **kwargs)	Call self as a function.
`__delattr__`(name)	Implement delattr(self, name).
`__dir__`()	Default dir() implementation.
`__eq__`(value, /)	Return self==value.
`__format__`(format_spec, /)	Default object formatter.
`__ge__`(value, /)	Return self>=value.
`__getattr__`(name)
`__getattribute__`(name, /)	Return getattr(self, name).
`__getstate__`()	Helper for pickle.
`__gt__`(value, /)	Return self>value.
`__hash__`()	Return hash(self).
`__init_subclass__`	This method is called when a class is subclassed.
`__le__`(value, /)	Return self<=value.
`__lt__`(value, /)	Return self<value.
`__ne__`(value, /)	Return self!=value.
`__new__`(*args, **kwargs)
`__reduce__`()	Helper for pickle.
`__reduce_ex__`(protocol, /)	Helper for pickle.
`__repr__`()	Return repr(self).
`__setattr__`(name, value)	Implement setattr(self, name, value).
`__setstate__`(state)
`__sizeof__`()	Size of object in memory, in bytes.
`__str__`()	Return str(self).
`__subclasshook__`	Abstract classes can override this to customize issubclass().
`_apply`(fn[, recurse])
`_apply_batch_transfer_handler`(batch[, ...])
`_call_batch_hook`(hook_name, *args)
`_call_impl`(*args, **kwargs)
`_get_backward_hooks`()	Return the backward hooks for use in the call function.
`_get_backward_pre_hooks`()
`_get_name`()
`_load_from_state_dict`(state_dict, prefix, ...)	Copy parameters and buffers from `state_dict` into only this module, but not its descendants.
`_log_dict_through_fabric`(dictionary[, logger])
`_log_interpretation`(out)
`_logger_supports`(method)	Whether logger supports method.
`_maybe_warn_non_full_backward_hook`(inputs, ...)
`_named_members`(get_members_fn[, prefix, ...])	Help yield various names + members of modules.
`_on_before_batch_transfer`(batch[, ...])
`_pkg`()	Package containing the model.
`_register_load_state_dict_pre_hook`(hook[, ...])	See `register_load_state_dict_pre_hook()` for details.
`_register_state_dict_hook`(hook)	Register a post-hook for the `state_dict()` method.
`_replicate_for_data_parallel`()
`_save_to_state_dict`(destination, prefix, ...)	Save module state to the destination dictionary.
`_set_hparams`(hp)
`_slow_forward`(*input, **kwargs)
`_to_hparams_dict`(hp)
`_verify_is_manual_optimization`(fn_name)
`_wrapped_call_impl`(*args, **kwargs)
`add_module`(name, module)	Add a child module to the current module.
`all_gather`(data[, group, sync_grads])	Gather tensors or collections of tensors from multiple processes.
`apply`(fn)	Apply `fn` recursively to every submodule (as returned by `.children()`) as well as self.
`backward`(loss, *args, **kwargs)	Called to perform backward on the loss returned in `training_step()`.
`bfloat16`()	Casts all floating point parameters and buffers to `bfloat16` datatype.
`buffers`([recurse])	Return an iterator over module buffers.
`calculate_prediction_actual_by_variable`(x, ...)	Calculate predictions and actuals by variable averaged by `bins` bins spanning from `-std` to `+std`
`children`()	Return an iterator over immediate children modules.
`clip_gradients`(optimizer[, ...])	Handles gradient clipping internally.
`compile`(*args, **kwargs)	Compile this Module's forward using `torch.compile()`.
`configure_callbacks`()	Configure model-specific callbacks.
`configure_gradient_clipping`(optimizer[, ...])	Perform gradient clipping for the optimizer parameters.
`configure_model`()	Hook to create modules in a strategy and precision aware context.
`configure_optimizers`()	Configure optimizers.
`configure_sharded_model`()	Deprecated.
`cpu`()	See `torch.nn.Module.cpu()`.
`create_log`(x, y, out, batch_idx, **kwargs)	Create the log used in the training and validation step.
`cuda`([device])	Moves all model parameters and buffers to the GPU.
`deduce_default_output_parameters`(dataset, kwargs)	Deduce default parameters for output for from_dataset() method.
`double`()	See `torch.nn.Module.double()`.
`eval`()	Set the module in evaluation mode.
`expand_static_context`(context, timesteps)	add time dimension to static context
`extra_repr`()	Return extra information about parameters for representation/logging.
`extract_features`(x[, embeddings, period])	Extract features
`float`()	See `torch.nn.Module.float()`.
`forward`(x)	input dimensions: n_samples x time x variables
`freeze`()	Freeze all params for inference.
`from_dataset`(dataset[, ...])	Create model from dataset.
`get_attention_mask`(encoder_lengths, ...)	Returns causal mask to apply for self-attention layer.
`get_buffer`(target)	Return the buffer given by `target` if it exists, otherwise throw an error.
`get_extra_state`()	Return any extra state to include in the module's state_dict.
`get_parameter`(target)	Return the parameter given by `target` if it exists, otherwise throw an error.
`get_submodule`(target)	Return the submodule given by `target` if it exists, otherwise throw an error.
`half`()	See `torch.nn.Module.half()`.
`interpret_output`(out[, reduction, ...])	interpret output of model
`ipu`([device])	Move all model parameters and buffers to the IPU.
`load_from_checkpoint`(checkpoint_path[, ...])	Primary way of loading a model from a checkpoint.
`load_state_dict`(state_dict[, strict, assign])	Copy parameters and buffers from `state_dict` into this module and its descendants.
`log`(*args, **kwargs)	See `lightning.pytorch.core.lightning.LightningModule.log()`.
`log_dict`(dictionary[, prog_bar, logger, ...])	Log a dictionary of values at once.
`log_embeddings`()	Log embeddings to tensorboard
`log_gradient_flow`(named_parameters)	log distribution of gradients to identify exploding / vanishing gradients
`log_interpretation`(outputs)	Log interpretation metrics to tensorboard.
`log_metrics`(x, y, out[, prediction_kwargs])	Log metrics every training/validation step.
`log_prediction`(x, out, batch_idx, **kwargs)	Log metrics every training/validation step.
`lr_scheduler_step`(scheduler, metric)	Override this method to adjust the default way the `Trainer` calls each scheduler.
`lr_schedulers`()	Returns the learning rate scheduler(s) that are being used during training.
`manual_backward`(loss, *args, **kwargs)	Call this directly from your `training_step()` when doing optimizations manually.
`modules`([remove_duplicate])	Return an iterator over all modules in the network.
`mtia`([device])	Move all model parameters and buffers to the MTIA.
`named_buffers`([prefix, recurse, ...])	Return an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.
`named_children`()	Return an iterator over immediate children modules, yielding both the name of the module as well as the module itself.
`named_modules`([memo, prefix, remove_duplicate])	Return an iterator over all modules in the network, yielding both the name of the module as well as the module itself.
`named_parameters`([prefix, recurse, ...])	Return an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
`on_after_backward`()	Log gradient flow for debugging.
`on_after_batch_transfer`(batch, dataloader_idx)	Override to alter or apply batch augmentations to your batch after it is transferred to the device.
`on_before_backward`(loss)	Called before `loss.backward()`.
`on_before_batch_transfer`(batch, dataloader_idx)	Override to alter or apply batch augmentations to your batch before it is transferred to the device.
`on_before_optimizer_step`(optimizer)	Called before `optimizer.step()`.
`on_before_zero_grad`(optimizer)	Called after `training_step()` and before `optimizer.zero_grad()`.
`on_epoch_end`(outputs)	run at epoch end for training or validation
`on_fit_end`()	Called at the very end of fit.
`on_fit_start`()	Called at the very beginning of fit.
`on_load_checkpoint`(checkpoint)	Called by Lightning to restore your model.
`on_predict_batch_end`(outputs, batch, batch_idx)	Called in the predict loop after the batch.
`on_predict_batch_start`(batch, batch_idx[, ...])	Called in the predict loop before anything happens for that batch.
`on_predict_end`()	Called at the end of predicting.
`on_predict_epoch_end`()	Called at the end of predicting.
`on_predict_epoch_start`()	Called at the beginning of predicting.
`on_predict_model_eval`()	Called when the predict loop starts.
`on_predict_start`()	Called at the beginning of predicting.
`on_save_checkpoint`(checkpoint)	Called by Lightning when saving a checkpoint to give you a chance to store anything else you might want to save.
`on_test_batch_end`(outputs, batch, batch_idx)	Called in the test loop after the batch.
`on_test_batch_start`(batch, batch_idx[, ...])	Called in the test loop before anything happens for that batch.
`on_test_end`()	Called at the end of testing.
`on_test_epoch_end`()	Called in the test loop at the very end of the epoch.
`on_test_epoch_start`()	Called in the test loop at the very beginning of the epoch.
`on_test_model_eval`()	Called when the test loop starts.
`on_test_model_train`()	Called when the test loop ends.
`on_test_start`()	Called at the beginning of testing.
`on_train_batch_end`(outputs, batch, batch_idx)	Called in the training loop after the batch.
`on_train_batch_start`(batch, batch_idx)	Called in the training loop before anything happens for that batch.
`on_train_end`()	Called at the end of training before logger experiment is closed.
`on_train_epoch_end`()	Called in the training loop at the very end of the epoch.
`on_train_epoch_start`()	Called in the training loop at the very beginning of the epoch.
`on_train_start`()	Called at the beginning of training after sanity check.
`on_validation_batch_end`(outputs, batch, ...)	Called in the validation loop after the batch.
`on_validation_batch_start`(batch, batch_idx)	Called in the validation loop before anything happens for that batch.
`on_validation_end`()	Called at the end of validation.
`on_validation_epoch_end`()	Called in the validation loop at the very end of the epoch.
`on_validation_epoch_start`()	Called in the validation loop at the very beginning of the epoch.
`on_validation_model_eval`()	Called when the validation loop starts.
`on_validation_model_train`()	Called when the validation loop ends.
`on_validation_model_zero_grad`()	Called by the training loop to release gradients before entering the validation loop.
`on_validation_start`()	Called at the beginning of validation.
`optimizer_step`(epoch, batch_idx, optimizer)	Override this method to adjust the default way the `Trainer` calls the optimizer.
`optimizer_zero_grad`(epoch, batch_idx, optimizer)	Override this method to change the default behaviour of `optimizer.zero_grad()`.
`optimizers`([use_pl_optimizer])	Returns the optimizer(s) that are being used during training.
`parameters`([recurse])	Return an iterator over module parameters.
`plot_interpretation`(interpretation)	Make figures that interpret model.
`plot_prediction`(x, out, idx[, ...])	Plot actuals vs prediction and attention
`plot_prediction_actual_by_variable`(data[, ...])	Plot predictions and actual averages by variables
`predict`(data[, mode, return_index, ...])	Run inference / prediction.
`predict_dataloader`()	An iterable or collection of iterables specifying prediction samples.
`predict_dependency`(data, variable, values[, ...])	Predict partial dependency.
`predict_step`(batch, batch_idx)	Step function called during `predict()`.
`prepare_data`()	Use this to download and prepare data.
`print`(*args, **kwargs)	Prints only from process 0.
`register_backward_hook`(hook)	Register a backward hook on the module.
`register_buffer`(name, tensor[, persistent])	Add a buffer to the module.
`register_forward_hook`(hook, *[, prepend, ...])	Register a forward hook on the module.
`register_forward_pre_hook`(hook, *[, ...])	Register a forward pre-hook on the module.
`register_full_backward_hook`(hook[, prepend])	Register a backward hook on the module.
`register_full_backward_pre_hook`(hook[, prepend])	Register a backward pre-hook on the module.
`register_load_state_dict_post_hook`(hook)	Register a post-hook to be run after module's `load_state_dict()` is called.
`register_load_state_dict_pre_hook`(hook)	Register a pre-hook to be run before module's `load_state_dict()` is called.
`register_module`(name, module)	Alias for `add_module()`.
`register_parameter`(name, param)	Add a parameter to the module.
`register_state_dict_post_hook`(hook)	Register a post-hook for the `state_dict()` method.
`register_state_dict_pre_hook`(hook)	Register a pre-hook for the `state_dict()` method.
`remove_ignored_hparams`(ignore_list)	Remove ignored hyperparameters from the stored state.
`requires_grad_`([requires_grad])	Change if autograd should record operations on parameters in this module.
`save_hyperparameters`(*args[, ignore, frame, ...])	Save arguments to `hparams` attribute.
`set_extra_state`(state)	Set extra state contained in the loaded state_dict.
`set_submodule`(target, module[, strict])	Set the submodule given by `target` if it exists, otherwise throw an error.
`setup`(stage)	Called at the beginning of fit (train + validate), validate, test, or predict.
`share_memory`()	See `torch.Tensor.share_memory_()`.
`size`()	get number of parameters in model
`state_dict`(*args[, destination, prefix, ...])	Return a dictionary containing references to the whole state of the module.
`step`(x, y, batch_idx, **kwargs)	Run for each train/val step.
`teardown`(stage)	Called at the end of fit (train + validate), validate, test, or predict.
`test_dataloader`()	An iterable or collection of iterables specifying test samples.
`test_step`(batch, batch_idx)	Operates on a single batch of data from the test set.
`to`(*args, **kwargs)	See `torch.nn.Module.to()`.
`to_empty`(*, device[, recurse])	Move the parameters and buffers to the specified device without copying storage.
`to_network_output`(**results)	Convert output into a named (and immutable) tuple.
`to_onnx`([file_path, input_sample])	Saves the model in ONNX format.
`to_prediction`(out[, use_metric])	Convert output to prediction using the loss metric.
`to_quantiles`(out[, use_metric])	Convert output to quantiles using the loss metric.
`to_tensorrt`([file_path, input_sample, ir, ...])	Export the model to ScriptModule or GraphModule using TensorRT compile backend.
`to_torchscript`([file_path, method, ...])	By default compiles the whole model to a `torch.jit.ScriptModule`.
`toggle_optimizer`(optimizer)	Makes sure only the gradients of the current optimizer's parameters are calculated in the training step to prevent dangling gradients in multiple-optimizer setup.
`toggled_optimizer`(optimizer)	Makes sure only the gradients of the current optimizer's parameters are calculated in the training step to prevent dangling gradients in multiple-optimizer setup.
`train`([mode])	Set the module in training mode.
`train_dataloader`()	An iterable or collection of iterables specifying training samples.
`training_step`(batch, batch_idx)	Train on batch.
`transfer_batch_to_device`(batch, device, ...)	Override this hook if your `DataLoader` returns tensors wrapped in a custom data structure.
`transform_output`(prediction, target_scale[, ...])	Extract prediction from network output and rescale it to real space / de-normalize it.
`type`(dst_type)	See `torch.nn.Module.type()`.
`unfreeze`()	Unfreeze all parameters for training.
`untoggle_optimizer`(optimizer)	Resets the state of required gradients that were toggled with `toggle_optimizer()`.
`val_dataloader`()	An iterable or collection of iterables specifying validation samples.
`validation_step`(batch, batch_idx)	Operates on a single batch of data from the validation set.
`xpu`([device])	Move all model parameters and buffers to the XPU.
`zero_grad`([set_to_none])	Reset gradients of all model parameters.
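The `to_quantiles` and `to_prediction` entries above convert network output using the loss metric. For a quantile loss, the idea can be sketched in plain Python: each output corresponds to one quantile, the loss is the standard pinball loss per quantile, and a point prediction is taken from the median quantile. The helpers below are hypothetical and only illustrate the technique; the library's own metric works on tensors and may scale or aggregate the loss differently.

```python
def pinball_loss(y_true, y_pred, quantile):
    """Standard pinball (quantile) loss for a single prediction."""
    error = y_true - y_pred
    # Under-prediction is weighted by q, over-prediction by (1 - q).
    return max(quantile * error, (quantile - 1.0) * error)


def quantile_loss(y_true, quantile_preds, quantiles):
    """Sum of pinball losses over all predicted quantiles."""
    return sum(
        pinball_loss(y_true, pred, q) for pred, q in zip(quantile_preds, quantiles)
    )


def point_prediction(quantile_preds, quantiles):
    """Pick the median quantile as point forecast (assumes 0.5 is present)."""
    return quantile_preds[quantiles.index(0.5)]


quantiles = [0.1, 0.5, 0.9]
preds = [8.0, 10.0, 12.0]  # one network output per quantile
print(quantile_loss(10.0, preds, quantiles))
print(point_prediction(preds, quantiles))
```

Because the number of network outputs has to equal the number of quantiles, changing the quantile list also requires changing output_size accordingly.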

Attributes

`CHECKPOINT_HYPER_PARAMS_KEY`
`CHECKPOINT_HYPER_PARAMS_NAME`
`CHECKPOINT_HYPER_PARAMS_SPECIAL_KEY`
`CHECKPOINT_HYPER_PARAMS_TYPE`
`T_destination`
`__annotations__`
`__dict__`
`__doc__`
`__jit_unused_properties__`
`__module__`
`__weakref__`	list of weak references to the object
`_compiled_call_impl`
`_jit_is_scripting`
`_version`	This allows better BC support for `load_state_dict()`.
`automatic_optimization`	If set to `False` you are responsible for calling `.backward()`, `.step()`, `.zero_grad()`.
`call_super_init`
`categorical_groups_mapping`	Mapping of categorical variables to categorical groups
`categoricals`	List of all categorical variables in model
`current_epoch`	The current epoch in the `Trainer`, or 0 if not attached.
`current_stage`	Available inside lightning loops.
`decoder_variables`	List of all decoder variables in model (excluding static variables)
`device`
`device_mesh`	Strategies like `ModelParallelStrategy` will create a device mesh that can be accessed in the `configure_model()` hook to parallelize the LightningModule.
`dtype`
`dump_patches`
`encoder_variables`	List of all encoder variables in model (excluding static variables)
`example_input_array`	The example input array is a specification of what the module can consume in the `forward()` method.
`fabric`
`global_rank`	The index of the current process across all nodes and devices.
`global_step`	Total training batches seen across all epochs.
`hparams`	The collection of hyperparameters saved with `save_hyperparameters()`.
`hparams_initial`	The collection of hyperparameters saved with `save_hyperparameters()`.
`local_rank`	The index of the current process within a single node.
`log_interval`	Log interval depending if training or validating
`logger`	Reference to the logger object in the Trainer.
`loggers`	Reference to the list of loggers in the Trainer.
`n_targets`	Number of targets to forecast.
`on_gpu`	Returns `True` if this model is currently located on a GPU.
`predicting`
`reals`	List of all continuous variables in model
`static_variables`	List of all static variables in model
`strict_loading`	Determines how Lightning loads this model using .load_state_dict(..., strict=model.strict_loading).
`target_names`	List of targets that are predicted.
`target_positions`	Positions of target variable(s) in covariates.
`trainer`
`training`
`_parameters`
`_buffers`
`_non_persistent_buffers_set`
`_backward_pre_hooks`
`_backward_hooks`
`_is_full_backward_hook`
`_forward_hooks`
`_forward_hooks_with_kwargs`
`_forward_hooks_always_called`
`_forward_pre_hooks`
`_forward_pre_hooks_with_kwargs`
`_state_dict_hooks`
`_load_state_dict_pre_hooks`
`_state_dict_pre_hooks`
`_load_state_dict_post_hooks`
`_modules`