| Title: | Sparse Multiple Index Models for Nonparametric Forecasting |
| Version: | 0.1.3 |
| Description: | Implements a general algorithm for estimating Sparse Multiple Index (SMI) models for nonparametric forecasting and prediction. Estimation of SMI models requires the Gurobi mixed integer programming (MIP) solver via the gurobi R package. To use this functionality, the Gurobi Optimizer must be installed, and a valid license obtained and activated from https://www.gurobi.com. The gurobi R package must then be installed and configured following the instructions at https://support.gurobi.com/hc/en-us/articles/14462206790033-How-do-I-install-Gurobi-for-R. The package also includes functions for fitting nonparametric additive models with backward elimination, group-wise additive index models, and projection pursuit regression models as benchmark comparison methods. In addition, it provides tools for generating prediction intervals to quantify uncertainty in point forecasts produced by the SMI model and benchmark models, using the classical block bootstrap and a new method called conformal bootstrap, which integrates block bootstrap with split conformal prediction. |
| License: | GPL (≥ 3) |
| Encoding: | UTF-8 |
| Depends: | R (≥ 4.1.0) |
| Imports: | cgaim, conformalForecast, dplyr, furrr, future, generics, ggplot2, gratia, gtools, Matrix, methods, mgcv, purrr, ROI, tibble, tidyselect, tidyr, tsibble, utils |
| Suggests: | gurobi, knitr, lubridate, rmarkdown, testthat (≥ 3.0.0) |
| RoxygenNote: | 7.3.2 |
| URL: | https://github.com/nuwani-palihawadana/smimodel, https://nuwani-palihawadana.github.io/smimodel/ |
| BugReports: | https://github.com/nuwani-palihawadana/smimodel/issues |
| Config/testthat/edition: | 3 |
| VignetteBuilder: | knitr |
| NeedsCompilation: | no |
| Packaged: | 2026-04-02 05:48:54 UTC; nkod0004 |
| Author: | Nuwani Palihawadana
|
| Maintainer: | Nuwani Palihawadana <nuwanipalihawadana@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-04-08 14:10:13 UTC |
smimodel: Sparse Multiple Index Models for Nonparametric Forecasting
Description
Implements a general algorithm for estimating Sparse Multiple Index (SMI) models for nonparametric forecasting and prediction. Estimation of SMI models requires the Gurobi mixed integer programming (MIP) solver via the gurobi R package. To use this functionality, the Gurobi Optimizer must be installed, and a valid license obtained and activated from https://www.gurobi.com. The gurobi R package must then be installed and configured following the instructions at https://support.gurobi.com/hc/en-us/articles/14462206790033-How-do-I-install-Gurobi-for-R. The package also includes functions for fitting nonparametric additive models with backward elimination, group-wise additive index models, and projection pursuit regression models as benchmark comparison methods. In addition, it provides tools for generating prediction intervals to quantify uncertainty in point forecasts produced by the SMI model and benchmark models, using the classical block bootstrap and a new method called conformal bootstrap, which integrates block bootstrap with split conformal prediction.
Author(s)
Maintainer: Nuwani Palihawadana nuwanipalihawadana@gmail.com (ORCID) [copyright holder]
Other contributors:
Xiaoqian Wang Xiaoqian.Wang@amss.ac.cn (ORCID) [contributor]
See Also
Useful links:
Report bugs at https://github.com/nuwani-palihawadana/smimodel/issues
Point estimate accuracy measures
Description
Point estimate accuracy measures
Usage
MAE(residuals, na.rm = TRUE, ...)
MSE(residuals, na.rm = TRUE, ...)
point_measures
Arguments
residuals |
A vector of residuals from either the validation or test data. |
na.rm |
If |
... |
Additional arguments for each measure. |
Format
An object of class list of length 2.
Value
For the individual functions (MAE, MSE), returns a single numeric
scalar giving the requested accuracy measure.
For the exported object point_measures, returns a named list of functions
that can be supplied to higher-level accuracy routines.
Examples
set.seed(123)
ytrain <- rnorm(100)
ytest <- rnorm(30)
yhat <- ytest + rnorm(30, sd = 0.3)
resid <- ytest - yhat
MAE(resid)
MSE(resid)
Constructing index coefficient vectors with all predictors in each index
Description
Constructs vectors of coefficients for each index including a coefficient for all the predictors that are entering indices. i.e. if a coefficient is not provided for a particular predictor in a particular index, the function will replace the missing coefficient with a zero.
Usage
allpred_index(num_pred, num_ind, ind_pos, alpha)
Arguments
num_pred |
Number of predictors. |
num_ind |
Number of indices. |
ind_pos |
A list of length = |
alpha |
A vector of index coefficients. |
Value
A list containing the following components:
alpha_init_new |
A
|
index |
An |
index_positions |
A list of length = |
Augment function for class backward
Description
Generates residuals and fitted values of a fitted backward object.
Usage
## S3 method for class 'backward'
augment(x, ...)
Arguments
x |
A |
... |
Other arguments not currently used. |
Value
A tibble.
Examples
library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n = 1205
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
mutate(
# Add x_lags
x_lag = lag_matrix(x_lag_000, 5)) |>
unpack(x_lag, names_sep = "_") |>
mutate(
# Response variable
y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
# Add an index to the data set
inddd = seq(1, n)) |>
drop_na() |>
select(inddd, y, starts_with("x_lag")) |>
# Make the data set a `tsibble`
as_tsibble(index = inddd)
# Training set
sim_train <- sim_data[1:1000, ]
# Validation set
sim_val <- sim_data[1001:1200, ]
# Predictors taken as non-linear variables
s.vars <- colnames(sim_data)[3:8]
# Model fitting
backwardModel <- model_backward(data = sim_train,
val.data = sim_val,
yvar = "y",
s.vars = s.vars)
# Obtain residuals and fitted values
augment(backwardModel)
Augment function for class gaimFit
Description
Generates residuals and fitted values of a fitted gaimFit object.
Usage
## S3 method for class 'gaimFit'
augment(x, ...)
Arguments
x |
A |
... |
Other arguments not currently used. |
Value
A tibble.
Examples
library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n = 1005
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
mutate(
# Add x_lags
x_lag = lag_matrix(x_lag_000, 5)) |>
unpack(x_lag, names_sep = "_") |>
mutate(
# Response variable
y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
# Add an index to the data set
inddd = seq(1, n)) |>
drop_na() |>
select(inddd, y, starts_with("x_lag")) |>
# Make the data set a `tsibble`
as_tsibble(index = inddd)
# Predictors taken as index variables
index.vars <- colnames(sim_data)[3:7]
# Assign group indices for each predictor
index.ind = c(rep(1, 3), rep(2, 2))
# Predictors taken as non-linear variables not entering indices
s.vars = "x_lag_005"
# Model fitting
gaimModel <- model_gaim(data = sim_data,
yvar = "y",
index.vars = index.vars,
index.ind = index.ind,
s.vars = s.vars)
# Obtain residuals and fitted values
augment(gaimModel)
Augment function for class gamFit
Description
Generates residuals and fitted values of a fitted gamFit object.
Usage
## S3 method for class 'gamFit'
augment(x, ...)
Arguments
x |
A |
... |
Other arguments not currently used. |
Value
A tibble.
Examples
library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n = 1005
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
mutate(
# Add x_lags
x_lag = lag_matrix(x_lag_000, 5)) |>
unpack(x_lag, names_sep = "_") |>
mutate(
# Response variable
y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
# Add an index to the data set
inddd = seq(1, n)) |>
drop_na() |>
select(inddd, y, starts_with("x_lag")) |>
# Make the data set a `tsibble`
as_tsibble(index = inddd)
# Predictors taken as non-linear variables
s.vars <- colnames(sim_data)[3:6]
# Predictors taken as linear variables
linear.vars <- colnames(sim_data)[7:8]
# Model fitting
gamModel <- model_gam(data = sim_data,
yvar = "y",
s.vars = s.vars,
linear.vars = linear.vars)
# Obtain residuals and fitted values
augment(gamModel)
Augment function for class lmFit
Description
Generates residuals and fitted values of a fitted lmFit object.
Usage
## S3 method for class 'lmFit'
augment(x, ...)
Arguments
x |
A |
... |
Other arguments not currently used. |
Value
A tibble.
Examples
library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n = 1005
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
mutate(
# Add x_lags
x_lag = lag_matrix(x_lag_000, 5)) |>
unpack(x_lag, names_sep = "_") |>
mutate(
# Response variable
y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
# Add an index to the data set
inddd = seq(1, n)) |>
drop_na() |>
select(inddd, y, starts_with("x_lag")) |>
# Make the data set a `tsibble`
as_tsibble(index = inddd)
# Predictor variables
linear.vars <- colnames(sim_data)[3:8]
# Model fitting
lmModel <- model_lm(data = sim_data,
yvar = "y",
linear.vars = linear.vars)
# Obtain residuals and fitted values
augment(lmModel)
Augment function for class pprFit
Description
Generates residuals and fitted values of a fitted pprFit object.
Usage
## S3 method for class 'pprFit'
augment(x, ...)
Arguments
x |
A |
... |
Other arguments not currently used. |
Value
A tibble.
Examples
library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n = 1005
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
mutate(
# Add x_lags
x_lag = lag_matrix(x_lag_000, 5)) |>
unpack(x_lag, names_sep = "_") |>
mutate(
# Response variable
y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
# Add an index to the data set
inddd = seq(1, n)) |>
drop_na() |>
select(inddd, y, starts_with("x_lag")) |>
# Make the data set a `tsibble`
as_tsibble(index = inddd)
# Index variables
index.vars <- colnames(sim_data)[3:8]
# Model fitting
pprModel <- model_ppr(data = sim_data,
yvar = "y",
index.vars = index.vars)
# Obtain residuals and fitted values
augment(pprModel)
Augment function for class smimodel
Description
Generates residuals and fitted values of a fitted smimodel object.
Usage
## S3 method for class 'smimodel'
augment(x, ...)
Arguments
x |
A |
... |
Other arguments not currently used. |
Value
A tibble.
Examples
if(requireNamespace("gurobi", quietly = TRUE)){
library(dplyr)
library(ROI)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n = 1005
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
mutate(
# Add x_lags
x_lag = lag_matrix(x_lag_000, 5)) |>
unpack(x_lag, names_sep = "_") |>
mutate(
# Response variable
y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
# Add an index to the data set
inddd = seq(1, n)) |>
drop_na() |>
select(inddd, y, starts_with("x_lag")) |>
# Make the data set a `tsibble`
as_tsibble(index = inddd)
# Index variables
index.vars <- colnames(sim_data)[3:8]
# Model fitting
smimodel_ppr <- model_smimodel(data = sim_data,
yvar = "y",
index.vars = index.vars,
initialise = "ppr")
# Obtain residuals and fitted values
augment(smimodel_ppr)
}
Augment function for class smimodelFit
Description
Generates residuals and fitted values of a fitted smimodelFit object.
Usage
## S3 method for class 'smimodelFit'
augment(x, ...)
Arguments
x |
A |
... |
Other arguments not currently used. |
Value
A tibble.
Plot estimated smooths from a fitted smimodel
Description
Plots the graphs of fitted spline(s). If a set of multiple models are fitted,
plots graphs of fitted spline(s) of a specified model (in argument
model) out of the set of multiple models fitted.
Usage
## S3 method for class 'smimodel'
autoplot(object, model = 1, ...)
Arguments
object |
A |
model |
An |
... |
Other arguments not currently used. |
Value
Plot(s) of fitted spline(s).
Examples
if(requireNamespace("gurobi", quietly = TRUE)){
library(dplyr)
library(ROI)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n = 1005
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
mutate(
# Add x_lags
x_lag = lag_matrix(x_lag_000, 5)) |>
unpack(x_lag, names_sep = "_") |>
mutate(
# Response variable
y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
# Add an index to the data set
inddd = seq(1, n)) |>
drop_na() |>
select(inddd, y, starts_with("x_lag")) |>
# Make the data set a `tsibble`
as_tsibble(index = inddd)
# Index variables
index.vars <- colnames(sim_data)[3:8]
# Model fitting
smimodel_ppr <- model_smimodel(data = sim_data,
yvar = "y",
index.vars = index.vars,
initialise = "ppr")
autoplot(smimodel_ppr)
}
Calculate interval forecast coverage
Description
This is a wrapper for the function conformalForecast::coverage.
Calculates the mean coverage and the ifinn matrix for prediction intervals on
validation set. If window is not NULL, a matrix of the rolling
means of interval forecast coverage is also returned.
Usage
avgCoverage(object, level = 95, window = NULL, na.rm = FALSE)
Arguments
object |
An object of class |
level |
Target confidence level for prediction intervals. |
window |
If not |
na.rm |
A |
Value
A list of class coverage with the following components:
mean |
Mean coverage across the validation set. |
ifinn |
A indicator matrix as a multivariate time series, where the
|
rollmean |
If |
Examples
library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n = 1055
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
mutate(
# Add x_lags
x_lag = lag_matrix(x_lag_000, 5)) |>
unpack(x_lag, names_sep = "_") |>
mutate(
# Response variable
y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
# Add an index to the data set
inddd = seq(1, n)) |>
drop_na() |>
select(inddd, y, starts_with("x_lag")) |>
# Make the data set a `tsibble`
as_tsibble(index = inddd)
# Training set
sim_train <- sim_data[1:1000, ]
# Test set
sim_test <- sim_data[1001:1050, ]
# Index variables
index.vars <- colnames(sim_data)[3:8]
# Model fitting
pprModel <- model_ppr(data = sim_train,
yvar = "y",
index.vars = index.vars)
# Conformal bootstrap prediction intervals (2-steps-ahead interval forecasts)
set.seed(12345)
pprModel_cb <- cb_cvforecast(object = pprModel,
data = sim_data,
yvar = "y",
predictor.vars = index.vars,
h = 2,
ncal = 30,
num.futures = 100,
window = 1000)
# Mean coverage of generated 95% conformal bootstrap prediction intervals
cov_data <- avgCoverage(object = pprModel_cb)
cov_data$mean
Calculate interval forecast width
Description
This is a wrapper for the function conformalForecast::width.
Calculates the mean width of prediction intervals on the validation set. If
window is not NULL, a matrix of the rolling means of interval
width is also returned. If includemedian is TRUE, the
information of the median interval width will be returned.
Usage
avgWidth(
object,
level = 95,
includemedian = FALSE,
window = NULL,
na.rm = FALSE
)
Arguments
object |
An object of class |
level |
Target confidence level for prediction intervals. |
includemedian |
If |
window |
If not |
na.rm |
A logical indicating whether |
Value
A list of class width with the following components:
width |
Forecast interval width as a multivariate time series, where the
|
mean |
Mean interval width across the validation set. |
rollmean |
If |
median |
Median interval width across the validation set. |
rollmedian |
If |
Examples
library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n = 1055
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
mutate(
# Add x_lags
x_lag = lag_matrix(x_lag_000, 5)) |>
unpack(x_lag, names_sep = "_") |>
mutate(
# Response variable
y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
# Add an index to the data set
inddd = seq(1, n)) |>
drop_na() |>
select(inddd, y, starts_with("x_lag")) |>
# Make the data set a `tsibble`
as_tsibble(index = inddd)
# Training set
sim_train <- sim_data[1:1000, ]
# Test set
sim_test <- sim_data[1001:1050, ]
# Index variables
index.vars <- colnames(sim_data)[3:8]
# Model fitting
pprModel <- model_ppr(data = sim_train,
yvar = "y",
index.vars = index.vars)
# Conformal bootstrap prediction intervals (2-steps-ahead interval forecasts)
set.seed(12345)
pprModel_cb <- cb_cvforecast(object = pprModel,
data = sim_data,
yvar = "y",
predictor.vars = index.vars,
h = 2,
ncal = 30,
num.futures = 100,
window = 1000)
# Mean width of generated 95% conformal bootstrap prediction intervals
width_data <- avgWidth(object = pprModel_cb)
width_data$mean
Single season block bootstrap prediction intervals through time series cross-validation forecasting
Description
Compute prediction intervals by applying the single season block bootstrap method to subsets of time series data using a rolling forecast origin.
Usage
bb_cvforecast(
object,
data,
yvar,
neighbour = 0,
predictor.vars,
h = 1,
season.period = 1,
m = 1,
num.futures = 1000,
level = c(80, 95),
forward = TRUE,
initial = 1,
window = NULL,
roll.length = 1,
exclude.trunc = NULL,
recursive = FALSE,
recursive_colNames = NULL,
na.rm = TRUE,
verbose = list(solver = FALSE, progress = FALSE),
...
)
Arguments
object |
Fitted model object of class |
data |
Data set. Must be a data set of class |
yvar |
Name of the response variable as a character string. |
neighbour |
If multiple models are fitted: Number of neighbours of each
key (i.e. grouping variable) to be considered in model fitting to handle
smoothing over the key. Should be an |
predictor.vars |
A character vector of names of the predictor variables. |
h |
Forecast horizon. |
season.period |
Length of the seasonal period. |
m |
Multiplier. (Block size = |
num.futures |
Number of possible future sample paths to be generated. |
level |
Confidence level for prediction intervals. |
forward |
If |
initial |
Initial period of the time series where no cross-validation forecasting is performed. |
window |
Length of the rolling window. If |
roll.length |
Number of observations by which each rolling/expanding window should be rolled forward. |
exclude.trunc |
The names of the predictor variables that should not be truncated for stable predictions as a character string. (Since the nonlinear functions are estimated using splines, extrapolation is not desirable. Hence, if any predictor variable is treated non-linearly in the estimated model, will be truncated to be in the in-sample range before obtaining predictions. If any variables are listed here will be excluded from such truncation.) |
recursive |
Whether to obtain recursive forecasts or not (default -
|
recursive_colNames |
If |
na.rm |
logical; if |
verbose |
A named list controlling verbosity options. Defaults to
|
... |
Other arguments not currently used. |
Value
An object of class bb_cvforecast, which is a list that
contains following elements:
x |
The original time series. |
method |
A character string "bb_cvforecast". |
fit_times |
The number of times the model is fitted in cross-validation. |
mean |
Point forecasts as a multivariate time series, where the
|
res |
The matrix of in-sample residuals produced in cross-validation.
The number of rows corresponds to |
model_fit |
Models fitted in cross-validation. |
level |
The confidence values associated with the prediction intervals. |
lower |
A list containing
lower bounds for prediction intervals for each level. Each element within
the list will be a multivariate time series with the same dimensional
characteristics as |
upper |
A list containing upper bounds
for prediction intervals for each level. Each element within the list will
be a multivariate time series with the same dimensional characteristics as
|
possible_futures |
A list of matrices containing future sample paths generated at each cross-validation step. |
See Also
Examples
if(requireNamespace("gurobi", quietly = TRUE)){
library(dplyr)
library(ROI)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n = 1105
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
mutate(
# Add x_lags
x_lag = lag_matrix(x_lag_000, 5)) |>
unpack(x_lag, names_sep = "_") |>
mutate(
# Response variable
y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 +
(0.35*x_lag_002 + 0.7*x_lag_005)^2 + rnorm(n, sd = 0.1),
# Add an index to the data set
inddd = seq(1, n)) |>
drop_na() |>
select(inddd, y, starts_with("x_lag")) |>
# Make the data set a `tsibble`
as_tsibble(index = inddd)
# Index variables
index.vars <- colnames(sim_data)[3:8]
# Training set
sim_train <- sim_data[1:1000, ]
# Test set
sim_test <- sim_data[1001:1100, ]
# Model fitting
smimodel_ppr <- model_smimodel(data = sim_train,
yvar = "y",
index.vars = index.vars,
initialise = "ppr")
# Block bootstrap prediction intervals (3-steps-ahead interval forecasts)
set.seed(12345)
smimodel_ppr_bb <- bb_cvforecast(object = smimodel_ppr,
data = sim_data,
yvar = "y",
predictor.vars = index.vars,
h = 3,
num.futures = 50,
window = 1000)
}
Futures through single season block bootstrapping
Description
Generates possible future sample paths by applying the single season block bootstrap method.
Usage
blockBootstrap(
object,
newdata,
resids,
preds,
season.period = 1,
m = 1,
num.futures = 1000,
exclude.trunc = NULL,
recursive = FALSE,
recursive_colRange = NULL
)
Arguments
object |
Fitted model object. |
newdata |
Test data set. Must be a data set of class |
resids |
In-sample residuals from the fitted model. |
preds |
Predictions for the test set (i.e. data for the forecast horizon). |
season.period |
Length of the seasonal period. |
m |
Multiplier. (Block size = |
num.futures |
Number of possible future sample paths to be generated. |
exclude.trunc |
The names of the predictor variables that should not be truncated for stable predictions as a character string. |
recursive |
Whether to obtain recursive forecasts or not (default - FALSE). |
recursive_colRange |
If |
Value
A matrix of simulated future sample paths.
Conformal bootstrap prediction intervals through time series cross-validation forecasting
Description
Compute prediction intervals by applying the conformal bootstrap method to subsets of time series data using a rolling forecast origin.
Usage
cb_cvforecast(
object,
data,
yvar,
neighbour = 0,
predictor.vars,
h = 1,
ncal = 100,
num.futures = 1000,
level = c(80, 95),
forward = TRUE,
initial = 1,
window = NULL,
roll.length = 1,
exclude.trunc = NULL,
recursive = FALSE,
recursive_colNames = NULL,
na.rm = TRUE,
nacheck_frac_numerator = 2,
nacheck_frac_denominator = 3,
verbose = list(solver = FALSE, progress = FALSE),
...
)
Arguments
object |
Fitted model object of class |
data |
Data set. Must be a data set of class |
yvar |
Name of the response variable as a character string. |
neighbour |
If multiple models are fitted: Number of neighbours of each
key (i.e. grouping variable) to be considered in model fitting to handle
smoothing over the key. Should be an |
predictor.vars |
A character vector of names of the predictor variables. |
h |
Forecast horizon. |
ncal |
Length of a calibration window. |
num.futures |
Number of possible future sample paths to be generated in bootstrap. |
level |
Confidence level for prediction intervals. |
forward |
If |
initial |
Initial period of the time series where no cross-validation forecasting is performed. |
window |
Length of the rolling window. If |
roll.length |
Number of observations by which each rolling/expanding window should be rolled forward. |
exclude.trunc |
The names of the predictor variables that should not be truncated for stable predictions as a character string. (Since the nonlinear functions are estimated using splines, extrapolation is not desirable. Hence, if any predictor variable is treated non-linearly in the estimated model, will be truncated to be in the in-sample range before obtaining predictions. If any variables are listed here will be excluded from such truncation.) |
recursive |
Whether to obtain recursive forecasts or not (default -
|
recursive_colNames |
If |
na.rm |
logical; if |
nacheck_frac_numerator |
Numerator of the fraction of non-missing values that is required in a test set. |
nacheck_frac_denominator |
Denominator of the fraction of non-missing values that is required in a test set. |
verbose |
A named list controlling verbosity options. Defaults to
|
... |
Other arguments not currently used. |
Value
An object of class cb_cvforecast, which is a list that
contains following elements:
x |
The original time series. |
method |
A character string "cb_cvforecast". |
fit_times |
The number of times the model is fitted in cross-validation. |
mean |
Point forecasts as a multivariate time
series, where the |
error |
Forecast errors given by |
res |
The matrix of in-sample residuals produced in cross-validation. |
level |
The confidence levels associated with the prediction intervals. |
cal_times |
The number of calibration windows considered in cross-validation. |
num_cal |
The number of non-missing multi-step forecast errors in each calibration window. |
skip_cal |
An indicator vector indicating whether a calibration window is skipped without constructing prediction intervals due to missing model or missing data in the test set. |
lower |
A list containing lower bounds for prediction
intervals for each level. Each element within the list will be a
multivariate time series with the same dimensional characteristics as
|
upper |
A list containing upper bounds for prediction
intervals for each level. Each element within the list will be a
multivariate time series with the same dimensional characteristics as
|
possible_futures |
A list of matrices containing future sample paths generated at each calibration step. |
See Also
Examples
if(requireNamespace("gurobi", quietly = TRUE)){
library(dplyr)
library(ROI)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n = 1105
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
mutate(
# Add x_lags
x_lag = lag_matrix(x_lag_000, 5)) |>
unpack(x_lag, names_sep = "_") |>
mutate(
# Response variable
y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 +
(0.35*x_lag_002 + 0.7*x_lag_005)^2 + rnorm(n, sd = 0.1),
# Add an index to the data set
inddd = seq(1, n)) |>
drop_na() |>
select(inddd, y, starts_with("x_lag")) |>
# Make the data set a `tsibble`
as_tsibble(index = inddd)
# Index variables
index.vars <- colnames(sim_data)[3:8]
# Training set
sim_train <- sim_data[1:1000, ]
# Test set
sim_test <- sim_data[1001:1100, ]
# Model fitting
smimodel_ppr <- model_smimodel(data = sim_train,
yvar = "y",
index.vars = index.vars,
initialise = "ppr")
# Conformal bootstrap prediction intervals (3-steps-ahead interval forecasts)
set.seed(12345)
smimodel_ppr_cb <- cb_cvforecast(object = smimodel_ppr,
data = sim_data,
yvar = "y",
predictor.vars = index.vars,
h = 3,
ncal = 30,
num.futures = 100,
window = 1000)
}
Eliminate a variable and fit a nonparametric additive model
Description
Eliminates a specified variable and fits a nonparametric additive model with
remaining variables, and returns validation set MSE. This is an internal
function of the package, and designed to be called from
model_backward.
Usage
eliminate(
ind,
train,
val,
yvar,
family = gaussian(),
s.vars = NULL,
s.basedim = NULL,
linear.vars = NULL,
exclude.trunc = NULL,
recursive = FALSE,
recursive_colRange = NULL
)
Arguments
ind |
An |
train |
The data set on which the model(s) will be trained. Must be a
data set of class |
val |
Validation data set. (The data set on which the model selection
will be performed.) Must be a data set of class |
yvar |
Name of the response variable as a character string. |
family |
A description of the error distribution and link function to be
used in the model (see |
s.vars |
A |
s.basedim |
Dimension of the bases used to represent the smooth terms
corresponding to |
linear.vars |
A |
exclude.trunc |
The names of the predictor variables that should not be
truncated for stable predictions as a character string. (Since the
nonlinear functions are estimated using splines, extrapolation is not
desirable. Hence, if any predictor variable in |
recursive |
Whether to obtain recursive forecasts or not (default -
|
recursive_colRange |
If |
Value
A numeric.
Forecasting using nonparametric additive models with backward elimination
Description
Returns forecasts and other information for nonparametric additive models with backward elimination.
Usage
## S3 method for class 'backward'
forecast(
object,
h = 1,
level = c(80, 95),
newdata,
exclude.trunc = NULL,
recursive = FALSE,
recursive_colRange = NULL,
...
)
Arguments
object |
An object of class |
h |
Forecast horizon. |
level |
Confidence level for prediction intervals. |
newdata |
The set of new data on for which the forecasts are required
(i.e. test set; should be a |
exclude.trunc |
The names of the predictor variables that should not be truncated for stable predictions as a character string. |
recursive |
Whether to obtain recursive forecasts or not (default -
|
recursive_colRange |
If |
... |
Other arguments not currently used. |
Value
An object of class forecast. Here, it is a list containing the
following elements:
method |
The name of the forecasting method as a character string. |
model |
The fitted model. |
mean |
Point forecasts as a time series. |
residuals |
Residuals from the fitted model. |
fitted |
Fitted values (one-step forecasts). |
Examples
library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n = 1215
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
mutate(
# Add x_lags
x_lag = lag_matrix(x_lag_000, 5)) |>
unpack(x_lag, names_sep = "_") |>
mutate(
# Response variable
y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
# Add an index to the data set
inddd = seq(1, n)) |>
drop_na() |>
select(inddd, y, starts_with("x_lag")) |>
# Make the data set a `tsibble`
as_tsibble(index = inddd)
# Training set
sim_train <- sim_data[1:1000, ]
# Validation set
sim_val <- sim_data[1001:1200, ]
# Test set
sim_test <- sim_data[1201:1210, ]
# Predictors taken as non-linear variables
s.vars <- colnames(sim_data)[3:8]
# Model fitting
backwardModel <- model_backward(data = sim_train,
val.data = sim_val,
yvar = "y",
s.vars = s.vars)
forecast(backwardModel, newdata = sim_test)
Forecasting using GAIMs
Description
Returns forecasts and other information for GAIMs.
Usage
## S3 method for class 'gaimFit'
forecast(
object,
h = 1,
level = c(80, 95),
newdata,
exclude.trunc = NULL,
recursive = FALSE,
recursive_colRange = NULL,
...
)
Arguments
object |
An object of class |
h |
Forecast horizon. |
level |
Confidence level for prediction intervals. |
newdata |
The set of new data on for which the forecasts are required
(i.e. test set; should be a |
exclude.trunc |
The names of the predictor variables that should not be truncated for stable predictions as a character string. |
recursive |
Whether to obtain recursive forecasts or not (default -
|
recursive_colRange |
If |
... |
Other arguments not currently used. |
Value
An object of class forecast. Here, it is a list containing the
following elements:
method |
The name of the forecasting method as a character string. |
model |
The fitted model. |
mean |
Point forecasts as a time series. |
residuals |
Residuals from the fitted model. |
fitted |
Fitted values (one-step forecasts). |
Examples
library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n = 1015
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
mutate(
# Add x_lags
x_lag = lag_matrix(x_lag_000, 5)) |>
unpack(x_lag, names_sep = "_") |>
mutate(
# Response variable
y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
# Add an index to the data set
inddd = seq(1, n)) |>
drop_na() |>
select(inddd, y, starts_with("x_lag")) |>
# Make the data set a `tsibble`
as_tsibble(index = inddd)
# Training set
sim_train <- sim_data[1:1000, ]
# Test set
sim_test <- sim_data[1001:1010, ]
# Predictors taken as index variables
index.vars <- colnames(sim_data)[3:7]
# Assign group indices for each predictor
index.ind = c(rep(1, 3), rep(2, 2))
# Predictors taken as non-linear variables not entering indices
s.vars = "x_lag_005"
# Model fitting
gaimModel <- model_gaim(data = sim_train,
yvar = "y",
index.vars = index.vars,
index.ind = index.ind,
s.vars = s.vars)
forecast(gaimModel, newdata = sim_test)
Forecasting using GAMs
Description
Returns forecasts and other information for GAMs.
Usage
## S3 method for class 'gamFit'
forecast(
object,
h = 1,
level = c(80, 95),
newdata,
exclude.trunc = NULL,
recursive = FALSE,
recursive_colRange = NULL,
...
)
Arguments
object |
An object of class |
h |
Forecast horizon. |
level |
Confidence level for prediction intervals. |
newdata |
The set of new data on for which the forecasts are required
(i.e. test set; should be a |
exclude.trunc |
The names of the predictor variables that should not be truncated for stable predictions as a character string. |
recursive |
Whether to obtain recursive forecasts or not (default -
|
recursive_colRange |
If |
... |
Other arguments not currently used. |
Value
An object of class forecast. Here, it is a list containing the
following elements:
method |
The name of the forecasting method as a character string. |
model |
The fitted model. |
mean |
Point forecasts as a time series. |
residuals |
Residuals from the fitted model. |
fitted |
Fitted values (one-step forecasts). |
Examples
library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n = 1015
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
mutate(
# Add x_lags
x_lag = lag_matrix(x_lag_000, 5)) |>
unpack(x_lag, names_sep = "_") |>
mutate(
# Response variable
y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
# Add an index to the data set
inddd = seq(1, n)) |>
drop_na() |>
select(inddd, y, starts_with("x_lag")) |>
# Make the data set a `tsibble`
as_tsibble(index = inddd)
# Training set
sim_train <- sim_data[1:1000, ]
# Test set
sim_test <- sim_data[1001:1010, ]
# Predictors taken as non-linear variables
s.vars <- colnames(sim_data)[3:6]
# Predictors taken as linear variables
linear.vars <- colnames(sim_data)[7:8]
# Model fitting
gamModel <- model_gam(data = sim_train,
yvar = "y",
s.vars = s.vars,
linear.vars = linear.vars)
forecast(gamModel, newdata = sim_test)
Forecasting using PPR models
Description
Returns forecasts and other information for PPR models.
Usage
## S3 method for class 'pprFit'
forecast(
object,
h = 1,
level = c(80, 95),
newdata,
exclude.trunc = NULL,
recursive = FALSE,
recursive_colRange = NULL,
...
)
Arguments
object |
An object of class |
h |
Forecast horizon. |
level |
Confidence level for prediction intervals. |
newdata |
The set of new data on for which the forecasts are required
(i.e. test set; should be a |
exclude.trunc |
The names of the predictor variables that should not be truncated for stable predictions as a character string. |
recursive |
Whether to obtain recursive forecasts or not (default -
|
recursive_colRange |
If |
... |
Other arguments not currently used. |
Value
An object of class forecast. Here, it is a list containing the
following elements:
method |
The name of the forecasting method as a character string. |
model |
The fitted model. |
mean |
Point forecasts as a time series. |
residuals |
Residuals from the fitted model. |
fitted |
Fitted values (one-step forecasts). |
Examples
library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n = 1015
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
mutate(
# Add x_lags
x_lag = lag_matrix(x_lag_000, 5)) |>
unpack(x_lag, names_sep = "_") |>
mutate(
# Response variable
y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
# Add an index to the data set
inddd = seq(1, n)) |>
drop_na() |>
select(inddd, y, starts_with("x_lag")) |>
# Make the data set a `tsibble`
as_tsibble(index = inddd)
# Training set
sim_train <- sim_data[1:1000, ]
# Test set
sim_test <- sim_data[1001:1010, ]
# Index variables
index.vars <- colnames(sim_data)[3:8]
# Model fitting
pprModel <- model_ppr(data = sim_train,
yvar = "y",
index.vars = index.vars)
forecast(pprModel, newdata = sim_test)
Forecasting using SMI models
Description
Returns forecasts and other information for SMI models.
Usage
## S3 method for class 'smimodel'
forecast(
object,
h = 1,
level = c(80, 95),
newdata,
exclude.trunc = NULL,
recursive = FALSE,
recursive_colRange = NULL,
...
)
Arguments
object |
An object of class |
h |
Forecast horizon. |
level |
Confidence level for prediction intervals. |
newdata |
The set of new data on for which the forecasts are required
(i.e. test set; should be a |
exclude.trunc |
The names of the predictor variables that should not be truncated for stable predictions as a character string. |
recursive |
Whether to obtain recursive forecasts or not (default -
|
recursive_colRange |
If |
... |
Other arguments not currently used. |
Value
An object of class forecast. Here, it is a list containing the
following elements:
method |
The name of the forecasting method as a character string. |
model |
The fitted model. |
mean |
Point forecasts as a time series. |
residuals |
Residuals from the fitted model. |
fitted |
Fitted values (one-step forecasts). |
Examples
if(requireNamespace("gurobi", quietly = TRUE)){
library(dplyr)
library(ROI)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n = 1015
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
mutate(
# Add x_lags
x_lag = lag_matrix(x_lag_000, 5)) |>
unpack(x_lag, names_sep = "_") |>
mutate(
# Response variable
y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
# Add an index to the data set
inddd = seq(1, n)) |>
drop_na() |>
select(inddd, y, starts_with("x_lag")) |>
# Make the data set a `tsibble`
as_tsibble(index = inddd)
# Training set
sim_train <- sim_data[1:1000, ]
# Test set
sim_test <- sim_data[1001:1010, ]
# Index variables
index.vars <- colnames(sim_data)[3:8]
# Model fitting
smimodel_ppr <- model_smimodel(data = sim_train,
yvar = "y",
index.vars = index.vars,
initialise = "ppr")
forecast(smimodel_ppr, newdata = sim_test)
}
Greedy search for tuning penalty parameters
Description
Function to perform a greedy search over a given grid of penalty parameter
combinations (lambda0, lambda2), and fits a single SMI model with the best
(lowest validation set MSE) penalty parameter combination. If the optimal
combination lies on the edge of the grid, the penalty parameters are adjusted
by ±10%, and a second round of grid search is performed.This is a helper
function designed to be called from greedy_smimodel.
Usage
greedy.fit(
data,
val.data,
yvar,
neighbour = 0,
family = gaussian(),
index.vars,
initialise = c("ppr", "additive", "linear", "multiple", "userInput"),
num_ind = 5,
num_models = 5,
seed = 123,
index.ind = NULL,
index.coefs = NULL,
s.vars = NULL,
linear.vars = NULL,
nlambda = 100,
lambda.min.ratio = 1e-04,
refit = TRUE,
M = 10,
max.iter = 50,
tol = 0.001,
tolCoefs = 0.001,
TimeLimit = Inf,
MIPGap = 1e-04,
NonConvex = -1,
verbose = list(solver = FALSE, progress = FALSE),
parallel = FALSE,
workers = NULL,
exclude.trunc = NULL,
recursive = FALSE,
recursive_colRange = NULL
)
Arguments
data |
Training data set on which models will be trained. Must be a data
set of class |
val.data |
Validation data set. (The data set on which the penalty
parameter selection will be performed.) Must be a data set of class
|
yvar |
Name of the response variable as a character string. |
neighbour |
|
family |
A description of the error distribution and link function to be
used in the model (see |
index.vars |
A |
initialise |
The model structure with which the estimation process
should be initialised. The default is |
num_ind |
If |
num_models |
If |
seed |
If |
index.ind |
If |
index.coefs |
If |
s.vars |
A |
linear.vars |
A |
nlambda |
The number of values for lambda0 (penalty parameter for L0 penalty) - default is 100. |
lambda.min.ratio |
Smallest value for lambda0, as a fraction of lambda0.max (data derived). |
refit |
Whether to refit the model combining training and validation
sets after parameter tuning. If |
M |
Big-M value used in MIP. |
max.iter |
Maximum number of MIP iterations performed to update index coefficients for a given model. |
tol |
Tolerance for the objective function value (loss) of MIP. |
tolCoefs |
Tolerance for coefficients. |
TimeLimit |
A limit for the total time (in seconds) expended in a single MIP iteration. |
MIPGap |
Relative MIP optimality gap. |
NonConvex |
The strategy for handling non-convex quadratic objectives or non-convex quadratic constraints in Gurobi solver. |
verbose |
A named list controlling verbosity options. Defaults to
|
parallel |
The option to use parallel processing in fitting SMI models for different penalty parameter combinations. |
workers |
If |
exclude.trunc |
The names of the predictor variables that should not be
truncated for stable predictions as a character string. (Since the
nonlinear functions are estimated using splines, extrapolation is not
desirable. Hence, if any predictor variable in |
recursive |
Whether to obtain recursive forecasts or not (default -
|
recursive_colRange |
If |
Value
A list that contains six elements:
initial |
A list of
information of the model initialisation. (For descriptions of the list
elements see |
best |
A list of
information of the final optimised model. (For descriptions of the list
elements see |
best_lambdas |
Selected penalty parameter combination. |
lambda0_seq |
Sequence of values for lambda0 used to construct the initial grid. |
lambda2_seq |
Sequence of values for lambda2 used to construct the initial grid. |
searched |
A |
SMI model estimation through a greedy search for penalty parameters
Description
Performs a greedy search over a given grid of penalty parameter combinations (lambda0, lambda2), and fits SMI model(s) with best (lowest validation set MSE) penalty parameter combination(s). If the optimal combination lies on the edge of the grid, the penalty parameters are adjusted by ±10%, and a second round of grid search is performed. If a grouping variable is used, penalty parameters are tuned separately for each individual model.
Usage
greedy_smimodel(
data,
val.data,
yvar,
neighbour = 0,
family = gaussian(),
index.vars,
initialise = c("ppr", "additive", "linear", "multiple", "userInput"),
num_ind = 5,
num_models = 5,
seed = 123,
index.ind = NULL,
index.coefs = NULL,
s.vars = NULL,
linear.vars = NULL,
nlambda = 100,
lambda.min.ratio = 1e-04,
refit = TRUE,
M = 10,
max.iter = 50,
tol = 0.001,
tolCoefs = 0.001,
TimeLimit = Inf,
MIPGap = 1e-04,
NonConvex = -1,
verbose = list(solver = FALSE, progress = FALSE),
parallel = FALSE,
workers = NULL,
exclude.trunc = NULL,
recursive = FALSE,
recursive_colRange = NULL
)
Arguments
data |
Training data set on which models will be trained. Must be a data
set of class |
val.data |
Validation data set. (The data set on which the penalty
parameter selection will be performed.) Must be a data set of class
|
yvar |
Name of the response variable as a character string. |
neighbour |
If multiple models are fitted: Number of neighbours of each
key (i.e. grouping variable) to be considered in model fitting to handle
smoothing over the key. Should be an |
family |
A description of the error distribution and link function to be
used in the model (see |
index.vars |
A |
initialise |
The model structure with which the estimation process
should be initialised. The default is |
num_ind |
If |
num_models |
If |
seed |
If |
index.ind |
If |
index.coefs |
If |
s.vars |
A |
linear.vars |
A |
nlambda |
The number of values for lambda0 (penalty parameter for L0 penalty) - default is 100. |
lambda.min.ratio |
Smallest value for lambda0, as a fraction of lambda0.max (data derived). |
refit |
Whether to refit the model combining training and validation
sets after parameter tuning. If |
M |
Big-M value used in MIP. |
max.iter |
Maximum number of MIP iterations performed to update index coefficients for a given model. |
tol |
Tolerance for the objective function value (loss) of MIP. |
tolCoefs |
Tolerance for coefficients. |
TimeLimit |
A limit for the total time (in seconds) expended in a single MIP iteration. |
MIPGap |
Relative MIP optimality gap. |
NonConvex |
The strategy for handling non-convex quadratic objectives or non-convex quadratic constraints in Gurobi solver. |
verbose |
A named list controlling verbosity options. Defaults to
|
parallel |
The option to use parallel processing in fitting SMI models for different penalty parameter combinations. |
workers |
If |
exclude.trunc |
The names of the predictor variables that should not be
truncated for stable predictions as a character string. (Since the
nonlinear functions are estimated using splines, extrapolation is not
desirable. Hence, if any predictor variable in |
recursive |
Whether to obtain recursive forecasts or not (default -
|
recursive_colRange |
If |
Value
An object of class smimodel. This is a tibble with two
columns:
key |
The level of the grouping variable (i.e. key of the training data set). |
fit |
Information of the fitted model
corresponding to the |
Each row of the column fit contains a list with six elements:
initial |
A list of information of the model initialisation. (For
descriptions of the list elements see |
best |
A list of information of the final optimised model. (For
descriptions of the list elements see |
best_lambdas |
Selected penalty parameter combination. |
lambda0_seq |
Sequence of values for lambda0 used to construct the initial grid. |
lambda2_seq |
Sequence of values for lambda2 used to construct the initial grid. |
searched |
A |
The number of
rows of the tibble equals to the number of levels in the grouping
variable.
See Also
Examples
if(requireNamespace("gurobi", quietly = TRUE)){
library(dplyr)
library(ROI)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n = 1205
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
mutate(
# Add x_lags
x_lag = lag_matrix(x_lag_000, 5)) |>
unpack(x_lag, names_sep = "_") |>
mutate(
# Response variable
y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
# Add an index to the data set
inddd = seq(1, n)) |>
drop_na() |>
select(inddd, y, starts_with("x_lag")) |>
# Make the data set a `tsibble`
as_tsibble(index = inddd)
# Training set
sim_train <- sim_data[1:1000, ]
# Validation set
sim_val <- sim_data[1001:1200, ]
# Index variables
index.vars <- colnames(sim_data)[3:8]
# Model fitting
smi_greedy <- greedy_smimodel(data = sim_train,
val.data = sim_val,
yvar = "y",
index.vars = index.vars,
initialise = "ppr",
lambda.min.ratio = 0.1)
# Best (optimised) fitted model
smi_greedy$fit[[1]]$best
# Selected penalty parameter combination
smi_greedy$fit[[1]]$best_lambdas
}
Initialising index coefficients
Description
Initialises index coefficient vector through linear regression or penalised linear regression.
Usage
init_alpha(
Y,
X,
index.ind,
init.type = "penalisedReg",
lambda0 = 1,
lambda2 = 1,
M = 10
)
Arguments
Y |
Column matrix of response. |
X |
Matrix of predictors entering indices. |
index.ind |
An |
init.type |
Type of initialisation for index coefficients.
( |
lambda0 |
If |
lambda2 |
If |
M |
If |
Value
A list containing the following components:
alpha_init |
Normalised vector of index coefficients. |
alpha_nonNormalised |
Non-normalised (i.e. prior to normalising) vector of index coefficients. |
Updating index coefficients and non-linear functions iteratively
Description
Iteratively updates index coefficients and non-linear functions using mixed
integer programming. (A helper function used within
update_smimodelFit; users are not expected to directly call
this function.)
Usage
inner_update(
x,
data,
yvar,
family = gaussian(),
index.vars,
s.vars,
linear.vars,
num_ind,
dgz,
alpha_old,
lambda0 = 1,
lambda2 = 1,
M = 10,
max.iter = 50,
tol = 0.001,
TimeLimit = Inf,
MIPGap = 1e-04,
NonConvex = -1,
verbose = list(solver = FALSE, progress = FALSE)
)
Arguments
x |
Fitted |
data |
Training data set on which models will be trained. Should be a
|
yvar |
Name of the response variable as a character string. |
family |
A description of the error distribution and link function to be
used in the model (see |
index.vars |
A |
s.vars |
A |
linear.vars |
A |
num_ind |
Number of indices. |
dgz |
The |
alpha_old |
Current vector of index coefficients. |
lambda0 |
Penalty parameter for L0 penalty. |
lambda2 |
Penalty parameter for L2 penalty. |
M |
Big-M value to be used in MIP. |
max.iter |
Maximum number of MIP iterations performed to update index coefficients for a given model. |
tol |
Tolerance for loss. |
TimeLimit |
A limit for the total time (in seconds) expended in a single MIP iteration. |
MIPGap |
Relative MIP optimality gap. |
NonConvex |
The strategy for handling non-convex quadratic objectives or non-convex quadratic constraints in Gurobi solver. |
verbose |
A named list controlling verbosity options. Defaults to
|
Value
A list containing following elements:
best_alpha |
The vector of best index coefficient estimates. |
min_loss |
Minimum value of the objective function(loss). |
index.ind |
An |
ind_pos |
A list that indicates which predictors belong to which index,
corresponding to |
X_new |
A matrix of selected predictor variables, corresponding to
|
Function for adding lags of time series variables
Description
Generates specified number of lagged variables of the given variable in the form of a tibble.
Usage
lag_matrix(variable, n = 10)
Arguments
variable |
Variable to be lagged. |
n |
Number of lags. The default value is |
Value
A tibble.
Examples
library(dplyr)
library(tibble)
library(tidyr)
# Adding lagged variables to an existing tibble
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(100)) |>
mutate(x_lag = lag_matrix(x_lag_000, 3)) |>
unpack(x_lag, names_sep = "_")
Calculating the loss of the MIP used to estimate a SMI model
Description
Calculates the value of the objective function (loss function) of the mixed integer program used to estimate a SMI model.
Usage
loss(Y, Yhat, alpha, lambda0, lambda2)
Arguments
Y |
Column matrix of response. |
Yhat |
Predicted value of the response. |
alpha |
Vector of index coefficients. |
lambda0 |
Penalty parameter for L0 penalty. |
lambda2 |
Penalty parameter for L2 penalty. |
Value
A numeric.
Converting a fitted gam object to a smimodelFit object
Description
Converts a given object of class gam to an object of class
smimodelFit.
Usage
make_smimodelFit(
x,
data,
yvar,
neighbour,
index.vars,
index.ind,
index.data,
index.names,
alpha,
s.vars = NULL,
linear.vars = NULL,
lambda0 = NULL,
lambda2 = NULL,
M = NULL,
max.iter = NULL,
tol = NULL,
tolCoefs = NULL,
TimeLimit = NULL,
MIPGap = NULL,
NonConvex = NULL
)
Arguments
x |
A fitted |
data |
The original training data set. |
yvar |
Name of the response variable as a character string. |
neighbour |
|
index.vars |
A |
index.ind |
An |
index.data |
A |
index.names |
A |
alpha |
A vector of index coefficients. |
s.vars |
A |
linear.vars |
A |
lambda0 |
Penalty parameter for L0 penalty. |
lambda2 |
Penalty parameter for L2 penalty. |
M |
Big-M value to be used in MIP. |
max.iter |
Maximum number of MIP iterations performed to update index coefficients for a given model. |
tol |
Tolerance for the objective function value (loss) of MIP. |
tolCoefs |
Tolerance for coefficients. |
TimeLimit |
A limit for the total time (in seconds) expended in a single MIP iteration. |
MIPGap |
Relative MIP optimality gap. |
NonConvex |
The strategy for handling non-convex quadratic objectives or non-convex quadratic constraints in Gurobi solver. |
Value
An object of class smimodelFit, which is a list that contains
following elements:
alpha |
A sparse matrix of index coefficients vectors. Each column of the matrix corresponds to the index coefficient vector of each index. |
derivatives |
A |
var_y |
Name of the response variable. |
vars_index |
A |
vars_s |
A |
vars_linear |
A |
neighbour |
Number of neighbours of each key considered in model fitting. |
gam |
Fitted |
lambda0 |
L0 penalty parameter used for model fitting. |
lambda2 |
L2 penalty parameter used for model fitting. |
M |
Big-M value used in MIP. |
max.iter |
Maximum number of MIP iterations for a single round of index coefficients update. |
tol |
Tolerance for the objective function value (loss) used in solving MIP. |
tolCoefs |
Tolerance for coefficients used in updating index coefficients. |
TimeLimit |
Limit for the total time (in seconds) expended in a single MIP iteration. |
MIPGap |
Relative MIP optimality gap used. |
Nonconvex |
The strategy used for handling non-convex quadratic objectives or non-convex quadratic constraints in Gurobi solver. |
Nonparametric Additive Model with Backward Elimination
Description
Fits a nonparametric additive model, with simultaneous variable selection through a backward elimination procedure as proposed by Fan and Hyndman (2012).
Usage
model_backward(
data,
val.data,
yvar,
neighbour = 0,
family = gaussian(),
s.vars = NULL,
s.basedim = NULL,
linear.vars = NULL,
refit = TRUE,
tol = 0.001,
parallel = FALSE,
workers = NULL,
exclude.trunc = NULL,
recursive = FALSE,
recursive_colRange = NULL,
verbose = FALSE
)
Arguments
data |
Training data set on which models will be trained. Must be a data
set of class |
val.data |
Validation data set. (The data set on which the model
selection will be performed.) Must be a data set of class |
yvar |
Name of the response variable as a character string. |
neighbour |
If multiple models are fitted: Number of neighbours of each
key (i.e. grouping variable) to be considered in model fitting to handle
smoothing over the key. Should be an |
family |
A description of the error distribution and link function to be
used in the model (see |
s.vars |
A |
s.basedim |
Dimension of the bases used to represent the smooth terms
corresponding to |
linear.vars |
A |
refit |
Whether to refit the model combining training and validation
sets after model selection. If |
tol |
Tolerance for the ratio of relative change in validation set MSE, used in model selection. |
parallel |
Whether to use parallel computing in model selection or not. |
workers |
If |
exclude.trunc |
The names of the predictor variables that should not be
truncated for stable predictions as a character string. (Since the
nonlinear functions are estimated using splines, extrapolation is not
desirable. Hence, if any predictor variable in |
recursive |
Whether to obtain recursive forecasts or not (default -
|
recursive_colRange |
If |
verbose |
Logical; controls whether progress messages (model indices) are printed during fitting. Defaults to FALSE. |
Details
This function fits a nonparametric additive model formulated through Backward Elimination, as proposed by Fan and Hyndman (2012). The process starts with all predictors included in an additive model, and predictors are progressively omitted until the best model is obtained based on the validation set. Once the best model is obtained, the final model is re-fitted for the data set combining training and validation sets. For more details see reference.
Value
An object of class backward. This is a tibble with two
columns:
key |
The level of the grouping variable (i.e. key of the training data set). |
fit |
Information of the fitted model
corresponding to the |
Each row of the column fit is an
object of class gam. For details refer mgcv::gamObject.
References
Fan, S. & Hyndman, R.J. (2012). Short-Term Load Forecasting Based on a Semi-Parametric Additive Model. IEEE Transactions on Power Systems, 27(1), 134-141.doi:10.1109/TPWRS.2011.2162082.
See Also
model_smimodel, model_gaim,
model_ppr, model_gam, model_lm
Examples
library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n = 1205
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
mutate(
# Add x_lags
x_lag = lag_matrix(x_lag_000, 5)) |>
unpack(x_lag, names_sep = "_") |>
mutate(
# Response variable
y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
# Add an index to the data set
inddd = seq(1, n)) |>
drop_na() |>
select(inddd, y, starts_with("x_lag")) |>
# Make the data set a `tsibble`
as_tsibble(index = inddd)
# Training set
sim_train <- sim_data[1:1000, ]
# Validation set
sim_val <- sim_data[1001:1200, ]
# Predictors taken as non-linear variables
s.vars <- colnames(sim_data)[3:8]
# Model fitting
backwardModel <- model_backward(data = sim_train,
val.data = sim_val,
yvar = "y",
s.vars = s.vars)
# Fitted model
backwardModel$fit[[1]]
Groupwise Additive Index Models (GAIM)
Description
A wrapper for cgaim::cgaim() enabling multiple GAIM models based on a
grouping variable. Currently does not support Constrained GAIM (CGAIM)s.
Usage
model_gaim(
data,
yvar,
neighbour = 0,
index.vars,
index.ind,
s.vars = NULL,
linear.vars = NULL,
verbose = FALSE,
...
)
Arguments
data |
Training data set on which models will be trained. Must be a data
set of class |
yvar |
Name of the response variable as a character string. |
neighbour |
If multiple models are fitted: Number of neighbours of each
key (i.e. grouping variable) to be considered in model fitting to handle
smoothing over the key. Should be an |
index.vars |
A |
index.ind |
An |
s.vars |
A |
linear.vars |
A |
verbose |
Logical; controls whether progress messages (model indices) are printed during fitting. Defaults to FALSE. |
... |
Other arguments not currently used. (Note that the arguments in
|
Details
Group-wise Additive Index Model (GAIM) can be written in the form
y_{i} = \sum_{j = 1}^{p} g_{j}(\boldsymbol{\alpha}_{j}^{T}\boldsymbol{x}_{ij}) +
\varepsilon_{i}, \quad i = 1, \dots, n,
where y_{i} is the univariate
response, \boldsymbol{x}_{ij} \in \mathbb{R}^{l{j}}, j = 1,
\dots, p are pre-specified non-overlapping subsets of
\boldsymbol{x}_{i}, and \boldsymbol{\alpha}_j are the
corresponding index coefficients, g_{j} is an unknown (possibly
nonlinear) component function, and \varepsilon_{i} is the random
error, which is independent of \boldsymbol{x}_{i}.
Value
An object of class gaimFit. This is a tibble with two
columns:
key |
The level of the grouping variable (i.e. key of the training data set). |
fit |
Information of the fitted model
corresponding to the |
Each row of the column fit is an
object of class cgaim. For details refer cgaim::cgaim().
See Also
model_smimodel, model_backward,
model_ppr, model_gam, model_lm
Examples
library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n = 1005
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
mutate(
# Add x_lags
x_lag = lag_matrix(x_lag_000, 5)) |>
unpack(x_lag, names_sep = "_") |>
mutate(
# Response variable
y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
# Add an index to the data set
inddd = seq(1, n)) |>
drop_na() |>
select(inddd, y, starts_with("x_lag")) |>
# Make the data set a `tsibble`
as_tsibble(index = inddd)
# Predictors taken as index variables
index.vars <- colnames(sim_data)[3:7]
# Assign group indices for each predictor
index.ind = c(rep(1, 3), rep(2, 2))
# Predictors taken as non-linear variables not entering indices
s.vars = "x_lag_005"
# Model fitting
gaimModel <- model_gaim(data = sim_data,
yvar = "y",
index.vars = index.vars,
index.ind = index.ind,
s.vars = s.vars)
# Fitted model
gaimModel$fit[[1]]
Generalised Additive Models
Description
A wrapper for mgcv::gam() enabling multiple GAMs based on a grouping
variable.
Usage
model_gam(
data,
yvar,
family = gaussian(),
neighbour = 0,
s.vars,
s.basedim = NULL,
linear.vars = NULL,
verbose = FALSE,
...
)
Arguments
data |
Training data set on which models will be trained. Must be a data
set of class |
yvar |
Name of the response variable as a character string. |
family |
A description of the error distribution and link function to be
used in the model (see |
neighbour |
If multiple models are fitted: Number of neighbours of each
key (i.e. grouping variable) to be considered in model fitting to handle
smoothing over the key. Should be an |
s.vars |
A |
s.basedim |
Dimension of the bases used to represent the smooth terms
corresponding to |
linear.vars |
A |
verbose |
Logical; controls whether progress messages (model indices) are printed during fitting. Defaults to FALSE. |
... |
Other arguments not currently used. |
Value
An object of class gamFit. This is a tibble with two
columns:
key |
The level of the grouping variable (i.e. key of the training data set). |
fit |
Information of the fitted model
corresponding to the |
Each row of the column fit is an
object of class gam. For details refer mgcv::gamObject.
See Also
model_smimodel, model_backward,
model_gaim, model_ppr, model_lm
Examples
library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n = 1005
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
mutate(
# Add x_lags
x_lag = lag_matrix(x_lag_000, 5)) |>
unpack(x_lag, names_sep = "_") |>
mutate(
# Response variable
y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
# Add an index to the data set
inddd = seq(1, n)) |>
drop_na() |>
select(inddd, y, starts_with("x_lag")) |>
# Make the data set a `tsibble`
as_tsibble(index = inddd)
# Predictors taken as non-linear variables
s.vars <- colnames(sim_data)[3:6]
# Predictors taken as linear variables
linear.vars <- colnames(sim_data)[7:8]
# Model fitting
gamModel <- model_gam(data = sim_data,
yvar = "y",
s.vars = s.vars,
linear.vars = linear.vars)
# Fitted model
gamModel$fit[[1]]
Linear Regression models
Description
A wrapper for lm enabling multiple linear models based on a
grouping variable.
Usage
model_lm(data, yvar, neighbour = 0, linear.vars, verbose = FALSE, ...)
Arguments
data |
Training data set on which models will be trained. Must be a data
set of class |
yvar |
Name of the response variable as a character string. |
neighbour |
If multiple models are fitted: Number of neighbours of each
key (i.e. grouping variable) to be considered in model fitting to handle
smoothing over the key. Should be an |
linear.vars |
A character vector of names of the predictor variables. |
verbose |
Logical; controls whether progress messages (model indices) are printed during fitting. Defaults to FALSE. |
... |
Other arguments not currently used. |
Value
An object of class lmFit. This is a tibble with two
columns:
key |
The level of the grouping variable (i.e. key of the training data set). |
fit |
Information of the fitted model
corresponding to the |
Each row of the column fit is
an object of class lm. For details refer stats::lm.
See Also
model_smimodel, model_backward,
model_gaim, model_ppr, model_gam
Examples
library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n = 1005
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
mutate(
# Add x_lags
x_lag = lag_matrix(x_lag_000, 5)) |>
unpack(x_lag, names_sep = "_") |>
mutate(
# Response variable
y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
# Add an index to the data set
inddd = seq(1, n)) |>
drop_na() |>
select(inddd, y, starts_with("x_lag")) |>
# Make the data set a `tsibble`
as_tsibble(index = inddd)
# Predictor variables
linear.vars <- colnames(sim_data)[3:8]
# Model fitting
lmModel <- model_lm(data = sim_data,
yvar = "y",
linear.vars = linear.vars)
# Fitted model
lmModel$fit[[1]]
Projection Pursuit Regression (PPR) models
Description
A wrapper for stats::ppr() enabling multiple PPR models based on a
grouping variable.
Usage
model_ppr(
data,
yvar,
neighbour = 0,
index.vars,
num_ind = 5,
verbose = FALSE,
...
)
Arguments
data |
Training data set on which models will be trained. Must be a data
set of class |
yvar |
Name of the response variable as a character string. |
neighbour |
If multiple models are fitted: Number of neighbours of each
key (i.e. grouping variable) to be considered in model fitting to handle
smoothing over the key. Should be an |
index.vars |
A |
num_ind |
An |
verbose |
Logical; controls whether progress messages (model indices) are printed during fitting. Defaults to FALSE. |
... |
Other arguments not currently used. (For more information on other
arguments that can be passed, refer |
Details
A Projection Pursuit Regression (PPR) model (Friedman & Stuetzle (1981)) is given by
y_{i} = \sum_{j=1}^{p} {g_{j}(\boldsymbol{\alpha}_{j}^{T}\boldsymbol{x}_{i})} +
\varepsilon_{i}, \quad i = 1, \dots, n,
where y_{i} is the response,
\boldsymbol{x}_{i} is the q-dimensional predictor vector,
\boldsymbol{\alpha}_{j} = ( \alpha_{j1}, \dots, \alpha_{jp} )^{T},
j = 1, \dots, p are q-dimensional projection vectors (or
vectors of "index coefficients"), g_{j}'s are unknown nonlinear
functions, and \varepsilon_{i} is the random error.
Value
An object of class pprFit. This is a tibble with two
columns:
key |
The level of the grouping variable (i.e. key of the training data set). |
fit |
Information of the fitted model
corresponding to the |
Each row of the column fit is an
object of class c("ppr.form", "ppr"). For details refer
stats::ppr().
References
Friedman, J. H. & Stuetzle, W. (1981). Projection pursuit regression. Journal of the American Statistical Association, 76, 817–823. doi:10.2307/2287576.
See Also
model_smimodel, model_backward,
model_gaim, model_gam, model_lm
Examples
library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n = 1005
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
mutate(
# Add x_lags
x_lag = lag_matrix(x_lag_000, 5)) |>
unpack(x_lag, names_sep = "_") |>
mutate(
# Response variable
y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
# Add an index to the data set
inddd = seq(1, n)) |>
drop_na() |>
select(inddd, y, starts_with("x_lag")) |>
# Make the data set a `tsibble`
as_tsibble(index = inddd)
# Index variables
index.vars <- colnames(sim_data)[3:8]
# Model fitting
pprModel <- model_ppr(data = sim_data,
yvar = "y",
index.vars = index.vars)
# Fitted model
pprModel$fit[[1]]
Sparse Multiple Index (SMI) Models
Description
Fits nonparametric multiple index model(s), with simultaneous predictor selection (hence "sparse") and predictor grouping. Possible to fit multiple SMI models based on a grouping variable.
Usage
model_smimodel(
data,
yvar,
neighbour = 0,
family = gaussian(),
index.vars,
initialise = c("ppr", "additive", "linear", "multiple", "userInput"),
num_ind = 5,
num_models = 5,
seed = 123,
index.ind = NULL,
index.coefs = NULL,
s.vars = NULL,
linear.vars = NULL,
lambda0 = 1,
lambda2 = 1,
M = 10,
max.iter = 50,
tol = 0.001,
tolCoefs = 0.001,
TimeLimit = Inf,
MIPGap = 1e-04,
NonConvex = -1,
verbose = list(solver = FALSE, progress = FALSE)
)
Arguments
data |
Training data set on which models will be trained. Must be a data
set of class |
yvar |
Name of the response variable as a character string. |
neighbour |
If multiple models are fitted: Number of neighbours of each
key (i.e. grouping variable) to be considered in model fitting to handle
smoothing over the key. Should be an |
family |
A description of the error distribution and link function to be
used in the model (see |
index.vars |
A |
initialise |
The model structure with which the estimation process
should be initialised. The default is |
num_ind |
If |
num_models |
If |
seed |
If |
index.ind |
If |
index.coefs |
If |
s.vars |
A |
linear.vars |
A |
lambda0 |
Penalty parameter for L0 penalty. |
lambda2 |
Penalty parameter for L2 penalty. |
M |
Big-M value to be used in MIP. |
max.iter |
Maximum number of MIP iterations performed to update index coefficients for a given model. |
tol |
Tolerance for the objective function value (loss) of MIP. |
tolCoefs |
Tolerance for coefficients. |
TimeLimit |
A limit for the total time (in seconds) expended in a single MIP iteration. |
MIPGap |
Relative MIP optimality gap. |
NonConvex |
The strategy for handling non-convex quadratic objectives or non-convex quadratic constraints in Gurobi solver. |
verbose |
A named list controlling verbosity options. Defaults to
|
Details
Sparse Multiple Index (SMI) model is a semi-parametric model that can be written as
y_{i} = \beta_{0} +
\sum_{j = 1}^{p}g_{j}(\boldsymbol{\alpha}_{j}^{T}\boldsymbol{x}_{ij}) +
\sum_{k = 1}^{d}f_{k}(w_{ik}) + \boldsymbol{\theta}^{T}\boldsymbol{u}_{i} +
\varepsilon_{i}, \quad i = 1, \dots, n,
where y_{i} is the univariate
response, \beta_{0} is the model intercept, \boldsymbol{x}_{ij} \in
\mathbb{R}^{l_{j}}, j = 1, \dots, p are p subsets of predictors
entering indices, \boldsymbol{\alpha}_{j} is a vector of index
coefficients corresponding to the index h_{ij} =
\boldsymbol{\alpha}_{j}^{T}\boldsymbol{x}_{ij}, and g_{j} is a
smooth nonlinear function (estimated by a penalised cubic regression
spline). The model also allows for predictors that do not enter any
indices, including covariates w_{ik} that relate to the response
through nonlinear functions f_{k}, k = 1, \dots, d, and linear
covariates \boldsymbol{u}_{i}.
In the model formulation related to this implementation, both the number of
indices p and the predictor grouping among indices are assumed to be
unknown prior to model estimation. Suppose we observe y_1,\dots,y_n,
along with a set of potential predictors,
\boldsymbol{x}_1,\dots,\boldsymbol{x}_n, with each vector
\boldsymbol{x}_i containing q predictors. This function
implements algorithmic variable selection for index variables (i.e.
predictors entering indices) of the SMI model by allowing for zero index
coefficients for predictors. Non-overlapping predictors among indices are
assumed (i.e. no predictor enters more than one index). For algorithmic
details see reference.
Value
An object of class smimodel. This is a tibble with two
columns:
key |
The level of the grouping variable (i.e. key of the training data set). |
fit |
Information of the fitted model
corresponding to the |
Each row of the column fit contains a list with two elements:
initial |
A list of information of the model initialisation. (For
descriptions of the list elements see |
best |
A list of information of the final optimised model. (For
descriptions of the list elements see |
References
Palihawadana, N.K., Hyndman, R.J. & Wang, X. (2024). Sparse Multiple Index Models for High-Dimensional Nonparametric Forecasting. (Department of Econometrics and Business Statistics Working Paper Series 16/24).
See Also
Examples
if(requireNamespace("gurobi", quietly = TRUE)){
library(dplyr)
library(ROI)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n = 1005
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
mutate(
# Add x_lags
x_lag = lag_matrix(x_lag_000, 5)) |>
unpack(x_lag, names_sep = "_") |>
mutate(
# Response variable
y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
# Add an index to the data set
inddd = seq(1, n)) |>
drop_na() |>
select(inddd, y, starts_with("x_lag")) |>
# Make the data set a `tsibble`
as_tsibble(index = inddd)
# Index variables
index.vars <- colnames(sim_data)[3:8]
# Model fitting
smimodel_ppr <- model_smimodel(data = sim_data,
yvar = "y",
index.vars = index.vars,
initialise = "ppr")
# Best (optimised) fitted model
smimodel_ppr$fit[[1]]$best
}
Constructor function for the class smimodelFit
Description
Constructs an object of class smimodelFit using the information passed
to arguments.
Usage
new_smimodelFit(
data,
yvar,
neighbour = 0,
family = gaussian(),
index.vars,
initialise = c("additive", "linear", "userInput"),
index.ind = NULL,
index.coefs = NULL,
s.vars = NULL,
linear.vars = NULL
)
Arguments
data |
Training data set on which models will be trained. Must be a data
set of class |
yvar |
Name of the response variable as a character string. |
neighbour |
|
family |
A description of the error distribution and link function to be
used in the model (see |
index.vars |
A |
initialise |
The model structure with which the estimation process
should be initialised. The default is "additive", where the initial model
will be a nonparametric additive model. The other options are "linear" -
linear regression model (i.e. a special case single-index model, where the
initial values of the index coefficients are obtained through a linear
regression), and "userInput" - user specifies the initial model structure
(i.e. the number of indices and the placement of index variables among
indices) and the initial index coefficients through |
index.ind |
If |
index.coefs |
If |
s.vars |
A |
linear.vars |
A |
Value
A list of initial model information. For descriptions of the list
elements see make_smimodelFit).
Scaling index coefficient vectors to have unit norm
Description
Scales a coefficient vector of a particular index to have unit norm.
Usage
normalise_alpha(alpha)
Arguments
alpha |
A vector of index coefficients. |
Value
A numeric vector.
Possible future sample paths (multi-step) from residuals of a fitted benchmark model
Description
Generates possible future sample paths (multi-step) using residuals of a fitted benchmark model through recursive forecasting.
Usage
possibleFutures_benchmark(
object,
newdata,
bootstraps,
exclude.trunc = NULL,
recursive_colRange
)
Arguments
object |
A fitted model object of the class |
newdata |
The set of new data on for which the forecasts are required
(i.e. test set; should be a |
bootstraps |
Generated matrix of bootstrapped residual series. |
exclude.trunc |
The names of the predictor variables that should not be truncated for stable predictions as a character string. |
recursive_colRange |
The range of column numbers in |
Value
A list containing the following components:
firstFuture |
A
|
future_cols |
A list of multi-steps-ahead simulated futures, where
each list element corresponds to each 1-step-ahead simulated future in
|
Possible future sample paths (multi-step) from smimodel residuals
Description
Generates possible future sample paths (multi-step) using residuals of a
fitted smimodel through recursive forecasting.
Usage
possibleFutures_smimodel(
object,
newdata,
bootstraps,
exclude.trunc = NULL,
recursive_colRange
)
Arguments
object |
A |
newdata |
The set of new data on for which the forecasts are required
(i.e. test set; should be a |
bootstraps |
Generated matrix of bootstrapped residual series. |
exclude.trunc |
The names of the predictor variables that should not be truncated for stable predictions as a character string. |
recursive_colRange |
The range of column numbers in |
Value
A list containing the following components:
firstFuture |
A
|
future_cols |
A list of multi-steps-ahead simulated futures, where
each list element corresponds to each 1-step-ahead simulated future in
|
Obtaining forecasts on a test set from a fitted backward
Description
Gives forecasts on a test set.
Usage
## S3 method for class 'backward'
predict(
object,
newdata,
exclude.trunc = NULL,
recursive = FALSE,
recursive_colRange = NULL,
...
)
Arguments
object |
A |
newdata |
The set of new data on for which the forecasts are required
(i.e. test set; should be a |
exclude.trunc |
The names of the predictor variables that should not be
truncated for stable predictions as a character string. (Since the
nonlinear functions are estimated using splines, extrapolation is not
desirable. Hence, if any predictor variable in |
recursive |
Whether to obtain recursive forecasts or not (default -
|
recursive_colRange |
If |
... |
Other arguments not currently used. |
Value
A tsibble with forecasts on test set.
Examples
library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n = 1215
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
mutate(
# Add x_lags
x_lag = lag_matrix(x_lag_000, 5)) |>
unpack(x_lag, names_sep = "_") |>
mutate(
# Response variable
y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
# Add an index to the data set
inddd = seq(1, n)) |>
drop_na() |>
select(inddd, y, starts_with("x_lag")) |>
# Make the data set a `tsibble`
as_tsibble(index = inddd)
# Training set
sim_train <- sim_data[1:1000, ]
# Validation set
sim_val <- sim_data[1001:1200, ]
# Test set
sim_test <- sim_data[1201:1210, ]
# Predictors taken as non-linear variables
s.vars <- colnames(sim_data)[3:8]
# Model fitting
backwardModel <- model_backward(data = sim_train,
val.data = sim_val,
yvar = "y",
s.vars = s.vars)
predict(object = backwardModel, newdata = sim_test)
Obtaining forecasts on a test set from a fitted gaimFit
Description
Gives forecasts on a test set.
Usage
## S3 method for class 'gaimFit'
predict(
object,
newdata,
exclude.trunc = NULL,
recursive = FALSE,
recursive_colRange = NULL,
...
)
Arguments
object |
A |
newdata |
The set of new data on for which the forecasts are required
(i.e. test set; should be a |
exclude.trunc |
The names of the predictor variables that should not be
truncated for stable predictions as a character string. (Since the
nonlinear functions are estimated using splines, extrapolation is not
desirable. Hence, if any predictor variable in |
recursive |
Whether to obtain recursive forecasts or not (default -
|
recursive_colRange |
If |
... |
Other arguments not currently used. |
Value
A tsibble with forecasts on test set.
Examples
library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n = 1015
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
mutate(
# Add x_lags
x_lag = lag_matrix(x_lag_000, 5)) |>
unpack(x_lag, names_sep = "_") |>
mutate(
# Response variable
y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
# Add an index to the data set
inddd = seq(1, n)) |>
drop_na() |>
select(inddd, y, starts_with("x_lag")) |>
# Make the data set a `tsibble`
as_tsibble(index = inddd)
# Training set
sim_train <- sim_data[1:1000, ]
# Test set
sim_test <- sim_data[1001:1010, ]
# Predictors taken as index variables
index.vars <- colnames(sim_data)[3:7]
# Assign group indices for each predictor
index.ind = c(rep(1, 3), rep(2, 2))
# Predictors taken as non-linear variables not entering indices
s.vars = "x_lag_005"
# Model fitting
gaimModel <- model_gaim(data = sim_train,
yvar = "y",
index.vars = index.vars,
index.ind = index.ind,
s.vars = s.vars)
predict(object = gaimModel, newdata = sim_test)
Obtaining forecasts on a test set from a fitted gamFit
Description
Gives forecasts on a test set.
Usage
## S3 method for class 'gamFit'
predict(
object,
newdata,
exclude.trunc = NULL,
recursive = FALSE,
recursive_colRange = NULL,
...
)
Arguments
object |
A |
newdata |
The set of new data on for which the forecasts are required
(i.e. test set; should be a |
exclude.trunc |
The names of the predictor variables that should not be
truncated for stable predictions as a character string. (Since the
nonlinear functions are estimated using splines, extrapolation is not
desirable. Hence, if any predictor variable in |
recursive |
Whether to obtain recursive forecasts or not (default -
|
recursive_colRange |
If |
... |
Other arguments not currently used. |
Value
A tsibble with forecasts on test set.
Examples
library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n = 1015
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
mutate(
# Add x_lags
x_lag = lag_matrix(x_lag_000, 5)) |>
unpack(x_lag, names_sep = "_") |>
mutate(
# Response variable
y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
# Add an index to the data set
inddd = seq(1, n)) |>
drop_na() |>
select(inddd, y, starts_with("x_lag")) |>
# Make the data set a `tsibble`
as_tsibble(index = inddd)
# Training set
sim_train <- sim_data[1:1000, ]
# Test set
sim_test <- sim_data[1001:1010, ]
# Predictors taken as non-linear variables
s.vars <- colnames(sim_data)[3:6]
# Predictors taken as linear variables
linear.vars <- colnames(sim_data)[7:8]
# Model fitting
gamModel <- model_gam(data = sim_train,
yvar = "y",
s.vars = s.vars,
linear.vars = linear.vars)
predict(object = gamModel, newdata = sim_test)
Obtaining forecasts on a test set from a fitted lmFit
Description
Gives forecasts on a test set.
Usage
## S3 method for class 'lmFit'
predict(object, newdata, recursive = FALSE, recursive_colRange = NULL, ...)
Arguments
object |
A |
newdata |
The set of new data on for which the forecasts are required
(i.e. test set; should be a |
recursive |
Whether to obtain recursive forecasts or not (default -
|
recursive_colRange |
If |
... |
Other arguments not currently used. |
Value
A tsibble with forecasts on test set.
Examples
library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n = 1015
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
mutate(
# Add x_lags
x_lag = lag_matrix(x_lag_000, 5)) |>
unpack(x_lag, names_sep = "_") |>
mutate(
# Response variable
y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
# Add an index to the data set
inddd = seq(1, n)) |>
drop_na() |>
select(inddd, y, starts_with("x_lag")) |>
# Make the data set a `tsibble`
as_tsibble(index = inddd)
# Training set
sim_train <- sim_data[1:1000, ]
# Test set
sim_test <- sim_data[1001:1010, ]
# Predictor variables
linear.vars <- colnames(sim_data)[3:8]
# Model fitting
lmModel <- model_lm(data = sim_train,
yvar = "y",
linear.vars = linear.vars)
predict(object = lmModel, newdata = sim_test)
Obtaining forecasts on a test set from a fitted pprFit
Description
Gives forecasts on a test set.
Usage
## S3 method for class 'pprFit'
predict(
object,
newdata,
exclude.trunc = NULL,
recursive = FALSE,
recursive_colRange = NULL,
...
)
Arguments
object |
A |
newdata |
The set of new data on for which the forecasts are required
(i.e. test set; should be a |
exclude.trunc |
The names of the predictor variables that should not be
truncated for stable predictions as a character string. (Since the
nonlinear functions are estimated using splines, extrapolation is not
desirable. Hence, if any predictor variable in |
recursive |
Whether to obtain recursive forecasts or not (default -
|
recursive_colRange |
If |
... |
Other arguments not currently used. |
Value
A tsibble with forecasts on test set.
Examples
library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n = 1015
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
mutate(
# Add x_lags
x_lag = lag_matrix(x_lag_000, 5)) |>
unpack(x_lag, names_sep = "_") |>
mutate(
# Response variable
y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
# Add an index to the data set
inddd = seq(1, n)) |>
drop_na() |>
select(inddd, y, starts_with("x_lag")) |>
# Make the data set a `tsibble`
as_tsibble(index = inddd)
# Training set
sim_train <- sim_data[1:1000, ]
# Test set
sim_test <- sim_data[1001:1010, ]
# Index variables
index.vars <- colnames(sim_data)[3:8]
# Model fitting
pprModel <- model_ppr(data = sim_train,
yvar = "y",
index.vars = index.vars)
predict(object = pprModel, newdata = sim_test)
Obtaining forecasts on a test set from a fitted smimodel
Description
Gives forecasts on a test set.
Usage
## S3 method for class 'smimodel'
predict(
object,
newdata,
exclude.trunc = NULL,
recursive = FALSE,
recursive_colRange = NULL,
...
)
Arguments
object |
A |
newdata |
The set of new data on for which the forecasts are required
(i.e. test set; should be a |
exclude.trunc |
The names of the predictor variables that should not be
truncated for stable predictions as a character string. (Since the
nonlinear functions are estimated using splines, extrapolation is not
desirable. Hence, if any predictor variable in |
recursive |
Whether to obtain recursive forecasts or not (default -
|
recursive_colRange |
If |
... |
Other arguments not currently used. |
Value
A tsibble with forecasts on test set.
Examples
if(requireNamespace("gurobi", quietly = TRUE)){
library(dplyr)
library(ROI)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n = 1015
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
mutate(
# Add x_lags
x_lag = lag_matrix(x_lag_000, 5)) |>
unpack(x_lag, names_sep = "_") |>
mutate(
# Response variable
y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
# Add an index to the data set
inddd = seq(1, n)) |>
drop_na() |>
select(inddd, y, starts_with("x_lag")) |>
# Make the data set a `tsibble`
as_tsibble(index = inddd)
# Training set
sim_train <- sim_data[1:1000, ]
# Test set
sim_test <- sim_data[1001:1010, ]
# Index variables
index.vars <- colnames(sim_data)[3:8]
# Model fitting
smimodel_ppr <- model_smimodel(data = sim_train,
yvar = "y",
index.vars = index.vars,
initialise = "ppr")
predict(object = smimodel_ppr, newdata = sim_test)
}
Obtaining forecasts on a test set from a smimodelFit
Description
Gives forecasts on a test set.
Usage
## S3 method for class 'smimodelFit'
predict(
object,
newdata,
exclude.trunc = NULL,
recursive = FALSE,
recursive_colRange = NULL,
...
)
Arguments
object |
A |
newdata |
The set of new data on for which the forecasts are required
(i.e. test set; should be a |
exclude.trunc |
The names of the predictor variables that should not be
truncated for stable predictions as a character string. (Since the
nonlinear functions are estimated using splines, extrapolation is not
desirable. Hence, if any predictor variable in |
recursive |
Whether to obtain recursive forecasts or not (default -
|
recursive_colRange |
If |
... |
Other arguments not currently used. |
Value
A tibble with forecasts on test set.
Examples
if(requireNamespace("gurobi", quietly = TRUE)){
library(dplyr)
library(ROI)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n = 1015
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
mutate(
# Add x_lags
x_lag = lag_matrix(x_lag_000, 5)) |>
unpack(x_lag, names_sep = "_") |>
mutate(
# Response variable
y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
# Add an index to the data set
inddd = seq(1, n)) |>
drop_na() |>
select(inddd, y, starts_with("x_lag")) |>
# Make the data set a `tsibble`
as_tsibble(index = inddd)
# Training set
sim_train <- sim_data[1:1000, ]
# Test set
sim_test <- sim_data[1001:1010, ]
# Index variables
index.vars <- colnames(sim_data)[3:8]
# Model fitting
smimodel_ppr <- model_smimodel(data = sim_train,
yvar = "y",
index.vars = index.vars,
initialise = "ppr")
predict(object = smimodel_ppr$fit[[1]]$best, newdata = sim_test)
}
Obtaining recursive forecasts on a test set from a fitted mgcv::gam
Description
Gives recursive forecasts on a test set.
Usage
predict_gam(
object,
newdata,
exclude.trunc = NULL,
recursive = FALSE,
recursive_colRange = NULL,
...
)
Arguments
object |
A |
newdata |
The set of new data on for which the forecasts are required
(i.e. test set; should be a |
exclude.trunc |
The names of the predictor variables that should not be
truncated for stable predictions as a character string. (Since the
nonlinear functions are estimated using splines, extrapolation is not
desirable. Hence, if any predictor variable in |
recursive |
Whether to obtain recursive forecasts or not (default -
|
recursive_colRange |
If |
... |
Other arguments not currently used. |
Value
A tibble with forecasts on test set.
Prepare a data set for recursive forecasting
Description
Prepare a test data for recursive forecasting by appropriately removing existing (actual) values from a specified range of columns (lagged response columns) of the data set. Handles seasonal data with gaps.
Usage
prep_newdata(newdata, recursive_colRange)
Arguments
newdata |
Data set to be ared. Should be a |
recursive_colRange |
The range of column numbers (lagged response
columns) in |
Value
A tibble.
Printing a backward object
Description
The default print method for a backward object.
Usage
## S3 method for class 'backward'
print(x, ...)
Arguments
x |
A model object of class |
... |
Other arguments not currently used. |
Value
No return value; called for side effects. Prints a summary of the fitted model(s) to console.
Printing a gaimFit object
Description
The default print method for a gaimFit object.
Usage
## S3 method for class 'gaimFit'
print(x, ...)
Arguments
x |
A model object of class |
... |
Other arguments not currently used. |
Value
No return value; called for side effects. Prints a summary of the fitted model(s) to console.
Printing a pprFit object
Description
The default print method for a pprFit object.
Usage
## S3 method for class 'pprFit'
print(x, ...)
Arguments
x |
A model object of class |
... |
Other arguments not currently used. |
Value
No return value; called for side effects. Prints a summary of the fitted model(s) to console.
Printing a smimodel object
Description
The default print method for a smimodel object.
Usage
## S3 method for class 'smimodel'
print(x, ...)
Arguments
x |
An object of class |
... |
Other arguments not currently used. |
Value
No return value; called for side effects. Prints a summary of the fitted model(s) to console.
Printing a smimodelFit object
Description
The default print method for a smimodelFit object.
Usage
## S3 method for class 'smimodelFit'
print(x, ...)
Arguments
x |
An object of class |
... |
Other arguments not currently used. |
Value
No return value; called for side effects. Prints a summary of the fitted model to console.
Randomly sampling a block
Description
Samples a block of specified size from a given series starting form a random point in the series.
Usage
randomBlock(series, block.size)
Arguments
series |
A series from which a block should be sampled. |
block.size |
Size of the block to be sampled. |
Value
A numeric vector.
Objects exported from other packages
Description
These objects are imported from other packages. Follow the links below to see their documentation.
Remove actual values from a data set for recursive forecasting
Description
Appropriately removes existing (actual) values from the specified column range (lagged response columns) of a given data set (typically a test set for which recursive forecasting is required).
Usage
remove_lags(data, recursive_colRange)
Arguments
data |
Data set (a |
recursive_colRange |
The range of column numbers in |
Value
A tibble.
Generate multiple single season block bootstrap series
Description
Generates multiple replications of single season block bootstrap series.
Usage
residBootstrap(x, season.period = 1, m = 1, num.bootstrap = 1000)
Arguments
x |
A series of residuals from which bootstrap series to be generated. |
season.period |
Length of the seasonal period. |
m |
Multiplier. (Block size = |
num.bootstrap |
Number of bootstrap series to be generated. |
Value
A matrix of bootstrapped series.
Extract residuals from a fitted smimodel
Description
Generates residuals from a fitted smimodel object.
Usage
## S3 method for class 'smimodel'
residuals(object, ...)
Arguments
object |
A |
... |
Other arguments not currently used. |
Value
A numeric vector of residuals.
Examples
if(requireNamespace("gurobi", quietly = TRUE)){
library(dplyr)
library(ROI)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n = 1005
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
mutate(
# Add x_lags
x_lag = lag_matrix(x_lag_000, 5)) |>
unpack(x_lag, names_sep = "_") |>
mutate(
# Response variable
y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
# Add an index to the data set
inddd = seq(1, n)) |>
drop_na() |>
select(inddd, y, starts_with("x_lag")) |>
# Make the data set a `tsibble`
as_tsibble(index = inddd)
# Index variables
index.vars <- colnames(sim_data)[3:8]
# Model fitting
smimodel_ppr <- model_smimodel(data = sim_data,
yvar = "y",
index.vars = index.vars,
initialise = "ppr")
residuals(smimodel_ppr)
}
Scale data
Description
Scales the columns of the data corresponding to index.vars.
Usage
scaling(data, index.vars)
Arguments
data |
Training data set on which models will be trained. Should be a
|
index.vars |
A character vector of names of the predictor variables for which indices should be estimated. |
Value
A list containing the following components:
scaled_data |
The
scaled data set of class |
scaled_info |
A named
|
Single season block bootstrap
Description
Generates a single replication of single season block bootstrap series.
Usage
seasonBootstrap(x, season.period = 1, m = 1)
Arguments
x |
A series of residuals from which bootstrap series to be generated. |
season.period |
Length of the seasonal period. |
m |
Multiplier. (Block size = |
Value
A numeric vector.
SMI model estimation
Description
Fits a single nonparametric multiple index model to the data. This is a
helper function designed to be called from user-facing wrapper functions,
model_smimodel and greedy_smimodel.
Usage
smimodel.fit(
data,
yvar,
neighbour = 0,
family = gaussian(),
index.vars,
initialise = c("ppr", "additive", "linear", "multiple", "userInput"),
num_ind = 5,
num_models = 5,
seed = 123,
index.ind = NULL,
index.coefs = NULL,
s.vars = NULL,
linear.vars = NULL,
lambda0 = 1,
lambda2 = 1,
M = 10,
max.iter = 50,
tol = 0.001,
tolCoefs = 0.001,
TimeLimit = Inf,
MIPGap = 1e-04,
NonConvex = -1,
verbose = list(solver = FALSE, progress = FALSE)
)
Arguments
data |
Training data set on which models will be trained. Must be a data
set of class |
yvar |
Name of the response variable as a character string. |
neighbour |
|
family |
A description of the error distribution and link function to be
used in the model (see |
index.vars |
A |
initialise |
The model structure with which the estimation process
should be initialised. The default is |
num_ind |
If |
num_models |
If |
seed |
If |
index.ind |
If |
index.coefs |
If |
s.vars |
A |
linear.vars |
A |
lambda0 |
Penalty parameter for L0 penalty. |
lambda2 |
Penalty parameter for L2 penalty. |
M |
Big-M value to be used in MIP. |
max.iter |
Maximum number of MIP iterations performed to update index coefficients for a given model. |
tol |
Tolerance for the objective function value (loss) of MIP. |
tolCoefs |
Tolerance for coefficients. |
TimeLimit |
A limit for the total time (in seconds) expended in a single MIP iteration. |
MIPGap |
Relative MIP optimality gap. |
NonConvex |
The strategy for handling non-convex quadratic objectives or non-convex quadratic constraints in Gurobi solver. |
verbose |
A named list controlling verbosity options. Defaults to
|
Value
A list with two elements:
initial |
A list of information of the
model initialisation. (For descriptions of the list elements see
|
best |
A list of information of the
final optimised model. (For descriptions of the list elements see
|
Splitting predictors into multiple indices
Description
Splits a given number of predictors into a given number of indices.
Usage
split_index(num_pred, num_ind)
Arguments
num_pred |
Number of predictors. |
num_ind |
Number of indices. |
Value
A list containing the following components:
index |
An
|
index_positions |
A list of length = |
Truncating predictors to be in the in-sample range
Description
Truncates predictors to be in the in-sample range to avoid spline extrapolation.
Usage
truncate_vars(range.object, data, cols.trunc)
Arguments
range.object |
A matrix containing range of each predictor variable. Should be a matrix with two rows for min and max, and the columns should correspond to variables. |
data |
Out-of-sample data set of which variables should be truncated. |
cols.trunc |
Column names of the variables to be truncated. |
SMI model with a given penalty parameter combination
Description
Fits a nonparametric multiple index model to the data for a given combination
of the penalty parameters (lambda0, lambda2), and returns the validation set
mean squared error (MSE). (Used within greedy.fit; users are
not expected to use this function directly.)
Usage
tune_smimodel(
data,
val.data,
yvar,
neighbour = 0,
family = gaussian(),
index.vars,
initialise = c("ppr", "additive", "linear", "multiple", "userInput"),
num_ind = 5,
num_models = 5,
seed = 123,
index.ind = NULL,
index.coefs = NULL,
s.vars = NULL,
linear.vars = NULL,
lambda.comb = c(1, 1),
M = 10,
max.iter = 50,
tol = 0.001,
tolCoefs = 0.001,
TimeLimit = Inf,
MIPGap = 1e-04,
NonConvex = -1,
verbose = list(solver = FALSE, progress = FALSE),
exclude.trunc = NULL,
recursive = FALSE,
recursive_colRange = NULL
)
Arguments
data |
Training data set on which models will be trained. Must be a data
set of class |
val.data |
Validation data set. (The data set on which the penalty
parameter selection will be performed.) Must be a data set of class
|
yvar |
Name of the response variable as a character string. |
neighbour |
|
family |
A description of the error distribution and link function to be
used in the model (see |
index.vars |
A |
initialise |
The model structure with which the estimation process
should be initialised. The default is |
num_ind |
If |
num_models |
If |
seed |
If |
index.ind |
If |
index.coefs |
If |
s.vars |
A |
linear.vars |
A |
lambda.comb |
A |
M |
Big-M value used in MIP. |
max.iter |
Maximum number of MIP iterations performed to update index coefficients for a given model. |
tol |
Tolerance for the objective function value (loss) of MIP. |
tolCoefs |
Tolerance for coefficients. |
TimeLimit |
A limit for the total time (in seconds) expended in a single MIP iteration. |
MIPGap |
Relative MIP optimality gap. |
NonConvex |
The strategy for handling non-convex quadratic objectives or non-convex quadratic constraints in Gurobi solver. |
verbose |
A named list controlling verbosity options. Defaults to
|
exclude.trunc |
The names of the predictor variables that should not be
truncated for stable predictions as a character string. (Since the
nonlinear functions are estimated using splines, extrapolation is not
desirable. Hence, if any predictor variable in |
recursive |
Whether to obtain recursive forecasts or not (default -
|
recursive_colRange |
If |
Value
A numeric.
Unscale a fitted smimodel
Description
Transforms back the index coefficients to suit original-scale index variables
if the same were scaled when estimating the smimodel (happens in
initialise = "ppr" in model_smimodel or
greedy_smimodel). Users are not expected to directly use this
function; usually called within smimodel.fit.
Usage
unscaling(object, scaledInfo)
Arguments
object |
A |
scaledInfo |
The list returned from a call of the function
|
Value
A smimodel object.
Updating index coefficients using MIP
Description
Updates index coefficients by solving a mixed integer program.
Usage
update_alpha(
Y,
X,
num_pred,
num_ind,
index.ind,
dgz,
alpha_old,
lambda0 = 1,
lambda2 = 1,
M = 10,
TimeLimit = Inf,
MIPGap = 1e-04,
NonConvex = -1,
verbose = FALSE
)
Arguments
Y |
Column matrix of response. |
X |
Matrix of predictors (size adjusted to number of indices). |
num_pred |
Number of predictors. |
num_ind |
Number of indices. |
index.ind |
An integer vector that assigns group index for each predictor. |
dgz |
The |
alpha_old |
Vector of index coefficients from previous iteration. |
lambda0 |
Penalty parameter for L0 penalty. |
lambda2 |
Penalty parameter for L2 penalty. |
M |
Big-M value to be used in MIP. |
TimeLimit |
A limit for the total time (in seconds) expended in a single MIP iteration. |
MIPGap |
Relative MIP optimality gap. |
NonConvex |
The strategy for handling non-convex quadratic objectives or non-convex quadratic constraints in Gurobi solver. |
verbose |
The option to print detailed solver output. |
Value
A vector of normalised index coefficients.
Updating a smimodelFit
Description
Optimises and updates a given smimodelFit.
Usage
update_smimodelFit(
object,
data,
lambda0 = 1,
lambda2 = 1,
M = 10,
max.iter = 50,
tol = 0.001,
tolCoefs = 0.001,
TimeLimit = Inf,
MIPGap = 1e-04,
NonConvex = -1,
verbose = list(solver = FALSE, progress = FALSE),
...
)
Arguments
object |
A |
data |
Training data set on which models will be trained. Must be a data
set of class |
lambda0 |
Penalty parameter for L0 penalty. |
lambda2 |
Penalty parameter for L2 penalty. |
M |
Big-M value to be used in MIP. |
max.iter |
Maximum number of MIP iterations performed to update index coefficients for a given model. |
tol |
Tolerance for the objective function value (loss) of MIP. |
tolCoefs |
Tolerance for coefficients. |
TimeLimit |
A limit for the total time (in seconds) expended in a single MIP iteration. |
MIPGap |
Relative MIP optimality gap. |
NonConvex |
The strategy for handling non-convex quadratic objectives or non-convex quadratic constraints in Gurobi solver. |
verbose |
A named list controlling verbosity options. Defaults to
|
... |
Other arguments not currently used. |
Value
A list of optimised model information. For descriptions of the list
elements see make_smimodelFit).