These functions are primarily useful for writing methods for the cv() generic function. They are used internally in the package and can also be used for extensions (see the vignette "Extending the cv package", vignette("cv-extend", package="cv")).
Usage
cvCompute(
  model,
  data = insight::get_data(model),
  criterion = mse,
  criterion.name,
  k = 10L,
  reps = 1L,
  seed,
  details = k <= 10L,
  confint,
  level = 0.95,
  method = NULL,
  ncores = 1L,
  type = "response",
  start = FALSE,
  f,
  fPara = f,
  locals = list(),
  model.function = NULL,
  model.function.name = NULL,
  ...
)
cvMixed(
  model,
  package,
  data = insight::get_data(model),
  criterion = mse,
  criterion.name,
  k,
  reps = 1L,
  confint,
  level = 0.95,
  seed,
  details,
  ncores = 1L,
  clusterVariables,
  predict.clusters.args = list(object = model, newdata = data),
  predict.cases.args = list(object = model, newdata = data),
  fixed.effects,
  ...
)
cvSelect(
  procedure,
  data,
  criterion = mse,
  criterion.name,
  model,
  y.expression,
  k = 10L,
  confint = n >= 400,
  level = 0.95,
  reps = 1L,
  save.coef,
  details = k <= 10L,
  save.model = FALSE,
  seed,
  ncores = 1L,
  ...
)
folds(n, k)
fold(folds, i, ...)
# S3 method for class 'folds'
fold(folds, i, ...)
# S3 method for class 'folds'
print(x, ...)
GetResponse(model, ...)
# Default S3 method
GetResponse(model, ...)
# S3 method for class 'merMod'
GetResponse(model, ...)
# S3 method for class 'lme'
GetResponse(model, ...)
# S3 method for class 'glmmTMB'
GetResponse(model, ...)
# S3 method for class 'modList'
GetResponse(model, ...)
Arguments
- model
a regression model object.
- data
data frame to which the model was fit (not usually necessary, except for cvSelect()).
- criterion
cross-validation criterion ("cost" or lack-of-fit) function of form f(y, yhat) where y is the observed values of the response and yhat the predicted values; the default is mse (the mean-squared error).
- criterion.name
a character string giving the name of the CV criterion function in the returned "cv" object.
- k
perform k-fold cross-validation (default is 10); k may be a number or "loo" or "n" for n-fold (leave-one-out) cross-validation; for folds(), k must be a number.
- reps
number of times to replicate k-fold CV (default is 1).
- seed
for R's random number generator; optional, if not supplied a random seed will be selected and saved; not needed for n-fold cross-validation.
- details
if TRUE (the default if the number of folds k <= 10), save detailed information about the value of the CV criterion for the cases in each fold and the regression coefficients with that fold deleted.
- confint
if TRUE (the default if the number of cases is 400 or greater), compute a confidence interval for the bias-corrected CV criterion, if the criterion is the average of casewise components.
- level
confidence level (default 0.95).
- method
computational method to apply; used by some cv() methods.
- ncores
number of cores to use for parallel computations (default is 1, i.e., computations aren't done in parallel).
- type
used by some cv() methods, such as the default method, where type is passed to the type argument of predict(); the default is type="response", which is appropriate, e.g., for a "glm" model and may be recognized or ignored by predict() methods for other model classes.
- start
used by some cv() methods; if TRUE (the default is FALSE), the start argument, set to the vector of regression coefficients for the model fit to the full data, is passed to update(), possibly making the CV updates faster, e.g. for a GLM.
- f
function to be called by cvCompute() for each fold.
- fPara
function to be called by cvCompute() for each fold using parallel computation.
- locals
a named list of objects that are required in the local environment of cvCompute() for f() or fPara().
- model.function
a regression function, typically for a new cv() method, residing in a package that's not a declared dependency of the cv package, e.g., nnet::multinom.
- model.function.name
the quoted name of the regression function, e.g., "multinom".
- ...
to match generic; passed to predict() for the default method, and to fPara() (for parallel computations) in cvCompute().
- package
the name of the package in which the mixed-modeling function (or functions) employed resides; used to get the namespace of the package.
- clusterVariables
a character vector of names of the variables defining clusters for a mixed model with nested or crossed random effects; if missing, cross-validation is performed for individual cases rather than for clusters.
- predict.clusters.args
a list of arguments to be used to predict the whole data set from a mixed model when performing CV on clusters; the first two elements should be model and newdata; see the "Extending the cv package" vignette (vignette("cv-extend", package="cv")).
- predict.cases.args
a list of arguments to be used to predict the whole data set from a mixed model when performing CV on cases; the first two elements should be model and newdata; see the "Extending the cv package" vignette (vignette("cv-extend", package="cv")).
- fixed.effects
a function to be used to compute fixed-effect coefficients for cluster-based CV when details = TRUE.
- procedure
a model-selection procedure function (see Details).
- y.expression
normally the response variable is found from the model argument; but if, for a particular selection procedure, the model argument is absent, or if the response can't be inferred from the model, the response can be specified by an expression, such as expression(log(income)), to be evaluated within the data set provided by the data argument.
- save.coef
save the coefficients from the selected models? Deprecated in favor of the details argument; if specified, details is set to the value of save.coef.
- save.model
save the model that's selected using the full data set.
- n
number of cases, for constructed folds.
- folds
an object of class "folds".
- i
a fold number for an object of class "folds".
- x
a "cv", "cvList", or "folds" object to be printed.
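As a rough illustration of the f(y, yhat) form expected for the criterion argument, the sketch below defines a mean-absolute-error function in base R and applies it to the fitted values of a linear model. The name mae_sketch is hypothetical and chosen only for this example; it is not a criterion supplied by the package, and nothing here has been checked against cv() itself.

```r
# Hypothetical criterion of the form f(y, yhat): mean absolute error.
#   y    : observed values of the response
#   yhat : predicted values
mae_sketch <- function(y, yhat) {
  mean(abs(y - yhat))
}

# Apply it to the fitted values of an ordinary linear model.
fit <- lm(mpg ~ gear, data = mtcars)
mae_sketch(mtcars$mpg, fitted(fit))
```

A function with this signature could in principle be supplied as criterion = mae_sketch; whether a confidence interval can then be computed for it is not something this sketch addresses.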
Value
The utility functions return various kinds of objects:
- cvCompute() returns an object of class "cv", with the CV criterion ("CV crit"), the bias-adjusted CV criterion ("adj CV crit"), the criterion for the model applied to the full data ("full crit"), the confidence interval and level for the bias-adjusted CV criterion ("confint"), the number of folds ("k"), and the seed for R's random-number generator ("seed"). If details=TRUE, then the returned object will also include a "details" component, which is a list of two elements: "criterion", containing the CV criterion computed for the cases in each fold; and "coefficients", regression coefficients computed for the model with each fold deleted. Some cv() methods calling cvCompute() may return a subset of these components and may add additional information. If reps > 1, then an object of class "cvList" is returned, which is literally a list of "cv" objects.
- cvMixed() also returns an object of class "cv" or "cvList".
- cvSelect() returns an object of class "cvSelect" inheriting from "cv", or an object of class "cvSelectList" inheriting from "cvList".
- folds() returns an object of class "folds", for which there are fold() and print() methods.
- GetResponse() returns the (numeric) response variable from the model.
  The supplied default method returns the model$y component of the model object, or, if model is an S4 object, the result returned by the get_response() function in the insight package. If this result is NULL, the result of model.response(model.frame(model)) is returned, checking in any case whether the result is a numeric vector.
  There are also "lme", "merMod" and "glmmTMB" methods that convert factor responses to numeric 0/1 responses, as would be appropriate for a generalized linear mixed model with a binary response.
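The fallback order described for the default GetResponse() method can be sketched in base R roughly as follows. The helper get_response_sketch is hypothetical and is not the package's code: it covers only S3 model objects, omits the S4 branch that goes through the insight package, and signalling an error for a non-numeric response is an assumption about what the numeric check does.

```r
# Hypothetical helper following the fallback order described above
# (S3 models only; the insight::get_response() branch is omitted).
get_response_sketch <- function(model) {
  y <- model$y  # stored by glm() by default; NULL for lm() unless y = TRUE
  if (is.null(y)) {
    # Fall back to the response column of the model frame.
    y <- model.response(model.frame(model))
  }
  if (!is.numeric(y)) {
    # Assumption: a non-numeric response is treated as an error.
    stop("response is not a numeric vector")
  }
  y
}

fit_lm  <- lm(mpg ~ gear, data = mtcars)
fit_glm <- glm(am ~ wt, data = mtcars, family = binomial)

head(get_response_sketch(fit_lm))   # taken from the model frame
head(get_response_sketch(fit_glm))  # taken from the $y component
```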
Functions
- cvCompute(): used internally by cv() methods (not for direct use); exported to support new cv() methods.
- cvMixed(): used internally by cv() methods for mixed-effect models (not for direct use); exported to support new cv() methods.
- cvSelect(): used internally by cv() methods for cross-validating a model-selection procedure; may also be called directly for this purpose, but use via cv() is preferred. cvSelect() is exported primarily to support new model-selection procedures.
- folds(): used internally by cv() methods (not for direct use).
- fold(): to extract a fold from a "folds" object.
- fold(folds): fold() method for "folds" objects.
- print(folds): print() method for "folds" objects.
- GetResponse(): function to return the response variable from a regression model.
- GetResponse(default): default method.
- GetResponse(merMod): "merMod" method.
- GetResponse(lme): "lme" method.
- GetResponse(glmmTMB): "glmmTMB" method.
- GetResponse(modList): "modList" method.
Examples
fit <- lm(mpg ~ gear, mtcars)
GetResponse(fit)
#>           Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive
#>                21.0                21.0                22.8                21.4
#>   Hornet Sportabout             Valiant          Duster 360           Merc 240D
#>                18.7                18.1                14.3                24.4
#>            Merc 230            Merc 280           Merc 280C          Merc 450SE
#>                22.8                19.2                17.8                16.4
#>          Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental
#>                17.3                15.2                10.4                10.4
#>   Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla
#>                14.7                32.4                30.4                33.9
#>       Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28
#>                21.5                15.5                15.2                13.3
#>    Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa
#>                19.2                27.3                26.0                30.4
#>      Ford Pantera L        Ferrari Dino       Maserati Bora          Volvo 142E
#>                15.8                19.7                15.0                21.4
set.seed(123)
(ffs <- folds(n=22, k=5))
#> 5 folds of approximately 4 cases each
#> fold 1: 15 19 14 3 10
#> fold 2: 11 5 4 20 6
#> fold 3: 9 18 16 21
#> fold 4: 12 1 22 7
#> fold 5: 17 13 8 2
fold(ffs, 2)
#> [1] 11 5 4 20 6
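To show what fold indices like those above are used for, here is a minimal base-R sketch of k-fold cross-validation done by hand. It does not call folds() or fold(): the list fold_index is a hypothetical stand-in built with sample() and split(), and the way cases are dealt into folds is an assumption, not the package's algorithm. The resulting number is therefore not claimed to match what cv() would report.

```r
set.seed(123)
n <- nrow(mtcars)
k <- 5

# Hypothetical stand-in for a "folds" object: a list of k index vectors,
# obtained by randomly permuting 1:n and dealing the cases into k groups.
fold_index <- split(sample(n), rep(seq_len(k), length.out = n))

# For each fold: refit the model without the fold, predict the held-out
# cases, and store their squared prediction errors.
sq_err <- numeric(n)
for (i in seq_len(k)) {
  held_out <- fold_index[[i]]
  fit_i <- lm(mpg ~ gear, data = mtcars[-held_out, ])
  pred_i <- predict(fit_i, newdata = mtcars[held_out, ])
  sq_err[held_out] <- (mtcars$mpg[held_out] - pred_i)^2
}

# Average squared error over all held-out predictions.
cv_mse <- mean(sq_err)
cv_mse
```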