% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/check_collinearity.R, R/check_concurvity.R
\name{check_collinearity}
\alias{check_collinearity}
\alias{multicollinearity}
\alias{check_collinearity.default}
\alias{check_collinearity.glmmTMB}
\alias{check_concurvity}
\title{Check for multicollinearity of model terms}
\usage{
check_collinearity(x, ...)

multicollinearity(x, ...)

\method{check_collinearity}{default}(x, ci = 0.95, verbose = TRUE, ...)

\method{check_collinearity}{glmmTMB}(x, component = "all", ci = 0.95, verbose = TRUE, ...)

check_concurvity(x, ...)
}
\arguments{
\item{x}{A model object that should at least respond to \code{vcov()} and, if
possible, also to \code{model.matrix()}. However, it should also work without
\code{model.matrix()}.}

\item{...}{Currently not used.}

\item{ci}{Confidence Interval (CI) level for VIF and tolerance values.}

\item{verbose}{Toggle off warnings or messages.}

\item{component}{For models with a zero-inflation component, multicollinearity
can be checked for the conditional model (count component, \code{component = "conditional"} or \code{component = "count"}), the zero-inflation component
(\code{component = "zero_inflated"} or \code{component = "zi"}), or both components
(\code{component = "all"}). The following model classes are currently supported:
\code{hurdle}, \code{zeroinfl}, \code{zerocount}, \code{MixMod} and \code{glmmTMB}.}
}
\value{
A data frame with the name of each model term, the (generalized) variance
inflation factor and its confidence interval, the adjusted VIF, which is
the factor by which the standard error is increased due to possible
correlation with other terms (inflation due to collinearity), and tolerance
values (including confidence intervals), where
\code{tolerance = 1/vif}.
}
\description{
\code{check_collinearity()} checks regression models for multicollinearity by
calculating the (generalized) variance inflation factor (VIF, Fox & Monette
1992). \code{multicollinearity()} is an alias for \code{check_collinearity()}.
\code{check_concurvity()} is a wrapper around \code{mgcv::concurvity()}, and can be
considered as a collinearity check for smooth terms in GAMs. Confidence
intervals for VIF and tolerance are based on Marcoulides and Raykov (2019,
Appendix B).
}
\details{
\code{check_collinearity()} calculates the generalized variance inflation factor
(Fox & Monette 1992), which also returns valid results for categorical
variables. The \emph{adjusted} VIF is calculated as \verb{VIF^(1/(2*df))} (Fox &
Monette 1992), where \code{df} is the number of coefficients associated with
the term. This is identical to the square root of the VIF for numeric
predictors, or for categorical variables with two levels.
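
As a brief illustration (a standard textbook identity, stated here as a
reading aid and not taken from the package code): for a single numeric
predictor \eqn{j}, the classical VIF follows from the \eqn{R^2_j} obtained
when regressing that predictor on all other predictors in the model:

\deqn{VIF_j = \frac{1}{1 - R^2_j}, \qquad tolerance_j = \frac{1}{VIF_j} = 1 - R^2_j}{VIF_j = 1 / (1 - R^2_j), tolerance_j = 1 / VIF_j = 1 - R^2_j}

For example, \eqn{R^2_j = 0.75} gives a VIF of 4 and a tolerance of 0.25.
The square root of the VIF is then 2, i.e. the standard error of that
coefficient is twice as large as it would be without any association with
the other predictors.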
}
\note{
The code to compute the confidence intervals for the VIF and tolerance
values was adapted from Appendix B of the Marcoulides and Raykov (2019) paper.
Thus, credit for the original algorithm goes to these authors. There is also
a \href{https://easystats.github.io/see/articles/performance.html}{\code{plot()}-method}
implemented in the \href{https://easystats.github.io/see/}{\pkg{see}-package}.
}
\section{Multicollinearity}{

Multicollinearity should not be confused with a raw strong correlation
between predictors. What matters is the association between one or more
predictor variables, \emph{conditional on the other variables in the
model}. In a nutshell, multicollinearity means that once you know the
effect of one predictor, the value of knowing the other predictor is rather
low. Thus, one of the predictors doesn't help much in terms of better
understanding the model or predicting the outcome. As a consequence, if
multicollinearity is a problem, the model seems to suggest that the
predictors in question are not reliably associated with the
outcome (low estimates, high standard errors), although these predictors
actually are strongly associated with the outcome, i.e. they might indeed
have a strong effect (\emph{McElreath 2020, chapter 6.1}).

Multicollinearity might arise when a third, unobserved variable has a causal
effect on each of the two predictors that are associated with the outcome.
In such cases, the actual relationship that matters would be the association
between the unobserved variable and the outcome.

Remember: "Pairwise correlations are not the problem. It is the conditional
associations - not correlations - that matter." (\emph{McElreath 2020, p. 169})
}

\section{Interpretation of the Variance Inflation Factor}{

The variance inflation factor is a measure of the magnitude of
multicollinearity of model terms. A VIF less than 5 indicates a low
correlation of that predictor with other predictors. A value between 5 and
10 indicates a moderate correlation, while VIF values larger than 10 are a
sign of high, intolerable correlation of model predictors (\emph{James et al.
2013}). The \emph{adjusted VIF} column in the output indicates how much larger
the standard error is due to the association with other predictors
conditional on the remaining variables in the model. Note that these
thresholds, although commonly used, are also criticized for being too high.
\emph{Zuur et al. (2010)} suggest using lower values, e.g. a VIF of 3 or larger
may already no longer be considered as "low".
}

\section{Multicollinearity and Interaction Terms}{

If interaction terms are included in a model, high VIF values are expected.
This portion of multicollinearity among the component terms of an
interaction is also called "inessential ill-conditioning", which leads to
inflated VIF values that are typically seen for models with interaction
terms \emph{(Francoeur 2013)}. Centering interaction terms can resolve this
issue \emph{(Kim and Jung 2024)}.
}

\section{Multicollinearity and Polynomial Terms}{

Polynomial transformations are considered a single term and thus VIFs are
not calculated between them.
}

\section{Concurvity for Smooth Terms in Generalized Additive Models}{

\code{check_concurvity()} is a wrapper around \code{mgcv::concurvity()}, and can be
considered as a collinearity check for smooth terms in GAMs. "Concurvity
occurs when some smooth term in a model could be approximated by one or more
of the other smooth terms in the model." (see \code{?mgcv::concurvity}).
\code{check_concurvity()} returns a column named \emph{VIF}, which is derived from
the "worst" measure. While the measures returned by \code{mgcv::concurvity()}
range between 0 and 1, the \emph{VIF} value
is \code{1 / (1 - worst)}, to make interpretation comparable to classical VIF
values, i.e. \code{1} indicates no problems, while higher values indicate
increasing lack of identifiability. The \emph{VIF proportion} column equals the
"estimate" column from \code{mgcv::concurvity()}, ranging from 0 (no problem) to
1 (total lack of identifiability).
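
As a worked illustration of this transformation (the input values are made
up for the example and do not come from a fitted model): a "worst"
concurvity of 0.8 corresponds to
\deqn{VIF = \frac{1}{1 - 0.8} = 5,}{VIF = 1 / (1 - 0.8) = 5,}
while a "worst" concurvity of 0.9 corresponds to a \emph{VIF} of 10, and a
value of 0 corresponds to a \emph{VIF} of 1.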
}

\examples{
m <- lm(mpg ~ wt + cyl + gear + disp, data = mtcars)
check_collinearity(m)
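
# A minimal sketch, not the package's internal algorithm: for a model with
# only numeric predictors, the classical VIF of one predictor can be
# reproduced by regressing that predictor on all other predictors.
# The object names r2_wt and vif_wt are illustrative only.
r2_wt <- summary(lm(wt ~ cyl + gear + disp, data = mtcars))$r.squared
vif_wt <- 1 / (1 - r2_wt)
vif_wt       # classical VIF for wt
1 / vif_wt   # tolerance for wt, i.e. 1 / VIF
sqrt(vif_wt) # factor by which the standard error of wt is inflated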

\dontshow{if (insight::check_if_installed("see", minimum_version = "0.9.1", quietly = TRUE)) withAutoprint(\{ # examplesIf}
# plot results
x <- check_collinearity(m)
plot(x)
\dontshow{\}) # examplesIf}
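
# A hedged sketch of the interaction-term case described above: VIFs are
# compared for a model with raw predictors and for the same model with
# mean-centered predictors. The names d, wt_c, disp_c, m_int and m_int_c
# are illustrative and not part of the package.
m_int <- lm(mpg ~ wt * disp, data = mtcars)
check_collinearity(m_int)

d <- mtcars
d$wt_c <- d$wt - mean(d$wt)       # center wt at its mean
d$disp_c <- d$disp - mean(d$disp) # center disp at its mean
m_int_c <- lm(mpg ~ wt_c * disp_c, data = d)
check_collinearity(m_int_c)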
}
\references{
\itemize{
\item Fox, J., & Monette, G. (1992). Generalized Collinearity Diagnostics.
Journal of the American Statistical Association, 87(417), 178–183.
\item Francoeur, R. B. (2013). Could Sequential Residual Centering Resolve
Low Sensitivity in Moderated Regression? Simulations and Cancer Symptom
Clusters. Open Journal of Statistics, 03(06), 24-44.
\item James, G., Witten, D., Hastie, T., and Tibshirani, R. (eds.). (2013). An
introduction to statistical learning: with applications in R. New York:
Springer.
\item Kim, Y., & Jung, G. (2024). Understanding linear interaction analysis with
causal graphs. British Journal of Mathematical and Statistical Psychology,
00, 1–14.
\item Marcoulides, K. M., and Raykov, T. (2019). Evaluation of Variance
Inflation Factors in Regression Models Using Latent Variable Modeling
Methods. Educational and Psychological Measurement, 79(5), 874–882.
\item McElreath, R. (2020). Statistical rethinking: A Bayesian course with
examples in R and Stan. 2nd edition. Chapman and Hall/CRC.
\item Vanhove, J. (2021). Collinearity Isn’t a Disease That Needs Curing.
Meta-Psychology, 5. \doi{10.15626/MP.2021.2548}
\item Zuur, A. F., Ieno, E. N., & Elphick, C. S. (2010). A protocol for data
exploration to avoid common statistical problems. Methods in Ecology and
Evolution, 1, 3–14.
}
}
\seealso{
\code{\link[see:plot.see_check_collinearity]{see::plot.see_check_collinearity()}} for options to customize the plot.

Other functions to check model assumptions and assess model quality: 
\code{\link{check_autocorrelation}()},
\code{\link{check_convergence}()},
\code{\link{check_heteroscedasticity}()},
\code{\link{check_homogeneity}()},
\code{\link{check_model}()},
\code{\link{check_outliers}()},
\code{\link{check_overdispersion}()},
\code{\link{check_predictions}()},
\code{\link{check_singularity}()},
\code{\link{check_zeroinflation}()}
}
\concept{functions to check model assumptions and assess model quality}
