LCA Mathematical Model

Latent class analysis relies on a contingency table created by cross-tabulating all indicators of the latent class variable. Suppose we estimate a latent class model with nc classes from a set of M categorical items. Suppose also that we include in the model a covariate, denoted X, which may be either continuous or dichotomous (coded 0 or 1). Let the vector Yi = (Yi1, …, YiM) represent individual i's responses to the M items, where the possible values of Yim are 1, …, rm (so rm = 2 for a dichotomous item). Let Li = 1, 2, …, nc be the latent class membership of individual i, and let I(y = k) be the indicator function, which equals 1 if y equals k and 0 otherwise. Let the last class be the reference class. Let Xi represent the value of the covariate for individual i; the covariate may be related to the probability, γ, of membership in each latent class, but is assumed to be otherwise unrelated to Yi. Then the contribution by individual i to the likelihood is

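\[
P(Y_i = y \mid X_i = x) = \sum_{l=1}^{n_c} \gamma_l(x) \prod_{m=1}^{M} \prod_{k=1}^{r_m} \rho_{mk|l}^{I(y_m = k)},
\]

where \gamma_l(x) = P(L_i = l \mid X_i = x) is the probability of membership in class l for an individual with covariate value x, and \rho_{mk|l} is the probability of response k to item m, conditional on membership in class l.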
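As a minimal sketch of how one individual's likelihood contribution can be evaluated, the Python below mixes over classes: each class contributes its membership probability times the product, over items, of the probability of the observed response. The function name, the nested-list layout of the item-response probabilities (called rho here), and all numeric values are hypothetical and chosen only for illustration.

```python
from math import prod

def likelihood_contribution(y, gamma, rho):
    """Likelihood contribution of one individual with response pattern y.

    y     : list of M observed responses, with y[m] in 1..r_m
    gamma : list of n_c class membership probabilities, already evaluated
            at the individual's covariate value
    rho   : rho[l][m][k - 1] is the probability of response k to item m
            for a member of class l (assumed layout)
    """
    total = 0.0
    for l, gamma_l in enumerate(gamma):
        # The indicator I(y_m = k) keeps only the probability of the
        # response that was actually observed on each item.
        observed = (rho[l][m][y_m - 1] for m, y_m in enumerate(y))
        total += gamma_l * prod(observed)
    return total

# Hypothetical two-class model with three dichotomous items
gamma = [0.4, 0.6]
rho = [
    [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]],  # class 1
    [[0.2, 0.8], [0.3, 0.7], [0.1, 0.9]],  # class 2
]
contribution = likelihood_contribution([1, 1, 2], gamma, rho)
# 0.4 * (0.9 * 0.8 * 0.3) + 0.6 * (0.2 * 0.3 * 0.9) = 0.1188, up to rounding
```

Summing this quantity over every possible response pattern gives 1, which is a quick sanity check on any set of parameter values.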

The β parameters are the coefficients in logistic regressions using the covariate X to predict latent class membership. The γ parameters can be expressed as functions of the β parameters as follows:

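\[
\gamma_l(x) = P(L_i = l \mid X_i = x)
= \frac{\exp(\beta_{0l} + \beta_{1l} x)}{\sum_{j=1}^{n_c} \exp(\beta_{0j} + \beta_{1j} x)}
= \frac{\exp(\beta_{0l} + \beta_{1l} x)}{1 + \sum_{j=1}^{n_c - 1} \exp(\beta_{0j} + \beta_{1j} x)}
\]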

for l = 1, …, nc. The last two expressions on the right are equal because the last (i.e., the nc-th) class is used as the reference class. The reference class has its βs constrained to zero, because the probabilities of membership in the other classes are compared to the probability of membership in this reference class. Setting one class's βs to zero is necessary for model identifiability, because the class membership probabilities must sum to one for each individual. The choice of reference class does not affect the fitted probability estimates for any individual or class. This model allows us to estimate the log odds that individual i falls in latent class l relative to the reference class. For example, in a two-class model where class 2 is the reference class, the log odds of membership in class 1 relative to class 2 for an individual with value x on the covariate is

\[
\log\left(\frac{\gamma_1(x)}{\gamma_2(x)}\right) = \beta_{01} + \beta_{11} x
\]
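The relationship between the β parameters and the class membership probabilities can be sketched in a few lines of Python. The example assumes a single covariate, an intercept and a slope for each non-reference class, and the last class as the reference with both coefficients fixed at zero. The function name and the coefficient values are hypothetical.

```python
from math import exp, log

def class_probabilities(x, intercepts, slopes):
    """Return the membership probabilities for all n_c classes at covariate value x.

    intercepts, slopes : coefficients for the n_c - 1 non-reference classes;
    the reference (last) class has both coefficients fixed at zero.
    """
    scores = [exp(b0 + b1 * x) for b0, b1 in zip(intercepts, slopes)]
    denom = 1.0 + sum(scores)  # the 1 is exp(0), the reference class term
    return [s / denom for s in scores] + [1.0 / denom]

# Hypothetical coefficients for a two-class model (class 2 is the reference)
intercepts, slopes = [-0.5], [0.8]
x = 1.0
gamma = class_probabilities(x, intercepts, slopes)

# The log odds of class 1 versus class 2 recover the linear predictor,
# here -0.5 + 0.8 * 1.0, up to floating-point rounding.
log_odds = log(gamma[0] / gamma[1])
```

Because the denominator is shared by every class, it cancels in any ratio of two class probabilities, which is why the log odds relative to the reference class are linear in the covariate.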

Exponentiated β coefficients on the covariate are odds ratios: each gives the factor by which the odds of membership in a class (relative to reference class nc) are multiplied for a one-unit increase in the covariate. Multiple covariates can be included simultaneously, just as in logistic regression.
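To illustrate the odds-ratio interpretation with more than one covariate, the sketch below computes the odds of membership in one class relative to the reference class before and after a one-unit increase in the first covariate, holding the second fixed. The function name, coefficients, and covariate values are hypothetical.

```python
from math import exp

def odds_vs_reference(x, intercept, slopes):
    """Odds of membership in a class relative to the reference class.

    x and slopes are equal-length lists, one entry per covariate.
    """
    linear_predictor = intercept + sum(b * xj for b, xj in zip(slopes, x))
    return exp(linear_predictor)

# Hypothetical coefficients for one non-reference class and two covariates
intercept, slopes = -0.5, [0.8, -0.3]

before = odds_vs_reference([2.0, 1.0], intercept, slopes)
after = odds_vs_reference([3.0, 1.0], intercept, slopes)  # first covariate + 1

# The ratio matches exp(slopes[0]) up to floating-point rounding,
# whatever the starting covariate values are.
odds_ratio = after / before
```

The ratio does not depend on the starting values of either covariate, which is what makes the exponentiated coefficient a single summary of the covariate's effect on the odds scale.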