A More Formal Presentation of Factor Analysis

A previous post gave a simplified overview of factor analysis. In this one, I would like to present a more formal description of it.

Note: You can download a pdf version of this article using the link provided below this post. 

In factor analysis, we have a set of observed variables, X_1, X_2, …, X_p, which we believe are linear combinations of underlying unobserved factors, F_1, F_2, …, F_m, plus an error term, ɛ_i, specific to each variable. The error term accounts for all other unobserved variability in the X variables.


Mathematically, for each observed variable, we express it as:

X_i = a_i1 F_1 + a_i2 F_2 + … + a_im F_m + ɛ_i,   i = 1, 2, …, p

Where: 
X_i = Observed variable
F_j = Unobserved latent factor
a_ij = Factor loading of variable i on factor j
ɛ_i = Error term (unique factor) associated with variable i
p = Total number of observed variables
m = Number of factors

Observed Variables (X_i): The actual data you have—like scores on a test, economic indicators, etc.

Unobserved Latent Factors (F_j): Variables not directly observed but inferred from the mathematical model. These capture the underlying processes or constructs that might explain the patterns in your data.

Factor Loadings (a_ij): The coefficients that indicate how strongly each X_i is influenced by each F_j. Loadings act as weights: the larger a_ij is in absolute value, the more factor j contributes to variable i.

Error Term (ɛ_i): This represents the portion of variability in X_i that cannot be explained by the common factors.
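
To make the pieces above concrete, here is a minimal simulation sketch in Python with NumPy. The loading values, the unique variances, the sample size, and the choice to draw factors and errors as independent standard-normal variables are all assumptions made for illustration; none of them come from the model statement above. Under those assumptions, the model implies that the covariance matrix of the X variables equals A A' + Psi, where A holds the loadings and Psi is the diagonal matrix of unique variances, and the simulated data should roughly reproduce it.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p, m = 5000, 4, 2  # observations, observed variables, factors

# Hypothetical loading matrix A (p x m); row i holds a_i1, ..., a_im.
A = np.array([
    [0.8, 0.1],
    [0.7, 0.2],
    [0.1, 0.9],
    [0.2, 0.6],
])

# Hypothetical unique variances, chosen so every X_i has variance 1.
psi = np.array([0.35, 0.47, 0.18, 0.60])

# Assumption: factors are uncorrelated with unit variance,
# and the error terms are uncorrelated with the factors and each other.
F = rng.standard_normal((n, m))
eps = rng.standard_normal((n, p)) * np.sqrt(psi)

# X_i = a_i1 F_1 + ... + a_im F_m + eps_i, for all observations at once.
X = F @ A.T + eps

implied_cov = A @ A.T + np.diag(psi)   # covariance implied by the model
sample_cov = np.cov(X, rowvar=False)   # covariance estimated from the data

# Largest absolute difference between the two matrices.
max_gap = np.abs(sample_cov - implied_cov).max()
print(round(float(max_gap), 3))
```

In this setup, the diagonal of A A' is the part of each variable's variance explained by the common factors, and psi is the part left to the error term.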

Factor Extraction and Retention:
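Factor extraction typically begins with calculating the covariance or correlation matrix of the observed variables. The eigenvalues and eigenvectors of this matrix are then computed: the eigenvectors determine the factors, and each eigenvalue measures how much variance the corresponding factor explains. The number of factors to retain can be chosen using criteria such as Kaiser’s criterion (keep factors with an eigenvalue greater than 1, which presumes the correlation matrix is used) or the scree plot (keep the factors before the point where the plotted eigenvalues level off).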
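
The extraction step described above can be sketched in a few lines of Python with NumPy. The correlation matrix R below is made up for illustration, and the last step (scaling eigenvectors by the square root of their eigenvalues) is the principal component way of getting unrotated loadings, which is only one of several extraction methods.

```python
import numpy as np

# Hypothetical correlation matrix of four observed variables.
R = np.array([
    [1.00, 0.60, 0.15, 0.10],
    [0.60, 1.00, 0.20, 0.15],
    [0.15, 0.20, 1.00, 0.55],
    [0.10, 0.15, 0.55, 1.00],
])

# eigh is meant for symmetric matrices; it returns eigenvalues in
# ascending order, so re-sort them from largest to smallest.
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Share of total variance attached to each eigenvalue
# (the eigenvalues of a correlation matrix sum to p).
explained = eigenvalues / eigenvalues.sum()

# Kaiser's criterion: retain factors whose eigenvalue exceeds 1.
n_retained = int((eigenvalues > 1.0).sum())

# Unrotated loadings of the retained factors (principal component method).
loadings = eigenvectors[:, :n_retained] * np.sqrt(eigenvalues[:n_retained])

print(n_retained)
print(np.round(explained, 3))
```

Plotting the sorted eigenvalues against their rank would give the scree plot mentioned above.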

Example in Econometrics:
Imagine an economist is looking at variables like GDP growth, unemployment rate, inflation rate, and interest rates (X_1, X_2, X_3, X_4) and hypothesizes that they are influenced by underlying, unobserved factors like economic stability and monetary policy (F_1, F_2).

GDP Growth = a_11 F_1 + a_12 F_2 + ɛ_1
Unemployment Rate = a_21 F_1 + a_22 F_2 + ɛ_2
Inflation Rate = a_31 F_1 + a_32 F_2 + ɛ_3
Interest Rate = a_41 F_1 + a_42 F_2 + ɛ_4

The task is then to use the observed data to estimate the factor loadings (a_ij) and deduce the nature of the latent factors. This often involves rotation methods to make the solution more interpretable.
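
As a rough illustration of what a rotation does, here is a small NumPy sketch of varimax, one common orthogonal rotation. The text above does not say which rotation to use, so varimax is my choice for the example, and the unrotated loadings are invented numbers rather than estimates from real economic data. The function name varimax and its parameters are likewise just illustrative.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation using the usual SVD-based iteration."""
    p, m = loadings.shape
    rotation = np.eye(m)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Sum of squared loadings per factor (one value per column).
        col_ss = (rotated ** 2).sum(axis=0)
        target = rotated ** 3 - rotated * col_ss / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt  # closest orthogonal matrix to the update
        new_criterion = s.sum()
        # Stop once the criterion no longer improves noticeably.
        if criterion != 0.0 and new_criterion / criterion < 1.0 + tol:
            break
        criterion = new_criterion
    return loadings @ rotation, rotation

# Hypothetical unrotated loadings: 4 variables on 2 factors.
unrotated = np.array([
    [0.6,  0.5],
    [0.6,  0.4],
    [0.6, -0.5],
    [0.5, -0.5],
])

rotated, T = varimax(unrotated)

# An orthogonal rotation leaves each variable's communality unchanged.
communalities_before = (unrotated ** 2).sum(axis=1)
communalities_after = (rotated ** 2).sum(axis=1)

print(np.round(rotated, 2))
```

The rotation changes how the explained variance is shared among the factors but not how much of each variable is explained in total, which is why the rotated solution fits the data exactly as well as the unrotated one.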

Factor analysis gives economists, and others, a way to explore the dimensional structure of observed variables and to gain insight into unseen influences in their data. While the math can be complex, the core idea remains simple: identify and interpret the underlying latent factors.

If you are keen to read more on factor analysis, please check out these books: 
Factor Analysis: Statistical Methods and Practical Issues by Jae-On Kim and Charles W. Mueller. This book provides a comprehensive overview of factor analysis and is widely respected in the field.

Applied Multivariate Statistical Analysis by Richard A. Johnson and Dean W. Wichern. This book covers a variety of multivariate techniques, including factor analysis, and is quite accessible for various skill levels.

A Handbook of Statistical Analyses using SPSS by Sabine Landau and Brian S. Everitt. Though software-specific, this handbook presents the application of numerous statistical methods, including factor analysis, using practical examples.

Sorry for the rough presentation of the variables and equations. Blogger does not support mathematical notation the way Microsoft Word or LaTeX does. If you would like to read the equations and variables properly typeset, please download the PDF version of this article using the following link: A More Formal Presentation of Factor Analysis
