Introduction

One of the most subjective decisions made when using ordination techniques is the number of components to retain for interpretation. Generally, the researcher decides subjectively, but more objective methods are described below. A literature review of the use of factor analysis and principal components analysis (Franklin et al. 1995, J. Veg. Sci. 6:99-106) indicates that more objective methods are necessary. The methods below use parallel analysis (PA), which compares the eigenvalues of the real data with those expected from random data of the same dimensions. The PA must match the number of samples (n) and the number of variables (p) in your real data set (i.e. the data collected from the field). Further, the PA must match the type of analysis (e.g. factor analysis or principal components analysis) and must correspond to the type of matrix being decomposed (i.e. correlation or covariance). By setting criteria for the interpretation of ordination results, a researcher can objectively reduce the number of axes and the number of variables of interest, thus facilitating interpretation.

Decomposing a correlation matrix: The program LONGMAN.SAS given below is the easiest way to determine the number of components to retain when decomposing a correlation matrix. Simply change the N and P values (on the first line after the explanation text only) to match your field data set and run the program. The values it returns are the 95th percentile eigenvalues for each component, based on the regression equations of Longman et al. 1989 (Multi. Behav. Res. 24:59-69), which were derived from a Monte Carlo analysis. Eigenvalues from the analysis of the field data that are greater than the PA eigenvalues are significant at the 0.05 level and should be retained for interpretation. Compare each axis separately: the PA eigenvalue for axis one against the real eigenvalue for axis one, the PA eigenvalue for axis two against the real eigenvalue for axis two, and so on. The analysis should then be rerun using the appropriate number of components.
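The axis-by-axis comparison described above can be sketched in a few lines of Python. This is only an illustration: the helper name `retained_axes` and the eigenvalues in the example are hypothetical. The text does not say whether to stop at the first axis that fails, so the sketch simply tests every axis independently, as described.

```python
def retained_axes(observed, pa_thresholds):
    """Hypothetical helper: return the 1-based axis numbers whose observed
    eigenvalue exceeds the PA eigenvalue for the same axis."""
    return [axis
            for axis, (obs, pa) in enumerate(zip(observed, pa_thresholds), start=1)
            if obs > pa]


# Hypothetical eigenvalues, for illustration only (not from a real data set).
observed = [4.10, 2.05, 1.30, 0.90]   # from the analysis of the field data
pa_95th = [2.34, 1.95, 1.69, 1.46]    # 95th percentile PA eigenvalues
print(retained_axes(observed, pa_95th))  # [1, 2]
```

Here axes one and two exceed their PA eigenvalues, so two components would be retained and the analysis rerun with two components.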
Decomposing a covariance matrix: The program COVRPARA.SAS given below produces pseudorandom data sets based on the means and standard deviations of the real data, so each variable contributes its own distribution and variance to the pseudorandom matrix. Insert the means and standard deviations where appropriate, and add or remove variables as necessary. The program repeats the Monte Carlo analysis K times (K = 99 or 999 is recommended). Change X1-X?? to match the number of variables, and the range of J to match the number of samples. Make sure the analysis used on the collected data (e.g. factor analysis with rotations) is the same as the one run on the randomized data sets created by this program. The results are the maximum eigenvalues across all K analyses; only eigenvalues greater than these should be retained for interpretation. The analysis should then be rerun using the appropriate number of components.

Determining significant loadings: When decomposing a correlation matrix, use the program LOADINGS.SAS. When decomposing a covariance matrix, use the program COVRPARA.SAS (see above). Each program creates random data sets with the same n and p as the collected data set, and these should be subjected to the same analysis (e.g. factor analysis with rotations). This step follows the determination of the number of components to retain, and this number (?) should be set in the PROC FACTOR statement as N=?. Set the number of samples (yyy), variables (www), and randomizations (zzz) everywhere they appear, and run the program. The results give the maximum loadings from all the data sets analyzed. To determine the 95th percentile, multiply the number of components by the number of variables (3 * 16 = 48), then multiply that product by 0.05 (48 * 0.05 = 2.4, rounded up to 3).
Thus, the third highest of the maximum loadings printed gives the 95th percentile cutoff, and only loadings greater than this value should be interpreted. This program is written for a factor analysis; for any other type of analysis, the PROC UNIVARIATE steps must be modified so that they summarize the appropriate variables.
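The cutoff arithmetic above can be written as a small Python sketch. The function names are hypothetical, and rounding the product *up* (a ceiling) is an assumption inferred from the text's example, where 2.4 becomes 3. The list of maxima is invented for illustration; in practice it would be the per-variable maximum loadings printed by the loadings program.

```python
def cutoff_rank(n_components, n_variables, percent=5):
    """Hypothetical helper: rank (1 = highest) of the maximum loading used as
    the cutoff. Assumes the product is rounded UP, i.e.
    ceil(n_components * n_variables * percent / 100), computed with integer
    arithmetic so 3 components x 16 variables gives ceil(2.4) = 3."""
    return -(-n_components * n_variables * percent // 100)


def loading_cutoff(max_loadings, n_components, percent=5):
    """Hypothetical helper: return the maximum loading at the cutoff rank."""
    k = cutoff_rank(n_components, len(max_loadings), percent)
    if k > len(max_loadings):
        raise ValueError("cutoff rank exceeds the number of maxima")
    return sorted(max_loadings, reverse=True)[k - 1]


# Hypothetical maximum loadings for 16 variables (illustration only).
maxima = [0.41, 0.38, 0.44, 0.36, 0.39, 0.42, 0.35, 0.37,
          0.40, 0.43, 0.34, 0.38, 0.36, 0.39, 0.41, 0.37]
print(cutoff_rank(3, 16))         # 3
print(loading_cutoff(maxima, 3))  # 0.42
```

Only field-data loadings greater than the returned value would then be interpreted.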
* LONGMAN.SAS;
TITLE 'PROGRAM TO DECOMPOSE A CORRELATION MATRIX - THE LONGMAN METHOD';
OPTIONS LS=73 NOCENTER;
DATA LONGMAN;
* THIS PROGRAM PRODUCES ESTIMATES OF THE 95TH PERCENTILE EIGENVALUES
  FROM A PARALLEL ANALYSIS, USING THE WORK OF LONGMAN ET AL. (1989).
  CHANGE THE VALUES OF N AND P TO THOSE OF YOUR DATA MATRIX.
  THIS PROGRAM SHOULD ONLY BE USED WHEN DECOMPOSING A CORRELATION MATRIX;
N=36; P=13; * N = SAMPLE SIZE, P = NO. OF VARIABLES. ;
LN = LOG(N); LP = LOG(P);
LEIG1 = 0.0316*LN +0.7611*LP -0.0979*(LN*LP) -0.3138; LAM1 = EXP(LEIG1);
LEIG2 = 0.1162*LN +0.8613*LP -0.1122*(LN*LP) -0.9281; LAM2 = EXP(LEIG2);
LEIG3 = 0.1835*LN +0.9436*LP -0.1237*(LN*LP) -1.4173; LAM3 = EXP(LEIG3);
LEIG4 = 0.2578*LN +1.0636*LP -0.1388*(LN*LP) -1.9976; LAM4 = EXP(LEIG4);
LEIG5 = 0.3171*LN +1.1370*LP -0.1494*(LN*LP) -2.4200; LAM5 = EXP(LEIG5);
LEIG6 = 0.3809*LN +1.2213*LP -0.1619*(LN*LP) -2.8644; LAM6 = EXP(LEIG6);
LEIG7 = 0.4492*LN +1.3111*LP -0.1751*(LN*LP) -3.3392; LAM7 = EXP(LEIG7);
LEIG8 = 0.5309*LN +1.4265*LP -0.1925*(LN*LP) -3.8950; LAM8 = EXP(LEIG8);
LEIG9 = 0.5734*LN +1.4818*LP -0.1986*(LN*LP) -4.2420; LAM9 = EXP(LEIG9);
LEIG10= 0.6460*LN +1.5802*LP -0.2134*(LN*LP) -4.7384; LAM10= EXP(LEIG10);
PROC PRINT; VAR N P LAM1-LAM10;
RUN;
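For readers without SAS, the same regression equations can be evaluated in Python. This is a direct transcription of the LONGMAN.SAS data step, not an independent implementation of Longman et al.; the coefficients are copied from the program, and the function name `longman_pa_eigenvalues` is hypothetical. Like the SAS program, it always returns ten values (LAM1-LAM10).

```python
import math

# Coefficients (a, b, c, d) for axes 1-10, copied from LONGMAN.SAS, where
# ln(eigenvalue_i) = a*ln(N) + b*ln(P) + c*ln(N)*ln(P) + d  (natural logs).
LONGMAN_COEFFS = [
    (0.0316, 0.7611, -0.0979, -0.3138),
    (0.1162, 0.8613, -0.1122, -0.9281),
    (0.1835, 0.9436, -0.1237, -1.4173),
    (0.2578, 1.0636, -0.1388, -1.9976),
    (0.3171, 1.1370, -0.1494, -2.4200),
    (0.3809, 1.2213, -0.1619, -2.8644),
    (0.4492, 1.3111, -0.1751, -3.3392),
    (0.5309, 1.4265, -0.1925, -3.8950),
    (0.5734, 1.4818, -0.1986, -4.2420),
    (0.6460, 1.5802, -0.2134, -4.7384),
]


def longman_pa_eigenvalues(n, p):
    """Hypothetical helper: 95th percentile PA eigenvalues (LAM1-LAM10)
    for n samples and p variables, as computed by LONGMAN.SAS."""
    ln_n, ln_p = math.log(n), math.log(p)
    return [math.exp(a * ln_n + b * ln_p + c * ln_n * ln_p + d)
            for a, b, c, d in LONGMAN_COEFFS]


# Same defaults as the SAS program: N = 36 samples, P = 13 variables.
for axis, lam in enumerate(longman_pa_eigenvalues(36, 13), start=1):
    print(f"LAM{axis} = {lam:.3f}")
```

With the program's defaults (N = 36, P = 13), LAM1 works out to roughly 2.34; the values then decrease axis by axis and are compared against the field-data eigenvalues one axis at a time.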
* COVRPARA.SAS;
TITLE 'PARALLEL ANALYSIS DECOMPOSING A COVARIANCE MATRIX';
*NOTE: THIS USES MEANS AND STANDARD DEVIATIONS FROM FIELD DATA;
DATA ONE;
OPTIONS LS=73;
DO K = 1 TO 100;   * SET NUMBER OF RANDOMIZATIONS (K);
  DO J = 1 TO 30;  * GIVE NUMBER OF SAMPLES IN DATA SET;
    *** THE FOLLOWING FUNCTIONS CREATE PSEUDORANDOM DATA SETS USING
        THE MEANS AND STANDARD DEVIATIONS FROM THE REAL DATA;
    x1  = normal(0)*24  + 21.43;
    x2  = normal(0)*0.5 + 0.345;
    x3  = normal(0)*1.8 + 0.3;
    x4  = normal(0)*18  + 7.93;
    x5  = normal(0)*2.5 + 1.16;
    x6  = normal(0)*10  + 6.1;
    x7  = normal(0)*0.3 + 0.03;
    x8  = normal(0)*0.6 + 0.287;
    x9  = normal(0)*40  + 40.97;
    x10 = normal(0)*0.2 + 0.043;
    output;
  end;
end;
RUN;
DATA TWO; SET ONE;
* THE VAR LIST MUST NAME EXACTLY THE VARIABLES GENERATED ABOVE (X1-X10);
PROC FACTOR COV METHOD=PRINCIPAL N=6 OUTSTAT=RESULTS; VAR X1-X10; BY K;
RUN;
DATA FINAL; SET RESULTS; IF _TYPE_ = 'EIGENVAL';
OPTIONS LS=73;
PROC UNIVARIATE NOPRINT; VAR X1-X10; OUTPUT OUT=EIGEN MAX=MAXE1-MAXE10;
PROC PRINT;
RUN;
DATA FINAL2; SET RESULTS; IF _TYPE_ = 'PATTERN' AND _NAME_ = 'FACTOR1';
OPTIONS LS=73;
PROC UNIVARIATE NOPRINT; VAR X1-X10; OUTPUT OUT=PATTERN MAX=MAXP1-MAXP10;
PROC PRINT;
RUN;
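A rough NumPy analogue of the eigenvalue part of COVRPARA.SAS may help when checking the logic outside SAS. It is a sketch under stated assumptions: it treats PROC FACTOR COV METHOD=PRINCIPAL as a principal components decomposition of the sample covariance matrix, the function name is hypothetical, and the random draws will not reproduce SAS's NORMAL(0) stream. The means and standard deviations are the ones hard-coded in the SAS program.

```python
import numpy as np

# Means and standard deviations copied from the x1-x10 lines of COVRPARA.SAS.
MEANS = [21.43, 0.345, 0.3, 7.93, 1.16, 6.1, 0.03, 0.287, 40.97, 0.043]
SDS = [24, 0.5, 1.8, 18, 2.5, 10, 0.3, 0.6, 40, 0.2]


def covariance_pa_max_eigenvalues(means, sds, n_samples, n_sims=99, seed=None):
    """Hypothetical helper: maximum covariance-matrix eigenvalue per axis over
    n_sims random data sets. Assumes the SAS step is equivalent to principal
    components of the sample covariance matrix."""
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    sds = np.asarray(sds, dtype=float)
    eigs = np.empty((n_sims, means.size))
    for k in range(n_sims):
        # n_samples rows of independent normal variables with the given moments.
        x = rng.normal(loc=means, scale=sds, size=(n_samples, means.size))
        cov = np.cov(x, rowvar=False)            # sample covariance, n - 1 divisor
        eigs[k] = np.linalg.eigvalsh(cov)[::-1]  # eigvalsh is ascending; flip it
    return eigs.max(axis=0)                      # axis-by-axis maximum over runs


# 30 samples as in the SAS program, and K = 99 simulated data sets.
maxima = covariance_pa_max_eigenvalues(MEANS, SDS, n_samples=30, n_sims=99, seed=1)
for axis, value in enumerate(maxima, start=1):
    print(f"MAXE{axis} = {value:.4f}")
```

Because every simulated data set has its eigenvalues sorted largest first, the per-axis maxima also decrease from axis to axis; field-data eigenvalues are compared against them one axis at a time.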
* LOADINGS.SAS;
TITLE 'THIS PROGRAM WILL GENERATE SIGNIFICANT LOADINGS';
* THIS PROGRAM SHOULD ONLY BE USED WHEN DECOMPOSING A CORRELATION MATRIX;
OPTIONS LS=73;
DATA ONE;
* GENERALIZED PARALLEL FACTOR ANALYSIS PROCEDURE:
  1. NUMBER OF VARIABLES IS SET WITH THE WWW INDEX VALUE (IN THE ARRAY,
     VAR, AND OUTPUT STATEMENTS),
  2. NUMBER OF OBSERVATIONS IS SET WITH THE YYY INDEX FOR J IN THE
     INNER DO STATEMENT,
  3. NUMBER OF ANALYSES IS SET WITH THE ZZZ INDEX,
  4. LASTLY, PERFORM THE SAME FACTOR ANALYSIS ON THE SIMULATED DATA
     MATRIX THAT YOU PERFORMED ON THE ACTUAL DATA MATRIX.;
******WARNING - THIS PROGRAM GENERATES A BIG LISTING****** ;
ARRAY X (I) X1-X16;   * SET THE NUMBER OF VARIABLES (WWW);
DO K = 1 TO 50;       * SET THE NUMBER OF ANALYSES (ZZZ);
  DO J = 1 TO 133;    * SET THE SAMPLE SIZE (YYY);
    DO OVER X; X = NORMAL(0); END;
    OUTPUT;
  END;
END;
RUN;
DATA TWO; SET ONE;
* SET THE NUMBER OF FACTORS WITH THE N = PARAMETER.
  NO COV OPTION IS GIVEN, SO THE CORRELATION MATRIX IS DECOMPOSED;
PROC FACTOR METHOD=PRINCIPAL N=3 OUTSTAT=RESULTS; VAR X1-X16; BY K;
RUN;
DATA FINAL; SET RESULTS; IF _TYPE_ = 'EIGENVAL';
OPTIONS LS=73;
PROC UNIVARIATE NOPRINT; VAR X1-X16; OUTPUT OUT=EIGEN MAX=MAXE1-MAXE16;
PROC PRINT;
RUN;
DATA FINAL2; SET RESULTS; IF _TYPE_ = 'PATTERN' AND _NAME_ = 'FACTOR1';
OPTIONS LS=73;
PROC UNIVARIATE NOPRINT; VAR X1-X16; OUTPUT OUT=PATTERN MAX=MAXP1-MAXP16;
PROC PRINT;
RUN;
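The loadings simulation can be approximated in NumPy as well, under the following assumptions: the analysis is an unrotated principal components decomposition of the correlation matrix, loadings are eigenvector times the square root of its eigenvalue, and absolute loadings are used because an eigenvector's sign is arbitrary (the SAS program takes the raw maximum). The function name is hypothetical, and the sketch does not cover rotated solutions.

```python
import numpy as np


def max_factor1_loadings(n_samples, n_vars, n_sims=50, seed=None):
    """Hypothetical helper: per-variable maximum |loading| on the first
    principal component over n_sims data sets of independent standard
    normal variables. Assumes an unrotated PCA of the correlation matrix."""
    rng = np.random.default_rng(seed)
    best = np.zeros(n_vars)
    for _ in range(n_sims):
        x = rng.standard_normal((n_samples, n_vars))
        corr = np.corrcoef(x, rowvar=False)
        vals, vecs = np.linalg.eigh(corr)                   # ascending eigenvalues
        loading1 = np.abs(vecs[:, -1]) * np.sqrt(vals[-1])  # first component
        best = np.maximum(best, loading1)
    return best


# Same dimensions as LOADINGS.SAS: 133 samples (YYY), 16 variables (WWW), 50 runs (ZZZ).
maxima = max_factor1_loadings(n_samples=133, n_vars=16, n_sims=50, seed=0)
print(np.round(np.sort(maxima)[::-1], 3))
```

Sorting the maxima from highest to lowest makes it easy to read off the value at the cutoff rank described in the text (the third highest for 3 components and 16 variables).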
All contents of this and following web pages are copyrighted
©1996, Southern Illinois University at Carbondale.
Last revised: 10 Sept, 2003 by DJG.