The Causal Toolbox:

A collection of programs for testing or exploring causal relationships

 

 

 

 

Bill Shipley

Département de biologie

Université de Sherbrooke

Sherbrooke (Québec) J1K 2R1

CANADA

Bill.Shipley@USherbrooke.ca

 

 www.callisto.si.usherb.ca:8080/bshipley


This manual describes how to use a series of DOS programs that have been written to accompany the book Cause and Correlation in Biology.  A User’s Guide to Path Analysis, Structural Equations and Causal Inference[1].  If you find any errors in these programs then please contact me.  No other guarantees are given and I accept no responsibility for analyses done using these programs.  The theory and interpretation of the methods that are implemented in these programs are described in the book; the purpose of this manual is to describe how to use the programs.

This set of programs is distributed as a self-extracting zip file.

 

______________________________________________________________________

These programs are free and should be referenced to the book.

______________________________________________________________________

 

How to run a program

 

All of these programs are compiled to run in a DOS environment but can be used from within a WINDOWS environment.  Every program in this Toolbox requires a DOS extender program called DOS4GW.EXE.  DOS4GW.EXE must be in the same directory as the program that you are using.  The easiest way is to create a directory (you can call it Toolbox) and copy all of the programs, including DOS4GW.EXE, into this directory.

There are different ways to start a program.  You can simply double-click on the program from within the WINDOWS EXPLORER or you can open a DOS window (START, then PROGRAMS, then choose the MS-DOS prompt) and then type the name of the program at the DOS prompt (usually the > symbol).

All programs are limited to a maximum of 30 variables.  If raw data files are to be used, then some programs also have a limit of 500 lines; in such cases you can enter a covariance matrix based on an unlimited number of cases.


 

 

Program        Purpose                                                      Pages

DGRAPH.EXE     To test an a priori path model                               3 - 6

PCOR.EXE       To calculate partial correlations of various orders
               and associated probability levels                            7 - 8

PERMR.EXE      To test for independence between variables using
               non-parametric permutation methods                           9 - 10

RANKDATA.EXE   To convert data to ranks; output can be used in other
               programs for Spearman rank correlations                      10

MXC2.EXE       To determine probability levels of the maximum likelihood
               chi-square statistic at small sample sizes                   11

MULTICOV.EXE   To obtain covariance matrices from hierarchical data         12 - 13

EPA2.EXE       To conduct an exploratory path analysis                      14 - 16

TETRAD.EXE     To test if a set of four variables implies vanishing
               tetrad differences                                           17 - 18

IDEN.EXE       To determine if a latent variable measurement model
               is structurally identified                                   19

DSEP.EXE       To determine if variables in a directed acyclic graph
               are d-separated given conditioning sets                      20

 


DGRAPH

 

 

Reference: Chapter 3 (d-sep tests)

 

Input data set

 

The input data set must consist only of numbers.  No variable names, no formatting information and no missing data symbols (and therefore no missing data!) are allowed.  It must be a simple ASCII file but this type of file can be obtained from almost any word processor (text file) or a program like Microsoft EXCEL.  The file does not have to be formatted but there must be at least one blank space separating numbers.  We will use the data set called DGRAPH.DAT.  This data set consists of 100 lines and 4 variables.  The first four lines of this file are:

0.4669 1.1607 0.8334 0.8416

-0.6501 -0.0008 1.1535 0.3417

-0.1789 -0.8183 0.188 -0.1491

-0.3755 -2.2529 -1.6236 -0.7079

 

Start DGRAPH.EXE.  You will be asked if you want information on using the program.  Type yes if you do, otherwise no.  Type yes.  You will see that you can either enter a raw data file, such as DGRAPH.DAT, or a complete covariance matrix.  In the present case this would be a 4X4 matrix (at least one blank space separating numbers) since there are 4 variables.

Next, you are asked to give the name of the output file.  This is the file to which the full results will be written (it will also be a simple text file and can be imported into any word processor).  The basic results are also echoed to the computer screen, but the output file will hold a permanent copy of the results and contains some additional information.  You must obey the file naming rules of DOS (maximum of 8 characters beginning with a letter and, optionally, a dot followed by a three-letter extension).  If you want the file to be saved to a directory other than your current directory then you must give the directory name as well.  For instance, the following are valid output file names:

DGRAPH.OUT

Output

C:\newdir\myname.r1

Type dgraph.out; this will be our output file (be careful not to call it dgraph.dat or you will overwrite the data file!).

You can then type a title (80 characters maximum) or else simply hit ENTER.  Type THIS IS AN EXAMPLE OF DGRAPH.

You must then tell the program if you will use a raw data file (type 1) or a covariance matrix (type 2); if the latter then you will also be asked to give the number of cases associated with the covariance matrix.  In this case, type 1 since we will use DGRAPH.DAT.

You must then tell the program how many columns (variables) are in the file.  You don’t have to actually use all of them if you only want to use a subset.  In this case, type 4.

Now you have to give the file name of your input data file including directory information if your data file is not in your default directory.  Assuming that DGRAPH.DAT is in your default directory, type DGRAPH.DAT.

Now you have to tell the program what your path diagram (directed graph) looks like.  We will use the following path model: X1→X2→X3→X4.  The symbol “>” means “causes” when entering the model.  Entering “1” <ENTER>, “>” <ENTER>, “2” <ENTER> means that the variable in column 1 causes the variable in column 2; if there are n variables in the file (maximum of 30) then you can enter any number between 1 and n.  Here is how you will enter our model:

1  <ENTER>

>  <ENTER>

2 <ENTER>

2 <ENTER>

> <ENTER>

3 <ENTER>

3 <ENTER>

> <ENTER>

4 <ENTER>

-1 <ENTER> 

 

The –1 tells the program that you have finished entering the path model.

You can now exclude some variables if you don’t want to use them in the analysis; you would type in the number of variables to exclude and then the column numbers of these variables.  In our case we will use all four variables in DGRAPH.DAT, so type 0.

The program prints out your path model.  Check to see that you have properly entered it, and then type <ENTER>.

The program prints the basic results to the screen.  It then asks if you want to continue. If so, type 1 and you will then be able to either enter a new model or input a completely new data set.  We will test another (incorrect) model using the same data so type 1 and then, at the next prompt, 2.

After entering a new title, enter this model: X1→X4→X2→X3 using the instructions already explained.  You should find a chi-square value of 47.59 (6 df) and a probability less than 0.000005.

To quit, type 0 at the prompt.  The results are now stored in our output file.  Here is an example output file (my comments to the output are written in bold italic):

 

THIS IS AN EXAMPLE OF DGRAPH                                                   

 _______________

 DGRAPH PROGRAM

 BY BILL SHIPLEY

 bshipley@courrier.usherb.ca

 _______________

 Directed graph:

  1--> 2

  2--> 3

  3--> 4

 Covariance matrix based on     100 observations:

      1.119       0.527       0.165       0.194

      0.527       1.012       0.451       0.270

      0.165       0.451       1.047       0.411

      0.194       0.270       0.411       0.831

 

 

 

 

 

   X  Y | Conditioning set(Q)  The following is the basis set

 

 

 Independence statement   1  Q is the conditioning set

 X= 1 Y= 3 |Q= 2

 r(X,Y|Q)=-0.0822 probability= 0.4192

 Independence statement   2

 X= 1 Y= 4 |Q= 3

 r(X,Y|Q)= 0.1509 probability= 0.1362

 Independence statement   3

 X= 2 Y= 4 |Q= 1 3

 r(X,Y|Q)= 0.0606 probability= 0.5544

 

 

 

  Overall test of the directed graph

  assuming the data were generated by

  the specified causal process:

 Chi-square value =    6.91

   Degrees of freedom =    6

  Probability = 0.32968

 

 

 

 Here is an incorrect model                                                     

 _______________

 DGRAPH PROGRAM

 BY BILL SHIPLEY

 bshipley@courrier.usherb.ca

 _______________

 Directed graph:

  1--> 4

  2--> 3

  4--> 2

 Covariance matrix based on     100 observations:

      1.119       0.527       0.165       0.194

      0.527       1.012       0.451       0.270

      0.165       0.451       1.047       0.411

      0.194       0.270       0.411       0.831

 

 

 

 

 

   X  Y | Conditioning set(Q)

 

 

 Independence statement   1

 X= 1 Y= 2 |Q= 4

 r(X,Y|Q)= 0.4659 probability= 0.0000

 Independence statement   2

 X= 1 Y= 3 |Q= 2

 r(X,Y|Q)=-0.0822 probability= 0.4192

 Independence statement   3

 X= 3 Y= 4 |Q= 1 2

 r(X,Y|Q)= 0.3708 probability= 0.0001

 

 

 

  Overall test of the directed graph

  assuming the data were generated by

  the specified causal process:

 Chi-square value =   47.59

   Degrees of freedom =    6

  Probability = 0.00000
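The overall chi-square that DGRAPH reports combines the k independence statements of the basis set into Fisher's C = -2 Σ ln(p_i), which follows a chi-square distribution with 2k degrees of freedom (Chapter 3).  The following minimal sketch reproduces the first output above; it is an illustration, not the program's own code, and the tail probability uses the closed form available for even degrees of freedom:

```python
import math

def fisher_c(pvals):
    """Fisher's C = -2 * sum(ln p) over the k independence tests
    of the basis set; chi-square distributed with 2k df."""
    c = -2.0 * sum(math.log(p) for p in pvals)
    return c, 2 * len(pvals)

def chi2_tail(x, df):
    """P(X >= x) for a chi-square with EVEN df (closed form;
    sufficient here, since Fisher's C always has df = 2k)."""
    k = df // 2
    term = total = 1.0
    for i in range(1, k):
        term *= (x / 2.0) / i
        total += term
    return math.exp(-x / 2.0) * total

# The three null probabilities from the first DGRAPH output above:
c, df = fisher_c([0.4192, 0.1362, 0.5544])
p = chi2_tail(c, df)   # c is about 6.91 with 6 df, p about 0.33
```

Running this recovers the chi-square of 6.91 on 6 degrees of freedom and the probability of 0.330 shown in the first output file.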

 


PCOR

 

Reference: Chapter 3

 

This program calculates partial correlations of various orders and their associated probability levels.  If the data are normally distributed and linearly related then the tests are of Pearson (partial) correlations.  If the data have been ranked (see RANKDATA) then the tests are of Spearman (partial) correlations.  Input is done in the same way as in DGRAPH and you can enter either a raw data file or a covariance matrix.

We will use the raw data file called PCOR.DAT with 100 observations of 3 variables generated from the following equations:

 

X=N(0,1)

Y=0.5*X+N(0,0.75)

Z=0.5*Y+N(0,0.75)

Thus, X and Z have a population correlation coefficient of 0.25 but are independent upon conditioning on Y (i.e. the population partial correlation coefficient is zero).

After displaying the resulting covariance matrix, the program asks you to give the first variable in the correlation pair.  We will first test for an association between X and Z, so we enter 1 and then 3.  When asked for the number of conditioning variables we enter 0, since we want a zero-order correlation.  The program echoes the correlation coefficient (0.3182) and its null probability (0.00117).  Next we tell the program that we wish to continue and that we want to use the same covariance matrix (i.e. we don’t want to read in a new data file or a new covariance matrix).  Now we test for a zero partial correlation between X and Z (i.e. variables 1 and 3) conditioned on the single conditioning variable Y (i.e. variable 2).  The partial correlation, written r(1,3)/2 in the output, is 0.1457 with a null probability of 0.15044.
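The first-order partial correlation that PCOR reports can be computed from the three zero-order correlations with the standard recursion formula.  A minimal sketch (an illustration, not the program's own code):

```python
import math

def partial_corr(r_ab, r_ac, r_bc):
    """First-order partial correlation of a and b given c, computed
    from the three zero-order correlations (standard recursion)."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))

# If two variables each correlate 0.5 with a third and 0.25 with each
# other, their partial correlation given the third is exactly zero:
r = partial_corr(0.25, 0.5, 0.5)
```

Higher-order partial correlations are obtained by applying the same formula recursively, which is what allows PCOR to handle conditioning sets of any size.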

Here is the output file:

 

 PCOR program

 

 

 

 Covariance matrix, based on     100 observation

  0.871   0.312   0.238

  0.312   0.640   0.340

  0.238   0.340   0.645

 

 

 

  r( 1,  3)/

 r(X,Y)|Q=  0.3182 probability=0.00117

 

 

  r( 1,  3)/  2

 r(X,Y)|Q=  0.1457 probability=0.15044

 

 

 

 

 

 

 

PERMR

Reference: Chapter 3, pp. 79-82

 

This program tests for independence between two variables using non-parametric permutation techniques.  Either absolute or partial independence can be tested, depending on the nature of the input variables.  If the original variables are normally distributed then this is equivalent to a test of a Pearson correlation.  If you separately regress each of X and Y on Z and save the residuals of X and Y to a file, then you can use these residuals in PERMR to test a first-order partial correlation, and so on.  No distributional assumptions are needed, but the measure of association used assumes a linear relationship.

To illustrate we will use the data set PERMR.DAT which consists of 100 observations of 3 variables generated according to:

X=X2(2)

Y=X2(1)

Z=0.5X+N(0,0.75).

Here, X2(2) means a random variable generated from a chi-square distribution with 2 degrees of freedom. Thus, X and Y are independently distributed but X is correlated with Z.

After giving the names for the input and output files as usual, you then give the column numbers for the two variables whose independence you want to test.  We will first test for independence between X and Y (i.e. columns 1 and 2 in PERMR.DAT).  The measure of association is simply the sum of the cross-products of the two variables (ΣXY), which in these data is 190.24.  We now specify how many permutation iterations we wish to use; we will use 5000 runs.  The more runs you use the more precise the probability estimate, but the longer the analysis will take.  On my portable computer this takes less than 2 seconds.  You must then tell the computer whether you want a 2-tailed test or a 1-tailed test and, if the latter, whether you want the left-tail test (i.e. the probability of observing a value less than that in the data) or the right-tail test.  We will do a 2-tailed test.  The resulting null probability of independence is 0.437 with a 95% confidence interval of +/- 0.01375; thus the true probability has a 95% chance of lying between 0.42325 and 0.45075.  To narrow the confidence interval, simply increase the number of iterations.

Now we will test for an association between X and Z (i.e. columns 1 and 3 of PERMR.DAT) again using 5000 iterations and a 2-tailed test.  This time the measure of association is 578.95 and the permutation probability is less than 0.000005.
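The permutation mechanics can be sketched as follows (a minimal illustration, not PERMR's own code; the sum of cross-products serves as the measure of association and the confidence half-width follows the binomial formula, as in the output below):

```python
import math
import random

def perm_test(x, y, runs=5000, seed=1):
    """Two-tailed permutation test of independence between x and y,
    using the sum of cross-products as the measure of association.
    Returns the null probability and the 95% confidence half-width."""
    obs = sum(a * b for a, b in zip(x, y))   # observed association
    y = list(y)                              # work on a copy
    rng = random.Random(seed)
    ge = le = 0
    for _ in range(runs):
        rng.shuffle(y)
        t = sum(a * b for a, b in zip(x, y))
        if t >= obs:
            ge += 1
        if t <= obs:
            le += 1
    p = min(1.0, 2.0 * min(ge, le) / runs)   # two-tailed probability
    half = 1.96 * math.sqrt(p * (1.0 - p) / runs)
    return p, half

# Strongly associated variables should give a very small probability:
x = list(range(100))
p, half = perm_test(x, x)
```

Note that the half-width shrinks with the square root of the number of runs, which is why quadrupling the iterations only halves the confidence interval.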

Here is the output file:

                      PERMR , by Bill Shipley

 bshipley@courrier.usherb.ca

 

 

 

 PERMR.DAT data file with 3 variables                                           

 

 

 

 Based on a sample size of     100

 

 Probability is based on    5000 permutation runs

 

 Two-tailed test

 

 The probability of independence is  0.43700

 95% confidence interval of the estimate is (+/-) 0.01375

 

 PERMR - testing for association between X and Z                                

 

 

 

 Based on a sample size of     100

 

 Probability is based on    5000 permutation runs

 

 Two-tailed test

 

 The probability of independence is  0.00000

 95% confidence interval of the estimate is (+/-) 0.00000

 


 RANKDATA

 

Reference:

 

This program inputs your raw data file and outputs either the rank-transformed values or the covariance matrix of the rank-transformed values.  We will use the PERMR.DAT data set, whose first two variables follow a very right-skewed distribution.  You can have the instructions printed to the screen.

We will rank-transform variables 1 and 2 and save these ranked data to a file called RANKDATA.OUT.  Note that variables not chosen to be ranked are not saved to the output file.  We therefore tell the program that we want to rank 2 variables and enter the column numbers (1 and 2).  These ranked data can then be used in other programs – for example DGRAPH, EPA2 or PCOR – which use ranked data to calculate Spearman rank (partial) correlations.
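The rank transformation itself can be sketched as follows (a minimal illustration; assigning tied values the average of the ranks they span is an assumption about how RANKDATA handles ties):

```python
def rank_transform(values):
    """Rank a list of values starting at 1, assigning tied values
    the average of the ranks they span (assumed tie handling)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the group of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0          # average rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

# rank_transform([10, 20, 20, 30]) gives [1.0, 2.5, 2.5, 4.0]
```

Correlations computed on such ranks are Spearman rank correlations, which is why the ranked output can be fed straight into DGRAPH, EPA2 or PCOR.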


MXC2

 

Reference: Chapter 6 pp. 177-178

 

The probability estimate associated with the maximum likelihood chi-square statistic given by commercial SEM programs is asymptotic and only becomes accurate at large sample sizes; the actual size needed depends on a number of properties of the data.  The MXC2 program is to be used alongside your favourite SEM program.  Once you have tested your model and obtained the chi-square value and the degrees of freedom, but have a small sample size, MXC2 allows you to estimate the probability at this small sample size, assuming multivariate normality, using Monte Carlo techniques.

First, you give the number of degrees of freedom associated (for example 6).  Now you have to choose the number of iterations.  The program runs quite quickly so you can give a fairly high number (say 5000).  You now specify the chi-square value obtained for your model (say 14.32) and your sample size (say 30).  The probability and its 95% confidence interval is output (in this case 0.0344 +/- 0.005052).
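The Monte Carlo mechanics (estimating a tail probability and its 95% confidence interval from simulated draws) can be sketched as follows.  For illustration only, the statistic is simulated here from its asymptotic chi-square distribution, i.e. a sum of squared standard normals; MXC2 itself simulates the small-sample distribution of the statistic, which is precisely what makes its probabilities differ from the asymptotic ones:

```python
import math
import random

def mc_tail_prob(stat, df, runs=5000, seed=1):
    """Monte Carlo estimate of P(T >= stat) with a 95% confidence
    half-width.  T is simulated as an asymptotic chi-square (a sum
    of df squared standard normals) for illustration; MXC2 instead
    simulates the small-sample distribution of the ML chi-square."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        t = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
        if t >= stat:
            hits += 1
    p = hits / runs
    half = 1.96 * math.sqrt(p * (1.0 - p) / runs)
    return p, half

# The worked example: chi-square = 14.32 with 6 df.  The asymptotic
# probability is about 0.026; MXC2's small-sample estimate at N = 30
# is larger (0.0344), illustrating why the correction matters.
p, half = mc_tail_prob(14.32, 6, runs=20000)
```

The confidence half-width reported by the program follows the same binomial formula, so a higher number of iterations narrows the interval around the estimated probability.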

Note that the algorithm described on page 178 of the first printing contains errors; the corrections are posted on the ERRATA page of my web site.  Future printings will have the corrections included.


MULTICOV

 

Reference: Chapter 7

 

 

This program calculates the covariances, means, and scaling factors appropriate for multilevel SEM, given a data set that has a hierarchical structure.  There must be no missing values in the data file.  There can be no more than 30 variables and no more than 5 levels in the hierarchy.  The form of the data matrix is as follows.  If there are x hierarchical levels, then the first x columns give the group structure.  The following columns give the variables.  For example, if there were 3 levels, 2 variables and 8 observations, the data set might look like the following; note that the highest level is column 1, level 2 is column 2, and so on.

1 1 1 0.1 0.2

1 1 2 0.4 0.3

1 2 3 0.5 0.1

1 2 4 0.8 0.4

2 3 5 0.5 0.3

2 3 6 0.2 0.2

2 4 7 0.5 0.4

2 4 8 0.4 0.3

In this data file there are 8 observations.  The first level has two groups (1 and 2), the second level has 4 groups (1,2,3,4) and the last level has 8 "groups" (individual observations). EACH GROUP ID NUMBER MUST BE DIFFERENT; ONE CANNOT HAVE TWO *DIFFERENT* GROUPS WITH THE SAME ID NUMBER.

The following data set would not work properly:

1 1 1 0.1 0.2

1 1 2 0.4 0.3

1 2 1 0.5 0.1

1 2 2 0.8 0.4

2 1 1 0.5 0.3

2 1 2 0.2 0.2

2 2 1 0.5 0.4

2 2 2 0.4 0.3

because you would have two different level-2 groups called group "1".
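One way to repair a file like the one just shown is to relabel the IDs so that each distinct path of ancestor IDs gets its own number.  A minimal sketch (a hypothetical helper for preparing input files, not part of MULTICOV):

```python
def make_ids_unique(rows, n_levels):
    """Relabel hierarchical group IDs so that no two different groups
    share an ID, by numbering each distinct path of ancestor IDs."""
    seen = [{} for _ in range(n_levels)]   # one lookup table per level
    out = []
    for row in rows:
        new = list(row)
        for lev in range(n_levels):
            path = tuple(row[:lev + 1])    # IDs from the top down to here
            if path not in seen[lev]:
                seen[lev][path] = len(seen[lev]) + 1
            new[lev] = seen[lev][path]
        out.append(new)
    return out

# The problematic file above: level-2 group "1" occurs under both
# level-1 groups.  After relabelling, the level-2 groups are 1,2,3,4
# and the level-3 "groups" (observations) are 1 through 8.
rows = [[1, 1, 1, 0.1, 0.2], [1, 1, 2, 0.4, 0.3],
        [1, 2, 1, 0.5, 0.1], [1, 2, 2, 0.8, 0.4],
        [2, 1, 1, 0.5, 0.3], [2, 1, 2, 0.2, 0.2],
        [2, 2, 1, 0.5, 0.4], [2, 2, 2, 0.4, 0.3]]
fixed = make_ids_unique(rows, 3)
```

After this relabelling the file satisfies the uniqueness rule and can be given to MULTICOV directly.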

Note that there must be at least 1 blank between each number in the data file (i.e. free format).  This program was written to permit large data files without huge memory requirements.  This means that large files may take a few minutes to process.  You can save both the nested covariance matrices and  the group means to external data files.

 

You simply give the file names for your input data file and your output data file.  To illustrate, we will enter the above (correct) data file (called MULTICOV.DAT), telling the program that there are 3 levels and 2 variables.  Here is the output file:

 

  MULTILEVEL COVARIANCE MATRICES GENERATED FROM THE FILE

multicov.dat                                                                   

 

 COVARIANCE MATRIX FOR LEVEL  3

 SCALING FACTOR C FROM THIS LEVEL TO THE NEXT LOWER LEVEL IS    4.000

  NUMBER OF GROUPS AT THIS LEVEL:   2

GROUP SIZES ARE EQUAL, C IS EXACT

DF ARE:    1.

    0.00500    -0.00500

   -0.00500     0.00500

 COVARIANCE MATRIX FOR LEVEL  2

 SCALING FACTOR C FROM THIS LEVEL TO THE NEXT LOWER LEVEL IS    2.000

  NUMBER OF GROUPS AT THIS LEVEL:   4

GROUP SIZES ARE EQUAL, C IS EXACT

DF ARE:    2.

    0.08500     0.00500

    0.00500     0.00500

 COVARIANCE MATRIX FOR LEVEL  1

 SCALING FACTOR C FROM THIS LEVEL TO THE NEXT LOWER LEVEL IS    1.000

  NUMBER OF GROUPS AT THIS LEVEL:   8

GROUP SIZES ARE EQUAL, C IS EXACT

DF ARE:    4.

    0.03500     0.02000

    0.02000     0.01500


EPA2

 

 

Reference: Chapter 8

 

This program does not test a specified causal model; rather, it takes a series of observations and searches for path models that are consistent with the data at a specified probability level.  It outputs a partially directed graph[2] from which you can obtain all of the equivalent models that fit the data.

The input data file has exactly the same properties as described in DGRAPH and input to the program is also exactly the same.  In particular, you can either input a raw data file or a covariance matrix and the results are written to an output file whose name you choose.

We will use the input file called DGRAPH.DAT and write our results to the file called EPA2.OUT.  There are 4 variables in DGRAPH.DAT and we will first use all of these (the program asks you how many of the variables you want to use).  We then give names for each variable that is to be used.  You are next asked to give the highest rejection level to be used by the algorithm.  The algorithm starts at a small value and increases it in increments of 0.05, each time searching for models that fit the data at your chosen significance level.  I therefore usually give a high value (e.g. 0.9), and this is the value we will input.  You are next asked to give the significance level to be used in testing models.  For instance, if you choose a significance level of 0.05 then EPA2 will output all those models that are consistent with the data at this level.  This is the level that we will use.

You are next asked if you want to exclude any two variables from sharing an edge.  If, for instance, you have reason to believe that variables 1 and 3 cannot share an edge (i.e. 1 isn’t a direct cause of 3, 3 isn’t a direct cause of 1 and neither 1 nor 3 have a direct common unmeasured cause) then you can instruct EPA2 to take this information into account.  We will exclude variables 1 and 3 from sharing an edge.  Note that after we have entered “1” and then “3” we must enter “-1” to tell EPA2 that we have finished excluding edges.

Next, you must choose which method to use in the search.  You would normally use method 1 but, if you have few variables and few observations, you should choose method 2; this method will take a very long time to complete if you have many variables.  You can also choose to use both methods.  Let’s do this.

Now you can force any two variables to share an edge if you have reason to do this (i.e. if the first is a direct cause of the second, if the second is a direct cause of the first or if the two share a direct common unmeasured cause).  Let’s force variables 1 and 2 to share an edge; remember to input “-1” when finished.

The program now derives the undirected dependency graph starting at the 0.01 level and tests to see if there are any graphs that fit the data at your chosen significance level.  When EPA2 finds such models it prints them to the screen and gives you the choice of stopping or continuing at a higher rejection level.  Here is the output file EPA2.OUT:

 

Exploratory Path Analysis 2, by Bill Shipley

 bshipley@courrier.usherb.ca

 

 

 

 input data is DGRAPH.DAT                                                       

 

 

 

 Analysis based on    100             observations

 Column number = variable name

  1  = v1     

  2  = v2     

  3  = v3     

  4  = v4     

 

 Model based on method 2

 

 v1       o-o v2     

 v2       o-o v3     

 v3       o-o v4     

 

 Model based on method 2

 

 v1       o-> v2     

 v3       o-> v2     

 v3       o-o v4     

 

 Model based on method 2

 

 v1       o-> v2     

 v2       <-> v3     

 v4       o-> v3     

 

 The undirected dependency graph was obtained at a

 rejection level of 0.010

 These models fit at a significance level of 0.050

  __________________________________

 Model based on method 2

 

 v1       o-o v2     

 v2       o-o v3     

 v3       o-o v4     

 

 Model based on method 2

 

 v1       o-> v2     

 v3       o-> v2     

 v3       o-o v4     

 

 Model based on method 2

 

 v1       o-> v2     

 v2       <-> v3     

 v4       o-> v3     

 

 The undirected dependency graph was obtained at a

 rejection level of 0.050

 These models fit at a significance level of 0.050

  __________________________________

 

The following results were obtained when using method 1:

Exploratory Path Analysis 2, by Bill Shipley

 bshipley@courrier.usherb.ca

 

 

 

 dgraph.dat file using method 1                                                 

 

 

 

 Analysis based on    100             observations

 Column number = variable name

  1  = v1     

  2  = v2     

  3  = v3     

  4  = v4     

 

 Model based on method 1

 

 v1       o-> v2     

 v2       <-> v3     

 v4       o-> v3     

 

 The undirected dependency graph was obtained at a

 rejection level of 0.010

 These models fit at a significance level of 0.050

  __________________________________

 

Notice that method 2 found 3 sets of equivalent models that would not have been rejected had we subjected each to DGRAPH, while method 1 found only 1 (the second of the three sets found by method 2).  In fact the true model that generated these data was v1→v2→v3→v4, which is one of the equivalent models in the first of the three patterns found by method 2.

A few points about the algorithms used in EPA2 deserve mention.  Method 1 is essentially the same as the SGS algorithm, except that the SGS algorithm doesn’t actually compare the data to the partially directed graph using a statistical test.  To do this, EPA2 generates one of the equivalent completely directed graphs and tests this model using d-sep tests.  Method 2 instead begins with the undirected dependency graph (UDG), identifies all sets of 3 variables in the UDG that form an unshielded pattern, allows each to be either an unshielded collider or an unshielded non-collider, and generates all possible non-equivalent models from these, testing each against the data using d-sep tests.  This can take a very long time when there are more than a few variables.
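The unshielded patterns that method 2 enumerates can be found directly from the UDG: they are triples a-b-c where a-b and b-c are edges but a-c is not.  A minimal sketch (an illustration, not EPA2's own code):

```python
def unshielded_triples(edges):
    """All triples (a, b, c) where a-b and b-c are edges of the
    undirected dependency graph but a-c is not (a < c so each
    triple is listed once)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    triples = []
    for b in sorted(adj):
        nbrs = sorted(adj[b])
        for i, a in enumerate(nbrs):
            for c in nbrs[i + 1:]:
                if c not in adj[a]:
                    triples.append((a, b, c))
    return triples

# For the UDG v1-v2-v3-v4 of the example, the unshielded triples
# are (1,2,3) and (2,3,4); each may be a collider or a non-collider,
# which is what generates the candidate models method 2 tests.
```

With k unshielded triples there are at most 2^k orientation patterns to test, which is why method 2 slows down rapidly as the number of variables grows.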

Two final points.  First, this is an exploratory method and any model found will have to be independently tested.  Second, it is possible that no path model is consistent with the data at any rejection level.


TETRAD

 

Reference: Chapter 8

 

This program is also exploratory in nature.  Furthermore, the statistical test used is asymptotic, meaning that the calculated probability levels are only exact at very large sample sizes.  Little is known about the small-sample properties of the test except that larger sample sizes are needed when the correlations between the observed variables involved in the tetrad equation are weak.  See Table 8.5 (p. 286) for more details.  If the sample size is not large enough then the computed probability will tend to be larger than the true probability, which means that some vanishing tetrad equations that are accepted (i.e. whose null probability is larger than your significance level) may in fact not hold.

Input to TETRAD is exactly the same as in DGRAPH.  We will use the data set called TETRAD.DAT involving four measured variables (X1,X2,X3,X4). This data set is generated from the following structural equations (figure 8.20, page 272):

X1 = 0.5F1 + N(0,0.75)

X2 = 0.5F1 + N(0,0.75)

X3 = 0.5F2 + N(0,0.75)

X4 = 0.5F2 + N(0,0.75)

F2 = 0.5F1 + N(0,0.75).

Thus, there are two latent variables (F1 and F2).  After reading in the input file and giving the name of the output file, we are asked to give the column numbers of the four variables involved in the analysis.  Since there are only four columns in this data set we input 1, 2, 3 and 4.  We then specify our significance level.  Remember that all tetrad equations whose null probability is larger than this value will be retained.  We will enter a value of 0.05.  The results are echoed to the screen as well as being printed to the output file. We can then quit, specify another set of 4 variables or read in a new data set.

Here is the output file:

 

TETRAD, by Bill Shipley

 bshipley@courrier.usherb.ca

 

 

 

 model page 272                                                                 

 

 

 

 Analysis based on    500             observations

 Testing for zero tetrad equations at a significance of .0500

 

The following tetrad equation is zero:

  1* 3* 2* 4

 The null probability is 0.53637

 

 All treks going into the first and third variables or

   into the second and fourth variables of this tetrad

 equation pass through the same choke point.

 

 

In fact, given the causal structure of these data, only one of the three tetrad equations implied by the four variables vanishes (see page 272).  We can state that all treks into variables 1 and 2 (the first and third variables in the equation) and/or into variables 3 and 4 (the second and fourth variables in the equation) pass through a single variable (the choke point).  In this example F1 is the first choke point (through which pass all treks into variables 1 and 2) and F2 is the second choke point (through which pass all treks into variables 3 and 4).
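Using the population covariances implied by the structural equations above (taking N(0,0.75) as noise with variance 0.75 and F1 as standard normal, which is an assumption about the notation), one can check directly that exactly one of the three tetrad differences vanishes.  A minimal sketch (not TETRAD's own code, which also computes an asymptotic test of each difference):

```python
def tetrad_differences(r):
    """The three tetrad differences implied by four variables, given
    a 4x4 correlation matrix r (0-based indices):
    t1 = r12*r34 - r13*r24, t2 = r13*r24 - r14*r23,
    t3 = r14*r23 - r12*r34."""
    t1 = r[0][1] * r[2][3] - r[0][2] * r[1][3]
    t2 = r[0][2] * r[1][3] - r[0][3] * r[1][2]
    t3 = r[0][3] * r[1][2] - r[0][1] * r[2][3]
    return t1, t2, t3

# Under the assumed notation the model implies r12 = r34 = 0.25 and
# all cross-construct correlations equal to 0.125:
r = [[1.0,   0.25,  0.125, 0.125],
     [0.25,  1.0,   0.125, 0.125],
     [0.125, 0.125, 1.0,   0.25],
     [0.125, 0.125, 0.25,  1.0]]
t1, t2, t3 = tetrad_differences(r)   # only t2 vanishes
```

The vanishing difference is t2 = r13*r24 - r14*r23, which corresponds to the "1* 3* 2* 4" equation reported in the output.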

 


IDEN

 

Reference: Chapter 6 pages 169-171.

 

This program tests the identification status of a measurement model using the FC1 rule of Davis.  No empirical data are input to this program.  You first specify the number of observed indicator variables in your measurement model and the number of latent variables and then answer the questions that are posed. 

Consider the model shown in Figure 6.3 (page 170).  There are 6 indicator variables and 2 latents.  There are no indicator variables that are caused by more than one latent.  You now follow the instructions on the screen and answer the questions posed and the program tells you if the measurement model is structurally identified according to the FC1 rule.


DSEP

 

Reference: Chapter 2

 

This program is designed to allow you to test your ability to read d-separation statements from a directed acyclic graph.  No empirical data are read into the program; you simply input the DAG and then pose questions concerning d-separation.

As an example, we will input the following DAG: v1→v2→v3←v4.  There are four vertices (v1, v2, v3 and v4) and three direct effects.  We enter the three pairs of vertices involved in the direct effects as cause <space> effect :

1 2

2 3

4 3

You now enter d-separation queries of the form: is X d-separated from Y given conditioning set Z?  For instance, to find out if v1 is d-separated from v3 given the conditioning set {v2, v4} we would give the number of the first vertex (1), the number of the second vertex (3), the number of vertices in the conditioning set (2), and then these two numbers (2 and 4).  The program gives you the answer (yes) and you can then pose further d-separation queries.
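The d-separation test itself can be sketched with the standard moral-graph criterion: restrict the DAG to the ancestors of the query vertices, marry co-parents, drop the arrow directions, delete the conditioning set, and check whether the two vertices are still connected.  A minimal sketch (an illustration, not DSEP's own code):

```python
def d_separated(parents, x, y, z):
    """True if x and y are d-separated given the set z in the DAG
    described by parents (a dict mapping child -> list of parents)."""
    # ancestors (inclusive) of {x, y} union z
    anc, stack = set(), [x, y, *z]
    while stack:
        v = stack.pop()
        if v not in anc:
            anc.add(v)
            stack.extend(parents.get(v, []))
    # moralized, undirected subgraph on the ancestors
    adj = {v: set() for v in anc}
    for v in anc:
        ps = [p for p in parents.get(v, []) if p in anc]
        for p in ps:
            adj[v].add(p)
            adj[p].add(v)
        for i, p in enumerate(ps):          # marry co-parents
            for q in ps[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    # can y still be reached from x once z is removed?
    blocked = set(z)
    seen, stack = set(), [x]
    while stack:
        v = stack.pop()
        if v == y:
            return False
        if v not in seen and v not in blocked:
            seen.add(v)
            stack.extend(adj[v])
    return True

# The DAG v1->v2->v3<-v4 from the example:
dag = {2: [1], 3: [2, 4]}
answer = d_separated(dag, 1, 3, {2, 4})   # yes, as DSEP reports
```

Note that conditioning on the collider v3 opens the path between v2 and v4, which the moral-graph construction captures by marrying the parents of v3.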

 



[1] Shipley, B. 2000.  Cause and correlation in biology.  A user’s guide to path analysis, structural equations and causal inference.  Cambridge University Press, Cambridge.  ISBN: 0-521-79153-7.

[2] This is called a partially directed inducing path graph

 
