Table of Contents

Beginning
Driver file
Data file
Covariate file (optional)

Beginning

To run the program correctly, you will need the following additional files:

  1. a "driver" file
  2. a "data" file
  3. a "covariate" file (optional)

To run the program, change to the directory in which the program resides and type R at the command prompt.
Once R has started, type the following command at the R command prompt:

source("gc.r")

You will then be asked to supply the name of the driver file.

Driver file

The driver file enables the user to supply information to the program for processing. The driver file has the following general format:
DATAFILE=
COVFILE=
MODEL=
NULLLOCI=
MODELTYPE=
ALPHA=
COVLABEL=
OUTMODEL=
OUTNULL=
TESTTYPE=


The driver file has a specific format which must be followed closely to ensure proper execution of the program:
   -> Each argument must occupy only a single line in the file.
   -> Make sure that there are no blank lines at the end of the file.
   -> The arguments listed above are case-sensitive (all caps).
   -> Avoid using unnecessary blank spaces in the file.
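
The format rules above can be checked before starting R. The following is a minimal sketch in Python, not part of the GC program: the function name check_driver_text is hypothetical, the list of known arguments is copied from the general format shown earlier, and flagging every blank line and every whitespace character is a deliberately stricter reading of the rules than the program itself may require.

```python
# Hypothetical pre-flight check for a driver file; not part of the GC program.
KNOWN_KEYS = {
    "DATAFILE", "COVFILE", "MODEL", "NULLLOCI", "MODELTYPE",
    "ALPHA", "COVLABEL", "OUTMODEL", "OUTNULL", "TESTTYPE",
}


def check_driver_text(text):
    """Return a list of human-readable problems found in driver-file text."""
    problems = []
    lines = text.split("\n")
    # A single trailing newline yields one empty final element; drop it so
    # that only genuinely blank lines are reported.
    if lines and lines[-1] == "":
        lines.pop()
    for number, line in enumerate(lines, start=1):
        if line.strip() == "":
            # Assumption: any blank line is reported, not only trailing ones.
            problems.append(f"line {number}: blank line")
            continue
        if "=" not in line:
            problems.append(f"line {number}: expected ARGUMENT=value")
            continue
        key, _value = line.split("=", 1)
        # Argument names are case-sensitive, so the match must be exact.
        if key not in KNOWN_KEYS:
            problems.append(f"line {number}: unknown argument {key!r}")
        if any(ch.isspace() for ch in line):
            problems.append(f"line {number}: contains blank spaces")
    return problems
```

Calling check_driver_text(open("driver.txt").read()) on a hypothetical driver.txt returns an empty list when none of these problems is found.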

The arguments to the file are:
DATAFILE= The name of the data file. (required).
COVFILE= The name of the covariate file. (optional).
MODEL= list of models to test. The models for testing are specified using integers relating to the position of the loci in the data file. Two types of models can be specified: "marginal models" (Y = beta_0 + beta_1 X_1 + error) and "interaction models" (Y = beta_0 + beta_1 X_1 + beta_2 X_2 + beta_3 X_1*X_2 + error). To fit marginal models, the loci under investigation can be indicated either by a string of integers, separated by commas, or by the "-" character for a consecutive string of integers (e.g., 1,5,20-40). Interaction models are specified by using the ":" character, and are comma delimited (e.g., 1:2,1:3,2:16). You may also use the "~" character to specify all pairwise models between two integers. For example, 4~6 is synonymous with 4:5,4:6,5:6. This shortcut makes it simple to specify that all possible interactions should be tested. You may only request either all marginal models or all interaction models for each run of the program. (required).
NULLLOCI= The null loci to use in estimating the inflation factor(s), separated by commas. Sequences of null loci may be specified using the "-" character. If this argument is not supplied, then ALL loci will be used to estimate the inflation factor(s) by default. If NULLLOCI is 0, then the inflation factor is assumed to be 1. (optional).
MODELTYPE= The response type: 0 for binomial, 1 for Gaussian. (required).
ALPHA= The alpha level (default=0.05). (optional).
COVLABEL= The labels for the covariates in the covariate file, separated by commas (required if a covariate file is supplied).
OUTMODEL= An output file to which model output and summary statistics are written (required).
OUTNULL= An output file to which results from the models used in estimating the inflation factor(s) are written. This is ignored if NULLLOCI=0. (optional).
TESTTYPE= Which statistical test to use to calculate p-values. If 1, p-values are adjusted for uncertainty in the estimated effect of substructure. (This is equivalent to the old GCF program.) If 0, p-values are not adjusted. (This is equivalent to the old GC program.) The default is 0. (optional).
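
To make the MODEL= shorthand concrete, the sketch below expands a MODEL value into an explicit list of models. It is an illustration written in Python under the rules described for MODEL above, not the parser used by the GC program; the function name expand_model and the tuple representation are assumptions made for this example.

```python
from itertools import combinations


def expand_model(spec):
    """Expand a MODEL= value into a list of models.

    Marginal models come back as 1-tuples, interaction models as 2-tuples.
    """
    models = []
    for term in spec.split(","):
        if "~" in term:
            # a~b: every pairwise interaction among loci a..b inclusive
            low, high = (int(part) for part in term.split("~"))
            models.extend(combinations(range(low, high + 1), 2))
        elif ":" in term:
            # a:b: a single interaction model
            first, second = (int(part) for part in term.split(":"))
            models.append((first, second))
        elif "-" in term:
            # a-b: marginal models for the consecutive loci a..b inclusive
            low, high = (int(part) for part in term.split("-"))
            models.extend((locus,) for locus in range(low, high + 1))
        else:
            models.append((int(term),))
    # Only one kind of model may be requested per run.
    if len({len(model) for model in models}) > 1:
        raise ValueError("marginal and interaction models cannot be mixed in one run")
    return models
```

For instance, expand_model("4~6") gives [(4, 5), (4, 6), (5, 6)], which matches the statement that 4~6 is synonymous with 4:5,4:6,5:6.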



An example driver file for the GC program:

DATAFILE=data.txt
COVFILE=cov.txt
MODEL=1~4,5:6
NULLLOCI=3-7
MODELTYPE=0
ALPHA=0.05
COVLABEL=bp,prev
OUTMODEL=modelout.txt
OUTNULL=nullout.txt

This file will tell the GC program that the data reside in the file data.txt, that the covariates reside in the file cov.txt, and to test the interaction models 1:2, 1:3, 1:4, 2:3, 2:4, 3:4, and 5:6. It will use the third through seventh loci in the data file for estimation of the inflation factor(s). It specifies that the response variable is binomial, the alpha level is 0.05, and the covariate labels are "bp" and "prev".

Another example driver file:

DATAFILE=data.txt
COVFILE=cov.txt
MODEL=1~4,5:6
MODELTYPE=0
ALPHA=0.05
COVLABEL=bp,prev
OUTMODEL=modelout.txt
OUTNULL=nullout.txt

This driver file runs the GC program on the same data but uses all loci to estimate the inflation factor(s); notice the omission of the NULLLOCI= argument.

Data file

This file specifies the genotype data, which are organized in an abbreviated version of the "pre-linkage" format ('0' means missing). Traditional pre-linkage information about family ID and parents is excluded because it is implicitly assumed that the population-based sample does not contain relatives. The first column in the data file must be the subject ID. The second column must be the value of the response variable. This is followed by allele 1 of genotype 1, allele 2 of genotype 1, allele 1 of genotype 2, etc. Use spaces to separate the columns in the file. An example data file is:

0001 1 1 1 1 2 1 2 1 1 1 2 ... 2 2 2 2
0002 1 1 2 1 2 1 1 1 2 1 2 ... 1 1 1 1
...
0027 1 2 2 1 2 1 1 1 2 1 2 ... 1 1 1 2
0028 0 1 1 1 2 1 1 1 2 1 1 ... 1 1 1 2
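
The column layout described above can be illustrated by splitting one row into its parts. This is a minimal sketch in Python, not code from the GC program; the names parse_data_line and is_missing are hypothetical, and treating a genotype as missing when either allele is coded '0' is an assumption, since the text only states that '0' means missing.

```python
def parse_data_line(line):
    """Split one data-file row into (subject_id, response, genotypes).

    genotypes is a list of (allele1, allele2) string pairs, one per locus.
    The subject ID is kept as a string so leading zeros are preserved.
    """
    fields = line.split()
    # ID + response + at least one pair of alleles, and alleles come in pairs.
    if len(fields) < 4 or len(fields) % 2 != 0:
        raise ValueError("expected ID, response, then an even number of alleles")
    subject_id, response = fields[0], fields[1]
    alleles = fields[2:]
    genotypes = list(zip(alleles[0::2], alleles[1::2]))
    return subject_id, response, genotypes


def is_missing(genotype):
    """Assumption: a genotype is missing if either allele is coded '0'."""
    return "0" in genotype
```

With this layout, locus k in the MODEL and NULLLOCI arguments corresponds to genotypes[k - 1], because loci are numbered by their position in the data file.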

Covariate file (optional)

The first column in the covariate file must be the subject ID. The remaining columns are the covariates. The columns must be separated by spaces. The number of rows in the covariate file must equal the number of rows in the data file. An example covariate file would look like:

0001 87 1
0002 23 1
...
0027 35 2
0028 77 1
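
The row-count requirement can be checked before a run. The sketch below is a hypothetical Python helper, not part of the GC program. Beyond the stated rule that both files have the same number of rows, it makes two assumptions that the text does not confirm: that subjects appear in the same order in both files, and that the number of covariate columns equals the number of labels given in COVLABEL.

```python
def check_covariates(data_rows, cov_rows, labels):
    """Return a list of problems found when pairing data and covariate rows.

    data_rows and cov_rows are lists of text lines; labels is the list of
    covariate labels taken from COVLABEL.
    """
    problems = []
    if len(data_rows) != len(cov_rows):
        problems.append(
            f"row count mismatch: {len(data_rows)} data rows, "
            f"{len(cov_rows)} covariate rows"
        )
    for number, (data_row, cov_row) in enumerate(zip(data_rows, cov_rows), start=1):
        data_fields = data_row.split()
        cov_fields = cov_row.split()
        # Assumption: both files list subjects in the same order.
        if data_fields[:1] != cov_fields[:1]:
            problems.append(f"row {number}: subject IDs differ")
        # Assumption: one covariate column per COVLABEL entry.
        found = max(len(cov_fields) - 1, 0)
        if found != len(labels):
            problems.append(
                f"row {number}: expected {len(labels)} covariates, found {found}"
            )
    return problems
```

For the example files shown in this document, labels would be ["bp", "prev"], matching COVLABEL=bp,prev and the two covariate columns.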