Instructions for using create_MMprime_matrix.f90

 

This program calculates the MM’ input matrix for GEM.  For large problems (thousands of individuals and tens of thousands of SNP), packages like R have problems calculating this matrix.  This program uses ~ n_ind*n_snp bytes to store the data and n_ind^2*8 bytes to store the matrix.  For 5000 individuals and 20,000 SNP this should be ~300 MB (about 100 MB for the data plus 200 MB for the matrix).  There is some additional overhead, but the total should stay below 0.5 GB.  Large problems involve many calculations, so do not expect the program to finish quickly.  It only calculates the lower half of the matrix and therefore becomes faster as it gets deeper into your list of individuals.  It prints a message every 100 individuals to tell you that it is still working.
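The memory estimate and the lower-half calculation described above can be sketched in a few lines of Python.  This is only an illustration, not the Fortran source: the function names are hypothetical, the genotype codes are used as plain numbers, and the sketch does nothing special with missing calls (code 3) or with centering or scaling, because this document does not say how the program treats them.

```python
def estimate_memory_bytes(n_ind, n_snp):
    """Rough footprint: 1 byte per genotype plus 8 bytes per matrix element."""
    genotype_bytes = n_ind * n_snp      # n_ind*n_snp bytes for the SNP data
    matrix_bytes = n_ind ** 2 * 8       # n_ind^2*8 bytes for the MM' matrix
    return genotype_bytes + matrix_bytes


def mmprime_lower(M):
    """Lower half of M M' as a list of rows; row i holds entries (i,0)..(i,i)."""
    lower = []
    for i, row_i in enumerate(M):
        # Only columns j <= i are computed; the upper half follows by symmetry.
        lower.append([sum(a * b for a, b in zip(row_i, M[j]))
                      for j in range(i + 1)])
    return lower


# 5000 individuals and 20,000 SNP, as in the text above.
print(estimate_memory_bytes(5000, 20000) / 1e6, "MB")   # 300.0 MB

# Tiny made-up genotype matrix: 3 individuals, 3 SNP.
print(mmprime_lower([[0, 1, 2], [1, 1, 0], [2, 0, 2]]))  # [[5], [1, 2], [4, 2, 8]]
```

Computing only the lower half roughly halves the number of dot products, which is why the work done for each individual is not the same throughout the run.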

 

 

You can download the following packages:

1)      Compiled version for use on a Windows XP machine.

2)      Compiled version for use on a machine with AMD Celeron processors running Red Hat Enterprise Linux v5 (Intel Fortran version)

3)      Source code.

4)      Example data.

 

Windows XP usage:

1)      Open a DOS window (click on start, click on run, enter cmd in the box and click OK)

2)      Change to the directory where you have a copy of create_MMprime_matrix.exe.  Usually cd \My Documents\Bert\... etc.

3)      Issue the command create_MMprime_matrix.exe

4)      Enter the name of your input file (in the example this is input_ex.txt)

 

Linux usage:

1)      Open a terminal window

2)      Change to the directory where you have a copy of create_MMprime_matrix.exe.

3)      Issue the command ./create_MMprime_matrix.exe

4)      Enter the name of your input file (in the example this is input_ex.txt)

 

To compile the program yourself, use your favorite Fortran 95 compiler.  We have successfully compiled the program using:

1)      Compaq Visual Fortran on a Windows XP machine

2)      Intel Fortran Compiler 10.1, on a machine with AMD Celeron processors running Red Hat Enterprise Linux v5

3)      Absoft 64-bit Fortran 95 9.0 with Service Pack 1, on a machine with AMD Celeron processors running Red Hat Enterprise Linux v5

4)      gfortran, the GNU Fortran compiler, on a machine with AMD Celeron processors running Red Hat Enterprise Linux v5

 

Input Files:

 

There are two files that have to be provided:

·        File with some essential information to run the program:

1)      location and name for the log-file (if you put the name in "" you can use blanks and odd characters in this as well)

2)      location and name for the file with the SNP data (if you put the name in "" you can use blanks and odd characters in this as well)

3)      location and name to write the MMprime matrix to (if you put the name in "" you can use blanks and odd characters in this as well)

4)      number of individuals in the dataset specified in 2)

5)      number of SNP in the dataset specified in 2)
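As an illustration, the five items above could be written to a file like this.  This is a sketch under assumptions: it puts one item per line in the order listed and quotes a name only when it contains blanks, neither of which is spelled out in this document, and the helper name control_file_text is hypothetical.  The file names and counts are those of the included example; the shipped input_ex.txt remains the authoritative reference for the layout.

```python
def control_file_text(log_path, snp_path, out_path, n_ind, n_snp):
    """Build the text of the information file (assumed: one item per line)."""
    def quote(path):
        # Names with blanks are wrapped in "" as the instructions allow.
        return f'"{path}"' if any(ch.isspace() for ch in path) else path

    lines = [
        quote(log_path),   # 1) log-file
        quote(snp_path),   # 2) file with the SNP data
        quote(out_path),   # 3) file to write the MMprime matrix to
        str(n_ind),        # 4) number of individuals
        str(n_snp),        # 5) number of SNP
    ]
    return "\n".join(lines) + "\n"


text = control_file_text("mmp_ex_log.txt", "ancestry_ex.txt", "MMp_ex.txt", 28, 52)
print(text, end="")
```

Writing the returned text to a file and entering that file's name at the program's prompt corresponds to step 4) of the usage instructions.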

 

 

·        The file with the SNP data.  This file can be space, comma, or tab delimited.  The columns are:

1)      identifier for the individual

2)      sex of the individual

3)      Dx status of the individual

4)      group to which the individual is assigned

5) through 4+n_snp)      SNP calls, one column per SNP (0 = 1/1, 1 = 1/2, 2 = 2/2, 3 = 0/0 (missing))
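Before running a large job it can help to check that each record of the SNP file has the expected layout.  The Python sketch below reads one record according to the column description above.  It is an illustration only and not part of the program: the function name parse_snp_record is hypothetical, and it assumes that runs of spaces, tabs, or commas count as a single delimiter.

```python
import re

VALID_CALLS = {0, 1, 2, 3}   # 0 = 1/1, 1 = 1/2, 2 = 2/2, 3 = 0/0 (missing)


def parse_snp_record(line, n_snp):
    """Split one record into its 4 leading columns and n_snp SNP calls."""
    # Assumption: space, comma, and tab delimiters may be repeated or mixed.
    fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
    if len(fields) != 4 + n_snp:
        raise ValueError(f"expected {4 + n_snp} columns, found {len(fields)}")
    ident, sex, dx, group = fields[:4]
    calls = [int(f) for f in fields[4:]]
    bad = [c for c in calls if c not in VALID_CALLS]
    if bad:
        raise ValueError(f"unexpected SNP call(s): {bad}")
    return {"id": ident, "sex": sex, "dx": dx, "group": group, "calls": calls}


# Made-up record with 4 SNP, comma delimited.
record = parse_snp_record("ind1,1,2,A,0,1,2,3", n_snp=4)
print(record["id"], record["calls"])   # ind1 [0, 1, 2, 3]
```

Running such a check over every line, with n_snp set to the value given in the information file, catches short or malformed records before the matrix calculation starts.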

 

Example

 

Included is a small example with 28 individuals and 52 SNP.  The input data are in input_ex.txt and ancestry_ex.txt, and the output you get should look like MMp_ex.txt and mmp_ex_log.txt.

 

 

The output file specified in item 3) of the first input file is in the format that is used for GEM.

 

If you have questions please let me know at kleil at upmc dot edu

 

Bert Klei

Computational Genetics

WPIC-UPMC