Instructions for using create_MMprime_matrix.f90
This program calculates the MM’ input matrix for GEM. For large problems (thousands of individuals and tens of thousands of SNP), packages like R have problems calculating this matrix. The program uses about n_ind*n_snp bytes to store the data and then n_ind^2*8 bytes to store the matrix. For 5000 individuals and 20,000 SNP this comes to about 300 MB (100 MB for the data plus 200 MB for the matrix). There is some additional overhead, but the total should stay below 0.5 GB. Large problems involve many calculations, so do not expect the program to finish quickly. It calculates only the lower half of the matrix and therefore becomes faster as it gets deeper into your list of individuals. It prints a message every 100 individuals to tell you that it is still working.
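The memory figures above can be checked with a few lines of Python. This is only a rough sketch of the arithmetic: the function name and its parameters are made up for illustration, and it assumes one byte per SNP call and eight bytes per matrix element, as the text states. It ignores the additional overhead.

```python
def mmprime_memory_bytes(n_ind, n_snp, bytes_per_call=1, bytes_per_element=8):
    """Rough memory estimate using the figures quoted in the text.

    Hypothetical helper, not part of create_MMprime_matrix.f90.
    """
    data_bytes = n_ind * n_snp * bytes_per_call      # SNP calls
    matrix_bytes = n_ind ** 2 * bytes_per_element    # full n_ind x n_ind matrix
    return data_bytes, matrix_bytes, data_bytes + matrix_bytes


data_b, matrix_b, total_b = mmprime_memory_bytes(5000, 20000)
print(f"SNP data: {data_b / 1e6:.0f} MB")    # SNP data: 100 MB
print(f"matrix:   {matrix_b / 1e6:.0f} MB")  # matrix:   200 MB
print(f"total:    {total_b / 1e6:.0f} MB")   # total:    300 MB
```

Plugging in your own n_ind and n_snp gives a quick indication of whether a data set will fit in memory before you start a long run.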
You can download the following packages:
1) Compiled version for use on a Windows XP machine.
3) Source code.
4) Example data.
Windows XP usage:
1) Open a DOS window (click on Start, click on Run, enter cmd in the box and click OK).
2) Change to the directory where you have a copy of create_MMprime_matrix.exe. Usually cd \My Documents\Bert\... etc.
3) Issue the command create_MMprime_matrix.exe
4) Enter the name of your input file (in the example this is input_ex.txt).
Linux usage:
1) Open a terminal window.
2) Change to the directory you want to be in.
3) Issue the command ./create_MMprime_matrix.exe
4) Enter the name of your input file (in the example this is input_ex.txt).
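If you want to run the program from a script instead of typing the file name by hand, the steps above can be sketched in Python as follows. This assumes that the program reads the input file name from standard input when it prompts for it, which the instructions do not state explicitly. The function name run_mmprime is made up for illustration.

```python
import subprocess


def run_mmprime(exe_path, info_file, cwd=None):
    """Start the program and answer its prompt with the input file name.

    Hypothetical helper. Assumes the program reads the file name from
    standard input, so it can be supplied through a pipe.
    """
    result = subprocess.run(
        [exe_path],
        input=info_file + "\n",   # what you would type at the prompt
        text=True,
        capture_output=True,
        cwd=cwd,                  # directory to run in, or None for the current one
        check=False,
    )
    return result.returncode, result.stdout


# Example call (not executed here, since it needs the compiled program):
#   code, output = run_mmprime("./create_MMprime_matrix.exe", "input_ex.txt")
#   print(output)
```

The progress messages are collected and returned only when the program ends, so for very long runs you may prefer to run the program directly in a terminal.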
To compile the program, use your favorite Fortran 95 compiler. We have successfully compiled the program using:
1) Compaq Visual Fortran on a Windows XP machine
2) Intel Fortran Compiler 10.1, on a machine with AMD Celeron processors using Red Hat Enterprise Linux v5
3) Absoft 64-bit Fortran 95 9.0 with Service Pack 1, on a machine with AMD Celeron processors using Red Hat Enterprise Linux v5
4) gfortran, the GNU compiler, on a machine with AMD Celeron processors using Red Hat Enterprise Linux v5
Input Files:
There are two files that have to be provided:
· File with some essential information to run the program:
1) location and name for the log file (if you put the name in "" you can use blanks and odd characters in it as well)
2) location and name for the file with the SNP data (if you put the name in "" you can use blanks and odd characters in it as well)
3) location and name to write the MMprime matrix to (if you put the name in "" you can use blanks and odd characters in it as well)
4) number of individuals in the dataset specified in 2)
5) number of SNP in the dataset specified in 2)
· The file with the SNP data. This file can be space, comma, or tab delimited. The columns are:
1) identifier for the individual
2) sex of the individual
3) Dx status of the individual
4) group to which the individual is assigned
5) through 4+n_snp) SNP calls, one column per SNP (0 = 1/1, 1 = 1/2, 2 = 2/2, 3 = 0/0 (missing))
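To check a SNP data file before a long run, the column layout above can be sketched as a small Python reader. This is an illustration only, not the way the Fortran program reads its data. The function name parse_snp_line is made up, and the sketch assumes that no field is empty and that identifiers contain no blanks or commas.

```python
import re

VALID_CALLS = (0, 1, 2, 3)   # 0 = 1/1, 1 = 1/2, 2 = 2/2, 3 = 0/0 (missing)


def parse_snp_line(line, n_snp):
    """Split one data line into the four leading columns and the SNP calls.

    Hypothetical helper. Fields may be separated by spaces, commas, or tabs.
    """
    fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
    if len(fields) != 4 + n_snp:
        raise ValueError(f"expected {4 + n_snp} columns, found {len(fields)}")
    ident, sex, dx, group = fields[:4]
    calls = [int(f) for f in fields[4:]]   # raises ValueError on non-numeric calls
    for position, call in enumerate(calls, start=1):
        if call not in VALID_CALLS:
            raise ValueError(f"SNP {position}: invalid call {call}")
    return ident, sex, dx, group, calls


# The same made-up record in comma and in tab delimited form.
print(parse_snp_line("ind1,1,2,A,0,1,2,3", n_snp=4))
print(parse_snp_line("ind1\t1\t2\tA\t0\t1\t2\t3", n_snp=4))
```

Running this over every line, and comparing the number of lines with the number of individuals given in the information file, catches most layout mistakes early.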
Example
Included is a small example with 28 individuals and 52 SNP. The input data is in input_ex.txt and ancestry_ex.txt, and the output you get should look like MMp_ex.txt and mmp_ex_log.txt.
The MMprime matrix output file (item 3) of the information file) is in the format that is used for GEM.
If you have questions please let me know at kleil at upmc dot edu
Bert Klei
Computational Genetics
WPIC-UPMC