What do the Phenotype and Covariate PLINK file formats look like?
The Phenotype and Covariate PLINK File Formats:
easyGWAS requires Phenotype and Covariate data to be uploaded in PLINK [1] format. All public datasets available for download from easyGWAS are also in PLINK format.
Example files can be downloaded here.
The Phenotype and Covariate file formats are identical. The first two columns of each file are the Family ID (FID) and the Sample ID (IID), followed by one column per phenotype (or, in a Covariate file, one column per covariate).
If the Family ID is unknown, use the Sample ID in its place. For each sample, the Sample ID in the Phenotype or Covariate file must match the Sample ID in the Genotype file.
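A minimal Python sketch of these two rules, assuming the Genotype file's Sample IDs are already available as a list of strings (for PLINK binary data they would typically come from the second column of the .fam file). The helper names are hypothetical and are not part of easyGWAS.

```python
def fill_family_ids(rows):
    """Use the Sample ID (IID) as Family ID (FID) wherever the FID is unknown (None or empty)."""
    return [(fid if fid else iid, iid) for fid, iid in rows]


def unmatched_samples(pheno_iids, genotype_iids):
    """Return the Phenotype/Covariate Sample IDs that have no match in the Genotype data."""
    known = set(genotype_iids)
    return [iid for iid in pheno_iids if iid not in known]


print(fill_family_ids([(None, "4304"), ("6925", "6925")]))
# [('4304', '4304'), ('6925', '6925')]
print(unmatched_samples(["4304", "6925", "9999"], ["4304", "6925", "7319"]))
# ['9999']
```

Samples reported by `unmatched_samples` would have no genotype data to pair with, so it is worth fixing their IDs before uploading.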
Note: The Phenotype and Covariate files must have a header line whose first two elements are always FID and IID, followed by the phenotype (or covariate) names.
Missing measurements are represented by the keyword nan. Fields must be separated by tabs or other whitespace.
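To check a file against these rules before uploading, the following minimal Python sketch reads the header, splits each line on any run of tabs or spaces, and turns nan into a missing value. It assumes all phenotype and covariate values are numeric; the function name is hypothetical and this is not easyGWAS's own parser. Note that Python's `float()` also accepts spellings such as "NaN", so it is slightly more permissive than the keyword named above.

```python
import math


def parse_plink_table(text):
    """Parse Phenotype/Covariate file text into (column names, {(FID, IID): values})."""
    # str.split() with no argument splits on any run of tabs and/or spaces.
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows:
        raise ValueError("file is empty")
    header = rows[0]
    if header[:2] != ["FID", "IID"]:
        raise ValueError("header line must start with FID and IID")
    data = {}
    for fields in rows[1:]:
        if len(fields) != len(header):
            raise ValueError(f"expected {len(header)} columns, got {len(fields)}: {fields}")
        fid, iid, *values = fields
        # float("nan") turns the missing-value keyword into a NaN float.
        data[(fid, iid)] = [float(v) for v in values]
    return header[2:], data


example = "FID IID PhenotypeName1 PhenotypeName2\n6963\t6963 6.6 nan\n"
names, data = parse_plink_table(example)
print(names)                                  # ['PhenotypeName1', 'PhenotypeName2']
print(math.isnan(data[("6963", "6963")][1]))  # True
```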
Here is a brief example of a Phenotype file containing 5 samples and two phenotypes with the names "PhenotypeName1" and "PhenotypeName2":
FID IID PhenotypeName1 PhenotypeName2
4304 4304 6.0 5.4
6925 6925 6.0 3.2
7319 7319 6.3 3.3
6963 6963 6.6 nan
6968 6968 nan 9.8
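The same layout can also be generated from data held in a script. This minimal Python sketch (the function name is hypothetical) writes one space-separated row per sample below the header line, emits nan for missing values (given as `None` or NaN), and uses Unix "\n" line endings. With the data from the five-sample example above, it reproduces that file exactly.

```python
import math


def format_plink_table(names, samples):
    """Render (FID, IID, values) records as a header line plus one space-separated row per sample.

    Missing values (None or NaN) are written as the keyword nan.
    """
    lines = [" ".join(["FID", "IID", *names])]
    for fid, iid, values in samples:
        if len(values) != len(names):
            raise ValueError(f"sample {iid}: expected {len(names)} values, got {len(values)}")
        cells = ["nan" if v is None or math.isnan(v) else str(v) for v in values]
        lines.append(" ".join([fid, iid, *cells]))
    return "\n".join(lines) + "\n"


samples = [
    ("4304", "4304", [6.0, 5.4]),
    ("6925", "6925", [6.0, 3.2]),
    ("7319", "7319", [6.3, 3.3]),
    ("6963", "6963", [6.6, None]),
    ("6968", "6968", [None, 9.8]),
]
print(format_plink_table(["PhenotypeName1", "PhenotypeName2"], samples), end="")
```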
Uploading Phenotype or Covariate Files: File names must end with .pheno for a Phenotype file or with .cov for a Covariate file.
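A small Python sketch of this naming rule, useful for checking a batch of files before upload. The function name is hypothetical, and the sketch assumes the extension check is case-sensitive, which the text does not specify.

```python
from pathlib import Path


def upload_kind(filename):
    """Classify an upload by its final extension: .pheno -> phenotype, .cov -> covariate."""
    suffix = Path(filename).suffix  # e.g. ".pheno" for "flowering_time.pheno"
    if suffix == ".pheno":
        return "phenotype"
    if suffix == ".cov":
        return "covariate"
    raise ValueError(f"{filename!r} must end with .pheno or .cov")


print(upload_kind("flowering_time.pheno"))  # phenotype
print(upload_kind("population.cov"))        # covariate
```

A name such as "data.pheno.txt" is rejected, because only the last extension counts.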
Detailed information about the PLINK file formats can be found on PLINK's web page.
Please note: If you use Windows or Excel to create your phenotype or covariate files, you have to save the data as "Windows Formatted Text". Windows uses different newline characters (CR LF) than Unix (LF), so if the file is saved in the wrong format, easyGWAS may have trouble parsing it!
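Because these parsing problems come from line endings, it can help to check which convention a file actually uses before uploading it. This minimal Python sketch (the function name is hypothetical) classifies the raw bytes of a file as CR LF, LF, lone CR, or a mixture. It only reports what it finds and does not change the file.

```python
def newline_style(data: bytes) -> str:
    """Report the line-ending convention(s) found in raw file bytes."""
    crlf = data.count(b"\r\n")
    lone_cr = data.count(b"\r") - crlf  # CR not followed by LF
    lone_lf = data.count(b"\n") - crlf  # LF not preceded by CR
    found = [
        name
        for name, count in (("CRLF (Windows)", crlf), ("LF (Unix)", lone_lf), ("CR (classic Mac)", lone_cr))
        if count
    ]
    if not found:
        return "none"
    return found[0] if len(found) == 1 else "mixed: " + ", ".join(found)


print(newline_style(b"FID IID P1\r\n4304 4304 6.0\r\n"))  # CRLF (Windows)
print(newline_style(b"FID IID P1\n4304 4304 6.0\n"))      # LF (Unix)
print(newline_style(b"FID IID P1\r4304 4304 6.0\r"))      # CR (classic Mac)
```

For a file on disk, pass `pathlib.Path("my_file.pheno").read_bytes()`, where the file name is a placeholder.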
References
[1] Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR,
Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ & Sham PC (2007)
PLINK: a toolset for whole-genome association and population-based
linkage analysis. American Journal of Human Genetics, 81.