What does the Gene Annotation File Format look like?

The Gene Annotation File Format (GFF):
To upload gene annotation files to easyGWAS, each file must follow a slightly modified version of the GFF file format, version 2. The fields in the file must be tab-separated. Empty fields should be denoted with a '.'. The file must have the file extension *.gff.
The file must have 9 columns with the following information:
- Chromosome ID -- This must be the same as in the MAP file
- Source, e.g. the source or version of the data
- Feature, e.g. chromosome, gene, CDS, exon ...
- Start, e.g. start position of the chromosome, gene, CDS, exon, ...
- End, e.g. end position of the chromosome, gene, CDS, exon, ...
- Score, not relevant for easyGWAS; use '.'
- Strand, use + (forward) or - (reverse) strand
- Frame, not relevant for easyGWAS; use '.'
- Attribute -- Two fields are needed, separated by a semicolon: ID, which is the unique ID of the chromosome or gene, and Name, which is the name of the gene or chromosome
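The column rules above can be checked line by line before uploading. The following is a minimal sketch, not part of easyGWAS: the function name `parse_gff_line` and the returned dictionary layout are hypothetical, and accepting '.' as a strand value is an assumption based on the chromosome definition lines, which use '.' in that column.

```python
def parse_gff_line(line):
    """Split one tab-separated GFF line into its 9 columns and check them.

    Hypothetical helper for illustration; easyGWAS may apply other checks.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    chrom, source, feature, start, end, score, strand, frame, attribute = fields

    # Start and end must be integer positions with start <= end.
    start, end = int(start), int(end)
    if start > end:
        raise ValueError(f"start {start} is greater than end {end}")

    # Assumption: '.' is accepted for features without a strand (e.g. chromosomes).
    if strand not in {"+", "-", "."}:
        raise ValueError(f"invalid strand: {strand!r}")

    # Attribute column: key=value pairs separated by semicolons.
    attributes = {}
    for pair in attribute.split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attributes[key.strip()] = value.strip()
    missing = {"ID", "Name"} - attributes.keys()
    if missing:
        raise ValueError(f"missing attribute field(s): {sorted(missing)}")

    return {
        "chrom": chrom, "source": source, "feature": feature,
        "start": start, "end": end, "score": score,
        "strand": strand, "frame": frame, "attributes": attributes,
    }
```

Calling `parse_gff_line` on every line of a file and collecting the raised `ValueError` messages gives a quick list of lines that need fixing.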
Each chromosome must be defined at the beginning in the following way:
Chr1 TAIR9 chromosome 1 30427671 . . . ID=Chr1;Name=Chromosome1
Chr2 TAIR9 chromosome 1 19698289 . . . ID=Chr2;Name=Chromosome2
Chr3 TAIR9 chromosome 1 23459830 . . . ID=Chr3;Name=Chromosome3
Chr4 TAIR9 chromosome 1 18585056 . . . ID=Chr4;Name=Chromosome4
Chr5 TAIR9 chromosome 1 26975502 . . . ID=Chr5;Name=Chromosome5
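If the chromosome lengths are known, the definition lines can be generated rather than typed by hand. This is a small sketch under stated assumptions: the function name `chromosome_definition_lines` and its input layout are hypothetical, and the values used in the usage note are taken from the example above. Note that the columns are joined with tab characters, as the format requires.

```python
def chromosome_definition_lines(chromosomes, source):
    """Build tab-separated chromosome definition lines.

    `chromosomes` is an iterable of (chromosome_id, length, name) tuples.
    Hypothetical helper for illustration only.
    """
    lines = []
    for chrom_id, length, name in chromosomes:
        fields = [
            chrom_id,            # Chromosome ID, must match the MAP file
            source,              # Source, e.g. the annotation version
            "chromosome",        # Feature
            "1",                 # Start
            str(length),         # End = chromosome length
            ".", ".", ".",       # Score, Strand, Frame are left empty
            f"ID={chrom_id};Name={name}",
        ]
        lines.append("\t".join(fields))
    return lines
```

For example, `chromosome_definition_lines([("Chr1", 30427671, "Chromosome1")], "TAIR9")` produces the first definition line shown above.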
Following the chromosome definition, all other features can be defined in the following way:
Chr1 TAIR9 gene 3631 5899 . + . ID=AT1G01010;Name=AT1G01010
Chr1 TAIR9 mRNA 3631 5899 . + . ID=AT1G01010.1;Name=AT1G01010.1
Chr1 TAIR9 protein 3760 5630 . + . ID=AT1G01010.1-Protein;Name=AT1G01010.1
Chr1 TAIR9 gene 5928 8737 . - . ID=AT1G01020;Name=AT1G01020
Chr1 TAIR9 mRNA 5928 8737 . - . ID=AT1G01020.1;Name=AT1G01020.1
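The ordering requirement can also be checked automatically. The sketch below assumes that "at the beginning" means every chromosome definition line precedes all other feature lines, and that lines starting with '#' are comments to be skipped; both are interpretations, not documented easyGWAS behaviour. The function name `chromosomes_defined_first` is hypothetical.

```python
def chromosomes_defined_first(gff_lines):
    """Return True if all chromosome definitions come before other features
    and every other feature refers to an already defined chromosome."""
    defined = set()
    seen_other_feature = False
    for line in gff_lines:
        # Assumption: blank lines and '#' comment lines are ignored.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, feature = fields[0], fields[2]
        if feature == "chromosome":
            if seen_other_feature:
                return False  # definition appears after a gene/mRNA/... line
            defined.add(chrom)
        else:
            seen_other_feature = True
            if chrom not in defined:
                return False  # feature on a chromosome that was never defined
    return True
```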
Detailed information about the GFF file format can be found here:
GFF/GTF File Format
Note: Genome annotation files can only be uploaded for existing species and genotypes. You first have to upload PED and MAP files. Chromosome identifiers in the GFF file must match the chromosome identifiers in the MAP file. No additional chromosome identifiers are allowed in the GFF file.
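The identifier rule in the note can be verified before uploading by comparing the first column of both files. This is a minimal sketch, assuming the MAP file is a whitespace-separated PLINK MAP file whose first column is the chromosome identifier; the function name `extra_gff_chromosomes` is hypothetical.

```python
def extra_gff_chromosomes(gff_lines, map_lines):
    """Return chromosome IDs that occur in the GFF lines but not in the MAP lines.

    An empty set means the GFF file introduces no additional identifiers.
    """
    # MAP: whitespace-separated, chromosome identifier in the first column.
    map_ids = {line.split()[0] for line in map_lines if line.strip()}
    # GFF: tab-separated, chromosome identifier in the first column.
    gff_ids = {
        line.split("\t")[0]
        for line in gff_lines
        if line.strip() and not line.startswith("#")
    }
    return gff_ids - map_ids
```

Both arguments can be open file objects, for example `extra_gff_chromosomes(open("annotation.gff"), open("genotypes.map"))`, where the two file names are placeholders.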