DFLAT

Developmental FunctionaL Annotation at Tufts

Data:

GO Annotation

Our GO annotation is available here on its own, or as part of the full set of current GO annotation distributed through the GOA.

GONE

The GONE (Gene Ontology Non-Eligible) database is where we store annotations that are relevant to our research but do not quite meet GOA's standards. An annotation usually falls into this category for one of the following reasons: the gene/protein described is a family of genes/proteins rather than a specific one; there is no UniProt ID to identify the gene/protein in the system; no GO term yet exists to describe the particular function, process, or location of the gene/protein; the species is not clearly identifiable in the paper; or the evidence is less reliable (GO evidence codes TAS and NAS). As individual annotations, these are more suspect than current GO annotation. However, for functional analysis of expression data, gene sets that include them can be valuable even with a certain amount of noise.

Orthologous annotation

We have created orthologous annotation for human genes by mapping annotations from mouse genes annotated to the development branch of the GO.

Gene sets for GSEA

All human annotation was downloaded from GOA on October 22, 2017 and supplemented with manually curated DFLAT/GONE annotation and inferred orthologous annotation. Gene sets for biological process, molecular function, cellular component, and all three ontologies combined are in GMT format for use in Gene Set Enrichment Analysis (GSEA).
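For readers unfamiliar with GMT, each line of a GMT file holds one gene set: the set name, a description, and then the member genes, all separated by tabs. The following is a minimal sketch of writing such a file in Python. The function name write_gmt, the choice to sort sets and genes, and the example gene sets are illustrative assumptions, not the scripts used to produce the DFLAT files.

```python
import io


def write_gmt(gene_sets, handle, descriptions=None):
    """Write gene sets as GMT lines: name, description, then genes (tab-separated).

    gene_sets maps a set name (here a GO term ID) to a collection of gene symbols.
    descriptions optionally maps a set name to free text; "NA" is used otherwise.
    """
    descriptions = descriptions or {}
    for name in sorted(gene_sets):
        genes = sorted(gene_sets[name])
        if not genes:
            continue  # an empty gene set carries no information for GSEA
        description = descriptions.get(name, "NA")
        handle.write("\t".join([name, description, *genes]) + "\n")


# Illustrative data only; not taken from the DFLAT gene sets.
example_sets = {
    "GO:0007507": {"NKX2-5", "GATA4"},
    "GO:0001822": {"PAX2"},
}
buffer = io.StringIO()
write_gmt(example_sets, buffer, {"GO:0007507": "heart development"})
print(buffer.getvalue(), end="")
```

Passing an open file object instead of the in-memory buffer would write the same lines to disk.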

To create orthologous annotation, a list of all annotations made to mouse genes annotated to the development branch of the GO was obtained from MGI. Of these, only annotations from the development branch with experimental evidence codes (EXP, IDA, IPI, IMP, IGI, and IEP) were considered. If a mouse gene with such annotations had a unique human ortholog according to HGNC/MGI, each mouse annotation was transferred to the human ortholog and given an ISS evidence code.
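The ortholog transfer described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual script: the input is assumed to be already restricted to the development branch, "unique" is read as "exactly one human ortholog listed for the mouse gene", and the function name, data layout, and example genes (including the placeholder GeneX) are hypothetical.

```python
# Experimental evidence codes listed in the text above.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def transfer_to_human(mouse_annotations, mouse_to_human):
    """Map experimentally supported mouse annotations onto unique human orthologs.

    mouse_annotations: iterable of (mouse_gene, go_term, evidence_code) tuples,
        assumed to be limited to development-branch terms already.
    mouse_to_human: dict from mouse gene to the set of its human orthologs
        (hypothetical layout for an HGNC/MGI ortholog table).
    Returns a sorted list of (human_gene, go_term, "ISS") tuples.
    """
    transferred = set()
    for mouse_gene, go_term, evidence in mouse_annotations:
        if evidence not in EXPERIMENTAL_CODES:
            continue  # keep experimental evidence only
        orthologs = mouse_to_human.get(mouse_gene, set())
        if len(orthologs) != 1:
            continue  # skip genes with no ortholog or an ambiguous mapping
        (human_gene,) = orthologs
        transferred.add((human_gene, go_term, "ISS"))
    return sorted(transferred)


# Illustrative input; GeneX and its orthologs are placeholders.
mouse_annotations = [
    ("Nkx2-5", "GO:0007507", "IMP"),
    ("Gata4", "GO:0007507", "IEA"),  # dropped: not an experimental code
    ("GeneX", "GO:0001822", "IDA"),  # dropped: two candidate orthologs
]
mouse_to_human = {
    "Nkx2-5": {"NKX2-5"},
    "Gata4": {"GATA4"},
    "GeneX": {"GENEX1", "GENEX2"},
}
human_annotations = transfer_to_human(mouse_annotations, mouse_to_human)
print(human_annotations)
```

Whether uniqueness should also be enforced in the human-to-mouse direction is not stated in the text, so the sketch checks only the mouse-to-human direction.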

Gene sets were then generated automatically from the combined input data, including the orthologs. Only annotations with IDA, IPI, IMP, IGI, IEP, ISS, and TAS evidence codes were used for gene set construction. Annotations with the qualifier NOT were excluded. All genes annotated to a given GO term were also annotated to all ancestors of the term in the GO tree, following only is_a and part_of relationships in the Gene Ontology, for maximal consistency with the methods used by the GOA team. Separate gene sets were created for the biological process, molecular function, and cellular component branches of the GO tree.
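As a rough illustration of the construction steps above (evidence-code filter, exclusion of NOT annotations, and propagation to ancestor terms), consider the following sketch. It assumes the ontology has already been reduced to a parent map containing only is_a and part_of edges, and that qualifiers are pipe-separated strings as in GAF files. The function names, tuple layout, and term IDs are hypothetical, and running it once per ontology branch would give the separate gene sets.

```python
from collections import defaultdict

# Evidence codes accepted for gene set construction, as listed in the text above.
GENE_SET_CODES = {"IDA", "IPI", "IMP", "IGI", "IEP", "ISS", "TAS"}


def ancestors(term, parents):
    """Return every ancestor of term reachable through the parent map.

    parents maps a term to its direct parents; it is assumed to contain
    only is_a and part_of edges.
    """
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def build_gene_sets(annotations, parents):
    """Build term -> set of genes from (gene, term, evidence, qualifier) tuples."""
    gene_sets = defaultdict(set)
    for gene, term, evidence, qualifier in annotations:
        if evidence not in GENE_SET_CODES:
            continue  # evidence code not accepted for gene sets
        if "NOT" in qualifier.split("|"):
            continue  # negated annotations are excluded
        for target in {term} | ancestors(term, parents):
            gene_sets[target].add(gene)
    return dict(gene_sets)


# Hypothetical three-level ontology and annotations.
parents = {"T:leaf": ["T:mid"], "T:mid": ["T:root"]}
annotations = [
    ("GENE_A", "T:leaf", "IMP", ""),
    ("GENE_B", "T:mid", "ISS", ""),
    ("GENE_C", "T:leaf", "IEA", ""),     # dropped: evidence code not accepted
    ("GENE_D", "T:leaf", "IDA", "NOT"),  # dropped: NOT qualifier
]
gene_sets = build_gene_sets(annotations, parents)
for term in sorted(gene_sets):
    print(term, sorted(gene_sets[term]))
```

The iterative traversal avoids recursion limits on deep ontologies, and the seen set ensures each ancestor is visited once even when a term has multiple paths to the root.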

Scripts

Our gene sets can be recreated from scratch by following these instructions. A batch script is also available for running GSEA on multiple data sets.

Supplemental Data

Older versions of our data are archived and can be downloaded from the following links:

· Our gene sets generated from GOA downloaded on February 19, 2016: biological process, molecular function, cellular component, all three ontologies combined

· Our gene sets generated from GOA downloaded on April 10, 2013 (published with our BMC 2014 paper): biological process, molecular function, cellular component, all three ontologies combined

· Our gene sets generated from GOA downloaded in March 2011: biological process, molecular function, cellular component, all three ontologies combined

We also include here the GSEA data sets and class files for the five developmental data sets featured in our submitted DFLAT paper, as well as the supplementary data from our PSB 2011 paper on gene set mining.