OpenGWAS Phenotype Mapping

Work Group

Project Lead(s)

Project Status


Project Deliverable

Tool for ontology mapping; and mappings of OpenGWAS phenotypes

Collaborator Name

MRC Integrative Epidemiology Unit, University of Bristol

HMS Department

Center for Computational Biomedicine

Project Description

OpenGWAS is a database of genetic associations from over 40K Genome-Wide Association Study (GWAS) summary datasets, available for querying or download. The metadata for each GWAS dataset specify the phenotype under analysis in a free-text description. Phenotypes that appear conceptually equivalent are often described with different wording, for example: “Non-cancer illness code, self-reported: chronic sinusitis,” “Diagnoses – main ICD10: J32 Chronic sinusitis,” and “Chronic sinusitis.” These variations hinder search and may cause relevant datasets to be missed. One solution is to map phenotype descriptions to terms from relevant ontologies.
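To illustrate why the three sinusitis descriptions above can be treated as one phenotype, here is a minimal sketch of a purely lexical approach: strip source-specific prefixes, normalize case, and look the remainder up in a label dictionary. The prefix patterns, the `ONTOLOGY_LABELS` dictionary, the placeholder identifier `TERM:chronic_sinusitis`, and the functions `normalize` and `map_to_term` are all hypothetical; they are not part of OpenGWAS, EFO, or HPO, and a real mapping tool would need far more than exact string matching.

```python
import re
from typing import Optional

# Hypothetical prefix patterns, derived only from the example descriptions;
# a real tool would need a much larger, curated list.
PREFIX_PATTERNS = [
    r"^non-cancer illness code, self-reported:\s*",
    r"^diagnoses\s*[-–]\s*main icd10:\s*[a-z]\d+(\.\d+)?\s+",  # drops e.g. "J32 "
]

# Hypothetical label -> term dictionary. The identifier is a placeholder,
# not a real EFO or HPO ID.
ONTOLOGY_LABELS = {
    "chronic sinusitis": "TERM:chronic_sinusitis",
}


def normalize(description: str) -> str:
    """Lowercase the description and remove known source-specific prefixes."""
    text = description.strip().lower()
    for pattern in PREFIX_PATTERNS:
        text = re.sub(pattern, "", text)
    return text.rstrip(".").strip()


def map_to_term(description: str) -> Optional[str]:
    """Return the mapped term for an exact normalized-label match, else None."""
    return ONTOLOGY_LABELS.get(normalize(description))


if __name__ == "__main__":
    for d in [
        "Non-cancer illness code, self-reported: chronic sinusitis",
        "Diagnoses – main ICD10: J32 Chronic sinusitis",
        "Chronic sinusitis",
    ]:
        print(map_to_term(d))  # each prints TERM:chronic_sinusitis
```

Exact matching after normalization only catches the easy cases; descriptions that use synonyms or different granularity would still need fuzzy matching or manual curation, which is one reason the mapping is described as semi-automatic.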

Mapping phenotypes to ontologies would (1) facilitate the integration and meta-analysis of GWAS datasets indexed in OpenGWAS with GWAS datasets from other sources, and (2) enable ontology-based search. For example, a query for “Sinusitis” would return datasets for all more specific kinds of sinusitis, such as acute, chronic, or recurrent sinusitis.
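Ontology-based search works by expanding the query term to itself plus all of its descendants in the ontology's is-a hierarchy, then returning every dataset annotated with any term in that set. The sketch below assumes a toy hierarchy and toy dataset annotations; the labels, the `dataset-N` identifiers, and the functions `descendants_or_self` and `search` are hypothetical and do not correspond to real EFO terms or OpenGWAS dataset IDs.

```python
from collections import deque

# Toy is-a hierarchy (hypothetical labels): parent term -> direct children.
CHILDREN = {
    "sinusitis": ["acute sinusitis", "chronic sinusitis", "recurrent sinusitis"],
}

# Hypothetical annotations produced by phenotype mapping: dataset ID -> term.
ANNOTATIONS = {
    "dataset-1": "chronic sinusitis",
    "dataset-2": "acute sinusitis",
    "dataset-3": "asthma",
}


def descendants_or_self(term):
    """Breadth-first traversal collecting the term and all its descendants.

    The `seen` set prevents revisiting terms, which matters because real
    ontologies are DAGs in which a term can have several parents.
    """
    seen = {term}
    queue = deque([term])
    while queue:
        for child in CHILDREN.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def search(term):
    """Return, sorted, the datasets annotated with the term or any descendant."""
    terms = descendants_or_self(term)
    return sorted(ds for ds, t in ANNOTATIONS.items() if t in terms)


if __name__ == "__main__":
    print(search("sinusitis"))  # ['dataset-1', 'dataset-2']
```

A plain-text search for “Sinusitis” could miss descriptions such as “J32 Chronic sinusitis” phrased in other ways, whereas the expansion step relies only on the ontology structure once each dataset has been mapped to a term.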

The main objectives of this project are: (a) to develop an application to semi-automatically map plain-text phenotype descriptions to controlled terms from ontologies, and (b) to generate tentative mappings of the phenotypes in OpenGWAS metadata to terms from EFO, HPO, and other relevant ontologies.