Director: Rafael Goncalves, PhD

Artificial Intelligence applications require Knowledge Representation and Reasoning to encode knowledge in a way that computers can reason with. The Knowledge Representation group at CCB develops knowledge representation and reasoning solutions to facilitate the discovery, integration and meta-analysis of biomedical data originating from diverse sources. These data include electronic health records, healthcare insurance claims, genomics data, and environmental exposure data, among others. Integrating these data will support large-scale epidemiological research in precision medicine, healthcare, and basic science, all with the goal of improving patient outcomes.

Our current projects revolve around the use of ontologies: computational artifacts that provide standardized symbols to represent entities such as phenotypes, diseases, and gene products, and that describe the relationships between those entities. We adopt an open-development model in our projects, and all the resources that we develop are open source. Learn about our ongoing and previous projects below!

Questions about CCB's Knowledge Representation projects can be sent to Rafael Goncalves, PhD, Director, Knowledge Representation.

Contact our Center Administrator to ask about how CCB can help with your project!


  • OpenGWAS

    OpenGWAS is a database containing billions of genetic associations derived from Genome-Wide Association Studies (GWAS), aggregated from dozens of sources such as FinnGen and the UK Biobank. Because these sources represent phenotypes differently, our work with OpenGWAS involves standardizing the phenotypes with ontology descriptions to facilitate data integration. Our goal is to improve the reusability of GWAS data and to facilitate its meta-analysis.

  • 23andMe

    23andMe has an enormous amount of genome-wide association data for numerous phenotypes. Our work with 23andMe involves standardizing the phenotypes in GWAS metadata with suitable symbols from ontologies. Our goal is to facilitate the integration and meta-analysis of 23andMe’s GWAS data alongside other publicly available GWAS data, such as those from OpenGWAS or the UK Biobank.


  • Harvey Mudd College Computer Science Clinic

    Harvey Mudd College Computer Science Clinic is a program that brings together students with academic or industry sponsors. Our work with the 2021/22 Harvey Mudd CS Clinic team involves research and development of methods and software to facilitate the mapping of free-text biomedical concepts to ontology terms. We are developing a tool that relies on NLP techniques, ontology semantics and automated reasoning to semi-automatically map concepts in bulk. The tool will include rich user interfaces to support user interaction, with the goal of being deployed and used by scientists in data normalization/curation workflows.
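
As a rough illustration of what mapping free-text concepts to ontology terms in bulk can look like, here is a minimal Python sketch. It is not the tool described above: it swaps in plain string similarity from the standard library for the NLP techniques, ontology semantics and automated reasoning that the project relies on. The ontology fragment, the `EX:` identifiers, the function names and the `min_score` threshold are all hypothetical and chosen only for the example.

```python
from difflib import SequenceMatcher

# Hypothetical ontology fragment: identifier -> preferred label and synonyms.
# The "EX:" identifiers are placeholders, not terms from a real ontology.
TOY_ONTOLOGY = {
    "EX:0001": ["asthma", "bronchial asthma"],
    "EX:0002": ["type 2 diabetes mellitus", "type II diabetes", "T2D"],
    "EX:0003": ["hypertension", "high blood pressure"],
}


def normalize(text):
    """Lowercase, turn hyphens into spaces, and collapse whitespace."""
    return " ".join(text.lower().replace("-", " ").split())


def similarity(a, b):
    """Character-level similarity in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def map_terms(source_terms, ontology, min_score=0.8):
    """Map each free-text term to its best-scoring ontology identifier.

    Returns {term: (identifier or None, score)}. Terms whose best score is
    below min_score are left unmapped so that a curator can review them.
    """
    mappings = {}
    for term in source_terms:
        best_id, best_score = None, 0.0
        for term_id, labels in ontology.items():
            score = max(similarity(term, label) for label in labels)
            if score > best_score:
                best_id, best_score = term_id, score
        if best_score < min_score:
            best_id = None  # not confident enough; flag for manual curation
        mappings[term] = (best_id, round(best_score, 2))
    return mappings


if __name__ == "__main__":
    phenotypes = ["High blood pressure", "Asthma", "broken leg"]
    for term, (term_id, score) in map_terms(phenotypes, TOY_ONTOLOGY).items():
        print(f"{term!r} -> {term_id} (score {score})")
```

Leaving low-scoring terms unmapped, rather than forcing a best guess, reflects the semi-automatic workflow described above, in which a scientist reviews uncertain mappings through a user interface.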