CCB is here to support your data science and computational needs. If you are interested in collaborating or need related assistance, please email the details to

Additional information can be found on our Collaboration Page.



Project Title Work Group(s) Status Collaborator Project Deliverables
OpenGWAS Phenotype Mapping
  • Knowledge Representation
Ongoing MRC Integrative Epidemiology Unit, University of Bristol Tool for ontology mapping, and mappings of OpenGWAS phenotypes
23andMe Phenotype Mapping
  • Knowledge Representation
Ongoing 23andMe Mappings of 23andMe GWAS phenotypes
Benchmark of Ontology Mapping Tools
  • Knowledge Representation
Ongoing Kohane Lab, Department of Biomedical Informatics, Harvard Medical School Survey and benchmark test of ontology mapping tools
Programmatic Interface to HuBMAP Ontology
  • Knowledge Representation
Ongoing Cyberinfrastructure for Network Science Center, Indiana University Tool for programmatic interaction with HuBMAP ontology
Single-cell characterization of acute inflammation in patients with COVID-19
  • Computational Biology
Ongoing Jonathan Kagan, PhD 1) An assessment of the quality of the scRNA-seq data obtained for the first 10 samples; 2) a characterization of the cell type composition of the first 10 samples; and 3) a statistical analysis of differential cell type abundance and investigation of T-cell receptor sequencing data and cytokine measurements between samples for defined contrasts.
Transparency In Coverage
  • Data & Analytics Platforms
Ongoing Mike Chernew, PhD, Healthcare Policy Data warehouse for insurance cost-sharing data
HSDM Research Data Repository
  • Data & Analytics Platforms
Ongoing Jane Barrow Data warehouse containing copy of production EHR data to support research activities
MERFISH Mouse Brain Data Viewer
  • Computational Biology
Ongoing Jeffrey Moffitt, PhD Posit Connect interface that enables interactive exploration of the data
Single-cell atlas of human variation in hematopoietic tissue
  • Computational Biology
Ongoing Allon Klein, PhD Processing, analysis, annotation, visualization, and exploration of a large single-cell hematopoiesis reference dataset
Harvard School of Dental Medicine - Dental Project
  • Computational Biology
Ongoing Jennifer L Gibbs, PhD Analysis of data from Dr. Gibbs’ group to identify relationships between psychosocial and demographic variables and cytokine features that can explain pain resolution.
Research Design and Analysis - R Component
  • Education
Ongoing Catherine Hayes, DMD, SM, DMSc R programming workshop as a supplementary component to the Research Design and Analysis Course
Identification of cell-cell interactions from high-resolution spatial transcriptomics data
  • Computational Biology
Ongoing Martin Hemberg, PhD Develop an open-source Python package and publish it on the Python Package Index (PyPI)
C. elegans Database
  • Data & Analytics Platforms
Complete Max Heiman, PhD Database
Proteome-scale protein-protein interaction networks from the BioPlex project
  • Computational Biology
Complete Steven Gygi, PhD and Wade Harper, PhD R & Python packages
RNA sequencing atlas of vascular endothelial cells
  • Computational Biology
Complete Ulrich von Andrian, PhD O2-based RNA-seq pipeline & interactive data exploration platform
Harvey Mudd College CS Clinic 21/22
  • Knowledge Representation
Complete Harvey Mudd College Components of ontology mapping tool
Multiplexed Error Robust Fluorescence in Situ Hybridization (MERFISH)
  • Computational Biology
Complete Jeffrey Moffitt, PhD R/Bioconductor package; repository of applications and downstream analyses; and interactive gallery of publicly available MERFISH datasets
AlphaFold & ColabFold
  • Computational Biology
Complete Research Computing and Edward Huttlin, PhD Modules on the O2 HPC cluster
Drugging the undruggable – machine-learning-based cancer immunotherapy design
  • Computational Biology,
  • Data & Analytics Platforms
Complete Ming-Ru Wu, MD, PhD Promoter visualization platform
Designmatch Container
  • Data & Analytics Platforms
Complete José R. Zubizarreta, PhD Develop a Docker image to collect and install all the necessary resources to run Designmatch in R.
Leveraging geographic information systems for spatial transcriptomics
  • Data & Analytics Platforms,
  • Computational Biology
Complete Harvey Mudd College Implement a GIS database backend to represent and analyse spatial transcriptomics data
Whole-genome sequencing analysis of fluoroquinolone resistance acquisition in Mycobacterium tuberculosis
  • Computational Biology
Complete Maha Farhat, MD MSc Data wrangling to construct a fully processed, analysis-ready large WGS MTB sample; constructed phylogenies that relate thousands of MTB isolates from different genetic backgrounds; dated phylogenies and identification of key mutations based on phylogeny dating tools; and identification of the emergence of FLQ antibiotic resistance over time and geography
Thyroid hormone influence on the brain
  • Computational Biology
Complete Bernardo Sabatini, PhD 1) Gene expression quantification and quality control for 16 bulk RNA-seq samples obtained from 8 mice; 2) differential expression analysis for defined contrasts; and 3) an exploration of alternative splicing for the ROBO3 gene
Computational pipeline for whole-genome sequencing analysis of yeast strains
  • Computational Biology
Complete Fred Winston, PhD Computational pipeline for whole-genome sequencing data analysis of yeast strains on HMS’ O2 cluster.
Parabiosis single-cell data viewer
  • Computational Biology
Complete Lee Rubin, PhD Posit Connect interface that enables interactive exploration of the data