Our group is interested in advancing precision medicine and health using big data analytics. We develop new algorithms and models for big data in biobanks and electronic health records. In particular, we seek insights from big data that are often not possible with smaller datasets.
Population genetics informatics
Modern biobanks hold genotypes for up to 0.1%-1% of an entire large population. At this scale, genetic relatedness among samples is ubiquitous, yet current methods are not efficient enough to uncover it. We developed ultra-efficient methods for detecting identical-by-descent (IBD) segments, a primary embodiment of genetic relatedness. Our RaPID method detects all IBD segments over a given length orders of magnitude faster than existing methods, while offering higher power, accuracy, and sharper IBD segment boundaries.
We believe that identifying IBD segments in population-scale cohorts is the first step towards constructing a population-scale genealogy, which will be a fundamental infrastructure for future human society.
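To make the notion of "IBD segments over a given length" concrete, the following is a minimal sketch of a naive all-pairs scan. It is not the RaPID algorithm: it simply reports maximal runs of identical alleles between every pair of haplotypes. The function names, the string encoding of haplotypes, and the site-count threshold `min_len` are assumptions made for illustration; real IBD detection also has to handle genotyping errors and typically measures length in genetic distance.

```python
from itertools import combinations


def shared_segments(hap_a, hap_b, min_len):
    """Return (start, end) pairs, end exclusive, for maximal runs of
    identical alleles between two haplotypes spanning >= min_len sites."""
    segments = []
    start = None  # start index of the current matching run, if any
    n = min(len(hap_a), len(hap_b))
    for i in range(n):
        if hap_a[i] == hap_b[i]:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                segments.append((start, i))
            start = None
    # Close a run that extends to the last site.
    if start is not None and n - start >= min_len:
        segments.append((start, n))
    return segments


def all_pairs_segments(haplotypes, min_len):
    """Naive scan over every pair of haplotypes (quadratic in sample count)."""
    results = {}
    for (i, hap_a), (j, hap_b) in combinations(enumerate(haplotypes), 2):
        segs = shared_segments(hap_a, hap_b, min_len)
        if segs:
            results[(i, j)] = segs
    return results


if __name__ == "__main__":
    # Toy haplotypes encoded as strings of 0/1 alleles (hypothetical data).
    panel = ["0101100110", "0101100010", "1110011001"]
    print(all_pairs_segments(panel, min_len=5))
```

Because this scan compares every pair of haplotypes at every site, its cost grows quadratically with the number of samples, which is the scaling problem that efficient methods for biobank-scale cohorts need to avoid.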
Representative publications
RaPID: ultra-fast, powerful, and accurate detection of segments identical by descent (IBD) in biobank-scale cohorts. A Naseri, X Liu, K Tang, S Zhang, D Zhi. Genome Biology 20 (1), 143. Link to code
Modeling of electronic health records (EHR) using deep learning
Patients’ health records and other health information are increasingly being collected and made available. This allows the development of representation models that describe the inherent health status and treatment history of a patient. With access to multiple EHR databases covering over 50 million patients, we develop deep learning methods to uncover the logic of medical practice and to help improve the efficiency of clinical care.
Representative publications
L Rasmy, Y Xiang, Z Xie, C Tao, D Zhi. “Med-BERT: pre-trained contextualized embeddings on large-scale structured electronic health records for disease prediction.” npj Digital Medicine 4, 86 (2021). arXiv
Imaging genetics using deep learning
We develop new deep learning (DL) based approaches for deriving new endophenotypes from imaging data and associating these endophenotypes with genetic data. By marrying deep learning with genome-wide association studies (GWAS), we can reveal new genes for Alzheimer’s disease and retinal development.
Representative publications
Modern bioinformatics using deep learning
Deep learning is a powerful paradigm for modeling the complex, multi-modality data encountered in modern biomedical research. We explore a variety of bioinformatics problems using deep learning approaches.
Representative publications
RaPID. Random Projection-based IBD Detection (RaPID).
pytorch_ehr. Open-source code for modeling EHR data, based on PyTorch.
gene2vec. Open-source code for gene embedding based on co-expression.
HapSeq2. Our method for genotype calling and phasing for WGS data.
msBayes. Statistical Quantification of Methylation Levels by Next-generation Sequencing.