Our group is interested in advancing precision medicine and health using big data analytics. We develop new algorithms and models for the big data in biobanks and electronic health records. In particular, we seek new insights from big data that are often not possible with smaller datasets.


Population genetics informatics

Modern biobanks include genotypes for up to 0.1% to 1% of an entire large population. At this scale, genetic relatedness among samples is inevitably ubiquitous. However, current methods are not efficient enough to uncover genetic relatedness at such a scale. We developed ultra-efficient methods for detecting identical-by-descent (IBD) segments, a primary embodiment of genetic relatedness. Our RaPID method detected all IBD segments above a given length orders of magnitude faster than existing methods, while offering higher power, higher accuracy, and sharper IBD segment boundaries.

We believe that identifying IBD segments in population-scale cohorts is the first step towards constructing a population-scale genealogy, which will be fundamental infrastructure for future human society.
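To make the random-projection idea behind IBD detection concrete, here is a toy Python sketch. It is not RaPID itself. It assumes phased haplotypes stored as equal-length lists of 0/1 alleles, keeps one randomly chosen site per window of consecutive sites, looks for long runs of identical projected windows between every pair of haplotypes, and reports a segment only where at least `min_hits` of `runs` independent projections agree. The brute-force pairwise comparison stands in for an efficient long-match scan (such as one based on the positional Burrows-Wheeler transform), and all function and parameter names (`detect_ibd`, `window`, `min_windows`, `runs`, `min_hits`) are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations


def project(haplotypes, window, rng):
    """Randomly keep one site per window of `window` consecutive sites.

    The same sites are kept for every haplotype, so projected sequences stay comparable.
    """
    n_sites = len(haplotypes[0])
    picks = [rng.randrange(start, min(start + window, n_sites))
             for start in range(0, n_sites, window)]
    return [[hap[site] for site in picks] for hap in haplotypes]


def runs_of_true(flags):
    """Yield half-open (start, end) intervals where flags[k] is True."""
    start = None
    for k, flag in enumerate(flags + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = k
        elif not flag and start is not None:
            yield start, k
            start = None


def long_matches(projected, min_windows):
    """Brute-force every pair; yield (i, j, start, end) for identical runs >= min_windows."""
    for i, j in combinations(range(len(projected)), 2):
        same = [x == y for x, y in zip(projected[i], projected[j])]
        for start, end in runs_of_true(same):
            if end - start >= min_windows:
                yield i, j, start, end


def detect_ibd(haplotypes, window=4, min_windows=5, runs=10, min_hits=3, seed=0):
    """Report (i, j, first_site, end_site) segments supported in >= min_hits runs."""
    rng = random.Random(seed)
    n_sites = len(haplotypes[0])
    n_windows = (n_sites + window - 1) // window
    votes = defaultdict(lambda: [0] * n_windows)  # per-pair, per-window vote counts
    for _ in range(runs):
        for i, j, start, end in long_matches(project(haplotypes, window, rng), min_windows):
            for w in range(start, end):
                votes[(i, j)][w] += 1
    segments = []
    for (i, j), counts in sorted(votes.items()):
        for start, end in runs_of_true([c >= min_hits for c in counts]):
            if end - start >= min_windows:
                segments.append((i, j, start * window, min(end * window, n_sites)))
    return segments


h0 = [k % 2 for k in range(40)]
h1 = h0[:24] + [1 - x for x in h0[24:]]  # shares sites 0-23 with h0
h2 = [1 - x for x in h0]                 # shares sites 24-39 with h1
print(detect_ibd([h0, h1, h2]))  # [(0, 1, 0, 24)]
```

Haplotypes 1 and 2 also share a stretch, but it spans only four windows, below `min_windows=5`, so it is ignored. Because each projection inspects just one site per window, a single run can miss mismatches at unsampled sites; requiring agreement across several random projections is what filters out such chance matches.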

Representative publications

Modeling of electronic health record (EHR) using deep learning

Patients’ health records and other health information are increasingly being collected and made available. This makes it possible to develop representation models that describe the inherent health status and treatment history of a patient. With access to multiple EHR databases covering over 50 million patients, we develop deep learning methods to uncover the logic of medical practice and help improve the efficiency of clinical care.
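Sequence models for EHR data typically consume each patient as a time-ordered list of visits, each visit being a set of medical codes. The minimal sketch below shows only that encoding step, assuming toy ICD-10-style code strings and zero-padding: it builds a code vocabulary and turns a batch of patients into padded integer arrays plus a visit mask that an embedding layer and a recurrent or attention model could then consume. The function names and data layout are hypothetical and are not the pytorch_ehr input format.

```python
def build_vocab(patients):
    """Assign each distinct code an integer index in order of first appearance.

    Index 0 is reserved for padding.
    """
    vocab = {}
    for visits in patients:
        for visit in visits:
            for code in visit:
                vocab.setdefault(code, len(vocab) + 1)
    return vocab


def encode(patients, vocab):
    """Return (batch, mask): batch[p][v] holds padded code indices; mask marks real visits."""
    max_visits = max(len(visits) for visits in patients)
    max_codes = max(len(visit) for visits in patients for visit in visits)
    batch, mask = [], []
    for visits in patients:
        # Pad each visit's code list to max_codes with the reserved index 0.
        rows = [[vocab[code] for code in visit] + [0] * (max_codes - len(visit))
                for visit in visits]
        # Pad missing visits with all-zero rows so every patient has max_visits rows.
        rows += [[0] * max_codes for _ in range(max_visits - len(visits))]
        batch.append(rows)
        mask.append([1] * len(visits) + [0] * (max_visits - len(visits)))
    return batch, mask


patients = [
    [["E11", "I10"], ["I10"]],  # patient 0: two visits
    [["J45"]],                  # patient 1: one visit
]
vocab = build_vocab(patients)
batch, mask = encode(patients, vocab)
print(vocab)  # {'E11': 1, 'I10': 2, 'J45': 3}
print(batch)  # [[[1, 2], [2, 0]], [[3, 0], [0, 0]]]
print(mask)   # [[1, 1], [1, 0]]
```

In a PyTorch model, `batch` would become an integer tensor passed to an embedding layer (for example `torch.nn.Embedding` with `padding_idx=0`), and `mask` would keep padded visits from influencing the learned patient representation.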

Representative publications

Imaging genetics using deep learning

We develop new deep learning (DL)-based approaches for deriving new endophenotypes from imaging data and associating these endophenotypes with genetic data. By marrying deep learning and genome-wide association studies (GWAS), we can reveal new genes for Alzheimer’s disease and retinal development.
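Once a deep learning model has produced an endophenotype (here assumed to be one number per individual, such as an image-derived feature), the association step of a GWAS can be sketched as a per-SNP regression of that value on allele dosage (0, 1, or 2 copies of the tested allele). The sketch below is a simplified, covariate-free additive model with a normal-approximation p-value. Real analyses also adjust for covariates such as age, sex, and ancestry principal components. The function names, SNP ids, and data are hypothetical.

```python
import math


def assoc_test(dosages, phenotype):
    """Regress phenotype on allele dosage (additive model, no covariates).

    Returns (beta, standard_error, p_value), or None when the SNP is monomorphic.
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0 or n < 3:
        return None  # no genotype variation (or too few samples) to estimate a slope
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotype))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(dosages, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        return beta, 0.0, 0.0  # perfect fit; no residual variance
    z = beta / se
    # Two-sided p-value from a normal approximation to the t distribution (large n).
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, p


def gwas_scan(genotypes, endophenotype):
    """Test every SNP; `genotypes` maps a SNP id to per-individual dosages."""
    return {snp: assoc_test(d, endophenotype) for snp, d in genotypes.items()}


endophenotype = [0.1, -0.1, 0.6, 0.4, 1.1, 0.9, 0.0, 0.5, 1.0, 0.5]
genotypes = {
    "snp_a": [0, 0, 1, 1, 2, 2, 0, 1, 2, 1],
    "snp_b": [0] * 10,  # monomorphic, returns None
}
for snp, result in gwas_scan(genotypes, endophenotype).items():
    print(snp, result)
```

Here `snp_a` yields a slope of about 0.5 with a very small p-value, while the monomorphic `snp_b` cannot be tested. At biobank scale this Python loop would be replaced by vectorized code or dedicated GWAS software.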

Representative publications

Modern bioinformatics using deep learning

Deep learning is a powerful paradigm for modeling the complex, multi-modal data that modern biomedical research faces. We explore a variety of bioinformatics problems using deep learning approaches.

Representative publications


RaPID. Random Projection-based IBD Detection (RaPID).

pytorch_ehr. Open-source code for modeling EHR data, based on PyTorch.

gene2vec. Open-source code for gene embedding based on co-expression.

HapSeq2. Our method for genotype calling and phasing for WGS data.

msBayes. Statistical Quantification of Methylation Levels by Next-generation Sequencing.



Ardalan Naseri

Research Scientist

  • Areas of Interest: Bioinformatics
  • CV
Bingyu Mao

PhD student

  • Areas of Interest: Deep learning, ODE
  • CV
Degui Zhi

Associate Professor

  • Areas of Interest: Genome Informatics, Statistical Genetics, Machine Learning
  • Links: Google Scholar
  • Software: HapSeq2
Khush Patel

MS student

  • Areas of Interest: Deep learning, Medical Imaging, Predictive modeling, Natural language processing
  • CV
Laila Rasmy Gindy Bekhet
Sheikh Muhammad-Saiful-Islam

PhD student

  • Areas of Interest: Deep learning in Bioinformatics
Victor Wang

Research Intern

  • Areas of Interest: Bioinformatics
  • CV
Wanheng Zhang

PhD student

  • CV
William Yue

Research Intern

  • Areas of Interest: Genome Informatics
  • Links: GitHub
Ziqian Xie

Postdoc (jointly with Rui Chen)

  • Areas of Interest: Deep learning

Past Members

Ginny (Jie) Zhu

PhD student

  • Areas of Interest: Machine learning, Health data science
  • CV
  • Links: LinkedIn
Jing Zhang

Graduate Student Researcher

  • Areas of Interest: Big data visualization
Mia Tran

Master's student

  • Areas of Interest: Deep learning model for EHR continuous variables
Ryan Lewis

Graduate Student Researcher

  • Areas of Interest: Population Genetics Informatics
  • Current position:
Soyeon Kim

Postdoc

  • Areas of Interest: Statistics
  • CV
Swati Goyal

Graduate Research Assistant

  • Areas of Interest: Analyzing Twitter data related to infectious diseases
  • CV
Bijie Bie

Postdoc

  • Areas of Interest: Communication studies, data science for social media
Guodong Wu

PhD student

  • Areas of Interest: Statistical Genetics, Penalized Regression
  • Year: 2009-2014
Samad Jahandideh

Postdoc

  • Areas of Interest: Structural Bioinformatics, Machine Learning
  • Links: Google Scholar
  • CV
Xin Geng

Postdoc

Xueyan (Snow) Zhao

Postdoc

  • Areas of Interest: Statistical Genetics, PheWAS
  • CV