My research focuses on developing statistical methods, tools, and software for the analysis of genomics data, which often contain noise, missing values, and systematic biases. Specifically, my research addresses statistical and computational challenges in functional genomics, such as the pre-processing, normalization, and analysis of raw high-throughput data (microarray and next-generation sequencing), with the goal of improving the quantification and understanding of biological variability. Some of my previous and current work has involved statistical methods, including EM algorithms, for inferring latent variables from genomic data. Here are a few software and data packages that I have authored:
- R/methylCC: R package available on GitHub that estimates the cell composition of whole blood from DNA methylation samples measured on microarray or sequencing platforms.
- R/qsmooth: R package available on GitHub that implements a generalization of quantile normalization, referred to as smooth quantile normalization (qsmooth), which assumes that the statistical distribution of each sample should be the same (or have the same shape) within biological groups or conditions.
- R/quantro: R package available on Bioconductor that tests for global differences between groups of distributions, which helps determine whether quantile normalization is appropriate.
- R/quantroSim: R package supporting quantro that simulates gene expression and DNA methylation data.
- R/explainr: R package that translates S3 objects into text using standard templates.
- postMUT: A tool implemented in Perl and R to predict the functionality of missense mutations.
- trapnell2014myoblasthuman: R data package containing an ExpressionSet object from Trapnell et al. (2014), who performed a time-series experiment with bulk and single-cell RNA-Seq at four time points in differentiating primary human myoblasts.
- patel2014gliohuman: R data package containing a SummarizedExperiment object from Patel et al. (2014) with single-cell and bulk RNA-Seq data on five human glioblastoma tumors.
- bodymapRat: R data package containing an ExpressionSet from Yu et al. (2013), who generated a rat RNA-Seq BodyMap across 11 organs and 4 developmental stages (PMID: 24510058).
- colonCancerWGBS: coverage (.cov) files produced by Bismark after aligning six paired tumor-normal WGBS samples from Ziller et al. (2013; PMID: 23925113), restricted to chromosome 22.
- myAffyData: AffyBatch object from an experiment using P493-6 cells expressing low or high levels of c-Myc. Data from Lovén et al. (2012), Cell 151: 476-482.
- BackgroundExperimentYeast: AffyBatch object from an experiment to measure nonspecific binding (NSB) and optical noise in yeast.

Here are a few talks I have given:

- Setting the Stage for Reproducibility and Replicability in Science. Presented at Brandeis University, March 22, 2017.
- On the widespread and critical impact of systematic bias and batch effects in single-cell RNA-seq data. Presented at the Boston Single-Cell Network Meeting (Boston, MA, USA) in March 2016, the Joint Statistical Meetings (Chicago, IL, USA) in August 2016, and the Single-Cell Genomics Conference (Hinxton, UK) in September 2016.
- Why Statistics Matters in the Analysis of Genomics Data (YouTube video). Presented at the LSU Computational Biology seminar and the LSUConnect event in February 2015.
- Normalization of DNA Methylation and Gene Expression Data in the Context of Global Variation. Presented at the Bioinformatics Meeting, Division of Immunology, Harvard Medical School, in September 2014.
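
For readers less familiar with quantile normalization, the technique that qsmooth generalizes and that quantro helps evaluate, here is a minimal pure-Python sketch. Each sample's values are replaced by the average quantile function across samples, so every normalized sample ends up with the same distribution. The optional `groups` argument is a simplified illustration of the within-group assumption described for qsmooth (forcing samples to share a distribution only with samples from the same condition). It is not the qsmooth algorithm itself. The function name `quantile_normalize` and its signature are hypothetical and do not reflect the API of the R packages. Ties are broken by position here, which real implementations may handle differently.

```python
from statistics import mean


def quantile_normalize(samples, groups=None):
    """Quantile-normalize equal-length samples (lists of floats).

    Hypothetical helper for illustration only. If `groups` (one label per
    sample) is given, normalization is done separately within each group.
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same length")
    if groups is None:
        groups = [0] * len(samples)  # treat all samples as one group
    out = [None] * len(samples)
    for g in set(groups):
        idx = [i for i, label in enumerate(groups) if label == g]
        # Reference distribution: mean of the sorted values at each rank,
        # i.e. the average quantile function of the samples in this group.
        sorted_vals = [sorted(samples[i]) for i in idx]
        reference = [mean(col) for col in zip(*sorted_vals)]
        for i in idx:
            # order[r] is the position of the r-th smallest value
            # (stable sort, so ties keep their original order).
            order = sorted(range(n), key=lambda j: samples[i][j])
            normalized = [0.0] * n
            for r, j in enumerate(order):
                normalized[j] = reference[r]
            out[i] = normalized
    return out


if __name__ == "__main__":
    x = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0], [3.0, 4.0, 8.0]]
    print(quantile_normalize(x))
    print(quantile_normalize(x, groups=["a", "a", "b"]))
```

With `groups=["a", "a", "b"]`, the third sample is alone in its group and is left unchanged. This mirrors why normalizing across all samples can erase real between-group differences, the situation quantro is designed to detect.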