*Cancelled* - Hierarchical, non-parametric Bayesian clustering of digital gene expression data
Fri 26 April 2013, 14:15
Dimitris Vavoulis
Bristol
Organisers: Nick Whiteley, Feng Yu
ABSTRACT
Next-generation sequencing provides a revolutionary tool for studying gene expression. When applied to an RNA sample, it produces a library of millions of sequence tags, which constitute a fundamentally discrete measure of gene expression. Typically, data of this type is characterised by over-dispersion and a low number of biological replicates, which poses new challenges for its statistical analysis. An important aspect of this analysis is identifying interesting patterns in the data by partitioning it into clusters.
We present a non-parametric Bayesian clustering method that combines the Negative Binomial distribution, used to model over-dispersed count data, with the Hierarchical Dirichlet Process (HDP). We also present an associated blocked Gibbs sampling algorithm for inference in this model. The HDP permits information sharing within and between samples, compensating for the low number of replicates and enabling robust estimation of the mean and dispersion parameters of each mixture component. Moreover, because the model is an infinite mixture, no a priori information about the number of clusters is required; this number is estimated together with the remaining model parameters. A paper summarising an early version of this clustering methodology is available at http://arxiv.org/abs/1301.4144
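The abstract does not give the model equations, so the following Python (NumPy only) is an illustrative sketch under stated assumptions, not the speaker's method. It assumes a single-level Dirichlet process truncated at K components (not the full hierarchical model), a Negative Binomial parameterised by a mean mu and dispersion phi so that Var = mu + phi * mu**2, and it shows only two conditional updates of a standard truncated stick-breaking blocked Gibbs sampler: cluster labels given the weights, and weights given the labels. All function names, the truncation level `K`, the concentration `alpha` and the simulated data are hypothetical.

```python
import math

import numpy as np

# Elementwise log-gamma built from the standard library (scipy.special.gammaln is an alternative).
_lgamma = np.vectorize(math.lgamma, otypes=[float])


def _sticks_to_weights(v):
    """Convert stick proportions v_k into weights w_k = v_k * prod_{j<k} (1 - v_j)."""
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


def stick_breaking_weights(alpha, K, rng):
    """Prior draw of K weights from a Dirichlet process truncated at K components."""
    v = rng.beta(1.0, alpha, size=K)
    v[-1] = 1.0  # truncation: the last stick takes all remaining mass
    return _sticks_to_weights(v)


def sample_nb(mu, phi, size, rng):
    """Negative Binomial draws with mean mu and variance mu + phi * mu**2."""
    r = 1.0 / phi
    return rng.negative_binomial(r, r / (r + mu), size=size)


def nb_logpmf(y, mu, phi):
    """Elementwise log-pmf of the same mean/dispersion Negative Binomial."""
    r = 1.0 / phi
    y = np.asarray(y, dtype=float)
    return (_lgamma(y + r) - math.lgamma(r) - _lgamma(y + 1.0)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def sample_assignments(counts, weights, mus, phis, rng):
    """Gibbs step: resample each gene's cluster label given weights and parameters.

    counts has shape (G, R): G genes, R replicates. Replicates are treated as
    independent draws from the gene's cluster (a simplifying assumption).
    """
    K = len(weights)
    with np.errstate(divide="ignore"):
        log_w = np.log(weights)  # a weight of exactly 0 gives -inf
    loglik = np.stack([nb_logpmf(counts, mus[k], phis[k]).sum(axis=1)
                       for k in range(K)], axis=1)
    logp = log_w[None, :] + loglik
    logp -= logp.max(axis=1, keepdims=True)  # stabilise before exponentiating
    probs = np.exp(logp)
    probs /= probs.sum(axis=1, keepdims=True)
    return np.array([rng.choice(K, p=p) for p in probs])


def sample_weights(labels, alpha, K, rng):
    """Blocked Gibbs step: resample the truncated stick-breaking weights given labels."""
    n = np.bincount(labels, minlength=K).astype(float)
    tail = np.concatenate((np.cumsum(n[::-1])[::-1][1:], [0.0]))  # genes in clusters j > k
    v = rng.beta(1.0 + n, alpha + tail)
    v[-1] = 1.0
    return _sticks_to_weights(v)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: 200 genes x 4 replicates drawn from three NB clusters.
    true_mu = np.array([5.0, 50.0, 500.0])
    true_z = rng.integers(0, 3, size=200)
    counts = np.stack([sample_nb(true_mu[z], 0.1, 4, rng) for z in true_z])

    K, alpha = 10, 1.0
    mus = np.geomspace(1.0, 1000.0, K)  # component means, held fixed in this sketch
    phis = np.full(K, 0.1)              # component dispersions, held fixed too
    weights = stick_breaking_weights(alpha, K, rng)
    for _ in range(20):
        labels = sample_assignments(counts, weights, mus, phis, rng)
        weights = sample_weights(labels, alpha, K, rng)
    print("occupied clusters:", np.unique(labels).size)
```

In a complete sampler the component means and dispersions would also be resampled on every sweep (for example with Metropolis-Hastings steps, since the Negative Binomial has no convenient conjugate prior over both parameters); they are held fixed here to keep the sketch short. An HDP would add a second level in which each sample has its own weights over a shared set of components, which is what allows information to be shared between samples.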
