Ph.D. Thesis Defense - Chengwei Su

“A Novel Algorithm for Efficient Learning of Bayesian Networks from High-Dimensional Data and Prior Knowledge”

November 6, 2014
12 pm - 2 pm
Location
Jackson Conf Room, Cummings Hall
Sponsored by
Thayer School
Audience
Public
More information
Daryl Laware

Thesis Committee

Mark Borsuk, Ph.D. (Chair)

George Cybenko, Ph.D.

Casey Greene, Ph.D.

Yulei He, Ph.D.

Abstract

The primary objective of this research is to develop and validate an approach and algorithms for efficiently learning a complex web of relations from a combination of prior knowledge, published literature, and high-dimensional data. Algorithms for inferring the structure of Bayesian networks (BNs) from data have become increasingly popular for uncovering the direct and indirect influences among variables in complex systems. Markov chain Monte Carlo (MCMC) sampling, a means of Bayesian model averaging, is typically applied to learn BN structure from data. However, existing state-of-the-art MCMC-based learning algorithms mix and converge slowly in high-dimensional domains. To address these challenges, we first developed and tested intelligent strategies for prioritizing the structural search space using prior information. Second, we present a novel Markov blanket resampling (MBR) scheme that intermittently reconstructs the entire Markov blanket of nodes, allowing the sampler to traverse low-probability regions between local maxima more effectively. Experiments across a range of network sizes show that the MBR scheme outperforms other state-of-the-art algorithms in both learning performance and convergence rate. In particular, MBR achieves better learning performance when the number of observations is relatively small and faster convergence when the number of variables in the network is large. We anticipate that our methodology will be especially useful for deciphering how genes and the environment interact to determine cancer risk by allowing BNs to be extended to a genome-wide scale.
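
For attendees unfamiliar with the term, the Markov blanket of a node in a Bayesian network is the set of its parents, its children, and its children's other parents. The sketch below only illustrates that standard definition on a small graph; it is not the MBR algorithm presented in the thesis. The function name `markov_blanket`, the child-to-parents dictionary representation, and the toy network are assumptions made for illustration.

```python
from typing import Dict, Set


def markov_blanket(parents: Dict[str, Set[str]], node: str) -> Set[str]:
    """Return the Markov blanket of `node` in a DAG.

    `parents` maps every node to the set of its parents
    (a hypothetical representation chosen for this sketch).
    """
    # Children are the nodes that list `node` among their parents.
    children = {child for child, pars in parents.items() if node in pars}

    # Spouses are the other parents of those children.
    spouses: Set[str] = set()
    for child in children:
        spouses |= parents[child]

    # Blanket = parents + children + spouses, excluding the node itself.
    blanket = set(parents.get(node, set())) | children | spouses
    blanket.discard(node)
    return blanket


# Hypothetical toy network: A -> C <- B, C -> D
toy_dag = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"C"}}

print(sorted(markov_blanket(toy_dag, "A")))  # ['B', 'C']
print(sorted(markov_blanket(toy_dag, "C")))  # ['A', 'B', 'D']
```

In this toy network, B enters the blanket of A only because the two share the child C, which is why a scheme that rebuilds a whole blanket changes several edges at once rather than one.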
