A Comparative Analysis of Biclustering Algorithms for Gene Expression Data

Kemal Eren, Mehmet Deveci, Onur Kucuktunc, Umit V. Catalyurek

Abstract - The need to analyze high-dimensional biological data is driving the development of new data mining methods. Biclustering algorithms have been successfully applied to gene expression data to discover local patterns, in which a subset of genes exhibits similar expression levels over a subset of conditions. However, it is not clear which algorithms are best suited for this task. Many algorithms have been published in the past decade, most of which have been compared only to a small number of other algorithms. Surveys and comparisons exist in the literature, but because of the large number and variety of biclustering algorithms, they quickly become outdated.

In this paper we partially address this problem of evaluating the strengths and weaknesses of existing biclustering methods. We used the BiBench package to compare twelve algorithms, many of which were recently published or have not been extensively studied. The algorithms were tested on a suite of synthetic datasets to measure their performance on data with varying conditions, such as different bicluster models, varying noise, varying numbers of biclusters, and overlapping biclusters. The algorithms were also tested on eight large gene expression datasets obtained from the Gene Expression Omnibus (GEO). Gene Ontology enrichment analysis was performed on the resulting biclusters, and the best enrichment terms are reported. Our analyses show that the biclustering method and its parameters should be selected based on the desired model, whether that model allows overlapping biclusters, and its robustness to noise. In addition, we observe that the biclustering algorithms capable of finding more than one model are more successful at capturing biologically relevant clusters.
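To make the synthetic-data part of this evaluation concrete, here is a minimal sketch of the general idea: plant one constant bicluster in Gaussian noise, run a detector, and score how well the planted bicluster is recovered as the noise level grows. It is not the BiBench API and not the paper's protocol. The function names (`make_planted_bicluster`, `jaccard_cells`, `mean_threshold_baseline`), the parameter values, and the toy detector are all assumptions made for illustration, and the cell-level Jaccard index is only one common way to compare biclusters.

```python
import random


def make_planted_bicluster(n_rows=50, n_cols=30, bic_rows=10, bic_cols=8,
                           signal=5.0, noise_sd=0.5, seed=0):
    """Return (matrix, row_set, col_set): one constant bicluster in Gaussian noise.

    All sizes and the signal level are illustrative assumptions.
    """
    rng = random.Random(seed)
    rows = set(rng.sample(range(n_rows), bic_rows))
    cols = set(rng.sample(range(n_cols), bic_cols))
    data = [
        [rng.gauss(0.0, noise_sd) + (signal if i in rows and j in cols else 0.0)
         for j in range(n_cols)]
        for i in range(n_rows)
    ]
    return data, rows, cols


def jaccard_cells(rows_a, cols_a, rows_b, cols_b):
    """Jaccard index of two biclusters, each viewed as a set of matrix cells."""
    # Shared cells are exactly those whose row and column lie in both biclusters.
    inter = len(rows_a & rows_b) * len(cols_a & cols_b)
    union = len(rows_a) * len(cols_a) + len(rows_b) * len(cols_b) - inter
    return inter / union if union else 0.0


def mean_threshold_baseline(data):
    """Toy detector (hypothetical, not a published algorithm).

    Keeps the rows and columns whose mean lies above the midpoint between
    the smallest and largest row (or column) mean.
    """
    n_rows, n_cols = len(data), len(data[0])
    row_means = [sum(row) / n_cols for row in data]
    col_means = [sum(data[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    row_cut = (min(row_means) + max(row_means)) / 2
    col_cut = (min(col_means) + max(col_means)) / 2
    rows = {i for i, m in enumerate(row_means) if m > row_cut}
    cols = {j for j, m in enumerate(col_means) if m > col_cut}
    return rows, cols


if __name__ == "__main__":
    # Sweep the noise level, as a benchmark with "varying noise" would.
    for noise_sd in (0.5, 2.0, 4.0):
        data, true_rows, true_cols = make_planted_bicluster(noise_sd=noise_sd)
        found_rows, found_cols = mean_threshold_baseline(data)
        score = jaccard_cells(true_rows, true_cols, found_rows, found_cols)
        print(f"noise_sd={noise_sd}: Jaccard recovery = {score:.2f}")
```

A fuller benchmark along these lines would also vary the bicluster model, the number of planted biclusters, and their overlap, and would replace the toy detector with the algorithms under study.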

DOI: 10.1093/bib/bbs032
Keywords: biclustering, microarray, gene expression, clustering

K. Eren, M. Deveci, O. Kucuktunc, U.V. Catalyurek, A Comparative Analysis of Biclustering Algorithms for Gene Expression Data, (accepted to) Briefings in Bioinformatics, 2012.
