Biclustering
A Comparative Analysis of Biclustering Algorithms for Gene Expression Data
Kemal Eren, Mehmet Deveci, Onur Kucuktunc, Umit V. Catalyurek
Abstract -
The need to analyze high-dimensional biological data is driving the
development of new data mining methods. Biclustering algorithms have
been successfully applied to gene expression data to discover local
patterns, in which a subset of genes exhibits similar expression
levels over a subset of conditions. However, it is not clear which
algorithms are best suited for this task. Many algorithms
have been published in the past decade, most of which have been compared
only to a small number of algorithms.
Surveys and comparisons exist in the literature, but because
of the large number and variety of biclustering algorithms, they are
quickly outdated.
In this paper, we partially address the problem of evaluating the
strengths and weaknesses of existing biclustering methods. We used
the BiBench package to compare twelve algorithms, many of which were
recently published or have not been extensively studied. The
algorithms were tested on a suite of synthetic datasets to measure
their performance on data with varying conditions, such as different
bicluster models, varying noise, varying numbers of biclusters, and
overlapping biclusters. The algorithms were also tested on eight
large gene expression datasets obtained from the Gene Expression
Omnibus (GEO). Gene Ontology enrichment analysis was performed on
the resulting biclusters, and the best enrichment terms are reported.
Our analyses show that the biclustering method and its parameters should
be selected based on the desired model, whether that model allows
overlapping biclusters, and its robustness to noise.
In addition, we observe that the biclustering algorithms capable of
finding more than one model are more successful at capturing biologically
relevant clusters.
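
As an illustration of the synthetic-data setting described above, the following minimal sketch plants one constant bicluster (a subset of rows and columns sharing an elevated expression level) in a noisy matrix and scores a candidate bicluster against it. It does not use BiBench and is not the evaluation procedure of the paper: the function names `make_matrix_with_bicluster` and `jaccard`, the constant-bicluster model, the Gaussian background, and the cell-wise Jaccard index are all assumptions chosen for illustration.

```python
import random


def make_matrix_with_bicluster(n_rows, n_cols, bic_rows, bic_cols,
                               level=5.0, noise=0.5, seed=0):
    """Return a Gaussian background matrix with one planted constant bicluster.

    Hypothetical helper: cells in bic_rows x bic_cols are set to `level`
    plus Gaussian noise with standard deviation `noise`.
    """
    rng = random.Random(seed)
    data = [[rng.gauss(0.0, 1.0) for _ in range(n_cols)]
            for _ in range(n_rows)]
    for i in bic_rows:
        for j in bic_cols:
            data[i][j] = level + rng.gauss(0.0, noise)
    return data


def jaccard(found, expected):
    """Cell-wise Jaccard index between two biclusters given as (rows, cols)."""
    cells_a = {(i, j) for i in found[0] for j in found[1]}
    cells_b = {(i, j) for i in expected[0] for j in expected[1]}
    union = cells_a | cells_b
    return len(cells_a & cells_b) / len(union) if union else 1.0


# Planted bicluster: 4 genes (rows) over 3 conditions (columns).
planted = ([0, 1, 2, 3], [0, 1, 2])
data = make_matrix_with_bicluster(20, 10, planted[0], planted[1])

# A candidate that includes one extra gene: 12 shared cells out of 15 total.
guess = ([0, 1, 2, 3, 4], [0, 1, 2])
score = jaccard(guess, planted)
print(score)
```

Raising `noise`, planting several biclusters, or letting their row and column sets overlap would vary the conditions in the way the abstract describes; a score of 1.0 corresponds to exact recovery of the planted bicluster.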
10.1093/bib/bbs032
biclustering, microarray, gene expression, clustering