Andrey A. Shabalin1,
Cheng Fan3, Charles M. Perou3,4,5, and Andrew B. Nobel1
1Department of Statistics and Operations Research, University of
North Carolina at Chapel Hill, Chapel Hill, NC 27599
2Department of Mathematical Sciences, Norwegian University of Science and Technology
3Lineberger Comprehensive Cancer Center, UNC-CH
4Department of Pathology and Laboratory Medicine, UNC-CH
5Department of Genetics, UNC-CH
Published in Oxford Bioinformatics on March 5, 2008.
Gene expression microarrays are now used in a wide range of biomedical applications. This paper considers how to merge data sets arising from different gene-expression studies of a common organism and phenotype, with particular attention to data from different technological platforms. The paper makes two contributions. The first is a simple cross-study normalization method based on linked gene/sample clustering of the given data sets. The second is a set of general validation measures that can be used to assess and compare cross-study normalization methods. The proposed normalization method is applied to three existing breast cancer data sets and compared with several competing normalization methods using the proposed validation measures.
Matlab code for XPN
The supplementary materials contain a heatmap illustration of the XPN model, based on real data, and validation results on the intrinsic gene set.