International Journal On Advances in Life Sciences, volume 4, numbers 1 and 2, 2012


Simulating Gene Expression Data To Estimate Sample Size For Class and Biomarker Discovery

Authors:
Kevin Coombes
Paul Roebuck
Jiexin Zhang

Keywords: gene expression; microarray; simulation; class prediction; multi-hit theory of cancer; biomarker

Abstract:
With modern advances in high-throughput technologies to measure gene expression profiles, researchers are eager to identify biomarkers that indicate pathogenic processes or pharmacologic responses. However, insufficient statistical power, often due to the limited sample sizes in real experiments, has hindered progress in this area. Realistic simulations can provide data to better estimate sample sizes and better evaluate analytical methods. Existing simulation tools have focused more on the technology and less on the biological complexity of patients and outcomes. In this paper, we describe an R package of gene expression simulation tools to address this problem. Our model incorporates both biological and technical noise on top of the true signal, transcriptional status, and block structures that mimic gene networks. More importantly, to simulate the multi-hit model of cancer development, our tool contains latent variables that link gene expression with binary outcome and survival data. We demonstrate the use of this R package by providing examples of simulated cancer subtype recovery and biomarker discovery.
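The abstract describes the simulation model only in outline. The Python/NumPy sketch below is one rough illustration of how the named layers could stack: true signal, transcriptional status, block-correlated biological noise, technical noise, and latent "hits" that drive a binary outcome and survival. It is not the authors' R package and does not reproduce its API. The function name `simulate_cohort`, every parameter, and every distribution are hypothetical assumptions chosen for illustration. These include the log2 scale, the 2.0 floor for inactive genes, the hit-count threshold for the binary class, and the exponential survival model with administrative censoring.

```python
import numpy as np


def simulate_cohort(n_samples=100, n_genes=200, block_size=20, rho=0.5,
                    p_active=0.8, sigma_bio=0.5, sigma_tech=0.3,
                    n_hits=3, p_hit=0.5, hit_effect=1.5, min_hits=2,
                    base_hazard=0.1, log_hr=0.7, follow_up=10.0, seed=0):
    """Hypothetical layered simulator: signal -> status -> hits -> noise.

    All parameters and distributions are illustrative assumptions, not the
    API of the paper's R package. Returns a dict with a genes x samples
    log2 expression matrix, latent hits, a binary outcome, and
    right-censored survival data.
    """
    rng = np.random.default_rng(seed)
    block_of = np.arange(n_genes) // block_size    # gene -> block index
    n_blocks = int(block_of.max()) + 1
    if n_hits > n_blocks:
        raise ValueError("need at least one gene block per hit")

    # True signal and transcriptional status (on/off) for each gene.
    baseline = rng.normal(6.0, 1.5, size=n_genes)
    active = rng.random(n_genes) < p_active
    mean = np.where(active, baseline, 2.0)          # inactive genes sit at a low floor
    signal = np.repeat(mean[:, None], n_samples, axis=1)

    # Latent hits: each hit shifts the active genes of one block upward.
    hits = rng.random((n_hits, n_samples)) < p_hit
    hit_block = rng.choice(n_blocks, size=n_hits, replace=False)
    for h, b in enumerate(hit_block):
        genes = (block_of == b) & active
        signal[genes] += hit_effect * hits[h]

    # Biological noise, correlated within blocks (pairwise correlation rho).
    z = rng.normal(size=(n_blocks, n_samples))      # shared block factor
    e = rng.normal(size=(n_genes, n_samples))       # gene-specific part
    bio = sigma_bio * (np.sqrt(rho) * z[block_of] + np.sqrt(1 - rho) * e)

    # Independent technical (measurement) noise.
    tech = rng.normal(0.0, sigma_tech, size=(n_genes, n_samples))
    expr = signal + bio + tech

    # Outcomes depend on expression only through the latent hit count.
    n_hit = hits.sum(axis=0)
    outcome = (n_hit >= min_hits).astype(int)       # multi-hit binary class
    hazard = base_hazard * np.exp(log_hr * n_hit)   # exponential survival model
    event_time = rng.exponential(1.0 / hazard)
    time = np.minimum(event_time, follow_up)        # administrative censoring
    event = (event_time <= follow_up).astype(int)
    return {"expr": expr, "hits": hits, "outcome": outcome,
            "time": time, "event": event}
```

To mimic a sample-size study with a sketch like this, one could call `simulate_cohort` repeatedly at several values of `n_samples`. At each size, record how often a chosen class-prediction or biomarker test recovers `outcome` or the hit blocks; the fraction of successes then gives an empirical power curve. Because the outcomes are generated from the hits rather than directly from expression, a method is rewarded only when it finds the genes that carry the latent biology.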

Pages: 44 to 51

Copyright: Copyright (c) to authors, 2012. Used with permission.

Publication date: June 30, 2012

Published in: journal

ISSN: 1942-2660