Microarray Inspector: tissue cross contamination detection tool for microarray data
Microarray technology has changed the landscape of contemporary life sciences by providing vast amounts of expression data. Researchers are building repositories of experimental results covering various conditions and samples, which serve the scientific community as a valuable resource. Ensuring that samples are of high quality is essential to this effort. The task is complicated by the fact that many datasets lack information about pre-experimental quality assessment. Transcription profiling of tissue samples may be invalidated by errors caused by heterogeneity of the material. The risk of tissue cross contamination is especially high in oncological studies, where it is often difficult to extract the sample. There is therefore a need for a method that detects tissue contamination in the post-experimental phase. We propose Microarray Inspector: customizable, user-friendly software that enables easy detection of samples containing mixed tissue types. The advantage of the tool is that it works on raw expression data files and analyses each array independently. In addition, the system allows the user to adjust the analysis criteria to individual needs and research requirements. The final output of the program consists of easy-to-read reports on the tissue contamination assessment, with detailed information about the test parameters and results. Microarray Inspector provides a list of contaminant biomarkers needed for the analysis of adipose tissue contamination. Using real data (datasets from public repositories), we confirmed the high specificity of the software in detecting contamination. The results indicated an adipose tissue admixture ranging from approximately 4% to 13% in several tested surgical samples.
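To make the idea of analysing each array independently against a list of contaminant biomarkers more concrete, the following is a minimal sketch and not the tool's actual implementation. It assumes that the expression values of one array are available as a mapping from probe ID to signal, and that contamination is indicated when the marker probes rank significantly higher than the remaining probes. The one-sided rank-sum (Mann-Whitney) comparison, the function names `mann_whitney_one_sided` and `flag_array`, and the default threshold `alpha` are all illustrative assumptions; the threshold stands in for the kind of user-adjustable criterion the abstract mentions.

```python
import math


def mann_whitney_one_sided(markers, background):
    """Return (U, p) for the alternative that marker values tend to be larger.

    Uses the normal approximation with a continuity correction and no tie
    correction, so it is only a rough guide for small or heavily tied samples.
    """
    n1, n2 = len(markers), len(background)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must contain at least one value")
    # U counts marker/background pairs where the marker is larger (ties = 0.5).
    u = 0.0
    for m in markers:
        for b in background:
            if m > b:
                u += 1.0
            elif m == b:
                u += 0.5
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean - 0.5) / sd
    # Upper-tail probability of the standard normal distribution.
    p = 0.5 * math.erfc(z / math.sqrt(2.0))
    return u, p


def flag_array(expression, marker_ids, alpha=0.01):
    """Flag a single array when its marker probes rank unusually high.

    `expression` maps probe ID -> signal for ONE array (hypothetical layout);
    `marker_ids` is a set of contaminant biomarker probe IDs (hypothetical);
    `alpha` is an assumed, user-adjustable significance threshold.
    """
    markers = [v for pid, v in expression.items() if pid in marker_ids]
    background = [v for pid, v in expression.items() if pid not in marker_ids]
    _, p = mann_whitney_one_sided(markers, background)
    return p < alpha, p


# Toy data: 100 background probes and 5 made-up marker probes.
marker_ids = {"m0", "m1", "m2", "m3", "m4"}
background = {f"bg{i}": float(i) for i in range(100)}

# Markers expressed above every background probe.
contaminated = dict(background, **{f"m{i}": 200.0 + i for i in range(5)})
# Markers expressed near the middle of the background distribution.
clean = dict(background, **{f"m{i}": 47.5 + i for i in range(5)})

contaminated_flag, contaminated_p = flag_array(contaminated, marker_ids)
clean_flag, clean_p = flag_array(clean, marker_ids)
```

Because each call looks at a single array, no reference samples or cross-array normalisation are needed in this sketch. A real analysis would also have to deal with probe-level preprocessing and with the choice of markers, neither of which is covered here.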