Normalization and statistical methods for crossplatform expression array analysis

dc.contributor.advisor: Gamieldien, Junaid
dc.contributor.advisor: Christoffels, Alan
dc.contributor.author: Mapiye, Darlington S
dc.date.accessioned: 2015-10-19T12:49:53Z
dc.date.accessioned: 2024-05-17T07:20:10Z
dc.date.available: 2015-10-19T12:49:53Z
dc.date.available: 2024-05-17T07:20:10Z
dc.date.issued: 2012
dc.description: Magister Scientiae - MSc (en_US)
dc.description.abstract: A large volume of gene expression data exists in public repositories such as NCBI's Gene Expression Omnibus (GEO) and EBI's ArrayExpress. This creates a significant opportunity to re-use these data in various combinations for novel in-silico analyses that would otherwise be too costly to perform, or for which equivalent sample numbers would be difficult to collect. For example, combining and re-analysing large numbers of data sets from the same cancer type would increase statistical power while weakening the effects of study-specific variability, resulting in more reliable gene expression signatures. Similarly, because the number of normal control samples associated with cancer datasets is often limited, datasets can be combined to establish a reliable baseline for accurate differential expression analysis. However, combining different microarray studies is hampered by the fact that different studies use different analysis techniques, microarray platforms and experimental protocols. We have developed and optimised a method that transforms gene expression measurements from continuous to discrete data points by grouping similarly expressed genes into quantiles on a per-sample basis. Each probe on each chip is also cross-mapped to the gene it represents, which enables us to integrate experiments on the basis of the genes they have in common across different platforms. We optimised the quantile discretization method on previously published prostate cancer datasets produced on two different array technologies, and then applied it to a larger breast cancer dataset of 411 samples from 8 microarray platforms. Statistical analysis of the breast cancer datasets identified 1371 differentially expressed genes.
Cluster, gene set enrichment and pathway analyses identified functional groups that were previously described in breast cancer. We also identified a novel module of genes encoding ribosomal proteins that has not been previously reported, although the overall functions of these genes have been implicated in cancer development and progression. The former indicates that our integration method does not destroy the statistical signal in the original data, while the latter is strong evidence that the increased sample size improves the chances of finding novel gene expression signatures. Such signatures are also robust to inter-population variation, and show promise for translational applications such as tumour grading, disease subtype classification, informing treatment selection and molecular prognostics. (en_US)
dc.identifier.uri: https://hdl.handle.net/10566/15237
dc.language.iso: en (en_US)
dc.publisher: University of the Western Cape (en_US)
dc.rights.holder: University of the Western Cape (en_US)
dc.subject: Differential expression analysis (en_US)
dc.subject: Expression array (en_US)
dc.subject: Quantile discretization (en_US)
dc.subject: Gene expression (en_US)
dc.title: Normalization and statistical methods for crossplatform expression array analysis (en_US)
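
The abstract describes the integration pipeline only at a high level. What follows is a minimal Python sketch of the general idea and is not the thesis's implementation. It performs per-sample quantile discretization, cross-maps probes to genes, and then keeps only the genes shared by all samples. Several details are illustrative assumptions: the bin count, tie-breaking by probe ID, discretizing before collapsing probes to genes, and using the lower median (`median_low`) when several probes map to one gene. The function names `discretize_sample`, `probes_to_genes` and `integrate` are hypothetical, and the probe IDs and values are toy data, not data from the thesis.

```python
from statistics import median_low


def discretize_sample(probe_values, n_bins=10):
    """Replace each probe's continuous value in ONE sample with a quantile bin.

    Bin 0 holds the lowest-expressed probes and bin n_bins - 1 the highest.
    Ranks are computed within the sample only, so any monotone rescaling of a
    platform's values (e.g. log2 vs. raw intensities) gives the same bins.
    """
    # Ties are broken by probe ID for determinism (an assumption; tied probes
    # could instead be assigned the same bin).
    ordered = sorted(probe_values, key=lambda probe: (probe_values[probe], probe))
    n = len(ordered)
    return {probe: rank * n_bins // n for rank, probe in enumerate(ordered)}


def probes_to_genes(probe_bins, probe_to_gene):
    """Cross-map probe-level bins to gene-level bins for one sample.

    probe_to_gene is a hypothetical platform annotation (probe ID -> gene symbol);
    probes without a gene mapping are dropped.
    """
    per_gene = {}
    for probe, bin_ in probe_bins.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:
            per_gene.setdefault(gene, []).append(bin_)
    # When several probes target one gene, keep the lower median bin so the
    # result stays an integer bin (an assumed collapsing rule).
    return {gene: median_low(bins) for gene, bins in per_gene.items()}


def integrate(samples):
    """Build a gene -> {sample: bin} table using only genes present in every sample."""
    common = set.intersection(*(set(genes) for genes in samples.values()))
    return {gene: {name: genes[gene] for name, genes in samples.items()}
            for gene in sorted(common)}


if __name__ == "__main__":
    # Toy values for two samples measured on two different (hypothetical) platforms.
    sample_a = probes_to_genes(
        discretize_sample({"a1": 2.0, "a2": 8.5, "a3": 5.1, "a4": 0.3}, n_bins=3),
        {"a1": "TP53", "a2": "ESR1", "a3": "ERBB2", "a4": "RPL5"},
    )
    sample_b = probes_to_genes(
        discretize_sample({"b1": 120.0, "b2": 950.0, "b3": 40.0}, n_bins=3),
        {"b1": "TP53", "b2": "ESR1", "b3": "GAPDH"},
    )
    print(integrate({"A": sample_a, "B": sample_b}))
    # {'ESR1': {'A': 2, 'B': 2}, 'TP53': {'A': 0, 'B': 1}}
```

Because each sample is ranked independently, a gene's bin can be compared across platforms whose raw values sit on very different scales. The trade-off is that differences between genes falling in the same bin are lost.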

Files

Original bundle
Name: Mapiye_MSC_2012.pdf
Size: 20.35 MB
Format: Adobe Portable Document Format
Description: Thesis

License bundle
Name: license.txt
Size: 1.62 KB
Format: Plain Text