Methodology to identify a gene expression signature by merging microarray datasets

Olga Fajarda*, João Rafael Almeida, Sara Duarte-Pereira, Raquel M. Silva, José Luís Oliveira

*Corresponding author for this work

Research output: peer-reviewed


Abstract

A vast number of microarray datasets have been produced as a way to identify differentially expressed genes and gene expression signatures. A better understanding of the underlying biological processes can help in the diagnosis and prognosis of diseases, as well as in predicting the therapeutic response to drugs. However, most of the available datasets contain only a small number of samples, leading to low statistical, predictive and generalization power. One way to overcome this problem is to merge several microarray datasets into a single dataset, which is typically a challenging task. Statistical methods or supervised machine learning algorithms are usually used to determine gene expression signatures. Nevertheless, statistical methods require an arbitrary threshold to be defined, and supervised machine learning methods can be ineffective when applied to high-dimensional datasets such as microarrays. We propose a methodology to identify gene expression signatures by merging microarray datasets. It uses statistical methods to obtain several sets of differentially expressed genes and supervised machine learning algorithms to select the gene expression signature from among them. The methodology was validated in two distinct research applications: one using heart failure microarray datasets and the other using autism spectrum disorder microarray datasets. For the first, we obtained a gene expression signature composed of 117 genes, with a classification accuracy of approximately 98%. For the second, we obtained a gene expression signature composed of 79 genes, with a classification accuracy of approximately 82%. The methodology was implemented in R and is available, under the MIT licence, at https://github.com/bioinformatics-ua/MicroGES.
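
The two-stage idea in the abstract (statistical tests produce several candidate gene sets, then a supervised classifier chooses among them) can be illustrated with a minimal sketch. This is not the MicroGES code or its API: the sketch is written in Python rather than R, and every function name, the per-dataset z-scoring used as a crude stand-in for batch-effect removal, the Welch t-statistic thresholds, the nearest-centroid classifier, and the toy data are assumptions made purely for illustration.

```python
import random
from statistics import mean, stdev

def zscore_per_gene(dataset):
    """Standardise each gene within one dataset (assumed, crude batch adjustment)."""
    out = {}
    for gene, values in dataset.items():
        m, s = mean(values), stdev(values)
        out[gene] = [(v - m) / s if s > 0 else 0.0 for v in values]
    return out

def merge_datasets(datasets, labels):
    """Keep only genes present in every dataset and concatenate the samples."""
    common = sorted(set.intersection(*(set(d) for d in datasets)))
    scaled = [zscore_per_gene(d) for d in datasets]
    merged = {g: [v for d in scaled for v in d[g]] for g in common}
    return merged, [l for ls in labels for l in ls]

def welch_t(a, b):
    """Welch's t-statistic for two groups with possibly unequal variances."""
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se if se > 0 else 0.0

def candidate_gene_sets(merged, labels, thresholds):
    """Stage 1: one candidate gene set per |t| threshold."""
    t = {}
    for g, vals in merged.items():
        case = [v for v, l in zip(vals, labels) if l == 1]
        ctrl = [v for v, l in zip(vals, labels) if l == 0]
        t[g] = abs(welch_t(case, ctrl))
    return {thr: [g for g in merged if t[g] >= thr] for thr in thresholds}

def loo_accuracy(merged, labels, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on the given genes."""
    n, correct = len(labels), 0
    for i in range(n):
        dist = {}
        for cls in (0, 1):
            idx = [j for j in range(n) if j != i and labels[j] == cls]
            dist[cls] = sum(
                (merged[g][i] - mean(merged[g][j] for j in idx)) ** 2 for g in genes
            )
        correct += int(min(dist, key=dist.get) == labels[i])
    return correct / n

def select_signature(merged, labels, thresholds):
    """Stage 2: keep the candidate set with the best accuracy (smaller set on ties)."""
    sets = candidate_gene_sets(merged, labels, thresholds)
    scored = [
        (loo_accuracy(merged, labels, genes), -len(genes), thr)
        for thr, genes in sets.items() if genes
    ]
    acc, _, thr = max(scored)
    return sets[thr], acc

def toy_dataset(rng, n_per_class, offset, extra_gene):
    """Hypothetical data: genes g0..g2 differ between classes, the rest are noise."""
    labels = [0] * n_per_class + [1] * n_per_class
    data = {}
    for k in range(20):
        shift = 3.0 if k < 3 else 0.0
        data[f"g{k}"] = [offset + shift * l + rng.gauss(0, 1) for l in labels]
    data[extra_gene] = [rng.gauss(0, 1) for _ in labels]  # platform-specific gene
    return data, labels

rng = random.Random(0)
d1, l1 = toy_dataset(rng, 5, offset=0.0, extra_gene="only_in_A")
d2, l2 = toy_dataset(rng, 5, offset=5.0, extra_gene="only_in_B")  # shifted batch
merged, merged_labels = merge_datasets([d1, d2], [l1, l2])
signature, acc = select_signature(merged, merged_labels, thresholds=[2.0, 3.0, 4.0])
print(len(merged), "common genes;", len(signature), "in signature; accuracy", acc)
```

Letting the classifier choose among candidate sets avoids committing to a single arbitrary statistical threshold, which is the motivation stated in the abstract. Note that in this sketch the genes are filtered on all samples before cross-validation, so the reported accuracy is optimistic; a careful evaluation would repeat the filtering inside each fold or use a held-out dataset.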

Original language: English
Article number: 106867
Number of pages: 12
Journal: Computers in Biology and Medicine
Volume: 159
Publication status: Published - Jun 2023
