BIOTECHNO 2020, The Twelfth International Conference on Bioinformatics, Biocomputational Systems and Biotechnologies
Cancer Classification through a Hybrid Machine Learning Approach
Authors:
Elmira Amiri Souri
Sophia Tsoka
Keywords: Machine Learning; Disease Classification; Clustering; Cancer Prediction
Abstract:
Understanding the underlying principles of cancer is a key endeavour in biomedical data mining. Although machine learning methods have been successful in discriminating normal from cancerous tissue with good accuracy, our understanding of how cancer forms and progresses across cancer types remains limited. Since cancer is a complex disease, identifying subgroups and investigating them separately may deepen our knowledge of driver genes and oncogenic pathways. Moreover, as genes never act in isolation, methods that focus on single genes may be less efficient in uncovering key underlying molecular interactions. Algorithms capable of discovering the effect of combinations of genes have the potential to pave the way for extracting a new class of gene signatures that are neither mutated nor differentially expressed, but rather act as mediators in forming oncogenic pathways. Here, we present a hybrid machine learning model to find cancer subgroups and an associated set of marker genes. In the proposed model, autoencoders are used to create a rich compressed set of features to identify cancer subgroups. Then, a two-step algorithm based on information theory and regression analysis is developed to find a set of discriminatory genes for each selected group for different types of cancer. This analysis uses the combined expression of genes to discover a new subset of genes associated with cancer. We show that cancer can still be predicted accurately after reducing the number of genes from thousands to tens for each subgroup. Pathway enrichment analysis is performed to find important pathways associated with a specific cancer type. The model is extensively analysed on datasets across nine cancer types, and links between cancers are studied based on common gene signatures.
Pages: 20 to 27
Copyright: Copyright (c) IARIA, 2020
Publication date: September 27, 2020
Published in: conference
ISSN: 2308-4383
ISBN: 978-1-61208-792-4
Location: Lisbon, Portugal
Dates: from September 27, 2020 to October 1, 2020