Mining Big Data for the Transcriptional Landscape of Bacteria in Cancer

Date: 2022/11/3


This study analyzes microRNA sequence data from human tissue samples across 32 cancer types, mines the bacterial transcriptional landscape hidden in those data, and builds the "BIC" database to provide information on the cancer microenvironment in relation to microbial communities.

The first author, Kai-Pu Chen, attended the ISEGB 2022 conference.

Internal organs other than the human gut were previously thought to be sterile. In particular, bacteria detected in tumor tissue were often regarded as contamination introduced during sampling. Recently, however, increasing evidence has shown that various microorganisms are present in cancer tissues. Based on the concept of "making the best use of everything", this study carried out a big data analysis of human cancer microRNA sequence data from The Cancer Genome Atlas (TCGA). The team reused the sequences that would otherwise have been discarded because they did not correspond to human genes and aligned them against bacterial gene sequence data. This yielded information such as the bacterial species present in cancer tissues and their expression levels, from which the team developed the "BIC" database, which provides biological information about the cancer microenvironment in relation to the microbial community. The research results were published in Nucleic Acids Research, the top-ranked journal in the field of biochemical research.
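
The idea of reusing reads that do not match human genes can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function name, the read IDs, and the toy alignment results are all hypothetical, and a real analysis would take its input from alignment software run on TCGA data.

```python
from collections import Counter

def count_bacterial_reads(human_mapped_ids, bacterial_hits):
    """Count reads per bacterial taxon, skipping reads that mapped to human.

    human_mapped_ids: set of read IDs that aligned to the human reference.
    bacterial_hits: iterable of (read_id, taxon) pairs from a bacterial alignment.
    Both inputs are hypothetical stand-ins for real alignment output.
    """
    counts = Counter()
    for read_id, taxon in bacterial_hits:
        # Keep only the "leftover" reads that did not correspond to human genes.
        if read_id not in human_mapped_ids:
            counts[taxon] += 1
    return counts

# Toy data for illustration only.
human_mapped = {"r1", "r2"}
hits = [
    ("r2", "Escherichia"),    # also mapped to human, so it is ignored
    ("r3", "Escherichia"),
    ("r4", "Fusobacterium"),
    ("r5", "Escherichia"),
]
counts = count_bacterial_reads(human_mapped, hits)
print(dict(counts))
```

With this toy input, two reads are assigned to Escherichia and one to Fusobacterium; the read that also matched the human reference is excluded.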

By mining the sequence data from 10,362 patient tissue samples across 32 cancer types and applying multiple bioinformatic analyses, the team obtained cancer-associated bacterial information, including the relative abundance of bacteria, bacterial diversity, associations with clinical features, the co-expression network of bacteria and human genes, and the associated biological functions. The BIC database provides an online interface for querying and visualization so that users can quickly and effectively use and download this information. BIC is a public database that anyone can use freely, and all of the source code is available on GitHub. Researchers and enthusiasts in the bioinformatics community are welcome to build other applications on top of it.
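
As a rough illustration of two of the quantities mentioned above, the following Python sketch computes relative abundance and a diversity score from per-taxon read counts. The article does not say which diversity measure BIC uses; the Shannon index here is an assumption chosen for illustration, and the function names and counts are hypothetical.

```python
import math

def relative_abundance(counts):
    """Return each taxon's share of the total read count."""
    total = sum(counts.values())
    if total == 0:
        return {taxon: 0.0 for taxon in counts}
    return {taxon: n / total for taxon, n in counts.items()}

def shannon_diversity(counts):
    """Shannon index H = -sum(p * ln p) over taxa with non-zero abundance.

    Assumption: the Shannon index is used only as a familiar example of a
    diversity measure; it is not confirmed as the one used in BIC.
    """
    proportions = relative_abundance(counts).values()
    return -sum(p * math.log(p) for p in proportions if p > 0)

# Hypothetical read counts for one sample.
sample_counts = {"Escherichia": 60, "Fusobacterium": 30, "Bacteroides": 10}
abundance = relative_abundance(sample_counts)
diversity = shannon_diversity(sample_counts)
print(abundance)
print(round(diversity, 3))
```

A sample dominated by a single taxon gives a score near zero, while evenly distributed taxa give a higher score.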

This study was jointly conducted by Dr. Hsueh-Fen Juan, Distinguished Professor of the Department of Life Sciences and Graduate Institute of Biomedical Electronics and Bioinformatics and Director of the Center for Computational and Systems Biology, and Dr. Hsuan-Cheng Huang, Professor of the Institute of Biomedical Informatics of National Yang Ming Chiao Tung University. The work was supported by the Ministry of Science and Technology, Taiwan, the Ministry of Education (the Higher Education Sprout Project NTU), and the National Center for High-performance Computing (NCHC), which provided computational and storage resources. The first author, Kai-Pu Chen, is a Ph.D. student at the Graduate Institute of Biomedical Electronics and Bioinformatics. Other members of the research team include Dr. Chia-Lang Hsu, Associate Research Fellow of the Department of Medical Research of National Taiwan University Hospital, and Dr. Yen-Jen Oyang, Professor of the Graduate Institute of Biomedical Electronics and Bioinformatics.

The first author, Kai-Pu Chen, has been diagnosed with spinal muscular atrophy (SMA), a rare disease. We especially thank National Taiwan University and the Department of Life Sciences for providing a barrier-free environment on campus, which allowed Kai-Pu to work tirelessly at school without worry. In addition to acquiring subject knowledge, he learned bioinformatics and database construction, which made this research result possible.

Article Link: https://doi.org/10.1093/nar/gkac891
