Mostrar el registro sencillo del ítem

dc.contributor.authorRacedo, Sebastianspa
dc.contributor.authorPortnoy, Ivanspa
dc.contributor.authorI. Vélez, Jorgespa
dc.contributor.authorSan-Juan-Vergara, Homerospa
dc.contributor.authorSanjuan, Marcospa
dc.contributor.authorZurek, Eduardospa
dc.date.accessioned2021-08-10T14:09:34Z
dc.date.available2021-08-10T14:09:34Z
dc.date.issued2021-07-09
dc.identifier.urihttps://hdl.handle.net/11323/8511spa
dc.description.abstractBackground High-throughput sequencing enables the analysis of the composition of numerous biological systems, such as microbial communities. The identification of dependencies within these systems requires the analysis and assimilation of the underlying interaction patterns between all the variables that make up that system. However, this task poses a challenge when considering the compositional nature of the data coming from DNA-sequencing experiments because traditional interaction metrics (e.g., correlation) produce unreliable results when analyzing relative fractions instead of absolute abundances. The compositionality-associated challenges extend to the classification task, as it usually involves the characterization of the interactions between the principal descriptive variables of the datasets. The classification of new samples/patients into binary categories corresponding to dissimilar biological settings or phenotypes (e.g., control and cases) could help researchers in the development of treatments/drugs. Results Here, we develop and exemplify a new approach, applicable to compositional data, for the classification of new samples into two groups with different biological settings. We propose a new metric to characterize and quantify the overall correlation structure deviation between these groups and a technique for dimensionality reduction to facilitate graphical representation. We conduct simulation experiments with synthetic data to assess the proposed method’s classification accuracy. Moreover, we illustrate the performance of the proposed approach using Operational Taxonomic Unit (OTU) count tables obtained through 16S rRNA gene sequencing data from two microbiota experiments. Also, compare our method’s performance with that of two state-of-the-art methods. Conclusions Simulation experiments show that our method achieves a classification accuracy equal to or greater than 98% when using synthetic data. Finally, our method outperforms the other classification methods with real datasets from gene sequencing experiments.spa
dc.format.mimetypeapplication/pdfspa
dc.language.isoeng
dc.rightsAttribution-NonCommercial-NoDerivatives 4.0 Internationalspa
dc.rights.urihttp://creativecommons.org/licenses/by-nc-nd/4.0/spa
dc.sourceBioData Miningspa
dc.subjectMicrobial communitiesspa
dc.subjectCompositional naturespa
dc.subjectClassification methodspa
dc.subject16 rRNA sequencingspa
dc.titleA new pipeline for structural characterization and classification of RNA-Seq microbiome dataspa
dc.typeArtículo de revistaspa
dc.source.urlhttps://biodatamining.biomedcentral.com/articles/10.1186/s13040-021-00266-7spa
dc.rights.accessrightsinfo:eu-repo/semantics/openAccessspa
dc.identifier.doihttps://doi.org/10.1186/s13040-021-00266-7spa
dc.identifier.instnameCorporación Universidad de la Costaspa
dc.identifier.reponameREDICUC - Repositorio CUCspa
dc.identifier.repourlhttps://repositorio.cuc.edu.co/spa
dc.relation.referencesTurnbaugh PJ, Ley RE, Hamady M, Fraser-Liggett CM, Knight R, Gordon JI. The Human Microbiome Project. Nature [Internet]. 2007;449(7164):804–10. Available from: https://doi.org/10.1038/nature06244.spa
dc.relation.referencesKitano H. Looking beyond the details: a rise in system-oriented approaches in genetics and molecular biology. Curr Genet [Internet]. 2002 [cited 2019 Nov 13];41(1):1–10. Available from: https://doi.org/10.1007/s00294-002-0285-z.spa
dc.relation.referencesOltvai ZN. Life’s complexity pyramid Zoltán N. Oltvai. 2010;763(2002).spa
dc.relation.referencesKitano H. Systems biology: a brief overview. 2015;(April 2002).spa
dc.relation.referencesVoorhies AA, Ott CM, Mehta S, Pierson DL, Crucian BE, Feiveson A, et al. Study of the impact of long-duration space missions at the International Space Station on the astronaut microbiome. Sci Rep [Internet]. 2019;1–17. Available from: https://doi.org/10.1038/s41598-019-46303-8spa
dc.relation.referencesSomerville C, Somerville S. Plant functional genomics. Science. 1999;285(5426):380–3.spa
dc.relation.referencesGill R, Datta S, Datta S. A statistical framework for differential network analysis from microarray data. BMC Bioinformatics. 2010;11(1):95.spa
dc.relation.referencesGill R, Datta S, Datta S. dna: an R package for differential network analysis. Bioinformation. 2014;10(4):233.spa
dc.relation.referencesJuric D, Lacayo NJ, Ramsey MC, Racevskis J, Wiernik PH, Rowe JM, et al. Differential gene expression patterns and interaction networks in BCR-ABL—positive and—negative adult acute lymphoblastic leukemias. J Clin Oncol. 2007;25(11):1341–9.spa
dc.relation.referencesVan Treuren W, Ren B, Gevers D, Kugathasan S, Denson LA, Va Y, et al. Resource the treatment-naive microbiome in new-onset Crohn’s disease. Cell Host Microbe. 2014;15:382–92.spa
dc.relation.referencesRuan D, Young A, Montana G. Differential analysis of biological networks. BMC Bioinformatics. 2015;16(1):327.spa
dc.relation.referencesSchloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, et al. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol. 2009;75(23):7537–41.spa
dc.relation.referencesRao KR, Lakshminarayanan S. Partial correlation based variable selection approach for multivariate data classification methods. Chemom Intell Lab Syst. 2007;86(1):68–81.spa
dc.relation.referencesKurtz ZD, Müller CL, Miraldi ER, Littman DR, Blaser MJ, Bonneau RA. Sparse and compositionally robust inference of microbial ecological networks. PLoS Comput Biol. 2015;11(5):e1004226.spa
dc.relation.referencesAitchison J. The statistical analysis of compositional data. J R Stat Soc Ser B. 1982:139–77.spa
dc.relation.referencesFilzmoser P, Hron K, Reimann C. Science of the Total Environment Univariate statistical analysis of environmental (compositional) data: problems and possibilities. Sci Total Environ [Internet]. 2009;407(23):6100–8. Available from: https://doi.org/10.1016/j.scitotenv.2009.08.008.spa
dc.relation.referencesClark C, Kalita J. A comparison of algorithms for the pairwise alignment of biological networks. Bioinformatics [Internet]. 2014;30(16):2351–9. Available from: https://doi.org/10.1093/bioinformatics/btu307.spa
dc.relation.referencesAtchison J, Shen SM. Logistic-normal distributions: some properties and uses. Biometrika. 1980;67(2):261–72.spa
dc.relation.referencesAitchison J. A new approach to null correlations of proportions. J Int Assoc Math Geol. 1981;13(2):175–89.spa
dc.relation.referencesEgozcue JJ, Pawlowsky-Glahn V, Mateu-Figueras G, Barceló-Vidal C. Isometric Logratio transformations for compositional data analysis. Math Geol [Internet]. 2003;35(3):279–300. Available from: https://doi.org/10.1023/A:1023818214614.spa
dc.relation.referencesGreenacre M, Grunsky E. The isometric logratio transformation in compositional data analysis: a practical evaluation. 2019.spa
dc.relation.referencesPan M, Zhang J. Correlation-based linear discriminant classification for gene expression data. Genet Mol Res. 2017;16(1).spa
dc.relation.referencesHira ZM, Gillies DF. A review of feature selection and feature extraction methods applied on microarray data. Adv Bioinforma 2015;2015.spa
dc.relation.referencesGoswami S, Chakrabarti A, Chakraborty B. Analysis of correlation structure of data set for efficient pattern classification. In: 2015 IEEE 2nd International Conference on Cybernetics (CYBCONF); 2015. p. 24–9.spa
dc.relation.referencesRussell EL, Chiang LH, Braatz RD. Data-driven methods for fault detection and diagnosis in chemical processes. New York: Springer Science & Business Media; 2012.spa
dc.relation.referencesSaeys Y, Inza I, Larrañaga P. A review of feature selection techniques in bioinformatics. Bioinformatics. 2007;23(19):2507–17.spa
dc.relation.referencesSerban N, Critchley-Thorne R, Lee P, Holmes S. Gene expression network analysis and applications to immunology. Bioinformatics. 2007;23(7):850–8.spa
dc.relation.referencesFriedman J, Alm EJ. Inferring correlation networks from genomic survey data. PLoS Comput Biol. 2012;8(9):e1002687.spa
dc.relation.referencesRadovic M, Ghalwash M, Filipovic N, Obradovic Z. Minimum redundancy maximum relevance feature selection approach for temporal gene expression data. BMC Bioinformatics. 2017;18(1):1–14.spa
dc.relation.referencesAnders S, Huber W. Differential expression analysis for sequence count data. Genome Biol. 2010;11(10):R106.spa
dc.relation.referencesPaulson JN, Stine OC, Bravo HC, Pop M. Differential abundance analysis for microbial marker-gene surveys. Nat Methods. 2013;10(12):1200–2.spa
dc.relation.referencesKavitha KR, Rajendran GS, Varsha J. A correlation based SVM-recursive multiple feature elimination classifier for breast cancer disease using microarray. In: 2016 International Conference on Advances in Computing, Communications and Informatics (ICACCI); 2016. p. 2677–83.spa
dc.relation.referencesCollins GS, Mallett S, Omar O, Yu L-M. Developing risk prediction models for type 2 diabetes: a systematic review of methodology and reporting. BMC Med. 2011;9(1):103.spa
dc.relation.referencesAarøe J, Lindahl T, Dumeaux V, Sæbø S, Tobin D, Hagen N, et al. Gene expression profiling of peripheral blood cells for early detection of breast cancer. Breast Cancer Res. 2010;12(1):R7.spa
dc.relation.referencesDatta S. Classification of breast cancer versus normal samples from mass spectrometry profiles using linear discriminant analysis of important features selected by random forest. Stat Appl Genet Mol Biol. 2008;7(2).spa
dc.relation.referencesŠonka M, Hlaváč V, Boyle R. Image processing, analysis, and machine vision. International Student Edition; 2008.spa
dc.relation.referencesDembélé D, Kastner P. Fold change rank ordering statistics: a new method for detecting differentially expressed genes. BMC Bioinformatics. 2014;15(1):14.spa
dc.relation.referencesBevilacqua V, Mastronardi G, Menolascina F, Paradiso A, Tommasi S. Genetic algorithms and artificial neural networks in microarray data analysis: a distributed approach. Eng Lett. 2006;13(4).spa
dc.relation.referencesCa DAV, Mc V. Gene expression data classification using support vector machine and mutual information-based gene selection. Proc Comput Sci. 2015;47:13–21.spa
dc.relation.referencesvan Dam S, Vosa U, van der Graaf A, Franke L, de Magalhaes JP. Gene co-expression analysis for functional classification and gene--disease predictions. Brief Bioinform. 2018;19(4):575–92.spa
dc.relation.referencesDudoit S, Fridlyand J, Speed TP. Comparison of discrimination methods for the classification of tumors using gene expression data. J Am Stat Assoc. 2002;97(457):77–87.spa
dc.relation.referencesBhuvaneswari V, et al. Classification of microarray gene expression data by gene combinations using fuzzy logic (MGC-FL). Int J Comput Sci Eng Appl. 2012;2(4):79.spa
dc.relation.referencesBelciug S, Gorunescu F. Learning a single-hidden layer feedforward neural network using a rank correlation-based strategy with application to high dimensional gene expression and proteomic spectra datasets in cancer detection. J Biomed Inform. 2018;83:159–66.spa
dc.relation.referencesDettling M, Bühlmann P. Boosting for tumor classification with gene expression data. Bioinformatics. 2003;19(9):1061–9.spa
dc.relation.referencesFriedman J, Hastie T, Tibshirani R, et al. Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). Ann Stat. 2000;28(2):337–407.spa
dc.relation.referencesFreund Y, Schapire RE. A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci. 1997;55(1):119–39.spa
dc.relation.referencesFix E, Hodges Jr JL. Discriminatory analysis-nonparametric discrimination: small sample performance; 1952.spa
dc.relation.referencesBreiman L, Friedman J, Stone CJ, Olshen RA. Classification and regression trees. Boca Raton, FL: CRC Press; 1984.spa
dc.relation.referencesMartín-Fernández J-A, Hron K, Templ M, Filzmoser P, Palarea-Albaladejo J. Bayesian-multiplicative treatment of count zeros in compositional data sets. Stat Modelling. 2015;15(2):134–58.spa
dc.relation.referencesPearson K. Mathematical contributions to the theory of evolution—on a form of spurious correlation which may arise when indices are used in the measurement of organs. Proc R Soc Lond. 1897;60(359–367):489–98.spa
dc.relation.referencesMcDonald D, Hyde E, Debelius JW, Morton JT, Gonzalez A, Ackermann G, et al. American Gut: an open platform for citizen science microbiome research. Msystems. 2018;3(3):e00031–18.spa
dc.relation.referencesDeSantis TZ, Hugenholtz P, Larsen N, Rojas M, Brodie EL, Keller K, et al. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl Environ Microbiol. 2006;72(7):5069–72.spa
dc.type.coarhttp://purl.org/coar/resource_type/c_6501spa
dc.type.contentTextspa
dc.type.driverinfo:eu-repo/semantics/articlespa
dc.type.redcolhttp://purl.org/redcol/resource_type/ARTspa
dc.type.versioninfo:eu-repo/semantics/acceptedVersionspa
dc.type.coarversionhttp://purl.org/coar/version/c_ab4af688f83e57aaspa
dc.rights.coarhttp://purl.org/coar/access_right/c_abf2spa


Ficheros en el ítem

Thumbnail
Thumbnail

Este ítem aparece en la(s) siguiente(s) colección(ones)

  • Artículos científicos [3154]
    Artículos de investigación publicados por miembros de la comunidad universitaria.

Mostrar el registro sencillo del ítem

Attribution-NonCommercial-NoDerivatives 4.0 International
Excepto si se señala otra cosa, la licencia del ítem se describe como Attribution-NonCommercial-NoDerivatives 4.0 International