Prediction of DNA-binding Sites in Transcriptions Factor in Fur-like Proteins Using Machine Learning and Molecular Descriptors
- Authors: Muñoz J. (1), Reyes-Suárez J. (1), Besoain F. (2), Arenas-Salinas M. (1)
- Affiliations:
  - (1) Centro de Bioinformática, Simulación y Modelado (CBSM), Facultad de Ingeniería, Universidad de Talca
  - (2) Faculty of Engineering, Campus Talca, Universidad de Talca
- Issue: Vol 19, No 4 (2024)
- Pages: 398-407
- Section: Life Sciences
- URL: https://rjsocmed.com/1574-8936/article/view/643880
- DOI: https://doi.org/10.2174/0115748936264122231016094702
- ID: 643880
Abstract
Introduction: Transcription factors are of great interest in biotechnology due to their key role in the regulation of gene expression. One of the most important transcription factors in Gram-negative bacteria is Fur, a global regulator studied as a therapeutic target for the design of antibacterial agents. Its DNA-binding domain, which contains a helix-turn-helix motif, is one of its most relevant features.
Methods: In this study, we evaluated several machine learning algorithms, including Support Vector Machines (SVM), Random Forest (RF), Decision Trees (DT), and Naive Bayes (NB), for the prediction of DNA-binding sites based on proteins from the Fur superfamily and other helix-turn-helix transcription factors. We also tested the efficacy of several molecular descriptors derived from the amino acid sequence and from the structure of the protein fragments that bind the DNA. A feature selection procedure was employed to reduce the number of descriptors in each case while maintaining good classification performance.
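As a rough illustration of what a sequence-derived descriptor can look like, the sketch below computes a few simple properties of a protein fragment. The abstract does not state which descriptors the study used, so the choice here (length, mean Kyte-Doolittle hydropathy, charged-residue fractions, and a crude charge count) is an assumption, and both the function name `sequence_descriptors` and the example fragment are hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequence_descriptors(fragment: str) -> dict:
    """Return a few illustrative sequence-derived descriptors.

    Assumption: these descriptors are examples only and are not
    claimed to be the ones used in the study.
    """
    seq = fragment.upper()
    if not seq or any(aa not in KD for aa in seq):
        raise ValueError("fragment must be a non-empty string of standard amino acids")

    n = len(seq)
    positive = sum(seq.count(aa) for aa in "KR")  # Lys, Arg
    negative = sum(seq.count(aa) for aa in "DE")  # Asp, Glu
    return {
        "length": n,
        "gravy": sum(KD[aa] for aa in seq) / n,   # mean hydropathy
        "frac_positive": positive / n,
        "frac_negative": negative / n,
        # Crude count that ignores His and pH; not a true net charge.
        "net_charge_simple": positive - negative,
    }


if __name__ == "__main__":
    # Hypothetical fragment, not taken from any Fur structure.
    for name, value in sequence_descriptors("QRKLAEIG").items():
        print(name, value)
```

Each fragment then becomes one row of a feature table, with one column per descriptor and a label marking it as DNA-binding or non-binding.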
Results: The best results were obtained with the SVM model using twelve sequence-derived attributes and the DT model using nine structure-derived features, achieving 82% and 76% accuracy, respectively.
Conclusion: The performance obtained indicates that the descriptors we used are relevant for predicting DNA-binding sites, since they can discriminate between binding and non-binding regions of a protein.
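The kind of evaluation summarized in the abstract, comparing SVM, RF, DT, and NB after reducing the descriptor set, can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline: the data are synthetic, the feature selection method (`SelectKBest` with an ANOVA F-score), the hyperparameters, and the 5-fold cross-validation are all assumptions, and `k=12` merely echoes the number of sequence-derived attributes reported in the Results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a descriptor table (NOT the study's data):
# 200 fragments x 40 descriptors, label 1 = binding, 0 = non-binding.
X, y = make_classification(
    n_samples=200, n_features=40, n_informative=8, random_state=0
)

# The four classifier families named in the abstract, with
# assumed default-style hyperparameters.
models = {
    "SVM": SVC(kernel="rbf"),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "DT": DecisionTreeClassifier(random_state=0),
    "NB": GaussianNB(),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)


def evaluate(k: int = 12) -> dict:
    """Mean cross-validated accuracy per model using the top-k descriptors."""
    scores = {}
    for name, clf in models.items():
        # Scaling and selection sit inside the pipeline so they are
        # refit on each training fold, avoiding information leakage.
        pipe = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=k), clf)
        fold_acc = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")
        scores[name] = fold_acc.mean()
    return scores


scores = evaluate()
for name, acc in scores.items():
    print(f"{name}: {acc:.2f}")
```

Keeping the selection step inside the pipeline is a deliberate choice: selecting descriptors on the full dataset before cross-validation would tend to inflate the reported accuracy.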
About the authors
Jessica Muñoz
Centro de Bioinformática, Simulación y Modelado (CBSM), Facultad de Ingeniería, Universidad de Talca
Email: info@benthamscience.net
José Reyes-Suárez
Centro de Bioinformática, Simulación y Modelado (CBSM), Facultad de Ingeniería, Universidad de Talca
Email: info@benthamscience.net
Felipe Besoain
Faculty of Engineering, Campus Talca, Universidad de Talca
Email: info@benthamscience.net
Mauricio Arenas-Salinas
Centro de Bioinformática, Simulación y Modelado (CBSM), Facultad de Ingeniería, Universidad de Talca
Author for correspondence.
Email: info@benthamscience.net
