Publication: DeepAllo: allosteric site prediction using protein language model (pLM) with multitask learning
| dc.contributor.department | Graduate School of Sciences and Engineering | |
| dc.contributor.department | KUIS AI (Koç University & İş Bank Artificial Intelligence Center) | |
| dc.contributor.department | Department of Computer Engineering | |
| dc.contributor.department | Department of Chemical and Biological Engineering | |
| dc.contributor.kuauthor | Faculty Member, Keskin, Özlem | |
| dc.contributor.kuauthor | Faculty Member, Gürsoy, Attila | |
| dc.contributor.kuauthor | Master Student, Khokhar, Moaaz Ur-Rehman | |
| dc.contributor.schoolcollegeinstitute | GRADUATE SCHOOL OF SCIENCES AND ENGINEERING | |
| dc.contributor.schoolcollegeinstitute | Research Center | |
| dc.contributor.schoolcollegeinstitute | College of Engineering | |
| dc.date.accessioned | 2025-09-10T04:55:25Z | |
| dc.date.available | 2025-09-09 | |
| dc.date.issued | 2025 | |
| dc.description.abstract | Motivation: Allostery, the process by which binding at one site perturbs a distant site, has become a key focus in drug development because of its substantial impact on protein function. Identifying allosteric pockets (sites) is challenging, and several techniques have been developed, including machine learning methods that use both static and pocket features to predict allosteric pockets. Results: Our work, DeepAllo, is the first study to combine a fine-tuned protein language model (pLM) with FPocket features, and it shows an increase in prediction performance for allosteric sites over previous studies. The pLM was fine-tuned on the AlloSteric Database (ASD) in a multitask learning setting and was then used as a feature extractor to train XGBoost and AutoML models. The best model predicts allosteric pockets with an 89.66% F1 score and places 90.5% of allosteric pockets in the top 3 positions, outperforming previous results. A case study on proteins with known allosteric pockets supports the approach. Moreover, an effort was made to explain the pLM by visualizing its attention mechanism among allosteric and non-allosteric residues. Availability and implementation: The source code is available on GitHub (https://github.com/MoaazK/deepallo) and archived on Zenodo (DOI: 10.5281/zenodo.15255379). The trained model is hosted on Hugging Face (DOI: 10.57967/hf/5198). The dataset used for training and evaluation is archived on Zenodo (DOI: 10.5281/zenodo.15255437). | |
| dc.description.fulltext | Yes | |
| dc.description.harvestedfrom | Manual | |
| dc.description.indexedby | WOS | |
| dc.description.indexedby | Scopus | |
| dc.description.indexedby | PubMed | |
| dc.description.openaccess | Gold OA | |
| dc.description.publisherscope | International | |
| dc.description.readpublish | N/A | |
| dc.description.sponsoredbyTubitakEu | TÜBİTAK | |
| dc.description.sponsorship | Scientific and Technological Research Council of Turkiye (TÜBİTAK) [120C120] | |
| dc.description.version | Published Version | |
| dc.description.volume | 41 | |
| dc.identifier.doi | 10.1093/bioinformatics/btaf294 | |
| dc.identifier.eissn | 1367-4811 | |
| dc.identifier.embargo | No | |
| dc.identifier.filenameinventoryno | IR06354 | |
| dc.identifier.issn | 1367-4803 | |
| dc.identifier.issue | 6 | |
| dc.identifier.quartile | Q1 | |
| dc.identifier.scopus | 2-s2.0-105008110303 | |
| dc.identifier.uri | https://doi.org/10.1093/bioinformatics/btaf294 | |
| dc.identifier.uri | https://hdl.handle.net/20.500.14288/30073 | |
| dc.identifier.wos | 001503429800001 | |
| dc.keywords | Biotechnology and applied microbiology | |
| dc.keywords | Mathematical and computational biology | |
| dc.language.iso | eng | |
| dc.publisher | Oxford Univ Press | |
| dc.relation.affiliation | Koç University | |
| dc.relation.collection | Koç University Institutional Repository | |
| dc.relation.ispartof | Bioinformatics | |
| dc.relation.openaccess | Yes | |
| dc.rights | CC BY (Attribution) | |
| dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | |
| dc.subject | Biochemical research methods | |
| dc.subject | Computer science | |
| dc.subject | Statistics and probability | |
| dc.title | DeepAllo: allosteric site prediction using protein language model (pLM) with multitask learning | |
| dc.type | Journal Article | |
| dspace.entity.type | Publication | |
| relation.isOrgUnitOfPublication | 3fc31c89-e803-4eb1-af6b-6258bc42c3d8 | |
| relation.isOrgUnitOfPublication | 77d67233-829b-4c3a-a28f-bd97ab5c12c7 | |
| relation.isOrgUnitOfPublication | 89352e43-bf09-4ef4-82f6-6f9d0174ebae | |
| relation.isOrgUnitOfPublication | c747a256-6e0c-4969-b1bf-3b9f2f674289 | |
| relation.isOrgUnitOfPublication.latestForDiscovery | 3fc31c89-e803-4eb1-af6b-6258bc42c3d8 | |
| relation.isParentOrgUnitOfPublication | 434c9663-2b11-4e66-9399-c863e2ebae43 | |
| relation.isParentOrgUnitOfPublication | d437580f-9309-4ecb-864a-4af58309d287 | |
| relation.isParentOrgUnitOfPublication | 8e756b23-2d4a-4ce8-b1b3-62c794a8c164 | |
| relation.isParentOrgUnitOfPublication.latestForDiscovery | 434c9663-2b11-4e66-9399-c863e2ebae43 |
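
The abstract reports that 90.5% of allosteric pockets fall in the top 3 positions. As a rough illustration of how a ranking metric of this kind can be computed, the sketch below assumes each protein has a list of candidate pockets with a model score and a true label, and counts a protein as a hit when at least one allosteric pocket appears among its `k` highest-scored pockets. The function name `top_k_hit_rate`, the data layout, and the toy scores are hypothetical; the record does not state the exact evaluation protocol used in the paper, so this may differ from the authors' definition.

```python
from typing import Dict, List, Tuple

# Each pocket is (predicted_score, is_allosteric). This layout is an
# assumption made for illustration, not the paper's data format.
Pocket = Tuple[float, bool]


def top_k_hit_rate(proteins: Dict[str, List[Pocket]], k: int = 3) -> float:
    """Fraction of proteins with an allosteric pocket among the k top-scored pockets."""
    if not proteins:
        raise ValueError("no proteins given")
    hits = 0
    for pockets in proteins.values():
        # Rank pockets from highest to lowest predicted score.
        ranked = sorted(pockets, key=lambda p: p[0], reverse=True)
        if any(is_allo for _, is_allo in ranked[:k]):
            hits += 1
    return hits / len(proteins)


# Toy data with made-up scores: protein_a ranks its allosteric pocket first,
# protein_b ranks it fourth.
toy = {
    "protein_a": [(0.91, True), (0.40, False), (0.12, False)],
    "protein_b": [(0.80, False), (0.75, False), (0.60, False), (0.55, True)],
}
rate = top_k_hit_rate(toy, k=3)
print(f"top-3 hit rate: {rate:.2f}")
```

On this toy input only one of the two proteins has its allosteric pocket in the top 3, so the rate is 0.5; raising `k` to 4 would count both.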
