Publication:
DeepAllo: allosteric site prediction using protein language model (pLM) with multitask learning

dc.contributor.department Graduate School of Sciences and Engineering
dc.contributor.department KUIS AI (Koç University & İş Bank Artificial Intelligence Center)
dc.contributor.department Department of Computer Engineering
dc.contributor.department Department of Chemical and Biological Engineering
dc.contributor.kuauthor Faculty Member, Keskin, Özlem
dc.contributor.kuauthor Faculty Member, Gürsoy, Attila
dc.contributor.kuauthor Master Student, Khokhar, Moaaz Ur-Rehman
dc.contributor.schoolcollegeinstitute GRADUATE SCHOOL OF SCIENCES AND ENGINEERING
dc.contributor.schoolcollegeinstitute Research Center
dc.contributor.schoolcollegeinstitute College of Engineering
dc.date.accessioned 2025-09-10T04:55:25Z
dc.date.available 2025-09-09
dc.date.issued 2025
dc.description.abstract Motivation: Allostery, the process by which binding at one site perturbs a distant site, has become a key focus in drug development because of its substantial impact on protein function. Identifying allosteric pockets (sites) is challenging, and several techniques have been developed for it, including machine learning methods that predict allosteric pockets using both static and pocket features. Results: Our work, DeepAllo, is the first study to combine a fine-tuned protein language model (pLM) with FPocket features, and it improves allosteric site prediction performance over previous studies. The pLM was fine-tuned on the AlloSteric Database (ASD) in a multitask learning setting and then used as a feature extractor to train XGBoost and AutoML models. The best model predicts allosteric pockets with an F1 score of 89.66% and ranks 90.5% of allosteric pockets within the top 3 positions, outperforming previous results. A case study on proteins with known allosteric pockets supports the approach. Moreover, we attempt to explain the pLM by visualizing its attention across allosteric and non-allosteric residues. Availability and implementation: The source code is available on GitHub (https://github.com/MoaazK/deepallo) and archived on Zenodo (DOI: 10.5281/zenodo.15255379). The trained model is hosted on Hugging Face (DOI: 10.57967/hf/5198). The dataset used for training and evaluation is archived on Zenodo (DOI: 10.5281/zenodo.15255437).
dc.description.fulltext Yes
dc.description.harvestedfrom Manual
dc.description.indexedby WOS
dc.description.indexedby Scopus
dc.description.indexedby PubMed
dc.description.openaccess Gold OA
dc.description.publisherscope International
dc.description.readpublish N/A
dc.description.sponsoredbyTubitakEu TÜBİTAK
dc.description.sponsorship Scientific and Technological Research Council of Turkiye (TÜBİTAK) [120C120]
dc.description.version Published Version
dc.description.volume 41
dc.identifier.doi 10.1093/bioinformatics/btaf294
dc.identifier.eissn 1367-4811
dc.identifier.embargo No
dc.identifier.filenameinventoryno IR06354
dc.identifier.issn 1367-4803
dc.identifier.issue 6
dc.identifier.quartile Q1
dc.identifier.scopus 2-s2.0-105008110303
dc.identifier.uri https://doi.org/10.1093/bioinformatics/btaf294
dc.identifier.uri https://hdl.handle.net/20.500.14288/30073
dc.identifier.wos 001503429800001
dc.keywords Biotechnology and applied microbiology
dc.keywords Mathematical and computational biology
dc.language.iso eng
dc.publisher Oxford Univ Press
dc.relation.affiliation Koç University
dc.relation.collection Koç University Institutional Repository
dc.relation.ispartof Bioinformatics
dc.relation.openaccess Yes
dc.rights CC BY (Attribution)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.subject Biochemical research methods
dc.subject Computer science
dc.subject Statistics and probability
dc.title DeepAllo: allosteric site prediction using protein language model (pLM) with multitask learning
dc.type Journal Article
dspace.entity.type Publication
relation.isOrgUnitOfPublication 3fc31c89-e803-4eb1-af6b-6258bc42c3d8
relation.isOrgUnitOfPublication 77d67233-829b-4c3a-a28f-bd97ab5c12c7
relation.isOrgUnitOfPublication 89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isOrgUnitOfPublication c747a256-6e0c-4969-b1bf-3b9f2f674289
relation.isOrgUnitOfPublication.latestForDiscovery 3fc31c89-e803-4eb1-af6b-6258bc42c3d8
relation.isParentOrgUnitOfPublication 434c9663-2b11-4e66-9399-c863e2ebae43
relation.isParentOrgUnitOfPublication d437580f-9309-4ecb-864a-4af58309d287
relation.isParentOrgUnitOfPublication 8e756b23-2d4a-4ce8-b1b3-62c794a8c164
relation.isParentOrgUnitOfPublication.latestForDiscovery 434c9663-2b11-4e66-9399-c863e2ebae43
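
The abstract reports two evaluation figures: an F1 score and the share of allosteric pockets ranked within the top 3 positions. The following is a minimal sketch of how such metrics are commonly computed. It assumes that every FPocket pocket of a protein receives a model score and that pockets are ranked by descending score. The function names, data layout, and toy values are hypothetical illustrations and are not taken from the DeepAllo code on GitHub.

```python
from typing import Dict, List, Set


def rank_pockets(scores: Dict[str, float]) -> List[str]:
    """Order pocket IDs from highest to lowest predicted allosteric score."""
    return sorted(scores, key=lambda pocket: scores[pocket], reverse=True)


def top_k_rate(
    scores_by_protein: Dict[str, Dict[str, float]],
    allosteric_by_protein: Dict[str, Set[str]],
    k: int = 3,
) -> float:
    """Fraction of known allosteric pockets ranked within the top k pockets of their protein."""
    hits = 0
    total = 0
    for protein, scores in scores_by_protein.items():
        top_k = set(rank_pockets(scores)[:k])
        for pocket in allosteric_by_protein.get(protein, set()):
            total += 1
            if pocket in top_k:
                hits += 1
    return hits / total if total else 0.0


def f1(y_true: List[int], y_pred: List[int]) -> float:
    """Binary F1 = 2 * precision * recall / (precision + recall) over pocket labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy, made-up scores for two hypothetical proteins (not data from the paper).
    scores = {
        "P1": {"pk1": 0.9, "pk2": 0.2, "pk3": 0.4, "pk4": 0.1},
        "P2": {"pk1": 0.3, "pk2": 0.1, "pk3": 0.2, "pk4": 0.8},
    }
    allosteric = {"P1": {"pk3"}, "P2": {"pk2"}}
    print(top_k_rate(scores, allosteric, k=3))  # 0.5: P1's pocket is ranked 2nd, P2's is ranked 4th
    print(f1([1, 0, 1, 0], [1, 0, 0, 1]))  # 0.5: precision 0.5, recall 0.5
```

Counting hits per allosteric pocket rather than per protein follows the abstract's wording ("90.5% of allosteric pockets"). When each protein has exactly one labeled allosteric pocket, the two counts coincide.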

Files

Original bundle

Name: IR06354.pdf
Size: 3.04 MB
Format: Adobe Portable Document Format