Publication:
NRBO-AGP: a novel feature selection approach for accurate protein solubility prediction

dc.contributor.coauthorElmi Z.
dc.contributor.coauthorDanishvar S.
dc.contributor.departmentGraduate School of Sciences and Engineering
dc.contributor.kuauthorElmi, Soheila
dc.contributor.schoolcollegeinstituteGRADUATE SCHOOL OF SCIENCES AND ENGINEERING
dc.date.accessioned2026-02-26T07:12:32Z
dc.date.available2026-02-25
dc.date.issued2026
dc.description.abstractProtein solubility determines how well a protein dissolves in an aqueous solution, and it is a critical factor in the functional analysis of proteins and in biotechnological applications. Accurately estimating solubility can provide significant advantages in areas such as protein engineering and drug discovery. This study proposes a new feature selection method, Newton-Raphson-based Optimization and Adaptive Gradient Perturbation (NRBO-AGP), for predicting protein solubility. The method combines the accuracy and speed of the Newton-Raphson method with the capacity of population-based optimization techniques to balance exploration and exploitation. Descriptor features were obtained for 3144 protein sequences from the eSOL database, resulting in a dataset with 3104 features. The performance of NRBO-AGP was compared with eight different metaheuristic algorithms and evaluated using five regression models: MLP, AdaBoost, Gradient Boosting Trees, Random Forest, and Support Vector Regressor (SVR). Mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R²) were used for performance evaluation. The results show that NRBO-AGP outperforms the other metaheuristic algorithms in all regression models. The best results were achieved with Gradient Boosting and Random Forest, reaching MAE: 0.0001±0.0000, RMSE: 0.0008±0.0000, and R²: 0.9908±0.0005, and MAE: 0.0002±0.0000, RMSE: 0.0025±0.0000, and R²: 0.9908±0.0005, respectively. These findings show that NRBO-AGP is an effective feature selection tool for predicting protein solubility. Statistical analyses based on Friedman and Nemenyi tests show that NRBO-AGP performs significantly better (p < .05) than the other metaheuristic algorithms on the MAE and RMSE metrics and also achieves the highest R² score. © 2025
dc.description.fulltextYes
dc.description.harvestedfromManual
dc.description.indexedbyWOS
dc.description.indexedbyScopus
dc.description.openaccessHybrid OA
dc.description.openaccessGold OA
dc.description.peerreviewstatusN/A
dc.description.publisherscopeInternational
dc.description.readpublishN/A
dc.description.sponsoredbyTubitakEuN/A
dc.description.versionN/A
dc.identifier.doi10.1016/j.eswa.2025.129194
dc.identifier.eissn1873-6793
dc.identifier.embargoNo
dc.identifier.issn0957-4174
dc.identifier.quartileQ1
dc.identifier.scopus2-s2.0-105012187852
dc.identifier.urihttps://doi.org/10.1016/j.eswa.2025.129194
dc.identifier.urihttps://hdl.handle.net/20.500.14288/32463
dc.identifier.volume296
dc.identifier.wos001546967000003
dc.keywordsDrug discovery
dc.keywordsFeature selection
dc.keywordsMetaheuristic approach
dc.keywordsProtein solubility prediction
dc.language.isoeng
dc.publisherElsevier
dc.relation.affiliationKoç University
dc.relation.collectionKoç University Institutional Repository
dc.relation.ispartofExpert Systems with Applications
dc.relation.openaccessYes
dc.rightsCC BY-NC-ND (Attribution-NonCommercial-NoDerivs)
dc.rights.uriAttribution, Non-commercial, No Derivative Works (CC-BY-NC-ND)
dc.subjectBioinformatics
dc.subjectProtein engineering
dc.titleNRBO-AGP: a novel feature selection approach for accurate protein solubility prediction
dc.typeJournal Article
dspace.entity.typePublication
relation.isOrgUnitOfPublication3fc31c89-e803-4eb1-af6b-6258bc42c3d8
relation.isOrgUnitOfPublication.latestForDiscovery3fc31c89-e803-4eb1-af6b-6258bc42c3d8
relation.isParentOrgUnitOfPublication434c9663-2b11-4e66-9399-c863e2ebae43
relation.isParentOrgUnitOfPublication.latestForDiscovery434c9663-2b11-4e66-9399-c863e2ebae43
