Zoon0PredV: potential virus species crossover prediction using convolutional neural networks and viral protein sequence patterns

Abstract

Biomedical science has made substantial progress in diagnosing and treating the causative agents of infectious disease and in understanding their pathogenesis. However, novel microbial pathogens continue to emerge, and existing pathogens continue to evolve alternative strategies to thrive in ever-changing environments. Various etiological agents of infectious disease originate from animal reservoirs, and several have, over time, acquired the ability to cross the species barrier, altering their host range. Computational approaches capable of analyzing large datasets are invaluable for predicting and monitoring disease outbreaks, and their effectiveness is greatly enhanced when integrated with machine learning techniques. The goal of this study is to develop a machine learning model for predicting potentially zoonotic organisms, using viral surface proteins that facilitate host cell entry as input data. Sequence data and metadata were obtained from UniProtKB and transformed into a machine-readable format using frequency chaos game representation, and a convolutional neural network model was developed to identify sequence patterns consistent with viruses that infect humans. On unseen data, the model achieves a generalization performance of 96.78% accuracy, an F1 score of 0.97, and a Matthews correlation coefficient (MCC) of 0.93. The model potentially provides a robust framework for early identification of emerging viral threats, supporting public health surveillance and risk mitigation.
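The frequency chaos game representation step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact encoding used, so everything below is an assumption made for illustration: the 20 standard amino acids are placed on the vertices of a regular polygon, each residue moves the current point halfway toward its vertex, and visits are counted on a square grid. The function name `fcgr`, the `resolution` parameter, the contraction ratio of 0.5, and the example peptide are all hypothetical and not taken from the study.

```python
import math

# Assumption: the 20 standard amino acids, one polygon vertex each.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def fcgr(sequence, resolution=16):
    """Return a resolution x resolution matrix of visit frequencies.

    Illustrative sketch only; not the encoding confirmed by the study.
    """
    n = len(AMINO_ACIDS)
    # Place each amino acid on the unit circle.
    vertices = {
        aa: (math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
        for i, aa in enumerate(AMINO_ACIDS)
    }
    counts = [[0] * resolution for _ in range(resolution)]
    x, y = 0.0, 0.0  # start at the centre of the polygon
    visited = 0
    for residue in sequence.upper():
        if residue not in vertices:
            continue  # skip ambiguous or non-standard residues
        vx, vy = vertices[residue]
        # Chaos game step: move halfway toward the residue's vertex.
        x, y = (x + vx) / 2, (y + vy) / 2
        # Map coordinates from [-1, 1] to grid indices, clamping the edge.
        col = min(int((x + 1) / 2 * resolution), resolution - 1)
        row = min(int((y + 1) / 2 * resolution), resolution - 1)
        counts[row][col] += 1
        visited += 1
    if visited == 0:
        return [[0.0] * resolution for _ in range(resolution)]
    # Normalise counts so the matrix sums to 1.
    return [[c / visited for c in row] for row in counts]


if __name__ == "__main__":
    matrix = fcgr("MKTAYIAKQR")  # hypothetical peptide, not real study data
    print(len(matrix), len(matrix[0]))
```

A matrix of this kind has a fixed size regardless of sequence length, which is what allows it to be passed to a convolutional neural network as a single-channel image.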

Description

Citation

Serage, R.A., Nyirenda, C.N., Omomule, T.G., Christoffels, A.G. and Anderson, D.E., 2026. Zoon0PredV: Potential Virus Species Crossover Prediction Using Convolutional Neural Networks and Viral Protein Sequence Patterns. Bioinformatics and Biology Insights, 20, p.11779322251415123.