Research Articles (SANBI)
Item The contribution of exon-skipping events on chromosome 22 to protein coding diversity(Cold Spring Harbor Laboratory Press, 2001) Hide, Winston A.; Babenko, Vladimir N.; van Heusden, Peter A.Completion of the human genome sequence provides evidence for a gene count with lower bound 30,000–40,000. Significant protein complexity may derive in part from multiple transcript isoforms. Recent EST based studies have revealed that alternate transcription, including alternative splicing, polyadenylation and transcription start sites, occurs within at least 30–40% of human genes. Transcript form surveys have yet to integrate the genomic context, expression, frequency, and contribution to protein diversity of isoform variation. We determine here the degree to which protein coding diversity may be influenced by alternate expression of transcripts by exhaustive manual confirmation of genome sequence annotation, and comparison to available transcript data to accurately associate skipped exon isoforms with genomic sequence. Relative expression levels of transcripts are estimated from EST database representation. The rigorous in silico method accurately identifies exon skipping using verified genome sequence. 545 genes have been studied in this first hand-curated assessment of exon skipping on chromosome 22.Item Comparative analysis of testis and ovary transcriptomes in zebrafish by combining experimental and computational tools(Wiley, 2004) Li, Yang; Chia, Jer, M; Bartfai, Richard; Christoffels, Alan; Yue, Gen, H; Ding, Ke; Ho, Mei, Y; Hill, James, A; Stupka, Elia; Orban, LaszloStudies on the zebrafish model have contributed to our understanding of several important developmental processes, especially those that can be easily studied in the embryo. However, knowledge on late events such as gonad differentiation in the zebrafish is still limited. 
Here we report an analysis of the gene sets expressed in the adult zebrafish testis and ovary in an attempt to identify genes with a potential role in (zebra)fish gonad development and function. We produced 10 533 expressed sequence tags (ESTs) from zebrafish testis or ovary and downloaded an additional 23 642 gonad-derived sequences from the zebrafish EST database. We clustered these sequences together with over 13 000 kidney-derived zebrafish ESTs to study partial transcriptomes for these three organs. We searched for genes with gonad-specific expression by screening macroarrays containing at least 2600 unique cDNA inserts with testis-, ovary- and kidney-derived cDNA probes. Clones hybridizing to only one of the two gonad probes were selected, and subsequently screened with computational tools to identify 72 genes with potentially testis-specific and 97 genes with potentially ovary-specific expression, respectively. PCR-amplification confirmed gonad-specificity for 21 of the 45 clones tested (all without known function). Our study, which involves over 47 000 EST sequences and specialized cDNA arrays, is the first analysis of adult organ transcriptomes of zebrafish at such a scale. The study of genes expressed in adult zebrafish testis and ovary will provide useful information on regulation of gene expression in teleost gonads and might also contribute to our understanding of the development and differentiation of reproductive organs in vertebrates.Item Opportunities in Africa for training in genome science(Academic Journals, 2004) Masiga, Daniel K.; Isokpehi, Raphael D.Genome science is a new type of biology that unites genetics, molecular biology, computational biology and bioinformatics. 
The availability of the human genome sequence, as well as the genome sequences of several other organisms relevant to health, agriculture and the environment in Africa necessitates the development and delivery of several types and levels of training that will enhance the use of genome data and the associated computational resources. A survey of initiatives that provide opportunities for training in genome science is presented. Current efforts to increase the ability of African scientists to computationally process and analyse genomic and post-genomic data have the potential to produce excellent scientists who perform cutting-edge, hypothesis-based research, and who will accelerate the continent's scientific and technological development.Item FRAGS: Estimation of coding sequence substitution rates from fragmentary data(BMC, 2004) Swart, Estienne C; Hide, Winston A; Seoighe, CathalRates of substitution in protein-coding sequences can provide important insights into evolutionary processes that are of biomedical and theoretical interest. Increased availability of coding sequence data has enabled researchers to estimate more accurately the coding sequence divergence of pairs of organisms. However the use of different data sources, alignment protocols and methods to estimate substitution rates leads to widely varying estimates of key parameters that define the coding sequence divergence of orthologous genes. 
Although complete genome sequence data are not available for all organisms, fragmentary sequence data can provide accurate estimates of substitution rates provided that an appropriate and consistent methodology is used and that differences in the estimates obtainable from different data sources are taken into account.Item Mice and men: Their promoter properties(PLoS Genetics, 2006) Bajic, Vladimir B.; Tan, Sin lam; Christoffels, Alan; Schonbach, Christian; Lipovich, Leonard; Yang, Liang; Hofmann, Oliver; Kruger, Adele; Hide, Winston; Kai, Chikatoshi; Kawai, Jun; Hume, David, A.; Carninci, Piero; Hayashizaki, YoshihideUsing the two largest collections of Mus musculus and Homo sapiens transcription start sites (TSSs) determined based on CAGE tags, ditags, full-length cDNAs, and other transcript data, we describe the compositional landscape surrounding TSSs with the aim of gaining better insight into the properties of mammalian promoters. We classified TSSs into four types based on compositional properties of regions immediately surrounding them. These properties highlighted distinctive features in the extended core promoters that helped us delineate boundaries of the transcription initiation domain space for both species. The TSS types were analyzed for associations with initiating dinucleotides, CpG islands, TATA boxes, and an extensive collection of statistically significant cis-elements in mouse and human. We found that different TSS types show preferences for different sets of initiating dinucleotides and cis-elements. Through Gene Ontology and eVOC categories and tissue expression libraries we linked TSS characteristics to expression. Moreover, we show a link of TSS characteristics to very specific genomic organization in an example of immune-response-related genes (GO:0006955). 
Our results shed light on the global properties of the two transcriptomes not revealed before and therefore provide the framework for better understanding of the transcriptional mechanisms in the two species, as well as a framework for development of new and more efficient promoter- and gene-finding tools.Item DDESC: Dragon database for exploration of sodium channels in human(BMC Cancer, 2008) Sagar, Sunil; Kaur, Mandeep; Dawe, Adam; Seshadri, Sundararajan V.; Christoffels, Alan; Schaefer, Ulf; Radovanovic, Aleksander; Bajic, Vladimir B.Sodium channels are heteromultimeric, integral membrane proteins that belong to a superfamily of ion channels. The mutations in genes encoding for sodium channel proteins have been linked with several inherited genetic disorders such as febrile epilepsy, Brugada syndrome, ventricular fibrillation, long QT syndrome, or channelopathy associated insensitivity to pain. In spite of these significant effects that sodium channel proteins/genes could have on human health, there is no publicly available resource focused on sodium channels that would support exploration of the sodium channel related information. We report here Dragon Database for Exploration of Sodium Channels in Human (DDESC), which provides comprehensive information related to sodium channels regarding different entities, such as "genes and proteins", "metabolites and enzymes", "toxins", "chemicals with pharmacological effects", "disease concepts", "human anatomy", "pathways and pathway reactions" and their potential links. DDESC is compiled based on text- and data-mining. It allows users to explore potential associations between different entities related to sodium channels in human, as well as to automatically generate novel hypotheses. 
DDESC is the first publicly available resource where the information related to sodium channels in human can be explored at different levels.Item Transcriptomic analyses reveal novel genes with sexually dimorphic expression in the zebrafish gonad and brain(PLoS ONE, 2008) Sreenivasan, Rajini; Cai, Minnie; Bartfai, Richard; Wang, Xingang; Orban, Laszlo; Christoffels, AlanOur knowledge of zebrafish reproduction is very limited. We generated a gonad-derived cDNA microarray from zebrafish and used it to analyze large-scale gene expression profiles in adult gonads and other organs. We have identified 116638 gonad-derived zebrafish expressed sequence tags (ESTs), 21% of which were isolated in our lab. Following in silico normalization, we constructed a gonad-derived microarray comprising 6370 unique, full-length cDNAs from differentiating and adult gonads. Labeled targets from adult gonad, brain, kidney and ‘rest-of-body’ from both sexes were hybridized onto the microarray. Our analyses revealed 1366, 881 and 656 differentially expressed transcripts (34.7% novel) that showed highest expression in ovary, testis and both gonads respectively. Hierarchical clustering showed correlation of the two gonadal transcriptomes and their similarities to those of the brains. In addition, we have identified 276 genes showing sexually dimorphic expression both between the brains and between the gonads. By in situ hybridization, we showed that the gonadal transcripts with the strongest array signal intensities were germline-expressed. We found that five members of the GTP-binding septin gene family, from which only one member (septin 4) has previously been implicated in reproduction in mice, were all strongly expressed in the gonads. We have generated a gonad-derived zebrafish cDNA microarray and demonstrated its usefulness in identifying genes with sexually dimorphic co-expression in both the gonads and the brains. 
We have also provided the first evidence of large-scale differential gene expression between female and male brains of a teleost. Our microarray would be useful for studying gonad development, differentiation and function not only in zebrafish but also in related teleosts via cross-species hybridizations. Since several genes have been shown to play similar roles in gonadogenesis in zebrafish and other vertebrates, our array may even provide information on genetic disorders affecting gonadal phenotypes and fertility in mammals.Item Database for exploration of functional context of genes implicated in ovarian cancer(Oxford Journals, 2009) Kaur, Mandeer; Radovanovic, Aleksander; Essack, Magbubah; Schaefer, Ulf; Maqungo, Monique; Kibler, Tracey; Schmeier, Sebastian; Christoffels, Alan; Narasimhan, Kothandaraman; Choolani, Mahesh; Bajic, Vladimir B.Ovarian cancer (OC) is becoming the most common gynecological cancer in developed countries and the most lethal gynecological malignancy. It is also the fifth leading cause of all cancer-related deaths in women. The identification of diagnostic biomarkers and development of early detection techniques for OC largely depends on the understanding of the complex functionality and regulation of genes involved in this disease. Unfortunately, information about these OC genes is scattered throughout the literature and various databases making extraction of relevant functional information a complex task. To reduce this problem, we have developed a database dedicated to OC genes to support exploration of functional characterization and analysis of biological processes related to OC. The database contains general information about OC genes, enriched with the results of transcription regulation sequence analysis and with relevant text mining to provide insights into associations of the OC genes with other genes, metabolites, pathways and nuclear proteins. 
Overall, it enables exploration of relevant information for OC genes from multiple angles, making it a unique resource for OC and will serve as a useful complement to the existing public resources for those interested in OC genetics.Item DDEC: Dragon database of genes implicated in esophageal cancer(BioMed Central, 2009) Essack, Magbubah; Radovanovic, Aleksander; Schaefer, Ulf; Schmeier, Sebastian; Seshadri, Sundararajan V.; Christoffels, Alan; Kaur, Mandeep; Bajic, Vladimir B.Esophageal cancer ranks eighth in order of cancer occurrence. Its lethality primarily stems from inability to detect the disease during the early organ-confined stage and the lack of effective therapies for advanced-stage disease. Moreover, the understanding of molecular processes involved in esophageal cancer is not complete, hampering the development of efficient diagnostics and therapy. Efforts made by the scientific community to improve the survival rate of esophageal cancer have resulted in a wealth of scattered information that is difficult to find and not easily amenable to data-mining. To reduce this gap and to complement available cancer related bioinformatic resources, we have developed a comprehensive database (Dragon Database of Genes Implicated in Esophageal Cancer) with esophageal cancer related information, as an integrated knowledge database aimed at representing a gateway to esophageal cancer related data. The database contains 529 manually curated genes differentially expressed in EC. We extracted and analyzed the promoter regions of these genes and complemented gene-related information with transcription factors that potentially control them. We further precompiled text-mined and data-mined reports about each of these genes to allow for easy exploration of information about associations of EC-implicated genes with other human genes and proteins, metabolites and enzymes, toxins, chemicals with pharmacological effects, disease concepts and human anatomy. 
The resulting database, DDEC, has a useful feature to display potential associations that are rarely reported and thus difficult to identify. Moreover, DDEC enables inspection of potentially new 'association hypotheses' generated based on the precompiled reports. We hope that this resource will serve as a useful complement to the existing public resources and as a good starting point for researchers and physicians interested in EC genetics.Item Genome-wide SNP identification by high-throughput sequencing and selective mapping allows sequence assembly positioning using a framework genetic linkage map(BioMed Central, 2010) Celton, Jean M.; Christoffels, Alan; Sargant, Daniel J.; Xu, Xiangming; Rees, Jasper G.Determining the position and order of contigs and scaffolds from a genome assembly within an organism’s genome remains a technical challenge in a majority of sequencing projects. In order to exploit contemporary technologies for DNA sequencing, we developed a strategy for whole genome single nucleotide polymorphism sequencing allowing the positioning of sequence contigs onto a linkage map using the bin mapping method. The strategy was tested on a draft genome of the fungal pathogen Venturia inaequalis, the causal agent of apple scab, and further validated using sequence contigs derived from the diploid plant genome Fragaria vesca. Using our novel method we were able to anchor 70% and 92% of sequence assemblies for V. inaequalis and F. vesca, respectively, to genetic linkage maps. 
We demonstrated the utility of this approach by accurately determining the bin map positions of the majority of the large sequence contigs from each genome sequence and validated our method by mapping single sequence repeat markers derived from sequence contigs on a full mapping population.Item DDPC: Dragon database of genes associated with prostate cancer(Oxford Journals, 2011) Maqungo, Monique; Kaur, Mandeep; Kwofie, Samuel K.; Radovanovic, Aleksander; Schaefer, Ulf; Schmeier, Sebastian; Oppon, Ekow; Christoffels, Alan; Bajic, Vladimir B.Prostate cancer (PC) is one of the most commonly diagnosed cancers in men. PC is relatively difficult to diagnose due to a lack of clear early symptoms. Extensive research of PC has led to the availability of a large amount of data on PC. Several hundred genes are implicated in different stages of PC, which may help in developing diagnostic methods or even cures. In spite of this accumulated information, effective diagnostics and treatments remain elusive. We have developed Dragon Database of Genes associated with Prostate Cancer (DDPC) as an integrated knowledgebase of genes experimentally verified as implicated in PC. DDPC is distinct from other databases in that (i) it provides pre-compiled biomedical text-mining information on PC, which would otherwise require tedious computational analyses, (ii) it integrates data on molecular interactions, pathways, gene ontologies, gene regulation at molecular level, predicted transcription factor binding sites on promoters of PC implicated genes and transcription factors that correspond to these binding sites and (iii) it contains DrugBank data on drugs associated with PC. 
We believe this resource will serve as a source of useful information for research on PC.Item DAMPD: a manually curated antimicrobial peptide database(Oxford University Press, 2012) Sundararajan, Vijayaraghava S.; Gabere, Musa N.; Pretorius, Ashley; Adam, Saleem; Christoffels, Alan; Minna, Lehvaslaiho; Archer, John A.C.; Bajic, Vladimir B.The demand for antimicrobial peptides (AMPs) is rising because of the increased occurrence of pathogens that are tolerant or resistant to conventional antibiotics. Since naturally occurring AMPs could serve as templates for the development of new anti-infectious agents to which pathogens are not resistant, a resource that contains relevant information on AMPs is of great interest. To that end, we developed the Dragon Antimicrobial Peptide Database (DAMPD, http://apps.sanbi.ac.za/dampd) that contains 1232 manually curated AMPs. DAMPD is an update and a replacement of the ANTIMIC database. In DAMPD an integrated interface allows simple querying based on taxonomy, species, AMP family, citation, keywords and a combination of search terms and fields (Advanced Search). A number of tools such as Blast, ClustalW, HMMER, Hydrocalculator, SignalP, AMP predictor, as well as a number of other resources that provide additional information about the results are also provided and integrated into DAMPD to augment biological analysis of AMPs.Item The African Coelacanth genome provides insights into tetrapod evolution(Macmillan Publishers, 2013) Christoffels, Alan; Hesse, Uljana; Gamieldien, Junaid; Panji, Sumir; Picone, Barbara; Van Heusden, PeterThe discovery of a living coelacanth specimen in 1938 was remarkable, as this lineage of lobe-finned fish was thought to have become extinct 70 million years ago. The modern coelacanth looks remarkably similar to many of its ancient relatives, and its evolutionary proximity to our own fish ancestors provides a glimpse of the fish that first walked on land. 
Here we report the genome sequence of the African coelacanth, Latimeria chalumnae. Through a phylogenomic analysis, we conclude that the lungfish, and not the coelacanth, is the closest living relative of tetrapods. Coelacanth protein-coding genes are significantly more slowly evolving than those of tetrapods, unlike other genomic features. Analyses of changes in genes and regulatory elements during the vertebrate adaptation to land highlight genes involved in immunity, nitrogen excretion and the development of fins, tail, ear, eye, brain and olfaction. Functional assays of enhancers involved in the fin-to-limb transition and in the emergence of extra-embryonic tissues show the importance of the coelacanth genome as a blueprint for understanding tetrapod evolution.Item Trends in Genotypic HIV-1 Antiretroviral resistance between 2006 and 2012 in South African Patients receiving first- and second line antiretroviral treatment regimens(University of the Western Cape, 2013) Van Zyl, Gert U.; Liu, Tommy F.; Claassen, Mathilda; Engelbrecht, Susan; de Oliveira, Tulio; Preiser, Wolfgang; Wood, Natasha T.; Travers, Simon A.; Shafer, Robert W.South Africa's national antiretroviral (ARV) treatment program expanded in 2010 to include the nucleoside reverse transcriptase (RT) inhibitors (NRTI) tenofovir (TDF) for adults and abacavir (ABC) for children. 
We investigated the associated changes in genotypic drug resistance patterns in patients with first-line ARV treatment failure since the introduction of these drugs, and protease inhibitor (PI) resistance patterns in patients who received ritonavir-boosted lopinavir (LPV/r)-containing therapy.Item Evidence that dicot-infecting mastreviruses are particularly prone to inter-species recombination and have likely been circulating in Australia for longer than in Africa and the Middle East(Elsevier, 2013) Kraberger, Simona; Harkins, Gordon William; Kumari, Safaa G.; Thomas, John E.; Schwinghamer, Mark W.; Sharman, Murray; Collings, David A.; Briddon, Rob W.; Martin, Darren Patrick; Varsani, ArvindViruses of the genus Mastrevirus (family Geminiviridae) are transmitted by leafhoppers and infect either mono- or dicotyledonous plants. Here we have determined the full length sequences of 49 dicot-infecting mastrevirus isolates sampled in Australia, Eritrea, India, Iran, Pakistan, Syria, Turkey and Yemen. Comprehensive analysis of all available dicot-infecting mastrevirus sequences showed the diversity of these viruses in Australia to be greater than in the rest of their known range, consistent with earlier studies, and that, in contrast with the situation in monocot-infecting mastreviruses, detected inter-species recombination events outnumbered intra-species recombination events. 
Consistent with Australia having the greatest diversity of known dicot-infecting mastreviruses, phylogeographic analyses indicating the most plausible scheme for the spread of these viruses to their present locations suggest that the most recent common ancestor of these viruses is likely nearer to Australia than to the other regions investigated.Item Challenges of biobanking in South Africa to facilitate indigenous research in an environment burdened with human immunodeficiency virus, tuberculosis, and emerging non-communicable diseases(Mary Ann Liebert, Inc., 2013) Abayomi, Akin; Christoffels, Alan; Grewal, Ravnit; Karam, Locunda A.; Rossouw, Catherine; Staunton, Ciara; Swanepoel, Carmen; van Rooyen, BeverleyThe high burden of infectious diseases and the growing problem of noncommunicable and metabolic disease syndromes in South Africa (SA) force a more focused research approach to facilitate cutting-edge scientific growth and public health development. Increased SA research on these diseases and syndromes and the collection of associated biospecimens have ensured a plethora of biobanks created by individuals, albeit without the foresight of prospective and collective use by other local and international researchers. As the need for access to high-quality specimens in statistically relevant numbers has increased, so has the necessity for the development of national human biobanks in SA and across the Continent. The prospects of achieving sustainable centralized biobanks are still an emerging and evolving concept, primarily and recently driven by the launch of the H3Africa consortium, which includes the development of harmonized and standardized biobanking operating procedures. This process is hindered by a myriad of complex societal considerations and ethico-legal challenges. 
Efforts to consolidate and standardize biological sample collections are further compromised by the lack of full appreciation by national stakeholders of the biological value inherent in these collections, and the availability of high quality human samples with well-annotated data for future scientific research and development. Inadequate or nonexistent legislative structures that specifically regulate the storage, use, dispersal, and disposal of human biological samples are common phenomena and pose further challenges. Furthermore, concerns relating to consent for unspecific future uses, as well as access to information and data protection, are all new paradigms that require further consideration and public engagement. This article reviews fundamental issues such as governance, ethics, infrastructure, and bioinformatics that are important foundational prerequisites for the establishment and evolution of successful human biobanking in South Africa.Item The Influence of N-Linked Glycans on the Molecular Dynamics of the HIV-1 gp120 V3 Loop(PLoS ONE, 2013) Wood, Natasha T.; Fadda, Elisa; Davis, Robert; Grant, Oliver C.; Martin, Joanne C.; Woods, Robert J.; Travers, Simon A.N-linked glycans attached to specific amino acids of the gp120 envelope trimer of an HIV virion can modulate the binding affinity of gp120 to CD4, influence coreceptor tropism, and play an important role in neutralising antibody responses. Because of the challenges associated with crystallising fully glycosylated proteins, most structural investigations have focused on describing the features of a non-glycosylated HIV-1 gp120 protein. Here, we use a computational approach to determine the influence of N-linked glycans on the dynamics of the HIV-1 gp120 protein and, in particular, the V3 loop. 
We compare the conformational dynamics of a non-glycosylated gp120 structure to that of two glycosylated gp120 structures, one with a single, and a second with five, covalently linked high-mannose glycans. Our findings provide a clear illustration of the significant effect that N-linked glycosylation has on the temporal and spatial properties of the underlying protein structure. We find that glycans surrounding the V3 loop modulate its dynamics, conferring to the loop a marked propensity towards a narrower conformation relative to its non-glycosylated counterpart. The conformational effect on the V3 loop provides further support for the suggestion that N-linked glycosylation plays a role in determining HIV-1 coreceptor tropism.Item Avihepadnavirus diversity in parrots is comparable to that found amongst all other avian species(Elsevier, 2013) Piasecki, Tomasz; Harkins, Gordon William; Chrzastek, Klaudia; Julian, Laurel; Martin, Darren Patrick; Varsani, ArvindAvihepadnaviruses have previously been isolated from various species of duck, goose, stork, heron and crane. Recently the first parrot avihepadnavirus was isolated from a Ring-necked Parakeet in Poland. In this study, 41 psittacine liver samples archived in Poland over the last nine years were tested for presence of Parrot hepatitis B virus (PHBV). We cloned and sequenced PHBV isolates from 18 birds including a Crimson Rosella, an African grey parrot and sixteen Ring-necked Parakeets. PHBV isolates display a degree of diversity (>78% genome wide pairwise identity) that is comparable to that found amongst all other avihepadnaviruses (>79% genome wide pairwise identity). The PHBV viruses can be subdivided into seven genetically distinct groups (tentatively named A-G) of which the two isolates of PHBV-G are the most divergent, sharing 79% genome wide pairwise identity with all other PHBVs. 
All PHBV isolates display classical avihepadnavirus genome architecture.Item Characterizing the emergence and persistence of drug resistant mutations in HIV-1 subtype C infections using 454 ultra deep pyrosequencing(BioMed Central -The Open Access Publisher, 2013) Bansode, Vijay; McCormack, Grace P.; Shrestha, Ram K.; Travers, Simon A.; Crampin, Amelia C.; Ngwira, Bagrey; French, Neil; Glynn, Judith R.BACKGROUND: The role of HIV-1 RNA in the emergence of resistance to antiretroviral therapies (ARTs) is well documented while less is known about the role of historical viruses stored in the proviral DNA. The primary focus of this work was to characterize the genetic diversity and evolution of HIV drug resistant variants in an individual’s provirus during antiretroviral therapy using next generation sequencing. METHODS: Blood samples were collected prior to antiretroviral therapy exposure and during the course of treatment from five patients in whom drug resistance mutations had previously been identified using consensus sequencing. The spectrum of viral variants present in the provirus at each sampling time-point were characterized using 454 pyrosequencing from multiple combined PCR products. The prevalence of viral variants containing drug resistant mutations (DRMs) was characterized at each time-point. RESULTS: Low abundance drug resistant viruses were identified in 14 of 15 sampling time-points from the five patients. In all individuals DRMs against current therapy were identified at one or more of the sampling time-points. In two of the five individuals studied these DRMs were present prior to treatment exposure and were present at high prevalence within the amplified and sequenced viral population. DRMs to drugs other than those being currently used were identified in four of the five individuals. 
CONCLUSION: The presence of DRMs in the provirus, regardless of their observed prevalence, did not appear to have an effect on clinical outcomes in the short term, suggesting that the drug resistant viral variants present in the proviral DNA do not play a role in the short term in facilitating the emergence of drug resistance.Item Evidence of pervasive biologically functional secondary structures within the Genomes of Eukaryotic Single-Stranded DNA Viruses(American Society for Microbiology, 2013) Muhire, Brejnev Muhizi; Golden, Michael; Tanov, Emil Pavlov; Harkins, Gordon William; Murrell, Ben; Lefeuvre, Pierre; Lett, Jean-Michel; Gray, Alistair; Poon, Art Y. F.; Ngandu, Nobubelo Kwanele; Semegni, Yves; Monjane, Adérito Luis; Varsani, Arvind; Shepherd, Dionne Natalie; Martin, Darren PatrickSingle-stranded DNA (ssDNA) viruses have genomes that are potentially capable of forming complex secondary structures through Watson-Crick base pairing between their constituent nucleotides. A few of the structural elements formed by such base pairings are, in fact, known to have important functions during the replication of many ssDNA viruses. Unknown, however, are (i) whether numerous additional ssDNA virus genomic structural elements predicted to exist by computational DNA folding methods actually exist and (ii) whether those structures that do exist have any biological relevance. We therefore computationally inferred lists of the most evolutionarily conserved structures within a diverse selection of animal- and plant-infecting ssDNA viruses drawn from the families Circoviridae, Anelloviridae, Parvoviridae, Nanoviridae, and Geminiviridae and analyzed these for evidence of natural selection favoring the maintenance of these structures. 
While we find evidence that is consistent with purifying selection being stronger at nucleotide sites that are predicted to be base paired than at sites predicted to be unpaired, we also find strong associations between sites that are predicted to pair with one another and site pairs that are apparently coevolving in a complementary fashion. Collectively, these results indicate that natural selection actively preserves much of the pervasive secondary structure that is evident within eukaryote-infecting ssDNA virus genomes and, therefore, that much of this structure is biologically functional. Lastly, we provide examples of various highly conserved but completely uncharacterized structural elements that likely have important functions within some of the ssDNA virus genomes analyzed here.