Class IV and class V viruses are single-stranded RNA (ssRNA) viruses. Class IV viruses carry a positive-sense genome that host machinery translates directly into protein, allowing viral reproduction. Class V viruses contain a negative-sense genome that must be transcribed into a positive-sense strand before translation. As the virus uses the host cell to replicate, host proteins interact heavily with the viral genome, and researchers have been investigating these interactions.
Study: Investigating the human host – ssRNA virus interaction landscape using the SMEAGOL toolbox. Image Credit: ART-ur/Shutterstock
A preprint version of the study is available on the bioRxiv* server while the article undergoes peer review.
The study
The researchers used SMEAGOL, a Python library designed for motif occurrence analysis in nucleic acid sequences using position weight matrices (PWMs). These PWMs represent the binding specificity of nucleic acid-interacting regulators, and SMEAGOL can load them from certain databases of RNA-binding protein (RBP) binding specificities. Most curated databases of RBP binding motifs contain both high- and low-confidence PWMs; SMEAGOL allows users to analyze and filter these and then scan sequences with the curated set. Post-processing modules test for statistical enrichment or depletion of sequence motifs and predict the effects of sequence variants on PWM sites.
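To make the idea of PWM scanning concrete, here is a minimal sketch in plain Python. It does not use SMEAGOL's API; the motif, the uniform background, the 80% score threshold, and the function names `to_pwm` and `scan` are all assumptions chosen for illustration.

```python
import math

# Hypothetical 4-position motif given as per-position base probabilities
# (a toy U-rich site, not taken from any real RBP database).
PPM = [
    {"A": 0.1, "C": 0.1, "G": 0.1, "U": 0.7},
    {"A": 0.1, "C": 0.1, "G": 0.1, "U": 0.7},
    {"A": 0.7, "C": 0.1, "G": 0.1, "U": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "U": 0.7},
]
BACKGROUND = 0.25  # assumed uniform base frequencies


def to_pwm(ppm, background=BACKGROUND):
    """Convert probabilities to log2-odds weights against the background."""
    return [{b: math.log2(p / background) for b, p in col.items()} for col in ppm]


def scan(sequence, pwm, threshold):
    """Return (start, score) for every window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Sum the weight of the observed base at each motif position.
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


pwm = to_pwm(PPM)
max_score = sum(max(col.values()) for col in pwm)
# Assumed cutoff: keep windows scoring at least 80% of the best possible score.
hits = scan("GGUUAUCCGUUAUA", pwm, threshold=0.8 * max_score)
print(hits)
```

The sketch assumes the input contains only A, C, G, and U; ambiguous bases would need separate handling.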
To identify enriched and depleted binding sites across ssRNA virus genomes, the researchers used SMEAGOL to filter and curate PWMs representing the determined sequence binding preferences of RBPs from the previously mentioned databases. This yielded a set of 362 usable PWMs representing the binding specificities of 146 RBPs. They then used this set with SMEAGOL to scan the genomes of 197 ssRNA viruses for potential RBP binding sites.
By measuring the enrichment or depletion of RBP binding motifs in each viral genome relative to dinucleotide-randomized versions of that genome, the researchers hoped to find evidence of evolutionary selection for or against RBP binding to viral RNA. Because the complementary strand of the ssRNA genome is produced during the viral life cycle and can also interact with host cell features, it was analyzed as well.
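The comparison against randomized genomes can be sketched as follows. This is a simplified illustration, not the study's pipeline: it counts exact matches to a short motif rather than PWM hits, and it uses a plain base shuffle that preserves only mononucleotide composition, whereas the study used dinucleotide-randomized genomes. The toy genome, the motif, and the function names are assumptions.

```python
import random
import statistics


def reverse_complement(rna):
    """Return the complementary strand, read 5' to 3'."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(rna))


def count_sites(sequence, motif):
    """Count (possibly overlapping) exact occurrences of motif."""
    width = len(motif)
    return sum(
        sequence[i:i + width] == motif
        for i in range(len(sequence) - width + 1)
    )


def enrichment(sequence, motif, n_shuffles=200, seed=0):
    """Compare the observed site count with counts in shuffled sequences.

    NOTE: a plain shuffle keeps only base composition; it is a simpler
    stand-in for the dinucleotide-preserving randomization in the study.
    """
    rng = random.Random(seed)
    observed = count_sites(sequence, motif)
    bases = list(sequence)
    background = []
    for _ in range(n_shuffles):
        rng.shuffle(bases)
        background.append(count_sites("".join(bases), motif))
    at_least = sum(count >= observed for count in background)
    at_most = sum(count <= observed for count in background)
    return {
        "observed": observed,
        "expected": statistics.mean(background),
        # Empirical one-sided p-values with a +1 pseudocount.
        "p_enriched": (at_least + 1) / (n_shuffles + 1),
        "p_depleted": (at_most + 1) / (n_shuffles + 1),
    }


genome = "UUAU" * 10 + "GC" * 10  # toy plus strand, purely illustrative
plus = enrichment(genome, "UUAU")
minus = enrichment(reverse_complement(genome), "UUAU")
print(plus)
print(minus)
```

Running the same test on the reverse complement mirrors the study's decision to analyze both strands, since a motif enriched on one strand need not be enriched on the other.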
The researchers found that +ssRNA (Class IV) genomes showed more motif enrichment or depletion on both strands than -ssRNA (Class V) genomes, although the number varied significantly between viral families. While some RBP binding motifs are more likely to be depleted or enriched across +ssRNA genomes, others tend to be specific to a particular family. G-rich motifs are frequently depleted in the plus strand of +ssRNA genomes, while the motif for RBMX is enriched on the negative strand in several families across both groups.
Next, the scientists examined viral families individually, beginning with coronaviruses. These have the longest genomes of all the viruses examined and show more enrichment and depletion than any other family. The number of RBP motifs depleted on the plus strand is much higher than the number of enriched motifs, potentially to prevent binding by non-beneficial host RBPs. This is due to the depletion of U-rich elements (UREs) bound by RBPs. Flaviviruses are poor in uridines but are still more enriched for UREs than coronaviruses.
The researchers also looked at the 5' and 3' UTRs that bind host RBPs. Of the 116 enriched motif-UTR pairs, they found that 22 were not enriched in the whole genome, and 3' UTRs were more likely to show enrichment than 5' UTRs. When examining the Hepatitis C virus genome specifically, they found it to be highly enriched in binding sites of U-rich element-binding RBPs. The most enriched is ELAV-like RNA binding protein 1 (ELAVL1). Its binding motif is highly enriched in the 3' UTR, and ELAVL1 is known to displace Polypyrimidine Tract Binding Protein 1 to facilitate the binding of viral proteins for more effective replication.
The authors highlight that this is the first large-scale analysis of interactions between RNA viruses and human RBPs. They found multiple RBP-binding motifs that are either enriched or depleted in ssRNA virus families, with trends appearing both globally and in a family- or species-specific manner. Host factors can bind these motifs, and the differences in predicted host interactions between viral families could be significant.
The methods shown here could be very useful in helping to identify further interactions. The researchers suggest that SMEAGOL could be extended in the future to incorporate deep learning models and structure-based models.
bioRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.