2023-03-24

Carly E. Middleton, Ph.D. Student, Department of Biostatistics and Bioinformatics, University of Louisville

"Assessment of positive selection across SARS-CoV-2 variants via maximum likelihood approaches and a continuous-time approximation to the Wright-Fisher diffusion model"

Studying the genome of the SARS-CoV-2 virus, particularly to understand how the virus evolves, is crucial for managing the COVID-19 pandemic. To this end, we sample viral genomes from the online GISAID repository and use several of the maximum likelihood approaches implemented in the open-source program PAML to assess evidence for positive selection in the protein-coding regions of the SARS-CoV-2 genome. Across all major variants identified by June 2021, we find limited evidence for positive selection. In particular, we identify positive selection in a small proportion of sites (5-15%) in the protein-coding region of the spike protein across variants. Most other variants did not show a strong signal for positive selection overall, though there were indications of positive selection in the Delta and Kappa variants for the nucleocapsid protein. We additionally use a forward selection procedure to fit a model that allows branch-specific estimates of selection along a phylogeny relating the variants, and find that selective pressure on the spike protein varies across variants. We then test for positive selection within variants again, this time using an allele frequency approach based on a continuous-time approximation to the Wright-Fisher diffusion model. Our results from both methods of selection identification highlight the utility of computational approaches for identifying genomic regions under selection.
