New dataset version (tag 2022-09-27T12:00:00Z)
All SARS-CoV-2 datasets
- Data update: New Pango lineages are included; see cov-lineages/pango-designation@efabcb6...cfe736 for the newly included designations.
- Identical sequences have been removed from B.1* lineages to reduce the size of that part of the tree from ~1.6k to ~800.
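As a rough illustration of what removing identical sequences means, the following is a minimal sketch that keeps only the first record for each distinct sequence. It is not the procedure used to build the dataset: the function name `drop_identical_sequences`, the `(name, sequence)` record layout, and the case-insensitive comparison are all assumptions made for this example.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (sequence name, nucleotide sequence); hypothetical layout


def drop_identical_sequences(records: Iterable[Record]) -> List[Record]:
    """Keep the first record for each distinct sequence, preserving input order.

    Assumption: two sequences count as identical when their strings match
    exactly after upper-casing. A real pipeline may compare differently.
    """
    seen = set()
    unique: List[Record] = []
    for name, seq in records:
        key = seq.upper()
        if key in seen:
            continue  # an identical sequence was already kept
        seen.add(key)
        unique.append((name, seq))
    return unique


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy = [
        ("sample_a", "ACGTAC"),
        ("sample_b", "acgtac"),  # same as sample_a apart from case
        ("sample_c", "ACGTAA"),
    ]
    for name, seq in drop_identical_sequences(toy):
        print(name, seq)
```

Keeping the first occurrence makes the result deterministic for a given input order; which representative is retained in the actual dataset is not stated in these notes.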
BA.2 dataset (experimental)
- Reversions to wild type (Wuhan-Hu-1) are now labelled as `rev` to make it easier to spot problematic sequences.
- The dataset now contains antibody escape and ACE2 binding data from two repositories of Jesse Bloom's group on GitHub: https://jbloomlab.github.io/SARS-CoV-2-RBD_DMS_Omicron/epistatic-shifts/ and https://jbloomlab.github.io/SARS2_RBD_Ab_escape_maps/escape-calc/. For more information, please refer to: https://doi.org/10.1093/ve/veac021, https://doi.org/10.1101/2022.09.15.507787 and https://doi.org/10.1101/2022.09.20.508745.
Monkeypox datasets
- New lineages A.2.2 and B.1.10-B.1.12 have been added, see here for details: https://github.com/mpxv-lineages/lineage-designation/blob/master/designation_records/B.1.10-A.2.2_2022-09-26.md
hMPXV B.1 dataset
- Mutations to a genotype found in MPXV-UK_P2 or MPXV-M5312_HM12_Rivers are now "labelled" as `rev` (reversion to reference). This should help identify wrong calls to reference when using the B.1 dataset. Until now, these artefacts were only visible as reversions when using the hMPXV or all-clades datasets.
MPXV (All clades)
- Frame shifts and stop codons that are found in a majority of sequences from clades IIa or I are now annotated as "known" mutations, which means that they do not influence the quality score. This should help increase the signal-to-noise ratio when uploading sequences from either of these clades.