Release notes
Description
- Large-scale designation release implementing an ambiguity cutoff of 5% within the coding region of SARS-CoV-2. Both N and non-N ambiguity count towards the threshold.
- Ambiguity assessed for 1,382,550 sequences, using GISAID data as of 2021-05-07
Figure 1: Ambiguity content in sequences on GISAID, coloured by whether they have been designated a Pango lineage or not. Vertical dashed lines highlight the 1% and 5% ambiguity thresholds respectively.
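The ambiguity check described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline used for this release. It assumes the sequence is already aligned to the reference, that the coding region spans positions 266 to 29674 (the release does not state coordinates), and that ambiguity means any IUPAC ambiguity code. The function names are hypothetical.

```python
# IUPAC nucleotide ambiguity codes: N plus the non-N codes.
AMBIGUITY_CODES = frozenset("NRYKMSWBDHV")

# Assumed coding-region bounds (1-based, inclusive) on the reference
# alignment; these values are not given in the release notes.
CODING_START = 266
CODING_END = 29674


def coding_ambiguity_fraction(aligned_seq, start=CODING_START, end=CODING_END):
    """Return the fraction of coding-region positions holding an ambiguity code."""
    region = aligned_seq[start - 1:end].upper()
    if not region:
        raise ValueError("coding region is empty for this sequence")
    ambiguous = sum(1 for base in region if base in AMBIGUITY_CODES)
    return ambiguous / len(region)


def passes_threshold(aligned_seq, threshold=0.05, start=CODING_START, end=CODING_END):
    """True when coding-region ambiguity is below the threshold.

    The notes do not say how a sequence at exactly 5% is treated;
    the strict comparison here is an assumption.
    """
    return coding_ambiguity_fraction(aligned_seq, start, end) < threshold


# Toy 100-base "coding region": 2 N and 2 non-N ambiguity codes.
toy = "A" * 96 + "NNRY"
print(passes_threshold(toy, start=1, end=100))
```

Gaps are not counted as ambiguity in this sketch; a real implementation would need to decide how to treat them.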
Outcome
- Designated sequences that passed the 5% ambiguity threshold: 436,887
- Designated sequences removed because their calculated ambiguity was >5%: 18,246
Figure 2: Updated designation record to include only those sequences with a coding region ambiguity content less than 5%. Vertical dashed lines highlight the 1% and 5% ambiguity thresholds respectively.
- Note: 18,665 designated sequences did not match a sequence in GISAID and so were not checked for ambiguity. Many of these will be from COG-UK and are not yet on GISAID.
- Total sequences written to file: 455,552
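The counts above fit together as follows: the total written to file is the sequences that passed plus the unchecked ones. The short check below reproduces that arithmetic. Treating "passed plus removed" as the full set of checked designated sequences is an assumption, and the variable names are illustrative only.

```python
passed = 436_887     # designated sequences under the 5% threshold
removed = 18_246     # designated sequences over the 5% threshold
unchecked = 18_665   # no GISAID match, kept without an ambiguity check

# Sequences written to file: everything that passed plus everything unchecked.
written = passed + unchecked
assert written == 455_552

# Assumption: passed + removed covers every designated sequence that was checked.
checked = passed + removed
removed_share = removed / checked
print(f"{written:,} written; {removed_share:.1%} of checked sequences removed")
```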
Lineages that now have fewer than 5 designated sequences
Total count = 4
Lineage    previous_count    new_count
C.10       4                 4
C.6        4                 3
AT.1       6                 4
B.1.617    1                 1
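As a small sketch of how such a table could be derived, the snippet below flags lineages whose new count falls under 5 and separately lists those that lost sequences in this release. The rows are copied from the table; the helper names and the `MIN_DESIGNATED` constant are hypothetical.

```python
# (lineage, previous_count, new_count) rows copied from the table.
rows = [
    ("C.10", 4, 4),
    ("C.6", 4, 3),
    ("AT.1", 6, 4),
    ("B.1.617", 1, 1),
]

MIN_DESIGNATED = 5  # lineages below this count are flagged


def under_minimum(counts, minimum=MIN_DESIGNATED):
    """Lineages whose new designated-sequence count is below the minimum."""
    return [lineage for lineage, _prev, new in counts if new < minimum]


def lost_sequences(counts):
    """Lineages whose designated-sequence count dropped in this release."""
    return [lineage for lineage, prev, new in counts if new < prev]


flagged = under_minimum(rows)
print(len(flagged), flagged)
print(lost_sequences(rows))
```

Of the four flagged lineages, only C.6 and AT.1 actually lost sequences to the threshold; C.10 and B.1.617 were already below 5.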
Lineage stats
- Lineages with altered designations: 912
- Total lineage count: 1,288