github broadinstitute/gatk 4.6.0.0

Download release: gatk-4.6.0.0.zip
Docker image: https://hub.docker.com/r/broadinstitute/gatk/

Highlights of the 4.6.0.0 release:

  • We've fixed a serious CRAM writing bug that affects GATK versions 4.3 through 4.5 and Picard versions 2.27.3 through 3.1.1. In limited cases, this bug can cause reads to be written with an incorrect base sequence. See the relevant comment on GATK issue 8768 and the full release notes below for details on the conditions that trigger the bug.

    • To help users check their CRAM files, we've released a CRAM scanning tool, CRAMIssue8768Detector, that detects whether a given CRAM file is affected by this bug. If you suspect that any of your CRAM files may have been affected, please run this tool on them to confirm!
  • By overwhelming popular demand, we've switched back to using the standard ./. representation for no-calls in GenotypeGVCFs and GenomicsDB instead of 0/0 with DP=0. This reverts the change described in our article GenotypeGVCFs and the death of the dot.

    • We intend to publish a new article shortly to replace that older article with further details on this change. When we do so, we'll link to it from here.
  • The Mutect2 germline resource can now be in split-multiallelic format (one record per alternate allele)

  • Added an --inverted-read-filter argument that makes it easy to select, from the command line, the reads that fail a read filter

  • We've fixed a number of issues with HTTP support, mainly affecting the loading of side inputs such as indices over HTTP

  • Reduced the number of layers in the GATK Docker image to help users who run into Docker quota issues
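
The two trigger conditions for the CRAM bug (listed in the full release notes below) can be checked against the alignment starts of the reads in a file. The following Python sketch is hypothetical: `contigs_at_risk` and `READS_PER_CONTAINER` are illustrative names, not part of GATK. It assumes each container holds at most 10,000 reads, so any contig with more than 10,000 reads must span more than one container. A contig with fewer reads can still straddle a container boundary, so this check can miss affected contigs; it only flags risk and is no substitute for running CRAMIssue8768Detector.

```python
from collections import defaultdict
from typing import Iterable

# Assumed container capacity in reads (the release notes cite 10,000 reads).
READS_PER_CONTAINER = 10_000


def contigs_at_risk(read_starts: Iterable[tuple[str, int]],
                    reads_per_container: int = READS_PER_CONTAINER) -> set[str]:
    """Return contigs that meet both trigger conditions for GATK issue 8768.

    read_starts: (contig, 1-based alignment start) for every mapped read.
    """
    counts: dict[str, int] = defaultdict(int)
    starts_at_first_base: set[str] = set()
    for contig, pos in read_starts:
        counts[contig] += 1
        # Condition 1: a read aligned to the very first base of the contig.
        if pos == 1:
            starts_at_first_base.add(contig)
    # Condition 2 (sufficient, not necessary): more reads on the contig than
    # fit in a single container means it spans more than one container.
    return {c for c in starts_at_first_base if counts[c] > reads_per_container}
```

With pysam, for example, you could pass `(read.reference_name, read.reference_start + 1)` for each mapped read, since pysam's `reference_start` is 0-based.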

Full list of changes:

  • Important CRAM writing bug fix and detection tool

    • We've updated to HTSJDK 4.1.1 and Picard 3.2.0 (#8900), which fix a serious bug in the CRAM writing code first reported in GATK issue 8768
    • This issue affects GATK versions 4.3.0.0 through 4.5.0.0, and is fixed in GATK 4.6.0.0.
    • This issue also affects Picard versions 2.27.3 through 3.1.1, and is fixed in Picard 3.2.0.
    • The bug is triggered when writing a CRAM file using one of the affected GATK/Picard versions, and both of the following conditions are met:
      • At least one read is mapped to the very first base of a reference contig
      • The file contains more than one CRAM container (a container holds 10,000 reads by default) with reads mapped to that same reference contig
    • When both conditions are met, the resulting CRAM file may contain corrupt containers for that contig, holding reads with an incorrect base sequence.
    • Because many common references, such as hg38, have Ns at the very beginning of the autosomes and X/Y, no reads map to the first base of those contigs, so many pipelines will not be affected by this bug. However, users of a telomere-to-telomere reference, users doing mitochondrial calling, and users with reads aligned to the alt sequences should scan their CRAM files for possible corruption.
    • The other mitigating circumstance is that when a CRAM file is affected, the signal is overwhelmingly obvious: the mismatch rate in the affected regions typically jumps from under 1% to 80-90%, so the problem is likely to be caught by standard QC processes.
    • We've released a CRAM scanning tool called CRAMIssue8768Detector (#8819) that can detect whether a particular CRAM file is affected by this bug. If you suspect that some of your CRAM files may have been affected, please run this tool on them for confirmation!
  • Joint Calling

    • We've switched back to using the standard ./. representation for no-calls in GenotypeGVCFs and GenomicsDB instead of 0/0 with DP=0 (#8715) (#8741) (#8759)
    • Fix for GenotypeGVCFs with mixed ploidy sites (#8862)
    • Fix for GnarlyGenotyper when PLs are null (#8878)
    • Fixed bug in ReblockGVCF when removing annotations (#8870)
    • Enable ReblockGVCF to subset AS annotations that aren't "raw" (pipe-delimited) (#8771)
    • Remove header lines in ReblockGVCF when we remove FORMAT annotations (#8895)
    • ReblockGVCF: Fixed a spanning deletion exception found in malaria data and added a regression test (#8802)
    • Restore some GnarlyGenotyper tests (#8893)
  • HaplotypeCaller

    • Fixed exceptions in HaplotypeCaller caused by long deletions that overhang into the assembly window (#8731)
  • Mutect2

    • The Mutect2 germline resource can now have split multiallelic format (#8837)
    • Make the Mutect2 haplotype and clustered events filters smarter about germline events (#8717)
    • Added the DragSTR model to the Mutect2 WDL (#8716)
    • Improvements to Mutect2's Permutect training data mode (#8663)
    • Permutect tensors are now bigger, and Permutect test datasets can be annotated with a truth VCF (#8836)
    • Mutect2 WDL and GetSampleName can handle multiple sample names in BAM headers (#8859)
    • Permutect dataset engine outputs contig and read group indices, not names (#8860)
    • Normal artifact LOD is now defined without the extra minus sign (#8668)
  • CNV Calling

    • Fixed the GT header in PostprocessGermlineCNVCalls's --output-genotyped-intervals output (#8621)
  • SV Calling

    • Reduced SVConcordance memory footprint (#8623)
    • Rewrote complex SV functional annotation in SVAnnotate (#8516)
    • We now handle the CTX_INV subtype in SVAnnotate (#8693)
  • Flow-based Calling

    • SNVQ recalibration tool added for flow-based reads (#8697)
    • Fixed a bug in flow-based allele filtering (#8775)
    • Fixed a bug in flow-based AlleleFiltering that caused it to ignore all but a single sample (#8841)
    • Fixed an edge case in flow-based variant annotation (#8810)
  • Notable Enhancements

    • Added an --inverted-read-filter argument that makes it easy to select, from the command line, the reads that fail a read filter (#8724)
    • Inverted SoftClippedReadFilter to conform to the standard filtering logic (#8888)
    • Reduced the number of Docker layers in the GATK image from 44 to 16 (#8808)
    • VariantFiltration: added a --mask-description argument to write custom mask filter description in VCF header (#8831)
    • GatherVcfsCloud is no longer beta (#8680)
  • Miscellaneous Changes

    • GetPileupSummaries now uses the standard MappingQualityReadFilter instead of a custom --min-mapping-quality argument (#8781)
    • Funcotator: suppress a log message about b37 contigs when not doing b37/hg19 conversion (#8758)
    • Output the new image name at the end of a successful cloud docker build (#8627)
    • Exclude the test folder from code coverage calculations (#8744)
    • Removed deprecated genomes from the cloud Docker image that were causing CNN WDL test failures (#8891)
    • Re-committed large test files as Git LFS stubs (#8769)
    • Standardize test results directory between normal/docker tests (#8718)
    • Improve failure message in VariantContextTestUtils (#8725)
    • Updated the setup_cloud GitHub Action (#8651)
    • Parameterize the logging frequency for ProgressLogger in GatherVcfsCloud (#8662)
  • Documentation

    • Updated the README to include a list of popular software included in the Docker image (#8745)
  • Dependencies

    • Updated HTSJDK to 4.1.1, which fixes the CRAM writing bug described above (#8900)
    • Updated Picard to 3.2.0, which fixes the CRAM writing bug described above (#8900)
    • Updated GenomicsDB to 1.5.3, which supports M1 Macs and switches no-call representation back to ./. (#8710) (#8759)
    • Updated http-nio to 1.1.1, which fixes several URL-handling bugs with HTTP support (#8889)
    • Updated several miscellaneous dependencies to fix security vulnerabilities (#8898)
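
If you need to compare GenotypeGVCFs output from 4.6.0.0 with output from earlier releases that wrote no-calls as 0/0 with DP=0, a small conversion can help. The sketch below is hypothetical (`restore_no_call` is an illustrative name, not a GATK API). It works on raw VCF FORMAT and sample column strings and assumes that, in the older output, a homozygous-reference genotype with DP=0 always denotes a no-call.

```python
import re


def restore_no_call(format_field: str, sample_field: str) -> str:
    """Rewrite an all-reference, DP=0 genotype as a no-call (e.g. 0/0 -> ./.).

    format_field: the VCF FORMAT column, e.g. "GT:AD:DP".
    sample_field: one sample column, e.g. "0/0:0,0:0".
    """
    keys = format_field.split(":")
    values = sample_field.split(":")
    fields = dict(zip(keys, values))
    # Only touch genotypes that have a GT and an explicit DP of 0.
    if "GT" not in fields or fields.get("DP") != "0":
        return sample_field
    gt = fields["GT"]
    # Leave anything that is not homozygous reference (any ploidy) alone.
    if any(allele != "0" for allele in re.split(r"[/|]", gt)):
        return sample_field
    # Replace each allele with "." while keeping the "/" or "|" separators.
    values[keys.index("GT")] = re.sub(r"0", ".", gt)
    return ":".join(values)
```

For example, `restore_no_call("GT:AD:DP", "0/0:0,0:0")` returns `"./.:0,0:0"`, while a covered `0/0` call is returned unchanged.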
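
Since the Mutect2 germline resource can now be in split-multiallelic format, a short sketch shows what that layout looks like. The following Python function is hypothetical (`split_multiallelic` is not a GATK function): it splits one sites-only VCF data line with several ALT alleles into one record per ALT, subsetting the per-allele INFO `AF` field that germline resources carry. It assumes `AF` is the only per-allele INFO field and does no allele trimming or left-normalization; for real files, a dedicated tool such as `bcftools norm -m-any` is the better choice.

```python
def split_multiallelic(line: str) -> list[str]:
    """Split one VCF data line into one line per ALT allele.

    Only the per-allele INFO field AF is subset; other INFO entries are copied.
    """
    fields = line.rstrip("\n").split("\t")
    alts = fields[4].split(",")
    if len(alts) == 1:
        return ["\t".join(fields)]
    info_entries = fields[7].split(";")
    records = []
    for i, alt in enumerate(alts):
        new_info = []
        for entry in info_entries:
            key, _, value = entry.partition("=")
            values = value.split(",")
            # Keep only this allele's AF value; copy everything else as-is.
            if key == "AF" and len(values) == len(alts):
                new_info.append(f"AF={values[i]}")
            else:
                new_info.append(entry)
        # CHROM POS ID REF | ALT | QUAL FILTER | INFO | any remaining columns
        new_fields = fields[:4] + [alt] + fields[5:7] + [";".join(new_info)] + fields[8:]
        records.append("\t".join(new_fields))
    return records
```

For example, a record with `ALT=C,T` and `AF=0.1,0.2` becomes two records, one with `ALT=C`, `AF=0.1` and one with `ALT=T`, `AF=0.2`.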
