github broadinstitute/gatk 4.0.1.0

Highlights of this release include a preview version of a future neural-network-based VQSR replacement, the ability to generate a VCF from the GermlineCNVCaller output, allele-specific annotation support in GenomicsDBImport, as well as a number of important post-4.0 bug fixes. See below for the full list of changes.

As usual, a docker image for this release can be downloaded from https://hub.docker.com/r/broadinstitute/gatk/

Changes in this release:

  • New experimental tool NeuralNetInference (#4097)
    • An eventual VQSR replacement.
    • Performs variant score inference with a 1D Convolutional Neural Network and a pre-trained model. This is faster than, but not as high quality as, the 2D model, which is coming in the next GATK release along with training and tranche-style filtering (#4245).
    • Tool name subject to change!
  • GenomicsDBImport:
    • Add support for allele-specific annotations (#4261) (#3707)
    • Allow sample names with whitespace in the sample name map file (#3982)
    • Fix segfault crash on long path names (#4160)
    • Allow multiple import commands to be run in the same workspace directory (#4106)
    • Fix segfault crash during import when flag fields not declared in the VCF header (#3736)
    • Improve warning message when PLs are dropped for records with too many alleles (#3745)
  • CNV tools:
    • Added PostprocessGermlineCNVCalls tool for generating VCFs from GermlineCNVCaller output (#4254)
    • Exposed bounds for determining copy-neutral region in CallCopyRatioSegments (#4263)
    • Added support for CRAM inputs to CNV WDLs (#4257)
    • Miscellaneous bug fixes, documentation updates, and WDL cleanup.
  • HaplotypeCaller:
    • Fix the --min-base-quality-score/-mbq argument, which previously had no effect (#4128). This fix also affects Mutect2.
    • Fix a "contig must be non-null and not equal to *, and start must be >= 1" error by patching an edge case in the ReadClipper code: when reverting the soft-clipped bases of a read at the start of a contig, the tool no longer fails if the result is an empty read (#4203)
  • Mutect2:
    • Smarter contamination model (#4195)
    • Removed the --dbsnp and --comp arguments. Best practice is now to pass in gnomAD via the --germline-resource argument.
    • Removed a number of other arguments that were HaplotypeCaller-specific and not appropriate for Mutect2, such as --emit-ref-confidence.
    • Mutect2 WDL: CRAM support (#4297)
    • Mutect2 WDL: Compressed VCF output and Funcotator options (#4271)
    • Miscellaneous WDL cleanup
  • HaplotypeCallerSpark:
    • Fixes to the tool that make its output much closer to that of the non-Spark HaplotypeCaller (#4278). Note that this tool (unlike the non-Spark HaplotypeCaller) is still in beta, and should not be used for any real work. There are still major performance issues with the tool that in practice prevent running on certain kinds of large data and in certain modes.
    • Disallow writing a .vcf.gz when in GVCF mode, as this combination currently doesn't work (#4277)
  • BwaSpark:
    • Use a more reasonable default set of read filters (#4286)
  • PathSeq:
    • Add WDL for running the PathSeq pipeline with a README and example JSON input. (#4143)
  • Fix piping between Picard tools run via the GATK by changing logging output to stderr (#4167)
  • Disallow unindexed block-compressed tribble files as input to walkers (#4240) (#4224). This works around a bug in HTSJDK that could cause such files to appear truncated. Until the HTSJDK bug is fixed, block-compressed .vcf.gz files (and similar files) will need to be accompanied by an index, which can be generated using the IndexFeatureFile tool.
  • Restore .list as an allowed extension for files containing multiple values for command-line arguments (#4270). The previous extension .args is also still allowed. This feature allows users to provide a file ending in .list or .args containing all of the values for an argument that accepts multiple values (for example: a list of BAM files), instead of typing all the values individually on the command line.
  • Fix conda environment creation to work better with the release distribution. (#4233)
  • IndexFeatureFile: more informative error message when trying to index a malformed file (#4187)
  • Suggest using BED files as a way to resolve ambiguous interval queries. (#4183)
  • Set Spark parameter userClassPathFirst = false (#3933) (#3946)
  • Update to HTSJDK 2.14.1 (#4210)
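
The GenomicsDBImport change allowing sample names with whitespace in the sample name map (#3982) is easiest to picture with a parser that splits each line on the tab character only. A minimal sketch, assuming the map is tab-delimited with one sample-name/path pair per line; `parse_sample_name_map` is a hypothetical helper for illustration, not GATK's own parser.

```python
from typing import Dict


def parse_sample_name_map(text: str) -> Dict[str, str]:
    """Parse '<sample name><TAB><path or URI>' lines into a dict.

    Hypothetical illustration, not GATK's parser. Splitting on the tab
    character only (rather than on any whitespace) keeps sample names
    such as "NA12878 rep 1" in one piece.
    """
    mapping: Dict[str, str] = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are skipped
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(
                f"line {line_number}: expected 2 tab-separated fields, got {len(fields)}"
            )
        sample, path = fields
        if sample in mapping:
            raise ValueError(f"line {line_number}: duplicate sample name {sample!r}")
        mapping[sample] = path
    return mapping
```

Splitting with `line.split()` instead would break "NA12878 rep 1" into three fields, which is the kind of failure that whitespace-tolerant parsing avoids.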
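
The GenomicsDBImport warning about PLs dropped for records with too many alleles (#3745) reflects how quickly the PL field grows: there is one PL value per possible genotype, and the number of unordered genotypes for n alleles and ploidy p is C(n + p - 1, p). The sketch below only computes that count; the allele limit GenomicsDBImport actually applies is not stated in these notes.

```python
from math import comb


def num_genotypes(num_alleles: int, ploidy: int = 2) -> int:
    """Number of distinct unordered genotypes, i.e. one PL value each.

    A genotype is a multiset of `ploidy` alleles drawn from `num_alleles`
    alleles (REF plus ALTs), so the count is C(num_alleles + ploidy - 1, ploidy).
    """
    if num_alleles < 1 or ploidy < 1:
        raise ValueError("num_alleles and ploidy must both be >= 1")
    return comb(num_alleles + ploidy - 1, ploidy)


if __name__ == "__main__":
    for n in (2, 3, 6, 10, 50):
        print(f"{n} alleles -> {num_genotypes(n)} diploid PL values")
```

For diploid samples this reduces to n(n + 1)/2: 3 PL values at 2 alleles, 55 at 10 alleles, and 1275 at 50 alleles.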
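
The HaplotypeCaller fix for --min-base-quality-score/-mbq (#4128) concerns a threshold that previously had no effect. As a generic illustration of what such a threshold does, this sketch keeps only bases whose Phred quality meets the minimum. It assumes SAM/FASTQ-style Phred+33 quality characters, and `bases_passing_min_quality` is a hypothetical helper; it is not the code path HaplotypeCaller or Mutect2 use.

```python
from typing import List, Tuple

PHRED_OFFSET = 33  # assumption: SAM/FASTQ-style Phred+33 quality characters


def bases_passing_min_quality(
    bases: str, quals: str, min_base_quality: int
) -> List[Tuple[int, str]]:
    """Return (offset, base) for every base whose Phred quality is >= the threshold.

    Simplified illustration of a minimum-base-quality filter only.
    """
    if len(bases) != len(quals):
        raise ValueError("bases and quals must have the same length")
    passing = []
    for offset, (base, qual_char) in enumerate(zip(bases, quals)):
        phred = ord(qual_char) - PHRED_OFFSET
        if phred >= min_base_quality:
            passing.append((offset, base))
    return passing
```

With bases "ACGT" and qualities "I#5+" (Phred 40, 2, 20, 10), a threshold of 10 keeps A, G and T, while a threshold of 20 keeps only A and G.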
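
The piping fix for Picard tools run via the GATK (#4167) follows a general rule: if log lines go to stdout, they are mixed into the data the next program in a pipe reads. A minimal Python sketch of the principle, assuming a hypothetical tool `run_tool` that writes records to one stream and diagnostics to another; this is not GATK's Java logging configuration.

```python
import io
import logging
import sys
from typing import Iterable, TextIO, Tuple


def run_tool(records: Iterable[str], out: TextIO, log_stream: TextIO) -> None:
    """Write data records to `out` and diagnostics to `log_stream`."""
    logger = logging.getLogger("example_tool")  # hypothetical tool name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(log_stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False  # keep messages away from the root logger's handlers

    count = 0
    for record in records:
        out.write(record + "\n")  # only data goes to the output stream
        count += 1
    logger.info("wrote %d records", count)


def capture(records: Iterable[str]) -> Tuple[str, str]:
    """Run the tool against in-memory streams; return (data, log) text."""
    out, log = io.StringIO(), io.StringIO()
    run_tool(records, out, log)
    return out.getvalue(), log.getvalue()


if __name__ == "__main__":
    # Only the records reach stdout, so piping this script into another program stays clean.
    run_tool(["record1", "record2"], out=sys.stdout, log_stream=sys.stderr)
```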
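
Because unindexed block-compressed inputs are now rejected (#4240) (#4224), it can help to check inputs before a run. A minimal sketch, assuming BGZF files are recognized by their gzip header with the FEXTRA flag and a "BC" extra subfield, and assuming a tabix-style `<name>.tbi` index sitting next to the data file. Both helpers are hypothetical; a file they flag can be indexed with IndexFeatureFile.

```python
from pathlib import Path

# gzip ID1, ID2, CM (8 = deflate), FLG (4 = FEXTRA set)
BGZF_MAGIC = b"\x1f\x8b\x08\x04"


def looks_block_compressed(header: bytes) -> bool:
    """Heuristic BGZF check on the first bytes of a file.

    BGZF blocks are gzip members whose extra field carries a 'BC' subfield.
    Simplification: this assumes 'BC' is the first extra subfield (bytes 12-13).
    """
    return len(header) >= 14 and header[:4] == BGZF_MAGIC and header[12:14] == b"BC"


def missing_index(path: Path) -> bool:
    """True if `path` is block-compressed but has no '<name>.tbi' next to it.

    Assumption: a tabix-style '.tbi' index named after the data file.
    """
    with path.open("rb") as handle:
        header = handle.read(18)
    if not looks_block_compressed(header):
        return False
    return not path.with_name(path.name + ".tbi").exists()
```

A plain gzip file (FLG without FEXTRA) or an uncompressed VCF is not reported, since the restriction described in these notes applies to block-compressed files.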
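
The restored `.list` extension (#4270) lets a single file stand in for many values of a multi-valued argument. The sketch below illustrates that expansion idea only. It is a hypothetical helper, not GATK's argument parser, and it assumes one value per line, blank lines ignored, and no recursive expansion; the file reader is injectable so the logic can be exercised without touching disk.

```python
from pathlib import Path
from typing import Callable, List

ARGUMENT_FILE_SUFFIXES = (".list", ".args")


def read_file(path: str) -> str:
    return Path(path).read_text()


def expand_argument_values(
    values: List[str], read_text: Callable[[str], str] = read_file
) -> List[str]:
    """Replace each value ending in .list/.args with the non-blank lines of that file.

    Assumptions: one value per line, blank lines ignored, no recursive expansion.
    """
    expanded: List[str] = []
    for value in values:
        if value.endswith(ARGUMENT_FILE_SUFFIXES):
            for line in read_text(value).splitlines():
                line = line.strip()
                if line:
                    expanded.append(line)
        else:
            expanded.append(value)
    return expanded
```

For example, a file `more_bams.list` containing `b.bam` and `c.bam` on separate lines turns the values `a.bam more_bams.list` into `a.bam b.bam c.bam`.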
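
The suggestion to use BED files for ambiguous interval queries (#4183) works because BED puts the contig, start and end in separate tab-delimited columns, so contig names that themselves contain `:` or `-` cannot be confused with the `contig:start-end` syntax. BED coordinates are 0-based and half-open, whereas `contig:start-end` intervals are 1-based and inclusive. The sketch converts one BED record accordingly; `bed_line_to_interval` and `Interval` are hypothetical names, and rejecting zero-length records is a simplification.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    contig: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def bed_line_to_interval(line: str) -> Interval:
    """Convert one BED record (0-based, half-open) to a 1-based, closed interval."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 3:
        raise ValueError("a BED record needs at least chrom, start and end columns")
    contig, bed_start, bed_end = fields[0], int(fields[1]), int(fields[2])
    if bed_start < 0 or bed_end <= bed_start:
        raise ValueError("this sketch only accepts non-empty intervals")
    # The 0-based start becomes 1-based by adding 1; the exclusive 0-based end
    # already equals the inclusive 1-based end.
    return Interval(contig, bed_start + 1, bed_end)
```

A query string such as `HLA-A*01:01:01:01:1-100` has to be split at one of several colons, which is where ambiguity can arise; the equivalent BED record `HLA-A*01:01:01:01`, `0`, `100` names the contig explicitly.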
