github broadinstitute/gatk 4.1.3.0

5 years ago

Download release: gatk-4.1.3.0.zip
Docker image: https://hub.docker.com/r/broadinstitute/gatk/

Highlights of the 4.1.3.0 release:

  • GnarlyGenotyper, a new beta joint genotyping tool which, along with ReblockGVCF, forms part of a forthcoming more scalable version of our joint genotyping pipeline that we call the "GATK Biggest Practices" pipeline
  • FuncotateSegments, a new beta companion tool to Funcotator that performs functional annotation on a segment file (.seg) rather than a VCF
  • GenomicsDBImport now has the ability to incrementally update an existing GenomicsDB workspace
  • Several important bug fixes to HaplotypeCaller and Mutect2

Compatibility notes:

  • GermlineCNVCaller models built in cohort mode with previous releases are no longer compatible. Users should rebuild these models with this release before running GermlineCNVCaller in case mode. See the CNV Tools section below for more details.

Full list of changes:

  • New Tools

    • GnarlyGenotyper (beta tool) (#4947) (#6075)

      • The GnarlyGenotyper performs joint genotyping in a highly scalable manner on cohorts of at least tens of thousands of samples. Its inputs are GVCFs called with HaplotypeCaller and post-processed with ReblockGVCF, and its output is a multi-sample callset.
      • Caveats:
        • GnarlyGenotyper is intended to be used with GVCFs for which low quality variants have already been removed, derived from post-processing HaplotypeCaller GVCFs with ReblockGVCF. See the "Biggest Practices" usage example in the ReblockGVCF docs for details.
        • GnarlyGenotyper does not subset alternate alleles and can return some highly multi-allelic sites. PLs will not be output for sites with more than 6 alts to save space.
        • GnarlyGenotyper assumes that all genotypes are diploid.
      • Annotations:
        • To generate all the annotations necessary for VQSR, input variants to the GnarlyGenotyper must include the QUALapprox and VarDP annotations along with the latest RAW_MQandDP annotation.
        • If allele-specific annotations are present, they will be used appropriately and a new AS_AltDP annotation giving the total depth across samples for each alternate allele will be added.
      • A GATK "Biggest Practices" pipeline including the GnarlyGenotyper is forthcoming pending some fixes improving on the above caveats.
    • FuncotateSegments (beta tool) (#5941)

      • A companion tool to Funcotator that performs functional annotation on a segment file (.seg) rather than a VCF
      • The Somatic CNV pipeline can optionally run this tool for functional annotation
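The GnarlyGenotyper caveat about dropping PLs at sites with more than 6 alts can be put in numbers. For a diploid genotype, the VCF convention is one PL value per unordered pair of alleles, so the count grows quadratically with the number of alleles. The sketch below only illustrates that growth; the function name is hypothetical, and reading this as the motivation for the 6-alt cutoff is an assumption, not something the release notes state.

```python
def diploid_pl_count(num_alt_alleles: int) -> int:
    """Return the number of PL values per sample for a diploid genotype.

    With n alleles in total (reference plus alts) there are n * (n + 1) / 2
    unordered allele pairs, and each pair gets one PL value.
    """
    if num_alt_alleles < 0:
        raise ValueError("num_alt_alleles must be non-negative")
    n = num_alt_alleles + 1  # include the reference allele
    return n * (n + 1) // 2


if __name__ == "__main__":
    # Print the per-sample PL count for a few site complexities.
    for alts in (1, 2, 6, 7, 10):
        print(f"{alts} alt allele(s): {diploid_pl_count(alts)} PLs per sample")
```

Multiplied across tens of thousands of samples, the per-site cost at highly multi-allelic sites adds up quickly.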
  • HaplotypeCaller/Mutect2

    • Fixed a regression in HaplotypeCaller/Mutect2 that caused some variants to be lost at sites with high complexity (#5952)
    • Fixed a GGA (GENOTYPE_GIVEN_ALLELES) mode bug in HaplotypeCaller/Mutect2 where added alleles' cigars could have soft clips (#6047)
      • This bug would manifest as a "Cigar cannot be null" error
    • Fixed a bug where cached indel informativeness values could be incorrectly applied to the wrong sites in HaplotypeCaller/Mutect2 (#5911)
    • Fixed an edge case in HaplotypeCaller/Mutect2 where dangling end merging creates cycles (#5960)
    • Added hidden arguments to the assembly engine to track found haplotype counts and kmers used (#6049)
    • Fixed a bug in CalculateContamination when contamination is indistinguishable from zero (#5971)
    • Fixed a bug where the normal p-value argument in FilterMutectCalls was declared static (#5982)
  • CNV Tools

    • Added FuncotateSegments as an option to the Somatic CNV WDL (#5967)
    • Added QC metrics to the Germline CNV workflow (#6017)
    • Enabled GC-bias correction by default in CNV workflows (#5966)
    • Added denoised coverage file concatenation output to gCNV postprocessor (#5823) Note: The addition of this feature breaks compatibility with gCNV cohort-mode models built with previous releases.
    • Changed cr.igv.seg output of ModelSegments to give log2 Segment_Mean (#5976)
    • Fixed CNV plotting script to allow spaces in input filenames (#5983)
  • GenomicsDBImport

    • Added support for making incremental updates to existing workspaces (#5970)
      • This can be done using the new --genomicsdb-update-workspace-path argument
    • Fixed a crash in GenomicsDBImport on queries at positions inside deletions (#5899)
    • Treat AS_QUALapprox and AS_VarDP strings as array of int vectors (#5933)
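As a rough illustration of the incremental update described under GenomicsDBImport, the sketch below assembles a command line that adds new GVCFs to an existing workspace. Only the --genomicsdb-update-workspace-path argument comes from these notes; the helper name, the workspace path, and the file names are hypothetical. The sketch assumes that -V is used for each input GVCF and that intervals are taken from the existing workspace, so check the tool documentation for your version before relying on it.

```python
import shlex
from typing import List, Sequence


def build_incremental_import_command(workspace: str, gvcfs: Sequence[str]) -> List[str]:
    """Build (but do not run) a GenomicsDBImport command that updates a workspace."""
    if not gvcfs:
        raise ValueError("at least one GVCF is required")
    cmd = [
        "gatk",
        "GenomicsDBImport",
        # Argument named in the release notes: points at the EXISTING workspace.
        "--genomicsdb-update-workspace-path",
        workspace,
    ]
    for gvcf in gvcfs:
        # Assumption: one -V per new sample GVCF.
        cmd += ["-V", gvcf]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    command = build_incremental_import_command(
        "my_database", ["sample4.g.vcf.gz", "sample5.g.vcf.gz"]
    )
    print(" ".join(shlex.quote(part) for part in command))
```

Because an update modifies the workspace in place, backing it up beforehand is a sensible precaution.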
  • Mitochondrial Calling Pipeline

    • Added NIO support and updated to WDL 1.0 (#6074)
  • Spark Tools

    • Removed the beta label from many simple Spark tools (#5991)
    • Bug fix for reading references from GCS on Spark (#6070)
    • Eliminated an unnecessary sort step in HaplotypeCallerSpark (#5909)
    • Fixed BaseRecalibratorSpark failure on a cluster due to system classloader issue (#5979)
    • Added a WDL for ReadsPipelineSpark (#5904)
    • Added a command-line argument to toggle using NIO on reading for Spark (#6010)
    • Added advanced arguments to MarkDuplicatesSpark to allow non-queryname sorted inputs when specifying multiple input bams and to treat unsorted inputs as queryGroup-sorted (#5974)
    • Clarified the behavior of MarkDuplicatesSpark when given multiple input bams, and improved the sorting behavior if given a mix of queryname-sorted and query-grouped bams (#5901)
    • Changed spark.yarn.executor.memoryOverhead to spark.executor.memoryOverhead, the property name used since Spark 2.3 (#6032)
    • Handle newly-added arguments in ApplyBQSRUniqueArgumentCollection (#5949)
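Users who pass their own Spark properties may want to mirror the memoryOverhead rename listed above in their own configuration. The sketch below is a minimal example of migrating the old key to the new one in a plain dictionary of Spark properties. The function name and the sample values are hypothetical, and it is not taken from the GATK code base.

```python
from typing import Dict

OLD_KEY = "spark.yarn.executor.memoryOverhead"
NEW_KEY = "spark.executor.memoryOverhead"


def migrate_memory_overhead(conf: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of conf with the old overhead key renamed to the new one.

    If both keys are present, the value already stored under the new key wins.
    """
    migrated = dict(conf)  # leave the caller's dictionary untouched
    if OLD_KEY in migrated:
        old_value = migrated.pop(OLD_KEY)
        migrated.setdefault(NEW_KEY, old_value)
    return migrated


if __name__ == "__main__":
    # Hypothetical settings, for illustration only.
    settings = {"spark.executor.memory": "8g", OLD_KEY: "600"}
    print(migrate_memory_overhead(settings))
```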
  • Miscellaneous Changes

    • Added a new BaseQualityHistogram variant annotation to generate base quality histograms (#5986)
    • Added a new SoftClippedReadFilter that can filter out reads where the ratio of soft-clipped bases to total bases exceeds some given value (#5995)
    • Fixed a serious bug in ValidateVariants where the tool would silently do no validation in the default case when a DBSNP file was not provided (#5984)
    • Fixed a "Record covers a position previously traversed" error in ValidateVariants for GVCFs with multiple contigs (#6028)
    • The RMSMappingQuality annotation now requires the --allow-old-rms-mapping-quality-annotation-data argument to run with GVCFs created by older versions of the GATK (#6060)
    • Added a simple TSV/CSV/XSV writer with cloud write support as an alternative to TableWriter (#5930)
    • Funcotator: added Funcotator stand-alone WDL to supported area (#5999)
    • Extracted the GenotypeGVCFs engine into publicly accessible class/function (#6004)
    • Refactored VariantEval methods to allow subclasses to override (#5998)
    • AnalyzeSaturationMutagenesis: arbitrarily choose 1 read for disjoint pairs, dump rejected reads, and various other improvements (#5926) (#6043)
    • Normalized some AssemblyRegion args in HaplotypeCallerSpark (#5977)
    • Stopped redundantly deleting temporary directories in RScriptExecutor (#5894)
    • Treat all source files as UTF-8 for java, javadoc (#5946)
    • Updated an out-of-date argument name in an error message for the CycleCovariate
    • Changed an error about "duplicate feature inputs" to be a UserException (#5951)
    • Got rid of ExpandingArrayList in favor of ArrayList (#6069)
    • Disabled Codecov for now on travis due to spurious errors (#6052)
    • Lowered the Xms value in the test JVM (#6087)
    • Updated the travis installed R version to 3.2.5, matching our base docker image (#6073)
    • Fixed an erroneous warning about GCS test configuration (#5987)
    • Added a code of conduct (#6036)
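The SoftClippedReadFilter entry above describes filtering on the ratio of soft-clipped bases to total bases. The sketch below shows one way such a ratio could be computed from a CIGAR string. It is an independent illustration, not the GATK implementation: the function names are hypothetical, "total bases" is assumed to mean the bases stored in the read (CIGAR operators M, I, S, = and X), and the strict greater-than comparison is also an assumption.

```python
import re

CIGAR_PATTERN = re.compile(r"(?:\d+[MIDNSHP=X])+")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operators that consume bases of the read sequence (per the SAM specification).
READ_CONSUMING_OPS = frozenset("MIS=X")


def soft_clip_ratio(cigar: str) -> float:
    """Return soft-clipped bases divided by all read bases described by the CIGAR."""
    if not CIGAR_PATTERN.fullmatch(cigar):
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")
    soft_clipped = 0
    total = 0
    for length, op in CIGAR_OP.findall(cigar):
        count = int(length)
        if op in READ_CONSUMING_OPS:
            total += count
        if op == "S":
            soft_clipped += count
    if total == 0:
        raise ValueError("CIGAR describes no read bases")
    return soft_clipped / total


def exceeds_soft_clip_threshold(cigar: str, max_ratio: float) -> bool:
    """Assumed semantics: True when the ratio is strictly above max_ratio."""
    return soft_clip_ratio(cigar) > max_ratio


if __name__ == "__main__":
    for cigar in ("100M", "10S80M10S", "60S40M"):
        print(cigar, soft_clip_ratio(cigar), exceeds_soft_clip_threshold(cigar, 0.5))
```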
  • Documentation

    • FilterVariantTranches documentation fix and improvement (#5837)
    • Updated FilterMutectCalls usage examples (#5890)
    • Added --max-mnp-distance 0 to usage example in CreateSomaticPanelOfNormals docs (#5972)
    • Updated the MarkDuplicatesSpark documentation to no longer contain a misleading usage example (#5938)
    • Added a clarification to the README to warn users to set their Gradle JVM properly in IntelliJ after setup (#6066)
    • Added links to download Java 8 to the README (#6025)
    • Removed non-ASCII characters from javadoc (#5936)
  • Dependencies

    • Updated HTSJDK to 2.20.1 (#6083)
    • Updated Picard to 2.20.5 (#6083)
    • Updated Disq to 0.3.3 (#6083)
    • Updated Spark to 2.4.3 (#5990)
    • Updated Gradle to 5.4.1 (#6007)
    • Updated GenomicsDB to 1.1.0.1 (#5970)
