Download release: gatk-4.4.0.0.zip
Docker image: https://hub.docker.com/r/broadinstitute/gatk/
Highlights of the 4.4.0.0 release:
- We've moved to Java 17, the latest long-term support (LTS) Java release, for building and running GATK! Previously we required Java 8, which is now end-of-life.
  - Newer non-LTS Java releases such as Java 18 or Java 19 may work as well, but since we have not tested them, we only officially support running with Java 17.
- Significant enhancements to SelectVariants, including arguments to enable GVCF filtering support and to work with genotype fields more easily.
- A new tool, SVConcordance, that calculates SV genotype concordance between an "evaluation" VCF and a "truth" VCF.
- Bug fixes and enhancements to the support for the Ultima Genomics flow-based sequencing platform introduced in GATK 4.3.0.0.
Full list of changes:
- Flow-based Variant Calling
  - FlowFeatureMapper: added surrounding-median-quality-size feature (#8222)
  - Removed hardcoded limit on max homopolymer call (#8088)
  - Fixed bug in dynamic read disqualification (#8171)
  - Fixed a bug in the parsing of the T0 tag (#8185)
  - Updated flow-based calling Mutect2 parameters to make them consistent with the HaplotypeCaller parameters (#8186)
- SelectVariants
  - Enabled GVCF type filtering support in SelectVariants (#7193)
    - Added an optional argument --ignore-non-ref-in-types to support correct handling of VariantContexts that contain a NON_REF allele. This is necessary because every variant in a GVCF file would otherwise be assigned the type MIXED, which makes it impossible to filter for e.g. SNPs.
    - Note that this only enables correct handling of GVCF input. The filtered output files are VCF (not GVCF) files, since reference blocks are not extended when a variant is filtered out.
  - SelectVariants: added new arguments for controlling genotype JEXL filtering (#8092)
    - -select-genotype: with this new genotype-specific JEXL argument, we support easily filtering by genotype fields with expressions like 'GQ > 0', where the behavior in the multi-sample case is 'GQ > 0' in at least one sample. It's still possible to manually access genotype fields using the old -select argument and expressions such as vc.getGenotype('NA12878').getGQ() > 0.
    - --apply-jexl-filters-first: this flag allows the user to do JEXL filtering before subsetting the format fields. It is intended for the case where the filtering is done on INFO fields only, which may improve speed when working with a large cohort VCF that contains genotypes for thousands of samples.
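The multi-sample behavior described for -select-genotype ("'GQ > 0' in at least one sample") can be pictured with a small sketch. This is plain Java written for illustration only; it is not GATK's JEXL evaluation code, and the class name, method name, and the second sample name (NA12891) are assumptions made up for the example.

```java
import java.util.Map;
import java.util.function.IntPredicate;

// Illustrative only: models "expression is true in at least one sample".
class GenotypeFilterSketch {

    // Returns true if any sample's GQ satisfies the given condition.
    static boolean anySamplePasses(Map<String, Integer> gqBySample, IntPredicate condition) {
        return gqBySample.values().stream().anyMatch(gq -> condition.test(gq));
    }

    public static void main(String[] args) {
        IntPredicate gqAboveZero = gq -> gq > 0; // stands in for the expression 'GQ > 0'

        // Hypothetical per-sample GQ values at two sites.
        Map<String, Integer> siteA = Map.of("NA12878", 0, "NA12891", 35);
        Map<String, Integer> siteB = Map.of("NA12878", 0, "NA12891", 0);

        System.out.println(anySamplePasses(siteA, gqAboveZero)); // true: one sample has GQ > 0
        System.out.println(anySamplePasses(siteB, gqAboveZero)); // false: no sample has GQ > 0
    }
}
```

Under this reading, a site is kept as soon as a single sample satisfies the expression, which is why a cohort-wide requirement would still need the older -select form with explicit per-sample access.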
- SV Calling
- Notable Enhancements
  - GenotypeGVCFs: added a --keep-specific-combined-raw-annotation argument to keep specified raw annotations (#7996)
  - VariantAnnotator now warns instead of failing when the variant contains too many alleles (#8075)
  - Read filters now output total reads processed in addition to the number of reads filtered (#7947)
  - Added GenomicsDB arguments to the CreateSomaticPanelOfNormals tool (#6746)
  - Added a DeprecatedFeature annotation and a process for officially marking GATK tools as deprecated (#8100)
  - Prevent tool close() methods from hiding underlying errors (#7764)
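The last item above concerns a general Java pitfall: an exception thrown while closing a resource can replace the error that caused the shutdown in the first place. The sketch below shows the pitfall and one common remedy (try-with-resources, which records the close failure as a suppressed exception). It is a generic illustration with made-up class and method names, and it does not claim to reproduce the change made in GATK.

```java
// Illustrative only: how a failing close() can mask the original error.
class CloseMaskingSketch {

    // A resource whose close() always fails.
    static class FailingResource implements AutoCloseable {
        @Override
        public void close() {
            throw new IllegalStateException("close failed");
        }
    }

    // Anti-pattern: the exception from close() in finally replaces the original one.
    static void maskingPattern() {
        FailingResource resource = new FailingResource();
        try {
            throw new RuntimeException("underlying error");
        } finally {
            resource.close();
        }
    }

    // try-with-resources rethrows the original exception and attaches
    // the close() failure to it as a suppressed exception.
    static void preservingPattern() {
        try (FailingResource resource = new FailingResource()) {
            throw new RuntimeException("underlying error");
        }
    }

    // Runs the action and reports the message of whatever exception escapes.
    static String messageOf(Runnable action) {
        try {
            action.run();
            return "no error";
        } catch (RuntimeException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(messageOf(CloseMaskingSketch::maskingPattern));    // close failed
        System.out.println(messageOf(CloseMaskingSketch::preservingPattern)); // underlying error
    }
}
```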
- Bug Fixes
  - Fixed issue causing VariantRecalibrator to sometimes fail if the user provided duplicate -an options (#8227)
  - ReblockGVCF: remove A, R, and G length attributes when ReblockGVCF subsets an allele (#8209)
    - Previously, if an input gVCF had allele length, reference length, or genotype length annotations in the FORMAT field, ReblockGVCF would not remove all of them at sites where an allele was dropped. This made the output gVCF invalid, since the annotation length no longer matched the length described in the header at those sites. Now we fix up F1R2, F2R1, and AF annotations and remove any other annotations that are not already handled that are defined as A, R, or G length in the header.
  - Fixed a gCNV bug that breaks the inference when only 2 intervals are provided (#8180)
  - Fixed NPE from uninitialized logger in GenotypingEngine (#8159)
  - Fixed asynchronous Python exception propagation in StreamingPythonExecutor/CNNScoreVariants (#7402)
  - Fixed issue in ShiftFasta where the interval list output was never written (#8070)
  - Bugfix for the type of some output files in the somatic CNV WDL (#6735) (#8130)
  - MergeAnnotatedRegions now requires a reference as asserted in its documentation (#8067)
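To see why the ReblockGVCF fix above matters, it helps to count how many values an A, R, or G length annotation must carry. Per the VCF specification, A means one value per alternate allele, R one value per allele including the reference, and G one value per possible genotype. The sketch below computes these counts for a diploid sample; the class and method names are hypothetical, and this is not ReblockGVCF's code.

```java
// Illustrative only: expected value counts for VCF Number=A/R/G fields (diploid).
class VcfLengthSketch {

    // number: 'A', 'R', or 'G'; altAlleles: number of alternate alleles at the site.
    static int expectedCount(char number, int altAlleles) {
        int totalAlleles = altAlleles + 1; // alternates plus the reference
        switch (number) {
            case 'A':
                return altAlleles;
            case 'R':
                return totalAlleles;
            case 'G':
                // Unordered pairs of alleles, with repetition, for a diploid genotype.
                return totalAlleles * (totalAlleles + 1) / 2;
            default:
                throw new IllegalArgumentException("Unsupported Number type: " + number);
        }
    }

    public static void main(String[] args) {
        // A site with 2 alternate alleles, before an allele is dropped.
        System.out.println(expectedCount('A', 2)); // 2
        System.out.println(expectedCount('R', 2)); // 3
        System.out.println(expectedCount('G', 2)); // 6

        // The same site after one alternate allele is dropped.
        System.out.println(expectedCount('A', 1)); // 1
        System.out.println(expectedCount('R', 1)); // 2
        System.out.println(expectedCount('G', 1)); // 3
    }
}
```

An annotation left at its old length after an allele is dropped (for example, 6 values where the header now implies 3) is what makes the record inconsistent with its header.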
- Miscellaneous Changes
  - Deprecated an untested VariantRecalibrator argument and an old ReblockGVCF argument that produced invalid GVCFs (#8140)
  - Removed old GnarlyGenotyper code with a diploid assumption to prepare for adding haploid support to GnarlyGenotyper (#8140)
  - ReblockGVCF: add error message for when tree-score-threshold is set but the TREE_SCORE annotation is not present (#8218)
  - TransferReadTags: allow empty unaligned bams as input (#8198)
  - Refactored JointVcfFiltering WDL and expanded tests (#8074)
  - Updated the carrot github action workflow to the most recent version, which supports using #carrot_pr to trigger branch vs master comparison runs (#8084)
  - Replaced uses of File.createTempFile() with IOUtils.createTempFile() to ensure that temp files are deleted on shutdown (#6780)
  - Don't require python just to instantiate the CNNScoreVariants tool classes (#8128)
  - Made several Funcotator methods and fields protected so it is easier to extend the tool (#8124) (#8166)
  - Test for presence of ack result message and simplify ProcessControllerAckResult API (#7816)
  - Fixed the path reported by the gatkbot when there are test failures (#8069)
  - Fixed incorrect boolean value in DirichletAlleleDepthAndFractionIntegrationTest (#7963)
  - Removed two ancient and unused HaplotypeCaller test files that are no longer needed (#7634)
  - Added scattered gCNV case WDL to dockstore file (#8217)
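The temp-file item above rests on a simple idea: a bare File.createTempFile() call leaves the file behind unless something deletes it, whereas a wrapper can register the file for deletion when the JVM shuts down. The sketch below shows one way such a wrapper could look, using only the standard library. The helper name is hypothetical, and this is not the implementation of GATK's IOUtils.createTempFile().

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative only: a temp-file helper that registers cleanup at JVM shutdown.
class TempFileSketch {

    // Hypothetical helper: creates a temp file and marks it for deletion on exit.
    static File createTempFileDeletedOnExit(String prefix, String suffix) {
        try {
            File file = File.createTempFile(prefix, suffix);
            file.deleteOnExit(); // removed when the JVM terminates normally
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException("Could not create temp file", e);
        }
    }

    public static void main(String[] args) {
        File file = createTempFileDeletedOnExit("sketch", ".tmp");
        System.out.println(file.exists()); // true while the JVM is running
    }
}
```

Routing every call through a single helper means the cleanup step cannot be forgotten at individual call sites.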
- Documentation
  - Updated instructions for installing Java in the README (#8089)
  - Added documentation on OMP_NUM_THREADS and MKL_NUM_THREADS to GermlineCNVCaller and DetermineGermlineContigPloidy (#8223)
  - Improvements to PileupDetectionArgumentCollection documentation (#8050)
  - Fixed typo in documentation for VariantAnnotator (#8145)
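OMP_NUM_THREADS and MKL_NUM_THREADS are environment variables read by OpenMP and Intel MKL, respectively, to cap the number of threads they use. As a rough sketch of how they might be set when launching a tool from Java, the example below prepares a ProcessBuilder with both variables. The thread count of 4 is an arbitrary assumption, the tool's own arguments are omitted, and the process is deliberately not started; consult the tool documentation for the recommended values.

```java
import java.util.Map;

// Illustrative only: setting thread-limit environment variables for a child process.
class ThreadEnvSketch {

    // Applies the same thread cap to both variables on the given builder.
    static ProcessBuilder withThreadLimits(ProcessBuilder builder, int threads) {
        Map<String, String> env = builder.environment();
        env.put("OMP_NUM_THREADS", Integer.toString(threads));
        env.put("MKL_NUM_THREADS", Integer.toString(threads));
        return builder;
    }

    public static void main(String[] args) {
        // Tool arguments are intentionally left out; the process is never started here.
        ProcessBuilder builder = withThreadLimits(new ProcessBuilder("gatk", "GermlineCNVCaller"), 4);
        System.out.println(builder.environment().get("OMP_NUM_THREADS")); // 4
        System.out.println(builder.environment().get("MKL_NUM_THREADS")); // 4
    }
}
```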
- Dependencies
  - Moved to Java 17, the latest LTS Java release, for building/running GATK (#8035)
  - Updated Gradle to 7.5.1 (#8098)
  - Updated the GATK base docker image to 3.0.0 (#8228)
  - Updated HTSJDK to 3.0.5 (#8035)
  - Updated Picard to 3.0.0 (#8035)
  - Updated Barclay to 5.0.0 (#8035)
  - Updated GenomicsDB to 1.4.4 (#7978)
  - Updated Spark to 3.3.1 (#8035)
  - Updated Hadoop to 3.3.1 (#8102)
  - Require commons-text 1.10.0 to fix a security vulnerability (#8071)
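Because only Java 17 is officially supported, it can be handy to confirm which Java runtime is actually in use. The sketch below reads the running JVM's feature version through the standard Runtime.version() API (available since Java 10) and compares it with 17. The class and method names are hypothetical, and this is not a check performed by GATK itself.

```java
// Illustrative only: checking whether the running JVM is the officially supported release.
class JavaVersionSketch {

    static final int SUPPORTED_FEATURE_VERSION = 17;

    // True only for the feature version named in the release notes as officially supported.
    static boolean isOfficiallySupported(int featureVersion) {
        return featureVersion == SUPPORTED_FEATURE_VERSION;
    }

    public static void main(String[] args) {
        int running = Runtime.version().feature(); // e.g. 17 for any 17.x.y runtime
        if (isOfficiallySupported(running)) {
            System.out.println("Running on Java " + running + " (officially supported).");
        } else {
            System.out.println("Running on Java " + running + " (untested; Java 17 is the supported release).");
        }
    }
}
```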