Download release: gatk-4.5.0.0.zip
Docker image: https://hub.docker.com/r/broadinstitute/gatk/
Highlights of the 4.5.0.0 release:
- `HaplotypeCaller` now supports custom ploidy regions that can be specified via a new `--ploidy-regions` argument, overriding the global `-ploidy` setting
- The default `SmithWaterman` implementation for `HaplotypeCaller` and `Mutect2` is now the hardware-accelerated version, resulting in a significant speedup
- `Funcotator` has a new datasource release that brings in the latest version of `Gencode` and several other key data sources
- We've updated our dependencies and our docker environment to greatly cut down on known security vulnerabilities
- We've greatly improved support for `http`/`https` inputs in GATK-native tools (though most Picard tools bundled with GATK do not yet support it)
- We've ported some additional DRAGEN features to `HaplotypeCaller` that bring us closer to functional equivalence with DRAGEN v3.7.8
- `GenomicsDBImport` now has support for Azure storage `az://` URIs
- `GnarlyGenotyper` now has haploid support
- Lots of important bug fixes, including a fix for a bug in the Intel GKL that could cause output files to intermittently fail to be compressed properly
Full list of changes:
- HaplotypeCaller
  - HaplotypeCaller now supports custom ploidy regions (#8609)
    - Added a new argument to `HaplotypeCaller` called `--ploidy-regions` which allows the user to input a `.bed` or `.interval_list` with the "name" column equal to a positive integer for the ploidy to use when calling variants in that region
    - The main use case is for calling haploid variants outside the PAR for XY individuals as required by the VCF spec, but this provides a much more flexible interface for other similar niche applications, like genotyping individuals with other known aneuploidies
    - The global `-ploidy` flag will still provide the background default (or the built-in ploidy of 2 for humans), but the user-supplied values will supersede these in overlapping regions
  - Changed the `SmithWaterman` implementation to default to `FASTEST_AVAILABLE` (#8485)
  - Fixed a bug in pileup calling mode relating to the number of haplotypes (#8489)
  - Huge simplification of genotyping likelihoods calculations, with no change in output (#6351)
  - Be explicit about when variants are biallelic (#8332)
  - Fixed debug log severity for read threading assembler messages (#8419)
  - Fixed issue with visibility of the `--dont-use-softclipped-bases` argument (#8271)
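The override behavior described for `--ploidy-regions` can be illustrated with a minimal Python sketch. This is not GATK's implementation: the function names, the example coordinates, and the choice to return the first matching region are all assumptions made for illustration. It only assumes the standard BED convention (0-based, half-open intervals, with the "name" in the fourth column).

```python
from typing import List, NamedTuple


class PloidyRegion(NamedTuple):
    contig: str
    start: int   # 0-based, inclusive (BED convention)
    end: int     # 0-based, exclusive (BED convention)
    ploidy: int  # taken from the BED "name" column


def parse_ploidy_bed(text: str) -> List[PloidyRegion]:
    """Parse BED lines whose 4th ("name") column is a positive integer ploidy."""
    regions = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        contig, start, end, name = line.split("\t")[:4]
        ploidy = int(name)
        if ploidy <= 0:
            raise ValueError(f"ploidy must be a positive integer, got {name!r}")
        regions.append(PloidyRegion(contig, int(start), int(end), ploidy))
    return regions


def ploidy_at(contig: str, pos: int, regions: List[PloidyRegion],
              default_ploidy: int = 2) -> int:
    """Return the ploidy at a 0-based position.

    A user-supplied region that contains the position supersedes the
    global default. Returning the first match is an assumption of this sketch.
    """
    for region in regions:
        if region.contig == contig and region.start <= pos < region.end:
            return region.ploidy
    return default_ploidy


# Hypothetical coordinates, for illustration only.
EXAMPLE_BED = "chrX\t1000\t5000\t1\nchr21\t0\t2000\t3\n"
regions = parse_ploidy_bed(EXAMPLE_BED)
```

With these hypothetical regions, a position inside the `chrX` interval resolves to ploidy 1, while any position not covered by a region falls back to the default of 2.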
- Mutect2
  - Added a `--base-qual-correction-factor` argument to allow a scale factor to be provided to modify the base qualities reported by the sequencer and used in the `Mutect2` substitution error model (#8447)
    - Set to zero to turn off the error model changes introduced in GATK 4.1.9.0
  - Fixed a bug in `FilterMutectCalls` for GVCFs (#8458)
    - When using GVCFs with `Mutect2` (for example with the Mitochondria mode), in the filtering step ADs for symbolic alleles are set to 0 so that they don't contribute to the overall AD. There was an off-by-one error that removed the alt allele AD rather than the `<NON_REF>` allele AD. This led to NaNs and errors when a site had no ref reads (for example, a GT of `[ref,alt,<NON_REF>]` and AD of `[0,300,0]` would accidentally be changed to an AD of `[0,0,0]` if the alt index was removed instead of the `<NON_REF>` index).
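The effect of the `FilterMutectCalls` off-by-one error can be reproduced with a small Python sketch. Both functions are hypothetical illustrations rather than the GATK code: the first zeroes the AD of symbolic alleles as intended, and the second mimics the kind of indexing mistake described above by zeroing the allele just before `<NON_REF>`.

```python
from typing import List

NON_REF = "<NON_REF>"


def zero_symbolic_allele_depths(alleles: List[str], ad: List[int]) -> List[int]:
    """Intended behavior: set AD to 0 for symbolic alleles such as <NON_REF>."""
    return [
        0 if allele.startswith("<") and allele.endswith(">") else depth
        for allele, depth in zip(alleles, ad)
    ]


def zero_with_off_by_one(alleles: List[str], ad: List[int]) -> List[int]:
    """Illustrative bug: zeroes the allele one position before <NON_REF>."""
    result = list(ad)
    result[alleles.index(NON_REF) - 1] = 0  # off by one: hits the alt allele
    return result


alleles = ["ref", "alt", NON_REF]
ad = [0, 300, 0]

correct = zero_symbolic_allele_depths(alleles, ad)  # alt depth is preserved
buggy = zero_with_off_by_one(alleles, ad)           # every depth becomes 0
```

In the buggy result the total depth is 0, so any downstream ratio computed over AD divides zero by zero, which is consistent with the NaNs reported at sites with no ref reads.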
- DRAGEN-GATK
  - Added implementations of the "columnwise detection" and "PDHMM" (partially-determined HMM) features from DRAGEN to bring us much closer to functional equivalence with DRAGEN v3.7.8 (#8083)
  - Development work to prepare the way for the final missing DRAGEN 3.7.8 feature, "joint detection":
    - Graph method for PDHMM event groups that unifies finding/merging and overlap/mutual exclusion (#8366)
    - Rewrote haplotype construction methods in `PartiallyDeterminedHaplotypeComputationEngine` (#8367)
    - More refactoring in `PartiallyDeterminedHaplotypeComputationEngine` and preparing for joint detection (#8492)
    - Innocuous housekeeping changes in the partially-determined haplotypes code (#8361)
    - Clarify cryptic bitwise operations in the partially-determined haplotype `EventGroup` subclass (#8400)
- Joint Calling
  - Added haploid support to `GnarlyGenotyper` (#7750)
  - Fix to allow `GenotypeGVCFs` to properly handle events not in minimal representation (#8567)
  - `ReblockGVCF`: added a `--keep-site-filters` argument to keep site-level filters (#8304) (#8308)
  - `ReblockGVCF`: added a `--add-site-filters-to-genotype` argument to move site-level filters to genotype-level filters (#8484)
  - `ReblockGVCF`: added a `--format-annotations-to-remove` argument to specify format-level annotations to remove from all genotypes in final GVCF (#8411)
  - `ReblockGVCF`: added a check to make sure the input VCF is a GVCF rather than a single sample VCF (#8411)
  - Improved an error message in `GnarlyGenotyper` (#8270)
  - Added a `mergeWithRemapping()` method in `ReferenceConfidenceVariantContextMerger` to perform allele remapping prior to genotyping (#8318)
  - GVS (Genomic Variant Store) development:
- GenomicsDB
- Funcotator
  - New data source release V1.8 (#8512)
    - Updated `Gencode` to version 43, and also updated `COSMIC`, `Clinvar`, and several other datasources to their latest versions
    - The data sources are now split by reference into separate hg19 and hg38 bundles to cut down on size
  - Fixed support for newer `Gencode` GTF versions by making the `GencodeGTFField` parsing more permissive (#8351)
  - Fixed `Funcotator` VCF output renderer to correctly preserve B37 contig names on output for B37 aligned files (#8539)
  - Fix bug in VCF comparison code that causes `Funcotator` to crash with certain datasources (#8445)
  - Connected the splice site window size to CLI parameters (#8463)
  - Allow `LocatableXsvFuncotationFactory` to read gzipped files (#8363)
- CNV Calling
- SV Calling
  - Added support for breakend replacement alleles in `SVCluster` (#8408)
    - Implements allele collapsing for "breakend replacement" BND alleles, as described in section 5.4 of the VCFv4.2 spec
  - Size similarity linkage and bug fixes for SV matching tools (#8257)
    - Added size similarity criterion to the `SVConcordance` and `SVCluster` tools. This is particularly useful for accurately matching smaller SVs that have a high degree of breakpoint uncertainty, in which case reciprocal overlap does not work well. PESR/mixed variant types must have size similarity, reciprocal overlap, and breakend window criteria met. Depth-only variants may have either size similarity + reciprocal overlap OR breakend window criteria met (or both).
  - Updated SV split-read strand validation and clustering (#8378)
    - Adds some flexibility to the allowed split-read strand annotations on SV records:
    - Allow INS -+ strands
    - Allow INV null strands
    - When clustering, only require that strands match for INV/BND records
  - Sample set and annotation improvements for `SVConcordance` (#8211)
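The linkage rules for the size similarity criterion can be sketched in Python as follows. This is an illustration under stated assumptions, not the `SVCluster` or `SVConcordance` code: the metric definitions (size similarity as the ratio of the shorter to the longer length, reciprocal overlap as the overlap divided by the longer length, and a breakend window applied to both ends), the threshold defaults, and all names are hypothetical. Treating a pair as depth-only only when both records are depth-only is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    start: int        # 1-based, inclusive
    end: int          # 1-based, inclusive
    depth_only: bool  # True if the call is supported by depth evidence only

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def size_similarity(a: SV, b: SV) -> float:
    """Ratio of the shorter length to the longer length, in (0, 1]."""
    return min(a.length, b.length) / max(a.length, b.length)


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Overlap as a fraction of the longer interval (0 if disjoint)."""
    overlap = min(a.end, b.end) - max(a.start, b.start) + 1
    if overlap <= 0:
        return 0.0
    return overlap / max(a.length, b.length)


def within_breakend_window(a: SV, b: SV, window: int) -> bool:
    """Both breakends must lie within `window` bp of each other."""
    return abs(a.start - b.start) <= window and abs(a.end - b.end) <= window


def can_link(a: SV, b: SV, min_size_sim: float = 0.5,
             min_overlap: float = 0.5, window: int = 500) -> bool:
    size_ok = size_similarity(a, b) >= min_size_sim
    overlap_ok = reciprocal_overlap(a, b) >= min_overlap
    window_ok = within_breakend_window(a, b, window)
    if a.depth_only and b.depth_only:
        # Depth-only: (size similarity AND reciprocal overlap) OR breakend window.
        return (size_ok and overlap_ok) or window_ok
    # PESR/mixed: all three criteria must be met.
    return size_ok and overlap_ok and window_ok
```

For two small PESR calls of equal length whose breakpoints are shifted, reciprocal overlap is low even though the calls plausibly describe the same event, so a permissive overlap threshold combined with the size similarity check lets them link while still rejecting a much larger call that happens to overlap.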
- Mitochondrial pipeline
- Flow-based Calling
  - New/updated flow-based read tools (#8579)
    - Added a new `GroundTruthScorer` tool to score reads against a reference/ground truth
    - Updated `FlowFeatureMapper`
  - Created an `AddFlowBaseQuality` tool that writes reads from flow-based SAM/BAM/CRAM files that pass criteria to a new file while adding a base-quality attribute (BQ) (#8235)
  - Added an experimental tool `FlowPairHMMAlignReadsToHaplotypes` that aligns flow-based reads to a set of haplotypes / templates (#8305)
  - Fixed an issue with reads that contain the tp tag sometimes being incorrectly identified as flow-based (#8337)
  - Minor changes and fixes to flow-based annotations (#8442)
  - Removed a line in `FlowBasedAnnotation` that contained a bug and thus was meaningless (#8421)
  - Additional annotation in FeatureMap (#8347)
  - Removed unnecessary flow-based argument and option (#8342)
  - `GroundTruthScorer` doc update (#8597)
  - Removed unnecessary and buggy validation check (#8580)
- Notable Enhancements
  - Major security fixes in our dependencies and docker environment
  - Greatly improved HTTP support (#8611)
    - Updated the `http-nio` library and made tweaks to HTSJDK to make it available in more places. The new version of `http-nio` should provide much more reliable access to http(s) file paths. This is supported by all methods accessing Paths, and includes SAM/BAM/CRAM and VCF/Feature files. It includes a new retry mechanism which retries after transient errors. It also includes bug fixes and various other minor improvements, such as making encoded Path handling more consistent.
  - Added a new `PrintFileDiagnostics` tool that can output the internal metadata of `CRAM`, `CRAI` and `BAI` files for diagnostic purposes (#8577)
  - Added a new `TransmittedSingleton` annotation and added quality threshold arguments to the `PossibleDenovo` annotation (#8329)
  - Support multiple read name inputs in `ReadNameReadFilter` (#8405)
  - Added a native GATK implementation for `2bit` references, and removed the dependency on the ADAM library (#8606)
- Bug Fixes
  - Fixed a major bug in the Intel GKL that could cause output files to intermittently fail to be compressed properly (#8409)
- Miscellaneous Changes
  - `CNNVariantTrain`: exposed more CNN training parameters as arguments (#8483)
  - Support underscores in bucket names on Google Cloud (#8439)
  - Performed some refactoring on the new annotation-based filtering tools (#8131)
  - Added tags to `dockstore.yaml` (#8323)
  - Added the ability to specify the RELEASE arg to the cloud-based docker build, and added a new docker release script (#8247)
  - Added an option to `AnalyzeSaturationMutagenesis` to keep disjoint mates (#8557)
  - Exit with code 137 when we get an `OutOfMemoryError` (#8277)
  - Updates to reduce size of docker image (#8259)
  - Free up space on GitHub Actions runners for all jobs (#8386) (#8371) (#8373)
  - Fixed warnings in GitHub Actions (#8241)
  - Disabled line-by-line codecov comments (#8613)
  - Fixed a bug in the GATK download metrics script (#8418)
  - Updated the Spark version in the GATK jar manifest, and hooked up the Spark version constant in build.gradle (#8625)
  - Fixed a warning in Gradle (#8431)
  - Pinned joblib to v1.1.1 in the python environment (#8391)
  - Updated the Ubuntu version for the Carrot GitHub action because GitHub dropped support for 18.04 (#8299)
- Documentation
- Dependencies