Download release: gatk-4.1.9.0.zip
Docker image: https://hub.docker.com/r/broadinstitute/gatk/
Highlights of the 4.1.9.0 release:
- A major update to Funcotator, bringing in the latest Gencode release, fixing compatibility issues with dbSNP, and more!
- Two new tools, GeneExpressionEvaluation and ReferenceBlockConcordance
- Significant performance improvements to DepthOfCoverage and SelectVariants
- Some important bug fixes:
    - Fixed a bug in HaplotypeCaller and Mutect2 where we were losing insertion events that immediately followed a deletion
    - A fix for the "CreateSomaticPanelOfNormals output PoN has much less variants in 4.1.8.0 than before" issue reported in #6744
    - A fix for a frequently encountered NullPointerException in the AS_StrandBiasTest annotation when running CombineGVCFs, reported in #6766
Full list of changes:
- New Tools
    - GeneExpressionEvaluation: a tool for evaluating gene expression from RNA-seq reads aligned to the whole genome (#6602)
        - This tool counts fragments to evaluate gene expression from RNA-seq reads aligned to the genome. Features to evaluate expression over are defined in an input annotation file in gff3 format. Output is a tsv listing sense and antisense expression for all stranded grouping features, and expression (labeled as sense) for all unstranded grouping features.
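To make the sense/antisense output concrete, here is a minimal Python sketch. It is not GATK code: the Feature and Fragment classes, the count_expression function, and the rule that each fragment counts once toward every feature it overlaps are simplifying assumptions. The tool's actual handling of fragments overlapping several features, read filtering, and library strandedness is not covered by this note.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical stand-in for a gff3 grouping feature such as a gene."""
    name: str
    contig: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+", "-", or "." when the feature is unstranded


@dataclass(frozen=True)
class Fragment:
    """Hypothetical stand-in for one sequenced fragment, reduced to its
    aligned span and the transcript strand it is inferred to come from."""
    contig: str
    start: int
    end: int
    strand: str  # "+" or "-"


def overlaps(feature: Feature, frag: Fragment) -> bool:
    # Closed intervals on the same contig overlap unless one ends before the other starts.
    return (feature.contig == frag.contig
            and frag.start <= feature.end
            and feature.start <= frag.end)


def count_expression(features, fragments):
    """Return {feature name: {"sense": n, "antisense": m}}.

    Stranded features split fragments by whether the fragment strand matches
    the feature strand; unstranded features report every overlapping fragment
    as "sense". Each overlap counts as one full fragment (no weighting).
    """
    counts = {f.name: {"sense": 0, "antisense": 0} for f in features}
    for frag in fragments:
        for feature in features:
            if not overlaps(feature, frag):
                continue
            if feature.strand == "." or feature.strand == frag.strand:
                counts[feature.name]["sense"] += 1
            else:
                counts[feature.name]["antisense"] += 1
    return counts


features = [Feature("geneA", "chr1", 100, 200, "+"),
            Feature("lncB", "chr1", 150, 300, ".")]
fragments = [Fragment("chr1", 120, 180, "+"),
             Fragment("chr1", 160, 220, "-"),
             Fragment("chr1", 250, 260, "+")]
print(count_expression(features, fragments))
# {'geneA': {'sense': 1, 'antisense': 1}, 'lncB': {'sense': 3, 'antisense': 0}}
```

The unstranded feature lncB collects every overlapping fragment under "sense", matching the output convention described for unstranded grouping features.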
    - ReferenceBlockConcordance: a new tool to evaluate concordance of reference blocks in GVCF files (#6802)
        - This tool compares the reference blocks of two GVCF files against each other and produces three histograms:
            - Truth block histogram: Indicates the number of occurrences of reference blocks with a given confidence score and length in the truth GVCF
            - Eval block histogram: Indicates the number of occurrences of reference blocks with a given confidence score and length in the eval GVCF
            - Confidence concordance histogram: Reflects the confidence scores of bases in reference blocks in the truth and eval GVCF, respectively. An entry of 10 at bin "80,90" means that there are 10 bases which simultaneously have a reference confidence of 80 in the truth GVCF and a reference confidence of 90 in the eval GVCF.
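The three ReferenceBlockConcordance histograms can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's implementation: each GVCF is reduced to a hypothetical list of (start, end, confidence) reference blocks on a single contig, and the function names are made up. The key (80, 90) in the concordance counter corresponds to the "80,90" bin described above.

```python
from collections import Counter


def block_histogram(blocks):
    """Count reference blocks by (confidence, length).

    `blocks` is a hypothetical simplified input for a single contig: a list of
    (start, end, confidence) tuples with 1-based inclusive coordinates.
    """
    return Counter((conf, end - start + 1) for start, end, conf in blocks)


def per_base_confidence(blocks):
    """Expand blocks into {position: confidence}. Fine for a toy example,
    far too memory-hungry for a real genome."""
    conf_at = {}
    for start, end, conf in blocks:
        for pos in range(start, end + 1):
            conf_at[pos] = conf
    return conf_at


def confidence_concordance(truth_blocks, eval_blocks):
    """Count bases covered by a reference block in both inputs, keyed by
    (truth confidence, eval confidence)."""
    truth = per_base_confidence(truth_blocks)
    evals = per_base_confidence(eval_blocks)
    shared = truth.keys() & evals.keys()
    return Counter((truth[pos], evals[pos]) for pos in shared)


truth_blocks = [(1, 10, 80), (11, 20, 99)]
eval_blocks = [(1, 5, 80), (6, 15, 90)]
concordance = confidence_concordance(truth_blocks, eval_blocks)
print(concordance[(80, 90)])  # 5: positions 6-10 are 80 in truth and 90 in eval
```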
- HaplotypeCaller/Mutect2
    - Fixed a bug in HaplotypeCaller and Mutect2 where we were losing insertion events that immediately followed a deletion (#6696)
    - Added a workaround for an issue with multiallelics in the CreateSomaticPanelOfNormals pipeline (#6871)
        - This fixes the "CreateSomaticPanelOfNormals output PoN has much less variants in 4.1.8.0 than before" issue reported in #6744
    - Made improvements to the Mutect2 active region detection code that resulted in recovering some low-AF calls that we were missing (#6821)
    - Made the HaplotypeCaller/Mutect2 adaptive pruner smarter in complex graphs, resulting in modest improvements to indel sensitivity when using the adaptive pruning option (#6520)
    - Fixed a bug in variation event detection code that could sometimes lead to mistreating indel assembly windows as SNP assembly windows (#6661)
    - Fixed a bug in FragmentUtils where insertion quals were used instead of deletion quals when adjusting base qualities for two overlapping reads from the same fragment (#6815)
    - Fixed a concurrent modification exception error for local runs of HaplotypeCallerSpark (#6741)
    - Marked the --linked-de-bruijn-graph argument as Advanced rather than Hidden (#6737)
    - Made a small tweak to Mutect2's callable sites count (#6791)
    - Added a "requester pays" option to Mutect2 WDL tasks that access bams for use with Google Cloud "requester pays" buckets (#6879)
- Funcotator
    - A major set of updates to Funcotator (#6660)
        - Updated to the latest Gencode release
        - Fixed the contig naming compatibility issue with dbSNP reported in #6564 ("hg38 dbSNP has incorrect contig names")
            - Now both hg19 and hg38 have the contig names translated to "chr__"
        - Added 'lncRNA' to GeneTranscriptType.
        - Added the "TAGENE" gene tag.
        - Added the MANE_SELECT tag to FeatureTag.
        - Added the STOP_CODON_READTHROUGH tag to FeatureTag.
        - Updated the GTF versions that are parseable.
        - Fixed a parsing error with new versions of Gencode and the remap positions (for liftover files).
        - Added a test for indexing the new lifted-over Gencode GTF.
        - Added Gencode_34 entries to the MAF output map.
        - Pointed the data source downloader at the new data sources URL.
        - Minor updates to workflows to point at new data sources.
        - Updated retrieval scripts for dbSNP and Gencode.
        - Added a required field to Gencode config file generation.
        - The Gencode retrieval script now enforces double hash comments at the top of Gencode GTF files.
        - Fixed an erroneous trailing tab in MAF file output reported in #6693
    - Added a maximum version number for data sources in Funcotator (#6807)
    - Added a "requester pays" option to the Funcotator WDL for use with Google Cloud "requester pays" buckets (#6874)
    - FuncotateSegments: fixed an issue with the default value of --alias-to-key-mapping being set to an immutable value (#6700)
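The Funcotator notes only say that contig names for both hg19 and hg38 are now translated to the "chr__" form. The sketch below is a hypothetical illustration of why a shared naming convention matters; the helper names and the dictionary-backed data source are made up and are not Funcotator code. A record keyed as "chr1" is only found if a query for contig "1" is normalized the same way.

```python
def to_chr_style(contig: str) -> str:
    """Return the contig name with a "chr" prefix, e.g. "1" -> "chr1".

    Hypothetical helper: names that already start with "chr" are returned
    unchanged, and special cases (mitochondrial, alt, or decoy contigs) are
    deliberately not handled.
    """
    return contig if contig.startswith("chr") else "chr" + contig


def lookup(datasource: dict, contig: str, pos: int):
    """Look up a record in a data source keyed by (chr-style contig, position).

    Normalizing the query contig first means "1" and "chr1" hit the same entry.
    """
    return datasource.get((to_chr_style(contig), pos))


# Made-up data source entry, for illustration only.
datasource = {("chr1", 12345): "example_record"}
print(lookup(datasource, "1", 12345))     # example_record
print(lookup(datasource, "chr1", 12345))  # example_record
print(lookup(datasource, "2", 12345))     # None
```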
- GenomicsDB
    - Updated to GenomicsDB Version 1.3.2, which brings better propagation of error messages from the GenomicsDB library (#6852)
        - Using the GATK option GATK_STACKTRACE_ON_USER_EXCEPTION will now also output a limited C/C++ stack trace
- CNV Tools
    - Fixed a bug in the KernelSegmenter: the minimum amount of data needed to calculate the segmentation cost should be 2 * windowSize, rather than windowSize (#6835)
    - Germline CNV WDL improvements for WGS (#6607)
        - Modified gCNV WDLs to improve Cromwell performance when running on a large number of intervals, as in WGS
        - Added optional disabled_read_filters input to CollectCounts
        - Enabled GCS streaming for CollectCounts and CollectAllelicCounts
    - Added a "requester pays" option to the germline and somatic CNV WDLs for use with Google Cloud "requester pays" buckets (#6870)
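One way to see why 2 * windowSize is a natural minimum for the KernelSegmenter fix: if the cost at a candidate changepoint compares windowSize points on its left with windowSize points on its right, no position has both windows filled until there are at least 2 * windowSize points. The sketch below uses a plain difference of window means as a simplified stand-in for KernelSegmenter's kernel-based cost; the function name and scoring rule are assumptions, not GATK code.

```python
def window_changepoint_scores(data, window_size):
    """Score each candidate changepoint i by |mean(left) - mean(right)|,
    where left = data[i - window_size:i] and right = data[i:i + window_size].

    Simplified stand-in for a two-sided window cost (not the kernel-based
    cost used by KernelSegmenter).
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    n = len(data)
    if n < 2 * window_size:
        # No position has a full window on both sides, so nothing can be scored.
        return {}
    scores = {}
    for i in range(window_size, n - window_size + 1):
        left = data[i - window_size:i]
        right = data[i:i + window_size]
        scores[i] = abs(sum(left) / window_size - sum(right) / window_size)
    return scores


data = [0, 0, 0, 5, 5, 5]
print(window_changepoint_scores(data, 3))      # {3: 5.0}
print(window_changepoint_scores(data[:5], 3))  # {}
```

With exactly 2 * windowSize points there is a single scorable position; with one point fewer there are none, which is the boundary the fix concerns.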
- Mitochondrial Pipeline
- Notable Enhancements
    - Significantly improved the performance of DepthOfCoverage by removing slow string formatting calls (#6740)
        - In a local test run with default arguments on the full chr15 of a WGS sample, the runtime dropped from ~8.9 minutes to ~4.7 minutes after this patch
    - Significantly improved the performance of SelectVariants with large numbers of samples by changing an operation to scale linearly instead of quadratically with the number of samples (#6729)
        - On one example with several thousand samples, the runtime dropped from ~5 minutes to 0.1 minutes
    - WDL generation: made several improvements to automatic WDL generation, annotated additional tools for WDL generation, and added a section to the README with instructions on generating WDLs for GATK tools (#6800)
    - Added a suite of utility methods for working with Google BigQuery: BigQueryUtils (#6759) (#6861)
    - The GATK docker image can now be built with a simple docker build . command (no extra arguments needed) (#6764) (#6842) (#6782)
    - Added a Dockstore yml file with workflow descriptions for the WDLs in the GATK repo, to facilitate automatic publication to Dockstore (#6770)
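The SelectVariants note does not say which operation was changed, so the following is only a generic Python illustration of the quadratic-to-linear pattern it describes: checking each of n samples against a list of m requested samples costs O(n * m), while checking against a set built once costs O(n + m). The function names and sample names are hypothetical.

```python
import time


def keep_samples_quadratic(all_samples, requested):
    # `requested` stays a list, so every `in` check scans it: O(n * m) overall,
    # which is quadratic when both lists grow with the number of samples.
    return [s for s in all_samples if s in requested]


def keep_samples_linear(all_samples, requested):
    # Build a set once; each membership check is then O(1) on average,
    # so the whole pass is O(n + m).
    requested_set = set(requested)
    return [s for s in all_samples if s in requested_set]


if __name__ == "__main__":
    samples = [f"sample_{i}" for i in range(5000)]
    requested = samples[::2]
    for fn in (keep_samples_quadratic, keep_samples_linear):
        t0 = time.perf_counter()
        kept = fn(samples, requested)
        print(f"{fn.__name__}: kept {len(kept)} in {time.perf_counter() - t0:.3f}s")
```

Both functions return the same samples in the same order; only the cost of the membership test differs, and exact timings depend on the machine.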
- Bug Fixes
    - Fixed a NullPointerException in the AS_StrandBiasTest annotation reported in #6766 (#6847)
    - Fixed a bug with soft clips in LeftAlignIndels (#6792)
    - VariantRecalibrator: uniquify annotations to fix the error reported in #2221 (#6723)
    - Fixed an issue where ContextCovariate in BaseRecalibrator mistakenly assumed that all non-ACGT bases in the read are N (#6625)
    - Fixed a crash in CountBasesSpark when using the -L option (#6767)
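The VariantRecalibrator note does not describe the error from #2221 or exactly how annotations are uniquified. Assuming "uniquify" means collapsing repeated annotation names (for example, the same annotation passed twice) while keeping the order in which they were given, a minimal Python sketch looks like this; the function name is hypothetical and QD, MQ, and FS are just example annotation names.

```python
def uniquify(annotations):
    """Drop repeated annotation names, keeping the first occurrence of each.

    dict preserves insertion order (guaranteed since Python 3.7), so
    dict.fromkeys() keeps names in the order they were first given.
    """
    return list(dict.fromkeys(annotations))


print(uniquify(["QD", "MQ", "QD", "FS", "MQ"]))  # ['QD', 'MQ', 'FS']
```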
- Miscellaneous Changes
    - Significant refactoring of the SV discovery classes (#6652)
    - FilterVariantTranches: report more info when the ref alleles don't match (#6723)
    - We now report the target URL in exceptions thrown by HtsgetReader (#6799)
    - Added more information to error messages in AssemblyRegion for contigs not in the reference dictionary (#6781)
    - Improved an error message in GATKRead.setMatePosition() (#6779)
    - Updated the Barclay WDL template for compatibility with the Debian distribution (#6841)
    - Temporarily disabled HtsgetReader tests to work around issues caused by a server-side upgrade (#6804)
    - Re-enabled an IndexFeatureFile test for uncompressed BCF (#6716)
- Documentation
    - Marked LearnReadOrientationModel as a DocumentedFeature (#6726)
    - Added a gentle warning about loss of true positives with the default FilterIntervals params (#6751)
    - Updated the README to mention that the conda environment is not officially supported on macOS at this time (#6788)
    - Fixed a typo in the example command for SplitIntervals (#6869)
    - Fixed a typo in the --tmp-dir argument in the GenomicsDBImport docs (#6785)
    - Fixed a typo in the --tmp-dir argument in the GenotypeGVCFs docs (#6784)
    - Removed outdated argument references from the DepthOfCoverage documentation (#6810)
    - Fixed a typo in the DepthOfCoverage documentation by correcting the "-genelist" argument to "-gene-list" (#6880)
    - Fixed a typo in the docs for the Mutect2 --pcr-indel-qual argument (#6840)
- Dependencies