Download release: gatk-4.1.3.0.zip
Docker image: https://hub.docker.com/r/broadinstitute/gatk/
Highlights of the 4.1.3.0 release:
- GnarlyGenotyper, a new beta joint genotyping tool which, along with ReblockGVCF, forms part of a forthcoming more scalable version of our joint genotyping pipeline that we call the "GATK Biggest Practices" pipeline
- FuncotateSegments, a new beta companion tool to Funcotator that performs functional annotation on a segment file (.seg) rather than a VCF
- GenomicsDBImport now has the ability to incrementally update an existing GenomicsDB workspace
- Several important bug fixes to HaplotypeCaller and Mutect2
Compatibility notes:
- GermlineCNVCaller models built in cohort mode with previous releases are no longer compatible. Users should rebuild these models with this release before running GermlineCNVCaller in case mode. See the CNV Tools section below for more details.
Full list of changes:
- New Tools
  - GnarlyGenotyper (beta tool) (#4947) (#6075)
    - The GnarlyGenotyper is designed to perform joint genotyping on cohorts of at least tens of thousands of samples called with HaplotypeCaller and post-processed with ReblockGVCF, producing a multi-sample callset in a highly scalable manner.
    - Caveats:
      - GnarlyGenotyper is intended to be used with GVCFs from which low-quality variants have already been removed, derived from post-processing HaplotypeCaller GVCFs with ReblockGVCF. See the "Biggest Practices" usage example in the ReblockGVCF docs for details.
      - GnarlyGenotyper does not subset alternate alleles and can return some highly multi-allelic sites. To save space, PLs will not be output for sites with more than 6 alts.
      - GnarlyGenotyper assumes all genotypes are diploid.
    - Annotations:
      - To generate all the annotations necessary for VQSR, input variants to the GnarlyGenotyper must include the QUALapprox and VarDP annotations along with the latest RAW_MQandDP annotation.
      - If allele-specific annotations are present, they will be used appropriately and a new AS_AltDP annotation giving the total depth across samples for each alternate allele will be added.
    - A GATK "Biggest Practices" pipeline including the GnarlyGenotyper is forthcoming, pending some fixes that address the above caveats.
  - FuncotateSegments (beta tool) (#5941)
    - A companion tool to Funcotator that performs functional annotation on a segment file (.seg) rather than a VCF
    - The Somatic CNV pipeline can optionally run this tool for functional annotation
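To see why PLs become costly at highly multi-allelic sites, recall that a diploid genotype is an unordered pair of alleles, so a site with n alternate alleles has (n+1)(n+2)/2 possible genotypes, and a PL field carries one value per genotype per sample. The sketch below is a minimal illustration of that count under the diploid assumption; it is not GnarlyGenotyper's code, and the function name is hypothetical.

```python
def diploid_pl_count(num_alts: int) -> int:
    """Number of PL values per sample at a site with `num_alts` alternate alleles.

    Assumes diploid genotypes, i.e. unordered pairs drawn from the
    reference allele plus the alternates.
    """
    if num_alts < 0:
        raise ValueError("num_alts must be non-negative")
    num_alleles = num_alts + 1  # reference allele plus the alternates
    return num_alleles * (num_alleles + 1) // 2


# Print the per-sample PL count for a few alternate-allele counts.
for alts in (1, 2, 6, 7, 10):
    print(alts, diploid_pl_count(alts))
```

At the 6-alt cutoff a sample carries 28 PL values; at 10 alts it would carry 66, and that cost is repeated for every sample in the cohort.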
- HaplotypeCaller/Mutect2
  - Fixed a regression in HaplotypeCaller/Mutect2 that caused some variants to be lost at sites with high complexity (#5952)
  - Fixed a GGA (GENOTYPE_GIVEN_ALLELES) mode bug in HaplotypeCaller/Mutect2 where added alleles' cigars could have soft clips (#6047)
    - This bug would manifest as a "Cigar cannot be null" error
  - Fixed a bug where cached indel informativeness values could be incorrectly applied to the wrong sites in HaplotypeCaller/Mutect2 (#5911)
  - Fixed an edge case in HaplotypeCaller/Mutect2 where dangling end merging creates cycles (#5960)
  - Added hidden arguments to the assembly engine to track found haplotype counts and kmers used (#6049)
  - Fixed a bug in CalculateContamination when contamination is indistinguishable from zero (#5971)
  - Fixed a bug where normal p value argument in FilterMutectCalls was declared static (#5982)
- CNV Tools
  - Added FuncotateSegments as an option to the Somatic CNV WDL (#5967)
  - Added QC metrics to the Germline CNV workflow (#6017)
  - Enabled GC-bias correction by default in CNV workflows (#5966)
  - Added denoised coverage file concatenation output to gCNV postprocessor (#5823) Note: The addition of this feature breaks compatibility with gCNV cohort-mode models built with previous releases.
  - Changed cr.igv.seg output of ModelSegments to give log2 Segment_Mean. (#5976)
  - Fixed CNV plotting script to allow spaces in input filenames. (#5983)
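For readers comparing segment files, the relationship between a linear copy ratio and its log2 value can be sketched as follows. This is a generic illustration of the log2 transform, not code from ModelSegments, and the function name is hypothetical.

```python
import math


def log2_copy_ratio(copy_ratio: float) -> float:
    """Convert a linear copy ratio to the log2 scale."""
    if copy_ratio <= 0:
        raise ValueError("copy ratio must be positive to take a logarithm")
    return math.log2(copy_ratio)


# Print the log2 value for a halved, neutral, and doubled copy ratio.
for ratio in (0.5, 1.0, 2.0):
    print(ratio, log2_copy_ratio(ratio))
```

On this scale a neutral segment sits at 0, a halving of the copy ratio at -1, and a doubling at +1.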
- GenomicsDBImport
  - Added support for making incremental updates to existing workspaces (#5970)
    - This can be done using the new --genomicsdb-update-workspace-path argument
  - Fixed a crash in GenomicsDBImport on queries at positions inside deletions (#5899)
  - Treat AS_QUALapprox and AS_VarDP strings as array of int vectors (#5933)
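As a rough illustration of the incremental update, the sketch below assembles a GenomicsDBImport command line that adds new GVCFs to an existing workspace. Only --genomicsdb-update-workspace-path comes from these notes; -V is GATK's usual variant-input argument, the helper name and file paths are hypothetical, and any further arguments a given setup needs are left out.

```python
import shlex
from typing import List, Sequence


def build_update_command(workspace: str, gvcfs: Sequence[str]) -> List[str]:
    """Assemble an argument list for updating an existing GenomicsDB workspace.

    `workspace` and `gvcfs` are placeholder paths supplied by the caller.
    """
    if not gvcfs:
        raise ValueError("at least one GVCF is required")
    cmd = ["gatk", "GenomicsDBImport"]
    for gvcf in gvcfs:
        cmd += ["-V", gvcf]  # one -V per new sample GVCF
    cmd += ["--genomicsdb-update-workspace-path", workspace]
    return cmd


# Print the command as a shell-quoted string for inspection (nothing is executed).
example = build_update_command("my_database", ["sample4.g.vcf.gz", "sample5.g.vcf.gz"])
print(" ".join(shlex.quote(arg) for arg in example))
```

The returned list can be passed to subprocess.run once the paths point at real files; printing it first makes it easy to check the command before running it.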
- Mitochondrial Calling Pipeline
  - Added NIO support and updated to WDL 1.0 (#6074)
- Spark Tools
  - Removed the beta label from many simple Spark tools (#5991)
  - Bug fix for reading references from GCS on Spark (#6070)
  - Eliminated an unnecessary sort step in HaplotypeCallerSpark (#5909)
  - Fixed BaseRecalibratorSpark failure on a cluster due to system classloader issue (#5979)
  - Added a WDL for ReadsPipelineSpark (#5904)
  - Added a command-line argument to toggle using NIO on reading for Spark (#6010)
  - Added advanced arguments to MarkDuplicatesSpark to allow non-queryname sorted inputs when specifying multiple input bams and to treat unsorted inputs as queryGroup-sorted (#5974)
  - Clarified the behavior of MarkDuplicatesSpark when given multiple input bams, and improved the sorting behavior if given a mix of queryname-sorted and query-grouped bams (#5901)
  - Changed spark.yarn.executor.memoryOverhead to spark.executor.memoryOverhead as promoted by Spark 2.3 (#6032)
  - Handle newly-added arguments in ApplyBQSRUniqueArgumentCollection (#5949)
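For anyone carrying the older property in their own Spark configuration, the rename can be sketched as a small key migration. The helper below is hypothetical and operates on a plain dictionary of Spark properties; it is not part of GATK or Spark.

```python
from typing import Dict

OLD_KEY = "spark.yarn.executor.memoryOverhead"
NEW_KEY = "spark.executor.memoryOverhead"


def migrate_memory_overhead(conf: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of `conf` with the old overhead key renamed to the new one.

    If both keys are present, the value already stored under the new key wins.
    """
    migrated = dict(conf)
    if OLD_KEY in migrated:
        old_value = migrated.pop(OLD_KEY)
        migrated.setdefault(NEW_KEY, old_value)
    return migrated


# Example with a made-up overhead value.
print(migrate_memory_overhead({OLD_KEY: "600"}))
```

Returning a copy leaves the caller's original settings untouched, which makes the before and after easy to compare.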
- Miscellaneous Changes
  - Added a new BaseQualityHistogram variant annotation to generate base quality histograms (#5986)
  - Added a new SoftClippedReadFilter that can filter out reads where the ratio of soft-clipped bases to total bases exceeds some given value (#5995)
  - Fixed a serious bug in ValidateVariants where the tool would silently do no validation in the default case when a DBSNP file was not provided (#5984)
  - Fixed a "Record covers a position previously traversed" error in ValidateVariants for GVCFs with multiple contigs (#6028)
  - The RMSMappingQuality annotation now requires the --allow-old-rms-mapping-quality-annotation-data argument to run with GVCFs created by older versions of the GATK (#6060)
  - Added a simple TSV/CSV/XSV writer with cloud write support as an alternative to TableWriter (#5930)
  - Funcotator: added Funcotator stand-alone WDL to supported area (#5999)
  - Extracted the GenotypeGVCFs engine into publicly accessible class/function (#6004)
  - Refactored VariantEval methods to allow subclasses to override (#5998)
  - AnalyzeSaturationMutagenesis: arbitrarily choose 1 read for disjoint pairs, dump rejected reads, and various other improvements (#5926) (#6043)
  - Normalized some AssemblyRegion args in HaplotypeCallerSpark (#5977)
  - Don't redundantly delete temporary directories in RScriptExecutor (#5894)
  - Treat all source files as UTF-8 for java, javadoc (#5946)
  - Updated an out-of-date argument name in an error message for the CycleCovariate
  - Changed an error about "duplicate feature inputs" to be a UserException (#5951)
  - Got rid of ExpandingArrayList in favor of ArrayList (#6069)
  - Disabled Codecov for now on travis due to spurious errors (#6052)
  - Lowered the Xms value in the test JVM (#6087)
  - Updated the travis installed R version to 3.2.5, matching our base docker image (#6073)
  - Fixed an erroneous warning about GCS test configuration (#5987)
  - Added a code of conduct (#6036)
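The criterion behind the SoftClippedReadFilter entry above can be illustrated with a CIGAR string. The sketch below assumes that "total bases" means the bases present in the read (CIGAR operators M, I, S, = and X) and uses a hypothetical threshold parameter; it illustrates the ratio test rather than reproducing the filter's implementation or its argument names.

```python
import re

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
_READ_BASE_OPS = set("MIS=X")  # operators that consume bases of the read


def soft_clip_ratio(cigar: str) -> float:
    """Fraction of read bases that are soft-clipped, computed from a CIGAR string."""
    ops = _CIGAR_OP.findall(cigar)
    # Reject strings that are empty or contain anything besides valid operations.
    if not ops or "".join(length + op for length, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    soft_clipped = sum(int(length) for length, op in ops if op == "S")
    total = sum(int(length) for length, op in ops if op in _READ_BASE_OPS)
    if total == 0:
        raise ValueError(f"CIGAR has no read bases: {cigar!r}")
    return soft_clipped / total


def exceeds_soft_clip_threshold(cigar: str, max_ratio: float) -> bool:
    """True if the read would be filtered out under the assumed ratio test."""
    return soft_clip_ratio(cigar) > max_ratio


# Print the ratio for a few example CIGAR strings.
for example in ("100M", "10S90M", "60S40M"):
    print(example, soft_clip_ratio(example))
```

Hard clips (H) are excluded from both counts here because those bases are absent from the read; that choice is part of the stated assumption.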
- Documentation
  - FilterVariantTranches documentation fix and improvement (#5837)
  - Updated FilterMutectCalls usage examples (#5890)
  - Added --max-mnp-distance 0 to usage example in CreateSomaticPanelOfNormals docs (#5972)
  - Updated the MarkDuplicatesSpark documentation to no longer contain a misleading usage example (#5938)
  - Added a clarification to the README to warn users to set their Gradle JVM properly in IntelliJ after setup (#6066)
  - Added links to download Java 8 to the README (#6025)
  - Removed non-ascii chars from javadoc (#5936)
- Dependencies