Highlights of this release include support for outputting phased variants in HaplotypeCaller/Mutect2, restoring the --include-non-variant-sites argument to GenotypeGVCFs, a port of the GATK3 tool VariantEval, a new library (Disq, https://github.com/disq-bio/disq) for working with BAM/CRAM/VCF/etc. formats on Spark, and GCS (Google Cloud Storage) support in Funcotator.
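For readers unfamiliar with phased output, the sketch below illustrates how phased genotypes are represented under the VCF specification: alleles in the GT field are separated by "|" when phased and by "/" when unphased, and the PS (phase set) FORMAT field groups calls whose phase is known relative to one another. The helper names and sample calls are hypothetical and are not taken from GATK; this shows only the spec convention, not how HaplotypeCaller or Mutect2 implement it.

```python
def parse_genotype(gt):
    """Split a VCF GT string into (allele_indices, is_phased).

    Assumes a single separator type per genotype ("0|1" or "0/1");
    a missing allele (".") is returned as None.
    """
    is_phased = "|" in gt
    separator = "|" if is_phased else "/"
    alleles = [None if a == "." else int(a) for a in gt.split(separator)]
    return alleles, is_phased


def same_haplotype(call_a, call_b):
    """Compare two calls, each given as (GT string, PS value or None).

    Returns True if both calls are phased, share a phase set, and carry
    their non-reference alleles in the same haplotype positions; False if
    they are phased in the same set but on different haplotypes; None if
    the phase relationship cannot be determined.
    """
    gt_a, ps_a = call_a
    gt_b, ps_b = call_b
    alleles_a, phased_a = parse_genotype(gt_a)
    alleles_b, phased_b = parse_genotype(gt_b)
    if not (phased_a and phased_b) or ps_a is None or ps_a != ps_b:
        return None
    alt_positions_a = [a not in (0, None) for a in alleles_a]
    alt_positions_b = [b not in (0, None) for b in alleles_b]
    return alt_positions_a == alt_positions_b


# Hypothetical calls: (GT, PS)
print(parse_genotype("0|1"))
print(same_haplotype(("0|1", 100), ("1|0", 100)))
```

Returning None for the undetermined case keeps "different haplotypes" distinct from "phase unknown", which matters when unphased and phased calls are mixed in one file.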
As usual, a docker image for this release can be downloaded from https://hub.docker.com/r/broadinstitute/gatk/
Full list of changes in this release:
- HaplotypeCaller/Mutect2
  - Output VCF spec-compliant phased variants in HaplotypeCaller and Mutect2
  - Added an experimental adaptive pruning option for local assembly (#5473)
  - Improved implementation of allele-specific new qual (#5460)
  - Use cigar complexity to break ties in uninformative reads' best haplotypes (#5359)
  - Improved handling of regions that are too short after trimming in HaplotypeCaller and in Mutect2 (Closes issue #5079)
  - Optimization in CigarUtils to shortcut to M-only CIGAR when provably optimal (#5466)
  - Changed SUPPORTED_ALLELES_TAG from SA to XA (#5418)
- HaplotypeCaller
- Mutect2
  - Big improvements to CalculateContamination's model for determining hom alt sites (#5413)
  - Reduce false negatives from mapping quality filter on long indels in Mutect2 (#5497)
  - Added a mismatch ratio option in realignment filter (#5501)
  - Made Mutect2 read position filter default much less stringent (#5487)
  - Fixed M2 bug for germline resources with AF=. (#5442)
  - Fix read position annotation bug in M2 filter (#5495)
  - Cleaner Mutect2 VCF fields (#5510)
  - Moved PerAlleleAnnotations to the INFO field (#5518)
  - Removed unnecessary inheritance of M2 filtering arguments collection (#5498)
- GenotypeGVCFs
  - Restored the --include-non-variant-sites argument from GATK3 to GenotypeGVCFs (#5219)
- Ported the GATK3 tool VariantEval to GATK4 (#5043)
- Replaced the Hadoop-BAM library with the newly-developed Disq library (https://github.com/disq-bio/disq) for efficiently working with BAM/CRAM/VCF/etc. formats on Spark (#5138)
  - Improves Spark performance across the board, and fixes many edge-case bugs in Hadoop-BAM
- Funcotator
  - Added GCS support to Funcotator data sources, so that data sources can now be accessed directly from GCS buckets (#5425)
  - Added support for annotating 5'/3' flanks (#5403)
  - Funcotator now creates default annotations for difficult variants. (#5374)
  - Funcotator can now create annotations for symbolic alleles and masked alleles (#5406)
  - Funcotator can now match between hg19 and b37 data sources. (#5491)
  - Added regression tests and fixes for correctness of many annotations (#5302)
  - DE_NOVO_START_IN_FRAME and DE_NOVO_START_OUT_FRAME are now correct. (#5357)
  - Added cDNA Strings for Intronic Variants (#5321)
  - VCF data sources create an ID field for the ID of the variant used for the annotation (#5327)
  - Funcotator now computes MT protein changes. (#5361)
  - Funcotator now correctly populates transcript position. (#5380)
  - Added a script that can create data sources from BED files. (#5438)
  - Updated testing Gencode data sources to fully exercise test data set (#5423)
  - Moved validation test data out of large files area. (#5381)
  - Updated top-level class documentation for Funcotator. (#4655)
  - Added scripts to liftover gnomAD. Also bugfixes for Funcotator NIO. (#5514)
- HaplotypeCallerSpark
- MarkDuplicatesSpark: Added a few of the remaining unimplemented useful features from Picard (#5377)
CNV workflows
  - Changed FilterIntervals to operate on the intersection of intervals in all inputs. (#5408)
  - Fixed RAM usage parameter error in combine_tracks.wdl (#5358)
  - Various other improvements to combine_tracks.wdl (#5384)
  - Fixed gCNV WDL broken by Cromwell update on FireCloud. (#5407)
  - Replaced bash script in gCNV ScatterIntervals task with updated version of IntervalListTools. (#5414)
- CNNScoreVariants
  - Check for and require hardware AVX support (#5291)
- Changed SelectVariants so that it can handle multiple rsIDs separated by ';' in a VCF file (#5464)
-
Miscellaneous Changes
  - Added setIsUnplaced() to the GATKRead API to distinguish reads with no mapping information (#5320)
  - Fixed an integer overflow bug in the RMSMappingQuality annotation (#5435)
  - Fixed floating-point bug in MannWhitneyU on some JVMs. (#5371)
  - Standardized the output argument for LeftAlignIndels (#5474)
  - SplitIntervals now produces an .interval_list file (#5392)
  - Fixed a bug with GATK_GCS_STAGING in the GATK launcher script #1338 (#5452)
  - Added ExampleReadWalkerWithVariantsSpark.java and tests (#5289)
  - Add description getter and javadoc in GATKReportTable (#5443)
  - Fixed message in GATKAnnotationPluginDescription (#5444)
  - Replaced some uses of PrintWriter (#5461)
  - Refactor GVCFWriter to allow push/pull iteration. (#5311)
  - Add scripts/dataproc-cluster-ui to release bundle. (#5401)
  - Marked VariantAnnotator as a @DocumentedFeature (#5480)
  - Removed obsolete intel conda environment references. (#5482)
  - Deleted the CountSet class (#5467)
  - Test framework: disabled gcloud login on travis for non-cloud non-wdl tests (#5335)
  - Updated Spark scripts to reflect changes from #5386 and #5127. (#5415)
  - Fixed jexl logging and updated VariantFiltration doc. (#5422)
  - Fixed some dead links in the README (#5405)
- Dependencies