github broadinstitute/gatk 4.1.9.0

Download release: gatk-4.1.9.0.zip
Docker image: https://hub.docker.com/r/broadinstitute/gatk/

Highlights of the 4.1.9.0 release:

  • A major update to Funcotator, bringing in the latest Gencode release, fixing compatibility issues with dbSNP, and more!

  • Two new tools, GeneExpressionEvaluation and ReferenceBlockConcordance

  • Significant performance improvements to DepthOfCoverage and SelectVariants

  • Some important bug fixes:

    • Fixed a bug in HaplotypeCaller and Mutect2 where we were losing insertion events that immediately followed a deletion
    • A fix for the "CreateSomaticPanelOfNormals output PoN has much less variants in 4.1.8.0 than before" issue reported in #6744
    • A fix for a frequently-encountered NullPointerException in the AS_StrandBiasTest annotation when running CombineGVCFs reported in #6766

Full list of changes:

  • New Tools

    • GeneExpressionEvaluation: a tool for evaluating gene expression from RNA-seq reads aligned to the whole genome (#6602)

      • This tool counts fragments to evaluate gene expression from RNA-seq reads aligned to the genome. The features over which expression is evaluated are defined in an input annotation file in GFF3 format. The output is a TSV file listing sense and antisense expression for all stranded grouping features, and expression (labeled as sense) for all unstranded grouping features.
    • ReferenceBlockConcordance: a new tool to evaluate concordance of reference blocks in GVCF files (#6802)

      • This tool compares the reference blocks of two GVCF files against each other and produces three histograms:
        • Truth block histogram: Indicates the number of occurrences of reference blocks with a given confidence score and length in the truth GVCF
        • Eval block histogram: Indicates the number of occurrences of reference blocks with a given confidence score and length in the eval GVCF
        • Confidence concordance histogram: Reflects the confidence scores of bases in reference blocks in the truth and eval GVCFs, respectively. An entry of 10 at bin "80,90" means that there are 10 bases that simultaneously have a reference confidence of 80 in the truth GVCF and a reference confidence of 90 in the eval GVCF.
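The three ReferenceBlockConcordance histograms described above can be illustrated with a short Python sketch. This is not GATK's (Java) implementation: it assumes each reference block is a hypothetical (start, end_exclusive, confidence) tuple on a single contig, with the confidence taken to be the block's GQ, and it counts only bases covered by a reference block in both files.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical block representation: (start, end_exclusive, confidence) on one contig.
RefBlock = Tuple[int, int, int]


def block_histogram(blocks: List[RefBlock]) -> Counter:
    """Truth/eval block histogram: number of blocks per (confidence, length)."""
    return Counter((conf, end - start) for start, end, conf in blocks)


def per_base_confidence(blocks: List[RefBlock]) -> Dict[int, int]:
    """Expand reference blocks into a position -> confidence map."""
    conf_at: Dict[int, int] = {}
    for start, end, conf in blocks:
        for pos in range(start, end):
            conf_at[pos] = conf
    return conf_at


def confidence_concordance(truth: List[RefBlock], eval_blocks: List[RefBlock]) -> Counter:
    """Confidence concordance histogram: number of bases per
    (truth confidence, eval confidence), over bases in reference blocks in both files."""
    truth_conf = per_base_confidence(truth)
    eval_conf = per_base_confidence(eval_blocks)
    hist: Counter = Counter()
    for pos, t in truth_conf.items():
        e = eval_conf.get(pos)
        if e is not None:
            hist[(t, e)] += 1
    return hist


if __name__ == "__main__":
    truth = [(100, 110, 80), (110, 120, 99)]
    eval_blocks = [(100, 115, 90), (115, 120, 99)]
    hist = confidence_concordance(truth, eval_blocks)
    print(dict(sorted(hist.items())))  # {(80, 90): 10, (99, 90): 5, (99, 99): 5}
```

In this toy input, the ten bases at positions 100-109 have confidence 80 in the truth blocks and 90 in the eval blocks, so they fall in the (80, 90) bin with a count of 10, mirroring the "80,90" example in the description.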
  • HaplotypeCaller/Mutect2

    • Fixed a bug in HaplotypeCaller and Mutect2 where we were losing insertion events that immediately followed a deletion (#6696)
    • Added a workaround for an issue with multiallelics in the CreateSomaticPanelOfNormals pipeline (#6871)
      • This fixes the "CreateSomaticPanelOfNormals output PoN has much less variants in 4.1.8.0 than before" issue reported in #6744
    • Made improvements to the Mutect2 active region detection code that resulted in recovering some low-AF calls that we were missing (#6821)
    • Made the HaplotypeCaller/Mutect2 adaptive pruner smarter in complex graphs, resulting in modest improvements to indel sensitivity when using the adaptive pruning option (#6520)
    • Fixed a bug in variation event detection code that could sometimes lead to mistreating indel assembly windows as SNP assembly windows (#6661)
    • Fixed a bug in FragmentUtils where insertion quals were used instead of deletion quals when adjusting base qualities for two overlapping reads from the same fragment (#6815)
    • Fixed a ConcurrentModificationException in local runs of HaplotypeCallerSpark (#6741)
    • Marked the --linked-de-bruijn-graph argument as Advanced rather than Hidden (#6737)
    • Made a small tweak to Mutect2's callable sites count (#6791)
    • Added a "requester pays" option to Mutect2 WDL tasks that access BAMs, for use with Google Cloud "requester pays" buckets (#6879)
  • Funcotator

    • A major set of updates to Funcotator (#6660)
      • Updated to the latest Gencode release
      • Fixed the contig naming compatibility issue with dbSNP reported in #6564 ("hg38 dbSNP has incorrect contig names")
      • Contig names for both hg19 and hg38 are now translated to the "chr__" form
      • Added 'lncRNA' to GeneTranscriptType.
      • Added "TAGENE" gene tag.
      • Added the MANE_SELECT tag to FeatureTag.
      • Added the STOP_CODON_READTHROUGH tag to FeatureTag.
      • Updated the GTF versions that are parseable.
      • Fixed a parsing error with new versions of Gencode and the remap positions (for liftover files).
      • Added a test for indexing the new lifted-over Gencode GTF.
      • Added Gencode_34 entries to the MAF output map.
      • Pointed the data source downloader at the new data sources URL.
      • Minor updates to workflows to point at the new data sources.
      • Updated retrieval scripts for dbSNP and Gencode.
      • Added a required field to Gencode config file generation.
      • The Gencode retrieval script now enforces double-hash (##) comments at the top of Gencode GTF files.
      • Fixed an erroneous trailing tab in MAF file output reported in #6693
    • Added a maximum version number for data sources in Funcotator (#6807)
    • Added a "requester pays" option to the Funcotator WDL for use with Google Cloud "requester pays" buckets (#6874)
    • FuncotateSegments: fixed an issue with the default value of --alias-to-key-mapping being set to an immutable value (#6700)
  • GenomicsDB

    • Updated to GenomicsDB version 1.3.2, which improves the propagation of error messages from the GenomicsDB library (#6852)
      • Setting the GATK option GATK_STACKTRACE_ON_USER_EXCEPTION now also outputs a limited C/C++ stack trace
  • CNV Tools

    • Fixed a bug in the KernelSegmenter: the minimum amount of data needed to calculate the segmentation cost is 2 * windowSize, not windowSize (#6835)
    • Germline CNV WDL improvements for WGS (#6607)
      • Modified gCNV WDLs to improve Cromwell performance when running on a large number of intervals, as in WGS
      • Added optional disabled_read_filters input to CollectCounts
      • Enabled GCS streaming for CollectCounts and CollectAllelicCounts
    • Added a "requester pays" option to the germline and somatic CNV WDLs for use with Google Cloud "requester pays" buckets (#6870)
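The KernelSegmenter fix can be motivated with a toy sliding-window changepoint cost. This is a hedged Python sketch, not GATK's code: it assumes (the notes do not say so) that the cost at a candidate changepoint compares a window of windowSize points on its left with a window of windowSize points on its right, which is why at least 2 * windowSize points are needed. The mean-difference cost and the function names are hypothetical stand-ins for KernelSegmenter's kernel-based cost.

```python
from typing import List, Sequence


def has_enough_data(num_points: int, window_size: int) -> bool:
    """Guard analogous to the fix: a full window is needed on both sides
    of a changepoint, so require 2 * window_size points, not window_size."""
    return num_points >= 2 * window_size


def local_changepoint_costs(data: Sequence[float], window_size: int) -> List[float]:
    """Toy local cost: |mean(left window) - mean(right window)| at each candidate
    changepoint t, where left = data[t - w:t] and right = data[t:t + w]."""
    if window_size < 1:
        raise ValueError("window_size must be positive")
    if not has_enough_data(len(data), window_size):
        return []  # cannot fit a full window on both sides of any changepoint
    costs = []
    for t in range(window_size, len(data) - window_size + 1):
        left = data[t - window_size:t]
        right = data[t:t + window_size]
        costs.append(abs(sum(left) / window_size - sum(right) / window_size))
    return costs


if __name__ == "__main__":
    # Step from 1 to 4 between indices 2 and 3; the largest cost is at t = 3.
    print(local_changepoint_costs([1, 1, 1, 4, 4, 4], 2))  # [1.5, 3.0, 1.5]
```

With window_size = 2, a three-point sequence passes a check of "at least windowSize points" but cannot fit a full window on both sides of any changepoint. The 2 * windowSize guard rejects it up front.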
  • Mitochondrial Pipeline

    • Fixed handling of spaces in sample names in the Mitochondria WDL (#6773)
    • Exposed a max_reads_per_alignment_start argument in the Mitochondria WDL (#6739)
    • Updated the HaploChecker Dockerfile to reflect the correct haplocheck CLI (#6867)
  • Notable Enhancements

    • Significantly improved the performance of DepthOfCoverage by removing slow string formatting calls (#6740)
      • In a local test run with default arguments on the full chr15 of a WGS sample, the runtime dropped from ~8.9 minutes to ~4.7 minutes after this patch
    • Significantly improved the performance of SelectVariants with large numbers of samples by changing an operation to scale linearly instead of quadratically with the number of samples (#6729)
      • In one example with several thousand samples, the runtime dropped from ~5 minutes to 0.1 minutes
    • WDL generation: made several improvements to automatic WDL generation, annotated additional tools for WDL generation, and added a section to the README with instructions on generating WDLs for GATK tools (#6800)
    • Added a suite of utility methods for working with Google BigQuery: BigQueryUtils (#6759) (#6861)
    • The GATK Docker image can now be built with a simple "docker build ." command (no extra arguments needed) (#6764) (#6842) (#6782)
    • Added a Dockstore yml file with workflow descriptions for the WDLs in the GATK repo, to facilitate automatic publication to Dockstore (#6770)
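These notes do not say which SelectVariants operation was changed in #6729, so the Python sketch below only illustrates the general pattern behind a quadratic-to-linear fix when subsetting samples. The function names are hypothetical. A membership test against a list scans it each time (O(n) per sample, roughly O(n²) for n samples), whereas building a hash set once makes each test O(1) on average.

```python
from typing import Dict, Sequence


def subset_samples_quadratic(sample_names: Sequence[str],
                             genotypes: Sequence[str],
                             keep: Sequence[str]) -> Dict[str, str]:
    # `name in keep` scans the list each time: O(len(keep)) per sample,
    # so roughly O(n^2) when both lists hold n samples.
    return {name: gt for name, gt in zip(sample_names, genotypes) if name in keep}


def subset_samples_linear(sample_names: Sequence[str],
                          genotypes: Sequence[str],
                          keep: Sequence[str]) -> Dict[str, str]:
    # Build a hash set once (O(n)); each lookup is then O(1) on average.
    keep_set = set(keep)
    return {name: gt for name, gt in zip(sample_names, genotypes) if name in keep_set}


if __name__ == "__main__":
    import time

    n = 5000
    names = [f"sample{i}" for i in range(n)]
    gts = ["0/1"] * n
    keep = names[::2]  # keep every other sample
    for fn in (subset_samples_quadratic, subset_samples_linear):
        start = time.perf_counter()
        result = fn(names, gts, keep)
        print(fn.__name__, len(result), f"{time.perf_counter() - start:.3f}s")
```

Both functions return the same samples in the original order. Only the cost of the membership test differs, and the gap grows with the number of samples.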
  • Bug Fixes

    • Fixed a NullPointerException in the AS_StrandBiasTest annotation reported in #6766 (#6847)
    • Fixed a bug with soft clips in LeftAlignIndels (#6792)
    • VariantRecalibrator: uniquify annotations to fix the error reported in #2221 (#6723)
    • Fixed an issue where ContextCovariate in BaseRecalibrator mistakenly assumed that all non-ACGT bases in the read are N (#6625)
    • Fixed a crash in CountBasesSpark when using the -L option (#6767)
  • Miscellaneous Changes

    • Significant refactoring of the SV discovery classes (#6652)
    • FilterVariantTranches: report more info when the ref alleles don't match (#6723)
    • We now report the target URL in exceptions thrown by HtsgetReader (#6799)
    • Added more information to error messages in AssemblyRegion for contigs not in the reference dictionary (#6781)
    • Improved an error message in GATKRead.setMatePosition() (#6779)
    • Updated the Barclay WDL template for compatibility with the Debian distribution (#6841)
    • Temporarily disabled HtsgetReader tests to work around issues caused by a server-side upgrade. (#6804)
    • Re-enabled an IndexFeatureFile test for uncompressed BCF. (#6716)
  • Documentation

    • Marked LearnReadOrientationModel as a DocumentedFeature (#6726)
    • Added a gentle warning about the loss of true positives with the default FilterIntervals parameters (#6751)
    • Updated the README to mention that the conda environment is not officially supported on macOS at this time. (#6788)
    • Fixed a typo in the example command for SplitIntervals (#6869)
    • Fixed a typo in the --tmp-dir argument in the GenomicsDBImport docs (#6785)
    • Fixed a typo in the --tmp-dir argument in the GenotypeGVCFs docs (#6784)
    • Removed outdated argument references from the DepthOfCoverage documentation. (#6810)
    • Fixed a typo in the DepthOfCoverage documentation, changing the "-genelist" argument to "-gene-list" (#6880)
    • Fixed a typo in the docs for the Mutect2 --pcr-indel-qual argument (#6840)
  • Dependencies

    • Upgraded Picard to 2.23.3 (#6717)
    • Upgraded Barclay to 4.0.1. (#6864)
    • Updated GenomicsDB to 1.3.2 (#6852)
    • Added a new dependency on Google BigQuery 1.117.1 (#6759)