broadinstitute/gatk 4.beta.6 on GitHub

This release fixes a critical sample-ordering bug in the GenomicsDBImport tool and adds a new tool, FixCallSetSampleOrdering, to repair VCFs generated with pre-4.beta.6 versions of GenomicsDBImport. See the description of the bug in #3682 to determine whether you are affected. Do not run FixCallSetSampleOrdering unless you are sure that you are affected by the bug in #3682.

Other highlights include upgrading to the latest version of the Picard tools, and adding engine support for reading Gencode GTF files.

A Docker image for this release can be found in the broadinstitute/gatk repository on Docker Hub. Within the image, cd into /gatk, then run gatk-launch commands as usual.

Note: Due to our current dependency on a snapshot of google-cloud-java, this release cannot be published to Maven Central.

Full list of changes for this release:

Fixed sample name reordering bug in GenomicsDBImport (#3667)
New tool FixCallSetSampleOrdering to repair VCFs affected by #3682 (#3675)
Integrate latest Picard tools via Picard jar. (#3620)
Adding in codec to read from Gencode GTF files. Fixes #3277 (#3410)
Upgrade to HTSJDK version 2.12.0 (#3634)
Upgrade to GKL version 0.7 (#3615)
Upgrade to GenomicsDB version 0.7.0 (#3575)
Upgrade Mockito from 1.10.19 -> 2.10.0. (#3581)
Add GVCF support to VariantsSparkSink (#3450)
Fix writing variants to GCS buckets (#3485)
Support unmapped reads in Spark. (#3369)
Correct gVCF header lines (#3472)
Dump more evidence info for SV pipeline debugging (#3691)
Add omitFromCommandLine=true for example tools (#3696)
Change gatkDoc and gatkTabComplete build tasks to include Picard. (#3683)
Adding data.table R package. (#3693)
Added a missing newline in ParamUtils method. (#3685)
Fix minor HTML issues in ReadFilter documentation (#3654)
Add CRAM integration tests for HaplotypeCaller. (#3681)
Fix SamAssertionUtils SortSam call. (#3665)
Add ExtremeReadsTest (#3070)
removing required FASTA reference input that was needed before (for its dict) for sorting variants in output VCF, now using header in input SAM/BAM (#3673)
re-enable snappy use in htsjdk (#3635)
fix 3612 (#3613)
pass read metadata to all code that needs to translate contig ids using read metadata (#3671)
quick fix for broken read (mapped to no ref bases) (#3662)
Fix log4j logging by removing extra copy from the classpath. #2622 (#3652)
add suggestion to regularly update gcloud to README (#3663)
Automatically distribute the BWA-MEM index image file to executors for BwaSpark (#3643)
Have PSFilter strip mate number from read names (#3640)
Added the tool PreprocessIntervals that bins the intervals given by the user to be used for coverage collection. (#3597)
Cpx SV PR series, part-4 (#3464)
fixed bug in which F1R2 and F2R1 annotation kept discarded alleles (#3636)
imprecise deletion calling (#3628)
Significant improvements to CalculateContamination (#3638)
Adds supplementary alignment info into fastq output, also additional… (#3630)
Adding tool to annotate with pair orientation info (#3614)
add elapsed time to assembly info in intervals file (#3629)
Created a VariantAnnotationArgumentCollection to reduce code duplication and added a StandardM2Annotation group (#3621)
Docs for turning assembled haplotypes into variant alleles (#3577)
Simplify spark_eval scripts and improve documentation. (#3580)
Renames StructuralVariantContext to SVContext. (#3617)
Added KernelSegmenter. (#3590)
Fix bug in allele order independent comparison (#3616)
Docs for local assembly (#3363)
Added a method to VariantContextUtils which supports alt allele order independent comparison of variant contexts. (#3598)
Fixed incorrect logger in CollectAllelicCounts and RecalibrationReport. (#3606)
updating to newer htsjdk snapshot (#3588)
clear diffuse high frequency kmers (#3604)
update SmithWatermanAligner in preparation for native optimized aligner (#3600)
added spark tool for extracting original SAM records based on a file containing read names (#3589)
update README with correct path to install_R_packages.R #3601 (#3602)
HostAlignmentReadFilter and PSScorer use only identity scores and exp… (#3537)
Fixed alt-allele count in AllelicCountCollector and changed unspecified alleles in AllelicCount to N. (#3550)
Fix bad version check in manage_sv_pipeline.sh (#3595)
Use a handmade TestReferenceMultiSource in tests instead of a mock. (#3586)
Repackage ReadFilter plugin tests (#3525)
BamOut in M2 WDL and unsupported version with NIO for SpecOps Team (#3582)
Changed the path for posting the test reports
updates sv manager and cluster creation scripts to utilize dataproc cluster timed self-termination feature (#3579)
Implemented watershed algorithm for finding local minima in 1D data based on topological persistence. (#3515)
Reduce number of output partitions in PathSeqPipelineSpark (#3545)
add gathering of imprecise evidence links and extend evidence intervals to make links coherent in most cases (#3469)
Refactor PrimaryAlignmentReadFilter to PrimaryLineReadFilter (#3195)
Update ReadFilters documentation (#3128)
Changes in BwaMemIntegrationTest to avoid a 3-4 minutes runtime. (#3563)
Make error informative for non-diploid family likelihoods #3320 (#3329)
TableFeature javadoc and more tests (#3175)
Re-enable ancient BED test in IndexFeatureFile. (#3507)
add external evidence stream for CNVs (#3542)
clip M2 alleles before emitting in case some alleles were dropped (#3509)
Docs for M2 filtering (#3560)
Fix static test blocks and @BeforeSuite usages to prevent excessive code execution when tests aren't included in a suite. (#3551)
hide prototyping tools in sv package from help message (but still runnable if knowing their existence) (#3556)
Add support for running tools with omitFromCommandLine=true (#3486)
Adds utility methods to ReadUtils and CigarUtils. (#3531)
Cpx SV PR series, part-3 (#3457)
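
For readers unfamiliar with the idea behind the alt allele order independent comparison listed above (#3598, #3616), here is a minimal Python sketch of the concept. It is not the GATK implementation: the `Variant` type and the `same_variant_ignoring_alt_order` function are hypothetical names invented for illustration. The sketch assumes that two records count as equal when they share contig, start position, and reference allele and carry the same alt alleles in any order.

```python
from typing import NamedTuple, Tuple


class Variant(NamedTuple):
    """Hypothetical stand-in for a variant record; not a GATK or htsjdk class."""
    contig: str
    start: int
    ref: str
    alts: Tuple[str, ...]


def same_variant_ignoring_alt_order(a: Variant, b: Variant) -> bool:
    """Return True when both records share site and reference allele
    and carry the same alt alleles, regardless of their order."""
    same_site = (a.contig, a.start, a.ref) == (b.contig, b.start, b.ref)
    # Sorting compares the alt alleles as a multiset, so order is ignored
    # but a duplicated allele still has to be duplicated on both sides.
    return same_site and sorted(a.alts) == sorted(b.alts)


first = Variant("chr1", 100, "A", ("C", "T"))
second = Variant("chr1", 100, "A", ("T", "C"))  # same alts, different order
third = Variant("chr1", 100, "A", ("C", "G"))   # different alt allele set

print(same_variant_ignoring_alt_order(first, second))
print(same_variant_ignoring_alt_order(first, third))
```

A real comparison would also have to carry any per-allele data (genotypes, allele-specific annotations) along with the reordered alleles; this sketch leaves that out.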