- Fix a serious bug in CollectAlignmentSummaryMetrics when run in IS_BISULFITE_SEQUENCE mode (#782). The bug was introduced by Sonar-squid automation, which is no longer used. The tool now behaves as expected, and a new unit test for the bisulfite case checks that this mode keeps working correctly.
- Fix CollectHsMetrics and CollectTargetedPcrMetrics to correctly handle fully overlapping read pairs when CLIP_OVERLAPPING_READS is set to true (#784). When a fully overlapping pair is clipped, one read of the pair is flagged as unmapped, but TargetMetricsCollector was still counting that read. The unmapped read is now ignored.
- Fix GenotypeConcordance failing on no-calls when writing a VCF (#785, #768). This affects the optional OUTPUT_VCF argument, which writes the sites that took part in the comparison to a VCF. Any site with a no-call or a symbolic allele caused the tool to fail when OUTPUT_VCF was set to true; runs without OUTPUT_VCF were unaffected, and the metrics were correct in either mode. The tool now writes VCFs containing no-calls and symbolic alleles.
- Fix a shift in the base qualities used by TargetMetricsCollector for het sensitivity (HetSensitivity; #771, #769). In CollectHsMetrics, the qualities of soft-clipped bases were being counted while those of some aligned bases were not, so the qualities were offset relative to the bases they belonged to.
- Fix a failure in FindMendelianViolations (#756). MergableMetric merged results with a parallel stream, which could fail because merge is not synchronized. A serial stream is now used instead.
- CollectRnaSeqMetrics now requires a RIBOSOMAL_INTERVALS file when RRNA_FRAGMENT_PERCENTAGE is 0.0 (#743, #728). RRNA_FRAGMENT_PERCENTAGE sets how much of a read must overlap the rDNA locus. If it is set to zero and no intervals file is given, the tool now stops with an informative error message.
- FilterVcf now logs progress with a progress meter (#757).
- RevertSam and SamToFastq now give an informative error message for missing read groups when the OUTPUT_PER_RG option is used.
- Add a FILE_EXTENSION parameter to CollectIlluminaLaneMetrics for appending an optional extension, such as .txt, to the output files (#747).
- Add MAX_TARGET_COVERAGE to the targeted metrics (#803). This new metric is now calculated for HsMetrics and TargetedPcrMetrics.
- Add the optional TAG_DUPLICATE_SET_MEMBERS flag to MarkDuplicates and UmiAwareMarkDuplicatesWithMateCigar (#562). When it is enabled, reads in the SAM/BAM output that belong to a duplicate set get an attribute storing the size of the set, and a separate attribute recording which read was selected as the representative of the set. The DS tag holds the duplicate set size and the DI tag holds the duplicate set index.
- Allow adding custom adapter pairs to IlluminaBasecallsToSam (#795).
- Add metrics computed on UMIs to UmiAwareMarkDuplicatesWithMateCigar (#740): UMI_LENGTH, OBSERVED_UNIQUE_UMIS, INFERRED_UNIQUE_UMIS, DUPLICATE_SETS_WITHOUT_UMI, DUPLICATE_SETS_WITH_UMI, INFERRED_UMI_ENTROPY, OBSERVED_UMI_ENTROPY, and UMI_BASE_QUALITIES. These metrics aid in quality control.
- Change the hashing scheme used to split reads into sub-groups in EstimateLibraryComplexity (#713). This change also affects the duplicate-marking tools.
- New algorithm makes CollectWgsMetrics run more efficiently (#556).
- Replace the repository's git link with an HTTPS link so that all users can clone the repository (#792).
- Add a parser (reader) for CBCL files, which hold per-cycle base calls, so that command-line tools can use it going forward (#791).
- Add an MIT-style open-source license, now shown as a badge on the Broad/Picard repository landing page.
- Use the default HTSJDK method for the standard option in CreateSequenceDictionary (#724). This was necessary because code duplicated between HTSJDK and Picard has been removed from the Picard repository.
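The FindMendelianViolations fix (#756) comes down to a general Java pitfall: calling a non-synchronized merge method on one shared accumulator from a parallel stream can lose updates, because `count += other.count` is not atomic. The minimal sketch below illustrates the pattern and the serial-stream remedy. `CountMetric` and `mergeSerially` are hypothetical names chosen for illustration; this is not Picard's MergableMetric code.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SerialMergeDemo {
    // Hypothetical stand-in for a mergeable metric (not Picard's MergableMetric).
    static final class CountMetric {
        long count;

        // Not synchronized: concurrent calls on the same instance can lose updates.
        void merge(CountMetric other) {
            count += other.count;
        }
    }

    // Merges all parts into one accumulator with a serial stream,
    // so merge() is only ever called from a single thread.
    static CountMetric mergeSerially(List<CountMetric> parts) {
        CountMetric total = new CountMetric();
        parts.stream().forEach(total::merge);
        return total;
    }

    public static void main(String[] args) {
        List<CountMetric> parts = IntStream.range(0, 100_000)
                .mapToObj(i -> {
                    CountMetric m = new CountMetric();
                    m.count = 1;
                    return m;
                })
                .collect(Collectors.toList());

        // Always prints 100000. Replacing stream() with parallelStream() in
        // mergeSerially could print a smaller number, because increments race.
        System.out.println(mergeSerially(parts).count);
    }
}
```

A serial stream trades away parallelism for correctness without adding locks; synchronizing `merge` or using a reduction that builds per-thread partial results would be alternatives.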