github broadinstitute/picard 2.9.5

Bug fixes

  • Fix MergeBamAlignment’s AbstractAlignmentMerger to use the actual maximum number of records in RAM instead of a hard-coded constant (#834). The hard-coded constant, MAX_RECORDS_INRAM, was set to 500K. The tool now reads this.maxRecordsInRam, which holds the value of the tool argument MAX_RECORDS_IN_RAM. Lower this parameter to improve performance on low-memory machines.
  • Fix absence of samples in CollectVariantCallingMetrics when a sample’s calls are all homozygous-reference (hom-ref; #846; addresses bug #797). If a sample’s calls were all hom-ref, CollectVariantCallingMetrics would omit the sample from the metrics collection. With the fix, the tool now lists the sample in the metrics.

New features

  • DownsampleSam supports piping from standard input (stdin; #839). For example, the tool can now handle samtools view -h xyz.bam | java -jar $PICARD DownsampleSam I=/dev/stdin O=downsample.bam METRICS_FILE=downsample_metrics.txt.
  • DownsampleSam now produces QualityYieldMetrics for the reads that remain after downsampling (#840). The tool requires the name of the metrics file and does not add a file extension, so to write a text file, set e.g. METRICS_FILE=xyz_metrics.txt. The metrics are the same as those produced by the standalone tool CollectQualityYieldMetrics and describe the general quality of reads. Specifically, the metrics bin read bases by quality and stratify by reads passing filters (PF) as defined by the 0x200 SAM flag.
  • CreateSequenceDictionary now allows short name arguments (#844). GENOME_ASSEMBLY, URI and SPECIES can now also be given as AS, UR and SP, respectively.
  • BedToIntervalList documentation clarifies tool requirements (#842). Namely, the SEQUENCE_DICTIONARY or SD argument accepts five different types of files from which the tool can derive reference contig information: an actual .dict dictionary file; a .fa or .fasta reference with a dictionary in the same directory; an interval list with @SQ lines in the header; a SAM/BAM file with @SQ lines in the header; and a VCF with ##contig lines in the header. Code is unchanged.
  • CrosscheckFingerprints now includes tumor-aware results when results have been rolled up to sample or library level (#642). The tool crosschecks sample matches using BAMs and VCFs at various levels, e.g. BAM file, sample or library, and writes results to either a matrix (MATRIX_OUTPUT) or a metrics file (OUTPUT). The tool allows for missing read group (RG) tags in a BAM when VALIDATION_STRINGENCY is not STRICT, i.e. LENIENT or SILENT.
  • New tool ClusterCrosscheckMetrics takes metric output of CrosscheckFingerprints and finds clusters of groups that connect via high LOD (#642).
  • Enable LiftOverVcf to lift over negative-strand bi-allelic indels that it previously could not lift over (#836). Here, bi-allelic means neither multiallelic nor a mixed-type variant. That is, the tool now lifts over indel variants that land on the reverse strand in the lifted-to reference. The tool then left-aligns the indels as per doi: 10.1093/bioinformatics/btv112.
  • Add CBCL support to ExtractIlluminaBarcodes and CollectBasecallingMetrics (#848). The short-lived USE_NEW_CONVERTER argument in IlluminaBasecallsToSam is now defunct, because the tool automatically detects CBCL or BCL data types without user specification. That is, the tool now supports CBCL (NovaSeq) sequencer data seamlessly.
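
To make the DownsampleSam metrics item above more concrete, here is a minimal Python sketch of what "bin read bases by quality and stratify by PF" can look like. It is an illustration only, not Picard's implementation. The function name quality_yield and the input format (pairs of SAM flag and base-quality list) are assumptions made for this example; the metric names are modeled on the columns reported by CollectQualityYieldMetrics, and a read is treated as PF when the 0x200 flag bit is not set.

```python
QC_FAIL_FLAG = 0x200  # SAM flag bit set on reads that fail platform/vendor quality checks


def quality_yield(reads):
    """Tally quality-yield style counts.

    `reads` is an iterable of (flag, base_qualities) pairs. This input
    format is hypothetical and chosen only to keep the example self-contained.
    """
    metrics = {
        "TOTAL_READS": 0, "PF_READS": 0,
        "TOTAL_BASES": 0, "PF_BASES": 0,
        "Q20_BASES": 0, "PF_Q20_BASES": 0,
        "Q30_BASES": 0, "PF_Q30_BASES": 0,
    }
    for flag, quals in reads:
        # A read passes filters (PF) when the 0x200 bit is NOT set.
        is_pf = not (flag & QC_FAIL_FLAG)
        q20 = sum(1 for q in quals if q >= 20)
        q30 = sum(1 for q in quals if q >= 30)

        metrics["TOTAL_READS"] += 1
        metrics["TOTAL_BASES"] += len(quals)
        metrics["Q20_BASES"] += q20
        metrics["Q30_BASES"] += q30
        if is_pf:
            metrics["PF_READS"] += 1
            metrics["PF_BASES"] += len(quals)
            metrics["PF_Q20_BASES"] += q20
            metrics["PF_Q30_BASES"] += q30
    return metrics


# Two toy reads: the first passes filters, the second has the 0x200 bit set.
example_reads = [
    (0, [35, 32, 18, 25]),
    (QC_FAIL_FLAG, [12, 22, 31]),
]
result = quality_yield(example_reads)
```

In this toy input, only the first read contributes to the PF_* counts, while both reads contribute to the unstratified totals.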

For developers

  • Clean up Fingerprinting code (#850). Remove the unused merge-by-sample function and keep the mergeBy function to enable rolling up by sample.