Bug fixes
- Fix MergeBamAlignment's AbstractAlignmentMerger to use the actual maximum number of records in RAM instead of a hard-coded constant (#834). The hard-coded constant, defined by MAX_RECORDS_INRAM, was set to 500K. The tool now uses this.maxRecordsInRam to get the value of the tool argument MAX_RECORDS_IN_RAM. Lower this parameter to improve performance on low-memory machines.
- Fix absence of samples in CollectVariantCallingMetrics when a sample's calls are all homozygous-reference (hom-ref; #846; addresses bug #797). Previously, if all of a sample's calls were hom-ref, CollectVariantCallingMetrics omitted the sample from the metrics. With the fix, the tool lists the sample in the metrics.
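Picard describes MAX_RECORDS_IN_RAM as the number of records held in RAM before spilling to disk, which is why lowering it helps on low-memory machines. The sketch below shows that trade-off with a generic spill-to-disk sorter whose limit comes from the instance rather than a module constant. The class and its names are hypothetical; this is not AbstractAlignmentMerger's implementation.

```python
import heapq
import pickle
import tempfile
from typing import Any, Iterator, List


class SpillingSorter:
    """Hypothetical sort buffer bounded by max_records_in_ram.

    Records accumulate in memory; once the buffer holds max_records_in_ram
    records it is sorted and spilled to a temporary file. Iteration merges
    all spilled runs with whatever is still in memory.
    """

    def __init__(self, max_records_in_ram: int) -> None:
        if max_records_in_ram < 1:
            raise ValueError("max_records_in_ram must be at least 1")
        # The limit is read from the instance (the user's argument),
        # not from a hard-coded module-level constant.
        self.max_records_in_ram = max_records_in_ram
        self._buffer: List[Any] = []
        self._runs: List[Any] = []  # open temp files, one sorted run each

    def add(self, record: Any) -> None:
        self._buffer.append(record)
        if len(self._buffer) >= self.max_records_in_ram:
            self._spill()

    @property
    def spill_count(self) -> int:
        return len(self._runs)

    def _spill(self) -> None:
        self._buffer.sort()
        run = tempfile.TemporaryFile()  # opened in "w+b" mode by default
        for record in self._buffer:
            pickle.dump(record, run)
        run.seek(0)
        self._runs.append(run)
        self._buffer = []

    @staticmethod
    def _read_run(run: Any) -> Iterator[Any]:
        while True:
            try:
                yield pickle.load(run)
            except EOFError:
                return

    def __iter__(self) -> Iterator[Any]:
        # Meant to be iterated once: spilled runs are consumed as they are read.
        self._buffer.sort()
        streams = [self._read_run(run) for run in self._runs]
        streams.append(iter(self._buffer))
        return heapq.merge(*streams)
```

With a smaller limit, the same input produces more, smaller runs on disk; peak memory is bounded by the limit rather than by the input size.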
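The general pattern behind the CollectVariantCallingMetrics fix can be sketched as seeding the per-sample table from the full sample list, instead of creating an entry only when a non-reference call is seen. This is a hypothetical illustration with made-up function names, not the tool's code; genotypes are plain VCF-style GT strings, and records are assumed to mention only samples from the list.

```python
from collections import Counter
from typing import Dict, Iterable, List, Mapping


def classify_genotype(gt: str) -> str:
    """Classify a VCF-style GT string such as '0/1' or '1|1'."""
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return "no_call"
    if all(a == "0" for a in alleles):
        return "hom_ref"
    if len(set(alleles)) == 1:
        return "hom_var"
    return "het"


def genotype_counts_per_sample(
    samples: List[str], records: Iterable[Mapping[str, str]]
) -> Dict[str, Counter]:
    # Create an entry for every sample before looking at any call, so a
    # sample whose calls are all hom-ref is still reported.
    counts: Dict[str, Counter] = {sample: Counter() for sample in samples}
    for genotypes in records:
        for sample, gt in genotypes.items():
            counts[sample][classify_genotype(gt)] += 1
    return counts
```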
New features
- DownsampleSam supports piping from standard input (stdin; #839). For example, the tool can now handle:
    samtools view -h xyz.bam | java -jar $PICARD DownsampleSam I=/dev/stdin O=downsample.bam METRICS_FILE=downsample_metrics.txt
- DownsampleSam now produces QualityYieldMetrics for the reads that remain after downsampling (#840). The tool requires the full name of the metrics file and does not add a file extension, so to write a text file, set, e.g., METRICS_FILE=xyz_metrics.txt. The metrics are the same as those produced by the standalone tool CollectQualityYieldMetrics and describe the general quality of the reads. Specifically, the metrics bin read bases by quality and stratify them by whether the reads pass filters (PF), as determined by the 0x200 SAM flag; reads without this flag set are PF.
- CreateSequenceDictionary now allows short argument names (#844). GENOME_ASSEMBLY, URI and SPECIES can also be given as AS, UR and SP, respectively.
- BedToIntervalList documentation clarifies the tool's requirements (#842). Namely, the SEQUENCE_DICTIONARY (SD) argument accepts five types of files from which the tool can derive reference contig information: an actual .dict dictionary file, a .fa or .fasta reference with a dictionary in the same directory, an interval list with @SQ lines in the header, a SAM/BAM file with @SQ lines in the header and a VCF with #contig lines in the header. The code is unchanged.
- CrosscheckFingerprints now includes tumor-aware results when results have been rolled up to the sample or library level (#642). The tool crosschecks sample matches using BAMs and VCFs at various levels, e.g., BAM file, sample or library, and writes results to either a matrix (MATRIX_OUTPUT) or a metrics file (OUTPUT). The tool allows missing read group (RG) tags in a BAM when VALIDATION_STRINGENCY is not STRICT, i.e., LENIENT or SILENT.
- New tool ClusterCrosscheckMetrics takes the metrics output of CrosscheckFingerprints and finds clusters of groups that are connected by high LOD scores (#642).
- Enable LiftOverVcf to lift over negative-strand bi-allelic indels that it previously could not lift over (#836). Here, bi-allelic means neither multiallelic nor a mixed-type variant. That is, the tool now lifts over indels that land on the reverse strand in the target reference. The tool then left-aligns the indels as per doi: 10.1093/bioinformatics/btv112.
- Add CBCL support to ExtractIlluminaBarcodes and CollectBasecallingMetrics (#848). The short-lived USE_NEW_CONVERTER argument in IlluminaBasecallsToSam is now defunct; instead, the tool automatically detects whether the data are CBCL or BCL without user specification. That is, the tool now supports CBCL (NovaSeq) sequencer data seamlessly.
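The QualityYieldMetrics that DownsampleSam now writes (bases binned by quality, split by PF status) can be illustrated with a small counter over (flag, base qualities) pairs. The Q20/Q30 cut-offs (quality of at least 20 or 30) and the lower-case field names are assumptions for illustration, modeled loosely on CollectQualityYieldMetrics output rather than copied from it. The one fact taken from the SAM specification is that flag 0x200 marks a read that failed platform/vendor quality checks.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

QC_FAIL_FLAG = 0x200  # SAM flag: read fails platform/vendor quality checks


@dataclass
class QualityYield:
    # Hypothetical field names, loosely modeled on CollectQualityYieldMetrics.
    total_reads: int = 0
    pf_reads: int = 0
    total_bases: int = 0
    pf_bases: int = 0
    q20_bases: int = 0
    pf_q20_bases: int = 0
    q30_bases: int = 0
    pf_q30_bases: int = 0


def quality_yield(reads: Iterable[Tuple[int, List[int]]]) -> QualityYield:
    """Tally bases by quality; reads are (SAM flag, per-base Phred qualities)."""
    m = QualityYield()
    for flag, quals in reads:
        passes_filter = not (flag & QC_FAIL_FLAG)
        q20 = sum(1 for q in quals if q >= 20)
        q30 = sum(1 for q in quals if q >= 30)
        m.total_reads += 1
        m.total_bases += len(quals)
        m.q20_bases += q20
        m.q30_bases += q30
        if passes_filter:
            m.pf_reads += 1
            m.pf_bases += len(quals)
            m.pf_q20_bases += q20
            m.pf_q30_bases += q30
    return m
```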
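The five SEQUENCE_DICTIONARY sources listed for BedToIntervalList can be told apart by file name. The sketch below only decides which source type a path represents; it does not read @SQ or #contig lines. The function name, the returned labels, the .interval_list and .vcf.gz extensions, and the ref.fasta to ref.dict naming convention for the companion dictionary are all assumptions for illustration, not the tool's actual lookup logic.

```python
import os
from typing import Callable, Optional


def dictionary_source(
    path: str, exists: Callable[[str], bool] = os.path.exists
) -> Optional[str]:
    """Guess which kind of contig source a path is; None if unusable (hypothetical)."""
    lower = path.lower()
    if lower.endswith(".dict"):
        return "dict"
    if lower.endswith((".fa", ".fasta")):
        # Assumed convention: ref.fasta needs ref.dict in the same directory.
        dict_path = os.path.splitext(path)[0] + ".dict"
        return "fasta" if exists(dict_path) else None
    if lower.endswith(".interval_list"):
        return "interval_list"  # contigs come from @SQ header lines
    if lower.endswith((".sam", ".bam")):
        return "sam_bam"  # contigs come from @SQ header lines
    if lower.endswith((".vcf", ".vcf.gz")):
        return "vcf"  # contigs come from ##contig header lines
    return None
```

Passing `exists` as a parameter keeps the sketch testable without touching the file system.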
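Finding "clusters of groups that are connected by high LOD scores", as ClusterCrosscheckMetrics does, amounts to computing connected components over the pairs whose LOD passes a threshold. The union-find sketch below shows that idea on (left group, right group, LOD) triples. The input shape, the `min_lod` threshold name, and the choice to report unconnected groups as singleton clusters are assumptions, not the tool's documented behavior.

```python
from typing import Dict, Iterable, List, Set, Tuple


def cluster_by_lod(
    pairs: Iterable[Tuple[str, str, float]], min_lod: float
) -> List[Set[str]]:
    """Group names into connected components of the 'LOD >= min_lod' graph."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for left, right, lod in pairs:
        find(left)  # register every group, even ones with no strong links
        find(right)
        if lod >= min_lod:
            union(left, right)

    clusters: Dict[str, Set[str]] = {}
    for name in parent:
        clusters.setdefault(find(name), set()).add(name)
    # Deterministic order: sort clusters by their sorted member lists.
    return sorted(clusters.values(), key=lambda c: sorted(c))
```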
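For the LiftOverVcf change, the cited paper (doi: 10.1093/bioinformatics/btv112) describes a left-align-and-trim normalization: repeatedly drop a trailing base shared by all alleles, prepend the preceding reference base whenever an allele becomes empty, then drop shared leading bases while every allele stays at least two bases long. The sketch below implements that algorithm on plain strings. It is not LiftOverVcf's code and ignores VCF parsing and reverse-complementing; the function name and 1-based `pos` convention are assumptions.

```python
from typing import List, Tuple


def left_align(ref_seq: str, pos: int, alleles: List[str]) -> Tuple[int, List[str]]:
    """Left-align and trim a variant; REF first in alleles, pos is 1-based."""
    if len(set(alleles)) < 2:
        raise ValueError("need at least two distinct alleles")
    alleles = list(alleles)
    while True:
        # Drop a trailing base shared by every allele.
        if all(alleles) and len({a[-1] for a in alleles}) == 1:
            alleles = [a[:-1] for a in alleles]
        # If an allele became empty, prepend the preceding reference base.
        if not all(alleles):
            if pos <= 1:
                raise ValueError("cannot extend past the start of the reference")
            pos -= 1
            base = ref_seq[pos - 1]
            alleles = [base + a for a in alleles]
        # Stop once the alleles end with different bases.
        if len({a[-1] for a in alleles}) > 1:
            break
    # Drop leading bases shared by every allele while all stay >= 2 bases long.
    while all(len(a) >= 2 for a in alleles) and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
        pos += 1
    return pos, alleles
```

For example, deleting one "CA" repeat unit from GCACACAT written at position 5 as ACA>A normalizes to position 1 as GCA>G, the leftmost equivalent representation.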
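The automatic CBCL/BCL detection that replaces USE_NEW_CONVERTER could, in its simplest form, look at the base call file names. The heuristic below is a hypothetical sketch, not how IlluminaBasecallsToSam actually decides; the function name and the .bcl.gz and .bcl.bgzf extensions are assumptions.

```python
from typing import Iterable


def detect_basecall_format(file_names: Iterable[str]) -> str:
    """Return "CBCL" or "BCL" from file extensions (hypothetical heuristic)."""
    names = [name.lower() for name in file_names]
    # Check CBCL first; ".cbcl" names never end with ".bcl".
    if any(name.endswith(".cbcl") for name in names):
        return "CBCL"
    if any(name.endswith((".bcl", ".bcl.gz", ".bcl.bgzf")) for name in names):
        return "BCL"
    raise ValueError("no BCL or CBCL files found")
```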
For developers
- Clean up fingerprinting code (#850). Remove the unused merge by sample function and keep the mergeBy function, which enables rolling up by sample.
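Rolling up with a single mergeBy function, rather than a dedicated merge-by-sample function, means the grouping level is just a parameter. The sketch below merges per-read-group records by an arbitrary key function and sums their per-site counts. The record layout, the summing rule, and the "UNKNOWN" placeholder used for missing read group information under LENIENT or SILENT stringency are all hypothetical; the real fingerprint merge may combine data differently.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, Optional


def merge_by(
    fingerprints: Iterable[dict],
    key: Callable[[dict], Optional[str]],
    stringency: str = "STRICT",
) -> Dict[str, Counter]:
    """Merge per-read-group records into groups chosen by key (hypothetical)."""
    merged: Dict[str, Counter] = defaultdict(Counter)
    for fp in fingerprints:
        group = key(fp)
        if group is None:
            # Missing read group info is an error only under STRICT.
            if stringency == "STRICT":
                raise ValueError("record has no read group information")
            group = "UNKNOWN"  # hypothetical placeholder for LENIENT/SILENT
        merged[group].update(fp["counts"])
    return dict(merged)
```

Rolling up by library instead of sample only changes the key, e.g. `key=lambda f: f.get("library")`.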