- Fix MergeBamAlignment’s AbstractAlignmentMerger to use the actual maximum number of records in RAM instead of a hard-coded constant (#834). The hard-coded constant, defined by MAX_RECORDS_INRAM, was set to 500K. Now the tool instead reads this.maxRecordsInRam to get the value of the tool argument MAX_RECORDS_IN_RAM. Lower this parameter to improve performance on low-memory machines.
- Fix absence of samples in CollectVariantCallingMetrics when a sample’s calls are all homozygous-reference (hom-ref; #846; addresses bug #797). If a sample’s calls were all hom-ref, CollectVariantCallingMetrics would omit the sample from the metrics collection. With the fix, the tool now lists the sample in the metrics.
- DownsampleSam supports piping from standard input (stdin; #839). For example, the tool can now handle
samtools view -h xyz.bam | java -jar $PICARD DownsampleSam I=/dev/stdin O=downsample.bam METRICS_FILE=downsample_metrics.txt.
- DownsampleSam now produces QualityYieldMetrics for the reads that remain after downsampling (#840). The tool requires the name of the metrics file and does not add a file extension. For example, to write a text file, set METRICS_FILE=xyz_metrics.txt. The metrics are the same as those produced by the standalone tool CollectQualityYieldMetrics and describe the general quality of reads. Specifically, the metrics bin read bases by quality and stratify by reads passing filters (PF), as defined by the 0x200 SAM flag.
- CreateSequenceDictionary now allows short name arguments (#844).
SPECIES are each also callable with
- BedToIntervalList documentation clarifies tool requirements (#842). Namely, the SD argument accepts five different types of files from which the tool can find and derive reference contig information. These are an actual .dict dictionary file, a .fasta reference with a dictionary in the same directory, an interval list with @SQ lines in the header, a SAM/BAM file with @SQ lines in the header, and a VCF with ##contig lines in the header. The code is unchanged.
- CrosscheckFingerprints now includes tumor-aware results when results have been rolled up to the sample or library level (#642). The tool crosschecks sample matches using BAMs and VCFs at various levels, e.g. BAM file, sample, or library, and writes results to either a matrix (MATRIX_OUTPUT) or a metrics file (OUTPUT). The tool allows for missing read group RG tags in a BAM when VALIDATION_STRINGENCY is not STRICT, i.e. when it is LENIENT or SILENT.
- New tool ClusterCrosscheckMetrics takes metric output of CrosscheckFingerprints and finds clusters of groups that connect via high LOD (#642).
- Enable LiftOverVcf to lift over negative-strand bi-allelic indels that it previously could not lift over (#836). Here, bi-allelic means the variant is neither multiallelic nor mixed-type. That is, the tool now lifts over indel variants that land on the reverse strand in the lifted-to reference. The tool then left-aligns the indels as per doi: 10.1093/bioinformatics/btv112.
- Add CBCL support to ExtractIlluminaBarcodes and CollectBasecallingMetrics (#848). The short-lived USE_NEW_CONVERTER argument in IlluminaBasecallsToSam is now defunct; the tool instead detects CBCL or BCL data types automatically, without user specification. That is, the tool now supports CBCL (NovaSeq) sequencer data seamlessly.
- Clean up Fingerprinting code (#850). Remove the unused merge by sample function and leave the mergeBy function to enable rolling up by sample.
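
For the MergeBamAlignment item above, the effect of a records-in-RAM limit can be illustrated with a minimal sketch. This is not Picard's implementation: the class `SpillingSorter`, its methods, and the use of integer records are all assumptions made for illustration. It only shows the general technique of sorting in bounded memory by spilling sorted chunks to temporary files and merging them at the end, which is why a smaller limit trades memory for more disk activity.

```python
import heapq
import os
import tempfile


class SpillingSorter:
    """Hypothetical one-shot sorter holding at most max_records_in_ram records in memory."""

    def __init__(self, max_records_in_ram):
        self.max_records_in_ram = max_records_in_ram
        self.buffer = []
        self.spill_paths = []

    def add(self, record):
        self.buffer.append(record)
        # Once the in-memory buffer reaches the limit, move it to disk.
        if len(self.buffer) >= self.max_records_in_ram:
            self._spill()

    def _spill(self):
        # Sort the in-memory chunk and write it to a temporary file.
        self.buffer.sort()
        fd, path = tempfile.mkstemp(suffix=".chunk")
        with os.fdopen(fd, "w") as handle:
            handle.writelines(f"{value}\n" for value in self.buffer)
        self.spill_paths.append(path)
        self.buffer = []

    def sorted_records(self):
        # Merge the sorted on-disk chunks with whatever is still in memory,
        # then delete the temporary files.
        handles = [open(path) for path in self.spill_paths]
        try:
            streams = [(int(line) for line in handle) for handle in handles]
            streams.append(iter(sorted(self.buffer)))
            return list(heapq.merge(*streams))
        finally:
            for handle in handles:
                handle.close()
            for path in self.spill_paths:
                os.remove(path)
            self.spill_paths = []
```

With a limit of 3 and seven records added, the sketch writes two chunks to disk and keeps one record in memory until the final merge.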
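
For the DownsampleSam metrics item above, the idea of binning bases by quality and stratifying by PF status can be sketched as follows. This is an illustration only, not the tool's code: the function name `quality_yield`, the input shape (a flag plus a list of base qualities per read), and the dictionary keys are assumptions. The one fact taken from the SAM format is that bit 0x200 marks a read as failing filters, so a read is PF when that bit is unset.

```python
PF_FAIL_FLAG = 0x200  # SAM flag bit: read fails platform/vendor quality checks


def quality_yield(reads):
    """Tally bases by quality threshold, overall and for PF reads only.

    reads: iterable of (flag, base_qualities) tuples (an assumed input shape).
    """
    metrics = {
        "TOTAL_READS": 0, "PF_READS": 0,
        "TOTAL_BASES": 0, "PF_BASES": 0,
        "Q20_BASES": 0, "PF_Q20_BASES": 0,
        "Q30_BASES": 0, "PF_Q30_BASES": 0,
    }
    for flag, quals in reads:
        is_pf = not (flag & PF_FAIL_FLAG)
        q20 = sum(1 for q in quals if q >= 20)
        q30 = sum(1 for q in quals if q >= 30)

        metrics["TOTAL_READS"] += 1
        metrics["TOTAL_BASES"] += len(quals)
        metrics["Q20_BASES"] += q20
        metrics["Q30_BASES"] += q30
        if is_pf:
            # Only reads without the 0x200 bit contribute to the PF tallies.
            metrics["PF_READS"] += 1
            metrics["PF_BASES"] += len(quals)
            metrics["PF_Q20_BASES"] += q20
            metrics["PF_Q30_BASES"] += q30
    return metrics
```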
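
For the BedToIntervalList item above, the following sketch shows how contig names and lengths might be read from two of the listed header types: `@SQ` lines in a SAM-style header and `##contig` lines in a VCF header. The function names are hypothetical and this is not how the tool itself is written. It also assumes simple `##contig` lines without quoted commas, and it treats a missing `length` as unknown.

```python
import re

CONTIG_RE = re.compile(r"^##contig=<(.*)>$")


def contigs_from_sam_header(lines):
    """Return (name, length) pairs from @SQ header lines (SN and LN tags)."""
    contigs = []
    for line in lines:
        if not line.startswith("@SQ"):
            continue
        # Fields after the record type are TAG:VALUE pairs separated by tabs.
        fields = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
        contigs.append((fields["SN"], int(fields["LN"])))
    return contigs


def contigs_from_vcf_header(lines):
    """Return (name, length) pairs from ##contig header lines."""
    contigs = []
    for line in lines:
        match = CONTIG_RE.match(line.strip())
        if not match:
            continue
        # Assumes key=value items with no commas inside quoted values.
        fields = dict(item.split("=", 1) for item in match.group(1).split(","))
        length = fields.get("length")
        contigs.append((fields["ID"], int(length) if length else None))
    return contigs
```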
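
For the ClusterCrosscheckMetrics item above, one way to picture "clusters of groups that connect via high LOD" is as connected components of a graph whose edges are group pairs with a LOD at or above some threshold. The sketch below uses union-find to compute those components. It is an assumption about the general idea, not the tool's actual algorithm; the function name `cluster_by_lod`, the input shape, and the threshold parameter are hypothetical.

```python
def cluster_by_lod(groups, pairwise_lod, lod_threshold):
    """Group items that are linked, directly or transitively, by a high LOD.

    groups: list of group names.
    pairwise_lod: dict mapping (group_a, group_b) to a LOD score.
    """
    parent = {g: g for g in groups}

    def find(g):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), lod in pairwise_lod.items():
        if lod >= lod_threshold:
            # A high LOD joins the two clusters.
            parent[find(a)] = find(b)

    clusters = {}
    for g in groups:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())
```

In this sketch, A and C end up in the same cluster if A matches B and B matches C, even when the A-to-C pair was never scored.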
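
For the LiftOverVcf item above, the sketch below illustrates why a reverse-strand indel needs re-anchoring. After reverse-complementing the alleles, the shared anchor base sits at the end of each allele rather than the start, so the variant must be trimmed and left-aligned against the target reference. The left-alignment loop follows the general normalization procedure described in the cited paper. The function names, the 0-based coordinates, and the in-memory reference string are assumptions, and this is not LiftOverVcf's code.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse-complement an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def left_align(ref_seq, pos, ref, alt):
    """Left-align a bi-allelic indel.

    ref_seq: reference sequence as a string.
    pos: 0-based index in ref_seq of the first base of the ref allele.
    Returns the normalized (pos, ref, alt).
    """
    changed = True
    while changed:
        changed = False
        # Trim a shared trailing base.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both alleles one base to the left.
        if (not ref or not alt) and pos > 0:
            pos -= 1
            base = ref_seq[pos]
            ref, alt = base + ref, base + alt
            changed = True
    # Trim shared leading bases while both alleles keep at least one base.
    while len(ref) >= 2 and len(alt) >= 2 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt
```

As a usage example, take the source sequence `GGGCACACACAGGG` with a deletion written as `ACA` to `A` at index 8. On the reverse-complemented target `CCCTGTGTGTGCCC` the alleles become `TGT` and `T` at index 3, and `left_align(target, 3, "TGT", "T")` returns `(2, "CTG", "C")`, which restores a leading anchor base.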