MultiQC/MultiQC v1.22 on GitHub

Highlights - notebooks and performance

Version 1.22 brings some major behind-the-scenes refactoring to MultiQC. This unlocks a number of new features, such as the ability to use MultiQC as a Python library in scripts / notebooks, and run-time validation of plot config attributes.

This release also introduces some huge performance improvements thanks to @rhpvorderman.
Compared to v1.21, a typical v1.22 run is 53% faster and has a 6x smaller peak-memory footprint - well worth updating! 🏃🏻‍♂️ 💨

Finally, support for the deprecated HighCharts plotting library is fully removed in v1.22, bringing to a close a long-standing project to migrate to Plotly.

For more information, please see the upcoming MultiQC release blog article on the Seqera website: https://seqera.io/blog/

MultiQC updates

Remove the highcharts template and Highcharts and Matplotlib dependencies (#2409)
Remove CSP.txt and the linting check, and move the script that prints missing hashes under scripts. Admins of servers with a Content Security Policy can use it to print missing hashes after installing a new MultiQC version: python scripts/print_missing_csp.py --report full_report.html (#2421)
Do not maintain change log between releases (#2427)
Use native clipboard API (#2419)
Profile runtime: visualize per-module memory and run time (#2548, #2547)
Refactoring for performance:
- Search file blocks rather than individual lines for faster results (#2513)
- Refactor file content search for a 40% speed increase (#2505)
- Sort filepatterns for faster searching (#2506)
- Use array.array for in-memory plot data, stream to render Jinja and dump JSON to reduce memory requirement (#2515)
- Speed up all modules by caching spectra.scale and using sets instead of lists (#2509)
- Stream json data to a file to save 30% of the memory (#2510)
- Do replace_nan in place rather than creating a new object (#2529)
- Use gzip rather than lzstring for compression and decompression of the plot data (#2504)
- Use gzip level 6 for faster json compression (#2553)
- Clean up module raw data after running each module, significantly reduces the memory footprint (#2551)
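
Two of the ideas above, typed arrays for in-memory plot data and gzip level 6 for the JSON payload, can be illustrated with the standard library alone. This is a minimal sketch, not MultiQC's actual code: the sample data, the helper names compress_plot_data and decompress_plot_data, and the base64 wrapping are assumptions made for the example.

```python
import base64
import gzip
import json
import sys
from array import array

# Hypothetical plot data: 100,000 y-values for a single line series.
values = [i * 0.5 for i in range(100_000)]

# A typed array stores raw 8-byte doubles rather than pointers to float objects.
compact = array("d", values)

# Rough footprint: the list itself plus every float object it points to.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
array_bytes = sys.getsizeof(compact)


def compress_plot_data(data, level=6):
    """Serialize to compact JSON, gzip it, and base64-encode the result as text."""
    raw = json.dumps(data, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(gzip.compress(raw, compresslevel=level)).decode("ascii")


def decompress_plot_data(payload):
    """Reverse compress_plot_data: base64-decode, gunzip, parse JSON."""
    return json.loads(gzip.decompress(base64.b64decode(payload)))


payload = compress_plot_data(compact.tolist())
restored = decompress_plot_data(payload)

print(f"list: {list_bytes:,} bytes, array: {array_bytes:,} bytes")
print(f"compressed payload: {len(payload):,} characters")
```

Level 6 is zlib's usual speed/size compromise; passing level=9 to the same helper shows how little extra compression the slower setting buys on data like this.
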
Refactoring for interactivity and validation:
- Top-level functions for MultiQC use as a library (#2442)
- Pydantic models for plots and datasets (#2442)
- Validating plot configs with Pydantic (#2534)
- Use dataclasses for table and violin columns (#2546)
- Break up the main run function into submodules (#2446)
- Deprecate multiqc.utils.config and multiqc.utils.report in favour of multiqc.config and multiqc.report (#2542)
- Static typing of the report and config modules (#2445)
- Add type hints into core codebase (#2434)
- Consistent config options: rename decimalPlaces to tt_decimals (#2451)
- Remove encoding and shebang headers from module files (#2425)
- Refactor line plot categories: keep boolean throughout the code, and data points as pairs for simplicity (#2418)
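
To give a feel for what run-time validation of plot config attributes looks like, here is a minimal sketch using only standard-library dataclasses. MultiQC itself uses Pydantic models for this; the LinePlotConfig class, the parse_config helper, the DEPRECATED_KEYS mapping and the field set shown here are hypothetical and chosen for illustration only.

```python
import warnings
from dataclasses import dataclass, fields

# Hypothetical mapping from old option names to their replacements.
DEPRECATED_KEYS = {"decimalPlaces": "tt_decimals"}


@dataclass
class LinePlotConfig:
    """Illustrative config model; the field set is an assumption, not MultiQC's."""

    id: str
    title: str
    tt_decimals: int = 2
    style: str = "lines"

    def __post_init__(self):
        # Validate values as soon as the object is constructed.
        if self.style not in ("lines", "lines+markers"):
            raise ValueError(f"style must be 'lines' or 'lines+markers', got {self.style!r}")
        if isinstance(self.tt_decimals, bool) or not isinstance(self.tt_decimals, int):
            raise ValueError("tt_decimals must be an integer")
        if self.tt_decimals < 0:
            raise ValueError("tt_decimals must not be negative")


def parse_config(raw):
    """Rename deprecated keys, reject unknown ones, then build a validated config."""
    known = {f.name for f in fields(LinePlotConfig)}
    cleaned = {}
    for key, value in raw.items():
        if key in DEPRECATED_KEYS:
            new_key = DEPRECATED_KEYS[key]
            warnings.warn(f"'{key}' is deprecated, use '{new_key}'", DeprecationWarning)
            key = new_key
        if key not in known:
            raise ValueError(f"unknown plot config option: {key!r}")
        cleaned[key] = value
    return LinePlotConfig(**cleaned)


config = parse_config(
    {"id": "coverage_plot", "title": "Coverage", "tt_decimals": 1, "style": "lines+markers"}
)
print(config)
```

The point of validating at construction time is that a misspelled or renamed option fails loudly when the plot is created, rather than being silently ignored in the rendered report.
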
Fixes:
- Fix error when using default sort (#2544)
- Do not attempt to render flat plot when no data (#2490)
- Fix export plots with --export and always export data (#2489)
- Make sure the modify lambda is not present in the JSON dump (#2455)
- Enable --export even when writing interactive plots (#2444)
- Replace NaN with null in exported JSON (#2432)
- Fix y_minrange option (#2415)
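
The NaN fix matters because Python's json module writes a bare NaN token by default, which is not valid JSON and breaks strict parsers. The sketch below shows one way to replace NaN values in place before dumping; the function name replace_nan and the sample data are assumptions for this example and not MultiQC's implementation.

```python
import json
import math


def replace_nan(obj):
    """Recursively swap float NaN for None, mutating dicts and lists in place."""
    if isinstance(obj, dict):
        for key in list(obj):
            obj[key] = replace_nan(obj[key])
        return obj
    if isinstance(obj, list):
        for index, item in enumerate(obj):
            obj[index] = replace_nan(item)
        return obj
    if isinstance(obj, float) and math.isnan(obj):
        return None
    return obj


# Hypothetical exported plot data containing missing values.
data = {"sample_1": [1.0, float("nan")], "sample_2": {"mean": float("nan")}}
cleaned = replace_nan(data)

# allow_nan=False makes json.dumps raise if any NaN slipped through.
print(json.dumps(cleaned, allow_nan=False))
```

Because containers are updated in place, no second copy of the data is built, which is the same motivation as the replace_nan change listed under the performance refactoring.
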
Reduce report size: exclude plot data for sections in remove_sections (#2460)
Add ge and le to cond_formatting_rules (#2494)
CI: use uv pip (#2352)
Lint check for use of f["content_lines"] (#2485)
Allow setting the line graph style (lines or lines+markers) per plot (#2413)
Add CMD to Dockerfile so a default run without any parameters displays the --help (#2279)
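
As an illustration of what the new ge and le comparisons in cond_formatting_rules mean, the following sketch evaluates a value against rule groups using Python's operator module. It is an assumption-laden toy: the matching_rules function, the rule names and the thresholds are hypothetical, and the real evaluation happens inside the MultiQC report rather than in code like this.

```python
import operator

# Numeric comparisons keyed by rule name, including the new ge and le.
COMPARATORS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "gt": operator.gt,
    "lt": operator.lt,
    "ge": operator.ge,
    "le": operator.le,
}


def matching_rules(value, rules):
    """Return the names of rule groups with at least one condition matching value."""
    matched = []
    for name, conditions in rules.items():
        for condition in conditions:
            if any(COMPARATORS[op](value, threshold) for op, threshold in condition.items()):
                matched.append(name)
                break
    return matched


# Hypothetical thresholds for a percentage column.
rules = {"pass": [{"ge": 80}], "warn": [{"lt": 80}], "fail": [{"le": 50}]}

for value in (95, 80, 65, 50):
    print(value, matching_rules(value, rules))
```

With only gt and lt, a boundary value such as exactly 80 matches neither side; ge and le make it possible to include the threshold itself.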

New modules

Hostile (#2501)
- New module: Hostile is a tool for removing host reads from short and long reads
Sequali (#2441)
- New module: Sequali, universal sequencing QC

Module updates

Adapter Removal
- Standardize module names: use camel case (#2433)
Bamdst
- Fix chromosome reports when contig data labels are missing (#2479)
- Fix for the case when chromosomes.report is not provided (#2477)
- Stress file name requirements for chromosomes report (#2478)
BBTools
- Set missing values to None for bbmap qahist (#2411)
Bcftools
- Stats: add multiallelic sites column (#2414)
BCL Convert
- Show message when no undetermined reads instead of error (#2526)
- Fix for absent index reads (#2511)
- Add all file types to sources (#2456)
Busco
- Fix barplot colors (#2453)
Cell Ranger
- Fix parsing antibody tab without antibody_treemap_plot (#2525)
Cutadapt
- Speed up module by caching parsing versions (#2528)
DRAGEN
- Add ploidy estimation table (#2496)
fastp
- When the sample name cannot be parsed from the command (i.e. stdin), use the filename and proceed (#2536)
FastQC
- Skip per tile sequence quality section in FastQC reports for better performance (#2552)
- Fix a ZeroDivisionError (#2462)
- Fix a memory leak, making the module 7 times faster and use 10 times less memory (#2552)
- Do not keep intermediate data in memory to reduce memory footprint further (#2516)
- Add option to ignore FastQC quality thresholds (#2486)
goleft indexcov
- Work correctly even if no valid contigs in input (#2540)
mosdepth
- Fix absolute coverage plot (#2488)
nonpareil
- Change write_data_file label to be consistent with other modules (#2472)
Picard
- WgsMetrics: coverage plot: show % of bases ≥x, not >x (#2473)
- CrosscheckFingerprints: support multiple files, preserve sample order in heatmap (#2454)
qc3C
- Fix detecting sample name for relative path (#2502)
QualiMap
- BamQC: when trimming long tails, keep at least 20x (#2431)
Samtools
- Add support for markdup (#2254)
- Add violin multiple datasets & samtools flagstat percentage switch (#2430)
Space Ranger
- Fix for missing genomic_dna section (#2429)
xengsort
- Fix parsing long files (do not use content_lines) (#2484)

New Contributors

@clintval made their first contribution in #2254
@alanhoyle made their first contribution in #2279
@rhpvorderman made their first contribution in #2441
@TBradley27 made their first contribution in #2473
@SumeetTiwari07 made their first contribution in #2501

Full Changelog: v1.21...v1.22