Highlights - notebooks and performance
Version 1.22 brings some major behind-the-scenes refactoring to MultiQC. This unlocks a number of new features, such as the ability to use MultiQC as a Python library in scripts / notebooks, and run-time validation of plot config attributes.
This release also introduces some huge performance improvements thanks to @rhpvorderman.
Compared to v1.21, a typical v1.22 run is 53% faster and has a 6x smaller peak-memory footprint - well worth updating! 🏃🏻♂️ 💨
Finally, support for the deprecated HighCharts plotting library is fully removed in v1.22, bringing to a close a long-standing project to migrate to Plotly.
For more information, please see the upcoming MultiQC release blog article on the Seqera website: https://seqera.io/blog/
MultiQC updates
- Remove the `highcharts` template and Highcharts and Matplotlib dependencies (#2409)
- Remove CSP.txt and the linting check, move the script that prints missing hashes under `scripts`. Admins of servers with Content Security Policy can use it to print missing hashes when they install a new MultiQC version with: `python scripts/print_missing_csp.py --report full_report.html` (#2421)
- Do not maintain change log between releases (#2427)
- Use native clipboard API (#2419)
- Profile runtime: visualize per-module memory and run time (#2548, #2547)
- Refactoring for performance:
- Search file blocks rather than individual lines for faster results (#2513)
- Refactor file content search for a 40% speed increase (#2505)
- Sort `filepatterns` for faster searching (#2506)
- Use `array.array` for in-memory plot data, stream to render Jinja and dump JSON to reduce memory requirement (#2515)
- Speed up all modules by caching `spectra.scale` and using sets instead of lists (#2509)
- Stream json data to a file to save 30% of the memory (#2510)
- Do `replace_nan` in place rather than creating a new object (#2529)
- Use gzip rather than lzstring for compression and decompression of the plot data (#2504)
- Use gzip level 6 for faster json compression (#2553)
- Clean up module raw data after running each module, significantly reduces the memory footprint (#2551)
- Refactoring for interactivity and validation:
- Top-level functions for MultiQC use as a library (#2442)
- Pydantic models for plots and datasets (#2442)
- Validating plot configs with Pydantic (#2534)
- Use dataclasses for table and violin columns (#2546)
- Break up the main run function into submodules (#2446)
- Deprecate `multiqc.utils.config` and `multiqc.utils.report` in favour of `multiqc.config` and `multiqc.report` (#2542)
- Static typing of the report and config modules (#2445)
- Add type hints into core codebase (#2434)
- Consistent config options: rename `decimalPlaces` to `tt_decimals` (#2451)
- Remove encoding and shebang headers from module files (#2425)
- Refactor line plot categories: keep boolean throughout the code, and data points as pairs for simplicity (#2418)
- Fixes:
- Fix error when using default sort (#2544)
- Do not attempt to render flat plot when no data (#2490)
- Fix export plots with `--export` and always export data (#2489)
- Fix: make sure `modify` lambda not present in JSON dump (#2455)
- Enable `--export` even when writing interactive plots (#2444)
- Replace `NaN` with `null` in exported JSON (#2432)
- Fix `y_minrange` option (#2415)
- Reduce report size: exclude plot data for sections in `remove_sections` (#2460)
- Add `ge` and `le` to `cond_formatting_rules` (#2494)
- CI: use `uv pip` (#2352)
- Lint check for use of `f["content_lines"]` (#2485)
- Allow to set style of line graph (`lines` or `lines+markers`) per plot (#2413)
- Add `CMD` to `Dockerfile` so a default run without any parameters displays the `--help` (#2279)
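Several of the items above (compact `array.array` storage, in-place NaN replacement, `null` in exported JSON, gzip at level 6) can be illustrated with a small standard-library sketch. This is not MultiQC's actual code: the function names `replace_nan_in_place`, `pack_plot_data` and `unpack_plot_data` are hypothetical, and the base64 wrapping is an assumption about how a compressed payload might be embedded in an HTML report.

```python
import array
import base64
import gzip
import json
import math


def replace_nan_in_place(values):
    """Replace NaN entries with None in the given list, without copying it."""
    for i, v in enumerate(values):
        if isinstance(v, float) and math.isnan(v):
            values[i] = None
    return values


def pack_plot_data(data, level=6):
    """Serialise to JSON, gzip at the given level, base64-encode for embedding."""
    # allow_nan=False makes json.dumps raise if a NaN slipped through.
    raw = json.dumps(data, allow_nan=False).encode("utf-8")
    return base64.b64encode(gzip.compress(raw, compresslevel=level)).decode("ascii")


def unpack_plot_data(payload):
    """Reverse pack_plot_data: base64-decode, gunzip, parse JSON."""
    return json.loads(gzip.decompress(base64.b64decode(payload)))


# array.array stores raw C doubles rather than one Python object per value.
ys = array.array("d", [1.5, float("nan"), 3.0])

# Convert to a list only at dump time, then null out NaN values.
points = replace_nan_in_place(ys.tolist())
payload = pack_plot_data({"sample_1": points})
restored = unpack_plot_data(payload)
```

Level 6 is also the default of Python's `zlib` library and is a common compromise between compression speed and output size.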
New modules
- Hostile (#2501)
- New module: Hostile is a tool for removing host reads from short- and long-read data
- Sequali (#2441)
- New module: Sequali, a universal sequencing QC tool
Module updates
- Adapter Removal
- Standardize module names: use camel case (#2433)
- Bamdst
- BBTools
- Set missing values to `None` for `bbmap qahist` (#2411)
- Bcftools
- Stats: add multiallelic sites column (#2414)
- BCL Convert
- Busco
- Fix barplot colors (#2453)
- Cell Ranger
- Fix parsing antibody tab without `antibody_treemap_plot` (#2525)
- Cutadapt
- Speed up module by caching parsing versions (#2528)
- DRAGEN
- Add ploidy estimation table (#2496)
- fastp
- When the sample name cannot be parsed from the command (i.e. `stdin`), use the filename and proceed (#2536)
- FastQC
- Skip per tile sequence quality section in FastQC reports for better performance (#2552)
- Fix a `ZeroDivisionError` (#2462)
- Fix memory leak to make the module 7 times faster and use 10 times less memory (#2552)
- Do not keep intermediate data in memory to reduce memory footprint further (#2516)
- Add option to ignore FastQC quality thresholds (#2486)
- goleft indexcov
- Work correctly even if no valid contigs in input (#2540)
- mosdepth
- Fix absolute coverage plot (#2488)
- nonpareil
- Change write_data_file label to be consistent with other modules (#2472)
- Picard
- qc3C
- Fix detecting sample name for relative path (#2502)
- QualiMap
- BamQC: when trimming long tails, keep at least 20x (#2431)
- Samtools
- Space Ranger
- Fix for missing `genomic_dna` section (#2429)
- xengsort
- Fix parsing long files (do not use `content_lines`) (#2484)
New Contributors
- @clintval made their first contribution in #2254
- @alanhoyle made their first contribution in #2279
- @rhpvorderman made their first contribution in #2441
- @TBradley27 made their first contribution in #2473
- @SumeetTiwari07 made their first contribution in #2501
Full Changelog: v1.21...v1.22