Type: Feature + bugfix release — matcher robustness (stylized-Unicode / emoji / resolution-marker normalization), Phase-1 channel-name alias matching, Dispatcharr-sourced timezone, multi-source stream dedup, CSV source labeling, plus the project's first automated test suite & CI. Consolidates the work shipped across 1.26.1641824–1.26.1650009.
Matching & normalization
Three normalization passes were added to the top of FuzzyMatcher.normalize_name so stylized stream names match their plain channel names. Each was validated by an old-vs-new corpus diff over the real ~54k-name stream pool and all channel databases, asserting 0 harmful changes (no real ASCII or non-Latin name altered).
Stylized-Unicode decoration stripping (bug-048):
- Streams decorate names with stylized-Unicode tier/format markers: superscripts (RK: WEATHERNATION ᴿᴬᵂ, … ⁶⁰ᶠᵖˢ, … ⱽᴵᴾ, … ³⁸⁴⁰ᴾ), Latin small-caps (… ꜰʜᴅ), and bullets (◉). The ASCII tag regexes couldn't see them, so RK: WEATHERNATION ᴿᴬᵂ never matched channel WeatherNation (0 matches).
- Fix: normalize_name now drops whole tokens that are pure stylized decoration, then NFKD-canonicalizes the rest. Decoration is detected by Unicode character name (SUPERSCRIPT / SUBSCRIPT / SMALL CAPITAL / MODIFIER LETTER), not by hard-coded code-point ranges, because real markers fall outside the obvious blocks (small-cap H = U+029C, modifier V = U+2C7D). Collision-safe (ASCII Gold / VIP untouched) and non-Latin-safe (Arabic/Cyrillic/CJK preserved). Runs unconditionally; punctuation-glued ornaments (◉:, superscript HD/RAW) are handled too.
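The name-based detection described above can be sketched roughly as follows. This is a minimal illustration and not the plugin's actual code: the helper names is_decoration_char and strip_stylized_tokens are hypothetical, and the sketch only drops whitespace-separated tokens made entirely of decoration characters, so bullets (◉) and punctuation-glued ornaments are not covered here.

```python
import unicodedata

# Character-name fragments listed in the notes above.
DECORATION_NAME_PARTS = ("SUPERSCRIPT", "SUBSCRIPT", "SMALL CAPITAL", "MODIFIER LETTER")


def is_decoration_char(ch: str) -> bool:
    """True when the character's Unicode name marks it as stylized decoration."""
    name = unicodedata.name(ch, "")  # "" for characters without a name
    return any(part in name for part in DECORATION_NAME_PARTS)


def strip_stylized_tokens(text: str) -> str:
    """Drop tokens made only of decoration characters, then NFKD-normalize the rest.

    Hypothetical helper: the real normalize_name may tokenize differently.
    """
    kept = [tok for tok in text.split()
            if not all(is_decoration_char(c) for c in tok)]
    return unicodedata.normalize("NFKD", " ".join(kept))


print(strip_stylized_tokens("RK: WEATHERNATION ᴿᴬᵂ"))  # RK: WEATHERNATION
print(strip_stylized_tokens("Gold VIP"))               # Gold VIP (ASCII untouched)
```

Looking characters up by name rather than by block is what lets U+029C and U+2C7D be caught without listing their ranges, since both carry SMALL CAPITAL or MODIFIER LETTER in their official names.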
Emoji-as-letter substitution (bug-051):
- Some streams use an emoji as a letter: beIN SP⚽RTS / Sp⚽rts (the soccer ball stands in for o = SPORTS, the beIN family, ~682 names). Previously the ball became a space (sp rts) and never matched sports.
- Fix: normalize_name maps an emoji to the letter it replaces only when flanked by ASCII letters (SP⚽RTS → SPORTS), and strips emoji used purely as decoration (♬, ☾, standalone ⚽, and zero-width U+FE0F / ZWJ). Recovers the base beIN Sports feed when its streams are present in the selected sources.
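A rough sketch of the flanked-by-letters rule, under stated assumptions: the mapping table, the function name normalize_emoji, and the choice to copy the case of the preceding letter are all hypothetical and not taken from the plugin. Only the soccer-ball-for-o case and the listed decoration characters come from the notes above.

```python
import re

# Hypothetical mapping table; the notes only name the soccer ball standing in for "o".
EMOJI_LETTERS = {"\u26bd": "o"}  # U+26BD SOCCER BALL

ZERO_WIDTH = "[\ufe0f\u200d]"           # variation selector-16 and ZWJ
DECORATIVE = "[\u266c\u263e\u26bd]"     # ♬, ☾, and a standalone ⚽


def normalize_emoji(text: str) -> str:
    """Replace letter-like emoji between ASCII letters, then strip decorative ones."""
    for emoji, letter in EMOJI_LETTERS.items():
        # Match only when an ASCII letter sits on both sides of the emoji.
        pattern = re.compile("([A-Za-z])" + re.escape(emoji) + "\ufe0f?(?=[A-Za-z])")
        # Assumption: follow the case of the preceding letter (SP⚽RTS -> SPORTS).
        text = pattern.sub(
            lambda m: m.group(1) + (letter.upper() if m.group(1).isupper() else letter),
            text,
        )
    text = re.sub(ZERO_WIDTH, "", text)   # zero-width characters vanish
    text = re.sub(DECORATIVE, " ", text)  # visible decoration becomes a space
    return " ".join(text.split())         # collapse the leftover whitespace


print(normalize_emoji("beIN SP⚽RTS 1"))   # beIN SPORTS 1
print(normalize_emoji("⚽ Match Day ♬"))   # Match Day
```

Running the substitution before the decoration pass matters: if the ball were stripped first, SP⚽RTS would split into two tokens again.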
Numeric resolution-marker stripping (bug-055):
- Resolution tags the keyword quality patterns missed (3840P, 2160P, 1080P/1080i, 720P) are now stripped via RESOLUTION_PATTERNS (\b\d{3,4}[pi]\b, gated by quality stripping, applied before the keyword patterns to avoid space-gluing). Requires the p/i glued to the digits, so bare numbers and single-digit channel numbers (Channel 4, Studio 1080) are untouched.
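The pattern quoted above can be exercised in isolation like this. The sketch assumes case-insensitive matching (the notes list both 1080P and 1080i, but do not say whether the plugin uses a flag or lowercases first), and strip_resolution is a hypothetical helper name.

```python
import re

# Pattern from the notes; re.IGNORECASE is an assumption made for this sketch.
RESOLUTION_PATTERN = re.compile(r"\b\d{3,4}[pi]\b", re.IGNORECASE)


def strip_resolution(name: str) -> str:
    """Remove tokens such as 1080p / 720P / 1080i and tidy the whitespace."""
    return " ".join(RESOLUTION_PATTERN.sub(" ", name).split())


print(strip_resolution("ESPN 720p HD"))   # ESPN HD
print(strip_resolution("Sky 3840P UHD"))  # Sky UHD
print(strip_resolution("Channel 4"))      # Channel 4 (single digit, no p/i)
print(strip_resolution("Studio 1080"))    # Studio 1080 (bare number, no p/i)
```

Because the suffix letter is mandatory and the match is bounded by \b on both sides, a plain number can never match, which is what keeps channel numbers intact.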
Matcher score depended on whether rapidfuzz was installed (bug-026):
- FuzzyMatcher.calculate_similarity had two implementations: a rapidfuzz fast path returning 1 - distance / max(len), and a pure-Python fallback returning (len1 + len2 - distance) / (len1 + len2). They disagreed: at threshold 95, Fox Sports 1 vs Fox Sports 2 scored 0.917 (rapidfuzz) vs 0.958 (pure-Python), flipping the match decision.
- Fix: the pure-Python branch (and its early-termination bounds) now use 1 - distance / max(len), matching rapidfuzz exactly. Production runs the rapidfuzz path, so live behavior is unchanged; only the no-rapidfuzz fallback was corrected. Enforced by an automated parity test.
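The disagreement can be reproduced with a few lines of pure Python. This is an illustrative sketch, assuming the distance in both formulas is the plain Levenshtein edit distance (consistent with the scores quoted above); the function names are hypothetical and neither function is the plugin's actual implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def similarity_max_len(a: str, b: str) -> float:
    """1 - distance / max(len): the formula both paths use after the fix."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def similarity_sum_len(a: str, b: str) -> float:
    """(len1 + len2 - distance) / (len1 + len2): the old fallback formula."""
    total = len(a) + len(b)
    return 1.0 if total == 0 else (total - levenshtein(a, b)) / total


a, b = "Fox Sports 1", "Fox Sports 2"
print(round(similarity_max_len(a, b), 3))  # 0.917 -> below a 0.95 threshold
print(round(similarity_sum_len(a, b), 3))  # 0.958 -> above a 0.95 threshold
```

Both names are 12 characters long and differ by one substitution, so the scores are 11/12 and 23/24. Dividing by the summed length halves the penalty for each edit, which is why the old fallback was more permissive.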
Features
- Phase-1 channel-name alias matching: an exact-normalized alias layer (FuzzyMatcher.alias_lookup + a built-in US alias table) force-includes known aliases into a channel's matches, independent of the fuzzy threshold. A new Custom Aliases setting accepts a JSON object of additional "channel": ["alias", …] mappings (merged with the built-ins; invalid entries are logged and skipped).
- Timezone now follows Dispatcharr's global setting: the plugin's own Timezone dropdown was removed. Scheduled runs read the timezone from Dispatcharr (CoreSettings.get_system_time_zone()), validated via pytz, with a UTC fallback. One less setting to keep in sync.
- CSV stream-source labeling: every stream name in a CSV export is now tagged with its M3U source (e.g. GO: CNN [streamq.tv-bk15]), so identical names from different providers are distinguishable in reports.
- Multi-source stream dedup (GitHub #28 / #29): deduplication is keyed on (name, m3u_account), so the same channel name from different providers both survive (multi-source failover); only true same-source duplicates collapse. Runs after the quality sort, so the kept copy is the best one.
- Norwegian (NO) channel database (GitHub #30).
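As an illustration of the multi-source dedup item above, the (name, m3u_account) keying can be sketched as follows. The dict-based stream shape, the quality field, and the function name dedup_streams are assumptions made for this example; only the key itself comes from the notes.

```python
from typing import Dict, List, Set, Tuple


def dedup_streams(sorted_streams: List[Dict]) -> List[Dict]:
    """Keep the first stream per (name, m3u_account) pair.

    Assumes the input is already sorted best-first, so the first
    occurrence of each key is the highest-quality copy.
    """
    seen: Set[Tuple[str, str]] = set()
    kept: List[Dict] = []
    for stream in sorted_streams:
        key = (stream["name"], stream["m3u_account"])
        if key in seen:
            continue  # same name from the same source: a true duplicate
        seen.add(key)
        kept.append(stream)
    return kept


streams = [  # hypothetical data, already quality-sorted
    {"name": "CNN", "m3u_account": "provider-a", "quality": 1080},
    {"name": "CNN", "m3u_account": "provider-a", "quality": 720},
    {"name": "CNN", "m3u_account": "provider-b", "quality": 720},
]
print([(s["m3u_account"], s["quality"]) for s in dedup_streams(streams)])
# [('provider-a', 1080), ('provider-b', 720)]
```

Keying on the name alone would have dropped the provider-b copy and removed the failover option.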
Internal & tooling
- First automated test suite (tests/, 176 passing): fuzzy_matcher matching/normalization (incl. the three new normalizers, with collision/non-Latin guards), plugin.py pure helpers (via a Django-stubbing conftest), channel-database schema validation, and version-sync. Cases are regression locks derived from the bug history; every fix above ships with one.
- CI (.github/workflows/ci.yml): py_compile + version-sync + database validation + pytest on every push/PR, with a least-privilege permissions block and pytz installed for the timezone tests.
- Pre-commit gate (.githooks/pre-commit, opt-in) and helper scripts (scripts/check_version_sync.py, scripts/validate_databases.py).
- docs/DEVELOPMENT.md + design specs/plans under docs/.
- Deploy process corrected: plugin.json mtime hot-reload proved unreliable in practice. Always run docker restart dispatcharr after copying, and copy every changed file (incl. fuzzy_matcher.py and new modules like aliases.py).
Notes
- fuzzy_matcher.py bumped to v26.165.0009.
- The three normalization fixes are matcher-shared; port guides for the sibling plugins (Channel-Maparr, EPG-Janitor, Lineuparr, Metadata-Trackarr) are kept as local MATCHER-NORMALIZATION-PORT.md references.