github domainaware/parsedmarc 10.0.0

5 hours ago

Enhancements

Support for RFC 9989 / RFC 9990 / RFC 9991 reports

Adds parsing support for the final DMARC specification (RFC 9989), the new aggregate-report schema (RFC 9990), and the new failure-report format (RFC 9991), while preserving full RFC 7489 / RFC 6591 backward compatibility.

New aggregate-report fields surfaced from the RFC 9990 XSD — added to types, parsing, CSV output, and Elasticsearch/OpenSearch mappings:

  • np — non-existent subdomain policy (none/quarantine/reject)
  • testing — testing mode flag (n/y); reports whether the published DMARC record sets t=y. It is a new field, not a replacement for pct; the pct mechanism was removed entirely by RFC 9989 Appendix A.6 with no per-message replacement.
  • discovery_method — policy discovery method (psl/treewalk)
  • generator — report generator software identifier, in report_metadata
  • human_result — optional descriptive text on DKIM/SPF auth results (langAttrString; a possible lang attribute is automatically unwrapped)
  • xml_namespace — the XML namespace declared on the <feedback> root, if any. RFC 9990 reports declare urn:ietf:params:xml:ns:dmarc-2.0.

pct is no longer present in RFC 9990's PolicyPublishedType and parses as None when absent. fo is still part of RFC 9990 and is preserved when set; it parses as None only when the reporter omits it.

The parser detects an RFC 9990 report from the dmarc-2.0 XML namespace or the presence of any RFC 9990-only field, so namespaceless reports that follow the RFC 9990 shape still receive RFC 9990-aware validation warnings (missing required DKIM selector, removed-in-RFC-9990 policy-override types forwarded / sampled_out). RFC 9990 also added policy_test_mode to the policy-override enumeration; it is parsed and stored unchanged.
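
The two detection signals described above can be sketched as follows. This is a minimal illustration, not parsedmarc's actual code: the function names and the exact set of "RFC 9990-only" element names are assumptions drawn from the field list in these notes.

```python
import xml.etree.ElementTree as ET

DMARC_2_NS = "urn:ietf:params:xml:ns:dmarc-2.0"

# Assumption: element names these notes describe as new in RFC 9990.
# The real parser may check a different or larger set.
RFC9990_ONLY_FIELDS = {"np", "testing", "discovery_method", "generator"}


def local_name(tag: str) -> str:
    """Strip a leading '{namespace}' from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def looks_like_rfc9990(xml_text: str) -> bool:
    """Return True if the report declares dmarc-2.0 or uses an RFC 9990-only field."""
    root = ET.fromstring(xml_text)
    # Signal 1: the <feedback> root is in the dmarc-2.0 namespace.
    if root.tag.startswith("{" + DMARC_2_NS + "}"):
        return True
    # Signal 2: a namespaceless report that still carries an RFC 9990-only element.
    return any(local_name(el.tag) in RFC9990_ONLY_FIELDS for el in root.iter())
```

A namespaceless report containing `<np>reject</np>` would be treated as RFC 9990 here, while a plain RFC 7489 report with only `p` and `pct` would not.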

For failure reports (RFC 9991), Identity-Alignment and Auth-Failure are split on CFWS-aware commas (whitespace is stripped from each token, per the RFC 9991 ABNF) and a warning is logged when either REQUIRED field is missing.
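
A simplified sketch of that splitting and warning behaviour is shown below. The helper names are hypothetical, the empty-list fallback for a missing field is an assumption, and the sketch only handles whitespace (including folded line breaks); it does not strip parenthesised comments, which full CFWS handling would also need to cover.

```python
import logging
from typing import Dict, List

logger = logging.getLogger(__name__)

REQUIRED_FIELDS = ("Identity-Alignment", "Auth-Failure")


def split_header_list(value: str) -> List[str]:
    """Split a comma-separated header value and strip whitespace from each token."""
    tokens = (token.strip() for token in value.split(","))
    # Drop empty tokens produced by stray or trailing commas.
    return [token for token in tokens if token]


def parse_required_lists(fields: Dict[str, str]) -> Dict[str, List[str]]:
    """Parse the two REQUIRED list fields, warning when one is absent."""
    parsed: Dict[str, List[str]] = {}
    for name in REQUIRED_FIELDS:
        if name not in fields:
            logger.warning("Failure report is missing required field %s", name)
            parsed[name] = []  # Assumption: fall back to an empty list.
            continue
        parsed[name] = split_header_list(fields[name])
    return parsed
```

With this sketch, a folded value such as "dkim ,\r\n\tspf" yields the tokens dkim and spf.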

Several elements that became langAttrString in RFC 9990 (extra_contact_info, error, comment, human_result) are now safely unwrapped when the reporter sends them with a lang attribute.
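
The unwrapping step might look like the sketch below. It assumes the XML-to-dict layer represents an element with a lang attribute as {"#text": …, "@lang": …} (the shape mentioned in the OpenSearch bug fix later in these notes); the function name is hypothetical.

```python
from typing import Any, Optional


def unwrap_lang_attr_string(value: Any) -> Optional[str]:
    """Reduce a langAttrString value to plain text.

    Accepts either a plain string (no lang attribute was sent) or a dict of
    the form {"#text": "...", "@lang": "en"} and returns only the text.
    """
    if value is None:
        return None
    if isinstance(value, dict):
        text = value.get("#text")
        return None if text is None else str(text)
    return str(value)
```

Applying the same helper to every langAttrString element keeps downstream mappings simple, because the stored value is always a string or None regardless of what the reporter sent.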

Backward compatibility with RFC 7489 is maintained.

PostgreSQL storage backend

New optional PostgreSQL output backend as a lighter-weight alternative to Elasticsearch/OpenSearch, configured via a [postgresql] section (host/port/user/password/database or a libpq connection_string), or equivalently through PARSEDMARC_POSTGRESQL_* environment variables and their _FILE Docker-secret variants like every other backend. Tables are created automatically on first run, and the schema captures the RFC 9990 aggregate fields (np, testing, discovery_method, generator, xml_namespace, and per-result human_result). A Grafana dashboard (dashboards/grafana/Grafana-DMARC_Reports-PostgreSQL.json) is included. Aggregate and SMTP-TLS reports are de-duplicated via ON CONFLICT; failure reports via an arrival-date / From / To / Subject check mirroring the Elasticsearch backend.

The backend is opt-in: install it with pip install parsedmarc[postgresql] (it pulls in psycopg). It is not a mandatory dependency because the prebuilt psycopg binary wheels are not available for every platform.
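
To illustrate the two configuration styles, here is a rough sketch of how a [postgresql] section could be turned into a libpq connection string. Only the key names come from these notes; the sample values, the helper names, and the rule that connection_string takes precedence over the discrete keys are assumptions, not documented parsedmarc behaviour.

```python
import configparser

SAMPLE_CONFIG = """
[postgresql]
host = db.example.internal
port = 5432
user = parsedmarc
password = s3cret
database = dmarc
"""

# Config key -> libpq keyword (libpq calls the database name "dbname").
LIBPQ_KEYS = {
    "host": "host",
    "port": "port",
    "user": "user",
    "password": "password",
    "database": "dbname",
}


def quote_libpq(value: str) -> str:
    """Single-quote a value, escaping backslashes and quotes as libpq expects."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def conninfo_from_config(text: str) -> str:
    """Build a libpq keyword/value string from a [postgresql] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser["postgresql"]
    # Assumption: an explicit connection_string is used as-is.
    if section.get("connection_string"):
        return section["connection_string"]
    return " ".join(
        f"{libpq_key}={quote_libpq(section[key])}"
        for key, libpq_key in LIBPQ_KEYS.items()
        if key in section
    )
```

Quoting every value is more than libpq strictly requires, but it keeps passwords containing spaces or quotes from breaking the string.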

Docker-secret support via _FILE env vars

Any PARSEDMARC_{SECTION}_{KEY} environment variable can now also be supplied via a file by appending _FILE to its name (e.g. PARSEDMARC_IMAP_PASSWORD_FILE=/run/secrets/imap_password). The file's contents (with trailing CR/LF stripped) are used as the value. This is the same convention used by the official Postgres, MariaDB, and Redis container images, so credentials no longer have to appear in plain environment: blocks where docker inspect, container logs, and /proc/<pid>/environ would expose them.

When both the direct var and its _FILE companion are set, the file wins. A missing or unreadable file raises ConfigurationError rather than silently falling back to an empty value. The four pre-existing *_file config keys ([general] log_file, [msgraph] token_file, [gmail_api] credentials_file, [gmail_api] token_file) keep their direct-path semantics; wrap them in a Docker secret by doubling the suffix (PARSEDMARC_GMAIL_API_CREDENTIALS_FILE_FILE).
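
The resolution rules above (file wins, trailing CR/LF stripped, unreadable file is an error) can be sketched like this. The function name is hypothetical and the ConfigurationError class is a local stand-in for the error type named in these notes.

```python
import os
from typing import Mapping, Optional


class ConfigurationError(Exception):
    """Local stand-in for the configuration error named in the release notes."""


def resolve_env_value(name: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Resolve NAME, preferring NAME_FILE when both are set."""
    file_var = name + "_FILE"
    if file_var in env:
        path = env[file_var]
        try:
            with open(path, encoding="utf-8") as handle:
                # Strip only trailing CR/LF so other whitespace in a secret survives.
                return handle.read().rstrip("\r\n")
        except OSError as exc:
            # Fail loudly instead of falling back to an empty or direct value.
            raise ConfigurationError(
                f"{file_var} points to an unreadable file: {path}"
            ) from exc
    return env.get(name)
```

Under the same convention, a key that already ends in _file, such as credentials_file, would be looked up as PARSEDMARC_GMAIL_API_CREDENTIALS_FILE with the companion PARSEDMARC_GMAIL_API_CREDENTIALS_FILE_FILE.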

Elastic Cloud Serverless compatibility

New [elasticsearch] serverless config flag (env var PARSEDMARC_ELASTICSEARCH_SERVERLESS). Elastic Cloud Serverless manages sharding and replication itself and rejects the number_of_shards / number_of_replicas index settings with HTTP 400 — previously every write into a Serverless project failed at index-creation time. With the flag set, create_indexes strips those two keys from the settings sent to Elasticsearch and passes any other settings (e.g. refresh_interval) through unchanged. Non-Serverless deployments are unaffected.
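
The settings filtering can be pictured with the short sketch below. The helper name is hypothetical; only the two stripped keys and the pass-through behaviour come from the description above.

```python
from typing import Any, Dict, Optional

# Index settings that Elastic Cloud Serverless rejects, per the notes above.
SERVERLESS_REJECTED_SETTINGS = ("number_of_shards", "number_of_replicas")


def effective_index_settings(
    settings: Optional[Dict[str, Any]], serverless: bool
) -> Dict[str, Any]:
    """Return the settings to send at index creation."""
    result = dict(settings or {})  # Copy so the caller's dict is not mutated.
    if serverless:
        for key in SERVERLESS_REJECTED_SETTINGS:
            result.pop(key, None)
    return result
```

With serverless disabled the settings are returned unchanged, which matches the statement that non-Serverless deployments are unaffected.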

Bug fixes

  • save_smtp_tls_report_to_s3 was completely broken. parsedmarc/s3.py:save_report_to_s3 unconditionally read report["report_metadata"] when assembling S3 object metadata, but SMTP TLS reports are flat per RFC 8460 §4.3 — they have no report_metadata sub-object — and parse_smtp_tls_report_json correctly stores begin_date as the raw ISO-8601 string from the report. The S3 path branch also assumed begin_date was a datetime and did .year / .month / .day on it. The CLI's surrounding try/except silently swallowed the resulting KeyError, so every SMTP-TLS report quietly failed to upload to S3 in production. Both issues are fixed: SMTP-TLS metadata is now built from the flat report fields directly, and the date is normalized via human_timestamp_to_datetime.
  • append_json corrupted JSON output files on the second write. The original implementation opened files in "a+" mode, then seek()ed backwards to overwrite the trailing ] with ,\n before appending more elements. Python's docs are explicit: on POSIX, writes in "a"/"a+" mode always go to EOF regardless of seek position. The result was that every second call onto an existing file produced [...]\n],\n[...]-style corrupted output instead of a single merged JSON array. Anyone running parsedmarc in watch mode with JSON output enabled had aggregate.json / failure.json / smtp_tls.json quietly turning into invalid JSON after the first overlap. Replaced with a read-merge-write pattern: load the existing array (if any), append the new elements, rewrite the whole file. append_csv was not affected — it doesn't seek backwards.
  • Removed redundant try/except in parsedmarc/webhook.py. save_aggregate_report_to_webhook / save_failure_report_to_webhook / save_smtp_tls_report_to_webhook each wrapped self._send_to_webhook(...) in a try/except, but _send_to_webhook already catches every Exception itself, so the outer except blocks were unreachable dead code.
  • Report files whose names contain glob metacharacters were silently skipped. The CLI expanded every file argument with glob() (parsedmarc/cli.py), which interprets [, ], *, and ? as pattern syntax (see the glob docs). A literal path such as [Netease DMARC Failure Report] Rent Reminder.eml — the bracketed shape many providers use for emailed failure reports — was treated as a character class, matched nothing, and was dropped before reaching the parser, with no error. File arguments that already exist on disk are now taken literally; only non-existent paths are treated as glob patterns, so shell-style wildcards (samples/*.xml) still expand.
  • OpenSearch Dashboards reported a mapping conflict on the aggregate index pattern's org_email field. The shipped dashboards/opensearch/opensearch_dashboards.ndjson froze a cached field-list snapshot in which org_email was a text / object conflict, alongside leftover org_email.#text and org_email.#text.keyword subfields — artifacts of a cluster that had once indexed a langAttrString email dict ({"#text": …, "@lang": …}) before the parser unwrapped it. org_email is mapped as Text() and the parser now unwraps a dict email to a plain string, so live data is consistent; cleared the stale conflict and the two artifact subfields from the index pattern, leaving org_email (text) and org_email.keyword so importers no longer see the warning.
  • dashboard-dev-bootstrap.sh imported the OpenSearch Dashboards saved objects into the wrong tenant. The script sent securitytenant: global_tenant, but the OpenSearch security plugin reads that header as a tenant name, and global_tenant is a sample custom tenant shipped in the security demo config — not the shared Global tenant, whose token is the literal global. The import succeeded into a separate global_tenant tenant (its own .kibana_<hash>_globaltenant_1 index), so the dashboards were invisible to anyone viewing the Global tenant in OpenSearch Dashboards. Changed the default OSD_TENANT to global. (An empty/omitted securitytenant header is not equivalent — it falls back to the user's configured default tenant, not Global.) This affects the contributor dev stack only, not the shipped dashboards.
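
For the append_json fix, the read-merge-write pattern can be sketched as below. This is an illustration of the pattern rather than the project's code: the function name is hypothetical, and it assumes any existing file holds a single JSON array.

```python
import json
import os
from typing import Any, List


def append_json_sketch(path: str, new_items: List[Any]) -> None:
    """Append items to a JSON array file by rewriting the whole file."""
    existing: List[Any] = []
    # Load the current array, if the file exists and is not empty.
    if os.path.exists(path) and os.path.getsize(path) > 0:
        with open(path, encoding="utf-8") as handle:
            existing = json.load(handle)
    # Open in "w" mode: unlike "a"/"a+", nothing depends on seek position.
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(existing + list(new_items), handle, indent=2)
        handle.write("\n")
```

The trade-off is that each call rereads and rewrites the full file, so cost grows with file size, but the output is a single valid JSON array after every write.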

Breaking changes

Forensic reports have been renamed to failure reports

Forensic reports have been renamed to failure reports throughout the project, matching the name these reports have carried since RFC 7489.

  • Core (types.py, __init__.py): ForensicReport → FailureReport, parse_forensic_report → parse_failure_report, report type "failure"
  • Output modules: elastic.py, opensearch.py, splunk.py, kafkaclient.py, syslog.py, gelf.py, webhook.py, loganalytics.py, s3.py
  • CLI: cli.py — args, config keys, index names (dmarc_failure)
  • Docs & dashboards: all markdown, Grafana JSON, OpenSearch NDJSON, Splunk XML

Backward compatibility

  • Old function/type names preserved as aliases: parse_forensic_report = parse_failure_report, ForensicReport = FailureReport, etc.
  • CLI config accepts both old (save_forensic, forensic_topic) and new keys (save_failure, failure_topic)
  • The archive subfolder for failure reports is now Failure (under archive_folder), renamed from Forensic. To avoid a split archive across Forensic/ and Failure/, parsedmarc migrates an existing Forensic subfolder into Failure automatically on startup (best-effort): it renames the folder when no Failure folder exists yet, merges the two when both already exist, and logs-and-skips any mailbox it cannot reorganize (warn, don't crash). This consolidation uses the folder-management API (folder_exists / rename_folder / merge_folders) added in mailsuite 2.1.0, so the required mailsuite version is now >=2.1.0.
  • RFC 7489 reports parse with None for RFC 9990-only fields
  • Updated dashboards and queries are backward compatible: they match data indexed under both the old (dmarc_forensic* / dmarc:forensic) and new (dmarc_failure* / dmarc:failure) names, so dashboards show data from before and after the rename:
    • OpenSearch Dashboards: Index pattern uses dmarc_f* to match both dmarc_forensic* and dmarc_failure*
    • Splunk: Base search queries (sourcetype="dmarc:failure" OR sourcetype="dmarc:forensic")
    • Elasticsearch/OpenSearch: Duplicate-check searches query across both dmarc_failure* and dmarc_forensic* index patterns
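
The archive-folder consolidation described in the list above follows a rename-or-merge decision that can be sketched as below. The InMemoryMailbox class is a toy stand-in, not the mailsuite API: the method names come from these notes, but their signatures, the "/" path separator, and the returned status strings are assumptions.

```python
import logging
from typing import Dict, List

logger = logging.getLogger(__name__)


class InMemoryMailbox:
    """Toy mailbox used only to exercise the decision logic."""

    def __init__(self, folders: Dict[str, List[str]]):
        self.folders = {name: list(messages) for name, messages in folders.items()}

    def folder_exists(self, name: str) -> bool:
        return name in self.folders

    def rename_folder(self, old: str, new: str) -> None:
        self.folders[new] = self.folders.pop(old)

    def merge_folders(self, source: str, target: str) -> None:
        self.folders[target].extend(self.folders.pop(source))


def migrate_forensic_folder(mailbox, archive_folder: str = "Archive") -> str:
    """Best-effort move of <archive>/Forensic into <archive>/Failure."""
    old = f"{archive_folder}/Forensic"
    new = f"{archive_folder}/Failure"
    try:
        if not mailbox.folder_exists(old):
            return "nothing-to-do"
        if mailbox.folder_exists(new):
            mailbox.merge_folders(old, new)  # Both exist: merge old into new.
            return "merged"
        mailbox.rename_folder(old, new)  # Only the old folder exists: rename.
        return "renamed"
    except Exception as exc:
        # Warn and continue rather than crash on a mailbox that cannot be reorganized.
        logger.warning("Could not migrate %s to %s: %s", old, new, exc)
        return "skipped"
```

Catching a broad exception here mirrors the "warn, don't crash" behaviour described above, so one problematic mailbox does not stop startup.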
