github ail-project/ail-framework v6.6
AIL v6.6 – Advanced PDF Processing and Manual Crawling Support

7 hours ago

AIL v6.6 is a release with a strong focus on PDF ingestion and translation, crawler improvements, and operational enhancements across users, queues, and metadata handling.

This version significantly expands AIL’s document-processing and data-collection capabilities. It introduces a hardened PDF ingestion pipeline: every PDF is converted to PDF/A and stripped of embedded metadata before ingestion, in order to remove malicious content.

Users can also browse content locally with Lacus and send captured pages directly to AIL as crawler data. The associated browser cookies and local storage are imported as a cookiejar that the crawler can reuse. The release also continues to improve reliability, scalability, and analyst workflows.

New PDF Parsing Capabilities

The example below shows a PDF document from a chat channel processed in AIL: the translation is done automatically and displayed as an overlay on the original text.

Screenshot from 2025-12-12 09-59-50

PDF metadata fields (e.g. author name) are now also treated as pivot data points in AIL, allowing correlation across those selectors.

Screenshot from 2025-12-12 10-37-10
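
To illustrate the idea of using an author name as a pivot, here is a minimal Python sketch. It is not AIL's implementation: the function names, the dictionary layout, and the normalization rule (lowercasing and collapsing whitespace) are all assumptions made for this example.

```python
from collections import defaultdict


def normalize_author(name: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match.

    Assumption: this normalization rule is illustrative, not AIL's.
    """
    return " ".join(name.split()).lower()


def build_author_index(documents: dict) -> dict:
    """Map each normalized author name to the set of PDF IDs that carry it.

    `documents` is a hypothetical mapping: PDF ID -> metadata dict.
    """
    index = defaultdict(set)
    for doc_id, metadata in documents.items():
        author = metadata.get("author")
        # Skip PDFs with a missing or blank author field.
        if author and author.strip():
            index[normalize_author(author)].add(doc_id)
    return dict(index)


def shared_authors(index: dict) -> dict:
    """Keep only authors seen in more than one PDF, i.e. usable pivots."""
    return {author: ids for author, ids in index.items() if len(ids) > 1}
```

Two PDFs whose author fields differ only in case or spacing end up under the same key, which is what lets an analyst pivot from one document to the other.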

Highlights

Process PDF

PDF content is extracted and processed as Markdown.

  • Full PDF file support: processing, correlation, and content extraction to Markdown
  • New PDF translation pipeline with:
    • Optional translation toggle
    • Improved translation quality and layout
    • Progress tracking
    • Automatic saving of translated PDFs
  • Extended PDF metadata handling:
    • Author extraction and correlation
  • Default PDF size limit set to 100 MB, with configurable limits
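
As a rough sketch of how a configurable size limit can gate ingestion, the Python snippet below checks a file against a limit expressed in megabytes. The function name, the parameter name, and the choice of 1 MB = 1024 * 1024 bytes are assumptions for illustration. Only the 100 MB default comes from the release notes.

```python
import os

DEFAULT_MAX_PDF_SIZE_MB = 100  # default stated in the release notes


def is_within_size_limit(path: str, max_size_mb: int = DEFAULT_MAX_PDF_SIZE_MB) -> bool:
    """Return True when the file at `path` is no larger than the limit.

    Hypothetical helper: AIL's actual option name and unit handling may differ.
    """
    max_bytes = max_size_mb * 1024 * 1024  # assumption: binary megabytes
    return os.path.getsize(path) <= max_bytes
```

A caller would skip or reject the PDF when this returns False, and an operator could raise or lower `max_size_mb` from configuration.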

Crawler & Capture Improvements

  • Improved crawler request handling with user-specific cookiejar local storage
  • New capability to manually import crawler captures (documentation will follow in the coming days)
  • Fixed crawler importer endpoints, validation, and API responses
  • Improved domain dashboard caching and capture status handling
  • Updated Tor Browser user agent

Tracker & Queues

  • Improved performance of regex-based trackers
  • New dedicated queue for tracker modules
  • Improved handling of decoded tracker terms and non-string content
  • Better file-name processing for trackers and modules
  • Corrected correlation between file names and PDFs
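
The release notes do not say how these improvements were made. As a generic illustration only, the Python sketch below combines three common techniques: compiling tracker regexes once, feeding items through a dedicated queue, and decoding non-string content before matching. The tracker names, patterns, and function names are hypothetical and not taken from AIL.

```python
import queue
import re

# Hypothetical tracker definitions: tracker name -> regex source.
TRACKERS = {
    "onion-v3": r"\b[a-z2-7]{56}\.onion\b",
    "email": r"[\w.+-]+@[\w-]+\.[\w.-]+",
}

# Compile once at start-up instead of on every item.
COMPILED = {name: re.compile(pattern) for name, pattern in TRACKERS.items()}


def match_trackers(content) -> dict:
    """Return {tracker name: [matches]} for every tracker that fires."""
    # Assumption: non-string content arrives as bytes and is decoded leniently.
    if isinstance(content, bytes):
        content = content.decode("utf-8", errors="replace")
    hits = {}
    for name, regex in COMPILED.items():
        found = regex.findall(content)
        if found:
            hits[name] = found
    return hits


def drain(tracker_queue: queue.Queue) -> list:
    """Process every queued (item_id, content) pair and keep the ones with hits."""
    results = []
    while True:
        try:
            item_id, content = tracker_queue.get_nowait()
        except queue.Empty:
            break
        hits = match_trackers(content)
        if hits:
            results.append((item_id, hits))
    return results
```

Keeping tracker work on its own queue means slow regexes do not hold up other modules, which is one plausible motivation for a dedicated queue.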

Metadata, Correlation & Objects

  • New optional custom metadata on AIL objects
  • Improved file-name and PDF correlation logic
  • Domains are now automatically tagged when screenshots or images are unsafe

Images & Screenshots Engine

  • Improved LLM prompts
  • Better domain description generation
  • Fixed multiple issues where LLM responses did not fully respect requested formats
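
For readers wondering what handling an LLM reply that does not respect the requested format can look like, here is a small defensive-parsing sketch in Python. It assumes the prompt asked for a JSON object. The function name and the required keys are hypothetical, and AIL's actual fixes may work differently.

```python
import json
from typing import Optional

FENCE = "`" * 3  # Markdown code-fence marker that models often add


def parse_llm_json(raw: str, required_keys: set) -> Optional[dict]:
    """Return the parsed object, or None if the reply breaks the requested format."""
    text = raw.strip()
    # Drop a surrounding Markdown code fence if the model added one.
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]
        text = "\n".join(lines)
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    # Require a JSON object containing every expected key.
    if not isinstance(data, dict) or not required_keys.issubset(data):
        return None
    return data
```

Returning None instead of raising lets the caller decide whether to retry the prompt or skip the item.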

Chat & Monitoring

  • Improved visibility of reply and forwarded_from metadata in message viewers
  • Fixed missing forwarded metadata in chat views
  • Enhanced chat monitoring requests:
    • Track new requests
    • Mark as done or rejected

Users & Administration

  • New admin user view
  • Ability to enable or disable user accounts
  • Improved user settings management

Others

  • Added Meilisearch installation instructions to HOWTO
  • Added extensive crawler API tests covering authentication, validation, and error handling
  • Optimized GitHub Actions workflows

🔧 Fixes & Maintenance

This release includes a large number of bug fixes across:

  • Tracker navigation and object removal
  • Cookiejar imports and ACL handling
  • Feeder dashboards and API cleanup
  • Language detection edge cases
  • Domain explorer pagination (I2P)
  • Multiple UI consistency issues

Dependency updates include:

  • Addition of faup-rs and pymupdf
  • Development dependency bumps (e.g. Vite)

Thanks to Our Contributors

A big thank you to everyone who contributed to AIL v6.6 through code, reviews, testing, and feedback.

Special thanks to:
@cavedave and @alexandercronin

Funding 🇪🇺

AIL is developed and maintained with the support of the European Union as part of the HOPLITE European Project.
HOPLITE aims to strengthen the capacity of law enforcement authorities to gather, analyse, and share open-source and closed-source intelligence, integrating AI-driven analysis to enhance real-time threat detection and response workflows. The project builds on previous EU-funded initiatives and enhances key components such as AIL and MISP to deliver scalable, responsible, and actionable threat intelligence capabilities for cybersecurity and public safety across EU Member States.

Law enforcement agencies interested in discovering and using the MISP/AIL-LEA platforms can apply on the misp-lea.org website.