Datasets Features
- Introduce PDF support (#7318) by @yabramuvdi in #7325
>>> from datasets import load_dataset, Pdf
>>> repo = "path/to/pdf/folder" # or username/dataset_name on Hugging Face
>>> dataset = load_dataset(repo, split="train")
>>> dataset[0]["pdf"]
<pdfplumber.pdf.PDF at 0x1075bc320>
>>> dataset[0]["pdf"].pages[0].extract_text()
...
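To read a whole document rather than a single page, the per-page call above can be looped over `pages`. This is a minimal sketch, not part of the release: `extract_all_text` and its `separator` parameter are hypothetical names. It assumes only that the decoded object exposes `.pages` and that each page has `.extract_text()`, as in the example above.

```python
def extract_all_text(pdf, separator="\n\n"):
    """Join the text of every page of a pdfplumber-style PDF object.

    Hypothetical helper: it relies only on `pdf.pages` and
    `page.extract_text()`, the two members used in the example above.
    """
    page_texts = []
    for page in pdf.pages:
        # Treat a missing result (None or "") as an empty page,
        # e.g. for scanned pages that have no text layer.
        text = page.extract_text() or ""
        page_texts.append(text)
    return separator.join(page_texts)


# Usage with a loaded dataset (assumes a "pdf" column as shown above):
# full_text = extract_all_text(dataset[0]["pdf"])
```

Pages without extractable text contribute an empty string, so the page count can still be recovered by splitting on the separator.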
What's Changed
- Fix local pdf loading by @lhoestq in #7466
- Minor fix for metadata files in extension counter by @lhoestq in #7464
- Prioritize json by @lhoestq in #7476
New Contributors
- @yabramuvdi made their first contribution in #7325
Full Changelog: 3.4.1...3.5.0