pdfcpu/pdfcpu v0.7.0 on GitHub

Hello!

🧑‍🔬 We packed lots of goodies into this release for you.

Performance

You will like this ✨
Thanks to @fancycode we have improved PDF parsing significantly.
While this is not easily comparable, running the pdfcpu test suite now needs about 8 seconds less user CPU time under macOS 14.2.1:

Before:

./coverage.sh  67.60s user 13.35s system 119% cpu 1:07.93 total

After:

./coverage.sh  59.64s user 12.55s system 107% cpu 1:07.01 total
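
To make the comparison concrete, here is a small sketch that computes the differences between the two runs. The numbers are copied from the output above; the helper name to_seconds is purely illustrative.

```python
def to_seconds(total: str) -> float:
    """Convert a shell-style 'm:ss.ss' wall-clock value to seconds."""
    minutes, seconds = total.split(":")
    return int(minutes) * 60 + float(seconds)


# Values copied from the two coverage.sh runs shown above.
before = {"user": 67.60, "system": 13.35, "wall": to_seconds("1:07.93")}
after = {"user": 59.64, "system": 12.55, "wall": to_seconds("1:07.01")}

# Positive numbers mean the "after" run needed less time.
saved = {key: round(before[key] - after[key], 2) for key in before}
print(saved)
```

Most of the gain shows up as user CPU time (close to 8 seconds), while the wall-clock total changes by less than a second.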

PDF 2.0 Support

We now have basic support for writing back PDF 2.0 files.
This means you may start using all pdfcpu operations that update validated PDF 2.0 files.
Basic support means that your mileage may vary, especially when you try to process a file that uses one of the new 2.0 features.

Since it is hard to get hold of PDF 2.0 files that use a specific new 2.0 feature, a disclaimer is printed on the command line asking for your input and contribution. Please open an issue and share your file in case pdfcpu has a problem digesting it.
The same applies if you just want to see some specific 2.0 feature supported.

In general, please 🙏🏻 report back any issues - there is no way to fix something that does not get reported!

New Zoom Command

pdfcpu zoom [-p(ages) selectedPages] -- description inFile [outFile]

Zoom in/out of selected pages either by magnification factor or corresponding margin.
When zooming out, the unused page content space results in horizontal and vertical margins.
These two margins differ from each other, but both correspond to the same factor.
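
The relation between factor and margins can be sketched as follows. This is a minimal illustration, assuming the page content is scaled uniformly and centered on the page; the function names are hypothetical and the code is not taken from pdfcpu. The page size of 595 x 842 points is an approximate A4 page.

```python
def margins_for_factor(width, height, factor):
    """Margins left on each side when content is scaled by factor and centered."""
    if not 0 < factor < 1:
        raise ValueError("zooming out needs 0 < factor < 1")
    hmargin = width * (1 - factor) / 2
    vmargin = height * (1 - factor) / 2
    return hmargin, vmargin


def factor_for_hmargin(width, hmargin):
    """Factor that leaves the given horizontal margin on both sides."""
    return (width - 2 * hmargin) / width


def factor_for_vmargin(height, vmargin):
    """Factor that leaves the given vertical margin on both sides."""
    return (height - 2 * vmargin) / height


# Approximate A4 page in points (assumption for this example).
width, height = 595, 842

print(margins_for_factor(width, height, 0.5))  # (148.75, 210.5)

# A vertical margin of 30 points implies one factor, and that factor
# in turn implies a different (smaller) horizontal margin.
factor = factor_for_vmargin(height, 30)
print(factor, margins_for_factor(width, height, factor))
```

On a portrait page the horizontal margin comes out smaller than the vertical one for the same factor, which is why a zoom description needs only one of the three values.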

Examples:

Zoom in to a magnification of 200%

pdfcpu zoom -- "factor: 2"  in.pdf out.pdf

Zoom out to magnification of 50%

pdfcpu zoom -- "factor: .5" in.pdf out.pdf

Zoom out to a magnification equivalent to a horizontal margin of 1 cm

pdfcpu zoom -unit cm -- "hmargin: 1" in.pdf out.pdf

Zoom out to a magnification equivalent to a vertical margin of 30 points.
Draw a border around zoomed out page content and fill unused page space light gray

pdfcpu zoom -- "vmargin: 30, border:true, bgcolor:lightgray" in.pdf out.pdf ...

Please consult pdfcpu help zoom for more details, as well as the official documentation.

Enhanced Booklet Command

Thanks to @adamgreenhall we have an even more powerful booklet command for producing zines:

We now have booklet styles 2, 4, 6 and 8 and you may choose one of the following booklet types, each representing a certain method for arranging pages into a booklet:

booklet, bookletadvanced, perfectbound

Examples:

Arrange pages of in.pdf 2 per sheet side (4 per sheet, back and front) onto out.pdf

pdfcpu booklet -- "formsize:Letter" out.pdf 2 in.pdf

Arrange pages of in.pdf 4 per sheet side (8 per sheet, back and front) onto out.pdf:

pdfcpu booklet -- "formsize:Ledger" out.pdf 4 in.pdf

Arrange pages of in.pdf 6 per sheet side (12 per sheet, back and front) onto out.pdf

pdfcpu booklet -- "formsize:Ledger" out.pdf 6 in.pdf

Arrange pages of in.pdf 8 per sheet side (16 per sheet, back and front) onto out.pdf

pdfcpu booklet -- "formsize:A3" out.pdf 8 in.pdf

Arrange pages of in.pdf 4 per sheet side, with short-edge binding onto out.pdf

pdfcpu booklet -- "formsize:A3, binding:short" out.pdf 4 in.pdf

Arrange pages of in.pdf 2 per sheet side as a sequence of folios covering 4*foliosize pages each

pdfcpu booklet -- "formsize:A4, multifolio:on" hardbackbook.pdf 2 in.pdf

Arrange pages of in.pdf 2 per sheet side, arranged for perfect binding, onto out.pdf

pdfcpu booklet -- "formsize:A4, btype:perfectbound" out.pdf 2 in.pdf

Arrange pages of in.pdf 4 per sheet side, arranged for advanced binding, onto out.pdf

pdfcpu booklet -- "formsize:A3, btype:bookletadvanced" out.pdf 4 in.pdf

Please consult pdfcpu help booklet for more details, as well as the official documentation.
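
To give an idea of what "arranging pages into a booklet" means, here is a sketch of the classic saddle-stitch page order for the simplest case of 2 pages per sheet side. This is the textbook imposition scheme, not pdfcpu's implementation; the function name booklet_2up is hypothetical, and the other booklet types and the 4, 6 and 8 per side layouts arrange pages differently.

```python
def booklet_2up(page_count):
    """Return (left, right) page pairs, one pair per sheet side.

    Sides alternate front, back, front, back. None marks a blank page
    that pads the document to a multiple of 4 pages.
    """
    total = -(-page_count // 4) * 4  # round up to a multiple of 4
    pages = [p if p <= page_count else None for p in range(1, total + 1)]

    sides = []
    lo, hi = 0, total - 1
    while lo < hi:
        sides.append((pages[hi], pages[lo]))          # front of the sheet
        sides.append((pages[lo + 1], pages[hi - 1]))  # back of the sheet
        lo += 2
        hi -= 2
    return sides


print(booklet_2up(8))  # [(8, 1), (2, 7), (6, 3), (4, 5)]
print(booklet_2up(6))  # blanks fill the two missing pages
```

Each sheet carries 4 pages (2 per side), so an 8 page document needs 2 sheets. Stacking the sheets and folding them in the middle puts the pages in reading order.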

Configuration Changes

There are two changes to the configuration:

validationNone was eliminated
postProcessValidate is new and enables safeguard validation

Validation mode ValidationNone has been eliminated for a couple of reasons.
First of all, a lot happens during validation, such as internalizing and caching, that is needed for command processing.
Secondly, PDF validation has become quite performant.

We are introducing the new config flag postProcessValidate.
This flag, which is turned on by default, enables the validation of your processed cross reference table right before writing.
This is a useful safeguard: if a problematic cross reference table is written back without any error, the problem will only be noticed by the next read/parse/validation attempt.
If you disable this flag, you will get an additional overall performance boost, but with the caveat described above.
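
The idea behind such a safeguard can be sketched in a few lines. This is a conceptual illustration only: the cross reference table is modeled as a plain dict, and validate, write_back and the offset check are hypothetical stand-ins, not pdfcpu's API.

```python
def validate(xref):
    """Hypothetical check: every object needs a non-negative byte offset."""
    for obj_nr, offset in xref.items():
        if offset < 0:
            raise ValueError(f"object {obj_nr}: invalid offset {offset}")


def write_back(xref, sink, post_process_validate=True):
    """Append xref to sink, optionally validating right before writing."""
    if post_process_validate:
        validate(xref)  # fails here, before anything is written
    sink.append(dict(xref))


written = []
broken = {1: 15, 2: -1}

# Safeguard off: the broken table is written without any complaint.
write_back(broken, written, post_process_validate=False)

# Safeguard on (the default): the problem is caught before writing.
try:
    write_back(broken, written)
    caught = False
except ValueError as err:
    caught = True
    print("rejected:", err)
```

With the safeguard off, the first call succeeds and the defect would only surface when the output is read again.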

As usual please renew your configuration!

Form filling now expects the user font Roboto-Regular when using Eastern European scripts.
You can update your configuration manually or just remove your pdfcpu configuration altogether and recreate it like so:

Locate the pdfcpu folder using pdfcpu conf
Remove/backup the pdfcpu folder
Recreate a brand new pdfcpu folder by executing any pdfcpu command on the CLI, e.g. run pdfcpu conf once more
Edit your configuration
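
If you prefer a backup over a plain removal, the second step above could look like the following sketch. It only renames a directory and knows nothing about pdfcpu: the directory path is whatever pdfcpu conf reported on your machine, the function name backup_dir is hypothetical, and the demo runs on a throwaway directory with a placeholder file.

```python
from pathlib import Path
import tempfile


def backup_dir(config_dir):
    """Rename config_dir to '<name>.bak' next to it and return the new path."""
    src = Path(config_dir)
    dst = src.with_name(src.name + ".bak")
    if dst.exists():
        raise FileExistsError(f"backup already exists: {dst}")
    src.rename(dst)
    return dst


# Demo on a temporary directory standing in for the real pdfcpu folder.
with tempfile.TemporaryDirectory() as tmp:
    fake = Path(tmp) / "pdfcpu"
    fake.mkdir()
    (fake / "config.yml").write_text("# placeholder\n")

    backup = backup_dir(fake)
    backup_name = backup.name
    original_still_there = fake.exists()
    file_preserved = (backup / "config.yml").exists()

print(backup_name, original_still_there, file_preserved)
```

Once the original folder is gone, the next pdfcpu invocation recreates it as described in the steps above.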

Samples And Tests

All of this complements the official documentation.

To get a better understanding of pdfcpu's operations, please make sure you check out all tests, the corresponding PDF output and all JSON input where appropriate:

pdfcpu/pkg/samples/* comes loaded with 230 MB worth of PDFs produced by corresponding tests and JSON input located at:

pdfcpu/pkg/api/test
pdfcpu/pkg/testdata/json

Thanks

🙏 to all bug reporters and feature requestors.
Special thanks for contributed PRs go to @adamgreenhall, @fancycode, @kalimit, @sivukhin and @afh

Little Commercial Break

pdfcpu is in need of more frequent financial supporters!
Please consider becoming a sponsor especially if you are a (small) business 🙏
If you are a developer within a business please go to your superior or team lead and have them compare the benefits/costs vs. commercial solutions. If you prefer to operate in stealth mode that's fine - you can always become a private sponsor.
What's important is to keep the project funded and on a clear, steady path 🚀

Meet The Maintainer

I will be in the San Francisco Bay Area this fall.
If you are a recurring sponsor, or not a sponsor but a business using pdfcpu, I would like to get to know you and your pdfcpu use case. I'll also be happy to meet one-on-one, possibly over 🍻, for a technical chat/discussion and to get feedback right from the trenches.
Just get in touch with me: hhrutter@gmail.com

Next Steps

Support for PDF 2.0 encryption will be tackled next, after that digital signatures.
A Beta version is within reach 👍🏻

Have fun 💚 with pdfcpu!

Changelog

dfaa588 Bump version, fix #818
c0a39e9 Add zoom cmd, fix #756
d581dc1 Fix #809
5b7d844 Add config flag postProcessValidate
8735421 Fix #815
88f1b3d Fix #814
268e6bb Merge PR #811
da12eed Fix #813
dedaddc Merge #795, cleanup
95c2d64 Avoid copying from "bytes.Buffer" to get underlying bytes.
044a6c0 Use type switch instead of long list of type tests.
3d4cbdb Further improve parsing of dictionaries / names.
fc87a22 Fix #794
b4af9ea Eliminate model.ValidationNone
d5fd063 Fix #807
8f3e992 Fix #628
cfd7627 Finalize extended booklet cmd as contributed by Adam Greenhall
a893411 Fix booklet cmd parsing, clean up
d3e607d Fix #807
4527ff4 Fix #806
18b8e77 Fix #805
a8b4a4a Fix #798
694f81f Fix #794, add PDF 2.0 disclaimer
1d5da77 cli documentation
032b32d Fix #773
055e03f Fix #765
261c563 Fix #758, #770
9295163 Fix #779, #780
865e6b7 Fix #724
222cf6c Fix #796
6ae90db Fix #772
96659b7 Fix #789
dae09eb Fix #786
ac6f14a Fix #766
60e13f3 Fix #760
6e235c0 Fix #759
793c509 Fix #771
6935271 Fix bug with types when splitting pdf
ef9bfc9 add type cast check
ef759de fix validation in ParseXRefStreamDict for even sized arrays
6e5acd7 fix bug in clone for FilterPipeline DecodeParams in StreamDict object
12e046d Fix building from distribution archive
043541b Fix #775, #490
04634d3 Add testcase that parses a large dictionary.
bec27a4 Avoid calling "DecodeName" when parsing dictionaries.
d5443fe Add tests for new reading functions that take a Context.
f2e4421 Add new reading / parsing functions that take a Context object.
b89d7b1 Fix #766
e33b502 Fix #755