fcorbelli/zpaqfranz 58.2
Windows 32/64 binary, 64-bit HW accelerated


zpaqfranz now...

  • "hopefully" intercept control-c to delete empty 0 bytes long chunks
  • "hopefully" automagically delete 0 bytes long chunks before run
  • "hopefully" intercept control-c to delete 0 bytes long archives
  • get a better scanning... update (every 1 sec)

New hasher QUICK (just a fake hash!)

zpaqfranz sum j:\ -quick -summary -ssd

This is a "fake" hash, or rather a similarity estimator.
For small files (under 64KB) it computes a full xxhash64; for larger ones it takes the xxhash64 of 16KB at the head, 16KB in the middle and 16KB at the tail.
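A minimal sketch of the scheme in C++ (not the actual zpaqfranz source: the exact placement of the middle window and the use of the xxHash streaming API are my assumptions):

// quickhash sketch: full xxhash64 under 64 KB, otherwise xxhash64 of
// three 16 KB windows (head, middle, tail). Needs xxhash (xxhash.h).
#include <cstdio>
#include <cstddef>
#include <cstdint>
#include <vector>
#include "xxhash.h"

uint64_t quickhash(const char* filename)
{
    FILE* f = std::fopen(filename, "rb");
    if (!f) return 0;
    std::fseek(f, 0, SEEK_END);
    long size = std::ftell(f);

    XXH64_state_t* state = XXH64_createState();
    XXH64_reset(state, 0);
    std::vector<char> buf(16 * 1024);

    if (size < 64 * 1024) {              // small file: hash it all
        std::fseek(f, 0, SEEK_SET);
        size_t n;
        while ((n = std::fread(buf.data(), 1, buf.size(), f)) > 0)
            XXH64_update(state, buf.data(), n);
    } else {                             // large file: head, middle, tail
        long offsets[3] = { 0, size / 2 - 8 * 1024, size - 16 * 1024 };
        for (long off : offsets) {
            std::fseek(f, off, SEEK_SET);
            size_t n = std::fread(buf.data(), 1, buf.size(), f);
            XXH64_update(state, buf.data(), n);
        }
    }
    std::fclose(f);
    uint64_t h = XXH64_digest(state);
    XXH64_freeState(state);
    return h;
}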

The use, as you can guess (!), is twofold

1) Rapid estimation of file-level duplication on very large amounts of data.
Using "exact" systems, i.e., calculating the hash of each individual file to search for duplicates, is (still) very slow and expensive.
"Quick" hashing, of course, does not guarantee against collisions at all (wrong matches can happen even on small amounts of data).
The effect is that the runtime depends more on the number of files than on their size, running @ 50GB/s or even more, much more.
Sometimes you want to quickly "understand" whether a new file server can benefit from deduplication.

2) Fast checking of backups

New command backup

As everyone knows (or maybe not) my very first contribution to zpaq was a rudimentary implementation of multipart archives, later merged by Mahoney (with his usual high skill).

Unfortunately, however, zpaq is more an archiver than a backup system: there is no realistic way to check the integrity of multipart archives.

There are critical cases where you want to do cloud backups on systems that do NOT allow the --append of rsync (OK, rclone and robocopy, I'm talking about you)

Similarly, computing hashes on inexpensive cloud VPS systems, usually with very slow disks, is difficult already at sizes around ~50GB

This new release creates a text-based index file that keeps the list of the multipart pieces, their sizes, their MD5s and their quick hashes

Multipart backup with zpaqfranz

zpaqfranz backup z:\prova.zpaq *.cpp

will automagically create

  • a multipart archive starting from prova_00000001.zpaq, prova_00000002.zpaq, prova_00000003.zpaq...
  • a textfile index prova_00000000_backup.txt
  • a binary index prova_00000000_backup.index
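The text index might look something like this (a purely illustrative layout, with part name, size, MD5 and quick hash; the real format may differ):

prova_00000001.zpaq  1048576  d41d8cd98f00b204e9800998ecf8427e  5b8e8c6f0f5bbb42
prova_00000002.zpaq   524288  9e107d9d372bb6826bd81d3542a419d6  77af778b51abd4a3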

Why? The spiegone (the big explanation) is coming...

When you use "?" inside the filename, you get a multipart archive

zpaq a z:\pippo_???????.zpaq *.txt

Every new version, in zpaq, is just appended to the archive, but in this case the archive is "split" into "pieces".
This is almost perfect for rclone / rsync (without --append) / robocopy, whatever, to send the minimum amount of data.

So far, so good.

BUT

zpaq does not handle very well:

  1. Zero length: if you press control-C during compression, a 0-byte pippo_00000XX.zpaq can be left behind
  2. "Holes" (a missing "piece": pippo001, pippo002, pippo007, pippo008...)
  3. Mixing different backups. You can replace one piece of a zpaq multipart archive with a piece from another, and zpaq will joyfully take it, without noticing the error (!). Since each session is "self-sufficient", zpaq not only does not warn the user, but in the case of encryption (i.e., with -key) nasty things happen.
  4. It cannot really (quickly) check the archive for missing parts: if a "piece" is lost, it is possible that everything (from that version to the last) is lost too. Even worse, if you hold data for third-party clients, testing an encrypted archive requires the password, which you simply don't have. And 99.9 percent of backups are encrypted, even those on LAN-connected NASes.
  5. Speed. If you have a backup split into 10,000 "pieces", with zpaq you really cannot say whether everything is OK unless you run a (lengthy) full-scale test, and this can take hours (e.g. virtual machine disks)

Therefore...

New command testbackup

zpaqfranz testbackup z:\prova.zpaq

This command does a lot of different things, depending on the optional switches

  • -verify enforces a full MD5 check
  • -ssd for a multithreaded run (on solid state drives)
  • -verbose shows more info
  • -range from:to checks only "some" pieces
  • -to where-the-zpaq-pieces-are
  • -paranoid

WHAT?

The point is quickly testing remote "cloud" backups: usually you will

  • run zpaqfranz to a local drive
  • robocopy / rsync / zpaqfranz r to a "remote" location
  • run a remote script (to check locally, on the cloud server) OR download the remote file locally, then check it back

The last point is the key: getting a smaller file (the last multipart piece) makes everything much faster.
You can md5sum the "remote" file and compare it against the stored MD5, that's it.
Currently (before 58.2) you need to full-hash the entire archive (which can become quite big). Not a big deal for a full-scale Debian or FreeBSD server.
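On the remote (Linux) side the check boils down to something like (file names are illustrative)

md5sum prova_00000042.zpaq

comparing the output against the MD5 stored in the transferred prova_00000000_backup.txt.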

I hope this is clear (?); I'll post a full real-world example on the wiki.
A few examples are better than a thousand words

zpaqfranz testbackup z:\prova

Uses the "quick hash" to check that all the pieces are exactly the right size and "seem" to be filled with the right data. Almost instantaneous

zpaqfranz testbackup z:\prova -verify

Checks all pieces with MD5. If everything is OK you can be almost sure. In this case the files are expected in the same position as at creation

zpaqfranz testbackup z:\prova -paranoid

Compares the binary index against the zpaq parts. If the data match perfectly you can be confident. For encrypted archives the password must be supplied with -key
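zpaqfranz testbackup z:\prova -paranoid -key pippo

Same as above, for an archive encrypted with the (hypothetical) password pippo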

zpaqfranz testbackup z:\prova -verify -ssd -to z:\restored

Tests MD5 (in multithreaded mode), searching for the .zpaqs inside z:\restored

zpaqfranz testbackup z:\prova -range 10: -verify

Checks from chunk 10 to the last (range examples: -range 3:5, -range :3, -range 10:)

New command last

This returns the last part name, usually for scripting

zpaqfranz last z:\prova_????????
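For instance (a hypothetical Windows batch one-liner, assuming last prints just the file name) you could upload only the newest piece:

for /f "delims=" %%i in ('zpaqfranz last z:\prova_????????') do rclone copy "%%i" remote:backup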

New command last2

Compares the last 2 rows of a text file, assuming hash-name format. As you can guess, it facilitates, in scripted backup processing, the comparison of remote hashes against local ones. Refer to the example wiki; I will put up some working scripts.

zpaqfranz last2 c:\stor\confronto.txt -big

New sum switches

To get md5sum-like output, you can use a barrage of switches

zpaqfranz sum *.txt -md5 -pakka -noeta -stdout -nosort

Do not forget -ssd for non-spinning drives

FAQ

Is this a panacea?

Of course NOT.
Personally, I don't like splitting backups into many different parts at all: the risk of one being lost, garbled or corrupted is high.
However, in certain cases, there is no alternative. I will not mention one of the most well-known Internet service providers, to avoid free publicity (after all... they do not pay me :)

Better the "a" command or the "backup" command?

I use both of them.
I am thinking of an evolution of multipart with error correction (not just detection, correction), but the priority level is modest

Why the ancient MD5?

Up to now zpaqfranz uses XXH3 for this kind of detection (-checktxt).
But sometimes you must choose the fastest among the "usual" ones (spoiler: some cheap cloud vendors).
