github bigscience-workshop/petals v1.1.5
v1.1.5: Faster fine-tuning, bug fixes, and more


Highlights

⏱ Faster fine-tuning. Fine-tuning now uses about half the network traffic (tensors are sent in bfloat16 by default) and builds routes using a heuristic that maximizes the swarm's throughput. This should address the timeout errors that could occur during fine-tuning.
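To see why bfloat16 roughly halves traffic: a bfloat16 value is the upper 16 bits of an IEEE float32, so each element takes 2 bytes instead of 4. The sketch below uses only the Python standard library and simple truncation. It is an illustration of the size difference, not the encoding hivemind actually uses, and the helper names are hypothetical.

```python
import struct

def float32_to_bfloat16_bytes(values):
    """Encode floats as bfloat16 by keeping the upper 16 bits of each float32.

    Illustrative only: truncates the mantissa instead of rounding.
    """
    out = bytearray()
    for v in values:
        bits = struct.unpack("<I", struct.pack("<f", v))[0]  # float32 bit pattern
        out += struct.pack("<H", bits >> 16)                 # keep sign, exponent, 7 mantissa bits
    return bytes(out)

def bfloat16_bytes_to_float32(data):
    """Decode bfloat16 bytes back to Python floats by zero-filling the low 16 bits."""
    values = []
    for (half,) in struct.iter_unpack("<H", data):
        values.append(struct.unpack("<f", struct.pack("<I", half << 16))[0])
    return values

# These values are exactly representable in bfloat16, so the round trip is lossless.
tensor = [0.5, -1.25, 3.0, 1024.0]
fp32_payload = struct.pack(f"<{len(tensor)}f", *tensor)
bf16_payload = float32_to_bfloat16_bytes(tensor)

print(len(fp32_payload), len(bf16_payload))  # 16 8
```

Values that need more than 7 mantissa bits lose precision in this format, which is the trade-off for the smaller payload.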

🐞 Bug fixes. On servers, this release fixes out-of-memory errors and network throughput evaluations that could freeze. On clients, it fixes slicing of RemoteSequential and the silent ignoring of unsupported .generate() kwargs, which now raise an error. This release also fixes warnings originating from hivemind.p2p and hivemind.compression.
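The .generate() fix follows a common pattern: reject unknown keyword arguments up front instead of dropping them. A minimal sketch of that pattern is shown here. The set of supported names and the function name are hypothetical and are not taken from the Petals code.

```python
# Hypothetical whitelist; the real set of supported arguments is defined by Petals.
SUPPORTED_GENERATE_KWARGS = frozenset({"max_new_tokens", "do_sample", "temperature"})

def check_generate_kwargs(**kwargs):
    """Return kwargs unchanged, or raise if any name is not in the whitelist."""
    unexpected = sorted(set(kwargs) - SUPPORTED_GENERATE_KWARGS)
    if unexpected:
        raise TypeError(f"Unexpected generate() arguments: {', '.join(unexpected)}")
    return kwargs

check_generate_kwargs(max_new_tokens=5)  # accepted

try:
    check_generate_kwargs(num_beams=4)   # not in the whitelist, so it is rejected
except TypeError as err:
    print(err)
```

Failing loudly here means a typo or an unsupported option surfaces immediately rather than producing output that quietly differs from what the caller asked for.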

🛣️ Updated throughput formula. We have updated the throughput formula to reflect that servers hosting many blocks still run forward and backward passes through only one block at a time. Don't be surprised if your reported throughput is lower than in 1.1.4: the two numbers are not directly comparable!
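As a rough illustration of the new formula, the sketch below divides a server's per-block compute throughput by the average number of blocks a request uses on that server. The function name is hypothetical, and the average of (num_blocks + 1) / 2 is an assumption (a request equally likely to use 1, 2, ..., num_blocks blocks), not something stated in these notes.

```python
def effective_throughput(compute_rps, num_blocks):
    """Hypothetical helper: per-block compute speed scaled by average blocks used.

    Assumes a request uses between 1 and num_blocks blocks with equal
    probability, so the average is (num_blocks + 1) / 2.
    """
    if num_blocks < 1:
        raise ValueError("num_blocks must be at least 1")
    avg_used_blocks = (num_blocks + 1) / 2
    return compute_rps / avg_used_blocks

print(effective_throughput(1000.0, 1))  # 1000.0
print(effective_throughput(1000.0, 9))  # 200.0
```

Under this assumption, a server hosting 9 blocks reports one fifth of its single-block speed, which is why numbers from 1.1.4 and 1.1.5 differ for the same hardware.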

🖼️ Improved lower-level interfaces. We have refactored lower-level interfaces, such as RemoteSequential and RemoteSequenceManager, to be more reliable (e.g. when doing retries) and much easier to use. Some rarely used low-level functions in petals.dht_utils were removed.

What's Changed

  • Fix OOMs happening in case of accelerate >= 0.16.0 by @borzunov in #310
  • Refactor RemoteSequenceManager by @borzunov in #309
  • Update hivemind to 1.1.8, enable efficient bfloat16 encoding by @borzunov in #311
  • Replace .make_sequence(..., mode="random") with mode="max_throughput" by @borzunov in #313
  • Divide compute throughput by average no. of used blocks by @borzunov in #314
  • Raise error for unexpected .generate() kwargs by @borzunov in #315
  • Abort speedtest if it runs too long by @borzunov in #316
  • Bump version to 1.1.5 by @borzunov in #312
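
The switch from mode="random" to mode="max_throughput" in #313 can be pictured with a simple greedy heuristic: at each block, choose the fastest server that hosts it and stay on that server until its span ends. This is a sketch under that assumption and not the algorithm Petals implements. ServerSpan and make_route are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ServerSpan:
    peer_id: str
    start: int         # first hosted block (inclusive)
    end: int           # end of hosted range (exclusive)
    throughput: float  # advertised throughput, higher is better

def make_route(spans, num_blocks):
    """Greedy route: at each position, pick the fastest server covering that block."""
    route = []
    block = 0
    while block < num_blocks:
        candidates = [s for s in spans if s.start <= block < s.end]
        if not candidates:
            raise RuntimeError(f"No server covers block {block}")
        best = max(candidates, key=lambda s: s.throughput)
        stop = min(best.end, num_blocks)
        route.append((best.peer_id, block, stop))
        block = stop
    return route

spans = [
    ServerSpan("A", 0, 4, 50.0),
    ServerSpan("B", 0, 2, 120.0),
    ServerSpan("C", 2, 6, 80.0),
    ServerSpan("D", 4, 6, 30.0),
]
print(make_route(spans, 6))  # [('B', 0, 2), ('C', 2, 6)]
```

A random choice among the same spans could route through the slower servers A and D, so preferring high-throughput servers keeps each hop short and makes timeouts less likely.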

Full Changelog: v1.1.4...v1.1.5
