v1.1.0 Release Notes
📢 Introducing Bacalhau v1.1.0 - Unleash the Power!
We are thrilled to announce the release of Bacalhau v1.1.0, a significant milestone in our quest for unparalleled computing capabilities. Packed with exciting new features like Full Fleet Targeting, Configurable Compute Timeouts, persistent storage, integration with private IPFS swarms, and API TLS support, this release is sure to take your computational experience to new heights! 🚀
But that's not all! We invite you to explore the experimental features of this release, such as Long-Running Jobs, as we continue to push the boundaries of computational possibilities.
So, what are you waiting for? Upgrade to Bacalhau v1.1.0 and unlock a world of infinite possibilities in distributed computing! 🌟
```bash
curl https://get.bacalhau.org/install.sh | bash
```
New features
Full Fleet Targeting
Jobs can now target all nodes in a network simultaneously, allowing for efficient, parallel operation of jobs that need to query or modify an entire fleet.
Full fleet jobs are perfect for fleet management, allowing an operator to quickly understand the state of all of their nodes at once with a single command.
Full fleet jobs will only succeed if all known nodes in a network can be reached and can execute the job successfully. Jobs can still be targeted at a subset of the fleet by using labels or resource requirements.
Pass the `--target=all` parameter to any Bacalhau job command or set `Deal.TargetAll: true` in an existing Bacalhau job spec.
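For example, a quick fleet-wide health check might look like the sketch below; `--target=all` is taken from these notes, while the image, command, and `--` separator are illustrative:

```bash
# Run a trivial command on every node at once; the job only succeeds
# if every known node can be reached and executes it successfully.
bacalhau docker run --target=all ubuntu -- uname -a
```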
New node CLI and APIs
A new CLI command and accompanying APIs have been introduced, allowing users to easily list the nodes in a network and see what compute resources are available.
Use the new command `bacalhau node list` to get a tabular output of all known nodes. You can then use `bacalhau node describe` to get in-depth output about a specific node.
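For example (the node ID passed to `describe` is a placeholder):

```bash
# Tabular overview of every node known to the network.
bacalhau node list

# Detailed information about a single node, addressed by its ID.
bacalhau node describe QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco
```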
Configurable Timeouts
Jobs can now last for days or weeks, enabling large computations that require longer processing times.
By default, compute nodes no longer enforce an execution timeout, and jobs default to the longest allowed timeout. Job submitters can still request a timeout using the `--timeout` flag or the `Timeout` field in their job spec.
Node operators can still choose to limit the maximum timeout allowed by passing the `--max-timeout` flag to the `serve` command or by specifying the new `Node.Compute.Capacity.JobTimeouts.MaxJobExecutionTimeout` property in their config file.
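As a sketch, the submitter-side and operator-side controls might look like this; the values and their unit (assumed here to be seconds) are illustrative, so check `--help` for the accepted format:

```bash
# Submitter: request a week-long timeout for a single job
# (image and command are placeholders).
bacalhau docker run --timeout=604800 ubuntu -- ./train.sh

# Operator: cap the maximum execution timeout a compute node will accept.
bacalhau serve --max-timeout=1209600
```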
Richer Node Configuration
We're excited to unveil enhanced configuration options in Bacalhau v1.1.0! With a heightened focus on flexibility, we've expanded the ways you can configure Bacalhau, whether it be via a configuration file, command-line flags, or environment variables.
The new release introduces a persistent configuration file that provides more flexibility and control over node configurations. Read the documentation for how to get started with configuration files.
Key Changes from v1.0.3 to v1.1.0:
- The enriched `config.yaml` now ships with a full set of default configuration values, an improvement on the empty version in v1.0.3.
- Event and Libp2p tracing is no longer activated by default. Enable it by specifying paths via `EventTracerPath` and `Libp2PTracerPath` in `config.yaml`.
- The node’s private key is no longer called `private_key.1235` and is now named `libp2p_private_key` by default. Configure its path with `Libp2PKeyPath` in `config.yaml`. `user_id.pem` remains unchanged; direct its location using `KeyPath` in `config.yaml`.
- The compute state directory has been renamed from `execution-state-<NODE_ID>` to `<NODE_ID>-compute`, and, apart from `jobStats.json`, it now also includes `executions.db`, a BoltDB database used when running in persistent storage mode. Define its path using `ExecutionStore.Path` in `config.yaml`.
- New directories include `<NODE_ID>-requester` (stores the state for the requester node using BoltDB), `executor_storages` (hosts data for Bacalhau storage types), and `plugins` (houses executor plugin binaries). Configure their paths respectively via `JobStore.Path`, `ComputeStoragePath`, and `ExecutorPluginPath` in `config.yaml`.
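As a rough sketch, a `config.yaml` touching these settings might look like the fragment below. The nesting of the keys is an assumption based on the dotted property names above, and every path is illustrative; check the defaults Bacalhau writes to your repo for the authoritative structure.

```yaml
# Illustrative config.yaml fragment; key nesting and all paths are assumptions.
Libp2PKeyPath: /home/user/.bacalhau/libp2p_private_key    # node's libp2p identity
KeyPath: /home/user/.bacalhau/user_id.pem                 # client identity key
EventTracerPath: /tmp/bacalhau-event-trace.json           # opt back in to event tracing
Libp2PTracerPath: /tmp/bacalhau-libp2p-trace.json         # opt back in to libp2p tracing
ExecutionStore:
  Path: /home/user/.bacalhau/<NODE_ID>-compute/executions.db   # compute-side BoltDB
JobStore:
  Path: /home/user/.bacalhau/<NODE_ID>-requester/state.db      # requester-side BoltDB
ComputeStoragePath: /home/user/.bacalhau/executor_storages     # executor storage data
ExecutorPluginPath: /home/user/.bacalhau/plugins               # executor plugin binaries
```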
⚠️ Note: there are optional migration steps for existing Bacalhau users who want to keep their previous configuration. See the end of this note for how to migrate.
Support for TLS on public APIs
TLS certificates for serving client-facing APIs are now supported, ensuring secure and encrypted communication between Bacalhau clients and requester nodes.
To use a TLS certificate to encrypt communication, you can:
- Configure automatic certificates from Let’s Encrypt by passing `--autocert=<your-hostname>` and ensuring the Bacalhau binary can respond to challenges by running `sudo setcap CAP_NET_BIND_SERVICE+ep $(which bacalhau)`.
- Pass a certificate to `--tlscert` and the corresponding private key to `--tlskey`.
By default, if none of the above options are used, the server will continue to serve its API endpoints over HTTP.
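A sketch of both options applied to the `serve` command; the hostname and file paths are placeholders:

```bash
# Option 1: automatic certificates from Let's Encrypt. The setcap call
# lets the non-root binary bind the privileged ports used for challenges.
sudo setcap CAP_NET_BIND_SERVICE+ep $(which bacalhau)
bacalhau serve --autocert=api.example.com

# Option 2: bring your own certificate and private key.
bacalhau serve --tlscert=/etc/bacalhau/server.crt --tlskey=/etc/bacalhau/server.key
```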
Persistent Storage of Jobs and Executions
Compute and requester nodes now support persistent storage, ensuring data integrity and allowing for long-term job and execution audit records. This feature is now switched on by default and records are persisted to the Bacalhau repository.
See the documentation for how to configure persistence.
Improved Error Messages
Clearer error messages are now displayed when no node is available to run a job, making troubleshooting easier and more efficient.
Instead of receiving ‘not enough nodes to run the job’, users will now get more specific help messages, such as ‘Docker image does not exist or repo is inaccessible’ or ‘job timeout exceeds the maximum allowed’.
Fine-Grained Control Over Image Entrypoint and Parameters
Users now have finer control over the entrypoint and parameters passed to a Docker image. Previously, Bacalhau would ignore the default entrypoint of the image and replace it with the first argument after `bacalhau docker run <image>`. Now, the image's default entrypoint is used and all of the positional arguments are passed as the command to that entrypoint.
The entrypoint can still be explicitly overridden by using the `--entrypoint` flag or by setting the `Entrypoint` field in a Docker job spec.
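A sketch of the new behavior; the images, commands, and exact `--entrypoint` usage are illustrative:

```bash
# v1.1.0: the image's own ENTRYPOINT is kept, and the positional
# arguments become the command passed to it.
bacalhau docker run ubuntu echo "hello"

# Explicitly replace the entrypoint when that is what you need.
bacalhau docker run --entrypoint /bin/echo ubuntu hello
```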
GPU Support Inside Docker Containers
Bacalhau now has the capability to automatically utilize GPUs when the Bacalhau node is running inside a Docker container. Ensure that the Bacalhau node is started with GPU capability by passing `--gpus=all` to `docker run`, and Bacalhau nodes will automatically detect GPUs running on the host machine.
Submit a job to a node running inside Docker using `bacalhau docker run --gpu=1` to run the job in a new GPU-enabled container on the host.
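For example; the container image names below are illustrative:

```bash
# Start a Bacalhau node inside Docker with access to the host's GPUs.
docker run --gpus=all ghcr.io/bacalhau-project/bacalhau:v1.1.0 serve

# Submit a job that requests one GPU.
bacalhau docker run --gpu=1 nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```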
Support for Private IPFS Clusters
Integration with private IPFS clusters has been added, providing enhanced security and control over data storage and retrieval.
To connect to a private swarm, pass the path to a swarm key to `--ipfs-swarm-key`, set the `BACALHAU_IPFS_SWARM_KEY` environment variable, or configure the `Node.IPFS.SwarmKeyPath` configuration property.
When connecting to a private swarm, Bacalhau will no longer bootstrap using or connect to public peers and will rely on the swarm for all data retrieval.
These steps are also necessary on clients that use `bacalhau get` to download from a private IPFS swarm. Note that these steps are not needed when using the `--ipfs-connect` flag, which can already connect to IPFS nodes running in a private swarm.
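A sketch of the equivalent ways to supply the key; the path is a placeholder, and whether the environment variable takes a path or the key contents is an assumption:

```bash
# Flag form, when starting a node.
bacalhau serve --ipfs-swarm-key=/etc/bacalhau/swarm.key

# Environment variable form; also used by clients running `bacalhau get`.
export BACALHAU_IPFS_SWARM_KEY=/etc/bacalhau/swarm.key

# Or set the Node.IPFS.SwarmKeyPath property in config.yaml.
```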
New Experimental Features
All of these features are experimental, meaning that their APIs are liable to change in an upcoming release. You are encouraged to try out these features and provide feedback or bug reports on Bacalhau Slack.
Long-Running Jobs
Bacalhau jobs can now run indefinitely and will automatically restart when nodes come back online, allowing for continuous and uninterrupted processing.
Long-running jobs allow compute workloads to process data that arrives continuously, making them perfect for tasks such as pre-filtering logs, processing real-time analytics, or working with edge sensors.
With the introduction of long-running jobs, ML inference tasks can now operate in a "warm-boot" environment. This means that the necessary resources and dependencies are already loaded, significantly reducing the time taken to run an inference job.
With this experimental feature, you can now unleash the power of Bacalhau to handle dynamic and ever-changing data streams, ensuring continuous and uninterrupted processing of your computational workloads.
Deprecated Features
Estuary
The Estuary publisher is no longer supported in this release. Compute nodes will now reject jobs that require the Estuary publisher.
Verification
The Verifiers feature is no longer supported in this release. Compute nodes will silently ignore verification requirements on jobs.
⚠️ Migration steps
Users who wish to continue using their previous Bacalhau private key or their previous Bacalhau Client ID as their identity will need to either:
- Rename `private_key.1235` to `libp2p_private_key`, or
- Modify `config.yaml` to use the previous key by editing the value of `Libp2PKeyPath` to point to its path.
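A minimal sketch of the rename option, assuming your repo lives at the default `~/.bacalhau` location:

```bash
# Rename the pre-v1.1.0 key so the new default name picks it up.
mv ~/.bacalhau/private_key.1235 ~/.bacalhau/libp2p_private_key
```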
Up Next
These upcoming features aim to provide users with increased flexibility and convenience in their computational workflows while maintaining a focus on privacy and security.
User-definable executor plugins
In the next release, users will have the opportunity to experiment with pluggable executors, which will allow them to run jobs without needing to worry about Docker images. The first executor we will make available will be for Python, and it will be able to execute Python scripts using the command `bacalhau run python script.py`. This feature aims to provide a more seamless and convenient experience for running jobs.
Cluster bootstrapping and private data
Additionally, on the roadmap for future releases, we are planning to introduce easier bootstrapping of Bacalhau clusters. This will simplify the process of setting up and configuring Bacalhau clusters, making it more accessible for users. Furthermore, we are also working on adding support for private data and jobs, ensuring enhanced security and control over sensitive information.