pydio/cells v4.0.0-rc5 on GitHub

Release Candidate 5 for Cells v4

Cells v4 is a major leap forward for clustered deployments. It features a brand-new microservices engine for unparalleled performance, and a new dependency management making it much easier to delegate core services (such as configs, registry, caches, etc.) to cloud standards.

RC3, RC4, RC5

We have released many packages since RC2 to validate and fix many glitches. This is really getting closer to stability !

For developers, one important change occurred during the last weeks: we now rely on Go18 (tested on Go19) to build the code. All dependencies versions were bumped for security. This should not impact users or API consumers though.

We also re-organized some CLI commands to make the hierarchy more consistent.

Moving to Go modules at last

The V4 branch was a long-term development project that started with our desire to adopt Go modules. Cells was written at a time when modules were not yet part of the language (we used the vendor folder...). This made the migration to modules complex: Go's auto-migration tool was not usable for us (it simply crashed), and we ended up recreating the code base from scratch using modules, re-adding our libraries and dependencies one by one.

As expected, this redesign led to a big reduction in our dependencies, a huge simplification of the architecture and, as a result, a lot of interesting features!

To make a long story short:

We got rid of the microservices framework we were using (Micro), regaining control of the gRPC layer.
We updated the main dependencies to their latest versions, including Caddy, Hydra and Minio.
We make a better use of resources within a process by sharing "servers" (http|gRPC) between "services". This greatly reduces the number of network ports used by Cells. This has a huge impact on performance, typically for single-node deployments.
These "servers" and "services" are much better managed and can be more easily started/stopped on different nodes, making cluster deployments easier than ever (see below).

MongoDB as a drop-in replacement for all services based on on-file storages

Historically we've been using BoltDB and BleveSearch and we like it. They are pure GO key/value stores and indexers and it allows Cells to provide full indexing functionality without external dependencies. By default, the search engine, activity stream or logs use these JSON document shops to provide rich, out-of-the-box functionality. But these stores are disk and memory intensive, and while they are suitable for small and medium-sized deployments, they create bottlenecks for large deployments.

We therefore looked at alternatives for implementing new 'drivers' for the data abstraction layer of each of these services, and chose MongoDB as a feature-rich, scalable and indexed JSON document store. All services using BoltDB/Bleve as storage now offer an alternative MongoDB implementation, a migration path from one another, and the ability to scale horizontally drastically.

File-based storage is still a very good option for small/medium instances, avoiding the need to manage another dependency, but the Cells installation steps will now offer to configure a MongoDB connection to avoid the need to migrate data in the long term. Note that Mongo does not replace MySQL storages, that DB is still required for Cells.

Cluster Me Please!

Cells was developed from day one as a set of microservices, but we had to face the fact that deploying Cells in a multi-node, highly available environment was extremely complex and almost nobody could really make it work... The v4 was the perfect time to tackle this problem!

We took a step back, learned our lesson from v1 to v3, and looked closely at cloud-native DevOps best practices (yes K8s, we're looking at you). The main objective was: how to create a fully stateless instance of Cells (image, container, you name it...) that can be easily distributed and replicated.

Similar to the move from BoltDB to Mongo, we implemented DAOs to decouple and externalize many layers, making Cells V4 finally cloud-ready. To achieve that without re-inventing the wheel, Cells V4 stands on the shoulders of giants :

ETCD for configs and services registry
NATS for message broadcasting (pub/sub)
REDIS for shared cache
MONGO for JSON documents
HASHICORP VAULT for secrets and certificates management

Again, all these are optional and Cells can still be deployed as a standalone, dependency-free binary on a Rasperry Pi (even the older 32bits versions) !

Migration and testing

Single-node deployments

Upgrade process is standard and should be straight-forward (and we would really love to hear from you on that).
There are a couple of important notes during this upgrade :

Hydra JWKs will be regenerated in the DB, with effect of invalidating all existing authentication token. You will be logged out after upgrade, and if you are using Personal Access Tokens, you will have to regenerate new ones.
Cells Sites Bind URL should not use a domain name but should declare a binding PORT, eventually including the Network Interface IP. If you have connection issue after upgrade, make sure to edit sites (cells configure sites) to bind to e.g. 0.0.0.0:8080 instead of domain.name:8080

Migrating Bolt/Bleve storages to Mongo

Migrate from existing on-file storage to MongoDB using the following steps:

Install MongoDB, currently tested against version 5.0.X, prepare a Mongo database for cells data
Stop cells, as Bolt/Bleve files must not be opened by the application during the migration process
Use cells admin config db add command to configure a connection:
- Setup connection using mongo connection string like mongodb://user:pass@ip:port/dbname
- Accept prompt for using this connection as default document DSN
- Accept prompt to perform data migration from existing bolt/bleve files to mongo. This can take some time.
Now restart Cells. You should see "Successfully pinged and connected to MongoDB" in the logs.
As search engine data are not migrated, you have to relaunch indexation on the pydio.grpc.search service using cells admin resync --service=pydio.grpc.search --path=/

Now you should be good to go. Try searching for * in Cells search engine, you should have blazing fast results.

Cluster deployments

We will provide dedicated blog posts on this topic very soon. You can already have a look at this sample Docker Compose file that shows the required dependencies and how to specify their endpoints to Cells using environment variables.

Change log

You can find a summary of the change log here.

pydio/cells v4.0.0-rc5 Cells v4 release candidate on GitHub