Sablier is not stateless / HA ready in Kubernetes #484

dixneuf19 · 2025-01-08T14:48:45Z

Describe the bug

Hi, I recently joined a team that is deploying Sablier to automatically stop and start some software stacks running in Kubernetes, relying on your great open-source software.

My coworker deployed Sablier as a StatefulSet, with only one replica, a PVC, and the file storage feature activated. This setup effectively maintains its state across restarts, making it somewhat functional.

However, it is quite a sub-optimal setup for Kubernetes. Since it is a StatefulSet with one replica, updating Sablier or just losing the pod means the whole Sablier service becomes unavailable for a while.

The StatefulSet with one replica is necessary because the state is saved in memory in the tinykv store. If we have several Sablier pods with a random load balancer (i.e., what you have with a Deployment and N replicas), the state would be inconsistent between pods, leading to premature stops for some apps.
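To make the inconsistency concrete, here is a minimal sketch in Go. It is an illustration only: the `replica` type, the `my-app` name, and the 10-minute session duration are assumptions made for this example, not Sablier's actual code. Each replica keeps its own in-memory expiration map, and requests are spread across replicas.

```go
package main

import (
	"fmt"
	"time"
)

// replica models one Sablier pod that keeps session expirations in its own
// memory. Hypothetical stand-in, not Sablier's actual data structure.
type replica struct {
	name     string
	expiries map[string]time.Time
}

func newReplica(name string) *replica {
	return &replica{name: name, expiries: make(map[string]time.Time)}
}

// touch records a request for app at time now and extends its session by ttl.
func (r *replica) touch(app string, now time.Time, ttl time.Duration) {
	r.expiries[app] = now.Add(ttl)
}

// shouldStop reports whether this replica believes app's session has expired.
func (r *replica) shouldStop(app string, now time.Time) bool {
	exp, ok := r.expiries[app]
	return ok && !now.Before(exp)
}

// demoSplitState sends two requests to two different replicas and returns
// what each replica concludes 11 minutes after the first request.
func demoSplitState() (aStops, bStops bool) {
	start := time.Date(2025, 1, 8, 12, 0, 0, 0, time.UTC)
	ttl := 10 * time.Minute // assumed session duration
	a, b := newReplica("pod-a"), newReplica("pod-b")

	// The load balancer sends the first request to pod-b, a later one to pod-a.
	b.touch("my-app", start, ttl)
	a.touch("my-app", start.Add(8*time.Minute), ttl)

	// At start+11m the app was used 3 minutes ago, yet pod-b never saw that.
	now := start.Add(11 * time.Minute)
	return a.shouldStop("my-app", now), b.shouldStop("my-app", now)
}

func main() {
	aStops, bStops := demoSplitState()
	fmt.Println("pod-a wants to stop my-app:", aStops) // false
	fmt.Println("pod-b wants to stop my-app:", bStops) // true
}
```

In this model pod-b would stop the app even though it was requested three minutes earlier through pod-a, which is the premature stop described above.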

I find all of this a bit brittle for a good Kubernetes deployment, which would ideally use as many stateless pods as possible with a remote distributed store. Another solution would be a truly stateful Sablier app with built-in clustering. A last one would be going all in on a Kubernetes operator, using CRDs and Leases to achieve HA and leader election.

Anyway, here are my questions:

Are you aware of these limitations regarding using Sablier in Kubernetes? If yes, should we amend the documentation accordingly, and also change the official Helm chart which tries to deploy a Deployment for an inherently stateful app?
Do you have any plans to have better support for an HA deployment on Kubernetes, either by having an external kvstore such as Redis, or going more into a Kubernetes operator, with CRD and leases?

I think that your tool is a very interesting approach for on-demand environments and a good way to reduce load in cloud environments. I would be happy to help better support this kind of usage on Kubernetes.

Here is some context, but my point is quite agnostic of the version/reverse proxy used.

Context

Sablier version: 1.8.1
Provider: Kubernetes
Reverse proxy: Traefik v2.11
Running as a StatefulSet in Kubernetes

Anyway, thanks for your time on this FOSS software!

acouvreur · 2025-01-08T23:39:10Z

Hello @dixneuf19 !

> Are you aware of these limitations regarding using Sablier in Kubernetes? If yes, should we amend the documentation accordingly, and also change the official Helm chart which tries to deploy a Deployment for an inherently stateful app?

I am aware of these limitations, and we should definitely change the chart to use a StatefulSet.

> Do you have any plans to have better support for an HA deployment on Kubernetes, either by having an external kvstore such as Redis, or going more into a Kubernetes operator, with CRD and leases?

I initially planned to go with Redis but didn't want people to be bothered by setting it up. Still, I think this project needs to be able to run with a Redis backend (replacing tinykv). I will look into that in the coming months.

For the Kubernetes operator, I planned on doing it, but mainly for auto-configuration based on known reverse proxies. What did you have in mind exactly? Also, how would you imagine configuring this through CRDs?

As for Leases, I'm not familiar with them. Could you please detail what you have in mind?

Thanks!

dixneuf19 · 2025-01-09T16:20:51Z

Hi,

Regarding the Redis backend, I think it would be a good and easy solution, since there is already a Store interface in the code.
Furthermore, running Redis alongside the Sablier app in Kubernetes is not much of a bother. For example, Argo CD does it: it includes Bitnami Redis as a subchart in its official Helm chart. We should just be careful around persistence.

As for the operator, it could be either:

a full-fledged operator to select and configure the shutdown behavior per namespace/app, and even to configure known reverse proxies. However, this last point might conflict with the existing way of configuring their plugins.
just CRDs used as a way to store some state in Kubernetes etcd. For example, you could have a custom resource for each group/session and save the last request/timeout info in it. If some actions could conflict when several Sablier instances act at the same time (for example, shutting down a namespace), Kubernetes Leases are a simple way to elect a leader without going full Raft or another complex cluster protocol.

Anyway, the first solution (the Redis backend) should be the simplest for the moment!
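The Lease idea can be sketched as a simplified in-memory model in Go. This is not client-go code and makes no API calls: the `lease` type below only mirrors the holder, renew-time, and duration fields that a Kubernetes Lease carries, and the pod names and 15-second duration are assumptions. In a real cluster, the atomicity of the update would come from the API server's optimistic concurrency rather than from this single-process code.

```go
package main

import (
	"fmt"
	"time"
)

// lease is a simplified, in-memory model of a leader-election lease.
// Hypothetical type for illustration; a real Lease lives in the API server.
type lease struct {
	holder    string
	renewedAt time.Time
	duration  time.Duration
}

// tryAcquire lets candidate take or renew the lease. It succeeds when the
// lease has no holder, has expired, or is already held by candidate.
func (l *lease) tryAcquire(candidate string, now time.Time) bool {
	expired := !now.Before(l.renewedAt.Add(l.duration))
	if l.holder == "" || expired || l.holder == candidate {
		l.holder = candidate
		l.renewedAt = now
		return true
	}
	return false
}

// demoLease replays a short timeline and returns the result of each attempt.
func demoLease() [4]bool {
	t0 := time.Date(2025, 1, 9, 16, 0, 0, 0, time.UTC)
	l := &lease{duration: 15 * time.Second} // assumed lease duration

	return [4]bool{
		l.tryAcquire("pod-a", t0),                     // pod-a becomes leader
		l.tryAcquire("pod-b", t0.Add(5*time.Second)),  // rejected: lease still valid
		l.tryAcquire("pod-a", t0.Add(10*time.Second)), // pod-a renews
		l.tryAcquire("pod-b", t0.Add(30*time.Second)), // pod-a stopped renewing, pod-b takes over
	}
}

func main() {
	fmt.Println(demoLease()) // [true false true true]
}
```

Only the current holder would perform conflicting actions such as shutting down a namespace; the other replicas keep serving requests and retry the lease periodically.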

gedw99 · 2025-01-10T04:13:39Z

You can use NATS, a Go system that is much faster than Redis and very easy to integrate via nats.go.

Unlike Redis, it can do global superclusters, so it really scales out.

I run it as a global supercluster across 3 DCs, with 3 NATS servers in each DC. Zero points of failure.

Any DC can go down and clients automatically route to the nearest DC.

No BGP anycast needed.

Redis is really pretty old hat these days, IMHO. Sorry, just being frank.

dixneuf19 · 2025-01-16T09:39:09Z

Yes, Redis might be a "Maslow’s hammer" for the DevOps community: a familiar and overused tool. The whole license change and alternatives such as Valkey should challenge this a bit.

Honestly, as long as the solution is lightweight and easy to operate, I am fine with it.

dixneuf19 added the "bug" (Something isn't working) label Jan 8, 2025

acouvreur added the "enhancement" (New feature or request) label and removed the "bug" (Something isn't working) label Feb 2, 2025

acouvreur mentioned this issue Feb 2, 2025

[FEATURE]: Make Sablier HA #498

Open
