Sablier is not stateless / HA ready in Kubernetes #484

Open
dixneuf19 opened this issue Jan 8, 2025 · 4 comments
Labels
enhancement New feature or request

Comments

@dixneuf19

Describe the bug

Hi, I joined a new team that is in the process of deploying Sablier to automatically stop and start some software stacks running in Kubernetes, relying on your great open-source software.

My coworker deployed Sablier as a StatefulSet, with only one replica, a PVC, and the file storage feature activated. This setup effectively maintains its state across restarts, making it somewhat functional.

However, it is quite a sub-optimal setup for Kubernetes. Since it is a StatefulSet with one replica, updating Sablier or just losing the pod means the whole Sablier service becomes unavailable for a while.

The StatefulSet with one replica is necessary because the state is saved in memory in the tinykv store. If we have several Sablier pods with a random load balancer (i.e., what you have with a Deployment and N replicas), the state would be inconsistent between pods, leading to premature stops for some apps.

I find all of this a bit brittle for a good Kubernetes deployment, which would ideally use as many stateless pods as possible with a remote distributed store. Another solution is to have a truly stateful Sablier app with built-in clustering. A last one would be going all in on a Kubernetes Operator, using CRDs and Leases to achieve HA and leader election.

Anyway, here are my questions:

  • Are you aware of these limitations regarding using Sablier in Kubernetes? If yes, should we amend the documentation accordingly, and also change the official Helm chart which tries to deploy a Deployment for an inherently stateful app?
  • Do you have any plans to have better support for an HA deployment on Kubernetes, either by having an external kvstore such as Redis, or going more into a Kubernetes operator, with CRD and leases?

I think that your tool is a very interesting approach for on-demand environments and a good way to reduce load in cloud environments. I would be happy to help better support this kind of usage on Kubernetes.

Here is some context, but my point is quite agnostic of the version/reverse proxy used.

Context

Sablier version: 1.8.1
Provider: Kubernetes
Reverse proxy: Traefik v2.11
Running as a StatefulSet in Kubernetes

Anyway, thanks for your time on this FOSS software!

dixneuf19 added the bug label Jan 8, 2025
@acouvreur
Member

acouvreur commented Jan 8, 2025

Hello @dixneuf19 !

Are you aware of these limitations regarding using Sablier in Kubernetes? If yes, should we amend the documentation accordingly, and also change the official Helm chart which tries to deploy a Deployment for an inherently stateful app?

I am aware of these limitations. We should definitely change the chart to use a StatefulSet.

Do you have any plans to have better support for an HA deployment on Kubernetes, either by having an external kvstore such as Redis, or going more into a Kubernetes operator, with CRD and leases?

I initially planned to go with Redis but didn't want people to be bothered with setting it up. Still, I think this project needs to be able to run with a Redis backend (replacing tinykv). I will look into that in the coming months.

For the Kubernetes operator, I planned on doing it, but mainly for auto-configuration based on known reverse proxies. What did you have in mind exactly? Also, how would you imagine configuring this through CRDs?

As for Leases, I'm not familiar with them. Could you please detail what you have in mind?

Thanks!

@dixneuf19
Author

dixneuf19 commented Jan 9, 2025

Hi,

Regarding the Redis backend, I think it would be a good and easy solution, since there is already a Store interface in the code.
Furthermore, running Redis alongside the Sablier app in Kubernetes is not much of a bother. For example, Argo CD does it: its official Helm chart includes Bitnami Redis as a subchart. We should just be careful around persistence.

On the operator, it could either be:

  • a full-fledged operator to select and configure the shutdown behavior per namespace/app, and even configure known reverse proxies. However, this last point might conflict with the existing way of configuring their plugins.
  • just CRDs as a way to store some state in Kubernetes etcd. For example, you could have a custom resource for each group/session, and save the last request/timeout info in it. If some actions could conflict when several Sablier instances act at the same time (for example, shutting down a namespace), Kubernetes Leases are a simple way to elect a leader without going full Raft or another complex cluster protocol.

Anyway, the first solution should be the simplest for the moment!

@gedw99

gedw99 commented Jan 10, 2025

You can use NATS, a Go system that is much faster than Redis and very easy to integrate via nats.go.

Unlike Redis, it can do global super clusters, so it really scales out.

I run it as a global super cluster in 3 DCs with 3 NATS servers in each DC. Zero failure points.

Any DC can go down and clients route to the nearest DC automagically.

No BGP anycast needed.

Redis is really pretty old hat these days IMHO. Sorry, but just being frank.

@dixneuf19
Author

Yes, Redis might be a "Maslow’s Hammer" for the DevOps community: a familiar and overused tool. And the whole license change and the Valkey alternative should challenge this a bit.

Honestly, as long as the solution is lightweight and easy to operate, I am fine with it.

acouvreur added the enhancement label and removed the bug label Feb 2, 2025
3 participants