io error: broken pipe when accessing redis-backed KV store #2974
I have now tried to
Still the same error and no additional output in the logs. Removing the pod and letting it get recreated solves it until it happens again |
Adding a
|
cc @ThorstenHans for Azure wisdom and @fibonacci1729 for Redis KV knowledge |
@tfenster Thanks for the detailed error description and all the information for reproducing this one. I did the following:
I see all requests resulting in HTTP 200 (I simply return the list of keys as the response). With runtimes older than Spin 2.8.0, we had a similar issue in the context of sqlite (see fermyon/enterprise-architectures-and-patterns#18). That said, could you check which node-installer image is used for deploying? You can find the version by executing this command:

KWASM_NS=kwasm
KWASM_RELEASE_NAME=kwasm-operator
helm get values -n $KWASM_NS $KWASM_RELEASE_NAME

My setup works with |
@ThorstenHans can you interact with it, then let it run for a bit and then try again? It works for me initially as well, but after some time (maybe 15 minutes, not sure), it breaks. I would also be happy to create an instance for you and give you full access, if that helps. The node-installer image is the same as yours; I created the cluster only two days ago.
One other thought: which Go and TinyGo versions are you using? Might that be a cause for issues? |
That's the trick. After waiting some time I see the same error. I'll try to spot how / where this happens |
Great, thanks |
@tfenster you could use CosmosDB as the KV store; I tested it with the constraints mentioned above, and it works (even if the app is idle for quite some time). With a CosmosDB instance deployed, you create a database and a container using your desired configuration; the only requirement is that the partition key is set to
The runtime configuration file should look similar to this:

[key_value_store.default]
type = "azure_cosmos"
key = "your_key_here=="
account = "your_account_name"
database = "your_db_name"
container = "your_container_name"

@fibonacci1729 this is a good indicator that there is something broken in the factor for |
@ThorstenHans that's what I first tried, but I ran into the RU limits very quickly for the cheap instances, so I would prefer Redis |
At a glance, this seems like a quirk of using |
Not sure if and where I can configure it, but what would be a timeout that could fix this? |
Actually, after a slightly longer glance, I think the issue is that

Re: a reasonable timeout on the server, I'm not entirely sure. I think any reasonable timeout would just be delaying the behavior you're currently seeing. |
ok, then I'll patiently wait for a fix 😊 Thanks a lot for the quick check! |
@fibonacci1729 Would you already have an idea if and when a fix for this issue could become available? |
Hey @tfenster! Just context switching to this now. I'll have a better idea by the EOD once I can reproduce and poke around a bit to confirm the fix. I'll update here. |
Signed-off-by: Brian H <[email protected]>
fix #2974 -- use redis::aio::ConnectionManager
Thanks a lot @fibonacci1729 for working on it so quickly! Do I now need to wait for a release or can I already somehow get the fix? |
First, a new Spin release must be cut; then containerd-shim-spin must update the corresponding dependency and cut a release as well. I'll reach out to the corresponding people and try to get a timeframe for both. |
Hey @tfenster! We will kick off a patch release of Spin on Monday after the AM public Spin meeting, which should be available by EOD. I imagine a release of the containerd-shim-spin should follow a similar timeline the next day. We'll try to get you situated by mid-week next week! |
Great, thanks! |
Mentioning that we also have canary builds of Spin, so you could at least try things out locally by grabbing Spin from https://github.com/fermyon/spin/releases. |
Thanks for the hint @radu-matei! I tried it, and to confirm your fix, @fibonacci1729: it works locally now for > 1h with the canary build. The latest release would definitely have broken in that time, so it looks like it is fixed 🎉 |
That’s great to hear, thanks for the confirmation, @tfenster! Enjoy your weekend! |
Signed-off-by: Brian H <[email protected]> (cherry picked from commit ec11ba2)
[Backport v3.1] fix #2974 -- use redis::aio::ConnectionManager
@tfenster quick update, @kate-goldenring created a corresponding PR over on |
Here is an updated node installer image you can try out. It has the Spin patch in it:

# Remove annotation from all nodes so we can reapply it later to trigger a new install of the shim
kubectl annotate node --all kwasm.sh/kwasm-node=-
# Upgrade node installer to latest shim
helm upgrade --install \
kwasm-operator kwasm/kwasm-operator \
--namespace kwasm \
--create-namespace \
--set "kwasmOperator.installerImage=ghcr.io/spinkube/containerd-shim-spin/node-installer:20250114-015325-gf938c61"
# Reapply annotation to provision all nodes
kubectl annotate node --all kwasm.sh/kwasm-node=true

You should see a new |
@kate-goldenring Thanks, I'll try and get back. Do I also need to rebuild my application with the new spin version? |
@tfenster you can use your existing app -- no need to rebuild. It will use the newly updated host components of the Spin runtime. |
Looks good! It works now for ~1h; the previous version never made it that long. Thanks for the quick solution and turnaround to everyone involved! @itowlson @ThorstenHans @fibonacci1729 @radu-matei @kate-goldenring |
Great to hear it is working! We also released the shim with the fix, so you can update to shim v0.18.0:

# Remove annotation from all nodes so we can reapply it later to trigger a new install of the shim
kubectl annotate node --all kwasm.sh/kwasm-node=-
# Upgrade node installer to latest shim
helm upgrade --install \
kwasm-operator kwasm/kwasm-operator \
--namespace kwasm \
--create-namespace \
--set "kwasmOperator.installerImage=ghcr.io/spinkube/containerd-shim-spin/node-installer:v0.18.0"
# Reapply annotation to provision all nodes
kubectl annotate node --all kwasm.sh/kwasm-node=true |
Did that as well, thanks. A little nit to add in case anyone comes across this: The |
I have an application (tfenster/verified-bluesky) that worked fine on your cloud, but due to the limit of 1024 entries in the KV store, I moved it to Azure (Azure Kubernetes Service, Azure Cache for Redis). Since then, it works for some time (I didn't check exactly, but it seems like a few minutes), but then calls to the KV store fail with "io error: broken pipe". I have narrowed it down to a small piece of code as the easiest repro, but other places accessing the store also fail with the same error: the call to kv.OpenStore still seems to work, but store.GetKeys() fails with the "io error: broken pipe" response. For full context, see main.go. The logs only show this error and nothing more.
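A minimal sketch of such a handler using the Spin Go SDK (the import paths, handler wiring, and the store name "default" are assumptions for illustration, not the exact code from main.go):

// Hypothetical minimal repro sketch; not the actual main.go from tfenster/verified-bluesky.
package main

import (
	"fmt"
	"net/http"

	spinhttp "github.com/fermyon/spin/sdk/go/v2/http"
	"github.com/fermyon/spin/sdk/go/v2/kv"
)

func init() {
	spinhttp.Handle(func(w http.ResponseWriter, r *http.Request) {
		// Opening the store still succeeds, even after the app has been idle for a while.
		store, err := kv.OpenStore("default")
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		// This is the call that fails with "io error: broken pipe"
		// once the underlying Redis connection has been dropped.
		keys, err := store.GetKeys()
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		fmt.Fprintf(w, "%v", keys)
	})
}

func main() {}

Hitting the route right after deployment and again after the app has been idle for a while is enough to show the difference.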
The setup of the redis backend and creation of the runtime config looks like this
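As a sketch of what such a runtime config entry looks like for a Redis-backed default store, with placeholder values rather than the actual endpoint and credentials (the TLS port 6380 is the Azure Cache for Redis default):

[key_value_store.default]
type = "redis"
# Placeholder URL: substitute the real access key and cache name.
url = "rediss://:<access-key>@<cache-name>.redis.cache.windows.net:6380"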
I then bring that to this secret (content obviously redacted)
Deploying the Spin app looks like this: spinapp.yaml, and the latest build to create and push the OCI artifact is here.
Let me know if I can share anything else
Not sure if that is relevant, but I also captured the output of spin --version and spin plugins list --installed from my dev environment, as asked for in the issue template.