CASMTRIAGE-7308: Add Troubleshooting Steps #5430

arka-pramanik-hpe · 2024-10-08T15:45:08Z

Istio Pods Crashing Post-Upgrade due to `fs.inotify` Limits

Description

CASMTRIAGE-7308
After the Istio upgrade, the nodes have not yet been rebooted into the new image in which the `fs.inotify.max_user_instances` and `fs.inotify.max_user_watches` limits are increased. When the Istio pods restart, they may try to create more inotify instances, or watch more files, than the current kernel limits allow. The pods can then crash or fail because they cannot watch the files or directories they depend on, which may be critical for service mesh operations such as traffic management, logging, and configuration changes.
A power outage or a node reboot in the middle of the upgrade causes the same problem, because the pods restart on nodes whose kernel parameters have not yet been updated.
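A minimal diagnostic sketch, assuming a Linux node with procfs mounted at `/proc`. It is not part of this PR's documented procedure. It reads the live `fs.inotify` limits from `/proc/sys/fs/inotify/` and approximates how many inotify instances each UID currently holds by counting file descriptors whose link target is `anon_inode:inotify`. The function names and the `warn_ratio` threshold are hypothetical. The counts are approximate: an instance shared across forked processes is counted once per process, and processes that cannot be read are skipped. Watches (`max_user_watches`) are read but not counted.

```python
import os
from collections import Counter
from pathlib import Path
from typing import Dict, List, Tuple

INOTIFY_LINK = "anon_inode:inotify"  # readlink() target of an inotify fd


def read_inotify_limits(proc_root: str = "/proc") -> Dict[str, int]:
    """Return the live fs.inotify.* limits exposed under <proc_root>/sys."""
    base = Path(proc_root, "sys", "fs", "inotify")
    return {
        name: int((base / name).read_text().strip())
        for name in ("max_user_instances", "max_user_watches")
    }


def count_inotify_instances(proc_root: str = "/proc") -> Counter:
    """Approximate the number of open inotify instances per UID.

    Counts every fd in <proc_root>/<pid>/fd that links to an inotify
    instance, keyed by the owner UID of the <pid> directory. Processes
    that cannot be read (permissions, exited mid-scan) are skipped, so
    run as root for a complete view.
    """
    counts: Counter = Counter()
    for pid_dir in Path(proc_root).iterdir():
        if not pid_dir.name.isdigit():
            continue  # not a process directory
        try:
            uid = pid_dir.stat().st_uid
            fds = list((pid_dir / "fd").iterdir())
        except OSError:
            continue  # process exited or is not readable
        for fd in fds:
            try:
                target = os.readlink(fd)
            except OSError:
                continue  # fd was closed during the scan
            if target == INOTIFY_LINK:
                counts[uid] += 1
    return counts


def users_near_limit(proc_root: str = "/proc",
                     warn_ratio: float = 0.8) -> List[Tuple[int, int]]:
    """Return (uid, instances) pairs at or above warn_ratio * max_user_instances.

    warn_ratio is an arbitrary, hypothetical threshold, not a documented value.
    """
    limit = read_inotify_limits(proc_root)["max_user_instances"]
    return sorted(
        (uid, n) for uid, n in count_inotify_instances(proc_root).items()
        if n >= warn_ratio * limit
    )
```

Run as root on an affected node, `users_near_limit()` lists the UIDs closest to `max_user_instances`. Comparing `read_inotify_limits()` before and after the node reboots into the new image shows whether the increased limits are in effect.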

Relates to:

cray-istio PR

Checklist

If I added any command snippets, the steps they belong to follow the prompt conventions (see example).
If I added a new directory, I also updated .github/CODEOWNERS with the corresponding team in Cray-HPE.
My commits or Pull-Request Title contain my JIRA information, or I do not have a JIRA.

troubleshooting/known_issues/Istio_Pods_Crashing_due_to_fs_inotify_Limits.md

* Istio Pods Crashing post upgrade due to `fs.inotify` Limits

arka-pramanik-hpe force-pushed the CASMTRIAGE-7308 branch 3 times, most recently from 0eb551d to 33a95aa, October 8, 2024 15:55

arka-pramanik-hpe changed the title from "Add Troubleshooting Steps" to "CASMTRIAGE-7308: Add Troubleshooting Steps" Oct 8, 2024

arka-pramanik-hpe mentioned this pull request in Cray-HPE/cray-istio#46 (Add Troubleshooting Steps) Oct 8, 2024

arka-pramanik-hpe requested review from spillerc-hpe, shunr-hpe, leliasen-hpe, rajeshranganathan-hpe, mtupitsyn and dborman-hpe October 8, 2024 16:11

arka-pramanik-hpe self-assigned this Oct 8, 2024

leliasen-hpe reviewed Oct 8, 2024

Review comment on troubleshooting/known_issues/Istio_Pods_Crashing_due_to_fs_inotify_Limits.md (outdated, resolved)

arka-pramanik-hpe force-pushed the CASMTRIAGE-7308 branch 4 times, most recently from e507f4a to 82e7101, October 9, 2024 06:39

CASMTRIAGE-7308: Add Troubleshooting Steps

5d7fd77

* Istio Pods Crashing post upgrade due to `fs.inotify` Limits

arka-pramanik-hpe force-pushed the CASMTRIAGE-7308 branch from 82e7101 to 5d7fd77, October 9, 2024 10:39

arka-pramanik-hpe requested a review from leliasen-hpe October 9, 2024 10:47

leliasen-hpe approved these changes Oct 10, 2024
mtupitsyn approved these changes Oct 10, 2024
arka-pramanik-hpe merged commit 4f5784e into release/1.6 Oct 10, 2024
8 checks passed

arka-pramanik-hpe deleted the CASMTRIAGE-7308 branch October 10, 2024 15:42