Observe non-working times feature #48
Comments
I did some of the things you proposed in my PR already. If you want to extend it with the things I don't have, that'd be the easiest for you.
@twildeboer @klautcomputing I'll try to look at the PR again over the weekend. At first sight, the way to specify the time frame, as well as the implementation, seemed quite complicated to me. @klautcomputing, do you think defining the range similarly to https://github.com/hjacobs/kube-downscaler#configuration would simplify usage as well as implementation and still capture your use cases? E.g., being active at work time as well as midday on weekends would be: --active-at "Fri-Fri 10:00-16:00 CET, Sat-Sun 10:00-12:00 CET"
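To make the proposed `--active-at` format concrete, here is a minimal Python sketch of how such a string might be split into structured windows. The grammar (`Day-Day HH:MM-HH:MM ZONE`, comma-separated) is inferred only from the example in the comment above; `Window`, `DAYS`, and `parse_windows` are hypothetical names, and this is not the parser used by kube-downscaler or chaoskube.

```python
import re
from typing import NamedTuple

# Assumed three-letter day names, Monday first (index 0).
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


class Window(NamedTuple):
    first_day: int  # index into DAYS
    last_day: int   # index into DAYS
    start: str      # "HH:MM"
    end: str        # "HH:MM"
    zone: str       # kept as an opaque string, e.g. "CET"


# One entry looks like "Sat-Sun 10:00-12:00 CET" (assumed grammar).
_ENTRY = re.compile(
    r"^([A-Za-z]{3})-([A-Za-z]{3}) (\d{2}:\d{2})-(\d{2}:\d{2}) (\S+)$"
)


def parse_windows(spec: str) -> list[Window]:
    """Split a comma-separated spec into Window tuples.

    Raises ValueError for entries that do not match the assumed grammar
    or that use an unknown day name.
    """
    windows = []
    for entry in spec.split(","):
        match = _ENTRY.match(entry.strip())
        if match is None:
            raise ValueError(f"cannot parse window: {entry!r}")
        first, last, start, end, zone = match.groups()
        windows.append(
            Window(DAYS.index(first), DAYS.index(last), start, end, zone)
        )
    return windows


if __name__ == "__main__":
    for window in parse_windows("Sat-Sun 10:00-12:00 CET"):
        print(window)
```

Even this small sketch has to decide how to treat unknown day names and malformed entries, which gives a feel for the parsing surface such a flag would add.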
@linki could you leave a couple of comments on my code where you think my implementation is too complicated?
Did you maybe mean This raises the general question of whether we want chaoskube to be purely opt-in. Given that chaos engineering is not something that should surprise a team, and that they should have made an active decision to test their systems with chaos, it might be the right choice and would get rid of
Generally speaking, I suggest resisting the temptation to over-engineer features. Rather, design and implement what you know is needed, then see how that goes and whether there is demand for more or something different.
Regarding this feature specifically, speaking only for our own use case, we do not need both detailed "off-time" and "on-time" specifications. Our team has typical work hours and an on-call rotation for non-working hours. I imagine that describes the majority of chaoskube users. Since chaoskube is (from our perspective) intended to be run as an ongoing stability test, all we care about is being able to limit which services are impacted and not making on-call life unnecessarily harder for anyone. You may notice that Chaos Monkey does not provide such detailed scheduling, AFAIK. If someone wants to run chaoskube on the weekend, they can just deploy another instance of it to do whatever they want. The scheduling will never be perfect anyway, since the holidays, at least, will need to be updated from time to time. Finally, we view chaoskube as a tool that gives us confidence in the resilience of our systems, but it is not critical to our infrastructure and does not need precise scheduling capabilities.
Another reason to avoid precise scheduling is that it is significantly more difficult to implement correctly. You will have to include all kinds of logic to handle periods that span midnight and Daylight Saving jumps. And you will have to find a way to support such configuration that is not confusing. People will get confused about what their configuration really means, no matter how carefully you write your documentation, and then you will get all kinds of bug reports that are actually user error or misunderstanding.
You could, perhaps, if the need were shown to be significant, add the ability to override each global "off-time" attribute with service-specific ones through annotations. But I would wait and see if this is a real need, because it adds complexity. Our team does not need this.
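As an illustration of the "periods that span midnight" concern, here is a minimal Python sketch of a time-of-day check. The function name `in_daily_window` and the half-open `[start, end)` convention are assumptions made for this example, not anything taken from chaoskube.

```python
from datetime import time


def in_daily_window(now: time, start: time, end: time) -> bool:
    """Return True if `now` falls in the half-open window [start, end).

    When start > end, the window is taken to wrap past midnight,
    e.g. 22:00-06:00 covers late evening and early morning.
    When start == end, the window is treated as empty.
    """
    if start <= end:
        return start <= now < end
    # Wrapping window: after `start` today, or before `end` the next morning.
    return now >= start or now < end


if __name__ == "__main__":
    night = (time(22, 0), time(6, 0))
    print(in_daily_window(time(23, 30), *night))
    print(in_daily_window(time(12, 0), *night))
```

The extra branch for wrapping windows is the kind of logic that is easy to get subtly wrong, and it does not yet address Daylight Saving jumps, where a local wall-clock time can be skipped or repeated.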
@linki - the PR for this feature is waiting for you.
@klautcomputing @twildeboer Thank you for all your input. The above feature is part of v0.8.0, so I'm going to close this issue. I think we found a fairly easy way to configure it, although the equivalent of I also think that at some point some configuration should be overridable by annotations or moved entirely to annotations, e.g. for users defining a
I'm proposing a feature addition to chaoskube that would add the ability to suspend the chaos during nights, weekends, and holidays using the following command-line options. These are designed to be somewhat consistent with the current pattern of chaoskube options as well as the configuration options for Chaos Monkey. They should be self-explanatory:
The options above imply that both --observe-off-times true and --location '...' must be present for the feature to take effect. There is purposefully no default location, so the user is forced to provide one: most SRE staff probably do not work in the GMT time zone, so defaulting to UTC would not really make sense here.
Note that this requires an IANA Time Zone name rather than a three-letter abbreviation such as 'EDT' or 'EST', which would have to change with Daylight Saving conventions. Using IANA Time Zones accounts for Daylight Saving automatically.
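To show why an IANA name is preferable to a fixed abbreviation, here is a short Python sketch using the standard-library `zoneinfo` module. The helper names `local_utc_offset` and `is_weekend` are hypothetical and only illustrate the idea; chaoskube itself is not written in Python, and this is not its implementation. The sketch assumes time zone data is available on the system.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def local_utc_offset(instant_utc: datetime, location: str) -> timedelta:
    """UTC offset in effect at `instant_utc` in the given IANA zone.

    `instant_utc` must be timezone-aware; a naive datetime would be
    interpreted as system local time by astimezone().
    """
    return instant_utc.astimezone(ZoneInfo(location)).utcoffset()


def is_weekend(instant_utc: datetime, location: str) -> bool:
    """True if the instant falls on Saturday or Sunday in local time."""
    return instant_utc.astimezone(ZoneInfo(location)).weekday() >= 5


if __name__ == "__main__":
    winter = datetime(2018, 1, 15, 12, 0, tzinfo=timezone.utc)
    summer = datetime(2018, 7, 15, 12, 0, tzinfo=timezone.utc)
    # The same zone name yields different offsets across the DST change.
    print(local_utc_offset(winter, "America/New_York"))
    print(local_utc_offset(summer, "America/New_York"))
```

The same instant can also fall on different weekdays depending on the location, which is one reason a UTC default could silently shift the weekend boundary for teams elsewhere.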
I intend to post a PR as soon as I have this implemented, but wanted to get some feedback in case I'm missing something.