-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PoC: BIOS version & settings management #138
Conversation
2ac16c3
to
e663a97
Compare
a6733e4
to
b87e059
Compare
0aea100
to
958928c
Compare
b682a57
to
3468011
Compare
func (s *ServerHTTP) registerRoutes() { | ||
s.mux.HandleFunc("/scan", s.scanHandler) | ||
s.mux.HandleFunc("/settings-apply", s.settingsApplyHandler) | ||
s.mux.HandleFunc("/version-update", s.versionUpdateHandler) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
idea: would be great to have an endpoint to get all current active tasks from the sync.Map
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stefanhipfel thanks for idea.
Totally agree. Also we might need to implement queues for requests, rate limiting, retrying on client side and tons of other stuff. Apart from that, I'd also like to generate http API from spec without hardcoding endpoints.
But for now, I think the main goal is to come to agreement - will we proceed further with this design or not.
// if referred server is not in Available state - stop reconciliation | ||
if server.Status.State != metalv1alpha1.ServerStateAvailable { | ||
return ctrl.Result{RequeueAfter: r.RequeueInterval}, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to stop the server reconciliation in the meantime?
e.g.: do not allow serverClaim
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we do not need to stop server's reconciliation - we do not care what is the state of the server. But we need to stop serverBIOS reconciliation, bc both BIOS version and settings update will lead to server reboot. So for now we decided that we'll work only with servers which are in "Available" state.
If the server is in available state, then we'll need to set it to "maintenance". But the servers' maintenance topic is still open. Thus I decide to not to mention it at all for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok but during a serverBios update, someone could still claim the server. Any available server can be claimed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok but during a serverBios update, someone could still claim the server. Any available server can be claimed
@stefanhipfel that's a good point. However, I do not really like the idea that the serverBIOS controller would change server's state to exclude it from reconciliation. For number of reasons the server's "owner" might want to postpone the update. Especially in terms of version upgrade.
I could add the "Maintenance" state for server and update the controller, so it will check whether server's state is "Maintenance" instead of "Available".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aobort yes i think so as well, that the serverBIOS should not change a server's state.
Alternative idea: the server reconciler checks on the serverBIOS state.
overall I think the POC looks ok |
// if referred server is not in Available state - stop reconciliation | ||
if server.Status.State != metalv1alpha1.ServerStateAvailable { | ||
return ctrl.Result{RequeueAfter: r.RequeueInterval}, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok but during a serverBios update, someone could still claim the server. Any available server can be claimed
type ServerBIOSStatus struct { | ||
// BIOS contains a bios version and settings. | ||
// +optional | ||
BIOS BIOSSettings `json:"bios,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could add the current state of the bios task
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stefanhipfel Taking into account:
- possible separating of metal-operator and bios/fw operator
- possible separation of the bios/fw operator and task runner
I'd suggest to proceed as is for now. Then we could add some endpoint to retrieve active tasks for concrete server or something like this. But it will require to store the references for running tasks etc. So I'd avoid adding it right now due to necessity to design this better.
@aobort overall I think the POC looks good. Next step would be to gather some improvements and test it with real examples! |
@defo89 @aobort @afritzler what do u guys think about taking the bios controller out of the metal operator, same as the boot/ipam operator? We would be more flexible in the future and changes could be implemented independently of the metal operator |
I proposed it in the beginning of this "epic". Thus, I agree. |
d06686f
to
b02ff6e
Compare
b02ff6e
to
ed1a9fc
Compare
Signed-off-by: Artem Bortnikov <[email protected]>
Signed-off-by: Artem Bortnikov <[email protected]>
Signed-off-by: Artem Bortnikov <[email protected]>
Signed-off-by: Artem Bortnikov <[email protected]>
Signed-off-by: Artem Bortnikov <[email protected]>
Signed-off-by: Artem Bortnikov <[email protected]>
Signed-off-by: Artem Bortnikov <[email protected]>
Signed-off-by: Artem Bortnikov <[email protected]>
Signed-off-by: Artem Bortnikov <[email protected]>
Signed-off-by: Artem Bortnikov <[email protected]>
Signed-off-by: Artem Bortnikov <[email protected]>
ed1a9fc
to
971fffa
Compare
@stefanhipfel I've updated the controller: now it checks whether server is in "Maintenance" state. After #76 solved, we'll update server bios controller respectively. Adding the task state into the serverBIOS object's status is prematurely, from my perspective. Let's first run the stuff on real hardware and look how the "instant" operations (like scan and settings apply) work. Also added initial test for BIOS versions retrieval. cc: @afritzler @defo89 |
Signed-off-by: Artem Bortnikov <[email protected]>
Signed-off-by: Artem Bortnikov <[email protected]>
Proposed Changes
This PR contains PoC implementation for management of server's BIOS version and settings:
This approach fully separate concrete job implementation from the reconciliation flow.
To discuss
BootOrder
Do we need to move boot order field from server to serverBIOS CR? Semantically, yes. As it's one of BIOS settings.
BIOS settings in
.status
From my perspective, it reasonable to reflect in
.status.bios.settings
only those settings which are set in.spec.bios.settings
. This will make comparison of.spec
and.status
much easier.Storing bios settings
Do we need custom type for bios setting:
to avoid attempting to apply settings which are not supported in the specified BIOS version? I.e.:
upgrading from version 1.0.0 to 2.0.0 will lead to the bios setting responsible for legacy boot is deprecated/unsupported, thus it is explicitly marked as unsupported and will not be considered during settings applying.