ceph: Adds support for MDB controller and machineLabel controller
Add support for MachineDisruptionBudget controller and MachineLabel controller required for fencing
in OpenShift. This will ensure that machines are only fenced and OSDs are only stopped
when Ceph is in a healthy state.

Co-authored-by: Ashish Ranjan <[email protected]>
Signed-off-by: travisn <[email protected]>
travisn and Ashish Ranjan committed Sep 6, 2019
1 parent 49c5169 commit 638b591
Showing 20 changed files with 683 additions and 14 deletions.
6 changes: 4 additions & 2 deletions Documentation/ceph-cluster-crd.md
@@ -110,6 +110,8 @@ If this value is empty, each pod will get an ephemeral directory to store their
- `hostNetwork`: uses network of the hosts instead of using the SDN below the containers.
- `mon`: contains mon related options [mon settings](#mon-settings)
For more details on the mons and when to choose a number other than `3`, see the [mon health design doc](https://github.com/rook/rook/blob/master/design/mon-health.md).
- `mgr`: manager top level section
- `modules`: is the list of Ceph manager modules to enable
- `rbdMirroring`: The settings for rbd mirror daemon(s). Configuring which pools or images to be mirrored must be completed in the rook toolbox by running the
[rbd mirror](http://docs.ceph.com/docs/mimic/rbd/rbd-mirroring/) command.
- `workers`: The number of rbd daemons to perform the rbd mirroring between clusters.
@@ -130,8 +132,8 @@ For more details on the mons and when to choose a number other than `3`, see the
- `disruptionManagement`: The section for configuring management of daemon disruptions
- `managePodBudgets`: if `true`, the operator will create and manage PodDisruptionBudgets for OSD, Mon, RGW, and MDS daemons. OSD PDBs are managed dynamically via the strategy outlined in the [design](https://github.com/rook/rook/blob/master/design/ceph-managed-disruptionbudgets.md). The operator will block eviction of OSDs by default and unblock them safely when drains are detected.
- `osdMaintenanceTimeout`: is a duration in minutes that determines how long an entire failureDomain like `region/zone/host` will be held in `noout` (in addition to the default DOWN/OUT interval) when it is draining. This is only relevant when `managePodBudgets` is `true`. The default value is `30` minutes.
- `mgr`: manager top level section
- `modules`: is the list of ceph manager modules to enable
- `manageMachineDisruptionBudgets`: if `true`, the operator will create and manage MachineDisruptionBudgets to ensure OSDs are only fenced when the cluster is healthy. Only available on OpenShift.
- `machineDisruptionBudgetNamespace`: the namespace in which to watch the MachineDisruptionBudgets.

### Mon Settings

28 changes: 26 additions & 2 deletions Gopkg.lock

Generated file; diff not rendered.

4 changes: 4 additions & 0 deletions Gopkg.toml
@@ -130,3 +130,7 @@ ignored = [
[[constraint]]
branch = "master"
name = "github.com/kube-object-storage/lib-bucket-provisioner"

[[override]]
name = "github.com/openshift/machine-api-operator"
revision = "a0949226d20ea454cf08252a182a8e32054027c3"
1 change: 1 addition & 0 deletions PendingReleaseNotes.md
@@ -51,6 +51,7 @@ an example usage
- Added a new property in `storageClassDeviceSets` named `portable`:
- If `true`, the OSDs will be allowed to move between nodes during failover. This requires a storage class that supports portability (e.g. `aws-ebs`, but not the local storage provisioner).
- If `false`, the OSDs will be assigned to a node permanently. Rook will configure Ceph's CRUSH map to support the portability.
- Rook can now manage MachineDisruptionBudgets for the OSDs (only available on OpenShift). MachineDisruptionBudgets for OSDs are dynamically managed as documented in the `disruptionManagement` section of the [CephCluster CR](Documentation/ceph-cluster-crd.md#cluster-settings). This can be enabled with the `manageMachineDisruptionBudgets` flag in the cluster CR.
- Rook can now manage PodDisruptionBudgets for the following Daemons: OSD, Mon, RGW, MDS. OSD budgets are dynamically managed as documented in the [design](https://github.com/rook/rook/blob/master/design/ceph-managed-disruptionbudgets.md). This can be enabled with the `managePodBudgets` flag in the cluster CR. When this is enabled, drains on OSDs will be blocked by default and dynamically unblocked in a safe manner one failureDomain at a time. When a failure domain is draining, it will be marked as no out for a longer time than the default DOWN/OUT interval.
- Rook now has a new config CRD `mgr` to enable ceph manager modules
- Flexvolume plugin now supports dynamic PVC expansion.
22 changes: 22 additions & 0 deletions cluster/charts/rook-ceph/templates/clusterrole.yaml
@@ -143,6 +143,28 @@ rules:
- deployments
verbs:
- "*"
- apiGroups:
- healthchecking.openshift.io
resources:
- machinedisruptionbudgets
verbs:
- get
- list
- watch
- create
- update
- delete
- apiGroups:
- machine.openshift.io
resources:
- machines
verbs:
- get
- list
- watch
- create
- update
- delete
---
# Aspects of ceph-mgr that require cluster-wide access
kind: ClusterRole
5 changes: 5 additions & 0 deletions cluster/examples/kubernetes/ceph/cluster-on-pvc.yaml
@@ -73,3 +73,8 @@ spec:
volumeMode: Block
accessModes:
- ReadWriteOnce
disruptionManagement:
managePodBudgets: false
osdMaintenanceTimeout: 30
manageMachineDisruptionBudgets: false
machineDisruptionBudgetNamespace: openshift-machine-api
17 changes: 14 additions & 3 deletions cluster/examples/kubernetes/ceph/cluster.yaml
@@ -149,6 +149,17 @@ spec:
# storeType: filestore
# - name: "172.17.4.301"
# deviceFilter: "^sd."
disruptionManagement: #The section for configuring management of daemon disruptions
managePodBudgets: false #if `true`, the operator will create and manage PodDisruptionBudgets for OSD, Mon, RGW, and MDS daemons. OSD PDBs are managed dynamically via the strategy outlined in the [design](https://github.com/rook/rook/blob/master/design/ceph-managed-disruptionbudgets.md). The operator will block eviction of OSDs by default and unblock them safely when drains are detected.
osdMaintenanceTimeout: 30 # is a duration in minutes that determines how long an entire failureDomain like `region/zone/host` will be held in `noout` (in addition to the default DOWN/OUT interval) when it is draining. This is only relevant when `managePodBudgets` is `true`. The default value is `30` minutes.
# The section for configuring management of daemon disruptions during upgrade or fencing.
disruptionManagement:
# If true, the operator will create and manage PodDisruptionBudgets for OSD, Mon, RGW, and MDS daemons. OSD PDBs are managed dynamically
# via the strategy outlined in the [design](https://github.com/rook/rook/blob/master/design/ceph-managed-disruptionbudgets.md). The operator will
# block eviction of OSDs by default and unblock them safely when drains are detected.
managePodBudgets: false
# A duration in minutes that determines how long an entire failureDomain like `region/zone/host` will be held in `noout` (in addition to the
# default DOWN/OUT interval) when it is draining. This is only relevant when `managePodBudgets` is `true`. The default value is `30` minutes.
osdMaintenanceTimeout: 30
# If true, the operator will create and manage MachineDisruptionBudgets to ensure OSDs are only fenced when the cluster is healthy.
# Only available on OpenShift.
manageMachineDisruptionBudgets: false
# Namespace in which to watch for the MachineDisruptionBudgets.
machineDisruptionBudgetNamespace: openshift-machine-api
31 changes: 30 additions & 1 deletion cluster/examples/kubernetes/ceph/common.yaml
@@ -81,6 +81,14 @@ spec:
type: boolean
storage:
properties:
disruptionManagement:
properties:
managePodBudgets:
type: boolean
osdMaintenanceTimeout:
type: integer
manageMachineDisruptionBudgets:
type: boolean
useAllNodes:
type: boolean
nodes:
@@ -635,7 +643,28 @@ rules:
- deployments
verbs:
- "*"

- apiGroups:
- healthchecking.openshift.io
resources:
- machinedisruptionbudgets
verbs:
- get
- list
- watch
- create
- update
- delete
- apiGroups:
- machine.openshift.io
resources:
- machines
verbs:
- get
- list
- watch
- create
- update
- delete
---
# Aspects of ceph-mgr that require cluster-wide access
kind: ClusterRole
6 changes: 5 additions & 1 deletion cluster/examples/kubernetes/ceph/operator-openshift.yaml
@@ -197,7 +197,7 @@ spec:
- name: ROOK_DISABLE_DEVICE_HOTPLUG
value: "false"

# Whether to enable the flex driver. By default it is enabled and is fully supported, but will be deprecated in some future release
# Whether to enable the flex driver. By default it is enabled and is fully supported, but will be deprecated in some future release
# in favor of the CSI driver.
- name: ROOK_ENABLE_FLEX_DRIVER
value: "true"
@@ -207,6 +207,10 @@ spec:
- name: ROOK_ENABLE_DISCOVERY_DAEMON
value: "true"

# Whether to start the machineDisruptionBudget and machineLabel controllers, which watch the OSD pods and MDBs.
- name: ROOK_ENABLE_MACHINE_DISRUPTION_BUDGET
value: "false"

# Enable the CSI driver.
# To run the non-default version of the CSI driver, see the override-able image properties in operator.yaml
- name: ROOK_CSI_ENABLE_CEPHFS
4 changes: 3 additions & 1 deletion cmd/rook/ceph/operator.go
@@ -25,6 +25,7 @@ import (
operator "github.com/rook/rook/pkg/operator/ceph"
"github.com/rook/rook/pkg/operator/ceph/cluster/mon"
"github.com/rook/rook/pkg/operator/ceph/csi"
"github.com/rook/rook/pkg/operator/ceph/disruption"
"github.com/rook/rook/pkg/operator/k8sutil"
"github.com/rook/rook/pkg/util/flags"
"github.com/spf13/cobra"
@@ -65,9 +66,10 @@ func init() {
operatorCmd.Flags().StringVar(&csi.CephFSPluginTemplatePath, "csi-cephfs-plugin-template-path", csi.DefaultCephFSPluginTemplatePath, "path to ceph-csi cephfs plugin template")
operatorCmd.Flags().StringVar(&csi.CephFSProvisionerSTSTemplatePath, "csi-cephfs-provisioner-sts-template-path", csi.DefaultCephFSProvisionerSTSTemplatePath, "path to ceph-csi cephfs provisioner statefulset template")
operatorCmd.Flags().StringVar(&csi.CephFSProvisionerDepTemplatePath, "csi-cephfs-provisioner-dep-template-path", csi.DefaultCephFSProvisionerDepTemplatePath, "path to ceph-csi cephfs provisioner deployment template")
//csi grpc flag
operatorCmd.Flags().BoolVar(&csi.EnableCSIGRPCMetrics, "csi-enable-grpc-metrics", true, "enable grpc metrics in ceph-csi")

operatorCmd.Flags().BoolVar(&disruption.EnableMachineDisruptionBudget, "enable-machine-disruption-budget", false, "enable fencing controllers")

flags.SetFlagsFromEnv(operatorCmd.Flags(), rook.RookEnvVarPrefix)
flags.SetLoggingFlags(operatorCmd.Flags())
operatorCmd.RunE = startOperator
6 changes: 6 additions & 0 deletions pkg/apis/ceph.rook.io/v1/types.go
@@ -433,4 +433,10 @@ type DisruptionManagementSpec struct {
// it only works if managePodBudgets is true.
// the default is 30 minutes
OSDMaintenenceTimeout time.Duration `json:"osdMaintenanceTimeout,omitempty"`

// This enables management of machinedisruptionbudgets
ManageMachineDisruptionBudgets bool `json:"manageMachineDisruptionBudgets,omitempty"`

// Namespace in which the machineDisruptionBudgetController looks for MDBs
MachineDisruptionBudgetNamespace string `json:"machineDisruptionBudgetNamespace,omitempty"`
}
@@ -31,7 +31,6 @@ import (
"github.com/rook/rook/pkg/operator/ceph/disruption/controllerconfig"

cephv1 "github.com/rook/rook/pkg/apis/ceph.rook.io/v1"
// metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

const (
60 changes: 60 additions & 0 deletions pkg/operator/ceph/disruption/machinedisruption/add.go
@@ -0,0 +1,60 @@
/*
Copyright 2019 The Rook Authors. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package machinedisruption

import (
healthchecking "github.com/openshift/machine-api-operator/pkg/apis/healthchecking/v1alpha1"
cephv1 "github.com/rook/rook/pkg/apis/ceph.rook.io/v1"
"github.com/rook/rook/pkg/operator/ceph/disruption/controllerconfig"
"sigs.k8s.io/controller-runtime/pkg/controller"
"sigs.k8s.io/controller-runtime/pkg/handler"
"sigs.k8s.io/controller-runtime/pkg/manager"
"sigs.k8s.io/controller-runtime/pkg/reconcile"
"sigs.k8s.io/controller-runtime/pkg/source"
)

// Add adds a new Controller to the Manager based on machinedisruption.ReconcileMachineDisruption and registers the relevant watches and handlers.
// Read more about how Managers, Controllers, and their Watches, Handlers, Predicates, etc work here:
// https://godoc.org/github.com/kubernetes-sigs/controller-runtime/pkg
func Add(mgr manager.Manager, context *controllerconfig.Context) error {
mgrScheme := mgr.GetScheme()
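// Register the MachineDisruptionBudget and CephCluster types with the manager's scheme.
if err := healthchecking.AddToScheme(mgrScheme); err != nil {
return err
}
if err := cephv1.AddToScheme(mgrScheme); err != nil {
return err
}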

reconcileMachineDisruption := &MachineDisruptionReconciler{
client: mgr.GetClient(),
scheme: mgrScheme,
context: context,
}

reconciler := reconcile.Reconciler(reconcileMachineDisruption)
// create a new controller
c, err := controller.New(controllerName, mgr, controller.Options{Reconciler: reconciler})
if err != nil {
return err
}

err = c.Watch(&source.Kind{Type: &cephv1.CephCluster{}}, &handler.EnqueueRequestForObject{})
if err != nil {
return err
}

return c.Watch(&source.Kind{Type: &healthchecking.MachineDisruptionBudget{}}, &handler.EnqueueRequestForOwner{
IsController: true,
OwnerType: &cephv1.CephCluster{},
})
}
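The two watches registered in `Add` differ in which object ends up being reconciled: an event on a CephCluster enqueues that cluster itself, while an event on a MachineDisruptionBudget enqueues the CephCluster named in its controlling owner reference. The sketch below models that difference with plain Go types. Everything in it (`object`, `ownerRef`, `request`, `enqueueForObject`, `enqueueForOwner`, and the example names) is hypothetical and only imitates the behavior; it does not use controller-runtime.

```go
package main

import "fmt"

// ownerRef and object are hypothetical, minimal stand-ins for Kubernetes metadata.
type ownerRef struct {
	Kind       string
	Name       string
	Controller bool
}

type object struct {
	Kind      string
	Namespace string
	Name      string
	Owners    []ownerRef
}

// request mirrors the namespace/name key a reconciler receives.
type request struct {
	Namespace string
	Name      string
}

// enqueueForObject models the first watch: the changed object itself is reconciled.
func enqueueForObject(o object) []request {
	return []request{{Namespace: o.Namespace, Name: o.Name}}
}

// enqueueForOwner models the second watch with IsController set: only owner
// references of the given kind that are marked as controller produce a request.
func enqueueForOwner(o object, ownerKind string) []request {
	var reqs []request
	for _, ref := range o.Owners {
		if ref.Kind == ownerKind && ref.Controller {
			reqs = append(reqs, request{Namespace: o.Namespace, Name: ref.Name})
		}
	}
	return reqs
}

func main() {
	cluster := object{Kind: "CephCluster", Namespace: "example-ns", Name: "my-cluster"}
	mdb := object{
		Kind:      "MachineDisruptionBudget",
		Namespace: "example-ns",
		Name:      "my-cluster-mdb",
		Owners:    []ownerRef{{Kind: "CephCluster", Name: "my-cluster", Controller: true}},
	}

	fmt.Println(enqueueForObject(cluster))
	fmt.Println(enqueueForOwner(mdb, "CephCluster"))
}
```

Both calls yield a request for the same cluster key, so a single reconcile function can handle changes to either resource.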
23 changes: 23 additions & 0 deletions pkg/operator/ceph/disruption/machinedisruption/doc.go
@@ -0,0 +1,23 @@
/*
Copyright 2019 The Rook Authors. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

/*
Package machinedisruption ensures that OpenShift fencing doesn't interfere with running Ceph resources in a way that results in data loss or unavailability.
The design and purpose for machinedisruption management is found at:
https://github.com/rook/rook/blob/master/design/ceph-openshift-fencing-mitigation.md
*/

package machinedisruption
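The reconciler itself is not shown in this excerpt. As a loose illustration of the idea stated in the commit message (machines are fenced only while Ceph is healthy), the sketch below gates a budget on a health flag. The `budget` type, the helper names, and the choice of allowing exactly one unavailable machine when healthy are all assumptions made for this example; they are not taken from the Rook source or from the MachineDisruptionBudget API.

```go
package main

import "fmt"

// budget is a hypothetical stand-in for a disruption budget that carries a
// maxUnavailable count.
type budget struct {
	MaxUnavailable int32
}

// budgetFor returns a budget that blocks all fencing while Ceph is unhealthy
// and (by assumption) allows one machine at a time once it is healthy.
func budgetFor(cephHealthy bool) budget {
	if cephHealthy {
		return budget{MaxUnavailable: 1}
	}
	return budget{MaxUnavailable: 0}
}

// fencingAllowed reports whether one more machine may be fenced, given how
// many covered machines are already unavailable.
func fencingAllowed(b budget, alreadyUnavailable int32) bool {
	return alreadyUnavailable < b.MaxUnavailable
}

func main() {
	for _, healthy := range []bool{true, false} {
		b := budgetFor(healthy)
		fmt.Printf("healthy=%v maxUnavailable=%d fencingAllowed=%v\n",
			healthy, b.MaxUnavailable, fencingAllowed(b, 0))
	}
}
```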