
Zookeeper backup controller #413

Open · wants to merge 22 commits into master from zk-backup-controller

Conversation

@ibumarskov commented Nov 23, 2021

Change log description

Implementation of automatic data backup.

Purpose of the change

Implement #408

What the code does

Adds a dedicated controller responsible for automatic backups of zookeeper data. The current implementation periodically copies transaction logs and snapshots (the on-disk representation of the data) to a dedicated persistent volume with the specified storage class within the Kubernetes cluster.
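For illustration, here is a hypothetical shape of the CronJob such a controller could generate from the CR shown under "How to verify it". This is only a sketch: the resource names, mount paths, and volume wiring are assumptions, not taken from this PR; only the use of a CronJob and the spec fields it consumes are from the discussion below.

```yaml
# Hypothetical CronJob generated by the backup controller (a sketch;
# names, paths, and volume wiring are assumptions, not from this PR).
apiVersion: batch/v1
kind: CronJob
metadata:
  name: example-backup
spec:
  schedule: "0 0 */1 * *"               # from ZookeeperBackup.spec.schedule
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: zkbackup
              image: pravega/zkbackup:0.1    # from ZookeeperBackup.spec.image
              volumeMounts:
                - name: backup-storage       # PVC using spec.dataStorageClass
                  mountPath: /backup
          volumes:
            - name: backup-storage
              persistentVolumeClaim:
                claimName: example-backup-pvc
```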

How to verify it

Create a CR for the zookeeper backup controller. Example ZookeeperBackup CR:

```yaml
apiVersion: "zookeeper.pravega.io/v1beta1"
kind: "ZookeeperBackup"
metadata:
  name: "example-backup"
spec:
  zookeeperCluster: "zookeeper-cluster"
  schedule: "0 0 */1 * *"
  backupsToKeep: "7"
  dataStorageClass: "backup-class"
  image:
    repository: "pravega/zkbackup"
    tag: "0.1"
```
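
A minimal verification sketch, assuming the controller materializes the schedule as a Kubernetes CronJob (as discussed later in this thread); the file and resource names below are illustrative:

```bash
# Apply the backup CR (the file name is illustrative).
kubectl apply -f zookeeper-backup.yaml

# Confirm the CR exists.
kubectl get zookeeperbackup example-backup

# Assumption: the controller creates a CronJob for the CR; inspect it
# and any Jobs it has spawned on schedule.
kubectl get cronjobs
kubectl get jobs
```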

Additional required actions:

Build the backup Docker image and upload it to Docker Hub under the name pravega/zkbackup:0.1.

Ilya Bumarskov added 5 commits November 23, 2021 16:52
Signed-off-by: Ilya Bumarskov <[email protected]>
Signed-off-by: Ilya Bumarskov <[email protected]>
Signed-off-by: Ilya Bumarskov <[email protected]>
@codecov (bot) commented Nov 23, 2021

Codecov Report

Base: 85.05% // Head: 85.23% // Increases project coverage by +0.18% 🎉

Coverage data is based on head (847e788) compared to base (c29d475).
Patch coverage: 80.50% of modified lines in pull request are covered.

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           master     #413      +/-   ##
==========================================
+ Coverage   85.05%   85.23%   +0.18%
==========================================
  Files          12       14       +2
  Lines        1606     1924     +318
==========================================
+ Hits         1366     1640     +274
- Misses        155      184      +29
- Partials       85      100      +15
```
| Impacted Files | Coverage Δ |
| --- | --- |
| controllers/zookeeperbackup_controller.go | 74.69% <74.69%> (ø) |
| api/v1beta1/zookeeperbackup_types.go | 100.00% <100.00%> (ø) |
| api/v1beta1/zz_generated.deepcopy.go | 98.23% <100.00%> (+0.31%) ⬆️ |
| controllers/zookeepercluster_controller.go | 69.07% <0.00%> (+3.61%) ⬆️ |


deploy/role.yaml Outdated
```yaml
- zookeeper.pravega.io
resources:
- '*'
- zookeeperbackups
```
Contributor:

Wouldn't zookeeperbackups already be included in the '*' entry above?

@jkhalack
Contributor

> Build the backup Docker image and upload it to Docker Hub under the name pravega/zkbackup:0.1.

It might be easier to enhance the operator docker image with the backup script (e.g., copy the script here https://github.com/pravega/zookeeper-operator/blob/master/Dockerfile#L34) and use it in the CronJob with a different entrypoint. This way we won't need to maintain an extra docker image or pull it to a cluster, as it would already be present in the system.
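A sketch of that idea; only the general approach comes from this comment, while the exact paths and Dockerfile contents are assumptions:

```dockerfile
# Hypothetical addition to the operator Dockerfile: ship the backup
# script inside the operator image so the backup CronJob can reuse it
# instead of pulling a separate image.
COPY build_backup/backup.sh /usr/local/bin/backup.sh
RUN chmod +x /usr/local/bin/backup.sh
```

The CronJob pod spec would then point at the operator image and override the entrypoint, e.g. `command: ["/usr/local/bin/backup.sh"]`.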

@ibumarskov
Author

> Build the backup Docker image and upload it to Docker Hub under the name pravega/zkbackup:0.1.
>
> It might be easier to enhance the operator docker image with the backup script […]

Good point, I'll provide a new patch with fixes and unit tests soon.

@lunarfs (Contributor) commented Apr 7, 2022

Hi @ibumarskov, where did this land? I am currently evaluating my options for automated backup of our cluster. Any plans to get this PR into master?

@ibumarskov (Author)

> Hi @ibumarskov, where did this land? I am currently evaluating my options for automated backup of our cluster. Any plans to get this PR into master?

Hi, I currently don't have enough time for this task. Feel free to finish it if you want.

@ibumarskov force-pushed the zk-backup-controller branch from 6b1471c to d98262d on April 13, 2022 06:12
Signed-off-by: Ilya Bumarskov <[email protected]>
@ibumarskov force-pushed the zk-backup-controller branch from d98262d to a53eebc on April 13, 2022 07:22
@ibumarskov force-pushed the zk-backup-controller branch 4 times, most recently from 532e03b to 06853d8 on April 13, 2022 07:56
@ibumarskov force-pushed the zk-backup-controller branch from 06853d8 to c9583f4 on April 13, 2022 08:05
@anishakj (Contributor)

@ibumarskov, would you like to update this PR?

@ibumarskov force-pushed the zk-backup-controller branch from ed8c666 to 1e7c76c on June 22, 2022 11:55
@lunarfs (Contributor) commented Jun 22, 2022

Please add

```
cat config/crd/bases/zookeeper.pravega.io_zookeeperbackups.yaml >> charts/zookeeper-operator/templates/zookeeper.pravega.io_zookeeperclusters_crd.yaml
```

to the Makefile at line 119 (https://github.com/ibumarskov/zookeeper-operator/blob/1e7c76cab67192161adabfbc10e3c63ed146f8f9/Makefile#L118) so that the chart's CRD file contains the backup definition.
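A sketch of the amended Makefile target, assuming a typical controller-gen recipe; only the final `cat` line comes from this comment, the surrounding lines are assumptions:

```makefile
# Hypothetical excerpt of the target around Makefile line 119; the
# controller-gen line is an assumption, only the appended cat line is
# from this suggestion.
generate:
	controller-gen crd paths="./..." output:crd:artifacts:config=config/crd/bases
	cat config/crd/bases/zookeeper.pravega.io_zookeeperbackups.yaml >> charts/zookeeper-operator/templates/zookeeper.pravega.io_zookeeperclusters_crd.yaml
```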

@lunarfs (Contributor) commented Jun 22, 2022

> Please add `cat config/crd/bases/zookeeper.pravega.io_zookeeperbackups.yaml >> charts/zookeeper-operator/templates/zookeeper.pravega.io_zookeeperclusters_crd.yaml` to the Makefile at line 119 […]

With that change I managed to get a backup scheduled after running `make generate` to get the new CRD. I will get back on the status later. The YAML I used for scheduling looks like this:

```yaml
apiVersion: "zookeeper.pravega.io/v1beta1"
kind: "ZookeeperBackup"
metadata:
  name: "example-backup"
  namespace: mynamespace
spec:
  zookeeperCluster: "zookeeper-cluster"
  schedule: "*/5 * * * *"
  backupsToKeep: "7"
  dataStorageClass: "backup-class"
  image:
    repository: "<MYREPO>/zookeeper-operator"
    tag: "0.2.13-47-localbuild-2"
```

README.md Outdated (resolved)
Comment on lines +73 to +76

```go
type ZookeeperBackupStatus struct {
	// INSERT ADDITIONAL STATUS FIELD - define observed state of cluster
	// Important: Run "make" to regenerate code after modifying this file
}
```
Contributor:

Empty struct
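For context, a sketch of what a non-empty status could look like; these field names are assumptions for illustration, not part of this PR:

```go
package v1beta1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Hypothetical status fields (this PR currently leaves the struct empty).
type ZookeeperBackupStatus struct {
	// LastBackupTime records when the most recent backup job completed.
	LastBackupTime *metav1.Time `json:"lastBackupTime,omitempty"`
	// Conditions describes the observed state, e.g. Scheduled or Failed.
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}
```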

build_backup/backup.sh Outdated (resolved)

```go
func (r *ZookeeperBackupReconciler) Reconcile(_ context.Context, request reconcile.Request) (reconcile.Result, error) {
	r.Log = logBk.WithValues("Request.Namespace", request.Namespace, "Request.Name", request.Name)
	r.Log.Info("Reconciling ZookeeperBackup")
```
Contributor:

If this message is needed, it would be nice to log the name and the namespace of the CR; otherwise there is no value in the log message.

Author:

Actually, the CR name and namespace are attached to every log entry via `WithValues`. The ZookeeperCluster reconcile function has the same logging. Example:

```
{"level":"info","ts":1649845790.4947126,"logger":"controller_zookeepercluster","msg":"Reconciling ZookeeperCluster","Request.Namespace":"tf","Request.Name":"tf-zookeeper"}
```

```go
}
changed := zookeeperBackup.WithDefaults()
if changed {
	r.Log.Info("Setting default settings for zookeeper-backup")
```
Contributor:

Please add the name and the namespace of the CR.

Author:

All log entries already carry the name/namespace:

```go
r.Log = logBk.WithValues("Request.Namespace", request.Namespace, "Request.Name", request.Name)
```

```go
zkCluster := zookeeperBackup.Spec.ZookeeperCluster
err = r.Client.Get(context.TODO(), types.NamespacedName{Name: zkCluster, Namespace: zookeeperBackup.Namespace}, foundZookeeperCluster)
if err != nil && errors.IsNotFound(err) {
	r.Log.Error(err, fmt.Sprintf("Zookeeper cluster '%s' not found", zkCluster))
```
Contributor:

Please log the ZookeeperBackup name and the namespace.

Author:

Same as above.

@ibumarskov force-pushed the zk-backup-controller branch from 8222c56 to 5d124c4 on June 23, 2022 07:45
- Add backup crd to charts (makefile)
- Add validation of BACKUPS_TO_KEEP variable
- Improve logging

Signed-off-by: ibumarskov <[email protected]>
@ibumarskov force-pushed the zk-backup-controller branch from 5d124c4 to 42522e7 on June 23, 2022 08:33
@ibumarskov (Author)

@lunarfs, @jkhalack thanks for your comments, I have fixed all the issues.
@jkhalack regarding the status of the backup controller: it would be nice to show the status in the backup controller, but this requires additional work, and if you don't mind, it can be done as part of another pull request.

@lunarfs (Contributor) commented Jun 23, 2022

So, a bit of feedback: I scheduled a backup yesterday, but I named the cluster wrong. The backup scheduled nicely, but I found it hard to find the status of the backup. What would have been nice was:

```
kubectl get zookeeperbackup
NAME             AGE   STATUS
example-backup   22h   Failed, Cluster not found
```

or something similar, and maybe when it completes:

```
kubectl get zookeeperbackup
NAME             AGE   STATUS
example-backup   22h   Backup completed 11:22 AM
```
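For reference, a STATUS column like that could be exposed with kubebuilder printer-column markers on the CRD type. A sketch under the assumption that a status field backs it; neither the markers nor the status fields are in this PR:

```go
package v1beta1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// Hypothetical printer columns (not in this PR): kubebuilder generates
// extra `kubectl get zookeeperbackup` columns from these markers.
// +kubebuilder:object:root=true
// +kubebuilder:printcolumn:name="Age",type="date",JSONPath=".metadata.creationTimestamp"
// +kubebuilder:printcolumn:name="Status",type="string",JSONPath=".status.conditions[0].reason"
type ZookeeperBackup struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   ZookeeperBackupSpec   `json:"spec,omitempty"`
	Status ZookeeperBackupStatus `json:"status,omitempty"`
}
```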

@anishakj anishakj requested a review from nishant-yt June 23, 2022 13:51
@lunarfs (Contributor) commented Jun 24, 2022

> So, a bit of feedback: I scheduled a backup yesterday, but I named the cluster wrong. […]

Also, I am not sure that I get it right, but I would expect this to work:

```
kubectl get ZookeeperBackupList
Error from server (NotFound): Unable to list "zookeeper.pravega.io/v1beta1, Resource=zookeeperbackuplists": the server could not find the requested resource (get zookeeperbackuplists.zookeeper.pravega.io)
```
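
As an aside, `*List` kinds are not served as separate API resources in Kubernetes, so this error is expected; listing goes through the plural resource name:

```bash
# List the backup CRs via the plural resource name; the List kind only
# exists as a Go/wire type, not as a retrievable resource.
kubectl get zookeeperbackups
```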

@syusuf-r7

Hi @ibumarskov, I'm also interested in this. Are there plans to get this merged into master soon?

@ibumarskov (Author) commented Aug 19, 2022

@anishakj ZookeeperClusterList returns the same error:

```
kubectl get ZookeeperClusterList
Error from server (NotFound): Unable to list "zookeeper.pravega.io/v1beta1, Resource=zookeeperclusterlists": the server could not find the requested resource (get zookeeperclusterlists.zookeeper.pravega.io)
```

Can you please merge this PR and continue working on improvements in other PRs?

@anishakj anishakj requested a review from derekm August 19, 2022 12:09