Make PD and TiKV wait until local IP address matches the one published to external DNS #5381

Merged
merged 17 commits into from
Nov 17, 2023

Conversation

@smineyev81 smineyev81 commented Nov 9, 2023

What problem does this PR solve?

Currently, PD and TiKV pods wait until the DNS name assigned to them is resolvable by external DNS, but they do not check whether the resolved IP matches the IP assigned to them. This causes problems on pod restart, when DNS may still hold the old IP, which might not match the new one. The result is a false positive: PD/TiKV pods are ready from the Kubernetes point of view but not reachable via the DNS name assigned to them.

What is changed and how does it work?

This change adds an extra check to the PD and TiKV startup scripts that makes sure the IP address received from external DNS matches an IP on the host (essentially, the nslookup result should match one of the IPs returned by hostname -I).
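
The comparison described above can be sketched in Go as follows. This is only an illustration of the matching logic, not the shell code added to the startup scripts; the helper names ipMatchesLocal and parseHostnameI are hypothetical, and the inputs are plain strings so the check can be exercised without real DNS.

```go
package main

import (
	"net"
	"strings"
)

// parseHostnameI splits the whitespace-separated output of `hostname -I`
// into individual address strings. (Hypothetical helper for illustration.)
func parseHostnameI(out string) []string {
	return strings.Fields(out)
}

// ipMatchesLocal reports whether at least one address resolved for the pod's
// DNS name equals one of the host's own addresses. Entries that do not parse
// as IP addresses are ignored. (Hypothetical helper for illustration.)
func ipMatchesLocal(resolved, local []string) bool {
	localSet := make(map[string]struct{}, len(local))
	for _, l := range local {
		if ip := net.ParseIP(strings.TrimSpace(l)); ip != nil {
			localSet[ip.String()] = struct{}{}
		}
	}
	for _, r := range resolved {
		ip := net.ParseIP(strings.TrimSpace(r))
		if ip == nil {
			continue
		}
		if _, ok := localSet[ip.String()]; ok {
			return true
		}
	}
	return false
}
```

With a stale DNS record (for example, the resolved address is the pod's previous IP), ipMatchesLocal returns false, which corresponds to the case where the startup script should keep waiting.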

Code changes

  • Has Go code change
  • Has CI related scripts change

Tests

  • Unit test
  • E2E test
  • Manual test
  • No code

Side effects

  • Breaking backward compatibility
  • Other side effects:

Related changes

  • Need to cherry-pick to the release branch
  • Need to update the documentation

Release Notes

NONE

ti-chi-bot bot commented Nov 9, 2023

Welcome @smineyev81! It looks like this is your first PR to pingcap/tidb-operator 🎉

@csuzhangxc csuzhangxc left a comment

If we update the start scripts directly, then once we upgrade TiDB Operator, all existing components will be restarted. This is not acceptable in some scenarios.

smineyev81 commented Nov 11, 2023

> If we update the start scripts directly, then once we upgrade TiDB Operator, all existing components will be restarted. This is not acceptable in some scenarios.

Addressed by moving the new feature behind a feature flag, which is exposed via a new field: TidbCluster.Spec.WaitForDnsNameIpMatchOnStartup
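
A minimal sketch of why gating the check behind a flag avoids restarts: when the flag is off, the rendered script is byte-for-byte the same as before, so existing pods see no change. The template text, scriptModel, and renderScript below are hypothetical stand-ins, not the operator's actual start script code.

```go
package main

import (
	"bytes"
	"text/template"
)

// scriptTmpl is a hypothetical, heavily shortened start script. The real
// PD/TiKV templates are not reproduced here.
const scriptTmpl = `#!/bin/sh
{{- if .WaitForDnsNameIpMatchOnStartup }}
# wait until the DNS record matches a local IP (placeholder)
{{- end }}
exec /pd-server
`

// scriptModel carries the flag into the template (assumed shape).
type scriptModel struct {
	WaitForDnsNameIpMatchOnStartup bool
}

// renderScript renders the start script for the given model.
func renderScript(m scriptModel) (string, error) {
	t, err := template.New("start").Parse(scriptTmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, m); err != nil {
		return "", err
	}
	return buf.String(), nil
}
```

Rendering with the zero-value scriptModel yields only the shebang and exec lines; the extra wait step appears only when the flag is set to true.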

codecov-commenter commented Nov 13, 2023

Codecov Report

Merging #5381 (f925376) into master (d83372b) will increase coverage by 6.07%.
The diff coverage is 100.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5381      +/-   ##
==========================================
+ Coverage   61.56%   67.63%   +6.07%     
==========================================
  Files         228      239      +11     
  Lines       28866    32644    +3778     
==========================================
+ Hits        17772    22080    +4308     
+ Misses       9349     8761     -588     
- Partials     1745     1803      +58     
Flag Coverage Δ
e2e 47.34% <79.41%> (?)
unittest 61.62% <100.00%> (+0.05%) ⬆️

@csuzhangxc csuzhangxc left a comment

Please execute hack/update-all.sh to update the CRDs and fix the CI.

pkg/manager/member/startscript/v2/common.go (outdated, resolved)

@smineyev81
Contributor Author

> please execute hack/update-all.sh to update CRDs to fix the CI

Regenerated and pushed.

@csuzhangxc
Member

/run-pull-e2e-kind

@csuzhangxc
Member

/run-pull-e2e-kind-across-kubernetes

@csuzhangxc
Member

/run-pull-e2e-kind-serial


// WaitForDnsNameIpMatchOnStartup indicates whether PD and TiKV have to wait
// until the local IP address matches the one published to external DNS
WaitForDnsNameIpMatchOnStartup bool `json:"waitForDnsNameIpMatchOnStartup,omitempty"`

Member

How about adding a comment to indicate it is only supported in start script v2 now?

Contributor Author

done

Member

I didn't see any new commit after this comment.

Contributor Author

> I didn't see any new commit after this comment.

Sorry, the push did not go through the first time.
The changes are there now.

Member

You need to regenerate the OpenAPI spec.

Contributor Author

@smineyev81 smineyev81 Nov 15, 2023

What do you think about replacing WaitForDnsNameIpMatchOnStartup bool with a more generic field, StartupScriptFeatureFlags []string `json:"startupScriptFeatureFlags,omitempty"`? This would let us enable features in the future without adding extra fields to the CRD. See the last commit.

PS: I want to make this change because we foresee another feature that will need to be added to the startup scripts very soon.
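
As a rough sketch of the proposal above, a string-slice field can be checked with a small lookup helper. The clusterSpec type, the hasFeature and parseSpec helpers, and the flag value "WaitForDnsNameIpMatch" are assumptions for illustration; the thread does not show the actual flag strings or the surrounding spec type.

```go
package main

import "encoding/json"

// flagWaitForDNSNameIPMatch is a hypothetical flag name; the actual string
// used by the operator is not shown in this thread.
const flagWaitForDNSNameIPMatch = "WaitForDnsNameIpMatch"

// clusterSpec is a stand-in for the part of the cluster spec that carries
// the proposed generic field.
type clusterSpec struct {
	StartupScriptFeatureFlags []string `json:"startupScriptFeatureFlags,omitempty"`
}

// hasFeature reports whether the named flag is listed in the spec.
func (s clusterSpec) hasFeature(name string) bool {
	for _, f := range s.StartupScriptFeatureFlags {
		if f == name {
			return true
		}
	}
	return false
}

// parseSpec decodes a JSON document into clusterSpec.
func parseSpec(data []byte) (clusterSpec, error) {
	var s clusterSpec
	err := json.Unmarshal(data, &s)
	return s, err
}
```

Under this shape, enabling a future startup-script feature means adding another string to the list rather than another field to the CRD, which is the trade-off the comment argues for.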

@csuzhangxc
Member

/run-all-tests

@csuzhangxc
Member

/run-all-tests

Do not be surprised if some E2E tests fail, as the CI environment is not stable right now.

@csuzhangxc
Member

/run-pull-e2e-kind

@csuzhangxc
Member

/run-pull-e2e-kind-across-kubernetes

@csuzhangxc
Member

/run-pull-e2e-kind-br

@csuzhangxc
Member

/run-pull-e2e-kind-across-kubernetes

@csuzhangxc
Member

/run-pull-e2e-kind-br

@csuzhangxc
Member

/run-pull-e2e-kind-tngm

@csuzhangxc
Member

/run-pull-e2e-kind-tngm

@csuzhangxc
Member

/run-pull-e2e-kind-across-kubernetes

@ti-chi-bot ti-chi-bot bot added the lgtm label Nov 17, 2023

ti-chi-bot bot commented Nov 17, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: csuzhangxc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ti-chi-bot bot commented Nov 17, 2023

[LGTM Timeline notifier]

Timeline:

  • 2023-11-17 05:52:13.444743863 +0000 UTC m=+4401131.031854000: ☑️ agreed by csuzhangxc.

@ti-chi-bot ti-chi-bot bot added the approved label Nov 17, 2023
@csuzhangxc csuzhangxc merged commit d8426ec into pingcap:master Nov 17, 2023
4 checks passed
@csuzhangxc
Member

/cherry-pick release-1.5

@ti-chi-bot
Member

@csuzhangxc: new pull request created to branch release-1.5: #5399.

In response to this:

/cherry-pick release-1.5

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

ti-chi-bot pushed a commit to ti-chi-bot/tidb-operator that referenced this pull request Nov 17, 2023
csuzhangxc added a commit that referenced this pull request Nov 17, 2023
…d to external DNS (#5381) (#5399)

Co-authored-by: Sergey <[email protected]>
Co-authored-by: csuzhangxc <[email protected]>