[core] Fix gcs and raylet logging for stdout #48952

Merged
merged 84 commits into from
Dec 22, 2024

dentiny
Contributor

@dentiny dentiny commented Nov 26, 2024

Same motivation as #48931, but different implementation.

TLDR for the problem:

  • The excessive logging is caused by a bug in how the rotation size is set for spdlog on the C++ side; in addition, the log redirected from the Python side has no rotation support
  • The solution proposed in this PR is to manage the whole log via spdlog and to disable the redirection logic in Python
  • This PR updates the logging logic for both GCS and raylet, and only for stdout; stderr is left for the next PR

Signed-off-by: hjiang <[email protected]>
@dentiny dentiny requested review from jjyao and rynewang November 26, 2024 22:15
@dentiny dentiny requested a review from a team as a code owner November 26, 2024 22:15
@dentiny dentiny added the go add ONLY when ready to merge, run all tests label Nov 26, 2024
Signed-off-by: hjiang <[email protected]>
@dentiny dentiny force-pushed the hjiang/gcs-service-logging branch from 79144b4 to 7c1c266 Compare November 27, 2024 00:47
src/ray/gcs/gcs_server/gcs_server_main.cc
? std::numeric_limits<int64_t>::max()
: FLAGS_log_rotation_size;
RAY_CHECK_EQ(setenv(
"RAY_ROTATION_MAX_BYTES", std::to_string(log_rotation_max_size), /*overwrite=*/1));
Contributor

Can you explain the difference between RAY_ROTATION_MAX_BYTES and FLAGS_log_rotation_size? If we already have the former, don't we only need to fix the existing behavior? I see gcs_server_main.cc already calls ray::RayLog::StartRayLog, so why doesn't the log rotation in it work?

Contributor

What if we don't do anything else in this PR and instead only do 2 things:

  1. remove python stdout/stderr redirection
  2. change ray_log_shutdown_raii from /*log_dir=*/"" to /*log_dir=*/FLAGS_log_dir

will the rotations automatically work?

Collaborator

Yes but the file name will be changed. We want to keep the existing gcs_server.out filename for backward compatibility.

Contributor Author

will the rotations automatically work?

To answer your question: yes, passing the log directory makes log rotation work.
But one motivation for the current approach is backward compatibility, namely keeping the gcs_server.out filename.

@dentiny dentiny requested review from rynewang and jjyao November 28, 2024 00:48
Signed-off-by: hjiang <[email protected]>
@rynewang
Contributor

rynewang commented Dec 2, 2024

Now we have 2 ways to specify a RayLog storage:

  • log_dir
  • log_file

and log_file has higher priority than log_dir.

This setup is a bit nuanced and instead can we do something simpler like this:

  1. log_dir controls if spdlog logs or not
  2. log_file is relative path (or just a stem name), only used as log file name override

so that:

  1. log_dir non-empty, log_name empty -> writes to log_dir by spdlog default log names, subject to duplicate file name renaming and rotations
  2. log_dir non-empty, log_name non-empty -> writes to log_dir/log_name by spdlog, subject to duplicate file name renaming and rotations
  3. both empty: no spdlog writes
  4. log_dir empty, log_name non-empty -> illegal, RAY_LOG(FATAL)
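The four cases above can be sketched as a small decision function. This is only an illustration of the proposal, not the code in this PR: `ResolveLogFilePath` and its `default_name` parameter are hypothetical names, and an exception stands in for `RAY_LOG(FATAL)` so that the sketch compiles without Ray headers. Duplicate file name renaming and rotation are left out.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of Ray: maps (log_dir, log_name) to the file
// that spdlog would write to, following the four cases listed above.
// An empty return value means "no spdlog file output".
std::string ResolveLogFilePath(const std::string &log_dir,
                               const std::string &log_name,
                               const std::string &default_name) {
  if (log_dir.empty()) {
    if (!log_name.empty()) {
      // Case 4: a file name without a directory is illegal. The proposal
      // uses RAY_LOG(FATAL); an exception stands in for it here.
      throw std::invalid_argument("log_name is set but log_dir is empty");
    }
    return "";  // Case 3: both empty, no spdlog writes.
  }
  // Case 1 uses the default name, case 2 uses the override.
  const std::string &file_name = log_name.empty() ? default_name : log_name;
  return log_dir + "/" + file_name;
}
```

With this shape, a caller that wants to keep the existing file name would pass `gcs_server.out` as the override together with a non-empty `log_dir`.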

@dentiny
Contributor Author

dentiny commented Dec 2, 2024

Now we have 2 ways to specify a RayLog storage:

  • log_dir
  • log_file

and log_file has higher priority than log_dir.

This setup is a bit nuanced and instead can we do something simpler like this:

  1. log_dir controls if spdlog logs or not
  2. log_file is relative path (or just a stem name), only used as log file name override

so that:

  1. log_dir non-empty, log_name empty -> writes to log_dir by spdlog default log names, subject to duplicate file name renaming and rotations
  2. log_dir non-empty, log_name non-empty -> writes to log_dir/log_name by spdlog, subject to duplicate file name renaming and rotations
  3. both empty: no spdlog writes
  4. log_dir empty, log_name non-empty -> illegal, RAY_LOG(FATAL)

Updated, let me know if I understand correctly.

@dentiny dentiny requested a review from rynewang December 3, 2024 00:27
@dentiny dentiny force-pushed the hjiang/gcs-service-logging branch from 0410cb3 to 3ebaa57 Compare December 5, 2024 00:38
Signed-off-by: dentiny <[email protected]>
Signed-off-by: dentiny <[email protected]>
src/ray/util/logging.cc
const std::string &log_dir,
const std::string &log_filepath,
size_t log_rotation_max_size,
size_t log_rotation_file_num) {
Collaborator

I don't see where it's used

Contributor Author

I haven't finished implementation.

Contributor Author

Passing an argument is not ideal, since you then need to define a priority between the env var and the passed value.

Collaborator

Or you can always pass in the final rotation size and calculate on the caller side.

Contributor Author

Yes, updated.
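A minimal sketch of "calculate on the caller side", assuming the env var from the earlier snippet (`RAY_ROTATION_MAX_BYTES`) overrides a compiled-in default when it holds a clean non-negative integer. `ResolveRotationMaxSize` is a hypothetical name and the override rule is an assumption, not the logic merged in this PR. The point is that the logging setup only ever receives one final number, so it needs no priority rule of its own.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical helper, not part of Ray: decide the final rotation size once,
// on the caller side, and pass only that value to the logging setup.
// Assumption: a well-formed env var wins over the default.
std::size_t ResolveRotationMaxSize(const char *env_name, std::size_t default_size) {
  const char *raw = std::getenv(env_name);
  if (raw == nullptr || *raw == '\0' || *raw == '-') {
    return default_size;  // Unset, empty, or negative: keep the default.
  }
  char *end = nullptr;
  const unsigned long long parsed = std::strtoull(raw, &end, 10);
  if (end == raw || *end != '\0') {
    return default_size;  // Not a clean decimal number: keep the default.
  }
  return static_cast<std::size_t>(parsed);
}
```

The caller would then pass the returned value as `log_rotation_max_size`.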

const std::string &logDir = "");
const std::string &log_dir = "",
const std::string &log_filepath = "",
size_t log_rotation_max_size = kDefaultLogRotationMaxSize,
Collaborator

default_log_rotation_max_size

Contributor Author

The Google coding style suggests that constant names start with k followed by CamelCase: https://google.github.io/styleguide/cppguide.html#Constant_Names

Collaborator

I mean the parameter name. It's the default value when the env var is not set?

Contributor Author

I won't do that... otherwise you have to define a priority system between the passed value and the env value.

Signed-off-by: dentiny <[email protected]>
Signed-off-by: dentiny <[email protected]>
Signed-off-by: dentiny <[email protected]>
Signed-off-by: dentiny <[email protected]>
Signed-off-by: dentiny <[email protected]>
Collaborator

@jjyao jjyao left a comment

@rynewang please merge after CI passes.

@dentiny
Contributor Author

dentiny commented Dec 21, 2024

The failed unit test doesn't seem to be related to my change.

@dentiny
Contributor Author

dentiny commented Dec 21, 2024

Hi @rynewang , the CI has passed (prev failures are due to known flaky tests), could you please kindly help me review / merge this PR when you have some time? Thank you so much!

@rynewang rynewang merged commit d49a906 into ray-project:master Dec 22, 2024
5 checks passed
rynewang pushed a commit that referenced this pull request Dec 24, 2024
Followup PR for #48952

This cleans up the TODO items left in the previous PR by merging `log_dir`
and `log_filepath` into one.

---------

Signed-off-by: dentiny <[email protected]>
rynewang pushed a commit that referenced this pull request Dec 24, 2024
Followup PR for #48952 to enable
worker log rotation, which I verified to work.

Signed-off-by: dentiny <[email protected]>
srinathk10 pushed a commit that referenced this pull request Jan 3, 2025
Followup PR for #48952

This cleans up the TODO items left in the previous PR by merging `log_dir`
and `log_filepath` into one.

---------

Signed-off-by: dentiny <[email protected]>
srinathk10 pushed a commit that referenced this pull request Jan 3, 2025
Followup PR for #48952 to enable
worker log rotation, which I verified to work.

Signed-off-by: dentiny <[email protected]>