Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libsubprocess: do not allow short writes with flux_subprocess_write() #5548

Merged
merged 4 commits into from
Nov 10, 2023

Conversation

grondo
Copy link
Contributor

@grondo grondo commented Nov 9, 2023

This was pulled off some other work.

Currently, flux_subprocess_write() writes any bytes it can to the destination buffer, which could result in a short write (return code < len). However, a check for a short write is not done is most places (a check for rc < 0 seems to be the common case), and where it is checked it is immediately promoted to a fatal error. In some cases (such as the job-ingest pipeline) a short write can be dangerous since it isn't recovered and writing only part of a line "hangs" the protocol.

In the interest of simplicity, add a check for the available buffer space in flux_subprocess_write() and return immediately with return code -1 and errno set to ENOSPC if there is not enough space for all the data. Update any callers that attempt to handle a short write as a special case.

In the future we'll need a more sophisticated buffered write function anyway that can handle flow control and/or a chain of buffers instead of adding short-write handling at each caller (e.g. see #5542)

Problem: The flux_subprocess_write() function may return a short write
if all the data being written does not fit in the destination buffer,
but short writes are tricky to handle and mostly cause a fatal error
in current code anyway. In some places, short writes are not detected
and would result in unpredictable behavior.

Avoid short writes in flux_subprocess_write() by checking the available
buffer space before writing. Return -1 with errno set to ENOSPC if
there is not enough available space to write the entire buffer.
Problem: No tests in the libsubprocess unit tests ensure that
flux_subprocess_write() returns -1 wih errno == ENOSPC when the
requested data cannot be buffered.

Add a test that creates a subprocess with a very small buffer and
ensures that a write exceeding that buffer space returns -1 with ENOSPC
(instead of a short write).
Problem: In the libsubprocess server code, the result of
flux_subprocess_write() is checked for a short write, but this is no
longer a possible result.

Remove the unnecessary check.
Problem: The rexec and rexec_getline tests in the testsuite check
for a short write from flux_subprocess_write(), but this is no longer
possible.

Remove the special case for a short write return code from
flux_subprocess_write() in these test programs.
@grondo grondo force-pushed the subprocess-no-short-writes branch from fbf1832 to 2e8a643 Compare November 9, 2023 19:13
Copy link

codecov bot commented Nov 9, 2023

Codecov Report

Merging #5548 (2e8a643) into master (b86d444) will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #5548      +/-   ##
==========================================
+ Coverage   83.42%   83.45%   +0.02%     
==========================================
  Files         480      487       +7     
  Lines       80730    82286    +1556     
==========================================
+ Hits        67347    68669    +1322     
- Misses      13383    13617     +234     
Files Coverage Δ
src/common/libsubprocess/server.c 81.81% <ø> (-0.16%) ⬇️
src/common/libsubprocess/subprocess.c 86.95% <100.00%> (-0.77%) ⬇️

... and 172 files with indirect coverage changes

Copy link
Member

@garlick garlick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it does a good job of relieving the nastiest symptom of this failure - hanging the validator. LGTM.

@grondo
Copy link
Contributor Author

grondo commented Nov 10, 2023

This solves half the problem. I'll have another PR with the other half tomorrow. (Struggling a bit with how to test...)

I'll set MWP here though.

@mergify mergify bot merged commit f8558d0 into flux-framework:master Nov 10, 2023
32 checks passed
@grondo grondo deleted the subprocess-no-short-writes branch May 29, 2024 13:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants