out_s3: major refactor to only upload in a single "daemon" routine #6919
base: 1.9
Conversation
… update s3 warn output messages with function s3_retry_warn() Signed-off-by: Clay Cheng <[email protected]>
Signed-off-by: Wesley Pettit <[email protected]>
Signed-off-by: Wesley Pettit <[email protected]>
…hunks Signed-off-by: Wesley Pettit <[email protected]>
S3 now always uses sync IO. In the future, we will switch back to async with the concept of a daemon coroutine that is the only thing that can send data. Either way, we don't need to mark that chunks are actively being sent with the locked bool. A subsequent commit will refactor this code to use locked for a similar but slightly different purpose. Locked will mean a chunk is ready/done and there should be no new appends to it. Locked chunks will be added to a queue for sending. Unlocked chunks are ready for new flushes to buffer new data. Signed-off-by: Wesley Pettit <[email protected]>
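For context, a minimal sketch (not part of the actual patch) of the locked-chunk flow described above: a ready chunk is locked so no new data is appended, then placed on the queue that the daemon routine drains. The helper name s3_queue_ready_chunk is hypothetical; s3_store_file_lock, upload_queue, and _head follow names used elsewhere in this PR.

/* Illustrative only: lock a finished chunk and hand it to the daemon routine */
static void s3_queue_ready_chunk(struct flb_s3 *ctx, struct s3_file *chunk)
{
    s3_store_file_lock(chunk);                       /* no more appends to this chunk */
    mk_list_add(&chunk->_head, &ctx->upload_queue);  /* daemon routine will upload it */
}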
Signed-off-by: Wesley Pettit <[email protected]>
…timer callback only Signed-off-by: Wesley Pettit <[email protected]>
Signed-off-by: Wesley Pettit <[email protected]>
Signed-off-by: Wesley Pettit <[email protected]>
Signed-off-by: Wesley Pettit <[email protected]>
@PettitWesley hey there, will this PR help with enabling concurrency, which means we can enable multiple workers?
@kailunshi No... this PR is incomplete, but once complete it aims simply to fix instabilities in the plugin... Adding support for multi-workers is a much more long-term thing. Unfortunately, since S3 has its own internal buffer, there needs to be synchronization if multiple threads are trying to send. While that could be implemented, it's not high priority right now IMO. S3 is fairly performant: see our load test results here: https://github.com/aws/aws-for-fluent-bit/releases
@PettitWesley I agree it's pretty performant, but we hit the bottleneck of 15K per second without gzip and less than 10K rps with compression enabled.
@kailunshi is the 15K per second there correct? Is it bytes or records? If it's bytes, that doesn't match up with our load test results: https://github.com/aws/aws-for-fluent-bit/releases If it's records, how large are your records? Our load tests use a 1KB log size, so our 30 MB/s is actually 30,000 records per second...
@PettitWesley sorry, by 15k I mean 15,000 log records (all JSON format) per second. I randomly checked a few records, and they are around 1.5KB to 2KB. I think the bottleneck got hit with compression turned on, as I think it's running on the same CPU. In your load test, do you enable compression?
@kailunshi we do not enable compression, it's just a basic upload to S3 test: https://github.com/aws/aws-for-fluent-bit/blob/mainline/load_tests/logger/stdout_logger/log_configuration/s3.json
@PettitWesley yeah, with compression one CPU quickly gets saturated, which causes the limitation for us.
Force-pushed from 27f898b to b311a0d
Signed-off-by: Wesley Pettit <[email protected]>
Force-pushed from b311a0d to 78b0e2f
Signed-off-by: Wesley Pettit <[email protected]>
Signed-off-by: Wesley Pettit <[email protected]>
Signed-off-by: Wesley Pettit <[email protected]>
plugins/out_s3/s3.c
Outdated
 */
    cb_s3_upload(config, ctx);
}
/* this would use sync IO which we want to avoid */
I suppose that this means old data will not be sent until the first chunk is received. Maybe it's fine to use sync networking here since it only happens once; we just don't want to consistently use sync networking.
If so, cb_s3_upload may need a new parameter called is_async.
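A rough sketch of that suggestion, assuming the flag would simply toggle FLB_IO_ASYNC on the S3 client's upstream; the is_async parameter and this exact placement are hypothetical, not the actual PR code:

/* Hypothetical: let the caller pick sync networking for the one-time
 * startup flush and async for the normal daemon path */
static void cb_s3_upload(struct flb_config *config, void *data, int is_async)
{
    struct flb_s3 *ctx = data;

    if (is_async == FLB_TRUE) {
        ctx->s3_client->upstream->flags |= FLB_IO_ASYNC;
    }
    else {
        ctx->s3_client->upstream->flags &= ~(FLB_IO_ASYNC);
    }

    /* ... walk the queued chunks and upload them as before ... */
}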
Yea I'm still debating this part myself. And testing. This is one bit which can block shutdown from happening properly.
This is just temporary while I'm testing. I need to get the flush startup chunks routine working and tested in the daemon coroutine first, then I'll add this back.
Currently the flush startup chunks routine is flushing all chunks, since it just iterates over the fstore queue...
We have to have this because otherwise S3 won't upload any old data until it's sent new data, which is silly.
plugins/out_s3/s3.c
Outdated
if (chunk) {
    chunk->failures += 1;
    if (chunk->failures > ctx->ins->retry_limit) {
        s3_retry_warn(ctx, tag, chunk->input_name, create_time, FLB_FALSE);
        s3_store_file_delete(ctx, chunk);
        s3_retry_warn(ctx, tag, chunk->input_name, file_first_log_time, FLB_FALSE);
good catch here
what is?
Leonardo is the one who found the create_time issue
Ah, I see.
if (chunk) {
    chunk->failures += 1;
    if (chunk->failures > ctx->ins->retry_limit) {
        s3_retry_warn(ctx, tag, chunk->input_name, create_time, FLB_FALSE);
        s3_store_file_delete(ctx, chunk);
After failures is greater than the retry limit, why is the chunk not deleted?
I see. You're unifying all the chunk maintenance logic into the send path. This makes things simpler.
Also, we have to remove the chunk from the upload_queue list before deleting it, else we segfault, IIRC.
I just realized we need to support the unlimited retries option here: https://docs.fluentbit.io/manual/administration/scheduling-and-retries
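For illustration, a hedged sketch of how the unlimited-retries case could be handled in this block; it assumes retry_limit holds a negative sentinel (FLB_OUT_RETRY_UNLIMITED in flb_output.h) when Retry_Limit is set to no_limits, so a chunk is only dropped when a finite limit is configured:

if (chunk) {
    chunk->failures += 1;
    /* only drop the chunk when a finite retry limit is configured */
    if (ctx->ins->retry_limit >= 0 &&
        chunk->failures > ctx->ins->retry_limit) {
        s3_retry_warn(ctx, tag, chunk->input_name, file_first_log_time, FLB_FALSE);
        s3_store_file_delete(ctx, chunk);
    }
}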
… coro on shutdown, send timed out chunks in daemon coro Signed-off-by: Wesley Pettit <[email protected]>
Still need to check this more...
                  chunk->create_time, FLB_FALSE);
if (chunk->locked == FLB_TRUE) {
    /* remove from upload_queue */
    if (chunk->_head.next != NULL && chunk->_head.prev != NULL) {
This check is valid because calloc is used to create s3_file, so _head.next and _head.prev stay NULL until the chunk has actually been added to a list.
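For reference, the complete guard this snippet leads into would look roughly like the following; the mk_list_del() unlink is the operation being protected (illustrative, matching the reasoning above):

/* _head.next/_head.prev are NULL (calloc) until the chunk is linked
 * into upload_queue, so only unlink if it was actually added */
if (chunk->_head.next != NULL && chunk->_head.prev != NULL) {
    mk_list_del(&chunk->_head);
}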
}
ctx->timer_created = FLB_TRUE;
/* Notify the event loop about our return status */
n = flb_pipe_w(pipe_fd, (void *) &val, sizeof(val));
Nothing wrong here.
The following line doesn't affect this coroutine because flb_output_flush_prepare_destroy(out_flush); was deleted:
fluent-bit/src/flb_output_thread.c, line 91 in 9dc3f9e:
flb_output_flush_finished(config, out_id);
It's not exactly clear how the current code works after the coroutine for it is destroyed:
- handle_output_event
- flb_output_flush_finished
- flb_output_flush_destroy
- flb_coro_destroy
- co_delete
The coroutine is not destroyed at all; the task is basically just marked as completed.
The code revision is looking promising! Added some concerns about the new stopping condition.
/*
 * FLB engine uses a graceful cooperative shutdown model.
 * If coroutines never end, the system won't stop.
 * So the daemon coroutine must exit itself when the engine is in shutdown mode.
 */
while (config->is_running == FLB_TRUE) {
    /* Cleanup old buffers found on startup */
    flush_startup_chunks(ctx);

    /* upload any ready chunks */
    cb_s3_upload(config, ctx);

    if (config->is_running == FLB_FALSE) {
        break;
    }
This looks good. Do you know whether retries stop being processed after the grace timeout has expired?
I see ->is_running is set in the flb_engine_shutdown function after the grace timeout is triggered (as mentioned in the comment): https://github.com/fluent/fluent-bit/blob/1.9/src/flb_engine.c
If retries can still be triggered, however, then we may have an edge case where a new chunk could be ingested.
Concern 2
If there are failures in cb_s3_upload, then we may want this daemon thread to continue running. In that case, the while loop condition could be:
while (config->is_running == FLB_TRUE || <there is an item in the chunk buffer>)
Concern 3
What if there are a bunch of coroutines scheduled to run and deliver data to S3, but haven't run yet? This data will be lost.
Is there a way to identify the number of chunks on a plugin? I see we have input plugins
Maybe you could wait for the buffer space to drop to 0, indicating that all the data has been successfully enqueued into the S3 buffer. Then also check that the buffer is empty.
fs_backlog_chunks_size:
fluent-bit/include/fluent-bit/flb_output.h, line 366 in 703da7d:
size_t fs_backlog_chunks_size;
So we have 3 places where logs can exist:
1. Input - this seems to stop being ingested here (lines 474 to 477 in 52fe1bd):
/* flb_engine_shutdown was already initiated */
if (config->is_running == FLB_FALSE) {
    return 0;
}
2. As a flush coro that hasn't started yet or has started - most likely this is indicated by the fs_backlog_chunks_size field; 0 means that this data has been sent to (3).
3. The S3 buffer - once processed, the data is sent to the S3 buffer. The buffer's linked list being empty would indicate that this data has been sent or dropped.
Let's make sure all data from 2 and 3 is processed (empty) before stopping the daemon coroutine, as it is needed to send out the last bits of data.
So the revised while condition would be:
while (config->is_running == FLB_TRUE || x->fs_backlog_chunks_size != 0 || <there is an item in the chunk buffer>)
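Spelled out as code, the suggested condition might look like the sketch below; the field and helper names (ctx->ins->fs_backlog_chunks_size, upload_queue, mk_list_size()) are assumptions for illustration, not the final PR code:

/* keep the daemon routine alive while the engine runs or data is still queued */
while (config->is_running == FLB_TRUE ||
       ctx->ins->fs_backlog_chunks_size != 0 ||
       mk_list_size(&ctx->upload_queue) > 0) {
    /* drain any chunks that are still pending */
    cb_s3_upload(config, ctx);
}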
I want to preserve the existing behavior, which is that S3 never blocks the engine shutdown. Since S3 always buffers locally, any data it gets sent can be recovered. Adding some sort of wait would introduce new shutdown-blocking behavior.
If retries can still be triggered, however, then we may have an edge case where a new chunk could be ingested.
It's not.
Once all flush coros are done, the engine will invoke cb_s3_exit, which will try once to send all pending chunks. So the data still has a chance to be sent.
Got it. If retries aren't processed and we don't want the engine to be blocked on shutdown, then terminating the daemon coroutine seems like a good idea.
Not sure if cb_s3_exit can use the async network stack.
Also, it looks like flb_output_exit, which invokes the cb_exit callback, is called before the remaining coroutines finish. I suppose this is acceptable, because after the grace timeout, all logs that should be processed already had their chance to do so.
Line 1042 in 52fe1bd:
flb_output_exit(config);
Not sure if cb_s3_exit can use the async network stack.
I think it's fine if it uses sync?
Also it looks like flb_output_exit which invokes the cb_exit callback is called before the remaining coroutines finish. I suppose this is acceptable, because after the grace timeout, all logs that should be processed already had their chance to do so.
This is not what I see in the code... I see that asking the coros to stop happens first, but there's no wait/guarantee that they have finished before cb_exit...
Though in my testing before, when the daemon coro didn't end, I didn't see cb_s3_exit being run... so I think we might be misreading the code here.
If the exit callback and the daemon coro attempted to send data at the same time, that could cause problems.
I'll look more into this.
This is a good point. Let me know what you find.
These comments make sense to me. Shutting down when grace expires, even if some remaining coroutines are queued to add data to the buffer, seems reasonable. We could say that at this point in time, after the grace period has expired, the remaining data should have already been sent out.
I think that the current way of doing things keeps things simple and is effective.
plugins/out_s3/s3.c
Outdated
    cb_s3_upload(config, ctx);
}
/* this would use sync IO which we want to avoid */
// /* clean up any old buffers found on startup */
@PettitWesley I keep forgetting to add this back...
@@ -1202,17 +1172,12 @@ static int upload_data(struct flb_s3 *ctx, struct s3_file *chunk,
else {
    s3_retry_warn(ctx, (char *) chunk->fsf->meta_buf, m_upload->input_name,
                  chunk->create_time, FLB_TRUE);
    s3_store_file_unlock(chunk);
A few lines above here we call s3_decrement_index(ctx); after an UploadPart failure. But it's only incremented on CreateUpload, not for each part. So this is a bug.
I think I'll fix that in a separate bug fix PR.
Oh oops. I read this wrong. We do it only if we are removing the upload, so this is correct.
@@ -1041,6 +1029,7 @@ static int upload_data(struct flb_s3 *ctx, struct s3_file *chunk,
 */
if (chunk != NULL) {
    file_first_log_time = chunk->first_log_time;
    input_name = chunk->input_name;
need to actually use this below
Signed-off-by: Wesley Pettit <[email protected]>
Signed-off-by: Wesley Pettit <[email protected]>
As the 1.9 branch is no longer actively maintained, this PR won't be merged. For the AWS distro, I have this PR here: https://github.com/PettitWesley/fluent-bit/pull/24/files which is slightly different since the AWS distro has diverged slightly from the 1.9 branch.
/* skip metadata stream */
if (fs_stream == ctx->stream_metadata) {
    continue;
}

/* on startup, we only send old chunks in this routine */
if (is_startup == FLB_TRUE && fs_stream == ctx->stream_active) {
TODO: need to double check that I didn't mess up the async flags for startup vs normal execution
@@ -24,20 +24,22 @@
#include <fluent-bit/flb_fstore.h>

struct s3_file {
-    int locked;    /* locked chunk is busy, cannot write to it */
+    int locked;    /* locked = no appends to this chunk */
Should just remove the unlock function, to prevent accidental bugs; it's not needed anymore.
This PR will not be updated and is missing the async_flags setting. Instead I am using: PettitWesley#24
This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.
Enter [N/A] in the box if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment.
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
ok-package-test label to test for all targets (requires maintainer to do).

Documentation

Backporting

Fluent Bit is licensed under Apache 2.0; by submitting this pull request I understand that this code will be released under the terms of that license.