S3 key timestamp is NOT the timestamp of any log record #459
Comments
As noted here, there are potentially two bugs/needed enhancements; the second is to support non-UTC timestamps for S3: #432 (comment)
@PettitWesley I assume this is still a known issue and being worked on and/or tracked? We just saw this on a cluster which has a large number of pods (41). The folder we wrote to within S3 had the correct date, but the timestamp of the gzip file itself was from yesterday; the contents, however, are from today. Would it make sense to update the "last modified date" of the gzip file just prior to uploading?
https://github.com/aws/aws-for-fluent-bit/releases/tag/v2.31.3 All of the S3 fixes will come back soonish, once I complete the S3 stability refactor (code complete and tested, but one pending core change is needed to enable it): PettitWesley/fluent-bit#24
@PettitWesley thanks for getting back. We'll keep following for now. Not a show-stopper, but something we noticed while trying to debug logs that left us scratching our heads wondering if we were sane or not :)
I'm not sure that the bug fix for #459 (comment) above will resolve the issue. We are using Fluent Bit to write logs to S3 and then using Athena partitioning to query the logs, e.g. a file written to S3 with path year=2023/month=10/day=04/hour=03/somefile.gz and queried with SELECT *. If the file receives its S3 prefix from the time of the first log, this file could still contain records from hour 4. Ideally Fluent Bit would cut over to a new file at the partition change.
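To illustrate the partition-boundary problem described above, here is a minimal sketch (not Fluent Bit code; the function and record values are hypothetical) showing how a file keyed by its first record's hour can still hold records that belong in the next hour's partition:

```python
from datetime import datetime, timezone

def partition_prefix(ts: datetime) -> str:
    """Build an Athena-style partition path for a record timestamp."""
    return ts.strftime("year=%Y/month=%m/day=%d/hour=%H")

# Hypothetical records buffered into one S3 file: the key is derived
# from the FIRST record, but a later record crosses the hour boundary
# and ends up under the wrong partition.
records = [
    datetime(2023, 10, 4, 3, 59, 50, tzinfo=timezone.utc),
    datetime(2023, 10, 4, 4, 0, 10, tzinfo=timezone.utc),
]
file_prefix = partition_prefix(records[0])
misplaced = [r for r in records if partition_prefix(r) != file_prefix]
print(file_prefix)     # year=2023/month=10/day=04/hour=03
print(len(misplaced))  # 1 record actually belongs to hour=04
```

An Athena query scoped to hour=04 would miss that second record, which is why cutting over to a new file at the partition boundary matters.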
Describe the question/issue
I thought this was not the case, but it turns out our code does not actually take the log timestamp and use it to set the file name. Customers likely expect that an S3 file with a certain timestamp would have its first log entry at that timestamp, with all subsequent logs in the same S3 file coming afterwards.
Since this is not the case, it may be difficult to find specific logs in files.
See the code here:
The timestamp is always just the current time at which out_s3 started creating the file on disk for buffering. Not the upload time. And not a timestamp from the logs.
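A small sketch of the behavior described here (the function name and key format are illustrative, not the actual out_s3 code): strftime specifiers in the key are expanded with the buffer-file creation time, so the resulting key can be dated before every log the file contains.

```python
from datetime import datetime, timezone

def render_s3_key(fmt: str, creation_time: datetime) -> str:
    """Approximate the described behavior: expand strftime specifiers
    in the key format with the time the buffer file was created on
    disk, not with any log record's timestamp."""
    return creation_time.strftime(fmt)

# Buffer file created just before midnight; logs keep arriving after.
creation = datetime(2023, 10, 4, 23, 58, tzinfo=timezone.utc)
first_log = datetime(2023, 10, 5, 0, 2, tzinfo=timezone.utc)

key = render_s3_key("/logs/%Y/%m/%d/%H%M%S.gz", creation)
print(key)  # /logs/2023/10/04/235800.gz
```

Here every record in the file (including `first_log`) is from October 5, yet the key says October 4, which matches the "timestamp from yesterday, contents from today" symptom reported above.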