No logs after task completion #45376

Open · 1 of 2 tasks
asemelianov opened this issue Jan 3, 2025 · 6 comments
Labels
area:helm-chart · area:logging · area:UI · kind:bug · needs-triage

Comments

@asemelianov

asemelianov commented Jan 3, 2025

Official Helm Chart version

1.15.0 (latest released)

Apache Airflow version

2.10.4

Kubernetes Version

1.28

Helm Chart configuration

No response

Docker Image customizations

No response

What happened

We use the CeleryKubernetesExecutor. Not in all cases, but sometimes after a task completes, the log output disappears from the UI. To get the logs to show up again, you have to refresh the page.
[Screenshot 2025-01-03 at 15:16:52]

What you think should happen instead

No response

How to reproduce

Launch the pipeline, select any task, and wait for it to complete. After the task completes, the log output disappears. After refreshing the page, the log output is back in place. Sometimes refreshing the page does not immediately display the logs and you need to wait about a minute.

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@asemelianov added the area:helm-chart, kind:bug, and needs-triage labels on Jan 3, 2025

boring-cyborg bot commented Jan 3, 2025

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

@dosubot bot added the area:logging and area:UI labels on Jan 3, 2025
@javier-salazar-zefr

We experienced a similar issue with version 2.10.3. In our case, it was failing to fetch the remote logs: from looking at the requests, it only tried to get attempt=1, which did not exist in the bucket. After refreshing, it would fetch attempt=2.log, and that successfully showed the logs in the UI. This behavior was consistent for every task in the DAG.

After trying different things, we ended up downgrading to 2.9.2 and the issue went away. In 2.9.2 it seems all attempts are fetched on the first load of the screen, so it showed tabs 1 and 2: on tab 1 we could see the same error we saw on that first load in 2.10.3, but clicking the 2 tab showed the full logs from the bucket.

Unsure if this would be the same issue, but they sound similar.
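
For reference, here is a rough way to check which attempt files actually exist in the bucket for a given task instance. This is only a sketch: it assumes the default Airflow 2.x remote log path layout (dag_id=.../run_id=.../task_id=.../attempt=N.log), an S3-backed log store reachable with boto3, and made-up bucket/DAG names in the example call; adjust the prefix if your remote_base_log_folder or log_filename_template differs.

```python
# Sketch: list the attempt=N.log objects stored for one task instance,
# to see whether attempt=1, attempt=2, ... are actually in the bucket.
import boto3


def list_attempt_logs(bucket: str, dag_id: str, run_id: str, task_id: str) -> list[str]:
    # Assumes the default log path layout directly under the bucket root;
    # prepend your remote_base_log_folder prefix if you use one.
    prefix = f"dag_id={dag_id}/run_id={run_id}/task_id={task_id}/"
    s3 = boto3.client("s3")
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    return [obj["Key"] for obj in resp.get("Contents", [])]


# Example call (hypothetical names):
# list_attempt_logs("my-airflow-logs", "example_dag",
#                   "manual__2025-01-03T12:00:00+00:00", "my_task")
```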

@potiuk
Member

potiuk commented Jan 6, 2025

Unsure if this would be the same issue but they sound similar.

That sounds like it's related to the try_number change by @dstandish (#39336).

@potiuk potiuk added this to the Airflow 2.10.5 milestone Jan 6, 2025
@asemelianov
Author

@javier-salazar-zefr @potiuk thanks!

@dstandish
Contributor

This issue (i.e. the one reported by @asemelianov) does not seem try-number related. The reported issue is that after task completion the logs are temporarily absent. This is not terribly surprising: while the task is running, Airflow reads logs from the worker; once the task is done, the logs are uploaded to blob storage (most commonly) and the webserver has to read from there instead. There has also been other recent meddling with this area of code, i.e. dealing with scenarios where the logs are not found on the worker, and that could also change this behavior.

I'm not quite clear on the scenario from @javier-salazar-zefr, but it may warrant a separate issue with a more detailed explanation.
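
To make that read order concrete, here is a minimal illustrative sketch. It is not Airflow's actual task-log handler; `fetch_from_worker` and `fetch_from_remote` are hypothetical stand-ins for the two log sources described above.

```python
# Illustrative only: mirrors the order described above -- stream live
# logs from the worker while the task runs, then read from remote
# (blob) storage once the task has finished and the worker has
# uploaded the file. The brief gap between those two steps would
# explain a temporarily empty log pane right after task completion.
from typing import Callable


def read_task_log(
    state: str,
    fetch_from_worker: Callable[[], str],
    fetch_from_remote: Callable[[], str],
) -> str:
    if state == "running":
        # Live logs are served by the worker running the task.
        return fetch_from_worker()
    # After completion, the worker uploads the log file; until that
    # upload finishes, remote storage may return nothing.
    remote = fetch_from_remote()
    return remote or "*** Log not found yet; the upload may still be in progress."
```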

@Jorricks
Contributor

Jorricks commented Jan 14, 2025

I think this might be unrelated to the issue posted here, but I decided to reply here as #39336 was mentioned as a potential cause. I think the try_number handling is actually correct, except in one very specific case:

We just upgraded Airflow from 2.3.4 to 2.10.4. Usually we are very strict about turning off all components during any sort of maintenance; this time we were a bit more YOLO (as it was our beta environment). The only preparation was setting some pools to 0, which left quite a few tasks queued in the pools.

We found out that the tasks that were queued before the upgrade were uploading logs to try_number=0. This checks out with the try_number change in #39336: previously, with Airflow 2.3.4, the try_number was incremented on the workers, whereas with 2.10.4 it is incremented in the scheduler. Tasks that were queued with 2.3.4 but executed on 2.10.4 Celery workers therefore never got an incremented try_number.

We were actually very impressed with the upgrade: we left some 2.3.4 workers running (on k8s, in terminating state) that were still finishing some tasks, and even during and after the DB upgrade the jobs they were executing finished fine and were marked as a success in the DB.

TL;DR: we only had issues with tasks that were scheduled with <2.10 and executed on Celery workers >=2.10, as we could not fetch their logs.
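
For anyone hit by the same upgrade window, here is a hedged sketch of how the affected task instances could be listed from the metadata DB. The ORM attributes used here (in particular TaskInstance.try_number being queryable as a plain column) are an assumption based on Airflow 2.10's models, so verify against your installed version before relying on it.

```python
# Hypothetical helper, not part of Airflow: list task instances that
# finished successfully with try_number still at 0 (i.e. queued on a
# pre-2.10 scheduler but executed on >=2.10 workers), whose logs would
# have been uploaded under attempt=0.
from airflow.models import TaskInstance
from airflow.utils.session import create_session
from airflow.utils.state import TaskInstanceState

with create_session() as session:
    affected = (
        session.query(TaskInstance)
        .filter(
            TaskInstance.try_number == 0,
            TaskInstance.state == TaskInstanceState.SUCCESS,
        )
        .all()
    )
    for ti in affected:
        print(ti.dag_id, ti.task_id, ti.run_id)
```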
