Manual success of a mid-pipeline task triggers its downstream even though an upstream sensor is still deferred (question about none_failed behavior) #45292

Closed
darenpang opened this issue Dec 30, 2024 · 4 comments
Labels
area:core area:Scheduler including HA (high availability) scheduler kind:bug This is a clearly a bug

Comments


darenpang commented Dec 30, 2024

Apache Airflow version

Other Airflow 2 version (please specify below)

If "Other Airflow 2 version" selected, which one?

2.10.1

What happened?

I have a DAG with three tasks in sequence:

wait_for_3am (a DateTimeSensorAsync), which waits until 3:00 AM and thus remains in deferred state until the target time.
empty_task_1 (an EmptyOperator), which depends on wait_for_3am.
empty_task_2 (another EmptyOperator), which depends on empty_task_1.
All tasks are given the none_failed trigger rule through default_args. According to the documentation, none_failed means all upstream tasks must have “not failed or upstream_failed” (i.e. they should have succeeded or been skipped) before the downstream task is triggered.

However, if I manually mark empty_task_1 as success while wait_for_3am is still in the deferred state (i.e., it has neither failed nor succeeded yet), then empty_task_2 immediately runs. From a scheduler standpoint, empty_task_2 sees empty_task_1 = success and therefore proceeds under none_failed, even though wait_for_3am has not actually completed.
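
As a minimal sketch (assuming the DAG from the “How to reproduce” section below has been parsed into the default DagBag), the trigger rule evaluation for empty_task_2 only ever sees its direct upstream tasks:

# Sketch only: inspect the direct upstream tasks that trigger-rule
# evaluation for empty_task_2 is based on.
from airflow.models import DagBag

dag = DagBag().get_dag("simple_dag_with_datetime_sensor")
task = dag.get_task("empty_task_2")
print(task.upstream_task_ids)  # {'empty_task_1'} -- wait_for_3am is not a direct parent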

What you think should happen instead?

I expected that, while an upstream sensor (wait_for_3am) is still in a non-terminal (deferred) state, manually marking empty_task_1 as success would not immediately trigger empty_task_2.

How to reproduce

Run the DAG below, then manually mark empty_task_1 as success while wait_for_3am is still deferred.

from datetime import datetime, timedelta
from airflow import DAG
from airflow.utils.trigger_rule import TriggerRule
from airflow.sensors.date_time import DateTimeSensorAsync
from airflow.operators.empty import EmptyOperator

# Every task in the DAG picks up the none_failed trigger rule from default_args.
default_args = {
    'owner': 'airflow',
    'trigger_rule': TriggerRule.NONE_FAILED,
}

with DAG(
    'simple_dag_with_datetime_sensor',
    default_args=default_args,
    description='A simple DAG with DateTimeSensorAsync and two EmptyOperators',
    schedule_interval=timedelta(days=1),
    start_date=datetime(2024, 12, 30),
    catchup=False,
    max_active_runs=1,
) as dag:

    # Deferrable sensor: remains in the deferred state until 3:00 AM today.
    wait_for_3am = DateTimeSensorAsync(
        task_id='wait_for_3am',
        target_time=datetime.combine(datetime.today(), datetime.min.time()) + timedelta(hours=3),
    )

    empty_task_1 = EmptyOperator(
        task_id='empty_task_1',
    )

    empty_task_2 = EmptyOperator(
        task_id='empty_task_2',
    )

    wait_for_3am >> empty_task_1 >> empty_task_2

Operating System

RHEL8

Versions of Apache Airflow Providers

No response

Deployment

Virtualenv installation

Deployment details

No response

Anything else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project's Code of Conduct

@darenpang darenpang added area:core kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet labels Dec 30, 2024
@dosubot dosubot bot added the area:Scheduler including HA (high availability) scheduler label Dec 30, 2024
@RNHTTR RNHTTR removed the needs-triage label for new issues that we didn't triage yet label Jan 1, 2025
Contributor

RNHTTR commented Jan 1, 2025

Assigned you @darenpang

@darenpang
Author

@RNHTTR
Would you please confirm the intended behavior?

@darenpang
Author

@RNHTTR
I found this in the documentation:

By default, Airflow will wait for all upstream (direct parents) tasks for a task to be successful before it runs that task.

So Airflow does not recursively check grandparent tasks; the trigger rule only checks whether the direct parents have not failed. Here, empty_task_2's only direct parent is empty_task_1, so this behavior is expected and consistent with the documentation.
If you confirm this, maybe this issue should be closed. Thank you.
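
For completeness, one way to get the behavior I originally expected (a sketch only, against the same DAG as above) is to also make wait_for_3am a direct parent of empty_task_2, so the trigger rule takes the sensor's state into account:

# Sketch only: with the extra edge, empty_task_2 has both upstream tasks
# as direct parents, so it keeps waiting while wait_for_3am is deferred,
# even if empty_task_1 is manually marked as success.
wait_for_3am >> empty_task_1 >> empty_task_2
wait_for_3am >> empty_task_2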

Member

potiuk commented Jan 5, 2025

Correct. That's expected.

@potiuk potiuk closed this as completed Jan 5, 2025