AIP-81 Implement Log Insert Endpoint to REST API (FastAPI) #45304

bugraoz93 · 2024-12-30T21:12:31Z

Description

We should be able to insert Log (from airflow.models.log import Log) similar to here,

airflow/airflow/utils/cli_action_loggers.py

Line 106 in 52ed7d7

def default_action_log(

This will enable us to use the logging feature without relying on the session.

Use case/motivation

To provide the same functionality for the CLI with API integration.
AIP-81

Related issues

No response

Are you willing to submit a PR?

Yes I am willing to submit a PR!

Code of Conduct

I agree to follow this project's Code of Conduct

The text was updated successfully, but these errors were encountered:

jason810496 · 2025-01-03T12:55:25Z

Hi @bugraoz93,

If I may suggest, this could also be achieved by using a command-line-specific audience in the JWT. Once the new action_logging decorator is applied to FastAPI, the audit log logic could be handled in a single API call.
This approach could also reduce the chances of missing audit logs. For instance, if the CLI is forcibly terminated right after calling the core API and before the audit log API is invoked, the audit log might not be recorded.

What do you think?
cc @pierrejeambrun @vincbeck

perry2of5 · 2025-01-03T19:35:21Z

What would this log from the CLI code that shouldn't already be logged by the REST API itself?

bugraoz93 · 2025-01-03T20:42:39Z

Hi @bugraoz93,

If I may suggest, this could also be achieved by using a command-line-specific audience in the JWT. Once the new action_logging decorator is applied to FastAPI, the audit log logic could be handled in a single API call. This approach could also reduce the chances of missing audit logs. For instance, if the CLI is forcibly terminated right after calling the core API and before the audit log API is invoked, the audit log might not be recorded.

What do you think? cc @pierrejeambrun @vincbeck

@jason810496 It makes sense. There is one small thing here, we should identify when the event originates from cli_ and log the full_command. One way to achieve this is by using headers in the API, though this would result in a custom implementation specific to the CLI on the API side.
If we intend to include this functionality in the REST API for the CLI use case, I would be happy to close this in favour of API auditing the CLI actions. Otherwise, I would suggest moving forward with this approach.

bugraoz93 · 2025-01-03T20:47:12Z

What would this log from the CLI code that shouldn't already be logged by the REST API itself?

We are passing the event as a cli_ with extra as full_command (json.dumps({"host_name": host_name, "full_command": full_command})). This is a specific event sent by the CLI that wraps all the CLI action commands. Below, cli_action_logger will stay for local commands, but it should be replaced for remote commands because this uses the session. There are two things to consider here. The first is API call Audit Logs, which will again come from signed in users, which is the regular API auditing. The second one is we should also log this event coming from CLI means event: cli_<> and extra: full_command, while understanding that the requests are coming from the CLI via headers or similar. If we only use the API auditing, there will be nothing to differentiate if the action is taken from the CLI or not. This again requires bespoke implementation on the API end to make it possible. This brings about maintaining code pieces in the API for the CLI, which can make the two things more tightly coupled. That's why I thought it would be easier just to implement an endpoint for it. I think understanding if a call is coming from the CLI would make things a lot easier to debug in case of security emergencies or for collecting evidence for breaches.

airflow/airflow/utils/cli.py

Line 117 in a2c5a48

cli_action_loggers.on_post_execution(**metrics)

airflow/airflow/utils/cli_action_loggers.py

Lines 118 to 160 in 52ed7d7

    
               Behave similar to ``action_logging``; default action logger callback. 
        
               The difference is this function uses the global ORM session, and pushes a 
        
               ``Log`` row into the database instead of actually logging. 
        
               """ 
        
               from sqlalchemy.exc import OperationalError, ProgrammingError 
        
               from airflow.models.log import Log 
        
               from airflow.utils import timezone 
        
               try: 
        
                   # Use bulk_insert_mappings here to avoid importing all models (which using the classes does) early 
        
                   # on in the CLI 
        
                   session.bulk_insert_mappings( 
        
                       Log, 
        
                       [ 
        
                           { 
        
                               "event": f"cli_{sub_command}", 
        
                               "task_instance": None, 
        
                               "owner": user, 
        
                               "extra": json.dumps({"host_name": host_name, "full_command": full_command}), 
        
                               "task_id": task_id, 
        
                               "dag_id": dag_id, 
        
                               "logical_date": logical_date, 
        
                               "dttm": timezone.utcnow(), 
        
                           } 
        
                       ], 
        
                   ) 
        
                   session.commit() 
        
               except (OperationalError, ProgrammingError) as e: 
        
                   expected = [ 
        
                       '"log" does not exist',  # postgres 
        
                       "no such table",  # sqlite 
        
                       "log' doesn't exist",  # mysql 
        
                   ] 
        
                   error_is_ok = e.args and any(x in e.args[0] for x in expected) 
        
                   if not error_is_ok: 
        
                       logger.warning("Failed to log action %s", e) 
        
                   session.rollback() 
        
               except Exception as e: 
        
                   logger.warning("Failed to log action %s", e) 
        
                   session.rollback()

perry2of5 · 2025-01-03T22:20:21Z

Thanks.

Perhaps this is a solved problem, but I can't think of a way to guarantee we know if a client is CLI or something else if a malicious actor is involved. I would think they could do the OAuth dance in the CLI way or the WebUI way and get a different audience unless a particular account was locked to only one type or the other. Still, it would be a known, though likely compromised, account that accessed the API..... In this case knowing the IP address the request reached the outermost load balancer is probably as much as we can rely on. Possibly my knowledge is out of date here.

For a non-malicious actor, knowing the exact CLI command might help to research why something was changed a particular way. I suppose the question is: is there is enough value to knowing the exact CLI command in addition to the REST API request parameters and body to be worth adding the special case? Beyond that, knowing the audience (and knowing not to depend on it) might be as much as is worth implementing. I'm wishy-washy on the value of this. I hope my ramblings are useful :)

bugraoz93 · 2025-01-03T22:46:41Z

I was giving an example of why we should maintain the behaviour and explaining why this feature is needed in case the current behaviours persist. Keeping an additional audit log is always valuable when it comes to security.

I think every idea is useful here, as it brings another perspective to the table. Thanks! :)

bugraoz93 added kind:feature Feature Requests area:API Airflow's REST/HTTP API AIP-81 Enhanced Security in CLI via Integration of API labels Dec 30, 2024

github-project-automation bot added this to AIP-81: Enhanced Security in CLI via Integration of API Dec 30, 2024

bugraoz93 moved this to Todo (Ready) in AIP-81: Enhanced Security in CLI via Integration of API Dec 30, 2024

bugraoz93 changed the title ~~AIP-81 Implement Log Insert Endpoint to REST API~~ AIP-81 Implement Log Insert Endpoint to REST API (FastAPI) Dec 30, 2024

dosubot bot added the area:logging label Dec 30, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AIP-81 Implement Log Insert Endpoint to REST API (FastAPI) #45304

AIP-81 Implement Log Insert Endpoint to REST API (FastAPI) #45304

bugraoz93 commented Dec 30, 2024

jason810496 commented Jan 3, 2025 •

edited

Loading

perry2of5 commented Jan 3, 2025

bugraoz93 commented Jan 3, 2025

bugraoz93 commented Jan 3, 2025

perry2of5 commented Jan 3, 2025

bugraoz93 commented Jan 3, 2025

AIP-81 Implement Log Insert Endpoint to REST API (FastAPI) #45304

AIP-81 Implement Log Insert Endpoint to REST API (FastAPI) #45304

Comments

bugraoz93 commented Dec 30, 2024

Description

Use case/motivation

Related issues

Are you willing to submit a PR?

Code of Conduct

jason810496 commented Jan 3, 2025 • edited Loading

perry2of5 commented Jan 3, 2025

bugraoz93 commented Jan 3, 2025

bugraoz93 commented Jan 3, 2025

perry2of5 commented Jan 3, 2025

bugraoz93 commented Jan 3, 2025

jason810496 commented Jan 3, 2025 •

edited

Loading