Define extension API for user-defined invariants. #14029

wiedld · 2025-01-06T19:05:08Z

Is your feature request related to a problem or challenge?

As part of the work to automatically check invariants for the logical and execution plans, we have provided infrastructure to run an invariant checker. This invariant checker runs at limited time points in order to not degrade planning performance; e.g. after all optimizations are completed. In debug mode, it runs these checks more often and can therefore help to quickly isolate at which point (e.g. which specific optimizer run) make the plan become invalid.

We want to also enable users to add their own invariants. Users are already able to add their own Logical and Execution plan extensions, as well as their own optimization runs which modify these plans. Therefore it may be useful for an invariant extension interface for user-defined invariants. e.g. If a change in Datafusion core's optimizer passes will cause a problem in a user-defined Logical plan extension, then the user could define an invariant based upon what their Logical plan extension requires.

Refer to specific examples in this conversation, for plan extensions which have their own invariants. For the example case of our own ProgressiveEval -- we require the input partition streams to have specific sort orders, non-overlapping column ranges, and no pushdown of the offset (issue) in order to provide the correct result. An invariant check, performed after each optimizer run (while in debug mode), would enable us to quickly isolate the problem during DF upgrade.

(We have several other, more complex, examples of how changes in the optimization of UNIONs has produced invalid plans for our SortPreservingMerge. So this is not a one-off example, the above is merely the simplest concrete example.)

Describe the solution you'd like

Take the existing invariant infrastructure provided as part of this issue, and provide extension points for users to define their own invariants.

Describe alternatives you've considered

Alternative 1: for a user-defined Execution plan extension, have a runtime check of invariants be performed.
- Con: this detects problems after planning time, thereby increasing both time-until-error as well as resource utilization.
Alternative 2: for either Logical or Physical plan extensions, the user can define an optimization run which is intended to detect invariant violations which are in conflict with their plan extensions.
- Pro: can detect invariant violation at planning time
- Con: arguably more code code complexity:
  - in order to isolate exactly which plan mutation (Datafusion core change) caused the problem, it would need to be coded to run after each optimizer pass.

Additional context

No response

The text was updated successfully, but these errors were encountered:

alamb · 2025-01-06T20:49:56Z

Thanks @wiedld -- I don't fully understand the usecase

Take the existing invariant infrastructure provided as part of #13652 (comment), and provide extension points for users to define their own invariants.

Could you provide an example of such an invariant?

I normally think of "invariants" as some property that always holds true for a certain type of node (for example that LogicalPlan::Join always has 2 inputs). The invariants in this case are defined by the semantics of the node itself (so as a user I couldn't add a invariant that LogicalPlan::Join had 3 inputs)

It would perhaps make sense to provide a way to define invariants for UserDefinedLogicalNode and user provided implementations of ExecutionPlan

Defining an invarint check for ExecutionPlan I think would satisfy the usecase you mention above having specific rules for ProgressiveEval (a user defined extension node)

alamb · 2025-01-06T20:52:25Z

For example, if we added a function like this to the ExecutionPlan trait, as proposed in #13986 (comment) I think that would permit implementing the invariant check without any more work as I understand it

impl ExecutionPlan {
...
    fn check_node_invariants(&self, invariant_level: InvariantLevel) -> Result<()> 
      Ok(())
    }
...
}

wiedld · 2025-01-06T20:55:23Z

For example, if we added a function like this to the ExecutionPlan trait, as proposed in #13986 (comment) I think that would permit implementing the invariant check without any more work as I understand it

Agreed. If this^^ is the final form of the execution plan invariants, then we don't need an additional interface for the execution plan invariant extensions. But that would still leave the logical plan invariant extensions for consideration.

If that WIP keeps the ExecutionPlan::check_node_invariants in it's final merge-able state, then we have half of this ticket completed.

alamb · 2025-01-07T11:28:27Z

But that would still leave the logical plan invariant extensions for consideration.

I would recommend adding a method to: https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.UserDefinedLogicalNode.html

Perhaps even the same API:

pub trait UserDefinedLogicalNode { 
...
    fn check_invariants(&self, invariant_level: InvariantLevel) -> Result<()> 
      Ok(())
    }
...

🤔

Then you could extend LogicalPlan::check_invariants to call that method for any LogicalPlan::UserDefined

datafusion/datafusion/expr/src/logical_plan/plan.rs

Line 1134 in 4e877a0

pub fn check_invariants(&self, check: InvariantLevel) -> Result<()> {

wiedld added the enhancement New feature or request label Jan 6, 2025

wiedld mentioned this issue Jan 6, 2025

Automatically check "invariants" #13652

Open

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Define extension API for user-defined invariants. #14029

Define extension API for user-defined invariants. #14029

wiedld commented Jan 6, 2025

alamb commented Jan 6, 2025

alamb commented Jan 6, 2025

wiedld commented Jan 6, 2025 •

edited

Loading

alamb commented Jan 7, 2025

Define extension API for user-defined invariants. #14029

Define extension API for user-defined invariants. #14029

Comments

wiedld commented Jan 6, 2025

Is your feature request related to a problem or challenge?

Describe the solution you'd like

Describe alternatives you've considered

Additional context

alamb commented Jan 6, 2025

alamb commented Jan 6, 2025

wiedld commented Jan 6, 2025 • edited Loading

alamb commented Jan 7, 2025

wiedld commented Jan 6, 2025 •

edited

Loading