-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[perflint
] fix invalid hoist in perf401
#14369
Conversation
|
code | total | + violation | - violation | + fix | - fix |
---|---|---|---|---|---|
PERF401 | 38 | 18 | 20 | 0 | 0 |
Linter (preview)
ℹ️ ecosystem check detected linter changes. (+18 -20 violations, +0 -0 fixes in 4 projects; 51 projects unchanged)
apache/airflow (+6 -6 violations, +0 -0 fixes)
ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL
- dev/breeze/src/airflow_breeze/commands/release_management_commands.py:3074:9: PERF401 Use `list.extend` to create a transformed list + dev/breeze/src/airflow_breeze/commands/release_management_commands.py:3074:9: PERF401 Use a list comprehension to create a transformed list - dev/breeze/src/airflow_breeze/utils/exclude_from_matrix.py:32:9: PERF401 Use `list.extend` to create a transformed list + dev/breeze/src/airflow_breeze/utils/exclude_from_matrix.py:32:9: PERF401 Use a list comprehension to create a transformed list - dev/breeze/src/airflow_breeze/utils/packages.py:327:9: PERF401 Use `list.extend` to create a transformed list + dev/breeze/src/airflow_breeze/utils/packages.py:327:9: PERF401 Use a list comprehension to create a transformed list - docs/exts/docs_build/fetch_inventories.py:103:9: PERF401 Use `list.extend` to create a transformed list + docs/exts/docs_build/fetch_inventories.py:103:9: PERF401 Use a list comprehension to create a transformed list - providers/src/airflow/providers/microsoft/azure/hooks/wasb.py:721:13: PERF401 Use `list.extend` with an async comprehension to create a transformed list + providers/src/airflow/providers/microsoft/azure/hooks/wasb.py:721:13: PERF401 Use an async list comprehension to create a transformed list - scripts/in_container/update_quarantined_test_status.py:81:13: PERF401 Use `list.extend` to create a transformed list + scripts/in_container/update_quarantined_test_status.py:81:13: PERF401 Use a list comprehension to create a transformed list
apache/superset (+8 -8 violations, +0 -0 fixes)
ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL
- superset/db_engine_specs/base.py:107:9: PERF401 Use `list.extend` to create a transformed list + superset/db_engine_specs/base.py:107:9: PERF401 Use a list comprehension to create a transformed list + superset/db_engine_specs/lib.py:245:9: PERF401 Use `list.extend` to create a transformed list - superset/db_engine_specs/lib.py:245:9: PERF401 Use a list comprehension to create a transformed list + superset/db_engine_specs/lib.py:251:9: PERF401 Use `list.extend` to create a transformed list - superset/db_engine_specs/lib.py:251:9: PERF401 Use a list comprehension to create a transformed list + superset/db_engine_specs/lib.py:261:9: PERF401 Use `list.extend` to create a transformed list - superset/db_engine_specs/lib.py:261:9: PERF401 Use a list comprehension to create a transformed list + superset/db_engine_specs/lib.py:281:9: PERF401 Use `list.extend` to create a transformed list - superset/db_engine_specs/lib.py:281:9: PERF401 Use a list comprehension to create a transformed list + superset/db_engine_specs/lib.py:293:9: PERF401 Use `list.extend` to create a transformed list - superset/db_engine_specs/lib.py:293:9: PERF401 Use a list comprehension to create a transformed list + superset/tasks/cache.py:208:13: PERF401 Use `list.extend` to create a transformed list - superset/tasks/cache.py:208:13: PERF401 Use a list comprehension to create a transformed list + tests/integration_tests/security/migrate_roles_tests.py:50:13: PERF401 Use `list.extend` to create a transformed list - tests/integration_tests/security/migrate_roles_tests.py:50:13: PERF401 Use a list comprehension to create a transformed list
bokeh/bokeh (+1 -3 violations, +0 -0 fixes)
ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL
- src/bokeh/plotting/_figure.py:479:17: PERF401 Use `list.extend` to create a transformed list - src/bokeh/plotting/_figure.py:485:17: PERF401 Use `list.extend` to create a transformed list - src/bokeh/server/contexts.py:310:17: PERF401 Use `list.extend` to create a transformed list + src/bokeh/server/contexts.py:310:17: PERF401 Use a list comprehension to create a transformed list
latchbio/latch (+3 -3 violations, +0 -0 fixes)
ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview
- src/latch/ldata/_transfer/upload.py:163:25: PERF401 Use `list.extend` to create a transformed list + src/latch/ldata/_transfer/upload.py:163:25: PERF401 Use a list comprehension to create a transformed list - src/latch/registry/utils.py:70:13: PERF401 Use `list.extend` to create a transformed list + src/latch/registry/utils.py:70:13: PERF401 Use a list comprehension to create a transformed list - src/latch_cli/services/cp/utils.py:54:9: PERF401 Use `list.extend` to create a transformed list + src/latch_cli/services/cp/utils.py:54:9: PERF401 Use a list comprehension to create a transformed list
Changes by rule (1 rules affected)
code | total | + violation | - violation | + fix | - fix |
---|---|---|---|---|---|
PERF401 | 38 | 18 | 20 | 0 | 0 |
@w0nder1ng Another fun edge case that PR #14369 doesn't currently fix. Scoping rules are different between list comprehensions and for loops: another example from pytorch if kwargs is None:
kwargs = {}
- impl_args = []
- for a in args:
- impl_args.append(_helper(a, map_fn))
+ impl_args = [_helper(a, map_fn) for a in args]
impl_kwargs = {}
for k in kwargs.keys():
impl_kwargs[k] = _helper(a, map_fn) the @w0nder1ng If you want a good test bed, just run the autofix on PyTorch (or another large codebase) and see after applying the fixes if any other ruff rule violations are immediately detected. |
Also another really minor nit is that it can duplicate comments. - for dtype in ["f16", "bf16"]:
- kernels.append(
- cls(
- aligned=True,
- dtype=dtype,
- sm_range=(80, SM[SM.index(80) + 1]),
- apply_dropout=False,
- preload_mmas=True,
- block_i=128,
- block_j=64,
- max_k=96,
- # Sm80 has a faster kernel for this case
- dispatch_cond="cc == 86 || cc == 89",
- )
+ # Sm80 has a faster kernel for this case
+ kernels.extend(
+ cls(
+ aligned=True,
+ dtype=dtype,
+ sm_range=(80, SM[SM.index(80) + 1]),
+ apply_dropout=False,
+ preload_mmas=True,
+ block_i=128,
+ block_j=64,
+ max_k=96,
+ # Sm80 has a faster kernel for this case
+ dispatch_cond="cc == 86 || cc == 89",
)
+ for dtype in ["f16", "bf16"]
+ ) See here. |
I think in this case, the lint shouldn't apply at all. If applying the fix breaks the code, then someone manually doing the same thing also wouldn't work. It should probably check that all references to the loop variable are inside the for loop before reporting the lint. I'll also try to fix the duplicate comments. I'm starting to see why this didn't have a fix before :) |
The comment duplication happened when the comment was inside the append body, it should be fixed now. |
// ``` | ||
let for_loop_target = checker | ||
.semantic() | ||
.lookup_symbol(id.as_str()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason, resolve_name
returns None
for the for-loop target. Also, references to it outside of the for-loop are not included in its list of references; e.g.
def f():
result = []
for val in range(5):
result.append(val * 2)
print(val) # this reference is not included in the list of references
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the problem here is that we build the semantic model lazily as we traverse the AST for checking and it hasn't reached the point after the for loop yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@charliermarsh any suggestions on how to best find all usages of target?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a working version would look something like this:
let for_loop_read = checker.semantic().resolve_load(for_loop_target_name_expr);
let for_loop_binding = match for_loop_read {
ReadResult::Resolved(id) => checker.semantic().binding(id),
_ => unreachable!("for loop target must exist"),
};
Unfortunately, resolve_load
requires an &mut SemanticModel
and I don't see a way to get one from the checker. Am I missing something here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I didn't look deeply, but often in those cases you need to make it a deferred rule, so it runs after the semantic model has been built (e.g., run it in deferred_for_loops
).
Could you take a look why https://github.com/apache/superset/blob/e528cb48c44543c14c1ac9a93528b147bcaecfde/scripts/benchmark_migration.py#L128 is no longer reported. I might have missed something obvious but it isn't clear to me why the diagnostic isn't reported anymore. |
I suspect the regressions are because of the |
Because the lambda arg here https://github.com/apache/superset/blob/e528cb48c44543c14c1ac9a93528b147bcaecfde/scripts/benchmark_migration.py#L131C28-L131C33 is shadowing the variable "model" in the for loop. In this specific case, it's probably okay since it's in a lambda arg, but in general it shouldn't apply the fix there. I suppose only references rvalues are problematic outside of the forloop, not lvalues (if all they do is immediately get overwritten after all). |
Okay hopefully last nit: reduction_axes: List[int] = []
- for i in range(input_rank):
- if i != axis:
- reduction_axes.append(i)
+ reduction_axes.extend(i for i in range(input_rank) if i != axis) Doesn't seem to like to hoist if there are any type_hints on the original |
I'll take a look |
Annotated assigns have a different statement type than normal assigns, and I wasn't handling it. The fix should work on type-annotated lists now. |
This one is still occurring sadly. Maybe because it references another loop lol? |
Despite this minor false positive, looks like all the other fixes worked well on the PyTorch codebase (for the torch/torchgen folders) pytorch/pytorch#140980. 👍 |
I'm still stuck on two things.
result = []; test = [] # deleting this whole line is wrong, but only deleting the binding statement also is wrong
# since it would leave the comment if there were no other statement
for ...
result = []
for i in range(10):
result.append(i+1)
print(i) # "fixing" this for loop is invalid because something relies on the loop variable, but the semantic model doesn't see this reference yet |
@diceroll123 @MichaReiser Any idea how to tackle these edge cases? |
@Skylion007 no, not from the top of my head and I haven't had time yet to look into it more closely |
I did not look into it any further after my last comment on #14551. This PR and my PR contain a bunch of overlapping changes (while addressing different bugs), and I don't want to step on any toes or spend more time on it if it's not going to be considered, so if one or the other is merged, or some kind of plan is made, we can go from there I suppose! No strong feelings on my part. 😄 |
One other place this autofix breaks (but ruff does appropriately catch it!). Apply it to for d in dbase, dexp: transforming that to a list comprehension likely breaks I will say that even it's current form, this autofix is extremely useful, and the encountered edge cases are very rare in real code, and usually caught by other checks. |
This unaccepted rule would likely have to tackle many of the same problems. Did they solve this edge case? #11769 |
I think that's all of the edge cases, except for the one involving the loop variable. I'd be happy to try to convert the rule into a binding rule, but I think that might be outside of the scope of this PR. |
Even after moving the lint into def f():
result = []
for i in range(5):
# ^ running checker.semantic().resolve_name() on this ExprName returns None.
result.append(i * 2)
# ^ when I run binding.statement() on this binding,
# it returns the "i = 1" statement, even though that is clearly wrong
i = 1
# ^ lookup_symbol returns this i
print(i) Am I missing something here, or is this behavior incorrect? See the last commit for the code I used to get this result. |
That might be because |
let binding_string = checker | ||
.locator() | ||
.slice(for_loop_target.statement(checker.semantic()).unwrap()); | ||
dbg!(binding_string); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dbg!(binding_string); |
This is because the semantic model now captures the state after running the entire program. It's complete, and running let last_target_binding = checker
.semantic()
.lookup_symbol(id)
.expect("For loop to exist");
let mut bindings = [last_target_binding].into_iter().chain(
checker
.semantic()
.shadowed_bindings(checker.semantic().scope_id, last_target_binding)
.filter_map(|shadowed| shadowed.same_scope().then_some(shadowed.shadowed_id())),
);
let target_binding = bindings
.find_map(|binding_id| {
let binding = checker.semantic().binding(binding_id);
if dbg!(binding.statement(checker.semantic())).and_then(Stmt::as_for_stmt)
== Some(for_stmt)
{
Some(binding)
} else {
None
}
})
.expect("for target binding must exist");
drop(bindings); There might be a better way to detect if the binding belongs to the target that e.g. avoids picking up named expressions in the iterator part. |
}) | ||
.expect("for target binding must exist"); | ||
let target_binding = checker.semantic().binding(target_binding_id); | ||
// TODO: should this be a HashMap? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made this a vector, but if there's enough references, it might make more sense for it to be a HashMap instead. What do you think?
It seems we're now skipping over the two Is this intentional? Should we change the implementation not to provide a fix but still the usage? Edit: I think this is intentional because |
let shadowed_references: Vec<_> = checker | ||
.semantic() | ||
.shadowed_bindings(checker.semantic().scope_id, target_binding_id) | ||
.flat_map(|shadowed| { | ||
let shadowed_binding = checker.semantic().binding(shadowed.shadowed_id()); | ||
shadowed_binding.references() | ||
}) | ||
.collect(); | ||
|
||
drop(bindings); | ||
|
||
if target_binding | ||
.references() | ||
.filter(|r_id| !shadowed_references.contains(r_id)) | ||
.map(|reference| checker.semantic().reference(reference)) | ||
.any(|r| !for_stmt.range.contains_range(r.range())) | ||
{ | ||
return; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what this code is doing. Would you mind adding a comment explaining why looking at the shadowed_references
is necessary?
It would be shadowed if there weren't a return. A smaller example: def f(switch):
i = 1
# ^ this is the `i` which actually gets returned
if switch:
items = [1, 2, 3, 4]
result = []
# v if it weren't for the return, this `i` would be the binding
for i in items:
result.append(i + 1)
# v unconditional return means that the binding is not actually relevant
return result
# v fix thinks this `i` is the loop variable
return [i] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uff, this rule is complicated. Thanks for pushing through it
I made it through the detection logic and pushed a few smaller simplifications. I'm now about to review the fix and noticed the parse
call. I suggest to either not provide a fix in the multi assignment case (it's not that common) or to use the SimpleTokenizer
to avoid using the full parser here. It might even simplify some of your code because it allows e.g. searching for the ;
let binding_text = locator | ||
.slice(locator.full_lines_range(binding_stmt_range)) | ||
.trim_whitespace_start(); | ||
let tokens = parse(binding_text, Mode::Module).map_err(|err| { | ||
anyhow!( | ||
"Failed to tokenize binding statement: {err} {}", | ||
binding_text | ||
) | ||
})?; | ||
tokens | ||
.tokens() | ||
.iter() | ||
.any(|token| matches!(token.kind(), TokenKind::Semi)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm leaning towards not providing a fix in this case to avoid the extra complexity.
Either way, we should avoid parsing here. We can use the SimpleTokenizer
if we need tokenization and can't just use the AST
I've removed the tokenization without changing the rest of the code. The code now iterates over the characters directly because I was having trouble with the Is it worth also removing the code to handle multiple statements to simplify the logic? |
I pushed a change that uses the But I think we're good now and can land this change. Thank you for working on this very involved. rule |
Beautiful job landing this rule @w0nder1ng . Any plans on tackling the dict comprehension version since the logic is very similar? |
Sure, when I have the opportunity |
This should fix #14362. This new fix currently deletes lines like this:
Is there a convenient way to get every statement within a given
TextRange
to detect when this is happening?