Expand ameba's functionality with semantic information #513

nobodywasishere · 2024-11-24T17:44:19Z

Currently ameba uses only the stdlib parser in order to provide an AST for the rules to operate over. This is useful, but limits the kinds of things that ameba can be used for. This is an issue to discuss if semantic information to ameba, what that would look like, and what are the downsides / tradeoffs to different approaches. For instance, if we added a top level semantic pass:

That would require operating at the shard-level instead of individual files (at least for semantic-level rules)
- Opens up to more semantic compiler errors, but I think the parser rules and top-level semantic rules should operate independently
- Need to figure out how this would work with linting while typing for the vscode extension, as there may be multiple files open with changes in them, and we'd want ameba to operate on those instead of the ones saved on disk
- How would the top-level semantic interact with the Excluded in .ameba.yml?
One of amebas benefits right now is how fast it is, we should profile this to see how worse it is
- We'd be going from parsing singular files to parsing all files in stdlib/lib/current shard + semantic pass
What would we be able to do with a top-level semantic that we can't now?
- Namespace collision detection: Incorrect scope after including **::Serializable crystal-lang/crystal#6194
- (Maybe) type unsafe equality if both sides of comparison are typed
- (Maybe) type unsafe hash key types if things are explicitly typed (see crystal-lang/crystal#8893)
- Typing/{Instance,Class}VarTypeRestriction - currently running into potential issues where an instance var could be typed somewhere, but just not in the current file, leading to a failure
- Know what calls are to methods or to macros, for better Lint/UnusedVariable and others

The text was updated successfully, but these errors were encountered:

nobodywasishere · 2024-11-24T19:05:44Z

Other rule ideas possible with top level semantic (not saying these should be done, but that they could be):

Lint/NoReopen - disallow reopening of classes (or maybe monkey-patching stdlib classes/structs/etc)
Documentation/OverwrittenDocs - warn when adding documentation to a class/module/lib in multiple places which ones will be ignored
Documentation/UselessShowdocDirective - warn when a :showdoc: directive doesn't work (once those are added)

nobodywasishere · 2024-12-29T19:03:04Z

Need to figure out how this would work with linting while typing for the vscode extension, as there may be multiple files open with changes in them, and we'd want ameba to operate on those instead of the ones saved on disk

This will just need to be left up to an LSP using ameba, and not something we should worry about.

How would the top-level semantic interact with the Excluded in .ameba.yml?

Any issues found in a file that's excluded should be ignored, like how it is currently (except for Lint/SyntaxError and Lint/SemanticError). Additionally, it should automatically exclude any file not in the current directory, or inside 'lib/', to prevent errors from stdlib or shards popping up. (Can't just limit to src/ as not all projects limit crystal code to src/).

For another rule idea, I run into this issue a lot:

class MessageHandler
  def on_request(msg)
    # generic message handling
  end
  
  def on_request(msg : SpecificRequestType)
    # specific message handling
  end
end

If I accidentally mistype SpecificRequestType, I won't know I got it wrong until I actually run the application. It would be really useful to have a Lint/UnknownType rule that checks that all explicitly typed things actually exist, and this should only require a top-level semantic.

One of the biggest issues I see with Crystal currently is that no semantic analysis is done to code that isn't used. This means when developing library code, unless you write good specs (and know you have to write good specs), you can have semantic issues in your code and have no idea until someone else goes to use it. It would be useful to have some level of semantic analysis for methods, such that explicitly typed things can be checked to make sure their methods actually exist. For example:

def hello(a : String)
  a.method_doesnt_exist
  # ^^^^^^^^^^^^^^^^^^^ error: `String` doesn't have method `method_doesnt_exist`
end

If the rule runs into a method that isn't typed, we don't do anything beyond that. The only types this should infer is from literals or methods whose return types are specified. It should be fast, not exhaustive.

nobodywasishere · 2025-01-20T00:50:25Z

I've started messing around with integrating ameba into larimar (https://github.com/nobodywasishere/larimar), a language server that I've been working on since last June.

This allows for not only running lints while typing (via the language server protocol in an editor-independent way), but also for re-using functionality ameba has developed over time. For example, I re-used the ScopeVisitor to provide inlay hints for when a local var is "declared":

This can be made better (provide actual type resolved to instead of just ?, use semantic analysis to be more accurate), but I think this is really cool as a starting point. So far I find this really promising, and could allow the pathway for a single binary that provides all tooling capabilities, similar to Ruby LSP.

As more time is spent investigating / experimenting with semantic analysis, I think it would be useful for larimar and ameba to share the same / similar visitors and mechanisms. As mentioned above, the visitor / rule that checks for unknown methods could be the same visitor that larimar uses for providing autocomplete.

nobodywasishere · 2025-01-24T13:26:22Z

Here's a general approach I think may work:

Semantic rules still inherit from Base, but have a method that returns true indicating they're a semantic rule
- This can probably be done with overloading
All syntactic rules are run first
If the syntax rule fails or the --semantic flag isn't passed, semantic rules aren't run
- These should probably be marked as "skipped" in the formatter
As semantic requires an entrypoint, there are a few options:
- Use a CLI flag to indicate entrypoint(s)
- Allow the user to define the entrypoint(s) in the .ameba.yml
```
Entrypoints:
  - repo1:
      main: monorepo1/main.cr
      flags:
        - my_flag
  - repo2:
      main: monorepo2/main.cr
```
- Use the targets defined in the .shard.yml (with custom flags option as above)
  - This would be ideal for me (and if flags would get official support) so it can be used by larimar / other tooling as well
- Use everything in current folder (DEFAULT_GLOBS w/o ECR)
For each entrypoint:
- Do a top-level semantic once
- Pass in compiler result alongside original source when calling rule.test(source, result)
- Also allow rules to operate on just the compiler result (rule.test(result))
- Different semantic rules can then be implemented differently, whether they iterate over the generated tree, or they iterate over the parser tree for the source directly
- Semantic issues found should note which entrypoint they came from

One basic example I wrote up quickly for Lint/UnknownType. Ideally we'd have a SemanticScopeVisitor that would handle getting the current type / self, so that this can handle relative types, but this is just a proof of concept.

Some notes:

Including the full compiler instead of just the parser significantly increases ameba build times
For some reason, running program.top_level_semantic completely breaks the compilers ability to find files in the CRYSTAL_PATH. Need to spend more time looking into this.
This is only the top level semantic, so it's useful for determining the available types and methods, but can be limited in what it can provide (re: type safe equality)
- It may be worth having an additional full semantic pass with its own rules, but I'd want to get this level of semantic rules implemented first

Thoughts? cc @Sija @veelenga @straight-shoota @devnote-dev

nobodywasishere · 2025-01-25T15:39:44Z

I figured out the path issues I was running into - needed to pass in environment variables from the current Crystal installation.

With some work implementing the above, I have a basic Lint/UnknownType rule implemented:

Branch here:
https://github.com/nobodywasishere/ameba/tree/nobody/semantic-experiment

veelenga · 2025-01-27T16:04:57Z

@nobodywasishere looks promising. Could you please draft a PR, so we can move from there.

nobodywasishere · 2025-01-27T17:42:07Z

Sure thing! It won't be perfect, and the approach I'm taking may not be the best, but it's at least a starting point.

nobodywasishere · 2025-01-28T17:52:26Z

crystal-lang/crystal#14613 (comment)

Another rule that could be implemented is Lint/Deprecated, may require a full semantic though.

Sija added question feature labels Nov 25, 2024

nobodywasishere mentioned this issue Jan 28, 2025

Initial proof of concept for expanding ameba's functionality with semantic information #550

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Expand ameba's functionality with semantic information #513

Expand ameba's functionality with semantic information #513

nobodywasishere commented Nov 24, 2024

nobodywasishere commented Nov 24, 2024

nobodywasishere commented Dec 29, 2024 •

edited

Loading

nobodywasishere commented Jan 20, 2025

nobodywasishere commented Jan 24, 2025 •

edited

Loading

nobodywasishere commented Jan 25, 2025

veelenga commented Jan 27, 2025

nobodywasishere commented Jan 27, 2025

nobodywasishere commented Jan 28, 2025

Expand ameba's functionality with semantic information #513

Expand ameba's functionality with semantic information #513

Comments

nobodywasishere commented Nov 24, 2024

nobodywasishere commented Nov 24, 2024

nobodywasishere commented Dec 29, 2024 • edited Loading

nobodywasishere commented Jan 20, 2025

nobodywasishere commented Jan 24, 2025 • edited Loading

nobodywasishere commented Jan 25, 2025

veelenga commented Jan 27, 2025

nobodywasishere commented Jan 27, 2025

nobodywasishere commented Jan 28, 2025

nobodywasishere commented Dec 29, 2024 •

edited

Loading

nobodywasishere commented Jan 24, 2025 •

edited

Loading