Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand ameba's functionality with semantic information #513

Open
nobodywasishere opened this issue Nov 24, 2024 · 8 comments
Open

Expand ameba's functionality with semantic information #513

nobodywasishere opened this issue Nov 24, 2024 · 8 comments

Comments

@nobodywasishere
Copy link
Contributor

Currently ameba uses only the stdlib parser in order to provide an AST for the rules to operate over. This is useful, but limits the kinds of things that ameba can be used for. This is an issue to discuss if semantic information to ameba, what that would look like, and what are the downsides / tradeoffs to different approaches. For instance, if we added a top level semantic pass:

  • That would require operating at the shard-level instead of individual files (at least for semantic-level rules)
    • Opens up to more semantic compiler errors, but I think the parser rules and top-level semantic rules should operate independently
    • Need to figure out how this would work with linting while typing for the vscode extension, as there may be multiple files open with changes in them, and we'd want ameba to operate on those instead of the ones saved on disk
    • How would the top-level semantic interact with the Excluded in .ameba.yml?
  • One of amebas benefits right now is how fast it is, we should profile this to see how worse it is
    • We'd be going from parsing singular files to parsing all files in stdlib/lib/current shard + semantic pass
  • What would we be able to do with a top-level semantic that we can't now?
    • Namespace collision detection: Incorrect scope after including **::Serializable crystal-lang/crystal#6194
    • (Maybe) type unsafe equality if both sides of comparison are typed
    • (Maybe) type unsafe hash key types if things are explicitly typed (see crystal-lang/crystal#8893)
    • Typing/{Instance,Class}VarTypeRestriction - currently running into potential issues where an instance var could be typed somewhere, but just not in the current file, leading to a failure
    • Know what calls are to methods or to macros, for better Lint/UnusedVariable and others
@nobodywasishere
Copy link
Contributor Author

Other rule ideas possible with top level semantic (not saying these should be done, but that they could be):

  • Lint/NoReopen - disallow reopening of classes (or maybe monkey-patching stdlib classes/structs/etc)
  • Documentation/OverwrittenDocs - warn when adding documentation to a class/module/lib in multiple places which ones will be ignored
  • Documentation/UselessShowdocDirective - warn when a :showdoc: directive doesn't work (once those are added)

@nobodywasishere
Copy link
Contributor Author

nobodywasishere commented Dec 29, 2024

Need to figure out how this would work with linting while typing for the vscode extension, as there may be multiple files open with changes in them, and we'd want ameba to operate on those instead of the ones saved on disk

This will just need to be left up to an LSP using ameba, and not something we should worry about.

How would the top-level semantic interact with the Excluded in .ameba.yml?

Any issues found in a file that's excluded should be ignored, like how it is currently (except for Lint/SyntaxError and Lint/SemanticError). Additionally, it should automatically exclude any file not in the current directory, or inside 'lib/', to prevent errors from stdlib or shards popping up. (Can't just limit to src/ as not all projects limit crystal code to src/).


For another rule idea, I run into this issue a lot:

class MessageHandler
  def on_request(msg)
    # generic message handling
  end
  
  def on_request(msg : SpecificRequestType)
    # specific message handling
  end
end

If I accidentally mistype SpecificRequestType, I won't know I got it wrong until I actually run the application. It would be really useful to have a Lint/UnknownType rule that checks that all explicitly typed things actually exist, and this should only require a top-level semantic.

One of the biggest issues I see with Crystal currently is that no semantic analysis is done to code that isn't used. This means when developing library code, unless you write good specs (and know you have to write good specs), you can have semantic issues in your code and have no idea until someone else goes to use it. It would be useful to have some level of semantic analysis for methods, such that explicitly typed things can be checked to make sure their methods actually exist. For example:

def hello(a : String)
  a.method_doesnt_exist
  # ^^^^^^^^^^^^^^^^^^^ error: `String` doesn't have method `method_doesnt_exist`
end

If the rule runs into a method that isn't typed, we don't do anything beyond that. The only types this should infer is from literals or methods whose return types are specified. It should be fast, not exhaustive.

@nobodywasishere
Copy link
Contributor Author

I've started messing around with integrating ameba into larimar (https://github.com/nobodywasishere/larimar), a language server that I've been working on since last June.

This allows for not only running lints while typing (via the language server protocol in an editor-independent way), but also for re-using functionality ameba has developed over time. For example, I re-used the ScopeVisitor to provide inlay hints for when a local var is "declared":

Image

This can be made better (provide actual type resolved to instead of just ?, use semantic analysis to be more accurate), but I think this is really cool as a starting point. So far I find this really promising, and could allow the pathway for a single binary that provides all tooling capabilities, similar to Ruby LSP.

As more time is spent investigating / experimenting with semantic analysis, I think it would be useful for larimar and ameba to share the same / similar visitors and mechanisms. As mentioned above, the visitor / rule that checks for unknown methods could be the same visitor that larimar uses for providing autocomplete.

@nobodywasishere
Copy link
Contributor Author

nobodywasishere commented Jan 24, 2025

Here's a general approach I think may work:

  • Semantic rules still inherit from Base, but have a method that returns true indicating they're a semantic rule
    • This can probably be done with overloading
  • All syntactic rules are run first
  • If the syntax rule fails or the --semantic flag isn't passed, semantic rules aren't run
    • These should probably be marked as "skipped" in the formatter
  • As semantic requires an entrypoint, there are a few options:
    • Use a CLI flag to indicate entrypoint(s)
    • Allow the user to define the entrypoint(s) in the .ameba.yml
      Entrypoints:
        - repo1:
            main: monorepo1/main.cr
            flags:
              - my_flag
        - repo2:
            main: monorepo2/main.cr
    • Use the targets defined in the .shard.yml (with custom flags option as above)
      • This would be ideal for me (and if flags would get official support) so it can be used by larimar / other tooling as well
    • Use everything in current folder (DEFAULT_GLOBS w/o ECR)
  • For each entrypoint:
    • Do a top-level semantic once
    • Pass in compiler result alongside original source when calling rule.test(source, result)
    • Also allow rules to operate on just the compiler result (rule.test(result))
    • Different semantic rules can then be implemented differently, whether they iterate over the generated tree, or they iterate over the parser tree for the source directly
    • Semantic issues found should note which entrypoint they came from

One basic example I wrote up quickly for Lint/UnknownType. Ideally we'd have a SemanticScopeVisitor that would handle getting the current type / self, so that this can handle relative types, but this is just a proof of concept.

Image

Some notes:

  • Including the full compiler instead of just the parser significantly increases ameba build times
  • For some reason, running program.top_level_semantic completely breaks the compilers ability to find files in the CRYSTAL_PATH. Need to spend more time looking into this.
  • This is only the top level semantic, so it's useful for determining the available types and methods, but can be limited in what it can provide (re: type safe equality)
    • It may be worth having an additional full semantic pass with its own rules, but I'd want to get this level of semantic rules implemented first

Thoughts? cc @Sija @veelenga @straight-shoota @devnote-dev

@nobodywasishere
Copy link
Contributor Author

I figured out the path issues I was running into - needed to pass in environment variables from the current Crystal installation.

With some work implementing the above, I have a basic Lint/UnknownType rule implemented:

Image

Branch here:
https://github.com/nobodywasishere/ameba/tree/nobody/semantic-experiment

@veelenga
Copy link
Member

@nobodywasishere looks promising. Could you please draft a PR, so we can move from there.

@nobodywasishere
Copy link
Contributor Author

Sure thing! It won't be perfect, and the approach I'm taking may not be the best, but it's at least a starting point.

@nobodywasishere
Copy link
Contributor Author

crystal-lang/crystal#14613 (comment)

Another rule that could be implemented is Lint/Deprecated, may require a full semantic though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants