Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Scanner]: Introduce a Scanner class to handle file logic before checkers #63

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

c0nst4ntin
Copy link
Collaborator

What:

  • Bug Fix
  • New Feature
  • Refactor

Description:

Note

This is just a first draft! I am looking for active feedback on this, so we can develop this feature together. So before you open a new pull request with a slightly different version, feel free to write a review or suggestion here. I am more than open to accepting your suggestions on this topic!

This pull request adds a new Scanner class as introduced in #22 which handles the file scanning and then calls the checkers for each class.

In #22 it was proposed to also create the reflection in the scanner, in order to only have to create one reflection, which we then can process in multiple classes. However, at the moment we have one checker that checks the file's path and name and one checker that uses reflection.

I already had to add a function supports() in the Checker Interface to know when to call which checker.

Important

At the moment the tests obviously fail, as I changed the interfaces and checkers so the initialization and tests are all messed up. Before fixing the test suite I would love to get your feedback to further develop this feature

Related:

@loki495
Copy link
Contributor

loki495 commented Jan 10, 2025

This is looking good! I like that we're now iterating through the filesystem just once, then passing the paths along to the checkers if they can handle it.

I had some ideas in this comment and this one. In particular, changing the FileSystemChecker and ClassChecker to implementations of a scanner interface that will only worry about getting a list of words to be checked later, basically decoupling making the word list from the actual spellchecking.

That would also allow moving the filtering of ignored words (and maybe the caching) out of the InMemorySpellchecker (unless I'm missing the point of it, which is quite possible). Currently that filtering happens each time the spellchecker->check is called.

Then either a main scanner class or the kernel would coalesce the issues from all implementations of the scanners into a single [ word => [ path, line, column]] array so that:

  • we can filter out:
  • repeated words don't even hit the cache (one InMemorySpellchecker->check call per allowed word, even if it appears in many files)
  • we're setting things up to be able to create a Fixer class that could go through all the occurrences of each word.

Peck would then work vaguely in these stages:

  • kernel registers scanners/checkers/output/fixer classes
  • command calls the kernel->getWords or kernel->scanner->getWords to get filtered list of words to check
  • command loops through words
    • sends them to the InMemorySpellchecker
    • if result is a Misspelling, pass it to the registered output (or fixer) class
  • if no misspellings found at all, output success

This would also decouple fixing/output from the kernel/checker/scanner,.

What to you think?

Copy link
Contributor

@benjam-es benjam-es left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would need updating to the latest version of the main branch as a base.

I am not quite following the flow if this at the moment. Does this do any caching?

The idea for a scanner for me is that a checker would use a scanner, but you would have a couple of scanners (class and file system) and many checkers (file system, class, class method, class constant etc). With that in mind, the scanner would run first and have a cache, so the classes are scanned once, and then the checkers run using those cached words to spell check.

This then lends itself to future expansion where we could filter unique on the scanner results to use 'aspell' less, and we could also look at stats on repeated word count which can be used for suggestions for ignore words or preset words

@benjam-es
Copy link
Contributor

I wonder if the package has changes a lot since this. Perhaps switch to an issue and revisit?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants