Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split checkers up #38

Open
GrandadEvans opened this issue Jan 2, 2025 · 4 comments
Open

Split checkers up #38

GrandadEvans opened this issue Jan 2, 2025 · 4 comments
Labels
help wanted Extra attention is needed

Comments

@GrandadEvans
Copy link
Contributor

I am wanting to write the code for a system that will have an Interface, and mean each "checker" has it's own class (Open/Closed principle).
The peck.json file can have

{
  "checkers": [
    "file-names",
    "method-names",
    "class-names",
    "comments",
    "etc"
  ]
}

and then the checkers can be iterated through, a list of words compiled, and then checked

Class BaseChacker {
    public function __construct(
        public $wordsToCheck = [],
        public $checkersBasePath = './src/Checkers/',
    ) {
        foreach($checkers as $checker) {
            if ($this->ableToImportCheckerThroughResolution($checker)) {
                $this->addCheckersWordsToWordList();
            }
        }
    }

    private function ableToImportCheckerThroughResolution(string $checker): array {
        $checkerFileName = $checker->checkerFilename;
        $path = "{$this->checkersBasePath}{$checker}";

        if (!file_exists($path)) {
            throw new CheckerDoesNotExistException($checker);
        }
    }
}

I want to see the existing PRs I have through first, so that I can get some feedback, and get an up to date codebase, and then I know whether the changes I have so far can be implemented into the new changes.
Saying that I'll probably start work on it anyway

@GrandadEvans
Copy link
Contributor Author

The return type of array on the private method should be bool, and it should return true after it checks for the file existence, but I was making it up on the spot, and realised I had 3 minutes to go to get today's DuoLingo done (super-professional)

@c0nst4ntin
Copy link
Collaborator

I am also a big fan of splitting this up. It would allow for a better separation of concerns and fewer conflicts with multiple pull requests. We just need to make sure that we don't create a new reflection class instance for each checker. 

I believe this was the proposal in #22 with the suggested scanner.

@loki495
Copy link
Contributor

loki495 commented Jan 7, 2025

I also like the idea of splitting up scanners, reflecting all the classes (once), and compiling the list of words before doing any of the actual spellchecking. Doing it in "stages" (i.e. scanning -> spellchecking -> output) would keep things separate and cleaner.

While scanning we could also handle trimming down the list of words that the checkers receive. For example:

  • we could process the ignored words and directories from the config at this stage, not in the checkers
  • we could ignore short or common words ("the", "in", "for","as" etc.) that sometimes are part of method or variable names
  • currently we're sometimes passing long lines to the InMemorySpellchecker (and the cache). A "checkIfMessageIsValid" method, will be passed as "check if a message is valid". A DocComment line gets passed as a whole. This makes the cache less effecive, plus many words like "param", "return", "object", "string" are not being filtered out. Each word should probably be added to the word list individually, and filtered out if necessary.
  • I also like the idea of the technical dictionary in Feature: Technical Dictionary #43, this would also help with the last point
  • with the new cache its not such a big deal anymore, at leat its not being processed by ASpell each time any more, but the same word will be passed multiple times to the InMemorySpellchecker and the cached file will be read from disk and unserialized each time (this could use it's own optimization, maybe an in-memory cache within the cache?).

And maybe we could even keep track of all occurrences (line and column) of each word in each file? It would eat up progressively more memory the larger the code base gets, but this would make a future "fixer" a trivial matter. Maybe the original word list gets discarded once the spellchecking stage is over, just keeping the misspellings info, or maybe some other level of caching might be needed.

@loki495
Copy link
Contributor

loki495 commented Jan 7, 2025

One more 5am thought: Maybe part of the issue is that the current naming is confusing? The "checker" interface could be the scanner, while the InMemorySpellChecker would be the actual singulat checker (with maybe a config setting to choose between ASpell or other?), so maybe we just need to clean up and consolidate the "checker" -> "scanner" interface? Make all the "checkers" just get the word list ("namesToCheck", and maybe line/column info), and pass it back to the kernel, do the word splitting and filtering, then pass it to the actual checker which handle the cache and return the misspellings and suggestions, then the kernel or command passes the list to either an output or a fixer class?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

4 participants