Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Affix class cleanup #1329

Merged
merged 24 commits into from
Jul 30, 2022
Merged

Affix class cleanup #1329

merged 24 commits into from
Jul 30, 2022

Conversation

ampli
Copy link
Member

@ampli ampli commented Jul 21, 2022

My new "amy" modifications are based on 2 other PRs, and it may be a good idea not to combine them all into a big PR.
This is the first one.
After it is applied, I will make sure there are no conflicts and send the second one.

@ampli
Copy link
Member Author

ampli commented Jul 22, 2022

Please don't apply yet. I just found an old bug in the handling of LPUNC.
It seems the problem happened because I once removed the punctuation subscripts of the affix classes (besides UNITS) in en/4.0.affix, and then a code rot has happened so it now cannot handle subscript marks on LPUNC anymore (and they exist in some other languages).
I will fix that.

@linas
Copy link
Member

linas commented Jul 24, 2022

OK.

@ampli
Copy link
Member Author

ampli commented Jul 24, 2022

More in issue #1330.

@ampli
Copy link
Member Author

ampli commented Jul 26, 2022

I added fixes according to issue #1330.
Now it may be applied.

---> The strippable affixes for lt still need attention.
The strippable affixes for any, ady, and amy will be handled in an upcoming PR.
A fix for strippable subscripted affixes will appear in the next PR.
I'm also preparing a PR for CAPSTART.
Finally, I would like to send a cleanup PR for assorted things.

@ampli ampli force-pushed the affix-class branch 2 times, most recently from 8d3c186 to 89d292d Compare July 26, 2022 15:26
@ampli
Copy link
Member Author

ampli commented Jul 26, 2022

A fix for strippable subscripted affixes will appear in the next PR.

It turned out it is already included in this PR:

  • LPUNC,MPUNCT: Fix subscript mishandling

@ampli
Copy link
Member Author

ampli commented Jul 26, 2022

Sorry again, please don't apply yet!
This patch causes a significant slowness, and I also found a few parse differences (in en) that I still need to find their reason.
I need to find out which changes in tokenize.c cause the slowness and the parse differences.

@ampli
Copy link
Member Author

ampli commented Jul 26, 2022

I need to find out which changes in tokenize.c cause the slowness and the parse differences.

This was a bug in is_afdict_punc(): My fixed comparison was more buggy than the original.
This causes both slowness and inadequate tokenization.

I fixed it and made extensive checks so now finally it may be applied!

@ampli
Copy link
Member Author

ampli commented Jul 27, 2022

Now the affix file may have subscripted affixes again. They are slightly more efficient because their dict lookup is done with their subscript (I guess the possible speedup is small because the pruning will most probably remove the extra disjuncts that may be fetched by an unsubscripted affix).

@linas linas merged commit 41c0d57 into opencog:master Jul 30, 2022
@linas
Copy link
Member

linas commented Jul 30, 2022

OK, thanks

@ampli ampli deleted the affix-class branch August 16, 2022 11:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants