-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
score_selector
should find the innermost match.
#2152
Comments
This behavior also happens on Windows 7/10 x64 and Debian 9 x64. |
For reference, here's an example Python implementation of what I believe to be the current behavior: def score_selector_leftmost(scope, selector):
scope = enumerate(scope.split(' '), start=1)
selector = selector.split(' ')
score = 0
for i, part in scope:
if len(selector) == 0: break
if part.startswith(selector[0]):
selector = selector[1:]
score += 8 ** i
return score And the preferred behavior: def score_selector_rightmost(scope, selector):
scope = enumerate(scope.split(' '), start=1)
selector = selector.split(' ')
scope = reversed(list(scope))
selector = list(reversed(selector))
score = 0
for i, part in scope:
if len(selector) == 0: break
if part.startswith(selector[0]):
selector = selector[1:]
score += 8 ** i
return score Tests: TESTS = [
('function class function punctuation', 'function punctuation'),
('function class function punctuation', 'class punctuation'),
]
for scope, selector in TESTS:
print(score_selector_leftmost(scope, selector), score_selector_rightmost(scope, selector))
I assume that the real implementation is C++, but I hope that the change may be as simple as twiddling some indexes. |
To highlight the difference in a simpler/quicker way - no need to create color schemes etc, just execute the following in the ST console: >>> sublime.score_selector('source.json meta.structure.dictionary.json meta.structure.dictionary.value.json meta.structure.array.json meta.structure.dictionary.json punctuation.section.dictionary.begin.json', 'meta.structure.dictionary punctuation.section')
524480
>>> sublime.score_selector('source.json meta.structure.dictionary.json meta.structure.dictionary.value.json meta.structure.array.json meta.structure.dictionary.json punctuation.section.dictionary.begin.json', 'meta.structure.array punctuation.section')
536576 |
Any news on this one? Fixing this will benefit many languages(C, C++, JavaScript, MQL4 & 5) as you'll be able to give different colors to the punctuation and for example, differentiate an array literal vs access, object literal vs simple block, etc... |
@wbond Is the scoring function inside the executable or is it a default plugin that I can fix? |
It's not anywhere accessible, such as a default package, and it's used in core features like syntax highlighting, so it must be in the executable. |
@borela The scoring function as well as other API internal functions if I'm not mistaken is living in plugin_host.exe. I've made some little analysis about that particular function in this thread in order to figure out how it works behind the curtains but I haven't been able yet to figure out the underlying algorithm. |
Why would you assume in |
Just check the plugin_host strings, ie: strings plugin_host.exe . And then grep for api_ . Also, attach your debugger to plugin_host process and look around... You'll see the name of those functions correspond to the sublime_api bultin module |
Just because the names are there doesn't imply the algorithm being defined there. Or rather, of course there need to be strings for the functions exposed by the |
My main goal right now is to understand how extract_scope works, by doing so I'll be able to duplicate the toggle_comment command 1:1 on a scintilla widget. Reason why I felt knowing how score_selector works was important is mainly because I felt that function could be helpful to understand extract_scope. That said, being able to know exactly where these api_* core functions are defined exactly using the debugger would be amazing, as right now when trying to reverse specific sublime functions Im using some sort of statitical analysis inference based on datasets... Which is pretty much a really slow way to reverse thse simple algorithms 😥. If you have relevant information, ideas, thoughts or any hypothesis about all of this please let me know and I'll check it out when I'm back home. |
After some more experimentation, this seems to be the algorithm that Sublime uses for def split_scope(scope):
return [
re.split(r'\.', part)
for part in re.split(r'\s+', scope)
if part
]
def score_selector(scope, selector):
return _score_selector_leftmost(
split_scope(scope),
split_scope(selector)
)
def _score_selector_leftmost(scope, selector):
score = 0
scope_count = len(scope)
selector_count = len(selector)
scope_index = 0
selector_index = 0
if selector_count == 0:
return 1
while True:
if selector_index >= selector_count:
return score
elif scope_index >= scope_count:
return 0
part_score = _score_part(scope[scope_index], selector[selector_index])
if part_score > 0:
score += part_score << (3 * min(scope_index+1, 20))
score &= ((1 << 64) - 1) # Keep 64 bits
selector_index += 1
scope_index += 1
def _score_part(scope_part, selector_part):
scope_len = len(scope_part)
selector_len = len(selector_part)
score = 0
i = 0
while True:
if i >= selector_len:
return min(score, 7)
elif i >= scope_len or scope_part[i] != selector_part[i]:
return 0
else:
score += 1
i += 1 Written in a C-ish style (sorry). Notes:
|
@Thom1729 You're the man here! Your post/experiment is quite awesome, hats off... few weeks ago I'd tried to reverse engineer that one and failed miserably... you can read about it here. When I find a moment I'll compare it against Sublime's but I believe you've already done so... Btw, did you know about the existence of pyblime, guys like you who're curious about how things works behind the curtains would fit extremely well in our growing at fast pace team :D (take my marketing sentece with a little grain of salt). Anyway, thank you very much for all your reserach! high quality content there ;) |
FYI, I've updated the code to handle overflow in the same way as Sublime. Not really sure why it works that way, but it seems to be exact. |
@Thom1729 Hats off man... pretty impressive you've reversed this algo... May I ask you something out of curious? Which approach did you take to be able to figure out the original behaviour? Definitely the approach I'd taken was not good :) |
I already had the impression from somewhere that the score was a sum of bits, where each bit represented a successful match. By observation, It's a bit odd that the first group starts at 8. The obvious choice is 1, the first bit. But then an empty selector would score zero, which is used to mean failure, even though an empty selector logically should match everything. It seems to be a special case that the empty selector scores 1. It gets an entire group to itself, even though it doesn't need it -- two bits are wasted. Anyway, I knew that the system failed in some way when there were too many nested scopes, which sounded like an overflow issue. To investigate this, I tested If the empty selector handling only used one bit, then we could use the other 63 bits to fit 21 groups without issue. The 22nd would still break. Also, the score wouldn't look so friendly in octal because the three-bit groups wouldn't line up properly. The algorithm itself is just the result of writing this down in Python while thinking about C. Often, when there's weird behavior in Sublime, you can find the answer by thinking about the most obvious way to write a C program that would reproduce the known behavior. As often as not, that program handles unknown cases the same as Sublime. This way of thinking was especially useful when I wrote SublimeSyntaxParser. |
@Thom1729 Wow, love it... that material deserves to live in a nice blog article... it's the type of articles I love reading actually... the approach you've taken driven by your hypothesis has been definitely much more optimal than mine based on gathering data and observation, which leaded me nowhere. Also, I liked specially this part:
Which makes total sense... in fact, when I was trying to figure out the behaviour of insert/erase/replace I used a similar approach, as my "starting point" was just assuming the maths of these algorithms should be something extremely simple as they behave extremely fast, therefore the operations involved shouldn't be complex at all, thinking that way really helped that time to find equivalent functions. Talking of which, you seem to be well versed on Sublime's internal behaviour as well as parsing & scoping... sorry to bother with more questions but... maybe you know something about #2691 ? Being able to figure out how |
Circling back to the original topic, if the C implementation of def _score_selector_rightmost(scope, selector):
score = 0
scope_count = len(scope)
selector_count = len(selector)
scope_index = scope_count - 1
selector_index = selector_count - 1
if selector_count == 0:
return 1
while True:
if selector_index < 0:
return score
elif scope_index < 0:
return 0
part_score = _score_part(scope[scope_index], selector[selector_index])
if part_score > 0:
score += part_score << (3 * min(scope_index+1, 20))
score &= ((1 << 64) - 1) # Keep 64 bits
selector_index -= 1
scope_index -= 1 |
@Thom1729 I've tested your above routine in pyblime and i've found some corner cases that won't behave as Sublime, you can see them in this unit-test commented. Just check these ones:
so you'll see your routine will give:
In any case, it's almost a perfect 1:1 routine ;) . Btw, added proper credits here... hopefully you'll get interested in the project and become a contributor though as your skills seem to be the real deal ;) Ps. Again, thanks for the good job |
It looks like the scope or selector If I have time, I'll see about modifying the algorithm accordingly. |
@FichteFoll Please add this to the next dev cycle, having this feature fixed would simplify color schemes e and allow us to target punctuations for languages where [] {} can mean different things in different contexts. |
adding a milestone would be misleading unless SublimeHQ are actually going to work on it in this dev cycle... |
@keith-hall I know that but since he is adding specific issues to milestones I thought he had enough rapport with the core devs to ask for this. |
All issues in the "next dev cycle" milestone are confirmed to have been looked into by a core dev and to be addressed by the "next dev cycle" releases. The issues I add to that have either been mentioned by a dev on the forum, Discord, or otherwise. wbond also adds the milestone to issues directly (when he can). |
Old behavior (via): >>> sublime.score_selector("meta.mapping meta.sequence meta.mapping punctuation.separator.sequence", "meta.mapping punctuation.separator.sequence")
12304
>>> sublime.score_selector("meta.mapping meta.sequence meta.mapping punctuation.separator.sequence", "meta.sequence punctuation.separator.sequence")
12416 New behavior in 4173: >>> sublime.score_selector("meta.mapping meta.sequence meta.mapping punctuation.separator.sequence", "meta.mapping punctuation.separator.sequence")
1664
>>> sublime.score_selector("meta.mapping meta.sequence meta.mapping punctuation.separator.sequence", "meta.sequence punctuation.separator.sequence")
1552 🎉 |
And with the example given in this issue:
Now: >>> sublime.score_selector('source.json meta.structure.dictionary.json meta.structure.dictionary.value.json meta.structure.array.json meta.structure.dictionary.json punctuation.section.dictionary.begin.json', 'meta.structure.dictionary punctuation.section')
77824
>>> sublime.score_selector('source.json meta.structure.dictionary.json meta.structure.dictionary.value.json meta.structure.array.json meta.structure.dictionary.json punctuation.section.dictionary.begin.json', 'meta.structure.array punctuation.section')
67072 |
This change is required for #4015 to succeed. Its indentation rule selectors make use of sublimehq/sublime_text#2152 to recursively select the correct ruleset in deeply nested alternating json objects/lists and function constructs. ```js var foo = { case: () => { // data: don't dedent switch () { case: // code: dedent var bar = { case: () => { // data: don't dedent ... } } } } ``` This is available as of ST4173. In all builds before, selectors match the outermost contexts only and thus indentation starts to fail in on of the inner object/function blocks.
Summary
The
sublime.score_selector
API method (and theView.score_selector
method) should always give higher scores to more specific matches.Expected behavior
When scoring multiple selectors at a point, I expect the highest scoring selector to be the one that is most specific (given that it matches). What “most specific” means to me is that each part of the selector should match the rightmost possible component of the given scope. (This can be done greedily in linear time.)
Actual behavior
In at least some circumstances, a “less specific” selector scores higher than a “more specific” selector. This poses challenges in some circumstances when creating color schemes. See:
The present behavior has confused some color scheme authors, although to my knowledge the confusion has not been formally articulated before. The API documentation does not specify the exact behavior, so it's hard to call this issue a bug in the strictest sense. However, I think that the new behavior suggested here would be the most intuitive for users and the most useful for developers.
Steps to reproduce
To illustrate the issue, use the following
.sublime-color-scheme
:Then observe the following file with the core JSON syntax:
I expect all of the curly braces to be red and all of the square brackets to be green. However, the innermost braces are the “wrong” color.
(It may be observed that this example is not the most effective way of highlighting JSON punctuation, because we can distinguish curly braces from square brackets directly. However, that solution does not apply to more complex cases in other languages.)
Environment
The text was updated successfully, but these errors were encountered: