You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The purpose of this issue is to share some learnings (and some salt about DbgHelp) that I have gained in the past few days furthering my descent into madness my understanding of symbols; whether any of this should be considered an issue that prompts changes to DbgShell or an endorsement of any particular approach I leave to the readers' judgement. SymbolSearchPerf.cs.txt
SymSearch results are a superset of SymEnumSymbols - there is no symbol that SymEnumSymbols will return that SymSearch can not also return.
"But wait!" you say, "I have done searches that get results from SymEnumSymbols but not SymSearch!" And indeed, you are correct. So I shall share the sad tale of a simple, unassuming symbol, known to its friends as mshtml!g_hIsolatedHeap.
Nothing. Have I lied to you? No! I have not! It is DbgHelp that lies! This symbol's proper name is notmshtml!g_hIsolatedHeap at all, but in truth is mshtml!_g_hIsolatedHeap!
SymEnumSymbols succceeds where SymSearch does not because DbgHelp secretly inserts search globs to cover for the lies it tells! But at what cost, you ask? "No, I didn-" Yes, you did. But unfortunately, there are so many different ways that DbgHelp is heckin' slow, you can't unambiguously pin it to one single thing† - but one can at least quantify how slow the various ways of calling into it to ask about our friend mshtml!_g_hIsolatedHeap are!
Method
Mean
Error
Ratio
FindSymbol_Enum
1,751,811.693 us
77,915.3682 us
1.000
FindSymbol_Enum_ExplicitStar
1,225,950.550 us
22,614.1927 us
0.691
FindSymbol_Enum_NoType
1,167,999.680 us
17,032.5053 us
0.668
FindSymbol_Enum_Mangled_SymOpts
705,314.202 us
34,711.8878 us
0.421
FindSymbol_Enum_PublicsOnly_Mangled_SymOpts
670,834.033 us
16,164.5092 us
0.384
FindSymbol_Search_ForcedStarGlob
3,409,318.860 us
37,855.2651 us
1.949
FindSymbol_Search_SecretUncodumentedWildcard
3,017,108.117 us
156,676.2463 us
1.736
FindSymbol_Search_Mangled_SymOpts
368,730.864 us
6,861.2060 us
0.210
(spoiler alert: Benchmarkdotnet picked microseconds for this table because some of the rows I cut out actually need it)
There are few things you can tell from that table (should the names happen to make sense to you, or should you reference against the attached code). First is that with the default settings DbgHelp does try to match against your original search terms before searching again with the injected star glob. Next you might notice that SymSearch manages to take twice as long (unless you set symopts to search against still-"decorated" streams, a trival concession for our symbol with no decoration to be had). I'm not really sure how it does that. One might also notice that there are undocumented wildcards (that one might want to, you know, escape). In this case, I used #, which seems to mean "match the preceding character 0 or 1 times"‡, but there's also some handling for a [a-z] style syntax.
Now, if one were to examine DbgHelp more closely††, one might notice that the public apis seem to pretty much just be messy wrappers aound DIA functions. And there is an undocumented SymGetDiaSession, what might one get from that? Crashes, unless one also notices the SymFreeDiaString and surmises that DbgHelp's DIA is special and doesn't use SysAllocString to allocate "BSTR" values.
Method
Mean
Error
Ratio
findChildren_Mangled_DiaComInterop
2,816.546 us
45.3609 us
0.002
findChildren_Mangled_DiaCppCli
111.870 us
0.5413 us
0.000
findChildren_Unmangled_DiaCppCli
338,699.820 us
6,055.1215 us
0.194
findChildren_Mangled_DiaCppCli_BuiltinSearchGlob
146,313.182 us
7,739.2609 us
0.084
Wow, using .NET COM interop with an exact search string gets us an answer 100 times faster (as long as we don't look at any string properties...). C++/CLI our way around the string issue and we get a whopping 3,000 times faster. Even undecoration & search globs can't get us as slow as the DbgHelp APIs go. But wait! There's more... most of those 112 microsecond were just fetching the DiaSession!
Method
Mean
Error
Ratio
findChildren_Mangled_DiaCppCli_CachedSession
4.629 us
0.0850 us
0.000
But why stop there? If you profile it, it's painfully obvious that DbgHelp's DIA is hamstrung by some inefficient UTF8 -> UTF16 conversion logic... and the disassembly tarot cards will tell you that the version of DIA in msdia140 somewhat improves on that conversion, in addition to a statically linked towlower... so how would that fare?
Well, shoot. Those numbers are rather remarkably close to DbgHelp's... I profiled them to make sure I wasn't accidentally double testing. I'm starting to suspect there was a reason the DbgHelp DIA used LocalAlloc instead of SysAllocString... msdia140 will use LocalAlloc if you use a secret alternate GUID to get your IDiaDataSource but I think I should post this and go to sleep before I get lured into testing that
( SymbolSearchPerf.cs.txt contains the benchmark code, which should compile & run against the head of my personal DbgShell repository)
† aside from the dynamically linked towlower call in DbgHelp!nextToken
‡ except in mshtml!_#_hIsolatedHeap where it means "get stuck in an infinite loop"
†† Not by diassembling it, of course, that would be against the terms of use!
The text was updated successfully, but these errors were encountered:
Thank you for doing this analysis and sharing it. I have long wondered about the SymEnumSymbols vs SymSearch thing. (You'd think I could've just asked someone who works here, and I did try that, but got no reply.)
Apparently SymSearch is only super slow if you're using the Recurse or AllItems search options in combination with undecorating symbols - because it doesn't apply the search mask until after populating the symbol info, causing every single symbol's name to go through an extra unnecessary (and suprisingly slow) utf8 -> utf16 -> utf8 round trip before undecorating it.
I also can't actually tell how AllItems is different from GlobalsOnly when there' isn't also Recurse (aside from being much slower).
And an actual DbgShell issue: Recurse with a "*" symbol name search errors out because some of the symbols returned don't have names
The purpose of this issue is to share some learnings (and some salt about DbgHelp) that I have gained in the past few days furthering
my descent into madnessmy understanding of symbols; whether any of this should be considered an issue that prompts changes to DbgShell or an endorsement of any particular approach I leave to the readers' judgement.SymbolSearchPerf.cs.txt
"But wait!" you say, "I have done searches that get results from SymEnumSymbols but not SymSearch!" And indeed, you are correct. So I shall share the sad tale of a simple, unassuming symbol, known to its friends as
mshtml!g_hIsolatedHeap
.As expected. And what about SymSearch?
Nothing. Have I lied to you? No! I have not! It is DbgHelp that lies! This symbol's proper name is not
mshtml!g_hIsolatedHeap
at all, but in truth ismshtml!_g_hIsolatedHeap
!SymEnumSymbols succceeds where SymSearch does not because DbgHelp secretly inserts search globs to cover for the lies it tells! But at what cost, you ask? "No, I didn-" Yes, you did. But unfortunately, there are so many different ways that DbgHelp is heckin' slow, you can't unambiguously pin it to one single thing† - but one can at least quantify how slow the various ways of calling into it to ask about our friend
mshtml!_g_hIsolatedHeap
are!(spoiler alert: Benchmarkdotnet picked microseconds for this table because some of the rows I cut out actually need it)
There are few things you can tell from that table (should the names happen to make sense to you, or should you reference against the attached code). First is that with the default settings DbgHelp does try to match against your original search terms before searching again with the injected star glob. Next you might notice that
SymSearch
manages to take twice as long (unless you set symopts to search against still-"decorated" streams, a trival concession for our symbol with no decoration to be had). I'm not really sure how it does that. One might also notice that there are undocumented wildcards (that one might want to, you know, escape). In this case, I used#
, which seems to mean "match the preceding character 0 or 1 times"‡, but there's also some handling for a[a-z]
style syntax.Now, if one were to examine DbgHelp more closely††, one might notice that the public apis seem to pretty much just be messy wrappers aound DIA functions. And there is an undocumented
SymGetDiaSession
, what might one get from that? Crashes, unless one also notices theSymFreeDiaString
and surmises that DbgHelp's DIA is special and doesn't useSysAllocString
to allocate "BSTR
" values.Wow, using .NET COM interop with an exact search string gets us an answer 100 times faster (as long as we don't look at any string properties...). C++/CLI our way around the string issue and we get a whopping 3,000 times faster. Even undecoration & search globs can't get us as slow as the DbgHelp APIs go. But wait! There's more... most of those 112 microsecond were just fetching the DiaSession!
But why stop there? If you profile it, it's painfully obvious that DbgHelp's DIA is hamstrung by some inefficient UTF8 -> UTF16 conversion logic... and the
disassemblytarot cards will tell you that the version of DIA in msdia140 somewhat improves on that conversion, in addition to a statically linkedtowlower
... so how would that fare?Well, shoot. Those numbers are rather remarkably close to DbgHelp's... I profiled them to make sure I wasn't accidentally double testing. I'm starting to suspect there was a reason the DbgHelp DIA used
LocalAlloc
instead ofSysAllocString
... msdia140 will use LocalAlloc if you use a secret alternate GUID to get your IDiaDataSource but I think I should post this and go to sleep before I get lured into testing that( SymbolSearchPerf.cs.txt contains the benchmark code, which should compile & run against the head of my personal DbgShell repository)
† aside from the dynamically linked towlower call in DbgHelp!nextToken
‡ except in
mshtml!_#_hIsolatedHeap
where it means "get stuck in an infinite loop"†† Not by diassembling it, of course, that would be against the terms of use!
The text was updated successfully, but these errors were encountered: