-
-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
safearray_as_ndarray context manager not working as intended #551
Comments
Firstly, I would like to clarify the situation. If there is a proposal for a change, there should be a problem that you are facing that would be resolved by this change. Also, I would like to help solve this problem, but I am not very familiar with |
To be clear, the proposed change is on line 326 of safearray.py Change:
to:
This simple change allows the "safearray_as_ndarray:" context manager to directly convert a safearray to a numpy array at much faster speeds. I would argue using the original keys() instead of the the proposed values() is actually a bug due to the inconsistent way the comtypes.npsupport.typecodes dictionary mapped dtypes and ctypes as created in npsupport.py in past versions. See #201 for the genesis of the problem. Since comtypes.npsupport.typecodes now consistently maps dtypes to ctypes, we need the values of the dictionary (i.e. the ctypes) , not the keys (i.e. dtypes) in the conditional check. This bug first appeared in 1.1.13. In summary, npsupport.py does NOT need to change, only line 326 of safearray.py should be changed as shown above. The change results in much faster safearray transfers using the "with safearray_as_ndarray:" context manager. I can provide a code sample, but it will require downloading a particular COM server that is available on the web. Let me know if this is desired and I will post the code and download details. |
Please do so. Share your situation without hesitation. |
"""
comtypes safearray_as_ndarray context manager test
This code tests the speed of a 1D array transfer from a
source COM server to a target Numpy array. The sample
DADiSP 6.7 COM server can be downloaded here:
https://www.dadisp.com/30dayidx.htm
Once installed, run DADiSP 6.7 as Admin and enter this trial
password:
GJXRZ6G3B7
then exit the application. DADiSP 6.7 should now be registered in
the Registery as a COM server.
The speedtest() function tests the safearray transfer by creating
a 24 M sample (24*1024*1024 x 1) random array in the server and then
transfers the array from the server to Python using comtypes, first
by the standard method and then with the safearry_as_ndarray context
manager. The transfer times are compared.
For example:
>>> speedtest()
Standard: 5.110, Context Manager: 1.93 - lower is faster
Although the context manager as written provides a nice speedup, we can
obtain much faster speeds by replacing line 325 in safearray.py:
comtypes.npsupport.typecodes.keys()
with:
comtypes.npsupport.typecodes.values()
This allows the conditional to correctly test for supported ctypes. Re-running
speedtest in a new session with the changed code yields:
>>> speedtest()
Standard: 5.732, Context Manager: 0.48 - lower is faster
The safearray_as_ndarray context manager method is now about 4x faster than
previously, a huge improvement for a simple one-liner!
"""
import comtypes
import atexit
from comtypes import client
from comtypes.safearray import safearray_as_ndarray
from time import time
tics = []
dp = None
# get large array, return time in seconds
def getdata1(dp, vname="w1"):
# start timer
tics.append(time())
# transfer data
data = dp.getdata(vname)
# end time
t = time() - tics.pop()
return t
# get large array using context manager
def getdata2(dp, vname="w1"):
# start timer
tics.append(time())
# transfer data with context manager
with safearray_as_ndarray:
data = dp.getdata(vname)
# end time
t = time() - tics.pop()
return t
# COM server
def initdp():
# invoke server
dp = client.CreateObject("DADiSP.Application")
# put 24MB x 1 array in W1
dp.execute("W1 = randn(24*1024*1024, 1)")
# return server
return dp
# exit handler
def exit_handler():
dp.quit()
# main function - array transfer speed comparison
def speedtest():
global dp
if dp == None:
dp = initdp()
atexit.register(exit_handler)
t1 = getdata1(dp)
t2 = getdata2(dp)
print("Standard: %5.3f, Context Manager: %5.2f - lower is faster" % (t1, t2)) |
@rrace10 thank you for this detailed and helpful set of observations. As the person responsible for the most recent changes to You will see in the conversation on issue #238 that I observed the opposite behaviour to your recent observations: the code checked the values which were dtypes, instead of the keys which were ctypes. At that point @cfarrow pointed out that the actual issue was that the npsupport module was inconsistent: #220 had reversed the mapping between ctypes and dtypes. I fixed the mapping in #239 to make it consistent with the older numpy versions. At this point, everything was working correctly. However, what I did not know at that point was that a separate PR #235 had also been raised to swap @junkmd, I think that making the change proposed by @rrace10 should be uncontroversial, it is in effect just reverting #308. I don't know why that PR got merged after so much time and with no apparent push for it to happen. Obviously the best thing would be to introduce a test here that would prevent this kind of regression from happening again. I don't have a lot of time to devote to this at the moment - @rrace10 is that something that you would be interested in taking on? |
@rrace10 @bennyrowland Now, I am summarizing my thoughts to share with the community about my concerns about changing the |
I have summarized my concerns about changing the
comtypes/comtypes/safearray.py Lines 324 to 326 in ce0e393
if safearray_as_ndarray and self._itemtype_ in list(
- comtypes.npsupport.typecodes.keys()
+ # `values()` is faster than `keys()`. See https://github.com/enthought/comtypes/issues/551
+ comtypes.npsupport.typecodes.values()
): The problem we are dealing with in this issue is a change to something that tests has been broken or non-existent for many years, like #473, so this is a difficult topic. And of course, I welcome contributions from not only @rrace10 but also from the community, for tests that guarantee changes to |
Does this mean that the proposed changes will make some of the tests that are currently being skipped work? |
I did not explore the skipped unittests in safearray.py, but they seem more to do with memory leak detection, so I doubt the change will have any impact, positive or negative. The code I posted was meant as a demonstration of the speed increase due to the change. I agree it is not suitable as a unittest since it relies on a 3rd party application that would be unwieldy to install and configure. A typical unittest is also problematic because the result of the change is a speed increase that is difficult to quantify. However, it would be straightforward to add a test in test_npsupport.py to verify comtypes.npsupport.typecodes always maps dtypes to ctypes. This does not directly test the change in safearray.py, but it could prove useful to future proof comtypes from changes that led to this issue. Your proposed comment in safearray.py is also useful. Here’s a proposed test to add to test_npsupport.py: def test_comtypes_npsupport_typecodes(self):
import ctypes
comtypes.npsupport.enable()
self.assertEqual(comtypes.npsupport.typecodes['|i1'], ctypes.c_byte)
self.assertEqual(comtypes.npsupport.typecodes['|u1'], ctypes.c_ubyte)
self.assertEqual(comtypes.npsupport.typecodes['<f4'], ctypes.c_float)
self.assertEqual(comtypes.npsupport.typecodes['<i2'], ctypes.c_short)
self.assertEqual(comtypes.npsupport.typecodes['<u2'], ctypes.c_ushort)
self.assertEqual(comtypes.npsupport.typecodes['<f8'], ctypes.c_double)
self.assertEqual(comtypes.npsupport.typecodes['<i4'], ctypes.c_long)
self.assertEqual(comtypes.npsupport.typecodes['<u4'], ctypes.c_ulong)
self.assertEqual(comtypes.npsupport.typecodes['|b1'], ctypes.c_bool)
self.assertEqual(comtypes.npsupport.typecodes['<i8'], ctypes.c_longlong)
self.assertEqual(comtypes.npsupport.typecodes['<u8'], ctypes.c_ulonglong) The test verifies that comtypes.npsupport.typecodes provides the correct mapping. The current release passes the test, but again it is intended to prevent future releases from reintroducing code that led to the problem. |
I agree that it is difficult to guarantee such cases in unit tests.
I welcome the addition of tests to robust the mapping of dtype formats and comtypes/comtypes/_npsupport.py Lines 19 to 92 in ce0e393
Having tests will make it easier for future maintainers to understand the behavior of the production code.
I think that comments in such difficult-to-test areas are a powerful approach to prevent regressions.
To prevent a failing assertion from causing subsequent assertions not to be executed, I suggest using def test_comtypes_npsupport_typecodes(self):
import ctypes
comtypes.npsupport.enable()
for fmt, typ in [
("|i1", ctypes.c_byte),
...,
("<u8", ctypes.c_ulonglong),
]:
with self.subTest(fmt=fmt, typ=typ):
self.assertIs(comtypes.npsupport.typecodes[fmt], typ) This will help identify the "truly wrong" parts. What I was concerned about is that the contributor might have submitted PR #235 to solve a problem they were facing, such as their codebase not working. In the next release, I plan to make only this one change related to
@rrace10, Feel free to PR to resolve this problem. |
@junkmd #235 was a valid fix at the time it was submitted, but when it was merged (as #308) it was no longer valid because the issue had been solved another way, so instead it produced this issue. AFAICT #334 is unrelated to these other issues - it relates to sending data as a numpy array, whereas the other issues are around receiving data as a numpy array. We should improve the server tests, but I don't think anyone has a good picture of how everything works enough to work out what a test suite should look like. At the very least, we can probably get some of the numpy tests running. I will put it on my (long) todo list :-) |
Ah, I misunderstood the timeline of PRs and merges. Thank you for your good point. Reviewers, contributors, and maintainers are all human, so I think such things are common in open source. I strongly feel that respecting each other and sharing feedback makes ensuring psychological safety and the open source community healthy. |
The function _check_ctypeslib_typecodes() in _npsupport.py returns a dictionary that is also assigned to ctypeslib._typecodes. The dictionary maps numpy dtype string representations ('i4', 'f8', etc.) to ctypes classes. The ctypeslib._typecodes dictionary appears correct and is used by _get_elements_raw() in safearray.py on line 325. The _get_elements_raw() function is part of the implementation of the "with safearray_as_ndarray:" context manager to optimize the transfer/conversion of safearrays/ndarrays.
However, _get_elements_raw() uses the keys() field of ctypeslib._typecodes to determine if the particular ctype is supported. The conditional is:
This appears incorrect, since the self.itemtype is a ctype, not a dtype as returned by comtypes.npsupport.typecodes.keys(). The end result is "with safearray_as_ndarray:" is essentially ignored with no speed up. Changing the conditional to:
works as expected, with an array transfer speed increase of up to 6x on my system. The keys() bug first appeared in release 1.13 and persists in 1.4.2. Using the simple change:
enables "with safearray_as_ndarray:" to work as intended and also passes all the unit tests.
The text was updated successfully, but these errors were encountered: