Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Python iterator approach: vectorise without dtype #9

Open
wants to merge 60 commits into
base: master
Choose a base branch
from
Open

New Python iterator approach: vectorise without dtype #9

wants to merge 60 commits into from

Conversation

jwoillez
Copy link
Contributor

Have a look only at the last two commit, as this is a continuation of another PR.

To give it a spin:

import erfa.nditer as erfa

import numpy as np

pnat = np.zeros([2,3], dtype=np.double)
v = np.zeros([5,1,3], dtype=np.double)
s = 1.0
bm1 = np.sqrt(1.0-0.1**2)

pnat[:,0] = 1.0
v[:,:,1] = 0.01

print(pnat)
print(v)
print(s)
print(bm1)

ppr = erfa.ab(pnat, v, s, bm1)

print(ppr)

The only tricky part in the implementation is that NpyIter is not exposed in numpy.pxd. This requires the use of private NpyIter_GetDataPtrArray() from numpy/arrayobject.h. This implementation is inspired from https://github.com/rainwoodman/chealpy/blob/master/chealpy/npyiter.pxd

It could be further improved by including the output parameter ppr in the iterator itself. This would save indexation time, after the call to eraAb...

jwoillez added 30 commits July 28, 2014 09:08
For later automation, it is easier to have all inputs vectorized.
Just because one of its parameters is an input and an output at the
same time. And it also uses the complex eraASTROM structure.
@jwoillez
Copy link
Contributor Author

jwoillez commented Nov 4, 2014

@mhvk Below is a verbatim copy of my timing script:

import importlib
import numpy as np
import timeit

erfa_nditer = importlib.import_module('erfa.nditer')
erfa_numpy = importlib.import_module('erfa.cython_numpy_auto')

pnat1 = np.zeros([100000,3], dtype=np.double)
pnat1[:,0] = 1.0
pnat2 = np.zeros([100000], dtype=[('f1','d',(3,))])
pnat2[:]['f1'][0] = 1.0
v1 = np.zeros([3], dtype=np.double)
v2 = np.zeros([1], dtype=[('f1','d',(3,))])
s = 1.0
bm1 = np.sqrt(1.0-0.1**2)

def run_nditer():
    ppr = erfa_nditer.ab(pnat1, v1, s, bm1)

def run_numpy():
    ppr = erfa_numpy.ab(pnat2, v2, s, bm1)

print(min(timeit.repeat("run_nditer()", "from __main__ import run_nditer", repeat=100, number=1)))
print(min(timeit.repeat("run_numpy()", "from __main__ import run_numpy", repeat=100, number=1)))

bm1_ = (<double *>(dataptrarray[3]))[0]
ppr_ = (<double *>(dataptrarray[4]))
eraAb(pnat_, v_, s_, bm1_, ppr_)
status = iternext(GetNpyIter(it))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am not very versant in C, but would it be faster to assign GetNpyIter(it) to a variable before entering the loop, and then use that variable rather than call this (inline) function multiple times?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principal it should be basically the same because GetNpyIter is just an accessor that changes the type.
But I just tried it now like this:

    cdef char** dataptrarray = GetDataPtrArray(GetNpyIter(it))
    cdef IterNextFunc iternext = GetIterNext(GetNpyIter(it), NULL)
    cdef NpyIter* ititer = GetNpyIter(it)
    cdef int status = 1
    while status:
        pnat_ = (<double *>(dataptrarray[0]))
        v_    = (<double *>(dataptrarray[1]))
        s_    = (<double *>(dataptrarray[2]))[0]
        bm1_  = (<double *>(dataptrarray[3]))[0]
        ppr_  = (<double *>(dataptrarray[4]))
        eraAb(pnat_, v_, s_, bm1_, ppr_)
        status = iternext(ititer)

And I got an ~ 2% speedup on a 5000-element array. So quite small, but it does seem to help.

Unless there's some possibility the iter pointer changes during iteration?

@mhvk
Copy link

mhvk commented Nov 4, 2014

@jwoillez - Sorry, I was unclear: I wondered whether you could point me to the .pyx file where the cython_numpy_auto version of ab resides... (just to look at it on github).

@eteq
Copy link
Member

eteq commented Nov 5, 2014

@jwoillez @mhvk - note that it's not entirely a fair comparison to compare this to what's in astropy/astropy#2992 because the latter has other changes dealing with error-processing. But hopefully that will be the same cost either way.

@mhvk, to look at the cython_numpy_auto version, you'll have to generate it as per what @jwoillez said in #6 - I don't think it's in github.

@eteq
Copy link
Member

eteq commented Nov 5, 2014

Oh, but I realized I have a local copy - it basically looks like this, @mhvk:

def ab(pnat, v, s, bm1):
    # convert nd-array to single-field structured array for argument "pnat"
    pnat_arr = numpy.empty(numpy.shape(pnat)[:-1], dtype=numpy.dtype([('fi0', 'd', (3,))]))
    pnat_arr['fi0'] = pnat
    pnat = pnat_arr

    # convert nd-array to single-field structured array for argument "v"
    v_arr = numpy.empty(numpy.shape(v)[:-1], dtype=numpy.dtype([('fi0', 'd', (3,))]))
    v_arr['fi0'] = v
    v = v_arr

    in_shape = numpy.broadcast(pnat, v, s, bm1).shape
    ppr_out = numpy.empty(in_shape, dtype=numpy.dtype([('fi0', 'd', (3,))]))

    cdef numpy.broadcast it = numpy.broadcast(pnat, v, s, bm1, ppr_out)
    cdef double * _pnat
    cdef double * _v
    cdef double _s
    cdef double _bm1
    cdef double * _ppr

    while numpy.PyArray_MultiIter_NOTDONE(it):
        _pnat = (<double *>numpy.PyArray_MultiIter_DATA(it, 0))
        _v = (<double *>numpy.PyArray_MultiIter_DATA(it, 1))
        _s = (<double*>numpy.PyArray_MultiIter_DATA(it, 2))[0]
        _bm1 = (<double*>numpy.PyArray_MultiIter_DATA(it, 3))[0]
        _ppr = (<double *>numpy.PyArray_MultiIter_DATA(it, 4))

        eraAb(_pnat, _v, _s, _bm1, _ppr)

        numpy.PyArray_MultiIter_NEXT(it) 

    # convert from single-field structured dtype to regular nd-array
    ppr_out = ppr_out['fi0']

    return ppr_out

@jwoillez
Copy link
Contributor Author

jwoillez commented Nov 5, 2014

I added d2dtf(), aper(), and atco13(), in order to see what the wrapper would look like in various cases.
There is also a fix for a mismatch between numpy int and C int.

@jwoillez
Copy link
Contributor Author

jwoillez commented Nov 5, 2014

I would appreciate if someone could have a critical look at the content of cython_npyiter.
Before any automation, I would want to check the proper handling of 2D arrays (e.g. double[3][3]).

@jwoillez
Copy link
Contributor Author

jwoillez commented Nov 5, 2014

In the commit above, to deal with double[3][3], I haven't found a better way than considering it a double[9].

@mhvk
Copy link

mhvk commented Nov 6, 2014

@eteq - just to clarify, the approach with the regular arrays here is now 30% faster than with the record arrays? Given the copies made for the latter, I guess that is not so surprising; the record array version probably could be sped up as well, but nice not to have a need for that!

This is based on the work  in cython_numpy_auto and the non-automated
tests in cython_npyiter
To make it match with the automated version and help me splot
differences in generated erfa.pyx
@jwoillez
Copy link
Contributor Author

jwoillez commented Nov 7, 2014

And the commits above add automation to the npyiter method!

@eteq
Copy link
Member

eteq commented Nov 7, 2014

@mhvk - yep, that's what I was seeing.

@jwoillez - thanks! I'll try to adjust astropy/astropy#2992 to use this auto-generating approach, then, as it's clearly an improvement over the other one. Hopefully it won't be too much work to get this to behave fine with the error-checking and array-wrapping additions I mad there. If we're really lucky, maybe it will even fix the astropy.time issues w/ numpy 1.9 (after all, this is the "recommended" approach to iterating now, it seems)...

Once that's in, we'll have to decide what to do with this repo. My instinct is to maybe just merge everything and then add a clear note in the README that this is for experimentation, and that the file in the astropy repo is where people should go for "production" code.

@jwoillez
Copy link
Contributor Author

@eteq Do you really want to spend any time cleaning this repository? If the official python wrapper is going to be moved to astropy, I would just get rid of this one. If improvements and/or experiments are needed at a later date, why not do that in the astropy side?

@eteq
Copy link
Member

eteq commented Nov 13, 2014

@jwoillez - I suppose that's true. I guess I'm just thinking of keeping this for "posterity" as it were, but it can just be left as-is right now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants