
potential speed gains with 'f' order for BLAS #10

Open
iancharest opened this issue Jul 2, 2020 · 9 comments

Comments

@iancharest

I'm wondering whether there is a speedup to be had in Python around the @ operations.

X : ndarray, shape (n, p)
        Design matrix for regression, with n number of
        observations and p number of model parameters.
y : ndarray, shape (n, b)
        Data, with n number of observations and b number of targets.

In some cases we have more model parameters than observations (e.g. when using betas to predict some variables).

(This insight came from reading this post: https://www.benjaminjohnston.com.au/matmul)

In these instances, given that scipy.linalg.blas.sgemm is faster with 'f' (Fortran, column-major) order than with 'c' (C, row-major) order,
perhaps we would run much faster if the "large" dimension were the first one.
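A minimal sketch of the 'f' vs 'c' layout point, assuming scipy is available; the shapes are illustrative and not taken from fracridge itself:

```python
# Compare C- vs Fortran-ordered inputs to BLAS sgemm (illustrative shapes).
import numpy as np
from scipy.linalg.blas import sgemm

n, p, b = 40, 1000, 20                         # few observations, many parameters
X = np.random.rand(n, p).astype(np.float32)    # NumPy arrays are C-ordered by default
y = np.random.rand(n, b).astype(np.float32)

# Same product, two memory layouts; sgemm(alpha, a, b, trans_a=True)
# computes alpha * a.T @ b.
out_c = sgemm(1.0, X, y, trans_a=True)                    # C-order inputs
out_f = sgemm(1.0, np.asfortranarray(X),
              np.asfortranarray(y), trans_a=True)         # F-order inputs

# The results match; only the memory traversal (and hence speed) differs.
assert np.allclose(out_c, out_f, atol=1e-3)
```

Timing the two calls (e.g. with %timeit) on large matrices is what the linked post does to show the difference.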

@arokem
Member

arokem commented Jul 9, 2020

Thanks! I will look into it.

Another speedup we could get is by (optionally) using jax and running on a TPU/GPU.

@iancharest
Author

Kendrick and I were both (especially him) bedazzled by the speed at which jax performs SVD. We got dramatic speedups even using a standard CPU on my machine.
https://towardsdatascience.com/turbocharging-svd-with-jax-749ae12f93af
I think this may be worthwhile.
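A minimal sketch of the drop-in swap, assuming jax is installed (pip install jax); jax.numpy mirrors the numpy API, so only the import changes:

```python
# Compare numpy's LAPACK-backed SVD with jax's XLA-backed SVD.
import numpy as np
import jax.numpy as jnp

X = np.random.rand(300, 100)

# Reference: numpy singular values.
S_np = np.linalg.svd(X, compute_uv=False)

# jax singular values (computed in float32 by default, so compare loosely).
S_jax = jnp.linalg.svd(X, compute_uv=False)
```

The speedups in the linked article come from jit-compiling the surrounding computation with jax.jit and from GPU/TPU execution, not from the single call above.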

@iancharest
Author

Do you want me to look into it? I could open a PR and a jax branch.

@arokem
Member

arokem commented Feb 4, 2021 via email

@iancharest
Author

Hey, just some further thoughts on this. I haven't really gone down the jax route, probably facing the same issues as you. But more recently I realised that an important speedup comes with an intel-numpy install. Basically, the bottleneck in fracridge is numpy's SVD, which is much slower when computed with the default OpenBLAS than with Intel MKL.

So perhaps this is of interest to some users:

virtualenv --python /usr/bin/python3.8 ~/Environments/intel-np
source ~/Environments/intel-np/bin/activate

pip install pip -U
pip install intel-numpy
pip install ipython
pip install fracridge

I compared this fresh Intel MKL-backed install with standard numpy using timeit, like so:

import numpy as np
from fracridge import fracridge

# n = 40 observations, p = 1000 parameters, b = 20 targets
X = np.random.rand(1000, 40).T
y = np.random.rand(40, 20)

n_alphas = 20
fracs = np.linspace(1/n_alphas, 1, n_alphas)  # fracs must lie in (0, 1]

# In IPython:
%timeit fracridge(X, y, fracs)

The results were striking, with:

1.24 ms ± 16.1 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
vs.
7.66 ms ± 2.95 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

respectively, for intel-numpy vs. bog-standard numpy.
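To verify which backend a given install actually links against before benchmarking (a quick sanity check, not part of fracridge), numpy can print its own build configuration:

```python
# Print the BLAS/LAPACK libraries this NumPy build was linked against;
# look for "mkl" vs "openblas" in the output.
import numpy as np

np.show_config()
```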

@arokem
Member

arokem commented Aug 16, 2022

This is great! We should add that to the docs.

@arokem
Member

arokem commented Aug 16, 2022

Do I understand correctly that the crucial bit is pip install intel-numpy?

@iancharest
Author

Yes, although I had to make a Python 3.8 environment; with 3.9 or 3.10 it struggled to install.

@arokem
Member

arokem commented Aug 18, 2022

Yeah. I can't seem to install this on any version of Python (conda). Possibly because I am trying to do this on a Mac?
