potential speed gains with 'f' order for BLAS #10
Thanks! I will look into it. Another speedup we could get is by (optionally) using jax and running on a TPU/GPU.
Kendrick and I were both (especially him) bedazzled by the speed at which jax accomplished SVD. We got crazy speedups even using a standard CPU on my machine.
Do you want me to look into it? I could open a PR and a jax branch.
Yeah. It looks really impressive. I did give it a try at some point, but couldn't quite get the speedups I was expecting, so I gave up on it. Admittedly, I probably didn't do enough to profile it with data of a variety of sizes.
I would be really happy for a PR that incorporates it. The tests do some benchmarking, so we'll get some (again, probably limited) performance metrics from the CI.
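For anyone who wants to give it a spin, here is a minimal sketch of the kind of SVD comparison we're talking about (the matrix size is made up for illustration, and it assumes jax is installed; `jax.numpy.linalg.svd` is jax's drop-in SVD):

```python
import timeit

import numpy as np
import jax.numpy as jnp

# Made-up problem size, just for illustration.
X = np.random.rand(1000, 40)
Xj = jnp.asarray(X)

# Warm-up call so jax's one-time compilation/dispatch cost
# is not counted in the timing.
jnp.linalg.svd(Xj, full_matrices=False)[0].block_until_ready()

t_np = timeit.timeit(lambda: np.linalg.svd(X, full_matrices=False), number=100)
t_jax = timeit.timeit(
    lambda: jnp.linalg.svd(Xj, full_matrices=False)[0].block_until_ready(),
    number=100,
)
print(f"numpy: {t_np:.3f} s, jax: {t_jax:.3f} s over 100 runs")
```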
Hey, just some further thoughts on this. I haven't really gone down the jax line, probably facing the same issues as you. But more recently I realised that an important speedup comes with an intel-numpy install. Basically, the bottleneck in fracridge is numpy's SVD, which is much slower when computed with the default OpenBLAS than with Intel MKL. So perhaps this is of interest to some users:

```
virtualenv --python /usr/bin/python3.8 ~/Environments/intel-np
source ~/Environments/intel-np/bin/activate
pip install pip -U
pip install intel-numpy
pip install ipython
pip install fracridge
```

I compared this fresh Intel-MKL-backed version against standard numpy using `timeit`, like so:

```python
import numpy as np
from fracridge import fracridge

X = np.random.rand(1000, 40).T
y = np.random.rand(40, 20)
n_alphas = 20
fracs = np.linspace(1 / n_alphas, 1 + 1 / n_alphas, n_alphas)

%timeit fracridge(X, y, fracs)
```

The results were striking: 1.24 ms ± 16.1 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each) for intel-numpy vs. bog-standard numpy.
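If it helps, you can confirm which BLAS backend a given numpy build is actually linked against with `np.show_config()`:

```python
import numpy as np

# Prints the BLAS/LAPACK build configuration, so MKL and OpenBLAS
# installs can be told apart before benchmarking.
np.show_config()
```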
This is great! We should add that to the docs.
Do I understand correctly that the crucial bit is `pip install intel-numpy`?
Yes, although I had to make a Python 3.8 environment; with 3.9 or 3.10 it was struggling to install.
Yeah. I can't seem to install this on any version of Python (conda). Possibly because I am trying to do this on a Mac?
I'm wondering whether there is a speedup in Python that could be had with the `@` operations. In some cases we have more model parameters than observations (e.g. when using betas to predict some variables). (This insight came from reading this: https://www.benjaminjohnston.com.au/matmul.) In these instances, given that `scipy.linalg.blas.sgemm` is faster with 'f'-ordered arrays than 'c'-ordered ones, perhaps we would perform much faster if the "large" dimension was the first one; see the sketch below.
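Here is a minimal sketch of that comparison (the shapes are invented for illustration), calling `scipy.linalg.blas.sgemm` directly so the memory layout of the inputs is under our control:

```python
import timeit

import numpy as np
from scipy.linalg.blas import sgemm

# Invented shapes with one "large" leading dimension, loosely mimicking
# the more-parameters-than-observations case. sgemm is single precision.
a = np.random.rand(10000, 40).astype(np.float32)
b = np.random.rand(40, 20).astype(np.float32)

# C-ordered inputs: the BLAS wrapper must copy them into Fortran order.
a_c = np.ascontiguousarray(a)
b_c = np.ascontiguousarray(b)

# F-ordered inputs: sgemm can consume these directly, with no copy.
a_f = np.asfortranarray(a)
b_f = np.asfortranarray(b)

print("C order:", timeit.timeit(lambda: sgemm(1.0, a_c, b_c), number=1000))
print("F order:", timeit.timeit(lambda: sgemm(1.0, a_f, b_f), number=1000))
```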