
potential speed gains with 'f' order for BLAS #10

Open
iancharest opened this issue Jul 2, 2020 · 9 comments

Comments

@iancharest

I'm wondering whether there is a speedup to be had in Python around the @ operations.

X : ndarray, shape (n, p)
        Design matrix for regression, with n number of
        observations and p number of model parameters.
y : ndarray, shape (n, b)
        Data, with n number of observations and b number of targets.

In some cases we have more model parameters than observations (e.g. when using betas to predict some variables).

(This insight came from reading this post: https://www.benjaminjohnston.com.au/matmul)

In these instances, given that scipy.linalg.blas.sgemm is faster with 'f' (Fortran, column-major) order than with 'c' (C, row-major) order,
perhaps we would run much faster if the "large" dimension were the first one.
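A minimal sketch of the 'f' vs 'c' layout point, assuming scipy is available; the shapes are illustrative and not taken from fracridge itself:

```python
# Compare C- vs Fortran-ordered inputs to BLAS sgemm (illustrative shapes).
import numpy as np
from scipy.linalg.blas import sgemm

n, p, b = 40, 1000, 20                         # few observations, many parameters
X = np.random.rand(n, p).astype(np.float32)    # NumPy arrays are C-ordered by default
y = np.random.rand(n, b).astype(np.float32)

# Same product, two memory layouts; sgemm(alpha, a, b, trans_a=True)
# computes alpha * a.T @ b.
out_c = sgemm(1.0, X, y, trans_a=True)                    # C-order inputs
out_f = sgemm(1.0, np.asfortranarray(X),
              np.asfortranarray(y), trans_a=True)         # F-order inputs

# The results match; only the memory traversal (and hence speed) differs.
assert np.allclose(out_c, out_f, atol=1e-3)
```

Timing the two calls (e.g. with %timeit) on large matrices is what the linked post does to show the difference.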

@arokem
Member

arokem commented Jul 9, 2020

Thanks! I will look into it.

Another speedup we could get is by (optionally) using jax and running on a TPU/GPU.

@iancharest
Author

Kendrick and I were both (especially him) bedazzled by the speed at which jax performs SVD. We got dramatic speedups even using a standard CPU on my machine.
https://towardsdatascience.com/turbocharging-svd-with-jax-749ae12f93af
I think this may be worthwhile.
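A minimal sketch of the drop-in swap, assuming jax is installed (pip install jax); jax.numpy mirrors the numpy API, so only the import changes:

```python
# Compare numpy's LAPACK-backed SVD with jax's XLA-backed SVD.
import numpy as np
import jax.numpy as jnp

X = np.random.rand(300, 100)

# Reference: numpy singular values.
S_np = np.linalg.svd(X, compute_uv=False)

# jax singular values (computed in float32 by default, so compare loosely).
S_jax = jnp.linalg.svd(X, compute_uv=False)
```

The speedups in the linked article come from jit-compiling the surrounding computation with jax.jit and from GPU/TPU execution, not from the single call above.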

@iancharest
Author

Do you want me to look into it? I could open a PR and a jax branch.

@arokem
Member

arokem commented Feb 4, 2021 via email

@iancharest
Author

Hey, just some further thoughts on this. I haven't really gone down the jax route, probably facing the same issues as you. But more recently I realised that an important speedup comes with an intel-numpy install. Basically, the bottleneck in fracridge is numpy's SVD, which is much slower when computed with the default OpenBLAS than with Intel MKL.

So perhaps this is of interest to some users:

virtualenv --python /usr/bin/python3.8 ~/Environments/intel-np
source ~/Environments/intel-np/bin/activate

pip install pip -U
pip install intel-numpy
pip install ipython
pip install fracridge

I compared this fresh Intel MKL-backed install with standard numpy using timeit, like so:

import numpy as np
from fracridge import fracridge

# n = 40 observations, p = 1000 parameters, b = 20 targets
X = np.random.rand(1000, 40).T
y = np.random.rand(40, 20)

n_alphas = 20
fracs = np.linspace(1/n_alphas, 1, n_alphas)  # fracs must lie in (0, 1]

# In IPython:
%timeit fracridge(X, y, fracs)

The results were striking, with:

1.24 ms ± 16.1 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
vs.
7.66 ms ± 2.95 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

respectively, for intel-numpy vs. bog-standard numpy.
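To verify which backend a given install actually links against before benchmarking (a quick sanity check, not part of fracridge), numpy can print its own build configuration:

```python
# Print the BLAS/LAPACK libraries this NumPy build was linked against;
# look for "mkl" vs "openblas" in the output.
import numpy as np

np.show_config()
```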

@arokem
Member

arokem commented Aug 16, 2022

This is great! We should add that to the docs.

@arokem
Member

arokem commented Aug 16, 2022

Do I understand correctly that the crucial bit is pip install intel-numpy?

@iancharest
Author

Yes, although I had to make a Python 3.8 environment; with 3.9 or 3.10 it struggled to install.

@arokem
Member

arokem commented Aug 18, 2022

Yeah. I can't seem to install this on any version of Python (conda). Possibly because I am trying to do this on a Mac?
