Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pydata/sparse wrappers for levels, COO, and CSC/CSR formats #1

Merged
merged 2 commits into from
Jan 25, 2024

Conversation

mtsokol
Copy link
Member

@mtsokol mtsokol commented Jan 22, 2024

Hi @willow-ahrens @hameerabbasi,

Here's a PR with a draft implementation of Python wrappers for Tensor constructors, levels, COO, and CSC formats. Some comments:

  • test_coo and test_csc show the construction of fibers using the same code as Hameer's notebook (which adheres to pydata/sparse). Tomorrow I will add CSR and DOK/SparseHash so the whole notebook can be executed (except for GCXS).
  • Each wrapper has a _obj attribute which is the underlying Julia object.
  • Tensor can be constructed by providing levels description (wrapper classes!) and NumPy/Finch array, or just Finch object directly (jl_data).
  • Only simple element-wise operations are executed here.
  • There's also an implementation of the todense function that is heavily used in pydata/sparse. Here it's done by constructing a new lvl description with Dense levels only and creating a new Tensor.
  • In a separate PR (I think?) I will add CI stages for running those tests.

Questions:

  1. Is it possible to access fill_value in Element? I mean the first type parameter in Element{0.0, Float64, Int64}(...)
  2. How can I get the shape tuple of Tensor? Right now I obtain shape by iterating through levels and aggregating shape into one tuple.

@willow-ahrens
Copy link
Collaborator

This looks awesome! I won't have time for a detailed review until tomorrow, but on first glance this looks great!

@willow-ahrens
Copy link
Collaborator

To answer your questions:

  1. You can access the fill value by calling default(fbr).
  2. You can get the shape tuple by calling size(fbr).

Copy link
Collaborator

@hameerabbasi hameerabbasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall except the dependency on sparse. I'd invert the depencency, and move those bits to sparse instead (otherwise we have a circular depencency, and that isn't nice).

@mtsokol mtsokol force-pushed the pydata-sparse-constructors branch 3 times, most recently from 2ae072d to ca887be Compare January 23, 2024 14:05
@mtsokol
Copy link
Member Author

mtsokol commented Jan 23, 2024

LGTM overall except the dependency on sparse. I'd invert the dependency, and move those bits to sparse instead (otherwise we have a circular dependency, and that isn't nice).

@hameerabbasi sparse is declared only as a tests dependency, as in test suites I want to compare it with sparse, is it Ok?

FYI, I added an initial setup for CI, but it didn't run, I guess GitHub workflows need to be enabled in repo's settings?
Also, I moved pyproject.toml config from poetry to the same as pydata/sparse (I'm not familiar with poetry here). Could we follow pydata/sparse setup for CI and build process?

@hameerabbasi
Copy link
Collaborator

hameerabbasi commented Jan 23, 2024

sparse is declared only as a tests dependency, as in test suites I want to compare it with sparse, is it Ok?

TBH, I'm not entirely sure how pip will react here. But I guess we can put that in sparse too, if it causes issues, or test against numpy.array_api.

FYI, I added an initial setup for CI, but it didn't run, I guess GitHub workflows need to be enabled in repo's settings?
Also, I moved pyproject.toml config from poetry to the same as pydata/sparse (I'm not familiar with poetry here). Could we follow pydata/sparse setup for CI and build process?

That SGTM, but I guess we should ask @willow-ahrens

@willow-ahrens For context, we follow pre-commit rules for linting and formatting. It's quite nice with auto fixes, even in PRs.

pyproject.toml Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
@willow-ahrens
Copy link
Collaborator

willow-ahrens commented Jan 23, 2024

@mtsokol

sparse is declared only as a tests dependency, as in test suites I want to compare it with sparse, is it Ok?

I think that's fine for now! Eventually we may want to use an independent source of truth, but this is fine for now. I don't think circular dependencies will be an issue because it's not finch that depends on sparse, the test environment depends on sparse. The only issue is that if sparse ever begins to use Finch, then testing finch against sparse is just testing finch against itself.

FYI, I added an initial setup for CI, but it didn't run, I guess GitHub workflows need to be enabled in repo's settings?

Sometimes CI needs to be committed to main before it runs. Maybe make a separate PR for CI and we can merge it, then use this branch to test it?

Also, I moved pyproject.toml config from poetry to the same as pydata/sparse (I'm not familiar with poetry here). Could we follow pydata/sparse setup for CI and build process?

For CI, absolutely let's get that going. For build process, I'm open to this but I'd like to discuss in a meeting. Poetry is a popular option and it's really quite nice. I encourage you to give it a try! What in particular do you want to fix about the build process? What led to pydata/sparse using setuptools-scm? If we do find it necessary, could we change the build system in a separate PR?

@hameerabbasi

For context, we follow pre-commit rules for linting and formatting. It's quite nice with auto fixes, even in PRs.

Yes absolutely, let's get that going in a separate PR and merge it so that we can test it out on this one!

@willow-ahrens
Copy link
Collaborator

Here's the guide I was following to use Poetry: https://www.freecodecamp.org/news/how-to-build-and-publish-python-packages-with-poetry/

@mtsokol mtsokol changed the title pydata/sparse wrappers for levels, COO, and CSC formats pydata/sparse wrappers for levels, COO, and CSC/CSR formats Jan 23, 2024
Copy link
Collaborator

@willow-ahrens willow-ahrens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modulo my suggested change to printing, I approve of the new functionality. Please make changes to CI as a separate PR. Please also make changes to the build system as another separate PR. Thanks for helping out with this!

class _Display:
def __repr__(self):
return self._obj.__repr__()

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be:

class _Display:
    def __repr__(self):
        return jl.sprint(jl.show, self._obj)
    def __str__(self):
        return jl.sprint(jl.show, jl.MIME("text/plain"), self._obj)

@willow-ahrens
Copy link
Collaborator

Once you add CI and pre-commit hooks in a separate PR, I'll make sure they run and that CI must pass before merging.

Copy link
Collaborator

@willow-ahrens willow-ahrens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't catch this the first time around, but CSR is not Sparse(Dense(...)). Rather, it is the transposition of CSC. Sparse(Dense(...)) is a weird format that looks like

[ 0 0 0 0;
  1 2 3 4;
  0 0 0 0;
  0 0 0 0;
  6 7 8 9]

src/finch/tensor.py Outdated Show resolved Hide resolved
@mtsokol mtsokol force-pushed the pydata-sparse-constructors branch 2 times, most recently from 107e3bb to bde02fe Compare January 25, 2024 09:57
@mtsokol mtsokol force-pushed the pydata-sparse-constructors branch from bde02fe to 5a53b62 Compare January 25, 2024 10:26
@hameerabbasi
Copy link
Collaborator

From the CI:

--cov-report --cov-report html --cov-report=xml --cov-report=term --cov sparse --cov-config .coveragerc tests

Looks like we need pytest-cov in the test deps.

@mtsokol mtsokol force-pushed the pydata-sparse-constructors branch 2 times, most recently from dcc9910 to 89accb7 Compare January 25, 2024 11:24
@mtsokol
Copy link
Member Author

mtsokol commented Jan 25, 2024

Looks like we need pytest-cov in the test deps.

@hameerabbasi Now tests are running in the CI. One thing - I changed Miniforge variant to the previous Mambaforge as the former one caused 404 error image not found or something similar.

@@ -35,15 +35,15 @@ jobs:
allow-softlinks: true
environment-file: ci/environment.yml
python-version: ${{ matrix.python }}
miniforge-variant: Miniforge
miniforge-variant: Mambaforge
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
miniforge-variant: Mambaforge
miniforge-version: latest

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also we should move to setup-miniconda@v3

@mtsokol mtsokol force-pushed the pydata-sparse-constructors branch from 89accb7 to fa4eef1 Compare January 25, 2024 18:40
@mtsokol mtsokol force-pushed the pydata-sparse-constructors branch from fa4eef1 to 8ac85c9 Compare January 25, 2024 18:53
@mtsokol
Copy link
Member Author

mtsokol commented Jan 25, 2024

Hi @willow-ahrens @hameerabbasi,

I pushed the latest changes: There are simple COO, CSF, CSR, and CSC wrappers together with Finch Tensor/Levels wrappers.

In terms of row/column major manipulation, instead of reversing dims I did it by reinterpreting the input as column major on entry np.array(input, order="F") and back to row major before returning np.array(result, order="C"). I added an order flag check to test to verify that todense() returns "C"-order numpy array.

@willow-ahrens
Copy link
Collaborator

Ah, I see. Let's merge it and make a separate PR to support swizzles.

@willow-ahrens willow-ahrens merged commit c6098f0 into finch-tensor:main Jan 25, 2024
5 checks passed
@mtsokol mtsokol deleted the pydata-sparse-constructors branch January 25, 2024 19:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants