-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: Update performance comparison section of io docs #28890
DOC: Update performance comparison section of io docs #28890
Conversation
Hi Marc @datapythonista |
doc/source/user_guide/io.rst
Outdated
@@ -5593,164 +5593,166 @@ Given the next test set: | |||
|
|||
.. code-block:: python | |||
|
|||
from numpy.random import randn | |||
from numpy.random import randn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you change this example to use our formatting, meaning don import like this, rather use
np.random.randn
directory, and np.random.seed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Hi Marc @datapythonista . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @WuraolaOyewusi
The diff is a bit difficult to check, but looks good. You just fixed the indentation, besides rerunning and addressing the review comments, right?
@jorisvandenbossche if you want to have a look.
I restored the indentation for a moment, to have an easier diff (but can undo that in the end, as a 4-space indent is easier to work with) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general looks good, just noticed one problem with the hdf compression
24009288 Oct 10 06:43 test_fixed.hdf | ||
24009288 Oct 10 06:43 test_fixed_compress.hdf | ||
24458940 Oct 10 06:44 test_table.hdf | ||
24458940 Oct 10 06:44 test_table_compress.hdf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just noticed this here: it seems something went wrong with the compression (as it is exactly the same size as the non-compressed one; and also the timing is not slower). Maybe it did fall back to non-compressed because you didn't have the compression lib installed?
(however, if it does that silently, that feels like a bug to me)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche I found out the original code had 3-space indentation, aligning my update to the previous code was the reason some checks failed. When I made the indentation 4-space. It passed.
Let me check the notebook again about the compression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche, @datapythonista
I just noticed this here: it seems something went wrong with the compression (as it is exactly the same size as the non-compressed one; and also the timing is not slower). Maybe it did fall back to non-compressed because you didn't have the compression lib installed?
(however, if it does that silently, that feels like a bug to me)
I ran the codes again, tried version '0.25.0' and it's still the same. It seems like a bug.
What can I do to fix it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you open a separate issue, referencing this issue, explaining about the lack of compression in hdf.
@jorisvandenbossche probably worth merging this as is, and fix that in a separate PR, since it's an unrelated change. And looks like a bug in the code, and I guess the fix won't be trivial.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok Marc
lgtm. @jorisvandenbossche |
@WuraolaOyewusi can you fix up the merge conflict? I think we can merge this in once complete |
@WillAyd |
Great thanks @WuraolaOyewusi keep em coming |
…ndexing-1row-df * upstream/master: (109 commits) stronger typing in libreduction (pandas-dev#29502) API: rename labels to codes (pandas-dev#29509) CLN: remove unnecessary type checks (pandas-dev#29517) implement _BaseGrouper (pandas-dev#29520) CLN: F-string formatting in pandas/_libs/*.pyx (pandas-dev#29527) Fixed more SS03 errors (pandas-dev#29540) consolidate dim checks (pandas-dev#29536) REF: separate out _get_cython_func_and_vals (pandas-dev#29537) remove unnecessary exception (pandas-dev#29538) TST:Add test to check single category col returns series with single row slice (pandas-dev#29521) Make color validation more forgiving (pandas-dev#29122) DOC: update bottleneck repo and documentation urls (pandas-dev#29516) TST: add test for df construction from dict with tuples (pandas-dev#29497) add test for pd.melt dtypes preservation (pandas-dev#29510) updated DataFrame.equals docstring (pandas-dev#29496) Resolved merge conflicts (pandas-dev#29506) DOC: Improved pandas/compact/__init__.py (pandas-dev#29507) DOC: Update performance comparison section of io docs (pandas-dev#28890) TST: add test for df.where() with category dtype (pandas-dev#29454) DOC: Fix docs on merging categoricals. (pandas-dev#28185) ...
xref python-sprints/pandas-mentoring#163