Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: Update performance comparison section of io docs #28890

Merged
merged 26 commits into from
Nov 9, 2019
Merged

DOC: Update performance comparison section of io docs #28890

merged 26 commits into from
Nov 9, 2019

Conversation

WuraolaOyewusi
Copy link
Contributor

@WuraolaOyewusi WuraolaOyewusi commented Oct 10, 2019

@WuraolaOyewusi
Copy link
Contributor Author

Hi Marc @datapythonista

@jreback jreback added the Docs label Oct 11, 2019
@jreback jreback added this to the 1.0 milestone Oct 11, 2019
@@ -5593,164 +5593,166 @@ Given the next test set:

.. code-block:: python

from numpy.random import randn
from numpy.random import randn
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you change this example to use our formatting, meaning don import like this, rather use
np.random.randn directory, and np.random.seed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@WuraolaOyewusi
Copy link
Contributor Author

WuraolaOyewusi commented Oct 20, 2019

Hi Marc @datapythonista .
Kindly check this PR?

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @WuraolaOyewusi

The diff is a bit difficult to check, but looks good. You just fixed the indentation, besides rerunning and addressing the review comments, right?

@jorisvandenbossche if you want to have a look.

@jorisvandenbossche jorisvandenbossche changed the title Update performance comparison section of io docs DOC: Update performance comparison section of io docs Oct 21, 2019
@jorisvandenbossche
Copy link
Member

I restored the indentation for a moment, to have an easier diff (but can undo that in the end, as a 4-space indent is easier to work with)

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general looks good, just noticed one problem with the hdf compression

24009288 Oct 10 06:43 test_fixed.hdf
24009288 Oct 10 06:43 test_fixed_compress.hdf
24458940 Oct 10 06:44 test_table.hdf
24458940 Oct 10 06:44 test_table_compress.hdf
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed this here: it seems something went wrong with the compression (as it is exactly the same size as the non-compressed one; and also the timing is not slower). Maybe it did fall back to non-compressed because you didn't have the compression lib installed?
(however, if it does that silently, that feels like a bug to me)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche I found out the original code had 3-space indentation, aligning my update to the previous code was the reason some checks failed. When I made the indentation 4-space. It passed.

Let me check the notebook again about the compression.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right about the compression.

IMG_20191021_103019

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche, @datapythonista

I just noticed this here: it seems something went wrong with the compression (as it is exactly the same size as the non-compressed one; and also the timing is not slower). Maybe it did fall back to non-compressed because you didn't have the compression lib installed?
(however, if it does that silently, that feels like a bug to me)

I ran the codes again, tried version '0.25.0' and it's still the same. It seems like a bug.
What can I do to fix it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you open a separate issue, referencing this issue, explaining about the lack of compression in hdf.

@jorisvandenbossche probably worth merging this as is, and fix that in a separate PR, since it's an unrelated change. And looks like a bug in the code, and I guess the fix won't be trivial.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok Marc

@jreback
Copy link
Contributor

jreback commented Oct 31, 2019

lgtm. @jorisvandenbossche

@WillAyd
Copy link
Member

WillAyd commented Nov 8, 2019

@WuraolaOyewusi can you fix up the merge conflict? I think we can merge this in once complete

@WuraolaOyewusi
Copy link
Contributor Author

@WillAyd
Done

@WillAyd WillAyd merged commit 6498bc1 into pandas-dev:master Nov 9, 2019
@WillAyd
Copy link
Member

WillAyd commented Nov 9, 2019

Great thanks @WuraolaOyewusi keep em coming

keechongtan added a commit to keechongtan/pandas that referenced this pull request Nov 11, 2019
…ndexing-1row-df

* upstream/master: (109 commits)
  stronger typing in libreduction (pandas-dev#29502)
  API: rename labels to codes (pandas-dev#29509)
  CLN: remove unnecessary type checks (pandas-dev#29517)
  implement _BaseGrouper (pandas-dev#29520)
  CLN: F-string formatting in pandas/_libs/*.pyx (pandas-dev#29527)
  Fixed more SS03 errors (pandas-dev#29540)
  consolidate dim checks (pandas-dev#29536)
  REF: separate out _get_cython_func_and_vals (pandas-dev#29537)
  remove unnecessary exception (pandas-dev#29538)
  TST:Add test to check single category col returns series with single row slice (pandas-dev#29521)
  Make color validation more forgiving (pandas-dev#29122)
  DOC: update bottleneck repo and documentation urls (pandas-dev#29516)
  TST: add test for df construction from dict with tuples (pandas-dev#29497)
  add test for pd.melt dtypes preservation (pandas-dev#29510)
  updated DataFrame.equals docstring (pandas-dev#29496)
  Resolved merge conflicts (pandas-dev#29506)
  DOC: Improved pandas/compact/__init__.py (pandas-dev#29507)
  DOC: Update performance comparison section of io docs (pandas-dev#28890)
  TST: add test for df.where() with category dtype (pandas-dev#29454)
  DOC: Fix docs on merging categoricals. (pandas-dev#28185)
  ...
Reksbril pushed a commit to Reksbril/pandas that referenced this pull request Nov 18, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants