From e53bce0596a3c089096da3d6caec71f5dafdbf03 Mon Sep 17 00:00:00 2001 From: Wuraola Oyewusi Date: Thu, 10 Oct 2019 08:00:18 +0100 Subject: [PATCH 01/19] Update io.rst --- doc/source/user_guide/io.rst | 121 ++++++++++++++++++++--------------- 1 file changed, 71 insertions(+), 50 deletions(-) diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst index 2c8f66dd99e72..b75c68440016d 100644 --- a/doc/source/user_guide/io.rst +++ b/doc/source/user_guide/io.rst @@ -5572,7 +5572,7 @@ Performance considerations -------------------------- This is an informal comparison of various IO methods, using pandas -0.20.3. Timings are machine dependent and small differences should be +0.24.2. Timings are machine dependent and small differences should be ignored. .. code-block:: ipython @@ -5676,38 +5676,49 @@ Given the next test set: def test_pickle_read_compress(): pd.read_pickle('test.pkl.compress', compression='xz') + + def test_parquet_write(df): + df.to_parquet('test.parquet') + + def test_parquet_read(): + pd.read_parquet('test.parquet') + -When writing, the top-three functions in terms of speed are are -``test_pickle_write``, ``test_feather_write`` and ``test_hdf_fixed_write_compress``. +When writing, the top-three functions in terms of speed are +``test_feather_write``, ``test_hdf_fixed_write`` and ``test_hdf_fixed_write_compress``. .. code-block:: ipython - In [14]: %timeit test_sql_write(df) - 2.37 s ± 36.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) - - In [15]: %timeit test_hdf_fixed_write(df) - 194 ms ± 65.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) - - In [26]: %timeit test_hdf_fixed_write_compress(df) - 119 ms ± 2.15 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) - - In [16]: %timeit test_hdf_table_write(df) - 623 ms ± 125 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) - - In [27]: %timeit test_hdf_table_write_compress(df) - 563 ms ± 23.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) - - In [17]: %timeit test_csv_write(df) - 3.13 s ± 49.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) - - In [30]: %timeit test_feather_write(df) - 103 ms ± 5.88 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) - - In [31]: %timeit test_pickle_write(df) - 109 ms ± 3.72 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) - - In [32]: %timeit test_pickle_write_compress(df) - 3.33 s ± 55.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) + In [4]: %timeit test_sql_write(df) + 3.29 s ± 43.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) + + In [5]: %timeit test_hdf_fixed_write(df) + 19.4 ms ± 560 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) + + In [6]: %timeit test_hdf_fixed_write_compress(df) + 19.6 ms ± 308 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) + + In [7]: %timeit test_hdf_table_write(df) + 449 ms ± 5.61 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) + + In [8]: %timeit test_hdf_table_write_compress(df) + 448 ms ± 11.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) + + In [9]: %timeit test_csv_write(df) + 3.66 s ± 26.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) + + In [10]: %timeit test_feather_write(df) + 9.75 ms ± 117 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) + + In [11]: %timeit test_pickle_write(df) + 30.1 ms ± 229 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) + + In [12]: %timeit test_pickle_write_compress(df) + 4.29 s ± 15.9 ms per loop (mean ± std. dev. 
of 7 runs, 1 loop each) + + In [13]: %timeit test_parquet_write(df) + 67.6 ms ± 706 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) + When reading, the top three are ``test_feather_read``, ``test_pickle_read`` and ``test_hdf_fixed_read``. @@ -5715,42 +5726,52 @@ When reading, the top three are ``test_feather_read``, ``test_pickle_read`` and .. code-block:: ipython In [18]: %timeit test_sql_read() - 1.35 s ± 14.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) + 1.77 s ± 17.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) In [19]: %timeit test_hdf_fixed_read() - 14.3 ms ± 438 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) + 19.4 ms ± 436 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) In [28]: %timeit test_hdf_fixed_read_compress() - 23.5 ms ± 672 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) - + 19.5 ms ± 222 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) + In [20]: %timeit test_hdf_table_read() - 35.4 ms ± 314 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) - + 38.6 ms ± 857 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) + In [29]: %timeit test_hdf_table_read_compress() - 42.6 ms ± 2.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) + 38.8 ms ± 1.49 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) In [22]: %timeit test_csv_read() - 516 ms ± 27.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) + 452 ms ± 9.04 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) In [33]: %timeit test_feather_read() - 4.06 ms ± 115 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) - + 12.4 ms ± 99.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) + In [34]: %timeit test_pickle_read() - 6.5 ms ± 172 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) + 18.4 ms ± 191 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) In [35]: %timeit test_pickle_read_compress() - 588 ms ± 3.57 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) + 915 ms ± 7.48 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) + + In [35]: %timeit test_parquet_read() + 24.4 ms ± 146 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) + + +For this test case ``test.pkl.compress``, ``test.parquet`` and ``test.feather`` took the least space on disk. Space on disk (in bytes) .. 
code-block:: none - 34816000 Aug 21 18:00 test.sql - 24009240 Aug 21 18:00 test_fixed.hdf - 7919610 Aug 21 18:00 test_fixed_compress.hdf - 24458892 Aug 21 18:00 test_table.hdf - 8657116 Aug 21 18:00 test_table_compress.hdf - 28520770 Aug 21 18:00 test.csv - 16000248 Aug 21 18:00 test.feather - 16000848 Aug 21 18:00 test.pkl - 7554108 Aug 21 18:00 test.pkl.compress + 29519500 Oct 10 06:45 test.csv + 16000248 Oct 10 06:45 test.feather + 8281983 Oct 10 06:49 test.parquet + 16000857 Oct 10 06:47 test.pkl + 7552144 Oct 10 06:48 test.pkl.compress + 34816000 Oct 10 06:42 test.sql + 24009288 Oct 10 06:43 test_fixed.hdf + 24009288 Oct 10 06:43 test_fixed_compress.hdf + 24458940 Oct 10 06:44 test_table.hdf + 24458940 Oct 10 06:44 test_table_compress.hdf + + + From 76ccef35a7d0d5115550760f54b7b86888126773 Mon Sep 17 00:00:00 2001 From: Wuraola Oyewusi Date: Thu, 10 Oct 2019 08:10:49 +0100 Subject: [PATCH 02/19] Update io.rst --- doc/source/user_guide/io.rst | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst index b75c68440016d..8ee66c21275af 100644 --- a/doc/source/user_guide/io.rst +++ b/doc/source/user_guide/io.rst @@ -5725,34 +5725,34 @@ When reading, the top three are ``test_feather_read``, ``test_pickle_read`` and .. code-block:: ipython - In [18]: %timeit test_sql_read() + In [14]: %timeit test_sql_read() 1.77 s ± 17.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) - In [19]: %timeit test_hdf_fixed_read() + In [15]: %timeit test_hdf_fixed_read() 19.4 ms ± 436 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) - In [28]: %timeit test_hdf_fixed_read_compress() + In [16]: %timeit test_hdf_fixed_read_compress() 19.5 ms ± 222 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) - In [20]: %timeit test_hdf_table_read() + In [17]: %timeit test_hdf_table_read() 38.6 ms ± 857 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) - In [29]: %timeit test_hdf_table_read_compress() + In [18]: %timeit test_hdf_table_read_compress() 38.8 ms ± 1.49 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) - In [22]: %timeit test_csv_read() + In [19]: %timeit test_csv_read() 452 ms ± 9.04 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) - In [33]: %timeit test_feather_read() + In [20]: %timeit test_feather_read() 12.4 ms ± 99.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) - In [34]: %timeit test_pickle_read() + In [21]: %timeit test_pickle_read() 18.4 ms ± 191 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) - In [35]: %timeit test_pickle_read_compress() + In [22]: %timeit test_pickle_read_compress() 915 ms ± 7.48 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) - In [35]: %timeit test_parquet_read() + In [23]: %timeit test_parquet_read() 24.4 ms ± 146 µs per loop (mean ± std. dev. 
of 7 runs, 10 loops each) From 9672526a252b267d0d40a9939d20ef9b5f8a4c8a Mon Sep 17 00:00:00 2001 From: Wuraola Oyewusi Date: Thu, 10 Oct 2019 08:53:07 +0100 Subject: [PATCH 03/19] Update io.rst --- doc/source/user_guide/io.rst | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst index 8ee66c21275af..36ac07c110dbd 100644 --- a/doc/source/user_guide/io.rst +++ b/doc/source/user_guide/io.rst @@ -5676,16 +5676,17 @@ Given the next test set: def test_pickle_read_compress(): pd.read_pickle('test.pkl.compress', compression='xz') - + + def test_parquet_write(df): df.to_parquet('test.parquet') - + + def test_parquet_read(): pd.read_parquet('test.parquet') -When writing, the top-three functions in terms of speed are -``test_feather_write``, ``test_hdf_fixed_write`` and ``test_hdf_fixed_write_compress``. +When writing, the top-three functions in terms of speed are ``test_feather_write``, ``test_hdf_fixed_write`` and ``test_hdf_fixed_write_compress``. .. code-block:: ipython @@ -5703,13 +5704,13 @@ When writing, the top-three functions in terms of speed are In [8]: %timeit test_hdf_table_write_compress(df) 448 ms ± 11.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) - + In [9]: %timeit test_csv_write(df) 3.66 s ± 26.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) In [10]: %timeit test_feather_write(df) 9.75 ms ± 117 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) - + In [11]: %timeit test_pickle_write(df) 30.1 ms ± 229 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) From d2c1e202a900f20d973592057cfba1415e82fade Mon Sep 17 00:00:00 2001 From: Wuraola Oyewusi Date: Thu, 10 Oct 2019 09:28:08 +0100 Subject: [PATCH 04/19] Update io.rst --- doc/source/user_guide/io.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst index 36ac07c110dbd..c511119e8203b 100644 --- a/doc/source/user_guide/io.rst +++ b/doc/source/user_guide/io.rst @@ -5680,11 +5680,11 @@ Given the next test set: def test_parquet_write(df): df.to_parquet('test.parquet') - + def test_parquet_read(): pd.read_parquet('test.parquet') - + When writing, the top-three functions in terms of speed are ``test_feather_write``, ``test_hdf_fixed_write`` and ``test_hdf_fixed_write_compress``. From 709d5716b016d6d24ecdb5f3bb5fe4d2b97f86ea Mon Sep 17 00:00:00 2001 From: Wuraola Oyewusi Date: Thu, 10 Oct 2019 10:24:04 +0100 Subject: [PATCH 05/19] Update io.rst --- doc/source/user_guide/io.rst | 112 +++++++++++++++++------------------ 1 file changed, 55 insertions(+), 57 deletions(-) diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst index c511119e8203b..2fe7f870edcd2 100644 --- a/doc/source/user_guide/io.rst +++ b/doc/source/user_guide/io.rst @@ -5593,97 +5593,95 @@ Given the next test set: .. 
code-block:: python - from numpy.random import randn + from numpy.random import randn + sz = 1000000 + df = pd.DataFrame({'A': randn(sz), 'B': [1] * sz}) - sz = 1000000 - df = pd.DataFrame({'A': randn(sz), 'B': [1] * sz}) + def test_sql_write(df): + if os.path.exists('test.sql'): + os.remove('test.sql') + sql_db = sqlite3.connect('test.sql') + df.to_sql(name='test_table', con=sql_db) + sql_db.close() - def test_sql_write(df): - if os.path.exists('test.sql'): - os.remove('test.sql') - sql_db = sqlite3.connect('test.sql') - df.to_sql(name='test_table', con=sql_db) - sql_db.close() + def test_sql_read(): + sql_db = sqlite3.connect('test.sql') + pd.read_sql_query("select * from test_table", sql_db) + sql_db.close() - def test_sql_read(): - sql_db = sqlite3.connect('test.sql') - pd.read_sql_query("select * from test_table", sql_db) - sql_db.close() + def test_hdf_fixed_write(df): + df.to_hdf('test_fixed.hdf', 'test', mode='w') - def test_hdf_fixed_write(df): - df.to_hdf('test_fixed.hdf', 'test', mode='w') + def test_hdf_fixed_read(): + pd.read_hdf('test_fixed.hdf', 'test') - def test_hdf_fixed_read(): - pd.read_hdf('test_fixed.hdf', 'test') + def test_hdf_fixed_write_compress(df): + df.to_hdf('test_fixed_compress.hdf', 'test', mode='w', complib='blosc') - def test_hdf_fixed_write_compress(df): - df.to_hdf('test_fixed_compress.hdf', 'test', mode='w', complib='blosc') + def test_hdf_fixed_read_compress(): + pd.read_hdf('test_fixed_compress.hdf', 'test') - def test_hdf_fixed_read_compress(): - pd.read_hdf('test_fixed_compress.hdf', 'test') + def test_hdf_table_write(df): + df.to_hdf('test_table.hdf', 'test', mode='w', format='table') - def test_hdf_table_write(df): - df.to_hdf('test_table.hdf', 'test', mode='w', format='table') + def test_hdf_table_read(): + pd.read_hdf('test_table.hdf', 'test') - def test_hdf_table_read(): - pd.read_hdf('test_table.hdf', 'test') + def test_hdf_table_write_compress(df): + df.to_hdf('test_table_compress.hdf', 'test', mode='w', + complib='blosc', format='table') - def test_hdf_table_write_compress(df): - df.to_hdf('test_table_compress.hdf', 'test', mode='w', - complib='blosc', format='table') + def test_hdf_table_read_compress(): + pd.read_hdf('test_table_compress.hdf', 'test') - def test_hdf_table_read_compress(): - pd.read_hdf('test_table_compress.hdf', 'test') + def test_csv_write(df): + df.to_csv('test.csv', mode='w') - def test_csv_write(df): - df.to_csv('test.csv', mode='w') + def test_csv_read(): + pd.read_csv('test.csv', index_col=0) - def test_csv_read(): - pd.read_csv('test.csv', index_col=0) - - - def test_feather_write(df): - df.to_feather('test.feather') + def test_feather_write(df): + df.to_feather('test.feather') def test_feather_read(): pd.read_feather('test.feather') - def test_pickle_write(df): - df.to_pickle('test.pkl') + def test_pickle_write(df): + df.to_pickle('test.pkl') - def test_pickle_read(): - pd.read_pickle('test.pkl') + def test_pickle_read(): + pd.read_pickle('test.pkl') - def test_pickle_write_compress(df): - df.to_pickle('test.pkl.compress', compression='xz') + def test_pickle_write_compress(df): + df.to_pickle('test.pkl.compress', compression='xz') - def test_pickle_read_compress(): - pd.read_pickle('test.pkl.compress', compression='xz') + def test_pickle_read_compress(): + pd.read_pickle('test.pkl.compress', compression='xz') - def test_parquet_write(df): - df.to_parquet('test.parquet') - - - def test_parquet_read(): - pd.read_parquet('test.parquet') + def test_parquet_write(df): + df.to_parquet('test.parquet') + + + def 
test_parquet_read(): + pd.read_parquet('test.parquet') When writing, the top-three functions in terms of speed are ``test_feather_write``, ``test_hdf_fixed_write`` and ``test_hdf_fixed_write_compress``. @@ -5704,22 +5702,22 @@ When writing, the top-three functions in terms of speed are ``test_feather_write In [8]: %timeit test_hdf_table_write_compress(df) 448 ms ± 11.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) - + In [9]: %timeit test_csv_write(df) 3.66 s ± 26.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) - + In [10]: %timeit test_feather_write(df) 9.75 ms ± 117 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) - + In [11]: %timeit test_pickle_write(df) 30.1 ms ± 229 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) In [12]: %timeit test_pickle_write_compress(df) 4.29 s ± 15.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) - + In [13]: %timeit test_parquet_write(df) 67.6 ms ± 706 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) - + When reading, the top three are ``test_feather_read``, ``test_pickle_read`` and ``test_hdf_fixed_read``. From ddd39f60fafed5da8fa1d08d23f7c43a03b5f555 Mon Sep 17 00:00:00 2001 From: Wuraola Oyewusi Date: Thu, 10 Oct 2019 10:50:21 +0100 Subject: [PATCH 06/19] Update io.rst --- doc/source/user_guide/io.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst index 2fe7f870edcd2..2ce99e6bb0514 100644 --- a/doc/source/user_guide/io.rst +++ b/doc/source/user_guide/io.rst @@ -5678,8 +5678,8 @@ Given the next test set: def test_parquet_write(df): df.to_parquet('test.parquet') - - + + def test_parquet_read(): pd.read_parquet('test.parquet') @@ -5702,7 +5702,7 @@ When writing, the top-three functions in terms of speed are ``test_feather_write In [8]: %timeit test_hdf_table_write_compress(df) 448 ms ± 11.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) - + In [9]: %timeit test_csv_write(df) 3.66 s ± 26.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) From 26b5db1dadb875da50a6286be0b9ec50034551e2 Mon Sep 17 00:00:00 2001 From: Wuraola Oyewusi Date: Thu, 10 Oct 2019 11:29:36 +0100 Subject: [PATCH 07/19] Update io.rst --- doc/source/user_guide/io.rst | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst index 2ce99e6bb0514..4aec2e3055cc0 100644 --- a/doc/source/user_guide/io.rst +++ b/doc/source/user_guide/io.rst @@ -5597,7 +5597,6 @@ Given the next test set: sz = 1000000 df = pd.DataFrame({'A': randn(sz), 'B': [1] * sz}) - def test_sql_write(df): if os.path.exists('test.sql'): os.remove('test.sql') @@ -5702,7 +5701,7 @@ When writing, the top-three functions in terms of speed are ``test_feather_write In [8]: %timeit test_hdf_table_write_compress(df) 448 ms ± 11.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) - + In [9]: %timeit test_csv_write(df) 3.66 s ± 26.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) @@ -5735,7 +5734,7 @@ When reading, the top three are ``test_feather_read``, ``test_pickle_read`` and In [17]: %timeit test_hdf_table_read() 38.6 ms ± 857 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) - + In [18]: %timeit test_hdf_table_read_compress() 38.8 ms ± 1.49 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) @@ -5753,10 +5752,9 @@ When reading, the top three are ``test_feather_read``, ``test_pickle_read`` and In [23]: %timeit test_parquet_read() 24.4 ms ± 146 µs per loop (mean ± std. dev. 
of 7 runs, 10 loops each) - + For this test case ``test.pkl.compress``, ``test.parquet`` and ``test.feather`` took the least space on disk. - Space on disk (in bytes) .. code-block:: none From cf85f9520011c8b0dcf07ce9e7ce6d3ab8a80656 Mon Sep 17 00:00:00 2001 From: Wuraola Oyewusi Date: Thu, 10 Oct 2019 12:09:27 +0100 Subject: [PATCH 08/19] Update io.rst --- doc/source/user_guide/io.rst | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst index 4aec2e3055cc0..ca0f8bbfbd531 100644 --- a/doc/source/user_guide/io.rst +++ b/doc/source/user_guide/io.rst @@ -5596,14 +5596,14 @@ Given the next test set: from numpy.random import randn sz = 1000000 df = pd.DataFrame({'A': randn(sz), 'B': [1] * sz}) - + def test_sql_write(df): if os.path.exists('test.sql'): os.remove('test.sql') sql_db = sqlite3.connect('test.sql') df.to_sql(name='test_table', con=sql_db) sql_db.close() - + def test_sql_read(): sql_db = sqlite3.connect('test.sql') pd.read_sql_query("select * from test_table", sql_db) @@ -5701,7 +5701,7 @@ When writing, the top-three functions in terms of speed are ``test_feather_write In [8]: %timeit test_hdf_table_write_compress(df) 448 ms ± 11.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) - + In [9]: %timeit test_csv_write(df) 3.66 s ± 26.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) @@ -5734,7 +5734,7 @@ When reading, the top three are ``test_feather_read``, ``test_pickle_read`` and In [17]: %timeit test_hdf_table_read() 38.6 ms ± 857 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) - + In [18]: %timeit test_hdf_table_read_compress() 38.8 ms ± 1.49 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) @@ -5746,14 +5746,14 @@ When reading, the top three are ``test_feather_read``, ``test_pickle_read`` and In [21]: %timeit test_pickle_read() 18.4 ms ± 191 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) - + In [22]: %timeit test_pickle_read_compress() 915 ms ± 7.48 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) In [23]: %timeit test_parquet_read() 24.4 ms ± 146 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) - + For this test case ``test.pkl.compress``, ``test.parquet`` and ``test.feather`` took the least space on disk. Space on disk (in bytes) From 3d71d4078dad4b6c85a77306db94e618e3697013 Mon Sep 17 00:00:00 2001 From: Wuraola Oyewusi Date: Thu, 10 Oct 2019 13:06:24 +0100 Subject: [PATCH 09/19] Update io.rst --- doc/source/user_guide/io.rst | 116 +++++++++++++++++------------------ 1 file changed, 58 insertions(+), 58 deletions(-) diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst index ca0f8bbfbd531..eec83ef04b549 100644 --- a/doc/source/user_guide/io.rst +++ b/doc/source/user_guide/io.rst @@ -5593,94 +5593,94 @@ Given the next test set: .. 
code-block:: python - from numpy.random import randn - sz = 1000000 - df = pd.DataFrame({'A': randn(sz), 'B': [1] * sz}) - - def test_sql_write(df): - if os.path.exists('test.sql'): - os.remove('test.sql') - sql_db = sqlite3.connect('test.sql') - df.to_sql(name='test_table', con=sql_db) - sql_db.close() - - def test_sql_read(): - sql_db = sqlite3.connect('test.sql') - pd.read_sql_query("select * from test_table", sql_db) - sql_db.close() + from numpy.random import randn + sz = 1000000 + df = pd.DataFrame({'A': randn(sz), 'B': [1] * sz}) + + def test_sql_write(df): + if os.path.exists('test.sql'): + os.remove('test.sql') + sql_db = sqlite3.connect('test.sql') + df.to_sql(name='test_table', con=sql_db) + sql_db.close() + + def test_sql_read(): + sql_db = sqlite3.connect('test.sql') + pd.read_sql_query("select * from test_table", sql_db) + sql_db.close() - def test_hdf_fixed_write(df): - df.to_hdf('test_fixed.hdf', 'test', mode='w') + def test_hdf_fixed_write(df): + df.to_hdf('test_fixed.hdf', 'test', mode='w') - def test_hdf_fixed_read(): - pd.read_hdf('test_fixed.hdf', 'test') + def test_hdf_fixed_read(): + pd.read_hdf('test_fixed.hdf', 'test') - def test_hdf_fixed_write_compress(df): - df.to_hdf('test_fixed_compress.hdf', 'test', mode='w', complib='blosc') + def test_hdf_fixed_write_compress(df): + df.to_hdf('test_fixed_compress.hdf', 'test', mode='w', complib='blosc') - def test_hdf_fixed_read_compress(): - pd.read_hdf('test_fixed_compress.hdf', 'test') + def test_hdf_fixed_read_compress(): + pd.read_hdf('test_fixed_compress.hdf', 'test') - def test_hdf_table_write(df): - df.to_hdf('test_table.hdf', 'test', mode='w', format='table') + def test_hdf_table_write(df): + df.to_hdf('test_table.hdf', 'test', mode='w', format='table') - def test_hdf_table_read(): - pd.read_hdf('test_table.hdf', 'test') + def test_hdf_table_read(): + pd.read_hdf('test_table.hdf', 'test') - def test_hdf_table_write_compress(df): - df.to_hdf('test_table_compress.hdf', 'test', mode='w', - complib='blosc', format='table') + def test_hdf_table_write_compress(df): + df.to_hdf('test_table_compress.hdf', 'test', mode='w', + complib='blosc', format='table') - def test_hdf_table_read_compress(): - pd.read_hdf('test_table_compress.hdf', 'test') + def test_hdf_table_read_compress(): + pd.read_hdf('test_table_compress.hdf', 'test') - def test_csv_write(df): - df.to_csv('test.csv', mode='w') + def test_csv_write(df): + df.to_csv('test.csv', mode='w') - def test_csv_read(): - pd.read_csv('test.csv', index_col=0) + def test_csv_read(): + pd.read_csv('test.csv', index_col=0) - def test_feather_write(df): + def test_feather_write(df): df.to_feather('test.feather') def test_feather_read(): - pd.read_feather('test.feather') + pd.read_feather('test.feather') - def test_pickle_write(df): - df.to_pickle('test.pkl') + def test_pickle_write(df): + df.to_pickle('test.pkl') - def test_pickle_read(): - pd.read_pickle('test.pkl') + def test_pickle_read(): + pd.read_pickle('test.pkl') - def test_pickle_write_compress(df): - df.to_pickle('test.pkl.compress', compression='xz') + def test_pickle_write_compress(df): + df.to_pickle('test.pkl.compress', compression='xz') - def test_pickle_read_compress(): - pd.read_pickle('test.pkl.compress', compression='xz') - - - def test_parquet_write(df): - df.to_parquet('test.parquet') + def test_pickle_read_compress(): + pd.read_pickle('test.pkl.compress', compression='xz') + + + def test_parquet_write(df): + df.to_parquet('test.parquet') - def test_parquet_read(): - pd.read_parquet('test.parquet') + def 
test_parquet_read(): + pd.read_parquet('test.parquet') When writing, the top-three functions in terms of speed are ``test_feather_write``, ``test_hdf_fixed_write`` and ``test_hdf_fixed_write_compress``. @@ -5698,10 +5698,10 @@ When writing, the top-three functions in terms of speed are ``test_feather_write In [7]: %timeit test_hdf_table_write(df) 449 ms ± 5.61 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) - + In [8]: %timeit test_hdf_table_write_compress(df) 448 ms ± 11.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) - + In [9]: %timeit test_csv_write(df) 3.66 s ± 26.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) @@ -5734,7 +5734,7 @@ When reading, the top three are ``test_feather_read``, ``test_pickle_read`` and In [17]: %timeit test_hdf_table_read() 38.6 ms ± 857 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) - + In [18]: %timeit test_hdf_table_read_compress() 38.8 ms ± 1.49 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) @@ -5743,13 +5743,13 @@ When reading, the top three are ``test_feather_read``, ``test_pickle_read`` and In [20]: %timeit test_feather_read() 12.4 ms ± 99.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) - + In [21]: %timeit test_pickle_read() 18.4 ms ± 191 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) - + In [22]: %timeit test_pickle_read_compress() 915 ms ± 7.48 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) - + In [23]: %timeit test_parquet_read() 24.4 ms ± 146 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) From 8c8ed93b4d15df0b7ce1234d9da921c0fb95acba Mon Sep 17 00:00:00 2001 From: Wuraola Oyewusi Date: Thu, 10 Oct 2019 13:43:47 +0100 Subject: [PATCH 10/19] Update io.rst --- doc/source/user_guide/io.rst | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst index eec83ef04b549..6ca10f79abadb 100644 --- a/doc/source/user_guide/io.rst +++ b/doc/source/user_guide/io.rst @@ -5594,16 +5594,19 @@ Given the next test set: .. code-block:: python from numpy.random import randn + sz = 1000000 df = pd.DataFrame({'A': randn(sz), 'B': [1] * sz}) + def test_sql_write(df): if os.path.exists('test.sql'): os.remove('test.sql') sql_db = sqlite3.connect('test.sql') df.to_sql(name='test_table', con=sql_db) sql_db.close() - + + def test_sql_read(): sql_db = sqlite3.connect('test.sql') pd.read_sql_query("select * from test_table", sql_db) From 3e62c8f5ea73debcacf6d0825a5e8fff6124ca4b Mon Sep 17 00:00:00 2001 From: Wuraola Oyewusi Date: Fri, 11 Oct 2019 10:35:42 +0100 Subject: [PATCH 11/19] Update io.rst --- doc/source/user_guide/io.rst | 102 +++++++++++++++++------------------ 1 file changed, 51 insertions(+), 51 deletions(-) diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst index 6ca10f79abadb..47c70c018b6eb 100644 --- a/doc/source/user_guide/io.rst +++ b/doc/source/user_guide/io.rst @@ -5593,97 +5593,97 @@ Given the next test set: .. 
code-block:: python - from numpy.random import randn + from numpy.random import randn - sz = 1000000 - df = pd.DataFrame({'A': randn(sz), 'B': [1] * sz}) + sz = 1000000 + df = pd.DataFrame({'A': randn(sz), 'B': [1] * sz}) - def test_sql_write(df): - if os.path.exists('test.sql'): - os.remove('test.sql') - sql_db = sqlite3.connect('test.sql') - df.to_sql(name='test_table', con=sql_db) - sql_db.close() + def test_sql_write(df): + if os.path.exists('test.sql'): + os.remove('test.sql') + sql_db = sqlite3.connect('test.sql') + df.to_sql(name='test_table', con=sql_db) + sql_db.close() - def test_sql_read(): - sql_db = sqlite3.connect('test.sql') - pd.read_sql_query("select * from test_table", sql_db) - sql_db.close() + def test_sql_read(): + sql_db = sqlite3.connect('test.sql') + pd.read_sql_query("select * from test_table", sql_db) + sql_db.close() - def test_hdf_fixed_write(df): - df.to_hdf('test_fixed.hdf', 'test', mode='w') + def test_hdf_fixed_write(df): + df.to_hdf('test_fixed.hdf', 'test', mode='w') - def test_hdf_fixed_read(): - pd.read_hdf('test_fixed.hdf', 'test') + def test_hdf_fixed_read(): + pd.read_hdf('test_fixed.hdf', 'test') - def test_hdf_fixed_write_compress(df): - df.to_hdf('test_fixed_compress.hdf', 'test', mode='w', complib='blosc') + def test_hdf_fixed_write_compress(df): + df.to_hdf('test_fixed_compress.hdf', 'test', mode='w', complib='blosc') - def test_hdf_fixed_read_compress(): - pd.read_hdf('test_fixed_compress.hdf', 'test') + def test_hdf_fixed_read_compress(): + pd.read_hdf('test_fixed_compress.hdf', 'test') - def test_hdf_table_write(df): - df.to_hdf('test_table.hdf', 'test', mode='w', format='table') + def test_hdf_table_write(df): + df.to_hdf('test_table.hdf', 'test', mode='w', format='table') - def test_hdf_table_read(): - pd.read_hdf('test_table.hdf', 'test') + def test_hdf_table_read(): + pd.read_hdf('test_table.hdf', 'test') - def test_hdf_table_write_compress(df): - df.to_hdf('test_table_compress.hdf', 'test', mode='w', - complib='blosc', format='table') + def test_hdf_table_write_compress(df): + df.to_hdf('test_table_compress.hdf', 'test', mode='w', + complib='blosc', format='table') - def test_hdf_table_read_compress(): - pd.read_hdf('test_table_compress.hdf', 'test') + def test_hdf_table_read_compress(): + pd.read_hdf('test_table_compress.hdf', 'test') - def test_csv_write(df): - df.to_csv('test.csv', mode='w') + def test_csv_write(df): + df.to_csv('test.csv', mode='w') - def test_csv_read(): - pd.read_csv('test.csv', index_col=0) + def test_csv_read(): + pd.read_csv('test.csv', index_col=0) - def test_feather_write(df): - df.to_feather('test.feather') + def test_feather_write(df): + df.to_feather('test.feather') - def test_feather_read(): - pd.read_feather('test.feather') + def test_feather_read(): + pd.read_feather('test.feather') - def test_pickle_write(df): - df.to_pickle('test.pkl') + def test_pickle_write(df): + df.to_pickle('test.pkl') - def test_pickle_read(): - pd.read_pickle('test.pkl') + def test_pickle_read(): + pd.read_pickle('test.pkl') - def test_pickle_write_compress(df): - df.to_pickle('test.pkl.compress', compression='xz') + def test_pickle_write_compress(df): + df.to_pickle('test.pkl.compress', compression='xz') - def test_pickle_read_compress(): - pd.read_pickle('test.pkl.compress', compression='xz') + def test_pickle_read_compress(): + pd.read_pickle('test.pkl.compress', compression='xz') - def test_parquet_write(df): - df.to_parquet('test.parquet') + def test_parquet_write(df): + df.to_parquet('test.parquet') - def 
test_parquet_read(): - pd.read_parquet('test.parquet') + def test_parquet_read(): + pd.read_parquet('test.parquet') When writing, the top-three functions in terms of speed are ``test_feather_write``, ``test_hdf_fixed_write`` and ``test_hdf_fixed_write_compress``. @@ -5734,7 +5734,7 @@ When reading, the top three are ``test_feather_read``, ``test_pickle_read`` and In [16]: %timeit test_hdf_fixed_read_compress() 19.5 ms ± 222 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) - + In [17]: %timeit test_hdf_table_read() 38.6 ms ± 857 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) From 1af539cae0ff284830ae5cb5dfbdc023bf6b7103 Mon Sep 17 00:00:00 2001 From: Wuraola Oyewusi Date: Fri, 11 Oct 2019 11:23:23 +0100 Subject: [PATCH 12/19] Update io.rst --- doc/source/user_guide/io.rst | 27 +++------------------------ 1 file changed, 3 insertions(+), 24 deletions(-) diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst index 47c70c018b6eb..e9efb3eebd9cb 100644 --- a/doc/source/user_guide/io.rst +++ b/doc/source/user_guide/io.rst @@ -5597,8 +5597,7 @@ Given the next test set: sz = 1000000 df = pd.DataFrame({'A': randn(sz), 'B': [1] * sz}) - - + def test_sql_write(df): if os.path.exists('test.sql'): os.remove('test.sql') @@ -5606,86 +5605,66 @@ Given the next test set: df.to_sql(name='test_table', con=sql_db) sql_db.close() - def test_sql_read(): sql_db = sqlite3.connect('test.sql') pd.read_sql_query("select * from test_table", sql_db) sql_db.close() - def test_hdf_fixed_write(df): df.to_hdf('test_fixed.hdf', 'test', mode='w') - def test_hdf_fixed_read(): pd.read_hdf('test_fixed.hdf', 'test') - def test_hdf_fixed_write_compress(df): df.to_hdf('test_fixed_compress.hdf', 'test', mode='w', complib='blosc') - def test_hdf_fixed_read_compress(): pd.read_hdf('test_fixed_compress.hdf', 'test') - def test_hdf_table_write(df): df.to_hdf('test_table.hdf', 'test', mode='w', format='table') - def test_hdf_table_read(): pd.read_hdf('test_table.hdf', 'test') - def test_hdf_table_write_compress(df): df.to_hdf('test_table_compress.hdf', 'test', mode='w', complib='blosc', format='table') - def test_hdf_table_read_compress(): pd.read_hdf('test_table_compress.hdf', 'test') - def test_csv_write(df): df.to_csv('test.csv', mode='w') - def test_csv_read(): pd.read_csv('test.csv', index_col=0) - def test_feather_write(df): df.to_feather('test.feather') - def test_feather_read(): pd.read_feather('test.feather') - def test_pickle_write(df): df.to_pickle('test.pkl') - def test_pickle_read(): pd.read_pickle('test.pkl') - def test_pickle_write_compress(df): df.to_pickle('test.pkl.compress', compression='xz') - def test_pickle_read_compress(): pd.read_pickle('test.pkl.compress', compression='xz') - - + def test_parquet_write(df): df.to_parquet('test.parquet') - def test_parquet_read(): pd.read_parquet('test.parquet') - When writing, the top-three functions in terms of speed are ``test_feather_write``, ``test_hdf_fixed_write`` and ``test_hdf_fixed_write_compress``. .. code-block:: ipython @@ -5720,10 +5699,10 @@ When writing, the top-three functions in terms of speed are ``test_feather_write In [13]: %timeit test_parquet_write(df) 67.6 ms ± 706 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) - When reading, the top three are ``test_feather_read``, ``test_pickle_read`` and ``test_hdf_fixed_read``. + .. 
code-block:: ipython In [14]: %timeit test_sql_read() From ce51d5eb99b1e5302f8a719e5e778bbe1ac9ea4d Mon Sep 17 00:00:00 2001 From: Wuraola Oyewusi Date: Fri, 11 Oct 2019 11:50:51 +0100 Subject: [PATCH 13/19] Update io.rst --- doc/source/user_guide/io.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst index e9efb3eebd9cb..fe80c5b46a932 100644 --- a/doc/source/user_guide/io.rst +++ b/doc/source/user_guide/io.rst @@ -5642,10 +5642,10 @@ Given the next test set: pd.read_csv('test.csv', index_col=0) def test_feather_write(df): - df.to_feather('test.feather') + df.to_feather('test.feather') def test_feather_read(): - pd.read_feather('test.feather') + pd.read_feather('test.feather') def test_pickle_write(df): df.to_pickle('test.pkl') From 524c7e09a153df929aa99007d084fc8b84a22303 Mon Sep 17 00:00:00 2001 From: Wuraola Oyewusi Date: Sat, 12 Oct 2019 07:29:47 +0100 Subject: [PATCH 14/19] Update io.rst --- doc/source/user_guide/io.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst index fe80c5b46a932..130d6707ded8d 100644 --- a/doc/source/user_guide/io.rst +++ b/doc/source/user_guide/io.rst @@ -5594,8 +5594,10 @@ Given the next test set: .. code-block:: python from numpy.random import randn + from numpy.random import seed sz = 1000000 + seed(42) df = pd.DataFrame({'A': randn(sz), 'B': [1] * sz}) def test_sql_write(df): From 2b77c5dd5f00170239504812af52a655878ca3cf Mon Sep 17 00:00:00 2001 From: Wuraola Oyewusi Date: Sat, 12 Oct 2019 18:39:46 +0100 Subject: [PATCH 15/19] Update io.rst --- doc/source/user_guide/io.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst index 130d6707ded8d..bf8c18037efb7 100644 --- a/doc/source/user_guide/io.rst +++ b/doc/source/user_guide/io.rst @@ -5593,12 +5593,12 @@ Given the next test set: .. 
code-block:: python - from numpy.random import randn - from numpy.random import seed + + import numpy as np sz = 1000000 - seed(42) - df = pd.DataFrame({'A': randn(sz), 'B': [1] * sz}) + np.random.seed(42) + df = pd.DataFrame({'A': np.random.randn(sz), 'B': [1] * sz}) def test_sql_write(df): if os.path.exists('test.sql'): From 22247380945777ccfb5f9079354527f285419993 Mon Sep 17 00:00:00 2001 From: Joris Van den Bossche Date: Mon, 21 Oct 2019 09:02:25 +0200 Subject: [PATCH 16/19] restore indentation --- doc/source/user_guide/io.rst | 100 +++++++++++++++++------------------ 1 file changed, 50 insertions(+), 50 deletions(-) diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst index bf8c18037efb7..1d4b668b09457 100644 --- a/doc/source/user_guide/io.rst +++ b/doc/source/user_guide/io.rst @@ -5596,76 +5596,76 @@ Given the next test set: import numpy as np - sz = 1000000 - np.random.seed(42) - df = pd.DataFrame({'A': np.random.randn(sz), 'B': [1] * sz}) + sz = 1000000 + np.random.seed(42) + df = pd.DataFrame({'A': np.random.randn(sz), 'B': [1] * sz}) - def test_sql_write(df): - if os.path.exists('test.sql'): - os.remove('test.sql') - sql_db = sqlite3.connect('test.sql') - df.to_sql(name='test_table', con=sql_db) - sql_db.close() + def test_sql_write(df): + if os.path.exists('test.sql'): + os.remove('test.sql') + sql_db = sqlite3.connect('test.sql') + df.to_sql(name='test_table', con=sql_db) + sql_db.close() - def test_sql_read(): - sql_db = sqlite3.connect('test.sql') - pd.read_sql_query("select * from test_table", sql_db) - sql_db.close() + def test_sql_read(): + sql_db = sqlite3.connect('test.sql') + pd.read_sql_query("select * from test_table", sql_db) + sql_db.close() - def test_hdf_fixed_write(df): - df.to_hdf('test_fixed.hdf', 'test', mode='w') + def test_hdf_fixed_write(df): + df.to_hdf('test_fixed.hdf', 'test', mode='w') - def test_hdf_fixed_read(): - pd.read_hdf('test_fixed.hdf', 'test') + def test_hdf_fixed_read(): + pd.read_hdf('test_fixed.hdf', 'test') - def test_hdf_fixed_write_compress(df): - df.to_hdf('test_fixed_compress.hdf', 'test', mode='w', complib='blosc') + def test_hdf_fixed_write_compress(df): + df.to_hdf('test_fixed_compress.hdf', 'test', mode='w', complib='blosc') - def test_hdf_fixed_read_compress(): - pd.read_hdf('test_fixed_compress.hdf', 'test') + def test_hdf_fixed_read_compress(): + pd.read_hdf('test_fixed_compress.hdf', 'test') - def test_hdf_table_write(df): - df.to_hdf('test_table.hdf', 'test', mode='w', format='table') + def test_hdf_table_write(df): + df.to_hdf('test_table.hdf', 'test', mode='w', format='table') - def test_hdf_table_read(): - pd.read_hdf('test_table.hdf', 'test') + def test_hdf_table_read(): + pd.read_hdf('test_table.hdf', 'test') - def test_hdf_table_write_compress(df): - df.to_hdf('test_table_compress.hdf', 'test', mode='w', - complib='blosc', format='table') + def test_hdf_table_write_compress(df): + df.to_hdf('test_table_compress.hdf', 'test', mode='w', + complib='blosc', format='table') - def test_hdf_table_read_compress(): - pd.read_hdf('test_table_compress.hdf', 'test') + def test_hdf_table_read_compress(): + pd.read_hdf('test_table_compress.hdf', 'test') - def test_csv_write(df): - df.to_csv('test.csv', mode='w') + def test_csv_write(df): + df.to_csv('test.csv', mode='w') - def test_csv_read(): - pd.read_csv('test.csv', index_col=0) + def test_csv_read(): + pd.read_csv('test.csv', index_col=0) - def test_feather_write(df): - df.to_feather('test.feather') + def test_feather_write(df): + df.to_feather('test.feather') 
- def test_feather_read(): - pd.read_feather('test.feather') + def test_feather_read(): + pd.read_feather('test.feather') - def test_pickle_write(df): - df.to_pickle('test.pkl') + def test_pickle_write(df): + df.to_pickle('test.pkl') - def test_pickle_read(): - pd.read_pickle('test.pkl') + def test_pickle_read(): + pd.read_pickle('test.pkl') - def test_pickle_write_compress(df): - df.to_pickle('test.pkl.compress', compression='xz') + def test_pickle_write_compress(df): + df.to_pickle('test.pkl.compress', compression='xz') - def test_pickle_read_compress(): - pd.read_pickle('test.pkl.compress', compression='xz') + def test_pickle_read_compress(): + pd.read_pickle('test.pkl.compress', compression='xz') - def test_parquet_write(df): - df.to_parquet('test.parquet') + def test_parquet_write(df): + df.to_parquet('test.parquet') - def test_parquet_read(): - pd.read_parquet('test.parquet') + def test_parquet_read(): + pd.read_parquet('test.parquet') When writing, the top-three functions in terms of speed are ``test_feather_write``, ``test_hdf_fixed_write`` and ``test_hdf_fixed_write_compress``. From df377c18912dd7b05df6bd1639f2e689fccf1d84 Mon Sep 17 00:00:00 2001 From: Joris Van den Bossche Date: Mon, 21 Oct 2019 09:05:52 +0200 Subject: [PATCH 17/19] fixup --- doc/source/user_guide/io.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst index 1d4b668b09457..e46437fdc4b1c 100644 --- a/doc/source/user_guide/io.rst +++ b/doc/source/user_guide/io.rst @@ -5594,7 +5594,7 @@ Given the next test set: .. code-block:: python - import numpy as np + import numpy as np sz = 1000000 np.random.seed(42) From e3eba95bd3989d3fbeb1f6e68303506165daa1f3 Mon Sep 17 00:00:00 2001 From: Wuraola Oyewusi Date: Fri, 8 Nov 2019 19:06:23 +0100 Subject: [PATCH 18/19] Update io.rst --- doc/source/user_guide/io.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst index e46437fdc4b1c..beefafebd56ff 100644 --- a/doc/source/user_guide/io.rst +++ b/doc/source/user_guide/io.rst @@ -5595,6 +5595,7 @@ Given the next test set: import numpy as np + import os sz = 1000000 np.random.seed(42) From 3aa5dead5f4597e94de849829b1faf5619fd931d Mon Sep 17 00:00:00 2001 From: Wuraola Oyewusi Date: Fri, 8 Nov 2019 19:14:45 +0100 Subject: [PATCH 19/19] Update io.rst --- doc/source/user_guide/io.rst | 1 - 1 file changed, 1 deletion(-) diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst index beefafebd56ff..e46437fdc4b1c 100644 --- a/doc/source/user_guide/io.rst +++ b/doc/source/user_guide/io.rst @@ -5595,7 +5595,6 @@ Given the next test set: import numpy as np - import os sz = 1000000 np.random.seed(42)
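
For reference, a standalone version of the write benchmark above can be run outside IPython. The sketch below is illustrative only, not part of the patch: it mirrors the test set from the patches (same ``df``, same file names) but adds the ``os`` and ``sqlite3`` imports the doc snippet assumes are already in scope, swaps IPython's ``%timeit`` for ``timeit.timeit``, and covers only three of the writers. It assumes pyarrow is installed for the feather and parquet formats.

.. code-block:: python

    import os
    import sqlite3
    import timeit

    import numpy as np
    import pandas as pd

    # Same test frame as the doc: 1,000,000 rows, one float and one int column.
    sz = 1000000
    np.random.seed(42)
    df = pd.DataFrame({'A': np.random.randn(sz), 'B': [1] * sz})

    def test_sql_write(df):
        # Recreate the database on each run so to_sql never collides
        # with an existing test_table.
        if os.path.exists('test.sql'):
            os.remove('test.sql')
        sql_db = sqlite3.connect('test.sql')
        df.to_sql(name='test_table', con=sql_db)
        sql_db.close()

    def test_feather_write(df):
        df.to_feather('test.feather')    # needs pyarrow

    def test_parquet_write(df):
        df.to_parquet('test.parquet')    # needs pyarrow or fastparquet

    for name, func in [('sql', test_sql_write),
                       ('feather', test_feather_write),
                       ('parquet', test_parquet_write)]:
        # number=3 keeps the run short; %timeit picks its loop count
        # adaptively, so absolute numbers will differ slightly.
        seconds = timeit.timeit(lambda: func(df), number=3) / 3
        print(f'{name:8s}{seconds * 1000:10.1f} ms per write')

Because each function re-opens its output file, these timings include file-system overhead, the same as the ``%timeit`` figures quoted in the patches.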