BUG: max on axis=1 returns wrong values on type datetime64[ns] when NaT is present in values #60646

Open
2 of 3 tasks
antoinefalck opened this issue Jan 2, 2025 · 1 comment
Labels
Bug; Datetime (Datetime data dtype); Dtype Conversions (Unexpected or buggy dtype conversions); Reduction Operations (sum, mean, min, max, etc.)

@antoinefalck

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
df = pd.DataFrame(["NaT", "2024-04-16 09:20:00.123456789"], dtype="datetime64[ns]")
df.max(axis=1)

# prints:
#
# 0                             NaT
# 1   2024-04-16 09:20:00.123456768
# dtype: datetime64[ns]

Issue Description

Calling max with axis=1 on a single-column DataFrame should return that column unchanged (as a pd.Series).
However, when the dtype is datetime64[ns] and the column contains a NaT, the returned values differ slightly from the original ones.
In the MWE the difference is 21 nanoseconds (.123456789 becomes .123456768).

Note: This only happens if there is a NaT in column values.

Expected Behavior

The expected output is:

0                             NaT
1   2024-04-16 09:20:00.123456789
dtype: datetime64[ns]

Installed Versions

INSTALLED VERSIONS
------------------
commit                : 0691c5cf90477d3503834d983f69350f250a6ff7
python                : 3.11.10
python-bits           : 64
OS                    : Linux
OS-release            : 4.14.355-271.569.amzn2.x86_64
Version               : #1 SMP Tue Nov 5 10:11:37 UTC 2024
machine               : x86_64
processor             : x86_64
byteorder             : little
LC_ALL                : None
LANG                  : C.UTF-8
LOCALE                : en_US.UTF-8

pandas                : 2.2.3
numpy                 : 1.26.0
pytz                  : 2024.1
dateutil              : 2.9.0
pip                   : 24.3.1
Cython                : 0.29.30
sphinx                : None
IPython               : 8.21.0
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : 4.12.3
blosc                 : None
bottleneck            : None
dataframe-api-compat  : None
fastparquet           : None
fsspec                : 2024.10.0
html5lib              : 1.1
hypothesis            : None
gcsfs                 : None
jinja2                : 3.1.4
lxml.etree            : None
matplotlib            : 3.8.1
numba                 : None
numexpr               : 2.10.1
odfpy                 : None
openpyxl              : 3.1.5
pandas_gbq            : None
psycopg2              : None
pymysql               : None
pyarrow               : 15.0.2
pyreadstat            : None
pytest                : None
python-calamine       : None
pyxlsb                : None
s3fs                  : 2024.10.0
scipy                 : 1.14.1
sqlalchemy            : None
tables                : 3.8.0
tabulate              : 0.9.0
xarray                : None
xlrd                  : None
xlsxwriter            : None
zstandard             : None
tzdata                : 2024.2
qtpy                  : None
pyqt5                 : None
@antoinefalck added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels on Jan 2, 2025
@rhshadrach
Member

Thanks for the report! In _nanminmax, the presence of NaT causes pandas to coerce to float here:

result = _maybe_null_out(result, axis, mask, values.shape)
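
To illustrate why a float coercion would produce exactly the values in the report, here is a minimal sketch using only the standard library. It assumes the timestamp passes through a plain float64 at some point, which is an assumption about the code path, and it does not call pandas at all. The names `original_ns`, `round_tripped_ns`, and `lost_ns` are made up for this example.

```python
import calendar

# Nanoseconds since the Unix epoch for 2024-04-16 09:20:00.123456789,
# which is how a datetime64[ns] value is stored (as a signed 64-bit integer).
seconds = calendar.timegm((2024, 4, 16, 9, 20, 0))
original_ns = seconds * 1_000_000_000 + 123_456_789

# A float64 has a 53-bit significand. This value lies between 2**60 and 2**61,
# where adjacent float64 values are 256 apart, so the integer cannot be
# represented exactly and is rounded to the nearest multiple of 256.
round_tripped_ns = int(float(original_ns))

lost_ns = original_ns - round_tripped_ns
print(original_ns, round_tripped_ns, lost_ns)  # lost_ns is 21
```

The round-tripped value ends in ...123456768, matching the output in the report, so the 21 ns discrepancy is consistent with a single int64 to float64 to int64 round trip.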

It seems like it should be possible to avoid this coercion for datetimes as they are stored as integers here, NaT being the smallest (signed) integer value. Further investigations and PRs to fix are welcome!
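
As a rough sketch of that idea, the snippet below computes a row-wise max on the integer view of a NumPy datetime64[ns] array, so no float is involved. `max_ignoring_nat` is a hypothetical helper written for this illustration; it is not how pandas implements `_nanminmax`, and it only covers max (for min, NaT would win the integer comparison and would need separate masking).

```python
import numpy as np


def max_ignoring_nat(values: np.ndarray, axis: int) -> np.ndarray:
    """Hypothetical helper: max of datetime64[ns] data without a float round trip."""
    # Reinterpret the timestamps as int64 nanoseconds. NaT is stored as the
    # smallest int64, so it sorts below every real timestamp.
    as_int = values.view("i8")
    # A plain integer max therefore skips NaT automatically; a slice that
    # contains only NaT yields the smallest int64, which is NaT again.
    return as_int.max(axis=axis).view("M8[ns]")


# Same data as the reproducible example: one column, two rows.
values = np.array(
    [["NaT"], ["2024-04-16T09:20:00.123456789"]], dtype="M8[ns]"
)
result = max_ignoring_nat(values, axis=1)
print(result)
```

Because the reduction never leaves int64, the nanosecond digits of the second row survive intact and the first row stays NaT.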

@rhshadrach added the Datetime (Datetime data dtype), Dtype Conversions (Unexpected or buggy dtype conversions), and Reduction Operations (sum, mean, min, max, etc.) labels and removed the Needs Triage (Issue that has not been reviewed by a pandas team member) label on Jan 3, 2025