Describe the bug
Result mismatch with vanilla Spark in the hash function with decimal input.

Steps to reproduce

Case 1: precision <= 18
When precision is less than or equal to 18, Spark uses the unscaled long value of the decimal to calculate the hash:
https://github.com/apache/spark/blob/d12bb23e49928ae36b57ac5fbeaa492dadf7d28f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala#L560-L567

test:
sql(s"create table t1(c1 decimal(18, 2)) using parquet")
sql(s"insert into t1 values(1.23), (-1.23), (0.0), (null)")
checkSparkAnswerAndOperator(s"select c1, hash(c1), xxhash64(c1) from t1 order by c1")

result:

Case 2: precision > 18
When precision is greater than 18, Spark uses the bytes of the unscaled value to calculate the hash, but the results are still inconsistent.

test:
sql(s"create table t2(c1 decimal(20, 2)) using parquet")
sql(s"insert into t2 values(1.23), (-1.23), (0.0), (null)")
checkSparkAnswerAndOperator(s"select c1, hash(c1), xxhash64(c1) from t2 order by c1")

result:
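For reference, the behavior the native side needs to match can be sketched outside the JVM. For precision <= 18, Spark feeds the unscaled long to Murmur3 (default seed 42) as two 32-bit words, low word first; for precision > 18, it feeds the minimal two's-complement byte array of the unscaled BigInteger (as produced by java.math.BigInteger.toByteArray()) to its byte-array hash, which mixes trailing unaligned bytes one sign-extended byte at a time. The Python below is an illustrative re-implementation of that scheme, not Spark code, and it assumes a little-endian platform for the 4-byte word reads:

```python
def _rotl32(x, r):
    return ((x << r) | (x >> (32 - r))) & 0xFFFFFFFF

def _mix_k1(k1):
    k1 = (k1 * 0xCC9E2D51) & 0xFFFFFFFF
    k1 = _rotl32(k1, 15)
    return (k1 * 0x1B873593) & 0xFFFFFFFF

def _mix_h1(h1, k1):
    h1 = _rotl32(h1 ^ k1, 13)
    return (h1 * 5 + 0xE6546B64) & 0xFFFFFFFF

def _fmix(h1, length):
    # Murmur3 finalization: fold in the length, then avalanche
    h1 ^= length
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & 0xFFFFFFFF
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & 0xFFFFFFFF
    h1 ^= h1 >> 16
    return h1

def _to_signed32(x):
    return x - (1 << 32) if x >= (1 << 31) else x

def hash_long(value, seed=42):
    # precision <= 18: hash the unscaled long as two 32-bit words, low word first
    h1 = _mix_h1(seed, _mix_k1(value & 0xFFFFFFFF))
    h1 = _mix_h1(h1, _mix_k1((value >> 32) & 0xFFFFFFFF))
    return _to_signed32(_fmix(h1, 8))

def unscaled_bytes(unscaled):
    # minimal big-endian two's complement, like BigInteger.toByteArray()
    length = ((unscaled + (unscaled < 0)).bit_length() // 8) + 1
    return unscaled.to_bytes(length, "big", signed=True)

def hash_bytes(data, seed=42):
    # precision > 18: 4-byte words read in platform order (little-endian here),
    # trailing bytes mixed individually as sign-extended ints
    h1 = seed
    aligned = len(data) - len(data) % 4
    for i in range(0, aligned, 4):
        h1 = _mix_h1(h1, _mix_k1(int.from_bytes(data[i:i + 4], "little")))
    for i in range(aligned, len(data)):
        b = data[i] - 256 if data[i] >= 128 else data[i]
        h1 = _mix_h1(h1, _mix_k1(b & 0xFFFFFFFF))
    return _to_signed32(_fmix(h1, len(data)))

# decimal(18, 2) value 1.23 has unscaled value 123, hashed as a long;
# decimal(20, 2) value 1.23 has the same unscaled value but is hashed via its bytes
print(hash_long(123))
print(hash_bytes(unscaled_bytes(123)))
```

Note that the two cases disagree by design even for the same unscaled value (8 fixed bytes versus a 1-byte minimal encoding here), which is why each precision range has to be matched separately.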
Expected behavior
No response
Additional context
No response