[RNTuple] Writing: support String and Bool types, and Vector Field #356

Merged
merged 6 commits into from
Oct 17, 2024

@Moelf Moelf commented Oct 16, 2024

julia> newtable = (;
                      one_vuint = [[0xcececece, 0xcdcdcdcd], [0xabababab]],
                      two_uint = [0xabababab, 0xefefefef]
                  )
(one_vuint = Vector{UInt32}[[0xcececece, 0xcdcdcdcd], [0xabababab]], two_uint = UInt32[0xabababab, 0xefefefef])

julia> UnROOT.write_rntuple(open("/tmp/a.root", "w"), newtable; rntuple_name="myntuple")

julia> LazyTree("/tmp/a.root", "myntuple")
 Row │ one_vuint                 two_uint
     │ Vector{UInt32}            UInt32
─────┼──────────────────────────────────────
 1   │ [3469659854, 3452816845]  2880154539
 2   │ [2880154539]              4025479151
  • Investigate why nesting one more level of vector fails
  • Make ROOT C++ happy

Moelf commented Oct 16, 2024

The following output is wrong.

For a doubly nested vector, the field records should have parent_field_id 0, 0, 1 respectively, and the column records should have field_id 0, 1, 2 respectively.

julia> newtable = (;
                      three_v_vuint = [[[0xcececece], [0xcdcdcdcd]], [[0xabababab]]]
                  )
(three_v_vuint = Vector{Vector{UInt32}}[[[0xcececece], [0xcdcdcdcd]], [[0xabababab]]],)

julia> frs, crs = UnROOT.schema_to_field_column_records(newtable);

julia> frs
3-element Vector{UnROOT.FieldRecord}:
 UnROOT.FieldRecord(field_version=0x00000000, type_version=0x00000000, parent_field_id=0x00000000, struct_role=0x0001, flags=0x0000, repetition=0, source_field_id=-1, root_streamer_checksum=-1, field_name="three_v_vuint", type_name="", type_alias="", field_desc="", )

 UnROOT.FieldRecord(field_version=0x00000000, type_version=0x00000000, parent_field_id=0x00000000, struct_role=0x0001, flags=0x0000, repetition=0, source_field_id=-1, root_streamer_checksum=-1, field_name="_0", type_name="", type_alias="", field_desc="", )

 UnROOT.FieldRecord(field_version=0x00000000, type_version=0x00000000, parent_field_id=0x00000000, struct_role=0x0000, flags=0x0000, repetition=0, source_field_id=-1, root_streamer_checksum=-1, field_name="_0", type_name="std::uint32_t", type_alias="", field_desc="", )


julia> crs
3-element Vector{UnROOT.ColumnRecord}:
 UnROOT.ColumnRecord(type=0x0001, nbits=0x0040, field_id=0x00000000, flags=0x0000, representation_idx=0x0000, first_ele_idx=0, )

 UnROOT.ColumnRecord(type=0x0001, nbits=0x0040, field_id=0x00000000, flags=0x0000, representation_idx=0x0000, first_ele_idx=0, )

 UnROOT.ColumnRecord(type=0x000b, nbits=0x0020, field_id=0x00000001, flags=0x0000, representation_idx=0x0000, first_ele_idx=0, )

Reference layout from ROOT output

      ├─ :vector_vector_int32 ⇒ Vector
      │                         ├─ :offset ⇒ Leaf{UnROOT.Index64}(col=6)
      │                         └─ :content ⇒ Vector
      │                                       ├─ :offset ⇒ Leaf{UnROOT.Index64}(col=7)
      │                                       └─ :content ⇒ Leaf{Int32}(col=8)
 UnROOT.FieldRecord(field_version=0x00000000, type_version=0x00000000, parent_field_id=0x00000005, struct_role=0x0001, flags=0x0000, repetition=0, source_field_id=-1, root_streamer_checksum=-1, field_name="vector_vector_int32", type_name="std::vector<std::vector<std::int32_t>>", type_alias="", field_desc="", )

 UnROOT.FieldRecord(field_version=0x00000000, type_version=0x00000000, parent_field_id=0x00000005, struct_role=0x0001, flags=0x0000, repetition=0, source_field_id=-1, root_streamer_checksum=-1, field_name="_0", type_name="std::vector<std::int32_t>", type_alias="", field_desc="", )

 UnROOT.FieldRecord(field_version=0x00000000, type_version=0x00000000, parent_field_id=0x00000006, struct_role=0x0000, flags=0x0000, repetition=0, source_field_id=-1, root_streamer_checksum=-1, field_name="_0", type_name="std::int32_t", type_alias="", field_desc="", )

# ------------------

 UnROOT.ColumnRecord(type=0x000e, nbits=0x0040, field_id=0x00000005, flags=0x0000, representation_idx=0x0000, first_ele_idx=0, )

 UnROOT.ColumnRecord(type=0x000e, nbits=0x0040, field_id=0x00000006, flags=0x0000, representation_idx=0x0000, first_ele_idx=0, )

 UnROOT.ColumnRecord(type=0x001b, nbits=0x0020, field_id=0x00000007, flags=0x0000, representation_idx=0x0000, first_ele_idx=0, )
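
The expected numbering can be sketched as a small recursion. This is a minimal illustration and not UnROOT's `schema_to_field_column_records`: `SketchField` and `add_fields!` are hypothetical names. It assumes one column per field (an offset column for each vector level, a data column for the leaf) and that a top-level field uses its own ID as its parent, as the reference records above suggest.

```julia
# Hypothetical, simplified stand-in for a field record (not UnROOT.FieldRecord).
struct SketchField
    field_id::Int
    parent_field_id::Int
    name::String
end

# Walk a (possibly nested) vector type depth-first, assigning 0-based field IDs
# in order of appearance and recording which field each column belongs to.
function add_fields!(fields, column_field_ids, ::Type{T}, name;
                     parent_field_id=nothing) where {T}
    field_id = length(fields)                      # next free 0-based ID
    parent = something(parent_field_id, field_id)  # top-level: parent is itself
    push!(fields, SketchField(field_id, parent, name))
    push!(column_field_ids, field_id)              # offset or leaf column
    if T <: AbstractVector
        # The element field is named "_0" and points back at this field.
        add_fields!(fields, column_field_ids, eltype(T), "_0";
                    parent_field_id=field_id)
    end
    return fields, column_field_ids
end

fields, cols = add_fields!(SketchField[], Int[],
                           Vector{Vector{UInt32}}, "three_v_vuint")
println([f.parent_field_id for f in fields])  # parent IDs per field
println(cols)                                 # field ID per column
```

Passing the current field's ID down as the child's parent avoids special-casing leaf element types, so deeper nesting follows the same rule.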

codecov bot commented Oct 16, 2024

Codecov Report

Attention: Patch coverage is 93.87755% with 3 lines in your changes missing coverage. Please review.

Project coverage is 84.72%. Comparing base (4ee4732) to head (f7a6995).
Report is 6 commits behind head on main.

Files with missing lines | Patch % | Lines
src/RNTuple/Writing/TFileWriter.jl | 90.90% | 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #356      +/-   ##
==========================================
+ Coverage   84.65%   84.72%   +0.07%     
==========================================
  Files          21       21              
  Lines        3043     3084      +41     
==========================================
+ Hits         2576     2613      +37     
- Misses        467      471       +4     


Moelf commented Oct 16, 2024

julia> newtable = (;
                      one_vuint = [[0xcececece, 0xcdcdcdcd], [0xabababab]],
                      two_uint = [0xabababab, 0xcdcdcdcd],
                      three_v_vuint = [[[0xcececece], [0xcdcdcdcd]], [[0xabababab]]]
                  )
(one_vuint = Vector{UInt32}[[0xcececece, 0xcdcdcdcd], [0xabababab]], two_uint = UInt32[0xabababab, 0xcdcdcdcd], three_v_vuint = Vector{Vector{UInt32}}[[[0xcececece], [0xcdcdcdcd]], [[0xabababab]]])

julia> UnROOT.write_rntuple(open("/tmp/a.root", "w"), newtable; rntuple_name="myntuple")

julia> LazyTree("/tmp/a.root", "myntuple")
 Row │ one_vuint                 two_uint    three_v_vuint
     │ Vector{UInt32}            UInt32      Vector{Vector{U
─────┼────────────────────────────────────────────────────────────────────
 1   │ [3469659854, 3452816845]  2880154539  Vector{UInt32}[UInt32[0xcece
 2   │ [2880154539]              3452816845  Vector{UInt32}[UInt32[0xabab

Comment on lines +494 to +509
Element_T = eltype(input_T)
content_parent_field_id = Element_T <: Real ? implicit_field_id : parent_field_id
add_field_column_record!(field_records, column_records, Element_T, "_0"; parent_field_id = content_parent_field_id, col_field_id = length(field_records))

This is really not elegant, and I suspect it would fail in more complicated cases (e.g. I'm not sure what happens with structs or fourth-order jaggedness).

Moelf commented Oct 16, 2024

julia> newtable = (;
               three_v_vuint = [[[0xcececece], [0xcdcdcdcd]], [[0xabababab]]]
           );

julia> using UnROOT

julia> UnROOT.write_rntuple(open("/tmp/a.root", "w"), newtable; rntuple_name="myntuple")

>>> df  =  ROOT.RDataFrame("myntuple", "/tmp/a.root")
>>> df.GetColumnNames()
vector<string>{ "three_v_vuint._0._0" }
>>> df.Display("three_v_vuint._0._0").Print()
+-----+---------------------+
| Row | three_v_vuint._0._0 |
+-----+---------------------+
| 0   | { 3469659854 }      |
|     | { 3452816845 }      |
+-----+---------------------+
| 1   | { 2880154539 }      |
+-----+---------------------+

Currently C++ ROOT is happy with the "base" case, but when we combine different types there's a checksum mismatch error:

  static RResult<unsigned int> ROOT::Experimental::Internal::RNTupleSerializer::DeserializeEnvelope(const void*, uint64_t, uint16_t) [/build/jenkins/workspace/lcg_nightly_pipeline/build/projects/ROOT-HEAD/src/ROOT/HEAD/tree/ntuple/v7/src/RNTupleSerialize.cxx:902]
  static ROOT::Experimental::RResult<void> ROOT::Experimental::Internal::RNTupleSerializer::DeserializePageList

Moelf commented Oct 16, 2024

fixed!

>>> import ROOT

>>> df  =  ROOT.RDataFrame("myntuple", "/tmp/a.root")

>>> df.GetColumnNames()
vector<string>{ "one_vuint._0", "three_v_vuint._0._0", "two_uint" }

>>> df.Display("two_uint").Print()
+-----+------------+
| Row | two_uint   |
+-----+------------+
| 0   | 2880154539 |
+-----+------------+
| 1   | 3452816845 |
+-----+------------+

>>> df.Display("one_vuint._0").Print()
+-----+--------------+
| Row | one_vuint._0 |
+-----+--------------+
| 0   | 3469659854   |
|     | 3452816845   |
+-----+--------------+
| 1   | 2880154539   |
+-----+--------------+

>>> df.Display("three_v_vuint._0._0").Print()
+-----+---------------------+
| Row | three_v_vuint._0._0 |
+-----+---------------------+
| 0   | { 3469659854 }      |
|     | { 3452816845 }      |
+-----+---------------------+
| 1   | { 2880154539 }      |
+-----+---------------------+

@Moelf Moelf requested a review from tamasgal October 16, 2024 13:27

Moelf commented Oct 17, 2024

ROOT is also happy with our Bool and String:

>>> df  =  ROOT.RDataFrame("myntuple", "/tmp/a.root")
>>> list(df.Take['bool']("two_bool"))
[True, False, True, True, False, False, True, False, True, False, False, True, True, False, False, True]

>>> df  =  ROOT.RDataFrame("myntuple", "/tmp/a.root")
>>> list(df.Take['std::string']("x"))
['abc', 'def']

@Moelf Moelf changed the title [RNTuple] write vector field [RNTuple] Writing: support String and Bools types, support Vector Field Oct 17, 2024
@Moelf Moelf changed the title [RNTuple] Writing: support String and Bools types, support Vector Field [RNTuple] Writing: support String and Bool types, and Vector Field Oct 17, 2024
Comment on lines +108 to +109
N_pad = 8 - mod1(length(bytes), 8)
append!(bytes, zeros(eltype(bytes), N_pad))

This was a bug in the reading logic all along. Very happy to have caught it, thanks to the write-read round-trip test.
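
To illustrate what the two commented lines compute, here is a minimal standalone sketch. The wrapper name `pad_to_multiple_of_8!` is hypothetical and not part of UnROOT; only the two lines inside it come from the diff. Using `mod1` rather than `mod` means a buffer whose length is already a multiple of 8 gets no extra padding.

```julia
# Zero-pad a byte buffer in place so that its length becomes a multiple of 8.
# `mod1(n, 8)` lies in 1:8, so the number of padding bytes lies in 0:7.
function pad_to_multiple_of_8!(bytes::Vector{UInt8})
    N_pad = 8 - mod1(length(bytes), 8)
    append!(bytes, zeros(eltype(bytes), N_pad))
    return bytes
end

for n in (1, 7, 8, 9, 16)
    padded = pad_to_multiple_of_8!(fill(0xff, n))
    println(n, " bytes -> ", length(padded), " bytes")
end
```

With `mod` in place of `mod1`, the length-8 and length-16 inputs would each receive 8 unnecessary zero bytes.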

Moelf added 4 commits October 18, 2024 01:02
add tests

fix bit magic
more tests

fix

print everything in PyROOT

add coverage

add coverage
@Moelf Moelf merged commit 0724c14 into main Oct 17, 2024
8 checks passed
@Moelf Moelf deleted the rnt_vec_writing branch October 17, 2024 23:03