[RNTuple] Writing: support String and Bool types, and Vector Field #356
Moelf
commented
Oct 16, 2024
- Investigate why nesting one more level of vector fails
- Make ROOT C++ happy
So the following is bad. The field records should have
julia> newtable = (;
three_v_vuint = [[[0xcececece], [0xcdcdcdcd]], [[0xabababab]]]
)
(three_v_vuint = Vector{Vector{UInt32}}[[[0xcececece], [0xcdcdcdcd]], [[0xabababab]]],)
julia> frs, crs = UnROOT.schema_to_field_column_records(newtable);
julia> frs
3-element Vector{UnROOT.FieldRecord}:
UnROOT.FieldRecord(field_version=0x00000000, type_version=0x00000000, parent_field_id=0x00000000, struct_role=0x0001, flags=0x0000, repetition=0, source_field_id=-1, root_streamer_checksum=-1, field_name="three_v_vuint", type_name="", type_alias="", field_desc="", )
UnROOT.FieldRecord(field_version=0x00000000, type_version=0x00000000, parent_field_id=0x00000000, struct_role=0x0001, flags=0x0000, repetition=0, source_field_id=-1, root_streamer_checksum=-1, field_name="_0", type_name="", type_alias="", field_desc="", )
UnROOT.FieldRecord(field_version=0x00000000, type_version=0x00000000, parent_field_id=0x00000000, struct_role=0x0000, flags=0x0000, repetition=0, source_field_id=-1, root_streamer_checksum=-1, field_name="_0", type_name="std::uint32_t", type_alias="", field_desc="", )
julia> crs
3-element Vector{UnROOT.ColumnRecord}:
UnROOT.ColumnRecord(type=0x0001, nbits=0x0040, field_id=0x00000000, flags=0x0000, representation_idx=0x0000, first_ele_idx=0, )
UnROOT.ColumnRecord(type=0x0001, nbits=0x0040, field_id=0x00000000, flags=0x0000, representation_idx=0x0000, first_ele_idx=0, )
UnROOT.ColumnRecord(type=0x000b, nbits=0x0020, field_id=0x00000001, flags=0x0000, representation_idx=0x0000, first_ele_idx=0, )
Reference layout from ROOT output
UnROOT.FieldRecord(field_version=0x00000000, type_version=0x00000000, parent_field_id=0x00000005, struct_role=0x0001, flags=0x0000, repetition=0, source_field_id=-1, root_streamer_checksum=-1, field_name="vector_vector_int32", type_name="std::vector<std::vector<std::int32_t>>", type_alias="", field_desc="", )
UnROOT.FieldRecord(field_version=0x00000000, type_version=0x00000000, parent_field_id=0x00000005, struct_role=0x0001, flags=0x0000, repetition=0, source_field_id=-1, root_streamer_checksum=-1, field_name="_0", type_name="std::vector<std::int32_t>", type_alias="", field_desc="", )
UnROOT.FieldRecord(field_version=0x00000000, type_version=0x00000000, parent_field_id=0x00000006, struct_role=0x0000, flags=0x0000, repetition=0, source_field_id=-1, root_streamer_checksum=-1, field_name="_0", type_name="std::int32_t", type_alias="", field_desc="", )
# ------------------
UnROOT.ColumnRecord(type=0x000e, nbits=0x0040, field_id=0x00000005, flags=0x0000, representation_idx=0x0000, first_ele_idx=0, )
UnROOT.ColumnRecord(type=0x000e, nbits=0x0040, field_id=0x00000006, flags=0x0000, representation_idx=0x0000, first_ele_idx=0, )
UnROOT.ColumnRecord(type=0x001b, nbits=0x0020, field_id=0x00000007, flags=0x0000, representation_idx=0x0000, first_ele_idx=0, )
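Reading the reference records above, each nesting level appears to get its own field: the top-level field is its own parent, every inner level is a child named "_0" whose parent is the level above it, and `type_name` is filled in at every level. The following is a minimal sketch of that pattern, not UnROOT's implementation. `SketchField`, `cpp_type_name`, and `sketch_fields!` are hypothetical names introduced only for illustration, and the ids start at 0 instead of 5.

```julia
# Hypothetical record type for illustration; the real UnROOT.FieldRecord has more members.
struct SketchField
    id::Int
    parent_id::Int
    name::String
    type_name::String
end

# Map a Julia type to the C++ type name seen in the ROOT reference output.
cpp_type_name(::Type{UInt32}) = "std::uint32_t"
cpp_type_name(::Type{Int32}) = "std::int32_t"
cpp_type_name(::Type{Vector{T}}) where {T} = "std::vector<" * cpp_type_name(T) * ">"

# Recursively add one field per nesting level.
# A top-level field uses its own id as parent_id (the keyword default).
function sketch_fields!(out, ::Type{T}, name; parent_id = length(out)) where {T}
    id = length(out)  # zero-based id of the field about to be added
    push!(out, SketchField(id, parent_id, name, cpp_type_name(T)))
    if T <: AbstractVector
        # The element field is always called "_0" and points back at this field.
        sketch_fields!(out, eltype(T), "_0"; parent_id = id)
    end
    return out
end

# Each entry of `three_v_vuint` is a Vector{Vector{UInt32}}.
fields = sketch_fields!(SketchField[], Vector{Vector{UInt32}}, "three_v_vuint")
foreach(println, fields)
```

Because the recursion only depends on `eltype`, the same function handles any nesting depth. Structs would need a separate branch that this sketch does not attempt.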
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #356 +/- ##
==========================================
+ Coverage 84.65% 84.72% +0.07%
==========================================
Files 21 21
Lines 3043 3084 +41
==========================================
+ Hits 2576 2613 +37
- Misses 467 471 +4
julia> newtable = (;
one_vuint = [[0xcececece, 0xcdcdcdcd], [0xabababab]],
two_uint = [0xabababab, 0xcdcdcdcd],
three_v_vuint = [[[0xcececece], [0xcdcdcdcd]], [[0xabababab]]]
)
(one_vuint = Vector{UInt32}[[0xcececece, 0xcdcdcdcd], [0xabababab]], two_uint = UInt32[0xabababab, 0xcdcdcdcd], three_v_vuint = Vector{Vector{UInt32}}[[[0xcececece], [0xcdcdcdcd]], [[0xabababab]]])
julia> UnROOT.write_rntuple(open("/tmp/a.root", "w"), newtable; rntuple_name="myntuple")
julia> LazyTree("/tmp/a.root", "myntuple")
Row │ one_vuint two_uint three_v_vuint
│ Vector{UInt32} UInt32 Vector{Vector{U
─────┼────────────────────────────────────────────────────────────────────
1 │ [3469659854, 3452816845] 2880154539 Vector{UInt32}[UInt32[0xcece
2 │ [2880154539] 3452816845 Vector{UInt32}[UInt32[0xabab
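The column records earlier in this thread show two 64-bit columns plus one 32-bit leaf column for the doubly nested vector, which suggests one offset column per nesting level and a single flat content column. Below is a rough sketch of that decomposition, assuming the offsets are cumulative element counts. `sketch_flatten` is a hypothetical helper, not an UnROOT function.

```julia
# Split one level of jaggedness into cumulative offsets and flattened content.
function sketch_flatten(jagged::Vector{Vector{T}}) where {T}
    offsets = cumsum(length.(jagged))           # end position of each entry
    content = reduce(vcat, jagged; init = T[])  # all elements back to back
    return offsets, content
end

one_vuint = [[0xcececece, 0xcdcdcdcd], [0xabababab]]
offsets, content = sketch_flatten(one_vuint)

# Two levels of nesting: apply the same step twice.
three_v_vuint = [[[0xcececece], [0xcdcdcdcd]], [[0xabababab]]]
outer_offsets, inner = sketch_flatten(three_v_vuint)
inner_offsets, leaves = sketch_flatten(inner)
```

Under these assumptions, `one_vuint` yields the offsets `[2, 3]` and three flat `UInt32` values, while `three_v_vuint` yields two offset vectors (`[2, 3]` and `[1, 2, 3]`) and the same three leaves.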
Element_T = eltype(input_T)
content_parent_field_id = Element_T <: Real ? implicit_field_id : parent_field_id
add_field_column_record!(field_records, column_records, Element_T, "_0"; parent_field_id = content_parent_field_id, col_field_id = length(field_records))
This is really not elegant, and I feel like it would fail in more complicated cases (e.g. I'm not sure what happens when you have structs or 4th-order jaggedness).
9bd2f17 to de5a0aa
julia> newtable = (;
three_v_vuint = [[[0xcececece], [0xcdcdcdcd]], [[0xabababab]]]
);
julia> using UnROOT
julia> UnROOT.write_rntuple(open("/tmp/a.root", "w"), newtable; rntuple_name="myntuple")
Currently C++ ROOT is happy with the "base" case, but when we combine different types there's a checksum mismatch error.
Fixed!
>>> import ROOT
>>> df = ROOT.RDataFrame("myntuple", "/tmp/a.root")
>>> df.GetColumnNames()
vector<string>{ "one_vuint._0", "three_v_vuint._0._0", "two_uint" }
>>> df.Display("two_uint").Print()
+-----+------------+
| Row | two_uint |
+-----+------------+
| 0 | 2880154539 |
+-----+------------+
| 1 | 3452816845 |
+-----+------------+
>>> df.Display("one_vuint._0").Print()
+-----+--------------+
| Row | one_vuint._0 |
+-----+--------------+
| 0 | 3469659854 |
| | 3452816845 |
+-----+--------------+
| 1 | 2880154539 |
+-----+--------------+
>>> df.Display("three_v_vuint._0._0").Print()
+-----+---------------------+
| Row | three_v_vuint._0._0 |
+-----+---------------------+
| 0 | { 3469659854 } |
| | { 3452816845 } |
+-----+---------------------+
| 1 | { 2880154539 } |
+-----+---------------------+
ROOT is also happy with our
>>> df = ROOT.RDataFrame("myntuple", "/tmp/a.root")
>>> list(df.Take['bool']("two_bool"))
[True, False, True, True, False, False, True, False, True, False, False, True, True, False, False, True]
>>> df = ROOT.RDataFrame("myntuple", "/tmp/a.root")
>>> list(df.Take['std::string']("x"))
['abc', 'def']
N_pad = 8 - mod1(length(bytes), 8)
append!(bytes, zeros(eltype(bytes), N_pad))
This was a bug in the reading logic all along. Very happy to have caught it, which we did because we have a writing-reading round-trip test.
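To see what the `mod1` expression in the snippet above computes, here is a small standalone sketch. It pads a byte vector up to the next multiple of 8 and adds nothing when the length is already a multiple of 8, whereas `8 - mod(n, 8)` would add a full extra 8 in that case. `sketch_pad8!` is a hypothetical name, and the `reinterpret` call is only an assumed motivation for the padding, not something this thread states.

```julia
# Pad `bytes` in place so that its length becomes a multiple of 8.
function sketch_pad8!(bytes::AbstractVector)
    n_pad = 8 - mod1(length(bytes), 8)  # mod1 returns 1..8, so n_pad is 0..7
    append!(bytes, zeros(eltype(bytes), n_pad))
    return bytes
end

short = sketch_pad8!(UInt8[0x01, 0x02, 0x03])  # 3 bytes, padded with 5 zeros
exact = sketch_pad8!(collect(0x01:0x08))       # already 8 bytes, left unchanged

# Assumed use case: viewing the padded buffer as 64-bit words,
# which requires a length that is a multiple of 8.
words = reinterpret(UInt64, short)
```

After padding, `short` has length 8 and `exact` is untouched, so `reinterpret(UInt64, ...)` succeeds on both.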
add tests fix bit magic
more tests fix print everything in PyROOT add coverage add coverage
7a7b632 to f7a6995