Failed to read triply jagged structure #226

Closed
tamasgal opened this issue Mar 15, 2023 · 6 comments
Labels
bug Something isn't working

Comments

@tamasgal
Member

tamasgal commented Mar 15, 2023

I just hit a wall when trying to read triply jagged data in our ROOT files. Let me use this issue to collect my findings; I have to fix this for sure 😉 Feel free to comment though, I am happy about any feedback 😆

E/Evt/trks/trks.fitinf points to a branch that comes from the class Evt, which has a vector<Trk>; each Trk has a field called fitinf, which is a vector<double>. Other fields are jagged in the same way, e.g. rec_stages, which is a vector<int>. This means that "E/Evt/trks/trks.fitinf" is triply nested: entries, tracks per entry, and values per track.
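
To make the nesting concrete, here is a minimal sketch in plain Julia (no UnROOT involved) of the shape such a branch should have once it is read. The struct names mirror the classes mentioned above, but the Julia definitions and all values are made up for illustration.

```julia
# Hypothetical Julia mirrors of the ROOT classes; only `fitinf` is modelled.
struct Trk
    fitinf::Vector{Float64}      # vector<double> per track
end

struct Evt
    trks::Vector{Trk}            # vector<Trk> per event
end

# Two events with a different number of tracks, each track with a
# different number of fit values (illustrative numbers only).
events = [
    Evt([Trk([1.0, 2.0]), Trk([3.0])]),
    Evt([Trk(Float64[])]),
]

# Collecting "trks.fitinf" over all entries gives three levels of nesting:
# entries -> tracks -> values.
fitinf = [[t.fitinf for t in e.trks] for e in events]

println(typeof(fitinf))
println(length.(fitinf))         # number of tracks per event
```

A branch like trks.E would only need two levels here (entries -> tracks), which is the case that already works.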

I thought UnROOT can eat this but unfortunately it cannot.

julia> using UnROOT

julia> f = ROOTFile("test/samples/km3net_offline.root")
ROOTFile with 2 entries and 25 streamers.
test/samples/km3net_offline.root
├─ E (TTree)
│  └─ "Evt"
└─ Header (Head)


julia> LazyBranch(f, "E/Evt/trks/trks.fitinf")
ERROR: TypeError: in Array, in element type, expected Type, got a value of type Nothing
Stacktrace:
 [1] Array
   @ ./boot.jl:459 [inlined]
 [2] Vector{nothing}()
   @ Core ./boot.jl:478
 [3] LazyBranch(f::ROOTFile, b::UnROOT.TBranchElement_10)
   @ UnROOT ~/Dev/UnROOT.jl/src/iteration.jl:116
 [4] LazyBranch(f::ROOTFile, s::String)
   @ UnROOT ~/Dev/UnROOT.jl/src/iteration.jl:125
 [5] top-level scope
   @ REPL[93]:1

julia> LazyBranch(f, "E/Evt/trks/trks.E")  # works fine
10-element LazyBranch{SubArray{Float64, 1, Vector{Float64}, Tuple{UnitRange{Int64}}, true}, UnROOT.Nooffsetjagg, ArraysOfArrays.VectorOfVectors{Float64, Vector{Float64}, Vector{Int32}, Vector{Tuple{}}}}: 
 [99.10458562488608, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
 [99.10458562488608, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
 [99.10458562488608, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
 [37.8551524925863, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
 [99.10458562488608, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
 [7.1691678741479565, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
 [99.10458562488608, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
 [99.10458562488608, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
 [49.13672985920654, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
 [20.35137468173687, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


julia> f["E/Evt/trks/trks.fitinf"]
UnROOT.TBranchElement_10
  cursor: UnROOT.Cursor
  fName: String "trks.fitinf"
  fTitle: String "fitinf[trks_]"
  fFillColor: Int16 0
  fFillStyle: Int16 1001
  fCompress: Int32 1
  fBasketSize: Int32 32000
  fEntryOffsetLen: Int32 40
  fWriteBasket: Int32 1
  fEntryNumber: Int64 10
  fIOFeatures: UnROOT.ROOT_3a3a_TIOFeatures
  fOffset: Int32 0
  fMaxBaskets: UInt32 0x0000000a
  fSplitLevel: Int32 0
  fEntries: Int64 10
  fFirstEntry: Int64 0
  fTotBytes: Int64 15979
  fZipBytes: Int64 7484
  fBranches: UnROOT.TObjArray
  fLeaves: UnROOT.TObjArray
  fBaskets: UnROOT.TObjArray
  fBasketBytes: Array{Int64}((10,)) [7484, 0, 0, 0, 0, 0, 0, 0, 0, 0]
  fBasketEntry: Array{Int64}((10,)) [0, 10, 0, 0, 0, 0, 0, 0, 0, 0]
  fBasketSeek: Array{Int64}((10,)) [46158, 0, 0, 0, 0, 0, 0, 0, 0, 0]
  fFileName: String ""
  fClassName: String "Trk"
  fParentName: String "Trk"
  fClonesName: String ""
  fCheckSum: UInt32 0x1604acda
  fClassVersion: Int16 10
  fID: Int32 13
  fType: Int32 41
  fStreamerType: Int32 300
  fMaximum: Int32 0
  fBranchCount: Missing missing
  fBranchCount2: Missing missing


julia> f["E/Evt/trks/trks.E"]
UnROOT.TBranchElement_10
  cursor: UnROOT.Cursor
  fName: String "trks.E"
  fTitle: String "E[trks_]"
  fFillColor: Int16 0
  fFillStyle: Int16 1001
  fCompress: Int32 1
  fBasketSize: Int32 32000
  fEntryOffsetLen: Int32 40
  fWriteBasket: Int32 1
  fEntryNumber: Int64 10
  fIOFeatures: UnROOT.ROOT_3a3a_TIOFeatures
  fOffset: Int32 0
  fMaxBaskets: UInt32 0x0000000a
  fSplitLevel: Int32 0
  fEntries: Int64 10
  fFirstEntry: Int64 0
  fTotBytes: Int64 4574
  fZipBytes: Int64 226
  fBranches: UnROOT.TObjArray
  fLeaves: UnROOT.TObjArray
  fBaskets: UnROOT.TObjArray
  fBasketBytes: Array{Int64}((10,)) [226, 0, 0, 0, 0, 0, 0, 0, 0, 0]
  fBasketEntry: Array{Int64}((10,)) [0, 10, 0, 0, 0, 0, 0, 0, 0, 0]
  fBasketSeek: Array{Int64}((10,)) [40683, 0, 0, 0, 0, 0, 0, 0, 0, 0]
  fFileName: String ""
  fClassName: String "Trk"
  fParentName: String "Trk"
  fClonesName: String ""
  fCheckSum: UInt32 0x1604acda
  fClassVersion: Int16 10
  fID: Int32 5
  fType: Int32 41
  fStreamerType: Int32 8
  fMaximum: Int32 0
  fBranchCount: Missing missing
  fBranchCount2: Missing missing

The streamer information reveals the following differences:

f["E/Evt/trks/trks.E"] which is a "doubly nested" double

  fID: Int32 5
  fType: Int32 41
  fStreamerType: Int32 8

f["E/Evt/trks/trks.fitinf"] which is a "triply nested" double

  fID: Int32 13
  fType: Int32 41
  fStreamerType: Int32 300

300 comes from const kSTL = 300 (constants.jl), so it's already giving the correct hint that it's an STL container of the parent class (in this case Trk), while the former (E branch) has streamer type 8 (double).

Another example is f["E/Evt/trks/trks.hit_ids"], which is a vector<int> member of Trk, so it is similar to fitinf but with int elements. The difference evidently shows up only in fID:

  fID: Int32 14
  fType: Int32 41
  fStreamerType: Int32 300
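
A small sketch of what the three branches above tell us when looking at fStreamerType alone. The classify helper is hypothetical (not part of UnROOT); the codes 300 (kSTL) and 8 (double) and the fID values are the ones quoted above.

```julia
# Hypothetical helper: coarse classification by streamer type code.
function classify(fStreamerType::Integer)
    fStreamerType == 300 && return :stl_container   # kSTL
    fStreamerType == 8 && return :double            # code seen on trks.E
    return :other
end

# Values copied from the three branch dumps above.
branches = [
    (fName = "trks.E",       fID = 5,  fStreamerType = 8),
    (fName = "trks.fitinf",  fID = 13, fStreamerType = 300),
    (fName = "trks.hit_ids", fID = 14, fStreamerType = 300),
]

for b in branches
    println(rpad(b.fName, 14), classify(b.fStreamerType))
end
```

fitinf and hit_ids land in the same bucket, so fStreamerType only says "STL container"; whether the elements are double or int has to come from somewhere else.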
@tamasgal tamasgal added the bug Something isn't working label Mar 15, 2023
@tamasgal
Member Author

So one of the problems is that auto_T_JaggT() chokes:

function auto_T_JaggT(f::ROOTFile, branch; customstructs::Dict{String, Type})

The implementation there is a bit wonky anyway. Looking at what uproot3 does, it is clear that this is neither straightforward nor easy 😆 cf. https://github.com/scikit-hep/uproot3/blob/54f5151fb7c686c3a161fbe44b9f299e482f346b/uproot3/interp/auto.py#L246

fTitle: String "E[trks_]" matches the regex that identifies the branch as a "vector". Of course, trks_ does not match any of the primitive types such as int or double, so we need to dig deeper.
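
To illustrate why the title is not enough, here is a small sketch using the same bracket pattern that appears in JaggType (src/utils.jl, quoted further down in this issue). The helper names has_counter and counter_name are made up for this example.

```julia
# Bracket pattern as used in JaggType to spot a counter in the title.
has_counter(title::AbstractString) = match(r"\[.*\]", title) !== nothing

# Hypothetical helper: extract the counter name between the brackets.
function counter_name(title::AbstractString)
    m = match(r"\[(.*)\]", title)
    return m === nothing ? nothing : String(m.captures[1])
end

# fTitle values copied from the two branch dumps above.
for title in ["E[trks_]", "fitinf[trks_]"]
    println(title, " -> has counter: ", has_counter(title),
            ", counter: ", counter_name(title))
end
```

Both titles match and both yield the same counter, trks_, so the title cannot distinguish a double member from a vector<double> member.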

For LazyBranch(f, "E/Evt/trks/trks.fitinf") we get

_type = Nothing
_jaggtype = UnROOT.Nooffsetjagg
classname = "Trk"
parentname = "Trk"
(T, J) = (Vector{nothing}, UnROOT.Nooffsetjagg)

Here, we need to hook into the Trk class name and UnROOT gives us the correct streamers:

julia> UnROOT.streamerfor(f, "Trk")
UnROOT.StreamerInfo(UnROOT.TStreamerInfo{UnROOT.TObjArray}("Trk", "", 0x1604acda, 10, UnROOT.TObjArray("", 0, Any[UnROOT.TStreamerBase
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "AAObject"
  fTitle: String ""
  fType: Int32 0
  fSize: Int32 0
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 813462445, 0, 0, 0]
  fTypeName: String "BASE"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
  fBaseVersion: Int32 5
, UnROOT.TStreamerBasicType
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "id"
  fTitle: String "dox: track id"
  fType: Int32 3
  fSize: Int64 4
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "int"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
, UnROOT.TStreamerObjectAny
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "pos"
  fTitle: String "dox: postion of the track at time t"
  fType: Int32 62
  fSize: Int32 24
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "Vec"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
, UnROOT.TStreamerObjectAny
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "dir"
  fTitle: String "dox: track direction"
  fType: Int32 62
  fSize: Int32 24
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "Vec"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
, UnROOT.TStreamerBasicType
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "t"
  fTitle: String "dox: track time (when the particle is at pos )"
  fType: Int32 8
  fSize: Int64 8
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "double"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
, UnROOT.TStreamerBasicType
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "E"
  fTitle: String "dox: Energy (either mc truth or reconstructed)"
  fType: Int32 8
  fSize: Int64 8
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "double"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
, UnROOT.TStreamerBasicType
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "len"
  fTitle: String "dox: length, if applicable"
  fType: Int32 8
  fSize: Int64 8
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "double"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
, UnROOT.TStreamerBasicType
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "lik"
  fTitle: String "dox: likelihood or lambda value (for aafit, lambda)"
  fType: Int32 8
  fSize: Int64 8
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "double"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
, UnROOT.TStreamerBasicType
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "type"
  fTitle: String "dox: MC: particle type in PDG encoding."
  fType: Int32 3
  fSize: Int64 4
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "int"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
, UnROOT.TStreamerBasicType
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "rec_type"
  fTitle: String "dox: identifyer for the overall fitting algorithm/chain/strategy"
  fType: Int32 3
  fSize: Int64 4
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "int"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
, UnROOT.TStreamerSTL
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "rec_stages"
  fTitle: String "dox: list of identifyers of succesfull fitting stages resulting in this track"
  fType: Int32 500
  fSize: Int32 24
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "vector<int>"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
  fSTLtype: Int32 1
  fCtype: Int32 3
, UnROOT.TStreamerBasicType
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "status"
  fTitle: String "dox: MC status code"
  fType: Int32 3
  fSize: Int64 4
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "int"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
, UnROOT.TStreamerBasicType
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "mother_id"
  fTitle: String "dox: MC id of the parent particle"
  fType: Int32 3
  fSize: Int64 4
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "int"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
, UnROOT.TStreamerSTL
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "fitinf"
  fTitle: String "dox: place to store additional fit info, for jgandalf, see http://common.pages.km3net.de/jpp/JFitParameters_8hh.html"
  fType: Int32 500
  fSize: Int32 24
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "vector<double>"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
  fSTLtype: Int32 1
  fCtype: Int32 8
, UnROOT.TStreamerSTL
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "hit_ids"
  fTitle: String "dox: list of associated hit-ids (corresponds to Hit::id)."
  fType: Int32 500
  fSize: Int32 24
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "vector<int>"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
  fSTLtype: Int32 1
  fCtype: Int32 3
, UnROOT.TStreamerSTL
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "error_matrix"
  fTitle: String "dox: (5x5) error covariance matrix (stored as linear vector)"
  fType: Int32 500
  fSize: Int32 24
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "vector<double>"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
  fSTLtype: Int32 1
  fCtype: Int32 8
, UnROOT.TStreamerSTLstring
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "comment"
  fTitle: String "dox: use as you like"
  fType: Int32 500
  fSize: Int32 8
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "string"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
  fSTLtype: Int32 365
  fCtype: Int32 365
])), Set(Any["AAObject"]))

and there we can see that fitinf has the following streamer:

UnROOT.TStreamerSTL
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "fitinf"
  fTitle: String "dox: place to store additional fit info, for jgandalf, see http://common.pages.km3net.de/jpp/JFitParameters_8hh.html"
  fType: Int32 500
  fSize: Int32 24
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "vector<double>"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
  fSTLtype: Int32 1
  fCtype: Int32 8

Now we get the correct vector<double> hint.

Apparently streamerfor needs to be improved so that we can get the streamer of a field directly.

I need to sort this out, but I am currently lost in all the details (it has been a long time since I worked on the automatic parsing part).
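
A rough sketch of what "get the streamer of a field directly" could look like, using stand-in data instead of real UnROOT objects. element_for is a hypothetical helper, not an existing UnROOT function, and the named tuples mirror only two fields of the streamer elements printed above.

```julia
# Stand-ins for a few elements of the Trk streamer printed above.
elements = [
    (fName = "AAObject", fTypeName = "BASE"),
    (fName = "id",       fTypeName = "int"),
    (fName = "E",        fTypeName = "double"),
    (fName = "fitinf",   fTypeName = "vector<double>"),
]

# Hypothetical helper: find the element for a member, given a branch name
# such as "trks.fitinf" (the part after the last dot is the member name).
function element_for(elements, branchname::AbstractString)
    member = last(split(branchname, '.'))
    idx = findfirst(e -> e.fName == member, elements)
    return idx === nothing ? nothing : elements[idx]
end

println(element_for(elements, "trks.fitinf").fTypeName)
println(element_for(elements, "trks.E").fTypeName)
```

Matching by name is only one option and assumes member names are unique within the class streamer.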

@tamasgal
Member Author

OK, got a bit further

UnROOT.jl/src/utils.jl

Lines 59 to 72 in 3c2ad37

function JaggType(f, branch, leaf)
    # https://github.com/scikit-hep/uproot3/blob/54f5151fb7c686c3a161fbe44b9f299e482f346b/uproot3/interp/auto.py#L144
    (match(r"\[.*\]", leaf.fTitle) !== nothing) && return Nooffsetjagg
    leaf isa TLeafElement && leaf.fLenType==0 && return Offsetjagg
    !hasproperty(branch, :fClassName) && return Nojagg
    try
        streamer = streamerfor(f, branch.fClassName).streamer.fElements.elements[1]
        (streamer.fSTLtype == Const.kSTLvector) && return Offsetjagg
    catch
    end
    return Nojagg
end

This is definitely not working here. As written above, the regex match on [] returns too early, and the streamerfor return value is not handled correctly: the code simply takes the first element of the streamer's element list without checking which field we are looking for.

@tamasgal
Member Author

I just realised that fID is the index of the branch's element in the streamer of its class.

fID = 13 for fitinf and voilà (fID is zero-based, so +1 in Julia), fTypeName = vector<double>:

julia> UnROOT.streamerfor(f, "Trk").streamer.fElements[13+1]
UnROOT.TStreamerSTL
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "fitinf"
  fTitle: String "dox: place to store additional fit info, for jgandalf, see http://common.pages.km3net.de/jpp/JFitParameters_8hh.html"
  fType: Int32 500
  fSize: Int32 24
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "vector<double>"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
  fSTLtype: Int32 1
  fCtype: Int32 8

And here is another example for E, which has fID = 5, and you can see fTypeName = double:

julia> UnROOT.streamerfor(f, "Trk").streamer.fElements[5+1]
UnROOT.TStreamerBasicType
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "E"
  fTitle: String "dox: Energy (either mc truth or reconstructed)"
  fType: Int32 8
  fSize: Int64 8
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "double"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
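
Putting the fID observation together as a sketch: the list below contains the (fName, fTypeName) pairs of the Trk streamer in the order printed earlier in this issue. element_by_fid and julia_type are hypothetical helpers for illustration only; the type mapping covers just int, double and vector<...> and is not how UnROOT necessarily does it.

```julia
# (fName, fTypeName) of the Trk streamer elements, in streamer order.
elements = [
    ("AAObject", "BASE"), ("id", "int"), ("pos", "Vec"), ("dir", "Vec"),
    ("t", "double"), ("E", "double"), ("len", "double"), ("lik", "double"),
    ("type", "int"), ("rec_type", "int"), ("rec_stages", "vector<int>"),
    ("status", "int"), ("mother_id", "int"), ("fitinf", "vector<double>"),
    ("hit_ids", "vector<int>"), ("error_matrix", "vector<double>"),
    ("comment", "string"),
]

# fID is zero-based, Julia arrays are one-based.
element_by_fid(elements, fID::Integer) = elements[fID + 1]

# Hypothetical mapping from a C++ type name to a Julia type.
function julia_type(cpptype::AbstractString)
    primitives = Dict("int" => Int32, "double" => Float64)
    m = match(r"^vector<(.+)>$", cpptype)
    m === nothing && return primitives[cpptype]
    return Vector{julia_type(String(m.captures[1]))}
end

# fID values taken from the branch dumps of trks.E, trks.fitinf, trks.hit_ids.
for fID in (5, 13, 14)
    name, cpptype = element_by_fid(elements, fID)
    println("fID = ", fID, ": ", name, " :: ", julia_type(cpptype))
end
```

The type found this way is the type per track; the branch as a whole adds the entries and tracks levels on top of it.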

@Moelf
Member

Moelf commented Mar 15, 2023

also #119

@tamasgal
Member Author

Indeed, the day has come ;)

@tamasgal
Member Author

tamasgal commented Apr 3, 2023

Fixed in #231 and works nicely 🙂

@tamasgal tamasgal closed this as completed Apr 3, 2023