Possible to create my own raw marshal/unmarshal? #51
Adding more 'machines' is tricky, unfortunately. Wish it weren't, but it's not just a matter of some unexported fields: the constraints of going fast and being flexible genuinely conflict here. To avoid an allocation for every object encountered, the obj package has this concept of a "slab", where it allocates one frankly excessive chunk of space up front, containing all the working memory that could possibly be needed to handle an object in the various ways: Lines 21 to 30 in 3d65705
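To make the slab idea concrete, here's a minimal illustrative sketch (these type names are placeholders, not refmt's actual internals): every machine variant lives as a field in one row struct, and a whole stack of rows is allocated once up front.

```go
package main

import "fmt"

// Placeholder "machines" standing in for the per-kind marshal state.
type mapMachine struct{ keys []string }
type sliceMachine struct{ idx int }

// slabRow bundles every machine variant into one struct; the slab is a
// preallocated stack of rows, one row per nesting level.
type slabRow struct {
	mapM   mapMachine
	sliceM sliceMachine
}

type slab struct {
	rows []slabRow
	top  int
}

// newSlab does the one big allocation; everything after is reuse.
func newSlab(depth int) *slab {
	return &slab{rows: make([]slabRow, depth)}
}

// push hands out working memory for the next nesting level without allocating.
func (s *slab) push() *slabRow {
	r := &s.rows[s.top]
	s.top++
	return r
}

func (s *slab) pop() { s.top-- }

func main() {
	s := newSlab(16)
	r := s.push()
	r.sliceM.idx = 3
	fmt.Println(r.sliceM.idx, s.top)
}
```

This also shows why the trick can't cross package boundaries: every machine variant has to be a field of the row struct at compile time, so a third-party machine has nowhere to live without going through an interface value, and that reintroduces the allocation the slab exists to avoid.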
And there's no way to do that in Go in an extensible, package-boundary-crossing way. (So, in other words... that comment about custom marshal machines might be out of date :/ I wanted to support that, but it was an early idea and I don't think I had realized how poorly it would play out with the whole slab concept.) If there's a cleverer way to do this, it's going to need a lot of thought. Or possibly it's time for some sizably different angle of attack: the whole "user-land stack" thing is a pretty all-in design choice. So, are there other ways to build faster paths and still reuse stuff? Yeah! It's just finding something that's incrementally adoptable and composable that's the tricky bit.
So, about those options for progress that lie in non-incremental territory. There's probably more than a few possible development trajectories, because "non-incremental" kind of opens the floodgates: there are lots, and lots, and lots of different ways one could write object<->token mapping code and still reuse the codecs for token<->serial mapping. But there are a few I've looked at, so I'll try to comment on those here. (There's a lot of work being pursued in these directions within ipld/go-ipld-prime, btw... but being non-incremental approaches, it'll take a while to show fruit. And I haven't been trying to port subsets of that work back into refmt while it's going on.) There are two big things that are costly about the way the obj package currently works.
So what can we do about either one, or both of those?
So you can see there are many options, but none of the choices are trivial. If you wanted to pursue some of these, I'd say "go for it" and try to be helpful, but in a lot of cases there's no nice resting point in the middle of implementing it; one just has to do the whole dang thing, and then see if it got faster or not, because there's very little meaningful testing and benchmarking one can do before having the whole, holistic thing to benchmark as a unit. To re-summarize what I alluded to at the top and a bit throughout: go-ipld-prime is trying the 'Node' approach, and it's doing a complete alternative to the 'obj' package based on that, while reusing the codecs and token interfaces. I'm also doing the codegen approach over there, but optionally. While that's a very large body of work, some parts of it are seeming close to paying off now. So you might want to keep an eye on how that evolves.
And one more "P.S." -- I'm not sure how deep, and in which directions, your own investigations into your bottlenecks have gone, but fwiw, I've recently been finding that pprof output files are amazingly useful, especially once benchmarks aren't able to provide precise enough guidance about what to look at next. The time pprofs are good; the mem and alloc ones are often even better. The tools for inspecting them have also gotten radically more awesome in the last couple of years. Profiling outputs are especially valuable compared to benchmarks for the kind of stuff we experience in refmt, because the performance profile of an operation is intensely data-dependent.

I've also started using assembly dumps a lot recently, to make sense of what the compiler is actually doing and thus to make sure my microbenchmarks aren't telling exotic lies, and that's turned out to be a lot more relevant than I would've expected. (It's really easy to make a microbenchmark that lies.)

If you wanna have a quick call sometime to talk more about ways to gather data like this, I'd be happy to :) Some of the recent major perf improvements I mentioned earlier were almost a direct result of someone throwing some pprof files at me from non-trivial prod usage, so, yeah... they're precious.
Thank you so much for your insanely thorough and thoughtful answer (and so quick)! Given that we're working with IPLD CBOR objects nearly exclusively, it seems like the ipld-prime route is probably the best place for me to look... last time I checked it out, it didn't seem like I could just drop it into a production system and expect it to work :). I've been using the pprof tools a lot (mostly CPU/memory). Skipping obj altogether might be interesting for some fast-path things... it looks like Node is doing that over in ipld-prime, with "Marshal" as opposed to "Encode".
Yeah, I don't quite wanna claim that the go-ipld-prime stuff is drop-in yet, and the profiling effort on that is also, so far... minimal. It's getting close to ready, though. And a couple of early benchmarks seem to indicate it's roughly on par with refmt already, before serious optimization work, so... it seems likely there are good things to come there :)
We have a few objects where refmt serialization is the bottleneck in our app... taking up to 1ms to serialize an object.
I'm wondering if it's possible to specify a fast path for these objects... here it seems to say I might be able to have my own machine:
However, the MarshalMachine interface has unexported type expectations in its fields.
Any other suggestions for keeping refmt around while still being able to register my own "fast path"?