performance on 8.8k triples (508kb turtle) ontology #154
Comments
Doing some quick profiling suggests to me that the bulk of the time is not spent in |
Apologies. The performance issues I was seeing in |
@kasei can I help turning some of this off, to see how much performance will improve? |
Turning what off? I'd be happy to see PRs on IRI or the Attean parsers to improve performance. |
That being said, I think any changes that completely bypass the IRI validity checks will have to be opt-in in an obvious way that helps to indicate that it may cause problems elsewhere in Attean or related modules. |
Also, in profiling the code I noticed that a lot of the time is spent not in the serialization but in the memory model/store code. This is an area where Attean has lagged behind RDF::Trine. Improvements to this code, or a port (or new implementation) of something like RDF::Trine::Store::DBI (based on SQLite or other) might have a bigger impact on performance than trying to avoid IRI parsing... |
"Turn off" the optional Type checks. |
The
Not necessarily. The memory store in Attean is a trivial implementation, but even though it's all in-memory and you are using a small dataset, something like SQLite might be faster just as a result of working with native datatypes (for example). I'm not guaranteeing that such an implementation would be faster, but work in this area (whether on a more optimized in-memory store, or on a bridge to something like SQLite) would certainly improve performance in this sort of use case. |
I should add timing with a Java (rdf4j) reimplementation that we have. AFAIR it's 10x faster |
I've pushed a beta version of AtteanX::Store::LMDB to CPAN and, along with some minor performance improvements in Attean (unreleased, available via GitHub for now), saw a large performance improvement on your
    use File::Temp qw(tempdir);
    my $path = tempdir(CLEANUP => 1);
    my $store = Attean->get_store('LMDB')->new(filename => $path, initialize => 1);
There are probably still improvements to be had with an actually lazy implementation of IRI, but I'd be interested to hear how a more performant store impacts your use cases. |
Thanks! I'll try it soon. |
@VladimirAlexiev is that 3.5Mb file available somewhere? I'd be happy to give it a try and profile the run to see where else might benefit from improvements. |
@VladimirAlexiev following up on the mention of the LMDB store, I just noticed that it requires manually installing LMDB as a system library. I had thought it was built-in to the LMDB_File module, but that seems not to be the case. I think it's still the best solution right now for performant use, but obviously might be an issue in some environments. I'll try to have a look at some of the more portable store options for improving performance (either improving the memory store or porting the SQLite store from RDF::Trine). |
@VladimirAlexiev It turns out I had the SQLite code sitting around unreleased, which I've now pushed to CPAN. So if the system library installation for LMDB is problematic, AtteanX::Store::DBI will probably be the next best option. To get a temporary SQLite store (in-memory only), do this:
    our $store = Attean->get_store('DBI')->temporary_store(); |
(Extracted from #153).
https://github.com/VladimirAlexiev/soml/tree/master/owl2soml/eg#schemaorg describes running the script https://github.com/VladimirAlexiev/soml/tree/master/owl2soml on schema.org renditions (508k ttl, 730k rdf, 808k jsonld). The script produces 428k yaml and takes substantial time to process: 4 minutes for ttl (Have not yet been able to make it run on jsonld and rdf).
My code doesn't use SPARQL (for now), just Model -> subjects/properties/objects/holds and Iter -> next/elements. What's the easiest way to profile this code?
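One low-tech way to get a first answer is to time each phase of the script with the core Time::HiRes module. The following is a minimal sketch under stated assumptions: `timed`, the `%elapsed` table, and the phase labels are hypothetical names invented for illustration and are not part of Attean, and the two workloads are stand-ins for the real parse and model-walking steps.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical helper (not part of Attean): run a code block and
# accumulate its wall-clock time under the given label.
my %elapsed;
sub timed {
    my ($label, $code) = @_;
    my $t0     = [gettimeofday];
    my @result = $code->();
    $elapsed{$label} += tv_interval($t0);
    return wantarray ? @result : $result[0];
}

# Stand-in workloads. In a real script these blocks would wrap the
# Turtle parse, the subjects/properties/objects walk, and the YAML output.
my $triples = timed(parse => sub { [ map { [ "s$_", 'p', "o$_" ] } 1 .. 8_800 ] });
my $count   = timed(walk  => sub { scalar grep { $_->[1] eq 'p' } @$triples });

# Report accumulated time per phase.
printf "%-6s %.4fs\n", $_, $elapsed{$_} for sort keys %elapsed;
```

Coarse timings like these only show which phase dominates. For per-subroutine and per-line detail, a profiler such as Devel::NYTProf can be run with `perl -d:NYTProf script.pl` followed by `nytprofhtml`.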
I suspect significant time is spent converting between Attean::IRI and URI (#151). I use lazy to suspend IRI parsing, but there is no such option for URI.
The Turtle file is 8.8k triples and RIOT takes 6s to convert it to ttl:
I wonder how long Attean would take on such a conversion...