Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallelize the pattern matcher #1645

Open
linas opened this issue Apr 25, 2018 · 9 comments
Open

Parallelize the pattern matcher #1645

linas opened this issue Apr 25, 2018 · 9 comments

Comments

@linas
Copy link
Member

linas commented Apr 25, 2018

See opencog/cogutil#108 for the discussion.

@vsbogd
Copy link
Contributor

vsbogd commented Apr 27, 2018

See email for the discussion.

I will continue discussion here as @linas suggested.

@linas, thank you for spotting NextLink issue #1507. Copying PatternMatchCallback was my first attempt to implement parallel matching. But many code parts expect that PatternMatchCallback.grounding() method collects the result and return it synchronously. And there are dozen different grounding() implementations which do it slightly differently.

NextLink idea can resolve this issue. Next steps as I see them:

  • add NextLink to collect PatternMatch results asynchronously
  • enable parallel matching using copy of PME and PMCB only for queries via NextLink (use single thread version for old clients)
  • migrate old clients client by client to asynchronous results collection

Does it makes sense?

@linas
Copy link
Member Author

linas commented Apr 27, 2018

OK. Working on NextLink is kind-of diving into the deep end of the pool; its not clear that you've learned how to swim, yet. For example, NextLink was not originally envisioned as a way to launch some heavy-weight, multi-threaded search; quite the opposite, I was thinking of "lazy evaluation", where it gives you just one answer at a time, only if and when you ask for it.

A better way of doing parallelization is called "scatter-gather", or "map-reduce", and NextLink is a poor API for map-reduce, and I have not spent any time thinking of a good atomese API for map-reduce/scatter-gather type parallel searching. Thus, the next step should really be "is there a good map-reduce-type API for pattern matching"?

@linas
Copy link
Member Author

linas commented Apr 27, 2018

So, perhaps the following might be an API:

    QuestionLink
         AnchorNode "FooLocation"
         NumberNode 42
         PatternLink
              ...

so that calling the c++ method QuestionLink::execute() will cause PatternLink::execute() to run (in parallel) until 42 answers are found, which are anchored to the AnchorNode. To obtain the answers, the user then calls

    GatherLink
        AnchorNode "FooLocation"

so that calling the c++ method GatherLink::execute() to return exactly one answer. The above QuestionLink/GatherLink pair form a scatter/gather or map/reduce type API for search.

For lazy evaluation, so that the QuestionLink does not launch any threads, and does no work at all, until GatherLink is called -- for that, perhaps NumberNode 0 could be used to indicate lazy evaluation.

The above proposal is .. I dunno .. its OK, but its not perfect; it seems to have some obvious issues and problems. If I spent half-the-day thinking about it, I might find better answers. Care to find better answers? Keep in mind that "better answers" may require implementing #1502 first. This would, in turn, unify the code base and functions for GetLink and MapLink into one. and have large, pervasive changes to the atomspace.

The problem there is that there are a bunch of dominoes all lined up, ready to fall, and parallelizing searches is a domino in the middle of the dominoe chain.

It really might be better if you initially focused on fixing some of the existing bugs, rather than tackling fundamental design issues in a major subsystem component ...

@ngeiswei
Copy link
Member

@linas said

It really might be better if you initially focused on fixing some of the existing bugs, rather than tackling fundamental design issues in a major subsystem component ...

That's a wise suggestion.

@vsbogd, I guess you could take some time to go through the github issues of the atomspace repo to see if you find something with the right level challenge for you (probably avoiding the ones labeled "quick task" as they are already too easy for you and best kept for newcomers). As well as obviously keeping improving the benchmark (like addressing opencog/benchmark#8) which is extremely worthwhile.

@linas
Copy link
Member Author

linas commented Jun 19, 2018

See also #1750 as a better way of publishing changes coming out from various subsystems.

@linas
Copy link
Member Author

linas commented Dec 26, 2019

This should be much easier, now, after pull reqs #2453 and #2455 -- See in particular:

// TODO: This is kind-of the main entry point into the CPU-cycle
// sucking part of the pattern search. It might be worth
// parallelizing at this point. That is, ***if*** the _search_set
// is large, or the pattern is large/complex, then it might be
// worth it to create N threads, and N copies of PatternMatchEngine
// and run one search per thread. Maybe. CAUTION: this is not
// always the bottleneck, and so adding heavy-weight thread
// initialization here might hurt some users. See the benchmark
// `nano-en.scm` in the benchmark get repo, for example.
// Note also: parallelizing will require adding locks to some
// portions of the callback, e.g. the `grounding()` callback,
// so that two parallel reports don't clobber each other.
PatternMatchEngine pme(pmc);
pme.set_pattern(*_variables, *_pattern);
for (const Handle& h : _search_set)

@linas
Copy link
Member Author

linas commented May 15, 2020

Pull reqs #2615 and #2616 fully parallelize the pattern engine. All unit tests pass. However ... the overhead of creating threads using the OMP for-each is huge: RandomUTest runs 25x slower and GetStateUTest runs 33x slower, so this code is disabled.

The correct fix might be to create a pool of hot "standby" threads, and pull from that pool whenever a thread is needed. Whether this can get performance back to an acceptable level is unknown.

I think the moral of the story is that most pattern-matches happen very quickly, and the overhead of parallelizing drowns the mechanism. A few pattern matches are slow, and these can be parallelized. But it's not clear how to know which is which.

@ngeiswei
Copy link
Member

A few pattern matches are slow, and these can be parallelized. But it's not clear how to know which is which.

Maybe one can come up with a quick estimation formula based on atomspace size, the number of pattern clauses and so. Interestingly with some rudimentary form of pattern indexing, such estimation could be even better.

@linas
Copy link
Member Author

linas commented Dec 4, 2021

Two proposals mentioned in other issues, but not mentioned here:

  • Introduce a ParallelAndLink that would run queries in parallel.
  • Set a Value on the pattern that says "run me in parallel mode".

An example of using the first solution would be something like (BindLink (ParallelAndLink stuff-to-match) rewrite-stuff) This way, we don't have to invent half-a-dozen different parallel variants. Along the way, maybe we could come up with a different name for AndLink when used in this way?

Anyway, all the basic support for this has already been implemented several years ago. The code has two different ways of doing this, depending on the compiler support for parallelization. So all that's missing is the "easy stuff" of creating the ParallelAndLink and wiring it into place. Note that QueueValue is 100% thread-safe.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants