v2 pkpool #20
base: v2
Conversation
steal id = (rand (mod n-1)) + id + 1 (mod n) Current implementation has a lot of repeated calls to lc_pool_get_local that the compiler doesn't seem to fully get rid of. Maybe marking it with __attribute__((pure)) will help? Need to re-read docs to ensure we don't violate its constraints.
Populate p->context.poolid when allocating a packet and use lc_pool_put_to when returning it so that packets don't get stuck in the same pool. This reduces the number of retries needed, but may have other effects - needs testing.
src/medium.c (Outdated)

```diff
@@ -6,8 +6,7 @@
 lc_status lc_sendm(void* src, size_t size, int rank, int tag, lc_ep ep)
 {
   LC_POOL_GET_OR_RETN(ep->pkpool, p);
-  lci_pk_init(ep, (size > 1024) ? lc_pool_get_local(ep->pkpool) : -1,
-              LC_PROTO_DATA, p);
+  lci_pk_init(ep, lc_pool_get_local(ep->pkpool), LC_PROTO_DATA, p);
```
---
Note that in some benchmarks this can be an issue, since the cost of putting the data on a remote pool can be significant: it always guarantees a cache miss, versus returning it to its own pool, which is very cheap.
---
That's a fair point. In our use case, however, there is only ever one thread (per device, I guess) that can return packets. Is it then better to cause a cache miss now, or to allow a pool to empty, guaranteeing a cache miss later?
---
Agreed, there is no right or wrong here. Things like this need to be tuned and measured against application behavior. Hence it's better to give some flexibility for tuning: I would change 1024 to a constant, and if you set it to 0 the compiler will get rid of it.
Btw, note also that a cache miss on the progress thread trying to return to the original pool can be worse than one on the sender stealing. First, it delays progress; second, assuming we have a large number of threads doing sends, the cost is amortized, as you may have other threads doing useful work. Again, this may not matter in your case right now, since you have only one sender thread.
---
Right, I see your point.
The branch of PaRSEC I'm currently on is old and funnels all sends through the communication thread, but more recent versions can send from other threads as well, so this will become more relevant once I rebase.
---
I've remembered that, actually, the communication and progress threads are supposed to run on the same core (they're bound). This means that they should end up using the same pool. In PaRSEC/LCI, the user's main thread is actually the one to initialize LCI and thus fill the initial pool; the communication and progress threads are started later (and presumably run on a different core) and therefore start with an empty pool.
I'll run some more tests that discard the changes for medium and long sends but keep the remaining changes (most notably those in server_psm2.h).
Long-term, I'll probably introduce compile-time parameters for a) the size limit for returning medium-send packets and b) whether to return long-send packets.
---
It would be strange if they were using the same pool; please verify.
---
I'll verify tomorrow, but my intuition is that my statement above is correct. Both threads should be bound to the same core, so sched_getcpu should return the same value, leading to them using the same pool. This is PaRSEC-specific, though; in the future, if/when we enable using more than one device, we'll need better logic for thread placement (in PaRSEC).
---
Depending on the activation mode, the PaRSEC communication thread might not be bound to a core, but may instead be allowed to move around freely. You can be more restrictive and force a binding for the communication thread, in which case the resource will be assumed reserved and no computation thread will be allowed to use it.
---
I think the default (which I've been testing with) is to have it be bound. When we have a single progress thread, it makes sense to bind it to the communication thread, but if we have more (multiple NICs/hardware queues/"LCI devices") then the decision is less obvious. At that point, we'll want to introduce an MCA parameter to control the binding.
Vu noted that this should be tunable at compile time, as different applications/systems may want different behavior. Adds two compile-time parameter definitions in config.h:
- LC_PKT_RET_MED_SIZE: minimum size of a medium send for its packet to be returned to the sender's pool
- LC_PKT_RET_LONG: whether to return long-send packets to the sender's pool
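The corresponding config.h fragment might look like this (the default values shown are assumptions, not taken from the commit):

```c
/* Compile-time packet-return tunables (defaults shown are assumptions). */
#ifndef LC_PKT_RET_MED_SIZE
#define LC_PKT_RET_MED_SIZE 1024  /* min medium-send size to return the packet to the sender's pool */
#endif
#ifndef LC_PKT_RET_LONG
#define LC_PKT_RET_LONG 1         /* nonzero: return long-send packets to the sender's pool */
#endif
```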
If pool->npools == 1, there is no valid steal target: return the caller's own id. Also includes formatting fixes and changes discussed in #20.
Maybe reformulate this as
This lgtm now! Thanks, feel free to merge
081bab4 renames it:
Rename lc_pool_get_local to better reflect what it actually does.
Something here seems to break collectives (or at least barrier). I'm working on debugging this...
Collective in LCI v2? Do you configure it with sync? Can you reproduce it with any example code?
(Quoting Omri Mor, Jun 10, 2020:) Something here seems to break collectives (or at least barrier). I'm working on debugging this... The sends and receives seem to complete, but the synchronizer is never triggered. Let me know if you have any immediate thoughts?
Yes I did, and I just found the bug. It's a mistake of my own making: after finding a valid steal target, it would go on and try to steal from itself (line 114 in …). The collective examples all use a progress thread, which gets starved for packets, so the receive never completes.
461b6ed prevents steal attempts on the same core and, barring implementation issues, should be OK.
5eb5330 ends up with a better packet distribution, but may cause other issues; it might need more testing. I've only modified server_psm2.h, since server_ibv.h/server_ibv_helper.h already set p->context.poolid, and server_ofi.h is out of date and I'm not sure it compiles.