WeeklyTelcon_20160906
Jeff Squyres edited this page Nov 18, 2016
- Dialup Info: (Do not post to public mailing list or public wiki)
- Attendees:
- Geoff Paulsen
- Jeff Squyres
- Artem Polyakov
- Brad Benton
- Geoffroy Vallee
- George
- Howard
- Josh Hursey
- Nathan Hjelm
- Ralph
- Sylvain Jeaugey
- Todd Kordenbrock
- Milestones
- 1.10.4
- Only potential blocker is issue with wrapper compiler.
- mpifort is not adding the rpath for the libraries it links.
- When doing C builds, add rpath to all dependent libraries during the build.
- Static builds on 1.10.
- 1.10.4 Released!
- Ralph will "bulk move" still open 1.10.4 PRs to 1.10.5.
- 2.0.1 is OUT
- Moving outstanding items to 2.0.2 or 2.1.0.
- Jeff and Howard pulled in some PRs for 2.0.2.
- coll_sync: a macro had a typo in it. It worked, but was wrong. Fixed.
- Figured out the bug with PowerPC atomics; there is a fix.
- Option A: re-enable PGI atomics and apply a patch.
- Option B: rewrite the atomics.
- Summary: a small number of asm files are used for atomics when inline atomics are not available.
- If only non-inline atomics are available and there is no asm file, configure fails badly.
- If only non-inline atomics are available but the asm file is stale, the build fails (PowerPC).
- Nathan Hjelm is proposing to remove the asm files, since all compilers we support have inline atomics.
- We had a check that said "if PGI, then just use the asm file."
- We should require PGI version > 10.8 (for inline atomics).
- NVIDIA (Sylvain) agreed this was okay.
- Paul filed a bug with PGI regarding the inline assembly fix.
- Schedule: end of October.
- Issue 2030: Comm Spawn is still broken; timeout in the OPAL_PMIX_Exchange macro. Fixed in master?
- Very hard to reproduce.
- Race condition that is triggered by MTT but not when run manually. We have seen this for years.
- Issue 2049
- Patcher issue: can't write to the page (a read-only page in shared code).
- Disabling the patcher framework fixes this.
- No OpenBSD drivers, since OpenBSD maps program shared pages read-only; Linux does not.
- Resolved NOT to support this on OpenBSD at this time.
- Issue 2028: SPML Yoda is not BTL 3.0 compliant
- Blocking issue for 2.1.0!
- Work was not done for OpenSHMEM.
- It still allocates a fragment.
- OpenSHMEM works with Open1, and whatever MXM flavors. ???
- Open question: who's going to fix this?
- Artem: Mellanox is now testing Yoda in their Jenkins.
- Suggest we remove the broken test from Mellanox Jenkins.
- Artem will fix now.
- Rework the way callbacks are done, and don't allocate a fragment for put and get.
- Hjelm can help by explaining how BTL 3.0 works.
- Significant degradation in message rates observed on master: Issue 1831
- Tested master from 2 days ago, so it includes all the MT fixes, etc.
- George is trying to figure out where the bandwidth/latency slowdown came in.
- Message rate was good again, but bandwidth/latency are not yet.
- Significantly slower for large messages on this machine, despite being configured with CMA.
- Really strange that vader is slower than SM, since they make the same calls. Bizarre!
- Looks like we went from in-cache to out-of-cache performance.
- Not a weird binding issue; George did more testing to rule that out.
- Need more people to try to reproduce this.
- Hidden in a message that Gilles sent today: a really funny bug.
- If you send a message on a communicator, free that communicator, allocate a new one, and THEN receive the message on the new communicator: if the message is small enough, it is sent eagerly and lands in comm->frags->cannot_match.
- Later, when a new communicator is created, that message can be matched on it.
- This happens because CIDs can be reused.
- NOT hard to fix; there are multiple ways.
- Doesn't happen in MPICH.
- The window is probably small; it needs a distributed system that is out of sync.
- Could split the CID into two parts.
- Why do we always return the lowest CID? Because fragmentation would be horrible if we didn't.
- Someone will file a bug about this. Need to think this through.
- Ralph sent out proposed language for a new contributor agreement. Need to talk to our legal departments.
- We've always had the by-laws on the wiki.
- Folks should comment, so we don't iterate with legal too many times.
- Once we've finalized it, we need to have an official vote.
- Don't know if the ompi_release -> ompi transition will be done by next Tuesday.
- Still pulling in the ready PRs.
- Need to cut a 2.0 branch from the v2.x branch.
- Review master MTT testing (https://mtt.open-mpi.org/)
- Master has a sea of red.
- Mellanox is pulling the Yoda issue out of Jenkins.
- Date of another face-to-face: January or February? Think about it and discuss next week.
- LANL, Houston, IBM
- Cisco, ORNL, UTK, NVIDIA
- Mellanox, Sandia, Intel