WeeklyTelcon_20170530
Geoffrey Paulsen edited this page Jan 9, 2018
·
1 revision
- Dialup Info: (Do not post to public mailing list or public wiki)
- Geoff Paulsen
- Edgar Gabriel
- Artem Polyakov
- Jeff Squyres (Cisco)
- Howard
- Josh Hursey
- Joshua Ladd
- Murali (LLNL)
- Todd Kordenbrock
- David Bernholdt
- Nathan Hjelm
- Ralph
- Brian (Amazon)
Review All Open Blockers
- 1.10.x branch is closed, don't bother filing PRs.
- Howard is doing a few more items on the Checklist, with plans to release June 1st.
- Ralph closed some really old Issues that were not updated in a long time.
- This week - There are a lot of issues on the v2.0.x milestone. We'd like to move these to either v2.1.x or v3.x. Do we need to fix them in v2.0.x, or beyond?
- EVERYONE please review open v2.0.x Issues.
- Close them if it's already addressed.
- Move them if needed.
- No update here. No reason to update, or Schedule for next release.
- Take this offline to talk at face to face:
- Issue 3442 - 32-bit builds are busted; this probably affects v2.1.x also.
- Could be an exotic architecture issue, or possibly our CMA glue just isn't right. CMA seems to be masking the issue?
Review Milestones v3.0
- Ralph created an unofficial RC1 of PMIx 2.0
- updated master with this.
- Rolled this into giant orted PR.
- Some discussion last week about Checkpoint restart -
- Think we decided that they'd take CR 3554 (removes various subcomponents).
- We still need a PR to remove CR from v3.x (leave it in master).
- Brian is in driver seat for RCs on this one.
- Howard hasn't been able to talk to Brian in 2 weeks. He will reach out.
- The v3.x to v3.0.x branch rename didn't happen.
- Would like to do this tonight, after Pacific-time working hours.
- Open PRs will have to be re-created.
- Brian will send out email to devel.
- When we did v2.x we pulled Checkpoint Restart out of master; then remove it from v3.x/v3.0.x also.
- Brian will do this after the rename.
- Schedule for v3.0.0
- branch rename tonight.
- pull in PMIx orted changes, and PMIx v2.0
Review Master Pull Requests
Review Master MTT testing
- Still seeing some 'make check' errors, which is disturbing.
- Jeff hasn't been able to focus on that.
- Some kind of compile error, but not seeing the compile error.
- Seen on 32-bit; 64-bit is fine.
- Should clean up compiler warnings.
- MPI_Sendrecv_replace - seems to fail consistently.
- Simply a large send, managed CUDA.
- Timeouts are all CUDA related - nvidia.
- Issue: using the Red Hat stock autoconf (rather than building our own)
- Someone added an autogen requirement for the "correct" 1.15 version.
- The update broke Travis, but Travis always breaks (bad).
- The website specifies the versions of automake / autoconf we require.
- There is a bug in 1.14, so everyone jumped to 1.15 (though 1.12 is reported to work).
- We should not merge things to master if the PR checker breaks.
- https://github.com/open-mpi/ompi/pull/3602 - make autoconf track posted requirements.
- PMIx requires 1.15 - got dinged that they weren't checking for version of autoconf that website says we require.
- Came in on Thursday, and started failing when we recurse down there.
- Came up on the mailing list - we do sometimes get people reporting the 1.14 bug, because there is no requirement check.
- ACTION - Brian will update PRchecker / CI to use correct version of autogen.
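The version-requirement discussion above comes down to comparing an installed tool's reported version against a posted minimum. The following is only a minimal sketch of that comparison, not the check that autogen or PR 3602 actually performs; the function names `parse_version` and `meets_requirement` and the sample `--version` strings are hypothetical.

```python
import re


def parse_version(text):
    """Extract the first dotted version number from a `--version` style line.

    Returns a tuple of ints so that 1.9 sorts below 1.15, which a plain
    string comparison would get wrong.
    """
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_requirement(version_line, minimum):
    """Return True if the reported version is at least the required minimum."""
    return parse_version(version_line) >= parse_version(minimum)


if __name__ == "__main__":
    # Hypothetical sample lines; a real check would read the tool's own output.
    for line in ("tool (GNU tool) 1.14.1", "tool (GNU tool) 1.15"):
        print(line, "->", meets_requirement(line, "1.15"))
```

Comparing integer tuples rather than strings is the one design point worth noting here, since version strings such as 1.9 and 1.15 do not order correctly as text.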
- Face2Face Meeting-2017-07
- Date: July 11-13 (9am Tuesday - noon on Thursday).
- Cisco has booked space in Chicago.
- Cisco has reserved some space right next to O'Hare (can get shuttle to hotel).
- we have met there before.
- Jeff will come in Monday evening.
- Ralph's goal is to get all PMIx runtime bugs into v3.0, make it as clean as possible.
- Scalability? All the scaling fixes are in there, with the exception of 3 mappers, which could be updated if someone wanted to (to improve scalability with the new PMIx 2.0 way of doing things):
- Non-updated (anyone interested?): sequential mapper, rankfile, and min-dist mappers
- Cisco, ORNL, UTK, NVIDIA
- Mellanox, Sandia, Intel
- LANL, Houston, IBM, Fujitsu