Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Couple fixes to fi_pingpong #9971

Merged
merged 1 commit into from Apr 4, 2024
Merged

Couple fixes to fi_pingpong #9971

merged 1 commit into from Apr 4, 2024

Conversation

ghost
Copy link

@ghost ghost commented Apr 2, 2024

First commit is cleanup. Second commit adds source address parameter to fix issue #9900, using fi_pingpong in multi-adapter environment.

@ghost ghost requested a review from j-xiong April 3, 2024 17:59
@zachdworkin
Copy link
Contributor

bot:aws:retest

@shijin-aws
Copy link
Contributor

@zachdworkin AWS CI failed due to fi_pingpong failed on Windows platform

15:13:10 [debug] C:\9971_1712094667\libfabric\util\pingpong.c:283 :  * PingPong options:
15:13:10 [debug] C:\9971_1712094667\libfabric\util\pingpong.c:285 :   - src_addr            : [None]
15:13:10 [debug] C:\9971_1712094667\libfabric\util\pingpong.c:286 :   - src_port            : [0]
15:13:10 [debug] C:\9971_1712094667\libfabric\util\pingpong.c:287 :   - dst_addr            : [None]
15:13:10 [debug] C:\9971_1712094667\libfabric\util\pingpong.c:288 :   - dst_port            : [0]
15:13:10 [debug] C:\9971_1712094667\libfabric\util\pingpong.c:289 :   - sizes_enabled       : default size = 64
15:13:10 [debug] C:\9971_1712094667\libfabric\util\pingpong.c:290 :   - iterations          : default iterations: 10
15:13:10 [debug] C:\9971_1712094667\libfabric\util\pingpong.c:293 :   - provider            : efa
15:13:10 [debug] C:\9971_1712094667\libfabric\util\pingpong.c:2237: Selected endpoint: RDM
15:13:10 [debug] C:\9971_1712094667\libfabric\util\pingpong.c:509 : Initializing control messages
15:13:10 socket(): C:\9971_1712094667\libfabric\util\pingpong.c:430 , ret=-22 (Invalid argument)
15:13:10 [debug] C:\9971_1712094667\libfabric\util\pingpong.c:1876: Freeing resources of test suite
15:13:10 [debug] C:\9971_1712094667\libfabric\util\pingpong.c:1910: Resources of test suite freed
15:13:10 [debug] C:\9971_1712094667\libfabric\util\pingpong.c:283 :  * PingPong options:
15:13:10 [debug] C:\9971_1712094667\libfabric\util\pingpong.c:285 :   - src_addr            : [None]
15:13:10 [debug] C:\9971_1712094667\libfabric\util\pingpong.c:286 :   - src_port            : [0]
15:13:10 [debug] C:\9971_1712094667\libfabric\util\pingpong.c:287 :   - dst_addr            : [127.0.0.1]
15:13:10 [debug] C:\9971_1712094667\libfabric\util\pingpong.c:288 :   - dst_port            : [0]
15:13:10 [debug] C:\9971_1712094667\libfabric\util\pingpong.c:289 :   - sizes_enabled       : default size = 64
15:13:10 [debug] C:\9971_1712094667\libfabric\util\pingpong.c:290 :   - iterations          : default iterations: 10
15:13:10 [debug] C:\9971_1712094667\libfabric\util\pingpong.c:293 :   - provider            : efa
15:13:10 [debug] C:\9971_1712094667\libfabric\util\pingpong.c:2237: Selected endpoint: RDM
15:13:10 [debug] C:\9971_1712094667\libfabric\util\pingpong.c:509 : Initializing control messages
15:13:10 [error] C:\9971_1712094667\libfabric\util\pingpong.c:321 : getaddrinfo : E
15:13:10 [debug] C:\9971_1712094667\libfabric\util\pingpong.c:1876: Freeing resources of test suite
15:13:10 [debug] C:\9971_1712094667\libfabric\util\pingpong.c:1910: Resources of test suite freed
15:13:10 2024-04-02 22:11:37,190 - INFO - test_runner_windows - Server process failed with return code 22
15:13:10 
15:13:10 Build step 'Execute shell' marked build as failure
15:13:10 Recording test results

@shijin-aws
Copy link
Contributor

shijin-aws commented Apr 3, 2024

Did your change modify the default behavior for OOB exchange?

@zachdworkin
Copy link
Contributor

@shijin-aws #9963 did modify the default behaviour for fabtests. The backlog went from 0 to 511. This prevents SYN Flooding on port 3000 when using oob exchange in fabtests. I did not change the behaviour outside of fabtests.

@shijin-aws
Copy link
Contributor

shijin-aws commented Apr 3, 2024

Is fi_ini necessarily called on Windows? Saw the commit that ofi_osd_init() is removed from the test because it is called in fi_ini, but I find fi_ini has pthread calls which may not be compiled on windows at all

@j-xiong
Copy link
Contributor

j-xiong commented Apr 3, 2024

fi_ini is always called from fi_getinfo and fi_fabric. There are stub definitions for pthread under include/windows.

@shijin-aws
Copy link
Contributor

Any ideas how can this change cause 15:13:10 socket(): C:\9971_1712094667\libfabric\util\pingpong.c:430 , ret=-22 (Invalid argument) ?

@ghost
Copy link
Author

ghost commented Apr 3, 2024

Any ideas how can this change cause 15:13:10 socket(): C:\9971_1712094667\libfabric\util\pingpong.c:430 , ret=-22 (Invalid argument) ?

It might be an ordering issue. ofi_socket is called before any calls to libfabric. I don't care that much about removing ofi_osd_init() call. I will drop that commit from this PR.

@shijin-aws
Copy link
Contributor

Any ideas how can this change cause 15:13:10 socket(): C:\9971_1712094667\libfabric\util\pingpong.c:430 , ret=-22 (Invalid argument) ?

It might be an ordering issue. ofi_socket is called before any calls to libfabric. I don't care that much about removing ofi_osd_init() call. I will drop that commit from this PR.

Ah, that makes sense. Thank you!

In multi-adapter environment, fi_pingpong is failing with -61,
No data available error when not using the first interface.

Add source address parameter and use it with hints src_addr to
get at the correct interface for data transfer.

Signed-off-by: Chien Tin Tung <[email protected]>
@shijin-aws
Copy link
Contributor

@chien-intel windows test passed for the latest push

@ghost
Copy link
Author

ghost commented Apr 3, 2024

Great, thanks!

@shijin-aws shijin-aws merged commit f9faa20 into ofiwg:main Apr 4, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants