Skip to content
This repository has been archived by the owner on Jan 20, 2022. It is now read-only.

[CI] LTP: Add MAP_PRIVATE patch to LTP sources and enable LTP in SGX CI #2187

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jinengandhi-intel
Copy link
Contributor

@jinengandhi-intel jinengandhi-intel commented Feb 23, 2021

Description of the changes

To run LTP tests with SGX (which doesn't support shared memory mappings), as a workaround changed the mmap flag from MAP_SHARED to MAP_PRIVATE (unintuitively) the LTP test execution works correctly with this change.

How to test this PR?


This change is Reviewable

@mkow
Copy link
Member

mkow commented Feb 23, 2021

Jenkins, test this please

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r1.
Reviewable status: all files reviewed, 9 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @jinengandhi-intel)

a discussion (no related file):
Really nice PR!


a discussion (no related file):
If you're interested, that's how this PR fails on Jenkins:

+ make SGX=1 ltp-sgx.xml
./runltp_xml.py  -c ltp.cfg -c ltp-sgx.cfg -c ltp-bug-1075.cfg /home/jenkins/workspace/graphene-sgx-18.04-apps/LibOS/shim/test/ltp//install/runtest/syscalls -O ltp-sgx.xml
Cancelling nested steps due to timeout
Sending interrupt signal to process
Terminated
Makefile:68: recipe for target 'ltp-sgx.xml' failed


.ci/lib/stage-test-sgx.jenkinsfile, line 2 at r1 (raw file):

stage('test-sgx') {
    timeout(time: 15, unit: 'MINUTES') {

It looks like 15 minutes is not enough for our wimpy Jenkins machines (they are really-really slow 2-physical-cores SGX machines with 128MB of EPC).


.ci/lib/stage-test-sgx.jenkinsfile, line 7 at r1 (raw file):

                cd LibOS/shim/test/ltp

                # To run ltp tests with GSGX and work around the mmap issue,

We don't use GSGX in our repo, let's just change to SGX


LibOS/shim/test/ltp/ltp.cfg, line 2026 at r1 (raw file):

[sendfile06_64]
skip = yes

Could you remove this redundant newline?


LibOS/shim/test/ltp/ltp-bug-1075.cfg, line 28 at r1 (raw file):

skip = yes

[chmod03]

Why did you add this test, it seems to have worked before?


LibOS/shim/test/ltp/ltp-bug-1075.cfg, line 31 at r1 (raw file):

skip = yes

[chmod04]

Why did you add this test, it seems to have worked before?


LibOS/shim/test/ltp/ltp-bug-1075.cfg, line 79 at r1 (raw file):

skip = yes

[epoll_wait01]

Why did you add this test, it seems to have worked before?


LibOS/shim/test/ltp/ltp-bug-1075.cfg, line 82 at r1 (raw file):

skip = yes

[epoll_wait03]

Why did you add this test, it seems to have worked before?

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 9 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @jinengandhi-intel)


.ci/lib/stage-test-sgx.jenkinsfile, line 2 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

It looks like 15 minutes is not enough for our wimpy Jenkins machines (they are really-really slow 2-physical-cores SGX machines with 128MB of EPC).

I suggest to start with say 30 minutes, and then (if it succeeds), decrease in intervals of 5 minutes.

Copy link
Contributor Author

@jinengandhi-intel jinengandhi-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 9 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv and @jinengandhi-intel)


.ci/lib/stage-test-sgx.jenkinsfile, line 2 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I suggest to start with say 30 minutes, and then (if it succeeds), decrease in intervals of 5 minutes.

At least on my machine it worked within 15mins, will increase it to 30 to be on the safe side.


.ci/lib/stage-test-sgx.jenkinsfile, line 7 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

We don't use GSGX in our repo, let's just change to SGX

Done


LibOS/shim/test/ltp/ltp.cfg, line 2026 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Could you remove this redundant newline?

Done


LibOS/shim/test/ltp/ltp-bug-1075.cfg, line 28 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why did you add this test, it seems to have worked before?

At least on our servers it has never worked with SGX:

SGX=1 ./pal_loader chmod03
WARNING: Using insecure argv source. Don't use this configuration in production!
Accessing file:etc/nsswitch.conf is denied (Operation denied). This file is not trusted or allowed. Trusted files should be regular files (seekable).
Accessing file:/lib/x86_64-linux-gnu/libnss_files.so.2 is denied (Operation denied). This file is not trusted or allowed. Trusted files should be regular files (seekable).
chmod03 1 TBROK : /home/rasp/jinen_workpsace/graphene/LibOS/shim/test/ltp/src/testcases/kernel/syscalls/chmod/chmod03.c:151: getpwnam failed: errno=EACCES(13): Permission denied
chmod03 2 TBROK : /home/rasp/jinen_workpsace/graphene/LibOS/shim/test/ltp/src/testcases/kernel/syscalls/chmod/chmod03.c:151: Remaining cases broken

But instead of adding to this file where we track the mmap failures, I have moved the skip to the ltp-sgx.cfg.


LibOS/shim/test/ltp/ltp-bug-1075.cfg, line 31 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why did you add this test, it seems to have worked before?

Same as above, the test has always failed for us.

SGX=1 ./pal_loader chmod04
WARNING: Using insecure argv source. Don't use this configuration in production!
Accessing file:etc/nsswitch.conf is denied (Operation denied). This file is not trusted or allowed. Trusted files should be regular files (seekable).
Accessing file:/lib/x86_64-linux-gnu/libnss_files.so.2 is denied (Operation denied). This file is not trusted or allowed. Trusted files should be regular files (seekable).
chmod04 1 TBROK : /home/rasp/jinen_workpsace/graphene/LibOS/shim/test/ltp/src/lib/tst_sig.c:234: unexpected signal SIGSEGV(11) received (pid = 1).
chmod04 2 TBROK : /home/rasp/jinen_workpsace/graphene/LibOS/shim/test/ltp/src/lib/tst_sig.c:234: Remaining cases broken
chmod04 0 TWARN : /home/rasp/jinen_workpsace/graphene/LibOS/shim/test/ltp/src/lib/tst_tmpdir.c:319: tst_rmdir: TESTDIR was NULL; no removal attempted

Have moved the skip to ltp-sgx.cfg as this is not related to mmap issue.


LibOS/shim/test/ltp/ltp-bug-1075.cfg, line 79 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why did you add this test, it seems to have worked before?

This test originally failed with the mmap shared error:

SGX=1 ./pal_loader epoll_wait01
WARNING: Using insecure argv source. Don't use this configuration in production!
file_map does not currently support writable pass-through mappings on SGX. You may add the PAL_PROT_WRITECOPY (MAP_PRIVATE) flag to your file mapping to keep the writes inside the enclave but they won't be reflected outside of the enclave.
/home/rasp/jinen_workpsace/graphene/LibOS/shim/test/ltp/src/lib/tst_test.c:108: TBROK: mmap((nil),4096,3,1,3,0) failed: EACCES (13)

After the fix, it fails with the below error which I need to analyze:

SGX=1 ./pal_loader epoll_wait01
WARNING: Using insecure argv source. Don't use this configuration in production!
/home/rasp/jinen_workpsace/graphene/LibOS/shim/test/ltp/src/lib/tst_test.c:1250: TINFO: Timeout per run is 0h 05m 00s
/home/rasp/jinen_workpsace/graphene/LibOS/shim/test/ltp/src/lib/safe_macros.c:261: TBROK: /home/rasp/jinen_workpsace/graphene/LibOS/shim/test/ltp/src/testcases/kernel/syscalls/epoll_wait/epoll_wait01.c:43: read(3,0x3cbac80,45056) failed, returned 4096: EINVAL (22)

Again moving the skip statement to ltp-sgx.cfg.


LibOS/shim/test/ltp/ltp-bug-1075.cfg, line 82 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why did you add this test, it seems to have worked before?

The test does work now but not all subtests pass:

SGX=1 ./pal_loader epoll_wait03
WARNING: Using insecure argv source. Don't use this configuration in production!
epoll_wait03 1 TPASS : epoll_wait() fails as expected: TEST_ERRNO=EBADF(9): Bad file descriptor
epoll_wait03 2 TPASS : epoll_wait() fails as expected: TEST_ERRNO=EINVAL(22): Invalid argument
epoll_wait03 3 TPASS : epoll_wait() fails as expected: TEST_ERRNO=EINVAL(22): Invalid argument
epoll_wait03 4 TPASS : epoll_wait() fails as expected: TEST_ERRNO=EINVAL(22): Invalid argument
epoll_wait03 5 TFAIL : /home/rasp/jinen_workpsace/graphene/LibOS/shim/test/ltp/src/testcases/kernel/syscalls/epoll_wait/epoll_wait03.c:129: epoll_wait() succeed unexpectedly

I want to analyze the failing test first, post which I will create a must-pass struct for this.

We have more of such tests that we plan to cover as part of another PR. For now, I will let the skip statement be in the ltp-sgx.cfg

@jinengandhi-intel jinengandhi-intel force-pushed the ltp_exp branch 2 times, most recently from a9db9e4 to 1781a1a Compare February 24, 2021 04:28
Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r3.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @jinengandhi-intel)


LibOS/shim/test/ltp/ltp-bug-1075.cfg, line 28 at r1 (raw file):

Previously, jinengandhi-intel wrote…

At least on our servers it has never worked with SGX:

SGX=1 ./pal_loader chmod03
WARNING: Using insecure argv source. Don't use this configuration in production!
Accessing file:etc/nsswitch.conf is denied (Operation denied). This file is not trusted or allowed. Trusted files should be regular files (seekable).
Accessing file:/lib/x86_64-linux-gnu/libnss_files.so.2 is denied (Operation denied). This file is not trusted or allowed. Trusted files should be regular files (seekable).
chmod03 1 TBROK : /home/rasp/jinen_workpsace/graphene/LibOS/shim/test/ltp/src/testcases/kernel/syscalls/chmod/chmod03.c:151: getpwnam failed: errno=EACCES(13): Permission denied
chmod03 2 TBROK : /home/rasp/jinen_workpsace/graphene/LibOS/shim/test/ltp/src/testcases/kernel/syscalls/chmod/chmod03.c:151: Remaining cases broken

But instead of adding to this file where we track the mmap failures, I have moved the skip to the ltp-sgx.cfg.

Ok, thanks, I agree with moving these tests to SGX-specific file.

@dimakuv
Copy link

dimakuv commented Feb 24, 2021

Jenkins, test this please

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @jinengandhi-intel)

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

If you're interested, that's how this PR fails on Jenkins:

+ make SGX=1 ltp-sgx.xml
./runltp_xml.py  -c ltp.cfg -c ltp-sgx.cfg -c ltp-bug-1075.cfg /home/jenkins/workspace/graphene-sgx-18.04-apps/LibOS/shim/test/ltp//install/runtest/syscalls -O ltp-sgx.xml
Cancelling nested steps due to timeout
Sending interrupt signal to process
Terminated
Makefile:68: recipe for target 'ltp-sgx.xml' failed

Ok, it got better with a 30-minute timeout. The tests finish but one test fails:

./runltp_xml.py  -c ltp.cfg -c ltp-sgx.cfg -c ltp-bug-1075.cfg /home/jenkins/workspace/graphene-sgx-apps/LibOS/shim/test/ltp//install/runtest/syscalls -O ltp-sgx.xml
2021-02-24 08:02:36,922 LTP.setrlimit01: -> ERROR (Timed out after 45.0 s.)
2021-02-24 08:06:20,870 LTP: LTP finished tests=1327 passed=308 failures=0 errors=1 skipped=1018 returncode=1

Is there anything special about setrlimit01? Are you sure this test works under SGX?

Also, in another run kill11 failed under SGX, but I think this is a known intermittent issue.


Copy link
Contributor Author

@jinengandhi-intel jinengandhi-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv)

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Ok, it got better with a 30-minute timeout. The tests finish but one test fails:

./runltp_xml.py  -c ltp.cfg -c ltp-sgx.cfg -c ltp-bug-1075.cfg /home/jenkins/workspace/graphene-sgx-apps/LibOS/shim/test/ltp//install/runtest/syscalls -O ltp-sgx.xml
2021-02-24 08:02:36,922 LTP.setrlimit01: -> ERROR (Timed out after 45.0 s.)
2021-02-24 08:06:20,870 LTP: LTP finished tests=1327 passed=308 failures=0 errors=1 skipped=1018 returncode=1

Is there anything special about setrlimit01? Are you sure this test works under SGX?

Also, in another run kill11 failed under SGX, but I think this is a known intermittent issue.

Thanks Dmitrii for doing the analysis! Not sure if it is one of those inconsistent tests, it had passed on our server for sure, will try to run the test in a loop and check the output.


Copy link
Contributor Author

@jinengandhi-intel jinengandhi-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 3 of 4 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv)

a discussion (no related file):

Previously, jinengandhi-intel wrote…

Thanks Dmitrii for doing the analysis! Not sure if it is one of those inconsistent tests, it had passed on our server for sure, will try to run the test in a loop and check the output.

For now have marked the tests as skipped. Can someone trigger a new build? Hopefully no more test failures.


Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r4.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @jinengandhi-intel)

a discussion (no related file):

Previously, jinengandhi-intel wrote…

For now have marked the tests as skipped. Can someone trigger a new build? Hopefully no more test failures.

I will trigger Jenkins now.


@dimakuv
Copy link

dimakuv commented Feb 25, 2021

Jenkins, test this please

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @jinengandhi-intel)

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I will trigger Jenkins now.

Looks like it's working but there is this error during Jenkins cleanup:

              ERROR: Files modified by build, but not gitignored:
--------------------------------------------------------------------------------
 M LibOS/shim/test/ltp/src
Entering 'LibOS/shim/test/ltp/src'
 M lib/tst_test.c
Entering 'LibOS/shim/test/ltp/src/testcases/kernel/mce-test'

This is because you modify lib/tst_test.c manually. You should add it to gitignore: https://github.com/oscarlab/graphene/blob/master/LibOS/shim/test/ltp/.gitignore


Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @jinengandhi-intel)

a discussion (no related file):
Hm, even 30 minutes is not enough. Please note that 30-minute limit includes building SGX-enabled LTP tests as well as running them. Are you sure 15 minutes that you reported previously included both these steps?

Anyway, please increase the limit to say 45 minutes. That's getting sad though :(


Copy link
Contributor Author

@jinengandhi-intel jinengandhi-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv)

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Looks like it's working but there is this error during Jenkins cleanup:

              ERROR: Files modified by build, but not gitignored:
--------------------------------------------------------------------------------
 M LibOS/shim/test/ltp/src
Entering 'LibOS/shim/test/ltp/src'
 M lib/tst_test.c
Entering 'LibOS/shim/test/ltp/src/testcases/kernel/mce-test'

This is because you modify lib/tst_test.c manually. You should add it to gitignore: https://github.com/oscarlab/graphene/blob/master/LibOS/shim/test/ltp/.gitignore

Will add it.


a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Hm, even 30 minutes is not enough. Please note that 30-minute limit includes building SGX-enabled LTP tests as well as running them. Are you sure 15 minutes that you reported previously included both these steps?

Anyway, please increase the limit to say 45 minutes. That's getting sad though :(

What's the difference between graphene-sgx-apps and graphene-sgx-18.04-apps jobs? One full run of the former job completed in 36 minutes (including the LTP tests passing with GSX), whereas here with the latter it timed out just running the LTP tests.


Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv and @jinengandhi-intel)

a discussion (no related file):

Previously, jinengandhi-intel wrote…

What's the difference between graphene-sgx-apps and graphene-sgx-18.04-apps jobs? One full run of the former job completed in 36 minutes (including the LTP tests passing with GSX), whereas here with the latter it timed out just running the LTP tests.

Our Jenkins machines are quite different in terms of performance, from what I know (some have more CPUs, some have less; some have better SSD drives, some have worse HDD drives). So it could be that one of these jobs landed on a wimpy machine, whereas the other landed on a fast machine.

In terms of difference between graphene-sgx-apps and graphene-sgx-18.04-apps jobs -- there is no difference.


@mkow mkow changed the title Increase test coverage for LTP running with Graphene SGX. [CI] Run LTP on Linux-SGX Feb 26, 2021
@mkow mkow requested a review from woju February 26, 2021 01:00
Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 4 files at r3, 1 of 1 files at r4.
Reviewable status: all files reviewed, 6 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv, @jinengandhi-intel, and @woju)

a discussion (no related file):
@jinengandhi-intel Could you please rewrite your commit messages to the format/convention we use? See https://graphene.readthedocs.io/en/latest/devel/contributing.html and https://commit.style/.

Adding LTP test suite to run with GSGX

GSGX is the name of our deprecated driver (see https://github.com/oscarlab/graphene-sgx-driver/). Please take a look at the contributing guide and use a proper component tag instead :)



.ci/lib/stage-test-sgx.jenkinsfile, line 7 at r4 (raw file):

                cd LibOS/shim/test/ltp

                # To run ltp tests with SGX and work around the mmap issue,

ltp -> LTP


.ci/lib/stage-test-sgx.jenkinsfile, line 7 at r4 (raw file):

the mmap issue

I think there isn't enough context here for the reader to know what "mmap issue" you have in mind. Could you briefly describe that LTP uses shared memory, which is not supported on SGX, so we need this hack? (and that unintuitively, it actually seems that LTP works with it)


.ci/lib/stage-test-sgx.jenkinsfile, line 13 at r4 (raw file):


                make ${MAKEOPTS} all sgx-tokens
                make SGX=1 ltp-sgx.xml

SGX is already set in env, no need to hardcode it here (see e.g. .ci/linux-sgx-ubuntu18.04-gcc-release-apps.jenkinsfile)

@jinengandhi-intel jinengandhi-intel force-pushed the ltp_exp branch 2 times, most recently from 656db94 to 3edc1e8 Compare February 26, 2021 03:36
Copy link
Contributor Author

@jinengandhi-intel jinengandhi-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 3 of 6 files reviewed, 6 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv, @mkow, and @woju)

a discussion (no related file):

Previously, jinengandhi-intel wrote…

Will add it.

Done.


a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Our Jenkins machines are quite different in terms of performance, from what I know (some have more CPUs, some have less; some have better SSD drives, some have worse HDD drives). So it could be that one of these jobs landed on a wimpy machine, whereas the other landed on a fast machine.

In terms of difference between graphene-sgx-apps and graphene-sgx-18.04-apps jobs -- there is no difference.

Increased the timeout to 45mins


a discussion (no related file):

Previously, mkow (Michał Kowalczyk) wrote…

@jinengandhi-intel Could you please rewrite your commit messages to the format/convention we use? See https://graphene.readthedocs.io/en/latest/devel/contributing.html and https://commit.style/.

Adding LTP test suite to run with GSGX

GSGX is the name of our deprecated driver (see https://github.com/oscarlab/graphene-sgx-driver/). Please take a look at the contributing guide and use a proper component tag instead :)

Done.



.ci/lib/stage-test-sgx.jenkinsfile, line 7 at r4 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ltp -> LTP

Done.


.ci/lib/stage-test-sgx.jenkinsfile, line 7 at r4 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…
the mmap issue

I think there isn't enough context here for the reader to know what "mmap issue" you have in mind. Could you briefly describe that LTP uses shared memory, which is not supported on SGX, so we need this hack? (and that unintuitively, it actually seems that LTP works with it)

Done.


.ci/lib/stage-test-sgx.jenkinsfile, line 13 at r4 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

SGX is already set in env, no need to hardcode it here (see e.g. .ci/linux-sgx-ubuntu18.04-gcc-release-apps.jenkinsfile)

Done.

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r5.
Reviewable status: all files reviewed, 8 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @jinengandhi-intel, @mkow, and @woju)

a discussion (no related file):

Previously, jinengandhi-intel wrote…

Increased the timeout to 45mins

Let's try now.



.gitignore, line 3 at r5 (raw file):

/build
/install
/LibOS/shim/test/ltp/src

This will probably be not needed if you modify the other gitignore to /src/lib/tst_test.c.


.ci/lib/stage-test-sgx.jenkinsfile, line 10 at r5 (raw file):

                # is basically that LTP uses shared memory that isn't
                # supported on SGX, as a workaround we changed the mmap flag
                # from MAP_SHARED to MAP_PRIVATE inside the LTP framework.

Slightly better wording:

# To run LTP tests with SGX (which doesn't support shared memory mappings),
# as a workaround we change the mmap flag from MAP_SHARED to MAP_PRIVATE
# (unintuitively, the LTP framework still works correctly with this change)

LibOS/shim/test/ltp/.gitignore, line 5 at r5 (raw file):

/ltp*.xml
/pal_loader
lib/tst_test.c

Shouldn't it be /src/lib/tst_test.c?

@dimakuv
Copy link

dimakuv commented Feb 26, 2021

Jenkins, test this please

Copy link
Contributor Author

@jinengandhi-intel jinengandhi-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 9 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @jinengandhi-intel, @mkow, and @woju)

a discussion (no related file):
This time graphene-debug-18.04 job failed with mkdir09 test (timedout)

./runltp_xml.py -c ltp.cfg /home/jenkins/workspace/graphene-debug-18.04/LibOS/shim/test/ltp//install/runtest/syscalls -O ltp.xml
2021-02-26 07:31:31,166 LTP.mkdir09: cannot extract stdio on python < 3.7
2021-02-26 07:31:31,166 LTP.mkdir09: -> ERROR (Timed out after 30.0 s.)
2021-02-26 07:31:41,025 LTP: LTP finished tests=1327 passed=359 failures=0 errors=1 skipped=967 returncode=1
Makefile:68: recipe for target 'ltp.xml' failed

2 other SGX jobs failed with gitignore errors. I had updated the files but will re-check what went wrong there.


Copy link
Contributor Author

@jinengandhi-intel jinengandhi-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 3 of 5 files reviewed, 9 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv, @jinengandhi-intel, @mkow, and @woju)


LibOS/shim/test/ltp/.gitignore, line 5 at r5 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Shouldn't it be /src/lib/tst_test.c?

Instead in the gitmodules file I added the ignore=dirty which will ignore the file changes in ltp/src.

mkow
mkow previously approved these changes Mar 15, 2021
Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r14.
Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL) (waiting on @dimakuv and @woju)

Copy link
Contributor Author

@jinengandhi-intel jinengandhi-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL) (waiting on @dimakuv and @woju)


LibOS/shim/test/ltp/ltp_sgx_mmap_fix.patch, line 3 at r11 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

+1, I don't think they'd accept such a patch. It should be rather done by reworking their IPC, not just patching it out, to be upstreamed.

Done.

Copy link
Member

@woju woju left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 6 files at r10, 1 of 4 files at r11, 3 of 4 files at r13.
Reviewable status: all files reviewed, 3 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL) (waiting on @dimakuv and @jinengandhi-intel)


LibOS/shim/test/ltp/ltp_sgx_mmap_fix.patch, line 3 at r11 (raw file):

Previously, jinengandhi-intel wrote…

Done.

As I said, this patch is not meant for real upstreaming, it rather should be thrown at the mailing list to bait upstream response, i.e. to have them say "this is wrong, because $X", as we don't know LTP enought to know $X. For example it might cause fake TPASSes.

Anyway, I'm resolving this thread, because it looks OT.


LibOS/shim/test/ltp/Makefile, line 28 at r12 (raw file):

Previously, jinengandhi-intel wrote…

Done.

I don't think this might be delayed to another PR. Unless someone demonstrated that this patch actually doesn't break the tests also without SGX (i.e., even those that are skipped in SGX, but run without SGX).

If we leave this as is, someone is bound to execute the suite with the patch applied, but under non-sgx graphene.

Copy link
Contributor Author

@jinengandhi-intel jinengandhi-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 3 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL) (waiting on @dimakuv, @mkow, and @woju)

a discussion (no related file):

Previously, mkow (Michał Kowalczyk) wrote…

It should be [CI] LTP: <commit message onliner here> (LTP is a sub-component of LTP, or at least we treat it so)

Done from my side.



LibOS/shim/test/ltp/Makefile, line 24 at r12 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

+1 to woju, I'd remove this for simplicity (and because the patch checking is a bit hacky). If someone manually changes the files then it's expected that there will be problems.
Also, right now you just silently skip patching in case of conflicts, so if someone did an unrelated change which conflicts with the patch, then the patch will be silently skipped.

Done


LibOS/shim/test/ltp/Makefile, line 28 at r12 (raw file):

Previously, woju (Wojtek Porczyk) wrote…

I don't think this might be delayed to another PR. Unless someone demonstrated that this patch actually doesn't break the tests also without SGX (i.e., even those that are skipped in SGX, but run without SGX).

If we leave this as is, someone is bound to execute the suite with the patch applied, but under non-sgx graphene.

I ran the LTP tests with Graphene non-SGX and see regression only for 2 tests, accept4_01 and select4, in both the cases all expected test pass but they each have a test that is skipped due to TCONF syscall not supported error. As I have already mentioned before, this changes does have some impact on the LTP parser and hence these tests are categorized as FAIL instead of PASS.

So it's not like this change breaks everything in non-SGX mode, also like I have said before we can make the community aware of such a limitation while we push the fix in a separate PR.

Copy link
Member

@woju woju left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 3 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL) (waiting on @dimakuv and @jinengandhi-intel)


LibOS/shim/test/ltp/Makefile, line 28 at r12 (raw file):

Previously, jinengandhi-intel wrote…

I ran the LTP tests with Graphene non-SGX and see regression only for 2 tests, accept4_01 and select4, in both the cases all expected test pass but they each have a test that is skipped due to TCONF syscall not supported error. As I have already mentioned before, this changes does have some impact on the LTP parser and hence these tests are categorized as FAIL instead of PASS.

So it's not like this change breaks everything in non-SGX mode, also like I have said before we can make the community aware of such a limitation while we push the fix in a separate PR.

"Only" for 2 tests?!

So no, this MAP_PRIVATE patch is too hacky for me then. I haven't seen the LTP with this patch been properly verified and validated to actually accomplish its goals. This patch severely lowers confidence in the test suite without any attempt to make convincing argument that LTP still works as advertised (that is, it PASSes when it should, and also FAILs when it should) beyond "I ran this once and it only 2 tests failed".

So I insist that some measure is needed which would satisfy that this patch won't be applied to LTP that is run for non-SGX graphene. I.e., if user runs make SGX=1 regression && make regression, the second make should run unmodified LTP. I suggest that different builddir is created for LTP with this patch. If LTP build system doesn't support such a case, you can always copy the directory before applying the patch and compile in different directory depending on $(SGX) variable.

Please do it in this PR, because I don't believe it would be ever done if postponed to a "separate PR".

Copy link
Contributor Author

@jinengandhi-intel jinengandhi-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 7 of 9 files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv, @mkow, and @woju)


LibOS/shim/test/ltp/Makefile, line 28 at r12 (raw file):

Previously, woju (Wojtek Porczyk) wrote…

"Only" for 2 tests?!

So no, this MAP_PRIVATE patch is too hacky for me then. I haven't seen the LTP with this patch been properly verified and validated to actually accomplish its goals. This patch severely lowers confidence in the test suite without any attempt to make convincing argument that LTP still works as advertised (that is, it PASSes when it should, and also FAILs when it should) beyond "I ran this once and it only 2 tests failed".

So I insist that some measure is needed which would satisfy that this patch won't be applied to LTP that is run for non-SGX graphene. I.e., if user runs make SGX=1 regression && make regression, the second make should run unmodified LTP. I suggest that different builddir is created for LTP with this patch. If LTP build system doesn't support such a case, you can always copy the directory before applying the patch and compile in different directory depending on $(SGX) variable.

Please do it in this PR, because I don't believe it would be ever done if postponed to a "separate PR".

Based on whether SGX is set or not, I set the BUILDDIR and INSTALLDIR in the makevars.mk, which then flows in to the Makefile. In the Makefile the PATCH_APPLIED will be called or both SGX and non-SGX, simply because for Graphene non-SGX you would need to revert the patch. The solution works for both Graphene non-SGX and SGX where the patch is not applied for SGX and for non-SGX even if there are artifacts from the SGX build, it will be reversed.


LibOS/shim/test/ltp/Makefile, line 86 at r15 (raw file):

	# LTPSCENARIO := $(INSTALLDIR)/$(LTPSCENARIO)
	./contrib/conf_lint.py $(CFG) --scenario $(LTPSCENARIO)
	./runltp_xml.py $(RUNLTPOPTS) $(foreach cfg,$(CFG),-c $(cfg)) $(LTPSCENARIO) -o ltproot=$(LTPROOT) -O $@

Additionally had to add this as runltp_xml.py by default uses the ltproot as install/testcases/bin and here for SGX I will pass the install-sgx/testcases/bin

@jinengandhi-intel
Copy link
Contributor Author

Jenkins, retest Jenkins-SGX-apps please

@jinengandhi-intel
Copy link
Contributor Author

Jenkins, retest Jenkins-18.04 please

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r15, 1 of 1 files at r16.
Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv, @jinengandhi-intel, and @woju)


LibOS/shim/test/ltp/Makefile, line 30 at r16 (raw file):

	# (unintuitively, the LTP framework still works correctly with this change)
	# No such change is intended for LTP tests with non-SGX.
	$(eval export patch-status = $(shell patch -p1 -l -N -s --dry-run < ltp_sgx_mmap_fix.patch 1>&2 2> /dev/null; echo $$?))

Accepting this for now as discussed in private - @jinengandhi-intel will submit another PR later which will split source into 2 directories, one with patches and one without, so that we don't have the problem with patching/removing patch.


LibOS/shim/test/ltp/Makefile, line 84 at r16 (raw file):

%.xml: $(CFG) $(target) $(INSTALLDIR)/INSTALL_SUCCESS
	# LTPSCENARIO := $(INSTALLDIR)/$(LTPSCENARIO)

Why is this commented out?

@jinengandhi-intel
Copy link
Contributor Author

Jenkins, retest Jenkins-18.04 please

@jinengandhi-intel
Copy link
Contributor Author

Jenkins, retest Jenkins-Debug please

@jinengandhi-intel
Copy link
Contributor Author

Jenkins, retest Jenkins-SGX-18.04-apps please

@jinengandhi-intel
Copy link
Contributor Author

Jenkins, retest Jenkins-SGX-18.04 please

@jinengandhi-intel
Copy link
Contributor Author

Jenkins, retest Jenkins-SGX please

@jinengandhi-intel
Copy link
Contributor Author

Jenkins, retest Jenkins-18.04 please

@jinengandhi-intel
Copy link
Contributor Author

Jenkins, retest Jenkins-Debug-18.04 please

Copy link
Contributor Author

@jinengandhi-intel jinengandhi-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 8 of 9 files reviewed, 5 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv, @mkow, and @woju)


LibOS/shim/test/ltp/Makefile, line 84 at r16 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Why is this commented out?

Done. Have removed it.

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r17.
Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL) (waiting on @dimakuv, @jinengandhi-intel, and @woju)


LibOS/shim/test/ltp/Makefile, line 86 at r15 (raw file):

Previously, jinengandhi-intel wrote…

Additionally had to add this as runltp_xml.py by default uses the ltproot as install/testcases/bin and here for SGX I will pass the install-sgx/testcases/bin

@jinengandhi-intel I guess this comment wasn't supposed to be blocking? (see the disposition status, you're blocking)

@jinengandhi-intel
Copy link
Contributor Author

Jenkins, retest Jenkins please

@jinengandhi-intel
Copy link
Contributor Author

Jenkins, retest Jenkins-SGX-apps please

2 similar comments
@jinengandhi-intel
Copy link
Contributor Author

Jenkins, retest Jenkins-SGX-apps please

@jinengandhi-intel
Copy link
Contributor Author

Jenkins, retest Jenkins-SGX-apps please

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 4 files at r13, 1 of 2 files at r15, 1 of 1 files at r16.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @mkow and @woju)

@jinengandhi-intel
Copy link
Contributor Author

Jenkins, retest Jenkins please

@jinengandhi-intel
Copy link
Contributor Author

Jenkins, retest Jenkins-SGX-apps please

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants