feat: Configure codejail and run safety check at startup #10

Open: timmc-edx wants to merge 3 commits into main

@timmc-edx (Contributor) commented Feb 6, 2025

  • Initialize codejail at startup, if CODE_JAIL is set
  • Run safety checks at startup, locking out the API if the checks fail

If codejail isn't properly configured, it defaults to running code unsafely. To prevent this from affecting the service, we run a smoke test at startup to check if there's anything just drastically wrong.

If this check does not pass, two things happen:

  • The healthcheck endpoint will never return a 200 OK
  • The code-exec endpoint will refuse with a 500 error
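
The lockout described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `record_startup_result`, `healthcheck_status`, and `code_exec_status` are hypothetical helpers, the 503 status for an unhealthy healthcheck is an assumed choice (the description only says it never returns 200), and treating `None` as "not yet run" is also an assumption.

```python
# Sketch only: helper names and the 503 status are assumptions, not the PR's code.
STARTUP_SAFETY_CHECK_OK = None  # assumed: None = not yet run, True = passed, False = failed


def record_startup_result(passed):
    """Store the outcome of the startup safety check (hypothetical helper)."""
    global STARTUP_SAFETY_CHECK_OK
    STARTUP_SAFETY_CHECK_OK = bool(passed)


def healthcheck_status():
    """Return 200 only when the check is known to have passed; 503 is an assumed choice."""
    return 200 if STARTUP_SAFETY_CHECK_OK is True else 503


def code_exec_status():
    """Refuse with a 500 unless the startup check is known to have passed."""
    return 200 if STARTUP_SAFETY_CHECK_OK is True else 500
```

Comparing with `is True` rather than truthiness means an unset or failed state can never be mistaken for a pass.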

Supporting changes:

  • Define an explicit AppConfig for the api subpackage so that we can hook into the ready() mechanism
  • Wrap safe_exec to prevent codejail from eagerly setting UNSAFE=True at module load time. (Not clear why this doesn't affect edx-platform; maybe something to do with app vs. middleware load order.) Filed openedx/codejail#225 (Codejail safe_exec makes "unsafe=true" decision at startup) to possibly fix this.
  • safe_exec wrapper also performs a deepcopy to allow callers to reason about the globals dict more easily.
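
As a rough illustration of the deepcopy behavior, a wrapper along these lines leaves the caller's dict untouched and hands back a separate result. Everything here is an assumption for the sake of the sketch: `_run_in_place` is a hypothetical, unsandboxed stand-in for the real executor, and the wrapper's actual signature and return convention in the PR may differ.

```python
import copy


def _run_in_place(code, globals_dict):
    """Hypothetical stand-in for a sandboxed executor that mutates globals_dict."""
    exec(code, globals_dict)  # NOT sandboxed; illustration only


def safe_exec(code, input_globals):
    """Run code against a deep copy of input_globals and return the resulting dict."""
    # A real wrapper might import the backend inside this function so that
    # nothing about the backend is decided while modules are still loading.
    output_globals = copy.deepcopy(input_globals)
    _run_in_place(code, output_globals)
    output_globals.pop("__builtins__", None)  # added by exec(); not user data
    return output_globals
```

Because even nested values are copied, the caller can compare input and output dicts without worrying about shared mutable objects.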

Other changes:

  • Clean up healthcheck docstring (mostly just trim it down)
  • Lint cleanup

Part of edx/edx-arch-experiments#927


Manual testing was performed with changes to the Dockerfile and to devstack (PRs pending), and mostly entailed calling the healthcheck endpoint.

When passing, the startup logs look like this:

edx.devstack.codejail  | 2025-02-05 23:26:03,745 INFO 365 [codejail_service.startup_check] [user None] [ip None] startup_check.py:73 - Startup test 'Basic code execution' passed
edx.devstack.codejail  | 2025-02-05 23:26:03,819 INFO 365 [codejail_service.startup_check] [user None] [ip None] startup_check.py:73 - Startup test 'Block sandbox escape by disk access' passed
edx.devstack.codejail  | 2025-02-05 23:26:03,892 INFO 365 [codejail_service.startup_check] [user None] [ip None] startup_check.py:73 - Startup test 'Block sandbox escape by child process' passed

When codejail is misconfigured:

edx.devstack.codejail  | 2025-02-06 11:47:41,056 WARNING 419 [codejail] [user None] [ip None] safe_exec.py:305 - Using codejail/safe_exec.py:not_safe_exec for None
edx.devstack.codejail  | 2025-02-06 11:47:41,058 INFO 419 [codejail_service.startup_check] [user None] [ip None] startup_check.py:73 - Startup test 'Basic code execution' passed
edx.devstack.codejail  | 2025-02-06 11:47:41,058 WARNING 419 [codejail] [user None] [ip None] safe_exec.py:305 - Using codejail/safe_exec.py:not_safe_exec for None
edx.devstack.codejail  | 2025-02-06 11:47:41,059 ERROR 419 [codejail_service.startup_check] [user None] [ip None] startup_check.py:76 - Startup test 'Block sandbox escape by disk access' failed with: "Expected error, but code ran successfully. Globals: {'ret': ['var', 'home', 'lib64', 'tmp', 'boot', 'media', 'root', 'etc', 'srv', 'proc', 'usr', 'run', 'bin', 'dev', 'opt', 'lib', 'sbin', 'mnt', 'sys', 'edx', '.dockerenv', 'app', 'venv', 'sandbox', 'lib.usr-is-merged']}"
edx.devstack.codejail  | 2025-02-06 11:47:41,059 WARNING 419 [codejail] [user None] [ip None] safe_exec.py:305 - Using codejail/safe_exec.py:not_safe_exec for None
edx.devstack.codejail  | 2025-02-06 11:47:41,061 ERROR 419 [codejail_service.startup_check] [user None] [ip None] startup_check.py:76 - Startup test 'Block sandbox escape by child process' failed with: "Expected error, but code ran successfully. Globals: {'ret': '42\\n'}"
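
The shape of these checks can be sketched as a small runner in which some tests pass only if the sandboxed code is blocked. This is a hedged sketch, not the contents of startup_check.py: `run_startup_tests` and `expect_error` are hypothetical names, and the message wording is only modeled on the log lines above.

```python
import logging

log = logging.getLogger("startup_check_sketch")  # hypothetical logger name


def run_startup_tests(tests):
    """Run (name, callable) pairs; each callable returns None on pass or an error string."""
    any_failed = False
    for name, test_fn in tests:
        error = test_fn()
        if error is None:
            log.info("Startup test %r passed", name)
        else:
            any_failed = True
            log.error("Startup test %r failed with: %r", name, error)
    return not any_failed


def expect_error(run_code):
    """Wrap run_code so that successful execution counts as a failure."""
    def test_fn():
        try:
            result = run_code()
        except Exception:
            return None  # the sandbox blocked the attempt, which is what we want
        return f"Expected error, but code ran successfully. Globals: {result!r}"
    return test_fn
```

All tests run even after the first failure, so a misconfigured sandbox reports every escape it allows rather than only the first.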

Merge checklist:
Check off if complete or not applicable:

  • Version bumped
  • Changelog record added
  • Documentation updated (not only docstrings)
  • Unit tests added/updated
  • Manual testing instructions provided
  • Noted any: Concerns, dependencies, migration issues, deadlines, tickets

@timmc-edx marked this pull request as ready for review February 6, 2025 18:12

@robrap left a comment

Thanks.

Comment on lines 30 to 32
    else:  # pragma: no cover
        # Codejail needs this at startup
        apply_django_settings(settings.CODE_JAIL)

@robrap:
What would happen if this were broken? Would some unit test fail, or is there a reason why we don't want coverage for this?

@timmc-edx (Contributor Author):
If it broke, the service would fail to start. There isn't currently any unit test coverage, because we can't really configure codejail outside of a container.

...but it looks like I can set CODE_JAIL = {} in settings/test.py, and the tests still pass. I thought I had previously seen a failure with an empty settings block, but I can't reproduce that now. I'll add it in.

  from django.test import TestCase
  from django.urls import reverse


  class HealthTests(TestCase):
      """Tests of the health endpoint."""

-     def test_healthcheck(self):
+     @patch('codejail_service.startup_check.STARTUP_SAFETY_CHECK_OK', None)
+     def test_unhealthy(self):
          """Test that the endpoint reports when all services are healthy."""

@robrap:
The docstrings for test_unhealthy and test_healthy need to be swapped.

@timmc-edx (Contributor Author):
Thanks, will fix. Also going to add a couple more cases.

        (responses(math=Exception("Divide by zero")), False),
    )
    @ddt.unpack
    @patch('codejail_service.startup_check.STARTUP_SAFETY_CHECK_OK', None)

@robrap:
I thought this ends up as a parameter on the test? I'm confused about how this works.

@timmc-edx (Contributor Author):
That surprised me too. But if you set the new parameter here, you don't get an additional argument to the decorated function.

If patch() is used as a decorator and new is omitted, the created mock is passed in as an extra argument to the decorated function.

@robrap:
Which is the new parameter? The None? Maybe I'm just used to patched functions, and that is what is throwing me off.

@timmc-edx (Contributor Author):
Yeah, this sets codejail_service.startup_check.STARTUP_SAFETY_CHECK_OK to None for the duration of the test and then sets it back to the original value afterwards.
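
A small self-contained demonstration of that behavior, using `patch.object` on a stand-in namespace instead of the real module (the `startup_check` object below is hypothetical; `patch.object` treats the `new` argument the same way `patch` does):

```python
import types
from unittest.mock import patch

# Hypothetical stand-in for the codejail_service.startup_check module.
startup_check = types.SimpleNamespace(STARTUP_SAFETY_CHECK_OK=True)


@patch.object(startup_check, "STARTUP_SAFETY_CHECK_OK", None)
def with_new():
    # `new` was given (None), so no extra argument is passed in.
    return startup_check.STARTUP_SAFETY_CHECK_OK


@patch.object(startup_check, "STARTUP_SAFETY_CHECK_OK")
def without_new(mock_flag):
    # `new` was omitted, so the created mock arrives as an extra argument.
    return mock_flag is startup_check.STARTUP_SAFETY_CHECK_OK
```

Calling `with_new()` sees the attribute as `None` for the duration of the call, and the original value is restored once it returns.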

Comment on lines 96 to 106
        mock_log_info.assert_has_calls([
            call("Startup test 'Basic code execution' passed"),
        ])
        assert (
            "Startup test 'Block sandbox escape by disk access' failed with: "
            "\"Expected error, but code ran successfully. Globals: {'ret': ['"
        ) in mock_log_error.call_args_list[0][0][0]
        assert (
            "Startup test 'Block sandbox escape by child process' failed with: "
            r'''"Expected error, but code ran successfully. Globals: {'ret': '42\\n'}"'''
        ) == mock_log_error.call_args_list[1][0][0]

@robrap:
Do you intentionally not want to check the count of the various log messages, to ensure there is nothing else?

@timmc-edx (Contributor Author):
Not specifically, no. I mostly wanted to ensure that all of the useful information was present. I could add an additional check if you think it makes sense to do so.

@robrap:
The reason for it is that it might ensure you add proper coverage if a new validation test and log message were ever added.

@timmc-edx (Contributor Author):
I suppose so. I mostly care whether there's at least one failure message and at least one success message, but no harm in adding some count assertions.
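
For reference, count assertions of the kind discussed here might look like the sketch below. The mocks are called by hand to simulate the logging, and the shortened error messages are placeholders rather than the service's real output.

```python
from unittest.mock import MagicMock, call

mock_log_info = MagicMock()
mock_log_error = MagicMock()

# Simulate what the startup check might log (messages are placeholders).
mock_log_info("Startup test 'Basic code execution' passed")
mock_log_error("Startup test 'Block sandbox escape by disk access' failed with: ...")
mock_log_error("Startup test 'Block sandbox escape by child process' failed with: ...")

# assert_has_calls tolerates additional calls before or after the expected ones...
mock_log_info.assert_has_calls([
    call("Startup test 'Basic code execution' passed"),
])

# ...so explicit counts are what catch a newly added, untested log message.
assert mock_log_info.call_count == 1
assert mock_log_error.call_count == 2
```

If another startup test were added later, the count assertions would fail until the unit test was updated to cover it.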

STARTUP_SAFETY_CHECK_OK = not any_failed


def _test_basic_function():

@robrap:
Is there any non-mocked test for this function, and is there any way to? Same for the other calls.

@timmc-edx (Contributor Author):
Yes -- test_logging and test_unsafe_tests_default both expect these to actually run.

@robrap (Feb 6, 2025):
I see. Maybe this should have been obvious, but in test_logging you could explain that, because no AppArmor protections are actually enabled in the unit tests, we can assume that all safety tests should fail.

Can you confirm that we don't have any tests that pass the safety tests by mocking at the safe_exec level? If not, would this be useful?
UPDATE: I guess that is what the first test in test_failure_modes is, maybe, which isn't a failure? I'm not following that test well. UPDATE 2: Maybe call it test_success_and_failure_modes?

@timmc-edx (Contributor Author):
Sure, I can add that note to the test.

Having trouble understanding the second para.

- Set CODE_JAIL to empty dict during unit tests, which allows us to always
  call `apply_django_settings` instead of having an uncovered branch.
- Fix docstrings for healthcheck unit tests

Also:

- Cover additional cases in healthcheck tests