Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Headful browsing to Agent Web Tooling #1080

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

cmathw
Copy link

@cmathw cmathw commented Jan 6, 2025

This PR contains:

  • New features
  • Changes to dev-tools e.g. CI config / github tooling
  • Docs
  • Bug fixes
  • Code refactor

What is the current behavior? (You can also link to an open issue here)

The web_browser tooling only makes use of headless browser, it is not possible to view this browser graphically in real-time.

What is the new behavior?

This PR implements headful browsing, allowing users to view an agent's interactions with the browser in real time via a VNC viewer.

Tests have been added for this new behaviour in test_playwright_crawler. These are essentially just parameterising playwright's headless flag to test both True and False, instead of just True originally.

Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)

This PR should be accompanied by an update to the image (aisiuk/inspect-web-browser-tool) hosted at Dockerhub for web_browser tooling. This PR's implementation defaults to headless mode. For users making use of web_browser tooling who are not aware of this PR, there should be no downstream impacts.

Other information:

This PR's implementation makes use of Docker targets, the user selects a build target in their task's docker-compose file. For example, in the examples/browser/compose.yaml file, the user can choose headful browsing with:

services:
  default:
    build:
      context: ../../src/inspect_ai/tool/_tools/_web_browser/_resources
      dockerfile: Dockerfile
      target: headful
    ports:
      - "127.0.0.1:5900:5900"

If a target is not specified, or it is specified as headless the Dockerfile will build the headless image. In this instance, it is not necessary to specify the 127.0.0.1:5900:5900 port mapping either (this is only needed for VNC viewing).

This PR achieves this broadly by:

  • Replacing the playwright headless flag with an environment variable instead of hardcoding to True (i.e. the "HEADLESS" environment variable).
  • Each Docker build set this "HEADLESS" environment variable depending on the target chosen by the user. We use Docker targets approach such that only the layer's necessary for each respective build are included.
  • The same entrypoint.sh file is copied across into the web_browser container irrespective of target, it uses the "HEADLESS" environment variable to dictate the actions taken at container start.

Note: When running the test file, a chromium browser will open/close for the headful tests. If necessary I'm open to adding decorators to these tests so they are only run when specified. These tests are not located in the tests/ dir though and won't run calling make tests.

@jjallaire
Copy link
Collaborator

Thank you! This looks like an excellent addition. Some questions:

  1. After we publish the new image, am I correct in thinking that the target: headful (plus the VNC proxying) in compose.yaml is all a user will need to enable this mode? (i.e. they don't need to actually rebuild the image locally or do they?). In any case it seems like making sure this works w/ configuration only is desirable (but if that's problematic in some fashion do LMK).

  2. Docker compose can do interpolation of environment variables. I wonder if there is a way we could enable this behavior in the container AND have Inspect make the right call off of a single environment variable (e.g. INSPECT_WEB_BROWSER_TOOL_TARGET=headful). Note that this can be used to forward into the HEADLESS environment variable.

Take these above suggestions as really me probing at what the workflow will be for end users. I'm open to anything that makes things as transparent and straightforward as possible.

cc'ing @epatey as well here who is working on implementing a full desktop computer tool (as discussed w/ @jmsdao back in August/September). @epatey The Harmony intelligence folks have also been working on desktop computer tools (although I believe possibly using GCP rather than in a container?).

It would be good to compare notes on how we are setting up and running VNC so its consistent across our images (@cmathw our current work is based on the Anthropic example so definitely fungible if there is a better way!).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants