Initial implementation of a computer tool. #1063

Open
wants to merge 3 commits into
base: main

epatey
Contributor

@epatey epatey commented Jan 2, 2025

This PR contains:

  • New features
  • Changes to dev-tools e.g. CI config / github tooling
  • Docs
  • Bug fixes
  • Code refactor

What is the current behavior? (You can also link to an open issue here)

What is the new behavior?

This PR adds:

  • A squashed and cherry-picked copy of feature/anthropic-native-bash-tool from @jjallaire.
  • A new computer tool that can interact with a sandboxed computer via mouse and keyboard. The tool comes in two forms:
    • computer which is a single tool that supports all of the available actions. e.g. computer("mouse_move", 100, 100)
    • computer_split which provides the same functionality via a list of tools. e.g. computer_mouse_move(100, 100). The split version is not publicly exported.
  • A new examples/computer/computer.py example eval.
  • An Ubuntu Dockerfile that has
    • An RPC supporting the computer tool
    • A minimal desktop window manager (mutter)
    • VNC support (exposes port 5900)
  • Adds support for an internal inspect-computer-tool image name that will be built automatically when used within a compose.yaml file. This is temporary until we officially publish the image.
  • Updates code that synthesizes "Image content is in the message below." messages to no longer mutate the source of truth. (largely from @jjallaire)
  • Updates the examples/intervention example to support the computer tool. It can be accessed by passing -T mode="computer" on the inspect command line.
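
The relationship between the two tool forms described above can be illustrated with a minimal, self-contained sketch. This is not the code in this PR and does not use the inspect_ai API: the handlers, the ACTIONS table, and the return strings are hypothetical stand-ins, and a real implementation would forward each action to the sandbox instead of returning a string.

```python
from typing import Callable


# Hypothetical action handlers. A real tool would forward these to the
# sandboxed desktop; here they only describe what they would do.
def mouse_move(x: int, y: int) -> str:
    return f"mouse_move to ({x}, {y})"


def key(text: str) -> str:
    return f"key {text}"


# Hypothetical registry mapping action names to handlers.
ACTIONS: dict[str, Callable[..., str]] = {
    "mouse_move": mouse_move,
    "key": key,
}


def computer(action: str, *args) -> str:
    """Single-tool form: one entry point that dispatches on the action name."""
    try:
        handler = ACTIONS[action]
    except KeyError:
        raise ValueError(f"unknown action: {action}") from None
    return handler(*args)


def computer_split() -> dict[str, Callable[..., str]]:
    """Split form: one callable per action, named computer_<action>."""
    return {f"computer_{name}": fn for name, fn in ACTIONS.items()}
```

In this sketch, computer("mouse_move", 100, 100) and computer_split()["computer_mouse_move"](100, 100) reach the same handler, which is the sense in which the split form "provides the same functionality via a list of tools".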

Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)

No

Other information:

The Docker image in this PR borrows a variety of scripts from Anthropic's Computer Use Demo code. The primary value retained from that repo was the code to interact with x11 via xdotool.

The Anthropic-specific code has been removed.

TODO:

  • Properly stall tool/sandbox until container's x11 is ready for screenshots.
  • Investigate if ToolResult should be exported from inspect_ai.tool so that it can be referenced by the computer tool.
  • Remove MockLogger.
  • Scrub tool command docstrings.
  • Enable remaining example eval samples.
  • Tests?
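
For the first TODO item, one possible approach is to poll a readiness probe until it succeeds or a timeout expires. The sketch below is an assumption about how that could look, not what this PR implements: wait_until_ready and its probe argument are hypothetical names, the probe would need to be something like "attempt a screenshot and report success", and the real tool code may need an async variant.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    timeout: float = 30.0,
    interval: float = 0.5,
) -> bool:
    """Call probe repeatedly until it returns True or timeout seconds elapse.

    Returns True if the probe succeeded, False if the deadline passed first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        # Sleep between attempts so the container is not hammered with probes.
        time.sleep(interval)
```

The probe is always tried at least once, so a display that is already up returns immediately without sleeping.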

Obsolete

  • Move image from epatey/inspect-computer-tool over to aisiuk/inspect-computer-tool.

Nice to have

  • Figure out what's up with the occasional bell when running the example eval.
  • Figure out how to use relative package imports for the in-image Python code.

WAY later in a subsequent PR

  • Sandbox for Mac/Windows VM

Collaborator

@jjallaire jjallaire left a comment

Epic commit! Made a bunch of comments, mostly small stuff, plus a couple of things for us to discuss in more depth.

For tests, we do have a way of mocking LLM calls to test the basic mechanics (see https://github.com/UKGovernmentBEIS/inspect_ai/blob/main/tests/tools/test_web_browser.py).

For other OSes, I think we can be content w/ Linux for now, but it's definitely a worthy point of discussion.

src/inspect_ai/tool/__init__.py (outdated)
src/inspect_ai/tool/_tools/_computer/_computer.py (outdated)
src/inspect_ai/tool/_tools/_computer/_computer.py (outdated)
src/inspect_ai/tool/_tools/_computer/_computer.py (outdated)
src/inspect_ai/tool/_tools/_computer/_computer.py (outdated)
src/inspect_ai/tool/_tools/_computer/_computer_common.py (outdated)
src/inspect_ai/tool/_tools/_computer/_computer_common.py (outdated)
examples/computer/compose.yaml (outdated)
@@ -0,0 +1,103 @@
FROM docker.io/ubuntu:22.04
Collaborator

Let's see if we can get this integrated with our internal image building scheme (should be quite easy).

WORKDIR $HOME

# setup python
RUN git clone https://github.com/pyenv/pyenv.git ~/.pyenv && \
Collaborator

I wonder how much of this is Anthropic's opinionated idea of how to install/run Python vs. something more universal?

Contributor Author

@jjallaire take a look now. I've dramatically reduced the Dockerfile, getting rid of the Anthropicisms that I could.

@epatey epatey force-pushed the computer branch 4 times, most recently from 9d9a555 to 45451b6 Compare January 8, 2025 02:21
@epatey epatey marked this pull request as ready for review January 8, 2025 15:08
@epatey epatey changed the title DRAFT initial steps in computer tool Initial implementation of a computer tool. Jan 8, 2025

2 participants