Initial implementation of a computer tool. #1063
base: main
Epic commit! I made a bunch of comments, mostly small stuff, plus a couple of things for us to discuss in more depth.
For tests, we do have a way of mocking LLM calls to test the basic mechanics (see https://github.com/UKGovernmentBEIS/inspect_ai/blob/main/tests/tools/test_web_browser.py).
For other OSes, I think we can be content with Linux for now, but it's definitely a worthy point of discussion.
@@ -0,0 +1,103 @@
FROM docker.io/ubuntu:22.04
Let's see if we can get this integrated with our internal image building scheme (should be quite easy).
WORKDIR $HOME

# setup python
RUN git clone https://github.com/pyenv/pyenv.git ~/.pyenv && \
I wonder how much of this is Anthropic's opinionated idea of how to install/run Python vs. something more universal?
@jjallaire take a look now. I've dramatically reduced the Dockerfile, getting rid of the Anthropicisms that I could.
9d9a555 to 45451b6
This PR contains:
What is the current behavior? (You can also link to an open issue here)
What is the new behavior?
This PR adds:
- `computer` tool can interact with a sandboxed computer via mouse and keyboard. The tool actually comes in two forms:
  - `computer`, which is a single tool that supports all of the available actions, e.g. `computer("mouse_move", 100, 100)`.
  - `computer_split`, which provides the same functionality via a `list` of tools, e.g. `computer_mouse_move(100, 100)`. The split version is not publicly exported.
- `examples/computer/computer.py` example eval.
- `Dockerfile` that has `mutter`)
- `inspect-computer-tool` image name that will be built automatically when used within a `compose.yaml` file. This is temporary until we officially publish the image.
- `examples/intervention` example to support the computer tool. Can be accessed by passing `-T mode="computer"` on the `inspect` command line.
Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)
No
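The difference between the two forms listed under "This PR adds" can be sketched roughly as follows. This is only an illustration of the shape of the two interfaces, not the code in this PR: the `perform` helper, the `ACTIONS` tuple, and the `left_click` action name are assumptions made up for the example; only `computer`, `computer_split`, and `mouse_move` come from the description above.

```python
from typing import Callable

# Hypothetical action names for illustration; the PR text only names "mouse_move".
ACTIONS = ("mouse_move", "left_click")


def perform(action: str, *args: int) -> str:
    """Stand-in for the sandbox call; it only describes what would be done."""
    return f"{action}{args}"


def computer(action: str, *args: int) -> str:
    """Single-tool form: one entry point, with the action selected by name."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    return perform(action, *args)


def computer_split() -> list[Callable[..., str]]:
    """Split form: one callable per action, returned as a list."""

    def make(action: str) -> Callable[..., str]:
        def tool(*args: int) -> str:
            return perform(action, *args)

        # Name each callable after its action, e.g. computer_mouse_move.
        tool.__name__ = f"computer_{action}"
        return tool

    return [make(action) for action in ACTIONS]
```

Under these assumptions, `computer("mouse_move", 100, 100)` and the `computer_mouse_move(100, 100)` entry of `computer_split()` reach the same underlying call; the split form just moves the action name out of the arguments and into the tool name.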
Other information:
The Docker image in this PR borrows a variety of scripts from Anthropic's Computer Use Demo code. The primary value retained from that repo was the code to interact with `x11` via `xdotool`. The Anthropic specific code has been removed.
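For readers unfamiliar with `xdotool`, here is a minimal sketch of how tool actions might map onto `xdotool` command lines. It is not the script shipped in the image: the function name `xdotool_argv` and the action names other than `mouse_move` are hypothetical, and the sketch only builds the argument list rather than running it. The `mousemove`, `click`, and `key` subcommands are standard `xdotool` commands.

```python
import shlex


def xdotool_argv(action: str, *args: object) -> list[str]:
    """Build (but do not run) an xdotool command line for a hypothetical action."""
    if action == "mouse_move":
        x, y = args
        # xdotool mousemove X Y moves the pointer to absolute screen coordinates.
        return ["xdotool", "mousemove", str(x), str(y)]
    if action == "left_click":
        # Button 1 is the left mouse button.
        return ["xdotool", "click", "1"]
    if action == "key":
        (combo,) = args
        # Key combinations use X keysym names joined by "+", e.g. ctrl+c.
        return ["xdotool", "key", str(combo)]
    raise ValueError(f"unknown action: {action}")


if __name__ == "__main__":
    # Print the command instead of executing it; actually running it would
    # need an X server and a DISPLAY variable set inside the container.
    print(shlex.join(xdotool_argv("mouse_move", 100, 100)))
```

Returning an argument list rather than a shell string keeps quoting out of the picture if the list is later handed to something like `subprocess.run`.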
TODO:
- `ToolResult` should be exported from `inspect_ai.tool` so that it can be referenced by the computer tool.
- `MockLogger`.
Obsolete
- Move image from `epatey/inspect-computer-tool` over to `aisiuk/inspect-computer-tool`.
Nice to have
WAY later in a subsequent PR