What can this agent do? Basically nothing; that's why it's called useless-agent.
Why is it interesting?
- Uses text-only LLMs.
- Cheap: I spent about $4.57 playing with it for about 7 evenings.
- Single binary (almost; see the todo list).
- Easy to use: run the binary and copy the IP address.
- No telemetry, no bullshit.
- IPv4 & IPv6 (IPv6 should work but is untested).
Caution
- Only use this on a disposable virtual machine.
- It can, and most likely will, destroy your system.
- The LLM API provider can realistically inject malicious commands, actions, or data into the API responses the agent ingests.
- The video is not compressed. If you are connected to a virtual machine in the cloud, expect heavy network traffic.
- The video stream and everything else except the API queries are unencrypted. If connecting to a remote machine, use an SSH tunnel with port forwarding.
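A tunnel for this setup might look like the following sketch (`user@remote-vm` is a placeholder, and 8080 assumes the default `--port 8080` shown in the run command below; adjust both to your environment):

```shell
# Forward local port 8080 to port 8080 on the remote VM over SSH.
# All client traffic (including the uncompressed video stream) then
# travels through the encrypted tunnel; in main.html, connect to
# 127.0.0.1 instead of the VM's public IP.
ssh -N -L 8080:127.0.0.1:8080 user@remote-vm
```

`-N` opens the tunnel without running a remote command; drop it if you also want a shell on the VM.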
Note
It is super slow. Right now, speed is not a priority. If your only problem is speed, you have already won the agent game.
Currently supported models:
deepseek-chat
deepseek-reasoner
Environment: only works on Linux + Xfce + X11.
Version - v0.0.0.0.0.1
If it were a web service with an SLA, its SLA would guarantee an availability of 0.00001%.
Prompt: Open a web browser and go to deepseek.com
demo-1-compressed-2.mp4
git clone
cd useless-agent
go build
Scripts are in "assets/scripts".
sudo apt install xfce4 xvfb tesseract-ocr-eng tesseract-ocr libtesseract5 libleptonica-dev libtesseract-dev
sudo apt remove xfce4-screensaver
sudo systemctl enable xvfb.service
sudo systemctl enable xfce4.service
sudo systemctl start xvfb.service
sudo systemctl start xfce4.service
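Xvfb and Xfce are not shipped as systemd services by default, so the units enabled above presumably come from "assets/scripts". As a rough sketch of what such an xvfb.service could contain (the display `:1` matches the `DISPLAY=:1` used later; the actual unit contents are an assumption, not copied from the repo):

```ini
# /etc/systemd/system/xvfb.service -- illustrative sketch only
[Unit]
Description=Virtual framebuffer X server on display :1

[Service]
ExecStart=/usr/bin/Xvfb :1 -screen 0 1280x800x24
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

If the enable/start commands fail, check that the unit files were actually copied into /etc/systemd/system/ first.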
Copy the executable to the target machine.
Start the executable:
API_KEY=your-api-token-value DISPLAY=:1 API_BASE_URL='https://api.deepseek.com/v1' MODEL_ID='deepseek-chat' ./useless-agent --ip=127.0.0.1 --port 8080
On the client machine, open main.html in a browser.
Enter the target machine's IP into the "IP Address" field.
Click "Connect".
Click "Video".
Give it a task, for example "Open web browser": put that prompt into the LLM Chat and press "Send".
Tip
Like to burn money? Try a more capable LLM; using DeepSeek R1 instead of V3 would probably just make the program more capable of doing nothing.
How the project started: I just wanted to take a screenshot over the network.
- Build a single fully static binary.
- Add stats about burned tokens per session/task.
- Add a dead kitten counter.
- Allow the LLM to spawn local 'thoughts' that do or monitor something, can interrupt the main loop, and inject their results into the thinking loop.
- Build a unified concept space for models that are capable of ingesting more data.
- Pause task execution.
- Allow intervention in the execution process and provide guidance/additional instructions.
Problems & Ideas:
- LLMs' context window/input size is so limiting; I want to shuffle in 100 million tokens at each iteration.
- How to do reliable OCR? Don't do it at all. The idea was to find the single point where Linux renders fonts for the whole system and intercept it, setting something like an eBPF hook to get all the text for free. Turns out there is no single point of font rendering in Linux! Each X11 client app is allowed to render fonts itself. Nice.
- How hard could it be to write a single function to recognize windows? They have unified patterns, right? Wrong. Each window is allowed to draw its own header however it likes. I'm looking at you, Firefox. Nice.