- Tsinghua University
- Beijing
- http://yujia-qin.github.io/
Stars
A GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.
OS-ATLAS: A Foundation Action Model For Generalist GUI Agents
Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.
🚀 Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
Utilities intended for use with Llama models.
Agentic components of the Llama Stack APIs
🍎APPL: A Prompt Programming Language. Seamlessly integrate LLMs with programs.
Code examples and resources for DBRX, a large language model developed by Databricks
A new tool learning benchmark aiming at well-balanced stability and reality, based on ToolBench.
DeepSeek-VL: Towards Real-World Vision-Language Understanding
ScreenQA dataset was introduced in the "ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots" paper. It contains ~86K question-answer pairs collected by human annotators for ~35K…
A UI-Focused Agent for Windows OS Interaction.
Repo for paper "Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents"
[ACL 2024] An Easy-to-use Instruction Processing Framework for LLMs.
A keyboard shortcut browser extension for keyboard-based navigation and tab operations with an advanced omnibar
🦜🔗 Build context-aware reasoning applications
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
The repository for the paper "DebugBench: Evaluating Debugging Capability of Large Language Models".
[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision).
[COLING 2025] ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios
An LLM-powered repository agent designed to assist developers and teams in generating documentation and understanding repositories quickly.
AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.
An open-source remote desktop application designed for self-hosting, as an alternative to TeamViewer.