-
Language Agents: Foundations, Prospects, and Risks
- Yu Su, Diyi Yang, Shunyu Yao, Tao Yu
- 🏛️ Institutions: OSU, Stanford, Princeton, HKU
- 📅 Date: November 2024
- 📑 Publisher: EMNLP 2024
- 💻 Env: [Misc]
- 🔑 Key: [survey], [tutorial], [reasoning], [planning], [memory], [multi-agent systems], [safety]
- 📖 TLDR: This tutorial provides a comprehensive exploration of language agents—autonomous systems powered by large language models capable of executing complex tasks through language instructions. It delves into their theoretical foundations, potential applications, associated risks, and future directions, covering topics such as reasoning, memory, planning, tool augmentation, grounding, multi-agent systems, and safety considerations.
-
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
- Zhiyong Wu, Chengcheng Han, Zichen Ding, Zhenmin Weng, Zhoumianze Liu, Shunyu Yao, Tao Yu, Lingpeng Kong
- 🏛️ Institutions: Shanghai AI Lab, East China Normal University, Princeton, HKU
- 📅 Date: February 12, 2024
- 📑 Publisher: ICLR 2024 Workshop LLMAgents
- 💻 Env: [Desktop]
- 🔑 Key: [framework], [self-directed learning], [GAIA], [FRIDAY], [OS-Copilot]
- 📖 TLDR: The OS-Copilot framework supports building generalist agents capable of performing diverse tasks across an operating system (OS). This work introduces FRIDAY, an embodied agent using OS-Copilot to self-improve by learning from task outcomes. It operates with a memory-based architecture to tackle OS-level tasks across applications like terminals, web browsers, and third-party tools. Tested on the GAIA benchmark, FRIDAY achieved 35% higher performance than prior methods, proving effective in adapting to unfamiliar applications and refining its capabilities with minimal guidance.
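The self-improvement loop described above can be sketched in a few lines. This is a minimal illustration only: `SelfImprovingAgent`, `propose_skill`, and the skill store are hypothetical names, and the stub stands in for the real framework's LLM-driven tool synthesis over OS applications.

```python
# Illustrative sketch of a FRIDAY-style self-improvement loop: the agent
# attempts a task, and on success commits the acquired skill to memory
# so later occurrences of the task are served without re-learning.

class SelfImprovingAgent:
    def __init__(self):
        self.skills = {}  # memory of skills learned from task outcomes

    def propose_skill(self, task):
        # Stand-in for LLM code generation against OS tools (hypothetical).
        return lambda: f"done: {task}"

    def solve(self, task):
        if task in self.skills:           # reuse a previously stored skill
            return self.skills[task]()
        skill = self.propose_skill(task)  # synthesize a new skill
        result = skill()
        if result is not None:            # success: refine memory with it
            self.skills[task] = skill
        return result

agent = SelfImprovingAgent()
first = agent.solve("list files")   # learns the skill and stores it
second = agent.solve("list files")  # answered from the skill memory
```

The point of the design is that learning happens at the level of stored skills rather than model weights, which is what lets the agent adapt to unfamiliar applications.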
-
Reflexion: Language Agents with Verbal Reinforcement Learning
- Noah Shinn, Federico Cassano, Edward Berman, Ashwin Gopinath, Karthik Narasimhan, Shunyu Yao
- 🏛️ Institutions: Northeastern University, MIT, Princeton
- 📅 Date: March 20, 2023
- 📑 Publisher: NeurIPS 2023
- 💻 Env: [Misc]
- 🔑 Key: [framework], [learning], [verbal reinforcement learning], [Reflexion]
- 📖 TLDR: This paper introduces Reflexion, a framework that enhances language agents by enabling them to reflect on task feedback linguistically, storing these reflections in an episodic memory to improve decision-making in future trials. Reflexion allows agents to learn from various feedback types without traditional weight updates, achieving significant performance improvements across tasks like decision-making, coding, and reasoning. For instance, Reflexion attains a 91% pass@1 accuracy on the HumanEval coding benchmark, surpassing the previous state-of-the-art GPT-4's 80%.
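The trial-and-reflect cycle can be sketched as below. This is a toy illustration, not the paper's implementation: `attempt`, `evaluate`, and `reflect` are hypothetical stand-ins for the actor LLM, the task evaluator, and the self-reflection LLM call.

```python
# Reflexion-style loop (sketch): on failure, generate a verbal reflection,
# store it in episodic memory, and condition the next attempt on it --
# learning from feedback without any weight updates.

def attempt(task, reflections):
    # Hypothetical actor: succeeds once a lesson from a past trial exists.
    if any("edge case" in r for r in reflections):
        return "correct solution"
    return "buggy solution"

def evaluate(output):
    return output == "correct solution"  # binary task feedback

def reflect(task, output):
    # Hypothetical self-reflection: turn failure into a verbal lesson.
    return f"On '{task}', '{output}' failed; handle the edge case next time."

def reflexion(task, max_trials=3):
    memory = []  # episodic memory of verbal reflections
    for trial in range(1, max_trials + 1):
        output = attempt(task, memory)
        if evaluate(output):
            return output, trial, memory
        memory.append(reflect(task, output))
    return None, max_trials, memory

output, trials, memory = reflexion("sort a list")
```

Here the second trial succeeds because the reflection from the first failure is in memory, which mirrors how Reflexion converts sparse feedback into reusable linguistic guidance.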
-
ReAct: Synergizing Reasoning and Acting in Language Models
- Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao
- 🏛️ Institutions: Princeton, Google Research
- 📅 Date: October 6, 2022
- 📑 Publisher: ICLR 2023
- 💻 Env: [Misc]
- 🔑 Key: [framework], [reasoning], [ReAct]
- 📖 TLDR: This paper introduces ReAct, a framework that enables large language models to generate reasoning traces and task-specific actions in an interleaved manner. By combining reasoning and acting, ReAct enhances the model's ability to perform complex tasks in language understanding and interactive decision making. The approach is validated across various benchmarks, demonstrating improved performance and interpretability over existing methods.
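The interleaving of reasoning traces and actions can be sketched as a simple loop. This is an illustrative skeleton only: `fake_llm` and the toy tool registry are hypothetical stand-ins for the model and environment, not the paper's prompts.

```python
# Minimal ReAct-style loop (sketch): each step the model emits a Thought
# (reasoning trace) and an Action; tool observations are fed back into
# the context, interleaving reasoning with acting until finish[].

def fake_llm(history):
    # Hypothetical policy standing in for a prompted LLM.
    if "Observation: 42" in history:
        return "Thought: I have the answer.\nAction: finish[42]"
    return "Thought: I should look it up.\nAction: lookup[answer]"

TOOLS = {"lookup": lambda query: "42"}  # toy tool registry

def react(question, max_steps=5):
    history = f"Question: {question}"
    for _ in range(max_steps):
        step = fake_llm(history)            # thought + action, interleaved
        history += "\n" + step
        action = step.split("Action: ")[-1]
        name, arg = action.split("[", 1)
        arg = arg.rstrip("]")
        if name == "finish":
            return arg, history
        observation = TOOLS[name](arg)      # act, then observe
        history += f"\nObservation: {observation}"
    return None, history

answer, trace = react("What is the answer?")
```

Keeping the thoughts in the trace is what gives ReAct its interpretability: the full history shows why each action was taken.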
-
WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
- Shunyu Yao, Howard Chen, John Yang, Karthik Narasimhan
- 🏛️ Institutions: Princeton
- 📅 Date: July 2022
- 📑 Publisher: NeurIPS 2022
- 💻 Env: [Web]
- 🔑 Key: [framework], [dataset], [benchmark], [e-commerce web interaction], [language grounding]
- 📖 TLDR: This paper introduces WebShop, a simulated web-based shopping environment with over 1 million real-world products and 12,087 annotated instructions. It allows language agents to navigate, search, and make purchases based on natural language commands. The study explores how agents handle compositional instructions and noisy web data, providing a robust environment for reinforcement learning and imitation learning. The best models show effective sim-to-real transfer on websites like Amazon, illustrating WebShop’s potential for training grounded agents.
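The environment's text-based action interface can be sketched with a toy version. This is a hedged illustration, not the benchmark itself: `ToyShopEnv` and its two-product catalog are invented here, while the `search[...]`/`click[...]` action format mirrors the one WebShop uses.

```python
# Toy WebShop-like environment (sketch): the agent receives a natural
# language instruction, searches a product catalog, and clicks an item;
# reward reflects whether the purchase matches the instruction.

PRODUCTS = {
    "B01": {"title": "red cotton t-shirt", "price": 9.99},
    "B02": {"title": "blue wool sweater", "price": 39.99},
}

class ToyShopEnv:
    def __init__(self, instruction):
        self.instruction = instruction
        self.results = []

    def step(self, action):
        """Returns (observation, reward, done), Gym-style."""
        if action.startswith("search["):
            query = action[len("search["):-1]
            self.results = [pid for pid, p in PRODUCTS.items()
                            if all(w in p["title"] for w in query.split())]
            return f"results: {self.results}", 0.0, False
        if action.startswith("click["):
            pid = action[len("click["):-1]
            # Toy reward: 1.0 if the bought item's title matches the instruction.
            reward = 1.0 if PRODUCTS[pid]["title"] in self.instruction else 0.0
            return f"bought {pid}", reward, True
        return "invalid action", 0.0, False

env = ToyShopEnv("find me a red cotton t-shirt under 15 dollars")
obs, r, done = env.step("search[red cotton]")
obs2, reward, done2 = env.step(f"click[{env.results[0]}]")
```

Grounding the reward in instruction-product match is what makes the environment usable for both reinforcement learning and imitation learning, as the paper explores at much larger scale.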