AI Agents: Unlocking Future Workflows

Fundamentals and The Rise of Agentic AI

A lifelong learner and tech tinkerer. Making and breaking systems. Learning from mistakes.

Introduction

In the fast-evolving world of AI, "agents" have become a buzzword, promising autonomous systems that go beyond simple chatbots. Drawing on Andrew Ng's talk at Sequoia Capital's AI Ascent in March 2024, this post explores what AI agents really are, how they work, and why they're set to transform industries. Ng, a pioneer in deep learning, argues that agentic workflows, iterative processes that mimic how humans think and revise, could bring today's models such as GPT-4 closer to the performance many expect from GPT-5. Let's dive in.

What Makes an AI Agent?

Traditional AI interactions are "non-agentic": you prompt a model like ChatGPT, and it generates a response in one go. It's impressive but limited, like asking someone to write an essay from start to finish without ever revising.

Enter agentic workflows. These are looped processes where the AI plans, acts, observes, and refines. For example, to write an essay, an agent might outline, research via web tools, draft, self-critique, and revise. The result? Dramatically better outputs.
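The draft, critique, and revise cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Model`, `agentic_write`, and `toy_model` are hypothetical names, and `toy_model` is a deterministic stand-in for a real LLM call so the loop runs without any API.

```python
from typing import Callable

# Hypothetical stand-in for an LLM call: maps a prompt string to a reply string.
Model = Callable[[str], str]


def agentic_write(task: str, model: Model, max_rounds: int = 3) -> tuple[str, int]:
    """Draft, critique, and revise until the critic approves or rounds run out."""
    draft = model(f"DRAFT: {task}")
    for round_no in range(1, max_rounds + 1):
        critique = model(f"CRITIQUE: {draft}")        # observe the current draft
        if critique.strip() == "OK":                  # nothing left to fix
            return draft, round_no
        draft = model(f"REVISE: {draft}\nFEEDBACK: {critique}")  # refine
    return draft, max_rounds


def toy_model(prompt: str) -> str:
    """Deterministic fake model (an assumption, not a real API)."""
    if prompt.startswith("DRAFT:"):
        return "agents are loops"
    if prompt.startswith("CRITIQUE:"):
        draft = prompt[len("CRITIQUE: "):]
        return "OK" if draft.endswith(".") else "Add a final period."
    if prompt.startswith("REVISE:"):
        draft = prompt[len("REVISE: "):].split("\nFEEDBACK:")[0]
        return draft + "."
    raise ValueError("unknown prompt")


if __name__ == "__main__":
    final, rounds = agentic_write("Explain agents in one sentence", toy_model)
    print(final, rounds)
```

A non-agentic call would stop after the first `model(...)` line. Everything inside the `for` loop is what makes the workflow agentic, and swapping `toy_model` for a real model client is the only change needed to try it on actual text.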

Ng illustrates this with benchmarks:

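  • HumanEval (Coding): On this benchmark (introduced in Evaluating Large Language Models Trained on Code, Chen et al., 2021), zero-shot GPT-3.5 solves about 48% of problems and zero-shot GPT-4 solves 67%. Wrap GPT-3.5 in an agentic loop, however, and it reaches up to 95.1% in the results Ng presents, surpassing the more advanced model.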

  • GSM8K (Math Problems): Similar gains, showing agents amplify existing models.

Today's agents aren't fully autonomous yet, but Ng frames these workflows as a step on the longer journey toward AGI.

The Four Pillars of Agentic Design

Andrew Ng outlines four patterns:

  1. Reflection: The AI reviews its own work. Prompt it to "check your code for bugs and efficiency," feed the critique back in, and it iterates. Papers like Self-Refine show GPT-4 improving its own outputs without extra training data or fine-tuning.

  2. Tool Use: Agents call external resources such as search, code execution, or APIs. Gorilla fine-tunes a model to write API calls and outperforms GPT-4 at that task; MM-REACT integrates vision tools for multimodal tasks.

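  3. Planning: Break tasks into steps, for example with Chain-of-Thought prompting. In Wei et al. (2022), performance that stays nearly flat under standard prompting as models scale rises steeply on reasoning benchmarks once the model is prompted to reason step by step.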

  4. Multi-Agent Collaboration: Teams of agents specialize and debate. ChatDev simulates a company (CEO, coder, tester) to build software; multi-agent debates (e.g., GPT vs. Gemini) boost accuracy.
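Planning and tool use often go together: the model proposes a list of steps, and ordinary code executes each step by calling a tool. The sketch below assumes a made-up JSON plan format and a hypothetical `"$prev"` placeholder for "the previous step's result"; `TOOLS`, `toy_planner`, and `run_plan` are illustrative names, and `toy_planner` is a hard-coded stand-in for a real model.

```python
import json
from typing import Callable

# Hypothetical tool registry: plain Python functions the agent may call.
TOOLS: dict[str, Callable[..., float]] = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
}


def toy_planner(task: str) -> str:
    """Stand-in for a model that answers with a JSON plan (assumed format)."""
    return json.dumps([
        {"tool": "add", "args": [2, 3]},
        {"tool": "multiply", "args": ["$prev", 4]},  # "$prev" = previous result
    ])


def run_plan(task: str, planner: Callable[[str], str]) -> float:
    """Parse the plan, then execute each step by dispatching to a tool."""
    steps = json.loads(planner(task))
    if not steps:
        raise ValueError("planner returned an empty plan")
    prev = None
    for step in steps:
        name = step["tool"]
        if name not in TOOLS:                 # refuse tools outside the registry
            raise KeyError(f"unknown tool: {name}")
        args = [prev if a == "$prev" else a for a in step["args"]]
        prev = TOOLS[name](*args)
    return prev


if __name__ == "__main__":
    print(run_plan("Compute (2 + 3) * 4", toy_planner))
```

Keeping the registry explicit means the model can only request tools, never run arbitrary code. That is one reasonable design choice for a prototype, not the only one.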

These patterns are messy but potent. Reflection is the most reliable of the four, while multi-agent collaboration can be "mind-blowing" but is less consistent.

Building and Applying Agents Today

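You don't need to build everything from scratch. Low-code tools like n8n and frameworks like CrewAI, AutoGen, and LangChain let you prototype agentic workflows quickly. Ng advises prioritizing fast token generation: because an agentic workflow calls the model many times per task, a faster model that completes more iterations can potentially outperform a slower, smarter one.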

Applications? Automate coding, research, or even game development. For businesses, this means productivity leaps; for developers, it's a new paradigm.

Recommended Reading

  1. Reflection

    • Self-Refine: Iterative Refinement with Self-Feedback, Madaan et al. (2023)

    • Reflexion: Language Agents with Verbal Reinforcement Learning, Shinn et al. (2023)

  2. Tool Use

    • Gorilla: Large Language Model Connected with Massive APIs, Patil et al. (2023)

    • MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action, Yang et al. (2023)

  3. Planning

    • Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, Wei et al. (2022)

    • HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face, Shen et al. (2023)

  4. Multi-Agent Collaboration

    • Communicative Agents for Software Development, Qian et al. (2023)

    • AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation, Wu et al. (2023)

Conclusion

AI agents aren't hype; they're here. Agentic workflows will dramatically expand the set of tasks AI can do, but they take time to run. We have to get used to delegating a task to an AI agent and waiting patiently for the response instead of expecting an instant answer.