AI Agents: A new paradigm for automating your work
One of the key concepts gaining traction in the AI field is the 'AI agent'. Unlike traditional chatbots or generative AI systems, AI agents go beyond simply answering questions: they can carry out complex work processes from start to finish on their own. Major technology companies like OpenAI, Nvidia, and Salesforce, as well as the market research firm Gartner, predict that AI agents will be a key technology of the future.
What is special about AI agents
While traditional generative AI focuses on responding to user questions or generating a specific piece of text, AI agents have a much higher degree of "agency" and can perform tasks that would otherwise require human intervention. They can automate multi-step processes, retrieve the information they need from external APIs or databases, and deliver a finished result, much as a human colleague would.
Car navigation vs. autonomous vehicles
You can think of traditional generative AI as "car navigation." It gives you directions or traffic updates, but you are still in control of the steering and pedals. AI agents, on the other hand, are like autonomous vehicles. They won't just give you directions—they can start the car, anticipate hazards, react appropriately, and drive themselves.
Tool Calling: a primary enabler for agents
In recent years, many libraries and low-code platforms have been developed to make creating AI agents easier than ever. One of the most intriguing advancements is "tool calling," which enables AI models to go beyond text generation and directly call functions (tools) to access external data or services and take real-world actions.
For instance, you could call the 'search_google' function to look for specific keywords and take further action based on the search results. The ability of a model to fetch necessary information, make appropriate decisions, and solve problems in multi-step processes is one of the most powerful features of AI agents.
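To make the pattern concrete, here is a minimal, library-free sketch: the model emits a structured tool call (represented here as JSON), and a thin dispatch layer looks up the matching function, runs it, and hands the result back to the model as context. The 'search_google' stub and the exact JSON shape are illustrative assumptions, not any particular vendor's API.

```python
import json

# Hypothetical tool: in a real system this would call a search API.
def search_google(query: str) -> str:
    return f"(stub) top results for '{query}'"

# Registry of functions the model is allowed to call.
TOOLS = {"search_google": search_google}

def handle_model_output(model_output: str) -> str:
    """Parse a JSON tool call emitted by the model and execute it."""
    call = json.loads(model_output)        # e.g. {"tool": "...", "arguments": {...}}
    tool = TOOLS[call["tool"]]             # look up the requested tool
    return tool(**call["arguments"])       # run it; the result goes back to the model as context

# Example: the model decided it needs a web search before answering.
model_output = '{"tool": "search_google", "arguments": {"query": "AI agent frameworks"}}'
print(handle_model_output(model_output))
```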
Two ways agents reason: evaluation and planning vs. tool use
AI agents utilize two main reasoning approaches for problem-solving:
- Reasoning through evaluation and planning
This involves breaking a problem down into steps, analyzing each step, and planning the next one. Methods like Chain-of-Thought and ReAct process intermediate results step by step and build a strategy toward the desired goal.
- Reasoning with tool use
This is the ability to call APIs or functions to fetch data or perform actions when needed. Deciding which tool to call and how to use its output falls into this category.
Some models excel at evaluation and planning, while others specialize in tool use. Combining both approaches produces stronger, more capable agents, as the sketch below illustrates.
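Here is a minimal sketch of how the two approaches combine in practice. The 'call_model' function is a stand-in for a real LLM call, and 'get_current_weather' is a stubbed tool; both are assumptions for illustration. The agent loops through plan, act, and observe steps, calling a tool when the model asks for one and stopping when the model produces a final answer.

```python
def get_current_weather(city: str) -> str:
    # Stand-in for a real weather API call.
    return f"It is 18°C and cloudy in {city}."

TOOLS = {"get_current_weather": get_current_weather}

def call_model(messages):
    """Stand-in for an LLM call. A real model would plan the next step
    (Chain-of-Thought / ReAct) and decide whether a tool is needed."""
    if not any(m["role"] == "tool" for m in messages):
        return {"type": "tool_call", "tool": "get_current_weather",
                "arguments": {"city": "Seoul"}}
    return {"type": "final", "content": "Take an umbrella; it looks cloudy in Seoul."}

def run_agent(user_question: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": user_question}]
    for _ in range(max_steps):                                 # plan -> act -> observe loop
        step = call_model(messages)
        if step["type"] == "final":                            # the model decided it can answer
            return step["content"]
        result = TOOLS[step["tool"]](**step["arguments"])      # act: run the chosen tool
        messages.append({"role": "tool", "content": result})   # observe: feed the result back
    return "Stopped: step limit reached."

print(run_agent("Do I need an umbrella in Seoul today?"))
```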
Key benchmarks for evaluating a model's tool-calling ability
Several benchmarks have been developed to evaluate how well models handle tool-calling tasks:
- BFCL (Berkeley Function Calling Leaderboard): Features over 2,000 question-function-answer triples covering diverse programming languages (an illustrative test case of this shape appears after this list). The latest version, BFCL-V3, includes multi-step and multi-turn function calls, letting you assess how proficient a model is at executing functions in the correct sequence.
- Nexus Function Calling Benchmark: Tests how well models utilize APIs in zero-shot scenarios with challenges like single, parallel, and nested tool calls. Models such as NexusRaven have demonstrated strong performance on this benchmark.
- ToolACE: With a synthetic dataset of over 26,000 APIs, this benchmark examines how models handle complex tool environments. Research is ongoing to refine evaluation methods for large-scale tool-calling scenarios.
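To give a feel for what these benchmarks measure, here is an illustrative shape of a single question-function-answer test case. The field names and schema are assumptions for illustration, not the official format of BFCL or any other benchmark.

```python
# Illustrative (not official) shape of a question-function-answer test case.
test_case = {
    "question": "How far is it from Berlin to Munich in kilometers?",
    "functions": [
        {
            "name": "get_distance_between_locations",
            "parameters": {
                "origin": {"type": "string"},
                "destination": {"type": "string"},
                "unit": {"type": "string", "enum": ["km", "mi"]},
            },
        }
    ],
    "expected_call": {
        "name": "get_distance_between_locations",
        "arguments": {"origin": "Berlin", "destination": "Munich", "unit": "km"},
    },
}
# Scoring then checks whether the model's emitted call matches expected_call
# (and, in multi-turn variants, whether calls appear in the right order).
```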
The complex problems of tool calling
To make tool calling effective, several challenges must be addressed:
- Choosing the right tool: Deciding which tool to use and ensuring the correct input is passed.
- Format consistency: Adhering to required formats like JSON, YAML, and XML when calling functions.
- Managing sequencing and dependencies: Understanding how the output of one tool can serve as input for another.
- Loop prevention: Avoiding infinite loops by carefully managing agent behavior.
- Security and ethical issues: Mitigating risks posed by connecting to external services or databases.
Given this complexity, it’s common to divide tasks among multiple models or agents that work together as a composite system. Each model specializes in tasks like document summarization, tool selection, or result validation, creating a more robust solution overall. Several of the guardrails above can also be enforced directly in code, as sketched below.
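This sketch assumes a JSON tool-call format and a hypothetical 'step_fn' standing in for one model turn; it shows three of the guardrails in miniature: format validation, a tool whitelist, and a hard step limit for loop prevention.

```python
import json

MAX_STEPS = 8                                                # loop prevention: hard cap on iterations
ALLOWED_TOOLS = {"search_google", "get_current_weather"}     # illustrative whitelist

def parse_tool_call(raw: str):
    """Format consistency check: accept only a well-formed, whitelisted JSON tool call."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return None                                          # malformed output: re-prompt instead of crashing
    if not isinstance(call, dict):
        return None
    if call.get("tool") not in ALLOWED_TOOLS:                # tool selection / security check
        return None
    if not isinstance(call.get("arguments"), dict):
        return None
    return call

def run_with_guardrails(step_fn):
    """Run one model turn per step under a hard step limit to avoid infinite loops."""
    for step in range(MAX_STEPS):
        call = parse_tool_call(step_fn(step))                # step_fn stands in for a single model turn
        if call is None:
            continue                                         # invalid call: skip or re-prompt the model
        return call                                          # a valid call to hand to the executor
    raise RuntimeError("Step limit reached without a valid tool call")

# Example: the model first emits malformed output, then a valid call.
outputs = ["oops, not JSON", '{"tool": "search_google", "arguments": {"query": "lost shipment"}}']
print(run_with_guardrails(lambda step: outputs[min(step, len(outputs) - 1)]))
```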
Example of tool calling in action
- Level 1: ChatGPT web browsing. When you enable the web search feature in ChatGPT, it calls a 'web search' function internally to fetch real-time information. This is tool calling at its most basic.
- Level 2: Define and use custom tools. For example, you can define functions such as 'get_distance_between_locations' and 'get_current_weather' and let your model call them. Tools like the Databricks Playground let you watch how the model calls each function and processes the results (a sketch of this pattern follows this list).
- Level 3: Configure a full agent. Once the actual code-execution layer is in place, you can assemble a complete agent. Frameworks such as LangGraph, AutoGen, Semantic Kernel, and LlamaIndex are commonly used at this stage.
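As a sketch of the Level 2 pattern, here is how the two custom tools mentioned above might be declared for a model using an OpenAI-style tool list, alongside stub Python implementations. The schema details, parameter names, and stub bodies are assumptions for illustration; adapt them to the platform you actually use.

```python
# Illustrative OpenAI-style "tools" list describing the two custom functions to the model.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Return the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_distance_between_locations",
            "description": "Return the travel distance between two places.",
            "parameters": {
                "type": "object",
                "properties": {
                    "origin": {"type": "string"},
                    "destination": {"type": "string"},
                },
                "required": ["origin", "destination"],
            },
        },
    },
]

# Matching Python stubs the execution layer dispatches to when the model calls a tool.
def get_current_weather(city: str) -> str:
    return f"(stub) weather lookup for {city}"                   # replace with a real weather API call

def get_distance_between_locations(origin: str, destination: str) -> str:
    return f"(stub) distance from {origin} to {destination}"     # replace with a real maps API call
```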
Real-world applications and potential for agents
- Insurance: automate the entire process from claim to payout by analyzing documents, images, and PDFs, verifying contract terms, and, when necessary, exchanging questions and answers with customers.
- Logistics: automate complex logistics processes by tracking lost shipments, managing communication with the many stakeholders involved, and even issuing compensation when needed.
- Marketing: from campaign planning to execution, performance analysis, and optimization, AI agents can work independently to maximize efficiency.
AI agents like these can cut the time and labor spent on routine tasks while freeing employees to focus on more creative and strategic work.
Things to consider when deploying
Of course, there is no shortage of obstacles to building and deploying agents: you may need models with large parameter counts, you must meet ethical and legal obligations, and, most importantly, you are unlikely to get the performance you want unless your organization's data quality and processes are in order.
What’s ahead
AI agents are not a passing fad; they are the future of value creation for industries. By combining reasoning abilities, namely planning and evaluation, with robust tool calling, such systems are already showing that they can automate 70–90% of repetitive real-world tasks.
There is also a clear trend toward smaller, specialized models. Larger models take on high-level, strategic reasoning, while smaller ones execute focused tasks. Together, they form systems that are both efficient and versatile, adapting to diverse requirements with precision.
At the root of their success is a single, basic question: "Are they actually useful?" No matter how advanced the technology, it is only worthwhile if it delivers real value. Fortunately, AI agents are already proving useful across dozens of industries, and the pace at which they are improving suggests they will soon be capable of even more. Shared platforms and tools are also emerging, opening the door for anyone, regardless of technical expertise, to create personalized agents for their own needs.
Billed as the 'next generation of AI', these agents are designed to go beyond simple question-and-answer functionality. They are starting to manage complex workflows independently, decide which tools are best for the job, and complete tasks without human involvement. This shift is already under way, so now is the right time to explore their potential.