
Autonomous Agents using LLMs

Introduction

Large Language Models (LLMs) are becoming increasingly popular. Most use cases still revolve around typical content generation tasks, such as plain text generation, information extraction, summarization, or more advanced scenarios like question answering. However, a growing number of discussions and implementations center on agents, which represent the next level for LLMs.

Custom Knowledge

To perform many tasks with LLMs, we normally rely on their inherent knowledge – their ability to answer questions, summarize, or extract information. But what if we wish to leverage knowledge that the model lacks?

This is where Retrieval Augmented Generation (RAG) comes into play. Simply put, RAG is a method of constructing the prompt in a way that introduces new knowledge (new to the LLM, at least). This knowledge is usually retrieved from a database commonly referred to as a vector store. The process unfolds as follows (a code sketch follows the list):

  1. The user poses a question.
  2. Relevant pieces of knowledge are sought in a vector store.
  3. This knowledge is incorporated into the prompt along with the original question.
  4. The LLM responds utilizing this newfound knowledge.
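To make this concrete, here is a minimal sketch of the four steps. The vector_store and llm objects are hypothetical stand-ins for a real vector database client and an LLM client – the names and signatures are illustrative only, not any specific library's API:

```python
# Minimal RAG sketch. `vector_store` and `llm` are hypothetical stand-ins
# for a vector database client and an LLM client; names and signatures
# are illustrative only.

def answer_with_rag(question: str, vector_store, llm, top_k: int = 3) -> str:
    # Step 2: look up the most relevant chunks of knowledge.
    chunks = vector_store.similarity_search(question, k=top_k)

    # Step 3: incorporate that knowledge into the prompt.
    context = "\n\n".join(chunk.text for chunk in chunks)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

    # Step 4: the LLM responds utilizing the newfound knowledge.
    return llm.complete(prompt)
```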

Although it may seem straightforward, the hard part is retrieving relevant context. Doing it efficiently and accurately requires chunking the original content, and it is difficult to chunk well while keeping search quality high.
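Chunking itself can be as naive as a sliding window over the text – the hard part is picking boundaries that preserve meaning. A simple baseline, for illustration only:

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Naive fixed-size chunking with overlap. Production pipelines
    usually split on sentence or section boundaries instead."""
    step = size - overlap
    return [text[start:start + size] for start in range(0, len(text), step)]
```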

Beyond Custom Knowledge

RAG works well when the content is known and relatively static (although it may change, requiring updates to the vector store). This method finds widespread use in chatbots or virtual assistants, leveraging custom knowledge to assist users in answering their queries. 

But what if the knowledge is not static? Consider weather information, NASDAQ share prices, or our own API – one capable of, for instance, retrieving a travel agency's list of offered destinations along with their prices. This presents a different challenge, and not a new one: when designing a bot, you need to know when specific external tools should be called and what the user expects. It amounts to a complex system of rules – for example:

"if the user wants to order a pizza, ask them to specify the size; available sizes can be fetched from the following endpoint."

The solution to such a challenge is well-established and widely adopted: intent-based chatbots. 

Intent-Based Chatbots 

These chatbots rely on machine learning models trained on textual data (though not LLMs). Let's delve into the structure of such a bot.

To begin, you must define an intent – let's use a pizza ordering process as an example. An intent encapsulates the user's desire, such as "I want to order a pizza." Fulfilling this intent requires the user to provide certain details – size, type of pizza, toppings – enabling the bot to execute the process and create an order. Sounds simple? Not quite. Two significant challenges arise:

  1. Text recognition. While a user might initiate the conversation with "Hi, can I order a pizza?", they can also go straight to "Do you have Napoli?" The bot must discern when the intent fulfillment process begins. Recognition doesn't need to be 100% accurate – bot engines like Amazon Lex are trained on sample utterances to improve it – but as a bot designer, you must anticipate user responses at each stage and carefully consider the potential phrasings.
  2. Handling deviations from the expected path. Designing the bot based on the ideal scenario is relatively straightforward. However, what if the user, after stating "I want Napoli," immediately follows with, "No, no, I want Margherita, and with additional Pepperoni!" How does the bot detect and interpret the user's true intention? This aspect of designing intent-based bots demands significant User Experience work since it essentially functions as another type of user interface.

Addressing these challenges often involves providing users with explicit choices to fill slots accurately, such as "What size of pizza would you like to order? [small] [medium] [large]." While this simplifies the flow, it deviates from natural conversation, reducing interaction to selecting options step by step. The process is functional but lacks spontaneity. 

Figure 1. General schema of the intent-based bot flow. All intents and slots must be carefully designed, and hooks must be attached to the corresponding slots. 
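For comparison, this is roughly what a single intent definition looks like. The structure below is a simplified, hypothetical rendering in the spirit of engines like Amazon Lex – real schemas differ, and all field names here are illustrative:

```python
# Hypothetical intent definition, loosely modeled on engines such as
# Amazon Lex; field names are illustrative, not a real schema.
order_pizza_intent = {
    "name": "OrderPizza",
    "sampleUtterances": [
        "I want to order a pizza",
        "Can I order a pizza?",
        "Do you have {PizzaType}?",
    ],
    "slots": [
        {"name": "PizzaSize", "values": ["small", "medium", "large"],
         "prompt": "What size of pizza would you like? [small] [medium] [large]"},
        {"name": "PizzaType", "values": ["Napoli", "Capricciosa", "Pepperoni"],
         "prompt": "Which type of pizza would you like?"},
        {"name": "DeliveryAddress", "type": "free-text",
         "prompt": "What is the delivery address?"},
    ],
    # Hook attached to the intent, invoked once every slot is filled.
    "fulfillment": "create_order_and_send_email",
}
```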

Conversational Bots

With the emergence of LLMs, conversational bots became widely prevalent. Users could engage in discussions with bots across various platforms. This led to amusing situations, such as a user generating Python code on a car dealer's website – certainly not the intended outcome. 

Conversational bots also offered a great deal of freedom and simplicity in their setup. Designing them didn't require extensive hours of work: you could simply create a bot with a single (often complex) system prompt, and voilà! Later on, RAG began to be employed to integrate knowledge – suddenly, we had an interactive call center utilizing our internal knowledge to assist users.

Now, a new development is unfolding: the creation of intent-based bots without the need for overly complex UX processes, allowing the bot to execute tasks without rigidly defining how those tasks should be performed. 

What we desire is to instruct the LLM, for example, "Act as a pizza restaurant service. If the customer wants to order a pizza, ask them what size they want (out of possible 3: small, medium, and large), what type (using the given types: Napoli, Capricciosa, Pepperoni), and what the delivery address is. If you have all the information, make an order and send the email." To demonstrate the simplicity of this approach, we pasted the aforementioned prompt into the chatbot window of Amazon Bedrock’s playground (using Anthropic’s Claude 2 model). Here are the results: 

Initial system prompt: Act as a pizza restaurant service. If the customer wants to order a pizza, ask them what size (out of possible 3: small, medium and large), what type (using the given types: Napoli, Capricciosa, Pepperoni) and what is the address of delivery. If you have all the information, make an order and send the email. 

Bot: Thank you for contacting PizzaBot! I'm happy to take your pizza order.  

To get started, what size pizza would you like - small, medium, or large? 

Client: Medium please 

Bot: Sounds good, medium it is. What type of pizza would you like? The options are Napoli, Capricciosa, or Pepperoni. 

Client: Pepperoni  

Bot: Excellent, a medium Pepperoni pizza. Lastly, I'll need the delivery address please. 

Client: can I change the type of pizza to Napoli?  

Bot: No problem, let's change that to a medium Napoli pizza instead of Pepperoni. Can you please confirm the delivery address? 

Client: Sure, Gdańsk, Grunwaldzka 25. And please change the size to large. 

Bot: Got it, updating the order to a large Napoli pizza for delivery to: 

Gdańsk, Grunwaldzka 25 

Perfect, your order for a large Napoli pizza has been placed and will be delivered to Gdańsk, Grunwaldzka 25 shortly. I'll send a confirmation email with your order details and expected delivery time. Thank you for choosing PizzaBot! 

Fantastic, isn’t it? We didn't even have to design any scenarios for changing the order mid-conversation. However, one crucial element is missing here – the pizza types, sizes, definitions, and so on cannot be made up by the model; they must be provided from our internal data, which may change quite often. Furthermore, sending an order via email requires invoking an external endpoint. How can we achieve this? By utilizing agents.

We can now join forces: the LLM bot will guide the conversation using prompts. What does this mean? As mentioned in the description of intent-based bots, we need to create every planned intent explicitly – in other words, we need to know what the user really needs, which questions to ask, and which answers to expect. For intent-based bots, this is a difficult process that requires a lot of UX planning. With LLMs, we specify the same things, but in plain English – no more painful process design and intent creation!

Figure 2. Natural conversation flow using the complex system prompt and API specification.

Moreover, if we provide information about available tools (such as an email sender or a pizza-type listing endpoint), the LLM will instruct us on what to do. If we parse the response properly (in a programmatic manner), we can call the external service, retrieve the information, return it to the LLM, and obtain a human-like response. This process is akin to obtaining context in RAG – the difference lies in the fact that RAG allows the use of static, indexed content, while agents utilize dynamic context by invoking external APIs or other sources of information. 
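A stripped-down version of that loop might look as follows. Everything here is an illustrative assumption – the JSON action format, the tool registry, and the llm.chat stand-in – rather than any particular framework's API:

```python
import json

# Hypothetical tool registry: the functions the agent may call.
TOOLS = {
    "list_pizza_types": lambda: ["Napoli", "Capricciosa", "Pepperoni"],
    "send_order_email": lambda order: f"order sent: {order}",
}

def run_agent(user_message: str, llm, max_turns: int = 10) -> str:
    """Minimal agent loop: the LLM either answers directly or asks us to
    invoke a tool; the tool's result is fed back into the conversation."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):
        # Assume the system prompt instructs the model to reply with JSON:
        # {"action": "tool", "tool": "...", "args": [...]} or
        # {"action": "final", "answer": "..."}.
        reply = json.loads(llm.chat(messages))
        if reply["action"] == "final":
            return reply["answer"]
        result = TOOLS[reply["tool"]](*reply.get("args", []))
        messages.append({"role": "tool", "content": json.dumps(result)})
    raise RuntimeError("agent did not finish within the turn limit")
```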

Numerous other scenarios are possible. For instance, consider a travel agency with an internal API exposing endpoints that return lists of destinations, details of specific destinations, and allow bookings. With this setup, you can develop a simple bot to assist clients in finding enticing offers and making bookings. 
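The tool description handed to the LLM for such a bot can stay very small. The endpoint paths and names below are invented purely for illustration:

```python
# Hypothetical tool descriptions for a travel-agency bot; endpoint paths
# and names are invented for illustration.
TRAVEL_TOOLS = [
    {"name": "list_destinations", "endpoint": "GET /destinations",
     "description": "Return all offered destinations with their prices."},
    {"name": "get_destination", "endpoint": "GET /destinations/{id}",
     "description": "Return the details of a specific destination."},
    {"name": "book_trip", "endpoint": "POST /bookings",
     "description": "Create a booking for a destination and a client."},
]
```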

Automation 

This concept can be extended to implement automation processes. Let’s consider a workflow where data needs validation, transformation, or textual parsing. Constructing such a pipeline using LLMs is more intricate than implementing a bot as described above, but it is still feasible. Additionally, it results in more flexible process management and simpler flow definition compared to typical orchestration tools. Moreover, altering the process definition is also more straightforward, as it is strictly defined in prompts using plain English. 
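As an illustration, a single validation step in such a pipeline could delegate the rule itself to the LLM, so the process definition lives in the prompt. The llm client is again a hypothetical stand-in:

```python
def validate_record(record: dict, llm) -> bool:
    """One pipeline step: the validation rule lives in plain English,
    so changing the process means editing the prompt, not the code."""
    prompt = (
        "You validate customer records. A record is valid if it contains "
        "a plausible postal address and a non-empty name.\n"
        f"Record: {record}\n"
        "Answer with exactly VALID or INVALID."
    )
    return llm.complete(prompt).strip().upper() == "VALID"
```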

It is important to remain realistic, though. Implementing agents is not a simple task: it requires crafting complex prompts that expose the available toolset, plus code that transforms the LLM's responses into API invocations. Nonetheless, this complexity can be mitigated by employing orchestration tools such as Amazon Bedrock.
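With Amazon Bedrock Agents, for example, the orchestration loop runs inside the service, and the client code shrinks to a single call. The sketch below uses boto3's bedrock-agent-runtime client and assumes an agent has already been configured; the IDs are placeholders:

```python
import boto3

# Sketch of invoking a pre-configured Bedrock agent; the agent and
# alias IDs are placeholders for values from your own configuration.
client = boto3.client("bedrock-agent-runtime")

response = client.invoke_agent(
    agentId="YOUR_AGENT_ID",
    agentAliasId="YOUR_AGENT_ALIAS_ID",
    sessionId="demo-session-1",
    inputText="I want to order a large Napoli pizza.",
)

# The reply comes back as an event stream of byte chunks.
completion = "".join(
    event["chunk"]["bytes"].decode("utf-8")
    for event in response["completion"]
    if "chunk" in event
)
print(completion)
```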

Summary

Agents provide us with significant freedom in designing complex bots and workflows without investing hours in UX flows or sophisticated rulesets. There is a trade-off, though – since LLMs are inherently non-deterministic, agents must be meticulously designed to yield acceptable responses and perform their tasks accurately. Doing so brings a remarkable reward: the ability to create intelligent assistants that operate (almost) as humans do!

To explore how your organization can leverage Generative AI and LLMs, visit our page on Large Language Models.
