AI's hidden state in the execution stack
The natural way for developers to try out code is in a simple console application: obvious, quick, and easy. It is also one of those things that can completely mislead you about the actual realities of using a particular API.
For example, let’s take a look at what is probably the most trivial chatbot example:
var kernel = Kernel.CreateBuilder()
    .AddAzureOpenAIChatCompletion(...)
    .Build();
var chatService = kernel.GetRequiredService<IChatCompletionService>();
var chatHistory = new ChatHistory("You are a friendly chatbot.");
while (true)
{
    Console.Write("User: ");
    chatHistory.AddUserMessage(Console.ReadLine());
    var response = await chatService.GetChatMessageContentAsync(
        chatHistory, kernel: kernel);
    Console.WriteLine($"Chatbot: {response}");
    chatHistory.AddAssistantMessage(response.ToString());
}
If you run this code, you’ll be able to have a really interesting chat with the model, and it is pretty amazing that it takes fewer than 15 lines of code to make it happen.
What is really interesting here is that there is so much going on that you cannot really see. In particular, just how much state is being kept by this code without you actually realizing it.
Let’s look at the same code when we use a web backend for it:
app.MapPost("/chat/{sessionId}", async (string sessionId,
    HttpContext context, IChatCompletionService chatService,
    ConcurrentDictionary<string, ChatHistory> sessions) =>
{
    var history = sessions.GetOrAdd(sessionId, _ => new ChatHistory(
        "You are a friendly chatbot."));
    var request = await context.Request.ReadFromJsonAsync<UserMessage>();
    history.AddUserMessage(request.Message);
    var response = await chatService.GetChatMessageContentAsync(history,
        kernel: kernel);
    history.AddAssistantMessage(response.ToString());
    return Results.Ok(new { Response = response.ToString() });
});
Suddenly, you can see that you have a lot of state to maintain here. In particular, we have the chat history (which we keep around between requests using a concurrent dictionary). We need that because the model requires us to send all the previous interactions we had in order to maintain context.
Note that for proper use, we’ll also need to deal with concurrency - for example, if two requests happen in the same session at the same time…
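One way to handle that concurrency is to serialize requests within a session. The sketch below pairs each session with a SemaphoreSlim and holds it for the duration of the request; the `locks` dictionary and its naming are my own illustration, not part of the original code:

var locks = new ConcurrentDictionary<string, SemaphoreSlim>();

app.MapPost("/chat/{sessionId}", async (string sessionId, /* ... */) =>
{
    // One gate per session; SemaphoreSlim(1, 1) allows a single holder.
    var gate = locks.GetOrAdd(sessionId, _ => new SemaphoreSlim(1, 1));
    await gate.WaitAsync();
    try
    {
        // ... read the request, mutate the ChatHistory, call the model ...
    }
    finally
    {
        gate.Release();
    }
});

Two simultaneous requests for the same session now queue up instead of interleaving their mutations of the shared ChatHistory.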
But that is still a fairly reasonable thing to do. Now, let’s see a slightly more complex example with tool calls, using the by-now venerable get weather call:
public class WeatherTools
{
    [KernelFunction("get_weather")]
    [Description("Get weather for a city")]
    public string GetWeather(string city) => $"Sunny in {city}.";
}
var builder = Kernel.CreateBuilder().AddAzureOpenAIChatCompletion(...);
builder.Plugins.AddFromType<WeatherTools>();
var kernel = builder.Build();
var chatService = kernel.GetRequiredService<IChatCompletionService>();
var settings = new OpenAIPromptExecutionSettings {
    ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions
};
var history = new ChatHistory("You are a friendly chatbot with tools.");
while (true)
{
    Console.Write("User: ");
    history.AddUserMessage(Console.ReadLine());
    var response = await chatService.GetChatMessageContentAsync(
        history, settings, kernel);
    history.Add(response);
    Console.WriteLine($"Chatbot: {response.Content}");
}
The AutoInvokeKernelFunctions setting is doing a lot of work for you that isn’t immediately obvious. The catch here is that this is still pretty small & reasonable code. Now, try to imagine that you need a tool call such as: ReplaceProduct(old, new, reason).
The idea is that if we don’t have one type of milk, we can substitute it with another. But that requires user approval for the change. Conceptually, this is exactly the same as the previous tool call, and it is pretty trivial to implement that:
[KernelFunction("replace_product")]
[Description("Confirm product replacement with the user")]
public string ReplaceProduct(string old, string replacement, string reason)
{
    Console.WriteLine($"{old} -> {replacement}: {reason}? (yes/no)");
    return Console.ReadLine();
}
Now, in the same way I transformed the first code sample using the console into a POST request handler, try to imagine what you’ll need to write to send this to the browser for a user to confirm that.
That is when you realize that these 20 lines of code have been transformed into managing a lot of state for you. State that you are implicitly storing inside the execution stack.
You need to gather the tool name, ID and arguments, schlep them to the user, and in a new request get their response. Then you need to identify that this is a tool call answer and go back to the model. That is a separate state from handling a new input from the user.
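Concretely, the state the execution stack was holding for you now has to be modeled explicitly. A minimal sketch of what that state looks like (the record and the two-phase split are my own illustration, not Semantic Kernel APIs):

// State that the execution stack used to hold for us, now made explicit.
// We must remember which tool call is awaiting the user's answer so that
// a later request can resume the conversation at the right point.
public record PendingToolCall(
    string SessionId,
    string ToolCallId,     // the id the model assigned to this call
    string ToolName,       // e.g. "replace_product"
    string ArgumentsJson); // the arguments, echoed back with the result

// Phase 1: the model asks for replace_product. Instead of answering the
// model, store a PendingToolCall and return the question to the browser.

// Phase 2: a new HTTP request arrives carrying the user's yes/no. Look up
// the PendingToolCall, append a tool-result message with the matching
// ToolCallId to the ChatHistory, and only then call the model again.

In the console version, all of this was just a local variable and a blocked Console.ReadLine() call.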
None of the code is particularly crazy, of course, but you now need to handle the model, the backend, and the frontend states.
When looking at an API, I look to see how it handles actual realistic use cases, because it is so very easy to get caught up in console app demos - and it turns out that the execution stack can carry quite a lot of weight for you.