AI's hidden state in the execution stack
The natural way for developers to try out code is in a simple console application: obvious, quick, and easy. It is also one of those things that can completely mislead you about the actual realities of using a particular API.
For example, let’s take a look at what is probably the most trivial chatbot example:
var kernel = Kernel.CreateBuilder()
    .AddAzureOpenAIChatCompletion(...)
    .Build();
var chatService = kernel.GetRequiredService<IChatCompletionService>();
var chatHistory = new ChatHistory("You are a friendly chatbot.");
while (true)
{
    Console.Write("User: ");
    chatHistory.AddUserMessage(Console.ReadLine());
    var response = await chatService.GetChatMessageContentAsync(
        chatHistory, kernel: kernel);
    Console.WriteLine($"Chatbot: {response}");
    chatHistory.AddAssistantMessage(response.ToString());
}
If you run this code, you’ll be able to have a really interesting chat with the model, and it is pretty amazing that it takes fewer than 15 lines of code to make it happen.
What is really interesting here is that there is so much going on that you cannot really see. In particular, just how much state is being kept by this code without you actually realizing it.
Let’s look at the same code when we use a web backend for it:
app.MapPost("/chat/{sessionId}", async (string sessionId,
    HttpContext context, IChatCompletionService chatService,
    ConcurrentDictionary<string, ChatHistory> sessions) =>
{
    var history = sessions.GetOrAdd(sessionId, _ => new ChatHistory(
        "You are a friendly chatbot."));
    var request = await context.Request.ReadFromJsonAsync<UserMessage>();
    history.AddUserMessage(request.Message);
    var response = await chatService.GetChatMessageContentAsync(history,
        kernel: kernel);
    history.AddAssistantMessage(response.ToString());
    return Results.Ok(new { Response = response.ToString() });
});
Suddenly, you can see that you have a lot of state to maintain here. In particular, we have the chat history (which we keep around between requests using a concurrent dictionary). We need that because the model requires us to send all the previous interactions we had in order to maintain context.
Note that for proper use, we’ll also need to deal with concurrency - for example, if two requests happen in the same session at the same time…
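One way to handle that concurrency is to serialize requests within a session. The sketch below pairs each session with a SemaphoreSlim and holds it for the duration of the request; the `locks` dictionary and its naming are my own illustration, not part of the original code:

var locks = new ConcurrentDictionary<string, SemaphoreSlim>();

app.MapPost("/chat/{sessionId}", async (string sessionId, /* ... */) =>
{
    // One gate per session; SemaphoreSlim(1, 1) allows a single holder.
    var gate = locks.GetOrAdd(sessionId, _ => new SemaphoreSlim(1, 1));
    await gate.WaitAsync();
    try
    {
        // ... read the request, mutate the ChatHistory, call the model ...
    }
    finally
    {
        gate.Release();
    }
});

Two simultaneous requests for the same session now queue up instead of interleaving their mutations of the shared ChatHistory.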
But that is still a fairly reasonable thing to do. Now, let’s see a slightly more complex example with tool calls, using the by-now venerable get weather call:
public class WeatherTools
{
    [KernelFunction("get_weather")]
    [Description("Get weather for a city")]
    public string GetWeather(string city) => $"Sunny in {city}.";
}
var builder = Kernel.CreateBuilder().AddAzureOpenAIChatCompletion(...);
builder.Plugins.AddFromType<WeatherTools>();
var kernel = builder.Build();
var chatService = kernel.GetRequiredService<IChatCompletionService>();
var settings = new OpenAIPromptExecutionSettings {
    ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions
};
var history = new ChatHistory("You are a friendly chatbot with tools.");
while (true)
{
    Console.Write("User: ");
    history.AddUserMessage(Console.ReadLine());
    var response = await chatService.GetChatMessageContentAsync(
        history, settings, kernel);
    history.Add(response);
    Console.WriteLine($"Chatbot: {response.Content}");
}
The AutoInvokeKernelFunctions setting is doing a lot of work for you that isn’t immediately obvious. The catch here is that this is still pretty small & reasonable code. Now, try to imagine that you need a tool call such as: ReplaceProduct(old, new, reason).
The idea is that if we don’t have one type of milk, we can substitute it with another. But that requires user approval for the change. Conceptually, this is exactly the same as the previous tool call, and it is pretty trivial to implement that:
[KernelFunction("replace_product")]
[Description("Confirm product replacement with the user")]
public string ReplaceProduct(string old, string replacement, string reason)
{
    Console.WriteLine($"{old} -> {replacement}: {reason}? (yes/no)");
    return Console.ReadLine();
}
Now, in the same way I transformed the first code sample using the console into a POST request handler, try to imagine what you’ll need to write to send this to the browser for a user to confirm that.
That is when you realize that these 20 lines of code have been transformed into managing a lot of state for you. State that you are implicitly storing inside the execution stack.
You need to gather the tool name, ID and arguments, schlep them to the user, and in a new request get their response. Then you need to identify that this is a tool call answer and go back to the model. That is a separate state from handling a new input from the user.
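Concretely, the state the execution stack was holding for you now has to be modeled explicitly. A minimal sketch of what that state looks like (the record and the two-phase split are my own illustration, not Semantic Kernel APIs):

// State that the execution stack used to hold for us, now made explicit.
// We must remember which tool call is awaiting the user's answer so that
// a later request can resume the conversation at the right point.
public record PendingToolCall(
    string SessionId,
    string ToolCallId,     // the id the model assigned to this call
    string ToolName,       // e.g. "replace_product"
    string ArgumentsJson); // the arguments, echoed back with the result

// Phase 1: the model asks for replace_product. Instead of answering the
// model, store a PendingToolCall and return the question to the browser.

// Phase 2: a new HTTP request arrives carrying the user's yes/no. Look up
// the PendingToolCall, append a tool-result message with the matching
// ToolCallId to the ChatHistory, and only then call the model again.

In the console version, all of this was just a local variable and a blocked Console.ReadLine() call.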
None of the code is particularly crazy, of course, but you now need to handle the model, the backend, and the frontend states.
When looking at an API, I look to see how it handles actual realistic use cases, because it is so very easy to get caught up in console app demos - and it turns out that the execution stack can carry quite a lot of weight for you.