The future of AI interaction: Beyond just text

    Kent C. Dodds

    We're only scratching the surface of what it means to interact with our services using AI and natural language. The Model Context Protocol (MCP) is opening up exciting possibilities in this space, but I think we're about to see a major shift in how we think about these interactions.

    My mind was initially opened to the possibilities of using AI to interact with data and make data mutations when I saw Evan Bacon's talk at ReactConf 2024. While his talk wasn't specifically about AI and MCPs (the spec hadn't even been invented yet), his demo used an LLM to interpret natural language queries and determine which components to display based on the responses. If you didn't watch the video above, catch the demo here.

    This was truly mind-blowing for me. At the time, I was mostly interested in the power of React Server Components he was demonstrating, but it was fascinating to see an integration that went beyond just text in an app — where you could interact with a large language model like ChatGPT, Claude, or Gemini.

    Evan is a brilliant developer and had to write a lot of custom code for that demo to work with the different service providers. With MCPs, you can still write that custom code to do those integrations, but the service providers can (and I think will) write their own MCP servers to meet their users where they are: in an AI application like this one.

    What users really want

    This is the user experience that our users want. They don't want to navigate through 30 different websites (or even a single complex site) to accomplish one task. They want a single interface where they can express exactly what they want to do or what information they need, and then have that information presented in the most reasonable way possible.

    This realization led me to open a discussion on the Model Context Protocol GitHub about having MCP clients support UI that MCP servers return. Whether that UI is provided by the server or generated by the client on demand, it's becoming clear that users often want to interact with UI elements, not just text.
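    To make that concrete, here's a rough sketch of what the client side could look like. The simplified result type and the idea of treating a `text/html` embedded resource as renderable UI are just my illustration of that discussion, not anything the spec defines today.

```ts
// Minimal, illustrative types: a simplified subset of an MCP tool result.
type ToolContent =
  | { type: "text"; text: string }
  | { type: "resource"; resource: { uri: string; mimeType?: string; text?: string } }

type ToolResult = { content: ToolContent[] }

// Hypothetical client-side handling: prefer an embedded UI resource when the
// server provides one, and fall back to plain text otherwise.
function renderToolResult(result: ToolResult, container: HTMLElement) {
	for (const item of result.content) {
		if (
			item.type === 'resource' &&
			item.resource.mimeType === 'text/html' &&
			item.resource.text
		) {
			// Render server-provided UI in a sandboxed iframe so it can't touch the host app.
			const frame = document.createElement('iframe')
			frame.sandbox.add('allow-scripts')
			frame.srcdoc = item.resource.text
			container.appendChild(frame)
		} else if (item.type === 'text') {
			// Plain text fallback for clients (or content) without UI support.
			const p = document.createElement('p')
			p.textContent = item.text
			container.appendChild(p)
		}
	}
}
```

    Sandboxing matters here: the client is rendering markup from a third party, so that markup shouldn't get free rein over the host app.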

    The importance of UI in AI interactions

    Let me give you a concrete example. Imagine a user says, "I want to find directions to the nearest taco stand." Sure, an LLM could use an MCP server to get their location, then use a Google Maps MCP server to find directions, and spit out a text response like "Turn left, then right, walk a mile, then turn left again."

    But is that really a great user experience? Of course not!

    What would be much better is if Google Maps returned some UI that showed exactly what you'd expect from a Google Maps directions interface — a UI generated on-demand for the user's specific use case.
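    On the server side of that taco stand flow, here's a minimal sketch of what a directions tool might return: a plain-text fallback for text-only clients, plus an embedded HTML resource that a UI-capable client could render instead. It assumes the TypeScript MCP SDK (`@modelcontextprotocol/sdk`) and zod, and the tool name, the `ui://` URI, and the hard-coded steps are all made up for illustration.

```ts
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js'
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js'
import { z } from 'zod'

const server = new McpServer({ name: 'directions-demo', version: '0.0.1' })

// Hypothetical tool: returns text for text-only clients plus an embedded
// HTML resource that a UI-capable client could render instead.
server.tool(
	'get_directions',
	'Get walking directions between two places',
	{ origin: z.string(), destination: z.string() },
	async ({ origin, destination }) => {
		// Stand-in data; a real server would call a maps API here.
		const steps = ['Turn left', 'Turn right', 'Walk a mile', 'Turn left again']
		return {
			content: [
				{
					type: 'text',
					text: `Directions from ${origin} to ${destination}: ${steps.join(', ')}.`,
				},
				{
					type: 'resource',
					resource: {
						uri: 'ui://directions/latest', // illustrative URI scheme, not part of the spec
						mimeType: 'text/html',
						text: `<ol>${steps.map((s) => `<li>${s}</li>`).join('')}</ol>`,
					},
				},
			],
		}
	},
)

await server.connect(new StdioServerTransport())
```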

    Here are a few more examples where UI is crucial:

    1. Stopwatch: If my kid asks me to time their handstands (which they definitely do), it's much better to have start and stop buttons than to type or say "start" and "stop" (see the sketch after this list).
    2. Data visualization: Displaying data in a graphical interface with charts and the ability to select data points is far more useful than a text description in many cases.
    3. Multimodal interactions: Sometimes, you might want both button and voice controls. Imagine timing your reps while lifting weights — you might want to use voice commands when your hands are occupied.
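    For the stopwatch case, the UI itself is tiny; the interesting part is that a client could materialize something like this on demand instead of making you type "start" and "stop". A minimal React sketch (nothing MCP-specific about it):

```tsx
import { useRef, useState } from 'react'

// A tiny stopwatch: start/stop buttons beat typing "start" and "stop" at an LLM.
export function Stopwatch() {
	const [elapsedMs, setElapsedMs] = useState(0)
	const [running, setRunning] = useState(false)
	const intervalRef = useRef<ReturnType<typeof setInterval> | null>(null)
	const startedAtRef = useRef(0)

	function start() {
		startedAtRef.current = Date.now() - elapsedMs
		intervalRef.current = setInterval(
			() => setElapsedMs(Date.now() - startedAtRef.current),
			100,
		)
		setRunning(true)
	}

	function stop() {
		if (intervalRef.current) clearInterval(intervalRef.current)
		setRunning(false)
	}

	return (
		<div>
			<span>{(elapsedMs / 1000).toFixed(1)}s</span>
			<button onClick={running ? stop : start}>{running ? 'Stop' : 'Start'}</button>
		</div>
	)
}
```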

    The future of UI development

    Now, I've heard some people worry that UI developers are going to be out of a job because AI is going to make that discipline obsolete. But I strongly disagree.

    Users are not going to want to interact with AI via text or voice alone all the time. We're going to continue to need visualizations, buttons, and other types of interactive experiences. As Evan Bacon's demo shows, this is not only possible but can create an awesome experience.

    Looking forward

    I don't know exactly what shape this will take. Maybe it'll end up using React Server Components on web and mobile, as Evan demonstrated. Or perhaps we'll have AI generate UI on demand based on some tokens we provide, allowing for consistent branding and look-and-feel, but still completely dynamic UIs. Or maybe both.
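    To sketch what "some tokens we provide" could mean in practice: the host app might hand the model a set of design tokens along with the request to generate UI, so whatever it produces still matches the app's branding. Every name and value here is made up for illustration.

```ts
// Illustrative design tokens a host app might hand to a UI-generating model.
const designTokens = {
	color: { primary: '#6b21a8', surface: '#ffffff', text: '#111827' },
	radius: { sm: '4px', lg: '12px' },
	font: { body: 'system-ui, sans-serif' },
	spacing: { sm: '8px', md: '16px' },
} as const

// A hypothetical prompt: the model generates markup, but styles it only with
// these tokens, so the on-demand UI still matches the host app's look and feel.
const prompt = [
	'Generate an HTML card for the tool result below.',
	`Use only these design tokens for styling: ${JSON.stringify(designTokens)}`,
].join('\n')
```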

    What I do know is this: the future of AI assistants that help us solve our day-to-day problems and answer our questions is going to involve much more than just text. And I, for one, am very much looking forward to it.

    So, the next time you're interacting with an AI assistant, imagine what it could be like if it wasn't limited to just text. The possibilities are truly exciting, and we're only at the beginning of this journey.

    Transcript

    Hey, I want to show you something that just blew my mind last year and kind of informed a lot of what I think the future is going to look like for user interaction. And now that we have MCPs, we can actually do this in a really reasonable way. It's a demo that I saw at React Conf from Evan Bacon, so check the link below this video to see the actual demo. But we're gonna scroll through, yeah.

    So Evan Bacon, awesome, awesome person. He was talking about React Server Components and using them in a native environment. Really, really cool stuff. But the demo that he chose was: let's take an AI app and compare ChatGPT and Gemini's experience in listing movies versus what we could do with React Native and stuff. And I think that what he did was really remarkable.

    So here we go. If I were to ask ChatGPT or Claude or something to list movies, I'd get a list of text, but here we're getting an interactive piece of UI. So that's quite nice. And then we take this even further, and now he's going to say, hey, create an event. And look, we have an event card where we can create an event, and that integrates directly with the UI from our local app.

    We can send a message and we have a UI for sending a message. Maybe with natural language and stuff, you could literally just say, hey, send a message to Charlie and here's what it is, and maybe you wouldn't necessarily need a UI. Who wants to type when you can talk? But maybe you're in a library and you'd rather type than talk. The cool thing is that all of this is generated as we go.

    Hey, show me things to do on the Las Vegas Strip. Look, we've got a map. So we don't need a bunch of text directions for all of this stuff. Taking it further, we have so many examples of weather with MCPs. It's like the canonical introductory example for people.

    Having a weather card with some UI, even though I'm not actually going to interact with it a whole lot, is a lot better than just having some text. And here you actually do have some interaction: you can change it to Fahrenheit or Celsius if you want. And then, of course, booking a ride with Uber. I can imagine that Uber would definitely want to have a place in this future because it's just so obvious that this is the user experience that people want to have. They don't want to have to open up this app and then that app and then that app to coordinate the task that they're trying to do. They just want to tell the AI, I need a ride to this place and let my wife know that I'm on my way.

    And it can just do it all. And we don't have to open up a bunch of different apps to make that happen. So that is the future that I am really looking forward to. Go ahead and read more of what my thoughts are on this in the article. Thanks for checking this out.