The future of AI interaction: Beyond just text

    Kent C. Dodds

    We're only scratching the surface of what it means to interact with our services using AI and natural language. The Model Context Protocol (MCP) is opening up exciting possibilities in this space, but I think we're about to see a major shift in how we think about these interactions.

    My mind was initially opened to the possibilities of using AI to interact with data and make data mutations when I saw Evan Bacon's talk at ReactConf 2024. While his talk wasn't specifically about AI and MCPs (the spec hadn't even been invented yet), his demo used an LLM to interpret natural language queries and determine which components to display based on the responses. If you didn't watch the video above, catch the demo here.

    This was truly mind-blowing for me. At the time, I was mostly interested in the power of React Server Components he was demonstrating, but it was fascinating to see an integration that went beyond just text in an app — where you could interact with a large language model like ChatGPT, Claude, or Gemini.

    Evan is a brilliant developer, and he had to write a lot of custom code to make that demo work with the different service providers. With MCP, you could still write that kind of custom integration code yourself, but the service providers can (and I think will) ship their own MCP servers to meet their users where they are: in an AI application like this one.

    What users really want

    This is the user experience that our users want. They don't want to navigate through 30 different websites (or even a single complex site) to accomplish one task. They want a single interface where they can express exactly what they want to do or what information they need, and then have that information presented in the most reasonable way possible.

    This realization led me to open a discussion on the Model Context Protocol GitHub about having MCP clients support UI that MCP servers return. Whether that UI is provided by the server or generated by the client on demand, it's becoming clear that users often want to interact with UI elements, not just text.
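
    To make that concrete, here is a minimal sketch of the shape such a result could take, written as TypeScript types. The MCP spec today defines text, image, and resource content blocks for tool results; the "ui" block below is hypothetical and only illustrates the kind of thing the discussion proposes.

    ```ts
    // Hypothetical sketch: a tool result that carries UI alongside a text fallback.
    // The "text" and "image" shapes resemble what the MCP spec defines today;
    // the "ui" variant is an assumption, not part of the protocol.
    type ContentBlock =
      | { type: "text"; text: string }
      | { type: "image"; data: string; mimeType: string }
      // Hypothetical: a declarative description of a component for the client to render.
      | { type: "ui"; component: string; props: Record<string, unknown> };

    interface ToolResult {
      // A server could include a text fallback next to the UI block, so clients
      // that can only display text still have something reasonable to show.
      content: ContentBlock[];
    }
    ```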

    The importance of UI in AI interactions

    Let me give you a concrete example. Imagine a user says, "I want to find directions to the nearest taco stand." Sure, an LLM could use an MCP server to get their location, then use a Google Maps MCP server to find directions, and spit out a text response like "Turn left, then right, walk a mile, then turn left again."

    But is that really a great user experience? Of course not!

    What would be much better is if Google Maps returned some UI that showed exactly what you'd expect from a Google Maps directions interface — a UI generated on-demand for the user's specific use case.
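
    As a rough sketch of how that could work on the client side (reusing the hypothetical "ui" content block from above, with an invented DirectionsMap component and pickRenderable helper), the client renders the UI block when it can and falls back to the text directions when it cannot:

    ```ts
    // Hypothetical content-block shapes, mirroring the sketch above.
    type UIBlock = { type: "ui"; component: string; props: Record<string, unknown> };
    type TextBlock = { type: "text"; text: string };
    type ContentBlock = TextBlock | UIBlock;

    // Pick the richest block the host can actually display.
    function pickRenderable(
      content: ContentBlock[],
      canRenderUI: boolean,
    ): ContentBlock | undefined {
      if (canRenderUI) {
        const ui = content.find((block): block is UIBlock => block.type === "ui");
        if (ui) return ui;
      }
      return content.find((block): block is TextBlock => block.type === "text");
    }

    // The imagined directions result: text for text-only clients, a map UI otherwise.
    const directions: ContentBlock[] = [
      {
        type: "text",
        text: "Turn left, then right, walk a mile, then turn left again.",
      },
      {
        type: "ui",
        component: "DirectionsMap",
        props: { destination: "nearest taco stand", mode: "walking" },
      },
    ];

    console.log(pickRenderable(directions, true)); // the DirectionsMap block
    console.log(pickRenderable(directions, false)); // the text fallback
    ```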

    Here are a few more examples where UI is crucial:

    1. Stopwatch: If my kid asks me to time their handstands (which has definitely happened), it's much better to have start and stop buttons than to type or say "start" and "stop" (see the sketch after this list).
    2. Data visualization: Displaying data in a graphical interface with charts and the ability to select data points is far more useful than a text description in many cases.
    3. Multimodal interactions: Sometimes, you might want both button and voice controls. Imagine timing your reps while lifting weights — you might want to use voice commands when your hands are occupied.
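
    For the stopwatch example in item 1, here is a minimal sketch of the kind of component an assistant could render instead of making you type "start" and "stop". It's plain React with an invented Stopwatch name; nothing here is tied to MCP or any particular client.

    ```tsx
    import { useEffect, useRef, useState } from "react";

    // A bare-bones stopwatch: Start/Stop toggles a ticking interval, Reset clears it.
    export function Stopwatch() {
      const [elapsedMs, setElapsedMs] = useState(0);
      const [running, setRunning] = useState(false);
      const startedAt = useRef<number | null>(null);

      useEffect(() => {
        if (!running) return;
        // Offset by what's already elapsed so Stop followed by Start resumes.
        startedAt.current = Date.now() - elapsedMs;
        const id = setInterval(() => {
          if (startedAt.current !== null) {
            setElapsedMs(Date.now() - startedAt.current);
          }
        }, 100);
        return () => clearInterval(id);
        // eslint-disable-next-line react-hooks/exhaustive-deps
      }, [running]);

      return (
        <div>
          <p>{(elapsedMs / 1000).toFixed(1)}s</p>
          <button onClick={() => setRunning((r) => !r)}>
            {running ? "Stop" : "Start"}
          </button>
          <button
            onClick={() => {
              setRunning(false);
              setElapsedMs(0);
            }}
          >
            Reset
          </button>
        </div>
      );
    }
    ```

    Voice could still drive the same component: a "start the stopwatch" utterance would just call the same state setter the button does, which is the multimodal point in item 3.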

    The future of UI development

    Now, I've heard some people worry that UI developers are going to be out of a job because AI is going to make that discipline obsolete. But I strongly disagree.

    Users are not going to want to interact with AI via text or voice alone all the time. We're going to continue to need visualizations, buttons, and other types of interactive experiences. As Evan Bacon's demo shows, this is not only possible but can create an awesome experience.

    Looking forward

    I don't know exactly what shape this will take. Maybe it'll end up using React Server Components on web and mobile, as Evan demonstrated. Or perhaps we'll have AI generate UI on demand based on design tokens we provide, allowing for consistent branding and look-and-feel while the UIs themselves stay completely dynamic. Or maybe both.
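
    The design-tokens idea could be as simple as handing the model your tokens and asking it to stay inside them. A rough sketch, where the token values and the buildUIPrompt helper are made up for illustration:

    ```ts
    // Hypothetical design tokens the host app would supply.
    const designTokens = {
      colors: { primary: "#1e40af", surface: "#ffffff", text: "#111827" },
      radii: { md: "0.5rem" },
      font: { body: "Inter, sans-serif" },
    };

    // Hypothetical helper: constrain generated UI to the host app's tokens so
    // on-demand components still match the brand.
    function buildUIPrompt(userRequest: string): string {
      return [
        `Generate a React component for: ${userRequest}.`,
        "Use only these design tokens for styling:",
        JSON.stringify(designTokens, null, 2),
      ].join("\n");
    }

    console.log(buildUIPrompt("a stopwatch with start and stop buttons"));
    ```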

    What I do know is this: the future of AI assistants that help us solve our day-to-day problems and answer our questions is going to involve much more than just text. And I, for one, am very much looking forward to it.

    So, the next time you're interacting with an AI assistant, imagine what it could be like if it wasn't limited to just text. The possibilities are truly exciting, and we're only at the beginning of this journey.
