We used to add chatbots to our apps. Now we add our apps to the chatbots.
After the release of the
ChatGPT Apps SDK developer preview,
I built a math-centric MCP server that ships a Tron-flavored calculator straight
into ChatGPT. The project lives in
this repo, and it's
my current favorite way to explain what it looks like to embed UI into an agent
interface using MCP-UI (mcpui.dev) alongside the ChatGPT
Apps SDK.
LLMs are great at conversation, but sometimes a form, a dashboard, or even a
calculator is the faster path to a great user experience. MCP gives us a way to
ship those pixels right into the conversation loop. MCP-UI wraps the
vendor-specific bits (like OpenAI's window.openai API) so we can focus on
rendering UI and streaming structured data back and forth.
If you've been following what I've been talking about over the last several
months, you know what I'm expecting for the future of user interaction: ask for
a thing, an app shows up with the right UI, and every interaction keeps the
model in the loop. MCP-UI is the glue that makes that promise practical today.
And that's on top of the fact that ChatGPT has 800 million users, many of whom
are your potential customers. Nothing would thrill your competitors more than
you sitting this space out.
In my demo above, a user (me 😅) asks for help subtracting a number from their
birth year of 1988. ChatGPT wires up our calculator widget, the user subtracts 6
then taps =, and because the answer is 1982, the calculator widget asks ChatGPT
to roleplay as the Master Control Program (referred to as "MCP" in the Tron
films 😆). That playful back-and-forth isn't just for show; it demonstrates that
UI events can push prompts back into the conversation.
The worker spins up an MCP agent, registers traditional tools, and layers on
widgets that ship HTML bundles the Apps SDK can cache and serve back from its
sandbox.
const widgets = [
	createWidget({
		name: 'calculator',
		title: 'Calculator',
		description: 'A simple calculator',
		invokingMessage: `Getting your calculator ready`,
		// ...
		getHtml: () =>
			renderToString(
				<html>
					<head>
						<meta charSet="utf-8" />
						<meta name="color-scheme" content="light dark" />
						<script
							src={getResourceUrl('/widgets/calculator.js')}
							type="module"
						></script>
					</head>
					<body css={{ margin: 0 }}>
						<div id="💿" />
					</body>
				</html>,
			),
		inputSchema: {
			display: z.string().optional(),
			previousValue: z.number().optional(),
			// ...
		},
		outputSchema: {
			display: z.string().optional(),
			previousValue: z.number().optional(),
			// ...
		},
		getStructuredContent: async (args) => args,
	}),
]
A few things to notice:
Widgets combine an MCP tool with a UI resource. We version the resource name
with a build timestamp so ChatGPT busts its cache each deploy (see the sketch after this list).
The input/output schemas double as documentation for the LLM. Because the
model reads these, it knows how to shape the arguments when it calls the tool.
We render HTML with Remix's new @remix-run/dom/server helper. That's
pre-alpha Remix v3 territory, so expect breaking changes. The important thing,
though, is that you provide some HTML. It can be generated by anything (but it must
be static, because it's going to be cached and served from the ChatGPT sandbox).
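To make that cache-busting point concrete, here's roughly what the versioned naming looks like. This is a minimal sketch: the exact URI shape and the BUILD_TIMESTAMP import are illustrative, not the repo's literal code.

// Every deploy produces a "new" resource URI, so ChatGPT can't serve a stale cached copy.
import { BUILD_TIMESTAMP } from './build-info' // assumed: generated by a prebuild script

function getWidgetResourceUri(widgetName: string) {
	return `ui://widget/${widgetName}-${BUILD_TIMESTAMP}.html`
}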
Registering the MCP server happens during agent initialization, so by the time
the worker handles /mcp, every tool and resource is wired up.
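In sketch form, that initialization looks something like this. The class shape follows Cloudflare's agents package and the official MCP SDK; the registerTools/registerWidgets helpers (and their module paths) are mine, purely for illustration.

import { McpAgent } from 'agents/mcp'
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js'
import { registerTools } from './tools' // hypothetical module names for illustration
import { registerWidgets } from './widgets'

export class MathMcp extends McpAgent {
	server = new McpServer({ name: 'math', version: '1.0.0' })

	async init() {
		registerTools(this.server) // traditional tools like the DoMath tool
		registerWidgets(this.server) // each widget registers a tool + a UI resource
	}
}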
Once the HTML is loaded in ChatGPT's sandbox, it talks to our server via
postMessage. With ChatGPT's Apps SDK, you get a global window.openai API that
manages this postMessage communication. However, that's proprietary and only
works with ChatGPT. MCP-UI normalizes those messages and keeps them portable
across clients so if you write your interactions via MCP-UI, you can use it in
ChatGPT or any other app that supports MCP-UI.
function sendMcpMessage(
	type: McpMessageType,
	payload: McpMessageTypes[McpMessageType],
	options: MessageOptions = {},
) {
	const messageId = crypto.randomUUID()
	return new Promise((resolve, reject) => {
		if (!window.parent || window.parent === window) {
			console.log(`[MCP] No parent frame available. Would have sent message:`, {
				// ...
			})
		}
		// ...
	})
}
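And here's roughly how the calculator uses it for the Tron Easter egg. This is a sketch: the calcEngine event wiring and the prompt text are paraphrased from the demo, not copied from the repo.

// When the display hits 1982, push a prompt back into the conversation.
declare const calcEngine: EventTarget & { display: string } // the calculator state object (paraphrased)

calcEngine.addEventListener('change', () => {
	if (calcEngine.display === '1982') {
		void sendMcpMessage('prompt', {
			prompt:
				'The result is 1982, the year of the first Tron film. Respond as the Master Control Program.',
		})
	}
})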
Early adopters get splinters. Here are the splinters I hit:
Static HTML Only: The Apps SDK snapshots the HTML once and serves it from
their sandbox. No SSR, no per-request templating. You hydrate with render data
after it loads.
Aggressive Caching: ChatGPT aggressively caches resources. Appending the
build timestamp to the widget name keeps things fresh, but it's a hack until
better cache controls exist.
CORS Is Mandatory: Because assets are fetched from the sandbox domain
(*.oaiusercontent.com), every resource has to be CORS-friendly. The worker's
withCors helper opens things up globally (sketched after this list).
Developer Mode Required: Today you must enable Developer Mode in ChatGPT's
settings to install a custom connector. Expect that to change, but it's the
price of entry right now (details in the
User Interaction guide).
Tooling Still Forming: Local testing used to be a pain. MCPJam
(mcpjam.com) now makes it way easier to iterate without
waiting on ChatGPT, but expect the tooling story to evolve quickly.
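For reference, that withCors helper is conceptually just a header wrapper. Here's a minimal sketch (a real version would likely also answer OPTIONS preflight requests and echo the request origin):

function withCors(response: Response): Response {
	const headers = new Headers(response.headers)
	// Let the ChatGPT sandbox origin (*.oaiusercontent.com) fetch our assets.
	headers.set('Access-Control-Allow-Origin', '*')
	headers.set('Access-Control-Allow-Methods', 'GET, POST, OPTIONS')
	headers.set('Access-Control-Allow-Headers', '*')
	return new Response(response.body, {
		status: response.status,
		statusText: response.statusText,
		headers,
	})
}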
Feel free to fork the
repo and play around
with it.
If you build something cool, please share it. Seeing UI literally pop into a
chat window never gets old, and we're just scratching the surface of what these
embedded apps can do.
MCP-UI plus the Apps SDK lets us embed fully interactive widgets directly
inside ChatGPT conversations.
The repo's calculator widget shows how to serve static HTML, hydrate it with
MCP tool data, and send events back up to the model.
Expect turbulence: Remix v3 is pre-alpha, caching is finicky, and developer
mode is still required, but the ecosystem is moving fast.
Now go build a widget, hit me up on discord or
𝕏, and let me know what your users discover when
your UI meets their favorite LLM.
For the last couple of years, we've all been adding chatbots to our app. Well, now we're going to be adding our app to the chatbot. With the ChatGPT Apps SDK and others likely following it, we are going to be adding UI and widgets and our apps and services to things like ChatGPT and Claude and Gemini and all of the different tools, using things like the ChatGPT Apps SDK, which is built on top of the Model Context Protocol, MCP, and is collaborating with MCP-UI to make this work across all different kinds of agents. So this is really exciting. And I want to give you a little demo of how this works, what the experience is like today.
It's a little bit rough. It's going to improve. And also what it's like to build this sort of thing as well. So we'll look into the code a little bit to give you an idea of what it looks like now. I expect lots of this to change in the future.
Abstractions will be built and things will become a lot easier. But yeah, right now it's pretty early days. So here I've got ChatGPT all set up. I'm going to say, hey, math: my birthday is in 1988. Could you give me a calculator so I can subtract a number from that.
And we'll see, hopefully, there we go. It's looking for available tools. It found a tool. It's getting my calculator ready, and boom. Now we're initializing the Master Control Program, accessing the grid, authorizing, override accepted. Greetings, program.
Those of you who are into Tron, that will be familiar to you. If you don't know anything about Tron, you should go take a look at Tron. But anyway, here we're going to, we've got our calculator. So I'm going to say minus six. And like all of this is right in the ChatGPT experience.
I think that's pretty cool. So here, let's say equals, and we get 1982. Interestingly, that was the year that Tron was released. And would you look at that? Oh, my interaction with the UI actually triggered ChatGPT to do something.
And so we can interact from the conversation into the UI, and then from the UI we can interact back to the conversation. And there's that two-way information, a flow of information going, which is cool. And so in particular, I have this widget configured such that when the result is 1982, the release of the original Tron movie, it will actually send a prompt to ChatGPT telling it to act like the Master Control Program from the Tron movie. And so it's treating you as if you're a program and stuff. It's kind of fun.
You should go learn more about Tron if you don't. But yeah, so this is the widget. And it is a full-on calculator, what you would expect, all of that. But what is the code like? That's what we're going to look at now.
So I have this hosted on Cloudflare. You can see my Wrangler stuff here. I've got my durable objects going on here, my SQLite for the durable object storage, all of that stuff. This is pretty standard setup for a Model Context Protocol server hosted on Cloudflare. And so then if we go into the worker, we see our main is in worker/index.tsx.
So we're going to go to worker/index.tsx. Here I've got my MCP agent. Here's my Math MCP server. It's got an McpServer instance that is coming from the official SDK. We're getting the McpAgent from agents as well.
And then, yeah, we've got a couple other things going on in here. So we've got tools. We have widgets that are created on initialization. We have this utility to require the domain. That will be useful here in a little bit.
And then the default export has a fetch. This is what's called when our worker is invoked. And actually, if we come back over here and take a look at the configuration for this connector, we'll see that is my URL for my worker. It's set up to that. And here we can see the different actions.
They call it tools actions that can be performed, which is, yeah, that makes sense. This experience, I don't expect, is going to be the experience that people have as they're using our MCP servers or our services in the future. In fact, in the announcement, when ChatGPT made this announcement and even showed a demo, they didn't explicitly have to mention any service or anything. They didn't have to explicitly install it either. It just kind of happened as a part of the conversation.
Oh, you asked me for houses on Zillow. I have this Zillow service, so I'm going to bring it in. And so this installation process, I think, is not going to be something that our users have to go through. Anyway, the other thing that we have to do right now is we do have to have this in dev mode. And so that's a pro feature.
You come into your apps and connectors, go to Advanced Settings, and you turn on Developer Mode. And so that's how we get that all working in here. And when our server is called, it's calling this fetch handler. We need to have CORS enabled for all of the widget assets so that when our UI makes requests for assets for JavaScript and images or whatever, that can be requested from the ChatGPT domain rather than our own domain, because they actually do host your widget on their own servers. If we open this up, this is for security purposes, you'll notice this is coming from an iframe, they are using an iframe, but it's coming from their own sandbox.
So it's connector, URL crazy subdomain ID thing, web-sandbox.oaiusercontent.com. So they are gonna copy all of your assets onto their server, all of the assets from the original HTML document. So they'll take that document, they'll host it on their server. But that document is going to have things inside of it. So if we dive into this head, we're going to see, here's our Cloudflare Remix, yada, yada, yada.
So the code for the calculator, they're not copying that piece over. They would if we inlined it or whatever, but we are using a script here. So they're not going to copy that over. Instead, they're going to reference it. And so now we're getting a request to this asset from this domain.
And so that's why we need to have CORS all set up so that these requests will go through. So you'll want to make sure that you have CORS set up wherever you're hosting this. So then we've got here our handler. This is going to take that URL. If it's a slash MCP, then we're going to serve our MCP server there.
If it's requesting a static asset, then we'll send those static assets. And then I also have this dev thing for local development and stuff. There are actually now some pretty good tools for doing local development. Mcpjam.com specifically, you'll want to take a look at that. That's a really great resource for developing servers locally.
But when I was building this originally, we didn't have that. So that's why that's still sitting around right there. So the important thing for you to know, though, is that you need to have CORS set up for your assets. You need to serve those assets. And then you need to have a place where you're serving the MCP server.
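(Roughly, and with some names assumed on my part rather than copied from the repo, the shape of that fetch handler is something like this, reusing the MathMcp and withCors sketches from earlier:)

interface Env {
	ASSETS: Fetcher // assumed: a static assets binding configured in wrangler
}

export default {
	async fetch(request: Request, env: Env, ctx: ExecutionContext) {
		const url = new URL(request.url)
		if (url.pathname === '/mcp') {
			// McpAgent exposes a fetch handler for the streamable HTTP transport
			return MathMcp.serve('/mcp').fetch(request, env, ctx)
		}
		if (url.pathname.startsWith('/widgets/')) {
			// widget assets must be CORS-friendly for the ChatGPT sandbox
			return withCors(await env.ASSETS.fetch(request))
		}
		return new Response('Not found', { status: 404 })
	},
}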
Now, the MCP server itself does have a couple of things that you need. So here we do have a tool called DoMath. That's irrelevant here. That is like a standard traditional MCP tool. In fact, I could say, could you use the DoMath tool?
I'm gonna be very explicit to make sure it calls it, to calculate 1988 minus six. And we'll see if it actually calls the DoMath tool so we can see what the difference is. So here, you noticed it did call a tool and it displayed a UI. With this one, it's going to call a tool in the more traditional sense. It literally passes the arguments, it gets back the response, and then it's going to use that response to show us that result.
So, the year the first Tron was released, amazing. The cycle completes itself, program. The grid remembers its origin, that's funny. Okay, yeah, so you can register. My point here is that you can actually register normal tools.
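(For the curious, registering a plain tool like that looks roughly like this — a sketch with paraphrased names, not the repo's exact code, where server is the McpServer instance from the agent:)

import { z } from 'zod'

// A traditional MCP tool: no UI, just arguments in and text content out.
server.registerTool(
	'do_math',
	{
		description: 'Evaluate a simple arithmetic operation',
		inputSchema: {
			left: z.number(),
			operator: z.enum(['+', '-', '*', '/']),
			right: z.number(),
		},
	},
	async ({ left, operator, right }) => {
		const result =
			operator === '+' ? left + right :
			operator === '-' ? left - right :
			operator === '*' ? left * right :
			left / right
		return { content: [{ type: 'text' as const, text: String(result) }] }
	},
)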
You can register other resources and prompts, all of those things. You don't have to have a specific MCP server just for ChatGPT apps. They can all be together. And so then we have this register widgets. Now, widget is my own invention, I guess, in this context.
A tool that exposes UI, in ChatGPT Apps SDK vernacular. I'm actually, I don't even know if they technically have a name for it, but they expose two things or they consist of two things. They consist of a tool and a resource. And so what I did to simplify things for myself is I created a widget type that has all of the stuff that you would need for the tool and the resource combination for this to work nicely. And it's probably easier if I just show you what the widget looks like.
So I've got this create widget, name calculator, the title is Calculator with an uppercase C. So this is the thing that users are going to see. And we have the description. This is useful for both the user sometimes, as well as the AI assistant. This would be probably something worth wordsmithing a little bit to make sure that the LLM, or ChatGPT in this case, knows when to call this tool.
I could probably add a little bit more useful information to help it understand what this actually is. And then while the tool is being invoked, so it's being, yeah, it's invoking, we can say, getting your calculator ready. And if you scroll back on the video, it should have said, getting your calculator ready. When it was finished being invoked, here is your calculator, and then the result message, the calculator has been rendered. And these things, the ChatGPT client will use to either display to the user or provide to the LLM so it knows what to do about that current state of your widget.
And yeah, they do technically actually call these things widgets, even though they consist of resources and tools. So widget accessible is a part of the actual Apps SDK. The result can produce a widget. Yes, that is the case. And then here I have this getHtml function that returns an HTML string.
So this renderToString is actually coming from @remix-run/dom. This is a new thing. It's not even officially released. This is Remix v3 that I've used to build this out, which is super cool. I love it.
But you can use anything that can generate HTML. Even with the abstractions that I've built here, anything that generates HTML will work just fine. So I'm using Remix v3, and it's been pretty sweet. So here you'll notice I have this getResourceUrl with /widgets/calculator.js. So that is the path to the asset for the calculator, for the code for the calculator.
We'll look at that in a second. We have getResourceUrl. This is going to use that requireDomain. So this requireDomain comes from the MCP class. That's coming from prompts with the base URL.
That base URL comes into the prompts via this right here. So when our worker is originally called, we're going to parse the URL it's called with. We're going to use that to get the base URL, and that's going to be like our domain and everything. Then we can use that back here to get the URL for the resource. So effectively, what this is doing is whatever domain the MCP server is being hosted at, that's the domain that's also going to be serving the assets.
And I don't want to have to hard code that in my code. And so I use the requester, like that request URL, to determine what is the base URL for all of my assets. So ultimately, this ends up being what we saw earlier: that cloudflare-vite-mcp.kentcdodds.workers.dev/widgets/calculator.js URL. So with that, then I also have this div with an ID of 💿 (the disc emoji), just for the fun of it.
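(Back to getResourceUrl for a second — in sketch form, with the plumbing simplified, that helper amounts to this:)

// Derive asset URLs from whatever origin the worker was requested on, so nothing
// is hard-coded (simplified: in the repo the base URL is threaded through differently).
function getResourceUrl(path: string, requestUrl: string): string {
	const base = new URL(requestUrl)
	return new URL(path, base.origin).toString()
}

// e.g. getResourceUrl('/widgets/calculator.js', request.url)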
We'll look at the widgets/calculator module here in a second to see how it hooks up to that. And then we have our input schema. So you can control aspects of the calculator up front. So these are the different pieces of input that our UI is going to receive. So we've got our display.
We've got our previous value, operation, waiting for new value, all of this stuff, all of the state that we can pass into our calculator class that manages all of this state. And then we have our output schema. So this is the stuff that when they call our tool to get our widget, this is the types of data that they can expect from that. So yeah, and pretty much it's what they provided to us is what we're gonna provide back to them. But it's not always going to be this.
Sometimes the user might be logged in, and so you're going to say, OK, get me a list of products. And so your output schema will include that list of products. And the input schema might just be the product ID or something like that. And then our structured content is just going to be the arguments itself. We could go deeper into several of these things, but this is supposed to just be an overarching thing.
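(As a hedged, hypothetical example of that products case — not something in this repo:)

import { z } from 'zod'

// The model passes a product ID in; the tool returns the products the widget should render.
const inputSchema = { productId: z.string().optional() }
const outputSchema = {
	products: z.array(
		z.object({ id: z.string(), name: z.string(), price: z.number() }),
	),
}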
So we're going to take all the widgets. We're going to actually now create the UI resources for these widgets. So, this createUIResource is coming from @mcp-ui/server, and so MCP-UI is kind of acting as the jQuery for UI here, really just kind of normalizing the APIs and stuff like that as best they can. And I expect some of this to improve as well. But here we're gonna have our URI, which is gonna consist of the widget, its name, and .html.
And that name is gonna be the widget's name with the version. I'm being very explicit about the version because, especially during development, ChatGPT is very aggressive with its caching. And so by just adding a version in as part of the name, I avoid caching issues. I am not sure if this is going to be a long-term thing that we have to do or if this will naturally, like, they'll just have a better sense for when to update things. But for right now, including a build version is very useful.
And the way that I get that version is right here. I actually generate the build timestamp through a script in my package.json. So prebuild worker, I generate that build timestamp, which is kind of funny, but it's fine. It really is not a big deal. So we've got our version.
We've got our name. We've got our resource info, encoding text, content, type, raw HTML, HTML string. All of this stuff is MCP-UI stuff. So if you've ever used MCP-UI, this should be pretty familiar to you. And then we're going to, with that resource info, we're going to register a resource.
We're going to use MCP-UI's createUIResource. We'll have that resource info along with some metadata. So we'll have our widget description, our widget CSP. So we're gonna say, hey, your resources are the base URL. This ensures that the content security policy on that widget sandbox includes your base URL, so you can request things from your own server.
So if I didn't have that, then I wouldn't be able to go get the calculator widget code, which would not be great. So you do get to control a little bit about the widget CSP. And yeah, you can also determine whether or not you prefer to have a border around your frame. There's actually a lot of control you have around the way that your app is presented in ChatGPT, and I expect that to evolve and change over time as well. And then here, this is an MCP-UI-specific adapter configuration saying, hey, we are using the Apps SDK, so please adapt this resource to support the Apps SDK.
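(Paraphrased into code — and I do mean paraphrased, since the exact option and metadata key names may differ from what @mcp-ui/server and the Apps SDK currently expect, and resourceUri, widget, baseUrl, and server stand in for values from the surrounding code — the resource half looks something like this:)

import { createUIResource } from '@mcp-ui/server'

const uiResource = createUIResource({
	uri: resourceUri, // e.g. `ui://widget/calculator-${version}.html`
	encoding: 'text',
	content: { type: 'rawHtml', htmlString: widget.getHtml() },
	// Apps SDK hints (key names paraphrased): describe the widget and allow our
	// own domain in the sandbox's content security policy.
	metadata: {
		'openai/widgetDescription': widget.description,
		'openai/widgetCSP': { resource_domains: [baseUrl], connect_domains: [baseUrl] },
	},
})

server.registerResource(widget.name, resourceUri, {}, async () => ({
	contents: [uiResource.resource],
}))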
So that is our resource, and then we register a tool. So really, the resource registration process, as soon as your server is connected to, ChatGPT is going to go and grab that resource. And maybe it does this lazily, but the point is it grabs all the content from that resource, including the HTML itself. So that HTML string right here, it's going to grab that HTML. It's going to save it into its own sandbox environment so that it's ready to use that at any time.
So whenever the user wants to show that UI, it's always going to show that same HTML document. So you cannot actually make that document dynamic. It has to be a static HTML document. So sorry, no SSR, no server rendering for you. Maybe one day in the future, but that is just not a feature.
I'm actually fine with it. It's not a big deal in this context, because the user is in a chat thread already. It's a little bit different than the challenges facing us on the World Wide Web, where you really would like to be able to server render all of this stuff dynamically on every request. But that's not the way that the ChatGPT Apps SDK does it.
So we have a static HTML string, which like pretty standard SPA sort of thing. And then When the tool is called, that's when we get the dynamic data that can go into that static HTML. So we have our widget title, description, and some metadata. So the domain, here's our base URL, output template. That's the URI for our resource.
So that's why we've got our URI specified here. That's the resource URI when we register that resource right there. And then we have that invoked message invoking all of that stuff. All those different widget options go as part of the metadata for this tool. And then we have our input schema, output schema, and our annotations.
We add the read-only hint. In this case, that makes sense. This is a read-only tool after all. And also open world is false because otherwise, ChatGPT is going to ask the user to confirm calling this tool, and that's kind of annoying. If your tool does some sort of destructive behavior before it sends the UI data back or something like that, then you would not want to do it this way.
But for lots of UI, you're probably going to say, yeah, this is read-only. I'm just getting you some data so you can display it in a UI. So then from there, we're going to get our structured content. We'll return that as part of our return value. We've got the content that includes some text to say, hey, here's the result of you calling this tool.
Then we're also going to use MCP-UI again, create UI message or UI resource, to include that information in here as well. So this should actually be usable externally or by other tools that don't use the ChatGPT SDK. So this is kind of if I were to use Goose or Nanobot or something or Postman or any of those that support MCP-UI, they should actually still be able to use this tool and get the exact same behavior, which is pretty neat. Okay, great. So that is the code for that.
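(And the tool half, again as a paraphrased sketch rather than the literal repo code — the _meta keys follow the Apps SDK docs as I understand them, and widget, resourceUri, and uiResource come from the resource sketch above:)

server.registerTool(
	widget.name,
	{
		title: widget.title,
		description: widget.description,
		inputSchema: widget.inputSchema,
		outputSchema: widget.outputSchema,
		annotations: { readOnlyHint: true, openWorldHint: false },
		_meta: {
			'openai/outputTemplate': resourceUri, // points at the UI resource above
			'openai/toolInvocation/invoking': widget.invokingMessage,
		},
	},
	async (args) => {
		const structuredContent = await widget.getStructuredContent(args)
		return {
			structuredContent,
			content: [
				{ type: 'text' as const, text: `Here is your ${widget.title}.` },
				uiResource, // the createUIResource(...) result from the resource sketch
			],
		}
	},
)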
And that's honestly, like, that's the meat of everything. The specifics around the calculator and everything, like, once you get HTML on the page, you do whatever you want to there. If you're curious, the calculator right here is using Remix version 3 packages. So these are still, like, pre-alpha sort of thing. So you can dive into how this was built using this calculator class as, like, the primary brains behind the calculator, and then we've got our UI for this as well living around here somewhere, yeah, right here. So feel free to dive into any of this if you're interested. But there is one other really interesting thing about this that's specific to ChatGPT apps, and that is this MCP prompt.
So, diving into this, I explain: the user got the result of 1982, the year of the first Tron film, turning you into the MCP, which is the Master Control Program, which is so perfect, it's so, yeah. MCP, back in 1982. And so I just instructed it to behave like the Master Control Program, just as a fun little Easter egg. But the Easter egg is a learning opportunity for us, because this is how you communicate back up to the conversation history. So if we find where this is being used, I listen for the calc engine change event, and we update our UI because Remix has explicit updates, and it's not a problem, and it's really nice.
But here I also say, hey, if the display is 1982, then send an MCP message, a prompt. I want to send a prompt to the conversation and I want this to be the prompt. So what is this sendMcpMessage? Well, this is actually another utility that I built to make all of the postMessage stuff work nicely. So remember, we're running inside of an iframe.
We need to post a message. So this is a window.parent.postMessage. We need to send a message to our parent, and then that parent can do what it wants to with it. So it's actually expecting specific messages to occur. And the way that the ChatGPT Apps SDK works is that you're actually talking with a window.openai object and talking through that.
It's kind of like a little SDK inside of that iframe. And that's very proprietary to ChatGPT. And so instead, we're using MCP-UI post messages for all of these different things. And MCP-UI will convert the things that you're doing into calling the window.openai SDK directly. And so that makes it really nice.
So you can just write to MCP-UI, and it will work in all of the clients that support MCP-UI, and it also will work with the ChatGPT Apps SDK. And so the way that this works is we're going to take the, we're going to create a message ID. We're going to return a new promise here with a parent postMessage. Then we're going to handle the response. We'll send the type (ours was prompt), the message ID, so we know when the response comes back, which response is associated to this request, and then the payload, which will be an object that has our prompt in it.
So that gets sent to the parent. MCP-UI will actually handle that and then call the window.openai method to actually send the prompt request for us. And then we get a response. So we're going to handle that response. We'll remove the event listener once we get the correct one.
And yeah, send that response back. So if we're saying, hey, call this tool and give me the result or whatever, or what is the initial data for my UI? In fact, we actually do use that other thing right here. We've got our initial state. So if we come right here, we've got our render data.
This app right here is our original app. And we have this waitForRenderData function that we're calling. And so this ensures when ChatGPT opens up our document, it's actually gonna show the document first, and then it's gonna call the tool, it's gonna get the response back, and it will send the response into our document. And so that's actually part of the reason why I have this initializing stuff, aside from the fun of it, is that there will be a very brief loading time while it's going to get the data to load into our document. So we call waitForRenderData.
This waitForRenderData is, again, just doing postMessage stuff. And again, MCP-UI is going to interact directly with the window.openai SDK to request that initial render data that we're sending from our tool. And just as a quick reminder of what that looks like, if we go to our widgets right here, our initial data is the structured content. Where is the structured content for this? Yeah, I guess it's literally just the input schema.
So we take all of the arguments that are passed in the input schema and we send them back as the output schema. So this is our render data. It's gonna have display, previous value, operation, and waiting for new value and the error state. And so that is what we're going to get back from this request for the initial render data. And with that, here's our render data.
Once we've updated, we've got some logic around, like, waiting for it if it's not ready by the time we're ready to show the UI and all of that. And then we render the calculator when we're all set to show that. So yeah, there are a couple of things here that I think are really super interesting and involved. And I'm starting to get a clear picture of a framework that handles all of this stuff for you to drastically simplify how all of this stuff works.
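(Before moving on — in sketch form, that boot sequence amounts to roughly this, with the render helpers as stand-in names rather than the repo's actual exports:)

// Show the Tron-style "initializing" shell immediately, then hydrate once ChatGPT
// sends the tool's structured content into the sandboxed iframe.
const root = document.getElementById('💿')!
renderInitializingScreen(root) // stand-in: the boot animation

const renderData = await waitForRenderData() // display, previousValue, operation, ...
renderCalculator(root, renderData) // stand-in: the real calculator UI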
There are a number of moving pieces, but I think it's super interesting. And ultimately, the goal of getting your UI into this environment, I think, is a really good one. And I'm excited about that prospect of the things that we are going to be able to do as software developers, reaching our users and interacting with them in the tool that they prefer to use for this type of interaction. Not always is text conversation the best way to interact. Sometimes you want to have some sort of UI.
So I'm excited about ChatGPT blazing a trail on this and showing us really what this experience can be like. And I'm excited to teach you all how to do it using MCP-UI and just MCP foundational information and knowledge in general. So hopefully, this was interesting to you. And feel free to dive into the code and get a sense for things. Play around with it yourself.
Build your own thing. Let me know what you learned and what you've done. And let's build the future of user interaction together. Yay. See ya.