Please make Jarvis (so I don't have to)

    Kent C. Dodds

    A year ago, I was driving down to React Conf with my friend Ryan Florence, and we spent a lot of time talking about how AI is changing the future of user interfaces. We were talking about Jarvis. You know, Tony Stark's AI buddy from Iron Man? Yeah, that J.A.R.V.I.S. (Just A Rather Very Intelligent System... classic backronym 😅). It's not just a cool movie prop; it's the kind of AI assistant we're all secretly hoping for. And like most of you, I want Jarvis to exist, but I don't want to build it myself. I'm too busy teaching people how to make quality MCP (Model Context Protocol) servers, which will be a very important element of Jarvis's success when it is built.

    Let's talk about why everyone wants Jarvis, and what stands between us and this AI-powered future. And maybe with a little luck, I can nerd-snipe one of you reading this into building it for us (there's a lot in it for you as well if you're successful 🤑).

    What is Jarvis, anyway?

    In the Iron Man universe, Jarvis is Tony Stark's do-it-all AI assistant. It manages his schedule, controls his home, helps design tech, handles emergencies, does research, and even provides emotional support. Basically, if Tony needs something done, Jarvis is on it.

    We already have assistants right now and they can already do stuff. I can tell Siri to close the blinds in my office and it happens. I had to wire that up, I have to be careful with how I word my request, and I can only ask one thing at a time, but I can do it. But it can't do everything I want it to do, and its ability to discern my intent is pretty poor. That's the difference.

    Now, we might not need an AI to operate our non-existent Iron Man suits, but the core idea is powerful: an AI that can do everything, seamlessly and effortlessly. That's the Jarvis dream.

    The Jarvis experience

    So what makes Jarvis so special?

    1. Natural language: It understands context and intent, no clunky commands needed.
    2. Multimodal: Text, voice, video, or gestures, Jarvis handles it all.
    3. Immediate response: Even if a task takes time, Jarvis acknowledges instantly.
    4. Proactive: It anticipates needs and takes action without being asked.
    5. No configuration: All integrations are handled automatically.
    6. Ubiquitous: The same assistant everywhere (home, phone, car, you name it).

    Sounds pretty great, right? But here's where things get interesting.

    The paradigm shift: The Post-Browser Era

    We're on the cusp of a major change in how we interact with technology. Right now, we use browsers to access websites for everything from social media to banking. We trust these browsers with our data. We don't for a second wonder whether Chrome or Safari is going to steal our bank account password and empty our accounts when we type in our username and password in these applications. The browser is our window into the interconnected world.

    In the Jarvis future, I believe we'll see a shift:

    • Browsers → Jarvis clients
    • Websites → MCP (Model Context Protocol) Servers

    This isn't just a tech swap; it's a fundamental change in user experience. Jarvis becomes the interface for everything, and MCP servers provide the specialized knowledge and capabilities.

    Think about it: Jarvis could tap into mechanic servers to help fix your car, fitness servers to be your personal trainer, or even medical MCP servers to be your doctor. The core Jarvis client handles the interaction, while the vast ecosystem of MCP servers provides the expertise.

    To be clear, I expect we'll have a number of Jarvis clients from competing organizations, just like we have a number of browsers from competing organizations. But most people pick a single browser and use it for everything.

    I think that will be the case for a Jarvis client. People will not want to have a different AI assistant for every context of their lives. They want a single assistant which can assume a variety of personas, and those personas can be largely controlled through which MCP Servers are used in the context of a given conversation.
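    As a minimal sketch of that idea, a persona can be modeled as nothing more than the subset of servers enabled for a conversation. Every name below (`McpServer`, `REGISTRY`, `PERSONAS`, and the server and tool names) is hypothetical and not taken from the MCP spec or any SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class McpServer:
    """Hypothetical stand-in for a connected MCP server and the tools it exposes."""
    name: str
    tools: frozenset


# Illustrative registry; a real client would discover servers, not hard-code them.
REGISTRY = {
    "mechanic": McpServer("mechanic", frozenset({"diagnose_engine_code", "find_part"})),
    "fitness": McpServer("fitness", frozenset({"plan_workout", "log_run"})),
    "calendar": McpServer("calendar", frozenset({"create_event", "find_free_time"})),
}

# A persona is only a named selection of servers for one conversation.
PERSONAS = {
    "car helper": ["mechanic", "calendar"],
    "personal trainer": ["fitness", "calendar"],
}


def tools_for(persona):
    """Map each tool available to a persona to the server that provides it."""
    available = {}
    for server_name in PERSONAS[persona]:
        for tool in sorted(REGISTRY[server_name].tools):
            available[tool] = server_name
    return available


trainer_tools = tools_for("personal trainer")
print(trainer_tools)
```

    The same client serves both personas; only the set of servers in context changes, so the "personal trainer" never sees the mechanic's tools.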

    The MCP standard is what will enable this kind of healthy competition that's good for the consumer as opposed to the walled garden approach.

    And it's all with a much better user experience than anything we've experienced before... If we can do it right. And we have the technology to do it right! We just need to solve a couple problems first...

    Problems to solve: What's standing in our way?

    As cool as this possibility is, we've got some hurdles to clear first. It only counts as a "Jarvis experience" if non-technical people can use it as easily (or more easily) as they do the browser or their mobile phone.

    General challenges

    1. Trust and privacy: This can be built up over time, just as it was when people first started using browsers for online banking. Good policies will be necessary for this. Early on, it will require much more human-in-the-loop involvement while the trust (and frankly, the relative success) of Jarvis is proven out.

    Client-side challenges

    1. No user-managed MCP servers: This should all be handled behind the scenes, even authentication.
    2. Modality switching: Jarvis needs to handle voice, text, video, and more, seamlessly switching between them.
    3. Speaker distinction: It should recognize different voices and know who can give it commands.
    4. Context management: Keeping track of conversations, tasks, and memory without explicit user management.
    5. Proactive notifications: Jarvis should pipe up when needed, not just respond to queries.
    6. Automatic task switching: No more manual "new thread" management; Jarvis should handle context shifts on its own.

    Server-side challenges

    1. More quality MCP servers: We need a rich ecosystem of specialized services.
    2. Authentication & Authorization: Most MCP servers will need this, but it should be effectively invisible to the user.
    3. Background tasks: For proactive features and scheduled actions.
    4. Memory management: Providing context across interactions and services.

    Testing the Jarvis experience

    Now, I know some of you are thinking, "But wait, don't we already have AI assistants that can do a lot of this?" Fair point. So let's run through some test prompts to see how close we really are to the Jarvis experience.

    Note that these all appear as one-shots, but I do think that having a conversational experience for something like this is an important part of this future as well.

    I want to see the latest Mission Impossible movie in theaters with my friends Josh, Mac, Joel, Sean, Julie, Garrett, and Andy. We've decided Saturday the 31st in the evening is the best time for most of us.

    Please find a movie theater that's showing Mission Impossible that evening sometime after 8 and is relatively close to all of us. Get tickets for seats close by one another, create a calendar event with details, and invite all of us. Oh, and let my friends know what they owe me for the tickets.

    This seemingly simple request requires:

    • Contact management
    • Location services
    • Movie theater lookup
    • Ticket purchasing
    • Calendar integration
    • Payment issuing
    • Group messaging

    It's a lot, right? And while we might have services that can do parts of this, the seamless Jarvis experience isn't quite there yet.
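    To make the gap concrete, here is a rough sketch that checks the capability list above against whatever servers a client happens to have connected. The `CONNECTED_SERVERS` snapshot is invented for illustration; it does not describe any real assistant.

```python
# Capabilities the movie-night request needs (taken from the list above).
REQUIRED = [
    "contact management",
    "location services",
    "movie theater lookup",
    "ticket purchasing",
    "calendar integration",
    "payment issuing",
    "group messaging",
]

# Hypothetical snapshot of what the connected servers offer today.
CONNECTED_SERVERS = {
    "contacts": {"contact management"},
    "maps": {"location services"},
    "calendar": {"calendar integration"},
    "chat": {"group messaging"},
}


def coverage(required, servers):
    """Split required capabilities into (covered -> provider, missing)."""
    covered, missing = {}, []
    for capability in required:
        provider = next(
            (name for name, caps in servers.items() if capability in caps), None
        )
        if provider is None:
            missing.append(capability)
        else:
            covered[capability] = provider
    return covered, missing


covered, missing = coverage(REQUIRED, CONNECTED_SERVERS)
print("missing:", missing)
```

    With this made-up snapshot, the request stalls on theater lookup, ticket purchasing, and payments, which is exactly the kind of hole a single missing server leaves in an otherwise workable flow.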


    Let's try another one

    Could you help me plan this week's soccer practice?

    Find a good time on my calendar and reserve a soccer field in Highland, then help me put together a plan for the drills that we'll run at that practice based on what we talked about during the last game, and add those details to the calendar event so the parents know what's going on. Then invite the parents of my team to the practice.

    This one's interesting because it highlights the need for specialized MCP servers. You're not going to get OpenAI to build an integration into ChatGPT for your local city's service, but a city like Highland, Utah could have its own MCP server for field reservations that seamlessly integrates with Jarvis and can be dynamically discovered and used. We'd also need:

    • Calendar management
    • Soccer coaching knowledge (either built into the foundational model or from an MCP server)
    • Memory recall from previous conversations
    • Event planning and invitations
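    Dynamic discovery could take many shapes. The sketch below assumes a simple directory that a client can query by locality and capability. The `DirectoryEntry` type, the directory itself, and the `.example` URLs are all hypothetical; nothing here is defined by the MCP spec.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DirectoryEntry:
    """Hypothetical listing in a server directory; not an MCP spec structure."""
    url: str
    locality: str
    capabilities: frozenset


# Made-up entries using the reserved .example domain.
DIRECTORY = [
    DirectoryEntry("https://parks.highland.example/mcp", "Highland, UT",
                   frozenset({"field reservation"})),
    DirectoryEntry("https://rec.othertown.example/mcp", "Other Town, UT",
                   frozenset({"field reservation", "pool passes"})),
]


def discover(capability: str, locality: str) -> Optional[DirectoryEntry]:
    """Return the first entry offering the capability in the given locality."""
    for entry in DIRECTORY:
        if entry.locality == locality and capability in entry.capabilities:
            return entry
    return None


match = discover("field reservation", "Highland, UT")
print(match.url if match else "no server found")
```

    The point of the sketch is that the user never picks or configures the server; the client resolves it from the request ("a soccer field in Highland") on its own.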

    Ok, great, let's do another

    I want to improve my cardio. Please block out a 30-minute run for me this week. Also, remind me to stretch afterward.

    This short request actually packs in a lot:

    • Calendar integration
    • Weather checking (to avoid scheduling a run in the rain, because it knows I hate running in the rain)
    • Proactive reminders
    • Potential integration with fitness devices
    • Long-term goal tracking
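    The calendar and weather pieces of that list combine into a small scheduling problem. Here is a sketch under made-up data: the busy intervals, rainy hours, and candidate start times stand in for what a calendar server and a weather server might return, and the dates are arbitrary.

```python
from datetime import datetime, timedelta

RUN_LENGTH = timedelta(minutes=30)

# Hypothetical data from a calendar server and a weather server.
BUSY = [(datetime(2025, 6, 2, 7, 0), datetime(2025, 6, 2, 8, 0))]
RAINY_HOURS = {datetime(2025, 6, 2, 6, 0)}  # one forecast bucket per hour
CANDIDATE_STARTS = [
    datetime(2025, 6, 2, 6, 30),
    datetime(2025, 6, 2, 7, 30),
    datetime(2025, 6, 2, 8, 0),
]


def overlaps(start, end, busy):
    """True if [start, end) intersects any busy interval."""
    return any(start < b_end and b_start < end for b_start, b_end in busy)


def is_rainy(start, end, rainy_hours):
    """True if any rainy hour bucket intersects [start, end)."""
    return any(
        start < hour + timedelta(hours=1) and hour < end for hour in rainy_hours
    )


def pick_run_slot(candidates, busy, rainy_hours):
    """Return the first candidate start that is both free and dry, else None."""
    for start in candidates:
        end = start + RUN_LENGTH
        if not overlaps(start, end, busy) and not is_rainy(start, end, rainy_hours):
            return start
    return None


slot = pick_run_slot(CANDIDATE_STARTS, BUSY, RAINY_HOURS)
# "Remind me to stretch afterward": schedule the reminder for when the run ends.
stretch_reminder = slot + RUN_LENGTH if slot else None
print(slot, stretch_reminder)
```

    The first candidate is skipped because of rain and the second because of a calendar conflict, so the run lands on the third. The logic is trivial; the hard part is Jarvis knowing, unprompted, that rain matters to me.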

    Please give me directions to the nearest Costa Vida and place an order for the usual so it's ready when I get there.

    Here we need:

    • Location services (on-device MCP)
    • Restaurant lookup
    • Order placement via restaurant MCP server
    • Payment processing
    • Memory recall for "the usual" order
    • Navigation UI (not just text directions)

    If you're interested in a discussion about adding UI to the MCP spec, check out modelcontextprotocol/discussions/287!


    And let's not forget about proactive behaviors. A true Jarvis should be able to initiate conversations like:

    You've been paying $30 a month for your car wash membership, but you don't wash your car enough to make it worth that. Would you like me to cancel your membership, or would you like me to schedule time to wash your car?

    Or

    You've just received a message from your mom asking for photos of your family trip to Hawaii. Would you like me to send her some of the highlights?

    These examples require Jarvis to have access to various data sources, the ability to analyze patterns, and the intelligence to suggest relevant actions.
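    The car wash suggestion, for instance, boils down to simple arithmetic once the data is available. In this sketch the $30 monthly fee comes from the example above, while the pay-per-wash price and the usage history are assumed values chosen for illustration.

```python
MONTHLY_FEE = 30.00           # from the example above
SINGLE_WASH_PRICE = 12.00     # assumed pay-per-wash price
WASHES_PER_MONTH = [1, 0, 2]  # assumed recent usage


def membership_advice(fee, single_price, washes_per_month):
    """Compare the fee with what the same washes would cost individually."""
    average_washes = sum(washes_per_month) / len(washes_per_month)
    pay_per_use_cost = average_washes * single_price
    break_even_washes = fee / single_price
    if pay_per_use_cost < fee:
        return "suggest_cancel", break_even_washes
    return "keep", break_even_washes


advice, break_even = membership_advice(
    MONTHLY_FEE, SINGLE_WASH_PRICE, WASHES_PER_MONTH
)
print(advice, break_even)
```

    With these numbers, one wash a month on average is well under the 2.5 washes needed to break even, so Jarvis would raise the question rather than act on its own.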

    The Jarvis future (that I want you to build)

    I believe we can build Jarvis. We have a lot of the pieces, but we're missing that holistic, seamless experience. And while I'm highly interested in this future, I'm not the one who's going to build it.

    My focus is on teaching people how to build high-quality MCP servers. That's what I'm doing on EpicAI.pro. I need Jarvis to exist so that these MCP services can reach their full potential.

    So, if you're working on building Jarvis (or something like it), make me demos of your Jarvis performing the things above. Seriously, reach out. And if you're interested in being part of this future by creating the MCP servers that will power it, come check out what we're doing on EpicAI.pro.

    The Jarvis future is coming. Let's make it happen!
