How to Use Sampling in MCP: Borrow the User's LLM

    Kent C. Dodds

    Modern AI applications are all about generating new content on demand—whether that's text, summaries, suggestions, or more. This process is called sampling: asking a language model (LLM) to produce a completion or response based on a prompt and some context.

    The Model Context Protocol (MCP) makes sampling easy and powerful. Instead of wiring up your own API keys and integrations, you just make a sampling request from your MCP server. The client handles model selection, permissions, and user controls. This opens up a world of automation possibilities for your apps.

    And while MCP sampling is still in its early days, it's already enabling developers to build smarter, more helpful features with just a few lines of code.

    What is Sampling (and Why Should You Care)?

    Sampling is the act of asking a language model to generate something new for you. That could be:

    • A summary of a document
    • Suggested tags for a new entry
    • A motivational message
    • A list of action items from a meeting
    • Or just about anything else you can describe in a prompt

    With MCP, you can automate these tasks in your own server logic, making your apps more helpful and responsive—without reinventing the wheel.

    Here's the basic flow (a sketch of the wire format follows the list):

    1. Your server sends a sampling request to the client, including a system prompt, user messages, and a token limit. (Note: this is a request from the server to the client, so it requires a connection the server can send requests over; a stateless HTTP server has no such channel.)
    2. The client forwards the request to the user's chosen LLM (like Claude, GPT, etc.).
    3. The model generates a response, which is returned to your server.
    4. You parse and validate the response, then use it however you like.
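
    Under the hood, that first step is an ordinary JSON-RPC request from server to client. Here's a rough sketch of the wire format, based on the spec's sampling/createMessage method (all field values are illustrative, and the exact shape can vary by protocol version):

    // Server -> Client
    {
      "jsonrpc": "2.0",
      "id": 1,
      "method": "sampling/createMessage",
      "params": {
        "systemPrompt": "You are a helpful assistant.",
        "messages": [
          { "role": "user", "content": { "type": "text", "text": "Hello!" } }
        ],
        "maxTokens": 100
      }
    }

    // Client -> Server, after the user's chosen model responds
    {
      "jsonrpc": "2.0",
      "id": 1,
      "result": {
        "role": "assistant",
        "content": { "type": "text", "text": "Hi! How can I help?" },
        "model": "whichever-model-the-client-picked",
        "stopReason": "endTurn"
      }
    }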

    Example: Automating Meeting Minutes

    Let's say you want to automate the process of summarizing meeting transcripts, extracting action items, and suggesting tags. Here's how you might do it with MCP sampling:

    import { z } from 'zod'
    import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js'

    export async function summarizeMeeting(server: McpServer, transcript: string) {
      // server.server is the lower-level protocol server that McpServer wraps;
      // client-bound requests like sampling go through it.
      const result = await server.server.createMessage({
        systemPrompt: `
    You are an expert meeting assistant. Given a meeting transcript, respond with a JSON object containing:
    - a concise summary of the meeting
    - an array of action items (each as a short string)
    - an array of relevant tags (each as a string, max 5)
    Respond with JSON only. Example:
    {
      "summary": "The team discussed the Q2 budget, agreed to cut marketing costs, and scheduled a follow-up meeting.",
      "actionItems": [
        "Carol will draft a revised marketing plan",
        "Team will meet next week to finalize the budget"
      ],
      "tags": ["budget", "marketing", "follow-up"]
    }
        `.trim(),
        messages: [
          {
            role: 'user',
            content: {
              type: 'text',
              mimeType: 'application/json',
              text: JSON.stringify({ transcript }),
            },
          },
        ],
        // maxTokens caps how many tokens the model may generate. Set it high
        // enough that the full JSON response fits, or it will be truncated.
        maxTokens: 1000,
      })

      // First validate the envelope: we expect a single text content block.
      const resultSchema = z.object({
        content: z.object({
          type: z.literal('text'),
          text: z.string(),
        }),
      })
      const parsedResult = resultSchema.parse(result)
      const jsonResponse = JSON.parse(parsedResult.content.text)

      // Then validate the JSON payload itself before trusting it.
      const responseSchema = z.object({
        summary: z.string(),
        actionItems: z.array(z.string()),
        tags: z.array(z.string()).max(5),
      })
      const response = responseSchema.parse(jsonResponse)

      // Write these to the database or whatever you need... Congrats, you just
      // "borrowed" the user's LLM to do some work for you!
      // (Log to stderr: on the stdio transport, stdout carries protocol messages.)
      console.error('Meeting summary:', response.summary)
      console.error('Action items:', response.actionItems)
      console.error('Tags:', response.tags)
      return response
    }
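
    For completeness, here's one way you might expose this to clients as a tool (a minimal sketch; the tool name summarize_meeting and the server metadata are made up for illustration, and error handling is elided):

    import { z } from 'zod'
    import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js'

    const server = new McpServer({ name: 'meeting-notes', version: '1.0.0' })

    server.tool(
      'summarize_meeting',
      'Summarize a meeting transcript into a summary, action items, and tags',
      { transcript: z.string() },
      async ({ transcript }) => {
        // Reuses the summarizeMeeting function defined above.
        const response = await summarizeMeeting(server, transcript)
        return {
          content: [{ type: 'text', text: JSON.stringify(response, null, 2) }],
        }
      },
    )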

    This is just one example. The real power of sampling is that you can use it for almost anything you can describe in a prompt.


    Other Creative Use Cases

    Here are a few more ideas to spark your imagination:

    • Recipe Ingredient Optimizer: Suggest substitutions and generate shopping lists based on a recipe.
    • Blog Post Image Generator: Generate an image for a blog post.
    • Customer Support Ticket Classifier: Assign urgency, suggest departments, and draft first responses for incoming tickets.
    • Personalized Book Recommendation Engine: Recommend new books with short pitches based on a user's reading history.
    • Fitness Progress Motivator: Generate motivational messages and suggest new exercises after a workout log.

    If you can describe the task, you can probably automate it with MCP sampling.
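
    For instance, the ticket classifier could reuse the exact pattern from summarizeMeeting. A hypothetical sketch (the schema and prompt are illustrative):

    import { z } from 'zod'
    import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js'

    const classificationSchema = z.object({
      urgency: z.enum(['low', 'medium', 'high']),
      department: z.string(),
      draftReply: z.string(),
    })

    export async function classifyTicket(server: McpServer, ticket: string) {
      const result = await server.server.createMessage({
        systemPrompt: `
    You are a support triage assistant. Given a support ticket, respond with JSON only:
    {"urgency": "low" | "medium" | "high", "department": "...", "draftReply": "..."}
        `.trim(),
        messages: [{ role: 'user', content: { type: 'text', text: ticket } }],
        maxTokens: 500,
      })
      // Same envelope + payload validation as in summarizeMeeting.
      const text = z
        .object({ content: z.object({ type: z.literal('text'), text: z.string() }) })
        .parse(result).content.text
      return classificationSchema.parse(JSON.parse(text))
    }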


    Tips for Effective Sampling

    • Be explicit: Tell the model exactly what you want, and provide example responses.
    • Use structured data: Send and receive JSON for easy parsing and validation.
    • Validate everything: Use a schema (like Zod) to ensure the model's response matches your expectations (see the sketch after this list).
    • Experiment: Try your prompts in an LLM playground, iterate, and refine until you get reliable results.
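
    Models occasionally return malformed JSON, so it's worth failing gracefully rather than crashing on the first bad response. Here's one possible pattern: safeParse the output and retry a bounded number of times (sampleOnce is a hypothetical helper standing in for the createMessage call shown earlier):

    import { z } from 'zod'

    // Hypothetical helper that wraps server.server.createMessage and
    // returns the raw text of the model's response.
    declare function sampleOnce(prompt: string): Promise<string>

    const responseSchema = z.object({
      summary: z.string(),
      actionItems: z.array(z.string()),
      tags: z.array(z.string()).max(5),
    })

    export async function sampleWithRetry(prompt: string, attempts = 2) {
      let lastError: unknown
      for (let i = 0; i < attempts; i++) {
        const text = await sampleOnce(prompt)
        try {
          const parsed = responseSchema.safeParse(JSON.parse(text))
          if (parsed.success) return parsed.data
          lastError = parsed.error
        } catch (error) {
          lastError = error // JSON.parse failed; try again
        }
      }
      throw new Error(`Model did not return valid JSON after ${attempts} attempts`, {
        cause: lastError,
      })
    }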

    This is Just the Beginning

    MCP sampling is still new. I'm unaware of any popular LLM apps that support this feature as of today, but eventually there will be some and it will be a big unlock.

    If you want to build smarter, more helpful apps, now's the time to experiment with MCP sampling.

    I hope this is helpful! In the MCP Fundamentals workshop, I give you exercises to develop stuff like this. If you'd like to keep up with AI development, sign up for the EpicAI newsletter.
