Try it live
See PromptDiff in action. No sign-up required.
How it works
From prompt to comparison in seconds. One request, all the data you need to make the right model choice.
POST your prompt
Send your prompt and choose which models to compare. Include system instructions or variables as needed.
We run all models in parallel
PromptDiff calls each model simultaneously, measuring latency, collecting tokens, and computing costs in real time.
Get structured results
Receive a unified JSON response with outputs, latency, cost, and token breakdown per model. Compare and decide.
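The docs define the exact response schema; as an illustration of the shape the step above implies, here is a hedged TypeScript sketch (field names are assumptions drawn from the description, not the confirmed schema):

// Illustrative only: field names are assumptions based on the description
// above (output, latency, cost, token breakdown per model), not the
// confirmed PromptDiff response schema.
interface ModelResult {
  model: string;      // e.g. "gpt-4o-mini"
  output: string;     // the model's completion text
  latency_ms: number; // wall-clock latency of this model's call
  cost_usd: number;   // computed cost of this single call
  tokens: { input: number; output: number };
}

interface CompareResponse {
  results: ModelResult[];
}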
Simple API, powerful results
Integrate in minutes. Works with any HTTP client.
curl -X POST https://promptdiff.bizmarq.com/api/v1/compare \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer pd_your_api_key" \
  -d '{
    "prompt": "Explain async/await in JavaScript in one paragraph.",
    "models": ["gpt-4o-mini", "claude-haiku-4.5", "gemini-2.5-flash"],
    "options": {
      "temperature": 0.7,
      "max_tokens": 300
    }
  }'
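The same request works from TypeScript with fetch, per the FAQ's note that any HTTP client is supported. A minimal sketch follows; the response fields read at the end use the assumed shape sketched under "How it works", not a confirmed schema:

// Minimal fetch sketch mirroring the curl example above. Requires a
// runtime with global fetch (Node 18+, browsers) and top-level await.
const response = await fetch("https://promptdiff.bizmarq.com/api/v1/compare", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: "Bearer pd_your_api_key",
  },
  body: JSON.stringify({
    prompt: "Explain async/await in JavaScript in one paragraph.",
    models: ["gpt-4o-mini", "claude-haiku-4.5", "gemini-2.5-flash"],
    options: { temperature: 0.7, max_tokens: 300 },
  }),
});

const data = await response.json();
// Field names below are assumptions; consult the docs for the real schema.
for (const result of data.results) {
  console.log(result.model, result.latency_ms, result.cost_usd);
}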
Supported models
All major providers in one comparison. More added regularly.
Check the docs for the complete and up-to-date model list.
Simple, transparent pricing
Start free. Pay only for what you use.
Note: PromptDiff calls the LLM providers on your behalf, so underlying model costs are included in your PromptDiff bill rather than invoiced separately by each provider.
Why PromptDiff
The fastest way to benchmark LLMs
Choosing between Claude, GPT-4o, Gemini, and Grok for your AI application means running each model separately, collecting results manually, and comparing them by hand. PromptDiff eliminates that friction — one API call returns every model's output, latency, token count, and cost side by side.
Cross-model comparison in one call
Send your prompt once. PromptDiff fans it out to every model in parallel and returns all results in a single structured response.
Real-time cost and latency tracking
Every response includes exact token counts, cost in USD, and latency in milliseconds — so you always know which model is fastest and cheapest for your workload.
Supports all major providers
Claude Sonnet 4.6 and Haiku 4.5 from Anthropic, GPT-4o and GPT-4o-mini from OpenAI, Gemini 2.5 Pro and Flash from Google, and Grok 3 and Grok 3 Mini from xAI.
Python and TypeScript SDKs
First-class SDKs for Python and TypeScript. Drop in the client, pass your prompt and model list, and iterate on results in minutes (see the sketch after this list).
Free tier — no credit card
100 evaluations per month free. Mini models (Claude Haiku, GPT-4o-mini, Gemini Flash, Grok Mini) are included. No credit card required to start.
Usage-based pricing
After the free tier, pay only for what you use. Pricing is LLM provider cost plus a 40% margin — transparent and predictable with a configurable monthly spend limit. For example, an evaluation that consumes $0.010 in provider tokens is billed at $0.014.
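To make the list above concrete, here is a hedged sketch of the selection loop the SDKs enable. The package name, client, and method below are hypothetical stand-ins, not the published SDK surface; check the docs for the real interface.

// Hypothetical SDK surface for illustration only; the real TypeScript
// SDK's package name, client, and method signatures may differ.
import { PromptDiff } from "promptdiff"; // package name assumed

const client = new PromptDiff({ apiKey: process.env.PROMPTDIFF_API_KEY ?? "" });

const { results } = await client.compare({
  prompt: "Summarize this changelog for release notes.",
  models: ["gpt-4o-mini", "claude-haiku-4.5", "gemini-2.5-flash"],
});

// Selection logic: cheapest model that answers in under one second.
// This part is plain TypeScript and works against any result objects
// that expose latency and cost fields.
const fastEnough = results.filter((r) => r.latency_ms < 1000);
const winner = [...fastEnough].sort((a, b) => a.cost_usd - b.cost_usd)[0];
console.log(`Cheapest sub-second model: ${winner?.model}`);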
FAQ
Frequently asked questions
- What is PromptDiff?
- PromptDiff is an API that compares LLM outputs across multiple models (Claude, GPT-4o, Gemini, Grok) with a single API call. It returns structured results including output text, latency, cost, and token usage for each model.
- How much does PromptDiff cost?
- PromptDiff offers 100 free evaluations per month with no credit card required. After the free tier, pricing is based on actual LLM API costs plus a 40% margin. Mini models (Claude Haiku, GPT-4o-mini, Gemini Flash, Grok Mini) are available in the free tier.
- Which LLM models does PromptDiff support?
- PromptDiff supports 8 models from 4 providers: Claude Sonnet 4.6 and Haiku 4.5 (Anthropic), GPT-4o and GPT-4o-mini (OpenAI), Gemini 2.5 Pro and Flash (Google), and Grok 3 and Grok 3 Mini (xAI).
- Does PromptDiff have an SDK?
- Yes, PromptDiff provides both Python and TypeScript SDKs. You can also use the REST API directly with any HTTP client like curl or fetch.