Try it live
See PromptDiff in action. No sign-up required.
How it works
From prompt to comparison in seconds. One request, all the data you need to make the right model choice.
POST your prompt
Send your prompt and choose which models to compare. Include system instructions or variables as needed.
We run all models in parallel
PromptDiff calls each model simultaneously, measuring latency, collecting tokens, and computing costs in real time.
Get structured results
Receive a unified JSON response with outputs, latency, cost, and token breakdown per model. Compare and decide.
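The docs define the exact response schema; as an illustration of the shape the step above implies, here is a hedged TypeScript sketch (field names are assumptions drawn from the description, not the confirmed schema):

// Illustrative only: field names are assumptions based on the description
// above (output, latency, cost, token breakdown per model), not the
// confirmed PromptDiff response schema.
interface ModelResult {
  model: string;      // e.g. "gpt-4o-mini"
  output: string;     // the model's completion text
  latency_ms: number; // wall-clock latency of this model's call
  cost_usd: number;   // computed cost of this single call
  tokens: { input: number; output: number };
}

interface CompareResponse {
  results: ModelResult[];
}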
Simple API, powerful results
Integrate in minutes. Works with any HTTP client.
curl -X POST https://promptdiff.bizmarq.com/api/v1/compare \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer pd_your_api_key" \
  -d '{
    "prompt": "Explain async/await in JavaScript in one paragraph.",
    "models": ["gpt-4o-mini", "claude-haiku-4.5", "gemini-2.5-flash"],
    "options": {
      "temperature": 0.7,
      "max_tokens": 300
    }
  }'
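The same request works from TypeScript with fetch, per the FAQ's note that any HTTP client is supported. A minimal sketch follows; the response fields read at the end use the assumed shape sketched under "How it works", not a confirmed schema:

// Minimal fetch sketch mirroring the curl example above. Requires a
// runtime with global fetch (Node 18+, browsers) and top-level await.
const response = await fetch("https://promptdiff.bizmarq.com/api/v1/compare", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: "Bearer pd_your_api_key",
  },
  body: JSON.stringify({
    prompt: "Explain async/await in JavaScript in one paragraph.",
    models: ["gpt-4o-mini", "claude-haiku-4.5", "gemini-2.5-flash"],
    options: { temperature: 0.7, max_tokens: 300 },
  }),
});

const data = await response.json();
// Field names below are assumptions; consult the docs for the real schema.
for (const result of data.results) {
  console.log(result.model, result.latency_ms, result.cost_usd);
}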
Supported models
All major providers in one comparison. More added regularly.
Check the docs for the complete and up-to-date model list.
Simple, transparent pricing
Start free. Pay only for what you use.
Note: PromptDiff calls the LLM providers on your behalf, so underlying model costs are included in your PromptDiff bill rather than invoiced separately by each provider.
Why PromptDiff
The fastest way to benchmark LLMs
Choosing between Claude, GPT-4o, Gemini, and Grok for your AI application means running each model separately, collecting results manually, and comparing them by hand. PromptDiff eliminates that friction — one API call returns every model's output, latency, token count, and cost side by side.
Cross-model comparison in one call
Send your prompt once. PromptDiff fans it out to every model in parallel and returns all results in a single structured response.
Real-time cost and latency tracking
Every response includes exact token counts, cost in USD, and latency in milliseconds — so you always know which model is fastest and cheapest for your workload.
Supports all major providers
Claude Sonnet 4.6 and Haiku 4.5 from Anthropic, GPT-4o and GPT-4o-mini from OpenAI, Gemini 2.5 Pro and Flash from Google, and Grok 3 and Grok 3 Mini from xAI.
Python and TypeScript SDKs
First-class SDKs for Python and TypeScript. Drop in the client, pass your prompt and model list, and iterate on results in minutes (see the sketch after this list).
Free tier — no credit card
100 evaluations per month free. Mini models (Claude Haiku, GPT-4o-mini, Gemini Flash, Grok Mini) are included. No credit card required to start.
Usage-based pricing
After the free tier, pay only for what you use. Pricing is LLM provider cost plus a 40% margin — transparent and predictable with a configurable monthly spend limit. For example, an evaluation that consumes $0.010 in provider tokens is billed at $0.014.
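To make the list above concrete, here is a hedged sketch of the selection loop the SDKs enable. The package name, client, and method below are hypothetical stand-ins, not the published SDK surface; check the docs for the real interface.

// Hypothetical SDK surface for illustration only; the real TypeScript
// SDK's package name, client, and method signatures may differ.
import { PromptDiff } from "promptdiff"; // package name assumed

const client = new PromptDiff({ apiKey: process.env.PROMPTDIFF_API_KEY ?? "" });

const { results } = await client.compare({
  prompt: "Summarize this changelog for release notes.",
  models: ["gpt-4o-mini", "claude-haiku-4.5", "gemini-2.5-flash"],
});

// Selection logic: cheapest model that answers in under one second.
// This part is plain TypeScript and works against any result objects
// that expose latency and cost fields.
const fastEnough = results.filter((r) => r.latency_ms < 1000);
const winner = [...fastEnough].sort((a, b) => a.cost_usd - b.cost_usd)[0];
console.log(`Cheapest sub-second model: ${winner?.model}`);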
FAQ
Frequently asked questions
- What is PromptDiff?
- PromptDiff is an API that compares LLM outputs across multiple models (Claude, GPT-4o, Gemini, Grok) with a single API call. It returns structured results including output text, latency, cost, and token usage for each model.
- How much does PromptDiff cost?
- PromptDiff offers 100 free evaluations per month with no credit card required. After the free tier, pricing is based on actual LLM API costs plus a 40% margin. Mini models (Claude Haiku, GPT-4o-mini, Gemini Flash, Grok Mini) are available in the free tier.
- Which LLM models does PromptDiff support?
- PromptDiff supports 8 models from 4 providers: Claude Sonnet 4.6 and Haiku 4.5 (Anthropic), GPT-4o and GPT-4o-mini (OpenAI), Gemini 2.5 Pro and Flash (Google), and Grok 3 and Grok 3 Mini (xAI).
- Does PromptDiff have an SDK?
- Yes, PromptDiff provides both Python and TypeScript SDKs. You can also use the REST API directly with any HTTP client like curl or fetch.