Model & Prompt Performance

Compare models and prompt versions across accuracy metrics to find optimization opportunities.

Best model

Model with the highest field accuracy in the selected period.

Claude: 94.2% field accuracy (Top performer)
Most cost-efficient

Model with the best accuracy-to-cost ratio.

Claude: €0.0045/msg (Best value)
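
The dashboard does not say how the accuracy-to-cost ratio is computed. A minimal sketch in Python, assuming it is field accuracy divided by cost per message: the Claude figures are the ones shown on the cards, while `model-b` and its numbers are hypothetical placeholders included only so there is something to rank against.

```python
# Assumption: "accuracy-to-cost ratio" = field accuracy (%) / cost per message (EUR).
# The dashboard does not define the formula; this is one plausible reading.

def accuracy_per_euro(accuracy_pct: float, cost_per_msg_eur: float) -> float:
    """Accuracy percentage obtained per euro spent on one message."""
    if cost_per_msg_eur <= 0:
        raise ValueError("cost per message must be positive")
    return accuracy_pct / cost_per_msg_eur

models = {
    "Claude": (94.2, 0.0045),   # values shown on the dashboard cards
    "model-b": (90.0, 0.0060),  # hypothetical placeholder, not dashboard data
}

# Rank models by the assumed ratio and pick the highest one.
ratios = {name: accuracy_per_euro(acc, cost) for name, (acc, cost) in models.items()}
most_cost_efficient = max(ratios, key=ratios.get)
print(most_cost_efficient)
```

A plain ratio rewards very cheap models even when their accuracy is poor, so a real ranking might also enforce a minimum accuracy before comparing costs.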
Current prompt

Active prompt version and its field accuracy.

v2.1: 94.8% accuracy (+6.6 pp since v1.0)
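
The change is reported in percentage points (pp), which is an absolute difference between two accuracy percentages and not a relative improvement. A short sketch of the arithmetic, assuming v1.0 and v2.1 were measured on the same field accuracy metric; the variable names are illustrative only.

```python
current_accuracy = 94.8   # v2.1, from the card (%)
delta_pp = 6.6            # change since v1.0, in percentage points

# Implied v1.0 baseline, assuming both versions use the same metric.
baseline_accuracy = round(current_accuracy - delta_pp, 1)

# Relative change in percent, a different quantity from the pp delta.
relative_change_pct = round(delta_pp / baseline_accuracy * 100, 1)

print(baseline_accuracy, relative_change_pct)
```

The relative change comes out larger than the pp delta because it is measured against the lower baseline.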
Models evaluated

Number of models being tracked for comparison.

4 in comparison (Active)
Model performance comparison

How do models compare across multiple accuracy metrics?

Metrics: Field Accuracy, Classification, Perfect Order Rate, Customer Match
Models: Claude 3.5 Sonnet, GPT-4o Mini, Claude 3 Haiku, Gemini 1.5 Pro
Accuracy vs cost per model

Which models give the best accuracy for the cost?

Models: Claude 3.5 Sonnet, GPT-4o Mini, Claude 3 Haiku, Gemini 1.5 Pro
Prompt version accuracy over time

Did new prompt versions improve or hurt accuracy when deployed?

Metric: Field Accuracy. Dashed lines mark prompt version changes.
Accuracy by field group

For specific field groups, which model is best?

Field group   Claude 3.5 Sonnet   GPT-4o Mini   Claude 3 Haiku   Gemini 1.5 Pro
Addresses     91.2%               87.5%         84.2%            89.1%
Dates         96.5%               94.2%         91.8%            95.2%
Quantities    93.8%               90.5%         87.2%            92.1%
References    89.5%               85.8%         82.5%            87.8%
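
The per-group question can be answered mechanically from the figures in this section. A minimal sketch, assuming the four values listed under each field group follow the legend order (Claude 3.5 Sonnet, GPT-4o Mini, Claude 3 Haiku, Gemini 1.5 Pro); the function and variable names are illustrative and not part of the dashboard.

```python
# Field accuracy (%) per field group, keyed by model.
# Assumption: values are mapped to models in legend order.
accuracy_by_group = {
    "Addresses":  {"Claude 3.5 Sonnet": 91.2, "GPT-4o Mini": 87.5,
                   "Claude 3 Haiku": 84.2, "Gemini 1.5 Pro": 89.1},
    "Dates":      {"Claude 3.5 Sonnet": 96.5, "GPT-4o Mini": 94.2,
                   "Claude 3 Haiku": 91.8, "Gemini 1.5 Pro": 95.2},
    "Quantities": {"Claude 3.5 Sonnet": 93.8, "GPT-4o Mini": 90.5,
                   "Claude 3 Haiku": 87.2, "Gemini 1.5 Pro": 92.1},
    "References": {"Claude 3.5 Sonnet": 89.5, "GPT-4o Mini": 85.8,
                   "Claude 3 Haiku": 82.5, "Gemini 1.5 Pro": 87.8},
}

def rank_group(scores):
    """Return (best model, runner-up, lead in percentage points)."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (best, best_acc), (runner_up, second_acc) = ordered[0], ordered[1]
    return best, runner_up, round(best_acc - second_acc, 1)

summary = {group: rank_group(scores) for group, scores in accuracy_by_group.items()}

for group, (best, runner_up, lead_pp) in summary.items():
    print(f"{group}: {best} leads {runner_up} by {lead_pp} pp")
```

Reporting the lead over the runner-up alongside the winner helps show where switching models would cost the least accuracy.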