
Claude 3.5 Sonnet - Prompt v2.1

run-005 • Created November 28, 2025, 14:30

Status: Completed
Messages evaluated:    1,250 total messages
Total LLM cost:        €22.75 for this run
Avg. cost/message:     €0.0182 per message
Avg. cost/order:       €0.0365 per order
Avg. duration:         1,245 ms pipeline latency
p95 duration:          2,890 ms worst case
Field accuracy:        94.2% overall (+2.1 pp vs production baseline)
Classification acc.:   97.8% overall (+1.3 pp vs production baseline)
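The per-message and per-order averages above are simple ratios of the run totals. A minimal sketch, using the figures from this run's summary (the helper function itself is illustrative, not part of Metrix):

```python
def per_unit_cost(total_cost_eur: float, units: int) -> float:
    """Average cost per unit, rounded to 4 decimal places."""
    return round(total_cost_eur / units, 4)

total_cost = 22.75  # Total LLM cost for run-005
messages = 1250     # Messages evaluated

print(per_unit_cost(total_cost, messages))  # 0.0182, matching Avg. cost/message
```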

Cost & Performance

Cost and Latency Over Messages

Cost per message and pipeline duration across the run.

Series: Cost per message (€) • Pipeline duration (ms)
Latency Distribution

How latency is distributed across messages.
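The p95 duration reported above is the latency that 95% of messages stay under. A minimal sketch of the nearest-rank percentile; the sample durations are made up for illustration:

```python
import math

def percentile(values: list[float], q: float) -> float:
    """Nearest-rank percentile: smallest value with at least q% of samples at or below it."""
    ordered = sorted(values)
    rank = math.ceil(q / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]

# 20 hypothetical pipeline durations in ms: 1000, 1100, ..., 2900
durations_ms = list(range(1000, 3000, 100))
print(percentile(durations_ms, 95))  # 2800
```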

Accuracy

Field Accuracy by Field

Which fields are accurate, and which are problematic?

Field               This run   Δ vs production baseline (pp)
pickup_date         96.8%      +2.6
delivery_date       95.2%      +1.4
weight              98.4%      +1.3
volume              92.1%      +0.6
pickup_address      89.3%      +1.7
delivery_address    88.7%      +1.8
reference           94.5%      +1.3
goods_description   91.2%      +0.4
Order Accuracy Buckets

Perfect vs Minor vs Major edits.

Perfect:      847 (67.8%)
Minor edits:  312 (25.0%)
Major edits:   91 (7.3%)
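Bucketing like the above amounts to classifying each order by how many corrections it needed. A minimal sketch; the thresholds (0 edits = Perfect, 1–2 = Minor, 3+ = Major) and the sample data are assumptions for illustration, not Metrix's actual rule:

```python
from collections import Counter

def bucket(edit_count: int) -> str:
    """Assign an order to an accuracy bucket by edit count (hypothetical thresholds)."""
    if edit_count == 0:
        return "Perfect"
    if edit_count <= 2:
        return "Minor edits"
    return "Major edits"

edits_per_order = [0, 0, 1, 3, 0, 2, 5, 0]  # hypothetical sample
counts = Counter(bucket(e) for e in edits_per_order)
print(counts)  # Counter({'Perfect': 4, 'Minor edits': 2, 'Major edits': 2})
```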
Accuracy by Customer

Which customers benefit or suffer from this run?

Cost Breakdown

Cost by Model

Which AI models contribute most to the total cost?

claude-3.5-sonnet:  €18.45 (81.1%)
gpt-4o-mini:        €2.85 (12.5%)
claude-3-haiku:     €1.45 (6.4%)
Total:              €22.75
Tokens by Pipeline Node

Which nodes use the most tokens? (Sorted by usage)

Node Performance Hotspots

Node Latency by Node

Which nodes are slowest in this run? (Ordered by p95 latency)