
Claude 3.5 Sonnet - Prompt v2.1
run-006 • Created November 28, 2025, 14:30 • Status: Completed
Summary

  Messages evaluated:    1,250
  Total LLM cost:        €22.75
  Avg. cost/message:     €0.0182
  Avg. cost/order:       €0.0365
  Avg. duration:         1,245 ms (pipeline latency)
  p95 duration:          2,890 ms (worst case)
  Field accuracy:        94.2% overall (+2.1 pp vs. production baseline)
  Classification acc.:   97.8% overall (+1.3 pp vs. production baseline)
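The headline figures above are internally consistent; a minimal sanity check, using only the reported totals (the order count is not shown in this run, so the per-order figure is not recomputed here):

```python
# Recompute the derived cost metric from the run's reported totals.
total_cost_eur = 22.75   # Total LLM cost for run-006
messages = 1250          # Messages evaluated

avg_cost_per_message = total_cost_eur / messages
print(f"avg cost/message: €{avg_cost_per_message:.4f}")  # €0.0182
```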
Cost & Performance

Cost and Latency Over Messages
Cost per message (€) and pipeline duration (ms) across the run. [chart]

Latency Distribution
How latency is distributed across messages. [chart]
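For reference, the p95 figure in the summary is the 95th-percentile pipeline duration across all messages. A nearest-rank sketch (the sample durations below are illustrative, not the run's actual measurements):

```python
def p95(durations_ms: list[float]) -> float:
    """95th-percentile latency via the nearest-rank method."""
    ordered = sorted(durations_ms)
    # Smallest value covering at least 95% of samples: ceil(0.95 * n), 1-based.
    rank = -(-95 * len(ordered) // 100)  # integer ceiling
    return ordered[rank - 1]

# Illustrative per-message durations in ms (not taken from run-006):
sample = [800, 950, 1000, 1100, 1200, 1250, 1300, 1400, 2890, 2900]
print(p95(sample))  # 2900
```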
Accuracy

Field Accuracy
Which fields are accurate, and which are problematic? Values are this run vs. the production baseline; deltas in percentage points.

  Field               This run   Δ vs. baseline
  pickup_date           96.8%        +2.6
  delivery_date         95.2%        +1.4
  weight                98.4%        +1.3
  volume                92.1%        +0.6
  pickup_address        89.3%        +1.7
  delivery_address      88.7%        +1.8
  reference             94.5%        +1.3
  goods_description     91.2%        +0.4
Order Accuracy Buckets
Perfect vs. minor-edit vs. major-edit orders.

  Perfect        847 (67.8%)
  Minor edits    312 (25.0%)
  Major edits     91 (7.3%)
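The bucket shares follow from the counts (847 + 312 + 91 = 1,250); they are rounded to one decimal place, which is why they sum to 100.1% rather than exactly 100%:

```python
# Recompute order-accuracy bucket shares from the reported counts.
buckets = {"Perfect": 847, "Minor edits": 312, "Major edits": 91}
total = sum(buckets.values())  # 1250

for name, count in buckets.items():
    print(f"{name}: {count} ({count / total:.1%})")
# Perfect: 847 (67.8%)
# Minor edits: 312 (25.0%)
# Major edits: 91 (7.3%)
```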
Accuracy by Customer
Which customers benefit or suffer from this run? [chart]
Cost Breakdown

Cost by Model
Which AI models contribute most to the total cost?

  claude-3.5-sonnet   €18.45 (81.1%)
  gpt-4o-mini          €2.85 (12.5%)
  claude-3-haiku       €1.45  (6.4%)
Tokens by Pipeline Node
Which nodes use the most tokens? (Sorted by usage.) [chart]

Node Performance Hotspots

Node Latency
Which nodes are slowest in this run? (Ordered by p95 latency.) [chart]