
Claude 3.5 Sonnet - Prompt v2.1
run-006 • Created November 28, 2025, 14:30 • Status: Completed
Summary

  Messages evaluated:    1,250
  Total LLM cost:        €22.75
  Avg. cost/message:     €0.0182
  Avg. cost/order:       €0.0365
  Avg. duration:         1,245 ms (pipeline latency)
  p95 duration:          2,890 ms (worst case)
  Field accuracy:        94.2% overall (+2.1 pp vs. production baseline)
  Classification acc.:   97.8% overall (+1.3 pp vs. production baseline)
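The headline figures above are internally consistent; a minimal sanity check, using only the reported totals (the order count is not shown in this run, so the per-order figure is not recomputed here):

```python
# Recompute the derived cost metric from the run's reported totals.
total_cost_eur = 22.75   # Total LLM cost for run-006
messages = 1250          # Messages evaluated

avg_cost_per_message = total_cost_eur / messages
print(f"avg cost/message: €{avg_cost_per_message:.4f}")  # €0.0182
```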
Cost & Performance

Cost and Latency Over Messages
Cost per message (€) and pipeline duration (ms) across the run. [chart]

Latency Distribution
How latency is distributed across messages. [chart]
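For reference, the p95 figure in the summary is the 95th-percentile pipeline duration across all messages. A nearest-rank sketch (the sample durations below are illustrative, not the run's actual measurements):

```python
def p95(durations_ms: list[float]) -> float:
    """95th-percentile latency via the nearest-rank method."""
    ordered = sorted(durations_ms)
    # Smallest value covering at least 95% of samples: ceil(0.95 * n), 1-based.
    rank = -(-95 * len(ordered) // 100)  # integer ceiling
    return ordered[rank - 1]

# Illustrative per-message durations in ms (not taken from run-006):
sample = [800, 950, 1000, 1100, 1200, 1250, 1300, 1400, 2890, 2900]
print(p95(sample))  # 2900
```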
Accuracy

Field Accuracy
Which fields are accurate, and which are problematic? Values are this run vs. the production baseline; deltas in percentage points.

  Field               This run   Δ vs. baseline
  pickup_date           96.8%        +2.6
  delivery_date         95.2%        +1.4
  weight                98.4%        +1.3
  volume                92.1%        +0.6
  pickup_address        89.3%        +1.7
  delivery_address      88.7%        +1.8
  reference             94.5%        +1.3
  goods_description     91.2%        +0.4
Order Accuracy Buckets
Perfect vs. minor-edit vs. major-edit orders.

  Perfect        847 (67.8%)
  Minor edits    312 (25.0%)
  Major edits     91 (7.3%)
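The bucket shares follow from the counts (847 + 312 + 91 = 1,250); they are rounded to one decimal place, which is why they sum to 100.1% rather than exactly 100%:

```python
# Recompute order-accuracy bucket shares from the reported counts.
buckets = {"Perfect": 847, "Minor edits": 312, "Major edits": 91}
total = sum(buckets.values())  # 1250

for name, count in buckets.items():
    print(f"{name}: {count} ({count / total:.1%})")
# Perfect: 847 (67.8%)
# Minor edits: 312 (25.0%)
# Major edits: 91 (7.3%)
```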
Accuracy by Customer
Which customers benefit or suffer from this run? [chart]
Cost Breakdown

Cost by Model
Which AI models contribute most to the total cost?

  claude-3.5-sonnet   €18.45 (81.1%)
  gpt-4o-mini          €2.85 (12.5%)
  claude-3-haiku       €1.45  (6.4%)
Tokens by Pipeline Node
Which nodes use the most tokens? (Sorted by usage.) [chart]

Node Performance Hotspots

Node Latency
Which nodes are slowest in this run? (Ordered by p95 latency.) [chart]