
Test Runs

Compare model configurations, prompts, and pipelines before deploying to production.

- Total test runs: 7 experiments (+3 this week)
- Completed runs: 4 finished experiments (71% success rate)
- Avg. field accuracy: 91.5% across completed runs (+2.1 pp vs. baseline)
- Best performer: 94.2% (Claude 3.5 Sonnet, top run)
All Test Runs

Overview of all experiment runs with key metrics.

| Run Name | Created | Pipeline | Messages | Field Acc. | Class. Acc. | Avg. Cost (€) | Status |
|---|---|---|---|---|---|---|---|
| Claude 3.5 Sonnet - Prompt v2.1 (run-001) | Nov 28, 14:30 | Both | 1,250 | 94.2% | 97.8% | 0.0182 | Completed |
| GPT-4o Mini - Cost Optimization (run-002) | Nov 27, 10:15 | Extraction | 1,250 | 91.5% | 95.2% | 0.0098 | Completed |
| Claude 3 Haiku - Speed Test (run-003) | Nov 26, 16:45 | Classification | 1,250 | 88.3% | 93.1% | 0.0045 | Completed |
| Production Baseline (run-004) | Nov 25, 09:00 | Both | 1,250 | 92.1% | 96.5% | 0.0156 | Completed |
| GPT-4o - Accuracy Focus (run-005) | Nov 29, 08:30 | Both | 847 | – | – | – | Running |
| Gemini 1.5 Pro - Experiment (run-006) | Nov 29, 11:00 | Extraction | 0 | – | – | – | Pending |
| Claude 3 Opus - Failed Config (run-007) | Nov 24, 15:20 | Both | 156 | – | – | 0.0892 | Failed |
Accuracy vs Cost per Run

Which runs give high accuracy at low cost? (Cost normalized to 0–100 scale)

[Chart: Field Accuracy (%) vs. Avg Cost (normalized) per run]
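The exact normalization method is not documented on this page; a minimal sketch assuming simple min-max scaling of the per-message cost onto a 0–100 range:

```python
# Assumed min-max normalization of average cost per message onto a 0-100 scale.
# Costs are taken from the completed runs in the table above (EUR per message).
costs = {"run-001": 0.0182, "run-002": 0.0098, "run-003": 0.0045, "run-004": 0.0156}

lo, hi = min(costs.values()), max(costs.values())
normalized = {run: (cost - lo) / (hi - lo) * 100 for run, cost in costs.items()}

for run, value in normalized.items():
    print(f"{run}: {value:.0f}")   # cheapest run -> 0, most expensive -> 100
```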
Run Status Over Time

How frequently are experiments running and succeeding?

[Chart: run counts over time by status (Completed, Failed, Running)]
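A minimal sketch of the aggregation such a chart implies: counting runs per day and status, using the dates and statuses from the table above. Daily granularity is an assumption.

```python
# Count runs per (day, status) pair to produce the time series behind the chart.
from collections import Counter

runs = [
    ("Nov 24", "Failed"), ("Nov 25", "Completed"), ("Nov 26", "Completed"),
    ("Nov 27", "Completed"), ("Nov 28", "Completed"), ("Nov 29", "Running"),
    ("Nov 29", "Pending"),
]

per_day = Counter(runs)  # e.g. ("Nov 29", "Running") -> 1
for (day, status), count in sorted(per_day.items()):
    print(f"{day}: {count} {status.lower()}")
```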