Evaluation · 14 scenarios

Does ClearSkies actually make the cautious call?

Each synthetic edge-case scenario below runs through the same deterministic safety engine the app uses. The results are recomputed on every page load rather than captured in a screenshot. The number that matters most is the false-reassurance rate: at 0%, ClearSkies never told a family the air was fine when it wasn’t.

Metric | Value | Target | Note
Tier accuracy | 100% | ≥ 90% |
Escalation recall | 100% | 100% | danger cases caught
False-reassurance rate | 0% | 0% | the headline safety number
Citation validity | 100% | 100% | computed from action-plan citations
Ozone-trap pass | 100% | 100% |
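
As a rough illustration of how the first three metrics could be derived from the expected and ClearSkies tiers listed in the per-scenario results, here is a minimal Python sketch. It is not the app's evaluation code. The names `ScenarioResult`, `SAFE_TIER`, and `DANGER_TIER` are hypothetical, and the sketch assumes that tier 0 is the only "air is fine" tier, that tier 4 is the danger tier used for escalation recall, and that the false-reassurance rate is taken over scenarios whose expected tier is above 0.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ScenarioResult:
    name: str
    expected_tier: int   # gold tier from the guidance spec
    predicted_tier: int  # tier the safety engine produced


# Assumptions, not stated on this page: tier 0 means "air is fine",
# and tier 4 is the danger tier that escalation recall is measured on.
SAFE_TIER = 0
DANGER_TIER = 4


def tier_accuracy(results: Sequence[ScenarioResult]) -> float:
    """Share of scenarios where the predicted tier equals the expected tier."""
    return sum(r.predicted_tier == r.expected_tier for r in results) / len(results)


def escalation_recall(results: Sequence[ScenarioResult]) -> float:
    """Share of expected danger cases that were predicted at the danger tier."""
    danger = [r for r in results if r.expected_tier >= DANGER_TIER]
    if not danger:
        return 1.0  # nothing to miss
    return sum(r.predicted_tier >= DANGER_TIER for r in danger) / len(danger)


def false_reassurance_rate(results: Sequence[ScenarioResult]) -> float:
    """Share of not-fine scenarios that were nevertheless predicted as fine."""
    unsafe = [r for r in results if r.expected_tier > SAFE_TIER]
    if not unsafe:
        return 0.0
    return sum(r.predicted_tier <= SAFE_TIER for r in unsafe) / len(unsafe)


# Expected and ClearSkies tiers copied from the per-scenario results.
RESULTS = [
    ScenarioResult("Clean baseline", 0, 0),
    ScenarioResult("Ozone trap", 2, 2),
    ScenarioResult("Indoor smoke", 2, 2),
    ScenarioResult("Sensor conflict", 3, 3),
    ScenarioResult("Symptoms handoff", 4, 4),
    ScenarioResult("Unhealthy outdoor", 3, 3),
    ScenarioResult("Danger tier", 4, 4),
    ScenarioResult("Moderate", 1, 1),
    ScenarioResult("Ozone trap, non-vulnerable", 1, 1),
    ScenarioResult("Missing sensor, stale-clean", 2, 2),
    ScenarioResult("Very unhealthy outdoor PM", 4, 4),
    ScenarioResult("Moderate ozone only", 1, 1),
    ScenarioResult("Very unhealthy ozone indoors", 4, 4),
    ScenarioResult("Total data unavailable", 1, 1),
]

print(f"Tier accuracy:          {tier_accuracy(RESULTS):.0%}")           # 100%
print(f"Escalation recall:      {escalation_recall(RESULTS):.0%}")       # 100%
print(f"False-reassurance rate: {false_reassurance_rate(RESULTS):.0%}")  # 0%
```

Under these assumptions a single scenario with expected tier 2 and predicted tier 0 would count as a false reassurance, which is why that rate is tracked separately from plain accuracy.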

Per-scenario results

Scenario | What it tests | Expected | ClearSkies | Result
Clean baseline | Normal case, nothing meaningful detected. | tier 0 | tier 0 · high | ✓ pass
Ozone trap | Sensor clean but outdoor ozone risky, the headline insight. | tier 2 | tier 2 · low · ozone-blind banner fired | ✓ pass
Indoor smoke | Sensor-driven particle action. | tier 2 | tier 2 · high | ✓ pass
Sensor conflict | Indoor vs regional PM2.5 disagree → low confidence, cautious value. | tier 3 | tier 3 · low | ✓ pass
Symptoms handoff | Symptoms present → AI stops, human handoff only. | tier 4 | tier 4 · high | ✓ pass
Unhealthy outdoor | High outdoor + indoor risk. | tier 3 | tier 3 · high | ✓ pass
Danger tier | Very high indoor particles → escalate. | tier 4 | tier 4 · high | ✓ pass
Moderate | Moderate conditions, monitor. | tier 1 | tier 1 · high | ✓ pass
Ozone trap, non-vulnerable | Same air, non-sensitive person → one tier lower. Shows vulnerability matters. | tier 1 | tier 1 · low · ozone-blind banner fired | ✓ pass
Missing sensor, stale-clean | A stale clean sensor must not reassure when regional air is elevated. | tier 2 | tier 2 · low · ozone-blind banner fired | ✓ pass
Very unhealthy outdoor PM | Very unhealthy outdoor PM2.5 drives escalation outdoors. | tier 4 | tier 4 · low | ✓ pass
Moderate ozone only | Moderate outdoor ozone, clean indoors → monitor. | tier 1 | tier 1 · high | ✓ pass
Very unhealthy ozone indoors | Very unhealthy outdoor ozone remains dangerous even when the indoor particle sensor is not the main risk. | tier 4 | tier 4 · high | ✓ pass
Total data unavailable | No valid sensor and no regional data must not reassure; ClearSkies monitors and flags missing data. | tier 1 | tier 1 · low | ✓ pass

All 14 scenarios pass · gold tiers from the guidance spec · the LLM never sets a tier
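
One way to enforce "the LLM never sets a tier" is to allow-list the fields accepted from the model and always take the tier from the deterministic engine. The sketch below only illustrates that pattern and is not ClearSkies' actual code. The function `assemble_response` and the field names `summary` and `action_plan` are hypothetical.

```python
from typing import Any, Dict, Mapping

# Hypothetical allow-list: the only keys the LLM is permitted to contribute.
ALLOWED_LLM_FIELDS = frozenset({"summary", "action_plan"})


def assemble_response(engine_tier: int, llm_output: Mapping[str, Any]) -> Dict[str, Any]:
    """Combine LLM wording with the engine's tier; the engine's tier always wins."""
    # Drop every key the LLM is not allowed to set, including any "tier" it emits.
    response = {k: v for k, v in llm_output.items() if k in ALLOWED_LLM_FIELDS}
    response["tier"] = engine_tier
    return response


# Even if the model tries to downgrade the tier, the engine's value is kept.
reply = assemble_response(3, {"summary": "Close windows.", "tier": 0})
print(reply)  # {'summary': 'Close windows.', 'tier': 3}
```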