PHOENIX Sicily — wildfire detection ledger

What this page is. A weekly audit of every fire signal PHOENIX produced for Sicily — algorithmic leads where our detector beat a comparator within its revisit window, co-detections with authoritative sources, fires we missed, and our own sole-reporter detections that 72 hours later either still have no corroborating evidence or have been refuted.

What it isn't. A leaderboard. PHOENIX is an experimental academic system; the authority for wildfire response in Sicily is the Vigili del Fuoco (emergency number 115). Every claim below is reproducible from the linked CSV.

Last 7 days — every event by outcome

📊 PHOENIX sub-detector precision (resolved subset)

Under the per-feed accuracy rule, every PHOENIX sub-detector gets the same Wilson 95% CI treatment as the external comparators. Resolved = confirmed (T1 + T2 + T3) + refuted (no evidence at T+72 h).
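
For readers who want to check the intervals by hand, here is a minimal sketch of the standard Wilson score interval applied to the resolved subset. The function name and arguments are ours for illustration and are not taken from the grading code; `confirmed` and `resolved` are the counts defined above.

```python
import math

def wilson_interval(confirmed: int, resolved: int, z: float = 1.959964) -> tuple:
    """Wilson score interval for confirmed / resolved (z = 1.959964 gives ~95%)."""
    if resolved == 0:
        # No resolved events yet: the interval carries no information.
        return (0.0, 1.0)
    p = confirmed / resolved
    denom = 1 + z ** 2 / resolved
    centre = (p + z ** 2 / (2 * resolved)) / denom
    half = z * math.sqrt(p * (1 - p) / resolved + z ** 2 / (4 * resolved ** 2)) / denom
    # Clamp against tiny floating-point overshoot at p = 0 or p = 1.
    return (max(0.0, centre - half), min(1.0, centre + half))
```

Unlike the naive p ± z·sqrt(p(1−p)/n) interval, the Wilson interval stays inside [0, 1] and does not collapse to zero width when a sub-detector has only a handful of resolved events, which is the usual situation in a single week.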

Tier definitions (T0 → T3)
Tier | Meaning
T0 | Sole reporter: no independent corroborator within 5 km / ±2 h. Most events sit here. Most are false positives or below comparator detection floors.
T1 | ≥1 independent satellite family corroborated within 5 km / ±2 h.
T2 | Vigili del Fuoco match (±24 h) or Italian news / Protezione Civile match.
T3 | Burn-scar verified (Sentinel-2 dNBR > biome threshold: 0.27 forest / 0.18 shrub / 0.12 grass) or Vigili del Fuoco + ≥2 satellite sources.
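
The tier table can be read as a highest-tier-wins rule. The sketch below is one way to express it, assuming the corroboration signals have already been matched to the event; the function name, its arguments, and the biome keys are illustrative and may not match scripts/grade_events.py.

```python
from typing import Optional

# Biome dNBR thresholds as listed in the tier table.
DNBR_THRESHOLDS = {"forest": 0.27, "shrub": 0.18, "grass": 0.12}

def assign_tier(sat_sources: int, vvf_match: bool, news_match: bool,
                dnbr: Optional[float] = None, biome: Optional[str] = None) -> str:
    """Return the highest tier (T0..T3) supported by the matched evidence.

    sat_sources: independent satellite corroborators within 5 km / +-2 h
    vvf_match:   Vigili del Fuoco record within +-24 h
    news_match:  Italian news or Protezione Civile match
    dnbr, biome: Sentinel-2 burn-scar measurement, if a clear scene exists
    """
    burn_scar = (dnbr is not None and biome in DNBR_THRESHOLDS
                 and dnbr > DNBR_THRESHOLDS[biome])
    if burn_scar or (vvf_match and sat_sources >= 2):
        return "T3"
    if vvf_match or news_match:
        return "T2"
    if sat_sources >= 1:
        return "T1"
    return "T0"
```

Checking T3 first means an event never reports a lower tier than its evidence supports, for example a Vigili del Fuoco match with two satellite sources grades T3 rather than T2.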

Race-strict = PHOENIX detects before the comparator AND the lead is less than 50% of the comparator's revisit period, a genuine algorithmic advantage. Likely geometric = the lead exceeds the revisit period (we won because their sensor hadn't passed yet, not because our algorithm was faster). Vigili del Fuoco, ANSA news, and similar human-dispatch / social sources do not qualify for race-strict: their "revisit" is reporting latency, not a sensor cadence. They corroborate truth (T2), but they don't race satellites.
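
A rough sketch of how a single lead could be labelled under these rules. The label strings, the `comparator_class` values, and the handling of the exact boundaries (a lead equal to 50% of revisit, or equal to revisit) are our assumptions; the published rules live in scripts/grade_events.py.

```python
from datetime import timedelta
from typing import Optional

def classify_lead(lead: timedelta, revisit: Optional[timedelta],
                  comparator_class: str) -> str:
    """Label a PHOENIX lead against one comparator.

    lead:             comparator detection time minus PHOENIX detection time
    revisit:          comparator's nominal revisit period (None for human sources)
    comparator_class: "satellite" or a human-dispatch class such as "vvf" / "news"
    """
    if lead <= timedelta(0):
        return "no-lead"               # comparator was first or simultaneous
    if comparator_class != "satellite" or revisit is None:
        return "first-vs-reporting"    # no sensor cadence, so no race-strict bar
    if lead < revisit * 0.5:
        return "race-strict"
    if lead <= revisit:
        return "race-marginal"         # within revisit, but >= 50% of it
    return "likely-geometric"          # their sensor simply had not passed yet
```

For a comparator with a 60-minute revisit, a 10-minute lead would be race-strict, a 45-minute lead race-marginal, and a 90-minute lead likely geometric.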

Two clocks: sensor-acquisition Δ (algorithm vs algorithm) and feed-delivered Δ (wall clock vs user). We always show both.
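
To make the two clocks concrete, here is a small sketch that computes both deltas for one PHOENIX/comparator pair. The `Detection` type and its field names are hypothetical, not the published schema; a positive delta means PHOENIX was earlier on that clock.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class Detection:
    acquired_at: datetime   # when the sensor observed the scene
    delivered_at: datetime  # when the detection became visible in the feed

def lead_deltas(phoenix: Detection, comparator: Detection) -> tuple:
    """Return (sensor-acquisition delta, feed-delivered delta)."""
    acquisition = comparator.acquired_at - phoenix.acquired_at
    delivered = comparator.delivered_at - phoenix.delivered_at
    return (acquisition, delivered)
```

The two values can disagree in sign: a comparator may acquire the scene first and still reach users later because of downlink and processing latency, which is why both are shown.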

Multi-stage reconcile: a preliminary pass at T+72 h, a second at T+14 d (by which time a post-fire Sentinel-2 pass is typically cloud-free), and a third at T+45 d for long-tail confirmation. Each pass can upgrade or downgrade an event. Cloud-occluded events are flagged as unverifiable, not refuted.
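
The schedule can be sketched as a list of offsets from detection time plus a status rule that keeps cloud-occluded events out of the refuted bucket. Stage names, function names, and status strings here are illustrative assumptions.

```python
from datetime import datetime, timedelta

# Reconcile passes, as offsets from the detection time.
RECONCILE_STAGES = [
    ("T+72h", timedelta(hours=72)),
    ("T+14d", timedelta(days=14)),
    ("T+45d", timedelta(days=45)),
]

def due_stages(detected_at: datetime, now: datetime) -> list:
    """Names of the reconcile passes that should already have run."""
    return [name for name, offset in RECONCILE_STAGES if now >= detected_at + offset]

def reconcile_status(evidence_found: bool, cloud_occluded: bool) -> str:
    """Status after one pass; a later pass may upgrade or downgrade it."""
    if evidence_found:
        return "confirmed"
    if cloud_occluded:
        return "unverifiable"   # we could not look, so we do not call it refuted
    return "refuted"
```

Because each pass recomputes the status from the evidence available at that time, an event marked unverifiable at T+72 h can still become confirmed or refuted once a clear scene arrives.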

🚒 Confirmed fires in Sicily this week (authoritative sources)

Union of every fire that Vigili del Fuoco, NASA FIRMS, EUMETSAT, Sentinel-3 SLSTR, or other authoritative sources reported. These are the ground-truth events for the week; PHOENIX's contribution to each (co-detected / missed) is shown per row.

✅ PHOENIX-first algorithmic leads (corroborated)

Events where PHOENIX detected before a corroborator AND the event was confirmed (≥T1, not refuted at T+72h). Two-tier badge: RACE-STRICT means PHOENIX's lead was <50% of the comparator's revisit window — algorithmic advantage clearly exceeds orbital geometry. Race-marginal* means PHOENIX still detected first, but by a margin comparable to the comparator's poll cadence — within revisit, real first-detection, but the strict bootstrap test does not yet separate it from chance. Both stay listed as wins; the asterisk explains the methodology nuance.

Asterisk notes (methodology nuance, not retraction):
Race-marginal* — PHOENIX detected before the satellite comparator, but the lead was ≥50% of the comparator's nominal revisit cadence. The null-distribution bootstrap on race-strict (lead <50% revisit) yields p=1.0 — the strict subset is statistically indistinguishable from chance under random comparator-time shuffling. These events are real first-detections; the asterisk flags that the algorithmic margin is small relative to comparator poll noise.
First vs VVF* / First vs news* — PHOENIX produced the detection before the Vigili del Fuoco dispatch report (or news article) for the same fire. Human-dispatch sources don't have a sensor-cadence revisit, so the satellite race-strict bar doesn't apply. The lead is still a real algorithm-vs-reporting advantage. PHOENIX-first wins require ≥1 external corroborator (sat / VVF / news / burn-scar); cross-PHOENIX-family corroboration counts as internal consistency, not a "win", and is moved to the co-detected section.
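
The shuffle test mentioned in the Race-marginal note can be illustrated with a generic permutation test. This is a sketch of the general technique only, under our own assumptions: times are minutes on a shared clock, a single revisit period applies to every pair, comparator times are shuffled across events, and the statistic is the count of race-strict leads. The actual null construction in the grading code may differ.

```python
import random

def strict_count(leads_min, revisit_min):
    """Number of leads that are positive and below 50% of the revisit period."""
    return sum(1 for lead in leads_min if 0 < lead < 0.5 * revisit_min)

def shuffle_p_value(phoenix_min, comparator_min, revisit_min, n_iter=2000, seed=0):
    """One-sided permutation p-value for the observed race-strict count.

    phoenix_min / comparator_min: paired detection times in minutes.
    The null hypothesis breaks the pairing by shuffling comparator times.
    """
    rng = random.Random(seed)
    observed = strict_count(
        [c - p for p, c in zip(phoenix_min, comparator_min)], revisit_min)
    shuffled = list(comparator_min)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(shuffled)
        null = strict_count(
            [c - p for p, c in zip(phoenix_min, shuffled)], revisit_min)
        if null >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_iter + 1)
```

Under this construction a p-value of exactly 1.0 means every shuffled arrangement produced at least as many race-strict leads as the real pairing did, which always happens when the observed race-strict count is zero.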

🤝 Co-detected with comparator

Fires confirmed by ≥1 independent comparator family within 5 km / ±2 h where PHOENIX did not race-win. Comparator may have led; we co-detected. Still real fires, fully credited.
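
The 5 km / ±2 h window is a joint space-and-time test. A minimal sketch follows, assuming great-circle (haversine) distance on a spherical Earth and timezone-consistent timestamps; the `Signal` type and function names are hypothetical, and the grading code may measure distance differently.

```python
import math
from datetime import datetime, timedelta
from typing import NamedTuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

class Signal(NamedTuple):
    lat: float       # degrees
    lon: float       # degrees
    time: datetime

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def corroborates(a: Signal, b: Signal, max_km: float = 5.0,
                 max_dt: timedelta = timedelta(hours=2)) -> bool:
    """True if b falls within max_km and +-max_dt of a."""
    close = haversine_km(a.lat, a.lon, b.lat, b.lon) <= max_km
    timely = abs(a.time - b.time) <= max_dt
    return close and timely
```

The same check, run from the comparator's side, is what decides whether a fire lands in the co-detected list or in the missed list.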

👏 Caught by others, missed by PHOENIX

Real fires detected by NASA, EUMETSAT, Copernicus, Vigili del Fuoco, or other comparators that PHOENIX did NOT independently flag within 5 km / ±2 h. These are honest coverage gaps, and full credit goes to the teams that caught them. Each row notes whether PHOENIX even had a detector running at the time.

❌ Refuted at T+72h (no corroborating evidence — likely false positives)

Sole-reporter PHOENIX detections where 72 hours have passed and no Vigili del Fuoco, news, or Sentinel-2 burn-scar evidence emerged. These are likely false positives — they bound our precision. Published openly because hiding FPs is the worst pattern.

🟡 Unconfirmed PHOENIX leads (T0, awaiting T+72h reconcile)

Sole-reporter PHOENIX detections younger than 72 h. Most will resolve to refuted based on the historical tail.

⚠️ Unverifiable (cloud-blocked or no Sentinel-2 scene)

Sole-reporter PHOENIX detections where no clear Sentinel-2 pass was available in the reconcile window. These are not refuted; we just can't verify them. They are counted separately and excluded from the precision denominator.

📉 Below comparator detection floor

Sole-reporter PHOENIX detections where the event's FRP is below the physical detection floor of every comparator that could have seen it. A comparator literally couldn't have caught these — counted separately, not against precision.
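
How the buckets feed the precision figure can be sketched as follows: only confirmed and refuted events enter the denominator, while unverifiable, below-floor, and still-pending events are tallied but left out. The outcome strings are illustrative labels, not necessarily the values used in the published CSV.

```python
from collections import Counter

def precision(outcomes):
    """Return (precision, resolved_count) for a list of outcome labels.

    Only "confirmed" and "refuted" count as resolved. Any other label
    (for example "unverifiable", "below_floor", "pending") is ignored here.
    """
    counts = Counter(outcomes)
    confirmed = counts["confirmed"]
    resolved = confirmed + counts["refuted"]
    if resolved == 0:
        return (None, 0)   # nothing resolved yet, so precision is undefined
    return (confirmed / resolved, resolved)
```

Returning the resolved count alongside the ratio keeps the sample size visible, so a precision computed from three events is not mistaken for one computed from three hundred.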

Methodology & reproducibility. Grading code: github.com/markl02us/persistent-thermal-sources-sicily (MIT/CC-BY 4.0). Schema: /api/event_grades · CSV: phoenix_event_grades.csv.
Reproduce these grades yourself. Daily raw-input snapshots (CSV) are published at /data/snapshots/. Each contains internal_fires.csv, external_fires.csv, corroboration_signals.csv, the published event_grades.csv, and SHA256SUMS. Run scripts/regrade.py (in the repo) against the raw inputs and diff the result against the published grades; the diff should show zero mismatches.
Comparator revisit cadences, biome dNBR thresholds, comparator-class rules, and the strict-race definition (lead < 50% of comparator revisit) are documented in scripts/grade_events.py. Found a mismatch? Open an issue or email [email protected].
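
Before regrading, it is worth checking that a downloaded snapshot is intact and having a way to compare two grade files. The sketch below assumes SHA256SUMS uses the usual `sha256sum` layout (hex digest, whitespace, file name) and that grade rows carry a unique key column, here called `event_id`; both are assumptions on our part, so check them against the real files. It does not invoke scripts/regrade.py, whose command-line options are documented in the repo.

```python
import csv
import hashlib
from pathlib import Path

def verify_sha256sums(snapshot_dir: Path) -> list:
    """Return the names of files whose SHA-256 digest does not match SHA256SUMS."""
    mismatched = []
    for line in (snapshot_dir / "SHA256SUMS").read_text().splitlines():
        if not line.strip():
            continue
        digest, name = line.split(maxsplit=1)
        name = name.strip().lstrip("*")  # sha256sum prefixes binary-mode names with '*'
        actual = hashlib.sha256((snapshot_dir / name).read_bytes()).hexdigest()
        if actual != digest.lower():
            mismatched.append(name)
    return mismatched

def load_rows(path: Path, key: str) -> dict:
    """Read a CSV into a dict of rows keyed by the given column."""
    with path.open(newline="", encoding="utf-8") as fh:
        return {row[key]: row for row in csv.DictReader(fh)}

def diff_grades(published: Path, regraded: Path, key: str = "event_id") -> list:
    """Return sorted keys whose rows differ or exist in only one file."""
    a, b = load_rows(published, key), load_rows(regraded, key)
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
```

An empty list from both functions is the expected result for a clean snapshot and a faithful regrade; any key returned by `diff_grades` is a row worth reporting.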