Manual reviewers from the FP-catalog confirmed these events as real fires or agricultural burns; PHOENIX had them at low tier without confirmation.
1a5995b4304496a2: Manual reviewer confirmed this as agricultural burn, PHOENIX had it at tier T0 with outcome (none). Reviewer wave v1.2.1, 0.86 km from cluster centroid.
c08fb03a14dba285: Manual reviewer confirmed this as agricultural burn, PHOENIX had it at tier T0 with outcome (none). Reviewer wave v1.2.1, 1.92 km from cluster centroid.
d62cd5731e904495: Manual reviewer confirmed this as agricultural burn, PHOENIX had it at tier T0 with outcome (none). Reviewer wave v1.2.1, 1.37 km from cluster centroid.
be98c0e04c005aaa: Manual reviewer confirmed this as agricultural burn, PHOENIX had it at tier T0 with outcome (none). Reviewer wave v1.2.1, 3.37 km from cluster centroid.
1fd44aad3dc127d7: Manual reviewer confirmed this as real fire, PHOENIX had it at tier T0 with outcome (none). Reviewer wave v1.2.1, 0.93 km from cluster centroid.
... and 582 more. Full list at /api/auto_change_log?type=catalog_corrects_phoenix.
N-series adaptive clustering (shadow mode) believes PHOENIX missed these. No live alert change until the 30-60 day promotion gate passes.
3d16710fa180096b: N-series shadow clustering promotes from T0 to T2. Corroborating families: phoenix_thermal, viirs. Shadow promotion is observational; no live alert change until the 30-60 day promotion gate passes.
9f854d7b0b7993e0: N-series shadow clustering promotes from T0 to T2. Corroborating families: phoenix_thermal, viirs. Shadow promotion is observational; no live alert change until the 30-60 day promotion gate passes.
80eb84b7d9f77b05: N-series shadow clustering promotes from T0 to T2. Corroborating families: phoenix_thermal, viirs. Shadow promotion is observational; no live alert change until the 30-60 day promotion gate passes.
332dd0380884e92f: N-series shadow clustering promotes from T0 to T2. Corroborating families: phoenix_thermal, viirs. Shadow promotion is observational; no live alert change until the 30-60 day promotion gate passes.
251659d406bc356c: N-series shadow clustering promotes from T0 to T2. Corroborating families: phoenix_thermal, viirs. Shadow promotion is observational; no live alert change until the 30-60 day promotion gate passes.
... and 929 more. Full list at /api/auto_change_log?type=shadow_upgrades.
What is new. A new high-confidence detection class voted_alpha is now emitted whenever ≥2 independent sensor families coincide within 5 km and 30 min. Sensor families are: firms (VIIRS+MODIS active-fire), slstr_frp, mtg_af_l2, seviri_subpixel, seviri_dozier, fci_subpixel, wind_diff, tropomi, sentinel1_sar, sentinel2_swir, vigili_fuoco. Single-sensor detections continue to flow under their existing tags for telemetry / debug; only the voter-confirmed events are tagged voted_alpha.
Why. Per-sensor precision is moderate; coincidence across independent physics (thermal MIR vs SWIR vs SAR change vs chemical NO2/HCHO anomaly vs human report) should, in theory, push precision on the ≥2-family path close to 100%. This is the same conformal-singleton-gate pattern as the gold-vault drone classifier, applied to wildfire confirmation. FIRMS counts as a voter, but a cluster opened by FIRMS alone is held back until an independent voter arrives (FIRMS-as-judge rule), so PHOENIX does not merely re-emit FIRMS as its own high-confidence call.
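The coincidence rule can be sketched roughly as follows. This is an illustration only, not the code in src/voter/coincidence.py: the `Detection` record, the helper names, and the great-circle distance are assumptions. Only the 5 km / 30 min window and the ≥2-family threshold come from the entry above.

```python
import math
from dataclasses import dataclass

MAX_KM = 5.0        # spatial window stated in the entry
MAX_MINUTES = 30.0  # temporal window stated in the entry
MIN_FAMILIES = 2    # voted_alpha needs >= 2 independent families


@dataclass(frozen=True)
class Detection:  # hypothetical record, for illustration only
    family: str   # e.g. "firms", "wind_diff", "sentinel2_swir"
    lat: float
    lon: float
    t_min: float  # observation time, minutes since an arbitrary epoch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a spherical Earth (radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def voters_for(anchor, detections):
    """Return the sorted sensor families that coincide with `anchor`."""
    families = {anchor.family}
    for d in detections:
        close = haversine_km(anchor.lat, anchor.lon, d.lat, d.lon) <= MAX_KM
        timely = abs(d.t_min - anchor.t_min) <= MAX_MINUTES
        if close and timely:
            families.add(d.family)
    return sorted(families)


def is_voted_alpha(anchor, detections):
    # Distinct families are counted, so a cluster opened by FIRMS alone
    # (even with several FIRMS hits) stays below the threshold.
    return len(voters_for(anchor, detections)) >= MIN_FAMILIES
```

Counting distinct families rather than raw hits is what keeps repeated hits from one family from confirming themselves in this sketch.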
Backfill. A 7-day historical pass over internal_fires + external_fires produced 18 voted events out of 18,893 raw detections (0.10% of detections promoted, 1.24% of spatial-temporal clusters promoted). Voted pairs were dominated by (firms, wind_diff), (seviri_subpixel, wind_diff) and (fci_subpixel, wind_diff), which is the expected outcome - wind_diff is high-recall low-precision and is exactly the source the voter is designed to corroborate.
Operational impact. Read-only against the existing detector tables. New SQLite table wildfire_voted_events in ground_truth.sqlite. New endpoints /api/voted_events and /api/voter_status. /api/detections rows now carry voter_count, voted_alpha, voter_list, voter_event_id fields. Runs out-of-process as phoenix-voter.timer every 5 min in phoenix.slice, load_guard-wrapped, MemoryMax=1G, Nice=10. Per-environment parameter dict at src/voter/coincidence.py:VOTER_PARAMS (sicily_mediterranean_summer active; add new environments as scope expands).
Honest limitation. The voter does NOT validate detector correctness - if two independent sensors share the same systematic FP source (e.g. both confused by a glasshouse-roof glint), the voter will confirm the FP. The FP-catalog mask zones in /falsi-positivi are the orthogonal safety net.
Per the project's transparency commitment, every algorithm / threshold / mask / weight change that affects detection or reporting is logged here with the rationale. When public PHOENIX data turns out to be incorrect, a retraction is filed and linked from the retraction policy.
Change: src/detectors/dozier_v1.py now applies the canonical Dozier (1981) MIR−LST anomaly as a strict pre-gate before the bi-spectral solve: a candidate is kept only if BT_MIR − LST_now ≥ 12 K (per-sensor / per-environment configurable). LST is supplied per-pixel from the LSA-SAF MTG-LST product via the existing src/data_sources/lst_resample.py sidecar resampler that subpixel_v1 already uses, so dozier_v1 and subpixel_v1 now share the same LST truth. Detections record lst_now_k, lst_gate_status (passed / rejected / bypassed), lst_delta_threshold_k, and mir_minus_lst_k for full audit trail.
Why: The DEFENSE-O audit (2026-06-10) found that the file named dozier_v1.py implemented only the bi-spectral solver and consumed no LST at all, in direct contradiction with the public-facing “Dozier MIR-LST gate” description on this site. The reviewer's reading of the name was right; the code was wrong. Rather than rename the file (a cosmetic fix), we wired LST in so the detector now actually implements the namesake Dozier (1981) gate.
Honest caveats: (1) Legacy callers that do not pass lst_now still run dozier_v1 with the LST gate bypassed; those detections are tagged lst_gate=bypassed so downstream consumers can discount them. (2) Pixels with cloud-masked or invalid LST are skipped silently — without truth we cannot honor the namesake gate, so the detector stays silent rather than emit an un-verified detection. (3) The 12 K default is conservative (stricter than subpixel_v1’s relaxed 8 K) and is environment-keyed per the OLDGABE per-environment-tuning rule.
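A minimal sketch of the pre-gate logic described above, under stated assumptions: the function name `lst_gate`, the use of None for "caller did not pass LST", and NaN for "cloud-masked or invalid LST" are illustrative choices, not the dozier_v1.py implementation. The audit field names and the 12 K default are taken from the entry.

```python
import math

DEFAULT_LST_DELTA_K = 12.0  # default from the entry; configurable per environment


def lst_gate(bt_mir_k, lst_now_k=None, threshold_k=DEFAULT_LST_DELTA_K):
    """Return an audit record for one candidate, or None if LST is invalid.

    lst_now_k=None models a legacy caller that passes no LST (gate bypassed).
    A NaN lst_now_k models a cloud-masked / invalid LST pixel (candidate skipped).
    """
    if lst_now_k is None:
        return {
            "lst_now_k": None,
            "lst_gate_status": "bypassed",
            "lst_delta_threshold_k": threshold_k,
            "mir_minus_lst_k": None,
        }
    if math.isnan(lst_now_k):
        return None  # no LST truth, so the detector stays silent
    delta = bt_mir_k - lst_now_k
    return {
        "lst_now_k": lst_now_k,
        "lst_gate_status": "passed" if delta >= threshold_k else "rejected",
        "lst_delta_threshold_k": threshold_k,
        "mir_minus_lst_k": delta,
    }
```

In this sketch only candidates with status "passed" (or "bypassed" for legacy callers) would continue to the bi-spectral solve.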
Audit report: Internal at phoenix_mir_wooster_audit/DEFENSE_R_DOZIER_LST_INTEGRATION_2026_06_10.md. Code backup at src/detectors/dozier_v1.py.bak-pre-r-lst-integration-2026_06_10.
The src/data_sources/hawkes_ignition.py module fits a spatio-temporal
Hawkes process on FIRMS history to produce a 24h ignition-probability grid for Sicily.
An adaptive hook in _hawkes_threshold_adjustment was configured to lower
the per-pixel ml_accept threshold by 0.05 (equivalent to a ~1 K MIR-delta
reduction) whenever prob_24h > 0.5 in the candidate's cell.
An internal audit (DEFENSE-Q, 2026-06-10) found the Hawkes parameters (alpha self-excitation magnitude, r=2 km spatial kernel, tau=36 h temporal decay) and the 0.5 trigger probability were never validated against a held-out FIRMS truth set. No reliability backtest, no precision/recall sweep, no calibration curve.
Action shipped 2026-06-10:
IS_CALIBRATED = False added.
_hawkes_threshold_adjustment now early-returns 0.0 when the flag is False, so the threshold reduction is currently a no-op.
/api/ignition_prior endpoint still reports the Hawkes value (for transparency) but annotates each response with is_calibrated: false and a calibration_note instructing downstream consumers to ignore the value.
The flag will flip to True only after a reliability backtest harness
lands and the calibration curve is published on this page.
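The guard can be pictured with the sketch below. It is an assumption-laden illustration rather than the code in hawkes_ignition.py: the function signatures, the `prob_24h` response key, and the wording of the note are hypothetical. The 0.5 trigger, the 0.05 reduction, and the is_calibrated / calibration_note annotations come from the entry.

```python
IS_CALIBRATED = False  # flips to True only after a published reliability backtest

TRIGGER_PROB = 0.5          # prob_24h trigger from the entry
THRESHOLD_REDUCTION = 0.05  # ml_accept reduction from the entry


def hawkes_threshold_adjustment(prob_24h, is_calibrated=IS_CALIBRATED):
    """Amount to subtract from the ml_accept threshold for one candidate."""
    if not is_calibrated:
        return 0.0  # uncalibrated prior: the hook is a no-op
    return THRESHOLD_REDUCTION if prob_24h > TRIGGER_PROB else 0.0


def ignition_prior_response(prob_24h, is_calibrated=IS_CALIBRATED):
    """Hypothetical shape of an annotated /api/ignition_prior payload."""
    body = {"prob_24h": prob_24h, "is_calibrated": is_calibrated}
    if not is_calibrated:
        body["calibration_note"] = (
            "uncalibrated; downstream consumers should ignore prob_24h"
        )
    return body
```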
Updated 2026-06-10 post-G8 bimodal finding. The original 2026-06-09 entry framed the PHOENIX-vs-FIRMS residual as a single ∼7 km NNE systematic offset and proposed a static Δlat / Δlon calibration behind an env flag. The G8 audit (2026-06-09, fifth and final falsification attempt) showed that framing was incorrect. This corrected entry supersedes it.
What G8 actually found: the n = 9 PHOENIX/FIRMS matched pairs in
the Sicily 14-day window are not 9 independent samples. They are
5 adjacent PHOENIX pixels clustered around ONE FIRMS hit at
(37.09, 14.37) with bearings 24–68° (NE/ENE), and
3 adjacent PHOENIX pixels clustered around ANOTHER FIRMS hit at
(37.66, 12.74) with bearings 313–344° (NNW). The effective
independent-fire count is n_independent ≈ 2. The original
"mean bearing 16.7° NNE" is the circular mean of two opposing
directional clusters, not a coherent systematic offset. The original
Rayleigh p = 0.002 is inflated by pixel-cluster
pseudoreplication and should not be cited as evidence of a unimodal bias.
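To illustrate why a pooled circular mean can mislead here, the sketch below uses synthetic bearings that merely span the two reported ranges (they are not the audited values). The pooled mean falls between the two clusters, in a direction where no pair actually points.

```python
import math


def circular_mean_deg(bearings_deg):
    """Circular mean of compass bearings, returned in [0, 360)."""
    s = sum(math.sin(math.radians(b)) for b in bearings_deg)
    c = sum(math.cos(math.radians(b)) for b in bearings_deg)
    return math.degrees(math.atan2(s, c)) % 360.0


def resultant_length(bearings_deg):
    """Mean resultant length R in [0, 1]; values near 1 mean tight agreement."""
    n = len(bearings_deg)
    s = sum(math.sin(math.radians(b)) for b in bearings_deg)
    c = sum(math.cos(math.radians(b)) for b in bearings_deg)
    return math.hypot(s, c) / n


# Synthetic bearings spanning the two reported ranges (assumed, not measured).
ne_cluster = [24, 35, 46, 57, 68]   # around the first FIRMS hit
nnw_cluster = [313, 328, 344]       # around the second FIRMS hit
pooled = ne_cluster + nnw_cluster

pooled_mean = circular_mean_deg(pooled)
ne_mean = circular_mean_deg(ne_cluster)
nnw_mean = circular_mean_deg(nnw_cluster)
```

With these inputs the pooled mean lies outside both the 24–68° and the 313–344° ranges, and the pooled resultant length is lower than that of the NE/ENE cluster alone, which is the signature of two directional groups rather than one coherent offset.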
Why this matters: a static Δlat / Δlon calibration can only help one cluster while hurting the other. Applying +0.060° / +0.020° (the originally-proposed offset) would improve the NE/ENE cluster's residual but worsen the NNW cluster's residual by an equal amount. There is no single shift that fits both. The right framing is per-detection uncertainty zone, not bias correction.
G1 calibration rollback: the previously-shipped env-flag drop-in
PHOENIX_FCI_SUBPIXEL_GEO_OFFSET=1 (proposed Δlat = +0.060°,
Δlon = +0.020°) has been rolled back. The drop-in
file has been renamed to .disabled-pre-g8-bimodal-finding-20260610
and is no longer loaded by the service. No static offset is in effect.
Five consecutive falsifications — pivot from root-cause to mitigation: the underlying source of the residual has now failed every hypothesised mechanism we could test:
DEFAULT_SICILY_CHUNKS too narrow): widening
to (29..36) did not move the median residual. FALSIFIED.
Mitigation shipped instead of calibration: PHOENIX will publish a
per-detection ~7 km circular uncertainty radius on /event/,
/accuracy/, and the public map, and will stop claiming pixel-level
localization for subpixel_v1. The existing additive credit-extras
scorer (commit 9e70a24) and /api/detections_phoenix_anchored
endpoint are kept — they were always the right read-side fix and are not
affected by the rollback.
What is still open (not yet falsified): the H2 hypothesis from G8 — that L2 FCI-AF and L1c-derived PHOENIX subpixel_v1 are detecting different physical fires (PHOENIX seeing smaller / earlier events that fall below L2's threshold) — is unresolved. It will be re-tested when the 30-day backfill provides ≥ 5 fires detected by BOTH L2 and PHOENIX within ±10 min / ±5 km.
Humble framing (unchanged): the Sicily 14-day window contained 10 raw PHOENIX detections vs 16,611 FIRMS hits. FIRMS / VIIRS (NOAA20 / NOAA21 / SNPP) / MODIS / SLSTR / OroraTech / Vigili del Fuoco continue to be the workhorses and remain authoritative for operational dispatch. PHOENIX is an experimental prototype; it does not replace them, and this post-G8 correction reinforces — not weakens — that posture.
Audit reports: Internal at
phoenix_mir_wooster_audit/G5_ACCUMULATOR_PROJECTION_PORT_2026_06_09.md,
G6_CHUNK_WIDEN_PLUS_PARALLAX_2026_06_09.md,
G7_WIND_DRIFT_HYPOTHESIS_2026_06_09.md,
G8_SATPY_NAV_AND_FIRMS_ERROR_2026_06_09.md. The original (now-superseded)
bias entry remains in the retraction policy for
full provenance.
What was wrong: An internal audit found that
_wooster_frp_mw in src/detectors/subpixel_v1.py
was using FRP = σ · A_pix · (T_fire⁸ − T_bg⁸)
with σ = 1.89e-19. This formula does not appear in any
published Wooster paper (2003, 2005, 2015) nor in the NOAA Enterprise FDC
ATBD or SLSTR Fire ATBD. The 2026-05-25 "fix" (see
/retraction-policy) patched only the constant
magnitude (1.89e-7 → 1.89e-19); the wrong T⁸ exponent was
never corrected, and the K⁻⁸ unit string is not physical.
Operational impact: Limited. The live /api/detections
endpoint at the time of audit returned zero rows tagged
subpixel_v1_alpha — all currently-served FRP values come
from wind_diff and fci_l1c proxies (which are also
non-canonical but are not the subject of this audit). Historical
subpixel_v1_alpha rows written between 2026-05-25 and 2026-06-05
used the wrong-formula function; magnitudes were within an order of magnitude
of canonical for the typical 295–315 K BT operating range but diverged
by 20× on synthetic high-temperature inputs.
Fix: Replaced the function body with the canonical Wooster
pixel-integrated Stefan-Boltzmann form
FRP = A_pix · σ_SB · (T_fire⁴ − T_bg⁴)
with σ_SB = 5.670374419 × 10⁻⁸
W·m⁻²·K⁻⁴ (Wooster 2003 eq. 4 /
Wooster 2005 JGR Atmos 10.1029/2005JD006318 eq. 15). Shipped seven
pytest unit tests in tests/test_wooster_frp.py
covering the canonical synthetic-fire input, null/degenerate cases, and
Stefan-Boltzmann linearity in both pixel area and temperature. All seven
pass. Pre-patch source preserved at
src/detectors/subpixel_v1.py.bak-wooster-fix-2026-06-05.
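For readers who want to check the arithmetic, the pixel-integrated form can be written as the short sketch below. The function name `frp_mw`, the argument names, and the choice to return None for missing or degenerate inputs are assumptions for illustration; the formula and the constant are the ones stated above.

```python
SIGMA_SB = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4


def frp_mw(pixel_area_m2, t_fire_k, t_bg_k):
    """Pixel-integrated FRP in MW, or None for missing/degenerate inputs."""
    values = (pixel_area_m2, t_fire_k, t_bg_k)
    if any(v is None for v in values):
        return None
    if pixel_area_m2 <= 0 or t_fire_k <= t_bg_k:
        return None  # assumed handling: no positive fire signal to report
    watts = pixel_area_m2 * SIGMA_SB * (t_fire_k ** 4 - t_bg_k ** 4)
    return watts / 1e6  # W -> MW
```

As a worked case, a 1 km² pixel at 310 K against a 300 K background gives about 64.4 MW, and doubling the pixel area doubles the result.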
Honest caveats: The pixel-integrated retrieval does NOT solve
for sub-pixel fire area; for that PHOENIX needs a bi-spectral Dozier-Wooster
inversion (planned: src/detectors/dozier_v1.py). The
pixel-integrated form under-predicts true FRP whenever a hot sub-pixel patch
averages with cool background; readers comparing PHOENIX FRP to MTG-AF-L2 or
SLSTR-FRP retrievals should expect a low bias for small sub-pixel fires.
Audit report: Internal at phoenix_mir_wooster_audit/AUDIT_2026_06_05.md.
Shipped the output of a 6-round bake-off: a 3-model ensemble (v5_derived 8-channel
physics features, v4_focal with focal loss, stunet_fused_honest) combined by
mean-of-sigmoids and isotonic-calibrated against labelled training data. New
shadow columns on event_grades: ensemble_score,
ensemble_score_computed_at, ensemble_calibrated_score.
Scoring runs every 6 h via the laptop shadow pipeline. No live alert, tier,
or broadcast change yet — a 30-day shadow-observation window runs through
2026-06-29 before any promotion to authoritative. Per-round honest metrics are
documented at /methodology. Public methodology
page added the same day.
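The combination step can be sketched as below. This assumes each model emits a raw logit and that the fitted isotonic curve is available as a list of monotone (raw, calibrated) knots; the function names and the linear interpolation between knots are illustrative assumptions, not the shipped pipeline.

```python
import math
from bisect import bisect_right


def sigmoid(logit):
    return 1.0 / (1.0 + math.exp(-logit))


def ensemble_score(logits):
    """Mean of per-model sigmoid probabilities (mean-of-sigmoids)."""
    return sum(sigmoid(z) for z in logits) / len(logits)


def calibrate(score, knots):
    """Piecewise-linear lookup through monotone (raw, calibrated) knots."""
    xs = [x for x, _ in knots]
    ys = [y for _, y in knots]
    if score <= xs[0]:
        return ys[0]
    if score >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, score)  # first knot strictly to the right of score
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (score - x0) / (x1 - x0)
```

In this picture ensemble_score would correspond to the raw output of `ensemble_score` and ensemble_calibrated_score to the output of `calibrate`, with the knots coming from the isotonic fit.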
Added src/verifiers/sentinel2_active_fire.py — a SWIR
saturation detector (Murphy 2016 / Cicala-Genovese 2019) that pulls
Sentinel-2 L2A scenes within ±6 h of recent PHOENIX detections and looks
for B12 SWIR-2 saturation signatures with cloud-mask + B11 / B8A ratio
filters. Hits are written to external_fires with source
s2_active_fire. The reconciler now scans this source as an
independent corroboration path (confirmed_s2_active_fire
outcome). 3 h cadence, runs as a gunicorn daemon thread alongside the
existing comparators.
Added src/verifiers/score_on_arrival.py — a DGX-side daemon
that polls the SEVIRI baseline-frames directory every 5 min and immediately
scores recent PHOENIX detections with the full ensemble when a new frame
arrives. CPU torch on DGX (0.1 s per pass), cuts up to ~6 h of artificial
latency vs the laptop 6 h shadow-pipeline cadence. Ensemble weights mirrored
to DGX at ensemble_weights/.
Added sole_reporter_alert column on event_grades.
Flag is set to active when (1) ensemble calibrated probability
≥ 0.80, (2) zero corroborators within the matching window, and (3) the
pixel falls in a wildland-urban or urban WUI class. The tier is exposed
via /api/sole_reporter. Each entry includes a defense_url
pointing to a per-event evidence page and explicit disclaimers stating
PHOENIX is a research project, not an operational alerting system.
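The three conditions combine as a plain conjunction, as in the sketch below. The function signature and the land-class labels are hypothetical; only the 0.80 threshold and the three conditions come from the entry.

```python
MIN_CALIBRATED_PROB = 0.80                  # condition (1) from the entry
WUI_CLASSES = {"wildland_urban", "urban"}   # hypothetical class labels


def sole_reporter_alert(calibrated_prob, corroborator_count, land_class):
    """True only when all three sole-reporter conditions hold."""
    if calibrated_prob is None:
        return False  # unscored events can never raise the flag
    return (
        calibrated_prob >= MIN_CALIBRATED_PROB
        and corroborator_count == 0
        and land_class in WUI_CLASSES
    )
```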
/event/<event_key>
New public page showing the full evidence trail for any event PHOENIX
has graded: all model scores (transformer, raw ensemble, calibrated
ensemble — none aggregated away), N1 adaptive-clustering tier, N3
SAR-silence refutation status, the corroborator presence flags (VVF,
news, burn-scar, SAR change, LST anomaly), the raw external_fires
rows within ±72 h / ±7 km of the cluster centroid, race-strict timing
context, calibration version + sample size, and the t72h / t14d / t45d
outcome chain with raw evidence JSONs. Every claim on the rest of the
site has a clickable trail through to the row that supports it.
/lead-time
The lead-time page was rewritten to remove all competitive-claim
language. Previous draft framed PHOENIX as "earlier than" FIRMS / MODIS
/ VIIRS / VVF / ANSA. That framing was wrong. The free operational
feeds are the indispensable ground truth that makes PHOENIX possible — we
have nothing to verify against without them. The page now opens with a
red warning banner enumerating what we are NOT claiming, frames every
metric as observation rather than comparison, and gates any
minute-level race-strict claim behind n > 100 measurements. Every
number on the page links through to /event/<key>
for defensibility.
Added daemon_heartbeat.py on the laptop — queries live DGX
state for the latest update of each of the five 2026-05-29-shipped daemons
(transformer_inference, ensemble_inference, sole_reporter_alert,
sentinel2_active_fire, score_on_arrival) and emails Mark on per-daemon
silence-threshold breach. Per-daemon thresholds account for the inherently
sparse Sentinel-2 5-day revisit (72 h threshold) vs the 5-min poll cadence
of score-on-arrival (1 h threshold).
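A rough sketch of the per-daemon silence check follows. The 72 h and 1 h thresholds are the two stated above; the default threshold for the other daemons, the function name, and the input shape are assumptions, and the email step is omitted.

```python
from datetime import datetime, timedelta, timezone

SILENCE_THRESHOLDS = {
    "sentinel2_active_fire": timedelta(hours=72),  # sparse 5-day revisit
    "score_on_arrival": timedelta(hours=1),        # 5-min poll cadence
}
DEFAULT_THRESHOLD = timedelta(hours=12)  # assumption: not stated in the entry


def silent_daemons(last_update, now):
    """Return daemons whose latest update is older than their threshold.

    `last_update` maps daemon name -> timezone-aware datetime, or None if
    the daemon has never reported.
    """
    breached = []
    for name, seen in sorted(last_update.items()):
        limit = SILENCE_THRESHOLDS.get(name, DEFAULT_THRESHOLD)
        if seen is None or now - seen > limit:
            breached.append(name)
    return breached
```

For example, a Sentinel-2 verifier last seen 48 h ago is still within its threshold, while score-on-arrival last seen 2 h ago would be reported.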
s2_active_fire source
Patched scripts/grade_events.py to add a new
_scan_s2_active_fire function that surfaces the
confirmed_s2_active_fire outcome when an S2 SWIR-saturation
hit lands within radius/time window. Previously the new s2_active_fire
rows were written to external_fires but ignored by the
reconciler (it scanned only VVF / news / DPC / SAR / LST sources).
sentinel2_burnscar.py
The burn-scar verifier source file on DGX had ~2.7 KB of trailing
null bytes (from a partial write at some prior point), causing Python to
raise SyntaxError: source code string cannot contain null bytes
when other modules tried to import its helpers. The valid file content
ends at byte 22,341; the file was truncated to that length and
recompiles cleanly. Backup at
sentinel2_burnscar.py.bak-nullbytes-2026-05-29.
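The repair amounts to dropping a trailing run of NUL bytes and confirming the remainder compiles. The helper below is a hypothetical sketch of that check, not the command that was actually run; it refuses to act if NUL bytes also appear inside the content.

```python
def strip_trailing_nulls(data: bytes) -> bytes:
    """Drop a trailing run of NUL bytes; refuse if NULs remain inside."""
    cleaned = data.rstrip(b"\x00")
    if b"\x00" in cleaned:
        raise ValueError("embedded NUL bytes; not a simple trailing-padding case")
    return cleaned


def compiles_cleanly(source: bytes) -> bool:
    """True if the bytes decode as UTF-8 and parse as Python source."""
    try:
        compile(source.decode("utf-8"), "<repaired>", "exec")
    except (SyntaxError, ValueError):
        return False
    return True
```

A backup of the damaged file should be kept before writing the cleaned bytes back, as was done here.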
The Wooster-2005 MIR-FRP coefficient in src/detectors/subpixel_v1.py
was off by 12 orders of magnitude (1.89e-7 vs the correct 1.89e-19). Some
rows in /api/detections showed FRP values up to 3.9 trillion MW —
physically impossible. Constant fixed; 2 catastrophic historical rows
nulled; service restarted. New detections produce 15–60 MW for typical
hot pixels (physical range).
The t72h / t14d / t45d reconciliation path in scripts/grade_events.py
previously consulted only Vigili del Fuoco reports, Italian news, and
Sentinel-2 burn scars. Added Sentinel-1 SAR backscatter change (all-weather,
3–12 day revisit) and MOD11 LST anomaly (contemporary thermal confirmation).
Re-reconciled 917 t72h + 234 t14d + 74 t45d events; produced 13 additional
confirmed_sar_change outcomes that were previously unverified.
The S2 burn-scar verifier in src/verifiers/sentinel2_burnscar.py
previously rejected any post-fire scene with greater than 60% cloud cover.
Raised to 80% (windowed dNBR survives partial cloud) and added a Sentinel-1
SAR-change fallback when S2 is unavailable. Mediterranean cloud cover is
the dominant gap in current burn-verifier coverage.
Added src/frp_gates.py (permissive defaults pending data window) and
scripts/calibrate_confidence.py (pure-Python PAV isotonic regression
per source, joins detections to reconciled outcomes by 5 km / 6 h proximity).
Initial pass: 1 of 3 sources had sufficient samples to fit a curve. Both
scaffolds are NOT yet wired into the live scoring path — wiring follows
~48 h of corrected-FRP data and additional reconciled outcomes.
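For reference, a pure-Python pool-adjacent-violators (PAV) fit looks roughly like the sketch below. It is a generic textbook version under the assumption of unweighted samples, not the contents of scripts/calibrate_confidence.py, and it does not include the 5 km / 6 h join that produces the (score, outcome) pairs.

```python
def pav_isotonic(scores, outcomes):
    """Pool-adjacent-violators fit of a non-decreasing calibration curve.

    Returns (sorted_scores, fitted), where fitted[i] is the calibrated
    probability assigned to sorted_scores[i].
    """
    pairs = sorted(zip(scores, outcomes))
    # Each block is [sum of outcomes, count]; its mean is the fitted value.
    blocks = []
    for _, y in pairs:
        blocks.append([float(y), 1])
        # Merge backwards while the previous block's mean exceeds the last one.
        while (
            len(blocks) > 1
            and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fitted = []
    for s, n in blocks:
        fitted.extend([s / n] * n)
    return [x for x, _ in pairs], fitted
```

With scores 0.1, 0.2, 0.3, 0.4 and outcomes 0, 1, 0, 1, the middle two samples violate monotonicity and are pooled, giving fitted values 0.0, 0.5, 0.5, 1.0.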
Centralized engineering material in a single version-controlled
location restricted to the two-person team. The false-positive catalog at
markl02us/persistent-thermal-sources-sicily and the live system at
adr-wildfire.com remain in their existing public locations.
Code-level changes (commit-by-commit) are published at the public FP-catalog
repository markl02us/persistent-thermal-sources-sicily. The full
engineering repo is private (academic, non-commercial); we publish redacted
change summaries here when behaviour affecting public claims is altered.