Methodology
How we turn scattered public auction records, official price lists and statistical bulletins into a comparable €/m³ index — and what that index does and doesn't tell you.
Where the data comes from
Every workday, the KORENA Timber Index pipeline visits a curated set of public sources — government auction portals, state-forestry price lists, sectoral statistical bulletins — via Foura.ai, our HTTP fetcher. Foura is the only client allowed to make external network calls; everything else runs locally, deterministically, on the bytes Foura returned.
We then parse those bytes (HTML or PDF) using source-specific providers. Each provider returns a flat list of ParsedPrice rows: price, currency, unit, sale date, species label, region label, transport term, sale type. Parsing is covered by ~280 golden fixtures so format drift fails loudly in CI rather than silently in production.
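As a rough illustration of the row shape described above, here is a minimal Python sketch. The field names and types are assumptions derived from the list in the text, and the sample values are made up; the real ParsedPrice definition lives in the repository and may differ.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class ParsedPrice:
    """One raw row as emitted by a provider, before any normalization."""
    price: Decimal        # amount exactly as quoted by the source
    currency: str         # e.g. "RON", "BGN", "EUR"
    unit: str             # e.g. "m3", "Fm", "Rm"
    sale_date: date       # date of the sale, not of the scrape
    species_label: str    # label as written by the source
    region_label: str     # label as written by the source
    transport_term: str   # e.g. "standing", "roadside", "ex-works"
    sale_type: str        # e.g. "auction", "price-list"


# Hypothetical sample row, for illustration only.
row = ParsedPrice(
    price=Decimal("412.50"), currency="RON", unit="m3",
    sale_date=date(2024, 3, 1), species_label="spruce",
    region_label="Suceava", transport_term="standing", sale_type="auction",
)
```

Keeping the row flat and immutable matches the idea that providers only parse: nothing in this shape carries a converted or derived value.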
The full source list — including each source’s attribution terms, license, expected freshness, and last successful scrape — is on the Sources page.
Normalization pipeline
Comparing a Romanian standing-tree auction in RON/m³ inclusive of VAT to a German roadside log in €/Fm ex VAT requires several deterministic conversions. We do them in this order, on every row, before it ever lands in the public index:
- VAT → ex-VAT. If a source quotes prices inclusive of VAT, we strip the local VAT rate (RO 19%, DE 19%, BG 20%, …) from price_original.
- Local currency → EUR. Using ECB daily reference rates (tpi.fx_rates, refreshed at 04:00 UTC), keyed on the actual sale date, not the scrape date.
- Unit basis → €/m³. We accept eight common unit forms (m³, Fm, Rm, ster, MBF, m, ha, piece) and convert via published wood-trade ratios. Conversions that depend on species (e.g. tree-volume estimates) use a species-specific factor.
- Transport term → delivered. Standing / roadside / ex-works are uplifted to a delivered-equivalent using transport multipliers — small, conservative, documented per country.
Each provider parses, but never normalizes — normalization is a pure function (toEurPerM3ExVat) covered by ~280 golden fixtures. Changing any constant (FX, unit ratio, transport uplift) requires a deliberate snapshot commit.
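The four steps can be sketched as one pure function. This is an illustrative Python analogue of toEurPerM3ExVat, not the production code: the VAT rates are the ones quoted above, while the FX rate, the unit ratios, the transport multipliers, and the function signature are all placeholder assumptions.

```python
from datetime import date

# VAT rates as quoted in the text; every other table below is a placeholder.
VAT_RATE = {"RO": 0.19, "DE": 0.19, "BG": 0.20}
# Hypothetical: units of local currency per 1 EUR, keyed by (currency, sale date).
FX_PER_EUR = {("RON", date(2024, 3, 1)): 5.0, ("EUR", date(2024, 3, 1)): 1.0}
# Hypothetical: solid cubic metres represented by one quoted unit.
M3_PER_UNIT = {"m3": 1.0, "Fm": 1.0, "Rm": 0.7}
# Hypothetical delivered-equivalent multipliers per (country, transport term).
TRANSPORT_UPLIFT = {("RO", "standing"): 1.10, ("RO", "delivered"): 1.00}


def to_eur_per_m3_ex_vat(price, currency, unit, sale_date,
                         country, transport_term, vat_inclusive):
    """Apply the conversions in the documented order; no I/O, no state."""
    if vat_inclusive:
        price = price / (1 + VAT_RATE[country])            # 1. VAT -> ex-VAT
    price = price / FX_PER_EUR[(currency, sale_date)]      # 2. local currency -> EUR
    price = price / M3_PER_UNIT[unit]                      # 3. unit basis -> per m3
    return price * TRANSPORT_UPLIFT[(country, transport_term)]  # 4. -> delivered


# 595 RON/m3 incl. VAT, standing, Romania: 595 / 1.19 / 5.0 / 1.0 * 1.10
example = to_eur_per_m3_ex_vat(595.0, "RON", "m3", date(2024, 3, 1),
                               "RO", "standing", vat_inclusive=True)
```

Because the function reads only its arguments and constant tables, the same input always gives the same output, which is what makes snapshot-style fixtures practical.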
Aggregation: the public index
The publicly-served numbers are not raw deals. They are statistical aggregates over a 30-day rolling window, computed per (species, region, product stage) cell as:
- P10 — 10th-percentile price
- Median — 50th-percentile (the headline)
- P90 — 90th-percentile price
- n_observations — how many normalized rows fed this cell
- n_sources — how many distinct public sources contributed
- avg_confidence — mean of per-row provenance scores (0–100)
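A minimal sketch of how one cell could be aggregated, assuming the rows are already normalized and already filtered to the 30-day window. The percentile method (linear interpolation between closest ranks) and the dictionary keys are assumptions; the text does not say which method the worker uses.

```python
from statistics import mean


def percentile(sorted_values, p):
    """Linear-interpolation percentile on an ascending list (p in 0..100)."""
    if not sorted_values:
        raise ValueError("percentile of an empty cell is undefined")
    rank = (len(sorted_values) - 1) * p / 100
    lo = int(rank)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (rank - lo)


def aggregate_cell(rows):
    """rows: dicts with hypothetical keys eur_per_m3, source, confidence."""
    prices = sorted(r["eur_per_m3"] for r in rows)
    return {
        "p10": percentile(prices, 10),
        "median": percentile(prices, 50),
        "p90": percentile(prices, 90),
        "n_observations": len(rows),
        "n_sources": len({r["source"] for r in rows}),
        "avg_confidence": mean(r["confidence"] for r in rows),
    }


# Made-up rows for one (species, region, stage) cell.
cell = aggregate_cell([
    {"eur_per_m3": 80.0, "source": "a", "confidence": 60},
    {"eur_per_m3": 90.0, "source": "a", "confidence": 70},
    {"eur_per_m3": 100.0, "source": "b", "confidence": 80},
    {"eur_per_m3": 110.0, "source": "b", "confidence": 90},
    {"eur_per_m3": 120.0, "source": "b", "confidence": 100},
])
```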
Two windows: 30-day median vs. latest available
The default view is a 30-day median index: ideal for high-frequency sources (state-forestry auctions, daily price lists) where a rolling month gives a stable, current band. But some sources publish slowly — annual standing-wood indices, quarterly market reports, sparse auction archives — and would never appear in a 30-day window.
The “latest available” view solves that: instead of a fixed 30-day cutoff, each (species, region, stage) cell shows its most-recent batch, so an annual index surfaces with n_observations = 1 rather than disappearing. Each row carries a freshness indicator derived from how long ago the underlying sale was: fresh (within ~45 days), recent (within ~7 months), or pinned to the actual sale date (“as of 2024-03-01”) when older. Toggle between the two windows above the table on the home page.
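The freshness indicator can be sketched as a simple age bucketing. The thresholds below are assumptions: the text gives approximate values ("~45 days", "~7 months"), and this sketch reads seven months as 210 days.

```python
from datetime import date

FRESH_DAYS = 45     # "within ~45 days"
RECENT_DAYS = 210   # assumption: "~7 months" taken as 7 x 30 days


def freshness_label(latest_sale, today):
    """Bucket a cell by the age of its most recent underlying sale."""
    age_days = (today - latest_sale).days
    if age_days <= FRESH_DAYS:
        return "fresh"
    if age_days <= RECENT_DAYS:
        return "recent"
    # Older rows are pinned to the actual sale date.
    return f"as of {latest_sale.isoformat()}"
```

For example, with today set to 2025-01-01, a sale on 2024-12-01 is labelled fresh, one on 2024-08-01 is labelled recent, and one on 2024-03-01 is shown as "as of 2024-03-01".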
Public-index gate
To publish a cell, the worker enforces a minimum-evidence gate:
- Production rule (long-term): ≥ 5 observations and ≥ 2 distinct sources in the 30-day window.
- V1 relaxation (current): ≥ 5 observations only. We launched with 4 of 11 sources publicly contributing — the second-source requirement is temporarily off so the most relevant cells (Bulgarian and Romanian state-forestry auctions) can be served while Phase-2 sources go through QA.
The n_sources column is therefore often 1 today. The second-source gate reactivates once additional Phase-2 sources are flipped (you’ll see the column climb across more cells without us having to redeploy).
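Both variants of the gate fit in one predicate. The function name and the require_second_source flag are hypothetical; only the thresholds (5 observations, 2 sources) come from the rules above.

```python
MIN_OBSERVATIONS = 5
MIN_SOURCES = 2


def passes_public_gate(n_observations, n_sources, require_second_source=False):
    """Return True if a cell has enough evidence to be published.

    require_second_source=False models the current V1 relaxation;
    True models the long-term production rule.
    """
    if n_observations < MIN_OBSERVATIONS:
        return False
    if require_second_source and n_sources < MIN_SOURCES:
        return False
    return True
```

A cell with 7 observations from a single source passes under the V1 relaxation and would be withheld once the second-source requirement is switched back on.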
Confidence score
Each normalized row carries an integer 0–100 confidence score that reflects three things:
- Source quality — official state-forestry auctions outrank market-report summaries.
- Field completeness — rows with explicit species + region + transport term + sale date score higher than ones where we had to infer.
- Normalization fragility — rows that survived currency + unit + VAT + transport conversion without dictionary fallbacks score higher than rows where a dictionary defaulted.
The aggregate’s avg_confidence is the simple mean of the contributing rows. The 4-pip indicator (●●●○) is just round(score / 25) filled pips.
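The pip mapping can be written out as follows. One detail is an assumption: ties are rounded half up here, because Python's built-in round() rounds halves to the nearest even number and the text does not specify a tie rule.

```python
import math


def pips(score, total=4):
    """Render a 0-100 score as filled and empty pips."""
    filled = math.floor(score / 25 + 0.5)    # round half up (assumed tie rule)
    filled = max(0, min(total, filled))      # clamp to the available pips
    return "●" * filled + "○" * (total - filled)
```

Under this rule a score of 75 renders as ●●●○, and an average of 37.5 renders as ●●○○.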
Limitations — what this index is not
- Not a live spot quote. Cells aggregate sales over 30 days. If you need today’s offer for a specific piece, the index gives you a credible reference band — not a quote.
- Regional, not parcel-level. We aggregate to region, not to a specific forest district or sawmill. Two cells with the same median can sit on very different standing inventories.
- Public sources only. Confidential trade data never enters the public index. Where a country only publishes standing-tree auctions, the sawlog and roadside cells will look sparser — that’s honest, not a bug.
- FX timing. Conversions use the ECB rate for the sale date. Volatile-currency periods will pull aggregates in ways that have nothing to do with timber supply.
- Coverage is uneven. Romania publishes much more than Bulgaria; Germany’s public picture is fragmented across 16 Länder. The index reflects whatever the public record contains, weighted by what we could parse.
Refresh + freshness
Sources are scraped on their own schedules (most: daily; some: weekly or monthly). The hub revalidates hourly via Next.js ISR, and the API responses carry Cache-Control: s-maxage=3600, stale-while-revalidate=7200 headers. So a freshly-flipped source can appear within an hour of the worker running, with no deploy required.
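To make the header values concrete, the sketch below classifies a cached API response by its age, following the usual meaning of s-maxage and stale-while-revalidate in shared caches. The helper names are hypothetical, and the parser only handles the simple name=seconds directives used here.

```python
HEADER = "s-maxage=3600, stale-while-revalidate=7200"


def parse_cache_control(header):
    """Parse 'name=seconds' directives into a dict; valueless ones map to None."""
    directives = {}
    for part in header.split(","):
        name, _, value = part.strip().partition("=")
        directives[name.lower()] = int(value) if value.isdigit() else None
    return directives


def cache_state(age_seconds, header=HEADER):
    """Classify a cached response as fresh, stale-while-revalidate, or expired."""
    d = parse_cache_control(header)
    fresh_for = d.get("s-maxage") or 0
    swr = d.get("stale-while-revalidate") or 0
    if age_seconds < fresh_for:
        return "fresh"                      # served straight from the cache
    if age_seconds < fresh_for + swr:
        return "stale-while-revalidate"     # served stale, refreshed in background
    return "expired"                        # must be refetched before serving
```

With these values a shared cache serves a response as fresh for the first hour, may keep serving it for up to two more hours while it revalidates in the background, and must refetch after that.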
Each row carries a latest_sale timestamp; the source itself is marked is_stale if its expected freshness window has elapsed without an update.
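A source-level staleness check could look like the sketch below. The argument names are assumptions; the text only says that a source is flagged once its expected freshness window has passed without an update.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_update, expected_freshness, now):
    """True once the expected freshness window has elapsed without an update."""
    return now - last_update > expected_freshness


# Hypothetical example: a daily source last updated three days ago.
now = datetime(2025, 1, 4, 6, 0, tzinfo=timezone.utc)
daily_source_stale = is_stale(
    last_update=datetime(2025, 1, 1, 6, 0, tzinfo=timezone.utc),
    expected_freshness=timedelta(days=1),
    now=now,
)
```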
License + attribution
You may republish derived numbers from this index with attribution to korena.eu/timber-index and a link back. Where a source’s license requires per-source attribution (e.g. some official price lists), the source-drawer on a row will tell you which attribution text to use.
The full technical specification — schema, queue topology, provider lifecycle, normalization details — is in the timber-index repository on GitLab.