Where the data comes from

Every workday, the KORENA Timber Index pipeline visits a curated set of public sources — government auction portals, state-forestry price lists, sectoral statistical bulletins — via Foura.ai, our HTTP fetcher. Foura is the only client allowed to make external network calls; everything else runs locally, deterministically, on the bytes Foura returned.

We then parse those bytes (HTML or PDF) using source-specific providers. Each provider returns a flat list of ParsedPrice rows: price, currency, unit, sale date, species label, region label, transport term, sale type. Parsing is covered by ~280 golden fixtures so format drift fails loudly in CI rather than silently in production.
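As an illustration of that row shape, here is a minimal sketch. The field names, the string values, and the sample row are assumptions made for this example; only the list of attributes comes from the description above.

```typescript
// Hypothetical shape of one parsed row. Field names are illustrative,
// not the schema used by the real providers.
type TransportTerm = "standing" | "roadside" | "ex_works" | "delivered";

interface ParsedPrice {
  price: number;          // as quoted by the source, before any normalization
  currency: string;       // ISO 4217 code, e.g. "RON"
  unit: string;           // unit as published, e.g. "m³" or "Fm"
  saleDate: string;       // ISO date of the sale, not of the scrape
  speciesLabel: string;   // the source's own species wording
  regionLabel: string;    // the source's own region wording
  transportTerm: TransportTerm;
  saleType: string;       // assumed values such as "auction" or "price_list"
}

// A provider returns a flat list of rows and performs no conversions.
// The row below is made-up sample data.
function parseExample(): ParsedPrice[] {
  return [
    {
      price: 595,
      currency: "RON",
      unit: "m³",
      saleDate: "2024-03-01",
      speciesLabel: "beech",
      regionLabel: "example region",
      transportTerm: "standing",
      saleType: "auction",
    },
  ];
}
```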

The full source list — including each source’s attribution terms, license, expected freshness, and last successful scrape — is on the Sources page.

Normalization pipeline

Comparing a Romanian standing-tree auction in RON/m³ inclusive of VAT to a German roadside log in €/Fm ex VAT requires several deterministic conversions. We do them in this order, on every row, before it ever lands in the public index:

  1. VAT → ex-VAT. If a source quotes prices inclusive of VAT, we divide price_original by (1 + the local VAT rate) to get the ex-VAT price (RO 19%, DE 19%, BG 20%, …).
  2. Local currency → EUR. We convert using ECB daily reference rates (tpi.fx_rates, refreshed at 04:00 UTC), keyed on the actual sale date rather than the scrape date.
  3. Unit basis → €/m³. We accept eight common unit forms (m³, Fm, Rm, ster, MBF, m, ha, piece) and convert via published wood-trade ratios. Conversions that depend on species (e.g. tree-volume estimates) use a species-specific factor.
  4. Transport term → delivered. Standing, roadside, and ex-works prices are uplifted to a delivered-equivalent using transport multipliers, which are small, conservative, and documented per country.

Each provider parses, but never normalizes — normalization is a pure function (toEurPerM3ExVat) covered by ~280 golden fixtures. Changing any constant (FX, unit ratio, transport uplift) requires a deliberate snapshot commit.
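The four steps can be sketched as a single pure function. This is a simplified stand-in for toEurPerM3ExVat, not its actual implementation: the function name, the types, and the FX rate, unit ratio, and transport multiplier in the usage example are all placeholders. Only the three VAT rates are taken from step 1.

```typescript
// VAT rates quoted in step 1. Everything else in this sketch is hypothetical.
const VAT_RATE: Record<string, number> = { RO: 0.19, DE: 0.19, BG: 0.2 };

interface RawRow {
  priceOriginal: number;
  vatInclusive: boolean;
  country: string;
}

// Lookup results are passed in explicitly so the function stays pure:
// same inputs, same output, no network or database access.
interface NormalizationContext {
  eurPerLocalUnit: number;      // FX rate for the sale date (EUR per 1 local unit)
  m3PerSourceUnit: number;      // solid m³ represented by one source unit
  transportMultiplier: number;  // uplift to delivered-equivalent (1 = none)
}

function toEurPerM3ExVatSketch(row: RawRow, ctx: NormalizationContext): number {
  const vat = VAT_RATE[row.country];
  if (vat === undefined) {
    throw new Error("no VAT rate configured for " + row.country);
  }
  // 1. VAT → ex-VAT
  const exVat = row.vatInclusive ? row.priceOriginal / (1 + vat) : row.priceOriginal;
  // 2. Local currency → EUR
  const eur = exVat * ctx.eurPerLocalUnit;
  // 3. Unit basis → €/m³
  const eurPerM3 = eur / ctx.m3PerSourceUnit;
  // 4. Transport term → delivered
  return eurPerM3 * ctx.transportMultiplier;
}

// Usage with placeholder constants: 595 RON incl. 19% VAT is 500 RON ex VAT,
// which at an assumed 0.2 EUR/RON and an assumed 1.1 uplift is about 110 €/m³.
const exampleEurPerM3 = toEurPerM3ExVatSketch(
  { priceOriginal: 595, vatInclusive: true, country: "RO" },
  { eurPerLocalUnit: 0.2, m3PerSourceUnit: 1, transportMultiplier: 1.1 },
);
console.log(exampleEurPerM3.toFixed(2));
```

Passing the rates in as arguments, rather than looking them up inside the function, is one way to keep the function deterministic enough to snapshot against golden fixtures.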

Aggregation: the public index

The publicly served numbers are not raw deals. They are statistical aggregates over a 30-day rolling window, computed per (species, region, product stage) cell as:

Two windows: 30-day median vs. latest available

The default view is a 30-day median index: ideal for high-frequency sources (state-forestry auctions, daily price lists) where a rolling month gives a stable, current band. But some sources publish slowly — annual standing-wood indices, quarterly market reports, sparse auction archives — and would never appear in a 30-day window.
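A 30-day median per cell can be sketched as follows. The row type, the cell key format, and the function names are assumptions for illustration; the production aggregate may compute more than a median and a count.

```typescript
// Hypothetical normalized row: one price already converted to €/m³.
interface NormalizedRow {
  species: string;
  region: string;
  stage: string;
  saleDate: string; // ISO date, e.g. "2024-03-01"
  eurPerM3: number;
}

interface CellAggregate {
  median: number;
  nObservations: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of an empty list");
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  // Indices are in range here because the list is non-empty.
  return sorted.length % 2 === 1
    ? sorted[mid]!
    : (sorted[mid - 1]! + sorted[mid]!) / 2;
}

// Groups rows by (species, region, stage) and keeps only sales that fall
// inside the window ending at `asOf`.
function rollingMedians(
  rows: NormalizedRow[],
  asOf: Date,
  windowDays: number = 30,
): Map<string, CellAggregate> {
  const end = asOf.getTime();
  const start = end - windowDays * DAY_MS;
  const pricesByCell = new Map<string, number[]>();
  rows.forEach((row) => {
    const t = Date.parse(row.saleDate);
    if (t < start || t > end) return; // outside the rolling window
    const key = row.species + "|" + row.region + "|" + row.stage;
    const bucket = pricesByCell.get(key) ?? [];
    bucket.push(row.eurPerM3);
    pricesByCell.set(key, bucket);
  });
  const result = new Map<string, CellAggregate>();
  pricesByCell.forEach((prices, key) => {
    result.set(key, { median: median(prices), nObservations: prices.length });
  });
  return result;
}
```

In this sketch a cell whose only sale is older than the window produces no entry at all, which is the gap the "latest available" view addresses.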

The “latest available” view solves that: instead of a fixed 30-day cutoff, each (species, region, stage) cell shows its most recent batch, so an annual index surfaces with n_observations = 1 rather than disappearing. Each row carries a freshness indicator derived from the age of the underlying sale: fresh (within ~45 days), recent (within ~7 months), or pinned to the actual sale date (“as of 2024-03-01”) when older. Toggle between the two windows above the table on the home page.
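The freshness indicator can be sketched as a small classifier. The thresholds here are assumptions: the text gives approximate bounds, so this example uses exactly 45 days and approximates 7 months as 210 days. The type and function names are hypothetical.

```typescript
type Freshness =
  | { kind: "fresh" }
  | { kind: "recent" }
  | { kind: "pinned"; label: string };

const DAY_MS = 24 * 60 * 60 * 1000;
const FRESH_DAYS = 45;   // "within ~45 days"
const RECENT_DAYS = 210; // "within ~7 months", approximated as 7 × 30 days

// Classifies a row by the age of its underlying sale, not its scrape time.
function freshness(latestSale: string, now: Date): Freshness {
  const ageDays = (now.getTime() - Date.parse(latestSale)) / DAY_MS;
  if (ageDays <= FRESH_DAYS) return { kind: "fresh" };
  if (ageDays <= RECENT_DAYS) return { kind: "recent" };
  // Older rows are pinned to their actual sale date.
  return { kind: "pinned", label: "as of " + latestSale };
}
```

For example, with `now` set to 2025-01-01, a sale dated 2024-03-01 is about 306 days old and would be pinned with the label "as of 2024-03-01".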

Public-index gate

To publish a cell, the worker enforces a minimum-evidence gate:

The n_sources column is therefore often 1 today. The second-source gate reactivates once additional Phase-2 sources are flipped (you’ll see the column climb across more cells without us having to redeploy).

Confidence score

Each normalized row carries an integer 0–100 confidence score that reflects three things:

The aggregate’s avg_confidence is the simple mean of the contributing rows’ scores. The 4-pip indicator (●●●○) fills round(score / 25) of its four pips.
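Both calculations are short enough to sketch directly. The function names are hypothetical, and clamping to the 0 to 4 range is a defensive assumption for out-of-range input rather than documented behavior.

```typescript
// Simple mean of the contributing rows' confidence scores.
function avgConfidence(scores: number[]): number {
  if (scores.length === 0) throw new Error("no contributing rows");
  return scores.reduce((sum, s) => sum + s, 0) / scores.length;
}

// Renders a 0–100 score as four pips, filling round(score / 25) of them.
function pips(score: number): string {
  const filled = Math.min(4, Math.max(0, Math.round(score / 25)));
  return "●".repeat(filled) + "○".repeat(4 - filled);
}

console.log(pips(75)); // three of four pips filled: ●●●○
```

Because scores are integers, score / 25 never lands exactly on a half, so the rounding direction at ties does not come into play.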

Limitations — what this index is not

Refresh + freshness

Sources are scraped on their own schedules (most daily, some weekly or monthly). The hub revalidates hourly via Next.js ISR, and API responses carry the header Cache-Control: s-maxage=3600, stale-while-revalidate=7200. A freshly flipped source can therefore appear within an hour of the worker running, with no deploy required.

Each row carries a latest_sale timestamp; the source itself is marked is_stale if its expected freshness window has elapsed without an update.
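The staleness check can be sketched as a comparison between a source's last update and its expected freshness window. The field names and the choice of days as the window unit are assumptions for this example.

```typescript
// Hypothetical per-source status record.
interface SourceStatus {
  lastUpdate: string;            // ISO timestamp of the last successful update
  expectedFreshnessDays: number; // e.g. 1 for daily, 31 for monthly sources
}

const DAY_MS = 24 * 60 * 60 * 1000;

// A source is stale once its expected window has elapsed without an update.
function isStale(source: SourceStatus, now: Date): boolean {
  const ageMs = now.getTime() - Date.parse(source.lastUpdate);
  return ageMs > source.expectedFreshnessDays * DAY_MS;
}
```

Under these assumptions a daily source last updated two days ago is stale, while a monthly source with the same last update is not.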

License + attribution

You may republish derived numbers from this index with attribution to korena.eu/timber-index and a link back. Where a source’s license requires per-source attribution (e.g. some official price lists), the source-drawer on a row will tell you which attribution text to use.

The full technical specification — schema, queue topology, provider lifecycle, normalization details — is in the timber-index repository on GitLab.