Seller Metrics Engine

YAML-declared metrics and dimensions, executed as a planner/fetcher/joiner/aggregator pipeline over rpt_* report tables and dynamic_attributes_values. Surfaced to the LLM via metrics-engine-local-claude-code (and prod) MCP servers.

Where things live

Path	Purpose
`mcp_server/seller_metrics_engine/definitions/metrics.yaml`	Every metric — source/formula/grain/category
`mcp_server/seller_metrics_engine/definitions/dimensions.yaml`	Every dimension — table_column_map, optional classification or enricher binding
`mcp_server/seller_metrics_engine/definitions/__init__.py`	YAML loader — loaded once at import
`mcp_server/seller_metrics_engine/tools/query.py`	MCP tool entrypoints (`query_metrics`, `compare_periods`)
`app/services/seller_metrics_engine/engine/`	Resolver, planner, joiner, aggregator, executor
`app/services/seller_metrics_engine/engine/enricher/`	Stage-typed enricher framework — see Enrichers
`app/services/seller_metrics_engine/strategies/`	`report_table.py` (rpt_* fetch) + `dynamic_attribute.py` (EAV pivot)

Pipeline at a glance

Every stage transition triggers run_enrichers_at(stage, ctx) — see Enrichers for hooks.

Required-metric validation

A universal pipeline step (between post_join enrichers and the internal-dim collapse). It scans resolved.all_metrics for any with required: true and checks the joined DataFrame for null rows in the metric's column. If any are present, it raises RequiredMetricMissingError naming the affected entities (ASIN / SKU / parent ASIN — whichever ID column the DF carries).

The step runs at source-row grain — before the collapse step drops the asin column — so the error can identify which specific entities are missing data.

# In the executor pipeline, after post_join enrichers:
self._validate_required_metrics(ctx.df, resolved)

The MCP layer maps the exception to a structured required_metric_missing error code with metric + missing_entities fields, so the LLM can tell the user precisely what needs to be configured (COGS uploaded for those ASINs, MB commission configured, etc.) instead of returning a silent null cascade through margin formulas.

To make a metric required, add required: true to its YAML entry. No registration step — the validator is part of the engine, not an enricher.

Defining a metric

Two flavors:

L1 — sourced directly from a column

- key: pnl_sales
  display_name: 'PNL: Sales'
  description: Gross sales totals from SKU Economics.
  sources:
  - alias: pnl_sales
    table: rpt_pnl_sku_economics
    column: sales
  metric_refs: null
  formula: null
  supported_granularities: [weekly, monthly, quarterly]
  aggregation_rule: SUM
  required: true
  format: $
  category: PNL
  hierarchy_level: L1

L1 — SQL expression (computed per source row)

For per-row arithmetic that must aggregate as SUM(a*b) rather than SUM(a)*SUM(b):

sources:
- alias: pnl_landed_cogs_amt
  table: rpt_pnl_sku_economics
  expr: cost_of_goods_sold_per_unit * units_sold

Engine emits SUM(cost_of_goods_sold_per_unit * units_sold) in the SELECT, so per-row arithmetic is preserved before aggregation. YAML is trusted — same blast radius as Python formula: strings.

L2/L3 — computed from other metrics

- key: pnl_pre_ad_margin
  sources: null
  metric_refs: [pnl_sales, pnl_landed_cogs_amt, pnl_referral_fee_amt, pnl_fba_fee, pnl_mb_commission_amt]
  formula: pnl_sales - pnl_landed_cogs_amt - pnl_referral_fee_amt - pnl_fba_fee - pnl_mb_commission_amt
  aggregation_rule: Recalculate

Recalculate means: SUM the source components first during aggregation, then re-evaluate the formula. Use it for any ratio/total whose pieces must be summed before division (ACoS, margin %, etc.).

Computed by enricher (no formula, no source)

- key: pnl_mb_commission_amt
  sources: null
  metric_refs: [pnl_sales]      # forces rpt_pnl_sku_economics fetch
  formula: null
  enricher: mb_commission        # ← Python-implemented
  aggregation_rule: SUM
  required: true

See Enrichers for how this is wired.

Field reference

Field	Purpose
`key`	Unique identifier; what the LLM passes in `metrics: [...]`
`display_name`, `description`	LLM-facing
`sources`	List of `{alias, table, column}` or `{alias, table, expr}`
`metric_refs`	Transitive deps — pulled into the plan even if not user-requested
`formula`	Python expression evaluated on the polars DF (column refs by alias)
`enricher`	Name of a registered Python enricher that fills this metric
`aggregation_rule`	`SUM` / `Average` / `Recalculate` / null
`required`	If true, the engine raises `RequiredMetricMissingError` on null source data — see Required-metric validation
`supported_granularities`	Which `granularity` query values are valid
`format`	`$` / `%` / `int` — output formatting
`category`, `hierarchy_level`	Browsing/grouping in `list_metrics`

Defining a dimension

- key: sku
  display_name: SKU
  description: Seller-assigned SKU. Column name varies per report — engine resolves the right one for each table.
  table_column_map:
  - {table: rpt_pnl_sku_economics, column: msku}
  - {table: rpt_restock_inventory, column: msku}
  - {table: rpt_br_detail_page_sales_traffic, column: sku}
  - {table: rpt_all_listings, column: seller_sku}
  - {table: rpt_sponsored_products_advertised_product, column: advertised_sku}
  data_type: string
  category: Product

One dim, N tables. The planner picks the right column per table.

Three resolution paths (in order)

Explicit YAML — table_column_map entry matches a queried table.
Enricher — dim is registered with an Enricher subclass that adds the column post-fetch (e.g. parent_asin looked up via new_asins).
Native column fallback — dim key is not in YAML at all, but a queried table has a column with that exact name. Engine accepts it implicitly. Example: dimensions=["seller_id"] works on every rpt_* table without a YAML entry.

If none match, the planner raises DimensionTableMismatchError with the queried table list.

Classification dimensions (derived from a formula)

- key: campaign_type
  description: Auto vs Manual based on targeting type.
  table_column_map:
  - {table: rpt_sponsored_products_targeting, column: targeting}
  classification_formula: "'Auto' if targeting in ('close-match','loose-match','substitutes','complements') else 'Manual'"

Aggregator evaluates the formula on the raw column post-fetch, then re-aggregates by the derived value.

Query flow

QueryRequest(
    metrics=["pnl_sales", "pnl_pre_ad_margin", "pnl_pre_ad_margin_pct"],
    dimensions=["sku"],
    filters={"child_asin": ["B0X...", "B0Y..."]},
    granularity="monthly",
    date_range_start="2026-04-01", date_range_end="2026-04-30",
    seller_id="AGBDRQ8IA12WY",
    marketplace="US",
)

The MCP wrapper validates seller authorization, parses the date range (date strings, never inferred), and hands off to QueryExecutor.execute(request). Result envelope:

{
  "results": {"format": "csv", "schema": {...}, "data": "<csv>"},
  "trace": {
    "tables_queried": "...",
    "sql_queries":      {csv-envelope},
    "rows_scanned":     {csv-envelope},
    "aggregation_rules": {csv-envelope},
    "formulas_applied":  {csv-envelope},
    "warnings":          "pipe-joined"
  }
}

Conservation of mass

Every dimension grain should return the same total for additive metrics. Quick sanity check when adding metrics or enrichers:

`dimensions=`	rows	total `pnl_sales`
`[]`	1	12,809.59
`[seller_id]`	1	12,809.59
`[parent_asin]`	5	12,809.59
`[child_asin]`	9	12,809.59
`[sku]`	15	12,809.59

If a new metric breaks this, suspect the aggregator: the enricher is probably running post-collapse instead of pre-collapse, or a formula is summing a non-additive value. For a deeper walkthrough of how enrichers preserve this property — including a worked example through SKU/ASIN/seller grain — see Enrichers — How dimensionality is handled.

Adding a metric — checklist

Pick the right hierarchy level (L1 source-backed / L2 formula / L3 with ad spend).
Decide aggregation rule: raw SUM for L1 totals, Recalculate for L2/L3 ratios.
Mark required: true if downstream margins would silently break with null inputs.
Add to metrics.yaml; restart the MCP server (YAML loads once at import).
Smoke-test conservation across at least three grains.

Adding a dimension — checklist

Identify the column name on every relevant report table.
If the column name is identical across tables, consider relying on native-column fallback instead of YAML.
Otherwise add a table_column_map entry per table.
If the dim needs a lookup (e.g. parent ASIN from child), add an enricher and bind it with enricher: <name>.

Metrics Engine Enrichers — stage-typed framework for custom column computation
PnL Metrics — PnL-specific source mapping, formulas, and required-data semantics
Reporting Structure — upstream rpt_* table architecture

Where things live​

Pipeline at a glance​

Required-metric validation​

Defining a metric​

L1 — sourced directly from a column​

L1 — SQL expression (computed per source row)​

L2/L3 — computed from other metrics​

Computed by enricher (no formula, no source)​

Field reference​

Defining a dimension​

Three resolution paths (in order)​

Classification dimensions (derived from a formula)​

Query flow​

Conservation of mass​

Adding a metric — checklist​

Adding a dimension — checklist​

Related​