Skip to main content

Seller Metrics Engine

YAML-declared metrics and dimensions, executed as a planner/fetcher/joiner/aggregator pipeline over rpt_* report tables and dynamic_attributes_values. Surfaced to the LLM via metrics-engine-local-claude-code (and prod) MCP servers.

Where things live

PathPurpose
mcp_server/seller_metrics_engine/definitions/metrics.yamlEvery metric — source/formula/grain/category
mcp_server/seller_metrics_engine/definitions/dimensions.yamlEvery dimension — table_column_map, optional classification or enricher binding
mcp_server/seller_metrics_engine/definitions/__init__.pyYAML loader — loaded once at import
mcp_server/seller_metrics_engine/tools/query.pyMCP tool entrypoints (query_metrics, compare_periods)
app/services/seller_metrics_engine/engine/Resolver, planner, joiner, aggregator, executor
app/services/seller_metrics_engine/engine/enricher/Stage-typed enricher framework — see Enrichers
app/services/seller_metrics_engine/strategies/report_table.py (rpt_* fetch) + dynamic_attribute.py (EAV pivot)

Pipeline at a glance

Every stage transition triggers run_enrichers_at(stage, ctx) — see Enrichers for hooks.

Required-metric validation

A universal pipeline step (between post_join enrichers and the internal-dim collapse). It scans resolved.all_metrics for any with required: true and checks the joined DataFrame for null rows in the metric's column. If any are present, it raises RequiredMetricMissingError naming the affected entities (ASIN / SKU / parent ASIN — whichever ID column the DF carries).

The step runs at source-row grain — before the collapse step drops the asin column — so the error can identify which specific entities are missing data.

# In the executor pipeline, after post_join enrichers:
self._validate_required_metrics(ctx.df, resolved)

The MCP layer maps the exception to a structured required_metric_missing error code with metric + missing_entities fields, so the LLM can tell the user precisely what needs to be configured (COGS uploaded for those ASINs, MB commission configured, etc.) instead of returning a silent null cascade through margin formulas.

To make a metric required, add required: true to its YAML entry. No registration step — the validator is part of the engine, not an enricher.

Defining a metric

Two flavors:

L1 — sourced directly from a column

- key: pnl_sales
display_name: 'PNL: Sales'
description: Gross sales totals from SKU Economics.
sources:
- alias: pnl_sales
table: rpt_pnl_sku_economics
column: sales
metric_refs: null
formula: null
supported_granularities: [weekly, monthly, quarterly]
aggregation_rule: SUM
required: true
format: $
category: PNL
hierarchy_level: L1

L1 — SQL expression (computed per source row)

For per-row arithmetic that must aggregate as SUM(a*b) rather than SUM(a)*SUM(b):

sources:
- alias: pnl_landed_cogs_amt
table: rpt_pnl_sku_economics
expr: cost_of_goods_sold_per_unit * units_sold

Engine emits SUM(cost_of_goods_sold_per_unit * units_sold) in the SELECT, so per-row arithmetic is preserved before aggregation. YAML is trusted — same blast radius as Python formula: strings.

L2/L3 — computed from other metrics

- key: pnl_pre_ad_margin
sources: null
metric_refs: [pnl_sales, pnl_landed_cogs_amt, pnl_referral_fee_amt, pnl_fba_fee, pnl_mb_commission_amt]
formula: pnl_sales - pnl_landed_cogs_amt - pnl_referral_fee_amt - pnl_fba_fee - pnl_mb_commission_amt
aggregation_rule: Recalculate

Recalculate means: SUM the source components first during aggregation, then re-evaluate the formula. Use it for any ratio/total whose pieces must be summed before division (ACoS, margin %, etc.).

Computed by enricher (no formula, no source)

- key: pnl_mb_commission_amt
sources: null
metric_refs: [pnl_sales] # forces rpt_pnl_sku_economics fetch
formula: null
enricher: mb_commission # ← Python-implemented
aggregation_rule: SUM
required: true

See Enrichers for how this is wired.

Field reference

FieldPurpose
keyUnique identifier; what the LLM passes in metrics: [...]
display_name, descriptionLLM-facing
sourcesList of {alias, table, column} or {alias, table, expr}
metric_refsTransitive deps — pulled into the plan even if not user-requested
formulaPython expression evaluated on the polars DF (column refs by alias)
enricherName of a registered Python enricher that fills this metric
aggregation_ruleSUM / Average / Recalculate / null
requiredIf true, the engine raises RequiredMetricMissingError on null source data — see Required-metric validation
supported_granularitiesWhich granularity query values are valid
format$ / % / int — output formatting
category, hierarchy_levelBrowsing/grouping in list_metrics

Defining a dimension

- key: sku
display_name: SKU
description: Seller-assigned SKU. Column name varies per report — engine resolves the right one for each table.
table_column_map:
- {table: rpt_pnl_sku_economics, column: msku}
- {table: rpt_restock_inventory, column: msku}
- {table: rpt_br_detail_page_sales_traffic, column: sku}
- {table: rpt_all_listings, column: seller_sku}
- {table: rpt_sponsored_products_advertised_product, column: advertised_sku}
data_type: string
category: Product

One dim, N tables. The planner picks the right column per table.

Three resolution paths (in order)

  1. Explicit YAMLtable_column_map entry matches a queried table.
  2. Enricher — dim is registered with an Enricher subclass that adds the column post-fetch (e.g. parent_asin looked up via new_asins).
  3. Native column fallback — dim key is not in YAML at all, but a queried table has a column with that exact name. Engine accepts it implicitly. Example: dimensions=["seller_id"] works on every rpt_* table without a YAML entry.

If none match, the planner raises DimensionTableMismatchError with the queried table list.

Classification dimensions (derived from a formula)

- key: campaign_type
description: Auto vs Manual based on targeting type.
table_column_map:
- {table: rpt_sponsored_products_targeting, column: targeting}
classification_formula: "'Auto' if targeting in ('close-match','loose-match','substitutes','complements') else 'Manual'"

Aggregator evaluates the formula on the raw column post-fetch, then re-aggregates by the derived value.

Query flow

QueryRequest(
metrics=["pnl_sales", "pnl_pre_ad_margin", "pnl_pre_ad_margin_pct"],
dimensions=["sku"],
filters={"child_asin": ["B0X...", "B0Y..."]},
granularity="monthly",
date_range_start="2026-04-01", date_range_end="2026-04-30",
seller_id="AGBDRQ8IA12WY",
marketplace="US",
)

The MCP wrapper validates seller authorization, parses the date range (date strings, never inferred), and hands off to QueryExecutor.execute(request). Result envelope:

{
"results": {"format": "csv", "schema": {...}, "data": "<csv>"},
"trace": {
"tables_queried": "...",
"sql_queries": {csv-envelope},
"rows_scanned": {csv-envelope},
"aggregation_rules": {csv-envelope},
"formulas_applied": {csv-envelope},
"warnings": "pipe-joined"
}
}

Conservation of mass

Every dimension grain should return the same total for additive metrics. Quick sanity check when adding metrics or enrichers:

dimensions=rowstotal pnl_sales
[]112,809.59
[seller_id]112,809.59
[parent_asin]512,809.59
[child_asin]912,809.59
[sku]1512,809.59

If a new metric breaks this, suspect the aggregator: the enricher is probably running post-collapse instead of pre-collapse, or a formula is summing a non-additive value. For a deeper walkthrough of how enrichers preserve this property — including a worked example through SKU/ASIN/seller grain — see Enrichers — How dimensionality is handled.

Adding a metric — checklist

  1. Pick the right hierarchy level (L1 source-backed / L2 formula / L3 with ad spend).
  2. Decide aggregation rule: raw SUM for L1 totals, Recalculate for L2/L3 ratios.
  3. Mark required: true if downstream margins would silently break with null inputs.
  4. Add to metrics.yaml; restart the MCP server (YAML loads once at import).
  5. Smoke-test conservation across at least three grains.

Adding a dimension — checklist

  1. Identify the column name on every relevant report table.
  2. If the column name is identical across tables, consider relying on native-column fallback instead of YAML.
  3. Otherwise add a table_column_map entry per table.
  4. If the dim needs a lookup (e.g. parent ASIN from child), add an enricher and bind it with enricher: <name>.