MarketFlows technical notes (MVP)

This file is meant for repo readers who want the plumbing: what runs where, what gets named, and how range derivatives work.

Pipeline

Entry point:

marketflows.cli:main()
- parses --config, --secrets, --out-dir, --log-level, --tutorial
- calls configure_logging(...)
- calls app.run_pipeline(...)
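
The flag set above can be sketched with argparse. This is a minimal illustration, not the project's actual parser: build_parser, the defaults, and the help strings are assumptions; only the flag names come from these notes.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical helper: only the flag names are taken from the notes.
    parser = argparse.ArgumentParser(prog="marketflows")
    parser.add_argument("--config", help="path to the pipeline config file")
    parser.add_argument("--secrets", help="path to the secrets file")
    parser.add_argument("--out-dir", default="out", help="output directory (assumed default)")
    parser.add_argument("--log-level", default="INFO", help="logging level (assumed default)")
    parser.add_argument("--tutorial", action="store_true", help="run in tutorial mode")
    return parser


# argparse maps --out-dir to args.out_dir and --log-level to args.log_level.
args = build_parser().parse_args(["--tutorial", "--out-dir", "results"])
```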

Orchestrator:

Column naming:

All metric columns follow the same shape:

- <group>_by_<base_asset> for the normalized market cap (or normalized bucket value)
- optional _ema<N> suffix (N is the EMA span) when an EMA is applied
- optional _growth, _inflection, or _deriv<K> suffix for derivatives
- optional _unit suffix for per-timestep unit normalization

marketflows._helpers.name_column(...) is the single place that generates names.
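
A rough sketch of a name builder that follows the shape described above. The signature, the parameter names, and the suffix order (EMA, then derivative, then unit) are assumptions made for illustration; the real name_column(...) may differ.

```python
from typing import Optional


def name_column_sketch(
    group: str,
    base_asset: str,
    ema: Optional[int] = None,
    derivative: Optional[str] = None,
    unit: bool = False,
) -> str:
    # Hypothetical stand-in for marketflows._helpers.name_column.
    parts = [f"{group}_by_{base_asset}"]
    if ema is not None:
        parts.append(f"ema{ema}")      # EMA span suffix
    if derivative is not None:
        parts.append(derivative)       # e.g. "growth", "inflection", "deriv3"
    if unit:
        parts.append("unit")           # per-timestep unit normalization
    return "_".join(parts)


plain = name_column_sketch("top10", "us-dollar")
full = name_column_sketch("top10", "bitcoin", ema=7, derivative="growth", unit=True)
```

Keeping name generation in one function means downstream code can rebuild a column name from its parameters instead of parsing strings.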

analysis.aggregates.create_master_df(...):

- creates a datetime index from (min_timestamp, max_timestamp, freq)
- for each asset: converts chart timestamps to datetime, joins onto the master index, and interpolates with method="time" (limit_direction="both")

This gives a single aligned DataFrame: df_master[asset_id] -> market cap.
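
The alignment step can be sketched with pandas as follows. This is an illustration under assumptions, not the project's code: build_master_frame is a hypothetical name, each chart is assumed to be a list of (epoch milliseconds, market cap) pairs, and the union-then-reindex step is one possible way to join onto the master index.

```python
import pandas as pd


def build_master_frame(charts, min_timestamp, max_timestamp, freq):
    # Regular master index shared by every asset.
    index = pd.date_range(min_timestamp, max_timestamp, freq=freq, name="Datetime")
    columns = {}
    for asset_id, points in charts.items():
        # Assumption: timestamps are epoch milliseconds.
        stamps = pd.to_datetime([p[0] for p in points], unit="ms")
        series = pd.Series([p[1] for p in points], index=stamps, dtype=float)
        # Keep raw observations alongside the grid so they inform interpolation.
        combined = series.reindex(series.index.union(index))
        filled = combined.interpolate(method="time", limit_direction="both")
        columns[asset_id] = filled.reindex(index)
    return pd.DataFrame(columns, index=index)


def ms(day: str) -> int:
    # Helper for the example only: naive timestamps are treated as UTC.
    return int(pd.Timestamp(day).timestamp() * 1000)


charts = {
    "a": [(ms("2024-01-01"), 100.0), (ms("2024-01-03"), 200.0), (ms("2024-01-05"), 400.0)],
    "b": [(ms("2024-01-02"), 10.0), (ms("2024-01-04"), 30.0)],
}
df_master = build_master_frame(charts, "2024-01-01", "2024-01-05", "D")
```

With limit_direction="both", grid points before an asset's first observation or after its last one are filled from the nearest observed value rather than left as NaN.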

analysis.metrics.calculate_group_metrics(...):

- normalizes each group series by a base asset (or passes it through unchanged for us-dollar)
- normalizes each series relative to its first valid record, using a first-valid time shared across series
- applies EMA(s) when configured
- computes derivatives (growth / inflection) and applies a smoothing EMA to the derivatives
- optionally adds unit-normalized columns per timestep
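
One possible reading of these steps, sketched with pandas. Everything here is an assumption made for illustration: the function name, the use of first differences for growth and inflection, and the choice of the first row where all groups are valid as the shared anchor. The unit-normalization step is left out because the notes do not define it.

```python
import pandas as pd


def group_metrics_sketch(groups: pd.DataFrame, base: pd.Series, span: int) -> dict:
    # Normalize every group series by the base asset.
    ratios = groups.div(base, axis=0)
    # Shared first-valid time: first row where every group has a value (assumption).
    anchor = ratios.dropna().index[0]
    normalized = ratios / ratios.loc[anchor]
    # EMA on the level, then on each derivative.
    smoothed = normalized.ewm(span=span, adjust=False).mean()
    growth = smoothed.diff().ewm(span=span, adjust=False).mean()
    inflection = growth.diff().ewm(span=span, adjust=False).mean()
    return {
        "anchor": anchor,
        "normalized": normalized,
        "smoothed": smoothed,
        "growth": growth,
        "inflection": inflection,
    }


index = pd.date_range("2024-01-01", periods=4, freq="D")
groups = pd.DataFrame(
    {"large": [None, 200.0, 220.0, 260.0], "small": [50.0, 100.0, 90.0, 120.0]},
    index=index,
)
base = pd.Series(2.0, index=index)
result = group_metrics_sketch(groups, base, span=3)
```

Anchoring every series at the same timestamp makes all normalized columns equal 1.0 at that point, so they can be compared on one scale.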

Cohort selection (provider side):

providers.coingecko._read_mcs_above_limit(min_limit) builds the cohort once at startup: all coins with current market cap above min_limit.
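
The filter itself is simple. A minimal sketch, assuming each coin record is a mapping with "id" and "market_cap" keys; select_cohort is a hypothetical name and the sample records are made up for the example.

```python
from typing import Iterable, List, Mapping


def select_cohort(coins: Iterable[Mapping], min_limit: float) -> List[str]:
    # Keep ids whose current market cap is known and strictly above min_limit.
    return [
        coin["id"]
        for coin in coins
        if coin.get("market_cap") is not None and coin["market_cap"] > min_limit
    ]


sample = [
    {"id": "alpha", "market_cap": 5_000_000_000},
    {"id": "beta", "market_cap": 800_000_000},
    {"id": "gamma", "market_cap": None},
]
cohort = select_cohort(sample, 1_000_000_000)
```

Because the cohort is built once from current market caps, membership stays fixed for the whole run.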

Bucketing (analysis side):

analysis.aggregates.prepare_cap_ranges(...) creates a long DataFrame: (Datetime index, asset, market_caps, lower_limit)
lower_limit is assigned per row as the largest threshold that market_caps exceeds.
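
A sketch of the per-row assignment using numpy.searchsorted. The function names and the sample thresholds are assumptions; "exceeds" is read as strictly greater than, so a cap equal to a threshold falls into the bucket below it.

```python
import numpy as np
import pandas as pd


def assign_lower_limit(market_caps: pd.Series, thresholds) -> pd.Series:
    limits = np.sort(np.asarray(thresholds, dtype=float))
    caps = market_caps.to_numpy(dtype=float)
    # side="left" counts thresholds strictly below each cap.
    pos = np.searchsorted(limits, caps, side="left") - 1
    valid = (pos >= 0) & ~np.isnan(caps)
    out = np.where(valid, limits[np.clip(pos, 0, None)], np.nan)
    return pd.Series(out, index=market_caps.index, name="lower_limit")


def to_long_with_limits(df_master: pd.DataFrame, thresholds) -> pd.DataFrame:
    # Wide (Datetime x asset) frame to long rows: asset, market_caps, lower_limit.
    long = (
        df_master.rename_axis("Datetime")
        .reset_index()
        .melt(id_vars="Datetime", var_name="asset", value_name="market_caps")
        .set_index("Datetime")
    )
    long["lower_limit"] = assign_lower_limit(long["market_caps"], thresholds).to_numpy()
    return long


index = pd.date_range("2024-01-01", periods=2, freq="D")
df_master = pd.DataFrame({"a": [5e8, 1e9], "b": [2e10, 3e9]}, index=index)
long = to_long_with_limits(df_master, thresholds=[0, 1e9, 1e10])
```

Rows whose cap does not exceed any threshold, or is missing, get NaN as lower_limit in this sketch.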

Aggregation:

analysis.aggregates.aggregate_cap_ranges(...) groups by (Datetime, lower_limit) and sums market caps.
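
The grouping can be sketched as a plain pandas groupby. The sample rows and the final unstack into one column per bucket are assumptions added for illustration.

```python
import pandas as pd

t0, t1 = pd.Timestamp("2024-01-01"), pd.Timestamp("2024-01-02")
long = pd.DataFrame(
    {
        "asset": ["a", "b", "c", "a", "b", "c"],
        "market_caps": [100.0, 50.0, 5000.0, 120.0, 1500.0, 5200.0],
        "lower_limit": [0.0, 0.0, 1000.0, 0.0, 1000.0, 1000.0],
    },
    index=pd.DatetimeIndex([t0, t0, t0, t1, t1, t1], name="Datetime"),
)

# Sum market caps per (Datetime, lower_limit), then pivot buckets into columns.
totals = (
    long.reset_index()
    .groupby(["Datetime", "lower_limit"])["market_caps"]
    .sum()
    .unstack("lower_limit", fill_value=0.0)
)
```

In this sample, asset b moves from the 0 bucket to the 1000 bucket between the two timestamps, so the bucket totals shift by its full market cap even though no value was created.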

Derivatives:

Growth and inflection use shifted bucket membership per asset:
- growth uses membership at (t-1) vs totals at t
- inflection uses membership at (t-2, t-1, t)

This is the MVP approach to avoid “range drift” artifacts when assets cross thresholds.
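
One way to read the growth rule is: hold each asset in the bucket it occupied at t-1 and sum its change in market cap from t-1 to t. The sketch below implements that reading only. The function name, the wide (time x asset) input frames, and the use of absolute changes are assumptions, and inflection is not covered.

```python
import pandas as pd


def held_membership_growth(caps: pd.DataFrame, limits: pd.DataFrame) -> pd.DataFrame:
    # caps and limits are wide frames: rows are timestamps, columns are assets.
    prev_limits = limits.shift(1)   # bucket membership at t-1
    delta = caps.diff()             # per-asset change from t-1 to t
    buckets = sorted(b for b in pd.unique(limits.to_numpy().ravel()) if pd.notna(b))
    growth = {
        bucket: delta.where(prev_limits == bucket).sum(axis=1)
        for bucket in buckets
    }
    # Drop the first row, which has no t-1 membership.
    return pd.DataFrame(growth).iloc[1:]


index = pd.date_range("2024-01-01", periods=3, freq="D")
caps = pd.DataFrame({"A": [90.0, 110.0, 120.0], "B": [200.0, 210.0, 190.0]}, index=index)
# Asset A crosses the 100 threshold between the first and second timestamp.
limits = pd.DataFrame({"A": [0.0, 100.0, 100.0], "B": [100.0, 100.0, 100.0]}, index=index)
growth = held_membership_growth(caps, limits)
```

In this example a naive difference of bucket totals would report +120 for the 100 bucket at the second timestamp, because asset A's whole cap arrives at once. With membership held at t-1 the same step shows +10 for that bucket and +20 for the 0 bucket.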

marketflows/tutorial/data.py loads:

Tutorial mode exists to prove installation + end-to-end outputs without needing API keys.