Benchmarks are intentionally lightweight and reproducible:

uv run python scripts/benchmark_efficiency.py --rows 100000 --assets 500 --repeats 1

Current local baseline after the rolling-rank optimization:

rolling_rank best=0.3035s
runtime cache hit best=0.0001s

The previous local baseline for rolling_rank was about 1.59s on the same 100k-row synthetic panel, so the current 0.3035s is roughly a 5x speedup. The improvement comes from replacing per-window Python callbacks with a single vectorized NumPy pass per asset in rolling_rank and rolling_percentile.
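
To illustrate the idea of a per-asset NumPy pass, here is a minimal sketch. It is not the project's actual implementation: the function names `rolling_rank_1d` and `rolling_rank_by_asset` are hypothetical, and it assumes that "rolling rank" means the average-tie rank (1 = smallest) of each window's last value within that window, with rows already in time order within each asset.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_rank_1d(values, window):
    """Rank (1 = smallest) of each window's last value within its window.

    Hypothetical helper: ties get the average rank, and the first
    `window - 1` positions are NaN because the window is incomplete.
    """
    values = np.asarray(values, dtype=float)
    out = np.full(values.shape, np.nan)
    if window <= 0 or values.size < window:
        return out

    # View of shape (n - window + 1, window); no per-window Python callback.
    windows = sliding_window_view(values, window)
    last = windows[:, -1:]

    # Average rank = (# strictly smaller) + (# equal, including itself + 1) / 2.
    # NaNs inside a window compare False, so they are simply not counted.
    less = (windows < last).sum(axis=1)
    equal = (windows == last).sum(axis=1)
    ranks = less + (equal + 1) / 2.0

    # A NaN observation has no meaningful rank.
    ranks[np.isnan(last[:, 0])] = np.nan
    out[window - 1:] = ranks
    return out


def rolling_rank_by_asset(asset_ids, values, window):
    """Apply rolling_rank_1d once per asset (one NumPy pass per asset)."""
    asset_ids = np.asarray(asset_ids)
    values = np.asarray(values, dtype=float)
    out = np.full(values.shape, np.nan)
    for asset in np.unique(asset_ids):
        mask = asset_ids == asset
        out[mask] = rolling_rank_1d(values[mask], window)
    return out


if __name__ == "__main__":
    print(rolling_rank_1d([3, 1, 2, 2, 5], window=3))
```

The Python-level loop runs once per asset rather than once per window, which is where a saving of this kind would come from. The trade-off in this sketch is memory: the comparisons materialize a boolean array of shape (n - window + 1, window) per asset, so very long windows may call for a different approach.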