BagelQuant Data

bagelquant-data is a local Parquet and SQLite data lake for quantitative research. Its public API has three facades: lake.admin, lake.update, and lake.query.

Read the guides in order: overview, quickstart, datasets, sources, updates, queries, and operations.

import polars as pl
from bagelquant_data import DataLake, DatasetSpec

lake = DataLake.open("data")
spec = DatasetSpec(
    "daily",
    "by_daily",
    calendar="trade_cal",
    field_mappings={"trade_date": "time", "ts_code": "asset_id"},
)
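# Ingest one row; field_mappings renames trade_date and ts_code to time and asset_id.
frame = pl.DataFrame(
    {"trade_date": ["20250102"], "ts_code": ["000001.SZ"], "close": [11.25]}
)
lake.ingest(spec, frame)

# Query the canonical fields back and print the result.
result = lake.query.query("daily", source="custom", fields=["time", "asset_id", "close"])
print(result.collect())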

general datasets replace one file and do not require canonical key fields. by_daily and by_asset datasets derive the key (time, asset_id), so they must explicitly map provider fields to those names. Add primary_key_extra when another field, such as period, is also needed to identify a row uniquely.
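The key rule above can be illustrated with a small standalone sketch. It does not use bagelquant-data: derive_primary_key, rename_fields, and duplicate_keys are hypothetical helpers written only for this example, and treating general as having an empty key is an assumption. It shows why two rows that share (time, asset_id) but differ in period need primary_key_extra.

```python
from collections import Counter

# Hypothetical helpers for illustration only; none of these are part of bagelquant-data.
CANONICAL_KEY = ("time", "asset_id")


def derive_primary_key(layout, primary_key_extra=()):
    """Return the key fields implied by a layout name (assumed rule)."""
    if layout == "general":
        return ()  # assumption: general datasets carry no derived key
    if layout in ("by_daily", "by_asset"):
        return CANONICAL_KEY + tuple(primary_key_extra)
    raise ValueError(f"unknown layout: {layout!r}")


def rename_fields(row, field_mappings):
    """Rename provider fields to canonical names, leaving other fields unchanged."""
    return {field_mappings.get(name, name): value for name, value in row.items()}


def duplicate_keys(rows, key):
    """Return every key tuple that occurs more than once in rows."""
    counts = Counter(tuple(row[name] for name in key) for row in rows)
    return [k for k, n in counts.items() if n > 1]


mappings = {"trade_date": "time", "ts_code": "asset_id"}
raw = [
    {"trade_date": "20250102", "ts_code": "000001.SZ", "period": "Q1", "value": 1.0},
    {"trade_date": "20250102", "ts_code": "000001.SZ", "period": "Q2", "value": 2.0},
]
rows = [rename_fields(r, mappings) for r in raw]

base_key = derive_primary_key("by_daily")
full_key = derive_primary_key("by_daily", primary_key_extra=["period"])

print(duplicate_keys(rows, base_key))  # the shared (time, asset_id) pair is reported
print(duplicate_keys(rows, full_key))  # no duplicates once period joins the key
```

With only (time, asset_id) the two rows collide; adding period to the key makes each row unique.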

Incremental completeness is owned by the lake’s update_scopes ledger. by_daily records one scope per open date and request variant. by_asset records one scope per asset and request variant, with a checked_through watermark that is independent of the latest returned record. Provider work is marked successful only after its canonical Parquet commit succeeds.
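The commit-then-mark ordering described above might look roughly like the following sketch, which uses only the standard-library sqlite3 module. The table columns, the record_scope function, and the commit callables are all assumptions made for illustration; the real update_scopes schema may differ.

```python
import sqlite3

# Hypothetical ledger schema; the real update_scopes table may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE update_scopes (
        dataset TEXT NOT NULL,
        scope TEXT NOT NULL,      -- open date (by_daily) or asset id (by_asset)
        variant TEXT NOT NULL,
        status TEXT NOT NULL,
        checked_through TEXT,     -- watermark, used here only for by_asset
        PRIMARY KEY (dataset, scope, variant)
    )
    """
)


def record_scope(dataset, scope, variant, commit, checked_through=None):
    """Run the data commit first; write the ledger row only if it succeeds."""
    commit()  # stands in for the canonical Parquet commit; raises on failure
    with conn:  # one transaction for the ledger write
        conn.execute(
            "INSERT OR REPLACE INTO update_scopes VALUES (?, ?, ?, 'success', ?)",
            (dataset, scope, variant, checked_through),
        )


def ok_commit():
    pass


def failing_commit():
    raise OSError("simulated write failure")


# by_daily: one scope per open date and request variant.
record_scope("daily", "20250102", "default", ok_commit)
try:
    record_scope("daily", "20250103", "default", failing_commit)
except OSError:
    pass  # the failed date leaves no success row, so it is retried later

# by_asset: the watermark records how far the provider was checked,
# even if the latest record it returned is older than that date.
record_scope("daily_by_asset", "000001.SZ", "default", ok_commit,
             checked_through="20250110")

rows = conn.execute(
    "SELECT dataset, scope, status, checked_through "
    "FROM update_scopes ORDER BY dataset, scope"
).fetchall()
print(rows)
```

Because the ledger row is written only after commit() returns, a failure during the data write leaves the scope unrecorded, and the next incremental run treats it as still pending.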

uv run pytest
uv run pyright
uv run ruff check .