Internal Documentation

This page describes implementation boundaries for maintainers and extension authors.

Dependency Direction

bagelquant-data sits below the rest of the ecosystem. It must not import bagelquant-core, bagelquant-bt, bagelquant-app, or the website.
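One way to keep this rule from eroding is a test that scans module source for forbidden imports. The sketch below is a minimal illustration using only the standard library. The import names in FORBIDDEN_ROOTS are assumptions derived from the distribution names above, and forbidden_imports is a hypothetical helper, not part of the package.

```python
from __future__ import annotations

import ast

# Assumed import names for the packages named above; adjust to the real ones.
FORBIDDEN_ROOTS = {"bagelquant_core", "bagelquant_bt", "bagelquant_app"}


def forbidden_imports(source: str) -> set[str]:
    """Return the forbidden top-level packages imported by a module's source."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            roots = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 means an absolute import; relative imports stay in-package.
            roots = [node.module.split(".")[0]]
        else:
            continue
        found.update(root for root in roots if root in FORBIDDEN_ROOTS)
    return found
```

A test suite could run this over every file in the package and assert that the result is empty.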

The handoff to bagelquant-core is plain data: RetrievedPanel exposes a DataFrame, a sorted calendar, and a universe. Downstream code constructs Domain and Panel objects itself.

Storage Layout

Local V1 storage is partitioned by source, table, year, and month:

lake-root/
  tushare/
    daily/
      year=2024/
        month=01/
          snapshots/

JSON catalogs track source-level IDs, table metadata, latest-snapshot pointers, and inferred panel fields. Reads use partition metadata to skip snapshots outside the requested date range, then filter the loaded data to the exact range.
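The two-stage read (coarse partition pruning, then exact filtering) can be sketched as follows. This is an illustration only: the helper names are hypothetical, rows are plain dicts with an assumed "date" key, and the sketch derives partitions from the requested dates directly, whereas the real reader consults catalog metadata.

```python
from __future__ import annotations

from datetime import date
from pathlib import PurePosixPath


def partition_dir(root: str, source: str, table: str, year: int, month: int) -> PurePosixPath:
    """Build the snapshot directory for one (source, table, year, month) partition."""
    return PurePosixPath(root) / source / table / f"year={year:04d}" / f"month={month:02d}" / "snapshots"


def months_in_range(start: date, end: date) -> list[tuple[int, int]]:
    """List every (year, month) partition overlapping the inclusive range [start, end]."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append((year, month))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


def exact_filter(rows: list[dict], start: date, end: date) -> list[dict]:
    """Drop rows outside the range after loading; assumes each row has a 'date' key."""
    return [row for row in rows if start <= row["date"] <= end]
```

Pruning by month is deliberately coarse: a request for 2024-02-10 through 2024-02-20 still loads the whole February partition, which is why the exact filter runs afterwards.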

Update Flow

Normal provider refresh:

DataRequest -> DataSource.read -> DataLakeManager.update -> LocalDataLake.write

Tushare production-style refresh:

reference refresh -> scan_tushare_updates -> review report -> execute jobs

The report is the review boundary. Execution should consume the confirmed jobs from that report instead of rebuilding plans implicitly.
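A minimal sketch of that boundary, assuming a report that lists jobs with a confirmation flag. UpdateJob, UpdateReport, and execute are hypothetical names for illustration and do not describe the actual report format or executor.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class UpdateJob:
    """Hypothetical job record: one table and date range to refresh."""
    table: str
    start: str
    end: str
    confirmed: bool = False


@dataclass
class UpdateReport:
    """Hypothetical report produced by a scan and confirmed during review."""
    jobs: list[UpdateJob] = field(default_factory=list)

    def confirmed_jobs(self) -> list[UpdateJob]:
        return [job for job in self.jobs if job.confirmed]


def execute(report: UpdateReport, run_job: Callable[[UpdateJob], object]) -> list[str]:
    """Run only the jobs confirmed in the report; no re-planning happens here."""
    executed = []
    for job in report.confirmed_jobs():
        run_job(job)
        executed.append(job.table)
    return executed
```

Because execute receives the reviewed report rather than the original scan inputs, what runs is exactly what a reviewer confirmed.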

Module Structure

  • datasource: provider adapters, request objects, and registry.
  • lake: snapshot storage, catalogs, direct reads, and update orchestration.
  • loader: lake-first retrieval and panel-shaped return objects.
  • metadata: schemas, contracts, identities, and lineage.
  • transform: stateless DataFrame transformation pipelines.
  • cache: optional cache policies and implementations.
  • config: environment and profile settings.

Failure Handling

Provider and lake errors should include the source, dataset, requested date range, and operation whenever possible. Missing optional provider dependencies should fail at adapter construction or first provider use, not during package import.
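Both guidelines can be sketched together: an error type that carries the context fields, and an adapter that defers its optional import until construction. DataOperationError and OptionalProviderAdapter are hypothetical names used only for illustration. importlib.import_module is the standard-library call.

```python
from __future__ import annotations

import importlib


class DataOperationError(RuntimeError):
    """Hypothetical error type carrying the context fields listed above."""

    def __init__(self, message, *, source=None, dataset=None, start=None, end=None, operation=None):
        self.source, self.dataset = source, dataset
        self.start, self.end, self.operation = start, end, operation
        fields = [("source", source), ("dataset", dataset), ("start", start),
                  ("end", end), ("operation", operation)]
        # Include only the fields that are known at the failure site.
        context = ", ".join(f"{key}={value}" for key, value in fields if value is not None)
        super().__init__(f"{message} [{context}]" if context else message)


class OptionalProviderAdapter:
    """Hypothetical adapter that imports its optional dependency at construction."""

    def __init__(self, module_name: str, source: str):
        try:
            # Deferred import: the package itself still imports without this module.
            self._client = importlib.import_module(module_name)
        except ImportError as exc:
            raise DataOperationError(
                f"optional dependency '{module_name}' is not installed",
                source=source,
                operation="adapter construction",
            ) from exc
```

Chaining with `from exc` keeps the original ImportError available as `__cause__`, so the traceback still shows which module was missing.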