Tushare is the first bundled source adapter. It lives under:
bagelquant_data.sources.tushare
The core framework does not import Tushare-specific code.
Installation
Install the optional Tushare dependencies:
uv sync --extra tushare
Credentials
Use an environment variable:
export TUSHARE_TOKEN="..."
Or configure at runtime:
from bagelquant_data import DataLake
from bagelquant_data.sources.tushare import TushareSource
lake = DataLake.open("data")
lake.sources.register(TushareSource())
lake.sources.configure_tushare(token="...")
configure_tushare persists the token in the local lake metadata DB so future runs can reuse it. Tokens are excluded from repr output and source listings, and should not be committed to version control.
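The environment-variable route can be sketched as below. This is a minimal illustration, not the library's own lookup logic: resolve_tushare_token and mask are hypothetical helpers, and the precedence shown (explicit value first, then TUSHARE_TOKEN) is an assumption.

```python
import os


def resolve_tushare_token(explicit=None, env=None):
    """Hypothetical helper: prefer an explicit token, else read TUSHARE_TOKEN."""
    env = os.environ if env is None else env
    token = explicit or env.get("TUSHARE_TOKEN")
    if not token:
        raise RuntimeError("No Tushare token: set TUSHARE_TOKEN or pass one explicitly")
    return token


def mask(token):
    """Hide all but the last four characters so the token can be logged safely."""
    if len(token) <= 4:
        return "*" * len(token)
    return "*" * (len(token) - 4) + token[-4:]
```

A resolved token could then be handed to lake.sources.configure_tushare(token=...), while log lines print only mask(token).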
Register Dataset Specs
Bundled specs live in:
src/bagelquant_data/sources/tushare/datasets/
Register examples:
lake.datasets.add_from_yaml(
    "src/bagelquant_data/sources/tushare/datasets/daily.yaml"
)
lake.datasets.add_from_yaml(
    "src/bagelquant_data/sources/tushare/datasets/income.yaml"
)
Initial Dataset Set
Reference:
stock_basic
trade_cal
Market:
daily
daily_basic
adj_factor
Financial statements:
income
balancesheet
cashflow
Financial events:
forecast
express
Canonical Time Mapping
Market datasets:
daily: asset_id = ts_code, time = trade_date
daily_basic: asset_id = ts_code, time = trade_date
adj_factor: asset_id = ts_code, time = trade_date
Financial statement datasets:
income: asset_id = ts_code, time = f_ann_date, period = end_date
balancesheet: asset_id = ts_code, time = f_ann_date, period = end_date
cashflow: asset_id = ts_code, time = f_ann_date, period = end_date
Financial event datasets:
forecast: asset_id = ts_code, time = ann_date, period = end_date
express: asset_id = ts_code, time = ann_date, period = end_date
Original source columns are preserved where possible.
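The mapping above can be pictured as a small lookup table applied to each source row. The sketch below is illustrative only: CANONICAL and add_canonical_columns are hypothetical names, the sample values are made up, and adding the canonical keys next to the originals (rather than renaming) is an assumption based on the note that source columns are preserved.

```python
# Canonical column -> source column, copied from the mapping listed above.
CANONICAL = {
    "daily": {"asset_id": "ts_code", "time": "trade_date"},
    "income": {"asset_id": "ts_code", "time": "f_ann_date", "period": "end_date"},
    "forecast": {"asset_id": "ts_code", "time": "ann_date", "period": "end_date"},
}


def add_canonical_columns(dataset, row):
    """Return a copy of row with canonical keys added; source keys are kept."""
    out = dict(row)
    for canonical, source in CANONICAL[dataset].items():
        out[canonical] = row[source]
    return out


# Illustrative income row (values are made up for the example).
row = {
    "ts_code": "000001.SZ",
    "f_ann_date": "20240420",
    "end_date": "20240331",
    "n_income_attr_p": 1.0,
}
canon = add_canonical_columns("income", row)
```

Here canon carries asset_id, time, and period alongside the untouched ts_code, f_ann_date, and end_date.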
Updating Tushare Data
lake.update.dataset("daily", source="tushare")
lake.update.datasets(
    ["daily", "daily_basic", "adj_factor"],
    source="tushare",
)
lake.update.dataset(
    "income",
    source="tushare",
    assets=["000001.SZ", "600000.SH"],
    start="2020-01-01",
    end="2026-06-15",
)
Financial statement and event datasets call Tushare once per asset because the API requires ts_code. If assets is omitted, the updater derives the universe from stock_basic, so register and update stock_basic first.
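The per-asset behaviour can be sketched as a simple loop. This is not the updater's actual code: update_per_asset and fake_fetch are hypothetical, and fake_fetch stands in for a real API request so the example runs offline.

```python
def update_per_asset(fetch, stock_basic_rows, assets=None):
    """Call fetch once per asset, defaulting to the stock_basic universe."""
    if assets is None:
        # No explicit assets: fall back to every ts_code found in stock_basic.
        assets = [row["ts_code"] for row in stock_basic_rows]
    if not assets:
        raise ValueError("empty universe: load stock_basic first or pass assets")
    return {ts_code: fetch(ts_code) for ts_code in assets}


calls = []


def fake_fetch(ts_code):
    # Stand-in for a real request; records the call and returns no rows.
    calls.append(ts_code)
    return []


stock_basic = [{"ts_code": "000001.SZ"}, {"ts_code": "600000.SH"}]
result = update_per_asset(fake_fetch, stock_basic)
```

The sketch also shows why an empty stock_basic is a problem: with no assets argument there is nothing to iterate over, so it raises instead of silently doing nothing.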
Querying Tushare Data
close = lake.query.field(
    "daily",
    "close",
    source="tushare",
    collect=True,
)
income = lake.query.raw(
    "income",
    source="tushare",
    columns=["asset_id", "time", "period", "n_income_attr_p"],
)
Use lake.finance for point-in-time financial transformations.
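To illustrate what "point-in-time" means for rows shaped like the income query above, here is a minimal sketch. It does not describe how lake.finance is implemented: as_of is a hypothetical helper, the rows and values are made up, and dates are written as ISO strings so plain string comparison orders them correctly.

```python
def as_of(rows, asset_id, period, when):
    """Latest-announced row for (asset_id, period) with time <= when, else None."""
    known = [
        r for r in rows
        if r["asset_id"] == asset_id and r["period"] == period and r["time"] <= when
    ]
    return max(known, key=lambda r: r["time"], default=None)


# Illustrative data: one reporting period, first announced and later restated.
rows = [
    {"asset_id": "000001.SZ", "time": "2024-04-20", "period": "2024-03-31", "n_income_attr_p": 100.0},
    {"asset_id": "000001.SZ", "time": "2024-08-30", "period": "2024-03-31", "n_income_attr_p": 110.0},
]

before_restatement = as_of(rows, "000001.SZ", "2024-03-31", "2024-05-01")
after_restatement = as_of(rows, "000001.SZ", "2024-03-31", "2024-09-01")
before_any_report = as_of(rows, "000001.SZ", "2024-03-31", "2024-01-01")
```

Keying on time (the announcement date) rather than period is what keeps a backtest from seeing figures before they were published.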