Factor Models Construction Process
This project aims to build a stock scoring and backtesting system based on classical factor investing techniques. The goal is to start simple, generate interpretable results, and gradually expand into more sophisticated methods.
Project Settings
- Universe: All historical constituents of CSI 300, CSI 500, and CSI 1000.
- Date Range: 2000-01-01 to 2025-12-31
- Frequency: Monthly
- Data Source: Tushare via bagel-tushare (local MySQL database)
- Strategy Type: Long-only, holding top 20–50 stocks
Factor Library
The initial factor set will cover standard categories (all oriented so that higher = better):
- Market
- Size
- Value
- Momentum
- Quality
- Investment
- Volume
A detailed factor dictionary (definitions, formulas, and data fields) will be maintained in a separate file.
Factor Evaluation
Each factor will be tested individually using:
- Information Coefficient (IC) > 0.05
- Rank IC > 0.5
- Quantile Analysis: IC across quantile portfolios
- Correlation Check: pairwise correlation < 0.8 (to reduce redundancy)
Validation Methodology (Walk-Forward)
To better mimic real-time investing, I will use walk-forward validation rather than a one-off split.
- Timeline & windows
- Expanding window: at date $t$, fit preprocessing/weights using all data up to $t$, then test on $t \to t+1$.
- Optionally compare with a rolling window (e.g., last 60 months) to check regime sensitivity.
- At each rebalance date $t$
- Universe: only stocks investable at $t$.
- Preprocessing (fit on train only):
- Winsorize (1%/99%), z-score by date, optional size/industry neutralization.
- IC testing: compute cross-sectional IC/RankIC between factor at $t$ and future return $t \to t+1$.
- Scoring & portfolio: form factor scores using equal/IC/ICIR weights calibrated on the train window only.
- Aggregation & inference (OOS only)
- Time series of OOS ${\text{IC}_t}$. Report mean IC, ICIR, hit rate, and HAC (Newey–West) t-stat.
- Portfolio OOS performance: CAGR, vol, Sharpe, maxDD, turnover, and cost-adjusted returns.
- Acceptance bar (OOS, monthly)
- Mean RankIC > 0 with HAC t-stat ≥ 2
- ICIR ≥ 0.3
- Hit rate ≥ 55%
- Factor correlation with others < 0.8
- Practical cautions
- Avoid overlapping returns for multi-month horizons (or use larger HAC lags).
- Keep preprocessing parameters fixed within each train window.
- Always log data availability dates to prevent look-ahead.
Stock Scoring Framework
Scores will be calculated as a linear combination of selected factors. Initial methods to test:
- Equal weights
- IC-weighted
- ICIR-weighted
Portfolio Construction
- Rebalance: Score at month-end, trade at next month’s open
- Holding Sizes: Test portfolios of 20, 30, 40, and 50 stocks
- Weighting: Equal-weighted portfolios to start
- Transaction Costs: Apply simple cost models (e.g., 10–20 bps per trade side) to adjust returns. Report turnover, cost-adjusted performance, and sensitivity to different cost assumptions.
Future improvements: test alternative rebalancing frequencies, sector neutrality, and more sophisticated transaction cost modeling (e.g., volume-based slippage or market impact).
Roadmap
- Phase 1: Build factor library and run baseline equal-weight backtests.
- Phase 2: Add IC/ICIR weighting, factor correlation pruning, and walk-forward validation.
- Phase 3: Expand portfolio construction with turnover control, transaction costs, and risk adjustments.
✅ Deliverable: For each rebalance date, generate factor scores, stock rankings, and portfolio backtest results — published as a series of blog posts tracking progress.
Comments