Transformer

Overview

A transformer is a unary function-style operation:

Panel | Graph -> Graph

For signatures, parameter descriptions, and examples for every public operation, see the transformer reference.

Built-In Transformers

from bagelquant_core.transformer import (
    rank,
    rolling_mean,
    signed_log1p,
    winsorize,
    zscore,
)

factor = rank(zscore(winsorize(raw_factor)), name="factor")
smoothed = rolling_mean(factor, window=20, name="smoothed")
compressed = signed_log1p(smoothed, name="compressed")

Built-ins are grouped by behavior:

Family Transformers
Basic identity, abs_value, negate, diff, pct_change
Missing values fillna, fillna_zero, ffill, bfill
Replacement replace_non_nan, non_nan_to_one, non_nan_to_zero
Rolling rolling_mean, rolling_std, rolling_min, rolling_max, rolling_sum, ewm_mean, ewm_std, ewm_var
Power power, signed_power, sqrt
Logarithmic log, log1p, signed_log1p
Normalization rank, zscore, winsorize, min_max_scale
Category category_demean, category_mean, category_rank, category_zscore
General nonnans, notnan, denoise, posonly, negonly, lag, delta, rate_of_change, remove_repeated, date_age_constraint, constant, replace_inf
Translation demean, translate_to_pos
Rank rankpct, nrank, logrank
Outliers truncate, trim, trim_quantile
Variance stabilization boxcox, anscombe, freeman, fisher
Trigonometric sin, cos, arcsin, arccos, trig, arctanh, arctan
Kelly criterion kelly, kelly_nonan_standardize, kelly_rank_boxcox, kelly_rescaling_weight

Basic

Basic operations are element-wise or run over rows, which represent time:

Transformer Behavior
identity(source) Return input values unchanged.
abs_value(source) Return absolute values.
negate(source) Negate values.
diff(source, periods=1) Calculate differences over time.
pct_change(source, periods=1) Calculate fractional changes over time, such as returns from a price panel.

Missing Values

Missing-value operations preserve the panel shape:

Transformer Behavior
fillna(source, value=0) Fill NaN values with a numeric scalar.
fillna_zero(source) Fill NaN values with zero.
ffill(source, limit=None) Forward-fill over time.
bfill(source, limit=None) Backward-fill over time.

ffill and bfill accept an optional positive limit.

Replacement

Replacement operations preserve missing values and replace existing values:

Transformer Behavior
replace_non_nan(source, value=...) Replace each non-NaN value with a numeric scalar.
non_nan_to_one(source) Replace each non-NaN value with one.
non_nan_to_zero(source) Replace each non-NaN value with zero.

These operations are useful for availability masks and constant exposures.

Rolling

Rolling operations run over rows, which represent time:

Transformer Behavior
rolling_mean(source, window, min_periods=None) Rolling arithmetic mean.
rolling_std(source, window, min_periods=None, ddof=1) Rolling standard deviation.
rolling_min(source, window, min_periods=None) Rolling minimum.
rolling_max(source, window, min_periods=None) Rolling maximum.
rolling_sum(source, window, min_periods=None) Rolling sum.
ewm_mean(source, ...) Pandas exponentially weighted mean.
ewm_std(source, ...) Pandas exponentially weighted standard deviation.
ewm_var(source, ...) Pandas exponentially weighted variance.

EWM operations follow pandas semantics and require exactly one decay argument: com, span, halflife, or alpha. They also accept min_periods, adjust, and ignore_na. ewm_std and ewm_var additionally accept bias.

Power

Transformer Behavior
power(source, exponent) Raise values to an exponent.
signed_power(source, exponent) Raise absolute values to an exponent while preserving signs.
sqrt(source) Calculate square roots, returning NaN for negative values.

Logarithmic

Transformer Behavior
log(source) Calculate natural logarithms, returning NaN for non-positive values.
log1p(source) Calculate log(1 + value), returning NaN for values at or below -1.
signed_log1p(source) Calculate sign(value) * log(1 + abs(value)).

Normalization

Normalization operations run across columns, which represent assets:

Transformer Behavior
rank(source) Calculate percentile ranks for each row.
zscore(source) Calculate z-scores for each row.
winsorize(source, lower=0.01, upper=0.99) Clip each row to its quantile bounds.
min_max_scale(source) Scale each row to [0, 1].
normalize(source) Scale each row to [-1, 1].
net_scale(source) Scale positive and negative values independently by their row sums.

Constant rows produce NaN values where normalization is undefined.

Extended Rolling Operations

The rolling family also includes rolling_var, rolling_skew, rolling_kurt, rolling_median, rolling_rank, rolling_percentile, and rolling_zscore. rolling_ewm and rolling_ew_std are half-life-compatible aliases for the general EWM operations, while rolling_ewm_fw exposes expanding exponentially weighted means.

Category

Category operations accept a numeric source and a matching CategoryPanel. The category panel may contain strings such as industry, sector, or country labels:

import pandas as pd

from bagelquant_core import CategoryPanel
from bagelquant_core.transformer import category_demean, category_rank

industry = CategoryPanel.from_domain(
    pd.DataFrame(...),
    domain,
    name="industry",
)

industry_neutral = category_demean(raw_factor, industry)
industry_ranked = category_rank(raw_factor, industry)
Transformer Behavior
category_demean(source, categories) Subtract each category mean within each row.
category_mean(source, categories) Replace values with their category mean within each row.
category_rank(source, categories) Calculate percentile ranks within each category and row.
category_zscore(source, categories) Calculate z-scores within each category and row.

Although category operations are exported with transformers, they consume two aligned inputs and are represented internally as multi-input graph nodes.

User-Defined Transformers

import pandas as pd

from bagelquant_core.transformer import transformer


@transformer
def demean(frame: pd.DataFrame) -> pd.DataFrame:
    return frame.sub(frame.mean(axis=1), axis=0)


centered = demean(price, name="centered")

The decorated function receives a DataFrame during execution but accepts a Panel or Graph when researchers construct a workflow.

Configuration arguments are stored in graph specifications and cache keys.