About
PyAnomaly is a comprehensive Python library for asset pricing research with a focus on firm characteristic and factor generation. It covers the majority of the firm characteristics published in the literature and contains various analytic tools that are commonly used in asset pricing research, such as quantile portfolio construction, factor regression, and cross-sectional regression. The purpose of PyAnomaly is NOT to generate firm characteristics in a fixed manner. Rather, we aim to build a package that can serve as a standard library for asset pricing research and help reduce non-standard errors[5].
The current list of firm characteristics supported by PyAnomaly can be found in Coverage. PyAnomaly is a live project and we plan to add more firm characteristics and functionalities going forward. We also welcome contributions from other scholars.
PyAnomaly is very efficient, comprehensive, and flexible.
- Efficiency
PyAnomaly can generate over 200 characteristics for a sample starting in 1950 in about one hour, including the time to download data from WRDS. To achieve this, PyAnomaly uses the numba, multiprocessing, and asyncio packages where possible, though sparingly, to keep the code readable.
- Comprehensiveness
PyAnomaly supports over 200 firm characteristics published in the literature. It covers most characteristics in Green et al. (2017)[2] and Jensen et al. (2021)[4], except those that use IBES data. It also provides various tools for asset pricing research.
- Flexibility
PyAnomaly follows an object-oriented design and is easy to customize and extend. Users can change the definition of an existing characteristic, add a new characteristic, or change the configuration used to run the program. For instance, a user can choose whether to update annual accounting variables quarterly (using Compustat.fundq) or annually (using Compustat.funda), or whether to use the latest market equity or the year-end market equity, when generating firm characteristics.
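As a rough illustration of the customization described above, the sketch below overrides one characteristic definition in a subclass. The class names (`AnnualPanel`, `MyAnnualPanel`), the method name `asset_growth`, and the toy data are hypothetical and are not PyAnomaly's actual API; refer to the API documentation for the real classes.

```python
import numpy as np
import pandas as pd


class AnnualPanel:
    """Hypothetical stand-in for a characteristic-generating class."""

    def __init__(self, data: pd.DataFrame):
        # Expects one row per firm-year, sorted by firm and then by date.
        self.data = data

    def _lagged_assets(self) -> pd.Series:
        # Previous year's total assets within each firm.
        return self.data.groupby("gvkey")["at"].shift(1)

    def asset_growth(self) -> pd.Series:
        # Default definition: simple one-year growth in total assets.
        return self.data["at"] / self._lagged_assets() - 1


class MyAnnualPanel(AnnualPanel):
    def asset_growth(self) -> pd.Series:
        # Customized definition: log growth instead of simple growth.
        return np.log(self.data["at"] / self._lagged_assets())


# Toy data (made up): two firms, two years each.
data = pd.DataFrame({"gvkey": [1, 1, 2, 2], "at": [100.0, 110.0, 200.0, 180.0]})
default_growth = AnnualPanel(data).asset_growth()
custom_growth = MyAnnualPanel(data).asset_growth()
```

Only the overridden method changes; everything else is inherited, which is the kind of localized change the object-oriented design is meant to allow.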
Main Features
- Efficient data download from WRDS using asyncio.
- Generation of over 200 firm characteristics. You can choose which firm characteristics to generate.
- Fama-French 3-factor and Hou-Xue-Zhang 4-factor portfolios.
- Analytics
  - Cross-sectional regression
  - 1-D sort
  - 2-D sort
  - Rolling regression
  - Quantile portfolio
  - Long-short portfolio
  - Portfolio performance analysis
- Data tools
  - Data filtering
  - Winsorizing
  - Trimming
  - Data population
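Winsorizing and trimming treat outliers differently: winsorizing clips extreme values to the chosen quantiles, while trimming drops them. The NumPy sketch below illustrates the distinction; the function names and the 1%/99% default cutoffs are illustrative assumptions, not PyAnomaly's own data tools.

```python
import numpy as np


def winsorize(x: np.ndarray, lower: float = 0.01, upper: float = 0.99) -> np.ndarray:
    # Clip values outside the [lower, upper] quantiles to the quantile values.
    lo, hi = np.nanquantile(x, [lower, upper])
    return np.clip(x, lo, hi)


def trim(x: np.ndarray, lower: float = 0.01, upper: float = 0.99) -> np.ndarray:
    # Drop values outside the [lower, upper] quantiles (NaNs are dropped too).
    lo, hi = np.nanquantile(x, [lower, upper])
    return x[(x >= lo) & (x <= hi)]


# Toy data (made up) with one extreme value in each tail.
x = np.array([-50.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 90.0])
w = winsorize(x, 0.1, 0.9)  # same length, extremes pulled in
t = trim(x, 0.1, 0.9)       # shorter, extremes removed
```

Winsorizing keeps the sample size intact, which matters when the cross-section is small; trimming removes the observations entirely.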
Coverage
Markets
PyAnomaly currently supports analysis of the firms listed in the US stock market.
Firm Characteristics
The table below lists firm characteristics that are currently supported by PyAnomaly. The characteristics without a function are not yet available but may be added in the future. For a mapping between the functions and the firm characteristics in Chen and Zimmermann (2020)[1], Green et al. (2017)[2], Hou et al. (2020)[3], and Jensen et al. (2021)[4], refer to the mapping file.
| No. | Description | Author(s) | Year | Journal | Function |
|---|---|---|---|---|---|
| 1 | Effective Tax Rate | Abarbanell and Bushee | 1998 | AR | |
| 2 | Gross margin growth to sales growth | Abarbanell and Bushee | 1998 | AR | dgp_dsale |
| 3 | Industry-adjusted change in capital investment | Abarbanell and Bushee | 1998 | AR | pchcapx_ia |
| 4 | Labor force efficiency | Abarbanell and Bushee | 1998 | AR | sale_emp_gr1 |
| 5 | Sales growth to inventory growth | Abarbanell and Bushee | 1998 | AR | dsale_dinv |
| 6 | Sales growth to receivable growth | Abarbanell and Bushee | 1998 | AR | dsale_drec |
| 7 | Sales growth to SG&A growth | Abarbanell and Bushee | 1998 | AR | dsale_dsga |
| 8 | Liquidity beta (illiquidity-illiquidity) | Acharya and Pedersen | 2005 | JFE | |
| 9 | Liquidity beta (illiquidity-return) | Acharya and Pedersen | 2005 | JFE | |
| 10 | Liquidity beta (return-illiquidity) | Acharya and Pedersen | 2005 | JFE | |
| 11 | Liquidity beta (return-return) | Acharya and Pedersen | 2005 | JFE | |
| 12 | Net liquidity beta | Acharya and Pedersen | 2005 | JFE | |
| 13 | Leverage beta | Adrian, Etula and Muir | 2014 | JF | |
| 14 | Idiosyncratic volatility (GHZ) | Ali, Hwang, and Trombley | 2003 | JFE | idiovol |
| 15 | Idiosyncratic volatility (Org, JKP) | Ali, Hwang, and Trombley | 2003 | JFE | ivol_capm_252d |
| 16 | Illiquidity | Amihud | 2002 | JFM | ami_126d |
| 17 | Bid-ask spread | Amihud and Mendelson | 1986 | JFE | baspread |
| 18 | Three-year investment growth | Anderson and Garcia-Feijoo | 2006 | JF | capx_gr3 |
| 19 | Two-year investment growth | Anderson and Garcia-Feijoo | 2006 | JF | capx_gr2 |
| 20 | Dispersion in analyst long-term growth forecasts | Anderson, Ghysels, and Juergens | 2005 | RFS | |
| 21 | Idiosyncratic volatility (CAPM) | Ang et al. | 2006 | JF | ivol_capm_21d |
| 22 | Idiosyncratic volatility (FF3) | Ang et al. | 2006 | JF | ivol_ff3_21d |
| 23 | Idiosyncratic volatility (q-factor) | Ang et al. | 2006 | JF | ivol_hxz4_21d |
| 24 | Return volatility | Ang et al. | 2006 | JF | retvol |
| 25 | Systematic volatility | Ang et al. | 2006 | JF | |
| 26 | Downside beta | Ang, Chen, and Xing | 2006 | RFS | betadown_252d |
| 27 | Industry-adjusted book-to-market | Asness, Porter, and Stevens | 2000 | WP | bm_ia |
| 28 | Industry-adjusted cash flow-to-price | Asness, Porter, and Stevens | 2000 | WP | cfp_ia |
| 29 | Industry-adjusted change in employees | Asness, Porter, and Stevens | 2000 | WP | chempia |
| 30 | Industry-adjusted firm size | Asness, Porter, and Stevens | 2000 | WP | mve_ia |
| 31 | Book-to-market (June ME) | Asness and Frazzini | 2013 | JPM | |
| 32 | Highest 5 days of return to volatility | Asness et al. | 2020 | JFE | rmax5_rvol_21d |
| 33 | Market correlation | Asness et al. | 2020 | JFE | corr_1260d |
| 34 | Quality minus Junk: Composite | Asness, Frazzini, and Pedersen | 2018 | RAS | qmj |
| 35 | Quality minus Junk: Growth | Asness, Frazzini, and Pedersen | 2018 | RAS | qmj_growth |
| 36 | Quality minus Junk: Profitability | Asness, Frazzini, and Pedersen | 2018 | RAS | qmj_prof |
| 37 | Quality minus Junk: Safety | Asness, Frazzini, and Pedersen | 2018 | RAS | qmj_safety |
| 38 | Change in quarterly return on assets | Balakrishnan, Bartov, and Faurel | 2010 | JAE | niq_at_chg1 |
| 39 | Change in quarterly return on equity | Balakrishnan, Bartov, and Faurel | 2010 | JAE | niq_be_chg1 |
| 40 | Quarterly return on assets | Balakrishnan, Bartov, and Faurel | 2010 | JAE | niq_at |
| 41 | Highest 5 days of return | Bali, Brown, and Tang | 2017 | JFE | rmax5_21d |
| 42 | Maximum daily return | Bali, Cakici, and Whitelaw | 2011 | JFE | rmax1_21d |
| 43 | Idiosyncratic skewness (CAPM) | Bali, Engle, and Murray | 2016 | BOOK | iskew_capm_21d |
| 44 | Idiosyncratic skewness (FF3) | Bali, Engle, and Murray | 2016 | BOOK | iskew_ff3_21d |
| 45 | Idiosyncratic skewness (q-factor) | Bali, Engle, and Murray | 2016 | BOOK | iskew_hxz4_21d |
| 46 | Return skewness | Bali, Engle, and Murray | 2016 | BOOK | rskew_21d |
| 47 | Cash-based operating profitability | Ball et al. | 2016 | JFE | cop_at |
| 48 | Cash-based operating profits to lagged assets | Ball et al. | 2016 | JFE | cop_atl1 |
| 49 | Cash-based operating profits to lagged assets (quarterly) | Ball et al. | 2016 | JFE | |
| 50 | Operating profits-to-assets | Ball et al. | 2016 | JFE | op_at |
| 51 | Operating profits-to-lagged assets | Ball et al. | 2016 | JFE | op_atl1 |
| 52 | Operating profits-to-lagged assets (quarterly) | Ball et al. | 2016 | JFE | |
| 53 | Absolute accruals | Bandyopadhyay, Huang, and Wirjanto | 2010 | WP | absacc |
| 54 | Accrual volatility | Bandyopadhyay, Huang, and Wirjanto | 2010 | WP | stdacc |
| 55 | Market equity | Banz | 1981 | JFE | market_equity |
| 56 | Sales to price | Barbee, Mukherji, and Raines | 1996 | FAJ | sale_me |
| 57 | Sales to price (quarterly) | Barbee, Mukherji, and Raines | 1996 | FAJ | |
| 58 | Number of consecutive quarters with earnings increases | Barth, Elliott, and Finn | 1999 | JAR | ni_inc8q |
| 59 | Earnings to price | Basu | 1983 | JFE | ni_me |
| 60 | Earnings to price (quarterly) | Basu | 1983 | JFE | |
| 61 | Forecasted growth in 5-year EPS | Bauman and Dowen | 1988 | FAJ | |
| 62 | Inventory growth | Belo and Lin | 2012 | RFS | inv_gr1 |
| 63 | Brand capital to assets | Belo, Lin and Vitorino | 2014 | RED | |
| 64 | Employment growth | Belo, Lin, and Bazdresch | 2014 | JPE | emp_gr1 |
| 65 | Debt to market | Bhandari | 1988 | JFE | debt_me |
| 66 | Debt to market (quarterly) | Bhandari | 1988 | JFE | |
| 67 | 12 month residual momentum | Blitz, Huij, and Martens | 2011 | JEF | resff3_12_1 |
| 68 | 6 month residual momentum | Blitz, Huij, and Martens | 2011 | JEF | resff3_6_1 |
| 69 | Change in operating cash flow to assets | Bouchard et al. | 2019 | JF | ocf_at_chg1 |
| 70 | Operating cash flow to assets | Bouchard et al. | 2019 | JF | ocf_at |
| 71 | Net payout yield | Boudoukh et al. | 2007 | JF | eqnpo_me |
| 72 | Net payout yield (quarterly) | Boudoukh et al. | 2007 | JF | |
| 73 | Payout yield | Boudoukh et al. | 2007 | JF | eqpo_me |
| 74 | Payout yield (quarterly) | Boudoukh et al. | 2007 | JF | |
| 75 | Net debt finance | Bradshaw, Richardson, and Sloan | 2006 | JAE | dbnetis_at |
| 76 | Net equity finance | Bradshaw, Richardson, and Sloan | 2006 | JAE | eqnetis_at |
| 77 | Net external finance | Bradshaw, Richardson, and Sloan | 2006 | JAE | netis_at |
| 78 | Dollar trading volume (JKP) | Brennan, Chordia, and Subrahmanyam | 1998 | JFE | dolvol_126d |
| 79 | Dollar trading volume (Org, GHZ) | Brennan, Chordia, and Subrahmanyam | 1998 | JFE | dolvol |
| 80 | Return on invested capital | Brown and Rowe | 2007 | WP | roic |
| 81 | Failure probability, monthly | Campbell, Hilscher, and Szilagyi | 2008 | JF | |
| 82 | Failure probability | Campbell, Hilscher, and Szilagyi | 2008 | JF | |
| 83 | Earnings announcement return (Chan et al.) | Chan, Jegadeesh, and Lakonishok | 1996 | JF | |
| 84 | Advertising expense to market | Chan, Lakonishok, and Sougiannis | 2001 | JF | |
| 85 | R&D to market | Chan, Lakonishok, and Sougiannis | 2001 | JF | rd_me |
| 86 | R&D to market (quarterly) | Chan, Lakonishok, and Sougiannis | 2001 | JF | |
| 87 | R&D to sales | Chan, Lakonishok, and Sougiannis | 2001 | JF | rd_sale |
| 88 | R&D to sales (quarterly) | Chan, Lakonishok, and Sougiannis | 2001 | JF | |
| 89 | Cash productivity | Chandrashekar and Rao | 2009 | WP | cashpr |
| 90 | CAPEX and inventory | Chen and Zhang | 2010 | JF | invest |
| 91 | Volatility of dollar trading volume (GHZ) | Chordia, Subrahmanyam, and Anshuman | 2001 | JFE | std_dolvol |
| 92 | Volatility of dollar trading volume (JKP) | Chordia, Subrahmanyam, and Anshuman | 2001 | JFE | dolvol_var_126d |
| 93 | Volatility of share turnover (GHZ) | Chordia, Subrahmanyam, and Anshuman | 2001 | JFE | std_turn |
| 94 | Volatility of share turnover (JKP) | Chordia, Subrahmanyam, and Anshuman | 2001 | JFE | turnover_var_126d |
| 95 | Customer momentum | Cohen and Frazzini | 2008 | JF | |
| 96 | Segment momentum | Cohen and Lou | 2012 | JFE | |
| 97 | Asset growth | Cooper, Gulen, and Schill | 2008 | JF | at_gr1 |
| 98 | Asset growth (quarterly) | Cooper, Gulen, and Schill | 2008 | JF | |
| 99 | High-low bid-ask spread | Corwin and Schultz | 2012 | JF | bidaskhl_21d |
| 100 | Disparity between long- and short-term earnings growth forecasts | Da and Warachka | 2011 | JFE | |
| 101 | Composite equity issuance (Org) | Daniel and Titman | 2006 | JF | eqnpo_60m |
| 102 | Composite equity issuance (JKP, 12 months) | Daniel and Titman | 2006 | JF | eqnpo_12m |
| 103 | Intangible return | Daniel and Titman | 2006 | JF | |
| 104 | Share turnover (JKP) | Datar, Naik, and Radcliffe | 1998 | JFM | turnover_126d |
| 105 | Share turnover (Org, GHZ) | Datar, Naik, and Radcliffe | 1998 | JFM | turn |
| 106 | Long-term reversal (12-36) | De Bondt and Thaler | 1985 | JF | ret_36_12 |
| 107 | Long-term reversal (12-60) | De Bondt and Thaler | 1985 | JF | ret_60_12 |
| 108 | Equity duration | Dechow, Sloan, and Soliman | 2004 | RAS | eq_dur |
| 109 | Operating cash flows to price (JKP) | Desai, Rajgopal, and Venkatachalam | 2004 | AR | ocf_me |
| 110 | Operating cash flows to price (Org, GHZ) | Desai, Rajgopal, and Venkatachalam | 2004 | AR | cfp |
| 111 | Operating cash flows to price (quarterly) | Desai, Rajgopal, and Venkatachalam | 2004 | AR | |
| 112 | Altman Z-score | Dichev | 1998 | JF | z_score |
| 113 | Altman Z-score (quarterly) | Dichev | 1998 | JF | |
| 114 | Ohlson O-score | Dichev | 1998 | JF | o_score |
| 115 | Ohlson O-score (quarterly) | Dichev | 1998 | JF | |
| 116 | Credit Rating Downgrade | Dichev and Piotroski | 2001 | JF | |
| 117 | Dispersion in analysts’ earnings forecasts | Diether, Malloy, and Scherbina | 2002 | JF | |
| 118 | Dimson Beta | Dimson | 1979 | JFE | beta_dimson_21d |
| 119 | Probability of informed trading | Easley, Hvidkjaer, and O’Hara | 2002 | JF | |
| 120 | Unexpected R&D increase | Eberhart, Maxwell, and Siddique | 2004 | JF | rd |
| 121 | Industry-adjusted organizational capital | Eisfeldt and Papanikolaou | 2013 | JF | |
| 122 | Organization capital/assets | Eisfeldt and Papanikolaou | 2013 | JF | |
| 123 | Analysts coverage | Elgers, Lo, and Pfeiffer | 2001 | AR | |
| 124 | Analysts’ earnings forecast-to-price | Elgers, Lo, and Pfeiffer | 2001 | AR | |
| 125 | Change in long-term net operating assets | Fairfield, Whisenant, and Yohn | 2003 | AR | lnoa_gr1a |
| 126 | Assets-to-market | Fama and French | 1992 | JF | at_me |
| 127 | Assets-to-market (quarterly) | Fama and French | 1992 | JF | |
| 128 | Book leverage | Fama and French | 1992 | JF | at_be |
| 129 | Book leverage (quarterly) | Fama and French | 1992 | JF | |
| 130 | Operating profits to book equity (JKP) | Fama and French | 2015 | JFE | ope_be |
| 131 | Operating profits to book equity (GHZ, Org) | Fama and French | 2015 | JFE | operprof |
| 132 | Operating profits to lagged book equity | Fama and French | 2015 | JFE | ope_bel1 |
| 133 | Operating profits to lagged equity (quarterly) | Fama and French | 2015 | JFE | |
| 134 | Beta squared (GHZ) | Fama and MacBeth | 1973 | JPE | betasq |
| 135 | Market beta (GHZ) | Fama and MacBeth | 1973 | JPE | beta |
| 136 | Market beta (Org, JKP) | Fama and MacBeth | 1973 | JPE | beta_60m |
| 137 | Earnings surprise | Foster, Olsen, and Shevlin | 1984 | AR | niq_su |
| 138 | Earnings conservatism | Francis et al. | 2004 | AR | |
| 139 | Earnings persistence | Francis et al. | 2004 | AR | ni_ar1 |
| 140 | Earnings predictability | Francis et al. | 2004 | AR | ni_ivol |
| 141 | Earnings smoothness | Francis et al. | 2004 | AR | earnings_variability |
| 142 | Earnings timeliness | Francis et al. | 2004 | AR | |
| 143 | ROA volatility | Francis et al. | 2004 | AR | roavol |
| 144 | Value relevance of earnings | Francis et al. | 2004 | AR | |
| 145 | Accrual quality | Francis et al. | 2005 | JAE | |
| 146 | Accrual quality (quarterly) | Francis et al. | 2005 | JAE | |
| 147 | Analysts optimism | Frankel and Lee | 1998 | JAE | |
| 148 | Analysts-based intrinsic value-to-market | Frankel and Lee | 1998 | JAE | |
| 149 | Intrinsic value-to-market | Frankel and Lee | 1998 | JAE | intrinsic_value |
| 150 | Predicted analysts forecast error | Frankel and Lee | 1998 | JAE | |
| 151 | Pension funding rate (scaled by market equity) | Franzoni and Marin | 2006 | JF | |
| 152 | Pension funding rate (scaled by assets) | Franzoni and Marin | 2006 | JF | |
| 153 | Frazzini-Pedersen beta | Frazzini and Pedersen | 2014 | JFE | betabab_1260d |
| 154 | 52-week high | George and Hwang | 2004 | JF | prc_highprc_252d |
| 155 | Change in 6-month momentum | Gettleman and Marks | 2006 | WP | chmom |
| 156 | Corporate governance index | Gompers, Ishii, and Metrick | 2003 | QJE | |
| 157 | Percent discretionary accruals | Hafzalla, Lundholm, and Van Winkle | 2011 | AR | |
| 158 | Percent operating accruals (JKP) | Hafzalla, Lundholm, and Van Winkle | 2011 | AR | oaccruals_ni |
| 159 | Percent operating accruals (GHZ, Org) | Hafzalla, Lundholm, and Van Winkle | 2011 | AR | pctacc |
| 160 | Percent total accruals | Hafzalla, Lundholm, and Van Winkle | 2011 | AR | taccruals_ni |
| 161 | Tangibility | Hahn and Lee | 2009 | JF | tangibility |
| 162 | Tangibility (quarterly) | Hahn and Lee | 2009 | JF | |
| 163 | Trend factor | Han, Zhou, and Zhu | 2016 | JFE | trend_factor |
| 164 | Coskewness | Harvey and Siddique | 2000 | JF | coskew_21d |
| 165 | Capital turnover | Haugen and Baker | 1996 | JFE | at_turnover |
| 166 | Capital turnover (quarterly) | Haugen and Baker | 1996 | JFE | |
| 167 | Return on equity | Haugen and Baker | 1996 | JFE | ni_be |
| 168 | Analysts’ forecast change | Hawkins, Chamberlin, and Daniel | 1984 | FAJ | |
| 169 | Revisions in analysts’ earnings forecasts | Hawkins, Chamberlin, and Daniel | 1984 | FAJ | |
| 170 | Year 1-lagged return, annual | Heston and Sadka | 2008 | JFE | seas_1_1an |
| 171 | Year 1-lagged return, nonannual | Heston and Sadka | 2008 | JFE | seas_1_1na |
| 172 | Years 2-5 lagged returns, annual | Heston and Sadka | 2008 | JFE | seas_2_5an |
| 173 | Years 2-5 lagged returns, nonannual | Heston and Sadka | 2008 | JFE | seas_2_5na |
| 174 | Years 6-10 lagged returns, annual | Heston and Sadka | 2008 | JFE | seas_6_10an |
| 175 | Years 6-10 lagged returns, nonannual | Heston and Sadka | 2008 | JFE | seas_6_10na |
| 176 | Years 11-15 lagged returns, annual | Heston and Sadka | 2008 | JFE | seas_11_15an |
| 177 | Years 11-15 lagged returns, nonannual | Heston and Sadka | 2008 | JFE | seas_11_15na |
| 178 | Years 16-20 lagged returns, annual | Heston and Sadka | 2008 | JFE | seas_16_20an |
| 179 | Years 16-20 lagged returns, nonannual | Heston and Sadka | 2008 | JFE | seas_16_20na |
| 180 | Citations to R&D expenses | Hirshleifer, Hsu, and Li | 2013 | JFE | |
| 181 | Patents to R&D expenses | Hirshleifer, Hsu, and Li | 2013 | JFE | |
| 182 | Change in net operating assets | Hirshleifer et al. | 2004 | JAE | noa_gr1a |
| 183 | Net operating assets | Hirshleifer et al. | 2004 | JAE | noa_at |
| 184 | Change in depreciation to PP&E | Holthausen and Larcker | 1992 | JAE | pchdepr |
| 185 | Depreciation to PP&E | Holthausen and Larcker | 1992 | JAE | depr |
| 186 | Sin stock | Hong and Kacperczyk | 2009 | JFE | sin |
| 187 | Industry lead-lag effect in earnings surprises | Hou | 2007 | RFS | |
| 188 | Industry lead-lag effect in prior returns | Hou | 2007 | RFS | |
| 189 | Bid-ask spread (TAQ) | Hou and Loh | 2016 | JFE | |
| 190 | Price delay based on R-squared | Hou and Moskowitz | 2005 | RFS | pricedelay |
| 191 | Price delay based on SE-adjusted slopes | Hou and Moskowitz | 2005 | RFS | |
| 192 | Price delay based on slopes | Hou and Moskowitz | 2005 | RFS | pricedelay_slope |
| 193 | Industry concentration (book equity) | Hou and Robinson | 2006 | JF | herf_be |
| 194 | Industry concentration (sales) | Hou and Robinson | 2006 | JF | herf_sale |
| 195 | Industry concentration (total assets) | Hou and Robinson | 2006 | JF | herf_at |
| 196 | Return on equity (quarterly) | Hou, Xue, and Zhang | 2015 | RFS | niq_be |
| 197 | Cash flow volatility | Huang | 2009 | JEF | ocfq_saleq_std |
| 198 | Short-term reversal | Jegadeesh | 1990 | JF | ret_1_0 |
| 199 | Revenue surprise | Jegadeesh and Livnat | 2006 | JFE | saleq_su |
| 200 | Momentum (12 month) | Jegadeesh and Titman | 1993 | JF | ret_12_1 |
| 201 | Momentum (3 month) | Jegadeesh and Titman | 1993 | JF | ret_3_1 |
| 202 | Momentum (6 month) | Jegadeesh and Titman | 1993 | JF | ret_6_1 |
| 203 | Momentum (9 month) | Jegadeesh and Titman | 1993 | JF | ret_9_1 |
| 204 | Firm age | Jiang, Lee, and Zhang | 2005 | RAS | age |
| 205 | Revenue surprise (Kama) | Kama | 2009 | JBFA | rsup |
| 206 | Tail risk | Kelly and Jiang | 2014 | RFS | |
| 207 | Earnings announcement return (Kishore et al.) | Kishore et al. | 2008 | WP | |
| 208 | Long-term EPS forecast | La Porta | 1996 | JF | |
| 209 | Long-term EPS forecast (monthly sort) | La Porta | 1996 | JF | |
| 210 | Annual sales growth | Lakonishok, Shleifer, and Vishny | 1994 | JF | sale_gr1 |
| 211 | Annual sales growth (quarterly) | Lakonishok, Shleifer, and Vishny | 1994 | JF | |
| 212 | Cash flow-to-price | Lakonishok, Shleifer, and Vishny | 1994 | JF | fcf_me |
| 213 | Cash flow-to-price (quarterly) | Lakonishok, Shleifer, and Vishny | 1994 | JF | |
| 214 | Five-year sales growth rank | Lakonishok, Shleifer, and Vishny | 1994 | JF | |
| 215 | Three-year sales growth | Lakonishok, Shleifer, and Vishny | 1994 | JF | sale_gr3 |
| 216 | Kaplan-Zingales index | Lamont, Polk, and Saa-Requejo | 2001 | RFS | kz_index |
| 217 | Kaplan-Zingales index (quarterly) | Lamont, Polk, and Saa-Requejo | 2001 | RFS | |
| 218 | Abnormal volume in earnings announcement month | Lerman, Livnat, and Mendenhall | 2008 | WP | |
| 219 | Taxable income to income (JKP) | Lev and Nissim | 2004 | AR | pi_nix |
| 220 | Taxable income to income (Org, GHZ) | Lev and Nissim | 2004 | AR | tb |
| 221 | Taxable income to income (quarterly) | Lev and Nissim | 2004 | AR | |
| 222 | R&D capital-to-assets | Li | 2011 | RFS | rd5_at |
| 223 | Dividend yield (JKP) | Litzenberger and Ramaswamy | 1979 | JF | div12m_me |
| 224 | Dividend yield (GHZ) | Litzenberger and Ramaswamy | 1979 | JF | dy |
| 225 | Dividend yield (quarterly) | Litzenberger and Ramaswamy | 1979 | JF | |
| 226 | Zero-trading days (1 month) | Liu | 2006 | JFE | zero_trades_21d |
| 227 | Zero-trading days (6 months) | Liu | 2006 | JFE | zero_trades_126d |
| 228 | Zero-trading days (12 months) | Liu | 2006 | JFE | zero_trades_252d |
| 229 | Growth in advertising expenses | Lou | 2014 | RFS | |
| 230 | Initial public offerings | Loughran and Ritter | 1995 | JF | ipo |
| 231 | Enterprise multiple | Loughran and Wellman | 2011 | JFQA | enterprise_multiple |
| 232 | Enterprise multiple (JKP) | Loughran and Wellman | 2011 | JFQA | ebitda_mev |
| 233 | Enterprise multiple (quarterly) | Loughran and Wellman | 2011 | JFQA | |
| 234 | Changes in PPE and inventory/assets | Lyandres, Sun, and Zhang | 2008 | RFS | ppeinv_gr1a |
| 235 | Composite debt issuance | Lyandres, Sun, and Zhang | 2008 | RFS | debt_gr3 |
| 236 | Customer industries momentum | Menzly and Ozbas | 2010 | JF | |
| 237 | Supplier industries momentum | Menzly and Ozbas | 2010 | JF | |
| 238 | Dividend initiation | Michaely, Thaler, and Womack | 1995 | JF | divi |
| 239 | Dividend omission | Michaely, Thaler, and Womack | 1995 | JF | divo |
| 240 | Share price | Miller and Scholes | 1982 | JPE | price |
| 241 | Mohanram G-score | Mohanram | 2005 | RAS | |
| 242 | Industry momentum | Moskowitz and Grinblatt | 1999 | JFE | indmom |
| 243 | Operating leverage | Novy-Marx | 2011 | JFE | opex_at |
| 244 | Operating leverage (quarterly) | Novy-Marx | 2011 | JFE | |
| 245 | Intermediate momentum (7-12) | Novy-Marx | 2012 | ROF | ret_12_6 |
| 246 | Gross profits-to-assets | Novy-Marx | 2013 | JFE | gp_at |
| 247 | Gross profits-to-lagged assets | Novy-Marx | 2013 | JFE | gp_atl1 |
| 248 | Gross profits-to-lagged assets (quarterly) | Novy-Marx | 2013 | JFE | |
| 249 | Asset liquidity to book assets | Ortiz-Molina and Phillips | 2014 | JFQA | aliq_at |
| 250 | Asset liquidity to book assets (quarterly) | Ortiz-Molina and Phillips | 2014 | JFQA | |
| 251 | Asset liquidity to market assets | Ortiz-Molina and Phillips | 2014 | JFQA | aliq_mat |
| 252 | Asset liquidity to market assets (quarterly) | Ortiz-Molina and Phillips | 2014 | JFQA | |
| 253 | Cash flow-to-debt | Ou and Penman | 1989 | JAR | cashdebt |
| 254 | Change in current ratio | Ou and Penman | 1989 | JAR | pchcurrat |
| 255 | Change in quick ratio | Ou and Penman | 1989 | JAR | pchquick |
| 256 | Change in sales to inventory | Ou and Penman | 1989 | JAR | pchsaleinv |
| 257 | Current ratio | Ou and Penman | 1989 | JAR | currat |
| 258 | Quick ratio | Ou and Penman | 1989 | JAR | quick |
| 259 | Sales-to-cash | Ou and Penman | 1989 | JAR | salecash |
| 260 | Sales-to-inventory | Ou and Penman | 1989 | JAR | saleinv |
| 261 | Sales-to-receivables | Ou and Penman | 1989 | JAR | salerec |
| 262 | Cash-to-assets | Palazzo | 2012 | JFE | cash_at |
| 263 | Pastor-Stambaugh liquidity beta | Pastor and Stambaugh | 2003 | JPE | |
| 264 | Book-to-market enterprise value | Penman, Richardson, and Tuna | 2007 | JAR | bev_mev |
| 265 | Book-to-market enterprise value (quarterly) | Penman, Richardson, and Tuna | 2007 | JAR | |
| 266 | Net debt-to-price | Penman, Richardson, and Tuna | 2007 | JAR | netdebt_me |
| 267 | Net debt-to-price (quarterly) | Penman, Richardson, and Tuna | 2007 | JAR | |
| 268 | Piotroski F-score (JKP) | Piotroski | 2000 | AR | f_score |
| 269 | Piotroski F-score (GHZ, Org) | Piotroski | 2000 | AR | ps |
| 270 | Piotroski F-score (quarterly) | Piotroski | 2000 | AR | |
| 271 | Net stock issues (JKP, Org) | Pontiff and Woodgate | 2008 | JF | chcsho_12m |
| 272 | Net stock issues (GHZ) | Pontiff and Woodgate | 2008 | JF | chcsho |
| 273 | Order backlog | Rajgopal, Shevlin, and Venkatachalam | 2003 | RAS | |
| 274 | Unexpected quarterly earnings | Rendleman, Jones, and Latane | 1982 | JFE | |
| 275 | Change in common equity | Richardson et al. | 2005 | JAE | be_gr1a |
| 276 | Change in long-term investments | Richardson et al. | 2005 | JAE | lti_gr1a |
| 277 | Change in current operating liabilities | Richardson et al. | 2005 | JAE | col_gr1a |
| 278 | Change in current operating assets | Richardson et al. | 2005 | JAE | coa_gr1a |
| 279 | Change in financial liabilities | Richardson et al. | 2005 | JAE | fnl_gr1a |
| 280 | Change in long-term debt | Richardson et al. | 2005 | JAE | lgr |
| 281 | Change in net financial assets | Richardson et al. | 2005 | JAE | nfna_gr1a |
| 282 | Change in net non-cash working capital | Richardson et al. | 2005 | JAE | cowc_gr1a |
| 283 | Change in net non-current operating assets | Richardson et al. | 2005 | JAE | nncoa_gr1a |
| 284 | Change in non-current operating assets | Richardson et al. | 2005 | JAE | ncoa_gr1a |
| 285 | Change in non-current operating liabilities | Richardson et al. | 2005 | JAE | ncol_gr1a |
| 286 | Change in short-term investments | Richardson et al. | 2005 | JAE | sti_gr1a |
| 287 | Total accruals | Richardson et al. | 2005 | JAE | taccruals_at |
| 288 | Book-to-market (December ME, quarterly) | Rosenberg, Reid, and Lanstein | 1985 | JF | |
| 289 | Book-to-market (December ME) | Rosenberg, Reid, and Lanstein | 1985 | JF | be_me |
| 290 | Change in analyst coverage | Scherbina | 2008 | ROF | |
| 291 | Operating accruals (JKP) | Sloan | 1996 | AR | oaccruals_at |
| 292 | Operating accruals (GHZ, Org) | Sloan | 1996 | AR | acc |
| 293 | Asset turnover | Soliman | 2008 | AR | sale_bev |
| 294 | Asset turnover (quarterly) | Soliman | 2008 | AR | |
| 295 | Change in asset turnover | Soliman | 2008 | AR | chatoia |
| 296 | Change in profit margin | Soliman | 2008 | AR | chpmia |
| 297 | Profit margin | Soliman | 2008 | AR | ebit_sale |
| 298 | Profit margin (quarterly) | Soliman | 2008 | AR | |
| 299 | Return on net operating assets | Soliman | 2008 | AR | ebit_bev |
| 300 | Return on net operating assets (quarterly) | Soliman | 2008 | AR | |
| 301 | Mispricing factor: Management | Stambaugh and Yuan | 2016 | RFS | mispricing_mgmt |
| 302 | Mispricing factor: Performance | Stambaugh and Yuan | 2016 | RFS | mispricing_perf |
| 303 | Inventory change | Thomas and Zhang | 2002 | RAS | inv_gr1a |
| 304 | Tax expense surprise | Thomas and Zhang | 2011 | JAR | chtx |
| 305 | Abnormal corporate investment | Titman, Wei, and Xie | 2004 | JFQA | capex_abn |
| 306 | Real estate holdings | Tuzel | 2010 | RFS | realestate |
| 307 | Convertible debt indicator | Valta | 2016 | JFQA | convind |
| 308 | Convertible debt-to-total debt | Valta | 2016 | JFQA | |
| 309 | Secured debt indicator | Valta | 2016 | JFQA | securedind |
| 310 | Secured debt-to-total debt | Valta | 2016 | JFQA | secured |
| 311 | Whited-Wu index | Whited and Wu | 2006 | RFS | |
| 312 | Whited-Wu index (quarterly) | Whited and Wu | 2006 | RFS | |
| 313 | CAPEX growth (1 year) | Xie | 2001 | AR | capx_gr1 |
| 314 | Discretionary accruals | Xie | 2001 | AR | |
Structure
PyAnomaly consists of various modules and the core modules you are likely to use are as follows. The full list of the modules and their details can be found in the API documentation (pyanomaly).
- wrdsdata.py: WRDS, a class to handle data downloading from WRDS, is defined here.
- panel.py: Panel, a base class to handle panel data, is defined here.
- characteristics.py: Classes to generate firm characteristics are defined here. These classes are derived from Panel.
  - FUNDA: A class to generate firm characteristics from funda.
  - FUNDQ: A class to generate firm characteristics from fundq.
  - CRSPM: A class to generate firm characteristics from crspm.
  - CRSPD: A class to generate firm characteristics from crspd.
  - Merge: A class to generate firm characteristics from a merged dataset of funda, fundq, crspm, and crspd.
- analytics.py: A module that defines functions for analytics, such as 1-D sort, cross-sectional regression, and time-series regression.
- datatools.py: A module that defines functions for data handling, such as data filtering, trimming, and winsorizing.
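To make the 1-D sort concrete, the sketch below assigns stocks to quantile portfolios by a characteristic within each date, then computes equal-weighted portfolio returns and a long-short return. It uses plain pandas with made-up column names and random data; it is not the interface of analytics.py.

```python
import numpy as np
import pandas as pd

dates = ["2020-01", "2020-02", "2020-03"]
n_stocks, n_groups = 50, 5
rng = np.random.default_rng(0)

# Toy panel: one row per (date, stock) with a characteristic and a return.
panel = pd.DataFrame({
    "date": np.repeat(dates, n_stocks),
    "char": rng.normal(size=len(dates) * n_stocks),
    "ret": rng.normal(0.01, 0.05, size=len(dates) * n_stocks),
})

# Assign each stock to a quantile group (0 = lowest) within its date.
panel["group"] = panel.groupby("date")["char"].transform(
    lambda s: pd.qcut(s, n_groups, labels=False)
)

# Equal-weighted return of each quantile portfolio per date.
port_ret = panel.groupby(["date", "group"])["ret"].mean().unstack("group")

# Long-short portfolio: highest quantile minus lowest quantile.
port_ret["long_short"] = port_ret[n_groups - 1] - port_ret[0]
```

Sorting within each date, rather than over the pooled sample, keeps the portfolio breakpoints free of look-ahead across periods.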
System Requirement
- Recommendation
Disc space: minimum 100 GB
Memory: minimum 64 GB
The minimum system requirement depends on the configuration, e.g., what characteristics to generate or the sample period.
- Disc space
The raw data downloaded from WRDS take up about 27 GB of disc space. The final output file can take up to 15 GB if all characteristics are generated and the raw data are saved together. The output file is much smaller (less than 5 GB) if only the firm characteristics are saved. In general, 100 GB should be sufficient for all types of tasks, even when interim results are saved.
- Memory
Generating firm characteristics from daily data such as crspd consumes a significant amount of memory. Memory usage can peak at as much as 50 GB. This does not mean you need physical memory of this size: most operating systems use a paging file to allocate some of the disc space as memory, although paging increases the running time.
Comparison to Other Sources
PyAnomaly benefits greatly from the SAS codes of Green et al. (2017) and Jensen et al. (2021), and also from the papers and documentation of Hou et al. (2020) and Chen and Zimmermann (2020). We generally follow the SAS codes of JKP and GHZ and validate our code against them, but when their implementation differs significantly from the original definition, we try to follow the original definition. When the implementation of a firm characteristic differs significantly between the two sources, we implement both versions under different function names. We also found several mistakes in these codes. We note those mistakes, and the differences between our implementation and theirs, in the mapping file and in comments in the code. The SAS code of Jensen et al. (2021) was updated several times while we were developing PyAnomaly, so some of the comments we documented may no longer be valid.
Comparison to the SAS code of Jensen et al. (2021)
PyAnomaly can be configured so that it replicates JKP’s SAS code as closely as possible. However, there are a few key differences that make our results differ from theirs.
- Market equity
JKP use not only CRSP’s msf but also Compustat’s secm and secd to calculate market equity, and (roughly speaking) choose the maximum market equity among those calculated from different sources. We only use the price and shares outstanding from CRSP to calculate the market equity.
- Merging FUNDA with FUNDQ
JKP update annual accounting variables quarterly using comp.fundq. More specifically, JKP create the same characteristics in funda and fundq separately and merge them. In contrast, we merge the raw data first and then generate characteristics. Since some variables in funda, e.g., ebitda, are not available in fundq, JKP construct those variables from other variables before creating characteristics, even when they are available in funda. We prefer to merge funda with fundq at the raw data level and create characteristics from the merged data.
- Share code filtering
JKP do not filter data using the CRSP share code (shrcd), whereas we only use ordinary common stocks (shrcd = 10, 11, or 12). We find that some stocks’ shrcd changes over time. Therefore, this difference affects not only the cross-section but also the time series.
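The CRSP-only market equity and the share code filter described above can be illustrated on a toy CRSP-style table. The sketch assumes the standard CRSP conventions that prc is negative when it is a bid/ask average and that shrout is reported in thousands of shares; the data and the column name `me` are made up, and this is not PyAnomaly's own code.

```python
import pandas as pd

# Toy CRSP-style data (made up).
crsp = pd.DataFrame({
    "permno": [10001, 10002, 10003, 10004],
    "shrcd": [11, 10, 73, 12],
    "prc": [25.0, -10.5, 40.0, 8.0],      # negative = bid/ask average
    "shrout": [2000, 500, 10000, 1500],   # thousands of shares
})

# Keep ordinary common stocks only (shrcd = 10, 11, or 12).
common = crsp[crsp["shrcd"].isin([10, 11, 12])].copy()

# Market equity in millions of dollars, from CRSP price and shares outstanding only.
common["me"] = common["prc"].abs() * common["shrout"] / 1000
```

Because shrcd can change over time for a given stock, applying the filter row by row as here can remove some months of a stock's history while keeping others.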
References
Useful Links
PyAnomaly repository: https://github.com/chulwoohan/pyanomaly
JKP’s SAS code: https://github.com/bkelly-lab/ReplicationCrisis
Openassetpricing: https://www.openassetpricing.com/
GHZ’s SAS code: https://sites.google.com/site/jeremiahrgreenacctg/home
Glossary
crspd: CRSP daily data created from dsf, dsenames, and dseall.
crspm: CRSP monthly data created from msf, msenames, and mseall.
funda: Compustat annual accounting data created from funda.
fundq: Compustat quarterly accounting data created from fundq.
CZ: Either the paper or the R/Stata code of Chen and Zimmermann (2020).
GHZ: Either the paper or the SAS code of Green, Hand, and Zhang (2017).
HXZ: Hou, Xue, and Zhang (2020).
JKP: Either the paper or the SAS code of Jensen, Kelly, and Pedersen (2021).