Installation
You can install PyAnomaly either using pip or from the source.
If you intend to use PyAnomaly ‘’as-is’’, we recommend pip.
If you need to refer to the source code frequently, e.g., to add new firm characteristics or functions,
you may want to copy the source to your project directory.
Using pip
pip install pyassetpricing
After installing pyanomaly, download the mapping file and the examples.
From the source
Download the source from the link below to your project directory.
You need to install the required packages:
wrds
pandas
statsmodels
numba
openpyxl
matplotlib
scikit-learn
pyarrow
You can install these packages one by one or run setup.bat to install them at once.
To confirm the package is installed correctly, try the following code:
>>> from pyanomaly.globals import *
>>> x = [1, 2, 2, 3, [1, 5], None]
>>> y = unique_list(x)
>>> print(y)
[1, 2, 3, 5]
We strongly discourage changing the source as it can be updated from time to time. If you have suggestions of changes, please contact us.
Generating Characteristics
Process Flow
A high-level process to generate firm characteristics is as follows.
Download data from WRDS and save them to files.
Generate factor portfolios: the factors will be used to generate firm characteristics based on factor regression.
Generate firm characteristics from each dataset (funda, fundq, crspm, crspd).
Merge the data and generate firm characteristics that require data from multiple datasets.
Save the result to a file.
It is worth noting that:
Data need to be downloaded only once. The data are saved in files and can be loaded later to generate firm characteristics.
The results can be saved in each step. This can save processing time when you test new characteristics. Consider a scenario where you add a new characteristic in
FUNDAand need to generate it many times to validate the result. You can generate other firm characteristics only once and save them. Then, you can load them from files instead of generating them every time you generate the new characteristic.
Mapping File
The mapping file defines the mapping between firm characteristics and functions, and functions and characteristic names used in other papers. The file can be used to select the characteristics you want to generate and give them aliases. A part of the file is shown below.
- description
This is a short description of the characteristics. When there are multiple versions of implementation, it is indicated in the description. For example, ‘Idiosyncratic volatility (GHZ)’ is GHZ’s implementation of idiosyncratic volatility and ‘Idiosyncratic volatility (Org, JKP)’ is JKP’s implementation. ‘Org’ indicates this version is closer to the original definition.
- function
This column shows the associated functions (methods). If function is ‘idiovol’, the firm characteristic is implemented in the function (method)
c_idiovol(): the actual function name always starts withc_. If function is missing, it means the corresponding firm characteristic is not yet available.- ghz, jkp, hxz, cz
These columns respectively show the aliases of the firm characteristics used in GHZ, JKP, HXZ, and CZ. If you set
alias='ghz'when initializingFCPanelor its derived class, only the characteristics defined in ‘ghz’ column will be generated. Similarly, settingaliasto ‘jkp’, ‘hxz’, or ‘cz’ will generate firm characteristics defined in these columns. If you setalias=None, all available firm characteristics will be generated.- my chars
You can add a new column in the file to define which characteristics to generate and their aliases. For example, if you add a column ‘my chars’ as shown in the table and set
alias='my chars', only ‘Idiosyncratic volatility (Org, JKP)’ and ‘Illiquidity’ will be generated.
description |
author |
year |
journal |
function |
ghz |
jkp |
hxz |
cz |
my chars |
|---|---|---|---|---|---|---|---|---|---|
Idiosyncratic volatility (GHZ) |
Ali, Hwang, and Trombley |
2003 |
JFE |
idiovol |
idiovol |
||||
Idiosyncratic volatility (Org, JKP) |
Ali, Hwang, and Trombley |
2003 |
JFE |
ivol_capm_252d |
ivol_capm_252d |
Iv |
IdioVolAHT |
ivol |
|
Illiquidity |
Amihud |
2002 |
JFM |
ami_126d |
ill |
ami_126d |
Ami |
Illiquidity |
illq |
Bid-ask spread |
Amihud and Mendelson |
1986 |
JFE |
baspread |
baspread |
BidAskSpread |
|||
Three-year investment growth |
Anderson and Garcia-Feijoo |
2006 |
JF |
capx_gr3 |
capx_gr3 |
3Ig |
grcapx3y |
||
Two-year investment growth |
Anderson and Garcia-Feijoo |
2006 |
JF |
capx_gr2 |
grcapx |
capx_gr2 |
2Ig |
grcapx |
|
Dispersion in analyst long-term growth forecasts |
Anderson, Ghysels, and Juergens |
2005 |
RFS |
Dlg |
ForecastDispersionLT |
||||
Downside beta |
Ang, Chen, and Xing |
2006 |
RFS |
betadown_252d |
betadown_252d |
beta-1 |
DownsideBeta |
Output Files
FCPanel and its derived classes (FUNDA, FUNDQ, CRSPM, CRSPD, and Merge) have an attribute
data, which is a DataFrame that contains the raw data and the firm characteristics. An exception is CRSPD.
CRSPD.data only contains firm characteristics and the raw crspd data is stored in CRSPD.cd.data.
CRSPD.cd is an object of CRSPDRaw, a class derived from FCPanel. We separate firm characteristics from the raw data
in crspd because the raw data have a daily frequency, whereas the firm characteristics have a monthly frequency.
The column names of the firm characteristics are their function names (without c_). When data
is saved to a file by calling FCPanel.save(), the column names will be replaced by their aliases.
When a saved file is loaded back to a class by calling FCPanel.load(), the column names will be replaced by
the function names. In summary, the column names of the firm characteristics are the function names in
the data attribute, whereas the column names are the aliases in saved files.
The data attribute has a MultiIndex of ‘date’ and ‘permno’ in CRSPM, CRSPD, and Merge, whereas
it has a MultiIndex of ‘datadate’ and ‘gvkey` in FUNDA and FUNDQ. Once the data in FUNDA and FUNDQ
are populated monthly, the index changes to ‘date’ and ‘gvkey’, and ‘datadate’ remains as a column.
Note that the dates in ‘date’ are shifted to month-end to be compatible with ‘datadate’.
The easiest way to get started is going through examples. The next section presents several examples to help you get familiarized with PyAnomaly.