Installation

You can install PyAnomaly either with pip or from source. If you intend to use PyAnomaly ‘as-is’, we recommend pip. If you need to refer to the source code frequently, e.g., to add new firm characteristics or functions, you may want to copy the source to your project directory.

Using pip

pip install pyanomaly

After installing pyanomaly, download the mapping file and the examples.

From the source

Download the source from the link below to your project directory.

https://github.com/chulwoohan/pyanomaly

You need to install the required packages:

wrds

pandas

statsmodels

numba

openpyxl

matplotlib

scikit-learn

pyarrow

You can install these packages one by one or run setup.bat to install them all at once.
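If you install the packages one by one, a quick check like the sketch below can show which ones are still missing. It is not part of PyAnomaly; the helper name missing_packages is made up for illustration, and the only assumption beyond the list above is that scikit-learn is imported under the module name sklearn.

```python
from importlib.util import find_spec

# pip name -> import name for the required packages listed above.
# Note that scikit-learn is imported as "sklearn".
REQUIRED = {
    "wrds": "wrds",
    "pandas": "pandas",
    "statsmodels": "statsmodels",
    "numba": "numba",
    "openpyxl": "openpyxl",
    "matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
    "pyarrow": "pyarrow",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages that cannot be found in this environment."""
    return [pip_name for pip_name, module in required.items()
            if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```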

To confirm the package is installed correctly, try the following code:

>>> from pyanomaly.globals import *
>>> x = [1, 2, 2, 3, [1, 5], None]
>>> y = unique_list(x)
>>> print(y)
[1, 2, 3, 5]

We strongly discourage modifying the source, as it may be updated from time to time. If you have suggestions for changes, please contact us.

Generating Characteristics

Process Flow

The high-level process for generating firm characteristics is as follows.

Download data from WRDS and save them to files.
Generate factor portfolios: the factors will be used to generate firm characteristics based on factor regression.
Generate firm characteristics from each dataset (funda, fundq, crspm, crspd).
Merge the data and generate firm characteristics that require data from multiple datasets.
Save the result to a file.

It is worth noting that:

Data need to be downloaded only once. The data are saved in files and can be loaded later to generate firm characteristics.
The results can be saved at each step. This saves processing time when you test new characteristics. Consider a scenario where you add a new characteristic to FUNDA and need to generate it many times to validate the result. You can generate the other firm characteristics only once and save them. Then, you can load them from files instead of regenerating them every time you generate the new characteristic.
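The save-once, load-later idea in the notes above can be sketched generically as follows. This is only an illustration of the caching pattern, not PyAnomaly's own save and load mechanism; the helper load_or_generate and the file name in the usage comment are hypothetical.

```python
import pickle
from pathlib import Path


def load_or_generate(path, generate):
    """Load a cached result from `path`, or call `generate()` and cache its result."""
    path = Path(path)
    if path.exists():
        # A previous run already saved this step: reuse it.
        with path.open("rb") as f:
            return pickle.load(f)
    # First run: do the expensive work once and save it for later runs.
    result = generate()
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result


# Hypothetical usage: `generate_existing_chars` stands in for the slow step
# that produces the characteristics you are not currently changing.
# existing = load_or_generate("funda_chars.pkl", generate_existing_chars)
```

With this pattern, only the new characteristic is recomputed on each validation run, while the remaining results come from the saved file.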

Mapping File

The mapping file maps firm characteristics to the functions that implement them, and maps those functions to the characteristic names used in other papers. The file can be used to select the characteristics you want to generate and give them aliases. Part of the file is shown below.

description: A short description of the characteristic. When there are multiple implementations, the version is indicated in the description. For example, ‘Idiosyncratic volatility (GHZ)’ is GHZ’s implementation of idiosyncratic volatility and ‘Idiosyncratic volatility (Org, JKP)’ is JKP’s implementation. ‘Org’ indicates this version is closer to the original definition.
function: This column shows the associated functions (methods). If function is ‘idiovol’, the firm characteristic is implemented in the function (method) c_idiovol(): the actual function name always starts with c_. If function is missing, it means the corresponding firm characteristic is not yet available.
ghz, jkp, hxz, cz: These columns respectively show the aliases of the firm characteristics used in GHZ, JKP, HXZ, and CZ. If you set alias='ghz' when initializing FCPanel or its derived class, only the characteristics defined in the ‘ghz’ column will be generated. Similarly, setting alias to ‘jkp’, ‘hxz’, or ‘cz’ will generate the firm characteristics defined in these columns. If you set alias=None, all available firm characteristics will be generated.
my chars: You can add a new column in the file to define which characteristics to generate and their aliases. For example, if you add a column ‘my chars’ as shown in the table and set alias='my chars', only ‘Idiosyncratic volatility (Org, JKP)’ and ‘Illiquidity’ will be generated.

description	author	year	journal	function	ghz	jkp	hxz	cz	my chars
Idiosyncratic volatility (GHZ)	Ali, Hwang, and Trombley	2003	JFE	idiovol	idiovol
Idiosyncratic volatility (Org, JKP)	Ali, Hwang, and Trombley	2003	JFE	ivol_capm_252d		ivol_capm_252d	Iv	IdioVolAHT	ivol
Illiquidity	Amihud	2002	JFM	ami_126d	ill	ami_126d	Ami	Illiquidity	illq
Bid-ask spread	Amihud and Mendelson	1986	JFE	baspread	baspread			BidAskSpread
Three-year investment growth	Anderson and Garcia-Feijoo	2006	JF	capx_gr3		capx_gr3	3Ig	grcapx3y
Two-year investment growth	Anderson and Garcia-Feijoo	2006	JF	capx_gr2	grcapx	capx_gr2	2Ig	grcapx
Dispersion in analyst long-term growth forecasts	Anderson, Ghysels, and Juergens	2005	RFS				Dlg	ForecastDispersionLT
Downside beta	Ang, Chen, and Xing	2006	RFS	betadown_252d		betadown_252d	beta-1	DownsideBeta
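The selection rule described for the alias columns can be illustrated with a small sketch over a few rows of the table above. This is not PyAnomaly's implementation; the function select_characteristics is hypothetical, only a subset of columns is reproduced, and the choice to use the function name as the output name when alias=None is an assumption of this sketch.

```python
# A subset of the columns and rows from the mapping table above.
HEADER = ("description", "function", "ghz", "jkp", "my chars")
ROWS = [
    ("Idiosyncratic volatility (GHZ)", "idiovol", "idiovol", "", ""),
    ("Idiosyncratic volatility (Org, JKP)", "ivol_capm_252d", "", "ivol_capm_252d", "ivol"),
    ("Illiquidity", "ami_126d", "ill", "ami_126d", "illq"),
    ("Bid-ask spread", "baspread", "baspread", "", ""),
    ("Dispersion in analyst long-term growth forecasts", "", "", "", ""),
]


def select_characteristics(alias=None):
    """Map method names (with the c_ prefix) to output names for one alias column."""
    selected = {}
    for values in ROWS:
        row = dict(zip(HEADER, values))
        function = row["function"]
        if not function:
            continue  # no function: the characteristic is not yet available
        if alias is None:
            # Assumption of this sketch: keep the function name as the output name.
            selected["c_" + function] = function
        elif row[alias]:
            # Only rows with an entry in the chosen alias column are selected.
            selected["c_" + function] = row[alias]
    return selected


print(select_characteristics("my chars"))
# {'c_ivol_capm_252d': 'ivol', 'c_ami_126d': 'illq'}
```

Calling select_characteristics("ghz") on this excerpt would pick c_idiovol, c_ami_126d, and c_baspread, since those are the rows with an entry in the ‘ghz’ column.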

Output Files

FCPanel and its derived classes (FUNDA, FUNDQ, CRSPM, CRSPD, and Merge) have an attribute data, which is a DataFrame that contains the raw data and the firm characteristics. An exception is CRSPD. CRSPD.data only contains firm characteristics and the raw crspd data is stored in CRSPD.cd.data. CRSPD.cd is an object of CRSPDRaw, a class derived from FCPanel. We separate firm characteristics from the raw data in crspd because the raw data have a daily frequency, whereas the firm characteristics have a monthly frequency.

The column names of the firm characteristics are their function names (without c_). When data is saved to a file by calling FCPanel.save(), the column names will be replaced by their aliases. When a saved file is loaded back to a class by calling FCPanel.load(), the column names will be replaced by the function names. In summary, the column names of the firm characteristics are the function names in the data attribute, whereas the column names are the aliases in saved files.
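The column-name round trip between the data attribute and saved files can be pictured with plain pandas. The sketch below does not call FCPanel.save() or FCPanel.load(); the helpers to_saved_columns and to_memory_columns, the raw column ‘ret’, and the values are all made up for illustration, and the alias dictionary mirrors the ‘my chars’ column of the mapping table.

```python
import pandas as pd

# Function name -> alias, as would be read from one alias column of the mapping file.
ALIASES = {"ivol_capm_252d": "ivol", "ami_126d": "illq"}


def to_saved_columns(data, aliases=ALIASES):
    """Return a copy whose characteristic columns use aliases (as in a saved file)."""
    return data.rename(columns=aliases)


def to_memory_columns(saved, aliases=ALIASES):
    """Return a copy whose characteristic columns use function names again."""
    inverse = {alias: function for function, alias in aliases.items()}
    return saved.rename(columns=inverse)


# 'ret' is a hypothetical raw-data column; it has no alias and is left unchanged.
data = pd.DataFrame({
    "ret": [0.01, -0.02],
    "ivol_capm_252d": [0.2, 0.3],
    "ami_126d": [1.5, 2.5],
})
saved = to_saved_columns(data)
restored = to_memory_columns(saved)
print(list(saved.columns))     # ['ret', 'ivol', 'illq']
print(list(restored.columns))  # ['ret', 'ivol_capm_252d', 'ami_126d']
```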

The data attribute has a MultiIndex of ‘date’ and ‘permno’ in CRSPM, CRSPD, and Merge, whereas it has a MultiIndex of ‘datadate’ and ‘gvkey’ in FUNDA and FUNDQ. Once the data in FUNDA and FUNDQ are populated monthly, the index changes to ‘date’ and ‘gvkey’, and ‘datadate’ remains as a column. Note that the dates in ‘date’ are shifted to month-end to be compatible with ‘datadate’.
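As a rough picture of the index layout after monthly population, the sketch below builds a tiny DataFrame by hand, shifts ‘date’ to month-end, and indexes it by ‘date’ and ‘gvkey’. The rows and the gvkey value are hypothetical, and the use of pd.offsets.MonthEnd(0) is one common way to do the shift in pandas, not necessarily how PyAnomaly does it internally.

```python
import pandas as pd

# Hypothetical monthly rows for one firm; the first 'date' is not at month-end.
df = pd.DataFrame({
    "gvkey": ["001000", "001000"],
    "date": pd.to_datetime(["2020-01-15", "2020-02-29"]),
    "datadate": pd.to_datetime(["2019-12-31", "2019-12-31"]),
})

# MonthEnd(0) rolls a date forward to the end of its month and leaves
# dates that are already at month-end unchanged.
df["date"] = df["date"] + pd.offsets.MonthEnd(0)

# Index by ('date', 'gvkey'); 'datadate' stays as an ordinary column.
df = df.set_index(["date", "gvkey"])
print(df)
```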

The easiest way to get started is to go through the examples. The next section presents several examples to help you become familiar with PyAnomaly.