Installation

You can install PyAnomaly either using pip or from the source. If you intend to use PyAnomaly ‘’as-is’’, we recommend pip. If you need to refer to the source code frequently, e.g., to add new firm characteristics or functions, you may want to copy the source to your project directory.

Using pip

pip install pyassetpricing

After installing pyanomaly, download the mapping file and the examples.

From the source

Download the source from the link below to your project directory.

You need to install the required packages:

  • wrds

  • pandas

  • statsmodels

  • numba

  • openpyxl

  • matplotlib

  • scikit-learn

  • pyarrow

You can install these packages one by one or run setup.bat to install them at once.

To confirm the package is installed correctly, try the following code:

>>> from pyanomaly.globals import *
>>> x = [1, 2, 2, 3, [1, 5], None]
>>> y = unique_list(x)
>>> print(y)
[1, 2, 3, 5]

We strongly discourage changing the source as it can be updated from time to time. If you have suggestions of changes, please contact us.

Generating Characteristics

Process Flow

A high-level process to generate firm characteristics is as follows.

  1. Download data from WRDS and save them to files.

  2. Generate factor portfolios: the factors will be used to generate firm characteristics based on factor regression.

  3. Generate firm characteristics from each dataset (funda, fundq, crspm, crspd).

  4. Merge the data and generate firm characteristics that require data from multiple datasets.

  5. Save the result to a file.

_images/process.png

It is worth noting that:

  • Data need to be downloaded only once. The data are saved in files and can be loaded later to generate firm characteristics.

  • The results can be saved in each step. This can save processing time when you test new characteristics. Consider a scenario where you add a new characteristic in FUNDA and need to generate it many times to validate the result. You can generate other firm characteristics only once and save them. Then, you can load them from files instead of generating them every time you generate the new characteristic.

Mapping File

The mapping file defines the mapping between firm characteristics and functions, and functions and characteristic names used in other papers. The file can be used to select the characteristics you want to generate and give them aliases. A part of the file is shown below.

description

This is a short description of the characteristics. When there are multiple versions of implementation, it is indicated in the description. For example, ‘Idiosyncratic volatility (GHZ)’ is GHZ’s implementation of idiosyncratic volatility and ‘Idiosyncratic volatility (Org, JKP)’ is JKP’s implementation. ‘Org’ indicates this version is closer to the original definition.

function

This column shows the associated functions (methods). If function is ‘idiovol’, the firm characteristic is implemented in the function (method) c_idiovol(): the actual function name always starts with c_. If function is missing, it means the corresponding firm characteristic is not yet available.

ghz, jkp, hxz, cz

These columns respectively show the aliases of the firm characteristics used in GHZ, JKP, HXZ, and CZ. If you set alias='ghz' when initializing FCPanel or its derived class, only the characteristics defined in ‘ghz’ column will be generated. Similarly, setting alias to ‘jkp’, ‘hxz’, or ‘cz’ will generate firm characteristics defined in these columns. If you set alias=None, all available firm characteristics will be generated.

my chars

You can add a new column in the file to define which characteristics to generate and their aliases. For example, if you add a column ‘my chars’ as shown in the table and set alias='my chars', only ‘Idiosyncratic volatility (Org, JKP)’ and ‘Illiquidity’ will be generated.

description

author

year

journal

function

ghz

jkp

hxz

cz

my chars

Idiosyncratic volatility (GHZ)

Ali, Hwang, and Trombley

2003

JFE

idiovol

idiovol

Idiosyncratic volatility (Org, JKP)

Ali, Hwang, and Trombley

2003

JFE

ivol_capm_252d

ivol_capm_252d

Iv

IdioVolAHT

ivol

Illiquidity

Amihud

2002

JFM

ami_126d

ill

ami_126d

Ami

Illiquidity

illq

Bid-ask spread

Amihud and Mendelson

1986

JFE

baspread

baspread

BidAskSpread

Three-year investment growth

Anderson and Garcia-Feijoo

2006

JF

capx_gr3

capx_gr3

3Ig

grcapx3y

Two-year investment growth

Anderson and Garcia-Feijoo

2006

JF

capx_gr2

grcapx

capx_gr2

2Ig

grcapx

Dispersion in analyst long-term growth forecasts

Anderson, Ghysels, and Juergens

2005

RFS

Dlg

ForecastDispersionLT

Downside beta

Ang, Chen, and Xing

2006

RFS

betadown_252d

betadown_252d

beta-1

DownsideBeta

Output Files

FCPanel and its derived classes (FUNDA, FUNDQ, CRSPM, CRSPD, and Merge) have an attribute data, which is a DataFrame that contains the raw data and the firm characteristics. An exception is CRSPD. CRSPD.data only contains firm characteristics and the raw crspd data is stored in CRSPD.cd.data. CRSPD.cd is an object of CRSPDRaw, a class derived from FCPanel. We separate firm characteristics from the raw data in crspd because the raw data have a daily frequency, whereas the firm characteristics have a monthly frequency. The column names of the firm characteristics are their function names (without c_). When data is saved to a file by calling FCPanel.save(), the column names will be replaced by their aliases. When a saved file is loaded back to a class by calling FCPanel.load(), the column names will be replaced by the function names. In summary, the column names of the firm characteristics are the function names in the data attribute, whereas the column names are the aliases in saved files.

The data attribute has a MultiIndex of ‘date’ and ‘permno’ in CRSPM, CRSPD, and Merge, whereas it has a MultiIndex of ‘datadate’ and ‘gvkey` in FUNDA and FUNDQ. Once the data in FUNDA and FUNDQ are populated monthly, the index changes to ‘date’ and ‘gvkey’, and ‘datadate’ remains as a column. Note that the dates in ‘date’ are shifted to month-end to be compatible with ‘datadate’.

The easiest way to get started is going through examples. The next section presents several examples to help you get familiarized with PyAnomaly.