Changelog

v0.9 - 2022.01.15

Initial version.

v0.923 - 2022.01.16

Multiprocessing in datatools.populate() has been updated to increase the speed.

v0.930 - 2022.01.17

The trend factor of Han, Zhou, and Zhu (2016) has been added. We thank Guofu Zhou for this suggestion.

v0.931 - 2022.01.23

A bug of not returning the result in FUNDA.c_ebitda_mev() has been fixed.

A new characteristic method for Enterprise multiple (Loughran and Wellman, 2011), c_enterprise_multiple(), has been added to FUNDA, as the previous one (c_ebitda_mev()) that implements JKP’s SAS code uses a different definition from the original definition. This new method uses the original definition.

v0.932 - 2022.01.25

Bug fix for missing classes

When data has not enough distinct values, some classes can be missing when data is sorted into classes (quantiles). The append_long_short() and time_series_average() in analytics.py have been revised to handle data with missing classes.

v1.0 - 2024.02.28 (Major Update)

There are several important updates in this version and some functions are not backward compatible. See the examples in Cookbook for changes.

Major updates

Performance upgrade

The library is now significantly faster and more memory efficient.
panel.FCPanel

The Panel class has been divided into two classes: Panel class that serves as the base class for panel data analysis and FCPanel class that inherits Panel and serves as the base class for firm characteristics generation. FUNDA, FUNDQ, CRSPM, CRSPD, and Merge now inherit FCPanel instead of Panel.
characteristics.CRSPDRaw

Previously, CRSPD.data contained daily crspd data and CRSPD.chars contained monthly firm characteristics. In the new version, a new class CRSPDRaw handles daily crspd data and is a member of CRSPD. CRSPDRaw.data contains daily crspd data and CRSPD.data contains monthly firm characteristics.
Factor models

Two new factor models, Fama-French 5-factor and Stambough and Yuan 4-factor models, have been added.
CRSP-Compustat link

If a use don’t have WRDS subscription for ccmxpf_linktable, PyAnomlay will create a link table internally and use it to map permno and gvkey. Compared to using ccmxpf_linktable, about 13% of gvkey’s are different when using the internal link table (‘crsp_comp_linktable’).

Minor updates

Default log directory has been added as config.log_dir.
Float datatype can be configured to float32 using set_config(float_type='float32').
New file format, parquet, has been added. To change the file format to parquet, do set_config(file_format='parquet'). The default file format is pickle.
log.set_log_path() has been revised so that it can create a log file automatically from a file name.
datatools.classify() has been revised so that if the characteristic is a binary variable, the class is either 0 or (number of quantiles - 1). In the previous version, the class was not deterministic.
jkp.py has been renamed as factors.py.
analytics.rolling_beta() has been renamed as numba_support.rolling_regression().
panel.Panel.rolling_beta() has been renamed as panel.Panel.rolling_regression().
Input arguments have been changed in the following functions.
- datatools.classify()
- datatools.trim()
- datatools.filter()
- datatools.winsorize()
A new argument fname has been added to load_data() of FUNDA, FUNDQ, CRSPM, and CRSPD. If funda, fundq, crspm, and crspd data are modified (e.g., cleansed) and saved with different file names, those names can be given to read data from those modified data files.
mapping.xlsx: New columns, original sample start date (sample_start) and original sample end date (sample_end), have been added.

New functions

analytics.grs_test(): GRS (Gibbons, Ross, and Shanken, 1989) test.

config.set_config(): Set library configuration.

config.get_config(): Get library configuration.

datatools.apply_to_groups(): Group data and apply a function to each group.

datatools.apply_to_groups_jit(): Group data and apply a function to each group (jitted version).

datatools.apply_to_groups_reduce_jit(): Group data and apply a reduce function to each group (jitted version).

numba_support.roll_sum(): Rolling sum.

numba_support.roll_mean(): Rolling mean.

numba_support.roll_std(): Rolling standard deviation.

numba_support.roll_var(): Rolling variance.

numba_support.rank(): Rank.

numba_support.bivariate_regression(): Bivariate regression.

numba_support.regression(): Multivariate regression.

numba_support.rolling_regression(): Rolling regression.

panel.Panel.apply_to_ids(): Apply a function to each id group.

panel.Panel.apply_to_dates(): Apply a function to each date group.

wrdsdata.WRDS.create_crsp_comp_linktable(): Create a CRSP-Compustat link table using cusip.

wrdsdata.WRDS.add_gvkey_to_crsp_cusip(): Add gvkey to m(d)sf and identify primary stocks using internal link table.

Deprecated functions

characteristics.FUNDA.convert_to_monthly(): Use Panel.populate() instead.

characteristics.FUNDQ.convert_to_monthly(): Use Panel.populate() instead.

datatools.filter_n().

datatools.groupby_apply(): Use datatools.apply_to_groups(), datatools.apply_to_groups_jit(), or datatools.apply_to_groups_reduce_jit().

datatools.groupby_apply_np(): Use datatools.apply_to_groups(), datatools.apply_to_groups_jit(), or datatools.apply_to_groups_reduce_jit().

datatools.rolling_apply(): Use datatools.apply_to_groups(), datatools.apply_to_groups_jit(), or datatools.apply_to_groups_reduce_jit().

datatools.rolling_apply_np(): Use datatools.apply_to_groups(), datatools.apply_to_groups_jit(), or datatools.apply_to_groups_reduce_jit().

Bug fix

characteristic.FUNDA.c_currat(): A bug of not returning the result has been fixed.

characteristics.FUNDQ.c_ni_inc8q(): In the previous version, dibq (difference of ibq) was set to nan in the first 4 quarters. This made some valid ni_inc8q in the first 12 quarters become nan. In the new version, we set all nan values of dibq to 0 before calculating ni_inc8q and ni_inc8q is set to nan if dibq is nan. The revised logic does not lose valid ni_inc8q in the first 12 quarters.

characteristic.CRSPD.zero_trades_21d(): Fixed dividing by 0 when the total turnover is 0. When counting the number of days in a month, only the days when turnover is not nan are counted. Before, all days were counted.

characteristic.CRSPD.c_zero_trades_126d(): Fixed dividing by 0 when the total turnover is 0.

characteristic.CRSPD.c_zero_trades_252d(): Fixed dividing by 0 when the total turnover is 0.

characteristic.CRSPD.c_rmax5_21d(): A bug when there are only a few distinct return values in a month has been fixed. Suppose the return is positive in two days and 0 in the other days. Previously, rmax5_21d was the mean of the two positive returns. In the new version, it is the mean of the two positive returns and three 0 returns. Also, if days of valid returns (not nan) are fewer than or equal to 5, the result is nan.

characteristic.Merge.age(): In the previous version, age was the max of (funda history, crspm history). This logic can make the age decrease when funda history is missing: if funda data exists from 2000.01 to 2020.12 and crsp data from 2001.01 to 2022.12, the age will decrease in 2021.01. The logic has been revised so that the age doesn’t decrease when funda data is missing.

panel.Panel.rolling(): When lag > 0, shifted rows were not properly removed. This bug has been fixed.

v1.01 - 2024.03.13

Minor bug fix in analytics.time_series_average().