Changelog
v0.9 - 2022.01.15
Initial version.
v0.923 - 2022.01.16
Multiprocessing in datatools.populate() has been updated to improve speed.
v0.930 - 2022.01.17
The trend factor of Han, Zhou, and Zhu (2016) has been added. We thank Guofu Zhou for this suggestion.
v0.931 - 2022.01.23
Fixed a bug where FUNDA.c_ebitda_mev() did not return its result.
A new characteristic method for the enterprise multiple (Loughran and Wellman, 2011), c_enterprise_multiple(), has been added to FUNDA.
The existing method, c_ebitda_mev(), implements JKP’s SAS code, which uses a definition that differs from the original one.
The new method uses the original definition.
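As a reference point, the enterprise multiple is usually stated as enterprise value divided by EBITDA, with enterprise value taken as market equity plus total debt plus preferred stock minus cash and short-term investments. The sketch below assumes that definition; the function name, its arguments, and the choice to return nan for non-positive EBITDA are hypothetical and do not describe FUNDA.c_enterprise_multiple() itself.

```python
import math


def enterprise_multiple(market_equity, total_debt, preferred_stock, cash, ebitda):
    """Hypothetical sketch of EV / EBITDA.

    EV = market equity + total debt + preferred stock - cash and short-term investments.
    Returning nan for non-positive EBITDA is an assumption made for this sketch.
    """
    enterprise_value = market_equity + total_debt + preferred_stock - cash
    if ebitda <= 0:
        return math.nan
    return enterprise_value / ebitda


# Made-up inputs: EV = 800 + 300 + 0 - 100 = 1000, EBITDA = 125.
print(enterprise_multiple(800.0, 300.0, 0.0, 100.0, 125.0))  # → 8.0
```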
v0.932 - 2022.01.25
Bug fix for missing classes
When data does not have enough distinct values, some classes can be missing after the data is sorted into classes (quantiles). append_long_short() and time_series_average() in analytics.py have been revised to handle data with missing classes.
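To see how classes go missing, consider a characteristic with heavy ties sorted into five classes. The sketch assumes a simple breakpoint rule (a value's class is the number of quantile breakpoints strictly below it) and a long-short return defined as the mean return of the top class minus that of the bottom class; both helpers are hypothetical and are not the library's classify() or append_long_short(). With seven tied values, classes 1 and 2 receive no observations, so the long-short step has to check that the classes it needs exist instead of assuming all five are present.

```python
import math
from collections import defaultdict
from statistics import mean


def quantile_classes(values, n_classes):
    """Hypothetical rule: breakpoints at the k/n quantiles; class = #breakpoints below the value."""
    s = sorted(values)
    n = len(s)
    breaks = [s[math.ceil(k * n / n_classes) - 1] for k in range(1, n_classes)]
    return [sum(v > b for b in breaks) for v in values]


def long_short(returns, classes, n_classes):
    """Top-class mean minus bottom-class mean; nan if either class is missing."""
    groups = defaultdict(list)
    for r, c in zip(returns, classes):
        groups[c].append(r)
    top, bottom = groups.get(n_classes - 1), groups.get(0)
    if not top or not bottom:
        return math.nan
    return mean(top) - mean(bottom)


char = [0, 0, 0, 0, 0, 0, 0, 1, 2, 3]      # seven tied values
rets = [0.01] * 7 + [0.02, 0.03, 0.05]
cls = quantile_classes(char, 5)
print(sorted(set(cls)))                     # → [0, 3, 4]  (classes 1 and 2 are missing)
print(round(long_short(rets, cls, 5), 4))   # → 0.03
```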
v1.0 - 2024.02.28 (Major Update)
There are several important updates in this version, and some functions are not backward compatible. See the examples in the Cookbook for the changes.
Major updates
Performance upgrade
The library is now significantly faster and more memory efficient.
panel.FCPanel
The Panel class has been divided into two classes: Panel, which serves as the base class for panel data analysis, and FCPanel, which inherits Panel and serves as the base class for firm characteristics generation. FUNDA, FUNDQ, CRSPM, CRSPD, and Merge now inherit FCPanel instead of Panel.
characteristics.CRSPDRaw
Previously, CRSPD.data contained daily crspd data and CRSPD.chars contained monthly firm characteristics. In the new version, a new class, CRSPDRaw, handles daily crspd data and is a member of CRSPD: CRSPDRaw.data contains daily crspd data, and CRSPD.data contains monthly firm characteristics.
Factor models
Two new factor models, the Fama-French 5-factor model and the Stambaugh and Yuan 4-factor model, have been added.
CRSP-Compustat link
If a user does not have a WRDS subscription for ccmxpf_linktable, PyAnomaly creates a link table internally (‘crsp_comp_linktable’) and uses it to map permno to gvkey. Compared with using ccmxpf_linktable, about 13% of gvkeys differ when the internal link table is used.
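A simplified sketch of cusip-based matching, assuming CRSP carries an 8-character CUSIP (ncusip) while Compustat carries a 9-character CUSIP whose last character is a check digit, so the two are matched on the first 8 characters. The function name, the tuple layouts, and the sample identifiers are made up for illustration; the internal crsp_comp_linktable logic (for example, date ranges and primary-security selection) is not reproduced here.

```python
def link_by_cusip(crsp_rows, comp_rows):
    """Hypothetical sketch: map permno -> gvkey by the first 8 CUSIP characters.

    crsp_rows: iterable of (permno, ncusip) with 8-character CUSIPs
    comp_rows: iterable of (gvkey, cusip) with 9-character CUSIPs
    """
    # Index Compustat firms by the 8-character prefix of their CUSIP.
    # If several gvkeys share a prefix, the last one wins in this sketch.
    gvkey_by_cusip8 = {}
    for gvkey, cusip in comp_rows:
        if cusip:
            gvkey_by_cusip8[cusip[:8]] = gvkey

    # Look up each CRSP security; unmatched permnos are simply left out.
    link = {}
    for permno, ncusip in crsp_rows:
        gvkey = gvkey_by_cusip8.get(ncusip)
        if gvkey is not None:
            link[permno] = gvkey
    return link


crsp = [(10001, "36720410"), (10002, "05978R10")]        # made-up rows
comp = [("012994", "367204104"), ("999999", "123456789")]
print(link_by_cusip(crsp, comp))  # → {10001: '012994'}
```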
Minor updates
A default log directory has been added as config.log_dir.
The float datatype can be configured to float32 using set_config(float_type='float32').
A new file format, parquet, has been added. To change the file format to parquet, use set_config(file_format='parquet'). The default file format is pickle.
log.set_log_path() has been revised so that it can create a log file automatically from a file name.
datatools.classify() has been revised so that if the characteristic is a binary variable, the class is either 0 or (number of quantiles - 1). In the previous version, the class was not deterministic.
jkp.py has been renamed to factors.py.
analytics.rolling_beta() has been renamed to numba_support.rolling_regression().
panel.Panel.rolling_beta() has been renamed to panel.Panel.rolling_regression().
Input arguments have been changed in the following functions: datatools.classify(), datatools.trim(), datatools.filter(), and datatools.winsorize().
A new argument, fname, has been added to load_data() of FUNDA, FUNDQ, CRSPM, and CRSPD. If funda, fundq, crspm, and crspd data are modified (e.g., cleansed) and saved under different file names, those names can be given to read data from the modified files.
mapping.xlsx: New columns, original sample start date (sample_start) and original sample end date (sample_end), have been added.
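The deterministic rule for binary characteristics can be sketched as follows: when a characteristic takes exactly two distinct values, the lower value goes to class 0 and the higher value to the top class (number of quantiles - 1). The function name is hypothetical, and the rank-based fallback used for non-binary data is an assumption added only so the sketch is complete; it is not datatools.classify().

```python
def classify_sketch(values, n_classes):
    """Hypothetical sketch of a quantile classifier with a binary special case."""
    distinct = sorted(set(values))
    if len(distinct) == 2:
        # Binary characteristic: only the bottom and top classes are used.
        low = distinct[0]
        return [0 if v == low else n_classes - 1 for v in values]

    # Assumed fallback: equal-count classes by rank (ties broken by position).
    order = sorted(range(len(values)), key=lambda i: values[i])
    classes = [0] * len(values)
    for rank, i in enumerate(order):
        classes[i] = rank * n_classes // len(values)
    return classes


print(classify_sketch([1, 0, 0, 1, 1], 10))  # → [9, 0, 0, 9, 9]
print(classify_sketch([5, 1, 3, 2, 4], 5))   # → [4, 0, 2, 1, 3]
```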
New functions
analytics.grs_test(): GRS (Gibbons, Ross, and Shanken, 1989) test.
config.set_config(): Set library configuration.
config.get_config(): Get library configuration.
datatools.apply_to_groups(): Group data and apply a function to each group.
datatools.apply_to_groups_jit(): Group data and apply a function to each group (jitted version).
datatools.apply_to_groups_reduce_jit(): Group data and apply a reduce function to each group (jitted version).
numba_support.roll_sum(): Rolling sum.
numba_support.roll_mean(): Rolling mean.
numba_support.roll_std(): Rolling standard deviation.
numba_support.roll_var(): Rolling variance.
numba_support.rank(): Rank.
numba_support.bivariate_regression(): Bivariate regression.
numba_support.regression(): Multivariate regression.
numba_support.rolling_regression(): Rolling regression.
panel.Panel.apply_to_ids(): Apply a function to each id group.
panel.Panel.apply_to_dates(): Apply a function to each date group.
wrdsdata.WRDS.create_crsp_comp_linktable(): Create a CRSP-Compustat link table using cusip.
wrdsdata.WRDS.add_gvkey_to_crsp_cusip(): Add gvkey to m(d)sf and identify primary stocks using the internal link table.
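For reference, the GRS statistic behind analytics.grs_test() is commonly written in the multifactor form below, with T periods, N test assets, and L factors. Here \hat{\alpha} is the N-vector of estimated intercepts, \hat{\Sigma} the maximum-likelihood (divide-by-T) residual covariance, \bar{\mu} the L-vector of factor sample means, and \hat{\Omega} the maximum-likelihood factor covariance; the null hypothesis is \alpha = 0 under i.i.d. normal errors. Whether grs_test() uses exactly these estimators is an assumption here.

```latex
W = \frac{T-N-L}{N}\,
    \frac{\hat{\alpha}^{\top}\hat{\Sigma}^{-1}\hat{\alpha}}
         {1+\bar{\mu}^{\top}\hat{\Omega}^{-1}\bar{\mu}}
    \;\sim\; F_{N,\,T-N-L}
```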
Deprecated functions
characteristics.FUNDA.convert_to_monthly(): Use Panel.populate() instead.
characteristics.FUNDQ.convert_to_monthly(): Use Panel.populate() instead.
datatools.filter_n().
datatools.groupby_apply(): Use datatools.apply_to_groups(), datatools.apply_to_groups_jit(), or datatools.apply_to_groups_reduce_jit().
datatools.groupby_apply_np(): Use datatools.apply_to_groups(), datatools.apply_to_groups_jit(), or datatools.apply_to_groups_reduce_jit().
datatools.rolling_apply(): Use datatools.apply_to_groups(), datatools.apply_to_groups_jit(), or datatools.apply_to_groups_reduce_jit().
datatools.rolling_apply_np(): Use datatools.apply_to_groups(), datatools.apply_to_groups_jit(), or datatools.apply_to_groups_reduce_jit().
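The replacement functions follow a group-then-apply pattern. A pure-Python sketch of that pattern is below: rows are sorted by a key so that each group is contiguous (which itertools.groupby requires), and a function is applied to each group. The function name, signature, and row layout are hypothetical and do not reflect the actual arguments of datatools.apply_to_groups() or its jitted variants.

```python
from itertools import groupby
from operator import itemgetter


def apply_to_groups_sketch(rows, key_index, func):
    """Hypothetical sketch: group rows by rows[i][key_index] and apply func to each group."""
    key = itemgetter(key_index)
    rows_sorted = sorted(rows, key=key)  # groupby only merges adjacent rows
    return {k: func(list(g)) for k, g in groupby(rows_sorted, key=key)}


# Made-up (permno, month, ret) rows; compute the mean return per permno.
rows = [(2, "2020-01", 0.04), (1, "2020-01", 0.01), (1, "2020-02", 0.03), (2, "2020-02", 0.02)]
result = apply_to_groups_sketch(rows, 0, lambda g: sum(r[2] for r in g) / len(g))
print({k: round(v, 4) for k, v in result.items()})  # → {1: 0.02, 2: 0.03}
```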
Bug fix
characteristics.FUNDA.c_currat(): Fixed a bug where the result was not returned.
characteristics.FUNDQ.c_ni_inc8q(): In the previous version, dibq (the difference of ibq) was set to nan in the first 4 quarters, which made some valid ni_inc8q values in the first 12 quarters nan. In the new version, all nan values of dibq are set to 0 before ni_inc8q is calculated, and ni_inc8q is then set to nan where the original dibq is nan. The revised logic no longer loses valid ni_inc8q values in the first 12 quarters.
characteristics.CRSPD.zero_trades_21d(): Fixed division by zero when the total turnover is 0. When counting the number of days in a month, only days with non-nan turnover are now counted; previously, all days were counted.
characteristics.CRSPD.c_zero_trades_126d(): Fixed division by zero when the total turnover is 0.
characteristics.CRSPD.c_zero_trades_252d(): Fixed division by zero when the total turnover is 0.
characteristics.CRSPD.c_rmax5_21d(): Fixed a bug that occurred when there are only a few distinct return values in a month. Suppose the return is positive on two days and 0 on the other days. Previously, rmax5_21d was the mean of the two positive returns; in the new version, it is the mean of the two positive returns and three 0 returns. In addition, if there are 5 or fewer days with valid (non-nan) returns, the result is nan.
characteristics.Merge.age(): In the previous version, age was the maximum of the funda history and the crspm history. This logic can make age decrease when funda history is missing: if funda data exists from 2000.01 to 2020.12 and crsp data from 2001.01 to 2022.12, age decreases in 2021.01. The logic has been revised so that age does not decrease when funda data is missing.
panel.Panel.rolling(): When lag > 0, shifted rows were not properly removed. This bug has been fixed.
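The revised c_rmax5_21d() rule described above can be sketched for a single stock-month: drop nan returns, return nan if 5 or fewer valid days remain, and otherwise average the five largest returns, with tied zeros counted like any other value. The function name and the plain-list input are hypothetical; only the rule itself comes from this changelog.

```python
import math


def rmax5_sketch(daily_returns):
    """Hypothetical sketch: mean of the 5 largest valid daily returns in a month."""
    valid = [r for r in daily_returns if not math.isnan(r)]
    if len(valid) <= 5:
        return math.nan  # 5 or fewer valid days -> nan
    top5 = sorted(valid, reverse=True)[:5]  # zeros are kept, so ties still fill 5 slots
    return sum(top5) / 5


month = [0.02, 0.01] + [0.0] * 19  # two positive days, 19 zero-return days
print(round(rmax5_sketch(month), 6))  # → 0.006 (the old logic gave 0.015)
```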
v1.01 - 2024.03.13
Minor bug fix in analytics.time_series_average().