How to add new CondDensityEstimators, CauseClusterers, and EffectClusterers to CFL

While the CFL software package comes with pre-implemented models for conditional density estimation and clustering (written either by us or by scikit-learn), it has also been designed to make it easy to try out new models. There are two main steps:
- Make a Python class for your model that inherits from the appropriate abstract class (CDEModel, CCModel, or ECModel).
- Pass an instance of it to your Experiment.
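The first step depends on how abstract classes behave in Python. The sketch below illustrates the general pattern with the standard `abc` module. `StandInModel`, `IncompleteModel`, `CompleteModel`, and `can_instantiate` are hypothetical names used only for illustration; the sketch assumes the CFL base classes enforce their required methods in a similar way, which is not shown here.

```python
from abc import ABC, abstractmethod


class StandInModel(ABC):
    """Hypothetical stand-in for an abstract base class such as CDEModel."""

    @abstractmethod
    def train(self, dataset, prev_results=None):
        ...

    @abstractmethod
    def predict(self, dataset, prev_results=None):
        ...


class IncompleteModel(StandInModel):
    # predict() is deliberately missing, so this class stays abstract
    def train(self, dataset, prev_results=None):
        return {}


class CompleteModel(StandInModel):
    # every abstract method is overridden, so this class can be instantiated
    def train(self, dataset, prev_results=None):
        return {}

    def predict(self, dataset, prev_results=None):
        return {}


def can_instantiate(cls):
    """Return True if cls() succeeds, False if Python raises TypeError."""
    try:
        cls()
        return True
    except TypeError:
        return False
```

Under this pattern, a subclass that leaves out any required method fails as soon as it is instantiated, rather than later during training.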

[1]:
import numpy as np
from cfl import Experiment
from cfl.cond_density_estimation import CDEModel
[2]:
# generate toy data
data_info = {'X_dims' : (10000, 5),
             'Y_dims' : (10000, 3),
             'Y_type' : 'continuous'}
X = np.random.normal(size=data_info['X_dims'])
Y = np.random.normal(size=data_info['Y_dims'])
print(X.shape)
print(Y.shape)
(10000, 5)
(10000, 3)
[3]:
# Make a new conditional density estimator that inherits from CDEModel. Your
# class must implement all methods specified by CDEModel.
class MyCDE(CDEModel):
    def __init__(self, data_info, model_params):
        self.data_info = data_info
        self.model_params = model_params

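    def train(self, dataset, prev_results=None):
        # placeholder: random values shaped like Y stand in for a fitted estimate
        pyx = np.random.normal(size=dataset.get_Y().shape)
        return {'pyx' : pyx}

    def predict(self, dataset, prev_results=None):
        # placeholder: random values shaped like Y stand in for a prediction
        pyx = np.random.normal(size=dataset.get_Y().shape)
        return {'pyx' : pyx}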

    def load_model(self, path):
        pass

    def save_model(self, path):
        pass

    def get_model_params(self):
        return self.model_params
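MyCDE returns random values, which keeps the example short but does not estimate anything. As a rough sketch of what a non-trivial `train` and `predict` could look like, the class below fits the conditional mean of Y given X by ordinary least squares. To stay runnable without CFL installed, it does not inherit from CDEModel and uses a hypothetical `StandInDataset`. It also assumes the dataset object exposes a `get_X()` method alongside `get_Y()`. In a real experiment the class would inherit from CDEModel and also define `load_model` and `save_model`.

```python
import numpy as np


class StandInDataset:
    """Hypothetical minimal dataset exposing get_X() and get_Y()."""

    def __init__(self, X, Y):
        self._X, self._Y = X, Y

    def get_X(self):
        return self._X

    def get_Y(self):
        return self._Y


class LinearMeanCDE:
    """Sketch: estimates E[Y | X] with ordinary least squares."""

    def __init__(self, data_info, model_params):
        self.data_info = data_info
        self.model_params = model_params
        self.weights = None

    def _design(self, X):
        # append a column of ones so the fit includes an intercept
        return np.hstack([X, np.ones((X.shape[0], 1))])

    def train(self, dataset, prev_results=None):
        A = self._design(dataset.get_X())
        self.weights, *_ = np.linalg.lstsq(A, dataset.get_Y(), rcond=None)
        return {'pyx': A @ self.weights}

    def predict(self, dataset, prev_results=None):
        return {'pyx': self._design(dataset.get_X()) @ self.weights}

    def get_model_params(self):
        return self.model_params


# small synthetic check: Y is a noisy linear function of X
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
true_w = rng.normal(size=(5, 3))
Y_demo = X_demo @ true_w + 0.01 * rng.normal(size=(200, 3))

demo = StandInDataset(X_demo, Y_demo)
model = LinearMeanCDE(data_info={}, model_params={})
train_out = model.train(demo)
pred_out = model.predict(demo)
```

Both methods return a dictionary with a `'pyx'` entry holding one row per sample, matching the output format used by MyCDE.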
[4]:
# An instance of MyCDE can be passed in as the value for the 'model' key in
# CDE_params, instead of the string name of a pre-defined model.
CDE_params = {'model' : MyCDE(data_info, model_params={})}

CC_params =  {'model' : 'KMeans',
              'model_params' : {'n_clusters' : 2}}

block_names = ['CondDensityEstimator', 'CauseClusterer']
block_params = [CDE_params, CC_params]
my_exp = Experiment(X_train=X, Y_train=Y, data_info=data_info,
                    block_names=block_names, block_params=block_params, results_path=None)
Block: model_params not specified in input, defaulting to {}
Block: verbose not specified in input, defaulting to 1
Block: tune not specified in input, defaulting to False
Block: user_input not specified in input, defaulting to True
Block: verbose not specified in input, defaulting to 1
[5]:
my_exp.train()
#################### Beginning CFL Experiment training. ####################
Beginning CondDensityEstimator training...
CondDensityEstimator training complete.
Beginning CauseClusterer training...
CauseClusterer training complete.
Experiment training complete.
[5]:
{'CondDensityEstimator': {'pyx': array([[ 1.61736868, -0.83274787,  1.79892445],
         [ 0.93879797,  0.55829648, -1.65463966],
         [-0.53689253, -0.332663  ,  1.45926297],
         ...,
         [ 0.39492332, -0.39107461, -0.41794009],
         [-0.30334537, -1.12357842,  0.44379039],
         [-1.05910614, -0.58523058,  0.2388775 ]])},
 'CauseClusterer': {'x_lbls': array([0, 0, 1, ..., 0, 1, 1], dtype=int32)}}
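The returned dictionary is keyed by block name, and each block holds its own results dictionary. The sketch below shows one way to pull out the cluster labels and count the samples per cluster. It uses a small hand-built `results` dictionary with the same structure as the output above, so it runs without CFL; with a real experiment, the value returned by `my_exp.train()` would take its place. `cluster_sizes` is a name introduced here for illustration.

```python
import numpy as np

# stand-in for the dictionary returned by my_exp.train()
results = {
    'CondDensityEstimator': {'pyx': np.zeros((6, 3))},
    'CauseClusterer': {'x_lbls': np.array([0, 0, 1, 0, 1, 1], dtype=np.int32)},
}

pyx = results['CondDensityEstimator']['pyx']
x_lbls = results['CauseClusterer']['x_lbls']

# count how many samples were assigned to each cause cluster
labels, counts = np.unique(x_lbls, return_counts=True)
cluster_sizes = dict(zip(labels.tolist(), counts.tolist()))
print(cluster_sizes)
```

Because MyCDE returns random values, the cluster assignments in the run above carry no meaning; the same inspection becomes informative once a real estimator is plugged in.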