cfl.cond_density_estimation package

Submodules

cfl.cond_density_estimation.cde_model module

class cfl.cond_density_estimation.cde_model.CDEModel(data_info, model_params)

Bases: object

This is an abstract class defining the type of model that can be passed into a CondDensityEstimator Block. If you build your own CDE model to pass into CondDensityEstimator, you should inherit from CDEModel to ensure that you have specified all functionality required to interface properly with the CFL pipeline. CDEModel specifies the following required methods:

__init__, train, predict, load_model, save_model, get_model_params

abstract __init__(data_info, model_params)

Do any setup required for your model here.

Parameters:
  • data_info (dict) – a dictionary containing information about the data that will be passed in. Should contain - ‘X_dims’ key with a tuple value specifying shape of X, - ‘Y_dims’ key with a tuple value specifying shape of Y, - ‘Y_type’ key with a string value specifying whether Y is ‘continuous’ or ‘categorical’.

  • model_params (dict) – dictionary containing parameters for the model. This is a way for users to specify any modifiable parts of your model.

Returns: None
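As a minimal sketch of the data_info dictionary described above, the following assumes 1,000 samples with 5 features in X and 2 continuous features in Y. The shapes and the check_data_info helper are illustrative assumptions and are not part of cfl.

```python
# Hypothetical shapes: 1000 samples, 5 features in X, 2 features in Y.
data_info = {
    "X_dims": (1000, 5),      # shape of X
    "Y_dims": (1000, 2),      # shape of Y
    "Y_type": "continuous",   # or "categorical"
}


def check_data_info(info):
    """Illustrative helper (not part of cfl): validate the documented keys."""
    for key in ("X_dims", "Y_dims", "Y_type"):
        assert key in info, f"data_info is missing required key {key!r}"
    assert isinstance(info["X_dims"], tuple), "X_dims should be a tuple"
    assert isinstance(info["Y_dims"], tuple), "Y_dims should be a tuple"
    assert info["Y_type"] in ("continuous", "categorical")
    return True


check_data_info(data_info)
```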

abstract get_model_params()

Return the specified parameters for self.model.

Parameters:

None

Returns:

dictionary of model parameters

Return type:

dict

abstract load_model(path)

Load model saved at path and set self.model to it.

Parameters:

path (str) – file path to saved weights.

Returns:

None

abstract predict(dataset, prev_results=None)

Predict P(Y|X) for samples in dataset.get_X() using the self.model trained by self.train.

Parameters:
  • dataset (Dataset) – a Dataset object to generate predictions on. X and Y can be retrieved using dataset.get_X(), dataset.get_Y().

  • prev_results (dict) – an optional dictionary of variables to feed into prediction. CondDensityEstimators don’t require variable input, so this is here for uniformity across the repo.

Returns:

a dictionary of results from prediction. A CauseClusterer, which will generally follow a CondDensityEstimator, will receive this dictionary through its prev_results argument and expect it to contain ‘pyx’ as a key whose value is the estimate of P(Y|X) for all samples in dataset.get_X(). Other artifacts can be returned through this dictionary as well if desired.

Return type:

dict

abstract save_model(path)

Save self.model to the specified file path.

Parameters:

path (str) – path to save to.

Returns:

None

abstract train(dataset, prev_results=None)

Train your model with a given dataset and return an estimate of the conditional probability P(Y|X).

Parameters:
  • dataset (Dataset) – a Dataset object to train the model with. X and Y can be retrieved using dataset.get_X(), dataset.get_Y().

  • prev_results (dict) – an optional dictionary of variables to feed into training. CondDensityEstimators don’t require variable input, so this is here for uniformity across the repo.

Returns:

a dictionary of results from training. A CauseClusterer, which will generally follow a CondDensityEstimator, will receive this dictionary through its prev_results argument and expect it to contain ‘pyx’ as a key whose value is the estimate of P(Y|X) for all samples in dataset.get_X(). Other artifacts can be returned through this dictionary as well if desired.

Return type:

dict
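The required methods listed for CDEModel can be sketched as a toy subclass. To keep the example runnable without cfl installed, it defines a stand-in abstract base (CDEModelSketch) and a minimal ToyDataset; both are assumptions made for illustration, as is the MeanCDE model, which simply predicts the training mean of Y for every sample.

```python
import json
from abc import ABC, abstractmethod


class CDEModelSketch(ABC):
    """Stand-in for cfl's CDEModel so this sketch runs without cfl installed."""

    @abstractmethod
    def __init__(self, data_info, model_params): ...

    @abstractmethod
    def train(self, dataset, prev_results=None): ...

    @abstractmethod
    def predict(self, dataset, prev_results=None): ...

    @abstractmethod
    def load_model(self, path): ...

    @abstractmethod
    def save_model(self, path): ...

    @abstractmethod
    def get_model_params(self): ...


class ToyDataset:
    """Hypothetical minimal Dataset exposing get_X() and get_Y()."""

    def __init__(self, X, Y):
        self._X, self._Y = X, Y

    def get_X(self):
        return self._X

    def get_Y(self):
        return self._Y


class MeanCDE(CDEModelSketch):
    """Toy model: predicts the per-feature training mean of Y for every sample."""

    def __init__(self, data_info, model_params):
        self.data_info = data_info
        self.model_params = model_params
        self.model = None  # will hold the per-feature mean of Y

    def train(self, dataset, prev_results=None):
        Y = dataset.get_Y()
        n_features = len(Y[0])
        self.model = [sum(row[j] for row in Y) / len(Y) for j in range(n_features)]
        return self.predict(dataset)

    def predict(self, dataset, prev_results=None):
        assert self.model is not None, "call train() or load_model() first"
        # One estimate per sample, returned under the 'pyx' key.
        return {"pyx": [list(self.model) for _ in dataset.get_X()]}

    def save_model(self, path):
        with open(path, "w") as f:
            json.dump(self.model, f)

    def load_model(self, path):
        with open(path) as f:
            self.model = json.load(f)

    def get_model_params(self):
        return self.model_params


dataset = ToyDataset(X=[[0.0], [1.0], [2.0]], Y=[[1.0], [2.0], [3.0]])
cde = MeanCDE({"X_dims": (3, 1), "Y_dims": (3, 1), "Y_type": "continuous"}, {})
results = cde.train(dataset)
print(results["pyx"])
```

Because every abstract method is overridden, MeanCDE can be instantiated; omitting any one of them would make instantiation fail with a TypeError, which is the kind of check that inheriting from an abstract base provides.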

cfl.cond_density_estimation.condDensityEstimator module

class cfl.cond_density_estimation.condDensityEstimator.CondDensityEstimator(data_info, block_params)

Bases: Block

This class inherits Block to define a Block subtype for conditional density estimation. It takes in specifications for a particular conditional density estimation model, manages its instantiation, and serves as an interface between Experiment training/prediction calls and the model itself.

data_info

information about the data being trained with / predicted on

block_params

parameters to define the model

name

name of the Block type

model

the conditional density estimation model

_create_model()

given self.block_params, build the CDE model

get_block_params()

return self.block_params

_get_default_block_params()

return values for block_params to default to if unspecified

train()

train a model to estimate P(Y|X=x) from X,Y

predict()

predict P(Y|X=x) given a new sample x

save_block()

save the state of the object

load_block()

load the state of the object from a specified file path

__init__(data_info, block_params)

Initialize CondDensityEstimator.

Parameters:
  • data_info (dict) – dict with information about the dataset shape

  • block_params (dict) – a set of parameters specifying a CDE. The ‘model’ key must be specified and can either be the name of a cfl.cond_density_estimation model, or an instantiated CDE model object that follows the cfl.cond_density_estimation.cde_model.CDEModel interface. Hyperparameters for the model may be specified through the ‘model_params’ dictionary.

Returns: None

_create_model()

Return a conditional density estimator model as specified by self.block_params. If self.block_params[‘model’] is a string, it will try to instantiate a built-in cfl.cond_density_estimation model with the same name. Otherwise, it will treat the value of self.block_params[‘model’] as the instantiated model.

Parameters:

None

Returns:

a model that implements conditional density estimation and follows the cde_model interface.

Return type:

type varies
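The string-versus-instance behaviour described for _create_model can be sketched as follows. This is not the cfl implementation: ToyRidge, the BUILT_IN_MODELS registry, and the create_model function are hypothetical names introduced only to show the dispatch.

```python
class ToyRidge:
    """Hypothetical stand-in for a built-in model class."""

    def __init__(self, data_info, model_params):
        self.data_info = data_info
        self.model_params = model_params


# Hypothetical registry mapping built-in model names to classes.
BUILT_IN_MODELS = {"ToyRidge": ToyRidge}


def create_model(data_info, block_params):
    """Resolve block_params['model'] to a model instance."""
    model = block_params["model"]
    if isinstance(model, str):
        # A string names a built-in model: look it up and instantiate it.
        assert model in BUILT_IN_MODELS, f"unknown built-in model {model!r}"
        return BUILT_IN_MODELS[model](data_info, block_params.get("model_params", {}))
    # Anything else is treated as an already-instantiated model.
    return model


data_info = {"X_dims": (10, 2), "Y_dims": (10, 1), "Y_type": "continuous"}

# Case 1: the model is given by name, with hyperparameters in 'model_params'.
by_name = create_model(data_info, {"model": "ToyRidge", "model_params": {"alpha": 1.0}})

# Case 2: the model is passed in already instantiated.
custom = ToyRidge(data_info, {})
by_instance = create_model(data_info, {"model": custom})
```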

_get_default_block_params()

Private method that specifies default CDE parameters.

Parameters:

None

Returns:

dictionary of parameter names (keys) and values (values)

Return type:

dict

get_block_params()

Get parameters for this CDE Block.

Parameters:

None

Returns:

dictionary of parameter names (keys) and values (values)

Return type:

dict

load_block(file_path)

Wrapper to load model.

Parameters:

file_path (str) – file path to block to load

Returns: None

predict(dataset, prev_results)

Wrapper to generate model predictions.

Parameters:
  • dataset (Dataset) – Dataset object containing the X, Y data to generate predictions for.

  • prev_results (dict) – dictionary that contains any results generated in previous Block in Experiment pipeline (usually none for CDE).

Returns:

results generated by the model

Return type:

dict

save_block(file_path)

Wrapper to save model.

Parameters:

file_path (str) – file path to save block to

Returns: None

train(dataset, prev_results)

Wrapper to train model.

Parameters:
  • dataset (Dataset) – Dataset object containing the X, Y data to train on.

  • prev_results (dict) – dictionary that contains any results generated in previous Block in Experiment pipeline (usually none for CDE).

Returns:

results generated by the model

Return type:

dict
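The role of prev_results in a pipeline of Blocks can be sketched with two toy blocks trained in sequence. Everything here is hypothetical (ToyDataset, ToyCDEBlock, ToyClustererBlock, run_pipeline, and the ‘x_lbls’ key); the only detail taken from this documentation is that the results dictionary of a CDE block carries ‘pyx’ and is handed to the next block through prev_results.

```python
class ToyDataset:
    """Hypothetical minimal Dataset exposing get_X()."""

    def __init__(self, X):
        self._X = X

    def get_X(self):
        return self._X


class ToyCDEBlock:
    """Hypothetical CDE block: returns one toy 'pyx' estimate per sample."""

    def train(self, dataset, prev_results=None):
        # prev_results is accepted for a uniform signature but not used.
        return {"pyx": [[x[0] / 2.0] for x in dataset.get_X()]}


class ToyClustererBlock:
    """Hypothetical downstream block that reads 'pyx' from prev_results."""

    def train(self, dataset, prev_results=None):
        assert prev_results is not None and "pyx" in prev_results
        # Label each sample by thresholding its estimate (illustrative only).
        return {"x_lbls": [int(p[0] >= 0.5) for p in prev_results["pyx"]]}


def run_pipeline(blocks, dataset):
    """Train blocks in order, feeding each block's results to the next."""
    prev_results = None
    all_results = []
    for block in blocks:
        prev_results = block.train(dataset, prev_results)
        all_results.append(prev_results)
    return all_results


dataset = ToyDataset([[0.0], [1.0], [2.0]])
all_results = run_pipeline([ToyCDEBlock(), ToyClustererBlock()], dataset)
```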

cfl.cond_density_estimation.condExpBase module

class cfl.cond_density_estimation.condExpBase.CondExpBase(data_info, model_params)

Bases: CDEModel

A class to define, train, and perform inference with conditional density estimators that fall under the “conditional expectation” umbrella. This subset of conditional density estimators (referred to as ‘CondExp’) learns E[P(Y|X)] instead of the full conditional distribution. This base class implements all functions needed for training and prediction, and supplies a model architecture that can be overridden by children of this class. In general, if you would like to use a CondExp CDE for your CFL pipeline, it is easiest to either 1) use the CondExpDIY child class of CondExpBase, which allows you to define your network through a function specified in model_params, 2) use the CondExpMod child class, which allows you to pass in limited architecture specifications through the params attribute, or 3) inherit from this class and override the methods you would like to modify.
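To illustrate the difference between a conditional expectation and a full conditional distribution, the sketch below assumes the estimate amounts to one expected value of Y per value of X. For discrete X this reduces to averaging Y within each group. The conditional_expectation function is illustrative plain Python and is unrelated to the neural networks these classes build.

```python
from collections import defaultdict


def conditional_expectation(xs, ys):
    """Estimate E[Y | X = x] for discrete x by averaging y within each group."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    return {x: sum(vals) / len(vals) for x, vals in groups.items()}


xs = [0, 0, 1, 1, 1]
ys = [1.0, 3.0, 2.0, 4.0, 6.0]

expectations = conditional_expectation(xs, ys)

# One estimate per sample, in the same layout as a 'pyx' result.
pyx = [[expectations[x]] for x in xs]
```

The spread of Y within each group (for example, 2.0, 4.0 and 6.0 for x = 1) is discarded; only the mean survives, which is what distinguishes this family from estimators of the full distribution.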

name

name of the model so that the model type can be recovered from saved parameters (str)

Type:

str

data_info

dict with information about the dataset shape

Type:

dict

default_params

default parameters to fill in if user doesn’t provide a given entry

Type:

dict

model_params

parameters for the CDE that are passed in by the user and corrected by _check_format_model_params

Type:

dict

trained

whether or not the model has been trained yet. This can happen either by instantiating the class and calling train, or by passing in a path to saved weights from a previous training session through model_params[‘weights_path’].

Type:

bool

model

tensorflow model for this CDE

Type:

tf.keras.models.Sequential

get_model_params()

return self.model_params

load_model()

load everything needed for this CondExpBase model

save_model()

save the current state of this CondExpBase model

train()

train the neural network on a given Dataset

_graph_results()

helper function to graph training and validation loss

predict()

once the model is trained, predict for a given Dataset

load_network()

load tensorflow network weights from a file into self.network

save_network()

save the current weights of self.network

_build_network()

create and return a tensorflow network

_check_format_model_params()

check dimensionality of provided parameters and fill in any missing parameters with defaults.

__init__(data_info, model_params)

Initialize model and define network.

Parameters:
  • data_info (dict) – a dictionary containing information about the data that will be passed in. Should contain ‘X_dims’, ‘Y_dims’, and ‘Y_type’ as keys.

  • model_params (dict) – dictionary containing parameters for the model.

  • model (str) – name of the model so that the model type can be recovered from saved parameters.

Returns:

None

abstract _build_network()

Define the neural network based on specifications in self.model_params.

Parameters:

None

Returns:

untrained network specified in self.model_params.

Return type:

tf.keras.models.Model

abstract _check_format_model_params()

Make sure all required model_params are specified and of appropriate dimensionality. Replace any missing model_params with defaults, and resolve any simple dimensionality issues if possible.

Parameters:

None

Returns:

None

Raises:

AssertionError – if params are misspecified and can’t be automatically fixed.
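A parameter check of this kind can be sketched as below. The keys dense_units, activations, and n_epochs are assumptions chosen for illustration, and unlike the documented method (which returns None) this standalone function returns the corrected dictionary so that it can be run on its own.

```python
def check_format_model_params(model_params, default_params):
    """Fill in missing params with defaults and fix simple shape mismatches."""
    params = dict(default_params)
    params.update(model_params)  # user-specified values win over defaults

    n_layers = len(params["dense_units"])

    # Simple dimensionality fix: broadcast a single activation to every layer.
    if isinstance(params["activations"], str):
        params["activations"] = [params["activations"]] * n_layers

    # Anything that cannot be fixed automatically is reported as an error.
    assert len(params["activations"]) == n_layers, (
        "activations must have one entry per dense layer"
    )
    return params


defaults = {"dense_units": [50, 1], "activations": ["relu", "linear"], "n_epochs": 20}
params = check_format_model_params(
    {"dense_units": [32, 16, 1], "activations": "relu"}, defaults
)
```

Here n_epochs is filled from the defaults, the single activation string is expanded to three entries, and a list of the wrong length would raise an AssertionError instead.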

abstract _get_default_model_params()

Returns the default parameters specific to this type of model.

Parameters:

None

Returns:

dictionary of default parameters

Return type:

dict

_graph_results(train_loss, val_loss, show=True)

Graph training and validation loss across training epochs.

Parameters:
  • train_loss (np.ndarray) – (n_epochs,) array of training losses per epoch.

  • val_loss (np.ndarray) – (n_epochs,) array of validation losses per epoch.

  • show (bool) – displays figure if show=True. Defaults to True.

Returns:

figure object.

Return type:

matplotlib.pyplot.figure

get_model_params()

Get parameters for this CDE model.

Parameters:

None

Returns:

dictionary of parameter names (keys) and values (values)

Return type:

dict

load_model(path)

Load model saved at path into this model.

Parameters:

path (str) – path to saved weights.

Returns:

None

load_network(file_path)

Load network weights from saved checkpoint into current network.

Parameters:

file_path (str) – path to checkpoint file

Returns:

None

predict(dataset, prev_results=None)

Given a Dataset of microvariable observations, estimate macrovariable states.

Parameters:
  • dataset (Dataset) – Dataset object containing X and Y data to estimate macrovariable states for.

  • prev_results (dict) – dictionary that contains any results generated in previous Block in Experiment pipeline (usually none for CDE).

Returns:

dictionary of prediction results. Specifically, this dictionary will contain pyx, the predicted conditional probabilities for the given Dataset.

Return type:

dict

save_model(path)

Save trained model to specified path.

Parameters:

path (str) – path to save to.

Returns:

None

save_network(file_path)

Save network weights from current network.

Parameters:

file_path (str) – path to checkpoint file

Returns:

None

train(dataset, prev_results=None)

Full training loop. Constructs tf.data.Dataset objects for training and testing, updates model weights each epoch, and evaluates on the test set periodically.

Parameters:
  • dataset (Dataset) – Dataset object containing X and Y data for this training run.

  • prev_results (dict) – dictionary that contains any results generated in previous Block in Experiment pipeline (usually none for CDE).

Returns:

dictionary of CDE training results. Specifically, this will contain pyx, the predicted conditional probabilities for the training dataset.

Return type:

dict
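The general shape of such a loop (split the data, update weights each epoch, track training and held-out loss) can be sketched without TensorFlow. The example below swaps in plain-Python gradient descent on a one-dimensional linear model and evaluates the held-out loss every epoch rather than periodically; all names and hyperparameters are assumptions made for illustration.

```python
import random


def mse(w, b, data):
    """Mean squared error of the line y = w*x + b on (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train_loop(data, n_epochs=200, lr=0.05, val_fraction=0.25, seed=0):
    """Fit y ~ w*x + b by gradient descent, tracking train/validation loss."""
    rng = random.Random(seed)
    data = list(data)
    rng.shuffle(data)
    n_val = max(1, int(len(data) * val_fraction))
    val, train = data[:n_val], data[n_val:]

    w, b = 0.0, 0.0
    train_loss, val_loss = [], []
    for _ in range(n_epochs):
        # Full-batch gradient of the mean squared error.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in train) / len(train)
        grad_b = sum(2 * (w * x + b - y) for x, y in train) / len(train)
        w -= lr * grad_w
        b -= lr * grad_b
        # Record one loss value per epoch for each split.
        train_loss.append(mse(w, b, train))
        val_loss.append(mse(w, b, val))
    return (w, b), train_loss, val_loss


# Noiseless toy data generated from y = 2x + 1.
data = [(x / 10.0, 2.0 * (x / 10.0) + 1.0) for x in range(20)]
(w, b), train_loss, val_loss = train_loop(data)
```

The two per-epoch loss lists have the (n_epochs,) layout that _graph_results expects for train_loss and val_loss.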

cfl.cond_density_estimation.condExpCNN module

class cfl.cond_density_estimation.condExpCNN.CondExpCNN(data_info, model_params)

Bases: CondExpBase

A child class of CondExpBase that defines an easy-to-parameterize convolutional neural network composed of 2D convolutional layers interspersed with pooling layers. This model is ideal for spatially organized data (like images) as it accounts for spatial relationships between features.

See CondExpBase documentation for more details about training.

name

name of the model so that the model type can be recovered from saved parameters (str)

Type:

str

data_info

dict with information about the dataset shape

Type:

dict

default_params

default parameters to fill in if user doesn’t provide a given entry

Type:

dict

model_params

parameters for the CDE that are passed in by the user and corrected by _check_format_model_params

Type:

dict

trained

whether or not the model has been trained yet. This can happen either by instantiating the class and calling train, or by passing in a path to saved weights from a previous training session through model_params[‘weights_path’].

Type:

bool

model

tensorflow model for this CDE

Type:

tf.keras.models.Sequential

get_model_params()

return self.model_params

load_model()

load everything needed for this CondExpCNN model

save_model()

save the current state of this CondExpCNN model

train()

train the neural network on a given Dataset

_graph_results()

helper function to graph training and validation loss

predict()

once the model is trained, predict for a given Dataset

load_network()

load tensorflow network weights from a file into self.network

save_network()

save the current weights of self.network

_build_network()

create and return a tensorflow network

_check_format_model_params()

check dimensionality of provided parameters and fill in any missing parameters with defaults.

_get_default_model_params()

return values for model_params to default to if unspecified

__init__(data_info, model_params)

Initialize model and define network.

Parameters:
  • data_info (dict) – a dictionary containing information about the data that will be passed in. Should contain ‘X_dims’, ‘Y_dims’, and ‘Y_type’ as keys.

  • model_params (dict) – dictionary containing parameters for the model.

Returns:

None

_build_network()

Define the neural network based on specifications in self.model_params.

This creates a convolutional neural net with the structure: (Conv2D layer, MaxPooling2D layer) * n, Flatten layer, Dense layer(s), Output layer.

The number of Conv2D/MaxPooling2D layer pairs is determined by the length of the filter/kernel_size/pool_size parameter lists given in model_params (default 2).

The dense layer(s) after flattening are to reduce the number of parameters in the model before the output layer. The output layer gives the final predictions for each feature in Y.

Parameters:

None

Returns:

untrained model specified in self.model_params.

Return type:

tf.keras.models.Model
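The layer ordering described for this network can be sketched as a plain-Python layer plan, without building an actual Keras model. The plan_cnn_layers function and its argument names (filters, dense_units, n_outputs) are assumptions for illustration; the point is that one Conv2D/MaxPooling2D pair is produced per list entry, so the lists must have equal length.

```python
def plan_cnn_layers(filters, kernel_size, pool_size, dense_units, n_outputs):
    """Return an ordered list of layer descriptions for the documented structure."""
    # One Conv2D/MaxPooling2D pair per entry, so the lists must align.
    assert len(filters) == len(kernel_size) == len(pool_size), (
        "filters, kernel_size and pool_size must have the same length"
    )
    layers = []
    for f, k, p in zip(filters, kernel_size, pool_size):
        layers.append(("Conv2D", {"filters": f, "kernel_size": k}))
        layers.append(("MaxPooling2D", {"pool_size": p}))
    layers.append(("Flatten", {}))
    for units in dense_units:
        layers.append(("Dense", {"units": units}))
    # Final layer: one prediction per feature in Y.
    layers.append(("Output", {"units": n_outputs}))
    return layers


plan = plan_cnn_layers(
    filters=[32, 16],
    kernel_size=[(3, 3), (3, 3)],
    pool_size=[(2, 2), (2, 2)],
    dense_units=[64],
    n_outputs=1,
)
layer_names = [name for name, _ in plan]
```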

_check_format_model_params()

Verify that a valid CNN structure was specified in self.model_params.

Parameters:

None

Returns:

None

Raises:

AssertionError – if model architecture specified in self.model_params is invalid.

_get_default_model_params()

Returns the default parameters specific to this type of model.

Parameters:

None

Returns:

dictionary of default parameters

Return type:

dict

cfl.cond_density_estimation.condExpDIY module

class cfl.cond_density_estimation.condExpDIY.CondExpDIY(data_info, model_params)

Bases: CondExpBase

A child class of CondExpBase that takes in model specifications from self.model_params to define the model architecture. This class aims to simplify the process of tuning a mainstream feed-forward model.

See CondExpBase documentation for more details.

name

name of the model so that the model type can be recovered from saved parameters (str)

Type:

str

data_info

dict with information about the dataset shape

Type:

dict

default_params

default parameters to fill in if user doesn’t provide a given entry

Type:

dict

model_params

parameters for the CDE that are passed in by the user and corrected by _check_format_model_params

Type:

dict

trained

whether or not the model has been trained yet. This can happen either by instantiating the class and calling train, or by passing in a path to saved weights from a previous training session through model_params[‘weights_path’].

Type:

bool

model

tensorflow model for this CDE

Type:

tf.keras.models.Sequential

get_model_params()

return self.model_params

load_model()

load everything needed for this CondExpDIY model

save_model()

save the current state of this CondExpDIY model

train()

train the neural network on a given Dataset

_graph_results()

helper function to graph training and validation loss

predict()

once the model is trained, predict for a given Dataset

load_network()

load tensorflow network weights from a file into self.network

save_network()

save the current weights of self.network

_build_network()

create and return a tensorflow network

_check_format_model_params()

check dimensionality of provided parameters and fill in any missing parameters with defaults.

__init__(data_info, model_params)

Initialize model and define network.

Parameters:
  • data_info (dict) – a dictionary containing information about the data that will be passed in. Should contain ‘X_dims’, ‘Y_dims’, and ‘Y_type’ as keys.

  • model_params (dict) – dictionary containing parameters for the model.

Returns:

None

_build_network()

Define the neural network based on specifications in self.model_params.

This model takes specifications through the self.model_params dict to define its architecture.

Parameters:

None

Returns:

untrained model specified in self.model_params.

Return type:

tf.keras.models.Model

_check_format_model_params()

Verify that valid model params were specified in self.model_params.

Parameters:

None

Returns:

None

Raises:

AssertionError – if model architecture specified in self.model_params is invalid.

_get_default_model_params()

Returns the default parameters specific to this type of model.

Parameters:

None

Returns:

dictionary of default parameters

Return type:

dict

cfl.cond_density_estimation.condExpMod module

class cfl.cond_density_estimation.condExpMod.CondExpMod(data_info, model_params)

Bases: CondExpBase

A child class of CondExpBase that takes in model specifications from self.model_params to define the model architecture. This class aims to simplify the process of tuning a mainstream feed-forward model.

See CondExpBase documentation for more details about training.

name

name of the model so that the model type can be recovered from saved parameters (str)

Type:

str

data_info

dict with information about the dataset shape

Type:

dict

default_params

default parameters to fill in if user doesn’t provide a given entry

Type:

dict

model_params

parameters for the CDE that are passed in by the user and corrected by _check_format_model_params

Type:

dict

trained

whether or not the model has been trained yet. This can happen either by instantiating the class and calling train, or by passing in a path to saved weights from a previous training session through model_params[‘weights_path’].

Type:

bool

model

tensorflow model for this CDE

Type:

tf.keras.models.Sequential

get_model_params()

return self.model_params

load_model()

load everything needed for this CondExpMod model

save_model()

save the current state of this CondExpMod model

train()

train the neural network on a given Dataset

_graph_results()

helper function to graph training and validation loss

predict()

once the model is trained, predict for a given Dataset

load_network()

load tensorflow network weights from a file into self.network

save_network()

save the current weights of self.network

_build_network()

create and return a tensorflow network

_check_format_model_params()

check dimensionality of provided parameters and fill in any missing parameters with defaults.

_get_default_model_params()

return values for model_params to default to if unspecified

__init__(data_info, model_params)

Initialize model and define network.

Parameters:
  • data_info (dict) – a dictionary containing information about the data that will be passed in. Should contain ‘X_dims’, ‘Y_dims’, and ‘Y_type’ as keys.

  • model_params (dict) – dictionary containing parameters for the model.

Returns:

None

_build_network()

Define the neural network based on specifications in self.model_params.

This model takes specifications through the self.model_params dict to define its architecture.

Parameters:

None

Returns:

untrained model specified in self.model_params.

Return type:

tf.keras.models.Model

_check_format_model_params()

Make sure all required model_params are specified and of appropriate dimensionality. Replace any missing model_params with defaults, and resolve any simple dimensionality issues if possible.

Parameters:

None

Returns:

a dict of parameters cleared for model specification

Return type:

dict

Raises:

AssertionError – if params are misspecified and can’t be automatically fixed.

_get_default_model_params()

Returns the default parameters specific to this type of model.

Parameters:

None

Returns:

dictionary of default parameters

Return type:

dict

cfl.cond_density_estimation.condExpRidgeRegCV module

class cfl.cond_density_estimation.condExpRidgeRegCV.CondExpRidgeCV(data_info, model_params)

Bases: CDEModel

A ridge regression implementation of a CDE.

name

name of the model so that the model type can be recovered from saved parameters (str)

Type:

str

data_info

dict with information about the dataset shape

Type:

dict

model_params

parameters for the CDE

Type:

dict

trained

whether or not the model has been trained yet. This can happen either by instantiating the class and calling train, or by passing in a path to saved weights from a previous training session through model_params[‘weights_path’].

Type:

bool

model

sklearn ridge regression model

Type:

sklearn.linear_model.Ridge

alpha

final value of alpha used for fitting

Type:

float

scores

array of scores from cross-validation

Type:

np.ndarray

get_model_params()

return self.model_params

load_model()

load everything needed for this CondExpRidgeCV model

save_model()

save the current state of this CondExpRidgeCV model

train()

fit the model on a given Dataset

predict()

once the model is trained, predict for a given Dataset

__init__(data_info, model_params)

Parameters:
  • data_info (dict) – a dictionary containing information about the data that will be passed in. Should contain - ‘X_dims’ key with a tuple value specifying shape of X, - ‘Y_dims’ key with a tuple value specifying shape of Y, - ‘Y_type’ key with a string value specifying whether Y is ‘continuous’ or ‘categorical’.

  • model_params (dict) – dictionary containing parameters for the model. This is a way for users to specify any modifiable parts of your model.

Returns: None

get_model_params()

Return the specified parameters for self.model.

Parameters:

None

Returns:

dictionary of model parameters

Return type:

dict

load_model(path)

Load model saved at path and set self.model to it.

Parameters:

path (str) – file path to saved weights.

Returns:

None

predict(dataset, prev_results=None)

Predict P(Y|X) for samples in dataset.get_X() using the self.model trained by self.train.

Parameters:
  • dataset (Dataset) – a Dataset object to generate predictions on. X and Y can be retrieved using dataset.get_X(), dataset.get_Y().

  • prev_results (dict) – an optional dictionary of variables to feed into prediction. CondDensityEstimators don’t require variable input, so this is here for uniformity across the repo.

Returns:

a dictionary of results from prediction. A CauseClusterer, which will generally follow a CondDensityEstimator, will receive this dictionary through its prev_results argument and expect it to contain ‘pyx’ as a key whose value is the estimate of P(Y|X) for all samples in dataset.get_X(). Other artifacts can be returned through this dictionary as well if desired.

Return type:

dict

save_model(path)

Save self.model to the specified file path.

Parameters:

path (str) – path to save to.

Returns:

None

train(dataset, prev_results=None)

Train your model with a given dataset and return an estimate of the conditional probability P(Y|X).

Parameters:
  • dataset (Dataset) – a Dataset object to train the model with. X and Y can be retrieved using dataset.get_X(), dataset.get_Y().

  • prev_results (dict) – an optional dictionary of variables to feed into training. CondDensityEstimators don’t require variable input, so this is here for uniformity across the repo.

Returns:

a dictionary of results from training. A CauseClusterer, which will generally follow a CondDensityEstimator, will receive this dictionary through its prev_results argument and expect it to contain ‘pyx’ as a key whose value is the estimate of P(Y|X) for all samples in dataset.get_X(). Other artifacts can be returned through this dictionary as well if desired.

Return type:

dict
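How the alpha and scores attributes of a cross-validated ridge model might arise can be sketched in plain Python. This is not the sklearn-based implementation documented above, and how CondExpRidgeCV actually selects alpha is not stated here; the sketch assumes a one-dimensional ridge fit without intercept, k-fold cross-validation over a list of candidate alphas, and a final refit with the best one.

```python
def fit_ridge_1d(data, alpha):
    """Closed-form 1-D ridge without intercept: w = sum(xy) / (sum(x^2) + alpha)."""
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / (sxx + alpha)


def cv_score(data, alpha, n_folds=4):
    """Mean validation MSE of ridge with this alpha across n_folds folds."""
    fold_errors = []
    for k in range(n_folds):
        val = data[k::n_folds]  # every n_folds-th sample forms fold k
        train = [d for i, d in enumerate(data) if i % n_folds != k]
        w = fit_ridge_1d(train, alpha)
        fold_errors.append(sum((w * x - y) ** 2 for x, y in val) / len(val))
    return sum(fold_errors) / n_folds


def ridge_cv(data, alphas):
    """Pick the alpha with the lowest cross-validated error, then refit on all data."""
    scores = [cv_score(data, a) for a in alphas]
    best_alpha = alphas[scores.index(min(scores))]
    return fit_ridge_1d(data, best_alpha), best_alpha, scores


# Noiseless toy data from y = 3x, so no regularization is needed.
data = [(float(x), 3.0 * x) for x in range(1, 13)]
w, alpha, scores = ridge_cv(data, alphas=[0.0, 1.0, 10.0])
```

On noiseless data the unregularized candidate wins, since any positive alpha shrinks the slope away from its true value; with noisy or collinear features a larger alpha would typically score better.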

Module contents