cfl.cond_density_estimation package
Submodules
cfl.cond_density_estimation.cde_model module
- class cfl.cond_density_estimation.cde_model.CDEModel(data_info, model_params)
Bases:
object
This is an abstract class defining the type of model that can be passed into a CondDensityEstimator Block. If you build your own CDE model to pass into CondDensityEstimator, you should inherit CDEModel to ensure that you have specified all required functionality to properly interface with the CFL pipeline. CDEModel specifies the following required methods:
__init__, train, predict, load_model, save_model, get_model_params
- abstract __init__(data_info, model_params)
Do any setup required for your model here.
- Parameters:
data_info (dict) – a dictionary containing information about the data that will be passed in. It should contain an ‘X_dims’ key with a tuple value specifying the shape of X, a ‘Y_dims’ key with a tuple value specifying the shape of Y, and a ‘Y_type’ key with a string value specifying whether Y is ‘continuous’ or ‘categorical’.
model_params (dict) – dictionary containing parameters for the model. This is a way for users to specify any modifiable parts of your model.
- Returns:
None
- abstract get_model_params()
Return the specified parameters for self.model.
- Returns:
dictionary of model parameters
- Return type:
dict
- abstract load_model(path)
Load model saved at path and set self.model to it.
- Parameters:
path (str) – file path to saved weights.
- Returns:
None
- abstract predict(dataset, prev_results=None)
Predict P(Y|X) for samples in dataset.get_X() using the self.model trained by self.train.
- Parameters:
dataset (Dataset) – a Dataset object to generate predictions on. X and Y can be retrieved using dataset.get_X() and dataset.get_Y().
prev_results (dict) – an optional dictionary of variables to feed into prediction. CondDensityEstimators don’t require variable input, so this is here for uniformity across the repo.
- Returns:
a dictionary of results from prediction. A CauseClusterer, which will generally follow a CondDensityEstimator, will receive this dictionary through its prev_results argument and expect it to contain ‘pyx’ as a key, with its value being the estimate of P(Y|X) for all samples in dataset.get_X(). Other artifacts can be returned through this dictionary as well if desired.
- Return type:
dict
- abstract save_model(path)
Save self.model to the specified file path.
- Parameters:
path (str) – path to save to.
- Returns:
None
- abstract train(dataset, prev_results=None)
Train your model with a given dataset and return an estimate of the conditional probability P(Y|X).
- Parameters:
dataset (Dataset) – a Dataset object to train the model with. X and Y can be retrieved using dataset.get_X() and dataset.get_Y().
prev_results (dict) – an optional dictionary of variables to feed into training. CondDensityEstimators don’t require variable input, so this is here for uniformity across the repo.
- Returns:
a dictionary of results from training. A CauseClusterer, which will generally follow a CondDensityEstimator, will receive this dictionary through its prev_results argument and expect it to contain ‘pyx’ as a key, with its value being the estimate of P(Y|X) for all samples in dataset.get_X(). Other artifacts can be returned through this dictionary as well if desired.
- Return type:
dict
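As a rough illustration of the six required methods, the sketch below implements a toy model against a stand-in abstract base class. CDEModelSketch, ToyDataset, and MeanCDE are hypothetical names invented for this example and are not part of cfl; a real model would inherit from CDEModel and receive a cfl Dataset instead.

```python
import json
from abc import ABC, abstractmethod


class CDEModelSketch(ABC):
    """Stand-in mirroring the documented CDEModel methods (not the cfl class)."""

    @abstractmethod
    def __init__(self, data_info, model_params): ...

    @abstractmethod
    def train(self, dataset, prev_results=None): ...

    @abstractmethod
    def predict(self, dataset, prev_results=None): ...

    @abstractmethod
    def load_model(self, path): ...

    @abstractmethod
    def save_model(self, path): ...

    @abstractmethod
    def get_model_params(self): ...


class ToyDataset:
    """Hypothetical stand-in exposing the get_X / get_Y accessors."""

    def __init__(self, X, Y):
        self._X, self._Y = X, Y

    def get_X(self):
        return self._X

    def get_Y(self):
        return self._Y


class MeanCDE(CDEModelSketch):
    """Toy model: ignores X and predicts the training mean of Y everywhere."""

    def __init__(self, data_info, model_params):
        self.data_info = data_info
        self.model_params = model_params
        self.model = None  # per-feature mean of Y, set by train()

    def train(self, dataset, prev_results=None):
        Y = dataset.get_Y()
        self.model = [sum(col) / len(Y) for col in zip(*Y)]
        return self.predict(dataset)

    def predict(self, dataset, prev_results=None):
        # One estimate per sample, returned under the 'pyx' key.
        return {'pyx': [list(self.model) for _ in dataset.get_X()]}

    def save_model(self, path):
        with open(path, 'w') as f:
            json.dump(self.model, f)

    def load_model(self, path):
        with open(path) as f:
            self.model = json.load(f)

    def get_model_params(self):
        return self.model_params


data = ToyDataset(X=[[0.0], [1.0], [2.0]], Y=[[1.0], [3.0], [5.0]])
data_info = {'X_dims': (3, 1), 'Y_dims': (3, 1), 'Y_type': 'continuous'}
cde = MeanCDE(data_info, model_params={})
results = cde.train(data)
```

Here results['pyx'] holds one row per sample in data.get_X(), which is the shape of output the documentation says a downstream CauseClusterer expects.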
cfl.cond_density_estimation.condDensityEstimator module
- class cfl.cond_density_estimation.condDensityEstimator.CondDensityEstimator(data_info, block_params)
Bases:
Block
This class inherits Block to define a Block subtype for conditional density estimation. It takes in specifications for a particular conditional density estimation model, manages its instantiation, and serves as an interface between Experiment training/prediction calls and the model itself.
- data_info
information about the data being trained with / predicted on
- block_params
parameters to define the model
- name
name of the Block type
- model
the conditional density estimation model
- _create_model()
given self.block_params, build the CDE model
- get_block_params()
return self.block_params
- _get_default_block_params()
return values for block_params to default to if unspecified
- train()
train a model to estimate P(Y|X=x) from X,Y
- predict()
predict P(Y|X=x) given a new sample x
- save_block()
save the state of the object
- load_block()
load the state of the object from a specified file path
- __init__(data_info, block_params)
Initialize CondDensityEstimator.
- Parameters:
data_info (dict) – dict with information about the dataset shape
block_params (dict) – a set of parameters specifying a CDE. The ‘model’ key must be specified and can either be the name of a cfl.cond_density_estimation model, or an instantiated CDE model object that follows the cfl.cond_density_estimation.cde_model.CDEModel interface. Hyperparameters for the model may be specified through the ‘model_params’ dictionary.
Returns: None
- _create_model()
Return a conditional density estimator model as specified by self.block_params. If self.block_params[‘model’] is a string, it will try to instantiate a built-in cfl.cond_density_estimation model with the same name. Otherwise, it will treat the value of self.block_params[‘model’] as the instantiated model.
- Returns:
a model that implements conditional density estimation and follows the cde_model interface.
- Return type:
type varies
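The string-versus-instance behaviour described for _create_model can be sketched as follows. This is a minimal illustration, not the library's implementation: RidgeLikeCDE, BUILT_IN_MODELS, and create_model are hypothetical names, and the real method reads self.block_params rather than taking arguments.

```python
class RidgeLikeCDE:
    """Hypothetical built-in model used only for this illustration."""

    def __init__(self, data_info, model_params):
        self.data_info = data_info
        self.model_params = model_params


# Hypothetical name -> class registry standing in for the built-in models.
BUILT_IN_MODELS = {'RidgeLikeCDE': RidgeLikeCDE}


def create_model(data_info, block_params):
    """Instantiate by name if 'model' is a string, else use the object as-is."""
    model = block_params['model']
    if isinstance(model, str):
        if model not in BUILT_IN_MODELS:
            raise ValueError(f"unknown model name: {model!r}")
        model_params = block_params.get('model_params', {})
        return BUILT_IN_MODELS[model](data_info, model_params)
    # Anything that is not a string is assumed to be an instantiated model.
    return model


data_info = {'X_dims': (100, 5), 'Y_dims': (100, 2), 'Y_type': 'continuous'}

# Case 1: the model is named, with hyperparameters under 'model_params'.
by_name = create_model(
    data_info, {'model': 'RidgeLikeCDE', 'model_params': {'alpha': 1.0}})

# Case 2: an already-instantiated model object is passed through unchanged.
custom = RidgeLikeCDE(data_info, {})
by_instance = create_model(data_info, {'model': custom})
```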
- _get_default_block_params()
Private method that specifies default CDE parameters.
- Returns:
dictionary of parameter names (keys) and values (values)
- Return type:
dict
- get_block_params()
Get parameters for this CDE Block.
- Returns:
dictionary of parameter names (keys) and values (values)
- Return type:
dict
- load_block(file_path)
Wrapper to load model.
- Parameters:
file_path (str) – file path to block to load
- Returns:
None
- predict(dataset, prev_results)
Wrapper to generate model predictions.
- Parameters:
dataset (Dataset) – Dataset object containing the X, Y data to generate predictions for.
prev_results (dict) – dictionary that contains any results generated by the previous Block in the Experiment pipeline (usually none for CDE).
- Returns:
results generated by the model
- Return type:
dict
- save_block(file_path)
Wrapper to save model. :param file_path: file path to save block to :type file_path: str
Returns: None
- train(dataset, prev_results)
Wrapper to train model.
- Parameters:
dataset (Dataset) – Dataset object containing the X, Y data to train the model on.
prev_results (dict) – dictionary that contains any results generated by the previous Block in the Experiment pipeline (usually none for CDE).
- Returns:
results generated by the model
- Return type:
dict
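To show how a results dictionary travels from one Block to the next through prev_results, here is a small sketch. ToyCDEBlock, ToyClusterBlock, run_pipeline, the 'labels' key, and the dict-based dataset are all hypothetical stand-ins; the real pipeline is driven by an Experiment and cfl Block objects.

```python
class ToyCDEBlock:
    """Hypothetical stand-in for a CondDensityEstimator-style Block."""

    def train(self, dataset, prev_results=None):
        # A real model would estimate P(Y|X); here every sample gets 0.5.
        return {'pyx': [[0.5] for _ in dataset['X']]}


class ToyClusterBlock:
    """Hypothetical downstream Block that consumes 'pyx' from prev_results."""

    def train(self, dataset, prev_results=None):
        assert prev_results is not None and 'pyx' in prev_results, \
            "expected 'pyx' from the previous Block"
        # Threshold the estimate to get a made-up partition label per sample.
        labels = [0 if row[0] < 0.75 else 1 for row in prev_results['pyx']]
        return {'labels': labels}


def run_pipeline(blocks, dataset):
    """Call each Block in order, feeding it the previous Block's results."""
    results, all_results = None, []
    for block in blocks:
        results = block.train(dataset, results)
        all_results.append(results)
    return all_results


dataset = {'X': [[1.0], [2.0], [3.0]], 'Y': [[0.0], [1.0], [1.0]]}
outputs = run_pipeline([ToyCDEBlock(), ToyClusterBlock()], dataset)
```

The first Block receives prev_results=None, which matches the note that a CDE usually has no previous results to consume.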
cfl.cond_density_estimation.condExpBase module
- class cfl.cond_density_estimation.condExpBase.CondExpBase(data_info, model_params)
Bases:
CDEModel
A class to define, train, and perform inference with conditional density estimators that fall under the “conditional expectation” umbrella. This subset of conditional density estimators (referred to as ‘CondExp’) learns the conditional expectation E[Y|X] instead of the full conditional distribution P(Y|X). This base class implements all functions needed for training and prediction, and supplies a model architecture that can be overridden by children of this class. In general, if you would like to use a CondExp CDE for your CFL pipeline, it is easiest to either 1) use the CondExpDIY child class of CondExpBase, which allows you to define your network through a function specified in model_params, 2) use the CondExpMod child class, which allows you to pass in limited architecture specifications through model_params, or 3) inherit this class and override the methods you would like to modify.
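A small worked example may help explain why a conditional expectation can stand in for a conditional distribution: when Y is one-hot encoded, the average of the Y vectors that share a given x is exactly the empirical P(Y|X=x). The function and data below are invented for illustration and use a discrete X so that grouping is trivial; the networks in this module approximate the same quantity for general X.

```python
from collections import defaultdict


def conditional_expectation(xs, ys):
    """Average the Y vectors that share the same (discrete) x value."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    return {
        x: [sum(col) / len(rows) for col in zip(*rows)]
        for x, rows in groups.items()
    }


# Y is one-hot over two classes; x takes the values 'a' or 'b'.
xs = ['a', 'a', 'a', 'a', 'b', 'b']
ys = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
expectation = conditional_expectation(xs, ys)
# expectation['a'] is [0.75, 0.25]: three of the four 'a' samples are class 0.
# expectation['b'] is [0.0, 1.0]: both 'b' samples are class 1.
```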
- name
name of the model so that the model type can be recovered from saved parameters (str)
- Type:
str
- data_info
dict with information about the dataset shape
- Type:
dict
- default_params
default parameters to fill in if user doesn’t provide a given entry
- Type:
dict
- model_params
parameters for the CDE that are passed in by the user and corrected by _check_format_model_params
- Type:
dict
- trained
whether or not the model has been trained yet. This can happen either by instantiating the class and calling train, or by passing in a path to saved weights from a previous training session through model_params[‘weights_path’].
- Type:
bool
- model
tensorflow model for this CDE
- Type:
tf.keras.Model.Sequential
- get_model_params()
return self.model_params
- load_model()
load everything needed for this CondExpBase model
- save_model()
save the current state of this CondExpBase model
- train()
train the neural network on a given Dataset
- _graph_results()
helper function to graph training and validation loss
- predict()
once the model is trained, predict for a given Dataset
- load_network()
load tensorflow network weights from a file into self.network
- save_network()
save the current weights of self.network
- _build_network()
create and return a tensorflow network
- _check_format_model_params()
check dimensionality of provided parameters and fill in any missing parameters with defaults.
- __init__(data_info, model_params)
Initialize model and define network.
- Parameters:
data_info (dict) – a dictionary containing information about the data that will be passed in. Should contain ‘X_dims’, ‘Y_dims’, and ‘Y_type’ as keys.
model_params (dict) – dictionary containing parameters for the model.
model (str) – name of the model so that the model type can be recovered from saved parameters.
- Returns:
None
- abstract _build_network()
Define the neural network based on specifications in self.model_params.
- Parameters:
None –
- Returns:
untrained network specified in self.model_params.
- Return type:
tf.keras.models.Model
- abstract _check_format_model_params()
Make sure all required model_params are specified and of appropriate dimensionality. Replace any missing model_params with defaults, and resolve any simple dimensionality issues if possible.
- Parameters:
None –
- Returns:
None
- Raises:
AssertionError – if params are misspecified and can’t be automatically fixed.
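The fill-in-defaults, fix-what-you-can, assert-on-the-rest pattern described above might look like the sketch below. It is written as a free function that returns a new dict purely for illustration, whereas the documented method takes no arguments and works on self.model_params. The keys 'dense_units', 'dropouts', and 'n_epochs' are used as illustrative examples only.

```python
def check_format_model_params(model_params, default_params):
    """Return a copy of model_params with missing entries filled from defaults."""
    checked = dict(model_params)
    for key, default in default_params.items():
        checked.setdefault(key, default)

    # Resolve a simple dimensionality issue: a scalar dropout rate is
    # broadcast to one rate per dense layer.
    n_layers = len(checked['dense_units'])
    if isinstance(checked['dropouts'], (int, float)):
        checked['dropouts'] = [checked['dropouts']] * n_layers

    # Anything that cannot be fixed automatically fails loudly.
    assert len(checked['dropouts']) == n_layers, \
        "dropouts must have one entry per dense layer"
    return checked


defaults = {'dense_units': [50, 1], 'dropouts': [0.0, 0.0], 'n_epochs': 20}
params = check_format_model_params(
    {'dense_units': [32, 16, 1], 'dropouts': 0.1}, defaults)
# params['dropouts'] is now [0.1, 0.1, 0.1] and params['n_epochs'] is 20.
```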
- abstract _get_default_model_params()
Returns the default parameters specific to this type of model.
- Returns:
dictionary of default parameters
- Return type:
dict
- _graph_results(train_loss, val_loss, show=True)
Graph training and validation loss across training epochs.
- Parameters:
train_loss (np.ndarray) – (n_epochs,) array of training losses per epoch.
val_loss (np.ndarray) – (n_epochs,) array of validation losses per epoch.
show (bool) – displays figure if show=True. Defaults to True.
- Returns:
figure object.
- Return type:
matplotlib.pyplot.figure
- get_model_params()
Get parameters for this CDE model.
- Returns:
dictionary of parameter names (keys) and values (values)
- Return type:
dict
- load_model(path)
Load model saved at path into this model.
- Parameters:
path (str) – path to saved weights.
- Returns:
None
- load_network(file_path)
Load network weights from saved checkpoint into current network.
- Parameters:
file_path (str) – path to checkpoint file
- Returns:
None
- predict(dataset, prev_results=None)
Given a Dataset of microvariable observations, estimate macrovariable states.
- Parameters:
dataset (Dataset) – Dataset object containing X and Y data to estimate macrovariable states for.
prev_results (dict) – dictionary that contains any results generated in previous Block in Experiment pipeline (usually none for CDE).
- Returns:
dictionary of prediction results. Specifically, this dictionary will contain pyx, the predicted conditional probabilities for the given Dataset.
- Return type:
dict
- save_model(path)
Save trained model to specified path.
- Parameters:
path (str) – path to save to.
- Returns:
None
- save_network(file_path)
Save network weights from current network.
- Parameters:
file_path (str) – path to checkpoint file
- Returns:
None
- train(dataset, prev_results=None)
Full training loop. Constructs tf.data.Dataset objects for training and testing, updates model weights each epoch, and evaluates on the test set periodically.
- Parameters:
dataset (Dataset) – Dataset object containing X and Y data for this training run.
prev_results (dict) – dictionary that contains any results generated in previous Block in Experiment pipeline (usually none for CDE).
- Returns:
dictionary of CDE training results. Specifically, this will contain pyx, the predicted conditional probabilities for the training dataset.
- Return type:
dict
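The overall shape of such a training loop (split the data, update weights each epoch, track training and validation loss, return the estimate under 'pyx') can be sketched without TensorFlow. The example below is a deliberately simplified stand-in: it fits a one-dimensional linear model by full-batch gradient descent on mean squared error, and train_linear, its arguments, and the 'train_loss' and 'val_loss' keys are assumptions made for this illustration rather than the actual implementation.

```python
def mse(w, b, xs, ys):
    """Mean squared error of the linear prediction w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train_linear(X, Y, n_epochs=500, lr=0.05, val_fraction=0.25):
    """Fit y ~ w * x + b, recording training and validation loss per epoch."""
    n_val = max(1, int(len(X) * val_fraction))
    X_tr, Y_tr = X[:-n_val], Y[:-n_val]   # training split
    X_val, Y_val = X[-n_val:], Y[-n_val:]  # held-out split

    w, b = 0.0, 0.0
    train_loss, val_loss = [], []
    for _ in range(n_epochs):
        # Gradient of the mean squared error with respect to w and b.
        errors = [w * x + b - y for x, y in zip(X_tr, Y_tr)]
        grad_w = sum(2 * e * x for e, x in zip(errors, X_tr)) / len(X_tr)
        grad_b = sum(2 * e for e in errors) / len(X_tr)
        w, b = w - lr * grad_w, b - lr * grad_b

        train_loss.append(mse(w, b, X_tr, Y_tr))
        val_loss.append(mse(w, b, X_val, Y_val))

    # Squared-error training targets the conditional expectation E[Y|X=x].
    pyx = [[w * x + b] for x in X]
    return {'pyx': pyx, 'train_loss': train_loss, 'val_loss': val_loss}


X = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
Y = [2 * x + 1 for x in X]
results = train_linear(X, Y)
```

The two loss lists have one entry per epoch, which is the kind of input the _graph_results helper is documented to plot.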
cfl.cond_density_estimation.condExpCNN module
- class cfl.cond_density_estimation.condExpCNN.CondExpCNN(data_info, model_params)
Bases:
CondExpBase
A child class of CondExpBase that defines an easy-to-parameterize convolutional neural network composed of 2D convolutional layers interspersed with pooling layers. This model is ideal for spatially organized data (like images) as it accounts for spatial relationships between features.
See CondExpBase documentation for more details about training.
- name
name of the model so that the model type can be recovered from saved parameters (str)
- Type:
str
- data_info
dict with information about the dataset shape
- Type:
dict
- default_params
default parameters to fill in if user doesn’t provide a given entry
- Type:
dict
- model_params
parameters for the CDE that are passed in by the user and corrected by _check_format_model_params
- Type:
dict
- trained
whether or not the model has been trained yet. This can happen either by instantiating the class and calling train, or by passing in a path to saved weights from a previous training session through model_params[‘weights_path’].
- Type:
bool
- model
tensorflow model for this CDE
- Type:
tf.keras.Model.Sequential
- get_model_params()
return self.model_params
- load_model()
load everything needed for this CondExpCNN model
- save_model()
save the current state of this CondExpCNN model
- train()
train the neural network on a given Dataset
- _graph_results()
helper function to graph training and validation loss
- predict()
once the model is trained, predict for a given Dataset
- load_network()
load tensorflow network weights from a file into self.network
- save_network()
save the current weights of self.network
- _build_network()
create and return a tensorflow network
- _check_format_model_params()
check dimensionality of provided parameters and fill in any missing parameters with defaults.
- _get_default_model_params()
return values for model_params to default to if unspecified
- __init__(data_info, model_params)
Initialize model and define network.
- Parameters:
data_info (dict) – a dictionary containing information about the data that will be passed in. Should contain ‘X_dims’, ‘Y_dims’, and ‘Y_type’ as keys.
model_params (dict) – dictionary containing parameters for the model.
- Returns:
None
- _build_network()
Define the neural network based on specifications in self.model_params.
This creates a convolutional neural net with the structure (Conv2D layer, MaxPooling2D layer) * n, Flatten layer, Dense layer(s), Output layer
The number of Conv2D/MaxPooling2D layer pairs is determined by the length of the filter/kernel_size/pool_size parameter lists given in model_params (default 2).
The dense layer(s) after flattening are to reduce the number of parameters in the model before the output layer. The output layer gives the final predictions for each feature in Y.
- Parameters:
None –
- Returns:
untrained model specified in self.model_params.
- Return type:
tf.keras.models.Model
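The layer ordering described above can be made concrete with a short sketch that only lists the layers a given set of parameter lists would produce, without building a TensorFlow model. plan_cnn_layers and its argument names are hypothetical; they follow the filter/kernel_size/pool_size wording above but are not guaranteed to match the exact model_params keys.

```python
def plan_cnn_layers(filters, kernel_size, pool_size, dense_units, n_outputs):
    """List the layers implied by the parameter lists, in order."""
    # One (Conv2D, MaxPooling2D) pair per entry, so the lists must align.
    assert len(filters) == len(kernel_size) == len(pool_size), \
        "filters, kernel_size and pool_size must have the same length"

    layers = []
    for f, k, p in zip(filters, kernel_size, pool_size):
        layers.append(f"Conv2D(filters={f}, kernel_size={k})")
        layers.append(f"MaxPooling2D(pool_size={p})")
    layers.append("Flatten()")
    # Dense layer(s) shrink the flattened features before the output layer.
    layers.extend(f"Dense(units={u})" for u in dense_units)
    # Output layer: one prediction per feature in Y.
    layers.append(f"Output(units={n_outputs})")
    return layers


plan = plan_cnn_layers(
    filters=[32, 16],
    kernel_size=[(3, 3), (3, 3)],
    pool_size=[(2, 2), (2, 2)],
    dense_units=[16],
    n_outputs=1,
)
for layer in plan:
    print(layer)
```

With two entries in each list, the plan contains two Conv2D/MaxPooling2D pairs followed by Flatten, one Dense layer, and the output layer.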
- _check_format_model_params()
Verify that a valid CNN structure was specified in self.model_params.
- Parameters:
None –
- Returns:
None
- Raises:
AssertionError – if model architecture specified in self.model_params is invalid.
- _get_default_model_params()
Returns the default parameters specific to this type of model.
- Returns:
dictionary of default parameters
- Return type:
dict
cfl.cond_density_estimation.condExpDIY module
- class cfl.cond_density_estimation.condExpDIY.CondExpDIY(data_info, model_params)
Bases:
CondExpBase
A child class of CondExpBase that takes in model specifications from self.model_params to define the model architecture. This class aims to simplify the process of tuning a mainstream feed-forward model.
See CondExpBase documentation for more details.
- name
name of the model so that the model type can be recovered from saved parameters (str)
- Type:
str
- data_info
dict with information about the dataset shape
- Type:
dict
- default_params
default parameters to fill in if user doesn’t provide a given entry
- Type:
dict
- model_params
parameters for the CDE that are passed in by the user and corrected by _check_format_model_params
- Type:
dict
- trained
whether or not the model has been trained yet. This can happen either by instantiating the class and calling train, or by passing in a path to saved weights from a previous training session through model_params[‘weights_path’].
- Type:
bool
- model
tensorflow model for this CDE
- Type:
tf.keras.Model.Sequential
- get_model_params()
return self.model_params
- load_model()
load everything needed for this CondExpDIY model
- save_model()
save the current state of this CondExpDIY model
- train()
train the neural network on a given Dataset
- _graph_results()
helper function to graph training and validation loss
- predict()
once the model is trained, predict for a given Dataset
- load_network()
load tensorflow network weights from a file into self.network
- save_network()
save the current weights of self.network
- _build_network()
create and return a tensorflow network
- _check_format_model_params()
check dimensionality of provided parameters and fill in any missing parameters with defaults.
- __init__(data_info, model_params)
Initialize model and define network.
- Parameters:
data_info (dict) – a dictionary containing information about the data that will be passed in. Should contain ‘X_dims’, ‘Y_dims’, and ‘Y_type’ as keys.
model_params (dict) – dictionary containing parameters for the model.
- Returns:
None
- _build_network()
Define the neural network based on specifications in self.model_params.
This model takes specifications through the self.model_params dict to define its architecture.
- Parameters:
None –
- Returns:
untrained model specified in self.model_params.
- Return type:
tf.keras.models.Model
- _check_format_model_params()
Verify that valid model params were specified in self.model_params.
- Parameters:
None –
- Returns:
None
- Raises:
AssertionError – if model architecture specified in self.model_params is invalid.
- _get_default_model_params()
Returns the default parameters specific to this type of model.
- Parameters:
None –
- Returns:
dictionary of default parameters
- Return type:
dict
cfl.cond_density_estimation.condExpMod module
- class cfl.cond_density_estimation.condExpMod.CondExpMod(data_info, model_params)
Bases:
CondExpBase
A child class of CondExpBase that takes in model specifications from self.model_params to define the model architecture. This class aims to simplify the process of tuning a mainstream feed-forward model.
See CondExpBase documentation for more details about training.
- name
name of the model so that the model type can be recovered from saved parameters (str)
- Type:
str
- data_info
dict with information about the dataset shape
- Type:
dict
- default_params
default parameters to fill in if user doesn’t provide a given entry
- Type:
dict
- model_params
parameters for the CDE that are passed in by the user and corrected by _check_format_model_params
- Type:
dict
- trained
whether or not the model has been trained yet. This can happen either by instantiating the class and calling train, or by passing in a path to saved weights from a previous training session through model_params[‘weights_path’].
- Type:
bool
- model
tensorflow model for this CDE
- Type:
tf.keras.Model.Sequential
- get_model_params()
return self.model_params
- load_model()
load everything needed for this CondExpMod model
- save_model()
save the current state of this CondExpMod model
- train()
train the neural network on a given Dataset
- _graph_results()
helper function to graph training and validation loss
- predict()
once the model is trained, predict for a given Dataset
- load_network()
load tensorflow network weights from a file into self.network
- save_network()
save the current weights of self.network
- _build_network()
create and return a tensorflow network
- _check_format_model_params()
check dimensionality of provided parameters and fill in any missing parameters with defaults.
- _get_default_model_params()
return values for model_params to default to if unspecified
- __init__(data_info, model_params)
Initialize model and define network.
- Parameters:
data_info (dict) – a dictionary containing information about the data that will be passed in. Should contain ‘X_dims’, ‘Y_dims’, and ‘Y_type’ as keys.
model_params (dict) – dictionary containing parameters for the model.
- Returns:
None
- _build_network()
Define the neural network based on specifications in self.model_params.
This model takes specifications through the self.model_params dict to define its architecture.
- Parameters:
None –
- Returns:
untrained model specified in self.model_params.
- Return type:
tf.keras.models.Model
- _check_format_model_params()
Make sure all required model_params are specified and of appropriate dimensionality. Replace any missing model_params with defaults, and resolve any simple dimensionality issues if possible.
- Parameters:
None –
- Returns:
a dict of parameters cleared for model specification
- Return type:
dict
- Raises:
AssertionError – if params are misspecified and can’t be automatically fixed.
- _get_default_model_params()
Returns the default parameters specific to this type of model.
- Parameters:
None –
- Returns:
dictionary of default parameters
- Return type:
dict
cfl.cond_density_estimation.condExpRidgeRegCV module
- class cfl.cond_density_estimation.condExpRidgeRegCV.CondExpRidgeCV(data_info, model_params)
Bases:
CDEModel
A ridge regression implementation of a CDE.
- name
name of the model so that the model type can be recovered from saved parameters (str)
- Type:
str
- data_info
dict with information about the dataset shape
- Type:
dict
- model_params
parameters for the CDE
- Type:
dict
- trained
whether or not the model has been trained yet. This can happen either by instantiating the class and calling train, or by passing in a path to saved weights from a previous training session through model_params[‘weights_path’].
- Type:
bool
- model
sklearn ridge regression model
- Type:
sklearn.linear_model.Ridge
- alpha
final value of alpha used for fitting
- Type:
float
- scores
array of scores from cross-validation
- Type:
np.ndarray
- get_model_params()
return self.model_params
- load_model()
load everything needed for this CondExpRidgeCV model
- save_model()
save the current state of this CondExpRidgeCV model
- train()
fit the model on a given Dataset
- predict()
once the model is trained, predict for a given Dataset
- __init__(data_info, model_params)
- Parameters:
data_info (dict) – a dictionary containing information about the data that will be passed in. It should contain an ‘X_dims’ key with a tuple value specifying the shape of X, a ‘Y_dims’ key with a tuple value specifying the shape of Y, and a ‘Y_type’ key with a string value specifying whether Y is ‘continuous’ or ‘categorical’.
model_params (dict) – dictionary containing parameters for the model. This is a way for users to specify any modifiable parts of your model.
- Returns:
None
- get_model_params()
Return the specified parameters for self.model.
- Returns:
dictionary of model parameters
- Return type:
dict
- load_model(path)
Load model saved at path and set self.model to it.
- Parameters:
path (str) – file path to saved weights.
- Returns:
None
- predict(dataset, prev_results=None)
Predict P(Y|X) for samples in dataset.get_X() using the self.model trained by self.train.
- Parameters:
dataset (Dataset) – a Dataset object to generate predictions on. X and Y can be retrieved using dataset.get_X() and dataset.get_Y().
prev_results (dict) – an optional dictionary of variables to feed into prediction. CondDensityEstimators don’t require variable input, so this is here for uniformity across the repo.
- Returns:
a dictionary of results from prediction. A CauseClusterer, which will generally follow a CondDensityEstimator, will receive this dictionary through its prev_results argument and expect it to contain ‘pyx’ as a key, with its value being the estimate of P(Y|X) for all samples in dataset.get_X(). Other artifacts can be returned through this dictionary as well if desired.
- Return type:
dict
- save_model(path)
Save self.model to the specified file path.
- Parameters:
path (str) – path to save to.
- Returns:
None
- train(dataset, prev_results=None)
Train your model with a given dataset and return an estimate of the conditional probability P(Y|X).
- Parameters:
dataset (Dataset) – a Dataset object to train the model with. X and Y can be retrieved using dataset.get_X() and dataset.get_Y().
prev_results (dict) – an optional dictionary of variables to feed into training. CondDensityEstimators don’t require variable input, so this is here for uniformity across the repo.
- Returns:
a dictionary of results from training. A CauseClusterer, which will generally follow a CondDensityEstimator, will receive this dictionary through its prev_results argument and expect it to contain ‘pyx’ as a key, with its value being the estimate of P(Y|X) for all samples in dataset.get_X(). Other artifacts can be returned through this dictionary as well if desired.
- Return type:
dict
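The alpha and scores attributes listed for CondExpRidgeCV suggest that the regularization strength is chosen by cross-validation. The sketch below shows that general idea on a one-dimensional, intercept-free ridge fit in plain Python. It is an assumption-laden illustration (ridge_fit, cv_score, ridge_cv, the fold scheme, and the candidate alphas are all invented here) and does not reproduce the scikit-learn-based implementation.

```python
def ridge_fit(xs, ys, alpha):
    """Closed-form 1-D ridge weight: w = sum(x*y) / (sum(x*x) + alpha)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


def cv_score(xs, ys, alpha, n_folds):
    """Mean held-out squared error over n_folds interleaved folds."""
    errors = []
    for fold in range(n_folds):
        pairs = list(enumerate(zip(xs, ys)))
        train = [(x, y) for i, (x, y) in pairs if i % n_folds != fold]
        test = [(x, y) for i, (x, y) in pairs if i % n_folds == fold]
        w = ridge_fit([x for x, _ in train], [y for _, y in train], alpha)
        errors.append(sum((w * x - y) ** 2 for x, y in test) / len(test))
    return sum(errors) / len(errors)


def ridge_cv(xs, ys, alphas, n_folds=3):
    """Pick the alpha with the lowest cross-validated error, then refit."""
    scores = [cv_score(xs, ys, a, n_folds) for a in alphas]
    best_alpha = alphas[scores.index(min(scores))]
    return best_alpha, scores, ridge_fit(xs, ys, best_alpha)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0 * x for x in xs]  # noise-free data, so less shrinkage is better
alpha, scores, w = ridge_cv(xs, ys, alphas=[0.01, 1.0, 100.0])
```

On this noise-free data the smallest candidate alpha wins, and the refit weight w is close to the true slope of 2.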