cfl package
Subpackages
- cfl.clustering package
- cfl.cond_density_estimation package
- Submodules
- cfl.cond_density_estimation.cde_model module
- cfl.cond_density_estimation.condDensityEstimator module
- cfl.cond_density_estimation.condExpBase module
- cfl.cond_density_estimation.condExpCNN module
- cfl.cond_density_estimation.condExpDIY module
- cfl.cond_density_estimation.condExpMod module
- cfl.cond_density_estimation.condExpRidgeRegCV module
- Module contents
- cfl.post_cfl package
- cfl.util package
- cfl.visualization package
Submodules
cfl.block module
- class cfl.block.Block(data_info, block_params)
Bases:
object
A Block is an object that can be trained on a Dataset and can predict some target for a Dataset.
Blocks are intended to be the components of a graph workflow in an Experiment. For example, if the graph Block_A->Block_B is constructed in an Experiment, the output of Block_A will provide input to Block_B.
- __init__(data_info, block_params)
Instantiate the specified model.
- Parameters:
data_info (dict) – dict of information about associated datasets
block_params (dict) – parameters for this model
Returns: None
- _check_block_params(input_params)
Check that all expected block parameters have been provided, and substitute the default for any that are missing. Remove any parameters that were specified but are not used.
- Parameters:
input_params (dict) – dictionary whose keys are parameter names
- Returns:
verified parameter dictionary
- Return type:
dict
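The check described above can be sketched in plain Python. This is a minimal illustration, not the cfl implementation: the standalone function `check_block_params`, its `default_params` argument, and the example keys `n_clusters`, `verbose`, and `typo_key` are hypothetical stand-ins for a Block's method and its defaults.

```python
def check_block_params(input_params, default_params):
    """Return a verified copy of input_params.

    Hypothetical standalone version of the documented behavior:
    missing keys take their default, unknown keys are dropped.
    """
    verified = {}
    for key, default in default_params.items():
        # Use the caller's value when given, otherwise the default.
        verified[key] = input_params.get(key, default)

    # Report keys that were specified but are not used by this Block.
    unused = sorted(set(input_params) - set(default_params))
    for key in unused:
        print(f"'{key}' is not a recognized parameter and will be ignored.")
    return verified


defaults = {"n_clusters": 4, "verbose": 1}  # hypothetical defaults
params = check_block_params({"n_clusters": 8, "typo_key": 3}, defaults)
print(params)  # {'n_clusters': 8, 'verbose': 1}
```

Returning a new dictionary rather than modifying `input_params` in place keeps the caller's original specification intact, which can help when the same parameters are reused for several Blocks.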
- abstract _get_default_block_params()
Get the default parameters for the Block.
- Parameters:
None –
- Returns:
dictionary of default parameters.
- Return type:
dict
- get_name()
Return name of model.
- Parameters:
None –
- Returns:
name of the model
- Return type:
str
- get_params()
Return block params.
- Parameters:
None –
- Returns:
parameters specified for this Block.
- Return type:
dict
- is_trained()
Return whether this block has been trained yet.
- Parameters:
None –
- Returns:
whether the block has been trained
- Return type:
bool
- abstract load_block(path)
Load a Block that has already been trained in a previous Experiment. All Blocks should be load-able with just a path name. The specific Block type is responsible for making sure that it has loaded all relevant fields.
- Parameters:
path – path to load from
Returns: None
- abstract predict(dataset, prev_results=None)
Make prediction for the specified dataset with the model attribute.
- Parameters:
dataset (Dataset) – dataset for model to predict on
prev_results (dict) – any results computed by the previous Block during prediction.
- Returns:
a dictionary of results to be saved and to pass on as the ‘prev_results’ argument to the next Block’s predict method.
- Return type:
dict
- abstract save_block(path)
Save a Block that has been trained so that it can be reconstructed using load_block.
- Parameters:
path – path to save at
Returns: None
- abstract train(dataset, prev_results=None)
Train model attribute.
- Parameters:
dataset (Dataset) – dataset to train model with
prev_results (dict) – any results computed by the previous Block during training.
- Returns:
a dictionary of results to be saved and to pass on as the ‘prev_results’ argument to the next Block’s train method.
- Return type:
dict
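To illustrate how results flow between Blocks, the sketch below chains two stand-in classes that mirror the documented `train(dataset, prev_results=None)` signature. It does not import cfl: `BlockSketch`, `Doubler`, `Summer`, `train_chain`, the result keys, and the use of a plain list as the dataset are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class BlockSketch(ABC):
    """Stand-in mirroring part of the documented Block interface."""

    def __init__(self):
        self.trained = False

    def is_trained(self):
        return self.trained

    @abstractmethod
    def train(self, dataset, prev_results=None):
        """Train and return a dict of results for the next Block."""


class Doubler(BlockSketch):
    def train(self, dataset, prev_results=None):
        self.trained = True
        return {"doubled": [2 * x for x in dataset]}


class Summer(BlockSketch):
    def train(self, dataset, prev_results=None):
        # Consumes the output of the previous Block in the chain.
        self.trained = True
        return {"total": sum(prev_results["doubled"])}


def train_chain(blocks, dataset):
    """Train blocks in order, feeding each one the previous results."""
    all_results, prev_results = {}, None
    for block in blocks:
        prev_results = block.train(dataset, prev_results)
        all_results[type(block).__name__] = prev_results
    return all_results


chain_results = train_chain([Doubler(), Summer()], [1, 2, 3])
print(chain_results)
```

Here `Summer` plays the role of Block_B in a Block_A->Block_B graph: it reads only what `Doubler` returned, and the collected results are keyed by Block name.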
cfl.dataset module
Dataset Module
- class cfl.dataset.Dataset(X, Y, name='dataset', Xraw=None, Yraw=None, in_sample_idx=None, out_sample_idx=None)
Bases:
object
The Dataset class packages X and Y so that they can be easily passed through steps of CFL and saved consistently. It enforces separation of any withheld datasets passed through CFL for prediction after training.
Note
Xraw and Yraw attributes will be deprecated soon as the visualizations interface has changed and no longer requires them to be stored in a Dataset.
- __init__(X, Y, name='dataset', Xraw=None, Yraw=None, in_sample_idx=None, out_sample_idx=None)
Initialize Dataset.
- Parameters:
X (np.ndarray) – X data to pass through CFL pipeline, dimensions (n_samples, n_x_features). #TODO: dimensions different if going to use a CNN
Y (np.ndarray) – Y data to pass through CFL pipeline, dimensions (n_samples, n_y_features).
name (str) – name of Dataset. Defaults to ‘dataset’.
Xraw (np.ndarray) – (Optional) raw form of X before preprocessing to remain associated with X for visualization. Defaults to None.
Yraw (np.ndarray) – (Optional) raw form of Y before preprocessing to remain associated with Y for visualization. Defaults to None.
- Returns:
None
- get_X()
Return the X array associated with this Dataset.
- Parameters:
None –
- Returns:
an (n_samples, n_x_features) array
- Return type:
np.ndarray
- get_Y()
Return the Y array associated with this Dataset.
- Parameters:
None –
- Returns:
an (n_samples, n_y_features) array
- Return type:
np.ndarray
- get_cfl_results()
Return the CFL results generated by passing this Dataset through Experiment training or prediction.
- Parameters:
None –
- Returns:
results generated by cfl.Experiment.train or .predict
- Return type:
dict
- get_in_sample_idx()
Return the in_sample_idx set for this Dataset.
- Parameters:
None –
- Returns:
an array of sample indices in this subset
- Return type:
np.ndarray
- get_name()
Return the name of this Dataset.
- Parameters:
None –
- Returns:
name associated with this Dataset.
- Return type:
str
- get_out_sample_idx()
Return the out_sample_idx set for this Dataset.
- Parameters:
None –
- Returns:
an array of sample indices in this subset
- Return type:
np.ndarray
- set_cfl_results(cfl_results)
Assign results from a CFL Experiment run to this Dataset.
- Parameters:
cfl_results – results generated by cfl.Experiment.train or .predict
Returns: None
- set_in_sample_idx(in_sample_idx)
Set the in_sample_idx set for this Dataset.
- Parameters:
in_sample_idx – an array of sample indices in this subset
Returns: None
- set_out_sample_idx(out_sample_idx)
Set the out_sample_idx set for this Dataset.
- Parameters:
out_sample_idx – an array of sample indices in this subset
Returns: None
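The in-sample and out-of-sample index arrays partition the rows of X and Y into a training subset and a withheld subset. The sketch below shows one plausible way to build such a pair with the standard library. The function name `split_sample_idx`, the 25% holdout fraction, and the fixed seed are assumptions and do not describe how cfl generates indices.

```python
import random


def split_sample_idx(n_samples, holdout_fraction=0.25, seed=0):
    """Return (in_sample_idx, out_sample_idx) as sorted, disjoint lists."""
    rng = random.Random(seed)
    idx = list(range(n_samples))
    rng.shuffle(idx)
    # The first n_out shuffled indices are withheld, the rest are kept.
    n_out = int(n_samples * holdout_fraction)
    out_sample_idx = sorted(idx[:n_out])
    in_sample_idx = sorted(idx[n_out:])
    return in_sample_idx, out_sample_idx


in_idx, out_idx = split_sample_idx(8)
print(len(in_idx), len(out_idx))  # 6 2
```

Lists produced this way would presumably need converting to np.ndarray before being handed to set_in_sample_idx() and set_out_sample_idx(), since those methods are documented as working with arrays.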
cfl.experiment module
- class cfl.experiment.Experiment(data_info, X_train, Y_train, X_train_raw=None, Y_train_raw=None, in_sample_idx=None, out_sample_idx=None, past_exp_path=None, block_names=None, block_params=None, blocks=None, verbose=1, results_path=None)
Bases:
object
The Experiment class creates a pipeline to pass data through the different Blocks of CFL and saves the parameters, models, and results of the pipeline for reuse.
- verbose
controls printout level
- is_trained
boolean indicating whether Experiment pipeline has been trained yet
- blocks
list of Block objects in pipeline
- block_names
list of names of Blocks in pipeline
- block_params
list of parameter dicts for each Block in pipeline
- data_info
dict of information about the training dataset
- datasets
list of Dataset objects registered to this Experiment
- save_path
path to directory to save Experiment results to
- train()
trains each Block in self.blocks according to its self-specified train method.
- predict()
generates predictions for each Block in self.blocks according to its self-specified predict method.
- __save_results()
helper function to save results generated by each Block
- __save_params()
helper function to save parameters for each Block
- __load_params()
helper function to load parameters for Blocks
- add_dataset()
registers a new Dataset with this Experiment
- get_dataset()
get a Dataset by name from registry
- load_dataset_results()
load results for a given Dataset from saved Experiment directory
- __build_block()
build a Block by str name or return the Block itself if already instantiated in the argument.
- __make_exp_dir()
make directory to save Experiment results and parameterization to.
- __init__(data_info, X_train, Y_train, X_train_raw=None, Y_train_raw=None, in_sample_idx=None, out_sample_idx=None, past_exp_path=None, block_names=None, block_params=None, blocks=None, verbose=1, results_path=None)
Sets up and trains an Experiment.
- Parameters:
data_info (dict) – a dictionary of information about this Experiment’s associated data. Refer to cfl.block.validate_data_info() for more information.
X_train (np.ndarray) – an (n_samples, n_x_features) 2D array.
Y_train (np.ndarray) – an (n_samples, n_y_features) 2D array.
X_train_raw (np.ndarray) – Deprecated, defaults to None.
Y_train_raw (np.ndarray) – Deprecated, defaults to None.
in_sample_idx (np.ndarray) – array of indices to include in training of CFL pipeline on X,Y. If None, will automatically generate. Defaults to None.
out_sample_idx (np.ndarray) – array of indices to withhold in training for validation of CFL pipeline on X,Y. If None, will automatically generate. Defaults to None.
past_exp_path (str) – path to directory associated with a previously trained Experiment. See note below.
block_names (list of strs) – list of block names to use (e.g. [‘CondDensityEstimator’, ‘CauseClusterer’, ‘EffectClusterer’]). See note below.
block_params (list of dicts) – list of dicts specifying parameters for each block specified in block_names. Default is None. See note below.
blocks (list of Blocks) – list of block objects. Default is None. See note below.
verbose (int) – Amount of logging to print. Possible values are 0, 1, 2. Default is 1.
results_path (str) – path to directory to save this experiment to. If None, results will not be saved. Default is None.
- Note: There are three ways to specify Blocks:
specify past_exp_path
specify both block_names and block_params
specify blocks.
Do not specify all four of these parameters.
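The rule in the note can be expressed as a small validation step. The helper below is hypothetical (`resolve_block_spec` is not part of cfl) and is stricter than the note: it assumes exactly one of the three options should be given and rejects any other combination. How Experiment itself handles conflicting arguments is not stated here.

```python
def resolve_block_spec(past_exp_path=None, block_names=None,
                       block_params=None, blocks=None):
    """Return which of the three documented options was used."""
    options = {
        "past_exp_path": past_exp_path is not None,
        "block_names_and_params": (block_names is not None
                                   and block_params is not None),
        "blocks": blocks is not None,
    }
    chosen = [name for name, given in options.items() if given]
    if len(chosen) != 1:
        raise ValueError(
            f"Specify exactly one way to define Blocks, got: {chosen}")
    # block_names and block_params must pair up one-to-one.
    if chosen[0] == "block_names_and_params":
        if len(block_names) != len(block_params):
            raise ValueError("block_names and block_params differ in length")
    return chosen[0]


mode = resolve_block_spec(block_names=["CondDensityEstimator"],
                          block_params=[{}])
print(mode)  # block_names_and_params
```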
- add_dataset(X, Y, dataset_name, Xraw=None, Yraw=None, in_sample_idx=None, out_sample_idx=None)
Add a new dataset to be tracked by this Experiment.
- Parameters:
X (np.ndarray) – X data of shape (n_samples, n_x_features) associated with this Dataset.
Y (np.ndarray) – Y data of shape (n_samples, n_y_features) associated with this Dataset.
dataset_name (str) – name associated with this Dataset. This will be the name used to retrieve a dataset using the Experiment.get_dataset() method.
Xraw (np.ndarray) – (Optional) raw form of X before preprocessing to remain associated with X for visualization. Defaults to None. Deprecated.
Yraw (np.ndarray) – (Optional) raw form of Y before preprocessing to remain associated with Y for visualization. Defaults to None. Deprecated.
- Returns:
the newly constructed Dataset object.
- Return type:
Dataset
- get_data_info()
- get_dataset(dataset_name)
Retrieve a Dataset that has been registered with this Experiment.
- Parameters:
dataset_name (str) – name of the Dataset to retrieve.
- Returns:
the Dataset associated with dataset_name.
- Return type:
Dataset
- get_save_path()
Return the path at which experiment results are saved.
- Parameters:
None –
- Returns:
path to experiment
- Return type:
str
- load_results_from_file(dataset_name='dataset_train')
Load and return saved results from running a given dataset through the Experiment pipeline. This function differs from retrieve_results() in that it loads the saved results from their save directory.
- Parameters:
dataset_name (str) – name of Dataset to load results for. Defaults to the dataset used to train the pipeline, ‘dataset_train’.
- Returns:
dictionary of results-dictionaries. The first key specifies which Block the results come from. The second key specifies the specific result.
- Return type:
dict of dicts
- predict(dataset, prev_results=None)
Predict using the trained CFL pipeline.
- Parameters:
dataset (str or Dataset) – dataset name or object.
prev_results (dict) – dict of results to pass to first Block to predict with, if needed.
- Returns:
dict of results dictionaries from all Blocks.
- Return type:
dict
- retrieve_results(dataset_name='dataset_train')
Return the results from running a given dataset through the Experiment pipeline. Defaults to the training dataset.
- Parameters:
dataset_name (str) – name of Dataset to load results for. Defaults to the dataset used to train the pipeline, ‘dataset_train’.
- Returns:
dictionary of results-dictionaries. The first key specifies which Block the results come from. The second key specifies the specific result.
- Return type:
dict
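Because the return value is a dict of dicts, a specific result is reached with two lookups: first the Block name, then the result name. The sketch uses a hand-written dictionary. The result keys `pyx` and `x_lbls`, the helper `list_result_keys`, and all values are placeholders chosen for illustration, not output from cfl.

```python
# Hand-written stand-in for the nested results structure.
results = {
    "CondDensityEstimator": {"pyx": [[0.1, 0.9], [0.8, 0.2]]},
    "CauseClusterer": {"x_lbls": [0, 1]},
}


def list_result_keys(results):
    """Map each Block name to the sorted names of its results."""
    return {block: sorted(block_results)
            for block, block_results in results.items()}


# First key selects the Block, second key selects the result.
cause_labels = results["CauseClusterer"]["x_lbls"]
print(list_result_keys(results))
```

Listing the available keys first is a cheap way to discover what each Block actually stored before indexing into it.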
- cfl.experiment.get_next_dirname(path)
Get the next subdirectory name in numerical order. For example, if ‘path’ contains ‘run0000’ and ‘run0001’, this will return ‘run0002’.
- Parameters:
path (str) – path of directory in which to find next subdirectory name
- Returns:
next subdirectory name.
- Return type:
str
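The naming scheme in the example can be reproduced with the standard library. The function below is a sketch written from the description alone, not the cfl source: the name `next_run_dirname`, the `run` prefix, and the four-digit zero padding are inferred from the ‘run0000’ example and should be treated as assumptions.

```python
import os
import re
import tempfile


def next_run_dirname(path, prefix="run", width=4):
    """Return the next unused '<prefix>NNNN' name under path."""
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+)$")
    numbers = []
    for name in os.listdir(path):
        match = pattern.match(name)
        # Only count existing subdirectories that follow the scheme.
        if match and os.path.isdir(os.path.join(path, name)):
            numbers.append(int(match.group(1)))
    next_number = max(numbers) + 1 if numbers else 0
    return f"{prefix}{next_number:0{width}d}"


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "run0000"))
    os.mkdir(os.path.join(tmp, "run0001"))
    next_name = next_run_dirname(tmp)

print(next_name)  # run0002
```

Taking the maximum existing number plus one, rather than counting directories, avoids reusing a name when an earlier run directory has been deleted.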