cfl package

Subpackages

Submodules

cfl.block module

class cfl.block.Block(data_info, block_params)

Bases: object

A Block is an object that can:
  1. be trained on a Dataset

  2. predict some target for a Dataset.

Blocks are intended to be the components of a graph workflow in an Experiment. For example, if the graph Block_A->Block_B is constructed in an Experiment, the output of Block_A will provide input to Block_B.
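The hand-off described above can be pictured with two toy objects. This is a minimal sketch, not cfl code: ToyScaler, ToySummer and run_chain are hypothetical stand-ins that only mimic the documented train(dataset, prev_results) signature, and a plain list stands in for a Dataset.

```python
# Hypothetical stand-ins that mimic the documented Block interface;
# they are not part of cfl.

class ToyScaler:
    """Plays the role of Block_A: doubles every value."""

    def train(self, dataset, prev_results=None):
        # Return a results dict, as the train() documentation describes.
        return {"scaled": [2 * v for v in dataset]}


class ToySummer:
    """Plays the role of Block_B: consumes Block_A's results."""

    def train(self, dataset, prev_results=None):
        return {"total": sum(prev_results["scaled"])}


def run_chain(blocks, dataset):
    """Train blocks in order, feeding each one the previous results."""
    all_results = {}
    prev_results = None
    for block in blocks:
        prev_results = block.train(dataset, prev_results)
        all_results[type(block).__name__] = prev_results
    return all_results


results = run_chain([ToyScaler(), ToySummer()], [1, 2, 3])
print(results)
```

Keying the collected results by Block name is a choice made for this sketch; it mirrors the dict-of-dicts shape that the Experiment methods are documented to return.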

__init__(data_info, block_params)

Instantiate the specified model.

Parameters:
  • data_info (dict) – dict of information about associated datasets

  • block_params (dict) – parameters for this model

Returns: None

_check_block_params(input_params)

Check that all expected block parameters have been provided, and substitute the default for any that are missing. Remove any parameters that were specified but are not used.

Parameters:

input_params (dict) – dictionary whose keys are parameter names

Returns:

verified parameter dictionary

Return type:

dict
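The checking behaviour described for _check_block_params can be sketched in a few lines. This is an illustration under assumptions rather than the library's implementation: check_params_sketch is a hypothetical helper, the parameter names are invented, and it assumes the expected parameters are exactly the keys of the defaults dictionary.

```python
def check_params_sketch(input_params, default_params):
    """Keep only expected keys, filling in defaults for missing ones."""
    verified = {}
    for key, default in default_params.items():
        # Use the caller's value when given, otherwise the default.
        verified[key] = input_params.get(key, default)
    # Keys absent from default_params are treated as unused and dropped.
    return verified


defaults = {"n_clusters": 4, "verbose": 1}  # hypothetical parameter names
checked = check_params_sketch({"n_clusters": 8, "typo_param": True}, defaults)
print(checked)
```

Here the misspelled key is silently discarded; whether the real method also warns about dropped keys is not stated in this reference.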

abstract _get_default_block_params()

Get the default parameters for the Block.

Parameters:

None

Returns:

dictionary of default parameters.

Return type:

dict

get_name()

Return name of model.

Parameters:

None

Returns:

name of the model

Return type:

str

get_params()

Return block params.

Parameters:

None

Returns:

parameters specified for this Block.

Return type:

dict

is_trained()

Return whether this block has been trained yet.

Parameters:

None

Returns:

whether the block has been trained

Return type:

bool

abstract load_block(path)

Load a Block that has already been trained in a previous Experiment. All Blocks should be load-able with just a path name. The specific Block type is responsible for making sure that it has loaded all relevant fields.

Parameters:

path – path to load from

Returns: None

abstract predict(dataset, prev_results=None)

Make prediction for the specified dataset with the model attribute.

Parameters:
  • dataset (Dataset) – dataset for model to predict on

  • prev_results (dict) – any results computed by the previous Block during prediction.

Returns:

a dictionary of results to be saved and to pass on as the ‘prev_results’ argument to the next Block’s predict method.

Return type:

dict

abstract save_block(path)

Save a Block that has been trained so that it can be reconstructed using load_block.

Parameters:

path – path to save at

Returns: None
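A save_block / load_block round trip might look like the sketch below. Everything in it is hypothetical: ToyBlock is not a cfl class, and pickle is used only for illustration, since this reference does not say how real Blocks serialize themselves.

```python
import os
import pickle
import tempfile


class ToyBlock:
    """Hypothetical stand-in with the documented save/load method names."""

    def __init__(self):
        self.model = None
        self.trained = False

    def train(self, dataset, prev_results=None):
        self.model = {"mean": sum(dataset) / len(dataset)}
        self.trained = True
        return {"mean": self.model["mean"]}

    def is_trained(self):
        return self.trained

    def save_block(self, path):
        # Assumption: pickle is used here only for illustration.
        with open(path, "wb") as f:
            pickle.dump(self.model, f)

    def load_block(self, path):
        # Restore every field needed to behave like a trained block.
        with open(path, "rb") as f:
            self.model = pickle.load(f)
        self.trained = True


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_block.pkl")
    original = ToyBlock()
    original.train([1.0, 2.0, 3.0])
    original.save_block(path)

    restored = ToyBlock()
    restored.load_block(path)

print(restored.is_trained(), restored.model)
```

The point of the sketch is that load_block needs nothing but a path, and that it sets the trained flag as well as the model so the restored object reports itself as trained.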

abstract train(dataset, prev_results=None)

Train model attribute.

Parameters:
  • dataset (Dataset) – dataset to train model with

  • prev_results (dict) – any results computed by the previous Block during training.

Returns:

a dictionary of results to be saved and to pass on as the ‘prev_results’ argument to the next Block’s train method.

Return type:

dict

cfl.dataset module

Dataset Module

class cfl.dataset.Dataset(X, Y, name='dataset', Xraw=None, Yraw=None, in_sample_idx=None, out_sample_idx=None)

Bases: object

The Dataset class packages X and Y so that they can be easily passed through steps of CFL and saved consistently. It enforces separation of any withheld datasets passed through CFL for prediction after training.

Note

Xraw and Yraw attributes will be deprecated soon as the visualizations interface has changed and no longer requires them to be stored in a Dataset.

__init__(X, Y, name='dataset', Xraw=None, Yraw=None, in_sample_idx=None, out_sample_idx=None)

Initialize Dataset.

Parameters:
  • X (np.ndarray) – X data to pass through CFL pipeline, dimensions (n_samples, n_x_features). #TODO: dimensions different if going to use a CNN

  • Y (np.ndarray) – Y data to pass through CFL pipeline, dimensions (n_samples, n_y_features).

  • name (str) – name of Dataset. Defaults to ‘dataset’.

  • Xraw (np.ndarray) – (Optional) raw form of X before preprocessing to remain associated with X for visualization. Defaults to None.

  • Yraw (np.ndarray) – (Optional) raw form of Y before preprocessing to remain associated with Y for visualization. Defaults to None.

Returns:

None

get_X()

Return the X array associated with this Dataset.

Parameters:

None

Returns:

an (n_samples, n_x_features) array

Return type:

np.ndarray

get_Y()

Return the Y array associated with this Dataset.

Parameters:

None

Returns:

an (n_samples, n_y_features) array

Return type:

np.ndarray

get_cfl_results()

Return cfl results generated by passing this dataset through Experiment training or prediction.

Parameters:

None

Returns:

results generated by cfl.Experiment.train or .predict

Return type:

dict

get_in_sample_idx()

Return in_sample_idx set for this Dataset.

Parameters:

None

Returns:

an array of sample indices in this subset

Return type:

np.ndarray

get_name()

Return the name of this Dataset.

Parameters:

None

Returns:

name associated with this Dataset.

Return type:

str

get_out_sample_idx()

Return out_sample_idx set for this Dataset.

Parameters:

None

Returns:

an array of sample indices in this subset

Return type:

np.ndarray

set_cfl_results(cfl_results)

Assign results from a CFL Experiment run to this Dataset.

Parameters:

cfl_results – results generated by cfl.Experiment.train or .predict

Returns: None

set_in_sample_idx(in_sample_idx)

Set the in_sample_idx for this Dataset.

Parameters:

in_sample_idx – an array of sample indices in this subset

Returns: None

set_out_sample_idx(out_sample_idx)

Set the out_sample_idx for this Dataset.

Parameters:

out_sample_idx – an array of sample indices in this subset

Returns: None
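One way to picture the in_sample_idx and out_sample_idx arrays is as two disjoint sets of row indices into X and Y. The sketch below is an assumption-laden illustration: split_indices_sketch, out_fraction and seed are hypothetical, plain lists stand in for np.ndarray, and this reference does not say how cfl generates the indices when they are left as None.

```python
import random


def split_indices_sketch(n_samples, out_fraction=0.25, seed=0):
    """Return (in_sample_idx, out_sample_idx) as sorted, disjoint lists."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_out = int(n_samples * out_fraction)
    out_sample_idx = sorted(indices[:n_out])
    in_sample_idx = sorted(indices[n_out:])
    return in_sample_idx, out_sample_idx


X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [0.8]]
in_idx, out_idx = split_indices_sketch(len(X))

# Select rows by index, as one would with the arrays stored on a Dataset.
X_in = [X[i] for i in in_idx]
X_out = [X[i] for i in out_idx]
print(len(X_in), len(X_out))
```

Arrays produced this way could then be handed to set_in_sample_idx and set_out_sample_idx, or passed as the in_sample_idx and out_sample_idx constructor arguments.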

cfl.experiment module

class cfl.experiment.Experiment(data_info, X_train, Y_train, X_train_raw=None, Y_train_raw=None, in_sample_idx=None, out_sample_idx=None, past_exp_path=None, block_names=None, block_params=None, blocks=None, verbose=1, results_path=None)

Bases: object

The Experiment class:
  • Creates a pipeline to pass data through the different Blocks of CFL

  • Saves parameters, models, and results of the pipeline for reuse

verbose

controls printout level

is_trained

boolean indicating whether Experiment pipeline has been trained yet

blocks

list of Block objects in pipeline

block_names

list of names of Blocks in pipeline

block_params

list of parameter dicts for each Block in pipeline

data_info

dict of information about the training dataset

datasets

list of Dataset objects registered to this Experiment

save_path

path to directory to save Experiment results to

train()

trains each Block in self.blocks according to its self-specified train method.

predict()

generates predictions for each Block in self.blocks according to its self-specified predict method.

__save_results()

helper function to save results generated by each Block

__save_params()

helper function to save parameters for each Block

__load_params()

helper function to load parameters for Blocks

add_dataset()

registers a new Dataset with this Experiment

get_dataset()

get a Dataset by name from registry

load_dataset_results()

load results for a given Dataset from saved Experiment directory

__build_block()

build a Block by str name or return the Block itself if already instantiated in the argument.

__make_exp_dir()

make directory to save Experiment results and parameterization to.

__init__(data_info, X_train, Y_train, X_train_raw=None, Y_train_raw=None, in_sample_idx=None, out_sample_idx=None, past_exp_path=None, block_names=None, block_params=None, blocks=None, verbose=1, results_path=None)

Sets up and trains an Experiment.

Parameters:
  • data_info (dict) – a dictionary of information about this Experiment’s associated data. Refer to cfl.block.validate_data_info() for more information.

  • X_train (np.ndarray) – an (n_samples, n_x_features) 2D array.

  • Y_train (np.ndarray) – an (n_samples, n_y_features) 2D array.

  • X_train_raw (np.ndarray) – Deprecated, defaults to None.

  • Y_train_raw (np.ndarray) – Deprecated, defaults to None.

  • in_sample_idx (np.ndarray) – array of indices to include in training of CFL pipeline on X,Y. If None, will automatically generate. Defaults to None.

  • out_sample_idx (np.ndarray) – array of indices to withhold in training for validation of CFL pipeline on X,Y. If None, will automatically generate. Defaults to None.

  • past_exp_path (str) – path to directory associated with a previously trained Experiment. See note below.

  • block_names (list of strs) – list of block names to use (e.g. [‘CondDensityEstimator’, ‘CauseClusterer’, ‘EffectClusterer’]). See note below.

  • block_params (list of dicts) – list of dicts specifying parameters for each block specified in block_names. Default is None. See note below.

  • blocks (list of Blocks) – list of block objects. Default is None. See note below.

  • verbose (int) – Amount of logging to print. Possible values are 0, 1, 2. Default is 1.

  • results_path (str) – path to directory to save this experiment to. If None, results will not be saved. Default is None.

Note: There are three ways to specify Blocks:
  1. specify past_exp_path

  2. specify both block_names and block_params

  3. specify blocks.

Do not specify all four of these parameters (past_exp_path, block_names, block_params, and blocks) at once.
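The three options in the note can be expressed as a small argument check. This is a hypothetical sketch, not how Experiment validates its arguments: block_spec_mode is an invented helper, and requiring exactly one option is a stricter reading than the note itself, which only rules out passing all four parameters.

```python
def block_spec_mode(past_exp_path=None, block_names=None,
                    block_params=None, blocks=None):
    """Name the single option in use, or raise if the choice is ambiguous."""
    # Option 2 needs both lists; one without the other is incomplete.
    if (block_names is None) != (block_params is None):
        raise ValueError("block_names and block_params must be given together")

    options = {
        "past_exp_path": past_exp_path is not None,
        "block_names+block_params": block_names is not None,
        "blocks": blocks is not None,
    }
    chosen = [name for name, used in options.items() if used]
    if len(chosen) != 1:
        raise ValueError(f"expected exactly one option, got {chosen}")
    return chosen[0]


mode = block_spec_mode(
    block_names=["CondDensityEstimator", "CauseClusterer"],
    block_params=[{}, {}],
)
print(mode)
```

Calling the helper with no arguments, or with more than one option filled in, raises ValueError in this sketch.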

add_dataset(X, Y, dataset_name, Xraw=None, Yraw=None, in_sample_idx=None, out_sample_idx=None)

Add a new dataset to be tracked by this Experiment.

Parameters:
  • X (np.ndarray) – X data of shape (n_samples, n_x_features) associated with this Dataset.

  • Y (np.ndarray) – Y data of shape (n_samples, n_y_features) associated with this Dataset.

  • dataset_name (str) – name associated with this Dataset. This will be the name used to retrieve a dataset using the Experiment.get_dataset() method.

  • Xraw (np.ndarray) – (Optional) raw form of X before preprocessing to remain associated with X for visualization. Defaults to None. Deprecated.

  • Yraw (np.ndarray) – (Optional) raw form of Y before preprocessing to remain associated with Y for visualization. Defaults to None. Deprecated.

Returns:

the newly constructed Dataset object.

Return type:

Dataset

get_data_info()

get_dataset(dataset_name)

Retrieve a Dataset that has been registered with this Experiment.

Parameters:

dataset_name (str) – name of the Dataset to retrieve.

Returns:

the Dataset associated with dataset_name.

Return type:

Dataset

get_save_path()

Return the path at which experiment results are saved.

Parameters:

None

Returns:

path to experiment

Return type:

str

load_results_from_file(dataset_name='dataset_train')

Load and return saved results from running a given dataset through the Experiment pipeline. This function differs from retrieve_results() in that it loads the saved results from their save directory.

Parameters:

dataset_name (str) – name of Dataset to load results for. Defaults to the dataset used to train the pipeline, ‘dataset_train’.

Returns:

dictionary of results-dictionaries. The first key specifies which Block the results come from. The second key specifies the specific result.

Return type:

dict of dicts

predict(dataset, prev_results=None)

Predict using the trained CFL pipeline.

Parameters:
  • dataset (str or Dataset) – dataset name or object.

  • prev_results (dict) – dict of results to pass to first Block to predict with, if needed.

Returns:

dict of results dictionaries from all Blocks.

Return type:

dict

retrieve_results(dataset_name='dataset_train')

Return the results from running a given dataset through the Experiment pipeline. Defaults to the training dataset.

Parameters:

dataset_name (str) – name of Dataset to load results for. Defaults to the dataset used to train the pipeline, ‘dataset_train’.

Returns:

dictionary of results-dictionaries. The first key specifies which Block the results come from. The second key specifies the specific result.

Return type:

dict
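The two-level layout of the returned dictionary can be illustrated with made-up data. The Block names below are taken from the block_names example in this reference, but the inner keys and values are invented for illustration and should not be read as the names cfl actually uses.

```python
# Hypothetical results; the Block names come from the block_names example,
# but the inner keys and values are invented for illustration.
results = {
    "CondDensityEstimator": {"example_output": [0.2, 0.8]},
    "CauseClusterer": {"example_labels": [0, 1, 1, 0]},
}

# First key selects the Block, second key selects one of its results.
labels = results["CauseClusterer"]["example_labels"]

# Walk every Block and list the result names it produced.
summary = {block: sorted(block_results)
           for block, block_results in results.items()}
print(labels)
print(summary)
```

Listing the inner keys this way is a convenient first step when the result names of a given Block are not known in advance.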

train(dataset=None, prev_results=None)

Train the CFL pipeline.

Parameters:
  • dataset (str or Dataset) – dataset name or object.

  • prev_results (dict) – dict of results to pass to first Block to be trained, if needed.

Returns:

dict of results dicts from all Blocks.

Return type:

dict

cfl.experiment.get_next_dirname(path)

Get the next subdirectory name in numerical order. For example, if ‘path’ contains ‘run0000’ and ‘run0001’, this will return ‘run0002’.

Parameters:

path (str) – path of directory in which to find next subdirectory name

Returns:

next subdirectory name.

Return type:

str
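The naming rule in the example above can be reproduced with a short standalone function. This is a sketch under assumptions, not the cfl implementation: next_dirname_sketch is a hypothetical name, the ‘run’ prefix and four-digit width are inferred from the ‘run0000’ example, and returning ‘run0000’ for a directory with no runs is a guess.

```python
import os
import re
import tempfile


def next_dirname_sketch(path, prefix="run", width=4):
    """Return the next '<prefix><number>' name not yet used under path."""
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+)$")
    numbers = []
    for entry in os.listdir(path):
        match = pattern.match(entry)
        # Only count matching entries that are directories.
        if match and os.path.isdir(os.path.join(path, entry)):
            numbers.append(int(match.group(1)))
    next_number = max(numbers) + 1 if numbers else 0
    return f"{prefix}{next_number:0{width}d}"


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "run0000"))
    os.mkdir(os.path.join(tmp, "run0001"))
    name = next_dirname_sketch(tmp)

print(name)
```

Taking the maximum existing number rather than counting entries means a gap in the sequence does not cause an existing directory name to be reused.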

cfl.type_decorators module

Module contents