cfl.util package

Submodules

cfl.util.data_processing module

A set of helper functions that are used often in processing data passed through CFL.

cfl.util.data_processing.one_hot_decode(data)
Convert one-hot-encoded samples to standard categorical labels. For example, if data = [[0,1],[1,0],[1,0]], one_hot_decode(data) will return [1,0,0].

Parameters:

data – a 2D int array comprised only of ones and zeros. (np.ndarray)

Returns:

ohd – a 1D int array holding the one-hot decoding of data.

Return type:

np.ndarray
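The decoding described above can be reproduced with plain NumPy. The following is a minimal sketch of the behavior, not CFL's implementation; the name one_hot_decode_sketch is hypothetical, and the sketch assumes each row contains exactly one 1.

```python
import numpy as np

def one_hot_decode_sketch(data):
    """Illustrative stand-in for one_hot_decode (hypothetical helper)."""
    data = np.asarray(data)
    # The column index of the single 1 in each row is the categorical label.
    return np.argmax(data, axis=1)

decoded = one_hot_decode_sketch([[0, 1], [1, 0], [1, 0]])
print(decoded)  # [1 0 0]
```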

cfl.util.data_processing.one_hot_encode(data, unique_labels)

Convert categorical labels to one-hot-encoding. For example, if data = [0, 2, 1, 2], one_hot_encode(data, [0, 1, 2]) will return [[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 1]].

Parameters:
  • data – an int array of categorical labels. (np.ndarray)

  • unique_labels – unique set of labels included in data (i.e. result of np.unique(data)) (np.ndarray)

Returns:

ohe – one-hot encoding of data

Return type:

np.ndarray
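As a rough illustration of the encoding, the sketch below compares every sample against every entry of unique_labels. It is an assumption about how such an encoder can be written, not CFL's own code, and one_hot_encode_sketch is a hypothetical name.

```python
import numpy as np

def one_hot_encode_sketch(data, unique_labels):
    """Illustrative stand-in for one_hot_encode (hypothetical helper)."""
    data = np.asarray(data)
    unique_labels = np.asarray(unique_labels)
    # Broadcasting yields a (n_samples, n_labels) boolean matrix that is
    # True exactly where a sample equals the label for that column.
    return (data[:, None] == unique_labels[None, :]).astype(int)

encoded = one_hot_encode_sketch([0, 2, 1, 2], [0, 1, 2])
print(encoded.tolist())  # [[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 1]]
```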

cfl.util.data_processing.standardize_train_test(data, dtype=<class 'numpy.float32'>)

Standardize data that has been split into training and test sets.

Parameters:
  • data (array) –

    an array of 2D np.arrays to z-score along axis=1. For example, data could equal [Xtr, Xts, Ytr, Yts], where:

    • Xtr.shape = (n_train_samples, n_x_features)

    • Xts.shape = (n_test_samples, n_x_features)

    • Ytr.shape = (n_train_samples, n_y_features)

    • Yts.shape = (n_test_samples, n_y_features)

  • dtype (type) – data type in which to return the values of all np.arrays

Returns:

data – standardized version of the data argument

Return type:

array
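For orientation, one common way to standardize a train/test pair is sketched below. It assumes that each feature column is z-scored with the mean and standard deviation of the training set; CFL's function may compute its statistics differently, so treat standardize_pair_sketch (a hypothetical name) as an illustration only.

```python
import numpy as np

def standardize_pair_sketch(train, test, dtype=np.float32):
    """Z-score each feature column using training-set statistics.

    Assumption: this convention is illustrative and may not match CFL.
    """
    train = np.asarray(train, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled
    return ((train - mean) / std).astype(dtype), ((test - mean) / std).astype(dtype)

Xtr = np.array([[1.0, 10.0], [3.0, 30.0]])
Xts = np.array([[2.0, 20.0]])
Xtr_z, Xts_z = standardize_pair_sketch(Xtr, Xts)
```

Applying the same helper to (Ytr, Yts) would give the second half of a [Xtr, Xts, Ytr, Yts] list.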

cfl.util.experiment_loading module

A set of functions to load in results from saved CFL experiments.

cfl.util.experiment_loading.exp_load(exp_path, exp_id, dataset, block_name, result)

Loads in a result saved by a CFL Experiment.

Parameters:
  • exp_path (str) – path to directory where experiments are saved

  • exp_id (int) – experiment ID number

  • dataset – name of dataset to pull results for (specify ‘dataset_train’ if you would like results for the dataset passed into the Experiment during initialization for training)

  • block_name (str) – name of Block to pull results from (common Block names include: ‘CondDensityEstimator’, ‘CauseClusterer’, or ‘EffectClusterer’)

  • result (str) – name of specific result to pull, e.g. ‘x_lbls’

Returns:

result object

Return type:

type varies

cfl.util.experiment_loading.get_fig_path(exp_path, exp_id, dataset)

Builds a path to save a figure to in an Experiment directory.

Parameters:
  • exp_path (str) – path to directory where experiments are saved

  • exp_id (int) – experiment ID number

  • dataset – name of dataset to pull results for (specify ‘dataset_train’ if you would like results for the dataset passed into the Experiment during initialization for training)

Returns:

path to save figure to

Return type:

str

cfl.util.find_xlbl_locations module

Return the indices of each x_lbl grouped together

cfl.util.find_xlbl_locations.rows_where_each_x_class_occurs(x_lbls)

Returns the indices at which each x_lbl (X macrovariable class) occurs, as a list of np arrays.

Parameters:
  • x_lbls (np.array) – a 1-D array, output from CFL, that contains CFL cluster labels

Returns:

a list whose length equals the number of clusters in x_lbls. Each entry in the list is a numpy array that gives the indices at which that cluster's label occurs in x_lbls.

Return type:

list of np.arrays

Example

>>> import numpy as np
>>> from cfl.util.find_xlbl_locations import rows_where_each_x_class_occurs
>>> x_lbls = np.array([0, 1, 0, 1, 2])
>>> rows_where_each_x_class_occurs(x_lbls)
[array([0, 2], dtype=int64), array([1, 3], dtype=int64), array([4], dtype=int64)]
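The grouping shown in the example can be approximated with np.unique and np.where. This is a sketch of the idea under the assumption that classes are reported in sorted label order; rows_per_class_sketch is a hypothetical name, not part of CFL.

```python
import numpy as np

def rows_per_class_sketch(x_lbls):
    """Return one index array per distinct label (hypothetical helper)."""
    x_lbls = np.asarray(x_lbls)
    # np.unique returns the distinct labels in sorted order;
    # np.where(...)[0] gives the row indices where each label occurs.
    return [np.where(x_lbls == c)[0] for c in np.unique(x_lbls)]

rows = rows_per_class_sketch([0, 1, 0, 1, 2])
print([r.tolist() for r in rows])  # [[0, 2], [1, 3], [4]]
```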

cfl.util.input_val module

A set of functions helpful to validate inputs to CFL.

cfl.util.input_val.check_params(input_params, default_params, tag)

Check that all expected parameters have been provided, and substitute the default if not. Remove any unused but specified parameters.

Parameters:
  • input_params (dict) – dictionary, where keys are parameter names

  • default_params – dictionary, where keys are parameter names and this set of parameter names is the complete set of required params

Returns:

Verified parameter dictionary

Return type:

dict
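The fill-in-defaults and drop-unused behavior described above can be sketched in a few lines. This is an illustration rather than CFL's implementation: check_params_sketch is a hypothetical name, the tag argument is omitted because its role is not documented here, and the parameter names in the usage lines are made up.

```python
def check_params_sketch(input_params, default_params):
    """Keep only expected keys, falling back to defaults (hypothetical helper)."""
    verified = {}
    for name, default in default_params.items():
        # Use the supplied value when present, otherwise the default.
        verified[name] = input_params.get(name, default)
    # Keys of input_params that are absent from default_params are dropped.
    return verified

# 'lr', 'n_epochs' and 'unused' are invented names for illustration.
verified = check_params_sketch(
    {'lr': 0.01, 'unused': True},
    {'lr': 0.001, 'n_epochs': 20},
)
print(verified)  # {'lr': 0.01, 'n_epochs': 20}
```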

cfl.util.input_val.validate_data_info(data_info)

Make sure all information about data is correctly specified.

Parameters:

data_info (dict) – a dictionary of information about the data. CFL expects the following entries in data_info:

  • X_dims: (n_examples X, n_features X)

  • Y_dims: (n_examples Y, n_features Y)

  • Y_type: ‘continuous’ or ‘categorical’
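A data_info dictionary with the three entries listed above might look like the one below, together with a minimal sketch of the kind of checks a validator could run. The checks and the name validate_data_info_sketch are assumptions for illustration; CFL's own validation may be stricter or differ in its error types.

```python
def validate_data_info_sketch(data_info):
    """Check the three documented entries (hypothetical helper)."""
    for key in ('X_dims', 'Y_dims', 'Y_type'):
        if key not in data_info:
            raise KeyError(f'data_info is missing required entry {key!r}')
    for key in ('X_dims', 'Y_dims'):
        dims = data_info[key]
        # Each dims entry is expected to be (n_examples, n_features).
        if len(dims) != 2 or not all(isinstance(d, int) and d > 0 for d in dims):
            raise ValueError(f'{key} must be two positive integers')
    if data_info['Y_type'] not in ('continuous', 'categorical'):
        raise ValueError("Y_type must be 'continuous' or 'categorical'")
    return True

info = {'X_dims': (100, 3), 'Y_dims': (100, 2), 'Y_type': 'continuous'}
is_valid = validate_data_info_sketch(info)
```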

Module contents