cfl.util package
Submodules
cfl.util.data_processing module
A set of helper functions that are used often in processing data passed through CFL.
- cfl.util.data_processing.one_hot_decode(data)
Convert one-hot-encoded samples to standard categorical labels. For
example, if data = [[0,1],[1,0],[1,0]], one_hot_decode(data) will return [1,0,0].
- Parameters:
data – a 2D int array consisting only of ones and zeros. (np.ndarray)
- Returns:
a 1D int array holding the one-hot decoding of data.
- Return type:
np.ndarray
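The decoding described above can be sketched with NumPy as follows. This is an illustrative stand-in rather than the library's own implementation, and the helper name `one_hot_decode_sketch` is hypothetical.

```python
import numpy as np

def one_hot_decode_sketch(data):
    """Hypothetical stand-in: return the column index of the 1 in each row."""
    data = np.asarray(data)
    # argmax along axis=1 gives, for each row, the position of its largest
    # entry, which for a one-hot row is the position of the single 1.
    return np.argmax(data, axis=1)

decoded = one_hot_decode_sketch([[0, 1], [1, 0], [1, 0]])
print(decoded)
```

The sketch assumes every row contains exactly one 1; a row of all zeros would silently decode to label 0.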
- cfl.util.data_processing.one_hot_encode(data, unique_labels)
Convert categorical labels to one-hot-encoding. For example, if data = [0, 2, 1, 2], one_hot_encode(data, [0, 1, 2]) will return [[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 1]].
- Parameters:
data – an int array of categorical labels. (np.ndarray)
unique_labels – unique set of labels included in data (i.e. result of np.unique(data)) (np.ndarray)
- Returns:
one-hot encoding of data
- Return type:
np.ndarray
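A minimal sketch of the encoding step, assuming plain NumPy broadcasting. The helper name `one_hot_encode_sketch` is hypothetical, and the library may build the array differently.

```python
import numpy as np

def one_hot_encode_sketch(data, unique_labels):
    """Hypothetical stand-in: one row per sample, one column per unique label."""
    data = np.asarray(data)
    unique_labels = np.asarray(unique_labels)
    # Broadcasting compares every sample with every unique label, producing
    # a boolean array of shape (n_samples, n_labels) with one True per row.
    return (data[:, None] == unique_labels[None, :]).astype(int)

encoded = one_hot_encode_sketch([0, 2, 1, 2], [0, 1, 2])
print(encoded)
```

Column order follows the order of `unique_labels`, so passing the sorted output of `np.unique(data)` keeps column i aligned with the i-th smallest label.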
- cfl.util.data_processing.standardize_train_test(data, dtype=<class 'numpy.float32'>)
Standardize data that has been split into training and test sets.
- Parameters:
data (array) –
an array of 2D np.arrays to z-score along axis=1. For example, data could equal [Xtr, Xts, Ytr, Yts], where:
Xtr.shape = (n_train_samples, n_x_features)
Xts.shape = (n_test_samples , n_x_features)
Ytr.shape = (n_train_samples, n_y_features)
Yts.shape = (n_test_samples , n_y_features)
dtype (type) – data type in which to return the values of all np.arrays
- Returns:
standardized version of the data argument
- Return type:
array
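Z-scoring a list of 2D arrays might look like the sketch below. Several things here are assumptions, not documented behavior: the helper name `zscore_each_sketch` is hypothetical, each array is standardized with its own statistics, and the reduction axis is exposed as a parameter because this page does not show which convention the library follows.

```python
import numpy as np

def zscore_each_sketch(arrays, axis=0, dtype=np.float32):
    """Hypothetical helper: z-score each 2D array independently."""
    out = []
    for arr in arrays:
        arr = np.asarray(arr, dtype=np.float64)
        mean = arr.mean(axis=axis, keepdims=True)
        std = arr.std(axis=axis, keepdims=True)
        std[std == 0] = 1.0  # leave constant slices unscaled instead of dividing by zero
        out.append(((arr - mean) / std).astype(dtype))
    return out

rng = np.random.default_rng(0)
Xtr, Xts = rng.normal(size=(8, 3)), rng.normal(size=(4, 3))
Ytr, Yts = rng.normal(size=(8, 2)), rng.normal(size=(4, 2))
standardized = zscore_each_sketch([Xtr, Xts, Ytr, Yts])
```

Each output array keeps the shape of its input and is cast to `dtype`. With the default `axis=0`, every feature column ends up with approximately zero mean and unit variance.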
cfl.util.experiment_loading module
A set of functions to load in results from saved CFL experiments.
- cfl.util.experiment_loading.exp_load(exp_path, exp_id, dataset, block_name, result)
Loads in a result saved by a CFL Experiment.
- Parameters:
exp_path (str) – path to directory where experiments are saved
exp_id (int) – experiment ID number
dataset – name of dataset to pull results for (specify ‘dataset_train’ if you would like results for the dataset passed into the Experiment during initialization for training)
block_name (str) – name of Block to pull results from (common Block names include: ‘CondDensityEstimator’, ‘CauseClusterer’, or ‘EffectClusterer’)
result (str) – name of specific result to pull, e.g. ‘x_lbls’
- Returns:
result object
- Return type:
type varies
- cfl.util.experiment_loading.get_fig_path(exp_path, exp_id, dataset)
Builds a path to save a figure to in an Experiment directory.
- Parameters:
exp_path (str) – path to directory where experiments are saved
exp_id (int) – experiment ID number
dataset – name of dataset to pull results for (specify ‘dataset_train’ if you would like results for the dataset passed into the Experiment during initialization for training)
- Returns:
path to save figure to
- Return type:
str
cfl.util.find_xlbl_locations module
Return the indices of each x_lbl grouped together
- cfl.util.find_xlbl_locations.rows_where_each_x_class_occurs(x_lbls)
Returns the indices at which each x_lbl (X macrovariable class) occurs, as a list of np arrays.
- Parameters:
x_lbls (np.array) – a 1-D array, output from CFL, that contains CFL (cluster) labels
- Returns:
a list whose length equals the number of clusters in x_lbls. Each entry in the list is a numpy array that gives the indices at which the corresponding class occurs in x_lbls
- Return type:
list of np.arrays
Example
>>> import numpy as np
>>> from cfl.util.find_xlbl_locations import rows_where_each_x_class_occurs
>>> x_lbls = np.array([0, 1, 0, 1, 2])
>>> rows_where_each_x_class_occurs(x_lbls)
[array([0, 2], dtype=int64), array([1, 3], dtype=int64), array([4], dtype=int64)]
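For readers who want to see how such a grouping can be computed, here is a minimal NumPy sketch. The helper name `rows_per_class_sketch` is hypothetical, and iterating over the sorted unique labels is an assumption; the library may enumerate classes differently.

```python
import numpy as np

def rows_per_class_sketch(x_lbls):
    """Hypothetical stand-in: one index array per distinct label, in sorted label order."""
    x_lbls = np.asarray(x_lbls)
    # np.where on a 1-D boolean mask returns a 1-tuple holding the matching indices.
    return [np.where(x_lbls == c)[0] for c in np.unique(x_lbls)]

groups = rows_per_class_sketch(np.array([0, 1, 0, 1, 2]))
for cls, idx in enumerate(groups):
    print(cls, idx)
```

Each returned array can be used directly to index the original data, for example to pull out all samples assigned to one macrovariable class.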
cfl.util.input_val module
A set of functions helpful to validate inputs to CFL.
- cfl.util.input_val.check_params(input_params, default_params, tag)
Check that all expected parameters have been provided, and substitute the default if not. Remove any unused but specified parameters.
- Parameters:
input_params (dict) – dictionary, where keys are parameter names
default_params – dictionary, where keys are parameter names and this set of parameter names is the complete set of required params
- Returns:
Verified parameter dictionary
- Return type:
dict
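The fill-in-defaults and drop-unknowns behavior described above can be sketched with plain dictionaries. The helper name `check_params_sketch` is hypothetical, the parameter names in the usage lines are made up for illustration, and the `tag` argument is left out because its role is not described here.

```python
def check_params_sketch(input_params, default_params):
    """Hypothetical stand-in: keep only expected keys, filling gaps with defaults."""
    verified = {}
    for name, default in default_params.items():
        # Use the caller's value when present, otherwise fall back to the default.
        verified[name] = input_params.get(name, default)
    # Keys in input_params that are absent from default_params are never copied.
    return verified

defaults = {"batch_size": 32, "lr": 1e-3}   # illustrative parameter names
user = {"lr": 1e-4, "momentum": 0.9}        # 'momentum' is not an expected key
params = check_params_sketch(user, defaults)
print(params)
```

Because the loop iterates over `default_params`, the result always has exactly the expected key set, whatever the caller passed in.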
- cfl.util.input_val.validate_data_info(data_info)
Make sure all information about data is correctly specified.
- Parameters:
data_info (dict) – a dictionary of information about the data. CFL expects the following entries in data_info:
- X_dims: (n_examples X, n_features X)
- Y_dims: (n_examples Y, n_features Y)
- Y_type: ‘continuous’ or ‘categorical’
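A validation routine for the entries listed above might look like the following sketch. The helper name `validate_data_info_sketch` is hypothetical, and the specific checks (two-element dimension tuples, the exception types raised) are assumptions rather than the library's documented behavior.

```python
def validate_data_info_sketch(data_info):
    """Hypothetical stand-in: check the three documented entries of data_info."""
    required = ("X_dims", "Y_dims", "Y_type")
    missing = [key for key in required if key not in data_info]
    if missing:
        raise KeyError(f"data_info is missing entries: {missing}")
    for key in ("X_dims", "Y_dims"):
        # Assumption: dims are given as (n_examples, n_features).
        if len(data_info[key]) != 2:
            raise ValueError(f"{key} should be (n_examples, n_features)")
    if data_info["Y_type"] not in ("continuous", "categorical"):
        raise ValueError("Y_type should be 'continuous' or 'categorical'")
    return True

ok = validate_data_info_sketch(
    {"X_dims": (100, 5), "Y_dims": (100, 2), "Y_type": "continuous"}
)
print(ok)
```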