gojo.util package

Submodules

gojo.util.io module

gojo.util.io.load(file: str, backend: str = 'joblib_gzip') → object

Function used to load serialized Python objects (see gojo.util.io.serialize).

Parameters
file (str): File containing the object to be loaded.
backend (str, default='joblib_gzip'): Backend used to serialize the object.

Returns
obj (object): Loaded object.

gojo.util.io.loadJson(file: str) → dict

Loads a JSON file.

Parameters
file (str): JSON file to be loaded.

Returns
content (dict): JSON file content.

gojo.util.io.pprint(*args, verbose: bool = True, level: Optional[str] = None, sep: str = ' '): Print function for the gojo module.
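The level argument presumably accepts the keys of gojo.util.login.Login.logger_levels. The sketch below illustrates that kind of level dispatch with the standard-library logging module instead of loguru. The function name pprint_sketch, its logger parameter, the fallback to print() when no logger is given, and the mapping of 'success' to INFO (standard-library logging has no success level) are all assumptions for illustration, not gojo's implementation.

```python
import logging
from typing import Optional

# Level aliases mirroring the keys of Login.logger_levels. Standard-library
# logging has no SUCCESS level, so 'success' falls back to INFO (assumption).
LEVELS = {
    None: logging.INFO,
    "info": logging.INFO,
    "error": logging.ERROR,
    "err": logging.ERROR,
    "warn": logging.WARNING,
    "warning": logging.WARNING,
    "success": logging.INFO,
}


def pprint_sketch(*args, verbose: bool = True, level: Optional[str] = None,
                  sep: str = " ", logger: Optional[logging.Logger] = None) -> Optional[str]:
    """Hypothetical stand-in for gojo.util.io.pprint."""
    if not verbose:
        return None  # verbose=False silences the call entirely
    if level not in LEVELS:
        raise ValueError(f"Unknown level {level!r}; expected one of {list(LEVELS)}")
    message = sep.join(str(arg) for arg in args)
    if logger is None:
        print(message)  # no logger configured: plain standard output
    else:
        logger.log(LEVELS[level], message)
    return message  # returned only to make the sketch easy to test
```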

gojo.util.io.saveJson(data: dict, file: str)

Saves the input dictionary to a JSON file.

Parameters
data (dict): Dictionary to be exported to a JSON file.
file (str): Output JSON file.

IMPORTANT NOTE: NumPy types must be converted to native Python types beforehand.
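The note above matters because the standard json module cannot encode NumPy integers or arrays: json.dumps({"n": np.int64(3)}) raises TypeError. A minimal sketch of a recursive conversion helper that could be applied to the dictionary before calling saveJson; the helper name to_python_types is hypothetical and not part of gojo.

```python
import json

import numpy as np


def to_python_types(value):
    """Recursively replace NumPy scalars and arrays with native Python objects."""
    if isinstance(value, dict):
        return {key: to_python_types(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        # Tuples become lists, which is what JSON would produce anyway.
        return [to_python_types(item) for item in value]
    if isinstance(value, np.ndarray):
        return value.tolist()  # nested Python lists of native scalars
    if isinstance(value, np.generic):
        return value.item()  # np.int64 -> int, np.bool_ -> bool, ...
    return value
```

For example, json.dumps(to_python_types({"n": np.int64(3), "w": np.array([0.5, 1.5])})) produces '{"n": 3, "w": [0.5, 1.5]}'.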

gojo.util.io.saveTorchModel(base_path: str, key: str, model: torch.nn.Module) → str

Function used to save the weights of torch.nn.Module models.

Parameters
base_path (str): Base directory where the model will be stored. If this directory does not exist, it will be created.
key (str): Key used to identify the model.
model (torch.nn.Module): Model whose parameters will be saved.

Returns
file (str): Generated file.

gojo.util.io.saveTorchModelAndHistory(base_path: str, key: str, model: torch.nn.Module, history: dict)

Subroutine used to serialize the model data and the convergence history.

Parameters
base_path (str): Base directory where the model and convergence information will be stored. If this directory does not exist, it will be created.
key (str): Key used to identify the model.
model (torch.nn.Module): Model whose parameters will be saved.
history (dict): Dictionary similar to the one returned by the function util.torch_util.fit_neural_network().

gojo.util.io.serialize(obj, path: str, time_prefix: bool = False, overwrite: bool = False, backend: str = 'joblib_gzip') → str

Function used to serialize Python objects.

Parameters
obj (object): Object to be saved.
path (str): File used to save the provided object.
time_prefix (bool, default=False): Whether to add a time prefix (YYYYMMDD-HHMMSS) to the exported file.
overwrite (bool, default=False): Whether to overwrite an existing file.
backend (str, default='joblib_gzip'): Backend used to serialize the object.

Returns
path (str): File where the object was serialized.
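serialize and load are meant to be used as a pair. The sketch below reproduces the documented time_prefix and overwrite behaviour, with the standard-library gzip and pickle modules standing in for the 'joblib_gzip' backend (gojo itself presumably relies on joblib). The function names, the placement of the prefix in front of the file name, and the '-' separator after the prefix are assumptions.

```python
import gzip
import os
import pickle
from datetime import datetime


def serialize_sketch(obj, path: str, time_prefix: bool = False, overwrite: bool = False) -> str:
    """Save `obj` as a gzip-compressed pickle and return the file actually written."""
    if time_prefix:
        directory, name = os.path.split(path)
        stamp = datetime.now().strftime("%Y%m%d-%H%M%S")  # YYYYMMDD-HHMMSS
        path = os.path.join(directory, f"{stamp}-{name}")
    if os.path.exists(path) and not overwrite:
        raise FileExistsError(f"{path!r} already exists and overwrite=False")
    with gzip.open(path, "wb") as handle:
        pickle.dump(obj, handle)
    return path


def load_sketch(file: str):
    """Inverse of serialize_sketch: read back a gzip-compressed pickle."""
    with gzip.open(file, "rb") as handle:
        return pickle.load(handle)
```

Returning the written path is what makes the time prefix usable: the caller cannot predict the final file name, so it must be taken from the return value and passed to the loader.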

gojo.util.login module

class gojo.util.login.Login

Bases: object

Basic logging handler.

logger_levels = {None: loguru.logger.info, 'info': loguru.logger.info, 'error': loguru.logger.error, 'err': loguru.logger.error, 'warn': loguru.logger.warning, 'warning': loguru.logger.warning, 'success': loguru.logger.success}

gojo.util.login.configureLogger(file: Optional[str] = None, add_time_prefix: bool = True)

Function used to configure the logging system. If no file is provided, the output is written to the standard Python output. If a file is provided, it will be created and the output will be redirected to it.

Logging levels (when calling gojo.util.io.pprint()):
- None: Information level.
- 'info': Information level (same as None).
- 'error': Error level.
- 'err': Error level (same as 'error').
- 'warning': Warning level.
- 'warn': Warning level (same as 'warning').
- 'success': Success level.

Parameters
file (str, default=None): Output file to which the output is redirected.
add_time_prefix (bool, default=True): Whether to add a time prefix to the log.

The logging status can be checked using gojo.util.login.isActive(), and logging can be disabled using gojo.util.login.deactivate().

gojo.util.login.deactivate(): Deactivates the current logging system.

gojo.util.login.isActive() → bool: Indicates whether the logging system is active.

gojo.util.splitter module

class gojo.util.splitter.InstanceLevelKFoldSplitter(n_splits: int, instance_id: numpy.ndarray, n_repeats: int = 1, shuffle: bool = True, random_state: Optional[int] = None)

Bases: object

Splitter that performs the splits at the instance level rather than at the observation level: all observations associated with the same instance are assigned to the same partition.

Important

Each observation (row) of the data passed to the split() method is associated with the identifier at the same position in instance_id.

Parameters
n_splits (int): Number of folds. Must be at least 2.
instance_id (np.ndarray): Array identifying the instances used to perform the splits.
n_repeats (int, default=1): Number of times the cross-validation is repeated.
shuffle (bool, default=True): Whether to shuffle the data before performing the split.
random_state (int, default=None): Controls the randomness of each repeated cross-validation instance.

split(X: pandas.DataFrame, y=None) → Tuple[numpy.ndarray, numpy.ndarray]

Generates the splits. Returns a tuple whose first element contains the training indices and whose second element contains the test indices.

Important

X must match instance_id: its rows must correspond, in order, to the identifiers in instance_id.

Parameters
X (np.ndarray or pd.DataFrame): Input data.
y (object, default=None): Ignored. Present for sklearn compatibility.
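The idea behind instance-level splitting can be sketched by drawing the folds over the unique identifiers in instance_id and then mapping them back to observation positions. This is a simplified illustration that assumes a NumPy Generator for the shuffling and is not gojo's implementation; scikit-learn's GroupKFold follows the same grouping principle. The function name instance_level_kfold is hypothetical.

```python
from typing import Iterator, Optional, Tuple

import numpy as np


def instance_level_kfold(instance_id, n_splits: int, n_repeats: int = 1, shuffle: bool = True,
                         random_state: Optional[int] = None) -> Iterator[Tuple[np.ndarray, np.ndarray]]:
    """Yield (train_idx, test_idx) pairs in which no instance appears on both sides."""
    instance_id = np.asarray(instance_id)
    unique_ids = np.unique(instance_id)
    if n_splits < 2 or n_splits > len(unique_ids):
        raise ValueError("n_splits must be between 2 and the number of unique instances")
    rng = np.random.default_rng(random_state)
    for _ in range(n_repeats):
        ordered_ids = rng.permutation(unique_ids) if shuffle else unique_ids
        # Each fold holds out a block of whole instances...
        for test_ids in np.array_split(ordered_ids, n_splits):
            test_mask = np.isin(instance_id, test_ids)
            # ...and every observation of those instances goes to the test set.
            yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)
```

With ids = np.array([0, 0, 1, 1, 1, 2, 3, 3, 4, 5]), n_splits=3 holds out two whole instances per fold, so the three observations of instance 1 are never divided between training and test.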

class gojo.util.splitter.PredefinedSplitter(train_index: list, test_index: list)

Bases: object

Wrapper that incorporates a predefined split into the model evaluation subroutines. The user provides two lists with the indices (positions along dimension 0 of the input data) to be used for training and testing, respectively.

Parameters
train_index (list or np.ndarray): Indices used for training.
test_index (list or np.ndarray): Indices used for testing.

>>> import numpy as np
>>> from gojo import util
>>>
>>> np.random.seed(1997)
>>>
>>> n_samples = 20
>>> n_feats = 10
>>> X = np.random.uniform(size=(n_samples, n_feats))
>>> y = np.random.randint(0, 2, size=n_samples)
>>>
>>> splitter = util.splitter.PredefinedSplitter(
...     train_index=np.arange(0, 15),
...     test_index=np.arange(15, 20),
... )
>>>
>>> for train_idx, test_idx in splitter.split(X, y):
...     print(len(train_idx), y[train_idx].mean())
...     print(len(test_idx), y[test_idx].mean())

split(X: pandas.DataFrame, y: Optional[pandas.Series] = None) → Tuple[numpy.ndarray, numpy.ndarray]

Generates indices to split data into training and test sets.

Parameters
X (np.ndarray or pd.DataFrame): Input data.
y (np.ndarray or pd.Series, default=None): Target variable.

class gojo.util.splitter.SimpleSplitter(test_size: float, stratify: bool = False, random_state: Optional[int] = None, shuffle: bool = True)

Bases: object

Wrapper of sklearn.model_selection.train_test_split used to perform a simple partition of the data into a training and a test set (optionally with stratification).

Parameters
test_size (float): If float, should be between 0.0 and 1.0 and represents the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples.
stratify (bool, default=False): If True, the data is split in a stratified fashion, using the y passed to split() as the class labels.
random_state (int, default=None): Controls the shuffling applied to the data before applying the split.
shuffle (bool, default=True): Whether to shuffle the data before splitting. If shuffle=False, stratify must be False.

>>> import numpy as np
>>> from gojo import util
>>>
>>> np.random.seed(1997)
>>>
>>> n_samples = 20
>>> n_feats = 10
>>> X = np.random.uniform(size=(n_samples, n_feats))
>>> y = np.random.randint(0, 2, size=n_samples)
>>>
>>> splitter = util.splitter.SimpleSplitter(
...     test_size=0.2,
...     stratify=True,
...     random_state=1997
... )
>>>
>>> for train_idx, test_idx in splitter.split(X, y):
...     print(len(train_idx), y[train_idx].mean())
...     print(len(test_idx), y[test_idx].mean())

split(X: pandas.DataFrame, y: Optional[pandas.Series] = None) → Tuple[numpy.ndarray, numpy.ndarray]

Generates indices to split data into training and test sets.

Parameters
X (np.ndarray or pd.DataFrame): Input data.
y (np.ndarray or pd.Series, default=None): If stratify was set to True, this variable is used to perform a stratified split.

gojo.util.splitter.getCrossValObj(cv: Optional[int] = None, repeats: int = 1, stratified: bool = False, loocv: bool = False, random_state: Optional[int] = None) → sklearn.model_selection.LeaveOneOut

Function used to obtain the sklearn object used to evaluate models according to the cross-validation or leave-one-out cross-validation (LOOCV) schemes.

Parameters
cv (int, default=None): (cross-validation) Number of folds. Ignored when loocv is set to True.
repeats (int, default=1): (cross-validation) Number of repetitions of a repeated cross-validation. Ignored when loocv is set to True.
stratified (bool, default=False): (cross-validation) Whether to perform the cross-validation with class stratification. Ignored when loocv is set to True.
loocv (bool, default=False): (leave-one-out cross-validation) Whether to perform a LOOCV. If set to True, the rest of the parameters are ignored.
random_state (int, default=None): (cross-validation) Random state used to make the study reproducible.

Returns
cv_obj (RepeatedKFold or RepeatedStratifiedKFold or LeaveOneOut): Cross-validation instance from the sklearn library.
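The selection logic described above maps naturally onto three scikit-learn classes. A minimal sketch, assuming that repeats=1 still yields a Repeated* object (as the documented return types suggest) and that a missing cv is an error when loocv=False; get_cross_val_obj_sketch is a hypothetical name, not gojo's code.

```python
from typing import Optional

from sklearn.model_selection import LeaveOneOut, RepeatedKFold, RepeatedStratifiedKFold


def get_cross_val_obj_sketch(cv: Optional[int] = None, repeats: int = 1, stratified: bool = False,
                             loocv: bool = False, random_state: Optional[int] = None):
    """Pick the sklearn splitter matching the requested evaluation scheme."""
    if loocv:
        return LeaveOneOut()  # every other argument is ignored
    if cv is None:
        raise ValueError("cv must be provided when loocv=False")  # assumed behaviour
    splitter_cls = RepeatedStratifiedKFold if stratified else RepeatedKFold
    return splitter_cls(n_splits=cv, n_repeats=repeats, random_state=random_state)
```

Note that the total number of train/test iterations of a repeated splitter is cv * repeats, which is what get_n_splits() reports.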

gojo.util.tools module

gojo.util.tools.getNumModelParams(model: torch.nn.Module) → int: Function that returns the number of trainable parameters from a torch.nn.Module instance.
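In PyTorch, trainable parameters are usually counted by summing numel() over the parameters whose requires_grad flag is set. The sketch below uses that idiom; it relies only on duck typing, so it runs without torch installed, and count_trainable_params is a hypothetical name rather than gojo's code.

```python
def count_trainable_params(model) -> int:
    """Sum the element counts of all parameters that require gradients."""
    return sum(param.numel() for param in model.parameters() if param.requires_grad)

# With PyTorch installed, count_trainable_params(torch.nn.Linear(4, 2)) returns 10
# (a 2x4 weight matrix plus 2 bias terms).
```

Parameters frozen with requires_grad_(False), for example in a fine-tuning setting, are excluded from the count.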

gojo.util.tools.minMaxScaling(data: numpy.ndarray, feature_range: tuple = (0, 1)) → numpy.ndarray

Applies min-max scaling to the provided data.

Parameters
data (pd.DataFrame or np.ndarray): Data to be scaled.
feature_range (tuple, default=(0, 1)): Range to which the input data is scaled.

Returns
scaled_data (pd.DataFrame or np.ndarray): Data scaled to the provided range.
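Min-max scaling maps each value x to a + (x - min) * (b - a) / (max - min), where (a, b) is feature_range. A NumPy sketch, assuming the minimum and maximum are computed per column (per feature) and that constant columns are mapped to the lower bound; min_max_scaling_sketch is hypothetical and always returns an np.ndarray, unlike the DataFrame-preserving behaviour documented above.

```python
import numpy as np


def min_max_scaling_sketch(data, feature_range=(0, 1)) -> np.ndarray:
    """Column-wise min-max scaling into feature_range."""
    lower, upper = feature_range
    values = np.asarray(data, dtype=float)
    col_min = values.min(axis=0)
    col_max = values.max(axis=0)
    # Avoid division by zero on constant columns (they end up at `lower`).
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    unit_scaled = (values - col_min) / span  # now in [0, 1]
    return unit_scaled * (upper - lower) + lower
```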

gojo.util.tools.zscoresScaling(data: numpy.ndarray) → numpy.ndarray

Applies z-score scaling to the provided data.

Parameters
data (pd.DataFrame or np.ndarray): Data to be scaled.

Returns
scaled_data (pd.DataFrame or np.ndarray): Data transformed to z-scores.
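Z-score scaling subtracts the mean and divides by the standard deviation, z = (x - mean) / std. A NumPy sketch, assuming column-wise statistics and the population standard deviation (ddof=0); the function name z_scores_scaling_sketch is hypothetical, and the ddof used by gojo is not stated here.

```python
import numpy as np


def z_scores_scaling_sketch(data) -> np.ndarray:
    """Column-wise standardization to zero mean and unit (population) variance."""
    values = np.asarray(data, dtype=float)
    mean = values.mean(axis=0)
    std = values.std(axis=0)  # ddof=0 is an assumption
    std = np.where(std > 0, std, 1.0)  # leave constant columns centred at 0
    return (values - mean) / std
```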

gojo.util.validation module

gojo.util.validation.checkCallable(input_obj_name: str, obj: callable): Function used to check whether a given object is callable.

gojo.util.validation.checkClass(input_obj_name: str, obj): Function used to check whether a given object is a class.

gojo.util.validation.checkInputType(input_var_name: str, input_var: object, valid_types: list): Function that checks that the type of the input variable input_var is one of the types in valid_types.

gojo.util.validation.checkIterable(input_obj_name: str, obj): Function used to check whether a given object is iterable.

gojo.util.validation.checkMultiInputTypes(*args): Wrapper of the function checkInputType used to check multiple variables at the same time.

gojo.util.validation.fileExists(file: str, must_exists: bool): Function that checks whether a given file exists or does not exist, depending on must_exists.

gojo.util.validation.pathExists(path: str, must_exists: bool): Function that checks whether a given path exists or does not exist, depending on must_exists.
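A minimal sketch of how checks of this kind are commonly written. The snake_case names are hypothetical, the choice of TypeError, FileNotFoundError and FileExistsError is an assumption about the errors raised, and so is the calling convention of check_multi_input_types, which here expects one (name, value, valid_types) tuple per positional argument.

```python
import os


def check_input_type(input_var_name: str, input_var: object, valid_types: list) -> None:
    """Raise TypeError unless input_var is an instance of one of valid_types."""
    if not isinstance(input_var, tuple(valid_types)):
        raise TypeError(
            f"Input variable '{input_var_name}' has type {type(input_var).__name__}; "
            f"expected one of {valid_types}."
        )


def check_multi_input_types(*args) -> None:
    """Assumed calling convention: each positional argument is (name, value, valid_types)."""
    for name, value, valid_types in args:
        check_input_type(name, value, valid_types)


def path_exists(path: str, must_exists: bool) -> None:
    """Raise if the existence of `path` does not match `must_exists`."""
    exists = os.path.exists(path)
    if must_exists and not exists:
        raise FileNotFoundError(f"Path {path!r} does not exist.")
    if not must_exists and exists:
        raise FileExistsError(f"Path {path!r} already exists.")
```

For example, check_multi_input_types(("n_splits", 5, [int]), ("level", None, [str, type(None)])) passes silently, while check_input_type("n_splits", 5.0, [int]) raises TypeError.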
