gojo.util package

Submodules

gojo.util.io module

gojo.util.io.load(file: str, backend: str = 'joblib_gzip') -> object

Function used to load serialized Python objects (see gojo.util.io.serialize).

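Parameters:

file : str

Path to the file containing the serialized object.

backend : str, default='joblib_gzip'

Backend used to serialize the object (it must match the one used in gojo.util.io.serialize()).

Returns:

obj : object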

Loaded object.

gojo.util.io.loadJson(file: str) -> dict

Load a JSON file.

Parameters:

file : str

JSON file to be loaded.

Returns:

content : dict

JSON file content.

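gojo.util.io.pprint(*args, verbose: bool = True, level: Optional[str] = None, sep: str = ' ')

Print function used throughout the gojo module. The level argument selects the logging level (see the levels listed in gojo.util.login.configureLogger()).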

gojo.util.io.saveJson(data: dict, file: str)

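Save the input dictionary to a JSON file.

Parameters:

data : dict

Dictionary to be exported to a JSON file.

file : str

Output JSON file.

IMPORTANT NOTE: NumPy types (e.g. np.int64, np.float32, np.ndarray) must be converted to native Python types beforehand, since the json module cannot encode them.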
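The standard-library json module rejects most NumPy values (for example, json.dumps(np.int64(1)) raises TypeError). A minimal sketch of the pre-conversion step, using a hypothetical helper named toPythonTypes that is not part of gojo:

```python
import json

import numpy as np


def toPythonTypes(obj):
    """Recursively convert NumPy values into native Python types (hypothetical helper)."""
    if isinstance(obj, dict):
        # convert both keys and values
        return {toPythonTypes(k): toPythonTypes(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [toPythonTypes(v) for v in obj]
    if isinstance(obj, np.ndarray):
        # tolist() already yields native Python scalars
        return obj.tolist()
    if isinstance(obj, np.generic):
        # NumPy scalar (np.int64, np.float32, np.bool_, ...) -> Python scalar
        return obj.item()
    return obj


data = {
    "n_samples": np.int64(20),
    "score": np.float32(0.75),
    "weights": np.array([0.1, 0.2, 0.7]),
}

clean = toPythonTypes(data)
print(json.dumps(clean))  # {"n_samples": 20, "score": 0.75, "weights": [0.1, 0.2, 0.7]}
```

The converted dictionary can then be passed to gojo.util.io.saveJson().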

gojo.util.io.saveTorchModel(base_path: str, key: str, model: torch.nn.Module) -> str

Function used to save the weights of torch.nn.Module models.

Parameters:

base_path : str

Base directory where the model will be stored. If this directory does not exist, it will be created.

key : str

Key used to identify the model.

model : torch.nn.Module

Model whose parameters will be saved.

Returns:

file : str

Path to the generated file.

gojo.util.io.saveTorchModelAndHistory(base_path: str, key: str, model: torch.nn.Module, history: dict)

Subroutine used to serialize model data and convergence history.

Parameters:

base_path : str

Base directory where the model and convergence information will be stored. If this directory does not exist, it will be created.

key : str

Key used to identify the model.

model : torch.nn.Module

Model whose parameters will be saved.

history : dict

Dictionary similar to the one returned by the function util.torch_util.fit_neural_network().

gojo.util.io.serialize(obj, path: str, time_prefix: bool = False, overwrite: bool = False, backend: str = 'joblib_gzip') -> str

Function used to serialize Python objects.

Parameters:

obj : object

Object to be saved.

path : str

Output file where the object will be saved.

time_prefix : bool, default=False

Whether to add a time prefix (YYYYMMDD-HHMMSS) to the name of the exported file.

overwrite : bool, default=False

Whether to overwrite the file if it already exists.

backend : str, default='joblib_gzip'

Backend used to serialize the object.

Returns:

path : str

Path to the serialized object.
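The interplay between path, time_prefix and overwrite can be sketched with the standard library alone. This is a hypothetical stand-in, not gojo's code: a gzip-compressed pickle replaces the default 'joblib_gzip' backend, and the way the prefix is joined to the file name (timestamp, underscore, original name) is an assumption.

```python
import gzip
import os
import pickle
import tempfile
from datetime import datetime


def serializeSketch(obj, path: str, time_prefix: bool = False, overwrite: bool = False) -> str:
    """Hypothetical serialize()-like helper based on gzip + pickle (not gojo's backend)."""
    if time_prefix:
        # assumption: the YYYYMMDD-HHMMSS prefix is prepended to the file name
        directory, filename = os.path.split(path)
        stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
        path = os.path.join(directory, f"{stamp}_{filename}")
    if os.path.exists(path) and not overwrite:
        raise FileExistsError(f"{path} already exists (set overwrite=True to replace it)")
    with gzip.open(path, "wb") as f:
        pickle.dump(obj, f)
    # the returned path may differ from the input one when time_prefix=True
    return path


def loadSketch(path: str):
    """Hypothetical load()-like counterpart for files written by serializeSketch()."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    out = serializeSketch({"a": [1, 2, 3]}, os.path.join(tmp, "obj.pkl.gz"), time_prefix=True)
    name = os.path.basename(out)   # <YYYYMMDD-HHMMSS>_obj.pkl.gz
    restored = loadSketch(out)     # {'a': [1, 2, 3]}

    plain = os.path.join(tmp, "plain.pkl.gz")
    serializeSketch([1, 2], plain)
    try:
        serializeSketch([3, 4], plain)  # same path with overwrite=False
        raised = False
    except FileExistsError:
        raised = True
```

Returning the final path matters because, with time_prefix=True, the caller cannot know the file name in advance and needs it to load the object back.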

gojo.util.login module

class gojo.util.login.Login

Bases: object

Basic logging handler.

logger_levels = {None: loguru.logger.info, 'info': loguru.logger.info, 'error': loguru.logger.error, 'err': loguru.logger.error, 'warn': loguru.logger.warning, 'warning': loguru.logger.warning, 'success': loguru.logger.success}

gojo.util.login.configureLogger(file: Optional[str] = None, add_time_prefix: bool = True)

Function used to configure the logging system. If no file is provided, the output is sent to the standard Python output. If a file is provided, it is created and the output is redirected to it.

Logging levels (used when calling gojo.util.io.pprint()):

- None: information level.
- 'info': information level (same as None).
- 'error': error level.
- 'err': error level (same as 'error').
- 'warning': warning level.
- 'warn': warning level (same as 'warning').
- 'success': success level.

Parameters:

file : str, default=None

File to which the output is redirected.

add_time_prefix : bool, default=True

Indicates whether to add a time prefix to the logging output.

The logging status can be checked with gojo.util.login.isActive(), and logging can be disabled with gojo.util.login.deactivate().
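The level aliases listed above can be mimicked with the standard logging module. This is a hedged sketch, not gojo's implementation (which, judging by Login.logger_levels, relies on loguru): pprintSketch is a hypothetical name, the behaviour of verbose=False (print nothing) is an assumption, and since logging has no SUCCESS level one is registered with the arbitrary value 25.

```python
import logging

SUCCESS = 25  # arbitrary numeric value between INFO (20) and WARNING (30)
logging.addLevelName(SUCCESS, "SUCCESS")

# aliases mirroring the keys of gojo.util.login.Login.logger_levels
LEVEL_ALIASES = {
    None: logging.INFO,
    "info": logging.INFO,
    "error": logging.ERROR,
    "err": logging.ERROR,
    "warn": logging.WARNING,
    "warning": logging.WARNING,
    "success": SUCCESS,
}

logger = logging.getLogger("gojo_sketch")


def pprintSketch(*args, verbose: bool = True, level=None, sep: str = " "):
    """Hypothetical pprint()-like helper: join the arguments and log them at `level`."""
    if not verbose:
        return None  # assumption: verbose=False silences the call
    if level not in LEVEL_ALIASES:
        raise ValueError(f"Unknown level {level!r}; valid levels: {list(LEVEL_ALIASES)}")
    message = sep.join(str(a) for a in args)
    logger.log(LEVEL_ALIASES[level], message)
    return message  # returned only to make the sketch easy to inspect


logging.basicConfig(level=logging.INFO, format="%(levelname)s | %(message)s")
pprintSketch("fold", 1, "completed", level="success")
```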

gojo.util.login.deactivate()

Deactivate the current logging system.

gojo.util.login.isActive() -> bool

Indicates whether the logging system is active.

gojo.util.splitter module

class gojo.util.splitter.InstanceLevelKFoldSplitter(n_splits: int, instance_id: numpy.ndarray, n_repeats: int = 1, shuffle: bool = True, random_state: Optional[int] = None)

Bases: object

Splitter that performs the partitions at the instance level: the folds are built over the unique identifiers in instance_id rather than over individual observations, so all observations associated with the same instance end up in the same partition.

Important

Each observation (row) of the data passed to the split() method is associated with the corresponding identifier in instance_id.

Parameters:

n_splits : int

Number of folds. Must be at least 2.

instance_id : np.ndarray

Array with the instance identifier of each observation. The splits are performed over these identifiers.

n_repeats : int, default=1

Number of times the cross-validation is repeated.

shuffle : bool, default=True

Indicates whether to shuffle the data before performing the split.

random_state : int, default=None

Controls the randomness of each repeated cross-validation instance.

split(X: pandas.DataFrame, y=None) -> Tuple[numpy.ndarray, numpy.ndarray]

Generate the splits. Each split is a tuple whose first element contains the training indices and whose second element contains the test indices.

Important

X must have the same number of observations as instance_id, in the same order.

Parameters:

X : np.ndarray or pd.DataFrame

Input data.

y : object, default=None

Ignored. Present for sklearn API compatibility.
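The idea behind instance-level splitting can be sketched with NumPy alone: the folds are built over the unique values of instance_id and then mapped back to observation positions, so no instance contributes observations to both training and test. instanceLevelKFoldSketch is a hypothetical name; how gojo assigns instances to folds and seeds each repetition may differ.

```python
import numpy as np


def instanceLevelKFoldSketch(instance_id, n_splits, n_repeats=1, shuffle=True, random_state=None):
    """Yield (train_idx, test_idx) pairs whose folds are built over unique instance ids."""
    instance_id = np.asarray(instance_id)
    unique_ids = np.unique(instance_id)
    if not 2 <= n_splits <= len(unique_ids):
        raise ValueError("n_splits must be between 2 and the number of unique instances")
    rng = np.random.default_rng(random_state)
    for _ in range(n_repeats):
        order = rng.permutation(unique_ids) if shuffle else unique_ids
        # distribute instances (not observations) among the folds
        for fold_ids in np.array_split(order, n_splits):
            # every observation of a test instance goes to the test set
            test_mask = np.isin(instance_id, fold_ids)
            yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


# 4 instances (e.g. subjects) with 3 observations each
ids = np.repeat(["s1", "s2", "s3", "s4"], 3)
splits = list(instanceLevelKFoldSketch(ids, n_splits=2, n_repeats=2, random_state=1997))
for train_idx, test_idx in splits:
    print(np.unique(ids[train_idx]).tolist(), np.unique(ids[test_idx]).tolist())
```

Within each repetition every observation appears in exactly one test fold, and the instance sets of training and test never overlap.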

class gojo.util.splitter.PredefinedSplitter(train_index: list, test_index: list)

Bases: object

Wrapper used to incorporate a predefined split into the model evaluation subroutines. It expects two lists with the indices (positions along dimension 0 of the input data) to be used for training and test, respectively.

Parameters:

train_index : list or np.ndarray

Indices used for training.

test_index : list or np.ndarray

Indices used for test.

Example:

>>> import numpy as np
>>> from gojo import util
>>>
>>> np.random.seed(1997)
>>>
>>> n_samples = 20
>>> n_feats = 10
>>> X = np.random.uniform(size=(n_samples, n_feats))
>>> y = np.random.randint(0, 2, size=n_samples)
>>>
>>> splitter = util.splitter.PredefinedSplitter(
...     train_index=np.arange(0, 15),
...     test_index=np.arange(15, 20),
... )
>>>
>>> for train_idx, test_idx in splitter.split(X, y):
...     print(len(train_idx), y[train_idx].mean())
...     print(len(test_idx), y[test_idx].mean())

split(X: pandas.DataFrame, y: Optional[pandas.Series] = None) -> Tuple[numpy.ndarray, numpy.ndarray]

Generate indices to split the data into training and test sets.

Parameters:

X : np.ndarray or pd.DataFrame

Input data.

y : np.ndarray or pd.Series, default=None

Target variable.

class gojo.util.splitter.SimpleSplitter(test_size: float, stratify: bool = False, random_state: Optional[int] = None, shuffle: bool = True)

Bases: object

Wrapper of sklearn.model_selection.train_test_split used to perform a simple partition of the data into a training and a test set (optionally stratified).

Parameters:

test_size : float

If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples.

stratify : bool, default=False

If True, the data is split in a stratified fashion, using the y passed to split() as the class labels.

random_state : int, default=None

Controls the shuffling applied to the data before applying the split.

shuffle : bool, default=True

Whether to shuffle the data before splitting. If shuffle=False, stratify must be False.

Example:

>>> import numpy as np
>>> from gojo import util
>>>
>>> np.random.seed(1997)
>>>
>>> n_samples = 20
>>> n_feats = 10
>>> X = np.random.uniform(size=(n_samples, n_feats))
>>> y = np.random.randint(0, 2, size=n_samples)
>>>
>>> splitter = util.splitter.SimpleSplitter(
...     test_size=0.2,
...     stratify=True,
...     random_state=1997
... )
>>>
>>> for train_idx, test_idx in splitter.split(X, y):
...     print(len(train_idx), y[train_idx].mean())
...     print(len(test_idx), y[test_idx].mean())

split(X: pandas.DataFrame, y: Optional[pandas.Series] = None) -> Tuple[numpy.ndarray, numpy.ndarray]

Generate indices to split the data into training and test sets.

Parameters:

X : np.ndarray or pd.DataFrame

Input data.

y : np.ndarray or pd.Series, default=None

Target variable. If stratify was set to True, it is used as the class labels for the stratified split.

gojo.util.splitter.getCrossValObj(cv: Optional[int] = None, repeats: int = 1, stratified: bool = False, loocv: bool = False, random_state: Optional[int] = None) -> sklearn.model_selection.LeaveOneOut

Function used to obtain the sklearn object used to evaluate models under the cross-validation or leave-one-out cross-validation (LOOCV) schemes.

Parameters:

cv : int, default=None

(cross-validation) Number of folds. Ignored when loocv is set to True.

repeats : int, default=1

(cross-validation) Number of repetitions of a repeated cross-validation. Ignored when loocv is set to True.

stratified : bool, default=False

(cross-validation) Indicates whether to perform the cross-validation with class stratification. Ignored when loocv is set to True.

loocv : bool, default=False

(leave-one-out cross-validation) Indicates whether to perform a LOOCV. If this parameter is set to True, the rest of the parameters are ignored.

random_state : int, default=None

(cross-validation) Random state for study replication.

Returns:

cv_obj : RepeatedKFold or RepeatedStratifiedKFold or LeaveOneOut

Cross-validation instance from the sklearn library.
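The selection logic described above can be approximated with the scikit-learn classes named in the return description. The sketch assumes that cv, repeats and random_state map onto the n_splits, n_repeats and random_state arguments of RepeatedKFold and RepeatedStratifiedKFold; getCrossValObjSketch is a hypothetical name, and gojo's own input validation may differ.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, RepeatedKFold, RepeatedStratifiedKFold


def getCrossValObjSketch(cv=None, repeats=1, stratified=False, loocv=False, random_state=None):
    """Hypothetical re-implementation of the cross-validation object selection."""
    if loocv:
        # LOOCV: every other parameter is ignored
        return LeaveOneOut()
    if cv is None:
        raise ValueError("cv must be provided when loocv=False")
    cls = RepeatedStratifiedKFold if stratified else RepeatedKFold
    return cls(n_splits=cv, n_repeats=repeats, random_state=random_state)


X = np.zeros((10, 2))
y = np.array([0, 1] * 5)

cv_obj = getCrossValObjSketch(cv=5, repeats=2, stratified=True, random_state=1997)
print(type(cv_obj).__name__, cv_obj.get_n_splits())     # RepeatedStratifiedKFold 10
print(type(getCrossValObjSketch(loocv=True)).__name__)   # LeaveOneOut
```

A repeated cross-validation with cv folds and repeats repetitions yields cv * repeats train/test pairs, which is what get_n_splits() reports.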

gojo.util.tools module

gojo.util.tools.getNumModelParams(model: torch.nn.Module) -> int

Function that returns the number of trainable parameters from a torch.nn.Module instance.

gojo.util.tools.minMaxScaling(data: numpy.ndarray, feature_range: tuple = (0, 1)) -> numpy.ndarray

Apply min-max scaling to the provided data.

Parameters:

data : pd.DataFrame or np.ndarray

Data to be scaled.

feature_range : tuple, default=(0, 1)

Target range (min, max) of the scaled data.

Returns:

scaled_data : pd.DataFrame or np.ndarray

Data scaled to the provided range.
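Min-max scaling maps each value x to a + (x - min) * (b - a) / (max - min), where (a, b) is feature_range. A NumPy sketch, assuming the minimum and maximum are computed per column (feature), which the description does not state explicitly; minMaxScalingSketch is a hypothetical name, and the handling of constant columns is a choice made here.

```python
import numpy as np


def minMaxScalingSketch(data: np.ndarray, feature_range: tuple = (0, 1)) -> np.ndarray:
    """Column-wise min-max scaling (assumption: statistics are computed per feature)."""
    a, b = feature_range
    data = np.asarray(data, dtype=float)  # DataFrames are converted to arrays here
    col_min = data.min(axis=0)
    col_max = data.max(axis=0)
    # constant columns would divide by zero; use a span of 1 so they map to `a`
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    return a + (data - col_min) * (b - a) / span


X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 40.0]])
S = minMaxScalingSketch(X, feature_range=(-1, 1))
print(S)
```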

gojo.util.tools.zscoresScaling(data: numpy.ndarray) -> numpy.ndarray

Apply z-score scaling (standardization) to the provided data.

Parameters:

data : pd.DataFrame or np.ndarray

Data to be scaled.

Returns:

scaled_data : pd.DataFrame or np.ndarray

Z-scores of the input data.
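Z-score scaling computes z = (x - mean) / std. A sketch under the same per-column assumption, using the population standard deviation (NumPy's default ddof=0), which is also an assumption; zscoresScalingSketch is a hypothetical name.

```python
import numpy as np


def zscoresScalingSketch(data: np.ndarray) -> np.ndarray:
    """Column-wise z-score standardization (assumptions: per-feature stats, ddof=0)."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)
    std = data.std(axis=0)  # population standard deviation (ddof=0)
    # avoid division by zero for constant columns (they become all zeros)
    std = np.where(std > 0, std, 1.0)
    return (data - mean) / std


X = np.array([[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]])
Z = zscoresScalingSketch(X)
print(Z)
```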

gojo.util.validation module

gojo.util.validation.checkCallable(input_obj_name: str, obj: callable)

Function used to check if a given object is callable.

gojo.util.validation.checkClass(input_obj_name: str, obj)

Function used to check if a given object is a class.

gojo.util.validation.checkInputType(input_var_name: str, input_var: object, valid_types: list)

Function that checks whether the type of the input variable input_var is one of the types in valid_types.

gojo.util.validation.checkIterable(input_obj_name: str, obj)

Function used to check if a given object is an iterable.

gojo.util.validation.checkMultiInputTypes(*args)

Wrapper of checkInputType() used to check multiple variables at once.
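A minimal sketch of the behaviour described for checkInputType() and checkMultiInputTypes(). The Sketch-suffixed names are hypothetical, and so are two details: that a failed check raises TypeError, and that checkMultiInputTypes() receives one (name, value, valid_types) tuple per variable.

```python
def checkInputTypeSketch(input_var_name: str, input_var: object, valid_types: list) -> None:
    """Raise TypeError if input_var is not an instance of any of valid_types."""
    if not isinstance(input_var, tuple(valid_types)):
        raise TypeError(
            f"Input variable '{input_var_name}' of type {type(input_var).__name__} "
            f"is not one of: {[t.__name__ for t in valid_types]}"
        )


def checkMultiInputTypesSketch(*args) -> None:
    """Apply checkInputTypeSketch to several (name, value, valid_types) tuples."""
    for input_var_name, input_var, valid_types in args:
        checkInputTypeSketch(input_var_name, input_var, valid_types)


checkMultiInputTypesSketch(
    ("n_splits", 5, [int]),
    ("test_size", 0.2, [float, int]),
)  # passes silently

try:
    checkInputTypeSketch("shuffle", "yes", [bool])
except TypeError as exc:
    print(exc)  # Input variable 'shuffle' of type str is not one of: ['bool']
```

Note that isinstance() treats bool as a subclass of int, so True would pass a check against [int].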

gojo.util.validation.fileExists(file: str, must_exists: bool)

Function that checks whether a given file exists (must_exists=True) or does not exist (must_exists=False).

gojo.util.validation.pathExists(path: str, must_exists: bool)

Function that checks whether a given path exists (must_exists=True) or does not exist (must_exists=False).
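The must_exists flag suggests the following behaviour, sketched here with os.path: with must_exists=True a missing file is an error, and with must_exists=False an already existing file is an error (useful to avoid overwriting outputs). The function name and the exception types are assumptions; a pathExists() counterpart would presumably use os.path.exists instead of os.path.isfile.

```python
import os
import tempfile


def fileExistsSketch(file: str, must_exists: bool) -> None:
    """Hypothetical check: raise if the existence of `file` does not match `must_exists`."""
    exists = os.path.isfile(file)
    if must_exists and not exists:
        raise FileNotFoundError(f"File '{file}' does not exist")
    if not must_exists and exists:
        raise FileExistsError(f"File '{file}' already exists")


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    existing = tmp.name

fileExistsSketch(existing, must_exists=True)        # passes: the file exists
try:
    fileExistsSketch(existing, must_exists=False)   # fails: the file already exists
except FileExistsError as exc:
    print(exc)
finally:
    os.remove(existing)
```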

Module contents