gojo.util package
Submodules
gojo.util.io module
- gojo.util.io.load(file: str, backend: str = 'joblib_gzip') object [source]
Function used to load serialized Python objects (see gojo.util.io.serialize).
- filestr
Path to the file containing the serialized object.
- backendstr, default=’joblib_gzip’
Backend used to serialize the object.
- objobject
Loaded object.
- gojo.util.io.loadJson(file: str) dict [source]
Load a json file.
- filestr
Json file to be loaded.
- contentdict
Json file content.
- gojo.util.io.pprint(*args, verbose: bool = True, level: Optional[str] = None, sep: str = ' ')[source]
Print function for the gojo module.
- gojo.util.io.saveJson(data: dict, file: str)[source]
Saves the input dictionary into a json file.
- datadict
Dictionary to be exported to a json file.
- filestr
Output json file.
IMPORTANT NOTE: numpy types must be previously converted to Python types.
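The note above can be illustrated with a minimal sketch. The helper `to_python` is hypothetical (it is not part of gojo); it converts numpy scalars and arrays into built-in Python types so that the dictionary can be encoded by the standard `json` module.

```python
import json

import numpy as np


def to_python(value):
    """Recursively convert numpy scalars/arrays into built-in Python types.

    Hypothetical helper for illustration; it is not part of gojo.
    """
    if isinstance(value, dict):
        return {str(k): to_python(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_python(v) for v in value]
    if isinstance(value, np.ndarray):
        return value.tolist()
    if isinstance(value, np.generic):  # numpy scalar (np.float32, np.int64, ...)
        return value.item()
    return value


results = {"accuracy": np.float32(0.5), "folds": np.arange(3), "n": np.int64(20)}
clean = to_python(results)
encoded = json.dumps(clean)  # the raw numpy values would raise TypeError here
```

The converted dictionary (`clean`) contains only built-in types and could then be passed as the `data` argument.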
- gojo.util.io.saveTorchModel(base_path: str, key: str, model: torch.nn.Module) str [source]
Function used to save the weights of torch.nn.Module models.
- base_pathstr
Base directory where the model will be stored. If this directory does not exist, it will be created.
- keystr
Key used to identify the model.
- modeltorch.nn.Module
Model whose parameters will be saved.
- filestr
Generated file.
- gojo.util.io.saveTorchModelAndHistory(base_path: str, key: str, model: torch.nn.Module, history: dict)[source]
Subroutine used to serialize model data and convergence history.
- base_pathstr
Base directory where the model and convergence information will be stored. If this directory does not exist, it will be created.
- keystr
Key used to identify the model.
- modeltorch.nn.Module
Model whose parameters will be saved.
- historydict
Dictionary similar to the one returned by the function util.torch_util.fit_neural_network().
- gojo.util.io.serialize(obj, path: str, time_prefix: bool = False, overwrite: bool = False, backend: str = 'joblib_gzip') str [source]
Function used to serialize Python objects.
- objobject
Object to be saved.
- pathstr
File used to save the provided object.
- time_prefixbool, default=False
Parameter indicating whether to add a time prefix to the exported file (YYYYMMDD-HHMMSS).
- overwritebool, default=False
Parameter indicating whether to overwrite a possible existing file.
- backendstr, default=’joblib_gzip’
Backend used to serialize the object.
- pathstr
Path to the serialized object.
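The interplay of time_prefix and overwrite can be sketched as follows. This is an illustrative stand-in, not gojo's implementation: it uses the standard library (`pickle` with `gzip`) instead of the joblib backend, and the function names, the "-" separator after the prefix, and the `FileExistsError` are assumptions made for the example.

```python
import gzip
import os
import pickle
import tempfile
from datetime import datetime


def serialize_sketch(obj, path, time_prefix=False, overwrite=False):
    """Illustrative stand-in for the documented behaviour (not gojo's code)."""
    directory, name = os.path.split(path)
    if time_prefix:
        # Prefix format taken from the parameter description: YYYYMMDD-HHMMSS
        name = f"{datetime.now():%Y%m%d-%H%M%S}-{name}"
    path = os.path.join(directory, name)
    if os.path.exists(path) and not overwrite:
        raise FileExistsError(f"{path} already exists; pass overwrite=True")
    with gzip.open(path, "wb") as handle:
        pickle.dump(obj, handle)
    return path  # the path actually written, including any prefix


def load_sketch(path):
    """Read back an object written by serialize_sketch."""
    with gzip.open(path, "rb") as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "model.pkl.gz")
    saved = serialize_sketch({"alpha": 0.1}, target)
    restored = load_sketch(saved)
    try:
        serialize_sketch({"alpha": 0.2}, target)  # same file, overwrite=False
        refused = False
    except FileExistsError:
        refused = True
```

Returning the final path matters when a time prefix is added, because the caller cannot otherwise know the name of the file that was written.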
gojo.util.login module
- class gojo.util.login.Login[source]
Bases:
object
Basic Login handler.
- logger_levels = {None: loguru.logger.info, 'info': loguru.logger.info, 'error': loguru.logger.error, 'err': loguru.logger.error, 'warn': loguru.logger.warning, 'warning': loguru.logger.warning, 'success': loguru.logger.success}
- gojo.util.login.configureLogger(file: Optional[str] = None, add_time_prefix: bool = True)[source]
Function used to configure the logging system. If no file is provided, the output is written to the standard Python output. If a file is provided, it will be created and the output will be redirected to it.
Logging levels (the level argument of gojo.util.io.pprint()):
- None: Information level.
- ‘info’: Information level (same as None).
- ‘error’: Error level.
- ‘err’: Error level (same as ‘error’).
- ‘warning’: Warning level.
- ‘warn’: Warning level (same as ‘warning’).
- ‘success’: Success level.
- filestr, default=None
File to which the output is redirected.
- add_time_prefixbool, default=True
Indicates whether to add the time prefix to the log.
The logging status can be checked using gojo.util.login.isActive(), and logging can be disabled using gojo.util.login.deactivate().
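The level names above map to logger functions through a dictionary (logger_levels). A rough sketch of that dispatch pattern is shown below. It uses the standard `logging` module rather than loguru, so the "success" level is mapped to info; the logger name and the `emit` function are hypothetical.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s | %(message)s")
logger = logging.getLogger("gojo_sketch")  # hypothetical logger name

# Same keys as Login.logger_levels; standard logging has no "success"
# level, so this sketch maps it to info.
LEVELS = {
    None: logger.info,
    "info": logger.info,
    "error": logger.error,
    "err": logger.error,
    "warn": logger.warning,
    "warning": logger.warning,
    "success": logger.info,
}


def emit(*args, level=None, sep=" "):
    """Join the arguments and send them to the handler registered for level."""
    if level not in LEVELS:
        raise ValueError(f"Unknown level: {level!r}")
    message = sep.join(str(a) for a in args)
    LEVELS[level](message)
    return message


emit("Training started")
emit("Fold", 3, "failed", level="err")
```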
gojo.util.splitter module
- class gojo.util.splitter.InstanceLevelKFoldSplitter(n_splits: int, instance_id: numpy.ndarray, n_repeats: int = 1, shuffle: bool = True, random_state: Optional[int] = None)[source]
Bases:
object
Splitter that makes the splits at instance level, ignoring the observations associated with each instance.
Important
The observations of the input data of the split() method will be associated with the identifiers provided in instance_id.
- n_splitsint
Number of folds. Must be at least 2.
- instance_idnp.ndarray
Array identifying the instances to perform the splits.
- n_repeatsint, default=1
Number of times the cross-validation is repeated.
- shufflebool, default=True
Indicates whether to shuffle the data before performing the split.
- random_stateint, default=None
Controls the randomness of each repeated cross-validation instance.
- split(X: pandas.DataFrame, y=None) Tuple[numpy.ndarray, numpy.ndarray] [source]
Generate the splits. This function will return a tuple where the first element will correspond to the training indices and the second element to the test indices.
Important
X must match with instance_id.
- Xnp.ndarray or pd.DataFrame
Input data.
- yobject, default=None
Ignored parameter. Implemented for sklearn compatibility.
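One way to picture instance-level splitting is to build the folds over the unique identifiers and then map each fold back to observation rows. The sketch below follows that idea with numpy only; `instance_level_kfold` is a hypothetical helper and should not be read as gojo's implementation (repeats are omitted for brevity).

```python
import numpy as np


def instance_level_kfold(instance_id, n_splits, shuffle=True, random_state=None):
    """Yield (train_idx, test_idx) so that no instance spans both sets.

    Hypothetical helper; a sketch of the idea, not gojo's implementation.
    """
    instance_id = np.asarray(instance_id)
    unique_ids = np.unique(instance_id)
    if n_splits < 2 or n_splits > len(unique_ids):
        raise ValueError("n_splits must be between 2 and the number of instances")
    if shuffle:
        rng = np.random.default_rng(random_state)
        unique_ids = rng.permutation(unique_ids)
    # Folds are built over instances, then mapped back to observation rows.
    for test_ids in np.array_split(unique_ids, n_splits):
        test_mask = np.isin(instance_id, test_ids)
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


# Six observations belonging to three instances (e.g. repeated measurements).
ids = np.array(["a", "a", "b", "b", "b", "c"])
folds = list(instance_level_kfold(ids, n_splits=3, random_state=0))
```

Because the folds are defined over identifiers, all observations of a given instance land together in either the training or the test indices.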
- class gojo.util.splitter.PredefinedSplitter(train_index: list, test_index: list)[source]
Bases:
object
Wrapper that incorporates a predefined split into the model evaluation subroutines. It expects two lists of indices (positions along dimension 0 of the input data) to be used for training and testing, respectively.
- train_indexlist or np.ndarray
Indices used for train.
- test_indexlist or np.ndarray
Indices used for test.
>>> import numpy as np
>>> from gojo import util
>>>
>>> np.random.seed(1997)
>>>
>>> n_samples = 20
>>> n_feats = 10
>>> X = np.random.uniform(size=(n_samples, n_feats))
>>> y = np.random.randint(0, 2, size=n_samples)
>>>
>>> splitter = util.splitter.PredefinedSplitter(
...     train_index=np.arange(0, 15),
...     test_index=np.arange(15, 20),
... )
>>>
>>> for train_idx, test_idx in splitter.split(X, y):
...     print(len(train_idx), y[train_idx].mean())
...     print(len(test_idx), y[test_idx].mean())
- split(X: pandas.DataFrame, y: Optional[pandas.Series] = None) Tuple[numpy.ndarray, numpy.ndarray] [source]
Generates indices to split data into training and test set.
- Xnp.ndarray or pd.DataFrame
Input data.
- ynp.ndarray or pd.Series, default=None
Target variable.
- class gojo.util.splitter.SimpleSplitter(test_size: float, stratify: bool = False, random_state: Optional[int] = None, shuffle: bool = True)[source]
Bases:
object
Wrapper of the sklearn.model_selection.train_test_split function from scikit-learn, used to perform a simple partitioning of the data into a training and a test set (optionally with stratification).
- test_sizefloat
If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples.
- stratifybool, default=False
If True, the data is split in a stratified fashion, using the y passed to split() as the class labels.
- random_stateint, default=None
Controls the shuffling applied to the data before applying the split.
- shufflebool, default=True
Whether to shuffle the data before splitting. If shuffle=False then stratify must be False.
>>> import numpy as np
>>> from gojo import util
>>>
>>> np.random.seed(1997)
>>>
>>> n_samples = 20
>>> n_feats = 10
>>> X = np.random.uniform(size=(n_samples, n_feats))
>>> y = np.random.randint(0, 2, size=n_samples)
>>>
>>> splitter = util.splitter.SimpleSplitter(
...     test_size=0.2,
...     stratify=True,
...     random_state=1997
... )
>>>
>>> for train_idx, test_idx in splitter.split(X, y):
...     print(len(train_idx), y[train_idx].mean())
...     print(len(test_idx), y[test_idx].mean())
- split(X: pandas.DataFrame, y: Optional[pandas.Series] = None) Tuple[numpy.ndarray, numpy.ndarray] [source]
Generates indices to split data into training and test set.
- Xnp.ndarray or pd.DataFrame
Input data.
- ynp.ndarray or pd.Series, default=None
If stratify was specified as True this variable will be used for performing a stratified split.
- gojo.util.splitter.getCrossValObj(cv: Optional[int] = None, repeats: int = 1, stratified: bool = False, loocv: bool = False, random_state: Optional[int] = None) sklearn.model_selection.LeaveOneOut [source]
Function used to obtain the sklearn object that evaluates the models under a cross-validation or leave-one-out cross-validation (LOOCV) scheme.
- cvint, default=None
(cross-validation) This parameter is used to specify the number of folds. Ignored when loocv is set to True.
- repeatsint, default=1
(cross-validation) This parameter is used to specify the number of repetitions of the repeated cross-validation. Ignored when loocv is set to True.
- stratifiedbool, default=False
(cross-validation) This parameter specifies whether to perform the cross-validation with class stratification. Ignored when loocv is set to True.
- loocvbool, default=False
(Leave-one-out cross validation) Indicates whether to perform a LOOCV. If this parameter is set to True the rest of the parameters will be ignored.
- random_stateint, default=None
(cross-validation) Random state for study replication.
- cv_objRepeatedKFold or RepeatedStratifiedKFold or LeaveOneOut
Cross-validation instance from the sklearn library.
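The selection logic implied by these parameters might look roughly like the sketch below. The function name `get_cv_sketch` is hypothetical, and raising `ValueError` when cv is missing is an assumption; only the three scikit-learn classes named in the return description are used.

```python
from sklearn.model_selection import (
    LeaveOneOut,
    RepeatedKFold,
    RepeatedStratifiedKFold,
)


def get_cv_sketch(cv=None, repeats=1, stratified=False, loocv=False, random_state=None):
    """Pick a scikit-learn splitter following the parameter descriptions above."""
    if loocv:
        # All other parameters are ignored for leave-one-out.
        return LeaveOneOut()
    if cv is None:
        # Assumed behaviour: a number of folds is required without LOOCV.
        raise ValueError("cv must be provided when loocv is False")
    splitter = RepeatedStratifiedKFold if stratified else RepeatedKFold
    return splitter(n_splits=cv, n_repeats=repeats, random_state=random_state)


plain = get_cv_sketch(cv=5, repeats=2, random_state=0)
strat = get_cv_sketch(cv=5, stratified=True, random_state=0)
loo = get_cv_sketch(loocv=True)
```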
gojo.util.tools module
- gojo.util.tools.getNumModelParams(model: torch.nn.Module) int [source]
Function that returns the number of trainable parameters from a torch.nn.Module instance.
- gojo.util.tools.minMaxScaling(data: numpy.ndarray, feature_range: tuple = (0, 1)) numpy.ndarray [source]
Apply min-max scaling to the provided data.
- datapd.DataFrame or np.ndarray
Data to be scaled.
- feature_rangetuple, default=(0, 1)
Feature range to scale the input data.
- scaled_datapd.DataFrame or np.ndarray
Data scaled to the provided range.
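For reference, min-max scaling to a range (a, b) computes (x - min) / (max - min) * (b - a) + a. The sketch below applies it column-wise with numpy; the axis choice, the handling of constant columns, and the name `min_max_sketch` are assumptions, since the documentation does not state them.

```python
import numpy as np


def min_max_sketch(data, feature_range=(0, 1)):
    """Column-wise min-max scaling (the axis choice is an assumption)."""
    data = np.asarray(data, dtype=float)
    low, high = feature_range
    col_min = data.min(axis=0)
    col_range = data.max(axis=0) - col_min
    # Constant columns would divide by zero; leave them at the lower bound.
    col_range = np.where(col_range == 0, 1.0, col_range)
    return (data - col_min) / col_range * (high - low) + low


X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
scaled = min_max_sketch(X, feature_range=(-1, 1))
```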
- gojo.util.tools.zscoresScaling(data: numpy.ndarray) numpy.ndarray [source]
Apply z-score scaling to the provided data.
- datapd.DataFrame or np.ndarray
Data to be scaled.
- scaled_datapd.DataFrame or np.ndarray
Z-scores of the input data.
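A z-score subtracts the mean and divides by the standard deviation. A minimal numpy sketch follows, assuming column-wise statistics and the population standard deviation (ddof=0); neither choice is confirmed by the documentation, and `zscores_sketch` is a hypothetical name.

```python
import numpy as np


def zscores_sketch(data):
    """Column-wise z-scores using the population standard deviation (ddof=0)."""
    data = np.asarray(data, dtype=float)
    std = data.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # constant columns map to zero
    return (data - data.mean(axis=0)) / std


X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
z = zscores_sketch(X)
```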
gojo.util.validation module
- gojo.util.validation.checkCallable(input_obj_name: str, obj: callable)[source]
Function used to check if a given object is callable.
- gojo.util.validation.checkClass(input_obj_name: str, obj)[source]
Function used to check if a given object is a class.
- gojo.util.validation.checkInputType(input_var_name: str, input_var: object, valid_types: list)[source]
Function that checks that the type of the input variable input_var is within the valid types valid_types.
- gojo.util.validation.checkIterable(input_obj_name: str, obj)[source]
Function used to check if a given object is an iterable.
- gojo.util.validation.checkMultiInputTypes(*args)[source]
Wrapper of function checkInputType to check multiple variables at the same time.
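The type checks in this module can be pictured with a small sketch. The functions below are hypothetical re-creations for illustration: the `TypeError`, the error message, and the assumption that the multi-input wrapper receives (name, value, valid_types) tuples are not confirmed by the documentation.

```python
def check_input_type_sketch(input_var_name, input_var, valid_types):
    """Raise TypeError when input_var is not an instance of any valid type."""
    if not isinstance(input_var, tuple(valid_types)):
        expected = ", ".join(t.__name__ for t in valid_types)
        raise TypeError(
            f'"{input_var_name}" must be one of ({expected}); '
            f"got {type(input_var).__name__}"
        )


def check_multi_input_types_sketch(*args):
    """Each positional argument is assumed to be (name, value, valid_types)."""
    for name, value, valid_types in args:
        check_input_type_sketch(name, value, valid_types)


# Both arguments pass, so nothing is raised.
check_multi_input_types_sketch(
    ("n_splits", 5, [int]),
    ("test_size", 0.2, [int, float]),
)
```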