gojo.interfaces package

Submodules

gojo.interfaces.model module

class gojo.interfaces.model.Model[source]

Base class (interface) used to define a model that can interact with the gojo library.

Subclasses must define the following methods:

train()
This method fits the model to the input data. Once the model has been fitted, the superclass method fitted() must be called from inside this method; otherwise, the model will not be recognized as fitted to any data, and performInference() will raise a gojo.exception.UnfittedEstimator error.

performInference()
Once the model has been fitted using the train() method (at which point the is_fitted property returns True), this method performs inference on new data.

reset()
This method should reset the inner estimator, forgetting all the data seen.

getParameters()
This method must return a dictionary containing the parameters used by the model. The parameters returned by this method will be used to store metadata about the model.

updateParameters()
This method must update the inner parameters of the model.

copy()
This method must return a copy of the model.

This abstract class provides the following properties:

parameters -> dict
Returns the hyperparameters of the model.

is_fitted -> bool
Indicates whether a given model has been fitted (i.e., if the train() method was called).

And the following methods:

fitted()
This method should be called inside the train() method to indicate that the model was fitted to the input data and can now perform inferences using the performInference() subroutine.

resetFit()
This method is used to reset learned model weights.
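
The contract between train(), fitted(), is_fitted and performInference() described above can be sketched without gojo itself. The snippet below is a minimal illustration only: SketchModel and UnfittedError are hypothetical stand-ins for gojo.interfaces.Model and gojo.exception.UnfittedEstimator, and the assumption that resetFit() calls reset() before clearing the flag is ours, not something stated on this page.

```python
class UnfittedError(Exception):
    """Hypothetical stand-in for gojo.exception.UnfittedEstimator."""


class SketchModel:
    """Hypothetical stand-in for the fitted-state bookkeeping of the Model interface."""

    def __init__(self):
        self._is_fitted = False

    @property
    def is_fitted(self) -> bool:
        return self._is_fitted

    def fitted(self):
        # subclasses call this at the end of train()
        self._is_fitted = True

    def resetFit(self):
        # assumption: forget the learned state, then mark the model as unfitted
        self.reset()
        self._is_fitted = False


class MeanModel(SketchModel):
    """Toy model that predicts the mean of the training labels."""

    def __init__(self):
        super().__init__()
        self._mean = None

    def train(self, X, y=None, **kwargs):
        self._mean = sum(y) / len(y)
        self.fitted()  # without this call the model would still count as unfitted

    def performInference(self, X, **kwargs):
        if not self.is_fitted:
            raise UnfittedError("call train() before performInference()")
        return [self._mean for _ in X]

    def reset(self):
        self._mean = None


model = MeanModel()
model.train([[0.0], [1.0], [2.0]], [1.0, 2.0, 3.0])
predictions = model.performInference([[5.0], [6.0]])
model.resetFit()
```

After resetFit() the toy model reports is_fitted as False again, and a further call to performInference() raises the stand-in exception.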

abstract copy()[source]: Method used to make a copy of the model.

fitted()[source]: Method called to indicate that a given model have been fitted.

abstract getParameters() → dict[source]

Method that must return the model parameters.

model_parameters (dict): Model parameters.

property is_fitted: bool

Indicates whether the model has been trained by calling the train() method.

model_fitted (bool): True if the model was fitted.

property parameters: dict

Return the model parameters defined in the getParameters() method.

model_parameters (dict): Model parameters.

abstract performInference(X: numpy.ndarray, **kwargs) → numpy.ndarray[source]

Method used to perform the model predictions.

X (np.ndarray): Input data used to perform inference.

abstract reset(**kwargs)[source]: Method used to reset the fitted model.

resetFit()[source]: Method used to reset a fitted model.

abstract train(X: numpy.ndarray, y: None = None, **kwargs)[source]

Method used to fit a model to a given input data.

X (np.ndarray): Input data to fit the model.
y (np.ndarray or None, default=None): Data labels (optional).
**kwargs: Additional training parameters.

update(**kwargs)[source]: Method used to update model parameters.

abstract updateParameters(**kwargs)[source]: Method used to update model parameters.

class gojo.interfaces.model.SklearnModelWrapper(model_class, predict_proba: bool = False, supress_warnings: bool = False, **kwargs)[source]

Wrapper used for easy integration of models following the sklearn interface into the gojo library and functionality.

model_class (type): Model following the ‘sklearn.base.BaseEstimator’ interface. The class provided does not have to be a subclass of the sklearn interface but should provide the basic fit() and predict() (or predict_proba()) methods.
predict_proba (bool, default=False): Parameter that indicates whether to call the predict_proba() method when making predictions. If this parameter is False (default behavior), the predict() method will be called. If the parameter is set to True and the model provided does not implement predict_proba(), the predict() method will be called instead and a warning will report the failed attempt to call predict_proba().
supress_warnings (bool, default=False): Parameter indicating whether to suppress the warnings issued by the class.
**kwargs: Additional model hyperparameters. These parameters will be passed to the model_class constructor.
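
The fallback described for predict_proba can be illustrated with a short, self-contained sketch. It does not use gojo or sklearn: OnlyPredict is a hypothetical estimator and infer() is a hypothetical helper written for this illustration, so the actual wrapper may differ in detail.

```python
import warnings


class OnlyPredict:
    """Hypothetical estimator exposing fit() and predict() but no predict_proba()."""

    def fit(self, X, y):
        # remember the most frequent label seen during fitting
        self.majority_ = max(set(y), key=list(y).count)
        return self

    def predict(self, X):
        return [self.majority_ for _ in X]


def infer(estimator, X, predict_proba=False, supress_warnings=False):
    """Use predict_proba() when requested and available, otherwise fall back to predict()."""
    if predict_proba:
        if hasattr(estimator, "predict_proba"):
            return estimator.predict_proba(X)
        if not supress_warnings:
            warnings.warn("predict_proba() is not available; using predict() instead.")
    return estimator.predict(X)


estimator = OnlyPredict().fit([[0], [1], [2]], [1, 1, 0])

# record the warnings so the fallback can be inspected
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    y_hat = infer(estimator, [[3], [4]], predict_proba=True)
```

Here the estimator lacks predict_proba(), so the call falls back to predict() and a single warning is recorded; passing supress_warnings=True would silence it.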

>>> from gojo import interfaces
>>> from sklearn.naive_bayes import GaussianNB
>>>
>>> # create model
>>> model = interfaces.SklearnModelWrapper(
>>>     GaussianNB, predict_proba=True, priors=[0.25, 0.75])
>>>
>>> # train model
>>> model.train(X, y)    # X and y are numpy.arrays
>>>
>>> # perform inference
>>> y_hat = model.performInference(X_new)    # X_new is a numpy.array
>>>
>>> # reset model fitting
>>> model.resetFit()
>>> model.is_fitted    # must return False

copy()[source]: Method used to make a copy of the model.

getParameters() → dict[source]

Method that must return the model parameters.

model_parameters (dict): Model parameters.

property model: Returns the internal model provided by the constructor and adjusted if the train method has been called.

performInference(X: numpy.ndarray, **kwargs) → numpy.ndarray[source]

Method used to perform the model predictions.

X (np.ndarray): Input data used to perform inference.

model_predictions (np.ndarray): Model predictions associated with the input data.

reset()[source]: Method used to reset the fitted model.

train(X: numpy.ndarray, y: None = None, **kwargs)[source]

Method used to fit a model to a given input data.

X (np.ndarray): Input data to fit the model.
y (np.ndarray or None, default=None): Data labels (optional).

updateParameters(**kwargs)[source]

Method used to update the inner model parameters.

NOTE: Model parameters should be updated by calling the update() method from the model superclass.

class gojo.interfaces.model.TorchSKInterface(model: torch.nn.Module, iter_fn: callable, loss_function, n_epochs: int, optimizer_class, dataset_class, dataloader_class, lr_scheduler_class: Optional[type] = None, optimizer_kw: Optional[dict] = None, lr_scheduler_kw: Optional[dict] = None, train_dataset_kw: Optional[dict] = None, valid_dataset_kw: Optional[dict] = None, inference_dataset_kw: Optional[dict] = None, train_dataloader_kw: Optional[dict] = None, valid_dataloader_kw: Optional[dict] = None, inference_dataloader_kw: Optional[dict] = None, iter_fn_kw: Optional[dict] = None, train_split: float = 1.0, train_split_stratify: bool = False, callbacks: Optional[list] = None, metrics: Optional[list] = None, batch_size: Optional[int] = None, seed: Optional[int] = None, device: str = 'cpu', verbose: int = 1)[source]

Wrapper class designed to integrate pytorch models (‘torch.nn.Module’ instances) into the gojo library functionalities.

model (torch.nn.Module): Instance of a ‘torch.nn.Module’ subclass.
iter_fn (callable): Function that executes an epoch of the typical torch.nn.Module training pipeline. For more information consult gojo.deepl.loops.
loss_function (callable): Loss function used to train the model.
n_epochs (int): Number of epochs used to train the model.
optimizer_class (type): Pytorch optimizer used to train the model (see the torch.optim module).
dataset_class (type): Pytorch dataset class used to train the model (see the torch.utils.data module or the gojo submodule gojo.deepl.loading).
dataloader_class (type): Pytorch dataloader class (torch.utils.data.DataLoader).
lr_scheduler_class (type, default=None): Class used to construct a learning rate schedule as defined in torch.optim.lr_scheduler.
optimizer_kw (dict, default=None): Parameters used to initialize the provided optimizer class.
lr_scheduler_kw (dict, default=None): Parameters used to initialize the learning rate scheduler defined by lr_scheduler_class.
train_dataset_kw (dict, default=None): Parameters used to initialize the provided dataset class for the data used for training.
train_dataloader_kw (dict, default=None): Parameters used to initialize the provided dataloader class for the data used for training.
train_split (float, default=1.0): Fraction of the training data received in train() that will be used to train the model. The rest of the data will be used as the validation set.
valid_dataset_kw (dict, default=None): Parameters used to initialize the provided dataset class for the data used for validation. Parameter ignored if train_split == 1.0.
valid_dataloader_kw (dict, default=None): Parameters used to initialize the provided dataloader class for the data used for validation. Parameter ignored if train_split == 1.0.
inference_dataset_kw (dict, default=None): Parameters used to initialize the provided dataset class for the data used for inference when calling gojo.interfaces.TorchSKInterface.performInference(). If no parameters are provided, the arguments provided for training will be used.
inference_dataloader_kw (dict, default=None): Parameters used to initialize the provided dataloader class for the data used for inference when calling gojo.interfaces.TorchSKInterface.performInference(). If no parameters are provided, the arguments provided for training will be used with the following dataloader parameters changed: shuffle = False, drop_last = False, batch_size = batch_size (the batch_size provided in the constructor or when calling gojo.interfaces.TorchSKInterface.performInference()).
iter_fn_kw (dict, default=None): Optional arguments passed to iter_fn.
train_split_stratify (bool, default=False): Parameter indicating whether to perform the train/validation split with class stratification. Parameter ignored if train_split == 1.0.
callbacks (List[gojo.deepl.callback.Callback], default=None): Callbacks executed during model training. For more information see gojo.deepl.callback.
metrics (List[gojo.core.evaluation.Metric], default=None): Metrics used to evaluate the model performance during training. For more information see gojo.core.evaluation.Metric.
batch_size (int, default=None): Batch size used when calling gojo.interfaces.TorchSKInterface.performInference(). This parameter can also be set when calling that method.
seed (int, default=None): Random seed used for controlling the randomness.
device (str, default='cpu'): Device used for training the model.
verbose (int, default=1): Verbosity level. Use -1 to indicate maximum verbosity.
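
To make the roles of train_split and train_split_stratify concrete, the following sketch partitions sample indices into training and validation subsets. It is an illustration under stated assumptions rather than the library's code: split_indices() is a hypothetical helper, and the rounding and shuffling choices are ours.

```python
import random
from collections import defaultdict


def split_indices(labels, train_split=1.0, stratify=False, seed=None):
    """Return (train_idx, valid_idx) for a given training fraction."""
    if train_split == 1.0:
        # no validation set: every sample is used for training
        return list(range(len(labels))), []

    rng = random.Random(seed)

    # group the indices by class when stratifying, otherwise use a single group
    groups = defaultdict(list)
    for idx, label in enumerate(labels):
        groups[label if stratify else None].append(idx)

    train_idx, valid_idx = [], []
    for indices in groups.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * train_split)
        train_idx.extend(indices[:n_train])
        valid_idx.extend(indices[n_train:])
    return sorted(train_idx), sorted(valid_idx)


labels = [0] * 10 + [1] * 10
train_idx, valid_idx = split_indices(labels, train_split=0.8, stratify=True, seed=1997)
```

With stratification enabled, each class contributes the same fraction of its samples to the validation subset, which keeps the class balance comparable between the two subsets.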

>>> import torch
>>> import pandas as pd
>>> from sklearn import datasets
>>> from sklearn.model_selection import train_test_split
>>>
>>> # Gojo libraries
>>> from gojo import interfaces
>>> from gojo import core
>>> from gojo import deepl
>>> from gojo import util
>>> from gojo import plotting
>>>
>>>
>>> DEVICE = 'mps'
>>>
>>>
>>> # load test dataset (Wine)
>>> wine_dt = datasets.load_wine()
>>>
>>> # create the target variable. Classification problem: class 1 vs rest
>>> # to see the target names you can use wine_dt['target_names']
>>> y = (wine_dt['target'] == 1).astype(int)
>>> X = wine_dt['data']
>>>
>>> # standardize input data
>>> std_X = util.zscoresScaling(X)
>>>
>>> # split Xs and Ys in training and validation
>>> X_train, X_valid, y_train, y_valid = train_test_split(
>>>     std_X, y, train_size=0.8, random_state=1997, shuffle=True, stratify=y)
>>>
>>> model = interfaces.TorchSKInterface(
>>>     model=deepl.ffn.createSimpleFFNModel(
>>>         in_feats=X_train.shape[1],
>>>         out_feats=1,
>>>         layer_dims=[20],
>>>         layer_activation=torch.nn.ELU(),
>>>         output_activation=torch.nn.Sigmoid()),
>>>     iter_fn=deepl.iterSupervisedEpoch,
>>>     loss_function=torch.nn.BCELoss(),
>>>     n_epochs=50,
>>>     train_split=0.8,
>>>     train_split_stratify=True,
>>>     optimizer_class=torch.optim.Adam,
>>>     dataset_class=deepl.loading.TorchDataset,
>>>     dataloader_class=torch.utils.data.DataLoader,
>>>     optimizer_kw=dict(
>>>         lr=0.001
>>>     ),
>>>     train_dataset_kw=None,
>>>     valid_dataset_kw=None,
>>>     train_dataloader_kw=dict(
>>>         batch_size=16,
>>>         shuffle=True
>>>     ),
>>>     valid_dataloader_kw=dict(
>>>         batch_size=X_train.shape[0]
>>>     ),
>>>     iter_fn_kw= None,
>>>     callbacks= None,
>>>     seed=1997,
>>>     device=DEVICE,
>>>     metrics=core.getDefaultMetrics('binary_classification', bin_threshold=0.5),
>>>     verbose=1
>>> )
>>>
>>> # train the model
>>> model.train(X_train, y_train)
>>>
>>> # get the model convergence information
>>> model_history = model.fitting_history
>>>
>>> # display model convergence
>>> plotting.linePlot(
>>>     model_history['train'], model_history['valid'],
>>>     x='epoch', y='loss (mean)', err='loss (std)',
>>>     labels=['Train', 'Validation'],
>>>     title='Model convergence',
>>>     ls=['solid', 'dashed'],
>>>     legend_pos='center right')
>>>
>>> # display model performance
>>> plotting.linePlot(
>>>     model_history['train'], model_history['valid'],
>>>     x='epoch', y='f1_score',
>>>     labels=['Train', 'Validation'],
>>>     title='Model F1-score',
>>>     ls=['solid', 'dashed'],
>>>     legend_pos='center right')

copy()[source]: Method used to make a copy of the model.

property fitting_history: tuple: Returns a tuple with the training/validation fitting history of the models returned by the gojo.deepl.loops.fitNeuralNetwork() function. The first element will correspond to the training data while the second element to the validation data.

getParameters() → dict[source]: Returns the model parameters.

loadStateDict(file: str)[source]

Subroutine used to load a state dictionary containing model weights that were serialized using torch.save.

file (str): File with the saved weights.

property model: torch.nn.Module: Returns the internal model provided by the constructor and adjusted if the train method has been called.

property num_params: int: Returns the number model trainable parameters.

performInference(X: numpy.ndarray, batch_size: Optional[int] = None, **kwargs) → numpy.ndarray[source]

Method used to perform the model predictions.

X (np.ndarray): Input data used to perform inference.
batch_size (int, default=None): Batch size used to perform the inference in batches instead of on all input data at once. By default, all input data will be used at once.
**kwargs: Optional arguments for instance-level data.

model_predictions (np.ndarray): Model predictions associated with the input data.
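
The effect of batch_size on inference can be sketched as follows. This is a simplified illustration with plain Python lists: predict_in_batches() and toy_predict() are hypothetical names, and the real method works with dataloaders rather than list slices.

```python
def predict_in_batches(predict_fn, X, batch_size=None):
    """Apply predict_fn to X at once, or batch by batch when batch_size is given."""
    if batch_size is None:
        # default behaviour: a single call on all the input data
        return list(predict_fn(X))
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")

    predictions = []
    for start in range(0, len(X), batch_size):
        batch = X[start:start + batch_size]
        predictions.extend(predict_fn(batch))
    return predictions


calls = []  # records the size of every batch seen by the toy predictor


def toy_predict(batch):
    calls.append(len(batch))
    return [2 * value for value in batch]


y_hat = predict_in_batches(toy_predict, [1, 2, 3, 4, 5], batch_size=2)
```

The predictions keep the order of the input, and the last batch is simply smaller when the number of samples is not a multiple of batch_size.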

reset()[source]: Method used to reset the fitted model.

train(X: numpy.ndarray, y: None = None, **kwargs)[source]

Train the model using the input data.

X (np.ndarray): Predictor variables.
y (np.ndarray or None, default=None): Target variable.
**kwargs: Optional instance-level arguments.

updateParameters(**kwargs)[source]: Function not available for this class objects. If you want to use a parametrized version see gojo.core.base.ParametrizedTorchSKInterface.

class gojo.interfaces.model.ParametrizedTorchSKInterface(generating_fn: callable, gf_params: dict, iter_fn: callable, loss_function, n_epochs: int, optimizer_class, dataset_class, dataloader_class, lr_scheduler_class: Optional[type] = None, optimizer_kw: Optional[dict] = None, lr_scheduler_kw: Optional[dict] = None, train_dataset_kw: Optional[dict] = None, valid_dataset_kw: Optional[dict] = None, inference_dataset_kw: Optional[dict] = None, train_dataloader_kw: Optional[dict] = None, valid_dataloader_kw: Optional[dict] = None, inference_dataloader_kw: Optional[dict] = None, iter_fn_kw: Optional[dict] = None, train_split: float = 1.0, train_split_stratify: bool = False, callbacks: Optional[list] = None, metrics: Optional[list] = None, batch_size: Optional[int] = None, seed: Optional[int] = None, device: str = 'cpu', verbose: int = 1)[source]

Parameterized version of gojo.interfaces.TorchSKInterface. This implementation is useful for performing cross validation with hyperparameter optimization using the gojo.core.loops.evalCrossValNestedHPO() function. This class provides an implementation of the updateParameters() method.

generating_fn (callable): Function used to generate a model from a set of parameters. Some functions are already implemented, such as gojo.deepl.ffn.createSimpleFFNModel(). Users can also define their own generating function.
gf_params (dict): Parameters used by the input function generating_fn to generate a torch.nn.Module instance.
iter_fn (callable): Function that executes an epoch of the typical torch.nn.Module training pipeline. For more information consult gojo.deepl.loops.
loss_function (callable): Loss function used to train the model.
n_epochs (int): Number of epochs used to train the model.
optimizer_class (type): Pytorch optimizer used to train the model (see the torch.optim module).
dataset_class (type): Pytorch dataset class used to train the model (see the torch.utils.data module or the gojo submodule gojo.deepl.loading).
dataloader_class (type): Pytorch dataloader class (torch.utils.data.DataLoader).
lr_scheduler_class (type, default=None): Class used to construct a learning rate schedule as defined in torch.optim.lr_scheduler.
optimizer_kw (dict, default=None): Parameters used to initialize the provided optimizer class.
lr_scheduler_kw (dict, default=None): Parameters used to initialize the learning rate scheduler defined by lr_scheduler_class.
train_dataset_kw (dict, default=None): Parameters used to initialize the provided dataset class for the data used for training.
train_dataloader_kw (dict, default=None): Parameters used to initialize the provided dataloader class for the data used for training.
train_split (float, default=1.0): Fraction of the training data received in train() that will be used to train the model. The rest of the data will be used as the validation set.
valid_dataset_kw (dict, default=None): Parameters used to initialize the provided dataset class for the data used for validation. Parameter ignored if train_split == 1.0.
valid_dataloader_kw (dict, default=None): Parameters used to initialize the provided dataloader class for the data used for validation. Parameter ignored if train_split == 1.0.
inference_dataset_kw (dict, default=None): Parameters used to initialize the provided dataset class for the data used for inference when calling gojo.interfaces.TorchSKInterface.performInference(). If no parameters are provided, the arguments provided for training will be used.
inference_dataloader_kw (dict, default=None): Parameters used to initialize the provided dataloader class for the data used for inference when calling gojo.interfaces.TorchSKInterface.performInference(). If no parameters are provided, the arguments provided for training will be used with the following dataloader parameters changed: shuffle = False, drop_last = False, batch_size = batch_size (the batch_size provided in the constructor or when calling gojo.interfaces.TorchSKInterface.performInference()).
iter_fn_kw (dict, default=None): Optional arguments passed to iter_fn.
train_split_stratify (bool, default=False): Parameter indicating whether to perform the train/validation split with class stratification. Parameter ignored if train_split == 1.0.
callbacks (List[gojo.deepl.callback.Callback], default=None): Callbacks executed during model training. For more information see gojo.deepl.callback.
metrics (List[gojo.core.evaluation.Metric], default=None): Metrics used to evaluate the model performance during training. For more information see gojo.core.evaluation.Metric.
batch_size (int, default=None): Batch size used when calling gojo.interfaces.ParametrizedTorchSKInterface.performInference(). This parameter can also be set when calling that method.
seed (int, default=None): Random seed used for controlling the randomness.
device (str, default='cpu'): Device used for training the model.
verbose (int, default=1): Verbosity level. Use -1 to indicate maximum verbosity.
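
The default described for inference_dataloader_kw (reuse the training dataloader arguments, but disable shuffling and dropping of the last batch, and override the batch size) can be sketched with plain dictionaries. The helper name inference_dataloader_kw_from() is hypothetical and the snippet is only an approximation of the documented behaviour.

```python
def inference_dataloader_kw_from(train_dataloader_kw, batch_size, inference_dataloader_kw=None):
    """Build the dataloader arguments used at inference time."""
    if inference_dataloader_kw is not None:
        # explicit inference arguments take precedence
        return dict(inference_dataloader_kw)

    # otherwise start from a copy of the training arguments...
    derived = dict(train_dataloader_kw or {})
    # ...and override the options that must not carry over to inference
    derived.update(shuffle=False, drop_last=False, batch_size=batch_size)
    return derived


train_kw = {"batch_size": 16, "shuffle": True}
derived_kw = inference_dataloader_kw_from(train_kw, batch_size=64)
```

Copying the dictionary before updating it leaves the training arguments untouched, so training after an inference call still shuffles the data.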

>>> import sys
>>>
>>> sys.path.append('..')
>>>
>>> import torch
>>> import pandas as pd
>>> from sklearn import datasets
>>> from sklearn.model_selection import train_test_split
>>>
>>> # GOJO libraries
>>> from gojo import interfaces
>>> from gojo import core
>>> from gojo import deepl
>>> from gojo import util
>>> from gojo import plotting
>>>
>>> DEVICE = 'mps'
>>>
>>> # load test dataset (Wine)
>>> wine_dt = datasets.load_wine()
>>>
>>> # create the target variable. Classification problem: class 1 vs rest
>>> # to see the target names you can use wine_dt['target_names']
>>> y = (wine_dt['target'] == 1).astype(int)
>>> X = wine_dt['data']
>>>
>>> # standardize input data
>>> std_X = util.zscoresScaling(X)
>>>
>>> # split Xs and Ys in training and validation
>>> X_train, X_valid, y_train, y_valid = train_test_split(
>>>     std_X, y, train_size=0.8, random_state=1997, shuffle=True,
>>>     stratify=y
>>> )
>>>
>>> model = interfaces.ParametrizedTorchSKInterface(
>>>     generating_fn=deepl.ffn.createSimpleFFNModel,
>>>     gf_params=dict(
>>>         in_feats=X_train.shape[1],
>>>         out_feats=1,
>>>         layer_dims=[20],
>>>         layer_activation='ELU',
>>>         output_activation='Sigmoid'),
>>>     iter_fn=deepl.iterSupervisedEpoch,
>>>     loss_function=torch.nn.BCELoss(),
>>>     n_epochs=50,
>>>     train_split=0.8,
>>>     train_split_stratify=True,
>>>     optimizer_class=torch.optim.Adam,
>>>     dataset_class=deepl.loading.TorchDataset,
>>>     dataloader_class=torch.utils.data.DataLoader,
>>>     optimizer_kw=dict(
>>>         lr=0.001
>>>     ),
>>>     train_dataset_kw=None,
>>>     valid_dataset_kw=None,
>>>     train_dataloader_kw=dict(
>>>         batch_size=16,
>>>         shuffle=True
>>>     ),
>>>     valid_dataloader_kw=dict(
>>>         batch_size=X_train.shape[0]
>>>     ),
>>>     iter_fn_kw= None,
>>>     callbacks= None,
>>>     seed=1997,
>>>     device=DEVICE,
>>>     metrics=core.getDefaultMetrics('binary_classification', bin_threshold=0.5, select=['f1_score']),
>>>     verbose=1
>>> )
>>>
>>> # train the model
>>> model.train(X_train, y_train)
>>>
>>> # display model convergence
>>> model_history = model.fitting_history
>>> plotting.linePlot(
>>>     model_history['train'], model_history['valid'],
>>>     x='epoch', y='loss (mean)', err='loss (std)',
>>>     labels=['Train', 'Validation'],
>>>     title='Model convergence',
>>>     ls=['solid', 'dashed'],
>>>     legend_pos='center right')
>>>
>>> # display model performance
>>> plotting.linePlot(
>>>     model_history['train'], model_history['valid'],
>>>     x='epoch', y='f1_score',
>>>     labels=['Train', 'Validation'],
>>>     title='Model F1-score',
>>>     ls=['solid', 'dashed'],
>>>     legend_pos='center right')
>>>
>>> # update model parameters
>>> model.update(
>>>     n_epochs=100,
>>>     train_dataloader_kw__batch_size=32,
>>>     gf_params__layer_dims=[5, 5, 5],
>>>     metrics=core.getDefaultMetrics('binary_classification', bin_threshold=0.5, select=['f1_score', 'auc'])
>>> )
>>>
>>> # updating the parameters resets the model, so it must be trained again
>>> model.train(X_train, y_train)
>>> y_hat = model.performInference(X_valid)
>>> pd.DataFrame([core.getScores(y_true=y_valid, y_pred=y_hat,
>>>                metrics=core.getDefaultMetrics('binary_classification', bin_threshold=0.5))]
>>> ).T.round(decimals=3)
>>>

copy()[source]: Method used to make a copy of the model.

getParameters() → dict[source]: Returns the model parameters.

updateParameters(**kwargs)[source]

Method that allows updating the model parameters. If you want to update a parameter contained in a dictionary, the name of the dictionary key must be specified together with the name of the parameter separated by “__”.

NOTE: Model parameters should be updated by calling the update() method from the model superclass.
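
The “__” naming convention can be illustrated with a small stand-alone sketch that applies both model-level and dictionary-level updates to a parameter dictionary. The function apply_updates() is hypothetical and only mirrors the convention described above; it is not the method's actual implementation.

```python
def apply_updates(params, **kwargs):
    """Return a copy of params with flat and 'dictname__key' updates applied."""
    # copy nested dictionaries so the original parameters are left untouched
    updated = {k: (dict(v) if isinstance(v, dict) else v) for k, v in params.items()}

    for name, value in kwargs.items():
        if "__" in name:
            # dictionary-level parameter: '<dictionary name>__<key>'
            dict_name, key = name.split("__", 1)
            if not isinstance(updated.get(dict_name), dict):
                raise TypeError(f"'{dict_name}' is not a dictionary parameter")
            updated[dict_name][key] = value
        else:
            # model-level parameter
            if name not in updated:
                raise KeyError(name)
            updated[name] = value
    return updated


params = {"n_epochs": 50, "gf_params": {"in_feats": 13, "layer_dims": [20, 10]}}
new_params = apply_updates(params, gf_params__layer_dims=[5], n_epochs=100)
```

In this sketch gf_params__layer_dims=[5] replaces only the layer_dims entry of gf_params, while n_epochs=100 replaces a top-level value.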

>>> import torch
>>> from gojo import core
>>> from gojo import interfaces
>>> from gojo import deepl
>>>
>>> # create the model to be evaluated
>>> model = interfaces.ParametrizedTorchSKInterface(
>>>     # example of generating function
>>>     generating_fn=deepl.ffn.createSimpleFFNModel,
>>>     gf_params=dict(
>>>         in_feats=13,
>>>         out_feats=1,
>>>         layer_dims=[20, 10],
>>>         layer_activation='ELU',
>>>         output_activation='Sigmoid'),
>>>     # example of iteration function
>>>     iter_fn=deepl.iterSupervisedEpoch,
>>>     loss_function=torch.nn.BCELoss(),
>>>     n_epochs=50,
>>>     train_split=0.8,
>>>     train_split_stratify=True,
>>>     optimizer_class=torch.optim.Adam,
>>>     dataset_class=deepl.loading.TorchDataset,
>>>     dataloader_class=torch.utils.data.DataLoader,
>>>     optimizer_kw=dict(
>>>         lr=0.001
>>>     ),
>>>     train_dataloader_kw=dict(
>>>         batch_size=16,
>>>         shuffle=True
>>>     ),
>>>     valid_dataloader_kw=dict(
>>>         batch_size=200
>>>     ),
>>>     # use default classification metrics
>>>     metrics=core.getDefaultMetrics(
>>>        'binary_classification', bin_threshold=0.5, select=['f1_score']),
>>> )
>>> model
Out [0]
    ParametrizedTorchSKInterface(
        model=Sequential(
      (LinearLayer 0): Linear(in_features=13, out_features=20, bias=True)
      (Activation 0): ELU(alpha=1.0)
      (LinearLayer 1): Linear(in_features=20, out_features=10, bias=True)
      (Activation 1): ELU(alpha=1.0)
      (LinearLayer 2): Linear(in_features=10, out_features=1, bias=True)
      (Activation 2): Sigmoid()
    ),
        iter_fn=<function iterSupervisedEpoch at 0x7fd7ca47b940>,
        loss_function=BCELoss(),
        n_epochs=50,
        train_split=0.8,
        train_split_stratify=True,
        optimizer_class=<class 'torch.optim.adam.Adam'>,
        dataset_class=<class 'gojo.deepl.loading.TorchDataset'>,
        dataloader_class=<class 'torch.utils.data.dataloader.DataLoader'>,
        optimizer_kw={'lr': 0.001},
        train_dataset_kw={},
        valid_dataset_kw={},
        train_dataloader_kw={'batch_size': 16, 'shuffle': True},
        valid_dataloader_kw={'batch_size': 200},
        iter_fn_kw={},
        callbacks=None,
        metrics=[Metric(
        name=f1_score,
        function_kw={},
        multiclass=False
    )],
        seed=None,
        device=cpu,
        verbose=1,
        generating_fn=<function createSimpleFFNModel at 0x7fd7ca4805e0>,
        gf_params={'in_feats': 13, 'out_feats': 1, 'layer_dims': [20, 10], 'layer_activation': 'ELU',
        'output_activation': 'Sigmoid'}
    )
>>>
>>> # update parameters by using the update() method provided by the Model interface
>>> model.update(
>>>    gf_params__layer_dims=[5],    # update dictionary-level parameter
>>>    n_epochs=100                  # update model-level parameter
>>> )
Out [1]
    ParametrizedTorchSKInterface(
        model=Sequential(
      (LinearLayer 0): Linear(in_features=13, out_features=5, bias=True)
      (Activation 0): ELU(alpha=1.0)
      (LinearLayer 1): Linear(in_features=5, out_features=1, bias=True)
      (Activation 1): Sigmoid()
    ),
        iter_fn=<function iterSupervisedEpoch at 0x7fd7ca47b940>,
        loss_function=BCELoss(),
        n_epochs=100,
        train_split=0.8,
        train_split_stratify=True,
        optimizer_class=<class 'torch.optim.adam.Adam'>,
        dataset_class=<class 'gojo.deepl.loading.TorchDataset'>,
        dataloader_class=<class 'torch.utils.data.dataloader.DataLoader'>,
        optimizer_kw={'lr': 0.001},
        train_dataset_kw={},
        valid_dataset_kw={},
        train_dataloader_kw={'batch_size': 16, 'shuffle': True},
        valid_dataloader_kw={'batch_size': 200},
        iter_fn_kw={},
        callbacks=None,
        metrics=[Metric(
        name=f1_score,
        function_kw={},
        multiclass=False
    )],
        seed=None,
        device=cpu,
        verbose=1,
        generating_fn=<function createSimpleFFNModel at 0x7fd7ca4805e0>,
        gf_params={'in_feats': 13, 'out_feats': 1, 'layer_dims': [5], 'layer_activation': 'ELU',
        'output_activation': 'Sigmoid'}
    )

gojo.interfaces.data module

class gojo.interfaces.data.Dataset(data: pandas.Series)[source]

Bases: object

Class representing a dataset. This class is used internally by the functions defined in gojo.core.loops.

data (np.ndarray or pd.DataFrame or pd.Series): Data to be homogenized as a dataset.

property array_data: numpy.ndarray: Returns the input data as a numpy.ndarray.

property index_values: numpy.array: Returns the input data index values.

property var_names: list: Returns the names of the variables.

gojo.interfaces.transform module

class gojo.interfaces.transform.GraphStandardScaler[source]

Bases: gojo.interfaces.transform.Transform

Class that performs a standardization of three-dimensional input data associated with the following dimensions: (n_instances, n_nodes, n_features). The returned data will have a mean of 0 and standard deviation of 1 along dimensions 1 and 2.

fit(X: numpy.ndarray, y: None = None, **kwargs)[source]

Method used to fit a transform to a given input data.

X (np.ndarray): Input data to fit the transform.
y (np.ndarray or None, default=None): Data labels (optional).

getParameters() → dict[source]

Method that must return the transform parameters.

model_parameters (dict): Transform parameters.

reset()[source]: Reset the model fit.

transform(X: numpy.ndarray, **kwargs) → numpy.ndarray[source]

Method used to apply the transformations.

X (np.ndarray): Input data to be transformed.

X_trans (np.ndarray): Transformed data.

updateParameters(**kwargs)[source]: This method has no effect.

class gojo.interfaces.transform.SKLearnTransformWrapper(transform_class, **kwargs)[source]

Bases: gojo.interfaces.transform.Transform

Wrapper used to easily incorporate the transformations implemented in the sklearn library.

transform_class (type): sklearn transform. The instances of this class must have the fit and transform methods defined according to the sklearn implementation.
**kwargs: Optional arguments used to initialize instances of the provided class.

>>> from sklearn.svm import SVC
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.decomposition import PCA
>>>
>>> # GOJO libraries
>>> import gojo
>>> from gojo import core
>>> from gojo import interfaces
>>>
>>> # previous model transforms
>>> transforms = [
>>>     interfaces.SKLearnTransformWrapper(StandardScaler),
>>>     interfaces.SKLearnTransformWrapper(PCA, n_components=5)
>>> ]
>>>
>>> # default model
>>> model = interfaces.SklearnModelWrapper(
>>>     SVC, kernel='poly', degree=1, coef0=0.0,
>>>     cache_size=1000, class_weight=None
>>> )
>>>
>>> cv_report = core.evalCrossVal(
>>>     X=X, y=y,
>>>     model=model,
>>>     cv=gojo.util.getCrossValObj(cv=5),
>>>     transforms=transforms)
>>>

copy()[source]: Make a deepcopy of the instance.

fit(X: numpy.ndarray, y: None = None, **kwargs)[source]

Method used to fit a transform to a given input data.

X (np.ndarray): Input data to fit the transform.
y (np.ndarray or None, default=None): Data labels (optional).

getParameters() → dict[source]

Method that must return the transform parameters.

model_parameters (dict): Transform parameters.

reset()[source]: Reset the model fit.

transform(X: numpy.ndarray, **kwargs) → numpy.ndarray[source]

Method used to apply the transformations.

X (np.ndarray): Input data to be transformed.

X_trans (np.ndarray): Transformed data.

property transform_obj: object: Returns the internal transform object. By default, a deep copy of the transform is generated. The internal transformation can be returned directly by selecting copy=True.

updateParameters(**kwargs)[source]

Method used to update the inner transform parameters.

IMPORTANT NOTE: Transform parameters should be updated by calling the update() method from the superclass gojo.interfaces.transform.Transform.

class gojo.interfaces.transform.Transform[source]

Bases: object

Base interface for applying transformations to the input data in the gojo.core.loops subroutines. Internally, the training data will be passed to the fit() method to adjust the transformation to the training dataset statistics, and the transformation will subsequently be applied to the training and test data by means of the transform() method.

Subclasses must define the following methods:

fit()
Method used to fit a transform to a given input data.

transform()
Method used to perform the transformations to the input data.

reset()
Method used to reset the fitted transform.

copy()
Method used to make a copy of the transform. It is not mandatory to define this method. By default, a deep copy will be performed.

getParameters()
Method that must return the transform parameters. It is not mandatory to define this method. By default, it will return an empty dictionary.

updateParameters()
Method used to update the transform parameters. It is not mandatory to define this method. By default, it will have no effect.

This abstract class provides the following properties:

is_fitted
Indicates whether the transformation has been fitted by calling the fit() method.

And the following methods:

update()
Method used to update the transform parameters.

fitted()
Method called (usually internally) to indicate that a given transformation has been fitted.

resetFit()
Method used to reset a fitted transformation (usually called internally).
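
The fit-on-training, transform-on-both workflow described for this interface can be sketched with a toy transform. SketchStandardizer is a hypothetical class that does not inherit from gojo.interfaces.transform.Transform and operates on one-dimensional lists; it only illustrates why the statistics are estimated from the training data alone.

```python
import statistics


class SketchStandardizer:
    """Hypothetical transform: z-score scaling of a one-dimensional list of values."""

    def __init__(self):
        self._mean = None
        self._std = None
        self.is_fitted = False

    def fit(self, X, y=None, **kwargs):
        # the statistics come from the training data only
        self._mean = statistics.fmean(X)
        self._std = statistics.pstdev(X)
        self.is_fitted = True

    def transform(self, X, **kwargs):
        if not self.is_fitted:
            raise RuntimeError("call fit() before transform()")
        return [(value - self._mean) / self._std for value in X]

    def reset(self):
        self._mean = None
        self._std = None
        self.is_fitted = False


X_train = [1.0, 2.0, 3.0, 4.0, 5.0]
X_test = [3.0, 7.0]

scaler = SketchStandardizer()
scaler.fit(X_train)                        # fit on the training split
X_train_std = scaler.transform(X_train)    # then apply to the training data...
X_test_std = scaler.transform(X_test)      # ...and to the test data
```

The test data is scaled with the training mean and standard deviation, so no information from the test split leaks into the fitted transform. A constant training input (zero standard deviation) is not handled in this sketch.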

copy()[source]: Method used to make a copy of the transform.

abstract fit(X: numpy.ndarray, y: None, **kwargs)[source]

Method used to fit a transform to a given input data.

X (np.ndarray): Input data to fit the transformation.
y (np.ndarray or None, optional): Data labels.

fitted()[source]: Method called to indicate that a given transformation have been fitted.

getParameters() → dict[source]

Method that must return the transform parameters.

model_parameters (dict): Transform parameters.

property is_fitted: bool

Indicates whether the transformation has been fitted by calling the fit() method.

model_fitted (bool): True if the transformation was fitted.

abstract reset(**kwargs)[source]: Method used to reset the fitted transform.

resetFit()[source]: Method used to reset a fitted transformation.

abstract transform(X: numpy.ndarray, **kwargs) → numpy.ndarray[source]

Method used to perform the transformations to the input data.

X (np.ndarray): Input data used to perform the transformations.

update(**kwargs)[source]: Method used to update the transform parameters.

updateParameters(**kwargs)[source]: Method used to update the transform parameters.
