gojo.interfaces package

Submodules

gojo.interfaces.model module

class gojo.interfaces.model.Model[source]

Base class (interface) used to define a model that can interact with the gojo library.

Subclasses must define the following methods:

  • train()

    This method is used to fit a given model to the input data. Once the model has been fitted, inside this method, the superclass method fitted() must be called; otherwise, the model will not be recognized as fitted to any data, and performInference() will raise a gojo.exception.UnfittedEstimator error.

  • performInference()

    Once the model has been fitted using the train() method (i.e., the is_fitted property returns True), this method performs inference on new data.

  • reset()

    This method should reset the inner estimator, forgetting all the data seen.

  • getParameters()

    This method must return a dictionary containing the parameters used by the model. The parameters returned by this method will be used to store metadata about the model.

  • updateParameters()

    This method must update the inner parameters of the model.

  • copy()

    This method must return a copy of the model.

This abstract class provides the following properties:

  • parameters -> dict

    Returns the hyperparameters of the model.

  • is_fitted -> bool

    Indicates whether a given model has been fitted (i.e., if the train() method was called).

And the following methods:

  • fitted()

    This method should be called inside the train() method to indicate that the model was fitted to the input data and can now perform inferences using the performInference() subroutine.

  • resetFit()

    This method is used to reset learned model weights.
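The contract described above can be illustrated with a minimal sketch. To stay self-contained it does not import gojo: SketchModel is a hypothetical stand-in for the abstract base class, MeanModel is an invented toy estimator, and RuntimeError stands in for gojo.exception.UnfittedEstimator. Where the real library performs the fitted check, and whether resetFit() calls reset(), are assumptions made only for illustration.

```python
from abc import ABC, abstractmethod
import copy


class SketchModel(ABC):
    """Hypothetical stand-in for the Model interface (not the gojo class)."""

    def __init__(self):
        self._is_fitted = False

    @property
    def is_fitted(self) -> bool:
        return self._is_fitted

    def fitted(self):
        # Subclasses call this at the end of train().
        self._is_fitted = True

    def resetFit(self):
        # Assumption: forget the learned state and clear the fitted flag.
        self.reset()
        self._is_fitted = False

    @abstractmethod
    def train(self, X, y=None, **kwargs): ...

    @abstractmethod
    def performInference(self, X, **kwargs): ...

    @abstractmethod
    def reset(self, **kwargs): ...

    @abstractmethod
    def copy(self): ...


class MeanModel(SketchModel):
    """Toy estimator: predicts the mean of the training labels."""

    def __init__(self):
        super().__init__()
        self._mean = None

    def train(self, X, y=None, **kwargs):
        self._mean = sum(y) / len(y)
        self.fitted()  # mark the model as fitted

    def performInference(self, X, **kwargs):
        if not self.is_fitted:
            # Stand-in for gojo.exception.UnfittedEstimator.
            raise RuntimeError("model has not been fitted")
        return [self._mean for _ in X]

    def reset(self, **kwargs):
        self._mean = None

    def copy(self):
        return copy.deepcopy(self)


model = MeanModel()
model.train([[0.0], [1.0], [2.0]], y=[1.0, 2.0, 3.0])
predictions = model.performInference([[5.0], [6.0]])
model.resetFit()
```

After resetFit() the toy model reports is_fitted as False again, and a further call to performInference() raises the stand-in error.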

abstract copy()[source]

Method used to make a copy of the model.

fitted()[source]

Method called to indicate that a given model has been fitted.

abstract getParameters() dict[source]

Method that must return the model parameters.

model_parameters : dict

Model parameters.

property is_fitted: bool

Indicates whether the model has been trained by calling the train() method.

model_fitted : bool

Returns True if the model was fitted.

property parameters: dict

Return the model parameters defined in the getParameters() method.

model_parameters : dict

Model parameters.

abstract performInference(X: numpy.ndarray, **kwargs) numpy.ndarray[source]

Method used to perform the model predictions.

X : np.ndarray

Input data used to perform inference.

abstract reset(**kwargs)[source]

Method used to reset the fitted model.

resetFit()[source]

Method used to reset a fitted model.

abstract train(X: numpy.ndarray, y: None = None, **kwargs)[source]

Method used to fit a model to a given input data.

X : np.ndarray

Input data to fit the model.

y : np.ndarray or None, default=None

Data labels (optional).

**kwargs

Additional training parameters.

update(**kwargs)[source]

Method used to update model parameters.

abstract updateParameters(**kwargs)[source]

Method used to update model parameters.

class gojo.interfaces.model.SklearnModelWrapper(model_class, predict_proba: bool = False, supress_warnings: bool = False, **kwargs)[source]

Wrapper used for easy integration of models following the sklearn interface into the gojo library and functionality.

model_class : type

Model following the ‘sklearn.base.BaseEstimator’ interface. The class provided does not have to be a subclass of the sklearn interface but should provide the basic fit() and predict() (or predict_proba()) methods.

predict_proba : bool, default=False

Parameter that indicates whether to call the predict_proba() method when making predictions. If this parameter is False (default behavior) the predict() method will be called. If the parameter is set to True and the model provided does not have the predict_proba method implemented, the predict() method will be called and a warning will inform that an attempt has been made to call the predict_proba() method.

supress_warnings : bool, default=False

Parameter indicating whether to suppress the warnings issued by the class.

**kwargs

Additional model hyperparameters. These parameters will be passed to the model_class constructor.

>>> from gojo import interfaces
>>> from sklearn.naive_bayes import GaussianNB
>>>
>>> # create model
>>> model = interfaces.SklearnModelWrapper(
>>>     GaussianNB, predict_proba=True, priors=[0.25, 0.75])
>>>
>>> # train model
>>> model.train(X, y)    # X and y are numpy.arrays
>>>
>>> # perform inference
>>> y_hat = model.performInference(X_new)    # X_new is a numpy.array
>>>
>>> # reset model fitting
>>> model.resetFit()
>>> model.is_fitted    # must return False
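The predict_proba fallback described for the constructor can be sketched as follows. This is an illustration of the documented behavior, not the library's code: LabelsOnly and predict_with_fallback are hypothetical names, and the warning text is invented.

```python
import warnings


class LabelsOnly:
    """Hypothetical estimator exposing fit() and predict() but no predict_proba()."""

    def fit(self, X, y):
        # Remember the most frequent label.
        self.majority_ = max(set(y), key=list(y).count)
        return self

    def predict(self, X):
        return [self.majority_ for _ in X]


def predict_with_fallback(estimator, X, predict_proba=False, supress_warnings=False):
    """Sketch of the documented fallback; not the library's actual implementation."""
    if predict_proba:
        if hasattr(estimator, "predict_proba"):
            return estimator.predict_proba(X)
        if not supress_warnings:
            warnings.warn("predict_proba() is not available; calling predict() instead.")
    return estimator.predict(X)


estimator = LabelsOnly().fit([[0], [1], [2]], [1, 1, 0])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    y_hat = predict_with_fallback(estimator, [[3], [4]], predict_proba=True)
```

Because LabelsOnly has no predict_proba() method, the call falls back to predict() and records one warning; passing supress_warnings=True would silence it.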
copy()[source]

Method used to make a copy of the model.

getParameters() dict[source]

Method that must return the model parameters.

model_parameters : dict

Model parameters.

property model

Returns the internal model provided by the constructor and adjusted if the train method has been called.

performInference(X: numpy.ndarray, **kwargs) numpy.ndarray[source]

Method used to perform the model predictions.

X : np.ndarray

Input data used to perform inference.

model_predictions : np.ndarray

Model predictions associated with the input data.

reset()[source]

Method used to reset the fitted model.

train(X: numpy.ndarray, y: None = None, **kwargs)[source]

Method used to fit a model to a given input data.

X : np.ndarray

Input data to fit the model.

y : np.ndarray or None, default=None

Data labels (optional).

updateParameters(**kwargs)[source]

Method used to update the inner model parameters.

  • NOTE: Model parameters should be updated by calling the update() method from the model superclass.

class gojo.interfaces.model.TorchSKInterface(model: torch.nn.Module, iter_fn: callable, loss_function, n_epochs: int, optimizer_class, dataset_class, dataloader_class, lr_scheduler_class: Optional[type] = None, optimizer_kw: Optional[dict] = None, lr_scheduler_kw: Optional[dict] = None, train_dataset_kw: Optional[dict] = None, valid_dataset_kw: Optional[dict] = None, inference_dataset_kw: Optional[dict] = None, train_dataloader_kw: Optional[dict] = None, valid_dataloader_kw: Optional[dict] = None, inference_dataloader_kw: Optional[dict] = None, iter_fn_kw: Optional[dict] = None, train_split: float = 1.0, train_split_stratify: bool = False, callbacks: Optional[list] = None, metrics: Optional[list] = None, batch_size: Optional[int] = None, seed: Optional[int] = None, device: str = 'cpu', verbose: int = 1)[source]

Wrapper class designed to integrate PyTorch models (‘torch.nn.Module’ instances) into the gojo library functionalities.

model : torch.nn.Module

Subclass of ‘torch.nn.Module’.

iter_fn : callable

Function that executes an epoch of the torch.nn.Module typical training pipeline. For more information consult gojo.deepl.loops.

loss_function : callable

Loss function used to train the model.

n_epochs : int

Number of epochs used to train the model.

optimizer_class : type

PyTorch optimizer used to train the model (see the torch.optim module).

dataset_class : type

Pytorch class dataset used to train the model (see torch.utils.data module or the gojo submodule gojo.deepl.loading).

dataloader_class : type

Pytorch dataloader class (torch.utils.data.DataLoader).

lr_scheduler_class : type, default=None

Class used to construct a learning rate scheduler, as defined in the torch.optim.lr_scheduler module.

optimizer_kw : dict, default=None

Parameters used to initialize the provided optimizer class.

lr_scheduler_kw : dict, default=None

Parameters used to initialize the learning rate scheduler as defined based on lr_scheduler_class.

train_dataset_kw : dict, default=None

Parameters used to initialize the provided dataset class for the data used for training.

train_dataloader_kw : dict, default=None

Parameters used to initialize the provided dataloader class for the data used for training.

train_split : float, default=1.0

Fraction of the training data received in train() that will be used to train the model (for example, 0.8 keeps 80% of the data for training). The rest of the data will be used as the validation set.

valid_dataset_kw : dict, default=None

Parameters used to initialize the provided dataset class for the data used for validation. Parameter ignored if train_split == 1.0.

valid_dataloader_kw : dict, default=None

Parameters used to initialize the provided dataloader class for the data used for validation. Parameter ignored if train_split == 1.0.

inference_dataset_kw : dict, default=None

Parameters used to initialize the provided dataset class for the data used for inference when calling gojo.interfaces.TorchSKInterface.performInference(). If no parameters are provided, the arguments provided for the training will be used.

inference_dataloader_kw : dict, default=None

Parameters used to initialize the provided dataloader class for the data used for inference when calling gojo.interfaces.TorchSKInterface.performInference(). If no parameters are provided, the arguments provided for the training will be used changing the dataloader parameters: shuffle = False, drop_last = False, batch_size = batch_size (batch_size provided in the constructor or when calling the method gojo.interfaces.TorchSKInterface.performInference())

iter_fn_kw : dict, default=None

Optional arguments of the parameter iter_fn.

train_split_stratify : bool, default=False

Parameter indicating whether to perform the train/validation split with class stratification. Parameter ignored if train_split == 1.0.

callbacks : List[gojo.deepl.callback.Callback], default=None

Callbacks during model training. For more information see gojo.deepl.callback.

metrics : List[gojo.core.evaluation.Metric], default=None

Metrics used to evaluate the model performance during training. For more information see gojo.core.evaluation.Metric.

batch_size : int, default=None

Batch size used when calling gojo.interfaces.TorchSKInterface.performInference(). This parameter can also be set when calling that method.

seed : int, default=None

Random seed used for controlling the randomness.

device : str, default='cpu'

Device used for training the model.

verbose : int, default=1

Verbosity level. Use -1 to indicate maximum verbosity.
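The effect of train_split and train_split_stratify can be pictured with a small, library-independent sketch. The function split_train_valid is hypothetical; how gojo actually shuffles and rounds the split is not stated here, so the details below are assumptions.

```python
import random


def split_train_valid(y, train_split=1.0, stratify=False, seed=None):
    """Return (train_idx, valid_idx); an illustrative sketch only."""
    rng = random.Random(seed)
    indices = list(range(len(y)))
    if train_split >= 1.0:
        return indices, []  # all data is used for training, no validation set
    if not stratify:
        rng.shuffle(indices)
        n_train = int(round(train_split * len(indices)))
        return sorted(indices[:n_train]), sorted(indices[n_train:])
    # Stratified case: split each class separately to keep class proportions.
    train_idx, valid_idx = [], []
    for label in sorted(set(y)):
        group = [i for i in indices if y[i] == label]
        rng.shuffle(group)
        n_train = int(round(train_split * len(group)))
        train_idx += group[:n_train]
        valid_idx += group[n_train:]
    return sorted(train_idx), sorted(valid_idx)


labels = [0] * 10 + [1] * 5
train_idx, valid_idx = split_train_valid(
    labels, train_split=0.8, stratify=True, seed=1997)
```

With ten samples of class 0 and five of class 1, the stratified 0.8 split keeps eight and four samples for training and leaves two and one for validation, so both subsets preserve the 2:1 class ratio.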

>>> import torch
>>> import pandas as pd
>>> from sklearn import datasets
>>> from sklearn.model_selection import train_test_split
>>>
>>> # Gojo libraries
>>> from gojo import interfaces
>>> from gojo import core
>>> from gojo import deepl
>>> from gojo import util
>>> from gojo import plotting
>>>
>>>
>>> DEVICE = 'mps'
>>>
>>>
>>> # load test dataset (Wine)
>>> wine_dt = datasets.load_wine()
>>>
>>> # create the target variable. Classification problem: class 1 vs rest
>>> # to see the target names you can use wine_dt['target_names']
>>> y = (wine_dt['target'] == 1).astype(int)
>>> X = wine_dt['data']
>>>
>>> # standardize input data
>>> std_X = util.zscoresScaling(X)
>>>
>>> # split Xs and Ys in training and validation
>>> X_train, X_valid, y_train, y_valid = train_test_split(
>>>     std_X, y, train_size=0.8, random_state=1997, shuffle=True, stratify=y)
>>>
>>> model = interfaces.TorchSKInterface(
>>>     model=deepl.ffn.createSimpleFFNModel(
>>>         in_feats=X_train.shape[1],
>>>         out_feats=1,
>>>         layer_dims=[20],
>>>         layer_activation=torch.nn.ELU(),
>>>         output_activation=torch.nn.Sigmoid()),
>>>     iter_fn=deepl.iterSupervisedEpoch,
>>>     loss_function=torch.nn.BCELoss(),
>>>     n_epochs=50,
>>>     train_split=0.8,
>>>     train_split_stratify=True,
>>>     optimizer_class=torch.optim.Adam,
>>>     dataset_class=deepl.loading.TorchDataset,
>>>     dataloader_class=torch.utils.data.DataLoader,
>>>     optimizer_kw=dict(
>>>         lr=0.001
>>>     ),
>>>     train_dataset_kw=None,
>>>     valid_dataset_kw=None,
>>>     train_dataloader_kw=dict(
>>>         batch_size=16,
>>>         shuffle=True
>>>     ),
>>>     valid_dataloader_kw=dict(
>>>         batch_size=X_train.shape[0]
>>>     ),
>>>     iter_fn_kw= None,
>>>     callbacks= None,
>>>     seed=1997,
>>>     device=DEVICE,
>>>     metrics=core.getDefaultMetrics('binary_classification', bin_threshold=0.5),
>>>     verbose=1
>>> )
>>>
>>> # train the model
>>> model.train(X_train, y_train)
>>>
>>> # get the model convergence information
>>> model_history = model.fitting_history
>>>
>>> # display model convergence
>>> plotting.linePlot(
>>>     model_history['train'], model_history['valid'],
>>>     x='epoch', y='loss (mean)', err='loss (std)',
>>>     labels=['Train', 'Validation'],
>>>     title='Model convergence',
>>>     ls=['solid', 'dashed'],
>>>     legend_pos='center right')
>>>
>>> # display model performance
>>> plotting.linePlot(
>>>     model_history['train'], model_history['valid'],
>>>     x='epoch', y='f1_score',
>>>     labels=['Train', 'Validation'],
>>>     title='Model F1-score',
>>>     ls=['solid', 'dashed'],
>>>     legend_pos='center right')
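The default documented for inference_dataloader_kw (reuse the training dataloader arguments with shuffle = False, drop_last = False and the inference batch size) can be sketched as plain dictionary handling. The function resolve_inference_kwargs and its argument names are hypothetical, and the precedence of a call-time batch size over the constructor value is an assumption.

```python
def resolve_inference_kwargs(train_kw, inference_kw=None,
                             ctor_batch_size=None, call_batch_size=None):
    """Sketch of the documented default; the resolution order in gojo may differ."""
    if inference_kw is not None:
        return dict(inference_kw)  # explicit inference settings are used as given
    resolved = dict(train_kw or {})  # copy, so the training settings stay untouched
    # Assumption: a batch size given at call time overrides the constructor value.
    batch_size = call_batch_size if call_batch_size is not None else ctor_batch_size
    resolved.update(shuffle=False, drop_last=False, batch_size=batch_size)
    return resolved


train_kw = dict(batch_size=16, shuffle=True)
resolved = resolve_inference_kwargs(train_kw, ctor_batch_size=64)
```

Copying the training dictionary before modifying it keeps the training configuration intact, which matters when the same model is trained again after inference.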
copy()[source]

Method used to make a copy of the model.

property fitting_history: tuple

Returns a tuple with the training/validation fitting history of the models returned by the gojo.deepl.loops.fitNeuralNetwork() function. The first element will correspond to the training data while the second element to the validation data.

getParameters() dict[source]

Returns the model parameters.

loadStateDict(file: str)[source]

Loads a state dictionary containing model weights that were serialized with torch.save.

file : str

File with the saved weights.

property model: torch.nn.Module

Returns the internal model provided by the constructor and adjusted if the train method has been called.

property num_params: int

Returns the number of trainable model parameters.

performInference(X: numpy.ndarray, batch_size: Optional[int] = None, **kwargs) numpy.ndarray[source]

Method used to perform the model predictions.

X : np.ndarray

Input data used to perform inference.

batch_size : int, default=None

Batch size used to perform the inference in batches instead of processing all input data at once. By default, all input data will be used at once.

**kwargs

Optional arguments for instance-level data.

model_predictions : np.ndarray

Model predictions associated with the input data.
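The batch_size behavior of performInference() amounts to slicing the input into consecutive chunks. A minimal sketch, using a hypothetical helper iter_batches that is not part of gojo:

```python
def iter_batches(X, batch_size=None):
    """Yield consecutive slices of X; with batch_size=None, yield X once."""
    if batch_size is None:
        yield X  # default: all input data in a single pass
        return
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(X), batch_size):
        yield X[start:start + batch_size]


rows = list(range(10))
batches = list(iter_batches(rows, batch_size=4))
whole = list(iter_batches(rows))
```

The last batch may be smaller than batch_size; no sample is dropped, which is consistent with the drop_last = False default documented for inference.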

reset()[source]

Method used to reset the fitted model.

train(X: numpy.ndarray, y: None = None, **kwargs)[source]

Train the model using the input data.

X : np.ndarray

Predictor variables.

y : np.ndarray or None, default=None

Target variable.

**kwargs

Optional instance-level arguments.

updateParameters(**kwargs)[source]

Method not available for objects of this class. For a parametrized version, see gojo.interfaces.ParametrizedTorchSKInterface.

class gojo.interfaces.model.ParametrizedTorchSKInterface(generating_fn: callable, gf_params: dict, iter_fn: callable, loss_function, n_epochs: int, optimizer_class, dataset_class, dataloader_class, lr_scheduler_class: Optional[type] = None, optimizer_kw: Optional[dict] = None, lr_scheduler_kw: Optional[dict] = None, train_dataset_kw: Optional[dict] = None, valid_dataset_kw: Optional[dict] = None, inference_dataset_kw: Optional[dict] = None, train_dataloader_kw: Optional[dict] = None, valid_dataloader_kw: Optional[dict] = None, inference_dataloader_kw: Optional[dict] = None, iter_fn_kw: Optional[dict] = None, train_split: float = 1.0, train_split_stratify: bool = False, callbacks: Optional[list] = None, metrics: Optional[list] = None, batch_size: Optional[int] = None, seed: Optional[int] = None, device: str = 'cpu', verbose: int = 1)[source]

Parameterized version of gojo.interfaces.TorchSKInterface. This implementation is useful for performing cross validation with hyperparameter optimization using the gojo.core.loops.evalCrossValNestedHPO() function. This class provides an implementation of the updateParameters() method.

generating_fn : callable

Function used to generate a model from a set of parameters. Some functions are already implemented, such as gojo.deepl.ffn.createSimpleFFNModel(). Users can also define their own generating function.

gf_params : dict

Parameters used by the input function generating_fn to generate a torch.nn.Module instance.

iter_fn : callable

Function that executes an epoch of the torch.nn.Module typical training pipeline. For more information consult gojo.deepl.loops.

loss_function : callable

Loss function used to train the model.

n_epochs : int

Number of epochs used to train the model.

optimizer_class : type

PyTorch optimizer used to train the model (see the torch.optim module).

dataset_class : type

Pytorch class dataset used to train the model (see torch.utils.data module or the gojo submodule gojo.deepl.loading).

dataloader_class : type

Pytorch dataloader class (torch.utils.data.DataLoader).

lr_scheduler_class : type, default=None

Class used to construct a learning rate scheduler, as defined in the torch.optim.lr_scheduler module.

optimizer_kw : dict, default=None

Parameters used to initialize the provided optimizer class.

lr_scheduler_kw : dict, default=None

Parameters used to initialize the learning rate scheduler as defined based on lr_scheduler_class.

train_dataset_kw : dict, default=None

Parameters used to initialize the provided dataset class for the data used for training.

train_dataloader_kw : dict, default=None

Parameters used to initialize the provided dataloader class for the data used for training.

train_split : float, default=1.0

Fraction of the training data received in train() that will be used to train the model (for example, 0.8 keeps 80% of the data for training). The rest of the data will be used as the validation set.

valid_dataset_kw : dict, default=None

Parameters used to initialize the provided dataset class for the data used for validation. Parameter ignored if train_split == 1.0.

valid_dataloader_kw : dict, default=None

Parameters used to initialize the provided dataloader class for the data used for validation. Parameter ignored if train_split == 1.0.

inference_dataset_kw : dict, default=None

Parameters used to initialize the provided dataset class for the data used for inference when calling gojo.interfaces.TorchSKInterface.performInference(). If no parameters are provided, the arguments provided for the training will be used.

inference_dataloader_kw : dict, default=None

Parameters used to initialize the provided dataloader class for the data used for inference when calling gojo.interfaces.TorchSKInterface.performInference(). If no parameters are provided, the arguments provided for the training will be used changing the dataloader parameters: shuffle = False, drop_last = False, batch_size = batch_size (batch_size provided in the constructor or when calling the method gojo.interfaces.TorchSKInterface.performInference())

iter_fn_kw : dict, default=None

Optional arguments of the parameter iter_fn.

train_split_stratify : bool, default=False

Parameter indicating whether to perform the train/validation split with class stratification. Parameter ignored if train_split == 1.0.

callbacks : List[gojo.deepl.callback.Callback], default=None

Callbacks during model training. For more information see gojo.deepl.callback.

metrics : List[gojo.core.evaluation.Metric], default=None

Metrics used to evaluate the model performance during training. For more information see gojo.core.evaluation.Metric.

batch_size : int, default=None

Batch size used when calling gojo.interfaces.ParametrizedTorchSKInterface.performInference(). This parameter can also be set when calling that method.

seed : int, default=None

Random seed used for controlling the randomness.

device : str, default='cpu'

Device used for training the model.

verbose : int, default=1

Verbosity level. Use -1 to indicate maximum verbosity.

>>> import sys
>>>
>>> sys.path.append('..')
>>>
>>> import torch
>>> import pandas as pd
>>> from sklearn import datasets
>>> from sklearn.model_selection import train_test_split
>>>
>>> # GOJO libraries
>>> from gojo import interfaces
>>> from gojo import core
>>> from gojo import deepl
>>> from gojo import util
>>> from gojo import plotting
>>>
>>> DEVICE = 'mps'
>>>
>>> # load test dataset (Wine)
>>> wine_dt = datasets.load_wine()
>>>
>>> # create the target variable. Classification problem: class 1 vs rest
>>> # to see the target names you can use wine_dt['target_names']
>>> y = (wine_dt['target'] == 1).astype(int)
>>> X = wine_dt['data']
>>>
>>> # standardize input data
>>> std_X = util.zscoresScaling(X)
>>>
>>> # split Xs and Ys in training and validation
>>> X_train, X_valid, y_train, y_valid = train_test_split(
>>>     std_X, y, train_size=0.8, random_state=1997, shuffle=True,
>>>     stratify=y
>>> )
>>>
>>> model = interfaces.ParametrizedTorchSKInterface(
>>>     generating_fn=deepl.ffn.createSimpleFFNModel,
>>>     gf_params=dict(
>>>         in_feats=X_train.shape[1],
>>>         out_feats=1,
>>>         layer_dims=[20],
>>>         layer_activation='ELU',
>>>         output_activation='Sigmoid'),
>>>     iter_fn=deepl.iterSupervisedEpoch,
>>>     loss_function=torch.nn.BCELoss(),
>>>     n_epochs=50,
>>>     train_split=0.8,
>>>     train_split_stratify=True,
>>>     optimizer_class=torch.optim.Adam,
>>>     dataset_class=deepl.loading.TorchDataset,
>>>     dataloader_class=torch.utils.data.DataLoader,
>>>     optimizer_kw=dict(
>>>         lr=0.001
>>>     ),
>>>     train_dataset_kw=None,
>>>     valid_dataset_kw=None,
>>>     train_dataloader_kw=dict(
>>>         batch_size=16,
>>>         shuffle=True
>>>     ),
>>>     valid_dataloader_kw=dict(
>>>         batch_size=X_train.shape[0]
>>>     ),
>>>     iter_fn_kw= None,
>>>     callbacks= None,
>>>     seed=1997,
>>>     device=DEVICE,
>>>     metrics=core.getDefaultMetrics('binary_classification', bin_threshold=0.5, select=['f1_score']),
>>>     verbose=1
>>> )
>>>
>>> # train the model
>>> model.train(X_train, y_train)
>>>
>>> # display model convergence
>>> model_history = model.fitting_history
>>> plotting.linePlot(
>>>     model_history['train'], model_history['valid'],
>>>     x='epoch', y='loss (mean)', err='loss (std)',
>>>     labels=['Train', 'Validation'],
>>>     title='Model convergence',
>>>     ls=['solid', 'dashed'],
>>>     legend_pos='center right')
>>>
>>> # display model performance
>>> plotting.linePlot(
>>>     model_history['train'], model_history['valid'],
>>>     x='epoch', y='f1_score',
>>>     labels=['Train', 'Validation'],
>>>     title='Model F1-score',
>>>     ls=['solid', 'dashed'],
>>>     legend_pos='center right')
>>>
>>> # update model parameters
>>> model.update(
>>>     n_epochs=100,
>>>     train_dataloader_kw__batch_size=32,
>>>     gf_params__layer_dims=[5, 5, 5],
>>>     metrics=core.getDefaultMetrics('binary_classification', bin_threshold=0.5, select=['f1_score', 'auc'])
>>> )
>>>
>>> # updating the parameters resets the model, so it must be retrained
>>> # before performing inference again
>>> model.train(X_train, y_train)
>>> y_hat = model.performInference(X_valid)
>>> pd.DataFrame([core.getScores(y_true=y_valid, y_pred=y_hat,
>>>                metrics=core.getDefaultMetrics('binary_classification', bin_threshold=0.5))]
>>> ).T.round(decimals=3)
>>>
copy()[source]

Method used to make a copy of the model.

getParameters() dict[source]

Returns the model parameters.

updateParameters(**kwargs)[source]

Method that allows updating the model parameters. If you want to update a parameter contained in a dictionary, the name of the dictionary key must be specified together with the name of the parameter separated by “__”.

  • NOTE: Model parameters should be updated by calling the update() method from the model superclass.
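The "__" convention can be sketched independently of the library. The helper apply_updates below is hypothetical and only mimics the naming rule (dictionary parameter name, two underscores, key name); it says nothing about how gojo rebuilds the model afterwards.

```python
import copy


def apply_updates(params, **kwargs):
    """Return a copy of `params` with updates applied; 'outer__inner' targets a dict entry."""
    updated = copy.deepcopy(params)
    for key, value in kwargs.items():
        if "__" in key:
            # Dictionary-level parameter: split only on the first separator.
            outer, inner = key.split("__", 1)
            if not isinstance(updated.get(outer), dict):
                raise KeyError(f"'{outer}' is not a dictionary parameter")
            updated[outer][inner] = value
        else:
            # Model-level parameter.
            if key not in updated:
                raise KeyError(f"unknown parameter '{key}'")
            updated[key] = value
    return updated


params = {
    "n_epochs": 50,
    "gf_params": {"in_feats": 13, "layer_dims": [20, 10]},
    "train_dataloader_kw": {"batch_size": 16, "shuffle": True},
}
new_params = apply_updates(
    params,
    n_epochs=100,
    gf_params__layer_dims=[5],
    train_dataloader_kw__batch_size=32,
)
```

Only the targeted entries change: the other keys inside gf_params and train_dataloader_kw keep their previous values, and the original dictionary is left untouched.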

>>> import torch
>>> from gojo import core
>>> from gojo import interfaces
>>> from gojo import deepl
>>>
>>> # create the model to be evaluated
>>> model = interfaces.ParametrizedTorchSKInterface(
>>>     # example of generating function
>>>     generating_fn=deepl.ffn.createSimpleFFNModel,
>>>     gf_params=dict(
>>>         in_feats=13,
>>>         out_feats=1,
>>>         layer_dims=[20, 10],
>>>         layer_activation='ELU',
>>>         output_activation='Sigmoid'),
>>>     # example of iteration function
>>>     iter_fn=deepl.iterSupervisedEpoch,
>>>     loss_function=torch.nn.BCELoss(),
>>>     n_epochs=50,
>>>     train_split=0.8,
>>>     train_split_stratify=True,
>>>     optimizer_class=torch.optim.Adam,
>>>     dataset_class=deepl.loading.TorchDataset,
>>>     dataloader_class=torch.utils.data.DataLoader,
>>>     optimizer_kw=dict(
>>>         lr=0.001
>>>     ),
>>>     train_dataloader_kw=dict(
>>>         batch_size=16,
>>>         shuffle=True
>>>     ),
>>>     valid_dataloader_kw=dict(
>>>         batch_size=200
>>>     ),
>>>     # use default classification metrics
>>>     metrics=core.getDefaultMetrics(
>>>        'binary_classification', bin_threshold=0.5, select=['f1_score']),
>>> )
>>> model
Out [0]
    ParametrizedTorchSKInterface(
        model=Sequential(
      (LinearLayer 0): Linear(in_features=13, out_features=20, bias=True)
      (Activation 0): ELU(alpha=1.0)
      (LinearLayer 1): Linear(in_features=20, out_features=10, bias=True)
      (Activation 1): ELU(alpha=1.0)
      (LinearLayer 2): Linear(in_features=10, out_features=1, bias=True)
      (Activation 2): Sigmoid()
    ),
        iter_fn=<function iterSupervisedEpoch at 0x7fd7ca47b940>,
        loss_function=BCELoss(),
        n_epochs=50,
        train_split=0.8,
        train_split_stratify=True,
        optimizer_class=<class 'torch.optim.adam.Adam'>,
        dataset_class=<class 'gojo.deepl.loading.TorchDataset'>,
        dataloader_class=<class 'torch.utils.data.dataloader.DataLoader'>,
        optimizer_kw={'lr': 0.001},
        train_dataset_kw={},
        valid_dataset_kw={},
        train_dataloader_kw={'batch_size': 16, 'shuffle': True},
        valid_dataloader_kw={'batch_size': 200},
        iter_fn_kw={},
        callbacks=None,
        metrics=[Metric(
        name=f1_score,
        function_kw={},
        multiclass=False
    )],
        seed=None,
        device=cpu,
        verbose=1,
        generating_fn=<function createSimpleFFNModel at 0x7fd7ca4805e0>,
        gf_params={'in_feats': 13, 'out_feats': 1, 'layer_dims': [20, 10], 'layer_activation': 'ELU',
        'output_activation': 'Sigmoid'}
    )
>>>
>>> # update parameters by using the update() method provided by the Model interface
>>> model.update(
>>>    gf_params__layer_dims=[5],    # update dictionary-level parameter
>>>    n_epochs=100                  # update model-level parameter
>>> )
Out [1]
    ParametrizedTorchSKInterface(
        model=Sequential(
      (LinearLayer 0): Linear(in_features=13, out_features=5, bias=True)
      (Activation 0): ELU(alpha=1.0)
      (LinearLayer 1): Linear(in_features=5, out_features=1, bias=True)
      (Activation 1): Sigmoid()
    ),
        iter_fn=<function iterSupervisedEpoch at 0x7fd7ca47b940>,
        loss_function=BCELoss(),
        n_epochs=100,
        train_split=0.8,
        train_split_stratify=True,
        optimizer_class=<class 'torch.optim.adam.Adam'>,
        dataset_class=<class 'gojo.deepl.loading.TorchDataset'>,
        dataloader_class=<class 'torch.utils.data.dataloader.DataLoader'>,
        optimizer_kw={'lr': 0.001},
        train_dataset_kw={},
        valid_dataset_kw={},
        train_dataloader_kw={'batch_size': 16, 'shuffle': True},
        valid_dataloader_kw={'batch_size': 200},
        iter_fn_kw={},
        callbacks=None,
        metrics=[Metric(
        name=f1_score,
        function_kw={},
        multiclass=False
    )],
        seed=None,
        device=cpu,
        verbose=1,
        generating_fn=<function createSimpleFFNModel at 0x7fd7ca4805e0>,
        gf_params={'in_feats': 13, 'out_feats': 1, 'layer_dims': [5], 'layer_activation': 'ELU',
        'output_activation': 'Sigmoid'}
    )

gojo.interfaces.data module

class gojo.interfaces.data.Dataset(data: pandas.Series)[source]

Bases: object

Class representing a dataset. This class is used internally by the functions defined in gojo.core.loops.

data : np.ndarray or pd.DataFrame or pd.Series

Data to be homogenized as a dataset.

property array_data: numpy.ndarray

Returns the input data as a numpy.array.

property index_values: numpy.array

Returns the input data index values.

property var_names: list

Returns the name of the variables.

gojo.interfaces.transform module

class gojo.interfaces.transform.GraphStandardScaler[source]

Bases: gojo.interfaces.transform.Transform

Class that performs a standardization of three-dimensional input data associated with the following dimensions: (n_instances, n_nodes, n_features). The returned data will have a mean of 0 and standard deviation of 1 along dimensions 1 and 2.

fit(X: numpy.ndarray, y: None = None, **kwargs)[source]

Method used to fit a transform to a given input data.

X : np.ndarray

Input data to fit the model.

y : np.ndarray or None, default=None

Data labels (optional).

getParameters() dict[source]

Method that must return the transform parameters.

model_parameters : dict

Model parameters.

reset()[source]

Reset the model fit.

transform(X: numpy.ndarray, **kwargs) numpy.ndarray[source]

Method used to apply the transformations.

X : np.ndarray

Input data to be transformed.

X_trans : np.ndarray

Transformed data.

updateParameters(**kwargs)[source]

This method has no effect.

class gojo.interfaces.transform.SKLearnTransformWrapper(transform_class, **kwargs)[source]

Bases: gojo.interfaces.transform.Transform

Wrapper used to easily incorporate the transformations implemented in the sklearn library.

transform_class : Type

sklearn transform. The instances of this class must have the fit and transform methods defined according to the sklearn implementation.

**kwargs

Optional arguments used to initialize instances of the provided class.

>>> from sklearn.svm import SVC
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.decomposition import PCA
>>>
>>> # GOJO libraries
>>> import gojo
>>> from gojo import core
>>> from gojo import interfaces
>>>
>>> # previous model transforms
>>> transforms = [
>>>     interfaces.SKLearnTransformWrapper(StandardScaler),
>>>     interfaces.SKLearnTransformWrapper(PCA, n_components=5)
>>> ]
>>>
>>> # default model
>>> model = interfaces.SklearnModelWrapper(
>>>     SVC, kernel='poly', degree=1, coef0=0.0,
>>>     cache_size=1000, class_weight=None
>>> )
>>>
>>> cv_report = core.evalCrossVal(
>>>     X=X, y=y,
>>>     model=model,
>>>     cv=gojo.util.getCrossValObj(cv=5),
>>>     transforms=transforms)
>>>
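The wrapper pattern (store a class plus its constructor arguments, then delegate fit() and transform() to an instance) can be sketched without sklearn or gojo. MinMax and TransformWrapperSketch are hypothetical names, and re-creating the inner object on reset() is an assumption rather than documented behavior.

```python
import copy


class MinMax:
    """Hypothetical transform with sklearn-style fit() and transform() on 1-D lists."""

    def __init__(self, upper=1.0):
        self.upper = upper

    def fit(self, X, y=None):
        self.min_, self.max_ = min(X), max(X)
        return self

    def transform(self, X):
        span = (self.max_ - self.min_) or 1.0  # guard against constant input
        return [self.upper * (x - self.min_) / span for x in X]


class TransformWrapperSketch:
    """Illustrative stand-in, not gojo's SKLearnTransformWrapper."""

    def __init__(self, transform_class, **kwargs):
        self._transform_class = transform_class
        self._kwargs = kwargs
        self._obj = transform_class(**kwargs)
        self.is_fitted = False

    def fit(self, X, y=None, **kwargs):
        self._obj.fit(X, y)
        self.is_fitted = True

    def transform(self, X, **kwargs):
        if not self.is_fitted:
            raise RuntimeError("transform has not been fitted")
        return self._obj.transform(X)

    def reset(self):
        # Assumption: resetting re-creates the inner object from the stored arguments.
        self._obj = self._transform_class(**self._kwargs)
        self.is_fitted = False

    def getParameters(self):
        return copy.deepcopy(self._kwargs)


wrapper = TransformWrapperSketch(MinMax, upper=10.0)
wrapper.fit([2.0, 4.0, 6.0])
scaled = wrapper.transform([2.0, 4.0, 6.0, 8.0])
```

Passing the class rather than an instance lets the wrapper build a fresh, unfitted object whenever it needs one, for example once per cross-validation fold.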
copy()[source]

Make a deepcopy of the instance.

fit(X: numpy.ndarray, y: None = None, **kwargs)[source]

Method used to fit a transform to a given input data.

X : np.ndarray

Input data to fit the model.

y : np.ndarray or None, default=None

Data labels (optional).

getParameters() dict[source]

Method that must return the transform parameters.

model_parameters : dict

Model parameters.

reset()[source]

Reset the model fit.

transform(X: numpy.ndarray, **kwargs) numpy.ndarray[source]

Method used to apply the transformations.

X : np.ndarray

Input data to be transformed.

X_trans : np.ndarray

Transformed data.

property transform_obj: object

Get the internal transform object. By default, a deepcopy from the transform will be generated. To return the internal transformation directly, it is possible by selecting copy=True.

updateParameters(**kwargs)[source]

Method used to update the inner transform parameters.

IMPORTANT NOTE: Transform parameters should be updated by calling the update() method from the superclass gojo.interfaces.transform.Transform.

class gojo.interfaces.transform.Transform[source]

Bases: object

Base interface for applying transformations to the input data in the gojo.core.loops subroutines. Internally, the training data will be passed to the fit() method to adjust the transformation to the training dataset statistics, and subsequently the transformation will be applied to the training and test data by means of the transform() method.

Subclasses must define the following methods:

  • fit()

    Method used to fit a transform to a given input data.

  • transform()

    Method used to perform the transformations to the input data.

  • reset()

    Method used to reset the fitted transform.

  • copy()

    Method used to make a copy of the transform. It is not mandatory to define this method. By default, a deep copy will be performed.

  • getParameters()

    Method that must return the transform parameters. It is not mandatory to define this method. By default, it will return an empty dictionary.

  • updateParameters()

    Method used to update the transform parameters. It is not mandatory to define this method. By default, it will have no effect.

This abstract class provides the following properties:

  • is_fitted

    Indicates whether the transformation has been fitted by calling the fit() method.

And the following methods:

  • update()

    Method used to update the transform parameters.

  • fitted()

    Method called (usually internally) to indicate that a given transformation has been fitted.

  • resetFit()

    Method used to reset a fitted transformation (usually called internally).
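The fit-on-training, apply-to-both workflow described for this interface can be sketched with a toy z-score transform. ZScoreSketch is a hypothetical class that follows the fit()/transform()/reset() contract without subclassing the real gojo base class; RuntimeError stands in for whatever error the library raises on unfitted transforms.

```python
from statistics import mean, pstdev


class ZScoreSketch:
    """Hypothetical transform following the fit()/transform()/reset() contract."""

    def __init__(self):
        self.is_fitted = False
        self._mean = None
        self._std = None

    def fit(self, X, y=None, **kwargs):
        self._mean = mean(X)
        self._std = pstdev(X) or 1.0  # avoid division by zero on constant data
        self.is_fitted = True

    def transform(self, X, **kwargs):
        if not self.is_fitted:
            raise RuntimeError("transform has not been fitted")
        return [(x - self._mean) / self._std for x in X]

    def reset(self, **kwargs):
        self._mean = self._std = None
        self.is_fitted = False


X_train, X_test = [1.0, 2.0, 3.0, 4.0, 5.0], [3.0, 6.0]

scaler = ZScoreSketch()
scaler.fit(X_train)                      # statistics come from the training data only
train_scaled = scaler.transform(X_train)
test_scaled = scaler.transform(X_test)   # test data reuses the training statistics
```

Fitting on the training data alone keeps information from the test set out of the transformation, which is why the test values are scaled with the training mean and standard deviation.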

copy()[source]

Method used to make a copy of the transform.

abstract fit(X: numpy.ndarray, y: None, **kwargs)[source]

Method used to fit a transform to a given input data.

X : np.ndarray

Input data to fit the transformation.

y : np.ndarray or None, optional

Data labels (optional).

fitted()[source]

Method called to indicate that a given transformation has been fitted.

getParameters() dict[source]

Method that must return the transform parameters.

model_parameters : dict

Model parameters.

property is_fitted: bool

Indicates whether the transformation has been fitted by calling the fit() method.

model_fitted : bool

Returns True if the model was fitted.

abstract reset(**kwargs)[source]

Method used to reset the fitted transform.

resetFit()[source]

Method used to reset a fitted transformation.

abstract transform(X: numpy.ndarray, **kwargs) numpy.ndarray[source]

Method used to perform the transformations to the input data.

X : np.ndarray

Input data used to perform the transformations.

update(**kwargs)[source]

Method used to update the transform parameters.

updateParameters(**kwargs)[source]

Method used to update the transform parameters.

Module contents