gojo.interfaces package
Submodules
gojo.interfaces.model module
- class gojo.interfaces.model.Model
Base class (interface) used to define a model that can interact with the gojo library. Subclasses must define the following methods:
- train()
This method is used to fit a given model to the input data. Once the model has been fitted, the superclass method fitted() must be called inside this method; otherwise, the model will not be recognized as fitted to any data, and performInference() will raise a gojo.exception.UnfittedEstimator error.
- performInference()
This method is used to perform predictions on new input data once the model has been fitted.
- reset()
This method should reset the inner estimator, forgetting all the data seen.
- getParameters()
This method must return a dictionary containing the parameters used by the model. The parameters returned by this method will be used to store metadata about the model.
- updateParameters()
This method must update the inner parameters of the model.
- copy()
This method must return a copy of the model.
This abstract class provides the following properties:
- parameters -> dict
Returns the hyperparameters of the model.
- is_fitted -> bool
Indicates whether a given model has been fitted (i.e., whether the train() method was called).
And the following methods:
- fitted()
This method should be called inside the train() method to indicate that the model has been fitted to the input data and can now perform inferences using the performInference() subroutine.
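- resetFit()
This method is used to reset the learned model weights.
- update()
This method is used to update the model parameters; subclasses customize its behavior through updateParameters().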
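The subclass contract above can be illustrated with a minimal sketch. To keep it runnable without gojo installed, ModelSketch and UnfittedEstimatorSketch are hypothetical stand-ins for the bookkeeping of gojo.interfaces.model.Model and for gojo.exception.UnfittedEstimator. In real code one would subclass gojo.interfaces.model.Model, whose constructor and internals may differ. MeanRegressor is a toy model written only for illustration.

```python
import copy


class UnfittedEstimatorSketch(Exception):
    """Hypothetical stand-in for gojo.exception.UnfittedEstimator."""


class ModelSketch:
    """Hypothetical stand-in for the bookkeeping of gojo.interfaces.model.Model."""

    def __init__(self):
        self._is_fitted = False

    def fitted(self):
        # must be called at the end of train()
        self._is_fitted = True

    def resetFit(self):
        self._is_fitted = False

    @property
    def is_fitted(self) -> bool:
        return self._is_fitted

    @property
    def parameters(self) -> dict:
        return self.getParameters()


class MeanRegressor(ModelSketch):
    """Toy model that predicts the training-target mean plus a constant shift."""

    def __init__(self, shift: float = 0.0):
        super().__init__()
        self.shift = shift
        self._mean = None

    def train(self, X, y=None, **kwargs):
        self._mean = sum(y) / len(y)
        self.fitted()  # without this call, performInference() refuses to run

    def performInference(self, X, **kwargs):
        if not self.is_fitted:
            raise UnfittedEstimatorSketch("call train() before performInference()")
        return [self._mean + self.shift for _ in X]

    def reset(self):
        # forget the data seen so far
        self._mean = None
        self.resetFit()

    def getParameters(self) -> dict:
        return {"shift": self.shift}

    def updateParameters(self, **kwargs):
        for name, value in kwargs.items():
            setattr(self, name, value)

    def copy(self):
        return copy.deepcopy(self)


model = MeanRegressor(shift=0.5)
model.train([[1], [2], [3]], [1.0, 2.0, 3.0])
predictions = model.performInference([[4], [5]])  # mean 2.0 + shift 0.5 for each input
```

The fitted flag lives in the base class, so every subclass gets the same "train before predict" guard just by calling fitted() at the end of train().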
- abstract getParameters() -> dict
Method that must return the model parameters.
Returns:
- model_parameters : dict
Model parameters.
- property is_fitted: bool
Indicates whether the model has been trained by calling the train() method.
Returns:
- model_fitted : bool
True if the model was fitted.
- property parameters: dict
Returns the model parameters defined in the getParameters() method.
Returns:
- model_parameters : dict
Model parameters.
- abstract performInference(X: numpy.ndarray, **kwargs) -> numpy.ndarray
Method used to perform the model predictions.
- X : np.ndarray
Input data used to perform inference.
- abstract train(X: numpy.ndarray, y: None = None, **kwargs)
Method used to fit the model to the given input data.
- X : np.ndarray
Input data used to fit the model.
- y : np.ndarray or None, default=None
Data labels (optional).
- **kwargs
Additional training parameters.
- class gojo.interfaces.model.SklearnModelWrapper(model_class, predict_proba: bool = False, supress_warnings: bool = False, **kwargs)
Wrapper that eases the integration of models following the sklearn interface into the gojo library and its functionality.
- model_class : type
Model class following the ‘sklearn.base.BaseEstimator’ interface. The class provided does not have to be a subclass of the sklearn interface, but it should provide the basic fit() and predict() (or predict_proba()) methods.
- predict_proba : bool, default=False
Parameter indicating whether to call the predict_proba() method when making predictions. If this parameter is False (default behavior), the predict() method will be called. If the parameter is set to True and the model provided does not implement predict_proba(), the predict() method will be called instead and a warning will inform that an attempt was made to call predict_proba().
- supress_warnings : bool, default=False
Parameter indicating whether to suppress the warnings issued by the class.
- **kwargs
Additional model hyperparameters. These parameters will be passed to the model_class constructor.
>>> from gojo import interfaces
>>> from sklearn.datasets import make_classification
>>> from sklearn.naive_bayes import GaussianNB
>>>
>>> # toy binary classification data (X, y and X_new are numpy arrays)
>>> X, y = make_classification(n_samples=100, random_state=0)
>>> X_new = X[:10]
>>>
>>> # create model
>>> model = interfaces.SklearnModelWrapper(
...     GaussianNB, predict_proba=True, priors=[0.25, 0.75])
>>>
>>> # train model
>>> model.train(X, y)
>>>
>>> # perform inference
>>> y_hat = model.performInference(X_new)
>>>
>>> # reset model fitting
>>> model.resetFit()
>>> model.is_fitted  # must return False
False
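The predict_proba fallback described for this wrapper can be sketched as follows. This is a simplified, hypothetical re-implementation (SklearnWrapperSketch and ConstantClassifier are illustrative names, not gojo code) that only mirrors the documented behavior. Extra keyword arguments go to the model constructor. If predict_proba=True but the model lacks predict_proba(), the wrapper falls back to predict() and warns unless supress_warnings is set.

```python
import warnings


class SklearnWrapperSketch:
    """Hypothetical wrapper mirroring the documented predict_proba fallback;
    it is not gojo's implementation of SklearnModelWrapper."""

    def __init__(self, model_class, predict_proba=False, supress_warnings=False, **kwargs):
        self.predict_proba = predict_proba
        self.supress_warnings = supress_warnings
        self.model = model_class(**kwargs)  # extra kwargs go to the constructor

    def train(self, X, y=None, **kwargs):
        self.model.fit(X, y)

    def performInference(self, X, **kwargs):
        if self.predict_proba:
            if hasattr(self.model, "predict_proba"):
                return self.model.predict_proba(X)
            if not self.supress_warnings:
                warnings.warn("predict_proba() is not implemented; calling predict() instead")
        return self.model.predict(X)


class ConstantClassifier:
    """Toy estimator exposing fit() and predict() but not predict_proba()."""

    def __init__(self, label=1):
        self.label = label

    def fit(self, X, y):
        return self

    def predict(self, X):
        return [self.label for _ in X]


wrapper = SklearnWrapperSketch(ConstantClassifier, predict_proba=True, label=0)
wrapper.train([[0.0], [1.0]], [0, 1])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    preds = wrapper.performInference([[2.0], [3.0]])  # falls back to predict()
```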
- getParameters() -> dict
Method that must return the model parameters.
Returns:
- model_parameters : dict
Model parameters.
- property model
Returns the internal model provided to the constructor, fitted if the train() method has been called.
- performInference(X: numpy.ndarray, **kwargs) -> numpy.ndarray
Method used to perform the model predictions.
- X : np.ndarray
Input data used to perform inference.
Returns:
- model_predictions : np.ndarray
Model predictions associated with the input data.
- train(X: numpy.ndarray, y: None = None, **kwargs)
Method used to fit the model to the given input data.
- X : np.ndarray
Input data used to fit the model.
- y : np.ndarray or None, default=None
Data labels (optional).
- class gojo.interfaces.model.TorchSKInterface(model: torch.nn.Module, iter_fn: callable, loss_function, n_epochs: int, optimizer_class, dataset_class, dataloader_class, lr_scheduler_class: Optional[type] = None, optimizer_kw: Optional[dict] = None, lr_scheduler_kw: Optional[dict] = None, train_dataset_kw: Optional[dict] = None, valid_dataset_kw: Optional[dict] = None, inference_dataset_kw: Optional[dict] = None, train_dataloader_kw: Optional[dict] = None, valid_dataloader_kw: Optional[dict] = None, inference_dataloader_kw: Optional[dict] = None, iter_fn_kw: Optional[dict] = None, train_split: float = 1.0, train_split_stratify: bool = False, callbacks: Optional[list] = None, metrics: Optional[list] = None, batch_size: Optional[int] = None, seed: Optional[int] = None, device: str = 'cpu', verbose: int = 1)
Wrapper class designed to integrate PyTorch models (‘torch.nn.Module’ instances) into the gojo library functionalities.
- model : torch.nn.Module
Instance of a ‘torch.nn.Module’ subclass.
- iter_fn : callable
Function that executes one epoch of the typical torch.nn.Module training pipeline. For more information, consult gojo.deepl.loops.
- loss_function : callable
Loss function used to train the model.
- n_epochs : int
Number of epochs used to train the model.
- optimizer_class : type
PyTorch optimizer used to train the model (see the torch.optim module).
- dataset_class : type
PyTorch dataset class used to train the model (see the torch.utils.data module or the gojo submodule gojo.deepl.loading).
- dataloader_class : type
PyTorch dataloader class (torch.utils.data.DataLoader).
- lr_scheduler_class : type, default=None
Class used to construct a learning rate scheduler, as defined in torch.optim.lr_scheduler.
- optimizer_kw : dict, default=None
Parameters used to initialize the provided optimizer class.
- lr_scheduler_kw : dict, default=None
Parameters used to initialize the learning rate scheduler defined by lr_scheduler_class.
- train_dataset_kw : dict, default=None
Parameters used to initialize the provided dataset class for the training data.
- train_dataloader_kw : dict, default=None
Parameters used to initialize the provided dataloader class for the training data.
- train_split : float, default=1.0
Fraction of the training data received in train() that will be used to train the model. The rest of the data will be used as the validation set.
- valid_dataset_kw : dict, default=None
Parameters used to initialize the provided dataset class for the validation data. Ignored if train_split == 1.0.
- valid_dataloader_kw : dict, default=None
Parameters used to initialize the provided dataloader class for the validation data. Ignored if train_split == 1.0.
- inference_dataset_kw : dict, default=None
Parameters used to initialize the provided dataset class for the data used for inference when calling gojo.interfaces.TorchSKInterface.performInference(). If no parameters are provided, the arguments provided for training will be used.
- inference_dataloader_kw : dict, default=None
Parameters used to initialize the provided dataloader class for the data used for inference when calling gojo.interfaces.TorchSKInterface.performInference(). If no parameters are provided, the arguments provided for training will be used, with the following dataloader parameters changed: shuffle = False, drop_last = False and batch_size = batch_size (the batch_size provided in the constructor or when calling gojo.interfaces.TorchSKInterface.performInference()).
- iter_fn_kw : dict, default=None
Optional arguments for the function passed in iter_fn.
- train_split_stratify : bool, default=False
Parameter indicating whether to perform the train/validation split with class stratification. Ignored if train_split == 1.0.
- callbacks : List[gojo.deepl.callback.Callback], default=None
Callbacks applied during model training. For more information, see gojo.deepl.callback.
- metrics : List[gojo.core.evaluation.Metric], default=None
Metrics used to evaluate the model performance during training. For more information, see gojo.core.evaluation.Metric.
- batch_size : int, default=None
Batch size used when calling gojo.interfaces.TorchSKInterface.performInference(). This parameter can also be set when calling that method.
- seed : int, default=None
Random seed used to control randomness.
- device : str, default='cpu'
Device used to train the model.
- verbose : int, default=1
Verbosity level. Use -1 to indicate maximum verbosity.
>>> import torch
>>> import pandas as pd
>>> from sklearn import datasets
>>> from sklearn.model_selection import train_test_split
>>>
>>> # Gojo libraries
>>> from gojo import interfaces
>>> from gojo import core
>>> from gojo import deepl
>>> from gojo import util
>>> from gojo import plotting
>>>
>>> DEVICE = 'cpu'  # e.g., 'cuda' or 'mps' if such a device is available
>>>
>>> # load test dataset (Wine)
>>> wine_dt = datasets.load_wine()
>>>
>>> # create the target variable. Classification problem: class 1 vs rest
>>> # to see the target names you can use wine_dt['target_names']
>>> y = (wine_dt['target'] == 1).astype(int)
>>> X = wine_dt['data']
>>>
>>> # standardize input data
>>> std_X = util.zscoresScaling(X)
>>>
>>> # split Xs and Ys in training and validation
>>> X_train, X_valid, y_train, y_valid = train_test_split(
...     std_X, y, train_size=0.8, random_state=1997, shuffle=True, stratify=y)
>>>
>>> model = interfaces.TorchSKInterface(
...     model=deepl.ffn.createSimpleFFNModel(
...         in_feats=X_train.shape[1],
...         out_feats=1,
...         layer_dims=[20],
...         layer_activation=torch.nn.ELU(),
...         output_activation=torch.nn.Sigmoid()),
...     iter_fn=deepl.iterSupervisedEpoch,
...     loss_function=torch.nn.BCELoss(),
...     n_epochs=50,
...     train_split=0.8,
...     train_split_stratify=True,
...     optimizer_class=torch.optim.Adam,
...     dataset_class=deepl.loading.TorchDataset,
...     dataloader_class=torch.utils.data.DataLoader,
...     optimizer_kw=dict(
...         lr=0.001
...     ),
...     train_dataset_kw=None,
...     valid_dataset_kw=None,
...     train_dataloader_kw=dict(
...         batch_size=16,
...         shuffle=True
...     ),
...     valid_dataloader_kw=dict(
...         batch_size=X_train.shape[0]
...     ),
...     iter_fn_kw=None,
...     callbacks=None,
...     seed=1997,
...     device=DEVICE,
...     metrics=core.getDefaultMetrics('binary_classification', bin_threshold=0.5),
...     verbose=1
... )
>>>
>>> # train the model
>>> model.train(X_train, y_train)
>>>
>>> # get the model convergence information
>>> model_history = model.fitting_history
>>>
>>> # display model convergence
>>> plotting.linePlot(
...     model_history['train'], model_history['valid'],
...     x='epoch', y='loss (mean)', err='loss (std)',
...     labels=['Train', 'Validation'],
...     title='Model convergence',
...     ls=['solid', 'dashed'],
...     legend_pos='center right')
>>>
>>> # display model performance
>>> plotting.linePlot(
...     model_history['train'], model_history['valid'],
...     x='epoch', y='f1_score',
...     labels=['Train', 'Validation'],
...     title='Model F1-score',
...     ls=['solid', 'dashed'],
...     legend_pos='center right')
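The train_split and train_split_stratify parameters describe a hold-out split of the data received by train(). The helper below is a hypothetical sketch of one way to perform such a split, assuming shuffling followed by a per-class cut when stratifying. It is not gojo's code, which may, for instance, rely on sklearn's train_test_split instead. With train_split == 1.0 no validation set is created, matching the "ignored if train_split == 1.0" notes above.

```python
import random
from collections import defaultdict


def split_train_valid(y, train_split=1.0, stratify=False, seed=None):
    """Hypothetical helper returning (train_idx, valid_idx) as sorted index lists."""
    if not 0.0 < train_split <= 1.0:
        raise ValueError("train_split must be in (0, 1]")
    if train_split == 1.0:
        # no validation set: all instances are used for training
        return list(range(len(y))), []
    rng = random.Random(seed)
    groups = defaultdict(list)
    for idx, label in enumerate(y):
        # one group per class when stratifying, a single group otherwise
        groups[label if stratify else None].append(idx)
    train_idx, valid_idx = [], []
    for indices in groups.values():
        rng.shuffle(indices)
        n_train = round(train_split * len(indices))
        train_idx.extend(indices[:n_train])
        valid_idx.extend(indices[n_train:])
    return sorted(train_idx), sorted(valid_idx)


labels = [0] * 10 + [1] * 5
train_idx, valid_idx = split_train_valid(labels, train_split=0.8, stratify=True, seed=1997)
```

With stratification, each class keeps roughly the same proportion in both subsets: here 8 of the 10 negatives and 4 of the 5 positives go to training.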
- property fitting_history: tuple
Returns a tuple with the training/validation fitting history of the model, as returned by the gojo.deepl.loops.fitNeuralNetwork() function. The first element corresponds to the training data and the second element to the validation data.
- loadStateDict(file: str)
Subroutine used to load a state dictionary containing model weights serialized with torch.save.
- file : str
File containing the saved weights.
- property model: torch.nn.Module
Returns the internal model provided to the constructor, fitted if the train() method has been called.
- property num_params: int
Returns the number of trainable parameters of the model.
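As a worked check of num_params, the trainable parameters of the feed-forward models printed in the updateParameters() example can be counted by hand. Each Linear layer with bias=True contributes fan_in * fan_out weights plus fan_out biases, and the ELU and Sigmoid activations contribute none. The helper count_ffn_params is hypothetical and assumes num_params counts every parameter with requires_grad=True. With PyTorch available, the usual idiom for that count is sum(p.numel() for p in model.parameters() if p.requires_grad).

```python
def count_ffn_params(in_feats, layer_dims, out_feats):
    """Hypothetical helper: trainable parameters of a stack of Linear(bias=True) layers.

    Activations such as ELU and Sigmoid have no trainable parameters, so only
    the weights (fan_in * fan_out) and biases (fan_out) of each layer count.
    """
    dims = [in_feats, *layer_dims, out_feats]
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in zip(dims[:-1], dims[1:]))


n_before_update = count_ffn_params(13, [20, 10], 1)  # 280 + 210 + 11
n_after_update = count_ffn_params(13, [5], 1)        # 70 + 6
n_wine_example = count_ffn_params(13, [20], 1)       # 280 + 21
```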
- performInference(X: numpy.ndarray, batch_size: Optional[int] = None, **kwargs) -> numpy.ndarray
Method used to perform the model predictions.
- X : np.ndarray
Input data used to perform inference.
- batch_size : int, default=None
Batch size used to perform inference in batches instead of processing all input data at once. By default, all input data will be used at once.
- **kwargs
Optional instance-level arguments.
Returns:
- model_predictions : np.ndarray
Model predictions associated with the input data.
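The effect of batch_size on inference can be sketched with a small helper. predict_in_batches is hypothetical and stands in for whatever gojo does internally, for example through a dataloader with shuffle=False. The input is processed in consecutive chunks and the outputs are concatenated in the original order. Passing None processes everything in a single call.

```python
from typing import Callable, List, Optional, Sequence


def predict_in_batches(predict_fn: Callable[[Sequence], List], X: Sequence,
                       batch_size: Optional[int] = None) -> List:
    """Hypothetical helper: apply predict_fn to consecutive chunks of X."""
    if batch_size is None:
        return list(predict_fn(X))  # all input data at once
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    predictions = []
    for start in range(0, len(X), batch_size):
        # slices keep the original order, so outputs line up with inputs
        predictions.extend(predict_fn(X[start:start + batch_size]))
    return predictions


calls = []


def double(batch):
    calls.append(len(batch))  # record the size of each batch
    return [2 * value for value in batch]


result = predict_in_batches(double, list(range(7)), batch_size=3)
```

Batching trades a few extra calls (three here, of sizes 3, 3 and 1) for a lower peak memory footprint, which matters on accelerator devices.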
- train(X: numpy.ndarray, y: None = None, **kwargs)
Train the model using the input data.
- X : np.ndarray
Predictor variables.
- y : np.ndarray or None, default=None
Target variable.
- **kwargs
Optional instance-level arguments.
- class gojo.interfaces.model.ParametrizedTorchSKInterface(generating_fn: callable, gf_params: dict, iter_fn: callable, loss_function, n_epochs: int, optimizer_class, dataset_class, dataloader_class, lr_scheduler_class: Optional[type] = None, optimizer_kw: Optional[dict] = None, lr_scheduler_kw: Optional[dict] = None, train_dataset_kw: Optional[dict] = None, valid_dataset_kw: Optional[dict] = None, inference_dataset_kw: Optional[dict] = None, train_dataloader_kw: Optional[dict] = None, valid_dataloader_kw: Optional[dict] = None, inference_dataloader_kw: Optional[dict] = None, iter_fn_kw: Optional[dict] = None, train_split: float = 1.0, train_split_stratify: bool = False, callbacks: Optional[list] = None, metrics: Optional[list] = None, batch_size: Optional[int] = None, seed: Optional[int] = None, device: str = 'cpu', verbose: int = 1)
Parameterized version of gojo.interfaces.TorchSKInterface. This implementation is useful for performing cross-validation with hyperparameter optimization using the gojo.core.loops.evalCrossValNestedHPO() function. This class provides an implementation of the updateParameters() method.
- generating_fn : callable
Function used to generate a model from a set of parameters. Some generating functions are already implemented, such as gojo.deepl.ffn.createSimpleFFNModel(); users can also define their own generating function.
- gf_params : dict
Parameters passed to generating_fn to generate a torch.nn.Module instance.
- iter_fn : callable
Function that executes one epoch of the typical torch.nn.Module training pipeline. For more information, consult gojo.deepl.loops.
- loss_function : callable
Loss function used to train the model.
- n_epochs : int
Number of epochs used to train the model.
- optimizer_class : type
PyTorch optimizer used to train the model (see the torch.optim module).
- dataset_class : type
PyTorch dataset class used to train the model (see the torch.utils.data module or the gojo submodule gojo.deepl.loading).
- dataloader_class : type
PyTorch dataloader class (torch.utils.data.DataLoader).
- lr_scheduler_class : type, default=None
Class used to construct a learning rate scheduler, as defined in torch.optim.lr_scheduler.
- optimizer_kw : dict, default=None
Parameters used to initialize the provided optimizer class.
- lr_scheduler_kw : dict, default=None
Parameters used to initialize the learning rate scheduler defined by lr_scheduler_class.
- train_dataset_kw : dict, default=None
Parameters used to initialize the provided dataset class for the training data.
- train_dataloader_kw : dict, default=None
Parameters used to initialize the provided dataloader class for the training data.
- train_split : float, default=1.0
Fraction of the training data received in train() that will be used to train the model. The rest of the data will be used as the validation set.
- valid_dataset_kw : dict, default=None
Parameters used to initialize the provided dataset class for the validation data. Ignored if train_split == 1.0.
- valid_dataloader_kw : dict, default=None
Parameters used to initialize the provided dataloader class for the validation data. Ignored if train_split == 1.0.
- inference_dataset_kw : dict, default=None
Parameters used to initialize the provided dataset class for the data used for inference when calling gojo.interfaces.TorchSKInterface.performInference(). If no parameters are provided, the arguments provided for training will be used.
- inference_dataloader_kw : dict, default=None
Parameters used to initialize the provided dataloader class for the data used for inference when calling gojo.interfaces.TorchSKInterface.performInference(). If no parameters are provided, the arguments provided for training will be used, with the following dataloader parameters changed: shuffle = False, drop_last = False and batch_size = batch_size (the batch_size provided in the constructor or when calling gojo.interfaces.TorchSKInterface.performInference()).
- iter_fn_kw : dict, default=None
Optional arguments for the function passed in iter_fn.
- train_split_stratify : bool, default=False
Parameter indicating whether to perform the train/validation split with class stratification. Ignored if train_split == 1.0.
- callbacks : List[gojo.deepl.callback.Callback], default=None
Callbacks applied during model training. For more information, see gojo.deepl.callback.
- metrics : List[gojo.core.evaluation.Metric], default=None
Metrics used to evaluate the model performance during training. For more information, see gojo.core.evaluation.Metric.
- batch_size : int, default=None
Batch size used when calling gojo.interfaces.ParametrizedTorchSKInterface.performInference(). This parameter can also be set when calling that method.
- seed : int, default=None
Random seed used to control randomness.
- device : str, default='cpu'
Device used to train the model.
- verbose : int, default=1
Verbosity level. Use -1 to indicate maximum verbosity.
>>> import torch
>>> import pandas as pd
>>> from sklearn import datasets
>>> from sklearn.model_selection import train_test_split
>>>
>>> # GOJO libraries
>>> from gojo import interfaces
>>> from gojo import core
>>> from gojo import deepl
>>> from gojo import util
>>> from gojo import plotting
>>>
>>> DEVICE = 'cpu'  # e.g., 'cuda' or 'mps' if such a device is available
>>>
>>> # load test dataset (Wine)
>>> wine_dt = datasets.load_wine()
>>>
>>> # create the target variable. Classification problem: class 1 vs rest
>>> # to see the target names you can use wine_dt['target_names']
>>> y = (wine_dt['target'] == 1).astype(int)
>>> X = wine_dt['data']
>>>
>>> # standardize input data
>>> std_X = util.zscoresScaling(X)
>>>
>>> # split Xs and Ys in training and validation
>>> X_train, X_valid, y_train, y_valid = train_test_split(
...     std_X, y, train_size=0.8, random_state=1997, shuffle=True,
...     stratify=y
... )
>>>
>>> model = interfaces.ParametrizedTorchSKInterface(
...     generating_fn=deepl.ffn.createSimpleFFNModel,
...     gf_params=dict(
...         in_feats=X_train.shape[1],
...         out_feats=1,
...         layer_dims=[20],
...         layer_activation='ELU',
...         output_activation='Sigmoid'),
...     iter_fn=deepl.iterSupervisedEpoch,
...     loss_function=torch.nn.BCELoss(),
...     n_epochs=50,
...     train_split=0.8,
...     train_split_stratify=True,
...     optimizer_class=torch.optim.Adam,
...     dataset_class=deepl.loading.TorchDataset,
...     dataloader_class=torch.utils.data.DataLoader,
...     optimizer_kw=dict(
...         lr=0.001
...     ),
...     train_dataset_kw=None,
...     valid_dataset_kw=None,
...     train_dataloader_kw=dict(
...         batch_size=16,
...         shuffle=True
...     ),
...     valid_dataloader_kw=dict(
...         batch_size=X_train.shape[0]
...     ),
...     iter_fn_kw=None,
...     callbacks=None,
...     seed=1997,
...     device=DEVICE,
...     metrics=core.getDefaultMetrics('binary_classification', bin_threshold=0.5, select=['f1_score']),
...     verbose=1
... )
>>>
>>> # train the model
>>> model.train(X_train, y_train)
>>>
>>> # display model convergence
>>> model_history = model.fitting_history
>>> plotting.linePlot(
...     model_history['train'], model_history['valid'],
...     x='epoch', y='loss (mean)', err='loss (std)',
...     labels=['Train', 'Validation'],
...     title='Model convergence',
...     ls=['solid', 'dashed'],
...     legend_pos='center right')
>>>
>>> # display model performance
>>> plotting.linePlot(
...     model_history['train'], model_history['valid'],
...     x='epoch', y='f1_score',
...     labels=['Train', 'Validation'],
...     title='Model F1-score',
...     ls=['solid', 'dashed'],
...     legend_pos='center right')
>>>
>>> # update model parameters
>>> model.update(
...     n_epochs=100,
...     train_dataloader_kw__batch_size=32,
...     gf_params__layer_dims=[5, 5, 5],
...     metrics=core.getDefaultMetrics('binary_classification', bin_threshold=0.5, select=['f1_score', 'auc'])
... )
>>>
>>> # updating the parameters resets the model, so it has to be retrained
>>> model.train(X_train, y_train)
>>> y_hat = model.performInference(X_valid)
>>> pd.DataFrame([core.getScores(
...     y_true=y_valid, y_pred=y_hat,
...     metrics=core.getDefaultMetrics('binary_classification', bin_threshold=0.5))]
... ).T.round(decimals=3)
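The fallback rule described for inference_dataloader_kw can be sketched as a small dictionary transformation. The helper inference_dataloader_kwargs is hypothetical: it copies the training dataloader settings and overrides shuffle, drop_last and batch_size, as the parameter description states. How gojo resolves batch_size=None internally is not specified here, so the sketch simply forwards it.

```python
from typing import Optional


def inference_dataloader_kwargs(train_dataloader_kw: Optional[dict],
                                inference_dataloader_kw: Optional[dict] = None,
                                batch_size: Optional[int] = None) -> dict:
    """Hypothetical helper mirroring the documented fallback for inference_dataloader_kw."""
    if inference_dataloader_kw is not None:
        return dict(inference_dataloader_kw)  # explicit settings take precedence
    kw = dict(train_dataloader_kw or {})  # copy, so the training settings stay untouched
    # deterministic and complete pass over the data at inference time
    kw.update(shuffle=False, drop_last=False, batch_size=batch_size)
    return kw


train_kw = {"batch_size": 16, "shuffle": True}
inference_kw = inference_dataloader_kwargs(train_kw, batch_size=64)
```

Disabling shuffle keeps predictions aligned with the input rows, and disabling drop_last guarantees that every instance receives a prediction.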
- updateParameters(**kwargs)
Method that allows updating the model parameters. To update an entry of a dictionary-valued parameter, join the name of the parameter and the name of the dictionary key with a double underscore ("__"); e.g., gf_params__layer_dims updates the layer_dims key of gf_params.
NOTE: Model parameters should be updated by calling the update() method from the model superclass.
>>> import torch
>>> from gojo import core
>>> from gojo import interfaces
>>> from gojo import deepl
>>>
>>> # create the model to be evaluated
>>> model = interfaces.ParametrizedTorchSKInterface(
...     # example of generating function
...     generating_fn=deepl.ffn.createSimpleFFNModel,
...     gf_params=dict(
...         in_feats=13,
...         out_feats=1,
...         layer_dims=[20, 10],
...         layer_activation='ELU',
...         output_activation='Sigmoid'),
...     # example of iteration function
...     iter_fn=deepl.iterSupervisedEpoch,
...     loss_function=torch.nn.BCELoss(),
...     n_epochs=50,
...     train_split=0.8,
...     train_split_stratify=True,
...     optimizer_class=torch.optim.Adam,
...     dataset_class=deepl.loading.TorchDataset,
...     dataloader_class=torch.utils.data.DataLoader,
...     optimizer_kw=dict(
...         lr=0.001
...     ),
...     train_dataloader_kw=dict(
...         batch_size=16,
...         shuffle=True
...     ),
...     valid_dataloader_kw=dict(
...         batch_size=200
...     ),
...     # use default classification metrics
...     metrics=core.getDefaultMetrics(
...         'binary_classification', bin_threshold=0.5, select=['f1_score']),
... )
>>> model
ParametrizedTorchSKInterface(
    model=Sequential(
      (LinearLayer 0): Linear(in_features=13, out_features=20, bias=True)
      (Activation 0): ELU(alpha=1.0)
      (LinearLayer 1): Linear(in_features=20, out_features=10, bias=True)
      (Activation 1): ELU(alpha=1.0)
      (LinearLayer 2): Linear(in_features=10, out_features=1, bias=True)
      (Activation 2): Sigmoid()
    ),
    iter_fn=<function iterSupervisedEpoch at 0x7fd7ca47b940>,
    loss_function=BCELoss(),
    n_epochs=50,
    train_split=0.8,
    train_split_stratify=True,
    optimizer_class=<class 'torch.optim.adam.Adam'>,
    dataset_class=<class 'gojo.deepl.loading.TorchDataset'>,
    dataloader_class=<class 'torch.utils.data.dataloader.DataLoader'>,
    optimizer_kw={'lr': 0.001},
    train_dataset_kw={},
    valid_dataset_kw={},
    train_dataloader_kw={'batch_size': 16, 'shuffle': True},
    valid_dataloader_kw={'batch_size': 200},
    iter_fn_kw={},
    callbacks=None,
    metrics=[Metric(
        name=f1_score,
        function_kw={},
        multiclass=False
    )],
    seed=None,
    device=cpu,
    verbose=1,
    generating_fn=<function createSimpleFFNModel at 0x7fd7ca4805e0>,
    gf_params={'in_feats': 13, 'out_feats': 1, 'layer_dims': [20, 10], 'layer_activation': 'ELU', 'output_activation': 'Sigmoid'}
)
>>>
>>> # update parameters by using the update() method provided by the Model interface
>>> model.update(
...     gf_params__layer_dims=[5],  # update dictionary-level parameter
...     n_epochs=100                # update model-level parameter
... )
ParametrizedTorchSKInterface(
    model=Sequential(
      (LinearLayer 0): Linear(in_features=13, out_features=5, bias=True)
      (Activation 0): ELU(alpha=1.0)
      (LinearLayer 1): Linear(in_features=5, out_features=1, bias=True)
      (Activation 1): Sigmoid()
    ),
    iter_fn=<function iterSupervisedEpoch at 0x7fd7ca47b940>,
    loss_function=BCELoss(),
    n_epochs=100,
    train_split=0.8,
    train_split_stratify=True,
    optimizer_class=<class 'torch.optim.adam.Adam'>,
    dataset_class=<class 'gojo.deepl.loading.TorchDataset'>,
    dataloader_class=<class 'torch.utils.data.dataloader.DataLoader'>,
    optimizer_kw={'lr': 0.001},
    train_dataset_kw={},
    valid_dataset_kw={},
    train_dataloader_kw={'batch_size': 16, 'shuffle': True},
    valid_dataloader_kw={'batch_size': 200},
    iter_fn_kw={},
    callbacks=None,
    metrics=[Metric(
        name=f1_score,
        function_kw={},
        multiclass=False
    )],
    seed=None,
    device=cpu,
    verbose=1,
    generating_fn=<function createSimpleFFNModel at 0x7fd7ca4805e0>,
    gf_params={'in_feats': 13, 'out_feats': 1, 'layer_dims': [5], 'layer_activation': 'ELU', 'output_activation': 'Sigmoid'}
)
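The double-underscore convention of updateParameters() can be sketched as a pure dictionary operation. The helper apply_updates is hypothetical and works on a plain dict of parameters rather than on a gojo model. A key like gf_params__layer_dims is split at the first "__" into the parameter name and the dictionary key, and plain keys replace the whole parameter. In gojo the model is additionally regenerated from generating_fn and gf_params, as the repr above shows.

```python
import copy


def apply_updates(params: dict, **kwargs) -> dict:
    """Hypothetical helper: return a copy of params with kwargs applied.

    'gf_params__layer_dims' updates the 'layer_dims' entry of the dictionary
    stored under 'gf_params'; keys without '__' replace the whole parameter.
    """
    updated = copy.deepcopy(params)  # leave the caller's dictionary untouched
    for key, value in kwargs.items():
        if "__" in key:
            dict_name, dict_key = key.split("__", 1)
            if not isinstance(updated.get(dict_name), dict):
                raise KeyError(f"{dict_name!r} is not a dictionary-valued parameter")
            updated[dict_name][dict_key] = value
        else:
            updated[key] = value
    return updated


params = {
    "n_epochs": 50,
    "gf_params": {"in_feats": 13, "out_feats": 1, "layer_dims": [20, 10]},
}
new_params = apply_updates(params, gf_params__layer_dims=[5], n_epochs=100)
```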
gojo.interfaces.data module
- class gojo.interfaces.data.Dataset(data: pandas.Series)
Bases: object
Class representing a dataset. This class is used internally by the functions defined in gojo.core.loops.
- data : np.ndarray or pd.DataFrame or pd.Series
Data to be homogenized as a dataset.
- property array_data: numpy.ndarray
Returns the input data as a numpy.array.
- property index_values: numpy.array
Returns the input data index values.
- property var_names: list
Returns the names of the variables.
gojo.interfaces.transform module
- class gojo.interfaces.transform.GraphStandardScaler
Bases: gojo.interfaces.transform.Transform
Class that performs a standardization of three-dimensional input data with dimensions (n_instances, n_nodes, n_features). The returned data will have a mean of 0 and a standard deviation of 1 along dimensions 1 and 2.
- fit(X: numpy.ndarray, y: None = None, **kwargs)
Method used to fit the transform to the given input data.
- X : np.ndarray
Input data used to fit the transform.
- y : np.ndarray or None, default=None
Data labels (optional).
- getParameters() -> dict
Method that must return the transform parameters.
Returns:
- model_parameters : dict
Transform parameters.
- transform(X: numpy.ndarray, **kwargs) -> numpy.ndarray
Method used to apply the transformations.
- X : np.ndarray
Input data to be transformed.
Returns:
- X_trans : np.ndarray
Transformed data.
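The description of GraphStandardScaler leaves the exact axes of the statistics open. The sketch below assumes one reading: fit() learns one mean and one standard deviation per (node, feature) position across instances, so that each position of the transformed training data has mean 0 and standard deviation 1. GraphStandardScalerSketch is a hypothetical re-implementation, and gojo's class may compute its statistics differently.

```python
import numpy as np


class GraphStandardScalerSketch:
    """Hypothetical scaler for arrays shaped (n_instances, n_nodes, n_features)."""

    def fit(self, X: np.ndarray, y=None):
        if X.ndim != 3:
            raise ValueError("expected an array shaped (n_instances, n_nodes, n_features)")
        # statistics of shape (n_nodes, n_features), computed across instances
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0)
        self.std_[self.std_ == 0] = 1.0  # avoid division by zero for constant positions
        return self

    def transform(self, X: np.ndarray) -> np.ndarray:
        # broadcasting applies the per-(node, feature) statistics to every instance
        return (X - self.mean_) / self.std_


rng = np.random.default_rng(0)
X_graph = rng.normal(loc=5.0, scale=2.0, size=(100, 4, 3))
X_scaled = GraphStandardScalerSketch().fit(X_graph).transform(X_graph)
```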
- class gojo.interfaces.transform.SKLearnTransformWrapper(transform_class, **kwargs)
Bases: gojo.interfaces.transform.Transform
Wrapper used to easily incorporate the transformations implemented in the sklearn library.
- transform_class : type
sklearn transform class. Instances of this class must define the fit() and transform() methods following the sklearn implementation.
- **kwargs
Optional arguments used to initialize instances of the provided class.
>>> from sklearn.datasets import make_classification
>>> from sklearn.svm import SVC
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.decomposition import PCA
>>>
>>> # GOJO libraries
>>> import gojo
>>> from gojo import core
>>> from gojo import interfaces
>>>
>>> # toy data (X and y are numpy arrays)
>>> X, y = make_classification(n_samples=200, n_features=10, random_state=0)
>>>
>>> # transforms applied before the model
>>> transforms = [
...     interfaces.SKLearnTransformWrapper(StandardScaler),
...     interfaces.SKLearnTransformWrapper(PCA, n_components=5)
... ]
>>>
>>> # default model
>>> model = interfaces.SklearnModelWrapper(
...     SVC, kernel='poly', degree=1, coef0=0.0,
...     cache_size=1000, class_weight=None
... )
>>>
>>> cv_report = core.evalCrossVal(
...     X=X, y=y,
...     model=model,
...     cv=gojo.util.getCrossValObj(cv=5),
...     transforms=transforms)
- fit(X: numpy.ndarray, y: None = None, **kwargs)
Method used to fit the transform to the given input data.
- X : np.ndarray
Input data used to fit the transform.
- y : np.ndarray or None, default=None
Data labels (optional).
- getParameters() -> dict
Method that must return the transform parameters.
Returns:
- model_parameters : dict
Transform parameters.
- transform(X: numpy.ndarray, **kwargs) -> numpy.ndarray
Method used to apply the transformations.
- X : np.ndarray
Input data to be transformed.
Returns:
- X_trans : np.ndarray
Transformed data.
- property transform_obj: object
Returns the internal transform object. By default, a deep copy of the transform is returned; to return the internal transformation directly, select copy=True.
- class gojo.interfaces.transform.Transform
Bases: object
Base interface for applying transformations to the input data in the gojo.core.loops subroutines. Internally, the training data are passed to the fit() method to adjust the transformation to the statistics of the training dataset, and the transformation is subsequently applied to the training and test data by means of the transform() method.
Subclasses must define the following methods:
- fit()
Method used to fit the transform to the given input data.
- transform()
Method used to apply the transformations to the input data.
- reset()
Method used to reset the fitted transform.
- copy()
Method used to make a copy of the transform. Defining this method is optional; by default, a deep copy will be performed.
- getParameters()
Method that must return the transform parameters. Defining this method is optional; by default, it returns an empty dictionary.
- updateParameters()
Method used to update the transform parameters. Defining this method is optional; by default, it has no effect.
This abstract class provides the following properties:
- is_fitted
Indicates whether the transformation has been fitted by calling the fit() method.
And the following methods:
- update()
Method used to update the transform parameters.
- fitted()
Method called (usually internally) to indicate that a given transformation has been fitted.
- resetFit()
Method used to reset a fitted transformation (usually called internally).
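The fit-on-train, transform-train-and-test workflow described for Transform can be illustrated with a minimal sketch. TransformSketch is a hypothetical stand-in for gojo.interfaces.transform.Transform, written so the example runs without gojo. It mirrors the documented defaults: copy() deep-copies, getParameters() returns an empty dictionary and updateParameters() does nothing. Having update() return the transform is a choice of this sketch only. CenterTransform is a toy transform that subtracts the column means learned from the training data.

```python
import copy


class TransformSketch:
    """Hypothetical stand-in mirroring the Transform contract described above."""

    def __init__(self):
        self._is_fitted = False

    # bookkeeping provided by the base class
    def fitted(self):
        self._is_fitted = True

    def resetFit(self):
        self._is_fitted = False

    @property
    def is_fitted(self) -> bool:
        return self._is_fitted

    def update(self, **kwargs):
        self.updateParameters(**kwargs)
        return self

    # optional methods with the documented defaults
    def copy(self):
        return copy.deepcopy(self)

    def getParameters(self) -> dict:
        return {}

    def updateParameters(self, **kwargs):
        pass


class CenterTransform(TransformSketch):
    """Toy transform: subtracts the per-column means learned in fit()."""

    def fit(self, X, y=None, **kwargs):
        n = len(X)
        self.means_ = [sum(col) / n for col in zip(*X)]
        self.fitted()

    def transform(self, X, **kwargs):
        if not self.is_fitted:
            raise RuntimeError("call fit() before transform()")
        return [[v - m for v, m in zip(row, self.means_)] for row in X]

    def reset(self):
        self.means_ = None
        self.resetFit()


X_train = [[1.0, 10.0], [3.0, 30.0]]
X_test = [[2.0, 0.0]]
center = CenterTransform()
center.fit(X_train)  # statistics come from the training data only
X_train_t = center.transform(X_train)
X_test_t = center.transform(X_test)  # the test data reuse the training statistics
```

Fitting only on the training split and reusing those statistics for the test split avoids leaking test information into the preprocessing.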
- abstract fit(X: numpy.ndarray, y: None, **kwargs)
Method used to fit the transform to the given input data.
- X : np.ndarray
Input data used to fit the transformation.
- y : np.ndarray or None, optional
Data labels (optional).
- getParameters() -> dict
Method that must return the transform parameters.
Returns:
- model_parameters : dict
Transform parameters.
- property is_fitted: bool
Indicates whether the transformation has been fitted by calling the fit() method.
Returns:
- model_fitted : bool
True if the transformation was fitted.
- abstract transform(X: numpy.ndarray, **kwargs) -> numpy.ndarray
Method used to apply the transformations to the input data.
- X : np.ndarray
Input data used to perform the transformations.