gojo.plotting package

Submodules

gojo.plotting.basic module

gojo.plotting.basic.barPlot(*dfs, x: str, y: str, labels: Optional[list] = None, colors: Optional[list] = None, ax: Optional[matplotlib.axes.Axes] = None, figsize: tuple = (6, 3.5), style: str = 'ggplot', dpi: int = 100, err_capsize: float = 0.15, err_lw: float = 1.5, grid_alpha: float = 0.15, xlabel_size: int = 13, ylabel_size: int = 13, title: str = '', title_size: int = 15, title_pad: int = 15, hide_legend: bool = False, legend_pos: str = 'upper right', legend_bbox_to_anchor: Optional[tuple] = None, legend_size: int = 12, yvmin: Optional[float] = None, yvmax: Optional[float] = None, xvmin: Optional[float] = None, xvmax: Optional[float] = None, hide_xlabel: bool = False, hide_ylabel: bool = False, xaxis_tick_size: int = 12, yaxis_tick_size: int = 12, xaxis_rotation: float = 0.0, yaxis_rotation: float = 0.0, save: Optional[str] = None, save_kw: Optional[dict] = None, show: bool = True)[source]

Bar plot function.

*dfs

Input dataframes with the data to be represented.

x : str

X-axis variable. Must be present in the input dataframes.

y : str

Y-axis variable. Must be present in the input dataframes.

labels : list, default=None

Labels used for identifying the input dataframes.

colors : list or str, default=None

Colors used for identifying the dataframe information. A string colormap can be provided.

ax : matplotlib.axes.Axes, default=None

Axes used to represent the figure.

figsize : tuple, default=(6, 3.5)

Figure size.

style : str, default='ggplot'

Plot styling (see matplotlib.style.available).

dpi : int, default=100

Figure DPI.

err_capsize : float, default=0.15

Cap size of the error bars.

err_lw : float, default=1.5

Line width of the error bars.

grid_alpha : float, default=0.15

Grid line opacity.

xlabel_size : int, default=13

Size of the x-label.

ylabel_size : int, default=13

Size of the y-label.

title : str, default=''

Plot title.

title_size : int, default=15

Title font size.

title_pad : int, default=15

Title pad.

hide_legend : bool, default=False

Parameter indicating whether to hide the legend.

legend_pos : str, default='upper right'

Legend position.

legend_bbox_to_anchor : tuple, default=None

Used for modifying the legend position relative to the position defined in legend_pos.

legend_size : int, default=12

Legend size.

yvmin : float, default=None

Minimum value in the y-axis.

yvmax : float, default=None

Maximum value in the y-axis.

xvmin : float, default=None

Minimum value in the x-axis.

xvmax : float, default=None

Maximum value in the x-axis.

hide_xlabel : bool, default=False

Parameter indicating whether to hide the x-axis label.

hide_ylabel : bool, default=False

Parameter indicating whether to hide the y-axis label.

xaxis_tick_size : int, default=12

Controls the x-axis tick size.

yaxis_tick_size : int, default=12

Controls the y-axis tick size.

xaxis_rotation : float or int, default=0.0

X-axis tick rotation.

yaxis_rotation : float or int, default=0.0

Y-axis tick rotation.

save : str, default=None

Parameter indicating whether to save the generated plot. If None (default), the plot will not be saved.

save_kw : dict, default=None

Optional parameters for saving the plot. This parameter has no effect if save is None.

show : bool, default=True

Parameter indicating whether to show the generated plot.

>>> from gojo import core
>>> from gojo import plotting
>>>
>>> # e.g., compute model performance metrics (report1 and report2 are
>>> # assumed to be the reports obtained for two different models)
>>> scores_1 = report1.getScores(
>>>     core.getDefaultMetrics(
>>>         'binary_classification', bin_threshold=0.5))['test']
>>>
>>> scores_2 = report2.getScores(
>>>     core.getDefaultMetrics(
>>>         'binary_classification', bin_threshold=0.5))['test']
>>>
>>> # adapt for barplot representation
>>> scores_1 = scores_1.melt()
>>> scores_2 = scores_2.melt()
>>>
>>>
>>> plotting.barPlot(
>>>     scores_1, scores_2,
>>>     x='variable', y='value',
>>>     labels=['Model 1', 'Model 2'],
>>>     title='Cross-validation results'
>>> )
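The melt() step in the example above reshapes a wide table of scores (one column per metric) into the long format that barPlot reads through x='variable' and y='value'. The following is a minimal sketch of that reshaping using only pandas; the metric names and values are made up for illustration and are not output produced by gojo.

```python
import pandas as pd

# Hypothetical per-fold scores in wide format: one column per metric.
scores = pd.DataFrame({
    "accuracy": [0.90, 0.85, 0.88],
    "f1_score": [0.80, 0.78, 0.83],
})

# melt() stacks the columns into two: 'variable' (the former column
# name) and 'value' (the score), giving one row per fold and metric.
long_scores = scores.melt()

print(long_scores.columns.tolist())
print(len(long_scores))
```

Each metric now appears once per fold in the 'variable' column, which is the shape the x and y arguments of the example refer to.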
gojo.plotting.basic.linePlot(*dfs, x: str, y: str, err: Optional[str] = None, err_alpha: float = 0.3, labels: Optional[list] = None, ax: Optional[matplotlib.axes.Axes] = None, figsize: tuple = (6, 3.5), style: str = 'ggplot', dpi: int = 100, colors: Optional[list] = None, title: str = '', title_size: int = 15, title_pad: int = 15, hide_legend: bool = False, legend_pos: str = 'upper right', legend_size: int = 12, xlabel_size: float = 13, ylabel_size: float = 13, grid_alpha: float = 0.5, yvmin: Optional[float] = None, yvmax: Optional[float] = None, xvmin: Optional[float] = None, xvmax: Optional[float] = None, lw: Optional[float] = None, ls: Optional[str] = None, save: Optional[str] = None, save_kw: Optional[dict] = None, show: bool = True)[source]

Line plot function.

*dfs : pd.DataFrame

Input dataframes with the data to be represented.

x : str

X-axis variable. Must be present in the input dataframes.

y : str

Y-axis variable. Must be present in the input dataframes.

err : str, default=None

Variable indicating the errors associated with the lines. Must be present in the input dataframes.

err_alpha : float, default=0.3

Opacity used to plot the errors.

labels : list, default=None

Labels used for identifying the input dataframes.

ax : matplotlib.axes.Axes, default=None

Axes used to represent the figure.

figsize : tuple, default=(6, 3.5)

Figure size.

style : str, default='ggplot'

Plot styling (see matplotlib.style.available).

dpi : int, default=100

Figure DPI.

colors : list or str, default=None

Colors used for identifying the dataframe information. A string colormap can be provided.

title : str, default=''

Plot title.

title_size : int or float, default=15

Title font size.

title_pad : int, default=15

Title pad.

hide_legend : bool, default=False

Parameter indicating whether to hide the legend.

legend_pos : str, default='upper right'

Legend position.

legend_size : int, default=12

Legend size.

yvmin : float, default=None

Minimum value in the y-axis.

yvmax : float, default=None

Maximum value in the y-axis.

xvmin : float, default=None

Minimum value in the x-axis.

xvmax : float, default=None

Maximum value in the x-axis.

xlabel_size : float or int, default=13

X-axis label size.

ylabel_size : float or int, default=13

Y-axis label size.

grid_alpha : float, default=0.5

Grid opacity.

lw : float or int or list, default=None

Line width(s).

ls : str or list, default=None

Line style(s).

save : str, default=None

Parameter indicating whether to save the generated plot. If None (default), the plot will not be saved.

save_kw : dict, default=None

Optional parameters for saving the plot. This parameter has no effect if save is None.

show : bool, default=True

Parameter indicating whether to show the generated plot.

>>> from gojo import plotting
>>>
>>> # train_info, valid_info are pandas dataframes returned by gojo.deepl.fitNeuralNetwork
>>> plotting.linePlot(
>>>     train_info, valid_info,
>>>     x='epoch', y='loss (mean)', err='loss (std)',
>>>     labels=['Train', 'Validation'],
>>>     title='Model convergence',
>>>     ls=['solid', 'dashed'],
>>>     style='default', legend_pos='center right')
>>>
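The example above relies on dataframes that already contain a mean column and an error column per epoch. When the data comes from another source, such columns can be built with a pandas aggregation. The sketch below uses a hypothetical raw log of per-batch losses; the column names 'loss (mean)' and 'loss (std)' only mirror the example. That the error is drawn as a band of mean plus and minus err is an assumption about linePlot, not something stated in this reference.

```python
import pandas as pd

# Hypothetical raw log: one loss value per batch and epoch.
raw = pd.DataFrame({
    "epoch": [0, 0, 0, 1, 1, 1],
    "loss": [1.0, 1.2, 0.8, 0.5, 0.7, 0.6],
})

# Aggregate per epoch and name the columns as in the example above.
summary = (
    raw.groupby("epoch")["loss"]
    .agg(["mean", "std"])
    .rename(columns={"mean": "loss (mean)", "std": "loss (std)"})
    .reset_index()
)

# Bounds of a mean +/- std band (assumed rendering of the errors).
lower = summary["loss (mean)"] - summary["loss (std)"]
upper = summary["loss (mean)"] + summary["loss (std)"]

print(summary)
```

The resulting summary dataframe has the columns 'epoch', 'loss (mean)' and 'loss (std)', matching the x, y and err arguments used in the example.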
gojo.plotting.basic.scatterPlot(df: pandas.DataFrame, x: str, y: str, hue: Optional[str] = None, hue_mapping: Optional[dict] = None, ax: Optional[matplotlib.axes.Axes] = None, figsize: tuple = (6, 4.5), style: str = 'ggplot', dpi: int = 100, maker_size: Optional[float] = None, colors: Optional[list] = None, title: str = '', title_size: int = 15, title_pad: int = 15, hide_legend: bool = False, legend_pos: Optional[str] = None, legend_size: int = 12, xlabel_size: float = 13, ylabel_size: float = 13, grid_alpha: float = 0.5, yvmin: Optional[float] = None, yvmax: Optional[float] = None, xvmin: Optional[float] = None, xvmax: Optional[float] = None, save: Optional[str] = None, save_kw: Optional[dict] = None, show: bool = True)[source]

Scatter plot function.

df : pd.DataFrame

Input dataframe with the data to be represented.

x : str

X-axis variable. Must be present in the input dataframe.

y : str

Y-axis variable. Must be present in the input dataframe.

hue : str, default=None

Hue variable for plotting groups.

hue_mapping : dict, default=None

Dictionary mapping the group names of the hue variable in df to user-defined names.

ax : matplotlib.axes.Axes, default=None

Axes used to represent the figure.

figsize : tuple, default=(6, 4.5)

Figure size.

style : str, default='ggplot'

Plot styling (see matplotlib.style.available).

dpi : int, default=100

Figure DPI.

maker_size : float or int, default=None

Marker size.

colors : list or str, default=None

Colors used for identifying the dataframe information. A string colormap can be provided.

title : str, default=''

Plot title.

title_size : int or float, default=15

Title font size.

title_pad : int, default=15

Title pad.

hide_legend : bool, default=False

Parameter indicating whether to hide the legend.

legend_pos : str, default=None

Legend position.

legend_size : int or float, default=12

Legend size.

xlabel_size : float or int, default=13

X-label size.

ylabel_size : float or int, default=13

Y-label size.

grid_alpha : float, default=0.5

Opacity of the grid lines.

yvmin : float, default=None

Minimum value in the y-axis.

yvmax : float, default=None

Maximum value in the y-axis.

xvmin : float, default=None

Minimum value in the x-axis.

xvmax : float, default=None

Maximum value in the x-axis.

save : str, default=None

Parameter indicating whether to save the generated plot. If None (default), the plot will not be saved.

save_kw : dict, default=None

Optional parameters for saving the plot. This parameter has no effect if save is None.

show : bool, default=True

Parameter indicating whether to show the generated plot.

>>> import pandas as pd
>>> from sklearn import datasets
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.decomposition import PCA
>>> from gojo import plotting
>>>
>>> # load test dataset (Wine)
>>> wine_dt = datasets.load_wine()
>>> data = StandardScaler().fit_transform(wine_dt['data'])
>>> PCs = PCA(n_components=2).fit_transform(data)
>>> PCs = pd.DataFrame(PCs, columns=['PC1', 'PC2'])
>>> PCs['target'] = wine_dt['target']
>>>
>>> plotting.scatterPlot(
>>>     df=PCs,
>>>     x='PC1',
>>>     y='PC2',
>>>     hue='target',
>>>     hue_mapping={0: 'C0', 1: 'C1', 2: 'C2'})
>>>
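In the example above, hue_mapping translates the raw values of the hue column (0, 1, 2) into display names. A small pre-check, sketched below in plain Python, can catch hue values that have no entry in the mapping before plotting. How scatterPlot itself handles unmapped values is not documented here, so the check is only a defensive assumption.

```python
# Hypothetical hue values, as they could appear in the 'target' column.
hue_values = [0, 1, 2, 1, 0]

# Same mapping as in the example above: raw value -> display name.
hue_mapping = {0: "C0", 1: "C1", 2: "C2"}

# Values present in the data but absent from the mapping.
missing = sorted(set(hue_values) - set(hue_mapping))
if missing:
    raise KeyError(f"hue values without a display name: {missing}")

# Display names in the order of the original rows.
display_names = [hue_mapping[v] for v in hue_values]
print(display_names)
```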

gojo.plotting.classification module

gojo.plotting.classification.confusionMatrix(df: pandas.DataFrame, y_pred: str, y_true: str, average: Optional[str] = None, y_pred_threshold: Optional[float] = None, normalize: bool = True, labels: Optional[list] = None, ax: Optional[matplotlib.axes.Axes] = None, figsize: tuple = (5, 4), dpi: int = 100, cmap: str = 'Blues', alpha: float = 0.7, cm_font_size: int = 14, xaxis_label: Optional[str] = None, yaxis_label: Optional[str] = None, axis_label_size: int = 15, axis_label_pad: int = 15, axis_tick_size: int = 12, title: str = '', title_size: int = 15, title_pad: int = 15, save: Optional[str] = None, save_kw: Optional[dict] = None, show: bool = True)[source]

Function used to represent a confusion matrix from a pandas DataFrame with predictions and true values (e.g., returned by methods gojo.core.report.CVReport.getTestPredictions() and gojo.core.report.CVReport.getTrainPredictions()).

df : pd.DataFrame

Pandas DataFrame with the model predictions.

>>> df
    Out[0]
                            pred_labels  true_labels
    n_fold indices
    0      2                0.0          0.0
           6                0.0          0.0
           11               0.0          0.0
           12               0.0          0.0
           13               0.0          0.0
    ...                     ...          ...
    4      987              0.0          0.0
           992              0.0          0.0
           1011             0.0          0.0
           1016             0.0          0.0
           1018             0.0          0.0
y_pred : str

Variable indicating which values are predicted by the model.

y_true : str

Variable indicating which values are the ground truth.

average : str, default=None

Variable that stratifies the predictions (e.g., at the fold level) to represent the mean and standard deviation values of the confusion matrix.

y_pred_threshold : float or None, default=None

Threshold to be used to binarize model predictions.

normalize : bool, default=True

Parameter indicating whether to express the normalized confusion matrix (as a percentage).

labels : list, default=None

Labels used to identify the classes. By default, they will be C0, C1, …, CX.

ax : matplotlib.axes.Axes, default=None

Axes used to represent the figure.

figsize : tuple, default=(5, 4)

Figure size.

dpi : int, default=100

Figure DPI.

cmap : str, default='Blues'

Colormap.

alpha : float, default=0.7

Plot opacity.

cm_font_size : int, default=14

Confusion matrix font size.

xaxis_label : str, default=None

X-axis label.

yaxis_label : str, default=None

Y-axis label.

axis_label_size : int, default=15

XY-axis label size.

axis_label_pad : int, default=15

XY-axis pad.

axis_tick_size : int, default=12

XY-ticks size.

title : str, default=''

Title.

title_size : int, default=15

Title size.

title_pad : int, default=15

Title pad.

save : str, default=None

Parameter indicating whether to save the generated plot. If None (default), the plot will not be saved.

save_kw : dict, default=None

Optional parameters for saving the plot. This parameter has no effect if save is None.

show : bool, default=True

Parameter indicating whether to show the generated plot.

>>> from gojo import core
>>> from gojo import util
>>> from gojo import plotting
>>>
>>> # ... data loading and model definition
>>>
>>> # perform the cross validation
>>> cv_report = core.evalCrossVal(
>>>     X=X,
>>>     y=y,
>>>     model=model,
>>>     cv=util.getCrossValObj(cv=5)
>>> )
>>>
>>> # get the model predictions on the test data
>>> predictions = cv_report.getTestPredictions()
>>>
>>> # plot the confusion matrix
>>> plotting.confusionMatrix(
>>>     df=predictions,
>>>     y_pred='pred_labels',
>>>     y_true='true_labels',
>>>     average='n_fold',
>>>     normalize=True,
>>>     labels=['Class 1', 'Class 2'],
>>>     title='Confusion matrix',
>>> )
>>>
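The combination of average='n_fold' and normalize=True used above can be pictured as computing one normalized confusion matrix per fold and then summarizing each cell across folds. The following plain-Python sketch illustrates that idea on made-up predictions. It assumes row-wise normalization (percentages of each true class) and the population standard deviation; gojo may use different conventions, so the numbers are illustrative only.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical predictions as (fold, true label, predicted label) rows.
rows = [
    (0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 1),
    (1, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1),
]
N_CLASSES = 2


def confusion_matrix(pairs):
    """Count matrix: rows are true classes, columns are predicted classes."""
    cm = [[0] * N_CLASSES for _ in range(N_CLASSES)]
    for y_true, y_pred in pairs:
        cm[y_true][y_pred] += 1
    return cm


def normalize_rows(cm):
    """Express each row as percentages of its true-class total (assumption)."""
    return [
        [100.0 * v / sum(row) if sum(row) else 0.0 for v in row]
        for row in cm
    ]


# Group the predictions by fold, mirroring average='n_fold'.
by_fold = defaultdict(list)
for fold, y_true, y_pred in rows:
    by_fold[fold].append((y_true, y_pred))

per_fold = [normalize_rows(confusion_matrix(p)) for p in by_fold.values()]

# Cell-wise mean and standard deviation across folds.
idx = range(N_CLASSES)
mean_cm = [[mean(cm[i][j] for cm in per_fold) for j in idx] for i in idx]
std_cm = [[pstdev(cm[i][j] for cm in per_fold) for j in idx] for i in idx]

print(mean_cm)
print(std_cm)
```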
gojo.plotting.classification.roc(df: pandas.DataFrame, y_pred: str, y_true: str, average: Optional[str] = None, stratify: Optional[str] = None, n_roc_points: int = 200, add_auc_info: bool = True, labels: Optional[dict] = None, labels_order: Optional[list] = None, show_random: bool = True, random_ls: str = 'dotted', random_lw: int = 1, random_color: str = 'black', random_label: str = 'Random', ax: Optional[matplotlib.axes.Axes] = None, figsize: tuple = (5, 4), dpi: int = 100, style: str = 'ggplot', xaxis_label: Optional[str] = None, yaxis_label: Optional[str] = None, lw: Optional[float] = None, ls: Optional[str] = None, colors: Optional[list] = None, err_alpha: float = 0.3, title: str = '', title_size: int = 15, title_pad: int = 15, hide_legend: bool = False, legend_pos: str = 'lower right', legend_size: int = 10, xlabel_size: float = 13, ylabel_size: float = 13, grid_alpha: float = 0.5, save: Optional[str] = None, save_kw: Optional[dict] = None, show: bool = True)[source]

Function used to represent a ROC curve from a pandas DataFrame with predictions and true values (e.g., returned by methods gojo.core.report.CVReport.getTestPredictions() and gojo.core.report.CVReport.getTrainPredictions()).

df : pd.DataFrame

Pandas DataFrame with the model predictions.

>>> df
    Out[0]
                            pred_labels  true_labels
    n_fold indices
    0      2                0.0          0.0
           6                0.0          0.0
           11               0.0          0.0
           12               0.0          0.0
           13               0.0          0.0
    ...                     ...          ...
    4      987              0.0          0.0
           992              0.0          0.0
           1011             0.0          0.0
           1016             0.0          0.0
           1018             0.0          0.0
y_pred : str

Variable indicating which values are predicted by the model.

y_true : str

Variable indicating which values are the ground truth.

average : str, default=None

Variable that stratifies the predictions (e.g., at the fold level) to represent the mean and standard deviation values of the ROC curve.

stratify : str, default=None

Variable used to separate the predictions made by different models.

n_roc_points : int, default=200

Number of ROC points to be calculated in order to represent the ROC curve.

add_auc_info : bool, default=True

Parameter indicating whether to display the AUC value associated with each model in the legend.

labels : dict, default=None

Labels used to identify the models. If not provided, the values of the variable specified in stratify, or a default value of "Model", will be used. The labels should be provided as a dictionary where the key is the value that identifies the model in the input data and the value is the name given to the model.

labels_order : list, default=None

Order in which the labels will be displayed. By default they are sorted or, if the labels parameter is provided, they appear in the order defined in that parameter.

show_random : bool, default=True

Indicates whether to display the ROC curve associated with a random model.

random_ls : str, default='dotted'

Random line style.

random_lw : int or float, default=1

Random line width.

random_color : str, default='black'

Random line color.

random_label : str, default='Random'

Random line label.

ax : matplotlib.axes.Axes, default=None

Axes used to represent the figure.

figsize : tuple, default=(5, 4)

Figure size.

dpi : int, default=100

Figure DPI.

style : str, default='ggplot'

Plot styling (see matplotlib.style.available).

xaxis_label : str, default=None

X-axis label. Defaults to "False positive rate".

yaxis_label : str, default=None

Y-axis label. Defaults to "True positive rate".

lw : float or int or list, default=None

Line width(s).

ls : str or list, default=None

Line style(s).

colors : list or str, default=None

Colors used for identifying the dataframe information. A string colormap can be provided.

err_alpha : float, default=0.3

Opacity of the error shadow.

title : str, default=''

Plot title.

title_size : int, default=15

Title font size.

title_pad : int, default=15

Title pad.

hide_legend : bool, default=False

Parameter indicating whether to hide the legend.

legend_pos : str, default='lower right'

Legend position.

legend_size : int, default=10

Legend size.

xlabel_size : int, default=13

Size of the x-label.

ylabel_size : int, default=13

Size of the y-label.

grid_alpha : float, default=0.5

Grid line opacity.

save : str, default=None

Parameter indicating whether to save the generated plot. If None (default), the plot will not be saved.

save_kw : dict, default=None

Optional parameters for saving the plot. This parameter has no effect if save is None.

show : bool, default=True

Parameter indicating whether to show the generated plot.

>>> import pandas as pd
>>> from gojo import core
>>> from gojo import plotting
>>>
>>> # ... model definition and data loading
>>>
>>> # train the models
>>> model1.train(X_train, y_train)
>>> model2.train(X_train, y_train)
>>>
>>> # perform inference on the new data
>>> y_preds1 = model1.performInference(X_test)
>>> y_preds2 = model2.performInference(X_test)
>>>
>>> # gather the predictions on a single dataframe
>>> model1_df = pd.DataFrame({
>>>     'y_pred': y_preds1,
>>>     'y_true': y_test,
>>>     'model': ['Model 1'] * y_test.shape[0]
>>> })
>>> model2_df = pd.DataFrame({
>>>     'y_pred': y_preds2,
>>>     'y_true': y_test,
>>>     'model': ['Model 2'] * y_test.shape[0]
>>> })
>>> model_preds = pd.concat([model1_df, model2_df], axis=0)
>>>
>>> # display the ROC curve
>>> plotting.roc(
>>>     df=model_preds,
>>>     y_pred='y_pred',
>>>     y_true='y_true',
>>>     stratify='model')
>>>
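A fixed number of ROC points, as controlled by n_roc_points, is typically what makes curves from different folds comparable: each curve is resampled on the same false positive rate grid and the true positive rates are then averaged point by point. The sketch below illustrates that idea in plain Python on two made-up folds. The helper names (roc_points, interpolate, auc) are hypothetical and this is not the implementation used by gojo.

```python
def roc_points(y_true, y_score):
    """Return (fpr, tpr) lists obtained by sweeping the decision threshold."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    order = sorted(range(len(y_score)), key=lambda i: -y_score[i])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for k, i in enumerate(order):
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample sharing this score.
        if k + 1 == len(order) or y_score[order[k + 1]] != y_score[i]:
            fpr.append(fp / neg)
            tpr.append(tp / pos)
    return fpr, tpr


def interpolate(fpr, tpr, n_points):
    """Linearly interpolate TPR on an evenly spaced FPR grid."""
    grid = [k / (n_points - 1) for k in range(n_points)]
    out, j = [], 0
    for x in grid:
        # Advance to the last ROC point whose FPR does not exceed x.
        while j + 1 < len(fpr) and fpr[j + 1] <= x:
            j += 1
        if j + 1 == len(fpr):
            out.append(tpr[j])
        else:
            # Here fpr[j] <= x < fpr[j + 1], so the width is nonzero.
            w = (x - fpr[j]) / (fpr[j + 1] - fpr[j])
            out.append(tpr[j] + w * (tpr[j + 1] - tpr[j]))
    return grid, out


def auc(x, y):
    """Area under the curve by the trapezoidal rule."""
    return sum((x[i + 1] - x[i]) * (y[i + 1] + y[i]) / 2
               for i in range(len(x) - 1))


# Two hypothetical folds: (true labels, predicted scores).
folds = [
    ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
    ([0, 1, 0, 1], [0.2, 0.9, 0.6, 0.7]),
]

N_ROC_POINTS = 5  # kept small for readability
curves = [interpolate(*roc_points(y, s), N_ROC_POINTS)[1] for y, s in folds]

# Point-wise mean TPR over the common FPR grid.
mean_tpr = [sum(vals) / len(vals) for vals in zip(*curves)]

fold_aucs = [auc(*roc_points(y, s)) for y, s in folds]
print(mean_tpr)
print(fold_aucs)
```

A finer grid (the default of 200 points in the signature above) makes the resampled curves follow the original steps more closely than the five points used here.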

Module contents