gojo.plotting package
Submodules
gojo.plotting.basic module
- gojo.plotting.basic.barPlot(*dfs, x: str, y: str, labels: Optional[list] = None, colors: Optional[list] = None, ax: Optional[matplotlib.axes.Axes] = None, figsize: tuple = (6, 3.5), style: str = 'ggplot', dpi: int = 100, err_capsize: float = 0.15, err_lw: float = 1.5, grid_alpha: float = 0.15, xlabel_size: int = 13, ylabel_size: int = 13, title: str = '', title_size: int = 15, title_pad: int = 15, hide_legend: bool = False, legend_pos: str = 'upper right', legend_bbox_to_anchor: Optional[tuple] = None, legend_size: int = 12, yvmin: Optional[float] = None, yvmax: Optional[float] = None, xvmin: Optional[float] = None, xvmax: Optional[float] = None, hide_xlabel: bool = False, hide_ylabel: bool = False, xaxis_tick_size: int = 12, yaxis_tick_size: int = 12, xaxis_rotation: float = 0.0, yaxis_rotation: float = 0.0, save: Optional[str] = None, save_kw: Optional[dict] = None, show: bool = True)
Bar plot function.
- *dfs : pd.DataFrame
Input dataframes with the data to be represented.
- x : str
X-axis variable. Must be present in the input dataframes.
- y : str
Y-axis variable. Must be present in the input dataframes.
- labels : list, default=None
Labels used for identifying the input dataframes.
- colors : list or str, default=None
Colors used for identifying the dataframe information. A string colormap can be provided.
- ax : matplotlib.axes.Axes, default=None
Axes used to represent the figure.
- figsize : tuple, default=(6, 3.5)
Figure size.
- style : str, default='ggplot'
Plot styling (see 'matplotlib.pyplot.style').
- dpi : int, default=100
Figure dpi.
- err_capsize : float, default=0.15
Error bar cap size.
- err_lw : float, default=1.5
Error bar line width.
- grid_alpha : float, default=0.15
Grid line opacity.
- xlabel_size : int, default=13
Size of the x-label.
- ylabel_size : int, default=13
Size of the y-label.
- title : str, default=''
Plot title.
- title_size : int, default=15
Title font size.
- title_pad : int, default=15
Title pad.
- hide_legend : bool, default=False
Parameter indicating whether to hide the legend.
- legend_pos : str, default='upper right'
Legend position.
- legend_bbox_to_anchor : tuple, default=None
Used for modifying the legend position relative to the position defined in legend_pos.
- legend_size : int, default=12
Legend size.
- yvmin : float, default=None
Minimum value in the y-axis.
- yvmax : float, default=None
Maximum value in the y-axis.
- xvmin : float, default=None
Minimum value in the x-axis.
- xvmax : float, default=None
Maximum value in the x-axis.
- hide_xlabel : bool, default=False
Parameter indicating whether to hide the x-axis label.
- hide_ylabel : bool, default=False
Parameter indicating whether to hide the y-axis label.
- xaxis_tick_size : int, default=12
Controls the x-axis tick size.
- yaxis_tick_size : int, default=12
Controls the y-axis tick size.
- xaxis_rotation : float or int, default=0.0
X-axis tick rotation.
- yaxis_rotation : float or int, default=0.0
Y-axis tick rotation.
- save : str, default=None
Parameter indicating whether to save the generated plot. If None (default), the plot will not be saved.
- save_kw : dict, default=None
Optional parameters for saving the plot. This parameter has no effect if save is None.
- show : bool, default=True
Parameter indicating whether to show the generated plot.
>>> from gojo import core
>>> from gojo import plotting
>>>
>>> # e.g., compute model performance metrics
>>> # (report1 and report2 are assumed to be the reports of two evaluated models)
>>> scores_1 = report1.getScores(
>>>     core.getDefaultMetrics(
>>>         'binary_classification', bin_threshold=0.5))['test']
>>>
>>> scores_2 = report2.getScores(
>>>     core.getDefaultMetrics(
>>>         'binary_classification', bin_threshold=0.5))['test']
>>>
>>> # adapt for barplot representation
>>> scores_1 = scores_1.melt()
>>> scores_2 = scores_2.melt()
>>>
>>> plotting.barPlot(
>>>     scores_1, scores_2,
>>>     x='variable', y='value',
>>>     labels=['Model 1', 'Model 2'],
>>>     title='Cross-validation results'
>>> )
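The melt() call in the example above reshapes a wide table of scores (one column per metric) into the long format that barPlot reads through x and y. A minimal sketch of that reshaping, using only pandas; the metric names and values below are hypothetical and are not produced by gojo:

```python
import pandas as pd

# Hypothetical per-fold scores (one column per metric); values are made up.
scores = pd.DataFrame({
    'accuracy': [0.80, 0.90],
    'f1_score': [0.70, 0.75],
})

# Wide -> long: one row per (metric, value) pair.
long_scores = scores.melt()

# pandas names the new columns 'variable' and 'value' by default,
# which is why the example passes x='variable', y='value'.
print(long_scores.columns.tolist())
```

Each metric then contributes one group of bars, with repeated rows for the same metric providing the spread shown by the error bars (assuming barPlot aggregates repeated x values, which the err_capsize and err_lw parameters suggest).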
- gojo.plotting.basic.linePlot(*dfs, x: str, y: str, err: Optional[str] = None, err_alpha: float = 0.3, labels: Optional[list] = None, ax: Optional[matplotlib.axes.Axes] = None, figsize: tuple = (6, 3.5), style: str = 'ggplot', dpi: int = 100, colors: Optional[list] = None, title: str = '', title_size: int = 15, title_pad: int = 15, hide_legend: bool = False, legend_pos: str = 'upper right', legend_size: int = 12, xlabel_size: float = 13, ylabel_size: float = 13, grid_alpha: float = 0.5, yvmin: Optional[float] = None, yvmax: Optional[float] = None, xvmin: Optional[float] = None, xvmax: Optional[float] = None, lw: Optional[float] = None, ls: Optional[str] = None, save: Optional[str] = None, save_kw: Optional[dict] = None, show: bool = True)
Line plot function.
- *dfs : pd.DataFrame
Input dataframes with the data to be represented.
- x : str
X-axis variable. Must be present in the input dataframes.
- y : str
Y-axis variable. Must be present in the input dataframes.
- err : str, default=None
Variable indicating the errors associated with the lines. If provided, it must be present in the input dataframes.
- err_alpha : float, default=0.3
Opacity used to plot the errors.
- labels : list, default=None
Labels used for identifying the input dataframes.
- ax : matplotlib.axes.Axes, default=None
Axes used to represent the figure.
- figsize : tuple, default=(6, 3.5)
Figure size.
- style : str, default='ggplot'
Plot styling (see 'matplotlib.pyplot.style').
- dpi : int, default=100
Figure dpi.
- colors : list or str, default=None
Colors used for identifying the dataframe information. A string colormap can be provided.
- title : str, default=''
Plot title.
- title_size : int or float, default=15
Title font size.
- title_pad : int, default=15
Title pad.
- hide_legend : bool, default=False
Parameter indicating whether to hide the legend.
- legend_pos : str, default='upper right'
Legend position.
- legend_size : int, default=12
Legend size.
- yvmin : float, default=None
Minimum value in the y-axis.
- yvmax : float, default=None
Maximum value in the y-axis.
- xvmin : float, default=None
Minimum value in the x-axis.
- xvmax : float, default=None
Maximum value in the x-axis.
- xlabel_size : float or int, default=13
X-axis label size.
- ylabel_size : float or int, default=13
Y-axis label size.
- grid_alpha : float, default=0.5
Grid opacity.
- lw : float or int or list, default=None
Line width(s).
- ls : str or list, default=None
Line style(s).
- save : str, default=None
Parameter indicating whether to save the generated plot. If None (default), the plot will not be saved.
- save_kw : dict, default=None
Optional parameters for saving the plot. This parameter has no effect if save is None.
- show : bool, default=True
Parameter indicating whether to show the generated plot.
>>> from gojo import plotting
>>>
>>> # train_info, valid_info are pandas dataframes returned by gojo.deepl.fitNeuralNetwork
>>> plotting.linePlot(
>>>     train_info, valid_info,
>>>     x='epoch', y='loss (mean)', err='loss (std)',
>>>     labels=['Train', 'Validation'],
>>>     title='Model convergence',
>>>     ls=['solid', 'dashed'],
>>>     style='default', legend_pos='center right')
>>>
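linePlot reads each dataframe through three column names: x, y and, optionally, err. As a sketch of that layout (this is not the output of gojo.deepl.fitNeuralNetwork, whose exact columns are not shown in this section), the per-epoch mean and standard deviation can be derived with pandas from hypothetical per-batch losses:

```python
import pandas as pd

# Hypothetical per-batch losses; values are made up for illustration.
raw = pd.DataFrame({
    'epoch': [0, 0, 1, 1],
    'loss':  [1.0, 0.8, 0.6, 0.4],
})

# One row per epoch with the mean and standard deviation of the loss.
info = (
    raw.groupby('epoch')['loss']
       .agg(['mean', 'std'])
       .rename(columns={'mean': 'loss (mean)', 'std': 'loss (std)'})
       .reset_index()
)

# Columns now match x='epoch', y='loss (mean)', err='loss (std)'.
print(info.columns.tolist())
```

A dataframe shaped like info could be passed as one of the positional *dfs arguments, with one dataframe per line to draw.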
- gojo.plotting.basic.scatterPlot(df: pandas.DataFrame, x: str, y: str, hue: Optional[str] = None, hue_mapping: Optional[dict] = None, ax: Optional[matplotlib.axes.Axes] = None, figsize: tuple = (6, 4.5), style: str = 'ggplot', dpi: int = 100, maker_size: Optional[float] = None, colors: Optional[list] = None, title: str = '', title_size: int = 15, title_pad: int = 15, hide_legend: bool = False, legend_pos: Optional[str] = None, legend_size: int = 12, xlabel_size: float = 13, ylabel_size: float = 13, grid_alpha: float = 0.5, yvmin: Optional[float] = None, yvmax: Optional[float] = None, xvmin: Optional[float] = None, xvmax: Optional[float] = None, save: Optional[str] = None, save_kw: Optional[dict] = None, show: bool = True)
Scatter plot function.
- df : pd.DataFrame
Input dataframe with the data to be represented.
- x : str
X-axis variable. Must be present in the input dataframe.
- y : str
Y-axis variable. Must be present in the input dataframe.
- hue : str, default=None
Hue variable for plotting groups.
- hue_mapping : dict, default=None
Dictionary mapping the group names of the hue variable in df to user-defined names.
- ax : matplotlib.axes.Axes, default=None
Axes used to represent the figure.
- figsize : tuple, default=(6, 4.5)
Figure size.
- style : str, default='ggplot'
Plot styling (see 'matplotlib.pyplot.style').
- dpi : int, default=100
Figure dpi.
- maker_size : float or int, default=None
Marker size.
- colors : list or str, default=None
Colors used for identifying the dataframe information. A string colormap can be provided.
- title : str, default=''
Plot title.
- title_size : int or float, default=15
Title font size.
- title_pad : int, default=15
Title pad.
- hide_legend : bool, default=False
Parameter indicating whether to hide the legend.
- legend_pos : str, default=None
Legend position.
- legend_size : int or float, default=12
Legend size.
- xlabel_size : float or int, default=13
X-label size.
- ylabel_size : float or int, default=13
Y-label size.
- grid_alpha : float, default=0.5
Opacity of the grid lines.
- yvmin : float, default=None
Minimum value in the y-axis.
- yvmax : float, default=None
Maximum value in the y-axis.
- xvmin : float, default=None
Minimum value in the x-axis.
- xvmax : float, default=None
Maximum value in the x-axis.
- save : str, default=None
Parameter indicating whether to save the generated plot. If None (default), the plot will not be saved.
- save_kw : dict, default=None
Optional parameters for saving the plot. This parameter has no effect if save is None.
- show : bool, default=True
Parameter indicating whether to show the generated plot.
>>> import pandas as pd
>>> from sklearn import datasets
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.decomposition import PCA
>>> from gojo import plotting
>>>
>>> # load test dataset (Wine)
>>> wine_dt = datasets.load_wine()
>>> data = StandardScaler().fit_transform(wine_dt['data'])
>>> PCs = PCA(n_components=2).fit_transform(data)
>>> PCs = pd.DataFrame(PCs, columns=['PC1', 'PC2'])
>>> PCs['target'] = wine_dt['target']
>>>
>>> plotting.scatterPlot(
>>>     df=PCs,
>>>     x='PC1',
>>>     y='PC2',
>>>     hue='target',
>>>     hue_mapping={0: 'C0', 1: 'C1', 2: 'C2'})
>>>
gojo.plotting.classification module
- gojo.plotting.classification.confusionMatrix(df: pandas.DataFrame, y_pred: str, y_true: str, average: Optional[str] = None, y_pred_threshold: Optional[float] = None, normalize: bool = True, labels: Optional[list] = None, ax: Optional[matplotlib.axes.Axes] = None, figsize: tuple = (5, 4), dpi: int = 100, cmap: str = 'Blues', alpha: float = 0.7, cm_font_size: int = 14, xaxis_label: Optional[str] = None, yaxis_label: Optional[str] = None, axis_label_size: int = 15, axis_label_pad: int = 15, axis_tick_size: int = 12, title: str = '', title_size: int = 15, title_pad: int = 15, save: Optional[str] = None, save_kw: Optional[dict] = None, show: bool = True)
Function used to represent a confusion matrix from a pandas DataFrame with predictions and true values (e.g., as returned by the methods gojo.core.report.CVReport.getTestPredictions() and gojo.core.report.CVReport.getTrainPredictions()).
- df : pd.DataFrame
Pandas DataFrame with the model predictions.
>>> df
Out[0]
                pred_labels  true_labels
n_fold indices
0      2                0.0          0.0
       6                0.0          0.0
       11               0.0          0.0
       12               0.0          0.0
       13               0.0          0.0
...                     ...          ...
4      987              0.0          0.0
       992              0.0          0.0
       1011             0.0          0.0
       1016             0.0          0.0
       1018             0.0          0.0
- y_pred : str
Variable indicating which values are predicted by the model.
- y_true : str
Variable indicating which values are the ground truth.
- average : str, default=None
Variable that stratifies the predictions (e.g., at the fold level) to represent the mean and standard deviation values of the confusion matrix.
- y_pred_threshold : float or None, default=None
Threshold to be used to binarize model predictions.
- normalize : bool, default=True
Parameter indicating whether to express the normalized confusion matrix (as a percentage).
- labels : list, default=None
Labels used to identify the classes. By default, they will be C0, C1, …, CX.
- ax : matplotlib.axes.Axes, default=None
Axes used to represent the figure.
- figsize : tuple, default=(5, 4)
Figure size.
- dpi : int, default=100
Figure dpi.
- cmap : str, default='Blues'
Colormap.
- alpha : float, default=0.7
Plot opacity.
- cm_font_size : int, default=14
Confusion matrix font size.
- xaxis_label : str, default=None
X-axis label.
- yaxis_label : str, default=None
Y-axis label.
- axis_label_size : int, default=15
XY-axis label size.
- axis_label_pad : int, default=15
XY-axis label pad.
- axis_tick_size : int, default=12
XY-axis tick size.
- title : str, default=''
Title.
- title_size : int, default=15
Title size.
- title_pad : int, default=15
Title pad.
- save : str, default=None
Parameter indicating whether to save the generated plot. If None (default), the plot will not be saved.
- save_kw : dict, default=None
Optional parameters for saving the plot. This parameter has no effect if save is None.
- show : bool, default=True
Parameter indicating whether to show the generated plot.
>>> from gojo import core
>>> from gojo import util
>>> from gojo import plotting
>>>
>>> # ... data loading and model definition
>>>
>>> # perform the cross validation
>>> cv_report = core.evalCrossVal(
>>>     X=X,
>>>     y=y,
>>>     model=model,
>>>     cv=util.getCrossValObj(cv=5)
>>> )
>>>
>>> # get the model predictions on the test data
>>> predictions = cv_report.getTestPredictions()
>>>
>>> # plot the confusion matrix
>>> plotting.confusionMatrix(
>>>     df=predictions,
>>>     y_pred='pred_labels',
>>>     y_true='true_labels',
>>>     average='n_fold',
>>>     normalize=True,
>>>     labels=['Class 1', 'Class 2'],
>>>     title='Confusion matrix',
>>> )
>>>
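The interaction between y_pred_threshold, normalize and average can be illustrated outside gojo. The following is a minimal sketch, not the library's implementation: it assumes predictions are binarized with a strict "greater than" comparison and that normalization is done per true class (row-wise), neither of which is confirmed by this section. The data are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical predictions for two folds; values are made up.
df = pd.DataFrame({
    'n_fold':      [0, 0, 0, 0, 1, 1, 1, 1],
    'pred_labels': [0.2, 0.9, 0.7, 0.1, 0.4, 0.8, 0.3, 0.6],
    'true_labels': [0, 1, 0, 0, 0, 1, 1, 1],
})

threshold = 0.5  # plays the role of y_pred_threshold (assumed strict comparison)
df['pred_bin'] = (df['pred_labels'] > threshold).astype(int)

def fold_cm(fold_df):
    # Rows: true class, columns: predicted class, normalized per row (assumption).
    cm = pd.crosstab(fold_df['true_labels'], fold_df['pred_bin'], normalize='index')
    # Make sure both classes appear even if a fold lacks one of them.
    return cm.reindex(index=[0, 1], columns=[0, 1], fill_value=0.0)

# One confusion matrix per fold, then mean and standard deviation across folds.
cms = np.stack([fold_cm(g).to_numpy() for _, g in df.groupby('n_fold')])
mean_cm = cms.mean(axis=0)
std_cm = cms.std(axis=0)

print(mean_cm)
print(std_cm)
```

Passing average='n_fold' in the example above asks confusionMatrix for this kind of per-fold summary instead of a single matrix pooled over all predictions.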
- gojo.plotting.classification.roc(df: pandas.DataFrame, y_pred: str, y_true: str, average: Optional[str] = None, stratify: Optional[str] = None, n_roc_points: int = 200, add_auc_info: bool = True, labels: Optional[dict] = None, labels_order: Optional[list] = None, show_random: bool = True, random_ls: str = 'dotted', random_lw: int = 1, random_color: str = 'black', random_label: str = 'Random', ax: Optional[matplotlib.axes.Axes] = None, figsize: tuple = (5, 4), dpi: int = 100, style: str = 'ggplot', xaxis_label: Optional[str] = None, yaxis_label: Optional[str] = None, lw: Optional[float] = None, ls: Optional[str] = None, colors: Optional[list] = None, err_alpha: float = 0.3, title: str = '', title_size: int = 15, title_pad: int = 15, hide_legend: bool = False, legend_pos: str = 'lower right', legend_size: int = 10, xlabel_size: float = 13, ylabel_size: float = 13, grid_alpha: float = 0.5, save: Optional[str] = None, save_kw: Optional[dict] = None, show: bool = True)
Function used to represent a ROC curve from a pandas DataFrame with predictions and true values (e.g., as returned by the methods gojo.core.report.CVReport.getTestPredictions() and gojo.core.report.CVReport.getTrainPredictions()).
- df : pd.DataFrame
Pandas DataFrame with the model predictions.
>>> df
Out[0]
                pred_labels  true_labels
n_fold indices
0      2                0.0          0.0
       6                0.0          0.0
       11               0.0          0.0
       12               0.0          0.0
       13               0.0          0.0
...                     ...          ...
4      987              0.0          0.0
       992              0.0          0.0
       1011             0.0          0.0
       1016             0.0          0.0
       1018             0.0          0.0
- y_pred : str
Variable indicating which values are predicted by the model.
- y_true : str
Variable indicating which values are the ground truth.
- average : str, default=None
Variable that stratifies the predictions (e.g., at the fold level) to represent the mean and standard deviation of the ROC curve.
- stratify : str, default=None
Variable used to separate the predictions made by different models.
- n_roc_points : int, default=200
Number of ROC points to be calculated in order to represent the ROC curve.
- add_auc_info : bool, default=True
Parameter indicating whether to display the AUC value associated with each model in the legend.
- labels : dict, default=None
Labels used to identify the models. If not provided, the values of the variable specified in stratify, or a default value of "Model", will be used. The labels should be provided as a dictionary where the key is the value that identifies the model in the input data and the value is the name given to the model.
- labels_order : list, default=None
Order in which the labels will be displayed. By default they will be sorted or, if the labels parameter is provided, they will appear in the order defined in that parameter.
- show_random : bool, default=True
Indicates whether to display the ROC curve associated with a random model.
- random_ls : str, default='dotted'
Random line style.
- random_lw : int or float, default=1
Random line width.
- random_color : str, default='black'
Random line color.
- random_label : str, default='Random'
Random line label.
- ax : matplotlib.axes.Axes, default=None
Axes used to represent the figure.
- figsize : tuple, default=(5, 4)
Figure size.
- dpi : int, default=100
Figure dpi.
- style : str, default='ggplot'
Plot styling (see 'matplotlib.pyplot.style').
- xaxis_label : str, default=None
X-axis label. Defaults to "False positive rate".
- yaxis_label : str, default=None
Y-axis label. Defaults to "True positive rate".
- lw : float or int or list, default=None
Line width(s).
- ls : str or list, default=None
Line style(s).
- colors : list or str, default=None
Colors used for identifying the dataframe information. A string colormap can be provided.
- err_alpha : float, default=0.3
Opacity of the error shadow.
- title : str, default=''
Plot title.
- title_size : int, default=15
Title font size.
- title_pad : int, default=15
Title pad.
- hide_legend : bool, default=False
Parameter indicating whether to hide the legend.
- legend_pos : str, default='lower right'
Legend position.
- legend_size : int, default=10
Legend size.
- xlabel_size : float or int, default=13
Size of the x-label.
- ylabel_size : float or int, default=13
Size of the y-label.
- grid_alpha : float, default=0.5
Grid line opacity.
- save : str, default=None
Parameter indicating whether to save the generated plot. If None (default), the plot will not be saved.
- save_kw : dict, default=None
Optional parameters for saving the plot. This parameter has no effect if save is None.
- show : bool, default=True
Parameter indicating whether to show the generated plot.
>>> import pandas as pd
>>> from gojo import core
>>> from gojo import plotting
>>>
>>> # ... model definition and data loading
>>>
>>> # train the models
>>> model1.train(X_train, y_train)
>>> model2.train(X_train, y_train)
>>>
>>> # perform inference on the new data
>>> y_preds1 = model1.performInference(X_test)
>>> y_preds2 = model2.performInference(X_test)
>>>
>>> # gather the predictions on a single dataframe
>>> model1_df = pd.DataFrame({
>>>     'y_pred': y_preds1,
>>>     'y_true': y_test,
>>>     'model': ['Model 1'] * y_test.shape[0]
>>> })
>>> model2_df = pd.DataFrame({
>>>     'y_pred': y_preds2,
>>>     'y_true': y_test,
>>>     'model': ['Model 2'] * y_test.shape[0]
>>> })
>>> model_preds = pd.concat([model1_df, model2_df], axis=0)
>>>
>>> # display the ROC curve
>>> plotting.roc(
>>>     df=model_preds,
>>>     y_pred='y_pred',
>>>     y_true='y_true',
>>>     stratify='model')
>>>
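When average is set, curves from different folds must be combined even though each fold yields its own set of false positive rates. A common way to do this, sketched below with NumPy only, is to interpolate every curve onto a shared grid of n_roc_points values and then take the mean and standard deviation. Whether gojo does exactly this is an assumption; the helper roc_points and the data are hypothetical, and tied scores are not handled.

```python
import numpy as np

def roc_points(y_true, y_score):
    """Return (fpr, tpr) obtained by sweeping a threshold over the scores.

    Hypothetical helper for illustration; assumes binary labels in {0, 1},
    both classes present, and no tied scores.
    """
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    order = np.argsort(-y_score, kind='stable')  # highest score first
    y_sorted = y_true[order]
    tps = np.cumsum(y_sorted == 1)
    fps = np.cumsum(y_sorted == 0)
    tpr = np.concatenate([[0.0], tps / tps[-1]])
    fpr = np.concatenate([[0.0], fps / fps[-1]])
    return fpr, tpr

n_roc_points = 5  # small grid for readability; the documented default is 200
grid = np.linspace(0.0, 1.0, n_roc_points)

# Made-up (labels, scores) pairs, one per fold.
folds = [
    ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
    ([0, 1, 0, 1], [0.2, 0.9, 0.3, 0.7]),
]

# Interpolate each fold's curve onto the shared grid, then summarize.
tprs = np.stack([np.interp(grid, *roc_points(y, s)) for y, s in folds])
mean_tpr = tprs.mean(axis=0)
std_tpr = tprs.std(axis=0)

# Area under the mean curve via the trapezoidal rule.
mean_auc = np.sum(np.diff(grid) * (mean_tpr[1:] + mean_tpr[:-1]) / 2)
print(mean_tpr, std_tpr, mean_auc)
```

In a plot, mean_tpr would be the line and mean_tpr ± std_tpr the shaded band whose opacity err_alpha controls.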