Scatterplot 2D and 3D

Wraps matplotlib's scatter plot with a few extras built in, such as colouring groups of points and annotating selected points.

Parameters:

df: pd.DataFrame --> the DataFrame containing the data to plot
x: object --> string column name of the scatterplot values in the DF for the X
y: object --> string column name of the scatterplot values in the DF for the Y
title='' --> string title
xlabel='' --> string x label
ylabel='' --> string y label
colour=None --> either a string for the colour (e.g. HEX or "blue") or an array of values to colour by (e.g. df['petal_length'].values)
z=None --> string column name of the scatterplot values in the DF for the Z; if this is set, the plot will be 3D
zlabel=None --> string z label
add_legend=True --> whether to add the legend
points_to_annotate=None --> specific points that you want to annotate, as a list of strings [label1, label2, ...]
annotation_label=None --> column name (string) that contains the labels listed in points_to_annotate
figsize=(3, 3) --> size of the figure (x, y)
title_font_size=12 --> size of the title (pt)
label_font_size=8 --> size of the labels (pt)
title_font_weight=700 --> weight of the title font (700 = bold)
s=30 --> size of the points, 10 is small, 100 is large
color_col=None --> column name to colour the points by (i.e. df[color_col].values); if set, this overrides colour
config={} --> dictionary of config options (see below)

Config options: any of the parameters above can also be passed by name in a dictionary. The config additionally holds default settings for the visualisation, such as the font family and font.

Example config:

config={'palette': ['red', 'yellow', 'pink'],
        'figsize': (4, 5),  # Size of figure (x, y)
        'title_font_size': 16, # Size of the title (pt)
        'label_font_size': 12, # Size of the labels (pt)
        'title_font_weight': 700, # 700 = bold, 600 = semibold, 400 = normal
        'font_family': 'sans-serif', # 'serif', 'sans-serif', or 'monospace'
        'font': ['Tahoma'] # Default: Arial. Available fonts: http://jonathansoma.com/lede/data-studio/matplotlib/list-all-fonts-available-in-matplotlib-plus-samples/
}
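Because the config is a plain dictionary, one shared style can be defined once and adjusted per chart. The following is a minimal sketch using only dictionary unpacking; the names base_config and large_points_config are illustrative and not part of sciviso.

```python
# Shared style options, copied from the example config above.
base_config = {
    'palette': ['red', 'yellow', 'pink'],
    'figsize': (4, 5),
    'title_font_size': 16,
    'label_font_size': 12,
    'title_font_weight': 700,
    'font_family': 'sans-serif',
    'font': ['Tahoma'],
}

# Derive a variant for one chart without mutating the shared dictionary.
# Keys on the right override keys that come from base_config.
large_points_config = {**base_config, 's': 200, 'figsize': (5, 5)}
```

Either dictionary can then be passed as config= when creating a chart, so the shared style stays in one place.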

Loading data

[1]:
import pandas as pd
from sciviso import Barchart, Boxplot, Heatmap, Histogram, Scatterplot, Violinplot, Volcanoplot, Line
import matplotlib.pyplot as plt

df = pd.read_csv('iris.csv')
df
[1]:
sepal_length sepal_width petal_length petal_width label
0 5.1 3.5 1.4 0.2 Iris-setosa
1 4.9 3.0 1.4 0.2 Iris-setosa
2 4.7 3.2 1.3 0.2 Iris-setosa
3 4.6 3.1 1.5 0.2 Iris-setosa
4 5.0 3.6 1.4 0.2 Iris-setosa
... ... ... ... ... ...
145 6.7 3.0 5.2 2.3 Iris-virginica
146 6.3 2.5 5.0 1.9 Iris-virginica
147 6.5 3.0 5.2 2.0 Iris-virginica
148 6.2 3.4 5.4 2.3 Iris-virginica
149 5.9 3.0 5.1 1.8 Iris-virginica

150 rows × 5 columns

2D Scatterplot

[2]:
scatterplot = Scatterplot(df, x='sepal_width', y='sepal_length', title='IRIS',
                          xlabel='Sepal width', ylabel='Sepal Length', add_legend=False)
scatterplot.plot()
plt.show()

3D Scatterplot

[3]:
scatterplot = Scatterplot(df, x='sepal_width', y='sepal_length',
                          title='IRIS',
                          xlabel='Sepal width', ylabel='Sepal Length',
                          z='petal_length',
                          add_legend=False) # Just need to add a z parameter
scatterplot.plot()
plt.show()

Annotate one of the classes on the chart

List the values to annotate in points_to_annotate and name the column they come from in annotation_label.

[4]:
scatterplot = Scatterplot(df, x='sepal_width', y='sepal_length', title='IRIS',
                          xlabel='Sepal width', ylabel='Sepal Length',
                          points_to_annotate=['Iris-setosa'], # Could add the other ones in here
                          annotation_label='label', # What column those values come from
                          add_legend=False)
scatterplot.plot()
plt.show()
[5]:
# Do the same in 3D
scatterplot = Scatterplot(df, x='sepal_width', y='sepal_length', title='IRIS',
                          xlabel='Sepal width', ylabel='Sepal Length',
                          points_to_annotate=['Iris-setosa'], # Could add the other ones in here
                          annotation_label='label', # What column those values come from
                          z='petal_length',
                          add_legend=False)
scatterplot.plot()
plt.show()

Colour rather than annotate groups

[6]:
# Same initial setup
scatterplot = Scatterplot(df, x='sepal_width', y='sepal_length', title='IRIS',
                          xlabel='Sepal width', ylabel='Sepal Length')

# Could also do all groups, but this just shows how to highlight one
groups_labels = ['Iris-setosa']
group_idxs = [[i for i, l in enumerate(df['label'].values) if l == 'Iris-setosa']]

scatterplot.plot_groups_2D(groups_labels, group_idxs)
plt.show()
*c* argument looks like a single numeric RGB or RGBA sequence, which should be avoided as value-mapping will have precedence in case its length matches with *x* & *y*.  Please use the *color* keyword-argument or provide a 2-D array with a single row if you intend to specify the same RGB or RGBA value for all points.
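The list comprehension above generalises to any number of groups. The following is a minimal sketch of a helper that returns one index list per group label, in the same order as the labels. The name group_indices is hypothetical and not part of sciviso, and the short labels list stands in for df['label'].values.

```python
def group_indices(labels, groups):
    """Return, for each group name, the positions in `labels` that hold that name."""
    positions = {group: [] for group in groups}
    for i, label in enumerate(labels):
        if label in positions:
            positions[label].append(i)
    # Keep the same order as `groups` so labels and index lists line up.
    return [positions[group] for group in groups]


# Stand-in for df['label'].values
labels = ['Iris-setosa', 'Iris-virginica', 'Iris-setosa', 'Iris-versicolor']
groups_labels = ['Iris-setosa', 'Iris-virginica']
group_idxs = group_indices(labels, groups_labels)
print(group_idxs)  # [[0, 2], [1]]
```

With the iris DataFrame, group_indices(df['label'].values, groups_labels) should give the same lists as writing one comprehension per label, and the result can be passed to plot_groups_2D or plot_groups_3D in place of the hand-written group_idxs.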
[7]:
# Same initial setup
scatterplot = Scatterplot(df, x='sepal_width', y='sepal_length', title='IRIS',
                          xlabel='Sepal width', ylabel='Sepal Length',
                          z='petal_width')

# Could also do all groups, but this just shows how to highlight one
groups_labels = ['Iris-setosa']
group_idxs = [[i for i, l in enumerate(df['label'].values) if l == 'Iris-setosa']]

scatterplot.plot_groups_3D(groups_labels, group_idxs)
plt.show()
[8]:
# Same initial setup
scatterplot = Scatterplot(df, x='sepal_width', y='sepal_length', title='IRIS',
                          xlabel='Sepal width', ylabel='Sepal Length',
                          z='petal_width',
                          figsize=(5, 5), # Make the figure bigger.
                          s=100) # Make the size of the points bigger

# Highlight two groups this time (could also do all groups)
groups_labels = ['Iris-setosa', 'Iris-virginica']
group_idxs = [[i for i, l in enumerate(df['label'].values) if l == 'Iris-setosa'],
              [i for i, l in enumerate(df['label'].values) if l == 'Iris-virginica']
             ]

# One group is hidden behind other points, so fade out the background "grey points"
scatterplot.plot_groups_3D(groups_labels, group_idxs,
                           alpha_bg=0.01)
plt.show()


Advanced style options

Here are some examples with extra style options.

[9]:
scatterplot = Scatterplot(df,
                          x='sepal_width',
                          y='sepal_length',
                          title='IRIS',
                          xlabel='Sepal width',
                          ylabel='Sepal Length',
                          z='petal_width',
                          colour=df['petal_length'].values, # Colour by these values
                          zlabel=None,
                          add_legend=True,
                          points_to_annotate=None,
                          annotation_label=None,
                          add_correlation=False,
                          correlation='Spearman',
                          title_font_size=12,
                          label_font_size=8,
                          title_font_weight=700,
                          color_col=None, # Set points to be coloured by this column
                          # Config options: any of the parameters can be given by name in this dictionary instead.
                          # You could also pass these as individual parameters, but a dictionary is easier to set,
                          # and you can then re-use it for other charts!
                          config={'s': 200, # Make points larger
                                  'figsize': (4, 5),  # Size of figure (x, y)
                                  'title_font_size': 16, # Size of the title (pt)
                                  'label_font_size': 12, # Size of the labels (pt)
                                  'title_font_weight': 700, # 700 = bold, 600 = semibold, 400 = normal
                                  'font_family': 'sans-serif', # 'serif', 'sans-serif', or 'monospace'
                                  'font': ['Tahoma'] # Default: Arial. Available fonts: http://jonathansoma.com/lede/data-studio/matplotlib/list-all-fonts-available-in-matplotlib-plus-samples/
                          })
scatterplot.plot()
plt.show()
[10]:
scatterplot = Scatterplot(df,
                          x='sepal_width',
                          y='sepal_length',
                          title='IRIS',
                          xlabel='Sepal width',
                          ylabel='Sepal Length',
                          z='petal_width',
                          colour=df['petal_length'].values, # This will be overridden by color_col
                          zlabel=None,
                          add_legend=True,
                          points_to_annotate=None,
                          annotation_label=None,
                          add_correlation=False,
                          correlation='Spearman',
                          title_font_size=12,
                          label_font_size=8,
                          title_font_weight=700,
                          color_col='label', # Set points to be coloured by this column
                          # Config options: any of the parameters can be given by name in this dictionary instead.
                          # You could also pass these as individual parameters, but a dictionary is easier to set,
                          # and you can then re-use it for other charts!
                          config={'s': 200, # Make points larger
                                  'figsize': (4, 5),  # Size of figure (x, y)
                                  'title_font_size': 16, # Size of the title (pt)
                                  'label_font_size': 12, # Size of the labels (pt)
                                  'title_font_weight': 700, # 700 = bold, 600 = semibold, 400 = normal
                                  'font_family': 'sans-serif', # 'serif', 'sans-serif', or 'monospace'
                                  'font': ['Tahoma'] # Default: Arial. Available fonts: http://jonathansoma.com/lede/data-studio/matplotlib/list-all-fonts-available-in-matplotlib-plus-samples/
                          })
scatterplot.plot()
plt.show()

Saving

Saving works the same way for all plots and is very simple: just make sure the file name ends with the extension of the format you want.

[27]:
scatterplot = Scatterplot(df, x='sepal_width', y='sepal_length', title='IRIS',
                          xlabel='Sepal width', ylabel='Sepal Length',
                          points_to_annotate=['Iris-setosa'], # Could add the other ones in here
                          annotation_label='label',
                          colour=df['petal_width'].values,
                          add_legend=True,
                          config={'palette': ['orchid', 'paleturquoise', 'gold'],
                                  'figsize': (3, 3),  # Size of figure (x, y)
                                  's': 20,
                                  'title_font_size': 16, # Size of the title (pt)
                                  'label_font_size': 12, # Size of the labels (pt)
                                  'title_font_weight': 700, # 700 = bold, 600 = semibold, 400 = normal
                                  'font_family': 'sans-serif', # 'serif', 'sans-serif', or 'monospace'
                                  'font': ['Tahoma'] # Default: Arial. Available fonts: http://jonathansoma.com/lede/data-studio/matplotlib/list-all-fonts-available-in-matplotlib-plus-samples/
                                  })
scatterplot.plot()
plt.savefig('scatter2D.svg', bbox_inches='tight') # The extension sets the format: .svg, .png, .pdf, .jpg
plt.savefig('scatter2D.png', dpi=300)
plt.savefig('chart.pdf')
[28]:
# Same initial setup
scatterplot = Scatterplot(df, x='sepal_width', y='sepal_length', title='IRIS',
                          xlabel='Sepal width', ylabel='Sepal Length',
                          z='petal_width',

                          config={'palette': ['orchid', 'paleturquoise', 'gold'],
                                  'figsize': (5, 5),  # Make the figure bigger
                                  's': 100, # Make the size of the points bigger
                                  'title_font_size': 16, # Size of the title (pt)
                                  'label_font_size': 12, # Size of the labels (pt)
                                  'title_font_weight': 700, # 700 = bold, 600 = semibold, 400 = normal
                                  'font_family': 'sans-serif', # 'serif', 'sans-serif', or 'monospace'
                                  'font': ['Tahoma'] # Default: Arial. Available fonts: http://jonathansoma.com/lede/data-studio/matplotlib/list-all-fonts-available-in-matplotlib-plus-samples/
                                  })

# Highlight two groups (could also do all groups)
groups_labels = ['Iris-setosa', 'Iris-virginica']
group_idxs = [[i for i, l in enumerate(df['label'].values) if l == 'Iris-setosa'],
              [i for i, l in enumerate(df['label'].values) if l == 'Iris-virginica']
             ]

# One group is hidden behind other points, so fade out the background "grey points"
scatterplot.plot_groups_3D(groups_labels, group_idxs,
                           alpha_bg=0.01)
plt.savefig('scatter3d.svg', bbox_inches='tight') # The extension sets the format: .svg, .png, .pdf, .jpg
plt.savefig('scatter3d.png', dpi=300)
plt.savefig('chart.pdf')
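Both saving cells repeat the same savefig call once per format. One way to avoid the repetition is a small helper that builds one file name per extension and hands each to a save function. This is only a sketch: save_in_formats is a hypothetical name, not part of sciviso or matplotlib, and it applies the same keyword arguments to every format.

```python
from pathlib import Path


def save_in_formats(save, stem, extensions=('svg', 'png', 'pdf'), **kwargs):
    """Call save(path, **kwargs) once per extension and return the paths used."""
    paths = []
    for ext in extensions:
        # Accept both 'png' and '.png'
        path = Path(f"{stem}.{ext.lstrip('.')}")
        save(path, **kwargs)
        paths.append(path)
    return paths
```

With a plot still active, save_in_formats(plt.savefig, 'scatter2D', bbox_inches='tight') would be expected to write scatter2D.svg, scatter2D.png and scatter2D.pdf, since plt.savefig picks the format from the file extension.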