Do you want to get quick insights from a large dataset? Just use Python – it has all the tools you need to load, transform and visualise complex data with just a few lines of code.
This post will show you how you can quickly create a professional visualisation from your data using Pandas and Plotly.
📌 The following Python snippets are best tried out in a Jupyter Notebook or directly in a Visual Studio Code interactive notebook.
TL;DR
Installing the required libraries:
conda install -c plotly -y plotly nbformat pandas ipykernel
Load the data from a CSV file and visualise it:
import plotly.express as px
import pandas as pd
df = pd.read_csv('titanic.csv')
fig = px.histogram(df, x="Age")
fig.show()
Why Python?
Python is an interpreted language. Its syntax is very lean, and code can be run immediately without a separate compilation step. This can make Python challenging for large production-ready software projects, but very attractive for research, machine learning or data science projects.
With so-called notebooks, you can write your Python code in an IDE or even in the browser and display the executed result directly below. Perfect if you just want to try something or quickly display some data in a pretty diagram. 😉
What is Pandas?
Pandas is an open-source library for analysing and manipulating data. It is characterised by its power, speed and simplicity. Pandas is not part of the Python standard library, but it is the most widely used library for data processing in Python.
The installation can be done via either conda or pip as the package manager (use one of the two commands):
conda install pandas
pip install pandas
Now you can easily load data from a CSV file and display its first rows in a notebook:
import pandas as pd
df = pd.read_csv('titanic.csv')
df.head()
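If you do not have titanic.csv at hand, the same calls can be tried on a small in-memory table. The following is only a minimal sketch: the column names and values are made up for illustration and are not the real Titanic data.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a real CSV file.
csv_text = """Name,Age,Fare
Alice,29,72.5
Bob,41,13.0
Carol,,8.05
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())          # first rows of the table
print(df.dtypes)          # column types inferred by Pandas
print(df["Age"].mean())   # missing values are skipped by default
```

The empty Age field for the third row is read as a missing value, which is why the mean is calculated from the two remaining ages only.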
What is Plotly?
Plotly is a free open-source library for Python, JavaScript and R. It is characterised by high-quality and interactive plots, a large number of available diagram types and its simplicity.
Available diagram types:
- Basics: scatter, line, area, bar, funnel, timeline
- Part-of-Whole: pie, sunburst, treemap, funnel_area
- 1D Distributions: histogram, box, violin, strip
- 2D Distributions: density_heatmap, density_contour
- Matrix Input: imshow
- 3-Dimensional: scatter_3d, line_3d
- Multidimensional: scatter_matrix, parallel_coordinates, parallel_categories
- Tile Maps: scatter_mapbox, line_mapbox, choropleth_mapbox, density_mapbox
- Outline Maps: scatter_geo, line_geo, choropleth
- Polar Charts: scatter_polar, line_polar, bar_polar
- Ternary Charts: scatter_ternary, line_ternary
The syntax for the diagrams differs from language to language, of course, but is quite similar. This means that the plots used in Python for viewing data can also be used in a React frontend with minor modifications.
Plotly Express
Plotly Express provides a high-level API for visualising complex plots with few lines of code. It simplifies the use of Plotly even more and you can reach your goal very quickly:
Installing the required packages:
conda install -c plotly -y plotly nbformat pandas ipykernel
Load the data and visualise it as a histogram:
import plotly.express as px
import pandas as pd
df = pd.read_csv('titanic.csv')
fig = px.histogram(df, x="Age")
fig.show()
Here you can see the standard steps:
- load data into a data frame
- filter and transform data (not needed in this simple example)
- specify the plot
- display the plot
Much more complex plots can be displayed in the same simple way:
import pandas as pd
import plotly.express as px
df = px.data.gapminder()
fig = px.scatter(
    df.query("year==2007"),
    x="gdpPercap",
    y="lifeExp",
    size="pop",
    color="continent",
    hover_name="country",
    log_x=True,
    size_max=60,
)
fig.show()
Many paths lead to the goal
Plotly Express is not the only way to create professional plots. The following libraries can serve as alternatives:
- Matplotlib: the most widely used library for creating visualisations with Python
- Seaborn: high-level API to simplify Matplotlib
- Bokeh: interactive visualisations for web applications
- Plotly Dash: interactive visualisations for web applications – alternative to Bokeh
Conclusion
If you are familiar with Python, Pandas and Plotly offer a quick and easy way to analyse and visualise data – and often even faster than with Excel. 🥳