Stop googling pandas data-frame commands!!

Sanchit Bhavsar
4 min read · May 11, 2020

>> Getting Started with EDA on pandas df using bamboolib UI

Pearson correlation for customer churn datasets

In this article, I will focus mainly on data analysis and visualization using bamboolib. In my experience, the bamboolib UI is one of the fastest and easiest ways to work with a pandas DataFrame.

Before introducing the bamboolib UI, I would like to point out why data analysis and visualization are important steps before applying machine learning techniques. By performing Exploratory Data Analysis (EDA), we can analyze the trends and patterns in the data rather than scanning through thousands of rows.

Pandas is a powerful and flexible library that provides extensive tools for data analysis. As a data scientist, I often use pandas to get started with EDA. There are plenty of operations that we can perform on a given dataset. The problem is that searching for a particular command or operation on Google, and on Stack Overflow in particular, can consume a lot of time. Despite the numerous solutions available online, we look for one that fits perfectly but often end up writing the code from scratch.

What if I told you that your life could be made easier by a tool that performs operations on your data through an interactive UI? And, as the cherry on top, the same tool can export the code for every operation you perform in that UI. Well, sit tight!

Let’s get started. For the analysis, we will use customer churn data from the telecom industry. The dataset is interesting to explore because it helps answer questions like: which variables contribute to customer churn? Which customers are more likely to churn? (get the data here)

import pandas as pd
df = pd.read_csv('telecom_churn.csv')

bamboolib helps with data wrangling and data exploration with pandas.

bamboolib adds an interactive UI to pandas output, which allows us to quickly prepare and visualize datasets.

Installation

To install bamboolib for Jupyter Notebook or Jupyter Lab, run the following commands in a terminal (integration with Colab is in progress):

Note: It is strongly recommended to use a virtual environment to avoid conflicts with other packages.

Initially, the service has to be activated with your email address and a key. A trial key valid for 14 days will be sent to the subscribed email address.

Alright! Now just import bamboolib in the notebook and we are all set.

The bamboolib package comes with key benefits for data preparation, transformation, visualization, and exploration. Visualization can be done on the overall dataframe as well as on individual columns.

The tool can handle up to 1 million rows and 100 columns.

bamboolib interactive UI on pandas df

The UI includes the most common and useful transformer operations:

  • Select or drop columns
  • Filter data values
  • Sorting
  • Groupby and aggregation
  • Join or merge dataframes
  • Change data types
  • Replace missing values
  • String manipulation
  • Extract date-time attributes
  • One-hot encoding, and many more
bamboolib UI with transformers
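To give a feel for what these transformers correspond to in plain pandas, here is a minimal sketch of a few of them. This is not the code bamboolib actually exports; it is my own approximation, and the toy dataframe and its column names (`customer_id`, `plan`, `monthly_charge`, `churn`) are invented for illustration rather than taken from the telecom churn dataset.

```python
import pandas as pd

# Toy data: values and column names are made up for this example.
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 6],
    "plan": ["basic", "premium", "basic", "premium", "basic", "basic"],
    "monthly_charge": [20.0, 75.5, None, 80.0, 25.0, 30.0],
    "churn": [0, 1, 0, 1, 0, 1],
})

# Replace missing values with the column median
df["monthly_charge"] = df["monthly_charge"].fillna(df["monthly_charge"].median())

# Change a data type
df["churn"] = df["churn"].astype(bool)

# Select columns, filter rows, and sort
high_charge = (
    df[["customer_id", "plan", "monthly_charge", "churn"]]
    .loc[df["monthly_charge"] > 25]
    .sort_values("monthly_charge", ascending=False)
)

# Group by and aggregate
churn_by_plan = df.groupby("plan").agg(
    customers=("customer_id", "count"),
    churn_rate=("churn", "mean"),
)

# One-hot encode a categorical column
encoded = pd.get_dummies(df, columns=["plan"])
```

Each step is a one-liner once you know it, which is exactly the part that usually ends up in a search box.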

The UI ‘create plot’ action lets us create interactive charts such as:

  • Bar plot
  • Line plot
  • Scatter plot
  • Box plot
  • Density heatmap
  • Scatter & line plot in 3D
  • Scatter matrix, etc

It also allows us to save the visualizations and adjust them by changing columns.

bamboolib UI create plots

The ‘explore dataframe’ functionality helps us analyze the dataframe in a more detailed and interactive way. We can easily create bivariate plots against the target variable (‘churn’ in our case), and from the predictors we can find the importance and the effect of each feature on the target variable. As seen in the following image, the ‘monthly charge’ of the data plans has a higher impact on customer churn, a finding that can directly inform business decisions. We can also analyze the predictive power plot.

bamboolib UI explore dataframe
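For comparison, a rough pandas-only version of this kind of bivariate look at the target might be sketched as follows. The numbers are invented toy values, the column names (`monthly_charge`, `tenure_months`, `churn`) and the bucket edges are assumptions of mine, and this does not reproduce bamboolib's predictive power plot.

```python
import pandas as pd

# Toy data: values and column names are made up for this example.
df = pd.DataFrame({
    "monthly_charge": [20.0, 75.5, 30.0, 80.0, 25.0, 90.0, 35.0, 70.0],
    "tenure_months": [40, 3, 36, 5, 50, 2, 24, 8],
    "churn": [0, 1, 0, 1, 0, 1, 0, 1],
})

# Summary statistics of each predictor, split by the target variable
summary = df.groupby("churn")[["monthly_charge", "tenure_months"]].agg(["mean", "median"])

# Churn rate per bucket of monthly charge (bucket edges chosen arbitrarily)
df["charge_bucket"] = pd.cut(
    df["monthly_charge"], bins=[0, 40, 100], labels=["low", "high"]
)
churn_rate_by_bucket = df.groupby("charge_bucket", observed=True)["churn"].mean()

print(summary)
print(churn_rate_by_bucket)
```

If a predictor matters, its summary statistics should differ visibly between churned and retained customers, and the churn rate should shift across its buckets.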

Furthermore, we can create the correlation matrix straight away, without any hurdles. When we select a particular feature in the correlation matrix, the UI also lets us dig deeper and inspect that feature's relationships.

bamboolib UI dataframe visualization
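Outside the UI, a Pearson correlation matrix like this can be computed directly with pandas. The sketch below uses invented toy data with hypothetical column names, so the numbers say nothing about the real churn dataset.

```python
import pandas as pd

# Toy data: values and column names are made up for this example.
df = pd.DataFrame({
    "monthly_charge": [20.0, 75.5, 30.0, 80.0, 25.0, 90.0, 35.0, 70.0],
    "tenure_months": [40, 3, 36, 5, 50, 2, 24, 8],
    "churn": [0, 1, 0, 1, 0, 1, 0, 1],
})

# Pairwise Pearson correlation between all numeric columns
corr = df.corr(method="pearson")

# Rank the predictors by the strength of their correlation with the target
strength = corr["churn"].drop("churn").abs().sort_values(ascending=False)

print(corr.round(2))
print(strength.round(2))
```

Keep in mind that Pearson correlation only captures linear relationships, so a low value does not rule out a dependency between a feature and churn.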

Inspiration / motivation

  • flexible integration
  • saves a lot of time otherwise spent searching for commands
  • speeds up the data exploration
  • live code export functionality

I think I have given you some pretty good reasons to get started with bamboolib or at least a motivation to try out this amazing tool!

A massive shout-out to the bamboolib team.

Thank you for reading this article, I hope it was helpful to you!
