Member-only story
An Extensive Step by Step Guide to Exploratory Data Analysis
My personal guide to performing EDA for any dataset

Be sure to subscribe here or to my exclusive newsletter to never miss another article on data science guides, tricks and tips, life lessons, and more!
What is Exploratory Data Analysis?
Exploratory Data Analysis (EDA), also known as Data Exploration, is a step in the Data Analysis Process, where a number of techniques are used to better understand the dataset being used.
‘Understanding the dataset’ can refer to a number of things including but not limited to…
- Extracting important variables and leaving behind useless variables
- Identifying outliers, missing values, or human error
- Understanding the relationship(s), or lack of, between variables
- Ultimately, maximizing your insights of a dataset and minimizing potential error that may occur later in the process
Here’s why this is important.
Have you heard of the phrase, “garbage in, garbage out”?
With EDA, it’s more like, “garbage in, perform EDA, possibly garbage out.”
By conducting EDA, you can turn an almost useable dataset into a completely useable dataset. I’m not saying that EDA can magically make any dataset clean — that is not true. However, many EDA techniques can remedy some common problems that are present in every dataset.
Exploratory Data Analysis does two main things:
1. It helps clean up a dataset.
2. It gives you a better understanding of the variables and the relationships between them.
Components of EDA
To me, there are main components of exploring data:
- Understanding your variables
- Cleaning your dataset
- Analyzing relationships between variables
In this article, we’ll take a look at the first two components.