TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

An Extensive Step by Step Guide to Exploratory Data Analysis

My personal guide to performing EDA for any dataset

Terence Shin, MSc, MBA
Published in TDS Archive
9 min read · Jan 12, 2020

What is Exploratory Data Analysis?

Exploratory Data Analysis (EDA), also known as Data Exploration, is a step in the data analysis process in which a number of techniques are used to better understand the dataset at hand.

‘Understanding the dataset’ can refer to a number of things including but not limited to…

  • Extracting important variables and leaving behind useless variables
  • Identifying outliers, missing values, or human error
  • Understanding the relationship(s), or lack thereof, between variables
  • Ultimately, maximizing your insight into a dataset and minimizing potential errors that may occur later in the process
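
To make the points above concrete, here is a minimal sketch using pandas. The DataFrame, its column names, and its values are made up for illustration, and the 1.5 × IQR rule is just one common convention for flagging outliers, not necessarily the one used later in this article.

```python
import pandas as pd

# Hypothetical toy dataset; column names and values are invented for illustration.
df = pd.DataFrame({
    "price": [12000, 15000, 14000, 13500, 950000, 12500],
    "odometer": [45000, None, 52000, 61000, 48000, None],
    "constant_flag": [1, 1, 1, 1, 1, 1],
})

# Missing values: count the empty cells in each column.
missing_counts = df.isna().sum()

# Useless variables: a column with a single distinct value carries no information.
useless_columns = [col for col in df.columns if df[col].nunique(dropna=False) <= 1]

# Outliers: flag prices outside 1.5 * IQR of the middle 50% (an assumed rule of thumb).
q1 = df["price"].quantile(0.25)
q3 = df["price"].quantile(0.75)
iqr = q3 - q1
outliers = df[(df["price"] < q1 - 1.5 * iqr) | (df["price"] > q3 + 1.5 * iqr)]

# Relationships: pairwise correlation between the numeric variables.
correlations = df[["price", "odometer"]].corr()

print(missing_counts)
print(useless_columns)
print(outliers)
print(correlations)
```

In this toy data, the 950000 price is far more likely to be a data-entry error than a real listing, which is the kind of human error EDA is meant to surface.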

Here’s why this is important.

Have you heard of the phrase, “garbage in, garbage out”?

With EDA, it’s more like, “garbage in, perform EDA, possibly garbage out.”

By conducting EDA, you can turn an almost usable dataset into a completely usable dataset. I’m not saying that EDA can magically make any dataset clean; that is not true. However, many EDA techniques can remedy common problems that are present in most datasets.

Exploratory Data Analysis does two main things:

1. It helps clean up a dataset.

2. It gives you a better understanding of the variables and the relationships between them.

Components of EDA

To me, there are three main components of exploring data:

  1. Understanding your variables
  2. Cleaning your dataset
  3. Analyzing relationships between variables

In this article, we’ll take a look at the first two components.
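
As a rough preview of what those first two components can look like in practice, here is a minimal sketch assuming pandas. The dataset and its column names are hypothetical, and the cleaning choices (dropping duplicate rows, empty columns, and rows missing a key field) are illustrative assumptions rather than a fixed recipe.

```python
import pandas as pd

# Hypothetical toy dataset; column names and values are invented for illustration.
df = pd.DataFrame({
    "id": [1, 2, 2, 3, 4],
    "condition": ["good", "fair", "fair", "good", None],
    "price": [12000, 8500, 8500, 13000, 9900],
    "notes": [None, None, None, None, None],
})

# 1. Understanding your variables
n_rows, n_cols = df.shape      # size of the dataset
dtypes = df.dtypes             # data type of each column
unique_counts = df.nunique()   # distinct non-missing values per column
summary = df.describe()        # count, mean, std, min, quartiles, max for numeric columns

# 2. Cleaning your dataset
cleaned = (
    df.drop_duplicates()                     # remove exact duplicate rows
      .dropna(axis="columns", how="all")     # remove columns that are entirely empty
      .dropna(subset=["condition"])          # remove rows missing a key field (assumed choice)
      .reset_index(drop=True)
)

print(n_rows, n_cols)
print(unique_counts)
print(cleaned)
```

Whether to drop rows with missing values or fill them in depends on the dataset; dropping is used here only because it is the simplest option to show.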
