
2.5M+ views | Data Scientist | MSc Analytics & MBA student | https://terenceshin.com/

Learn one of the most practical data science concepts

Photo by Florian Schmetz on Unsplash

Table of Contents

  1. Introduction
  2. What is Feature Importance
  3. Why is Feature Importance so Useful?
  4. Feature Importance in Python
  5. Feature Importance with Gradio

Introduction

With all of the packages and tools available, building a machine learning model isn’t difficult. However, building a good machine learning model is another story.

If you think that machine learning simply involves throwing hundreds of columns of data into a notebook and using scikit-learn to build a model, think again.

One thing that is often overlooked is selecting the appropriate features for these models. Useless data introduces bias that distorts the final results of our machine learning…
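Feature importance can be estimated in several ways. One model-agnostic option is permutation importance: shuffle a single feature column and measure how much the model's error grows. The sketch below is not the article's code. It assumes a model that has already been trained, and `model` here is a hypothetical stand-in function that relies on the first feature and ignores the second, so the second feature's importance should come out as zero.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]


def mse(model: Callable[[Row], float], X: List[Row], y: List[float]) -> float:
    """Mean squared error of the model's predictions."""
    return sum((model(row) - target) ** 2 for row, target in zip(X, y)) / len(y)


def permutation_importance(model, X, y, seed: int = 0) -> List[float]:
    """Increase in MSE after shuffling each feature column (one shuffle per feature)."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)
        # Rebuild the rows with only feature j shuffled
        X_perm = [list(row[:j]) + [value] + list(row[j + 1:]) for row, value in zip(X, column)]
        importances.append(mse(model, X_perm, y) - baseline)
    return importances


# Hypothetical "trained" model: uses feature 0, ignores feature 1
def model(row: Row) -> float:
    return 3.0 * row[0]


X = [[float(i), float(i % 3)] for i in range(20)]
y = [3.0 * row[0] for row in X]
print(permutation_importance(model, X, y))
```

In practice you would average over several shuffles; scikit-learn offers this as `sklearn.inspection.permutation_importance` (with an `n_repeats` parameter), and tree-based models also expose a `feature_importances_` attribute.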


Comparing several web UI tools for data science!

Photo by Cesar Carlevarino Aragon on Unsplash

Introduction

Machine learning models are exciting and powerful, but they aren’t very useful by themselves. Once a model is complete, it likely has to be deployed before it can deliver any sort of value. In addition, being able to deploy a preliminary model or prototype to get feedback from stakeholders is extremely useful.

Recently, several tools have emerged that data scientists can use to quickly and easily deploy a machine learning model. …


What You Should Learn and How You Can Learn Them

Photo by David Clode on Unsplash

Learning data science can be overwhelming. There are hundreds of tools and resources out there and it’s not always obvious what tools you should be focusing on or what you should learn.

The short answer is that you should learn what you enjoy because data science offers a wide range of skills and tools. That being said, I wanted to share with you what I believe are the top 10 Python libraries that are most commonly used in data science.

Here are the Top 10 Python Libraries for Data Science:

1. Pandas


An updated resource to brush up your statistics knowledge for your interview!

Photo by Edge2Edge Media on Unsplash

Introduction

You’ve probably heard me say this a million times, but a data scientist is really a modern term for a statistician, and machine learning is a modern term for statistics.

And because statistics is so important, Nathan Rosidi, founder of StrataScratch, and I collaborated to write OVER 50 statistics interview questions and answers. You can check out his website here!

With that said, let’s dive right into it!

Q: When should you use a t-test vs a z-test?

A Z-test is a hypothesis test whose test statistic, the z-statistic, follows a normal distribution under the null hypothesis. …
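A common rule of thumb: use a z-test when the population standard deviation σ is known (or the sample is large), and a t-test when σ has to be estimated from a small sample, in which case the statistic follows a t distribution with n − 1 degrees of freedom. The sketch below uses only the standard library and computes both one-sample statistics side by side; the sample values, μ0 = 5.0 and σ = 0.4 are made-up assumptions for illustration.

```python
import math
import statistics
from statistics import NormalDist


def z_statistic(sample, mu0, sigma):
    """One-sample z-statistic: the population standard deviation sigma is known."""
    n = len(sample)
    return (statistics.mean(sample) - mu0) / (sigma / math.sqrt(n))


def t_statistic(sample, mu0):
    """One-sample t-statistic: sigma is estimated by the sample standard deviation."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return (statistics.mean(sample) - mu0) / (s / math.sqrt(n))


sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 5.3, 5.7]  # hypothetical measurements
z = z_statistic(sample, mu0=5.0, sigma=0.4)
t = t_statistic(sample, mu0=5.0)

# Two-sided p-value for the z-test from the standard normal distribution
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z = {z:.3f}, two-sided p = {p_two_sided:.4f}")
print(f"t = {t:.3f} with {len(sample) - 1} degrees of freedom")
```

The standard library has no t distribution, so the t-statistic would be compared against a t table or a function such as `scipy.stats.t.sf`.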


A modern-day metric that addresses the number one problem of Pearson’s correlation

Photo by Coffee Geek on Unsplash

Table of Contents

  1. Introduction
  2. What is Distance Correlation?
  3. Mathematics behind Distance Correlation
  4. Implementing Distance Correlation in Python

Introduction

I think we can agree that one of the most commonly used measures in business is correlation, more specifically, Pearson’s correlation.

To recap, Pearson’s correlation measures the strength of the linear relationship between two variables, and that in itself is already a problem because there are MANY relationships that are not linear.

For example, you might conclude that variable X and revenue are unrelated because their Pearson correlation is close to zero, when in fact they are related, just not linearly.

And this is where distance correlation comes…
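To make the problem concrete, the sketch below compares Pearson’s correlation with distance correlation on y = x², a relationship that is perfectly dependent but not linear. It is a minimal pure-Python implementation of the (biased, V-statistic) sample distance correlation defined by Székely, Rizzo and Bakirov, not the article’s code: each variable’s pairwise distance matrix is double-centered, and the distance covariance is the mean of the element-wise product of the two centered matrices. The data are made up for illustration.

```python
import math
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson's correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def double_centered_distances(v: List[float]) -> List[List[float]]:
    """|v_i - v_j| minus its row and column means, plus the grand mean."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    means = [sum(row) / n for row in d]  # row means equal column means (d is symmetric)
    grand = sum(means) / n
    return [[d[i][j] - means[i] - means[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(x: List[float], y: List[float]) -> float:
    """Sample distance correlation (V-statistic form)."""
    n = len(x)
    A, B = double_centered_distances(x), double_centered_distances(y)

    def mean_product(P, Q):
        return sum(P[i][j] * Q[i][j] for i in range(n) for j in range(n)) / (n * n)

    dcov2 = mean_product(A, B)
    dvar2_x, dvar2_y = mean_product(A, A), mean_product(B, B)
    if dvar2_x == 0 or dvar2_y == 0:
        return 0.0  # a constant variable: distance correlation taken as 0
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar2_x * dvar2_y))


x = [i / 10 for i in range(-10, 11)]  # symmetric around 0
y = [v ** 2 for v in x]               # fully dependent on x, but not linearly
print(f"Pearson:              {pearson(x, y):.4f}")
print(f"Distance correlation: {distance_correlation(x, y):.4f}")
```

Pearson’s correlation comes out at essentially zero here, while distance correlation is clearly positive; in the population, distance correlation is zero only when the two variables are independent. The double loop is O(n²) in time and memory, which is fine for a demonstration but not for large datasets.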


#19. Learning how to set expectations will make a big difference in how “successful” you are in your career.

Photo by Clark Tibbs on Unsplash

In this article, I’m going to share with you 21 pieces of advice that I’ve learned from other data scientists and through my own experiences over the past few years.

Depending on how far you are into your career, some of these tips will definitely speak to you more than others. For example, “Take some time to discover and explore new libraries and packages” might not be as relevant for someone who is just starting off.

With that said, let’s dive right into it!

1. The simplest solution is often the best solution.

Being a data scientist doesn’t mean that you have to solve every problem with a machine…


Goodbye ETL & ELT, Hello dbt!

Image by Peter H from Pixabay

Every day, petabytes and petabytes of data are collected, processed, and stored for a vast range of analytical purposes all across the world. Without pipelines to collect this data and use it properly, large-scale data science simply wouldn’t be possible. Traditionally, one of two processes, ETL (extract, transform, load) or ELT (extract, load, transform), was used to grab large amounts of data, pick out the bits that mattered, and load them into a data lake or data store. …
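The difference between the two processes is mostly where the transform step happens. The sketch below is a toy illustration, not a dbt example: Python’s built-in `sqlite3` in-memory database stands in for a data warehouse, and the `raw_orders` records and table names are hypothetical. ETL cleans the data in the pipeline before loading it; ELT loads the raw data first and then transforms it with SQL inside the database, which is the style dbt builds on, since dbt models are SQL `SELECT` statements run in the warehouse.

```python
import sqlite3

# Hypothetical source records: (order_id, amount_in_cents, status)
raw_orders = [
    (1, 1250, "complete"),
    (2, 990, "cancelled"),
    (3, 4000, "complete"),
]


def etl(conn: sqlite3.Connection) -> None:
    """ETL: transform in the pipeline, then load only the cleaned result."""
    cleaned = [(oid, cents / 100) for oid, cents, status in raw_orders if status == "complete"]
    conn.execute("CREATE TABLE orders_etl (order_id INTEGER, amount_usd REAL)")
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?)", cleaned)


def elt(conn: sqlite3.Connection) -> None:
    """ELT: load the raw data as-is, then transform inside the database with SQL."""
    conn.execute("CREATE TABLE raw_orders (order_id INTEGER, amount_cents INTEGER, status TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_orders)
    conn.execute(
        """CREATE VIEW orders_elt AS
           SELECT order_id, amount_cents / 100.0 AS amount_usd
           FROM raw_orders
           WHERE status = 'complete'"""
    )


conn = sqlite3.connect(":memory:")
etl(conn)
elt(conn)
print(conn.execute("SELECT * FROM orders_etl ORDER BY order_id").fetchall())
print(conn.execute("SELECT * FROM orders_elt ORDER BY order_id").fetchall())
```

Both paths end with the same cleaned orders, but in the ELT version the raw table is still available in the database, so the transformation can be changed and re-run later without extracting the data again.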


Develop a deeper understanding of one of the most popular machine learning models

Image by Gerd Altmann from Pixabay

Support Vector Machines (SVMs) are one of the most popular machine learning models in the data science world. Intuitively, the concept is rather simple. Mathematically speaking, however, support vector machines can seem like a black box.

In this article, I have two goals:

  1. I want to demystify the mechanics underlying support vector machines and give you a better understanding of their overall logic.
  2. I want to teach you how to implement a simple SVM in Python and deploy it using Gradio. By the end, you’ll be able to build something like this:
GIF Created by Author

With that said, let’s…
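The core mechanics of a soft-margin linear SVM can be sketched as an optimization problem: find weights w and a bias b that minimize (λ/2)‖w‖² + (1/n) Σ max(0, 1 − yᵢ(w·xᵢ + b)), where labels yᵢ are +1 or −1. The hinge term penalizes points that fall inside the margin or on the wrong side, and the regularization term keeps the margin wide. The sketch below is not the article’s implementation: it fits this objective with plain full-batch subgradient descent in pure Python, whereas libraries such as scikit-learn’s `SVC` solve the problem with a dedicated solver (libsvm). The hyperparameters `lam`, `lr` and `epochs` and the toy data are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(
    X: List[Sequence[float]], y: List[int], lam: float = 0.01, lr: float = 0.1, epochs: int = 200
) -> Tuple[List[float], float]:
    """Soft-margin linear SVM fitted by subgradient descent on the regularized hinge loss."""
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of (lam / 2) * ||w||^2
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # point is inside the margin or misclassified
                for j in range(n_features):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: List[float], b: float, x: Sequence[float]) -> int:
    """Classify by which side of the hyperplane w·x + b = 0 the point falls on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical, linearly separable toy data
X = [[2.0, 3.0], [3.0, 3.5], [3.5, 4.0], [-2.0, -1.0], [-3.0, -2.5], [-1.5, -3.0]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print("weights:", w, "bias:", b)
print("predictions:", [predict(w, b, x) for x in X])
```

Only the points with a margin below 1 contribute to the hinge gradient; these are the points that end up determining the decision boundary, which is the intuition behind the name “support vectors”.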


An essential data analysis tool for understanding your customers’ behavior

Introduction

Understanding your customers and their behavior is the key to any successful startup, and that is exactly what cohort analyses are for. A cohort analysis is an extremely useful tool that allows you to gather insights on customer churn, lifetime value, product engagement, stickiness, and more.

Cohort analyses are especially useful for improving user onboarding, product development, and marketing tactics. What makes them so powerful is that they’re essentially a 3-dimensional visualization: you can compare a value or metric across different segments over time.
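A retention table is one common form of cohort analysis: group users by the month of their first activity (their cohort), then, for each following month, compute the share of that cohort that was still active. The sketch below is pure Python and not the article’s code; the `(user_id, "YYYY-MM")` event format and the toy events are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Activity = Tuple[str, str]  # (user_id, "YYYY-MM") -- hypothetical event format


def month_index(ym: str) -> int:
    """Convert "YYYY-MM" to a running month count so months can be subtracted."""
    year, month = map(int, ym.split("-"))
    return year * 12 + (month - 1)


def cohort_retention(events: List[Activity]) -> Dict[str, Dict[int, float]]:
    """Share of each cohort active N months after its first activity month."""
    first_seen: Dict[str, int] = {}
    for user, ym in events:
        m = month_index(ym)
        if user not in first_seen or m < first_seen[user]:
            first_seen[user] = m

    # cohort label -> months since first activity -> set of active users
    active: Dict[str, Dict[int, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for user, ym in events:
        start = first_seen[user]
        label = f"{start // 12}-{start % 12 + 1:02d}"
        active[label][month_index(ym) - start].add(user)

    table = {}
    for label, by_offset in sorted(active.items()):
        size = len(by_offset[0])  # every user is active in month 0 of their own cohort
        table[label] = {offset: len(users) / size for offset, users in sorted(by_offset.items())}
    return table


events = [
    ("a", "2021-01"), ("a", "2021-02"), ("a", "2021-03"),
    ("b", "2021-01"), ("b", "2021-03"),
    ("c", "2021-02"), ("c", "2021-03"),
    ("d", "2021-02"),
]
for cohort, row in cohort_retention(events).items():
    print(cohort, row)
# 2021-01 {0: 1.0, 1: 0.5, 2: 1.0}
# 2021-02 {0: 1.0, 1: 0.5}
```

Each row is a segment (the cohort), each column is time since signup, and each cell is the metric (here retention), which is exactly the three dimensions described above. With pandas, the same table is usually built with `groupby` and `pivot_table` and then plotted as a heatmap.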

By the end of this article, you’ll learn how to create something like this:

[Image: example cohort analysis chart]

If you’re…
