With all of the packages and tools available, building a machine learning model isn’t difficult. However, building a good machine learning model is another story.
If you think that machine learning simply involves throwing hundreds of columns of data into a notebook and using scikit-learn to build a model, think again.
One step that is often overlooked is selecting the appropriate features for these models. Irrelevant features add noise and bias that distort the final results of our machine learning…
Machine learning models are exciting and powerful, but they aren’t very useful by themselves. Once a model is complete, it likely has to be deployed before it can deliver any sort of value. As well, being able to deploy a preliminary model or a prototype to get feedback from other stakeholders is extremely useful.
Recently, several tools have emerged that Data Scientists can use to quickly and easily deploy a machine learning model. …
Learning data science can be overwhelming. There are hundreds of tools and resources out there and it’s not always obvious what tools you should be focusing on or what you should learn.
The short answer is that you should learn what you enjoy because data science offers a wide range of skills and tools. That being said, I wanted to share with you what I believe are the top 10 Python libraries that are most commonly used in data science.
With that said, here are the Top 10 Python Libraries for Data Science:
You’ve probably heard me say this a million times, but a data scientist is really a modern term for a statistician and machine learning is a modern term for statistics.
With that said, let’s dive right into it!
A Z-test is a hypothesis test with a normal distribution that uses a z-statistic. …
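As a quick illustration, here's how a two-sided one-sample z-test might look in Python. Everything here is made up for the example: the sample data, the claimed population mean of 100, and the known population standard deviation of 15.

```python
import numpy as np
from scipy import stats

# Hypothetical setup: test whether a sample mean differs from a
# claimed population mean of 100, with known population std of 15.
rng = np.random.default_rng(42)
sample = rng.normal(loc=103, scale=15, size=50)

pop_mean, pop_std = 100, 15

# z-statistic: (sample mean - claimed mean) / standard error
z = (sample.mean() - pop_mean) / (pop_std / np.sqrt(len(sample)))

# Two-sided p-value from the standard normal distribution
p_value = 2 * stats.norm.sf(abs(z))

print(f"z = {z:.3f}, p = {p_value:.4f}")
```

The same result can be obtained with `statsmodels.stats.weightstats.ztest`; the manual version just makes the z-statistic formula explicit.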
I think we can agree that one of the most commonly used measures in business is correlation, more specifically, Pearson’s correlation.
To recap, correlation measures the linear relationship between two variables, and that in itself is already a problem because there are MANY relationships that are not linear.
And so, for the sake of an example, you might conclude that variable X and revenue are not correlated, when they in fact are, just not linearly.
And this is where distance correlation comes…
In this article, I’m going to share with you 21 pieces of advice that I’ve learned from other data scientists and through my own experiences over the past few years.
Depending on how far you are into your career, some of these tips will definitely speak to you more than others. For example, “Take some time to discover and explore new libraries and packages” might not be as relevant for someone who is just starting off.
With that said, let’s dive right into it!
Being a data scientist doesn’t mean that you have to solve every problem with a machine…
Every day, petabytes of data are collected, operated on, and stored for a vast range of analytical purposes all across the world. Without pipelines to move this data and use it properly, large-scale data science simply wouldn’t be possible. Traditionally, one of two processes, dubbed ETL and ELT, was used to grab large amounts of data, pick apart the bits that mattered, and then load these into a data lake or data store. …
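As a rough sketch of the "extract, transform, load" shape, here is a toy pipeline in Python. The CSV data, table name, and cleaning rules are all invented for illustration; a real pipeline would swap SQLite for a proper warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw source data (extract step would normally pull this
# from an API, file drop, or operational database).
raw_csv = """user_id,amount,country
1,19.99,CA
2,,US
3,42.50,CA
"""

# Extract
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Transform: drop records with missing amounts, cast types
clean = [
    {"user_id": int(r["user_id"]),
     "amount": float(r["amount"]),
     "country": r["country"]}
    for r in rows
    if r["amount"]
]

# Load into the target store
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (user_id INTEGER, amount REAL, country TEXT)")
conn.executemany("INSERT INTO sales VALUES (:user_id, :amount, :country)", clean)

total = conn.execute("SELECT COUNT(*), SUM(amount) FROM sales").fetchone()
print(total)
```

In ELT, the transform step would instead run inside the target store, after loading the raw rows as-is.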
Support Vector Machines (SVMs) are one of the most popular machine learning models in the data science world. Intuitively, it’s a rather simple concept. Mathematically speaking, however, support vector machines can seem like a black box.
In this article, I have two goals:
With that said, let’s…
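Before getting into the math, a minimal scikit-learn sketch shows just how simple SVMs are to use in practice. The blob dataset and the parameter choices here are purely illustrative.

```python
# Fit a linear SVM on toy two-class data and inspect the
# margin-defining support vectors.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(X, y)

print("training accuracy:", clf.score(X, y))
print("support vectors per class:", clf.n_support_)
```

Only the support vectors, the few points closest to the decision boundary, determine the fitted model; the rest of the training data could be discarded without changing the boundary.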
Understanding your customers and their behaviors is central to any successful startup, which is exactly what cohort analyses are for. A Cohort Analysis is an extremely useful tool that allows you to gather insights pertaining to customer churn, lifetime value, product engagement, stickiness, and more.
Cohort analyses are especially useful for improving user onboarding, product development, and marketing tactics. What makes cohort analyses so powerful is that they’re essentially a 3-dimensional visualization, where you can compare a value/metric across different segments over time.
By the end of this article, you’ll learn how to create something like this:
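To preview the mechanics, here is one way a cohort retention table can be built in pandas. The order log below is made up, and the `cohort`/`period` column names are my own: each user's cohort is the month of their first order, and each cell is the share of that cohort still active N months later.

```python
import pandas as pd

# Hypothetical order log: which month each user placed an order.
df = pd.DataFrame({
    "user_id":     [1, 1, 1, 2, 2, 3, 3, 3, 4],
    "order_month": ["2021-01", "2021-02", "2021-03",
                    "2021-01", "2021-03",
                    "2021-02", "2021-03", "2021-04",
                    "2021-02"],
})
df["order_month"] = pd.PeriodIndex(df["order_month"], freq="M")

# Each user's cohort = month of their first order
df["cohort"] = df.groupby("user_id")["order_month"].transform("min")
# Months elapsed since that first order
df["period"] = (df["order_month"] - df["cohort"]).apply(lambda d: d.n)

# Rows: cohorts, columns: months since first order, values: active users
cohort_counts = (
    df.groupby(["cohort", "period"])["user_id"]
      .nunique()
      .unstack(fill_value=0)
)
# Normalize each row by cohort size to get retention rates
retention = cohort_counts.div(cohort_counts[0], axis=0)
print(retention)
```

The resulting table is exactly the grid behind the classic cohort heatmap: one row per signup cohort, one column per month of age.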