How to become a data scientist? Here are 3 tips to get started!

‘Data Scientist’ is one of the most in-demand, futuristic, fancy, and high-paying career options today. This leads directly to the question: how do you become a data scientist? By the end of this post, you will be one step closer to becoming one.

The internet is full of blogs and videos that provide a decent guide to the skills required to become a data scientist. However, hardly any resources emphasize the basic skills that set you up to become a good data scientist.

Did you know that many data scientists start their careers as data analysts or data engineers? Without getting into a discussion about titles, naming, and roles, all I want to say is that there is a basic set of skills you need as you embark on your journey to becoming a data scientist.

Let’s get right into the 3 core skills, which are required to become a ‘good’ data scientist:

  • Learn to read/understand the numbers
  • Contextualize the data (numbers)
  • Connect the data points to tell a story

Learn to read/understand the numbers

As a data scientist, one of the most basic requirements is the ability to read and understand numbers correctly. It is important to know what the numbers mean and how to interpret them.

A mean (average) is NOT the same as a median. The mean is pulled toward extreme values, while the median is simply the middle value, so each statistic tells a different story about the same data.
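To make this concrete, here is a small Python sketch using made-up salary figures (purely illustrative, not real data). A single very high value drags the mean far above what a typical team member earns, while the median stays put.

```python
from statistics import mean, median

# Hypothetical annual salaries (in thousands) for a team of seven;
# one executive salary is far above the rest.
salaries = [40, 42, 45, 47, 50, 52, 400]

avg = mean(salaries)    # pulled upward by the single outlier
mid = median(salaries)  # middle value of the sorted list, unaffected by the outlier

print(f"mean:   {avg:.1f}")  # mean:   96.6
print(f"median: {mid}")      # median: 47
```

Reporting only the mean here would suggest a "typical" salary about twice what most of the team actually earns.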

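A change in percentage is NOT the same as a change in percentage points. If a rate rises from 10% to 12%, it has gone up by 2 percentage points, but by 20% in relative terms. Knowing the difference, and which one to use when, becomes extremely useful when presenting a business situation.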
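A minimal Python sketch of the two calculations, using a hypothetical rate that moves from 10% to 12%. The helper names `change_in_points` and `relative_change_pct` are illustrative, not from any library.

```python
def change_in_points(old_pct: float, new_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return new_pct - old_pct


def relative_change_pct(old_pct: float, new_pct: float) -> float:
    """How much the rate itself changed, relative to its old value, in %."""
    return (new_pct - old_pct) / old_pct * 100


# Hypothetical conversion rate moving from 10% to 12%.
old_rate, new_rate = 10.0, 12.0
print(f"{change_in_points(old_rate, new_rate):.1f} percentage points")      # 2.0 percentage points
print(f"{relative_change_pct(old_rate, new_rate):.1f}% relative increase")  # 20.0% relative increase
```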

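A Compound Annual Growth Rate (CAGR) is NOT the same as a Total Growth Rate. CAGR is the constant yearly rate that would take a value from its starting point to its ending point, smoothing out year-to-year swings, while the total growth rate shows the overall change across the whole period.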
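The following Python sketch compares the two growth measures on a hypothetical revenue series; `total_growth` and `cagr` are illustrative helper names. Both measures depend only on the first and last values, so the dip in year 2 does not show up in either of them.

```python
def total_growth(start: float, end: float) -> float:
    """Overall growth across the whole period, as a fraction (1.0 == 100%)."""
    return end / start - 1


def cagr(start: float, end: float, years: int) -> float:
    """Constant yearly rate that would turn `start` into `end` over `years` years."""
    return (end / start) ** (1 / years) - 1


# Hypothetical yearly revenue, including a dip in year 2.
revenue = [100.0, 150.0, 90.0, 200.0]
start, end = revenue[0], revenue[-1]
years = len(revenue) - 1  # 3 yearly steps between 4 data points

print(f"total growth: {total_growth(start, end):.1%}")  # total growth: 100.0%
print(f"CAGR:         {cagr(start, end, years):.1%}")   # CAGR:         26.0%
```

Growing 100 by about 26% a year for three years gets you to 200, the same end point as the "doubling" reported by total growth.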

Contextualize the data (numbers)

Presenting numbers with their correct interpretation is only half the job done. Without the right context, numbers alone don't mean much, and your audience will disconnect from them.

Let’s say you are presenting a fact about the Mariana Trench, the deepest point on Earth. The pressure at the bottom of the Mariana Trench is ~15,750 PSI (pounds per square inch), or ~1,086 bar. What do these numbers mean? How can we land them properly with our audience?

So we decide to simplify this further: the pressure at the bottom of the Mariana Trench is ~1,000x the pressure we feel at sea level. However, the audience is still not able to connect with this number. The audience is like, ‘Ahhh! But I do not feel any pressure at sea level.’
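The unit conversions behind these figures can be checked with a few lines of Python. This sketch assumes the standard approximate factors 1 psi ≈ 0.0689476 bar and 1 standard atmosphere ≈ 14.696 psi; the ratio comes out at roughly 1,070x, which the text rounds to ~1,000x.

```python
# Approximate conversion factors (assumed standard values).
PSI_TO_BAR = 0.0689476   # 1 psi is about 0.0689476 bar
SEA_LEVEL_PSI = 14.696   # 1 standard atmosphere is about 14.696 psi

trench_psi = 15_750  # approximate pressure at the bottom of the Mariana Trench

trench_bar = trench_psi * PSI_TO_BAR
times_sea_level = trench_psi / SEA_LEVEL_PSI

print(f"{trench_bar:,.0f} bar")              # 1,086 bar
print(f"{times_sea_level:,.0f}x sea level")  # 1,072x sea level
```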

At this point, it becomes really important to put the numbers into context so that our audience can relate them to real life: the pressure at the bottom of the Mariana Trench feels like 100 adult elephants standing on your head.

Bravo! Now the numbers land well with the audience, and they can easily grasp the magnitude of the pressure at the bottom of the Mariana Trench.

Connect the data points to tell a story

Looking at numbers in isolation leads to false conclusions and wrong decisions! Hence, it is very important to connect various data points to see the full picture. And a full picture woven into a story that flows seamlessly is the key to becoming a ‘good’ data scientist.

To understand this better, let’s look at some numbers in the context of the pandemic! Hopefully, by now, understanding and contextualizing these numbers will be easy for you.

Let’s say we have 2 countries to compare: Country A and Country B. The task is to recommend which country has a lower risk of infection. Below are some data points to consider, connect, and build into a story of the situation:

| Country | Population |
|---|---|
| Country A | 1,000,000 |
| Country B | 2,000,000 |

Table 1

The data in Table 1 tells us that Country B is twice as populous as Country A.

| Country | Population | # of Cases |
|---|---|---|
| Country A | 1,000,000 | 100,000 |
| Country B | 2,000,000 | 130,000 |

Table 2

The ‘# of Cases’ data in Table 2 indicates that Country B has only 30% more cases, even though its population is twice that of Country A. Can we really conclude that Country B has a lower risk of infection? Probably not.

| Country | Population | # of Cases | # of Tests |
|---|---|---|---|
| Country A | 1,000,000 | 100,000 | 500,000 |
| Country B | 2,000,000 | 130,000 | 600,000 |

Table 3

Adding the number of tests in Table 3 gives us a broader perspective on the real situation. Country B has conducted only 20% more tests than Country A, even though its population is 2x larger. This changes our conclusion to a large extent; however, we still need a few clear data points to drive our point home.

At this point, we can calculate 2 more metrics that will crystallize our story and conclusion. First, the number of tests per 100 people shows the testing capacity of each country. Second, the % positivity rate shows the share of positive samples out of the total tests conducted in the country.

| Country | Population | # of Cases | # of Tests | Tests per 100 people | % Positivity Rate |
|---|---|---|---|---|---|
| Country A | 1,000,000 | 100,000 | 500,000 | 50 | 20% |
| Country B | 2,000,000 | 130,000 | 600,000 | 30 | 22% |

Table 4

The data in Table 4 tells a very clear story. Country B tests significantly less than Country A (30 vs 50 tests per 100 people), and its positivity rate is higher (22% vs 20%: 2 percentage points, or about 10% higher in relative terms). If Country B ramped up its testing to match Country A’s rate, it would run about 1,000,000 tests and, assuming the positivity rate held at ~22%, would find roughly 220,000 cases. In other words, Country B’s current case count is understated because of its lower testing rate.
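The projection for Country B can be reproduced as follows. This Python sketch rests on one big assumption, flagged in the code: that Country B’s positivity rate would stay the same if it tested more people, which in practice it may not. With the unrounded rate (about 21.7%) the projection is roughly 216,700 cases; with the rounded 22% it is 220,000.

```python
population_b = 2_000_000
cases_b, tests_b = 130_000, 600_000
tests_per_100_a = 50  # Country A's testing rate (Table 4)

positivity_b = cases_b / tests_b                        # ~0.217 (shown as 22%)
matched_tests_b = population_b * tests_per_100_a / 100  # 1,000,000 tests

# ASSUMPTION: positivity stays constant as testing scales up.
projected_exact = matched_tests_b * positivity_b
projected_rounded = matched_tests_b * 0.22  # using the rounded 22% figure

print(f"{projected_exact:,.0f}")    # 216,667
print(f"{projected_rounded:,.0f}")  # 220,000
```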

Now we can confidently recommend Country A as the country with the lower risk of infection.

Conclusion

At the risk of oversimplifying, I have used a simple example for each core skill to explain what it entails. To me, these skills are as simple as I have described, yet they are grossly underrated when it comes to becoming a data scientist.

Improving on the skills listed above will position you well for a meaningful career in data science. The other skill sets required to become a data scientist are like satellite skills that revolve around these core skills.

Kindly share your views, feedback, and questions in the comment section below. If you liked this post, consider signing up for thechiverse community so that you can be part of a weekly newsletter that I intend to start at a later time. Thank you!
