“A Data Scientist is a person who is better at statistics than any software engineer and better at software engineering than any statistician.” – Josh Wills
Data Scientists convert raw data into valuable information for businesses. For this, they possess knowledge in many different areas including software development, data munging, databases, mathematics, statistics, machine learning and data visualization.
Let me give you an example for better understanding – Streaming companies such as Amazon and Netflix give a lot of importance to viewership data when they are commissioning new shows. What genres are doing well, what length is perfect for a sitcom, how many episodes should there be per season; all these decisions are taken based on existing data.
Also, one thing that should be clear in your mind from the beginning is the difference between Data Scientists and Data analysts – In simple terms, data scientists build the tools and algorithms that can be used to make sense of data, including big data. For this, they utilize technology, machine learning and mathematical principles.
On the other hand, data analysts apply these models to analyze business data of all kinds to help make smarter business decisions. Even the use of Excel by businesses falls under the purview of data analysis. A “big data” analyst simply works with amounts of data too large to be processed by traditional tools like Excel.
This post is about Data Scientists. If you want to read about data analysts instead, here’s How to become a Data Analyst in India.
Job Profile (A Day in the life of a Data Scientist)
Your role as a Data Scientist would most probably be a combination of product (web) development, applied machine learning (ML), NLP, server administration, as well as people management.
The primary area of work is in the field of data science (duh!), that is, managing and utilizing data to further improve the products and services of the company. An average day involves writing code (at times closed source, at times open source), getting it reviewed and merged, and ensuring it works fine when deployed. This code could be for web development or ML, depending on the needs of the day.
After a few years of experience, you would also help in hiring talent for the organization apart from taking leadership initiatives. The day, at that level, usually involves a few meetings to coordinate with team members on a specific task as well as interviewing someone when you become a team leader.
Work-life balance is good (frankly, it’s among the few professions where you can decide how your day goes). Money is great, and salaries are expected to grow exponentially.
Salary of a Data Scientist in India
Starting salary is roughly around 80,000 rupees per month. The money is really good, and it’s bound to get better and better. Big Data is taking over our lives, and thus, Data Scientists are going to be in huge demand in the coming years. Expect your salary to double in 4-5 years, or maybe even before that.
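To put the “double in 4-5 years” figure in perspective, here is a minimal sketch of the arithmetic, assuming the salary grows by the same percentage every year (real raises are rarely that smooth). The function name `implied_annual_growth` is made up for this illustration.

```python
def implied_annual_growth(multiple: float, years: float) -> float:
    """Constant yearly growth rate that multiplies a value by `multiple` over `years`."""
    return multiple ** (1.0 / years) - 1.0


# Starting figure quoted above, in rupees per month.
starting_monthly = 80_000

for years in (4, 5):
    rate = implied_annual_growth(2.0, years)
    final_monthly = starting_monthly * (1.0 + rate) ** years
    print(
        f"Doubling in {years} years needs about {rate:.1%} growth per year "
        f"(ending near {final_monthly:,.0f} rupees per month)"
    )
```

Under that assumption, doubling in 4 years works out to roughly 18.9% growth per year, and doubling in 5 years to roughly 14.9% per year.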
Yes, a degree is required. In fact, multiple degrees are required. The Data scientist we talked to had even gotten a PhD. For which degree to pursue, and when to pursue it, scroll down to the Step-by-Step guide as it is a slightly complicated thing to answer.
One thing to love about being a Data Scientist
“Freedom to explore and learn.”
One thing to hate about being a Data Scientist
“A very high barrier to entry.”
Resources and Tips
- Check out this excellent Reddit post before you begin your journey. The r/DataScience subreddit also links to some wonderful resources in its wiki.
- The YouTube channel and the website of Analytics India Magazine are great resources for a general high-level overview of this profession. In particular, check out their “How to start a career in data science” series. I got a lot of information for this article from those videos.
- freeCodeCamp is a great place if you want to learn the technical coding skills.
- If you want to learn web development from scratch, there is no better place on the internet than The Odin Project.
- If you speak Hindi, check out the videos of CodeWithHarry. The dude teaches stuff in what he calls ‘desi bhasha’ (kinda cringe, ngl), and breaks down difficult concepts easily. Start with – Complete Roadmap to Become a Data Scientist.
Step-by-step guide to becoming a Data Scientist
- Take science (PCM) in class 11th.
- Get an undergraduate degree in engineering (computer science branch). For that, you would have to take the JEE or other equivalent exams. A bachelor’s degree in Data Science (available in a few colleges such as IIT Madras) is also a great option.
- Get your master’s in Data Science or Mathematics/Statistics (most people generally prefer to do it abroad).
- You can also reach this career by doing your undergraduate degree in statistics/mathematics and then doing your master’s in Data Science/CS with a machine learning specialization. If you go this route, do a few courses related to programming and machine learning as soon as possible.
- A PhD is optional, but recommended.
- Sit for placements, and start looking for jobs through LinkedIn.
But while you are going through these steps, do this on the side:
One thing that you need to realize is that to enter this profession, you require deep and thorough knowledge of both math/statistics and computer science/programming. As far as online courses for data science go, I would suggest starting with the required math courses offered by universities on edX. Just audit them; there is no need for a certificate. Once you’re done with the math courses, proceed to data visualization. This website has tutorials available for free regarding those.
Next, learn SQL and Python (and R if you can manage). Move on to Andrew Ng’s course on Coursera (he uses Octave, but you can find Python translations on GitHub). Again, there is no need to get a certificate. Once done, proceed to any one of these three: the Deeplearning.ai courses, Advanced Machine Learning by NRU HSE (Coursera) or Machine Learning by Columbia University (on edX). Get a certificate for any one of these to show in your CV.
Lastly, either go to Kaggle or download a free dataset and try to work with it. Start with datasets with low dimensionality and work your way up. You can easily get hired as a fresher or at a junior level once you’ve shown you can do an intermediate-level project (moderate dimensionality).
Data cleaning is also a huge requirement since that’s where most of your time will go. Knowledge of data mining would also be a huge plus. Learn both RDBMS and big data.
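To give a feel for what data cleaning looks like in practice, here is a minimal sketch in plain Python. The tiny dataset, its column names and the `clean_rows` helper are all made up for illustration; in real projects you would more likely use a library such as pandas, and the right way to fill missing values depends on the data.

```python
import csv
import io
import statistics

# Made-up sample data with typical problems: stray spaces,
# inconsistent capitalization, a duplicate row and missing values.
RAW_CSV = """show,genre,episodes,avg_minutes
 Show A ,Sitcom,10,22
Show B,drama,8,
Show A,sitcom,10,22
Show C,Sitcom,,24
"""


def clean_rows(raw_text):
    """Trim text, normalize genre, drop duplicates, fill missing numbers with the median."""
    reader = csv.DictReader(io.StringIO(raw_text))
    seen = set()
    rows = []
    for record in reader:
        # Strip surrounding whitespace from every field.
        row = {key: (value or "").strip() for key, value in record.items()}
        row["genre"] = row["genre"].lower()
        # Treat rows with the same show and genre as duplicates.
        identity = (row["show"], row["genre"])
        if identity in seen:
            continue
        seen.add(identity)
        rows.append(row)

    for column in ("episodes", "avg_minutes"):
        known = [int(r[column]) for r in rows if r[column]]
        fill_value = statistics.median(known)
        for r in rows:
            r[column] = int(r[column]) if r[column] else fill_value
    return rows


for row in clean_rows(RAW_CSV):
    print(row)
```

Filling gaps with the median is just one simple choice here; sometimes dropping the row or flagging it for review is the better call.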
Data analyst, Data Visualization expert, Data architect, Machine learning engineer