Ace Your Data Science Career: A Step-by-Step Guide for Success
With the advent of new computing technologies, fields like data science are drawing more attention than ever. Huge volumes of data and advanced technologies have made it feasible for businesses to extract hidden insights from their data assets. So, what is data science? How is it different from analytics and reporting? Do I need strong programming skills? I did not study mathematics; can I get into this field? Can I transition from another field to data science? What skill sets do I need? And so on.
Data science is not new-born; it is part of an endeavour that started in the 1950s and 1960s under the name of Artificial Intelligence. Earlier technologies could not support the theories behind making machines learn by themselves, but now it is possible at a very low cost. Further, cloud computing has made it possible to run and deploy even very complex applications quickly.
If you want to be part of this journey, you first need to understand its pipeline. Broadly, there are five steps in the data science career pipeline, and you do not need to be superhuman to take part in it. A seemingly easy skill like SQL can get you into the arena. You can start with minimal programming and no maths background. There is no need to invest a ‘fat’ amount in courses if you understand this pipeline. You can shape your data science career around your current skill sets and the opportunities available.
Of the five steps, the first three relate to the analysis of data, where no math, software engineering, or extensive programming experience is required. Process and business understanding is the key. If you understand the process, you can use SQL and MS Office to do these jobs. A few profiles related to data visualization require experience with tools like Tableau or Power BI. The fourth step expects you to be proficient in mathematics, statistics, and extensive programming. And if you want to complete the data science journey, with all five steps, you should know software engineering and deployment concepts. On top of all this, you should be able to integrate all the data science steps on a cloud platform.
Understanding Step #1, Data Pull:
Data science starts with data. The bigger the organization, the more complex the data collection process and the harder the access approval process. You should know about data warehouses (DWH), data lakes, and legacy systems. You are also expected to know about IT systems and SQL.
Ground truth: You need not only process knowledge and SQL understanding, but also the ability to collaborate with many teams to get what you want. For example, if you need data for an upcoming marketing campaign, you have to collaborate with the operations and sales teams to understand what data is available and how to fetch it. The ability to collaborate is the number one skill you need in this step.
Job availability: Most of the jobs available in this step are reporting jobs, where the only required skill is SQL with 1-2 years of experience. Analysts in these jobs usually do not deal with what is in the data. They pass on or report the data as it comes out of the system or process.
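To make the data pull concrete, here is a minimal sketch of the kind of SQL query the marketing campaign example might involve. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for a real warehouse. The table names (customers, sales), the column names, and the sample rows are all hypothetical, invented only for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two small, made-up tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
        CREATE TABLE sales (customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES
            (1, 'Asha', 'North'), (2, 'Ben', 'South'), (3, 'Chen', 'North');
        INSERT INTO sales VALUES (1, 120.0), (1, 80.0), (2, 40.0), (3, 300.0);
        """
    )
    return conn


def pull_campaign_data(conn, min_total):
    """Return (name, region, total_spend) for customers at or above min_total."""
    query = """
        SELECT c.name, c.region, SUM(s.amount) AS total_spend
        FROM customers AS c
        JOIN sales AS s ON s.customer_id = c.id
        GROUP BY c.id, c.name, c.region
        HAVING SUM(s.amount) >= ?
        ORDER BY total_spend DESC
    """
    # The threshold is passed as a bound parameter rather than pasted into the SQL.
    return conn.execute(query, (min_total,)).fetchall()


if __name__ == "__main__":
    connection = build_demo_db()
    for row in pull_campaign_data(connection, 100):
        print(row)
```

In a real organization the same query would run against the warehouse once access is approved, and the join keys would come out of the conversations with the operations and sales teams.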
Understanding Step #2, Data Clean:
Most companies invest heavily in data warehouses that keep data in clean formats, and it is expected that users will not have to clean data themselves. But that is far from reality. To make data speak, you need to do multiple tasks such as treating missing values, converting formats, and mapping data. You should have tools handy that make this possible. You can use SAS, Python, R, or SQL for this job. Usually, companies rely on traditional systems like SQL and SAS. Working knowledge of Python or R is an advantage, not a requirement.
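The three cleaning tasks named above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the record layout, the day/month/year input date format, the REGION_MAP lookup, and the choice to fill missing amounts with the median are all hypothetical, and a real project would pick its own rules.

```python
from datetime import datetime
from statistics import median

# Hypothetical mapping from source-system codes to readable labels.
REGION_MAP = {"N": "North", "S": "South"}


def clean_records(records):
    """Fill missing amounts, convert dates to ISO format, and map region codes."""
    # Missing values: use the median of the known amounts as the fill value.
    known = [float(r["amount"]) for r in records if r["amount"] not in (None, "")]
    fill_value = median(known) if known else 0.0

    cleaned = []
    for r in records:
        missing = r["amount"] in (None, "")
        amount = fill_value if missing else float(r["amount"])
        # Format conversion: 'DD/MM/YYYY' text becomes 'YYYY-MM-DD'.
        iso_date = datetime.strptime(r["date"], "%d/%m/%Y").date().isoformat()
        # Data mapping: normalise the code, then look it up; unknown codes are flagged.
        region = REGION_MAP.get(r["region"].strip().upper(), "Unknown")
        cleaned.append({"date": iso_date, "region": region, "amount": amount})
    return cleaned


if __name__ == "__main__":
    raw = [
        {"date": "05/01/2024", "region": "n", "amount": "100"},
        {"date": "06/01/2024", "region": "S ", "amount": ""},
        {"date": "07/01/2024", "region": "X", "amount": "300"},
    ]
    for row in clean_records(raw):
        print(row)
```

The same logic can be written in SQL or SAS; the point is that each cleaning rule is an explicit, reviewable decision.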
If you do not have a programming background or you do not want to learn math again, you can get into either of the first two steps and work in the data science pipeline.
Understanding Step #3, Exploratory Data Analysis (EDA):
Data exploration is required to get insights out of data. More than the data itself, it is business and process understanding that makes data speak. Slice and dice the data from all angles and perspectives to serve the business goal. Data dashboards, data stories, data visualizations, and PowerPoint presentations are key terms in this stage. A basic statistics background and knowledge of a visualization tool (like Excel, Tableau, or Power BI) are mandatory to get things done in this step. Several BI tools like CognosBI, SAS Business Intelligence, and MicroStrategy can integrate all three steps into one.
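As a small illustration of "slice and dice", the sketch below groups records by one dimension and computes basic statistics for a measure, which is roughly what a pivot table in Excel or a bar chart in a BI tool does behind the scenes. The function name summarize_by and the sample fields (region, channel, amount) are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def summarize_by(records, dimension, measure):
    """Group records by `dimension` and report count, total and mean of `measure`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[dimension]].append(record[measure])
    return {
        key: {"count": len(values), "total": sum(values), "mean": mean(values)}
        for key, values in sorted(groups.items())
    }


if __name__ == "__main__":
    sales = [
        {"region": "North", "channel": "web", "amount": 100.0},
        {"region": "North", "channel": "store", "amount": 300.0},
        {"region": "South", "channel": "web", "amount": 50.0},
    ]
    # The same data viewed from two angles: by region, then by channel.
    print(summarize_by(sales, "region", "amount"))
    print(summarize_by(sales, "channel", "amount"))
```

Switching the dimension argument is the "different angle"; deciding which angle matters for the business goal is where process understanding comes in.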
The first three steps do not require any developer experience. Generally, people working in these steps are called ‘Analysts’. They know the business, the data, and how to get insights from the data.
Ground truth: Companies invest heavily in Business Intelligence tools, but PowerPoint presentations are still choice #1 for most leaders to get insights. So, you should know how to make impressive presentations in PowerPoint.
Understanding Step #4, Modeling:
The first three steps can be learned once you have been part of any process for some time. Process and business understanding help you use tools to get insights from data. But step 4 needs much more than business understanding. You need high-end programming skills and expert-level statistics and mathematics to create models for data prediction. This field is vast, and one needs to study it over a long period to learn it properly. You can start by learning a programming language (like Python, R, Scala, or Java) and statistics, then scale up to machine learning and deep learning techniques. You also need to learn how to deal with data at scale, so you should know big data frameworks like Spark.
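To give a feel for where programming, statistics, and mathematics meet, here is a minimal sketch of one of the simplest predictive models: a straight line fitted by ordinary least squares, written in plain Python. It is a toy with one input variable and made-up numbers, not a recipe for production modeling; real work would typically rely on established libraries and proper validation.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of the x-y cross deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new x value."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    # Made-up data: ad spend (x) against units sold (y).
    model = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
    print(model, predict(model, 5))
```

Even this tiny example touches the three skills the step asks for: the formula comes from statistics, the derivation from mathematics, and the implementation from programming.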
Ground truth: Most people who want to learn data science start with Python/R and machine learning, and learn everything partially. But hardly any company is looking for someone with only entry-level modelling experience. You need hands-on project experience to get a job as a data scientist. If you do not have expertise in programming and mathematics, you can either get into any of the first three steps or give yourself time to learn all of this and work on a live project.
Understanding Step #5, Deployment:
Though the work of a data scientist is often seen as finishing with model creation, the model is also expected to be deployed. This is all about software deployment techniques. You should know about CI/CD pipelines and tools like Jenkins to make the model available to users.
Learning any of the five steps in isolation can put you somewhere in the data science pipeline, but to be a competent data scientist, you should know them all. On one hand, you have to be a good analyst who knows the domain well; on the other hand, you should be a good developer too.
The cloud computing revolution is also changing the way data scientists work. Companies are migrating their workloads to the cloud, and service providers like AWS, GCP, and Azure are making it possible to integrate and automate all the steps on a cloud platform.
Data science is a multi-dimensional discipline where one needs to keep upgrading in all dimensions to stay relevant in the field. Within just a few years, single-machine computing has given way to cloud computing, where you do not need to invest in heavy machines and just pay for what you use.
Ready to dive in? A new and exciting career awaits! You can start by learning SQL, then upgrade gradually as per your appetite.
To know more about free high quality courses available, read this article: