Hello Data Scientists,
Let me continue from my last blog, “How Data Scientist can help organization grow?”, where I wrote about the importance of Data Scientists and their role in an organization's growth. To conclude that blog, I would say Data Scientists have the analytical capability to analyze huge data cloud(s) and articulate the outcome intelligently, allowing leaders to make richer choices. They empower everyone by exposing hidden views of the data to the outer world for ready consumption.
To be a good scientist, start with A to Z, i.e., Analytical, Industry Domain, Mathematics, Statistical algorithms, Power Tools, User-Friendly and easy-to-read Visualization, with Z as the Zeal to learn the power of data. – AG
To be an expert in any field, one should be qualified in certain skills, and it is the same for a Data Scientist. In our school days we are taught a few concepts around Statistics; at graduation we are taught industry domains; and as we specialize we get exposure to power tools like Excel and reporting tools. With the fast change in technology, there are now a lot of new, powerful tools like R, Python, SAS etc.
Let me help articulate what skills (in no particular order) one should possess to be a Data Scientist. Kindly bear in mind that this revolves around “Data Science”, which is a combination of “DATA” and “SCIENCE”; hence most of the skills revolve around these 2 key areas.
To be a Data Scientist, a basic grounding in Mathematics is very important, as this profession is all about dealing with numbers. One should know the basics, such as different data types, data operations, probability and probability distributions. It is GOOD TO HAVE hands-on experience with algebra, equations and probability fundamentals. That said, one does not need to be a graduate in Mathematics; basic school-level maths knowledge is good enough to start the analytical journey towards Data Science.
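To make "probability distribution" a little more concrete, here is a minimal Python sketch that builds the distribution of the sum of two fair dice by simple counting. The function name `two_dice_distribution` is made up for this illustration; it only needs school-level counting and fractions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def two_dice_distribution():
    """Return {sum: probability} for the sum of two fair six-sided dice."""
    # Enumerate all 36 equally likely outcomes and count how often each sum occurs.
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    total = sum(counts.values())  # 36 outcomes in total
    return {s: Fraction(c, total) for s, c in sorted(counts.items())}


dist = two_dice_distribution()
for s, p in dist.items():
    print(f"P(sum = {s:2d}) = {p}")
```

The probabilities add up to exactly 1, and a sum of 7 is the most likely outcome with probability 1/6.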
An analytical mindset is one of the important skills (though not a must at the start). Anyone who is interested in solving puzzles or working on brain teasers is better placed and will have an edge over others who do not find them interesting. I love puzzle solving; my favorite pastime is successfully finishing a Sudoku. This is a GOOD TO HAVE skill for getting on the bandwagon but a MUST HAVE SKILL from a long-term perspective.
Statistical Analysis complements analytical skills. This is the most important, MUST HAVE skillset for a good Data Scientist. Individuals with good Statistical Analysis skills can infer better outcomes from raw data by using the right techniques.
There are 2 layers of tools to master to be a proficient Data Scientist: Data Collection tools and Data Consumption/Processing tools.
Data Collection Tools:
Data can be collected either as structured data or as unstructured data. Unstructured data is collected primarily as secondary data from applications, whereas structured data is well-designed data collected as applications are used. Unstructured data can later be processed using tools and converted to structured data. CSV, XML, logs and JSON are forms of unstructured (or, more precisely, semi-structured) data, whereas SQL and other relational databases hold structured data. These are GOOD TO KNOW tools.
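The conversion from unstructured to structured data described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the log lines, the `events` table and its column names are all made up, and a real pipeline would need error handling for malformed records.

```python
import json
import sqlite3

# Hypothetical application log, one JSON object per line (made-up sample data).
RAW_LOG = """
{"username": "alice", "event": "login", "duration_ms": 120}
{"username": "bob", "event": "search", "duration_ms": 340}
{"username": "alice", "event": "logout", "duration_ms": 45}
"""


def load_events(raw_text, conn):
    """Parse JSON lines and insert them into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events "
        "(username TEXT, event TEXT, duration_ms INTEGER)"
    )
    rows = []
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        rows.append((record["username"], record["event"], record["duration_ms"]))
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
inserted = load_events(RAW_LOG, conn)

# Once the data is structured, it can be queried with plain SQL.
per_user = conn.execute(
    "SELECT username, COUNT(*), SUM(duration_ms) "
    "FROM events GROUP BY username ORDER BY username"
).fetchall()
print(inserted, per_user)
```

Once the records sit in a table, questions such as "how many events per user" become a one-line SQL query instead of a manual scan through the log.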
Data Processing Tools:
Once data is collected, it is a MUST for a Data Scientist to know at least one statistical tool like R, Python or SAS. This will help gain productivity and avoid performing all calculations manually. There are easy-to-use basic tools like EXCEL, SQL and scientific calculators, but each has its limitations. GO FOR “R”, “PYTHON” or any other statistical programming language. It is also good to understand a Big Data tool like Hadoop, where data is stored across multiple servers because of its size but is managed by a single server.
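As a small taste of how a programming language removes manual calculation, here is a sketch using only Python's built-in `statistics` module. The numbers in `daily_orders` are made up purely for illustration.

```python
import statistics

# Made-up daily order counts, used only to illustrate the calls.
daily_orders = [12, 15, 11, 19, 14, 15, 60]

mean = statistics.mean(daily_orders)      # arithmetic average
median = statistics.median(daily_orders)  # middle value after sorting
stdev = statistics.stdev(daily_orders)    # sample standard deviation

print(f"mean={mean:.2f} median={median} stdev={stdev:.2f}")
```

Notice how the single large value (60) pulls the mean well above the median; spotting that kind of outlier by hand gets tedious very quickly as the data grows.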
INDUSTRY KNOWLEDGE OR DOMAIN:
As a scientist, it is always advantageous to understand the industry domain, as statistical calculations are based on industry parameters. Let me take an example: in many industries a confidence level of 95% (roughly a 2σ limit, with 3σ covering about 99.7%) can be acceptable, whereas in the Healthcare and Aviation industries anything less than 6σ can be disastrous. Outcomes will be more relevant and accurate as one understands the domain. This is a good-to-have skill, but I personally will keep it under MUST HAVE SKILL.
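To see what the σ limits mentioned above mean in numbers, here is a minimal sketch that computes how much of a normal distribution falls within ±kσ of the mean. It assumes the data is normally distributed, and the helper name `coverage_within` is made up for this illustration.

```python
from statistics import NormalDist


def coverage_within(k_sigma):
    """Probability that a normally distributed value lies within ±k_sigma of its mean."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    return std_normal.cdf(k_sigma) - std_normal.cdf(-k_sigma)


for k in (1.96, 2, 3, 6):
    inside = coverage_within(k)
    print(f"±{k}σ covers {inside:.7%}; outside: {1 - inside:.3e}")
```

Roughly 95% of values fall within ±1.96σ, about 95.45% within ±2σ and about 99.73% within ±3σ, while at ±6σ almost nothing is left outside. This is the plain two-sided normal coverage; the Six Sigma quality methodology uses its own convention for defect rates, which is not modelled here.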
UX/UI OR VISUALIZATION:
Data, if not presented to the right forum in the RIGHT FORM, is not useful. Data Visualization is a GOOD TO HAVE skill, as it can be complemented by data writers for final consumption by leaders. R has good graphical visualization capabilities, which can be leveraged to come up with fantastic-looking dashboards.
In my next blog, I will share more insight into how to pick a statistical programming language. Though it is a matter of choice, I will share my rationale for picking R and work through it.
Thank you once again for sparing the time to go through this article. I hope it has helped you understand what it takes for an individual to be a successful Data Scientist. Kindly share your views and what you would like to see and hear from me in my future blogs.
Outstanding Outliers “AG”.