Millennium Consulting

Managing Technological Change, Deployment, Development and Optimisation


Aspirations to be a Data Scientist?

By Phil Keet - 15th November 2017

The Harvard Business Review has declared data scientist "the sexiest job of the 21st century", whilst Forbes magazine calls data science "the century's hottest career". Technical advances, combined with exponentially increasing data storage capacity, now offer organisations the chance to identify activity patterns and derive commercially valuable information. This has increased demand for data engineers, analysts and scientists who can collate, manipulate and extract insight from the vast data sources now available.

Data has become one of the most lucrative and fastest-growing industries in the world. Companies such as Alphabet, Amazon, Apple, Facebook and Microsoft hold vast amounts of customer data, and those holdings continue to grow rapidly. Smartphones and the internet have made data abundant and freely available. As "the internet of things" expands into new areas, data volumes continue to increase, and the ability to understand and interpret data has become a key driver of commercial success.

Data Scientists perform a wide range of activities. These include identifying source data, cleaning and munging it, and ensuring it is compiled in the required format for transfer to an enterprise data warehouse. The data is then available for interpretation using sophisticated methods derived from statistics and machine learning.

So what experience does the Data Scientist need to enable them to perform their role effectively?

Academic qualifications are advantageous: most Data Scientists have Master's degrees, and many hold PhDs. Popular degree subjects include maths, statistics, computer science and engineering.

Data Scientists also need a broad range of technical skills, including familiarity with the R or Python programming languages and with database query languages such as SQL. They also need a basic understanding of statistics, including familiarity with statistical tests, probability distributions and maximum likelihood estimators.
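As a small illustration of what a maximum likelihood estimator looks like in practice, the sketch below fits a normal distribution to a handful of numbers using only the Python standard library. The function name `normal_mle` and the sample values are made up for this example. For a normal distribution, the maximum likelihood estimates are the sample mean and the variance computed by dividing by n (not n - 1).

```python
from statistics import fmean


def normal_mle(samples):
    """Return the maximum likelihood estimates (mean, variance) for a
    normal distribution fitted to `samples`.

    The variance divides by n, which is the maximum likelihood estimate;
    the more familiar unbiased estimate divides by n - 1 instead.
    """
    mu = fmean(samples)
    var = fmean([(x - mu) ** 2 for x in samples])
    return mu, var


# Illustrative data only, not taken from any real source.
samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu, var = normal_mle(samples)
print(f"mean = {mu}, variance = {var}")
```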

In a data-driven environment, machine learning methods such as k-nearest neighbours, random forests and other ensemble methods may need to be developed using R or Python. Multivariable calculus and linear algebra may be required for machine learning and for interpreting statistical results. Understanding these concepts is important in environments where minor improvements in predictive performance or algorithm optimisation can generate huge wins. Data munging skills may be required to work with diverse data sources and format types, and to handle data imperfections such as missing values and inconsistent string or date formats. Communicating the results of data analysis using visualisation tools such as ggplot and d3.js is important, as is the ability to engage effectively with business stakeholders and explain the derived results.
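The data imperfections mentioned above (missing values, inconsistent string and date formats) can be made concrete with a short Python sketch. Everything here is hypothetical: the field names, the list of date formats and the choice to fill missing amounts with the median are assumptions made for illustration, and a real project would more likely use a library such as pandas.

```python
from datetime import date, datetime
from statistics import median

# Hypothetical set of date formats seen in the source data, tried in order.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %B %Y")


def parse_date(text):
    """Return a date for the first format that matches, or None if none do."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def clean_records(records):
    """Tidy customer names, normalise dates and fill missing amounts.

    Missing amounts are replaced by the median of the known amounts,
    which is one simple imputation choice among many.
    """
    known = [r["amount"] for r in records if r["amount"] is not None]
    fill = median(known) if known else None
    cleaned = []
    for r in records:
        cleaned.append({
            # Collapse repeated whitespace and use consistent capitalisation.
            "customer": " ".join(r["customer"].split()).title(),
            "date": parse_date(r["date"]),
            "amount": r["amount"] if r["amount"] is not None else fill,
        })
    return cleaned


# Illustrative records only.
raw = [
    {"customer": "  alice SMITH ", "date": "2017-11-15", "amount": 120.0},
    {"customer": "Bob  jones", "date": "15/11/2017", "amount": None},
    {"customer": "carol white", "date": "15 November 2017", "amount": 80.0},
]
cleaned = clean_records(raw)
for row in cleaned:
    print(row)
```

After cleaning, all three records share the same date representation and capitalisation, and the missing amount has been filled, so the data is in a consistent shape for loading into a warehouse or for analysis.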

For further information about the role of the Data Scientist, contact Philip Keet, Consulting Director at Millennium Affine on +44(0)1303262473.