Ugh. Everyone is talking about the citizen data scientist, but no one can define it (perhaps they know one when they see one). Here goes: the simplest definition of a citizen data scientist is a non-data scientist. That’s not a pejorative; it just means that citizen data scientists nobly desire to do data science but are not formally schooled in all the ins and outs of the data science life cycle. For example, a citizen data scientist may be quite savvy about which enterprise data is likely to be important for building a model but may not know the difference between GBM, random forest, and SVM. Those algorithms are data scientist geek-speak to many of them. The citizen data scientist’s job is not data science; rather, they use it as a tool to get their job done. Here is my definition of the enterprise citizen data scientist:
A businessperson who aspires to use data science techniques such as machine learning to discover new insights and create predictive models to improve business outcomes.
Citizen Data Scientists Are A Hardy Lot
They must be dedicated to their part-time craft, because doing data science is not easy. It requires learning the life cycle: data acquisition, data preparation, feature engineering, algorithm selection, model training, model evaluation, and, finally, insights and/or predictions. They may even have to learn to program in R or Python. If they are lucky (and smart), they will download RapidMiner, KNIME, or a similar tool, because these tools provide nice visual drag-and-drop interfaces rather than harsh coding.
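To make the life cycle concrete, here is a minimal sketch of those steps in plain Python. Everything in it is an assumption made for illustration: the customer data is synthetic, the field names (visits, spend, churned) are hypothetical, and the "algorithm" is a deliberately simple nearest-centroid rule picked by hand, not something a real project would necessarily choose.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# 1. Data acquisition: synthetic customer records (hypothetical fields).
def make_row():
    churned = random.random() < 0.5
    visits = random.gauss(4 if churned else 10, 2)
    spend = random.gauss(40 if churned else 90, 15)
    if random.random() < 0.05:
        spend = None  # simulate an occasional missing value
    return {"visits": visits, "spend": spend, "churned": churned}

raw = [make_row() for _ in range(400)]

# 2. Data preparation: drop records with missing values.
clean = [r for r in raw if r["spend"] is not None]

# 3. Feature engineering: rescale spend so both features have similar ranges.
def features(row):
    return (row["visits"], row["spend"] / 10)

data = [(features(r), r["churned"]) for r in clean]

# Hold out the last quarter of the records for evaluation.
split = int(len(data) * 0.75)
train, test = data[:split], data[split:]

# 4. Algorithm selection: a nearest-centroid rule, chosen by hand here.
def centroid(points):
    return tuple(sum(axis) / len(points) for axis in zip(*points))

# 5. Model training: one centroid per class, computed from the training split.
centroids = {
    label: centroid([x for x, y in train if y == label])
    for label in (True, False)
}

def predict(x):
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist2(centroids[label]))

# 6. Model evaluation: accuracy on the held-out records.
accuracy = sum(predict(x) == y for x, y in test) / len(test)
print(f"holdout accuracy: {accuracy:.2f}")

# 7. Prediction: score a new, unseen customer.
new_customer = {"visits": 3, "spend": 35}
print("predicted to churn:", predict(features(new_customer)))
```

Even in this toy version, each step is a separate decision someone has to make, which is why the visual tools are so appealing.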
Good News For Everyone That Deals With The Gnarly
The best news for citizen data scientists is that many of the gnarliest aspects of the data science life cycle are being abstracted by automated machine learning (AutoML) solutions. Products such as DataRobot, H2O.ai’s Driverless AI, Google Cloud AutoML, and more provide sophisticated tools that hide the gory details of data science so that citizen data scientists and perhaps mere mortals can analyze data and build robust machine learning models. It’s also good news for data scientists because the same automation of the data science life cycle can make data scientists more productive. And it’s good news for business because the demand for machine learning is becoming voracious.
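One of the gory details that gets automated is algorithm selection. The sketch below shows the basic idea under loud assumptions: the data is synthetic, the three candidate "algorithms" are toy stand-ins written in plain Python, and the selection loop is a simple cross-validated comparison. It is not how any of the products named above work internally; real AutoML solutions also search over feature engineering, hyperparameters, and much more.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical two-class data: (feature vector, label).
def make_point():
    label = random.random() < 0.5
    center = (2.0, 2.0) if label else (0.0, 0.0)
    return tuple(random.gauss(c, 1.0) for c in center), label

data = [make_point() for _ in range(300)]

def dist2(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

# Each candidate takes training data and returns a predict function.
def majority_class(train):
    labels = [y for _, y in train]
    winner = max(set(labels), key=labels.count)
    return lambda x: winner

def nearest_centroid(train):
    cents = {}
    for label in {y for _, y in train}:
        pts = [x for x, y in train if y == label]
        cents[label] = tuple(sum(axis) / len(pts) for axis in zip(*pts))
    return lambda x: min(cents, key=lambda lab: dist2(x, cents[lab]))

def one_nearest_neighbor(train):
    return lambda x: min(train, key=lambda pair: dist2(x, pair[0]))[1]

candidates = {
    "majority class": majority_class,
    "nearest centroid": nearest_centroid,
    "1-nearest neighbor": one_nearest_neighbor,
}

def cross_val_accuracy(fit, data, folds=5):
    """Average accuracy over `folds` train/test splits."""
    scores = []
    for k in range(folds):
        test = data[k::folds]
        train = [d for i, d in enumerate(data) if i % folds != k]
        model = fit(train)
        scores.append(sum(model(x) == y for x, y in test) / len(test))
    return sum(scores) / folds

# The "automation": score every candidate the same way and keep the winner.
results = {name: cross_val_accuracy(fit, data) for name, fit in candidates.items()}
best = max(results, key=results.get)

for name, score in results.items():
    print(f"{name}: {score:.2f}")
print("selected:", best)
```

The point of the loop is that the user never has to know what a centroid or a nearest neighbor is; they only see which candidate scored best on held-out data.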
Forrester Wave™ Evaluations On Machine Learning Solutions
Kjell Carlsson and I will be doing a Wave on automation-focused machine learning solutions, scheduled for publication in Q2 of 2019. We define this category as:
Software that provides enterprise data scientist teams and/or citizen data scientists with tools to train, deploy, and manage analytical results and models that are principally designed to automate key aspects of the machine learning life cycle, including feature engineering, algorithm selection, model evaluation, and explainability.