Not Enough Data Scientists? Look to the Cloud

Dave Vennergrund

In today’s knowledge economy, data is the raw material and data scientists are the talented artisans who can spin that data into gold. The only problem is that data scientists are in very short supply.

For government agencies, that talent shortage is limiting their ability to develop and deploy advanced analytical solutions.

While we would all like to snap our fingers and solve this problem, the fact is demand for data scientists will likely continue to outstrip supply for years to come. Fortunately, automation is coming to the rescue.

We can’t automate all of the high-level work data scientists do. But we don’t have to. Through automation, we can streamline the process of building models, save time and shift workloads down the chain, both to data analysts and, where appropriate, to machines.

The convergence of modern cloud architectures, open source software and advanced analytics tools is allowing data analysts to perform work that only veteran data scientists could do a short while ago. That is paying off in what we can accomplish with advanced artificial intelligence and machine learning tools.

Amazon’s new SageMaker is one such platform. SageMaker gives developers and data scientists the ability to quickly and easily build, train and deploy machine learning models – at scale and in the cloud – in a fully managed and governed framework. It simplifies the work by providing building blocks that accelerate each of those steps.

What might take weeks to accomplish using conventional data science practices can now be accomplished in a matter of days.

At General Dynamics Information Technology, we’ve been pioneering artificial intelligence, machine learning, data mining tools and natural language processing across the federal space for more than 20 years. Our Data and Analytics Consulting practice identifies and evaluates new technology and helps our programs adopt it, including emerging self-service and assisted data science platforms such as Amazon’s SageMaker.

We’ve built and deployed numerous solutions in defense, intelligence and civilian agencies, creating predictive analytics tools to perform image recognition, identify cyber threats, forecast budgets and spending, cluster genomic and health data and stop fraud, waste and abuse. SageMaker lowers the barrier to entry for agencies seeking to develop machine learning models, allowing staff to experiment and learn without having to make six- or seven-figure investments in software licenses and costly storage and computing resources.

SageMaker is simple to use and follows the same process data scientists would use on their own: access and prepare data, train and evaluate models using a variety of machine learning algorithms, and then deploy them. It includes both open source and AWS-optimized algorithms that classify data, generate forecasts or recommended actions, and perform image recognition, machine translation and text analytics. These are the building blocks with which viable solutions can be rapidly developed.
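For readers who want to see that prepare, train, evaluate and deploy sequence in concrete terms, here is a minimal sketch in plain Python. It does not use SageMaker or any AWS API. The data is synthetic, the "routine" and "flagged" labels are invented for illustration, the model is a simple nearest-centroid classifier chosen for brevity, and the deploy step only serializes the model to JSON as a stand-in for publishing it to a managed endpoint.

```python
import json
import random
from statistics import mean


def prepare(rows, test_fraction=0.25, seed=0):
    """Shuffle labeled rows and split them into training and test sets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def train(rows):
    """Fit a nearest-centroid model: one mean feature vector per label."""
    by_label = {}
    for features, label in rows:
        by_label.setdefault(label, []).append(features)
    return {
        label: [mean(column) for column in zip(*vectors)]
        for label, vectors in by_label.items()
    }


def predict(model, features):
    """Return the label whose centroid is closest to the given features."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))

    return min(model, key=lambda label: squared_distance(model[label]))


def evaluate(model, rows):
    """Return the fraction of held-out rows the model labels correctly."""
    correct = sum(predict(model, features) == label for features, label in rows)
    return correct / len(rows)


def deploy(model):
    """Serialize the model; a stand-in for publishing it to an endpoint."""
    return json.dumps(model)


# Synthetic, well-separated toy data (hypothetical labels).
rng = random.Random(42)
data = [([rng.gauss(0, 1), rng.gauss(0, 1)], "routine") for _ in range(40)]
data += [([rng.gauss(8, 1), rng.gauss(8, 1)], "flagged") for _ in range(40)]

train_rows, test_rows = prepare(data)        # access and prepare data
model = train(train_rows)                    # train
accuracy = evaluate(model, test_rows)        # evaluate
served = json.loads(deploy(model))           # deploy, then load as a client would
print(f"held-out accuracy: {accuracy:.2f}")
print(predict(served, [7.5, 8.2]))
```

A managed platform takes over the parts this sketch glosses over, such as storage, compute, algorithm selection and hosting, but the order of the steps is the same.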

Because every use case is unique, machine learning solutions typically can’t be repurposed for other problem sets. What tools like SageMaker provide is an underlying framework – the storage options, the machine learning algorithms and other components that go into building a viable solution. And by making those solutions more accessible – and affordable – they promise to speed up the pace of innovation in machine learning throughout the federal marketplace.