Data Scientist w/Polygraph

Clearance Level: Top Secret SCI + Polygraph
Data Science
Herndon, Virginia

REQ#: RQ65947

Travel Required: None
Requisition Type: Regular

We are GDIT. The people supporting and securing some of the most complex government, defense, and intelligence projects across the country. We ensure today is safe and tomorrow is smarter. Our work has meaning and impact on the world around us, but also on us, and that’s important.

GDIT is your place. You make it your own by embracing autonomy, seizing opportunity, and being trusted to deliver your best every day.

We think. We act. We deliver. There is no challenge we can't turn into opportunity. And our work depends on a Data Scientist joining our team to support our customer's activities in Herndon, VA.

The following responsibilities highlight the role of a Data Scientist:

The candidate should perform data science experiments that use machine learning and artificial intelligence technologies to automate the conditioning of raw data into physical data models during ingest. Currently, the extract, transform, and load (ETL) of data into IDL systems is done with manually developed rules and algorithms. The goal of these experiments is to determine how, and to what extent, manually developed ETL rules and algorithms can be replaced or augmented by machine learning-based and other artificial intelligence techniques and tools. The experiments will include using live data set(s) to demonstrate potential solutions. Successful experiments may result in new capabilities being added to IDL during Phase 2 IDL enhancement activities. The candidate shall perform multiple data science experiments, as directed by the COTR, in series (every 90 days, up to four per year) using different techniques and different data sets. The candidate shall perform the following data conditioning tasks as part of each data science experiment:


(1)  Identify and label a subset of data from a dataset provided by the Sponsor, for which manually built ETL rules exist. Determine the amount of data to label based on the machine learning/artificial intelligence technique selected by the candidate/team.

(2)  Use the labeled data to train a supervised machine learning or artificial intelligence algorithm with these rules.

(3)  Run a test of the machine learning/artificial intelligence algorithm on an unconditioned dataset.

(4)  Perform testing of the machine learning/artificial intelligence algorithm that includes effectiveness tests using a methodology that is equal to or better than 10-fold cross-validation.

(5)  Correct errors that the trained algorithm does not handle properly by applying additional training or hand-crafted rules. The Sponsor believes that only a few hand-crafted rules will be needed to address most problems.

(6)  Use statistical sampling to estimate the number of errors in the conditioned dataset.

(7)  Data conditioning experiments shall be designed to be no longer than 60 days in duration per experiment.

(8)  Execute the Sponsor-approved data conditioning capability experiments using the existing IDL system(s) and cloud environment.

(9)  Recommend additional data conditioning methods using advanced data capabilities such as machine learning and other artificial intelligence capabilities. If approved by the Sponsor, include these methods and test results using industry-accepted standards and methods.

(10)  Create and deliver an IDL Data Science Experiment Plan for each experiment. The first plan is due no later than 60 days after implementation of contract award. The document shall include the approach, schedule, program plan, forecasted compute costs, and any additional software needs for each experiment. A subsequent IDL Data Science Experiment Plan is due every 90 days for the period of the contract.

(11)  Document the results of the experiments in IDL Data Science Experiment Results no later than 30 days after the completion of each experiment. The IDL Data Science Experiment Results shall include a proposed methodology to implement the machine learning/artificial intelligence algorithm into the existing ETL processes.
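
For illustration only, tasks (4) and (6) above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the Sponsor's or GDIT's method: the function names (k_fold_indices, wilson_interval, estimate_error_count) are hypothetical, and the Wilson score interval is one common choice for estimating an error rate from a random sample; the posting does not prescribe a specific sampling technique.

```python
import math
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper: every item lands in exactly one test fold.
    """
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k near-equal folds
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def wilson_interval(errors, sample_size, z=1.96):
    """Wilson score interval for an error proportion (z=1.96 is about 95%)."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    p = errors / sample_size
    denom = 1 + z * z / sample_size
    centre = (p + z * z / (2 * sample_size)) / denom
    half = z * math.sqrt(
        p * (1 - p) / sample_size + z * z / (4 * sample_size ** 2)
    ) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def estimate_error_count(records, is_error, sample_size, seed=0):
    """Estimate the number of erroneous records from a random sample.

    `is_error` stands in for a manual review of each sampled record
    (an assumption; the posting does not say how errors are judged).
    Returns (point_estimate, low, high) as record counts.
    """
    sample = random.Random(seed).sample(records, sample_size)
    errors = sum(1 for r in sample if is_error(r))
    low, high = wilson_interval(errors, sample_size)
    n = len(records)
    return errors / sample_size * n, low * n, high * n
```

In this sketch the interval ignores the finite-population correction, so for samples that are a large fraction of the conditioned dataset the bounds are somewhat wider than necessary.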

Guides the successful completion of major programs and may function in a project leadership role.

Develops data-driven solutions to unusually complex business challenges.

Utilizes advanced tools and computational skills to interpret, connect, predict and make discoveries in complex data and deliver recommendations for business and analytic decisions.

Uses predictive modeling to increase and optimize customer experiences, efficiencies, process improvements, and other business outcomes.

Works with stakeholders throughout the organization to identify opportunities for leveraging company data to drive business solutions.

Performs additional duties as assigned.

May coach and review the work of less-experienced professionals.

May serve as a team or task leader (not a people manager).

Designs, researches, and develops highly advanced processes, which may result in new product/business opportunities for the company.

DESIRED QUALIFICATIONS: MS (or equivalent experience), 10+ years of experience.