DeepSky Lab

Data and Analytics, Intelligence, Homeland Security 3 MIN Read

Geospatial Data Cleaning: Five Challenges & Five Solutions

September 28th, 2022

Anyone who’s ever worked with data will tell you that data isn’t perfect. As a career-long geospatial professional, most recently a Geospatial Functional Expert with GDIT and formerly a U.S. Geospatial Engineer, I concur. Clean data is consistent, relevant, valid, recent, uniform, and complete … and rare.

That’s why understanding how to curate and clean geospatial data is such an essential element of any analytical strategy. It’s also, sometimes, an overlooked one. Every analyst needs to understand the challenges associated with data curation and cleaning, as well as how to overcome them. That understanding is the foundation on which actionable analytical results are built.

So, let’s look at five common challenges associated with geospatial data, as well as their solutions:

Not Knowing What “Right” Looks Like

Yes, there is such a thing as “right” and “wrong” data, and, yes, sometimes analysts begin a project not knowing the difference. This is where standards and documentation come in. A data schema that governs how data should be presented and, therefore, interpreted is essential. Make sure you have one, or that one is developed at the start of every project.
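To make the idea of a schema concrete, here is a minimal Python sketch of checking a record against one. The field names, the SCHEMA layout, and the validate_record helper are hypothetical, chosen only for illustration, and the range checks assume coordinates in decimal degrees. A real project would take its schema from its own standards and documentation.

```python
# Hypothetical schema: field name -> (expected type, required?)
SCHEMA = {
    "site_id": (str, True),
    "lat": (float, True),
    "lon": (float, True),
    "observed": (str, False),
}


def validate_record(record, schema=SCHEMA):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for field, (expected_type, required) in schema.items():
        value = record.get(field)
        if value is None:
            if required:
                problems.append(f"missing required field: {field}")
            continue
        if not isinstance(value, expected_type):
            problems.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )

    # Range checks (assumption: coordinates are decimal degrees)
    lat, lon = record.get("lat"), record.get("lon")
    if isinstance(lat, float) and not -90.0 <= lat <= 90.0:
        problems.append(f"lat out of range: {lat}")
    if isinstance(lon, float) and not -180.0 <= lon <= 180.0:
        problems.append(f"lon out of range: {lon}")
    return problems


good = validate_record({"site_id": "A1", "lat": 38.9, "lon": -77.0})
bad = validate_record({"lat": 138.9, "lon": -77.0})
```

In this sketch, good comes back empty, while bad reports both the missing site_id and the out-of-range latitude. Returning a list of problems instead of raising on the first one lets an analyst see everything wrong with a record in a single pass.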

Inability to Edit Data

In some cases, often for security purposes, analysts will not have the proper administrative rights or the training required to obtain them. In this case, and only after there’s consensus around what “right” data looks like, analysts should request that their admins provide them with the appropriate roles and permissions to edit and manipulate the data.

Missing Source Data or Bad Metadata

Sometimes data sources are missing, or the metadata associated with them is bad or incomplete. In these instances, analysts should investigate the sources and perform research to remedy the problem. Only when you have a complete picture of where data came from can you reliably incorporate it into your analysis.
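One way to find these gaps early is to triage datasets by how complete their metadata is. The sketch below is illustrative only: the required fields in REQUIRED_METADATA and the helper names are assumptions, not a standard, and a real project would take its list from its own metadata requirements.

```python
# Hypothetical list of metadata fields a dataset must carry before use
REQUIRED_METADATA = ("source", "collection_date", "coordinate_system")


def missing_metadata(metadata, required=REQUIRED_METADATA):
    """Return the required fields that are absent or blank."""
    return [key for key in required if not str(metadata.get(key) or "").strip()]


def triage(datasets):
    """Split datasets into those ready for analysis and those needing research."""
    ready, needs_research = [], {}
    for name, metadata in datasets.items():
        gaps = missing_metadata(metadata)
        if gaps:
            needs_research[name] = gaps
        else:
            ready.append(name)
    return ready, needs_research


ready, needs_research = triage({
    "roads": {
        "source": "county GIS office",
        "collection_date": "2021-06-01",
        "coordinate_system": "EPSG:4326",
    },
    "wells": {"source": " ", "coordinate_system": "EPSG:4326"},
})
```

Here the made-up "wells" dataset is flagged because its source is blank and its collection date is missing. The output is a research to-do list. It does not decide whether the data is trustworthy.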

Lack of Tools to Contain the Data

To put it plainly, not all databases are created equal. Storing data in a way that’s secure, accessible, and easily updated is important. If you find yourself with inadequate database tools or management systems, make sure management provides the resources or workarounds.

Cleaning Tools That Vary in Quality

Just like databases, not all data cleaning tools are created equal. Data cleaning is a cyclical process: analysts import, merge, rebuild, standardize, normalize, deduplicate, verify, enrich, and export data, and then start again as new data imports become available. At every stage, there are tools to help. Within ArcGIS, for example, there are native tools that can automate cleaning. Make sure you’re using quality tools that have an earned reputation for trust and dependability. Likewise, make sure you track your cleaning procedures, since they are often recorded in metadata and used to investigate errors.
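As a rough illustration of a few of these stages, and of tracking the procedure as you go, here is a small sketch in plain Python. It is not an ArcGIS workflow. The record layout, the clean function, and the rounding precision are all assumptions made for the example.

```python
def clean(records, precision=5):
    """Standardize, deduplicate, and verify point records, logging each step."""
    log = []

    # Standardize: trim and upper-case names, coerce and round coordinates
    standardized = [
        {
            "name": r["name"].strip().upper(),
            "lat": round(float(r["lat"]), precision),
            "lon": round(float(r["lon"]), precision),
        }
        for r in records
    ]
    log.append(f"standardize: processed {len(standardized)} records")

    # Deduplicate: keep the first record for each (name, lat, lon) key
    seen, unique = set(), []
    for r in standardized:
        key = (r["name"], r["lat"], r["lon"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    log.append(f"deduplicate: removed {len(standardized) - len(unique)} duplicates")

    # Verify: drop coordinates outside valid decimal-degree ranges
    valid = [
        r for r in unique
        if -90 <= r["lat"] <= 90 and -180 <= r["lon"] <= 180
    ]
    log.append(f"verify: dropped {len(unique) - len(valid)} out-of-range records")
    return valid, log


raw = [
    {"name": " depot ", "lat": "38.90000", "lon": "-77.0"},
    {"name": "DEPOT", "lat": 38.9, "lon": -77.0},
    {"name": "tower", "lat": 95, "lon": 10},
]
cleaned, procedure_log = clean(raw)
```

The returned log is the kind of procedural record that can be written into metadata, or consulted later when an error needs investigating.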

Data curation and cleaning, clearly, are not without challenges. But starting any analytical endeavor with an awareness of these challenges, and with the knowledge of how to overcome them, should give any analyst the confidence they need to set themselves up for success.