Populating & Maintaining Your Data Warehouse

Once you have a well-structured data warehouse, you are ready to populate it (see our post “Five Terrific (and Free) GIS Data Sources” for potential sources). Before I began organizing Inlailawatash’s warehouse, I listed my four main questions:

  1. How will I ensure users know where the data sets came from and when they were last downloaded?
  2. How will I avoid breaking links in my MXDs as data is updated?
  3. How will I make it easy to update the files?
  4. Should I keep an archive of my old spatial data?

Keeping these questions in mind, I worked to create a plan. First, I knew I wanted a central location to house all my metadata so that it could be easily queried and updated. I am a big fan of Excel and look for any excuse to create a table. For my metadata I created a table with eight columns, as shown below. This allows users to see all the required source information in one place.

Metadata Table Column    Example Dataset
Provider                 Province of BC
Source                   Data BC
Theme                    Administrative Boundaries
Name                     Municipalities
File Name                TA_MUNICIP
Date Last Downloaded     October 2015
Next Download Check      October 2016
Source Website           http://catalogue.data.gov.bc.ca/dataset/tantalis-municipalities
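
If the metadata table is exported from Excel to CSV, finding the data sets that are due for a check can be scripted. The sketch below is only an illustration: the CSV export, the column order, and the function names are my assumptions, and the single row is the example data set from the table above.

```python
import csv
import io
from datetime import datetime

# Hypothetical CSV export of the metadata table (one row: the example data set).
METADATA_CSV = """Provider,Source,Theme,Name,File Name,Date Last Downloaded,Next Download Check,Source Website
Province of BC,Data BC,Administrative Boundaries,Municipalities,TA_MUNICIP,October 2015,October 2016,http://catalogue.data.gov.bc.ca/dataset/tantalis-municipalities
"""


def parse_month(text):
    """Parse a 'Month YYYY' string such as 'October 2016' into a datetime."""
    return datetime.strptime(text.strip(), "%B %Y")


def due_for_check(rows, today):
    """Return the rows whose 'Next Download Check' month has arrived or passed."""
    return [row for row in rows if parse_month(row["Next Download Check"]) <= today]


rows = list(csv.DictReader(io.StringIO(METADATA_CSV)))

# List every data set that should be checked as of 1 November 2016.
for row in due_for_check(rows, datetime(2016, 11, 1)):
    print(row["File Name"], row["Source Website"])
```

Passing the date in explicitly, rather than reading the system clock inside the function, makes it easy to ask "what will be due next month?" as well.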

I also wanted to make my data updates as easy as possible. I developed a process so that, no matter who does the downloading, it is done consistently.

  1. Use the “Data Download Metadata” document to find the files that need updating, then click on the hyperlink to go to the download site.
  2. Download the data in the BC Albers projection for province-wide data, or in the appropriate UTM projection for regional data (either zone 10 or zone 11 for BC data).
  3. Once downloaded, put the data in its layer folder, keeping its original name. Where the data is large and must be downloaded by mapsheet, keep all the mapsheets in the same layer folder, each in its own mapsheet folder. A merged version should also be created and updated as more mapsheets are downloaded.
  4. Update the “Data Download Metadata” document with the new “Date Last Downloaded” and “Next Download Check” values.
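
The last step is easy to forget, so it may be worth scripting. Here is a minimal sketch of how a metadata row could be refreshed after a download. The function names are hypothetical, and the 12-month interval is an assumption based on the example row (October 2015 to October 2016); adjust it to suit how often each source is updated.

```python
from datetime import date


def add_months(d, months):
    """Return the first day of the month that falls `months` after d's month."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def record_download(row, downloaded_on, months_until_check=12):
    """Return a copy of a metadata row with both date columns refreshed."""
    updated = dict(row)  # leave the original row untouched
    updated["Date Last Downloaded"] = downloaded_on.strftime("%B %Y")
    next_check = add_months(downloaded_on, months_until_check)
    updated["Next Download Check"] = next_check.strftime("%B %Y")
    return updated


row = {
    "File Name": "TA_MUNICIP",
    "Date Last Downloaded": "October 2015",
    "Next Download Check": "October 2016",
}
print(record_download(row, date(2016, 10, 14)))
```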

My final question, “Should I keep an archive of my old spatial data?”, is still open to debate. Currently I maintain a separate archive, with the same structure as the data warehouse, for previously downloaded versions of my spatial data. However, I am unsure whether I will continue this process. When making this decision for your own organization, remember that data is continuously updated and that having information from the past can come in handy, but also that storing data is expensive.
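
If you do keep an archive, one possible way to handle an update is to move the old file into the archive before copying the new one into place under the same name and path. This is only a sketch, not the exact process I use: the function name, the folder arguments, and the dated subfolder (`stamp`) are assumptions for illustration, and a multi-file format such as a shapefile would need the same treatment for each of its component files.

```python
import shutil
from pathlib import Path


def archive_and_replace(warehouse_root, archive_root, relative_path, new_file, stamp):
    """Move the current file into a dated archive folder, then copy the new one in.

    The archive mirrors the warehouse folder structure under archive_root/stamp.
    The file keeps its name and location in the warehouse, so anything that
    refers to that path still finds a file there after the update.
    """
    current = Path(warehouse_root) / relative_path
    if current.exists():
        archived = Path(archive_root) / stamp / relative_path
        archived.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(current), str(archived))
    current.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(str(new_file), str(current))
    return current


# Example call (all paths are hypothetical):
# archive_and_replace("warehouse", "archive",
#                     "Administrative Boundaries/TA_MUNICIP.shp",
#                     "downloads/TA_MUNICIP.shp", "2015-10")
```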

Good luck!

-Allison Hunt