August 4th, 2022
Managing Your Data
Gathering ocean data is expensive and time-consuming, but with a little work you can leverage your investment in data production to create ongoing value for your company.
Have you ever invested in a new tool for a home improvement project that you used once, and then put on a shelf, never to be used again? Historically, we as an industry have tended to do the same thing with our data.
The data we gather is useful beyond a single analysis or the answer to one question. It can be used to build historical context and learn lessons from previous endeavors, informing future decisions and improving efficiency. It can be used to create transparency through visualizations and infographics. It can also facilitate collaborations with other organizations to solve problems larger than any one organization could address alone.
Data management is often viewed as a complicated, technical subject, but it doesn’t need to be. In this article I’m going to explore, from a non-technical perspective, some of the tools and techniques we can borrow from the data management world to better capitalize on our investment in producing data.
Writing Usable Metadata
Metadata is a complex-sounding term for a relatively simple topic. The first step in capitalizing on the data you’re producing is to record important information about the data. Write down all of the information someone would need to know about the data in order to use it again in the future. This includes things like units of measurement, a description of where, why, and how the data was collected, and information about which instruments were used to produce the data.
The goal in recording this information is for the data to stand on its own, so that it can be used and reused without needing to talk to the people who produced it to get the necessary context. Without proper metadata, the data becomes more and more difficult to use over time as people move on to new jobs and memories fade.
Organizations such as the Canadian Integrated Ocean Observing System (CIOOS) and the U.S. Integrated Ocean Observing System (IOOS) publish guidance on what information is important to record and how to organize it.
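For teams that do have someone comfortable with a little scripting, here is a minimal sketch of one way to keep this information next to the data itself. It saves the metadata as a small JSON file beside the data file and refuses to save if something important is missing. The field names and the `.metadata.json` naming are my own assumptions for illustration; they are not taken from the CIOOS or IOOS guidance, which you should consult for the fields that matter in your context.

```python
import json
from pathlib import Path

# Hypothetical field names chosen for illustration only; they are not
# taken from the CIOOS or IOOS metadata profiles.
REQUIRED_FIELDS = [
    "title",
    "description",   # where, why, and how the data was collected
    "units",
    "location",
    "instrument",
    "collected_by",
]


def missing_fields(metadata: dict) -> list[str]:
    """Return the required fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]


def write_sidecar(data_path: str, metadata: dict) -> Path:
    """Save metadata beside the data file as <name>.metadata.json."""
    missing = missing_fields(metadata)
    if missing:
        raise ValueError("Missing metadata fields: " + ", ".join(missing))
    sidecar = Path(data_path).with_suffix(".metadata.json")
    sidecar.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return sidecar
```

Calling `write_sidecar("ctd_cast.csv", metadata)` with a complete dictionary would create `ctd_cast.metadata.json` in the same folder, so the context travels with the data wherever it is copied.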
Storing Your Data
Storing the data is the next important step in making sure that you and your team continue to get value from it in the future. This step may seem obvious, but there are some common pitfalls in where and how you store your data. By following a few simple guidelines, you can make sure the data can be easily reused.
- Keep a central record of where all of your data is stored, especially if you are storing data in multiple locations. This could be as simple as a spreadsheet where you put the title, description, and storage location of each of your datasets. This will help your team to know where to look in the future.
- Make sure your data is backed up. Hard drives fail over time, and data that is stored in only one location will eventually be lost. Keeping a second copy of your data in a different location ensures that it won’t be lost if one of your hard drives fails.
- Make sure that your data is stored in a common format. Often instruments will produce data in a proprietary format that was created by the instrument manufacturer. These formats have their benefits, but their use often requires special software maintained by the instrument manufacturer. Converting your data to a non-proprietary format ensures that you will still be able to use the data, even if the manufacturer stops maintaining their software.
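The first two guidelines can be partly automated. The sketch below is one possible approach, not a prescribed workflow: it appends a row to a central inventory spreadsheet (a CSV file) and records whether the backup copy is byte-for-byte identical to the original by comparing SHA-256 checksums. The column names and function names are hypothetical and chosen only for this example.

```python
import csv
import hashlib
from pathlib import Path

# Hypothetical column names for the central inventory spreadsheet.
COLUMNS = ["title", "description", "location", "backup_location",
           "sha256", "backup_verified"]


def sha256_of(path) -> str:
    """Compute a file's SHA-256 checksum, reading it in 1 MB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original, backup) -> bool:
    """True if the backup exists and has the same checksum as the original."""
    return Path(backup).is_file() and sha256_of(original) == sha256_of(backup)


def add_to_inventory(inventory_csv, title, description, original, backup):
    """Append one dataset to the inventory, writing a header for a new file."""
    inventory = Path(inventory_csv)
    is_new = not inventory.exists()
    with inventory.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(COLUMNS)
        writer.writerow([title, description, str(original), str(backup),
                         sha256_of(original), backup_matches(original, backup)])
```

Because the inventory is an ordinary CSV file, it still opens in any spreadsheet program, so non-technical team members can read and search it without running any code.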
Cataloging Your Data
As discussed above in the section on storing data, it’s important to maintain a central record of what data you have and where it’s stored so that your team is aware of what resources are at their disposal and where to find them. If your organization produces a great deal of data, you may find that a spreadsheet becomes hard to manage and that identifying which datasets are relevant to a particular problem becomes difficult.
If you and your team are experiencing these problems, you may be ready for a data catalog. A data catalog is basically a search engine for your data. It gives your team the ability to search and filter your datasets based on the metadata that you are producing. There are excellent open source catalogs available, such as CKAN: https://ckan.org/. These are great if you wish to have full ownership over your data catalog and you have technical staff who are able to configure and maintain it. Alternatively, you can look to groups like CIOOS, which make a catalog available for public use.
By following these simple guidelines, you can create ongoing value from the data that you and your company have invested in producing. Even if you don’t have a dedicated data manager on your staff or access to a technical team, it is still possible for you and your team to realize the benefits of well-managed data.
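To make the "search engine for your data" idea concrete, here is a toy sketch of searching and filtering over metadata records. It is not how CKAN works internally and does not use CKAN's API. The function name, the field names, and the two example records are all made up for illustration.

```python
# Made-up example records; a real catalog would load these from your metadata.
EXAMPLE_RECORDS = [
    {"title": "Harbour temperature survey", "instrument": "CTD", "year": 2021},
    {"title": "Offshore current profiles", "instrument": "ADCP", "year": 2022},
]


def search_catalog(records, keyword=None, **filters):
    """Return records that mention the keyword and match every filter exactly.

    Matching is case-insensitive. The keyword is looked for in all of a
    record's values; each filter compares one named field.
    """
    results = []
    for record in records:
        text = " ".join(str(value) for value in record.values()).lower()
        if keyword and keyword.lower() not in text:
            continue
        mismatch = any(
            str(record.get(field, "")).lower() != str(wanted).lower()
            for field, wanted in filters.items()
        )
        if not mismatch:
            results.append(record)
    return results
```

For example, `search_catalog(EXAMPLE_RECORDS, keyword="temperature", year=2021)` would return only the harbour survey record. A real catalog adds a web interface, access control, and much faster searching, but the underlying idea is the same: good metadata is what makes the search possible.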