Data Blog

Pushmi-Pullyu and Retrieving GeoNet Data

Published: Wed Mar 20 2024 9:20 AM
Welcome, haere mai to another GeoNet Data Blog. Today’s blog is about two methods we can use to retrieve data from autonomous recorders installed around Aotearoa-New Zealand: what they are, why we almost exclusively use one method and not the other, and how they deal with communications outages.

Hugh Lofting’s 1920 children’s novel The Story of Doctor Dolittle is about a doctor who talks to animals. One of the more unusual animal characters in the book is the Pushmi-Pullyu, a fictional animal that is a cross between a gazelle and a unicorn and has two heads at opposite ends of its body.

A Pushmi-Pullyu illustrated in Hugh Lofting’s 1920 children’s novel The Story of Doctor Dolittle. Source Wikimedia Commons.

What does a Pushmi-Pullyu have to do with retrieving GeoNet data? To be honest, very little, but the name of the animal, Pushmi-Pullyu, includes the words “push” and “pull”, which are the two methods that can be used to retrieve data from continuous field data recorders. Hopefully you are interested enough to read further. No more references to fictional animals, we promise.

Before we get started, let’s quickly explain how GeoNet uses some terms when referring to our data. We use “acquire” to indicate getting a measurement or observation from a location in the natural environment. When the acquisition is done by a digital recorder and a measuring device (or “sensor”), we refer to that as “recording”. When a person makes the measurement (or acquires a “sample”), we call that “sampling”. When the acquired data arrive at the GeoNet Data Centre, we “collect” them, and typically apply some processes to curate and archive the data. In this blog, we use the term “retrieve” to mean getting the data from the field to our Data Centre. GeoNet sometimes uses the term “transport” for this step.

It’s a bit geeky, but if you are curious to know a bit more about the terms used for GeoNet data, have a look at the GeoNet Aotearoa New Zealand Glossary of Data-related terms.

Push Versus Pull


When we first started collecting data in 2001, GeoNet used two data retrieval methods to build the best solution for each data collection problem and available technology. Historically, we used “push” for our seismic and geodetic recorders, but we now use the “pull” method for pretty much everything (with one exception). With the “push” method of data retrieval, a field-based data recorder sends data to our Data Centre. With the “pull” method, our Data Centre requests data from a recorder. While the result is the same (data arrive at our Data Centre), the two methods are fundamentally different.

With “push”, the field data recorder is in charge. It keeps track of what data have been successfully sent and knows what to try to send next. Our Data Centre is passively receiving the data and storing it away. With “pull”, our Data Centre is in charge and works out exactly what data it wants from a field recorder and asks for it. This might be data recorded between two dates or times, or the last few minutes or hours of data. The Data Centre needs to know what was already collected and sends a specific request to the field recorder for the data it wants.
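To make the difference concrete, here is a minimal Python sketch of the two models. It is an illustration only, not GeoNet’s software: the class names, method names, and the use of plain integers for sample times are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class FieldRecorder:
    """Hypothetical recorder that stores samples keyed by an integer time."""
    samples: dict = field(default_factory=dict)
    last_sent: int = -1  # push only: newest sample time already delivered

    def record(self, t, value):
        self.samples[t] = value

    def push(self, data_centre):
        """Push: the recorder decides what to send next."""
        for t in sorted(self.samples):
            if t > self.last_sent:
                data_centre.receive(t, self.samples[t])
                self.last_sent = t

    def serve(self, start, end):
        """Pull: answer a request for samples with start <= t < end."""
        return {t: v for t, v in self.samples.items() if start <= t < end}


@dataclass
class DataCentre:
    archive: dict = field(default_factory=dict)
    collected_up_to: int = 0  # pull only: everything before this time is archived

    def receive(self, t, value):
        """Push: passively store whatever arrives."""
        self.archive[t] = value

    def pull(self, recorder, now):
        """Pull: work out the window still needed and ask for exactly that."""
        self.archive.update(recorder.serve(self.collected_up_to, now))
        self.collected_up_to = now
```

Notice where the bookkeeping lives. In the push path the recorder carries `last_sent`; in the pull path the Data Centre carries `collected_up_to` and the recorder simply answers questions.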

Push or Pull, Which Is the Best Choice?


As is the case with many questions, the answer to “which is best, push or pull?” is “it depends”! In general, the “pull” option is nowadays considered more secure and safer from cyber-attacks, because the Data Centre has full control over which field instrument it collects data from. But with the development of IoT (Internet of Things) devices, the market is now shifting towards relying more on the recorder and adopting a “push” model.

Considering the technologies, scientific equipment, and resilience we need to operate the GeoNet Data Centre, the “pull” method is currently our preferred option. We have a number of computer systems at our Data Centre that, depending on which recorder we are pulling the data from, know what data to collect, when to collect it, and how often, and make sure that we have retrieved all the data we recorded out in the natural environment.

We ensure that the computer systems ask for the data in a way that is efficient and cost effective for our “data transport” (or communication) network. For data streams where we need to collect data almost continuously (for example from our seismic stations and real time GNSS stations), the transport network and the Data Centre are set to constantly communicate with the sensors and “pull” data packets while ensuring we are collecting everything.

For most of our other data types, we can afford to receive longer time chunks of data less often. For example, data from GNSS receivers not operating in real time and envirosensor data are pulled once an hour, with the Data Centre running a computer program that generates a data request and sends it to a recorder each time.

What Happens If Communications Are Lost?


With almost all our field recorders, we maintain a continuous communications connection. The recorders all have an IP (Internet Protocol) address, so they are effectively part of our big “computer network”. The communications system that links that network to the field recorders mostly uses either a cellular phone network or a satellite network. While we work really hard to keep the continuous communications connection to every field recorder working, sometimes there are failures. How do we get data back if there is a communications outage? This depends a lot on the scientific equipment we use and the system that is collecting the data at the Data Centre. How this system is programmed will differ depending on whether a “push” or “pull” method is used.

It is important to know that, regardless of the type of scientific equipment, virtually all of our recorders have the ability to continue recording and save the data “locally”. So, if there is a communication outage, we are generally able to retrieve all data that were not collected. We typically call this process “backfill”.

Push

We mentioned earlier that there is only one exception in our Pushmi-Pullyu story. So, who is this unicorn? It is the New Zealand DART stations, which regularly push data from the bottom of the ocean to our Data Centre (it is a bit more complicated than this, but more on that another day). In this case, the “push” method is necessary to use the least amount of power and preserve battery life for sensors that are located deep down on the ocean floor. The DART stations continue recording new data and mark it as “not yet sent”. They also know that data are to be pushed at specific times, so when the right time comes, they start sending data chunks without any prompting from us. This normally happens every 6 hours, but if there is an event (a “trigger”) the DART station knows it needs to send data more often. It is a very clever unicorn indeed!
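The scheduling idea described above can be sketched in a few lines of Python. This is a simplified illustration, not the DART firmware: the class, the `send` callback, and the one-hour event interval are assumptions for the example (the text only says data are sent “more often” after a trigger). The 6-hour routine interval comes from the text.

```python
from dataclasses import dataclass, field

ROUTINE_INTERVAL_H = 6  # from the text: data are normally pushed every 6 hours
EVENT_INTERVAL_H = 1    # assumption: illustrative value, the real one is not stated


@dataclass
class PushRecorder:
    unsent: list = field(default_factory=list)  # samples marked "not yet sent"
    last_push_h: int = 0
    triggered: bool = False

    def record(self, hour, value):
        self.unsent.append((hour, value))

    def interval(self):
        return EVENT_INTERVAL_H if self.triggered else ROUTINE_INTERVAL_H

    def maybe_push(self, hour, send):
        """Push all unsent samples if a push slot has arrived."""
        if hour - self.last_push_h < self.interval():
            return False         # not time yet, stay quiet to save power
        self.last_push_h = hour
        if send(list(self.unsent)):
            self.unsent.clear()  # delivered, so no longer "not yet sent"
            return True
        return False             # keep the data and try again at the next slot
```

Because samples stay in `unsent` until a send succeeds, a failed push loses nothing; the same samples simply go out with the next chunk.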

Pull

All other sensors around Aotearoa are set to use a “pull” method. How that works, and how “pull” responds to a communications outage, depends on the timing and length of the outage and how frequently our Data Centre tries to pull data. As most of our data recorders operate autonomously, they continue recording data even if we can’t communicate with them. This means that once communications are restored, a new “pull” request will normally succeed without a glitch.

A problem arises when an outage coincides with our Data Centre asking for data. The field recorder doesn’t get the request, and the data aren’t sent. Let’s illustrate this with some of our data recording systems.

The envirosensor data set is retrieved from our LRDCP (Low Rate Data Collection Platform) once an hour. The actual data request is “give me the data recorded between now and one hour ago”. If an outage lasts less than an hour and doesn’t coincide with a data pull request, then there is no impact, and we get all the data when we request it. But if the communications are out when a pull request is made, our Data Centre notes which request failed and we have to repeat this request at a later time. This is an example of the “backfill” process we mentioned earlier.
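The hourly pull and its backfill can be sketched as follows. This is a minimal illustration under stated assumptions, not the LRDCP code: `HourlyPuller` and the `fetch(start, end)` callable are hypothetical, and a `ConnectionError` stands in for a communications outage.

```python
from datetime import datetime, timedelta


class HourlyPuller:
    def __init__(self, fetch):
        self.fetch = fetch         # hypothetical callable: fetch(start, end) -> list of records
        self.failed_windows = []   # requests to repeat later (the backfill queue)
        self.archive = []

    def pull(self, now):
        """Ask for the data recorded between one hour ago and now."""
        self._try((now - timedelta(hours=1), now))

    def backfill(self):
        """Repeat every request that failed earlier."""
        pending, self.failed_windows = self.failed_windows, []
        for window in pending:
            self._try(window)

    def _try(self, window):
        try:
            self.archive.extend(self.fetch(*window))
        except ConnectionError:
            self.failed_windows.append(window)  # note which request failed
```

The key design choice is that a failed request is remembered as a time window rather than retried blindly, so the later backfill asks for exactly the hour that was missed.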

With GNSS receivers (recorders) we also pull data hourly, so the impact of an outage is similar to envirosensor data, and we try to get any data we’ve missed in a similar way.

ScanDOAS data are pulled by the Data Centre only once each day, at about 7 pm, and subsequently deleted from the field computer. With daily retrieval, the chance of a communications outage impacting data collection is lower than for data that are pulled more often, but if there is an outage when we try to pull data, we fail to retrieve a whole day’s worth. The recovery scheme for missed data automatically requests any available files on the field computer every 30 minutes.
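A file-based pull of this kind might look like the sketch below. It is only an illustration: the function name is made up, and two Python dictionaries stand in for the field computer’s disk and the Data Centre archive.

```python
def pull_and_clear(field_files, archive, link_up):
    """Copy every available file to the archive, then delete it in the field.

    field_files and archive are dicts of filename -> contents (hypothetical
    stand-ins for real storage). Returns the names retrieved; an empty list
    means nothing was retrieved and a later attempt is needed.
    """
    if not link_up:
        return []
    retrieved = []
    for name in sorted(field_files):
        archive[name] = field_files[name]
        retrieved.append(name)
    for name in retrieved:
        del field_files[name]  # delete only after the copy is in the archive
    return retrieved
```

Because the request is for “any available files” rather than one named day, a file missed during an outage is picked up automatically by the next successful attempt.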

Our webcam photograph data set is a little different. Rather than asking the webcam for a particular photograph, every 10 minutes our Data Centre first instructs the camera to take a photograph and then pulls that photograph back. A communications outage means a webcam does not receive the instruction to take a photograph, so the photograph never exists and can’t be retrieved later. As we are sending an instruction to each webcam every 10 minutes, a communications outage is quite likely to affect creating and retrieving at least some photographs.
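A little arithmetic shows why the 10-minute cycle is exposed to outages. The sketch below counts how many capture instructions fall inside an outage. It assumes, purely for illustration, that instructions go out at exact multiples of the interval (minute 0, 10, 20, and so on) and that outage times are whole minutes; the function name is hypothetical.

```python
def lost_photos(outage_start_min, outage_end_min, interval_min=10):
    """Count instructions sent at 0, interval, 2*interval, ... minutes
    that fall within start <= t < end."""
    first = -(-outage_start_min // interval_min)  # ceiling division: first slot at or after the start
    last = (outage_end_min - 1) // interval_min   # last slot strictly before the end
    return max(0, last - first + 1)
```

For example, an outage from minute 5 to minute 25 misses the instructions at minutes 10 and 20, so `lost_photos(5, 25)` gives 2, and those photographs can never be backfilled. The same outage with an hourly cycle, `lost_photos(5, 25, interval_min=60)`, gives 0.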

The webcams also autonomously record one-image-per-second “video images”, and we can retrieve those “as needed” by a special, manual pull request. We are working on a review of our webcam technology, what our volcanologists need, how often we record images, etc. While “push” versus “pull” and instructing cameras to take photographs isn’t part of the review, it’s likely to influence our thinking about how we need to retrieve photographs. Possibly more on that as the review progresses.

That’s it for now


Most people have probably never thought about exactly how we retrieve data from our autonomous field recorders and the steps required to make it happen. It’s just one of several interesting “behind the scenes” activities GeoNet carries out to create and deliver data and data products to our users. We hope you enjoyed it. You can find our earlier blog posts through the News section on our web page; just select the Data Blog filter before hitting the Search button. We welcome your feedback on our data blogs, and if there are any GeoNet data topics you’d like us to talk about, please let us know! Ngā mihi nui.

Contact: info@geonet.org.nz