Significant Digits

Everyone who has taken a first-year chemistry class has learned that significant digits (aka “significant figures” or “sig figs”) indicate the precision of a measurement. The basic rule is that you keep every measurement digit you are certain about, plus one more that you estimate. Unfortunately, computers don’t know anything about significant digits. Developers creating data systems for scientific measurements should always include a rounding step as part of any data output. Not embracing significant digits can have … uhm … “significant” consequences.
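As a sketch of what such a rounding step might look like, the hypothetical helper below (not from any particular library) rounds a value to a chosen number of significant digits rather than a fixed number of decimal places:

```python
from math import floor, log10

def round_sig(x, sig):
    """Round x to `sig` significant digits.

    Works for values of any magnitude by shifting the rounding
    position based on the base-10 exponent of x.
    """
    if x == 0:
        return 0.0
    return round(x, sig - 1 - floor(log10(abs(x))))

# A temperature reported with far too many digits,
# trimmed to three significant digits:
print(round_sig(68.478302, 3))
```

Applying a function like this just before serialization keeps spurious digits out of every output file.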

Precision

When collecting scientific data, it is desirable to measure things as precisely as possible, thus reducing measurement error as one of the sources of variance in downstream analysis. It is also important to report that precision using the commonly accepted rules for significant digits.

  1. Include each digit that can be accurately measured
  2. Include one more digit associated with an estimation

For example, if a small thermometer only has marks every 10 degrees, then we can report our room temperature as 68°F, estimating the ones place. If a larger thermometer has tick marks for every degree, we might report 68.5°F. A highly accurate digital thermometer might report 68.47°F.

But no one outside of an applied physics lab should ever report a temperature of 68.478302°F. No off-the-shelf thermometer is capable of measuring temperature with a precision of millionths of a degree.

This issue is sometimes a problem within environmental science, where new devices generate lots of data; that data is processed by code written by software engineers; and the number of decimal places displayed is often determined by a global setting in some software library rather than by an understanding of measurement error.

To present just one example from 2022: A USGS elevation query service currently returns elevations in meters with nine decimal places, precise to the nanometer! Unless they are measuring elevation with a scanning electron microscope, this is way too much precision. A single decimal place would be better.
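Fixing this kind of output takes a single formatting call. The snippet below is a minimal sketch (the function name and the raw value are hypothetical, not taken from the USGS service):

```python
def format_elevation(meters):
    """Format an elevation in meters with one decimal place,
    matching the ~0.1 m precision typical of survey data."""
    return f"{meters:.1f}"

# A hypothetical raw value with spurious nanometer "precision":
print(format_elevation(1623.487302917))
```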

Negative consequences of too much precision

This focus on significant digits may seem pedantic, but it has real-world consequences for public-facing data systems. Web-based systems need to send data files to client applications. Smaller data files mean improved performance and lower data transfer costs. If your web-based visualization system needs to scale to tens of thousands of users, this will be important. Every non-significant digit in a data file is an extra byte of “garbage data” that should be removed. We’ll look at two examples below.

GeoJSON

Many web-based maps include GeoJSON files with many thousands of locations. Each location will have a block of ASCII text similar to the following chunk of 189 characters (= 189 bytes):

    {
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [-107.8711, 37.3022]
      },
      "properties": {
        "temp": "68.5"
      }
    },

If longitude, latitude and temp each had nine decimal places, that would represent an extra 18 bytes — an expansion of 10%.
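One way to guard against that expansion is to round every coordinate and measured property just before serialization. Here is a minimal sketch using Python's standard `json` module (the raw over-precise values are invented for illustration; four decimal places of longitude corresponds to roughly 11 m on the ground):

```python
import json

# A feature as it might arrive from an instrument pipeline,
# carrying far more digits than the measurements justify:
feature = {
    "type": "Feature",
    "geometry": {
        "type": "Point",
        "coordinates": [-107.871101234567, 37.302212345678],
    },
    "properties": {"temp": 68.478302},
}

# Trim coordinates to 4 decimal places (~11 m) and temperature
# to 1 decimal place before writing the file.
feature["geometry"]["coordinates"] = [
    round(c, 4) for c in feature["geometry"]["coordinates"]
]
feature["properties"]["temp"] = round(feature["properties"]["temp"], 1)

print(json.dumps(feature))
```

Multiplied across thousands of features, those trimmed digits add up to a noticeably smaller payload.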

CSV

With time-series data files, whether CSV or JSON, the expansion can be much more severe.

2210010100,46.513508,-114.090897,9,12939,12,-9999,2,18.3,60,902.9,-9999,45,1.6,340,14.3,0
2210010200,46.513508,-114.090897,9,12939,10,-9999,2,16.8,70,903.1,-9999,49,0.6,334,14.4,0

The above records from a CSV file contain a date-time stamp, latitude, longitude and 14 other measured variables, using -9999 as a missing value flag. The total size of each record is 90 bytes. Without careful attention to significant digits, a not unusual default setting of six significant digits would nearly double the size of each record.
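The fix is to format each column with the precision its instrument actually supports, rather than accepting a library-wide default. The sketch below shows the idea on a hypothetical subset of the columns above (the per-column formats are assumptions, not the actual specification for this file):

```python
# Hypothetical record: timestamp, latitude, longitude, temp (°C), pressure.
# Over-precise raw values as they might come from upstream processing:
row = [2210010100, 46.513508123456, -114.0908971, 18.3456, 902.94]

# One format string per column, matching each instrument's precision:
formats = ["{:d}", "{:.6f}", "{:.6f}", "{:.1f}", "{:.1f}"]

line = ",".join(f.format(v) for f, v in zip(formats, row))
print(line)
```

Because the precision is set per column, a one-decimal temperature sensor never gets padded out to six decimal places just because the coordinates need them.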

Best wishes for proper handling of your significant digits!
