Significant Digits

Everyone who has taken a first-year chemistry class has learned that significant digits (aka “significant figures” or “sig figs”) indicate the precision of a measurement. The basic rule is that you keep every digit you are certain about plus one more that you estimate. Unfortunately, computers don’t know anything about significant digits, so developers creating data systems for scientific measurements should always include a rounding step as part of any data output. Not embracing significant digits can have … uhm … “significant” consequences.
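
As a rough illustration of what such a rounding step might look like, here is a minimal Python sketch. The function name `round_sig` and the choice of Python are assumptions made for this example, not something taken from the full post. It rounds a value to a requested number of significant digits by finding the position of the leading digit first.

```python
import math


def round_sig(value, digits):
    """Round value to the given number of significant digits.

    round_sig is a hypothetical helper written for this example.
    """
    if digits < 1:
        raise ValueError("digits must be at least 1")
    # Zero, infinities and NaN have no leading digit to anchor on.
    if value == 0 or not math.isfinite(value):
        return value
    # Power of ten of the leading digit: 1 for 12.3456, -3 for 0.00123456.
    magnitude = math.floor(math.log10(abs(value)))
    # Keep (digits - 1) places after the leading digit.
    return round(value, digits - 1 - magnitude)


print(round_sig(12.3456, 3))      # 12.3
print(round_sig(0.00123456, 2))   # 0.0012
```

Note that this only rounds the stored number. A float cannot carry trailing zeros, so a measurement such as 1.50 still needs explicit formatting when it is written out.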


Ten UNIX commands every data manager should know

Working with data from varied sources can be frustrating: some data arrive as CSV, some as XML, some as HTML pages, and some as relational databases or MS Excel spreadsheets.

This post covers the UNIX tools that every data manager should know in order to work with varied data sources.


Data volumes

Despite what they say, size does matter.

Successful data management is all about finding the proper tools and formats for dealing with your data. There is no one-size-fits-all solution. The very first question to ask yourself is: “How much data are we talking about?”
