I work with lots of environmental time series data from stationary instruments. This post describes why you should avoid mixing data and metadata in a single file and instead suggests an easy-to-implement, easy-to-use, maximally compact format consisting of two .csv files linked by unique identifiers.
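As a minimal sketch of the two-file idea, the example below keeps one row per instrument in a metadata file and one row per measurement in a data file, joined on a shared identifier. The column names (`site_id`, `latitude`, `temperature_c`, and so on) and the sample values are hypothetical, chosen only for illustration, and are not necessarily the layout the full post recommends.

```python
import csv
import io

# Hypothetical metadata file: one row per instrument, keyed by a unique ID.
META_CSV = """site_id,latitude,longitude,elevation_m
A01,47.61,-122.33,56
B02,45.52,-122.68,15
"""

# Hypothetical data file: one row per measurement, carrying only the ID.
DATA_CSV = """datetime,site_id,temperature_c
2020-01-01T00:00:00Z,A01,4.2
2020-01-01T00:00:00Z,B02,6.1
2020-01-01T01:00:00Z,A01,4.0
"""

def load_metadata(text):
    """Return a dict mapping site_id to that site's metadata row."""
    return {row["site_id"]: row for row in csv.DictReader(io.StringIO(text))}

def join_records(data_text, metadata):
    """Attach each measurement's site metadata by looking up its site_id."""
    joined = []
    for row in csv.DictReader(io.StringIO(data_text)):
        site = metadata[row["site_id"]]  # raises KeyError for an unknown ID
        joined.append({**site, **row})
    return joined

metadata = load_metadata(META_CSV)
records = join_records(DATA_CSV, metadata)
for rec in records:
    print(rec["datetime"], rec["site_id"], rec["latitude"], rec["temperature_c"])
```

Because the site metadata is stored once per instrument rather than repeated on every measurement row, the data file stays compact, and a lookup by identifier recovers the full record whenever it is needed.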
Continue reading “How NOT to format time series data”
Tag: CSV
Significant Digits
Everyone who has taken a first-year chemistry class has learned that significant digits (aka “significant figures” or “sig figs”) indicate the precision of a measurement. The basic rule is that you keep all the measurement digits you are certain about plus one more that you estimate. Unfortunately, computers don’t know anything about significant digits. Developers creating data systems for scientific measurements should always include a rounding step as part of any data output. Not embracing significant digits can have … uhm … “significant” consequences.
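The rounding step could look something like the sketch below. The helper name `round_sig` is hypothetical, and this is only one common way to do it: it uses the base-10 exponent of the value to work out how many decimal places correspond to the requested number of significant digits.

```python
import math

def round_sig(value, digits):
    """Round value to the given number of significant digits."""
    # Zero, infinities, and NaN have no meaningful base-10 exponent.
    if value == 0 or not math.isfinite(value):
        return value
    # Position of the leading digit, e.g. 4 for 12345.678 and -3 for 0.00123.
    exponent = math.floor(math.log10(abs(value)))
    # A negative second argument to round() rounds left of the decimal point.
    return round(value, digits - 1 - exponent)

print(round_sig(12345.678, 3))   # 12300.0
print(round_sig(0.00123456, 3))  # 0.00123
print(round_sig(21.456789, 3))   # 21.5
```

Applying a helper like this just before values are written out keeps stored data at full precision while making sure the output does not claim more precision than the instrument provides.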
Continue reading “Significant Digits”
Ten UNIX commands every data manager should know
Working with data from varied sources can be frustrating: some data will be in CSV format, some in XML, some available as HTML pages, and some stored in relational databases or MS Excel spreadsheets.
This post will cover the UNIX tools that every data manager needs to be familiar with in order to work with varied data sources.
Continue reading “Ten UNIX commands every data manager should know”
Data Structures – Tabular vs. Relational
With enough effort it is possible to fit a square peg into a round hole. But we have all learned — sometimes more than once — that it is much easier if peg and hole have the same shape.
Continue reading “Data Structures – Tabular vs. Relational”
Data volumes
Despite what they say, size does matter.
Successful data management is all about finding the proper tools and formats for dealing with your data. There is no one-size-fits-all solution. The very first question you should be asking yourself is: “How much data are we talking about?”
Continue reading “Data volumes”