Working with data from varied sources can be frustrating: some data arrive as CSV files, some as XML, some as HTML pages, and still others in relational databases or MS Excel spreadsheets.
This post covers the UNIX tools that every data manager should be familiar with when working with such varied data sources.
With enough effort, it is possible to fit a square peg into a round hole. But we have all learned, sometimes more than once, that it is much easier when peg and hole have the same shape.
Despite what they say, size does matter.
Successful data management is all about finding the right tools and formats for your data. There is no one-size-fits-all solution. The very first question you should ask yourself is: “How much data are we talking about?”