Manipulating CSV-format files on FreeBSD

Conclusion Updated 2013-08-16.

The initial Maestro implementation site uses CSV files for intermediate data exported from legacy systems. I’m using CSV files because they are human readable and provide a point of control between the legacy systems and Maestro. They also provide an opportunity to manipulate and report on the data using simple command-line Unix tools.

Doing some cursory research on command-line CSV-file manipulation, solutions mentioned most often included:

  • csvfix is a command line tool that reportedly does everything you’d want, including order/select columns, split/merge, generating SQL inserts from CSV data and diffing CSV data, and has a permissive license.
  • csvkit is a collection of programs adhering to the Unix philosophy (a program should do one thing and do it well), intended to be to CSV files what grep, sed, cut, sort… are to text. csvkit is licensed under the permissive MIT license.
  • csvprintf is a simple UNIX command line utility for parsing CSV files, similar to the printf utility. cvsprintf is written in C, and is licensed under the permissive Apache License, Version 2.0.
  • csvtool is included with the OCaml CSV library.
  • GNU Recutils is a set of tools and libraries to access human-editable, plain text databases called recfiles, with support for CSV file import and export (the recfile concept seems similar to SQLite, but recutils seems to have better csv support than SQLite – e.g. escaping the field delimiter when it occurs within text, supporting \r\n (carriage return – line feed) within a text string, etc.
  • Perl, AWK and R were also mentioned, but working solutions appeared non-trivial for the uninitiated (sorry Randall).

so let’s see what compiles on FreeBSD 8.3…..

csvfix

csvfix is available as a Windows binary from the project site with the csvfix source on BitBucket. Compiling (using “make lin” for Unix/Linux) resulted in a g++ option error.

From the csvfix wiki, compiling requires a modern version of g++ with C++2011 support – and reportedly uncompilable on Mac OS X by people who know a lot more about compilers than me, so that’s as far as I’ll be going with csvfix for now.

csvkit

csvkit is written in Python, and requires mkvirtualenv and pip, which I unfortunately don’t have any experience with (yet). I’ll postpone further csvkit investigation for now.

csvprintf

First, I tried the instructions in the download archive:

> tar -xzf csvprintf-1.0.3.tar.gz
 > ls
 csvprintf-1.0.3 csvprintf-1.0.3.tar.gz
 > cd csvprintf-1.0.3
 > ls
 CHANGES Makefile.in configure csvprintf.h xml2csv
 COPYING README configure.ac main.c
 INSTALL aclocal.m4 csv.xsl scripts
 Makefile.am config.h.in csvprintf.1.in svnrev.c
 > ./configure
 checking for a BSD-compatible install... /usr/bin/install -c
 checking whether build environment is sane... yes
 checking for a thread-safe mkdir -p... scripts/install-sh -c -d
 checking for gawk... gawk
 ...
 checking for iconv_open in -lc... no
 configure: error: required function iconv_open missing
 >

Making a wild stab in the dark (because I don’t know better), I installed iconv-2.0_4 from the ports tree – and got the same error (you’re probably saying “I could have told you so!”).

Next, although from googling it seemed there was a possibility of GNU Linux autoconf miss-reading my FreeBSD system (and iconv_open capability), I tried updating libiconv to see if that changed anything. Unfortunately, portmaster reported my version of Perl (5.10) was out-dated, which I tried updating to 5.16 (instead of the current 5.18, because I read a May 2013 post saying 5.18 broke a lot of modules).

portmaster -o lang/perl5.16 lang/perl5.10

However, that didn’t end in success because I have apparently installed modules from both CPAN and BSDPan, and BSDPan modules can’t be updated using portmaster. I don’t have time to hunt where these modules came from, and if I even need then anymore! Grrrr.

csvtool

On the plus side, OCaml CSV is in the FreeBSD ports (ocaml-csv), but on the other hand, it pulls in a *lot* of dependencies I won’t use again (including graphics stuff). I went ahead and compiled the port anyway. It completed without error, but it seems csvtool isn’t included in the FreeBSD port (or at least it didn’t get installed). Grrrr.

Conclusions

I’m rolling my own. I’m following tutorials on the Python CMS module, and adapting them to my simple but unique needs – and I need to learn Python anyway for OpenERP (and to hang out with the cool kids). However, I would like to have a general-purpose go-to CVS utility, and look forward to eventually testing csvkit.

Converting text files between Windows and Unix

I develop on both Windows and Unix laptops for deployment on Unix servers, and invariably forget to save text files from Windows in Unix-format (in Windows, lines end with both the line feed and carriage return ASCII characters, but Unix uses only a line feed). It doesn’t cause any problems, but it’s sure ugly to look at on the server. Googling around, I found the following solution using the tr command.

> mv file file.tmp
> tr -d '\r' < file.tmp > file
> rm file.tmp

After putting up with what was really more of an ugly hack for about a year, I decided it was time to write a quick and dirty shell script to hide the mess – but before reinventing the wheel thought I’d try a quick ‘net search. After all, I can’t be the first one to do this, could I?

Asking Google again for how to convert text from DOS to Unix, but this time including “shell script”, I mostly got the same links to tr, sed, and perl tutorials that I got the first time. There were also a couple links to shell scripts, but almost hidden in the hits was a reference to “flip”, a ~250 line C program that has apparently been the de facto standard command since it was released into the wild in 1998.

flip does a lot more than I was going to implement in my quick and dirty shell script. It detects binary files and leaves them alone unless intentionally overridden. It doesn’t modify files that are already in the specified format, and it preserves file timestamps. It handles user interrupts gracefully and doesn’t leave behind garbage or corrupted files.

Compiling and installing flip from the FreeBSD ports collection took all of 20 seconds. I don’t know why I didn’t find flip a year ago, but think I’ll dig out my venerable Unix in a Nutshell and see what other utilities are out there I could use, but don’t know about.

FreeBSD mention on Linux Outlaws

I was lucky enough to get a shout-out for FreeBSD from Fab and Dan recently on Linux Outlaws podcast #213. Fab conceded he had “… heard from other people that FreeBSD was a good server” and Dan grudgingly commented “I know it’s a very well used server.” On a Linux podcast, I couldn’t have hoped for more!

The origin of this shout-out was when I was cycling home from work one day listening to Linux Outlaws # 210 (AFAIR). Fab had launched into a tirade (in beautiful Fab fashion) on what a loss it would be to the Linux server community if the CentOS project collapsed (CentOS is a free GNU/Linux distribution based on and functionally identical to Redhat Enterprise Linux). I immediately got off my bike and e-mailed Fab the solution – FreeBSD. Not only is FreeBSD the absolutely best Unix-like server OS, but the FreeBSD project has easily stood the test of time with its genesis in 1993.

I like Linux Outlaws to get a overview of the open source community and updates on foss software projects, new and old (I also like listening to Fab and Dan banter back and forth, although the language is closer to sailor’s than nerd-speak. I admit I’m envious of the attention GNU/Linux attracts, but most of the attention is for desktop features – shiny widgets, graphical applications, etc. When I was selecting a server OS, I wanted a GUI-less interface, minimal cruft between me and the silicon, a stable consistent architecture, good documentation, a reliable release strategy, and a community that felt right. For me, that’s clearly FreeBSD!

FreeBSD Beastie Image
FreeBSD Beastie

(BSD Daemon Copyright 1988 by Marshall Kirk McKusick. All Rights Reserved.)