Manipulating CSV-format files on FreeBSD

Conclusion Updated 2013-08-16.

The initial Maestro implementation site uses CSV files for intermediate data exported from legacy systems. I’m using CSV files because they are human readable and provide a point of control between the legacy systems and Maestro. They also provide an opportunity to manipulate and report on the data using simple command-line Unix tools.

Doing some cursory research on command-line CSV-file manipulation, solutions mentioned most often included:

  • csvfix is a command line tool that reportedly does everything you’d want, including order/select columns, split/merge, generating SQL inserts from CSV data and diffing CSV data, and has a permissive license.
  • csvkit is a collection of programs adhering to the Unix philosophy (a program should do one thing and do it well), intended to be to CSV files what grep, sed, cut, sort… are to text. csvkit is licensed under the permissive MIT license.
  • csvprintf is a simple UNIX command line utility for parsing CSV files, similar to the printf utility. cvsprintf is written in C, and is licensed under the permissive Apache License, Version 2.0.
  • csvtool is included with the OCaml CSV library.
  • GNU Recutils is a set of tools and libraries to access human-editable, plain text databases called recfiles, with support for CSV file import and export (the recfile concept seems similar to SQLite, but recutils seems to have better csv support than SQLite – e.g. escaping the field delimiter when it occurs within text, supporting \r\n (carriage return – line feed) within a text string, etc.
  • Perl, AWK and R were also mentioned, but working solutions appeared non-trivial for the uninitiated (sorry Randall).

so let’s see what compiles on FreeBSD 8.3…..

csvfix

csvfix is available as a Windows binary from the project site with the csvfix source on BitBucket. Compiling (using “make lin” for Unix/Linux) resulted in a g++ option error.

From the csvfix wiki, compiling requires a modern version of g++ with C++2011 support – and reportedly uncompilable on Mac OS X by people who know a lot more about compilers than me, so that’s as far as I’ll be going with csvfix for now.

csvkit

csvkit is written in Python, and requires mkvirtualenv and pip, which I unfortunately don’t have any experience with (yet). I’ll postpone further csvkit investigation for now.

csvprintf

First, I tried the instructions in the download archive:

> tar -xzf csvprintf-1.0.3.tar.gz
 > ls
 csvprintf-1.0.3 csvprintf-1.0.3.tar.gz
 > cd csvprintf-1.0.3
 > ls
 CHANGES Makefile.in configure csvprintf.h xml2csv
 COPYING README configure.ac main.c
 INSTALL aclocal.m4 csv.xsl scripts
 Makefile.am config.h.in csvprintf.1.in svnrev.c
 > ./configure
 checking for a BSD-compatible install... /usr/bin/install -c
 checking whether build environment is sane... yes
 checking for a thread-safe mkdir -p... scripts/install-sh -c -d
 checking for gawk... gawk
 ...
 checking for iconv_open in -lc... no
 configure: error: required function iconv_open missing
 >

Making a wild stab in the dark (because I don’t know better), I installed iconv-2.0_4 from the ports tree – and got the same error (you’re probably saying “I could have told you so!”).

Next, although from googling it seemed there was a possibility of GNU Linux autoconf miss-reading my FreeBSD system (and iconv_open capability), I tried updating libiconv to see if that changed anything. Unfortunately, portmaster reported my version of Perl (5.10) was out-dated, which I tried updating to 5.16 (instead of the current 5.18, because I read a May 2013 post saying 5.18 broke a lot of modules).

portmaster -o lang/perl5.16 lang/perl5.10

However, that didn’t end in success because I have apparently installed modules from both CPAN and BSDPan, and BSDPan modules can’t be updated using portmaster. I don’t have time to hunt where these modules came from, and if I even need then anymore! Grrrr.

csvtool

On the plus side, OCaml CSV is in the FreeBSD ports (ocaml-csv), but on the other hand, it pulls in a *lot* of dependencies I won’t use again (including graphics stuff). I went ahead and compiled the port anyway. It completed without error, but it seems csvtool isn’t included in the FreeBSD port (or at least it didn’t get installed). Grrrr.

Conclusions

I’m rolling my own. I’m following tutorials on the Python CMS module, and adapting them to my simple but unique needs – and I need to learn Python anyway for OpenERP (and to hang out with the cool kids). However, I would like to have a general-purpose go-to CVS utility, and look forward to eventually testing csvkit.

One Reply to “Manipulating CSV-format files on FreeBSD”

  1. Thanks for this list! I didn’t know about csvprintf before. One more for the toolkit.

    I’d give csvkit another go. It’s the easiest to get up and running, at least under Xubuntu.

    If you’re learning Python, you might already know this now, but:

    Pip is the modern standard Python package installer.

    With pip, you just type “pip install csvkit” to install csvkit and its dependent packages.

    mkvirtualenv is probably only important if you are not allowed to install Python packages for all users. Instead you can create a virtual environment with a set of packages just for you and install it there.

Comments are closed.