Monday, February 6, 2012

How to grep for NULL char

Was trying to extract some FIX messages from log files, and more precisely filter out a number of messages like heartbeat. My initial thought was to do something like that:

cat fixlog.log | grep '\[8=FIX.' | tr '\1' '|' | grep -v '\|35=0'

That works fine, except it's a little bit wasteful. What this command above does is, replace all the null character (that are used as delimiter in FIX messages) by "|". So that got me thinking how I could filter out those without having to replace anything. My first idea was:

cat C20120203.log | grep '\[8=FIX.' | grep -v '\00135=D'

That didn't work because grep was interpreting \001 as a back reference. Several variants later (among which the use of sed) and some constructive goggling made me understand that the the problem wasn't coming from grep but from bash and here's the proof.

If you type the following (under linux):
$ echo '\001' | hexdump -C
00000000  5c 30 30 31 0a                                    |\001.|
00000005

The trick is to tell bash do some quoting (escaping), and you do that by prefixing the string with $:
$ echo $'\001' | hexdump -C
00000000  01 0a                                             |..|
00000002

So putting it all together the original command above becomes:
cat fixlog.log | grep '\[8=FIX.' | grep -v $'\00135=0'
The mechanism allow you to grep for any arbitrary ASCII character.