Parsing XML can be a tedious and unpleasant job if you insist on using only standard Unix tools like sed, awk, cut, grep and so on. One might say that it’s better to use Python, Perl, Ruby or another language that ships with a full-blown XML parser, and to keep the standard Unix utilities for what they were meant for: plain old text files, not pesky XML. The problem with those nice programming languages is that they take away the one-liners. You need to import stuff, declare variables, add flow control and so on.
A nice tool that makes one’s life easier when it comes to XML is xml2. It converts a normal XML file into a line-oriented format, with one path=value pair per line. Debian has this neat tool in its standard repositories, so you are one apt-get away from using it (apt-get install xml2).
Here is a simple example. Take this XML file, fruits.xml:
<xml>
<fruits>
<fruit name="apple" type="royal gala" quantity="2" price="1"/>
<fruit name="orange" type="tasty" quantity="4" price="1.5"/>
<fruit name="banana" type="green" quantity="3" price="1"/>
</fruits>
</xml>
We run xml2 against it:
cosu@roadwarrior:/tmp$ xml2 < fruits.xml
/xml/fruits/fruit/@name=apple
/xml/fruits/fruit/@type=royal gala
/xml/fruits/fruit/@quantity=2
/xml/fruits/fruit/@price=1
/xml/fruits/fruit
/xml/fruits/fruit/@name=orange
/xml/fruits/fruit/@type=tasty
/xml/fruits/fruit/@quantity=4
/xml/fruits/fruit/@price=1.5
/xml/fruits/fruit
/xml/fruits/fruit/@name=banana
/xml/fruits/fruit/@type=green
/xml/fruits/fruit/@quantity=3
/xml/fruits/fruit/@price=1
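In the listing above, each attribute becomes one path=value line, and a bare /xml/fruits/fruit line with no value appears between consecutive fruits. Treating those bare lines as record separators lets awk handle one fruit at a time. The following is only a rough sketch under that assumption: the function names flat_fruits, fruit_costs and emit are made up for illustration, and the sample input is pasted in so the script runs even without xml2 installed.

```shell
#!/bin/sh
# Sample input: the xml2 output from the listing above. With xml2 installed,
# `xml2 < fruits.xml` should produce the same lines.
flat_fruits() {
  cat <<'EOF'
/xml/fruits/fruit/@name=apple
/xml/fruits/fruit/@type=royal gala
/xml/fruits/fruit/@quantity=2
/xml/fruits/fruit/@price=1
/xml/fruits/fruit
/xml/fruits/fruit/@name=orange
/xml/fruits/fruit/@type=tasty
/xml/fruits/fruit/@quantity=4
/xml/fruits/fruit/@price=1.5
/xml/fruits/fruit
/xml/fruits/fruit/@name=banana
/xml/fruits/fruit/@type=green
/xml/fruits/fruit/@quantity=3
/xml/fruits/fruit/@price=1
EOF
}

# Hypothetical helper: print "name<TAB>quantity*price" for each fruit.
fruit_costs() {
  awk '
    # Print the fruit collected so far, then reset for the next one.
    function emit() {
      if (name != "") printf "%s\t%s\n", name, qty * price
      name = ""; qty = 0; price = 0
    }
    # Position of the first "=" (0 means a bare path line).
    { eq = index($0, "=") }
    # A bare path line closes the current record.
    eq == 0 { emit(); next }
    {
      path  = substr($0, 1, eq - 1)
      value = substr($0, eq + 1)
    }
    path == "/xml/fruits/fruit/@name"     { name = value }
    path == "/xml/fruits/fruit/@quantity" { qty = value }
    path == "/xml/fruits/fruit/@price"    { price = value }
    # The last fruit has no trailing separator line.
    END { emit() }
  '
}

flat_fruits | fruit_costs
```

For the sample data this prints apple with 2, orange with 6 and banana with 3, separated by tabs. Collecting the attributes first and printing at the separator means the script does not depend on the order in which the attributes appear.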
And now we extract all the fruit names:
cosu@roadwarrior:/tmp$ xml2 < fruits.xml | grep '/@name=' | cut -d= -f2-
apple
orange
banana
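A text match on the whole line can also hit a value that happens to contain the search text. One possible variant, sketched here with awk, compares the full path exactly and splits only at the first "=". The helper name attr_values is invented for this example, and the two input lines are hand-written samples in xml2's format rather than real output.

```shell
#!/bin/sh
# Hypothetical helper: print the value of every line whose path equals $1.
attr_values() {
  awk -v want="$1" '
    # Position of the first "=" (0 means a bare path line with no value).
    { eq = index($0, "=") }
    # Compare the whole path, then print everything after the first "=".
    eq > 0 && substr($0, 1, eq - 1) == want { print substr($0, eq + 1) }
  '
}

# Hand-written sample lines; the second value contains both "name" and "=".
printf '%s\n' \
  '/xml/fruits/fruit/@name=apple' \
  '/xml/fruits/fruit/@type=name=unknown' |
  attr_values '/xml/fruits/fruit/@name'
```

With real data the input would come from xml2 instead, for example xml2 < fruits.xml | attr_values '/xml/fruits/fruit/@name'.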
There you go! A fruit salad! Of course, for anything more complicated, reach for other tools, such as a real XML parser.