Posts tagged ‘Tips’

Parsing XML data using bash and standard Unix tools

Parsing XML can be a tedious and unpleasant job if you insist on using just standard Unix tools like sed, awk, cut, grep and so on. One might say that it’s better to use Python/Perl/Ruby or another language that ships with a full-blown XML parser, and keep the standard Unix utilities for what they were meant for: plain old text files, not pesky XML. The problem with those nice programming languages is that they take away the one-liners. You need to import stuff, have variables, flow control and so on.

A nice tool that makes one’s life easier when it comes to XML is xml2. It converts a normal XML file to a line-oriented format, one path=value pair per line. The standard Debian distribution has this neat tool in its repos, so you are one apt-get away from using it.

 

One simple example. Take this XML file:


<xml>
<fruits>
<fruit name="apple" type="royal gala" quantity="2" price="1"/>
<fruit name="orange" type="tasty" quantity="4" price="1.5"/>
<fruit name="banana" type="green" quantity="3" price="1"/>
</fruits>
</xml>

We run xml2 against it:

cosu@roadwarrior:/tmp$ xml2 < fruits.xml
/xml/fruits/fruit/@name=apple
/xml/fruits/fruit/@type=royal gala
/xml/fruits/fruit/@quantity=2
/xml/fruits/fruit/@price=1
/xml/fruits/fruit
/xml/fruits/fruit/@name=orange
/xml/fruits/fruit/@type=tasty
/xml/fruits/fruit/@quantity=4
/xml/fruits/fruit/@price=1.5
/xml/fruits/fruit
/xml/fruits/fruit/@name=banana
/xml/fruits/fruit/@type=green
/xml/fruits/fruit/@quantity=3
/xml/fruits/fruit/@price=1

And now we extract all the fruit names:

cosu@roadwarrior:/tmp$ xml2 < fruits.xml |grep name |cut -d"=" -f2
apple
orange
banana

There you go! A fruit salad! Of course for more complicated stuff use other tools :)
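
A slightly more defensive variant of the one-liner above, as a sketch only: grep name matches any line that merely contains the word "name", and cut -f2 truncates a value that itself contains an "=". The helper below matches the full path instead. attr_values is a made-up name for this example, not something xml2 provides.

```shell
# attr_values is a hypothetical helper, not part of xml2.
# It reads xml2-style "path=value" lines on stdin and prints the values
# whose path is exactly the one given as the first argument.
attr_values() {
    awk -v path="$1" '
        # Keep only lines that start with "<path>=" and print what follows,
        # so values containing "=" are not truncated.
        index($0, path "=") == 1 { print substr($0, length(path) + 2) }
    '
}

# Usage, assuming xml2 is installed and fruits.xml is the file shown above:
#   xml2 < fruits.xml | attr_values /xml/fruits/fruit/@name
```

The same helper works for @price or @quantity by changing the path argument.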

 

Choosing random entries from a group

In the past two weeks we had a lottery-type thing on RGC.ro (Romanian Guitarist Community). Proguitar, the official importer of Fender products in Romania, wanted to give away a custom-made Fender Stratocaster electric guitar. To register, the community users had to fill out a form and choose from a series of custom options for the guitar.

As organizers we had to pick out the lucky winner of the raffle. Usually this is done by someone who is impartial. Since we had about 1600 entries, and since we are geeks, we wanted to do something that geeks would do. Therefore we ditched the “extract the name of the lucky winner from a bowl” approach. The geek version of this is described in RFC 2777, “Publicly Verifiable Nomcom Random Selection”.

In short, RFC 2777 describes a simple, publicly verifiable algorithm for picking a set of entries from a group as randomly as possible. The keywords here are public (anyone can see how the entries are picked) and as random as possible. Random values need a source of information entropy. To get an initial random value full of juicy entropy we used, as suggested in the RFC, the results from three international lotteries. This initial value was slightly modified for each “extracted” entry and then turned into an MD5 hash. Because of how hash functions behave, even a slight change to the input produces a completely different hash.

Below you can find a naive Python implementation that can be freely used for any purpose. Just make sure you fill in the entropySource with a good initial random value.

import hashlib

if __name__ == '__main__':

    # Seed string built from public lottery results; replace it with your own.
    entropySource = "9.24.30.32.36.40./18.25.35.43.46.47./1.3.4.8.23.31./"

    numberOfEntries = 1655
    numberOfWinners = 10

    # Entries are numbered 1..numberOfEntries.
    numbers = list(range(1, numberOfEntries + 1))

    print("index \t hex value of MD5 \t div \t selected")
    for i in range(numberOfWinners):
        # Wrap the seed in the draw counter so every draw hashes differently.
        data = bytes([i]) + entropySource.encode("ascii") + bytes([i])
        digest = hashlib.md5(data).hexdigest()
        entries = len(numbers)
        modulo = int(digest, 16) % entries
        # pop() removes the selected entry so it cannot be drawn twice.
        print(str(i + 1) + "\t" + digest + "\t" + str(entries) + "\t" + str(numbers.pop(modulo)))

Get your personal email account

Most people use free email services like Yahoo, Gmail or Live. Unfortunately all the nice-sounding email addresses are taken by now, so new users have to come up with strange combinations like johndoe19__smth_smth@yahoo.com. That’s very hard to remember and it sounds very unprofessional.

Having an online presence is no longer such a big deal. For a few dollars a year you can get your own .com (or other top-level domain), and another few dollars a month get you a hosting plan which provides a couple of megabytes of website storage and a number of email accounts. So with a small investment you can have a decent email address like name.surname@somedomain.com. That’s something that you could put on your personal business card. Few know that you can skip the email service offered by your webhost and instead use a more reliable service.

Both Microsoft and Google offer domain email hosting as a free service. Microsoft calls this Windows Live Custom Domains ( https://domains.live.com/ ) while Google calls its service Google Apps ( http://www.google.com/apps/intl/en/group/index.html ).

Using these services is quite simple. You just have to prove that you are indeed the owner of the domain and make some DNS modifications so that emails will be handled by Google or Microsoft. You can modify the DNS records through the web interface set up by your hosting provider (the one that hosts your DNS records), or by directly editing your DNS configuration if you manage the DNS yourself. Either way, both Microsoft and Google give you directions on how and what to modify.
For the tech-savvy readers there are two basic steps: add a CNAME record containing a random string to prove that you are the rightful owner, and then replace the MX records with the ones provided in the instructions. It’s not that complicated.

Why should you do this?
Well, both Microsoft and Google provide a better service than a normal hosting company when it comes to reliability. Sure, you don’t sign a contract that mentions any SLA, but statistically speaking both offer a kick-ass service. You don’t have to worry about backups, downtime, spam and so on. It just works. For small operations, say personal email or small companies like startups, this kind of service is ideal as it cuts costs and/or gives fewer headaches.
Using the administration page you can create, delete or reset any email account. If someone messes up their password you can simply reset the account.
By using either the Microsoft-based service or the Google one you get access to other related services like Office Online or Google Docs, because the created email accounts serve as Live IDs or Google Accounts. This opens a new world of online collaboration. I know a few startups that use these kinds of services.

What are the downsides?
You don’t own your email (carefully read the EULAs) and some may not like this.
You are limited to 50 or 100 email accounts, and when you hit that limit you have to upgrade to a paid service. Individuals and small companies will just ignore this.
The webmail interface will display ads just like gmail.com or live.com. Adblocker-type software could make this a non-issue.
You get little to no tech support. Individuals and small companies can live with this, considering the advantages.

You access the email account either from a browser or from an email client. Google Apps email can be accessed by POP3, IMAP and webmail. Unfortunately Windows Live Custom Domains does not offer access using the IMAP or POP3 protocols. To use Outlook you need to install a small piece of software called Office Outlook Connector. The advantage of this approach is that besides email you can synchronize your address book and calendars. The IMAP and POP3 protocols don’t allow that. For Thunderbird + Live you need a plugin, but you get only basic service: get/send emails, no calendar :( .

For $9 a year you could get a .com domain. You just need a public DNS server to host your records and that’s it, you can sign up for free email hosting.

Regarding DNS hosting, this is really not an issue. http://freedns.afraid.org/ is a very good option. If you don’t like it you could always ask your geek friend to help you out.

It’s hard to tell which service is best. Right now I’m using both Live Custom Domains and Google Apps and I’m quite happy with either one. It all depends on what you want to achieve.

After a year or more of using Google Apps I’m thinking of decommissioning all of my postfix installs (yes, postfix is better than qmail) and switching to one of the above options. Having a full-blown email server (even if it’s just a virtual machine with just enough resources, serving many domains by means of SQL and virtual domains) seems more and more a waste of time and resources for small operations.

I have a gut feeling that more and more companies will outsource their email service. I’ve seen this happening on a large scale in a few universities in Romania. The Bucharest Academy of Economic Studies is using Google Apps to offer email accounts to all its students (that’s more than 20,000 accounts!). Likewise there’s a small implementation of Live@edu, a Microsoft programme that basically does the same thing, in the Faculty of Automatic Control and Computers at the POLITEHNICA University in Bucharest (that’s about 3000 accounts, give or take).

Color that manpage!

Manpages are the last line of defence when it comes to Unix troubleshooting. After you’ve tried everything you could think of and it still doesn’t work, you know it’s time to read the manual.

By default, Linux distributions use the less command to display the man page requested by the user. The manpage is displayed as plain text, and because of that it can sometimes be hard to find what you’re looking for. Keywords and special parameters are printed in bold to ease document navigation, but sometimes this is not enough.

Navigation is done with the up and down arrows, page up/page down and the space key.
Searching through the document is done by typing the / character followed by the word or phrase to search for.

One useful hack is to color the manpage so that keywords, parameters and so on are highlighted.

To do this we have to set some environment variables:

export LESS_TERMCAP_mb=$'\E[01;31m' # begin blinking
export LESS_TERMCAP_md=$'\E[01;31m' # begin bold
export LESS_TERMCAP_me=$'\E[0m' # end mode
export LESS_TERMCAP_se=$'\E[0m' # end standout-mode
export LESS_TERMCAP_so=$'\E[01;44;33m' # begin standout-mode - info box
export LESS_TERMCAP_ue=$'\E[0m' # end underline
export LESS_TERMCAP_us=$'\E[01;32m' # begin underline

The strange \E[...m strings are ANSI escape sequences that set terminal colors (\E is the escape character in bash’s $'...' quoting). You can find more about them in the Bash Prompt HOWTO.

After you have customized your colors you can save the above commands in your .bashrc file (the one in your home folder) so that the variables are set every time you log in.
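
One possible refinement, sketched under the assumption that you want the colors only in man and not in every other less session: wrap man in a shell function that sets the variables for that one command. The function is an illustration, not something man or less ships with. printf '\033[...' produces the same escape bytes as the $'\E[...' form used above.

```shell
# Shadow man with a function that sets the LESS_TERMCAP_* variables only
# for the real man binary; other less sessions keep their plain look.
man() {
    env \
        LESS_TERMCAP_mb="$(printf '\033[01;31m')" \
        LESS_TERMCAP_md="$(printf '\033[01;31m')" \
        LESS_TERMCAP_me="$(printf '\033[0m')" \
        LESS_TERMCAP_se="$(printf '\033[0m')" \
        LESS_TERMCAP_so="$(printf '\033[01;44;33m')" \
        LESS_TERMCAP_ue="$(printf '\033[0m')" \
        LESS_TERMCAP_us="$(printf '\033[01;32m')" \
        man "$@"
}
```

Because env runs the man executable found in PATH, the function does not call itself.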

Quickie: Wrap to 80 columns

I got a complaint that my submitted text file is not wrapped to 80 columns. Rather than work my butt off mixing and matching the text lines until I get to the bastard’s requirement, I used the neat little tool called fold:

cosu@cosu-desktop:~/Desktop$ cat file | fold -s
my monitor resolution is soooooooooooooooooo small that more than 80 colums of
text give me a segfault.

-s stands for break at spaces; the default width is already 80 columns. man fold for more options.
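
To double-check the result before resubmitting, something along these lines might help. The two function names are invented for this sketch, and -w 80 only spells out fold’s default width.

```shell
# wrap80 and longest_line are illustrative helper names.

# Wrap input (files or stdin) at spaces to at most 80 columns.
wrap80() {
    fold -s -w 80 "$@"
}

# Print the length of the longest line read from stdin.
longest_line() {
    awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }'
}

# Usage: wrap80 file | longest_line    # should print 80 or less
```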

Find script path

Today I needed to find, from inside a script, the path where that very script lives. The file gets moved around to various places and I need absolute paths (hint: Java classpaths).

Solution

MYPATH="$(readlink -f "$(dirname "$0")")"

hints: man readlink, man dirname
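
A small variation, offered as a sketch: resolving $0 itself before taking its directory also covers the case where the script is started through a symlink. script_dir is a name made up for this example, and readlink -f is assumed to be the GNU coreutils version.

```shell
#!/bin/bash
# script_dir is a hypothetical helper name used only in this sketch.
# It prints the absolute directory of the given script path; readlink -f
# follows symlinks, including one that points at the script itself.
script_dir() {
    dirname -- "$(readlink -f -- "$1")"
}

MYPATH="$(script_dir "$0")"
echo "running from: $MYPATH"
```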

On screen

Screen is a very useful tool when you have a lot to do and few terminals at hand. Basically screen is a terminal multiplexer, much like GNOME Terminal or yakuake in their tabbed variants, only everything happens right in the console.

Besides the multiplexing mentioned above, screen offers another very useful feature: the program is immune to SIGHUP-type signals. In plain terms, screen keeps running on the machine even after you have closed the ssh/physical connection to the terminal it was working in. This proves its worth when you have a task to run over a very bad internet link. Normally, processes started from a shell have that shell as their parent. The moment the ssh link dies, the shell dies too, and its child processes with it. Very ugly. Started from screen, the processes stay alive even after the remote connection ends. Some very productive uses already come to mind: a torrent client in screen, massive downloads with wget and friends, hanging out on IRC with irssi (or another text-mode IRC client), endless compiles and many others.

Using screen takes a little getting used to. You start it simply by running screen.
Commands are triggered with shortcuts of the form Ctrl+A followed by a letter (case sensitive), digit or symbol. Ctrl+A is known as the escape sequence (anyone who has worked with minicom knows it).
Ctrl+A followed by ? displays all the commands supported by screen.
A few useful combinations:

Ctrl+A c -> create a new terminal

Ctrl+A " -> show a list of all terminals

Ctrl+A <digit> -> switch to the terminal with that id

Ctrl+A d -> detach from screen (basically a kind of exit)

Ctrl+A n -> next

Ctrl+A p -> previous

Ctrl+A ESC -> copy mode

Personally I gave up Ctrl+A in favor of the ` character, given that I rarely use that character (and only when it comes to bash scripts).

You can reattach to screen by running screen -r -d. Several screen processes can be running at once, and you can pass the session you want to attach to as a parameter.
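
For the multi-session case, here is a rough sketch of how named sessions could be handled. The function names and the session name torrents are made up; -S, -d -m, -ls and -d -r are standard screen options.

```shell
# Helper names below are invented for illustration.

# Start a command in a new, already detached session with the given name.
start_detached() {
    name=$1
    shift
    screen -dmS "$name" "$@"
}

# Detach the named session from wherever it is attached and attach it here.
reattach() {
    screen -d -r "$1"
}

# Usage:
#   start_detached torrents rtorrent   # hypothetical session name and client
#   screen -ls                         # list running sessions
#   reattach torrents
```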

Screen can be customized through the .screenrc file.

Mine looks like this:

escape ``
hardstatus alwayslastline "%{= KW}%-w%{= wk}%50>%n %t%{-}%+w%<"
hardstatus string '%{gk}[ %{G}%H %{g}][%= %{wk}%?%-Lw%?%{=b kR}(%{W}%n*%f %t%?(%u)%?%{=b kR})%{= kw}%?%+Lw%?%?%= %{g}][%{Y}%l%{g}]%{=b C}[ %m/%d %c ]%{W}'
startup_message off
vbell off
msgminwait 0
msgwait 10

And this is the result:

image

At the bottom you can see 5 open terminals, each named after the process it runs (ifstat, htop, links).

umount: device is busy

Sometimes when I try to umount a volume I get messages like “device is busy”.

The solutions are many and very complex.

The second solution is the more brutal one: forcing the unmount with the -f parameter:

umount -f /path/to/mount

As a first step, though, we find the process that is holding things up with the fuser command:

fuser -m /path/to/mount/

Next comes an inspection of the process with the PID returned by fuser, possibly a SIGKILL, and a well-deserved umount.
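
The fuser route could be scripted roughly like this. It is only a sketch: free_mount is a hypothetical helper name, the two-second pause is an arbitrary choice, and it tries SIGTERM before falling back to fuser’s default SIGKILL.

```shell
# free_mount is a hypothetical helper; it needs root on most systems.
free_mount() {
    mnt=$1
    fuser -vm "$mnt"           # list the processes holding the mount
    fuser -k -TERM -m "$mnt"   # ask them to exit with SIGTERM first
    sleep 2                    # give them a moment to clean up
    fuser -k -m "$mnt"         # SIGKILL (the default) whatever is left
    umount "$mnt"
}

# Usage: free_mount /path/to/mount
```

Run the first fuser line on its own before anything else if you would rather inspect the processes by hand.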

PowerShell Server

The pros at /n (the ones who were handing out free stickers) have a really nice product. It’s called PowerShell Server, and the toy turns your Windows machine from a station where you can only do things via the GUI into a station that is remotely accessible over ssh. SSH stands for Secure Shell. Well, while it’s clear what “secure” gives us, “shell” here is replaced with PowerShell. This way you can control a Windows machine with nothing but PuTTY! (yes, PowerShell rocks!)

Of course there is one small downside … the price

http://www.nsoftware.com/powershell/server/

Free e-book: Virtualization Solutions

Today I stumbled upon this fairly substantial little book. It covers pretty much all the virtualization technologies offered by Microsoft. It’s quite well written and, most importantly, it’s free!

Highlights: Hyper-V, System Center Virtual Machine Manager (SCVMM) , App-V, Terminal Services, MED-V, VDI, Roaming User Profiles

https://www.getvirtualnow.com/usevents/education/download/693371eBook.pdf

Enjoy!