Inspired by this year’s SNE students’ project, I’ve decided to explore Google’s URL shortener on my own.
Some quick notes on my findings so far:
- Google increases the string length as it runs out of keyspace. Since late July 2013 they have used 6 characters (upper-case letters, lower-case letters and digits) for the unique part of the short URL.
- As of December 2014, the fill ratio of the 6-character keyspace is about 4%.
- I think the keys are not random but are generated by some algorithm. I came to this conclusion because the 4- and 5-character keyspaces are completely full, i.e. every lookup returns ‘success’ or ‘removed’. I assume it would be inefficient to generate a random string and then check that it is not already in use: as the keyspace fills up, more and more random candidates would already be taken. This also suggests there is no correlation between a key and either its creation time or the length of the URL.
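To put numbers on the keyspace sizes and on the cost of the "generate randomly, then check" approach, here is a small back-of-the-envelope sketch. It assumes the 62-symbol alphabet described above and models random generation as independent uniform draws: if a fraction p of the keyspace is used, each draw finds a free key with probability 1 - p, so the expected number of draws is 1 / (1 - p). The function names are illustrative only.

```python
# Assumed alphabet: upper-case letters, lower-case letters and digits (62 symbols).
ALPHABET_SIZE = 26 + 26 + 10


def keyspace_size(length: int) -> int:
    """Number of distinct keys of the given length."""
    return ALPHABET_SIZE ** length


def expected_attempts(fill_ratio: float) -> float:
    """Expected number of uniform random draws until an unused key is found.

    Each draw hits a free key with probability (1 - fill_ratio), so the
    number of draws is geometric with mean 1 / (1 - fill_ratio).
    """
    if not 0.0 <= fill_ratio < 1.0:
        raise ValueError("fill_ratio must be in [0, 1)")
    return 1.0 / (1.0 - fill_ratio)


if __name__ == "__main__":
    for n in (4, 5, 6):
        print(f"{n} chars: {keyspace_size(n):,} keys")
    # Rough number of used keys at the ~4% fill observed for 6 characters.
    print(f"used 6-char keys at 4%: {round(0.04 * keyspace_size(6)):,}")
    for ratio in (0.04, 0.5, 0.9, 0.99):
        print(f"fill {ratio:.0%}: {expected_attempts(ratio):.2f} draws on average")
```

The 6-character keyspace holds 62^6 = 56,800,235,584 keys, so a 4% fill corresponds to roughly 2.27 billion short URLs. Under this simple model, random generation stays cheap at low fill (about 1.04 draws at 4%) and only becomes expensive as the keyspace approaches full (about 100 draws at 99%), which is the regime the full 4- and 5-character keyspaces would have gone through.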
Source code here: https://github.com/cosu/urlshort
You need MongoDB running on localhost to save the results.
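Checking that the 4- and 5-character keyspaces are completely full means probing every key of those lengths, whereas the 6-character keyspace (about 56.8 billion keys) can realistically only be sampled. Below is a minimal sketch of candidate-key generation for such a scanner. It assumes a hypothetical alphabet ordering (upper-case, lower-case, digits) and uses illustrative function names; it is not taken from the repository above, and the HTTP lookup and MongoDB storage are deliberately left out.

```python
import itertools
import random
import string

# Hypothetical ordering of the 62-symbol alphabet; the real order is unknown.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits


def all_keys(length: int):
    """Yield every key of the given length, in alphabet (lexicographic) order."""
    for chars in itertools.product(ALPHABET, repeat=length):
        yield "".join(chars)


def index_to_key(index: int, length: int) -> str:
    """Map an integer in [0, 62**length) to a fixed-length base-62 key.

    The ordering matches all_keys(): index i is the i-th key it yields.
    """
    if not 0 <= index < len(ALPHABET) ** length:
        raise ValueError("index out of range for this key length")
    chars = []
    for _ in range(length):
        index, digit = divmod(index, len(ALPHABET))
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))


def sample_keys(length: int, n: int, seed: int = 0) -> list:
    """Draw n keys uniformly at random (with replacement) for fill-ratio sampling."""
    rng = random.Random(seed)
    space = len(ALPHABET) ** length
    return [index_to_key(rng.randrange(space), length) for _ in range(n)]


if __name__ == "__main__":
    print(list(itertools.islice(all_keys(4), 3)))
    print(sample_keys(6, 5))
```

Exhaustive enumeration via `all_keys` suits the small keyspaces, while `sample_keys` suits the large one: probing a uniform sample and counting how many keys resolve gives an estimate of the fill ratio without touching all 62^6 keys.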