I was asked this interview question for SomeCompany in NY Bethpage.
Well, I said I do not know how it works, but I know what it does.
Why is this so important at all? Why do you want to know this?
This is important because it evaluates your core computer science Data Structures and Algorithm skills.
As you all know what it does, it takes the long URL (Uniform Resource Locator) and converts it to a short URL. I know there would be at-least couple of questions in your mind.
How does the browser know to navigate to the long URL when a short URL is browsed?
Ans: When a short URL is generated by the tinyurl website, with a unique identifier, the browser, hits the registered tinyurl domain and it then redirects the tinyurl to the long URL with a UNIQUE IDENTIFIER.
How does the tinyurl.com website generate the tinyurl for the give long URL? In other words, how is the Unique key being generated and how does the browser recognize the long url with its respective identifier?
Ans: The interviewer wasn’t really keen in the above Q & A, but instead he was more interested in this one.
There are 3 ways to do so.
Given a URL, a Alpha-Unique key can be generated by using some programatic function such as Math.random() and associate with the URL, and store it in the DB as key-value pairs. If the URL is already present in the DB, then do not call the Math.random(), instead just return the same URL as a key-value pair.
The problem with this approach is there are 2 overhead operations. First is check if the Unique key that is programmatically generated, already exist’s in the DB, and the other is creating a Unique ID using Math.random() if it doesn’t exist. DB table lookup is a expensive operation. You have to search a key (for its existence) in the entire table. Nevertheless, you can increase the speed of the table search by indexing the column.
The other way to solve this problem is by not using any programatic functions to create a Unique Index. Let the DB do this job for you. The DB table will create a unique alpha numeric id for each entry of the requested tinyurl. You can then index the unique id column for faster retrieval.
The problem with the above approach is you still have to call the DB, which is an expensive operation. Now the question that would have come to your mind. What is the possible way that you could generate a unique identifier. ?
Have you ever heard about encryption and decryption ? May be about MD5 or CRC ? Or about 2 way Hashing algorithm ? If so, then you are a rockstar. Well for those who are unaware of these concepts, it is important to know it. Given a key (long URL), the md5 hashing algorithm will encrypt/generate a unique value (short/tinyurl). When this unique value is decrypted, it will return you the original key (long URL). It always returns you a unique value (tinyurl) no matter what long URL you ask for.
There is no major problem as such, which was discussed above. You can create your own encryption/decryption hashing algorithms. Ensure that there you properly garbage collect unreferenced objects in your code. Always keep in mind about performances.
I hope you’ve enjoyed my post.