Shazam’s Algorithm

Dev Aggarwal
Dec 14, 2022

If you listen to music, you have probably heard of Shazam; if you haven’t, I certainly recommend trying it. Anyone who has heard a song playing and instantly known they must have it knows the problem: to listen to that song again, you need its name. That is when Shazam comes in handy.

Source — Shazam: the app that calls the tune

Exaggerations aside, most music listeners would find Shazam useful for identifying music. It also has some more surprising applications. Shazam was created in 2002, at a time when no existing algorithm could really accomplish the task at hand. Its creators therefore developed their own algorithm, which can also be used for “copyright monitoring at a search speed of over 1000 times realtime” as well as “content-based cueing and indexing for library and archival uses”.

What’s surprising is not the task itself, but the speed with which it is consistently carried out. Shazam usually takes only about 5 seconds to identify the song being played, even though there are millions of songs stored in its database.

So What’s the Key to Their Success?

“Fingerprinting”. Not to be taken literally, of course, but the whole process does boil down to that one word.

Source: Restore Privacy

Sound is a vibration that travels as waves of pressure and displacement. Just as vibrations reaching the human eardrum are turned into electrical impulses that travel to the brain along the auditory nerve, so too do the recording devices of today use the pressure of a sound wave to produce an electrical signal.

Considerable signal processing follows. An analog-to-digital converter samples the electrical signal and quantizes it into digital values that represent the amplitude of the signal over time. That sequence of amplitudes is then converted into the frequency domain, and the resulting frequencies are used to form a unique digital fingerprint for each song in a library that has been preprocessed in this manner. However, “the main challenge here is how to distinguish, in the ocean of frequencies captured, which frequencies are the most important”.
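To make the jump from amplitudes to frequencies a little more concrete, here is a minimal sketch in Python. It is not Shazam’s code: it uses a naive discrete Fourier transform on a made-up signal, and the sample rate, tone frequencies, and function name are all assumptions chosen for illustration.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Naive O(n^2) discrete Fourier transform.

    Returns the magnitude of each frequency bin below the Nyquist limit.
    Real systems would use a fast Fourier transform instead.
    """
    n = len(samples)
    magnitudes = []
    for k in range(n // 2):
        total = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        magnitudes.append(abs(total))
    return magnitudes

# Hypothetical recording: a loud 100 Hz tone plus a quieter 240 Hz tone.
SAMPLE_RATE = 800   # samples per second (assumed for this example)
NUM_SAMPLES = 200   # a quarter of a second of audio

samples = [
    math.sin(2 * math.pi * 100 * t / SAMPLE_RATE)
    + 0.3 * math.sin(2 * math.pi * 240 * t / SAMPLE_RATE)
    for t in range(NUM_SAMPLES)
]

mags = dft_magnitudes(samples)
bin_width = SAMPLE_RATE / NUM_SAMPLES          # each bin covers 4 Hz here
peak_bin = max(range(len(mags)), key=mags.__getitem__)
peak_hz = peak_bin * bin_width                 # the strongest frequency

print(f"Strongest frequency: {peak_hz} Hz")
```

The list of amplitudes says little on its own, but once it is in the frequency domain the loud tone stands out as the tallest bin, which is the kind of feature a fingerprint can be built from.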

Finally, the “signatures/fingerprints” of each song are stored in a hash table, which returns the song name and artist in O(1) time on average when given the hash tag. The hash tag acts like a designated key that must be used in order to access a specific piece of information.
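As a rough illustration of that lookup, the sketch below uses a plain Python dictionary as the hash table. The song titles, artists, and hash tags are invented, and keeping a position next to each entry is just one plausible layout, not a description of Shazam’s actual database.

```python
from collections import defaultdict

# Hash table: hash tag -> list of (title, artist, position in song).
# All names and tags below are made up for the example.
fingerprint_table = defaultdict(list)

def add_song(table, title, artist, hash_tags):
    """Store every hash tag of a song, remembering where in the song it occurs."""
    for position, tag in enumerate(hash_tags):
        table[tag].append((title, artist, position))

add_song(fingerprint_table, "Song A", "Artist A", ["f1", "f2", "f3"])
add_song(fingerprint_table, "Song B", "Artist B", ["f9", "f2", "f7"])

# Looking up a key is a single average-case O(1) dictionary access.
unique_hit = fingerprint_table.get("f3", [])
shared_hit = fingerprint_table.get("f2", [])
missing = fingerprint_table.get("zz", [])

print(unique_hit)   # one song contains this tag
print(shared_hit)   # two songs share this tag
print(missing)      # no song contains this tag
```

Notice that a single tag can point at more than one song, which is why one lookup is usually not enough to name the song.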

A simplified breakdown of Shazam is as follows:

  1. A user runs Shazam while a song is playing in the background
  2. The device’s microphone records the song being played
  3. The recording is converted to the frequency domain (which is crucial)
  4. A “fingerprint” is formed, quantifiable as a “hash tag”
  5. The hash tag is looked up in a database that holds the fingerprints of each and every song
  6. If a hash tag matches multiple songs, this means that those songs are the same at that particular instant. The matched songs are inserted into a list in descending order of likelihood, and the algorithm searches the database for the fingerprint of the next instant/second of the recording, since no song has just 1 fingerprint (it is not the individual fingerprints of a song that are unique, but its collection of fingerprints)
  7. The song with the most matched occurrences ends up at the top of the list and is taken to be the correct song
  8. Shazam then notifies the user of the song’s title and the artist’s name
Source: Toptal
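The matching steps in the list above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the real algorithm: the library, the integer hash tags, and both function names are hypothetical, and the sketch only counts matches, whereas the paper in the sources also checks that the matched fingerprints line up consistently in time.

```python
from collections import Counter, defaultdict

def build_index(library):
    """Map each hash tag to the set of songs that contain it."""
    index = defaultdict(set)
    for title, hash_tags in library.items():
        for tag in hash_tags:
            index[tag].add(title)
    return index

def rank_matches(index, recorded_tags):
    """Count how many recorded hash tags each song matches, best first."""
    counts = Counter()
    for tag in recorded_tags:
        for title in index.get(tag, ()):
            counts[title] += 1
    return counts.most_common()

# Hypothetical library: each song is a collection of hash tags.
library = {
    "Song A": [101, 205, 309, 412],
    "Song B": [101, 777, 888, 999],
    "Song C": [555, 666, 205, 123],
}
index = build_index(library)

# Hash tags extracted from a few seconds of recorded audio (also made up).
ranking = rank_matches(index, [101, 205, 309])
best_title, best_count = ranking[0]

print(ranking)
print(f"Best match: {best_title} with {best_count} matching fingerprints")
```

Tag 101 alone cannot separate Song A from Song B, and tag 205 cannot separate Song A from Song C, but the collection of tags leaves only one song at the top of the list.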

If you wish to gain a more comprehensive understanding of this highly complex process, then it is best to visit the sources listed at the very bottom.

Eliminating Noise

Now, let’s just try to imagine it. There’s music playing in the background, and you’re loving it. You really want to know the name of that song, so you open up Shazam and tap the button. Unfortunately, it doesn’t work: noise is coming from 50 other directions, and Shazam is on the fritz because it can’t tell which sounds belong to the song. Except that’s not what happens, because Shazam does its thing too well, somehow accounting for the useless noise that is recorded along with the music. By now, you are probably starting to understand just how amazing Shazam truly is. In short, Shazam filters the input so that only the strongest frequencies are kept, and those are what it uses to match fingerprints.
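Here is a toy Python sketch of why keeping only the strongest frequencies helps with noise. The spectrum values, the noise values, and the `strongest_bins` helper are all invented for the example; the real filtering is more involved.

```python
def strongest_bins(magnitudes, keep=3):
    """Return the indices of the `keep` loudest frequency bins, in ascending order."""
    ranked = sorted(
        range(len(magnitudes)), key=lambda i: magnitudes[i], reverse=True
    )
    return tuple(sorted(ranked[:keep]))

# Hypothetical spectrum of a clean recording: three loud frequency bins.
clean = [0.0, 9.0, 0.0, 7.5, 0.0, 0.0, 8.2, 0.0]

# Hypothetical background noise: weak energy spread across every bin.
noise = [0.4, 0.1, 0.6, 0.3, 0.5, 0.2, 0.1, 0.7]

noisy = [c + n for c, n in zip(clean, noise)]

clean_peaks = strongest_bins(clean)
noisy_peaks = strongest_bins(noisy)

print(clean_peaks)
print(noisy_peaks)
```

As long as the noise is weaker than the song’s loudest frequencies, the same bins win in both cases, so the fingerprint built from them stays the same. If the noise were louder than the music, this simple filter would no longer be enough.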

Shazam is one of many wondrous things all around us, behind which lies great insight. Going a little deeper into what we’re already familiar with can produce surprising realizations. Owned by Apple today, Shazam was worth about half a billion dollars around the time of its purchase. It’s truly amazing how such intricate solutions to seemingly small problems can one day be worth so much!

Sources

  1. https://www.toptal.com/algorithms/shazam-it-music-processing-fingerprinting-and-recognition
  2. https://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf
