PyMIR

I am starting a new open source project for music information retrieval in Python called PyMIR. The project is located on Github: https://github.com/jsawruk/pymir.

Motivation

I started PyMIR after taking a summer workshop on MIR at CCRMA at Stanford. During that workshop, we used algorithms that were coded in Matlab. Over the years I had translated some of these algorithms to other languages and platforms for various research I was doing, but they were not consolidated into a single place. The primary goal is to take these algorithms and have them all available in a centralized location so that I may access them easily. I am open sourcing it so that others may benefit from this as well.

I chose Python for its syntax as well as cross-platform capability. Speed is always a concern, and PyMIR will strive to be optimized. I am not considering a C-extension at this time, though this may be necessary in the future.

Roadmap

PyMIR is essentially broken down into two sections: pymir.audio for audio signal processing, and pymir.music for representational processing (MIDI, music theory etc). Additionally, I plan on having a pymir.api package for interfacing with relevant 3rd party APIs (e.g. Peachnote).

  • pymir.audio
    • pymir.audio.features – extraction algorithms for common MIR-related features
    • pymir.audio.effects – standard DSP audio effects
    • pymir.audio.pitch – F0 and chord estimation
    • pymir.audio.timing – Timing and rhythm features (onsets, tempo)
  • pymir.music

Future

Algorithms by themselves are not very useful; they only become useful with the right data. Initially, I plan to integrate with open-source data sets (such as Peachnote). Longer term, I may consider open-sourcing my own data sets. Regardless, this is a low priority at the moment, but something to consider for the future as PyMIR evolves.

Music Hack Day Presentation

I recently gave a presentation at Lehigh Valley Tech Meetup (January 17, 2012) about Music Hack Days and the Echo Nest API. This post provides additional supplemental materials that I did not have time to cover.

Hack Demos

Boston Music Hack Day 2011 Recap

Attended my fourth music hack day this past weekend in Boston. Met some really awesome people and saw some amazing hacks.

I worked with Robby Grodin from ConductiveIO to create FMA Radio - an online radio for Creative Commons-licensed music from the Free Music Archive. By tapping the power of the Echo Nest API, we not only can search for artists who release CC-licensed material, but we can search for CC-licensed material that sounds similar to artists who do not provide CC-licensed material.

You can search by Artist, Mood, and/or Style. Additionally, you can specify which type of CC license to search by. Besides providing a service to music enthusiasts, FMA Radio also allows users to locate CC-licensed music for their media projects, such as film, tv, and web projects.

We are working with the FMA and WFMU to expand this into an integrated solution within the FMA site. In the spirit of CC, the hack is open sourced and available on github.

FMA Radio on the web:

Mitch Hedberg does Math

A lot of Mitch Hedberg‘s material derives from logical absurdity in the form of non-sequiturs or paraprosdokians, but one quote in particular always reminds me of mathematics:

I remixed a remix and it was back to normal. (source)

Who doesn’t enjoy a good remix? But could you actually remix it again back to the original?

As it turns out, this type of transform is common in mathematics and is called an involute. An involute is any function which is its own inverse.

The simplest example is multiplying by negative one:

f(x) = -x

Take the number 6. f(6) = -6, and f(-6) = 6. Therefore, f(f(6)) = 6. Remixed back to normal!

Some other examples:

  • Math: Complex conjugation. a+bi -> a-bi -> a+bi
  • Cryptography: ROT13(ROT13(x)) = x. Example: ROT13(A) = N, ROT13(N) = A.
  • Music: Transposition by a tritone. C -> F# -> C
  • Clocks: Adding 6 hours on a 12-hour clock, or 12 hours on a 24-hour clock. 3 am + 6 = 9 am + 6 = 3pm. 0300 + 12 = 1500 + 12 = 0300.
  • Geometry: Rotating a plane figure by 180 degrees, and then again by 180 degrees in the same direction moves it through 360 degrees, returning it back to where it began.
Actually, whenever you are performing modular arithmetic with a modulus 2N, then the function

f(x) = x + N
is always an involute. The cryptography, music, and clock examples all work because of this principle. This statement is easily proven because

f(f(x)) = f(x + N) = (x + N) + N = x + 2N
and x + 2N = x mod 2N by definition.

Cardinality of the Rationals

A somewhat surprising result in mathematics is that the set of rational numbers, Q, has the same cardinality as the set of all integers, Z. There are several ways to prove this statement, but I found this method particularly ingenious. This is based on Abstract Algebra and Solution by Radicals by Maxfield & Maxfield (ISBN 978-0-486-47723-7). A related proof can be found here.

In order to prove this statement, we have to put all of the elements of Q into a 1:1 correspondence with all of the elements of Z.

Every rational number (in lowest terms) can be expressed as:

  \frac{a}{b} = x  \\ bx = a

For example, the fraction 4/5 = x is equal to the equation 5x = 4. We now have an equation for every rational number.

We can then show that every equation corresponds 1:1 with an integer. To do this, assume we chose to represent integers in base 13 with the following symbols:

  \{0,1,2,3,4,5,6,7,8,9,x,=,- \}

Then, each equation represents an integer value in this base-13 notation. 5x = 4 equates to:

  5 * 13^3 + 10 * 13^2 + 11 * 13^1 + 4 * 13^0 = 12822

But since every rational number corresponds to an equation, and every equation to an integer, every rational number corresponds exactly to one integer.

Dundee Society

One day I was in Meijer and came across a jar of lime marmalade which had been imported from the UK. Having never seen this before, I decided to try it. Little did I know the full history of this jar.

The marmalade itself (Keiller brand) was first introduced in 1797 in Dundee, Scotland. As a result, this brand and its competitors is known as Dundee Marmalade.

This particular brand of marmalade was a favorite of Lambros Callimahos. Callimahos was a virtuoso flautist who toured the US and Europe during the 1930s. He joined the US Army in 1941, beginning his career as a cryptanalyst. Callimahos went on to write Military Cryptanalytics and taught a course at the NSA on advanced cryptanalysis called CA-400. During this course, Callimahos used these marmalade jars as pencil holders. When he needed a cover for a meeting of course graduates, he chose the name Dundee Society. The course was very unusual, “crammed to overflowing with problems, examples, jokes, stories, special tests, and other surprises” (he even would rehearse his flute!).

Many graduates of the Callimahos course were
fascinated by the wit and cultural ebullitions of the
exalted Guru. This was such an integral part of the
course that members of classes 17 through 20 compiled
a list of 100 questions, culled from the parenthetical
musings of the Guru, which any attentive student
should have been able to answer by graduation day.
Entitled "Dundee Society Introductory Placement
Test," the list includes questions covering a wide
range of subjects: mathematics, physics, history,
gastronomy, music, philosophy, medicine, English,
etymology, geography, language, chess, communica-
tions, cinematology , literature, and biology.

Sources: Wikipedia, NSA.

Stompbox 2011

From July 18 – 22, I attended the CCRMA Summer Workshop on Stompbox design at Stanford University. The workshop was given by Ed Berdahl and Esteban Maestre. The workshop focused on the design and implementation of audio effects for amplified instruments using a platform known as Satellite CCRMA. Satellite CCRMA is a prototyping system consisting of a BeagleBoard and an Arduino.

Topics covered included dynamics-based effects (amplitude modulation, compression/limiting), time-based effects (delay, reverb), filters (wah, flange), and distortion (clipping, saturation).

Course notes can be found here.

Pitch Class Normal Form

Overview

See this code in action here!

One of the common problems in musical set theory involves determining how to pitch class sets (PC sets) are related. One way to determine how sets of the same cardinality (size) are related is to compute the prime form of each set.

For example, the prime form of a C major scale is 7-35 ([0,1,3,5,6,8,10]). The prime form of D major scale is also 7-35. Also, the prime form of the A minor scale is also 7-35. All major and minor scales have 7-35 as their prime form. This relation between different scales is not immediately apparent, making the prime form extremely useful for finding relations among pitch class sets.

The name, 7-35, comes from the catalog that Allen Forte published in his Structure of Atonal Music. 7 denotes that it is a 7-note scale, and 35 denotes that it is the 35th in the list when the list is sorted. The algorithm used here comes from The Structure of Atonal Music, p. 4, as well as from Joseph Straus’ Introduction to Post Tonal Theory, p. 50.

I will soon convert this code to a PC set calculator that lives somewhere on this site. This code is released under Creative Commons Attribution-Noncommercial-Share Alike 3.0 license. It is provided as is with no warranty. If you find any errors, please let me know!

Implementation

For this implementation, I decided to use Python for its functional programming features and because it is already installed on this webserver. I had tried previously to implement this using a procedural language, but did not get very far.

I looked at two other online PC Set calculators (and to verify that my own code worked properly):

Both were implemented in Javascript. The PC Set Calculator used about 3000 lines of code. Granted, there was a lot of whitespace, as well as the PC set catalog and a lot of UI-specific code. Still, the code was considerably larger than SetFinder, which used about 400 lines of code. My Python implementation, which does not include the PC set catalog or any UI-code, comes in at 100 lines.

Programming Notes

I tried to use an associative array to associate the pitch class with its outer interval, but I could not because I implemented the pitch class set itself as a list. Python says that lists are not hashable, and therefore cannot be used as keys in associative arrays.

Instead, I used a tuple format: (a, b), where a is the pitch class set as a list, and b is the outer interval. Here is an example: ([0,1,3], 3)

I made extensive use of list comprehensions, often to create a new list by applying a transform to an existing list. An example of this is the inversion function, which returns a new list by subtracting 12 from each element of the input list.

One last thing, notice how the order of the ternary operator in Python is different from how it is in C/C++.

def rangeDiff(pcs):
    # Return the difference between the first and last items of a list of pitch class sets
    diff = (pcs[len(pcs) - 1] - pcs[0])
    if diff < 0:         diff += 12     return diff def cycleLeft(pcs):     # Return a single cyclic left permutation of a pitch class set     # Ex: [a,b,c] -> [b, c, a]
    cycle = pcs[1:]
    cycle.append(pcs[0])
    return cycle

def cycleDiff(pcs):
    # Given a pitch class set as a list
    # Return a duple (2-tuple) of the form  ([pcs], diff)
    # Example:
    # Input: [0,1,2,3]
    # Output: ([0,1,2,3], 3)
    return (pcs, rangeDiff(pcs))

def removeLast(list):
    # Given a list, remove the last element
    # Unlike pop(), which returns the value of the
    # last element this function returns the
    # modified list instead
    return list[0:len(list) - 1]

def selectSmallest(pcsDupleList):
    # From a list of PCS duples (2-tuples),
    # select the N sets with the smallest
    # outside intervals
    # If N = 1, return
    # Else, recurse until N = 1

    # Use a lambda to sort by the difference
    # the smallest difference will be first
    pcsDupleList.sort(key = lambda x: x[1])

    # Get the value of the smallest interval
    smallestDiff = pcsDupleList[0][1]

    # filter for the N with smallest outer intervals
    pcsDupleList = filter(lambda x: x[1] == smallestDiff, pcsDupleList)

    if len(pcsDupleList) == 1:
        return pcsDupleList
    else :
        subsets = [cycleDiff(removeLast(x[0])) for x in pcsDupleList]
        return selectSmallest(subsets)

def normalForm(pcs):
    # Return the normal form of the input pitch class set
    cycle = pcs
    cycleCount = 0
    cycles = []

    while cycleCount < len(cycle) :
        cycle = cycleLeft(cycle)
        cycleCount += 1

        # Insert into a quasi-associative array
        # cannot use an actual associative array
        # because lists are not hashable
        # Format [([pcs], diff), ... ]
        cycles.append(cycleDiff(cycle))

        if cycle == pcs:
            break

    smallestSets = selectSmallest(cycles)

    if len(smallestSets[0][0]) < len(cycles[0][0]) :         lengthDiff = len(cycles[0][0]) - len(smallestSets[0][0])         # create a list that only has subsets from the cycle list         # that start with the same number element         filterList = [x[0][:len(x[0]) - lengthDiff] for x in cycles]           # look for its position in the list         index = filterList.index(smallestSets[0][0])          # because the lists are sorted, the indexes match         smallestSets[0] = cycles[index]      return smallestSets[0][0] def inversion(pcs):     # Determine the non-transposed inversion of the     # pitch class set by subtracting every element from 12     return [12 - x for x in pcs] def zeroNormal(pcs):     # Convert the normal form so that the first element is 0     # By subtracting the first element from every element     # Add 12 if the result is less than 0     normal = normalForm(pcs)     zeroNormal = [x - normal[0] % 12 for x in normal]     # Note the unusual order for the ternary operator     return [x if x >= 0 else x + 12 for x in zeroNormal]

def primeForm(pcs):
    # To determine the prime form, compute the zeroNormal of the set
    # as well as the zeroNormal of its inversion
    # Select whichever one is more packed to the left (smaller intervals first)
    zeroNorm = zeroNormal(pcs)
    zeroInv = sorted(zeroNormal(inversion(pcs)))

    primeForm = zeroNorm

    # Luckily, we can use Python's built-in list comparison
    # to determine which set is more packed to the left
    if zeroInv < zeroNorm :
        primeForm = zeroInv

    return primeForm

libMFCC

libMFCC is a C library for computing Mel Frequency Cepstral Coefficients. It is open-source  (MIT License) and available on Google Code. The implementation is based on the work of Bob L. Strum et al.

Mel Frequency Cepstral Coefficients (MFCCs) are frequently used for spectral analysis in fields such as speech recognition and music information retrieval.

The basic idea is to compute the normal spectrum using a Fourier Transform, apply mel scaling, and then compute the cepstrum.

Mel scaling is a weighting applied to frequencies to more closely approximate the perception of pitch.

The cepstrum for a signal is then computed using the formula:

  \hat{x}[n] = FFT(x[n]) \\*\\*  cepstrum = FFT^{-1}[\mathrm{log}|\hat{x}[n]|]

Should this be translated into Python?