Elliptic Fourier Descriptors

Elliptic Fourier Descriptors (EFDs) are one really good answer to the question: How can I mathematically describe a close contour?

That’s nice, but why would we want to do this?

One good reason, with apologies to Randall Munroe, is that after reading xkcd #26 you probably really wanted to take the Fourier transform of your cat. You and me both! Below, we’ll do exactly that, using the method described by Kuhl and Giardina (1982) in their paper ‘Elliptic Fourier Descriptors with Chain Encodings’.

Image: xkcd by Randall Munroe

Of course this isn’t exactly fair- you’ve already taken the Fourier transform of your cat if you’ve ever saved a picture of it as a jpg or listened to it meow over the phone (listen, I don’t know what you do with your free time, but I’m not here to judge) because Fourier transforms are really popular in both video and audio compression.

Step One: Image Pre-Processing

I’m actually a dog person, so I found my creative-commons licensed cat photo through flickr. To make preprocessing easier, I was looking for a monotone cat that was entirely in the frame.

First I extracted the cat from the background using Preview’s great cropping tool. You could automate this segmentation with a script, but for one image it’s easier to do it by hand.

Next I created a binary version of the segmented image with the Matlab one-liner: imwrite(im2bw(imread(‘cat.png’)), ‘bw_cat.png’). The image is binary because every pixel is either pure black or pure white. This binary image can be thought of as a matrix where every pixel makes up one matrix entry and is either 0 (white) or 1 (black).

Original image: Emiliyan Dikanarov, via Flickr.

Step Two: Extracting the Boundary

Now that we have our binary image let’s grab our contour. At this step we’ll represent the contour as a list of the (x,y) positions of the pixels that make up the boundary of the cat.

From our preprocessing step we assured that all non-zero (white) pixels belonged to our cat, so we can just walk through each column of our matrix until we find the first non-zero entry. This pixel must be on the boundary of the cat! From there, we can just walk around the contour by following neighboring boundary pixels until we’ve found every pixel in the contour. At this point we’ll have a list of (x, y) positions of all the boundary pixels.

But wait!

I claimed that Elliptic Fourier Descriptors were a way to mathematically describe a close contour- why isn’t this description, a list of the (x,y) coordinates of all the pixels in the contour good enough of a mathematical description?

I lied when I motivated EFDs as a great way to take a Fourier transform of your cat.

The real use-case for EFDs is creating features for machine learning algorithms; something like teaching a computer to classify photos of cats from non-cats. What we really need is some way to saying how ‘cat-like’ an object in a photo looks- and that’s why this mathematical representation falls short. This representation isn’t position or scale invariant, and is really susceptible to noise- this means that if I have a more zoomed out photo of the same cat in a different position in the frame or it’s a bit blurry, then my algorithm won’t stand a chance of realizing it’s still a cat.

We’ll see below that all these problems are resolved when we take the Fourier transform.

But first, to make taking the Fourier transform easier, we’ll change out of a coordinate encoding. Instead, we’ll use a Freeman chain encoding, which describes each point in a shape relative to each previous point.

Step Three: Freeman Chain Encodings

A Freeman chain encoding is a piece-wise linear approximation of a continuous curve. Each segment of the continuous curve is approximated by one of eight line segments, denoted by a0 – a7 in the graph below. The line segments with even index have unit length, and the odd line segments with odd with odd index have a length of √2. This will be convenient later on!

For our purposes, we’re encoding a contour in a digital image, which is already a discrete approximation of a continuous curve. Below on the left is an example of the type of curve we might want to approximate; the chart on the right shows its Freeman chain encoding.

Detour: Fourier Series

In general, a Fourier series is a way to approximate a periodic function. A periodic function is just a waveform, and a Fourier transform breaks complex waveforms into sums of simple ones.

We can also use Fourier series to approximate non-periodic functions so long as we specify an interval. Below I’ll show an example of how we can approximate x2 on [-pi, pi]; we do this by setting the “period” of x2 to be [-pi, pi]. We pretend that the part of the non-periodic function we want to approximate is periodic outside of the interval we’re considering it over.

The really beautiful thing about this approximation is that we can make it arbitrarily good, that is, we can make the error between the approximation and the real function as small as we want.

The Fourier series f(x) is described by the equation below- we see that the Fourier series is parameterized entirely by its constituent an’s and bn’s.

This approximation approximates the true function arbitrarily well on any finite interval. If we instead replace the infinite sum with a finite one, from n = 1 to k, this is called the k’th harmonic; the higher the value of k, the better the approximation.

Let’s look at a few harmonics of x2 on [-pi, pi]. The graphs below were generated in WolframAlpha with the command “overlay[{plot FourierSeries[x^2, x, k], plot x^2}]” with k replaced by the number of the harmonic we want to look at (ie, 1, 2, 3).

Notice that the function of the harmonic is written in blue in the legend of the graph. It’s amazing how fast we converge to a good approximation!

Click to embiggen.

Step Four: Fourier Transform and Time Derivative

Okay, back to what we were doing with our cat.

Ask not what you can do for your Fourier series, but what your Fourier series can do for you.

Now that we have the Freeman chain encoding of our cat contour, and are convinced that Fourier series are the way of the future, let’s look at what they can do for us. Below is a really quick explanation of the Kuhl-Giardina 1982 paper.

Our first goal is to find how we can compute those coefficients we talked about in the previous section, an and bn.

First we’ll notice that we can take separate our chain encoding into its x and y projections. We’ll define xp and yp, the projection of the first p links in our chain encoding to be the sum of the differences between all the previous links.

Notice that it’s the exact same story in both the x and y direction, so for ease we’ll just work the x version out below and understand that exactly the same logic will hold for y.

We’ll consider the time derivative of x. When we say “time” here we actually mean the length of the chain, for instance, the “time” contribution of any horizontal or vertical link is 1, and the contribution of any diagonal link is √2. The “time” of the p’th link is just the sum of all the times of the previous links.

The punchline is going to be that we’ll write this derivative in two different ways- one way which we can compute from the chain code, and one way which will involve the coefficients we’re trying to find an and bn. Then we can solve for an and bn.

Time derivative: First Form!

Here ‘n’ is which harmonic we’re on, T is the period (the interval we’re looking at the function on) and alphan and betan depend on the slope of each line segment.

This is very good news, because we can compute everything in this form of the time derivative! Because we’re using a Freeman chain encoding (just piecewise linear segments) the slope of any line segment is either 0, (plus/minus) 1, or (plus/minus) √2.

Time derivative: Second Form!

This is also good for us, because this form includes the coefficients we’re trying to solve for. Solving for an and bn we get:

We know how to compute an and bn! T is the period (the interval), n is the number of the harmonic that we’re looking at, p is the index of the chain link, K is the total number of chain links, (del xp/del tp) is the slope of each link, and tp and tp-1 are the lengths of the chain at the p’th link.

The exact same thing happens in the y direction, so we call the an value in the y direction cn and the bn value in the y direction dn.

Step Five: Elliptic Equation

Phew! We figured out how to find all our coefficients. How’s this going to help us with our cat?

We’ll use these coefficients to modify an ellipse- after all, we basically want a modified ellipse to approximate our closed contour.

Below is the equation that gives the n’th Elliptic Fourier Descriptor.

Notice that this looks a lot like the equation for an ellipse; the xy term comes in so that we can rotate the ellipse.

Another good thing about these coefficients is that they’re the same regardless of how the contour is rotated, what size it is, or where in the frame of the image it is.

Now we’ve got the cat in the bag.

Step Six: Results!

Using the description we found above, we’ll approximate the cat contour that we found. The true contour is given in blue, and the approximation is given in red.

Notice that with 10 harmonics we already have a pretty good approximation, but with 70 harmonics we can fit much finer features.

For real-world applications, like creating features for machine learning, fitting fine features might be not be desirable, because you might be over-fitting to random noise instead your data.

10 harmonics on the left, 20 in the middle, and 70 on the right.

I also decided this was a great opportunity to find the EFD of Randall Munroe.
I grabbed another creative commons image, and found the 70th harmonic of Randall Munroe.

Appendix: Matlab code

Here’s the code you need to replicate everything we did.

Segment the image
Generate the chain code
chain = mk_chain(bw_cat.png’);
[cc] = chaincode(chain);

You’ll need to grab mk_chain.m off of this website first.
Really all it does is make a call to the Matlab function bwtraceboundary
Also be warned that this code often failed for me, if this happens for you just replace the <dir> on the last line (line 36) with <‘SE’>.

Generate the EFD
chain = mk_chain(bw_cat.png’);
[cc] = chaincode(chain);
coeffs = calc_harmonic_coefficients(transpose([cc.code]), 70)

First you’ll need to grab all the m files off of this website.

You can replace 70 with whatever number of harmonic you want to look at. Coeffs will be a matrix with 4 numbers, these are the coefficients a70, b70, c70, d70.

See what your chain code looks like
chain = mk_chain(bw_cat.png’);
[cc] = chaincode(chain);
figure
plot_chain_code(transpose([cc.code]));

chain = mk_chain(bw_cat.png’);
[cc] = chaincode(chain);
figure
plot_chain_code(transpose([cc.code]));
hold
plot_fourier_approx(transpose([cc.code]), 70, 50, 0, ‘r’);

Again, you can replace 70 with however many harmonics you’d like to see.

My Top Secret Messages to Maryam Mirzakhani

If you’re anything like me, you need to send TOP SECRET messages all the time.

Just the other day, I was working on a really hard problem set for my History of Math class, so I decided to ask my good friend Maryam Mirzakhani to do it for me. This, of course, went against my University’s cheating policy, so I needed to be sure that my message was encrypted securely enough that my resourceful and mathematically gifted professor Evelyn Lamb couldn’t read my message and fail me for cheating.

Luckily, by the grace of modular arithmetic, I was able to have a quick exchange with Maryam just in time to hand in my assignment undetected. Below I’ll discuss the rad encryption algorithm Maryam and I used to exchange messages, and the clever but unfortunately unsuccessful algorithms my suspicious professor tried to discover our ploy.

RSA
We decided to encrypt with RSA and pay homage to the best public-key cryptosystem around. RSA is an asymmetric algorithm, which means that the keys of the sender and the receiver are completely independent. Maryam and I needed to independently complete the steps below to exchange encrypted messages.

1) I chose 2 extremely large prime numbers p and q.
I went with my favorite primes, 61 and 11.
2) Set my modulus to be n = p * q, and held on to a value I’ll call ϕ(n) = (p-1)*(q-1)
So for me, n = 11*61 = 671, and ϕ(n) = 10*60 = 160.
3) Chose the exponent “e” for my public key
The number e just needs to be coprime with ϕ(n), a common choice is 216 + 1 = 65,537 but 3 is sometimes just as good a choice.
I chose e = 7, just because I happen to like 7.
4) Found my private key exponent, “d” as the multiplicative inverse of e mod ϕ(n).
That is, find d such that d*e = 1 (mod ϕ(n)).
Normally, you can do this using the extended Euclidean Algorithm.
But I instead used the coveted Wolfram-Alpha algorithm, and found that d = 23.

After these steps Maryam and I each had a public and private key- you can think of these as keys that interchangeably lock and unlock the message. Anyone listening in (like Professor Lamb) can see each of our public keys- this is what allows strangers on the internet to securely exchange messages.

The public key consists of n and e, and the private key is d. My public key was (n = 187, e = 7) and my private key was d = 23 (but don’t tell Professor Lamb!) Maryam broadcast her public key, which was (n = 779167, e = 17).

I want to encrypt my message:
Hi Mimi! How great is the weather in California? Hey, I have a favor to ask…

First I converted the letters in my message into numbers by some publicly known agreed upon encoding, and broke my message into chunks so that the value of each chunk was less than Maryam’s public key value n, again with a publicly agreed upon scheme:
720 010 500 077 001 050 010 900 105 000 330 007 200 111…

I then encoded each chunk into the cypher-text c using Maryam’s public key (n = 779167, e = 17) as: c = me (mod n)
So specifically, c1 = 72017 (mod 779167)
c2 = 1017 (mod 779167) and so on.

I sent these encoded cypher-text chunks to Maryam, who then used her private key d to decode them into the message that I wrote:
m = cd (mod n)

This is because I encoded the cypher-text as c = me (mod n), so when Maryam computed cd, she had actually computed (me)d (mod n) = med (mod n). Recall that Maryam very carefully chose e and d so that e*d = 1 (mod ϕ(n)). This means, thanks to Fermat’s Little Theorem, that med (mod n) is the same as m1 (mod n). Excellent news, this is just my original message! Thanks, modular arithmetic!

We could now securely exchange messages, and for even more security I even left a signature in my message so that Maryam could be sure the message actually came from me.

But not so fast! Professor Lamb noticed that Maryam and I were exchanging mysterious messages, so she took a stab at decoding them.

Pollard’s p-1 Algorithm
RSA is a secure algorithm because it is very difficult to factor large numbers.

Recall that when I sent Maryam a message, I encoded the message m into cypher-text c using her public key (n and e) as:
c = me (mod n)
and she decoded the message using her private key d as:
cd = med (mod n) = m (mod n)
This is secure because, if you remember back to how Maryam chose e and d,
e*d = 1 (mod ϕ(n))

This means that for Professor Lamb to decode the message that I sent to Maryam, she needed to find d. To find d, she needed to know what ϕ(n) = (p-1)*(q-1) was, because you need to know the modulus before you can find the inverse of an element, and to find ϕ(n) she needed to figure out p and q. Therefore, the only thing standing between me and expulsion for cheating is the fact that it’s very hard to factor very large numbers. Notice, however, that all the other information is publicly available- c, e and n can be viewed by everyone.

Professor Lamb decided to try Pollard’s p-1 algorithm to factor Maryam’s public key modulus, n = 779167. She first decided to try the algorithm on a smaller, more manageable number, so she tried n = 5917. Here’s what she did:

1. She chose a positive number B.
Professor Lamb liked the number 5, so she set B = 5.

2. Computed m as the least-common multiple of the positive integers less than B.
m = lcm(1, 2, 3, 4, 5) = 60

3. Set a = 2.
Easiest step ever.

4. Found x = am – 1 (mod N) and g = gcd(x, N)
x = 260 – 1 (mod 5917) = 3417 (mod 5917)
g = gcd(3417, 5917) = 61

5. If g isn’t equal to 1 or N, then you’re done!
Professor Lamb found that 61 was a prime factor of 5917! Slick!

6. Otherwise, add 1 to a and try again. If you’ve already tried 10 times, just give up.
Luckily she didn’t need to use this step, but for a lot of different n’s she probably would have.

Feeling triumphant and confident in Pollard’s p-1 algorithm, Professor Lamb turned to Maryam’s public key modulus, n = 779167. The first 3 steps were the exact same as before, and for step 4 she found:
x = 260 -1 (mod 779167) = 710980
g = gcd(710980, 779167) = 1

Drat! Professor Lamb then had to proceed to step 6, increased a to 3 and try again:
x = 360 -1 (mod 779167) = 592846
g = gcd(592846, 779167) = 1

Double drat! Professor Lamb continued this for approximately 10 steps, and then gave up. (Really I should just be glad that she didn’t try to factor my public key modulus n = 187. Our encryption would have been much more secure if I had chosen much larger primes!)

Luckily for me, Maryam and I chose a secure encryption algorithm. RSA is set up so that to decode the message, you need to know the prime factors p and q of the modulus n. You need p and q so that you can find the inverse of the public key mod (p-1)(q-1), and these public and private key exponents work to encode and decode the message because of Fermat’s Little Theorem.

Professor Lamb tried to decode our secret messages by factoring Maryam’s public key modulus with Pollard’s p-1 algorithm, but unfortunately it did not yield a prime factor. Because finding large prime factors is such a difficult problem, Professor Lamb wasn’t able to read our secret messages, and I got an A on my homework.

Obvious disclaimers!

– I obviously didn’t ask Maryam Mirzakani to do my Math History homework. She’s an incredibly intelligent lady, working on much, much more difficult things, and apparently getting awesome results.

– I obviously don’t endorse cheating and Professor Lamb’s homework is not too difficult. It is just difficult enough 🙂

– Even though I motivated the need for privacy in my silly article with my desire to keep my professor from finding out I was cheating, privacy is obviously very important for a wide range of reasons(possible hyperlink?), and is equally important to protect people who don’t have anything to hide.

– The ascii art image of Maryam Mirzakani is obvious very cool! It was made by my very talented friend Tobin Yehle, who wrote a neat program to translate photos into ascii art.

Where do numbers come from, anyways?

A short history of imaginary numbers

Mathematicians first came up against imaginary numbers in the mid 16th century and it wasn’t until the mid 19th century that they saw how awesome complex numbers could be. Before we look at how imaginary numbers came to be, let’s look at some other familiar number systems.

Number Systems Solve Problems

The first, most obvious, number system is the integers, or counting numbers. We have integers to answer really useful question that we see all the time in day-to-day life like, how many grapes can I really stuff into my mouth at a time? (About 9)

The next number system we might think about is the rational numbers, or fractions. These also serve to answer really useful day-to-day questions like that involve division like, if I have 6 roommates but only 1 pint of ice cream, what portion of the tub can I eat?

This assumes that I’m a fair roommate who would never eat more than her share of the communal ice cream- which leads us to our next number system. Negative numbers are used to measure debt; like how much ice cream I might owe my other roommates.

With these two systems we can count and divide stuff, but we also might have other sorts of problems like how to measure things. Like, for instance, I might need to walk 1 block south and 2 blocks east around a park to get to school, but since I’m inherently lazy (a good quality for all mathematicians), I cut straight through the park, and find that I’ve walked √5 blocks to get to school, which is totally irrational. We have to deal with irrational numbers when we measure distances because it turns out (to the Greeks’ great sorrow) that not all distances can be measured with rational numbers.

Where did they come from, and what are they good for?

We’ve got Real Problems: Imaginary Numbers give Real Results

In the mid 16th century a mathematician named Tartaligia came up with a general solution for finding the roots of 3rd degree polynomial, but he held his method as a closely-guarded secret. Another mathematician named Cardano eventually managed to convinced the reluctant Tartaligia to tell him the method, on the condition that he would never ever tell anyone else. Well, I think they should make a soap opera about 16th century mathematics because in 1545 Cardano completely betrayed Tartaligia by publishing the solution in his book ‘Ars Magna’.

Tartaligia’s method is really important in the history of imaginary numbers because there are some perfectly good 3rd degree polynomials with perfectly good real roots that this method doesn’t make sense for. When you use Tartaligia’s method for these certain polynomials, you get a nonsense step in the middle of the calculation where you have to take the square root of a negative number.

Consider for example the equation:

x3 = 15x + 4.

This cubic has a real root x = 4, but when we apply Cardano’s formula we get:

x = ∛[ 2 + √(-121) ] + ∛[ 2 – √(-121) ]

The real problem (pun intended) was that even though everyone knew that taking the square root of -121 was totally ridiculous, they also knew that the root x=4 was a totally reasonable real solution. There was this breakdown in what the equation was trying to communicate.

The first mathematician to really break through this mold was Rafael Bombelli, who got around this problem with the crazy proposition that, well let’s just imagine that there’s some number that’s negative when we square it. With this assumption he was able to manipulate Tartaligia’s equation, for instance the example above becomes:

∛[ 2 + (√-121) ] + ∛[ 2 – (√-121) ] = (2 +(√-1) ) + (2 –  (√-1))   (**!)

= 4 – 2(√-1) + 2(√-1) – (√-1)2

= 4 (!)

Conveniently, the ‘imaginary’ numbers cancel out, leaving good real roots! Way to take a leap of faith, Bombelli!

** Okay, hold on, what just happened there? Well it turns out (2 +(√-1))3 is :

(2 +(√-1))3 = (2 +(√-1))*(3 + 4i) = (2 + 11i) = 2 + (√-121)

Same goes for ( 2 – (√-1)). Neat.

About a half-century later in 1637, Descartes coined the term “imaginary” when he wrote about roots of nth degree polynomials in his book ‘La Geometrie’. He wrote that these polynomials might have as many as n solutions, but sometimes they have fewer, as some of the solutions are ‘impossible’, ‘improbable’ and ‘imaginary’. He meant it in a demeaning way- like we should be doing ‘real math’, not ‘day-dream math’.

At this point in history mathematicians swept imaginary numbers under the rug; they cautiously imagined that they might exist but only for long enough to cancel out and yield real solutions. It wasn’t conceivable that they might be useful by themselves.

It’s sort of a complex story

The complex number system was really first understood as the incredibly powerful mathematically tool that it is in the 19th century when Gauss took an interest in imaginary numbers. He came up with a geometric interpretation for complex numbers (which, to be fair, was also independently discovered by the Norweigan mathematician Wessel and the French bookstore manager and amateur mathematician Argand). Gauss’ interpretation was that the imaginary number line is just like the real number line, so a complex number (a number with a real and an imaginary part) is actually a coordinate in a plane, like in the image below. We just say that real numbers lay on the horizontal axis, while imaginary numbers lie of the vertical axis.

The really amazing and exciting thing about this description is that it’s totally consistent with operations we might like to do on complex numbers, like addition and multiplication. Consider what happens when we multiply by i, for instance 1*i. We rotate 90 degrees, from the coordinate (1, 0) to the coordinate (0, 1), so we can say that multiplying by i is the same as rotating by 90 degrees. Then consider i*i (which is i2): we rotate another 90 degrees and end up at -1! Neat!

Later in the 19th century complex numbers got a lot of traction because they turned out to be very good at describing waves. At this point in history, physicists were developing ways to describe electricity and magnetism, and complex numbers enabled them to really understand these phenomena.

The neat thing about complex numbers is they show up everywhere in our day-to-day lives. Anything you have that uses electricity only works because some engineer somewhere knew how to build it using their understanding of imaginary numbers. Can you even imagine your life if you couldn’t send your mom photos of other people’s dogs? Any time you snap a photo or make a phone call your phone does a Fast-Fourier-Transform, which is a method based on complex numbers, to compress the data into just tiny amounts of storage.

So do imaginary numbers really exist?

Complex numbers are great representations for lots of natural phenomena, like electricity. Remember how we used our other number systems- like how we used integers to count how many grapes I could fit in my mouth? In some sense, it’s just the grapes that exist, not that the integers. The integers exist mathematically- they’re only there to describe the real world, and this is true for every number system. In this sense, not only do imaginary numbers ‘exist’ mathematically, but they’re first-class citizens because they describe so many awesome things that we use every day.

Sources:

http://www.bbc.co.uk/programmes/b00tt6b2

http://en.wikipedia.org/wiki/Imaginary_number

http://www.storyofmathematics.com/16th_tartaglia.html

http://en.wikipedia.org/wiki/Niccol%C3%B2_Fontana_Tartaglia

http://mathfaculty.fullerton.edu/mathews/c2003/ComplexNumberOrigin.html

http://en.wikipedia.org/wiki/Carl_Friedrich_Gauss

http://en.wikipedia.org/wiki/Caspar_Wessel

http://www.ms.uky.edu/~sohum/ma330/files/eqns_4.pdf

http://www.und.edu/instruct/lgeller/complex.html