Category Archives: Big Problems

Who knew that an unlikely friendship and a few games of cricket with one of the greatest mathematicians in the early 20th Century could lead to a breakthrough in population genetics?

Today, it is almost commonplace for us in the scientific community to accept the influence natural selection and Mendelian genetics have on one another; however, for the majority of human history this was not the case. Up until the early 1900s, many scientists believed that these concepts were nothing more than two opposing and unassociated positions on heredity. Scientists were torn between a theory of inheritance (a.k.a. Mendelian genetics) and a theory of evolution through natural selection. Although natural selection could account for variation, which inheritance could not, it offered no real explanation of how traits were passed on to the next generation. For the most part, scientists could not see how well Mendel’s theory of inheritance worked with Darwin’s theory of evolution because they did not have a way to quantify the relationship. It was not until the introduction of the theorem of genetic equilibrium that biologists acquired the necessary mathematical rigor to show how inheritance and natural selection interacted. One of the men who helped provide this framework was G.H. Hardy.

G. H. Hardy. Image: public domain, via Wikimedia Commons.

Godfrey Harold (G.H.) Hardy was a renowned English mathematician who lived from 1877 to 1947 and is best known for his accomplishments in number theory and for his work with another great mathematician, Srinivasa Ramanujan. For a man who was such an outspoken supporter of pure mathematics and abhorred any practical application of his work [5], it is ironic that he should have such a powerful influence on a field of applied mathematics and help shape our very understanding of population genetics.

How did a pure mathematician come to work on population genetics? Well, it all started with a few games of cricket. Whilst teaching at the University of Cambridge, Hardy would often interact with professors in other departments through friendly games of cricket and evening common meals [1]. It was through these interactions that Hardy came to know, and developed a close friendship with, Reginald Punnett, cofounder of the genetics department at Cambridge and developer of the Punnett square, which is named for him [13].

Punnett, being one of the foremost experts in population genetics, was in the thick of the debate over inheritance vs. evolution. His interactions with contemporaries like G. Udny Yule made him wonder why a population’s genotype, or the genes found in each person, did not eventually contain only the variations, known as alleles, of a particular gene that are dominant. This was the question he posed to Hardy in 1908, and Hardy’s response was nigh on brilliant. The answer was so simple that it almost seemed obvious. Hardy even expressed that “I should have expected the very simple point which I wish to make to have been familiar to biologists” [4]. His solution was so simple, in fact, that unbeknownst to him another scientist, Wilhelm Weinberg, had reached the same conclusion around the same time in Germany [17]. In time, this concept would be known as Hardy-Weinberg Equilibrium (HWE).

In short, HWE asserts that when a population is not experiencing any genetic changes that would cause it to evolve, such as genetic drift, gene flow, selective mating, etc., then the allele frequencies (af) and genotypic frequencies (gf) will remain constant within a given population (P’). To calculate the gf for humans, a diploid species that receives two complete sets of chromosomes from its parents, we simply look at the proportion of genotypes in P’.

0 < gf < 1

To calculate the af, we look at whether the genotype is homozygous, containing two copies of the same allele (dominant AA or recessive aa), or heterozygous, containing one copy of each allele (Aa). P’ achieves “equilibrium” when these frequencies do not change.

Hardy’s proof that these frequencies remain constant is as follows [1][4]:

If parental genotypic proportions are p AA : 2q Aa : r aa, then the offspring’s would be (p + q)² : 2(p + q)(q + r) : (q + r)². With four equations (the three genotype frequencies and p + 2q + r = 1) and three unknowns, there must be a relation among them. “It is easy to see that . . . this is q² = pr.”

Which is then broken down as:

q = (p + q)(q + r) = q(p + r) + pr + q²

Then, using p + 2q + r = 1:

q² = q(1 − p − r) − pr = 2q² − pr     ——->   q² = pr

In order to fully account for the population, the gf and af must each sum to 1. And, since each subsequent generation will have the same allele frequencies, the genotype frequencies remain constant and follow a binomial (or, for more than two alleles, multinomial) distribution.
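To see Hardy’s claim in action, here’s a quick numeric check (my own sketch in Python, not part of the original argument): start at proportions satisfying q² = pr, apply one round of random mating, and the proportions come back unchanged.

```python
def next_generation(p, q, r):
    """One round of random mating for genotype proportions p AA : 2q Aa : r aa."""
    A = p + q            # frequency of allele A
    a = q + r            # frequency of allele a
    return A * A, A * a, a * a   # offspring p, q, r

# parental proportions with allele frequencies A = 0.6 and a = 0.4;
# note q*q == p*r, so the population is already at equilibrium
p, q, r = 0.36, 0.24, 0.16
p2, q2, r2 = next_generation(p, q, r)
print(p2, 2 * q2, r2)   # same proportions as the parents (up to rounding)
```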

One important thing to keep in mind, however, is that almost every population is experiencing some form of evolutionary change. So, while HWE shows that the frequencies don’t change or disappear, it is best used as a baseline model to test for changes or equilibrium.

When using the Hardy-Weinberg theorem to test for equilibrium, researchers divide the genotypic expressions into two homozygous events: HHο and hhο. The union of each event’s frequency (f) is then calculated to give the estimated number of alleles (Nf). In this case, the expression for HWE could read something like this:

Nf = f(HHο) ∪ f(hhο)

However, another way to view this expression is to represent the frequency of each homozygous event as a single variable, i.e. p and q. Using p to represent the frequency of one dominant homozygous event (H) and q to represent the frequency of one recessive homozygous event (h) gives the following: p = f(H) and q = f(h). It then follows that p² = f(HHο) and q² = f(hhο). By using the Rule of Addition and the Associative Property to calculate the union of the two events’ frequencies, we are left with F = (p+q)². Given that the genotype frequencies must sum to one, the prevailing expression for HWE emerges when F is expanded:

F = p² + 2pq + q² = 1

Using this formula, researchers can create a baseline model of P’ and then identify evolutionary pressures by comparing any subsequent frequencies of alleles and genotypes (F′) to F. The data can then be visually represented as a change of allele frequency with respect to time.
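As a concrete illustration of using HWE as a baseline, here’s a small Python sketch with hypothetical genotype counts (the function name and the numbers are my own, not from the post): we estimate allele frequencies from the observed counts, compute the counts expected under equilibrium, and summarize the discrepancy with a chi-square-style statistic.

```python
def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square statistic of observed genotype counts against the HWE baseline."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)   # estimated frequency of allele A
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_AA, n_Aa, n_aa)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# a population at equilibrium gives a statistic near 0...
print(hwe_chi_square(360, 480, 160))
# ...while a heterozygote deficit (e.g. from inbreeding) gives a large one
print(hwe_chi_square(450, 300, 250))
```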

HWE represents the curious situation that populations experience when their allele frequencies do not change. This situation is realized by first assuming complete dominance, then calculating the frequency of alleles, and then using the resultant number as a baseline with which to compare any subsequent values. Although there are some limitations on how we can use HWE (namely, the assumption of complete dominance), the model is very useful in identifying any evolutionary pressures a population may be experiencing and is one of the most important principles in population genetics. Developed, in part, by G.H. Hardy, it connected two key theories: the theory of inheritance and the theory of evolution. Although, mathematically speaking, his observation was almost trivial, Hardy provided the mathematical rigor the field sorely needed in order to see that the genotypes didn’t completely disappear and, in turn, forever changed the way we view the fields of biology and genetics.


  1. Edwards, A. W. F. “G. H. Hardy (1908) and Hardy–Weinberg Equilibrium.” Genetics 179.3 (2008): 1143–1150.
  2. Edwards, Anthony W. F. Foundations of Mathematical Genetics. Cambridge University Press, 2000.
  3. Guo, Sun Wei, and Elizabeth A. Thompson. “Performing the exact test of Hardy-Weinberg proportion for multiple alleles.” Biometrics (1992): 361–372.
  4. Hardy, Godfrey H. “Mendelian proportions in a mixed population.” Science 28.706 (1908): 49–50.
  5. Hardy, G. H. A Mathematician’s Apology. Reprinted, with a foreword by C. P. Snow. Cambridge University Press, 1967.
  6. Pearson, Karl. “Mathematical contributions to the theory of evolution. XI. On the influence of natural selection on the variability and correlation of organs.” Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character (1903): 1–66.
  7. Pearson, K., 1904. Mathematical contributions to the theory of evolution. XII. On a generalised theory of alternative inheritance, with special reference to Mendel’s laws. Philos. Trans. R. Soc. A 203: 53–86.
  8. Punnett, R. C., 1908. Mendelism in relation to disease. Proc. R. Soc. Med. 1: 135–168.
  9. Punnett, R. C., 1911. Mendelism. Macmillan, London.
  10. Punnett, R. C., 1915. Mimicry in Butterflies. Cambridge University Press, Cambridge/London/New York.
  11. Punnett, R. C., 1917. Eliminating feeblemindedness. J. Hered. 8: 464–465.
  12. Punnett, R. C., 1950. Early days of genetics. Heredity 4: 1–10.
  13. Snow, C. P., 1967. G. H. Hardy. Macmillan, London.
  14. Stern, C., 1943. The Hardy–Weinberg law. Science 97: 137–138.
  15. Sturtevant, A. H., 1965. A History of Genetics. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
  16. Weinberg, Wilhelm. “Über Vererbungsgesetze beim Menschen” (“On the laws of heredity in man”). Molecular and General Genetics MGG 1 (1908): 440–460.
  17. Weinberg, W. “On the demonstration of heredity in man.” Boyer SH, trans. (1963) Papers on Human Genetics. Prentice Hall, Englewood Cliffs, NJ (1908).

Figure: Wikimedia Commons

Elliptic Fourier Descriptors

Elliptic Fourier Descriptors (EFDs) are one really good answer to the question: how can I mathematically describe a closed contour?

That’s nice, but why would we want to do this?

One good reason, with apologies to Randall Munroe, is that after reading xkcd #26 you probably really wanted to take the Fourier transform of your cat. You and me both! Below, we’ll do exactly that, using the method described by Kuhl and Giardina (1982) in their paper ‘Elliptic Fourier features of a closed contour’.


Image: xkcd by Randall Munroe

Of course this isn’t exactly fair: you’ve already taken the Fourier transform of your cat if you’ve ever saved a picture of it as a JPEG or listened to it meow over the phone (listen, I don’t know what you do with your free time, but I’m not here to judge), because Fourier-style transforms are really popular in both image and audio compression.

Step One: Image Pre-Processing

I’m actually a dog person, so I found my creative-commons licensed cat photo through flickr. To make preprocessing easier, I was looking for a monotone cat that was entirely in the frame.

First I extracted the cat from the background using Preview’s great cropping tool. You could automate this segmentation with a script, but for one image it’s easier to do it by hand.

Next I created a binary version of the segmented image with the Matlab one-liner: imwrite(im2bw(imread('cat.png')), 'bw_cat.png'). The image is binary because every pixel is either pure black or pure white. This binary image can be thought of as a matrix where every pixel makes up one matrix entry and is either 0 (black) or 1 (white).


Original image: Emiliyan Dikanarov, via Flickr.

Step Two: Extracting the Boundary

Now that we have our binary image let’s grab our contour. At this step we’ll represent the contour as a list of the (x,y) positions of the pixels that make up the boundary of the cat.

From our preprocessing step we ensured that all non-zero (white) pixels belonged to our cat, so we can just walk through each column of our matrix until we find the first non-zero entry. This pixel must be on the boundary of the cat! From there, we can just walk around the contour by following neighboring boundary pixels until we’ve found every pixel in the contour. At this point we’ll have a list of the (x, y) positions of all the boundary pixels.
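The column scan and contour walk can be sketched in a few lines of Python (the post itself works in Matlab; this naive version of mine assumes a simple convex blob, while real tracing routines like Matlab’s bwtraceboundary are more careful):

```python
def is_boundary(img, x, y):
    """An object pixel is on the boundary if it touches the background (4-adjacency)."""
    h, w = len(img), len(img[0])
    if not img[y][x]:
        return False
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if not (0 <= nx < w and 0 <= ny < h) or not img[ny][nx]:
            return True
    return False

def trace(img):
    h, w = len(img), len(img[0])
    # walk through each column until we hit the first non-zero entry
    start = next((x, y) for x in range(w) for y in range(h) if img[y][x])
    path, seen = [start], {start}
    x, y = start
    while True:
        # follow any unvisited neighboring boundary pixel (8-adjacency)
        for dx, dy in ((1, 0), (1, 1), (0, 1), (-1, 1),
                       (-1, 0), (-1, -1), (0, -1), (1, -1)):
            nx, ny = x + dx, y + dy
            if (0 <= nx < w and 0 <= ny < h and (nx, ny) not in seen
                    and is_boundary(img, nx, ny)):
                path.append((nx, ny))
                seen.add((nx, ny))
                x, y = nx, ny
                break
        else:                     # no unvisited boundary neighbors: done
            return path

# a 3x3 square blob in a 5x5 image: its contour is the 8 perimeter pixels
square = [[1 if 1 <= x <= 3 and 1 <= y <= 3 else 0 for x in range(5)]
          for y in range(5)]
print(len(trace(square)))   # -> 8
```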


But wait!

I claimed that Elliptic Fourier Descriptors were a way to mathematically describe a closed contour, so why isn’t this description, a list of the (x, y) coordinates of all the pixels in the contour, good enough as a mathematical description?

I lied when I motivated EFDs as a great way to take a Fourier transform of your cat.

The real use-case for EFDs is creating features for machine learning algorithms; something like teaching a computer to distinguish photos of cats from non-cats. What we really need is some way of saying how ‘cat-like’ an object in a photo looks, and that’s why this mathematical representation falls short. This representation isn’t position or scale invariant, and it’s really susceptible to noise. This means that if I have a more zoomed-out photo of the same cat in a different position in the frame, or it’s a bit blurry, then my algorithm won’t stand a chance of realizing it’s still a cat.

We’ll see below that all these problems are resolved when we take the Fourier transform.

But first, to make taking the Fourier transform easier, we’ll switch away from a coordinate encoding. Instead, we’ll use a Freeman chain encoding, which describes each point in a shape relative to the previous point.

Step Three: Freeman Chain Encodings

A Freeman chain encoding is a piece-wise linear approximation of a continuous curve. Each segment of the continuous curve is approximated by one of eight line segments, denoted by a0 – a7 in the graph below. The line segments with even index have unit length, and the line segments with odd index have a length of √2. This will be convenient later on!


For our purposes, we’re encoding a contour in a digital image, which is already a discrete approximation of a continuous curve. Below on the left is an example of the type of curve we might want to approximate; the chart on the right shows its Freeman chain encoding.
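Here’s a small Python sketch of the encoding step (my own illustration): given the ordered boundary pixels from the previous step, each move to the next pixel becomes one of the eight codes a0–a7.

```python
# displacement (dx, dy) -> Freeman code, with code 0 pointing in the +x direction
FREEMAN = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
           (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}

def chain_encode(points):
    """Freeman chain code of a closed contour given as ordered, adjacent pixels."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        codes.append(FREEMAN[(x1 - x0, y1 - y0)])
    return codes

# a tiny square contour: right, up, left, down
print(chain_encode([(0, 0), (1, 0), (1, 1), (0, 1)]))   # -> [0, 2, 4, 6]
```

Note that the even codes in the output correspond to unit-length steps and the odd codes to √2-length steps, as described above.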


Detour: Fourier Series

In general, a Fourier series is a way to approximate a periodic function. A periodic function is just a waveform, and a Fourier transform breaks complex waveforms into sums of simple ones.

We can also use Fourier series to approximate non-periodic functions so long as we specify an interval. Below I’ll show an example of how we can approximate x² on [−π, π]; we do this by setting the “period” of x² to be [−π, π]. We pretend that the part of the non-periodic function we want to approximate repeats periodically outside of the interval we’re considering it over.

The really beautiful thing about this approximation is that we can make it arbitrarily good, that is, we can make the error between the approximation and the real function as small as we want.

The Fourier series f(x) is described by the equation below; we see that the Fourier series is parameterized entirely by its constituent aₙ’s and bₙ’s.


This approximation approximates the true function arbitrarily well on any finite interval. If we instead replace the infinite sum with a finite one, from n = 1 to k, this is called the k’th harmonic; the higher the value of k, the better the approximation.

Let’s look at a few harmonics of x² on [−π, π]. The graphs below were generated in WolframAlpha with the command “overlay[{plot FourierSeries[x^2, x, k], plot x^2}]” with k replaced by the number of the harmonic we want to look at (i.e., 1, 2, 3).

Notice that the function of the harmonic is written in blue in the legend of the graph. It’s amazing how fast we converge to a good approximation!


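If you’d rather check the convergence numerically than read it off a plot, here’s a Python sketch of my own using the classic closed form x² = π²/3 + Σ 4(−1)ⁿ/n² cos(nx):

```python
import math

def coeff_a(n, samples=20000):
    """a_n = (1/pi) * integral of x^2 cos(nx) over [-pi, pi], by the trapezoid rule."""
    h = 2 * math.pi / samples
    xs = [-math.pi + i * h for i in range(samples + 1)]
    vals = [x * x * math.cos(n * x) for x in xs]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1])) / math.pi

# the numeric integrals match the closed form a_n = 4*(-1)^n / n^2
print(coeff_a(1), coeff_a(2))   # close to -4 and 1

def partial_sum(x, k):
    """The k'th harmonic approximation of x^2."""
    return math.pi ** 2 / 3 + sum(4 * (-1) ** n / n ** 2 * math.cos(n * x)
                                  for n in range(1, k + 1))

def max_error(k):
    """Worst-case error of the k'th harmonic on a grid over [-pi, pi]."""
    xs = [-math.pi + 2 * math.pi * i / 200 for i in range(201)]
    return max(abs(partial_sum(x, k) - x * x) for x in xs)

print(max_error(1), max_error(10))   # the error shrinks as harmonics are added
```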


Step Four: Fourier Transform and Time Derivative

Okay, back to what we were doing with our cat.

Ask not what you can do for your Fourier series, but what your Fourier series can do for you.

Now that we have the Freeman chain encoding of our cat contour, and are convinced that Fourier series are the way of the future, let’s look at what they can do for us. Below is a really quick explanation of the Kuhl-Giardina 1982 paper.

Our first goal is to find how we can compute those coefficients we talked about in the previous section, aₙ and bₙ.

First we’ll notice that we can separate our chain encoding into its x and y projections. We’ll define xₚ and yₚ, the projections of the first p links in our chain encoding, to be the sums of the differences of all the previous links.


Notice that it’s the exact same story in both the x and y direction, so for ease we’ll just work the x version out below and understand that exactly the same logic will hold for y.

We’ll consider the time derivative of x. When we say “time” here we actually mean the length of the chain, for instance, the “time” contribution of any horizontal or vertical link is 1, and the contribution of any diagonal link is √2. The “time” of the p’th link is just the sum of all the times of the previous links.


The punchline is going to be that we’ll write this derivative in two different ways: one way which we can compute from the chain code, and one way which will involve the coefficients we’re trying to find, aₙ and bₙ. Then we can solve for aₙ and bₙ.

Time derivative: First Form!


Here n is which harmonic we’re on, T is the period (the interval we’re looking at the function on), and αₙ and βₙ depend on the slope of each line segment.

This is very good news, because we can compute everything in this form of the time derivative! Because we’re using a Freeman chain encoding (just piecewise linear segments), the “slope” Δxₚ/Δtₚ of any line segment is either 0, ±1, or ±1/√2.

Time derivative: Second Form!


This is also good for us, because this form includes the coefficients we’re trying to solve for. Solving for aₙ and bₙ we get:


We know how to compute aₙ and bₙ! T is the period (the interval), n is the number of the harmonic that we’re looking at, p is the index of the chain link, K is the total number of chain links, Δxₚ/Δtₚ is the slope of each link, and tₚ and tₚ₋₁ are the lengths of the chain at the p’th and (p−1)’th links.

The exact same thing happens in the y direction, so we call the aₙ value in the y direction cₙ and the bₙ value in the y direction dₙ.
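Here’s a Python sketch of mine of the coefficient formula (the appendix below uses Matlab instead); cₙ and dₙ would use the y steps in exactly the same way. As a sanity check, for a unit square traversed right-up-left-down, the x projection is a trapezoid wave with half-wave symmetry, so the even harmonics vanish and a₁ works out to −4/π².

```python
import math

DX = [1, 1, 0, -1, -1, -1, 0, 1]   # x step for Freeman codes 0-7
DY = [0, 1, 1, 1, 0, -1, -1, -1]   # y step (used the same way for c_n, d_n)

def coeff_an_bn(chain, n):
    """a_n and b_n of the x projection of a Freeman chain code (Kuhl-Giardina)."""
    dts = [math.sqrt(2) if c % 2 else 1.0 for c in chain]   # link "times"
    T = sum(dts)                                            # total period
    w = 2 * math.pi * n / T
    a = b = 0.0
    t_prev = 0.0
    for c, dt in zip(chain, dts):
        t = t_prev + dt
        slope = DX[c] / dt                                  # delta x_p / delta t_p
        a += slope * (math.cos(w * t) - math.cos(w * t_prev))
        b += slope * (math.sin(w * t) - math.sin(w * t_prev))
        t_prev = t
    scale = T / (2 * n ** 2 * math.pi ** 2)
    return scale * a, scale * b

square = [0, 2, 4, 6]          # right, up, left, down
a1, b1 = coeff_an_bn(square, 1)
a2, b2 = coeff_an_bn(square, 2)
print(a1, a2)                  # a1 ~ -4/pi^2 ~ -0.405, a2 ~ 0
```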

Step Five: Elliptic Equation

Phew! We figured out how to find all our coefficients. How’s this going to help us with our cat?

We’ll use these coefficients to modify an ellipse; after all, we basically want a modified ellipse to approximate our closed contour.

Below is the equation that gives the n’th Elliptic Fourier Descriptor.


Notice that this looks a lot like the equation for an ellipse; the xy term comes in so that we can rotate the ellipse.

Another good thing about these coefficients is that, after the normalization step described in the paper, they’re the same regardless of how the contour is rotated, what size it is, or where in the frame of the image it is.

Now we’ve got the cat in the bag.

Step Six: Results!

Using the description we found above, we’ll approximate the cat contour that we found. The true contour is given in blue, and the approximation is given in red.

Notice that with 10 harmonics we already have a pretty good approximation, but with 70 harmonics we can fit much finer features.

For real-world applications, like creating features for machine learning, fitting fine features might not be desirable, because you might be over-fitting to random noise instead of your data.


10 harmonics on the left, 20 in the middle, and 70 on the right.

I also decided this was a great opportunity to find the EFD of Randall Munroe.
I grabbed another creative commons image, and found the 70th harmonic of Randall Munroe.


Appendix: Matlab code

Here’s the code you need to replicate everything we did.

Segment the image
imwrite(im2bw(imread('cat.png')), 'bw_cat.png')
Generate the chain code
chain = mk_chain('bw_cat.png');
[cc] = chaincode(chain);

You’ll need to grab mk_chain.m off of this website first.
Really all it does is make a call to the Matlab function bwtraceboundary
Also be warned that this code often failed for me; if this happens for you, just replace the <dir> on the last line (line 36) with <'SE'>.

Generate the EFD
chain = mk_chain('bw_cat.png');
[cc] = chaincode(chain);
coeffs = calc_harmonic_coefficients(transpose([cc.code]), 70)

First you’ll need to grab all the m files off of this website.

You can replace 70 with whatever number of harmonic you want to look at. Coeffs will be a matrix with 4 numbers; these are the coefficients a₇₀, b₇₀, c₇₀, d₇₀.

See what your chain code looks like
chain = mk_chain('bw_cat.png');
[cc] = chaincode(chain);

See what your EFD looks like compared to your chain code
chain = mk_chain('bw_cat.png');
[cc] = chaincode(chain);
plot_fourier_approx(transpose([cc.code]), 70, 50, 0, ‘r’);

Again, you can replace 70 with however many harmonics you’d like to see.

Convergence of a Divergent Series and other Tests

In class we have been toying with the idea of classifying divergent infinite series, such as the sum Σ k (k = 1,∞) = 1 + 2 + 3 + …, which, as we add it up, continues on to infinity. We also messed around with some series notations and came to the conclusion that the sum adds up to -1/12. Now, I have no intention to claim, nor prove, that it equals -1/12. In fact, I would like to do the opposite; I would like to show that it in no way converges. If an infinite series converges, that means it sums up to a real number s: Σ aₖ = s. Likewise, an infinite series diverges if the sum of the series equals ±∞ or it does not add to any value. There are a few methods I will use to prove that this series diverges, and these methods can also be used to determine whether any infinite series converges or diverges.

The first test I want to look at is the Root Test. In this test, we take the sequence aₖ, raise it to the power 1/k, and take the limit of that as k→∞. If this limit is greater than 1, the series diverges; if the limit is less than 1, it converges. For the sum of all positive integers, that gives us the limit as k→∞ of k^(1/k). As k→∞, k^(1/k) → 1. Unfortunately the limit equals 1, which means the test is inconclusive; there is a chance that this does indeed sum up to -1/12. Let’s leave the score at 0-0.

I will now put the series through the Ratio Test. It is called the Ratio Test because we take the ratio of the (k+1)st term divided by the k-th term and take the limit of that as k→∞: the limit of aₖ₊₁/aₖ. Similar to the Root Test, if the limit is greater than 1 the series diverges, and if the limit is less than 1 it converges. For the dubious series under examination, that ratio is (k+1)/k, and as k→∞ the limit equals 1. Once again, this test is inconclusive. The score is still 0-0.
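You can watch both tests come up empty numerically (a quick Python sketch of my own):

```python
# Both the root-test and ratio-test quantities for a_k = k creep toward 1,
# which is exactly the inconclusive case for both tests.
for k in (10, 1000, 100000):
    root = k ** (1 / k)        # Root Test: k-th root of a_k
    ratio = (k + 1) / k        # Ratio Test: a_(k+1) / a_k
    print(k, root, ratio)
```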

Now it’s time to get serious; so far every test I’ve done has been disappointingly empty of an answer. Next up is the Comparison Test: if bₖ ≥ aₖ for every k and Σ bₖ converges, then Σ aₖ must also converge; and if bₖ ≤ aₖ and Σ bₖ diverges, then Σ aₖ diverges as well. My claim is that Σ k (k = 1,∞) diverges, and now I will compare it with the series Σ 1 (k = 1,n) = n. We can see that 1+1+1+… grows much more slowly than 1+2+3+…, thus we can use it as a comparison. By another test, the Term Test, if a series converges then lim aₙ = 0; the series must diverge, then, if the limit doesn’t equal 0. The limit of 1 as n→∞ is 1 ≠ 0, thus Σ 1 (k = 1,∞) diverges. Since Σ 1 (k = 1,∞) < Σ k (k = 1,∞), the latter series must also diverge. Now the score is 1 for divergence, and 0 for -1/12.

While it can now be seen that Σ k (k = 1,∞) does indeed diverge, the Comparison Test relies on knowing that a similar infinite series either converges or diverges. In this case, I have to know that Σ 1 (k = 1,∞) diverges before I can compare it to the original series. I used the Term Test to show that Σ 1 (k = 1,∞) diverged, but I can also use it to show directly that Σ k (k = 1,∞) diverges. The limit as k→∞ of k is ∞, and by the Term Test the series diverges. The problem with the Term Test is that the limit of aₖ can be equal to zero while the series still diverges. Therefore, this test is only conclusive when the limit does not equal 0; it can show divergence, but it can never confirm that a series converges.

By both the Term and Comparison Tests, I was able to show that Σ k (k = 1,∞) diverges and is not equal to -1/12. In class, though, we determined that the infinite series itself doesn’t sum to a negative number; there is no possible way that adding large positive whole numbers together would result in a negative rational number. Rather, -1/12 represents a value or categorization of the infinite sum Σ k (k = 1,∞). How did this number come about? The idea is definitely not “fringe mathematics” and has some excellent arguments behind it. The idea stems from Σ (-1)^k (k = 0,∞) = 1 – 1 + 1 – 1 + … The mathematician Srinivasa Ramanujan gave this sum a “value” of ½, since it seems to jump between one and zero equally, with ½ as the average.

The second sum, Σ k(-1)^(k-1) (k = 1,∞) = 1 – 2 + 3 – 4 + …, gets the value ¼, since this sum added to a copy of itself shifted by one term equals Σ (-1)^k (k = 0,∞): that is, 1 + (–2 + 1) + (3 – 2) + (–4 + 3) + … = 1 – 1 + 1 – 1 + … = ½, so twice the sum is ½ and the sum is ¼. Finally, Σ k (k = 1,∞) – 4(Σ k (k = 1,∞)) = 1 + (2 – 4) + 3 + (4 – 8) + … = 1 – 2 + 3 – 4 + … = Σ k(-1)^(k-1) (k = 1,∞), or rather -3(Σ k (k = 1,∞)) = ¼, so Σ k (k = 1,∞) = -1/12.

Ramanujan discovering the value -1/12 for the infinite series aₖ = k. Image: Srinivasa Ramanujan, via Wikimedia Commons.

The specific stipulation given is that the series must go to its limit. The partial summations of any of these series will produce a number unlike the total summation. For the majority of mathematics, to say Σ k (k = 1,∞) = ∞ makes more sense and requires significantly less head-scratching than -1/12.

There is one more test for convergence that I did not talk about, as the infinite series I was examining did not apply, and that is the Integral Test. For Σ aₖ, take the function f(k) = aₖ. The reason why I could not apply it is because it only works with positive, non-increasing functions defined on [1,∞). If the integral of f(x) is a real number, then the series converges, whereas if that same integral diverges, then the series diverges. The first case I will look at is Σ 1/k (k = 1,∞), where f(x) = 1/x is a positive, non-increasing function. The integral of 1/x is log(x), evaluated from 1 to ∞. This integral turns out to diverge, and therefore the series also diverges. The second case is similar: Σ 1/k² (k = 1,∞), where f(x) = x⁻². The integral of 1/x² is -(1/x), evaluated over the same interval. The difference of the values at the two bounds is 0 – (-1) = 0 + 1 = 1. Since the improper integral converges, the series also converges.
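The partial sums tell the same story as the Integral Test (a quick Python sketch of my own):

```python
import math

# Partial sums: sum of 1/k creeps upward like log(N) without bound,
# while sum of 1/k^2 settles toward pi^2/6 ~ 1.64493.
N = 100000
harmonic = sum(1 / k for k in range(1, N + 1))        # partial sum of 1/k
p_series = sum(1 / k ** 2 for k in range(1, N + 1))   # partial sum of 1/k^2

print(harmonic)   # grows like log(N) + 0.577..., already past 12 here
print(p_series)   # very close to pi^2/6
```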

These tests help determine whether a series converges or not. I used them to prove with basic mathematics that Σ k (k = 1,∞) diverges, rather than converging on a negative rational number. While the values given to divergent infinite series can provide an idea of how they relate to each other, they require a fair amount of assumption and a lot of counterintuitive work to calculate. It is far easier and more practical to state that Σ k (k = 1,∞) diverges.

Source: Joseph L. Taylor, Foundations of Analysis.

Proofs Perceived

I hope to impart some extremely awesome facts on the intrinsic elements of being a mathematician and knowing mathematics by explaining, broadly, the fundamentals of proofs. I will do this by first giving you the basics and then moving to more interesting theorems/proofs. For simplicity’s sake, all information contained in this blog post is from the book “The Nuts and Bolts of Proofs” by Antonella Cupillari, and I have included the page numbers at the end of the sentences.

Let’s start with if/then statements. If/then statements are like the statements of a scientist who states his hypothesis when he first starts talking, only to end on some conclusion that was dependent upon his hypothesis (11). The tricky part of proving if/then statements is that some of the information might not be explicitly stated (12). For instance, sometimes people will ask you how long it takes to get a college degree. The length of time fluctuates greatly because of different requirements and different people, but it is implied that the person is looking for the average length of time. This implied part can trip you up when trying to do these proofs because you might need to use information that is implied. For example, let’s say you were given an assignment to prove that two is prime. The original statement, if two satisfies the condition of being prime then it is prime, will very likely not state that you can use all the properties of real numbers. You could also accidentally see things being implied that aren’t being implied. If/then statements are a basic statement in mathematics and fundamental to many theorems and proofs.

All proofs fall into one of two categories, or a mixture of these categories (12-29). The categories are direct proof and indirect proof. Direct proof is like performing a calculation or computing a problem in most math classes. You just carry out a series of logically valid steps to come to a final answer. You are given a beginning claim that you want to show leads to some particular conclusion. Essentially, you want to move from point “A” to point “B”, and in order to do so in a direct proof you carry out a series of logical steps. Direct proofs are used when there is enough information in the hypothesis to come to the conclusion (12). Indirect proofs are a bit more complicated than direct proofs. Indirect proofs involve altering the original statement in order to prove a different statement that is logically equivalent to it (22). A specific example of indirect proof is proof by contradiction. Proof by contradiction is assuming the statement to be false; if this assumption leads to a contradiction, then the statement must be true.

Negation of quantifiers is also fundamental to proofs. I will identify each quantifier, followed by an example and an example of its negation. All of this information on quantifiers can be found on page twenty-five of “The Nuts and Bolts of Proofs” by Antonella Cupillari, as previously mentioned. There are six quantifier negations that you just have to know. The first is at least one, as in there is at least one mathematician in our class (Hint: Lamb). The negation of this statement would be that none of our classmates are mathematicians. The second one you have to know is some: today some new myths will be made. The negation: no new myths will be made today. The third is all: if you assert that all birds in America drink water, the negation is that there is at least one object in the collection that does not have that property. Specific to the example given, the negation would be that there is at least one bird in America that doesn’t drink water.

The fourth quantifier that you need to know is none: none of the boys in Salt Lake City like to eat cake. The negation: there is at least one boy in Salt Lake City who likes to eat cake. Fifth is there is no, as in there is no unicorn; the negation is that there is at least one unicorn. The sixth is every object in a collection has a certain property: every book at the University of Utah is worth reading. The negation: there is at least one book at the University that is not worth reading.

Moving on to the theorems and special proofs. By special proofs I mean specific versions of direct and indirect proofs. Theorems are general truths that are proven to be true by a series of logical steps, hence their connection to proofs (50). Existence proofs involve either an algorithm as a proof or an argument that proves the existence of the object in question, and are often in the category of theorem (58). Just in case anyone doesn’t know what an algorithm is, it is a series of steps that are carried out sequentially. Also, an argument is a claim that is usually supported by reason and evidence, but as religion has shown, neither reason nor evidence is needed to argue; just some claim that causes controversy is necessary.

Another special type of proof is the uniqueness proof, which shows that the object whose existence was established is the only one of its kind (61). One approach is to assume that the object isn’t unique and then show that any other object with the same properties is in fact the same object; by constructing a valid argument this way you are able to prove uniqueness (61).

If and only if proofs involve proving two separate if/then statements in order to prove the if and only if statement (35). An example helps here because this is difficult to explain in words. "I will check out more books if and only if I can read the books I have" requires proving both "if I check out more books then I can read the books I have" and "if I can read the books I have then I will check out more books." Once you have done both, you have proven the full statement. This makes more sense when you notice that checking out more books and being able to read the books I already have must agree in truth value to get a true statement: if exactly one is false then the statement is false, but if both are false then the statement is true (35). In this relationship, checking out books and reading the books I already have must follow each other in truth value in order to get a true statement (35).
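The truth-value pattern described above can be laid out as a small truth table. A sketch in Python:

```python
# "P if and only if Q" is true exactly when P and Q agree in truth value.
def iff(p, q):
    # (Q -> P) and (P -> Q), written with or/not
    return (p or not q) and (q or not p)

for p in (True, False):
    for q in (True, False):
        print(p, q, iff(p, q))
```

Running this shows the biconditional is true for (True, True) and (False, False), and false whenever the two halves disagree.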

Proof by induction is another interesting tool in the world of mathematics, and it is of great utility. All information on mathematical induction can be found on pages 48-58. The first step in a proof by induction is to prove that the hypothesis is true for the smallest base case. Then you assume it is true for some arbitrary value. After that, you deductively show that it is true for that arbitrary value plus one. If you claimed that for any natural number there is always a greater number, the first thing you would do is show that it is true for the smallest case; here the smallest case is one, with two being greater. Then you would assume that an arbitrary number, usually represented by the variable "n", has the same property. Finally you would show that your hypothesis is true for the "n plus one" case. Proof by induction is awesome because it gives you the ability to conquer the infinite with a logical domino effect.
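A proof by induction can't literally be run on a computer, but its two steps can be mirrored in code. Here is a sketch in Python, using the classic sum formula 1 + 2 + ... + n = n(n+1)/2 as a stand-in claim (this example is mine, not the book's):

```python
def claim(n):
    # The hypothesis: the sum 1 + 2 + ... + n equals n*(n+1)/2.
    return sum(range(1, n + 1)) == n * (n + 1) // 2

# Base case: the claim holds for the smallest case, n = 1.
assert claim(1)

# Inductive step, spot-checked for many n: whenever the claim holds
# for n, it also holds for n + 1 -- the "domino effect" above.
for n in range(1, 200):
    assert claim(n) and claim(n + 1)
```

The code only checks finitely many dominoes; the actual proof shows the step for every n at once.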

You now know the basics of proofs—which are if/then statements, negations, direct proofs, indirect proofs, and quantifiers. You also know some more interesting aspects of proofs such as proof by induction, if and only if proofs, existence proofs, and uniqueness proofs. How is this information important to your life?

Using Math to Make Change: The Hull-House Statistics

In the late 1800s, thousands of immigrants came to the United States seeking "The American Dream". Factories boomed as immigrants flooded into cities such as Chicago and New York. But with so many desperate incoming workers, factory owners realized that they could demand long hours for little pay and the immigrants would comply because they had no other option. They couldn't speak English, they could easily be replaced, and they didn't know of any other life, so they obeyed. This was the era that Jane Addams was born into.

Jane Addams was born in September of 1860 to a very affluent family in Northern Illinois. Jane grew up wanting to make a difference. She read stories by Dickens and watched her mother care for the poor of her neighborhood. In this environment, she decided that she wanted to become a doctor so that she could work among the poor. Unlike many women of her time, Jane went on to get a higher education at the Rockford Female Seminary and the Women’s Medical College of Philadelphia. She was unable to complete her degree due to health problems, but decided that there were other things she could do to help the poor. In 1889, she created Hull-House, a settlement house in Chicago, Illinois [1].

Now, you may be wondering what all of this has to do with math. Well, in that day and age, there was a prevalent idea that those who were poor lived in such conditions because of an innate inability to do better. People believed that you were poor because you were "stupid" and nothing could be done about it. As Jane lived at Hull-House among the poor of Chicago, she quickly realized that this idea was false. She grew to understand and love these people and saw that they could rise to higher positions in life if they were just given the opportunity. But to do so would require institutional change, and she, as a woman, could not even vote. She understood that for this change to come about, she would need the support of the government and policies put in place to protect these immigrants. And to gain that support, she would need quantitative data.

In 1895, she co-authored The Hull-House Maps and Papers. This document provided a statistical analysis of the blocks surrounding Hull-House. To gather this data, the women of Hull-House went door to door and surveyed the families who lived inside: how many families lived in each apartment, how many were in each family, how long they had lived in the United States, whether they spoke English, where they worked, what their wage was, etc. [2]. This data was then summarized in a series of color-coded maps. These maps divided the city blocks according to the wage earned by each family and the family's nationality. Each apartment was color-coded so that the data could be easily seen and understood.

Wages Earned Map No. 1. Image: Thomas Y. Crowell & Co via The Newberry Library.


After collecting this data and compiling it into The Hull-House Maps and Papers, the women of Hull-House presented the information they had gathered to Congress. The data would eventually lead to new laws instituting maximum working hours, a minimum wage, and child-labor protections. This was one of the first times quantitative data was used to prove theories and change the political structure of an institution in this way. It was "a new type of Science" [3]. Through this statistical approach, these women were able to redefine what the general populace believed was the cause of poverty.

I think there are many times we get so caught up in mathematical equations and proofs that we distance ourselves from the social change they can bring about. This quantitative methodology is still used to make a difference: the research of a pharmacy student working to discover a new medicine and cure a disease; the survey a teacher gives students to figure out what more can be done to better their educational experience; the analysis of city-wide opinion on how to develop an area so that the community is bettered. Each of these examples uses statistical data to bring about change, and each of us has the opportunity to do the same. We just need to find a problem, grasp at the answers needed, and go for it.





Method of Exhaustion

When asked to find the area of any given triangle, anyone can solve it with the known formula (1/2)(base)(height). Anyone can do the same with a square, because the area of a square is (base)(height). This is basic geometry that students learn as part of the curriculum. But if you were asked to find the area of a curved object, could you do it without using calculus? Could you find the area without using any formula at all? How can the area be found for an object with curves, like a circle? Greek mathematicians used a technique called the method of exhaustion, a precursor to calculus. This method is no longer commonly used to solve problems, but it is very similar to Riemann integration. The idea is to approximate an object repeatedly in order to end up with an area that would otherwise be difficult to find.

The method of exhaustion was developed by Greek mathematicians in order to find the area of a shape. At the time, there was no known formula for the area of a circle; the method helped lead to the formula we use today. What was needed was an approximation for pi, since the Greeks did not know the constant they needed to multiply by. The way the Greeks used the method was to inscribe a polygon with a known area inside the circle; picture a square inside a circle. The next step is to add a side, creating a pentagon inscribed in the circle. We can solve for the area of the pentagon, and then we know some of the circle's area. But the region outside the pentagon and inside the circle is still unaccounted for, so we repeat the process, adding another side to create a hexagon inside the circle. The Greeks would repeat this over and over until the n-gon was close enough to the circle. If the process is correctly constructed, the difference in area between the inscribed n-gon and the circle becomes very small, resulting in an ever more accurate estimate of the circle's area. The possible values for the area of the circle are systematically exhausted by the polygons.
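The shrinking gap between the inscribed n-gon and the circle can be seen numerically. A sketch in Python (using modern trigonometry, which the Greeks of course did not have): the area of a regular n-gon inscribed in a unit circle is (n/2)·sin(2π/n), which approaches π as n grows.

```python
import math

def inscribed_polygon_area(n):
    # Area of a regular n-gon inscribed in a circle of radius 1.
    return (n / 2) * math.sin(2 * math.pi / n)

for n in (4, 8, 16, 64, 256, 1024):
    print(n, inscribed_polygon_area(n))
```

Each doubling of n closes more of the gap to π (about 3.14159), which is the exhaustion at work.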

The method of exhaustion using polygons was used to solve the area of a circle. Image: Leszek Krupinski, via Wikimedia Commons.

It is important to remember that the Greeks did not have the same tools we have today. How were they able to construct such accurate polygons inscribed within the circles? The answer is that they used a straight edge and a compass, which only works for specific polygons. "In 1837, mathematician Carl Gauss was able to prove that a n-polygon could be constructed with a compass and straight edge if and only if n is the product of a power of 2 and any number of the Fermat primes m = 2^(2^k) + 1," wrote Chelsea E. DeSouza in The Greek Method of Exhaustion: Leading the Way to Modern Integration. The first polygons that could be constructed are 3 = 2+1, 4 = 2^2, 5 = 2^2+1, 6 = 2(2+1), 8 = 2^3, 10 = 2(2^2+1), 12 = 2^2(3), 16 = 2^4, 17 = 2^4+1. Not all polygons could be constructed with a compass and straight edge.

Though it is a long process, it is possible to find the area of a circle with inscribed polygons. Can the same be done with other curved shapes, like parabolas? A parabolic segment can be filled with inscribed triangles to evaluate its area. In the third century BCE, Archimedes came up with 24 propositions about parabolas, culminating in a proof that the region enclosed by a parabola and a line is 4/3 the area of the triangle inscribed in that region. Archimedes split the region inside the parabola into triangles to solve for the area.
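Archimedes' 4/3 result comes from a geometric series: each round of inscribed triangles adds 1/4 of the previous round's area, and 1 + 1/4 + 1/16 + ... = 4/3. A sketch in Python:

```python
def parabola_segment_area(triangle_area, rounds=40):
    # Sum the areas added by successive rounds of inscribed triangles.
    total, piece = 0.0, float(triangle_area)
    for _ in range(rounds):
        total += piece
        piece /= 4  # each round contributes a quarter of the last
    return total

print(parabola_segment_area(1.0))  # approaches 4/3 of the first triangle
```

With the first triangle's area set to 1, the total converges to 4/3, matching the proposition.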

Solving the area of a parabola without integration would look like this. Image: Public domain, via Wikimedia Commons.


Using integration to find the area under a curve can be difficult at times, but when you see what the Greeks had to do to find an area, integration seems like the quicker method. The method of exhaustion is a long process, though it is easy to learn, because it only uses the area formulas for polygons.


Secrecy in Plain Sight

Security is a vastly important facet of our day-to-day lives. It is so ubiquitous that few people recognize they use it every day. However, if you ever visit a webpage that has "https" in the URL, you are using secure technology. Most early security protocols require that two parties share a secret. Unfortunately, if someone is listening to the key exchange, they can easily decrypt all of the messages sent between the two parties. You may find yourself asking, "But isn't there some better way to exchange secrets?" You'd be right!

The method is called Diffie-Hellman key exchange, and it relies heavily on modular arithmetic.  The amazing thing about this protocol is that the two parties, Alice and Bob (in keeping with the cryptographic tradition of keeping names simple) can even publish their communications in the newspaper.  Anyone can be listening to these two exchange their keys.

Here’s how it works:

First, Alice and Bob pick two numbers, p and g. p is a prime number, and g is a primitive root mod p. A primitive root mod p has a very special property, namely that every number that is coprime with p is congruent to some power of g [1]. For example, if we pick p = 5, 3 is a primitive root mod p because:
1,2,3,4 are relatively prime to 5.
3^0 ≡ 1 mod 5
3^1 ≡ 3 mod 5
3^2 ≡ 4 mod 5
3^3 ≡ 2 mod 5

As you can see, each number that is relatively prime to 5 is represented by some power of 3.  Therefore, 3 is a suitable primitive root mod 5.
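For a prime p, this check can be automated: g is a primitive root mod p exactly when the powers of g produce every residue from 1 to p-1. A minimal sketch in Python:

```python
def is_primitive_root(g, p):
    # For prime p, every number 1..p-1 is coprime to p, so g is a
    # primitive root when its powers cover all of those residues.
    return {pow(g, k, p) for k in range(p - 1)} == set(range(1, p))

assert is_primitive_root(3, 5)      # the worked example above
assert not is_primitive_root(4, 5)  # powers of 4 mod 5 only hit {1, 4}
```

This brute-force test is fine for toy primes; real implementations use the factorization of p-1 instead.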

For this example, Alice and Bob will decide on p=17 and g = 7.  They can pick these numbers from a well-known list.  What’s more, they can even display these numbers in public.  These will form the foundation of their key.  Next, Alice and Bob both pick their own numbers and keep them secret.  We’ll call Alice’s secret number a, and Bob’s number b. (Original, right?)

Next, Alice computes g^a mod p, and Bob computes g^b mod p.  We'll call these c and d, respectively.  Now, Alice and Bob exchange their values.  Again, this can take place in the open.
Now, Alice has d, and Bob has c.  Next, Alice computes d^a mod p, and Bob computes c^b mod p.  This has the net effect of giving both Alice and Bob the same number: they are now both in possession of g^(ab) mod p.  What's more, any attacker could have watched the entire exchange and wouldn't have gotten anything out of it.


Secrecy in Plain Sight. Image: A.J. van Hinck, via Wikimedia Commons.

Back to the example:

Alice chooses 3 as her secret number.
Bob chooses 5 as his.
Alice's transport number is 7^3 mod 17 ≡ 3
Bob's transport number is 7^5 mod 17 ≡ 11
Alice gets 11, and performs 11^3 mod 17 ≡ 5
Bob gets 3, and performs 3^5 mod 17 ≡ 5

Alice and Bob now have the same number, which they can use for regular cryptography.  Obviously, in a real situation, Alice and Bob would have chosen much larger numbers, but this suffices for an example.
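The whole exchange above can be sketched in a few lines of Python with the same toy numbers, using the built-in three-argument `pow` for modular exponentiation:

```python
p, g = 17, 7              # public parameters: prime and primitive root
a, b = 3, 5               # Alice's and Bob's secret numbers

c = pow(g, a, p)          # Alice's transport number: 7^3 mod 17 = 3
d = pow(g, b, p)          # Bob's transport number:   7^5 mod 17 = 11

alice_key = pow(d, a, p)  # 11^3 mod 17
bob_key = pow(c, b, p)    # 3^5 mod 17

assert alice_key == bob_key == 5  # both arrive at g^(a*b) mod p
```

Only c and d ever travel over the wire; a and b never leave each party's machine.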

This is the foundation of cryptography: relying on operations that are easy to perform but nearly impossible to reverse.  In this case, modular exponentiation can be performed on a computer in a reasonable amount of time.  It's not the fastest operation, but it's far faster than its reverse.  (Incidentally, that reverse problem is called the discrete logarithm, and there is currently no known way to solve it in a reasonable amount of time.)

You may wonder, "I really want to steal nuclear secrets!  Is there some way that I can still eavesdrop on people using Diffie-Hellman?"  There is: Diffie-Hellman is vulnerable to what is known as a "man-in-the-middle" attack.  This is difficult, as it requires being present at the time the keys are exchanged, but the attack itself is simple.  Some attacker Eve (short for eavesdropper) intercepts Alice and Bob's messages.  When Alice sends Bob her key, Eve steals it and substitutes her own.  Bob then responds, and Eve steals his response and gives her own key to Alice.  Now there are two shared keys, and Eve has both of them.  There is a limitation, however: Eve needs to stay present for the entire conversation, or Alice and Bob will immediately know something is wrong.  This is because the key Alice has is not compatible with the key Bob has; the moment Eve steps out of the middle, all of their communication is reduced to gibberish, and they will know they are compromised.  As long as Eve keeps relaying, though, she can hijack their conversation, leaving both none the wiser.

We may ask, why not use RSA?  Well, we do!  If we combine RSA with Diffie-Hellman, we gain protection against both eavesdroppers and hijackers: RSA can authenticate the parties, defeating the man-in-the-middle, while Diffie-Hellman establishes the shared key.  Diffie-Hellman is also faster than RSA, and the key it produces can protect messages of any length, whereas the messages RSA encrypts directly are limited in size, lest it become easy to crack.  In addition, combining the two approaches gives us the ultimate goal of cryptography: perfect forward secrecy!  If the secret numbers used by Alice and Bob are discarded, nobody can read the messages exchanged between the two, ever.  That's right, 30 years down the line, when someone digs Alice's old machine out of some attic in Nebraska, they won't be able to read the messages she shared with Bob.  This is the real advantage of Diffie-Hellman, and the reason that it will likely remain in use for a very long time.