So, somewhat randomly, I wanted to investigate the "diversity" of Purdue. I came across this website: http://www.purdue.edu/datadigest/pages/students/stu_int_country.htm , which lists the number of international students from each country. Not surprisingly, China (24.2% of all international students), India (22.9%), and the "Republic of Korea" (13.9%) lead the way. [If you're wondering, Taiwan (4.4%) and Malaysia (3.6%) are the next two best-represented countries at Purdue. This means only 31% of the international student population at Purdue is NOT from the aforementioned five countries (China, India, Korea, Taiwan, and Malaysia).]
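As a quick check of that 31% figure, the five shares quoted above can simply be summed. This is a tiny sketch; the variable names are my own.

```python
# Shares of Purdue's international students, in percent, as quoted above.
top_five_shares = {
    "China": 24.2,
    "India": 22.9,
    "Republic of Korea": 13.9,
    "Taiwan": 4.4,
    "Malaysia": 3.6,
}

top_five_total = sum(top_five_shares.values())
remaining_share = 100.0 - top_five_total  # everyone not from the top five

print(f"Top five: {top_five_total:.1f}%, everyone else: {remaining_share:.1f}%")
```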

So (naturally?...maybe for an engineer like myself) I wondered whether any correlation exists between a country's GDP per capita and its Purdue student enrollment numbers. I began by plotting Purdue enrollment versus each country's GDP (PPP) per capita, as per http://en.wikipedia.org/wiki/List_of_countries_by_GDP_(PPP)_per_capita . The result was the following graph (PDF version here):



In the plot, Yellow = China, Orange = India, Green = Korea, Red = Taiwan, and Purple = Malaysia. Any country whose student enrollment was 1 was removed from the data set, since this is not enough data to make a good extrapolation (i.e. only countries with 2 or more Purdue students were included in this analysis).


An attempted linear trend-line fit to the data shows no strong correlation. In fact, the slope is actually negative, contrary to what one might hypothesize. Seems like a slam-dunk case against the thesis that more money = better chance for higher education (at least at Purdue), right? Right?

...Not so fast, Buster. It is obvious from the above plot that China and India are clear outliers - there is a huge number of students from these countries, yet both are also relatively poor (each has a GDP per capita below $7,000 USD). What's the dealio, Coolio?


Well, it also "just so happens" that China and India are the two MOST POPULOUS countries in the world. So there is a larger "student population" to begin with. What's more impressive: a country with 100 people, wherein 20 decide to enroll at Purdue, or a country with 20 people, wherein all 20 decide to enroll at Purdue (hypothetically speaking)?

Consequently, I performed a "normalizing" procedure, based on the country population data from: http://en.wikipedia.org/wiki/List_of_countries_by_population

Essentially, I normalized the Purdue student enrollment figures to the population of China. For example, if a country has 50% of the population of China, then its raw Purdue student enrollment number is doubled (to equalize China's "advantage" of a greater population).

In effect, this produces a "probability of person going to Purdue" that is independent of country population. 
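Here is a minimal sketch of that normalization. The numbers are made-up placeholders, NOT the real enrollment or population data, and the function name `normalize_enrollment` is just something I picked for illustration.

```python
def normalize_enrollment(enrollment, population, reference_population):
    """Scale a raw enrollment count to what it would be if the country
    had the reference country's population (China, in this post)."""
    return enrollment * reference_population / population


# Placeholder values for illustration only (NOT the real figures).
reference_population = 1000  # stands in for China's population
reference_enrollment = 40    # stands in for China's raw enrollment
country_population = 500     # a country with 50% of the reference population
country_enrollment = 30      # that country's raw enrollment

normalized = normalize_enrollment(
    country_enrollment, country_population, reference_population
)
# Ratio of the normalized figure to the reference country's raw figure.
relative_likelihood = normalized / reference_enrollment

print(normalized)           # the raw count of 30, doubled
print(relative_likelihood)  # how many times more likely than the reference
```

Dividing a normalized figure by China's raw figure gives a "times more likely than a Chinese citizen" style of ratio, which is one way numbers like that could be read off the normalized data.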


So what happens when we plot normalized Purdue student enrollment vs. country GDP? Well, it's your lucky day (PDF version here):
 
 
Again, in the plot, Yellow = China, Orange = India, Green = Korea, Red = Taiwan, and Purple = Malaysia. In addition, Cyan = Norway (rich country, but low student population), Pink = Singapore, and Gray = Kuwait.


A power-law trend line fit to the data shows a much better correlation, which can be seen immediately: the data points fall roughly along a straight line. Note, however, that the axes are logarithmic, and a power law plots as a straight line on log-log axes.
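For the curious, a power-law fit of the form y = a * x^b boils down to an ordinary straight-line fit on the logarithms, since log(y) = log(a) + b * log(x). The sketch below is my own illustration of that idea, not the spreadsheet routine used for the plot; the function name `fit_power_law` and the sample points are hypothetical (the points are generated from an exact power law, not taken from the Purdue data).

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b, done as a linear fit in log-log space.

    All x and y values must be strictly positive.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((lx - mean_x) ** 2 for lx in log_x)
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
    b = sxy / sxx                        # slope on the log-log plot
    a = math.exp(mean_y - b * mean_x)    # intercept, converted back from logs
    return a, b


# Synthetic points lying exactly on y = 2 * x**1.5 (illustration only).
sample_x = [1000.0, 5000.0, 20000.0, 50000.0]
sample_y = [2.0 * x ** 1.5 for x in sample_x]

a, b = fit_power_law(sample_x, sample_y)
print(f"a = {a:.3f}, b = {b:.3f}")
```

The exponent b is the slope of the straight line seen on the log-log plot.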

Many have probably heard the maxim "correlation does not imply causation," and this is true. I have shown that X (country GDP per capita) correlates fairly well with Y ("chance of student going to Purdue, country population taken into account"). However, this can mean that X causes Y, Y causes X, or that some outside influence, Z, causes BOTH X and Y. The correlation alone cannot tell these apart, but general intuition argues that a higher GDP should lead to a correspondingly higher Purdue student enrollment.

Questions? Comments? Concerns?

P.S. From the last plot, we can extrapolate the following:
 
1) Per capita, Koreans are the most likely to go to Purdue. They are 15.5 times more likely to enroll at Purdue than a Chinese citizen. These dudes (and dudettes) care about their education! (They can also afford it.)
2) Taiwanese are the next most likely to go to Purdue. They are 10.6 times more likely to enroll at Purdue than a Chinese citizen.
3) Canadians are 2.3 times more likely to enroll at Purdue than a Chinese citizen.
4) Citizens of Turkey, the Netherlands, Thailand, and Ireland are all about as likely to enroll at Purdue as a Chinese citizen. These countries simply have fewer citizens, so the raw numbers don't look as impressive.
5) Citizens of Israel, Japan, Denmark, and France are all roughly 50% less likely to enroll at Purdue than a Chinese citizen.
6) Only 36% of the countries are more likely than China to enroll their citizens at Purdue. This puts China in roughly the top 40% in terms of "probability of student/person in country going to college at Purdue", but it is NOT the best (Korea is).

EDIT: I finally buckled down and figured out how to calculate Spearman's Rank Correlation Coefficient, and it came out to be r = 0.48 (n = 101). The corresponding t-value, from t = r * sqrt((n - 2) / (1 - r^2)), is about 5.44, with degrees of freedom df = 101 - 2 = 99. Looking at a t-distribution table (like here) shows that the corresponding p-value is p < 0.001.

Hence this correlation between a country's GDP per capita and "probability of student going to Purdue, country population taken into account" is statistically significant (less than 0.1% probability that chance alone can account for these differences, well below the 5% threshold).
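For anyone who wants to reproduce this kind of test, here is a rough pure-Python sketch: rank both variables (ties get the average rank), take the ordinary Pearson correlation of the ranks, and convert r to a t-value with the usual approximation t = r * sqrt((n - 2) / (1 - r^2)). The function names are my own and the code is an illustration, not the exact procedure I used. Plugging r = 0.48 and n = 101 into that formula gives a t-value of roughly 5.4, which is well past the table cutoff of about 3.39 for p = 0.001 (two-tailed) at around 100 degrees of freedom.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def t_statistic(r, n):
    """t approximation for a correlation r from n pairs (requires |r| < 1)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# The values reported in this post.
t_value = t_statistic(0.48, 101)
print(f"t = {t_value:.2f} with df = {101 - 2}")
```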