Tag Archives: statistics

Riddler Revisited

I’m taking a second crack at FiveThirtyEight’s Riddler problem. I thought I solved the problem two weeks ago, but my answer wasn’t quite right.

Here is this week’s problem:

There’s an airplane with 100 seats, and there are 100 ticketed passengers each with an assigned seat. They line up to board in some random order. However, the first person to board is the worst person alive, and just sits in a random seat, without even looking at his boarding pass. Each subsequent passenger sits in his or her own assigned seat if it’s empty, but sits in a random open seat if the assigned seat is occupied. What is the probability that you, the hundredth passenger to board, finds your seat unoccupied?

I’m not even going to try and figure out the math for this puzzle. My statistics skills aren’t nearly strong enough. But my coding skills are. This is an easy problem to solve with a Monte Carlo simulation.

Here’s my Python solution.
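A minimal sketch of such a simulation might look like the following. It is not the original script: the function names `simulate_boarding` and `estimate_own_seat_odds`, the trial count, and the seed are all assumptions made for illustration. Passengers are numbered by boarding order and passenger i is assigned seat i, which loses nothing because the boarding order is random.

```python
import random


def simulate_boarding(n_seats, rng):
    """Board one plane; return a list of booleans, one per passenger in
    boarding order, that is True when the passenger got their own seat."""
    free = list(range(n_seats))      # seats still open
    taken = [False] * n_seats
    got_own = [False] * n_seats
    for p in range(n_seats):
        if p > 0 and not taken[p]:
            seat = p                 # own seat is open, so take it
        else:
            seat = rng.choice(free)  # first passenger, or a displaced one
        free.remove(seat)
        taken[seat] = True
        got_own[p] = (seat == p)
    return got_own


def estimate_own_seat_odds(n_seats=100, trials=20_000, seed=0):
    """Estimate, for each boarding position, the chance of sitting in the
    assigned seat, by averaging over many simulated boardings."""
    rng = random.Random(seed)
    own_counts = [0] * n_seats
    for _ in range(trials):
        for p, ok in enumerate(simulate_boarding(n_seats, rng)):
            own_counts[p] += ok
    return [count / trials for count in own_counts]


if __name__ == "__main__":
    odds = estimate_own_seat_odds()
    print(f"last passenger: {odds[-1]:.3f}")
    print(f"second-last passenger: {odds[-2]:.3f}")
```

The last entry of the returned list is the estimate for the hundredth passenger; the other entries give the odds for every earlier boarding position in the same run.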

And the output charted.

nth_passenger
The last passenger has a 50% chance of getting their assigned seat.

passengers_in_assigned

The number of passengers who get their assigned seats is distributed around 95, with a 1% chance of everyone getting their assigned seat (in the event the first passenger randomly chooses his/her own seat).

Interestingly, it doesn’t matter how many seats are on the plane. If there are 10 or 200 seats, the last passenger still has a 50% chance of getting their assigned seat. The second-last passenger has a 67% chance, and the third-last passenger has a 75% chance.

It’s easy to derive the equation for the Xth last passenger from this, even though I can’t figure out the math to prove it. The Xth last passenger has an X/(X+1) chance of getting their assigned seat. So the 5th last passenger has a 5/6 = 83.3% chance of getting their seat.
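The X/(X+1) pattern can also be checked exactly rather than by simulation. The sketch below is one way to do that, under stated assumptions: passengers are numbered 1 to n in boarding order, and `chooser[j]` (a name invented here) is the probability that passenger j ends up picking a seat at random. When passenger i picks at random there are n - i + 1 open seats, and each later passenger's seat is one of them. The pattern applies to everyone except the first person to board, who lands in their own seat with probability 1/n.

```python
from fractions import Fraction


def own_seat_probabilities(n_seats):
    """Exact chance that each passenger (in boarding order) sits in their
    assigned seat. Index 0 is the first person to board."""
    # chooser[j]: probability that passenger j picks a seat at random.
    chooser = [Fraction(0)] * (n_seats + 1)
    chooser[1] = Fraction(1)  # the first passenger always picks at random
    for j in range(2, n_seats + 1):
        # Passenger j is displaced if any earlier random chooser i
        # happened to pick seat j out of the n - i + 1 seats open to them.
        chooser[j] = sum(
            (chooser[i] / (n_seats - i + 1) for i in range(1, j)),
            Fraction(0),
        )
    probs = [Fraction(1, n_seats)]
    probs += [1 - chooser[j] for j in range(2, n_seats + 1)]
    return probs


if __name__ == "__main__":
    probs = own_seat_probabilities(100)
    for x in (1, 2, 3, 5):
        print(f"{x} from the end: {probs[-x]}")
```

Summing the returned list for 100 seats gives roughly 94.8, which lines up with the simulated distribution centred around 95.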

Math Problems

einstein_math
I’ve noticed a few Grade 6 math problems on Facebook that I’m sure Einstein would be embarrassed to see his picture on. I’m not sure which is sadder: that people consider a problem given to 11-year-olds genius-level, or that only 50% of adults got the answer right.

I’m much more interested in the problems that FiveThirtyEight has been posting for their Riddler series. This week the problem involves cars getting stuck in traffic. Thankfully, not something I normally have to deal with, but I think I figured out the answer.

The Problem:

There is a very long, straight highway with some number of cars (N) placed somewhere along it, randomly. The highway is only one lane, so the cars can’t pass each other. Each car is going in the same direction, and each driver has a distinct positive speed at which she prefers to travel. Each preferred speed is chosen at random. Each driver travels at her preferred speed unless she gets stuck behind a slower car, in which case she remains stuck behind the slower car. On average, how many groups of cars will eventually form? (A group is one or more cars travelling at the same speed.)

My solution:

f(N) <- average number of car groups if there are N cars
f(0) = 0
f(1) = 1

If we have 2 cars there is 50% chance of the first car being faster, which would create 2 groups; and 50% chance of the 2nd car being faster and merging into a single group.
f(2) = 0.5 * 2 + 0.5 * 1 = 1.5

More generally, if we just consider the first 2 cars, there’s a 50% chance of there being a solo lead car plus the groups that form behind it, and a 50% chance of the first car merging into the group behind it. We can define the average number of groups recursively, where:
f(N) = 0.5 * (1 + f(N-1)) + 0.5 * f(N-1)

that equation reduces to:
f(N) = 0.5 + f(N-1)

testing it out:
f(2) = 0.5 + f(1) = 1.5
f(3) = 0.5 + f(2) = 2
f(4) = 0.5 + f(3) = 2.5

Which is an arithmetic sequence that can be reduced to:
f(N) = 0.5 + 0.5 * N

or more succinctly:
f(N) = 0.5 * (N + 1) (for N >= 1)
f(N) = 0 (for N < 1)
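A closed form like this is easy to sanity-check with a quick simulation. The sketch below is an illustration, not part of the original solution; the function names, trial count, and seed are assumptions. It draws random distinct speeds, lists the cars from the front of the highway to the back, and counts a new group whenever a car is slower than every car ahead of it. The simulated averages match f(1) = 1 and f(2) = 1.5, but from N = 3 onward they follow the harmonic sum 1 + 1/2 + ... + 1/N (about 1.83 for N = 3) rather than 0.5 * (N + 1). That fits the remark in the Riddler Revisited post that this answer wasn’t quite right: the chance that the rearmost of N cars leads its own group is 1/N, not 50%.

```python
import random


def count_groups(speeds):
    """Count the groups that form; speeds[0] belongs to the lead car.
    A car leads a group only if it is slower than every car ahead of it."""
    groups = 0
    slowest_ahead = float("inf")
    for speed in speeds:
        if speed < slowest_ahead:
            groups += 1
            slowest_ahead = speed
    return groups


def average_groups(n_cars, trials=100_000, seed=0):
    """Monte Carlo estimate of the average number of groups for n_cars."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        speeds = [rng.random() for _ in range(n_cars)]
        total += count_groups(speeds)
    return total / trials


def harmonic(n):
    """1 + 1/2 + ... + 1/n, for comparison with the simulation."""
    return sum(1 / k for k in range(1, n + 1))


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 10):
        print(n, round(average_groups(n), 3),
              "linear:", 0.5 * (n + 1),
              "harmonic:", round(harmonic(n), 3))
```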

Data Nerd: Analyzing the BMO Vancouver Marathon 2014 Results

2014 BMO Vancouver Marathon Finishing Times
Which is better for running – hot and dry or cold and wet? Personally, I’d prefer the heat, but statistically it seems that the colder, wetter weather is better for finishing times. Last year’s BMO Vancouver marathon was the hottest in the race’s 42-year history (with temperatures over 20 C). This year it was cool and rainy (never getting above 10 C), but across the board times were faster. The winners were around 3 minutes faster than last year. The median time for men was 9 minutes faster. And there was less of a slowdown between the first and second halves of the race, with 3 times as many people running negative splits (faster 2nd halves). The only negative changes: fewer finishers and fewer Boston qualifiers (not sure why that is).

                         2013      2014
Finishers                3877      3783
Negative Splits          99        293
Second Half Slowdown     7.7%      5.7%
Fastest – Male           2:24:09   2:21:08
Fastest – Female         2:40:34   2:37:00
Median Time – Male       4:10:28   4:01:38
Median Time – Female     4:29:45   4:26:29
Boston Qualifiers        375       357

Here is the analysis I did last year: 2013 Results Analysis. If I have time I’ll do some more in-depth analysis for the half marathon results.

1 million Flickr views!

Flickr Stats Chart
As of today, my Flickr account has received 1 million views. It’s the milestone I’ve been waiting for to cancel my paid account. I have no problem paying $25/year for a great photo storage website, but Yahoo announced two months ago that Flickr would phase out its Pro accounts and transition people to free accounts. Yes, Flickr has coerced me into not paying them. After 8 years and $218, my money is no good.

The new free account is almost as good as the old paid one was. The world will still have access to all my photos and I have the ability to upload as many new ones as I want. Sure there’s a 2 TB limit, but it will take me over 1000 years to reach it at my current upload rate. Realistically, the only downside is I no longer have access to detailed statistics, which is why I waited until today to cancel my account.

I started my Flickr account on July 30, 2005 by uploading this horrible picture of my sister (sorry Kelsey).
Lunch 003

Since then I’ve uploaded 5,335 more photos, most better than the one above. Here are some of the most popular photos on my Flickr account and a collection of my favourites.
MSN Butterfly Ladies Students Studying Plumber's Convention Sugar Cane Summit Salutations Cow Break Gay Cowboy School Girl and School Boy
Diving Board Feet Radiant Heated Sidewalks Scary Fish Sunbathers on Kits Beach Gullfoss Waterfall The Sombrero at Postcards Cafe Frozen Snowflake Rocky Peak

Data Nerd: Analyzing the BMO Vancouver Marathon 2013 Results

Now that my marathon training is over, I thought I’d use some of my free time to analyze the race results.

Registration in the 2013 BMO Vancouver Marathon was capped at 5000 and there were 4958 registered runners, but the results show only 3876 people finished the race – 1710 women and 2166 men. Another 102 people dropped out, including last year’s women’s winner Ellie Greenwood. For the remaining 980 runners, injuries probably forced many of them to drop down into shorter distances or not run at all.

2013 BMO Vancouver Finishing Times By Gender
Finishing times ranged from 2:24:08 to 8:12:33. Half of the men finished under 4:10:00. The median time for women was 4:30:00.

Half Splits Histogram
It was the hottest BMO Vancouver marathon in the race’s 42-year history, which clearly affected most runners. Only 99 finishers (2.6%) ran a negative split (where the 2nd half of the race is faster than the first). Everyone else slowed as the race progressed and the temperatures climbed. For most runners, the second half of the race was 3%-11% slower than the first half.
Half Splits By Finishing Time
The elites ran closer to even splits, but even they slowed by a few minutes. For someone running a 3:30 marathon, the second half of the race averaged 10 minutes slower than the first. For a 4-hour marathon, it was 16 minutes slower. For a 5-hour marathon, it was 27 minutes slower.

Boston Qualifiers - Male Boston Qualifiers - Female
375 marathon runners ran times fast enough to qualify for the Boston Marathon in their category. Overall the ladies did a better job qualifying, with 189 running BQ times (11%). 186 men (8.6%) also ran BQ times. The categories that had the most Boston qualifiers were Male 50-54 (43), Female 45-49 (34), Male 45-49 (33), and Female 40-44 (32).

Race results were acquired from SportStats.ca.

Data Nerd: 30 Years of Canadian Elections Charted

XKCD created a fascinating chart of the history of the US Congress. I thought it would be interesting to do something similar for Canada, but our multi-party system and separatist political parties make it a lot more difficult. I was able to gather the results from all the federal and provincial elections in the past 31 years (my lifetime plus a bonus year), and there are some interesting trends and patterns.


– The Liberals are sometimes called “Canada’s Natural Governing Party”, but in the past 31 years, the Conservatives have been in charge of 46% of the governments (172 combined years). By comparison, the Liberals have governed 30% of the time, the NDP 18%, and other governments (many Conservative-leaning) 6% of the time.

– The NDP have never governed federally, but have been in control of at least one government every year since 1982. In fact, the NDP have been in government somewhere in Canada as far back as 1969, when Ed Schreyer was elected in Manitoba. Most of those governments have been in Western Canada, with the exception of Ontario in the early 90s and Nova Scotia today.

– 1984 was the height of Conservative governance in Canada. Brian Mulroney won the largest majority government in Canadian history, 8 provinces had Progressive Conservative governments (9 if you count the Social Credit government in BC), and the Liberals weren’t in power in a single province. That might explain Stephen Harper’s tendency toward Orwellian policies.

– Alberta is the only province with a political dynasty/monoculture. Every other province saw 3-5 changes in governments in the past 30 years.


– Although governments tend to cycle through political parties, support for conservative parties has stayed relatively constant in terms of total votes cast (federal and provincial) – between 7.5 million and 11 million votes. Support for the Liberals and NDP has a very strong inverse correlation (-0.85), meaning they are likely pulling support from the same voters. Combined support for the two parties has been pretty constant over the past 30 years – between 11.5 million and 14.5 million total votes.

Vancouver’s Separated Bike Lanes – September Update

Bike Lane in the Rain
A quick update on Vancouver’s separated bike lanes. Last month I wrote about how the lanes were “more popular than ever”, and the trend is continuing. The data for September 2011 is now available, and the bike lanes are still rocking.
