Last week, we told you about going 8 for 8 in predicting the Round of 16. We also gave you a sneak peek at our predictions for the quarterfinals.

Perhaps, in our excitement about using Cloud Dataflow, BigQuery and Compute Engine to arrive at our predictions, we would have been better served by heeding a simpler truth. As Gary Lineker once said, “Football is a simple game. 22 men chase a ball for 90 minutes, and at the end, the Germans always win.”

And so it went in the quarterfinals. We gave France a 69% chance, but we were wrong. Germany defeated France 1 to 0. That was the only game that upset our predictions. After going 8 for 8 in the Round of 16, we're now at 11 for 12.

Why did we get Germany - France wrong?
World Cup teams are especially difficult to model because they play so few games together. USA coach Jurgen Klinsmann recently told the New York Times that he sees his players about as often as he sees his barber. If data is the lifeblood of a good model, we suffered for want of more information.

But we know that, in the same environment, others fared better in their predictions (h/t Cortana; their model relies more on what betting markets are saying, whereas ours is an inductive model derived from game-play data).

So, why did we get Germany - France wrong? In the first four games of the World Cup, France took more shots than Germany, put more shots on target, and shot from more “dangerous locations” (that is, closer to the goal). This information complements actual goals to form an ‘expected goals’ statistic in our model.

Moreover, in the first four games, Germany allowed their opponents to take more dangerous shots, and thus the expected goals statistic was higher for their opponents. They also allowed their opponents to pass better in their third of the field. In the Germany-France game itself, France actually outshot Germany, 13 shots to 8 overall and 9 to 6 on target. With a little more luck on their side, the French might have pulled ahead.
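
To make the ‘expected goals’ idea a bit more concrete, here is a minimal Python sketch of how shot counts, shots on target, and shot location could be blended with actual goals. This is not our production model: the `Shot` class, the exponential distance decay, the `scale_m`, `on_target_bonus` and `blend` parameters, and the sample shots are all assumptions made up for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Shot:
    distance_m: float  # distance from the goal, in metres (hypothetical field)
    on_target: bool    # whether the shot was on target


def shot_value(shot, scale_m=12.0, on_target_bonus=1.5):
    """Hypothetical per-shot weight: closer shots count more,
    and shots on target get an extra multiplier."""
    base = math.exp(-shot.distance_m / scale_m)
    return base * (on_target_bonus if shot.on_target else 1.0)


def expected_goals(shots, actual_goals, blend=0.5):
    """Blend actual goals with the summed shot weights.
    blend=1.0 uses goals only; blend=0.0 uses shots only."""
    shot_based = sum(shot_value(s) for s in shots)
    return blend * actual_goals + (1.0 - blend) * shot_based


# Made-up shots for two imaginary teams, purely for illustration.
team_a = [Shot(8.0, True), Shot(11.0, True), Shot(20.0, False)]
team_b = [Shot(24.0, True), Shot(30.0, False)]

print("Team A:", round(expected_goals(team_a, actual_goals=1), 2))
print("Team B:", round(expected_goals(team_b, actual_goals=1), 2))
```

Under these assumptions, two teams with the same number of actual goals can end up with quite different scores, because the team that shoots more often, more accurately, and from closer range is credited for chances it created but did not convert. That is the sense in which a statistic like this can favor a side that goes on to lose 1 to 0.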

What about the semi-finals?
Enough commentary. Here are our predictions for the next round:

  • Brazil vs. Germany: Germany (59%)
  • Netherlands vs. Argentina: Argentina (61%)

These predictions are based on the results of the quarterfinal games in addition to the previous World Cup games. Our model doesn’t take into account the fact that Brazilian striker Neymar is out of the tournament with a back injury, so the scales might actually tip a bit further in Germany’s favor.

- Posted by Benjamin Bechtolsheim, Product Marketing Manager