Thursday, August 26, 2010

Win Probability

One of my favorite baseball pages on the web is the win probability graphs at Fangraphs. Simply put, these graphs show the probability of either team winning the chosen game after each play, based on historical results of identical situations. For example, a home team that is losing by two runs at the end of the 6th inning will win approximately 20% of the time. However, if they are down two at the end of the 7th, they win only 15% of the time.

Here is the graph for today’s big comeback against the Braves. As you can see, things were not looking good for the Rockies. A Braves win was a near lock until the Rockies scored three in the 5th to make the deficit more manageable. Of course, the real swing happened in the 8th inning, when Carlos Gonzalez hit the game-tying single. Click on the photo to link to more details about the game.

Braves @ Rockies - Wednesday, August 25, 2010

The accompanying play log provides more detailed insight. I have copied an abbreviated version here with the details of the 8th inning.

Pitcher       Batter        Inn  Outs  Bases  Score  Play
J Venters     S Smith       8    0     ___    8-10   K
J Venters     C Iannetta    8    1     ___    8-10   BB
J Venters     M Mora        8    1     1__    8-10   1B
J Venters     E Young Jr.   8    1     12_    8-10   FC, 4-6
J Venters     D Fowler      8    2     1_3    8-10   BB
J Venters     C Gonzalez    8    2
K Farnsworth  T Tulowitzki  8    2     1_3    11-10  1B
K Farnsworth  T Helton      8    2     12_    12-10  1B
K Farnsworth  M Belisle     8    2     1_3    12-10  K

Notice that as each runner got on base, the win expectancy (WE) slowly crept up. Naturally, it went down as each out was made. Of course, the big blow was Cargo’s single, which increased the probability of a Rockies win from 24.9% to 61.2%. This is also indicated by the win probability added (WPA) column, which is .363. It should be noted that while the three base runners who reached ahead of Cargo did not increase the win expectancy very much, they did each push up the leverage index (LI) so that Cargo’s at-bat had the significance that it did. In other words, Cargo’s hit was the critical play, but Iannetta, Mora (replaced by Young on the fielder’s choice), and Fowler reaching base set up his big chance. While those plays did not have the impact of Cargo’s single, it could not have happened without those other three guys getting on base.
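As a quick check on the numbers above, WPA for a single play is just the change in win expectancy from before the play to after it. Here is a minimal sketch in Python; the function name is my own and is not anything Fangraphs publishes.

```python
def win_probability_added(we_before, we_after):
    """Change in a team's win expectancy caused by one play."""
    return we_after - we_before

# Win expectancy figures quoted above for Gonzalez's single.
wpa = win_probability_added(0.249, 0.612)
print(f"WPA: {wpa:+.3f}")
```

That difference matches the .363 shown in the WPA column.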

In case you’re curious here are win probability graphs for some other famous games in Rockies history.

Wednesday, August 4, 2010

Performance Pie

When comparing the performance of different players, it can be easy to get overwhelmed by all the numbers. So I’ve decided to take a more visual approach to evaluation. By using pie charts showing the six possible outcomes (out, walk, 1B, 2B, 3B, HR) for a batter and the percentage of plate appearances in which each occurs, you can get a good idea of what a player has really done at the plate. Outs are represented by red, while the positive events are in various shades of yellow or green. Raw totals are shown along with the percentage of the time that each event occurred. I’ve done this for this season’s performance (through Aug. 1) for all Rockies players with at least 100 PAs this year. It should be noted that these charts do not necessarily predict future performance; they only show what has happened. (Click on the pics to enlarge.)

[Performance pie charts: cg, is, cb, mo, tt, th, df, ss, bh, rs, mm, jh, jg, ci]
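For anyone who wants to build a chart like these, the slices are simply each outcome's share of plate appearances. Below is a rough sketch in Python. The counts are made up for illustration and are not any player's real totals, and the function name is my own.

```python
def outcome_shares(counts):
    """Return each outcome's percentage of total plate appearances."""
    total = sum(counts.values())
    return {outcome: 100.0 * n / total for outcome, n in counts.items()}

# Hypothetical batting line (NOT real data), using the six outcomes above.
example = {"out": 130, "walk": 20, "1B": 30, "2B": 10, "3B": 2, "HR": 8}
shares = outcome_shares(example)

for outcome, pct in shares.items():
    print(f"{outcome:>4}: {pct:5.1f}%")

# Everything that is not an out is the "good pie".
good_pie = 100.0 - shares["out"]
print(f"good pie: {good_pie:.1f}%")
```

The resulting percentages can be fed straight into whatever charting tool you prefer.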

Some interesting things appear when the data is viewed in this way. Jason Giambi has the biggest slice of good events. This directly corresponds to having the highest on-base percentage on the team. (The higher the OBP, the more good pie.) What is somewhat surprising is that the higher-value events (2B, HR) occur less frequently than you might expect. However, there is still enough green and yellow pie there that he can't be considered punchless. Carlos Gonzalez’ chart shows that while he hasn’t done something good as often as some of his teammates, the value of what he has done has been very big. With a big chunk of orange, yellow, and green, Carlos has clearly done a lot of damage. This also brings new insight to some position battles. Jonny Herrera and Clint Barmes have very similar proportions of red on their charts; however, Clint has more green and yellow to Jonny’s orange. In other words, Clint’s advantage in the power department clearly comes through. Similarly, Brad Hawpe’s bigger cream section (walks) doesn’t quite measure up to Seth Smith’s bigger orange and dark green sections.

There are many ways these charts could be extended. You could break outs down into strikeouts and outs in play, which would give you a rough idea of who is getting himself out and who is being put out by defenses. You could also have a chart for different splits, over careers or single seasons. I would really like it if others started using this approach to demonstrate player performance. Perhaps one of the big baseball websites that has the technology to do so could include these pie charts in its player profiles and update them along with the stats. After all, there’s nothing wrong with having another tool to help us gain insight into player performance.

Wednesday, June 23, 2010

Was Chris Iannetta Afraid to Swing the Bat?

Before Chris Iannetta’s surprising demotion earlier this season, he had come under fire for not being aggressive enough, in particular from one of the Rockies’ TV commentators during a game about a week before he was sent down. As a fan of patient hitting, I was pretty OK with Chris not chasing a low fastball on the outside corner that would surely have turned into a 4-6-3 double play if he had offered at it. The commentator was a lot quieter when Chris ended up with a base hit.

Having seen a lot of criticism about Chris being too patient, I got to wondering whether the perception was true. Was he afraid to swing the bat? Looking at the following tables from Fangraphs, the answer up to this year was clearly NO.

Season O-Swing% Z-Swing% Swing% Outside Zone Total
2006 17.30% 75.90% 48.80% 175 203 378
2007 17.90% 70.60% 46.20% 432 502 934
2008 16.20% 72.30% 44.00% 853 836 1689
2009 16.70% 72.30% 45.60% 709 765 1474
2010 18.80% 68.70% 44.30% 69 72 141
Total * 16.80% 72.20% 45.30% 2239 2377 4616


Season O-Swing% Z-Swing% Swing%
2006 23.50% 66.60% 46.10%
2007 25.00% 66.60% 45.90%
2008 25.40% 65.40% 45.90%
2009 25.10% 65.90% 45.20%
2010 28.30% 63.90% 45.10%


The first of these tables shows Chris’s swing percentage outside and inside the zone (noted by O-Swing% and Z-Swing% respectively) as well as the total percentage of pitches swung at. The second part of the first table shows the number of pitches seen outside, inside, and total. This is all based on data up to Chris’s demotion.

Compare the first table to the second, which contains Major League averages of swing percentages over each year of Chris’s career. You’ll notice that Chris has been consistently good at not chasing pitches out of the strike zone, which shouldn’t surprise anyone. What may be surprising to some is that Chris is more aggressive on balls in the zone than the average major leaguer. It should be noted there was a slight decrease in 2010 in pitches swung at in the zone. However, because the sample is so small, this means that he swung at only about two fewer pitches than he normally would have. From Jim Tracy’s view that may have been all he needed to see, even if the stats don’t show the same urgency.
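To put a number on that small-sample point, here is a short sketch of the arithmetic in Python, using the 2010 and Total rows of the first table. Treating the Total row as his "normal" rate is my assumption, and the variable names are just for illustration.

```python
# Figures taken from the first table above.
zone_pitches_2010 = 72     # pitches seen in the zone in 2010
z_swing_2010 = 0.687       # 2010 Z-Swing%
z_swing_career = 0.722     # "Total" row Z-Swing%, assumed to be his norm

expected_swings = z_swing_career * zone_pitches_2010
actual_swings = z_swing_2010 * zone_pitches_2010
shortfall = expected_swings - actual_swings

print(f"expected {expected_swings:.1f}, actual {actual_swings:.1f}, "
      f"shortfall {shortfall:.1f}")
```

The shortfall comes out to between two and three swings, which is the handful of pitches the paragraph above is talking about.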

The next question we have to ask is “Has Chris become more aggressive since his recall?” The following table shows Chris’s swing rates since his recall.

O-Swing% Z-Swing% Swing% Outside Zone Total
30.07% 68.87% 46.19% 55 104 159

It practically jumps off the page. That outside zone swing percentage has gone up to over 30%. So Chris has become more aggressive, but not in a good way. Strangely enough, his walk rate has been higher than usual at 21%. His K rate since then, also at 21%, is close to his career norm. Coincidentally or not, the one thing missing is the power. Chris has only 1 double in 36 PAs since being recalled. Of course, it’s probably too early to draw any real conclusions from that.

This is something worth following over the course of the year. Something tells me Chris won’t turn from a guy who has been criticized for being too patient into someone who doesn’t see any pitches he doesn’t like.

Saturday, April 24, 2010

Rain Out

First off, I want to say how unfortunate it is that Rockies President Keli McGregor passed away earlier this past week. From what I know of him, he was a very nice and sincere person. All the best to his family and friends.

It is somewhat fitting that we have dreary weather in the forecast for this weekend. Tonight's game got postponed due to rain/cold and will be made up tomorrow as part of a true doubleheader. I must say that I was disappointed to have the game called right as I was getting to my seat, but I am pretty excited to go to the doubleheader tomorrow. I don't recall ever going to a doubleheader before, so this will be my first. I only hope neither of the games gets rained out. Otherwise, I'll look forward to watching 'em play two.

Sunday, April 18, 2010


I'm pretty wound up right now, but who could blame me after this happened. Looking back, I wonder if people realize how special this is. Obviously, anyone should realize that any no-hitter is a big deal. It gets even bigger when it is the first in the history of your team, for any club. What really makes this special is how miserable the Rockies pitching staff had been through most of the team's history. Until the past few years, the pitching staff was notoriously awful. Part of the improvement may be due to bringing in the humidor, but a large part of the earlier trouble was simply a lack of pitching talent. Who can forget the immortal Jamey Wright or David Nied, and that legendary bullpen crew of Steve Reed, Darren Holmes, and Mike "Moonshot" Munoz? I certainly can't.

We've come a long way. We've gone from hoping the starter could hold the other team to fewer than 5 runs, to expecting quality starts every time out, and believing that a few members of the staff could throw a no-hitter. Now, it's actually happened. I, for one, won't forget Ubaldo's performance. More than that, I won't forget the road the Rockies organization has traveled to have a starting pitcher who is even capable of throwing a no-hitter, let alone actually doing it.

Saturday, April 17, 2010

Panic Time?!?!

This is always sort of a frustrating time for me to be a baseball fan. Every year it seems people get hysterical when someone gets off to a slow start, when they just need to relax and let things develop. There are a lot of examples of people jumping to conclusions based on small samples. Your centerfielder is hitting .190? Bench him? After 37 AB’s, probably not. Your team is playing .500 ball after 10 games, so turn the whole roster over? No. The team has scored 51 runs in those 10 games, and has scored at least 4 runs in 9 of those 10 games, so make drastic changes to the lineup? I don’t think so! The solution is to have some patience and let everything settle, if you will.

Earlier I read that Dexter Fowler was a “liability” in the lineup because of his .189 batting average, so I’ll use him as an example. Dex’s batting average has come in 37 at bats, which is pretty obviously not very many. So how many is enough to actually worry? We can build a simple hypothesis test for a player’s batting average based on his current average and his number of at bats. Given Dex’s ability to get on base (which is the real thing we care about, and deserves more analysis later), he needs to bat at least .250 to be a useful part of the lineup. If I’m Dan O’Dowd/Jim Tracy, I’m going to want strong evidence that he’s not a .250 hitter before I hit the panic button. Assuming at bats follow a typical binomial pattern, we test the hypothesis that the player is a .250 hitter after n at bats. It turns out that the number of AB’s that a player batting .189 can have before we feel truly confident that he’s not at least a .250 hitter is 111. (I’m more than willing to explain my math, if anyone asks.) That means Dexter still has 74 more AB's to get his average above .190 before there is real cause for concern. Don’t worry, something tells me he’ll do it.
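For the curious, here is one way to land near that 111 figure. This is only a sketch of a possible calculation, not necessarily the exact math behind the number above. It assumes a one-sided test at 95% confidence, a normal approximation to the binomial, and a standard error based on the observed average. All of those choices, and the function name, are my assumptions.

```python
from statistics import NormalDist

def at_bats_needed(observed_avg, target_avg, confidence=0.95):
    """At bats at which observed_avg sits z standard errors below target_avg.

    Assumes a normal approximation and a one-sided test (assumptions,
    not anything stated in the post).
    """
    z = NormalDist().inv_cdf(confidence)          # about 1.645 for 95%
    variance = observed_avg * (1 - observed_avg)  # per-at-bat variance
    gap = target_avg - observed_avg
    return z ** 2 * variance / gap ** 2

n = at_bats_needed(0.189, 0.250)
print(f"at bats needed: about {n:.0f}")
print(f"remaining after 37 AB: about {n - 37:.0f}")
```

Under those assumptions the threshold comes out at roughly 111 at bats, which leaves about 74 to go. A stricter confidence level or a different standard error would move the number.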

Thursday, November 5, 2009

The Best Bandbox

Recently I was downloading some data for a project that I have planned relating to park factor, when I had a moment of inspiration. It occurred to me that one could get a rough idea of how easy it is to hit a home run at a given stadium simply by finding the percentage of batted balls that were home runs. In other words, dividing the number of home runs by the number of AB's where the batter did not strike out would yield a home run rate. The higher the rate, the easier it is to hit home runs. The equation is simple and looks like this:

HR rate = HR / (AB - K)
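The rate described above can be sketched in a few lines of Python. The sample totals below are invented purely for illustration and are not taken from the 2009 data.

```python
def home_run_rate(home_runs, at_bats, strikeouts):
    """Home runs per batted ball, where batted balls = AB minus strikeouts."""
    batted_balls = at_bats - strikeouts
    if batted_balls <= 0:
        raise ValueError("need at least one at bat that was not a strikeout")
    return home_runs / batted_balls

# Hypothetical season totals at one park (NOT real data).
rate = home_run_rate(home_runs=100, at_bats=2700, strikeouts=500)
print(f"home run rate: {rate:.2%}")
```

Running the same function on the home team's totals and on the visitors' totals gives the two rates compared for each park.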


Before doing this, my belief was that Coors Field would not have the highest rate. I also had a suspicion that a certain stadium would have the highest rate. So I ran the numbers, looking at both the home team's and away teams' home run rate for each stadium in 2009. Here are the results:

Team Stadium                        Home   Away   Total
NYY  Yankee Stadium III             5.96%  4.61%  5.30%
TEX  Rangers Ballpark in Arlington  5.80%  4.02%  4.87%
PHI  Citizens Bank Park             5.02%  4.42%  4.71%
MIL  Miller Park                    4.83%  4.55%  4.69%
CHA  Comiskey Park II               4.77%  4.03%  4.40%
CIN  Great American Ballpark        4.46%  4.20%  4.33%
TAM  Tropicana Field                4.97%  3.70%  4.31%
BOS  Fenway Park                    5.38%  3.26%  4.30%
BAL  Oriole Park at Camden Yards    4.15%  4.42%  4.29%
LAA  Angel Stadium of Anaheim       4.01%  4.53%  4.27%
DET  Comerica Park                  4.26%  3.98%  4.12%
MIN  Hubert H. Humphrey Metrodome   4.22%  3.87%  4.04%
ARI  Chase Field                    4.07%  3.82%  3.94%
COL  Coors Field                    4.59%  3.30%  3.93%
FLA  Dolphin Stadium                3.99%  3.74%  3.87%
MLB Average                         4.00%  3.63%  3.81%
CHN  Wrigley Field                  3.87%  3.74%  3.80%
HOU  Minute Maid Park               3.57%  3.84%  3.71%
WAS  Nationals Park                 3.57%  3.57%  3.57%
SEA  Safeco Field                   3.50%  3.51%  3.51%
OAK  Network Associates Coliseum    3.25%  3.07%  3.16%
PIT  PNC Park                       3.39%  2.92%  3.15%
SDG  PetCo Park                     3.05%  3.16%  3.11%
SFG  AT&T Park                      3.11%  3.07%  3.09%
CLE  Jacobs Field                   3.10%  3.05%  3.07%
KAN  Kauffman Stadium               2.87%  3.01%  2.94%
LAD  Dodger Stadium                 3.18%  2.68%  2.94%
NYM  Citi Field                     2.22%  3.61%  2.92%
ATL  Turner Field                   3.18%  2.55%  2.87%
STL  Busch Stadium II               3.06%  2.41%  2.73%

As you can see, the new Yankee Stadium comes out on top, with 5.3% of batted balls hit there turning into home runs. Coors Field, as I guessed, was not an especially easy place to hit home runs. So that's it, the new Yankee Stadium is the easiest place to hit it out? Unfortunately, it's not that simple. These results may have more to do with each team's ability to hit home runs and its pitching staff's inability to keep the ball in the yard. A team with a lot of power and poor pitching is likely to score high on this list.

So in order to adjust for a team's ability, a new value must be found. The first step that I took was to recalculate the above table for each team while on the road. The following table shows these rates:

MLB Average                         3.63%  3.83%  3.73%

In this table, the Phillies had the highest rate of batted balls becoming home runs, which was largely due to their ability to hit home runs at a high rate. While on the road, 5.1% of batted balls by the Phillies became home runs. While at home, only 5.02% of their batted balls were home runs. Their opponents did benefit from playing in Philly, with 4.42% of batted balls at the Bank becoming home runs and only 4.09% becoming home runs in Phillies' away games.

The next step is to divide the data in the two tables to determine the increase (or decrease) in the rate of home runs to batted balls when a team is in its home park. If there is an increase in the ratios when playing at home, then playing in that stadium is beneficial to hitting home runs. The following table shows the ratios:

Team Stadium                        Home    Away    Total
NYY  Yankee Stadium III             1.3056  1.1962  1.2519
CHA  Comiskey Park II               1.3416  1.1034  1.2201
LAA  Angel Stadium of Anaheim       1.1240  1.3148  1.2180
CIN  Great American Ballpark        1.5495  0.9856  1.2118
TEX  Rangers Ballpark in Arlington  1.2330  1.1252  1.1769
BAL  Oriole Park at Camden Yards    1.4903  0.9230  1.1341
CHN  Wrigley Field                  1.0678  0.9334  1.1240
MIL  Miller Park                    1.3014  0.9704  1.1140
FLA  Dolphin Stadium                1.2000  1.0060  1.0988
MIN  Hubert H. Humphrey Metrodome   1.2859  0.9264  1.0865
NYM  Citi Field                     1.1203  1.0393  1.0776
HOU  Minute Maid Park               1.2611  0.9129  1.0592
PHI  Citizens Bank Park             0.9838  1.0789  1.0229
MLB Average                         1.1026  0.9469  1.0220
ARI  Chase Field                    1.0083  1.0166  1.0115
COL  Coors Field                    0.9919  1.0183  1.0022
PIT  PNC Park                       1.4016  0.7524  0.9945
BOS  Fenway Park                    1.2665  0.7404  0.9943
DET  Comerica Park                  1.0596  0.9325  0.9942
SFG  AT&T Park                      1.2255  0.8025  0.9834
OAK  Network Associates Coliseum    1.1973  0.7875  0.9600
TAM  Tropicana Field                1.1169  0.8009  0.9511
WAS  Nationals Park                 0.9643  0.9162  0.9393
SEA  Safeco Field                   0.9481  0.8310  0.8870
LAD  Dodger Stadium                 0.9851  0.7783  0.8815
ATL  Turner Field                   0.9157  0.8462  0.8812
SDG  PetCo Park                     0.8576  0.6942  0.7679
STL  Busch Stadium II               0.7373  0.7589  0.7429
KAN  Kauffman Stadium               0.7897  0.6460  0.7110
CLE  Jacobs Field                   0.7257  0.6239  0.6706

As it turns out, the new Yankee Stadium is the easiest stadium in the Major Leagues to hit a home run, with 25.19% more batted balls landing in the seats than in Yankee away games. Although Yankee Stadium provided the biggest overall increase in home run rate, the Yankees didn't benefit as much as some other teams. The rate of home runs was 55% higher at home for the Cincinnati Reds than when they were on the road. Their opponents actually hit home runs at a slightly lower rate when coming into Great American Ballpark. The Rockies, meanwhile, hit home runs at a slightly lower rate at Coors in 2009 than they did on the road. Their opponents did benefit slightly, but overall the rate was nearly the same as in away games.
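To make the ratio concrete, here is a small sketch in Python that uses the rounded Phillies rates quoted earlier in this post. The function name is my own, and because the inputs are rounded, the results differ slightly in the last digits from the table values, which were presumably computed from unrounded rates.

```python
def park_ratio(home_rate, road_rate):
    """Home run rate at home divided by the rate in the same team's road games."""
    return home_rate / road_rate

# Rounded rates quoted in the text for the 2009 Phillies.
phillies = park_ratio(0.0502, 0.0510)    # Phillies hitters: home vs. road
opponents = park_ratio(0.0442, 0.0409)   # their opponents: in Philly vs. elsewhere

print(f"Phillies hitters: {phillies:.3f}")
print(f"Opponents:        {opponents:.3f}")
```

A ratio above 1 means home runs came more easily in the home park, and a ratio below 1 means they were harder to come by there.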

This analysis provides a new look at whether a stadium really is a good home run park. Unlike park factor, which only considers the number of home runs per game, this method looks deeper and finds the number of home runs per batted ball. This is important since other factors may lead to an increased number of plate appearances per game in certain stadiums. The additional plate appearances add to the number of home runs, thus slightly inflating the home run park factor. Like park factor, this method still has a flaw, which I will discuss further in a future post. Until then, hopefully I have shed some light on which ballparks really are home run friendly and which are not.