Baseball Articles
Wednesday, April 13, 2005
 

What is “Sabermetrics?”

So what exactly is meant by the term “sabermetrics?” A short answer would be the scientific and historical study of baseball. A more detailed explanation would involve developing statistical models to explain how various facets of the game work, discovering information that is relevant to the early history of the game, and passing this information along in a more academic manner (http://www.cfmc.com). Bill James is undoubtedly the greatest sabermetrician in the history of the field, and he has been extremely influential for many years. “Sabermetrics” comes from SABR, the Society for American Baseball Research, which was founded in Cooperstown, New York in 1971. It currently has over 7,000 members throughout the country.

Most major league talent evaluators shunned sabermetrics until recently. The practice was written off as a movement of “nerds that had nothing better to do.” It wasn't until the Oakland A's started to have success under their young General Manager Billy Beane that people started to take notice. Currently, besides Beane in Oakland, General Managers J.P. Ricciardi in Toronto, Theo Epstein in Boston, and most recently Paul DePodesta in Los Angeles all support sabermetric theories. Epstein and DePodesta get to practice their philosophies with a large payroll for the first time. Theo Epstein even hired the great Bill James as a talent evaluator for the Red Sox.

A statistic is useful only if it is understood properly by the person using it. As a result, much of sabermetrics involves understanding how to use statistics the right way, which stats are needed for a particular study, and so on. You don’t need to know statistics and mathematics inside and out to understand what sabermetrics can accomplish. All you need to know is how stats can be used as well as misused.

      
Principles of Sabermetrics
 
The obvious goal of every baseball organization is to win more games than its competition.  Because a single franchise has minimal control over the number of games its competition wins, the goal is essentially to win as many games as possible.  For that reason, it makes sense to measure the players in your own franchise by their contribution to the team’s number of wins.
A clear relationship exists between a team’s runs scored and runs allowed and its wins and losses.  The relationship isn’t perfect, but it is close.  Bill James concluded from his data that a team’s ratio of wins to losses should equal the square of the ratio of its runs scored to its runs allowed.  For example, a team that scores and allows the same number of runs should end up with a .500 record.  A team that scores 800 runs while allowing only 700 should win approximately 64 games for every 49 that it loses.  This projects to a 92-70 record over a full 162-game season (2004 Bill James Handbook, James).
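The relationship James describes can be checked with a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, and the exponent of 2 and the 162-game season are taken as given.

```python
def expected_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage when wins/losses = (RS/RA) ** exponent."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def projected_record(runs_scored, runs_allowed, games=162):
    """Round the expected win total and return (wins, losses)."""
    wins = round(expected_win_pct(runs_scored, runs_allowed) * games)
    return wins, games - wins


# The 800-runs-scored, 700-runs-allowed team from the paragraph above.
print(projected_record(800, 700))
# A team that scores exactly as many runs as it allows.
print(expected_win_pct(750, 750))
```

The first call reproduces the 64-to-49 ratio (640,000 to 490,000 once both run totals are squared), and the second shows the .500 case.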
Put another way, the central goal of sabermetrics is to find the right measure for a given purpose.  Statistics are most commonly used to evaluate past performances and to predict future ones.  To do this, one must measure the contribution of an individual player to his team’s wins and losses.  This type of analysis is possible because, unlike most other sports stats, baseball statistics can measure individual performances independent of what the other players around the individual do.
To further illustrate, take this example.  When a batter hits a single, the statistic describes what he did; when a quarterback throws a ten yard pass, the guard who took out the blitzing linebacker gets no statistical credit.  The batter who hit the single is properly credited with a success.  The ten yard pass may have been a failure if it was third down and the team needed fifteen yards for a first down.  Thus it is reasonable for the goal of a baseball statistic to be measuring a player’s individual contribution to runs or wins (http://www-math.bgsu.edu/~grabine/manifesto.txt).
 
The Major Statistics
 
A sound measure of offense ought to have a strong, positive relationship with the total number of runs scored.  Obviously, the best offensive teams will score the most runs, while lesser teams will not score as many.  Statistical measures like batting average do not accomplish this.  It is not uncommon for teams with a higher collective batting average to be in the lower half of the league in runs scored.  “Runs scored” by itself is a fine indicator of a team’s offense, but not of individual contribution.
There are varying formulas that can estimate the total number of runs that an individual player will generate.  They depend heavily on two stats: on-base percentage (OBP) and slugging percentage (SLG).  “Batting average” doesn’t correlate well as stated earlier, and “runs batted in” and “runs scored” can also be deceptive because they depend on how well the surrounding players perform.  The most basic formula is:
·        Runs = OBP * SLG * At Bats
This equation can be further broken down into:
·        Runs = (hits + walks) * (total bases) / (at bats + walks)
If you divide a player’s “runs produced” by the number of outs he made (“at bats” minus “hits”) and multiply the result by 27 (the number of outs in a game), you can estimate the number of “runs per game” (RPG) that a lineup made up entirely of that one player would produce.  The formulas above can be used to give value to a single player, but they are better suited to predicting team runs over the course of a year.
The average baseball fan could ask what makes sabermetrics so special.  What is wrong with using batting average, home runs, and runs batted in to evaluate players?  The answer is that sabermetrics is really just another way of evaluating the components that make up the everyday stats that fans are so accustomed to.  It delves below the surface of the numbers that average fans associate with offensive prowess and gives the user a much clearer idea of what the statistics actually represent.
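To make the two formulas and the per-27-outs step concrete, here is a short Python sketch. The stat line is a made-up example rather than a real player, and the function names are only illustrative. Note that the “broken down” form treats OBP as simply (hits + walks) / (at bats + walks).

```python
def runs_created(hits, walks, total_bases, at_bats):
    """Basic runs estimate: (H + BB) * TB / (AB + BB)."""
    return (hits + walks) * total_bases / (at_bats + walks)


def runs_per_game(hits, walks, total_bases, at_bats, outs_per_game=27):
    """Runs a lineup of nine copies of this hitter would score per 27 outs."""
    outs = at_bats - hits  # outs approximated as at bats minus hits
    return runs_created(hits, walks, total_bases, at_bats) / outs * outs_per_game


# Hypothetical hitter: 150 hits, 60 walks, 250 total bases, 500 at bats.
rc = runs_created(150, 60, 250, 500)
rpg = runs_per_game(150, 60, 250, 500)

# The same estimate written as OBP * SLG * At Bats.
obp = (150 + 60) / (500 + 60)
slg = 250 / 500
print(round(rc, 2), round(obp * slg * 500, 2), round(rpg, 2))
```

Both forms of the formula give the same runs figure for this hitter, which is all the “breakdown” is claiming.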
Sabermetrics was built mostly on the foundation of offense.  Perhaps the most widely used sabermetric stat is OPS, which stands for On-base percentage Plus Slugging percentage.  To get a player’s OBP, total up his walks, hits, and hit by pitches and divide by the sum of at bats, walks, hit by pitches, and sacrifice flies (sacrifice hits, or bunts, are not counted).  Let’s look at perhaps the greatest single season in sabermetric terms, Barry Bonds’ 2002 season.
·        OBP = (BB+H+HBP)/(AB+BB+HBP+SF)
·        BONDS: .582 = (198+149+9)/(403+198+9+2)
In 2002, Barry Bonds had an on-base percentage of .582, which means that he reached base in 58% of his plate appearances.  This number is absolutely astounding.  A good OBP is considered to be .350 or higher.  Bonds’ number is so high because he set the single-season record for walks that year.  Slugging percentage is much easier to calculate.  To figure it out, just divide “total bases” by “at bats.”
·        SLG = TB/AB
·        BONDS: .799 = 322/403
Bonds’ .799 slugging percentage in 2002 was the fourth highest mark in the history of baseball (he owns the record set in 2001, .863).  Slugging percentage averages have a wider range than on-base percentage averages.  A respectable slugging percentage is around .450 and higher.
To fully understand what slugging percentage represents, a further explanation is needed.  Each hit is weighted by the number of bases it yields: a single counts as one total base, a double as two, a triple as three, and a home run as four.  A player cannot have more than four total bases in an at bat, so the highest possible slugging percentage is 4.000, which would require a home run in every at bat; a player who singled in every at bat would slug 1.000.  Barry Bonds’ .799 slugging percentage means that he averaged about 0.8 total bases per at bat, or roughly four total bases for every five at bats.
            Now that the two major statistics are defined, OPS can be fully understood.
·        OPS = OBP + SLG
·        BONDS: 1.381 = .582+.799
OPS is easy to calculate once the on-base and slugging percentages are known.  A .700 OPS is considered to be about the league average, while a .900 OPS is considered to be outstanding.  The mark that Bonds put up in 2002 was .481 above the “outstanding” line.
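The three calculations can be reproduced from the Bonds figures quoted above. The sketch below uses the official OBP definition, which leaves sacrifice hits out of the denominator; Bonds had none in 2002, so the result is the same either way. The function names are only illustrative.

```python
def total_bases(singles, doubles, triples, home_runs):
    """Weight each hit by the number of bases it is worth."""
    return singles + 2 * doubles + 3 * triples + 4 * home_runs


def on_base_pct(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch + sac_flies)


def slugging_pct(bases, at_bats):
    """SLG = TB / AB."""
    return bases / at_bats


# Barry Bonds, 2002, using the numbers given in the text.
obp = on_base_pct(hits=149, walks=198, hit_by_pitch=9, at_bats=403, sac_flies=2)
slg = slugging_pct(322, 403)
ops = obp + slg
print(f"OBP {obp:.3f}  SLG {slg:.3f}  OPS {ops:.3f}")
```

The `total_bases` helper is included only to show the 1-2-3-4 weighting behind the “total bases” figure; one hit of each kind, for instance, is worth ten total bases.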
Another important statistic when evaluating a player is his “win shares.”  Win shares measure a player’s individual contribution towards his team’s total number of wins.  Bill James defined a win share as “one-third of a victory.”  For example, if a team wins a total of 100 games, its players have exactly 300 win shares to divide among them, no more and no less.
            Listed below are the steps that Bill James uses according to his book, Win Shares, to calculate a given player’s win share.  The easiest way to explain this statistic is by referring to an article at http://www.baseballgraphs.com/details.html#sharecalc. 
1.      Divide the responsibility for a team’s wins between the offense (batting and base-running) and defense (pitching and fielding).  You do this by calculating the team run differential through a method called “marginal runs.”  You first calculate the average number of runs scored per team in the league.  You next adjust your team’s runs scored and runs allowed for the ballpark in which they played their home games.  Then you add together two figures: all runs scored over 52% of the league average (credited to offense), and all runs allowed less than 152% of the league average (credited to defense).  This represents the “total marginal runs.”
2.      Take the percent of marginal runs contributed by the offense, multiply it by the number of wins times three.  This is the total number of offensive “win shares.”  Do the same things for defensive “win shares.”
3.      Attribute offensive win shares to individual players.  This is done through two key metrics: “runs created” and “outs made.”  The formula for runs created was discussed earlier.  “Runs created” is calculated for every player, including pitchers.
4.      Subtract the league “background runs created” (52% of the league average) from each player’s “runs created” based on the number of “outs made” by that batter, adjust it for ballpark, and credit each player with the result; essentially individual marginal runs created.  Add these up for all players and use each player’s percentage of the whole to allocate offensive “win shares” to each.  Any player whose “runs created” are less than 52% of the league average “runs created per out” is credited with no “win shares.”  Now the offensive portion is calculated.
5.      The first step for defense is to divide defensive “win shares” between pitching and fielding.  This is done through a complicated formula that accounts for elements that can be attributed only to pitchers (home runs, walks, and strikeouts) as well as a team’s “defensive efficiency ratio” and other fielding statistics such as passed balls, errors, and double plays.  Typically, about 70% of defensive “win shares” are credited to pitching, and 30% to fielding.  The win shares system is bound so that pitching never is credited with less than 60%, or more than 75%, of defensive “win shares.”
6.      Allocate pitching “win shares” to individual pitchers.  This is accomplished through an even more complicated formula that starts with each pitcher’s marginal runs allowed (same approach as team marginal runs not allowed), wins, losses, and saves.  Special consideration is given to relievers by estimating the number of high-leverage innings they pitched (ninth innings with one-run leads are more important than first innings with no score) and something called “component ERA” which is essentially ERA re-calculated according to the actual underlying run elements.
7.      Pitchers are deducted “win shares” if they are absolutely horrible hitters.  All these elements are then mixed together in a complicated formula to allocate pitching “win shares” to individual pitchers.  As in offensive “win shares,” any pitcher who gives up more than 152% of league-average “runs scored” does not receive any credit for pitching “win shares.”
8.      The most difficult step is allocating fielding “win shares” to fielding positions, and then to individual fielders.  The calculations differ for each position.  Essentially, James has selected four defensive stats to evaluate positions.  Here they are by position, listed in order of importance:
·        Catchers: caught stealing, errors, passed balls, and sacrifice hits allowed
·        First Basemen: plays made, errors, arm rating, and errors made by third basemen and shortstops
·        Second Basemen: double plays, assists, errors, and putouts
·        Shortstops: assists, double plays, errors, and putouts
·        Third Basemen: assists, errors, sacrifice hits allowed, and double plays
·        Outfielders: putouts, team DER, arm elements, and assists and errors
As stated earlier, the base for “win shares” is zero.  A player who has zero “win shares” is said to have contributed nothing to his team’s overall success.  Ten “win shares” represents the contribution of a decent regular player, starting pitcher, or closer.  Twenty “win shares” is the mark of an all-star player or Cy Young candidate and is considered a very good year.  Thirty “win shares” is an MVP-type year, reached by a handful of players each season.  With forty “win shares,” a player is considered to be having a historic season.  To continue with Barry Bonds’ 2002 season, he compiled 49 win shares, which equates to being solely responsible for about 16 wins.
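Steps 1 and 2 of the procedure, plus the conversion from win shares back to wins, can be sketched in Python. This is a simplified illustration and not James’s full system: it skips the ballpark adjustment, the team totals are hypothetical, and the function names are invented for the example.

```python
def marginal_runs(runs_scored, runs_allowed, league_avg_runs):
    """Step 1: offensive and defensive marginal runs (no park adjustment)."""
    offense = runs_scored - 0.52 * league_avg_runs
    defense = 1.52 * league_avg_runs - runs_allowed
    return offense, defense


def team_win_shares(runs_scored, runs_allowed, league_avg_runs, wins):
    """Step 2: split wins * 3 win shares in proportion to marginal runs."""
    offense, defense = marginal_runs(runs_scored, runs_allowed, league_avg_runs)
    total_shares = wins * 3
    offense_shares = offense / (offense + defense) * total_shares
    return offense_shares, total_shares - offense_shares


def win_shares_to_wins(win_shares):
    """A win share is one-third of a victory."""
    return win_shares / 3


# Hypothetical team: 800 runs scored, 700 allowed, 92 wins, league average 750.
off_ws, def_ws = team_win_shares(800, 700, 750, 92)
print(round(off_ws, 1), round(def_ws, 1))

# Bonds' 49 win shares expressed in wins.
print(round(win_shares_to_wins(49), 1))
```

For this made-up team the offense earns 410 marginal runs and the defense 440, so the 276 available win shares split slightly in favor of the defense.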
 
Park Factors
 
Within the last decade, the average fan has come to recognize that a ballplayer can have drastically different statistics depending on the park in which he plays his home games.  Coors Field in Colorado is probably the most recognizable example when discussing park effects.  The park has helped almost all Colorado Rockies players produce some ridiculous offensive numbers.  It makes the average Major Leaguer look like an all-star.  These days, whenever a player is evaluated, somebody seems to mention where he played and how that park may or may not have helped his statistics.
Some problems do exist in determining overall park factors.  One problem concerns whether or not parks can affect different players in different manners.  Another deals with whether or not to adjust individual pieces of statistics or just by total runs scored.  A third area of concern is whether or not to use just a single year’s worth of statistics from a park, or to use the data gathered from a three-year moving average.
To get the best possible grasp of a given player’s abilities, the parts of his raw statistics that clearly reflect his own ability have to be separated from the parts that reflect external factors.  So, if a park could misrepresent a player’s statistics by a wide enough margin, that distortion would have to be removed to get a clear and accurate view of the player’s abilities.
The concept was to use a player’s performance in the context of where he played his home games which would lead to a better understanding of how well or poorly that particular player performed.  One way to do this is by viewing the player’s home/road splits.  The big problem with this is that the sample size is not large enough to be considered significant.  An improved way is to total up how the entire team’s players performed at home and on the road and make adjustments from those collective averages.
A statistician by the name of Voros McCracken first came up with a method of being able to calculate the adjusted park factor for ballparks.  McCracken outlined the steps needed to form a clear cut number to assign for a ballpark (http://baseballstuff.com/mccracken/parkfact.html).  The steps are listed below:
1.      Gather the home and road stats for a team.
2.      Set up a “park neutral” total using these stats.
3.      Compare the actual stats to the park neutral stats and see if the actual stats could happen from chance from the park neutral ones.
4.      Establish levels below which the result probably didn’t happen from chance.
5.      Calculate park factors only for those parks that meet those levels, and adjust those slightly toward “neutral” as a nod to the fact we don’t know exactly what the actual effect is.
6.      Readjust the factors for everybody so the league average is neutral (If we only have one park rating as significant in a stat, it would be impossible to balance things without adjusting one of the neutral parks – this resolves that.)
These steps must be repeated for each statistical category that is to be examined.  A list of various park factors is located in Appendix C.
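As a rough illustration of steps 1, 2, 5, and 6, the Python sketch below computes a simple runs park factor from home and road totals, pulls it partway back toward neutral, and rebalances a set of factors so the league average is 1.0. It is not McCracken’s actual method: it skips the chance test in steps 3 and 4, and the function names, the 0.5 regression weight, and the sample numbers are all assumptions made for the example.

```python
def raw_park_factor(runs_at_home, home_games, runs_on_road, road_games):
    """Runs per game (both teams) at home divided by runs per game on the road."""
    return (runs_at_home / home_games) / (runs_on_road / road_games)


def regress_toward_neutral(factor, weight=0.5):
    """Keep only part of the observed effect; 1.0 means a neutral park."""
    return 1.0 + weight * (factor - 1.0)


def rebalance(factors):
    """Rescale all factors so that the league average is exactly 1.0."""
    mean = sum(factors.values()) / len(factors)
    return {park: value / mean for park, value in factors.items()}


# Hypothetical team: 900 total runs in 81 home games, 750 in 81 road games.
raw = raw_park_factor(900, 81, 750, 81)
adjusted = regress_toward_neutral(raw)
print(round(raw, 3), round(adjusted, 3))

# Hypothetical two-park league, rebalanced so the factors average 1.0.
print(rebalance({"Park A": 1.2, "Park B": 1.0}))
```

A factor above 1.0 marks a park that inflates run scoring, and a factor below 1.0 marks one that suppresses it.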
 
Minor League Statistics
 
Minor league statistics can prove to be just as useful as major league statistics when projecting a player’s future performance.  The minor league stats need to be adjusted to account for park factors as well as the stiffer competition in the major leagues.  Once this is accomplished, it is easy to see whether a prospect who is hitting well for the Triple-A club is hitting as well as or better than his counterpart on the major league roster, who is normally the more expensive player.  According to Bill James, “In my opinion, this is the most important thing that I learned in my years of studying sabermetrics in terms of its potential ability to help a baseball team.” (2004 Bill James Handbook, James).
Equivalent Average (EqA) is a park adjusted, league-average adjusted, competition level adjusted, and minor league adjusted statistic which is used to estimate how a minor league player would have performed with the major league ball club.  An EqA of .260 is considered to represent the average major league performance.  An EqA of .300 is considered excellent major league hitting, and poor major league pitching.  A hitter’s EqA is based largely on singles, doubles, triples, homers, walks, steals, caught stealing, and at bats.  A pitcher’s EqA is primarily based on earned run average, runs allowed, and estimated runs produced against the pitcher based on hits, walks, homers, and innings pitched – per 27 outs.
                
Age 
 
Studies have shown that the average major league baseball player tends to improve until the age of 27, after which his talent declines.  Bill James summarizes his own findings about age in one of his books, This Time Let’s Not Eat the Bones.
·        Almost every accomplishment (20-win seasons, 100 RBI seasons, 35-homer seasons, etc.) is more common at age 27 than at any other age.
·        The peak period for ballplayers is not ages 28 to 32, as was once believed, but instead is 25 to 29.
·        All players as a group retain 77% of their peak value at the age of 30, and barely over one-half of their peak value (53%) at the age of 32.
·        Contrary to popular belief, power pitchers age more slowly and last much longer than do ‘finesse’ or ‘control’ type pitchers.
 
Frequently Asked Questions about Sabermetrics
 
Do Major League franchises use sabermetrics?
 
Sabermetric theories are being used by only a few teams, mainly Oakland, Boston, Toronto, Cleveland, and most recently Los Angeles.  Teams with a smaller payroll have to invest their resources wisely in the amateur draft and minor league development.  These two areas are the cornerstones for building a winning franchise with little revenue.  A lackluster draft or incompetent player development in the minor leagues will severely hamper a low payroll team’s chances of success.
The major point in using these theories is to be able to recognize characteristics in certain players that other franchises tend to overlook or even miss entirely.  Sabermetric practices can be used to notice the very qualities that make a ballplayer worthy of being signed into an organization.  Having stats such as a high walk rate, good on-base percentage, or excellent groundball-fly ball ratio can point to a player’s skill that is not noticeable in a normal stat line.  By using the sabermetric stats, organizations tend to have a much better idea about the players they are signing to play for their teams.
Most of the franchises in Major League Baseball do not want anything to do with the lessons of sabermetrics.  As discussed earlier, Oakland has been the leading example of the power of these theories for the past five years.  In that span, operating with a $50 million payroll, the A’s have turned out several All Stars, a Gold Glove winner, two MVPs, and two Cy Young winners, and they have been to the playoffs in each of the past three seasons.  Boston General Manager Theo Epstein and Cleveland General Manager Mark Shapiro have also taken a sabermetric direction in shaping their teams.  Even though Boston is a high payroll team, Epstein’s trades and acquisitions have been very successful to date and nearly got the Red Sox to the World Series in 2003.  Mark Shapiro has built the Cleveland farm system into one of the top five systems through sabermetric practices over the past few years.  In early February 2004, Paul DePodesta, formerly Billy Beane’s assistant in Oakland, was hired as general manager of the Los Angeles Dodgers.
 
Why don’t more franchises use sabermetrics?
 
No clear reason exists why more clubs aren’t adopting these principles.  Some franchises don’t want to spend the time and effort needed to learn and understand the sabermetric principles.  Others simply don’t believe that statistics can judge a player’s ability.  Whatever the reason, there is still an established belief in the inner circles of baseball in doing things the way they have been done for many decades.  Those who hold it feel that approach has “worked” in the past and should continue to do so.

