(Updated 2/29/04 - Statistics still from 12/23/03)
Since the dawn of basketball statistical analysis, comprehensive ratings of basketball players have largely taken on the linear weights form. Pioneered by Dave Heeren, who developed TENDEX™, this form holds that players can be evaluated by some weight multiplied by each key thing they do on the court: +1 for each point, +0.75 for each rebound, +0.5 for each assist, and so on and so forth. In its most basic form, there is a weight of one attached to each factor involved. This formula, commonly known within the statistical community as "Manley Credits", is what the NBA uses as its "exclusive" efficiency rating system.
The problems with this form are many. As Dean Oliver demonstrated in Basketball on Paper, there are wide differences between the weights each system uses for each event. I've knocked linear-weights systems as "something a college kid could think up in his dorm room" -- based upon the fact that I indeed came up with my system, Value Over Replacement Player, in my dorm room while discussing it with others at the Hoopsworld message boards. The most fundamental problem, however, is that basketball doesn't work that way. So many hits are going to continually lead to a run in baseball, but this is not the case in basketball. An infinite number of rebounds won't be worth a single point on a team that can't score, and the value of possessions constantly changes as the game grows and matures. (That's the league's line, at least, on why scoring keeps going down.)

The biggest problem may be that these systems are ultimately focused on looking at points created (as one formula, the aptly-named "Points Created", explicitly states). While this is all based on the value of each possession an action creates, there is only partial accounting for possessions used, which are just as important as points created. The fundamental realization for baseball analysts was that outs were the most precious thing in the sport. In the same way, possessions drive everything else in basketball. Teams aren't trying to score on defense, but they are trying to keep the other team from making use of its possessions, just as they try to make use of their own possessions on offense and create extra possessions with their rebounding.

It is worth an aside here to point out that there are differences amongst NBA statistical analysts in what, exactly, a "possession" denotes. In its simplest form, the difference of opinion is whether an offensive rebound starts a new possession. The implication is whether rebounding is a part of offense and defense, or its own separate category.
Most people tend to fold rebounding into offense and defense. I disagree. I don't know if it's because of how I learned the game, or what, but I think of four different categories: offense, offensive rebounding, defense, and defensive rebounding. That's how I evaluate the game, and it makes a huge difference.

Another point worth making is that teams aren't evaluated strictly by their points scored and allowed -- at least not by statistical analysts. Instead, we measure how well they make use of their possessions. Why shouldn't the same theory hold at an individual level? It is well acknowledged that we can do a pretty good job of evaluating offense and rebounding. Defense is a different story, enough so that I have at times considered giving up the effort of creating a comprehensive rating system. However, at this point I feel comfortable enough with my defensive ratings to share them with you. There are still players who will be dramatically mis-rated, but the current system is better than anything I've done before.

How, then, do we put it all together? How important is rebounding relative to offense and defense? For my answer, I turn to teams. We can pretty easily evaluate the relative importance of these attributes of teams through the use of multiple regression. I used this on teams from 1989-90 through 2002-03 to derive the weights I use on the ratings I calculate. The key step in this whole process, which I first realized a couple of years ago, was that we can conceptualize a team made up of a selected player and four average starters. This allows us to use the team rating to evaluate the player. I'll now take a step-by-step look at how I evaluate each part of the game, beginning with offense.

Offense

As I pointed out earlier, rating offense comes down to two things: points and possessions. Neither is really extraordinarily difficult to calculate. For points, we start with (duh) points. To this, we add some credit for assists. I use a value of 0.5 for each assist.
This is less than other statistical analysts, like Dean Oliver (.75) and John Hollinger (.66), use, because I've found guards to be overrated on offense by this system using those values. Any value for assists is a complete guess barring a detailed study on how a pass changes a player's likelihood of scoring (which we're working on, but it takes time). We also have to take away 0.5 points for each assisted field goal made. While data on assisted field goals is now available at the invaluable 82games.com, I don't particularly care to type it in for every player. It also is not available for any players prior to last season, or for non-NBA players if we care to rate them. Thus, what I did was get the assisted field goal data for each player last season and create a regression to estimate it using the share of his team's assists the player distributes, the team's assisted field goal percentage, and the player's free throw rate (this is important because players who drive the lane tend to create their own shots more often than perimeter shooters). The formula, should you be interested in replicating my results, is 0.42 - (.63 * ((AST/MIN)/(TMAST/(TMMIN/5)))) - (.89 * (FTA/MIN)) + (.67 * (TMAST/TMFGM)). This estimate is multiplied by field goals made and by 0.5 and subtracted from the prior point total to give us an estimate of offensive points created.

Possessions are similarly easy. We begin with the general formula for possessions -- FGA + (.44*FTA) + TO (other people use different free throw weights, but the general form is the same). To this, we add .25 for each assist and subtract .25 for each estimated assisted field goal. I use one-half the points value so that assists do have a significant impact on overall rating. If we didn't do this, ratings would go largely unchanged. A player's individual rating is simply points created divided by possessions used (multiplied by 100, as I strictly work in points per 100 possessions, on a team or individual level).
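The individual offense calculation described above can be sketched roughly as follows. This is only an illustration of the arithmetic in the text, not the author's actual spreadsheet: the function and argument names are made up for the example, and the sample numbers in the usage note are invented.

```python
def estimated_assisted_share(ast, minutes, fta, tm_ast, tm_min, tm_fgm):
    """Regression estimate of the share of a player's made field goals that were assisted.

    Coefficients are the ones quoted in the text; argument names are hypothetical.
    """
    # Player's assist rate relative to his team's assists per game-minute.
    assist_share = (ast / minutes) / (tm_ast / (tm_min / 5))
    return (0.42
            - 0.63 * assist_share
            - 0.89 * (fta / minutes)
            + 0.67 * (tm_ast / tm_fgm))


def individual_offense(pts, fga, fta, tov, ast, fgm, assisted_share):
    """Return (points created, possessions used, rating per 100 possessions)."""
    assisted_fgm = assisted_share * fgm
    # Credit 0.5 points per assist, give back 0.5 per assisted field goal.
    points_created = pts + 0.5 * ast - 0.5 * assisted_fgm
    # Possessions use half the points weight (0.25) for assists.
    possessions = fga + 0.44 * fta + tov + 0.25 * ast - 0.25 * assisted_fgm
    return points_created, possessions, 100 * points_created / possessions
```

For instance, a made-up player with 1000 points, 800 FGA, 400 FTA, 150 turnovers, 200 assists, 360 made field goals and an assisted share of 0.5 comes out to 1010 points created on 1131 possessions, or about 89.3 points per 100 possessions.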
However, we still have to calculate a team rating. The first task is to figure out how large the player's role in his offense is. This is determined by finding the percentage of his team's possessions he uses: Pos% = (Pos/Min)/(TmPos/(TmMin/5)). The formula for the team's offensive rating (OffRtg), then, is Ind*Pos% + (1-Pos%)*LgOff, where Ind is the individual rating and LgOff is the league offensive rating per 100 possessions. However, there is one slight adjustment I make to reflect what we know from reality -- that the more possessions a player uses, the less efficient he is. By extension, the more possessions a player uses, the more efficient his teammates are. Therefore, I add .25 points * (Pos% - 0.2) to the league average offensive rating. A player who uses exactly his expected percentage, 20%, of his team's possessions, goes unchanged. A player like Allen Iverson, however, who uses a high proportion of his team's possessions, gets rewarded for keeping other players from taking worse shots. I think this is fair and necessary to make this rating system work.
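A minimal sketch of the team offense step, under stated assumptions: the names are hypothetical, and Pos% is treated as a fraction (0.2 for an average player) exactly as the formula is written. The text does not say whether the .25-point adjustment is meant per fraction or per percentage point, so the scale used here is an assumption.

```python
def possession_share(pos, minutes, tm_pos, tm_min):
    """Pos%: share of team possessions the player uses while on the floor."""
    return (pos / minutes) / (tm_pos / (tm_min / 5))


def team_offense_rating(ind_rating, pos_share, lg_off, usage_bonus=0.25):
    """Blend the player's rating with four league-average teammates.

    The league rating is nudged by usage_bonus * (Pos% - 0.2), read literally
    from the text with Pos% as a fraction (an assumption about units).
    """
    adjusted_lg_off = lg_off + usage_bonus * (pos_share - 0.2)
    return ind_rating * pos_share + (1 - pos_share) * adjusted_lg_off
```

With a 20% possession share the adjustment vanishes, so a player rated 110 on a league average of 103 gives 0.2 * 110 + 0.8 * 103 = 104.4.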
Offensive Rebounding

An individual player's offensive rebounding percentage is quite easy to determine. One simply needs to estimate offensive rebound opportunities by ((TmOR + TmOppDR)/(TmMin/5))*Min. The player's offensive rebounds are divided by this projection of opportunities.

Things get a little trickier when we add in teammates. For one thing, I have deemed it necessary to consider the position of teammates. Clearly, in reality we wouldn't add a center to a team of four completely average starters; he'd be added to an average power forward, an average small forward, and so on. I think it's fair to normalize offensive rebound percentages by considering the average performance at the position. One thing I always have in mind in rating players is being fair to all types of them. Bill James discusses this, and I believe it's critical. One of my biggest problems with pure TENDEX formulas is how much they overrate big men. Yeah, we can artificially adjust for that, but it just doesn't feel right. This does. What IS arbitrary is that I don't adjust for defensive rebounds. This is for two reasons. One, I couldn't get it to work. Two, I needed to adjust offensive rebounds to be fair. I didn't feel I needed to do that with defensive rebounds.

Anyways, back to the math. Weighting by minutes the offensive rebounding percentage of the players deemed to play each position (admittedly quite arbitrary), I come up with the percentage the players at the other four positions would grab. I next want to take into account the fact that rebounding is not quite linear -- a really good rebounder will also take away some rebounds that his own team would have gotten anyway. What this requires is adjusting this position factor (PosOR%) with the following formula: (5*PosOR%)/(5-(1.25*PosOR%)). This is then multiplied by (1-OR%) and added to the player's offensive rebound percentage.
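The offensive rebounding steps above might look like this in code. It is a sketch under assumptions: the names are invented, percentages are expressed as fractions, and PosOR% is taken to be the combined share the other four positions would grab, with the quoted formula applied to it as a transformation.

```python
def offensive_rebound_pct(orb, minutes, tm_orb, tm_opp_drb, tm_min):
    """Player's offensive rebounds divided by estimated opportunities."""
    opportunities = ((tm_orb + tm_opp_drb) / (tm_min / 5)) * minutes
    return orb / opportunities


def team_offensive_rebound_pct(or_pct, pos_or_pct):
    """Player's OR% plus what four position-average teammates add.

    pos_or_pct is the combined share the other four positions would grab.
    """
    teammate_factor = (5 * pos_or_pct) / (5 - 1.25 * pos_or_pct)
    # Teammates only get a crack at rebounds the player did not grab.
    return or_pct + teammate_factor * (1 - or_pct)
```

As a sanity check on the design, if every position rebounded identically at 5.76% (so the other four combine for 23.04%), the imaginary team comes out at 28.8%, which is five times the individual rate, as it should.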
Defensive Rebounding

Defensive rebounds work virtually the same way, with the elimination of the position factor. Instead we use the league's average defensive rebound percentage (at the moment, 71.2%) and adjust it like this: (4*LgDR%)/(5-LgDR%). This is multiplied by (1-DR%) and added to the player's defensive rebound percentage.

The reason for that silly looking formula is that simply using the raw rebounding percentage and multiplying by the leftover rebounds will be too low. The formula is derived from the fact that a player who is a league-average rebounder should have a team with league average rebounding. A league-average player grabs LgDR%/5 himself, so his four teammates have to grab the remaining LgDR% - (LgDR%/5) out of the 1-(LgDR%/5) he leaves behind. So the formula you end up with is (LgDR% - (LgDR%/5))/(1-(LgDR%/5)). Derive that out and you get the formulas I have.

Defense

I've saved the worst for last. Defense has been a challenge for analysts to rate since they've begun thinking about it, and will continue to be until the league starts tracking more defensive statistics -- and probably still then as well.

Here is the fundamental thing I had to realize to evaluate defense. All defensive possessions end with one of the following outcomes:
- A steal
- A blocked shot
- A forced turnover (one that is not a steal)
- A trip to the free-throw line
- A field goal attempt that is not blocked
That's it. So what we have to do is estimate how often each of those things occurs, and how many points result on average (from field goal and free throw attempts; obviously no points result from the others). The NBA is nice enough to track two of these five things for us, steals and blocks. To evaluate these, we need to first estimate defensive possessions by Min*((TmOppFGA + (.44 * TmOppFTA) + TmOppTO)/(TmMin/5)). We add the player's blocks and steals and divide by estimated defensive possessions to come up with the percentage of possessions he ends himself with a recorded defensive play, then add to that 80% of the league average of blocks + steals per possession (which the other four imaginary teammates are theoretically generating). That step is done.

Next we need to look at fouls. We can estimate how many possessions each foul results in by the formula (.44*LgFTA)/LgPF. This ratio is multiplied by the player's personal fouls (PF) and divided by the estimated defensive possessions the player has participated in. To this, again we add 80% of the league average for the percentage of possessions that end in free-throw attempts.

These are the purely individual categories. The other two reflect team defense, and are not individually counted in the statistics. Some people would assign the team's average ratings to the individual, and call it done. I've done that in the past. However, this is fraught with problems, the most obvious being, why should Devin Brown get credit for the Spurs' starters being the best defense in the league? (That's no affront to Brown, it's just obvious that he is not the reason the Spurs defense is the best in a decade.) What I've done is create what I call a "team defense factor" (TDF), which is simply Min/TmMin. If a player were on the court every minute of every game, this ratio would be 20% (TmMin includes minutes for all five players on the court, which is why you repeatedly see it divided by five in previous calculations).
Self-explanatorily, this determines how much of the team's defense the player gets credit for. What we're assuming is that each player is equally responsible for the team defense his team plays while he's on the floor. That's not true, of course, but it's probably the best we can do given available individual and team defensive statistics. A player who averages 36 minutes and has played in every game will have team defense ratings composed of 15% of his team's performance plus 85% of the league rate.

A forced turnover shows up in the statistics as an opponent turnover which is not a team steal. Thus, the formula is (TDF*((TmOppTO-TmStl)/TmOppPos)) + ((1-TDF)*(LgAvgFTO/Pos)). In the same manner we can evaluate how many points each non-blocked field goal attempt results in, using the formula (TmOppPts - TmOppFTM)/(TmOppFGA - TmBlk). This is again weighted by the team defense factor, with the other percentage accounted for by the league's average points per non-blocked field goal attempt.

Putting it all together. We have percentages of how often the player's imaginary team does not allow the opposition to score through either a steal, a block, or a forced turnover. We also have the percentage of the time they send the opposition to the line. If we add these and subtract from 100%, we estimate how often the imaginary opposition gets off a real shot. We then factor in the estimated points per attempt and, voila, an overall rating that is pretty complicated in application but theoretically simple:

TmDef = (FTArate * PPFTPos) + (1 - FTArate - STrate - BLrate - TOrate) * PPFGA

Does this work? I think it does, in general. Looking at the ratings from this season so far, most of the players near the top of the league are considered outstanding defenders.
There are notable exceptions -- Chris Andersen, courtesy of a high block rate and a strong team defense, is rated the best defender in the NBA, and Shawn Bradley, much to my chagrin, shows up in the top ten -- but for the most part these are guys you'd agree with. Ben Wallace is number two, and most of the all-defensive big men -- Tim Duncan, Kevin Garnett, Ron Artest -- rank in the top twenty. Some of the "exceptions" are guys who are arguably just underrated, like Elton Brand and Shawn Marion.
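To make the defensive pieces concrete, here is a rough sketch of how they could be assembled. The function and argument names are hypothetical, steals and blocks are folded into one combined rate for brevity, and the points per free-throw possession (PPFTPos), which the text does not define, is simply taken as an input. The result is per possession; multiplying by 100 to match the per-100 scale used elsewhere is an assumption.

```python
def defensive_possessions(minutes, tm_opp_fga, tm_opp_fta, tm_opp_to, tm_min):
    """Estimate the defensive possessions a player was on the floor for."""
    tm_opp_pos = tm_opp_fga + 0.44 * tm_opp_fta + tm_opp_to
    return minutes * tm_opp_pos / (tm_min / 5)


def individual_rate(player_events, def_pos, lg_rate):
    """Player's own rate plus 80% of the league rate for four average teammates.

    player_events is steals + blocks, or PF * (.44 * LgFTA) / LgPF for fouls.
    """
    return player_events / def_pos + 0.8 * lg_rate


def team_blend(tdf, team_value, lg_value):
    """Weight a team-level number by the team defense factor (Min/TmMin)."""
    return tdf * team_value + (1 - tdf) * lg_value


def team_defense(ft_rate, pp_ft_pos, st_bl_rate, to_rate, pp_fga):
    """Points allowed per possession by the player's imaginary team."""
    shot_rate = 1 - ft_rate - st_bl_rate - to_rate
    return ft_rate * pp_ft_pos + shot_rate * pp_fga
```

For the 36-minute player mentioned above, the team defense factor is 36/240 = 0.15, so a team forced-turnover rate of 10% against a league rate of 8% blends to 8.3%.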
Going to Wins

One of the things I like about this system is that it produces a tangible result. 300 VORP? 24 Efficiency points per game? Nobody knows what those mean. Anybody can interpret a winning percentage. The formula I've derived is:

Win% = -0.919 + (0.031 * OFFRTG) - (0.030 * DEFRTG) + (1.346 * OFFREB) + (1.354 * DEFREB)

Though these values cover a long period over which the game has changed, they still produce results centered around .500; I don't see anything wrong with using them. Okay, I've given you a lot of theory and a lot of formulas. Now, let's look at some examples, starting with the overall winning percentage:
Player             Team  Pos  Win%
Kevin Garnett      MIN   PF   0.791
Tim Duncan         SAN   PF   0.752
Andrei Kirilenko   UTA   SF   0.739
Elton Brand        LAC   PF   0.695
Jason Kidd         NJN   PG   0.686
Paul Pierce        BOS   SG   0.685
Shaquille O'Neal   LAL   C    0.674
Sam Cassell        MIN   PG   0.671
Baron Davis        NOH   PG   0.665
Tracy McGrady      ORL   SG   0.660
Garnett and Duncan. Tough to argue with them. The win percentages, particularly for Garnett, may be a little high. That works out to 65 wins, which is quite a lot indeed. When a similar system had Duncan and four average players winning 73 games in 2001-02, Oliver wondered whether Duncan was better than Michael Jordan or whether MJ just happened to have below-average teammates when the Bulls won 72. In my defense, I would point out that this rating presumes that the player would play 48 minutes per game, which obviously is not the case.
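For readers who want to plug numbers in, the winning percentage formula and the conversion to wins over a season can be sketched like this. The names are hypothetical, the rebounding percentages are assumed to be fractions (0.712 rather than 71.2), and the 82-game season is the only figure not taken from the regression itself.

```python
def win_pct(off_rtg, def_rtg, off_reb, def_reb):
    """Winning percentage from the regression weights quoted in the text.

    Ratings are per 100 possessions; rebound percentages are fractions.
    """
    return (-0.919
            + 0.031 * off_rtg
            - 0.030 * def_rtg
            + 1.346 * off_reb
            + 1.354 * def_reb)


def wins_per_season(pct, games=82):
    """Convert a winning percentage into wins over a full schedule."""
    return pct * games
```

Garnett's 0.791 over 82 games is 64.9 wins, which is where the 65 above comes from.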
Returning to this list, the three biggest surprises are Kirilenko, Brand, and Cassell. Kirilenko may not be so surprising after his outstanding start to the season and his pair of 5x5s. Kirilenko is a great, versatile player, but I'm not quite ready to anoint him the NBA's best small forward. Brand's season is in its early stages because of the broken bone he suffered in Japan, but he's playing great ball and has been underrated for a long time. It's gone unnoticed, but Cassell is having an outstanding year: How does 19.8 ppg on 49.8% shooting, 7.5 apg with a 3.23 assist/turnover ratio sound to you? That's pretty tasty to me.
The rest of the guys aren't really surprises. I have half-jokingly evaluated rating systems in the past by whether they had Shaq number one, but it's early and his numbers haven't been great for him so far.
There are many other ways we can use this winning percentage to evaluate players. There are two in particular I'm going to take a look at. The first is what I call NetWins. It is found by (Win% - .500)*(MIN/48). The minutes portion looks at how many "games" a player has played, and the whole formula shows how much better (or worse) he's been than an average team. This can be thought of as a "games above .500" for a player, and is a good summary number because it takes playing time into account.
The top ten:
Some slight shifting, with Marbury -- who led the NBA in minutes through Sunday -- replacing Brand, who has been injured far too much to qualify.
The other summary method is WARP -- Wins Above Replacement Player. As of yet, I don't have any win percentage replacement theory developed. I'm simply using 0.35 instead of 0.5 in the same formula as above, which increases the relative importance of minutes played.
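NetWins and WARP share one formula and differ only in the baseline, so a single helper covers both. This is a sketch with made-up names; the 0.500 and 0.35 baselines are the ones given in the text.

```python
def wins_above_baseline(win_pct, minutes, baseline):
    """(Win% - baseline) * (MIN/48): wins above a baseline team over the minutes played."""
    return (win_pct - baseline) * (minutes / 48)


def net_wins(win_pct, minutes):
    """Games above .500."""
    return wins_above_baseline(win_pct, minutes, 0.500)


def warp(win_pct, minutes):
    """Wins above a 0.35 replacement-level team."""
    return wins_above_baseline(win_pct, minutes, 0.35)
```

A hypothetical player with a 0.791 winning percentage and 960 minutes (20 full "games") would have 5.82 NetWins and 8.82 WARP. The lower baseline adds 0.15 wins for every 48 minutes played, which is why heavy-minutes players climb the WARP list.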
Shaq, who hasn't played very many minutes, drops out and is replaced by Marion, who has. (Anyone think the Suns might not have very much depth?) Not big changes here, but the increasing importance of minutes played has shifted Davis near the top of the list -- the talk of him as an MVP candidate is definitely realistic so far, though one really has to wonder about his minutes given his balky back.
As a quick note, one player you may be surprised not to see is Sacramento's Peja Stojakovic, who just misses the cut on several lists. Stojakovic's defensive rating isn't quite where I'd like to see it, given his burgeoning defensive reputation. Kobe Bryant is another notable near-miss. Bryant has been hurt by surprisingly mediocre defensive rebounding so far this season.
Another thing we can do with overall numbers is use a tool I introduced in my column, offensive/defensive bias. That's found by (OFFRTG - LGAVG) - (LGAVG - DEFRTG). The players the most biased to offense, by this method:
Olowokandi, alas, has been as bad on offense as he has been good on defense (winning percentage - 31.4%).
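The bias measure is a one-liner, sketched here with hypothetical names. Positive values mean a player's offense is further above average than his defense; negative values mean the reverse.

```python
def offense_defense_bias(off_rtg, def_rtg, lg_avg):
    """(OFFRTG - LGAVG) - (LGAVG - DEFRTG); lower DEFRTG means better defense."""
    offense_above_avg = off_rtg - lg_avg
    defense_above_avg = lg_avg - def_rtg
    return offense_above_avg - defense_above_avg
```

With an invented league average of 103, a player rated 108 on offense and 105 on defense has a bias of +7, while one rated 98 and 99 comes out at -9.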
Top Rookies?
Don't be too concerned with Anthony's and Bosh's mediocre NetWins ratings; it's rare for rookies seeing heavy action to crack the .500 mark, and they are in fine shape. At age 18, meanwhile, LeBron looks suspiciously like the second coming. Frahm didn't make the overall winning percentage list because he's played too few minutes, but he has been absolutely on fire all season long.
Let's break this down by top offensive and defensive players:
Clearly, the offensive ratings favor perimeter players, the defensive ratings interior ones. I think this is appropriate. Think about it -- perimeter-based teams tend to be better on offense than on defense, while post-based teams like the Spurs tend to win with defense. The Spurs have a third player, Manu Ginobili, just outside the top ten. Any questions on why they're so good on d?