1. Defensive metrics make a player significantly more/less valuable in WAR.
2. You have to look at a larger sample size to get a true reading.
3. You have to pick one or the other (bWAR or fWAR) to get a read on pitching.
So I would say:
1 is usually not true, and WAR tends to get better over larger sample sizes. But yeah, it's a little weird. Frankly, I often just look at oWAR, because fielding metrics are A) pretty bad and B) slow to converge; the sample sizes needed are huge, on the order of three seasons.

Which brings us to 2, which is correct. But you need big sample sizes for other stats to be meaningful as well. Sure, the "concrete" stats are arithmetically right at all times, and in that sense WAR is probably fairly close to right in most cases too. But we all know that Shane Spencer is not really a .375/.400+/.900+ hitter. Those were his real, concrete, easily calculable numbers in 1998, but as much as jtp seems to want to argue for the validity of more "concrete" stats, they clearly weren't meaningful as a representation of his talent level. Sample size always matters. See this:
Sample Size. It takes over a season for batting average to converge, which should not be surprising. George Brett wasn't really a .390 hitter.
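To put a rough number on that (my own illustration, not from the linked piece), here's a quick simulation assuming a hypothetical true-talent .300 hitter. The point is just how wide the spread of observed batting averages stays at small at-bat counts:

```python
import random

random.seed(42)
TRUE_BA = 0.300  # assumed true talent level for this sketch

def observed_ba(n_ab):
    """Simulate n_ab at-bats as coin flips and return the observed average."""
    hits = sum(random.random() < TRUE_BA for _ in range(n_ab))
    return hits / n_ab

# roughly: a hot month, half a season, a full season, three seasons
for n_ab in (50, 150, 600, 1800):
    samples = [observed_ba(n_ab) for _ in range(10_000)]
    mean = sum(samples) / len(samples)
    sd = (sum((x - mean) ** 2 for x in samples) / len(samples)) ** 0.5
    print(f"{n_ab:5d} AB: observed BA spread (1 SD) ~ {sd:.3f}")
```

At 50 AB the one-standard-deviation spread is around 65 points of batting average, so a .300 hitter routinely "hits" .240 or .365 over that span; even a full season leaves roughly 20 points of noise. That's the George Brett point in miniature.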
Point 3 is absolutely true. Both are, in their own ways, meaningful measures of how much value a pitcher produced. Baseball-Reference's bWAR is much more concrete: it measures the value he actually produced, based on runs allowed. FanGraphs' fWAR attempts to calculate how much value he "should have" produced, based on FIP. fWAR is slightly more predictive, but it misses badly on a lot of pitchers, so I really don't like it. FIP is just a stupid stat. So I much prefer bWAR for pitchers. OTOH, I do think FanGraphs does a better job with dWAR. But it still sucks.
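For anyone who hasn't seen it, the standard FIP formula only counts strikeouts, walks, hit batters, and home runs, which is exactly why it misses on pitchers who manage contact well (or poorly). A minimal sketch, with a made-up pitcher line for illustration (the constant, roughly 3.1, varies by year to scale FIP to league ERA):

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching. Only HR, BB, HBP, and K enter the
    formula; all balls in play are ignored by construction. The constant
    (league-dependent, ~3.1) puts the result on an ERA scale."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant

# hypothetical season line: 25 HR, 50 BB, 5 HBP, 200 K in 200 IP
print(round(fip(25, 50, 5, 200, 200), 2))  # -> 3.55
```

Two pitchers with identical K/BB/HR lines get identical FIPs even if one allowed 40 more runs on balls in play, which is precisely the concreteness gap between fWAR and bWAR described above.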