November 20, 2004
No Confidence in Berkeley Study II
The problem I discussed in the previous post concerns how the regression results were reported in the Berkeley paper and whether the authors' claim -- that 130,733 Florida votes went to Bush instead of Kerry due to the voting technology in some counties -- is justified by their results.
Here I tackle a more fundamental problem: whether we should believe the regression results at all. In doing so, I am not saying that nothing went wrong in Florida; I am saying that the Berkeley paper is at best incomplete until further analysis is performed. To build a recount or a reform campaign around it at this point would be premature and risky. Because of the attention given to this study, we are in danger of doing just that.
The question posed by the paper is what effect using electronic voting had on the Bush vote in a Florida county. That is, did the use of electronic voting technology award Bush more votes than he would have received otherwise? The question implies a thought experiment: If County X had used optical scans instead of electronic voting, would Kerry have received more votes? The problem is we cannot answer that question directly; in effect, that would mean conducting two elections in County X, one with electronic voting and the other with optical scans, and comparing the results. Obviously, that is impossible.
In order to answer such a question, therefore, the regression model creates a quasi-experiment. Holding other factors equal, was the Bush vote higher in counties with electronic voting?
The Berkeley authors are well aware that one cannot answer such a question without holding other factors equal. The reason is that counties' decisions to use optical scans or electronic voting were presumably not random. Moreover, one or more underlying factors (e.g., the wealth of the county) may contribute both to the choice of technology and to the county's propensity to support one candidate or the other.
In specifying a regression model, variables are included which ought to account for these relationships. If the researcher has accounted for every meaningful relationship, then we can conclude that the remaining additional contribution of the "Electronic Voting" variable in fact helps explain the Bush vote and does not instead stand in for some other, hidden relationship. Whether we believe the claim that voting technology contributed to the disparity depends, therefore, on whether the authors have taken into account such alternate explanations.
I am not at all convinced that they have. Take two obvious problems. First, the Hispanic variable, which is meant to take into account the Latino population in a county, lumps Cubans with non-Cubans, even though the two groups have very different voting and residential patterns across counties. Second, no effort is made to distinguish counties in the Panhandle, even though historically their voting patterns are quite different from those in South Florida. That is to say, the issue in these counties is not merely that voters vote Republican, which the authors do control for, but that many do so while registered as Democrats. Thus, the authors cannot answer directly the Dixiecrat critique raised by others.
The results they get for the variable labeled "Electronic Voting" may or may not tell us the effect of voting technology at all. Instead, they may reflect the effect of some unspecified underlying variable, other than the voting technology used, that is correlated with the number of votes for Bush.
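To make the worry concrete, here is a minimal simulation sketch. It is not the Berkeley model and uses no real Florida data: the hidden trait `z`, the adoption probabilities, the vote shares, and the number of simulated units are all invented for illustration. The true effect of technology is set to zero, yet a regression that leaves out `z` still reports an apparent "technology effect."

```python
import random

def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X)b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def simulate(n=2000, seed=0):
    """Invented data: z drives both technology choice and vote share."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1.0 if rng.random() < 0.5 else 0.0            # hidden trait (hypothetical)
        tech = 1.0 if rng.random() < (0.8 if z else 0.2) else 0.0
        vote = 0.45 + 0.10 * z + 0.0 * tech + rng.gauss(0, 0.03)  # true tech effect is 0
        rows.append((z, tech, vote))
    return rows

rows = simulate()
y = [v for _, _, v in rows]
naive = ols([[1.0, t] for _, t, _ in rows], y)      # omits z
full = ols([[1.0, t, z] for z, t, _ in rows], y)    # controls for z

print("tech coefficient, z omitted: ", round(naive[1], 4))
print("tech coefficient, z included:", round(full[1], 4))
```

The point of the sketch is only that a coefficient on a technology variable can be sizable when technology itself does nothing, as long as some omitted factor is tied to both the choice of technology and the vote. Whether that is what happened in Florida is exactly what the paper leaves open.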
Another paper, by Jas Sekhon, proposes an alternative method to help overcome these problems. He uses a matching algorithm to compare similar counties that differ on the one dimension of interest here: their voting technology. By matching counties, he was able to simplify the analysis while at the same time controlling for confounding factors better than the Berkeley study does.
The method comes at a cost, as he admits. In order to find appropriate matches, he ends up with only 8 pairs of counties in the analysis, and 14 counties in all (two did double duty in the matched pairs). In other words, while his method provides greater precision in making comparisons across counties, 53 of Florida's 67 counties -- including the three large Dem-leaning counties that drove the Berkeley results -- are dropped. Sekhon notes that the only way to resolve such ambiguity would be to perform the same analysis at the precinct rather than county level.
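The trade-off can be seen in a toy sketch of nearest-neighbor matching with replacement. This is not Sekhon's actual algorithm, and the units, the single covariate `x`, the outcomes `y`, and the `caliper` threshold are all invented for illustration. Units with no close counterpart are dropped, and one comparison unit can serve in more than one pair.

```python
def match_with_replacement(treated, controls, caliper):
    """Pair each treated unit with its nearest control on covariate x.

    Controls may be reused; treated units with no control within
    `caliper` are dropped from the analysis.
    """
    pairs = []
    for t in treated:
        best = min(controls, key=lambda c: abs(c["x"] - t["x"]))
        if abs(best["x"] - t["x"]) <= caliper:
            pairs.append((t, best))
    return pairs

# Invented toy data: x is one background covariate, y the outcome.
treated = [
    {"name": "T1", "x": 0.30, "y": 0.52},
    {"name": "T2", "x": 0.32, "y": 0.55},
    {"name": "T3", "x": 0.60, "y": 0.61},
    {"name": "T4", "x": 0.95, "y": 0.40},  # nothing comparable among controls
]
controls = [
    {"name": "C1", "x": 0.31, "y": 0.53},
    {"name": "C2", "x": 0.58, "y": 0.60},
    {"name": "C3", "x": 0.10, "y": 0.70},
]

pairs = match_with_replacement(treated, controls, caliper=0.05)
used = {u["name"] for pair in pairs for u in pair}
avg_diff = sum(t["y"] - c["y"] for t, c in pairs) / len(pairs)

print("pairs:", [(t["name"], c["name"]) for t, c in pairs])
print("distinct units used:", len(used), "of", len(treated) + len(controls))
print("average matched difference:", round(avg_diff, 4))
```

In the toy data one control does double duty and two units drop out entirely, which is the same pattern, on a small scale, as ending up with 8 pairs drawn from 14 counties.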
Neither study should be taken as definitive. But, you might ask, why wait for the precinct data? Why not continue to press the claims in the Berkeley paper? Because if we hope that those in power will take recounts and reform seriously, our demands must be credible. Otherwise they will be all too eager to dismiss challenges as sour grapes, conspiratorial, and delusional, as some already have. If this is to be more than just a series of bitter gripes, we need strong evidence that the technology failed or was manipulated. So far we do not have such evidence, at least not that I've seen. And election reform should be about more than overturning Ohio or Florida; it should be about real reforms that provide transparency, accountability, and open access to the polls for all who are eligible.
03:06 PM in Politics | Permalink