Really superb new piece by Ansolabehere and Hersh in the forthcoming Political Analysis. While the underlying technology is pretty fierce, Steve and Eitan do an excellent job, I think, in making the material accessible. Anyone who has been skeptical about survey self-reports should read the paper; it provides optimism and pessimism on both sides.
From the abstract:
Social scientists rely on surveys to explain political behavior. From consistent overreporting of voter turnout, it is evident that responses on survey items may be unreliable and lead scholars to incorrectly estimate the correlates of participation. Leveraging developments in technology and improvements in public records, we conduct the first-ever fifty-state vote validation. We parse overreporting due to response bias from overreporting due to inaccurate respondents. We find that nonvoters who are politically engaged and equipped with politically relevant resources consistently misreport that they voted. This finding cannot be explained by faulty registration records, which we measure with new indicators of election administration quality. Respondents are found to misreport only on survey items associated with socially desirable outcomes, which we find by validating items beyond voting, like race and party. We show that studies of representation and participation based on survey reports dramatically misestimate the differences between voters and nonvoters.
It’s currently free access at http://pan.oxfordjournals.org/content/20/4/437.abstract
Charles Stewart has been updating the Florida early voting returns on a daily basis, so I’m not going to reproduce his work here. Readers will have to be satisfied with a lame bar chart.
Getting to the Florida files proves to be a lot more complicated than getting to North Carolina’s. Florida is probably the second-easiest state to work with, so that tells you how difficult, opaque, and at times expensive it can be to work with voter files. I look forward to a day when states agree on common data formats, or at least make voter files more readily accessible.
The first challenge in Florida is that 67 separate early voting files need to be “harvested” from the Elections website. This is more complicated than it might appear at first blush, but web harvesting is an important skill for anyone who works with data from the web.
The attached PowerPoint illustrates the steps, including some power-user Unix commands to quickly manipulate the files in a terminal window. These steps can be performed using a graphical user interface in Windows or on a Mac, but, as with web harvesting, anyone who manipulates data files of this size and number needs to learn (or relearn) the command line.
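For readers who would rather script the harvest than click through 67 downloads, here is a minimal Python sketch of the idea. It is not the method shown in the PowerPoint. The URL pattern and the county codes are hypothetical placeholders (the real file locations are not given in this post), and the `fetch` argument exists only so the loop can be tried without touching the network.

```python
import urllib.request
from pathlib import Path

# Hypothetical URL pattern; substitute the real location of the county files.
URL_TEMPLATE = "https://example.org/earlyvoting/{county}.txt"


def fetch_url(url):
    """Download one file and return its contents as bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def harvest(counties, out_dir, fetch=fetch_url):
    """Save one file per county into out_dir and return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for county in counties:
        path = out_dir / f"{county}.txt"
        path.write_bytes(fetch(URL_TEMPLATE.format(county=county)))
        paths.append(path)
    return paths


def combine(paths, combined_path):
    """Concatenate the county files into one file, like `cat *.txt > all.txt`."""
    with open(combined_path, "wb") as out:
        for path in paths:
            data = Path(path).read_bytes()
            # Make sure each file ends with a newline so records do not run together.
            out.write(data if data.endswith(b"\n") else data + b"\n")
    return combined_path
```

Calling `harvest(codes, "raw")` with a list of county codes and then `combine(paths, "all.txt")` on the result would leave a single file ready to read into a statistical package. If the county files carry header rows, those would need to be dropped before combining.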
Processing the files in Stata turns out to be relatively simple, since the files are all formatted the same way. Click here for the do file.
Enjoy!
Nice posting by Peter Hamby of CNN:
Just got this email:
Dear Prof. Gronke,
I, and my colleagues, have been unable to satisfactorily answer this question, “If by law voting shall be held on the “first Tuesday after the first Monday in November” how is it that we have ‘early voting?'” My state of Michigan does not have early voting so I/we don’t have any first hand experience with this practice. I first posed this question to Dr. Michael McDonald and he replied that early voting is legal as the result of a Supreme Court case involving the state of Oregon, but he was unable to recall the name of the case. As an expert in these matters, I was hoping that you might be able to provide the name of the case so that I can do further research and be better able to supply an informed answer.
Thank you in advance,
Here is my response. This probably should be part of another FAQ.
Thanks for the great question.
The time, place, and manner of holding elections, as you are aware, are “prescribed in each state”, but “Congress may at any time by law make or alter such Regulations” as specified in Article I, Section 4 of the Constitution. The law establishing the first Tuesday after the first Monday in November was passed in 1845 (see here if you’re going to show this in a class, it’s pretty cool): http://memory.loc.gov/ll/llsl/005/0700/07590721.tif
The legal basis for early voting is actually a case involving Texas. In 1999, the Voting Integrity Project sued the Secretary of State of Texas, charging that early voting in the state violated 2 USC 7 (the statute shown above).
The District Court denied the motion for summary judgment, the 5th Circuit affirmed, and the Supreme Court declined to review.
The 5th Circuit decision is here: http://federal-circuits.vlex.com/vid/voting-integrity-project-elton-bomer-18387341
Texas’s argument, affirmed by the court, is that because the election is not decided or “consummated” before the first Tuesday, early voting does not conflict with the federal statute. This relies on a very specific meaning of “election,” which the court drew from the Foster case. In the court’s words:
Foster is instructive on the meaning of “election.” 522 U.S. at 68, 118 S.Ct. at 466. The Court observed first that the term “election” in federal election statutes “plainly refer[s] to the combined actions of voters and officials meant to make a final selection of an officeholder.” Id. at 71, 118 S.Ct. at 467. In striking down Louisiana’s open primary statute, the Supreme Court held only that elections must not be “consummated” before federal election day. Id. at 72, n.4, 118 S.Ct. at 468.
It may interest you to know that jurisdictions honor the letter of the law in another way. While it is true that citizens can cast early ballots, these ballots are not actually tallied until Election Day. In some states, your vote sits on an electronic memory card. In Oregon, the physical ballots are not even scanned until Election Day.
I’ve always been proud of the description bestowed upon me by John Lindback, previously the director of elections for the State of Oregon and now a senior officer in the Elections Initiatives at the Pew Center on the States.
John once introduced me by saying: “Paul Gronke, who is frustratingly even handed with respect to vote by mail.”
Today’s posting is in John’s honor. It doesn’t make an argument for or against voting by mail, but it does show how well VBM can work in a mature system, and how many of the concerns that have been expressed about “early” early voting simply aren’t an issue in the Beaver State. (For illustrations, see CNN’s election blog here, or Bloomberg here which I addressed earlier here.)
After 12 years, how many Oregonians are “early” early voters?
Today’s Oregonian reports that 30% of registered voters in the state had returned their ballots by the close of business on Tuesday, one week before the election.
This return rate is comparable with past elections. Column 7 in this Table from the Secretary of State’s office shows ballot returns one week out: 34% (2012 primary), 26% (2010 general), 30% (2010 primary), 42% (2010 January special), 29% (2008 general).
Oregon’s voter turnout is high (69.3% of the voting-eligible population and 85.7% of registered voters), so it’s not the case that these ballot totals reflect a disengaged electorate.
The facts are these:
- In a state that has had one of the most liberal early voting regimes for 12 years, as few as 25% and seldom over 40% of ballots arrive a week before Election Day.
- In most elections, approximately 25% (18-38%) of ballots are returned on Election Day.
- Finally, as I’ve blogged about previously, Oregon (and Washington) somehow manage to make this all work even though they mail their domestic ballots approximately two weeks before Election Day, the shortest by-mail transit time in the country.
Say what you will about vote by mail, but make sure what you say comports with the facts on the ground.
A hat tip for today’s posting goes to Charles Stewart of MIT, whose “Political Science Laboratory” course inspired me to engage my introductory statistics students in data management using real data sources.
Regular readers of this blog may have seen graphics plotting the daily ballot returns from North Carolina. The graphics are identical to the kind of ballot chasing engaged in by the presidential campaigns, and really any campaign in a state with substantial early voting.
The ballot return information is a public record, and theoretically, any citizen, organization, or campaign should have equal access. Unfortunately, things aren’t so simple. As Michael McDonald reports:
Election officials may not report early voting statistics. I attempt to collect as much of the information about these ballots as possible. However, I do not hound election officials for these statistics because they are busy doing the important work of preparing for the upcoming election. Sometimes data will be available only at the local level. I cannot continuously scan for local data, so I appreciate tips on where to find data.
I wish every state made these data available as a free electronic download. If your state does not, I urge you to contact your state legislator and ask why not.
But suppose you do have these data: what do you do with them?
It turns out that it’s not very hard to go from individual level vote reports to turnout information, if you have the right toolbox. The tool you need is a statistical program capable of reading in datafiles that have hundreds of thousands of cases. That’s too many for Excel. The most commonly used packages in political science are Stata (the example shown below) and R. (The big advantage of R is that it is publicly available, but I’m not conversant yet with the software. My hopes are that some entrepreneurial reader of this blog will translate the Stata code into R code.)
With the tools in hand, the steps involved can seem confusing, but if you follow the attached presentation, I think they are not too difficult. In brief:
- You start with individual voter records that include the name, age, party, date that the absentee ballot was requested, date that the absentee ballot was returned, and the status of the absentee ballot. (We want to know if the ballot was “accepted” or not.) The data file is freely downloadable at ftp://www.app.sboe.state.nc.us/enrs/absentee11xx06xx2012.zip
The file looks something like this:
VOTER CODE JOHN SMITH 123 MAIN ST RALEIGH NC … DEM … 10/1/2012 10/15/2012 BY MAIL ACCEPT
- You need to convert the date variables, which look to statistical programs like a string of characters (e.g. 10/10/2012) to a “date” variable.
- We count up how many partisan requests there were for absentee ballots.
- You need to code the ballot as accepted (voted = 1) or not (voted = 0).
- Now things get tricky. We “collapse” the data so that our smaller data file is organized by date and by party. The file will end up looking like this:
DATE        DEMS    REPS   UNA    DEMVOTED  REPVOTED  UNAVOTED
10/15/2012  10,219  9,221  8,217  123       .         .
10/15/2012  10,219  9,221  8,217  .         347       .
10/15/2012  10,219  9,221  8,217  .         .         456
This made-up file shows that on Oct. 15, 123 Democratic ballots were returned, 347 Republican ballots, and 456 Unaffiliated ballots.
- With this file in hand, we “cumulate” the number of returned ballots, divide by the number in each party, and voila! We have the percentage of partisan ballots returned by day.
Obviously, it’s a bit more complicated than that, but I hope this PowerPoint presentation (PDF format) that I prepared for my class can guide anyone through the process. The Stata do file referenced in the PowerPoint can be downloaded as well.
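For readers without Stata, the steps listed above can also be sketched in plain Python. This is a rough illustration, not a translation of the do file. The records are made up, the four-field layout is a simplification of the real file, and the "SPOILED" status is an invented stand-in for any ballot that was not accepted.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Made-up records: (party, request date, return date, ballot status).
RECORDS = [
    ("DEM", "10/1/2012", "10/15/2012", "ACCEPT"),
    ("DEM", "10/2/2012", "10/16/2012", "ACCEPT"),
    ("DEM", "10/3/2012", "", ""),                   # requested, not yet returned
    ("DEM", "10/3/2012", "10/16/2012", "SPOILED"),  # returned, not accepted
    ("REP", "10/1/2012", "10/15/2012", "ACCEPT"),
    ("REP", "10/5/2012", "", ""),
    ("UNA", "10/4/2012", "10/16/2012", "ACCEPT"),
]


def parse_date(text):
    """Convert a string such as '10/15/2012' to a date; blank becomes None."""
    return datetime.strptime(text, "%m/%d/%Y").date() if text else None


def return_rates(records):
    """Cumulative share of each party's requested ballots accepted, by day."""
    requests = Counter()             # ballots requested, per party
    returned = defaultdict(Counter)  # returned[day][party] = ballots accepted that day
    for party, _requested_on, returned_on, status in records:
        requests[party] += 1
        day = parse_date(returned_on)
        voted = 1 if status == "ACCEPT" else 0
        if voted and day is not None:
            returned[day][party] += 1  # the "collapse" by date and party

    running = Counter()
    rows = []
    for day in sorted(returned):
        running.update(returned[day])  # "cumulate" the daily returns
        rows.append((day, {p: running[p] / requests[p] for p in sorted(requests)}))
    return rows


for day, rates in return_rates(RECORDS):
    print(day.isoformat(), {p: f"{r:.0%}" for p, r in rates.items()})
```

The denominator here is the number of ballots requested in each party, as in the steps above. With a real file you would read the records with the `csv` module rather than typing them in.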
These data are “fresh” as of end of day Friday, and downloaded from the Secretary of State’s website.
Data from the Ohio Secretary of State as of 11/2/2012