By Canyon Foot ’20, Paul Manson ’01, Paul Gronke, and Jay Lee ’19
Motivation:
Canyon Foot and Paul Gronke have recently posted two analyses of the Portland City Council races. In these analyses, we hoped to understand how support in City Council races and other contests varied spatially and demographically across the geographic and political boundaries of Multnomah County.
What we are doing: Spatial joins between Census Tracts and precincts:
To answer these questions, researchers often rely on estimates produced by the US Census Bureau's American Community Survey (ACS). Unlike the Decennial Census, the ACS samples a percentage of households each year to ask about detailed demographics such as income, employment, and housing. These responses are then aggregated at various geographies, from block groups up to counties, metropolitan areas, and states. Smaller units rest on smaller samples and therefore carry more error, so researchers often must work with larger areas, such as Census Tracts, whose greater sample sizes yield smaller error.
Election geographies do not line up perfectly with these lower-error Census geographies, so using ACS estimates with the Multnomah County precincts that report vote counts requires some transformations and adjustments.
Our strategy for addressing this problem is well established. First, we geocode each registered voter in the City of Portland and spatially join each voter to the Census Tract in which they live. Then, using precinct boundaries, we proportionally aggregate these demographic characteristics into the new geographies.
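The spatial join can be sketched in plain Python. This is a minimal illustration under stated assumptions, not our actual pipeline: the square "tracts" and "precincts", the voter coordinates, and the helper names (`point_in_polygon`, `locate`, `join_voters`) are all hypothetical, and in practice a GIS library would handle projections and real boundary files. Because each voter is assigned to both a tract and a precinct, counting voters per (tract, precinct) pair gives the voter count for each tract-precinct intersection.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# A polygon is a list of (x, y) vertices. All shapes below are made-up squares.
Polygon = List[Tuple[float, float]]


def point_in_polygon(x: float, y: float, poly: Polygon) -> bool:
    """Ray casting: a point is inside if a ray cast to the right crosses an odd number of edges.

    Points lying exactly on a boundary are not handled consistently in this sketch.
    """
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def locate(x: float, y: float, zones: Dict[str, Polygon]) -> Optional[str]:
    """Return the id of the first zone containing the point, or None."""
    for zone_id, poly in zones.items():
        if point_in_polygon(x, y, poly):
            return zone_id
    return None


def join_voters(voters, tracts, precincts) -> Counter:
    """Count geocoded voters per (tract, precinct) pair, i.e. per intersection."""
    counts: Counter = Counter()
    for x, y in voters:
        tract, precinct = locate(x, y, tracts), locate(x, y, precincts)
        if tract is not None and precinct is not None:
            counts[(tract, precinct)] += 1
    return counts


# Hypothetical geography: two tracts side by side, two precincts split at x = 3.
tracts = {"A": [(0, 0), (2, 0), (2, 2), (0, 2)], "B": [(2, 0), (4, 0), (4, 2), (2, 2)]}
precincts = {"P1": [(0, 0), (3, 0), (3, 2), (0, 2)], "P2": [(3, 0), (4, 0), (4, 2), (3, 2)]}
voters = [(0.5, 0.5), (1.5, 1.5), (2.5, 0.5), (2.5, 1.5), (3.5, 1.0)]

counts = join_voters(voters, tracts, precincts)
print(dict(counts))  # {('A', 'P1'): 2, ('B', 'P1'): 2, ('B', 'P2'): 1}
```

Summing these pair counts by tract or by precinct gives the per-tract and per-precinct voter totals used in the steps that follow.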
Specifically, the process is as follows:
- Sum the total number of voters inside each Census Tract and each precinct.
- Intersect the precinct boundaries with the Census Tract boundaries. This new set of polygons corresponds to the overlapping portions of each Census Tract and each precinct.
- Count the number of registered voters in each intersected polygon. Divide this count by the total number of registered voters in the corresponding Census Tract. This creates our fractional multiplier that we will use to allocate the Census values.
- Using this set of fractional multipliers, create a weighted mean or weighted sum to estimate precinct-level demographic quantities.
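The multiplier step in the list above reduces to a few lines of arithmetic. This is a minimal sketch, assuming voter counts have already been tallied for each tract-precinct intersection; the function name `fractional_multipliers` and the counts are hypothetical. It also assumes the intersections cover every voter in each tract, so summing them reproduces the tract totals.

```python
from collections import defaultdict
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (tract id, precinct id)


def fractional_multipliers(pair_counts: Dict[Pair, int]) -> Dict[Pair, float]:
    """Share of each tract's registered voters who fall in each tract-precinct intersection."""
    tract_totals: Dict[str, int] = defaultdict(int)
    for (tract, _precinct), n in pair_counts.items():
        tract_totals[tract] += n  # assumes the intersections cover every voter in the tract
    return {(tract, precinct): n / tract_totals[tract]
            for (tract, precinct), n in pair_counts.items()}


# Hypothetical voter counts: tract A lies wholly in P1, tract B is split 150/450.
pair_counts = {("A", "P1"): 300, ("B", "P1"): 150, ("B", "P2"): 450}
multipliers = fractional_multipliers(pair_counts)
print(multipliers)  # {('A', 'P1'): 1.0, ('B', 'P1'): 0.25, ('B', 'P2'): 0.75}
```

Each tract's multipliers sum to 1, so allocating a tract's values this way neither creates nor loses people.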
To illustrate the process, we will use this method to estimate the total population and the median income of a specific precinct.
Portland precinct 4601 contains Mount Tabor and some surrounding neighborhoods. If you mouse over the precinct, the total number of voters will be displayed.
The next map shows the five Census Tracts that overlap with this precinct. Each is labeled with its tract-level population count, median income, and registered voter count. While one of the Tracts sits fully inside the precinct, the others are split between multiple precincts.
The third map shows the intersections and the estimated population of each polygon. We estimated each polygon's population by multiplying the tract's population by the proportion of the tract's voters who live in that polygon (the fractional multiplier).
To produce “count” estimates for the precinct (such as population or the number of people with a bachelor’s degree), we simply add up the estimates from each of the polygons.
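Adding up the polygon estimates amounts to a weighted sum of tract counts. This is a minimal sketch with hypothetical tract populations and multipliers (not real ACS figures); `precinct_count_estimate` is an illustrative name, not part of any library.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (tract id, precinct id)


def precinct_count_estimate(precinct: str,
                            tract_counts: Dict[str, float],
                            multipliers: Dict[Pair, float]) -> float:
    """Sum (fractional multiplier x tract count) over the polygons that make up the precinct."""
    return sum(m * tract_counts[tract]
               for (tract, p), m in multipliers.items()
               if p == precinct)


# Hypothetical inputs: tract A lies wholly in P1, tract B is split 25% / 75%.
tract_population = {"A": 4000, "B": 6000}
multipliers = {("A", "P1"): 1.0, ("B", "P1"): 0.25, ("B", "P2"): 0.75}

print(precinct_count_estimate("P1", tract_population, multipliers))  # 5500.0
print(precinct_count_estimate("P2", tract_population, multipliers))  # 4500.0
```

The same function works for any count variable, such as the number of people with a bachelor’s degree; only `tract_counts` changes.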
Household income is handled differently because the ACS reports only a median for each Census geography. To estimate median income, we take a weighted mean of the tract medians, weighting each by the proportion of the precinct's voters in the corresponding tract-precinct intersection, and treat the result as the precinct's median. The final estimates are shown in the accompanying map.
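The median-income weighting can be sketched the same way. This assumes the weight for each tract's median is that intersection's share of the precinct's registered voters; that reading of "proportion of voters" is our assumption, and the function name and figures are hypothetical.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (tract id, precinct id)


def weighted_median_income(precinct: str,
                           tract_medians: Dict[str, float],
                           pair_counts: Dict[Pair, int]) -> float:
    """Weighted mean of tract medians, weighted by each intersection's share of the precinct's voters."""
    rows = [(tract, n) for (tract, p), n in pair_counts.items() if p == precinct]
    precinct_voters = sum(n for _, n in rows)
    if precinct_voters == 0:
        raise ValueError(f"no voters recorded for precinct {precinct!r}")
    return sum(tract_medians[tract] * n / precinct_voters for tract, n in rows)


# Hypothetical tract medians and intersection voter counts.
tract_medians = {"A": 90000, "B": 60000}
pair_counts = {("A", "P1"): 300, ("B", "P1"): 150, ("B", "P2"): 450}

print(weighted_median_income("P1", tract_medians, pair_counts))  # 80000.0
```

In P1, tract A supplies two thirds of the voters and tract B one third, so the estimate is pulled toward A's median.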
Caveats and Limitations:
This approach has some limitations.
Creating estimates this way relies on two homogeneity assumptions. The first is homogeneity between the population and registered voters: within a tract, the population is distributed in the same way as the voters (i.e., no part of the tract has a much higher or lower rate of voter registration than the rest). We think this is likely to be close to true.
The second assumption is distributional homogeneity within the tract: the population characteristics we use are evenly distributed across the tract. For instance, if one half of a tract were high-income and the other half low-income, this method would attribute the tract’s “average” income to both halves, even if they fall in different precincts.
One final note: our method for calculating median income is not strictly correct. In general, the median of a set cannot be calculated from the medians of its subsets alone, which is what we are forced to do when aggregating tract medians. Nonetheless, we believe the weighted mean approach is likely sufficient for our purposes here.
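A small counterexample shows why medians of subsets are not enough. The income lists are invented for illustration, and the code uses only Python's standard `statistics.median`. Two pairings of groups share the same group sizes and the same subset medians (20 and 40), yet their pooled medians differ, so no formula based only on subset medians and sizes can recover the pooled median in every case.

```python
from statistics import median

# Invented incomes (thousands of dollars) for groups of three households each.
group_a = [10, 20, 90]      # median 20
alt_group_a = [10, 20, 25]  # also median 20
group_b = [30, 40, 50]      # median 40

# Weighted mean of the subset medians (weights = group sizes); identical for both pairings.
weighted = (median(group_a) * len(group_a) + median(group_b) * len(group_b)) / (len(group_a) + len(group_b))

print(weighted)                       # 30.0
print(median(group_a + group_b))      # 35.0
print(median(alt_group_a + group_b))  # 27.5
```

The weighted mean of medians gives 30 in both cases, while the true pooled medians are 35 and 27.5.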
What we are not doing: Ecological Inference:
It is intuitive to look at the demographic and political differences between precincts and assume that they reveal individual behavior: for example, concluding that because a high-income precinct supports Candidate A, high-income individuals support Candidate A.
This is a well-known example of the ecological fallacy. Correlations between aggregate groups do not necessarily imply individual correlations.
It is easy to fall prey to the ecological fallacy, in part because aggregate data do carry some information about individuals. Political methodologist Gary King famously proposed “a solution to the ecological inference problem,” which shows that, while precise estimates of individual-level relationships are not possible, “bounds” can be established that provide some leverage on individual-level correlations. Put simply, if all high-income areas voted 90% for Candidate A and all low-income areas voted 90% for Candidate B, there are mathematical limits on the individual-level relationships that could produce that aggregate pattern.
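The "bounds" idea can be illustrated for a single precinct with two groups. This sketch implements only the deterministic method of bounds (usually credited to Duncan and Davis) that King's approach incorporates, not King's full statistical model, which pools information across many precincts; the function name and inputs are hypothetical. If a share x of a precinct's voters is high income and a share t voted for Candidate A, then t = x * b_high + (1 - x) * b_low, and because b_low must lie between 0 and 1, b_high is confined to an interval.

```python
from typing import Tuple


def high_income_support_bounds(x: float, t: float) -> Tuple[float, float]:
    """Bounds on b_high, the share of high-income voters who backed Candidate A.

    x: share of the precinct's voters who are high income (0 < x <= 1)
    t: share of the precinct's voters who backed Candidate A (0 <= t <= 1)
    Derived from t = x * b_high + (1 - x) * b_low with 0 <= b_low <= 1.
    """
    if not (0 < x <= 1 and 0 <= t <= 1):
        raise ValueError("need 0 < x <= 1 and 0 <= t <= 1")
    lower = max(0.0, (t - (1 - x)) / x)  # case where every low-income voter backed A (b_low = 1)
    upper = min(1.0, t / x)              # case where no low-income voter backed A (b_low = 0)
    return lower, upper


# An entirely high-income precinct pins the answer down exactly.
print(high_income_support_bounds(1.0, 0.9))  # (0.9, 0.9)

# A mixed precinct only narrows the range (lower bound about 0.5, upper bound 1.0).
print(high_income_support_bounds(0.6, 0.7))
```

When a precinct is homogeneous (x equal to 0 or 1) the bounds collapse to a single value; in mixed precincts the interval can be wide, which is why King combines the bounds with a statistical model.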
Ecological inference has been accepted by the courts for use in voting rights cases, and two R packages provide access to this statistical technique.
We have not done the work to assess any individual voter-level results (nor is it clear that we would have sufficient statistical power, given the size of these precincts), and therefore are presenting only aggregate precinct-level analyses.