Assignment 5: Correlation and Spatial Autocorrelation

Goals:

The goal of this assignment was to become familiar with correlation and spatial autocorrelation. More specifically, the goals were as follows.

Creating a scatterplot in Microsoft Excel
Running a correlation in SPSS
Interpreting correlation from a scatterplot and SPSS output
Using the US Census site to download data and shapefiles
Identifying GEOIDs in census data
Joining census data to other data
Creating a report to connect the data

Methods:

The first portion of the assignment focused on correlation. Correlation measures the strength and direction of the relationship between two variables on a scale from -1 to 1. Negative values indicate negative correlation, meaning that as one variable increases, the other decreases. Positive values indicate positive correlation, meaning that as one variable increases, so does the other. The closer the value is to 1 or -1, the stronger the correlation; values near 0 indicate little or no linear relationship. Using the data provided, a scatterplot was created in Microsoft Excel showing the relationship between sound level in decibels and distance from the sound source. The chart shows a negative relationship between the two variables.

Figure 1
Scatter plot showing Sound level/Distance Relationship

The data were then analyzed in IBM SPSS using a Pearson correlation with a two-tailed test. The results show a correlation coefficient of -.896 between the two variables, implying a strong negative correlation. This means that the farther you are from a sound, the quieter it is, which makes sense logically. The correlation results may be seen below in Figure 2.

Figure 2
Pearson Correlation Two-Tailed test result
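
As a rough illustration of what the Pearson coefficient measures, the calculation can be sketched in a few lines of Python. The distance and decibel readings below are hypothetical stand-ins, not the assignment's data, and the sketch does not reproduce the two-tailed significance test that SPSS reports.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical readings (assumed for illustration only).
distance_ft = [10, 20, 30, 40, 50, 60]
sound_db = [95, 90, 84, 80, 77, 71]

r = pearson_r(distance_ft, sound_db)
print(round(r, 3))  # negative: sound level falls as distance grows
```

A value close to -1, as in the SPSS output, means the points fall close to a downward-sloping line.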

In the following section, a correlation matrix was created analyzing patterns in census data for Detroit, Michigan. The data analyzed included population and income data.

Correlation Matrix for Detroit Census Data

Some patterns that emerge include a correlation between education and income. For example, there is a strong positive correlation between individuals with bachelor's degrees and household income levels, as well as average home values. This is shown by a correlation value of .753. Several racial patterns may also be observed. There is a strong negative correlation between Black and white populations at the census tract level: as the white population in a tract increases, the Black population decreases, and vice versa. This can be seen in the correlation value of -.604.
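
A correlation matrix is simply the Pearson coefficient computed for every pair of variables. The sketch below shows the idea under stated assumptions: the column names and tract values are hypothetical, not the Detroit census data, and SPSS would also report significance levels that are omitted here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Return {name: {name: r}} for every pair of named columns."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical tract-level values (assumed for illustration only).
tracts = {
    "pct_bachelors": [5, 12, 20, 35, 50],
    "median_income": [22000, 30000, 41000, 60000, 82000],
    "pct_white": [10, 15, 40, 60, 85],
    "pct_black": [85, 80, 50, 30, 8],
}

matrix = correlation_matrix(tracts)
for name, row in matrix.items():
    print(name, [round(v, 2) for v in row.values()])
```

The matrix is symmetric with 1s on the diagonal, so only the cells above or below the diagonal need to be read.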

Part 2: Spatial Autocorrelation

This part of the assignment focused on spatial auto-correlation, which measures how a variable is correlated with itself through space. In other words, it measures how clustered or dispersed values are across a given area. If nearby or neighboring areas have similar values, they are positively spatially auto-correlated. If they have dissimilar values, they are negatively spatially auto-correlated.

Spatial auto-correlation can be measured using the Moran's I statistic. Because the formula is tedious to compute by hand, a computer program is typically used. A Moran's I value ranges from about -1 to 1, with values closer to -1 indicating more dispersed data, values closer to 1 indicating more clustered data, and values near 0 indicating a random pattern.

Figure 3: Equation to Calculate Moran's I Value
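
For reference, the standard form of the global Moran's I statistic is sketched below. The notation is an assumption and may differ from that used in Figure 3: n is the number of areas, x_i is the value in area i, x-bar is the mean of all values, and w_ij is the spatial weight between areas i and j (for example, 1 if they are neighbors and 0 otherwise).

```latex
I = \frac{n}{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}}
    \cdot
    \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}
         {\sum_{i=1}^{n} (x_i - \bar{x})^{2}}
```

The numerator rewards neighboring areas that deviate from the mean in the same direction, which is why clustered data push the value toward 1.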

Another method of testing spatial auto-correlation is to use a Local Indicators of Spatial Association (LISA) map. This method places each area into one of four quadrants, similar to an X,Y chart, based on its own value and the values of its neighbors. The four quadrants, or categories, are as follows:

High, High: Areas of high value surrounded by other high-value areas

High, Low: Areas of high value surrounded by areas of low value

Low, High: Areas of low value surrounded by areas of high value

Low, Low: Areas of low value surrounded by other low-value areas
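
The quadrant assignment above can be sketched as follows. This is a minimal illustration under stated assumptions: the six areas and their neighbor lists are hypothetical, "high" and "low" are judged against the overall mean, and the significance testing that decides which areas are actually colored on a LISA map is left out.

```python
def lisa_quadrants(values, neighbors):
    """Label each area by its own value and the mean value of its neighbors.

    values:    one number per area
    neighbors: for each area, a list of indices of its neighboring areas
    """
    mean = sum(values) / len(values)
    dev = [v - mean for v in values]  # deviation of each area from the mean
    labels = []
    for i, nbrs in enumerate(neighbors):
        # Spatial lag: average deviation of the neighboring areas.
        lag = sum(dev[j] for j in nbrs) / len(nbrs)
        own = "High" if dev[i] >= 0 else "Low"
        around = "High" if lag >= 0 else "Low"
        labels.append(f"{own}, {around}")
    return labels

# Hypothetical example: six areas in a row, each bordering the next.
example_values = [10, 9, 2, 8, 1, 0]
example_neighbors = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]

labels = lisa_quadrants(example_values, example_neighbors)
for value, label in zip(example_values, labels):
    print(value, label)
```

In this toy row, the third area (value 2) sits between two high values and lands in Low, High, while the fourth (value 8) sits between two low values and lands in High, Low.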

To better put this into context, an exercise involving spatial auto-correlation using real world data was completed. The summary and results may be seen below.

_________________________________________________________________________________

Background: I have been provided election data by the Texas Election Commission (TEC) for the 1980 and 2012 presidential elections. More specifically, the data give the percentage of Democratic votes. I have also downloaded a table from the US Census containing the percent Hispanic population at the county level in 2010. The TEC has asked that I analyze the data to determine whether voting patterns and voter turnout are clustered in the state. The results will be provided to the Governor of Texas to determine whether voting patterns have changed over 32 years.

Methods: To process these data I used a combination of Moran's I calculations and LISA maps. The first variable I analyzed was the percentage of Democratic votes in the 1980 and 2012 elections.
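
To give a sense of what a Moran's I calculation involves, here is a minimal Python sketch. It assumes simple binary weights (1 for neighbors, 0 otherwise) and a hypothetical row of six areas; it is not the software or the weighting scheme used for the Texas counties, which this write-up does not specify.

```python
def morans_i(values, neighbors):
    """Global Moran's I with binary weights (w_ij = 1 if j neighbors i)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Cross-products of deviations, summed over every (area, neighbor) pair.
    cross = sum(dev[i] * dev[j] for i in range(n) for j in neighbors[i])
    total_weight = sum(len(nbrs) for nbrs in neighbors)
    squared = sum(d * d for d in dev)
    return (n / total_weight) * (cross / squared)

# Hypothetical example: six areas in a row, each bordering the next.
chain = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
clustered = [1, 1, 1, 9, 9, 9]    # like values sit side by side
alternating = [1, 9, 1, 9, 1, 9]  # every neighbor is unlike

print(round(morans_i(clustered, chain), 2))    # 0.6
print(round(morans_i(alternating, chain), 2))  # -1.0
```

The clustered row gives a positive value and the alternating row a negative one, matching the interpretation of Moran's I used in the maps that follow.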

Figure 4
Moran's I and LISA Map of Democratic vote percentage in 1980

The Moran's I value is 0.57, which indicates clustering in the data. The clustering can be seen on the LISA map. There appears to be a higher concentration of Democratic votes in the southern counties along and near the Mexican border. These high values are indicated by the red points in the upper-right (High, High) quadrant of the scatterplot and by their corresponding locations on the map, highlighted in yellow.

Figure 5
Moran's I and LISA Map of Democratic vote percentage in 2012

Similar patterns may be observed for the 2012 election. The Moran's I value is 0.69, indicating stronger clustering than in 1980. Again, there appears to be a higher percentage of Democratic votes in the southern counties and a lower percentage in the northern counties.

___________________________________________________________________________

The second variable I studied was the voter turnout rates for the 1980 and 2012 presidential elections. Again, I used the Moran's I statistic and LISA maps.

Figure 6
Moran's I and LISA Map of Voter Turnout for the 1980 presidential election

The Moran's I value is 0.48, indicating a positive spatial auto-correlation. There appears to be a strong clustering of low voter turnout in the southern counties near the Mexican border and a clustering of high voter turnout in the north and central parts of the state.

Figure 7
Moran's I and LISA Map of Voter Turnout for the 2012 Presidential Election

The pattern for voter turnout in the 2012 election is slightly different. The Moran's I value is 0.33. This still indicates positive spatial auto-correlation, though weaker than in 1980. There is still a large cluster of low voter turnout in the south, but there are fewer counties in the north exhibiting high voter turnout.

_________________________________________________________________________________

Finally, the TEC suggested that if clustering is present, population data should also be examined for clustering to determine how different populations may influence election patterns. Because Texas has a high Latino population compared to other states, this was the population examined. Using 2010 US Census data, a Moran's I value and a LISA map were created to look for clustering in this population.

Figure 8
LISA Map of % Hispanic Population using 2010 Census Data

The Moran's I value is 0.77, indicating a strong positive spatial auto-correlation. There is a very large cluster of high Latino population in the southern counties near the Mexican border, which is expected given their proximity to Mexico. There is also a strong cluster of non-Hispanic population in the northeastern counties.
______________________________________________________________________________

Conclusions: Multiple conclusions may be made by analyzing this data. The TEC is interested in studying clustering in the data as well as differences in percent democratic vote and voter turnout between 1980 and 2012.

LISA Maps comparing Democratic Vote in 1980 (Left) and 2012 (Right)

There are notable differences in the clustering patterns for percent Democratic vote between the 1980 and 2012 elections. On both maps there is a clustering of low values in the north and high values in the south. The differences may be seen in the eastern and western parts of the state, where the results appear almost mirrored. In 1980 there was a significant cluster of low Democratic vote in the western counties bordering New Mexico and a cluster of high Democratic vote in the far eastern counties. The opposite occurred in 2012.
________________________________________________________________________________

LISA Maps comparing Voter Turnout in 1980 and 2012

There are also differences in clustering with voter turnout levels between the two elections. In both elections there was a significant cluster of low voter turnout in the southern counties. Areas with high voter turnout have changed, though. In 1980 there was a cluster of high voter turnout areas in the northern counties. This changed in 2012, when no significant cluster was found.
_________________________________________________________________________________

Since 1980, Texas has sided with the Republican Party in every presidential election. These LISA maps, along with the census data, provide some insight as to why. In both 1980 and 2012 there was strong clustering of high Democratic vote and low voter turnout in the southern counties. There is also strong positive spatial auto-correlation of percent Hispanic population in these same counties. The high Democratic vote can be attributed to the fact that, historically, Hispanic populations in the United States have tended to vote Democratic. The low voter turnout can also be explained. It is very possible that some people simply choose not to go to the polls. However, this strong clustering of low voter turnout may also be related to eligibility: only US citizens may vote in a presidential election, so noncitizen residents, including undocumented immigrants, cannot vote. If many Latino residents of the counties along the Mexican border are noncitizens, this would lower voter turnout there. In conclusion, part of the population most likely to vote Democratic was not legally allowed to vote, and the majority of the vote went to the Republican Party.

Geography 370 Quantitative Methods
