Assignment 6 - Regression Analysis

Goals:

The goals of this assignment are to understand how to perform a regression analysis and to predict results from a regression output. The results will then be mapped in ArcMap.

Background: In order to understand this assignment, it is important to understand the basic elements of a regression analysis.

Regression Analysis - A statistical tool used to investigate the relationship between two variables.

Regression Equation - Y = a + bX. This is the equation of a straight line, with a as the intercept and b as the slope.

Independent Variable - The X variable in the equation. It explains the dependent variable.

Dependent Variable - The Y variable in the equation. It is explained by the independent variable.

Regression Coefficient - The slope of the line (b). It shows how much the dependent variable changes for each one-unit change in the independent variable.

Constant - The a value, or the value of Y when X = 0.

Coefficient of Determination (r^2 value) - Measures, on a scale from 0 to 1, how much of the variation in the dependent variable is explained by the independent variable.
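To make these definitions concrete, here is a minimal sketch of how the constant, the regression coefficient and the r^2 value can be computed by hand. The function name fit_line and the five data points are invented for illustration; they are not from the assignment data, and SPSS produced the actual results in this post.

```python
from math import fsum

def fit_line(xs, ys):
    """Least-squares fit of Y = a + bX. Returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = fsum(xs) / n
    mean_y = fsum(ys) / n
    # Sums of squares and cross-products around the means
    sxx = fsum((x - mean_x) ** 2 for x in xs)
    syy = fsum((y - mean_y) ** 2 for y in ys)
    sxy = fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # regression coefficient (slope)
    a = mean_y - b * mean_x            # constant (Y when X = 0)
    r_squared = sxy ** 2 / (sxx * syy) # coefficient of determination
    return a, b, r_squared

# Hypothetical example data, NOT the assignment data
xs = [10, 20, 30, 40, 50]
ys = [40, 55, 50, 80, 75]

a, b, r2 = fit_line(xs, ys)
print(f"Y = {a:.2f} + {b:.2f}X, r^2 = {r2:.3f}")
```

A positive b means Y rises as X rises, and an r^2 close to 1 means the line explains most of the variation in Y.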

Methods:

Part 1- The first part of this assignment focuses on understanding regression outputs. An Excel spreadsheet was provided that contained data for an unnamed town. The data includes the percentage of children who receive free lunch in several different neighborhoods, as well as crime data for the same locations. A local news station is investigating whether or not there is a significant correlation between poverty and crime rate. They claim that as the number of children who receive free lunch increases, so does crime. Using this data, a regression analysis was performed in SPSS statistical software to determine whether this is true. The results may be seen below.

Figure 1
Coefficients and Model Summary for Part 1

The independent variable is the percentage of children who receive free lunch and the dependent variable is the crime rate per 100,000 people. The slope is positive, indicating a positive relationship. In other words, as one variable increases, so does the other. The coefficient of determination is .173. This means that the percentage of children receiving free lunch explains only about 17% of the variation in crime rate, so the association is very weak. According to this model, if the town were to have 30% of students receiving free lunch, the crime rate would be 72 crimes per 100,000 people. I am fairly confident in this reading of the results. There may be a correlation between poverty and crime rate, but the number of children receiving free lunch is not an influential factor. This can be seen in the spreadsheet. One neighborhood with 52% of children on free lunch had a crime rate of 42.2, whereas a neighborhood with only 17% of children on free lunch had a higher crime rate of 132. This shows that the results vary too much for the relationship to be considered strong.
Figure 2
Table displaying conflicting results




Figure 3
Scatter Plot showing a weak correlation between the two variables. 
Part 2

The second part of the assignment uses a dataset on 911 calls in Portland, Oregon. The scenario is that the City of Portland is concerned about response times for 911 calls and would like to know what might explain where calls are coming from. In addition, a company is interested in building a hospital and would like to know the best place to put it. I performed three separate regression analyses to identify factors that may be correlated with 911 calls.
_________________________________________________________________________________

Figure 4
Regression analysis results comparing alcohol sales to 911 calls
The first analysis examined the relationship between alcohol sales and 911 calls. The coefficient of determination is .152, indicating a weak relationship between the two variables. The constant is 9.5, meaning that if there were no alcohol sales, the model would still predict an average of 9.5 calls to 911. If a hypothesis test were being performed, the null hypothesis would be rejected because the significance value is below .05. However, it falls only barely below this threshold, and the association is very weak. According to these results, for every additional unit of alcohol sales, the predicted number of 911 calls increases by 3.069E-5, in other words by a very small amount. These results indicate that alcohol sales alone explain little of the variation in the number of 911 calls.
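As a rough illustration of what these two numbers mean, the sketch below plugs the constant (9.5) and slope (3.069E-5) reported above into Y = a + bX. The function name and the sales figure of 100,000 are made up for the example, and the units of alcohol sales are not stated here, so the inputs should be read as "units of alcohol sales" only.

```python
# Values taken from the regression output described above (constant is rounded)
CONSTANT = 9.5       # a: predicted 911 calls when alcohol sales are zero
SLOPE = 3.069e-5     # b: change in predicted calls per unit of alcohol sales

def predict_calls(alcohol_sales):
    """Predicted number of 911 calls from Y = a + bX."""
    return CONSTANT + SLOPE * alcohol_sales

# Hypothetical input: 100,000 units of alcohol sales
example_sales = 100_000
print(f"Predicted calls at {example_sales} units: {predict_calls(example_sales):.3f}")

# How many units of sales correspond to one additional predicted call
sales_per_extra_call = 1 / SLOPE
print(f"Units of sales per extra call: {sales_per_extra_call:.0f}")
```

Because the slope is so small, it takes tens of thousands of units of sales to move the prediction by a single call.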
_______________________________________________________________________________

Figure 5
Regression analysis results comparing low education and 911 calls

The second regression analysis examines the relationship between the number of individuals without high school degrees and 911 calls. The coefficient of determination is .567, which indicates a fairly strong relationship between the two variables. The slope is positive, meaning that as one variable increases, so does the other. The significance value for low education is far below .05, so the null hypothesis would be rejected. For every additional person without a high school degree, the predicted number of 911 calls increases by about 0.16 of a call, or roughly 16 calls per 100 such people. These results all indicate that there is a significant linear relationship between the number of individuals without high school degrees and 911 calls.
_________________________________________________________________________________

Figure 6
Regression analysis results comparing foreign born populations and 911 calls

The final regression analysis examines the relationship between the foreign born population and 911 calls. The coefficient of determination is .55, meaning that there is a moderately strong relationship between the two variables. For every additional person who is foreign born, the predicted number of 911 calls increases by about 0.08 of a call, or roughly 8 calls per 100 such people. The significance value for the foreign born population is far below the .05 threshold, so the null hypothesis is rejected. Based on these results, it can be concluded that there is a significant linear relationship between the size of the foreign born population and 911 calls.
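The three r^2 values reported above can be put side by side to see which variable explains the most variation in 911 calls. This is only a small sketch: the dictionary keys are my own labels, and the correlation is shown as an absolute value because taking the square root of r^2 does not recover the sign.

```python
from math import sqrt

# Coefficients of determination reported in the three analyses above
r_squared = {
    "alcohol sales": 0.152,
    "low education": 0.567,
    "foreign born": 0.55,
}

# List the models from strongest to weakest fit
ranked = sorted(r_squared.items(), key=lambda item: item[1], reverse=True)
for name, r2 in ranked:
    print(f"{name}: r^2 = {r2:.3f}, |r| = {sqrt(r2):.3f}, "
          f"explains {r2 * 100:.1f}% of the variation")

strongest = ranked[0][0]
print(f"Strongest single predictor: {strongest}")
```

Low education comes out on top, which is consistent with it being the variable used for the residual map.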

_________________________________________________________________________________

The final phase of this project involved creating a residual map and a choropleth map that illustrate the results.
Figure 7
Standard deviation map of 911 calls in Portland
The standard deviation map illustrates which census tracts in Portland have higher numbers of 911 calls. The mean number of calls was 25 and the standard deviation was 28. The census tracts in the north that fall more than 1.5 standard deviations above the mean therefore had at least 42 more calls than the mean (1.5 × 28 = 42), or 67 or more calls in total.
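The arithmetic behind the map classes can be checked with a few lines. The mean of 25 and standard deviation of 28 come from the paragraph above; the function names are just for illustration.

```python
# Summary statistics for 911 calls reported above
MEAN_CALLS = 25
SD_CALLS = 28

def calls_at(z):
    """Number of calls that sits z standard deviations from the mean."""
    return MEAN_CALLS + z * SD_CALLS

def z_score(calls):
    """How many standard deviations a call count is from the mean."""
    return (calls - MEAN_CALLS) / SD_CALLS

threshold = calls_at(1.5)
print(f"1.5 standard deviations above the mean = {threshold} calls")
print(f"That is {threshold - MEAN_CALLS} calls above the mean")
print(f"z-score of a tract with 67 calls = {z_score(67)}")
```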

Figure 8
Residual Map of 911 calls compared to lower education
The final map shows the residual for each census tract, that is, how far the observed number of 911 calls in the tract falls from the value predicted by the best-fit line. In other words, this map shows the locations of major outliers in the data. The orange and red areas had more 911 calls than the model predicts, and the blue areas had fewer. Comparing this map to the choropleth map, one can see that the northern census tracts receive a higher than average number of 911 calls. If I were a company looking to build a new hospital, this would be the best place to do so.
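A residual is simply the observed value minus the predicted value. The sketch below shows the idea with invented numbers: the constant, slope and three tracts are hypothetical, not the Portland results. Dividing by the sample standard deviation of the residuals is one simple way to standardize them and may not match exactly how ArcMap computes its standardized residuals.

```python
from statistics import stdev

def residuals(xs, ys, a, b):
    """Observed Y minus the Y predicted by Y = a + bX, for each point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def standardize(values):
    """Divide each residual by the sample standard deviation of the residuals."""
    spread = stdev(values)
    return [v / spread for v in values]

# Hypothetical constant, slope and tracts (NOT the assignment results)
A, B = 1.0, 2.0
low_education = [1, 2, 3]   # X: people without a high school degree
calls = [3, 6, 6]           # Y: observed 911 calls

raw = residuals(low_education, calls, A, B)
std = standardize(raw)

for x, y, r, s in zip(low_education, calls, raw, std):
    label = "more calls than predicted" if r > 0 else (
        "fewer calls than predicted" if r < 0 else "as predicted")
    print(f"X={x}, Y={y}: residual={r:+.1f}, standardized={s:+.2f} ({label})")
```

Positive residuals correspond to the orange and red tracts on the map, and negative residuals to the blue ones.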
Figure 9:
Ideal Census tracts for a new hospital (shown in blue)
