Assignment 3: Z-Scores and Probability



Introduction:

The focus of this lab is using probability and Z-scores to investigate housing foreclosures in Dane County, WI between the years 2011 and 2012.

For this scenario, I have been hired by an independent research consortium to spatially analyze foreclosures in Dane County. The group is concerned about the number of foreclosures and would like to see if a similar trend will occur in the following years.

The specific research questions for this project are "Where have foreclosures occurred in 2011 and 2012?" and, more specifically, "What changes and spatial patterns can be observed in the foreclosure rate between the two years?"
 
 Figure 1
Location of Study Area in Wisconsin

Data: 
The data used in this project is census tract data for Dane County, Wisconsin. The data include the number of foreclosures per census tract for the years 2011 and 2012.

Methodology: 

This project uses a variety of statistical and cartographic analyses.

Mean Center: In order to first understand where foreclosures are taking place, a mean center is calculated. This represents the average location of foreclosures within the county. To calculate a mean center, the latitude and longitude of each foreclosure are converted to x and y values on a coordinate grid. The average of the x values (longitude) and the average of the y values (latitude) are combined to give the average x,y point, which is the mean center.
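The averaging described above can be sketched in a few lines of Python. This is only an illustration: the function name mean_center and the sample coordinates are made up for the example and are not the Dane County data, and the points are not weighted by foreclosure count.

```python
from statistics import fmean

def mean_center(points):
    """Return the (x, y) mean center of a list of (x, y) coordinate pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return fmean(xs), fmean(ys)

# Hypothetical foreclosure locations as (longitude, latitude) pairs.
# These values are invented for illustration only.
sample_points = [
    (-89.40, 43.07),
    (-89.38, 43.05),
    (-89.42, 43.09),
    (-89.36, 43.11),
]

center = mean_center(sample_points)
print(f"mean center: x = {center[0]:.2f}, y = {center[1]:.2f}")
```

Running the same function on the 2011 points and then on the 2012 points would give the two mean centers that are compared in the results.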

Standard Deviation: A standard deviation is a measure of how far observations, on average, differ (deviate) from the mean. The formula for calculating a standard deviation can be seen in figure 2.
Figure 2.
Standard Deviation Formula
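
Written out, the population form of the standard deviation is shown below, where x_i is the number of foreclosures in tract i, \mu is the mean, and n is the number of tracts. The symbols are chosen here for illustration, and it is an assumption that the population form was used; the sample form divides by n - 1 instead of n.

```latex
\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}
```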

This project uses standard deviations to determine if the number of foreclosures in a census tract is above or below the average for the county. 

Z-Scores: A Z-score is a measure of how many standard deviations an observation is from the mean. A Z-score can be negative (below the mean) or positive (above the mean), and for normally distributed data it rarely falls outside the range of -3 to 3. A Z-score close to 0 means an observation is close to the average. A Z-score far from 0 indicates that the observation is far from the average, and a value near 3 or beyond would typically be considered an outlier. The formula for calculating Z-scores can be seen in figure 3.

Figure 3
Z-Score Formula
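
In the usual notation, with x as the observation, \mu as the mean, and \sigma as the standard deviation (symbols assumed here for illustration), the Z-score and its rearrangement for x are:

```latex
Z = \frac{x - \mu}{\sigma}
\qquad\Longleftrightarrow\qquad
x = \mu + Z\,\sigma
```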

In this project, Z-Scores are used to analyze the number of foreclosures in each census tract and determine how far the values deviate from the average.

Probability: Probability is a measure of how likely a certain outcome is to occur. If the data are assumed to be normally distributed, the probability of an outcome can be found from its Z-score by looking up the score on a probability chart. The chart lists Z-scores from 0 to 3, each with a matching probability; negative Z-scores are handled using the symmetry of the normal curve. An example of a probability chart is shown in figure 4.
Figure 4
Probability Chart

This project uses probability to estimate foreclosure counts for the following years, assuming the 2011-2012 trend continues.
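
A chart lookup of this kind can be approximated in code. The sketch below uses NormalDist from Python's standard statistics module as a stand-in for the printed chart; the helper names are made up for this example, and it assumes the chart gives cumulative probabilities of the standard normal distribution.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def probability_below(z):
    """Cumulative probability P(Z <= z), the value a Z chart reports."""
    return standard_normal.cdf(z)

def z_for_probability_below(p):
    """Reverse lookup: the Z-score with cumulative probability p."""
    return standard_normal.inv_cdf(p)

for z in (0.0, 0.84, 1.28, 3.0):
    print(f"z = {z:4.2f}  P(Z <= z) = {probability_below(z):.4f}")
```

The reverse lookup is the direction needed when a probability is given and a Z-score has to be found.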

Results:

Mean Center:

Figure 5
Map of Mean Centers

As can be observed in figure 5, the mean center locations for 2011 and 2012 differ very little, if at all. Both mean centers are located in the downtown Madison, Wisconsin area. This indicates that the average location of foreclosures remained consistent between the two years.

Standard Deviations:

Figure 6: 
Standard Deviation Map of Change in Foreclosure Rate

Figure 6 is a standard deviation map of the change in foreclosure rate at the census tract level between the years 2011 and 2012. The standard deviation is 5.5 foreclosures, meaning that an observation one standard deviation above the mean has 5.5 more foreclosures than the average. The results indicate that census tracts surrounding the downtown Madison area and census tracts on the edge of the county have a larger number of foreclosures.
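
The classification behind a standard deviation map can be sketched as follows. This is a rough illustration under stated assumptions: the tract labels and counts are invented, the class breaks at plus or minus 0.5 standard deviations are an arbitrary choice for the example, and the population standard deviation is used. None of this is taken from the mapping software used for figure 6.

```python
from statistics import fmean, pstdev

# Hypothetical foreclosure counts per tract as (2011, 2012).
# Invented for illustration; not the Dane County data.
counts = {"A": (4, 6), "B": (10, 9), "C": (2, 12), "D": (7, 7), "E": (15, 5)}

def classify_changes(counts):
    """Label each tract by how far its 2011-2012 change is from the mean change."""
    changes = {tract: y2012 - y2011 for tract, (y2011, y2012) in counts.items()}
    values = list(changes.values())
    mean = fmean(values)
    sd = pstdev(values)  # population standard deviation (an assumption)
    result = {}
    for tract, change in changes.items():
        z = (change - mean) / sd
        if z < -0.5:
            label = "below average"
        elif z > 0.5:
            label = "above average"
        else:
            label = "near average"
        result[tract] = (change, round(z, 2), label)
    return result

for tract, (change, z, label) in classify_changes(counts).items():
    print(f"tract {tract}: change = {change:+d}, z = {z:+.2f}, {label}")
```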


Probability and Z-Score
Figure 7. 
Location of Tracts used in Z-Score Analysis

This portion of the assignment involved using Z-scores and probabilities to estimate the number of foreclosures in upcoming years. To begin, the Z-scores for three census tracts (25, 108, and 120.01) were calculated using statistics from the data for the years 2011 and 2012. They are as follows:

2011
Tract 25: -0.61
Tract 108: 2.0
Tract 120.01: 1.78

2012
Tract 25: -0.93
Tract 108: 1.48
Tract 120.01: 3.0 (outlier)

These Z-scores follow the trend shown in figure 6, with higher Z-score values located in census tracts surrounding the main city.

The second question involves estimating the number of foreclosures for future years using probability. The question asks what number of foreclosures will be exceeded 80% of the time, and what number will be exceeded only 10% of the time, in 2013 given the trends in 2011-2012. To calculate these numbers, the probability chart (figure 4) was used to find the matching Z-scores. According to the chart, a cumulative probability of .80 has a corresponding Z-score of 0.84 and a cumulative probability of .90 has a Z-score of 1.28. A count that is exceeded 10% of the time sits above 90% of outcomes, so its Z-score is 1.28. A count that is exceeded 80% of the time sits above only 20% of outcomes, so by the symmetry of the normal curve its Z-score is -0.84.

Given these Z-scores, the number of foreclosures can be calculated using the formula in figure 3 by solving for X (X = mean + Z × standard deviation). The mean was calculated by averaging the total number of foreclosures for 2011 and 2012, and the standard deviation was calculated using this mean.

Mean: 1268 foreclosures
Standard Deviation: 68.59

The results indicate that in 2013 about 1210 foreclosures (1268 - 0.84 × 68.59) will be exceeded 80% of the time and about 1356 foreclosures (1268 + 1.28 × 68.59) will be exceeded 10% of the time.
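
The step of solving for X can be checked with a short sketch. It assumes foreclosure totals are normally distributed with the mean and standard deviation listed above, and it reads "exceeded p of the time" as an upper-tail probability of p. The function name is made up for this example, and NormalDist stands in for the printed chart, so results differ slightly from values based on the rounded Z-scores.

```python
from statistics import NormalDist

MEAN_FORECLOSURES = 1268.0  # mean reported in this lab
SD_FORECLOSURES = 68.59     # standard deviation reported in this lab

def count_exceeded_with_probability(p, mean=MEAN_FORECLOSURES, sd=SD_FORECLOSURES):
    """Return the count x with P(X > x) = p under a normal model.

    An upper-tail probability of p is a cumulative probability of 1 - p.
    """
    return NormalDist(mu=mean, sigma=sd).inv_cdf(1.0 - p)

for p in (0.80, 0.10):
    x = count_exceeded_with_probability(p)
    print(f"exceeded {p:.0%} of the time: about {x:.0f} foreclosures")
```

A count exceeded most of the time has to fall below the mean, which gives a quick sanity check on the sign of the Z-score.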

Conclusions:

All of the analyses seem to indicate a trend of higher foreclosure rates in the areas around downtown Madison and in the surrounding suburbs. One possible explanation is recent activity in these areas. In recent years a large number of corporations have moved to the Madison and Sun Prairie (surrounding suburbs) area. The presence of these corporations increases property taxes for residents, which in turn increases monthly mortgage payments. As these payments increase, people can no longer afford to pay their mortgages and end up losing their homes to foreclosure. In short, there is little that can be done to remove these companies from the area. However, the number of foreclosures could be decreased by encouraging homeowners to move farther away from the city to an area with lower property taxes.

Image Source Links.   
