- The importance of pattern
- Patterns of categorical point data – Point Pattern Analysis
- Quadrat Analysis
- Ripley’s K
- DBSCAN
- Patterns of spatially referenced continuous observations
- Spatial autocorrelation
- Defining near and distant things
- Measuring spatial autocorrelation
- Moran’s I
- Geary’s C
- LISA
The human eye/brain is notoriously unreliable at spotting patterns!
Quantifying Spatial patterns
Some techniques are more appropriate for understanding the spatial distribution of discrete objects or events of a categorical nature, e.g. cholera deaths, blue plaques, trees, post boxes, burglaries, breweries in London, etc.
Often these events are recorded as points, and so the techniques fall under the banner of Point Pattern Analysis.
Here the properties are fixed, but the space varies
•On other occasions, we might observe a spatial distribution of some kind of attribute, for example:
•Rate of smoking across output areas in London
•Levels of unemployment across Census tracts in the US
•Average number of road accidents along sections of road
•Where rates of smoking, levels of unemployment, numbers of road accidents appear more similar in places that are closer, they are said to exhibit Spatial Autocorrelation
•The level of Spatial Autocorrelation can be assessed using statistics such as:
•Geary’s C
•Moran’s I
Here the space is fixed, but the properties vary
Are the values clustered or are they random?
Point Pattern Analysis
•At the core of point pattern analysis is the question:
•Are these points distributed in a random way or is there some sort of pattern (uniform or clustered)?
•The expected random model is known as Complete Spatial Randomness (CSR)
•Under CSR, the number of points falling in any given area follows a Poisson distribution
•By comparing the distribution of observed points with a CSR Poisson model, we can tell whether our point distribution is interesting…
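To make CSR concrete, here is a minimal sketch (Python, standard library only) of simulating a homogeneous Poisson process on a rectangular study area: the number of points is drawn from a Poisson distribution with mean intensity × area, and each point is then placed independently and uniformly at random. The function names and the Knuth-style Poisson sampler are illustrative choices, not part of the original material.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) count using Knuth's multiplication method.

    Exact but O(lam), so only suitable for modest lam (not hundreds or more).
    """
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def simulate_csr(intensity, width, height, seed=None):
    """Simulate Complete Spatial Randomness on a width x height rectangle.

    intensity is the expected number of points per unit area (lambda).
    """
    rng = random.Random(seed)
    n = sample_poisson(intensity * width * height, rng)  # how many points
    # Where they fall: independent, uniform positions across the whole area
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]

# Example: 0.5 points per unit area over a 10 x 10 area (50 points expected)
points = simulate_csr(0.5, 10, 10, seed=42)
print(len(points))
```

Repeating the simulation many times gives the range of patterns that CSR can produce, which is the benchmark an observed pattern is compared against.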
The Poisson Distribution
•Describes the probability or rate of an event happening over a fixed interval of time or space
•Where the average number of events in a fixed unit is small (e.g. breweries in a London borough), the distribution is skewed: low counts (including zero) are the most likely
•As the number of events increases, the mean (λ – lambda) increases and the distribution becomes more symmetrical
The Poisson Distribution applies when:
1.The events are discrete and can be counted in integers
2.Events are independent of each other
3.The average number of events over space or time is known
•It’s very useful in Point Pattern Analysis as it allows us to compare a random expected model to our observations
•Where our data do not fit the Poisson model, then something interesting might be going on! Our events might not be independent of each other – they might be clustered or dispersed and something might be causing this…
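The Poisson probability of observing exactly k events when the mean is λ is P(X = k) = λ^k e^(−λ) / k!. A short Python sketch of that formula shows how the shape changes with λ; the function name is illustrative.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean number of events is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

# A small mean piles the probability up at 0-1 events; a larger mean is
# centred on its mean and is much more symmetrical.
for lam in (0.5, 5.0):
    print(lam, [round(poisson_pmf(k, lam), 3) for k in range(11)])
```

Multiplying these probabilities by the number of spatial units gives the expected number of units with 0, 1, 2… events, which is what observed counts are compared against.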
Quadrat Analysis
•Developed and used frequently by ecologists
•Grid of squares
•Count number of incidents (burglaries, cholera deaths, hippos etc.) in each cell – store results in a table
•Compare the observed occurrences with a CSR Poisson model…
•We would expect the probability distribution of X (breweries in London) to have a Poisson distribution if they exhibit Complete Spatial Randomness
•We can test for CSR by comparing the observed and expected counts and using a test such as the chi-squared (Χ^2) test (more on this next week…)
•In this example, the p-value of the Χ² test is < 0.05, so we reject CSR: the points show spatial clustering…
•Simple to employ and therefore common – gives us an idea of whether our data are clustered in space
•Results affected (sometimes quite seriously) by:
•Quadrat size (generally determined as the size of the study area divided by the number of features)
•Quadrat shape (if not uniform) and boundary
•Both are examples of the Modifiable Areal Unit Problem (MAUP)
•Results for this particular set of quadrats are likely to be less reliable!
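A minimal sketch of quadrat analysis in Python (standard library only). It assumes a rectangular study area split into an equal-size grid, and uses the common form of the test in which every quadrat is expected to hold n/m points under CSR (the same idea as R spatstat's `quadrat.test`). The function names are hypothetical.

```python
def quadrat_counts(points, width, height, nx, ny):
    """Count points in each cell of an nx-by-ny grid over [0, width] x [0, height].

    Cells are returned in row-major order, starting from the bottom-left cell.
    """
    counts = [[0] * nx for _ in range(ny)]
    for x, y in points:
        # min(...) keeps points lying exactly on the far edge inside the last cell
        col = min(int(x / width * nx), nx - 1)
        row = min(int(y / height * ny), ny - 1)
        counts[row][col] += 1
    return [c for row in counts for c in row]

def quadrat_chi_squared(counts):
    """Pearson chi-squared statistic against CSR, with n / m points expected per quadrat.

    Returns the statistic and its degrees of freedom (m - 1).
    """
    m = len(counts)
    expected = sum(counts) / m
    stat = sum((observed - expected) ** 2 / expected for observed in counts)
    return stat, m - 1
```

The p-value then comes from a chi-squared distribution with m − 1 degrees of freedom (for example `scipy.stats.chi2.sf(stat, df)`). Re-running with a different `nx`/`ny` is a quick way to see how sensitive the result is to quadrat size.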
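Ripley’s K
•To avoid scale and zoning problems associated with quadrat analysis, Ripley’s K tests for CSR using circles of varying radii around each point
•$K(r) = \lambda^{-1} \sum_{i} \sum_{j \neq i} \frac{I(d_{ij} < r)}{n}$
•In words: Ripley’s K for a circle of radius r is the inverse of the average density of points across the study area (λ = n/A, where A is the area of the study region), multiplied by the number of pairs of points whose separation distance d_ij is less than r, divided by the total number of points, n
•$I(d_{ij} < r)$ is an indicator function: 1 if d_ij < r, otherwise 0
•Under CSR, K(r) = πr², so values above πr² suggest clustering at that radius and values below it suggest dispersion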
•Ripley’s K can be computationally intensive when there are lots of points to consider
•The extent of the study area can affect the calculation – eventually the radius around any point will include all other points in the study area
•Sometimes the phenomenon being studied cannot just occur anywhere (you can’t set up a brewery in the Thames or in the middle of Hyde Park) so will naturally cluster – this needs to be accounted for (compare with similar activities)
•Modifications to the K equation can deal with edge (border) effects
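A naive Python sketch of the K(r) formula, with λ = n/A and no edge correction, compared with the CSR value πr². The double loop over all pairs is exactly why Ripley’s K gets expensive for many points. Production tools (e.g. R spatstat's `Kest`) add the edge corrections mentioned above; the function names here are hypothetical.

```python
import math

def ripley_k(points, r, area):
    """Naive Ripley's K at radius r (no edge correction).

    K(r) = lambda^-1 * sum_i sum_{j != i} I(d_ij < r) / n, with lambda = n / area.
    """
    n = len(points)
    lam = n / area
    pairs = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) < r:
                pairs += 1  # ordered pairs: each close pair is counted twice
    return pairs / (lam * n)

def csr_k(r):
    """Expected K(r) under Complete Spatial Randomness."""
    return math.pi * r ** 2
```

Evaluating both functions over a range of radii and plotting them shows at which scales the pattern is more clustered (K above πr²) or more dispersed (K below πr²) than CSR.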
Extended Point Pattern Analysis
•Point Patterns can also be assessed in the presence of covariates (other points, lines or polygons)
•Covariate analysis explores the influence of other categorical factors on the spatial patterning of points
Density-based spatial clustering of applications with noise - DBSCAN
•DBSCAN is a popular algorithm for detecting clusters of points based on their density
•Popular because it can detect clusters of arbitrary (non-convex) shape
•2 parameters:
•Epsilon (eps) = size of the neighbourhood within which to search for other points
•Minimum number of points (MinPts) that must fall within eps of a point
•If a point has >= MinPts points in its neighbourhood, it is defined as ‘Core’
•If a point is in the neighbourhood of a core point but has < MinPts points in its own neighbourhood, it is defined as ‘Border’
•Points that are neither core nor border points are defined as ‘Noise’
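The core/border/noise rules above can be sketched as a small, unoptimised DBSCAN in Python. It assumes Euclidean distance and counts a point as part of its own neighbourhood (the convention scikit-learn's `sklearn.cluster.DBSCAN` uses for `min_samples`); the function name is hypothetical.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    def neighbours(i):
        # All points within eps of point i, including i itself
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if math.hypot(xi - xj, yi - yj) <= eps]

    labels = [None] * len(points)  # None = not yet visited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later turn out to be a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # reachable from a core point: a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:  # j is also core: grow the cluster through it
                queue.extend(j_neighbours)
    return labels
```

Because clusters grow by chaining core points together, they can follow any shape; the trade-off is that the result depends heavily on the choice of eps and MinPts.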
•Frequently in spatial analysis we don’t just want to study discrete events, but the ways in which variables change across space…
•Similar observations in similar places might be the result of some underlying cause
Tobler’s First Law of Geography
“Everything is related to everything else, but near things are more related than distant things.”
Tobler, W. (1970) "A computer movie simulating urban growth in the Detroit region". Economic Geography, 46(2): 234-240.
Formalising Tobler’s First Law: Spatial Autocorrelation
The correlation among observations of a single (auto means self) variable (smoking rates, unemployment, etc.) that is strictly attributable to the proximity of those observations in geographic space.
Spatial Autocorrelation
•“Everything is related to everything else, but near things are more related than distant things”
•Are the GCSE scores of pupils in London more likely to be similar in areas that are close to each other than those in areas that are distant?
•What are near and distant things?
•When it’s adjacent…?
•Contiguity (common boundary)
•What is a shared boundary? Edges (rook’s case)? Nodes (bishop’s case)? Both (queen’s case)?
•How many neighbours are important?
•When they are the nearest, or second, third, fourth… Kth nearest?
•What is K – how many neighbours?
•When it’s within a certain physical distance…?
•What threshold – 1km, 10km, 100km?
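The neighbour definitions above can be sketched as binary spatial weights matrices in Python: rook/bishop/queen contiguity for a regular grid of cells, K nearest neighbours, and a distance threshold. This is an illustration under simplifying assumptions (a regular grid rather than real polygons, Euclidean distance); in R the equivalent steps are handled by spdep functions such as `poly2nb`, `knearneigh` and `dnearneigh`. The function names here are hypothetical.

```python
import math

def grid_contiguity(nrows, ncols, rule="queen"):
    """Binary contiguity weights for the cells of a regular grid (row-major order).

    rook: shared edge; bishop: shared corner only; queen: edge or corner.
    """
    if rule not in ("rook", "bishop", "queen"):
        raise ValueError("rule must be 'rook', 'bishop' or 'queen'")
    n = nrows * ncols
    w = [[0] * n for _ in range(n)]
    for r in range(nrows):
        for c in range(ncols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if (dr, dc) == (0, 0):
                        continue
                    diagonal = dr != 0 and dc != 0
                    if (rule == "rook" and diagonal) or (rule == "bishop" and not diagonal):
                        continue
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < nrows and 0 <= cc < ncols:
                        w[r * ncols + c][rr * ncols + cc] = 1
    return w

def knn_weights(points, k):
    """Link each point to its k nearest other points (not symmetric in general)."""
    n = len(points)
    w = [[0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(points):
        others = sorted((math.hypot(xi - xj, yi - yj), j)
                        for j, (xj, yj) in enumerate(points) if j != i)
        for _, j in others[:k]:
            w[i][j] = 1
    return w

def distance_band_weights(points, threshold):
    """Link every pair of distinct points within threshold distance of each other."""
    n = len(points)
    return [[1 if i != j and math.dist(points[i], points[j]) <= threshold else 0
             for j in range(n)] for i in range(n)]
```

On a 3 × 3 grid, the centre cell has 8 queen, 4 rook and 4 bishop neighbours, which shows how much the choice of rule changes the weights matrix.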
•Indexes to compare values for neighbouring features
•Help answer the question – “are the values for neighbouring features more similar than those for all other features?”
•If the average difference between neighbouring features is less than between all features, values are clustered.
•Neighbours can be specified by adjacency, a set distance or distance to all features, and are always represented as a spatial weights matrix.
•There are global (one number for the whole system of interest) and local (individual numbers for each zone in the system) versions of both.
Moran’s I
•Classic and very popular (best?) measure of spatial autocorrelation
•Moran’s I = Do we have a clustered pattern or a dispersed pattern?
•Depends on neighbour definition
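•Typically ranges from -1 (negative autocorrelation) to 1 (positive autocorrelation). Values near 0 indicate no relationship (strictly, the expected value with no autocorrelation is −1/(N−1)).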
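•$I = \frac{N}{\sum_i \sum_j W_{ij}} \cdot \frac{\sum_i \sum_j W_{ij} (X_i - \bar{X})(X_j - \bar{X})}{\sum_i (X_i - \bar{X})^2}$, where:
•N = number of spatial units, indexed by i and j (6 in the example)
•X_i = variable of interest (people, badgers etc.) in zone i
•$\bar{X}$ = mean of X (3.83 in the example)
•W_ij = spatial weights matrix
Geary’s C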
•Value of Geary’s C usually lies between 0 and 2
•1 = no spatial autocorrelation
•<1 positive auto-correlation (similar values near each other)
•>1 negative auto-correlation (dissimilar values near each other)
•Good if values are evenly spread and spikes are high or low
•Tends to cancel out if both high and low values cluster…
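Both global statistics can be sketched in a few lines of Python, given the values and an n × n weights matrix. Moran’s I follows the formula in terms of N, X̄ and W_ij; Geary’s C uses the standard form C = (N − 1) ΣΣ W_ij (X_i − X_j)² / (2 ΣΣ W_ij Σ (X_i − X̄)²). The function names are hypothetical; R's spdep provides tested versions with significance tests (`moran.test`, `geary.test`).

```python
def morans_i(x, w):
    """Global Moran's I for values x and an n x n spatial weights matrix w."""
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]
    s0 = sum(sum(row) for row in w)  # sum of all weights
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)

def gearys_c(x, w):
    """Global Geary's C for values x and an n x n spatial weights matrix w."""
    n = len(x)
    mean = sum(x) / n
    s0 = sum(sum(row) for row in w)
    num = sum(w[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n))
    den = sum((xi - mean) ** 2 for xi in x)
    return (n - 1) * num / (2 * s0 * den)

# Four zones in a line (1-2-3-4), each a neighbour of the next
chain = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
print(morans_i([1, 2, 3, 4], chain), gearys_c([1, 2, 3, 4], chain))  # smooth trend
print(morans_i([1, 4, 1, 4], chain), gearys_c([1, 4, 1, 4], chain))  # alternating
```

The smooth trend gives a positive Moran’s I and a Geary’s C below 1 (similar values together); the alternating pattern gives a negative I and a C above 1 (dissimilar values together).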
LISA – Local Indicators of Spatial Association
•E.g. Local Moran’s I & Getis-Ord Gi*
•Evolution of both previous global statistics
•Allows us to see where in our spatial system:
•High or low values are clustered (Getis-Ord Gi*) – ‘Hot Spot Analysis’
•Clustered values in relation to other clustered values
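A minimal Python sketch of Local Moran’s I, assuming the common form I_i = (z_i / m2) Σ_j w_ij z_j with z_i = x_i − x̄, m2 = Σ z_i² / n and row-standardised weights, plus a simple high/low quadrant label for each zone. It does not include the permutation tests needed to judge significance (R's spdep `localmoran` provides those), and it does not cover Getis-Ord Gi*. The function names are hypothetical.

```python
def local_morans_i(x, w):
    """Local Moran's I for each unit, using row-standardised weights."""
    n = len(x)
    mean = sum(x) / n
    z = [xi - mean for xi in x]
    m2 = sum(zi * zi for zi in z) / n
    w_std = []
    for row in w:
        total = sum(row)
        # Each unit's neighbour weights sum to 1 (a unit with no neighbours gets 0)
        w_std.append([wij / total if total else 0.0 for wij in row])
    return [z[i] / m2 * sum(w_std[i][j] * z[j] for j in range(n)) for i in range(n)]

def lisa_quadrants(x, w):
    """Label each unit 'high-high', 'low-low', 'high-low' or 'low-high'.

    The first part compares the unit's own value with the mean; the second compares
    the average of its neighbours (the spatial lag). Values equal to the mean count as low.
    """
    n = len(x)
    mean = sum(x) / n
    labels = []
    for i in range(n):
        total = sum(w[i])
        lag = sum(w[i][j] * x[j] for j in range(n)) / total if total else mean
        labels.append(("high" if x[i] > mean else "low") + "-" + ("high" if lag > mean else "low"))
    return labels
```

Positive local values mark units surrounded by similar values (high-high or low-low clusters); negative values mark outliers (high-low or low-high).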
Limitations of Geary’s C, Moran’s I and Getis-Ord G: Effect of Spatial Units/Scale
•You can’t compare the Moran’s I, Geary’s C or Getis-Ord G statistics between study areas with different numbers or sizes of zones.
•The Modifiable Areal Unit Problem – MAUP (refer back to previous lectures and practicals)
•Calculating the neighbours is computationally intensive: it creates an n × n (n²) matrix.
•So 10 spatial units is manageable (100 comparisons), but 1,000 requires 1,000,000 comparisons, 10,000 requires 100,000,000, etc.
Limitations: Applications
•Global measures can mask local trends (although local versions deal with this to an extent)
•Especially with fine-scale spatial units
•Good summary, but never possible to capture complex spatial interactions in a single figure
•Normally used as an exploratory tool rather than a definitive statement
Software Tools
•Various spatial autocorrelation algorithms (local and global) are available in both R and ArcGIS
•R is more advanced in terms of the options available – more difficult to master
•ArcGIS can be used for some of the more popular measures – easier to implement. It is often slower.
SUMMARY
•Well established techniques are available to allow us to investigate spatial phenomena
•Point Pattern Analysis techniques typically compare observed distributions of points to an expected model based on the Poisson distribution – with varying degrees of sophistication
•Analysis of the spatial autocorrelation of continuous variables over space allows us to assess if similar values cluster in space
•How we define neighbours is crucial in spatial autocorrelation analysis
•Understanding your data is key to applying the appropriate technique