- Definition, and relationship to geographic representation
- Conception, measurement and analysis
- Vagueness, indeterminacy, accuracy
- Statistical models of uncertainty
- Error propagation
- Living with uncertainty
- Biases and Statistical Fallacies
Are these true? Why?
- If nothing changes during an intervention, the results show the impact of the intervention
- If something is good, then more is better
- If I get too many tails, I should get a head quite soon
- Good-looking people are self-centered
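The third statement is the gambler's fallacy. As a minimal sketch, assuming a fair coin and independent flips (the function name, streak length and seed are illustrative choices, not part of the original notes), the simulation below estimates how often a head follows a run of five tails.

```python
import random

def heads_rate_after_tail_streak(n_flips=200_000, streak=5, seed=42):
    """Estimate P(head) on the flip that follows at least `streak` consecutive tails."""
    rng = random.Random(seed)
    tails_run = 0          # number of consecutive tails seen immediately before this flip
    followers = []         # outcomes of flips that came right after a long run of tails
    for _ in range(n_flips):
        is_head = rng.random() < 0.5
        if tails_run >= streak:
            followers.append(is_head)
        tails_run = 0 if is_head else tails_run + 1
    return sum(followers) / len(followers), len(followers)

rate, n = heads_rate_after_tail_streak()
print(f"P(head | previous 5 flips were tails) ~ {rate:.3f} from {n} cases")
```

If the flips really are independent, the estimate should stay close to 0.5 however long the preceding run of tails is.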
Introduction
- Our world is too big and complex to be measured, studied, modeled, represented, and predicted with zero uncertainty
- The true value is unknown!
- Measurements are not perfect (issues with quality and biases)

What do we mean by 'uncertainty'? (Spatial, Temporal, Thematic) Uncertainty
- Uncertainty accounts for the difference between the contents of a dataset and the phenomena that data are supposed to represent
- Inaccuracy and error: deviation from true values.
- Vagueness: imprecision in concepts used to describe the information (e.g., near, close, around, far).
- Incompleteness: lack of relevant information.
- Inconsistency: conflicts arising from the information.
- Imprecision: limitation on the granularity or resolution at which the observation is made, or the information is represented.
The complexity and size of our world make it virtually impossible to capture
(A) every single facet,
(B) at every possible scale,
(C) from all individuals' perspectives (as we see the world differently!)

Sources of Uncertainty
- Measurement error: different observers, measuring instruments
- Let's play! Volunteer plz
- Specification error: omitted variables, or not measuring the correct thing
- Ambiguity, vagueness and the quality of a GIS representation.
Uncertainty 1: Uncertainty between the real world and its Conception
- Spatial uncertainty
  - Natural geographic units?
  - Discrete objects
- Vagueness
  - Statistical, cartographic, cognitive
- Ambiguity
  - Values, language
More on vagueness
- What is rural?
- What is remote?
- What is accessible?
- What is large?
- Statistical thresholds for each?
- Different sources
- Direct/indirect indicators
- Context
Uncertainty 2: Uncertainty in measurement and representation
- Representational models filter reality differently
  - Vector
  - Raster
- A coastline
  - Vector resolution: number of vertices?
  - Cell assignment?
- Statistical measures of uncertainty: a raster example
  - How to measure the accuracy of nominal attributes?
  - e.g. land/sea or a vegetation cover map
The confusion matrix
- Compares recorded classes (the observations) with classes obtained by some more accurate process, or from a more accurate source (the reference)
- A confusion matrix can be used to measure change over time, or error in measurement when data are gathered from different sources
- Examining every parcel may not be practical
- Rarer classes should be sampled more often in order to assess accuracy reliably, so sampling is often stratified by class
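The confusion matrix can be summarised numerically. The sketch below is an illustration only: the 2 x 2 land/sea counts are invented, and the function name is hypothetical. It also reports Cohen's kappa and the user's/producer's accuracies, which are commonly used summaries but are not named in the notes above.

```python
def accuracy_summary(matrix):
    """matrix[i][j] = sampled cells recorded as class i whose reference class is j."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]                                # recorded totals
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]     # reference totals
    overall = sum(matrix[i][i] for i in range(k)) / n                        # share on the diagonal
    # Agreement expected by chance, given the row and column totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (overall - expected) / (1 - expected)
    users = [matrix[i][i] / row_totals[i] for i in range(k)]       # correct / recorded as class i
    producers = [matrix[j][j] / col_totals[j] for j in range(k)]   # correct / truly class j
    return overall, kappa, users, producers

# Hypothetical sample of 100 cells; rows = recorded (land, sea), columns = reference (land, sea).
land_sea = [
    [45, 5],
    [10, 40],
]
overall, kappa, users, producers = accuracy_summary(land_sea)
print(f"overall accuracy = {overall:.2f}, kappa = {kappa:.2f}")
print("user's accuracy:", [round(u, 3) for u in users])
print("producer's accuracy:", [round(p, 3) for p in producers])
```

Kappa discounts the agreement that would be expected by chance alone, which matters when one class dominates the map.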
Biases and Statistical Fallacies
- Survivorship bias
- Volunteer sampling vs random sampling
- 300 types of cognitive bias!
- Law of Small Numbers (the erroneous belief that small samples must be representative of the larger population)
- Law of Large Numbers
- Confirmation and congruence biases
- Bandwagon, groupthink and herd behaviour biases
- Status quo, loss aversion and endowment biases
- Lake Wobegon, self-serving and overconfidence biases
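The contrast between the "law of small numbers" and the law of large numbers can be sketched with a short simulation. This assumes a population of uniform random values between 0 and 1 (true mean 0.5); the function name, sample sizes and seed are illustrative choices.

```python
import random
import statistics

def spread_of_sample_means(sample_size, n_samples=1000, seed=1):
    """Standard deviation of the means of `n_samples` random samples of a given size."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.random() for _ in range(sample_size)])
        for _ in range(n_samples)
    ]
    return statistics.stdev(means)

small = spread_of_sample_means(5)     # means of tiny samples vary a lot
large = spread_of_sample_means(500)   # means of large samples cluster near 0.5
print(f"spread of sample means: n=5 -> {small:.3f}, n=500 -> {large:.3f}")
```

Small samples are frequently unrepresentative; only as the sample grows does the sample mean settle near the population mean.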
Why else data sources may not agree – precision & accuracy
The term precision is often used to refer to the repeatability of measurements. In both diagrams six measurements have been taken of the same position, represented by the center of the circle. On the left, successive measurements have similar values (they are precise), but show a bias away from the correct value (they are inaccurate). On the right, precision is lower but accuracy is higher.
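The two diagrams can be described with two numbers: bias (how far the average measurement lies from the true position) and spread (how far measurements scatter around their own average). The sketch below uses invented coordinates, in metres relative to a true position at the origin; they are not taken from the diagrams.

```python
import math

def bias_and_spread(points, true_position=(0.0, 0.0)):
    """Return (distance of the mean from the truth, RMS distance from the mean)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    bias = math.hypot(mean_x - true_position[0], mean_y - true_position[1])
    spread = math.sqrt(
        sum((x - mean_x) ** 2 + (y - mean_y) ** 2 for x, y in points) / n
    )
    return bias, spread

# Hypothetical left diagram: tightly clustered but offset from the truth.
left = [(2.0, 2.1), (2.2, 1.9), (1.9, 2.0), (2.1, 2.2), (2.0, 1.8), (1.8, 2.0)]
# Hypothetical right diagram: scattered but centred on the truth.
right = [(1.5, -1.0), (-1.2, 0.8), (0.3, 1.6), (-0.9, -1.4), (1.1, 0.4), (-0.8, -0.4)]

left_bias, left_spread = bias_and_spread(left)
right_bias, right_spread = bias_and_spread(right)
print(f"left:  bias = {left_bias:.2f} m, spread = {left_spread:.2f} m")
print(f"right: bias = {right_bias:.2f} m, spread = {right_spread:.2f} m")
```

A small spread with a large bias corresponds to "precise but inaccurate"; a large spread with a small bias corresponds to "accurate but imprecise".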
Why else data sources may not agree - Reporting Measurements
- The amount of detail in a reported measurement (e.g., output from a GIS) should reflect its accuracy
- “14.4m” implies an accuracy of 0.1m
- “14m” implies an accuracy of 1m
- Excess precision can be removed by rounding
Scale & Geographic Units
- The geographic units chosen to represent spatial phenomena can lead to uncertainty due to:
  - Scale
  - Spatial extent (shape)
- http://www.ons.gov.uk/ons/guide-method/geography/ons-geography/index.html
UK example:
- Scale of analysis can alter the information revealed in GIS
- We should never assume that patterns present at one level apply to other levels further down the hierarchy; doing so is known as the ecological fallacy
- The fallacy assumes that all individuals in a group (such as an OA) share the same average characteristics.
- This is clearly not the case.
- “The average age for people in Camden is 35”
- Useful context but problematic in calculations.
- Several layers of administrative geography in the UK
- Government Office Regions are the largest
  - 2011: Yorkshire and the Humber (Y&H) crude death rate (CDR) of 9.2 per 1000 people
  - Is this accurate for everywhere in Y&H...?
- Local Authority districts are the next level down the hierarchy
  - Crude death rate for Leeds: 8.5 per 1000 people
  - Big student town, many young people
- Wards or Middle Layer Super Output Areas (MSOA) are the next level down
  - More deprived, poorer life expectancy in East Leeds: CDR (estimated) 10.1 per 1000
- Lower Layer Super Output Areas (LSOA) are nested within MSOAs
  - SOAs are examples of uniform spatial units, as they have similar populations
  - CDR could be 10.5 per 1000
- Output Areas (also uniform units) are the smallest geographical areas for which data are commonly produced in the UK
  - Old people's home in LS15 4BX increases the CDR for the whole OA
  - CDR 12.1 per 1000
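The hierarchy above shows why a rate for a large unit should not be read as the rate for each place inside it. A minimal sketch with invented populations and death counts (the area names and numbers are hypothetical, not the Leeds figures quoted above): the aggregate crude death rate is a population-weighted average that can hide very different local rates.

```python
# Hypothetical small areas: (name, population, deaths in one year).
areas = [
    ("student district", 12000, 60),
    ("mixed residential", 8000, 84),
    ("OA with a care home", 300, 12),
]

def crude_death_rate(deaths, population):
    """Deaths per 1000 people."""
    return 1000 * deaths / population

local_rates = {name: crude_death_rate(d, p) for name, p, d in areas}
total_pop = sum(p for _, p, _ in areas)
total_deaths = sum(d for _, _, d in areas)
aggregate_rate = crude_death_rate(total_deaths, total_pop)

for name, rate in local_rates.items():
    print(f"{name}: {rate:.1f} per 1000")
print(f"all areas combined: {aggregate_rate:.1f} per 1000")
```

Assigning the combined rate to every resident, or to every sub-area, would be an instance of the ecological fallacy.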
The Modifiable Areal Unit Problem - MAUP
Changes in the shape or spatial extent of the geographical unit for which data are presented can change the patterns that are revealed
This is known as the Modifiable Areal Unit Problem (MAUP)
MAUP: example
MAUP is a challenge in choropleth maps, due to boundary options
It can often be assessed via geographic sensitivity analysis (i.e. repeating the analysis using a variety of different spatial units for the same area)
Scale + aggregation = MAUP
It can be investigated through simulation of large numbers of alternative zoning schemes.
Scale and geographical units
Functional spatial units can help avoid the problems of MAUP observable with uniform units
Functional regions define units according to phenomena, e.g. commuting patterns
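A geographic sensitivity analysis of the kind described above can be sketched on a toy grid. Everything here is invented for illustration: the 4 x 4 grid of case counts, the equal cell populations, and the three zoning schemes. The same underlying data are aggregated in different ways and the resulting rates compared.

```python
from collections import defaultdict

# Hypothetical 4 x 4 grid of small cells: case counts, with 100 residents per cell.
cases = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
POPULATION_PER_CELL = 100

def zone_rates(zone_of):
    """Aggregate cells into zones; return cases per 100 residents for each zone."""
    totals = defaultdict(lambda: [0, 0])  # zone -> [cases, population]
    for r, row in enumerate(cases):
        for c, count in enumerate(row):
            zone = zone_of(r, c)
            totals[zone][0] += count
            totals[zone][1] += POPULATION_PER_CELL
    return {zone: 100 * k / p for zone, (k, p) in sorted(totals.items())}

# Three alternative zoning schemes for the same area.
zonings = {
    "west/east halves": lambda r, c: "west" if c < 2 else "east",
    "quadrants": lambda r, c: ("N" if r < 2 else "S") + ("W" if c < 2 else "E"),
    "whole area": lambda r, c: "all",
}

results = {name: zone_rates(zone_of) for name, zone_of in zonings.items()}
for name, rates in results.items():
    print(name, rates)
```

The hotspot appears as a 9% zone under one scheme, a 5% zone under another, and disappears into a 3% average under the third, even though the underlying cells never change.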
Other representation issues:
We often need to understand the underlying context of our data
Mortality rates are a classic example...
Standardised Mortality Rates
Underlying population structures can obscure patterns
- Low crude death rate in West Lothian (Edinburgh)
- Much higher SMR: West Lothian has a young population
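The gap between a crude rate and a standardised measure can be shown with a small calculation. The sketch below assumes indirect standardisation, with the SMR expressed as observed deaths divided by expected deaths, times 100. The age bands, rates, populations and death count are all invented and are not West Lothian data.

```python
# Hypothetical age-specific death rates (per 1000) in the standard population.
standard_rates = {"0-44": 1.0, "45-74": 10.0, "75+": 80.0}
# Hypothetical age structures: the local area is much younger than the standard.
standard_population = {"0-44": 55000, "45-74": 35000, "75+": 10000}
local_population = {"0-44": 70000, "45-74": 25000, "75+": 5000}
observed_local_deaths = 900

def expected_deaths(population, rates_per_1000):
    """Deaths expected if each age band experienced the standard rates."""
    return sum(population[age] * rates_per_1000[age] / 1000 for age in population)

expected_local = expected_deaths(local_population, standard_rates)
smr = 100 * observed_local_deaths / expected_local

crude_local = 1000 * observed_local_deaths / sum(local_population.values())
crude_standard = (
    1000 * expected_deaths(standard_population, standard_rates)
    / sum(standard_population.values())
)

print(f"crude death rate: local {crude_local:.2f} vs standard {crude_standard:.2f} per 1000")
print(f"SMR = {smr:.0f} (100 means deaths are as expected for the age structure)")
```

In this invented case the local crude rate is lower than the standard one, yet the SMR is above 100, because the young age structure masks higher-than-expected mortality.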
Uncertainty 3: Uncertainty in Analysis - Error Propagation
Cumulative effects of errors can be large and errors can persist in geographic data
Error propagation measures the effects of errors and uncertainty on the results of GIS analysis
Almost every input to a GIS is subject to error and uncertainty
In principle, every output should have confidence limits or some other expression of uncertainty – in practice this is not always the case...
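One way to attach an expression of uncertainty to an output is to propagate the input errors through the calculation. The sketch below is a simple illustration, assuming independent, normally distributed errors on the two measured sides of a rectangular parcel; the dimensions, standard errors and seed are invented. It compares a first-order analytic estimate with a Monte Carlo estimate of the uncertainty in the computed area.

```python
import math
import random
import statistics

length, width = 100.0, 50.0       # measured sides in metres (hypothetical)
sd_length, sd_width = 0.5, 0.5    # assumed standard errors in metres

# First-order (analytic) propagation for area = length * width:
# sd_area^2 ~ (width * sd_length)^2 + (length * sd_width)^2
analytic_sd = math.sqrt((width * sd_length) ** 2 + (length * sd_width) ** 2)

# Monte Carlo propagation: perturb the inputs many times and recompute the output.
rng = random.Random(0)
areas = [
    rng.gauss(length, sd_length) * rng.gauss(width, sd_width)
    for _ in range(20000)
]
mc_mean = statistics.fmean(areas)
mc_sd = statistics.stdev(areas)

print(f"area = {length * width:.0f} m^2")
print(f"analytic sd = {analytic_sd:.1f} m^2, Monte Carlo sd = {mc_sd:.1f} m^2")
```

The Monte Carlo approach extends to GIS operations that have no simple formula, at the cost of rerunning the analysis many times.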
Taking Advantage of geographic uncertainty: Gerrymandering
From Wikipedia: First printed in March 1812, this political cartoon was drawn in reaction to the state senate electoral districts drawn by the Massachusetts legislature to favour the Democratic-Republican Party candidates of Governor Elbridge Gerry over the Federalists. The caricature satirises the bizarre shape of a district in Essex County, Massachusetts as a dragon. Federalist newspaper editors and others at the time likened the district shape to a salamander, and the word gerrymander was a blend of that word and Governor Gerry's last name.
Are these always true?
If something is good, more is better
If nothing changes during an intervention, the results show the impact of the intervention
If I get too many tails, I should get a head quite soon
Good-looking people are self-centred
...And be careful about negative rates/percentages in your analysis
...Ah, also causality and correlation
And GIGO!
Importance: GIGO (garbage in, garbage out)
Most algorithms are only as good as the quality and accuracy of the training data. If the data is biased or skewed in any way, the resulting algorithm will be, as well.
Dealing with uncertainty in analysis – ground truthing
- Geodemographic classifications classify areas according to the types of people living within them
- Used commercially and in the public sector
  - Output Area Classification
- Various sources of analysis error in classifications
  - Clustering algorithm
  - Data sources
  - Variable selection
- Ground-truthing can involve visiting places ‘on the ground’ to see if profiles make sense
It is easy to see the importance of uncertainty in GIS but much more difficult to deal with it effectively
We need to acknowledge this and account for it as best we can. Don’t pretend it doesn’t exist, or our data/results are perfect!
38.74564% of all statistics are made up (including this one!)
Evaluate the reliability and provenance of your data
Reproducibility
Some Basic Principles
Uncertainty is inevitable in GIS
Data obtained from others should never be taken as truth (but are essential for quality assessment); efforts should be made to determine their quality
Effects on GIS outputs are often much greater than expected; there is an automatic tendency to regard outputs from a computer as the truth
More Basic Principles
Use as many sources of data as possible, and cross-check them for accuracy
Be honest and informative in reporting results; add plenty of caveats and cautions
Consolidation
Uncertainty is more than error
Richer representations create uncertainty!
Need for a priori understanding of data and sensitivity analysis