Saturday 26 September 2020

Spatial Analysis Methodologies and Spatial Processes

Outline

Spatial Analysis basics – Geometric Properties:
Calculating Distance, Area, Shape
Generalisation and Centroids
Spatial analysis concepts: 
• Topology
• Clip, Merge, Intersect


Spatial Analysis

“Spatial analysis or spatial statistics includes any of the formal techniques which study entities using their topological, geometric, or geographic properties.” – Wikipedia (slaps wrist...!)
Spatial analysis is a set of methods whose results are not invariant under changes in the locations of the objects being analysed.
Spatial analysis (simple or complex) can make what is implicit explicit.
We are not looking at the analysis of attributes here.
Geometric properties are associated with the shape, size and relative positions of objects
Topological properties relate to the elements of an entity’s geometry that remain unchanged if you alter its shape, size etc.
Geographic properties relate to an entity’s place on the earth – the importance of location

Distance


Calculating Distance on a National Grid

Very easy using Pythagoras’ theorem: A² + B² = C²
Distance between:
Blue dot (x = 300, y = 800) and red dot (x = 500, y = 200)
|300 − 500|² + |800 − 200|² = C²
200² + 600² = 400,000
√400,000 ≈ 632.5 km
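The same calculation is easy to script. A minimal sketch in R (the example coordinates above are assumed to be in kilometres on a projected grid):

```r
# Euclidean distance between two points on a projected (planar) grid
grid_distance <- function(x1, y1, x2, y2) {
  sqrt((x1 - x2)^2 + (y1 - y2)^2)
}

grid_distance(300, 800, 500, 200)   # 632.4555 (km, given km grid units)
```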

The importance of distance in spatial analysis...
We perform many analyses and processes using distance, such as Select, Clip, Proximity and Buffer, Interpolation, Overlay, Union and Intersect

Geometric properties: Area
Depends on distance
Very common calculation in GIS
Different to shape
Can be summarized as a bounding box (i.e. a box which contains the shape of interest)
Very important as a denominator for density, e.g. Population / Area = Population density
Calculating Area – a standard GIS method
Calculating area is straightforward when we know the coordinates of the vertices...
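One common approach (likely what the slides have in mind, though that is an assumption) is the “shoelace” formula over the ordered vertex coordinates. A minimal sketch in R, for a simple, non-self-intersecting polygon in projected coordinates:

```r
# Shoelace formula: area of a simple polygon from ordered vertex coordinates
polygon_area <- function(x, y) {
  n <- length(x)
  j <- c(2:n, 1)                       # index of the "next" vertex, wrapping around
  abs(sum(x * y[j] - x[j] * y)) / 2
}

# Unit square as a quick check: should return 1
polygon_area(c(0, 1, 1, 0), c(0, 0, 1, 1))
```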


Shape – assessing the level of Gerrymandering
Shapes of electoral districts have become very irregular in some places as politicians have attempted to alter boundaries to capture more votes
Ansolabehere and Palmer (2015) use measures of shape compactness to assess the level of Gerrymandering in US congressional districts
http://www.vanderbilt.edu/csdi/events/ansolabehere_palmer_gerrymander.pdf

Geometric properties: Shape
Again, dependent on distance.
Various ways to measure irregular polygons:
Perimeter to area ratio, e.g. L_i² / A_i (perimeter squared over area)
Compactness ratio, e.g. comparing the polygon’s area with the area of a circle of the same perimeter (a circle is maximally compact; values fall as shapes become more irregular)
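As a sketch of the idea in R, using the Polsby-Popper style compactness 4πA/P² (one of several possible compactness measures, chosen here as an assumption; it equals 1 for a circle and approaches 0 for very irregular shapes):

```r
# Perimeter of a closed polygon from ordered vertices
polygon_perimeter <- function(x, y) {
  j <- c(2:length(x), 1)
  sum(sqrt((x[j] - x)^2 + (y[j] - y)^2))
}

# Compactness: 4*pi*Area / Perimeter^2 (1 = circle, -> 0 = very irregular)
compactness <- function(x, y) {
  A <- abs(sum(x * c(y[-1], y[1]) - c(x[-1], x[1]) * y)) / 2   # shoelace area
  P <- polygon_perimeter(x, y)
  4 * pi * A / P^2
}

compactness(c(0, 1, 1, 0), c(0, 0, 1, 1))   # square: ~0.785
```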
Geometric properties: Smoothing/ Generalisation
Overly complex (detailed – large in computer memory) shapes can be generalised.
Various algorithms will generalise shapes
Smoothing can be used to estimate shapes with poor spatial resolution



Topology

Königsberg Bridge Problem – can you find a walk through the city that crosses each bridge once and only once? (Euler, 1736)

Geography, shape and distance are all irrelevant; only the relationship between the land masses and the bridges connecting them matters

Laid the foundations of Graph Theory

Topology

An important concept in spatial analysis. 4 main properties:
Dimensionality: the distinction between point, line, area, and volume, which are said to have topological dimensions of 0, 1, 2, and 3 respectively
Adjacency: including the touching of land parcels, counties, and nation-states
Connectivity: including junctions between streets, roads, railroads, and rivers
Containment: when a point lies inside rather than outside an area

Topological Analysis - Next To, Within, Intersects
Spatial queries are one of the core functions of any GIS system
They rely on topological relationships

Making intersection tests more efficient
Take care of easy cases using coordinate comparisons
Only test further if the bounding boxes intersect
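A minimal sketch of that bounding-box pre-check in R (extents are assumed to be supplied as xmin/xmax/ymin/ymax values):

```r
# Cheap rejection test: two features can only intersect if their
# bounding boxes overlap on both axes
bboxes_intersect <- function(a, b) {
  !(a["xmax"] < b["xmin"] || b["xmax"] < a["xmin"] ||
    a["ymax"] < b["ymin"] || b["ymax"] < a["ymin"])
}

a <- c(xmin = 0, xmax = 2, ymin = 0, ymax = 2)
b <- c(xmin = 3, xmax = 5, ymin = 0, ymax = 1)
bboxes_intersect(a, b)   # FALSE - no need to run the expensive geometry test
```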

Sweepline Algorithm
Event queue
  1. Start Segment (S2)
  2. End Segment (E2)
  3. Intersection (I2,4)
Move sweepline to next event
Maintain vertical order of segments as the line sweeps across
  • Start Segment
    Insert in list
    Check above and below for intersection
  • End Segment
    Remove from list
    Check newly adjacent segments for intersection
  • Intersection
    Reorder segments
    Check above and below for intersection

Topological Analysis - Point In Polygon
Topological rules can be put to good use when carrying out some spatial analysis tasks
Counting the number of points contained within a polygon is one of the most common of these
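One classic way to implement the containment test is ray casting: count how many polygon edges a horizontal ray from the point crosses, and an odd count means the point is inside. In practice a GIS or a package such as sf does this for you; the sketch below (in R) is just to show the topological idea:

```r
# Ray-casting point-in-polygon test for a simple polygon
# px, py: the point; x, y: ordered polygon vertices
point_in_polygon <- function(px, py, x, y) {
  n <- length(x)
  inside <- FALSE
  j <- n
  for (i in 1:n) {
    crosses <- ((y[i] > py) != (y[j] > py)) &&
      (px < (x[j] - x[i]) * (py - y[i]) / (y[j] - y[i]) + x[i])
    if (crosses) inside <- !inside
    j <- i
  }
  inside
}

point_in_polygon(0.5, 0.5, c(0, 1, 1, 0), c(0, 0, 1, 1))   # TRUE
```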

Spatial Autocorrelation - Tobler’s First Law of Geography

"Everything is related to everything else, but near things are more related than distant things.” See Tobler, 1970.

Formalising Tobler’s First Law: Spatial Dependence/ Autocorrelation
Points closer to the red point are more likely to have characteristics which are similar
Sunlight, slope, vegetation, soil pH, etc.

Conclusions
These concepts are at the heart of spatial analysis software and processes.
Many aspects are now routine in GIS.
Effective spatial analysis requires an intelligent user, not just a powerful computer.
Try to avoid complacency.

Friday 18 September 2020

Core Components of Spatial Analysis: Spatial Patterns and Spatial Autocorrelation

Outline
  • The importance of pattern
  • Patterns of categorical point data – Point Pattern Analysis
  • Quadrat Analysis
  • Ripley’s K
  • DBSCAN
  • Patterns of spatially referenced continuous observations
  • Spatial autocorrelation
  • Defining near and distant things
  • Measuring spatial autocorrelation
  • Moran’s I
  • Geary’s C
  • LISA
The importance of Patterns

Need to disentangle the influence of space. It may be of specific interest, or need to be removed. 

The human eye/brain is notoriously unreliable at spotting patterns!!

Quantifying Spatial patterns

There are a number of techniques in spatial analysis which can allow us to quantify spatial patterns…
Some techniques are more appropriate for understanding the spatial distribution of discrete objects or events of a categorical nature, e.g.
Cholera Deaths; Blue plaques; Trees; Post boxes; Burglaries, Breweries in London etc.
Often these events are recorded as points and as such the techniques fall under the banner of: Point Pattern Analysis
Here the properties are fixed, but the space varies

Are the points clustered or are they random?
Quantifying Spatial Pattern

•On other occasions, we might observe a spatial distribution of some kind of attribute, for example:
•Rate of smoking across output areas in London
•Levels of unemployment across Census tracts in the US
•Average number of road accidents along sections of road
•Where rates of smoking, levels of unemployment, numbers of road accidents appear more similar in places that are closer, they are said to exhibit Spatial Autocorrelation
•The level of Spatial Autocorrelation can be assessed using statistics such as:
•Geary’s C
•Moran’s I

Here the space is fixed, but the properties vary

Are the values clustered or are they random?
Point Pattern Analysis
•At the core of point pattern analysis is the question:
•Are these points distributed in a random way or is there some sort of pattern (uniform or clustered)?
•The expected random model is known as Complete Spatial Randomness (CSR)
•A random distribution of points is said to have a Poisson distribution
•By comparing the distribution of observed points with a CSR Poisson model, we can tell if we have an interesting point distribution….

The Poisson Distribution
•Describes the probability or rate of an event happening over a fixed interval of time or space
•Where the total number of events in a fixed unit is small (e.g. Breweries in a London Borough), then the probability of getting a low rate is higher
•As the number of events increases, the mean (λ – lambda) increases and the probability distribution changes
The Poisson Distribution applies when:
1.The events are discrete and can be counted in integers
2.Events are independent of each other
3.The average number of events over space or time is known
•It’s very useful in Point Pattern Analysis as it allows us to compare a random expected model to our observations
•Where our data do not fit the Poisson model, then something interesting might be going on! Our events might not be independent of each other – they might be clustered or dispersed and something might be causing this…
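As a small illustration in R (the numbers are made up for the sketch): with an average of λ = 2 breweries per borough, the Poisson model gives the expected share of boroughs containing 0, 1, 2, ... breweries under complete spatial randomness, which can then be compared with what is actually observed.

```r
# Expected probabilities under a Poisson model with mean lambda
lambda <- 2                       # hypothetical mean events per areal unit
counts <- 0:6
data.frame(count = counts,
           probability = round(dpois(counts, lambda), 3))
# Comparing these expected shares with the observed shares is the basis
# of the quadrat test described below
```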

Testing for CSR - Point Pattern Analysis

Quadrat Analysis

•Developed and used frequently by ecologists
•Grid of squares
•Count number of incidents (burglaries, cholera deaths, hippos etc.) in each cell – store results in a table
•Compare the observed occurrences with a CSR Poisson model…
•We would expect the probability distribution of X (breweries in London) to have a Poisson distribution if they exhibit Complete Spatial Randomness
•We can test for CSR by comparing the observed and expected counts using a test such as the chi-squared (χ²) test (more on this next week…)
•In this example, the χ² statistic’s p-value is < 0.05, so we have spatial clustering…
•Simple to employ and therefore common – gives us an idea of whether our data are clustered in space
•Results affected (sometimes quite seriously) by:
•Quadrat size (generally determined as the area of study area divided by number of features)
•Quadrat shape (if not uniform) and boundary
•Both are examples of The Modifiable Areal Unit Problem
•Results for this particular set of quadrats are likely to be less reliable!
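A minimal sketch of a quadrat test in R, assuming pts is a two-column matrix of point coordinates within a rectangular study area; it counts points per grid cell and compares the counts with the Poisson expectation using a χ² (index-of-dispersion style) goodness-of-fit test:

```r
quadrat_test <- function(pts, nx = 5, ny = 5) {
  # Assign each point to a grid cell
  gx <- cut(pts[, 1], nx)
  gy <- cut(pts[, 2], ny)
  observed <- as.vector(table(gx, gy))          # points per quadrat
  lambda   <- mean(observed)                    # mean count per quadrat

  # Chi-squared goodness-of-fit of observed counts against the Poisson model
  # (small p-value suggests the points are not completely spatially random)
  chisq <- sum((observed - lambda)^2 / lambda)
  df    <- length(observed) - 1
  p     <- pchisq(chisq, df, lower.tail = FALSE)
  list(chisq = chisq, df = df, p.value = p)
}

set.seed(1)
pts <- cbind(runif(200), runif(200))            # hypothetical random points
quadrat_test(pts)                               # large p: consistent with CSR
```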

Ripley’s K
•To avoid the scale and zoning problems associated with quadrat analysis, Ripley’s K tests for CSR using circles of varying radii around each point
•K(r) = λ⁻¹ Σ_i Σ_j I(d_ij < r) / n
•In English: Ripley’s K value for any circle radius (r) is the inverse of the overall point density λ, multiplied by the average number of other points found within distance r of each point (the indicator I is summed over all pairs of points i and j, then divided by the total number of points, n)
•I = 1 or 0 depending on whether d_ij < r

•Ripley’s K can be computationally intensive when there are lots of points to consider
•The extent of the study area can affect the calculation – eventually the radius around any point will include all other points in the study area
•Sometimes the phenomenon being studied cannot just occur anywhere (you can’t set up a brewery in the Thames or in the middle of Hyde Park) so will naturally cluster – this needs to be accounted for (compare with similar activities)
•Modifications to the K equation can deal with edge (border) effects
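A minimal, un-edge-corrected sketch of the calculation in R (in practice you would use something like spatstat’s Kest(), which handles edge corrections); points is assumed to be a two-column coordinate matrix and A the area of the study region:

```r
# Naive Ripley's K for a single radius r (no edge correction)
ripleys_k <- function(points, r, A) {
  n      <- nrow(points)
  lambda <- n / A                     # overall point density
  d      <- as.matrix(dist(points))   # all pairwise distances
  diag(d) <- Inf                      # exclude self-pairs
  sum(d < r) / (n * lambda)           # lambda^-1 * average points within r
}

set.seed(1)
pts <- cbind(runif(100), runif(100))  # hypothetical points in a unit square
ripleys_k(pts, r = 0.1, A = 1)        # compare with pi * r^2 ~ 0.031 under CSR
```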

Extended Point Pattern Analysis

•Point Patterns can also be assessed in the presence of covariates (other points, lines or polygons)
•Covariate analysis explores the influence of other categorical factors on the spatial patterning of points
Density-based spatial clustering of applications with noise - DBSCAN
•DBSCAN is a popular algorithm for detecting clusters of points based on their density
•Popular because it can detect non-linear (irregularly shaped) clusters
•2 parameters:
•Epsilon (eps) = size of the neighbourhood within which to search for other points
•Minimum number of points to search for (MinPts)
•If a point has >= MinPts in its neighbourhood then it is defined as ‘Core’
•If a point is in the neighbourhood of a core point but has < MinPts in its own neighbourhood, then it is defined as ‘Border’
•Points that are neither core nor border are treated as ‘Noise’
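A short sketch in R using the dbscan package (a common implementation; the coordinates and parameter values here are made up for illustration):

```r
# Density-based clustering of point coordinates with DBSCAN
# install.packages("dbscan") if needed
library(dbscan)

set.seed(42)
pts <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),   # cluster 1
             matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2),   # cluster 2
             matrix(runif(20,  min = -2, max = 5),  ncol = 2))   # scattered noise

result <- dbscan(pts, eps = 0.5, minPts = 5)
table(result$cluster)   # cluster 0 holds the points labelled as noise
```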
Patterns of spatially referenced continuous observations
•Frequently in spatial analysis we don’t just want to study discrete events, but the ways in which variables change across space…
•Similar observations in similar places might be the result of some underlying cause
Tobler’s First Law of Geography
"Everything is related to everything else, but near things are more related than distant things.”
Tobler W., (1970) "A computer movie simulating urban growth in the Detroit region". Economic Geography, 46(2): 234-240.

Formalising Tobler’s First Law: Spatial Autocorrelation
The correlation among observations of a single (auto means self) variable (smoking rates, unemployment, etc.) that is strictly attributable to the proximity of those observations in geographic space.

Spatial Autocorrelation
•“Everything is related to everything else, but near things are more related than distant things”
•Are the GCSE scores of pupils in London more likely to be similar in areas that are close to each other than those in areas that are distant?
•What are near and distant things?

When is a Neighbour a Neighbour?
•When it’s adjacent…?
•Contiguity (common boundary)
•What is a shared boundary? Edges (rook’s case)? Nodes (Bishop’s case)? Both (Queen’s case)?
•How many neighbours are important?
•When they are the nearest, or second, third, fourth… Kth nearest?

What is K – how many neighbours?
•When it’s within a certain physical distance…?
•What threshold – 1km, 10km, 100km?

Conceptualising Neighbours – The Spatial Weights Matrix
Moran’s I, Geary’s C and Getis-Ord’s G
•Indexes to compare values for neighbouring features
•Help answer the question – “are the values for neighbouring features more similar than those for all other features?”
•If the average difference between neighbouring features is less than between all features, values are clustered.
•Can specify neighbours based on adjacency, a set distance or distance to all features – always conceived as a spatial weights matrix.
•There are global (one number for the whole system of interest) and local (individual numbers for each zone in the system) versions of both. 
Moran’s I
•Classic and very popular (best?) measure of spatial autocorrelation
•Moran’s I = Do we have a clustered pattern or a dispersed pattern?
•Depends on neighbour definition
•Typically ranges from -1 (-ve AC) to 1 (+ve AC). 0 is no relationship.

•Moran’s I = (N / Σ_i Σ_j w_ij) × [Σ_i Σ_j w_ij (x_i − x̄)(x_j − x̄)] / [Σ_i (x_i − x̄)²]
•N = number of spatial units (6 in the slide’s worked example), indexed by i and j
•x_i = variable of interest (people, badgers etc.) in zone i
•x̄ = mean (3.83 in the worked example)
•w_ij = spatial weights matrix
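A minimal sketch of the calculation in R, assuming x is a vector of zone values and W a binary spatial weights matrix (1 where zones i and j are neighbours, 0 otherwise). Packages such as spdep provide this (e.g. moran.test), but computing it directly shows what the statistic does:

```r
morans_i <- function(x, W) {
  n <- length(x)
  z <- x - mean(x)                               # deviations from the mean
  (n / sum(W)) * sum(W * outer(z, z)) / sum(z^2) # cross-products of neighbours
}

# Hypothetical 4-zone example: zones 1-2-3-4 in a line, neighbours share an edge
W <- matrix(c(0, 1, 0, 0,
              1, 0, 1, 0,
              0, 1, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)
x <- c(10, 12, 30, 33)                           # similar values next to each other
morans_i(x, W)                                   # positive: spatial autocorrelation
```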
Geary’s C
•Value of Geary’s C varies between 0 and 2
•1 = no spatial autocorrelation
•<1 positive auto-correlation (similar values near each other)
•>1 negative auto-correlation (dissimilar values near each other)

•Good if values are evenly spread and spikes are high or low
•Tends to cancel out if both high and low values cluster…
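And a corresponding sketch for Geary’s C under the same assumptions (x a vector of zone values, W a binary weights matrix); values below 1 indicate positive spatial autocorrelation, above 1 negative:

```r
gearys_c <- function(x, W) {
  n <- length(x)
  num <- sum(W * outer(x, x, function(a, b) (a - b)^2))   # squared neighbour differences
  den <- sum((x - mean(x))^2)
  ((n - 1) * num) / (2 * sum(W) * den)
}

# Same hypothetical 4-zone line of neighbours as in the Moran's I sketch
W <- matrix(c(0, 1, 0, 0,
              1, 0, 1, 0,
              0, 1, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)
gearys_c(c(10, 12, 30, 33), W)   # about 0.4, i.e. < 1: positive autocorrelation
```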
LISA – Local Indicators of Spatial Association
•E.g. Local Moran’s I & Getis-Ord Gi*
•Evolution of both previous global statistics
•Allows us to see where in our spatial system:
•High or low values are clustered (Getis-Ord Gi*) – ‘Hot Spot Analysis’
•Where clustered values sit in relation to other values (Local Moran’s I)
Limitations of Geary’s C, Moran’s I and Getis-Ord G: Effect of Spatial Units/ Scale
•You can’t compare the Moran’s I, Geary’s C or Getis-Ord G statistics between study areas with different numbers or sizes of zones.
•The Modifiable Areal Unit Problem – MAUP (refer back to previous lectures and practicals)
•Calculating the neighbours is computationally intensive – it creates an n² matrix.
•So 10 spatial units are manageable, but 1,000 require 1,000,000 comparisons, and 10,000 require 100,000,000, etc.
Limitations: Applications
•Global measures can mask local trends (although local versions deal with this to an extent)
•Especially with fine-scale spatial units
•Good summary, but never possible to capture complex spatial interactions in a single figure
•Normally used as an exploratory tool rather than a definitive statement
Software Tools
•Various spatial autocorrelation algorithms (local and global) are available in both R and ArcGIS
•R is more advanced in terms of the options available – more difficult to master
•ArcGIS can be used for some of the more popular measures – easier to implement. It is often slower.
SUMMARY
•Well established techniques are available to allow us to investigate spatial phenomena
•Point Pattern Analysis techniques all compare observed distributions of points to an expected model based on the Poisson distribution – with varying degrees of sophistication
•Analysis of the spatial autocorrelation of continuous variables over space allows us to assess if similar values cluster in space
•How we define neighbours is crucial in spatial autocorrelation analysis
•Understanding your data is key to applying the appropriate technique

Thursday 17 September 2020

Uncertainty and GIS

Overview
  • Definition, and relationship to geographic representation
  • Conception, measurement and analysis
  • Vagueness, indeterminacy and accuracy
  • Statistical models of uncertainty
  • Error propagation
  • Living with uncertainty
  • Biases and Statistical Fallacies
Are these true? Why?
  • If nothing changes during an intervention, the results show the impact of the intervention
  • If something is good, then more is better
  • If I get too many tails, I should get a head quite soon
  • Good-looking people are self-centered
... When I review a paper, dissertation or thesis, these are some of the things I look for! Ta-da, I can play a fun game in the viva! These can happen in every project and just need a bit of attention. They are all to do with uncertainty, biases and fallacies.
Introduction
  • Our world is too big and complex to be measured, studied, modeled, represented, and predicted with zero level of uncertainty
  • Truth value is unknown!
  • Measurements are not perfect (issues with quality and biases)


What do we mean by 'uncertainty'? (Spatial, Temporal, Thematic) Uncertainty
  • Uncertainty accounts for the difference between the contents of a dataset and the phenomena that data are supposed to represent
  • Inaccuracy and error: deviation from true values.
  • Vagueness: imprecision in concepts used to describe the information (e.g., near, close, around, far).
  • Incompleteness: lack of relevant information.
  • Inconsistency: conflicts arising from the information.
  • Imprecision: limitation on the granularity or resolution at which the observation is made, or the information is represented.
Complexity and size of our world makes it virtually impossible to capture
(A) every single facet,
(B) at every possible scale,
(C) from all individuals' perspectives (as we see world differently!)


Sources of Uncertainty
  • Measurement error: different observers, measuring instruments
  • Let's play! Volunteer plz
  • Specification error: omitted variables, measuring the correct thing
  • Ambiguity, vagueness and the quality of a GIS representation.
Uncertainty 1: Uncertainty between the real world and its Conception
  • Spatial uncertainty
  • Natural geographic units?
  • Discrete objects
  • Vagueness 
  • Statistical, cartographic, cognitive
  • Ambiguity
  • Values, language

More on vagueness
  • What is rural?
  • What is remote?
  • What is accessible?
  • What is large?
  • Statistical thresholds for each?
More on ambiguity
  • Different sources 
  • Direct/indirect indicators 
  • Context
Uncertainty 2: Uncertainty in measurement and representation
  • Representational models filter reality differently 
  • Vector
  • Raster
  • A Coastline
  • Vector resolution – number of vertices?
  • Cell assignment?
  • Statistical measures of uncertainty: a raster example
  • How to measure the accuracy of nominal attributes? 
  • e.g. land/ sea or a vegetation cover map
The confusion matrix
  • compares recorded classes (the observations) with classes obtained by some more accurate process, or from a more accurate source (the reference)
  • The confusion matrix is used to measure change over time, or error in measurement when data are gathered from different sources
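A small sketch in R using made-up land-cover labels: cross-tabulating observed against reference classes gives the confusion matrix, from which overall accuracy is the share of cases on the diagonal.

```r
# Hypothetical classified (observed) vs ground-truth (reference) labels
observed  <- c("wood", "wood", "water", "grass", "wood", "grass", "water", "grass")
reference <- c("wood", "grass", "water", "grass", "wood", "grass", "wood",  "grass")

cm <- table(observed, reference)        # the confusion matrix
cm
sum(diag(cm)) / sum(cm)                 # overall accuracy: proportion on the diagonal
```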
Sampling for the Confusion Matrix
  • Examining every parcel may not be practical
  • Rarer classes should be sampled more often in order to assess accuracy reliably, so sampling is often stratified by class
Issues to do with sampling size, biases, and fitness-for-purpose.
  • Survival Bias
  • Volunteer sampling vs random sampling
  • 300 types of cognitive bias!
  • Law of Small Numbers (the erroneous belief that small samples must be representative of the larger population)
  • Law of Large Numbers
  • Confirmation and congruence biases
  • Bandwagon, groupthink and herd behaviour biases
  • Status quo, loss aversion and endowment biases
  • Lake Wobegon, self-serving and overconfidence
Why else data sources may not agree – precision & accuracy
The term precision is often used to refer to the repeatability of measurements. In both diagrams six measurements have been taken of the same position, represented by the center of the circle. On the left, successive measurements have similar values (they are precise), but show a bias away from the correct value (they are inaccurate). On the right, precision is lower but accuracy is higher.
Why else data sources may not agree - Reporting Measurements
  • The amount of detail in a reported measurement (e.g., output from a GIS) should reflect its accuracy
  •  “14.4m” implies an accuracy of 0.1m
  • “14m” implies an accuracy of 1m
  • Excess precision can be removed by rounding
Vector uncertainty - Scale & Geographic Units
  • The geographic units chosen to represent spatial phenomena can lead to uncertainty due to:
  • Scale
  • Spatial extent (shape)
  • http://www.ons.gov.uk/ons/guide-method/geography/ons-geography/index.html
Scale & Geographic Units
  • Scale of analysis can alter information revealed in GIS
  • Should never assume patterns present at one level are applicable to other levels further down the hierarchy – this is known as the ecological fallacy
Ecological Fallacy
  1. Assumes that all individuals in a group (such as an OA) share the same average characteristics.
  2. This is clearly not the case.
  3. “The average age for people in Camden is 35.”
  4. Useful context but problematic in calculations.
UK example:
  • Several layers of administrative geography in UK
  • Government Office Regions are largest
  • 2011 – Y&H Crude Death Rate 9.2 people per 1000
  • Is this accurate for everywhere in Y&H...?
  • Local Authority districts are the next level down the hierarchy
  • Crude death rate for Leeds – 8.5 people per 1000
  • Big student town, many young people
  • Wards or Middle Layer Super Output Areas (MSOA) are the next level down
  • More deprived, poorer life expectancy in East Leeds – CDR (estimated) 10.1 per 1000
  • Lower Layer Super Output Areas (LSOA) are nested within MSOAs
  • SOAs are examples of uniform spatial units as they have similar populations
  • CDR could be 10.5 per 1000
  • Output Areas (also uniform units) are the smallest geographical area for which data are commonly produced in the UK
  • Old people’s home in LS15 4BX – increases the CDR for the whole OA
  • CDR 12.1 per 1000
Scale & Geographic Units
Changes in the shape or spatial extent of the geographical unit for which data are presented can change the patterns which are revealed
This is known as the Modifiable Areal Unit Problem(MAUP)

MAUP: example
MAUP is a challenge in choropleth maps, due to boundary options
Often can be assessed via geographic sensitivity analysis (i.e. repeating the analysis using a variety of different spatial units for the same area)
The Modifiable Areal Unit Problem - MAUP
Scale + aggregation = MAUP can be investigated through simulation of large numbers of alternative zoning schemes. 
Scale and geographical units
Functional spatial units can help avoid problems of MAUP observable with uniform units
Functional regions define units according to phenomena – e.g. commuting patterns
Other representation issues:
We often need to understand the underlying context of our data
Mortality rates classic example...
Standardised Mortality Rates
Underlying population structures can obscure patterns
Low crude death rate in West Lothian (near Edinburgh)
Much higher SMR – West Lothian has a young population (which keeps the crude rate low)
Uncertainty 3: Uncertainty in Analysis - Error Propagation
Cumulative effects of errors can be large and errors can persist in geographic data
Error propagation measures the effects of errors and uncertainty on the results of GIS analysis
Almost every input to a GIS is subject to error and uncertainty
In principle, every output should have confidence limits or some other expression of uncertainty – in practice this is not always the case...
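A minimal Monte Carlo sketch of error propagation in R (all numbers are assumptions): add random positional error to a polygon’s vertices many times and look at the spread of the resulting areas. That spread gives an idea of the confidence limits that, in principle, should accompany the GIS output.

```r
# Shoelace area of a polygon from ordered vertices
poly_area <- function(x, y) {
  j <- c(2:length(x), 1)
  abs(sum(x * y[j] - x[j] * y)) / 2
}

# A 100 m x 100 m square, with ~1 m (sd) of random error on every vertex
x <- c(0, 100, 100, 0); y <- c(0, 0, 100, 100)
set.seed(1)
areas <- replicate(1000, poly_area(x + rnorm(4, sd = 1), y + rnorm(4, sd = 1)))

mean(areas)                        # close to the nominal 10,000 m^2
sd(areas)                          # propagated uncertainty in the computed area
quantile(areas, c(0.025, 0.975))   # rough 95% confidence limits
```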

Taking Advantage of geographic uncertainty: Gerrymandering

From Wikipedia: First printed in March 1812, this political cartoon was drawn in reaction to the state senate electoral districts drawn by the Massachusetts legislature to favour the Democratic-Republican Party candidates of Governor Elbridge Gerry over the Federalists. The caricature satirises the bizarre shape of a district in Essex County, Massachusetts as a dragon. Federalist newspaper editors and others at the time likened the district shape to a salamander, and the word gerrymander was a blend of that word and Governor Gerry's last name.
Are these always true?
If something is good, more is better
If nothing changes during an intervention, the results show the impact of the intervention
If I get too many tails, I should get a head quite soon
Good-looking people are self-centred
...And be careful about negative rates/percentages in your analysis
...Ah, also causality and correlation
And GIGO!
Importance: GIGO(garbage in, garbage out)
Most algorithms are only as good as the quality and accuracy of the training data. If the data are biased or skewed in any way, the resulting algorithm will be too.


Dealing with uncertainty in analysis – ground truthing
  1. Geodemographic classifications classify areas according to the types of people living within them
  2. Used commercially and in the public sector
  3. Output Area Classification
  4. Various sources of analysis error in classifications:
  5. Clustering algorithm
  6. Data sources
  7. Variable selection
  8. Ground-truthing can involve visiting places ‘on the ground’ to see if the profiles make sense
Living with Uncertainty
It is easy to see the importance of uncertainty in GIS but much more difficult to deal with it effectively
We need to acknowledge this and account for it as best we can. Don’t pretend it doesn’t exist, or our data/results are perfect!
38.74564% of all statistics are made up (including this one!) Evaluate the reliability and provenance of your data
Reproducibility
Some Basic Principles
Uncertainty is inevitable in GIS
Data obtained from others should never be taken as truth – efforts should be made to determine its quality (good documentation is essential for this quality assessment)
Effects on GIS outputs are often much greater than expected; there is an automatic tendency to regard outputs from a computer as the truth.
More Basic Principles
Use as many sources of data as possible and cross-check them for accuracy
Be honest and informative in reporting results and add plenty of caveats and cautions.
Consolidation
Uncertainty is more than error
Richer representations create uncertainty!
Need for a priori understanding of data and sensitivity analysis

Monday 14 September 2020

RMarkdown & Github

Conducting Reproducible Research by R MarkDown and Github

Why?
From rOpenSci

“The aim of practising reproducible computational research is to expose more of the research workflow to our audience. This makes it easier for them to make a more informed assessment of our methods and results, and makes it easier for them to adapt our methods to their own research.”

http://ropensci.github.io/reproducibility-guide/sections/introduction/


Guards against
  • Academic fraud
  • Erroneous interpretation 
Allows for
  • Independent verification through transparency
  • Critique of methods
  • Critique of results 
Promotes
  • New techniques
  • Good scientific practice
How do we produce reproducible research?

Lots of different ways, but in Geographic Information / Spatial Data Science, two tools – Markdown and Github – are making openness and reproducibility far more straightforward

RMarkdown
  • RMarkdown documents enable the integration of text (commentary) with code and the outputs of that code.
  • RMarkdown documents can be ‘knitted’ into a range of different output formats:
  • https://rmarkdown.rstudio.com/lesson-2.html
RMarkdown – Output Formats
  • HTML – Webpages
  • PDF documents
  • MS Word Documents
  • Books (see bookdown.org)
  • Scientific papers formatted to particular journal styles
  • Slide-decks
  • Interpretive Dance*
*doesn’t actually output to interpretive dance - yet

Rmarkdown – Reproducibility Built In
  • RMarkdown forces you to carry out your analysis properly – it will only compile the end document if you have included everything required (packages, data, code etc.).
  • Analysis and commentary are integrated so it is possible to describe the full workflow from data import > processing > analysis > results > interpretation/conclusions.
  • Anyone viewing your RMarkdown document will be able to reproduce exactly what you have done and through the commentary, understand how and why you have done it.
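As a small illustrative sketch (the file name follows the assessment brief later in these notes; the output format is an assumption), knitting an .Rmd re-runs every chunk from scratch, which is what enforces the reproducibility described above:

```r
# Knit an R Markdown document: every code chunk is re-executed, so the
# document only compiles if data, packages and code are all in place
library(rmarkdown)
render("AssessmentPart1.Rmd", output_format = "html_document")
```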

GitHub

  • A web-based service for storing your code and, importantly, versions of it.
  • Git is a versioning tool (one of many, but probably the most popular – others include SVN).
  • Versioning is vital when working on collaborative projects, however it can also be useful when updating your own projects.
  • GitHub is becoming the de facto location for storing code for others to use and view on the web (although others exist like Bitbucket and Gitlab)
  • You will produce and critically compare TWO maps produced using 2 different pieces of software.
  • One map will be created using a GUI-based piece of GIS Software, such as ArcMap or QGIS.
  • One map will be produced using predominantly code- based software such as R or MapBox.
  • The accompanying text will evaluate both finished products and the cartographic/GIS/data science work-flows used to produce them. Images of the two maps should be embedded in an R Markdown .Rmd document titled “AssessmentPart1”. Once complete, you will upload your completed “AssessmentPart1.Rmd” file to a repo on your personal GitHub page.

Cartography and Making Good Maps

Outline of Cartography and Making Good Maps

  1. Why making maps is hard 
  2. Parts of a map
  3. Good cartographic practice
  4. General principles
  5. The consequences of maps

Making Maps is Hard!
  • The combination of good analysis and good visualisation.
  • One without the other makes for a bad map.
  • Poor analysis with good visualisation is probably more dangerous.
  • It is very easy to make a very bad map!
  • QGIS is particularly good for helping you decide on the breaks in your data – Natural Breaks
  • https://censusgis.wordpress.com/students/lesson-5- visualisation-cartographic-practice/
Making Maps is Hard: Krygier and Wood’s Checklist

  • What is the map trying to accomplish?
  • Do you really need a map?
  • Is the map suited to the audience?
  • Have you included sufficient attribution information for data sources etc.?
  • What are the likely impressions of the map?
  • Are the data appropriate for the map’s purpose?
  • Does the symbolisation reflect the character of the phenomenon/ data?
  • Is the level of generalisation appropriate?
  • Implications of the origins of the data?
Making Maps is Hard: Krygier and Wood’s Checklist
  • Data quality/ accuracy.
  • Copyright or copyleft?
  • Appropriate projection and Coordinate reference system?
  • Does the title indicate what, when, where?
  • Does textual information add anything?
  • Does the legend include symbols that are not self-explanatory?
  • North arrow?
  • Do variations in design reflect variations in data?
  • Context of the map clear?
  • Is the typeface appropriate?
  • Is colour being used effectively?
Good Cartographic Practice - Map Scale and Orientation
  • Numerical: 1:100,000
  • Visual: 10km
  • Verbal: 1cm = 10km
  • Always good to show direction (conventionally North).
  • Can be implied by graticule (lat long grid).
Good Cartographic Practice - Text & Legends
  • Title
  • Data source
  • Attribution
  • Copyright
  • Labels
  • Extra context


Cartograms

  • Good for multivariate data
  • Can emphasise areas of the map of interest (often where people live).
  • Use one variable for colour, the other for scaling
  • Gastner-Newman a popular algorithm
  • Can be hard to interpret
  • Alternatives can be the graphical legend shown earlier
Faceting
  • Good for multivariate data, e.g. temporal.
  • Facilitates display of large volumes of information.
  • Allows visual comparison between maps.
  • Can produce v. large plots.
  • Should avoid cramming too much on a page.
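A small sketch of faceting with ggplot2 and sf in R (the dataset file, year column and unemployment variable are hypothetical): each facet shows the same area for a different year, which makes visual comparison between maps straightforward.

```r
library(sf)
library(ggplot2)

# Hypothetical data: polygons with one row per zone per year and an 'unemployment' value
zones <- st_read("zones_by_year.gpkg")

ggplot(zones) +
  geom_sf(aes(fill = unemployment), colour = NA) +
  facet_wrap(~ year) +                 # one small map per year
  scale_fill_viridis_c() +
  theme_void()
```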
General Principles
  • Less can be more.
  • But avoid over-simplification.
General Principles
  • It is sometimes acceptable to break the rules
  • How does your map compare to maps you admire or have been impressed by?
  • The ultimate question to ask is “does it look right?”
Consequences of Mapping
  • Being able to create a map places you in a position of power.
  • This comes with responsibility.
  • How will the map be (mis)interpreted?
Conclusions
  • Good maps expand minds, improve perceptions and have a positive impact.
  • Good maps demonstrate your data and analysis.
  • Poor maps render them irrelevant.
  • Maps place you in a position of power...so get them right!

Monday 7 September 2020

Defining Spatial and Coordinate Reference Systems

Representing the world

from infinitely complex reality to models and representations

Outline – what representations are, digital representations, discrete objects and fields, rasters and vectors, and projections.

Representations

Representations are needed to convey information. They are needed to fit information into a standard form or model. In the burglar diagram, the coloured trajectories consist only of a few straight lines connecting points; if we looked closer we would reveal more information. Representations almost always simplify the truth.

Accuracy of Representations

Representations can rarely be perfect. Details can be irrelevant, or too expensive and voluminous to record. It's important to know what is missing in a representation. Representations can leave us uncertain about the real world.



Digital Representation

At its root uses only two symbols, 0 and 1, to represent information. The basis of almost all modern human communication. Many standards allow various types of information to be expressed in digital form: MP3 for music, JPEG for images, ASCII for text. GIS relies on standards for geographic data.

Why Digital?

Economies of scale - One type of information technology for all types of information. Simplicity, Reliability - Systems can be designed to correct errors. Easily copied and transmitted - At close to speed of light.

Digital data standards

Extensible Markup Language (XML). Help information systems share data particularly on the internet. In geographic information systems, a variety of standards. eg GML and KML, GeoRSS, GeoJSON. In GIS the open geospatial consortium (OGC) sets the standards for geo-spatial data.

Storing Digital geographic data

Geographic information links a place, and often a time, with some property of that place (and time). The temperature at 35N, 120W at noon local time on 28/9/05 was 18 Celsius. The potential number of properties is vast. In GIS we term them attributes. Attributes can be physical, social, economic, demographic, environmental etc. Attributes (data) can be classified by their type.

Nominal Attributes 

Most basic data type. Otherwise known as categorical attributes, simply names or categories.

Ordinal Attributes

Ordinal attributes have an ordered or ranked relationship. They contain more information than nominal/categorical attributes as each attribute is related to the others through its rank.

Interval Attributes

With interval data, such as temperature, the differences between values are meaningful, but ratios are not: 40 degrees Celsius is not twice as hot as 20 degrees Celsius. Interval values are not absolute - 0 degrees Fahrenheit is clearly not 'no temperature' or absolute zero!

Ratio Attributes

Ratio attributes have a meaningful 0 point. Height above sea level is a classic ratio attribute. 

Cyclic Attributes

Classic example is wind direction. Not a data type we will encounter too often.

The problem of reducing infinite complexity

The number of places and times is also vast. Potentially infinite data. The more closely we look at the world, the more detail it reveals. Potentially ad infinitum - Fractals! The geographic world is infinitely complex. Many methods are used in GIS to create representations or data models.

Geographic representation : 2 types

1. Discrete Objects: the most fundamental distinction in geography. The discrete object view sees the world as a table-top of objects with well-defined boundaries.

Points, lines and polygons. Countable (Minnesota, the state of 10,000 lakes). Persistent through time, perhaps mobile. Biological organisms fit this model well, e.g. animals and trees, as do human-made objects - vehicles, houses, fire hydrants.

2. Fields: properties that vary continuously over space. Value is a function of location. The property can be of any attribute type, including directions. Elevation is the archetype: a single value at every point on the earth's surface, and the source of metaphor and language. Any field can have slope, gradient, peaks and pits.

Rasters

Divide the world into square cells. Register the corners to the Earth. Represent discrete objects as collections of one or more cells. Represent fields by assigning attribute values to cells. More commonly used to represent fields than discrete objects.



Characteristics of Rasters

Pixel size: The size of the cell or picture element, defining the level of spatial detail. All variation within pixels is lost.

Assigning reality into pixels. The value of a cell may be an average over the cell, or a total within the cell, or the commonest value in the cell. It may also be the value found at the cell's central point.

Vector data

Used to represent points, lines and polygons. All are represented using coordinates: one coordinate pair per point; lines as polylines (straight lines between points); areas as polygons (straight lines between points, connecting back to the start). Point locations are recorded as coordinates.

Raster vs Vector

Volume of data : Raster becomes more voluminous as cell size decreases.

Source of data: remote sensing and elevation data come in raster form; vector is favoured for administrative data.

Analysis: some analyses are better suited to raster [map calculation, suitability indices etc.], some to vector [route finding, network analysis etc.]

"Raster is vaster, and vector is correct" [only works with a Northern accent!]

But:

        the apparent precision of vector data assumes high locational accuracy and little or no generalisation

Representing the globe 

The earth is a 3D sphere [well, almost]. In order to locate a point on the surface of a sphere, we need a set of coordinates. Coordinates will tell us how near to the top or bottom of the sphere we are, or how far around – but where do we start?

World Geodetic System (WGS 84)

The standard (3D) method of representing our geoid [last revision established in 1984], used by GPS. Coordinates are latitude and longitude. Greenwich lies on the prime meridian [the zero line of longitude]; all places to the east have +ve longitude, all to the west have -ve longitude. Latitude is measured from the equator (0°) towards the poles (±90°).

3D into 2D- Projecting

But what if we want to view a 3D object in 2D? 2D planes are easier to deal with (ever tried navigating with a globe?). Projections enable us to represent 3D coordinates on a 2D surface. But losing a dimension means we lose some information - different projections lose different information.

Mercator Projection

Invented for navigation purposes by Gerardus Mercator in 1569. Bearings (angles) are preserved (particularly useful when navigating a ship with a compass). Area and distance are not preserved.

Defining Spatial/Coordinate reference systems.

Knowing which coordinate reference system (CRS) your coordinates are in is absolutely vital for being able to specify your point on the earth correctly. Frequently in GIS you will work with data which refer to points on the earth using different CRSs. Therefore in order to compare them, you will need to know which data are in which CRS and how to convert between them - getting the wrong CRS is one of the most common sources of error in GIS. All of the projections described in the slides above (and many more besides) can be identified with a unique spatial reference system identifier (SRID)

SRID, EPSG and Proj.4

One of the more commonly used sets of SRID values is maintained by the European Petroleum Survey Group - EPSG. For example, EPSG:4326 refers to the WGS 84 world geodetic system, and EPSG:27700 refers to the British National Grid. Proj.4 is a library for converting between spatial reference systems. Most EPSG identifiers will also have a Proj.4 string. For example, the Proj.4 string for EPSG:4326 is

+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs

If you want to find an SRID code for a particular spatial reference system or its related Proj4 String, visit http://spatialreference.org
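A small sketch in R using the sf package: take a WGS 84 (EPSG:4326) longitude/latitude point and transform it to British National Grid (EPSG:27700). The coordinates used are roughly central London, chosen only for illustration.

```r
library(sf)

# A point in WGS 84 (EPSG:4326): longitude, latitude
pt_wgs84 <- st_sfc(st_point(c(-0.1276, 51.5072)), crs = 4326)

# Reproject to British National Grid (EPSG:27700); units become metres
pt_bng <- st_transform(pt_wgs84, 27700)
st_coordinates(pt_bng)   # eastings/northings on the National Grid
```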


Summary

- Reducing the infinite complexity of the real world results in incomplete representations of it.

- Spatial data present a unique and interesting set of challenges.

- There is yet to be a definitive solution to many of them - we will only ever have abstractions from reality.

We need to decide what is (un)acceptable in our analysis.

Wednesday 2 September 2020

Artificial Intelligence

Topic: What is Artificial Intelligence? What is its significance for business organizations? Explain how it is used by one such UK based organization of your choice. Provide appropriate examples to support your content. 

Artificial intelligence (AI) is the buzzword of the moment across industry. In simple terms, AI can be described as human-to-machine interaction delivered through computer applications. Although there is no single agreed definition of AI, the expectation is that new and improved technology can create a higher and more secure standard of living for human beings. With the help of such technology we can work more intelligently and move towards a future built around robots, the IoT (Internet of Things) and data science.

However, these automated processes have major significance for business organizations. Firstly, they increase competitiveness and improve efficiency. Secondly, clients now expect advanced solutions built on AI, the IoT and machine-to-human interaction; as demand increases, businesses gain substantial productivity. Finally, timelines are cut significantly, as the time saved can be reinvested in the technologies.

Nevertheless, AI is the basis of modern computer technology and is our future means of making complex decisions based on data. The amount of data created by these technologies is enormous, and so we have data scientists and other specialists to analyze this big data. With the help of algorithms and machine learning techniques we try to assess how accurately the data can support predictions. For example, in retail modelling we have data on the number of items, the population type, specifications and other technical details; we then carry out a comparative study with other stores to understand how the data can be validated, although the procedure is dense and it takes time to reach a final decision. There are, however, significant disadvantages to this kind of automated technology. The major concern is the security of data and privacy: how can we address this concern when sensitive data are being stored in databases? Hackers could obtain and leak passwords, compromising the security of the whole system. Still, we can fight for this privacy and build technology that is used in sophisticated ways for human purposes.

In my experience of a UK-based organization, AI is an important skillset for anyone wanting to run a business or understand detailed information. That said, it can stifle creativity and does not improve unless it is upgraded periodically. AI depends entirely on computers, and it only moves beyond supervised learning through further testing (trial and error) by researchers. Even so, AI helps to solve major sustainability problems and helps us to create a future with more advanced technology and security control systems.

In conclusion, AI is neither inherently good nor bad; there is risk involved in every task. The benefits span the environment, medical applications, transportation, security systems, better human and business intelligence, and the autonomous vehicles being researched by Tesla, Uber and others. We need to keep learning and updating our human skills to grow in this fast-moving technological world. In a nutshell, with the help of AI and machine learning we have the power to elevate our data and predict insights that help achieve our desired goals.


Tuesday 1 September 2020

Spatial is Special

Geospatial information is multidimensional (x, y), often projected onto a flat surface, voluminous, representable at different resolutions/scales, and with a lot to reveal about us!

Special methods are required to analyse geospatial data. Procedures are usually complex and expensive (even for static data).

Retrieval of large amounts of data for each analysis (even for display), and long transactions for data manipulation, require special databases, software and hardware














A GISystem can have the following architectures:
  1. Desktop GIS
  2. Client Server 
  3. Distributed
  4. Web GIS



Client-Server Architecture (Desktop GIS + Network + Data Server)
  • 3 Logical Layers in many Physical machines 
  • Many clients and one server (data server)
Advantages:
  • Consistency of data (there is only one version of data)
  • Many Users (100 for example; usually below 300 due to limited connection objects)

Distributed Architecture (Desktop GIS + Network + Data Server + Application Server)
  • 3 Logical Layers in many Physical machines
  • Many clients and two servers (application, data server)
Advantages:
  • Consistency of data (there is only one version of data) 
  • Consistency of Analysis
  • Many More Users (1000 for example)
Major Disadvantages of Client-Server and Distributed

• Maintenance of GIS software on client machines
• Limited number of users (cannot support e.g. 1m or 10m simultaneous users!!)
• Cost: dedicated administrator (team)
• System integration: GIS is not the only system in an organisation

WebGIS
  1. No need to have the GIS software on client machines; just a web browser is needed for clients.
  2. Platform-independency (desktop, mobile, Windows, Mac, Linux,...)
  3. Users with no experience and expertise
  4. Professionals can implement complex what-if scenarios
  5. Low cost (no maintenance, no GIS admin, ...)
  6. Once updated, all clients can access the latest version of tools and data






Spatial databases Evolution

Why Relational Database is inefficient for geospatial data?
  • Query: return the shape (contours) of France?
  • Get all the boundaries of all objects constituting France (Mainland and the islands)
  1. Queries on themes requires a knowledge of the spatial objects’ structure
  2. Changing this structure implies a deep reorganization of the database and changing the query formulation
  3. The bad performance of this approach, which requires in particular a considerable amount of relational tuples to represent spatial information
  4. The impossibility of expressing geometric computations such as adjacency tests, point queries or window queries using SQL



Geospatial Data Storage (part 1)

1- Geospatial data in text and binary files
  • Text
  • Simple import and export
  • Usually in CSV (Comma Separated Values) format 
  • GeoJSON

Binary

In most cases they provide little (or no) attribute support
Limited support for SRS and complex geometries
Example: DXF (Autodesk Drawing eXchange Format)
BSON

Geospatial Data Storage (part 2)
2- Geospatial data in XML files KML, GML and GeoRSS



Geospatial Data Storage (part 3)

3- Geospatial data in GeoRelational Model (Loosely-Coupled approach)
  • The georelational model is one of the most widely used models for storing, processing and sharing geospatial data.
  • Geospatial data are divided into two separate but related structures.
  • The geometrical element of features is stored in a binary file (or set of binary files) and the corresponding attributes are stored in a table (a relation in RDBMS terms).
  • There is a one-to-one relationship between rows in the relational database and features in the binary files
Concurrency, security, recovery and other problems arise, in addition to limited file size and other issues
Geospatial Data Storage (part 4)

4- Use of Spatial Databases
  • 4- A)Native Spatial Database
  • Natively supports spatial data
  • Spatial data-types and functions
  • Integration of functions with query language (SQL)
  • Support of spatial indexing (Accelerating retrieval of data)
Examples: 
PostGIS
SQL Server 
Oracle
DB2
MySQL
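As an illustrative sketch of what “integration of functions with the query language” means in practice, here is R’s DBI interface querying a hypothetical PostGIS database (connection details, table and column names are made up): the spatial predicate ST_DWithin runs inside the database and can use its spatial index.

```r
library(DBI)

# Hypothetical connection to a PostGIS-enabled PostgreSQL database
con <- dbConnect(RPostgres::Postgres(), dbname = "gisdb",
                 host = "localhost", user = "gis", password = "gis")

# Find features within 5 km of a British National Grid point (EPSG:27700);
# the spatial work is done by PostGIS functions inside the SQL itself
nearby <- dbGetQuery(con, "
  SELECT name
  FROM   boroughs
  WHERE  ST_DWithin(geom,
                    ST_SetSRID(ST_MakePoint(530000, 180000), 27700),
                    5000);
")

dbDisconnect(con)
```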



Geospatial Data Storage (part 4)

4- Use of Spatial Databases

4- B)Spatially-Enabled Database
  • Use of relational database to store and manage geospatial data
  • Attributes in columns of normal data-types such as text, int
  • Geometry in BLOB (Binary Large Object)
  • Handle all the geospatial functionality (such as query, analysis, indexing and ...) in Applications
Example:

Esri’s ArcSDE which is a middleware that can use SQL Server, Oracle, Informix, DB2 and PostgreSQL to store and manage geospatial data

Relational database systems to provide data for geospatial services

Advantages:
  • support for geospatial data storage
  • easy to use query language,
  • standard query language (based on OGC simple feature),
  • very good support from developer community,
  • flexibility for modelling various kinds of relationships
  • multiple users,
  • support for transactions (short transactions at the DBMS level and long transactions at the application level)
Disadvantage:
  • vertical scalability only (scaling up a single server rather than out across many)
  • need for multiple joins for complex spatial queries,
  • low performance for large volume of geospatial data

E3 summit at ISHRAE on Net Zero emissions for sustainable development

E3 summit at ISHRAE on Net Zero emissions for sustainable development