j.b.krygier: geography 353: lecture outline

Geography 353 Cartography and Visualization

...to Geog 353 Main Page and Course Description
...to Geog 353 Syllabus
...to Geog 353 Course Schedule and Lecture Outlines
...to Geog 353 Laboratory Information and Student Projects

Geog 353 Lecture Outline: Map Generalization and Classification
Update: 10/13/19

We will review chapter 8 (Map Generalization and Classification) from the Making Maps book. Additional information and examples can be gleaned from the material below.

Making Maps PDF) Ch 8. Map Generalization and Classification
Data Classification in ArcGIS
- right-mouse-click on layer you want to classify data, then select Properties
- The Symbology tab takes you all symbolization options.
- In the upper left window, select quantities.
- Graduated color is the choropleth map.
- other options: Graduated Symbol, Proportional Symbol, Dot Denisty Maps
- Select Classify for more options.

Data Classification

Introduction

Recent Lectures: Issues concerning map symbolization: choosing visual marks to effectively represent the points, lines, and area data of our base maps and thematic data

Effective representation of intellectual hierarchy with a visual hierarchy

Visual variables

WWW) Visual Variables

WWW) Appropriate Visual Variables

Next: ways to logically match the dimensions of your data (point, line, area) to symbols on your map

Requires understanding

your thematic data and how your process it >> data classification
graphic symbolization methods for data at points, lines, and areas

more complex issues of symbolizing data in areas

1. Data Classification

Data is usually classified - put into some categories or groups - before it can be displayed

Different ways of classifying data will lead to different patterns on the map

Classification is a form of cartographic generalization which reduces the complexity of a set of thematic data

Classification: start by differentiating between

categorical (nominal, ordinal) data
numerical (interval/ratio) data

Categorical (qualitative, nominal) data classifications

Dealing with nominal (qualitative) data or data that is ordered but without a measurable range (rare as a type of mappable thematic data)

Vegetation in N. S. America (source)

There are no absolute rules for this kind of classification, just general guidelines

Features in the same class or category should be more similar than dissimilar and should be symbolized similarity
Features in different classes or categories should be more dissimilar than similar and should be symbolized differently
Be aware of how others have classified information:

USGS Topographic Map, Waukesha Wisconsin

Numerical data classifications

College Basketball Players per 1000 population

Ordered data with a measurable range: quantitative data

interval and ratio levels of measurement (combine)

Two big issues involved in the classification of numerical or quantitative data

the number of classes you divide your data into
the specific way you divide the data into those classes

Number of classes:

Most maps for presentation purposes should have four to six classes

why?

As you change the number of classes you may very well see different patterns:

1 class vs 62 classes vs 5 classes:

MacEachren 3.05: same data, different classes

Important to vary the number of classes and see what happens before you make a final choice

Number of Classes in ArcView

Legend Editor: double click on legend in a view
Legend Type: graduated color (choropleth)
classification field: the data to map
normalize by: area
ex) Hispanic Population: totals vs normalized (Texas vs New Jersey)

classify: number of classes

Number of Classes in ArcGIS

right-mouse-click on layer you want to classify data, then select Properties
The Symbology tab takes you all symbolization options.
In the upper left window, select quantities.
Graduated color is the choropleth map.
Select Classify for more options.

Also important is the way you divide up data: classification schemes

the breaks (boundaries) between classes

Data classification schemes

Slocum 3.2: Histograms

Histogram: graph relating data distribution and frequency

Some classification schemes take into account the distribution of the data, and others do not.

1. Exogenous schemes: class boundaries defined by criteria external to distribution of data

ex) income data classified into tax brackets

ex) hunger data classified into classes based on criteria for adequate nutrition

ex) ethnic population based on % required for state to get grant

Advantage: map can be matched to external criteria

Disadvantage: does not take into account the data distribution

Exogenous schemes in ArcView

double click on and edit numbers in the Legend Editor
change both value (for classifying) and label (for labeling in view legend)
+ and X are add and remove classes

Exogenous schemes in ArcGIS

demo

2. Arbitrary schemes: class boundaries are set by arbitrary criteria

Equal Intervals: class boundaries are defined by rounded numbers or regular divisions

ex) 0% - 100%, 100% -200% ...

Often chosen because the classification looks tidy

Simple to do by hand:

find the range of the data (high value - low)
find the common difference of the data (range / number of classes)
find the class limits:
- first class limit = low value + 1 x common difference
- second class limit = low value + 2 x common difference
- and so on

Advantages:

useful when you need to compare a series of maps
easy to interpret and understand
works best with data has a rectangular distribution (rare in geographic phenomena)

Disadvantage: not sensitive to the data distribution (if not rectangular)

Equal Interval schemes in ArcView

Legend Editor >> Classify >> Equal Intervals
divides the range of attribute values into equal sized sub ranges
problem: not tidy
solution: go back and start with 0 and impose a more tidy looking scheme

Equal Interval and Defined Interval schemes in ArcGIS

demo

3. Ideographic schemes: class boundaries defined by the shape of the data distribution

Ideographic schemes take more effort because they are chosen based on some characteristics of the data distribution itself

3a. Natural Breaks: Attempt to find natural breaks in the data; classify data into groups that are somewhat distinct from each other. Can do this by hand using a cumulative frequency graph (or graphic array) and then look for natural breaks in the data and put class breaks at those points

A good default method: good to start with this and see if it works

How to do it: start by creating a histogram

Dent 6.11: Natural Breaks in Histogram

hopefully you will see some "boundaries" in the graph

class boundaries placed at locations having lower frequencies or breaks in the graph

Advantages:

takes data distribution into account
intuitive
very flexible
good default for single map with clear breaks

Disadvantages:

breaks not always obvious: consider a different classification method
too many breaks obvious: consider a different classification method
gets more tedious as data sets get larger
replicable from person to person, poor for comparing several maps
...but...

Natural Breaks in ArcView

Legend Editor >> Classify >> Natural Breaks
based on Jenks Classification Algorithm

Natural Break (Jenks) scheme in ArcGIS

demo

3b. Quantiles: puts an equal number of values in each class

quartiles is four percentile classes (25% each)
quintiles is five percentile classes (20% each)
deciles is 10 percentile classes (10% each)

Easy to calculate

create a list of all data values in ascending order
number of values in each class = total number of values / number of classes
start with lowest value, include correct number of values in lowest class...
...the correct number of values in the second lowest class, and so on

Advantage:

good for skewed data sets IF you want to "flatten" it out

Disadvantage:

flattens out data distribution: hides fact that data is skewed! Bad Map!
bad for skewed data distributions (MacEachren example, p. 46-47)

Quantiles in ArcView

Quantile scheme in ArcGIS

demo

4. Serial schemes: class boundaries are defined by statistical or mathematical functions

Standard Deviation

Dent 6.9: Standard Deviation and Symbolization Schemes

Class boundaries determined by the mean and standard deviation

median: middle value in an ordered (low to high) data set
- half the values in the data set will be above the median, half below
mean: average of a data set: more important for statistical purposes
- add all values together and divide by the number of values in the data set

Normal distribution: values near the mean occur more often

the greater the deviation from the mean (average) the less often a value occurs

Other distributions: not normal: more dispersed

dispersed data: many more of the values not near the mean

flat distribution, rectangular distribution

Standard deviation: a measure of how dispersed a set of data is

comparing two data sets:
the smaller the standard deviation, the less dispersed the data

Advantages:

useful for normal (bell shaped) distributions
useful when you want to focus on areas that deviate the most or least from 'average'
useful when mapping linked to statistical analyses including mean and standard deviation

Disadvantages:

the mean and standard deviation should be understood by user

Standard Deviation schemes in ArcView

Standard Deviation scheme in ArcGIS

demo

5. Unclassified Schemes

Dent 6.14: Classed and Unclasssed Map of Population Change

"Unclassed" choropleth maps: the number of categories is equal to the number of data values

Each value has a unique symbol

ex) range of values from 0 to 100 represented with 0% to 100% black

Advantages

minimizes the amount of generalization and simplification
good for seeking patterns

Disadvantages

minimizes the amount of generalization and simplification
poor for picking out categories

Unclassified Schemes in ArcView or ArcGIS

How to do?
set number of classes to number of values in data set

Sum: data classification

1) categorical (nominal, ordinal) vs numerical (interval, ratio) data

categorical classification: mostly qualitative
numerical classification: quantitative

2) number of classes

3) dividing up data: numerical classification

1) exogenous schemes

2) arbitrary schemes (equal interval)

3) ideographic schemes (natural breaks, quantiles)

4) serial schemes (standard deviation)

5) unclassified schemes

Change the classification scheme or number of classes and you get a different map

different patterns

If all three classification schemes are appropriate for the data distribution then select the classification scheme that best represents what you know about the actual data distribution.

E-mail: jbkrygier@owu.edu

...to Geog 353 Main Page and Course Description
...to krygier teaching page.
...to krygier top page.

OWU Home
OWU Geology and Geography Home