Intro to QGIS (w/ Choropleth Maps and Equal Area World Map Projections)

Objective. The purpose of this exercise is to introduce QGIS and use it to explore how map projections and data classification schemes affect the appearance of choropleth maps. We'll use QGIS and a "shapefile" of population and economic data for countries of the world. Specifically, you are asked to:

  1. Install QGIS from www.qgis.org .
  2. Download a GIS dataset of world economic data from www.NaturalEarthData.com
    (We'll use the 1:110m Cultural "Admin 0 - Countries" data.)
  3. Use QGIS to display world population data in choropleth maps.
  4. Use QGIS to re-project the coordinates and compute the country areas.
  5. Use QGIS to categorize and symbolize world economic data.
  6. Assess the impacts of projection, data classification, and symbolization on apparent map information.
  7. Hand in a brief (< 1 page of text plus any maps) report showing three of your maps and summarizing your observations on how projection and data classing schemes affect the appearance of the map.

Background...

CRS. Shorthand for Coordinate Reference System. The idea is to have a (standard) way of referring to the model of the Earth to which a set of coordinates is referenced. The standard that seems to be gaining traction is the EPSG registry, started by the European Petroleum Survey Group (EPSG) and since carried on by the Oil & Gas Producers' Surveying and Positioning Committee. Basically, it is a list that gives an ID number to many (all?) of the different known coordinate systems used around the world. The EPSG system distinguishes between Geographical and Projected coordinate reference systems. A Geographical CRS entails an ellipsoid, a prime meridian, and a datum. A Projected CRS entails those plus a map projection, with parameters such as an origin location, and a unit of length; essentially it is saying that the locations are projected on a flat map.

Choropleth maps are maps in which enumeration districts, such as countries, are colored to symbolize the occurrence of the mapped phenomena. Usually, cartographers 'normalize' or 'standardize' the data, dividing counts by area or by population, e.g., we map 'cows per acre' or 'births per 100,000 people', to reduce the effect of enumeration district size on the map appearance. (Otherwise, ceteris paribus, big enumeration districts would tend to have greater amounts of whatever is being mapped.) (But the ratios can get crazy when the counts get small, and some have argued against the "rule" of only mapping densities.) Cartographers usually use darker symbols to represent greater quantities.
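
The warning about ratios and small counts can be made concrete with a little arithmetic. The sketch below is only an illustration: the two districts and all of their numbers are made up, and rate_per_100k is a helper name invented here. It shows how a single extra birth barely moves the rate in a large district but moves it a great deal in a tiny one.

```python
def rate_per_100k(count, population):
    """Return the number of events per 100,000 people."""
    return count / population * 100_000


# Hypothetical districts: (name, births, population). All values are invented.
districts = [
    ("Big district", 5_000, 400_000),
    ("Tiny district", 2, 150),
]

for name, births, population in districts:
    base = rate_per_100k(births, population)
    bumped = rate_per_100k(births + 1, population)
    # Show how much one additional event changes the normalized value.
    print(f"{name}: {base:.1f} per 100k; with one more birth: {bumped:.1f}")
```

On a choropleth map the tiny district could jump a class or two on the strength of one event, which is part of why some argue against mapping only densities.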

GIS Data. The world is awash with GIS data available on the web. (See the list at the end of this document for some examples.) These data typically have spatial and attribute components and are provided in GIS-accessible formats such as ASCII text files, shapefiles, and other exchange formats. Some of these data are available through special purpose spatial data servers using specialized web clients (Google Earth is an example) to connect to online databases. Some are expressly in the public domain, and others have proprietary restrictions on their use. GIS data for distribution should come with metadata that describe the data and should help you assess whether the data are suitable for your intended use. Minimally, metadata should identify the coordinate system used for the spatial data and include a data dictionary that tells what attributes are included and how they are encoded. Metadata might also include information on data provenance and assessments of its completeness and accuracy.

NaturalEarthData is an example of a handy online data source, run by cartographers and for cartographers. It provides data grouped for three broad presentation scale categories: Large Scale (1:10,000,000), Medium scale (1:50,000,000) and Small scale (1:110,000,000); and for cultural vs physical features. For this exercise, we'll use a data set from the small scale and cultural category.

From http://www.naturalearthdata.com/downloads/110m-cultural-vectors/ download the "Admin 0 - Countries" data set and unzip it to some sensible (memorable) directory on your computer. You should have four files:

ne_110m_admin_0_countries.dbf   (the data table part)
ne_110m_admin_0_countries.prj   (the CRS i.e., map projection info)
ne_110m_admin_0_countries.shp   (the geometric part)
ne_110m_admin_0_countries.shx   (an index to the feature records in the shp part)

Note that "shapefiles" really are several files, sharing a common basename. The extensions indicate which part of the information is in each file.
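
Since a shapefile is only usable when its parts travel together, it can be handy to check that they all arrived. The following is a minimal sketch, assuming the four extensions listed above are the ones you care about; missing_parts is a helper name invented for this example, and the path at the bottom should be changed to wherever you unzipped the download.

```python
from pathlib import Path

# The four extensions named in the list above; other sidecar files may also exist.
REQUIRED_PARTS = (".shp", ".shx", ".dbf", ".prj")


def missing_parts(shp_path):
    """Return the extensions in REQUIRED_PARTS that are absent next to shp_path."""
    base = Path(shp_path)
    return [ext for ext in REQUIRED_PARTS if not base.with_suffix(ext).exists()]


if __name__ == "__main__":
    # Adjust this path to your own download directory.
    print(missing_parts("ne_110m_admin_0_countries.shp"))
```

An empty list means all four parts were found. Note that the check is case-sensitive on some systems, so a file named with ".SHP" would be reported as missing there.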

Open the .prj file and see if you can divine the CRS (coordinate reference system) for the data. Do you see "WGS 84" and things that might look like major or minor axes for the earth model that the data are using?
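
If you would rather not squint at the .prj text, a few lines of code can pull out the earth model. This is a sketch under assumptions: SAMPLE_PRJ is an assumed example of the one-line "well-known text" such a file may contain (your file may differ in detail), and spheroid_axes is a name invented here. Such text usually gives the semi-major axis and the inverse flattening rather than the minor axis, so the semi-minor axis is computed from those two numbers.

```python
import re

# Assumed sample of .prj contents (ESRI-style well-known text on one line).
SAMPLE_PRJ = (
    'GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",'
    'SPHEROID["WGS_1984",6378137.0,298.257223563]],'
    'PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]]'
)


def spheroid_axes(wkt):
    """Return (name, semi_major_m, semi_minor_m) from the SPHEROID clause of wkt."""
    match = re.search(r'SPHEROID\["([^"]+)",\s*([-+\d.eE]+),\s*([-+\d.eE]+)', wkt)
    if match is None:
        raise ValueError("no SPHEROID clause found")
    name = match.group(1)
    semi_major = float(match.group(2))
    inverse_flattening = float(match.group(3))
    if inverse_flattening == 0:
        # By convention an inverse flattening of 0 denotes a sphere.
        semi_minor = semi_major
    else:
        semi_minor = semi_major * (1 - 1 / inverse_flattening)
    return name, semi_major, semi_minor


name, a, b = spheroid_axes(SAMPLE_PRJ)
print(f"{name}: semi-major {a:.1f} m, semi-minor {b:.1f} m")
```

To try it on the real file, read the file's text into a string and pass that string to spheroid_axes in place of SAMPLE_PRJ.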

Sometimes the .prj file is missing and you may need to research the CRS for such a data set. Documentation or "metadata" may tell you what coordinate system was used in your data. Most GIS software offers some way of associating data with a CRS. Often you can copy and rename a .prj file from another shapefile that uses the same CRS. In other cases you may need to build a ".prj" file by hand in a text editor.
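
The copy-and-rename trick can be scripted as well as done by hand. The sketch below assumes you have already confirmed, from metadata or documentation, that both data sets use the same CRS; borrow_prj and the file names in the usage note are hypothetical. It refuses to overwrite an existing .prj, since replacing a correct one with a wrong one would silently misplace the data.

```python
import shutil
from pathlib import Path


def borrow_prj(donor_shp, target_shp):
    """Copy the donor's .prj next to the target, renamed to the target's basename.

    Only sensible when both data sets are known to use the same CRS.
    """
    donor_prj = Path(donor_shp).with_suffix(".prj")
    target_prj = Path(target_shp).with_suffix(".prj")
    if target_prj.exists():
        raise FileExistsError(f"{target_prj} already exists; not overwriting")
    shutil.copyfile(donor_prj, target_prj)
    return target_prj
```

For example, borrow_prj("countries.shp", "rivers.shp") would create rivers.prj with the same contents as countries.prj.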

Quantum GIS (QGIS) Hints. QGIS is a free and open source (FOSS) geographic information system (GIS) that runs on MS-Windows, MacOS, and Linux; uses common data file formats; presents a 'typical' point-and-click interface; and provides sufficient spatial analytic capabilities to demonstrate many GIS functions. For this assignment, the data input, data selection and symbolization, and map output functions will be most relevant.

QGIS can be downloaded from www.qgis.org . Installers or packages are available for Windows, Mac, and Linux. Go ahead and install it.

You may want to start the QGIS program and take a few moments to see what is under each menu tab; the names may not make sense at first but reading through them now will make it easier to find things later.

Reading data into QGIS... Reading vector data is in the Menu: Layer -> Add Vector Layer ... (or an icon in the menu bar). In the panel dialog, you will need to navigate to a directory and specify a kind of file to show (e.g., [OGR] KML or SHAPEFILE). Tell it to load the data. You should see the data in the map window and an entry in the table of contents ("TOC") to the left.

Menu: Layer -> Properties -> Style... this is where you set symbology.

Map CRS. You can explicitly set the projection to use for the map display. Use File:Project Properties:Coordinate Reference System (CRS) tab to select the desired coordinate system for the map display. NB. This affects the display, not the data. ("Check" the "Enable 'on the fly' CRS transformation" box if QGIS will need to re-project data sets from several CRSs.)

We'll be making a choropleth map and should use an equal area projection. QGIS provides several possibilities under the Projected Coordinate Systems group. (Some work better than others with global data. Try selecting and applying several of these and see.)

Equal Area Cylindrical : NSIDC EASE-Grid Global           EPSG:3410 1368
Albers Equal Area : NAD83 Texas Centric Albers EA         EPSG:3083 1046
Lambert Azimuthal Equal Area : US National Atlas EA       EPSG:2163 154
Lambert Azimuthal Equal Area : WGS84 North P LAEA Europe  EPSG:3575 1531
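
To see what "equal area" buys you, here is a minimal sketch of a cylindrical equal-area projection on a sphere. The sphere radius and the 30 degree standard parallel are assumptions chosen for this illustration, not values read from QGIS, and the function names are invented here. In raw latitude-longitude both test cells are the same 1 x 1 "square degree", yet their true sizes on the globe differ by about half; after projection the flat-map rectangle areas agree with the true areas.

```python
import math

R = 6_371_228.0       # sphere radius in meters (an assumption for this sketch)
STD_PARALLEL = 30.0   # standard parallel in degrees (an assumption for this sketch)


def cea_forward(lon_deg, lat_deg):
    """Project lon/lat in degrees to x/y in meters (cylindrical equal-area, sphere)."""
    k = math.cos(math.radians(STD_PARALLEL))
    x = R * math.radians(lon_deg) * k
    y = R * math.sin(math.radians(lat_deg)) / k
    return x, y


def projected_cell_area(lon0, lat0, size_deg=1.0):
    """Area in m^2 of a lon/lat cell after projection (the cell maps to a rectangle)."""
    x0, y0 = cea_forward(lon0, lat0)
    x1, y1 = cea_forward(lon0 + size_deg, lat0 + size_deg)
    return (x1 - x0) * (y1 - y0)


def spherical_cell_area(lat0, size_deg=1.0):
    """True area in m^2 of the same cell on the sphere."""
    dlon = math.radians(size_deg)
    dsin = math.sin(math.radians(lat0 + size_deg)) - math.sin(math.radians(lat0))
    return R * R * dlon * dsin


for lat in (0.0, 60.0):
    flat = projected_cell_area(0.0, lat) / 1e6
    true = spherical_cell_area(lat) / 1e6
    print(f"cell starting at latitude {lat:>4}: projected {flat:.0f} km^2, true {true:.0f} km^2")
```

The price of preserving area is shape: cells near the poles come out very wide and very short, which is one reason some of the projections listed above look odd with global data.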

Open the attribute table for your features to explore the data that you have. (Layer:Open Attribute Table) The table should have one record (row of data) for each feature on the map. Each record has the same columns (fields or attributes). The columns of data record values of attributes of the spatial objects. You can sort the records on the values in the various fields. You can use the query facility at the bottom of the table panel to select features. You can add new columns to the table. You can do math (not meth) on the columns, mostly like a spreadsheet.
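
If the attribute table idea is new to you, it may help to see the same operations written out on a toy table. This is only an analogy in plain Python, not how QGIS stores or edits data; the records are invented, and the GDP and GDP_PER_CAP field names are hypothetical (only POP_EST is taken from this exercise).

```python
# Hypothetical records, one per map feature. All values are invented.
table = [
    {"NAME": "Alpha", "POP_EST": 1200, "GDP": 30.0},
    {"NAME": "Beta", "POP_EST": 560, "GDP": 7.0},
    {"NAME": "Gamma", "POP_EST": 4300, "GDP": 215.0},
]

# Sort the records on the values in one field, largest first.
by_pop = sorted(table, key=lambda row: row["POP_EST"], reverse=True)

# Select features with a query-like condition.
selected = [row for row in table if row["POP_EST"] > 1000]

# Add a new column and fill it by doing math on existing columns.
for row in table:
    row["GDP_PER_CAP"] = row["GDP"] / row["POP_EST"]

print([row["NAME"] for row in by_pop])
print([row["NAME"] for row in selected])
```

Each dictionary plays the role of one record, and each key plays the role of one column.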

You can set map symbology for your data. In the "TOC" you can check layers on and off, drag layers up and down in the "TOC" so that points and lines appear "on top of" polygons, and more. The order in the TOC sets the drawing order. There is finer control for the symbology in individual layers, allowing considerable design flexibility.

There are classification tools to produce range-graded graduated color symbols. (Right click the layer in the TOC, choose Properties, then Symbology; or use the Layers tab and choose Properties.) On the "Style" panel, you can set the Legend Type to "Graduated Symbol" (rather than Single Symbol), select a Classification Field (the variable to map), select a classification scheme or Mode (Equal Interval or Quantiles), and select a Number of Classes (5, 6, 7, whatever). Then, press the "Classify" button.
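
The two modes named above can give very different maps from the same numbers. The sketch below uses simple textbook versions of each scheme, with made-up data values and function names invented here; QGIS's own rules for placing breaks may differ in detail. Equal Interval splits the data range into equal-width classes, while Quantiles tries to put the same number of features in each class.

```python
def equal_interval_breaks(values, n_classes):
    """Upper class limits that split the data range into equal-width classes."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + width * i for i in range(1, n_classes)] + [hi]


def quantile_breaks(values, n_classes):
    """Upper class limits that put roughly equal numbers of values in each class."""
    ordered = sorted(values)
    n = len(ordered)
    breaks = []
    for i in range(1, n_classes + 1):
        # Rank of the last value in class i (integer ceiling, no interpolation).
        last_rank = (i * n + n_classes - 1) // n_classes
        breaks.append(ordered[max(0, last_rank - 1)])
    return breaks


def class_counts(values, breaks):
    """Count how many values fall in each class, given upper class limits."""
    counts = [0] * len(breaks)
    for v in values:
        for k, upper in enumerate(breaks):
            if v <= upper:
                counts[k] += 1
                break
    return counts


# Made-up, skewed data: nine small values and one very large one.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
for label, fn in (("Equal Interval", equal_interval_breaks), ("Quantiles", quantile_breaks)):
    breaks = fn(data, 5)
    print(label, breaks, class_counts(data, breaks))
```

With skewed data like this, Equal Interval leaves most classes empty and paints nearly everything one color, while Quantiles spreads the features evenly but hides how extreme the largest value is.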

The "New symbology" tools are better than the "Old symbology" tools. You can choose between them at the upper right of the Layer -> Properties -> Style panel. The new tools give you finer control of the classification and the symbology.

Normalizing count data by area or population. In making a choropleth map, you'll usually want to "normalize" count data, by either area or population, in order to map density. Examples include (area in farms in enumeration district)/(total area in enumeration district) and (sick people in enumeration district)/(population in enumeration district).

Some GIS software lets you normalize data when you choose a data field to display. QGIS makes you calculate a data field that reflects the normalization, and tries to make it easy with an "area" button in the "field calculator" dialog, but the "area" is calculated in the data CRS, not the display CRS. If you want to normalize by area and have data in latitude and longitude, save the data in an equal area projection first and then use that re-projected data to calculate areas.

Saving your maps... You can export the map as an image file, maybe to add to a word processing document. From the menu, File:Save as Image will let you save a .jpg (or .bmp etc.) snapshot of the map from your screen, which you could then include in a webpage or other document.

You can use QGIS's map composition tool (Print Composer) to make a more attractive map. This is under File -> New Print Composer. Basically, you define a map composition as a page with an orientation and size, and then build the composition by adding elements to the page. These elements can include one or more maps, text and title boxes, legend, scale bar, images, north arrow, etc. The composition can be saved/exported as an image, postscript, or svg file.

What to do and hand-in...

  1. Download the data.
  2. Download and install QGIS.
  3. Start QGIS and open the data set.
  4. Make an initial choropleth map of "POP_EST", estimated population.
    1. Layer -> Properties -> Style
    2. set "Graduated" (rather than "Single symbol").
    3. set Column to POP_EST.
    4. set Mode to Natural Breaks.
    5. OK.
    6. NB. Cartographers would generally not map count data this way; other things being equal, you would expect that big areas would have more stuff. We should adjust for the area. This will require re-projecting the data and calculating the countries' areas in an equal area projection. So...
  5. Update/explanation note... Your original data are in latitude-longitude, not an equal-area CRS. QGIS (or ArcGIS, for that matter) seems to calculate area using the data (not the display frame) CRS, treating the coordinates as if they were on a Cartesian plane (not a sphere). The work-around is to re-project the coordinates into a new shapefile in an equal area projected coordinate system (like the Equal Area Cylindrical), add that new dataset to the project, and use it for the area calculation. Note that if the new coordinate units are meters, the area will be calculated in square meters. (Dividing that $area by 1000000.0 (one million) will give square kilometers.) (You may need to have GDAL plugin support enabled for this.)
  6. Select an Equal-Area Projection for the project.
    1. Settings -> Project Properties.
    2. check "Enable 'on the fly' CRS transformation"
    3. under "Projected Coordinate Systems" try...
    4. a cylindrical equal area projection.
    5. "Apply"
    6. "OK"
    7. View -> Zoom to Layer
  7. NB as of Fall 2014, it looks like you can skip the next two steps. QGIS lets you calculate area in the frame's coordinate system.
  8. Export the data in a new shapefile
    1. Export the data as a shapefile
    2. using the CRS of the display frame rather than the data
  9. Add that new shapefile to the project. (And remove the old one.)
  10. Add a data column called "area" to the attribute table.
    1. Layer -> Open Attribute Table (or right click the layer name in the table of contents)
    2. toggle editing "on" with the pencil icon at the bottom left of the table
    3. add a data column using the icon with a star on the table.
      1. name is area
      2. type is Decimal Number (Real)
      3. width is 12
      4. precision is 6
  11. Calculate the "area" field
    1. Open the Field Calculator (calculator icon below table)
    2. check "Update existing field"
    3. select "area"
    4. set the "Field calculator expression" to "$area" by clicking the "area" button.
    5. OK.
    6. NB. Is it clear how QGIS calculated the area? Or what units the area is in? Does it matter what CRS is used for the data? For the display? (A brief study sorting this out would make a great term project!)
  12. Calculate a column for population per area, "PPA".
    1. Open the Field Calculator (calculator icon below table)
    2. Under "New field"
      1. Output field name is "PPA"
      2. Output field type is "Decimal number (Real)"
      3. Width 12
      4. Precision is 6
      5. Field calculator expression is: "POP_EST / area", built by double clicking on field names and clicking on the button for the "/" division operator.
    3. (NB. This still leaves a negative population density for any feature whose POP_EST is negative, presumably a "no data" code; maybe make it "0".)
  13. Toggle editing "off" (pencil icon) and save your changes.
  14. Make a much better choropleth map symbolizing population per area, "PPA".
    1. Layer -> Properties -> Style
    2. Graduated
    3. Column "PPA"
    4. Mode "Natural Breaks (Jenks)"
    5. "Classify"
    6. Fix the color ramp...
      1. Color ramp "New color ramp", "Gradient", OK.
      2. set color 1 as a lighter and color 2 as a darker version of the same hue. OK.
      3. name the new ramp.
    7. "Classify"
    8. "Apply"
  15. Experiment with several equal-area world map projections. (NB Some are ill-behaved; don't be alarmed.)
  16. Experiment with classification schemes.
    1. Classification Methods: Equal Interval, Quantile, Natural Breaks (Jenks), Standard Deviations, Pretty Breaks.
    2. Classes: Probably about 5 or 7, but try fewer and more.
  17. Save three (or so) of your more interesting renditions as jpg files to include in your write-up.
  18. Write a brief (< 1 page + example maps) summary of what you discovered about how classification scheme, symbol colors, and projection affect map appearance. How different can the apparent stories told by the same data appear?
  19. Suggestions on how to improve the exercise are welcome.
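
The arithmetic behind the area and "PPA" steps above can be sketched outside QGIS. This is an illustration under assumptions, not what the Field Calculator does internally: the features and their values are invented, the area_m2 field and both function names are hypothetical, and the areas are taken to be in square meters from an equal-area CRS. The sketch converts to square kilometers and sets the density to 0 where the population is negative, following the suggestion in step 12.

```python
def to_sq_km(area_sq_m):
    """Convert square meters to square kilometers."""
    return area_sq_m / 1_000_000.0


def population_density(pop_est, area_sq_km):
    """People per square kilometer; 0.0 for negative populations or non-positive areas."""
    if pop_est < 0 or area_sq_km <= 0:
        return 0.0
    return pop_est / area_sq_km


# Hypothetical features. Areas are assumed to come from an equal-area CRS in meters.
features = [
    {"NAME": "Alpha", "POP_EST": 2_500_000, "area_m2": 5.0e10},
    {"NAME": "Beta", "POP_EST": -99, "area_m2": 1.2e9},  # negative value standing in for "no data"
]

for feature in features:
    feature["PPA"] = population_density(feature["POP_EST"], to_sq_km(feature["area_m2"]))
    print(feature["NAME"], feature["PPA"])
```

Whether 0 is the right stand-in for "no data" is a design choice; on a choropleth map it will color that country as if it were empty, so you may prefer to leave such features out of the classification.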

Some Other GIS Data Sources

www.data.gov/catalog/geodata

the National Map

www.OpenStreetMap.org

www.NaturalEarthData.com

Hawaii State GIS

C&C Honolulu GIS (HoLIS)

There are lots of other sources online. Try a web search, including "shapefile" and "data" in the search terms.