Objective. The purpose of this exercise is to introduce QGIS and to use it to explore how map projections and data classification schemes affect the appearance of choropleth maps. We'll use a "shapefile" of population and economic data for countries of the world. Specifically, you are asked to:
CRS. Shorthand for Coordinate Reference System. The idea is to have a standard way of referring to the model of the Earth to which a set of coordinates is referenced. The standard that seems to be gaining traction is from the EPSG (European Petroleum Survey Group), whose work was later taken over by the Oil & Gas Producers' Surveying and Positioning Committee. Basically, it is a list that gives an ID number to many (all?) of the different known coordinate systems used around the world. The EPSG system distinguishes between Geographical and Projected coordinate reference systems. A Geographical CRS entails an ellipsoid, a prime meridian, and a datum. A Projected CRS entails those plus a map projection and a unit of length; essentially it is saying that the locations are projected onto a flat map.
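To make the ID-number idea concrete, here is a minimal sketch that splits an identifier such as "EPSG:4326" into its authority and code and looks the code up in a tiny hand-made table. The table holds only WGS 84 and two of the projected systems listed later in this exercise; the function name describe_crs and the table are made up for illustration and are not part of QGIS or of the EPSG registry itself.

```python
# A tiny, hand-made subset of EPSG codes (illustration only, not the registry).
# kind is "geographic" (latitude/longitude) or "projected" (flat map coordinates).
KNOWN_CRS = {
    4326: ("WGS 84", "geographic"),
    3410: ("NSIDC EASE-Grid Global", "projected"),
    2163: ("US National Atlas Equal Area", "projected"),
}

def describe_crs(identifier):
    """Split 'AUTHORITY:CODE' and describe the CRS if it is in the table."""
    authority, _, code_text = identifier.partition(":")
    if authority.upper() != "EPSG" or not code_text.isdigit():
        return f"{identifier}: not in this small table"
    code = int(code_text)
    if code not in KNOWN_CRS:
        return f"{identifier}: not in this small table"
    name, kind = KNOWN_CRS[code]
    return f"{identifier}: {name} ({kind} CRS)"

for crs_id in ("EPSG:4326", "EPSG:3410"):
    print(describe_crs(crs_id))
```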
Choropleth maps are maps in which enumeration districts, such as countries, are colored to symbolize the occurrence of the mapped phenomenon. Usually, cartographers 'normalize' or 'standardize' the data, dividing counts by area or by population (e.g., we map 'cows per acre' or 'births per 100,000 people') to reduce the effect of enumeration district size on the map's appearance. (Otherwise, ceteris paribus, big enumeration districts would tend to have greater amounts of whatever is being mapped.) (But the ratios can get crazy when the counts get small, and some have argued against the "rule" of only mapping densities.) Cartographers usually use darker symbols to represent greater quantities.
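The small-count caveat is easy to see with a little arithmetic. The sketch below computes births per 100,000 people for two districts and shows how a single extra birth moves each rate. The district sizes and counts are invented for illustration.

```python
def rate_per_100k(count, population):
    """Return count per 100,000 people."""
    return count * 100_000 / population

# Invented example districts: (births, population)
districts = {
    "small": (2, 500),
    "large": (4_000, 1_000_000),
}

for label, (births, pop) in districts.items():
    before = rate_per_100k(births, pop)
    after = rate_per_100k(births + 1, pop)
    print(f"{label}: {before:.1f} -> {after:.1f} per 100,000 after one more birth")
```

Both districts start at 400 per 100,000, but one more birth lifts the small district to 600 while the large one barely moves (400.1).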
GIS Data. The world is awash with GIS data available on the web. (See the list at the end of this document for some examples.) These data typically have spatial and attribute components and are provided in GIS-accessible formats such as ASCII text files, shapefiles, and other exchange formats. Some of these data are available through special purpose spatial data servers using specialized web clients (Google Earth is an example) to connect to online databases. Some are expressly in the public domain, and others have proprietary restrictions on their use. GIS data for distribution should come with metadata that describe the data and help you assess whether the data are suitable for your intended use. Minimally, metadata should identify the coordinate system used for the spatial data and include a data dictionary that tells what attributes are included and how they are encoded. Metadata might also include information on data provenance and assessments of its completeness and accuracy.
NaturalEarthData is an example of a handy online data source, run by cartographers and for cartographers. It provides data grouped into three broad presentation scale categories: Large scale (1:10,000,000), Medium scale (1:50,000,000), and Small scale (1:110,000,000); and into cultural vs physical features. For this exercise, we'll use a data set from the small scale and cultural category.
From http://www.naturalearthdata.com/downloads/110m-cultural-vectors/ download the "Admin 0 - Countries" data set and unzip it to some sensible (memorable) directory on your computer. You should have at least these four files:
ne_110m_admin_0_countries.dbf (the data table part)
ne_110m_admin_0_countries.prj (the CRS, i.e., map projection info)
ne_110m_admin_0_countries.shp (the geometric part)
ne_110m_admin_0_countries.shx (a positional index of the geometry records in the shp part)
Note that "shapefiles" really are several files sharing a common basename. The extensions indicate which part of the information is in each file.
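Because the parts travel together, it can be handy to check that none went missing when the download was unzipped or copied. The following is a minimal sketch using only the Python standard library; the function name missing_parts is made up here, and the path in the last line assumes the file sits in the current directory, so adjust it to wherever you unzipped the data.

```python
from pathlib import Path

# Extensions named in this exercise; real shapefiles may carry extra sidecar files.
EXPECTED_EXTENSIONS = (".shp", ".shx", ".dbf", ".prj")

def missing_parts(shp_path):
    """Return the expected extensions that have no file beside shp_path."""
    shp = Path(shp_path)
    return [ext for ext in EXPECTED_EXTENSIONS
            if not shp.with_suffix(ext).exists()]

# An empty list means all four parts were found.
print(missing_parts("ne_110m_admin_0_countries.shp"))
```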
Open the .prj file in a text editor and see if you can divine the CRS (coordinate reference system) for the data. Do you see "WGS 84" and numbers that might describe the earth model that the data are using (typically a semi-major axis in meters and an inverse flattening)?
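If you would rather not squint at the text, a few lines of Python can pull the earth-model numbers out of it. This is a sketch only: the sample string below is an assumption about what a WGS 84 .prj file commonly contains, and your file may be worded differently. The spheroid entry stores the semi-major axis and the inverse flattening, from which the semi-minor axis follows.

```python
import re

# Assumption: WKT text as commonly found in a WGS 84 .prj file.
# In practice, read the real text from the .prj file instead.
SAMPLE_PRJ = (
    'GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",'
    'SPHEROID["WGS_1984",6378137.0,298.257223563]],'
    'PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]]'
)

def spheroid_from_wkt(wkt):
    """Return (name, semi_major_m, inverse_flattening) from a WKT string."""
    match = re.search(r'SPHEROID\["([^"]+)",\s*([0-9.]+),\s*([0-9.]+)', wkt)
    if match is None:
        raise ValueError("no SPHEROID entry found")
    name, a, inv_f = match.groups()
    return name, float(a), float(inv_f)

name, a, inv_f = spheroid_from_wkt(SAMPLE_PRJ)
# Semi-minor axis: b = a * (1 - 1/inverse_flattening).
# (A perfect sphere may be written with inverse flattening 0; not handled here.)
b = a * (1 - 1 / inv_f)
print(f"{name}: a = {a:.1f} m, 1/f = {inv_f}, b = {b:.1f} m")
```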
Sometimes the .prj file is missing and you may need to research the CRS for such a data set. Documentation or "metadata" may tell you what coordinate system was used in your data. Most GIS software offers some way of associating data with a CRS. Often you can copy and rename a .prj file from another shapefile that uses the same CRS. In other cases you may need to build a .prj file by hand in a text editor.
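Building the file by hand can also be scripted. The sketch below writes a WGS 84 definition next to a shapefile that lacks one. It assumes you have already confirmed from the metadata that the data really are in WGS 84 latitude and longitude; the WKT string is the commonly seen form, and write_prj is a made-up helper name.

```python
from pathlib import Path

# Assumption: the data are in WGS 84 latitude/longitude. Confirm this first;
# attaching the wrong CRS will misplace the data without any warning.
WGS84_WKT = (
    'GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",'
    'SPHEROID["WGS_1984",6378137.0,298.257223563]],'
    'PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]]'
)

def write_prj(shp_path, wkt=WGS84_WKT):
    """Write wkt to the .prj beside shp_path; refuse to overwrite an existing one."""
    prj_path = Path(shp_path).with_suffix(".prj")
    if prj_path.exists():
        raise FileExistsError(f"{prj_path} already exists")
    prj_path.write_text(wkt)
    return prj_path
```

Calling write_prj("mydata.shp"), where mydata.shp is a hypothetical file name, would create mydata.prj in the same directory.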
Quantum GIS (QGIS) Hints. QGIS is a free and open source (FOSS) geographic information system (GIS) that runs on MS-Windows, MacOS, and Linux; uses common data file formats; presents a 'typical' point-and-click interface; and provides sufficient spatial analytic capabilities to demonstrate many GIS functions. For this assignment, the data input, data selection and symbolization, and map output functions will be most relevant.
QGIS can be downloaded from www.qgis.org. Installers are available for Windows, Mac, and Linux. Go ahead and install it.
You may want to start the QGIS program and take a few moments to see what is under each menu tab; the names may not make sense at first but reading through them now will make it easier to find things later.
Reading data into QGIS. Vector data are read from the menu: Layer -> Add Vector Layer ... (or the corresponding icon in the toolbar). In the dialog, navigate to a directory and specify the kind of file to show (e.g., [OGR] KML or SHAPEFILE), then tell it to load the data. You should see the data on the map and an entry in the "TOC" (the table of contents, or layer list) to the left.
Menu: Layer -> Properties -> Style... this is where you set symbology.
Map CRS. You can explicitly set the projection to use for the map display: use the File:Project Properties:Coordinate Reference System (CRS) tab to select the desired coordinate system. NB: This affects the display, not the data. (Check "Enable 'on the fly' CRS transformation" if QGIS will need to re-project data sets from several CRS.)
We'll be making a choropleth map and should use an equal area projection. QGIS provides several possibilities under the Projected Coordinate Systems group. (Some work better than others with global data. Try selecting and applying several of these and see.)
Equal Area Cylindrical : NSIDC EASE-Grid Global EPSG:3410 1368
Albers Equal Area : NAD83 Texas Centric Albers EA EPSG:3083 1046
Lambert Azimuthal Equal Area : US National Atlas EA EPSG:2163 154
Lambert Azimuthal Equal Area : WGS84 North P LAEA Europe EPSG:3575 1531
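To see what "equal area" buys you, here is a minimal sketch of a cylindrical equal-area projection on a sphere. The radius (6371228 m) and the 30 degree standard parallel are the values usually quoted for EASE-Grid Global and are assumptions here; QGIS does this work for you with its own projection library. The sketch projects one-degree cells at two latitudes and compares the projected rectangle's area with the cell's true area on the sphere.

```python
import math

# Assumptions: spherical Earth of radius 6371228 m, standard parallel 30 degrees.
R = 6_371_228.0
STD_PARALLEL = math.radians(30.0)

def cylindrical_equal_area(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to x, y in metres."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    x = R * lon * math.cos(STD_PARALLEL)
    y = R * math.sin(lat) / math.cos(STD_PARALLEL)
    return x, y

def projected_cell_area(lon0, lat0, size_deg=1.0):
    """Area in m^2 of the projected rectangle for a size_deg x size_deg cell."""
    x0, y0 = cylindrical_equal_area(lon0, lat0)
    x1, y1 = cylindrical_equal_area(lon0 + size_deg, lat0 + size_deg)
    return (x1 - x0) * (y1 - y0)

def sphere_cell_area(lat0, size_deg=1.0):
    """True area in m^2 of the same cell on the sphere."""
    dlon = math.radians(size_deg)
    dsin = math.sin(math.radians(lat0 + size_deg)) - math.sin(math.radians(lat0))
    return R * R * dlon * dsin

for lat in (0, 60):
    on_map = projected_cell_area(0, lat) / 1e6
    on_sphere = sphere_cell_area(lat) / 1e6
    print(f"cell at {lat} deg: map {on_map:.0f} km^2, sphere {on_sphere:.0f} km^2")
```

The two columns agree at every latitude, and the cell at 60 degrees is much smaller than the one at the equator, even though both are "one degree by one degree".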
Open the attribute table for your features to explore the data that you have. (Layer:Open Attribute Table) The table should have one record (row of data) for each feature on the map. Each record has the same columns (fields or attributes). The columns record values of attributes of the spatial objects. You can sort the records on the values in the various fields. You can use the query facility at the bottom of the table panel to select features. You can add new columns to the table. You can do math (not meth) on the columns, mostly like a spreadsheet.
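The same three table operations (sort, select, add a calculated column) can be sketched in plain Python to show what the attribute table is doing behind the buttons. The records and values below are invented; the field names NAME and POP_EST imitate Natural Earth's naming but are an assumption, and AREA_KM2 and POP_DENS are made-up fields.

```python
# Invented records: one dictionary per feature, one key per column.
records = [
    {"NAME": "Country A", "POP_EST": 2_000_000, "AREA_KM2": 50_000},
    {"NAME": "Country B", "POP_EST": 500_000, "AREA_KM2": 5_000},
    {"NAME": "Country C", "POP_EST": 30_000_000, "AREA_KM2": 600_000},
]

# Sort the records on one field, largest population first.
by_population = sorted(records, key=lambda r: r["POP_EST"], reverse=True)

# Select features with a query: population above one million.
selected = [r["NAME"] for r in records if r["POP_EST"] > 1_000_000]

# Add a new column calculated from existing ones, spreadsheet style.
for r in records:
    r["POP_DENS"] = r["POP_EST"] / r["AREA_KM2"]

print([r["NAME"] for r in by_population])
print(selected)
print({r["NAME"]: r["POP_DENS"] for r in records})
```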
You can set map symbology for your data. In the "TOC" you can check layers on and off, drag layers up and down so that points and lines appear "on top of" polygons, and more. The order in the TOC sets the drawing order. There is finer control for the symbology in individual layers, allowing considerable design flexibility.
There are classification tools to produce range-graded graduated color symbols. (Right click the layer in the TOC and choose Properties, or use the Layer menu and choose Properties.) On the "Style" panel, you can set the Legend Type to "Graduated Symbol" (rather than Single Symbol), select a Classification Field (the variable to map), select a classification scheme or Mode (Equal Interval or Quantiles), and select a Number of Classes (5, 6, 7, whatever). Then press the "Classify" button.
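The two modes can give very different maps from the same data. The sketch below computes class upper bounds both ways for a small invented, skewed data set. It illustrates the idea only; the exact break values QGIS reports may differ, since quantile rules vary between implementations.

```python
def equal_interval_breaks(values, k):
    """Upper bounds of k classes of equal width between min and max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k + 1)]

def quantile_breaks(values, k):
    """Upper bounds of k classes holding (nearly) equal numbers of values."""
    ordered = sorted(values)
    n = len(ordered)
    # Index of the last value in class i is ceil(i * n / k) - 1.
    return [ordered[(i * n + k - 1) // k - 1] for i in range(1, k + 1)]

# Invented, skewed data: nine small values and one outlier.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print("equal interval:", equal_interval_breaks(data, 3))
print("quantiles:     ", quantile_breaks(data, 3))
```

With equal intervals, nine of the ten values land in the lowest class and the middle class is empty; with quantiles, the classes hold four, three, and three values.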
The "New symbology" tools are better than the "Old symbology" tools. You can choose between them at the upper right of the Layer -> Properties -> Style panel. The new tools give you better and finer control of the classification and the symbology.
Normalizing count data by area or population. In making a choropleth map, you'll usually want to "normalize" count data, by either area or population, in order to map density. Examples include (area in farms in enumeration district)/(total area in enumeration district) and (sick people in enumeration district)/(population in enumeration district).
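A minimal sketch of the farm-area example follows, with invented numbers. Both districts have the same farm area, so a map of raw counts would color them alike, while the normalized share tells them apart. The helper name share is made up for illustration.

```python
# Invented districts: farm area and total area in the same units (km^2).
districts = {
    "District A": {"farm_area": 120.0, "total_area": 400.0},
    "District B": {"farm_area": 120.0, "total_area": 4_000.0},
}

def share(part, whole):
    """Return part/whole, or None when the denominator is zero."""
    return part / whole if whole else None

farm_share = {name: share(d["farm_area"], d["total_area"])
              for name, d in districts.items()}
print(farm_share)
```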
Some GIS software lets you normalize data when you choose a data field to display. QGIS makes you calculate a data field that reflects the normalization, and tries to make it easy with an "area" button in the "field calculator" dialog, but the "area" is calculated in the data CRS, not the display CRS. If you want to normalize by area and have data in latitude and longitude, first save the data re-projected to an equal area projection, and then use that data to calculate areas.
Saving your maps. You can export the map as an image file, maybe to add to a word processing document. From the menu, File:Save as Image will let you save a .jpg (or .bmp etc.) snapshot of the map from your screen, which you could then include in a webpage or other document.
You can use QGIS's map composition tool (Print Composer) to make a more attractive map. This is under File -> New Print Composer. Basically, you define a map composition as a page with an orientation and size, and then build the composition by adding elements to the page. These elements can include one or more maps, text and title boxes, a legend, a scale bar, images, a north arrow, etc. The composition can be saved/exported as an image, PostScript, or SVG file.
There are lots of other sources online. Try a web search, including "shapefile" and "data" in the search terms.