A Study of the Openness of GIS Datasets on China, and the Integration of Available Data in the OS/OA Tsongkha Valley Project: Final Report

Abstract: 

As part of a growing movement to incorporate digital tools into humanities research, some scholars are turning to geographical software to organize historical data. By viewing historical sites in the context of an interactive digital map, researchers can uncover new spatial information about those sites. One of the most popular formats for geographical information is the geographic information system (GIS) dataset, which pairs coordinates (latitude and longitude) for various sites (cities, borders, important buildings, etc.) with variables such as population, occupations, elevation, languages spoken, and so on. Plateau Culture uses data from GIS datasets to organize information, such as images, writings and citations, about religious and historical sites on a digital map.

This report is part of a 2010 project entitled “A Study of the Openness of GIS Datasets on China, and the Integration of Available Data in the OS/OA Tsongkha Valley Project,” sponsored by the University of Toronto's Project Open Source | Open Access initiative. As part of this project, undergraduate student Nicholas Field (University of Toronto) entered data about historical sites onto Plateau Culture (a web-based resource sharing project focusing on the Tsongkha Valley region of Qinghai Province, PRC) and developed this report, which evaluates the openness of GIS datasets (in terms of their cost, accessibility and freedom for use in academic works) relevant to Buddhology and Area Studies (specifically relating to Chinese and Tibetan cultural regions).  

Sponsor: Project Open Source | Open Access, Knowledge Media Design Institute, University of Toronto 
Project Lead: Frances Garrett, Assistant Professor, Department and Centre for the Study of Religion 
Student Researcher: Nicholas Field 
Project Duration: May 2010 - August 2010
 

Each entry below represents a website or company from which one can obtain GIS datasets relating to China, especially those involving Qinghai Province, Tibetan cultural regions, or religious and historical sites in China. Each organization offers a number of datasets. No attempt has been made to evaluate each dataset individually (though exceptionally interesting datasets are noted); instead, each entry evaluates the openness, cost and general usefulness of all datasets available from that organization. Each entry gives the name, organizing body (often a company or university), website and (occasionally) contact information for the organization. It also comments on the approximate cost of the datasets, their openness for use in academic works and non-commercial online projects (such as Plateau Culture) and, when possible, any citations or acknowledgements that the organization requires. While this information is supplied here for the convenience of users, users should verify it themselves; Plateau Culture makes no guarantee of its accuracy.

Each entry also briefly describes what the available datasets contain, such as borders (of countries, prefectures, counties, etc.) and coordinates (in latitude and longitude) of cities and historical sites. Users are advised to compare several organizations, since basic information (basic borders, highways, bodies of water and coordinates of major cities) is offered by many organizations, but more specialized datasets (for example, historical datasets based on data from the nineteenth century) may be offered by only one. An overall comparison and evaluation of the different organizations follows the individual entries.
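For readers new to GIS data, the following is a minimal Python sketch of the kind of record these datasets hold: a coordinate pair cross-referenced with descriptive variables. The class name, the field names, the helper function and the two sample sites are all hypothetical placeholders chosen for illustration; they are not taken from any of the datasets reviewed here, and real shapefiles carry far richer structure.

```python
from dataclasses import dataclass, field


@dataclass
class SiteRecord:
    """One row of a hypothetical point dataset: a location plus attributes."""
    name: str
    lat: float  # latitude in decimal degrees (north positive)
    lon: float  # longitude in decimal degrees (east positive)
    attributes: dict = field(default_factory=dict)


def sites_in_box(records, south, west, north, east):
    """Return the records whose coordinates fall inside a bounding box."""
    return [
        r for r in records
        if south <= r.lat <= north and west <= r.lon <= east
    ]


# Placeholder data only: names, coordinates and attributes are invented.
sites = [
    SiteRecord("Site A", 36.6, 101.8, {"type": "temple"}),
    SiteRecord("Site B", 29.7, 91.1, {"type": "city"}),
]

# Select the sites that fall inside an arbitrary illustrative bounding box.
hits = sites_in_box(sites, south=35.0, west=100.0, north=38.0, east=103.0)
for site in hits:
    print(site.name, site.lat, site.lon, site.attributes)
```

A digital map project works with many such records at once, which is why datasets that already supply coordinates for religious or historical sites are especially valuable.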

China Historical GIS (CHGIS)

Harvard University - Faculty of Arts and Sciences

Main Site: http://www.fas.harvard.edu/~chgis/

Openness: Open to anyone, but requires registration. Users must sign in each time they wish to download GIS datasets. Datasets may be used but not sold or distributed. Data can be used for non-commercial projects provided correct citation is included (see below).

Cost: Free to download; a CD with all of the datasets is also available for $10-$15 USD.

Contains: Cities, counties and other shapefiles set at different periods (mostly 1820, 1911 and 1990). These datasets focus on historical data and include datasets based on historical censuses (from the nineteenth century, for example) and a Buddhist Temple dataset (which, unfortunately for Plateau Culture, has very little coverage of Qinghai). At least forty datasets are available, including several covering Qinghai, the Tarim Basin and Kham in the “Other Datasets” section. Several discontinued datasets, such as the Digital Map Database of China, can also be found for free on the CHGIS site. Additional datasets (often in Excel worksheet format rather than as shapefiles) are available in the working papers download section.

Required Citations:

CHGIS itself does not require a citation or acknowledgment, but individual datasets often do. Details can often be found in the readme file paired with the dataset. For example, below is a citation taken from a readme file: 

“ChinaW Dataset” (c) Zumou Yue, G.William Skinner, and Mark Henderson. 
Davis: University of California, Regional Systems Analysis Project, Jan 2007.
 
 
 

World Language Mapping System

Global Mapping International

Main Site

Openness: These datasets are published on CDs, which must be purchased. Owners and legitimate users may use the information in almost any form, provided they include some minor acknowledgments (below).

Cost:  Quite expensive ($1000 for the first workstation in a government office, reduced prices for additional workstations; reduced price for academic & non-profit customers). See the main site for pricing details.

Contains: Shapefiles for many language regions across the world, including a layer for areas of mixed languages. There are some inaccuracies in the Xining area, but this seems to be a rather unique resource.

Required Citations (taken directly from the end user agreement): 

11. Publication and Distribution of Maps: Licensee may make, reproduce, and distribute by any means any cartographic representation of the Data and Related Materials, alone or in combination with other data, in printed or graphical file form, provided:

11.1. The graphical file formats used do not contain geographically referenced data usable in a geographic information system.

11.2. If language points or polygons are included in the map, the attribution “Language data from WLMS 2005 www.gmi.org/wlms” must appear on the map.

Exception: If maps are an integral part of a larger publication, and if the publication size or resolution of the maps does not reasonably allow inclusion of the above attribution in the map itself, attribution may be placed in text outside the map image, provided a clear reference to the location of the attribution text is provided near the image. For maps appearing as screen-resolution images in HTML documents on the Web or other hypertext electronic publications, the attribution may appear in a separate document accessed by a hyperlink from the document(s) in which the map(s) appear(s). 

China Dimensions - Socioeconomic Data and Applications Center (SEDAC)

Columbia University

Main Site: http://sedac.ciesin.columbia.edu/china/index.html

Note: SEDAC includes the datasets from China in Time and Space.

Openness: Completely open - no registration required. Datasets can be used but not distributed.

Cost:  Free

Contains: County- and city-level datasets, including several with data from censuses and provincial economic yearbooks. These include population figures (broken down by ethnic group, occupational sector, etc.) for every county. Twelve datasets are provided, plus additional links.

Required Citations: Each dataset must be cited individually. Many of the citations can be found in the Readme files of the zipped archives or by clicking on “anonymous ftp” under each dataset's description. The main SEDAC site (linked on the China Dimensions SEDAC site) asks users to create their own citation if none is given.

Below is an example:

China Time Series Administrative Regions GIS Data: 1:1M, County Level

“The China Time Series Administrative Regions GIS Data: 1:1M, County Level, prepared by CIESIN, CASM and CITAS - UW.” 

China Digital Maps

GfK Geomarketing

Main Site: http://www.gfk-geomarketing.com/en/home.html

Openness: Only available through purchase. Under some licenses, datasets may be used but their derivative documents may not be made public (private use only), whereas other licenses are more open. Institutions with an Academic License are permitted to publicly display derivative works (such as Plateau Culture). Raw data can never be made public, and removing copyright notices or company logos, or editing the raw data (except for minor administrative reasons), is prohibited. One may combine data from these datasets with data from other sources.

Cost:   Quotes are available via email. Costs presumably depend on the institution, number of maps / country, workstations, and license type.

Contains: Basic shapefiles for provinces, prefectures and counties, as well as cities, highways and postal code regions, for many (perhaps most) countries. GfK is a research company that offers digital maps of most countries. Their GIS datasets are mostly regional boundaries, including provincial, prefectural and county-level borders, as well as postcode regions. They also include power lines, highways and other standard map elements, but they do not have more specialized datasets (such as religious sites). All of their datasets are available for purchase in a variety of formats. They are relatively open for use, requiring only a simple citation.

Required Citations: Below are two citations that must be included when presenting derivative works based on data from GfK GeoMarketing.

"Map basis: GfK GeoMarketing" - to be added to all maps that include their data (presumably as part of the map)

"Data basis: GfK GeoMarketing" - to be added to all maps, data, tables, charts, graphs, etc. using their data

Presumably, “Data basis: GfK GeoMarketing” must be added to any derivative work based on GfK's data, while “Map basis: GfK GeoMarketing” must additionally be added to any maps (or digital map projects) using their data. Many details (especially regarding license types, but with no mention of Academic licenses) are available here:

http://www.gfk-geomarketing.com/en/legal/data_and_map_licenses.html

(Points 6-9 relate to the creation of documents based on these datasets, restrictions on publishing or publicly showing such documents, and related topics.) 
 
 

China Data Center

University of Michigan working in collaboration with All China Market Research Co., Ltd., publisher of the GIS datasets. Associated with the Online China Data Service.

Main Site: http://chinadatacenter.org/newcdc/

Openness: Datasets are only available to customers of the site. In order to use the datasets, one must subscribe and become a member of the China Data Center.

Cost: Varies. Datasets may be purchased individually or in groups, but cost for an average dataset ranges from several hundred to several thousand USD. Prices are available for each dataset and there is also an email address available to ask for quotes. A basic order sheet with a list of costs is available here:

http://chinadatacenter.org/newcdc/onlinedata.htm

However, they are also offering one free single user subscription to academic libraries. Interested parties can contact them at:

http://chinadataonline.org/asp/newuser.asp

Contains: A large number of datasets based on provincial yearbooks, providing basic shapefiles of provinces, prefectures, counties and townships; census data on population, employment, gender, ethnicity and so on for each province; and a wealth of industrial data since 1999. Some of the most detailed compilations of provincial data are available for purchase by province, with each province costing anywhere from several hundred to several thousand USD. Single-page PDF “demos” are available on the site to suggest what each dataset looks like. Dataset entries include a list of every variable in the dataset. Of particular interest for Buddhologists (and Plateau Culture) is their newly released dataset, “The Atlas of Religions in China,” which includes locations, addresses and contact information for over seventy thousand religious institutions in mainland China. Their email contact stated that there are over 1900 religious sites in Qinghai of various denominations (Buddhist, Muslim, Christian, Daoist, etc.), with each site's location (lat/long coordinates), address, postal code, starting year, township, revenue range, employee scale and contact person. Copies for academic users cost $1,800 USD.

Contact Information: The China Data Center can be contacted by phone and email.

Citation Information: Those using the datasets for research or academic publications must cite Online China Data Service as the source of the data, and must also state that any opinions expressed in their publication are their own and not the opinion of the Online China Data Service. While there are no problems with using these datasets for personal or academic projects or publications (i.e. common derivative works), one may not be able to put derivative works online, depending on one's membership type. Users are advised to contact the China Data Center directly.  
 
 

Star Vision Limited

Main Site: http://www.starvision.com.hk/

Chinese datasets available here: http://www.starvision.com.hk/index.cfm?fuseaction=browse&id=37532&pageid=92#overview

Openness: The openness of datasets from Star Vision Ltd. for derivative, non-commercial projects has not yet been clarified. Users are advised to contact the company directly and to specify which license they are interested in.

Cost: Prices are given only through quotes.

Contains: The site includes a list of the variables contained in the various shapefiles, including temples. Scales range from 1:4,000,000 to 1:20,000, with higher resolutions available by special request. Maps follow a simple lat-long projection. The datasets mainly show regional divisions, highways, bodies of water, major buildings (temples, famous restaurants, etc.) and census data.

Contact Information: Available by phone, e-mail and post. The postal address is:

Star Vision Limited

Suite 3104, 31/F, 
8 Commercial Tower, 
No.8 Sun Yip Street, 
Chai Wan  
Hong Kong 
 

There is also a query form available online:

http://www.starvision.com.hk/index.cfm?id=37532&fuseaction=browse&pageid=61 
 
 
 

Digital Chart of the World

Sold by Environmental Systems Research Institute, Inc. (ESRI), hosted by Pennsylvania State University Libraries. Original data developed by the Defense Mapping Agency and made available via the National Imagery and Mapping Agency (NIMA).

Main Site

Openness: These files are readily available (though perhaps outdated) and can be used for private works or for non-commercial public projects with the addition of a few citations (see below for details).

Cost: Free for download via the Pennsylvania State University Libraries, though only one country's datasets can be downloaded at a time. The entire database is also available on a set of four CDs from ESRI, the current owner of the data. The Pennsylvania State University website allows one to generate map-based images for free without the need for ArcGIS, but the graphics are of low quality.

Contains: Datasets with regional borders and basic information, circa 1991-1993. Newer datasets are available from ESRI's site, but they are not compatible with ArcGIS. Due to the age of these files, the graphics are serviceable but not polished. Because these datasets were originally designed for low- and medium-altitude flights, they do not include highly detailed or specialized data.

Citation Information:

1. All derivative works based on ESRI data require this citation:

“Portions of this document include intellectual property of ESRI and its licensors and are used herein under license. Copyright © [Insert the actual copyright date(s) from the source materials] ESRI and its licensors. All rights reserved.”

(Source: page 2, article 4.1.d of ESRI's General License Terms and Conditions.)

2. For non-commercial web-based projects - such as Plateau Culture - ESRI has two additional citation requirements: one must acknowledge ESRI as the owner of the GIS datasets and reference their Terms of Use. The example citation from their Terms of Use is: “The ArcGIS JavaScript API [replace with product name] is owned by Esri and is subject to the Terms of Use found at http://www.esri.com/legal/licensing/software-license.html.”

Put simply, use the first citation when creating any derivative work, and use both citations when making a web-based, non-commercial derivative work (like Plateau Culture). Though individual countries are distributed for free on the PSU site, ESRI retains copyright of these datasets and has placed limitations on their commercial usage. For more details, see the ESRI Terms of Use here: http://www.esri.com/legal/licensing/termsofuse.html
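For projects that assemble acknowledgements programmatically, the rule just described could be sketched in Python as follows. This is only an illustration of this report's reading of the requirements, not an official ESRI tool: the function name and its parameters are hypothetical, the citation wording is copied from the two quotations above, and the copyright date and product name must be supplied by the user from the source materials.

```python
from typing import List, Optional

# Wording copied from the two citations quoted in this entry.
ESRI_GENERAL = (
    "Portions of this document include intellectual property of ESRI and its "
    "licensors and are used herein under license. Copyright © {years} ESRI "
    "and its licensors. All rights reserved."
)
ESRI_WEB = (
    "The {product} is owned by Esri and is subject to the Terms of Use found "
    "at http://www.esri.com/legal/licensing/software-license.html."
)


def esri_citations(
    copyright_years: str,
    product: Optional[str] = None,
    web_noncommercial: bool = False,
) -> List[str]:
    """Return the citation texts for a derivative work (hypothetical helper).

    Any derivative work gets the general citation; a web-based,
    non-commercial work gets the Terms of Use citation as well.
    """
    citations = [ESRI_GENERAL.format(years=copyright_years)]
    if web_noncommercial:
        if not product:
            raise ValueError("a product name is needed for the web citation")
        citations.append(ESRI_WEB.format(product=product))
    return citations
```

Calling the function with only a copyright date returns the single general citation; passing a product name with `web_noncommercial=True` returns both. Users should still check the current ESRI Terms of Use before relying on either text.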

Conclusion

While most datasets must be purchased from various organizations, some groups, such as China Historical GIS (CHGIS), China Dimensions (SEDAC) and the Digital Chart of the World (DCW), provide datasets that can be downloaded for free. The China Data Center also gives out free single-user subscriptions to its services for interested academic libraries. In general, most organizations offer basic shapefiles of country, prefectural and county-level borders, as well as other basic information such as bodies of water, population by region (based on China's Provincial Yearbooks) and coordinates for major cities. Datasets available for purchase tend to be more comprehensive, detailed and current, especially when involving variables that change annually, such as population and employment, but several of the more specialized datasets are only available through free organizations, such as CHGIS. There are two exceptions: the World Language Mapping System and China Data Center's “Atlas of Religions in China.” The World Language Mapping System is still one of the few mapping systems that includes shapefiles of the distribution of different language families, though it lacks some detail for lesser-known languages. The “Atlas of Religions in China” contains current information, including coordinates, for tens of thousands of religious sites in modern China. However, both of these datasets cost at least a thousand dollars (USD), making them difficult to use unless one's institution already has a subscription.

While the cost of datasets ranges from completely open (i.e. free) to prohibitive for lone scholars, almost every organization reviewed permits the use of its datasets for a variety of projects. The one requirement, which is universal among the organizations above, is a basic acknowledgement or citation when the datasets are used; examples are often provided in the Terms of Use or readme file accompanying the dataset(s). The datasets can universally be used for private academic projects and are almost always permitted in public, non-commercial projects (such as Plateau Culture), though such permission may be granted only under certain licenses. The only exception is Star Vision Ltd., whose terms could not be confirmed due to a lack of information. The prohibited acts were generally those that would violate copyright laws - such as marketing the datasets as one's own, modifying them inappropriately or trying to sell them - or, in the case of websites and derivative products, allowing unlicensed users to obtain the datasets, whether through a direct download or through reverse engineering the derivative product in some manner.

In conclusion, there are many datasets available to scholars of China and Buddhism interested in using GIS software. While some are quite expensive, others are free, and almost all are open for academic derivative works with only a simple citation.  

Source Reference: 

Title: A Study of the Openness of GIS Datasets on China, and the Integration of Available Data in the OS/OA Tsongkha Valley Project: Final Report
Publication Type: Plateau Culture Original
Authors: Field, Nicholas
Year of Publication: 2010
Date Published: 11/2010
URL: https://plateauculture.org/writing/study-openness-gis-datasets-china-and-integration-available-data-osoa-tsongkha-valley-projec
Citation Key: placul1248
Project: Project Open Source/Open Access (POSOA)