Geocoding Public Health Data in New Mexico
The Case for Geocoding
Geocoded data would allow the New Mexico Department of Health (DOH) to present local health data for a variety of geographies, including census tracts and our recently designed, census-tract-based New Mexico Small Areas. A group of DOH analysts, the Geocoding Precision Subgroup, met regularly over a two-year period to examine our current geocoding results and processes, identify issues, and recommend improvements.
Accurate geocoding depends on every data record having a complete and correct address that can be matched uniquely to an address in a reference database, which provides geographic coordinates for standardized addresses. 'Standardized addresses' follow the postal addressing conventions defined by the United States Postal Service (see http://pe.usps.com/text/pub28/welcome.htm). Several conditions make a record difficult to geocode, including incorrect addresses, misspelled or non-standard street names, and addresses consisting only of a post office box, rural route, or general delivery designation. Furthermore, studies have shown that excluding records that could not be address-matched introduces systematic bias (e.g., Zandbergen, 2009). In other words, statistics calculated only from records with standardized addresses are not representative of records with non-standard address information.
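To illustrate the address conditions described above, the following Python sketch flags address lines that consist only of a post office box, rural route (or highway contract), or general delivery designation, so that such records can be routed away from street-address matching. The patterns are simplified assumptions loosely based on USPS Publication 28 forms such as "PO BOX", "RR", "HC", and "GENERAL DELIVERY"; the `AddressType` and `classify_address` names are hypothetical, and real records will contain many more variants.

```python
import re
from enum import Enum
from typing import Optional


class AddressType(Enum):
    STREET = "street"
    PO_BOX = "po_box"
    RURAL_ROUTE = "rural_route"  # also covers highway contract (HC) routes
    GENERAL_DELIVERY = "general_delivery"
    MISSING = "missing"


# Simplified patterns, checked in order; real data contain many more variants.
_PATTERNS = [
    (AddressType.PO_BOX,
     re.compile(r"^\s*(P\.?\s*O\.?\s*BOX|POST\s+OFFICE\s+BOX)\b", re.IGNORECASE)),
    (AddressType.RURAL_ROUTE,
     re.compile(r"^\s*(RR|RURAL\s+ROUTE|HC|HIGHWAY\s+CONTRACT)\b", re.IGNORECASE)),
    (AddressType.GENERAL_DELIVERY,
     re.compile(r"^\s*GENERAL\s+DELIVERY\b", re.IGNORECASE)),
]


def classify_address(line: Optional[str]) -> AddressType:
    """Return a coarse address type for one address line."""
    if line is None or not line.strip():
        return AddressType.MISSING
    for address_type, pattern in _PATTERNS:
        if pattern.match(line):
            return address_type
    # Anything not recognized as a non-street form is treated as a street address.
    return AddressType.STREET
```

For example, `classify_address("P.O. Box 5")` returns `AddressType.PO_BOX`, while `classify_address("123 Rio Grande Blvd NW")` returns `AddressType.STREET`.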
Our New Mexico group found that in recent years, 83% of both birth and death records have been matched to a standardized street address. Records not matched to a standardized street address have been geocoded with the most precise alternate geocode available, such as the centroid of the ZIP code, populated place, or county. Records whose geo-coordinates place them in the wrong census tract are considered misclassified. Anecdotal evidence from analysts in our group suggests that misclassification errors in New Mexico are more likely to occur in sparsely populated areas of the state.
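Measuring misclassification requires a validation sample in which each record's correct census tract has been verified independently, for example by manual review. Assuming such a sample exists, the following minimal Python sketch computes the share of misclassified records within each area type. The `ValidationRecord` fields and the "sparse"/"populous" labels are hypothetical and do not describe an actual DOH data layout.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class ValidationRecord:
    assigned_tract: str   # tract from the production geocode (street match or centroid)
    reference_tract: str  # tract verified independently for the validation sample
    area_type: str        # hypothetical stratum label, e.g. "sparse" or "populous"


def misclassification_rates(records: Iterable[ValidationRecord]) -> Dict[str, float]:
    """Share of records whose assigned tract differs from the reference tract, by area type."""
    totals: Dict[str, int] = defaultdict(int)
    wrong: Dict[str, int] = defaultdict(int)
    for record in records:
        totals[record.area_type] += 1
        if record.assigned_tract != record.reference_tract:
            wrong[record.area_type] += 1
    return {area: wrong[area] / totals[area] for area in totals}
```

Comparing the rates for sparse and populous areas is one way the anecdotal pattern could be checked quantitatively.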
Geocoding Process Recommendations
The New Mexico group made the following recommendations.
- Address standardization at the time of data entry should be a priority goal for all future data systems. Capturing the address in standardized form while the informant is still available to verify it is the best way to increase the proportion of records that can be assigned accurate geo-coordinates.
- Data records with street addresses should be run through address standardization software before submission to the geocoding/linkage service. The software puts each address into standardized USPS format and catches the most obvious problems, such as missing spaces between words (RioGrande vs. Rio Grande) and common misspellings.
- Data records without street addresses should be examined manually for other location information. For instance, institutions such as assisted living facilities, military bases, apartment complexes, and correctional facilities have specific point locations that can be used to geocode a record. Such a manual process is labor-intensive and costly, but may identify geo-coordinates for a majority of records with unmatched addresses. (In New Mexico, 10 years of birth and death data include over 107,000 records without street addresses that would require manual examination.) Close collaboration with the data steward is important, not only to understand the idiosyncrasies of the data but also to address the implications of any modifications.
- Additional reference databases (e.g., E-911 road file, subdivision and facility lists, county parcel files, highway mile marker file) may be used to assign geo-coordinates to records that lack a standard address. Each geo-reference database should be evaluated individually. For instance, parcel files are notorious for recording parcel size accurately for tax purposes while recording parcel location inaccurately, and parcel file accuracy also varies between counties. These ancillary geo-reference databases must be maintained and updated as new information (e.g., new facilities) is encountered.
- When precise location information cannot be assigned by any method, the GNIS (Geographic Names Information System) centroid should be used in sparsely populated areas, while ZIP code centroid coordinates should perform better in more populous areas.
- The method of geo-coordinate assignment should be recorded in a separate field on the data record.
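The standardization step recommended above is normally performed by dedicated address standardization software. The following Python sketch only illustrates the kind of normalization involved: uppercasing, removing punctuation, applying a small excerpt of USPS Publication 28 suffix and directional abbreviations, and correcting run-together or misspelled names from a local lookup table. The `KNOWN_FIXES` table and the `standardize_line` function are hypothetical, and unlike real software the sketch abbreviates any matching token rather than only the token in the suffix or directional position.

```python
# Small excerpt of USPS Publication 28 street suffix and directional abbreviations.
SUFFIXES = {
    "STREET": "ST", "AVENUE": "AVE", "BOULEVARD": "BLVD", "ROAD": "RD",
    "DRIVE": "DR", "LANE": "LN", "COURT": "CT", "PLACE": "PL",
}
DIRECTIONALS = {
    "NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W",
    "NORTHEAST": "NE", "NORTHWEST": "NW", "SOUTHEAST": "SE", "SOUTHWEST": "SW",
}
# Hypothetical local table of run-together or misspelled street names.
KNOWN_FIXES = {"RIOGRANDE": "RIO GRANDE", "MENUAL": "MENAUL"}


def standardize_line(line: str) -> str:
    """Return an uppercased, abbreviated version of one street address line."""
    # Commas become spaces; periods are dropped so "N.E." becomes "NE".
    cleaned = line.upper().replace(",", " ").replace(".", "")
    out = []
    for token in cleaned.split():  # split() also collapses repeated whitespace
        token = KNOWN_FIXES.get(token, token)
        # Simplification: abbreviates any matching token, not only the suffix
        # or directional position as real standardization software would.
        token = SUFFIXES.get(token, token)
        token = DIRECTIONALS.get(token, token)
        out.append(token)
    return " ".join(out)
```

For example, `standardize_line("123 RioGrande Boulevard Northwest")` returns `"123 RIO GRANDE BLVD NW"`.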
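The fallback order in the recommendations above can be sketched as a chain of lookups that stops at the first success and stores the successful method in its own field. In this Python sketch the lookup functions, the method names, and the `sparse` flag are all hypothetical placeholders: how a record is judged to lie in a sparsely populated area, and what each lookup queries (a street-address geocoder, a facility list or E-911 file, GNIS, ZIP code or county centroids), are left to the caller. Ending with the county centroid is an assumption drawn from the alternate geocodes already in use.

```python
from typing import Callable, Dict, List, Optional, Tuple

Coord = Tuple[float, float]                 # (longitude, latitude)
Lookup = Callable[[dict], Optional[Coord]]  # returns None when it cannot place the record


def fallback_order(sparse: bool) -> List[str]:
    """Hypothetical method names, from most to least precise."""
    centroid = "gnis_centroid" if sparse else "zip_centroid"
    return ["street_address", "reference_database", centroid, "county_centroid"]


def assign_coordinates(record: dict, lookups: Dict[str, Lookup], sparse: bool) -> dict:
    """Return a copy of record with lon/lat and a separate geocode_method field."""
    result = dict(record)
    for method in fallback_order(sparse):
        lookup = lookups.get(method)
        coord = lookup(record) if lookup is not None else None
        if coord is not None:
            result["lon"], result["lat"] = coord
            result["geocode_method"] = method
            return result
    # No method succeeded: keep the record, but flag it explicitly.
    result["lon"] = result["lat"] = None
    result["geocode_method"] = "not_geocoded"
    return result
```

Keeping `geocode_method` separate from the coordinates lets analysts later exclude or stratify centroid-based records, for instance when tabulating by census tract.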
Geocoding Precision Subgroup Members
Will Athas - UNM, Family & Community Medicine
Camille Clifford - DOH, Bureau of Vital Records and Health Statistics
Lois Haggard - DOH, ERD, Community Health Assessment Program
Heidi Krapfl - DOH, ERD, Environmental Epi Bureau
Larry Nielsen - National Association for Public Health Statistics and Information Systems, formerly DOH, Bureau of Vital Records and Health Statistics
Jim Roeber - DOH, Injury and Behavioral Epi Bureau, Substance Use Epidemiology Section
Tom Scharmen - DOH, Metro Region
Barbara Toth - DOH, ERD, Environmental Epi Bureau, EPHT Program
GPS Group Artifacts
- Flowchart Diagram
- Geocoding Protocol Document (DRAFT)
References
1. Zandbergen PA, Green JW. Error and bias in determining exposure potential of children at school locations using proximity-based GIS techniques. Environmental Health Perspectives. 2007;115(9):1363-1370.