
#2007-01 Using Cyber-Resources to Build Databases for Social Science Research

Authors: Matthew Sobek, Monty Hindman, and Steven Ruggles, Minnesota Population Center, University of Minnesota

Abstract

The Integrated Public Use Microdata Series (IPUMS) is the premier infrastructure project supported through the NSF Human and Social Dynamics Priority Area. Over the next four years, the IPUMS-International project will release data and metadata from approximately 150 censuses of 45 countries, totaling about a half-billion records and some 20,000 variables. Because of the unprecedented scale of this work, we have had to develop innovative cyberinfrastructure for both data processing and dissemination. The source data consists in most cases of raw microdata captured by census enumerations during the past 50 years, usually in obsolete formats with paper documentation. Our greatest challenge is the development of comprehensive, machine-processable, encoded electronic documentation, or metadata. This metadata underlies every aspect of IPUMS-International data processing work, including standardizing data formats and correcting format errors; assessing data quality and coverage problems; drawing high-density samples; identifying and correcting internal inconsistencies using logical and probabilistic procedures; allocating missing values; analyzing confidentiality risks and applying statistical confidentiality protections; and harmonizing variables. The same metadata drives our integrated web-based data access system, which provides advanced tools for navigating documentation, defining datasets, constructing customized variables, and adding contextual information, as well as a basic set of online data analysis tools.

Download paper.

