Historical Census Record Linkage Using Support Vector Machines
Minnesota Supercomputing Institute Open House
Muhammad Ahmad, Ron Goeken, Lap V Huynh, Tom Lenius, and Rebecca Vick
University of Minnesota, Minnesota Population Center
The Minnesota Population Center (MPC) received a grant to create representative linked datasets of individuals and family groups enumerated in the U.S. censuses of the late 19th and early 20th centuries. The project goal was to link records from the IPUMS 1850, 1860, 1870, 1900, 1910, 1920 and 1930 one-percent samples to a database consisting of all individuals enumerated in the 1880 census. These datasets will combine information from two distinct censuses for each linked record and will greatly expand research possibilities in areas such as migration and occupational mobility.
A primary focus of this project has been to create a record linkage process that yields representative data. Thus, the variable set used for establishing links is limited to characteristics that do not change over time (e.g., name, race, place of birth) or that change in predictable ways (e.g., age). We do not use place of residence to create links, for example, because doing so would make non-migrants more likely to be linked than migrants and so make the linked data less representative.
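The restricted variable set described above can be illustrated with a minimal sketch of how a candidate pair of records might be turned into comparison features of the kind a classifier such as a support vector machine takes as input. This is not the project's implementation: the record fields, the function names, and the use of Python's difflib as a string comparator are all assumptions made for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List


@dataclass
class CensusRecord:
    # Hypothetical field names chosen for this sketch, not IPUMS variable names.
    first_name: str
    last_name: str
    race: str
    birthplace: str
    age: int
    census_year: int


def name_similarity(a: str, b: str) -> float:
    """Return a similarity score in [0, 1].

    SequenceMatcher is a stand-in here; any string comparator could be used.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def comparison_features(sample: CensusRecord, target: CensusRecord) -> List[float]:
    """Compare two records using only time-invariant or predictable variables.

    Place of residence is deliberately left out, as in the text above.
    """
    # Age changes predictably: it should shift by the years between censuses.
    years_elapsed = target.census_year - sample.census_year
    expected_age = sample.age + years_elapsed
    age_gap = abs(target.age - expected_age)
    return [
        name_similarity(sample.first_name, target.first_name),
        name_similarity(sample.last_name, target.last_name),
        1.0 if sample.race == target.race else 0.0,
        1.0 if sample.birthplace == target.birthplace else 0.0,
        float(age_gap),
    ]


# Made-up example records: an 1870 sample record and an 1880 candidate.
rec_1870 = CensusRecord("John", "Olson", "White", "Norway", 24, 1870)
rec_1880 = CensusRecord("John", "Olsen", "White", "Norway", 35, 1880)
features = comparison_features(rec_1870, rec_1880)
```

Because the age comparison uses the signed difference between census years, the same function applies whether the sample year falls before 1880 (e.g., 1870) or after it (e.g., 1900).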