Data Source Descriptions
Current Population Survey
Bureau of Labor Statistics
The Current Population Survey (CPS) provides monthly estimates
of total employment, unemployment, and other characteristics for the civilian noninstitutionalized
population 16 years old and over, as well as for various demographic groups. The
Annual Social and Economic Supplement (ASEC), formerly called the Annual Demographic
Supplement (ADS), supplements the basic CPS labor force data with information on
income, including noncash income sources such as food stamps, the school lunch program,
employer-provided group health insurance plans, employer-provided pension plans, personal
health insurance, Medicaid, Medicare, CHAMPUS or military health care, and energy
assistance. Data from the ASEC also include information on the prior year’s work experience,
including occupation and industry, of persons for whom information is collected.
The survey universe is composed of persons 15 years
of age and over in the civilian noninstitutionalized population. Published labor
force data from the CPS are for those aged 16 years and over. While active-duty
members of the Armed Forces are not asked questions regarding their labor force
status, they are asked questions about their income.
The basic CPS has been conducted since 1945, although
some data were collected prior to that time. Collection of income data began in
1948. Over the years, the number of income questions in the ASEC has expanded. In
1994 major changes to the basic CPS labor force questions were introduced, which
included a complete redesign of the questionnaire including new health insurance
questions and the introduction of computer-assisted interviewing for the entire
survey. In addition, there were revisions to some of the labor force concepts and
definitions. Prior to the redesign, CPS data were primarily collected using a paper-and-pencil form.
Households in the sample are interviewed for 4 consecutive
months, not interviewed for the next 8 consecutive months, and then interviewed again for
4 consecutive months, after which they are dropped from the sample. Over the whole 16-month period,
a household is interviewed eight times. The CPS includes both in-person and telephone interviews.
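The 4-8-4 rotation pattern described above can be illustrated with a short Python sketch. It lists the month offsets, counted from a household's first interview, at which the household is in the sample. The constant and function names are illustrative only and are not taken from any BLS or Census Bureau software.

```python
from typing import List, Optional

# Illustrative constants mirroring the 4-8-4 rotation pattern.
MONTHS_IN_FIRST = 4    # interviewed for 4 consecutive months
MONTHS_OUT = 8         # then not interviewed for 8 consecutive months
MONTHS_IN_SECOND = 4   # then interviewed for 4 more consecutive months


def interview_offsets() -> List[int]:
    """Month offsets (0 = first interview) at which a household is interviewed."""
    first = list(range(MONTHS_IN_FIRST))
    second_start = MONTHS_IN_FIRST + MONTHS_OUT
    second = list(range(second_start, second_start + MONTHS_IN_SECOND))
    return first + second


def interview_number(offset: int) -> Optional[int]:
    """Interview number (1-8) at a given offset, or None if not interviewed then."""
    offsets = interview_offsets()
    return offsets.index(offset) + 1 if offset in offsets else None


schedule = interview_offsets()
print(schedule)          # [0, 1, 2, 3, 12, 13, 14, 15]
print(len(schedule))     # 8
print(schedule[-1] + 1)  # 16 months from first to last interview
```

The eight interviews span 16 calendar months, which matches the description above.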
Unit of Analysis
Households, families, and persons.
The CPS sample is located in 754 sample areas, with coverage
in every State and the District of Columbia. The basic CPS sample is selected from multiple
frames using multiple stages of selection. Each unit is selected with a known probability to represent similar units
in the universe. The sample design is a State-based design, with the sample in each
State being independent of the others.
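Because each unit is selected with a known probability, it can be given a base weight equal to the inverse of that probability. That weight is the number of similar units it represents. The Python sketch below shows this idea as a Horvitz-Thompson style estimate of a population total. The sample and probabilities are made up. Actual CPS weights include further adjustments, such as for nonresponse, and those are not modeled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SampledPerson:
    selection_prob: float  # known probability that this person's unit was selected
    employed: bool         # hypothetical survey response


def base_weight(person: SampledPerson) -> float:
    """Inverse of the selection probability: how many similar units this one represents."""
    if not 0 < person.selection_prob <= 1:
        raise ValueError("selection probability must be in (0, 1]")
    return 1.0 / person.selection_prob


def estimated_employed(sample: List[SampledPerson]) -> float:
    """Weighted (Horvitz-Thompson style) estimate of the employed total."""
    return sum(base_weight(p) for p in sample if p.employed)


# Made-up sample; real CPS weighting applies further adjustments not shown here.
sample = [
    SampledPerson(selection_prob=1 / 2000, employed=True),
    SampledPerson(selection_prob=1 / 2000, employed=False),
    SampledPerson(selection_prob=1 / 1500, employed=True),
]
print(round(estimated_employed(sample)))  # 3500
```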
CPS data can be obtained from either the U.S. Census
Bureau (www.census.gov) or the Bureau of Labor Statistics (www.bls.gov/cps).
U.S. Census Bureau. Current Population Survey: Design and Methodology. Technical Paper 63RV (TP63RV), March 2002.
Available at www.census.gov/prod/2002pubs/tp63rv.pdf.
Decennial Census and Population Estimates
Sponsor: U.S. Census Bureau
The U.S. decennial census serves two main purposes:
- to apportion the 435 seats in the U.S. House of Representatives among the 50 States. Under Article
I, Section 2, of the U.S. Constitution, the apportionment of representatives among the States for the House of
Representatives must be carried out every 10 years (decennially); and
- to enumerate
the resident population. For Census 2000, data on sex, race, Hispanic origin, age,
and tenure were collected from 100 percent of the enumerated population. More detailed
information, such as income, education, housing, occupation, and industry, was collected
from a representative sample of the population.
U.S. resident population
Census 2000 was the most recent count of the U.S.
population conducted by the U.S. Census Bureau. The Census Bureau’s primary
method of data collection is to mail out questionnaires to addresses on the Master Address
File, which includes information from the U.S. Postal Service and the Local Update
of Census Addresses (LUCA) program, and to use enumerators. Enumerators are U.S.
Census Bureau staff who travel door-to-door, canvassing roads
and streets for living quarters and gathering data. For Census 2000, as in several previous
censuses, two forms were used: a short form and a long form. The short form was sent
to most households. The long form, containing the seven 100-percent questions
plus the sample questions, was sent to a limited number of households, about
one in every six. The long form collected information on social,
housing, economic, and financial characteristics. The national final response rate
for Census 2000 was 67 percent. This exceeded the projected response rate of 61
percent and was better than the 65 percent response rate from the 1990 census.
One of two different survey forms was used to enumerate the U.S. population:
- a short form with seven basic questions; or
- a long form including
all questions from the short form plus additional questions. On average,
one in every six households received the long form.
Unit of Analysis
Person-level data analysis.
There were several important survey question changes and/or
additions for Census 2000. One such change deals with the question of race. The
question on race on the 2000 census was based on OMB’s 1997 “Revisions of the Standards
for the Classification of Federal Data on Race and Ethnicity.” The 1997 standards
incorporated two major changes in the collection, tabulation, and presentation of
race data. First, the 1997 standards increased from four to five the minimum set
of categories to be used by Federal agencies for identification of race: American
Indian or Alaska Native, Asian, Black or African American, Native Hawaiian or Other
Pacific Islander, and White. Second, the 1997 standards included the requirement
that Federal data collection programs allow respondents to select one or more race
categories when responding to a query on their racial identity. One question added to Census 2000 asked
about grandparents as caregivers, while several questions from the 1990 census, including
those on children ever born, source of water, sewage disposal, and condominium
status, were dropped for Census 2000. Another important change for Census 2000 was
the question on disability. In 1990, the question was “Does this person have a physical,
mental or other health condition which has lasted for more than 6 months and that
limits the amount of work this person can do at a job or prevents this person from
working at a job?” In 2000, the question was revised to ask about blindness,
deafness, and the ability to perform physical and mental tasks. Also, in 1990 the
disability questions were asked of persons 15 years and over, while in 2000 the
data were collected for persons 5 years and over.
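Allowing respondents to select one or more races multiplies the number of possible responses. The Python sketch below counts every non-empty combination a respondent could report. It assumes six categories: the five OMB minimum categories listed above plus "Some other race", a category that also appears among the Census 2000 Summary File 3 groups.

```python
from itertools import combinations
from typing import List, Tuple

# Five OMB minimum categories plus "Some other race" (assumed response set).
CATEGORIES = [
    "White",
    "Black or African American",
    "American Indian and Alaska Native",
    "Asian",
    "Native Hawaiian and Other Pacific Islander",
    "Some other race",
]


def race_combinations(categories: List[str]) -> List[Tuple[str, ...]]:
    """Every non-empty set of categories a respondent could mark."""
    return [
        combo
        for k in range(1, len(categories) + 1)
        for combo in combinations(categories, k)
    ]


combos = race_combinations(CATEGORIES)
alone = [c for c in combos if len(c) == 1]
two_or_more = [c for c in combos if len(c) > 1]
print(len(combos), len(alone), len(two_or_more))  # 63 6 57
```

Six categories give 2^6 - 1 = 63 possible responses. Six of these are single races and 57 are combinations of two or more races.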
In addition to conducting the census every 10 years,
the Census Bureau updates its population counts between census years.
Postcensal Population Estimates
These are estimates made for the
years following a census, before the next census has been taken. National postcensal
population estimates are derived by updating the resident population enumerated
in the decennial census using a components-of-population-change approach. The following
formula is used to update the decennial census counts:
- decennial census enumerated resident population,
- plus births to U.S. resident women,
- minus deaths to U.S. residents,
- plus net international migration,
- plus net movement of U.S. Armed Forces and U.S. civilian citizens.
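The component formula above can be sketched in Python as a running update applied one period at a time. The class and function names are hypothetical, and the numbers in the example are made up purely to show the arithmetic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Components:
    """Components of change for one period (hypothetical numbers in the example)."""
    births: int                               # births to U.S. resident women
    deaths: int                               # deaths to U.S. residents
    net_international_migration: int
    net_movement_armed_forces_civilians: int  # net movement of U.S. Armed Forces and civilian citizens


def postcensal_estimate(census_count: int, periods: List[Components]) -> int:
    """Update the census-enumerated resident population period by period."""
    population = census_count
    for c in periods:
        population += (
            c.births
            - c.deaths
            + c.net_international_migration
            + c.net_movement_armed_forces_civilians
        )
    return population


periods = [
    Components(births=50, deaths=30, net_international_migration=10,
               net_movement_armed_forces_civilians=-2),
    Components(births=40, deaths=35, net_international_migration=8,
               net_movement_armed_forces_civilians=1),
]
print(postcensal_estimate(1_000, periods))  # 1042
```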
Intercensal Population Estimates
The further a postcensal estimate is from the census year
on which it is based, the less accurate it is.
With the completion of the decennial census at the end of the decade,
intercensal estimates for the preceding decade are prepared to replace the less
accurate postcensal estimates. Intercensal population estimates take into account
the census of population at the beginning and end of the decade. Thus intercensal
estimates are more accurate than postcensal estimates because they correct for the
“error of closure” or difference between the estimated population at the end of
the decade and the census count for that date.
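The error of closure can be illustrated with a minimal Python sketch. It assumes the error is spread linearly across the decade, which is a simple illustrative choice. The Census Bureau's actual intercensal method may distribute it differently. The starting value stays at the census count, and the final value is pulled onto the new census count.

```python
from typing import List


def error_of_closure(postcensal_at_census_date: float, census_count: float) -> float:
    """Census count minus the postcensal estimate for the same date."""
    return census_count - postcensal_at_census_date


def intercensal_series(postcensal: List[float], next_census_count: float) -> List[float]:
    """Spread the error of closure linearly across the interval (an assumption).

    postcensal[0] is the starting census count; postcensal[-1] is the
    postcensal estimate for the date of the next census.
    """
    steps = len(postcensal) - 1
    if steps < 1:
        raise ValueError("need values for at least the start and end dates")
    error = error_of_closure(postcensal[-1], next_census_count)
    return [value + error * t / steps for t, value in enumerate(postcensal)]


print(intercensal_series([100.0, 102.0, 104.0, 106.0], 109.0))
# [100.0, 103.0, 106.0, 109.0]
```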
Data from Census 2000 and previous censuses
can be obtained primarily through various tools on the Census 2000 website.
Census 2000 is the first census for which the Internet is the
primary means of disseminating the data. In addition to formatted tables, the Census
Bureau website has maps and data sets available for downloading (by file transfer protocol, or FTP), printing, viewing, and manipulating. Special reports and briefs on Census
data provide background information, explain how the data were analyzed, and describe differences
between 1990 and 2000 data.
Public-Use Microdata Area (PUMA)
A geographic entity for which the U.S. Census
Bureau provides specially selected extracts of raw data from a small sample of long-form
census records that are screened to protect the confidentiality of census records. The
extract files are referred to as public use microdata samples (PUMS). Public use
microdata areas (PUMAs), which must have a minimum census population of 100,000
and cannot cross a State line, receive a 5-percent sample of the long form records;
these records are presented in State files. These PUMAs are aggregated into super-PUMAs,
which must have a minimum census population of 400,000 and receive a 1-percent sample
in a national file. PUMAs and super-PUMAs are mutually exclusive; that is, they
use different records to create each sample. Data users can use these files to create
their own statistical tabulations and data summaries.
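The size and boundary rules for PUMAs and super-PUMAs can be expressed as simple checks. The Python sketch below tests only the rules stated above: a PUMA needs at least 100,000 persons and may not cross a State line, and a super-PUMA needs at least 400,000 persons. The text does not say whether a super-PUMA may cross State lines, so that is not checked, and any other delineation criteria are not modeled. Area names and State codes are hypothetical.

```python
from dataclasses import dataclass
from typing import List

PUMA_MIN_POPULATION = 100_000
SUPER_PUMA_MIN_POPULATION = 400_000


@dataclass
class Area:
    """A building-block area (e.g., a county); names and codes are hypothetical."""
    name: str
    state: str
    population: int


def is_valid_puma(areas: List[Area]) -> bool:
    """At least 100,000 persons and no crossing of a State line."""
    states = {a.state for a in areas}
    total = sum(a.population for a in areas)
    return len(states) == 1 and total >= PUMA_MIN_POPULATION


def is_valid_super_puma(pumas: List[List[Area]]) -> bool:
    """Built from valid PUMAs with a combined population of at least 400,000."""
    total = sum(a.population for puma in pumas for a in puma)
    return (
        bool(pumas)
        and all(is_valid_puma(p) for p in pumas)
        and total >= SUPER_PUMA_MIN_POPULATION
    )


puma_1 = [Area("County A", "S1", 60_000), Area("County B", "S1", 45_000)]
puma_2 = [Area("County C", "S1", 120_000)]
cross_state = [Area("County D", "S1", 70_000), Area("County E", "S2", 70_000)]

print(is_valid_puma(puma_1))                  # True
print(is_valid_puma(cross_state))             # False
print(is_valid_super_puma([puma_1, puma_2]))  # False (only 225,000 persons)
```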
Specific microdata samples available on CD-ROM/DVD can be obtained through the census
catalog available on the U.S. Census Bureau’s home page (www.census.gov).
Summary File 1 (SF 1) contains 286 detailed tables focusing on age, sex, households,
families, and housing units. These tables provide in-depth figures by race and Hispanic
origin; some tables are repeated for each of nine major race and Hispanic or Latino groups. Counts
also are provided for over 40 American Indian and Alaska Native tribes and for groups
within race categories. The race categories include 18 Asian groups and 12 Native
Hawaiian and Other Pacific Islander groups. Counts of persons of Hispanic origin
by country of origin (28 groups) are also shown.
Summary File 1 presents data for the United States, the 50 States, and the District of Columbia
in a hierarchical sequence down to the block level for many tabulations, but only
to the census tract level for others. Summaries are included for other geographic
areas such as ZIP Code Tabulation Areas (ZCTAs) and Congressional districts.
Geographic coverage for Puerto Rico is comparable to the 50 States. Data are presented in a
hierarchical sequence down to the block level for many tabulations, but only to the census tract level for others.
Geographic areas include barrios, barrios-pueblo, subbarrios, municipios, places,
census tracts, block groups, and blocks. Summaries also are included for other geographic
areas such as ZCTAs.
Summary File 2 (SF 2) contains 47 detailed tables focusing on age, sex, households,
families, and occupied housing units for the total population. These tables are
repeated for 249 detailed population groups based on the following criteria:
- No tables are available for geographic areas having a population of less than 100.
- Tables are repeated only for the race groups, American Indian and Alaska Native tribes, and Hispanic or Latino groups having a population of 100 or more within the geographic area.
For a complete list of the 249 population groups, see Appendix H of the SF 2 Technical Documentation.
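The two SF 2 thresholds listed above can be applied as a simple filter. The Python sketch below assumes the criteria work exactly as stated: no tables for areas under 100 persons, and tables repeated only for groups with 100 or more persons in the area. The function name and the group counts are hypothetical.

```python
from typing import Dict, List

MIN_AREA_POPULATION = 100   # no tables for areas with fewer than 100 persons
MIN_GROUP_POPULATION = 100  # groups need 100 or more persons within the area


def sf2_groups_tabulated(area_population: int, group_populations: Dict[str, int]) -> List[str]:
    """Return the detailed groups for which SF 2 tables would be repeated in one area."""
    if area_population < MIN_AREA_POPULATION:
        return []
    return [
        group
        for group, pop in group_populations.items()
        if pop >= MIN_GROUP_POPULATION
    ]


groups = {"Chinese": 250, "Navajo": 40, "Mexican": 100}  # made-up counts
print(sf2_groups_tabulated(5_000, groups))  # ['Chinese', 'Mexican']
```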
Summary File 3 consists of 813 detailed tables of Census 2000 social, economic,
and housing characteristics compiled from a sample of approximately 19 million housing
units (about 1 in 6 households) that received the Census 2000 long-form questionnaire.
Fifty-one tables are repeated for nine major race and Hispanic or Latino groups:
White alone; Black or African American alone; American Indian and Alaska Native
alone; Asian alone; Native Hawaiian and Other Pacific Islander alone; Some other
race alone; Two or more races; Hispanic or Latino; and White alone, not Hispanic or Latino.
Summary File 3 presents data for the United States, the 50 States, the District of Columbia, and
Puerto Rico in a hierarchical sequence down to the block group level for many tabulations, but only
to the census tract level for others. Summaries are included for other geographic
areas such as ZIP Code Tabulation Areas (ZCTAs) and Congressional districts (106th Congress).
Summary File 4 (SF 4) contains the sample data, that is, the information compiled
from the questions asked of a sample of all people and housing units.
The sample data are presented in 213 population tables (matrices) and 110 housing
tables, identified with “PCT” and “HCT,” respectively. Each table is iterated for
336 population groups: the total population, 132 race groups, 78 American Indian
and Alaska Native tribe categories (reflecting 39 individual tribes), 39 Hispanic
or Latino groups, and 86 ancestry groups.
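The SF 4 counts above can be checked with a few lines of Python. The product of tables and groups is only an upper bound on the number of table iterations per geographic area. This description does not say whether every group is shown for every area.

```python
# Counts taken from the SF 4 description above.
sf4_population_groups = {
    "total population": 1,
    "race groups": 132,
    "American Indian and Alaska Native tribe categories": 78,
    "Hispanic or Latino groups": 39,
    "ancestry groups": 86,
}
sf4_tables = {"PCT (population)": 213, "HCT (housing)": 110}

total_groups = sum(sf4_population_groups.values())
total_tables = sum(sf4_tables.values())

print(total_groups)                 # 336
print(total_tables)                 # 323
print(total_tables * total_groups)  # 108528 (upper bound on iterations per area)
```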
SF 4 is released as individual files for each of the 50 States, the District of Columbia, and Puerto Rico,
and as a national file for the United States. The tables (matrices) are identical for all files, but the geographic coverage
differs. Data are provided down to the census tract level.
The next decennial census will be conducted in 2010.
Reengineering of the 2010 census includes replacing the long form with the American
Community Survey (ACS). The ACS is a new nationwide survey designed to provide communities
with a fresh look at how they are changing. It is intended to eliminate the need for
the long form in the 2010 Census. The ACS collects information from U.S.
households and group quarters similar to what was collected on the Census 2000 long
form, such as income, commute time to work, home value, veteran status, and other
important data. As with the official U.S. census, information about individuals will remain confidential.
Health and Retirement Study
The Health and Retirement Study (HRS) is a major national panel study of the lives
of older Americans. The HRS includes the original HRS and the Asset and Health
Dynamics Among the Oldest-Old (AHEAD) study. These studies were merged in 1998 and
now represent the United States population over age 50. The study is funded by the
National Institute on Aging to provide researchers, policy analysts, and program
planners with current data on the antecedents and consequences of retirement.
Questionnaire topics include physical and cognitive functioning, retirement plans,
family structure and transfers, demographic characteristics, housing, employment
status, income, disability, health insurance, pension plans, job history, and
attitudes, preferences, and expectations for the future. The survey data are
linked with administrative records from the Employer Pension Study (1993 and 1999),
National Death Index, Social Security Administration earnings and projected benefits
data and W-2 self-employment data, and Medicare files.
During each 2-year cycle of interviews, the HRS team surveys more than 20,000 people
who represent the Nation’s diversity of economic conditions, racial and ethnic backgrounds,
health, marital histories and family compositions, occupations and employment histories,
living arrangements, and other aspects of life. Since 1992, more than 27,000 people
have given 200,000 hours of interviews. The HRS is managed jointly through a cooperative
agreement between the National Institute on Aging (NIA) and the Institute for Social
Research (ISR) at the University of Michigan.
The study is designed, administered, and conducted by the ISR, and decisions about
the study content are made by the investigators. The principal investigators at
the University of Michigan are joined by a cadre of co-investigators and working
group members who are leading academic researchers from across the United States
in a variety of disciplines, including economics, medicine, demography, psychology,
public health, and survey methodology. In addition, the NIA is advised by a Data Monitoring Committee
charged with maintaining HRS quality, keeping the survey relevant and attuned to the technical needs of researchers
who use the data, and ensuring that it addresses the information needs of policymakers
and the public.
Since the study began, 7,000 people have registered to use the data, and nearly
1,000 researchers have employed the data to publish more than 1,000 reports, including
more than 600 peer-reviewed journal articles and book chapters, and 70 doctoral dissertations.
The origins of the HRS date back to the mid-1980s, when the NIA and its advisors
from demography, economics and sociology recognized that the Baby Boom and the subsequent
fertility decline coupled with growing life expectancy would confront America with
population aging which, in turn, would create major challenges for public sector
Social Security retirement and disability and Medicare programs and for private
sector employer pensions and health insurance when the Boomers began to retire around
HRS began in 1992 as a longitudinal study of a pre-retirement cohort of individuals
born in 1931–41, and their spouses, who were 51–61 years old at baseline and who receive
longitudinal follow-up interviews at 2-year intervals. It was joined in 1993 by
a companion study, AHEAD (Asset and Health Dynamics of the Oldest Old), consisting
of a cohort of persons born before 1924 who were aged 70 and over, and their spouses.
In 1998, the design was revised to convert the HRS from a study of specific cohorts
into a steady-state design that would represent the U.S. population over age 50
by adding a new 6-year cohort of persons entering their 50s every 6 years. Implementing
this design in 1998 required the addition of the CODA (Children of the Depression)
cohort born in 1924–30 who were entering their seventies and the War Baby Cohort
born in 1942–47 who were entering their fifties. The longitudinal design has been
continued with interviews of all existing cohorts in 2000, 2002, and 2004, and the steady-state aspect
has been carried forward with the addition in 2004 of the Early Baby Boomers (EBB),
who were born in 1948–53 and were aged 51–56 at baseline. Between 1992 and 2002, the HRS
conducted at least one interview with about 27,000 individuals, for a total of 114,000 interviews.
During that period, 5,000 participants died and 4,000 retirements took place.
In 2004, HRS conducted 18,479 longitudinal follow-up interviews and 3,330 baseline
interviews with the Early Boomers.
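The cohort structure described above maps birth years to entry cohorts. The Python sketch below encodes those birth-year ranges. It classifies only age-eligible respondents by birth year and ignores spouses, who are interviewed regardless of their own birth year. The cohort labels and function name are illustrative.

```python
from typing import List, Optional, Tuple

# (cohort, first birth year or None if open-ended, last birth year), from the text above.
HRS_COHORTS: List[Tuple[str, Optional[int], int]] = [
    ("AHEAD", None, 1923),               # born before 1924
    ("CODA", 1924, 1930),                # Children of the Depression
    ("HRS", 1931, 1941),                 # original 1992 cohort
    ("War Babies", 1942, 1947),
    ("Early Baby Boomers", 1948, 1953),  # added in 2004
]


def hrs_cohort(birth_year: int) -> Optional[str]:
    """Entry cohort for an age-eligible respondent; None if not yet covered."""
    for name, first, last in HRS_COHORTS:
        if (first is None or birth_year >= first) and birth_year <= last:
            return name
    return None


for year in (1920, 1935, 1945, 1950, 1958):
    print(year, hrs_cohort(year))
# 1920 AHEAD, 1935 HRS, 1945 War Babies, 1950 Early Baby Boomers, 1958 None
```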
All publicly available data may be downloaded after
registration from http://hrsonline.isr.umich.edu. Early Release data files are typically available within
3 months of the end of each data collection, with the Final Release following
24 months after the close of data collection activities. Files linked with administrative
data are released only as restricted data through an application process, as outlined
on the HRS website.
The HRS bibliography of nearly a thousand publications
is online at http://hrsonline.isr.umich.edu/papers/sho_papers.php?hfyle=bib_all. To search for a specific
publication or topic in the bibliography, click on the link for the Dynamic Bibliography
or go to http://hrsonline.isr.umich.edu/biblio/index.
Contact: David R. Weir, Director
National Health Interview Survey
The National Health Interview Survey (NHIS), conducted by the
National Center for Health Statistics, is a continuing nationwide sample survey in which data are
collected during personal household interviews. The NHIS is the principal source
of information on the health of the civilian, noninstitutionalized, household population
of the United States. Interviewers collect data on illnesses, injuries, impairments, and chronic conditions;
activity limitation caused by chronic conditions; utilization of health services;
and other health topics. Information is also obtained on personal, social, economic,
and demographic characteristics, including race and ethnicity and health insurance
status. The survey is reviewed each year. Core questionnaire items are revised every
10–15 years (with major revisions occurring in 1982 and 1997), and special topics
are added or deleted annually.
In 2006 a new sample design was implemented. This design, which is expected to be
in use through 2014, includes all 50 States and the District of Columbia,
as the previous design did. Oversampling of the black and Hispanic populations was
retained in 2006 to allow for more precise estimation of health characteristics
in these growing minority populations. The new sample design also oversamples the
Asian population. In addition, the sample adult selection process has been revised
so that when black, Hispanic, or Asian persons aged 65 years or older are present,
they have an increased chance of being selected as the sample adult. The new design
reduces the size of the NHIS by approximately 13 percent relative to the previous
sample design. The interviewed sample for 2006 consisted of 29,204 households, which
yielded 75,716 persons in 29,868 families. More information on the survey methodology
and content of the NHIS can be found at www.cdc.gov/nchs/nhis.htm.
Additional background and health data for adults are available in Summary Health
Statistics for the U.S. Population: National Health Interview Survey available online
Contact: NHIS staff
Population Projections
Sponsor: U.S. Census Bureau
Information about the possible future race/origin/age/sex composition of the United States.
The population projections for the United States
are interim projections that take into account the results of Census 2000. These
interim projections were created using the cohort-component method, which uses assumptions
about the components of population change. They are based on Census 2000 results,
official post-census estimates, as well as vital registration data from the
National Center for Health Statistics. The assumptions are based on those used in the projections
released in 2000 that used a 1998 population estimate base. Some modifications were
made to the assumptions so that projected values were consistent with estimates
from 2001 as well as Census 2000.
Fertility is assumed to increase slightly from current estimates. The projected
total fertility rate in 2025 is 2.180, and it is projected to increase to 2.186
by 2050. Mortality is assumed to continue to improve over time. By 2050, life expectancy
at birth is assumed to increase to 81.2 for men and 86.7 for women. Net immigration
is assumed to be 996,000 in 2025 and 1,097,000 in 2050.
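The cohort-component method mentioned above ages each birth cohort forward, applies survival rates to remove deaths, adds births, and adds net migration. The Python sketch below shows one projection step under heavy simplifying assumptions: a single sex, three single-year age groups with an open-ended oldest group, and made-up rates. The Census Bureau's actual projections also distinguish sex, race, and Hispanic origin and use the assumptions stated above. None of that is modeled here, and the function name is hypothetical.

```python
from typing import List


def project_one_year(
    population: List[float],
    survival: List[float],
    fertility: List[float],
    net_migration: List[float],
) -> List[float]:
    """One step of a toy cohort-component projection.

    population[a]   : persons at single year of age a (last entry is open-ended)
    survival[a]     : probability that a person aged a survives one more year
    fertility[a]    : births per person aged a during the year
    net_migration[a]: net migrants added at age a at the end of the year
    """
    n = len(population)
    if n < 2 or not (len(survival) == len(fertility) == len(net_migration) == n):
        raise ValueError("need at least two ages and one value per age in every input")

    births = sum(p * f for p, f in zip(population, fertility))
    projected = [0.0] * n
    projected[0] = births  # newborns enter at age 0 (infant mortality ignored here)
    for age in range(n - 1):
        projected[age + 1] += population[age] * survival[age]  # survivors age one year
    projected[n - 1] += population[n - 1] * survival[n - 1]  # open-ended group stays put

    return [p + m for p, m in zip(projected, net_migration)]


result = project_one_year(
    population=[100.0, 100.0, 100.0],
    survival=[0.99, 0.98, 0.50],
    fertility=[0.0, 0.05, 0.0],
    net_migration=[1.0, 2.0, 0.0],
)
print([round(x, 6) for x in result])  # [6.0, 101.0, 148.0]
```

Repeating this step year by year, with rates that change over time, is how a series is carried forward under the fertility, mortality, and migration assumptions.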
Race and Hispanic Origin
Interim projections based on Census 2000 were also done by race and Hispanic origin.
The basic assumptions by race used in the previous projections were adapted to reflect
the Census 2000 race definitions and results. Projections were developed for the following groups:
- non-Hispanic white alone,
- Hispanic white alone,
- black alone,
- Asian alone, and
- all other groups.
The fifth category, all other groups, includes American Indian and Alaska Native, Native Hawaiian and Other
Pacific Islander, and all people reporting more than one of the major race categories defined by the Office
of Management and Budget.
For a more detailed discussion of the cohort-component method and the assumptions
about the components of population change, see U.S. Census Bureau, Population Division
Working Paper No. 38, “Methodology and Assumptions for the Population Projections
of the United States: 1999 to 2100,” by Hollmann, Mulder, and Kallan.
While this paper does not incorporate the updated assumptions made for the interim projections,
it provides a more extensive treatment of the earlier projections, released in 2000, on which the interim series are based.
Contact: Population Projections Branch
Survey of Income and Program Participation
Sponsor: U.S. Census Bureau, Social Security Administration
To collect source and amount of income, labor force information,
program participation and eligibility data, and general demographic characteristics
to measure the effectiveness of existing Federal, State, and local programs; to
estimate future costs and coverage for government programs, such as food stamps;
and to provide improved statistics on the distribution of income in the country.
SIPP also offers detailed information on cash and noncash income on a sub-annual
basis in addition to collecting data on taxes, assets, liabilities, and participation
in government transfer programs.
U.S. civilian noninstitutionalized population
This is a longitudinal survey—a continuous series of national panels.
Most interviews conducted through 1991 were in the form of personal visits. In 1992, SIPP
switched to maximum telephone interviewing to reduce costs. Wave 1, 2, and 6 interviews were
still conducted in person, but other interviews were conducted by telephone to the extent possible.
SIPP telephone interviews and personal visits are carried out by the same interviewer interacting with the
same respondents. Interviewers typically make phone calls from their homes. For security and
confidentiality reasons, they are not allowed to use cellular or cordless telephones in the interviews.
If a standard telephone is not available, the interviews must be conducted face-to-face.
Repeated failure to reach a respondent by telephone may also require an in-person visit to the listed address.
Unit of Analysis
All household members 15 years old and over are interviewed by self-response, if possible;
proxy response is permitted when household members are not available for interviewing.
The SIPP sample is a multistage stratified sample of the U.S. civilian noninstitutionalized
population with sample sizes ranging from approximately 14,000 to 45,000 interviewed
households per panel. The duration of each panel ranges from 2½ years to 4 years.
The Census Bureau also oversampled the low-income population for the 1996, 2001,
and 2004 Panels using decennial census information. Housing units within each PSU
were split into high- and low-poverty strata. If the housing unit received the Census
long form that included income questions, the unit’s poverty status was determined
directly; for other housing units, poverty status was assumed on the basis of responses
to Census short-form items predictive of poverty rates.
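The stratification rule described above can be sketched in Python. Long-form poverty status is used when it is available. Otherwise a predictor built from short-form items decides the stratum. The score and the 0.5 cutoff are hypothetical stand-ins, because the Census Bureau's actual predictive model is not described here. The class and function names are also illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class HousingUnit:
    psu: str                              # primary sampling unit identifier
    long_form_in_poverty: Optional[bool]  # None if the unit did not get the long form
    predicted_poverty_score: float        # HYPOTHETICAL score built from short-form items


def poverty_stratum(unit: HousingUnit, threshold: float = 0.5) -> str:
    """Use long-form poverty status when available; otherwise use the predictor."""
    if unit.long_form_in_poverty is not None:
        in_poverty = unit.long_form_in_poverty
    else:
        in_poverty = unit.predicted_poverty_score >= threshold  # hypothetical cutoff
    return "high-poverty" if in_poverty else "low-poverty"


def split_psus(units: List[HousingUnit]) -> Dict[str, Dict[str, List[HousingUnit]]]:
    """Within each PSU, group housing units into high- and low-poverty strata."""
    strata: Dict[str, Dict[str, List[HousingUnit]]] = defaultdict(
        lambda: {"high-poverty": [], "low-poverty": []}
    )
    for unit in units:
        strata[unit.psu][poverty_stratum(unit)].append(unit)
    return dict(strata)


units = [
    HousingUnit("PSU-1", True, 0.1),  # long form says in poverty
    HousingUnit("PSU-1", None, 0.7),  # short form only; score above cutoff
    HousingUnit("PSU-1", None, 0.2),  # short form only; score below cutoff
]
strata = split_psus(units)
print({name: len(group) for name, group in strata["PSU-1"].items()})
# {'high-poverty': 2, 'low-poverty': 1}
```

Oversampling would then apply a higher sampling rate in the high-poverty stratum. The rates actually used are not given here.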
Income, labor force participation, program participation, and eligibility.
For the 1984–93 panels, a panel of households was introduced each year in February.
A 4-year panel was introduced in April 1996. A 2000 panel was introduced in February 2000
for 2 waves. A 3-year 2001 panel was introduced in February 2001, and a 2½ year 2004 panel
was introduced in February 2004.
The Census SIPPs are linked to IRS wage and self-employment and tip tax records (1040SEs).
These exist for each year from 1982 to the most recent year, lagged one year. The Social Security
earnings records are derived from IRS forms and are owned by the IRS. They contain employer-reported wages and salaries
and self-employment income subject to Social Security taxation, up to the maximum subject
to tax. They exist for each year from 1951 to the most recent year, lagged one year. The most recent year available is 2004.
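Because the Social Security earnings described above are recorded only up to the maximum subject to tax, earnings above that cap are censored. The Python sketch below shows what this implies for analysts. The cap value is illustrative only, since the actual maximum changes by year, and the function names are hypothetical.

```python
def recorded_earnings(actual_earnings: float, taxable_maximum: float) -> float:
    """Earnings as they would appear in a record capped at the taxable maximum."""
    return min(actual_earnings, taxable_maximum)


def at_taxable_maximum(recorded: float, taxable_maximum: float) -> bool:
    """A record at the cap only tells us that actual earnings were at least the cap."""
    return recorded >= taxable_maximum


CAP = 90_000  # ILLUSTRATIVE value only; the actual maximum changes by year
for actual in (45_000, 90_000, 150_000):
    rec = recorded_earnings(actual, CAP)
    print(actual, rec, at_taxable_maximum(rec, CAP))
# 45000 45000 False / 90000 90000 True / 150000 90000 True
```

Records flagged as being at the cap should be treated as lower bounds on actual earnings, not as exact values.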
SSA benefit records contain information on Medicare Part A, Part B, and the Part D low-income
subsidy, and on Medicaid subsidies of Medicare Part B (QMB, SLMB, QI).
Data collected in SIPP and supporting documentation are available in various forms.
They include published estimates based on those data, microdata in several formats,
documentation for each of the microdata files, and more general documentation about
methodological issues in SIPP. The latter includes the SIPP Quality Profile, a series
of working papers distributed by the Census Bureau, articles published in academic journals,
and conference proceedings.
SIPP microdata files can be obtained from several sources. All public use microdata
files can be obtained on CD-ROM directly from the Census Bureau. SIPP microdata
are available online from the SIPP website at www.sipp.census.gov/sipp/. The
Internet site offers two data access tools: DataFerrett and the SIPP FTP site. DataFerrett
is a system that enables users to access and manipulate large demographic and economic
data sets on-line. The SIPP FTP site has data files and documentation for downloading.
Cross-sectional data are presented for various socioeconomic characteristics for
a 4-month period. Longitudinal data are presented for a 2½-year or 3-year period.
Variables for both data sets include age, race, sex, Hispanic origin, marital status,
household/family relationship, educational attainment, work experience, and income.
Basic cross-sectional questions are supplemented with topically relevant questions
such as employment history, work disability, education, health care, financial assets,
retirement accounts, etc.
SIPP publications can be found at www.sipp.census.gov/sipp/pubs.html.