Appendix B
Data Source Descriptions

Current Population Survey
Sponsor: Bureau of Labor Statistics
Purpose: The Current Population Survey (CPS) provides monthly estimates of total employment, unemployment, and other characteristics of the civilian noninstitutionalized population 16 years old and over as well as for various demographic groups. The Annual Social and Economic Supplement (ASEC), formerly called the Annual Demographic Supplement (ADS), supplements the basic CPS labor force data with information on income, including noncash income sources such as food stamps, school lunch program, employer-provided group health insurance plan, employer-provided pension plan, personal health insurance, Medicaid, Medicare, CHAMPUS or military health care, and energy assistance. Data from the ASEC also include information on the prior year’s work experience of persons for whom information is collected, including occupation and industry.
Survey Universe: The survey universe is composed of persons 15 years of age and over in the civilian noninstitutionalized population. Published labor force data from the CPS are for those aged 16 years and over. While active-duty members of the Armed Forces are not asked questions regarding their labor force status, they are asked questions about their income.
Research Design: The basic CPS has been conducted since 1945, although some data were collected prior to that time. Collection of income data began in 1948. Over the years, the number of income questions in the ASEC has expanded. In 1994 major changes to the basic CPS labor force questions were introduced, which included a complete redesign of the questionnaire including new health insurance questions and the introduction of computer-assisted interviewing for the entire survey. In addition, there were revisions to some of the labor force concepts and definitions. Prior to the redesign, CPS data were primarily collected using a paper-and-pencil form.
Survey Mode: Households in the sample are interviewed for 4 consecutive months, not interviewed for 8 consecutive months, and then interviewed again for 4 consecutive months (then dropped out of the sample). Over the whole 16-month period a household is interviewed eight times. The CPS includes both in-person and telephone interviews.
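The rotation pattern described above (4 months in, 8 months out, 4 months in) can be sketched in Python. This is only an illustration: the function name and the convention of numbering months as consecutive integers are assumptions made for the example, not part of the CPS documentation.

```python
def cps_interview_months(first_month):
    """Return the month numbers in which one household is interviewed.

    Months are consecutive integers (hypothetical convention: 0 is the
    household's first month in the sample).
    """
    # First block: 4 consecutive months in the sample.
    first_block = [first_month + i for i in range(4)]
    # The household is then out of the sample for 8 months, so the
    # second block of 4 interviews starts 12 months after the first.
    second_block = [first_month + 12 + i for i in range(4)]
    return first_block + second_block


months = cps_interview_months(0)
print(months)  # [0, 1, 2, 3, 12, 13, 14, 15]
```

The result covers a 16-month span and contains eight interviews, matching the description above.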
Unit of Analysis: Households, families, and persons.
Sample: The CPS sample is located in 754 sample areas, with coverage in every State and the District of Columbia. The basic CPS sample is selected from multiple frames using multiple stages of selection. Each unit is selected with a known probability to represent similar units in the universe. The sample design is a State-based design, with the sample in each State being independent of the others.
Data Availability: CPS data can be obtained from either the U.S. Census Bureau (www.census.gov) or the Bureau of Labor Statistics (www.bls.gov/cps).
Reports: U.S. Census Bureau. Current Population Survey: Design and Methodology. Technical Paper 63RV (TP63RV), March 2002. Available at www.census.gov/prod/2002pubs/tp63rv.pdf.
For more information: Website: www.census.gov/cps/
 
Decennial Census and Population Estimates
Sponsor: U.S. Census Bureau
Purpose: The U.S. decennial census serves two main purposes:
   (1) to apportion the 435 seats in the U.S. House of Representatives among the 50 States. Under Article I, Section 2, of the U.S. Constitution, the apportionment of representatives among the States for the House of Representatives must be carried out every 10 years (decennially); and
   (2) to enumerate the resident population. For Census 2000, data on sex, race, Hispanic origin, age, and tenure were collected from 100 percent of the enumerated population. More detailed information, such as income, education, housing, occupation, and industry, was collected from a representative sample of the population.
Survey Universe: U.S. resident population.
Research Design: Census 2000 was the most recent count of the U.S. population collected by the U.S. Census Bureau. The U.S. Census Bureau’s primary method of data collection is to mail out questionnaires using the Master Address File, which includes information from the U.S. Postal Service and the Local Update of Census Addresses (LUCA) program, and to use enumerators. Enumerators are U.S. Census Bureau staff who travel door-to-door gathering data by canvassing roads and streets looking for living quarters. For Census 2000, as in several previous censuses, two forms were used: a short form and a long form. The short form was sent to every household, and the long form, containing the seven 100 percent questions plus the sample questions, was sent to only a limited number of households, about one in every six. The long form collects information on social, housing, economic, and financial characteristics. The national final response rate for Census 2000 was 67 percent. This exceeded the projected response rate of 61 percent and was better than the 65 percent response rate from the 1990 census.
Survey Mode: One of two different survey forms was used to enumerate the U.S. population:
(1) A short form with seven basic questions, and
(2) A long form that included all questions from the short form plus additional questions. On average, one in every six households received the long form.
Unit of Analysis: Person-level data analysis.
Sample: There were several important survey question changes and/or additions for Census 2000. One such change deals with the question of race. The question on race on the 2000 census was based on OMB’s 1997 “Revisions of the Standards for the Classification of Federal Data on Race and Ethnicity.” The 1997 standards incorporated two major changes in the collection, tabulation, and presentation of race data. First, the 1997 standards increased from four to five the minimum set of categories to be used by Federal agencies for identification of race: American Indian or Alaska Native, Asian, Black or African American, Native Hawaiian or Other Pacific Islander, and White. Second, the 1997 standards included the requirement that Federal data collection programs allow respondents to select one or more race categories when responding to a query on their racial identity. One additional question added to Census 2000 asked about grandparents as caregivers, while several questions from the 1990 census, including information about children ever born, source of water, sewage disposal, and condominium status, were dropped for Census 2000. Another important change for Census 2000 was the question on disability. In 1990, the question was “Does this person have a physical, mental or other health condition which has lasted for more than 6 months and that limits the amount of work this person can do at a job or prevents this person from working at a job?” In 2000, the question was revised to inquire about blindness, deafness, and the ability to perform physical and mental tasks. Also, in 1990 the questions on disability were asked for those 15 years and over, while in 2000 the data were collected for persons 5 years and over.
Data Availability: In addition to conducting the census every 10 years, the Census Bureau produces updated population estimates between census years.
Postcensal Population Estimates: These are estimates made for the years following a census, before the next census has been taken. National postcensal population estimates are derived by updating the resident population enumerated in the decennial census using a components-of-population-change approach. The following formula is used to update the decennial census counts:
   (1) decennial census enumerated resident population
   (2) + births to U.S. resident women
   (3) – deaths to U.S. residents
   (4) + net international migration
   (5) + net movement of U.S. Armed Forces and U.S. civilian citizens
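The five-term update above can be written as a short Python function. This is a minimal sketch: the function and parameter names are chosen for the example, and every input value is hypothetical rather than an actual Census Bureau figure.

```python
def postcensal_estimate(census_count, births, deaths,
                        net_international_migration,
                        net_armed_forces_and_civilian_movement):
    """Apply the five components of the update formula in order."""
    return (census_count
            + births
            - deaths
            + net_international_migration
            + net_armed_forces_and_civilian_movement)


# Hypothetical inputs, for illustration only.
estimate = postcensal_estimate(
    census_count=1_000_000,
    births=14_000,
    deaths=9_000,
    net_international_migration=3_000,
    net_armed_forces_and_civilian_movement=-500,
)
print(estimate)  # 1007500
```

Net terms may be negative, as in the last argument here, when more people leave than arrive.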
Intercensal Population Estimates: The further a year is from the census on which the postcensal estimates are based, the less accurate the postcensal estimates become. Once the decennial census at the end of the decade is complete, intercensal estimates for the preceding decade are prepared to replace the less accurate postcensal estimates. Intercensal population estimates take into account the census of population at the beginning and end of the decade. Intercensal estimates are therefore more accurate than postcensal estimates because they correct for the “error of closure,” the difference between the estimated population at the end of the decade and the census count for that date.
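The idea of correcting for the error of closure can be illustrated with a small Python sketch. The text above does not say how the error is spread across the decade, so the straight-line proration used here is an assumption made purely for illustration, as are the function names, the sign convention, and all of the numbers.

```python
def error_of_closure(postcensal_at_census_date, census_count):
    """Census count minus the postcensal estimate for the same date
    (the sign convention is an assumption)."""
    return census_count - postcensal_at_census_date


def intercensal_estimates(postcensal, census_count):
    """Adjust a decade of postcensal estimates toward the new census count.

    postcensal[0] is the starting census count and postcensal[-1] is the
    postcensal estimate for the date of the next census.
    ASSUMPTION: the error of closure is prorated linearly over time.
    """
    closure = error_of_closure(postcensal[-1], census_count)
    last = len(postcensal) - 1
    return [p + closure * t / last for t, p in enumerate(postcensal)]


# Hypothetical series: 11 annual values from one census date to the next.
postcensal = [100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110]
adjusted = intercensal_estimates(postcensal, census_count=120)
```

Under this assumption the first value is left unchanged and the last value equals the new census count, so the adjusted series agrees with both censuses.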
Data Dissemination: Data from Census 2000 and previous censuses can be obtained primarily through various tools on the Census Bureau website (www.census.gov/main/www/cen2000.html). Census 2000 is the first census for which the internet site listed above is the primary means of disseminating the data. In addition to formatted tables, the Census Bureau website has maps and data sets available for downloading (via file transfer protocol (FTP)), printing, viewing, and manipulating. Special reports and briefs on Census data that provide background information, explain how data were analyzed, and describe differences between 1990 and 2000 data can be obtained through the following website: www.census.gov/population/www/cen2000/briefs.html.
  Public-Use Microdata Area (PUMA): A geographic entity for which the U.S. Census Bureau provides specially selected extracts of raw data from a small sample of long-form census records that are screened to protect the confidentiality of census records. The extract files are referred to as public use microdata samples (PUMS). Public use microdata areas (PUMAs), which must have a minimum census population of 100,000 and cannot cross a State line, receive a 5-percent sample of the long-form records; these records are presented in State files. These PUMAs are aggregated into super-PUMAs, which must have a minimum census population of 400,000 and receive a 1-percent sample in a national file. The PUMA and super-PUMA samples are mutually exclusive; that is, different records are used to create each sample. Data users can use these files to create their own statistical tabulations and data summaries.

Specific microdata samples available on CD-ROM/DVD can be obtained through the census catalog available on the U.S. Census Bureau’s home page (www.census.gov).
  Summary Files: Summary File 1 (SF 1) contains 286 detailed tables focusing on age, sex, households, families, and housing units. These tables provide in-depth figures by race and Hispanic origin; some tables are repeated for each of nine major race/Latino groups. Counts also are provided for over 40 American Indian and Alaska Native tribes and for groups within race categories. The race categories include 18 Asian groups and 12 Native Hawaiian and Other Pacific Islander groups. Counts of persons of Hispanic origin by country of origin (28 groups) are also shown.

Summary File 1 presents data for the United States, the 50 States, and the District of Columbia in a hierarchical sequence down to the block level for many tabulations, but only to the census tract level for others. Summaries are included for other geographic areas such as ZIP Code Tabulation Areas (ZCTAs) and Congressional districts.

Geographic coverage for Puerto Rico is comparable to the 50 States. Data are presented in a hierarchical sequence down to the block level for many tabulations, but only to the census tract level for others. Geographic areas include barrios, barrios-pueblo, subbarrios, municipios, places, census tracts, block groups, and blocks. Summaries also are included for other geographic areas such as ZCTAs.
    Summary File 2 (SF 2) contains 47 detailed tables focusing on age, sex, households, families, and occupied housing units for the total population. These tables are repeated for 249 detailed population groups based on the following criteria:
  • No tables are available for geographic areas having a population of less than 100.
  • Tables are repeated only for the race groups, American Indian and Alaska Native tribes, and Hispanic or Latino groups having a population of 100 or more within the geographic area.
For a complete list of the 249 population groups, see Appendix H of the SF 2 Technical Documentation (PDF).
    Summary File 3 consists of 813 detailed tables of Census 2000 social, economic, and housing characteristics compiled from a sample of approximately 19 million housing units (about 1 in 6 households) that received the Census 2000 long-form questionnaire. Fifty-one tables are repeated for nine major race and Hispanic or Latino groups: White alone; Black or African American alone; American Indian and Alaska Native alone; Asian alone; Native Hawaiian and Other Pacific Islander alone; Some other race alone; Two or more races; Hispanic or Latino; and White alone, not Hispanic or Latino.

Summary File 3 presents data for the United States, the 50 States, the District of Columbia, and Puerto Rico in a hierarchical sequence down to the block group for many tabulations, but only to the census tract level for others. Summaries are included for other geographic areas such as ZIP Code Tabulation Areas (ZCTAs) and Congressional districts (106th Congress).
    Summary File 4 (SF 4) contains the sample data, that is, the information compiled from the questions asked of a sample of all people and housing units.
The sample data are presented in 213 population tables (matrices) and 110 housing tables, identified with “PCT” and “HCT,” respectively. Each table is iterated for 336 population groups: the total population, 132 race groups, 78 American Indian and Alaska Native tribe categories (reflecting 39 individual tribes), 39 Hispanic or Latino groups, and 86 ancestry groups.

SF 4 is released as individual files for each of the 50 States, the District of Columbia, and Puerto Rico; and for the United States. The tables (matrices) are identical for all files, but the geographic coverage differs. Data are provided down to the census tract level.
Reports: See www.census.gov/main/www/cen2000.html
Future Plans: The next decennial census will be conducted in 2010. Reengineering of the 2010 census includes replacing the long form with the American Community Survey (ACS). The ACS is a new nationwide survey designed to provide communities a fresh look at how they are changing. It is intended to eliminate the need for the long form in the 2010 Census. The ACS collects information from U.S. households and group quarters similar to what was collected on the Census 2000 long form, such as income, commute time to work, home value, veteran status, and other important data. As with the official U.S. census, information about individuals will remain confidential.
For more information: E-mail: pio@census.gov
Phone: 301–763–3977
Website: www.census.gov/main/www/cen2000.html
Health and Retirement Study
  The Health and Retirement Study (HRS) is a major national panel study of the lives of older Americans. The HRS includes the “original” HRS and the Asset and Health Dynamics Among the Oldest-Old (AHEAD) study. These studies were merged in 1998 and now represent the United States population over age 50. The study is funded by the National Institute on Aging to provide researchers, policy analysts, and program planners with current data on the antecedents and consequences of retirement. Questionnaire topics include physical and cognitive functioning, retirement plans, family structure and transfers, demographic characteristics, housing, employment status, income, disability, health insurance, pension plans, job history, and attitudes, preferences, and expectations for the future. The survey data are linked with administrative records from the Employer Pension Study (1993 and 1999), National Death Index, Social Security Administration earnings and projected benefits data and W-2 self-employment data, and Medicare files.

During each 2-year cycle of interviews, the HRS team surveys more than 20,000 people who represent the Nation’s diversity of economic conditions, racial and ethnic backgrounds, health, marital histories and family compositions, occupations and employment histories, living arrangements, and other aspects of life. Since 1992, more than 27,000 people have given 200,000 hours of interviews. The HRS is managed jointly through a cooperative agreement between the National Institute on Aging (NIA) and the Institute for Social Research (ISR) at the University of Michigan. The study is designed, administered, and conducted by the ISR, and decisions about the study content are made by the investigators. The principal investigators at the University of Michigan are joined by a cadre of co-investigators and working group members who are leading academic researchers from across the United States in a variety of disciplines, including economics, medicine, demography, psychology, public health, and survey methodology. In addition, the NIA is advised by a Data Monitoring Committee charged with maintaining HRS quality, keeping the survey relevant and attuned to the technical needs of researchers who use the data, and ensuring that it addresses the information needs of policymakers and the public.

Since the study began, 7,000 people have registered to use the data, and nearly 1,000 researchers have employed the data to publish more than 1,000 reports, including more than 600 peer-reviewed journal articles and book chapters, and 70 doctoral dissertations.

The origins of the HRS date back to the mid-1980s, when the NIA and its advisors from demography, economics, and sociology recognized that the Baby Boom and the subsequent fertility decline, coupled with growing life expectancy, would confront America with population aging. This, in turn, would create major challenges for public sector Social Security retirement and disability and Medicare programs, and for private sector employer pensions and health insurance, when the Boomers began to retire around 2010.

HRS began in 1992 as a longitudinal study of a pre-retirement cohort of individuals born in 1931–41, and their spouses, who were 51–61 years old at baseline and have received longitudinal follow-up interviews at 2-year intervals. It was joined in 1993 by a companion study, AHEAD (Asset and Health Dynamics Among the Oldest-Old), consisting of a cohort of persons born before 1924 who were aged 70 and over, and their spouses. In 1998, the design was revised to convert the HRS from a study of specific cohorts into a steady-state design that would represent the U.S. population over age 50 by adding a new 6-year cohort of persons entering their 50s every 6 years. In 1998, this design required the addition of the CODA (Children of the Depression) cohort born in 1924–30, who were entering their seventies, and the War Baby cohort born in 1942–47, who were entering their fifties. The longitudinal design has been continued with interviews of all existing cohorts in 2000, 2002, and 2004, and the steady-state aspect has been carried forward with the addition in 2004 of the Early Baby Boomers (EBB), who were born in 1948–53 and were aged 51–56 at baseline. Between 1992 and 2002, the HRS conducted at least one interview with about 27,000 individuals, for a total of 114,000 interviews. During that period, 5,000 participants died and 4,000 retirements took place. In 2004, HRS conducted 18,479 longitudinal follow-up interviews and 3,330 baseline interviews with the Early Boomers.
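The birth-year ranges given above for each cohort can be collected into a small lookup, sketched here in Python. The table and function names are hypothetical and are not part of any HRS data product. The lookup applies only to age-eligible sample members, since spouses were interviewed regardless of birth year.

```python
# (cohort label, first birth year, last birth year); None means open-ended.
# Ranges are taken from the description above; the labels are informal.
HRS_COHORTS = [
    ("AHEAD", None, 1923),               # born before 1924
    ("CODA", 1924, 1930),                # Children of the Depression
    ("HRS", 1931, 1941),                 # original 1992 cohort
    ("War Baby", 1942, 1947),
    ("Early Baby Boomers", 1948, 1953),  # added in 2004
]


def hrs_cohort(birth_year):
    """Return the cohort label for a birth year, or None if the year
    falls outside the cohorts described above."""
    for label, first, last in HRS_COHORTS:
        if (first is None or birth_year >= first) and birth_year <= last:
            return label
    return None


print(hrs_cohort(1935))  # HRS
print(hrs_cohort(1960))  # None
```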
Data Availability: All publicly available data may be downloaded after registration from http://hrsonline.isr.umich.edu. Early Release data files are typically available within 3 months of the end of each data collection, with the Final Release following at 24 months after the close of data collection activities. Files linked with administrative data are released only as restricted data through an application process, as outlined on the HRS website.
Bibliography: The HRS bibliography of nearly a thousand publications is online at http://hrsonline.isr.umich.edu/papers/sho_papers.php?hfyle=bib_all. To search for a specific publication or topic in the bibliography, click on the link for the Dynamic Bibliography or go to http://hrsonline.isr.umich.edu/biblio/index.
For more information: Contact: David R. Weir, Director
E-mail: hrsquest@isr.umich.edu
Phone: 734–936–7261
Website: http://hrsonline.isr.umich.edu/
National Health Interview Survey
The National Health Interview Survey (NHIS), conducted by the National Center for Health Statistics, is a continuing nationwide sample survey in which data are collected during personal household interviews. The NHIS is the principal source of information on the health of the civilian, noninstitutionalized, household population of the United States. Interviewers collect data on illnesses, injuries, impairments, and chronic conditions; activity limitation caused by chronic conditions; utilization of health services; and other health topics. Information is also obtained on personal, social, economic, and demographic characteristics, including race and ethnicity and health insurance status. The survey is reviewed each year; core questionnaire items are revised every 10–15 years (with major revisions occurring in 1982 and 1997), and special topics are added or deleted annually.

In 2006 a new sample design was implemented. This design, which is expected to be in use through 2014, includes all 50 states and the District of Columbia as the previous design did. Oversampling of the black and Hispanic populations has been retained in 2006 to allow for more precise estimation of health characteristics in these growing minority populations. The new sample design also oversamples the Asian population. In addition, the sample adult selection process has been revised so that when black, Hispanic, or Asian persons aged 65 years or older are present, they have an increased chance of being selected as the sample adult. The new design reduces the size of the NHIS by approximately 13 percent relative to the previous sample design. The interviewed sample for 2006 consisted of 29,204 households, which yielded 75,716 persons in 29,868 families. More information on the survey methodology and content of the NHIS can be found at www.cdc.gov/nchs/nhis.htm.

Additional background and health data for adults are available in Summary Health Statistics for the U.S. Population: National Health Interview Survey, available online at www.cdc.gov/nchs/nhis.htm.
For more information: Contact: NHIS staff
E-mail: nchsquery@cdc.gov
Phone: 866–441–6247
Website: www.cdc.gov/nchs/nhis.htm
Population Projections
Sponsor: U.S. Census Bureau
Purpose: To provide information about the possible future race, Hispanic origin, age, and sex composition of the United States.
Research Design: The population projections for the United States are interim projections that take into account the results of Census 2000. These interim projections were created using the cohort-component method, which uses assumptions about the components of population change. They are based on Census 2000 results, official post-census estimates, as well as vital registration data from the National Center for Health Statistics. The assumptions are based on those used in the projections released in 2000 that used a 1998 population estimate base. Some modifications were made to the assumptions so that projected values were consistent with estimates from 2001 as well as Census 2000.

Fertility is assumed to increase slightly from current estimates. The projected total fertility rate in 2025 is 2.180, and it is projected to increase to 2.186 by 2050. Mortality is assumed to continue to improve over time. By 2050, life expectancy at birth is assumed to increase to 81.2 for men and 86.7 for women. Net immigration is assumed to be 996,000 in 2025 and 1,097,000 in 2050.
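To give a sense of how a cohort-component projection combines fertility, mortality, and migration assumptions, the sketch below advances a small population by one year in Python. It is a deliberately simplified, single-sex illustration: the function name, the three age groups, and every rate are hypothetical, and it does not reproduce the Census Bureau's actual procedure or assumptions.

```python
def project_one_year(pop, survival, fertility, net_migration):
    """Advance a population by one year (illustrative only).

    pop[a]           -- people aged a at the start of the year
    survival[a]      -- share of people aged a who survive the year
    fertility[a]     -- births per person aged a during the year
    net_migration[a] -- net migrants counted at age a at year end
    The last age group is open-ended (for example, "2 and over").
    """
    n = len(pop)
    nxt = [0.0] * n
    # Births during the year form the new age-0 group.
    nxt[0] = sum(p * f for p, f in zip(pop, fertility))
    # Survivors move up one age; the open-ended group keeps its survivors.
    for a in range(n):
        target = min(a + 1, n - 1)
        nxt[target] += pop[a] * survival[a]
    # Net migration is added last.
    return [x + m for x, m in zip(nxt, net_migration)]


# Hypothetical inputs, for illustration only.
projected = project_one_year(
    pop=[100, 100, 100],
    survival=[1.0, 0.9, 0.5],
    fertility=[0.0, 0.1, 0.0],
    net_migration=[1, 2, 3],
)
```

Repeating the step year after year, with rates that change according to the stated assumptions, yields a projection series.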
Race and Hispanic origin: Interim projections based on Census 2000 were also done by race and Hispanic origin. The basic assumptions by race used in the previous projections were adapted to reflect the Census 2000 race definitions and results. Projections were developed for the following groups:
   (1) non-Hispanic white alone,
   (2) Hispanic white alone,
   (3) black alone,
   (4) Asian alone, and
   (5) all other groups.
The fifth category includes the categories of American Indian and Alaska Native, Native Hawaiian and Other Pacific Islanders, and all people reporting more than one of the major race categories defined by the Office of Management and Budget.

For a more detailed discussion of the cohort-component method and the assumptions about the components of population change, see U.S. Census Bureau, Population Division Working Paper No. 38, “Methodology and Assumptions for the Population Projections of the United States: 1999 to 2100,” by Hollmann, Mulder, and Kallan.
While this paper does not incorporate the updated assumptions made for the interim projections, it provides a more extensive treatment of the earlier projections, released in 2000, on which the interim series is based.
For more information: Contact: Population Projections Branch
Phone: 301–763–2428
Website: www.census.gov/population/www/projections/popproj.html
Survey of Income and Program Participation
Sponsor: U.S. Census Bureau, Social Security Administration
Purpose: To collect source and amount of income, labor force information, program participation and eligibility data, and general demographic characteristics to measure the effectiveness of existing Federal, State, and local programs; to estimate future costs and coverage for government programs, such as food stamps; and to provide improved statistics on the distribution of income in the country. SIPP also offers detailed information on cash and noncash income on a sub-annual basis in addition to collecting data on taxes, assets, liabilities, and participation in government transfer programs.
Survey Universe: U.S. civilian noninstitutionalized population
Research Design: This is a longitudinal survey—a continuous series of national panels.
Survey Mode: Most interviews conducted through 1991 were in the form of personal visits. In 1992, SIPP switched to maximum telephone interviewing to reduce costs. Wave 1, 2, and 6 interviews were still conducted in person, but other interviews were conducted by telephone to the extent possible. SIPP telephone interviews and personal visits are carried out by the same interviewer interacting with the same respondents. Interviewers typically make phone calls from their homes. For security and confidentiality reasons, they are not allowed to use cellular or cordless telephones in the interviews. If a standard telephone is not available, the interviews must be conducted face-to-face. Repeated failure to reach a respondent by telephone may also require an in-person visit to the listed address.
Unit of Analysis: All household members 15 years old and over are interviewed by self-response, if possible; proxy response is permitted when household members are not available for interviewing.
Sample: The SIPP sample is a multistage-stratified sample of the U.S. civilian noninstitutionalized population with sample sizes ranging from approximately 14,000 to 45,000 interviewed households per panel. The duration of each panel ranges from 2½ years to 4 years.

The Census Bureau also oversampled the low-income population for the 1996, 2001, and 2004 Panels using decennial census information. Housing units within each primary sampling unit (PSU) were split into high- and low-poverty strata. If the housing unit received the Census long form that included income questions, the unit’s poverty status was determined directly; for other housing units, poverty status was assumed on the basis of responses to Census short-form items predictive of poverty rates.
Topics: Income, labor force participation, program participation, and eligibility.
Data Availability: For the 1984–93 panels, a panel of households was introduced each year in February. A 4-year panel was introduced in April 1996. A 2000 panel was introduced in February 2000 for 2 waves. A 3-year 2001 panel was introduced in February 2001, and a 2½ year 2004 panel was introduced in February 2004.
Linked Data: SIPP data are linked to IRS wage, self-employment, and tip tax records (1040SEs). These records exist for each year from 1982 to the most recent year, lagged 1 year. The Social Security earnings records come from IRS forms and are owned by the IRS. They contain employer-reported wages and salaries and income reported by the self-employed that is subject to Social Security taxation, up to the taxable maximum. They exist for each year from 1951 to the most recent year, lagged 1 year. The most recent year available is 2004.

SSA benefit records contain information on Medicare Part A, Part B, and Part D low income subsidy and Medicaid subsidies of Medicare Part B (QMB, SLMB, QI).
Data Dissemination: Data collected in SIPP and supporting documentation are available in various forms. They include published estimates based on those data, microdata in several formats, documentation for each of the microdata files, and more general documentation about methodological issues in SIPP. The latter includes the SIPP Quality Profile, a series of working papers distributed by the Census Bureau, articles published in academic journals, and conference proceedings.

SIPP microdata files can be obtained from several sources. All public use microdata files can be obtained on CD-ROM directly from the Census Bureau. SIPP microdata are also available online from the SIPP website at www.sipp.census.gov/sipp/. The website offers two data access tools: DataFerrett and the SIPP FTP site. DataFerrett is a system that enables users to access and manipulate large demographic and economic data sets online. The SIPP FTP site has data files and documentation for downloading.

Cross-sectional data are presented for various socioeconomic characteristics for a 4-month period. Longitudinal data are presented for a 2½-year or 3-year period. Variables for both data sets include age, race, sex, Hispanic origin, marital status, household/family relationship, educational attainment, work experience, and income. Basic cross-sectional questions are supplemented with topically relevant questions such as employment history, work disability, education, health care, financial assets, retirement accounts, etc.
Reports: SIPP publications can be found at www.sipp.census.gov/sipp/pubs.html.
For more information: E-mail: hhes.sipp.survey@census.gov
Website: www.sipp.census.gov/sipp/