Washington Statistical Society Seminar Archive: 2003
Topic: Clarifying Some Issues in the Analysis of Survey Data
- Speaker: Phil Kott, Senior Statistician, USDA/NASS/RDD
- Date & Time: January 14, 2003, 10:30 - 11:30 a.m.
- Location: U.S. Bureau of the Census, 4700 Silver Hill Road, Suitland, Maryland - the Morris Hansen Auditorium, FOB 3. Enter at Gate 5 on Silver Hill Road. Please call (301) 457-4974 to be placed on the visitors' list. A photo ID is required for security purposes.
- Sponsor: U.S. Bureau Of Census, Statistical Research Division
Abstract:
The literature offers two distinct reasons for incorporating sample weights into the estimation of linear regression coefficients from a model-based point of view. Either the sample design is informative or the model is incomplete. The traditional sample-weighted least-squares estimator can be improved upon when the sample design is informative, but not when the standard linear model fails and needs to be extended.
It is often assumed that the realized sample derives from a two-phase process. In the first phase, the finite population is drawn from a hypothetical superpopulation via simple random (cluster) sampling. In the second phase, the actual sample is drawn from the finite population. Many think that the standard practice of treating the (cluster) sample as if it were drawn with replacement from the finite population is roughly equivalent to the full two-phase process. That is not always the case.
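As a small illustration of the informative-design case, here is a rough Python sketch (my own construction, not from the talk): a finite population is generated from a linear superpopulation model, units with large y are sampled at a much higher rate, and the unweighted and sample-weighted least-squares slopes are compared. All population values and sampling rates are assumptions made up for the example.

import numpy as np

rng = np.random.default_rng(2003)

# Finite population generated from a simple linear superpopulation model
N = 100_000
x = rng.uniform(0, 10, N)
y = 2.0 + 0.5 * x + rng.normal(0, 2, N)

# Informative design (assumed): units with large y are sampled at a much higher rate
pi = np.where(y > 8, 0.20, 0.01)
sampled = rng.random(N) < pi
xs, ys, ws = x[sampled], y[sampled], 1.0 / pi[sampled]

X = np.column_stack([np.ones(xs.size), xs])

# Unweighted (ordinary) least squares
b_ols = np.linalg.lstsq(X, ys, rcond=None)[0]

# Sample-weighted least squares: solve (X'WX) b = X'W y
XtW = X.T * ws
b_wls = np.linalg.solve(XtW @ X, XtW @ ys)

print("superpopulation slope: 0.5")
print(f"unweighted slope: {b_ols[1]:.3f}")
print(f"weighted slope:   {b_wls[1]:.3f}")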
This program is physically accessible to persons with disabilities. For interpreting services, contact Yvonne Moore at TTY 301-457-2540, 301-457-2853 (voice mail), or Sherry.Y.Moore@census.gov.
Title: Bureau of Transportation Statistics (BTS) Prototype Disclosure Software for Tabular Data
- Speakers:
James P. Kelly, OptTek Systems Inc, Boulder, Colorado
J. Neil Russell, Bureau of Transportation Statistics, Washington, DC
- Date/Time: Tuesday, January 21, 2003, 12:30 to 2:00 p.m.
- Location: Bureau of Labor Statistics, Postal Square Building (PSB), Conference Center, Conference Rooms 7&8, 2 Massachusetts Ave., N.W., Washington DC. Please use the First Street entrance to the PSB.
- Sponsor: WSS Methodology Section
Abstract:
The United States Department of Transportation's Bureau of Transportation Statistics (BTS) is developing its confidentiality policy, which is based on its legislative mandate (49 U.S.C. 111(i)) to protect individually identifiable information. Because the field of statistical disclosure limitation (SDL) research is still evolving, BTS wants to take advantage of the latest SDL research in updating its confidentiality policy and practices. To this end, BTS initiated a project to develop, demonstrate, and implement new, state-of-the-art SDL methods for complex, multi-dimensional tables (up to five dimensions) that contain a hierarchical structure.
After reviewing a wide variety of SDL methods described in the literature, the project team selected the Synthetic Data Substitution (SDS) method proposed by Dandekar and Cox (2002), which evaluated well against the BTS requirements. This method was subsequently enhanced to manipulate large tables efficiently. A modified version of this SDS method was implemented in prototype computer software for demonstration and testing. The talk will discuss the highly efficient algorithms (capable of processing multi-dimensional tables with hundreds of thousands of entries), describe the software functionality, and demonstrate the prototype software using examples of agency tabular data.
Title: The Statistical Administrative Records System and Administrative Records Experiment in 2000: System Design, Successes, and Challenges
- Speaker: Dean Judson, U.S. Census Bureau
- Discussant: John Czajka, Mathematica Policy Research, Inc.
- Chair: Pat Cantwell, U.S. Census Bureau
- Date/Time: Tuesday, February 11, 2003, 12:30 - 2:00 p.m.
- Location: Bureau of Labor Statistics, Postal Square Building (PSB), Conference Center, Conference Room 7, 2 Massachusetts Ave., N.W., Washington, D.C. Please use the First Street entrance to the PSB.
- Sponsor: WSS Methodology Section
Abstract:
The purpose of this presentation is to document the scope of administrative records use at the Census Bureau both historically and currently, and to describe the Statistical Administrative Records System (StARS) and the Administrative Records Experiment in 2000 (AREX 2000). This presentation is an introduction to these two attempts to simulate an "administrative records census" and serves as an introduction to the following presentation, which will focus on evaluation. I first describe the StARS design for 1999 and 2000 and the challenges in using administrative records data, then describe the specific aspects of the administrative records experiment. I conclude by describing how the demographic results of the StARS and AREX experiments compare to national, state, and county level Census 2000 data.
Title: Overview of New Legislation Protecting the Confidentiality of Statistical Information and Statistical Disclosure Limitation Methodology
- Presenter: Nancy Kirkendall, Director, Statistical Methods Group, EIA
- Date/Time: Wednesday, February 12, 2003, noon - 1:00 p.m.
- Location: US Department of Transportation, Nassif Building, 400-7th St., SW, Room 6200, Washington DC (L'Enfant Plaza Metro Stop -- follow signs to the Department of Transportation exit at 7th and D Streets, SW). Please use Southwest entrance of the building (see Notice at the end of this announcement).
- Sponsor: Bureau of Transportation Statistics
Abstract:
In December 2002 President Bush signed into law HR 2458, the E-Government Act of 2002. Title V of this Act, the Confidential Information Protection and Statistical Efficiency Act (CIPSEA) of 2002, provides uniform safeguards to protect the confidentiality of information provided by the public for statistical purposes, regardless of which agency collects the data. The speaker will give an overview of CIPSEA, describe the impact of CIPSEA on EIA, and discuss the questions that EIA sent to OMB about CIPSEA.
Dr. Kirkendall will also provide an introduction to statistical disclosure limitation methodology. OMB's Statistical Policy Working Paper (SPWP) #22, "Report on Statistical Disclosure Limitation Methodology," will serve as the foundation. The speaker chaired the subcommittee that authored SPWP #22.
Topic: Some Advances in ARIMA-Model-Based Decomposition of Time Series (Including Seasonal Adjustment)
- Speaker: Agustin Maravall, Bank of Spain
- Discussant: Keith Ord, Georgetown University
- Chair: David Findley, U.S. Census Bureau
- Date/Time: Thursday, February 13, 2003; 12:30 - 2:00 PM
- Location: Bureau of Labor Statistics, Conference Center Room 10, Postal Square Building (PSB), 2 Massachusetts Ave. NE, Washington, D.C. Please use the First St., NE, entrance to the PSB.
- Sponsor: Economics Section
Abstract:
In the early 1980s, a new approach to seasonal adjustment was suggested, namely, the ARIMA-model-based (AMB) method (Burman, 1980; Hillmer and Tiao, 1982; among others). The method consisted of, first, identifying the ARIMA model for the observed series; second, decomposing the model into (unobserved) trend-cycle, seasonal, and irregular components, which also follow ARIMA-type models; and third, estimating the components by means of the Wiener-Kolmogorov filter extended to non-stationary series.
Twenty years later, the approach seems to have come of age. The presentation will describe some of its main features and illustrate how it can answer questions relevant for the analyst, and for economy-watchers and policy-makers.
Two extensions of the approach to fields different from seasonal adjustment will also be presented. One, to business-cycle estimation, will illustrate the complementarity of "ad-hoc" and AMB filtering. The second, to quality control of data, will illustrate application of the automatic regression-ARIMA model identification procedure on a very large scale (perhaps millions of series).
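As a rough illustration of the model-based decomposition idea, the sketch below fits a structural (unobserved-components) time series model to a simulated monthly series and extracts smoothed trend, seasonal, and irregular components. This is a readily available stand-in in statsmodels, not the ARIMA-model-based (TRAMO/SEATS) machinery discussed in the talk, and the series and settings are assumptions.

import numpy as np
import pandas as pd
import statsmodels.api as sm

# A simulated monthly series with trend, a stable seasonal pattern, and noise (assumed data)
rng = np.random.default_rng(0)
idx = pd.date_range("1990-01-01", periods=144, freq="MS")
t = np.arange(144)
y = pd.Series(10 + 0.05 * t + 2.0 * np.sin(2 * np.pi * t / 12)
              + rng.normal(0, 0.5, 144), index=idx)

# Decompose into (unobserved) trend, seasonal, and irregular components
model = sm.tsa.UnobservedComponents(y, level="local linear trend", seasonal=12)
res = model.fit(disp=False)

trend = res.level.smoothed          # smoothed trend component
seasonal = res.seasonal.smoothed    # smoothed seasonal component
irregular = y.values - trend - seasonal

print("seasonally adjusted series (first 6 values):")
print(np.round(y.values[:6] - seasonal[:6], 2))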
Title: Regression Models for Time Series Analysis
- Speaker: Professor Benjamin Kedem, University of Maryland, College Park
- Date & Time: 11:00-12:00 noon, February 14, 2003
- Location: Funger Hall 307. 2201 G Street NW. Foggy Bottom metro stop on the blue and orange line.
- Sponsor: The George Washington University, Department of Statistics
Abstract:
A relatively recent statistical development is the important class of models known as generalized linear models (GLMs), introduced by Nelder and Wedderburn (1972), which provides under some conditions a unified regression theory suitable for continuous, binary, categorical, and count data. The theory of GLM was originally intended for independent data, but it can be extended to dependent data under various assumptions. The extension to time series will be presented, accompanied by some real data examples.
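A minimal sketch of the kind of extension described above (my own toy example, not the speaker's): a Poisson GLM for a count time series in which the lagged count enters as a covariate. The simulated series and the single-lag specification are assumptions.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Simulate a count series whose log-mean depends on the previous count
n = 300
y = np.zeros(n, dtype=int)
for t in range(1, n):
    mu = np.exp(0.5 + 0.3 * np.log1p(y[t - 1]))
    y[t] = rng.poisson(mu)

# Regress y_t on a constant and log(1 + y_{t-1})
X = sm.add_constant(np.log1p(y[:-1]))
model = sm.GLM(y[1:], X, family=sm.families.Poisson())
res = model.fit()
print(res.params)   # estimates of the intercept and lag coefficient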
Note: For a complete list of upcoming seminars check the department's seminar web site: http://www.gwu.edu/~stat/seminars/Spring2003.htm. The campus map is at: http://www.gwu.edu/Map/. The contact person is Reza Modarres at Reza@gwu.edu or 202-994-6359.
Topic: The NASS Question Repository System
- Speaker: Daniel G. Beckler, National Agricultural
Statistics Service
- Date: Wednesday, February 19, 2003, 12:30 - 2:00 p.m.
- Location: Bureau of Labor Statistics Conference Room 7
- Sponsor: WSS Data Collection Methods Section
Abstract:
USDA's National Agricultural Statistics Service (NASS) conducts hundreds of surveys annually on the nation's farmers and agribusinesses. For most surveys, multiple questionnaire versions are needed to address differences in agriculture between states. The questionnaires also need to be developed for multiple data collection modes: mail, telephone/CATI, and most recently the World Wide Web. In order to efficiently create the numerous questionnaires needed for all of the survey, state, and mode combinations, NASS is developing a client-server-based Question Repository System (QRS). The QRS includes a user interface to build properly formatted questions for the various modes; these questions are then stored in a database. The stored questions are then retrieved and used to build questionnaires, which may be saved, printed, or ported to a Web server. This seminar discusses the capabilities and some technical details of the QRS.
Title: Using Correlations Between Link Flows To Improve AADT/VMT Estimation: Simulation Results
- Presenter: Prof. Prem K. Goel, Department of Statistics, The Ohio State University.
- Date/Time: Tuesday, March 4, 2003, 10:00 a.m. - 11:00 a.m.
- Location: US Department of Transportation, Nassif Building, 400-7th St., SW, Room 3200, Washington DC (L'Enfant Plaza Metro Stop -- follow signs to the Department of Transportation exit at 7th and D Streets, SW).
- Sponsor: Bureau of Transportation Statistics
Abstract:
Dr. Goel will present results of his investigation into Bayesian and non-Bayesian strategies to improve AADT estimation by exploiting the inherent underlying correlations between link flows. These correlations arise partially because inflows and outflows to a node are always constrained. In addition, when the network has a large number of O-D zones and a relatively smaller number of links, the correlation between the link flows can be large. The traditional AADT estimation procedure ignores these correlations completely and amounts to using an ordinary least squares estimate, after adjusting the coverage counts by daily and monthly factors. Simulation results will be presented, pointing out some network scenarios under which the traditional estimates can be improved upon.
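For intuition, the following sketch (not Dr. Goel's procedure) simulates link counts with correlated errors and compares the sampling variability of an ordinary least squares estimator, which ignores the correlations, with a generalized least squares estimator that uses them. The covariance structure and all numbers are assumptions made up for the example.

import numpy as np

rng = np.random.default_rng(4)

n_links, n_reps = 20, 2000
X = np.column_stack([np.ones(n_links), rng.uniform(1.0, 5.0, n_links)])  # link covariates
beta = np.array([100.0, 50.0])

# Correlated errors across nearby links (assumed exponential-decay covariance)
idx = np.arange(n_links)
V = 400.0 * 0.7 ** np.abs(np.subtract.outer(idx, idx))
L = np.linalg.cholesky(V)
Vinv = np.linalg.inv(V)
XtVi = X.T @ Vinv

ols_slopes, gls_slopes = [], []
for _ in range(n_reps):
    y = X @ beta + L @ rng.normal(size=n_links)
    ols_slopes.append(np.linalg.lstsq(X, y, rcond=None)[0][1])          # ignores correlation
    gls_slopes.append(np.linalg.solve(XtVi @ X, XtVi @ y)[1])           # uses the covariance

print(f"OLS slope std dev: {np.std(ols_slopes):.3f}")
print(f"GLS slope std dev: {np.std(gls_slopes):.3f}")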
Note: This seminar will be held in a wheelchair-accessible location. Attendees who require sign language interpretation, other auxiliary aids or alternate accessible formats should advise the program coordinator at least three business days prior to the date of the seminar.
Title: An Analysis of Box-Cox Transformed Data
- Speaker: Ms. Jade Lee Freeman, Department of Statistics, The George Washington University
- Time: 11:00-12:00 Noon March 7, 2003
- Location: Funger Hall 307. 2201 G Street NW. Foggy Bottom metro stop on the blue and orange line.
- Sponsor: The George Washington University, Department of Statistics
Abstract:
We present a method for estimating the mean vector from a multivariate skew distribution that includes some unobserved data below the detection limits. To estimate the mean vector and the covariance matrix we develop an EM algorithm solution and use it to maximize the likelihood. We obtain expressions for the mean vector, covariance matrix, and the asymptotic covariance of the vector of means in the original scale. The performance of the MLE method in selecting the correct power transformation and the coverage rate of the confidence region under several conditions are investigated with Monte Carlo simulation.
The Box-Cox transformation system produces the power normal (PN) family, whose members include the normal and log-normal distributions. We study the moments of the PN family and obtain expressions for its mean and variance. The quantile functions and a quantile measure of skewness are discussed to show that the PN family is ordered with respect to the transformation parameter. The conditional distributions are studied and shown to belong to the PN family. We obtain expressions for the mean, median, and modal regressions. Chebyshev-Hermite polynomials are used to obtain an expression for the correlation coefficient and to prove that the correlation is smaller in the PN scale than in the original scale. Frechet bounds are used to obtain expressions for the lower and upper bounds of the correlation coefficient. An algorithm is given to compute the bounds.
We also investigate the efficiency of tests after a power transformation. In particular, we consider the one-sample test of location and study the gains in efficiency for the one-sample t-test following a Box-Cox transformation. We prove that the asymptotic relative efficiency of the transformed univariate t-test and Hotelling test of multivariate location, with respect to the same statistics based on untransformed data, is at least one. We also study the efficiency of the correlation coefficient following a Box-Cox transformation. We prove that much stronger conclusions can be reached about the independence of the margins of bivariate normal variates once they have been transformed with a Box-Cox transformation.
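A toy sketch of the kind of comparison described in the last paragraph (my own example, not the speaker's analysis): a one-sample location test applied to skewed data on the original scale and again after a Box-Cox transformation estimated by maximum likelihood, with the null value transformed accordingly. The lognormal data and null value are assumptions.

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Skewed sample whose population median is exp(0.3), roughly 1.35 (assumed data)
x = rng.lognormal(mean=0.3, sigma=0.8, size=60)

m0 = 1.0  # hypothesized location under the null (an assumption for the example)

# Test on the original scale
t_orig, p_orig = stats.ttest_1samp(x, m0)

# Estimate the Box-Cox parameter by ML, transform the data and the null value
x_bc, lam = stats.boxcox(x)
m0_bc = (m0 ** lam - 1) / lam if lam != 0 else np.log(m0)
t_bc, p_bc = stats.ttest_1samp(x_bc, m0_bc)

print(f"lambda_hat = {lam:.2f}")
print(f"original scale: t = {t_orig:.2f}, p = {p_orig:.4f}")
print(f"Box-Cox scale:  t = {t_bc:.2f}, p = {p_bc:.4f}")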
Note: For a complete list of upcoming seminars check the department's seminar web site: http://www.gwu.edu/~stat/seminars/Spring2003.htm. The campus map is at: http://www.gwu.edu/Map/. The contact person is Reza Modarres at Reza@gwu.edu or 202-994-6359.
Title: Practical Considerations in Selecting Statistical Disclosure Methodology for Tabular Data
- Presenter: Rich Allen, Deputy Administrator for Programs and Products, National Agricultural Statistics Service, U.S. Department of Agriculture
- Date/Time: Wednesday, March 12, 2003, 12:00 - 1 pm
- Location: US Department of Transportation, Nassif Building,400-7th St., SW, Room 6200.
- Sponsor: Bureau of Transportation Statistics
Abstract:
Population and data user attributes that should be considered in determining the most appropriate statistical suppression techniques will be outlined. A general overview of methods used by the National Agricultural Statistics Service and the Agricultural Marketing Service to protect tabular data will be presented. Certain methods will be highlighted and explored in more detail.
Topic: Unequal Treatment: Confronting Racial and Ethnic Disparities in Healthcare
- Speakers: Brian Smedley and Adrienne Stith, Institute of Medicine, The National Academies
- Chair: Shelly Ver Ploeg, Committee on National Statistics, The National Academies
- Date: Thursday, March 13, 2003, 12:30-2:00
- Location: Bureau of Labor Statistics, Conference Room 8. Video conference to selected sites.
- Sponsor: Bureau of Labor Statistics
Abstract:
Racial and ethnic disparities in health care are real and unacceptable. They occur across a wide range of medical conditions and health care services, and exist independently of insurance status, income, and other access-related factors. At the level of health systems, minorities are likely to get poorer care because of several factors, including resource allocation policies that are less favorable to minorities, linguistic and cultural barriers, and the disproportionate representation of minorities in restrictive health plans. Minority patients, for a variety of historic and socioeconomic reasons, are more likely to refuse treatment, or fail to adhere to treatment due to misunderstanding or mistrust. These patient and system-level factors, however, don't fully explain the consistency of racial and ethnic gaps in treatment. Prejudice, bias, and stereotyping by providers, as well as clinical uncertainty, contribute to disparities in health care. This is a major conclusion of an expert panel of the Institute of Medicine, summarized in a report called Unequal Treatment: Confronting Racial and Ethnic Disparities in Healthcare. Brian Smedley and Adrienne Stith, program officers at the IOM, will discuss this and other conclusions of the report, along with the report's recommendations for strategies to eliminate these disparities.
Title: Improving the Quality of Surveys
- Speaker: David Marker, Westat
- Discussant: Steven B. Cohen, Agency for Healthcare Research and Quality
- Chair: Amrut Champaneri, Bureau of Transportation Statistics
- Date/Time: Wednesday, March 19, 2003, 12:30 - 2:00 p.m.
- Location: Bureau of Labor Statistics, Postal Square Building (PSB), Conference Center, Conference Room 1, 2 Massachusetts Ave., N.W., Washington, D.C. Please use the First Street entrance to the PSB.
- Sponsor: WSS Quality Assurance and Physical Sciences Section
Abstract:
There have been a number of conferences in the last few years focusing on improving the quality of surveys (Stockholm and Ottawa in 2001, Copenhagen in 2002). There are a great many ways to improve the quality of surveys, too many to be covered in one presentation. This keynote presentation from the International Conference on Improving Surveys will focus its discussion on three general topics: response rates, technological changes, and continuous improvement, particularly through communications. Improving response rates has been the topic of numerous papers and conferences. Surveys are being dramatically altered by changes in technology. We discuss three types of technological change: the Internet, mobile telephones, and handheld computers. Communication has not commonly been discussed, but is fundamental to successfully improving surveys.
Title: Issues in Measurement and Analysis of Health Related Quality of Life
- Speaker: Professor Mounir Mesbah, Universite de Bretagne-Sud, Vannes, France
- Time: 5:00-6:00 pm April 4, 2003
- Location: Funger Hall 321. 2201 G Street NW. Foggy Bottom metro stop on the blue and orange line.
- Sponsor: The George Washington University, Department of Statistics
Abstract:
Health Related Quality of Life surveys deal generally with two kinds of data: data recorded during an exploratory or validation step, in order to help with the construction (definition) of variables and indicators, and data recorded during an analysis step, in order to investigate how the distribution of the previously constructed variables evolves across populations, times, and areas.
These are generally two well separated steps in the research process of a scientist working in Health Related Quality of Life, environment, or any other field. The first step generally deals with measurement, calibration, and the metrology of variables; the statistical methods most used are multivariate exploratory analysis and structural models, such as factor analysis models or item response theory models. The second step is certainly more familiar to inferential statisticians. Linear, generalized linear, time series, and survival methods (and models) are very useful in this step. The variables constructed in the first step are incorporated in this second step, and their joint distribution - joint with the other analysis variables (treatment group, time, duration of life, etc.) - is investigated. In this talk, I will compare the simple strategy of separating the two steps with one that defines and analyzes a global model including both the measurement and the analysis steps. I will illustrate the issue with a real example in oncology, where the main goal is the analysis of the joint distribution of survival and quality of life of cancer patients randomized to two treatment groups during a clinical trial.
Note: For a complete list of upcoming seminars check the department's seminar web site: http://www.gwu.edu/~stat/seminars/Spring2003.htm. The campus map is at: http://www.gwu.edu/Map/. The contact person is Reza Modarres at Reza@gwu.edu or 202-994-6359.
Title: Evaluation Results of an Administrative Records Census Experiment
- Speakers: Harley Heimovitz and Mark Bauder, U.S. Census Bureau
- Discussant: John Czajka, Mathematica Policy Research, Inc.
- Chair: Dean H. Judson, U.S. Census Bureau
- Date/Time: Tuesday, April 8, 2003, 12:30 - 2:00 p.m.
- Location: Bureau of Labor Statistics, Postal Square Building (PSB), Conference Center, Conference Room 2, 2 Massachusetts Ave., N.W., Washington, D.C. Please use the First Street entrance to the PSB.
- Sponsor: WSS Methodology Section
Abstract:
The Administrative Records Experiment in 2000 (AREX 2000) was an attempt to simulate an administrative records census using data from seven federal databases, supplemented by field and processing operations. This presentation describes the results of the AREX 2000 evaluations, comparing county, tract, block, and household tallies between administrative records and Census 2000 results. The evaluations focused on two key issues important to the Census Bureau and federal program administrators: Can administrative records data be used to develop small area, intercensal estimates of the population and its composition? To what extent can administrative records data substitute for costly non-response followup operations in a decennial census?
The evaluations compare alternative enumeration methods and assess our field, processing, and imputation operations. We present tabular, model-based, and geospatial evidence that administrative records provide good estimates of Census counts at larger geographies, with greater accuracy using the "bottom-up" enumeration method. The results also suggest that administrative records addresses and households have potential use in the nonresponse followup or imputation phase of a traditional census. AREX processing deficiencies are investigated and confirm known problems in identifying selected demographic groups. Identifying these deficiencies has allowed us to revise and improve our methodology.
Title: The Effect Of Statistical Dependence on Inferences from Binomial Data
- Speaker: Professor Weiwen Miao, Department of Mathematics and Computer Science, Macalester College, Minnesota
- Time: 11:00-12:00 Noon April 11, 2003
- Location: Funger Hall 307. 2201 G Street NW. Foggy Bottom metro stop on the blue and orange line.
- Sponsor: The George Washington University, Department of Statistics
Abstract:
The talk describes the effect of statistical dependence on tests and confidence intervals for the parameter p, the success probability in a binomial random variable. The problem was motivated by a jury discrimination case, Moultrie v. Martin, in which half the grand jurors served a second year. Hence, the racial compositions of the grand juries in consecutive years were no longer statistically independent. The first part of the talk concentrates on the effect of the dependence on hypothesis testing. It will be shown that ignoring dependence not only made the statistical evidence of discrimination appear stronger than it truly was but also exaggerated the power of the test used to determine the possible discrimination. Both the exact distribution of the number of "successes" and its normal approximation are compared in order to provide a practical condition for the use of the approximation. The second part of the talk focuses on the effect of dependence on confidence intervals for a population proportion. When observations are dependent, even slightly, the coverage probability of virtually all the confidence intervals in the literature can deviate noticeably from the nominal level. We propose and examine several modified confidence intervals. Our results show that the modified Wilson interval performs well and can be recommended for general use.
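The point about coverage can be seen in a small simulation (my own construction, not from the talk): Bernoulli outcomes with lag-one correlation are generated from a two-state Markov chain, and the empirical coverage of the standard Wilson interval is compared with its nominal 95% level. The dependence parameter and sample size are assumptions.

import numpy as np

rng = np.random.default_rng(11)

def markov_bernoulli(n, p, rho, rng):
    """Stationary Bernoulli(p) chain with lag-one correlation rho."""
    x = np.empty(n, dtype=int)
    x[0] = rng.random() < p
    for t in range(1, n):
        p1 = p + rho * (x[t - 1] - p)   # P(x_t = 1 | x_{t-1})
        x[t] = rng.random() < p1
    return x

def wilson_interval(phat, n, z=1.96):
    denom = 1.0 + z**2 / n
    center = (phat + z**2 / (2 * n)) / denom
    half = z * np.sqrt(phat * (1 - phat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

p, n, rho, reps = 0.3, 100, 0.3, 5000
covered = 0
for _ in range(reps):
    x = markov_bernoulli(n, p, rho, rng)
    lo, hi = wilson_interval(x.mean(), n)
    covered += lo <= p <= hi

print(f"nominal 95% Wilson interval, empirical coverage: {covered / reps:.3f}")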
Note: For a complete list of upcoming seminars check the department's seminar web site: http://www.gwu.edu/~stat/seminars/Spring2003.htm. The campus map is at: http://www.gwu.edu/Map/. The contact person is Reza Modarres at Reza@gwu.edu or 202-994-6359.
Title: Making Sense of Census Data via the World Wide Web: A Case Study Using the 1997 Census of Agriculture
- Speaker: Irwin Anolik, USDA - National Agricultural Statistics Service - Research and Development Division (ianolik@nass.usda.gov)
- Time: Tuesday, April 15, 2003, 10:30 - 11:30 a.m.
- Location: U.S. Bureau of the Census, 4700 Silver Hill Road, Suitland, Maryland - the Morris Hansen Auditorium, Bldg. 3. Please call (301) 457-4974 to be placed on the visitors' list. A photo ID is required for security purposes.
- Sponsor: U.S. Bureau Of Census, Statistical Research Division
Abstract:
This seminar demonstrates ways of enabling our data customers to freely and effectively access and analyze NASS data using a web browser connected to the Internet. We focus on using data from the 1997 Census of Agriculture to demonstrate methods of display and analysis that give a better understanding of inherent patterns and structure in the data. Specifically, we provide the ability to view, analyze, and dynamically interact with summary data at the state and county level. Historically, our customers have had access to this data in tabular form only.
We discuss the relevant concepts and technologies that we considered, and the selection of specific solutions that we currently implement on the NASS web site to give data customers the abilities discussed above. We also discuss our attempt to design a web site so that information of interest to large numbers of customers is easy to access.
While we focus on data from the 1997 Census of Agriculture, the principles and methods discussed can be applied to other sources of survey and census data.
This seminar is physically accessible to persons with disabilities. For TTY callers, please use the Federal Relay Service at 1-800-877-8339. This is a free and confidential service. Requests for sign language interpreting services or other auxiliary aids should be directed to Yvonne Moore at (301) 457-2540 text telephone (TTY), 301-763-5113 (voice mail), or by e-mail to S.Yvonne.Moore@census.gov.
Title: Interviewer Falsification and Scientific Misconduct
- Speaker: Bob Groves, Institute for Social Research, University of Michigan
- Time: Tuesday, April 15, 2003, 12:30 pm - 2:00 pm
- Location: Bureau of Labor Statistics, Postal Square Building (PSB), Conference Center, Conference Rooms 1 and 2, 2 Massachusetts Ave., N.W., Washington, D.C. Please use the First Street entrance to the PSB.
- Sponsor: Data Collection Methods Section, WSS; AAPOR-DC
Abstract:
The Office of Research Integrity of the Department of Health and Human Services recently ruled that interviewer falsification was an act of scientific misconduct. A recent meeting of survey researchers concerned with this issue reviewed the literature on interviewer falsification, and reviewed alternative personnel actions reacting to falsification. It drafted a proposed statement on current best methods for dealing with interviewer falsification. This talk reviews the ingredients of the statement and seeks input from participants.
Title: Quality Review in Sampling Administrative Records
- Speakers: Wendy Rotz and Eric Falk, Ernst & Young LLP
- Discussant: Jeri Mulrow, National Science Foundation
- Chair: Alan Jeeves, Bureau of Transportation Statistics
- Date/Time: Thursday, April 24, 2003, 12:30 - 2:00 p.m.
- Location: Bureau of Labor Statistics, Postal Square Building (PSB), Conference Center, Conference Room 10, 2 Massachusetts Ave., N.W., Washington, D.C. Please use the First Street entrance to the PSB.
- Sponsor: WSS Quality Assurance and Physical Sciences Section
Abstract:
Sand traps and sanity checks in sampling will be discussed, from creating a sampling population file through sample design, selection, and estimation. Samplers rely heavily upon electronic data files to produce their sampling frames, but how reliable are those data? What are some simple yet effective checks to avoid pitfalls along the way? When sample results are returned to a statistician for estimation, what could have gone wrong while non-statisticians handled the data? There are errors that are far too easy to make. How can they be caught or even avoided? The Ernst & Young Quantitative Economics and Statistics Group's quality review checks for sampling engagements will be presented.
Title: Statistical Disclosure Limitation: Releasing Useful Data for Statistical Analysis
- Presenter: Stephen Fienberg, Department of Statistics, Center for Automated Learning and Discovery and Center for Computer and Communications Security, Carnegie Mellon University
- Discussant: Nancy Kirkendall, Director, Statistics and Methods Group, Energy Information Administration, US Department of Energy
- Date/Time: Monday, April 28, 2003, 1:30 - 3:30 pm
- Location: US Department of Transportation, Nassif Building, 400-7th St., SW, Room 332.
- Sponsor: Bureau of Transportation Statistics
Abstract:
Disclosure limitation has often been viewed by statistical agencies solely as a mechanism for "protecting" confidentiality, and not in terms of providing data that are useful for statistical analysis. A true statistical approach to disclosure limitation needs to assess the tradeoff between preserving confidentiality and the usefulness of the released data, especially for inferential purposes. In this presentation we discuss these issues, illustrate them with some recent methods for categorical data, and describe some of the research challenges that remain.
Title: Overview of Statistical Disclosure Methodology for Microdata
- Presenter: Laura Zayatz, Disclosure Limitation Research Group Leader, US Census Bureau
- Date/Time: Tuesday, April 29, 2003, 1-2 pm
- Location: US Department of Transportation, Nassif Building, 400-7th St., SW, Room 3328.
- Sponsor: Bureau of Transportation Statistics
Abstract:
An overview of methods used to protect microdata will be presented. Certain methods will be highlighted and explored in more detail as they are applied to Census Bureau microdata.
Title: Developing Information Quality Guidelines in a Political Environment
- Speaker: Patrick E. Flanagan, Bureau of Transportation Statistics
- Discussant: Jay Casselberry, Energy Information Administration
- Chair: Eugene M. Burns, Bureau of Transportation Statistics
- Date/Time: Tuesday, May 13, 2003, 12:30-2:00 PM
- Location: Bureau of Labor Statistics, Postal Square Building (PSB), Conference Center, Conference Room 9, 2 Massachusetts Ave., N.W., Washington, D.C. Please use the First Street entrance to the PSB.
- Sponsor: WSS Quality Assurance and Physical Sciences Section
Abstract:
he "Data Quality Act" is a recent law in the United States requiring every federal agency in the U. S. Government to produce information quality guidelines. At the U. S. Department of Transportation, we wrote guidelines to comply with the Act, improve data quality, and create a consistency across our data systems. In the process of doing this, we encountered some realities of implementation that we had to address if we were to have a realistic chance of achieving the data quality goals. This presentation is about some of the choices we made and the tool we used to help make the choices.
Title: Collecting Sensitive Data Using the 3-Card Method
- Speakers: Judy Droitcour, General Accounting Office
Nathan Anderson, General Accounting Office
- Discussant: Fritz Scheuren, National Opinion Research Center
- Organizer: Jonaki Bose, Bureau of Transportation Statistics
- Date: Wednesday, May 14, 2003
- Location: Bureau of Labor Statistics, Postal Square Building (PSB), Conference Room 9, 2 Massachusetts Ave., N.W., Washington, D.C. Please use the First Street entrance to the PSB. Use the Red Line to Union Station.
- Sponsor: Data Collection Methods Section, WSS
Abstract:
Collecting sensitive information on, for example, illegal immigration or elder abuse can provide important inputs to the policy process, but is challenging because of privacy issues and the potential for response bias.
The three-card method is a survey-based indirect estimation technique that assures absolute privacy of response. No one (not the interviewer, data analyst, principal investigator, or anyone else) can ever know whether a respondent is in the sensitive category, based on his or her responses. Yet when all data are combined, an estimate of the proportion of individuals in the sensitive category is possible. This method was initially designed to: 1) avoid the "mind-boggling" procedures in a randomized response interview; 2) allow follow-up questions; and 3) estimate all answer categories, including the sensitive category, for the total population and major subgroups. The three-card method was originally designed to estimate all categories of immigration status, including the sensitive illegal category. Recent developments include (1) separate estimation of visa overstays within the illegal immigrant category (as this group is of special interest in the post-9/11 environment), and (2) elder abuse (a topic of growing concern as baby boomers age).
The first paper (by Judith Droitcour) discusses the general method, estimation of visa overstays, and the variance associated with a three-card estimate. The second paper (by Nathan Anderson) will present a potential application in the area of elder abuse and plans for new work. Fritz Scheuren will discuss the presentations.
Title: Statistical Disclosure at the National Center for Health Statistics (NCHS)
- Presenter: Alvan Zarate, Confidentiality Officer, NCHS
- Date/Time: Tuesday, May 20, 2003, 10-11 am
- Location: US Department of Transportation, Nassif Building, 400-7th St., SW, Room 6200.
- Sponsor: Bureau of Transportation Statistics
Abstract:
This seminar describes how statistical disclosure limitation methods are implemented at the National Center for Health Statistics. The roles that a Disclosure Review Board and a Confidentiality Officer play in a Federal statistical agency will be highlighted.
Title: Computer-automated Constructive Data Mining with PcGets
- Speaker: Neil Ericsson, FRB
- Discussant: Keith Ord, GU
- Moderator: Charlie Hallahan, ERS/USDA
- Place: BLS Conference Center, Room 1
- Date: May 20, 2003, 12:30-2:00
- Sponsor: Statistical Computing Section
Abstract:
Hoover and Perez (1999, Econometrics Journal) advocate a constructive approach to data mining. The current paper identifies four pejorative senses of data mining and shows how Hoover and Perez's approach counters each. To assess the benefits of constructive data mining, the current paper applies a data mining algorithm (PcGets) similar to Hoover and Perez's to a dataset for Venezuelan consumers' expenditure. The selected model is economically sensible and statistically satisfactory; and it illustrates how data can be highly informative, even with relatively few observations. Limitations to algorithmically based data mining provide opportunities for the researcher to contribute value added in the empirical analysis.
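For readers unfamiliar with the general-to-specific idea behind such algorithms, here is a bare-bones sketch. It is not PcGets: it simply starts from a general regression with irrelevant regressors and repeatedly drops the least significant one until all remaining regressors are significant at an assumed 5% level, using simulated data.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(21)

# Simulated data: two relevant regressors, eight irrelevant ones (assumptions)
n, k_relevant, k_noise = 120, 2, 8
X = rng.normal(size=(n, k_relevant + k_noise))
y = 1.0 + 0.8 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 1, n)

cols = [f"x{i}" for i in range(k_relevant + k_noise)]
Xdf = sm.add_constant(X)
keep = list(range(Xdf.shape[1]))          # column indices currently retained

while len(keep) > 1:
    res = sm.OLS(y, Xdf[:, keep]).fit()
    pvals = res.pvalues[1:]                # never consider dropping the constant
    worst = int(np.argmax(pvals))
    if pvals[worst] < 0.05:                # everything remaining is significant
        break
    del keep[worst + 1]                    # +1 because keep[0] is the constant

res = sm.OLS(y, Xdf[:, keep]).fit()
selected = ["const" if j == 0 else cols[j - 1] for j in keep]
print("retained regressors:", selected)
print("p-values:", np.round(res.pvalues, 3))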
Title: Ethnographic Methods for Understanding the Response Process in Demographic and Establishment Surveys: The Concept of Ecological Validity
- Speaker: Tony Hak, Erasmus University, Rotterdam, The Netherlands; ASA/NSF Research Fellow, Establishment Survey Methods Staff; Email: Antonie.hak@census.gov
- Date/Time: May 28, 2003, 10:30 - 11:30 a.m.
- Location: U.S. Bureau of Census, 4700 Silver Hill Road, Suitland, Maryland - the Morris Hansen Auditorium, FOB 3. Please call (301) 763-4974 to be placed on the visitors' list. A photo ID is required for security purposes.
- Sponsor: U.S. Bureau Of Census, Statistical Research Division
Abstract:
The Census Bureau has used ethnographic research for over thirty years to provide an understanding of unit and item non-response and of data error in the decennial census and in demographic surveys. The methods used comprise a wide range of qualitative techniques. Participant observation has been used in some studies, but in-depth interviewing has been used more frequently, often combined with a variety of other techniques, including card sorts, vignettes, focus groups and debriefings. Probes about specific wording or concepts from surveys, using techniques appropriate to cognitive interviewing, are also frequently included. The latter demonstrates that the border between ethnographic research and cognitive pre-testing is not clear-cut. Both ethnographic research and cognitive interviewing are forms of qualitative research that are aimed at understanding the response process and use a mixture of observational and interviewing techniques.
In this presentation I will discuss methodological criteria for evaluating the different qualitative techniques that are commonly used. I will discuss how different techniques yield different results and why this matters. One of the main criteria I will discuss is 'ecological validity' which refers to the similarity or commonality (in relevant respects) between the research context (such as the interview situation, the context of observation, etc.) and the research topic (i.e., in this case, the response process). I will show how different techniques can be ranked according to this criterion and to other methodological criteria (such as reliability and generalizability) and practical concerns (such as cost-effectiveness). Finally I will address the trade-offs involved in balancing these practical and methodological criteria.
This program is physically accessible to persons with disabilities. For interpreting services, contact Yvonne Moore at TTY 301-457-2540, 301-457-2853 (voice mail), or Sherry.Y.Moore@census.gov.
Title: Using Administrative Records and Follow-Up Interviews to Verify Household Survey Responses
- Speaker: Jerry West, National Center for Education Statistics
- Organizer: Jonaki Bose, Bureau of Transportation Statistics
- Date: Thursday, June 5, 2003
- Time: 12:30 to 2 pm
- Location: Bureau of Labor Statistics, Conference Room 7. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
- Sponsor: Data Collection Methods Section, WSS
Abstract:
Household survey data are often used to estimate the participation of different population groups (e.g., preschool and school-age children, adults) in a wide range of programs and activities. Yet, quite often, the validity of the survey data has not been directly assessed because of the cost of conducting studies to verify survey responses and the difficulties of implementing such studies. This paper examines two approaches for evaluating the quality of household survey data. Both approaches use a follow-up survey with the individual or organization identified by the household respondent as providing a program or service. One approach uses a follow-up survey in combination with an on-line directory of service providers built from administrative records data.
The two approaches were implemented by the U.S. Department of Education, National Center for Education Statistics in its newest longitudinal study of young children - the Early Childhood Longitudinal Study. The paper will use the findings from these approaches 1) to assess the quality of the data provided by household respondents and 2) to evaluate the promising features and shortcomings of each approach for verifying household survey responses. The implications of these findings for other surveys will be discussed.
Topic: The Second Seminar on the Funding Opportunity In Survey Research
- Organizer: Research Subcommittee of the Federal Committee On Statistical Methodology (Robert Fay; robert.e.fay.iii@census.gov; Monroe Sirken; mgs2@CDC.gov)
- Sponsors: Washington Statistical Society and Washington DC / Baltimore Chapter of AAPOR
- Date/Time: Monday, June 9, 2003, 9:00 AM - 4:00 PM (NOTE SPECIAL TIME)
- Location: Bureau of Labor Statistics, Conference and Training Center, Rooms 1, 2, and 3, Postal Square Building (PSB), 2 Massachusetts Avenue, NE, Washington, DC. Please use First St., NE entrance (across from Union Station).
Abstract:
In 1998, a consortium of 12 Federal statistical agencies in collaboration with the Methodology, Measurement and Statistics Program, National Science Foundation, and with the support of the Federal Committee on Statistical Methodology initiated a grants program to fund basic survey and statistical research oriented to the needs of Federal agencies. Reports of the principal investigators of the 4 research projects funded during cycle 1 of the Program in 1999 were featured at a first Funding Opportunity Seminar held in Washington during June 2001.
The Second Funding Opportunity Seminar will feature the reports of principal investigators of the 4 projects that were funded in 2001, cycle 2 of the program: 1. "Bayesian Methodology for Disclosure Limitation and Statistical Analysis of Large Government Surveys" by Rod Little and Trivellore Raghunathan; 2. "Visual and Interactive Issues in the Design of Web Surveys" by Roger Tourangeau, Mick Couper, Reginald Baker, and Fred Conrad; 3. "Robust Small Area Estimation Based on a Survey Weighted MCMC Solution for the Generalized Linear Mixed Model" by Ralph Folsom and Avinash Singh; and 4. "Small Area and Longitudinal Estimation Using Information from Multiple Surveys" by Sharon Lohr. Federal agency statisticians and survey methodologists will be discussants at each session.
There will be 3 morning and 3 afternoon sessions. The Introductory Session will be "The Origins of the Funding Opportunity", and the Concluding Session will be "The Benefits and Challenges of the Funding Opportunity."
There will be a continental breakfast, and refreshments at midmorning and afternoon breaks, so we need preliminary counts of the number of attendees. If you plan to attend, please contact Pat Drummond by May 5, 2003, at Pdrummond@CDC.gov or 301-458-4193. Also, by noon on June 8, either e-mail wss_seminar@bls.gov or call 202-691-7524 and give your name, affiliation, and the name of the seminar you are attending. Finally, bring a photo ID.
Topic: The Planning Database: Its Potential Use in Current Surveys and Census 2010 Planning
- Speakers: Antonio Bruce and J. Gregory Robinson, Population Division
- Date/Time: June 11, 2003, 10:30 - 11:30 a.m.
- Location: U.S. Bureau of the Census, 4700 Silver Hill Road, Suitland, Maryland - the Morris Hansen Auditorium, FOB 3
- Sponsor: U.S. Bureau Of Census, Statistical Research Division
Summary:
This presentation will provide an overview of the Planning Database (PDB). The PDB and associated Hard-to-Count Scores were successfully used in the planning, implementation, and evaluation of Census 2000. We will give some specific examples of PDB applications for 2000. The PDB has proved to be a highly effective targeting tool and this capability can be exploited in ongoing Census Bureau programs, including planning for the 2010 Census. First on the agenda is updating the PDB with Census 2000 results.
Background:
Using 1990 census data at the tract level, the PDB assembled a range of housing, demographic, and socioeconomic variables that are correlated with nonresponse and undercounting. The database provided a systematic way to identify potentially difficult-to-enumerate areas that were flagged for special attention in Census 2000. The PDB was provided to all regional offices and Local Census Offices in Census 2000, and the "Hard-to-Count" scores were used in planning the areas in which to place Questionnaire Assistance Centers and Be Counted Forms. The PDB was used for other purposes as well, such as the "real-time" demographic analysis of mail response rates during the critical mail phase in 2000. We also illustrated the potential of the PDB for targeting areas with concentrations of non-English speakers and profiling the specific languages. The variables included in the Planning Database were guided by extensive research conducted at the Census Bureau and by other researchers to measure the undercount and to identify reasons why people are missed. These variables include housing indicators (percent renters, multiunits, crowded housing, lack of telephones, vacancy), person indicators (poverty, not high school graduate, unemployed, complex household, mobility, language isolation), and other operational and demographic data (such as nonresponse rates and race/ethnic distributions).
The PDB contains Hard-to-Count (HTC) scores, which provide a systematic way to summarize the attributes of each tract in terms of enumeration difficulty. A set of algorithms is used to derive the HTC score. The comparative standing of areas provides an indicator of the degree of difficulty: areas with the highest scores are likely to be the areas with relatively high nonresponse and undercount, while areas with the lowest scores are likely to be areas with low rates. The high correlation of HTC scores and nonresponse rates is empirically illustrated for both 1990 and 2000. In short, the PDB and associated HTC scores provided good predictions of difficult-to-enumerate areas in 2000. Our vision is an innovative PDB that would merge census, survey/ACS, and administrative data to provide a current and highly defined targeting database for use in ongoing surveys and 2010 census planning.
This seminar is physically accessible to persons with disabilities. For TTY callers, please use the Federal Relay Service at 1-800-877-8339. This is a free and confidential service. Requests for sign language interpreting services or other auxiliary aids should be directed to Yvonne Moore at (301) 457-2540 text telephone (TTY), 301-763-5113 (voice mail), or by e-mail to Sherry.Y.Moore@census.gov.
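As an aside on the Hard-to-Count scores described in the background above, the sketch below shows one simple way such a composite score could be formed: each tract indicator is converted to a decile rank and the ranks are summed. This is my own simplification for illustration, not the Census Bureau's actual algorithm, and the indicators and data are made up.

import numpy as np
import pandas as pd

rng = np.random.default_rng(2000)

# Toy tract-level planning database; higher values of each indicator are taken to mean harder to count
tracts = pd.DataFrame({
    "pct_renters": rng.uniform(5, 95, 500),
    "pct_poverty": rng.uniform(0, 50, 500),
    "pct_no_phone": rng.uniform(0, 20, 500),
    "pct_linguistic_isolation": rng.uniform(0, 30, 500),
})

# Decile rank (0-9) within each indicator, then sum into an HTC-style composite score
deciles = tracts.rank(pct=True).mul(10).clip(upper=9.999).astype(int)
tracts["htc_score"] = deciles.sum(axis=1)

print(tracts.sort_values("htc_score", ascending=False).head())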
Title: The OMB's Interagency Confidentiality and Data Access Committee (CDAC)
- Presenters: Jacob Bournazian, Chair of CDAC (2002-2003), Energy Information Administration, US Department of Energy, and Mark Schipper, CDAC Member, Energy Information Administration, US Department of Energy
- Date/Time: Wednesday, June 11, 2003, 12:00 - 1 pm
- Location: US Department of Transportation, Nassif Building, 400-7th St., SW, Room 6244.
- Sponsor: Bureau of Transportation Statistics
Abstract:
The activities of CDAC will be described. The presentation will highlight products created by CDAC members, including its "Checklist on the Disclosure Potential of Proposed Data Releases" and the auditing software for tabular products.
Title: Statistical and Operational Issues in Sampling Race and Ethnic Groups for an RDD Survey
- Speakers: Sherman Edwards and Ismael Flores-Cervantes, Westat
- Date/Time: Tuesday, June 17, 2003, 12:30 - 2:00 p.m.
- Location: Bureau of Labor Statistics, Postal Square Building (PSB), Conference Center, Conference Room 6**, 2 Massachusetts Ave., N.W., Washington, D.C. Please use the First Street entrance to the PSB.
- Sponsor: WSS Methodology Section
Abstract:
A number of different methods for sampling rare populations have been considered, but in random digit dial (RDD) telephone surveys these methods usually are costly or have serious statistical problems. In the 2001 California Health Interview Survey (CHIS), separate estimates were desired for the following rare populations: five Asian subgroups (Asian Indian, Cambodian, Japanese, Korean, and Vietnamese), American Indian and Alaska Natives, and Latinos in a particular county. Each of the subgroups posed operational obstacles, including interviewing language and culturally appropriate interviewing techniques. This paper describes the methods used to oversample these groups, the operational procedures used to deal with interviewing members of these groups, and the statistical estimation schemes that were used to provide estimates for each group. The key idea is to supplement the RDD sample with samples drawn from special lists. We evaluate the efficiency of the method and related procedures and provide some general suggestions for oversampling rare groups in RDD surveys.
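A minimal sketch of the list-supplement idea (an illustration of the general approach, not the CHIS estimation scheme): units on a special list can be reached from either the RDD frame or the list frame, so their combined inclusion probability is used to form weights for estimating a rare-group mean. All population values and sampling rates are assumptions.

import numpy as np

rng = np.random.default_rng(17)

N = 200_000
rare = rng.random(N) < 0.02                     # rare subgroup, 2% of population (assumed)
on_list = rare & (rng.random(N) < 0.6)          # special list covers ~60% of the rare group (assumed)
y = np.where(rare, rng.normal(5, 1, N), rng.normal(2, 1, N))

p_rdd, p_list = 0.005, 0.10                     # assumed sampling rates in each frame
# Overall inclusion probability: selected if reached by either frame
pi = 1 - (1 - p_rdd) * np.where(on_list, 1 - p_list, 1.0)

selected = rng.random(N) < pi
w = 1.0 / pi[selected]

rare_in_sample = rare[selected]
est_rare_mean = np.average(y[selected][rare_in_sample], weights=w[rare_in_sample])
print(f"true rare-group mean:  {y[rare].mean():.3f}")
print(f"weighted estimate:     {est_rare_mean:.3f}")
print(f"rare cases in sample:  {int(rare_in_sample.sum())}")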
Title: Nonparametric Multi-step Ahead Prediction in Time Series Analysis
- Speaker: Lijian Yang, Michigan State University
- Authors: Rong Chen, Lijian Yang, and Christian Hafner
- Chair: Stuart Scott, Bureau of Labor Statistics
- Date/Time: Wednesday, June 18, 2003, 12:30 to 2:00 p.m.
- Location: Bureau of Labor Statistics, Postal Square Building (PSB), Conference Center, Conference Room 3, 2 Massachusetts Ave., NE, Washington, DC. Please use the First Street entrance to the PSB.
- Sponsor: WSS Methodology Section
Abstract:
We consider the problem of multi-step ahead prediction in time series analysis using nonparametric smoothing techniques. Forecasting is always one of the main objectives in time series analysis. Research has shown that nonlinear time series models have certain advantages in multi-step ahead forecasting. Traditionally, nonparametric k-step ahead least squares prediction for nonlinear AR(d) models is done by forecasting X_{t+k} via nonparametric smoothing of X_{t+k} on the variables (X_{t},…,X_{t-d+1}) directly. In this paper we propose a multi-stage nonparametric predictor. We show that the new predictor has smaller asymptotic mean squared error than the direct smoother, though the convergence rate is the same. Hence, the proposed predictor is more efficient. Some simulation results, advice for practical bandwidth selection and a real data example are provided.
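The setting can be illustrated with a toy example (my own, not the paper's estimator): two-step-ahead prediction for a nonlinear AR(1) series, comparing the direct kernel smoother of X_{t+2} on X_t with a simple plug-in predictor that iterates the one-step smoother. The model, bandwidth, and sample size are assumptions.

import numpy as np

rng = np.random.default_rng(42)

def nw(x0, x, y, h):
    """Nadaraya-Watson estimate of E[y | x = x0] with a Gaussian kernel."""
    w = np.exp(-0.5 * ((x0 - x) / h) ** 2)
    return np.sum(w * y) / np.sum(w)

# Nonlinear AR(1): X_{t+1} = sin(X_t) + noise (assumed model)
n = 1000
x = np.zeros(n)
for t in range(1, n):
    x[t] = np.sin(x[t - 1]) + rng.normal(0, 0.3)

h = 0.3
x_t, x_t1, x_t2 = x[:-2], x[1:-1], x[2:]

direct, two_stage, truth = [], [], []
for x0 in np.linspace(-1, 1, 21):
    direct.append(nw(x0, x_t, x_t2, h))          # smooth X_{t+2} on X_t directly
    step1 = nw(x0, x_t, x_t1, h)                 # one-step prediction
    two_stage.append(nw(step1, x_t, x_t1, h))    # feed it back into the one-step smoother
    truth.append(np.sin(np.sin(x0)))             # approximate conditional mean, ignoring the noise term

direct, two_stage, truth = map(np.array, (direct, two_stage, truth))
print(f"direct    MSE: {np.mean((direct - truth) ** 2):.5f}")
print(f"two-stage MSE: {np.mean((two_stage - truth) ** 2):.5f}")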
Topic: Does Credit Quality Matter For Homeownership?
- Speaker: Paul Calem, Federal Reserve Board
- Discussant: Darryl Getter, HUD
- Chair: Linda Atkinson, Economic Research Service
- Date/Time: June 19, 2003, 12:30 pm - 2:00 pm
- Location: Bureau of Labor Statistics, Conference Center Room 9, Postal Square Building (PSB), 2 Massachusetts Ave. NE, Washington, D.C. Please use the First St., NE, entrance to the PSB.
- Sponsor: Economics Section
Abstract:
While there has been considerable research empirically quantifying and simulating the role of borrowing constraints on homeownership rates, the primary focus of this work has been on measuring the relative importance of income and wealth constraints with respect to ownership outcomes. A lack of data on household credit ratings has precluded evaluation of credit quality as a potential barrier to homeownership. This paper overcomes the data problem by deriving a pseudo credit score for each respondent in the Survey of Consumer Finances. This is accomplished utilizing a separate, special sample of individual credit records from which we develop a score imputation equation. Thus, we empirically estimate tenure outcome equations including estimates of household credit quality along with other financial constraints to advance our understanding of how and why such constraints matter in homeownership.
The role of financing constraints also is of interest to academic researchers and policy analysts seeking to understand recent homeownership trends and design policies that may influence future trends. Although homeownership rates increased over the 1990s (from 64% to an historic high of 67%), there is policy interest in further expanding access to homeownership. A second contribution of this paper is to examine the changing role of financial constraints over time, drawing inferences about the possible impact of recent institutional changes in the mortgage market.
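The two-step idea of imputing a credit score and then using it in a tenure equation can be sketched as follows (my own simulated construction, not the paper's data or estimates): a score equation is fit on an auxiliary sample where scores are observed, scores are imputed for survey households, and the imputed score enters a logistic own-versus-rent equation.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)

# Auxiliary sample (assumed): credit score observed along with income and age
n_aux = 2000
aux = np.column_stack([rng.normal(50, 15, n_aux),   # income (thousands)
                       rng.normal(45, 12, n_aux)])  # age
score = 600 + 2.0 * aux[:, 0] + 1.0 * aux[:, 1] + rng.normal(0, 30, n_aux)
impute_fit = sm.OLS(score, sm.add_constant(aux)).fit()

# Survey sample (assumed): no score observed; impute it from the auxiliary equation
n_svy = 3000
svy = np.column_stack([rng.normal(50, 15, n_svy), rng.normal(45, 12, n_svy)])
pseudo_score = impute_fit.predict(sm.add_constant(svy))

# Simulated tenure outcome that depends on income and credit quality
latent = -6 + 0.03 * svy[:, 0] + 0.006 * pseudo_score + rng.logistic(size=n_svy)
own = (latent > 0).astype(int)

# Tenure (own vs. rent) equation including the pseudo credit score
X = sm.add_constant(np.column_stack([svy[:, 0], pseudo_score]))
tenure_fit = sm.GLM(own, X, family=sm.families.Binomial()).fit()
print(tenure_fit.params)   # intercept, income, pseudo credit score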
Washington Statistical Society President's Invited Address
This year's Washington Statistical Society President's Invited Address is in memory of Charles (Chip) Alexander Jr. It will start at 3:00 p.m. and end at 4:30 p.m. on Wednesday, June 25, 2003. The venue is the Bureau of Labor Statistics Conference Center, Rooms 1 and 2. A reception will follow.
The speakers are Graham Kalton of Westat and Cynthia Z.F. Clark of the U.S. Census Bureau. The title of Dr. Kalton's presentation is "Small Domain Estimates: Challenges and Solutions" and the title of Cynthia Z.F. Clark's presentation is "Tribute to Charles (Chip) Alexander Jr.: Chip's Contributions to the Federal Statistical System and Sample Survey Methods." The chair is Nancy M. Gordon of the U.S. Census Bureau and the organizer is Alan R. Tupek of the U.S. Census Bureau.
The abstract for the presentation on small domain estimates is as follows:
The continually increasing demand for timely estimates for small geographic and other domains presents survey statisticians with significant challenges. This talk will review possible solutions, including censuses and large-scale surveys, rolling samples, combining data across time and across surveys, methods for oversampling small domains, and statistical modeling methods.
Topic: Design Issues in Electronic Business Surveys
- Speakers: Elizabeth Nichols and Elizabeth Murphy, Statistical Research Division, U.S. Census Bureau
Kent Norman, Anna Rivadeneira, and Cyntrica Eaton, University of Maryland
- Date/Time: Wednesday, June 25, 2003, 10:30 a.m. - 12:00 Noon
- Location: U.S. Bureau of the Census, 4700 Silver Hill Road, Suitland, Maryland - the Morris Hansen Auditorium, FOB 3
- Sponsor: U.S. Bureau Of Census, Statistical Research Division
Abstract:
Do different electronic questionnaire designs affect data accuracy and respondent burden? This seminar presents results from a mock-business survey experiment conducted jointly by the U.S. Census Bureau's Usability Laboratory and the University of Maryland's Laboratory for Automation Psychology and Decision Processes. In this experiment, we compared respondent accuracy and burden for different questionnaire designs within an electronic survey.
We investigated the following design issues:
- Using automated summation
- Presenting questions in a grid format
- Using different response options for "choose one" questions
- Navigating through long lists of items
- Asking for figures in specific reporting units
- Using format techniques for text fields
Each of these issues arises in designing actual Census Bureau economic surveys and censuses. We will discuss the findings and their implications for designing electronic questionnaires.
This seminar is physically accessible to persons with disabilities. For TTY callers, please use the Federal Relay Service at 1-800-877-8339. This is a free and confidential service. Requests for sign language interpreting services or other auxiliary aids should be directed to Yvonne Moore at (301) 457-2540 text telephone (TTY), 301-763-5113 (voice mail), or by e-mail to Sherry.Y.Moore@census.gov.
Return to top
Title: Graphical Analysis To Unmask Hidden Performance Differences
- Speakers: Susan Garille Higgins and Ru Sun, Ernst & Young
- Discussants: Fritz Scheuren, NORC, and Ed Mulrow, PricewaterhouseCoopers
- Chair: Mary Batcher, Ernst & Young
- Date/Time: Thursday, June 26, 2003, 12:30 p.m. - 2:00 p.m.
- Location: Bureau of Labor Statistics, Postal Square Building (PSB), Conference Center, Conference Room 9, 2 Massachusetts Ave., N.W., Washington DC. Please use the First Street entrance to the PSB.
- Sponsor: WSS Methodology Section
Abstract:
Colin Mallows' paper "Parity: Implementing the Telecommunications Act of 1996," which recently appeared with discussion in Statistical Science (2003), notes that the implementation of the Telecommunications Act of 1996 has given rise to many challenging statistical questions. The Mallows article then describes how several of these problems were successfully attacked. As is often the case in hard real-life situations, however, many issues remain open and continue to receive attention.
One of the continuing issues is the possibility that aggregation over service subgroups can lead to increased heterogeneity and may, as a result, mask potentially important differences in performance. By heterogeneity we mean a systematic tendency for relative performance to be better for one subset of transactions than for another subset. In this talk we focus on methodological issues and provide novel analytic and graphical displays that allow the issue of performance heterogeneity to be addressed in a telecommunication context.
The analysis results are clearly presented in "interval plots." These plots are carefully designed to expose when heterogeneity and masking are present and the consistency with which they occur from month to month. We see clearly how visualization can help in data analysis. We also present ways to visually demonstrate the variability in the data.
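For readers who want a concrete picture, here is one hypothetical rendering of a monthly interval display for two service subgroups, written in Python with matplotlib. The data, subgroup labels, and plotting choices are invented for illustration and are not the authors' interval plots; the point is only that plotting per-subgroup intervals month by month can expose heterogeneity that an aggregated series would hide.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical monthly performance differences (e.g., a repair-time gap)
# for two service subgroups, with standard errors.  Plotting the subgroup
# intervals side by side, month by month, makes heterogeneity visible that
# a single aggregated series would mask.
months = np.arange(1, 13)
diff_a, se_a = np.full(12, 0.5), np.full(12, 0.3)   # subgroup A: near parity
diff_b, se_b = np.full(12, 2.0), np.full(12, 0.4)   # subgroup B: worse service

fig, ax = plt.subplots()
for offset, (d, se, label) in enumerate([(diff_a, se_a, "subgroup A"),
                                         (diff_b, se_b, "subgroup B")]):
    ax.errorbar(months + 0.15 * offset, d, yerr=1.96 * se, fmt="o", label=label)
ax.axhline(0.0, color="grey", linestyle="--")       # parity reference line
ax.set_xlabel("month")
ax.set_ylabel("performance difference")
ax.legend()
plt.show()
```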
Return to top
Title: Parental Reports of Children's Race and Ethnicity in a National Longitudinal Cohort Study
- Speaker: Jonaki Bose, Bureau of Transportation Statistics
- Date/Time: Tuesday, July 1, 2003, 12:30 - 2:00 p.m.
- Location: Bureau of Labor Statistics, Postal Square Building (PSB), Conference Center, Conference Room 10*, 2 Massachusetts Ave., N.W., Washington, D.C. Please use the First Street entrance to the PSB.
- Sponsor: WSS Methodology Section
Abstract:
A number of federal agencies have changed the way they ask about race and ethnicity in ongoing and new surveys of children and families. These changes are in response to new standards issued by OMB's Office of Information and Regulatory Affairs. The Early Childhood Longitudinal Study, Kindergarten Class of 1998-99 (ECLS-K), which began tracing the educational experiences and outcomes of a national sample of kindergartners in the fall of 1998, asks parents to report separately about their own race and ethnicity, as well as their children's race and ethnicity, using one or more racial designations. This paper examines the following topics:
- What is the racial and ethnic distribution of the children in the ECLS-K and of kindergarten children in the U.S.?
- How many children are identified as multiracial by their parents and what is the racial breakdown of these children?
- What is the relationship between parental race and children's race?
- How do children identified as multiracial by their parents compare to other children who have a single race? Are there any differences in the sociodemographic characteristics of these children or in their beginning school skills?
- Do parents consistently report their child's race(s) over multiple rounds of data collection?
- Are there particular operational issues or difficulties associated with collecting and processing data on race and ethnicity when respondents are given the opportunity to identify more than one race for themselves and their children?
Title: Restricted Data Access: The Role of Research Data Centers (RDCs)
- Presenters: Arnold Reznek, RDC Director, US Census Bureau,
Wilbur Hadden, National Center for Health Statistics (NCHS), and
Vijay Gambhir, RDC, NCHS
- Date/Time: Wednesday, July 23, 2003, 1:30 - 2:30 pm
- Location: US Department of Transportation, Nassif Building, 400-7th St., SW, Room 6200.
- Sponsor: Bureau of Transportation Statistics
Abstract:
The role that RDCs play in providing access to confidential data at Federal statistical agencies is discussed. This seminar will review the practices of two agencies that have RDCs. The first speaker will describe the development of the RDCs at the Census Bureau. Then the RDC at NCHS will be described, and the agency's remote access procedures for gaining access to confidential data will be highlighted.
Return to top
Topic: Multiple Outputation: Inference for Complex Clustered Data by Averaging Analyses from Independent Data
- Speaker: Dean Follmann, PhD
Dr. Follmann is Assistant Institute Director for Biostatistics and Chief of the Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, National Institutes of Health, in Bethesda, Maryland.
- Date and Time: Wednesday, September 3, 2003, 11:00 am
- Location: Executive Plaza North, Conference Room G, 6130 Executive Boulevard, Rockville, Maryland
- For Additional Information: Contact the Office of Preventive Oncology (301) 496-8640
Abstract:
This talk describes a simple method for settings where one has clustered data, but statistical methods are only available for independent data. We assume the statistical method provides us with a normally distributed estimate and an estimate of its variance. We randomly select a data point from each cluster and apply our statistical method to this independent data. We repeat this multiple times, and use the average of the estimates as our overall estimate. An estimate of the variance is given by the average of the variance estimates minus the sample variance of the estimates. We call this procedure multiple outputation, as all "excess" data within each cluster are thrown out multiple times. Hoffman, Sen, and Weinberg (2001) introduced this approach for generalized linear models when the cluster size is related to outcome. In this talk we demonstrate the broad applicability of the approach. Applications to angular data, p-values, vector parameters, Bayesian inference, genetics data, and random cluster sizes are discussed.
In addition, asymptotic normality of estimates based on all possible outputations as well as a finite number of outputations is proven given weak conditions.
Multiple outputation provides a simple and broadly applicable method for analyzing clustered data. It is especially suited to settings where methods for clustered data are impractical, but can also be applied generally as a quick and simple tool.
This work is done jointly with Michael Proschan and Eric Leifer.
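The procedure described in the abstract is simple enough to sketch directly. Below is a minimal illustration in Python/NumPy, assuming only that the user supplies the clusters as arrays and an analysis routine that returns an estimate and its variance; all names and the toy example are illustrative, not code from the talk.

```python
import numpy as np

def multiple_outputation(clusters, analyze, n_reps=200, seed=0):
    """Minimal multiple-outputation sketch.

    clusters : list of 1-D arrays, one per cluster
    analyze  : function taking a 1-D array of independent observations and
               returning (estimate, variance_estimate)
    """
    rng = np.random.default_rng(seed)
    ests, vars_ = [], []
    for _ in range(n_reps):
        # keep one randomly chosen observation per cluster; "output" the rest
        subsample = np.array([rng.choice(c) for c in clusters])
        est, var = analyze(subsample)
        ests.append(est)
        vars_.append(var)
    ests, vars_ = np.array(ests), np.array(vars_)
    overall = ests.mean()
    # average of the variance estimates minus the sample variance of the estimates
    overall_var = vars_.mean() - ests.var(ddof=1)
    return overall, overall_var

# Toy use: estimating a mean with its usual variance estimate
def mean_and_var(x):
    return x.mean(), x.var(ddof=1) / len(x)

clusters = [np.random.default_rng(i).normal(5.0, 1.0, size=4) for i in range(50)]
print(multiple_outputation(clusters, mean_and_var))
```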
- Additional POC:
- Grant Izmirlian, PhD
Mathematical Statistician
National Cancer Institute
6130 Executive Blvd, Suite 3131
Bethesda, MD 20892-7354
phone: 301-496-7519
fax: 301-402-0816
email: Izmirlian@nih.gov
Topic: New Technologies and Methodologies at Research Triangle Institute (RTI) International
- Speakers: Dr. Paul Biemer, Dr. Jay Levinsohn, and Mr. William Wheaton, Research Triangle Institute
- Date & Time: September 10, 2003, 10:30 - 11:30 a.m.
- Location: U.S. Bureau of the Census, 4700 Silver Hill Road, Suitland, Maryland - the Morris Hansen Auditorium, FOB 3. Please call (301) 763-4974 to be placed on the visitors' list. A photo ID is required for security purposes.
- Sponsor: U.S. Bureau Of Census, Statistical Research Division
Abstract:
Continuing advancements in technology have the potential to improve the efficiency and quality of survey data collection procedures and operations. In this presentation, RTI staff will discuss ways in which they have operationalized some recent technological advancements to enhance current procedures in survey research. Specifically, they will focus on the use of digital recordings to develop Computer Audio Recorded Interviewing (CARI) for use in field verification, Global Positioning Systems (GPS), and the development of automated testing procedures for Computer-Assisted Instruments (CAI). Examples of how these technologies have been incorporated into RTI's survey operations, results from initial studies, as well as plans for continuing research will be discussed. There will be time designated for questions and answers on these topics as well as other technologies and methodologies currently being used or developed by RTI.
This seminar is physically accessible to persons with disabilities. For TTY callers, please use the Federal Relay Service at 1-800-877-8339. This is a free and confidential service. Requests for sign language interpreting services or other auxiliary aids should be directed to Yvonne Moore at (301) 457-2540 text telephone (TTY), 301-763-5113 (voice mail), or by e-mail to S.Y.Moore@census.gov
Return to top
Title: MASSC: A new data mask for limiting statistical information loss and disclosure
- Speaker: Avi Singh, RTI International
- Discussant: Fritz Scheuren, National Opinion Research Center at the University of Chicago
- Chair: Sameena Salvucci, Synectics for Management Decisions, Inc.
- Date: Tuesday, September 16, 2003, 12:30 - 2:00 p.m.
- Location: Bureau of Labor Statistics, Postal Square Building (PSB), Conference Center, 2 Massachusetts Ave., N.W., Washington, D.C. Please use the First Street entrance to the PSB. Use the Red Line to Union Station.
- Sponsor: Data Collection Methods Section, WSS
Abstract:
We propose a method termed 'MASSC' for statistical disclosure limitation (SDL) of categorical or continuous micro data, while limiting the information loss in the treated database, defined in a suitable sense. The new SDL methodology exploits the analogy between (1) taking a sample (instead of a census), along with some adjustments via imputation for missing information, and (2) releasing a subset, instead of the original data set, along with some adjustments via perturbation for records still at disclosure risk. Survey sampling reduces monetary cost in comparison to a census, but entails some loss of information. Similarly, releasing a subset reduces disclosure cost in comparison to the full database, but entails some loss of information. Thus, optimal survey sampling methods for minimizing cost subject to bias and precision constraints can be used for SDL in providing simultaneous control on disclosure cost and information loss. The method consists of steps of Micro Agglomeration for partitioning the database into risk strata, optimal probabilistic Substitution for perturbation, optimal probabilistic Subsampling for suppression, and optimal sampling weight Calibration for preserving estimates for key outcomes in the treated database.
The proposed method uses a paradigm shift in the practice of disclosure limitation in that the original database itself is viewed as the population and the problem of disclosure by inside intruders is considered. (Inside intruders know the presence of their targets in the database, in contrast to outside intruders.) This new framework has two main features: first, it focuses on the more difficult problem of protecting from inside intruders and, as a result, also protects against outside intruders; second, it provides, in a suitable sense, model-free measures of both information loss and disclosure risk when disclosure treatment is performed by employing known random selection mechanisms for substitution and subsampling. Empirical results are presented to illustrate computation of measures of information loss and the associated disclosure risk for a small data set.
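To make the sampling analogy concrete, here is a tiny Python sketch of just the probabilistic subsampling-and-reweighting step: records in higher-risk strata are retained with lower probability, and retained records are reweighted so that weighted totals remain unbiased for the original database. The data, stratum labels, and retention probabilities are invented, and the agglomeration, substitution, and calibration steps of MASSC are omitted.

```python
import numpy as np

def subsample_and_reweight(values, strata, retain_prob, seed=0):
    """Release a random subset of records, reweighting so that weighted
    totals stay unbiased for the totals in the original database."""
    rng = np.random.default_rng(seed)
    p = np.array([retain_prob[s] for s in strata])   # per-record retention prob.
    keep = rng.random(len(values)) < p               # probabilistic subsampling
    weights = 1.0 / p[keep]                          # inverse-probability weights
    return values[keep], weights

vals = np.arange(100.0)
strata = np.where(vals > 90, "high_risk", "low_risk")
released, w = subsample_and_reweight(vals, strata,
                                     {"high_risk": 0.5, "low_risk": 0.9})
print(vals.sum(), (released * w).sum())   # weighted total ~ original total
```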
Return to top
Title: Preserving Quality and Confidentiality of Tabular Data
- Presenter: Lawrence H. Cox, Associate Director, National Center for Health Statistics
- Date/Time: Wednesday, Sept. 17, 2003, 11:00 am - 12:00 pm
- Location: US Department of Transportation, Nassif Building, 400 7th St., SW, Room 8240.
Abstract:
Standard methods for statistical disclosure limitation (SDL) in tabular data either abbreviate, modify or suppress from publication the true (original) values of tabular cells. All of these methods are based on satisfying an analytical rule selected by the statistical office to distinguish cells and cell combinations exhibiting unacceptable risk of disclosure (the sensitive cells) from those that do not. The impact of these SDL methods on data analytic outcomes is not well-studied but can be shown to be subtle or severe in particular cases. Dandekar and Cox (2002) introduced a method for tabular SDL called controlled tabular adjustment (CTA). CTA replaces the value of each cell failing the analytical rule by a safe value, viz., a value satisfying the rule, and then uses linear programming to adjust the values of the nonsensitive cells to restore additivity of detail to totals throughout the tabular system. The linear programming framework allows adjustments to be selected so as to minimize any of a variety of linear measures of overall distortion to the data, e.g., total of absolute adjustments, total percent of absolute adjustments, etc. Cox and Dandekar (2003) provide further techniques for preserving data quality. While worthwhile, none of these techniques directly addresses the overarching issue: Will statistical analysis of original and disclosure limited data sets yield comparable results? We provide a mathematical programming framework and algorithms, introduced in Cox and Kelly (2003), that begins to address this issue. Specifically, we demonstrate how to preserve approximately mean values, variances and correlations when original data are subjected to CTA, and how to ensure approximately intercept=zero, slope=one simple linear regression between original and adjusted data.
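As a toy sketch of the linear-programming idea behind CTA (not the algorithm of the papers cited), suppose a one-dimensional table has cells x1 + x2 + x3 = T, cell x2 is sensitive and is moved to a safe value, and the nonsensitive cells and the total are adjusted to restore additivity while minimizing the total absolute adjustment. The numbers and the SciPy formulation below are purely illustrative.

```python
import numpy as np
from scipy.optimize import linprog

# Toy one-dimensional table: detail cells x1, x2, x3 with total T = x1+x2+x3.
x = np.array([40.0, 7.0, 53.0])
total = x.sum()

# Suppose x2 is sensitive and must be moved to a "safe" value, here x2 + 5.
safe_offset = 5.0

# Adjust the nonsensitive cells (x1, x3) and the total to restore additivity,
# minimizing the sum of absolute adjustments.  Each adjustment a is written
# as a = p - n with p, n >= 0, so |a| = p + n in the objective.
# Variables: [p1, n1, p3, n3, pT, nT]
c = np.ones(6)
# Additivity: (x1 + a1) + (x2 + safe_offset) + (x3 + a3) = total + aT
#          => a1 + a3 - aT = -safe_offset
A_eq = np.array([[1.0, -1.0, 1.0, -1.0, -1.0, 1.0]])
b_eq = np.array([-safe_offset])

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 6, method="highs")
a1, a3, aT = res.x[0] - res.x[1], res.x[2] - res.x[3], res.x[4] - res.x[5]
print("adjustments to x1, x3, total:", a1, a3, aT)
```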
Return to top
Topic: Implied Edit Generation and Error Localization for Ratio and Balancing Edits
- Speaker: Maria Garcia, Statistical Research Division
- Date/Time: October 1, 2003, 10:30 - 11:30 a.m.
- Location: U.S. Bureau of the Census, 4700 Silver Hill Road, Suitland, Maryland - the Morris Hansen Auditorium, FOB 3. Please call (301) 763-4974 to be placed on the visitors' list. A photo ID is required for security purposes.
- Sponsor: U.S. Bureau Of Census, Statistical Research Division
Abstract:
The U.S. Census Bureau has developed SPEER software that applies the Fellegi-Holt editing method to economic establishment surveys under ratio edits and a limited form of balancing. If implicit edits are available, then Fellegi-Holt methods have the advantage that they determine the minimal number of fields to change (error localize) so that a record satisfies all edits in one pass through the data. In most situations, implicit edits are not generated because the generation requires days to months of computation. In some situations, when implicit edits are not available, Fellegi-Holt systems use pure integer programming methods to solve the error localization problem directly and slowly (1-100 seconds per record). With only a small subset of the needed implicit edits, the current version of SPEER (Draper and Winkler 1997, upwards of 1000 records per second) applies ad hoc heuristics that find error-localization solutions that are not optimal for as many as five percent of the edit-failing records. This talk will have two parts. In the first part we will describe new SAS software and corresponding methodology for generating the complete set of implicit ratio edits for a given set of explicit ratio edits. The new software implements a shortest path algorithm and borrows ideas from the Generate Edits portion currently used in the Census Bureau's Plain Vanilla Ratio Module. In the second part of this talk we present recent modifications to the SPEER editing system that maintain its exceptional speed and do a better job of error localization. The new SPEER uses the Fourier-Motzkin elimination method to generate a large subset of the implied edits prior to error localization. We describe the theory, computational algorithms, and results from evaluating the feasibility of this approach.
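As a rough illustration of why a shortest-path view produces the complete set of implied ratio edits, the sketch below chains explicit upper bounds on ratios using Floyd-Warshall in log space. This is an illustration of the idea only, not the Census Bureau's SPEER or Generate Edits code; the three-variable example is invented, and implied lower bounds follow the same way from the reciprocal bounds.

```python
import numpy as np

def implied_ratio_upper_bounds(U):
    """Tighten upper bounds on ratios x_i / x_j by chaining explicit edits.

    U[i, j] is the explicit upper bound on x_i / x_j (np.inf if no edit).
    Since x_i/x_k = (x_i/x_j) * (x_j/x_k), chaining gives the implied bound
    U[i, k] <= U[i, j] * U[j, k].  Taking logs turns this into an all-pairs
    shortest-path problem, solved here with Floyd-Warshall.
    """
    with np.errstate(divide="ignore"):
        d = np.log(U)                       # logs turn products into sums
    np.fill_diagonal(d, 0.0)
    n = d.shape[0]
    for k in range(n):                      # Floyd-Warshall relaxation
        d = np.minimum(d, d[:, [k]] + d[[k], :])
    return np.exp(d)                        # back to ratio bounds

# Example: explicit edits x1/x2 <= 2 and x2/x3 <= 3 imply x1/x3 <= 6.
U = np.full((3, 3), np.inf)
U[0, 1], U[1, 2] = 2.0, 3.0
print(implied_ratio_upper_bounds(U))
```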
This seminar is physically accessible to persons with disabilities. For TTY callers, please use the Federal Relay Service at 1-800-877-8339. This is a free and confidential service. Requests for sign language interpreting services or other auxiliary aids should be directed to Yvonne Moore at (301) 457-2540 text telephone (TTY), 301-763-5113 (voice mail), or by e-mail to S.Y.Moore@census.gov
Return to top
Title: Design and Methods for Non-Inferiority Clinical Trials
- Speaker: Valerie Durkalski, PhD, MPH; Associate Director of Biostatistics; The Clinical Innovation Group; Charleston, S.C.
- Time/Date: 11:00 am - 12:00 pm, Wednesday, October 1, 2003
- Place: Executive Plaza North, Conference Room G.
Metro: Take the metro red line to either Twinbrook (and walk to Executive Plaza) or White Flint (and catch the shuttle to Executive Plaza).
Driving: Take 95 to 495W to 270N to Exit 4, Montrose Road East, right at the third light onto Executive Boulevard, right at the first light into Executive Plaza.
Map: http://www-dceg.ims.nci.nih.gov/images/localmap.gif
Web Site: www3.cancer.gov/prevention/pob/fellowship/colloquia.html
- Sponsor: National Cancer Institute, Division of Cancer Prevention
MORRIS HANSEN LECTURE
Title: Simple Response Variance Then and Now
- Speaker: Paul P. Biemer, RTI International and the University of North Carolina
- Chair: Daniel Kasprzyk, Mathematica Policy Research
- Discussants: Robert Groves, University of Michigan and Keith Rust, Westat
- Date/Time: Tuesday, October 14, 2003: 3:30 pm - 6:30 pm
- Location: The Jefferson Auditorium, USDA South Building, between 12th and 14th Streets on Independence Avenue S.W., Washington DC. The Independence Avenue exit from the Smithsonian METRO stop is at the 12th Street corner of the building, which is also where the handicapped entrance is located. Except for handicapped access, all attendees should enter at the 5th wing, along Independence Avenue. Please bring a photo ID to facilitate gaining access to the building.
- Sponsors: The Washington Statistical Society, Westat, and the National Agricultural Statistics Service.
- Reception: The lecture will be followed by a reception from 5:30 to 6:30 p.m. in the patio of the Jamie L. Whitten Building, across Independence Avenue S.W.
Abstract:
As a student of nonsampling error, I was greatly influenced by the papers of Morris Hansen and his colleagues at the Census Bureau. The paper that influenced me most was "The Estimation and Interpretation of Gross Differences and the Simple Response Variance" (Hansen, Hurwitz, and Pritzker, 1964). The authors developed a model for evaluating survey classification error using reinterview surveys, introduced the concepts of simple response variance and the index of inconsistency as measures of the gross difference, and demonstrated their usefulness for optimal survey design.
In this presentation, I review the response error model proposed in their seminal work and demonstrate how it flows naturally from the simple concept of two-stage cluster sampling. I also show how it relates to a latent class model in which the true value of a characteristic is the latent variable. Using this latent variable formulation of the gross difference, I consider the progress we have made in 40 years towards resolving the "unresolved issues" HHP listed at the end of their paper, including: (a) relaxing the assumptions of independent and parallel remeasurements, (b) using the estimates of response inconsistency to improve questionnaire design, and (c) clarifying the relationship between simple response variance and response bias.
Several examples that illustrate the utility of these error models for improving survey design and other uses will be provided. The paper concludes with an examination of the similarities and differences between HHP's view of nonsampling error in the 1960's and our current view.
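For readers who want the classical quantities in computable form, here is a minimal Python sketch of the gross difference rate, the simple response variance, and the index of inconsistency for a dichotomous item measured in an interview and a reinterview. The formulas are the standard reinterview ones, the 2x2 counts are invented, and nothing here is taken from the lecture itself.

```python
import numpy as np

def reinterview_measures(table):
    """Gross difference rate, simple response variance, and index of
    inconsistency for a 2x2 interview-by-reinterview table.

    table[i, j] = count classified i in the interview and j in the
    reinterview (0 = "no", 1 = "yes").
    """
    table = np.asarray(table, dtype=float)
    n = table.sum()
    gdr = (table[0, 1] + table[1, 0]) / n        # gross difference rate
    srv = gdr / 2.0                              # simple response variance
    p1 = table[1, :].sum() / n                   # "yes" rate, interview
    p2 = table[:, 1].sum() / n                   # "yes" rate, reinterview
    ioi = gdr / (p1 * (1 - p2) + p2 * (1 - p1))  # index of inconsistency
    return gdr, srv, ioi

# Example: 1,000 reinterviewed cases, 80 of which change classification.
print(reinterview_measures([[850, 40], [40, 70]]))
```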
Return to top
Title: Human Rights and Statistics: Recent Peruvian Example -- Behind the Headlines
- Speaker: Fritz Scheuren
- Time: Wednesday, Oct. 15 at noon
- Location: Chinatown Garden Restaurant, 618 H St., NW
Abstract:
Recent stories in the Washington Post, Washington Times, and elsewhere have covered the Peruvian government's report releasing information that approximately 69,000 people, mostly rural poor Quechua-speaking Indians, were killed in the armed internal conflict in Peru over the period from 1980 to 2000.
The talk will get behind the headlines. Statistical details will be covered briefly, but the focus will be on how to react as professionals and Americans in such situations. Earlier international statistical work done with the Kosovar Albanians and with the Mayan Indians in Guatemala will be drawn on for additional examples. Sadly, there are many other current instances, less prominent in the media, that need attention too: Sri Lanka, Sierra Leone, and East Timor would be a partial list.
Return to top
Topic: The Pricing and Mispricing of Consumer Credit
- Speaker: Darryl Getter, HUD
- Discussant: Paul Calem, Federal Reserve Board
- Chair: Linda Atkinson, Economic Research Service
- Date/Time: Wednesday, October 15, 2003; 12:30 - 2:00 p.m.
- Location: Bureau of Labor Statistics, Conference Center Room 10, Postal Square Building (PSB), 2 Massachusetts Ave. NE, Washington, D.C. Please use the First St., NE, entrance to the PSB.
- Sponsor: Economics Section
Abstract:
Some households face rejections, and some are charged higher prices for credit relative to others. Previous academic studies view this as an indication of market failure in the consumer credit markets. This research, however, argues that such findings could instead be viewed as proof that credit markets are functioning properly. Evidence presented here suggests that most rejected borrowers who have been considered "credit-constrained" in the traditional credit-rationing literature pose a higher credit risk to lenders, and most households that either get rejected or must pay higher loan rates to obtain credit are properly credit rationed. Meanwhile, some high credit quality borrowers, who may have qualified for lower rates, pay premium loan rates. The more relevant issue concerning modern consumer credit markets, therefore, is whether the prices borrowers pay for loans correctly reflect their level of credit risk. This paper re-examines consumer participation in credit markets, looking specifically at issues related to the pricing of loans to borrowers of different levels of credit risk.
Return to top
Title: The Role of Statistics in Achieving the Dream
- Date & Time: October 16, 2003, 12:30 - 2:30 pm
- Location: BLS Conference Center, 2 Postal Square Building, across from Union Station.
- Sponsor: Washington Statistical Society (WSS)
On October 16, 2003, there will be a 40th anniversary Washington Statistical Society (WSS) session honoring the Rev. Martin Luther King, Jr., entitled "The Role of Statistics in Achieving the Dream."
A panel format will be used with speakers who are statisticians and demographers, mixed together with civil rights activists. Some of you who read this were on the Mall on August 28, 1963. Please come and share. And if you were not there, come and feel the sense of excitement and hope that existed then. Learn how statistics has played a role at nearly every juncture in leading to progress.
But there is still a long road ahead and much more for statisticians to do in achieving true equality of opportunity. It is safe to say that few of those attending the March on Washington that day would have predicted what happened in the 40 years since. Progress has been slower than hoped but the dream has also been broadened with many legislative, social, and economic accomplishments.
The October 16 session will begin at 12:30 pm and run until about 2:30 pm. The location is the BLS Conference Center, 2 Postal Square Building, across from Union Station. To gain admittance, you must call Kevin Cecco at 202-874-0464 at least two days ahead. Because this is a special occasion, there will be light refreshments offered.
Return to top
Title: Conditional U-Statistics with Applications in Discriminant Analysis, ARMA Processes and Hidden Markov Models
- Speaker: Professor Madan L. Puri, Indiana University, Bloomington, Indiana
- Date & Time: 11:00 - 12:00 Noon, October 31, 2003
- Location: Funger Hall, 310, 2201 G Street, NW. Foggy Bottom metro stop on the blue and orange line.
- Sponsor: The George Washington University, Department of Statistics
Abstract:
Stute (Ann. Probab. (1991), Ann. Statist. (1994)) introduced a class of conditional U-statistics which generalize the Nadaraya-Watson estimate of a regression function. Under the usual iid set-up, Stute proved the asymptotic normality, weak and strong consistency, and the universal consistency of the estimate in the rth mean. Here we extend Stute's results from the independent case to the dependent case. Applications to discriminant analysis, ARMA processes and hidden Markov models are provided. The work is in collaboration with Professor Michel Harel (C.N.R.S., Toulouse, France).
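For readers unfamiliar with the estimator being generalized, the sketch below gives the Nadaraya-Watson kernel regression estimate, which conditional U-statistics recover as a special case; the Gaussian kernel, bandwidth, and simulated data are illustrative choices only.

```python
import numpy as np

def nadaraya_watson(x0, X, Y, h):
    """Nadaraya-Watson kernel regression estimate of m(x0) = E[Y | X = x0]."""
    w = np.exp(-0.5 * ((x0 - X) / h) ** 2)   # Gaussian kernel weights
    return np.sum(w * Y) / np.sum(w)

# Example: noisy sine curve
rng = np.random.default_rng(1)
X = rng.uniform(0, 2 * np.pi, 200)
Y = np.sin(X) + rng.normal(scale=0.3, size=200)
print(nadaraya_watson(np.pi / 2, X, Y, h=0.3))   # close to sin(pi/2) = 1
```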
For a complete list of upcoming seminars check the Department's seminar web site: http://www.gwu.edu/~stat/seminars/Fall2003.htm. The campus map is at: http://www.gwu.edu/~map/html/. The contact person is Reza Modarres at Reza@gwu.edu or 202-994-6359.
Return to top
Topic: New Metropolitan - Micropolitan Areas: Implications for Statistical and Policy Decision Making
- Date & Time: November 4, 2003 8:00 AM - 5:00 PM
- Location: Embassy Suites, Alexandria, VA
- Sponsor: Council of Professional Associations on Federal Statistics (COPAFS)
Session Topics:
- What We Have and Why
- Using the New Classifications at the National Level
- Regional Uses of the New Classifications
- Where Do We Go From Here?
Cost: $95.00 for the seminar payable by purchase order or check made to COPAFS. There is an attendance limit of 150 participants. For a registration form or for more information, contact the COPAFS office: 703/836-0404 or by email at copafs@aol.com.
Return to top
Title: Calibration Weighting: Past, Present, and Future
- Speaker: Phillip S. Kott, National Agricultural Statistics Service, U.S. Dept. of Agriculture
- Date/Time: Wednesday, November 5, 2003, 12:30 - 2:00 p.m.
- Location: Bureau of Labor Statistics, Postal Square Building (PSB), Conference Center, Room tba, 2 Massachusetts Ave., N.W., Washington, D.C. Please use the First Street entrance to the PSB.
- Sponsor: WSS Methodology Section
This is the first in a series of WSS seminars on calibration and related types of estimation.
Abstract:
Calibration is a methodology for adjusting probability-sample weights. Using a single set of calibration weights can produce model-unbiased estimators for a number of different target variables. This talk reviews a bit of the history of calibration weighting before Deville and Särndal (1992) coined the term, discusses the contribution of their famous paper, and highlights a few major developments since, including some new results by the speaker.
A change in the definition of a calibration estimator is recommended. This change expands the class to include such special cases as (1) randomization-optimal estimators (usually called "optimal estimators" in the literature), and (2) randomization-consistent estimators incorporating local polynomial regression.
The most common nonlinear calibration adjustment is raking. Viewed as a form of calibration, raking can be generalized to include continuous control variables.
Calibration weighting can be used to adjust for unit nonresponse and coverage errors. In this context, the difference between a linear and nonlinear calibration adjustment can be nontrivial. Consequently, some care is often needed in constructing a valid linearization estimator of quasi-randomization mean squared error. An analogous, nonstandard jackknife avoids iteration even when the calibration weights themselves are computed using an iterative process as is the case with generalized raking.
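A minimal sketch of the linear (chi-square distance) calibration case discussed above is given below in Python/NumPy: design weights are adjusted multiplicatively so that the weighted auxiliary totals reproduce known control totals. Raking and the other nonlinear adjustments mentioned would replace this closed-form solution with an iterative one; the data, weights, and control totals are invented for illustration.

```python
import numpy as np

def linear_calibration(d, X, totals):
    """Linear (chi-square distance) calibration weights.

    d      : design weights, shape (n,)
    X      : auxiliary variables, shape (n, p)
    totals : known population totals of the auxiliaries, shape (p,)

    Returns weights w = d * (1 + X @ lam) satisfying X' w = totals.
    """
    d = np.asarray(d, dtype=float)
    X = np.asarray(X, dtype=float)
    resid = totals - X.T @ d                    # calibration gap
    lam = np.linalg.solve(X.T @ (d[:, None] * X), resid)
    return d * (1.0 + X @ lam)

# Example: calibrate to a known population size (1000) and total of x (5200).
rng = np.random.default_rng(0)
x = rng.normal(5.0, 1.0, size=100)
X = np.column_stack([np.ones(100), x])
d = np.full(100, 10.0)                          # equal design weights
w = linear_calibration(d, X, totals=np.array([1000.0, 5200.0]))
print(X.T @ w)                                  # reproduces the control totals
```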
Return to top
Topic: Using Latent Variable Models to Assess Non-Sampling Error and Measurement Bias In Survey Research
- Speaker: Adam C. Carle, Ph.D., Post-Doctoral Research Associate, Statistical Research Division
- Date/Time: Thursday, November 6, 2003, 11:00 a.m. - 12:30 p.m.
- Location: U.S. Bureau of the Census, 4700 Silver Hill Road, Suitland, Maryland - Room 3225, FOB 4. Please call (301) 763-4974 to be placed on the visitors' list. A photo ID is required for security purposes.
- Sponsor: U.S. Bureau Of Census, Statistical Research Division
Abstract:
Recent years have seen a trend in research undertaken at the U.S. Census Bureau and elsewhere to establish the quality of data collected in surveys. Among topics of specific concern is measurement bias, a type of non-sampling error. Measurement bias, also labeled differential item functioning, is present when individuals equivalent on true levels of a variable (e.g., income, depression, etc.), but from different groups (e.g. males and females, race, etc.), do not have identical probabilities of observed scores. Bias can lead to inaccurate estimates, attenuate or accentuate group differences, and affect the validity and reliability of research. Latent variable models offer researchers a tool to demonstrate that an instrument functions with equal precision across different groups. This seminar will present the results of the speaker's recent dissertation as a vehicle for discussing measurement bias and latent variable models. The study explored the possibility of measurement bias across sex on the Children's Depression Inventory using rating scale item response theory (IRT), confirmatory factor analysis (CFA) for continuous measures, and CFA for ordered-categorical measures. The presentation will broadly address the possible effects of measurement bias, why survey methodologists and statisticians should find it of concern, and finally review the application of latent variable models to assess differential item functioning and bias.
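The dissertation's models are IRT- and CFA-based; as a much simpler stand-in that conveys the idea of differential item functioning, the sketch below screens a single dichotomous item with the Mantel-Haenszel common odds ratio, conditioning on total score. All data are simulated, and this is not the method used in the study.

```python
import numpy as np

def mantel_haenszel_dif(item, group, total_score):
    """Mantel-Haenszel common odds ratio as a simple DIF screen.

    item        : 0/1 responses to the studied item
    group       : 0 = reference group, 1 = focal group
    total_score : matching variable (e.g., total or rest score)

    An odds ratio far from 1 suggests that, at equal total score, the two
    groups differ in their probability of endorsing the item (possible DIF).
    """
    item, group, total_score = map(np.asarray, (item, group, total_score))
    num = den = 0.0
    for s in np.unique(total_score):
        k = total_score == s
        a = np.sum((group[k] == 0) & (item[k] == 1))   # reference, endorsed
        b = np.sum((group[k] == 0) & (item[k] == 0))
        c = np.sum((group[k] == 1) & (item[k] == 1))   # focal, endorsed
        d = np.sum((group[k] == 1) & (item[k] == 0))
        n = a + b + c + d
        if n > 0:
            num += a * d / n
            den += b * c / n
    return num / den

# Example: simulate 400 respondents with no real DIF; expect a ratio near 1.
rng = np.random.default_rng(0)
group = rng.integers(0, 2, 400)
score = rng.integers(0, 6, 400)                 # crude matching score
item = (rng.random(400) < (0.2 + 0.1 * score)).astype(int)
print(mantel_haenszel_dif(item, group, score))
```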
This seminar is physically accessible to persons with disabilities. For TTY callers, please use the Federal Relay Service at 1-800-877-8339. This is a free and confidential service. Requests for sign language interpreting services or other auxiliary aids should be directed to Yvonne Moore at (301) 457-2540 text telephone (TTY), 301-763-5113 (voice mail), or by e-mail to S.Yvonne.Moore@census.gov.
Return to top
Title: The Wave 2 National Epidemiologic Survey on Alcohol and Related Conditions (NESARC): Survey Design and Incentive Experiment
- Speaker: Dr. Bridget Grant, Chief of Laboratory of Epidemiology and Biometry in the Division of Intramural Clinical and Biological Research at the National Institute on Alcohol Abuse and Alcoholism
- Date & Time: Wednesday, November 12, 2003 from 10:30 a.m. to 12:00 p.m.
- Location: Morris Hansen Auditorium, Federal Office Building #3, U.S. Census Bureau, 4700 Silver Hill Road, Suitland, MD. Please call (301) 763-4974 to be placed on the visitors' list. A photo ID is required for security purposes.
- Sponsor: U.S. Bureau Of Census, Statistical Research Division
THE 2003 ROGER HERRIOT AWARD FOR
INNOVATION IN FEDERAL STATISTICS
Title: Statistical Issues in Counterterrorism
- Recipient and Main Speaker: David L. Banks, Duke University
- Other Speakers:
Nancy L. Spruill, Office of the Secretary of Defense
Wendy L. Martinez, Office of Naval Research
- Chair: Fritz Scheuren, NORC
- Date/Time: Thursday, November 13, 2003, 12:30 - 2:00 p.m. Reception to follow.
- Location: BLS Conference Center, 2 Postal Square Building, across from Union Station. Conference Rooms 2 and 3.
- Co-sponsors of the Herriot Award: Washington Statistical Society, American Statistical Association's Government Statistics Section and Social Statistics Section
Abstract:
In less than 20 years of service, David L. Banks has made significant contributions to federal statistics. At the National Institute of Standards and Technology he pioneered the use of Bayesian statistics for metrology and made key comparisons to improve accuracy and support international commerce. At the Department of Transportation (DOT), he helped to build a new federal statistical agency (BTS) and led efforts in the economic analysis of transportation data. During his short time at the Food and Drug Administration, he led the effort to apply statistical methods for risk analysis and game theory to counter bio-terrorism.
Other Information:
Fritz Scheuren will discuss David's many contributions to the federal statistical community. Afterward, David will present a talk, "Statistical Issues in Counterterrorism." Counterterrorism has introduced a number of new research problems for statisticians. This talk quickly reviews the range of topics being addressed by various researchers, and then focuses on two in which there seems to be particular potential: the combination of statistical risk analysis with game theory, and the use of multidimensional scaling to improve biometric identification algorithms.
Nancy L. Spruill and Wendy L. Martinez will comment on David's talk and his contributions promoting the use of good statistical methods in government counterterrorism efforts. Roger Herriot was the Associate Commissioner for Statistical Standards and Methodology at the National Center for Education Statistics (NCES) before he died in 1994. Throughout his career at NCES and the Census Bureau, Roger developed unique approaches to the solution of statistical problems in federal data collection programs.
Return to top
Title: Efficient Estimation for Surveys with Nonresponse Follow-Up Using Dual-Frame Calibration
- Speaker: Vincent G. Iannacchione, Statistics Research Division, RTI International
- Co-authors: Avinash C. Singh and Jill A. Dever, RTI International
- Date/Time: Tuesday, December 2, 2003, 12:30 - 2:00 p.m.
- Location: Bureau of Labor Statistics, Postal Square Building (PSB), Conference Center, Room tba, 2 Massachusetts Ave., N.W., Washington, D.C. Please use the First Street entrance to the PSB.
- Sponsor: WSS Methodology Section
This is the second in a series of WSS seminars on calibration and related types of estimation.
Abstract:
In surveys where response rates are low, a follow-up survey of nonrespondents may be used to augment the respondents from the main survey. This may help in reducing the residual nonresponse bias still present in survey estimates based only on the main survey after adjustments for high nonresponse are made via modeling. However, when cost considerations require that the follow-up sample size be small, the reduction in bias obtained from the follow-up may be negated by the increase in sampling variance due to highly unequal selection probabilities in the combined sample. In this situation, a possible solution may be to trim the extreme weights in order to reduce the mean square error (MSE) associated with key survey estimates. However, it is not clear how to control the bias introduced by trimming.
We present an alternative in which we make more efficient use of information in the data. Our method is motivated by analogy with small-area estimation techniques in that our goal is to balance the variance of an unbiased but unstable quasi design-based estimator (this is based on the main and the follow-up samples with possibly nonresponse model adjustments for the follow-up) with a biased but stable quasi model-based estimator (this is based on the main sample with a nonresponse model adjustment). The term 'quasi' is used to signify that in the first case, the design-based estimate plays the major role as only a small part of the sample has nonrespondents, while in the second case with no follow-up, model adjustment for nonresponse plays the major role as a large part of the sample has nonrespondents.
We propose that the ideas underlying dual-frame estimation together with sampling weight calibration can be used to develop composite weights to produce estimates that are expected to strike a balance between variance and bias. The weight calibration is performed such that it has built-in controls for extreme weights while preserving the known population totals for various auxiliary variables as well as zero controls for difference estimates from the two samples for a key set of study variables. The proposed method is illustrated for a survey of Gulf War veterans with a nonresponse follow-up survey.
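The bias-variance balance described above can be caricatured with a generic composite estimator: the sketch below combines an unbiased-but-unstable estimate with a biased-but-stable one, using a weight that minimizes an estimated mean square error. It is only a schematic stand-in for, not an implementation of, the authors' dual-frame calibration with zero controls, and the numbers are invented.

```python
import numpy as np

def composite_estimate(theta_u, var_u, theta_m, var_m):
    """Combine an unbiased-but-unstable estimate (theta_u, var_u) with a
    biased-but-stable one (theta_m, var_m), weighting to minimize an
    estimated mean square error.  The bias of theta_m is crudely estimated
    by its difference from theta_u."""
    bias_sq = (theta_m - theta_u) ** 2
    phi = (var_m + bias_sq) / (var_u + var_m + bias_sq)  # weight on theta_u
    estimate = phi * theta_u + (1.0 - phi) * theta_m
    return estimate, phi

# Example: noisy follow-up-based estimate vs. stable model-adjusted estimate
print(composite_estimate(theta_u=52.0, var_u=9.0, theta_m=48.0, var_m=1.0))
```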
Return to top
Title: Survey Nonresponse Measurement Reconsidered
- Speakers: Fritz Scheuren, NORC; Mike Dennis, Knowledge Network; Robie Sangster, BLS
- Chair: Wendy Rotz, Ernst and Young, LLP
- Discussant: Brian Harris-Kojetin, OMB
- Date/Time: December 3, 2003, 12:30 - 2:00 p.m.
- Location: BLS Conference Center, 2 Postal Square Building, across from Union Station
- Sponsors: Methodology Section of the Washington Statistical Society (WSS) and the American Association for Public Opinion Research (AAPOR)
Abstract:
Nonresponse has many effects on survey quality. All forms of unit nonresponse increase the expense of obtaining a sample of a given size. Completely ignorable nonresponse, however, only reduces the sample size; otherwise it does not affect the mean square error. Other forms of unit nonresponse have potential biasing effects, depending on the success that the survey practitioner has in modeling the response mechanism. Currently, surveys often report overall measures that do not distinguish between these types of nonresponse. Can new measures be constructed? And if so, would such measures change our current emphasis on refusal conversion and focus efforts elsewhere?
Return to top
Title: Standards and Metadata in a Statistical Agency
- Speaker: Daniel W. Gillman, Bureau of Labor Statistics
- Discussant: Charles J. Rothwell, National Center for Health Statistics
- Chair: Eugene M. Burns, Bureau of Transportation Statistics
- Date/Time: Thursday, December 4, 2003, 12:30-2:00 p.m.
- Location: Bureau of Labor Statistics, Postal Square Building (PSB), Conference Center, Conference Room 8, 2 Massachusetts Ave., N.W., Washington, D.C. Please use the First Street entrance to the PSB.
- Sponsor: WSS Quality Assurance and Physical Sciences Section
Abstract
There is a wide range of standards in use today; they cover many subject areas, and there are many standards development organizations (SDOs). The most successful standards over time are developed through a consensus-building process that is open, transparent, subject to due process, and provides a right of appeal. The World Wide Web Consortium (W3C) and the International Organization for Standardization (ISO) are two well-known examples of SDOs.
Many statistical agencies develop standards, either by themselves or with other similar organizations. In fact, some have a standards division that is responsible for statistical standards within the organization. Two well-known (in the US) examples of statistical standards developed by statistical organizations are the North American Industry Classification System (NAICS) and the Standard Occupational Classification (SOC). Statistical organizations also use ISO, W3C, and other standards.
NAICS and SOC are standards related to data. These and other code sets, often under the responsibility of ISO or other SDOs, are used to describe, classify, or code data that is collected by the agency. In this sense, these standards are also metadata. And, so, there is a strong connection between standards and the metadata that describes the data and survey life cycle within the statistical agency.
There are also metadata standards. These are standards that address how one organizes or describes data. The Unified Modeling Language (UML) and the eXtensible Markup Language (XML) are two such examples.
This paper describes an ideal standards setting process, relates how standards influence the work of statistical agencies, describes the connection between standards and metadata, and shows how standards based metadata management works. The need for a coherent standards strategy in the statistical agency is discussed.
Return to top
Title: NCI's Biostatistics Grant Portfolio and NIH Funding Mechanism
- Speaker: Dr. Ram Tiwari, National Cancer Institute / National Institutes of Health
- Time: 11:00 - 12:00 Noon, December 5, 2003
- Location: Funger Hall, 310, 2201 G Street, NW. Foggy Bottom metro stop on the blue and orange line.
- Sponsor: The George Washington University, Department of Statistics
Abstract
The talk consists of two parts. In Part I, I will talk about our newly released website: www.statfund.cancer.gov, which contains information about a large proportion of NIH's funded grants in Biostatistics. These grants are housed in the Division of Cancer Control and Population Sciences at the National Cancer Institute (NCI). I will also discuss various funding opportunities in (Bio)statistics at NCI. In Part II, I will go over NIH's funding mechanisms and discuss the grant review process at NIH in great detail.
For a complete list of upcoming seminars check the Department's seminar web site: http://www.gwu.edu/~stat/seminars/Fall2003.htm. The campus map is at: http://www.gwu.edu/~map/html/. The contact person is Reza Modarres at Reza@gwu.edu or 202-994-6359.
Return to top
Mapping Environmental Indicators: A Demonstration of Dynamic Choropleth Maps (DC Maps) Java-based Web Application
- Speaker: William P. Smith, Ph.D., Senior Statistician, Computer Scientist, U.S. Environmental Protection Agency
- Session Chair: Mel Kollander, Director, Washington Office, Institute for Survey Research of Temple University (mellk@erols.com or mel.kollander@temple.edu).
- Location: Conference Room 9, BLS Conference Center, Postal Square Building, 2 Massachusetts Ave., NE, Washington, DC
- Sponsor: WSS
PLEASE NOTE: IF YOU ARE PLANNING TO ATTEND, E-MAIL marilyn.bagel@temple.edu NO LATER THAN NOON, WEDNESDAY DEC. 10, AND YOUR NAME WILL BE PLACED ON THE OFFICIAL VISITORS LIST. FOR SECURITY REASONS, YOUR NAME MUST BE ON THE VISITORS LIST OR YOU WILL BE DENIED ACCESS.
About the Presentation. Dr. Smith will demonstrate Dynamic Choropleth Maps (DC Maps), a dynamic Web-based geographic mapping tool that the U.S. Environmental Protection Agency (U.S. EPA) uses for visualizing possible relationships between environmental, health, and demographic indicators. This interactive visualization focuses on using map slider controls to make spatial contexts and data interactions visible. Such a tool can be used to visualize environmental indicators spatially and to allow one to interact with up to three indicators at once for dynamic real-time map rendering. Patterns that would be almost impossible to discern from static maps may become apparent through dynamic views of these indicators on a choropleth map. Multiple indicators may be selected for mapping from a list of over 300 data sets. Data are displayed using a county-level choropleth map of the United States. A choropleth map displays numerical data for geographic areas by sorting the data into classes and assigning each class a color on the map.
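The classing step described in the last sentence can be sketched in a few lines of Python; the county values and the five-color palette below are invented, and DC Maps itself is a Java Web application rather than a Python script.

```python
import numpy as np

# Toy county values: sort into quintile classes and map each class to a color,
# the same classing step a choropleth display performs before shading areas.
rng = np.random.default_rng(3)
values = rng.lognormal(mean=2.0, sigma=0.7, size=3000)

breaks = np.quantile(values, [0.2, 0.4, 0.6, 0.8])   # class break points
classes = np.digitize(values, breaks)                # class index 0..4 per area
palette = ["#ffffb2", "#fecc5c", "#fd8d3c", "#f03b20", "#bd0026"]
county_colors = [palette[c] for c in classes]        # one fill color per area

print(np.bincount(classes))                          # roughly equal class sizes
```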
How DC Maps Work. DC Maps can be used to create quick map-based displays or to identify possible associations between indicators for further study. For each indicator displayed on the map, a slider bar allows the user to condition or filter the data to observe possible relationships between the indicators. As the sliders are moved, the map is updated instantly to reflect interactions in the data. This enables the user to see, for example, the change in the distribution of chemical releases as the user varies poverty rates. The list of indicators can be customized to reflect user needs. Also, the geographic boundary data can be varied to accommodate these needs and display alternative data sets.
Data Available for Display. Currently DC Maps displays environmental, health, demographic, and economic data at the county level from a number of key sources, including the following. The data used for the indicators listed are available for export and use outside DC Maps.
- Census 2000 Demographic Data
- HHS Health Indicators
- NCI Cancer Mortality Data
- U.S. EPA Toxics Release Inventory, Air and Water Quality Data
- Other Economic, Labor, Agricultural, and Health Statistics
Contact Information.
To view the DC Maps development site go to http://users.erols.com/turboperl/dcmaps.html
For more information go to http://users.erols.com/turboperl/help.html
Or contact: William P. Smith, Ph.D.
U.S. Environmental Protection Agency
OEI/OIAA/EAD/ASB (2842T)
U.S. EPA West, Room 5305K
1200 Pennsylvania Ave., NW
Washington, DC 20460
smith.will@epa.gov
Tel: 202-566-0636
Fax: 202-566-0677
Dr. Smith is a senior statistician and computer scientist at the U.S. Environmental Protection Agency in Washington, D.C. He is currently working in the Office of Environmental Information, where he has built several high-performance information products for the Web, such as EPA's well-known TRI Explorer. Dr. Smith received his Ph.D. in Statistics from the American University in 1979.
Return to top