Washington Statistical Society Seminar Archive: 2002
Title: On Disclosure Protection for Non-Traditional Statistical Outputs
- Speaker: Arnold Reznek, U.S. Census Bureau
- Co-Author: David Merrell, U.S. Census Bureau
- Chair: Virginia DeWolf, National Academy of Sciences
- Discussant: Lawrence H. Cox, National Center for Health Statistics
- Date/Time: Tuesday, January 15, 2002, 12:30 to 2:00 p.m.
- Location: Bureau of Labor Statistics, Postal Square Building (PSB), Conference Center, Conference Room 1, 2 Massachusetts Ave., NE, Washington, DC. Please use the First Street entrance to the PSB.
- Sponsor: WSS Methodology Section
By law, U.S. Federal statistical agencies must protect the confidentiality of the microdata provided by respondents on surveys and censuses. These agencies' "traditional" data products are aggregates (e.g., tables of frequency counts or totals) or public use microdata files. Disclosure limitation methods for these products (e.g., cell suppression; perturbation; topcoding; recoding) are well developed.
Researchers at the Census Bureau's Center for Economic Studies (CES) and its Research Data Centers (RDCs) commonly generate "nontraditional" statistical output; e.g., from linear and nonlinear regression models, semi-parametric and non-parametric estimation models, and simulations. We think current disclosure limitation methods are less appropriate for this type of output, but the literature provides little guidance. This can make it difficult to balance releasing meaningful research results with maintaining confidentiality protection.
For these nontraditional outputs, we discuss disclosure risks and the appropriateness of existing disclosure limitation methods. We use simulation methods and examples from past disclosure review. We also pose questions for future research.
Topic: An Investigation of Response Rates in Random Digit Dialed Telephone Surveys
- Speakers:
Brenda Cox, Senior Vice-President, RoperASW
Daniel O'Connor, Mathematica Policy Research
Kathryn Chandler, NCES
- Discussant: Clyde Tucker, Bureau of Labor Statistics
- Date & Time: Wednesday, January 23, 2002, 12:30 - 2:00 p.m.
- Location: BLS Conference and Training Center (basement level), Rooms #9 & #10, Postal Square Building, 2 Massachusetts Ave., NE, Washington, DC (Enter on First St., NE, and bring a photo ID.) Metro: Union Station, Red Line.
- Co-sponsored by:
American Association for Public Opinion Research
Washington/Baltimore Chapter
& Washington Statistical Society Data Collection Methods Section
Conventional wisdom suggests that obtaining response in telephone surveys is becoming more difficult. In describing current problems, interviewers mention the increasing use of answering machines and caller ID as well as the frequency with which households receive sales calls. This perception of declining response rates provided the impetus for this investigation of whether the 1990s had witnessed a decline in response rates for random digit dialed (RDD) telephone surveys. Response rate results were compiled for publicly and privately sponsored RDD surveys conducted since 1990. To allow comparisons across surveys, each survey's response rate was recalculated using the definition provided by the Council of American Survey Research Organizations (CASRO). This presentation summarizes the results of that investigation, including the wide variety of definitions the surveys used in defining response rates, the variation in response rates across surveys, and potential correlates of response rate differences.
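As a concrete illustration of the recalculation step, here is a minimal Python sketch of a CASRO-style response rate, under the common convention that cases of unknown eligibility are discounted by the eligibility rate observed among resolved cases. The category names and counts are illustrative, not taken from the study.

```python
def casro_response_rate(completes, eligible_nonresponse,
                        ineligible, unknown_eligibility):
    """CASRO-style response rate: completes over estimated eligible cases.

    The eligibility rate e among cases of unknown status is estimated
    from the cases whose eligibility was resolved.
    """
    resolved_eligible = completes + eligible_nonresponse
    resolved = resolved_eligible + ineligible
    e = resolved_eligible / resolved          # estimated eligibility rate
    estimated_eligible = resolved_eligible + e * unknown_eligibility
    return completes / estimated_eligible

# Example: 800 completes, 400 eligible nonrespondents (refusals etc.),
# 300 ineligibles, 500 numbers never resolved (e.g., ring-no-answer).
print(round(casro_response_rate(800, 400, 300, 500), 3))  # -> 0.5
```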
Note: If you did not get an e-mail notice of this meeting but want one for future meetings, please contact dc-aapor.admin@erols.com.
Topic: Politeness and Cross-cultural Communication
- Speaker: Yuling Pan, Georgetown University
- Date & Time: January 23, 2002, 10:30 - 11:30 a.m.
- Location: U.S. Bureau of the Census, 4700 Silver Hill Road, Suitland, Maryland - the Morris Hansen Auditorium, FOB 3. Enter at Gate 5 on Silver Hill Road. Please call (301) 457-4974 to be placed on the visitors' list. A photo ID is required for security purposes.
- Sponsor: U.S. Bureau of the Census, Statistical Research Division
This presentation explores the issue of linguistic politeness and its role in cross-cultural communication. Linguistic politeness refers to the appropriate way of using language to communicate with others in a given situation. It involves not only the use of language, but also knowledge of the cultural norms and behavioral patterns that govern the use of politeness strategies. Because of cultural differences in history and worldview, socialization processes, and concepts of human relationships, different cultural groups use or prefer certain politeness strategies for communication in a given social setting. Failure to recognize or use the right politeness strategies in cross-cultural communication can cause misunderstanding and/or misjudgment among people from different cultural backgrounds.
In this talk, the speaker will focus on the role of language and the following issues: 1) types of linguistic politeness and their functions, and how the social dimensions of power, distance, affect and formality influence the use of politeness strategies; 2) the relationship between politeness strategies and the social factors of participants, setting, topic and function; 3) situational variation and cultural variation in the use of politeness strategies and their implication in cross-cultural communication.
The study of politeness is important not only for sociolinguists, but also for those professionals whose work requires a constant use of spoken or written language to communicate with people from different cultural groups. The speaker will draw on consulting experience to illustrate how the study of politeness can be applied to professional communication and business communication in multicultural settings.
This program is physically accessible to persons with disabilities. For interpreting services, contact Yvonne Moore at TTY 301-457-2540, 301-457-2853 (voice mail), or Sherry.Y.Moore@census.gov.
Title: Semiparametric Bayesian Techniques for Problems in Circular Data
- Speaker: Professor Kaushik Ghosh, Department of Statistics, George Washington University
- Date & Time: January 25, 2002, 11:00 a.m. - 12:00 p.m.
- Location: Funger Hall 321. 2201 G Street NW. Foggy Bottom metro stop on the blue and orange line.
- Sponsor: The George Washington University, Department of Statistics
Many scientific experiments generate observations that are two-dimensional directions or are periodic with a known period. Such data can be represented by points on a circle - hence the name circular data. In this work, we consider the problems of prediction and tests of hypotheses for circular data in a semiparametric Bayesian setup. Observations are assumed to be independently drawn from the von Mises distribution and uncertainty in the location parameter is modeled by a Dirichlet Process Prior. For the prediction problem, we present a method to obtain the predictive density of a future observation, and, for the testing problem, we present a method to obtain the posterior probabilities of the hypotheses under consideration. Incorporation of the semiparametric model gives us more flexibility and robustness against prior mis-specifications. While analytical expressions are intractable, the methods are easily implemented using the Gibbs sampler. We illustrate their use with examples from Medicine and Geology.
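A minimal sketch of the prediction step, assuming posterior draws of the location parameter are already available (in the paper they would come from the Gibbs sampler) and treating the concentration parameter kappa as known for brevity:

```python
import numpy as np
from scipy.special import i0  # modified Bessel function I_0

def vonmises_pdf(theta, mu, kappa):
    """Density of the von Mises distribution on the circle."""
    return np.exp(kappa * np.cos(theta - mu)) / (2 * np.pi * i0(kappa))

def predictive_density(theta, mu_draws, kappa):
    """Rao-Blackwellized predictive density: average the von Mises
    density over posterior draws of the location parameter."""
    return np.mean([vonmises_pdf(theta, mu, kappa) for mu in mu_draws])

# Illustrative posterior draws of mu (stand-ins for Gibbs output).
rng = np.random.default_rng(0)
mu_draws = rng.normal(loc=0.5, scale=0.1, size=1000) % (2 * np.pi)
grid = np.linspace(0, 2 * np.pi, 5)
print([round(predictive_density(t, mu_draws, kappa=2.0), 3) for t in grid])
```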
Note: For a complete list of upcoming seminars check the dept's seminar web site: http://www.gwu.edu/~stat/seminars/Fall2001.htm. The campus map is at: http://www.gwu.edu/Map/. The contact person is Reza Modarres at Reza@gwu.edu or 202-994-6359.
Topic: Record Linkage and Machine Learning
- Speaker: William E. Winkler, Statistical Research Division, U.S. Bureau of the Census, william.e.winkler@census.gov
- Date/Time: Tuesday, January 29, 2002, 10:30 - 11:30 a.m.
- Location: U.S. Bureau of the Census, 4700 Silver Hill Road, Suitland, Maryland - the Morris Hansen Auditorium, FOB 3. Enter at Gate 5 on Silver Hill Road. Please call (301) 457-4974 to be placed on the visitors' list. A photo ID is required for security purposes.
- Sponsor: U.S. Bureau of the Census, Statistical Research Division
The record linkage model of Fellegi and Sunter (1969) is equivalent to generalized Bayesian network models in machine learning (e.g., Mitchell 1997, Winkler 2000). The underlying computational model uses Maximum Entropy ideas (I.J. Good 1963, Dykstra 1985, Winkler 1990). The first part of this talk introduces an elementary version of the Fellegi-Sunter model that corresponds to naive Bayesian networks. For record linkage, the methods are extended to deal with approximate string comparison, automatic estimation of probabilities without training data, and missing or erroneous identifiers (Winkler 1988, 1990). Friedman (1997, 1999) presents related methods for Bayesian networks.
The second part of this talk provides methods for dealing with automatic estimation of probabilities when there is interaction between identifiers (Winkler 1989, 1990, 1993, Meng and Rubin 1993) and when affine constraints are used to predispose probabilities to certain regions of the parameter space. Efficient methods of accounting for 2-way interactions have been used for Bayesian networks (Sahami 1996, Dumais et al 1998, Sahami et al 1999). Record linkage has primarily been applied in situations where labeled training data are not available. Recent work has shown how general EM methods (Nigam et al. 2000, Winkler 2000) and general MCMC methods (Larsen and Rubin 2001) can yield suitable classification rules when combinations of labeled training and unlabeled test data are used for training.
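For readers new to the Fellegi-Sunter model, here is a minimal sketch of the naive-Bayes (conditional independence) match weight described in the first part of the talk; the m- and u-probabilities are illustrative and would in practice be estimated, e.g., by EM without training data:

```python
import math

# Illustrative m- and u-probabilities per identifier: the probability a
# field agrees for a true match (m) and for a non-match (u).
FIELDS = {"surname": (0.95, 0.02), "first": (0.90, 0.05),
          "dob": (0.85, 0.01), "zip": (0.90, 0.10)}

def match_weight(record_a, record_b):
    """Sum of log-likelihood ratios over fields: the naive Bayes /
    conditional-independence version of the Fellegi-Sunter weight."""
    w = 0.0
    for field, (m, u) in FIELDS.items():
        if record_a[field] == record_b[field]:
            w += math.log(m / u)                  # agreement weight
        else:
            w += math.log((1 - m) / (1 - u))      # disagreement weight
    return w

a = {"surname": "WINKLER", "first": "WILLIAM", "dob": "1950", "zip": "20746"}
b = {"surname": "WINKLER", "first": "BILL", "dob": "1950", "zip": "20746"}
print(round(match_weight(a, b), 2))  # compare to upper/lower link cutoffs
```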
This program is physically accessible to persons with disabilities. For interpreting services, contact Yvonne Moore at TTY 301-457-2540, 301-457-2853 (voice mail), or Sherry.Y.Moore@census.gov.
Title: Adaptive and Link-Tracing Sampling Designs for Surveys
- Speaker: Professor Steven K. Thompson, Pennsylvania State University
- Discussant: Monroe G. Sirken, Ph.D., Sr. Research Scientist, Office of the Director, National Center for Health Statistics (NCHS)
- Chair: Myron J. Katzoff, Ph.D., Office of Research and Methodology, NCHS
- Date & Time: January 31, 2002, 10:00 - 11:00 a.m.
- Location: NCHS Auditorium (11th floor), Metro III/Presidential Bldg., 6525 Belcrest Road, Hyattsville, MD
- Sponsor: NCHS and WSS Methodology Section
Adaptive sampling designs are those in which the procedure for selecting the sample depends on values of the variable of interest observed during the survey. They can be useful for surveys of populations or monitoring of health events that are highly clustered in space or time. For example, if cases of a rare, contagious disease are encountered in a survey unit, neighboring units can be added to the sample.
Link-tracing designs are used in studies of hidden human populations, such as populations of people at high risk for HIV infection or transmission. In such studies, social links are followed from one individual to another to add more members of the hidden population to the sample. Similarly, in national health surveys, link-tracing techniques could be used to increase the representation of underrepresented target groups.
In this talk, a variety of adaptive and link-tracing sampling methods will be described and their application to health surveys and surveillance programs discussed.
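A minimal sketch of the adaptive idea from the abstract (not Professor Thompson's actual estimators, which also require design-unbiased weighting): units whose observed value triggers the condition recruit their neighbors into the sample, as in the contagious-disease example above.

```python
import random

def adaptive_cluster_sample(grid, initial, condition=lambda y: y > 0):
    """Adaptive cluster sampling on a rectangular grid: whenever a sampled
    unit satisfies the condition, its four neighbors are added, and so on."""
    nrows, ncols = len(grid), len(grid[0])
    sampled, frontier = set(), list(initial)
    while frontier:
        i, j = frontier.pop()
        if (i, j) in sampled:
            continue
        sampled.add((i, j))
        if condition(grid[i][j]):   # e.g., a case of the rare disease found
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < nrows and 0 <= nj < ncols:
                    frontier.append((ni, nj))
    return sampled

# A clustered population: two pockets of cases in an otherwise empty grid.
grid = [[0] * 6 for _ in range(6)]
grid[1][1] = grid[1][2] = grid[4][4] = 3
random.seed(1)
initial = random.sample([(i, j) for i in range(6) for j in range(6)], 4)
print(sorted(adaptive_cluster_sample(grid, initial)))
```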
Title: Estimation of the Self-Similarity Parameter When Data Has Finite Variance or is Heavy-Tailed
- Speaker: Vladas Pipiras, Boston University
- Time: Monday, February 4, 2002, 4:15 pm
- Place: Room 3206, Mathematics Building, University of Maryland College Park. For directions, please visit the Mathematics Web Site: http://www.math.umd.edu/dept/contact.html
- Sponsor: University of Maryland, Statistics Program, Department of Mathematics
The focus of this talk is on data whose statistical properties are preserved with respect to scaling in time and/or space. Such scaling (self-similar) datasets have been found in various applications, most notably in telecommunications several years ago. One of the key questions related to scaling data is estimation of its self-similarity parameter. We will discuss how this can be done by using wavelets. We will first go over wavelet-based estimation in finite variance data and then talk about extensions to data with heavy tails.
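A minimal sketch of the wavelet idea: for a self-similar signal, the log2 variance of the detail coefficients grows linearly in the level with slope 2H + 1, so a regression across levels recovers H. Brownian motion (H = 1/2) is used as a test signal; the Haar filter here is the simplest choice, not necessarily the one used in the talk.

```python
import numpy as np

def haar_detail_logvars(x, levels):
    """log2 variance of discrete Haar wavelet detail coefficients by level."""
    a = np.asarray(x, dtype=float)
    logvars = []
    for _ in range(levels):
        a, d = (a[0::2] + a[1::2]) / np.sqrt(2), (a[0::2] - a[1::2]) / np.sqrt(2)
        logvars.append(np.log2(np.var(d)))
    return np.array(logvars)

def estimate_H(x, levels=(3, 10)):
    """Fit the slope of log2 Var(d_j) against the level j and solve
    slope = 2H + 1 for H (an Abry-Veitch-style estimator)."""
    lv = haar_detail_logvars(x, levels[1])
    j = np.arange(levels[0], levels[1] + 1)
    slope = np.polyfit(j, lv[levels[0] - 1: levels[1]], 1)[0]
    return (slope - 1) / 2

rng = np.random.default_rng(42)
bm = np.cumsum(rng.standard_normal(2**16))  # Brownian motion: H = 0.5
print(round(estimate_H(bm), 2))             # should be close to 0.5
```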
Topic: GIS and Image Based Approaches to TIGER Enhancement
- Speakers:
Patricia Hu, Bruce Peterson, Demin Xiong
Center for Transportation Analysis
Oak Ridge National Laboratory
- Date/Time: February 6, 2002, 10:30 - 11:30 a.m.
- Location: U.S. Bureau of the Census, 4700 Silver Hill Road, Suitland, Maryland - the Morris Hansen Auditorium, FOB 3. Enter at Gate 5 on Silver Hill Road. Please call (301) 457-4974 to be placed on the visitors' list. A photo ID is required for security purposes.
- Sponsor: U.S. Bureau of the Census, Statistical Research Division
In this seminar, we will discuss technical approaches for improving and updating TIGER databases, focusing on the use of GIS-based software tools and remotely sensed data. We will first propose a distributed model of a geographic data system that is modular in space, scale, and function, and facilitates distributed responsibility for maintenance. We will then talk about a set of software tools that can potentially be used for the process. These tools will include: (1) image road network extraction tools for TIGER line file update; (2) map matching tools for road network data conflation and integration, address matching, and geographic boundary correlation; and (3) generalization tools for multi-scale centerline road network representations. More importantly, we would like to take this opportunity to learn more about existing capabilities and previous experience with TIGER enhancement, as well as ideas and requirements regarding future directions.
This program is physically accessible to persons with disabilities. For interpreting services, contact Yvonne Moore at TTY 301-457-2540, 301-457-2853 (voice mail), or Sherry.Y.Moore@census.gov.
Title: A Transaction Price Index for Air Travel
- Speaker: Dr. Janice Lent, U.S. Bureau of Labor Statistics (joint work with Alan Dorfman, U.S. Bureau of Labor Statistics)
- Date & Time: February 8, 2002, 11:00 a.m. - 12:00 p.m.
- Location: Funger Hall 321. 2201 G Street NW. Foggy Bottom metro stop on the blue and orange line.
- Sponsor: The George Washington University, Department of Statistics
We present research undertaken to develop a price index estimator based on data from the U.S. Transportation Department's (DOT's) Origin and Destination (O&D) Survey. Through this survey, the DOT collects prices actually paid by consumers for air travel; these may differ considerably from "list prices" (used in the official U.S. airfare CPI) due to the airlines' use of complex pricing structures. Since the O&D survey was not designed to provide data for price index estimation, however, the research involves testing unique imputation and across-time matching procedures. After a brief introduction to the general field of price index estimation, we describe our methodology and compare our experimental index series to the official airfare CPI series.
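The paper's imputation and matching procedures are specific to the O&D data; the sketch below shows only the generic chained matched-model mechanics with a Jevons (geometric mean) link and invented fares.

```python
from math import prod

def chained_price_index(periods):
    """Chained matched-model index: in each pair of adjacent periods,
    match items observed in both and take the geometric mean of their
    price relatives (a Jevons-type elementary index)."""
    index = [100.0]
    for prev, curr in zip(periods, periods[1:]):
        matched = set(prev) & set(curr)              # across-time matching
        relatives = [curr[i] / prev[i] for i in matched]
        link = prod(relatives) ** (1 / len(relatives))
        index.append(index[-1] * link)
    return index

# Toy fare data: route -> average transaction price, by quarter.
q1 = {"DCA-ORD": 210.0, "IAD-LAX": 320.0, "BWI-BOS": 150.0}
q2 = {"DCA-ORD": 220.5, "IAD-LAX": 304.0, "BWI-BOS": 153.0, "DCA-ATL": 180.0}
q3 = {"DCA-ORD": 231.5, "IAD-LAX": 310.1, "DCA-ATL": 171.0}
print([round(v, 1) for v in chained_price_index([q1, q2, q3])])
```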
Note: For a complete list of upcoming seminars check the dept's seminar web site: http://www.gwu.edu/~stat/seminars/spring2002.htm. The campus map is at: http://www.gwu.edu/Map/. The contact person is Reza Modarres at Reza@gwu.edu or 202-994-6359.
Title: Small Area Estimation for U.S. States, Counties, and School Districts
- Speaker: William R. Bell, Ph.D., Senior Mathematical Statistician, Small Area Estimation, SRD, Bureau of the Census, Washington, DC
- Time: Thursday, February 14, 2002, 12:10 pm - 1:00 pm
- Place: Room 1208, LeFrak Hall, University of Maryland College Park. For directions, please visit the Mathematics Web Site: http://www.math.umd.edu/dept/contact.html
- Sponsor: University of Maryland, Statistics Program, Department of Mathematics
In response to growing demand for small area estimates for public policy purposes, the Census Bureau has developed the Small Area Income and Poverty Estimates (SAIPE) program. This talk describes the SAIPE models and methods used to produce small area estimates of poor school-age children, which are used in allocations of over $7 billion of funds under Title I of the Elementary and Secondary Education Act. The state and county models use dependent variables obtained from direct poverty estimates from March Current Population Survey (CPS) data, and predictor variables formed from IRS tax return files, food stamp program data, 1990 census data, and updated Census Bureau population estimates. These models also allow for sampling error in the direct CPS estimates. Some statistical issues arising with these models are discussed. A synthetic procedure is used to produce school district estimates of children in poverty by applying school district-to-county shares from the 1990 census to the SAIPE county model-based estimates of children in poverty. Evaluations of the school district estimates are discussed.
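A minimal sketch of the synthetic school-district step with invented numbers:

```python
def synthetic_district_estimates(county_estimate, district_shares_1990):
    """Apply fixed 1990-census district-to-county shares to a model-based
    county estimate of poor school-age children (synthetic estimation)."""
    return {d: county_estimate * s for d, s in district_shares_1990.items()}

# Hypothetical county with a SAIPE-style model estimate of 12,400 poor
# school-age children, split by each district's 1990 share of the county.
shares = {"District A": 0.45, "District B": 0.35, "District C": 0.20}
print(synthetic_district_estimates(12400, shares))
```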
Title: Memorial Session: Wray Jackson Smith
- Speakers:
Bette S. Mahoney, Consultant
Dhirendra Ghosh and Sameena Salvucci, Synectics for Management Decisions, Inc.
Nancy Kirkendall, Energy Information Administration
John L. Czajka, Mathematica Policy Research, Inc.
- Chair: Elizabeth Margosches, Environmental Protection Agency
- Date/Time: Thursday, February 14, 2002, 12:30 - 2:00 p.m.
- Location: Bureau of Labor Statistics, Conference Center, Room 1, Postal Square Building (PSB), 2 Massachusetts Avenue, NE, Washington, DC. Please use the First St., NE, entrance to the PSB.
- Sponsors: WSS Public Policy Section, ASA Government Statistics Section, ASA Caucus on Women in Statistics
This session honors Wray Jackson Smith for his contributions to the practice of statistics within the federal government and his service to the profession. Dr. Smith, who died unexpectedly in May 2000, was an active member of the Washington Statistical Society, a Fellow of the American Statistical Association (ASA), a founding member of the Government Statistics Section, and a long-time supporter of the Women's Caucus. Dr. Smith is remembered best for his work in federal program evaluation and his role in coordinating or helping to launch several major surveys for the Department of Health and Human Services and the Department of Energy. After retiring from the federal government in 1983, he maintained a very active role in federal statistics from the private sector.
In the first paper, Bette Mahoney reviews some of Dr. Smith's contributions to the application of statistical research to social policy, including his participation in early evaluations of VISTA, the Job Corps, and Day Care and his role in some of the major social surveys of the 1970s. In the second paper, Dhirendra Ghosh and Sameena Salvucci discuss the influence of Dr. Smith's 1980 doctoral dissertation on the later work of Smith and others on the optimum periodicity of repeated surveys. Ghosh and Salvucci extend the generality of this work. The third paper, by Nancy Kirkendall, outlines a planned collaboration between herself and Dr. Smith on a text discussing the use of time series methods in periodic surveys. The paper reviews applications to assessing survey costs, analyzing survey design information, and designing edit and imputation procedures. The final paper, by Thomas Jabine and John Czajka, considers the problems that attend the use of periodic sample surveys as a data source in formulas for allocating federal program funds. The paper draws on Dr. Smith's work in the late 1970s as chair of the subcommittee that produced Statistical Policy Working Paper 1 for the Federal Committee on Statistical Methodology. In the month before his death, Dr. Smith revisited this topic with a background paper that he delivered to keynote a Workshop on Formulas for Allocating Program Funds. Jabine and Czajka explore alternative approaches to addressing these problems.
This session kicks off the new Wray Smith Scholarship, sponsored by the Government Statistics Section of the ASA, with support from a broad range of organizations, including the Washington Statistical Society. The scholarship is intended to reward promising young statisticians for their diligence and, thereby, encourage them to consider a future in government statistics. A reception will follow this session.
Topic: Masking and Re-identification Methods for Public-Use Microdata
- Speaker: William E. Winkler, Statistical Research Division, U.S. Bureau of the Census
- Date/Time: February 21, 2002, 10:30 - 11:30 a.m.
- Location: U.S. Bureau of the Census, 4700 Silver Hill Road, Suitland, Maryland - the Morris Hansen Auditorium, FOB 3. Enter at Gate 5 on Silver Hill Road. Please call (301) 457-4974 to be placed on the visitors' list. A photo ID is required for security purposes.
- Sponsor: U.S. Bureau of the Census, Statistical Research Division
This talk describes methods for masking public-use microdata. A primary concern (analytic validity) is whether the masked microdata will allow reproduction of some of the analyses that could be produced by the original unmasked data. In some situations, special software routines may be required to analyze the masked microdata. We describe re-identification methods that can be used to evaluate the confidentiality of the microdata.
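A minimal sketch of the two sides of the problem, using additive noise as a stand-in masking method (one standard baseline; not necessarily the methods discussed) and nearest-neighbor matching as the re-identification experiment:

```python
import numpy as np

rng = np.random.default_rng(7)
original = rng.lognormal(mean=10, sigma=1, size=(200, 3))  # e.g., payroll data

# Masking: independent additive noise, a common baseline method.
masked = original + rng.normal(scale=0.10 * original.std(axis=0),
                               size=original.shape)

def reidentification_rate(orig, mask):
    """Link each masked record back to its nearest original record;
    a high match rate signals disclosure risk."""
    z_orig = (orig - orig.mean(axis=0)) / orig.std(axis=0)
    z_mask = (mask - orig.mean(axis=0)) / orig.std(axis=0)
    hits = 0
    for i, row in enumerate(z_mask):
        nearest = np.argmin(((z_orig - row) ** 2).sum(axis=1))
        hits += (nearest == i)
    return hits / len(mask)

print(f"re-identified: {reidentification_rate(original, masked):.0%}")
```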
This program is physically accessible to persons with disabilities. For interpreting services, contact Yvonne Moore at TTY 301-457-2540, 301-457-2853 (voice mail), or Sherry.Y.Moore@census.gov.
Title: Global Atmospheric Changes: Statistical Trend Analyses of Ozone and Temperature Data
- Speaker: George C. Tiao, The University of Chicago
- Chair: David Findley, U.S. Census Bureau
- Date/Time: Monday, February 25, 2002, 12:30 to 2:00 p.m.
- Location: Bureau of Labor Statistics, Postal Square Building (PSB), Conference Center, Conference Room 7, 2 Massachusetts Ave., NE, Washington, DC. Please use the First Street entrance to the PSB.
- Sponsor: WSS Methodology Section
The ozone layer in the stratosphere plays an important role in the life cycle on earth. This is mainly because ozone absorbs the harmful ultraviolet radiation from the sun and prevents most of it from reaching the surface. In recent years, there has been considerable attention focused on the effect of the release of chlorofluoromethanes on the ozone layer. There has also been an intense interest in global warming due to man-made causes such as the burning of fossil fuels.
In this talk we present findings of an extensive statistical analysis of ozone and temperature data over the last thirty years from networks of ground stations and from satellites. The principal objectives of the analysis are (i) to assess trends in ozone and temperature, and (ii) to compare the estimated trends with predictions obtained from large scale chemical/dynamical models of the atmosphere. Some statistical issues related to trend detection and analyses will also be discussed.
Title: Maximum Likelihood Estimation for Fractional Diffusions
- Speaker: Dr. Jay Bishwal, Department of Mathematics
University of Cincinnati
- Date & Time: March 1, 2002, 2:00 - 3:00 p.m.
- Location: Funger Hall 307. 2201 G Street NW. Foggy Bottom metro stop on the blue and orange line.
- Sponsor: The George Washington University, Department of Statistics
Recently, it has been empirically found that log share prices exhibit long range dependence between returns on different days. In view of this, it becomes necessary to extend the diffusion models to processes having long range dependence. One way is to model these data by stochastic differential equations with a fractional Brownian motion (fBM) driving term, with Hurst index greater than 1/2. Since fBM is neither a Markov process nor a semi-martingale, except when the Hurst index equals 1/2, classical Ito calculus cannot be used to develop the theory. First, recent developments in fractional stochastic calculus: stochastic integral with respect to fBM, fractional Ito formula and fractional Girsanov formula will be reviewed. The use of Volterra and Dirichlet stochastic calculus will be emphasized. The long time asymptotic behaviour of the maximum likelihood estimator of the drift parameter in the nonlinear SDE driven by fBM will be studied. Some further problems on estimation in fractional diffusions based on discrete observations will be discussed.
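As background for the driving process, here is a minimal sketch that simulates fBM exactly on a grid via the Cholesky factor of its covariance function, and checks the positive autocorrelation of increments that appears when the Hurst index exceeds 1/2:

```python
import numpy as np

def fbm_path(n, H, T=1.0, seed=0):
    """Exact simulation of fractional Brownian motion on a grid via the
    Cholesky factor of its covariance R(s,t) = (s^2H + t^2H - |s-t|^2H)/2."""
    t = np.linspace(T / n, T, n)
    s, u = np.meshgrid(t, t)
    cov = 0.5 * (s**(2*H) + u**(2*H) - np.abs(s - u)**(2*H))
    L = np.linalg.cholesky(cov)
    z = np.random.default_rng(seed).standard_normal(n)
    return t, L @ z

t, x = fbm_path(n=500, H=0.7, seed=3)   # H > 1/2: long-range dependence
increments = np.diff(x)
# Lag-1 autocorrelation of the increments is positive for H > 1/2.
print(round(np.corrcoef(increments[:-1], increments[1:])[0, 1], 3))
```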
Note: For a complete list of upcoming seminars check the dept's seminar web site: http://www.gwu.edu/~stat/seminars/spring2002.htm. The campus map is at: http://www.gwu.edu/Map/. The contact person is Reza Modarres at Reza@gwu.edu or 202-994-6359.
Title: Estimating Output Growth with Labor Market Indicators: A Kalman Filter Approach to Interpolation and Prediction of GDP with Noisy Data
- Speaker: Mark French, Federal Reserve Board
- Discussant: Peter Zadrozny, Bureau of Labor Statistics
- Chair: Linda Atkinson, Economic Research Service, USDA
- Date & Time: Wednesday, March 6, 2002, 12:30 PM - 2:00 PM
- Location: Bureau of Labor Statistics, Conference Center Room 10, Postal Square Building (PSB), 2 Massachusetts Ave. NE, Washington, D.C. Please use the First St., NE, entrance to the PSB. To gain entrance to BLS, please see "Notice" at the top of this web page.
- Sponsor: WSS Economics Section
This paper uses monthly labor-market data to estimate the unobserved month-to-month path of GDP, and to predict near-term growth of quarterly GDP. Past studies typically used generalized least squares methods, which rely on the exogeneity of the indicator variables. However, indicator variables are typically not exogenous: in this paper, the indicator variables, production-worker hours and the unemployment rate, are determined simultaneously with GDP. This paper therefore uses a state-space/Kalman filter approach to the interpolation/prediction problem, extending Zadrozny's 1990 work in several ways to allow real-time analysis with noisy data. It incorporates the effects of data revisions, sampling error, and prior release of indicators relative to GDP, using several indicator variables. The resulting forecasts of GDP are marginally improved over least-squares estimates, and at the same time the model generates smoothed estimates of the indicator variables and of the unobserved level of monthly GDP.
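The paper's state-space model treats quarterly GDP as an aggregate of unobserved months and uses several simultaneous indicators; the sketch below shows only the basic filtering mechanics in a univariate local-level model where an accurate reading arrives every third period and a noisy indicator arrives monthly. All parameters and data are invented.

```python
import numpy as np

def kalman_interpolate(indicator, quarterly, q=0.5, r_ind=1.0, r_qtr=0.1):
    """Local-level Kalman filter: the latent monthly level follows a
    random walk; a noisy monthly indicator is always observed, and a more
    accurate 'GDP' reading arrives only in quarter-end months (else None)."""
    x, p = 0.0, 1e6                 # diffuse initialization
    estimates = []
    for t in range(len(indicator)):
        p += q                      # predict: random-walk state
        for obs, r in ((indicator[t], r_ind), (quarterly[t], r_qtr)):
            if obs is not None:     # measurement update per available series
                k = p / (p + r)
                x += k * (obs - x)
                p *= (1 - k)
        estimates.append(x)
    return estimates

rng = np.random.default_rng(5)
truth = np.cumsum(rng.normal(scale=0.7, size=24))
indicator = list(truth + rng.normal(scale=1.0, size=24))
quarterly = [truth[t] + rng.normal(scale=0.3) if t % 3 == 2 else None
             for t in range(24)]
est = kalman_interpolate(indicator, quarterly)
print(round(float(np.mean((np.array(est) - truth) ** 2)), 2))
```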
Topic: Overview of the Fellegi-Holt Model of Statistical Data Editing: Current Methods and Research Problems
- Speaker: William E. Winkler, Statistical Research Division, U.S. Bureau of the Census, william.e.winkler@census.gov
- Date/Time: March 13, 2002, 10:30 - 11:30 a.m.
- Location: U.S. Bureau of the Census, 4700 Silver Hill Road, Suitland, Maryland - the Morris Hansen Auditorium, Bldg. 3. Enter at Gate 5 on Silver Hill Road. Please call (301) 457-4974 to be placed on the visitors' list. A photo ID is required for security purposes.
- Sponsor: U.S. Bureau Of Census, Statistical Research Division
The editing paper of Fellegi and Holt (JASA 1976) provides a model that can be used for production edit/imputation systems. Two advantages of the model are that all edits are contained in easily modified tables and that each edit-failing record can be "corrected" in one pass through the data. Classic if-then-else systems cannot assure that a "corrected" record will satisfy all edits. They are difficult to maintain if there are many if-then-else edit rules in hundreds or thousands of lines of code. Progress on general purpose Fellegi-Holt systems has been slow because of the needed skills in operations research and computer science for the edit portion of systems and the lack of suitable general imputation software. This talk describes methods implemented in Canada and the U.S. and promising new methods being researched in the Netherlands and Italy. Some general background is given in research report srd98/01 at http://www.census.gov/srd/www/byyear.html.
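A minimal sketch of the Fellegi-Holt error-localization principle with toy edits and a brute-force search; production systems replace the brute force with the operations-research machinery the abstract alludes to.

```python
from itertools import combinations, product

# Toy edit rules (each must hold for a clean record), with small domains.
DOMAINS = {"age": range(0, 100), "marital": ["single", "married"],
           "income": range(0, 200_000, 10_000)}
EDITS = [lambda r: not (r["age"] < 15 and r["marital"] == "married"),
         lambda r: not (r["age"] < 14 and r["income"] > 0)]

def error_localization(record):
    """Fellegi-Holt principle: find a smallest set of fields that can be
    changed so the completed record satisfies every edit in one pass."""
    fields = list(record)
    for k in range(len(fields) + 1):
        for subset in combinations(fields, k):
            free = [DOMAINS[f] for f in subset]
            for values in product(*free):         # try all completions
                trial = dict(record, **dict(zip(subset, values)))
                if all(edit(trial) for edit in EDITS):
                    return subset, trial
    return None

failing = {"age": 12, "marital": "married", "income": 30_000}
subset, fixed = error_localization(failing)
print(subset, {f: fixed[f] for f in subset})   # ('age',) {'age': 15}
```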
PLEASE CALL (301) 457-4974 IF YOU PLAN TO ATTEND. A PHOTO ID IS REQUIRED FOR SECURITY PURPOSES.
This program is physically accessible to persons with disabilities. For interpreting services, contact Yvonne Moore at TTY 301-457-2540, 301-457-2853 (voice mail), or Sherry.Y.Moore@census.gov.
Title: Probabilistic Analysis of Algorithms by the Contraction Method
- Speaker: Professor Ralph Neininger, School of Computer Science, McGill University, Montreal
- Date/Time: Friday, March 15, 2002, 11:00 am - 12:00 pm
- Location: Funger Hall 308. 2201 G Street NW. Foggy Bottom metro stop on the blue and orange line.
- Sponsor: The George Washington University, Department of Statistics
The contraction method provides a framework to prove limit laws for sequences of random variables satisfying recurrence relations on the level of distributions as they arise for parameters of recursive algorithms or random tree structures. The name of the method refers to the characterization of the occurring limit distributions as fixed-points of maps between spaces of probability measures, which turn out to be contractions with respect to appropriate probability metrics. In this talk an overview of this method is given with particular emphasis on recent developments.
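A minimal numerical illustration using the best-known example, the Quicksort recurrence: iterating the fixed-point map on an empirical sample contracts it toward the limit law. The specific recursion below is the standard one from the literature, not necessarily an example from the talk.

```python
import numpy as np

rng = np.random.default_rng(11)

def quicksort_limit_map(sample, size=100_000):
    """One application of the Quicksort fixed-point map
    T(Y) = U*Y1 + (1-U)*Y2 + C(U), C(u) = 1 + 2u*ln(u) + 2(1-u)*ln(1-u),
    acting on an empirical sample (Y1, Y2 independent copies of Y)."""
    u = np.clip(rng.uniform(size=size), 1e-12, 1 - 1e-12)  # avoid log(0)
    y1 = rng.choice(sample, size)
    y2 = rng.choice(sample, size)
    c = 1 + 2 * u * np.log(u) + 2 * (1 - u) * np.log(1 - u)
    return u * y1 + (1 - u) * y2 + c

sample = np.zeros(100_000)        # start from the point mass at 0
for _ in range(15):               # iterate the contraction toward its fixed point
    sample = quicksort_limit_map(sample)
print(round(sample.mean(), 3), round(sample.std(), 3))
# mean -> 0; std -> sqrt(7 - 2*pi^2/3) ~ 0.65, the Quicksort limit law
```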
Note: For a complete list of upcoming seminars check the dept's seminar web site: http://www.gwu.edu/~stat/seminars/spring2002.htm. The campus map is at: http://www.gwu.edu/Map/. The contact person is Reza Modarres at Reza@gwu.edu or 202-994-6359.
Topic: Two-Sided Coverage Intervals For Small Proportions Based On Survey Data
- Speaker: Phil Kott, National Agricultural Statistics Service
- Chair: Mary Batcher, Ernst & Young
- Date/Time: Wednesday, March 20, 2002, 12:30 to 2:00 p.m.
- Location: Bureau of Labor Statistics, Postal Square Building (PSB), Conference Center, Conference Room 7, 2 Massachusetts Ave., NE, Washington, DC. Please use the First Street entrance to the PSB. To gain entrance to BLS, please see "Notice" at the top of this web page.
- Sponsor: WSS Methodology Section
Abstract:
The standard two-sided Wald coverage interval for a small proportion, P, may perversely include negative values. One way to correct this anomaly when analyzing unweighted data from a simple random sample is to compute an asymmetric Wilson (or score) coverage interval. This approach has proven not only theoretically satisfying but empirically effective.
When P is estimated with a weighted estimator, p, using data from a complex sample, some have suggested computing an ad-hoc Wilson coverage interval by replacing the actual sample size in the Wilson formula with the effective sample size. We will examine this approach and an alternative based on a proposal by Andersson and Nerman (at ICES II). Their method focuses on removing the impact of the correlation between the numerator and denominator of the pivotal quantity (p - P)/v, where v is the estimated randomization standard error of p. When p is unweighted and the data come from a simple random sample, the coverage interval generated by the Andersson-Nerman approach is asymptotically identical to the Wilson interval. Consequently, their approach appears to be a more theoretically grounded generalization of the Wilson method than replacing the sample size with the effective sample size.
An empirical study of weighted estimators under simple random sampling reveals that Andersson-Nerman coverage intervals are only slightly better than those derived using the ad-hoc Wilson approach. Both are much better than standard Wald intervals, but nowhere near as good as the Wilson approach for an unweighted estimator. An investigation into the stability of the two methods reveals why and suggests an ad hoc remedy: computing the effective degrees of freedom and constructing t-based intervals. The effective degrees of freedom is not the same as the effective sample size. In fact, for unweighted p under simple random sampling, the effective degrees of freedom are infinite, which is why the Wilson interval works so well.
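A minimal sketch of the two textbook intervals from the abstract, evaluated at an effective sample size (nominal n deflated by an assumed design effect); all numbers are invented. Note how the Wald interval dips below zero while the Wilson interval stays positive and asymmetric.

```python
from math import sqrt

Z = 1.96  # 95% coverage

def wald(p, n):
    half = Z * sqrt(p * (1 - p) / n)
    return p - half, p + half

def wilson(p, n):
    """Score interval: cannot go negative, asymmetric around small p."""
    center = (p + Z**2 / (2 * n)) / (1 + Z**2 / n)
    half = Z * sqrt(p * (1 - p) / n + Z**2 / (4 * n**2)) / (1 + Z**2 / n)
    return center - half, center + half

# Weighted estimate p = 0.005 from a complex sample, nominal n = 1200,
# design effect 2.0, so effective sample size n_eff = 1200 / 2 = 600.
p, n_eff = 0.005, 600
print("Wald:  ", tuple(round(x, 4) for x in wald(p, n_eff)))    # lower < 0
print("Wilson:", tuple(round(x, 4) for x in wilson(p, n_eff)))  # positive
```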
Topic: Machine Learning Methods for Text Classification
- Speaker: William E. Winkler, Statistical Research Division, U.S. Bureau of the Census, william.e.winkler@census.gov
- Date/Time: March 27, 2002, 10:30 - 11:30 a.m.
- Location: U.S. Bureau of the Census, 4700 Silver Hill Road, Suitland, Maryland - the Morris Hansen Auditorium, Bldg. 3. Enter at Gate 5 on Silver Hill Road. Please call (301) 457-4974 to be placed on the visitors' list. A photo ID is required for security purposes.
- Sponsor: U.S. Bureau of the Census, Statistical Research Division
Textual information consisting of words can be used for areas such as classification of documents into categories (e.g., industry and occupation coding), queries in web and library searches, and the record linkage of name and address lists. To use text effectively, the text may need to be cleaned to remove typographical errors, and the documents (records) given a mathematical representation in a probabilistic model. This talk describes an application of Bayesian networks to classify a collection of Reuters newspaper articles (Lewis 1992) into categories (Nigam, McCallum, Thrun, and Mitchell 2000, Winkler 2000). The results are indirectly compared with the current best-performing methods such as Support Vector Machines (Vapnik 1995, 2000) and Boosting (Schapire and Singer 2000, Friedman, Hastie, and Tibshirani 2000). For text classification, until five years ago, the best methods in computational linguistics outperformed the best machine learning methods. Without the need to build complicated semantic or syntactical representations, the best machine learning methods now outperform the best methods in computational linguistics. This makes the methods much more language independent.
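A minimal sketch of a multinomial naive Bayes classifier (the simplest Bayesian-network text model) with add-one smoothing, using invented two-category training documents in the spirit of the Reuters categories:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Multinomial naive Bayes: docs is a list of (label, words) pairs."""
    word_counts, class_counts, vocab = defaultdict(Counter), Counter(), set()
    for label, words in docs:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return word_counts, class_counts, vocab

def classify(model, words):
    """Pick the class maximizing log prior + smoothed log likelihoods."""
    word_counts, class_counts, vocab = model
    n_docs = sum(class_counts.values())
    best = None
    for c in class_counts:
        total = sum(word_counts[c].values())
        score = math.log(class_counts[c] / n_docs)
        for w in words:  # add-one (Laplace) smoothing per word
            score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        best = max(best, (score, c)) if best else (score, c)
    return best[1]

docs = [("grain", "wheat corn export harvest".split()),
        ("grain", "corn crop yield export".split()),
        ("money-fx", "dollar yen exchange bank".split()),
        ("money-fx", "interest rate bank currency".split())]
model = train_nb(docs)
print(classify(model, "wheat export prices".split()))  # -> grain
```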
PLEASE CALL (301) 457-4974 IF YOU PLAN TO ATTEND. A PHOTO ID IS REQUIRED FOR SECURITY PURPOSES.
This program is physically accessible to persons with disabilities. For interpreting services, contact Yvonne Moore at TTY 301-457-2540, 301-457-2853 (voice mail), or Sherry.Y.Moore@census.gov.
Title: Weighted Likelihood, Mixture Models and Model Assessment
- Speaker: Marianthi Markatou, PhD, Statistics Program Director, Division of Mathematical Sciences, National Science Foundation, and Professor of Statistics, Columbia University
- Date/Time: Wednesday, April 3, 2002, 11:00 am
- Location: Executive Plaza North, Conference Room G, 6130 Executive Boulevard, Rockville, Maryland
We will discuss the methodology of weighted likelihood and its application to mixture models as well as issues associated with model assessment. A methodology for model assessment will be proposed.
For additional information, contact Susan Winer, Office of Preventive Oncology, 301-496-8640.
Title: The SAR Procedure: A Diagnostic Analysis of Heterogeneous Data
- Speaker: George C. Tiao, The University of Chicago
- Date/Time: Thursday, April 4, from 2:00 to 3:30 p.m.
- Location: Room 6200, Nassif (DOT) Building, 400 7th Street SW (see below for security instructions)
This paper presents a procedure for detecting heterogeneity in a sample with respect to a given model. It can be applied to find whether a univariate or multivariate sample has been generated by different distributions, or whether a regression equation is really a mixture of different regression lines. Based on some special features of cross-validating predictive distributions, the idea of the procedure is first to split the sample into more homogeneous groups and then to recombine the observations in order to form homogeneous clusters. The proposed procedure can be applied to find heterogeneity in any statistical model. The performance of the procedure is illustrated in univariate, multivariate, and linear regression problems.
Getting into the DOT building:
1. There are four primary entrances into the Department of Transportation Nassif Building, labeled NE, NW, SE, and SW according to their orientation. Only one will admit visitors: the SW entrance, on the 7th Street side.
2. Visitors will need to show a picture ID upon entrance.
3. Both Promod Chandhok and Jeremy Wu are points of contact for clearance and escort. Promod's phone number is (202)-366-2158; Jeremy's is (202)-366-4648. They and other staff members will serve as escorts at the entrance.
4. Please send names, federal agency (if a government worker) or picture ID such as a driver's license number (if not a government worker), and phone or email to Promod (Promod.Chandhok@bts.dot.gov) and Jeremy Wu (Jeremy.Wu@ost.dot.gov) to facilitate getting into DOT.
Title: Efficiency of Monte Carlo EM and Simulated Maximum Likelihood in Two-Stage Hierarchical Models
- Speaker: Wolfgang Jank, Department of Decision & Information Technologies, The Robert H. Smith School of Business, University of Maryland
- Date/Time: Thursday, April 4, 3:30 p.m.
- Location: Room 1313, Mathematics Building, University of Maryland. For directions, please visit the Mathematics Web Site: http://www.math.umd.edu/dept/contact.html
- Sponsor: University of Maryland, Statistics Program, Department of Mathematics
Likelihood estimation in hierarchical models is often complicated by the fact that the likelihood function involves an analytically intractable integral. Numerical approximation to this integral is an option but it is generally not recommended when the integral dimension is high. An alternative approach is based on the ideas of Monte Carlo integration, which approximates the intractable integral by an empirical average based on simulations. In this paper we investigate the efficiency of two Monte Carlo estimation methods, the Monte Carlo EM (MCEM) algorithm and simulated maximum likelihood (SML). We derive the asymptotic Monte Carlo errors of both methods and show that, even under the optimal SML importance sampling distribution, the efficiency of SML decreases rapidly (relative to that of MCEM) as the missing information about the unknown parameter increases.
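A minimal sketch of the SML side of the comparison for a two-stage logistic model, simulating from the random-effect distribution with common random numbers; the paper's asymptotic analysis, importance sampling, and MCEM comparison are beyond this sketch, and all parameters are invented.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)

# Two-stage model: y_ij ~ Bernoulli(expit(beta + u_i)), u_i ~ N(0, 1),
# i = 1..m clusters, j = 1..n observations; true beta = 0.5.
m, n, beta_true = 200, 10, 0.5
u = rng.standard_normal(m)
y = rng.random((m, n)) < 1 / (1 + np.exp(-(beta_true + u)[:, None]))
k = y.sum(axis=1)                  # successes per cluster

DRAWS = rng.standard_normal(500)   # common random numbers across beta values

def sml_negloglik(beta):
    """Simulated likelihood: the intractable integral over u is replaced
    by a Monte Carlo average over draws from the distribution of u
    (plain simulation; importance sampling would refine this)."""
    p = 1 / (1 + np.exp(-(beta + DRAWS)))                        # (500,)
    lik = np.mean(p**k[:, None] * (1 - p)**((n - k)[:, None]), axis=1)
    return -np.sum(np.log(lik))

res = minimize_scalar(sml_negloglik, bounds=(-2, 2), method="bounded")
print(round(res.x, 2))             # should land near beta_true = 0.5
```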
Title: Optimal Designs For Phase I Clinical Trials
- Speaker: Professor William F. Rosenberger, Department of Mathematics and Statistics, University of Maryland, Baltimore County
- Date/Time: Friday, April 5, 2002, 11:00 am - 12:00 pm
- Location: Funger Hall 321. 2201 G Street NW. Foggy Bottom metro stop on the blue and orange line.
- Sponsor: The George Washington University, Department of Statistics
A broad approach to the design of phase I clinical trials for the efficient estimation of the maximum tolerated dose is presented. The method is rooted in formal optimal design theory and involves the construction of constrained Bayesian c- and D-optimal designs. The imposed constraint incorporates the optimal design points and their weights and ensures that the probability that an administered dose exceeds the maximum acceptable dose is low. Results relating to these constrained designs for log doses on the real line are described and the associated equivalence theorem is given. The ideas are extended to more practical situations and specifically to those involving discrete dose spaces. In particular, a Bayesian optimal design scheme comprising a pilot study on a small number of patients followed by the allocation of patients to doses one-at-a-time is developed and its properties explored by simulation.
Note: For a complete list of upcoming seminars check the dept's seminar web site: http://www.gwu.edu/~stat/seminars/spring2002.htm. The campus map is at: http://www.gwu.edu/Map/. The contact person is Reza Modarres at Reza@gwu.edu or 202-994-6359.
Topic: Incentives in Internet Surveys
- Speaker: Robert Tortora, The Gallup Organization
- Date: Tuesday, April 9, 2002, 12:30 - 2:00 p.m.
- Location: Bureau of Labor Statistics Conference Room 7
- Sponsor: WSS Data Collection Methods Section and AAPOR-DC
This paper discusses the use of incentives for online studies. Two surveys are studied. The first study, a quarterly survey, uses a telephone interview to screen for adult internet users. Qualified adults are asked to complete a web survey about their Internet habits. Pre-paid cash incentives are offered. The impacts of various levels of cash are presented with respect to the length of the online questionnaire and types of respondents. The second study was a census of an email list. This study asked respondents to evaluate an online subscription (product) along with various price points. Respondents were promised varying degrees of free subscription access to the product once it was available. Results of the impact of the incentive on response rates and data are presented.
Title: Beyond Black-Scholes: Probability Distribution of Stock Price Changes in a Model with Stochastic Volatility
- Speaker: Adrian Dragulescu
- Date/Time: Thursday, April 11, 3:30 p.m.
- Location: Room 1313, Mathematics Building, University of Maryland. For directions, please visit the Mathematics Web Site: http://www.math.umd.edu/dept/contact.html
- Sponsor: University of Maryland, Statistics Program, Department of Mathematics
The geometric Brownian motion model proposed by Bachelier to describe the distribution of stock prices gives a Gaussian probability density for the stock price returns. This contradicts empirical findings, which show that the tails of the probability distribution are underestimated by the Gaussian distribution. Thus we study a generalized geometric Brownian motion process where the volatility is also a stochastic variable. The solution of the Fokker-Planck equation associated with this system of two stochastic variables has a path-integral representation. We were able to take the sum over all paths exactly and to write the final answer as an ordinary Fourier integral. The solution is characterized by four parameters and gives the probability distribution as a function of both price change and time. We took the integral analytically in the limit of long times. We also found that the asymptotic behavior of the probability distribution for large price changes is exponential for all times. The results are compared with the Dow Jones data.
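The talk's closed-form Fourier solution is the interesting part; the sketch below only verifies the qualitative claim by Euler-simulating a mean-reverting stochastic-volatility model (parameters invented) and checking that returns have fatter-than-Gaussian tails.

```python
import numpy as np

rng = np.random.default_rng(9)

def sv_daily_returns(n_days=100_000, dt=1/250,
                     gamma=4.0, vbar=0.04, kappa=0.6):
    """Euler scheme for a stochastic-volatility model: daily log returns
    r_t = sqrt(v_t * dt) * eps_t, with variance v_t mean-reverting,
    dv = -gamma*(v - vbar)*dt + kappa*sqrt(v)*dW (reflected to stay >= 0)."""
    v, out = vbar, np.empty(n_days)
    for t in range(n_days):
        out[t] = np.sqrt(v * dt) * rng.standard_normal()
        v = abs(v + gamma * (vbar - v) * dt
                + kappa * np.sqrt(v * dt) * rng.standard_normal())
    return out

r = sv_daily_returns()
z = (r - r.mean()) / r.std()
print("excess kurtosis:", round(float((z**4).mean()) - 3, 2))   # > 0: fat tails
print("P(|z| > 4):", float((np.abs(z) > 4).mean()), "vs 6.3e-05 for a Gaussian")
```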
Title: Asymptotics of Brownian and Diffusion Sample Paths
- Speaker: Dr. Srinivasan Balaji, Department of Mathematical Sciences, New Jersey Institute of Technology
- Date/Time: Friday, April 12, 2002, 11:00 am - 12:00 pm
- Location: Funger Hall 321. 2201 G Street NW. Foggy Bottom metro stop on the blue and orange line.
- Sponsor: The George Washington University, Department of Statistics
The study of stability properties like recurrence, transience, and positive recurrence of stochastic processes is of great importance in varied applications, including heavy traffic queuing networks, structural stability, and stochastic finance. In this talk we will focus our attention mainly on diffusion processes. Initially some elementary properties of Brownian motion, the basic diffusion process, will be discussed in detail. Conditions for stability of diffusions and reflecting diffusions will be obtained. Also the finiteness or infiniteness of passage time moments for multidimensional diffusions will be considered. Finally some interesting open problems and future directions will be discussed.
Note: For a complete list of upcoming seminars check the dept's seminar web site: http://www.gwu.edu/~stat/seminars/spring2002.htm. The campus map is at: http://www.gwu.edu/Map/. The contact person is Reza Modarres at Reza@gwu.edu or 202-994-6359.
Topic: Interface Design of Web-Based Surveys and Questionnaires
- Speaker: Kent Norman, Professor of Psychology, University of Maryland at College Park
- Date/Time: April 16, 2002, 10:30 - 11:30 a.m.
- Location: U.S. Bureau of the Census, 4700 Silver Hill Road, Suitland, Maryland - the Morris Hansen Auditorium, Bldg. 3. Enter at Gate 5 on Silver Hill Road. Please call (301) 457-4974 to be placed on the visitors' list. A photo ID is required for security purposes.
- Sponsor: U.S. Bureau Of Census, Statistical Research Division
On-line questionnaires, particularly on the World Wide Web, are increasing in popularity for many good reasons. Not surprisingly however, bad user interface designs are appearing along with many good surveys. Unfortunately, these designs are being copied as "models" and "templates" for other surveys, resulting in a snowball proliferation of bad survey design with the resulting consequences of non-response and unreliable data.
In this talk, a number of bad designs of Web-based surveys and questionnaires will be illustrated along with some guiding principles for good design from cognitive psychology and empirical research on human/computer interaction.
Research on interface design of Web-based surveys from our lab and others will be presented. These studies include comparisons of item-based (one question per screen) versus form-based (scrolling) presentations; alternative ways of partitioning the survey into sections; the use of navigational tools such as buttons, indexes, and search fields; methods of implementing conditional branching; dealing with respondent errors, edits, and corrections; and the effect of adding hypermedia to enrich and supplement survey items.
This seminar is physically accessible to persons with disabilities. For TTY callers, please use the Federal Relay Service at 1-800-877-8339. This is a free and confidential service. Requests for sign language interpreting services or other auxiliary aids should be directed to Yvonne Moore at (301) 457-2540 text telephone (TTY), 301-763-5113 (voice mail), or by e-mail to Sherry.Y.Moore@census.gov.
Title: Mean Squared Error of Empirical Predictor
- Speaker: Jiming Jiang, University of California, Davis
- Date/Time: Thursday, April 18, 3:30 p.m.
- Location: Room 1313, Mathematics Building, University of Maryland. For directions, please visit the Mathematics Web Site: http://www.math.umd.edu/dept/contact.html
- Sponsor: University of Maryland, Statistics Program, Department of Mathematics
The term empirical predictor refers to a two-stage predictor of a mixed effect, linear or nonlinear. In the first stage, a predictor is obtained, but it involves unknown parameters; thus, in the second stage, the unknown parameters are replaced by their estimators. In the context of small area estimation, Prasad and Rao (1990) proposed a method based on Taylor series expansion for estimating the mean squared error (MSE) of the empirical best linear unbiased predictor (EBLUP). The method is suitable for a special class of normal mixed linear models. In this talk I consider extensions of the Prasad-Rao approach in two directions. The first extension is to estimation of the MSE of the EBLUP in general mixed linear models, including mixed ANOVA models and longitudinal models. The second extension is to estimation of the MSE of the empirical best predictor in generalized linear mixed models for small area estimation. This talk is based on my joint work with Kalyan Das, Partha Lahiri and J. N. K. Rao.
Title: The Information Content of Trades: A Class of Market Microstructure Models
- Speaker: Dr. Anna Valeva, Department of Statistics and Applied Probability, University of California, Santa Barbara
- Date/Time: Thursday, April 18, 2002, 10:00 - 11:00 am
- Location: Funger Hall 321. 2201 G Street NW. Foggy Bottom metro stop on the blue and orange line.
- Sponsor: The George Washington University, Department of Statistics
Market microstructure is a relatively new field in Economics. It deals with the study of the process and outcomes of exchanging assets under explicit trading rules. We focus on a class of models in which the market specialist exploits the information content of trades in order to set the bid-ask spread for a given asset. The presence of asymmetric information is assumed, i.e., there are informed traders against which the market specialist loses on average, but he/she is able to offset the loss by trading against `noise' traders. Thus, asymmetric information alone explains the existence of a bid-ask spread, and provides insights into the adjustment process of prices. While the idea for such models dates back to the mid-eighties, we introduce a dynamic way of quantifying the information which informed traders use. We discuss how volume of trade conveys information about the true asset value to the market specialist. The model also explains some empirical facts described in the literature, namely, serial correlation in trades and serial correlation in squared price changes.
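A minimal static sketch of the core mechanism (a Glosten-Milgrom-style toy, not the dynamic volume-based model of the talk): with informed traders present, the specialist's bid and ask are conditional expectations of value given the direction of the next trade, so a spread arises from asymmetric information alone.

```python
# The specialist believes the asset is worth V_H or V_L; a fraction alpha
# of traders are informed (trade in the direction of the true value),
# the rest buy or sell at random. All numbers are invented.
V_H, V_L, alpha = 101.0, 99.0, 0.3

def quotes(p_high):
    """Bid and ask as conditional expectations of V given the next trade."""
    buy_h = alpha + (1 - alpha) / 2      # P(buy | V = V_H)
    buy_l = (1 - alpha) / 2              # P(buy | V = V_L)
    p_h_buy = p_high * buy_h / (p_high * buy_h + (1 - p_high) * buy_l)
    p_h_sell = (p_high * (1 - buy_h)
                / (p_high * (1 - buy_h) + (1 - p_high) * (1 - buy_l)))
    ask = p_h_buy * V_H + (1 - p_h_buy) * V_L
    bid = p_h_sell * V_H + (1 - p_h_sell) * V_L
    return bid, ask, p_h_buy, p_h_sell

p = 0.5
for trade in ["buy", "buy", "sell", "buy"]:        # observed order flow
    bid, ask, p_buy, p_sell = quotes(p)
    print(f"belief {p:.2f}  bid {bid:.2f}  ask {ask:.2f}")
    p = p_buy if trade == "buy" else p_sell        # Bayesian update
```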
Note: For a complete list of upcoming seminars check the dept's seminar web site: http://www.gwu.edu/~stat/seminars/spring2002.htm. The campus map is at: http://www.gwu.edu/Map/. The contact person is Reza Modarres at Reza@gwu.edu or 202-994-6359.
Topic: The Medicare Current Beneficiary Survey (MCBS)
- Speakers:
Dave Ferraro, Westat
Sophia Chan, Westat
Eileen Horan, Westat
Ravi Sharma, Westat
- Date: Tuesday, April 23, 2002, 12:30 - 2:00 p.m.
- Location: Bureau of Labor Statistics Conference Room 9
- Sponsor: WSS Data Collection Methods Section
Four papers will be presented by the panel members on the MCBS survey:
- 1. The Redesign of MCBS PSUs (David Ferraro)
- Since its inception in 1991, the MCBS has employed a multi-stage stratified sampling design, where the first-stage sample is a nationally representative sample of geographical areas referred to as primary sampling units (PSUs). To counter potential losses in efficiency, a decision was made in 2000 to redesign the MCBS PSU sample. This talk summarizes the sample design and the procedures used to select the new PSU sample.
- 2. Asking about Diseases: How Questionnaire Design Changes Impact Chronic Disease Data of the Medicare Current Beneficiary Survey Facility Instruments (Sophia Chan)
- The Facility questionnaires of the MCBS underwent significant redesign in 1997. The mode of administration was switched from PAPI to CAPI. The item wording, response categories, reference period, and skip patterns of the chronic disease questions have also been changed. This paper compares the distributions of the chronic disease variables before and after the redesign. The implications of these findings for questionnaire design and long-term care research are discussed.
- 3. MCBS Income Data Collection and Imputation (Eileen Horan and Hongji Liu)
- This presentation will first describe the MCBS' definition of income, income variables, and the current design used in collecting income data, along with fielding issues, followed by a discussion of nonresponse and item nonresponse rates for the income data and of the strategies and procedures used in income data imputation (hot-decking and GLM modeling).
- 4. Prescription Drug Coverage, Utilization and Spending by Medicare Beneficiaries with Heart Disease (Ravi Sharma and Hongji Liu)
- Data from the 1998 Medicare Current Beneficiary Survey indicate that for otherwise similar individuals with heart disease, the likelihood and extent of utilization of heart medications are independent of supplemental insurance and drug coverage, whereas total and out-of-pocket expenses are not. Yet, a large share of heart patients does not use heart medications, as many lack drug coverage. Nonusers without drug coverage are disproportionately represented in the subsample that reports a recent inpatient hospital stay for heart disease. This paper discusses these findings.
Topic: A Weighted Jackknife Method for the Fay-Herriot Model with an Application in the SAIPE Program
- Speaker: P. Lahiri, University of Nebraska-Lincoln & University of Maryland at College Park
- Date/Time: April 23, 2002, 10:30 - 11:30 a.m.
- Location: U.S. Bureau of the Census, 4700 Silver Hill Road, Suitland, Maryland - the Morris Hansen Auditorium, FOB 3. Enter at Gate 5 on Silver Hill Road. Please call (301) 457-4974 to be placed on the visitors' list. A photo ID is required for security purposes.
- Sponsor: U.S. Bureau of the Census, Statistical Research Division
We present a weighted jackknife method to estimate the mean squared error (MSE) of the empirical best linear unbiased predictor (EBLUP) of a small-area mean for the celebrated Fay-Herriot model. The proposed MSE estimator improves on the existing MSE estimators and is robust under a variety of situations. We illustrate our methodology for the U.S. Census Bureau's SAIPE program.
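A minimal sketch of the ingredients: a Fay-Herriot fit with a simple moment estimator of the model variance A (leverage correction omitted), the EBLUP, and a delete-one-area jackknife MSE in the Jiang-Lahiri-Wan form. The talk's weighted version refines the jackknife weights, and all numbers here are simulated.

```python
import numpy as np

def fit_fay_herriot(y, X, D):
    """Fay-Herriot fit: y_i = x_i'beta + v_i + e_i, v_i ~ N(0, A),
    e_i ~ N(0, D_i) known. A by a moment fixed point, beta by WLS."""
    m, p = X.shape
    A = 0.0
    for _ in range(100):
        w = 1.0 / (A + D)
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
        A = max(0.0, (np.sum((y - X @ beta) ** 2) - np.sum(D)) / (m - p))
    return A, beta

def eblup_area(i, y, X, D, A, beta):
    g = A / (A + D[i])
    return g * y[i] + (1 - g) * (X[i] @ beta)

rng = np.random.default_rng(4)
m = 30
X = np.column_stack([np.ones(m), rng.random(m)])
D = rng.uniform(0.3, 1.0, m)
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=np.sqrt(0.5), size=m) \
    + rng.normal(scale=np.sqrt(D))
A, beta = fit_fay_herriot(y, X, D)
g1 = lambda a: a * D[0] / (a + D[0])     # leading MSE term for area 0
theta0 = eblup_area(0, y, X, D, A, beta)

bias_sum, var_sum = 0.0, 0.0
for u in range(m):                       # delete-one-area jackknife
    keep = np.arange(m) != u
    A_u, beta_u = fit_fay_herriot(y[keep], X[keep], D[keep])
    bias_sum += g1(A_u) - g1(A)
    var_sum += (eblup_area(0, y, X, D, A_u, beta_u) - theta0) ** 2
mse0 = g1(A) - (m - 1) / m * bias_sum + (m - 1) / m * var_sum
print("area 0 EBLUP:", round(theta0, 2), " jackknife MSE:", round(mse0, 3))
```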
This seminar is physically accessible to persons with disabilities. For TTY callers, please use the Federal Relay Service at 1-800-877-8339. This is a free and confidential service. Requests for sign language interpreting services or other auxiliary aids should be directed to Yvonne Moore at (301) 457-2540 text telephone (TTY), 301-763-5113 (voice mail), or by e-mail to Sherry.Y.Moore@census.gov.
Title: Combination of Information from Several Sources: The Case of t and F Tests
- Speaker: Professor Benjamin Kedem, Chair of Statistics Program, University of Maryland, College Park, Maryland
- Chair: Jai Choi, PhD, mathematical statistician, Office of Research and Methodology, National Center for Health Statistics, (NCHS), 301-458-4144
- Date/Time: Wednesday, April 24, 2002, 10:00 a.m.- 11:30 a.m.
- Location: National Center for Health Statistics Auditorium, Room 1110
- Sponsor: WSS Public Health and Biostatistics Program and the Office of Research and Methodology, NCHS
We consider the following general problem. Suppose there are several sources of information regarding a certain quantity, where some of the sources are reliable and some are distorted. How can we combine all the data, reliable as well as distorted, to improve the reliability of the "good data"? A case in point is the classical analysis of variance. It will be demonstrated that the idea of combining poor and reliable data can improve and generalize the classical t and F tests without the usual normal assumption.
Topic: Including Families with Limited English Proficiency in the Early Childhood Longitudinal Study, Birth Cohort (ECLS-B)
- Speaker: Brad Edwards, Westat
- Date: Thursday, April 25, 2002, 12:30 - 2:00 p.m.
- Location: Bureau of Labor Statistics Conference Room 9
- Sponsor: WSS Data Collection Methods Section and AAPOR-DC
Language minority families present special challenges for the Early Childhood Longitudinal Study's Birth Cohort (ECLS-B). Data collection methods include CAPI interviews with parents, direct assessments of children, self-administered paper questionnaires for fathers, and CATI interviews with child care providers. In Round 1 of the study, data will be collected from about 1,800 Asian, 1,400 Hispanic, and 900 American Indian births, part of a national sample of about 13,000 children born throughout 2001. The approach to language minority issues is to make every reasonable effort to include these families in the study, to collect their data without compromising quality in any major way, and to be sensitive to cultural differences presented by these families. At the same time, fixed resources are available to the project and there are tradeoffs in reaching out to minority language families without jeopardizing the overall study design. Specific criteria and decision rules have been developed, so that the procedures for including language minority families are not arbitrary and their data are collected in a standardized manner. Although much of the focus in developing the ECLS-B language minority protocol has been on the first two data collection points, the general approach incorporates a longitudinal perspective, and this presentation addresses issues that are likely to occur over the course of all waves of data collection, ending when the children are in first grade.
Title: Monte Carlo Approximation and the Bootstrap
- Speaker: Jim Booth, Department of Statistics, University of Florida
- Date/Time: Thursday, April 25, 3:30 p.m.
- Location: Dean's Conference Room, Van Munching Hall 3300, University of Maryland. For directions, please visit http://www.rhsmith.umd.edu/visitors/planning.html
- Sponsor: University of Maryland, Statistics Program, Department of Mathematics
The bootstrap can be thought of as a simple plug-in rule that may be stated as follows: "Estimate any functional characteristic of an unknown distribution by the same characteristic of a fitted or empirical distribution". In particular, given an i.i.d. sample from an unknown distribution, the bootstrap can be used to estimate the bias, the variance and quantiles of the sampling distribution of any statistic. In most cases exact computation of bootstrap estimates is either analytically intractable or computationally infeasible. Thus, in practice bootstrap estimates are usually approximated by Monte Carlo methods. In this talk I will discuss the amount of Monte Carlo simulation necessary for accurate approximation of bootstrap standard errors and confidence intervals and argue that the answer is more than is generally thought.
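A minimal experiment in the spirit of the talk: repeat an entire bootstrap many times for several resample counts B and watch how slowly the Monte Carlo noise in the estimated standard error shrinks. The sample, statistic, and B values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.exponential(size=50)       # the observed i.i.d. sample

def boot_se(data, stat, B):
    """Monte Carlo approximation to the bootstrap standard error of stat."""
    n = len(data)
    reps = [stat(data[rng.integers(0, n, n)]) for _ in range(B)]
    return np.std(reps, ddof=1)

# Repeat the whole bootstrap 200 times per B; the spread of the SE
# estimate is pure Monte Carlo error.
for B in (50, 200, 1000, 5000):
    ses = [boot_se(x, np.median, B) for _ in range(200)]
    print(f"B={B:5d}  mean SE={np.mean(ses):.4f}  MC sd of SE={np.std(ses):.4f}")
```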
Title: On the Correlation Structure of Transformed Gaussian Random Fields
- Speaker: Victor De Oliveira
- Date/Time: Thursday, April 25, 2002, 3:30 p.m.
- Location: Room 1313, Mathematics Building, University of Maryland. For directions, please visit the Mathematics Web Site: http://www.math.umd.edu/dept/contact.html
- Sponsor: University of Maryland, Statistics Program, Department of Mathematics
Transformed Gaussian random fields can be used to model continuous time series and spatial data when the Gaussian assumption is not appropriate. The main features of these random fields are specified in a transformed scale, while for modeling and parameter interpretation it is useful to establish connections between these features and those of the random field in the original scale. This work provides evidence that, for many `normalizing' transformations and under certain conditions, the correlation function of a transformed Gaussian random field depends little on the transformation used. Hence many commonly used transformations of correlated data have little effect on the original correlation structure. The property is shown to hold for some kinds of transformed Gaussian random fields, and a statistical explanation based on the concept of parameter orthogonality is provided. The property is also illustrated using two spatial data sets and several `normalizing' transformations. Some consequences of this property for modeling and inference are also discussed.
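As a quick numerical check of this claim, consider pairs of correlated Gaussians rather than a full random field (a toy illustration with invented parameters): exponentiating the pairs, i.e. undoing a log `normalizing' transform, changes the correlation only modestly.

    import numpy as np

    rng = np.random.default_rng(1)
    for rho in (0.2, 0.5, 0.9):
        z = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=200_000)
        y = np.exp(0.5 * z)        # lognormal pairs
        r = np.corrcoef(y[:, 0], y[:, 1])[0, 1]
        print(f"Gaussian correlation {rho:.1f} -> lognormal correlation {r:.3f}")

With these parameters the lognormal correlations come out near 0.18, 0.47, and 0.89, echoing the "little effect" property the abstract describes. Return to top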
Title: Application of the Sanov Large Deviation Theorem to Density Estimation and Screening of Significant Factors
- Speaker: Mikhail B. Malioutov, Northeastern University, Boston
- Time: Thursday, May 2nd, 2002, 3:30 pm
- Place: Room 1313, Mathematics Building, University of Maryland College Park. For directions, please visit the Mathematics Web Site: http://www.math.umd.edu/dept/contact.html
- Sponsor: University of Maryland, Statistics Program, Department of Mathematics
Two remarkable applications of the Sanov theorem will be outlined. The first one deals with large Lp deviations of general regular density estimates. The exponential rate of the Lp decay turns out to be free of the underlying density function and the estimator. The second application proves the asymptotic optimality of the famous Jaynes principle in finding significant inputs of an unknown noisy function. Return to top
Topic: YOU ARE HERE: Information Architecture and Web Navigation
- Speaker: Jonathan Lazar, Professor of Computer and Information Sciences, Towson University
- Date/Time: May 8, 2002, 10:30 - 11:30 a.m.
- Location: U.S. Bureau of the Census, 4700 Silver Hill Road, Suitland, Maryland - the Morris Hansen Auditorium, FOB 3. Enter at Gate 5 on Silver Hill Road. Please call (301) 457-4974 to be placed on the visitors' list. A photo ID is required for security purposes.
- Sponsor: U.S. Bureau Of Census, Statistical Research Division
In large web sites, intranets, extranets, and other information spaces, users tend to get lost and disoriented among hundreds or thousands of web pages. It is frustrating to users when they cannot reach their task goal because they cannot find the content that they need. Information architecture and web navigation focus on structuring information so that users can find what they need with relative ease. With appropriate architecture and navigation, users are aware of what information is available on the web site and can reach the maximum amount of content with minimal effort. In turn, this will increase user satisfaction and productivity. This presentation will focus on what web designers need to know about information architecture and web navigation to design effective sites for users.
This seminar is physically accessible to persons with disabilities. For TTY callers, please use the Federal Relay Service at 1-800-877-8339. This is a free and confidential service. Requests for sign language interpreting services or other auxiliary aids should be directed to Yvonne Moore at (301) 457-2540 text telephone (TTY), 301-763-5113 (voice mail), or by e-mail to Sherry.Y.Moore@census.gov. Return to top
Topic: Survey Automation: The Promise and the Reality
- Speaker: Jesse Poore, Ericsson-Harlan D. Mills Chair in Software Engineering, University of Tennessee
- Date/Time: May 10, 2002, 3:00-4:30 p.m. (See below; RSVP by May 3rd required)
- Location: Auditorium at the National Academy of Sciences, 2100 C Street, NW, Washington, DC. Please arrive early, as parking is limited, and be prepared to show identification to enter the building. Please note that the entrance to the National Academy of Sciences building at 2101 Constitution Avenue, NW, is closed to the public. Guests wishing to take Metro to the seminar are encouraged to take the National Academy's shuttle, which departs from the Foggy Bottom/GWU Metro station every 30 minutes.
A tea from 2:30 to 3:00 p.m. will precede the afternoon session, which will begin with a discussion of recent developments in national statistics, followed by a seminar on the challenges of automating complex survey questionnaires and how statistical agencies may benefit from the computer sciences to make survey automation more efficient and effective. (The seminar is based on a recent CNSTAT workshop on survey automation, which brought together leading computer scientists and survey methodologists.) The seminar will include a brief overview of why the replacement of paper questionnaires by computerized instruments, so promising in theory, can be so difficult in practice, and will feature a presentation by Jesse Poore, Ericsson-Harlan D. Mills Chair in Software Engineering, University of Tennessee, on computer science tools for the management, documentation, and testing of complex software. Discussion will follow the presentation. A reception will follow from 4:30 to 5:15 p.m. in the Members' Room.
All are welcome, but for security purposes, you must RSVP by May 3rd. To RSVP, or if you need further information, please contact Danelle Dessaint at (202) 334-3096 or email ddessain@nas.edu. Return to top
Topic: The One-Way Fixed and Random Models under Heteroscedasticity
- Speaker: Aref N. Dajani, Statistical Research Division, U.S. Census Bureau
- Date/Time: May 14, 2002, 10:30 - 11:30 a.m.
- Location: U.S. Bureau of Census, 4700 Silver Hill Road, Suitland, Maryland - the Henry Gannett and Herman Hollerith Rooms, FOB 3. Enter at Gate 5 on Silver Hill Road. Please call Barbara Palumbo at (301) 457-4974 to be placed on the visitors' list. A photo ID is required for security purposes.
- Sponsor: U.S. Bureau Of Census, Statistical Research Division
For testing the equality of several treatment effects in a one-way fixed effects model, or for testing the significance of the treatment variance component in a one-way random effects model, the usual F test is appropriate when error variances are assumed to be equal. When this assumption is violated, the F test may not be appropriate.
Many alternative tests have been suggested in the literature. When applied to actual data, the different tests can yield drastically different p-values and opposing conclusions. This brings up the issue of which test should be chosen for practical use. To address this, the different tests are compared in terms of their Type I error probability and power, estimated by Monte Carlo simulation. It turns out that there are scenarios where many of the tests have Type I error probabilities far greater than the nominal level. Based on the numerical results, recommendations are made on the choice of the test for practical use.
For the one-way random model, a test is also derived for testing the more general hypothesis that the random effect variance component is below a known bound. Interval estimation is also addressed in this context. The results are applied to several examples.
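As one concrete instance of such a comparison (a sketch, not the talk's simulation design: Welch's 1951 statistic stands in for the alternative tests, and the group sizes and variances are invented), pairing the smallest group with the largest variance makes the classical F test badly liberal while the Welch test stays near the nominal level:

    import numpy as np
    from scipy import stats

    def welch_anova_p(groups):
        # Welch's (1951) heteroscedasticity-robust one-way test
        k = len(groups)
        n = np.array([len(g) for g in groups], float)
        m = np.array([g.mean() for g in groups])
        w = n / np.array([g.var(ddof=1) for g in groups])
        mw = (w * m).sum() / w.sum()
        tmp = (((1 - w / w.sum()) ** 2) / (n - 1)).sum()
        f = ((w * (m - mw) ** 2).sum() / (k - 1)) / (1 + 2 * (k - 2) / (k * k - 1) * tmp)
        return stats.f.sf(f, k - 1, (k * k - 1) / (3 * tmp))

    rng = np.random.default_rng(2)
    ns, sds, reps, alpha = [5, 10, 20], [4.0, 2.0, 1.0], 4000, 0.05
    rej = np.zeros(2)
    for _ in range(reps):
        g = [rng.normal(0.0, s, n) for n, s in zip(ns, sds)]   # H0 true: all means equal
        rej += [stats.f_oneway(*g).pvalue < alpha, welch_anova_p(g) < alpha]
    print("Estimated Type I error  classical F: %.3f   Welch: %.3f" % tuple(rej / reps))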
This seminar is physically accessible to persons with disabilities. For TTY callers, please use the Federal Relay Service at 1-800-877-8339. This is a free and confidential service. Requests for sign language interpreting services or other auxiliary aids should be directed to Yvonne Moore at (301) 457-2540 text telephone (TTY), 301-763-5113 (voice mail), or by e-mail to Sherry.Y.Moore@census.gov. Return to top
Topic: An "Optimal" Data Swapping Procedure
- Speakers:
Krish Muralidhar
School of Management
Gatton College of Business & Economics
University of Kentucky, Lexington KY 40506
Rathindra Sarathy
Department of Management
College of Business Administration
Oklahoma State University, Stillwater OK 74078
- Date/Time: May 20, 2002, 10:00 - 11:30 a.m.
- Location: U.S. Bureau of Census, 4700 Silver Hill Road, Suitland, Maryland - the Henry Gannett and Herman Hollerith Rooms, FOB 3. Enter at Gate 5 on Silver Hill Road. Please call Barbara Palumbo at (301) 457-4974 to be placed on the visitors' list. A photo ID is required for security purposes.
- Sponsor: U.S. Bureau Of Census, Statistical Research Division
Data swapping can be described in simple terms as a process by which the values of two records in the microdata are interchanged (or swapped). Reiss (1980) was one of the first proponents of data swapping. The objective of data swapping is to mask the original data while maintaining its characteristics. Compared to other methods of masking, data swapping provides two major advantages: (1) when analyzing a single masked attribute, data swapping preserves its statistical characteristics, while most other masking methods are subject at least to sampling error; (2) from a human perspective, it is likely to be more acceptable to users than other masking methods that involve use of noise, since data swapping uses only the original (true) values.
The two major objectives of masking procedures are accuracy and security. In broad terms, accuracy can be defined as the extent to which the masked values faithfully replicate the characteristics of the original values in the microdata set, while security can be defined as the extent to which a snooper can gain information about the confidential attributes and/or the identity of a particular record using the masked data. Ideally, an "optimal" masking procedure would replicate the information in the original data and would provide a snooper with no additional information. Most masking procedures have a theoretical basis for their implementation, enabling modifications that improve their performance. This is not the case with data swapping, although Moore (1996) provided some theoretical results regarding the efficacy of the rank-based proximity swap in achieving the two objectives of masking. This general lack of theory has limited the advancement of swapping techniques.
In this study, we propose a new data swapping procedure for continuous numerical data that is capable of achieving both objectives of masking, leading to an "optimal" masking procedure. The new approach has a strong theoretical basis, and theoretically achieves both the accuracy and security objectives. We illustrate the application of the new procedure by using simulated microdata sets having a multivariate normal distribution (with and without non-confidential categorical data) and other distributions (with and without non-confidential categorical data). We also hope to extend the results of this study and investigate the suitability of this approach for confidential categorical data as well.
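For orientation, here is a toy version of the rank-based proximity swap analyzed by Moore (1996); the "optimal" procedure proposed in the talk is different and is not reproduced here. Values change places only with values of nearby rank, so univariate characteristics are preserved exactly while record-level values are masked.

    import numpy as np

    def rank_proximity_swap(x, window=3, rng=None):
        # Swap each record's value with a record at most `window` ranks away.
        rng = rng or np.random.default_rng(0)
        n, order = len(x), np.argsort(x)
        y = np.asarray(x, dtype=float).copy()
        i = 0
        while i + 1 < n:
            j = min(i + int(rng.integers(1, window + 1)), n - 1)
            a, b = order[i], order[j]
            y[a], y[b] = y[b], y[a]
            i = j + 1
        return y

    x = np.random.default_rng(1).lognormal(size=1000)
    y = rank_proximity_swap(x)
    assert sorted(x) == sorted(y)     # same multiset of values: univariate statistics intact
    print(np.corrcoef(x, y)[0, 1])    # near 1: masked values stay close to the originals

Widening the rank window increases security (masked values drift further from the originals) at the cost of accuracy for multivariate statistics, which is exactly the tradeoff the abstract describes.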
This seminar is physically accessible to persons with disabilities. For TTY callers, please use the Federal Relay Service at 1-800-877-8339. This is a free and confidential service. Requests for sign language interpreting services or other auxiliary aids should be directed to Yvonne Moore at (301) 457-2540 text telephone (TTY), 301-763-5113 (voice mail), or by e-mail to Sherry.Y.Moore@census.gov. Return to top
Topic: Analyzing patterns of killings and migration flow in Kosovo, March-June 1999
- Speakers: Patrick Ball, American Association for the Advancement of Science
- Discussant: Mary Gray, American University
- Chair: Fritz Scheuren, Urban Institute
- Date: Friday, May 24th, 12:30-2:00 p.m.
- Location: Bureau of Labor Statistics Conference Rooms 7 and 8
- Sponsor: WSS Data Collection Methods Section, WSS Methodology Section and AAPOR-DC
During the conflict between NATO and Yugoslavia, thousands of people were killed and hundreds of thousands more fled their homes. Logically, NATO and Yugoslavia advanced quite different explanations for the violence. Yugoslavia claimed that the deaths and migration were the result of NATO's airstrikes and local actions by the ethnic Albanian insurgents (the KLA). NATO claimed that the deaths and migration were the result of a coordinated campaign by Yugoslav authorities to "ethnically cleanse" Kosovo of Albanians.
This report used techniques from historical demography as well as multiple systems estimation to model patterns of killing and migration flow. When killings and migration are compared to patterns of KLA activity and NATO airstrikes, the hypotheses advanced by the Yugoslav government are rejected. Key coincidences observed in the data are consistent with the hypothesis that Yugoslav forces were responsible for the violence.
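Multiple systems estimation builds on the two-list capture-recapture idea; a minimal two-list sketch with invented counts (the actual analysis used more lists and finer stratification):

    # Lincoln-Petersen two-list estimate with hypothetical counts
    n1, n2, m = 2500, 1800, 600   # deaths on list 1, on list 2, and on both
    N_hat = n1 * n2 / m           # estimated total, assuming the lists are independent
    print(round(N_hat))           # 7500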
This analysis was presented in the trial of Slobodan Milosevic at the International Criminal Tribunal for Former Yugoslavia (ICTY) in The Hague on 13-14 March 2002. Return to top
Topic: Why Are Semiconductor Prices Falling So Fast? Industry Estimates and Implications for Productivity Measurement
- Speaker: Ana Aizcorbe, Federal Reserve Board
- Discussant: Marshall Reinsdorf, Bureau of Economic Analysis
- Chair: Linda Atkinson, Economic Research Service, USDA
- Date/Time: Thursday, June 13, 2002; 12:30 - 2:00 p.m.
- Location: Bureau of Labor Statistics, Conference Center Room 2, Postal Square Building (PSB), 2 Massachusetts Ave. NE, Washington, D.C. Please use the First St., NE, entrance to the PSB. To gain entrance to BLS, please see "Notice" at the top of this page.
- Sponsor: Economics Section
Abstract:
By any measure, price deflators for semiconductors fell at a staggering pace over much of the last decade. These rapid price declines are typically attributed to technological innovations that lower constant-quality manufacturing costs. But, given Intel's dominance in the microprocessor market, those price declines may also reflect changes in Intel's profit margins. Disaggregate data on Intel's operations are used to explore these issues. There are three basic findings. First, the industry data show that Intel's markups from its microprocessor segment shrank substantially from 1993-99. Second, about 3-1/2 percentage points of the average 24 percent price decline in a price index for Intel's chips can be attributed to declines in these profit margins over this period. And, finally, the data suggest that virtually all of the remaining price declines can be attributed to quality increases associated with product innovation.
Return to top
WSS Annual Dinner
Statistics For A New Century: Meeting The Needs Of A World Of Data
- Speaker: Richard L. Scheaffer, Professor Emeritus, University of Florida, and Past ASA President
- Date/Time: June 18, 2002
- Location: Maggiano's Little Italy, 5333 Wisconsin Ave., N.W., Washington, DC.
Abstract:
The world is awash in data. Many are aware of the importance and power of data in their professional and personal lives, but few are educated in ways that would allow them to more fully comprehend the vast array of uses (and misuses) of data or to effectively use the quantitative information that confronts them daily. Even fewer are aware of the fact that formal study of statistics can serve to strengthen their own academic preparation for a wide variety of careers. Some successes are being achieved, however, through recent efforts to infuse statistics into the school (K-12) curriculum and to enhance opportunities for undergraduates to learn more statistics. The goals of these efforts are to empower students through improved quantitative literacy and to provide strong foundations for careers that depend increasingly on data.
Modern statistics education has generated terrific interest among educators and students at all levels; it now must prove itself by making effective use of this opportunity to produce new generations of graduates that will not drown in their world of data.
Return to top
Topic: Leonardo's Laptop: Human Needs and the New Computing Technologies
- Speaker: Ben Shneiderman, Professor of Computer Science, University of Maryland at College Park
- Date/Time: June 20, 2002, 10:30 - 11:30 a.m.
- Location: Bureau of the Census, 4700 Silver Hill Road, Suitland, Maryland - the Morris Hansen Auditorium, FOB 3. Enter at Gate 5 on Silver Hill Road. Please call (301) 457-4974 to be placed on the visitors' list. A photo ID is required for security purposes.
- Sponsor: U.S. Bureau Of Census, Statistical Research Division
Abstract:
The old computing was about what computers could do; the new computing is about what users can do. Attention is shifting from making computers intelligent to making users creative. Leonardo da Vinci could help as an inspirational muse for the new computing to push for improved quality through scientific study and more elegant design through visual thinking. We can follow Leonardo's example by integrating text and graphics, functionality and esthetics.
The new computing emphasizes empowerment and collaboration. We must reduce user frustration with annoying crashes, incomprehensible dialog boxes, and incompatible attachments. Then we can promote universal usability through interfaces that are more customizable for diverse users, more tailorable to a wide range of hardware, software, and networks, and designed to bridge the gap between what users know and what they need to know.
With these basics in place, the new computing principle is that human needs should shape technology. Four circles of human relationships and four human activities map out the human needs for mobility, ubiquity, creativity and community. Million-person communities will be accessible through desktop, palmtop and fingertip devices that support e-learning, e-business, e-healthcare, and e-government.
This talk will present an agenda of what is needed to bring about The New Computing (www.cs.umd.edu/hcil/newcomputing).
This seminar is physically accessible to persons with disabilities. For TTY callers, please use the Federal Relay Service at 1-800-877-8339. This is a free and confidential service. Requests for sign language interpreting services or other auxiliary aids should be directed to Yvonne Moore at (301) 457-2540 text telephone (TTY), 301-763-5113 (voice mail), or by e-mail to Sherry.Y.Moore@census.gov.
Topic: Bootstrap Approximation to Prediction MSE for State-Space Models with Estimated Parameters
- Speaker: Danny Pfeffermann, Professor of Statistics, Hebrew University and University of Southampton (joint work with Dr. Richard Tiller, Bureau of Labor Statistics)
- Date/Time: August 7, 2002, 10:30 - 11:30 a.m.
- Location: U.S. Bureau of Census, 4700 Silver Hill Road, Suitland, Maryland - the Morris Hansen Auditorium, FOB 3. Enter at Gate 5 on Silver Hill Road. Please call (301) 457-4974 to be placed on the visitors' list. A photo ID is required for security purposes.
- Sponsor: U.S. Bureau Of Census, Statistical Research Division
Abstract:
We propose a simple, but general method for approximating the prediction Mean Square Error (PMSE) of the state vector predictors in a state-space model when the unknown model parameters are estimated from the observed series. As is well known, substituting the model parameters with the sample estimates in the theoretical MSE expressions that assume known parameter values results in under-estimation of the true MSE. Methods proposed in the literature to deal with this problem are inadequate and may not even be operational when fitting complex models, or when some of the parameters are close to their boundary values. Application of the method to a model fitted to sample estimates of employment ratios in the U.S.A. that contains eighteen unknown parameters estimated by a three-step procedure yields accurate results. The method may be applied to a wide variety of problems, including many of the time series and mixed linear models used for Small Area Estimation problems. This will be illustrated using the Fay-Herriot model.
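A stripped-down illustration of both the problem and a bootstrap correction, using a local level model with invented values (a generic bootstrap bias correction is applied here; the estimator proposed in the talk is constructed differently):

    import numpy as np
    from scipy.optimize import minimize

    def kf(y, q):
        # Kalman filter for a local level model with observation variance 1 and
        # state innovation variance q; returns the log-likelihood and the naive
        # PMSE of the predictor of the next state.
        a, p, ll = 0.0, 1e7, 0.0
        for yt in y:
            f = p + 1.0
            v = yt - a
            ll -= 0.5 * (np.log(2 * np.pi * f) + v * v / f)
            k = p / f
            a, p = a + k * v, p * (1 - k) + q
        return ll, p

    def fit_q(y):
        # maximum likelihood for q, searched on the log scale
        res = minimize(lambda lq: -kf(y, np.exp(lq[0]))[0], [np.log(0.5)],
                       method="Nelder-Mead")
        return float(np.exp(res.x[0]))

    rng = np.random.default_rng(3)
    T, q_true = 100, 0.3
    y = np.cumsum(rng.normal(0, np.sqrt(q_true), T)) + rng.normal(0, 1, T)

    q_hat = fit_q(y)
    naive = kf(y, q_hat)[1]      # PMSE formula evaluated at the estimate: too small

    boot = []
    for _ in range(200):         # simulate from the fitted model and re-estimate
        y_b = np.cumsum(rng.normal(0, np.sqrt(q_hat), T)) + rng.normal(0, 1, T)
        boot.append(kf(y_b, fit_q(y_b))[1])
    print(f"naive PMSE {naive:.3f}   bias-corrected {2 * naive - np.mean(boot):.3f}")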
This seminar is physically accessible to persons with disabilities. For TTY callers, please use the Federal Relay Service at 1-800-877-8339. This is a free and confidential service. Requests for sign language interpreting services or other auxiliary aids should be directed to Yvonne Moore at (301) 457-2540 text telephone (TTY), 301-763-5113 (voice mail), or by e-mail to Sherry.Y.Moore@census.gov.
Return to top
Topic: Confidentiality Audit On Suppressed Entries in Multi-Dimensional Contingency Tables
- Speaker: Lawrence H. Cox, National Center for Health Statistics
- Discussant: Paul B. Massell, Bureau of the Census
- Chair: Virginia de Wolf
- Date/Time: Tuesday, July 16, 12:30 to 1:45 p.m.
- Location: Bureau of Labor Statistics, Conference Center, Conference Room 3, Postal Square Building (PSB), 2 Massachusetts Ave. NE, Washington, D.C. Please use the First St., NE, entrance to the PSB. To gain entrance to BLS, please see "Notice" at the top of this announcement.
- Sponsor: WSS Methodology Section
Abstract:
Disclosure limitation in contingency tables amounts to thwarting the ability of the data intruder to infer, or make narrow estimates of, small cell values. The Census Bureau adopts the base value five for "small"; the Statistics of Income Program and Statistics New Zealand prefer base value three. In two-dimensional tables, for disclosure limitation the statistical office traditionally has chosen either to round the counts to the base value, to perturb (add noise to) the counts, or to suppress small counts together with additional cell values known as complementary suppressions. In multiple dimensions, a suggested approach is massive suppression, such as suppressing all internal entries, leaving only (some) marginals. Suppressed values must be subjected to a confidentiality audit to ensure that confidentiality protection has been achieved. This amounts to computing, for every suppressed small value x, the interval [min(x), max(x)] subject to all released and suppressed cell values and marginal totals. This is easily accomplished in two dimensions using standard, efficient methods and software from linear programming. The purpose of this talk is to explore the difficulties of performing a confidentiality audit in multiple dimensions. Preliminaries on mathematical properties of multi-dimensional contingency tables will be introduced, followed by an example-based examination of the utility of linear programming for confidentiality audit in multi-dimensional contingency tables. The talk comprises examples illustrating good, bad, and ugly behaviors of contingency tables in two, three, and four dimensions.
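A minimal two-dimensional audit in the spirit described, on an invented 3-by-3 table: for each suppressed cell, two linear programs give the tightest bounds an intruder could deduce from the published cells and marginals.

    import numpy as np
    from scipy.optimize import linprog

    table = np.array([[2, 5, 3],      # invented counts; the four corner cells
                      [6, 4, 7],      # are suppressed, everything else (including
                      [1, 8, 2]])     # all marginal totals) is published
    suppressed = [(0, 0), (0, 2), (2, 0), (2, 2)]
    idx = {c: k for k, c in enumerate(suppressed)}

    # One equation per row and column: the suppressed cells in that line must
    # sum to the published marginal minus the published interior cells.
    A_eq, b_eq = [], []
    for axis in (0, 1):
        for i in range(3):
            line = [(i, j) if axis == 0 else (j, i) for j in range(3)]
            unknowns = [c for c in line if c in idx]
            if unknowns:
                row = np.zeros(len(idx))
                row[[idx[c] for c in unknowns]] = 1.0
                A_eq.append(row)
                b_eq.append(sum(table[c] for c in line)
                            - sum(table[c] for c in line if c not in idx))

    for cell in suppressed:           # audit every suppressed cell
        c = np.zeros(len(idx)); c[idx[cell]] = 1.0
        lo = linprog(c, A_eq=A_eq, b_eq=b_eq).fun        # cells are >= 0 by default
        hi = -linprog(-c, A_eq=A_eq, b_eq=b_eq).fun
        print(cell, f"true {table[cell]}, audit bounds [{lo:.0f}, {hi:.0f}]")

Here every suppressed cell is confined to an interval of width 3; the audit would compare such widths against the required protection level. As the abstract notes, nothing this clean is guaranteed once a third dimension is added.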
Return to top
Topic: Parameter Estimation in Logistic Regression -- Not an Easy Matter
- Speaker: Thomas P. Ryan, Consultant
- Date/Time: August 19, 2002, 10:30 - 11:30 a.m.
- Location: U.S. Bureau of Census, 4700 Silver Hill Road, Suitland, Maryland - Room 3225, FOB 4. Enter at Gate 5 on Silver Hill Road. Please call (301) 457-4974 to be placed on the visitors' list. A photo ID is required for security purposes.
- Sponsor: U.S. Bureau Of Census, Statistical Research Division
Abstract:
Logistic regression is a popular statistical tool that is used primarily in health and medical applications, but also in many others, including the modeling of data from complex sample surveys. Because parameter estimation is straightforward in linear regression, it would be easy to assume the same for logistic regression. Unfortunately, parameter estimation in logistic regression is problematic. This is known for maximum likelihood, the usual estimation method, in the case of rare events, but is apparently less well known in the case of near separation of the data. The latter can cause serious problems, as will be illustrated. One alternative is to use exact logistic regression, which is generally preferable, but which also has some shortcomings. What is a user to do? Some insight will be given, and needed research will also be discussed.
(This talk will be based primarily on the paper "A Preliminary Investigation of Maximum Likelihood Logistic Regression versus Exact Logistic Regression" by E. N. King and T. P. Ryan, The American Statistician, August, 2002, 163-170.)
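A minimal illustration of the separation problem with invented data (not the paper's examples): when every failure lies to one side of every success, the likelihood has no maximizer, and each Newton step pushes the slope estimate further out; near-separation behaves almost as badly, yielding enormous coefficients and standard errors.

    import numpy as np

    # Complete separation: every y=0 lies to the left of every y=1.
    x = np.array([-3.0, -2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 3.0])
    y = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])
    X = np.column_stack([np.ones_like(x), x])

    beta = np.zeros(2)
    for it in range(1, 16):          # Newton-Raphson for the logistic MLE
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        beta += np.linalg.solve((X * W[:, None]).T @ X, X.T @ (y - p))
        if it % 3 == 0:
            print(f"iteration {it:2d}: slope = {beta[1]:10.3f}")  # keeps growing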
This seminar is physically accessible to persons with disabilities. For TTY callers, please use the Federal Relay Service at 1-800-877-8339. This is a free and confidential service. Requests for sign language interpreting services or other auxiliary aids should be directed to Yvonne Moore at (301) 457-2540 text telephone (TTY), 301-763-5113 (voice mail), or by e-mail to Sherry.Y.Moore@census.gov.
Return to top
Topic: Combined Survey Sampling Inference: Compromise or Consummation?
- Speaker: Kenneth R.W. Brewer, Australia National University
- Date: Tuesday, August 20, 2002
- Location: U.S. Bureau of Census, 4700 Silver Hill Road, Suitland, Maryland. Enter at Gate 5 on Silver Hill Road. Please call (301) 457-4974 to be placed on the visitors' list. A photo ID is required for security purposes.
- Sponsor: U.S. Bureau Of Census, Statistical Research Division
Abstract:
Part 1: The Why and the How (10:30 - 12:00 Noon - Morris Hansen Auditorium/FOB3)
Design (or randomization) inference is particularly appropriate for large samples and populations, and model (or prediction) inference for small ones. It is useful to combine them, if only because large populations are usually made up of small domains, but there are certain spinoffs as well. These include (for the design approach) circumventing the need for asymptotics when justifying the use of the Classical Ratio Estimator, and (for the prediction approach) being easily able to avoid unacceptably small case weights. The combination of the two is achieved by equating a design-based (GREG) estimator and a prediction-based (PRED) estimator, and then imposing the resulting condition on the estimator of the relevant regression coefficient. The imposition of that condition involves both approaches in something of a compromise, but it will be shown that this is seldom of any material consequence for either of them.
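One textbook point of contact between the two modes of inference, sketched with simulated data (the model, population, and sample sizes are invented): under a ratio model with variance proportional to x, the model-based prediction of the population total and the classical design-based ratio estimator are the same number, which is the kind of agreement the combined approach exploits.

    import numpy as np

    rng = np.random.default_rng(4)
    N, n = 10_000, 200
    x = rng.gamma(3.0, 2.0, N)                     # auxiliary value known for all units
    y = 4.0 * x + rng.normal(0.0, 1.0, N) * np.sqrt(x)   # Var(y | x) proportional to x
    s = rng.choice(N, n, replace=False)            # simple random sample

    b = y[s].sum() / x[s].sum()                    # best linear unbiased slope under the model
    T_ratio = b * x.sum()                          # classical ratio estimator of the total
    T_pred = y[s].sum() + b * (x.sum() - x[s].sum())   # sample total plus predicted remainder
    print(T_ratio, T_pred, y.sum())                # the first two agree exactly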
Part 2: Some Simple Variance Formulas and Estimators (2:00 - 3:30 p.m. - the Herman Hollerith Room, FOB 3)
The sampling literature has long been heavily sprinkled with theoretical and empirical comparisons of alternative variance estimators, some of which involve rather complex formulas and/or logic that is difficult to follow. Some even require ad hoc adjustments as well. The combination of the two approaches seems at first to make matters even worse, because there are then three types of variance to consider: the design variance, the prediction variance, and the "anticipated variance", the last involving a double expectation (over all possible samples and over all possible realizations of a prediction model). As it turns out, however, the three are so intimately related that transition from one to another is simple and obvious. Some surprising spinoffs include a simplification of the prediction variance (and its estimator) that can only be made when the estimator (of mean or total) is also supported by design inference. These spinoffs resemble so closely the "emergent phenomena" of modern complexity theory that the bringing together of the two approaches can arguably be viewed more appropriately as a fruitful consummation than as a mere compromise.
This seminar is physically accessible to persons with disabilities. For TTY callers, please use the Federal Relay Service at 1-800-877-8339. This is a free and confidential service. Requests for sign language interpreting services or other auxiliary aids should be directed to Yvonne Moore at (301) 457-2540 text telephone (TTY), 301-763-5113 (voice mail), or by e-mail to Sherry.Y.Moore@census.gov.
Return to top
Title: Partial Volume Correction for Neuroimaging using Tensor Based Statistical Algorithms
- Speaker: Dr. John Aston, Bureau of the Census
- Date/Time: 11:00 a.m. - 12:00 noon, September 20, 2002
- Location: Funger Hall 321, 2201 G Street NW. Foggy Bottom metro stop on the blue and orange line.
- Sponsor: The George Washington University, Department of Statistics
Abstract:
The partial volume effect in Positron Emission Tomography (PET) is a problem for quantitative radiotracer studies. Such studies can be used to study many well-known diseases, such as epilepsy, but partial volume effects can cause misinterpretation of the data. The partial volume effect arises from the limited spatial resolution of the imaging device (a few mm) and results in a blurring of the data. Two factors are involved for pre-defined regions: spillover of radioactivity into neighboring regions, and the underlying tissue inhomogeneity (mixed tissue types) of the particular region. Linear modelling methods are currently used to correct for this effect on a regional level, using tissue classification from higher-resolution imaging modalities, e.g. Magnetic Resonance Imaging, and anatomically defined regions which are assumed to contain homogeneous tracer concentrations. We extend these methods to incorporate the underlying noise structure of the PET tomograph measurements, and develop fast tensor-based algorithms to facilitate the computation of true tracer concentration estimates and their associated errors. This allows calculation of linear models in the case of massive data sets with inherent spatial correlation structure. We also investigate the possibility of using the developed noise models to infer whether the defined regions were homogeneous, using Krylov subspace based approximate estimates for the regional errors associated with the fits.
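A toy regional correction in the spirit of the linear modelling the abstract mentions (the mixing matrix, concentrations, and noise covariance are all invented): observed regional means are modeled as a known spillover matrix times the true concentrations, and generalized least squares, which uses the noise structure, recovers the concentrations along with their errors.

    import numpy as np

    G = np.array([[0.80, 0.15, 0.05],    # rows: observed regions;
                  [0.10, 0.75, 0.15],    # columns: true source regions
                  [0.05, 0.20, 0.75]])
    c_true = np.array([10.0, 4.0, 7.0])  # true tracer concentrations
    Sigma = np.diag([0.10, 0.15, 0.12])  # measurement-noise covariance

    rng = np.random.default_rng(5)
    t_obs = G @ c_true + rng.multivariate_normal(np.zeros(3), Sigma)

    W = np.linalg.inv(Sigma)             # generalized least squares
    c_hat = np.linalg.solve(G.T @ W @ G, G.T @ W @ t_obs)
    se = np.sqrt(np.diag(np.linalg.inv(G.T @ W @ G)))
    print(c_hat, se)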
Note: For a complete list of upcoming seminars check the dept's seminar web site: http://www.gwu.edu/~stat/seminars/Fall2002.htm. The campus map is at: http://www.gwu.edu/Map/. The contact person is Reza Modarres at Reza@gwu.edu or 202-994-6359.
Return to top
Topic: Robust Seasonal Adjustment using Heavy-Tailed Distributions
- Speakers:
John Aston, Statistical Research Division, Census Bureau
Siem Jan Koopman, Free University Amsterdam, Netherlands - Discussant: Stuart Scott, Bureau of Labor Statistics
- Chair: David Findley, U.S. Census Bureau
- Date/Time: September 26, 2002, Thursday; 12:30 PM - 2:00 PM
- Location: Bureau of Labor Statistics, Conference Center Rooms 7 and 8, Postal Square Building (PSB), 2 Massachusetts Ave. NE, Washington, D.C. Please use the First St., NE, entrance to the PSB. To gain entrance to BLS, please see "Notice about Seminars at the Bureau of Labor Statistics" at the beginning of this web page.
- Sponsor: Economics Section
Abstract:
Seasonal adjustment is routinely used to eliminate seasonal effects from monthly economic time series. However, these seasonal adjustments are influenced by many factors in the data. Outliers, both additive and level-shift, can result in highly variable seasonal factors when outlier detection is used and outliers drop in and out of the calculation on a month-by-month basis. This is especially true when outliers appear toward the end of the series, as these have a greater effect on up-to-date estimates.
A new method of accounting for outliers is proposed involving the use of heavy-tailed distributions, namely t-distributions. Recent developments in state space modelling techniques (Durbin and Koopman, 2000) have facilitated the incorporation of heavy-tailed distributions into the state equations. This allows error distributions to be extended, and through importance sampling, estimates of parameters from these distributions to be found.
Assessment of these new models and techniques will be presented using both simulated and real data sets. It will be shown that use of the new models can allow for more robust seasonal adjustment than the traditional outlier detection methods.
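The core mechanism, stripped of the state-space machinery (a location-only sketch with invented numbers): under a t error distribution, maximum likelihood automatically downweights outlying observations rather than dropping them in or out discretely.

    import numpy as np

    rng = np.random.default_rng(6)
    y = np.concatenate([rng.normal(10.0, 1.0, 48), [40.0, 35.0]])  # two gross outliers

    nu, mu = 4.0, y.mean()
    for _ in range(50):                         # EM iterations for a t(nu) location model
        w = (nu + 1.0) / (nu + (y - mu) ** 2)   # weights shrink smoothly for large residuals
        mu = (w * y).sum() / w.sum()
    print(f"sample mean {y.mean():.2f}   t-based location {mu:.2f}")

The smooth weights are what buys the robustness: an observation's influence fades continuously as it moves into the tail, instead of flipping between "outlier" and "not outlier" from one month's run to the next.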
Return to top
Title: Bayesian Group Testing
- Speaker: Dr. Curtis Tatsuoka, Department of Statistics, The George Washington University
- Date/Time: 11:00 a.m. - 12:00 noon, October 4, 2002
- Location: Funger Hall 321, 2201 G Street NW. Foggy Bottom metro stop on the blue and orange line.
- Sponsor: The George Washington University, Department of Statistics
Abstract:
A Bayesian formulation of group testing with testing error will be considered, where group testing is viewed as a sequential classification problem on lattices. Various response distribution formulations will be presented, including the case when testing error is a function of pool size. Results include describing experiment selection rules that attain optimal rates of convergence. Non-standard group testing problems also will be discussed.
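For a feel of the ingredients, here is the Bayes update for a single pooled test with testing error (a toy one-pool case with assumed numbers; the talk treats sequential designs on lattices):

    # prior prevalence, pool size, and assay operating characteristics (all assumed)
    p, k = 0.02, 10
    sens, spec = 0.95, 0.98

    prior_pool_neg = (1 - p) ** k            # pool truly free of positives
    prob_test_pos = sens * (1 - prior_pool_neg) + (1 - spec) * prior_pool_neg

    # posterior probability the pool truly contains a positive, given a positive test
    post = sens * (1 - prior_pool_neg) / prob_test_pos
    print(f"P(pool contains a positive | test positive) = {post:.3f}")   # about 0.914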
Note: For a complete list of upcoming seminars check the dept's seminar web site: http://www.gwu.edu/~stat/seminars/Fall2002.htm. The campus map is at: http://www.gwu.edu/Map/. The contact person is Reza Modarres at Reza@gwu.edu or 202-994-6359.
Return to top
Title: Synthetic Tabular Data To Limit Statistical Disclosure Of Sensitive Information
- Speaker: Ramesh A. Dandekar, Energy Information Administration
- Co-Author: Lawrence H. Cox, National Center for Health Statistics
- Chair: Phillip Steel, U.S. Census Bureau
- Discussant: Brian Greenberg, Social Security Administration
- Date/Time: Tuesday, October 15, 2002, 12:30 to 2:00 p.m.
- Location: Bureau of Labor Statistics, Postal Square Building (PSB), Conference Center, Conference Room 1, 2 Massachusetts Ave., NE, Washington DC. Please use the First Street entrance to the PSB. To gain entrance to BLS, please see Notice at the beginning of this announcement.
- Sponsor: WSS Methodology Section
Abstract:
In the scientific community, a synthetic product is developed when the real product is either in short supply or exhibits some undesirable properties. The objective in the latter case is to remove the undesirable properties from the synthetic product. Examples of synthetic products include rubber, wood, sugar, fiber, fuel, and hormones.
We apply this notion to the release of statistical data products in tabular form. Here, the undesirable property is that of revealing confidential information on the entities covered by the data. We explore the possibility of generating synthetic tabular data that exhibit overall statistical characteristics similar to those of the real tabular data, yet offer protection from statistical disclosure. The method applies linear programming to synthesize tabular cells by making controlled adjustments to the original tabular cells. The controlled adjustments are made in such a way that the overall distortion of the original cell values is minimal, based on one of several standard criteria. The resultant synthetic table conveys approximately the same statistical information to the end users as the original table, but at reduced risk of disclosure.
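A small sketch in the spirit of this controlled adjustment, with an invented 3-by-3 table and protection level: perturb the cells as little as possible in the L1 sense while keeping every row and column total exact and forcing the sensitive cell at least delta away from its true value.

    import numpy as np
    from scipy.optimize import linprog

    a = np.array([[2.0, 5.0, 3.0],
                  [6.0, 4.0, 7.0],
                  [1.0, 8.0, 2.0]])
    n = a.size
    sens, delta = 0, 2.0                # flat index of the sensitive cell (0, 0)

    c = np.ones(2 * n)                  # variables: upward, then downward, adjustments
    A_eq, b_eq = [], []
    for i in range(3):                  # row totals unchanged
        r = np.zeros(2 * n); r[3*i:3*i+3] = 1.0; r[n+3*i:n+3*i+3] = -1.0
        A_eq.append(r); b_eq.append(0.0)
    for j in range(3):                  # column totals unchanged
        r = np.zeros(2 * n); r[[j, j+3, j+6]] = 1.0; r[[n+j, n+j+3, n+j+6]] = -1.0
        A_eq.append(r); b_eq.append(0.0)

    A_ub = np.zeros((1, 2 * n))         # force the sensitive cell upward by >= delta
    A_ub[0, sens], A_ub[0, n + sens] = -1.0, 1.0
    bounds = [(0, None)] * n + [(0, v) for v in a.ravel()]  # keep all cells nonnegative

    res = linprog(c, A_ub=A_ub, b_ub=[-delta], A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    synthetic = a + (res.x[:n] - res.x[n:]).reshape(3, 3)
    print(synthetic)                    # same marginals, protected sensitive cell

(Pushing the cell downward instead would be handled by a second run with the inequality reversed, keeping whichever solution distorts the table less.)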
Return to top
Title: Afghan Refugee Camp Surveys: Pakistan, 2002
- Speakers: James Bell, Ruth Citrin, and David Nolle, U.S. Department of State, and Fritz Scheuren, NORC, University of Chicago
- Chair: Mary Batcher, Ernst & Young LLP
- Date/Time: Thursday, October 17, 2002, 12:30 to 2:00 p.m.
- Location: Bureau of Labor Statistics, Postal Square Building (PSB), Conference Center, Conference Room 1, 2 Massachusetts Ave., NE, Washington DC. Please use the First Street entrance to the PSB. To gain entrance to BLS, please see Notice at the beginning of this announcement.
- Sponsor: WSS Methodology Section and AAPOR-DC
Abstract:
For us, both as professionals and as citizens, the events of the past year and more have brought about many changes in our view of the world and our engagement in it. This survey was one response to those changes. Its main goal was to measure the attitudes of Afghan refugees now returning to their homeland from Pakistan on a variety of social, economic, and political issues. Particularly important was learning about their perceptions of current circumstances as well as their expectations for the future. Methodologically, in a setting of great danger, obtaining a good sample of adult males in the refugee camps posed many challenges, and most of the discussion will focus on these.
Return to top
Title: The Value of Standardization - Software and Current Best Methods
- Speaker: Dr. David Morganstein, WESTAT Corporation
- Date/Time: 11:00 a.m. - 12:00 noon, October 18, 2002
- Location: Funger Hall 323, 2201 G Street NW. Foggy Bottom metro stop on the blue and orange line.
- Sponsor: The George Washington University, Department of Statistics
Abstract:
In a private statistical organization, the amount of effort needed to plan and conduct a survey is a critical indicator of success in competing for government contracts. Westat, an employee-owned survey organization, must be concerned about the staff time needed to do its work. It must also be concerned about retaining high-quality staff, so job satisfaction is also a critical measure of success. The statistical group of 55 statisticians is involved in dozens of surveys every year; often a staff member is working on three or more surveys simultaneously. To reduce the effort needed to support this variety of surveys and to increase interest in the work, our statistical group has standardized in two areas: software and current best methods. In this talk, we'll describe why we chose to do this, how we do it, and the benefits we have observed.
Note: For a complete list of upcoming seminars check the dept's seminar web site: http://www.gwu.edu/~stat/seminars/Fall2002.htm. The campus map is at: http://www.gwu.edu/Map/. The contact person is Reza Modarres at Reza@gwu.edu or 202-994-6359.
Return to top
Title: The 2002 Roger Herriot Award For Innovation in Federal Statistics
- Recipient: Daniel H. Weinberg, U.S. Census Bureau
- Speakers:
Katherine K. Wallman, Statistical Policy Office, Office of Management and Budget
William P. Butz, Rand Corporation
Paula J. Schneider, U.S. Census Bureau (retired)
Daniel H. Weinberg, U.S. Census Bureau
- Chair: Edward J. Spar, Council of Professional Associations on Federal Statistics
- Date: Tuesday, November 12, 2002, 12:30 - 2:00 p.m. Reception to follow.
- Location: Bureau of Labor Statistics. Conference Rooms 7 and 8. To gain entrance to BLS, please see Notice at the beginning of this announcement.
- Video Conference to selected sites.
- Co-sponsors of the Herriot Award: Washington Statistical Society, American Statistical Association's Government Statistics Section and Social Statistics Section
Abstract:
On August 12, 2002, Dan Weinberg was awarded the Roger Herriot Award at the annual meeting of the American Statistical Association in New York. Dan is the Chief of the Housing and Household Economic Statistics Division of the U.S. Census Bureau. Dan has been immersed in all three sectors of federal statistics: he taught at Yale and Tufts Universities, worked as a private-sector research contractor, and has spent the last 22 years in the federal government. Those who know Dan know he "thinks outside the box" in the tradition of Roger Herriot. Because of his strong intellectual interest and expertise in poverty measurement, Dan has been an active champion for updating the 40-year-old poverty measure. Dan's successful leadership on issues of strategic importance led to the initiation of the Small Area Income and Poverty Estimates program. The promise to "end welfare as we know it" set the stage for Dan's vision to establish the Survey of Program Dynamics in order to measure the effects of welfare reform over a 10-year period. These activities exemplify the accomplishments the Herriot Award represents.
Kathy Wallman, Bill Butz, and Paula Schneider will first discuss Dan's contributions to federal statistics. Dan will then present a paper, "Better Measures of Income and Poverty."
Roger Herriot was the Associate Commissioner for Statistical Standards and Methodology at the National Center for Education Statistics (NCES) before he died in 1994. Throughout his career at NCES and the Census Bureau, Roger developed unique approaches to the solution of statistical problems in federal data collection programs. Dan truly exemplifies this tradition.
Return to top
Title: Correcting for Omitted-Variables and Measurement-Error Bias in Autoregressive Model Estimation with Panel Data
- Speaker: P.A.V.B. Swamy, BLS
- Discussant: Tom Lutton, OFHEO
- Moderator: Charlie Hallahan, ERS/USDA
- Place: BLS Conference Center, Room 6. To gain entrance to BLS, please see Notice at the beginning of this announcement.
- Date: Tuesday, November 12, 2002, 12:30 - 2:00 p.m.
- Sponsor: Statistical Computing Section
- Talk to be videoconferenced.
Abstract:
The parameter estimates based on an econometric equation are biased and can also be inconsistent when relevant regressors are omitted from the equation or when included regressors are measured with error. This problem gets complicated when the "true" functional form of the equation is unknown. Here, we demonstrate how auxiliary variables, called concomitants, can be used to remove omitted-variable and measurement-error biases from the coefficients of an equation with the unknown "true" functional form. The method is specifically designed for panel data. Numerical algorithms for enacting this procedure are presented and an illustration is given using a practical example of forecasting small-area employment from nonlinear autoregressive models.
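A small simulation of one of the biases at issue (parameters invented; the talk's correction via concomitants is not attempted here): measurement error in the lagged regressor attenuates the estimated autoregressive coefficient.

    import numpy as np

    rng = np.random.default_rng(7)
    T, rho, me_sd = 50_000, 0.8, 0.7
    y = np.zeros(T)
    for t in range(1, T):                    # latent AR(1) process
        y[t] = rho * y[t - 1] + rng.normal()
    x = y + rng.normal(0.0, me_sd, T)        # observed with measurement error

    naive = np.polyfit(x[:-1], x[1:], 1)[0]  # AR(1) slope fit to the noisy series
    print(f"true rho {rho}, naive estimate {naive:.3f}")   # biased toward zero

With these values the naive estimate settles near 0.68 rather than 0.8; omitting a relevant regressor biases the coefficient in a similarly systematic way.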
Return to top
MORRIS HANSEN LECTURE
Title: Privacy and Confidentiality: A New Era?
- Panelists:
Eleanor Singer, Senior Research Scientist, Survey Research Center, Institute for Social Research, University of Michigan
Norman Bradburn, Assistant Director for the Social, Behavioral, and Economic Sciences at the National Science Foundation
Tiffany and Margaret Blake Distinguished Service Professor Emeritus in the Department of Psychology, Graduate School of Business, the College, and the Harris Graduate School of Public Policy Studies, University of Chicago
Katherine Wallman, Chief Statistician of the United States, Office of Management and Budget
- Date/Time: Tuesday, November 19, 2002, 3:30-5:30 p.m.
- Location: The Jefferson Auditorium, USDA South Building, between 12th and 14th Streets on Independence Avenue S.W., Washington DC. The Independence Avenue exit from the Smithsonian METRO stop is at the 12th Street corner of the building, which is also where the handicapped entrance is located. Except for handicapped access, all attendees should enter at the 5th wing, along Independence Avenue. Please bring a photo ID to facilitate gaining access to the building.
- Sponsors: The Washington Statistical Society, Westat, and the National Agricultural Statistics Service.
- Reception: The lecture will be followed by a reception from 5:30 to 6:30 p.m. in the patio of the Jamie L. Whitten Building, across Independence Avenue S.W.
Abstract:
Privacy and confidentiality issues are receiving heightened attention for a variety of reasons. The exponential growth of available data and the increased ease with which large databases can be mined are causing government agencies concern over the release of information collected for the public good. At the same time, Institutional Review Boards (required for a number of government surveys) are taking a more restrictive view of what is permissible. And a seemingly minor provision of the USA Patriot Act, in response to the September 11 national tragedy, permits the Attorney General to petition the court for access to confidential data maintained by the National Center for Education Statistics. The success of the Federal statistical system, and the ability to provide accurate aggregate information for the public good, relies on the confidence that respondents place in government statistical organizations and their willingness to participate in government surveys.
This panel session brings together three key government leaders and researchers to discuss the implications of these important issues. The first speaker, Dr. Eleanor Singer, will review research findings concerning people's concerns about confidentiality of personal information collected by government agencies, focusing on the Census Bureau. In the second presentation, Dr. Norman Bradburn will highlight issues of privacy and confidentiality that are becoming increasingly salient in social and behavioral research. He will argue that information and data are different concepts and that the privacy and confidentiality issues are different for the two concepts. The third speaker, Ms. Katherine Wallman, will discuss some of the issues that have arisen since the events of September 11, the challenges they place on the Federal statistical system, and current initiatives to address these challenges.
Panelists will identify the main issues in the debate, weigh tradeoffs, discuss the role of informed consent procedures, and consider the long-term implications for federal statistics. The discussion will inform decisions on the need for, and nature of, a formal response from the statistics profession with respect to balancing compelling interests in data for national security against promises made to respondents to Federal surveys.
Return to top
Title: Confidentiality for a Mandatory Reporting System: Challenges and Solutions
- Speaker: Rich Allen, National Agricultural Statistics Service
- Discussant: Laura Zayatz, Bureau of the Census
- Chair: Jay Casselberry, Energy Information Administration
- Date/Time: Thursday, November 21, 2002, 12:30 to 2:00 p.m.
- Location: Bureau of Labor Statistics, Postal Square Building (PSB), Conference Center, Conference Room 9, 2 Massachusetts Ave., NE, Washington DC. Please use the First Street entrance to the PSB. To gain entrance to BLS, please see the Notice above.
- Sponsor: WSS Methodology Section
Abstract:
The Agricultural Marketing Service (AMS) of the U.S. Department of Agriculture has implemented provisions of a challenging new law which requires large meat packing plants to report details of all purchases for specific time periods every day and requires AMS to issue summary reports an hour later. This presentation will outline security and analysis procedures developed to meet the publication requirements but will concentrate on the confidentiality issues.
Even with active market operations, conventional confidentiality rules meant many planned reports and data cells could not be issued. Detailed study of actual data demonstrated that market participation (and non-participation) was random and that other market participants would not have been able to identify who had purchased even if an occasional report based on one company had been issued. A new confidentiality approach based on continual analysis of 60-day reporting patterns was developed. This presentation will trace the development and approval of the new approach and present follow-up performance evaluations.
Return to top
Title: The U.S. Census Bureau's Corporate Metadata Repository: An Overview of the Development Process and Current Status
- Speaker: Samuel N. Highsmith, Jr., U.S. Census Bureau
- Chair: Manuel de la Puente, U.S. Census Bureau
- Date/Time: Thursday December 5, 2002, 12:30 - 2:00 p.m.
- Location: Bureau of Labor Statistics, Postal Square Building (PSB), Conference Center, Conference Room 9, 2 Massachusetts Ave., NE, Washington DC. Please use the First Street entrance to the PSB. To gain entrance to BLS, please see the Notice above.
- Sponsor: WSS Social and Demographic Statistics Section
Abstract:
This presentation will provide an overview of the methodology behind the U.S. Census Bureau's Corporate Metadata Repository (CMR). The overview will begin with a discussion of the needs that brought about the development of the CMR, followed by a description of the development of the applications that use the CMR and of the metadata registry components that have been built or are under construction. The presentation will also outline some of the challenges encountered in building a metadata registry and conclude with a report on the current status of the effort.
The construction of a corporate metadata repository is based on two combined models: a business process model of the survey and census process at the Census Bureau, and a data element registry model providing a complete definition of the data elements residing in datasets. The model was first adapted and used in the construction of the American FactFinder Internet application. Areas providing data for dissemination on the American FactFinder site were required to provide metadata files describing the dissemination files, their variables, and allowable values. The second area to adopt the CMR model was the Economic Directorate, in its 2002 Economic Census redesign effort. The Economic Directorate used the CMR model to describe and organize all the questionnaire content for its more than 650 paper forms. One of our more recent internal customers is the Geography Division, for whom we have automated the validation of geographic metadata files sent to the American FactFinder staff.
Return to top