Washington Statistical Society Seminars: 2007

January, 2007
  • 9 (Tues.): ROC Analysis of the Multiple-Biomarker Classifier Training and Testing Problem: The Influence Function and Specification of Uncertainties in ROC Summary Measures
  • 12 (Fri.): Parameter Estimation for the Exponential-Normal Convolution Model for Background Correction of Affymetrix GeneChip Data (Georgetown University Seminar)
  • 19 (Fri.): Considerations in Adapting Clinical Trial Design for Drug Development (Georgetown University Seminar)
  • 29 (Mon.): Economic Turbulence in the U.S. Economy
February, 2007
  • 2 (Fri.): Adaptive "Simon" Designs for Heterogeneous Patient Populations in Phase II Cancer Trials (Georgetown University Seminar)
  • 6 (Tues.): Mortality in Iraq
March, 2007
  • 1 (Thur.): Sequential Monitoring of Randomization Tests (Georgetown University Seminar)
  • 7 (Wed.): New Methods and Satellites: A Program Update on the NASS Cropland Data Layer Acreage Program
  • 8 (Thur.): Measurement and Statistical Analysis of Human Rights: A Model
  • 8 (Thur.): The Dominance Order (University of Maryland, Statistics Program Seminar)
  • 16 (Fri.): Use of a Visual Programming Environment for Creating and Optimizing Mass Spectrometry Diagnostic Workflows (Georgetown University Seminar)
  • 23 (Fri.): Bayesian Diagnostics for Detecting Hierarchical Structure
  • 27 (Tues.): President's Invited Panel Discussion on Finite Population Correction Factors
  • 27 (Tues.): The Role of Context in the Recall of Minimally Counterintuitive Concepts (U.S. Bureau Of Census, Statistical Research Division Seminar)
  • 28 (Wed.): Applications of the Johnson SB Distribution to Environmental Data
  • 29 (Thur.): A Test of Association of a Two-Way Categorical Table for Correlated Counts (U.S. Bureau Of Census, Statistical Research Division Seminar)
April, 2007
  • 10 (Tues.): Introduction to Data Mining Methodology for Statisticians
  • 13 (Fri.): Wait! Should We Use the Survey Weights to Weight? (University of Maryland, Statistics Program Seminar)
  • 16 (Mon.): Bridging: Roger Herriot's Time to the Present (2006 Roger Herriot Award)
  • 23 (Mon.): American Community Survey Weighting and Estimation: ACS Family Equalization
May, 2007
  • 2 (Wed.): An Overview of the Semi-Competing Risk Problem
  • 4 (Fri.): Systems Pharmacology of Type 2 Diabetes: A Case Study for Pharmaceutical Development (Georgetown University Seminar)
  • 8 (Tues.): The STATCOM Network: A Role for Students in Pro Bono Statistical Consulting to the Community
  • 10 (Thur.): Using the t-distribution to Deal with Outliers in Small Area Estimation
  • 15 (Tues.): Confidence Interval Coverage in Model-Based Estimation
  • 17 (Thur.): The Role of Statistics and Statisticians in Human Rights
  • 22 (Tues.): Characterization, Modeling and Management of Inferential Risk, Data Quality Risk and Operational Risk in Survey Procedures
June, 2007
  • 12 (Tues.): Book Signing and Wine Tasting
  • 13 (Wed.): The Role of Fringe Benefits in Employer and Workforce Dynamics
  • 21 (Thur.): Spatial Association Between Speciated Fine Particles and Mortality
  • 25 (Mon.): National Health Interview Survey's 50th Anniversary Commemorative Conference
  • 27 (Wed.): Robust Prediction of Small Area Means and Distributions (BLS Statistical Seminar)
July, 2007
  • 11 (Wed.): Estimation under Ignorable Response Mechanism and Unweighted Imputation
  • 18 (Wed.): Assessment of Coverage and Utility of Residential Address Lists
  • 24 (Tues.): Imputation Using Empirical Likelihood
September, 2007
  • 4 (Tues.): A Geostatistical Approach to Linking Geographically-Aggregated Data/A System for Detecting Arbitrarily Shaped Hotspots
  • 7 (Fri.): Modeling Multiple-Response Categorical Data From Complex Surveys
  • 7 (Fri.): Bayesian Methods for Proteomic Biomarker Discovery Using Functional Mixed Models (Georgetown University Seminar)
  • 7 (Fri.): Experiences with Congressional Testimony: Statistics and The Hockey Stick (George Mason University, CDS/CCDS/Statistics Colloquium Series)
  • 12 (Wed.): An Introduction to the Key National Indicators Initiative: the State of the USA
  • 12 (Wed.): New Experiments on the Design of Complex Survey Questions
  • 18 (Tues.): Unduplicating the 2010 Census (U.S. Bureau Of Census, Statistical Research Division Seminar)
  • 19 (Wed.): Survey Methodology for Assessing Geographically Isolated Wetlands Map Accuracy
  • 21 (Fri.): A Geometric Approach to Comparing Treatments for Rapidly Fatal Diseases (Georgetown University Seminar)
  • 25 (Tues.): A Bayesian IRT Model for the Comparison of Survey Item Characteristics under Dual Modes of Administration (American University, Department of Mathematics and Statistics Colloquium)
  • 26 (Wed.): Alternative Survey Sample Designs, Seminar #1: Network, Spatial, and Adaptive Sampling (U.S. Bureau Of Census, Statistical Research Division Seminar)
  • 28 (Fri.): Small Area Estimation: An Empirical Best Linear Unbiased Prediction Approach
  • 28 (Fri.): Multi-Stage Sampling for Genetic Studies (George Washington University, Department of Statistics Seminar)
  • 28 (Fri.): Text Data Mining in Defense Applications (George Mason University, CDS/CCDS/Statistics Colloquium Series)
October, 2007
  • 4 (Thur.): Two for the Price of One: Statistics in Natural Language Processing and Information Retrieval (University of Maryland, Statistics Program Seminar)
  • 5 (Fri.): The Statistical Challenge of Studies with Errors-in-Covariates When Only the Means are Modelled (Georgetown University Seminar)
  • 12 (Fri.): Finding the Fittest Curve for the Binary Classification Problem (George Mason University, CDS/CCDS/Statistics Colloquium Series)
  • 16 (Tues.): Protecting the Confidentiality of Tables by Adding Noise to the Underlying Microdata
  • 19 (Fri.): Probability of Detecting Disease-Associated SNPs in Case-Control Genome-Wide Association Studies (Georgetown University Seminar)
  • 19 (Fri.): Limitations of the Non-homogeneous Poisson Process (NHPP) Model for Analyzing Software Reliability Data (George Washington University, Department of Statistics Seminar)
  • 24 (Wed.): Estimating the Measurement Error in the Current Population Survey Labor Force - A Mixture Markov Latent Class Analysis Approach
  • 25 (Thur.): Statistical Issues and Challenges Arising from Analysis of Genome-Wide Association Studies
  • 30 (Tues.): Assessing the Value of Bayesian Methods for Inference About Finite Population Quantities (17th Annual Morris Hansen Lecture)
November, 2007
  • 2 (Fri.): Multilevel Functional Principal Component Analysis (Georgetown University Seminar)
  • 2 (Fri.): Multiphase Regression Models for Assessing Highly Multivariate Measurement Systems (George Washington University, Department of Statistics Seminar)
  • 7 (Wed.): Cell Lines, Microarrays, Drugs and Disease: Trying to Predict Response to Chemotherapy
  • 8 (Thur.): Introduction to Number Theory and Modeling the Average Running Time of Computer Programs
  • 9 (Fri.): Evaluation of Trace Evidence in the Form of Multivariate Data and Sample Size Estimation in a Consignment (George Washington University, Department of Statistics Seminar)
  • 9 (Fri.): Multi-modal Data and Text Mining (George Mason University, CDS/CCDS/Statistics Colloquium Series)
  • 15 (Thur.): An MM Algorithm for Multicategory Vertex Discriminant Analysis (University of Maryland, Statistics Program Seminar)
  • 16 (Fri.): Ranges of Association Measures for Dependent Binary Variables (Georgetown University Seminar)
  • 16 (Fri.): Sensitivity Analysis for Instrumental Variables Regression with Overidentifying Restrictions (George Washington University, Department of Statistics Seminar)
  • 16 (Fri.): Handwriting Identification: Identifying the Writer of a Questioned Document Using Statistical Analysis (George Mason University, CDS/CCDS/Statistics Colloquium Series)
  • 28 (Wed.): The Effects of Active Duty on the Income of Reservists and the Labor Market Participation of Spouses
  • 29 (Thur.): Tests of Unit Roots in Time Series Data
  • 29 (Thur.): Analyzing Forced Unfolding of Protein Tandems via Order Statistics
December, 2007
  • 5 (Wed.): Evaluating Alternative One-Sided Coverage Intervals for an Extreme Binomial Proportion
  • 6 (Thur.): Evaluating Continuous Training Programs Using the Generalized Propensity Score
  • 7 (Fri.): Disparate Modes of Survey Data Collection
  • 10 (Mon.): Empirical Likelihood Based Calibration Method in Missing Data Problems
  • 12 (Wed.): Approaches to Reducing and Evaluating Nonresponse Bias, With Applications to Adult Literacy Surveys


Title: ROC Analysis of the Multiple-Biomarker Classifier Training and Testing Problem: The Influence Function and Specification of Uncertainties in ROC Summary Measures

  • Speaker:
    Waleed A. Yousef, D.Sc.,
    George Washington University and
    Center for Devices and Radiological Health (CDRH) FDA.
    wyousef@gwu.edu
  • Co-investigators:
    Robert F. Wagner, Ph.D.
    FDA Center for Devices and Radiological Health
    robert.wagner@fda.hhs.gov

    Murray H. Loew, Ph.D.
    George Washington University
    loew@gwu.edu
  • Chair: Robert F. Wagner, Ph.D.
  • Discussant: Grant Izmirlian, Ph.D., NCI Division of Cancer Prevention
  • Date/Time: Tuesday, January 9, 2007 / 12:30 to 2:00 p.m.
  • Location: NIH's Executive Plaza complex. Executive Plaza North, Conference Room 319, 6130 Executive Boulevard, Rockville, Maryland; pay parking is available. Check with security upon entry; photo ID required.
  • Sponsor: WSS Section on Public Health and Biostatistics

Abstract:

One of the central biomedical issues for our time is the identification and fusion of multiple biomarkers for a specified diagnostic task. The fusion stage can be recognized immediately as a special case of the problem of statistical learning. That is, one trains a statistical learning machine (SLM) with cases whose health status or outcome is already known and then tests the learning machine on cases previously unseen. Almost all investigators of SLMs are familiar with early optimism, tempered by later experience. Assessment methods are needed that provide estimates not only of mean performance, but also of uncertainties associated with the finite size of the training and testing samples. Taking the work of Efron and Tibshirani as a point of departure, we have developed methods for calculating the statistical influence function for figures of merit based not only on probability of misclassification but also on the full receiver operating characteristic (ROC) or true-positive versus false-positive rate and several of its summary measures and their uncertainties. These methods have broad applicability across most diagnostic fields that plan to use multiple biomarkers and, in particular, are useful for designing a target database size based on a pilot study.
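
As a rough illustration of an ROC summary measure and its finite-sample uncertainty, the sketch below computes the empirical area under the ROC curve (AUC) as a Mann-Whitney statistic, with a simple bootstrap standard error over the test cases. It is not the influence-function method described in the talk. The scores, function names, and the choice of a bootstrap are illustrative assumptions, and the standard error reflects only testing-sample variability, not training-sample variability.

```python
import random

def empirical_auc(neg_scores, pos_scores):
    """Mann-Whitney estimate of AUC: P(pos > neg) + 0.5 * P(pos == neg)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def bootstrap_auc_se(neg_scores, pos_scores, n_boot=500, seed=0):
    """Bootstrap standard error of the AUC, resampling test cases within each class."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_boot):
        neg = [rng.choice(neg_scores) for _ in neg_scores]
        pos = [rng.choice(pos_scores) for _ in pos_scores]
        aucs.append(empirical_auc(neg, pos))
    mean = sum(aucs) / n_boot
    return (sum((a - mean) ** 2 for a in aucs) / (n_boot - 1)) ** 0.5

# Hypothetical classifier scores for non-diseased and diseased test cases.
healthy = [0.1, 0.3, 0.35, 0.4, 0.6]
diseased = [0.3, 0.5, 0.7, 0.8, 0.9]
auc = empirical_auc(healthy, diseased)
se = bootstrap_auc_se(healthy, diseased)
```

With samples this small the bootstrap spread is wide, which is the kind of uncertainty the abstract argues should be reported alongside mean performance.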

Title: Parameter Estimation for the Exponential-Normal Convolution Model for Background Correction of Affymetrix GeneChip Data

  • Speaker: Monnie McGee, Ph.D.
    Assistant Professor
    Department of Statistical Science
    Southern Methodist University
    Dallas, Texas
  • Date: January 12, 2007
  • Time: 10:00-11:00 AM (refreshments will be served at 9:45)
  • Location: Georgetown University
    Lombardi Comprehensive Cancer Center
    3800 Reservoir Road, NW
    New Research Building, E501 Conference Room
    Washington, DC 20057
  • Phone: 202-687-4114
  • Sponsor: Georgetown University

Abstract:

There are many methods of correcting microarray data for non-biological sources of error. Authors routinely supply software or code so that interested analysts can implement their methods. Even with a thorough reading of associated references, it is not always clear how requisite parts of the method are calculated in the software packages. However, it is important to have an understanding of such details, as this understanding is necessary for proper use of the output, or for implementing extensions to the model.

In this paper, the calculation of parameter estimates used in Robust Multichip Average (RMA), a popular preprocessing algorithm for Affymetrix GeneChip brand microarrays, is elucidated. The background correction method for RMA assumes that the perfect match (PM) intensities observed result from a convolution of the true signal, assumed to be exponentially distributed, and a background noise component, assumed to have a normal distribution. A conditional expectation is calculated to estimate signal. Estimates of the mean and variance of the normal distribution and the rate parameter of the exponential distribution are needed to calculate this expectation. Simulation studies show that the current estimates are flawed; therefore, new ones are suggested. We examine the performance of preprocessing under the exponential-normal convolution model using several different methods to estimate the parameters.
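
The conditional expectation mentioned above can be sketched under simplifying assumptions. If the observed intensity is O = S + B with S exponential (rate alpha) and B normal (mean mu, standard deviation sigma) and unrestricted, then S given O is a normal distribution with mean a = O - mu - sigma^2 * alpha and standard deviation sigma, truncated to S > 0. The function below returns the mean of that truncated normal. The expression used in RMA software may differ in detail, and the parameter values are made up for illustration.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_signal(observed, mu, sigma, alpha):
    """E[S | O = observed] for O = S + B, S ~ Exponential(rate=alpha), B ~ Normal(mu, sigma^2).

    Given O, S is Normal(a, sigma^2) truncated to S > 0,
    with a = observed - mu - sigma^2 * alpha.
    """
    a = observed - mu - sigma * sigma * alpha
    # Mean of a normal truncated below at zero.
    return a + sigma * normal_pdf(a / sigma) / normal_cdf(a / sigma)

# Hypothetical parameter estimates: background mean 100, sd 20, signal rate 0.01.
bright_probe = expected_signal(1000.0, mu=100.0, sigma=20.0, alpha=0.01)
dim_probe = expected_signal(90.0, mu=100.0, sigma=20.0, alpha=0.01)
```

For a bright probe the correction is essentially a subtraction; for a probe below the background mean the estimate stays positive, which is why the quality of the three parameter estimates matters most at low intensities.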

Title: Considerations in Adapting Clinical Trial Design for Drug Development

  • Speaker: H.M. James Hung, Ph.D.
    Director, Division of Biometrics I
    Office of Biostatistics
    Office of Translational Sciences
    Center for Drug Evaluation and Research
    Food and Drug Administration
  • Date/Time: Friday, January 19, 2007 / 10:00 to 11:00 a.m. (refreshments will be served at 9:45)
  • Location: Georgetown University
    Lombardi Comprehensive Cancer Center
    3800 Reservoir Road, NW
    New Research Building, E501 Conference Room
    Washington, DC 20057
  • Phone: 202-687-4114
  • Sponsor: Georgetown University

Abstract:

Enhancing flexibility of clinical trial designs is one of the hot topics nowadays. Proper adaptation of clinical trial design is one of the ways for achieving this goal and has drawn much attention from clinical trialists. In past decades, the classical design has been improved to allow the flexibility for terminating the trial early if the experimental treatment is proven effective or deemed harmful or futile, based on the data accumulating during the course of the trial. Statistical validity of such an enhanced design in terms of type I error is maintained. The operational aspects of this design can still be an issue but, by and large, there have been many good models for how to deal with these aspects. As the flexibility of trial design is enhanced further, the potential risk that the resulting trial may not be interpretable increases. In this presentation we shall share our review experience, discuss the many issues arising from use of more flexible designs and hopefully stimulate further research in this area.

Title: Economic Turbulence in the U.S. Economy

  • Speaker: Julia Lane, NORC/University of Chicago
  • Discussants:
    Jared Bernstein, Economic Policy Institute
    Ralph Rector, The Heritage Foundation
  • Chair: Linda Atkinson, Economic Research Service, USDA
  • Date/time: Monday, January 29, 2007 / 12:30 to 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center in G440. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Take the Red Line to Union Station.
  • Sponsor: WSS Economics Section
  • Presentation slides: Download the pdf (~250kb)

Abstract:

Turbulent change is the hallmark of the U.S. economy, and one of the reasons for its success. Every week, in every part of the economy, and in every corner of the country, some firms are shutting down and others are starting up, some jobs are being created and others are being destroyed, some workers are being hired and others are quitting or being laid off.

The presentation will summarize the analysis from a new book "Economic Turbulence" derived from the use of the LEHD data at the Census Bureau, as well as from interviews with firms and workers in each industry.

Three key topics will be discussed:

  1. Firm performance and survival: What is the relationship between workforce quality, turnover, and firm survival?
  2. Worker career paths: What impact do firms have on workers' career paths? What is the long run impact of firm stability and instability on a worker's earnings growth?
  3. Wage distribution: What has happened to worker earnings over time? What has happened to middle, low, and high income jobs? Do new firms pay more or less than old?

Title: Adaptive "Simon" Designs for Heterogeneous Patient Populations in Phase II Cancer Trials

  • Speaker:
    Karen Messer, PhD
    Associate Professor
    Director of Biostatistics
    Moores UCSD Cancer Center
    University of California, San Diego
  • Date: February 2, 2007
  • Time: 10:00-11:00 AM (refreshments will be served at 9:45)
  • Location: Georgetown University
    Lombardi Comprehensive Cancer Center
    3800 Reservoir Road, NW
    New Research Building, E501 Conference Room
    Washington, DC 20057
  • Phone: 202-687-4114
  • Sponsor: Georgetown University

Abstract:

In a Phase II cancer trial it may be advantageous to open enrollment to several patient populations, each with a very different null probability of response. For example, in a trial of a novel therapeutic agent for relapsed Acute Myelogenous Leukemia (AML), patients in first relapse may have a 30% probability of response under standard treatment, while patients in second relapse or higher may have only a 10% probability of response. These Phase II trials are generally uncontrolled (they often use "historical controls"), and the experimental agent may be expected to induce certain Grade 3 toxicities which would not be considered dose limiting. Furthermore, historically most of these Phase II trials can be expected to prove no better than standard-of-care. Phase II trials with these characteristics are usually designed with an early stopping rule which checks for initial evidence of efficacy after a first stage enrollment target is met. If there is insufficient evidence, the trial stops for futility. We discuss the standard two-stage optimal designs in this situation, and describe their operating characteristics under heterogeneous patient enrollment. These are compared to other approaches in the literature. Simple, approximately optimal designs which account for heterogeneity are presented. We recommend a practical adaptive design strategy which we have implemented at Moores UCSD Cancer Center.
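
The operating characteristics of a standard two-stage design with a futility stop can be computed directly from binomial probabilities. The sketch below does so for a single, homogeneous population; the design constants (r1 = 1 of n1 = 10, r = 5 of n = 29) and the function names are illustrative assumptions, and the heterogeneous-enrollment designs discussed in the talk are not reproduced here.

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def two_stage_characteristics(r1, n1, r, n, p):
    """Operating characteristics of a two-stage design at true response rate p.

    Stage 1 enrolls n1 patients and stops for futility if responses <= r1.
    Otherwise n - n1 more are enrolled, and the agent is declared promising
    if total responses exceed r.
    """
    n2 = n - n1
    pet = sum(binom_pmf(x, n1, p) for x in range(r1 + 1))
    reject = 0.0
    for x1 in range(r1 + 1, n1 + 1):
        # Responses still needed in stage 2 to exceed r in total.
        need = max(r - x1 + 1, 0)
        tail = sum(binom_pmf(x2, n2, p) for x2 in range(need, n2 + 1))
        reject += binom_pmf(x1, n1, p) * tail
    expected_n = n1 + (1 - pet) * n2
    return {"pet": pet, "reject": reject, "expected_n": expected_n}

# Illustrative design for a null response rate of 0.10 versus a target of 0.30.
design = dict(r1=1, n1=10, r=5, n=29)
under_null = two_stage_characteristics(p=0.10, **design)
under_alt = two_stage_characteristics(p=0.30, **design)
```

Evaluating the same design at p = 0.30 as the null rate (as for first-relapse patients in the example above) shows how quickly the error rates drift when populations with different null probabilities are pooled.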

Title: Mortality in Iraq

  • Chair: Dr. Graham Kalton, Westat
  • Speakers:
    Dr. Gilbert Burnham, Center for Refugee and Disaster Response, Bloomberg School of Public Health, JHU
    Ms. Shannon Doocy, Center for Refugee and Disaster Response, Bloomberg School of Public Health, JHU
    Dr. Scott Zeger, Dept of Biostatistics, Bloomberg School of Public Health, JHU
  • Discussants: Jana Asher, AAAS and Dr. David Marker, Westat
  • Date: Tuesday, February 6, 2007
  • Time: 2:00 pm to 4:30 pm (please note the atypical start time; light refreshments will follow the seminar)
  • Location: Bureau of Labor Statistics, Conference Center in Room 1 & 2. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Methodology and Human Rights sections of the WSS, Science and Human Rights Program of AAAS, DC-AAPOR, and CASPA

Abstract:

In unstable situations, population based data are the most reliable method of estimating mortality and other health indicators. In many conflicts and fragile state settings, however, collecting such data is difficult to do. Aside from the physical dangers, there is often an incomplete understanding of population numbers, population locations, migration patterns, and health status of the population. That lack of understanding contributes to many methodological challenges. However, population based data are increasingly important in planning protection of and assistance to affected populations, as well as for reconstruction policy.

In Iraq we have undertaken two population-based national surveys of mortality related to conflict using a cluster survey approach. The first covered the period from January 2002 until July 2004, using 33 clusters with 988 households and 7,868 persons. That survey estimated an excess mortality of over \,000 persons following the March 2003 invasion. The second survey covered the period from January 2002 until July 2006. That survey included 47 clusters containing 1,849 households and 12,801 persons. From that survey an excess mortality of 654,965 (CI 392 797-942 636) was estimated, with 601,027 deaths attributed to violent causes.

The presentations will discuss the methodological and ethical issues involved in conducting our research in Iraq.

Title: Sequential Monitoring of Randomization Tests

  • Speaker: William F. Rosenberger, PhD, Professor and Chairman of Statistics, George Mason University
  • Date: March 1, 2007
  • Time: 10:00-11:00 AM (refreshments will be served at 9:45)
  • Location:
    Georgetown University
    Lombardi Comprehensive Cancer Center
    3800 Reservoir Road, NW
    New Research Building, E501 Conference Room
    Washington, DC 20057
  • Phone: 202-687-4114
  • Sponsor: Georgetown University

Abstract:

Randomization provides a basis for inference, but it is rarely taken advantage of. We discuss randomization tests based on the family of linear rank tests in the context of sequential monitoring of clinical trials. Such tests are applicable for categorical, continuous, and survival time outcomes. We prove the asymptotic joint normality of sequentially monitored test statistics, which allows the computation of sequential monitoring critical values under the Lan-DeMets procedure. Since randomization tests are not based on likelihoods, the concept of information is murky. We give an alternate definition of information and show how to compute it for different randomization procedures. The randomization procedures we discuss are the permuted block design, stratified block design, and stratified urn design. We illustrate these results by reanalyzing a clinical trial in retinopathy.
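
To make the idea of a randomization test concrete, the sketch below computes a Monte Carlo p-value for a linear rank statistic by holding the outcomes fixed and re-drawing treatment assignments from the same permuted block design. It covers a single fixed-sample analysis only; the sequential monitoring, Lan-DeMets boundaries, and information calculations from the talk are not attempted, and the data, block size, and function names are illustrative assumptions.

```python
import random

def permuted_block_assignment(n, block_size, rng):
    """Treatment indicators (1 = treatment, 0 = control) from a permuted block design."""
    assignment = []
    while len(assignment) < n:
        block = [1] * (block_size // 2) + [0] * (block_size // 2)
        rng.shuffle(block)
        assignment.extend(block)
    return assignment[:n]

def centered_ranks(outcomes):
    """Simple rank scores centered at zero (ties are not handled in this sketch)."""
    order = sorted(range(len(outcomes)), key=lambda i: outcomes[i])
    ranks = [0.0] * len(outcomes)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank - (len(outcomes) + 1) / 2.0
    return ranks

def linear_rank_statistic(scores, assignment):
    """Sum of the scores of the patients assigned to treatment."""
    return sum(s for s, t in zip(scores, assignment) if t == 1)

def randomization_p_value(outcomes, assignment, block_size, n_draws=2000, seed=1):
    """Two-sided Monte Carlo p-value: outcomes stay fixed, assignments are re-randomized."""
    rng = random.Random(seed)
    scores = centered_ranks(outcomes)
    observed = abs(linear_rank_statistic(scores, assignment))
    extreme = 0
    for _ in range(n_draws):
        redraw = permuted_block_assignment(len(outcomes), block_size, rng)
        if abs(linear_rank_statistic(scores, redraw)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_draws + 1)

# Hypothetical trial: 24 patients, blocks of 4, and a large treatment effect.
rng = random.Random(7)
assignment = permuted_block_assignment(24, 4, rng)
outcomes = [0.1 * i + (10.0 if t == 1 else 0.0) for i, t in enumerate(assignment)]
p_value = randomization_p_value(outcomes, assignment, block_size=4)
```

The reference distribution comes from the randomization procedure itself, so replacing `permuted_block_assignment` with another design changes the test without any distributional assumption on the outcomes.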

Title: New Methods and Satellites: A Program Update on the NASS Cropland Data Layer Acreage Program

  • Speaker: Rick Mueller, National Agricultural Statistics Service
  • Chair: Mike Fleming
  • Date/time: Wednesday, March 7, 2007 / 12:30 - 1:30 p.m.
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: WSS Agriculture and Natural Resources Section

Abstract:

The USDA/National Agricultural Statistics Service (NASS) annually produces remote sensing based crop specific classifications and acreage estimates over the major growing regions of the United States using medium resolution satellite imagery. The classifications are published in the public domain as the Cropland Data Layer (CDL) after the publication of the official release of county estimates. This program has mapped 24 total states since 1997 and is currently mapping 11 states annually (AR, IA, IL, IN, LA, MO, MS, ND, NE, WA and WI). This program previously used Landsat TM and ETM+ satellite imagery, the NASS June Agricultural Survey (JAS) segments for the ground truth information, and NASS public domain Peditor software for producing the classification and regression estimates. The unpredictability of the aging Landsat program assets, the labor intensive nature of digitizing June Agricultural Survey input for the Cropland Data Layer program, and the potential efficiency gains using commercial software warranted the need to investigate new program methods.

In 2004, NASS investigated alternative sensors to the Landsat platform, annually acquiring ResourceSat-1 Advanced Wide Field Sensor (AWiFS) data over the active Cropland Data Layer states. Additionally, evaluations were carried out on alternative ground truth methodologies to the June Agricultural Survey, using data collected through the USDA/Farm Service Agency (FSA) Common Land Unit (CLU) program. Testing and comparisons with regression tree See5 software against Peditor began in 2006 to produce the Cropland Data Layer. The goal was to determine which application was more efficient and delivered the most accurate estimates.

Accuracy assessments and acreage indications determined that the AWiFS significantly reduced the statistical variance of acreage indications from using the June Agricultural Survey area sampling frame, delivering a potential successor to the Landsat platform. In 2006, pilot testing was complete and the AWiFS sensor was selected as the exclusive source of imagery for the production of the Cropland Data Layer and acreage estimates. The Farm Service Agency Common Land Unit program provides a comprehensive national digitized and attributed GIS dataset collected annually for inclusion into programs like the Cropland Data Layer. Commercial image processing programs such as See5 were tested in 2006 against the AWiFS imagery and Common Land Unit datasets, providing evidence of efficiency gains in statistical accuracy, scope of coverage, and time of delivery.

Title: Measurement and Statistical Analysis of Human Rights: A Model

  • Speaker:
    Brian J. Grim, Ph.D.
    Senior Research Fellow, Religion and World Affairs
    Pew Forum on Religion & Public Life
    1615 L Street, NW, Suite 700
    Washington, DC 20036
    bgrim@pewforum.org
  • Date/Time: Thursday, March 8, 2007 / 12:30 to 2:00 p.m.
  • Location: Bureau of Labor Statistics, Conference Center in Room 9. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsors: WSS Human Rights Section, AAAS, and DC-AAPOR

Abstract:

The study of human rights violations and the development of statistical models that can offer explanations are severely handicapped by a lack of adequate data. Most information on human rights is embedded in qualitative reports. Quantitative data that do exist tend to be limited to rough counts of violations or numeric indexes with little if any methodological transparency. This presentation will describe an extensive and rigorous coding project which uses the annual U.S. State Department's International Religious Freedom Reports as the primary information source and the procedures developed to check the coded data against alternative sources. The usefulness of these coded data will be demonstrated by testing an explanatory theory of religious persecution using structural equation modeling. The presentation will conclude with a discussion of how this research could be extended to the measurement and statistical analysis of other human rights.

Title: Use of a Visual Programming Environment for Creating and Optimizing Mass Spectrometry Diagnostic Workflows

  • Speaker:
    Maciek Sasinowski, PhD
    CEO and Founder of INCOGEN, INC.
    Williamsburg, VA
  • Date/Time: Friday, March 16, 2007 / 10:00 - 11:00 a.m.
    (refreshments will be served at 9:45)
  • Location: Georgetown University
    Lombardi Comprehensive Cancer Center
    3800 Reservoir Road, NW
    New Research Building, E501 Conference Room
    Washington, DC 20057
  • Phone: 202-687-4114
  • Sponsor: Georgetown University

Abstract:

The use of mass spectrometry for clinical applications has extraordinary potential for accurate, early, and minimally invasive diagnoses of complex diseases, such as cancer, which require sensitive diagnostic tools for prognosis and development of flexible treatment strategies. Unfortunately, current mass spectrometry data analysis options available to researchers often require improvised combinations of tools provided by instrument manufacturers, third-parties, and in-house development. The lack of unified interfaces to access existing resources presents a significant bottleneck in the research and discovery process.

In this seminar, we present a modular software tool for the analysis of mass spectrometry profiling data that aims to address this bottleneck. The modules that comprise the analysis workflows can be broadly classified into three categories: signal processing tools, variable selection algorithms, and classification utilities. The software tool provides a platform that allows researchers to construct, validate, and optimize classification workflows of serum samples analyzed with time-of-flight mass spectrometry. Our work suggests that this type of flexible and interactive architecture is highly useful for 1) the development of mass spectrometry workflows and 2) biomarker discovery and validation in clinical environments.

Title: Bayesian Diagnostics for Detecting Hierarchical Structure

  • Speaker: Guofen Yan, University of Virginia
  • Chair: Donald Malec, U.S. Bureau of the Census
  • Date/Time: Friday, March 23, 2007 / 12:30 to 2:00 p.m.
  • Location: Bureau of Labor Statistics, Conference Center in G440. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Methodology Section, WSS

Abstract:

Motivated by an increasing number of Bayesian hierarchical model applications, we investigate several diagnostic techniques when the fitted model includes some hierarchical structure, but the data are from a model with additional, unknown hierarchical structure. We start by studying the simple situation where the data come from a normal model with two-stage hierarchical structure while the fitted model does not have any hierarchical structure, and then extend this to the case where the fitted model has two-stage normal hierarchical structure while the data come from a model with three-stage normal structure. Our investigation suggests two promising techniques: distribution of individual posterior predictive p values and the conventional posterior predictive p value with the F statistic as a checking function. Finally, we apply these two techniques to examine the fit of a model for data from the Patterns of Care Study, a two-stage cluster sample of cancer patients undergoing radiation therapy.

Title: President's Invited Panel Discussion on Finite Population Correction Factors

Abstract:

It is common practice to use finite population correction factors (fpc) in estimating variances when sampling from a finite population. Various approximate fpcs are sometimes used with more complex designs. When the interest is in a wider population than the specific finite sampling frame, many argue that it suffices to drop the fpc from the variance estimates, but others maintain this is appropriate only in a limited number of contexts.
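
For the simplest case, simple random sampling without replacement of n units from N, the fpc is the factor (1 - n/N) applied to the usual variance estimate s^2/n. The sketch below computes the variance of a sample mean with and without the factor; the data and the population size are hypothetical, and the approximate fpcs used with complex designs are not covered.

```python
def variance_of_mean(sample, population_size=None):
    """Estimated variance of the sample mean under simple random sampling.

    With population_size given, applies the finite population correction (1 - n/N).
    """
    n = len(sample)
    mean = sum(sample) / n
    s2 = sum((y - mean) ** 2 for y in sample) / (n - 1)
    fpc = 1.0 if population_size is None else 1.0 - n / population_size
    return fpc * s2 / n

sample = [12.0, 15.0, 9.0, 14.0, 10.0]  # hypothetical survey values
with_fpc = variance_of_mean(sample, population_size=50)
without_fpc = variance_of_mean(sample)
```

Dropping the fpc always gives the larger variance estimate, and the two agree as the sampling fraction n/N goes to zero.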

Title: The Role of Context in the Recall of Minimally Counterintuitive Concepts

  • Speaker: Lauren O. Gonce, Bowling Green State University, Bowling Green, Ohio
  • Date/Time: March 27, 2007, 10:45 - 11:45 a.m.
  • Location: U.S. Census Bureau, 4600 Silver Hill Road, Seminar Room 5K410, Suitland, Maryland. Please call (301) 763-4974 to be placed on the visitors' list. A photo ID is required for security purposes. All visitors to the Census Bureau are required to have an escort to Seminar Room 5K410. An escort will leave from the Guard's Desk at the Metro Entrance (Gate 7) with visitors who have assembled at 10:30 a.m. and again at 10:40 a.m. Parking is available at the Suitland Metro.
  • Sponsor: U.S. Bureau Of Census, Statistical Research Division

Abstract:

Counterintuitive concepts have been identified as major aspects of religious belief, and have been used to explain the retention and transmission of such beliefs. To resolve inconsistencies within this literature, three experiments were conducted to study the effect of context on recall. Context was found to be the key element affecting recall and the discrepancy among prior studies was resolved. The results imply that the nature of the surrounding context must be included in any account of the formation and transmission of religious concepts. A recent extension of this work involving type of context (science or religion) will also be introduced.

This seminar is physically accessible to persons with disabilities. For TTY callers, please use the Federal Relay Service at 1-800-877-8339. This is a free and confidential service. To obtain Sign Language Interpreting services/CART (captioning real time) or auxiliary aids, please send your requests via e-mail to EEO Interpreting & CART: eeo.interpreting.&.CART@census.gov or TTY 301-457-2540, or by voice mail at 301-763-2853, then select #2 for EEO Program Assistance.


Title: Applications of the Johnson SB Distribution to Environmental Data

  • Speaker: David T. Mage (Retired), Institute for Survey Research, Temple University
  • Chair: Mel Kollander
  • Date/time: Wednesday, March 28, 2007 / 12:30 to 1:30 p.m.
  • Location: Bureau of Labor Statistics Conference Room 1. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: WSS Agriculture and Natural Resources Section

Abstract:

In analyzing environmental data, it is common practice to assume that such data are from a 2-parameter lognormal distribution if right-skewed and from a normal distribution if symmetrical. It is not generally recognized that the Johnson SB Distribution provides a continuum of distributions between the normal and lognormal distributions, which constitute the SB asymptotes. The Johnson SB transforms experimental data bounded by a minimum value (Xmin) and a maximum value (Xmax) into a normally distributed variable Y = ln [(x - Xmin) / (Xmax - x)], which ranges over -infinity < Y < +infinity. As Xmax goes to +infinity and Xmin goes to 0, the distribution is asymptotically 2-parameter lognormal. As Xmax goes to +infinity and Xmin goes to -infinity, the distribution is asymptotically normal.
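
The transformation above can be sketched in a few lines of Python. This is only an illustration of the formula Y = ln [(x - Xmin) / (Xmax - x)] and its inverse; the function names are hypothetical, and the location and scale parameters (mu, sigma) of the resulting normal variable are not handled here.

```python
import math


def johnson_sb_transform(x, x_min, x_max):
    """Map a value in the open interval (x_min, x_max) to the real line."""
    if not x_min < x < x_max:
        raise ValueError("x must lie strictly between x_min and x_max")
    return math.log((x - x_min) / (x_max - x))


def johnson_sb_inverse(y, x_min, x_max):
    """Map a real value y back to the bounded scale (x_min, x_max)."""
    return x_min + (x_max - x_min) / (1.0 + math.exp(-y))


# The midpoint of the interval maps to zero.
y_mid = johnson_sb_transform(5.0, 0.0, 10.0)

# Round trip for an arbitrary interior point.
x_back = johnson_sb_inverse(johnson_sb_transform(2.5, 0.0, 10.0), 0.0, 10.0)

# Lognormal limit: with x_min = 0 and a very large x_max,
# Y + ln(x_max) is close to ln(x).
big = 1e9
gap = johnson_sb_transform(3.0, 0.0, big) + math.log(big) - math.log(3.0)
```

The last lines illustrate the lognormal asymptote numerically: once the upper bound is far from the data, the transform differs from the ordinary log only by a constant.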

Methods of objectively determining 4 optimal parameters for the SB distribution (Xmin, Xmax, mu, sigma) by maximum likelihood estimation procedures are reviewed. Bruce Hill (1963) showed that the maximum likelihood solution for the three-parameter lognormal yields degenerate and absurd solutions as Xmin goes in the limit to the minimum observation; the likelihood of the minimum observation tends to infinity, as the likelihood of all other observations tends to zero. Although somewhat surprising, Hill's result conforms with known general problems with likelihood methods when the support points of the probability distribution are a function of the parameters of the distribution, in this case the parameters Xmin and Xmax. Several modifications of the maximum likelihood methods are proposed. It is also shown that for the standard likelihood function a local maximum occurs within the natural parameter space.

Different methods of resolving this problem are discussed along with other methods of obtaining the SB parameters, by fitting to 4 percentiles, by method of moments, and by a graphical technique that plots the data and minimizes the Kolmogorov-Smirnov statistic.


Title: A Test of Association of a Two-Way Categorical Table for Correlated Counts

  • Speaker: Jai Choi, National Center for Health Statistics
  • Date/Time: March 29, 2007, 10:30 - 11:30 a.m.
  • Location: U.S. Census Bureau, 4600 Silver Hill Road, Seminar Room 5K410, Suitland, Maryland. Please call (301) 763-4974 to be placed on the visitors' list. A photo ID is required for security purposes. All visitors to the Census Bureau are required to have an escort to Seminar Room 5K410. An escort will leave from the Guard's Desk at the Metro Entrance (Gate 7) with visitors who have assembled at 10:15 a.m. and again at 10:25 a.m. Parking is available at the Suitland Metro.
  • Sponsor: U.S. Bureau Of Census, Statistical Research Division

Abstract:

When the counts in a two-way categorical table are formed from the correlated members of a cluster, the common chi-squared test no longer applies. There are several approximate adjustments to the common chi-squared test. For example, Choi and McHugh (1989, Biometrics, 45) showed how to adjust the chi-squared statistic for clustered and weighted data. However, our main contribution is the construction and analysis of a Bayesian model that removes the need for analytical approximation, especially when the expected cell count is zero or small. This is an extension of a standard multinomial Dirichlet model to include the intra-class correlation associated with the individuals within a cluster. We have used the formula described by Altham (1976, Biometrika, 63) to incorporate the intra-class correlation. This intra-cluster correlation varies with the size of the cluster, but we assume that it is the same for all clusters of the same size for the same variable. We use MCMC to fit our model, and to make posterior inference about the intra-class correlation and the cell probabilities. Also, using Monte Carlo integration with a binomial importance function, we obtain the Bayes factor for a test of no association. To demonstrate the performance of the alternative test and estimation procedure, we have used data on activity limitation status and age from the National Health Interview Survey and a simulation study.

This seminar is physically accessible to persons with disabilities. For TTY callers, please use the Federal Relay Service at 1-800-877-8339. This is a free and confidential service. To obtain Sign Language Interpreting services/CART (captioning real time) or auxiliary aids, please send your requests via e-mail to EEO Interpreting & CART: eeo.interpreting.&.CART@census.gov or TTY 301-457-2540, or by voice mail at 301-763-2853, then select #2 for EEO Program Assistance.


Title: Introduction to Data Mining Methodology for Statisticians

  • Chair: Meena Khare, NCHS
  • Speaker: Dr. Dan Steinberg, Salford Systems
  • Date/Time: Tuesday, April 10, 2007 / 12:30 p.m. to 2:00 p.m.
  • Location: Bureau of Labor Statistics, Conference Center, room 10. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Methodology Program, WSS
  • Presentation slides: Download the pdf (~2.9MB)

Abstract:

This presentation will introduce data mining methodology and address some of the common questions from statisticians about data mining. Sample questions include what is common to data mining and statistical analysis, what is the role of the statistician in the analysis and interpretation of results from data mining, how results are validated, how data mining came to be, which datasets are appropriate for data mining, and why computer scientists have led so much of the data mining development. Data mining is considered to be the application of modern, highly automated nonparametric analytical methods to recognize enduring patterns in data. Several of the major tools of data mining will be discussed, including decision trees (CART), artificial neural networks, multivariate adaptive regression splines (MARS), rule induction, RandomForests, multiple additive regression trees (TreeNet/MART Stochastic Gradient Boosting) and several others. Finally, case study examples will be provided for a variety of data mining methods.


Title: Wait! Should We Use the Survey Weights to Weight?

  • Speaker: Roderick J. Little, Richard D. Remington Collegiate Professor and Chair of the Department of Biostatistics at the University of Michigan; Professor of Statistics; and Research Professor, Institute for Social Research
  • Discussants:
    John Eltinge, Bureau of Labor Statistics
    Richard Valliant, Professor, JPSM
  • Time and Date: Friday, April 13, 2007, 3:30pm. There will be a reception immediately afterwards.
  • Location: 2205 Lefrak Hall, University of Maryland, College Park, MD 20742
  • Sponsor: University of Maryland, Statistics Program

Abstract:

The lecture will discuss the use of weights in survey inference. A fundamental idea in survey sampling is to weight cases by the inverse of their probabilities of inclusion, when deriving survey inferences. The weight indicates the number of population units the included case represents, and thus can be seen as a fundamental feature of the design-based survey inference. Modelers, on the other hand, seem more ambivalent about weighting, and argue that (at least in some settings) weighting is unnecessary. Dr. Little will discuss various perspectives and myths about survey weights. He will argue that, from a robust Bayesian perspective, weights are a key feature of the data that cannot be ignored, but weighting may not be the best way to use them.
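
The basic design-based idea in the abstract, weighting each case by the inverse of its inclusion probability, can be sketched as follows. This is a generic illustration with made-up numbers, not the speaker's method; the function names are hypothetical, and the weighted mean shown is the usual ratio-type (Hajek) form.

```python
def design_weights(inclusion_probs):
    """Inverse-probability weights: each case 'represents' 1/p population units."""
    return [1.0 / p for p in inclusion_probs]


def weighted_total(values, inclusion_probs):
    """Horvitz-Thompson style estimate of the population total."""
    weights = design_weights(inclusion_probs)
    return sum(w * y for w, y in zip(weights, values))


def weighted_mean(values, inclusion_probs):
    """Ratio-type weighted mean: weighted total divided by the sum of weights."""
    weights = design_weights(inclusion_probs)
    return weighted_total(values, inclusion_probs) / sum(weights)


# Hypothetical sample of three cases with unequal inclusion probabilities.
y = [10.0, 20.0, 30.0]
probs = [0.5, 0.25, 0.25]

total_estimate = weighted_total(y, probs)
mean_estimate = weighted_mean(y, probs)
unweighted_mean = sum(y) / len(y)
```

Here the first case was twice as likely to be sampled as the others, so it counts for half as much; the weighted mean (22) differs from the unweighted one (20), which is the kind of gap the design-based and modeling perspectives disagree about how to handle.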


2006 ROGER HERRIOT AWARD

Title: Bridging: Roger Herriot's Time to the Present

  • Speaker: Nathaniel Schenker, National Center for Health Statistics
  • Chair: Dwight Brock, Westat
  • Presentation of the award by Daniel Weinberg, Census Bureau, Chair of the Herriot Award Committee
  • Date/Time: Monday, April 16, 2007 / 12:30 p.m. to 2:00 p.m.
    Note: there will be a reception immediately following the session.
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Roger Herriot Award Committee (with representatives from the ASA Social Statistics and Government Statistics Sections and WSS)

Abstract:

In the 1980s, the Census Bureau and outside collaborators bridged the transition from the industry and occupation coding system for the 1970 census to that for the 1980 census by creating multiple imputations of 1980-system codes for 1970 census public-use samples. The imputation models were fitted using a relatively small "double-coded" (both 1970 and 1980 systems) sample from the 1970 census. This project had roots in the Population Division at the Bureau. Roger Herriot, as Chief of the Division, was very supportive of the project and contributed ideas to it, and the project was described in William Butz's 1995 ASA Proceedings article in memory of Herriot (http://www.amstat.org/sections/sgovt/outofbox.htm) as one of Herriot's major innovations at the Bureau. This talk will discuss the industry and occupation code project and statistical lessons learned from it. Two recent bridging projects at the National Center for Health Statistics, one addressing the transition from single-race to multiple-race reporting in Federal data collections and one adjusting for differences between self-reported and clinical data in surveys, will be discussed as well. The talk will highlight similarities and differences among the three bridging projects and will point out some outstanding methodological issues.


Title: American Community Survey Weighting and Estimation: ACS Family Equalization

  • Speakers: Alfredo Navarro and Mark E. Asiala, Bureau of the Census
  • Chair: Michael Cohen, National Academy of Sciences
  • Discussant: Graham Kalton, Westat
  • Date/Time: Monday, April 23, 2007/12:30 p.m. to 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Take the Red Line to Union Station.
  • Sponsor: WSS Social and Demographic Statistics Section

Abstract:

Historically, the American Community Survey (ACS) has produced inconsistent estimates of households and householders and inconsistent estimates of husbands and wives in married-couple households, even though logically these estimates should be equal. In the 2005 ACS, the size of these inconsistencies at the national level was approximately 3.7 million more householders than households and approximately 1.8 million more spouses than married-couple households. Likewise, for unmarried-partner households there were approximately 176,000 more unmarried partners than unmarried-partner households. These data inconsistencies were rooted in the current person weighting methodology, which was independent of the housing unit weighting and did not consider relationship to the householder. This paper describes the current weighting methodology and the changes introduced to reduce these data inconsistencies while having a minimal impact on other estimates and on the variances of the estimates. A three-dimensional raking methodology is used where the marginal control totals are derived from the survey itself rather than an independent source for the first two dimensions related to equalizing spouses and householders. Changes in the estimation of housing unit characteristics are also discussed. Empirical results from the implementation of this new methodology are presented based on the 2004 and 2005 ACS data.
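
Raking itself is iterative proportional fitting: the weighted cell totals are rescaled to each set of marginal control totals in turn until all margins agree. The sketch below is a generic two-dimensional version in plain Python with invented numbers; it is not the ACS procedure, which uses three dimensions and derives some control totals from the survey itself, and the function name is hypothetical.

```python
def rake_two_way(table, row_targets, col_targets, max_iter=200, tol=1e-10):
    """Iterative proportional fitting of a two-way table of weighted counts.

    Rows and columns are alternately scaled to their control totals until
    the row margins (checked after each column step) are within tol.
    """
    cells = [list(row) for row in table]
    for _ in range(max_iter):
        # Scale each row to its control total.
        for i, target in enumerate(row_targets):
            total = sum(cells[i])
            if total > 0:
                cells[i] = [c * target / total for c in cells[i]]
        # Scale each column to its control total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in cells)
            if total > 0:
                factor = target / total
                for row in cells:
                    row[j] *= factor
        # Columns now match exactly; stop once the rows match as well.
        row_gap = max(abs(sum(cells[i]) - t) for i, t in enumerate(row_targets))
        if row_gap < tol:
            break
    return cells


# Hypothetical weighted counts and control totals (both margins sum to 170).
weighted_counts = [[40.0, 30.0], [35.0, 50.0]]
raked = rake_two_way(weighted_counts, row_targets=[80.0, 90.0],
                     col_targets=[85.0, 85.0])
```

A third raking dimension would add one more rescaling step inside the loop, cycling through all three sets of control totals on each pass.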


Title: An Overview of the Semi-Competing Risk Problem

  • Speaker:
    Hongyu Jiang, PhD
    Assistant Professor of Biostatistics
    Harvard University School of Public Health
  • Date: Wednesday, May 2, 2007, 11:00 am
  • Location: Executive Plaza North, Conference Room G, 6130 Executive Boulevard, Rockville, Maryland
  • Map: http://www-dceg.ims.nci.nih.gov/images/localmap.gif

Abstract:

The semi-competing risks problem refers to a special bivariate time-to-event data structure, where one event is terminal and the other is non-terminal. Since the terminal event may censor the non-terminal event, we may only observe both events if the non-terminal event occurs earlier. This type of data frequently arises in studies of human health and behavior, as multiple event times from subjects are routinely studied. The association between the two times and their marginal distributions may be of interest. In a clinical trial setting or a heterogeneous study population, the covariate effect on either event time may be the focus. However, inference based on semi-competing risks data is often complicated by administrative censoring and by potentially dependent censoring of the non-terminal event by the terminal event if the two event times are associated. This talk will describe the unique features of semi-competing risks data by comparing them with bivariate right-censored time-to-event data and competing risks data, discuss identifiability issues, and review recent methodological advances for making inferences based on semi-competing risks data.

More Information about this and other talks sponsored by the Division of Cancer Prevention: http://www3.cancer.gov/prevention/pob/fellowship/colloquia.html


Title: Systems Pharmacology of Type 2 Diabetes: A Case Study for Pharmaceutical Development

  • Speaker:
    Terry Ryan, PhD
    Senior Director, Chemical Biology
    Wyeth Pharmaceuticals, Collegeville, PA
  • Date: May 4, 2007
  • Time: 10:00-11:00 AM (refreshments will be served at 9:45)
  • Location: Georgetown University
    Lombardi Comprehensive Cancer Center
    3800 Reservoir Road, NW
    LL Lombardi, Room 131
    Washington, DC 20057
  • Phone: 202-687-4114
  • Sponsor: Georgetown University

Abstract:

A key issue in drug discovery is the appropriate use of animal models to study human disease and therapeutic drug response. Animal models have, in general, been used to mimic human disease based upon relatively few points of analogy. The richness of open discovery "omics" platforms allows a comprehensive measurement of disease and drug response across a range of analyte classes, allowing investigators to better understand the predictive value of animal models for human disease. In a study conducted by GlaxoSmithKline, we compared disease effects and treatment response in two mouse models of type 2 diabetes and in a parallel human study. Three registered medicines for diabetes (rosiglitazone, metformin, and glyburide) were studied, and detailed measurements of transcripts, lipids, metabolites, and proteins were obtained in tissues and biofluids. Integrated data analysis using various multivariate techniques allowed for the generation of predictive fingerprints which shorten the time required to demonstrate treatment efficacy in diabetes trials, as well as allowing the identification of patients most likely to respond to a particular therapy from baseline measurements. In addition, analysis uncovered a previously unsuspected mechanism for rosiglitazone activity in diabetic adipose. The use of Systems Biology approaches with large "omic" datasets holds great promise for deeper understandings of disease biology and pharmacology.


Title: The STATCOM Network: A Role for Students in Pro Bono Statistical Consulting to the Community

  • Speakers: Cherie Ochsenfeld and Gayla Olbricht, Purdue University
  • Discussant: Shail Butani, BLS
  • Date/Time: Tuesday, May 8, 2007 / 12:30 to 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center, Room 10. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsors: WSS Human Rights, Quantitative Literacy, and General Methodology Programs
  • Who should attend: Students, faculty and professionals who are interested in statistical volunteer work in the community.

Abstract:

The Statistics in the Community (STATCOM) Network is a graduate student-run consulting service that provides free statistical consulting to local governmental and nonprofit community groups. A need for statistical expertise in the local community was identified by a graduate student at Purdue University who founded STATCOM in 2001. Students who participate in STATCOM work in teams on community projects, while applying classroom knowledge and gaining marketable skills.

STATCOM also has a P-12 Outreach component, which serves as an effort to increase interest and achievement in statistics among pre-college students by involvement in community events and classrooms. STATCOM, through a Strategic Initiatives Grant from the American Statistical Association, is currently developing a network across institutions of students devoting time to pro bono statistical consulting. This talk will cover the structure of the STATCOM Network at both the national and local levels. In addition, this talk will address how the STATCOM Network can help fill a niche in pro bono statistical efforts and be supported by professional statisticians.

This is joint work with Alexander E. Lipka, Amy E. Watkins and Nilupa S. Gunaratna.

Cherie A. Ochsenfeld

Cherie A. Ochsenfeld received a B.S. in Mathematics/Economics, an M.A. in Teacher Education, and a California Teaching Credential in Mathematics from the University of California, Los Angeles. She received an M.S. in Applied Statistics from California State University, Hayward, and an M.S. in Mathematical Statistics, and is currently a Ph.D. student in Statistics at Purdue University. Her research interests include statistical genetics, nonparametric statistics, and QTL analysis. Cherie is the current Director of STATCOM at Purdue University and has served within the organization for three years.

Gayla R. Olbricht

Gayla R. Olbricht received a B.S. in Mathematics from Missouri State University. She received an M.S. in Applied Statistics and is currently a Ph.D. student in Statistics at Purdue University. Her research interests include statistical genetics, hidden Markov models, and epigenomics. Gayla is the current Student Advisor of STATCOM at Purdue University and has served within the organization for four years.

Shail Butani

Ms. Butani is Chief of the Statistical Methods Staff in the Office of Employment and Unemployment Statistics, U.S. Bureau of Labor Statistics (BLS). She received both her B.A. and M.A. in mathematical statistics from George Washington University. Last year, she was one of the organizers of the ASA Special Interest Group for Volunteers.

From the early to mid-1990s, she led a very successful quantitative literacy (QL) effort for the Washington Statistical Society, particularly in Fairfax County, VA. Major activities were: 1) Conducted and organized speakers and materials for career days for over 100 math classes each year. 2) Participated in and provided consultants for QL workshops conducted by ASA for local teachers. 3) Provided statisticians to assist in developing math curricula for Fairfax County Public Schools. 4) Conducted and provided statisticians for elementary school teachers' workshops. 5) Presented materials at the Female Achieving Mathematics Equity (FAME) project. 6) Provided speakers for Girls Excelling in Math and Science (GEMS) programs. 7) Conducted and provided consultants for Girl Scouts workshops.


Title: Using the t-distribution to Deal with Outliers in Small Area Estimation

  • Speakers: William R. Bell and Elizabeth T. Huang, U.S. Census Bureau
  • Chair: Donald Malec, U.S. Census Bureau
  • Discussant: Alan M. Zaslavsky, Department of Health Care Policy, Harvard Medical School
  • Date/Time: Thursday, May 10, 2007 / 1:30 to 3:00 p.m.
  • Location: Bureau of Labor Statistics, Conference Center in G440. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: WSS Methodology Section

Abstract:

Small area estimation using linear area level models typically assumes normality of the area level random effects (model errors) and of the survey errors of the direct survey estimates. Outlying observations can be a concern, and can arise from outliers in either the model errors or the survey errors, two possibilities with very different implications. We consider both possibilities here and investigate empirically how use of a Bayesian approach with a t-distribution assumed for one of the error components can address potential outliers. The empirical examples use models for U.S. state poverty ratios from the U.S. Census Bureau's Small Area Income and Poverty Estimates program, extending the usual Gaussian models to assume a t-distribution for the model error or survey error. Results are examined to see how they are affected by varying the number of degrees of freedom (assumed known) of the t-distribution. We find that using a t-distribution with low degrees of freedom can diminish the effects of outliers, but in the examples discussed the results do not go as far as approaching outright rejection of observations.


Title: Confidence Interval Coverage in Model-Based Estimation

Abstract:

When there is a strongly related auxiliary variable, model-based estimation can yield more precise estimates from smaller samples. Assumptions are made to build the models, produce estimates, and calculate confidence intervals. The first talk explores confidence interval coverage with deep stratification under scenarios when the assumptions are not quite correct, such as failing to assume correct scedasticity, recognize curvature, or incorporate an intercept. In these settings confidence interval coverage can be poor, robust, or ultra conservative. The second talk explores confidence interval coverage and Satterthwaite's approximation to the degrees of freedom when two or more model based estimates are summed in complex sample designs.
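
For readers unfamiliar with it, Satterthwaite's approximation assigns an effective number of degrees of freedom to a sum of variance estimates: (sum of v_i)^2 divided by the sum of v_i^2 / df_i. The Python sketch below shows only that standard formula for independent components, with invented inputs; how the talk applies it to model-based estimates in complex sample designs is not specified in the abstract.

```python
def satterthwaite_df(variances, dfs):
    """Effective degrees of freedom for a sum of independent variance estimates.

    variances: estimated variances v_i of the components being summed
    dfs:       degrees of freedom df_i attached to each v_i
    """
    if len(variances) != len(dfs) or not variances:
        raise ValueError("variances and dfs must be non-empty and equal in length")
    numerator = sum(variances) ** 2
    denominator = sum(v * v / d for v, d in zip(variances, dfs))
    return numerator / denominator


# Two components with equal variances: the degrees of freedom add up.
df_equal = satterthwaite_df([4.0, 4.0], [10, 10])

# Unequal variances: the result falls between the smallest df and the sum.
df_unequal = satterthwaite_df([1.0, 9.0], [10, 10])
```

When one component dominates the total variance, the effective degrees of freedom drop toward that component's own value, which widens the resulting confidence interval.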


Title: The Role of Statistics and Statisticians in Human Rights

  • Speakers:
    David Banks, Duke University
    Gary Shapiro, Westat
    Paul Zador, Westat
  • Chair: Ariela Blätter, Director, Crisis Preparedness and Response Center, Amnesty International
  • Discussant: Erik Voeten, Department of Political Science, Elliott School of International Affairs, The George Washington University
  • Date/Time: Thursday, May 17, 2007 / 12:30 to 2 p.m.
  • Location:
    AAAS Headquarters Auditorium
    1200 New York Ave NW
    Washington DC 20005
    Note the location change. To RSVP, please submit your name, professional affiliation, and the title of the seminar to shrp@aaas.org (even if your name is already on the BLS visitor list).
  • Sponsors:
    Human Rights Section, Washington Statistical Society (WSS)
    Science and Human Rights Program of the American Association for the Advancement of Science (SHR-AAAS)
    Washington-Baltimore Chapter of the American Association for Public Opinion Research (DC-AAPOR)
    Capitol Area Social Psychological Association (CASPA)
    District of Columbia Sociological Society (DCSS)
    District of Columbia Psychological Association (DCPA)
  • Who Should Attend: Human rights activists, sociologists, psychologists, political scientists, survey researchers, researchers of unsettled populations, and statisticians.

Abstract:

This seminar, designed with human rights practitioners in mind, outlines some examples of situations in which statisticians were asked to contribute to human rights projects. Our hope is to allow networking between the statistical community and the human rights community so that the unique contributions that statisticians can make towards human rights advocacy will be utilized in the future.

David Banks - A Katrina Experience

In 2005 the NSF sponsored a number of research projects on the aftermath of Katrina. This talk describes a survey led by Duke, UNC-Charlotte, and Tulane to study the factors that affected whether or not New Orleans residents chose to evacuate in advance of the storm, and what factors affected their post-Katrina experience. As part of this effort we found that some aspects of classic survey methodology do not work well with unsettled populations, and we developed workarounds that often were surprisingly successful.

Gary Shapiro - Guatemala Police Records

Several warehouses were discovered in Guatemala that contain millions of documents belonging to the National Police prior to 1996. The documents are of interest because some provide information on instances of police violence. The Human Rights Data Analysis Group at Benetech was asked to provide technical assistance for understanding and analyzing the archives. In turn, a group of ASA members provided assistance to Benetech on how sampling of these documents could be done. This talk discusses the complex structure of the archives, the sampling that is now being done, and the type of assistance provided to Benetech.

Paul Zador - Darfur: What Could Have Been

Several estimates of deaths during the Darfur crisis will be summarized. The methods used to derive them, and their reliability, will be reviewed and critiqued based in part on comments recently published in GAO's report on the Darfur crisis. The question will be raised: How do we determine the practical difference that having precise disaster estimates of deaths, hunger, injuries, etc. might make? A volunteer group designed a survey of refugee camps in Chad, but the survey was never conducted. We will describe the survey's design, and discuss why it never happened.


Title: Characterization, Modeling and Management of Inferential Risk, Data Quality Risk and Operational Risk in Survey Procedures

  • Chair: Dr. Nathaniel Schenker, NCHS
  • Speaker: Dr. John Eltinge, Bureau of Labor Statistics
  • Discussant: Dr. Fritz Scheuren, NORC, University of Chicago
  • Date/Time: Tuesday, May 22, 2007/12:30 p.m. to 2:00 p.m.
  • Location: Bureau of Labor Statistics, Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Methodology Program, WSS

Abstract:

This paper explores some conceptual and methodological issues that are important in the design, operation and analysis of large-scale government surveys. We view the design of survey procedures (including initial planning, sample design, data collection, inference and dissemination) as a mixture of optimization and risk management efforts in the presence of constraints and incomplete information. This in turn suggests several potentially rich areas for research in mathematical and applied statistics.

Five topics receive principal attention, beginning with some relatively well-defined technical issues and then expanding to several broader topics related to data quality and risk management. First, a review of the goals, constraints and risk profiles of survey practice suggests a spectrum of potential approaches to survey work, ranging from rigorously predetermined survey procedures at one extreme to highly exploratory analyses of previously collected data at the other extreme. Classical randomization-based procedures are arguably compatible with a mandate for predetermined methodology. Nonetheless, these procedures have limitations arising from efficiency issues, the presence of nonsampling error, and prospective inferential interest beyond the finite population that was sampled. These limitations lead to review of a second class of approaches to the analysis of survey data, based on models for survey variables, auxiliary variables and nonsampling error processes. Third, we use the framework of risk management to explore six dimensions of survey data quality suggested in Brackstone (1999): accuracy (incorporating all of the components of error considered in standard models for total survey error), timeliness, relevance, interpretability, accessibility and coherence. Fourth, we expand our discussion of risk management by considering operational risk, i.e., the risk that one or more steps in a survey procedure may not be carried out as specified. Finally, we note that work with large-scale surveys will involve a mixture of statistical science and statistical technology, and we suggest that the literature on adoption and diffusion of technology can offer important insights into the distribution of expectations, utility functions and behaviors of large survey organizations, data analysts and other data users.


Special WSS Session: Book Signing and Wine Tasting

  • Date/Time: June 12, 2007, at the usual time, 12:30 pm to 2:00 pm
  • Unusual Location: Reiter's Book Store, SW corner of 20th and K Streets
  • Speakers: Tom Herzog, Fritz Scheuren, and Bill Winkler announce their just-released Springer book Data Quality and Record Linkage Techniques

They invite all of you to celebrate with them. All the authors are longtime WSS members and will each say a few words about the book, even signing copies if requested.

Reiter's Book Store, a Washington landmark for over 60 years, is hosting this special event. There will be wine and cheese as long as it lasts.

Easy to get to, Reiter's is on 20th Street at the southwest corner of 20th and K, just two blocks along K Street from the Farragut West Metro or four blocks down 20th from the Metro at Dupont Circle.


Title: The Role of Fringe Benefits in Employer and Workforce Dynamics

  • Speaker: Anja Decressin, Employee Benefits Security Administration, Department of Labor
  • Discussant: Keenan Dworak-Fisher, Bureau of Labor Statistics
  • Chair: Linda Atkinson, Economic Research Service, USDA
  • Date/time: Wednesday, June 13, 2007 / 12:30 to 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Take the Red Line to Union Station.
  • Sponsor: WSS Economics Section

Abstract:

This paper examines how the evolution of a firm's human capital stock is related to firms' benefit choices using integrated data on firms, their employees, and their benefit offerings from the Census Bureau's Longitudinal Employer-Household Dynamics Program and from IRS Form 5500. It then estimates the relationship between compensation packages and firm productivity and survival, controlling for workforce characteristics. The authors find that firms that offer benefits have significantly lower turnover rates and faster growth rates. Benefit-offering firms have higher labor productivity and higher survival rates, even when controlling for firm and workforce characteristics and the level of wage compensation. Greater labor productivity explains some but not all of the differences in survival rates.


Title: Spatial Association Between Speciated Fine Particles and Mortality

  • Chair: Myron J. Katzoff, PhD, Centers for Disease Control and Prevention (CDC), National Center for Health Statistics, Office of Research and Methodology
  • Speaker: Sujit Ghosh, PhD, Statistics Department, North Carolina State University
  • Date/time: Thursday, June 21, 2007 / 2:00 to 3:30 p.m.
  • Location: CDC, National Center for Health Statistics (NCHS), Metro 4 Building, Conference Room 1406. To be placed on the seminar attendance list at NCHS you need to e-mail your name and affiliation to Frances D. Chichester-Wood at FChichester-Wood@cdc.gov by noon at least 2 days in advance of the seminar or call 301-458-4606 and leave a message. Bring a photo ID to the seminar. If you are not a U.S. citizen, you must contact Ms. Chichester-Wood at least two (2) weeks prior to the seminar. Directions: The NCHS building is located at 3311 Toledo Road, Hyattsville, Maryland 20782. See the NCHS Web site for maps of the area, directions for getting to NCHS (including Metro information), and parking information: http://www.cdc.gov/nchs/about/hyatdir.htm
  • Sponsor: Office of Research and Methodology, NCHS/CDC and WSS Methodology Program

Abstract:

Particulate matter (PM) has been linked to a range of serious cardiovascular and respiratory health problems, including premature mortality. The main objective of our research is to quantify uncertainties about the impacts of fine PM exposure on mortality. A multivariate spatial regression model is developed for the estimation of the risk of mortality associated with fine PM and its components across all counties of the coterminous United States. Different sources of uncertainty in the data and model are explored using the spatial structure of the mortality data and the speciated fine PM. A flexible Bayesian hierarchical model is proposed for a space-time series of counts (mortality) by constructing a likelihood-based version of a generalized Poisson regression model that combines methods for point-level misaligned data and change of support regression. Our results seem to suggest an increase by a factor of two in the risk of mortality due to fine particles with respect to coarse particles. Our study also shows that in the Western United States, the nitrate and crustal components of the speciated fine PM seem to have more impact on mortality than the other components. On the other hand, in the Eastern United States, sulfate and ammonium explain most of the fine PM effect.

Title: Robust Prediction of Small Area Means and Distributions

  • Speaker: Prof. Ray Chambers, University of Wollongong
  • When: 12:30 - 2:00 p.m., Wednesday, June 27
  • Where:
    Room 2, Bureau of Labor Statistics (BLS) Conference Center (G440)
    2 Massachusetts Ave NE
    Washington, DC 20212
  • Sponsor: Bureau of Labor Statistics

Abstract:

Small area estimation techniques typically rely on mixed models containing random area effects to characterise between area variability. In contrast, Chambers and Tzavidis (2006) describe an approach to small area estimation based on regression M-quantiles. This approach avoids conventional Gaussian assumptions and problems associated with specification of random effects, allowing between area differences to be characterized by the variation of area-specific M-quantile coefficients. However, the resulting M-quantile predictors of small area means can be biased. In this talk I will describe a general framework for robust bias adjusted small area prediction that corrects this problem, and is based on representing a small area predictor as a functional of the Chambers and Dunstan (1986) predictor of the within area distribution function of the target variable. An important advantage of this framework is that it allows integrated prediction of small area means and quantiles. I will demonstrate the usefulness of this framework through both model-based as well as design-based simulation, with the latter based on two realistic survey data sets containing small area information. The talk also includes an application of the bias adjusted M-quantile approach to predicting key percentiles of district level distributions of per-capita household consumption expenditure in Albania in 2002.

Title: Estimation under Ignorable Response Mechanism and Unweighted Imputation

  • Chair: John Eltinge, Bureau of Labor Statistics
  • Speaker: Santanu Pramanik, the Joint Program in Survey Methodology, University of Maryland
  • Discussant: Yves Thibaudeau, U.S. Census Bureau
  • Date/Time: Wednesday, July 11, 2007 / 12:30 to 2:00 p.m.
  • Location: Bureau of Labor Statistics, Conference Center in G440. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Methodology Section, WSS

Abstract:

In many surveys, unweighted imputation methods are employed because survey weights are unavailable at the time missing survey data are imputed. In such situations, it is well known that certain customary design-based estimators with imputed data generally are biased even under the usual uniform response mechanism assumption. In this paper, we present the expression of the bias of a design-based estimator under a more realistic ignorable response mechanism and then use this expression to propose a bias-corrected estimator. The second part of the paper deals with a variance estimator that captures different sources of uncertainty. Both theory and results from a Monte Carlo simulation study are presented to justify our approach.

Keywords: ratio imputation, bias-adjusted estimator, variance estimation, small area estimation

Title: Assessment of Coverage and Utility of Residential Address Lists

  • Chair: Meena Khare, NCHS
  • Speakers: Sylvia Dohrmann (Westat) and Stephanie Eckman (NORC)
  • Date/Time: Wednesday, July 18, 2007 / 12:30 p.m. to 2:00 p.m.
  • Location: Bureau of Labor Statistics, Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Methodology Program, WSS

Abstract:

Coverage and Utility of Purchased Residential Address Lists: A Detailed Review of Selected Local Areas. Sylvia Dohrmann

Recently there has been much interest in using address lists originating from the United States Postal Service (USPS) as area sampling frames in place of on-site enumerations of dwelling units. While it has become clear that purchased USPS lists are less costly than the process of on-site enumeration, it is still unclear whether these lists are adequate substitutes for it. In this presentation, we compare the coverage of purchased lists for a selection of PSUs (Primary Sampling Units), differing in size and composition, with area sample frames created using on-site enumeration. We will examine the coverage of the USPS lists by comparing them to enumerated lists and review what types of areas are more completely covered by the USPS lists. We will also demonstrate how the extent to which the addresses on the purchased lists can be geocoded relates to their usefulness as the basis for area sampling frames.

Suitability of the USPS Delivery Sequence File as a Commercial-Building Frame. Stephanie Eckman, Michael Colicchia, Colm O'Muircheartaigh, NORC.

The USPS Delivery Sequence File (DSF) has proven to be an accurate and low-cost frame for household surveys. However, no research organization has evaluated the use of the DSF as a frame of non-residential buildings. Given the success that we and other organizations have had using the DSF as a household frame, we are optimistic that the database will provide good coverage of non-residential buildings as well. But we must assess its accuracy and coverage. We have conducted such an assessment in eleven segments across the county. For each segment, we have both a recent field listing of commercial buildings as well as the DSF database of non-residential delivery points. We will compare the two frames, presenting match rates and maps showing the discrepancies between the frames.

Title: Imputation Using Empirical Likelihood

  • Chair: Clyde Tucker, Bureau of Labor Statistics
  • Speaker: Jun Shao, ASA/NSF/Census Bureau Research Fellow, Department of Statistics, University of Wisconsin-Madison
  • Date/Time: Tuesday, July 24, 2007 / 12:30 to 2:00 p.m.
  • Location: Bureau of Labor Statistics, Conference Center in G440. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Methodology Section, WSS
  • Sponsor: Bureau of Labor Statistics

Abstract:

Imputation is one of the most popular methods for dealing with nonrespondents in survey problems. In this presentation I focus on the use of the empirical likelihood method in imputation, which leads to more efficient and/or robust imputation than other methods such as parametric regression imputation, nonparametric kernel imputation, and random hot deck imputation. More specifically, (1) an empirical likelihood imputation method using information provided by covariates and the propensity function is introduced to produce efficient and doubly robust estimators of population means; (2) an empirical likelihood method is introduced for creating imputation cells in hot deck random imputation where imputation cells are constructed using a categorical covariate; (3) an empirical likelihood method is studied in the case of non-ignorable nonrespondents with either categorical or continuous covariates. Simulation results are presented to show the efficiency and robustness properties of the proposed methods.

The work of Jun Shao was generously supported by grant DMS-0404535 from the National Science Foundation: Methodology, Measurement, and Statistics Program in the Division of Social and Economic Sciences.

Title: A Geostatistical Approach to Linking Geographically-Aggregated Data/A System for Detecting Arbitrarily Shaped Hotspots

Abstracts:

1. A Geostatistical Approach to Linking Geographically-Aggregated Data From Different Sources
Carol A. Gotway Crawford, Office of Workforce and Career Development, CDC; and Linda J. Young, Department of Statistics, University of Florida, Gainesville, FL USA

The widespread availability of digital spatial data and the capabilities of Geographic Information Systems (GIS) make it possible to easily synthesize spatial data from a variety of sources. More often than not, data have been collected at different geographic scales, and each of the scales may be different from the one of interest. Geographic information systems effortlessly handle these types of problems through raster and geoprocessing operations based on proportional allocation and centroid smoothing techniques. However, these techniques do not provide a measure of uncertainty in the estimates and lack the ability to incorporate important covariate information that may be used to improve the estimates. They also often ignore the different spatial supports (e.g., shape and orientation) of the data. On the other hand, statistical solutions to change of support problems are rather specific and difficult to implement. In this presentation, we present a general geostatistical framework for linking geographic data from different sources. This framework incorporates aggregation and disaggregation of spatial data, as well as prediction problems involving overlapping geographic units. It explicitly incorporates the supports of the data, can adjust for covariate values measured on different spatial units at different scales, provides a measure of uncertainty for the resulting predictions, and is computationally feasible within a GIS. The new framework we develop also includes a new approach for simultaneous estimation of mean and covariance functions from aggregated data using generalized estimating equations.

2. Upper Level Set Scan Statistic System for Detecting Arbitrarily Shaped Hotspots
Reza Modarres, Professor and Chair, Dept of Statistics at GWU

The Upper Level Set Scan Statistic (ULS), its theory, design and implementation, and its extension to bivariate data are discussed. We provide the ULS-Hotspot algorithm that maintains a list of connected components of the rate surface at each level of the ULS tree. The tree is grown in the immediate successor list, which provides a computationally efficient method for likelihood evaluation, visualization and storage. An example shows how the zones are formed and the likelihood function is developed for each candidate zone. The general theory of bivariate hotspot detection is discussed, including the bivariate binomial and Poisson models and the multivariate exceedance approach. We propose the joint and intersection methods for detecting bivariate hotspots and study the sensitivity of the joint hotspots to the degree of association between the variables. We investigate the hotspots in two diverse applications, one in Microbial Risk Assessment and the other in Mapping of Crime hotspots.

Title: Modeling Multiple-Response Categorical Data From Complex Surveys

  • Chair: Robert E. Fay, III, Census Bureau
  • Speakers: Christopher R. Bilder, University of Nebraska-Lincoln, and Thomas M. Loughin, Simon Fraser University
  • Date/Time: Friday, September 7, 2007, 12:30 - 2:00 pm
  • Location: Bureau of Labor Statistics, Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Methodology Program, WSS
  • Presentation material (slides, pdf, ~2mb)

Abstract:

Although "choose all that apply" questions are common in modern surveys, methods for analyzing associations among responses to such questions have only recently been developed. These methods are generally valid only for simple random sampling, but many "choose all that apply" and related questions appear in surveys conducted under more complex sampling plans. The purpose of this talk is to provide statistical analysis methods that can be applied to "choose all that apply" questions in complex survey sampling situations. Loglinear models fit to marginal data are used to describe associations among the multiple responses that occur with this type of data. Model comparison test statistics along with their asymptotic distributions are presented in order to choose a good fitting model. Estimates of odds ratios and their corresponding standard errors are provided in order to measure associations among responses.

Title: Bayesian Methods for Proteomic Biomarker Discovery Using Functional Mixed Models

  • Speaker:
    Jeffrey S. Morris, PhD, Associate Professor
    University of Texas, MD Anderson Cancer Center
  • Date: September 7, 2007
  • Time: 10:00-11:00 AM
  • Location: Georgetown University
    Lombardi Comprehensive Cancer Center
    3800 Reservoir Road, NW
    Martin Marietta Conference Room
    Washington, DC 20057
  • Phone: 202-687-4114
  • Sponsor: Department of Biostatistics, Bioinformatics and Biomathematics.
  • Sponsor: Georgetown University

Abstract:

Various proteomic assays yield spiky functional data; for example, MALDI-TOF and SELDI-TOF yield one-dimensional spectra with many peaks, and 2D gel electrophoresis and LC-MS yield two-dimensional images with spots that correspond to peptides present in the sample. In this talk, I will discuss how to identify candidate biomarkers for various types of proteomic data using methods based on Bayesian wavelet-based functional mixed models. This approach models the functions in their entirety, so it avoids reliance on peak or spot detection methods. The flexibility of this framework in modeling nonparametric fixed and random effect functions enables it to model the effects of multiple factors simultaneously, allowing one to perform inference on multiple factors of interest using the same model fit, while adjusting for clinical or experimental covariates that may affect both the intensities and locations of the peaks and spots in the data. I will demonstrate how to identify regions of the functions that are differentially expressed across experimental conditions, in a way that takes both statistical and clinical significance into account and controls the Bayesian false discovery rate to a pre-specified level. Time allowing, I will also demonstrate how to use this framework as the basis for classifying future samples based on their proteomic profiles in a way that can also combine information across multiple sources of data, including proteomic, genomic, and clinical, and may also discuss improvements of the modeling framework that result in more robust inference. These methods will be applied to a series of proteomic data sets from cancer-related studies.

Title: Experiences with Congressional Testimony: Statistics and The Hockey Stick

  • Speaker:
    Yasmin H. Said
    Department of Computational and Data Sciences
    George Mason University
  • Time: 10:30 a.m. Refreshments, 10:45 a.m. Colloquium Talk
  • Date: September 7, 2007
  • Location:
    Department of Computational and Data Sciences George Mason University
    Research 1, Room 301, Fairfax Campus
    George Mason University, 4400 University Drive, Fairfax, VA 22030
  • Sponsor: George Mason University CDS/CCDS/Statistics Colloquium

Abstract:

Rarely does the federal government need advice on theoretical statistics. I would like to talk about one exception. Efforts to persuade Congress to enact legislation that affects public policy are constantly being made by lobbyists who are paid by special interests. While this mode of operation is frequently extremely effective for achieving the goals of the special interest groups, it often does not serve the public interests in the best possible way. As counterpoint to this mode of operation, pro bono interaction with individual legislators and especially testimony in Congressional hearings can be remarkably effective in presenting a balanced picture. The debate on anthropogenic global warming has in many ways left scientific discourse and landed in political polemic. In this talk I will discuss our positive and negative experiences in formulating testimony on this topic.

Title: An Introduction to the Key National Indicators Initiative: the State of the USA

  • Chair: Edward Sondik, National Center for Health Statistics
  • Presenters:
    Christopher Hoenig, IBM
    Robert Groves, University of Michigan/JPSM
    Jane Ross, National Research Council, The National Academies
  • Date/Time: Wednesday, September 12, 2007 / 12:30 to 2:00 p.m.
  • Location: Bureau of Labor Statistics, Conference Center Room 1. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Public Policy Section, WSS and DC-AAPOR

Abstract:

Several countries around the world have developed organized systems of statistical indicators that are used to inform civil discourse and to track change in the basic economic, social, and environmental status of the country. The audiences for these key national indicator systems include both policy makers in central and local governments and interested citizens.

The State of the USA is envisioned to be a web-based resource permitting user-friendly presentation of key indicators at national and subnational levels. It will have explicit quality criteria and interest thresholds that determine which indicators are contained in the system. It will include official government statistics, private sector statistics, and academic statistics.

The State of the USA is currently funded by grants from several private foundations and is being incubated in the National Academies.

This WSS session will provide an introduction to the inception and development of the State of the USA, its basic goals, and its emergent organization. A demonstration of a test web site, illustrating some of the features of the indicator presentation, will be given.

Title: New Experiments on the Design of Complex Survey Questions

  • Chair: Adam Safir, Bureau of Labor Statistics
  • Presenters:
    Paul Beatty, National Center for Health Statistics
    Floyd J. Fowler, University of Massachusetts, Boston
  • Discussant: Gordon Willis, National Cancer Institute
  • Date/Time: Wednesday, September 12, 2007 / 3:30 to 5:00 p.m.
  • Location: Bureau of Labor Statistics, Conference Center Room 1. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: DC-AAPOR and Methodology Section, WSS

Abstract:

Survey researchers often need their questions to convey very specific information to respondents; for example, questions may include complex definitions, instructions to include or exclude various considerations while answering, and a particular set of closed-ended responses. Although questionnaire design principles provide some advice on constructing complex questions, little empirical evidence demonstrates the superiority of certain decisions over others. For example, in some questions, important respondent instructions "dangle" after the core question has been asked; one alternative is to provide such definitions before asking the core question.

We have conducted several rounds of RDD telephone surveys with split-ballot experiments to explore such issues. This seminar reports on the latest round of 425 interviews conducted via an RDD telephone survey, in which respondents received alternative versions of various survey questions. For example, in some experiments, alternative questions used the same words but were structured differently. Other experiments compared the use of examples vs. definitions to explain complex concepts, compared the use of one vs. two questions to measure the same phenomenon, and compared questions before and after cognitive interviews had been used to clarify key concepts. With permission, interviews were tape recorded and behavior-coded, making it possible to compare various interviewer and respondent difficulties across question versions, in addition to comparing differences in response distributions.

Taken in conjunction with findings from previous rounds of experiments, the results begin to suggest some general design principles for complex questions. For example, the disadvantages of "dangling qualifiers" are becoming clear, as are the advantages of using multiple questions to disentangle certain complex concepts. The seminar will report results of these and other experimental comparisons, with an eye toward providing more systematic questionnaire design guidance.

Following the seminar, all are welcome and invited to attend a social hour at Capitol City Brewing Company, located in the same building as the talk.

Topic: Unduplicating the 2010 Census

  • Speakers: Michael Ikeda and Edward Porter, Statistical Research Division, U.S. Census Bureau
  • Date/Time: September 18, 2007, 9:30 - 11:00 a.m.
  • Location: U.S. Census Bureau, 4600 Silver Hill Road, Seminar Room 5K410, Suitland, Maryland. Please call (301) 763-4974 to be placed on the visitors' list. A photo ID is required for security purposes. All visitors to the Census Bureau are required to have an escort to Seminar Room 5K410. An escort will leave from the Guard's Desk at the Metro Entrance (Gate 7) with visitors who have assembled at 9:15 a.m. and again at 9:25 a.m. Parking is available at the Suitland Metro.
  • Sponsor: U.S. Bureau Of Census, Statistical Research Division

Abstract:

The current plan for the 2010 Census includes a nationwide unduplication operation. One potential problem is the possibility of large numbers of false positives. To help evaluate the extent of this problem, the unduplication procedures have been run on the data from the 2000 Census.

The first section of the talk describes a simple approach to taking full advantage of multiple processors by writing C programs and using basic UNIX commands. This section also describes the programming concepts used in unduplicating the entire country using BigMatch and the SRD Matcher. It also describes metaprogramming techniques used in this large system and documents some of the errors made and problems encountered during development. One example involves keeping track of all files so that multiple runs do not interfere with each other. A similar system is currently expected to be used for unduplication of the 2010 census on production machines.

The second section of the talk gives an overview of the results of our analysis. Most of the problem with apparent false matches seems to be concentrated in the most common surnames and the most common Hispanic surnames, especially for matches outside the state. Name frequency does not seem to have much effect when there are multiple links of reasonable quality between housing units or when the phone number matches.

This event is accessible to persons with disabilities. Please direct all requests for sign language interpreting services, Computer Aided Real-time Translation (CART), or other accommodation needs, to HRD.Disability.Program@census.gov. If you have any questions concerning accommodations, please contact the Disability Program Office at 301-763-4060 (Voice), 301-763-0376 (TTY), or by voice mail at 301-763-2853, then select #2 for EEO Program Assistance.

Title: Survey Methodology for Assessing Geographically Isolated Wetlands Map Accuracy

  • Speaker: Breda Munoz, RTI International
  • Chair: Mel Kollander
  • Date/Time: Wednesday, September 19, 2007 / 12:30 - 1:30 p.m.
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: WSS Agriculture and Natural Resources Section

Abstract:

Wetlands provide significant environmental benefits such as assimilation of pollutants, flood water storage, water recharge and fish and wildlife habitat. Geographically isolated wetlands (GIW) can provide the same benefits as wetlands in general, and are particularly vulnerable to losses from urbanization and agriculture precisely because they are geographically isolated and have varying amounts of regulatory protection. Currently, there is not a dependable and cost-effective method to generate an accurate GIW map without sending a field scientist to perform surveys or requiring image technicians to perform heads-up digitalization of aerial photography. By using statistically valid estimates of accuracy rates one can evaluate the quality of the information contained in GIW maps. Accuracy rates are used to describe the misclassification errors of the maps. A probability sampling survey methodology that balances statistical considerations, expert opinion and operational considerations is proposed for assessing the accuracy of GIW maps. The proposed sampling design is based on a stratified multi-stage sampling design that addresses sample size requirements for the different strata and types of GIWs and also recognizes the need for spatial coverage while minimizing operational efforts. Expressions for design-based accuracy estimates and an estimate of the number of GIWs, as well as their corresponding variances, are also provided.

A simulation exercise is used to illustrate the proposed sampling methodology. A GIW map for Brunswick County in North Carolina, created using historical data, was used as the sampling frame. The GIW map was created from a combination of satellite imagery, classification tools to process the imagery and auxiliary information. The sampling methodology was used to randomly select sites from this GIW map. An updated GIW map for the same counties showing exact location of GIW was used to provide "ground-truth" observations from wetland delineations approved by the US Army Corps of Engineers. Accuracy estimates were calculated by comparing site classification differences obtained by using both the original and updated GIW maps. Survey based accuracy estimates and their corresponding variance estimates were calculated.

Title: A Geometric Approach to Comparing Treatments for Rapidly Fatal Diseases

  • Speaker:
    Peter Thall, PhD, Professor
    University of Texas, MD Anderson Cancer Center
  • Date: September 7, 2007
  • Time: 10:00-11:00 AM
  • Location: Georgetown University
    Lombardi Comprehensive Cancer Center
    3800 Reservoir Road, NW
    Research Building, Conference Room E501
    Washington, DC 20057
  • Phone: 202-687-4114
  • Sponsor: Department of Biostatistics, Bioinformatics and Biomathematics.
  • Sponsor: Georgetown University

Abstract:

In therapy of rapidly fatal diseases, early treatment efficacy often is characterized by an event, "response," which is observed relatively quickly. Since the risk of death decreases at the time of response, it is desirable not only to achieve a response, but to do so as rapidly as possible. We propose a Bayesian method for comparing treatments in this setting based on a competing risks model for response and death without response. Treatment effect is characterized by a two-dimensional parameter consisting of the probability of response within a specified time and the mean time to response. Several target parameter pairs are elicited from the physician so that, for a reference covariate vector, all elicited pairs embody the same improvement in treatment efficacy compared to a fixed standard. A curve is fit to the elicited pairs and used to determine a two-dimensional parameter set in which a new treatment is considered superior to the standard. Posterior probabilities of this set are used to construct rules for the treatment comparison and safety monitoring. The method is illustrated by a randomized trial comparing two cord blood transplantation methods.

Title: A Bayesian IRT Model for the Comparison of Survey Item Characteristics under Dual Modes of Administration

  • Speaker: Lou Mariano, Ph.D., Statistician, RAND Corporation
  • Date/Time: Tuesday, September 25, 2007, 3:35pm.
  • Location: Bentley Lounge, Gray Hall 130, American University
  • Direction: Metro RED line to Tenleytown-AU. AU shuttle bus stop is next to the station.
  • Contact: Jaqueline Sosa, 202-885-3124, jsosa@american.edu
  • Sponsor: American University Department of Mathematics and Statistics Colloquium

Abstract:

Ordinal scale survey response items are often used in quantifying a latent trait. When the survey is offered in multiple modes of administration, e.g., telephone interview or self-administered questionnaire, the mode of administration may affect the characteristics of the survey items, such that an individual's responses may differ depending on the mode. Using a mental health survey as a case study, the Bayesian Differential Mode Effects Model (BDMEM) is introduced as an Item Response Theory (IRT) model-based solution for the detection, quantification and reconciliation of mode of administration effects at the item, response category, and scale levels. The BDMEM is compared to the popular approach of differential item functioning (DIF), and its advantages over DIF are highlighted, including the optimal use of repeated measures, the detection of differences in categorical response probabilities, and the automatic equating of results under different modes.

Topic: Alternative Survey Sample Designs, Seminar #1: Network, Spatial, and Adaptive Sampling

  • Speaker: Professor Steven K. Thompson, Simon Fraser University
  • Discussant: Professor Jean D. Opsomer, Colorado State University
  • Date/Time: September 26, 2007, 9:15 a.m. - 12:00 p.m.
  • Location: U. S. Census Bureau, 4600 Silver Hill Road, Auditorium, Suitland, Maryland. By Metro, use the Green Line to Suitland Station and walk through the Metro parking garage to the main entrance of the Census Bureau. Please send an e-mail to Carol.A.Druin@census.gov, or call (301) 763 - 4216 to be placed on the visitors' list for this seminar by September 21, 2007. A photo ID is required for security purposes.
  • Sponsor: U.S. Bureau Of Census, Statistical Research Division

Abstract:

The Census Bureau's Demographic Survey Sample Redesign Program, among other things, is responsible for research into improving the designs of demographic surveys, particularly focused on the design of survey sampling. Historically, the research into improving sample design has been restricted to "mainstream" methods like basic stratification, multi-stage designs, systematic sampling, probability-proportional-to-size sampling, clustering, and simple random sampling. Over the past thirty years or more, we have increasingly faced reduced response rates and higher costs coupled with an increasing demand for more data on all types of populations. More recently, dramatic increases in computing power and availability of auxiliary data from administrative records have indicated that we may have more options than we did when we established our current methodology.

This seminar series is the beginning of an exploration into alternative methods of sampling. In this first seminar, from 9:30 to 10:30, we will hear about Professor Thompson's work on network, spatial, and adaptive sampling. He will discuss various alternative approaches and their statistical properties. Following Professor Thompson's presentation, there will be a 15-minute break, and then from 10:45 - 11:30, Professor Jean Opsomer will provide discussion about the methods and their potential in demographic surveys, particularly focusing on impact on estimation. The seminar will conclude with an open discussion session from 11:30 - 11:45 with 15 additional minutes available if necessary.

Seminar #2 is currently slated for December 10, 2007 and will feature Professor Sharon Lohr of Arizona State University discussing multiple overlapping frame designs.

This event is accessible to persons with disabilities. Please direct all requests for sign language interpreting services, Computer Aided Real-time Translation (CART), or other accommodation needs, to HRD.Disability.Program@census.gov. If you have any questions concerning accommodations, please contact the Disability Program Office at 301-763-4060 (Voice), 301-763-0376 (TTY) or by voice mail at 301-763-2853, then select #2 for EEO Program Assistance.

Return to top

Title: Small Area Estimation: An Empirical Best Linear Unbiased Prediction Approach

  • Speaker: Huilin Li, University of Maryland
  • Chair: William Bell, Census Bureau
  • Date/Time: Friday, September 28, 2007, 12:30 - 2:00 pm
  • Location: Bureau of Labor Statistics, Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Methodology Program, WSS
  • Presentation material (slides, pdf, ~420kb)

Abstract:

In this paper, based on the general Fay-Herriot model, we evaluate the performance of different variance component estimation methods in model-based point estimates and interval predictions. Following Morris' comments, we propose a new approach to estimating the model variance that always produces positive estimates. Its positiveness and consistency are also established. We also propose a parametric bootstrap prediction interval method, using the weighted least squares estimator and the ADM estimator under the general Fay-Herriot model, and obtain a coverage accuracy of O(m^{-3/2}). Extensive simulations and real-life data analyses are conducted. Our results suggest that this new approach performs better.

Return to top

Title: Multi-Stage Sampling for Genetic Studies

  • Speaker:
    Dr. Gang Zheng
    Mathematical Statistician
    Office of Biostatistics Research
    National Heart, Lung and Blood Institute, NIH
  • Date/Time: Friday, September 28, 2007, 11:00am-12:00pm
  • Location: DUQUES 255, 2201 G Street, N.W., Washington D.C. Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/~map.
  • Sponsor: The George Washington University, Department of Statistics

Abstract:

In the first part of the talk, I will review various multi-stage sampling designs in classical genetic linkage and association studies. This part does not involve much statistics. In the second part, I will focus on a cost-effective two-stage design for genome-wide case-control association studies. Some test statistics for this two-stage design will also be discussed. Most of the talk is based on an article with Robert Elston and Danyu Lin to appear in Annual Review of Genomics and Human Genetics (Sept 2007).

For a complete listing of our current seminars, visit http://www.gwu.edu/~stat/seminar.htm. For more information about the George Washington University Department of Statistics Seminars, contact:
Efstathia Bura, Department of Statistics
E-mail: ebura@gwu.edu, Phone: 202-994-6358
Joseph L. Gastwirth, Department of Statistics
E-mail: jlgast@gwu.edu, Phone: 202-994-6548

Return to top

Title: Text Data Mining in Defense Applications

  • Speaker:
    Jeffrey L. Solka
    Advanced Computation Division
    Naval Surface Warfare Center, Dahlgren Division and Department of Bioinformatics
    George Mason University
  • Location:
    Research 1, Room 301, Fairfax Campus
    George Mason University, 4400 University Drive, Fairfax, VA 22030
  • Time: 10:30 a.m. Refreshments, 10:45 a.m. Colloquium Talk
  • Date: September 28, 2007
  • Sponsor: George Mason University CDS/CCDS/Statistics Colloquium

Abstract:

This talk will discuss the role of text data mining in defense applications. Discussions will include, but not be limited to, the role of text data mining in the characterization of country capabilities, its role in the characterization of the state of the art of a discipline area, and its role in discovery. Discussion will focus on the speaker's experiences in this area and his knowledge of the state of the text data mining literature. We also will explore who the customers might be for these techniques and where the future lies, both in the technology and in the important problems that have not yet been addressed.

Return to top

Title: Two for the Price of One: Statistics in Natural Language Processing and Information Retrieval

  • Speakers:
    Professor Douglas Oard, Associate Dean for Research & Associate Professor, College for Information Studies, UMCP
    Professor Philip Resnik, Associate Professor, Linguistics Dept. & UMIACS, UMCP
  • Date/Time: Thursday, October 4, 2007, 3:30pm
  • Location: Room 1313, Mathematics Building, University of Maryland, College Park. Directions to Campus: http://www.math.umd.edu/department/campusmap.shtml
  • Sponsor: University of Maryland, Statistics Program

Abstract:

Interesting problems in statistics arise in several areas of natural language processing and information retrieval. Broadly, we might divide these into (1) estimating useful distributions for language use and (2) designing insightful and affordable evaluation methods. In this talk, we will provide a broad overview of these two closely related fields, focusing first on the consequences of what has been called the "evaluation guided research paradigm" that now dominates both fields. We'll then drill down, each describing one or two problems from our recent work where it seems to us that our worlds and yours [the statisticians'] might intersect. Our goal in this seminar is to start a discussion about the kinds of problems we might productively work on together.

Return to top

Title: The Statistical Challenge of Studies with Errors-in-Covariates When Only the Means are Modelled

  • Speaker:
    John Hanfelt, PhD, Associate Professor
    Emory University, Department of Biostatistics
  • Date: Friday, October 5, 2007
  • Time: 10:00-11:00 AM
  • Location: Georgetown University
    Lombardi Comprehensive Cancer Center
    3800 Reservoir Road, NW
    Martin Marietta Conference Room
    Washington, DC 20057
  • Phone: 202-687-4114
  • Sponsor: Department of Biostatistics, Bioinformatics and Biomathematics.
  • Sponsor: Georgetown University

Abstract:

Given the recent advances in convenient, flexible and powerful computer-intensive methods to analyze data, it is natural to wonder about the relevance of the `classical' theory of statistical inference. Here we discuss an application, namely studies with a covariate measured with error, that poses a severe statistical challenge when only the means of the observations are modelled. In this setting, standard methods of data analysis typically yield dramatically biased results -- even if computer-intensive methods are used. We draw upon the theory of bias reduction of profile estimating functions to arrive at inferences that are substantially less biased. We apply the proposed method to a study examining whether a biomarker measured with error (long-term alanine aminotransferase level) is related to length of hospital stay in patients treated for herpes zoster infections.

Return to top

Title: Finding the Fittest Curve for the Binary Classification Problem

  • Speaker:
    Denise M. Reeves
    Mitre Corporation
  • Time: 10:30 a.m. Refreshments, 10:45 a.m. Colloquium Talk
  • Date: October 12, 2007
  • Location:
    Department of Computational and Data Sciences George Mason University
    Research 1, Room 301, Fairfax Campus
    George Mason University, 4400 University Drive, Fairfax, VA 22030
  • Sponsor: George Mason University CDS/CCDS/Statistics Colloquium

Abstract:

Solving the binary classification problem for an application involves solving a data driven modeling problem. Such problems entail multiple and coupled sources of errors. Two communities of practice have approached this problem with different sets of assumptions and resulting limitations. The statistical community assumes that data is generated by a given stochastic model with parameter estimates based on the given class of models. On the other hand, the machine learning or algorithmic modeling community uses algorithmic modeling methods that treat data mechanisms as unknown. Machine learning methods have been successfully used on large data sets and offer a more accurate alternative to data modeling on small data sets. In this talk we consider the hard margin support vector algorithm applied to several bivariate Gaussian data sets with common covariance matrices.

Return to top

Title: Protecting the Confidentiality of Tables by Adding Noise to the Underlying Microdata

  • Speaker: Paul B. Massell, Statistical Research Division, U.S. Census Bureau
  • Discussants:
    Richard Clayton, Office of Employment and Unemployment Statistics, BLS
    John Ruser, Office of Compensation and Working Conditions, BLS
  • Chair: Anne Polivka, BLS
  • Date/time: Tuesday, October 16, 2007 / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Take the Red Line to Union Station.
  • Sponsor: WSS Economics Section

Abstract:

Users of statistical tables released by the Economic Directorate of the U.S. Census Bureau have raised the issue of whether an alternative to cell suppression can be used to protect the confidentiality of such tables. These users would like to have access to at least an approximate value for each cell, except possibly for those cells that are the most sensitive. An alternative method was developed several years ago by researchers at the Census Bureau that successfully meets that goal. This method uses a carefully calibrated noise distribution to generate noise which is then added to the microdata values of a magnitude variable requiring protection. These noisy microdata values are then tabulated to form the cell values for all the tables in a statistical program that describe that variable (e.g., receipts for Non-Employer Statistics). This method is conceptually simple and easy to implement; in particular, it is much simpler than cell suppression. The main concerns are whether noise protected tables are fully protected and whether the noisy cell values are as or more useful to users than the combination of exact and suppressed values provided by cell suppression. The seminal paper by Evans-Zayatz-Slanta (J. Official Statistics, 1998) showed that this was clearly true for the survey analyzed in that paper. The work presented in this paper provides analysis for additional surveys with different features than the survey described in the earlier paper. We present general protection arguments that involve ways of relating the uncertainty provided by noisy values to the required amount of protection. We present graphs which show the different distributions of net noise on the set of sensitive cells versus that for the non-sensitive cells. We also discuss some ways to fine-tune the algorithm to a particular table, taking advantage of its special characteristics. We call this new variation 'balanced EZS noise'.
Our conclusion is that when EZS noise is appropriately applied, it fully protects tables while usually releasing more useful data than cell suppression. The possible application of EZS noise to a variety of statistical programs within the Economic Directorate is currently being researched.

This talk is an expanded version of an invited talk presented at the Third International Conference on Establishment Surveys (ICES 2007 in Montreal) session called "Advances in Disclosure Protection: Releasing More Business and Farm Data to the Public". That paper was co-authored with Jeremy Funk. Paul and Jeremy are members of the Disclosure Avoidance Research Group in the Statistical Research Division of the U.S. Census Bureau. Laura Zayatz is the head of that group and provided guidance on this project.
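
The idea of tabulating noisy microdata can be illustrated with a minimal sketch. It assumes one multiplicative noise factor per record, drawn uniformly between 5% and 15% away from 1 in a random direction; these bounds, the per-record assignment, and the function names are illustrative assumptions, not the calibrated noise distribution or the algorithm described in the talk.

```python
import random

def noise_multiplier(rng, low=0.05, high=0.15):
    """Draw a factor 1 +/- u with u uniform on [low, high] (illustrative bounds)."""
    magnitude = rng.uniform(low, high)
    direction = rng.choice((-1, 1))
    return 1.0 + direction * magnitude

def noisy_cell_totals(records, rng):
    """Tabulate (cell, value) records into {cell: (true_total, noisy_total)}."""
    totals = {}
    for cell, value in records:
        true_sum, noisy_sum = totals.get(cell, (0.0, 0.0))
        totals[cell] = (true_sum + value,
                        noisy_sum + value * noise_multiplier(rng))
    return totals

# Hypothetical data: cell "A" is dominated by a single large record,
# cell "B" is made up of many small records.
rng = random.Random(2007)
records = [("A", 1000.0)] + [("B", 10.0) for _ in range(100)]
for cell, (true_total, noisy_total) in sorted(noisy_cell_totals(records, rng).items()):
    relative_noise = (noisy_total - true_total) / true_total
    print(cell, round(true_total, 1), round(noisy_total, 1), round(relative_noise, 3))
```

Under these assumptions, a cell dominated by one record always moves by at least 5%, while in a cell with many contributors the individual perturbations tend to offset one another, so its net relative noise is usually smaller.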

Return to top

Title: Probability of Detecting Disease-Associated SNPs in Case-Control Genome-Wide Association Studies

  • Speaker:
    Mitchell H. Gail, PhD, Chief of Biostatistics Branch
    Division of Cancer Epidemiology and Genetics, National Cancer Institute
  • Date: Friday, October 19, 2007
  • Time: 10:00-11:00 AM
  • Location: Georgetown University
    Lombardi Comprehensive Cancer Center
    3800 Reservoir Road, NW
    Research Building, Conference Room E501
    Washington, DC 20057
  • Phone: 202-687-4114
  • Sponsor: Department of Biostatistics, Bioinformatics and Biomathematics.
  • Sponsor: Georgetown University

Abstract:

Some case-control genome-wide association studies (CCGWASs) select promising single nucleotide polymorphisms (SNPs) by ranking corresponding p-values, rather than by applying the same p-value threshold to each SNP. For such a study, we define the detection probability (DP) for a specific disease-associated SNP as the probability that the SNP will be "T-selected", namely have one of the top T largest chi-square values (or smallest p-values) for trend tests of association. The corresponding proportion positive (PP) is the fraction of selected SNPs that are true disease-associated SNPs. We study DP and PP analytically and via simulations, both for fixed and for random effects models of genetic risk, that allow for heterogeneity in genetic risk. DP increases with genetic effect size and case-control sample size, and decreases with the number of non-disease-associated SNPs, mainly through the ratio of T to N, the total number of SNPs. We show that DP increases very slowly with T, and the increment in DP per unit increase in T declines rapidly with T. DP is also diminished if the number of true disease SNPs exceeds T. For a genetic odds ratio per minor disease allele of 1.2 or less, even a CCGWAS with 1000 cases and 1000 controls requires T to be impractically large to achieve an acceptable DP, leading to PP values so low as to make the study futile and misleading. We further calculate the sample size of the initial CCGWAS that is required to minimize the total cost of a research program that also includes follow-up studies to examine the T selected SNPs. A large initial CCGWAS is desirable if genetic effects are small or if the cost of a follow-up study is large.
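
The notion of a "T-selected" SNP can be sketched with a small Monte Carlo simulation. The sketch assumes a single disease-associated SNP, independent SNPs, and trend statistics that are squared normal variates with mean shift `mu` for the disease SNP and 0 for the others; `mu`, the function name, and all parameter values are hypothetical and do not reproduce the analytic results of the talk.

```python
import random

def detection_probability(n_snps, top_t, mu, reps, seed=0):
    """Estimate the chance that the one disease SNP ranks among the top_t chi-square values."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Chi-square (1 df) statistic of the disease SNP under a mean shift mu.
        disease_stat = rng.gauss(mu, 1.0) ** 2
        # Count null SNPs whose statistic beats the disease SNP.
        larger_nulls = sum(
            1 for _ in range(n_snps - 1)
            if rng.gauss(0.0, 1.0) ** 2 > disease_stat
        )
        if larger_nulls < top_t:
            hits += 1
    return hits / reps

# Illustrative run: DP for a few values of T with N = 2000 SNPs.
for top_t in (1, 10, 100):
    print(top_t, detection_probability(2000, top_t, mu=3.0, reps=100, seed=1))
```

Reusing the same seed across values of T keeps the simulated statistics identical, so the estimated DP can only stay the same or grow as T increases.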

Return to top

Title: Limitations of the Non-homogeneous Poisson Process (NHPP) Model for Analyzing Software Reliability Data

  • Speaker:
    Prof. Sudip Bose, Department of Statistics
    The George Washington University
  • Date: Friday, October 19, 2007
  • Time: 11:00am-12:00pm
  • Location: DUQUES 255, 2201 G Street, N.W., Washington D.C.
  • Directions: Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/~map.
  • Contacts:
    Efstathia Bura, Department of Statistics, E-mail: ebura@gwu.edu, Phone: 202-994-6358
    Joseph L. Gastwirth, Department of Statistics, E-mail: jlgast@gwu.edu, Phone: 202-994-6548.
  • Sponsor: The George Washington University, Department of Statistics

Abstract:

Software failure data can be analyzed to provide statistical estimates of the reliability of software, which are useful for assessing its quality, and for determining the date of release of a software package. The non-homogeneous Poisson process (NHPP) model is one of the models most widely used for describing and analyzing software failure processes. NHPP models in which the expected number of errors over infinite observation time is finite, are called NHPP-I models.

Our research proves a key statistical limitation of NHPP-I models, namely inconsistency of parameter estimates. In other words, even if the process is observed for an arbitrarily long time one cannot estimate unknown parameters of the model very accurately. The inconsistency feature is a consequence of a representation of an NHPP-I model as a mixture of General Order Statistics or GOS models (Raftery, 1987) and holds more generally for mixture distributions in broader settings, and not just for the NHPP model for software failures. This result also has implications for a Bayesian analysis of NHPP models.

We show that optimal unbiased estimation of any parametric function in an NHPP-I model essentially reduces to estimating related parametric functions of the underlying GOS model. We discuss other known features of an NHPP model that are not consistent with certain intuitive features of software failure processes and reliability growth.

This talk is based on joint research with my departmental colleagues, Professors Tapan Nayak and Subrata Kundu.

Return to top

Topic: Estimating the Measurement Error in the Current Population Survey Labor Force - A Mixture Markov Latent Class Analysis Approach

  • Speakers:
    Professor Jeroen Vermunt, Tilburg University, Netherlands
    Dr. Jay Magidson, President, Statistical Innovations, Boston, MA
  • Date/Time: Wednesday, October 24, 2007
    9:00 a.m. - 12:00 p.m.: Research results to enhance the LCA model for complex sample design, weighting and the modification of software
    1:30 p.m. - 3:30 p.m.: Application of Mixed LCA models to provide measurement error for current surveys
  • Location: U. S. Census Bureau, 4600 Silver Hill Road, Auditorium, Suitland, Maryland. By Metro, use the Green Line to Suitland Station and walk through the Metro parking garage to the main entrance of the Census Bureau. To be placed on the visitors' list for this seminar, please contact Shirrell.Adams@census.gov, (301) 763 - 5955, or Alexis.D.Reese@census.gov, (301) 763 - 4080, by October 12, 2007. A photo ID is required for security purposes.

The Demographic Statistical Methods Division (DSMD) of the Census Bureau, among other things, is responsible for conducting research to implement more timely and less costly methods to estimate and prevent measurement error in demographic surveys. Latent Class Analysis (LCA) is an alternative approach to achieve this goal in contrast to the current reinterview methodology. Historically, at the Census Bureau, the research into LCA (First-order Markov Latent Class Model) was subject to non-complex sample designs. The DSMD has continued its research to improve the use of LCA for estimating response error. Through the most recent partnership with Westat and Statistical Innovations, the DSMD was able to accomplish this goal by conducting a thorough violation study that incorporates complex sample design with weighting and heterogeneity across latent classes. In addition, the research also incorporated an aspect to modify existing software to estimate the models.

This symposium will provide the research results of that partnership, as well as a session on how to apply the enhanced model to estimate measurement error in current surveys.

This event is accessible to persons with disabilities. Please direct all requests for sign language interpreting services, Computer Aided Real-time Translation (CART), or other accommodation needs, to HRD.Disability.Program@census.gov. If you have any questions concerning accommodations, please contact the Disability Program Office at 301-763-4060 (Voice), 301-763-0376 (TTY).

About the speaker: Dr. Vermunt is a professor in the Department of Statistics Research and Methodology at the University of Tilburg, Netherlands. Dr. Vermunt is the first recipient of the Leo Goodman Award of the ASA Methodology Section (2005). Dr. Vermunt's primary methodological contributions are in the area of categorical data analysis, with particular attention to latent heterogeneity. Using a latent class analysis approach, he has incorporated into log-linear event history analysis methods for handling missing data, unobserved heterogeneity, censoring, and measurement error. He has also successfully applied the same approach to classification and clustering analysis, and multi-level and random coefficient models for categorical data. In his recent work, he has made original and important contributions to the analysis of ordered data with different flexible constraints. Now, with the Census Bureau, Dr. Vermunt showed that the mixture Markov latent class model has a better fit (than previous models) in estimating the Current Population Survey labor force classification errors.

Agenda

8:45 am Refreshments
9:15 am Introductory Remarks
Ruth Ann Killion, Chief, Demographic Statistical Methods Division
Candice Barnes, Chief, Survey Response Analysis Branch
9:30 am Research results to enhance the LCA model for complex sample design, weighting and the modification of software
Dr. Jeroen Vermunt, Tilburg University, Netherlands
Dr. Jay Magidson, President, Statistical Innovations
11:00 am Question/Answer
12:00 pm Lunch
1:30 pm Workshop -- Application of Mixed LCA models to provide measurement error for current surveys
Dr. Jeroen Vermunt, Tilburg University, Netherlands
Dr. Jay Magidson, President, Statistical Innovations
3:30 pm Wrap Up
4:00 pm Adjourn

Return to top

Title: Statistical Issues and Challenges Arising from Analysis of Genome-Wide Association Studies

  • Chair: Rene Gonin, Westat
  • Speaker: Gang Zheng, National Heart, Lung, and Blood Institute
  • Date/Time: Thursday, October 25, 2007, 12:30 - 2:00 pm
  • Location: Bureau of Labor Statistics, Conference Center. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Methodology Program, WSS
  • Presentation material (slides, pdf, ~420kb)

Abstract:

With the advance of biotechnology and the reduction of genotyping cost, a genome-wide association study testing association between a disease and 100,000 to 500,000 genetic markers (single nucleotide polymorphisms: SNPs) is feasible. Such a study consists of several stages, from quality control and genome-wide single-marker analysis to more powerful regional analysis and replication studies. Statisticians face challenges in each of these stages. Consequently, many statistical issues arise from the analyses. We will review and discuss these statistical issues and controversies.

Return to top

17th ANNUAL MORRIS HANSEN LECTURE

Title: Assessing the Value of Bayesian Methods for Inference About Finite Population Quantities

Joe Sedransk, Professor of Statistics (Case Western Reserve University, Cleveland, Ohio), will give the 17th Annual Morris Hansen Lecture "Assessing the Value of Bayesian Methods for Inference About Finite Population Quantities" on Tuesday October 30 at 3:30 P.M. in the Jefferson Auditorium of the Department of Agriculture's South Building (Independence Avenue SW, between 12th and 14th Street). The Hansen Lecture Series is sponsored by the Washington Statistical Society, Westat, and the National Agricultural Statistics Service (NASS).

The USDA South Building (Independence Avenue SW) is between 12th and 14th Streets at the Smithsonian Metro Stop (Blue Line). Enter through Wing 5 or Wing 7 from Independence Ave. (The special assistance entrance is at 12th & Independence). A photo ID is required.

Please pre-register for this event to help facilitate access to the building. After September 1, pre-register on line at http://www.nass.usda.gov/morrishansen/. Additional information will appear in the October issue.

Abstract:

Bayesian methodology is well developed and there are successful applications in many areas of substantive research. However, the use of such methodology in making inferences about finite population quantities is limited. I will describe several types of application where greater use of Bayesian methods is likely to be profitable and some where they are not. In addition, I will describe research whose successful completion should lead to improved analysis. The illustrations will come, primarily, from establishment surveys and a related area, providing public "report cards" for providers of medical care.

Return to top

Title: Multilevel Functional Principal Component Analysis

  • Speaker: Ciprian Crainiceanu, Ph.D. (pronounced Chip-ree-ann Cray-nee-cha-noo), Assistant Professor, Johns Hopkins University, Department of Biostatistics
  • Date: Friday, November 2, 2007
  • Time: 10:00-11:00 AM
  • Location: Georgetown University Medical Center, Lombardi Comprehensive Cancer Center, 3900 Reservoir Rd., NW, Martin Marietta Conference Room, Washington, DC 20007
  • Sponsor: Department of Biostatistics, Bioinformatics and Biomathematics.
  • For information, please contact Caroline Wu at 202-687-4114 or ctw26@georgetown.edu
  • Sponsor: Georgetown University

Abstract:

Modern research data have become increasingly complex, raising non-traditional modeling and inferential challenges. In particular, advancements in technology and computation have made recording and processing of functional data possible. Examples of functional data are time series of electroencephalographic (EEG) activity, anatomical shape, and functional MRI. The purpose of this talk is to describe statistical models for feature extraction from single-level (one or multiple functions per subject at one visit) and clustered or longitudinal (one or multiple functions per subject at multiple visits) functional data having a large number of subjects and large within- and between-subject heterogeneity. We introduce the framework and inferential tools for multilevel functional data (MFD) obtained by recording of functional characteristics at multiple visits. Though motivated by a novel experimental setting, the proposed methodology is general, with potential broad applicability to many high-throughput scientific studies. A prototypical example of MFD is the Sleep Heart Health Study (SHHS), which contains electroencephalographic (EEG) signals for each subject at two visits.

Return to top

Title: Multiphase Regression Models for Assessing Highly Multivariate Measurement Systems

  • Speaker:
    Dr. Z.Q. John Lu
    National Institute of Standards and Technology
    Gaithersburg, MD
  • Date: Friday, November 2, 2007
  • Time: 11:00 am - 12:00 pm
  • Location: DUQUES 255, 2201 G Street, N.W., Washington D.C.
  • Directions: Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/~map.
  • Contacts:
    Efstathia Bura, Department of Statistics, E-mail: ebura@gwu.edu, Phone: 202-994-6358
    Joseph L. Gastwirth, Department of Statistics, E-mail: jlgast@gwu.edu, Phone: 202-994-6548.
  • Sponsor: The George Washington University, Department of Statistics

Abstract:

While there exist some nice models for the measurement process of scalar and small-scale analytical chemistry experiments, there is a lack of understanding and tools for establishing the standards and performance of high throughput measurement systems, such as mRNA microarray measurements. An ongoing program at NIST on gene expression microarray experiments has demonstrated some potential approaches, including some performance metrics for scanner microarray measurement, and use of spike-in experiments in calibration and validation. I will describe a class of multiphase and nonlinear regression models used in these studies, and show how these general measurement models can accommodate the wide exponential range of signal variation while accounting for the background error, multiplicative signal error, and instrument saturation at high intensity, and how they can be adapted to model the highly parallel and multivariate nature of modern biochemical experiments.

Return to top

Title: Cell Lines, Microarrays, Drugs and Disease: Trying to Predict Response to Chemotherapy

  • Speaker:
    Keith Baggerly
    Bioinformatics and Computational Biology
    M. D. Anderson Cancer Center
  • Date/time: Wednesday, November 7, 2007 / 11 a.m. - 12 noon
  • Location:
    Executive Plaza North
    Conference Room G
    6130 Executive Boulevard
    Rockville, MD

Abstract:

Over the past few years, microarray experiments have supplied much information about the dysregulation of biological pathways associated with various types of cancer. Many studies focus on identifying subgroups of patients with particularly aggressive forms of disease, so that we know who to treat. A corresponding question is how to treat them. Given the treatment options available today, this means trying to predict which chemotherapeutic regimens will be most effective. We can try to predict response to chemo with microarrays by defining signatures of drug sensitivity. In establishing such signatures, we would really like to use samples from cell lines, as these can be (a) grown in abundance, (b) tested with the agents under controlled conditions, and (c) assayed without poisoning patients.

Recent studies have suggested how this approach might work using a widely-used panel of cell lines, the NCI60, to assemble the response signatures for several drugs. Unfortunately, ambiguities associated with analyzing the data have made these results difficult to reproduce. In this talk, we will discuss the steps involved in attacking response prediction, and describe how we have analyzed the data. We will cover some specific ambiguities we have encountered, and in some cases how these can be resolved. Finally, we will describe methods for making such analyses more reproducible, so that progress can be made more steadily.

For Additional Information contact Lisa Poe at the Office of Preventive Oncology (cpfpcoordinator@mail.nih.gov) or (301) 496-8640

Return to top

Topic: Introduction to Number Theory and Modeling the Average Running Time of Computer Programs

  • Speaker: George Andrews, Professor, Department of Mathematics, The Pennsylvania State University
  • Date/time: Thursday, November 8, 2007, 3:00 - 4:00 p.m.
  • Location: U.S. Census Bureau, 4600 Silver Hill Road, Seminar Room 5K410, Suitland, Maryland. A photo ID is required for security purposes. Also, all visitors to the Census Bureau are required to have an escort to Seminar Room 5K410. An escort will leave from the Guard's Desk at the Metro Entrance (Gate 7) with visitors who have assembled at 2:45 p.m. and again at 2:55 p.m. Parking is available at the Suitland Metro.

Abstract:

One of the basic aspects of number theory concerns the divisibility of integers. Of particular interest are d(n), the number of divisors of n, and sigma(n), the sum of the divisors of n. We shall begin with a discussion of these functions and their respective generating functions. In the latter portion of the talk, we shall look at a graph-theoretic probability model used to estimate the average running time of a large class of computer programs. Seemingly out of nowhere, we will wind up back with the divisor function.
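
As a small illustration of the two functions named in the abstract, the sketch below computes d(n) and sigma(n) by trial division. The helper names are illustrative assumptions and are not taken from the talk.

```python
def divisors(n: int) -> list[int]:
    """Return the positive divisors of n in increasing order (trial division)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    small, large = [], []
    i = 1
    while i * i <= n:
        if n % i == 0:
            small.append(i)
            if i != n // i:  # avoid counting a square root twice
                large.append(n // i)
        i += 1
    return small + large[::-1]


def d(n: int) -> int:
    """Number of positive divisors of n."""
    return len(divisors(n))


def sigma(n: int) -> int:
    """Sum of the positive divisors of n."""
    return sum(divisors(n))


if __name__ == "__main__":
    for n in (1, 6, 12, 28):
        print(n, divisors(n), d(n), sigma(n))
```

For example, 12 has the divisors 1, 2, 3, 4, 6 and 12, so d(12) = 6 and sigma(12) = 28.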

This seminar is physically accessible to persons with disabilities. For TTY callers, please use the Federal Relay Service at 1-800-877-8339. This is a free and confidential service. To obtain Sign Language Interpreting services/CART (captioning real time) or auxiliary aids, please send your requests via e-mail to EEO Interpreting & CART: eeo.interpreting.&.CART@census.gov or TTY 301-457-2540, or by voice mail at 301-763-2853, then select #2 for EEO Program Assistance.

Title: Evaluation of Trace Evidence in the Form of Multivariate Data and Sample Size Estimation in a Consignment

  • Speaker:
    Prof. Colin Aitken
    School of Mathematics
    The University of Edinburgh
  • Date: Friday, November 9, 2007
  • Time: 11:00 am - 12:00 pm
  • Location: DUQUES 255, 2201 G Street, N.W., Washington D.C.
  • Directions: Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/~map.
  • Contacts:
    Efstathia Bura, Department of Statistics, E-mail: ebura@gwu.edu, Phone: 202-994-6358
    Joseph L. Gastwirth, Department of Statistics, E-mail: jlgast@gwu.edu, Phone: 202-994-6548.
  • Sponsor: The George Washington University, Department of Statistics

Abstract:

Two issues of importance to forensic scientists in which statistics has a role to play will be discussed. Multivariate data occur often in forensic science and the example used for illustration is that of the elemental composition of glass. Measurements are made of fragments of glass at a crime scene and from fragments of glass found on a suspect. An approach to the evaluation of evidence is described that takes account of variation in the measurements between different sources and within different sources.

The second issue is that of determination of the size of a sample that needs to be taken from a consignment of drugs in order to make an inferential statement about the proportion of the consignment that is illicit.
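
The abstract does not state the method used for the sample size question. As one hedged illustration, the sketch below assumes a large consignment, a uniform Beta(1, 1) prior on the illicit proportion, and that every inspected item turns out to be illicit. Under those assumptions the posterior is Beta(m + 1, 1), so the probability that the proportion exceeds theta0 is 1 - theta0^(m + 1). The function name and defaults are hypothetical.

```python
def min_sample_size(theta0: float, prob: float = 0.95) -> int:
    """Smallest sample size m such that, if all m sampled items are illicit,
    the posterior probability that the illicit proportion exceeds theta0 is
    at least prob.

    Assumptions (not from the abstract): uniform Beta(1, 1) prior and a
    consignment large enough that sampling is effectively with replacement.
    """
    if not 0.0 < theta0 < 1.0:
        raise ValueError("theta0 must lie strictly between 0 and 1")
    if not 0.0 < prob < 1.0:
        raise ValueError("prob must lie strictly between 0 and 1")
    m = 0
    # Posterior is Beta(m + 1, 1), whose CDF at theta0 is theta0 ** (m + 1).
    while 1.0 - theta0 ** (m + 1) < prob:
        m += 1
    return m


if __name__ == "__main__":
    for theta0 in (0.5, 0.9):
        print(theta0, min_sample_size(theta0))
```

Under these assumptions, being 95% sure that more than half of the consignment is illicit needs only 4 items, while the same statement about 90% of the consignment needs 28.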

Title: Multi-modal Data and Text Mining

  • Speaker:
    John Thomas Rigsby
    Naval Surface Warfare Center
    Advanced Computation Division
  • Date: Friday, November 9, 2007
  • Time: 10:30 a.m. Refreshments, 10:45 a.m. Colloquium Talk
  • Location:
    Department of Computational and Data Sciences George Mason University
    Research 1, Room 301, Fairfax Campus
    George Mason University, 4400 University Drive, Fairfax, VA 22030
  • Sponsor: George Mason University CDS/CCDS/Statistics Colloquium

Abstract:

There are many attributes in text analysis: words, documents, bigrams, trigrams, n-grams, contextual relationships, latent semantics, and many others. This paper covers a spectral graph method for co-clustering multiple attributes at the same time. Co-clustering is very useful not only because it turns a two-step process into a one-step process, but also because it shows the relationships between different sets of attributes. This paper goes beyond normal two-mode co-clustering (i.e., words and documents) into the area of co-clustering multiple modes (i.e., words, documents, bigrams, trigrams, etc.) all at the same time.

Title: An MM Algorithm for Multicategory Vertex Discriminant Analysis

  • Speaker: Professor Tongtong Wu, Department of Epidemiology and Biostatistics, UMCP
  • Date/Time: Thursday, November 15, 2007, 3:30pm
  • Location: Room 1313, Mathematics Building, University of Maryland, College Park. Directions to Campus: http://www.math.umd.edu/department/campusmap.shtml
  • Sponsor: University of Maryland, Statistics Program

Abstract:

This talk introduces a new method of supervised learning based on linear discrimination among the vertices of a regular simplex in Euclidean space. Each vertex represents a different category. Discrimination is phrased as a regression problem involving ε-insensitive residuals and a quadratic penalty on the coefficients of the linear predictors. The objective function can be minimized by a primal MM (majorization-minimization) algorithm that (a) relies on quadratic majorization and iteratively reweighted least squares, (b) is simpler to program than algorithms that pass to the dual of the original optimization problem, and (c) can be accelerated by step doubling. Limited comparisons on real and simulated data suggest that the MM algorithm is competitive in statistical accuracy and computational speed with the best currently available algorithms for discriminant analysis.

Note: For a complete list of upcoming seminars check the department's seminar web site: http://www.math.umd.edu/statistics/seminar.shtml.

Title: Ranges of Association Measures for Dependent Binary Variables

  • Speaker: N. Rao Chaganty, Ph.D., Department of Mathematics and Statistics, Old Dominion University
  • Date: Friday, November 16, 2007
  • Time: 10:00-11:00 AM
  • Location: Georgetown University Medical Center, Lombardi Comprehensive Cancer Center, 3900 Reservoir Rd., NW, Martin Marietta Conference Room, Washington, DC 20007
  • Sponsor: Georgetown University, Department of Biostatistics, Bioinformatics and Biomathematics
  • For information, please contact Caroline Wu at 202-687-4114 or ctw26@georgetown.edu

Abstract:

Analysis of longitudinal and clustered binary data is important in biomedical research. Numerous measures of association have been proposed in the literature for the study of dependence between the binary variables. These measures include correlations, odds ratios, kappa statistics and relative risks. In this talk I will discuss permissible ranges of these measures of association. Knowledge of these ranges is crucial for developing efficient estimation methods for real-life data. I will show that moment-based methods such as generalized estimating equations, which ignore these ranges, could result in misleading p-values and incorrect conclusions.
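
To make the idea of a permissible range concrete, the sketch below works out the range of the correlation between two Bernoulli variables with fixed marginal probabilities. It uses only the Fréchet bounds on the joint probability that both variables equal 1, and is an illustration rather than the speaker's method. The function name is an assumption.

```python
from math import sqrt


def correlation_range(p1: float, p2: float) -> tuple[float, float]:
    """Permissible (min, max) correlation between Bernoulli(p1) and Bernoulli(p2).

    The joint probability p11 = P(X1 = 1, X2 = 1) must satisfy the Frechet
    bounds max(0, p1 + p2 - 1) <= p11 <= min(p1, p2); the correlation is
    (p11 - p1 * p2) / sqrt(p1 * (1 - p1) * p2 * (1 - p2)).
    """
    for p in (p1, p2):
        if not 0.0 < p < 1.0:
            raise ValueError("marginal probabilities must lie strictly in (0, 1)")
    p11_min = max(0.0, p1 + p2 - 1.0)
    p11_max = min(p1, p2)
    scale = sqrt(p1 * (1.0 - p1) * p2 * (1.0 - p2))
    return ((p11_min - p1 * p2) / scale, (p11_max - p1 * p2) / scale)


if __name__ == "__main__":
    print(correlation_range(0.5, 0.5))  # full range
    print(correlation_range(0.1, 0.5))  # much narrower range
```

With equal marginals of 0.5 the correlation can take any value from -1 to 1, but with marginals 0.1 and 0.5 it is confined to roughly -1/3 to 1/3, so a working correlation outside that interval cannot correspond to any valid joint distribution.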

Title: Sensitivity Analysis for Instrumental Variables Regression with Overidentifying Restrictions

  • Speaker:
    Prof. Dylan Small
    Department of Statistics
    University of Pennsylvania
  • Date: Friday, November 16, 2007
  • Time: 11:00 am - 12:00 pm
  • Location: DUQUES 255, 2201 G Street, N.W., Washington D.C.
  • Directions: Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/~map.
  • Contacts:
    Efstathia Bura, Department of Statistics, E-mail: ebura@gwu.edu, Phone: 202-994-6358
    Joseph L. Gastwirth, Department of Statistics, E-mail: jlgast@gwu.edu, Phone: 202-994-6548.
  • Sponsor: The George Washington University, Department of Statistics

Abstract:

Instrumental variables (IV) regression is a method for making causal inferences about the effect of a treatment based on an observational study in which there are unmeasured confounding variables. The method requires one or more valid IVs; a valid IV is a variable that is associated with the treatment, is independent of unmeasured confounding variables and has no direct effect on the outcome. Often there is uncertainty about the validity of the proposed IVs. When a researcher proposes more than one IV, the validity of the IVs can be tested via the "overidentifying restrictions test." Although the overidentifying restrictions test does provide some information, the test has no power versus certain alternatives and can have low power versus many alternatives due to its omnibus nature. To fully address uncertainty about the validity of the proposed IVs, we argue that a sensitivity analysis is needed. A sensitivity analysis examines the impact of plausible amounts of invalidity of the proposed IVs on inferences for the parameters of interest. We develop a method of sensitivity analysis for IV regression with overidentifying restrictions that makes full use of the information provided by the overidentifying restrictions test, but provides more information than the test by exploring sensitivity to violations of the validity of the proposed IVs in directions for which the test has low power. Our sensitivity analysis uses interpretable parameters that can be discussed with subject matter experts. We illustrate our method using a study of food demand among rural households in the Philippines.

Title: Handwriting Identification: Identifying the Writer of a Questioned Document Using Statistical Analysis

  • Speakers:
    Donald Gantz and John Miller
    Department of Applied Information Technology and Department of Statistics
    George Mason University
  • Date: Friday, November 16, 2007
  • Time: 10:30 a.m. Refreshments, 10:45 a.m. Colloquium Talk
  • Location:
    Department of Computational and Data Sciences George Mason University
    Research 1, Room 301, Fairfax Campus
    George Mason University, 4400 University Drive, Fairfax, VA 22030
  • Sponsor: George Mason University CDS/CCDS/Statistics Colloquium

Abstract:

The speakers are co-principal investigators in the Volgenau School's Document Forensics Laboratory. One of the technologies they have developed involves the identification of the unknown writer of a questioned handwritten document from among a population of writers, who have handwriting samples in a database. They will explain how they have designed a system based on applying discriminant analysis in a novel manner to solve this handwriting identification problem. Graph theory is used to quantify handwritten characters yielding high-dimensional feature vectors capturing physical information for the characters. The statistical methodology selects and utilizes a small number of discriminating measurements from the high-dimensional feature vector. They will demonstrate the surprising writer identification power possible using very few lower-case letters of the alphabet.

Title: The Effects of Active Duty on the Income of Reservists and the Labor Market Participation of Spouses

  • Speaker: Joshua Pinkston, Office of Employment and Unemployment Statistics, BLS
  • Discussant: Paul F. Hogan
  • Chair: Linda Atkinson, Economic Research Service, USDA
  • Date/Time: Wednesday, November 28, 2007 / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Take the Red Line to Union Station.
  • Sponsor: WSS Economics Section

Abstract:

Using data provided by the Department of Defense merged with Unemployment Insurance wage records, we first examine the effect of being called to active duty on the income of reservists and members of the National Guard. We examine how effects on reservists' income vary by income before being called to active duty as well as by the industry the reservists were employed in prior to active duty.

Furthermore, the data allow us to identify an unanticipated shock that entails both a short-run component (the effect on the reservist's income) and a long-run component in the form of increased risk of death or injury. We can then clearly identify the spouse's labor market response to this shock as well as the overall effect on family income.

In contrast to a traditional displaced worker problem, being called to active duty makes the reservists less available for household production than they were prior to being called up. If a reservist's income falls (or if expected lifetime income falls) the spouse's labor market participation may or may not increase. If a reservist's income rises when called to active duty (due to combat pay, etc.), the spouse's labor market participation will likely fall, unless the increase in income is balanced out by a lower expectation of future income. A reservist's income and the income of the reservist's family, therefore, will not necessarily move in the same direction when the reservist is called up.

Title: Tests of Unit Roots in Time Series Data

  • Speaker: Sastry Pantula, Head, Department of Statistics, North Carolina State University
  • Chair: Anne Polivka, Bureau of Labor Statistics
  • Date/Time: Thursday, November 29, 2007 / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Take the Red Line to Union Station.
  • Sponsor: WSS Economics Section
  • Presentation slides: Download the pdf (~548kb)

Abstract:

Unit root tests in time series analysis have received considerable attention since the seminal work of Dickey and Fuller (1976). In this talk, some of the existing unit root test criteria will be reviewed. Size, power and robustness to model misspecification of various unit root test criteria will be discussed. More recent work on unit root tests where the alternative hypothesis is a unit root process will be discussed. Tests for trend stationary versus difference stationary models will be discussed briefly. Current work on unit root test criteria on random coefficient models and seasonal series will also be discussed. Examples of unit root time series and future directions in unit root hypothesis testing will be presented.
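
For readers unfamiliar with the basic test, the sketch below computes the simplest Dickey-Fuller t statistic (no constant, no trend, no lagged differences) in plain Python. It is a minimal illustration, not one of the criteria compared in the talk. The function names, the simulated AR(1) series and the seeds are assumptions. Under the unit root null, this statistic is compared with Dickey-Fuller critical values (approximately -1.95 at the 5% level for this no-constant case) rather than with normal quantiles.

```python
import random
from math import sqrt


def dickey_fuller_t(y: list[float]) -> float:
    """t statistic for rho = 0 in the regression
    y[t] - y[t-1] = rho * y[t-1] + e[t]   (no constant, no lagged differences).
    """
    lagged = y[:-1]
    diffs = [b - a for a, b in zip(y[:-1], y[1:])]
    sxx = sum(x * x for x in lagged)
    rho_hat = sum(x * dy for x, dy in zip(lagged, diffs)) / sxx
    residuals = [dy - rho_hat * x for x, dy in zip(lagged, diffs)]
    s2 = sum(r * r for r in residuals) / (len(diffs) - 1)  # residual variance
    return rho_hat / sqrt(s2 / sxx)


def simulate_ar1(phi: float, n: int, seed: int) -> list[float]:
    """Simulate y[t] = phi * y[t-1] + e[t] with standard normal errors, y[0] = 0."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(phi * y[-1] + rng.gauss(0.0, 1.0))
    return y


if __name__ == "__main__":
    print("random walk (phi = 1.0):", dickey_fuller_t(simulate_ar1(1.0, 500, seed=1)))
    print("stationary  (phi = 0.5):", dickey_fuller_t(simulate_ar1(0.5, 500, seed=1)))
```

A strongly stationary series such as the phi = 0.5 example gives a large negative statistic and rejects the unit root null, whereas a random walk typically does not.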

Title: Analyzing Forced Unfolding of Protein Tandems via Order Statistics

  • Speaker: Prof. Efstathia Bura, Department of Statistics, George Washington University
  • Date/Time: 11:00 - 12:00 pm, November 29, 2007
  • Location: National Heart Lung and Blood Institute, Conference room 9201, Two Rockledge Center, 6701 Rockledge Drive, Bethesda, MD 20892

Abstract:

Mechanically active proteins are typically organized as homogeneous or heterogeneous tandems of protein domains. A large number of proteins perform important biological functions in their unfolded state. In current force-clamp atomic force microscopy (AFM), mechanical unfolding of protein tandems is studied by using constant stretching force and recording the unfolding transitions of individual domains. The main goals of these experiments are (a) to obtain the distributions of unfolding times for individual domains and (b) to probe interdomain interactions. Existing statistical methodology offers limited information gain as it ignores the complexities of the data. By the very method of AFM instrumentation, the observable quantities are the ordered forced unfolding times. Extending the existing and developing new theoretical approaches and statistical tools for the analysis of ordered unfolding transitions is the aim of this collaborative project. In this talk, order statistics based methodology will be presented for analyzing the unfolding times of protein tandems and for inferring the parent unfolding time distributions of individual domains from ordered unfolding times. Statistical tests for independence of the unfolding times and equality of their (parent) distributions, which use ordered data as their input, will be presented. The proposed tests will enable experimentalists and theoreticians to detect the presence of interdomain interaction. This presentation is based on a collaborative research project with biophysicists Prof. Barsegov (University of Massachusetts at Lowell) and Prof. Klimov (George Mason University).

Title: Evaluating Alternative One-Sided Coverage Intervals for an Extreme Binomial Proportion

  • Chair: Keith Rust, Westat
  • Speakers:
    Phillip Kott, Research and Development Division, NASS, and
    Yan Liu, Statistics of Income Division, IRS
  • Discussant: Randy Curtin, NCHS
  • Date/Time: Wednesday, December 5, 2007, 12:30 - 2:00 pm
  • Location: Bureau of Labor Statistics, Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Methodology Program, WSS
  • Presentation slides: Download the pdf (~2.3mb)

Abstract:

The interval estimation of a binomial proportion is difficult, especially when the proportion is extreme (very small or very large). Most of the methods discussed in the literature implicitly assume simple random sampling. These interval-estimation methods are not immediately applicable to data derived from a complex sample design. Some recent papers have addressed this problem, proposing modifications for complex samples. Matters are further complicated when a one-sided coverage interval is desired. This paper provides an extensive review of existing methods for constructing coverage intervals for a binomial proportion under both simple random and complex sample designs. It also evaluates the empirical performances of different one-sided coverage intervals under both a simple random and a stratified random sample design.
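
As a point of reference for the simple random sampling case, the sketch below computes a one-sided upper Wilson (score) bound for a binomial proportion. The Wilson interval is one standard method and may or may not be among those evaluated in the paper. The function name and defaults are assumptions, and no adjustment for a complex sample design is attempted.

```python
from math import sqrt
from statistics import NormalDist


def wilson_upper_bound(successes: int, n: int, confidence: float = 0.95) -> float:
    """One-sided upper Wilson (score) bound for a binomial proportion,
    assuming simple random sampling."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(confidence)  # one-sided quantile, e.g. 1.645 for 95%
    p_hat = successes / n
    centre = p_hat + z * z / (2 * n)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return (centre + half_width) / (1 + z * z / n)


if __name__ == "__main__":
    # Extreme case: no events observed in 100 trials.
    print(wilson_upper_bound(0, 100))
    print(wilson_upper_bound(50, 100))
```

With zero successes the naive Wald bound collapses to 0, while the Wilson bound stays positive, which is one reason extreme proportions need special care.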

Title: Evaluating Continuous Training Programs Using the Generalized Propensity Score

  • Speaker: Arne Uhlendorff, IZA Institute for the Study of Labor, Bonn
  • Discussant: Julia Lane, National Opinion Research Center at the University of Chicago
  • Chair: Linda Atkinson, Economic Research Service, USDA
  • Date/Time: Thursday, December 6, 2007 / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Take the Red Line to Union Station.
  • Sponsor: WSS Economics Section

Abstract:

This paper assesses the dynamics of treatment effects arising from variation in the duration of training. We use German administrative data that have the extraordinary feature that the amount of treatment varies continuously from 1 day to 720 days (i.e. 2 years). This feature allows us to estimate a continuous dose-response function that relates each value of the dose, i.e. days of training, to the individual post-treatment employment probability (the response). The dose-response function is estimated after adjusting for covariate imbalance using the generalized propensity score, a recently developed method for covariate adjustment under continuous treatment regimes. Our results indicate an increasing dose-response function for treatments of up to 360 days, and a similarly steady decline afterwards.

Title: Disparate Modes of Survey Data Collection

  • Speaker: Mark Pierzchala, Senior Fellow, Mathematica Policy Research, Inc.
  • Discussant: Brad Edwards, Vice President, Westat
  • Chair: Carl Pierchala, Mathematical Statistician, National Highway Traffic Safety Administration
  • Date/Time: Friday, December 7, 2007 / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center, Room 9. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Take the Red Line to Union Station.
  • Sponsor: WSS Data Collection Methods
  • Presentation material:
    Slides from the Speaker (pdf, ~44kb)
    Handout from the Speaker (pdf, ~132kb)
    Slides from the Discussant (pdf, ~80kb)

Abstract:

Multimode surveys are increasingly fielded in an effort to reduce costs, increase response rates, and accelerate data collection. However, the essential survey-taking process of posing a question, formulating an answer, and communicating and recording a response occurs differently in each mode. For example, in Web and paper modes the survey presentation is visual and the respondent is solely responsible for understanding the question and providing an answer. On the other hand, in CAPI and CATI modes, the survey presentation is aural and providing an answer involves an interviewer.

This seminar reviews the concept of disparate modes. Survey modes are disparate for a survey item when they result in a different optimal question form in each mode. The intrinsic aspects of each mode are reviewed for their influence on disparity taking into account the specific kinds of items the survey uses.

This presentation uses examples of multimode surveys conducted by Mathematica Policy Research, Inc. It reviews the methods used to investigate this topic, where and why disparity occurs, and how some kinds of items are more prone to disparate presentation across modes. It also notes that different question forms for an item across modes can be the result of the survey design and survey operations environment rather than due to intrinsic disparity. Much of this material was presented at the International Statistical Institute conference in August 2007 in Lisbon, Portugal.

Title: Empirical Likelihood Based Calibration Method in Missing Data Problems

  • Speaker: Prof. Efstathia Bura, Department of Statistics, George Washington University
  • Date/Time: 11:00 - 12:00 pm, December 10, 2007
  • Location: National Heart Lung and Blood Institute, Room 9201, OBR Conference Room, OBR/NHLBI, 6701 Rockledge Drive, Bethesda, MD 20892. To enter the building, contact Gang Zheng at 301-435-1287; a photo ID is also required. Parking behind the buildings is free.
  • Driving Directions: From Frederick (north): Take I-270 South. Take exit 1 for Rockledge Drive. Merge onto Rockledge Blvd. Turn right at Rockledge Drive. From DC/NVA (south): Take I-495 North. Slight left at I-270 Spur North for Rockville/I-270/Frederick. Take exit 1 for Democracy Blvd. Keep right at the fork and follow signs for Democracy Blvd E and merge onto Democracy Blvd. Turn left at Rockledge Dr.

Abstract:

Calibration estimation has developed into an important field of research in survey sampling during the last decade. It is now an important methodological instrument in the production of statistics. A few national statistical agencies have developed software designed to compute calibrated weights based on auxiliary information available in population registers and other sources. However, its application in general statistics outside of survey sampling is limited. In this paper we have found that the simple calibration method is a powerful tool for handling the general missing data problem when the parameters of interest are defined by unbiased estimating equations. Unlike the traditional calibration method, in which the calibrated weights do not depend on any unknown parameters, our calibration weights depend on the unknown parameters of interest and must be estimated by the calibration estimating equations. Large sample results and simulations are included. All results show that in general the proposed empirical likelihood calibration method produces improved estimation over its competitors. This talk is based on joint work with some of my colleagues.

Title: Approaches to Reducing and Evaluating Nonresponse Bias, With Applications to Adult Literacy Surveys

  • Chair: Tom Krenzke, Westat
  • Speaker: Wendy Van de Kerckhove, Westat
  • Discussant: Brian Harris-Kojetin, Office of Management and Budget
  • Date/Time: Wednesday, December 12, 2007 / 12:30 to 2:00 p.m.
  • Location: Bureau of Labor Statistics, Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Methodology Program, WSS

Abstract:

Almost all surveys are subject to some level of nonresponse. Nonresponse bias can be substantial when two conditions hold: 1) the response rate is relatively low, and 2) the difference between the characteristics of respondents and nonrespondents is relatively large. As addressed in the most recent OMB guidelines, approaches to reducing and evaluating nonresponse bias should consider both components. This presentation describes several approaches for reducing and evaluating nonresponse bias in surveys aimed at assessing adult literacy. Several bias-reduction approaches will be presented relating to data collection, weighting, and imputation for outcome-related nonresponse. In addition, an evaluation of nonresponse bias will be shown that extends the standard demographic comparison of respondents and nonrespondents to incorporate key survey estimates.
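
The two conditions correspond to the two factors in the usual deterministic expression for the bias of an unadjusted respondent mean: bias = (1 - response rate) x (respondent mean - nonrespondent mean). The sketch below evaluates that expression. The function names and the numbers in the example are hypothetical and are not taken from the surveys discussed in the talk.

```python
def full_population_mean(response_rate: float,
                         mean_respondents: float,
                         mean_nonrespondents: float) -> float:
    """Mean over respondents and nonrespondents combined."""
    return (response_rate * mean_respondents
            + (1.0 - response_rate) * mean_nonrespondents)


def nonresponse_bias(response_rate: float,
                     mean_respondents: float,
                     mean_nonrespondents: float) -> float:
    """Bias of the unadjusted respondent mean under the deterministic
    (fixed respondent / nonrespondent strata) view of nonresponse."""
    if not 0.0 < response_rate <= 1.0:
        raise ValueError("response_rate must lie in (0, 1]")
    return (1.0 - response_rate) * (mean_respondents - mean_nonrespondents)


if __name__ == "__main__":
    # Hypothetical literacy scores: respondents average 280, nonrespondents 260.
    rate, resp, nonresp = 0.6, 280.0, 260.0
    print(nonresponse_bias(rate, resp, nonresp))                 # bias of respondent mean
    print(resp - full_population_mean(rate, resp, nonresp))      # same quantity, computed directly
```

In this hypothetical case a 60% response rate and a 20-point gap give a bias of 8 points; halving either the nonresponse rate or the gap halves the bias.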
