Washington Statistical Society Seminars: 2006

January, 2006
17
Tues.
Statistical Methods for Alerting Algorithms in Biosurveillance
19
Thurs.
The Use of Contact History Data for Exploring Survey Nonresponse in Federal Demographic Surveys (A Joint Seminar)
20
Fri.
The George Washington University
Departments of Management Science and Statistics
Joint Seminar
Selecting the Threshold Through the Entropy of the Dirichlet Process When the Peaks Over Threshold Method is Applied in Extreme Value Analysis
27
Fri.
The George Washington University
Department of Statistics Seminar
Phase Changes in Subtree Varieties in Random Trees
February, 2006
9
Thur.
University of Maryland
Statistics Program Seminar
An RKHS Formulation of Discrimination and Classification for Stochastic Processes
10
Fri.
The George Washington University
Department of Statistics Seminar
New Monte Carlo strategies with applications to spatial models
10
Fri.
George Mason University
Statistics Colloquium Series
Agent-based Simulation Using Social Network Models of Alcohol Systems with Acute Outcomes
16
Thur.
University of Maryland
Statistics Program Seminar
Regression fractional hot deck imputation
22
Wed.
Indirect Monetary Incentives with a Complex Agricultural Establishment Survey
22
Wed.
U.S. Bureau Of Census
Statistical Research Division Seminar
Usability Engineering: Can-Do Processes for Software Development in the Federal Workplace
24
Fri.
University of Maryland
Statistics Program Seminar
Model Selection and Inference: Facts and Fiction
24
Fri.
The George Washington University
Department of Statistics Seminar
Distributions in the Ehrenfest Process
March, 2006
1
Wed.
The Proportional Odds Model for Assessing Rater Agreement with Multiple Modalities
2
Thur.
University of Maryland
Statistics Program Seminar
Statistical Applications in FDA
3
Fri.
George Mason University
AES/CCS/SCS/Statistics Colloquium Series
Andrews Image: An Alternative Visualization of Andrews' Curves
7
Tues.
U.S. Bureau Of Census
Statistical Research Division Seminar
Survey Participation as Prosocial Behavior
9
Thurs.
Small Area Estimation using Multiple Surveys
9
Thur.
University of Maryland
Statistics Program Seminar
On local Minimax Estimation With Some Consequences For Ridge Regression, Tree Learning and Reproducing Kernel Methods
15
Wed.
The State of Record Linkage
22
Wed.
Applying the Capability Maturity Model in a Statistical Organization
28
Tues.
Estimating Drug Use Prevalence Using Latent Class Models with Item Count Response as One of the Indicators
29
Wed.
U.S. Bureau Of Census
Statistical Research Division Seminar
Introductory Expertise and Tailoring in the Opening Moments of Telephone Requests for Survey Participation
31
Fri.
The George Washington University
Department of Statistics Seminar
Statistical Issues in Large-Scale Genetic Association Studies
April, 2006
4
Tues.
An Analysis of a Two-Way Categorical Table Incorporating Intra-Class Correlation
7
Tues.
Conversational Practices with a Purpose: Interaction within the Standardized Interview
20
Mon.
Using Comparative Genomics to Assess the Function of Noncoding Sequences (NCS)
26
Wed.
University of Maryland
Statistics Program Seminar
A Generation of Data: The General Social Surveys, 1972-2006 and Beyond
28
Fri.
Stochastic Variants of EM: Monte Carlo, Quasi-Monte Carlo and More
20
Fri.
The George Washington University
Departments of Management Science and Statistics
Joint Seminar
Selecting the Threshold Through the Entropy of the Dirichlet Process When the Peaks Over Threshold Method is Applied in Extreme Value Analysis
May, 2006
1
Mon.
U.S. Bureau Of Census
Statistical Research Division Seminar
Assessing the Effects of Variability in Interest Rate Derivative Pricing
3
Wed.
Exploiting Sparsity and Within-array Replications in Analysis of Microarray Data
4
Thurs.
Self-employment and Entrepreneurship: Reconciling Household and Administrative Measures
9
Tues.
2005 Roger Herriot Award
Encouraging Innovation in Government Statistical Agencies: Roger Herriot's Legacy
16
Tues.
A Multiscale Method for Disease Mapping in Spatial Epidemiology
17
Wed.
Optimizing the Use of Microdata: A New Perspective on Confidentiality and Access
June, 2006
1
Thurs.
Estimating Drug Use Prevalence Using Latent Class Models with Item Count Response as One of the Indicators
6
Tues.
Bayesian and Frequentist Methods for Provider Profiling Using Risk-Adjusted Assessments of Medical Outcomes
7
Wed.
Model Evaluation and Model Selection Based on Prediction Error for Various Outcomes
7
Wed.
Independence
7
Wed.
U.S. Bureau Of Census
Statistical Research Division Seminar
How Many Students Really Graduate from High School? The Process of High School Attrition
8
Thurs.
Characterization of Cost Structures, Perceived Value and Optimization Issues in Small Domain Estimation
13
Tues.
U.S. Bureau Of Census
Statistical Research Division Seminar
Individual Differences in Opinion Formation and Change
15
Thurs.
Pesticide Epidemiology and Poisoning Surveillance in the Twenty First Century
20
Tues.
An Update on the NIST Statistical Reference Datasets for MCMC: Ranking the Sources of Numerical Error in MCMC Computations
22
Thurs.
Implications for RDD Design from an Incentive Experiment
July, 2006
12
Tues.
U.S. Bureau Of Census
Statistical Research Division Seminar
The International Programs Center Involvement in HIV/AIDS Activities
25
Tues.
A Multiscale Method for Disease Mapping in Spatial Epidemiology
27
Thurs.
Bayesian Methods for Incomplete Two-way Categorical Table with Application to the Buckeye State Polls
August, 2006
29
Tues.
U.S. Bureau Of Census
Statistical Research Division Seminar
Besov Spaces and Empirical Mode Decomposition for Seasonal Adjustment in Nonstationary Time Series
September, 2006
1
Fri.
Integration of Gene Expression and Copy Number
12
Tues.
Economic Turbulence: Is a Volatile Economy Good for America?
19
Tues.
Prediction of Finite Population Totals Based on the Sample Distribution
October, 2006
4
Wed.
Baseline Adjustment By Inducing Partial Ordering When Measurements Are Ordered Categories
6
Fri.
Detection of Anatomical Landmark
16
Mon.
Nonprofit Employment: Improving Estimates With A Match Of IRS Information Forms And BLS QCEW
18
Wed.
Moving versus Fixed Sampling Designs For Detecting Airborne Biological Pathogens
20
Fri.
On Missing Data And Interactions In SNP Association Studies
20
Fri.
The George Washington University
Department of Statistics Seminar
Absolute Risk: Clinical Applications and Controversies
24
Tues.
Protecting the Confidentiality of Commodity Flow Survey Tabular Data by Adding Noise to the Underlying Microdata
26
Thur.
The National Academies
101st Meeting of the Committee on National Statistics
November, 2006
2
Thurs.
Partially Synthetic Data For Disclosure Avoidance: An Application To The American Community Survey Group Quarters Data
3
Fri.
Data-Driven Systems Biology: Direct Paths From Measurements to Biomedical Insight and Personalized Medicine
6
Mon.
Morris Hansen Lecture
Statistical Perspectives On Spatial Social Science
14
Tues.
Working With The American Community Survey: Findings From The National Academies Panel
15
Wed.
The Advanced Technology Program: Evaluating A Public-Private Partnership
17
Fri.
Current Proteome Profiling Methods and Applications
29
Wed.
Empirical Likelihood Methods for Complex Surveys
December, 2006
1
Fri.
Efficient Design and Analysis of Biospecimens with Incomplete Measurements
4
Mon.
OMB's Proposed Implementation Guidance for the Confidential Information Protection and Statistical Efficiency Act of 2002 (CIPSEA)
7
Wed.
Ephedra: A Case Study of Statistics, Policy, and Politics
14
Wed.
Weight Adjustments for the Grouped Jackknife Variance Estimator


Title: Statistical Methods for Alerting Algorithms in Biosurveillance

  • Chair: Myron Katzoff, National Center for Health Statistics
  • Speaker: Howard S. Burkom, National Security Technology Department, The Johns Hopkins University Applied Physics Laboratory
  • Date/Time: Tuesday, January 17, 2006 / 12:30 - 2:00 p.m.
  • Location: National Center for Health Statistics, room 1403A, 3311 Toledo Road, Hyattsville, MD (Metro: Green Line, Prince George's Plaza and then about a 10 minute walk). Note: please try to arrive 15-30 minutes early because of possible security screening delays.
  • Sponsors: WSS Section on National Defense and Homeland Security

Abstract:

Syndromic surveillance involves the monitoring of available data sources for early warning of outbreaks of unspecified disease or of specified disease before the confirmation of identifying symptoms, with the objective to complement physician sentinel surveillance with false alarm rates acceptable to the public health infrastructure. Data sources include clinical data such as counts of syndrome-specific emergency department visits or physician office visits, and nonclinical data such as over-the-counter remedy sales and school/work absentee rates.

The terrorist attacks of 2001 added urgency to the development and activation of automation-aided biosurveillance, and system applications have extended to monitoring of natural public health threats such as the onset of influenza season as well as to recent new ones such as West Nile virus, the SARS epidemic, and a potential avian flu pandemic. Effective systems require a combination of expertise in medicine and epidemiology, in information technology, and in statistics and related fields of analysis.

A common approach among system developers has been to adapt chart-based methods from the field of statistical process control. Major obstacles to this approach are the evolving and often nonstationary input data streams, the uncertainty of the nature of the signal to be detected, and the presence of systematic or periodic behavior in the data background. Thus, robust detection performance, measured by timeliness and sensitivity at controlled alert rates, requires a combination of modeling and process control suitable to the characteristics of the monitored data.

The technical part of this presentation discusses several algorithmic approaches to the monitoring of syndromic time series, including adaptations of standard control charts, Riffenburgh's moving F statistic, and scan statistics. An interactive spreadsheet environment will be used to enable detailed examination of the positive and negative features of these methods on several data types. A generalized exponential smoothing approach to data modeling will be discussed, and a control chart derived from it will be used to illustrate a detection evaluation methodology.
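
As a rough illustration of the control-chart family mentioned above, the following sketch runs a one-sided CUSUM over a daily count series, standardizing each day against a trailing baseline window. It is not the algorithm presented in the talk: the function name `cusum_alerts`, the 7-day baseline, the floor on the standard deviation, and the parameters `k` and `h` are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def cusum_alerts(counts, baseline=7, k=0.5, h=4.0):
    """Return indices of days on which a one-sided CUSUM exceeds h.

    counts   : daily syndromic counts
    baseline : number of preceding days used to estimate mean and sd (assumed)
    k, h     : reference value and decision threshold in sd units (assumed)
    """
    alerts = []
    s = 0.0
    for t in range(baseline, len(counts)):
        window = counts[t - baseline:t]
        mu = mean(window)
        sd = max(stdev(window), 1.0)   # floor the sd so a flat baseline does not inflate z
        z = (counts[t] - mu) / sd
        s = max(0.0, s + z - k)        # accumulate only upward deviations
        if s > h:
            alerts.append(t)
            s = 0.0                    # reset the statistic after signalling
    return alerts


# Hypothetical data: a quiet background followed by a sudden rise.
quiet = [10, 12, 11, 9, 10, 11, 12, 10, 11, 10]
outbreak = quiet + [25, 30, 28]
print(cusum_alerts(quiet))
print(cusum_alerts(outbreak))
```

Because the baseline window slides forward, elevated counts quickly enter the baseline and raise both the mean and the standard deviation. This is one simple way to see why nonstationary backgrounds complicate chart-based alerting.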

Title: The Use of Contact History Data for Exploring Survey Nonresponse in Federal Demographic Surveys (A Joint Seminar)

  • Chair: Richard L Bitzer, U.S. Census Bureau
  • Speakers:
    Nancy Bates, U.S. Census Bureau
    James M. Dahlhamer, National Center for Health Statistics/Centers for Disease Control and Prevention
  • Date/Time: Thursday, January 19, 2006 / 12:30 - 2 p.m.
  • Location: Bureau of Labor Statistics, Conference Center Room 9. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Methodology Section, WSS

Abstracts:

Reluctance to Participate in Federal Demographic Surveys: An Exploration of the National Health Interview Survey and Consumer Expenditure Survey using Survey Process Data

Nancy Bates and Andrea Piani, U.S. Census Bureau

In 2002-2003, the Census Bureau designed an automated contact history data collection system known as the Contact History Instrument or CHI. The CHI was developed to systematically record the number of contact attempts, mode, date and time of attempt and other details behind interim outcomes in personal visit surveys (e.g., reasons for refusals and strategies attempted).

Using CHI data from the 2005 National Health Interview Survey and the 2005 Consumer Expenditure Survey, we explore reasons why some households are reluctant to participate in the interview process. We investigate the extent of reluctance, what the most frequently cited reasons are, and whether these vary by characteristics such as survey topic, household composition, and other auxiliary variables such as region, urbanicity, or mode of contact. We also report how patterns of reluctance may change as the number of contacts increases. Finally, we explore whether some reasons are more highly correlated with the decision to refuse the survey. In closing, we offer recommendations on how CHI data can be used as a feedback mechanism for improving field productivity and understanding the reasons people participate in federal surveys.

Developing Models of Initial Contact in the National Health Interview Survey (NHIS)

James M. Dahlhamer, Barbara J. Stussman, Catherine M. Simile, and Beth Taylor, National Center for Health Statistics, Centers for Disease Control and Prevention

Response rates in government surveys have been declining over the past two decades, raising concerns about the ability of survey estimates to accurately reflect the characteristics of the target population. One of the reasons for declining response rates is the reduced accessibility of households, arising, in part, from increased physical control of access to housing units and household compositions in which no one is home for long periods of time. In an effort to achieve acceptable rates and quality of response, interviewers need to be as efficient as possible in contacting sample households so as to leave ample time for gaining respondent cooperation. The purpose of this study, therefore, is to identify factors that influence contactability.

The National Health Interview Survey (NHIS), an on-going population-based health survey conducted by the National Center for Health Statistics, Centers for Disease Control and Prevention, recently adopted the stand-alone, Blaise-based Contact History Instrument (CHI). Interviewers use CHI to record critical information on each contact attempt, including mode, date, and time of attempt, features of doorstep interactions, and strategies used for making contact and gaining cooperation. Using core survey and CHI data from the 2005 NHIS, models of initial contact with sample households are developed and tested. In addition to social-environmental (e.g., MSA status, region of residence) and household-level measures (e.g., the presence of children, household size, etc.) known to influence contactability, the role of interviewer strategies (e.g., time and mode of contact attempt, information-seeking behaviors) is assessed. By identifying attributes of difficult-to-contact households and the strategies for improving accessibility, survey procedures can be adjusted to improve the efficiency of field operations.

Title: Selecting the Threshold Through the Entropy of the Dirichlet Process When the Peaks Over Threshold Method is Applied in Extreme Value Analysis

  • Speaker: Professor DJ DeWaal
    Department of Statistics/Mathematical Statistics
    Bloemfontein University, South Africa
  • Date: Friday, January 20, 2006
  • Time: 4:00 pm - 5:00 pm
  • Location: 2140 Pennsylvania Avenue, Statistics Library. Foggy Bottom metro stop on the Blue and Orange lines.
  • Sponsor: The George Washington University, Departments of Management Science and Statistics

Abstract:

We consider the choice of the threshold t, when the Peaks over Threshold (POT) method is applied to model extreme data, through the entropy of the Dirichlet Process (DP). Davison & Smith (1990) proposed the mean excess function as a way to choose the threshold when the Generalized Pareto Distribution (GPD) is fitted, but it is not very satisfactory. Various methods for comparison, such as minimizing the mean square error of the Hill estimator and bootstrap methods (Beirlant et al., 2004), exist to select t, but they are independent of the model. Different models require different thresholds. The predictive density of a future extreme observation is derived from the DP, and an application is discussed.
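
For readers unfamiliar with the mean excess approach cited above, the sketch below computes the empirical mean excess function e(t), the average of x - t over observations x exceeding t, at a few candidate thresholds. It illustrates only that classical diagnostic, not the entropy-based selection proposed in the talk. The function names and the sample data are hypothetical.

```python
def mean_excess(data, threshold):
    """Empirical mean excess e(t): average of (x - t) over observations x > t."""
    excesses = [x - threshold for x in data if x > threshold]
    if not excesses:
        raise ValueError("no observations exceed the threshold")
    return sum(excesses) / len(excesses)


def mean_excess_curve(data, thresholds):
    """Return (t, e(t), number of exceedances) for each usable candidate threshold."""
    curve = []
    for t in thresholds:
        n_exceed = sum(1 for x in data if x > t)
        if n_exceed > 0:                     # skip thresholds above the sample maximum
            curve.append((t, mean_excess(data, t), n_exceed))
    return curve


# Hypothetical sample; in practice the curve is plotted over many thresholds.
sample = [1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 21.0]
for t, e, n in mean_excess_curve(sample, [0.0, 2.5, 6.0, 25.0]):
    print(f"t={t:5.1f}  e(t)={e:6.2f}  exceedances={n}")
```

In the usual graphical use of this diagnostic, one looks for the lowest threshold above which the curve is roughly linear, since the mean excess of a GPD is linear in the threshold when its mean exists. The judgment involved is one reason the method can be unsatisfactory in practice.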

Note: For a complete list of upcoming seminars check the department's seminar web site: http://www.gwu.edu/~stat/seminar.htm. The campus map is at: http://www.gwu.edu/~map/. The contact person is Reza Modarres, Department of Statistics. Email: Reza@gwu.edu, phone: 202-994-6359.

Title: Phase Changes in Subtree Varieties in Random Trees

  • Speaker: Professor Hosam M. Mahmoud
    Department of Statistics
    George Washington University
  • Date: Friday, January 27, 2006
  • Time: 11:00 am - 12:00 noon
  • Location: 1957 E Street, room B16. Foggy Bottom metro stop on the Blue and Orange lines.
  • Sponsor: The George Washington University, Department of Statistics

Abstract:

The occurrence of patterns in random objects is an important area of modern research. The prime example is the interest one may have in the number of occurrences of words of a certain length in a text. Applications abound in linguistics where one wishes to analyze grammatical frequencies, or in genetics where one tries to identify genes in strands of DNA. The equivalent and equally important view in random trees is to find patterns (which are trees of a certain size or a certain shape) in a given tree generated randomly. We look at the number of subtrees of a certain size on the fringe of random recursive trees, which have applications in epidemiology, philology, etc., and in random binary search trees, which have applications in computer science.

We consider the variety of subtrees of various sizes and shapes lying on the fringe of a recursive tree. For the number of subtrees of a given size k = k(n) in a random recursive tree of size n, three cases are identified: the subcritical, when k(n)/sqrt(n) tends to zero, the critical, when k(n) is of the exact order sqrt(n), and the supercritical, when k(n)/sqrt(n) tends to infinity. We show by analytic methods convergence in distribution to 0 in the supercritical case and to normality (of a normalized version of the size) in the subcritical case. We show that the size in the critical case when k/sqrt(n) approaches a limit converges in distribution to a Poisson random variable, and in the case where k/sqrt(n) does not approach a finite nonzero limit, the size oscillates and does not converge in distribution to any random variable. This provides an understanding of the complete spectrum of phases and the gradual change from the subcritical to the supercritical phase.
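
The quantity studied here can be explored by simulation. The sketch below grows a random recursive tree, in which each new node attaches to a uniformly chosen earlier node, and counts the fringe subtrees of each size. It is only an illustrative experiment, not the analytic method of the talk, and the function names, seed, and tree size are assumptions.

```python
import random
from collections import Counter


def random_recursive_tree(n, rng):
    """Parent list of a random recursive tree on nodes 0..n-1 (node 0 is the root).

    Node i (i >= 1) attaches to a parent chosen uniformly from nodes 0..i-1.
    """
    return [-1] + [rng.randrange(i) for i in range(1, n)]


def subtree_sizes(parents):
    """Size of the subtree rooted at every node."""
    sizes = [1] * len(parents)
    # Children always carry larger labels than their parents, so a reverse
    # sweep adds each completed subtree size into its parent.
    for node in range(len(parents) - 1, 0, -1):
        sizes[parents[node]] += sizes[node]
    return sizes


def fringe_counts(parents):
    """Map k -> number of fringe subtrees (a node plus all its descendants) of size k."""
    return Counter(subtree_sizes(parents))


rng = random.Random(2006)
n = 10_000
counts = fringe_counts(random_recursive_tree(n, rng))
for k in (1, 10, 100, 1000):      # k = 100 equals sqrt(n), the critical order
    print(k, counts.get(k, 0))
```

Repeating the experiment over many seeds for k well below, near, and well above sqrt(n) gives an empirical feel for the three regimes described above.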

We utilize the same battery of methods to derive similar results for binary search trees. Connections are made to Riccati equations, Polya urns and contraction in metric spaces of distributions and fixed-point equations for distribution functions.

This work is based on papers joint with Chun Su and Qunqiang Feng, University of Science and Technology of China, and Alois Panholzer, Technical University, Vienna, Austria.

Note: For a complete list of upcoming seminars check the department's seminar web site: http://www.gwu.edu/~stat/seminar.htm. The campus map is at: http://www.gwu.edu/~map/. The contact person is Reza Modarres, Department of Statistics. Email: Reza@gwu.edu, phone: 202-994-6359.

Title: An RKHS Formulation of Discrimination and Classification for Stochastic Processes

  • Speaker: Hyejin Shin, Department of Statistics, Texas A&M University
  • Time And Place: Thursday, February 9, 2006, 12:30-1:45pm in Room 3206, Math Building.
  • Sponsor: University of Maryland, Statistics Program

Abstract:

Modern data collection methods are now frequently returning observations that should be viewed as the result of digitized recording or sampling from stochastic processes rather than vectors of finite length. In spite of great demands, only a few classification methodologies for such data have been suggested and supporting theory is quite limited. Our focus is on discrimination and classification in the infinite dimensional setting. The methodology and theory we develop are based on the abstract canonical correlation concept in Eubank and Hsing (2005) and motivated by the fact that Fisher's discriminant analysis method is intimately tied to canonical correlation analysis. Specifically, we have developed a theoretical framework for discrimination and classification of sample paths from stochastic processes through use of the Loève-Parzen isometric mapping that connects a second order process to the reproducing kernel Hilbert space generated by its covariance kernel. This approach provides a seamless transition between finite and infinite dimensional settings and lends itself well to computation via smoothing and regularization.

Note: For a complete list of upcoming seminars check the department's seminar web site: http://www.math.umd.edu/statistics/seminar.shtml. Directions to the campus are at http://www.math.umd.edu/contact/.

Title: New Monte Carlo strategies with applications to spatial models

  • Speaker: Professor Murali Haran
    Department of Statistics
    Pennsylvania State University
  • Date: Friday, February 10, 2006
  • Time: 11:00 am - 12:00 noon
  • Location: 1957 E Street, room B16. Foggy Bottom metro stop on the Blue and Orange lines.
  • Sponsor: The George Washington University, Department of Statistics

Abstract:

Hierarchical Bayes models often result in posterior distributions that present challenges to non-expert users of Markov chain Monte Carlo (MCMC) methods. Two long standing questions are: How long should one run an MCMC algorithm and, once the algorithm is stopped, how accurate are the resulting estimates? I will describe two approaches for resolving these questions. The first involves the construction of "perfect" or exact sampling procedures for which these questions are moot. Perfect samplers have so far been generally impractical for all but the simplest Bayesian problems; I will explain how one can use perfect samplers for some realistic Bayesian models. The second approach describes how Monte Carlo standard errors for MCMC-based estimates can be computed and used to determine the run length of the algorithm, thereby providing practical and theoretically justified answers to both questions of interest. I will conclude with an application of these methods to some data examples.
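
One widely used way to attach a Monte Carlo standard error to an MCMC estimate is the method of non-overlapping batch means, sketched below. The abstract does not say which estimator the talk uses, so this is an assumption made for illustration. The square-root rule for the number of batches and the AR(1) toy chain standing in for sampler output are likewise assumptions.

```python
import math
import random


def batch_means_se(chain, n_batches=None):
    """Monte Carlo standard error of the chain mean via non-overlapping batch means."""
    n = len(chain)
    if n_batches is None:
        n_batches = max(2, int(math.sqrt(n)))    # common rule of thumb (assumed)
    size = n // n_batches
    used = chain[:size * n_batches]               # drop any leftover draws
    overall = sum(used) / len(used)
    means = [sum(used[i * size:(i + 1) * size]) / size for i in range(n_batches)]
    var_of_batch_means = sum((m - overall) ** 2 for m in means) / (n_batches - 1)
    return math.sqrt(var_of_batch_means / n_batches)


def ar1_chain(n, rho, rng):
    """Toy autocorrelated chain standing in for MCMC output."""
    x, out = 0.0, []
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


rng = random.Random(1)
chain = ar1_chain(20_000, 0.9, rng)
se = batch_means_se(chain)
print(f"mean={sum(chain) / len(chain):.3f}  MCSE={se:.3f}")
```

A run-length rule in this spirit would keep sampling until the standard error, or a confidence interval half-width built from it, falls below a tolerance chosen in advance.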

This talk is based on joint work with Brian Caffo (Johns Hopkins University), Galin Jones and Ronald Neath (University of Minnesota), and Luke Tierney (University of Iowa).

Note: For a complete list of upcoming seminars check the department's seminar web site: http://www.gwu.edu/~stat/seminar.htm. The campus map is at: http://www.gwu.edu/~map/. The contact person is Kaushik Ghosh, Department of Statistics. Email: ghosh@gwu.edu, phone: 202-994-6889.

Title: Agent-based Simulation Using Social Network Models of Alcohol Systems with Acute Outcomes

  • Speaker: Yasmin H. Said
    Department of Applied Mathematics and Statistics
    Johns Hopkins University
  • Time: February 10, 2006 -- 10:30 a.m. Refreshments, 10:45 a.m. Colloquium Talk
  • Location: Innovation Hall, Room 139, Fairfax Campus. George Mason University, 4400 University Drive, Fairfax, VA 22030
  • Sponsor: George Mason University Statistics Colloquium

Abstract:

This research investigates a prototype of a model framework for the use and abuse of alcohol. The model is intended to provide a tool for the assessment of interventions that are meant to minimize alcohol-related acute outcomes (intentional and unintentional injuries/death) without causing a financial or social burden and without imposing interventions that are ultimately ineffective (or even simply not cost effective). Our framework is ecological (individual agents and interactions are represented), stochastic (neither individual behavior nor consequences of interventions are certain) and very flexible. We have developed a space dependent stochastic digraph model of alcohol use and abuse. The intent is to study potential interventions and investigate their effectiveness at reducing the overall prevalence of acute outcomes. Current interventions focus on one outcome at a time rather than simultaneously considering all outcomes. It is clear that a similar model structure of social networks can be applied to terrorist networks, to computer networks, to syndromic surveillance, and to other applications that are characterized by requiring interventions for the simultaneous suppression of acute outcomes.

The Statistics Colloquium Series is open to all and is sponsored by the Department of Applied and Engineering Statistics, the Center for Computational Statistics, the School of Computational Sciences and the Data Sciences Program at George Mason University. Use these links for directions and a campus map. If driving, visitors should use the visitor's parking area in the Parking Deck (near the middle of the map). Signs on campus point the way to the Parking Deck. Visitors using Metro can take a bus from the Vienna Metro Station.

Title: Regression fractional hot deck imputation

  • Speaker: Professor Jae-Kwang Kim, Dept. of Applied Statistics, Yonsei University, Korea
  • Time And Place: Thursday, February 16, 2006, 3:30pm, Room 1313, Math Building.
  • Sponsor: University of Maryland, Statistics Program

Abstract:

Imputation using a regression model is a method to preserve the correlation among variables and to provide imputed point estimators. We discuss the implementation of regression imputation using fractional imputation. By a suitable choice of fractional weights, the fractional regression imputation can take the form of hot deck fractional imputation, thus no artificial values are constructed after the imputation. A variance estimator, which extends the method of Kim and Fuller (2004, Biometrika), is also proposed. By a suitable choice of imputation cells, the proposed estimators can be made robust against the failure of the assumed regression imputation model. Comparisons based on simulations are presented.

Note: For a complete list of upcoming seminars check the department's seminar web site: http://www.math.umd.edu/statistics/seminar.shtml. Directions to the campus are at http://www.math.umd.edu/contact/.

Title: Indirect Monetary Incentives with a Complex Agricultural Establishment Survey

  • Chair: Diane Willimack, Census Bureau
  • Speakers: Dan Beckler and Kathy Ott, USDA, National Agricultural Statistics Service
  • Discussant: Danna Moore, Washington State University
  • Date/Time: Wednesday, February 22, 2006 / 12:30 - 2 pm
  • Location: Bureau of Labor Statistics, Conference Center Room 4. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Methodology Section, WSS

Abstract:

The United States Department of Agriculture's National Agricultural Statistics Service (NASS) conducts several agricultural surveys. One of the most complex is the annual Agricultural Resource Management Survey (ARMS) Phase III, a survey that collects detailed economic data such as farm assets, operator expenses, farm income, debt, and operator characteristics. Part of this survey's sample uses a sixteen-page questionnaire with mail-out/mail-back data collection and face-to-face nonresponse follow-up.

Both prepaid and promised indirect monetary incentives were used in survey year 2004 in order to increase mail response rates and reduce costly face-to-face follow-up interviews. Five treatment groups, including a control group, were used for the incentive experiment. Prepaid and promised indirect cash incentives in the form of $20 automated teller machine (ATM) cards and priority mail were used as stimuli. Response rates, ATM card usage, and costs for the treatment groups will be presented. In addition, recommendations and future use of incentives at NASS will be discussed.

Topic: Usability Engineering: Can-Do Processes for Software Development in the Federal Workplace

  • Speaker: Theresa A. O'Connell, President, Humans and Computers, Inc., Upper Marlboro, Maryland
  • Date/Time: February 22, 2006, 10:30 - 11:30 a.m.
  • Location: U.S. Census Bureau, the Morris Hansen Auditorium, Building 3, 4700 Silver Hill Road, Suitland, Maryland. Please call (301) 763-4974 to be placed on the visitors' list. A photo ID is required for security purposes.
  • Sponsor: U.S. Bureau Of Census, Statistical Research Division

Abstract:

Once almost an afterthought, usability engineering (UE) is taking its place as an integral part of software development. Theresa A. (Teri) O'Connell explains what usability and UE are. She demonstrates how UE has progressed from its roots in human factors and computer science to become a set of multi-disciplinary-based processes that address the needs of both software users and software providers. She smashes usability myths and presents the reality of UE as a set of beneficial, can-do processes for the Federal workplace.

Metrics-based usability testing is only the tip of the iceberg. UE starts during project planning and contributes value through delivery. Learn how UE can integrate with software development lifecycles in the Federal workplace. See how, as steps in a software development project, usability engineering processes can prevent expensive retrofitting, promote on-time delivery of quality products and contribute to user satisfaction.

This seminar is physically accessible to persons with disabilities. For TTY callers, please use the Federal Relay Service at 1-800-877-8339. This is a free and confidential service. To obtain Sign Language Interpreting services/CART (captioning real time) or auxiliary aids, please send your requests via e-mail to EEO Interpreting & CART: eeo.interpreting.&.CART@census.gov or TTY 301-457-2540, or by voice mail at 301-763-2853, then select #2 for EEO Program Assistance.

Title: Model Selection and Inference: Facts and Fiction

  • Speaker: Professor Hannes Leeb, Statistics Department, Yale University
  • Time and Date: Friday, February 24, 2006, 3:00 pm
  • Location: 2205 Lefrak Hall, University of Maryland, College Park, MD 20742
  • Sponsor: University of Maryland, Statistics Program

Abstract:

Model selection has an important impact on subsequent inference. Ignoring the model selection step leads to invalid inference. We discuss some intricate aspects of data-driven model selection that do not seem to have been widely appreciated in the literature. We debunk some myths about model selection, in particular the myth that consistent model selection has no effect on subsequent inference asymptotically. We also discuss an 'impossibility' result regarding the estimation of the finite-sample distribution of post-model-selection estimators.

Note: For a complete list of upcoming seminars check the department's seminar web site: http://www.math.umd.edu/statistics/seminar.shtml. Directions to the campus are at http://www.math.umd.edu/contact/.

Title: Distributions in the Ehrenfest Process

  • Speaker: Professor Srinivasan Balaji
    Department of Statistics
    George Washington University
  • Date: Friday, February 24, 2006
  • Time: 4:00 pm - 5:00 pm
  • Location: 1957 E Street, room B16. Foggy Bottom metro stop on the Blue and Orange lines.
  • Sponsor: The George Washington University, Department of Statistics

Abstract:

In this talk we will discuss recent results on Ehrenfest processes, which are special cases of Polya processes. A Polya process is obtained by embedding a discrete Polya urn process in continuous time. Polya processes give us the best heuristics to approximate and understand the intricate Polya urn models, as the discrete process is in general difficult to deal with. Recent work has shown that a class of partial differential equations, related to the corresponding ball addition matrix, governs such processes. However, only in some cases are these partial differential equations amenable to asymptotic solution. After describing the general Polya process, we will restrict our attention to a tenable class of urns that generalize the classical Ehrenfest model. Ehrenfest urns arise in applications to model the exchange of particles in two connected gas chambers. Finally, we conclude with some remarks about the connections to pseudo expectation of Markov chains.

The work on Ehrenfest processes is a joint work done with Prof. Hosam Mahmoud in the department of Statistics, GWU, and Prof. Osamu Watanabe of Tokyo Institute of Technology.
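As a concrete illustration of the classical Ehrenfest urn mentioned in the abstract, the following minimal sketch simulates the discrete-time model: a fixed number of particles is split between two chambers and, at each step, one particle chosen uniformly at random moves to the other chamber. The function name, parameters, and seed are assumptions made for this illustration. The sketch does not implement the continuous-time Polya process or the generalized urns discussed in the talk.

```python
import random


def simulate_ehrenfest(n_particles, n_steps, start_left, seed=0):
    """Simulate the classical discrete-time Ehrenfest urn.

    Returns the list of particle counts in the left chamber,
    including the starting state (length n_steps + 1).
    """
    if not 0 <= start_left <= n_particles:
        raise ValueError("start_left must lie between 0 and n_particles")
    rng = random.Random(seed)  # local generator keeps runs reproducible
    left = start_left
    path = [left]
    for _ in range(n_steps):
        # A uniformly chosen particle sits in the left chamber with
        # probability left / n_particles; it then switches chambers.
        if rng.random() < left / n_particles:
            left -= 1
        else:
            left += 1
        path.append(left)
    return path


# Hypothetical usage: start with all 20 particles in the left chamber.
path = simulate_ehrenfest(n_particles=20, n_steps=1000, start_left=20)
print("mean occupancy of left chamber:", sum(path) / len(path))
```

Over a long run the occupancy tends to fluctuate around half of the particles, which is the equilibrium behavior the gas-chamber interpretation suggests.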

Note: For a complete list of upcoming seminars check the department's seminar web site: http://www.gwu.edu/~stat/seminar.htm. The campus map is at: http://www.gwu.edu/~map/. The contact person is Kaushik Ghosh, Department of Statistics. Email: ghosh@gwu.edu, phone: 202-994-6889.

Return to top

Title: The Proportional Odds Model for Assessing Rater Agreement with Multiple Modalities

  • Speaker: Elizabeth Garrett-Mayer, Ph.D.
  • Date/Time: Wednesday, March 1, 2006 / 11:00 am - 12:00 noon
  • Location: Executive Plaza North, Conference Room G. Address: 6130 Executive Blvd, Rockville MD, 20852 (http://www-dceg.ims.nci.nih.gov/images/localmap.gif). Contact: the Office of Preventive Oncology, 301-496-8640
  • Sponsor: WSS Biostatistics/Public Health Section

Abstract:

I develop a model for evaluating an ordinal rating system where I assume that the true underlying disease state is continuous in nature. This approach is motivated by a dataset with 35 microscopic slides with 35 representative duct lesions of the pancreas. Each of the slides was evaluated by eight raters using two novel rating systems (PanIN illustrations and PanIN nomenclature), where each rater used each system to rate the slide with slide identity masked between evaluations. I find that the two methods perform equally well but that differentiation of higher grade lesions is more consistent across raters than differentiation across raters for lower grade lesions.

A proportional odds model is assumed, which allows us to estimate rater-specific thresholds for comparing agreement. In this situation where there are two methods of rating, it can be determined whether the two methods have the same thresholds and whether or not raters perform equivalently across methods. Unlike some other model-based approaches for measuring agreement, the focus is on the interpretation of model parameters and their scientific relevance. Posterior estimates of rater-specific parameters are compared across raters to see if they are implementing the intended rating system in the same manner. Estimated standard deviation distributions are used to make inferences as to whether raters are consistent and whether there are differences in rating behaviors in the two rating systems under comparison.
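For readers unfamiliar with the model, a generic proportional odds formulation with rater-specific thresholds can be written as below. The notation is assumed here for illustration and is not taken from the talk: Y_{ij} is the ordinal rating of slide i by rater j on a K-category scale, z_i is the continuous latent disease state of slide i, and \theta_{jk} are the thresholds of rater j.

```latex
\operatorname{logit} \Pr(Y_{ij} \le k \mid z_i) = \theta_{jk} - z_i,
\qquad k = 1, \dots, K - 1,
\qquad \theta_{j1} < \theta_{j2} < \dots < \theta_{j,K-1}
```

Under a form like this, comparing the threshold vectors of two raters (or of one rater under two rating systems) indicates whether they divide the latent scale in the same way.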

Return to top

Title: Statistical Applications in FDA

  • Speaker: Guoxing (Greg) Soon, Ph.D., Food & Drug Administration, CDER
  • Time and Date: Thurs., March 2, 2006, 3:30pm
  • Location: Room 1313, Math Bldg, University of Maryland, College Park, MD 20742
  • Sponsor: University of Maryland, Statistics Program

Abstract:

This talk will be divided into three parts. First, I will briefly describe the kind of work FDA statisticians do. Then I will discuss two topics: "From Intermediate endpoint to final endpoint: a conditional power approach for accelerated approval and interim analysis" and "Computer Intensive and Re-randomization Tests in Clinical Trials".

1. Statistical Issues in FDA

Statistics plays an important role in the FDA's decision-making process. Statistical input is critical to the design, conduct, analysis and interpretation of clinical trials. The statistical issues we deal with include, but are not limited to, the following: appropriateness of randomization procedure, determination of analysis population, blinding, potential design flaws that may lead to biases, quality of endpoint assessment, interim analysis, information handling, missing values, discontinuations, decision rule, analysis methods, and interpretation. In this talk I will describe the type of work we do with a few examples.

2. From Intermediate endpoint to final endpoint: a conditional power approach for accelerated approval and interim analysis

For chronic and life-threatening diseases, the clinical trials required for final FDA approval may take a long time. It is therefore sometimes necessary to approve the drug temporarily (accelerated approval) based on early surrogate endpoints. Traditionally such approvals were based on the same requirements for the surrogate endpoint as if it were the final endpoint, regardless of the quality of the surrogacy. However, in this case the longer-term information on some patients is ignored, and the risk of eventual failure at final approval is not considered.

In contrast, in typical group sequential trials, only information on the final endpoint for a fraction of patients is used, and short-term endpoints on other patients are ignored. This reduces the efficiency of inferences and also fails to account for a potential shift of population over the course of the trial.

In this talk I will propose an approach that utilizes both the short-term surrogate and the long-term final endpoint at interim or intermediate analyses. The decision to terminate the trial early, or to grant temporary approval, will be based on the likelihood of seeing a successful trial were the trial to be completed. Issues of Type I error control as well as efficiency of the procedure will be discussed.

3. Computer Intensive and Re-randomization Tests in Clinical Trials

Quite often clinicians are concerned about balancing important covariates at baseline. Allocation methods designed to achieve deliberate balance on baseline covariates, commonly called dynamic allocation or minimization, are used for this purpose. This non-standard allocation poses a challenge for the common statistical analysis. In this talk I will examine the robustness of level and power of common tests with deliberately balanced assignments when the assumed distribution of responses is not correct.

There are two methods of testing with such allocations: computer intensive and model based. I will review some of the common mistaken attitudes about the goals of randomization. I will also discuss some simulations that attempt to explore the operating characteristics of re-randomization and model-based analyses when model assumptions are violated.

Note: For a complete list of upcoming seminars check the department's seminar web site: http://www.math.umd.edu/statistics/seminar.shtml. Directions to the campus are at http://www.math.umd.edu/contact/.

Return to top

Title: Andrews Image: An Alternative Visualization of Andrews' Curves

  • Speaker: Wendy Martinez, Office of Naval Research
  • Date: March 3, 2006
  • Time: 10:30 a.m. Refreshments, 10:45 a.m. Colloquium Talk
  • Location: Innovation Hall, Room 136, Fairfax Campus, George Mason University, 4400 University Drive, Fairfax, VA 22030
  • Sponsor: George Mason University CDS/CCDS/Statistics Colloquium

Abstract:

Andrews' curves are a way of visualizing high-dimensional data. Each observation is projected onto a set of orthogonal trigonometric functions and displayed as a curve. It is known that Andrews' curves preserve distances, so they have many uses for data analysis and exploration. However, they are not very useful when working with large data sets because of over plotting. In this talk, I present an alternative visualization methodology suitable for Andrews' curves that is based on a technique sometimes known as data images. This new visualization methodology is most useful when the size of the data set is large. I first describe the data sets that are used to illustrate the concepts. I then present some background information on Andrews' curves, as well as data images. Finally, I provide examples and show how this technique can be used to explore the data sets.
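To make the construction concrete, here is a minimal sketch of the standard Andrews curve, f_x(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + ..., evaluated on a grid of angles so that each observation becomes one row of a matrix. Displaying that matrix as a heat map is one way to form a data image. The function names and the grid size are assumptions for illustration, and this is not necessarily the construction used in the talk.

```python
import math


def andrews_curve(x, t):
    """Evaluate the Andrews curve of observation x at angle t.

    f_x(t) = x1/sqrt(2) + x2*sin(t) + x3*cos(t) + x4*sin(2t) + ...
    """
    value = x[0] / math.sqrt(2)
    for j, coef in enumerate(x[1:], start=1):
        freq = (j + 1) // 2                      # frequencies 1, 1, 2, 2, ...
        trig = math.sin if j % 2 == 1 else math.cos
        value += coef * trig(freq * t)
    return value


def andrews_image(data, n_points=64):
    """Return one row per observation and one column per grid angle
    in [-pi, pi); plotting the rows as a heat map gives a data image."""
    grid = [-math.pi + 2 * math.pi * k / n_points for k in range(n_points)]
    return [[andrews_curve(x, t) for t in grid] for x in data]


# Hypothetical usage with two four-dimensional observations.
image = andrews_image([[1.0, 2.0, -1.0, 0.5], [0.0, 1.0, 3.0, -2.0]])
print(len(image), "rows x", len(image[0]), "columns")
```

The distance-preservation property mentioned in the abstract shows up here as follows: the integral over [-pi, pi] of the squared difference between two curves equals pi times the squared Euclidean distance between the two observations.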

Note: Colloquia on various topics in statistics are presented on Friday mornings at 10:30. The colloquia are open to all, and registration is not required. A list of current and past colloquia is available at http://www.science.gmu.edu/~jgentle/compstat/colloquium.htm. Directions to the campus are available at http://www.gmu.edu/welcome/Directions-to-GMU.html.

Return to top

Topic: Survey Participation as Prosocial Behavior

  • Speaker: C. Daniel Batson, Professor of Social Psychology, University of Kansas
  • Date/Time: March 7, 2006, 10:30 a.m. - 11:30 a.m.
  • Location: U.S. Bureau of Census, 4401 Suitland Road, Room 3225/4, Suitland, Maryland. Please call (301) 763-4974 to be placed on the visitors' list. A photo ID is required for security purposes.
  • Sponsor: U.S. Bureau Of Census, Statistical Research Division

Abstract:

Why do people participate in surveys? Survey participation has typically been thought of as a form of compliance, often in response to persuasion. Might participation instead be framed as a prosocial act? Such a frame prompts one to consider possible prosocial motives that might lead one to participate. It also prompts one to consider the possible dangers of inducing participation with persuasion. Especially in the long term, a prosocial frame may produce better results.

This is the third in a series of scholarly exchanges and seminars at the Census Bureau on interviewer-respondent interaction sponsored by the Statistical Research Division. For more details, please contact Andy Jocuns (301-763-2726).

This seminar is physically accessible to persons with disabilities. For TTY callers, please use the Federal Relay Service at 1-800-877-8339. This is a free and confidential service. To obtain Sign Language Interpreting services/CART (captioning real time) or auxiliary aids, please send your requests via e-mail to EEO Interpreting & CART: eeo.interpreting.&.CART@census.gov or TTY 301-457-2540, or by voice mail at 301-763-2853, then select #2 for EEO Program Assistance.

Return to top

Title: Small Area Estimation using Multiple Surveys

  • Chair: Donald Malec, U.S. Bureau of the Census
  • Speaker: William W. Davis, National Cancer Institute, NIH
  • Discussant: Robin Fisher, U.S. Bureau of the Census
  • Date/Time: Thursday, March 9, 2006 / 12:30 to 2 pm
  • Location: Bureau of Labor Statistics, Conference Center in G440. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Methodology Section, WSS

Abstract:

Cancer surveillance research requires estimates of the prevalence of cancer risk factors and screening for small areas such as counties. We make small area estimates utilizing information from the Behavioral Risk Factor Surveillance System (BRFSS), a telephone survey conducted by state agencies, and the National Health Interview Survey (NHIS), an area probability sample survey conducted through face-to-face interviews. Both data sources have advantages and disadvantages. The BRFSS is a larger survey, but it has lower response rates than the NHIS, and it does not include subjects who live in households with no telephones. On the other hand, the NHIS is a smaller survey, but it includes both telephone and non-telephone households and has higher response rates than the BRFSS.

We combine the information from the two surveys to address both non-response and non-coverage errors using the following two methods:

  • A hierarchical Bayesian approach using the Markov Chain Monte Carlo (MCMC) method is used to simulate draws from the posterior distribution. This approach utilizes NHIS county-level identifiers that are not available in the public use file.

  • A statistical weight modification approach where propensity scores are used to adjust BRFSS weights so that the propensity score distribution among the BRFSS cases approximates that of the NHIS. This approach is carried out using the NHIS public use file.

Prevalence estimates with standard errors are made for selected geographical areas (counties and states), selected time periods (1997-1999 and 2000-2003), and selected binary outcomes (smoking and cancer screening). Multi-year periods are used to obtain more accurate small area estimates. The estimates made using both surveys are compared with the (single-source) BRFSS direct estimates.

Return to top

Title: On local Minimax Estimation With Some Consequences For Ridge Regression, Tree Learning and Reproducing Kernel Methods

  • Speaker: Professor Lee K. Jones, Department of Mathematical Sciences, University of Massachusetts Lowell
  • Time and Date: Thurs., March 9, 2006, 3:30pm
  • Location: Room 1313, Math Bldg, University of Maryland, College Park, MD 20742
  • Sponsor: University of Maryland, Statistics Program

Abstract:

Local learning is the process of determining the value of an unknown function at only one fixed query point based on information about the values of the function at other points. We propose an optimal methodology (local minimax estimation) for local learning of functions with band-limited ranges which differs from (and is demonstrated in many interesting cases to be superior to) several popular local and global learning methods. In this theory the objective is to minimize the (maximum) prediction error at the query point only - rather than minimize some average performance over the entire domain of the function. Since different compute-intensive procedures are required for each different query, local learning algorithms have only recently become feasible due to the advances in computer availability, capability and parallelizability of the last two decades.

In this talk we first apply local minimax estimation to linear functions. A rotationally invariant approach yields ridge regression, the ridge parameter and optimal finite sample error bounds. A scale invariant approach similarly yields best error bounds but is fundamentally different from either ridge or lasso regression. The error bounds are given in a general form which is valid for approximately linear target functions.

Using these bounds an optimal local aggregate estimator is derived from the trees in a Breiman (random) forest or a deterministic forest. Finding the estimator requires the solution to a challenging large dimensional non-differentiable convex optimization problem. Some approximate solutions to the forest optimization are given for classification using micro-array data.

Finally the theory is applied to reproducing kernel Hilbert space and an improved Tikhonov estimator for probability of correct classification is presented along with a proposal for local determination of optimal kernel shape without cross validation.

Note: For a complete list of upcoming seminars check the department's seminar web site: http://www.math.umd.edu/statistics/seminar.shtml. Directions to the campus are at http://www.math.umd.edu/contact/.

Return to top

Title: The State of Record Linkage

  • Chair: Marianne Winglee, Westat
  • Speaker: William E. Winkler, U.S. Bureau of the Census
  • Discussant: Charles Day, Internal Revenue Service
  • Date/Time: Wednesday, March 15, 2006 / 12:30 to 2 pm
  • Location: Bureau of Labor Statistics, Conference Center in G440. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Methodology Section, WSS

Abstract:

This talk will provide an overview of record linkage (Fellegi & Sunter JASA 1969) and describe a series of research problems. Although most of the research is in the computer science literature (http://csaa.byu.edu/kdd03cleaning.html - using statistical models of machine learning - and http://iqis.irisa.fr - using database methods), some of the most difficult problems are primarily statistical (see Belin & Rubin JASA 1995; Scheuren & Winkler Survey Methodology 1993, 1997; Larsen & Rubin JASA 2001, Lahiri & Larsen JASA 2005 for initial progress). This talk describes methods of string comparison (Yancey 2003, 2005; Cohen et al. 2003 - that sometimes apply Hidden Markov models), various methods of data extraction and standardization (Borkar et al. ACM SIGMOD 2001, Cohen & Sarawagi KDD 2004, Agichtein & Ganti KDD 2004 again using Hidden Markov models), and various methods of estimating error rates and adjusting statistical analyses for linkage error. It will describe beginning research on Parallel BigMatch (Yancey, Winkler, and Creecy 2006 in progress) that is hoped to be 30+ times as fast as the uniprocessor BigMatch. Current BigMatch (130,000 pairs per second) is designed for matching moderate-size files having 300 million records against large administrative files having upwards of 4 billion records.
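The Fellegi-Sunter framework cited above scores a candidate record pair by summing field-level log-likelihood-ratio weights and comparing the total with two cutoffs. The sketch below illustrates that scoring step only. The field names, the m- and u-probabilities, and the thresholds are hypothetical values chosen for illustration, and the exact-equality comparison stands in for the approximate string comparators the talk covers. It is not BigMatch.

```python
import math

# Hypothetical parameters for illustration only:
# m = P(field agrees | records are a true match)
# u = P(field agrees | records are not a match)
FIELD_PARAMS = {
    "surname": (0.95, 0.01),
    "birth_year": (0.90, 0.05),
    "zip_code": (0.85, 0.10),
}


def match_weight(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum agreement and disagreement weights over the comparison fields."""
    total = 0.0
    for field, (m, u) in params.items():
        if rec_a[field] == rec_b[field]:
            total += math.log2(m / u)              # agreement weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight
    return total


def classify(weight, lower=0.0, upper=8.0):
    """Three-way decision rule with hypothetical cutoffs."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"


# Hypothetical usage: the records differ only in ZIP code.
a = {"surname": "smith", "birth_year": 1970, "zip_code": "20742"}
b = {"surname": "smith", "birth_year": 1970, "zip_code": "20852"}
print(classify(match_weight(a, b)))
```

Pairs that fall between the two cutoffs are the ones usually sent to clerical review, which is why estimating the error rates of the rule matters for downstream analyses.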

Return to top

Title: Applying the Capability Maturity Model in Statistical Organization

  • Speaker: John M. Bushery, U.S. Census Bureau
  • Discussant: Sean Curran, Bureau of Labor Statistics
  • Chair: Eugene Burns, Bureau of Transportation Statistics
  • Date/Time: Wednesday, March 22, 2006, 12:30-2:00 PM
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Take the Red Line to Union Station.
  • Sponsor: WSS Quality Assurance and Physical Sciences Section

Abstract:

Statistical principles and tools play a key role in quality management methodologies, such as Total Quality Management (TQM), Six Sigma, and the like. Many corporations have used these management methods and statistical tools to improve efficiency, lower costs, reduce cycle times, and increase customer satisfaction. Ironically, statistical organizations generally appear reluctant to adopt quality management methods or apply statistical tools to most aspects of their own work.

About a decade ago, TQM was "all the rage" and several statistical organizations attempted to implement it. However, for the most part, TQM "did not stick."

This paper introduces yet another model for quality management, Capability Maturity Model Integration, and explains why the chances for success are higher with this methodology than with TQM.

Return to top

Title: Estimating Drug Use Prevalence Using Latent Class Models with Item Count Response as One of the Indicators

  • Chair: Dean H. Judson, U.S. Bureau of the Census
  • Speaker: Paul Biemer, RTI International
  • WebPage: http://www.rti.org/experts.cfm?objectid=6E703887-343D-4D32-8DDA0F933AA1A886
  • Discussant: Douglas Wright, Substance Abuse and Mental Health Services Administration
  • Date/Time: Tuesday, March 28, 2006 / 12:30 - 2 p.m.
  • Location: Bureau of Labor Statistics, Conference Center in G440. To be placed on the seminar list attendance list at the Bureau. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Methodology Section, WSS

Abstract:

The item count (IC) method for estimating the prevalence of sensitive behaviors was applied to the National Survey on Drug Use and Health (NSDUH) to estimate the prevalence of past year cocaine use. Despite considerable effort and research to refine and adapt the IC method to this survey, the method failed to produce estimates that were any larger than the estimates based on self-reports. Further analysis indicated the problem to be measurement error in the IC responses. To address the problem, a new model-based estimator was proposed to correct the IC estimates for measurement error and produce less biased prevalence estimates. The model combines the IC data, replicated measurements of the IC items, and responses to the cocaine use question to obtain estimates of the classification error in the observed data. The data were treated as fallible indicators of (latent) true values and traditional latent class analysis assumptions were made to obtain an identifiable model. The resulting estimates of the cocaine use prevalence were approximately 43 percent larger than the self-report only estimates and the estimated underreporting rates were consistent with those estimated from other studies of drug use underreporting.
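For context, the basic (uncorrected) item count estimator works as follows: one random half-sample reports how many items apply from a list that includes the sensitive item, the other half-sample reports the count from the same list without it, and the difference in mean counts estimates prevalence. The sketch below shows only this basic estimator with made-up counts. The function name and data are hypothetical, and the latent class correction proposed in the talk is not shown.

```python
from statistics import mean


def item_count_estimate(counts_with_sensitive, counts_without_sensitive):
    """Basic item count prevalence estimate: difference in mean counts
    between the long-list group and the short-list group."""
    return mean(counts_with_sensitive) - mean(counts_without_sensitive)


# Made-up item counts for two random half-samples (illustration only).
long_list_counts = [2, 1, 3, 2, 2, 1, 3, 2]
short_list_counts = [2, 1, 2, 2, 2, 1, 3, 2]
print(item_count_estimate(long_list_counts, short_list_counts))
```

Because the estimate is a difference of two noisy means, reporting error in the counts feeds directly into the prevalence estimate, which is consistent with the measurement error problem described in the abstract.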

Return to top

Topic: Introductory Expertise and Tailoring in the Opening Moments of Telephone Requests for Survey Participation

  • Speaker: Douglas W. Maynard, Professor, Department of Sociology, University of Wisconsin
  • Date/Time: March 29, 2006, 10:30 - 11:30 a.m.
  • Location: U.S. Bureau of Census, 4401 Suitland Road, Room 3225/4, Suitland, Maryland. Please call (301) 763-4974 to be placed on the visitors' list. A photo ID is required for security purposes.
  • Sponsor: U.S. Bureau Of Census, Statistical Research Division

Abstract:

In recent times, the difficulty of reaching potential respondents to request participation in survey interviews has increased, and these respondents are more likely to say they do not have time and/or that they are not interested. In this presentation, I show the initial phase of a collaborative investigation aimed to improve response rates for computer-aided telephone surveys (CATIs). We apply the well-developed procedures of conversation analysis to the recordings of CATI openings to specify and measure the effectiveness of efforts at recruitment to the survey. The main issues are (a) introductory expertise and (b) tailoring. For (a) we focus on interviewers' ways of configuring an opening--the initial request for participation--for facets such as identification, greeting, listening for response and so on. For (b) we identify how interviewers answer questions and deal with issues that call recipients may raise when saying they are not interested or otherwise beginning to decline the request. I will use audio recordings to illustrate these matters, and briefly discuss plans for future research in which we will develop a coding scheme for measuring introductory expertise and tailoring, and code a large sample of initial contacts as well as refusal conversion attempts. Using the coding results, we will conduct logistic regression analyses to determine whether and to what extent interviewer practices increase participation rates and whether practices increase participation rates for some sample subgroups more than others. Our overall ambition is to better understand the moment-to-moment contingencies involved in obtaining survey participation so as to develop implications for experimental research and training protocols for interviewers.

This is the fourth in a series of scholarly exchanges and seminars at the Census Bureau on interviewer-respondent interaction sponsored by the Statistical Research Division. For more details, please contact Andy Jocuns (301-763-2726).

This seminar is physically accessible to persons with disabilities. For TTY callers, please use the Federal Relay Service at 1-800-877-8339. This is a free and confidential service. To obtain Sign Language Interpreting services/CART (captioning real time) or auxiliary aids, please send your requests via e-mail to EEO Interpreting & CART: eeo.interpreting.&.CART@census.gov or TTY 301-457-2540, or by voice mail at 301-763-2853, then select #2 for EEO Program Assistance.

Return to top

Title: Statistical Issues in Large-Scale Genetic Association Studies

  • Speaker: Professor Eleanor Feingold
    Department of Human Genetics
    University of Pittsburgh
  • Time: 11:00 a.m. - 12:00 noon
  • Date: March 31, 2006
  • Location: 1957 E street, room B16. Foggy Bottom metro stop on the blue and orange line.
  • Sponsor: The George Washington University, Department of Statistics

Abstract:

New genomic technologies are now making it possible (if not quite affordable) to conduct genetic association studies on a very large scale - up to half a million genetic markers spanning the genome. In some ways these studies are quite statistically routine, but there are also a number of new challenges for biostatisticians in both the design and analysis. This talk is a survey of several areas in which I think there are important statistical problems that have not gotten enough attention. I will give some illustrative research results and discuss some open questions.

Note: For a complete list of upcoming seminars check the department's seminar web site: http://www.gwu.edu/~stat/seminar.htm. The campus map is at: http://www.gwu.edu/~map/. The contact person is Kaushik Ghosh, Department of Statistics. Email: ghosh@gwu.edu, phone: 202-994-6889.

Return to top

Title: An Analysis of a Two-Way Categorical Table Incorporating Intra-Class Correlation

  • Speaker: Jai Choi, Mathematical Statistician, NCHS/ORM
  • Chair: Joe Fred Gonzalez, National Center for Health Statistics (NCHS)
  • Date/Time: Tuesday, April 4, 2006 / 10:00 -11:30 a.m.
  • Location: National Center for Health Statistics, Room 1403. To attend seminars at NCHS, you need to email your name and title of the seminar to JGonzalez@cdc.gov by noon of the work day before the seminar. Bring a photo ID to the seminar. Further instructions for admission will be given upon receipt of your email. NCHS is located at 3311 Toledo Road in Hyattsville, MD. Metro: From Prince Georges Plaza on Green line, take footbridge across East-West Hwy and go one block north on Belcrest, go half block east on Toledo.
  • Sponsor: WSS Public Health and Biostatistics Section and NCHS/ORM

Abstract:

It is straightforward to analyze data from a single multinomial table. Specifically, for the analysis of a two-way categorical table, the common chi-squared test of independence between the two variables and maximum likelihood estimators of the cell probabilities are readily available. When the counts in the two-way categorical table are formed from familial data (clusters of correlated data), the common chi-squared test no longer applies. We note that there are several approximate adjustments to the common chi-squared test. For example, Choi and McHugh (1989, Biometrics, 45, 979-996) showed how to adjust the chi-squared statistic for clustered and weighted data. However, our main contribution is the construction and analysis of a Bayesian model which removes all analytical approximations. This is an extension of a standard multinomial-Dirichlet model to include the intra-class correlation associated with the individuals within a cluster. We have used a key formula described by Altham (1976, Biometrika, 63, 263-269) to incorporate the intra-class correlation. This intra-class correlation varies with the size of the cluster, but we assume that it is the same for all clusters of the same size for the same variable. We use Markov chain Monte Carlo methods to fit our model, and to make posterior inference about the intra-class correlations and the cell probabilities. Also, using Monte Carlo integration with a binomial importance function, we obtain the Bayes factor for a test of no association. To demonstrate the performance of the alternative test and estimation procedure, we have used data on activity limitation status and age from the National Health Interview Survey and a simulation study.
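As a baseline, the common chi-squared test of independence referred to in the abstract can be sketched as below. This is the ordinary Pearson statistic, which assumes independent observations. It is shown only as the starting point that the abstract says breaks down for clustered data, and it does not implement the Bayesian model or any cluster adjustment. The function name is an assumption, and the sketch assumes every row and column total is positive.

```python
def pearson_chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for a
    two-way table of counts given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical 2 x 2 table of counts.
stat, dof = pearson_chi_squared([[10, 20], [20, 10]])
print(stat, dof)
```

With positively correlated observations inside clusters, the effective sample size is smaller than the raw count n, so the unadjusted statistic tends to overstate the evidence against independence.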

Return to top

Title: Conversational Practices with a Purpose: Interaction within the Standardized Interview

  • Speaker: Nora Cate Schaeffer, Prof. of Sociology, University of Wisconsin
  • Discussants: Frederick Conrad, University of Michigan & JPSM and Betsy Martin, Census Bureau
  • Time and Date: Friday, April 7, 2006, 3:30pm
  • Location: 2205 Lefrak Hall, University of Maryland, College Park, MD 20742
  • Contact: Rupa Jethwa Eapen, 301-314-7911, rjeapen@survey.umd.edu

Abstract:

The lecture will discuss interactions in survey interviews and standardization as it is actually practiced. An early view of the survey interview characterized it as a "conversation with a purpose," and this view was later echoed in the description of survey interviews as "conversations at random." In contrast to these informal characterizations of the survey interview stand the formal rules and constraints of standardization as they have developed over several decades. Someplace in between a "conversation with a purpose" and a perfectly implemented standardized interview are the actual practices of interviewers and respondents as they go about their tasks. Most examinations of interaction in the survey interview have used standardization as a starting point and focused on how successfully standardization has been implemented, for example by examining whether interviewers read questions as worded. However, as researchers have looked more closely at what interviewers and respondents do, they have described how the participants import into the survey interview conversational practices learned in other contexts. As such observations have accumulated, they provide a vehicle for considering how conversational practices might support or undermine the goals of measurement within the survey interview. Our examination of recorded interviews from the Wisconsin Longitudinal Study provides a set of observations to use in discussing the relationship among interactional practices, standardization, and measurement.

Return to top

Title: Using Comparative Genomics to Assess the Function of Noncoding Sequences (NCS)

  • Speaker: Professor Peter Bickel, Department of Statistics, University of California, Berkeley
  • Discussant: Professor Steven Salzberg, University of Maryland
  • Date: Thursday, April 20, 2006
  • Time: 4:15 -- 6:00 PM
  • Reception: 6:00 -- 6:45 PM, Rotunda, Mathematics Building
  • Place: Lecture Hall 1410, Physics Building, University of Maryland, College Park
  • Sponsor: University of Maryland, Statistics Program

Abstract:

We have studied 2094 NCS of length 150-200bp from Edward Rubin's laboratory. These sequences are conserved at high homology between human, mouse, and fugu. Given the degree of homology with fugu, it seems plausible that all or part of most of these sequences is functional and, in fact, there is already some experimental validation of this conjecture. Our goal is to construct predictors of regulation (or potential irrelevance) by the NCS of nearby genes and further using binding sites and the transcription factors that bind to them to deduce some pathway information. One approach is to collect covariates such as features of nearest genes, physical clustering indices, etc., and use statistical methods to identify covariates, select among these for importance, relate these to each other and use them to create stochastic descriptions of the NCS which can be used for NCS clustering and NCS and gene function prediction singly and jointly. Of particular importance so far has been GO term annotation and tissue expression of downstream genes as well as the presence of blocks of binding sites known from the TRANSFAC database in some of the NCS. Our results so far are consistent with those of recent papers engaged in related explorations such as Woolfe et al. (2004), Bejerano et al. (2005) and others but also suggest new conclusions of biological interest.

Parking: Free parking is available from 4:00 PM at Parking Lot XX, and at the Parking Garage. As you enter the campus from Route 1, make the first right and again a first right to reach lot XX. To enter the Parking Garage, turn right at the "Big M", go to the first stop sign, make a left and another left to enter.

The Physics Building is located very near the "Big M" circle, and across from the Parking Garage.

Please visit the UMD Statistics Consortium web site at http://www.statconsortium.umd.edu for updates and details.

For information contact Rupa Jethwa Eapen, 301-314-7911, rjeapen@survey.umd.edu.

Return to top

Title: A Generation of Data: The General Social Surveys, 1972-2006 and Beyond

  • Speaker: Tom Smith, National Opinion Research Center, University of Chicago
  • Chair: Norman Bradburn, NORC, University of Chicago
  • Discussant: Clyde Tucker, Bureau of Labor Statistics
  • Date/Time: Wednesday, April 26, 2006 / 12:30 to 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Take the Red Line to Union Station.
  • Sponsor: WSS Social and Demographic Statistics Section

Abstract:

This presentation will describe the design and structure of the GSS; discuss important findings including major societal trends, cross-national differences, and sub-group analyses of ethno-racial and religious groups; and detail several methodological and substantive innovations that are being introduced in the latest round of GSSs.

Return to top

Title: Stochastic Variants of EM: Monte Carlo, Quasi-Monte Carlo and More

  • Chair: Charles Hallahan, USDA/ERS
  • Speaker: Wolfgang Jank, University of Maryland
  • Discussant: James Gentle, George Mason University
  • Date/Time: Friday, April 28, 2006 / 12:30 to 2 p.m.
  • Location: Bureau of Labor Statistics, Conference Center in G440. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Statistical Computing Section, WSS

Abstract:

Many statistical models involve a combination of observed and unobserved data. Examples include the linear mixed model, the generalized linear mixed model or the hierarchical model. The EM (Expectation-Maximization) algorithm naturally appeals to this situation by iteratively imputing the unobserved data. One of the problems though with EM is that in many contemporary models the expectation step is analytically intractable, leading to integrals that have no closed-form solution. This is especially problematic in situations where the integral is of high dimension. An increasingly popular approach to overcome this problem is to approximate the integral via simulation. This leads to a stochastic EM implementation. In this presentation we review some of the recent advances in this field which include the Ascent-based Monte Carlo EM algorithm (a new automated MCEM version based on EM's famous likelihood ascent property), efficient quasi-Monte Carlo EM versions, and a new automated implementation of the stochastic approximation version of EM. We motivate and illustrate our problem in the context of a geostatistical model for online purchases.

Return to top

Title: A Semiparametric Approach to Time Series Prediction

  • Speaker: Professor Benjamin Kedem
    Department of Mathematics
    University of Maryland
  • Time: 11:00-12:00 noon - April 28, 2006
  • Location: 1957 E Street, Room B16. Foggy Bottom metro stop on the Blue and Orange lines
  • Sponsor: The George Washington University, Departments of Management Science and Statistics

Abstract:

Given m time series regression models, linear or not, with additive noise components, it is shown how to estimate the predictive probability distribution of all the time series conditional on the observed and covariate data at the time of prediction. This is done by a certain synergy argument, assuming that the distributions of the residual components associated with the regression models are tilted versions of a reference distribution. Point predictors are obtained from the predictive distribution as a byproduct. Applications to US mortality rates prediction and to value at risk (VaR) estimation will be discussed.

Note: For a complete list of upcoming seminars check the department's seminar web site: http://www.gwu.edu/~stat/seminar.htm. The campus map is at: http://www.gwu.edu/~map/. The contact person is Reza Modarres, Department of Statistics. Email: Reza@gwu.edu, phone: 202-994-6359.

Return to top

Topic: Assessing the Effects of Variability in Interest Rate Derivative Pricing

  • Speaker: Michael Crotty, North Carolina State University
  • Date/Time: May 1, 2006, 11:00 - 12:00 Noon
  • Location: U.S. Census Bureau, 4401 Suitland Road, Suitland, Maryland, Room 3225/FOB 4. Please call (301) 763-4974 to be placed on the visitors' list. A photo ID is required for security purposes.
  • Sponsor: U.S. Bureau Of Census, Statistical Research Division

Abstract:

Interest rate derivatives are financial instruments similar in spirit to stock options. These derivatives depend on what a particular interest rate is or will be in a certain amount of time. A commonly traded interest rate derivative is an interest rate cap, which pays, at periodic intervals until maturity, an amount equal to a notional amount multiplied by the amount by which a specified interest rate exceeds the initially agreed-upon strike rate. This type of derivative can be analyzed as a series of payment streams called caplets. There are many models available for pricing interest rate derivatives. One such model utilized by this research is the Hull-White model for the short rate, implemented with a trinomial tree. This model can be used to determine pricing for interest rate derivatives at some point in the future. More information regarding interest rate derivatives and various pricing methods can be found in Hull (2003).
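As a rough illustration of the cap payoff described in this abstract (not the speaker's Hull-White pricing code), the following minimal sketch computes the payment of each caplet at its reset date. The function names, the sample rates, and the accrual fraction of 0.25 (quarterly payments) are assumptions made for the example; the abstract itself mentions only the notional amount, the reference rate, and the strike rate.

```python
def caplet_payoff(notional, rate, strike, accrual=0.25):
    """Payoff of a single caplet.

    Pays notional * accrual * (rate - strike) when the reference rate
    exceeds the strike, and nothing otherwise. The accrual fraction
    (0.25 = one quarter of a year) is an assumption for illustration.
    """
    return notional * accrual * max(rate - strike, 0.0)


def cap_payoffs(notional, rates, strike, accrual=0.25):
    """Payoffs of a cap, viewed as a series of caplets, one per period."""
    return [caplet_payoff(notional, r, strike, accrual) for r in rates]


# Hypothetical reference rates observed at three successive reset dates.
payoffs = cap_payoffs(1_000_000, [0.04, 0.05, 0.06], strike=0.05)
print(payoffs)
```

Only the last period pays anything here, because it is the only one in which the observed rate is above the 5% strike. Pricing the cap before those rates are known is the harder problem that the Hull-White tree addresses.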

This analysis of the effects of variation in interest rate derivative pricing is in two parts. First, a bootstrap approach is used to determine the variability in the term structure of the zero rate curve, one of the inputs into the Hull-White trinomial tree. The zero rate is the interest rate that would be earned on a bond that has no intermediate coupon payments and pays face value at maturity. The zero rate curve is modeled with splines using an approach developed by Fisher et al. (1994). This spline approach is then bootstrapped to determine the variability of the spline at various maturities.

The second part of this research deals with propagating the variability of the zero rate curve into the Hull-White pricing model. For this, a bootstrap approach is also used. This bootstrap includes the entire method of derivative pricing, from the calculation of a zero curve using a set of bond prices to inputting that zero curve into the pricing model.

This seminar is physically accessible to persons with disabilities. For TTY callers, please use the Federal Relay Service at 1-800-877-8339. This is a free and confidential service. To obtain Sign Language Interpreting services/CART (captioning real time) or auxiliary aids, please send your requests via e-mail to EEO Interpreting & CART: eeo.interpreting.&.CART@census.gov or TTY 301-457-2540, or by voice mail at 301-763-2853, then select #2 for EEO Program Assistance.

Return to top

Title: Exploiting Sparsity and Within-array Replications in Analysis of Microarray Data

  • Speaker: Professor Jianqing Fan, Department of Operations Research and Financial Engineering, Princeton University
  • Date/Time: Wednesday, May 3, 2006 / 11:00 a.m. - 12:00 noon
  • Location: Lectures are held at NIH's Executive Plaza complex (Executive Plaza North, Conference Room G), 6130 Executive Boulevard, Rockville, Maryland. Pay parking is available.

Abstract:

Normalization of microarray data is essential for coping with experimental variations and revealing meaningful biological results. We have developed a normalization procedure based on the Semi-Linear In-slide Model (SLIM), which objectively adjusts for experimental variations and is applicable to both cDNA microarrays and Affymetrix oligonucleotide arrays. We then present methods for validating the effectiveness of different methods of normalization, using within-array replications. We exploit the sparsity in differentially expressed genes for normalization and analysis of gene expressions. The significance analysis of gene expressions is based on a variation t-statistic. The P-values are estimated based on a sieved permutation, which exploits the sparsity of differentially expressed genes. The use of the newly developed techniques is illustrated in a comparison of the expression profiles of neuroblastoma cells that were suppressed by a growth factor, macrophage migration inhibitory factor.

Return to top

Title: Self-employment and Entrepreneurship: Reconciling Household and Administrative Measures

  • Speakers:
    Melissa Bjelland, Bureau of Labor Statistics, Cornell University and LEHD (Bureau of the Census)
    John Haltiwanger, University of Maryland, NBER and LEHD
    Kristin Sandusky, LEHD
    James Spletzer, Bureau of Labor Statistics
  • Discussant: Katharine Abraham, University of Maryland and NBER
  • Chair: Linda Atkinson, Economic Research Service, USDA
  • Date/time: Thursday, May 4, 2006 / 12:30 to 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center, Room 8. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Take the Red Line to Union Station.
  • Sponsor: WSS Economics Section

Abstract:

Have changes in the economy blurred the boundaries of the population of self-employed, a large and historically difficult segment of the workforce to quantify? To date, household-based surveys such as the Current Population Survey (CPS) have provided the leading sources of information on the self-employed, a substantial group accounting for about 11% of the workforce that operate over 16 million businesses. Yet it is unknown how well respondent reports of self-employment align with information from administrative sources and how disagreements may have changed over time. The increase in outsourcing and hobby businesses has made the standard survey question, "Are you self-employed?" less straightforward to answer. In this paper, we use micro data from the 1995-2001 Annual Social and Economic (March) Supplements of the CPS linked with administrative (tax-based) data from the Social Security Administration's Detailed Earnings Records (DER) and the Census Bureau's Business Register (containing both employer and non-employer businesses). While levels of entrepreneurship are fairly similar in the CPS, DER and Business Register, our initial findings suggest that the datasets do not consistently agree on which workers are self-employed. We find striking levels of misclassification; for example, less than half of the workers who are self-employed in the survey data are also self-employed in the administrative data. To better understand this disparity and to help identify respondent types likely to provide misleading or incorrect information, we characterize these differences over time by worker and job traits that include age, education, and industry. Lastly, we examine possible connections between this mismatch and gaps between household and business-based measures of employment and earnings at various stages of the business cycle.

Return to top

2005 ROGER HERRIOT AWARD

Title: Encouraging Innovation in Government Statistical Agencies: Roger Herriot's Legacy

  • Speaker: Robert E. Fay, U.S. Census Bureau
  • Chair: Lawrence H. Cox, National Center for Health Statistics
  • Date/time: Tuesday, May 9, 2006 / 12:30 to 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Take the Red Line to Union Station.
  • Sponsor: Roger Herriot Award Committee

Abstract:

When he died unexpectedly in 1994, Roger Herriot left behind numerous colleagues in the Federal statistical community who knew both him and his work closely. At that time, it was self-evident that Roger had been a remarkable innovator.

In a 1995 paper, William Butz fondly recollected many of Roger's personal characteristics possibly contributing to his creativity. I wish to revisit Roger's creativity and to reflect on possible lessons for the statistical system. Although creativity is not yet thoroughly understood, findings from cognitive psychology and related disciplines suggest ways in which many of Roger's attributes served him well. At the same time, he benefited from the institutional environments in which he worked. I will argue that the future of innovation in government statistical agencies may in part depend on preserving niches where people like Roger can be both creative and influential.

Return to top

Topic: A Multiscale Method for Disease Mapping in Spatial Epidemiology

  • Chair: Linda Williams Pickle, National Cancer Institute, NIH
  • Speaker: Mary M. Louie, National Center for Health Statistics
  • Discussant: Myron J. Katzoff, National Center for Health Statistics
  • Date/Time: Tuesday, May 16, 2006 / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics, Conference Center in G440. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Methodology Section, WSS

Abstract:

The effects of spatial scale in disease mapping are well-recognized, in that the information conveyed by such maps varies with scale. Here we provide an inferential framework, in the context of tract count data, for describing the distribution of relative risk simultaneously across a hierarchy of multiple scales. In particular, we offer a multiscale extension of the canonical standardized mortality ratio (SMR), consisting of Bayesian posterior-based strategies for both estimation and characterization of uncertainty. As a result, a hierarchy of informative disease and confidence maps can be produced, without the need to first try to identify a single appropriate scale of analysis. We explore the behavior of the proposed methodology in a small simulation study, and we illustrate its usage through an application to data on gastric cancer in Tuscany. By way of comparison, we also present results from a hierarchical Bayesian model. Throughout, we discuss broader issues associated with the task of disease mapping such as over-dispersion and estimating relative risks for small areas.
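For readers unfamiliar with the canonical SMR mentioned in this abstract, the following minimal sketch shows the ordinary single-scale ratio of observed to expected counts, and how the picture changes when tracts are pooled into one coarser region. It is not the multiscale Bayesian method of the talk; the tract names and counts are invented for illustration.

```python
def smr(observed, expected):
    """Standardized mortality ratio: observed count / expected count."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


# Hypothetical tracts: name -> (observed deaths, expected deaths).
tracts = {"A": (12, 8.0), "B": (3, 6.0), "C": (5, 5.0)}

# SMR at the fine (tract) scale.
tract_smr = {name: smr(obs, exp) for name, (obs, exp) in tracts.items()}

# SMR at a coarser scale: pool the three tracts into one region.
pooled_smr = smr(
    sum(obs for obs, _ in tracts.values()),
    sum(exp for _, exp in tracts.values()),
)

print(tract_smr)
print(pooled_smr)
```

In this made-up example the pooled region has an SMR close to 1 even though tract A is well above 1 and tract B well below it, which is the kind of scale dependence the abstract refers to.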

Return to top

Topic: Optimizing the Use of Microdata: A New Perspective on Confidentiality and Access

  • Chair: Mary Grace Kovar, NORC, University of Chicago
  • Speaker: Julia Lane, National Opinion Research Center, University of Chicago
  • Discussant: Stephanie Shipp, NIST
  • Date/Time: Wednesday, May 17, 2006, 12:30-2:00 pm
  • Location: Bureau of Labor Statistics, Conference Center in G440. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: WSS Social and Demographic Statistics Section

Abstract:

New capacities to collect and integrate data offer expanded potential for scientists and policy-makers to understand factors contributing to key national priorities, like job, income and wealth creation, as well as career path and retirement decisions made by individuals. This capacity can also contribute to meeting a critical national security need. The major security threat to the United States is inherently human, and an improved ability to understand and predict malevolent behaviors can provide one means for addressing that threat. Two substantial challenges face collectors and producers of economic data as a result of this increased capacity. The first is how can the information derived from vast streams of data on human beings be used while protecting confidentiality? The second is the essence of good science: how can society best provide and promote access to rich and sensitive data so that empirical results can be generalized and replicated?

This paper argues that focusing on confidentiality protection alone will lead to piecemeal approaches and result in outcomes that are neither in the best interests of decision-makers nor of society at large. The appropriate approach is to optimize the amount of data access, subject to meeting key confidentiality constraints. This paper begins by discussing current confidentiality protection techniques accompanied by illustrations of some consequences for the typical type of analyses performed by economists. It then describes the challenges that are emerging as a result of technological advances, and develops a simple economic framework. The paper concludes with a suggested research agenda.

Return to top

Title: Estimating Drug Use Prevalence Using Latent Class Models with Item Count Response as One of the Indicators

  • Chair: John Bushery, U.S. Bureau of the Census
  • Speaker: Paul Biemer, RTI International
  • WebPage: http://www.rti.org/experts.cfm?objectid=6E703887-343D-4D32-8DDA0F933AA1A886
  • Discussant: Douglas Wright, Substance Abuse and Mental Health Services Administration
  • Date/Time: Thursday, June 1, 2006 / 12:30 - 2 p.m.
  • Location: Bureau of Labor Statistics, Conference Center in G440. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Methodology Section, WSS

Abstract:

The item count (IC) method for estimating the prevalence of sensitive behaviors was applied to the National Survey on Drug Use and Health (NSDUH) to estimate the prevalence of past year cocaine use. Despite considerable effort and research to refine and adapt the IC method to this survey, the method failed to produce estimates that were any larger than the estimates based on self-reports. Further analysis indicated the problem to be measurement error in the IC responses. To address the problem, a new model-based estimator was proposed to correct the IC estimates for measurement error and produce less biased prevalence estimates. The model combines the IC data, replicated measurements of the IC items, and responses to the cocaine use question to obtain estimates of the classification error in the observed data. The data were treated as fallible indicators of (latent) true values and traditional latent class analysis assumptions were made to obtain an identifiable model. The resulting estimates of the cocaine use prevalence were approximately 43 percent larger than the self-report only estimates and the estimated underreporting rates were consistent with those estimated from other studies of drug use underreporting.

Return to top

Title: Bayesian and Frequentist Methods for Provider Profiling Using Risk-Adjusted Assessments of Medical Outcomes

  • Chair: Trena M. Ezzati-Rice, Agency for Healthcare Research & Quality
  • Speaker: Joseph Sedransk, Case Western Reserve University
  • Discussant: Robert Baskin, Agency for Healthcare Research & Quality
  • Date/Time: Tuesday, 6 June, 2006 / 12:30 to 2:00 pm
  • Location: NIH's Executive Plaza complex at Executive Plaza North, Conference Room 319, 6130 Executive Boulevard, Rockville, Maryland. Pay parking is available. Check with security upon entry; photo ID required.
  • Sponsor: WSS Section on Public Health and Biostatistics, WSS Methodology Section

Abstract:

We propose a new method and compare conventional and Bayesian methodologies that are used or proposed for use for 'provider profiling', an evaluation of the quality of health care. The conventional approaches to computing these provider assessments are to use likelihood-based frequentist methodologies, and the new Bayesian method is patterned after these. For each of three models we compare the frequentist and Bayesian approaches using the data employed by the New York State Department of Health for its annually released reports that profile hospitals permitted to perform coronary artery bypass graft surgery. Additional, constructed, data sets are used to sharpen our conclusions. With the advances of Markov chain Monte Carlo methods, Bayesian methods are easily implemented and are preferable to standard frequentist methods for models with a binary dependent variable since the latter always rely on asymptotic approximations.

Comparisons across methods associated with different models are important because of current proposals to use random effect (exchangeable) models for provider profiling. We also summarize and discuss important issues in the conduct of provider profiling such as inclusion of provider characteristics in the model and choice of criteria for determining unsatisfactory performance.

Return to top

Title: Model Evaluation and Model Selection Based on Prediction Error for Various Outcomes

  • Speaker: Tianxi Cai, Ph.D., Harvard University School of Public Health, Department of Biostatistics
  • Date/Time: Wednesday, June 7th, 2006 / 11:00 am - 12:00 noon
  • Location: Executive Plaza North, Conference Room G. Address: 6130 Executive Blvd, Rockville MD, 20852. Contact: the Office of Preventive Oncology, 301-496-8640
  • Sponsor: WSS Biostatistics/Public Health Section

Abstract:

The construction of a reliable, practically useful prediction rule for future responses is heavily dependent on the "adequacy" of the fitted regression model. In this research, we consider the absolute prediction error, the expected value of the absolute difference between the future and predicted responses, as the model evaluation criterion and as a basis for evaluating the accuracy of a given prediction rule. This prediction error has the same scale as the observed outcome and is thus easier to interpret than the average squared error and the R-square.

When the outcome is binary, the absolute prediction error is equivalent to the mis-classification error. When the outcome is a censored event time, we propose classification rules for predicting the t-year survival status and compare the classification accuracy of prediction rules constructed based on various working models. We show that the distributions of the apparent error type estimators and their cross-validation counterparts are approximately normal even under a misspecified fitted model. When the prediction rule is "unsmooth", the variance of the above normal distribution can be estimated well via a perturbation-resampling method.
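The equivalence stated above for binary outcomes can be checked with a small sketch. It computes the empirical (apparent) absolute prediction error and the misclassification rate on the same made-up data, assuming both the outcomes and the predictions are coded 0/1. The function names and data are illustrative only and are not taken from the talk.

```python
def mean_absolute_prediction_error(y_true, y_pred):
    """Average of |observed - predicted| over the sample."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def misclassification_rate(y_true, y_pred):
    """Fraction of observations whose predicted class is wrong."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    return sum(y != p for y, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical binary outcomes and 0/1 predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

ape = mean_absolute_prediction_error(y_true, y_pred)
mce = misclassification_rate(y_true, y_pred)
print(ape, mce)
```

With 0/1 coding, |y - p| is 1 exactly when the prediction is wrong and 0 otherwise, so the two averages coincide. For a continuous outcome the first function still applies, and its value is on the scale of the outcome itself.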

We also show how to approximate the distribution of the difference of the estimated prediction errors from two competing models. Through real data examples and simulation studies, we demonstrate that the resulting interval estimates for prediction errors provide much more information about model adequacy than the point estimates alone.

Return to top

Topic: Independence

  • Speaker: Cynthia Clark, Director of Methodology, Office of National Statistics, UK
  • Chair: Connie Citro, Committee on National Statistics, the National Academies
  • Discussant: Fritz Scheuren, NORC
  • Date/Time: Wednesday, June 7, 2006 / 12:30 to 2:00
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Take the Red Line to Union Station.
  • Sponsor: WSS Social and Demographic Statistics Section

Abstract:

The U.K. Chancellor, Gordon Brown, announced in November 2005 that he planned to introduce legislation to make the U.K. Office of National Statistics (ONS) independent of Government. His proposal is to make the governance and publication of official statistics the responsibility of a wholly separate body at arm's length from Government and fully independent of it. The legislation would create an independent Governing Board for the ONS, with external members of the Board including leading experts in statistics. The Board would have responsibility for meeting an overall objective for the statistical system's integrity. The ONS would be accountable to Parliament through reporting of the Board. The Board would be questioned by the Treasury Select Committee on its performance. The overall goal is for the legislation and the change in organizational arrangements to improve public trust in official statistics. This presentation will discuss the current governmental structure for ONS and the decentralized U.K. Government Statistical Service, covering the issues that have arisen in developing proposed legislation. Comparisons will be made with issues faced in the U.S. with similar proposals.

Return to top

Topic: How Many Students Really Graduate from High School? The Process of High School Attrition

  • Speaker: Charles Hirschman, Bixby Visiting Scholar, Population Reference Bureau (on leave from the University of Washington)
  • Date/Time: Wednesday, June 7, 2006, 10:30 - Noon
  • Location: U.S. Census Bureau, 4700 Silver Hill Road, Suitland, Maryland - the Morris Hansen Auditorium, Bldg. 3. Please call (301) 763-4974 to be placed on the visitors' list. A photo ID is required for security purposes.
  • Sponsor: U.S. Bureau Of Census, Statistical Research Division

Abstract:

Retrospective survey questions of educational attainment and school administrative records yield different estimates of on-time high school graduation in the United States. This study applies life table models to track students from 9th grade forward to estimate the risks of retention, dropout, and graduation with school administrative records. High school attrition is a process with failure and retention often preceding dropping out. Some students return, but it is much easier to descend than to return and catch up. For many students, poor academic performance in the first semester of the 9th grade is the critical experience that leads to subsequent failure and attrition.

This presentation, with insights for the U.S. and federal workforces, is a joint collaboration of SRD, POP and HRD/Human Capital Management Council.

This seminar is physically accessible to persons with disabilities. For TTY callers, please use the Federal Relay Service at 1-800-877-8339. This is a free and confidential service. To obtain Sign Language Interpreting services/CART (captioning real time) or auxiliary aids, please send your requests via e-mail to EEO Interpreting & CART: eeo.interpreting.&.CART@census.gov or TTY 301-457-2540, or by voice mail at 301-763-2853, then select #2 for EEO Program Assistance.

Return to top

Title: Characterization of Cost Structures, Perceived Value and Optimization Issues in Small Domain Estimation

  • Chair: Michael P. Cohen, Bureau of Transportation Statistics (retired)
  • Speaker: John L. Eltinge, Bureau of Labor Statistics
  • Discussant: David G. Waddington, U.S. Bureau of the Census
  • Date/Time: Thursday, 8 June, 2006 / 12:30 to 2:00 pm
  • Location: Bureau of Labor Statistics, Conference Center in G440. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Methodology Section, WSS

Abstract:

In recent years, government statistical agencies have encountered many requests from stakeholders for production of estimates covering a large number of relatively small subpopulations. Due to resource constraints, agencies generally are not able to satisfy these requests through additional data collection and subsequent production of standard direct estimates. Instead, agencies attempt to meet some of the stakeholders' requests with estimators that combine information from sample data and auxiliary sources. In essence, the agencies are substituting technology (i.e., modeling and related methodological work) for data-collection labor, and in exchange the agencies and data users incur additional risks related to potential model lack of fit and potential misinterpretation of published results.

This presentation characterizes some of the resulting trade-offs among cost structures, data quality, perceived value and optimization issues in small domain estimation. Four topics receive principal attention. First, we highlight several classes of direct and indirect costs incurred by the producers and users of small domain estimates. This leads to consideration of possible cost optimization for small domain estimation programs, which may include the costs of sample design features, access to auxiliary data sources, analytic resources and dissemination efforts. Second, we use the Brackstone (1999) framework of six components of data quality to review some statistical properties of direct design-based and model-based estimators for small domains, and to link these properties with related components of risk. Quality issues related to exploratory analysis and implicit multiple comparisons receive special attention. Third, we explore data users' perceptions of the value of published small domain estimates, and of costs incurred through decisions not to publish estimates for some subpopulations. We suggest that the data users' perceptions are similar to those reported in the general literature on adoption and diffusion of technology, and that this literature can offer some important insights into efficient integration of efforts by researchers, survey managers and data users. Fourth, we emphasize the importance of constraints in the administrative development and implementation of small domain estimation programs. We consider constraints on both the production processes and on the availability of information regarding costs and data quality. These constraints can often dominate the administrative decision process. This in turn suggests some mathematically rich classes of constrained optimization problems that would warrant further research.

Return to top

Title: An Update on the NIST Statistical Reference Datasets for MCMC: Ranking the Sources of Numerical Error in MCMC Computations

  • Chair: Charles Hallahan, USDA/ERS
  • Speakers: Hung-kung Liu and William F. Guthrie, NIST
  • Discussant: TBD
  • Date/Time: Tuesday, June 20, 2006 / 12:30 to 2 p.m.
  • Location: Bureau of Labor Statistics, Conference Center in G440. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Statistical Computing Section, WSS

Abstract:

In the Statistical Reference Datasets (StRD) project, NIST provided datasets on the web (www.itl.nist.gov/div898/strd/index.html) with certified values for assessing the accuracy of software for univariate statistics, linear regression, nonlinear regression, and analysis of variance. Another important new area in statistical computing, not addressed in the original StRD project, is Bayesian analysis using Markov chain Monte Carlo (MCMC). Despite its importance, the numerical accuracy of software for MCMC is largely unknown. We have recently updated the StRD web site with six new datasets for Bayesian model fitting using MCMC algorithms. We will discuss some results obtained using these datasets that challenge the conventional wisdom that longer simulations lead to improved approximation of posterior distribution parameters. The sources of numerical error that arise in the computations associated with a simple Bayesian model for data sets from the StRD web site will be studied. The different sources of numerical error will be compared and ranked with respect to their impact on the total numerical error.

Return to top

Title: Implications for RDD Design from an Incentive Experiment

  • Chair: Jonaki Bose, Bureau of Transportation Statistics
  • Speaker: Chris Chapman, National Center for Education Statistics
  • Date/Time: Thursday, June 22, 2006 / 12:30 to 2 pm
  • Location: Bureau of Labor Statistics, Conference Center Room 9. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Methodology Section, WSS

Abstract:

The National Household Education Surveys Program (NHES) includes a series of random digit dial (RDD) surveys developed by the National Center for Education Statistics (NCES) in the Institute of Education Sciences, U.S. Department of Education. It is designed to collect information on important educational issues through telephone surveys of households in the United States. In 2003, we conducted an experiment using NHES to test the effectiveness of various levels of incentives in gaining increased initial cooperation, refusal conversion, and overall unit response rates. Approximately 79,000 telephone numbers were included in the experiment. The results of the experiment indicate that small cash incentives, used during initial contact stages of the interview process (the Screener stage) can be effective in improving unit response.

Return to top

U.S. BUREAU OF CENSUS
THE WISE ELDERS PROGRAM

Topic: Statistics and Public Policy: Past and Future

  • Speaker: Janet L. Norwood
  • Date/Time: June 28, 2006, 10:30 - Noon
  • Location: U.S. Census Bureau, 4700 Silver Hill Road, Suitland, Maryland - the Morris Hansen Auditorium, Bldg. 3. Please call (301) 763-4974 to be placed on the visitors' list. A photo ID is required for security purposes.

The Human Capital Management Council and the SRD Seminars Series are pleased to sponsor the 5th Wise Elders' Program Presentation.

Dr. Norwood will talk about the interaction of data and public policy, drawing on her experience as Commissioner of Labor Statistics under three Presidents and six Secretaries of Labor. Against that background, she will discuss what she sees as the major issues that the agencies in the U.S. statistical system will face in the future.

Janet L. Norwood is Counselor and Senior Fellow at the New York Conference Board. She chairs a National Academy of Sciences panel on the measurement of Hunger and Food Insecurity and chairs an Academy of Public Administration Panel on Employment Offshoring. She has served on several corporate and non-profit Boards and currently is a Director on the Board of the National Opinion Research Center at the University of Chicago.

From 1992-99, she was a Senior Fellow at the Urban Institute, where she worked on statistical policy and labor market issues. President Bush named her Chair of the Advisory Council on Unemployment Compensation in 1992, and President Clinton reappointed her to that post in 1993. She served as U.S. Commissioner of Labor Statistics from 1979-92, having been appointed by Presidents Carter and Reagan and confirmed by the Senate. She has testified often before Congressional Committees, has written articles and monographs on statistical and labor market issues and is the author of a 1995 book, Organizing to Count: Change in the Federal Statistical System.

Norwood earned a B.A. from Douglass College, Rutgers University and an M.A. and Ph.D. from the Fletcher School of Law and Diplomacy of Tufts University. She was awarded honorary Doctor of Law degrees by Rutgers, Harvard, Carnegie Mellon, and Florida International Universities. In 1988, she was awarded distinguished rank in the federal Senior Executive Service and Meritorious rank in 1985. She was designated a National Associate at the National Academies for her contributions to the National Research Council, and is the recipient of the National Public Service award, the Elmer B. Staats Award, Dickinson College's Benjamin Rush Award, the American Statistical Association's Founders Award, and the Labor Department's Philip Arnow Award. In 2002, the Committee of Presidents of Statistical Societies awarded her the Elizabeth Scott Award for furtherance of careers of women in statistics. She has served on Visitors' Committees at Harvard, Carnegie Mellon and American Universities, the University of Pennsylvania, MIT, and the University of California at Berkeley.

Norwood is a past President and Fellow of the American Statistical Association, an elected member and past Vice President of the International Statistical Institute, a Fellow of the National Association of Business Economists and of the National Academy of Public Administration, and was elected an Honorary Fellow of the Royal Statistical Society. She is a past President of the Cosmos Club and of the Consortium of Social Science Associations.

This seminar is physically accessible to persons with disabilities. For TTY callers, please use the Federal Relay Service at 1-800-877-8339. This is a free and confidential service. To obtain Sign Language Interpreting services/CART (captioning real time) or auxiliary aids, please send your requests via e-mail to EEO Interpreting & CART: eeo.interpreting.&.CART@census.gov or TTY 301-457-2540, or by voice mail at 301-763-2853, then select #2 for EEO Program Assistance.

Topic: The International Programs Center Involvement in HIV/AIDS Activities

  • Speakers: Timothy B. Fowler, Peter D. Johnson, and Laura M. Heaton, International Programs Center, Population Division
  • Date/Time: July 12, 2006, 10:30 - 11:30 a.m.
  • Location: U.S. Census Bureau, the Morris Hansen Auditorium, FOB 3, 4700 Silver Hill Road, Suitland, Maryland. Please call (301) 763-4974 to be placed on the visitors' list. A photo ID is required for security purposes.
  • Sponsor: U.S. Bureau Of Census, Statistical Research Division

Abstract:

The International Programs Center's (IPC) work in the area of HIV/AIDS began in 1987 investigating the impact of AIDS mortality on population projections for less-developed countries, work that IPC has been doing since the 1950s. IPC staff will present an overview of their work on HIV/AIDS and how that work has evolved over time. They will discuss their products, sponsors, and the extensive mathematical modeling work on AIDS-related mortality. They also will review the impact of HIV/AIDS mortality on population projections (primarily for countries in sub-Saharan Africa) dating back to 1994. One of IPC's most recent projects will be highlighted: modeling HIV infections averted in selected countries. Averting 7 million HIV infections is one of three key goals in the President's Emergency Plan for AIDS Relief (PEPFAR) designed to evaluate progress in stemming the tide of HIV/AIDS epidemics in fifteen focus countries.

This seminar is physically accessible to persons with disabilities. For TTY callers, please use the Federal Relay Service at 1-800-877-8339. This is a free and confidential service. To obtain Sign Language Interpreting services/CART (captioning real time) or auxiliary aids, please send your requests via e-mail to EEO Interpreting & CART: eeo.interpreting.&.CART@census.gov or TTY 301-457-2540, or by voice mail at 301-763-2853, then select #2 for EEO Program Assistance.

Title: A Multiscale Method for Disease Mapping in Spatial Epidemiology

  • Chair: Linda Williams Pickle, National Cancer Institute, NIH
  • Speaker: Mary M. Louie, National Center for Health Statistics
  • Discussant: Myron J. Katzoff, National Center for Health Statistics
  • Date/Time: Tuesday, 25 July, 2006 / 12:30 to 2:00 pm
  • Location: Bureau of Labor Statistics, Conference Center in G440. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Methodology Section, WSS

Abstract:

The effects of spatial scale in disease mapping are well-recognized, in that the information conveyed by such maps varies with scale. Here we provide an inferential framework, in the context of tract count data, for describing the distribution of relative risk simultaneously across a hierarchy of multiple scales. In particular, we offer a multiscale extension of the canonical standardized mortality ratio (SMR), consisting of Bayesian posterior-based strategies for both estimation and characterization of uncertainty. As a result, a hierarchy of informative disease and confidence maps can be produced, without the need to first try to identify a single appropriate scale of analysis. We explore the behavior of the proposed methodology in a small simulation study, and we illustrate its usage through an application to data on gastric cancer in Tuscany. By way of comparison, we also present results from a hierarchical Bayesian model. Throughout, we discuss broader issues associated with the task of disease mapping such as over-dispersion and estimating relative risks for small areas.
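As an illustration of why scale matters, the standardized mortality ratio for an area is the observed count divided by the expected count, and moving to a coarser scale amounts to summing both counts before taking the ratio. The following is a minimal sketch of that arithmetic only; the tract names, county labels, and counts are hypothetical, and it does not implement the Bayesian multiscale method described in the abstract.

```python
from collections import defaultdict


def smr(observed, expected):
    """Standardized mortality ratio: observed count over expected count."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


def smr_by_level(tracts, level):
    """Sum observed and expected counts within each unit at `level`, then take the ratio."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for t in tracts:
        totals[t[level]][0] += t["observed"]
        totals[t[level]][1] += t["expected"]
    return {unit: smr(obs, exp) for unit, (obs, exp) in totals.items()}


# Hypothetical toy data: three tracts nested in two counties.
tracts = [
    {"tract": "A1", "county": "A", "observed": 3, "expected": 2.0},
    {"tract": "A2", "county": "A", "observed": 1, "expected": 2.0},
    {"tract": "B1", "county": "B", "observed": 6, "expected": 4.0},
]

tract_smr = smr_by_level(tracts, "tract")
county_smr = smr_by_level(tracts, "county")
```

In this toy example tract A1 looks elevated and tract A2 looks low, while county A as a whole sits exactly at the reference level, which is the kind of scale dependence the abstract refers to.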

Title: Bayesian Methods for Incomplete Two-way Categorical Table with Application to the Buckeye State Polls

  • Speakers: YouSung Park, Korea University, and Jai Won Choi, NCHS
  • Chair: Joe Fred Gonzalez, NCHS
  • Date/time: Thursday, July 27, 2006 / 10:30 a.m. - 12:00 p.m.
  • Location: Auditorium 1405B, National Center for Health Statistics, 3311 Toledo Road, Hyattsville, MD 20782. Park at the next building. A photo ID is needed to enter the NCHS/CDC building. Please call Jai Choi (301-458-4144) or Joe Gonzalez (301-458-4239) for directions and to let them know that you are attending.
  • Sponsor: WSS Public Health and Biostatistics Section

Abstract:

When survey counts or responses are classified into a two-way table, a substantial number of counts are often missing the row classification, the column classification, or both. Inference on the cells based only on the fully classified counts can then be incorrect. We therefore propose a method that uses the partially classified counts to obtain correct inference on the cells. To accomplish this, we use a Bayesian method with five different priors, three previously known and two newly created, and compare the results with the maximum likelihood (ML) method under the assumption that nonresponse is either ignorable or nonignorable. Although the Bayesian method (BM) often solves the boundary solution problem of ML, BM does not always give a better solution, since its performance depends partly on the prior specification. We use four data sets from the 1998 Ohio state polls to illustrate the method. Our simulation study also compares the Bayesian models under the five priors with ML under the ignorable and nonignorable nonresponse assumptions. Interestingly, the winner of the Columbus mayoral race could have lost if more of the people who were unlikely to vote had actually voted.
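For readers unfamiliar with the ML comparator, the following is a minimal sketch of an EM algorithm for cell probabilities in a two-way table with partially classified counts, assuming ignorable nonresponse. It illustrates the general technique only; it is not the speakers' Bayesian method, and the function name and example counts are hypothetical. Counts missing both classifications are left out because, under ignorability, they carry no information about the cell probabilities.

```python
def em_two_way(full, row_only, col_only, iters=200):
    """ML cell probabilities under ignorable nonresponse, via EM.

    full[i][j]  : counts classified on both row i and column j
    row_only[i] : counts with row i known and column missing
    col_only[j] : counts with column j known and row missing
    """
    n_rows, n_cols = len(full), len(full[0])
    total = sum(map(sum, full)) + sum(row_only) + sum(col_only)
    p = [[1.0 / (n_rows * n_cols)] * n_cols for _ in range(n_rows)]
    for _ in range(iters):
        row_p = [sum(p[i]) for i in range(n_rows)]
        col_p = [sum(p[i][j] for i in range(n_rows)) for j in range(n_cols)]
        new_p = [[0.0] * n_cols for _ in range(n_rows)]
        for i in range(n_rows):
            for j in range(n_cols):
                # E-step: split partially classified counts in proportion
                # to the current conditional cell probabilities.
                expected = float(full[i][j])
                if row_p[i] > 0:
                    expected += row_only[i] * p[i][j] / row_p[i]
                if col_p[j] > 0:
                    expected += col_only[j] * p[i][j] / col_p[j]
                # M-step: the updated probability is the expected cell share.
                new_p[i][j] = expected / total
        p = new_p
    return p


# Hypothetical counts: 20 respondents in row 0 did not report a column.
p_hat = em_two_way(full=[[40, 10], [10, 40]], row_only=[20, 0], col_only=[0, 0])
```

When a fully classified cell is empty, the ML estimate can sit on the boundary of the parameter space, which is the problem the abstract says a Bayesian prior can relieve.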

Topic: Besov Spaces and Empirical Mode Decomposition for Seasonal Adjustment in Nonstationary Time Series

  • Speaker: Christopher D. Blakely, PhD Candidate, University of Maryland and U.S. Census Bureau
  • Date/Time: August 29, 2006, 2:00 - 3:00 p.m.
  • Location: U.S. Census Bureau, 4401 Suitland Road, Room 3225, FOB 4, Suitland, Maryland. Please call (301) 763-4974 to be placed on the visitors' list. A photo ID is required for security purposes.
  • Sponsor: U.S. Bureau Of Census, Statistical Research Division

Abstract:

The purpose of this presentation is to introduce an empirical analysis technique for seasonal extraction in nonstationary time series. The proposed method is a non-model based approach to signal extraction which utilizes analysis techniques borrowed from harmonic analysis including certain wavelet characterizations and empirical mode decompositions of the time series. We give a detailed account of this new seasonal adjustment algorithm with brief reviews of wavelet characterizations and the fast empirical mode decomposition followed by numerical examples which both verify and validate the method's accuracy and robustness.

This seminar is physically accessible to persons with disabilities. For TTY callers, please use the Federal Relay Service at 1-800-877-8339. This is a free and confidential service. To obtain Sign Language Interpreting services/CART (captioning real time) or auxiliary aids, please send your requests via e-mail to EEO Interpreting & CART: eeo.interpreting.&.CART@census.gov or TTY 301-457-2540, or by voice mail at 301-763-2853, then select #2 for EEO Program Assistance.

Title: Integration of Gene Expression and Copy Number

  • Speaker: Debashis Ghosh, PhD
  • Date/time: Friday, September 1, 2006 / 10:00 - 11:00 AM
  • Location: 3950 Reservoir Road, NW, Research Building, Conference Room E501, Georgetown University Medical Center, Washington, DC 20057
  • Sponsor: Department of Biostatistics, Bioinformatics and Biomathematics. For information please contact Marina E. Vacaru at 202-687-4114 or emv6@georgetown.edu

Abstract:

Recently, there has been a plethora of cancer studies in which samples have been profiled using both gene expression and copy number microarrays. We give a brief introduction to copy number and describe a study of copy number/gene expression correlation in publicly available cancer cell line datasets. We then describe some recent multiple testing procedures for integrating the different types of data along with a comparative study of different segmentation methods for microarray data.

Symposium: Economic Turbulence: Is a Volatile Economy Good for America?

  • Speakers:
    Clair Brown, (University of California, Berkeley)
    John Haltiwanger, (University of Maryland)
    Julia Lane, (NORC/University of Chicago)
  • Date/Time: Tuesday, September 12, 2006 / 12:00 - 1:00 p.m.
    (lunch); 1:00 - 4:30 p.m. (symposium)
  • Location: The Keck Room in the National Academies of Sciences, 500 5th Street, NW. Please register before September 7 with Maggie Newman (newman-maggie@norc.uchicago.edu).
  • Sponsors: The symposium is supported by the Alfred P. Sloan Foundation, the U.S. Census Bureau, NORC/University of Chicago and the University of Chicago Press. Additional sponsors include the Washington Statistical Society.

Abstract:

Clair Brown, John Haltiwanger and Julia Lane would like to invite you to lunch followed by a symposium to highlight the findings of a new book "Economic Turbulence: Is a Volatile Economy Good for America" that they have coauthored. Clair, John and Julia will present key chapters from the book; David Autor (MIT), Charlie Brown (University of Michigan), and Erica Groshen (NY Federal Reserve) will provide their insights. More information on the book is available at http://www.press.uchicago.edu/cgi-bin/hfs.cgi/00/212025.ctl. Attendees will receive a free copy of the book.

Title: Prediction of Finite Population Totals Based on the Sample Distribution

  • Chair: Daniell Toth, Bureau of Labor Statistics
  • Speaker: Michail Sverchkov, Bureau of Labor Statistics
  • Discussant: Phillip S. Kott, National Agricultural Statistics Service
  • Date/Time: Tuesday, September 19, 2006 / 12:30 to 2:00 pm
  • Location: Bureau of Labor Statistics, Conference Center in G440. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: WSS Methodology Section

Abstract:

This research studies the use of the sample distribution for the prediction of finite population totals under single-stage sampling. The proposed predictors employ the sample values of the target study variable, the sampling weights of the sample units and possibly known population values of auxiliary variables. The prediction problem is solved by estimating the expectation of the study values for units outside the sample as a function of the corresponding expectation under the sample distribution and the sampling weights. The prediction mean square error is estimated by a combination of an inverse sampling procedure and a re-sampling method. An interesting outcome of the present analysis is that several familiar estimators in common use are shown to be special cases of the proposed approach, thus giving them a new interpretation. The performance of the new and some old predictors in common use is evaluated and compared by a Monte Carlo simulation study using a real data set. This is joint work with Danny Pfeffermann.
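The idea of predicting the non-sampled units from the sample and its weights can be sketched as follows. This is a minimal illustration, assuming the relation E_c(y) = E_s[(w - 1) y] / E_s[(w - 1)] between the expectation for units outside the sample (E_c) and expectations under the sample distribution (E_s), with expectations replaced by sample means and no auxiliary variables. The relation is not stated in the abstract, so treat it, the function names, and the data as assumptions rather than the speakers' exact predictor.

```python
def predict_total(y, w, population_size):
    """Sample sum plus (N - n) times an estimated mean for non-sampled units."""
    n = len(y)
    if n == 0 or len(w) != n:
        raise ValueError("y and w must be non-empty and of equal length")
    denom = sum(wi - 1.0 for wi in w)
    if denom <= 0:
        raise ValueError("sampling weights must exceed 1 on average")
    # Assumed relation: mean outside the sample = weighted sample mean
    # with weights (w - 1).
    mean_outside = sum((wi - 1.0) * yi for yi, wi in zip(y, w)) / denom
    return sum(y) + (population_size - n) * mean_outside


def horvitz_thompson(y, w):
    """Classical weighted-sum estimator of the population total."""
    return sum(wi * yi for yi, wi in zip(y, w))


# Hypothetical sample of 3 units from a population of 10; the weights sum to 10.
y = [10.0, 20.0, 30.0]
w = [2.0, 4.0, 4.0]
predicted = predict_total(y, w, population_size=10)
weighted = horvitz_thompson(y, w)
```

When the weights sum to the population size, this sketch reproduces the weighted-sum estimator, which is one way a familiar estimator can appear as a special case of a prediction approach.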

Title: Baseline Adjustment By Inducing Partial Ordering When Measurements Are Ordered Categories

  • Chair: Grant Izmirlian, NCI Division of Cancer Prevention
  • Speaker: YanYan Zhou, Florida International University
  • Discussant: Vance Berger, NCI Division of Cancer Prevention
  • Date/time: Wednesday, October 4, 2006 / 11:00 a.m. to 12:00 noon
  • Location: NIH's Executive Plaza complex. Executive Plaza North, Conference Room 319, 6130 Executive Boulevard, Rockville, Maryland; pay parking is available. Check in with security upon entry; photo ID required.
  • Sponsor: NCI Division of Cancer Prevention and WSS Section on Public Health and Biostatistics

Abstract:

In the context of randomized clinical trials, multiplicity arises in many forms. One prominent example is when a key endpoint is measured and analyzed both at baseline and after treatment. It is common to analyze each separately, but more efficient to adjust the post-treatment comparisons for the baseline values. Adjustment techniques generally treat the covariate (baseline value, in this case) as either nominal or continuous. Either is problematic when applied to an ordinal covariate, the former because it fails to exploit the natural ordering and the latter because it relies on an artificial notion of linear prediction and differences between values.

We propose new methods for adjusting for ordinal covariates without having to treat them as nominal or continuous.

Title: Detection of Anatomical Landmark

  • Speaker: Bruno Jedynak, PhD, Johns Hopkins University, Baltimore, MD
  • Date/time: Friday, October 6, 2006, 10:00 - 11:00am. Refreshments will be served at 9:45 am.
  • Location: Lombardi Comprehensive Cancer Center, Georgetown University Medical Center
    4000 Reservoir Road, NW
    E501 Conference Room, The Research Building
    Washington, DC 20057
  • Sponsor: Department of Biostatistics, Bioinformatics, and Biomathematics, Bio3 Seminar Series

Abstract:

Anatomical landmarks are well-defined points in the anatomy that experts use to establish biologically meaningful correspondences between structures. Such correspondences are commonly used by registration algorithms, as initialization and/or as constraints. Landmarks also provide a local shape description useful for anatomical shape comparison. However, locating landmarks on biological structures is a challenging and time-consuming task, even for experts. For example, in brain MRI imagery, manually locating 15 landmarks on the hippocampus takes several hours.

In this talk, Dr. Jedynak will present a system for automatic landmarking developed by his student Camille Izard.

Title: Nonprofit Employment: Improving Estimates With A Match Of IRS Information Forms And BLS QCEW

  • Speaker: Martin David, The Urban Institute
  • Discussants: John Czajka, Mathematica Policy Research; Paul Arnsberger, IRS
  • Chair: Linda Atkinson, Economic Research Service, USDA
  • Date/time: Monday, October 16, 2006 / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center, Room 9. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Take the Red Line to Union Station.
  • Sponsor: WSS Economics Section

Abstract:

Using the QCEW for 2002, Salamon and Sokolowski (2005) estimated employment in nonprofit organizations to be 8.2% of private workers in the US. Alternative estimates (in this report) use information returns filed on IRS Forms 990 and 990EZ supplemented by matches to the QCEW. The IRS registers Federal employer identification numbers (EINs) of exempt organizations and requires most to file Form 990 annually. The IRS forms used and the QCEW have limited coverage of the exempt sector. In addition, more than 20% of Form 990 filers fail to report employment, and Form 990EZ filers are not required to report employment.

Using the two data sources matched on EIN improves employment estimates. (a) Employment counts on Form 990 are substantially increased by imputing QCEW employment to nonreporters; and (b) employment counts for organizations filing Form 990EZ are 100% imputed from the QCEW. For organizations that do not match, aggregate employment can be estimated by extrapolating from the probability of false negative reports observed for matched organizations. Coverage and nonreporting vary by industry class. The most global estimates can be made using the National Taxonomy of Exempt Organizations (NTEE) coded for IRS information returns. NAICS is only available for matched data.

Estimates are adjusted for false positive and false negative matches. Criteria that identify false positives eliminate many matched establishments in the years 1999-2003. (The method used in Salamon and Sokolowski cannot identify such cases.) False negatives occur because some EINs are invalid. Estimates of employment taken from the QCEW for organizations that have no Form 990 or that file Form 990EZ are weighted to account for invalid EINs in the QCEW. The report concludes with recommendations for further work and suggestions for improvements in data processing used by IRS and BLS.

Title: Moving versus Fixed Sampling Designs for Detecting Airborne Biological Pathogens

  • Speaker: Steven K. Thompson, Simon Fraser University, Department of Statistics and Actuarial Science
  • Chair: Myron Katzoff, NCHS
  • Date/Time: Wednesday, October 18, 2006 / 12:00 - 1:00 p.m.
  • Location: Bureau of Labor Statistics (BLS) Conference Center, Room 9. Bring a Photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE across from Union Station which is on the Red Line.
  • Sponsor: Defense and National Security Section

Abstract:

For detecting releases of biological pathogens and other airborne health hazards, is it better to set out sensors in fixed positions or to have them move in some pattern? Beyond simple detection of the hazard, what is the best fixed or moving pattern for characterizing the release in space and time? The more general question is what is the best design for sampling a population that is changing, when the sampling units themselves may move as observations are collected. In this talk I'll describe the motivation for a study of this issue, the results of the study, and some open questions remaining.

Title: On Missing Data and Interactions in SNP Association Studies

  • Speaker: Ingo Ruczinski, PhD, Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health
  • Date/time: Friday, October 20, 2006 / 10:00 - 11:00 a.m.
  • Location: Georgetown University, Lombardi Comprehensive Cancer Center, New Research Building, Room E501, 3900 Reservoir Road, NW, Washington, DC 20007. Please contact Marina Vacaru at 202-687-4114 or emv6@georgetown.edu if there are any questions.
  • Sponsor: Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University

Abstract:

In this presentation we discuss possible solutions for two common problems in SNP association studies: the presence of missing data in the covariates, and the search and evaluation of models allowing for higher order SNP-SNP and SNP-environment interactions.

The majority of SNP association studies are based on data with missing genotype information. The most common approach for dealing with those missing data is to omit the observations that have missing records in the model's covariates. This approach, however, can have severe shortcomings for statistical inference, namely a potential bias in the parameter estimates and a loss of power. The latter can be overwhelming, especially when SNP-SNP interactions are considered. In this presentation we show some examples that illustrate the shortcomings of omitting observations, and compare some methods to address the missing data issue. In particular, we propose a novel tree-based imputation algorithm as a solution, and demonstrate how this approach can be used to draw valid statistical inference in the search for and assessment of SNP-SNP interactions, using the Logic regression methodology.
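The loss of power from omitting incomplete observations can be made concrete with a small calculation. This is a minimal sketch that assumes each genotype is missing independently with the same probability, which is a simplification chosen for illustration and not a claim from the abstract; the rates used are hypothetical.

```python
def complete_case_fraction(missing_rate, n_snps):
    """Expected share of subjects with no missing genotype among n_snps SNPs,
    assuming independent missingness at a common rate."""
    if not 0.0 <= missing_rate <= 1.0:
        raise ValueError("missing_rate must be between 0 and 1")
    return (1.0 - missing_rate) ** n_snps


# Hypothetical setting: 5% of genotypes missing at each of 20 SNPs.
share_retained = complete_case_fraction(0.05, 20)
```

Under these assumptions only a little over a third of the subjects would be retained in a model that involves all 20 SNPs, even though each individual SNP is 95% complete.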

Title: Absolute Risk: Clinical Applications and Controversies

  • Speaker: Mitchell H. Gail, M.D., Ph.D.
    Chief of Biostatistics Branch
    Division of Cancer Epidemiology and Genetics
    National Cancer Institute
  • Time: 11:00am - 12:15pm, Oct. 20, 2006
  • Location: ROME 206, GWU. Foggy Bottom metro stop on the blue and orange line.
  • Sponsor: The George Washington University, Department of Statistics

Abstract:

Absolute risk is the probability that a disease will develop in a defined age interval in a person with specific risk factors. Sometimes absolute risk is called "crude" risk to distinguish it from the cumulative "pure" risk that might arise in the absence of competing causes of mortality. I shall present a model for absolute breast cancer risk and illustrate its clinical applications. I will also describe the kinds of data and approaches that are used to estimate models of absolute risk and two criteria, calibration and discriminatory accuracy, that are used to evaluate absolute risk models. I shall describe efforts to increase the discriminatory accuracy of a model to predict breast cancer by incorporating information on mammographic density and address whether well calibrated models with limited discriminatory accuracy can be useful.
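The distinction between absolute ("crude") and "pure" risk can be illustrated with a minimal sketch that assumes constant hazards over the interval, a simplification not made in the abstract. With disease hazard h1 and competing-mortality hazard h2, the pure risk over t years is 1 - exp(-h1 t), while the absolute risk is h1 / (h1 + h2) times 1 - exp(-(h1 + h2) t). The hazard values below are hypothetical.

```python
import math


def pure_risk(h_disease, years):
    """Risk of disease over the interval if competing causes of death were absent."""
    return 1.0 - math.exp(-h_disease * years)


def absolute_risk(h_disease, h_competing, years):
    """Risk of disease over the interval when competing mortality is present."""
    total = h_disease + h_competing
    if total == 0:
        return 0.0
    return h_disease / total * (1.0 - math.exp(-total * years))


# Hypothetical annual hazards: 0.003 for the disease, 0.02 for competing mortality.
crude = absolute_risk(0.003, 0.02, years=10)
pure = pure_risk(0.003, years=10)
```

The absolute risk is never larger than the pure risk, and the two coincide when the competing hazard is zero.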

Note: For a complete list of upcoming seminars check the department's seminar web site: http://www.gwu.edu/~stat/seminar.htm. The campus map is at: http://www.gwu.edu/~map/. The contact persons are Efstathia Bura, Department of Statistics, email: ebura@gwu.edu, phone: 202-994-6358, and Yinglei Lai, Department of Statistics, e-mail: ylai@gwu.edu and phone: 202-994-6664.

Title: Protecting the Confidentiality of Commodity Flow Survey Tabular Data by Adding Noise to the Underlying Microdata

  • Speaker: Paul B. Massell, Statistical Research Division, U.S. Census Bureau
  • Discussant: Jacob Bournazian, Energy Information Administration, U.S. Department of Energy
  • Chair: Linda Atkinson, Economic Research Service, USDA
  • Date/time: Tuesday, October 24, 2006 / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center, Room 8. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Take the Red Line to Union Station.
  • Sponsor: WSS Economics Section

Abstract:

BTS and the U.S. Census Bureau are co-sponsors of the Commodity Flow Survey (CFS). The CFS produces data on the movement of goods in the United States. These data are used by analysts for transportation planning and decision-making and for modeling transportation facilities and services demand. Cell suppression has been used over the years to protect magnitude data values. Data users, especially transportation modelers, have indicated their desire for access to tables with fewer suppressed cells. Census and BTS are exploring the addition of noise to the underlying CFS microdata as an alternative method for protecting the magnitude values. The noise method used here is due to Evans, Zayatz, and Slanta (J. Official Statistics, 1998). Initial research findings have been quite positive. This paper will present our results to date including analysis of noise effects on selected CFS tables. We will describe various ways of measuring the effectiveness of this noise method on any set of tables.

This talk is an expanded version of an invited paper presented at an ASA session on disclosure at JSM2006. That paper was co-authored with Neil Russell, who, at the time of this research, was the confidentiality officer at the Bureau of Transportation Statistics (BTS).
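The general idea of protecting tabular magnitudes by perturbing the underlying microdata can be sketched as follows. This is a simplified illustration of multiplicative noise, assuming that every record is multiplied by a factor bounded away from 1 and that all records of one company are perturbed in the same direction. The bounds (5% to 15%), the field names, and the data are hypothetical, and the sketch omits details of the published method and of the Commodity Flow Survey implementation.

```python
import random


def add_noise(records, a=0.05, b=0.15, seed=0):
    """Multiply each value by a factor in [1 - b, 1 - a] or [1 + a, 1 + b].

    All records belonging to one company share the same direction of perturbation.
    """
    rng = random.Random(seed)
    direction = {}
    noisy = []
    for rec in records:
        company = rec["company"]
        if company not in direction:
            direction[company] = rng.choice((-1, 1))
        factor = 1.0 + direction[company] * rng.uniform(a, b)
        noisy.append({**rec, "value": rec["value"] * factor})
    return noisy


def cell_totals(records):
    """Sum values within each table cell."""
    totals = {}
    for rec in records:
        totals[rec["cell"]] = totals.get(rec["cell"], 0.0) + rec["value"]
    return totals


# Hypothetical microdata: one cell with a single contributor, one with several.
records = [
    {"company": "C1", "cell": "coal/OH", "value": 900.0},
    {"company": "C2", "cell": "grain/IA", "value": 120.0},
    {"company": "C3", "cell": "grain/IA", "value": 100.0},
    {"company": "C4", "cell": "grain/IA", "value": 110.0},
    {"company": "C2", "cell": "grain/IA", "value": 80.0},
]
noisy = add_noise(records)
original_totals = cell_totals(records)
noisy_totals = cell_totals(noisy)
```

A cell with a single contributor always moves by at least the lower bound, while in a cell with many contributors from different companies the perturbations tend to offset one another, so every cell can be published without suppression.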

THE NATIONAL ACADEMIES
COMMITTEE ON NATIONAL STATISTICS

101st Meeting of the Committee on National Statistics

  • Time: Thursday, October 26, 2006, 3:00 pm
  • Location: Auditorium of the Main Building of the National Academy of Sciences, 2101 Constitution Avenue, N.W., Washington, D.C.
  • All are welcome to attend. Please RSVP by October 24 to Bridget Edmonds at (202) 334-3096 or cnstat@nas.edu
  • Welcome and Introduction - Kenneth Prewitt, Columbia University and CNSTAT
  • Developments at the OMB Statistical and Science Policy Office - Katherine K. Wallman
  • Confidentiality Protection and Informed Consent: What Have We Learned? How Should We Proceed? - Eleanor Singer and Roderick J. A. Little, University of Michigan

Abstract:

Expanding access to statistical information while protecting confidentiality, respecting privacy, and providing informed consent is fundamental to the health and usefulness of federal statistics. The conversation on these topics is ongoing because the threats to confidentiality and privacy and the means of data access change over time. It is essential to inform this conversation with empirical knowledge. The seminar features research under way at the University of Michigan, funded by the National Institute of Child Health and Human Development. Eleanor Singer will report on how information provided for informed consent affects public perceptions of the risk of survey participation. Rod Little will report on experiments to estimate disclosure risk from linking survey data with commercial data, and on the effects of topcoding and synthetic methods for confidentiality protection on disclosure risk and analytic utility. Hermann Habermann will discuss.

Title: Partially Synthetic Data for Disclosure Avoidance: An Application to the American Community Survey Group Quarters Data

  • Chair: Yves Thibaudeau, U.S. Census Bureau
  • Speakers: Sam Hawala, U.S. Census Bureau; Rolando Rodriguez, U.S. Census Bureau
  • Discussant: Jerry Reiter, Duke University Institute of Statistics and Decision Sciences
  • Date/time: Thursday, November 2, 2006 / 12:30 to 2:00 p.m.
  • Location: Bureau of Labor Statistics, Conference Center in G440. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Methodology Section, WSS

Abstract:

We investigate the disclosure avoidance approach of releasing to the public partially synthetic microdata. Partially synthetic data are data constructed by keeping some of the actual observations and substituting others with modeled values. This form of data release has the advantage of reflecting the loss of information incurred for confidentiality protection so that inferences are correct. The approach works by selecting variables and records for which we produce synthetic values through models chosen conveniently to satisfy disclosure avoidance but still providing data analysis conclusions comparable to those based on the original data. We assess the method by providing an estimate of disclosure risk and some analytic validity comparisons between estimates of statistics produced from the synthetic and original data.

Title: Data-Driven Systems Biology: Direct Paths From Measurements to Biomedical Insight and Personalized Medicine

  • Speaker: Roland Somogyi, Ph.D.
    Biosystemix Ltd, 1090 Cliffside La. RR1PO,
    Sydenham, ON, K0H2T0, Canada
    http://www.biosystemix.com
  • Date: November 3, 2006
  • Time: 10:00-11:00 AM (refreshments will be served at 9:45)
  • Location: Georgetown University
    Lombardi Comprehensive Cancer Center
    3800 Reservoir Road, NW
    Research Building, E501 Conference Room
    Washington, DC 20057

Abstract:

Detailed information on the state of a person or organism can be measured today using novel, high-throughput genomic molecular activity profiling methods, and established cellular, physiological and clinical assays. Given the complexity of living systems and human disease, there is a growing need for powerful and effective computational approaches to inferring and generating predictive models for personalized medicine and mechanistic discovery from the large-scale molecular and clinical information now available. The deeper insight provided by such models promises breakthroughs in evidence-based personalized medicine and bioengineering. Several challenges must be met to generate effective, systems-level computational models: data organization and management, advanced analysis algorithm development, judicious statistical validation, and visual and conceptual representations that are meaningful to life scientists and clinical practitioners. Successfully meeting these challenges will require scientific and technological development through public and private sector collaborations, and the training of new scientists with interdisciplinary competence in computational, quantitative, and biomedical sciences.

16th ANNUAL MORRIS HANSEN LECTURE

Title: Statistical Perspectives On Spatial Social Science

  • Speaker: Michael F. Goodchild, University of California Santa Barbara
  • Discussants: Sarah Nusser, Iowa State University, and Linda Williams Pickle, National Cancer Institute
  • Lecture slide pdfs:
    Michael F. Goodchild (~3.2 mb)
    Sarah Nusser (~1.7 mb)
    Linda Williams Pickle (~1.1 mb)

Abstract:

Recent commentators have drawn attention to what appears to be a "spatial turn" in several disciplines, including some of the social sciences, driven in part by advances in the geographic information technologies - geographic information systems, the Global Positioning System, and satellite remote sensing - and in part by an increasing emphasis on place-based analysis and policy formulation. It is possible to identify several general characteristics of geographic data, each of which presents problems in the application of traditional statistical methods. Spatial dependence and spatial heterogeneity both run counter to standard assumptions of statistical methods, yet both are potentially useful properties of geographic data. There are interesting applications of classic problems in statistical geometry, and much attention over the past two decades has been devoted to modeling the uncertainties that are inevitably present in geographic data. The presentation ends with comments and speculation on future directions for the field, including an increasing emphasis on the temporal dimension.

Return to top

Title: Working with the American Community Survey: Findings from the National Academies Panel

  • Chair: Susan Schechter, U.S. Census Bureau
  • Speakers: Graham Kalton, Westat; Connie Citro, Committee on National Statistics, the National Academies
  • Discussant: Andrew Reamer, The Brookings Institution
  • Date/Time: Tuesday, November 14, 2006 / 12:30 to 2:00 p.m.
  • Location: Bureau of Labor Statistics, Conference Center in Room 1. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Methodology Section, WSS

Abstract:

The American Community Survey (ACS) has just issued data products for states and large counties and cities for 2005--the first year of full implementation for the survey, which has been in testing and development since 1996. The speakers, who are, respectively, the chair and co-study director of a CNSTAT/NAS Panel on the Functionality and Usability of Information from the ACS, will discuss two topics under review by the panel. The first topic concerns the ways in which the ACS replaces and is different from the decennial census long-form sample and some of the implications for data users. The second topic concerns estimation issues for the ACS, including the weighting for 1-year, 3-year, and 5-year period estimates and the effects of population controls.

Return to top

Title: The Advanced Technology Program: Evaluating a Public-Private Partnership

  • Speaker: Stephanie Shipp, Director, Economic Assessment Office, Advanced Technology Program, NIST
  • Chair: Nancy Donovan, GAO
  • Discussant: Lynda Carlson, NSF
  • Date/time: Wednesday, November 15, 2006 / 12:30 to 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center, Room 9. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Take the Red Line to Union Station.
  • Sponsor: WSS Social and Demographic Statistics Section

Abstract:

The Advanced Technology Program challenges industry to accelerate the development of high-risk technologies that are unlikely to be developed at all or in time to compete in rapidly changing markets. The innovative technologies that ATP funds have the potential to generate significant commercial payoffs and widespread benefits to the U.S. economy, which is the ultimate goal of the program. ATP tracks the progress of these funded projects during the life of the project funding (3 to 5 years) and for up to six years after ATP funding ends. ATP tracks projects even longer if the project is successful. This overview will highlight ATP's evaluation best practices that assess the success of individual projects and the portfolio of all projects.

Return to top

Title: Current Proteome Profiling Methods and Applications

  • Speaker: Yetrib Hathout, Ph.D.
    Children's National Medical Center
    Center for Genetic Medicine
    111 Michigan Avenue, NW
    Washington, DC 20010
  • Date: November 17, 2006
  • Time: 10:00-11:00 AM (refreshments will be served at 9:45)
  • Location: Georgetown University
    Lombardi Comprehensive Cancer Center
    3800 Reservoir Road, NW
    Research Building, E501 Conference Room
    Washington, DC 20057

Abstract:

The overarching goal of many proteomics applications is to compare protein abundance between samples representing different biological and physiological states. Two-dimensional gel electrophoresis is still the gold standard method used by many researchers. However, the technique has limited dynamic range, MW/pI ranges, and throughput. As a result, considerable effort has been devoted to developing new technical strategies for comprehensive comparative proteomics, such as ICAT, 18O labeling, and metabolic labeling. In all of these methods, mass spectrometry plays a crucial role in both the identification and the quantification of proteins. While each technique has its advantages and disadvantages, we found stable isotope labeling by amino acids in cell culture (SILAC) to be the most promising strategy for accurate comparative proteomics when dealing with cell culture. Because labeled and unlabeled cells can be mixed before protein extraction, variations that would result from sample processing and handling are greatly reduced. The talk will focus on the use of the SILAC strategy to accurately monitor changes in protein expression and protein translocation in mammalian cells, and on the use of current bioinformatics tools for data processing and analysis.

Return to top

Title: Empirical Likelihood Methods for Complex Surveys

  • Chair: Phillip S. Kott, National Agricultural Statistics Service
  • Speaker: Changbao Wu, Department of Statistics and Actuarial Science, University of Waterloo
  • Discussant: James Gentle, Department of Computational and Data Sciences, George Mason University
  • Date/Time: Wednesday, November 29, 2006 / 12:30 to 2:00 p.m.
  • Location: Bureau of Labor Statistics, Conference Center in G440. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Methodology Section, WSS

Abstract:

In this talk we provide an overview of recent developments in empirical likelihood (EL) methods for the analysis of complex survey data. Major features of the approach include (1) likelihood-based motivations; (2) the flexibility in using known auxiliary information; (3) data-driven and range-respecting confidence intervals; (4) the power for combining information from multiple surveys and multiple frame surveys; and (5) stable and efficient computational algorithms. The EL method can also be viewed as a general approach to calibration and raking and is practically appealing due to its computational advantages. A bootstrap procedure for the pseudo EL ratio confidence intervals with some limited simulation results will be presented.
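
As a rough illustration of the calibration view mentioned in the abstract, the sketch below computes pseudo empirical likelihood weights for a single auxiliary variable. It is not taken from the talk: the function name `pseudo_el_weights`, the toy data, and the bisection solver are assumptions made for the example. It maximizes the weighted log-likelihood sum(d_i * log p_i), with the design weights d_i normalized to sum to 1, subject to sum(p_i) = 1 and sum(p_i * x_i) = mu, using the standard Lagrange-multiplier form p_i = d_i / (1 + lambda * (x_i - mu)).

```python
def pseudo_el_weights(x, d, mu):
    """Pseudo empirical likelihood weights (illustrative sketch, hypothetical helper).

    x  : auxiliary values for the sampled units
    d  : positive design weights for the sampled units
    mu : known population mean of x used as the calibration benchmark
    """
    total = sum(d)
    dn = [di / total for di in d]      # normalized design weights
    u = [xi - mu for xi in x]          # centered auxiliary values
    if not (min(u) < 0.0 < max(u)):
        # Positive weights exist only if mu lies inside the range of x.
        raise ValueError("mu must lie strictly inside the range of x")

    # All p_i stay positive only for lambda inside this open interval.
    lo, hi = -1.0 / max(u), -1.0 / min(u)

    def g(lam):
        # Constraint function; strictly decreasing in lam on (lo, hi).
        return sum(w * ui / (1.0 + lam * ui) for w, ui in zip(dn, u))

    for _ in range(200):               # plain bisection for g(lam) = 0
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return [w / (1.0 + lam * ui) for w, ui in zip(dn, u)]


# Toy data (assumed for illustration only).
x = [1.0, 2.0, 3.0, 4.0]
d = [10.0, 20.0, 30.0, 40.0]
p = pseudo_el_weights(x, d, mu=2.5)
```

The resulting weights are positive, sum to 1, and reproduce the benchmark mean. The requirement that `mu` lie inside the range of the data is one way to see why EL-based intervals for a mean are range respecting.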

Return to top

Title: Efficient Design and Analysis of Biospecimens with Incomplete Measurements

  • Speaker:
    Albert Vexler, PhD
    Division of Epidemiology, Statistics & Prevention Research
    National Institute of Child Health & Human Development
    NIH/ DHHS
    Rockville, MD 20852
  • Date: December 1, 2006
  • Time: 10:00-11:00 AM (refreshments will be served at 9:45)
  • Location:
    Georgetown University
    Lombardi Comprehensive Cancer Center
    3800 Reservoir Road, NW
    New Research Building, E501 Conference Room
    Washington, DC 20057
  • Phone: 202-687-4114

Abstract:

Pooling biospecimens is a well-accepted sampling strategy in biomedical research for reducing the cost of measuring biomarkers, and has been shown in the case of normally distributed data to yield more efficient estimation. In this paper we examine the efficiency of pooling, in terms of the information matrix for estimators of the unknown parameters, when the biospecimens being pooled yield incomplete observations due to the instruments' limit of detection. Our investigation of three sampling strategies shows that, for a range of values of the detection limit, pooling is the most efficient sampling procedure. For certain other values of the detection limit, pooling can perform poorly.

Surprise: Pooled data based on n observations can be more efficient than unpooled data based on N > n observations, and hence pooling can mitigate the limit-of-detection problem.
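
One possible intuition for this result, offered as a sketch and not as the speaker's derivation: if a pooled measurement is assumed to equal the average of the individual specimens in the pool, and individual values are normally distributed, then averaging shrinks the standard deviation and changes the share of measurements falling below the limit of detection. The function names and parameter values below are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fraction_below_lod(mu, sigma, lod, pool_size=1):
    """Probability that a measurement falls below the limit of detection.

    Assumes each measurement is the average of `pool_size` independent
    N(mu, sigma^2) specimens (an assumption of this sketch).
    """
    sd = sigma / math.sqrt(pool_size)
    return normal_cdf((lod - mu) / sd)


# Detection limit one standard deviation below the mean.
single = fraction_below_lod(mu=1.0, sigma=1.0, lod=0.0)
pooled = fraction_below_lod(mu=1.0, sigma=1.0, lod=0.0, pool_size=4)

# Detection limit one standard deviation above the mean.
single_high = fraction_below_lod(mu=1.0, sigma=1.0, lod=2.0)
pooled_high = fraction_below_lod(mu=1.0, sigma=1.0, lod=2.0, pool_size=4)
```

Under these assumptions, pools of four lose far fewer measurements to the detection limit when the limit sits below the mean, and more when it sits above the mean, which is consistent with the abstract's remark that pooling can perform poorly for some values of the detection limit.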

Return to top

Title: OMB's Proposed Implementation Guidance for the Confidential Information Protection and Statistical Efficiency Act of 2002 (CIPSEA)

  • Chair: Katherine K. Wallman, Office of Management and Budget
  • Speaker: Brian Harris-Kojetin, Office of Management and Budget
  • Date/Time: Monday, December 4, 2006 / 12:30 to 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center Room 1. To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Ave., NE. Take the Red Line to Union Station.
  • Sponsor: Public Policy Section, WSS

Abstract:

The Confidential Information Protection and Statistical Efficiency Act of 2002 (CIPSEA) can provide strong confidentiality protections for statistical information collections, such as surveys and censuses, as well as for other statistical activities, such as data analysis, modeling, and sample design, that are sponsored or conducted by Federal agencies. On October 16, 2006, OMB issued proposed implementation guidance on CIPSEA for public comment. The purpose of the proposed CIPSEA implementation guidance is to inform agencies about the requirements for using CIPSEA and clarify the circumstances under which CIPSEA can be used. In this session, we will provide an overview of the key issues and requirements covered in the guidance and their implications for agencies using CIPSEA to protect information.

Return to top

Title: Ephedra: A Case Study of Statistics, Policy, and Politics

  • Chair: Dwight Brock, Westat
  • Speaker: Sally C. Morton, RTI International
  • Date/Time: Thursday, December 7, 2006 / 12:30 to 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center, Room 9. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Ave., NE. Take the Red Line to Union Station.
  • Sponsor: Washington Statistical Society's Section on Public Policy

Abstract:

In February 2004, the U.S. Food and Drug Administration (FDA) prohibited the sale of dietary supplements containing ephedrine alkaloids (ephedra), stating that such supplements present an unreasonable risk of illness or injury. The Dietary Supplement Health and Education Act (DSHEA) of 1994 governs dietary supplement regulation in the United States. DSHEA places the burden of proof for safety on the government rather than on the manufacturer, and thus differs significantly from regulations that govern the marketing of drugs. Part of the evidence the FDA used in reaching its decision was a systematic review of the efficacy and safety of ephedra conducted by the Southern California Evidence-Based Practice Center. In addition to a meta-analysis of controlled trial data, the review contained an evaluation of observational case report data, a study design that has limited inferential abilities regarding cause and effect.

How did the FDA decide what data were relevant to its decision? How did the FDA argument for the ban differ from a decision based solely on statistical hypothesis testing? This talk will address these questions by describing the systematic review approach, the evidence presented, the interpretation of that evidence by those on both sides of the argument, and the process by which the decision was made.

Return to top

Methodology