
Washington Statistical Society Seminars: 2008

January, 2008
9 Wed. Reliability Growth Projection of One-Shot Systems
9 Wed. Alternative Survey Sample Designs, Seminar #2: Sampling with Multiple Overlapping Frames (U.S. Census Bureau, Demographic Statistical Methods Division Seminar)
16 Wed. Medicaid Underreporting in the CPS: Results from a Record Check Study
17 Thur. Questionnaire Design Guidelines for Establishment Surveys
23 Wed. Coverage Measurement for the 2010 Census
30 Wed. Extracting Intrinsic Modes in Stationary and Nonstationary Time Series Using Reproducing Kernels and Quadratic Programming (U.S. Census Bureau, Statistical Research Division Seminar)
31 Thur. Experiments on the Optimal Design of Complex Survey Questions (U.S. Census Bureau, Statistical Research Division Seminar)
February, 2008
5 Tues. A View From The Field (U.S. Census Bureau, The Wise Elders Program Seminar)
6 Wed. Statistical Analysis of Bullet Lead Compositions as Forensic Evidence
7 Thur. Reducing Disclosure Risk in Microdata and Tabular Data
8 Fri. Functional Shape Analysis to Forecast Box-Office Revenue using Virtual Stock Exchanges (George Washington University, Institute for Integrating Statistics in Decision Sciences and Department of Statistics Seminar)
13 Wed. Conducting a 360 Degree Feedback Survey for Managers: Implementing Organization Culture Change
14 Thur. National Household Travel Survey - Demographic Indicators of Travel Demand
15 Fri. Pre-Modeling via Bayesian Additive Regression Trees (BART) (George Washington University, Institute for Integrating Statistics in Decision Sciences)
20 Wed. Rationalizing Momentum Interactions (University of Maryland, Statistics Program Seminar)
22 Fri. Probability of Detecting Disease-Associated SNPs in Case-Control Genome-Wide Association Studies (Office of Biostatistics Research, National Heart, Lung and Blood Institute)
22 Fri. Resampling from the past to improve on MCMC algorithms (Georgetown University Seminar)
22 Fri. Knowledge Mining in Health Care (George Mason University, CDS/CCDS/Statistics Colloquium Series)
March, 2008
5 Wed. Generalized Confidence Intervals: Methodology and Applications
6 Thur. Bringing Statistical Principles to US Elections
6 Thur. Statistics Can Lie But Can Also Correct for Lies: Reducing Response Bias in NLAAS via Bayesian Imputation
6 Thur. One-Sided Coverage Intervals for a Proportion Estimated from a Stratified Simple Random Sample (University of Maryland, Statistics Program Seminar)
7 Fri. Bayesian Variable Selection Methods For Class Discovery And Gene Selection (George Washington University, Department of Statistics Seminar)
7 Fri. Baby at Risk: The Uncertain Legacies of Medical Miracles for Babies, Families and Society (George Washington University, Department of Decision Sciences and Institute for Integrating Statistics in Decision Sciences Seminar)
13 Thur. A Semiparametric Generalization of One-Way ANOVA
13 Thur. What happens to the location estimator if we minimize with a power other than 2? (University of Maryland, Statistics Program Seminar)
28 Fri. Non-parametric Continuous Bayesian Belief Nets (George Washington University, Institute for Integrating Statistics in Decision Sciences and Department of Statistics Seminar)
28 Fri. Using Cognitive Predictors for Evaluation (U.S. Census Bureau, Statistical Research Division Seminar)
April, 2008
2 Wed. Studies in Military Medicine from the Center for Data Analysis and Statistics (CDAS) at West Point
8 Tues. Using the Peters-Belson Method in EEO Personnel Evaluations
11 Fri. Computation with Imprecise Probabilities (George Washington University, Institute for Integrating Statistics in Decision Sciences and Department of Statistics Seminar)
11 Fri. Survey Design a la carte: Survey Research in the 21st Century (Joint Program in Survey Methodology Distinguished Lecture)
15 Tues. Assessing Disclosure Risk, and Preventing Disclosure, in Microdata
15 Tues. Statistical Meta-Analysis - a Review (U.S. Census Bureau, Statistical Research Division Seminar)
25 Fri. Text Mining, Social Networks, and High Dimensional Analysis (George Mason University, CDS/CCDS/Statistics Colloquium Series)
25 Fri. Network Sampling with Sampled Networks (George Washington University, Department of Statistics)
25 Fri. Preprocessing in High Throughput Biological Experiments (Georgetown University Seminar)
May, 2008
2 Fri. Statistical issues in disease surveillance: A case study from ESSENCE
2 Fri. Some Issues Raised by High Dimensions in Statistics (George Mason University, CDS/CCDS/Statistics Colloquium Series)
5 Mon. Statistical Issues Arising in the Interpretation of a Measure of Relative Disparity Used in Educational Funding: The Zuni School District 89 Case
8 Thur. Different Directorates, Not So Different Approach (U.S. Census Bureau, 9th Wise Elders Program Seminar)
13 Tues. Multivariate Event Detection and Characterization
15 Thur. What's Up at the ASA? (President's Invited Seminar)
16 Fri. Bayesian Dose-finding Trial Designs for Drug Combinations
June, 2008
3 Tues. New Methods in Network And Spatial Sampling
4 Wed. Alternative Survey Sample Designs, Seminar #3: The Pros and Cons of Balanced Sampling (U.S. Census Bureau, Demographic Statistical Methods Division Seminar)
10 Tues. Nonresponse Adjustments in Survey Applications
17 Tues. Recent Developments in Address-based Sampling
18 Wed. Entropy and ROC Based Methods for SNP Selection and GWA Study
26 Thur. Multiple Frame Surveys: Lessons from CBECS Experience (cancelled)
30 Mon. Statistical Policy Issues Arising in Carrying Out the Requirements of the Prison Rape Elimination Act of 2003 (cancelled and will be rescheduled)
September, 2008
4 Thur. Collaborative Efforts to Foster Innovations in Federal Statistics (Roger Herriot Memorial Lecture)
10 Wed. Metadata from the Data Collection Point of View
12 Fri. Cell Lines, Microarrays, Drugs and Disease: Trying to Predict Response to Chemotherapy
23 Tues. Weighted-Covariance Factor Decomposition of VARMA Models Applied to Forecasting Quarterly U.S. GDP at Monthly Intervals
26 Fri. Prediction Limits for Poisson Distribution
October, 2008
7 Tues. Statistical Policy Issues Arising in Carrying Out the Requirements of the Prison Rape Elimination Act of 2003
8 Wed. An Introduction to Using GIS for Geostatistical Analysis
15 Wed. Greenhouse, White House, and Statistics: The use of statistics in environmental decision making
15 Wed. Quantifying the Fraction of Missing Information for Hypothesis Testing in Statistical and Genetic Studies (George Washington University)
17 Fri. Imbalance in Digital Trees and Similarity of Digital Strings (George Washington University)
22 Wed. NORC Data Enclave
23 Thur. The Increasing Difficulty of Obtaining Personal Interviews in the United States' Ever-Changing Social Environment
24 Fri. Usability of Electronic Voting and Public Opinion about the New Technology
28 Tues. The Federal Statistical System: Is It Stronger Now Than It Was Eight Years Ago? (18th Annual Morris Hansen Lecture)
31 Fri. Statistics in Forensic Science (George Washington University, Department of Statistics Seminar)
31 Fri. Volatility, Jump Dynamics in the U.S. Energy Futures Markets (George Mason University, CDS/CCDS/Statistics Colloquium Series)
November, 2008
3 Mon. Can Calibration Be Used to Adjust for "Nonignorable" Nonresponse?
4 Tues. Self-Service Business Intelligence for Statistical Agencies/Departments
6 Thur. Nearest Neighbor Imputation Strategies: Does 'Nearest' Imply Most Likely? And Other Difficult Questions
10 Mon. NOAA's National Weather Service Weather Services for the Nation: A Transition Briefing
13 Thur. Administrative Data in Support of Policy Relevant Statistics: The Medicaid Undercount Projects
14 Fri. High-throughput Flow Cytometry Data Analysis: Tools And Methods In Bioconductor
20 Thur. Bayesian Multiscale Multiple Imputation with Implications to Data Confidentiality
21 Fri. Analysis of Multi-Factor Affine Yield Curve Models (George Washington University, Institute for Integrating Statistics in Decision Sciences and Department of Statistics Seminar)
December, 2008
3 Wed. Administrative Data in Support of Policy Relevant Statistics: The Earned Income Tax Credit (EITC) Eligibility, Participation, and Its Impact on Employment
8 Mon. Clinical research and lifelong learning: An example from the BLISS cluster randomised controlled trial of the effect of active dissemination of information on standards of care of premature babies in England (BEADI) (George Washington University Biostatistics Center Seminar)
9 Tues. Getting Started with ODS Statistical Graphics in SAS 9.2 & An Introduction to SAS Stat Studio
11 Thur. Visualizing Patterns in Data with Micromaps
12 Fri. On Robust Tests For Case-Control Genetic Association Studies (Office of Biostatistics Research, Division of Prevention and Population Sciences, (Bio)Statistics Seminar Series)
12 Fri. Model Building: Data with Random Location and Random Scale Effects
16 Tues. Disclosure Protection: A New Approach to Cell Suppression
18 Thur. Income Data for Policy Analysis: A Comparative Assessment of Eight Surveys


Title: Reliability Growth Projection of One-Shot Systems

  • Speaker: Brian Hall, Army Evaluation Center, Aberdeen Proving Ground, Maryland
  • Chair: Myron Katzoff, CDC/National Center for Health Statistics
  • Date/Time: January 9, 2008 (Wednesday) / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Ave., NE. Take the Red Line to Union Station.
  • Sponsor: WSS Defense and National Security Section

Abstract:

This paper offers several contributions to the area of discrete reliability growth projection. We present a new, logically derived model for estimating the reliability growth of complex, one-shot systems (i.e., the reliability following implementation of corrective actions to known failure modes). Multiple statistical estimation procedures are utilized to approximate this exact expression. A new estimation method is derived to approximate the vector of failure probabilities associated with a complex, one-shot system. A mathematically convenient functional form for the s-expected initial system reliability of a one-shot system is derived. Monte Carlo simulation results are presented to highlight model accuracy with respect to the resulting estimates of reliability growth. This model is useful to program managers and reliability practitioners who wish to assess one-shot system reliability growth.

Index Terms: One-shot system, projection, reliability growth.

Return to top

Title: Alternative Survey Sample Designs, Seminar #2: Sampling with Multiple Overlapping Frames

  • Speaker: Professor Sharon Lohr, Arizona State University
  • Discussant: Professor Jean D. Opsomer, Colorado State University.
  • Date/Time: Wednesday, January 9, 2008 / 9:30 a.m. - 12:00 p.m.
  • Location: U. S. Census Bureau, 4600 Silver Hill Road, Auditorium, Suitland, Maryland. By Metro, use the Green Line to Suitland Station and walk through the Metro parking garage to the main entrance of the Census Bureau. Please send an e-mail to Carol.A.Druin@census.gov, or call (301) 763 4216 to be placed on the visitors' list for this seminar by 4 January 2008. A photo ID is required for security purposes.
  • Sponsor: U.S. Census Bureau, Demographic Statistical Methods Division

Abstract:

The Census Bureau's Demographic Survey Sample Redesign Program, among other things, is responsible for research into improving the designs of demographic surveys, with a particular focus on sample design. Historically, research into improving sample design has been restricted to "mainstream" methods such as basic stratification, multi-stage designs, systematic sampling, probability-proportional-to-size sampling, clustering, and simple random sampling. Over the past thirty years or more, we have increasingly faced reduced response rates and higher costs coupled with an increasing demand for more data on all types of populations. More recently, dramatic increases in computing power and the availability of auxiliary data from administrative records have indicated that we may have more options than we did when we established our current methodology.

This seminar series is the beginning of an exploration into alternative methods of sampling. In this second seminar of the three seminar series, from 9:30 to 10:30, we will hear about Professor Lohr's work on the use of multiple overlapping frames for sampling. She will discuss various alternative approaches and their statistical properties. Following Professor Lohr's presentation, there will be a 10-minute break, and then from 10:40 to 11:30, Professor Jean Opsomer will provide discussion about the methods and their potential in demographic surveys, particularly focusing on impact on estimation. The seminar will conclude with an open discussion session from 11:30 to 11:45 with 15 additional minutes available if necessary.

Seminar #3 is currently slated for June 2, 2008 and will feature Professor Yves Tille of University of Neuchatel in Switzerland discussing balanced sampling.

This event is accessible to persons with disabilities. Please direct all requests for sign language interpreting services, Computer Aided Real-time Translation (CART), or other accommodation needs, to HRD.Disability.Program@census.gov. If you have any questions concerning accommodations, please contact the Disability Program Office at 301-763-4060 (Voice), 301-763-0376 (TTY).

Return to top

Title: Medicaid Underreporting in the CPS: Results from a Record Check Study

  • Chair: Robert Stewart, Congressional Budget Office
  • Speaker: Joanne Pascale, U.S. Census Bureau
  • Discussant: John Czajka, Mathematica Policy Research
  • Date/Time: Wednesday, January 16, 2008 / 12:30 to 2:00 p.m.
  • Location: Bureau of Labor Statistics, Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Methodology Program, WSS
  • Slides from the Presentation (pdf, ~212kb)

Abstract:

The Medicaid program covers roughly 38 million people in the U.S., and the research community regularly studies the effectiveness of the program. Though administrative records provide information on enrollment status and history, the data are 3 years old before they can be used for analysis, and they do not offer information on certain characteristics of Medicaid enrollees, such as their employment status, health status and use of health services. Researchers generally turn to surveys for this type of rich data, and the Current Population Survey (CPS) is one of the most common sources used for analysis. However, there is a fairly substantial literature that indicates Medicaid is underreported in surveys when compared to counts from records. Recently an inter-agency team of researchers was assembled to address the Medicaid undercount issue in the CPS. Records on enrollment in 2000-2001 were compiled from the Medicaid Statistical Information System (MSIS) and matched to the CPS survey data covering the same years. This matched dataset allows researchers to compare data on known Medicaid enrollees to survey data in which those same enrollees were (or were not) reported to have been covered by Medicaid. This kind of "truth source" enables a rich analysis of the respondent and household member characteristics associated with Medicaid misreporting. In the CPS a single household respondent is asked questions about coverage status for all other household members, and one possible source of misreporting is the relationship between the household respondent and the other household members for whom he or she is reporting. Recent research from cognitive testing of the CPS suggests that the household respondent may be more likely to report accurately about another household member if they both share the same coverage. This paper explores whether the hypothesis suggested by cognitive testing is evident in the records data. 
Other variables are also considered, such as recency and duration of coverage and demographics of both respondents and people for whom they are reporting.

Return to top

Title: Questionnaire Design Guidelines for Establishment Surveys

  • Speaker: Rebecca L. Morrison, Survey Statistician, U.S. Census Bureau
  • Discussant: Brenda G. Cox, Survey Research Leader, Battelle
  • Chair: Jennifer K. Lawhorn, Graduate Student, Georgetown University and Intern, Energy Information Administration
  • Date/Time: Thursday, January 17, 2008 / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center, Room 9. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Take the Red Line to Union Station.
  • Sponsor: WSS Data Collection Methods
  • Presentation material:
    Slides from the presentation (pdf, ~7.6mb)
    Presentation handout (pdf, ~60kb)
    Slides from the discussant (pdf, ~432kb)

Abstract:

Previous literature has shown the effects of question wording or visual design on the data provided by respondents. However, few articles have been published that link the effects of question wording and visual design to the development of questionnaire design guidelines. This article proposes specific guidelines for the design of establishment surveys within statistical agencies based on theories regarding communication and visual perception, experimental research on question wording and visual design, and findings from cognitive interviews with establishment survey respondents. The guidelines are applicable to both paper and electronic instruments, and cover such topics as the phrasing of questions, the use of space, the placement and wording of instructions, the design of answer spaces, and matrices.

This talk is an expanded version of a paper given at ICES-3 in Montreal, Quebec, Canada in June 2007. It represents a collaborative effort with Don A. Dillman (Washington State University), and Leah M. Christian (University of Georgia).

Return to top

Title: Coverage Measurement for the 2010 Census

  • Chair: Gregg Diffendal, U.S. Census Bureau
  • Speaker: Thomas Mule, U.S. Census Bureau
  • Discussant: Michael Cohen, Committee on National Statistics
  • Date/Time: Wednesday, January 23, 2008 / 12:30 to 2:00 p.m.
  • Location: Bureau of Labor Statistics, Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Methodology Program, WSS
  • Presentation material:
    Slides from the presentation (pdf, ~500kb)
    Slides from the discussion (pdf, ~84kb)

Abstract:

For the 2010 Census Coverage Measurement (CCM), we plan to use logistic regression modeling instead of post-stratification cells in the dual system estimation. We believe that by using logistic regression we can potentially utilize more variables than we have used in the past in trying to minimize the impact of correlation bias and high variances. Logistic regression gives us the option of using variables in the modeling as main effects without having to introduce unnecessary interactions. In addition to potentially utilizing more variables, logistic regression can also use variables in the model as continuous variables. This presentation shows some of the initial results of using continuous variables for the modeling and dual system estimation.
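For readers unfamiliar with the estimator the abstract builds on: within a single post-stratification cell (or model cell), dual system estimation reduces to the classical capture-recapture (Lincoln-Petersen) form. A minimal Python sketch, using hypothetical counts rather than Census Bureau figures:

```python
def dual_system_estimate(census_count, survey_count, matched_count):
    """Dual system (Lincoln-Petersen) estimate of the true population:
    N_hat = (n1 * n2) / m, where n1 persons were counted by the census,
    n2 by the independent coverage survey, and m were matched to both."""
    return census_count * survey_count / matched_count

# Hypothetical cell: 900 census enumerations, 800 coverage-survey
# persons, 720 matched between the two systems.
n_hat = dual_system_estimate(900, 800, 720)
print(n_hat)  # 1000.0
```

Post-stratification applies this form separately within each cell; the logistic-regression approach described in the abstract instead models each person's capture probability as a function of main-effect (possibly continuous) covariates.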

Return to top

Topic: Extracting Intrinsic Modes in Stationary and Nonstationary Time Series Using Reproducing Kernels and Quadratic Programming

  • Speaker: Christopher D. Blakely, Statistical Research Division, U.S. Census Bureau
  • Date: January 30, 2008
  • Time: 10:00 a.m. - 11:00 a.m.
  • Location: U.S. Census Bureau, 4600 Silver Hill Road, Seminar Room 5K410, Suitland, Maryland. Please call (301) 763-4974 to be placed on the visitors' list. A photo ID is required for security purposes. All visitors to the Census Bureau are required to have an escort to Seminar Room 5K410. An escort will leave from the Guard's Desk at the Metro Entrance (Gate 7) with visitors who have assembled at 9:55 a.m. Parking is available at the Suitland Metro.
  • Sponsor: U.S. Census Bureau, Statistical Research Division

Abstract:

The Empirical Mode Decomposition (EMD) method is a nonlinear adaptive process for stationary and nonstationary time series which produces a finite family of Intrinsic Mode Functions (IMFs) from which many time-frequency properties of the data can be analyzed. The main difficulty in extracting IMFs is in choosing the criteria of convergence in the iterative extraction algorithm along with the basis in which the IMFs are represented. In this paper, we introduce a new method for extracting intrinsic modes using certain linear combinations of reproducing kernel functions which satisfy a quadratic programming problem with both equality and inequality constraints. We discuss advantages of this proposed method of signal extraction compared to the classical EMD approach and present applications to nonstationary time series.
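For context on what the proposed method replaces: the core of classical EMD is the sifting step, which subtracts the mean of the envelopes drawn through the local extrema and repeats until an IMF criterion holds. A minimal NumPy sketch of a single sifting pass (linear envelopes for brevity; classical EMD uses cubic splines, iterates, and handles the endpoints carefully):

```python
import numpy as np

def local_extrema(x):
    """Indices of interior local maxima and minima of x."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1
    return maxima, minima

def sift_once(t, x):
    """One sifting pass: remove the mean of the upper and lower
    envelopes interpolated through the local extrema."""
    maxima, minima = local_extrema(x)
    upper = np.interp(t, t[maxima], x[maxima])  # upper envelope
    lower = np.interp(t, t[minima], x[minima])  # lower envelope
    return x - (upper + lower) / 2.0

# A fast mode riding on a slow mode; one sift already pushes the
# candidate IMF toward the fast oscillation.
t = np.linspace(0.0, 1.0, 500)
x = np.sin(2 * np.pi * 25 * t) + 0.5 * np.sin(2 * np.pi * 3 * t)
h = sift_once(t, x)
```

The talk's contribution replaces this heuristic envelope construction with a quadratic program over reproducing kernel expansions, avoiding the convergence-criterion and basis-choice difficulties noted in the abstract.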

This seminar is physically accessible to persons with disabilities. For TTY callers, please use the Federal Relay Service at 1-800-877-8339. This is a free & confidential service. To obtain Sign Language Interpreting services/CART (captioning real time) or auxiliary aids, please send your request via e-mail to EEO Interpreting & CART: eeo.interpreting.&.CART@census.gov or TTY 301-457-2540, or by voice mail at 301-763-2853, then select #2 for EEO Program Assistance.

Return to top

Topic: Experiments on the Optimal Design of Complex Survey Questions

  • Speaker: Paul Beatty, National Center for Health Statistics
  • Date/Time: January 31, 2008, 10:30 - 11:30 a.m.
  • Location: U.S. Census Bureau, 4600 Silver Hill Road, Seminar Room 5K410, Suitland, Maryland. Please call (301) 763-4974 to be placed on the visitors' list. A photo ID is required for security purposes. All visitors to the Census Bureau are required to have an escort to Seminar room 5K410. An escort will leave from the Guard's Desk at the Metro Entrance (Gate 7) with visitors who have assembled at 10:25 a.m. Parking is available at the Suitland Metro.
  • Sponsor: U.S. Census Bureau, Statistical Research Division

Abstract:

Many survey questions are complex in order to convey very specific information to respondents. Often, questions could be written in several different ways to meet specific measurement goals. For example, the information could be compressed into a single complex item or spread out over multiple questions; when single questions are used, the information within them can be structured in various ways; and potentially ambiguous concepts can be illustrated through examples or definitions. Questionnaire designers must also decide whether it is necessary to include certain details in questions, or whether general statements will be sufficient to explain the topic of interest and to adequately stimulate recall. Although questionnaire design principles provide some advice on constructing complex questions, little empirical evidence demonstrates the superiority of certain decisions over others.

This seminar presents results from several rounds of split-ballot experiments that were designed to provide more systematic guidance on these issues. Alternative versions of questions were embedded in RDD telephone surveys (n=450 in one set of experiments and n=425 in another). In some experiments, alternative questions used the same words but were structured differently. Other experiments compared the use of examples and definitions to explain complex concepts, compared the use of one vs. two questions to measure the same phenomenon, and compared questions before and after cognitive interviews had been used to clarify key concepts. With respondent permission, interviews were tape recorded and behavior-coded, making it possible to compare various interviewer and respondent difficulties across question versions, in addition to comparing differences in response distributions.

Taken altogether, the results begin to suggest some general design principles for complex questions. For example, the disadvantages of presenting information that "dangles" after the question mark are becoming clear, as are the advantages of using multiple questions to disentangle certain complex concepts. The paper will report results of these and other experimental comparisons, with an eye toward providing more systematic questionnaire design guidance.

This seminar is physically accessible to persons with disabilities. For TTY callers, please use the Federal Relay Service at 1-800-877-8339. This is a free & confidential service. To obtain Sign Language Interpreting services/CART (captioning real time) or auxiliary aids, please send your request via e-mail to EEO Interpreting & CART: eeo.interpreting.&.CART@census.gov or TTY 301-457-2540, or by voice mail at 301-763-2853, then select #2 for EEO Program Assistance.

Return to top

Title: A View From The Field

  • Speaker: James Holmes, Former Acting Director, U.S. Census Bureau
  • Date/Time: Tuesday, February 5, 2008, 10:30 a.m.- Noon
  • Location: U.S. Census Bureau, 4600 Silver Hill Road, Census Bureau Auditorium, Suitland, Maryland. Please call (301) 763-2118 to be placed on the visitors' list. A photo ID is required for security purposes.
  • Sponsor: U.S. Census Bureau, The Wise Elders Program. The Human Capital Management Council and the SRD Seminar Series are pleased to sponsor this 8th Wise Elders' Program presentation.

Abstract:

I will talk about some of my experiences and career with the Census Bureau. I will talk a bit about the Field data collection environment and some of the changes that I have observed over the last 40 years. I will also talk a bit about the potential impact of those changes on the Census Bureau and the larger statistical community. I will conclude with some thoughts on the challenges I see ahead for the Census Bureau.

Biography:

In June 2005, James Holmes retired after a distinguished 37-year career with the United States Census Bureau. His position just prior to retirement was Director for the Atlanta Regional Office. During his career, Jim also served as the Director for the Philadelphia Regional Office, Assistant Director for the Los Angeles Regional Office, Assistant Census Manager for the Kansas City Regional Office, as well as other management positions in the Kansas City and Detroit Regions.

In January 1998, Jim was appointed by then Secretary of Commerce William Daley, to serve as Acting Director of the Census Bureau while the search was conducted for a new permanent Director. He is the only African American to have served as Census Bureau Director (acting or otherwise). Jim served in that capacity through October 1998, when Dr. Kenneth Prewitt was sworn in as Director. In the press release announcing Dr. Prewitt's confirmation, Secretary Daley made the following statement:

"I would also like to take this opportunity to thank James Holmes, who has served with distinction as Acting Director of the Census Bureau for most of this year. Jim has done a superb job, successfully guiding the Census Bureau at a very critical time. I salute Jim for his hard work and success. Both Dr. Prewitt and I will continue to turn to him for his trusted insights as the important work of the Census Bureau moves forward."

Important Information:

Please e-mail or call LaVonne Lewis by COB, Friday, February 1, to be placed on the visitors' list: lavonne.m.lewis@census.gov; (301) 763-2118. A photo ID is required for security purposes.

Please direct all requests for Sign Language Interpreting Services, Computer Aided Real-time (CART), or other accommodation needs, to HRD.Disability.Program@census.gov. If you have any questions concerning accommodations, please contact the Disability Program Office at 301-763-4060 (Voice), 301-763-0376 (TTY).

Return to top

Title: Statistical Analysis of Bullet Lead Compositions as Forensic Evidence

  • Speaker: Karen Kafadar, Indiana University
  • Chair: Myron Katzoff, CDC/National Center for Health Statistics
  • Date/Time: February 6, 2008 (Wednesday) / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Ave., NE. Take the Red Line to Union Station.
  • Sponsor: WSS Defense and National Security Section

Abstract:

Since the 1960s, the FBI has performed Compositional Analysis of Bullet Lead (CABL), a forensic technique that compares the elemental composition of bullets found at a crime scene to that of bullets found in a suspect's possession. CABL has been used when no gun is recovered, or when bullets are too small or fragmented to compare striations on the casings with those on the gun barrel.

The National Academy of Sciences formed a Committee charged with the assessment of CABL's scientific validity. The report, "Forensic Analysis: Weighing Bullet Lead Evidence" (National Research Council, 2004), included discussions on the effects of the manufacturing process on the validity of the comparisons, the precision and accuracy of the chemical measurement technique, and the statistical methodology used to compare two bullets and test for a "match". The report has been cited in recent appeals brought forth by defendants whose trials involved bullet lead evidence (60 Minutes, 11/18/2007; Washington Post, 11/18-19/2007). This talk will focus on the statistical analysis: the FBI's methods of testing for a "match", the apparent false positive and false negative rates, the FBI's clustering algorithm ("chaining"), and the Committee's recommendations. Additional analyses on data later made available, and the use of forensic evidence in general, will also be discussed.


Return to top

Title: Reducing Disclosure Risk in Microdata and Tabular Data

Abstracts:

Analytically Valid Discrete Data Files and Re-identification (Winkler)

With the exception of synthetic data (e.g., Reiter 2002, 2005) and a few other methods (Kim 1986, Dandekar, Cohen, and Kirkendal 2002), masking methods and resultant public-use files are seldom justified in terms of valid analytic properties. If a file has valid analytic properties, then the analytic characteristics can be used as a starting point for re-identification using analytic methods only (Lambert 1993, Fienberg 1997). In this paper, we describe a general method for building a synthetic data file having valid analytic properties. If we use general modeling/edit/imputation methods (Winkler 2007a, 2007b, 2008) that allow additional convex constraints, then we can create synthetic data with nearly identical analytic properties and with significantly reduced re-identification risk.

Comparative Evaluation of Seven Different Sensitive Tabular Data Protection Methods Using a Real Life Table Structure of Complex Hierarchies and Links (Dandekar)

The practitioners of tabular data protection methods in federal statistical agencies have some familiarity with commonly used table structures. However, they require some guidance on how to evaluate the appropriateness of various sensitive tabular data protection methods when applied to their own table structures. With that in mind, we use a real-life "typical" table structure of moderate hierarchical and linked complexity and populate it with synthetic microdata to evaluate the relative performance of seven different tabular data protection methods. The methods selected for the evaluation are: 1) lp-based classical cell suppression; 2) lp-based CTA (Dandekar 2001); 3) network flow-based cell suppression as implemented in DiAna, a software product made available to other federal statistical agencies by the US Census Bureau; 4) a microdata-level noise addition method documented in a US Census Bureau research paper; 5) a hybrid EM/IPF-based CTA method; 6) a simplified CTA method; 7) a conventional rounding-based method.

Return to top

Title: Functional Shape Analysis to Forecast Box-Office Revenue using Virtual Stock Exchanges

  • Speaker: Wolfgang Jank, Robert H Smith School of Business, University of Maryland
  • Date/Time: Friday, February 8th 3:00-4:00 pm
  • Location: Funger Hall 320, George Washington University
  • Sponsor: The George Washington University, The Institute for Integrating Statistics in Decision Sciences and Department of Statistics

Abstract:

In this paper we propose a novel model for forecasting innovation success based on online virtual stock markets. In recent years, online virtual stock markets have increasingly been used as an economical and efficient information-gathering tool for the online community. They have been used to forecast events ranging from presidential elections to sporting events and have been applied by major corporations such as HP and Google for internal forecasting. In this study, we demonstrate the predictive power of online virtual stock markets, as compared to several conventional methods, in forecasting demand for innovations in the context of the motion picture industry. In particular, we forecast the release-weekend box office performance of movies, which serves as an important planning tool for allocating marketing resources, determining optimal release timing and advertising strategies, and coordinating production and distribution for different movies. We accomplish this forecasting task using novel statistical methodology from the area of functional data analysis. Specifically, we develop a forecasting model that uses the entire trading path rather than only its final value. We also employ trading dynamics and tease out differences between trading paths using functional shape analysis. Our results show that the model has strong predictive power and improves tremendously over competing approaches.

Return to top

Title: Conducting a 360 Degree Feedback Survey for Managers: Implementing Organization Culture Change

  • Speaker: Eduardo S. Rodela, Ph.D., Rodela Consulting Group, Fairfax, Virginia
  • Chair: Mel Kollander
  • Date/Time: February 13, 2008 (Wednesday) / 12:30 - 1:30 p.m.
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: WSS Agriculture and Natural Resources Section

Abstract:

A nationwide 360 Degree Feedback (Multi-Source) Pilot for Managers was conducted in a federal agency. The multi-source work group conducted an on-line survey asking managers to rate themselves on fifty-two questions addressing important managerial behaviors (e.g., strategic planning skills). Around 500 managers participated in the study. In addition to having the managers rate themselves, the managers' supervisors, peer group, and employees rated each manager as well. While the pilot work group did not have access to survey data, a number of lessons were learned about administering a multi-source feedback pilot for managers. The discussion will focus on the survey methodology and lessons learned.

e-mail: esrodela@cox.net

Return to top

Title: National Household Travel Survey - Demographic Indicators of Travel Demand

  • Speaker: Heather Contrino, Federal Highway Administration, Department of Transportation
  • Chair: Promod Chandhok, RITA/Bureau of Transportation Statistics
  • Date/Time: February 14, 2008 (Thursday) / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Ave., NE. Take the Red Line to Union Station.
  • Sponsor: Social & Demographic Statistics Section

Abstract:

Since 1969, the National Household Travel Survey (NHTS) has collected information about the U.S. population's daily travel behavior. The NHTS connects detailed trip characteristics to vehicle information, geography, and household and person demographic data. The study is designed primarily to obtain behavioral data on travel demand needed for performance measurement, policy analyses, and program development and prioritization.

This presentation will provide an overview of the information included in the NHTS and of how demographic information plays an important role in assessing programs and policies and in forecasting future demand. The discussion will include an overview of methods, emerging challenges, and example applications of the data, including integration with other data sources such as the ACS.

Return to top

Title: Pre-Modeling via Bayesian Additive Regression Trees (BART)

  • Speaker: Edward I. George, University of Pennsylvania
  • Date/Time: Friday, February 15th 11:00 am - 12:00 noon
  • Location: DUQUES 254, 2201 G Street, NW.
  • Sponsor: The George Washington University, The Institute for Integrating Statistics in Decision Sciences

Abstract:

Consider the canonical regression setup where one wants to learn about the relationship between y, a variable of interest, and x1…xp, potential predictor variables. Although one may ultimately want to build a parametric model to describe and summarize this relationship, preliminary analysis via flexible nonparametric models may provide useful guidance. For this purpose, we propose BART (Bayesian Additive Regression Trees), a flexible nonparametric ensemble Bayes approach for estimating f(x1…xp) ≡ E(Y|x1…xp), for obtaining predictive regions for future y, for describing the marginal effects of subsets of x1…xp, and for model-free variable selection. Essentially, BART approximates f by a Bayesian "sum-of-trees" model where fitting and inference are accomplished via an iterative backfitting MCMC algorithm. By using a large number of trees, which yields a redundant basis for f, BART is seen to be remarkably effective at finding highly nonlinear relationships hidden within a large number of irrelevant potential predictors. BART also provides an omnibus test: the absence of any relationship between y and any subset of x1…xp is indicated when the corresponding BART posterior intervals reveal no signal.

This is joint work with Hugh Chipman and Robert McCulloch.
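As a rough illustration of the sum-of-trees idea (not BART itself, which places priors on the trees and draws them via MCMC), the following sketch backfits an ensemble of regression stumps to a nonlinear signal; the data-generating function, stump count, and split grid are all hypothetical choices for this demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 500, 50                        # observations, stumps in the ensemble
x = rng.uniform(-2, 2, n)
y = np.sin(2 * x) + rng.normal(0, 0.1, n)

def fit_stump(x, r):
    """Best single-split regression stump for residual r (grid search)."""
    best_sse, best = np.inf, None
    for s in np.quantile(x, np.linspace(0.05, 0.95, 19)):
        left = x <= s
        mu_l, mu_r = r[left].mean(), r[~left].mean()
        sse = np.sum((r - np.where(left, mu_l, mu_r)) ** 2)
        if sse < best_sse:
            best_sse, best = sse, (s, mu_l, mu_r)
    return best

def predict_stump(stump, x):
    s, mu_l, mu_r = stump
    return np.where(x <= s, mu_l, mu_r)

# cyclic backfitting: refit each stump to the partial residual of the others
fits = np.zeros((m, n))
for sweep in range(10):
    for j in range(m):
        r = y - fits.sum(axis=0) + fits[j]
        fits[j] = predict_stump(fit_stump(x, r), x)

mse = np.mean((y - fits.sum(axis=0)) ** 2)
print(f"in-sample MSE of the stump ensemble: {mse:.4f}")
```

Unshrunk stumps backfit this way overfit in-sample; BART's tree priors and residual-variance draws are what regularize each component of the redundant basis.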

Return to top

Title: Rationalizing Momentum Interactions

  • Speaker: Professor Doron Avramov, University of Maryland R.H. Smith School of Business and Electrical and Computer Engineering Department
  • Time and Date: Wednesday, February 20, 2008 at 4pm
  • Location: 0106 Math Building, University of Maryland, College Park, MD 20742
  • Sponsor: University of Maryland, Statistics Program

Abstract:

Momentum profitability concentrates in high information uncertainty and high credit risk firms and is virtually nonexistent otherwise. This paper rationalizes such momentum interactions in equilibrium asset pricing. In our paradigm, dividend growth is mean reverting, expected dividend growth is persistent, the representative agent is endowed with the stochastic differential utility of Duffie and Epstein (1992), and leverage, which proxies for credit risk, is modeled based on Abel's (1999) formulation. Using reasonable risk aversion levels we are able to reproduce the observed momentum effects. In particular, momentum profitability is especially large in the interaction between highly levered firms and firms with risky cash flows. It rapidly deteriorates and ultimately disappears as leverage or cash flow risk diminishes.

Please check for seminar updates at: http://www.math.umd.edu/statistics/seminar.shtml

Directions to Campus: http://www.math.umd.edu/department/campusmap.shtml

Return to top

Title: Probability of Detecting Disease-Associated SNPs in Case-Control Genome-Wide Association Studies

  • Speaker: Dr. Mitchell H. Gail, Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute
  • Date/Time: Friday, February 22, 11:00 am - 12:00 pm
  • Location: Conference Room 9201, Two Rockledge Center, 6701 Rockledge Drive, Bethesda, MD 20892
  • Sponsor: National Heart, Lung and Blood Institute, Office of Biostatistics Research

Abstract:

Some case-control genome-wide association studies (GWASs) select promising single nucleotide polymorphisms (SNPs) by ranking corresponding p-values, rather than by applying the same p-value threshold to each SNP. We define the detection probability (DP) for a specific disease-associated SNP as the probability that the SNP will be "T-selected", namely have one of the top T largest chi-square values (or smallest p-values) for trend tests of association among the N (~ 500,000) SNPs studied. The corresponding proportion positive (PP) is the fraction of selected SNPs that are true disease-associated SNPs. We study DP and PP analytically and via simulations, both for fixed and for random effects models of genetic risk. For a genetic odds ratio per disease allele of 1.2 or less, even a GWAS with 1000 cases and 1000 controls requires T to be impractically large to achieve an acceptable DP, leading to PP values so low as to make the study futile and misleading. These results for one-stage designs have implications for two- and multi-stage designs. In particular, a large fraction of the available cases and controls usually must be studied in the first stage if the study is to have adequate DP.

This is the joint work of Mitchell H. Gail, Ruth M. Pfeiffer, William Wheeler and David Pee.
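The detection probability can be approximated by simulation under simplified assumptions: a single disease SNP whose 1-df trend statistic is noncentral chi-square (the noncentrality value below is a hypothetical choice, not taken from the talk), with the remaining N − 1 null statistics central chi-square.

```python
import math
import numpy as np

rng = np.random.default_rng(1)
N, T, reps = 500_000, 1_000, 5_000
ncp = 10.0   # assumed noncentrality of the disease SNP's trend test

hits = 0
for _ in range(reps):
    c = rng.noncentral_chisquare(1.0, ncp)        # disease SNP's statistic
    p_exceed = math.erfc(math.sqrt(c / 2.0))      # P(chi2_1 > c) for one null SNP
    n_exceed = rng.binomial(N - 1, p_exceed)      # null SNPs ranking above it
    hits += n_exceed < T                          # SNP is "T-selected"
dp = hits / reps
print(f"estimated detection probability: {dp:.3f}")
```

Drawing the count of null exceedances as a Binomial(N − 1, p) avoids simulating half a million null statistics per replicate; the chi-square(1) survival function is computed from erfc since a chi-square(1) variate is a squared standard normal.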

Return to top

Title: Resampling from the past to improve on MCMC algorithms

  • Speaker: Yves Atchade, Department of Statistics, University of Michigan
  • Time: Friday, February 22, 3:15 - 4:15 pm
  • Place: 326 St. Mary's Hall, Department of Mathematics, Georgetown University
  • Sponsor: Georgetown University

Abstract:

Markov chain Monte Carlo (MCMC) methods provide a very general and flexible approach to sampling from arbitrary probability distributions. MCMC has considerably expanded the capability of statistics in dealing with more realistic models. But designing MCMC samplers with good mixing properties is often tedious and involves much trial and error. This talk will explore various ideas in which sample paths are re-used to build more adaptive and automatic MCMC samplers. I will discuss the mixing of these new samplers theoretically and through examples.

Technical Report available at:
http://www.stat.lsa.umich.edu/~yvesa/eprop.pdf

Return to top

Title: Knowledge Mining in Health Care

  • Speaker:
    Janusz Wojtusiak
    Machine Learning and Inference Laboratory and Center for Discovery Science and Health Informatics
    George Mason University
  • Time: 10:30 a.m. Refreshments, 10:45 a.m. Colloquium Talk
  • Date: February 22, 2008
  • Location:
    Department of Computational and Data Sciences George Mason University
    Research 1, Room 302, Fairfax Campus
    George Mason University, 4400 University Drive, Fairfax, VA 22030
  • Sponsor: George Mason University CDS/CCDS/Statistics Colloquium

Abstract:

Knowledge mining concerns discovering knowledge that is useful to and understandable by people. Unlike traditional data mining, it is concerned not only with discovering useful patterns in large volumes of data, but also with learning from small datasets that may be deficient, and with extensive use of background knowledge. This talk presents an approach to knowledge mining developed at the GMU Machine Learning and Inference Laboratory, its relation to health care, and some applications in this area.

Return to top

Title: Generalized Confidence Intervals: Methodology and Applications

  • Speaker: Thomas Mathew, University of Maryland Baltimore County
  • Chair: Myron Katzoff, CDC/National Center for Health Statistics
  • Date/Time: March 5, 2008 (Wednesday) / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Ave., NE. Take the Red Line to Union Station.
  • Sponsor: WSS Defense and National Security and Public Health and Biostatistics Sections

Abstract:

The concept of generalized confidence intervals is fairly recent and is useful for obtaining confidence intervals for certain complicated parametric functions. Usual confidence intervals are derived using the percentiles of a pivotal quantity. Generalized confidence intervals are instead derived from a generalized pivotal quantity, which is a function of a random variable, its observed value, and the parameters. In the talk, I will explain the construction of a generalized pivotal quantity and describe the conditions it must satisfy. I will then discuss a series of applications of the generalized confidence interval methodology to a number of somewhat complicated problems: confidence intervals for (i) the lognormal mean, (ii) the lognormal variance, (iii) the mean and variance of limited and truncated normal as well as lognormal distributions, and (iv) some problems involving random effects models. In each case, I will motivate the problem with specific applications and illustrate the results with relevant data analyses. Some attractive features of generalized confidence intervals are that they are easy to compute and exhibit excellent performance even for small sample sizes. I will also comment on situations where the normality assumption does not hold.
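For the lognormal-mean case mentioned in (i), a generalized pivotal quantity can be simulated directly. The sketch below uses hypothetical log-scale data and targets theta = mu + sigma^2/2 (the log of the lognormal mean), using the standard GPQ built from an independent normal Z and chi-square V; it is an illustration of the construction, not code from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical observed data on the log scale
y = rng.normal(loc=1.0, scale=0.5, size=20)
n, ybar, s2 = len(y), y.mean(), y.var(ddof=1)

# GPQ for theta = mu + sigma^2/2: substitute Z ~ N(0,1) for the studentized
# mean and V ~ chi2_{n-1} for the scaled sample variance
B = 100_000
Z = rng.standard_normal(B)
V = rng.chisquare(n - 1, B)
G = ybar - Z / np.sqrt(n) * np.sqrt((n - 1) * s2 / V) + (n - 1) * s2 / (2 * V)

# percentiles of exp(G) give the generalized CI for the lognormal mean
lo, hi = np.exp(np.percentile(G, [2.5, 97.5]))
print(f"95% generalized CI for the lognormal mean: ({lo:.3f}, {hi:.3f})")
```

The appeal matched by the abstract: no asymptotics are invoked, only Monte Carlo over the pivotal draws, so the procedure behaves well even at small n.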

Return to top

Title: Bringing Statistical Principles to US Elections

  • Speakers:
    Arlene Ash, Boston University School of Medicine
    Mary Batcher, Ernst & Young, LLP
  • Discussant: David Marker, Westat
  • Chair: Wendy Rotz, Ernst & Young, LLP
  • Date/Time: March 6, 2008 (Thursday) / 12:30 - 2 p.m.
  • Location: Bureau of Labor Statistics, Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Human Rights Statistics, WSS

Abstract:

Members of the ASA Special Interest Group on Volunteerism and the ASA Scientific and Public Affairs Advisory Committee have been actively working on issues related to elections. Vote counts seem to be off in some measurable way in some precinct whenever there is an election. The most recent example is the November 2006 result in the 13th district of Florida, where the undervote, apparently due to poor ballot design, appears to have changed the election outcome. These incidents provide interesting discussions for statisticians and survey methodologists, but the more important result is that they undermine confidence in the electoral process. Electronic vote tally miscounts arise for many reasons, including hardware malfunctions, unintentional programming errors, malicious tampering, and stray ballot marks that interfere with correct counting. Thus, Congress and several states are considering requiring audits to compare machine tabulations with hand counts of paper ballots in randomly chosen precincts. This session will describe some of the analyses that have been used to indicate potential problems. It will also describe work that ASA members have been doing in conjunction with election activists to bring statistical principles to the procedures for sampling precincts for post-election audits of election results.

Return to top

Title: Statistics Can Lie But Can Also Correct for Lies: Reducing Response Bias in NLAAS via Bayesian Imputation

  • Speaker: Xiao-Li Meng, Harvard University
  • Chair: David Cantor, Westat
  • Date/Time: March 6, 2008 (Thursday) / 3:30 - 4:30 p.m.
  • Location: Bureau of Labor Statistics, Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Methodology Program, WSS

Abstract:

This talk is based on joint work with Liu, Chen and Alegria of the same title. The National Latino and Asian American Study (NLAAS) is a multi-million dollar survey of psychiatric epidemiology, the most comprehensive survey of its kind. Data from the NLAAS were made public in July 2007. A unique feature of NLAAS is its embedded experiments for estimating the effect of alternative interview question orderings. Although the findings from the experiments were not completely unexpected, the magnitudes of the effects were nevertheless astonishing. Compared to survey results from the widely used traditional ordering, the self-reported psychiatric service-use rates often doubled or even tripled under the new, more sensible, ordering introduced by NLAAS. These findings partially answer some perplexing questions in the literature, e.g., why the self-reported rates of using religious services were typically much lower than results from other sources of empirical evidence. At the same time, however, these new insights come at a price. For example, how can one assess racial disparities when different races were surveyed with different survey instruments (e.g., the existing data on white populations were collected using the traditional questionnaire ordering), now that these instruments are known to induce substantial differences? The project documented in this paper is part of the effort to address these questions. We do this by creating models for imputing the responses that respondents under the traditional survey would have given had they not been able to take advantage of skip patterns to reduce interview time. The ability to skip large numbers of questions resulted in increased rates of untruthful negative responses over the course of the interview.
The task of modeling the imputation is particularly challenging because of the complexity of the questionnaire, the small sample sizes for subgroups of interest, the existence of high-order interactions among variables, and above all, the need to provide sensible imputation for whatever subpopulation a future user might be interested in studying. This paper is intended to serve three purposes: (1) to provide a published record of the key steps and strategies adopted in creating the released multiple imputations for NLAAS, (2) to alert potential users to the limitations of the imputed data, and (3) to provide a vivid demonstration of the type of challenges and opportunities typically encountered in modern applied statistics.

Return to top

Title: One-Sided Coverage Intervals for a Proportion Estimated from a Stratified Simple Random Sample

  • Speaker: Dr. Phil Kott, USDA National Agricultural Statistics Service, Research & Development Division
  • Time and Date: Thursday, March 6, 2008, 3:30pm
  • Location: Room MTH 1313, Math Building, University of Maryland, College Park, MD 20742
  • Sponsor: University of Maryland, Statistics Program

Abstract:

It is well known that Wald confidence intervals for a proportion calculated from a simple random sample often do not work very well. Hall (1982) showed how translating such an interval towards 1/2 could markedly improve its one-sided coverage properties. While extending Hall's methodology to a proportion estimated from a sample drawn independently within a number of strata having (perhaps) differing means, we discovered some surprising things. First, a simple modification of Hall's method is more effective than the more complicated `second-order' correction proposed by Cai (2004). Second, the heart of the method lies less in using an Edgeworth expansion to account for the skewness of the sample proportion, as Hall (and Cai) argued, and more in replacing the standard formulation of the estimated variance in the denominator of the Wald pivotal with a more efficient, but not directly calculable, expression. We investigated two choices for this expression under stratified sampling. One allows the strata to have differing means; as a consequence, the expression itself has a variance, which suggested replacing the normal z-score with a t-score when constructing the interval. Our last surprise was the realization that the methodology extends not only to more complicated sampling designs but also to more complicated estimands.
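The opening claim is easy to verify by simulation: a nominal 95% one-sided Wald upper bound for a proportion can undercover noticeably. The sample size and true proportion below are arbitrary illustrative choices, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, z = 50, 0.1, 1.645          # z is the nominal one-sided 95% normal score
reps = 20_000

# simulate many simple random samples and form the Wald upper bound each time
x = rng.binomial(n, p, reps)
phat = x / n
upper = phat + z * np.sqrt(phat * (1 - phat) / n)

coverage = np.mean(upper >= p)    # should be near 0.95 if the bound worked
print(f"one-sided coverage of the nominal 95% Wald upper bound: {coverage:.3f}")
```

With these settings the empirical coverage falls around 0.89 rather than 0.95, because the plug-in variance collapses when few successes are observed; translating the interval towards 1/2, as in Hall's approach, counteracts exactly this effect.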

Please check for seminar updates at: http://www.math.umd.edu/statistics/seminar.shtml

Directions to Campus: http://www.math.umd.edu/department/campusmap.shtml

Return to top

Title: Bayesian Variable Selection Methods For Class Discovery And Gene Selection

  • Speaker:
    Prof. Mahlet Tadesse
    Department of Mathematics
    Georgetown University
    mgt26@georgetown.edu
  • Date/Time: Friday, March 7th 11:00 am-12:00 pm
  • Location: DUQUES 250, 2201 G Street, N.W., George Washington University, Washington D.C. Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/~map.
  • Sponsor: The George Washington University, The Institute for Integrating Statistics in Decision Sciences and Department of Statistics

Abstract:

For various malignancies, currently used diagnostic approaches tend to be too broad in their classification. Patients who receive the same diagnosis often follow significantly different clinical courses and respond differently to therapy. It is believed that gene expression profiles may better capture disease heterogeneities. This calls for methods that uncover cluster structure among tissue samples and identify genes with distinctive expression patterns. I will present some Bayesian methods we have proposed that provide a unified approach to address these problems simultaneously. Model-based clustering is used to uncover the cluster structure and a stochastic search variable selection method is built into the model to identify discriminating genes. We let the number of clusters be unknown and adopt two different approaches. One consists of formulating the clustering problem in terms of finite mixture models with an unknown number of components and uses a reversible jump MCMC technique. The second approach uses infinite mixture models via Dirichlet process mixture priors. We illustrate the methods with applications to gene expression microarray data.

For a complete listing of our current seminars, visit http://www.gwu.edu/~stat/seminar.htm.

Return to top

Title: Disparities in Defining Disparities: Statistical Conceptual Frameworks

  • Speaker: Xiao-Li Meng, Professor & Chair, Department of Statistics, Harvard University
  • Date/Time: Friday, March 7th 11:00 am-12:00 pm
  • Location: EPS 7107, 6116 Executive Blvd, Bethesda, MD 20892
  • Sponsor: National Cancer Institute, Biostatistics Branch

Abstract:

This talk is based on joint work with Naihua Duan, Julia Y. Lin, Chih-nan Chen, and Margarita Alegria (Statistics in Medicine, to appear) with the same title and the following abstract. "Motivated by the need to meaningfully implement the Institute of Medicine's (IOM's) definition of health care disparity, this paper proposes statistical frameworks that lay out explicitly the needed causal assumptions for defining disparity measures. Our key emphasis is that a scientifically defensible disparity measure must take into account the direction of the causal relationship between allowable covariates that are not considered to be contributors to disparity and non-allowable covariates that are considered to be contributors to disparity, to avoid flawed disparity measures based on implausible populations that are not relevant for clinical or policy decisions. However, these causal relationships are usually unknown and undetectable from observed data. Consequently, we must make strong causal assumptions in order to proceed. Two frameworks are proposed in this paper: one is the conditional disparity framework, under the assumption that allowable covariates impact non-allowable covariates but not vice versa. The other is the marginal disparity framework, under the assumption that non-allowable covariates impact allowable ones but not vice versa. We establish theoretical conditions under which the two disparity measures are the same, and present a theoretical example showing that the difference between the two disparity measures can be arbitrarily large. Using data from the Collaborative Psychiatric Epidemiology Survey, we also provide an example where the conditional disparity is misled by Simpson's paradox, while the marginal disparity approach handles it correctly."

Return to top

Title: Baby at Risk: The Uncertain Legacies of Medical Miracles for Babies, Families and Society

  • Speaker: Ruth Levy Guyer, Haverford College, Pennsylvania
  • Date/Time: Friday, March 7th 4:00 pm - 5:00 pm
  • Location: Duques 451, 2201 G Street, N.W., George Washington University, Washington D.C. Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/~map.
  • Sponsor: The George Washington University, The Institute for Integrating Statistics in Decision Sciences and Department of Statistics

Abstract:

Seven years ago, I became interested in how decisions are made for babies who are born at risk. These babies are sick at birth, or are born with genetic anomalies, or are born too early. The latter group--premature babies, or preemies--has been growing in number each year for the past 25 years; currently, 500,000 preemies are born each year in the United States alone. The problems for these babies and their families can be medical, social, financial, and legal. The consequences of their premature births and illnesses can be short-lived or lifelong. The children and their families may have financial, social, medical, educational, psychological, and legal needs. The lives of these children affect everyone, not just the babies and their families and those who care for them in the hospital and afterward. They live in the contexts of their families and their communities, and few communities (local, state, or federal) have adequately prepared for their complex and resource-demanding lives.

In 2006, I wrote the book whose themes I will be discussing: Baby at Risk: The Uncertain Legacies of Medical Miracles for Babies, Families and Society. I interviewed staff members of neonatal intensive care units, families whose babies had done well or had not, and many others. The parents are always young (that is, young enough to have babies) and typically have had little or no experience facing a medical ethics dilemma. They have no sense of the long-term outcomes for their newborn babies, and they are making decisions in a highly emotionally charged climate.

I will describe the roles of the therapeutic imperative and the technological imperative in decision making, the moral distress of nursing and medical staff members who care for these babies, and various other themes that I address in the book. I will talk about how medical and nursing staff members, women and their partners, community members, and policy makers might become better educated about what is medically appropriate and what is not. I will also discuss the role of the media (who have caused huge problems by hyping stories of "miracle babies") in raising expectations about what medicine and science can do. Many medical decisions today are also ethics decisions, and it is time for American society to grasp this concept and then more proactively help families whose babies are born at risk.

Return to top

Title: A Semiparametric Generalization of One-Way ANOVA

  • Speaker: Benjamin Kedem, University of Maryland
  • Chair: Myron Katzoff, CDC/National Center for Health Statistics
  • Date/Time: March 13, 2008 (Thursday) / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Ave., NE. Take the Red Line to Union Station.
  • Sponsors: WSS Public Health and Biostatistics and Defense and National Security Sections

Abstract:

Under the classical one-way ANOVA, with normal data and equal variances, the problem is to test the equality of means. Then, under the hypothesis of normality, the problem reduces to testing equality of distributions. By relaxing the normal assumption, we show how to test for equi-distribution directly and obtain tests that rival the usual t and F tests. The key idea is to "tilt" a reference distribution. This provides estimates for all the distributions from which we have data, using a modified kernel density estimate which is superior to the traditional kernel estimate. The attractive feature of the semiparametric generalization is that it provides BOTH powerful tests and graphical displays of all the estimated distributions. This will be demonstrated using gene expression data. The "tilting" idea has numerous other statistical applications. We shall briefly outline several recent applications.
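The "tilting" idea can be illustrated in its simplest form: reweighting a reference sample by exp(beta * x) shifts its estimated distribution into another one. The standard-normal reference and tilt parameter below are hypothetical choices for this demonstration, not details from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 20_000)          # reference sample

def tilt_weights(x, beta):
    """Exponential tilt of the empirical distribution: w_i ∝ exp(beta * x_i)."""
    w = np.exp(beta * x)
    return w / w.sum()

# tilting N(0,1) by exp(1.0 * x) yields N(1,1), so the tilted mean should be ~1
w = tilt_weights(x, 1.0)
tilted_mean = float(np.sum(w * x))
print(f"reference mean: {x.mean():.3f}, tilted mean: {tilted_mean:.3f}")
```

In the semiparametric setup of the talk the tilt parameters are estimated from the combined samples rather than fixed, but the same weights then drive both the tests and the estimated distributions that are displayed.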

Return to top

Title: What happens to the location estimator if we minimize with a power other than 2?

  • Speaker: Dr. Robert Blodgett, Center for Food Safety and Applied Nutrition, Food and Drug Administration
  • Time and Date: Thurs., March 13, 2008, 3:30pm
  • Location: Room 1313, Math Bldg., University of Maryland, College Park, MD 20742
  • Sponsor: University of Maryland, Statistics Program

Abstract:

The location estimator forms a path as the power varies from 1 to infinity. This path indicates how critical the selection of an exponent is. An alternative proof of Descartes' rule of signs, applied to exponential sums, limits the number of repeated exponents for the same minimum point with usual data sets. Several bounds on this path include that it stays among the averages of pairs of data points.

Reference: Robert J. Blodgett, The Path of the Minimum Lp-Norm estimator for p Between 1 and infinity, Commun. in Statistics-Theory and Methods, 36, 2007, pp. 2829-2839.
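A quick way to see the path the abstract describes is to minimize the Lp objective numerically for several powers p; the four data points below are hypothetical. For this sample the estimator moves from the median interval (p = 1) through the mean (p = 2) toward the midrange as p grows.

```python
import numpy as np

x = np.array([0.0, 1.0, 3.0, 10.0])           # hypothetical data
grid = np.linspace(x.min(), x.max(), 20_001)  # candidate locations m

def lp_location(x, p, grid):
    """Numerically minimize sum_i |x_i - m|**p over m on a grid."""
    obj = np.abs(grid[:, None] - x[None, :]) ** p
    return grid[np.argmin(obj.sum(axis=1))]

for p in (1.0, 1.5, 2.0, 4.0, 20.0):
    print(f"p = {p:5.1f}: location estimate ≈ {lp_location(x, p, grid):.3f}")
```

The grid search is crude but makes the point: the p = 2 minimizer is the sample mean (3.5 here), while the p = 1 minimizer sits anywhere on the flat median interval [1, 3], showing how strongly the choice of exponent matters.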

Please check for seminar updates at: http://www.math.umd.edu/statistics/seminar.shtml

Directions to Campus: http://www.math.umd.edu/department/campusmap.shtml

Return to top

Title: Non-parametric Continuous Bayesian Belief Nets

  • Speaker: Roger M. Cooke, Resources for the Future
  • Date/Time: Friday, March 28th 11:00 am-12:00 noon
  • Location: Funger Hall 320 (2201 G Street, NW)
  • Sponsor: The George Washington University, The Institute for Integrating Statistics in Decision Sciences and Department of Statistics

Abstract:

Bayesian Belief Nets (BBNs) enjoy wide popularity in Europe as a decision support tool. The main attraction is that the directed acyclic graph provides a graphical model in which the problem owner recognizes his problem and which at the same time is a user interface for running and updating the model. Discrete BBNs support rapid updating algorithms, but involve exponential complexity that limits their use to toy problems. Continuous BBNs hold more promise. To date, only 'discrete normal' BBNs have been available: the user specifies a mean and conditional variance for each node, and the child nodes are regressed on their parents. Continuous nodes can have discrete parents but not discrete children, and all continuous nodes are normal. Overcoming the restriction to normality has opened new areas for applications. A large risk model for Schiphol airport involving some 300 probabilistic nodes and 300 functional nodes will be demonstrated. Updating is facilitated by the use of the 'normal copula'. This type of BBN can be used either in a probabilistic modeling mode (user supplies distributions) or in a data mining mode (a BBN is built to model multivariate data). The latter application will be demonstrated using fine particulate emission and collector data.
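The normal-copula device mentioned above rests on a simple construction, sketched here for two nodes: correlated standard normals are mapped through the normal CDF to uniforms and then through arbitrary inverse CDFs, so any margins can share a normal dependence structure. The correlation and margins below are illustrative assumptions, not values from the Schiphol model.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n, rho = 50_000, 0.7

# correlated standard normals via Cholesky of the copula correlation matrix
L = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
z = rng.standard_normal((n, 2)) @ L.T

# map to uniforms with the normal CDF, then to the desired margins
phi = np.vectorize(lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0))))
u = phi(z)
x1 = -np.log1p(-u[:, 0])      # Exponential(1) margin via inverse CDF
x2 = u[:, 1]                  # Uniform(0,1) margin

c = float(np.corrcoef(u[:, 0], u[:, 1])[0, 1])
print(f"Exponential-margin mean: {x1.mean():.3f} (≈ 1), correlation of uniforms: {c:.3f}")
```

Because the dependence lives entirely in the latent normals, conditioning (updating) can be done on the normal scale and pushed back through the margins, which is what makes large non-normal networks tractable.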

Return to top

Topic: Using Cognitive Predictors for Evaluation

  • Speaker: Mark Palumbo, Indiana University of Pennsylvania
  • Date/Time: March 28, 2008, 10:30 a.m. - Noon, Seminar Room 5K410
  • Location: U.S. Census Bureau, 4600 Silver Hill Road, Seminar Room 5K410, Suitland, Maryland. Please call (301) 763-4974 to be placed on the visitors' list. A photo ID is required for security purposes. All visitors to the Census Bureau are required to have an escort to Seminar Room 5K410. An escort will leave from the Guard's Desk at the Metro Entrance (Gate 7) with visitors who have assembled at 10:25 a.m. Parking is available at the Suitland Metro.
  • Sponsor: U.S. Bureau Of Census, Statistical Research Division

Abstract:

This research examines job/task knowledge as a mechanism through which cognitive ability affects performance. Further, two types of job knowledge tests are compared to a cognitive ability test for their efficacy in performance prediction. The knowledge tests differ in the methods used for their development and in the resulting types of information that are assessed. More specifically, one test developed with 'traditional' methods used in Industrial/Organizational Psychology, assesses Basic knowledge about how to complete the task. The second, more 'Cognitively Oriented' test is focused on the assessment of the understanding of the application of task knowledge for successful task completion. Results demonstrate that the Cognitively-Oriented test, i.e., the test of 'Understanding' accounts for significantly more variance in performance than the cognitive ability test, completely mediates cognitive ability effects on performance, and predicts performance more fairly than the test of Basic knowledge. These results have clear implication for selection, training development and assessment, and display design and evaluation of the effectiveness of the tools (displays) being used. For example, the Cognitively-Oriented test provides a method for evaluating the individual's amount of knowledge acquired after receiving training on the use of a particular device and will allow us to evaluate the level of task 'understanding' provided by the use of device itself.

Please direct all requests for Sign Language Interpreting Services, Computer Aided Real-time (CART), or other accommodation needs, to HRD.Disability.Program@census.gov. If you have any questions concerning accommodations, please contact the Disability Program Office at 301-763-4060 (Voice), 301-763-0376 (TTY).

Return to top

Title: Studies in Military Medicine from the Center for Data Analysis and Statistics (CDAS) at West Point

  • Speakers:
    LTC Rodney X. Sturdivant, Ph.D., Center for Data Analysis and Statistics (CDAS), Department of Mathematical Sciences, United States Military Academy, West Point, NY
    MAJ Krista Watts, M.S., Center for Data Analysis and Statistics (CDAS), Department of Mathematical Sciences, United States Military Academy, West Point, NY
  • Chair: Myron Katzoff, CDC/National Center for Health Statistics
  • Date/Time: April 2, 2008 (Wednesday) / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Ave., NE. Take the Red Line to Union Station.
  • Sponsors: WSS Public Health and Biostatistics & Defense and National Security Sections

Abstract:

The importance of maintaining and improving the health and fitness of soldiers in the Army has always been high. Stresses of combat as an Army at war have made concerns in this area even greater and highlighted new areas where improvements are necessary. The military medical community has responded with new treatment ideas that have resulted in studies that will both contribute to efforts on behalf of our soldiers and impact medical practices more generally. The Center for Data Analysis and Statistics (CDAS) has been involved in several of these studies in support of Walter Reed, Beaumont Army Medical Center and Keller Army Community Hospital. We will discuss several of these studies and the results to include Leishmania detection, ACL repair, air casts, LASEK surgery, incidence rates for injuries among different demographics, lumbar support for air crews and medical leadership.

Return to top

Title: Using the Peters-Belson Method in EEO Personnel Evaluations

  • Speaker: Michael Sinclair, Director of Statistical Analyses, Equal Employment Advisory Council
  • Chair: Hormuzd A. Katki, National Cancer Institute
  • Statistical Discussant: Barry Graubard, National Cancer Institute
  • Legal Discussant: Jeffrey Bannon, EEOC
  • Date/Time: April 8, 2008 (Tuesday) / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center, Room 10. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Ave., NE. Take the Red Line to Union Station.
  • Sponsor: Human Rights Committee

Abstract:

The Peters-Belson method was developed to examine wage discrimination using linear regression analyses. In application, one fits a regression model to the favored class and applies it to the non-favored class to identify a disparity between the actual and predicted values. Recently, the method was extended to examine health care disparities and other forms of discrimination for binary outcomes via logistic regression. In this paper, we will examine the method's general properties in personnel hiring discrimination evaluations, as compared to a standard regression analysis, in relation to the size of the applicant pool, the differences in the traits of favored and non-favored class members, and the employer's uniform consideration applied to factors by class. We will also discuss some of the philosophical and legal issues from selected court cases surrounding the use of this approach relative to a standard regression analysis, and the methodology for applying a jackknife variance estimator to measure the statistical precision of the disparities.
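The core computation the abstract describes, fit on the favored class, predict for the non-favored class, average the gap, can be sketched in a few lines. The one-predictor setup and numbers below are hypothetical; real applications use multiple covariates and the jackknife variance estimator mentioned above:

```python
def peters_belson_disparity(x_fav, y_fav, x_non, y_non):
    """Fit a one-predictor OLS model on the favored class, apply it to the
    non-favored class, and return the mean gap (actual - predicted)."""
    n = len(x_fav)
    mean_x = sum(x_fav) / n
    mean_y = sum(y_fav) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(x_fav, y_fav))
             / sum((x - mean_x) ** 2 for x in x_fav))
    intercept = mean_y - slope * mean_x
    gaps = [y - (intercept + slope * x) for x, y in zip(x_non, y_non)]
    return sum(gaps) / len(gaps)

# Hypothetical wages: both groups follow wage = 30 + 2 * experience,
# except the non-favored class earns 2 units less at every level.
print(peters_belson_disparity([1, 2, 3, 4], [32, 34, 36, 38],
                              [1, 3], [30, 34]))  # prints -2.0
```

A negative value indicates the non-favored class earns less than the favored-class equation predicts for people with the same covariates.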

Return to top

Title: Two-Sample Rank Tests for Treatment Effectiveness When Death and Censoring Depend on Covariates

  • Date/Time: Tuesday, April 8, 2008 / 11:00 a.m. to 12:00 p.m.
  • Place: Conference room 9201, 6701 Rockledge Drive, Bethesda, MD 20892
  • Sponsor: Office of Biostatistics Research, National Heart, Lung and Blood Institute, National Institutes of Health

Abstract:

Popular two-sample rank tests for treatment effectiveness in clinical trials rely on the independence of death and censoring, yet there are often baseline covariates on which survival and potentially also censoring may depend. Suppose that survival T and censoring C are conditionally independent given covariates V, and that treatment-allocation Z is independent of V. In a paper of Slud and Kong (Biometrika 1997), an assumption was introduced [essentially, that the conditional survival function for censoring is the sum of a function of (Z,t) and another function of (V,t)] under which the usual logrank test was shown to be consistent. But this assumption is not fully general, and DiRienzo and Lagakos (2001, papers in Biometrika and JRSSB) proposed a bias-correcting weighting for the logrank and studied its performance. In this talk, some theoretical results are presented on asymptotic validity of tests based on an estimated form of their weighting function. Simulations will show the good performance of these methods, and theoretical calculations show that these weight-adjustments cannot simply be ignored.

For current and future OBR seminar series, please contact: Gang Zheng (zhengg@nhlbi.nih.gov) or Jungnam Joo (jooj@nhlbi.nih.gov).

Return to top

Title: Computation with Imprecise Probabilities

  • Speaker: Lotfi A. Zadeh, Department of EECS, University of California, Berkeley
  • Dedicated to Peter Walley
  • Date/Time: Friday, April 11th 3:30-4:30 pm (Followed by wine & cheese reception)
  • Location: Funger Hall 420 (2201 G Street, NW)
  • Sponsor: The George Washington University, The Institute for Integrating Statistics in Decision Sciences and Department of Statistics

Abstract:

Computation with imprecise probabilities is not an academic exercise; it is a bridge to reality. In the real world, imprecision of probabilities is the norm rather than the exception. In large measure, real-world probabilities are perceptions of likelihood. Perceptions are intrinsically imprecise. Imprecision of perceptions entails imprecision of probabilities. In applications of probability theory it is a common practice to ignore imprecision of probabilities and treat imprecise probabilities as if they were precise. A problem with this practice is that it leads to results whose validity is open to question. Publication of Peter Walley's seminal work "Statistical Reasoning with Imprecise Probabilities," in 1991, sparked a rapid growth of interest in imprecise probabilities. Today, there is a substantive literature. The approach described in this lecture is a radical departure from the mainstream. First, imprecise probabilities are dealt with not in isolation, as in the mainstream literature, but in an environment of imprecise events, imprecise relations and imprecise constraints. Second, imprecise probability distributions are assumed to be described in a natural language. The approach is based on the formalism of Computing with Words (CW) (Zadeh 1999, 2006). In the CW-based approach, the first step involves precisiation of information described in natural language. Precisiation is achieved through representation of the meaning of a proposition, p, as a generalized constraint. A generalized constraint is an expression of the form X isr R, where X is the constrained variable, R is a constraining relation and r is an indexical variable which defines the modality of the constraint, that is, its semantics. The primary constraints are possibilistic, probabilistic and veristic. Computation follows precisiation. In the CW-based approach the objects of computation are generalized constraints.
The CW-based approach to computation with imprecise probabilities enhances the ability of probability theory to deal with problems in fields such as economics, operations research, decision sciences, theory of evidence, analysis of causality and diagnostics.

Return to top

Title: Assessing Disclosure Risk, and Preventing Disclosure, in Microdata

  • Chair: John Czajka, Mathematica
  • Speakers:
    J. Neil Russell, National Center for Education Statistics
    Michael Weber, Internal Revenue Service
    Sonya Vartivarian, Mathematica
    Sam Hawala, U.S. Census Bureau
  • Date/Time: Tuesday, April 15, 2008 / 12:30 to 2:00 p.m.
  • Location: Bureau of Labor Statistics, Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Methodology Program, WSS
  • Presentation material:
    J. Neil Russell slides (pdf, ~220kb)
    Sonya Vartivarian slides (pdf, ~56kb)
    Sam Hawala slides (pdf, ~24kb)

Abstracts:

Matching NCES Data to External Databases to Assess Disclosure Risk
J. Neil Russell, National Center for Education Statistics

The National Center for Education Statistics (NCES) is the Federal statistical agency responsible for collecting information on the condition of education in the United States. The agency's Disclosure Review Board (DRB) reviews and approves all microdata products prior to release. Since the early 1990s, the DRB has required that survey programs that release public-use microdata files (PUMFs) match to external databases as part of a disclosure risk analysis. Most NCES PUMFs have been matched to external databases to model an intruder's behavior for trying to disclose a respondent's identity. This presentation will focus on two features of this process. First, we will chronicle the history of matching at NCES as a disclosure risk assessment method. Second, we will present general findings of the disclosure risks discovered by matching to external databases.

Measuring Disclosure Risk and an Examination of the Possibilities of Using Synthetic Data in the Individual Income Tax Return Public Use File (PUF)
Michael Weber, Internal Revenue Service and Sonya Vartivarian, Mathematica

The Statistics of Income Division (SOI) currently measures disclosure risk through a distance based technique that compares the Public Use File against the population of all tax returns and uses top-coding, subsampling and multivariate microaggregation as disclosure avoidance techniques. SOI is interested in exploring the use of other techniques that prevent disclosure while providing less data distortion. Synthetic or simulated data may be such a technique. But while synthetic data may be the ultimate in disclosure protection, creating a synthetic dataset that preserves the key characteristics of the source data presents a significant challenge. An additional constraint in creating synthetic data for the SOI PUF is found in maintaining the accounting relationships among numerous income, deduction, and tax items that appear on a tax return.

Data Synthesis via Expert Knowledge, Modeling, and Hot Deck
Sam Hawala, U.S. Census Bureau

The presentation will focus on a method to produce synthetic data through the combined use of expert knowledge, model fitting to the data, and matching using the model predicted values. All three elements play an important role in the successful reproduction of the aggregate behavior and the main features of a data set.
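The abstract only outlines the approach, but the "matching using the model predicted values" step can be caricatured as a hot deck driven by fitted values. Everything below (function name, data) is a hypothetical sketch, not the Census Bureau's actual procedure:

```python
def predictive_match_hotdeck(predicted, sensitive):
    """Replace each record's sensitive value with that of the nearest
    *other* record, where 'nearest' means closest model-predicted value:
    a hot-deck step driven by the fitted model rather than the raw data."""
    synthetic = []
    for i, p in enumerate(predicted):
        donor = min((j for j in range(len(predicted)) if j != i),
                    key=lambda j: abs(predicted[j] - p))
        synthetic.append(sensitive[donor])
    return synthetic

# Two records with similar fitted values swap values; the outlier
# receives the value of whichever record is least distant from it.
print(predictive_match_hotdeck([1.0, 1.1, 5.0], [10, 11, 50]))
```

Because each released value comes from a real donor record rather than the record itself, aggregate behavior is preserved while direct re-identification of any single value is blunted.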

Return to top

Topic: Statistical Meta-Analysis - a Review

  • Speaker: Bimal Sinha, Presidential Research Professor, University of Maryland, Baltimore County
  • Date/Time: April 15, 2008, 3:00 - 4:00 p.m.
  • Location: U.S. Census Bureau, 4600 Silver Hill Road, Seminar Room 5K410, Suitland, Maryland. Please call (301) 763-4974 to be placed on the visitors' list. A photo ID is required for security purposes. All visitors to the Census Bureau are required to have an escort to Seminar Room 5K410. An escort will leave from the Guard's Desk at the Metro Entrance (Gate 7) with visitors who have assembled at 2:55 p.m. Parking is available at the Suitland Metro.
  • Sponsor: U.S. Bureau Of Census, Statistical Research Division

Abstract:

Statistical meta-analysis deals with statistical methods to efficiently combine information or evidence from several studies in order to produce a meaningful inference about a common phenomenon. Applications of meta-analysis abound in the literature. In this talk a review of some salient features of statistical meta-analysis will be presented.
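One of the standard combination methods such a review covers is the fixed-effect, inverse-variance weighted pooling of study estimates. This generic illustration is not drawn from the speaker's book; the example numbers are hypothetical:

```python
import math

def fixed_effect_meta(estimates, std_errors):
    """Inverse-variance weighted (fixed-effect) combination of study
    estimates; returns the pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Two equally precise studies: the pooled estimate is their average,
# and its standard error is smaller than either study's alone.
print(fixed_effect_meta([2.0, 4.0], [1.0, 1.0]))
```

Precise studies dominate the pooled value, which is the sense in which meta-analysis "efficiently combines" evidence across studies.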

This talk is based on the 2008 John Wiley Book by the speaker.

This seminar is physically accessible to persons with disabilities. Please direct all requests for Sign Language Interpreting Services, Computer Aided Real-time (CART), or other accommodation needs, to HRD.Disability.Program@census.gov. If you have any questions concerning accommodations, please contact the Disability Program Office at 301-763-4060 (Voice), 301-763-0376 (TTY).

Return to top

Title: Text Mining, Social Networks, and High Dimensional Analysis

  • Speaker:
    Edward J. Wegman Department of Computational and Data Sciences and Department of Statistics
    George Mason University
  • Time: 10:30 a.m. Refreshments, 10:45 a.m. Colloquium Talk
  • Date: February 22, 2007
  • Location:
    Department of Computational and Data Sciences George Mason University
    Research 1, Room 301, Fairfax Campus
    George Mason University, 4400 University Drive, Fairfax, VA 22030
  • Sponsor: George Mason University CDS/CCDS/Statistics Colloquium

Abstract:

A traditional approach to text mining has been to represent a document by a vector. In the bag-of-words representation binary vectors are used and two documents are regarded as similar if the angle between their corresponding vectors is small (i.e., correlation between the vectors is high). The document vectors may be assembled into a term-document matrix (TDM). A more satisfying representation of a document can be formulated in terms of bigrams or trigrams, because these have a better chance of capturing semantic content. Bigram vectors can be assembled into bigram document matrices (BDM). The TDM and BDM resemble the two-mode adjacency matrices associated with social network analysis (SNA). Using cues from SNA, we formulate the one-mode social network adjacency matrices to form document-document matrices (DD) and bigram-bigram matrices (BB). In this talk I outline the basics, discuss the connection between text mining and social networks and, by example, illustrate the dimensionality issues raised by such vector space methods.
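The bag-of-words setup described above is easy to make concrete. The toy vocabulary and documents below are invented for this sketch:

```python
import math

def binary_vector(doc_terms, vocabulary):
    """Bag-of-words representation: 1 if the term occurs, else 0."""
    return [1 if term in doc_terms else 0 for term in vocabulary]

def cosine_similarity(u, v):
    """Cosine of the angle between document vectors; a value near 1
    (small angle) means the documents share most of their terms."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms

vocab = ["data", "mining", "network", "social", "text"]
d1 = binary_vector({"text", "mining", "data"}, vocab)
d2 = binary_vector({"social", "network", "data"}, vocab)
# Stacking d1 and d2 as columns would give a 5 x 2 term-document matrix.
similarity = cosine_similarity(d1, d2)
```

Replacing single terms with bigrams changes only the vocabulary; the vector-space machinery, and its dimensionality problems, are identical.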

Return to top

Title: Text Mining, Social Networks, and High Dimensional Analysis

  • Speaker:
    Prof. Joe Blitzstein
    Department of Statistics
    Harvard University
    blitzstein@stat.columbia.edu
  • Date & Time: 11:00 a.m. - 12:00 p.m., April 25 (Friday)
  • Location: Duques 250, 2201 G Street, N.W., George Washington University, Washington D.C. Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/~map.
  • Sponsor: The George Washington University, Department of Statistics

Abstract:

Data naturally represented in the form of a network, such as social and information networks, are being encountered increasingly often and have led to the development of new generative models (such as exponential random graphs and power law mechanisms) to attempt to explain the observed structure. However, it is usually prohibitively expensive to observe the entire network, so sampling in the network is needed. There has been comparatively little attention given to the question of what network properties are stable under what sampling schemes. We will discuss some examples where valid inferences about the structure of the network can and cannot be drawn from the sample, depending on the generative model, the sampling method, and the quantity of interest.

Return to top

Title: Preprocessing in High Throughput Biological Experiments

  • Speaker:
    Jeffrey C. Miecznikowski, Ph.D.
    Department of Biostatistics
    University at Buffalo
    Center of Excellence in Bioinformatics and Life Sciences
    Department of Biostatistics
    Roswell Park Cancer Institute
  • Time: Friday, April 25, 2008, 3:15 - 4:05 pm
  • Place: 326 St. Mary's Hall, Department of Mathematics, Georgetown University
  • Sponsor: Georgetown University

Abstract:

High throughput experiments including gene expression arrays, array comparative genomic hybridization (aCGH), and other spot imaging bioassays require so called "low level" processing to remove background signal and systematic variation. Afterwards, a normalization step is required to compare assays within a given experiment. Although each platform in high throughput experiments has a distinct set of preprocessing steps, there are common preprocessing concepts and principles that should drive any scheme for preprocessing spot bioassays. In this talk, we will compare and contrast several preprocessing schemes that we developed for high throughput experiments including: xerogel assays, gene expression arrays, aCGH arrays, and gel electrophoresis images.

Return to top

Title: Statistical issues in disease surveillance: A case study from ESSENCE

  • Speaker: Cara Olsen, PhD, Biostatistics Consulting Center (CIV, USUHS)
  • Date/Time: Friday, May 2, 2008 / 10:00 -11:00 a.m.
  • Location: Georgetown University Medical Center, Building D, 4000 Reservoir Rd., NW, Warwick Evans Conference Room, Washington, DC 20007
  • Sponsor: Department of Biostatistics, Bioinformatics and Biomathematics

Abstract:

Syndromic surveillance systems attempt to monitor the burden of disease in communities in real time, using health-related data and tools from statistics, epidemiology, informatics, and other disciplines. A potential benefit of such surveillance is early detection and tracking of infectious disease outbreaks.

The Electronic Surveillance System for the Early Notification of Community-based Epidemics (ESSENCE) is a syndromic surveillance system that monitors outpatient visits to military medical treatment facilities. This study examines whether ESSENCE can detect more infectious disease outbreaks, and detect them earlier, using joint monitoring of laboratory test orders and outpatient visit data rather than outpatient visit data alone. Statistical issues that arise from this question include which aberration detection algorithm is best suited to these data sources, how to quantify the tradeoffs among sensitivity, specificity and timeliness for detecting outbreaks, and how to monitor information from multiple data sources simultaneously.
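One of the simpler aberration-detection algorithms weighed in comparisons like this is a one-sided CUSUM on daily counts. The baseline, allowance, and threshold below are illustrative defaults, not ESSENCE's actual settings:

```python
def cusum_alarms(counts, baseline, k=1.0, h=4.0):
    """One-sided CUSUM: accumulate daily excesses over (baseline + k)
    and signal whenever the running sum exceeds the threshold h."""
    s = 0.0
    alarms = []
    for day, count in enumerate(counts):
        s = max(0.0, s + (count - baseline - k))
        if s > h:
            alarms.append(day)
            s = 0.0  # restart the accumulation after signaling
    return alarms

# Hypothetical daily clinic-visit counts; an outbreak begins on day 5.
visits = [10, 9, 11, 10, 10, 14, 15, 16, 15, 10]
print(cusum_alarms(visits, baseline=10.0))  # alarms on days 6 and 7
```

The allowance k and threshold h embody exactly the sensitivity/specificity/timeliness trade-off the abstract raises: lowering them flags outbreaks sooner at the cost of more false alarms.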

For information, please contact Caroline Wu at 202-687-4114 or ctw26@georgetown.edu

Return to top

Title: Some Issues Raised by High Dimensions in Statistics

  • Speaker:
    D.M. Titterington
    Department of Statistics
    University of Glasgow
  • Time: 10:30 a.m. Refreshments, 10:45 a.m. Colloquium Talk
  • Date: May 2, 2008
  • Location:
    Department of Computational and Data Sciences George Mason University
    Research 1, Room 302, Fairfax Campus
    George Mason University, 4400 University Drive, Fairfax, VA 22030
  • Sponsor: George Mason University CDS/CCDS/Statistics Colloquium

Abstract:

This talk is an overview presentation made by D.M. Titterington as a summary of the activities at Cambridge during the Spring of 2008. Most of twentieth-century statistical theory was restricted to problems in which the number p of 'unknowns', such as parameters, is much less than n, the number of experimental units. However, the practical environment has changed dramatically over the last twenty years or so, with the spectacular evolution of computing facilities and the emergence of applications in which the number of experimental units is comparatively small but the underlying dimension is massive, leading to the desire to fit complex models for which the effective p is very large. Areas of application include image analysis, microarray analysis, finance, document classification, astronomy and atmospheric science. Some methodological advances have been made, but there is a need to provide firm consolidation in the form of a systematic and critical assessment of the new approaches as well as appropriate theoretical underpinning in this 'large p, small n' context. The existence of key applications strongly motivates the programme, but the fundamental aim is to promote core theoretical and methodological research. Both frequentist and Bayesian paradigms will be featured. The programme is directed at a broad research community, including both mainstream statisticians and the growing population of researchers in machine learning.

Return to top

Title: Statistical Issues Arising in the Interpretation of a Measure of Relative Disparity Used in Educational Funding: The Zuni School District 89 Case

  • Speaker: Joseph L. Gastwirth, Department of Statistics, George Washington University
  • Discussant: Marc Rosenblum, Office of the General Counsel and Chief Economist, Equal Employment Opportunity Commission
  • Chair: Michael L. Cohen, Committee on National Statistics
  • Date/Time: Monday, May 5, 2008 / 12:30 to 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Ave., NE. Take the Red Line to Union Station.
  • Sponsor: WSS Section on Public Policy

Abstract:

This seminar will discuss statistical issues that arose in recent cases. The first case concerns the interpretation of a formula Congress wrote when it revised a law that provides funds for educating children in areas with a large federal presence (e.g. major research lab). Because federal land is not subject to local real estate tax, the primary source of funding education, the law is intended to assist the relevant school districts. We will discuss the statute and the various interpretations that arose during the proceedings and the justifications provided. A counter-example to one of the assertions made by the lawyers at the Supreme Court hearing, which appears to have been accepted by the Court's majority, will also be presented.

Return to top

U. S. CENSUS BUREAU
9TH ELDERS PROGRAM SEMINAR

Topic: Different Directorates, Not So Different Approach

  • Speaker: Don Adams, Former Assistant Director of Economic Programs
  • Date/Time: May 8, 2008, 10:30 a.m. - Noon
  • Location: U.S. Census Bureau, 4600 Silver Hill Road, Suitland, Maryland, Census Bureau Auditorium. Please call (301) 763-2118 to be placed on the visitors' list. A photo ID is required for security purposes.
  • Sponsor: Human Capital Management Council and the SRD Seminar Series

Abstract:

I hope to provide insight into the early development of Jeffersonville: how it originated, expanded, and interfaced with the Bureau's subject matter divisions in the 1970s and 1980s. Also, I will address the changes in the Bureau's collection and publication of foreign trade statistics in the late 1980s and early 1990s.

Biography: Don joined the Bureau in 1963 as an Industry Division analyst, moved to Demographic Surveys Division, and in 1969 relocated to Jeffersonville in charge of processing the 1969 Census of Agriculture. This "temporary" assignment lasted for 16 years. He became Chief, Data Preparation Division (now NPC) in 1976 and served until late 1985, when he returned to Suitland as Chief, Data User Services Division. In less than a year, he became Chief, Foreign Trade Division, a position he held until the end of 1993. For much of 1993, a year of reorganization in the Economic Directorate, Don was simultaneously the Assistant Director for Economic Programs; Acting Chief, Foreign Trade Division; Acting Chief, Construction Division; and Acting Chief, Industry Division. A recipient of the Department's Silver and Gold Medals, Don retired as Assistant Director of Economic Programs on December 31, 1993.

Important Information:

This seminar is physically accessible to persons with disabilities. Please direct all requests for Sign Language Interpreting Services, Computer Aided Real-time (CART), or other accommodation needs, to HRD.Disability.Program@census.gov. If you have any questions concerning accommodations, please contact the Disability Program Office at 301-763-4060 (Voice), 301-763-0376 (TTY).

Return to top

Title: Multivariate Event Detection and Characterization

  • Speaker: Daniel B. Neill, Carnegie-Mellon University
  • Chair: Myron Katzoff, CDC/National Center for Health Statistics
  • Date/Time: Tuesday, May 13, 2008 / 12:30 to 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Ave., NE. Take the Red Line to Union Station.
  • Sponsor: Defense and National Security Section

Abstract:

We present the multivariate Bayesian scan statistic (MBSS), a general framework for event detection and characterization in multivariate spatial time series data. MBSS integrates prior information and observations from multiple data streams in a principled Bayesian framework, computing the posterior probability of each type of event in each space-time region. MBSS learns a multivariate Gamma-Poisson model from historical data, and models the effects of each event type on each stream using expert knowledge or labeled training examples. We evaluated MBSS on various disease surveillance tasks, detecting and characterizing disease outbreaks injected into three streams of Pennsylvania medication sales data. We demonstrated that MBSS can be used both as a "general" event detector, with high detection power across a variety of event types, and a "specific" detector that incorporates prior knowledge of an event's effects to achieve much higher detection power. MBSS has many other advantages over previous event detection approaches, including efficient computation and easy interpretation and visualization of results, and allows faster and more accurate detection by integrating information from the multiple streams. Most importantly, MBSS can model and differentiate between multiple event types, thus distinguishing between events requiring urgent responses and other, less relevant patterns in the data. This talk will present an overview of the MBSS framework, and compare MBSS to other recently proposed multivariate detection approaches. Time permitting, I will also discuss how incremental learning (both passive and active) can be incorporated into the MBSS framework and used to improve detection performance, and consider extensions of MBSS to more general pattern detection problems.
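MBSS itself marginalizes a Gamma-Poisson model over many space-time regions and data streams; the single-stream caricature below, with a point alternative in place of the Gamma prior, only illustrates the Bayesian update at its core. All names and numbers are hypothetical:

```python
import math

def posterior_event_prob(count, expected, relative_risk=1.5, prior=0.01):
    """Posterior probability of an elevated-rate 'event' for one stream:
    Poisson likelihoods under the null (rate = expected) and the event
    (rate = relative_risk * expected), weighted by the event's prior."""
    def poisson_pmf(c, lam):
        return math.exp(-lam) * lam ** c / math.factorial(c)
    like_null = poisson_pmf(count, expected)
    like_event = poisson_pmf(count, relative_risk * expected)
    numerator = prior * like_event
    return numerator / (numerator + (1.0 - prior) * like_null)

# With 20 cases expected, observing 40 drives the posterior sharply up.
quiet = posterior_event_prob(20, 20.0)
spike = posterior_event_prob(40, 20.0)
```

Reporting a posterior probability per event type, rather than a bare test statistic, is what lets this style of detector rank and differentiate multiple candidate events.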

Return to top

PRESIDENT'S INVITED SEMINAR

Title: What's Up at the ASA?

  • Chair: Michael P. Cohen, WSS President
  • Speaker: Ron Wasserstein, ASA Executive Director
  • Date/Time: Thursday, May 15, 2008 / 12:30 to 2:30 p.m.
  • Location: Bureau of Labor Statistics, Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: WSS

Abstract:

ASA Executive Director Ron Wasserstein will provide a brief update on activities and directions of the association. However, most of the session will be devoted to questions and comments from the participants. Among the many things we could discuss:

  • What needs to be done to attract and retain statisticians in government service, and what role could the ASA play?

  • What benefits and services could the ASA provide that would increase its attractiveness to public and private sector members in this area?

  • Where do you see the profession heading in the next few years, and what should the ASA be doing? Ron has his opinions, of course, but is most interested in hearing yours.
Return to top

Title: Bayesian Dose-finding Trial Designs for Drug Combinations

  • Speaker: Guosheng Yin, Ph.D., Assistant Professor, Department of Biostatistics, M. D. Anderson Cancer Center
  • Date/Time: Friday, May 16, 2008 / 10:00 -11:00 a.m.
  • Location: Georgetown University Medical Center, Lombardi Comprehensive Cancer Center, 3900 Reservoir Rd., NW, Research Building, Conference Room E501, Washington, DC 20007
  • Sponsor: Department of Biostatistics, Bioinformatics and Biomathematics

Abstract:

Treating patients with a combination of agents is becoming commonplace in cancer clinical trials, with biochemical synergism often the primary focus. In a typical drug combination trial, the toxicity profile of each individual drug has already been thoroughly studied in the single-agent trials, which naturally offers rich prior information. We propose Bayesian adaptive designs to search for the maximum tolerated dose combination. We continuously update the posterior estimates for the toxicity probabilities of the combined doses. By reordering the dose toxicities in the two-dimensional probability space, we adaptively assign each new cohort of patients to the most appropriate dose. Dose escalation, de-escalation or staying the same is determined by comparing the posterior estimates of the toxicity probabilities of combined doses and the prespecified toxicity target. We conduct extensive simulation studies to examine the operating characteristics of the design and illustrate the proposed method under various practical scenarios.
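As a drastically simplified sketch of the escalate/de-escalate/stay logic, consider a one-dimensional ladder of dose levels with conjugate Beta-binomial updating (the actual design works on a two-dimensional grid of drug combinations with a partial toxicity ordering). All priors, counts, and the target are illustrative:

```python
def beta_posterior_mean(a, b, toxicities, patients):
    """Posterior mean toxicity under a Beta(a, b) prior with binomial data."""
    return (a + toxicities) / (a + b + patients)

def next_dose(current, posterior_means, target=0.30):
    """Escalate, de-escalate, or stay by comparing the posterior toxicity
    estimate at the current level with the prespecified target."""
    if posterior_means[current] < target and current + 1 < len(posterior_means):
        return current + 1   # escalate
    if posterior_means[current] > target and current > 0:
        return current - 1   # de-escalate
    return current           # stay

# Three dose levels, flat Beta(1, 1) priors, hypothetical toxicity counts.
posterior = [beta_posterior_mean(1, 1, t, n)
             for t, n in [(1, 10), (3, 10), (6, 10)]]
```

Each new cohort's outcomes update the Beta posteriors, so the recommended level adapts as evidence accumulates, the same continual-updating idea the abstract describes for combined doses.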

For information, please contact Caroline Wu at 202-687-4114 or ctw26@georgetown.edu

Return to top

Title: New Methods in Network And Spatial Sampling

  • Speaker: Steven K. Thompson, Simon Fraser University
  • Chair: Myron Katzoff, CDC/National Center for Health Statistics
  • Date/Time: Tuesday, June 3, 2008 / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Ave., NE. Take the Red Line to Union Station.
  • Sponsor: Defense and National Security Section

Abstract:

Over the last several decades new sampling methods have been developed in response to problems in studies of difficult-to-sample populations and to theory on optimal designs. In this talk I'll describe a handful of such methods for sampling of populations in network and spatial settings.

Title: Alternative Survey Sample Designs, Seminar #3: The Pros and Cons of Balanced Sampling

  • Speaker: Professor Yves Tille, University of Neuchatel, Switzerland
  • Discussant: Professor Jean D. Opsomer, Colorado State University
  • Date/Time: Wednesday, 8:50 - 11:00 a.m., 4 June 2008
  • Location: U. S. Census Bureau, 4600 Silver Hill Road, Auditorium, Suitland, Maryland. By Metro, use the Green Line to Suitland Station and walk through the Metro parking garage to the main entrance of the Census Bureau. Please send an e-mail to Carol.A.Druin@census.gov, or call (301) 763 - 4216 to be placed on the visitors' list for this seminar by 30 May 2008. A photo ID is required for security purposes. Non-U.S. Citizens: If you wish to attend and you are not a U.S. Citizen, Federal security procedures require that you submit the following information no later than 21 May 2008: Your name, date of birth, gender, country of birth, country of citizenship, country of residence, countries of dual citizenship (if it applies), and passport number & issuing country. Security will then approve or deny the visit. If approved, you must come with your passport and be escorted throughout your visit.
  • Sponsor: U.S. Bureau Of Census, Demographic Statistical Methods Division

Abstract:

The Census Bureau's Demographic Survey Sample Redesign Program, among other things, is responsible for research into improving the designs of demographic surveys, particularly focused on the design of survey sampling. Historically, the research into improving sample design has been restricted to the "mainstream" methods like basic stratification, multi-stage designs, systematic sampling, probability-proportional-to-size sampling, clustering, and simple random sampling. Over the past thirty years or more, we have increasingly faced reduced response rates and higher costs coupled with an increasing demand for more data on all types of populations. More recently, dramatic increases in computing power and availability of auxiliary data from administrative records have indicated that we may have more options than we did when we established our current methodology.

This seminar series is the beginning of an exploration into alternative methods of sampling. In this third seminar of the three-seminar series, from 8:50 to 9:50, we will hear about Professor Tille's work on the use of balanced sampling. He will discuss the various approaches to balanced sampling, focusing particularly on the statistical properties of each. Following Professor Tille's presentation, there will be a 10-minute break, and then from 10:00 - 10:45, Professor Jean Opsomer will provide discussion about the methods and their potential in demographic surveys, particularly focusing on impact on estimation. The seminar will conclude with an open discussion session from 10:45 - 11:00.

This event is accessible to persons with disabilities. Please direct all requests for sign language interpreting services, Computer Aided Real-time Translation (CART), or other accommodation needs, to HRD.Disability.Program@census.gov. If you have any questions concerning accommodations, please contact the Disability Program Office at 301-763-4060 (Voice), 301-763-0376 (TTY).

Return to top

Title: Nonresponse Adjustments in Survey Applications

  • Chair: Nancy Bates, U.S. Census Bureau
  • Speakers:
    Frauke Kreuter, Joint Program in Survey Methodology, University of Maryland
    Trena Ezzati-Rice, Agency for Healthcare Research and Quality
  • Discussant: Keith Rust, Westat and JPSM
  • Date/Time: Tuesday, June 10, 2008 / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics, Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: WSS Methodology Program
  • Presentation material:
    Frauke Kreuter slides (pdf, ~140kb)
    Trena Ezzati-Rice slides (pdf, ~252kb)
    Keith Rust slides (pdf, ~164kb)

Abstract (Kreuter):

Using Proxy Measures and Other Correlates of Survey Outcomes to Adjust for Nonresponse: Examples from Multiple Surveys

Nonresponse weighting is a commonly used method to adjust for bias due to unit nonresponse in surveys. Theory and simulations show that, in order to effectively reduce bias without increasing variance, a covariate used for nonresponse weighting adjustment needs to be highly associated with both response and the survey outcome. In practice, these requirements pose a challenge that is often overlooked. Recently some surveys have begun collecting supplementary data, such as interviewer observations and other proxy measures of key survey outcomes. These variables are promising candidates for nonresponse adjustment because they should be highly correlated with the actual outcomes. In the present study, we examine the extent to which traditional covariates and new proxy measures satisfy the weighting requirements for the National Survey of Family Growth, the Medical Expenditure Survey, the U.S. National Election Survey, the European Social Surveys and the University of Michigan Transportation Research Institute Survey. We provide empirical estimates of the association between proxy measures and the likelihood of response as well as the actual survey responses. We also compare unweighted and weighted estimates under various nonresponse models. Results show the difficulty of finding suitable covariates and the need to improve the quality of proxy measures.

Abstract (Ezzati-Rice):

Assessment of the Impact of Health Variables on Nonresponse Adjustment in the Medical Expenditure Panel Survey

The Medical Expenditure Panel Survey (MEPS) is a large, complex sample survey designed to provide nationally representative annual estimates of health care use, expenditures, sources of payment, and insurance coverage for the U.S. civilian non-institutionalized population. A new panel of households is selected each year for the MEPS from households that responded to the previous year's National Health Interview Survey (NHIS). Nonresponse is a common problem in household sample surveys. To compensate for nonresponse and to reduce the potential bias of the survey estimates, two separate nonresponse adjustments are performed in the development of analytic weights in MEPS. The first, the focus of this presentation, is an adjustment for dwelling unit (DU) level nonresponse to account for nonresponse among those households subsampled from NHIS for the MEPS. The adjustment is carried out using socio-economic, demographic, and health variables that are available for both respondents and nonrespondents. In this study, we examine the impact of health variables on the MEPS DU level nonresponse weight adjustment. Response propensity scores are calculated based on logistic regression models, and quintiles of the propensity scores are used to adjust the MEPS base weights. Comparisons of the nonresponse-adjusted weights and selected survey variables with and without inclusion of health variables as a nonresponse adjustment covariate are discussed.
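The quintile-based propensity adjustment described above (propensity scores from a logistic regression of response on covariates, then respondent weights inflated by each quintile class's inverse response rate) can be sketched in a few lines. This is a minimal illustration on simulated data, assuming equal base weights and a single covariate; it is not the actual MEPS weighting procedure, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: equal base weights, one covariate x (e.g., a health
# indicator), and a response indicator r for each sampled dwelling unit.
n = 1000
x = rng.normal(size=n)
p_true = 1 / (1 + np.exp(-(0.5 + 0.8 * x)))   # true response propensity
r = rng.random(n) < p_true                     # True = responded
base_w = np.full(n, 100.0)

# Fit a logistic regression of response on x by Newton's method.
X = np.column_stack([np.ones(n), x])
beta = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))
    grad = X.T @ (r - p)
    hess = X.T @ (X * (p * (1 - p))[:, None])
    beta += np.linalg.solve(hess, grad)
phat = 1 / (1 + np.exp(-X @ beta))             # estimated propensity scores

# Form quintile classes of the estimated propensities and inflate each
# respondent's weight by the inverse of its class's observed response rate.
q = np.quantile(phat, [0.2, 0.4, 0.6, 0.8])
cls = np.digitize(phat, q)                     # class labels 0..4
adj_w = base_w.copy()
for c in range(5):
    in_c = cls == c
    rate = r[in_c].mean()                      # response rate in class c
    adj_w[in_c & r] = base_w[in_c & r] / rate
adj_w[~r] = 0.0                                # nonrespondents carry no weight

print(f"base total {base_w.sum():.1f}, adjusted total {adj_w.sum():.1f}")
```

Because each class's respondent weights are scaled by the inverse of that class's response rate, the adjusted respondent weights reproduce (up to rounding) the full-sample weight total.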

Return to top

Title: Recent Developments in Address-based Sampling

  • Chair: Meena Khare, NCHS
  • Speakers: Mansour Fahimi, Marketing Systems Group
  • Discussant: Sylvia Dohrmann, Westat
  • Date/Time: Tuesday, June 17, 2008 / 12:30 - 2:00 p.m.
  • Sponsor: WSS Methodology Program
  • Location: Bureau of Labor Statistics, Conference Center Room 8. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Presentation material:
    Mansour Fahimi slides (pdf, ~1.1mb)
    Sylvia Dohrmann slides (pdf, ~232kb)

Abstract:

Increasingly, survey researchers are reverting to address-based methodologies to reach the general public for survey administration and related commercial applications. Essentially, there are three main factors behind this change: evolving coverage problems associated with telephone-based methods; eroding rates of response to telephone contacts; and, on the other hand, recent improvements in the databases of household addresses available to researchers. This presentation provides an assessment of these three factors along with an overview of the structure of the Delivery Sequence File (DSF) of the USPS that is often used for construction of address-based sampling frames. Moreover, key enhancements available for the DSF will be discussed. Such enhancements reduce undercoverage bias, particularly in rural areas where more households rely on P.O. Boxes and inconsistent address formats, and they enable researchers to develop more efficient sample designs as well as broaden their analytical possibilities through an expanded set of covariates for hypothesis testing and statistical modeling tasks.

Return to top

Title: Entropy and ROC Based Methods for SNP Selection and GWA Study

  • Speaker: Prof. Zhenqiu Liu, Department of Epidemiology and Preventive Medicine, University of Maryland School of Medicine
  • Date/Time: Wednesday, June 18, 2008 / 11:00 a.m. to 12:00 p.m.
  • Location: Conference Room 9201. Two Rockledge Center, 6701 Rockledge Drive, Bethesda, MD 20892
  • Sponsor: NHLBI/DPPS/OBR

Abstract:

GWA studies have become an important approach in the last few years as a means to elucidate associations between particular alleles and a predisposition to disease. Genome-wide SNP selection and association studies using entropy-related methods have proven useful in the literature. In this talk, we introduce a multi-locus LD measure based on generalized mutual information. SNP tagging, genetic mapping, and association studies are performed with the proposed LD measure and Monte Carlo methods. We also briefly introduce an ROC-based statistical learning approach for SNP selection and association studies and discuss methods to detect the rare alleles associated with disease.

Return to top

ROGER HERRIOT MEMORIAL LECTURE

Title: Collaborative Efforts to Foster Innovations in Federal Statistics

  • Chair/Discussant: Dr. Nancy Kirkendall, retired from the Energy Information Administration
  • Speakers:
    Mr. Thomas Petska, Statistics of Income Division, IRS
    Dr. Nancy Gordon, Bureau of the Census
    Dr. John Eltinge, Bureau of Labor Statistics
    Dr. Janice Lent, Energy Information Administration
  • Date/Time: September 4, 2008 / 12:30 to 2:00 pm
  • Place: Bureau of Labor Statistics Conference Center, Rooms 1-3. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Take the Red Line to Union Station.

Abstract:

This session is organized by Nancy Kirkendall, winner of the 2007 Roger Herriot Award for Innovations in Federal Statistics. Speakers will discuss some of the current thinking about how to encourage innovation in federal statistics and will describe and demonstrate collaborative efforts that are underway.

Return to top

Title: Metadata from the Data Collection Point of View

  • Speaker: Daniel Gillman, Information Scientist, U.S. Bureau of Labor Statistics
  • Discussant: Julia I. Lane, Program Director Science of Science and Innovation Policy, National Science Foundation
  • Chair: Katie E. Joseph, Mathematical Statistician, Energy Information Administration
  • Date/Time: Wednesday, September 10, 2008 / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center, Room 2. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Take the Red Line to Union Station.
  • Sponsors: WSS Data Collection Methods and DC-AAPOR

Abstract:

The term metadata was first used to name the data generated to describe other data - data about data. The success of that approach led to expanding the term to mean data that describes any object. Surveys produce many kinds of objects (e.g., questionnaires, case contacts, edit specifications), and each can be described. Those descriptions are statistical metadata.

The survey life-cycle is unusual in that metadata from one part of the cycle affects actions in later steps. For example, sampling has an impact on the cost of data collection. This includes paradata, the metadata obtained from the data collection process. Unfortunately, using the term paradata rather than metadata has the side effect of isolating this metadata from the rest of the survey life-cycle in the minds of survey methodologists and analysts. There are good reasons to use paradata to enhance data collection activities, but paradata may affect other processing, too.

Using a fabricated survey, we trace the origin and uses of metadata throughout the survey life-cycle with emphasis and perspective on data collection. The objective is to demonstrate how the data collection process both uses and produces metadata, how metadata produced by one life-cycle step is used in later steps, and how metadata management techniques can greatly increase the usefulness of metadata. This is true for survey processing, survey planning and redesign, and data dissemination.

The ultimate goal of the talk is to show how metadata may be used to tie the pieces of a survey together into a coherent whole. The advantages are numerous.

Return to top

Title: Cell Lines, Microarrays, Drugs and Disease: Trying to Predict Response to Chemotherapy

  • Speaker: Keith Baggerly, PhD, Bioinformatics and Computational Biology, UT M. D. Anderson Cancer Center
  • Date/Time: Friday, September 12, 2008, 10:00-11:00 AM.
  • Location: Georgetown University Medical Center, Lombardi Comprehensive Cancer Center, 3900 Reservoir Rd., NW, Research Building, Conference Room E501, Washington, DC 20007.
  • Sponsor: Department of Biostatistics, Bioinformatics and Biomathematics

Abstract:

Over the past few years, microarray experiments have supplied much information about the dysregulation of biological pathways associated with various types of cancer. Many studies focus on identifying subgroups of patients with particularly aggressive forms of disease, so that we know whom to treat. A corresponding question is how to treat them. Given the treatment options available today, this means trying to predict which chemotherapeutic regimens will be most effective.

We can try to predict response to chemo with microarrays by defining signatures of drug sensitivity. In establishing such signatures, we would really like to use samples from cell lines, as these can be (a) grown in abundance, (b) tested with the agents under controlled conditions, and (c) assayed without poisoning patients. Recent studies have suggested how this approach might work using a widely-used panel of cell lines, the NCI60, to assemble the response signatures for several drugs. Unfortunately, ambiguities associated with analyzing the data have made these results difficult to reproduce.

In this talk, we will describe how we have analyzed the data, and the implications of the ambiguities for the clinical findings. We will also describe methods for making such analyses more reproducible, so that progress can be made more steadily.

For information, please contact Caroline Wu at 202-687-4114 or ctw26@georgetown.edu

Return to top

Title: Weighted-Covariance Factor Decomposition of VARMA Models Applied to Forecasting Quarterly U.S. GDP at Monthly Intervals

  • Speakers:
    Baoline Chen, Bureau of Economic Analysis
    Peter Zadrozny, Bureau of Labor Statistics
  • Discussant: Tara Sinclair, George Washington University
  • Chair: Linda Atkinson, Economic Research Service, USDA
  • Date/Time: Tuesday, September 23, 2008 / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Take the Red Line to Union Station.
  • Sponsor: WSS Economics Section

Abstract:

We develop and apply a method, called weighted-covariance factor decomposition (WCD), for reducing large estimated vector autoregressive moving-average (VARMA) data models of many "important" and "unimportant" variables to smaller VARMA-factor models of "important" variables and significant factors. WCD has four particularly notable features, compared to frequently used principal components decomposition, for developing parsimonious dynamic models: (1) WCD reduces larger VARMA-data models of "important" and "unimportant" variables to smaller VARMA-factor models of "important" variables, while still accounting for all significant covariances between "important" and "unimportant" variables; (2) WCD allows any mixture of stationary and nonstationary variables; (3) WCD produces factors, which can be used to estimate VARMA-factor models, but more directly reduces VARMA-data models to VARMA-factor models; and, (4) WCD leads to a model-based asymptotic statistical test for the number of significant factors. We illustrate WCD with U.S. monthly indicators (4 coincident, 10 leading) and quarterly real GDP. We estimate 4 monthly VARMA-data models of 5 and 11 variables, in log and percentage-growth form; we apply WCD to the 4 data models; we test each data model for the number of significant factors; we reduce each data model to a significant-factor model; and, we use the data and factor models to compute out-of-sample monthly GDP forecasts and evaluate their accuracy. The application's main conclusion is that WCD can reduce moderately large VARMA-data models of "important" GDP and up to 10 "unimportant" indicators to small univariate-ARMA-factor models of GDP which forecast GDP almost as accurately as the larger data models.

Return to top

Title: Prediction Limits for Poisson Distribution

  • Speaker: Valbona Bejleri, PhD, Associate Professor, Department of Mathematics, University of D.C.
  • Date/Time: Friday, September 26, 2008 / 10:00-11:00 AM
  • Location: Georgetown University Medical Center, Lombardi Comprehensive Cancer Center, 3900 Reservoir Rd., NW, Research Building, Conference Room E501, Washington, DC 20007.
  • Sponsor: Department of Biostatistics, Bioinformatics and Biomathematics

Abstract:

Statistical prediction differs from standard confidence interval estimation. The point of interest in prediction is estimating the unknown values of a random variable corresponding to the outcomes of a future experiment. We derive prediction limits for a Poisson process using both frequentist and Bayesian approaches. An algorithm for constructing the optimal (smallest) frequentist upper prediction limit for a single future observation is presented. Our work is based on a Poisson model that uses a Poisson-binomial relationship. Bayesian prediction limits are also calculated. The relationship between prediction limits derived using the Bayesian approach (with noninformative priors) and limits derived using the frequentist approach is discussed. We show that there is no prior distribution that produces a two-sided prediction interval coinciding with the frequentist prediction interval at both the upper and lower limits. Conditions under which Bayesian and frequentist limits agree are important in order to inform our choice of method. The areas of application include the prediction of rare events. An example with real-life data will be presented.
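As a hedged illustration of the Bayesian side of this comparison (not the speaker's algorithm): under a flat, noninformative prior on the Poisson mean with unit exposure, the posterior given an observed count x is Gamma(x+1, 1), and the predictive distribution of a future count with the same exposure is negative binomial with r = x+1 and p = 1/2. The sketch below computes the resulting upper prediction limit and checks its frequentist coverage by Monte Carlo.

```python
import math
import random

def upper_prediction_limit(x, alpha=0.05):
    """Smallest u with P(Y <= u) >= 1 - alpha under the flat-prior
    predictive Y ~ NegBin(r = x + 1, p = 1/2) for equal exposures."""
    r, p = x + 1, 0.5
    pmf = p ** r                              # P(Y = 0) = p^r
    cdf, y = pmf, 0
    while cdf < 1 - alpha:
        pmf *= (r + y) / (y + 1) * (1 - p)    # NegBin pmf ratio recursion
        y += 1
        cdf += pmf
    return y

def rpois(lam, rng):
    """Poisson sampler (Knuth's multiplication method)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

# Monte Carlo coverage check: draw a "past" count x and a "future" count y
# from the same Poisson mean, and see how often y falls at or below the
# 95% upper prediction limit computed from x.
rng = random.Random(42)
lam, n_sim = 4.0, 2000
covered = sum(rpois(lam, rng) <= upper_prediction_limit(rpois(lam, rng))
              for _ in range(n_sim))
print(covered / n_sim)   # empirical coverage, typically near or above 0.95
```

For x = 0 the predictive is geometric with p = 1/2, so the 95% upper limit is 4, the smallest u with 1 - 2^-(u+1) >= 0.95; flat-prior upper limits of this kind tend to be slightly conservative, which is one concrete point of contact between the Bayesian and frequentist constructions discussed in the talk.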

For information, please contact Caroline Wu at 202-687-4114 or ctw26@georgetown.edu

Return to top

Title: Statistical Policy Issues Arising in Carrying Out the Requirements of the Prison Rape Elimination Act of 2003

  • Speaker: Allen Beck, Bureau of Justice Statistics, U.S. Department of Justice
  • Discussant: Hermann Habermann, former Director, U.N. Statistics Division and former Deputy Director, U.S. Census Bureau
  • Chair: Shelly Wilkie Martinez, Office of Statistical and Science Policy, U.S. Office of Management and Budget
  • Date/Time: Tuesday, October 7, 2008 / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Ave., NE. Take the Red Line to Union Station.
  • Sponsor: WSS Section on Public Policy

Abstract:

Our speaker will discuss how the Bureau of Justice Statistics has approached its responsibilities under the Prison Rape Elimination Act of 2003. The act provides fairly detailed sampling specifications and requires BJS to publish prison- and jail-level data on the incidence of rape, and to identify the three "best" and "worst" of each. Given the sensitive nature of the content, the developmental nature of the data collections, and the administrative and enforcement purposes to which the data will be put, BJS has had to step carefully to maintain its position as the Justice Department's principal statistical agency. Our discussant will assess the unfolding BJS experience as a case study of agency practice against professional practice and ethical criteria embodied in frameworks such as the United Nation's Fundamental Principles of Official Statistics and in Principles and Practices of a Statistical Agency, a seminal publication of the Committee on National Statistics.

Return to top

Title: An Introduction to Using GIS for Geostatistical Analysis

  • Presenter:
    Dr. Kathleen Hancock
    Director for the Center for Geospatial Information Technology
    Associate Professor in the Via Department of Civil and Environmental Engineering at Virginia Tech.
  • Date/Time: Wednesday, October 8, 2008 / 12:30 to 2:00 p.m.
  • Location: Bureau of Labor Statistics, Conference Training Center Room 10. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Ave., NE. Take the Red Line to Union Station.
  • Sponsor: Interagency Council on Statistical Policy Innovation Subgroup

Abstract:

When any spatially distributed phenomenon is measured, the observations display some form of spatial pattern, whether physical, environmental, or demographic. Traditional sampling methods, and the resulting models, require that these phenomena be measured such that the observations are spatially independent, in order to avoid spatial correlation. With geostatistics, these autocorrelations can be modeled, allowing sampling to be less restrictive and shifting the emphasis from estimating averages to mapping spatially distributed populations.
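To illustrate the idea of modeling autocorrelation rather than designing it away, the sketch below simulates a spatially correlated process on a one-dimensional transect and computes its empirical semivariogram, the basic geostatistical summary behind such methods. The field and all parameter choices are hypothetical, not taken from the presentation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical spatially autocorrelated field on a 1-D transect: a
# Gaussian process with exponential covariance, sampled at 100 locations.
s = np.linspace(0.0, 10.0, 100)
d = np.abs(s[:, None] - s[None, :])          # pairwise distances
cov = np.exp(-d / 2.0)                       # exponential covariance, range ~2
z = np.linalg.cholesky(cov + 1e-9 * np.eye(100)) @ rng.normal(size=100)

# Empirical semivariogram: gamma(h) = 0.5 * mean of (z_i - z_j)^2 over
# pairs whose separation falls in the distance bin around h.
bins = np.linspace(0.0, 5.0, 11)
iu = np.triu_indices(100, k=1)               # each pair once
sep, sqdiff = d[iu], 0.5 * (z[iu[0]] - z[iu[1]]) ** 2
gamma = np.array([sqdiff[(sep >= lo) & (sep < hi)].mean()
                  for lo, hi in zip(bins[:-1], bins[1:])])

print(np.round(gamma, 2))
```

For a stationary process the semivariogram rises with distance toward the sill (the process variance), which is exactly the decaying autocorrelation that geostatistical models exploit when predicting values at unsampled locations.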

This presentation will provide an overview of Geographic Information Systems (GIS) methods to explore spatially distributed information and to describe, model, and predict values from geospatially-referenced sample data.

Return to top

Title: Greenhouse, White House, and Statistics: The use of statistics in environmental decision making

  • Speaker: Barry D. Nussbaum, Chief Statistician, US Environmental Protection Agency, Washington, DC (e-mail: Nussbaum.Barry@epamail.epa.gov)
  • Chair: Mel Kollander
  • Date/Time: Wednesday, October 15, 2008 / 12:30 - 1:30 p.m.
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: WSS Agriculture and Natural Resources Section

Abstract:

In many applications, statistics and the statistician play an enormous role in the initiation, analysis, and implementation of policy and decisions. Yet, frequently after elaborate analyses, only a concise summary of the analysis reaches the decision maker's desk. Using a number of examples, the author describes how a relatively simple description of statistical analyses, when presented truly effectively (and often humorously), can have a large impact on major decisions. The author also demonstrates the growing need to review the quality of large acquired databases. This presentation includes several real examples from relatively simple surveys, analyses, and reviews with applications in regulation development, court cases, and policy making. One of the examples even landed on the desk of the President of the United States.

Return to top

Title: Quantifying the Fraction of Missing Information for Hypothesis Testing in Statistical and Genetic Studies

  • Speaker: Xiao-Li Meng, Department of Statistics, Harvard University
  • Date/Time: Wednesday, October 15, 2008 (3:00-4:00pm)
  • Location: 1957 E Street NW, Washington DC, Room 212
  • Sponsor: The George Washington University, Department of Statistics

Abstract:

This talk is based on a forthcoming discussion paper in Statistical Science (jointly with Nicolae and Kong, and preprint available at http://www.imstat.org/sts/future_papers.html ) with the following abstract:

Many practical studies rely on hypothesis testing procedures applied to datasets with missing information. An important part of the analysis is to determine the impact of the missing data on the performance of the test, and this can be done by properly quantifying the relative (to complete data) amount of available information. The problem is directly motivated by applications to studies, such as linkage analyses and haplotype-based association projects, designed to identify genetic contributions to complex diseases. In the genetic studies the relative information measures are needed for the experimental design, technology comparison, interpretation of the data, and for understanding the behavior of some of the inference tools. The central difficulties in constructing such information measures arise from the multiple, and sometimes conflicting, aims in practice. For large samples, we show that a satisfactory, likelihood-based general solution exists by using appropriate forms of the relative Kullback-Leibler information, and that the proposed measures are computationally inexpensive given the maximized likelihoods with the observed data. Two measures are introduced, under the null and alternative hypothesis respectively. We exemplify the measures on data coming from mapping studies on the inflammatory bowel disease and diabetes. For small-sample problems, which appear rather frequently in practice and sometimes in disguised forms (e.g., measuring individual contributions to a large study), the robust Bayesian approach holds great promise, though the choice of a general-purpose "default prior" is a very challenging problem. We also report several intriguing connections encountered in our investigation, such as the connection with the fundamental identity for the EM algorithm, the connection with the second CR (Chapman-Robbins) lower information bound, the connection with entropy, and connections between likelihood ratios and Bayes factors. We hope that these seemingly unrelated connections, as well as our specific proposals, will stimulate a general discussion and research in this theoretically fascinating and practically needed area.

Return to top

Title: Imbalance in Digital Trees and Similarity of Digital Strings

  • Speaker: Hosam Mahmoud, Department of Statistics, George Washington University
  • Date/Time: Friday, October 17, 2008 (11:00am-12:00 noon)
  • Location: 220 Funger Hall (2201 G Street, NW)
  • Sponsor: The George Washington University, Department of Statistics

Abstract:

The imbalance factor of the nodes containing keys in a random digital tree is investigated. Accurate asymptotics for the mean are derived for a randomly chosen key in the tree via poissonization and the Mellin transform, and the inverse of the two operations. It is also shown from a singularity analysis of the moving poles of the Mellin transform of the poissonized moment generating function that the imbalance factor (under appropriate centering and scaling) follows a Gaussian limit law.

The methods are amenable to the investigation of the average similarity of random strings as captured by the average number of "cousins" in the underlying tree structures. Certain analytic issues arise in the digital tree underlying DNA that do not have an analog in the binary case.

Return to top

Title: NORC Data Enclave

  • Speakers: Chet Bowie and Tim Mulcahy, National Opinion Research Center at The University of Chicago
  • Chair: Jeri M. Mulrow, Mathematical Statistician, National Science Foundation
  • Date/Time: Wednesday, October 22, 2008 / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Ave., NE. Take the Red Line to Union Station.
  • Sponsors: ICSP Innovation Working Group

Abstract:

Federal Statistical Agencies currently employ a variety of ways to share and disseminate information and data ranging from print publications to tabular output to integrated database systems. The goal is to provide useful data to a variety of audiences. Statistical agencies face a significant challenge in striking an appropriate balance between providing access to sensitive data and safeguarding confidentiality. Of particular interest to many data users is access to microdata files. Many exciting options are available for producing microdata that balance availability with protection. These options range from statistical methods to produce public use files, to analytic solutions sitting on top of the data, to restricted data access. This seminar is designed to showcase one new option.

The NORC Data Enclave began in July 2007. The enclave provides a secure environment within which authorized researchers can access sensitive microdata remotely from their offices or onsite at NORC. The enclave currently provides access to data produced by the Department of Commerce's National Institute for Standards and Technology (NIST) - Technology Innovation Program (TIP), the Kauffman Foundation, and the U.S. Department of Agriculture's Economic Research Service and National Agricultural Statistical Service.

This presentation will discuss the rationale behind the enclave's development, NORC's portfolio approach to protecting data in the enclave, the training and collaboratory aspects of the enclave, and basic navigation within the enclave environment. The presentation will also include discussion of lessons learned, challenges, and the future direction of remote data access platforms.

Return to top

Title: The Increasing Difficulty of Obtaining Personal Interviews in the United States' Ever Changing Social Environment

  • Speaker: William Wayne Hatcher, Regional Director, Charlotte Region, Census Bureau
  • Discussant: Terry P. O'Connor, Head, Data Quality Research Section, National Agricultural Statistics Service/USDA
  • Chair: Marilyn Worthy, Energy Information Administration
  • Date/Time: Thursday, October 23, 2008 / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center, Room 7. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Ave., NE. Take the Red Line to Union Station.
  • Sponsors: WSS Data Collection Methods and DC-AAPOR

Abstract:

Personal (face-to-face) interviewing provides many advantages over telephone or other modes of interviewing. However, personal interviewing has become much more challenging over the past decade. This presentation will address barriers to personal interviewing, including the challenge of interviewing in an increasingly diverse society, difficulty in finding respondents at home, convincing respondents to make time in their busy schedules for an interview, locating respondents who move to different locations on longitudinal surveys, gaining access to locked buildings and gated communities, interviewing polarized or disenfranchised populations, techniques used to convince reluctant respondents to participate in a survey, and gaining trust and cooperation from uncooperative respondents.

Return to top

Title: Usability of Electronic Voting and Public Opinion about the New Technology

  • Chair: Brian Meekins, Bureau of Labor Statistics
  • Speaker: Frederick Conrad and Michael Traugott, Institute for Social Research, University of Michigan
  • Date/Time: Friday, October 24, 2008 / 2:00 - 3:30pm
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Ave., NE. Take the Red Line to Union Station.
  • Sponsor: Methodology Program, WSS, and D.C. AAPOR

Abstract:

In response to the contentious presidential election in 2000, most states have introduced some form of electronic voting. The new technology has almost certainly resolved the ambiguity inherent in analog voting systems made famous by phenomena like "hanging chads," in which one cannot be certain whether the voter intended to register a vote for a candidate. However, e-voting has generated considerable vitriol because of concerns about voting security (e.g., hacking) and, to a lesser degree, usability (the inability to vote as intended). We present recent research that addresses the usability of current e-voting systems and the public's confidence in the accuracy, privacy, and anonymity of their votes. Our results come from (1) a laboratory study that measures the relationships between voting accuracy, micro-level voting actions, and satisfaction in six different e-voting interfaces, (2) a natural experiment that compares rates of incomplete or spoiled ballots in two states before and after the introduction of e-voting, using a new statistical method (ecological regression), and (3) two national surveys with an embedded vignette experiment that varies the description of e-voting and that allows us to examine the impact of voters' scientific knowledge and attitudes toward science on their attitudes toward e-voting.

Return to top

Title: Statistics in Forensic Science

  • Speaker: Walter Rowe (Department of Forensic Sciences, George Washington University)
  • Date/Time: Friday, October 31, 2008, 11:00-12:00 pm
  • Location: Funger Hall, Room 220 (2201 G Street, NW, Washington, DC 20052)
  • Sponsor: The George Washington University, Department of Statistics

Abstract:

Forensic scientists make frequent use of statistical methods. Like other scientists they may have to concern themselves with obtaining representative samples from large (possibly inhomogeneous) collections of evidence; they may also be concerned about the precision of their measurements. However, in many criminal and civil cases forensic scientists have two fundamental questions to answer when confronted with a piece of evidence. What is it? And where did it come from? Sometimes it is only necessary to answer the first question. Is that white powder cocaine? A positive or negative answer to that question suffices in most drug possession and drug trafficking cases. The more intriguing forensic question is where the piece of evidence came from. Some types of evidence (fingerprints, shoe and tire impressions, tool marks and fired bullets and cartridge cases) present what appear to be unique features (fingerprint ridge characteristics or patterns of striations). Probability models have been developed for some types of pattern evidence (e.g. fingerprints, tool marks and striation patterns on fired bullets) to support the argument that their features are unique. With other types of evidence, forensic scientists may determine a set of features, no one of which is unique but which when aggregated specify a unique source. In DNA profiling the alleles present at a large number of gene loci are determined. For each gene locus the combination of alleles found usually is present in a large fraction of the human population. However, if enough gene loci are examined the DNA recovered from a blood or semen stain can be linked to one and only one member of the human population. This association is possible because geneticists and forensic molecular biologists have accumulated frequency data for the gene loci in which they are interested. Relevant frequency data is usually lacking for other types of evidence. 
In dealing with these types of evidence, the forensic scientist may only be able to say that the evidence came from a particular geographical area or, in the case of manufactured items, belongs to a particular product formulation.

Principal component analysis (PCA) and discriminant analysis (DA) have been applied to a variety of forensic problems in recent years. These range from the comparison of soil samples and the classification of ignitable liquids used as arson accelerants to the comparison of writing inks such as ball pen inks, gel pen inks, permanent markers and dry erase markers. PCA and DA make it possible to examine the similarities and differences between similar materials and to assign unknown samples to groups having similar formulations. In the field of forensic document examination, identifying the formulation of the ink used to prepare a document can be useful because the dates at which a particular formulation came on the market will generally be known. A document which has been prepared with an ink that was not available at the time it was supposedly created cannot be authentic. PCA also allows forensic scientists to compare different methods of analysis and select those that have the greatest discriminating power.
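
As a rough illustration of the workflow described above, the sketch below implements a numpy-only PCA and a nearest-centroid classification in principal-component space (a simplified stand-in for discriminant analysis). The function names and "spectra" are hypothetical, not from the talk.

```python
import numpy as np

def pca_fit(X, n_components=2):
    """Fit a numpy-only PCA: return the data mean and the leading eigenvectors
    of the feature covariance matrix."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    _, vecs = np.linalg.eigh(cov)             # eigh returns ascending eigenvalues
    return mean, vecs[:, ::-1][:, :n_components]

def pca_transform(X, mean, components):
    """Project (possibly new) samples onto the fitted principal components."""
    return (np.asarray(X) - mean) @ components

def assign_to_group(scores, labels, unknown_score):
    """Assign an unknown sample to the group with the nearest centroid in
    PC space -- a simplified stand-in for discriminant analysis."""
    labels = np.asarray(labels)
    groups = sorted(set(labels.tolist()))
    centroids = {g: scores[labels == g].mean(axis=0) for g in groups}
    return min(groups, key=lambda g: np.linalg.norm(unknown_score - centroids[g]))
```

In practice one would also inspect the explained variance of each component, which is how PCA helps select the analytical methods with the greatest discriminating power.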

This presentation will conclude with a brief survey of the attitudes of United States courts toward statistical inference and the prevailing rules for the admissibility of scientific and technical evidence (which includes statistics).

Return to top

Title: Volatility, Jump Dynamics in the U.S. Energy Futures Markets

  • Speaker: Johan Bjursell, Department of Computational and Data Sciences, George Mason University
  • Time: 10:30 a.m. Refreshments, 10:45 a.m. Colloquium Talk
  • Date: October 31, 2008
  • Location: Research 1, Room 301, Fairfax Campus, George Mason University, 4400 University Drive, Fairfax, VA 22030
  • Sponsor: George Mason University CDS/CCDS/Statistics Colloquium

Abstract:

In this talk, I apply a nonparametric method based on realized and bipower variations calculated from intraday data to identify jumps in daily futures price series of crude oil, heating oil and natural gas contracts traded on the New York Mercantile Exchange. The sample period of our intraday data covers January 1990 to December 2007. Alternative methods such as staggered returns and optimal sampling frequency methods are used to remove the effects of microstructure noise biases on the tests against detecting jumps. Our empirical work documents the monthly and intraday variation of the frequency of jumps and the jump size. The relative contribution of jumps to total futures price variation is also investigated. In general, the results are consistent with the implications from the theory of storage and news arrival.

This is joint work with James E. Gentle and George H. K. Wang.
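
For readers unfamiliar with the method, the core statistic can be sketched from realized and bipower variation alone. The code below is an illustrative, standard-library-only version of a ratio-style jump statistic in the spirit of Barndorff-Nielsen and Shephard, with the "max-adjusted" studentization popularized by Huang and Tauchen; it omits the staggered-return and optimal-sampling-frequency corrections mentioned in the abstract, and the function name is my own.

```python
import math

MU1 = math.sqrt(2.0 / math.pi)                                        # E|Z|, Z ~ N(0,1)
MU43 = 2.0 ** (2.0 / 3.0) * math.gamma(7.0 / 6.0) / math.gamma(0.5)   # E|Z|^(4/3)

def jump_stat(returns):
    """Ratio-style daily jump statistic from intraday returns: compares realized
    variance (RV) with jump-robust bipower variation (BV), studentized with
    tripower quarticity (TQ); large positive values signal a jump."""
    n = len(returns)
    rv = sum(r * r for r in returns)
    bv = MU1 ** -2 * sum(abs(returns[i]) * abs(returns[i - 1]) for i in range(1, n))
    tq = n * MU43 ** -3 * sum(
        (abs(returns[i]) * abs(returns[i - 1]) * abs(returns[i - 2])) ** (4.0 / 3.0)
        for i in range(2, n))
    rj = (rv - bv) / rv                                  # relative jump measure
    denom = math.sqrt(((math.pi / 2.0) ** 2 + math.pi - 5.0) / n
                      * max(1.0, tq / bv ** 2))
    return rj / denom                                    # approx. N(0,1) absent jumps
```

Applied day by day to intraday futures returns, exceedances of a normal critical value flag jump days, from which the monthly and intraday frequency and size patterns studied in the talk can be tabulated.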

Return to top

Title: Can Calibration Be Used to Adjust for "Nonignorable" Nonresponse?

  • Speaker: Phillip S. Kott, National Agricultural Statistics Service (written with Ted Chang, University of Virginia)
  • Chair: TBA
  • Discussant: John Eltinge, Bureau of Labor Statistics
  • Date/Time: Monday, November 3, 2008, 12:30 - 2:00pm
  • Location: Bureau of Labor Statistics, Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsors: Methodology Program, WSS

Abstract:

Although not originally designed for that purpose, calibration can be used to adjust for unit nonresponse. It is less well known that calibration can be employed when the (explanatory) model variables on which the response/nonresponse mechanism depends do not coincide with the benchmark variables in the calibration equation. As a result, model-variable values need only be known for the respondents. This allows the treatment of what is usually considered nonignorable nonresponse.

Two distinct theories justify using calibration as a method for nonresponse adjustment: quasi-random response modeling and prediction modeling. The prediction- modeling approach needs to be extended to cover nonignorable nonresponse. The prediction model itself relates the survey variable to the model variables. A second model equation, called the "measurement-error model," connects the model variables to the benchmark variables.

The justification for both the response and prediction modeling approaches relies on samples being large and on model assumptions that can fail in practice. We explore these limitations empirically using data from an agricultural census.

Mutually exclusive group-indicator variables known for all units in the population serve as the benchmark variables in our investigation. The "benchmark groups" themselves are based on previously-collected frame information. Model variables are created by constructing analogous "model groups" using survey information known only for the respondents.

Neither the prediction/measurement-error model nor the response model employing these model variables is correct. Both, however, are closer to the truth than commonly-invoked models that treat the benchmark groups as the model groups. As a consequence, using the response-generated model groups leads to much lower empirical biases and smaller mean squared errors, albeit with slightly larger empirical standard deviations.
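
A minimal sketch of linear calibration may help fix ideas. Given design weights d_i and benchmark variables x_i with known population totals, calibrated weights take the form w_i = d_i(1 + x_i'λ), with λ chosen so the weighted benchmark totals are hit exactly; with group-indicator benchmarks this reduces to poststratification. The function name and toy numbers below are illustrative, not from the paper.

```python
import numpy as np

def calibrate(d, X, targets):
    """Linear calibration: adjust design weights d so that the weighted totals
    of the benchmark variables in X hit the known population totals exactly.
    Returns w_i = d_i * (1 + x_i' lambda)."""
    d = np.asarray(d, dtype=float)
    X = np.asarray(X, dtype=float)
    A = (X * d[:, None]).T @ X                      # sum_i d_i x_i x_i'
    gap = np.asarray(targets, dtype=float) - d @ X  # targets minus current weighted totals
    lam = np.linalg.solve(A, gap)
    return d * (1.0 + X @ lam)
```

Applied to respondents only, with model variables in place of X's benchmark role, this same mechanic is what allows the treatment of nonignorable nonresponse discussed above.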

Return to top

Title: Self-Service Business Intelligence for Statistical Agencies/Departments

  • Speakers:
    Karen Cholak, Space-Time Research
    Brian Garrett, Space-Time Research
  • Chair: Jeri M. Mulrow, Mathematical Statistician, National Science Foundation
  • Date/Time: Tuesday, November 4, 2008 / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Ave., NE. Take the Red Line to Union Station.
  • Sponsors: ICSP Innovation Working Group

Abstract:

Federal, state and local statistical agencies are under increasing demands from their key stakeholders and the public at large. Timely, relevant and robust statistics are being recognized as fundamental to enabling policy formulation by government agencies and allowing industry and the community to make better informed decisions. At the same time, agencies must ensure the confidentiality of the data.

Some of the key business, policy, and management issues facing statistical agencies include:

  • Enhancing the transparency of government information
  • Transforming government through information sharing
  • Providing better information services with fewer programmers and statisticians
  • Providing appropriate access to a diverse range of information users
  • Meeting the user's expectation for instant answers to questions
  • Making more data available to the stakeholders
  • Balancing increased demands of privacy protection for individuals versus increasing access to data
  • Ensuring responsible statistical communication to users

This seminar is designed to address these issues.

Space-Time Research (STR) is the global leader in Self-Service Business Intelligence. Our solutions are EASIER, FASTER, and SAFER than traditional statistical analysis. End-users interactively analyze and visualize data in a drag-and-drop environment. By optimizing detailed-level data, or microdata, our solution supports a "Query-Answer-Query" approach to data exploration. Confidentiality routines protect the privacy of the data; integration with mapping technology supports visualization options for geo-coded data.

Our customers are the most advanced government agencies for statistics, education, transportation, health, and justice. Customers include the U.S. Census Bureau, the Australian Bureau of Statistics, Statistics New Zealand, Office for National Statistics in the United Kingdom, and the Russian Federal Statistics Office, among others.

Return to top

Title: Nearest Neighbor Imputation Strategies: Does 'nearest' imply most likely? And other difficult questions …

  • Speaker: Timothy Keller, National Agricultural Statistics Service, Washington, DC
  • Chair: Mike Fleming
  • Date/Time: Thursday, November 6, 2008 12:30 - 1:30 p.m.
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: WSS Agriculture and Natural Resources Section

Abstract:

A frequently stated objective for methods of imputing values for missing data fields is that the distribution of the data be preserved. Some knowledge of the underlying distribution seems to be necessary to any reasonable method of imputation, although in practice that knowledge may be very limited. It is proposed that imputation methods using the concept of a nearest neighbor with respect to some appropriate metric may be viewed as a sort of surrogate for likelihood maximizing substitutions. A formal statement of the problem of determining when nearest neighbor techniques are a reasonable proxy for selection based on likelihood considerations is attempted.
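
The basic donor-selection step discussed above can be sketched in a few lines: for each record with a missing value, copy the value from the complete record nearest under some metric on the matching fields. The field names and unscaled Euclidean metric below are illustrative only; in practice the choice and scaling of the metric is exactly the difficult question the talk raises.

```python
import math

def nearest_neighbor_impute(records, target_field, match_fields):
    """Impute missing values of target_field by copying the value from the
    complete record (the 'donor') nearest in Euclidean distance on match_fields."""
    donors = [r for r in records if r[target_field] is not None]
    for rec in records:
        if rec[target_field] is None:
            point = [float(rec[f]) for f in match_fields]
            donor = min(donors,
                        key=lambda d: math.dist(point, [float(d[f]) for f in match_fields]))
            rec[target_field] = donor[target_field]
    return records
```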

e-mail: Tim_Keller@nass.usda.gov

Return to top

Title: NOAA's National Weather Service Weather Services for the Nation -- A Transition Briefing

  • Speaker: Jack Hayes, Assistant Administrator for Weather Services and National Weather Service (NWS) Director
  • Chair: Mel Kollander
  • Date/Time: Monday, November 10, 2008 / 12:00 - 1:30 p.m.
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: WSS Agriculture and Natural Resources Section

Abstract:

A new Administration assumes responsibility for leading our Nation on January 20, 2009. There will be new people running the Executive Branch of the Federal Government and they will need to learn about the many agencies they oversee. The Transition Briefing offers a unique opportunity to explain the critical role the National Weather Service serves to provide the people of our Nation with weather, water, and climate forecasts and warnings for the protection of life and property and enhancement of the national economy.

Point of contact e-mail: Leslie.Taylor@noaa.gov

Return to top

Title: Administrative Data in Support of Policy Relevant Statistics: The Medicaid Undercount Project

  • Speaker: Dr. Michael Davern, Assistant Professor and Research Director, State Health Access Data Assistance Center, University of Minnesota
  • Discussant: Linda Bilheimer, Associate Director for Analysis and Epidemiology, National Center for Health Statistics
  • Chair: Shelly Wilkie Martinez, Office of Statistical and Science Policy, U.S. Office of Management and Budget
  • Date/Time: Thursday, November 13, 2008 / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Ave., NE. Take the Red Line to Union Station.
  • Sponsor: WSS Section on Public Policy
  • Presentation material:
    Slides from the presentation (pdf, ~2.2mb)
    Slides from the discussant (pdf, ~192kb)

Abstract:

The seminar will focus on efforts to understand why survey estimates of the number of people enrolled in Medicaid are well below administrative data enrollment counts. A crude comparison between the Current Population Survey's Annual Social and Economic Supplement (CPS) and the Medicaid Statistical Information System (MSIS) shows the survey estimate to be 43% smaller than the administrative data estimate. The causes of this large discrepancy are varied, and many of them can have a profound impact on our understanding of health access policy. This project categorized the identified causes of the "undercount" into universe alignment issues and survey response error. After adjusting for universe alignment issues (which include adjustments for people counted in MSIS that will not be counted in the CPS, such as institutional group quarters and people enrolled in more than one state during the reference year), the gap between the survey estimate and administrative data narrows to 31%. The remaining cause of discrepancy is survey reporting error, with 17% of the linked cases reporting being uninsured in the CPS. The extent of the reporting error, at first glance, places the error clearly in the domain of survey measurement: many people are counted as being uninsured all of last year when they were in fact enrolled in Medicaid for at least a day.

The large reduction in the number of uninsured has far reaching implications on those who use the survey data to evaluate federal and state programs, model program eligibility and forecast the costs of program alterations. However, the reporting error is not unique to one survey instrument as other subsequent linkage projects show large numbers of linked cases answering other surveys as though they are uninsured (although the CPS has the highest level examined so far). Furthermore, the reporting error varies greatly by state. States exercise substantial control over the operation of the Medicaid program. Thus, what at first appears to be "survey error" may partially be the result of Medicaid program operations as large numbers of people do not know they (or their dependents) have health insurance coverage. The study, therefore, has implications for not only improving survey instrument design (as some instruments are clearly better than others at reducing the error), but is also suggestive that some states may need to improve communication with enrollees regarding their enrollment in Medicaid.

Return to top

Title: High-throughput Flow Cytometry Data Analysis: Tools And Methods In Bioconductor

  • Speaker: Dr. Florian Hahne, Computational Biology Program, Division of Public Health Sciences, Fred Hutchinson Cancer Research Center
  • Time/Location: 10:00-11:00 AM. Georgetown University Medical Center, Lombardi Comprehensive Cancer Center, 4000 Reservoir Rd, NW, Warwick Evans Conference Room, Building D, Washington, DC 20007.
  • Sponsor: Department of Biostatistics, Bioinformatics and Biomathematics For information, please contact Caroline Wu at 202-687-4114 or ctw26@georgetown.edu

Abstract:

Automation technologies developed during the last several years have enabled the use of flow cytometry high content screening (FC-HCS) to generate large, complex datasets in both basic and clinical research applications. A serious bottleneck in the interpretation of existing studies and the application of FC-HCS to even larger, more complex problems is that data management and data analysis methods have not advanced sufficiently far from the methods developed for applications of flow cytometry (FCM) to small-scale, tube-based studies. Some of the consequences of this lag are difficulties in maintaining the integrity and documentation of extremely large datasets, assessing measurement quality, developing validated assays, controlling the accuracy of gating techniques, automating complex gating strategies, and aggregating statistical results across large study sets for further analysis. In this seminar, we present a range of computational tools developed in Bioconductor that enable the automated analysis of large flow cytometry data sets, from the initial quality assessment to the statistical comparison of the individual samples.

Return to top

Title: Bayesian Multiscale Multiple Imputation with Implications to Data Confidentiality

  • Speaker: Dr. Scott Holan, University of Missouri-Columbia
  • Discussant: Stephen Cohen, National Science Foundation
  • Chair: Jeri M. Mulrow, Mathematical Statistician, National Science Foundation
  • Date/Time: Thursday, November 20, 2008 / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Ave., NE. Take the Red Line to Union Station.
  • Sponsors: ICSP Innovation Working Group

Abstract:

Federal Statistical Agencies currently employ a variety of ways to share and disseminate information and data ranging from print publications to tabular output to integrated database systems. The goal is to provide useful data to a variety of audiences. Statistical agencies face a significant challenge in striking an appropriate balance between providing access to sensitive data and safeguarding confidentiality. This seminar is designed to showcase a new option.

Many scientific, sociological and economic applications present data that are collected on multiple scales of resolution. Frequently, such data sets experience missing observations in a manner such that they can be accurately imputed using the method we propose, known as Bayesian multiscale multiple imputation. Although our method is of independent interest, one immediate implication of such methodology is the potential effect on confidential databases where the mechanism of protection is cell suppression. In order to demonstrate the proposed methodology and to assess the effectiveness of disclosure practices in longitudinal databases, we conduct a large-scale empirical study using the U.S. Bureau of Labor Statistics Quarterly Census of Employment and Wages (QCEW). During the course of our empirical investigation it is determined that we can predict several suppressed cells to within 1% accuracy, thus raising potential concerns for data confidentiality.

Return to top

Title: Analysis of Multi-Factor Affine Yield Curve Models

  • Speaker: Siddhartha Chib, Harry C. Hartkopf Professor of Econometrics and Statistics, Olin Business School, Washington University in St. Louis
  • Date/Time: Friday, November 21, 10:45 am - 11:45 am
  • Location: Duques 552 (2201 G Street, NW)
  • Sponsor: The George Washington University, The Institute for Integrating Statistics in Decision Sciences and Department of Statistics

Abstract:

In finance and economics, there is a great deal of work on the theoretical modeling and statistical estimation of the yield curve (defined as the relation between log(ρt(τ))/τ and τ, where ρt(τ) is the time-t price of the zero-coupon bond with payoff 1 at maturity date t + τ). Of much current interest are models in which the bond prices are derived from a stochastic discount factor (SDF) approach that enforces an important no-arbitrage condition. The log of the SDF is assumed to be an affine function of latent and observed factors, where these factors are assumed to follow a stationary Markov process. In this paper we revisit the question of how such multi-factor affine models of the yield curve should be fit. Our discussion is from the Bayesian MCMC viewpoint, but our implementation of this viewpoint is different and novel. Key aspects of the inferential framework include (i) a prior on the parameters of the model that is motivated by economic considerations, in particular, those involving the slope of the implied yield curve; (ii) posterior simulation of the parameters in ways to improve the efficiency of the MCMC output, for example, through sampling of the parameters marginalized over the factors, and through tailoring of the proposal densities in the Metropolis-Hastings steps using information about the mode and curvature of the current target based on the output of a simulated annealing algorithm; and (iii) measures to mitigate numerical instabilities in the fitting through reparameterizations and square-root filtering recursions. We apply the techniques to explain the monthly yields on nine US Treasuries (with maturities ranging from 1 to 120 months) over the period January 1986 to December 2005. The model contains three factors, one latent and two observed. We also consider the problem of predicting the nine yields for each month of 2006. We show that the (multi-step ahead) prediction regions properly bracket the actual yields in those months, thus highlighting the practical value of the fitted model.

Return to top

Title: Administrative Data in Support of Policy Relevant Statistics: the Earned Income Tax Credit (EITC) Eligibility, Participation, and Its Impact on Employment

  • Speaker: V. Joseph Hotz, Arts & Sciences Professor of Economics, Department of Economics, Duke University
  • Discussant: Nada Eissa, Associate Professor of Public Policy and Economics, Georgetown Public Policy Institute, Georgetown University
  • Chair: Clinton W. Brownley
  • Date/Time: Wednesday, December 3, 2008 / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center, Room 10. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Ave., NE. Take the Red Line to Union Station.
  • Sponsor: WSS Section on Public Policy

Abstract:

Hotz will report on research he has conducted on the EITC using matched administrative data sources for the State of California during the 1990s. These data include information from California's welfare and unemployment insurance administrative data systems that is linked to federal tax returns under a unique arrangement with the State of California's taxing authority. Hotz will report findings on rates of EITC eligibility and participation and on the impacts of the EITC on rates of employment in California using these data.

Return to top

Title: Clinical research and lifelong learning: An example from the BLISS cluster randomised controlled trial of the Effect of *Active Dissemination of Information* on standards of care of premature babies in England (BEADI)

  • Speaker: Diana Elbourne, Professor of Health Care Evaluation, Medical Statistics Unit, London School of Hygiene and Tropical Medicine
  • Date/Time: Monday, December 8th, 2008, 11:00am
  • Location: 5th Floor Conference Rooms, 6110 Executive Blvd, Rockville, Maryland. (Come to the 7th floor to suite 750 and someone will escort you to the 5th floor where the seminar will take place.)
  • Sponsor: The George Washington University, Biostatistics Center

Abstract:

Gaps between research knowledge and clinical practice have been consistently reported. Traditional ways of communicating information have limited impact on practice changes. There is a need for clarification as to which dissemination strategies work best to translate evidence into practice in neonatal units across England. The objective of this trial is to assess whether an innovative active strategy for the dissemination of neonatal research findings, recommendations, and national neonatal guidelines is more likely to lead to changes in policy and practice than the traditional (more passive) forms of dissemination. A cluster randomised controlled trial of all neonatal units in England (randomised by hospital, stratified by neonatal regional networks and neonatal units' level of care) will assess the relative effectiveness of active dissemination strategies on changes in local policies and practices.

For more information, contact
Karen Miller
Executive Coordinator
The George Washington University Biostatistics Center
6110 Executive Boulevard, Suite 750
Rockville, MD 20852
Phone: 301-881-9260
Fax: 301-881-3742

Return to top

Title: Visualizing Patterns in Data with Micromaps

  • Chair: Brian Meekins
  • Speakers:
    Daniel Carr, George Mason University
    Linda Pickle, StatNet Consulting LLC
  • Date/Time: Thursday, December 11, 2008, 12:30 - 2:00pm
  • Location: Bureau of Labor Statistics, Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Methodology Program, WSS

Abstract:

Micromaps originated and evolved from research and collaboration with federal agencies including EPA, BLS, NCHS, NCI, CDC, NASS, ERS, and BTS. Micromaps are graphics that link statistical information to an organized set of small maps. They take on a range of forms to serve different pattern visualization tasks. These forms include linked, conditioned, and comparative micromaps. Linked micromaps link entities mapped as polygons, lines, or points to statistical graphics such as box plots. The National Cancer Institute now uses a linked micromap applet to communicate cancer statistics to health planners across the nation and has produced a stand alone application for use in data quality assessment by SEER registrars. Conditioned micromaps encode a response variable in color and use predictor variables to highlight subsets of entities in grids of replicated maps. A variation on conditioned micromaps conditioned by significance tests provides a basis for setting health action priorities. This idea is easily extended to other domains. Comparative micromaps are a grid of response variable maps indexed by one or more variables and enhanced with additional maps to address cognitive issues such as change blindness. Comparative micromaps can show differences over time explicitly, reducing the cognitive burden on the reader. Micromaps are applicable to an almost unlimited range of applications. This includes statistics summarizing massive datasets for polygons representing ecoregions, statistics for polylines representing roads, streams, or social networks, and statistics for points that might represent cities, baseball positions on a field or locations in buildings.

Return to top

Title: On Robust Tests for Case-control Genetic Association Studies

  • Speaker: Gang Zheng, Ph.D., Office of Biostatistics Research, National Heart, Lung and Blood Institute
  • Date/Time: Friday, December 12, 2008 / 10:30am-11:30am
  • Location: Conference Room 9091, Two Rockledge Center, 6701 Rockledge Drive, Bethesda, MD 20892

Abstract:

When testing association between a single marker and a disease using case-control samples, the data can be presented in a 2x3 table. Pearson's chi-square test (2 df) and the trend test (1 df) are commonly used. Usually one does not know which of them to choose; the choice depends on the unknown genetic model underlying the data. So one could either choose the maximum (MAX) of a family of trend tests over all possible genetic models (following Davies, 1977; 1987; both in Biometrika) or take the smaller of the p-values of Pearson's test and the trend test (MIN2) (following WTCCC - Wellcome Trust Case-Control Consortium, 2007, Nature).

We first show that Pearson's test, the trend test and MAX are all trend tests with different types of scores: data-driven or prespecified, restricted or not restricted. The results provide insight into why MAX is always more powerful than Pearson's test when the genetic model is restricted and why Pearson's test is more robust when the model is not restricted. Then, for the MIN2 of WTCCC (2007), we show that its asymptotic null distribution can be derived, so the p-value of MIN2 can be obtained. Simulation is used to compare some common test statistics. The results are applied to WTCCC (2007). In particular, MIN2 is applied to the SNPs obtained by The SEARCH Collaborative Group (NEJM, August 21, 2008), who used MIN2 to detect these SNPs in a genome-wide association study but reported the minimum p-values as though they were true p-values.
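
A small sketch may clarify the statistics being compared. The code below computes the Cochran-Armitage trend statistic under scores (0, θ, 1) for θ = 0, 1/2, 1 (recessive, additive, dominant models), the MAX of their absolute values, Pearson's 2-df chi-square p-value (whose survival function has the closed form exp(−x/2)), and the naive MIN2 of the two p-values. As the abstract notes, that naive minimum is not itself a valid p-value; deriving its null distribution is the paper's contribution. Function names are mine.

```python
import math

def trend_z(cases, controls, theta):
    """Cochran-Armitage trend statistic for a 2x3 genotype table with scores
    (0, theta, 1); theta = 0, 0.5, 1 correspond to recessive, additive,
    and dominant genetic models."""
    x = (0.0, theta, 1.0)
    n = [c + u for c, u in zip(cases, controls)]
    R, N = sum(cases), sum(cases) + sum(controls)
    sxr = sum(xi * ri for xi, ri in zip(x, cases))
    sxn = sum(xi * ni for xi, ni in zip(x, n))
    sxxn = sum(xi * xi * ni for xi, ni in zip(x, n))
    num = N * sxr - R * sxn
    var = R * (N - R) * (N * sxxn - sxn ** 2)
    return num * math.sqrt(N) / math.sqrt(var)

def pearson_chi2_p(cases, controls):
    """Pearson chi-square p-value (2 df) for the 2x3 table."""
    R, S = sum(cases), sum(controls)
    N = R + S
    chi2 = 0.0
    for ri, si in zip(cases, controls):
        ni = ri + si
        for obs, tot in ((ri, R), (si, S)):
            expected = tot * ni / N
            chi2 += (obs - expected) ** 2 / expected
    return math.exp(-chi2 / 2.0)          # 2-df chi-square survival function

def max3_min2(cases, controls):
    """MAX3 over three genetic models, and the naive minimum (MIN2) of the
    additive-trend and Pearson p-values; MIN2 is NOT itself a valid p-value."""
    zmax = max(abs(trend_z(cases, controls, t)) for t in (0.0, 0.5, 1.0))
    p_trend = math.erfc(abs(trend_z(cases, controls, 0.5)) / math.sqrt(2.0))
    return zmax, min(p_trend, pearson_chi2_p(cases, controls))
```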

This talk is based on three recent manuscripts with Jungnam Joo, and/or Minjung Kwak, Kwangmi Ahn and Yaning Yang.

Return to top

Title: Model Building: Data with Random Location and Random Scale Effects

  • Speaker: William S. Cleveland, Shanti S. Gupta Distinguished Professor, Purdue
  • Date/time: Friday, December 12, 2008 / 10:00-11:00 a.m.
  • Location: Georgetown University Medical Center, Lombardi Comprehensive Cancer Center, 4000 Reservoir Rd, NW, Warwick Evans Conference Room, Building D, Washington, DC 20007
  • Sponsor: Department of Biostatistics, Bioinformatics and Biomathematics

Abstract:

General approaches and tools for model building will be presented for data with random effects, pervasive in medical studies where people are units with repeat measurements. Typically, fitted models have random location effects, but any time location effects are present, there is a high potential for random scale effects to be present; at the very least, it is wise to routinely check for scale effects.

Our stepwise model building approach identifies the error, scale, and location distributions, in that order; each subsequent step uses any previous identifications. Visualization tools are at the core of the identification methods. Also at the core is in-field null-power simulation, which applies to the specific data at hand and its specific finite sample. Null simulations allow us to judge if deviations from expected patterns warrant attention. Power simulations determine our ability to differentiate alternative models. Approaches and methods are illustrated by application to three data sets from customer opinion polling, nutrition, and hospital services.

Joint work with Lei Shu, Abbott Laboratories; Chaunhai Liu, Purdue; and Lorraine Denby, Avaya Labs

For information, please contact Caroline Wu at 202-687-4114 or ctw26@georgetown.edu

Return to top

Title: Disclosure Protection: A New Approach to Cell Suppression

  • Speaker: Bei Wang, U.S. Bureau of the Census
  • Discussant: Lawrence Cox, National Center for Health Statistics
  • Chair: Linda Atkinson, Economic Research Service, USDA
  • Date/Time: Tuesday, December 16, 2008 / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: WSS Economics Section

Abstract:

Census products and related programs use cell suppression to protect respondents' sensitive data. A disclosure procedure is applied before any data goes out for publication. The underlying algorithm is a network flow model. We will review the disclosure procedure and how well the model performs. A question that always arises is how a near-optimal solution to the Cell Suppression Problem (CSP) can be determined. A new linear programming approach is used in this research. The algorithm is applied to the Survey of Business Owners (SBO) Hispanic data, and comparisons with the 2002 publications are made.
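
The audit side of cell suppression can be illustrated with simple interval bounds. If only one cell in a row is suppressed, the row total reveals it exactly; with complementary suppressions, nonnegativity bounds each suppressed cell only by the row residual, and a protection rule checks whether the interval is wide enough. This toy sketch (hypothetical function names and threshold) shows why complementary suppressions are needed; choosing them at minimum information loss is the optimization that the network-flow and linear programming formulations address.

```python
def suppression_bounds(row_total, published, n_suppressed):
    """Nonnegativity bounds on a suppressed cell in one row. The unpublished
    residual is row_total minus the published cells: with a single suppression
    the cell is exactly recoverable (a disclosure); with complementary
    suppressions each suppressed cell is only bounded by [0, residual]."""
    residual = row_total - sum(published)
    if n_suppressed == 1:
        return (residual, residual)
    return (0, residual)

def is_protected(bounds, true_value, sliding=0.1):
    """Illustrative sliding-protection rule: the feasibility interval must be
    at least a fixed fraction of the true cell value."""
    lo, hi = bounds
    return (hi - lo) >= sliding * true_value
```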

Return to top

Title: Income Data for Policy Analysis: A Comparative Assessment of Eight Surveys

  • Speaker: John Czajka, Mathematica Policy Research, Inc.
  • Discussants:
    David Johnson, U.S. Census Bureau
    Roberton Williams, Urban Institute
  • Chair: Joan Turek, Office of the Assistant Secretary for Planning and Evaluation, Department of Health and Human Services
  • Date/Time: Thursday, December 18, 2008 / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Ave., NE. Take the Red Line to Union Station.
  • Sponsor: WSS Section on Public Policy
  • Presentation material:
    Slides from the presentation (Czajka & Denmead, pdf, ~49.6mb)
    Slides from the discussant (Johnson, pdf, ~628kb)

Abstract:

Income is a critical variable in policy analysis, and because of this, most federal household surveys collect at least some data on income. Yet income is exceedingly difficult to measure well in a household survey. Income questions produce some of the highest item nonresponse rates recorded in surveys, and comparisons of survey estimates with benchmarks developed from administrative records provide evidence of significant under-reporting for many sources. Under contract to the Office of the Assistant Secretary for Planning and Evaluation (ASPE), Department of Health and Human Services (HHS), Mathematica Policy Research, Inc. (MPR) and its subcontractor, Denmead Services & Consulting, have conducted a comprehensive and systematic assessment of the income data and its utility for policy-related analyses in eight major surveys: the Survey of Income and Program Participation (SIPP); the Annual Social and Economic Supplement to the Current Population Survey (CPS); the American Community Survey (ACS); the Household Component of the Medical Expenditure Panel Survey (MEPS); the National Health Interview Survey (NHIS); the Medicare Current Beneficiary Survey Cost and Use files (MCBS); the Health and Retirement Study (HRS); and the Panel Study of Income Dynamics (PSID).

The assessment included both descriptive and empirical components. The descriptive component compiled extensive information on survey design and methodology in addition to the measurement of income and poverty and presented these data in a side-by-side format. The empirical component generated comparative tabulations of the distribution of income and poverty status for a range of personal characteristics for a common universe, income concept, and family definition, to the extent that this was feasible. Additional analysis focused on the implications of specific design choices.

This seminar will present key findings from the study. Findings from the descriptive analysis will include the treatment of armed forces members and students living away from home, survey timing and recall, and the sources of income captured. Empirical findings will include comparative estimates of aggregate income and its distribution by quintile; poverty status; earned versus unearned income; the proportion of income allocated because of nonresponse; and the frequency of rounding. Highlights of the methodological analyses will include the impact of the family definition on estimated poverty; the effect of the proximity of measured family composition to the income reference period; and the relationship between the interview month and the frequency of allocation.

Return to top
