
Washington Statistical Society Seminars: 2009

January, 2009
22
Thur.
(Bio) Statistics Seminar Series
Office of Biostatistics Research
Division of Prevention and Population Sciences
Challenges and Opportunities for the Statistics Profession and the American Statistical Association
27
Tues.
Vulnerability of Complementary Cell Suppression to Intruder Attack
February, 2009
5
Thur.
Using Administrative Data in Support of Policy Relevant Disability Research: Success of a Three-Way Partnership
13
Fri.
Georgetown University
Department of Biostatistics, Bioinformatics and Biomathematics
A Distribution for P-values
18
Wed.
Fully-Synthetic Data for Disclosure Control
27
Fri.
Georgetown University
Department of Biostatistics, Bioinformatics and Biomathematics
Statistical Input into Science Policy
March, 2009
5
Thur.
Administrative Data in Support of Policy Relevant Statistics: The Bureau of Labor Statistics Quarterly Census of Employment and Wages
6
Fri.
George Washington University
The Institute for Integrating Statistics in Decision Sciences
Combinatorial Patterns for Probabilistically Constrained Optimization Problems
10
Tues.
New Bootstrap Bias Corrections with Application to Estimation and Prediction MSE in Small Area Estimation
13
Fri.
Georgetown University
Department of Biostatistics, Bioinformatics and Biomathematics
A Bayesian Approach to Adjust for Diagnostic Misclassification in Poisson Regression
13
Fri.
George Washington University
The Institute for Integrating Statistics in Decision Sciences
Sequential Predictive Regressions and Optimal Portfolio Returns
27
Fri.
Georgetown University
Department of Biostatistics, Bioinformatics and Biomathematics
Improving the Efficiency of the Logrank Test Using Auxiliary Covariates
April, 2009
3
Fri.
George Mason University
CDS/CCDS/Statistics Colloquium Series
STATGRAPHICS: New Developments in Desktop and Online Statistical Analysis Software
3
Fri.
The George Washington University
Department of Statistics Seminar Series
Inferring Likelihoods and Climate System Characteristics From Climate Models and Multiple Tracers
8
Wed.
An LEHD Primer: An Innovative Use of Administrative Data for Policy Analysis
10
Fri.
The George Washington University
Department of Statistics Seminar Series
General Classes of Skewed Link Function for Binary Response Data
10
Fri.
George Mason University
CDS/CCDS/Statistics Colloquium Series
Probabilistic Aspects of Exploration Risk
17
Fri.
The George Washington University
Department of Statistics Seminar Series
Analysis of Cohort Studies with Multivariate, Partially Observed Disease Classification Data
17
Fri.
The George Washington University
The Institute for Integrating Statistics in Decision Sciences
The Impact of Supply Quality and Supplier Development on Contract Design
22
Wed.
Some History of, and Current Issues in, Seasonal Adjustment
28
Tues.
(Bio) Statistics Seminar Series
Office of Biostatistics Research
Division of Prevention and Population Sciences
Statistical Challenges in Genetics Studies of Mental Disorders
30
Thur.
Responsive Design for Random Digit Dial Surveys Using Auxiliary Survey Process Data and Contextual Data
May, 2009
20
Wed.
The Future of Telephone Surveys
21
Thur.
Underreporting of Transfers in Household Surveys: Its Nature and Consequences
27
Wed.
Building Effective, Exceptionally Fast Fellegi-Holt Edit/Imputation Systems
June, 2009
10
Wed.
When is the Verdict or Judgment Final?: An Examination of Post Trial Activity in Civil Litigation
24
Wed.
The Statistical Adventures of an Applied Sociologist: Reflections, Challenges, and Lessons Learned
July, 2009
23
Thur.
An Empirical Evaluation of Signal Extraction Goodness-of-fit Diagnostic Tests
September, 2009
4
Fri.
Calibration Alternatives to Poststratification for Doubly Classified Data
11
Fri.
Improving Differential Expression Analysis with the Consideration of Genome-Wide Co-Expression Information
22
Tues.
Why is Survey Research 20 Years Behind?
25
Fri.
Some Lessons from Our Collaborative Studies in Esophageal Cancer, Prostate Cancer, HIV, and Breast Cancer
25
Fri.
George Washington University
Department of Statistics
Filling the Gap: Introducing the Conway-Maxwell-Poisson regression for count data
October, 2009
8
Thur.
The Sociolinguistics of Survey Translation
8
Thur.
University of Maryland
Statistics Seminar
Using Longitudinal Surveys to Evaluate Interventions
9
Fri.
George Mason University
CDS/CCDS/Statistics Colloquium Series
The Zooniverse: Advancing Science through User-Guided Learning in Massive Data Streams
13
Tues.
19th Annual Morris Hansen Lecture
The Care, Feeding and Training of Survey Statisticians
19
Mon.
George Washington University
The Institute for Integrating Statistics in Decision Sciences
Attitudes Towards Firm and Competition: How do they Matter for CRM Activities?
20
Tues.
Racial Profiling Analysis
23
Fri.
George Washington University
Department of Statistics
Flexible Stepwise Regression: An Adaptive Partition Approach to the Detection of Multiple Change-Points
28
Wed.
Differences in the Academic Careers of Men and Women at Research Intensive Universities and at Critical Transitions
November, 2009
5
Thur.
(Bio) Statistics Seminar Series
Office of Biostatistics Research
Division of Prevention and Population Sciences
Pattern Analysis of Pairwise Relationship in Genetic Network
5
Thur.
University of Maryland
Statistics Seminar
A Class of Multivariate Distributions Related to Distributions with a Gaussian Component
10
Tues.
Panel on Address-Based Sampling
10
Tues.
ASA Survey Research Methods Section Webinar
Dual Frame Theory Applied to Landline and Cell Phone Surveys
13
Fri.
Computer-Intensive Statistical Methodology with Applications to Translational Cancer Research
13
Fri.
George Washington University
The Institute for Integrating Statistics in Decision Sciences
What Data Mining Teaches Me About Teaching Statistics
16
Mon.
George Washington University
Department of Statistics
Combined State and Parameter Estimation in General State-Space Models
19
Thur.
Empirical Likelihood Based Inference for Quantiles and Low Income Proportions in Selection Bias and Missing Data Problems
19
Thur.
George Washington University
Department of Statistics
Moment Determinacy of Distributions: Some Recent Results
December, 2009
1
Tues.
American University
Department of Mathematics and Statistics Colloquium
Perfect Simulation of Vervaat Perpetuities
4
Fri.
George Mason University
CDS/CCDS/Statistics Colloquium Series
Why I am Not a System Dynamicist
10
Thur.
(Bio) Statistics Seminar Series
Office of Biostatistics Research
Division of Prevention and Population Sciences
U-Estimation for Measurement Error Problems
16
Wed.
Geographic Information System (GIS) Data Collection and Storage
16
Wed.
University of Maryland
Statistics Seminar
Frailty Modeling via the Empirical Bayes Hastings Sampler
17
Thur.
Comparing the Census Bureau's Master Address File (MAF) with both Fresh Area Listing and Commercial Address Lists
18
Fri.
Probability of Detecting Disease-Associated SNPs in Genome-Wide Association Studies


Title: Challenges and Opportunities for the Statistics Profession and the American Statistical Association

  • Speaker: Ronald L. Wasserstein, Ph.D., Executive Director, American Statistical Association
  • Date/Time: Thursday, January 22, 2009 / 11am - noon
  • Location: Conference Room 9091, Two Rockledge Center, 6701 Rockledge Drive, Bethesda, MD 20892
  • Sponsor: Office of Biostatistics Research, Division of Prevention and Population Sciences, National Heart, Lung, and Blood Institute

Abstract:

From his perspective as ASA's Executive Director, Ron Wasserstein will discuss seven sets of challenges and opportunities he sees as particularly important for our profession and for the ASA. These include membership, the statistical pipeline, visibility and impact of the profession, publications, meetings, internationalization/globalization, and accreditation. Each set includes questions for the participants, so a large portion of the time will be spent in audience discussion.

Return to top

Title: Vulnerability of Complementary Cell Suppression to Intruder Attack

  • Chair: Philip Steel, U.S. Census Bureau
  • Speaker: Lawrence H. Cox, National Center for Health Statistics
  • Date/Time: Tuesday, January 27, 2009 / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics, Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsors: Confidentiality and Data Access Committee, an interest group of the Federal Committee on Statistical Methodology; Methodology Program, WSS

Abstract:

Complementary cell suppression was the first and remains a popular method for disclosure limitation of magnitude data such as economic census data. We show that, when not solved in a rigorous mathematical way, suppression can fail to protect data, sometimes fatally. When solved properly as a mathematical programming problem, suppression is guaranteed to meet certain conditions related to protecting individual data, but we demonstrate that other vulnerabilities exist. Suppression sacrifices both confidential and non-confidential data, forcing potentially significant degradation in data quality and usability. These effects are often compounded because mathematical relationships induced by suppression tend to produce "over-protected" solutions. To mitigate these effects, it has been suggested that the data releaser provide exact interval estimates of suppressed cell values. We demonstrate for two standard data sensitivity measures that, even when safe, exact intervals further threaten data security, in some situations completely.
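The vulnerability argument above can be made concrete with a toy table (all numbers hypothetical, not from the talk): when margins are published, a lone suppressed cell is recoverable by subtraction, and even complementary suppressions only leave the intruder a feasible interval rather than no information.

```python
# Toy illustration (not from the talk): why a single suppressed cell in a
# table with published margins offers no protection, and how an intruder can
# bound complementary suppressions. All numbers are hypothetical.

def recover_single(row_total, published_cells):
    """A lone suppressed cell is just the row total minus the published cells."""
    return row_total - sum(published_cells)

# Published row: total = 120, cells = [50, ?, 30] with the middle cell suppressed.
exact = recover_single(120, [50, 30])
print(exact)  # the "protected" value is recovered exactly: 40

# With complementary suppression the intruder can still derive feasible
# intervals: two suppressed cells x, y in one row with x + y = 70 and
# nonnegativity give x in [0, 70], and any external bound tightens it further.
def feasible_interval(pair_total, lower=0):
    return (lower, pair_total - lower)

print(feasible_interval(70))  # (0, 70)
```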

Return to top

Title: Using Administrative Data in Support of Policy Relevant Disability Research: Success of a Three-Way Partnership

  • Speakers:
    Richard Burkhauser, Sarah Gibson Blanding Professor of Policy Analysis, Cornell University
    Robert Weathers, Economist, Social Security Administration
  • Discussant: Jameela Akbari, Program Examiner, U.S. Office of Management and Budget
  • Chair: Shelly Wilkie Martinez
  • Date/Time: Thursday, February 5, 2009 / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center, Room 10. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Ave., NE. Take the Red Line to Union Station.
  • Sponsor: WSS Section on Public Policy
  • Presentation material:
    Slides from the presentation (pdf, ~6.8mb)

Abstract:

A three-way partnership between the Social Security Administration, the National Technical Institute for the Deaf of the Rochester Institute of Technology (NTID-RIT), and Cornell University resulted in the creation and ongoing analysis of a unique data set that links Social Security administrative records to the educational attainment records of deaf and hard-of-hearing applicants to NTID from 1965 to the present. Each of the partners entered this agreement to achieve both joint and individual institutional goals. The project's success lies in mutual cooperation that allowed each partner to achieve its institutional goals. The result is a data set containing the most detailed information on the earnings and SSA program history of the applicants of any US institution of higher education. It has been of great value to NTID in evaluating its program's success. Because these students all meet or exceed the medical listings for Social Security Disability Insurance and Supplemental Security Income benefits, these data have also provided SSA researchers and program administrators with a unique opportunity to trace the use of these programs. Cornell University researchers have used these data to complete externally funded research that looks at the factors that determine both educational success and SSA program outcomes. We provide a description of these data and key findings from our research using them.

Return to top

Topic: A Distribution for P-values

  • Speaker: Chang Yu, Ph.D., Vanderbilt University School of Medicine, Department of Biostatistics
  • Date/time: Friday, February 13, 2009 / 10:00 - 11:00 a.m.
  • Location: Georgetown University Medical Center, Lombardi Comprehensive Cancer Center, 3900 Reservoir Rd., NW, New Research Building, E501, Washington, DC 20007
  • Sponsor: Georgetown University, Department of Biostatistics, Bioinformatics and Biomathematics

Abstract:

What is the distribution of the p-value under the alternative hypothesis? We describe the properties of a parametric distribution defined on the interval (0,1). This distribution includes the uniform as a special case. The functional form is derived as the distribution of the p-value in a statistical test of a pair of close hypotheses in a wide variety of settings. The distributional form is retained when it is compounded with a uniform or when the individual p-values are sampled from a variety of different hypotheses. We describe properties of the parameter estimate and the distribution of extreme order statistics. The distribution is fitted to data from a study of breast cancer patients comparing many genetic markers. The p-values generated in a microarray experiment comparing gene expressions can be considered a mixture of p-values under the null hypothesis and under a range of alternative hypotheses. The proportion under the null is of interest. Using the derived distributions, we provide a method to estimate this proportion under the framework of mixture models.
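As a hedged numerical illustration of the mixture idea in the closing sentences: a beta-uniform mixture is one common parametric form for p-values under a mix of null and alternative hypotheses (the talk's distribution may differ), and the null proportion can be estimated by maximum likelihood.

```python
import math, random

random.seed(7)

# Hedged sketch of the mixture idea above: model p-values as a mixture of
# uniform (null) and Beta(a, 1), a < 1, which piles mass near zero
# (alternative). The beta-uniform mixture is one common parametric form;
# the talk's distribution may differ. Estimate the null proportion pi0 by ML.
a_true, pi0_true = 0.2, 0.7
pvals = [random.random() if random.random() < pi0_true
         else random.random() ** (1 / a_true) for _ in range(2000)]

def neg_loglik(pi0, a, ps):
    # mixture density: pi0 * 1 + (1 - pi0) * a * p**(a - 1)
    return -sum(math.log(pi0 + (1 - pi0) * a * p ** (a - 1)) for p in ps)

# Crude grid-search MLE over (pi0, a); a real analysis would use an optimizer.
grid = [(p0 / 20, al / 20) for p0 in range(1, 20) for al in range(1, 20)]
pi0_hat, a_hat = min(grid, key=lambda g: neg_loglik(g[0], g[1], pvals))
print(round(pi0_hat, 2))  # lands near the true null proportion of 0.7
```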

For information, please contact Caroline Wu at 202-687-4114 or ctw26@georgetown.edu.

Return to top

Title: Fully-Synthetic Data for Disclosure Control

  • Chair: TBA
  • Speaker: Mandi Yu, Office of Biostatistics, FDA
  • Discussant: Steve Cohen, National Science Foundation
  • Date/Time: Wednesday, February 18, 2009 / 12:30 - 2:00pm
  • Location: Bureau of Labor Statistics Conference Center, Room 10. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Ave., NE. Take the Red Line to Union Station.
  • Sponsor: Methodology Program, WSS

Abstract:

Recent advances in technology have dramatically increased the volume of data that statistical agencies can gather and disseminate. The improved accessibility translates into higher risks of identifying individuals from public microdata, and therefore increases the importance of developing statistical confidentiality control methods. A potentially useful technique is to alter certain elements without distorting the statistical information in the microdata. Rubin (1993) and Little (1993) proposed a multiple imputation approach to limit disclosure, in which multiply imputed, fully synthetic public-use data are released in place of the actual survey data.

This seminar presents findings from two methodological studies of the fully synthetic data approach. The first study develops semi-parametric models to construct fully imputed synthetic datasets for a large, complex longitudinal survey. The actual values of about one hundred variables of different types on more than 12,000 subjects are synthesized. In the second study, we extend this approach to cope with situations where small area statistics are of vital importance. Both theoretical and empirical findings are included. The fully synthesized data contain enough geographic detail to permit small area analyses, which would otherwise be impossible because such geographic identifiers are usually suppressed to control disclosure. We evaluated the information loss of synthetic data inferences on both descriptive and analytic statistics. In the second study, information loss was also assessed for statistics at the sub-national level.
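A minimal sketch of the release mechanism described above (a toy stand-in for the studies' semi-parametric models; the normal model, sample sizes, and numbers are hypothetical): fit a model to the confidential data, draw several independent synthetic copies from it, and let analysts combine estimates across copies.

```python
import random, statistics

random.seed(3)

# Toy sketch (not the studies' actual models): fully synthetic release of one
# numeric variable. Fit a simple model to the confidential data, then draw
# m independent synthetic copies from the fitted model.
confidential = [random.gauss(50, 10) for _ in range(500)]
mu, sd = statistics.mean(confidential), statistics.stdev(confidential)

m = 5
synthetic = [[random.gauss(mu, sd) for _ in range(500)] for _ in range(m)]

# A user of the released files averages the per-copy estimates.
est = statistics.mean(statistics.mean(copy) for copy in synthetic)
print(abs(est - mu) < 2)  # True: the synthetic estimate tracks the confidential mean
```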

Return to top

Topic: Statistical Input into Science Policy

  • Speaker: Mary A. Foulkes, Ph.D., George Washington University, Department of Epidemiology and Biostatistics
  • Date/time: Friday, February 27, 2009 / 10:00 - 11:00 a.m.
  • Location: Georgetown University Medical Center, Lombardi Comprehensive Cancer Center, 3900 Reservoir Rd., NW, New Research Building, E501, Washington, DC 20007
  • Sponsor: Georgetown University, Department of Biostatistics, Bioinformatics and Biomathematics

Abstract:

A new emphasis on evidence-based policy presents unprecedented opportunities for statistical input and for statisticians to contribute to new efforts. There are, however, numerous substantive statistical contributions to policy from the past, which will be reviewed. Public health issues, for example in infectious diseases, have raised challenges and questions that statistical modeling, experimental design, and novel analyses have addressed. Many new directions in science, such as genomics, and new capabilities, such as high-throughput computing, require quantitative approaches often provided by bioinformatics, economics, or other disciplines, but may miss some essential statistical thinking. Science policy is at a tipping point where statistical thinking will become a necessary component and where communication of statistical issues will become an even more essential aspect of the discipline of statistics. The science of science policy, a new initiative of the Federal government and particularly of NSF, will be reviewed.

For information, please contact Caroline Wu at 202-687-4114 or ctw26@georgetown.edu

Return to top

Title: Administrative Data in Support of Policy Relevant Statistics: The Bureau of Labor Statistics Quarterly Census of Employment and Wages

  • Speakers: Richard L. Clayton and James R. Spletzer, Bureau of Labor Statistics
  • Chair: Shelly Wilkie Martinez, Office of Management and Budget
  • Date/Time: Thursday, March 5, 2009 / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center, Room 10. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Ave., NE. Take the Red Line to Union Station.
  • Sponsor: WSS Section on Public Policy
  • Presentation material:
    Slides from the presentation (pdf, ~1.1mb)

Abstract:

The Quarterly Census of Employment and Wages (QCEW) program provides national, State, MSA, and county data on monthly employment, quarterly total wages, and the number of establishments, by 6-digit NAICS code. These data originate from the administrative records of the Unemployment Insurance system in each State, augmented by two supplemental surveys, the Annual Refiling Survey and the Multiple Worksite Report, that are necessary to yield accurate data at the local level. In the second quarter of 2008, the QCEW statistics show an employment level of 136.6 million, with 9.1 million establishments in the U.S. economy. The QCEW data also are the basis for the BLS Business Employment Dynamics (BED) series, which are created by longitudinally linking the QCEW microdata. The linkage process tracks net employment changes at the establishment level, which allows for a decomposition of net employment growth into the jobs gained at opening and expanding establishments (gross job gains) and the jobs lost at closing and contracting establishments (gross job losses). In this WSS seminar, we will emphasize the "foundations" of the QCEW program and describe how we transform 'raw' administrative UI records into something statistically useful, and we will also discuss the exciting new data products and research opportunities that the data present.
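The gross job gains/losses decomposition described above can be sketched on hypothetical linked microdata (the establishment names and employment counts below are made up for illustration):

```python
# Hypothetical micro-example of the BED decomposition: link establishment-level
# employment across two quarters, then split the net change into gross job
# gains and gross job losses. All names and numbers are invented.

q1 = {"est_a": 100, "est_b": 50, "est_c": 20}   # employment, quarter 1
q2 = {"est_a": 110, "est_b": 40, "est_d": 15}   # est_c closed, est_d opened

def gross_flows(before, after):
    gains = losses = 0
    for est in before.keys() | after.keys():
        # openings and closings count their full employment as a gain or loss
        change = after.get(est, 0) - before.get(est, 0)
        if change > 0:
            gains += change
        else:
            losses -= change
    return gains, losses

gains, losses = gross_flows(q1, q2)
print(gains, losses, gains - losses)  # 25 30 -5  (net change = gains - losses)
```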

Return to top

Title: Combinatorial Patterns for Probabilistically Constrained Optimization Problems

  • Speaker: Miguel Lejeune, Department of Decision Sciences, The George Washington University
  • Time: Friday, March 6th, 11:00-12:00 noon
  • Place: The George Washington University, Duques 453 (2201 G Street, NW)
  • Sponsor: The George Washington University, The Institute for Integrating Statistics in Decision Sciences

Abstract:

We propose a new framework for the solution of probabilistically constrained optimization problems by extending some recent developments in combinatorial pattern theory. The method involves the binarization of the probability distribution and the generation of a consistent partially defined Boolean function (pdBf) representing the combination (F,p) of the binarized probability distribution F and the enforced probability level p. We represent the pdBf representing (F,p) as a disjunctive normal form taking the form of a collection of combinatorial patterns. We propose a new integer programming-based method for the derivation of combinatorial patterns and present several methods allowing for the construction of a disjunctive normal form that defines necessary and sufficient conditions for the probabilistic constraint to hold. The obtained disjunctive forms are then used to generate deterministic reformulations of the original stochastic problem. The method is implemented for the solution of a numerical problem. Extensions to the present study are discussed.

Return to top

Title: New Bootstrap Bias Corrections with Application to Estimation and Prediction MSE in Small Area Estimation

  • Chair: TBA
  • Speaker: Danny Pfeffermann, Hebrew University and University of Southampton
  • Date/Time: Tuesday, March 10, 2009 / 12:30 - 2:00pm
  • Location: Bureau of Labor Statistics Conference Center, Room 10. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Ave., NE. Take the Red Line to Union Station.
  • Sponsor: Methodology Program, WSS

Abstract:

Classical bootstrap bias corrections estimate the bias of an estimator computed from the original sample by the bias of the bootstrap-sample estimators in estimating the original estimate, and then correct the original estimator accordingly. The use of these corrections has two important limitations. First, it implicitly assumes that the bias is independent of the true parameter value, whereas in practice the bias may be a more complicated function of the true parameter value. Second, when there are multiple parameters, the bias of the estimator of any one of them may depend on the bias of the estimators of the other parameters. This possibility is not accommodated by the classical corrections.

In this presentation I propose a new bootstrap bias correction procedure that aims to overcome the above two limitations. The procedure attempts to model the bias of the original estimator of any given parameter as a function of the original estimators and the corresponding bootstrap estimators of all the parameters. An application of the procedure for estimating the prediction MSE when estimating small area proportions under the unit-level mixed logistic model will be illustrated and compared to other methods proposed in the literature to deal with this problem.
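For reference, the classical correction whose limitations are discussed above can be sketched in a few lines (a generic textbook example using the downward-biased variance MLE, not the small-area setting of the talk):

```python
import random, statistics

random.seed(1)

# Classical bootstrap bias correction: estimate the bias of theta_hat by the
# average bias of the bootstrap replicates, then subtract it. Example: the
# MLE of the variance (divisor n) is biased downward.

def var_mle(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

sample = [random.gauss(0, 1) for _ in range(30)]
theta_hat = var_mle(sample)

B = 2000
boot = [var_mle(random.choices(sample, k=len(sample))) for _ in range(B)]
bias_hat = statistics.mean(boot) - theta_hat   # bootstrap estimate of the bias
theta_corrected = theta_hat - bias_hat         # the classical correction

print(theta_corrected > theta_hat)  # True: the correction pushes the MLE upward
```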

Return to top

Topic: A Bayesian Approach to Adjust for Diagnostic Misclassification in Poisson Regression

  • Speaker: James Stamey, PhD, Associate Professor, Department of Statistical Sciences, Baylor University, Texas
  • Date/time: Friday, March 13, 2009 / 10:00 - 11:00 a.m.
  • Location: Georgetown University Medical Center, Lombardi Comprehensive Cancer Center, 3900 Reservoir Rd., NW, New Research Building, E501, Washington, DC 20007
  • Sponsor: Georgetown University, Department of Biostatistics, Bioinformatics and Biomathematics

Abstract:

Response misclassification of count data biases parameter estimators in Poisson regression models and understates their uncertainty. To correct these problems, classical procedures have been proposed, but they rely on asymptotic distribution results and on supplemental validation data in order to estimate unknown misclassification parameters. We derive a new Bayesian Poisson regression procedure that accounts for and corrects misclassification of a count variable. Under the Bayesian paradigm one may use validation data, expert opinion, or a combination of the two to correct for the consequences of misclassification. The Bayesian procedure proposed here yields an operationally effective way to correct and account for misclassification effects in Poisson count regression models. We also investigate a Bayesian variable selection procedure. We demonstrate the performance of the model and variable selection procedure in simulation studies. Additionally, we analyze two real data examples and compare our new Bayesian inference method that adjusts for misclassification to a similar analysis ignoring misclassification.

For information, please contact Caroline Wu at 202-687-4114 or ctw26@georgetown.edu.

Return to top

Title: Sequential Predictive Regressions and Optimal Portfolio Returns

  • Speaker: Nicholas Polson, Professor of Econometrics and Statistics, The University of Chicago Graduate School of Business
  • Time: Friday, March 13th, 11:00-12:00 noon (Coffee and Refreshments at 10:45 am)
  • Place: The George Washington University, Duques Hall 553 (2201 G Street, NW)
  • Sponsor: The George Washington University, The Institute for Integrating Statistics in Decision Sciences

Abstract:

This paper analyzes sequential learning in the context of predictive regression models. To do this, we develop new particle-based methods for sequential learning about parameters, state variables, hypotheses, and models. This sequential perspective allows us to quantify how investors' views about predictability and models vary over time, and naturally mimics the learning problem encountered in practice. We consider learning about predictability using dividend/payout data and models that incorporate drifting coefficients and stochastic volatility. We analyze the time-variation of parameter estimates and model probabilities, using both the traditional cash dividends measure and a measure taking into account share repurchases and issuances. We also analyze the economic benefits of using these models by considering optimal portfolio allocation problems.
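The flavor of particle-based sequential learning can be conveyed with a toy sketch (a naive stand-in for the methods the talk develops, not the paper's algorithm): maintain a cloud of parameter draws, reweight by the likelihood of each new observation, and resample.

```python
import math, random, statistics

random.seed(11)

# Toy particle sketch of sequential Bayesian learning: learn an unknown mean
# mu from y_t ~ N(mu, 1), reweighting and resampling a particle cloud as each
# observation arrives. Not the paper's algorithm; purely illustrative.
true_mu = 1.5
data = [random.gauss(true_mu, 1) for _ in range(200)]

particles = [random.gauss(0, 3) for _ in range(1000)]   # draws from the prior
for y in data:
    # importance weights: likelihood of the new observation at each particle
    weights = [math.exp(-0.5 * (y - p) ** 2) for p in particles]
    # multinomial resampling; a serious implementation would also rejuvenate
    # the particles to avoid the degeneracy this naive scheme suffers from
    particles = random.choices(particles, weights=weights, k=len(particles))

print(round(statistics.mean(particles), 1))  # close to the true mean of 1.5
```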

Return to top

Topic: Improving the Efficiency of the Logrank Test Using Auxiliary Covariates

  • Speaker: Chang Yu, Ph.D., Vanderbilt University School of Medicine, Department of Biostatistics
  • Date/time: Friday, March 27, 2009 / 10:00 - 11:00 a.m.
  • Location: Georgetown University Medical Center, Lombardi Comprehensive Cancer Center, 3900 Reservoir Rd., NW, New Research Building, E501, Washington, DC 20007
  • Sponsor: Georgetown University, Department of Biostatistics, Bioinformatics and Biomathematics

Abstract:

The logrank test is widely used in many clinical trials for comparing the survival distribution between two treatments with censored survival data. Under the assumption of proportional hazards, it is optimal for testing the null hypothesis of H0: β = 0, where β denotes the logarithm of the hazard ratio. In practice, additional auxiliary covariates are collected together with the survival times and treatment assignment. If the covariates correlate with survival times, making use of their information will increase the efficiency of the logrank test. We apply the theory of semiparametrics to characterize a class of regular and asymptotically linear estimators for β when auxiliary covariates are incorporated into the model, and derive estimators that are more efficient. The Wald tests induced by these estimators are shown to be more powerful than the logrank test. Simulation studies and real data from ACTG 175 are used to illustrate the gains in efficiency.
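For concreteness, the baseline logrank statistic that the talk improves upon can be computed from its standard observed-minus-expected form (a bare sketch without censoring; the covariate-augmented estimators are beyond this example):

```python
import math

# Compact two-sample logrank Z statistic, written from the standard
# observed-minus-expected formulation. No censoring is handled here, purely
# for illustration; the real test also accommodates censored times.

def logrank_z(times1, times2):
    o_minus_e = 0.0
    var = 0.0
    for t in sorted(set(times1) | set(times2)):
        n1 = sum(1 for x in times1 if x >= t)   # at risk in group 1
        n2 = sum(1 for x in times2 if x >= t)   # at risk in group 2
        d1 = times1.count(t)                    # events at t in group 1
        d = d1 + times2.count(t)                # total events at t
        n = n1 + n2
        o_minus_e += d1 - d * n1 / n            # observed minus expected
        if n > 1:
            var += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    return o_minus_e / math.sqrt(var)

# Identical groups: no evidence against equal survival curves.
print(abs(logrank_z([1, 2, 3], [1, 2, 3])) < 1e-9)   # True
# Clearly separated survival times: a large positive statistic.
print(logrank_z([1, 2, 3], [10, 11, 12]) > 2)        # True
```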

For information, please contact Caroline Wu at 202-687-4114 or ctw26@georgetown.edu.

Return to top

Title: STATGRAPHICS: New Developments in Desktop and Online Statistical Analysis Software

  • Speaker:
    Dr. Neil Polhemus, Chief Technology Officer
    StatPoint Technologies, Inc.
  • Time: 10:30 a.m. Refreshments, 10:45 a.m. Colloquium Talk
  • Date: April 3, 2009
  • Location:
    Department of Computational and Data Sciences George Mason University
    Research 1, Room 301, Fairfax Campus
    George Mason University, 4400 University Drive, Fairfax, VA 22030
  • Sponsor: George Mason University CDS/CCDS/Statistics Colloquium

Abstract:

This talk will discuss some of the new developments currently underway at StatPoint Technologies for incorporation in the STATGRAPHICS statistical analysis system. Some of the topics to be covered include: (a) accessing statistical software as a web service over the Internet; (b) sharing XML scripts between desktop and online versions; (c) a new wizard for design of experiments; (d) interactive methods for exploring response surfaces; (e) automatic alert mechanisms for unusual observations; (f) incorporation of statistical advice in the software for non-statisticians. The talk will include demonstrations of STATGRAPHICS Online and the upcoming release of STATGRAPHICS Centurion Version 16.

Biography:

Dr. Polhemus is Chief Technology Officer for StatPoint Technologies, Inc, located in Warrenton, Virginia. He directs the development of the STATGRAPHICS statistical software products. Neil received his B.S.E. and Ph.D. degrees from the School of Engineering and Applied Science at Princeton University, under the tutelage of Dr. J. Stuart Hunter. Dr. Polhemus spent two years as an assistant professor in the Graduate School of Business Administration at the University of North Carolina at Chapel Hill and six years as an assistant professor in the Engineering School at Princeton University. Dr. Polhemus founded Statistical Graphics Corporation in 1980 to develop and promote STATGRAPHICS. Since then, he has created various statistical software products including Execustat, Statlets, StatBeans, and STATGRAPHICS .Net Web Services.

Return to top

Title: Inferring Likelihoods and Climate System Characteristics From Climate Models and Multiple Tracers

  • Speaker: Murali Haran, Department of Statistics, Penn State University
  • Date/Time: Friday, April 3, 2009, 11-12pm
  • Location: Monroe Hall, Room 113 (2115 G Street, NW, Washington, DC 20052)
  • Sponsor: The George Washington University, Department of Statistics

Abstract:

To understand the current state of the climate system and to predict its future behavior, it is critical to have good estimates of key climate system parameters. Since these climate parameters are very difficult to measure directly, we have to infer their values based on two sources of information: spatial data on 'tracers' that indirectly provide information about these parameters, and output from complex climate computer models run at several climate parameter settings. These climate models are computationally expensive and can take weeks or months to run at each setting. I will discuss an inferential approach that uses Gaussian processes to emulate the climate models, thereby establishing a connection between the climate parameters and the multiple tracers. Using a spatial model, it is then possible to carry out statistical inference for the climate parameters while accounting for various sources of variability and dependence. I will describe how our methods propose to address a few of the many challenges involved in this research, including computational obstacles posed by the size of the data and the need to simultaneously model potentially non-linear relationships between tracers while accounting for spatial dependence in the observations.

This is joint work with K. S. Bhat (Statistics, Penn State) and R. Tonkonojenkov and K. Keller (Geosciences, Penn State).

Return to top

Title: An LEHD Primer: An Innovative Use of Administrative Data for Policy Analysis

  • Speaker: Julia Lane, National Science Foundation.
  • Discussants:
    Ron Jarmin, Census Bureau
    Nicholas Greenia, Internal Revenue Service
  • Chair: Michael L. Cohen, Committee on National Statistics
  • Date/Time: Wednesday, April 8, 2009 / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center, Room 1. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Ave., NE. Take the Red Line to Union Station.
  • Sponsor: WSS Section on Public Policy

Abstract:

The talk will provide a general overview of the challenges faced with the establishment of the Longitudinal Employer-Household Dynamics (LEHD) program. It will describe the initial goals of the project and how they evolved in response to pragmatic challenges. It will also describe the difficulties faced in obtaining and merging the input data sets, various disclosure avoidance issues raised and the techniques used to address them, as well as other relevant methodological issues faced at the outset. The speaker will also discuss the realized and potential value of various areas of application, including the development of indicators of employment and earnings dynamics, mapping applications, and research in a variety of policy areas. (This is one of several talks in a series that the WSS Section on Public Policy has been presenting this year on the uses of state and local administrative records to inform public policy issues.)

Return to top

Title: General Classes of Skewed Link Function for Binary Response Data

  • Speaker: Dipak K. Dey, Department of Statistics, University of Connecticut
  • Date/Time: Friday, April 10, 2009, 11:00 a.m. - 12:00 p.m.
  • Location: Monroe Hall, Room 113 (2115 G Street, NW, Washington, DC 20052)
  • Sponsor: The George Washington University, Department of Statistics

Abstract:

The choice of link function is one of the most critical issues in modeling binary data, since a misspecified link can yield substantial bias in the mean response estimates. The objective of this study is to introduce a flexible skewed link function for modeling categorical data. The commonly used complementary log-log (Cloglog) link is prone to link misspecification because of its positive and fixed skewness. We propose a new link function based on the generalized extreme value (GEV) distribution. The GEV link accommodates a very wide range of skewness, determined solely by its shape parameter. Using Bayesian methodology, we can automatically detect the skewness in the data as part of model fitting with the GEV link. Various theoretical properties are examined and explored in detail. We compare the logit, probit, Cloglog, and GEV links under different scenarios. The possibility of applying this link to large p, small n cases is also discussed. The deviance information criterion is used to guide model selection when comparing different links. The results are further extended to incorporate spatial structure. The methodologies are exemplified through a bank transaction data set and a species abundance data set with spatial variation.

This is joint work with Xia Wang.
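As a rough illustration of such a link, the success probability can be written as 1 - F(-eta), where F is the standard GEV cdf and the shape parameter xi controls the skewness; xi tending to zero gives the Gumbel cdf and hence a Cloglog-type link. The sketch below is an assumed formulation for illustration, not the authors' code:

```python
# Hypothetical GEV-link sketch (location 0, scale 1 assumed).
import math

def gev_cdf(z, xi):
    """GEV cumulative distribution function with shape parameter xi."""
    if abs(xi) < 1e-10:                  # Gumbel limit as xi -> 0
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:                         # outside the support
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def gev_link_prob(eta, xi):
    """P(Y = 1) under the GEV link for linear predictor eta."""
    return 1.0 - gev_cdf(-eta, xi)
```

At xi = 0 this reduces to the familiar fixed-skewness Cloglog-type link; varying xi is what gives the family its flexibility.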

Return to top

Title: Probabilistic Aspects of Exploration Risk

  • Speaker: Nozer Singpurwalla, Department of Statistics, The George Washington University
  • Date/Time: Friday, April 10, 2009, 10:30 a.m. Refreshments, 10:45 a.m. Colloquium Talk
  • Location: Research 1, Room 301, Fairfax Campus, George Mason University, 4400 University Drive, Fairfax, VA 22030
  • Sponsor: George Mason University CDS/CCDS/Statistics Colloquium

Abstract:

The extraction of natural resources from the earth, such as oil, natural gas, coal, diamonds, and minerals, brings into the fray several probabilistic modelling and statistical inference issues. Such issues have attracted the attention of notables like Kolmogorov, Halmos, and Barndorff-Nielsen. Kolmogorov and Halmos leaned on limit theorems to propose the lognormal distribution, whereas Barndorff-Nielsen advocated the log hyperbolic distribution purely based on empirical data. The log hyperbolic distribution has fat tails and could be a suitable model for financial data as well. None of these authors brought Bayesian ideas into the picture, which I think have a key role to play here, especially because prospecting for oil and gas entails the judgements of wildcatters.

In this talk I will give a broad-brush overview of the topic and inject some Bayesian thoughts into the arena of exploration risk.

Return to top

Title: Analysis of Cohort Studies with Multivariate, Partially Observed Disease Classification Data

  • Speaker: Nilanjan Chatterjee, Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health
  • Date/Time: Friday, April 17, 2009, 11:00 a.m. - 12:00 p.m.
  • Location: Monroe Hall, Room 113 (2115 G Street, NW, Washington, DC 20052)
  • Sponsor: The George Washington University, Department of Statistics

Abstract:

Complex diseases, like cancer, can often be classified into subtypes using various pathological and molecular traits of the disease. In this talk, we develop methods for the analysis of disease incidence in cohort studies incorporating data on multiple disease traits, using a two-stage semi-parametric Cox proportional hazards regression model that allows one to examine heterogeneity in the effect of the covariates by the levels of the different disease traits. For inference in the presence of missing disease traits, we propose a generalization of an estimating-equation (EE) approach for handling missing cause of failure in competing-risk data. We prove asymptotic unbiasedness of the EE method under a general missing-at-random (MAR) assumption and propose a novel influence-function-based sandwich variance estimator. The methods are illustrated using a simulation study and a real data application involving the Cancer Prevention Study II (CPS-II) nutrition cohort.

Return to top

Title: The Impact of Supply Quality and Supplier Development on Contract Design

  • Speaker: Sila Cetinkaya, Department of Industrial and Systems Engineering, Texas A&M
  • Date/Time: Friday, April 17, 2009, 11:00 a.m. - 12:00 p.m.
  • Location: Duques Hall 453 (2201 G Street, NW, Washington, DC 20052)
  • Sponsor: The George Washington University, The Institute for Integrating Statistics in Decision Sciences

Abstract:

In this talk, we examine two key issues in supply management: supply quality and supplier development. To this end, we consider a supplier-buyer pair and develop analytical models for designing optimal buyer-initiated supply contracts with lot sizing, supply quality, and supplier development considerations while modeling private information and individual incentives explicitly. We study two distinct contractual settings. First, we concentrate on the case where there is no supplier development, and, hence, no supply quality improvement effort. In this case, we show that the contractual lot size is larger than the channel optimum when the buyer has incomplete information about the supplier's quality level. For the special case where the buyer's prior distribution of supplier's quality level is uniform, we prove that immediate contracting is in fact more efficient for the channel than not having a contract in the long run as long as the buyer's prior expected quality level is sufficiently high, i.e., more than 75%, or the buyer's estimation of the quality level is unbiased. However, for general prior distributions of the supplier's quality level, immediate contracting may not be effective for the channel depending on the characteristics of the hazard rate function of the prior distribution. Since the efficiency of the contract depends on the buyer's prior distribution of the supplier's quality level, we also present a dynamic programming model for the buyer to determine when to offer a contract to the supplier under information updating. Next, we concentrate on the case where the buyer seeks a quality improvement initiative under a supplier development program but she has incomplete information about the supplier's quality investment sensitivity. We show that the buyer will request a lower level of quality improvement than in the full information case. 
Also, in this case, we demonstrate that buyer-initiated contracting under asymmetric information is always worthwhile; however, contracting may not lead to quality improvement. In particular, depending on the characteristics of the reverse hazard rate function of the buyer's estimation of the supplier's investment sensitivity, the investment decision may not be made. As a result, information asymmetry may undermine the buyer's interest in initiating a supplier development program in practice.

Return to top

Title: Some History of, and Current Issues in, Seasonal Adjustment

  • Speaker: William R. Bell, U.S. Census Bureau
    2008 co-winner of the Julius Shiskin Memorial Award for Economic Statistics
  • Chair: Stuart Scott, U.S. Bureau of Labor Statistics
  • Date/Time: Wednesday, April 22, 2009 / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics, Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Methodology Program, WSS

Abstract:

The talk will be in two parts. The first part will review the history of seasonal adjustment, tracing its development from initial efforts starting around 1920, through the development of the X-11 program (Shiskin, Young, and Musgrave 1967), and continuing on to the development of model-based seasonal adjustment. Some comparisons with the history of seasonal time series modeling will be made. The second part of the talk will discuss some current technical issues in seasonal adjustment. This will not be an attempt at a comprehensive review of current issues, but rather will focus on some issues on which the speaker has done some work. Issues expected to be discussed (time permitting) include seasonal adjustment with sampling error, seasonal adjustment variances, ARIMA versus ARIMA component time series models, time-varying trading-day effects, and comparing X-12 and model-based seasonal adjustment filters. In discussing these issues the emphasis will generally be on presenting results from applications, not on technical derivations related to the models used.

Return to top

Topic: A Bias Correction in Testing Treatment Efficacy under Informative Dropout in Clinical Trials

  • Speaker: Dr. Fanhui Kong, U.S. Food and Drug Administration
  • Date/Time: Friday, April 24, 2009, 3:15 pm
  • Location: St. Mary's Hall 326, Reservoir Road, NW, between 37th and 38th Streets, NW. Building #16 on Campus Map at http://maps.georgetown.edu/index.cfm?Action=View&MapID=3
  • Sponsor: Georgetown University, Department of Mathematics

Abstract:

In clinical trials of drug development, patients are often followed for a certain period of time, and the outcome variables are measured at scheduled time intervals. The main interest of the trial is the treatment efficacy at a prespecified time point, which is often the last visit. In such trials, patient dropout is often the major source of missing data. With possibly informative patient dropout, the missing information often causes biases in the inference of treatment efficacy. In this paper, for a time-saturated treatment effect model and an informative dropout scheme that depends on the unobserved outcomes only through the random coefficients, we propose a grouping method to correct the biases in the estimation of treatment effect. The asymptotic variance estimator is also obtained for statistical inference. In a simulation study, we compare the new method with the traditional methods of the observed-case (OC) analysis, the last-observation-carried-forward (LOCF) analysis, and the mixed-model-repeated-measurement (MMRM) approach, and find that it improves on the current methods and gives more stable results in treatment efficacy inferences.

Refreshments will be served after the talk.

Return to top

Title: Statistical Challenges in Genetics Studies of Mental Disorders

  • Speaker: Heping Zhang, Ph.D., Professor of Biostatistics, Department of Biostatistics, Yale University School of Medicine
  • Date: Tuesday, April 28th, 2009
  • Time: 3pm-4pm
  • Location: Conference room 9091, Two Rockledge Center
  • Sponsor: Office of Biostatistics Research, Division of Prevention and Population Sciences, National Heart, Lung, and Blood Institute

Abstract:

It has been a century since early preliminary reports suggested heredity in some psychiatric disorders such as insanity. Decades have passed since the modes and levels of inheritance were documented for a number of psychiatric and behavioral disorders such as Tourette's Syndrome and nicotine dependence. Despite recent landmark successes that led to discoveries of genetic variants for several complex diseases, the hunt for genes underlying mental disorders remains largely elusive. In addition to political challenges, there are also major clinical and analytical challenges. Mental disorders are difficult to characterize both phenotypically and genetically. Beyond the challenges that are common for complex diseases such as cancer and age-related macular degeneration, there are great intrapersonal variations and uncertainties, particularly over time. The diagnoses of mental disorders generally depend on instruments that include many descriptive questions, and comorbidity is common. I will present some of the joint work conducted by my group in recent years that is motivated by the needs arising from studying mental disorders. For example, we have developed methodology and software to analyze ordinal traits and multiple traits commonly encountered in mental health research. The potential of these methods has been demonstrated through simulation as well as genetic analyses of several mental disorders such as hoarding, nicotine dependence, and alcohol dependence.

Return to top

Title: Responsive Design for Random Digit Dial Surveys Using Auxiliary Survey Process Data and Contextual Data

  • Speaker: Sunghee Lee, Department of Biostatistics, UCLA School of Public Health
  • Chair: TBA
  • Date/Time: Thursday, April 30, 2009 / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics, Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Methodology Program, WSS & D.C. AAPOR

Abstract:

This study examines a potential framework for a random digit dial (RDD) telephone survey design that responds to findings from nonresponse bias studies. To overcome the absence of data for nonrespondents, the major challenge in studying nonresponse bias for RDD surveys, this study uses two types of data that are available regardless of response status. The first set of data, termed paradata, comes from the survey process. Paradata record the history of all calls made to each sampled number (e.g., number of calls placed, calling dates and times) and indicate which survey design features were used for each number (e.g., advance letter, monetary incentives, refusal conversion). The second type is what is known as contextual or ecological data. These come from external sources (e.g., decennial census data) and are prepared by linking the geographic identifier (e.g., address, census tract, ZIP code) of all sampled telephone numbers to the external data available at the corresponding geographic level. They include various characteristics of the corresponding geography, such as demographics and socio-economics, which are assumed to approximate the characteristics of individuals residing in the geography.

As the literature indicates, this study is premised on the stochastic nature of survey response behavior, in which the participation decision is influenced simultaneously by the traits of the sample, the survey features, the situational circumstances, and the perceived importance of these factors. Therefore, this study models the response behavior with variables in the paradata from the California Health Interview Survey, contextual data mostly from Census SF-1, and interactions among these variables. Multilevel models using the two types of data are tested to predict how response behaviors change given hypothetical design features. The model can be applied to any sample with the same set of variables, so the response behavior of a new sample can be predicted before fielding the survey. Based on the predicted response behavior, the design features may be tailored for each case so as to maximize positive response behavior for fixed costs.

The major element of this study is that the design tailoring will be done not only to increase response rates but also to decrease potential nonresponse bias. This will be done by using a bias indicator, such as a variable highly associated with key survey variables and available regardless of response status. By modeling the bias indicator similar to the response behavior as described above, the expected value of the indicator can be estimated for the new sample before conducting the survey. By applying the expected response status, the estimate of the chosen variable will be calculated for respondents and nonrespondents separately. Comparisons of these estimates will indicate the magnitude of nonresponse bias.

Return to top

Title: The Future of Telephone Surveys

  • Panelists:
    Clyde Tucker, Senior Survey Methodologist, Bureau of Labor Statistics
    Scott Keeter, Director of Survey Research, Pew Research Center
    Karol Krotki, Senior Research Statistician, RTI International
  • Chair: Carol Joyce Blumberg, Mathematical Statistician, Energy Information Administration
  • Date/Time: Wednesday, May 20, 2009 / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center, Room 8. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Take the Red Line to Union Station.
  • Sponsors: WSS Data Collection Methods and DC-AAPOR
  • Presentation material:
    Telephone Surveys: Challenges for the Future (Tucker, pdf, ~300kb)
    A Revolution in Survey Research (Krotki, pdf, ~676kb)
    The Future of Telephone Surveys: Evidence From the World of Political Polling in 2008 (Keeter, pdf, ~1mb)

Abstract:

We are witnessing what could be termed a revolution in survey research. Trends such as increased use of cell phones and other wireless devices, decreasing cooperation rates, and pervasive presence of the Internet are motivating survey researchers to rethink data collection methodologies. New tools that are being used more frequently are multi-frame sampling, multi-mode interviewing, Internet panels, use of various wireless devices, and the reliance on large extant data bases for profile and screening information. This panel will focus on the implications of these new tools for the future of telephone surveys. Each panelist will give ten minutes of introductory comments. This will be followed by five minutes of comments by each of the panelists on the remarks of the other two panelists. The remainder of the time will be reserved for audience questions and discussion.

For further information contact Carol Joyce Blumberg at carol.blumberg@eia.doe.gov or (202) 586-6565.

Return to top

Title: Underreporting of Transfers in Household Surveys: Its Nature and Consequences

  • Speaker: Bruce Meyer, The Harris School, University of Chicago
  • Discussant: Charles Pierret, Bureau of Labor Statistics
  • Chair: Shelly Martinez, Office of Management and Budget
  • Date/Time: Thursday, May 21, 2009 / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Ave., NE. Take the Red Line to Union Station.
  • Sponsor: WSS Section on Public Policy

Abstract:

In recent years, roughly half of the dollars received through Food Stamps, Temporary Assistance for Needy Families (TANF) and Workers' Compensation have not been reported in the Current Population Survey (CPS). High rates of understatement are found also for many other government transfer programs and in other datasets that are commonly used to analyze income distributions and transfer receipt. Thus, this understatement has major implications for our understanding of the economic circumstances of the population and the working of government programs. We provide estimates of the extent of transfer under-reporting for ten of the main transfer programs and five major nationally representative household surveys. We obtain estimates of under-reporting by comparing weighted totals reported by households for these programs with those obtained from government agencies. We also examine imputation procedures and the share of reported benefits that are imputed. Our results show increases in under-reporting and imputation over time and sharp differences across programs and surveys. These differences shed light on the reasons for under-reporting and are informative on the success of different survey methods. Our estimates provide evidence on the extent of bias in existing studies of program effects and program take-up and suggest possible corrections.

Return to top

Title: Building Effective, Exceptionally Fast Fellegi-Holt Edit/Imputation Systems

  • Chair: Arthur Kennickell, Board of Governors of the Federal Reserve System
  • Speaker: William E. Winkler, U.S. Census Bureau
  • Date/Time: Wednesday, May 27, 2009 / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics, Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Methodology Program, WSS

Abstract:

A holy grail of edit/imputation research has been to create generalized modeling and production software, based on the model of Fellegi and Holt (1976), that can be used with at most minor modification for a large set of different demographic surveys. There are distinct advantages to FH systems: (1) all edits are contained in easily modified tables, (2) the logical consistency of the system can be checked prior to the receipt of data, and (3) the optimization routines that determine the minimum number of fields to impute so that a record satisfies the edits never need to be changed. A number of statistical agencies have implemented FH systems that are used across surveys but do not assure that underlying probability distributions are preserved in a principled manner. This talk presents new extended methods that combine restraints due to editing with constraints that preserve the joint distributions. Winkler (2003) provided theory on how FH methods could be combined with the imputation methods of Little and Rubin (2002). The new methods and computational algorithms (Winkler 2006, 2008) are characterized by extreme speed in comparison to commercial software and run on all computer systems. Because of its generality, use of tables for edit restraints, and ability to preserve joint distributions, the software can easily and quickly be applied in a large variety of survey situations, uses far fewer resources in building production systems, allows variance estimation and evaluation, and consistently provides demonstrably far better survey results than built-from-scratch hot-deck-based systems.
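The minimum-change principle at the heart of Fellegi-Holt systems can be illustrated on a toy record: search for the smallest set of fields whose values can be replaced so that every edit in the table is satisfied. This is a hypothetical brute-force sketch, not the Census Bureau software; production systems rely on fast set-covering optimization rather than exhaustive search:

```python
# Toy Fellegi-Holt error localization; edit rules and domains are invented.
from itertools import combinations

# Each edit returns True when the record satisfies it.
EDITS = [
    lambda r: r["age"] >= 15 or r["marital"] == "single",
    lambda r: 0 <= r["age"] <= 115,
]

DOMAINS = {"age": range(0, 116), "marital": ["single", "married"]}

def passes_all(record):
    return all(edit(record) for edit in EDITS)

def any_assignment_passes(record, subset):
    """Can some replacement of the fields in `subset` satisfy all edits?"""
    def rec(i, current):
        if i == len(subset):
            return passes_all(current)
        f = subset[i]
        return any(rec(i + 1, {**current, f: v}) for v in DOMAINS[f])
    return rec(0, dict(record))

def min_fields_to_impute(record):
    """Smallest set of fields whose values can be changed to satisfy all edits."""
    names = list(record)
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            if any_assignment_passes(record, subset):
                return subset
    return tuple(names)

bad = {"age": 10, "marital": "married"}   # fails the first edit
fields = min_fields_to_impute(bad)        # one field suffices here
```

Advantage (3) above corresponds to the fact that this search logic never changes: only the edit table and domains do.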

Return to top

Title: When is the Verdict or Judgment Final?: An Examination of Post Trial Activity in Civil Litigation

  • Speaker: Thomas H. Cohen, J.D., Ph.D., Statistician, Bureau of Justice Statistics
  • Chair: Mel Kollander
  • Date/time: Wednesday, June 10, 2009 / 12:00 - 1:30 p.m.
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: WSS Agriculture and Natural Resources Section

Abstract:

In the civil justice system, there is often an expectation of case resolution with jury and bench trials. The assumption that verdicts or judgments provide an end point to civil disputes, however, does not provide an accurate view of the full civil litigation process. Litigants can file motions requesting various forms of post trial relief as a means of challenging or modifying the trial court verdict or judgment. This article applies multivariate logistic regression techniques to examine the factors associated with post trial activity among a national sample of tort and contract trials concluded in 2005. Results show that the legal issues adjudicated at trial, the type of trial (bench/jury), damage award amounts, punitive damages, filing to disposition time, trial length, and geographic location are all significantly associated with the decision of one or both litigants to seek post trial relief.

Point of contact e-mail: Thomas.H.Cohen@usdoj.gov

Return to top

Title: The Statistical Adventures of an Applied Sociologist: Reflections, Challenges, and Lessons Learned

  • Speaker: Henry Y.H. Wong, Ph.D, Director of Program Assessment and Research, Cygnus Corporation, Inc.
  • Chair: Mel Kollander
  • Date/time: Wednesday, June 24, 2009 / 12:00 - 1:30 p.m.
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: WSS Agriculture and Natural Resources Section

Abstract:

The presentation reflects on a 30-year career of encountering various types of statistical challenges, from the introduction to the Applied Sociology program in graduate school and the application of social statistics and population projections in dissertation research, to the new statistical applications encountered along subsequent professional paths. These experiences included developing national manpower forecasts and labor policies; applying national income accounts to measure labor productivity changes and managing a Construction Project Costing System for an oil-rich nation; designing a community-based household survey for the collection of biometric measures and the development of physical growth standards for newborn and pre-school children in a developing country; tackling biostatistics in a randomized controlled trial for the U.S. Army; overseeing a data resources center for an NIH institute; and dealing with OMB regulations in conducting health communications and social marketing research for government contracts. The discussion will conclude with lessons learned from the journey through these various statistical adventures.

Point of contact e-mail: wongh@cygnusc.com

Return to top

Title: An Empirical Evaluation of Signal Extraction Goodness-of-fit Diagnostic Tests

  • Chair: Stuart Scott, Bureau of Labor Statistics
  • Speaker: Christopher D. Blakely, University of Maryland
  • Date/Time: Thursday, July 23, 2009 / 12:30-2:00 p.m.
  • Location: Bureau of Labor Statistics, Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Methodology Program, WSS

Abstract:

We present a band-limited frequency-domain goodness-of-fit (gof) diagnostic test that is based on signal extraction variances for nonstationary time series. This diagnostic test extends the statistic of McElroy (2008) by taking into account the effects of model parameter uncertainty, and as a result the statistic is a diagnostic of model gof. We explore its size and power through several numerical studies, showing that adequate distributional properties are obtained for fairly short time series (10 to 15 years of monthly data). Our Monte Carlo studies of finite sample size and power consider different combinations of both signal and noise components using seasonal, trend, and irregular component models obtained via canonical decomposition.

Details of the implementation appropriate for ARMA and SARIMA models are given.

Return to top

Title: Calibration Alternatives to Poststratification for Doubly Classified Data

  • Chair: John Eltinge, OSMR, BLS
  • Speaker: Ted Chang, University of Virginia
  • Date/Time: Friday, September 4, 2009 / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics, Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Current Employment Statistics Program, BLS & Methodology Program, WSS

Abstract:

We consider alternatives to poststratification for doubly classified data in which at least one of the two-way cells is too small to allow the poststratification based upon this double classification. In our study data set, the expected count in the smallest cell is 0.36.

One approach is simply to collapse cells. This is likely, however, to destroy the double classification structure. Our alternative approaches allow one to maintain the original double classification of the data.

The approaches are based upon the calibration study by Chang and Kott (2008). We choose weight adjustments dependent upon the marginal classifications (but not full cross classification) to minimize an objective function of the differences between the population counts of the two way cells and their sample estimates. In the terminology of Chang and Kott (2008), if the row and column classifications have I and J cells respectively, this results in IJ benchmark variables and I + J - 1 model variables.

We study the performance of these estimators by constructing simulated simple random samples from the 2005 Quarterly Census of Employment and Wages, which is maintained by the Bureau of Labor Statistics. We use the double classification of state and industry group. In our study, the calibration approaches introduced an asymptotically trivial bias, but reduced the MSE, compared to the unbiased estimator, by as much as 20% for a small sample.
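One familiar special case of calibrating on margins is raking (iterative proportional fitting), which adjusts the two-way table of weighted counts to match row and column targets without ever using the sparse two-way cells directly. The toy sketch below is illustrative only; the objective-function formulation of Chang and Kott (2008) discussed in the abstract is more general:

```python
# Toy raking (IPF) on a 2x2 table of weighted counts; numbers are invented.
def rake(table, row_targets, col_targets, iters=50):
    """Adjust a two-way table of weighted counts to match marginal targets."""
    t = [row[:] for row in table]
    for _ in range(iters):
        for i, target in enumerate(row_targets):       # match row margins
            s = sum(t[i])
            t[i] = [v * target / s for v in t[i]]
        for j, target in enumerate(col_targets):       # match column margins
            s = sum(t[i][j] for i in range(len(t)))
            for i in range(len(t)):
                t[i][j] *= target / s
    return t

sample = [[30.0, 10.0], [20.0, 40.0]]   # weighted sample counts by cell
raked = rake(sample, row_targets=[60.0, 40.0], col_targets=[45.0, 55.0])
```

As in the abstract's setup, the adjustment depends only on the I + J marginal classifications, so a near-empty interior cell never has to support its own poststratification adjustment.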

Return to top

Title: Improving Differential Expression Analysis with the Consideration of Genome-Wide Co-Expression Information

  • Speaker: Yinglei Lai, Ph.D., Assistant Professor of Statistics,
    Department of Statistics, George Washington University, Washington D.C.
  • Chair: Grant Izmirlian, National Cancer Institute
  • Discussant: TBA
  • Date/Time: Friday, September 11, 2009 / 10:00-11:00 a.m.
  • Location: Georgetown University Medical Center, Lombardi Comprehensive Cancer Center, 3900 Reservoir Rd., NW, New Research Building, E501, Washington, DC 20007
  • Sponsor: Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University and the Public Health/Biostatistics section of the Washington Statistical Society

Abstract:

Microarrays have been widely used in biomedical studies, and the differential expression analysis of microarray data remains a topic of active interest. The control of false positives in differential expression analysis remains a major challenge, although many statistical methods have been proposed for its improvement. Since genes interact with each other during cellular and molecular processes, an efficient incorporation of genome-wide co-expression information may significantly improve the detection of differential expression. We will discuss our recent research progress in this direction.

For information, please contact Caroline Wu at 202-687-4114 or ctw26@georgetown.edu

Return to top

Title: Why is Survey Research 20 Years Behind?

  • Speaker: Robert Fay, Senior Statistician, Westat
  • Chair: Brian Meekins, BLS
  • Date/Time: Tuesday, September 22, 2009 / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics, Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Methodology Program, WSS

Abstract:

The principal goal of this talk is to argue the presupposition of its title. More specifically, the claim is that survey research has fallen approximately 20 years behind developments in relevant basic science. I will limit my scope to a single but broad topic, research on memory. A timeline is offered to establish both parts of the claim, namely (1) a qualitative claim that survey research overlooks important basic findings in memory, and (2) a quantitative claim that the gap is approximately 20 years. The timeline comprises papers and books chosen to illustrate advances in the basic science or implications of memory research for other areas of psychology and behavioral science generally. The talk will offer a few examples of key issues in survey research where the effect of the 20-year gap is evident. I will also suggest a few answers to the why question of the title.

Return to top

Title: Some Lessons from Our Collaborative Studies in Esophageal Cancer, Prostate Cancer, HIV, and Breast Cancer

  • Speaker: George Bonney, Ph.D., Howard University, Washington D.C.
  • Chair: Grant Izmirlian, National Cancer Institute
  • Discussant: TBA
  • Date/Time: Friday, September 25th, 2009/ 10:00 - 11:00 a.m.
  • Location: Georgetown University Medical Center, Lombardi Comprehensive Cancer Center, 3900 Reservoir Rd., NW, New Research Building, E501, Washington, DC 20007
  • Sponsor: Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University and the Public Health/Biostatistics section of the Washington Statistical Society

Abstract:

The work of the Statistical Genetics and Bioinformatics Unit of the National Human Genome Center at Howard University involves the use of high-level mathematical and statistical computing skills in biomedicine. Here I briefly discuss questions and results from some of our collaborative studies:

  • Esophageal cancer in Chinese families: Is alcohol really protective?

  • Multiple cancers in Texas families. Is the association with the p53 mutation causal?

  • Prostate Cancer in African American Men: Where are the genes?

  • HIV Prevalence and Incidence among Blacks in Washington DC: Does it make sense to talk of estimation for the whole city using only the data from Howard University Hospital?

  • A Molecular Index for Breast Cancer Risk Assessment? Can we really construct such an index for risk of invasive breast cancer?

For information, please contact Caroline Wu at 202-687-4114 or ctw26@georgetown.edu

Return to top

Title: Filling the Gap: Introducing the Conway-Maxwell-Poisson regression for count data

  • Speaker: Kimberly Sellers, Department of Mathematics, Georgetown University
  • Date/Time: Friday, September 25, 2009, 11:00-12:00pm
  • Location: Media and Public Affairs Building (MPA), Room 309, 805 21st Street, NW, Washington DC 20052
  • Sponsor: The George Washington University, Department of Statistics

Abstract:

Poisson regression is a popular tool for modeling count data and is applied in fields ranging from the social to the physical sciences. Real data, however, are often over- or under-dispersed and thus not well suited to Poisson regression. Further, the dispersion present in the data may vary with other explanatory components. We propose a generalized regression model based on the Conway-Maxwell-Poisson (CMP) distribution to address this problem. CMP regression generalizes the well-known Poisson and logistic regression models and is suitable for fitting count data with a wide range of dispersion levels. We further extend this approach with a model structure for the dispersion parameter itself, to better understand the dispersion component. Using a GLM approach that takes advantage of exponential family properties, we will discuss model estimation, inference, diagnostics, and interpretation. We will also present hypothesis tests for the implementation of the CMP regression and a variable selection technique. We will compare the CMP to several alternatives and illustrate its advantages and usefulness using datasets with varying types and levels of dispersion. This talk is based on joint work with Galit Shmueli (University of Maryland, College Park).
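As a rough illustration of the distribution behind the talk (this is my sketch, not the authors' regression code), the CMP probability mass function is P(X = x) proportional to lam**x / (x!)**nu, with an infinite-series normalizer that can be truncated in practice:

```python
import math

def cmp_pmf(x, lam, nu, max_terms=200):
    """Conway-Maxwell-Poisson pmf: P(X = x) proportional to lam**x / (x!)**nu.

    nu = 1 recovers the Poisson distribution; nu > 1 gives under-dispersion,
    nu < 1 over-dispersion. The infinite normalizer Z(lam, nu) is truncated
    at max_terms, which is adequate for moderate lam.
    """
    log_terms = [j * math.log(lam) - nu * math.lgamma(j + 1) for j in range(max_terms)]
    m = max(log_terms)  # log-sum-exp trick for numerical stability
    log_z = m + math.log(sum(math.exp(t - m) for t in log_terms))
    return math.exp(x * math.log(lam) - nu * math.lgamma(x + 1) - log_z)
```

Setting nu = 1 reproduces the Poisson pmf exactly, which is a convenient sanity check on the implementation.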

Return to top

Title: The Sociolinguistics of Survey Translation

  • Speaker: Yuling Pan, Statistical Research Division, Census Bureau
  • Discussant: Eileen O'Brien, Energy Information Administration
  • Chair: Bill McNary, Energy Information Administration
  • Date/Time: Thursday, October 8, 2009 / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center, Room 10. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Take the Red Line to Union Station.
  • Sponsors: WSS Data Collection Methods and DC-AAPOR
  • Presentation material:
    Presentation slides (Pan, pdf, ~232kb)
    Discussion slides (O'Brien, pdf, ~84kb)

Abstract:

With the increasing diversity in the United States population, there is a growing need to translate survey questionnaires and survey documents from English into languages other than English. Challenges arise concerning the functional equivalence of the translated materials and the methodology in ensuring the quality of survey translation.

This presentation analyzes the challenges for survey translation from the perspective of sociolinguistics, a scientific discipline that focuses on the social function of language, and studies the relationship between language, culture, and society. The talk will illustrate three key components of successful survey translation: linguistic rules, cultural norms, and social practices. In order to highlight the connection between these three components, findings from two Census Bureau multilingual projects will be presented and discussed. The talk will conclude with recommendations for future research on survey translation.

Return to top

Title: Using Longitudinal Surveys to Evaluate Interventions

  • Speaker: Dr. David Judkins, Senior Statistician, Westat Inc.
  • Date/Time: Thursday, October 8, 2009, 3:30pm
  • Location: Room 1313, Math Bldg, University of Maryland College Park
  • Sponsor: University of Maryland, Statistics Program

Abstract:

Longitudinal surveys are often used in evaluation studies conducted to assess the effects of a program or intervention. They are useful for examining the temporal nature of any effects, to distinguish between confounding variables and mediators, and to better control for confounders in the evaluation. In particular, the estimation of causal effects may be improved if baseline data are collected before the intervention is put in place. This presentation will provide an overview of types of interventions, types of effects, some issues in the design and analysis of evaluation studies, and the value of longitudinal data. These points will be illustrated using three evaluation studies: the U.S. Youth Media Campaign Longitudinal Survey (YMCLS), conducted to evaluate a media campaign to encourage 9-to 13-year-old Americans to be physically active; the National Survey of Parents and Youth (NSPY), conducted to evaluate the U.S. National Youth Anti-Drug Media Campaign; and the Gaining Early Awareness and Readiness for Undergraduate Programs (GEAR UP) program, designed to increase the rate of postsecondary education among low-income and disadvantaged students in the United States.

Based on: Piesse, A., Judkins, D., and Kalton, G. (2009). Using longitudinal surveys to evaluate interventions. In P. Lynn (Ed.), Methodology of Longitudinal Surveys (pp. 303-316). Chichester: Wiley.

Directions to Campus: http://www.math.umd.edu/department/campusmap.shtml

Return to top

Title: The Zooniverse: Advancing Science through User-Guided Learning in Massive Data Streams

  • Speaker:
    Professor Kirk Borne
    Department of Computational and Data Sciences
    George Mason University
  • Time: 10:30 a.m. Refreshments, 10:45 a.m. Colloquium Talk
  • Date: October 9, 2009
  • Location:
    Department of Computational and Data Sciences George Mason University
    Research 1, Room 301, Fairfax Campus
    George Mason University, 4400 University Drive, Fairfax, VA 22030
  • Sponsor: George Mason University CDS/CCDS/Statistics Colloquium

Abstract:

Science projects from all disciplines are producing enormous data repositories, which pose both rich targets for exploration and difficult challenges for data mining. An even greater challenge is posed by high-rate data streams from a vast array of sensors and high-efficiency experiments. Mining the knowledge from these data streams is complicated both by the data volume and the time criticality of identifying novel and important events in the data flow. In addition, data have "inertia" - it is easier to analyze the data while they are moving through the cloud than to extract them later from some vast data repository and then feed them through memory bottlenecks for offline analysis and mining. For cases where the data flow is continuous, the problem gets cumulatively worse with time. Some examples include astronomy (classification of millions of real-time events) and earth system science (the detection and characterization of rapidly forming hazards). We present an intriguing model for user-guided learning from these massive data streams - human computation - which is characterized by enormous cognitive capacity and pattern recognition efficiency. We will describe some remarkable results from the field of citizen science, based upon work with static databases. We envision the eventual application of this emerging computational resource to the problem of massive data stream mining for scientific discovery.

Return to top

19th ANNUAL MORRIS HANSEN LECTURE

Title: The Care, Feeding and Training of Survey Statisticians

  • Speaker:
    Sharon L. Lohr
    Thompson Industries Dean's
    Distinguished Professor of Statistics
    Arizona State University
  • Discussants
    James Lepkowski, University of Michigan
    Donsig Jang, Mathematica
    David Morganstein, Westat
  • Date and Time: Tuesday, October 13, 2009 at 3:30 p.m.
  • Location: Jefferson Auditorium of the U.S. Department of Agriculture's South Building (Independence Avenue, SW, between 12th and 14th Streets); Smithsonian Metro Stop (Blue/Orange Lines). Enter through Wing 5 or Wing 7 from Independence Ave. (The special assistance entrance is at 12th & Independence). A photo ID is required.
  • Sponsors: The Washington Statistical Society, Westat, and The National Agricultural Statistics Service.
  • Brochure (pdf, ~ 108kb)

Abstract:

The two volumes of Sample Survey Methods and Theory by Hansen, Hurwitz, and Madow (1953) have had great influence on the training and practice of survey statisticians. We examine current themes in survey sampling research and relate them to topics taught in classes on survey sampling. We discuss other aspects of university training and background that may help the survey statistician thrive in and adapt to a variety of environments.

Please pre-register for this event to help facilitate access to the building. After August 15, pre-register on line at http://www.nass.usda.gov/morrishansen/.

Return to top

Title: Attitudes Towards Firm and Competition: How Do They Matter for CRM Activities?

  • Speaker: Nalini Ravishanker, Department of Statistics, University of Connecticut
  • Date/Time: Monday, October 19th 4:00-5:00 pm
  • Location: Funger Hall 520 (2201 G Street, NW, Washington, DC 20052)
  • Sponsor: The George Washington University, Department of Statistics

Abstract:

Easy availability of information on a customer's transactions with the firm, and pressure to establish financial returns from marketing investments, have led to a dominance of models that directly connect marketing investments to sales at the customer level. Customers' attitudes, on the other hand, have always been assumed to influence their reactions to a firm's marketing communications, but are rarely included in models that determine customer value. We empirically assess (a) the role of customers' attitudes in determining their value to the firm, and (b) how knowledge of customer attitudes can influence a firm's customer management strategy. Specifically, we evaluate which aspects of attitudes, i.e., attitudes toward the firm or toward competition, have a bigger effect on customer behavior, and whether customer attitudes are more important for managing some customers than others. We use monthly sales call, sales, and survey-based attitude information collected over three years from the same customers of a multinational pharmaceutical firm. We develop a hierarchical generalized dynamic linear model (HGDLM) framework that combines the sales call and sales data, which are available at regular time intervals, with customer attitudes, which are not, and carry out inference in the Bayesian framework.

Return to top

Title: Racial Profiling Analysis

  • Speaker: Greg Ridgeway, Ph.D.
    Senior Statistician
    Director, Safety & Justice Research Program
    RAND Corporation
  • Discussant: Joel Garner, Ph.D.
    Chief, Law Enforcement Statistics Unit
    Bureau of Justice Statistics
  • Organizer: Dave Judkins, Westat
  • Chair: Brian Meekins, BLS
  • Date/Time: Tuesday, October 20, 2009 / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics, Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Methodology Program, WSS

Abstract:

Several studies and high profile incidents around the nation involving police and minorities, such as the July arrest of Harvard Professor Henry Louis Gates, have brought the issue of racial profiling to national attention. While civil rights issues continue to arise in other areas such as offers of employment, job promotions, and school admissions, the issue of race disparities in traffic stops seems to have garnered much attention in recent years. Many communities have asked, and at times the U.S. Department of Justice has required, that law enforcement agencies collect and analyze data on all traffic stops. Data collection efforts, however, so far have outpaced the development of methods that can isolate the effect of race bias on officers' decisions to stop, cite, or search motorists.

In this talk Dr. Ridgeway will describe a test for detecting race bias in the decision to stop a driver that does not require explicit, external estimates of the driver risk set. Second, he'll describe an internal benchmarking methodology for identifying potential problem officers. Lastly, he will describe methods for assessing racial disparities in citation, searches, and stop duration. He will present results from his studies of the Oakland (CA), Cincinnati, and New York City Police Departments.

Return to top

Title: Flexible Stepwise Regression: An Adaptive Partition Approach to the Detection of Multiple Change-Points

  • Speaker: Yinglei Lai, Department of Statistics, George Washington University
  • Time: Friday, October 23, 2009, 3:00-4:00pm
  • Location: Phillips Hall, Room 108 (801 22nd Street, NW, Washington DC 20052)
  • Sponsor: The George Washington University, Department of Statistics

Abstract:

We present flexible stepwise regression, an adaptive partition approach to the detection of multiple change-points. It partitions a "time course" into consecutive non-overlapping intervals such that the population means/proportions of the observations in two adjacent intervals differ significantly at a given level. This is achieved through a modified dynamic programming algorithm. The method provides consistent estimation results and has a wide range of applications; reduced isotonic regression is a special case. Both simulation and experimental data will be used to illustrate the method.
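The abstract does not spell out the algorithm, so as a hedged illustration of the core idea only, here is a toy single-change-point detector that scans candidate splits and keeps the one maximizing the two-sample z-statistic between adjacent intervals (the talk's method handles multiple change-points via dynamic programming, which this sketch does not attempt):

```python
import math

def single_change_point(y, min_seg=5):
    """Return the split index k maximizing the absolute two-sample
    z-statistic between y[:k] and y[k:]. A one-change-point toy version
    of partitioning a series into intervals with different means."""
    n = len(y)
    best_k, best_z = None, -1.0
    for k in range(min_seg, n - min_seg + 1):
        left, right = y[:k], y[k:]
        m1 = sum(left) / k
        m2 = sum(right) / (n - k)
        v1 = sum((v - m1) ** 2 for v in left) / max(k - 1, 1)
        v2 = sum((v - m2) ** 2 for v in right) / max(n - k - 1, 1)
        se = math.sqrt(v1 / k + v2 / (n - k)) or 1e-12
        z = abs(m1 - m2) / se
        if z > best_z:
            best_k, best_z = k, z
    return best_k
```

On a series with a clear mean shift, the detected split lands at or near the true change location; a significance threshold on the z-statistic would decide whether to keep splitting, as in the level-controlled partition the abstract describes.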

Return to top

Title: Differences in the Academic Careers of Men and Women at Research Intensive Universities and at Critical Transitions

  • Speaker: Alicia Carriquiry, Department of Statistics, Iowa State University
  • Chair: Promod Chandhok, Bureau of Transportation Statistics/RITA
  • Date/Time: October 28, 2009 (Wednesday) / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Ave., NE. Take the Red Line to Union Station.
  • Sponsor: Social & Demographic Statistics Section

Abstract:

A congressionally mandated report written by the National Research Council examined how women at research intensive (RI) universities fare compared with men at key transition points in their careers. Two national surveys were commissioned to help address the issue. The report's conclusions are based on the findings of these surveys of tenure-track and tenured faculty in six disciplines -- biology, chemistry, mathematics, civil engineering, electrical engineering, and physics -- which were conducted at 89 institutions in 2004 and 2005.

In each of the six disciplines, women who applied for tenure-track positions had a better chance of being interviewed and receiving job offers than male applicants had. This was also true for tenured positions, with the exception of those in biology. Men and women reported comparable access to most institutional resources, including start-up packages, travel funds, and supervision of similar numbers of postdocs and research assistants. And in general, men and women spent similar proportions of their time on teaching, research, and service. Although at first glance men seemed to have more lab space than women, this difference disappeared when other factors such as discipline and faculty rank were accounted for. On most key measures -- grant funding, nominations for awards and honors, and offers of positions at other institutions -- there is little evidence that men and women exhibited differences in outcomes. In terms of salary, men and women are paid comparable salaries except at the rank of full professor where males continue to have an edge. This is probably due to differences among men and women in terms of time in rank.

While the data indicate important progress, there are still areas that need addressing. Most striking is the leakage of women between graduate school and academic positions at RI universities. Women are not applying for tenure-track jobs at RI universities at the same rate at which they are earning Ph.D.s. Furthermore, women were underrepresented among candidates for tenure relative to the number of women assistant professors. While at first glance this might suggest attrition of women during the probationary period, the cross-sectional data available to the committee did not permit addressing the question.

In this talk, we describe the data collection, analysis and synthesis from which the report drew its conclusions. In particular, we discuss the type of inferences that can be drawn from this study and also the type of inferences that the study design did not allow.

The study was sponsored by the National Science Foundation at the request of Congress.

Return to top

Title: Pattern Analysis of Pairwise Relationship in Genetic Network

  • Speaker: Ao Yuan, Ph.D., National Human Genome Center, Howard University
  • Date/Time: Thursday, November 5, 2009 / 11am - noon
  • Location: Conference Room 9091, Two Rockledge Center, 6701 Rockledge Drive, Bethesda, MD 20892
  • Sponsor: Office of Biostatistics Research, Division of Prevention and Population Sciences, National Heart, Lung, and Blood Institute

Abstract:

Genetic network analysis has recently proved useful in the study of gene-gene interactions, and the study of gene-gene correlations is a special analysis of the network. Many methods exist for this goal. Most model the relationship between each gene and the set of genes under study; these work well in applications, but there are often limitations on network size, issues of non-uniqueness of the solution and/or computational difficulty, and problems of interpretation. Here we study the problem from a different point of view: given a measure of pairwise gene-gene relationship, we use the technique of pattern image restoration to infer the optimal network of pairwise relationships. In this method the genetic network can be of any size, the solution always exists and is unique, the results are easy to interpret in the global sense, and the computation is simple. The regulatory relationships among the genes are inferred according to the principle that neighboring genes tend to share common features. The network is updated iteratively until convergence; each iteration monotonically reduces the entropy and variance of the network, so the limit network represents the clearest picture of the regulatory relationships among the genes that the data provide and the model can recover. The method is illustrated with simulated data and applied to real data sets. This is joint work with George Bonney.

Seminar contact: Jungnam Joo (jooj@nhlbi.nih.gov).

Return to top

Title: A Class of Multivariate Distributions Related to Distributions with a Gaussian Component

  • Speaker: Prof. Abram Kagan, UMCP
  • Date/Time: Thursday, November 5, 2009, 3:30pm
  • Location: Room 1313, Math Bldg, University of Maryland College Park
  • Sponsor: University of Maryland, Statistics Program

Abstract:

A class of random vectors (X, Y), with X ∈ R^j and Y ∈ R^k, and characteristic functions of the form

h(s, t) = f(s)g(t) exp{s′Ct},

where C is a (j × k)-matrix and the prime stands for transposition, is introduced and studied. The class possesses some nice properties that will be discussed. A relation of the class to random vectors with Gaussian components is of particular interest. The goal was to understand what kind of restrictions on the marginal distributions are imposed by an attempt to preserve Gaussian-like properties.
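As a concrete instance (my illustration, not taken from the abstract): for a zero-mean bivariate Gaussian vector with variances σ₁², σ₂² and covariance c, the joint characteristic function factors exactly in this form:

```latex
h(s,t) = \exp\!\Big(-\tfrac{1}{2}\big(\sigma_1^2 s^2 + 2c\,st + \sigma_2^2 t^2\big)\Big)
       = \underbrace{e^{-\sigma_1^2 s^2/2}}_{f(s)}\,
         \underbrace{e^{-\sigma_2^2 t^2/2}}_{g(t)}\, e^{-c\,st},
```

so here j = k = 1, f and g are the marginal characteristic functions, and C = -c.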

Directions to Campus: http://www.math.umd.edu/department/campusmap.shtml

Return to top

Title: Panel on Address-Based Sampling

  • Co-Chairs: Kathy Downey & Brian Meekins, OSMR, BLS
  • Panelists:
    Anna Fleeman-Elhini, Arbitron Inc.
    Michael Link, Nielsen Media Research
    Jill Montaquila, Westat
    Robert Poole, OPLC, BLS
  • Date/Time: Tuesday, November 10th, 12:30-2:00pm
  • Location: Bureau of Labor Statistics, Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Methodology Program, WSS & DC-AAPOR

Abstract:

Each panelist will give a brief (10-15 minutes) discussion of their experiences in implementing address-based sample designs. Possible topics include, but are not limited to: coverage issues, cost, data quality, survey management, and questionnaire design. Ample time will be allotted for questions following the panelists' remarks.

Return to top

Title: Computer-Intensive Statistical Methodology with Applications to Translational Cancer Research

  • Speaker: Kim-Anh Do, PhD, Department of Biostatistics, Division of Quantitative Sciences, The University of Texas MD Anderson Cancer Center
  • Date/Time: Friday, November 13, 2009 / 10:00 - 11:00 a.m.
  • Registration Fees:
    Members of the Survey Research Methods Section: $60
    ASA members: $75
    Non-members: $95
  • Sponsor: Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University

For information, please contact Caroline Wu at 202-687-4114 or ctw26@georgetown.edu

Abstract:

Early detection is critical in disease control and prevention. Biomarkers provide valuable information about the status of a cell at any given time point. Biomarker research has benefited from recent advances in technologies such as gene expression microarrays, and more recently, proteomics. The long term translational research goal is that if drugs can be targeted to specific tissues in the body, then dosage can be altered to achieve the desired effect while minimizing side effects such as toxicity. Motivated by specific problems involving such high throughput data, I have developed computer-intensive statistical methods based on nonparametric and semiparametric mixture model assumptions for real-time analysis in the context of biomarker discovery. Most biomarker-discovery projects aim at identifying features in the biomarker profiles (gene expression, phage, SAGE, mass spectrometry proteins) that distinguish cancers from normals, between different stages of disease development, or between experimental conditions (such as different treatment arms or different tissue types). Novel statistical methodology development will be highlighted with direct applications to cancer research challenges that address our long term translational goal.

Return to top

Title: Moment Determinacy of Distributions: Some Recent Results

  • Speaker: Jordan Stoyanov, Department of Mathematics and Statistics, Newcastle University
  • Time: Friday, November 13, 2009, 3:00-4:00pm
  • Location: Phillips 108 (801 22nd Street, NW, Washington, DC 20052)
  • Sponsor: The George Washington University, The Institute for Integrating Statistics in Decision Sciences

Abstract:

The main discussion will be on distributions and their properties expressed in terms of the moments which are assumed to be finite. We describe distributions which are unique (M-determinate) and others which are non-unique (M-indeterminate). We also show the practical importance of these properties in areas such as Financial modelling and Reliability analysis.

We start briefly with classical criteria and turn to very recent developments based on the so-called Krein-Lin techniques. Thus we will be able to analyze Box-Cox functional transformations of random data and characterize the moment determinacy of their distributions. Distributions of stochastic processes such as the Geometric BM and the solutions of SDEs will also be considered.

All statements and criteria will be well illustrated by examples involving popular distributions such as the Normal, Skew-Normal, Log-normal, Skew-Log-normal, Exponential, Gamma, Poisson, IG, etc. Several facts will be reported; some of them, it seems, are not so well known, and a few are a little surprising, even shocking.

The material will be addressed to professionals in Statistics/Probability, Stochastic modeling and also to Doctoral and Master students in these areas. If time permits, some open questions will be outlined.

Return to top

Title: Combined State and Parameter Estimation in General State-Space Models

  • Speaker: Jonathan Stroud, Department of Statistics, George Washington University
  • Date: Monday, November 16, 2009, 4:00-5:00pm
  • Location: Phillips Hall, Room 111 (801 22nd Street, NW, Washington, DC 20052)
  • Sponsor: The George Washington University, Department of Statistics

Abstract:

This talk considers the problem of combined state and parameter estimation in general state-space models. Working within the Bayesian framework, we derive simulation-based (MCMC and sequential Monte Carlo) strategies for filtering, smoothing and parameter estimation. The approaches are quite general and can be applied to a wide class of models, including nonlinear, non-Gaussian and continuous-time models. We illustrate the methods using a stochastic volatility jump-diffusion model and a dynamic spatio-temporal model.
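As a minimal sketch of the sequential Monte Carlo machinery the talk builds on (the talk's models and algorithms are far more general, and this toy filter does no parameter estimation), here is a bootstrap particle filter for a simple linear-Gaussian state-space model:

```python
import math
import random

def bootstrap_filter(ys, n_particles=500, a=0.9, sx=1.0, sy=1.0, seed=0):
    """Bootstrap particle filter for the toy model
        x_t = a * x_{t-1} + N(0, sx^2),   y_t = x_t + N(0, sy^2).
    Returns the filtered means E[x_t | y_1..t]."""
    rng = random.Random(seed)
    parts = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    means = []
    for y in ys:
        # Propagate each particle through the state equation.
        parts = [a * p + rng.gauss(0.0, sx) for p in parts]
        # Weight by the (unnormalized) Gaussian observation likelihood.
        ws = [math.exp(-0.5 * ((y - p) / sy) ** 2) for p in parts]
        total = sum(ws)
        means.append(sum(w * p for w, p in zip(ws, parts)) / total)
        # Multinomial resampling back to equal weights.
        parts = rng.choices(parts, weights=ws, k=n_particles)
    return means
```

On simulated data from the same model, the filtered means track the hidden states more closely than the raw observations do, which is the basic payoff of filtering.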

Return to top

Title: Empirical likelihood based inference for quantiles and low income proportions in selection bias and missing data problems

  • Speaker: Jing Qin, Ph.D., Mathematical Statistician, National Institute of Allergy and Infectious Diseases
  • Chair: David Judkins, Westat
  • Discussant: Paul Zador, Ph.D., Senior Statistician, Westat
  • Date/Time: November 19, 2009 (Thursday) / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics, Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Video Link: Westat, Rockville Offices. On a trial basis, Westat is opening up its conference center for watching the lecture remotely. Reservation required. Call Fran Winter, 301-294-4419.
  • Sponsor: Methodology Program, WSS

Abstract:

In this talk, we study covariate-adjusted median treatment effects based on the empirical likelihood method. This method is useful for studying treatment effects on skewed variables in studies where treatment is not randomly assigned. A closely related problem is to estimate the low income proportion from a sample subject to nonresponse that is ignorable given measured covariates but is not completely random. The low income proportion is defined as the proportion of the population with income falling below a given fraction A (0 < A < 1) of the Bth (0 < B < 1) quantile of the income distribution. It is an important index in comparisons of poverty among countries, the stability of a society depends heavily on it, and its accurate and reliable estimation plays an important role in governments' economic policies.
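In the complete-data case (with none of the nonresponse adjustment that is the hard part the talk addresses), the index defined above reduces to a two-line computation; a sketch with the illustrative choice A = B = 0.5, i.e. the proportion below half the median:

```python
def low_income_proportion(incomes, a=0.5, b=0.5):
    """Proportion of incomes below fraction `a` of the b-th quantile
    (e.g. below half the median for a = b = 0.5)."""
    xs = sorted(incomes)
    # Simple order-statistic quantile: the value at rank floor(b * n).
    q = xs[min(int(b * len(xs)), len(xs) - 1)]
    threshold = a * q
    return sum(x < threshold for x in xs) / len(xs)
```

The talk's contribution is estimating this quantity, via empirical likelihood, when some incomes are missing in a covariate-dependent way, so the naive sample proportion above would be biased.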

Return to top

Title: What Data Mining Teaches Me About Teaching Statistics

  • Speaker: Dick De Veaux, Department of Mathematics and Statistics, Williams College
  • Date: Thursday, November 19, 2009
  • Time: 4:00-5:00pm
  • Location: Funger Hall, Room 620 (2201 G Street, NW, Washington, DC 20052)
  • Sponsor: The George Washington University, Department of Statistics

Abstract:

Data mining has been defined as a process that uses a variety of data analysis and modeling techniques to discover patterns and relationships in data that may be used to make accurate predictions and decisions. Statistical inference concerns the same problems. Are the two really different? Through a series of case studies, we will try to illuminate some of the challenges and characteristics of data mining. Each case study reminds us that the important issues are often the ones that transcend the methodological choice one faces when solving real world problems. What lessons can these teach us about teaching the introductory course?

Return to top

Title: Perfect simulation of Vervaat perpetuities

  • Speaker: Prof. James Allen Fill, Department of Applied Mathematics and Statistics, Johns Hopkins University
  • Date/time: Tuesday, December 1, 2009, 3:35 P.M.
  • Location: Bentley Lounge, Gray Hall 130, American University. Metro Red line to Tenleytown-AU. AU shuttle bus stop is next to the station. Please see campus map on http://www.american.edu/maps/ for more details or download the brochure.
  • Sponsor: American University Department of Mathematics and Statistics Colloquium

Abstract:

I will explain how to use "coupling into and from the past" to sample perfectly in a simple and provably fast fashion from the Vervaat family of perpetuities -- or at least a large subfamily thereof. The motivation for this joint work with Mark Huber was to sample perfectly from the Dickman distribution, which arises both in number theory and in the analysis of the Quickselect algorithm.
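For contrast with the perfect-simulation approach of the talk, a naive approximate sampler for the Dickman case (t = 1) simply iterates the distributional fixed point Y =_d U(1 + Y); it converges only approximately after a finite burn-in, which is exactly the shortcoming that "coupling into and from the past" removes:

```python
import random

def approx_dickman(n_samples=20000, n_iter=60, seed=0):
    """Approximate draws from the Dickman distribution by iterating
    Y <- U * (1 + Y), U ~ Uniform(0, 1), starting from Y = 0.
    Uses n_iter burn-in steps per sample; NOT a perfect sampler."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_samples):
        y = 0.0
        for _ in range(n_iter):
            y = rng.random() * (1.0 + y)
        out.append(y)
    return out
```

The Dickman distribution has mean 1, so the sample mean of these approximate draws provides a quick sanity check; a perfect sampler gives exact draws with a provable (random but finite) running time instead.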

Point of contact: Stacey Lucien, 202-885-3124, mathstat@american.edu.

Return to top

Title: Why I am Not a System Dynamicist

  • Speaker:
    Professor Rob Axtell
    Department of Computational Social Science, George Mason University
  • Time: 10:30 a.m. Refreshments, 10:45 a.m. Colloquium Talk
  • Date: December 4, 2009
  • Location:
    Department of Computational and Data Sciences George Mason University
    Research 1, Room 301, Fairfax Campus
    George Mason University, 4400 University Drive, Fairfax, VA 22030
  • Sponsor: George Mason University CDS/CCDS/Statistics Colloquium

Abstract:

System dynamics models in the social sciences are typically interpreted as a summary or aggregate representation of an autonomous dynamical system composed of a large number of interacting entities. The high dimensional microscopic system is 'compressed' into a lower dimensional form in the process of creating system dynamics models. In order to be useful, the reduced form representation must have some fidelity to or verisimilitude with the underlying dynamical system. In this talk I demonstrate mathematically that even so-called perfectly aggregated models will in general display a host of pathologies that are a direct consequence of the aggregation process and which have no analog at the microscopic level. First, a macroscopic model that perfectly aggregates a microscopic system will either not exist or be not unique. Second, with respect to the underlying model the macroscopic system can display spurious equilibria, have altered stability properties, exhibit peculiar sensitivity structure, manifest corrupted bifurcation behavior, and present anomalous statistical features. As a result, there is a definite sense in which even the best system dynamics models are at least potentially problematical, if not outright misrepresentations of the systems they purport to describe. From these formal results I conclude that the majority of such models may have little practical utility in decision support environments where the results of models are used to set policies, although they probably have some value as 'thought experiments' in which scientists seek to clarify their own thinking about the coarsest features of specific social processes.

Return to top

Title: U-Estimation for Measurement Error Problems

  • Speaker: Jiayang Sun, Ph.D. Department of Statistics, Case Western Reserve University
  • Date: Thursday, December 10, 2009
  • Time: 11:00 a.m. - 12:00 p.m.
  • Location: Conference room 9091, Two Rockledge Center, 6701 Rockledge Drive, Bethesda, MD 20892-7913
  • Sponsor: Office of Biostatistics Research, Division of Prevention and Population Sciences, National Heart, Lung, and Blood Institute

Abstract:

Measurement error problems occur frequently. For example, systolic blood pressures, some microarray data from colon cancer patients, data from longitudinal DTI images, and velocity measurements in astronomy include measurement errors. A popular approach to measurement error problems has been to deconvolve the measurement error, yielding deconvolution estimators, a.k.a. Fourier estimators. However, these estimators may have slow convergence rates and be subject to some computational complexity, even when they are optimal in the traditional sense. In this talk, we advocate a paradigm change and propose a new approach, U-estimation. The idea starts from the case when the error is uniformly distributed. It then proceeds to the case when the error is distributed as a linear combination of uniforms, hence approximating a large class of error distributions (including the normal distribution). The resulting estimators have faster convergence rates, are stable, and are easy to compute. No Fourier transformations are needed. (This is joint work with Xiaofeng Wang and Michael Woodroofe.)
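The uniform-error setup lends itself to a quick sanity check. The sketch below is not the U-estimation procedure the talk develops; it is only a method-of-moments illustration of the model W = X + U with U ~ Uniform(-h, h), whose variance h²/3 inflates a naive variance estimate for X.

```python
import numpy as np

rng = np.random.default_rng(0)
h = 1.0                                   # half-width of the uniform error
x = rng.normal(10.0, 2.0, 200_000)        # unobserved true values, Var(X) = 4
w = x + rng.uniform(-h, h, size=x.size)   # observed values with measurement error

naive = w.var()                 # inflated: estimates Var(X) + h**2 / 3
corrected = w.var() - h**2 / 3  # moment correction recovers Var(X) ~= 4
```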

Return to top

Title: Geographic Information System (GIS) Data Collection and Storage

  • Speakers: Chuck Roberts, ESRI Federal Account Manager and Tosia Shall, ESRI Sales Engineer
  • Discussant: Rick Mueller, Head/Spatial Analysis Research, National Agricultural Statistics Service
  • Chair: Marcela Rourk, Mathematical Statistician, Energy Information Administration
  • Date/Time: Wednesday, December 16, 2009 / 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics Conference Center, Room 2. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Take the Red Line to Union Station.
  • Sponsors: WSS Data Collection Methods, WSS Agriculture and Natural Resources and DC-AAPOR
  • Presentation material:
    Slides from Roberts presentation (pdf, ~3.1mb)
    Slides from Mueller presentation (pdf, ~14.8mb)

Abstract:

This presentation will describe methods of creating geographic data, storing it in a database, and displaying it for analysis. We will detail methods of data collection, what attributes differentiate GIS data from other types of data, and how best to format the data for storage in a database.

Once the data are geographically referenced in a database, we will further explore how these GIS data can be accessed and displayed with data from other sources to enhance their usability. These other sources of data can be internal or external to your organization. We will discuss some of these external data sites as well as detail how they disseminate their GIS data.

For further information contact Carol Joyce Blumberg at carol.blumberg@eia.doe.gov or (202) 586-6565.

Return to top

Title: Frailty Modeling via the Empirical Bayes Hastings Sampler

  • Speaker: Prof. Richard Levine, San Diego State University
  • Date/Time: Wednesday, December 16, 2009, 3pm (Note Time and Room Change)
  • Location: Colloquium Room 3206, Math Bldg, University of Maryland College Park
  • Sponsor: University of Maryland, Statistics Program

Abstract:

Studies of ocular disease and analyses of time to disease onset are complicated by the correlation expected between the two eyes from a single patient. We overcome these statistical modeling challenges through a nonparametric Bayesian frailty model. While this model suggests itself as a natural one for such complex data structures, model fitting routines become overwhelmingly complicated and computationally intensive given the nonparametric form assumed for the frailty distribution and baseline hazard function. We consider empirical Bayesian methods to alleviate these difficulties through a routine that iterates between frequentist, data-driven estimation of the cumulative baseline hazard and Markov chain Monte Carlo estimation of the frailty and regression coefficients. We show both in theory and through simulation that this approach yields consistent estimators of the parameters of interest. We then apply the method to the short-wave automated perimetry (SWAP) data set to study risk factors of glaucomatous visual field deficits.

Directions to Campus: http://www.math.umd.edu/department/campusmap.shtml
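The within-patient correlation that motivates the frailty model is easy to see in simulation. The following is a minimal parametric gamma-frailty toy model (not the nonparametric empirical Bayes sampler of the talk): each patient carries a shared frailty that multiplies the hazard for both eyes, inducing positive correlation between the two failure times.

```python
import numpy as np

rng = np.random.default_rng(1)
n, shape = 5_000, 2.0
# Shared frailty per patient, gamma with mean 1 (a common convention).
z = rng.gamma(shape, scale=1.0 / shape, size=n)

# Conditional on z, each eye's failure time is exponential with hazard z.
t_left = rng.exponential(1.0 / z)
t_right = rng.exponential(1.0 / z)

# The shared frailty induces positive correlation between the two eyes.
corr = np.corrcoef(np.log(t_left), np.log(t_right))[0, 1]
```

Ignoring this correlation (e.g., by treating the two eyes as independent) understates the uncertainty in fitted risk factors, which is the complication the frailty model addresses.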

Return to top


Title: Comparing the Census Bureau's Master Address File (MAF) with both Fresh Area Listing and Commercial Address Lists

  • Speakers: Clifford Loudermilk and Timothy Kennel, Mathematical Statisticians, U.S. Census Bureau
  • Organizer: David Judkins, Senior Statistician, Westat
  • Chair: TBD
  • Discussant: TBD
  • Date/Time: Thursday, December 17, 2009 / 12:00 - 1:30 p.m.
  • Location: Bureau of Labor Statistics, Conference Center. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Video Link: Westat, Rockville Office. On a trial basis, Westat is opening up its conference center for watching the lecture remotely. Reservation required. Call Fran Winter at 301-294-4419.
  • Sponsor: Methodology Program, WSS

Abstract:

This is an expanded version of two talks from the JSM. In the first talk, Loudermilk reports on joint work with Mei Li to assess the suitability of the MAF as a replacement frame for current surveys at the Census Bureau, such as the Current Population Survey. They used fresh area listings for this purpose. In the second talk, Kennel reports on joint work with Mei Li to compare the coverage of a commercially available address list both with that of the MAF and with that of the same fresh area listings produced to study MAF coverage. Together, these talks should be of high interest to sampling statisticians both inside and outside the federal government.

Return to top

Title: Probability of Detecting Disease-Associated SNPs in Genome-Wide Association Studies

  • Speaker: Ruth Pfeiffer, PhD, Biostatistics Branch, Div. of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
  • Date/Time: Friday, December 18, 2009 / 12:15 - 1:30 p.m.
  • Location: Executive Plaza North, 6130 Executive Boulevard, Room 319, Rockville, MD. Photo ID and sign-in required. Metro: Get off at the White Flint stop on the Red Line and take Nicholson Lane to Executive Boulevard. Make a right and continue, crossing Old Georgetown Road. When the road begins to bend to the right at the tree foliage, you enter the Executive Plaza complex parking lot. EPN is the rightmost of the two twin buildings. Map: http://dceg.cancer.gov/images/localmap.gif
  • Sponsor: Public Health and Biostatistics Section, WSS and the NCI

Abstract:

Some case-control genome-wide association studies (GWASs) select promising single nucleotide polymorphisms (SNPs) by ranking corresponding p-values, rather than by applying the same p-value threshold to each SNP. For such a study, we define the detection probability (DP) for a specific disease-associated SNP as the probability that the SNP will be "T-selected", namely have one of the top T largest chi-square values for trend tests of association. Unlike power calculations, DP reflects the ranking of p-values among various SNPs. We study DP analytically and via simulations, for fixed and random effects models of genetic risk. For a genetic odds ratio per minor allele of 1.2 or less, even a GWAS with 1,000 cases and 1,000 controls requires T to be impractically large to achieve an acceptable DP. We extend these results to two-stage GWASs, in which all SNPs are analyzed in stage 1 and the SNPs with the smallest p-values are then followed up in a second stage. Large sample sizes for stage 1 are required to achieve acceptable DP, and one-stage designs can be recommended in many settings. We also compared several procedures for combining GWAS data from several different studies, both in terms of the power to detect a disease-associated SNP while controlling the genome-wide significance level and in terms of the detection probability.

This is joint work with Mitchell Gail.
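The detection probability defined above can be approximated by direct Monte Carlo. In this rough sketch (the SNP count, cutoff T, noncentrality, and replicate count are arbitrary choices, not figures from the talk), the associated SNP's trend statistic is T-selected exactly when it exceeds the T-th largest of the null statistics.

```python
import numpy as np

rng = np.random.default_rng(2)
M, T, ncp, reps = 10_000, 100, 20.0, 500  # null SNPs, cutoff, effect, replicates

hits = 0
for _ in range(reps):
    null = rng.chisquare(1, M)                # 1-df trend statistics for null SNPs
    assoc = rng.noncentral_chisquare(1, ncp)  # statistic for the associated SNP
    # T-th largest null statistic: assoc ranks in the top T iff it exceeds it.
    cutoff = np.partition(null, M - T)[M - T]
    hits += int(assoc > cutoff)

dp = hits / reps  # Monte Carlo estimate of the detection probability
```

Sweeping T (or the noncentrality implied by the odds ratio and sample size) in such a simulation reproduces the qualitative finding above: for small effect sizes, T must grow impractically large before DP becomes acceptable.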

Return to top
