Washington Statistical Society on Meetup

Washington Statistical Society Seminars: 2012

January 2012
6 Fri. - George Washington University, The Institute for Integrating Statistics in Decision Sciences & Department of Decision Statistics - Markov Chain Monte Carlo for Inference on Phase-Type Models
24 Tues. - Calibration Adjustment for Nonresponse in Cross-Classified Data
27 Fri. - George Washington University, Department of Statistics - Estimating Relative Risks for Longitudinal Binary Response Data
February 2012
7 Tues. - Design Effects for Unequal Weighting
7 Tues. - George Washington University, Department of Statistics - Simulation-Based Maximum Likelihood Inference For Partially Observed Markov Process Models
9 Thur. - University of Maryland, Department of Statistics - Monotonicity in the Sample Size of the Length of Classical Confidence Intervals
10 Fri. - George Washington University, Department of Statistics - Statistical Methods for Dynamic Models with Application Examples
10 Fri. - George Mason University, CDS/CCDS/Statistics Colloquium Series Seminar - Visual Clustering with Quantized Generalized Parallel Coordinates
16 Thur. - University of Maryland, Department of Statistics - An Accurate Genetic Clock And The Third Moment
17 Fri. - George Washington University, Department of Statistics - Matern Class of Cross-Covariance Functions for Multivariate Random Fields
21 Tues. - George Washington University, The Institute for Integrating Statistics in Decision Sciences & Department of Decision Statistics - Providers' Profiling for Supporting Decision Makers in Cardiovascular Healthcare
23 Thur. - University of Maryland, Department of Statistics - Probabilistic Hashing Methods for Fitting Massive Logistic Regressions and SVM with Billions of Variables
24 Fri. - Cancer As A Failure Of Multicellularity: The Role Of Cellular Evolution
29 Wed. - Geo-Spatial tools in Abu Dhabi Census 2011
March 2012
7 Wed. - George Washington University, The Institute for Integrating Statistics in Decision Sciences & Department of Statistics - Business Analytics Degrees: Disruptive Innovation or Passing Fad?
9 Fri. - George Washington University, The Institute for Integrating Statistics in Decision Sciences & Department of Decision Statistics - Dynamic Multiscale Spatio-Temporal Models for Gaussian Areal Data
15 Thur. - Measuring Household Relationships in Federal Surveys
15 Thur. - University of Maryland, Department of Statistics - Generalized P-values: Theory and Applications
23 Fri. - Georgetown University, Department of Mathematics & Statistics - Opportunities in Mathematical and Statistical Sciences
23 Fri. - Georgetown University, Department of Biostatistics, Bioinformatics and Biomathematics - Differential principal component analysis of ChIP-seq
23 Fri. - George Washington University, The Institute for Integrating Statistics in Decision Sciences & Department of Decision Statistics - Semi-parametric Bayesian Modeling of Spatiotemporal Inhomogeneous Drift Diffusions in Single-Cell Motility
23 Fri. - George Washington University, Department of Statistics - Current Challenges in Mathematical Genomics: A study on using Olfactory Receptors (ORs)
30 Fri. - JPSM Distinguished Lecture Series - Do Survey Respondents Lie? Situated Cognition and Socially Desirable Responding
April 2012
3 Tues. - Demographic Statistical Methods Division Distinguished Seminar Series - Replication Methods for Variance Estimation with Survey Data
6 Fri. - George Washington University, Department of Statistics - Bivariate/Multivariate Markers and ROC Analysis
9 Mon. - American University, Info-Metrics Institute - Detection of Structural Breaks and Outliers in Time Series
12 Thur. - Roger Herriot Award Lecture - Statipedia at 1½ Years: What's Working & What's Not
12 Thur. - University of Maryland, Department of Statistics - On Modeling and Estimation of Response Probabilities when Missing Data are Not Missing at Random
13 Fri. - George Washington University, Department of Statistics - A Model-based Approach to Limit of Detection in Studying Persistent Environmental Chemicals Exposures and Human Fecundity
13 Fri. - George Washington University, The Institute for Integrating Statistics in Decision Sciences, Department of Decision Sciences & Department of Decision Statistics - Optimal Stopping Problem for Stochastic Differential Equations with Random Coefficients
17 Tues. - The New FERPA
18 Wed. - Estimating the Binomial N
19 Thur. - Measuring Sexual Identity in Federal Surveys
20 Fri. - George Washington University, Department of Statistics - On Statistical Inference in Meta-Regression
25 Wed. - Transitioning to the New American FactFinder, a half-day training session
26 Thur. - American University, Info-Metrics Institute - Simulation based Bayes Procedures for Three 21st Century Key Research Issues
26 Thur. - University of Maryland, Department of Statistics - Bivariate Nonparametric Maximum Likelihood Estimator With Right Censored Data
27 Fri. - George Washington University, Department of Statistics, Department of Mathematics & Department of Decision Science - Subjective Probability: Its Axioms and Acrobatics
27 Fri. - George Washington University, The Institute for Integrating Statistics in Decision Sciences & Department of Decision Statistics - Information about Dependence in the Absence and Presence of a Probable Cause
May 2012
9 Wed. - Introduction to Statistics Without Borders and Discussion of the Global Citizen Year Project
11 Fri. - The National Academies Committee On National Statistics, Public Seminar - The Future of Social Science Data Collection
11 Fri. - George Washington University, The Institute for Integrating Statistics in Decision Sciences, Department of Decision Sciences & Department of Decision Statistics - Analysis of Multi-server Ticket Queues with Customer Abandonment
23 Wed. - 2012 President's Invited Seminar - State of the Statistical System
June 2012
11 Mon. - Evaluating the Environmental Protection Agency's Leadership Development Workshops
14 Thur. - June 2012 AIR Psychometrician Group Meeting
19 Tues. - Nonresponse Modeling in Repeated Independent Surveys in a Closed Stable Population
August 2012
8 Wed. - The Bayesian Paradigm for Quantifying Uncertainty
September 2012
6 Thur. - University of Maryland, Department of Statistics - Bayesian Quantile Regression with Endogenous Censoring
12 Wed. - American University, Info-Metrics Institute - The Measurement and Behavior of Uncertainty: Evidence from the ECB Survey of Professional Forecasters
13 Thur. - University of Maryland, Department of Statistics - Flexible Bayesian Models for Process Monitoring of Paradata Survey Quality Indicators
14 Fri. - George Washington University, Department of Statistics - Small Area Confidence Bounds on Small Cell Proportions in Survey Populations
14 Fri. - George Mason University, Department of Statistics - Coalescence in Branching Processes
20 Thur. - University of Maryland, Department of Statistics - Adding One More Observation to the Data
21 Fri. - George Washington University, The Institute for Integrating Statistics in Decision Sciences & Department of Decision Statistics - Hospitals clustering via semiparametric Bayesian models: Model based methods for assessing healthcare performance
25 Tues. - American University, Department of Mathematics and Statistics Colloquium - The Mismeasure of Group Differences in the Law and the Social and Medical Sciences
28 Fri. - George Washington University, Department of Statistics - Interdisciplinary Methods for Prediction and Confidence Sets
28 Fri. - George Mason University, Department of Statistics - Examining Moderated Effects of Additional Adolescent Substance Use Treatment: Structural Nested Mean Model Estimation using Inverse-Weighted Regression-With-Residuals
October 2012
4 Thur. - University of Maryland, Department of Statistics - Inference for High Frequency Financial Data: Local Likelihood and Contiguity
5 Fri. - George Mason University, Department of Statistics - Assessing the Relative Performance of Absolute Penalty and Shrinkage Estimation in Weibull Censored Regression Models
9 Tues. - 22nd Morris Hansen Lecture
11 Thur. - Statistics and Audit Sampling with Application to the Eloise Cobell Indian Trust Case
11 Thur. - University of Maryland, Department of Statistics - Some Recent Developments of the Support Vector Machine
12 Fri. - Longitudinal High-Dimensional Data Analysis
12 Fri. - Georgetown University, Department of Biostatistics, Bioinformatics and Biomathematics - Robust Statistics and Applications
18 Thur. - George Washington University, The Institute for Integrating Statistics in Decision Sciences & Department of Decision Sciences - Extropy: A complementary dual of entropy
18 Thur. - University of Maryland, Department of Statistics - A Statistical Paradox
19 Fri. - George Washington University, The Institute for Integrating Statistics in Decision Sciences, Department of Decision Sciences & Department of Decision Statistics - Kolmogorov Stories
19 Fri. - George Mason University, Department of Statistics - Detection of Structural Breaks and Outliers in Time Series
22 Mon. - Weight calibration and the survey bootstrap
22 Mon. - U.S. Census Bureau, DSMD Distinguished Seminar Series - Uses of Models in Survey Design and Estimation
23 Tues. - 2012 Herriot Award - Issues in the Evaluation of Data Quality for Business Surveys
25 Thur. - University of Maryland, Department of Statistics - On the Nile problem by Ronald Fisher
26 Fri. - George Washington University, The Institute for Integrating Statistics in Decision Sciences & Department of Decision Sciences - Modeling of Complex Stochastic Systems via Latent Factors
26 Fri. - Georgetown University, Department of Biostatistics, Bioinformatics and Biomathematics - Marginal Additive Hazards Model for Case-cohort Studies with Multiple Disease Outcomes
26 Fri. - George Mason University, Department of Statistics - Sparse estimation for estimating equations using decomposable norm-based regularizers
31 Wed. - American University, Info-Metrics Institute - On the Foundations and Philosophy of Info-Metrics
November 2012
2 Fri. - Case Studies in Nutrition and Disease Prevention: what went wrong?
2 Fri. - George Mason University, Department of Statistics - Jigsaw Percolation: Which networks can solve a puzzle?
2 Fri. - University of Maryland, Baltimore & University of Maryland Marlene and Stewart Greenebaum Cancer Center - Molecular Gene-signatures and Cancer Clinical Trials
8 Thur. - Privacy-Utility Paradigm using Synthetic Data
8 Thur. - University of Maryland, Department of Statistics - Penalized Quantile Regression in Ultra-high Dimensional Data
9 Fri. - Georgetown University, Department of Biostatistics, Bioinformatics and Biomathematics - Some Statistical Issues in Diagnostic Studies with Three Ordinal Disease Stages
9 Fri. - George Mason University, Department of Statistics - On the dynamic control of matching queues
14 Wed. - Adjusting for Nonresponse in the Occupational Employment Statistics Survey
15 Thur. - University of Maryland, Department of Statistics - Quality Assurance Tests of Tablet Content Uniformity: Small Sample US Pharmacopeia and Large Sample Tests
16 Fri. - George Mason University, Department of Statistics - Sequential Tests of Multiple Hypotheses
26 Mon. - Statistical Confidentiality: Modern Techniques to Protect Sensitive Cells when Publishing Tables
29 Thur. - University of Maryland, Department of Statistics - Molecular Gene-signatures and Cancer Clinical Trials
30 Fri. - Using Safety Signals To Detect Subpopulations: A Population Pharmacokinetic/Pharmacodynamic Mixture Modeling Approach
30 Fri. - George Washington University, The Institute for Integrating Statistics in Decision Sciences & Department of Decision Statistics - Parametric and Topological Inference for Masked System Lifetime Data
30 Fri. - Georgetown University, Department of Mathematics & Statistics - Adaptive Inference After Model Selection
30 Fri. - George Washington University, Department of Statistics - The Under-Appreciation of the Insights Provided by Non-parametric and Robust Methods in the Analysis of Data Arising in Law and Public Policy
30 Fri. - George Mason University, Department of Statistics - Recent Developments in Machine Learning and Personalized Medicine
December 2012
7 Fri. - George Mason University, Department of Statistics - Improving the Design and Analysis of Case-Control Studies of Rare Variation in the Presence of Confounders
10 Mon. - U.S. Census Bureau, DSMD Distinguished Seminar Series - Adjustment and Stabilization: Identifying and Meeting Goals
11 Tues. - Uses of Models in Survey Design and Estimation


Title: Markov Chain Monte Carlo for Inference on Phase-Type Models

  • Speaker: Simon Wilson, School of Computer Science and Statistics, Trinity College Dublin, Ireland
  • Time: Friday, January 6th 11:15 am -12:15 pm
  • Place: Duques 553 (2201 G Street, NW, Washington, DC 20052). Followed by wine and cheese reception.
  • Directions: Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/explore/visitingcampus/campusmaps.
  • Sponsor: The George Washington University, The Institute for Integrating Statistics in Decision Sciences and the Department of Decision Statistics. See http://business.gwu.edu/decisionsciences/i2sds/seminars.cfm for a list of seminars.

Abstract:

Bayesian inference for phase-type distributions is considered when data consist only of absorption times. Extensions to the methodology developed by Bladt et al. (2003) are presented which enable specific structure to be imposed on the underlying continuous-time Markov process and expand computational tractability to a wider class of situations.

The conditions for maintaining conjugacy when structure is imposed are shown. Part of the original algorithm involves simulation of the unobserved Markov process and the main contribution is resolution of computational issues which can arise here. Direct conditional simulation, together with exploiting reversibility when available underpin the changes. Ultimately, several variants of the algorithm are produced, their relative merits explained and guidelines for variant selection provided.

The extended methodology thus advances modelling and tractability of Bayesian inference for phase-type distributions where there is direct scientific interest in the underlying stochastic process: the added structural constraints more accurately represent a physical process, and the computational changes make the technique practical to implement. A simple application to a repairable redundant electronic system, in which ultimate system failure (as opposed to individual component failure) comprises the data, is presented. This provides one example of a class of problems for which the extended methodology improves both parameter estimates and computational speed.
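To illustrate the setting, the sketch below (a generic illustration, not the speaker's algorithm) simulates absorption times from a phase-type distribution given a hypothetical sub-generator matrix; data of exactly this form, with the underlying Markov path unobserved, are the input to the MCMC methodology discussed.

```python
import random

def simulate_absorption_time(S, rng):
    """Draw one absorption time of a CTMC with sub-generator S over the
    transient states; the chain starts in state 0, and the absorption
    rate from state i is -sum(S[i])."""
    m = len(S)
    state, t = 0, 0.0
    while True:
        rate = -S[state][state]            # total exit rate of the current state
        t += rng.expovariate(rate)         # exponential holding time
        u, acc = rng.random(), 0.0
        nxt = None
        for j in range(m):                 # jump to another transient state...
            if j != state:
                acc += S[state][j] / rate
                if u < acc:
                    nxt = j
                    break
        if nxt is None:                    # ...or be absorbed
            return t
        state = nxt

# Erlang(2) as a phase-type distribution: two sequential rate-1 phases
S = [[-1.0, 1.0],
     [0.0, -1.0]]
rng = random.Random(1)
times = [simulate_absorption_time(S, rng) for _ in range(20000)]
print(sum(times) / len(times))             # close to the true mean, 2.0
```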


Title: Calibration Adjustment for Nonresponse in Cross-Classified Data

  • Organizer: Charles Day, WSS Methodology Program Chair
  • Chair: Charles Day, WSS Methodology Program Chair
  • Speaker: Dr. Gretchen Falk, Ernst & Young
  • Date & Time: Tuesday, January 24, 12:30pm-2:00 pm
  • Location: Bureau of Labor Statistics, Conference Center Room 10
    To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: WSS Methodology Program

Abstract:

In the interest of accurately estimating a parameter of interest, generally a population total, calibration is a method that adjusts the sampling weight of each selected element so that the adjusted estimates of the totals of auxiliary, or benchmark, variables equal the known population totals. Calibration has been used to adjust sampling weights for frame undercoverage and nonresponse. To treat nonresponse, under the quasi-randomization model assumptions, the sample of respondents is treated as an additional phase of sampling, where the probabilities of response are estimated from a set of model variables. Under this model and varying response probability assumptions, we explore a special case of the calibration method to treat doubly cross-classified data that uses characteristics of the classification structure as the benchmark and model variables. The resulting calibration estimator can be calculated regardless of the minimum sample size over the classification groups and without requiring the collapse of cells, which is its advantage over the poststratified estimator. The theoretical behavior of this special case of the calibration estimator is determined. Empirical results supporting the theoretical conclusions, together with comparisons of various estimators, are presented and discussed. Finally, further research regarding this special case of the calibration method is discussed and general conclusions are made.
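As a toy illustration of the general idea (not the estimator studied in the talk), the sketch below rakes respondent weights in a two-way cross-classification so that the weighted margins match known population row and column totals; all numbers are made up.

```python
def rake(weights, rows, cols, row_totals, col_totals, iters=100):
    """Iterative proportional fitting: rescale respondent weights until the
    weighted row and column margins match the benchmark totals."""
    w = list(weights)
    for _ in range(iters):
        for r, target in row_totals.items():
            cur = sum(wi for wi, ri in zip(w, rows) if ri == r)
            w = [wi * target / cur if ri == r else wi for wi, ri in zip(w, rows)]
        for c, target in col_totals.items():
            cur = sum(wi for wi, ci in zip(w, cols) if ci == c)
            w = [wi * target / cur if ci == c else wi for wi, ci in zip(w, cols)]
    return w

# four respondents cross-classified in a 2x2 table, equal base weights
w = rake(weights=[1.0, 1.0, 1.0, 1.0],
         rows=[0, 0, 1, 1], cols=[0, 1, 0, 1],
         row_totals={0: 6.0, 1: 4.0}, col_totals={0: 5.0, 1: 5.0})
print(w)    # margins now match the benchmarks
```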


Title: Estimating Relative Risks for Longitudinal Binary Response Data

Abstract:

Logistic regression is the dominant modeling technique for measuring the risk of exposure or treatment on binary responses. The measure of risk in a logistic regression is the odds ratio (OR), which is also valid in retrospective studies. Nevertheless, the relative risk (RR) is often the preferred measure of exposure effect because it is more interpretable. When the prevalence is low, the OR is a good approximation to the RR. Their difference, however, is large for common responses, in which case the log-binomial model is more desirable. Although various techniques have been developed to estimate the RR for data from cross-sectional studies, no statistical method has been available for estimating the RR in longitudinal studies. To address this issue, we developed log-binomial regression models for longitudinal binary response data. We consider both the marginal model and the random-effects model. The generalized estimating equation with the COPY method is used to fit the marginal log-binomial model, and the Bayesian Markov chain Monte Carlo method is used to obtain the parameter estimates for the random-effects log-binomial model. The performance of the proposed methods is evaluated and compared with competing methods through a large-scale simulation study. The usefulness of the methods is illustrated with data from a respiratory disorder study.
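The OR-versus-RR point can be seen in a generic textbook calculation (not the authors' method): for a 2x2 table, the two measures nearly agree when the outcome is rare but diverge sharply when it is common.

```python
def odds_ratio(a, b, c, d):
    """2x2 table: exposed group (a events, b non-events),
    unexposed group (c events, d non-events)."""
    return (a / b) / (c / d)

def relative_risk(a, b, c, d):
    return (a / (a + b)) / (c / (c + d))

# rare outcome: the OR is a good approximation to the RR
print(relative_risk(10, 990, 5, 995))   # 2.0
print(odds_ratio(10, 990, 5, 995))      # about 2.01

# common outcome: the OR overstates the RR badly
print(relative_risk(60, 40, 40, 60))    # 1.5
print(odds_ratio(60, 40, 40, 60))       # 2.25
```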


Title: Design Effects for Unequal Weighting

  • Organizer: Charles Day, WSS Methodology Program Chair
  • Chair: Charles Day, WSS Methodology Program Chair
  • Speaker: Dr. Kim Henry, Statistics of Income Division, IRS
  • Discussant: Vince Iannacchione, RTI International
  • Date & Time: Tuesday, February 7, 12:30pm-2:00 pm
  • Location: Bureau of Labor Statistics, Conference Center Room 10
    To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: WSS Methodology Program

Abstract:

Different approaches have been developed to summarize the impact of differential weighting in survey samples. The most popular measure is Kish's (1965, 1992) design-based design effect. Spencer (2000) proposed a simple model-based approach that depends on a single covariate to estimate the impact of variable weights on variance. Neither measure may accurately capture the design effect of unequal weighting induced by calibration adjustments. When the calibration covariates are correlated with the coverage/response mechanism, calibration weights can improve the MSE of an estimator. However, since calibration involves unit-level adjustments, in many applications it produces weights that are more variable than the base weights or than weights from category-based nonresponse or poststratification adjustments. The Kish and Spencer measures may not be appropriate here; an ideal measure of the impact of unequal calibration weights incorporates both the correlation between the survey variable and the weights and the correlation between the survey variable and the calibration covariates. We propose a model-based extension of the Spencer design effect for different variables of interest in single-stage and cluster sampling and under calibration weight adjustments. The proposed methods are illustrated using complex sample case studies.
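For background, Kish's unequal-weighting design effect is the standard formula deff = 1 + cv²(w) = n·Σw² / (Σw)², which is straightforward to compute:

```python
def kish_deff(weights):
    """Kish (1965) design effect from unequal weighting:
    1 + relvariance of the weights = n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2

print(kish_deff([1.0, 1.0, 1.0]))   # 1.0: equal weights, no variance inflation
print(kish_deff([1.0, 3.0]))        # 1.25
```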


Title: Simulation-Based Maximum Likelihood Inference For Partially Observed Markov Process Models

Abstract:

Estimation of static (or time constant) parameters in a general class of nonlinear, non-Gaussian, partially observed Markov process models is an active area of research. In recent years, simulation-based techniques have made estimation and inference feasible for these models and have offered great flexibility to the modeler. An advantageous feature of many of these techniques is that there is no requirement to evaluate the state transition density of the model, which is often high-dimensional and unavailable in closed-form. Instead, inference can proceed as long as one is able to simulate from the state transition density - often a much simpler problem. In this talk, we introduce a simulation-based maximum likelihood inference technique known as iterated filtering that uses an underlying sequential Monte Carlo (SMC) filter. We discuss some key theoretical properties of iterated filtering. In particular, we prove the convergence of the method and establish connections between iterated filtering and well-known stochastic approximation methods. We then use the iterated filtering technique to estimate parameters in a nonlinear, non-Gaussian mechanistic model of malaria transmission and answer scientific questions regarding the effect of climate factors on malaria epidemics in Northwest India. Motivated by the challenges encountered in modeling the malaria data, we conclude by proposing an improvement technique for SMC filters used in an off-line, iterative setting.
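As background, a minimal bootstrap particle filter (the generic SMC filter underlying methods like iterated filtering, not the speaker's implementation) needs only the ability to simulate from the state transition density, plus an evaluable observation density; the model below is a hypothetical toy AR(1) example.

```python
import bisect
import math
import random

def bootstrap_filter(ys, n, propagate, log_obs, init, rng):
    """Estimate the log-likelihood of observations ys with n particles.
    propagate(x, rng) simulates the state transition; log_obs(y, x) evaluates
    the observation log-density; init(rng) draws an initial state."""
    xs = [init(rng) for _ in range(n)]
    loglik = 0.0
    for y in ys:
        xs = [propagate(x, rng) for x in xs]        # simulate transitions only
        ws = [math.exp(log_obs(y, x)) for x in xs]  # importance weights
        loglik += math.log(sum(ws) / n)             # likelihood increment
        cum, acc = [], 0.0                          # multinomial resampling
        for wt in ws:
            acc += wt
            cum.append(acc)
        xs = [xs[bisect.bisect_left(cum, rng.random() * acc)] for _ in range(n)]
    return loglik

def log_norm(z):
    """Standard normal log-density."""
    return -0.5 * z * z - 0.5 * math.log(2 * math.pi)

# toy AR(1) state with Gaussian observations
rng = random.Random(7)
ll = bootstrap_filter(
    ys=[0.3, -0.1, 0.5], n=500,
    propagate=lambda x, r: 0.9 * x + r.gauss(0, 1),
    log_obs=lambda y, x: log_norm(y - x),
    init=lambda r: r.gauss(0, 1), rng=rng)          # SMC log-likelihood estimate
```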


Title: Monotonicity in the Sample Size of the Length of Classical Confidence Intervals

  • Speaker: Dr. Yaakov Malinovsky, Dept. of Math. and Stat., UMBC
  • Date/Time: February 9, 2012, 3:30pm
  • Location: Room 1313, Math Building, University of Maryland College Park (directions).
  • Sponsor: University of Maryland, Statistics Program (seminar updates).

Abstract:

It is proved that the average length of standard confidence intervals for parameters of gamma and normal distributions monotonically decreases with the sample size. Though monotonicity seems a very natural property, the proofs are based on fine properties of the classical gamma function and are of independent interest. (This is joint work with Abram Kagan.)
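For the normal-mean case, the claim can be checked numerically (an independent check, not part of the paper's proof): the expected length of the classical 95% t-interval is E[length] = 2·t_{0.975,n-1}·E[S]/√n with E[S] = √(2/(n-1))·Γ(n/2)/Γ((n-1)/2)·σ, using tabulated t quantiles.

```python
import math

# two-sided 97.5% t quantiles by degrees of freedom (standard tables)
T975 = {2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447,
        7: 2.365, 8: 2.306, 9: 2.262}

def expected_ci_length(n, sigma=1.0):
    """Expected length of the 95% CI for a normal mean, variance unknown."""
    e_s = sigma * math.sqrt(2.0 / (n - 1)) * math.exp(
        math.lgamma(n / 2) - math.lgamma((n - 1) / 2))   # E[S]
    return 2.0 * T975[n - 1] * e_s / math.sqrt(n)

lengths = [expected_ci_length(n) for n in range(3, 11)]
print(all(a > b for a, b in zip(lengths, lengths[1:])))  # True: monotone decreasing
```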


Title: Statistical Methods for Dynamic Models with Application Examples

Abstract:

A dynamical system in engineering and physics, specified by a set of differential equations, is usually used to describe a dynamic process that follows physical laws or engineering principles. The parameters in the dynamical system are usually assumed known. An interesting question, however, is how to estimate these parameters when they are not known in advance. In this talk, I present two examples in which various statistical methods are applied to dynamic models to estimate unknown parameters from observed data. Ultimately, we are interested in predicting the future behavior of the dynamic system. The first example concerns modeling HIV viral load dynamics from a clinical trial study; the second concerns modeling a complicated interactive network.
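A minimal, generic version of this estimation problem (not the speaker's models): given noisy observations of a process obeying dx/dt = -θx, recover θ by least squares. A crude grid search stands in for a real optimizer, and all data below are simulated.

```python
import math
import random

def trajectory(theta, x0, ts):
    """Closed-form solution of dx/dt = -theta * x."""
    return [x0 * math.exp(-theta * t) for t in ts]

def fit_theta(ts, ys, x0, grid):
    """Least-squares estimate of theta over a candidate grid."""
    def sse(theta):
        return sum((y - x) ** 2 for y, x in zip(ys, trajectory(theta, x0, ts)))
    return min(grid, key=sse)

rng = random.Random(3)
ts = [0.5 * k for k in range(11)]
ys = [x + rng.gauss(0, 0.02) for x in trajectory(0.7, 1.0, ts)]  # noisy data
grid = [0.01 * k for k in range(1, 201)]
theta_hat = fit_theta(ts, ys, 1.0, grid)
print(theta_hat)    # close to the true value, 0.7
```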


Title: Visual Clustering with Quantized Generalized Parallel Coordinates

  • Speaker: Rida Moustafa, Ph.D., dMining-Technology (dMT)
  • Time: 10:30 a.m. Refreshments, 10:45 a.m. Colloquium Talk
  • Date: Friday, February 10, 2012
  • Location: Research 1, Room 301, Fairfax Campus, George Mason University, 4400 University Drive, Fairfax, VA 22030
  • Sponsor: George Mason University CDS/CCDS/Statistics Colloquium

Abstract:

Visual pattern discovery in large multivariate datasets is a challenging problem in the fields of data mining and exploratory data analysis. This is due, in part, to the visual cluttering problem, which depends on screen resolutions and the number of points. The cluttering defies most information visualization techniques in general and parallel coordinates in particular. The cluttering effect increases with the number of data records, which makes the visual detection of hidden clusters, trends, correlations, periodicity, and anomalies even more difficult.

In this talk we discuss our hybrid plot, the quantized generalized parallel coordinate plot (QGPCP). The QGPCP detects the frequency of the profile lines (or curves), which represent the multivariate observations in parallel coordinate space, and maps this frequency into a gray (or HSV) scale color to highlight the profile lines (or curves) in a crowded GPCP. The approach has shown great success in mitigating cluttering and detecting clusters in very large data, not only in parallel coordinates but also in the Andrews plot and the scatterplot matrix. We demonstrate the QGPCP on cluster tracking and visualization with remote sensing, computer network, and housing data sets.
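The core quantization step can be sketched as follows (our reconstruction of the general idea, not the authors' code): quantize each axis into a small number of bins and count how many profile lines share each quantized segment between adjacent axes; mapping the counts to gray levels then makes dense line bundles stand out.

```python
from collections import Counter

def segment_frequencies(data, n_bins, lo, hi):
    """Count quantized profile-line segments between adjacent parallel axes."""
    def q(v):                                   # bin index of a coordinate
        b = int((v - lo) / (hi - lo) * n_bins)
        return min(max(b, 0), n_bins - 1)
    freq = Counter()
    for row in data:                            # one row = one profile line
        bins = [q(v) for v in row]
        for axis in range(len(bins) - 1):
            freq[(axis, bins[axis], bins[axis + 1])] += 1
    return freq

data = [[0.10, 0.90], [0.15, 0.85], [0.80, 0.20]]
freq = segment_frequencies(data, n_bins=2, lo=0.0, hi=1.0)
peak = max(freq.values())
gray = {seg: c / peak for seg, c in freq.items()}   # darker = more lines share it
```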


Title: An Accurate Genetic Clock And The Third Moment

  • Speaker: Prof. David Hamilton, (UMCP)
  • Date/Time: February 16, 2012, 3:30pm
  • Location: Room 1313, Math Building, University of Maryland College Park (directions).
  • Sponsor: University of Maryland, Statistics Program (seminar updates).

Abstract:

The genetic clock uses mutations at molecular markers to estimate the time T1 of origin of a population. It has become important in the evolution of species and diseases, forensics, history and genealogy. However, the two types of methods in use yield very different estimates even from the same data: for humans at about 10,000 ybp, "mean square estimates" (MSE) give results about 100% larger than "Bayesian analysis of random trees" (BAT).

Also, the SDs are about 50% of T1. (In the last 500 years, all methods give similar and accurate results.) Our new theory explains why MSE overestimates by about 50%, while BAT underestimates by about 25%. This is not just a mathematical problem but involves two quite different physical phenomena. The first comes from the mutation process itself. The second is macroscopic and arises from the reproductive dominance of elite lineages. Our method deals with both, giving 15% accuracy at 10,000 ybp. This is precise enough to resolve a question first mentioned in Genesis and argued over by archeologists and linguists (and Nazis): the origin of the Europeans. The theory depends on solving a stochastic system of infinite-dimensional ODEs by hyperbolic Bessel functions. At the heart is a new inequality for probability distributions P normalized to mean μ = 0 and variance σ² = 1: if the third moment μ₃ > 0, then P(1, +∞) > 0.


Title: Probabilistic Hashing Methods for Fitting Massive Logistic Regressions and SVM with Billions of Variables

  • Speaker: Dr. Ping Li, Cornell University
  • Date/Time: February 23, 2012, 3:30pm
  • Location: Room 1313, Math Building, University of Maryland College Park (directions).
  • Sponsor: University of Maryland, Statistics Program (seminar updates).

Abstract:

In modern applications, many statistical tasks such as classification using logistic regression or SVM often encounter extremely high-dimensional massive data sets. In the context of search, certain industry applications have used data sets in 2^64 dimensions, which is larger than the square of a billion. This talk will introduce a recent probabilistic hashing technique called b-bit minwise hashing (Research Highlights in Comm. of the ACM, 2011), which has been used for efficiently computing set similarities in massive data. Most recently (NIPS 2011), we realized that b-bit minwise hashing can be seamlessly integrated with statistical learning algorithms such as logistic regression or SVM to solve extremely large-scale prediction problems. Interestingly, for binary data, b-bit minwise hashing is substantially more accurate than other popular methods such as random projections. Experimental results on 200GB of data (in billions of dimensions) will also be presented.
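A toy version of plain minwise hashing (the precursor of the b-bit variant, which keeps only the lowest b bits of each minimum and corrects for the resulting accidental collisions) estimates set resemblance as follows; the linear hash functions here give only approximate min-wise independence.

```python
import random

def make_hashes(k, seed, p=2**31 - 1):
    """k random linear hash functions x -> (a*x + c) mod p."""
    rng = random.Random(seed)
    coeffs = [(rng.randrange(1, p), rng.randrange(0, p)) for _ in range(k)]
    return [lambda x, a=a, c=c: (a * x + c) % p for a, c in coeffs]

def minhash_signature(s, hashes):
    return [min(h(x) for x in s) for h in hashes]

def estimate_resemblance(sig_a, sig_b):
    """Pr[min-hash collision] equals the Jaccard similarity of the two sets."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

A, B = set(range(0, 10)), set(range(5, 15))   # true Jaccard = 5/15 = 1/3
hashes = make_hashes(500, seed=0)
est = estimate_resemblance(minhash_signature(A, hashes),
                           minhash_signature(B, hashes))
print(est)    # close to 1/3
```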


Title: Matern Class of Cross-Covariance Functions for Multivariate Random Fields

  • Speaker: Dr. Tatiyana V Apanasovich, Thomas Jefferson University, Jefferson Medical College, Department of Pharmacology and Experimental Therapeutics, Division of Biostatistics
  • Date: Friday, February 17, 2012
  • Time: 11:00-12:00 noon
  • Location: Phillips Hall, Room 110 (801 22nd Street, NW, Washington, DC 20052)
  • Directions: Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/explore/visitingcampus/campusmaps.
  • Sponsor: The George Washington University, Department of Statistics. See http://departments.columbian.gwu.edu/statistics/academics/seminars for a list of seminars.

Abstract:

Data indexed by spatial coordinates have become ubiquitous in a large number of applications, for instance in environmental, climate and social sciences, hydrology and ecology. Recently, the availability of high-resolution microscopy together with advances in imaging technology has increased the importance of spatial data for detecting meaningful patterns as well as for making predictions in medical applications (brain imaging) and systems biology (images of fluorescently labeled proteins, lipids, DNA). The defining feature of multivariate spatial data is the availability of several measurements at each spatial location. Such data may exhibit not only correlation between variables at each site but also spatial correlation within each variable and spatial cross-correlation between variables at neighboring sites. Any analysis or modeling must therefore allow for flexible but computationally tractable specifications of the multivariate spatial effects processes. In practice we assume that such processes, possibly after some transformation, are not too far from Gaussian and are characterized well by the first two moments. The model for the mean follows from the context. The challenge, however, is to find a valid specification for cross-covariance matrices that is estimable and yet flexible enough to incorporate a wide range of correlation structures. Recent literature advocates the use of the Matern family for univariate processes. I will introduce a valid parametric family of cross-covariance functions for multivariate spatial random fields where each component has a covariance function from the Matern class (Apanasovich et al. (2012)). Unlike previous attempts, our model allows for various smoothnesses and rates of correlation decay for any number of vector components. Moreover, I will provide an example of modeling time-dependent spatial data with Matern covariances, where dependencies across space and time interact (Apanasovich (2012)).
Further, I will discuss models for multivariate response variables in both space and time, which include all possible interactions between space/time locations and variables (Apanasovich and Genton (2010)).

The application of the proposed methodologies will be illustrated on the datasets from environmental and soil sciences as well as meteorology and systems biology.
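For reference, the univariate Matern covariance has simple closed forms at the half-integer smoothness values most used in practice (standard formulas, given here only as background; the talk concerns the much harder multivariate cross-covariance case).

```python
import math

def matern(d, nu, rho, sigma2=1.0):
    """Matern covariance at distance d, smoothness nu, range rho."""
    t = math.sqrt(2.0 * nu) * d / rho
    if nu == 0.5:                       # exponential covariance
        return sigma2 * math.exp(-t)
    if nu == 1.5:
        return sigma2 * (1.0 + t) * math.exp(-t)
    if nu == 2.5:
        return sigma2 * (1.0 + t + t * t / 3.0) * math.exp(-t)
    raise ValueError("closed form implemented only for nu in {0.5, 1.5, 2.5}")

print(matern(0.0, 1.5, 1.0))            # 1.0 at distance zero
print(matern(1.0, 0.5, 1.0))            # exp(-1)
```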


Title: Providers' Profiling for Supporting Decision Makers in Cardiovascular Healthcare

  • Speaker: Francesca Ieva, Dipartimento di Matematica "F.Brioschi", Politecnico di Milano
  • Time: Tuesday, February 21st 11:00 am - 12:00 noon
  • Place: Funger 420 (2201 G Street, NW, Washington, DC 20052).
  • Directions: Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/explore/visitingcampus/campusmaps.
  • Sponsor: The George Washington University, The Institute for Integrating Statistics in Decision Sciences and the Department of Decision Statistics. See http://business.gwu.edu/decisionsciences/i2sds/seminars.cfm for a list of seminars.

Abstract:

Investigations of surgical performance have long used routinely collected clinical data to highlight unusual provider outcomes. In addition, there are a number of regular reports using routinely collected data to produce indicators for hospitals. As well as highlighting possible high- and low-performers, such reports help in understanding the reasons behind variation in health outcomes, and provide a measure of performance which may be compared with benchmarks or targets, or with previous results to examine trends over time. Statistical methodology for provider comparisons has been developed in the context of both education and health.

It is known that adjusting for patient severity (case-mix) is a challenging task, since it requires deep knowledge of the phenomenon from clinical, organizational, logistic and epidemiological points of view. Moreover, any such adjustment can be expected to be incomplete, so unavoidable residual variability (over-dispersion) will generally exist between providers. It is then crucial that a statistical procedure be able to assess whether a provider may be considered "unusual". In particular, although hierarchical models are recommended because they account for the nested structure of hospital performance data, it is not straightforward to decide how to flag unusual performance within them.

Studies of variations in health care utilization and outcomes involve the analysis of multilevel clustered data. Such studies quantify the role of contributing factors (patients and providers) and assess the relationship between health-care processes and outcomes. We develop Bayes rules for different loss functions for hospital report cards when Bayesian semiparametric hierarchical models are used, and discuss the impact of assuming different loss functions on the number of hospitals identified as "non-acceptably performing". The analysis is carried out on a case study dataset arising from one of the clinical surveys of the Strategic Program of Regione Lombardia, concerning patients admitted with STEMI to one of the hospitals of its Cardiological Network.

The major aim is the comparison of different loss functions for discriminating among health care providers' performances, together with an assessment of the role of patients' and providers' characteristics on survival outcomes. The application of this theoretical setting to the problem of managing a Cardiological Network is an example of how Bayesian decision theory can be employed within the clinical governance context of Regione Lombardia. It may point out where investments are most likely to be needed, and could help avoid missing opportunities for quality improvement.

Return to top

Title: Cancer As A Failure Of Multicellularity: The Role Of Cellular Evolution

  • Speaker: John Pepper, Ph.D., Biometry Research Group, Division of Cancer Prevention, NCI
  • Date/Time: Friday, February 24th 11:30am-12:30pm
  • Location: Executive Plaza North (EPN), Conference Room G, 6130 Executive Boulevard, Rockville MD. Photo ID and sign-in required.
  • Metro: Get off at the White Flint stop on the Red Line, and take Nicholson Lane to Executive Blvd. Make a right and continue, crossing Old Georgetown Rd. When the road bends to the right, make a left turn to enter the Executive Plaza complex parking lot. EPN is the rightmost of the two twin buildings.
  • Map: http://dceg.cancer.gov/images/localmap.gif
  • Sponsor: Public Health and Biostatistics Section, WSS and the NCI

Abstract:

Cancer results from a process of cellular evolution. Key cancer defenses and vulnerabilities arose from the ancient evolutionary transition from single-celled to multicellular organisms. Because cellular evolution leads inexorably to cancer, organismal evolution has organized cell reproduction into patterns that are less subject to cellular evolution. We used an agent-based computational model of evolution inside tissues to test the hypothesis that cell differentiation is crucial to suppressing cellular evolution within the body. The hypothesis was supported. If this most basic safeguard is compromised, all the obstacles to cancer built by organismal evolution are quickly dismantled by cellular evolution within the organism. Other simulations addressed the origins of tissue invasion and metastasis.

Return to top

Title: Geo-Spatial tools in Abu Dhabi Census 2011

  • Speaker: Yousef Al Hammadi, PhD., Survey Planning, Design and Processing Department Manager, Statistics Centre Abu Dhabi (SCAD)
  • Chair: Michael P. Cohen, American Institutes for Research
  • Date/time: Wednesday, February 29, 2012 12:30 - 1:30 p.m.
  • Location: Bureau of Labor Statistics, Conference Center
    To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Agriculture and Natural Resources

Abstract:

Statistics Centre - Abu Dhabi's (SCAD's) mission statement reads, "… provide relevant and reliable official statistics serving policy makers, the business community and the public". SCAD is motivated by this mission and produces quality statistics so users can make informed, evidence-based decisions for the betterment of the citizens of the Emirate.

In 2011, SCAD undertook its first census. From the outset, SCAD declared its commitment to conducting the Census using state-of-the-art technologies for collecting, managing, storing and disseminating census data. In terms of geographic information systems (GIS), SCAD employed spatial technology in three statistical production processes: 1. Design and production of enumeration areas; 2. Census collection using geo-enabled iPads; and 3. Census dissemination - on-line thematic maps.

1. Design and production of enumeration areas: ESRI's ArcGIS was used for map design and production of more than 4,200 enumerator work areas (WA) and 11,300 enumerator sub-work areas (SWA). All maps were quality checked and reviewed.

2. Census collection using geo-enabled iPads: SCAD is one of the first statistical organisations in the world to conduct Census household interviews exclusively using iPad and Galaxy mobile devices. Each iPad contained interactive questionnaires (Arabic/English), GPS, mapping system, and reference material.

3. Census dissemination - on-line thematic maps. In collaboration with SAS Middle East, SCAD is developing a range of on-line statistical tools that allow greater utilisation of the rich census data than has been available previously. The web-based tools include Census Thematic maps; Census Community Tables; and a Census Table Builder. The on-line thematic maps provide a unique visual representation of many Census variables, making 'clusters' or 'hot spots' easily identifiable.

Point of contact e-mail: ysalhammadi@scad.ae

Return to top

Title: Business Analytics Degrees: Disruptive Innovation or Passing Fad?

  • Speaker: Michael Rappa, Institute for Advanced Analytics, North Carolina State University
  • Time: Wednesday, March 7th 10:30 am - 11:45 am
  • Place: Duques 453 (2201 G Street, NW, Washington, DC 20052).
  • Directions: Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/explore/visitingcampus/campusmaps.
  • Sponsor: The George Washington University, The Institute for Integrating Statistics in Decision Sciences and the Department of Decision Statistics. See http://business.gwu.edu/decisionsciences/i2sds/seminars.cfm for a list of seminars.

Abstract:

Recently more and more schools have begun offering degrees in business analytics. This talk will use the nation's first Master of Science in Analytics, now in its fifth year, as a backdrop to discuss the rise of analytics degree programs and the implications for business schools. In a future where data-driven decisions will be critically important to the success of business, will analytics become the impetus for disruptive innovation that transforms business education? Or is analytics simply the latest in a long line of management fads soon to be forgotten?

Bio: Michael Rappa is the founding director of the Institute for Advanced Analytics and Distinguished University Professor in the Department of Computer Science at North Carolina State University. As head of the Institute, he leads the nation's first and preeminent Master of Science in Analytics (MSA) as its principal architect. The MSA blends statistics, applied mathematics, computer science and business disciplines into an innovative education focused on the analysis of very large amounts of data.

Dr. Rappa is perhaps best known to millions of students around the world as the author of Managing the Digital Enterprise, an open courseware site he has maintained on the web for over a decade. The site contains his early categorization of web business models, one of the most widely read treatments of the subject.

Prior to joining NC State, he was a professor at MIT for nine years.

Return to top

Title: Dynamic Multiscale Spatio-Temporal Models for Gaussian Areal Data

  • Speaker: Marco A. Ferreira, Department of Statistics, University of Missouri - Columbia
  • Time: Friday, March 9th 11:00-12:00 noon
  • Place: Duques 453 (2201 G Street, NW, Washington, DC 20052).
  • Directions: Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/explore/visitingcampus/campusmaps.
  • Sponsor: The George Washington University, The Institute for Integrating Statistics in Decision Sciences and the Department of Decision Statistics. See http://business.gwu.edu/decisionsciences/i2sds/seminars.cfm for a list of seminars.

Abstract:

We introduce a new class of dynamic multiscale models for spatio-temporal processes arising from Gaussian areal data. Specifically, we use nested geographical structures to decompose the original process into multiscale coefficients which evolve through time following state-space equations. Our approach naturally accommodates data observed on irregular grids as well as heteroscedasticity. Moreover, we propose a multiscale spatio-temporal clustering algorithm that facilitates estimation of the nested geographical multiscale structure. In addition, we present a singular forward filter backward sampler for efficient Bayesian estimation. Our multiscale spatio-temporal methodology decomposes large data-analysis problems into many smaller components and thus leads to scalable and highly efficient computational procedures. Finally, we illustrate the utility and flexibility of our dynamic multiscale framework through two spatio-temporal applications. The first example considers mortality ratios in the state of Missouri, whereas the second example examines agricultural production in Espirito Santo State, Brazil.

Return to top

Title: Measuring Household Relationships in Federal Surveys

  • Panelists: Nancy Bates (Census), Martin O'Connell (Census), Andrew Zukerberg (NCES), Sarah Grady (AIR)
  • Date & Time: March 15, 2012, 12:30 pm to 2:00 pm
  • Location: Bureau of Labor Statistics, Conference Center
    To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: WSS Public Policy and Methodology Chairs
  • Note: Audio Conferencing is available. Video conferencing is not available.

Abstract:

Measurement of household and marital relationships in Federal surveys is used to understand how people in the US live and the resources available to them to care for dependents. Household relationship data is also used to inform program eligibility decisions and public policy more generally.

As households in the US continue to change, careful review and revision of measures over time is needed. Panelists from the Census Bureau will present current research on measurement error in counting same-sex households in the 2010 Census and present qualitative results from testing alternative ways to ask about household relationships and marital status. Panelists from the National Center for Education Statistics and the American Institutes for Research will present the results of a split panel experiment designed to test two different approaches to collecting parent and household information about a sampled child.

Return to top

Title: Generalized P-values: Theory and Applications

  • Speaker: Prof. Bimal Sinha, Dept. of Mathematics and Statistics, UMBC
  • Date/Time: March 15, 2012, 3:30pm
  • Location: Room 1313, Math Building, University of Maryland College Park (directions).
  • Sponsor: University of Maryland, Statistics Program (seminar updates).

Abstract:

During the last fifteen years or so, generalized P-values have become quite useful in solving testing problems in many non-standard situations. In this talk the notion of a generalized P-value will be explained and many of its applications will be presented. The applications will mostly involve linear models.

Return to top

Title: Opportunities in Mathematical and Statistical Sciences

Abstract:

In this talk, I will describe the many funding and employment opportunities that NSF offers to the mathematical and statistical sciences communities. Have you heard of CREATIV, MSPRF, CDSE-MSS, SAVI, iCorps, the institutes, GRFP, or PIRE? New programs, as well as established programs, at NSF will be described.

Refreshments will be served.

Return to top

Title: Differential principal component analysis of ChIP-seq

Abstract:

We propose Differential Principal Component Analysis (dPCA) for analyzing multiple ChIP-seq datasets to identify differential protein-DNA interactions between two biological conditions. dPCA integrates unsupervised pattern discovery, dimension reduction, and statistical inference into a single statistical framework. It uses a small number of principal components to concisely summarize the major multi-protein differential patterns between the two conditions. For each pattern, it detects and prioritizes differential genomic loci by comparing the between-condition differences with the within-condition variation among replicate samples. dPCA provides a new tool for efficiently analyzing large amounts of ChIP-seq data to study dynamic changes of gene regulation across different biological conditions. We demonstrate this approach through analyses of differential histone modifications at transcription factor binding sites and promoters.
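To make the core step concrete, here is a toy sketch (not the dPCA software) of the underlying idea: form the between-condition difference matrix at matched genomic loci and extract its leading principal component as the dominant differential pattern. The pure-Python power iteration and all names are illustrative assumptions:

```python
def top_principal_component(rows, iters=200):
    """Leading eigenvector of the column covariance of `rows` (loci x variables),
    computed by power iteration on the covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Column covariance matrix (d x d)
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(wi * wi for wi in w) ** 0.5
        v = [wi / norm for wi in w]
    return v

def differential_pc(cond_a, cond_b):
    """Top differential pattern between two conditions (matched loci x marks):
    take the elementwise difference, then its leading principal component."""
    diff = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(cond_a, cond_b)]
    return top_principal_component(diff)
```

The actual dPCA framework adds the statistical inference layer: it compares between-condition differences along each component with the within-condition variation among replicates to detect and prioritize loci.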

Refreshments will begin at 9:45 am.

Return to top

Title: Semi-parametric Bayesian Modeling of Spatiotemporal Inhomogeneous Drift Diffusions in Single-Cell Motility

  • Speaker: Ioanna Manolopoulou, Department of Statistical Science, Duke University
  • Time: Friday, March 23rd 11:00-12:00 noon
  • Place: Funger 553 (2201 G Street, NW, Washington, DC 20052).
  • Directions: Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/explore/visitingcampus/campusmaps.
  • Sponsor: The George Washington University, The Institute for Integrating Statistics in Decision Sciences and the Department of Decision Statistics. See http://business.gwu.edu/decisionsciences/i2sds/seminars.cfm for a list of seminars.

Abstract:

We develop dynamic models for observations from independent time series influenced by the same underlying inhomogeneous drift. Our methods are motivated by modeling single cell motion through a Langevin diffusion, using a flexible representation for the drift as radial basis kernel regression. The primary goal is learning the structure of the tactic fields through the dynamics of lymphocytes, critical to the immune response. Although individual cell motion is assumed to be independent, cells interact through secretion of chemicals into their environment. This interaction is captured as spatiotemporal changes in the underlying drift, allowing us to flexibly identify regions in space where cells influence each other's behavior. We develop Bayesian analysis via customized Markov chain Monte Carlo methods for single cell models, and multi-cell hierarchical extensions for aggregating models and data across multiple cells. Our implementation explores data from multi-photon vital microscopy in murine lymph node experiments, and we use a number of visualization tools to summarize and compare posterior inferences on the 3-dimensional tactic fields.
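A minimal one-dimensional sketch of the kind of model described above, assuming an Euler-Maruyama discretization of the Langevin diffusion and a radial-basis-kernel representation of the drift; the talk's setting is 3-dimensional and fully Bayesian, so this is for orientation only and every name here is illustrative:

```python
import math

def rbf_drift(x, centers, weights, scale):
    """Radial-basis-kernel regression representation of an inhomogeneous drift."""
    return sum(w * math.exp(-((x - c) / scale) ** 2)
               for c, w in zip(centers, weights))

def simulate_langevin(x0, drift, sigma, dt, steps, rng):
    """Euler-Maruyama simulation of dX = drift(X) dt + sigma dW."""
    path = [x0]
    x = x0
    for _ in range(steps):
        x = x + drift(x) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path
```

In the hierarchical extension described in the abstract, each cell's observed track would be one such path, with the kernel weights (and their spatiotemporal changes) shared and learned across cells via MCMC.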

Return to top

Title: Current Challenges in Mathematical Genomics: A study on using Olfactory Receptors(ORs)

Abstract:

Scientists now know that there are roughly 700 ORs in the human genome, each on average about 1000 base pairs long. Yet 4^1000 sequences of length 1000 can be formed from the nucleotides A, T, C and G, and out of these only about 700 have been selected by nature as human olfactory DNA sequences. So the outcome could be the result of some formation mechanism, some selection mechanism, or both, one followed by the other. These issues will be discussed in the talk.

Return to top

Demographic Statistical Methods Division Distinguished Seminar Series

Title: Replication Methods for Variance Estimation with Survey Data

  • Presenter: Dr. Jun Shao, Department of Statistics, University of Wisconsin-Madison
  • Discussant: Some Remarks on BRR Variance Estimation at the Census Bureau
    Presenter: Dr. Eric Slud, US Census Bureau and University of Maryland
  • Date: Tuesday, April 3, 2012
  • Time: 10:00AM - 11:15AM
  • Where: US Census Bureau, Conference Room 4. 4600 Silver Hill Road, Suitland, Maryland
  • Contact: Cynthia Wellons-Hazer, 301-763-4277, Cynthia.L.Wellons-Hazer@census.gov

Abstract:

The first part of this presentation reviews some popular replication methods for survey problems, including the jackknife, random groups, balanced half samples, approximate balanced repeated replications, and the bootstrap. Advantages and disadvantages of using these replication methods are discussed. The second part focuses on variance estimation for data with nonresponse and imputation. Although most replication methods can be adopted (with some simple adjustments) in the presence of missing data and/or imputed data, issues such as reducing computational effort, too few data in a replicate, and multiple imputation are addressed.
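For readers new to replication methods, the delete-one jackknife reviewed in the abstract above can be sketched in a few lines. This toy version handles complete iid data only, not the survey weighting, nonresponse, or imputation issues the talk addresses:

```python
def jackknife_variance(data, stat):
    """Delete-one jackknife variance estimate for a statistic of iid data.

    Recompute the statistic on each leave-one-out sample, then measure the
    spread of the replicates, scaled by (n-1)/n.
    """
    n = len(data)
    reps = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    rep_mean = sum(reps) / n
    return (n - 1) / n * sum((r - rep_mean) ** 2 for r in reps)

def mean(xs):
    return sum(xs) / len(xs)
```

For the sample mean this reproduces the textbook estimate s^2/n exactly; the survey versions discussed in the talk delete whole primary sampling units or form balanced half samples rather than single observations.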

Title: Bivariate/Multivariate Markers and ROC Analysis

Abstract:

This talk considers receiver operating characteristic (ROC) analysis for bivariate marker measurements. The research interest is to extend rules and tools from the univariate marker setting to the bivariate marker setting for evaluating the predictive accuracy of markers. Using a tree-based and-or classifier, an ROC function, together with a weighted ROC function (WROC) and their conjugate counterparts, is proposed for examining the performance of bivariate markers. The proposed functions evaluate the performance of the and-or classifier among all possible combinations of marker values, and are ideal measures for understanding the predictability of biomarkers in the target population. Specific features of the ROC and WROC functions and other related statistics are discussed in comparison with the familiar properties of the univariate marker. Nonparametric methods are developed for estimating the ROC-related functions, the (partial) area under the curve, and the concordance probability. The inferential results also extend to multivariate marker measurements with a sequence of arbitrarily combined and-or classifiers. The proposed procedures and inferential results are useful for evaluating and comparing marker predictability based on single or bivariate marker (or test) measurements with different choices of markers, and for evaluating different and-or combinations in classifiers. The approach is applied to the Alzheimer's Disease Neuroimaging Initiative (ADNI) data to illustrate the applicability of the proposed procedures.

* Content of this talk is based on joint work with Shanshan Li

Return to top

Title: Detection of Structural Breaks and Outliers in Time Series

Abstract:

Often, time series data exhibit nonstationarity in which individual segments look stationary, but the whole ensemble is nonstationary. In this lecture, we consider the problem of modeling a class of non-stationary time series with outliers using piecewise autoregressive (AR) processes. The number and locations of the piecewise autoregressive segments, as well as the orders of the respective AR processes, are assumed to be unknown, and each piece may be contaminated with an unknown number of innovational and/or additive outliers. The minimum description length (MDL) principle is applied to compare various segmented AR fits to the data. The goal is to find the "best" combination of the number of segments, the lengths of the segments, the orders of the piecewise AR processes, and the number and type of outliers. Such a "best" combination is implicitly defined as the optimizer of an MDL criterion. Since the optimization is carried out over a large number of configurations of segments and positions of outliers, a genetic algorithm is used to find optimal or near-optimal solutions. Strategies for accelerating the procedure will also be described. Numerical results from simulation experiments and real data analyses show that the procedure enjoys excellent empirical properties. The theory behind this procedure will also be discussed. (This is joint work with Thomas Lee and Gabriel Rodriguez-Yam.)
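A stylized sketch of the kind of criterion being optimized: for a candidate set of breakpoints, fit an AR(1) model to each segment by least squares and total a fit-plus-complexity score. The penalty constants below are illustrative stand-ins, not the exact code length used in the talk (which also selects AR orders and handles outliers):

```python
import math

def ar1_rss(x):
    """Least-squares AR(1) fit x_t = a + b*x_{t-1} + e_t; residual sum of squares."""
    y, z = x[1:], x[:-1]
    n = len(y)
    zbar, ybar = sum(z) / n, sum(y) / n
    szz = sum((zi - zbar) ** 2 for zi in z)
    szy = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
    b = szy / szz if szz > 0 else 0.0
    a = ybar - b * zbar
    return sum((yi - (a + b * zi)) ** 2 for zi, yi in zip(z, y))

def mdl(x, breaks):
    """Simplified MDL score for a piecewise-AR(1) segmentation (lower is better).
    `breaks` are interior breakpoint indices; the per-segment penalty is a
    stylized stand-in for the full code-length terms in the literature."""
    bounds = [0] + sorted(breaks) + [len(x)]
    score = math.log(len(bounds) - 1)  # cost of encoding the number of segments
    for lo, hi in zip(bounds, bounds[1:]):
        seg = x[lo:hi]
        n = len(seg)
        sigma2 = ar1_rss(seg) / (n - 1)
        score += 0.5 * n * math.log(sigma2) + 2.0 * math.log(n)  # fit + parameter cost
    return score
```

A genetic algorithm would then search over breakpoint configurations (and, in the full method, AR orders and outlier positions) for the lowest score.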

Return to top

Roger Herriot Award Lecture

Title: Statipedia at 1½ Years: What's Working & What's Not

  • Speaker: Michael Messner, U.S. Bureau of Labor Statistics
  • Chair: To be announced.
  • Date & Time: Thursday, April 12, 12:30 - 2:00 p.m.
  • Location: Bureau of Labor Statistics, Conference Center
    To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsors: WSS Methodology Program, ASA Social Statistics Section

Abstract:

Statipedia is a wiki for sharing information on federal statistics and economics. It is open to federal government employees and currently has nearly 100 registered users from 20 agencies. This presentation will reveal how Statipedia developed from an idea, to a pilot, and now an agency-hosted platform for collaboration across federal agencies. A brief tour of the wiki will show some of its content, including new pages, recent edits, DC-area events, and "At the Agencies" pages. Progress over the past 1.5 years will be summarized. Usage and growth have been steady, and although this is typical for "successful wikis," we had expected usage and growth to accelerate as new users were added. Something seems to be missing - or not working as it should. The presentation will conclude with an open discussion to address the problem:

  • What is needed to ensure that Statipedia can be sustained?
  • What is missing?
  • What blocks users from making better use of the wiki - and what can be done to break down those barriers?
  • Is it time to open this up to the larger statistics and economics community?

Return to top

Title: On Modeling and Estimation of Response Probabilities when Missing Data are Not Missing at Random

  • Speaker: Dr. Michail Sverchkov, Bureau of Labor Statistics
  • Date/Time: February 9, 2012, 3:30pm
  • Location: Room 1313, Math Building, University of Maryland College Park (directions).
  • Sponsor: University of Maryland, Statistics Program (seminar updates).

Abstract:

Most methods that deal with the estimation of response probabilities assume either explicitly or implicitly that the missing data are missing at random (MAR). However, in many practical situations this assumption is not valid, since the probability of responding often depends on the outcome value or on latent variables related to the outcome. The case where the missing data are not MAR (NMAR) can be treated by postulating a parametric model for the distribution of the outcomes under full response and a model for the response probabilities. The two models define a parametric model for the joint distribution of the outcome and the response indicator, and therefore the parameters of this model can be estimated by maximizing the likelihood corresponding to this distribution. Modeling the distribution of the outcomes under full response, however, can be problematic since no data are available from this distribution. In 2008 the speaker proposed an approach that permits estimating the parameters of the model for the response probabilities without modeling the distribution of the outcomes under full response. The approach utilizes relationships between the sample distribution and the sample-complement distribution derived by Sverchkov and Pfeffermann in 2004. The present talk extends that approach.

Return to top

Title: A Model-based Approach to Limit of Detection in Studying Persistent Environmental Chemicals Exposures and Human Fecundity

Abstract:

Human exposure to persistent environmental pollutants often results in a range of exposures with a proportion of concentrations below the laboratory detection limits. Growing evidence suggests that inadequate handling of concentrations below the limit of detection (LOD) may bias the assessment of health effects in relation to chemical exposures. We sought to quantify such bias in models focusing on the day-specific probability of pregnancy during the fertile window, and propose a model-based approach to reduce such bias. A flexible multivariate skewed generalized t-distribution constrained by LODs is assumed, which realistically represents the underlying shape of the chemical exposures. Correlations in the multivariate distribution provide information across chemicals. A Markov chain Monte Carlo sampling algorithm was developed for implementing the Bayesian computations. The deviance information criterion is used to guide the choice of distributions for chemical exposures with LODs. We applied the proposed approach to data from the Longitudinal Investigation of Fertility and the Environment (LIFE) Study.
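The censoring idea at the heart of the abstract can be illustrated in the simplest univariate normal case: detected values contribute the density to the likelihood, while non-detects contribute the probability of falling below the detection limit. This is only a sketch under that simplifying assumption; the talk's model is a multivariate skewed generalized t-distribution fit by MCMC:

```python
import math

def normal_logpdf(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def normal_logcdf(x, mu, sigma):
    z = (x - mu) / (sigma * math.sqrt(2.0))
    return math.log(0.5 * (1.0 + math.erf(z)))

def censored_loglik(obs, n_below_lod, lod, mu, sigma):
    """Log-likelihood of normal data left-censored at `lod`:
    detected values contribute the density, non-detects the probability
    mass below the detection limit."""
    ll = sum(normal_logpdf(x, mu, sigma) for x in obs)
    ll += n_below_lod * normal_logcdf(lod, mu, sigma)
    return ll
```

Maximizing this kind of likelihood (rather than substituting LOD/2 or similar fill-in values for non-detects) is what reduces the bias the abstract describes.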

Return to top

Title: Optimal Stopping Problem for Stochastic Differential Equations with Random Coefficients

  • Speaker: Mou-Hsiung (Harry) Chang, Mathematical Sciences Division, U.S. Army Research Office
  • Time: Friday, April 13th 4:00 pm - 5:00 pm
  • Place: Funger 553 (2201 G Street, NW, Washington, DC 20052).
  • Directions: Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/explore/visitingcampus/campusmaps.
  • Sponsor: The George Washington University, The Institute for Integrating Statistics in Decision Sciences, the Department of Decision Sciences and the Department of Statistics. See http://business.gwu.edu/decisionsciences/i2sds/seminars.cfm for a list of seminars.

Abstract:

This talk is based on the paper "Optimal Stopping Problem for Stochastic Differential Equations with Random Coefficients", Mou-Hsiung Chang, Tao Pang, and Jiongmin Yong, SIAM J. Control & Optimization, vol. 48, No. 2, pp. 941-971, 2009. The paper received the 2011 SIAM Control and Systems Activity Group best paper award. In this talk we consider an optimal stopping problem for stochastic differential equations with random coefficients. The dynamic programming principle leads to a Hamilton-Jacobi-Bellman equation, which, for the current case, is a backward stochastic partial differential variational inequality (BSPDVI, for short) for the value function. Well-posedness of such a BSPDVI is established, and a verification theorem is proved.

Return to top

Title: The New FERPA

  • Chair: To be announced.
  • Speaker: Michael Hawes, U.S. Department of Education
  • Date & Time: Tuesday, April 17, 12:30 p.m. - 2:00 p.m.
  • Location: Bureau of Labor Statistics, Conference Center
    To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: WSS Methodology Program and Federal Committee on Statistical Methodology, Confidentiality and Data Access Committee.

Abstract:

The Department of Education administers the Family Educational Rights and Privacy Act (FERPA) which protects personally identifiable information (PII) from education records from disclosure without consent. This presentation will cover the basics of FERPA, exceptions to the consent requirement, and the implications of the January 2012 regulatory changes on data sharing and research.

Return to top

Title: Estimating the Binomial N

  • Speaker: William Link, Ph.D., Patuxent Wildlife Research Center, Laurel, Maryland
  • Chair: Mike Fleming
  • Date/time: Wednesday, April 18, 2012 12:30 - 1:30 p.m.
  • Location: Bureau of Labor Statistics, Conference Center
    To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Agriculture and Natural Resources

Abstract:

The binomial distribution B(N,p) is one of the first encountered in an elementary statistics course; estimation and modeling of the success parameter p are routine problems for the applied statistician. In this seminar, we consider the problem of estimating the binomial index N. The problem is of relevance in wildlife studies, in human demographics, in sociology, in health studies, and in quality control: X individuals in a population of size N are observed; we wish to make an inference about the size of the population. Mark-recapture studies suppose X is distributed as a binomial with index N and success parameter p, and use auxiliary data to obtain an estimate p_hat; one can then estimate N by X/p_hat.

Strictly speaking, identification of individuals isn't necessary for estimation of the binomial N. Instead, what is needed is replication. If the X_i are iid B(N,p), their mean is Np and their variance is Np(1-p); thus the variance/mean ratio is a consistent estimate of (1-p). Thus p_hat = 1 - S^2/Xbar and N_hat = Xbar/p_hat suggest themselves as estimators. Indeed, since the sample variance and mean are U-statistics, we can establish asymptotic normality and consistency of the estimators without too much difficulty.
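The moment estimators just described can be written down directly; this sketch assumes the replicated counts are supplied as a plain list:

```python
def estimate_binomial_N(xs):
    """Method-of-moments estimates (p_hat, N_hat) from replicated counts
    assumed iid Binomial(N, p), via the variance/mean ratio."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)  # sample variance
    p_hat = 1.0 - s2 / xbar   # since Var/Mean = 1 - p
    n_hat = xbar / p_hat      # since Mean = N * p
    return p_hat, n_hat
```

Note the fragility of the ratio: whenever S^2 >= Xbar (which happens easily when p is small), p_hat is non-positive and N_hat is unusable, one face of the difficulties with estimating the binomial N.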

The problem seems straightforward enough, but nonetheless has a long and not terribly encouraging history, dating back to work by Fisher (1942) and Haldane (1942); explanations for the "erratic behavior of estimators of N" and a review of related work are given by Hall (1994). In this talk we review some of the difficulties associated with estimation of the binomial N, examine recent developments based on model-based replication, and suggest diagnostic criteria allowing confidence in their application.

Point of contact e-mail: wlink@usgs.gov

Return to top

Title: Measuring Sexual Identity in Federal Surveys

  • Panelists: Kristen Miller (NCHS), Susan Newcomer (NICHD)
  • Date & Time: April 19, 2012, 12:30pm to 2:00pm
  • Location: Bureau of Labor Statistics, Conference Center
    To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: WSS Public Policy and Methodology Chairs

Abstract:

The Institute of Medicine reported in March, 2011 that lesbian, gay, bisexual, transgender (LGBT) and other sexual and gender minority populations experience substantial health risks. Improved measurement of sexual identity in Federal surveys will support greater effectiveness of health interventions and services for individuals within those groups. Research to date on the construct and design of existing sexual identity questions will be presented, and test results of a revised version for use in the National Health Interview Survey will be discussed.

Return to top

Title: On Statistical Inference in Meta-Regression

Abstract:

The explanation of the heterogeneity that occurs when combining results of different studies sharing a common goal is an important issue in meta-analysis. Besides including a heterogeneity parameter in the analysis, it is also important to understand the possible causes of heterogeneity. One possibility is to incorporate study-specific covariates in the model to account for between-trial variability; this leads to what is known as the random effects meta-regression model. In this talk, we will discuss the commonly used methods for meta-regression and propose a new method based on generalised inference. Higher order likelihood methods will also be considered.
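
A minimal sketch of the heterogeneity step in a random-effects meta-analysis, assuming the classical DerSimonian-Laird moment estimator (the abstract's generalised-inference method is not reproduced here, and the study effects and variances below are invented):

```python
import math

def dersimonian_laird(effects, variances):
    # Intercept-only random-effects meta-analysis; a meta-regression would
    # add study-specific covariates to this model.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    k = len(effects)
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)        # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]
    mu = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return mu, tau2, se

effects = [0.10, 0.30, 0.35, 0.65, 0.45]      # hypothetical study effects
variances = [0.03, 0.02, 0.04, 0.02, 0.03]    # hypothetical within-study variances
mu, tau2, se = dersimonian_laird(effects, variances)
```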

Return to top

Title: Transitioning to the New American FactFinder - a 1/2 day training session

  • Speakers: Robert Chestnut and Nathan Ramsey, U.S. Census Bureau
  • Chair: Deborah Griffin, U.S. Census Bureau
  • Date & Time: Wednesday, April 25, 12:30 - 3:30p.m.
  • Location: Bureau of Labor Statistics, Conference Center
    To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: WSS Methodology Program, ASA Social Statistics Section

Abstract:

The American FactFinder is the primary vehicle used by the Census Bureau to disseminate survey and census results to data users. The Census Bureau released a new version of the American FactFinder in 2011 and eliminated the legacy system in early 2012. The functionality and organization of the new American FactFinder differs markedly from the legacy system. The Census Bureau has designed this training session to provide an overview of the new American FactFinder using live demonstrations. We expect this session to help seasoned legacy system users gain confidence in this transition and new data users learn about this powerful data access tool. We will include applications to access information from the 2010 Census and the American Community Survey. Census Bureau staff will share shortcuts and tips. We will allow sufficient time to answer specific questions from session participants.

Note that this training session is by reservation only and capacity is limited. We will schedule a second training session if registration interest exceeds available seating.

Return to top

Title: Simulation-Based Bayes Procedures for Three 21st Century Key Research Issues

Abstract:

A class of adaptive sampling methods is introduced for efficient posterior and predictive simulation. The proposed methods are robust in the sense that they can handle target distributions that exhibit non-elliptical shapes such as multimodality and skewness. The basic method makes use of sequences of importance weighted Expectation Maximization steps in order to efficiently construct a mixture of Student-t densities that approximates the target distribution accurately — typically a posterior distribution, of which we only require a kernel — in the sense that the Kullback-Leibler divergence between target and mixture is minimized. We label this approach Mixture of t by Importance Sampling and Expectation Maximization (MitISEM). The constructed mixture is used as a candidate density for quick and reliable application of either Importance Sampling (IS) or the Metropolis-Hastings (MH) method. We also introduce three extensions of the basic MitISEM approach. First, we propose a method for applying MitISEM in a sequential manner, so that the candidate distribution for posterior simulation is cleverly updated when new data become available. Our results show that the computational effort is reduced enormously, while the quality of the approximation remains almost unchanged. This sequential approach can be combined with a tempering approach, which facilitates simulation from densities with multiple modes that are far apart. Second, we introduce a permutation-augmented MitISEM approach. This is useful for importance or Metropolis-Hastings sampling from posterior distributions in mixture models without the requirement of imposing identification restrictions on the parameters of the model's mixture regimes. Third, we propose a partial MitISEM approach, which aims at approximating the joint distribution by estimating a product of marginal and conditional distributions.
This division can substantially reduce the dimension of the approximation problem, which facilitates the application of adaptive importance sampling for posterior simulation in more complex models with larger numbers of parameters. Our results indicate that the proposed methods can substantially reduce the computational burden in econometric models like DCC or mixture GARCH models and a mixture instrumental variables model.
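
The importance-sampling step that a fitted candidate feeds can be sketched as follows; as a simplification, a single scaled Student-t candidate stands in for the MitISEM mixture, and the bimodal target kernel is an invented toy example:

```python
import math
import random

def target_kernel(x):
    # Unnormalized bimodal "posterior" kernel: two well-separated normal modes.
    return math.exp(-0.5 * (x + 2.0) ** 2) + math.exp(-0.5 * (x - 2.0) ** 2)

def sample_t(df, scale):
    # Scaled Student-t draw: standard normal over sqrt(chi-square / df).
    z = random.gauss(0.0, 1.0)
    chi2 = random.gammavariate(df / 2.0, 2.0)
    return scale * z / math.sqrt(chi2 / df)

def t_density(x, df, scale):
    u = x / scale
    c = math.gamma((df + 1.0) / 2.0) / (
        math.gamma(df / 2.0) * math.sqrt(df * math.pi) * scale)
    return c * (1.0 + u * u / df) ** (-(df + 1.0) / 2.0)

random.seed(42)
df, scale, n = 3, 4.0, 20000
xs = [sample_t(df, scale) for _ in range(n)]
ws = [target_kernel(x) / t_density(x, df, scale) for x in xs]
post_mean = sum(w * x for w, x in zip(ws, xs)) / sum(ws)  # self-normalized IS
```

The heavy-tailed candidate covers both modes, so the weights stay bounded; by symmetry of the toy target, the posterior-mean estimate should be near zero.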

Return to top

Title: Bivariate Nonparametric Maximum Likelihood Estimator With Right Censored Data

  • Speaker: Dr. Michail Sverchkov, Bureau of Labor Statistics
  • Date/Time: Thursday, April 26, 2012, 3:30pm
  • Location: Room 1313, Math Building, University of Maryland College Park (directions).
  • Sponsor: University of Maryland, Statistics Program (seminar updates).

Abstract:

In the analysis of survival data, we often encounter situations where the response variable (the survival time) T is subject to right censoring, but the covariates Z are completely observable. To use the nonparametric approach (i.e., without imposing any model assumptions) in the study of the relation between the right censored response variable T and the completely observable covariate Z, one natural thing to do is to estimate the bivariate distribution function F_o(t, z) of (T, Z) based on bivariate data which are right censored in one coordinate - we call these BD1RC data. In this talk, we derive the bivariate nonparametric maximum likelihood estimator (BNPMLE) F_n(t,z) for F_o(t, z) based on the BD1RC data, which has an explicit expression and is unique in the sense of empirical likelihood. Other nice features of F_n(t,z) include that it has only nonnegative probability masses and thus is monotone in the bivariate sense, while these properties generally do not hold for most existing distribution estimators with censored bivariate data. We show that under the BNPMLE F_n(t,z), the conditional distribution function (d.f.) of T given Z is of the same form as the Kaplan-Meier estimator for the univariate case, and that the marginal d.f. F_n(\infty,z) coincides with the empirical d.f. of the covariate sample. We also show that when there is no censoring, F_n(t,z) coincides with the bivariate empirical distribution function. For the case with discrete covariate Z, the strong consistency and weak convergence of F_n(t,z) are established. The extension of our BNPMLE F_n(t,z) to the case with p-variate Z for p > 1 is straightforward. This is joint work with Tonya Riddlesworth.
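
For reference, the univariate Kaplan-Meier estimator that the conditional d.f. of T given Z reduces to can be sketched as follows (toy data, not the talk's bivariate construction):

```python
def kaplan_meier(times, events):
    # Kaplan-Meier survival estimate from right-censored data;
    # events[i] = 1 if the event was observed, 0 if censored.
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, s = [], 1.0
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = at = 0
        while i < len(data) and data[i][0] == t:   # group tied times
            at += 1
            deaths += data[i][1]
            i += 1
        if deaths:
            s *= 1.0 - deaths / n_at_risk          # step down at event times
            surv.append((t, s))
        n_at_risk -= at
    return surv

# Six subjects; times 3 and 7 include censored observations.
km = kaplan_meier([2, 3, 3, 5, 7, 8], [1, 1, 0, 1, 0, 1])
```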

Return to top

Title: Subjective Probability: Its Axioms and Acrobatics

Abstract:

The meaning of probability has been enigmatic, even to the likes of Kolmogorov, and continues to be so. It is fallacious to claim that the law of large numbers provides a definitive interpretation.

Whereas the founding fathers, Cardano, Pascal, Fermat, Bernoulli, de Moivre, Bayes, and Laplace, took probability for granted, the latter-day writers, Venn, von Mises, Ramsey, Keynes, de Finetti, and Borel, engaged in philosophical and rhetorical discussions about the meaning of probability. Entering the arena were also physicists like Cox, Jeffreys, and Jaynes and philosophers like Carnap, Jeffrey, and Popper. Interpretation matters because the paradigm used to process information and act upon it is determined by perspective.

The modern view is that the only philosophically and logically defensible interpretation of probability is that probability is not unique, that it is personal, and therefore subjective. But to make subjective probability mathematically viable, one needs axioms of consistent behavior. The Kolmogorov axioms are a consequence of the behavioristic axioms. In this expository talk, I will review these more fundamental axioms and point out some of the underlying acrobatics that have led to debates and discussions. Besides mathematicians, statisticians, and decision theorists, the material here should be of interest to physical, biological, and social scientists, risk analysts, and those engaged in the art of "intelligence" (Googling, code breaking, hacking, and eavesdropping).

Return to top

Title: Information about Dependence in the Absence and Presence of a Probable Cause

  • Speaker: Ehsan S. Soofi, Sheldon B Lubar School of Business, University of Wisconsin-Milwaukee
  • Time: Friday, April 27th 11:30-12:30 pm
  • Place: Funger 320 (2201 G Street, NW, Washington, DC 20052).
  • Directions: Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/explore/visitingcampus/campusmaps.
  • Sponsor: The George Washington University, The Institute for Integrating Statistics in Decision Sciences and the Department of Decision Statistics. See http://business.gwu.edu/decisionsciences/i2sds/seminars.cfm for a list of seminars.

Abstract:

In general, dependence is more complicated than can be captured by traditional indices such as the correlation coefficient, its nonparametric counterparts, and the fraction of variance reduction. An information measure of dependence, known as the mutual information, is increasingly being used in traditional as well as more modern problems. The mutual information, denoted here by M, measures the departure of a joint distribution from the independence model. We also view M as an expected utility of variables for prediction. This view integrates ideas from the general dependence literature and the Bayesian perspective. We illustrate the success of this index as a "common metric" for comparing the strengths of dependence within and between families of distributions, in contrast with the failures of the popular traditional indices. For the location-scale family of distributions, an additive decomposition of M gives the normal distribution as the unique minimal-dependence model in the family. An implication for practice is that the popular association indices underestimate the dependence of elliptical distributions, severely so for models such as t distributions with low degrees of freedom. A useful formula for M of the convolution of random variables provides a measure of dependence when the predictors and the error term are normally distributed jointly or individually, as well as under other distributional assumptions. Finally, we draw attention to a caveat: M is not applicable to continuous variables when their joint distribution is singular due to a "probable cause" for the dependence. For an indirect application of M to singular models, we propose a modification of the mutual information index that retains the important properties of the original index, and we show some potential applications.
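
For the bivariate normal family, M has the well-known closed form M = -(1/2) log(1 - rho^2), which illustrates why it can serve as a common metric where the bounded correlation coefficient cannot:

```python
import math

def mutual_information_bvn(rho):
    # Mutual information (in nats) of a bivariate normal with correlation rho:
    # M = -0.5 * log(1 - rho^2). Zero iff rho = 0; diverges as |rho| -> 1.
    return -0.5 * math.log(1.0 - rho * rho)

# Unlike the correlation coefficient, M is unbounded as dependence strengthens.
vals = {r: mutual_information_bvn(r) for r in (0.0, 0.5, 0.9, 0.99)}
```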

Return to top

Title: Introduction to Statistics Without Borders and Discussion of the Global Citizen Year Project

  • Speakers: Gary Shapiro, Chair Statistics Without Borders; and Shari McGee
  • Chair: Steve Pierson, Director of Science Policy, American Statistical Association
  • Date/Time: Wednesday, May 9, 2012 / 12:30 - 2:00 p.m.
  • Location: New Offices of Mathematica-MPR, 1101 First Street NE, 12th Floor, Washington DC 20002, near L Street, north of Union Station
  • Directions and Remote Viewing: To be placed on the attendance list for webinar and phone viewing, please RSVP to Bruno Vizcarra at bvizcarra@mathematica-mpr.com or (202) 484-4231 at least 1 day in advance of the seminar (in-person attendees do not need to RSVP). Provide your name, affiliation, contact information (email is preferred) and the seminar date. Once on the list, you will be provided with information about webinar and phone viewing. For those who choose to attend in person, Mathematica is located at 1100 1st Street, NE, 12th Floor, Washington, DC 20002. If traveling by Metro, take the Red Line to either the New York Ave Station or Union Station. From the New York Ave Station, follow signs to exit at M Street out of the station and walk 1 block west on M street and 2 blocks south on 1st Street (the building will be on your right). From Union Station, walk north along 1st Street for about 4-5 blocks until you reach L Street (the building will be on your left after crossing L street). If traveling by car, pay parking is available in the building parking garage, which is located 1 block east of North Capitol on L Street NE. Once in the building, take the elevators to the 12th floor and inform the secretary that you are attending the WSS seminar. Please call Mathematica's main office number (202 484-9220) if you have trouble finding the building.
  • Sponsors: WSS Human Rights Program, DC-AAPOR, and Capital Area Social Psychological Association

Abstracts:

1. Introduction to Statistics Without Borders

Gary Shapiro, Chair Statistics Without Borders

Gary will briefly discuss the mission and current status of Statistics Without Borders (SWB), an all-volunteer Outreach Group of the American Statistical Association that provides services in the area of international health. He will then briefly discuss a few of the projects that SWB has worked on or is currently working on. This talk will serve as an introduction to the main presentation by Shari McGee on work SWB did for the organization Global Citizen Year.

2. Discussion of Global Citizen Year Project

Shari McGee, Lillian Park, and Geoffrey Urland

The Global Citizen Year (GCY) project consisted of data analysis of the year-abroad survey, categorizing open-ended responses, and further analysis of key questions chosen by the GCY group to evaluate participant satisfaction with the year-abroad program. The statisticians also helped GCY calculate its Net Promoter Score (NPS), a management tool used to gauge the loyalty of an organization's customers (in the case of GCY, students). Its proponents regard the NPS as a more effective alternative to traditional customer research. Shari's presentation will focus on this work, the outcome, and the lessons learned.
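
The NPS computation itself is simple; a sketch with hypothetical 0-10 ratings, using the conventional cutoffs (promoters 9-10, detractors 0-6):

```python
def net_promoter_score(ratings):
    # NPS = percentage of promoters (9-10) minus percentage of
    # detractors (0-6) on a 0-10 "likelihood to recommend" scale.
    n = len(ratings)
    promoters = sum(r >= 9 for r in ratings)
    detractors = sum(r <= 6 for r in ratings)
    return 100.0 * (promoters - detractors) / n

# Hypothetical ratings: 4 promoters, 2 passives (7-8), 2 detractors.
score = net_promoter_score([10, 9, 9, 8, 7, 6, 5, 10])
```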

For further information contact Michael P. Cohen at mpcohen@juno.com or 202-403-6453.

Return to top

Title: Analysis of Multi-server Ticket Queues with Customer Abandonment

Abstract:

"Ticket Queues" are the new generation of queuing systems that issue tickets to the customers upon their arrival. The ticket queues differ from the physical queues in terms of the amount of information available to the customers upon their arrival. This study aims at analyzing the system performance of the multi-server ticket queues with reneging and balking customers, who periodically observe their position in the queue and reevaluate their decisions on whether to abandon the system or not. We model the ticket queues using a Markov chain model, and develop two accurate and effective approximation heuristics. These valuation tools enable us provide a method to analyze abandonment probabilities in real systems. Using our analytical model, we analyze the ticket queue data set of a bank to propose a method for separation of customers' reneging and balking probability.

Return to top

Title: The Value of Risk Models, Including Models with SNPs, for Breast Cancer Prevention

  • Speaker: Mitchell H. Gail, M.D., Ph.D., Biostatistics Branch, Division of Cancer Epidemiology and Genetics, NCI
  • Date/Time: Friday, May 18th 11:30am-12:30pm
  • Location: Executive Plaza North (EPN), Conference Room J, 6130 Executive Boulevard, Rockville MD. Photo ID and sign-in required.
  • Metro: Get off at the White Flint stop on the Red Line and take Nicholson Lane to Executive Blvd. Make a right and continue, crossing Old Georgetown Rd. When the road bends to the right, make a left turn to enter the Executive Plaza complex parking lot. EPN is the rightmost of the two twin buildings.
  • Map: http://dceg.cancer.gov/images/localmap.gif
  • Sponsor: Public Health and Biostatistics Section, WSS and the NCI

Abstract:

I define the absolute risk of breast cancer, sometimes called "crude risk" or "cumulative incidence", and discuss its applications in advising individual patients at risk of breast cancer and in public health applications. Deciding whether or not to take tamoxifen to prevent breast cancer is an example of the former. Designing prevention trials, implementing "high risk" prevention strategies, and using risk estimates to allocate prevention resources under cost constraints are examples of the latter. The distribution of absolute risk in the general population plays a key role in assessing the utility of a risk model in these applications and in assessing how much additional risk factors, such as genotypes of single nucleotide polymorphisms (SNPs), improve performance.

Return to top

Title: 2012 President's Invited Seminar: State of the Statistical System

  • Speaker: Katherine Wallman, Chief Statistician, OMB
  • Chair: Jonaki Bose, WSS President
  • Date & Time: Wednesday, May 23, 2012, 12:30 p.m. to 2:30 p.m.
  • Location: Bureau of Labor Statistics, Conference Center, Room 7 To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Washington Statistical Society
  • Note: Video conferencing will not be available.

Abstract:

Katherine Wallman serves as Chief Statistician at the United States Office of Management and Budget. She provides policy oversight, establishes priorities, advances long-term improvements, and sets standards for a Federal statistical establishment that comprises more than 80 agencies spread across every cabinet department. She will discuss the various challenges that the current federal statistical system is facing, what works, and what we should be thinking about.

Return to top

Title: Evaluating the Environmental Protection Agency's Leadership Development Workshops

  • Speaker: Eduardo S. Rodela, Ph.D., Program Manager, U.S. Environmental Protection Agency, Office of Human Resources Leadership Development Institute
  • Chair: Mel Kollander
  • Date/time: Monday, June 11, 2012 12:30 - 1:30 p.m.
  • Location: Bureau of Labor Statistics, Conference Center
    To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Agriculture and Natural Resources

Abstract:

Eduardo will discuss the training evaluation strategy for the Environmental Protection Agency's Leadership Development, Excellence in Supervision Module One workshops. He will describe how he uses Kirkpatrick's four levels of evaluation as the foundation for training evaluation. While he will describe all of Kirkpatrick's four levels, Eduardo will emphasize Level Three evaluations, which focus on "critical behaviors" learned by Module One participants during the workshops. The four levels are: (1) customer satisfaction, (2) learning, (3) critical behaviors, and (4) outcomes. Eduardo also drew on the work of Robert Brinkerhoff to generate the questionnaire portion of the evaluation protocol. The evaluation protocol is a multi-method approach to collecting program data; it specifies the use of questionnaires, focus groups, and individual interviews with workshop participants. In addition, plans are currently being drawn up to use feedback from participants' supervisors to generate data on training participants' behavioral changes.

Point of contact e-mail: esrodela@cox.net

Return to top

Title: June 2012 AIR Psychometrician Group Meeting

Abstract:

The 2012 AIR Psychometrician Group invites you to participate in its June 14th meeting. Professor J. Patrick Meyer from the University of Virginia will be discussing Features of jMetrik.

jMetrik is a free and open-source Java application for psychometric computing. This talk will discuss the algorithms in jMetrik for nonparametric and parametric item response theory and for characteristic curve plotting. These methods will be demonstrated with real data sets, with attention to computational details. Finally, the source code library will be described to encourage others to make use of it and contribute to it.
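
As background (this is not jMetrik's own code), the two-parameter logistic item response function underlying a characteristic curve plot can be sketched as follows; the discrimination and difficulty values are arbitrary:

```python
import math

def icc_2pl(theta, a, b):
    # Two-parameter logistic item response function:
    # P(correct | ability theta) = 1 / (1 + exp(-a * (theta - b))),
    # where a is discrimination and b is difficulty.
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Probability of a correct response across ability levels for one item.
curve = [icc_2pl(t, a=1.2, b=0.5) for t in (-2.0, -1.0, 0.0, 0.5, 1.0, 2.0)]
```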

J. Patrick Meyer is an Assistant Professor in the Curry School of Education at the University of Virginia. His research focuses on methodological issues in educational measurement. Recent areas of focus include applications of generalizability theory to observational measures and test equating in item response theory. Patrick is also the founder and lead developer of jMetrik, an open source psychometric software application. He was awarded the 2010 Brad Hanson Award for Contributions to Educational Measurement by the National Council on Measurement in Education for his work with jMetrik.

Return to top

Title: Nonresponse Modeling in Repeated Independent Surveys in a Closed Stable Population

  • Speaker: Eric Falk, Defense Manpower Data Center, and Fritz Scheuren, NORC
  • Chair: Paul Drugan, Federal Voting Assistance Program
  • Date/time: Tuesday, June 19 12:30 to 2:00 pm
  • Location: Bureau of Labor Statistics, Conference Center, Room 3
    To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: Washington Statistical Society

Abstract:

Models of survey unit nonresponse were, in times past, typically implicit (e.g., Oh and Scheuren 1983). If nonresponse was sizable, then adjustments were made, generally employing some form of missing-at-random model (e.g., Rubin 1983). One such model (Zhang and Scheuren 2011), this time explicit, is discussed in our presentation. The application is to large repeated independent cross-sectional samples from a closed population. The units are voting commissions, which are required to report on the details of each national election within a reasonable period afterwards. Despite the force of law, there is typically a sizable amount of nonreporting in this application. The talk discusses how this problem might be handled statistically. Naturally, we also cover some model limitations and areas for future research that might be pursued in time for the upcoming Presidential and Congressional elections this fall.

Return to top

Title: The Bayesian Paradigm for Quantifying Uncertainty

  • Speaker: Nozer D. Singpurwalla, Distinguished Research Professor of Statistics, The George Washington University, Washington, D.C.
  • Chair/Organizer: Wendy L. Martinez, Bureau of Labor Statistics
  • Date/time: Wednesday, August 8, 2012, 1:00 - 2:30 p.m.
  • Location: Bureau of Labor Statistics, Conference Center
    To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: WSS Defense and National Security Section

Abstract:

This talk is a conversational overview of the key philosophical and foundational essentials of the Bayesian paradigm for quantifying uncertainty and for decision making under uncertainty. The presentation should be accessible to those with limited or no background in probability and statistics, though those with such backgrounds should also benefit in the sense that several misconceptions as to what Bayesian inference is all about will be clarified.

Recently there has been an explosion in the use of Bayesian methods, driven mainly by computer scientists involved with data mining, classification, and imaging, who have masterfully harnessed Bayesian ideas into workable tools. The same is true of those involved with risk analysis and matters of homeland security. The aim of this talk is to shed light on the fundamental thinking that drives these tools.
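
A one-line instance of the Bayesian updating the talk is about, the conjugate beta-binomial case with an illustrative uniform prior:

```python
def beta_posterior(alpha, beta, successes, failures):
    # Conjugate Bayesian update: a Beta(alpha, beta) prior on a success
    # probability plus binomial data gives a
    # Beta(alpha + successes, beta + failures) posterior.
    return alpha + successes, beta + failures

# Uniform Beta(1, 1) prior, then 7 successes in 10 trials.
a, b = beta_posterior(1.0, 1.0, successes=7, failures=3)
posterior_mean = a / (a + b)   # shrinks the raw 0.7 toward the prior mean 0.5
```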

Point of contact: Wendy Martinez, martinez.wendy@bls.gov

Return to top

Title: Bayesian Quantile Regression with Endogenous Censoring

  • Speaker: Professor Yuan Liao, Department of Mathematics, UMCP
  • Date/Time: Thursday, September 6, 2012, 3:30 pm
  • Location: Room 1313, Math Building, University of Maryland College Park (directions).
  • Sponsor: University of Maryland, Statistics Program (seminar updates).

Abstract:

This talk presents a new framework for quantile regression with censored data, based on a quasi-Bayesian approach.

The traditional approach to censored data assumes that, conditional on the regressors, the survival time is independent of censoring. Such an assumption is restrictive in many cases and may fail whenever the censoring mechanism is endogenous (e.g., when something else determines the survival time and censoring simultaneously). The proposed new framework allows endogenous censoring.

There are three highlights of the talk:

  1. We allow arbitrary dependence between survival time and censoring, even after conditioning on regressors.
  2. In this case the regression coefficient is either point identified or partially identified by a set of moment inequalities (Khan and Tamer, 2009, Journal of Econometrics); the identified set, which may not be a singleton, then becomes the target of interest.
  3. We propose a Bayesian approach based on empirical likelihood, which is robust when applied researchers are unsure about the true likelihood. Other moment-condition-based Bayesian approaches, such as Bayesian GMM (Hansen, 1982), would work as well.

We will show the posterior consistency, i.e., asymptotically the empirical likelihood posterior will concentrate on a neighborhood around the "truth" (either the true coefficient parameter or its identified set, depending on whether or not it is identified). We will also generalize these techniques to a more general instrumental variable regression with interval censored data, which has many applications in economics and social sciences.
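
As background (the talk's empirical-likelihood posterior is not reproduced here), the check function that defines quantile regression, and the fact that a sample quantile minimizes its total loss, can be sketched as:

```python
def check_loss(u, tau):
    # Koenker-Bassett check function: rho_tau(u) = u * (tau - 1{u < 0}).
    return u * (tau - (1.0 if u < 0 else 0.0))

def sample_quantile(ys, tau):
    # The tau-th sample quantile minimizes total check loss; here the
    # candidates are restricted to the observed values for simplicity.
    return min(ys, key=lambda q: sum(check_loss(y - q, tau) for y in ys))

ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
med = sample_quantile(ys, 0.5)   # tau = 0.5 recovers the median
```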

Return to top

Title: The Measurement and Behavior of Uncertainty: Evidence from the ECB Survey of Professional Forecasters

Abstract:

We use matched point and density forecasts of output growth and inflation from the ECB Survey of Professional Forecasters to derive measures of forecast uncertainty, forecast dispersion and forecast accuracy. We construct uncertainty measures from aggregate density functions as well as from individual histograms. The uncertainty measures display countercyclical behavior, and there is evidence of increased uncertainty for output growth and inflation since 2007. The results also indicate that uncertainty displays a very weak relationship with forecast dispersion, corroborating the findings of other recent studies that disagreement is not a valid proxy for uncertainty. In addition, we find no correspondence between movements in uncertainty and predictive accuracy, suggesting that time-varying conditional variance estimates may not provide a reliable proxy for uncertainty. Last, using a regression equation that can be interpreted as a (G)ARCH-M-type model, we find limited evidence of linkages between uncertainty and levels of output growth and inflation.

Return to top

Title: Flexible Bayesian Models for Process Monitoring of Paradata Survey Quality Indicators

  • Speaker: Dr. Joseph L. Schafer, Area Chief for Statistical Computing Center for Statistical Research & Methodology, U.S. Census Bureau
  • Date/Time: Thursday, September 13, 2012, 3:30 pm
  • Location: Room 1313, Math Building, University of Maryland College Park (directions).
  • Sponsor: University of Maryland, Statistics Program (seminar updates).

Abstract:

As data-collecting agencies obtain responses from survey participants, they are also gleaning increasingly large amounts of paradata about the survey process: number of contact attempts, interview duration, reasons for nonresponse, and so on. Paradata may be used to assess the performance of field staff, to describe the effects of interventions on the data collection process, and to alert survey managers to unexpected developments that may require remedial action. With direct visual inspection and simple plotting of paradata variables over time, it may be difficult to distinguish ordinary random fluctuations from systematic change and long-term trends. The field of Statistical Process Control (SPC), which grew up in the context of manufacturing, provides computational and graphical tools (e.g., the Shewhart control chart) for process monitoring. Those methods, however, generally assume that the process mean is stable over time, and they may be ill-suited to paradata variables that are often "out of control." In this talk, I present a flexible class of semiparametric models for monitoring paradata that allow the mean function to vary over time in ways that are not specified in advance. The mean functions are modeled as natural splines with penalties for roughness that are estimated from the data. These splines are allowed to vary over groupings (e.g., regional offices, interview teams, and interviewers) by creating a generalized linear mixed model with multiple levels of nested and crossed random effects. I describe efficient Markov chain Monte Carlo strategies for simulating random draws of model parameters from the high-dimensional posterior distribution and for producing graphical summaries for process monitoring. I illustrate these methods on monthly paradata series from the National Crime Victimization Survey.
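
The classical SPC baseline being generalized here can be sketched as a Shewhart-style individuals chart with 3-sigma limits (the numbers are illustrative, not NCVS paradata):

```python
import statistics

def shewhart_flags(baseline, series, k=3.0):
    # Flag points outside mean +/- k*sd control limits estimated from a
    # stable baseline period. This is the stable-mean assumption the
    # abstract argues is often violated by paradata.
    m = statistics.mean(baseline)
    s = statistics.stdev(baseline)
    lo, hi = m - k * s, m + k * s
    return [x < lo or x > hi for x in series]

baseline = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 9.7, 10.0]
flags = shewhart_flags(baseline, [10.1, 9.9, 12.5, 10.0])
```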

Return to top

Title: Small Area Confidence Bounds on Small Cell Proportions in Survey Populations

Abstract:

Motivated by the problem of 'quality filtering' of estimated counts in American Community Survey (ACS) tables, and of reporting small-domain coverage results from the 2010 decennial-census Post-Enumeration Survey (PES), this talk describes methods for placing confidence bounds on estimates of small proportions and counts within cells of tables estimated from complex surveys. While Coefficients of Variation are generally used in measuring the quality of estimated counts, they do not make sense for assessing the validity of very small or zero counts. The problem is formulated here in terms of (upper) confidence bounds for unknown proportions. We discuss methods of creating confidence bounds from small-area models including synthetic, logistic, beta-binomial, and variance-stabilized (arcsine square root transformed) linear models. The model-based confidence bounds are compared with single-cell bounds derived from arcsine-square-root transformed binomial intervals with survey weights embodied in the "effective sample size". The comparison is illustrated on county-level data about Housing-Unit Erroneous Enumeration status from the 2010 PES.

The primary methods of the talk are "small area estimation", a kind of empirical Bayes model-based prediction relevant to survey problems, with some discussion of parametric-bootstrap methods for interval estimation.
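The single-cell bound mentioned above is easy to compute. The snippet below is a minimal illustration, not the authors' exact procedure: it assumes a one-sided bound built on the arcsine-square-root scale, where an estimated proportion based on an effective sample size n_eff (the nominal n divided by the design effect) has approximate standard error 1/(2*sqrt(n_eff)); the function name is invented for this sketch.

```python
import math

def arcsine_upper_bound(p_hat, n_eff, z=1.645):
    """One-sided upper confidence bound for a small proportion via the
    variance-stabilizing arcsine-square-root transform. On the transformed
    scale, asin(sqrt(p_hat)) has approximate standard error
    1/(2*sqrt(n_eff)), where n_eff is the design-based effective sample
    size."""
    phi = math.asin(math.sqrt(p_hat))
    upper = min(phi + z / (2.0 * math.sqrt(n_eff)), math.pi / 2)
    return math.sin(upper) ** 2  # back-transform to the proportion scale

# Unlike a CV, the bound remains informative for a zero observed count:
# arcsine_upper_bound(0.0, 100) is small but strictly positive.
bound = arcsine_upper_bound(2 / 50, 50)
```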

This talk is based on joint work with Aaron Gilary and Jerry Maples of the Census Bureau.

Return to top

Title: Coalescence in Branching Processes

Abstract:

Consider a branching tree. Go to the nth generation. If there are at least two vertices in that generation, pick two of them at random by srswor (simple random sampling without replacement) and trace their lines of descent back in time till they meet. Call that generation Xn. Do the same thing with all individuals in the nth generation. Call that Yn. In this talk we discuss the distributions of Xn and Yn and their asymptotics for Galton-Watson trees as n goes to infinity, for single and multitype cases, for the four cases: subcritical, critical, supercritical, explosive. Applications to branching random walks will also be discussed.
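A minimal simulation of Xn helps fix ideas. The sketch below assumes a single-type Galton-Watson tree with Poisson offspring (the talk covers more general cases); all names are illustrative.

```python
import math
import random

def simulate_coalescence(n, mean_offspring=2.0, seed=1):
    """Grow a Galton-Watson tree for n generations with Poisson offspring,
    pick two generation-n vertices by srswor, and trace their lines of
    descent back until they meet; return that generation X_n, or None if
    generation n has fewer than two vertices."""
    rng = random.Random(seed)

    def poisson(lam):
        # Knuth's multiplication method; adequate for small lam
        limit, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                return k
            k += 1

    parents = []   # parents[g][i] = gen-g parent index of vertex i in gen g+1
    size = 1       # generation 0: a single root
    for _ in range(n):
        gen = []
        for v in range(size):
            gen.extend([v] * poisson(mean_offspring))
        if not gen:
            return None          # extinction before generation n
        parents.append(gen)
        size = len(gen)
    if size < 2:
        return None
    a, b = rng.sample(range(size), 2)   # srswor of two vertices in gen n
    g = n
    while a != b:                       # walk both lines of descent back
        a, b = parents[g - 1][a], parents[g - 1][b]
        g -= 1
    return g
```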

Return to top

Title: Adding One More Observation to the Data

  • Speaker: Professor Abram Kagan, University of Maryland College Park
  • Date/Time: Thursday, September 20, 2012, 3:30 pm
  • Location: Room 1313, Math Building, University of Maryland College Park (directions).
  • Sponsor: University of Maryland, Statistics Program (seminar updates).

Abstract:

We discuss ways of incorporating the (n+1)-st observation in an estimator of a population characteristic based on the previous n observations. Non-parametric estimators such as the empirical distribution function, the sample mean and variance are jackknife extensions and lack novelty. Classical estimators in the parametric models are expected to be innovative. We prove it for the Pitman estimators of a location parameter and illustrate by a few more examples.

Return to top

Title: Hospitals clustering via semiparametric Bayesian models: Model based methods for assessing healthcare performance

  • Speaker: Francesca Ieva, Dipartimento di Matematica "F.Brioschi", Politecnico di Milano
  • Time: Friday, September 21st 3:30 pm - 4:30 pm
  • Place: Duques 553 (2201 G Street, NW, Washington, DC 20052). Followed by wine and cheese reception.
  • Directions: Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/explore/visitingcampus/campusmaps.
  • Sponsor: The George Washington University, The Institute for Integrating Statistics in Decision Sciences and the Department of Decision Statistics. See http://business.gwu.edu/decisionsciences/i2sds/seminars.cfm for a list of seminars.

Abstract:

A Bayesian semiparametric mixed effects model is presented for the analysis of binary survival data coming from a clinical survey on STEMI (ST segment Elevation Myocardial Infarction), where statistical units (i.e., patients) are grouped by hospital of admission. The idea is to exploit the flexibility and potential of such models for carrying out model-based clustering of the random effects in order to profile hospitals according to their effects on patients' outcomes. Our focus is on the advantages of "model-based" clustering of the hospitals provided by the semiparametric assumption on random effects and on prediction of the binary patient outcome (in-hospital survival) in the presence of strongly unbalanced shares. The optimal clustering is obtained by minimising suitable loss functions, which improves the capability of the model to predict failures in the presence of strongly unbalanced shares. The proposed Bayesian approach provides a new way for classifying patients by taking advantage of the posterior predictive credibility intervals of the outcome. The methods have been applied to a real dataset from the STEMI Archive, a clinical survey on patients affected by STEMI and admitted to hospitals in Regione Lombardia, whose capital is Milano.

Return to top

Title: The Mismeasure of Group Differences in the Law and the Social and Medical Sciences

  • Speaker: James P. Scanlan, Attorney at Law, Washington DC
  • Date/Time: 2:50pm, Tuesday, September 25th, 2012
  • Location: Bentley Lounge, Gray Hall, American University
  • Directions: Metro RED line to Tenleytown-AU. AU shuttle bus stop is next to the station. Please see campus map on http://www.american.edu/media/directions.cfm for more details
  • Contacts: Stacey Lucien, 202-885-3124, mathstat@american.edu & Prof. Stephen Casey, scasey@american.edu
  • Sponsor: American University Department of Mathematics and Statistics Colloquium

Abstract:

In the law and the social and medical sciences efforts to appraise the size of differences reflected by rates at which demographic groups experience an outcome generally rely on relative differences in favorable or adverse outcomes, absolute differences between rates, and odds ratios. These measures are problematic because they tend to be systematically affected by the overall prevalence of an outcome. Most notably, the rarer an outcome the greater tends to be the relative difference in experiencing it and the smaller tends to be the relative difference in avoiding it. Thus, for example, lowering test cutoffs tends to increase relative differences in failure rates while reducing relative differences in pass rates; reducing mortality tends to increase relative differences in mortality while reducing relative differences in survival; relaxing lending criteria tends to increase relative differences in loan rejection rates while reducing relative differences in loan approval rates. Absolute differences and odds ratios tend also to be affected by the overall prevalence of an outcome, though in a more complicated way than the two relative differences. This presentation will illustrate these patterns and explain how the failure to understand them undermines appraisals of group differences. It will also explain a sound method for measuring the difference reflected by a pair of outcome rates.
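The prevalence pattern described above can be reproduced with a toy model. The sketch below assumes two groups whose scores are normal with a half-standard-deviation mean gap, with "failing" meaning scoring below the cutoff; it illustrates the phenomenon only and is not the speaker's proposed measure.

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def rate_ratios(cutoff, gap=0.5):
    """Failure rates for two groups with score means 0 and -gap (SD 1),
    returned as (ratio of failure rates, ratio of pass rates)."""
    fail_a = phi(cutoff)          # higher-scoring group fails less often
    fail_b = phi(cutoff + gap)    # lower-scoring group fails more often
    return fail_b / fail_a, (1.0 - fail_a) / (1.0 - fail_b)

# Lowering the cutoff makes failure rarer, increasing the relative
# difference in failure while shrinking the relative difference in passing.
high_fail, high_pass = rate_ratios(0.0)   # failure is common
low_fail, low_pass = rate_ratios(-1.5)    # failure is rare
```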

Return to top

Title: Interdisciplinary Methods for Prediction and Confidence Sets

Abstract:

The incorporation of methodology from disparate fields to answer scientific questions has become increasingly common in the age of massive data sets. This talk will discuss two areas of statistics, prediction and confidence sets, and interdisciplinary approaches that can be used to answer specific problems within these subspecialties. First, we generate a prediction function in an epidemiology study using a flexible machine learning ensembling approach that combines multiple algorithms into a single algorithm, returning a function with the best cross-validated mean squared error. Second, we discuss an algorithm for the construction of valid confidence regions for the optimal regime in sequential decision problems using linear programming.

Return to top

Title: Examining Moderated Effects of Additional Adolescent Substance Use Treatment: Structural Nested Mean Model Estimation using Inverse-Weighted Regression-With-Residuals

Abstract:

An effect moderator is a measure which tempers, specifies, or alters the effect of treatment. Moderators can be used to explain the heterogeneity of treatment (exposure) effects. In clinical and public health practice, they are often the basis for individualizing treatment. This talk considers the methodological problem of assessing effect moderation using data arising from non-experimental, longitudinal studies in which treatment is time-varying and so are the covariates thought to moderate its effect.

The talk is motivated by a longitudinal data set of 2870 adolescent substance users who are followed over the course of one year, with measurement occasions at baseline/intake and every 3 months thereafter. Treatment receipt and substance use frequency over the past 3 months, and a large number of other covariates are measured at each occasion. Using this data set, we examine the moderated time-varying effects of additional adolescent substance use treatment on future substance use, conditional on past time-varying frequency of use (the candidate time-varying moderator).

We employ a Structural Nested Mean Model (SNMM; Robins, 1994) to formalize the moderated time-varying causal effects of interest. We present an easy-to-use estimator of the SNMM which combines an existing regression-with-residuals (RR) approach with an inverse-probability-of-treatment weighting (IPTW) strategy. In previous work (Almirall, Ten Have, Murphy 2010; Almirall, McCaffrey, Ramchand, Murphy 2011), we discuss how the RR approach identifies the moderated time-varying effects if the candidate time-varying moderators are the sole time-varying confounders. The combined RR+IPTW approach identifies the moderated time-varying effects in the presence of an additional, auxiliary set of known and measured putative time-varying confounders, which are not candidate time-varying moderators of scientific interest. (In the substance use example, this auxiliary set of covariates is large.) Further, we discuss problems with the traditional regression estimator, clarify the distinction between time-varying effect moderation vs time-varying confounding, and, if time permits, we discuss commonalities and differences between the (more commonly used) Marginal Structural Model (MSM; Robins, Hernan, Brumback 2000) and the SNMM.

Return to top

Title: Inference for High Frequency Financial Data: Local Likelihood and Contiguity

  • Speaker: Prof. Per Mykland (University of Chicago)
  • Date/Time: Thursday, October 4th at 3:30pm
  • Location: Room 1313, Math Building, University of Maryland College Park (directions).
  • Sponsor: University of Maryland, Statistics Program (seminar updates).

Abstract:

Recent years have seen a rapid growth in high frequency financial data. This has opened the possibility of accurately determining volatility and similar quantities in small time periods, such as one day or even less. We introduce the types of data, and then present a local parametric approach to estimation in the relevant data structures. Using contiguity, we show that the technique quite generally yields asymptotic properties (consistency, normality) that are correct subject to an ex post adjustment involving asymptotic likelihood ratios. Several examples of estimation are provided: powers of volatility, leverage effect, and integrated betas. The approach provides substantial gains in transparency when it comes to defining and analyzing estimators. The theory relies on the interplay between stable convergence and measure change, and on asymptotic expansions for martingales.
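For orientation, the simplest estimate of "volatility in a small time period" is realized variance, the sum of squared log returns over the period; the sketch below is generic background, not the local-likelihood machinery of the talk.

```python
import math

def realized_variance(prices):
    """Realized variance over one period from a high-frequency price path:
    the sum of squared log returns between consecutive observations. As the
    sampling interval shrinks (ignoring microstructure noise), this
    converges to the period's integrated variance."""
    logs = [math.log(p) for p in prices]
    return sum((b - a) ** 2 for a, b in zip(logs, logs[1:]))

def realized_volatility(prices):
    """Square root of realized variance, on the volatility scale."""
    return math.sqrt(realized_variance(prices))
```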

Return to top

Title: Assessing the Relative Performance of Absolute Penalty and Shrinkage Estimation in Weibull Censored Regression Models

Abstract:

In this talk we address the problem of estimating the vector of regression parameters in the Weibull censored regression model. Our main objective is to provide natural adaptive estimators that significantly improve upon the classical procedures in the situation where some of the predictors may or may not be associated with the response. In the context of two competing Weibull censored regression models (full model and candidate sub-model), we consider an adaptive shrinkage estimation strategy that shrinks the full model maximum likelihood estimate in the direction of the sub-model estimate. Further, we consider the LASSO strategy and compare its relative performance with the shrinkage estimators. A Monte Carlo simulation study reveals that when the true model is close to the candidate sub-model, the shrinkage strategy performs better than the LASSO strategy when, and only when, there are many inactive predictors in the model. The suggested estimation strategies are applied to a real data set from the Veterans' Administration lung cancer study to illustrate the usefulness of the procedures in practice.
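The shrinkage direction described above can be written in one line. The sketch below is a generic positive-part Stein-type rule, assuming a Wald-type statistic measuring the distance between the full-model and sub-model fits; it is not the paper's exact estimator, and k is an illustrative shrinkage constant.

```python
def shrinkage_estimate(full, sub, wald_stat, k):
    """Shrink the full-model estimate toward the sub-model estimate.
    When the data are close to the sub-model (small wald_stat), the weight
    on the full-model fit drops, all the way to zero under the
    positive-part rule."""
    w = max(0.0, 1.0 - k / wald_stat)   # positive-part shrinkage weight
    return [s + w * (f - s) for f, s in zip(full, sub)]
```

With a large Wald statistic the rule returns essentially the full-model fit; with a small one it collapses onto the sub-model.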

Return to top

22nd MORRIS HANSEN LECTURE

  • Speaker: Ken Prewitt, Vice President for the Office of Global Centers and Carnegie Professor of Public Affairs at Columbia University
    Ken Prewitt is known for his dynamic and provocative takes on the challenges and opportunities for the federal statistical system in the 21st century. He is vice president for Global Centers and Carnegie professor of public affairs at Columbia University, with appointments in the School of International and Public Affairs and the Department of Political Science. He is a former member of the Committee on National Statistics and current chair of the advisory committee to the Division of Behavioral and Social Sciences at the National Research Council/National Academy of Sciences. He served as director of the U.S. Census Bureau (1998-2001), director of the National Opinion Research Center, president of the Social Science Research Council, and senior vice president of the Rockefeller Foundation. He is a fellow of the American Academy of Arts and Sciences, the American Academy of Political and Social Science, the American Association for the Advancement of Science, and the Center for the Advanced Study in the Behavioral Sciences. He earned his B.A. from Southern Methodist University, his M.A. from Washington University, and his Ph.D. in political science from Stanford University. From 1965 to 1982 he was a professor at the University of Chicago.
  • Discussants:
    Margo Anderson, professor of history and urban studies at the University of Wisconsin-Milwaukee and noted historian of the U.S. census
    Dan Gaylin, executive vice president of NORC at the University of Chicago and formerly senior advisor for research and planning at the U.S. Department of Health and Human Services
  • Date: October 9, 2012
  • Location: Jefferson Auditorium of the U.S. Department of Agriculture's South Building (Independence Avenue, SW, between 12th and 14th Streets); Smithsonian Metro Stop (Blue/Orange Lines). Enter through Wing 5 or Wing 7 from Independence Ave. (The special assistance entrance is at 12th & Independence). A photo ID is required.

Description:

The production of social knowledge is never independent of its institutional base (think monasteries and religious knowledge). In this talk, I discuss the role of the "Westats" (Westat, NORC, RTI, Abt, Mathematica, etc.) in partnering the expansion of government support for (and influence over) policy and research-relevant survey databases and in facilitating the 1960s arrival of "big social science." How have the contract houses avoided the partisanship now prevalent among think tanks, and why is it important that they do? The answer instructs us in whether social science can engage sites where power roams and yet not compromise the praised principle: "speak truth to power."

The lecture will be at 3:30 pm and a reception will follow at 5:30 pm. The lecture will be held at the Jefferson Auditorium of the U.S. Department of Agriculture's South Building (Independence Avenue, SW, between 12th and 14th Streets); Smithsonian Metro Stop (Blue/Orange Lines). A photo ID is required.

Further details will be forthcoming.

Return to top

Title: Statistics and Audit Sampling with Application to the Elouise Cobell Indian Trust Case

  • Speakers: Mary Batcher and Fritz Scheuren
  • Chair/Organizer: Daniel Lee
  • Date/Time: October 11, 2012 / 12:30 - 2:00 p.m.
  • Location: New Offices of Mathematica-MPR, 1101 First Street NE, 12th Floor, Washington DC 20002, near L Street, north of Union Station

    Mathematica is located at 1100 1st Street, NE, 12th Floor, Washington, DC 20002. If traveling by Metro, take the Red Line to either the New York Ave Station or Union Station. From the New York Ave Station, follow signs to exit at M Street out of the station and walk 1 block west on M street and 2 blocks south on 1st Street (the building will be on your right). From Union Station, walk north along 1st Street for about 4-5 blocks until you reach L Street (the building will be on your left after crossing L street).

    If traveling by car, pay parking is available in the building parking garage, which is located 1 block east of North Capitol on L Street NE. Once in the building, take the elevators to the 12th floor and inform the secretary that you are attending the WSS seminar. Please call Mathematica's main office number (202 484-9220) if you have trouble finding the building.
  • Remote Viewing: To be placed on the attendance list for webinar and phone viewing, please RSVP to Alyssa Maccarone at amaccarone@mathematica-mpr.com or (202) 250-3570 at least 1 day in advance of the seminar (in-person attendees do not need to RSVP). Provide your name, affiliation, contact information (email is preferred) and the seminar date. Once on the list, you will be provided with information about webinar and phone viewing.
  • Sponsors: WSS Human Rights Program, DC-AAPOR, and Capital Area Social Psychological Association

Abstract:

Probability and judgment samples are extensively used in financial audits. Probability samples need no discussion for this audience. Judgment sampling, even though much maligned, can have a useful role in a discovery context when conducting a compliance audit. Still, probability samples are required if attribute error rates are sought. Seemingly forgotten in recent years is that when problems are found through a discovery judgment sample, there is no direct way to accurately estimate the impact of a control failure. Without a probability sample follow-up, conclusions, even if based on multiple judgment samples, can be misleading. The use of now-standard meta-analysis tools for multiple judgment samples simply does not work. The nonstatistical intuition is that multiple judgment samples can, somehow, be combined and a stronger inference made. That is simply untrue and has caused much harm in some important cases. The just-settled Cobell Indian Trust Case will be used as an example.

Papers: Background on Audit Sampling - Mary Batcher, mary.batcher@ey.com, (202) 327-6773
Application of Audit Sampling in the Elouise Cobell Case - Fritz Scheuren, Scheuren@aol.com, (202) 320-3446

Point of contact: Michael P. Cohen, mpcohen@juno.com, 202-403-6453

Return to top

Title: Some Recent Developments of the Support Vector Machine

  • Speaker: Dr. Yufeng Liu (Department of Statistics and Operation Research & Carolina Center for Genome Sciences, University of North Carolina at Chapel Hill)
  • Date/Time: Thursday, October 11th at 3:30pm
  • Location: Room 1313, Math Building, University of Maryland College Park (directions).
  • Sponsor: University of Maryland, Statistics Program (seminar updates).

Abstract:

The Support Vector Machine (SVM) has been a popular margin-based technique for classification problems in both machine learning and statistics. It has a wide range of applications, from computer science to engineering to bioinformatics. As a statistical method, the SVM has weak distributional assumptions and great flexibility in dealing with high dimensional data. In this talk, I will present various aspects of the SVM as well as some of its recent developments. Issues including statistical properties of the SVM, multi-category SVM, as well as class probability estimation of the SVM will be discussed. Applications in cancer genomics will be included as well.

Return to top

Title: Longitudinal High-Dimensional Data Analysis

Abstract:

We introduce a flexible inferential framework for the longitudinal analysis of ultra-high dimensional data. Typical examples of such data structures include, but are not limited to, observational studies that collect imaging data longitudinally on large cohorts of subjects. The approach decomposes the observed variability into three high dimensional components: a subject-specific random intercept that quantifies the cross-sectional variability, a subject-specific slope that quantifies the dynamic irreversible deformation over multiple visits, and a subject-visit specific imaging deviation that quantifies exchangeable or reversible visit-to-visit changes. The model could be viewed as the ultra-high dimensional counterpart of the random intercept/random slope mixed effects model. The proposed inferential method is very fast, scalable to studies including ultra-high dimensional data, and can easily be adapted to and executed on modest computing infrastructures. The method is applied to the longitudinal analysis of diffusion tensor imaging (DTI) data of the corpus callosum of multiple sclerosis (MS) subjects. The study includes 176 subjects observed at a total of 466 visits. For each subject and visit the study contains a registered DTI scan of the corpus callosum at roughly 30,000 voxels.

Return to top

Title: Robust Statistics and Applications

Abstract:

In this presentation, I will talk about some robust techniques, including some new techniques developed in recent years. These techniques cover robust regression, quantile regression, robust singular value decomposition, and Bayesian robust methods. Robust regression detects outliers and provides resistant results in the presence of outliers. Quantile regression computes conditional quantile functions and conducts statistical inference on regression quantiles without any distributional assumptions. Robust singular value decomposition combines alternating regression with robust techniques and can be used for robust classification. Bayesian robust methods implement robust statistics in Bayesian adaptive fitting and achieve high efficiency and robustness. Applications of these techniques will be demonstrated using the SAS procedures ROBUSTREG and QUANTREG, which are SAS implementations of these techniques.
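As a small numerical illustration of the robust-regression idea (a location-only analogue for exposition, not the ROBUSTREG implementation), the Huber M-estimate can be computed by iteratively reweighted averaging:

```python
def huber_location(x, c=1.345, tol=1e-9, max_iter=200):
    """Huber M-estimate of location via iteratively reweighted averaging.
    Observations within c of the current center keep weight 1; points
    farther out are downweighted by c/|residual|, so a few gross outliers
    cannot drag the estimate the way they drag the mean."""
    mu = sorted(x)[len(x) // 2]   # start from (an upper) median
    for _ in range(max_iter):
        w = [1.0 if abs(v - mu) <= c else c / abs(v - mu) for v in x]
        new_mu = sum(wi * vi for wi, vi in zip(w, x)) / sum(w)
        if abs(new_mu - mu) < tol:
            break
        mu = new_mu
    return mu

data = [9.8, 10.1, 10.0, 9.9, 10.2, 55.0]   # one gross outlier
# huber_location(data) stays near 10, while the mean is pulled toward 17.5.
```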

Refreshments will begin at 9:45 am.

Return to top

Title: Extropy: A complementary dual of entropy

Abstract:

This article provides for a completion to theories of information based on entropy, which is recognised as formulating only part of a dual bivariate measure of a probability mass function. In so doing, it resolves a longstanding question in the axiomatisation of entropy as proposed by Shannon and highlighted in renewed concerns expressed by Jaynes. We introduce a companion measure to entropy that we suggest be called the extropy of a distribution. The entropy and the extropy of an event distribution are identical. However, this identical measure bifurcates into distinct measures for any quantity that is not merely an event indicator. We display several theoretical and geometrical properties of the proposed extropy measure. As with entropy, the maximum extropy distribution is also the uniform distribution; and both measures are invariant with respect to permutations of their mass functions. However, the two measures behave quite differently in their assessments of the refinement of a distribution. This behaviour is the property that concerned both Shannon and Jaynes. Together, the (extropy, entropy) pair identifies uniquely the permutation classes of mass functions in the unit-simplex. In a discrete context, the extropy measure is approximated by a variant of Gini's index of heterogeneity when the maximum probability mass is small. This is related to the "repeat rate" of a mass function as studied by Turing and Good. The formal duality of entropy and extropy is specified via the relationship between the entropies and extropies of coarse and fine partitions. The extropy of a multiple-outcome probability mass function turns out to equal a location/scale transform of the entropy of a general complementary mass function. The continuous analogue of differential extropy turns out to equal the negative integral of the square of the density function, an integral well-known in physical theory via the square of a wave function. It has a similar relation to relative extropy as does differential entropy to the relative entropy with respect to a uniform density. The relative extropy measure for densities constitutes a natural companion to the much studied Kullback-Leibler directed distance.
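With p a probability mass function, the two measures are entropy H(p) = -sum_i p_i log p_i and extropy J(p) = -sum_i (1 - p_i) log(1 - p_i). A short numerical check of the properties claimed above (definitions as in the article; variable names illustrative):

```python
import math

def entropy(p):
    """Shannon entropy of a probability mass function."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def extropy(p):
    """Complementary dual of entropy: each outcome contributes
    -(1 - p_i) * log(1 - p_i)."""
    return -sum((1 - pi) * math.log(1 - pi) for pi in p if pi < 1)

# For an event distribution (two outcomes) entropy and extropy coincide,
# and among three-outcome distributions both are maximized by the uniform.
two_outcome = [0.3, 0.7]
uniform3 = [1 / 3] * 3
skewed3 = [0.5, 0.3, 0.2]
```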

Return to top

Title: A Statistical Paradox

  • Speaker: Dr. Abram Kagan (Department of Mathematics, UMCP)
  • Date/Time: Thursday, October 18th at 3:30pm
  • Location: Room 1313, Math Building, University of Maryland College Park (directions).
  • Sponsor: University of Maryland, Statistics Program (seminar updates).

Abstract:

An interesting paradox observed recently by Moshe Pollak of Hebrew University will be presented. It deals with comparing the conditional distribution of the number of boys in a family having at least m boys (m given) with that in a family in which the first m children are boys.

Return to top

Title: Kolmogorov Stories

  • Speaker: Academician Asaf Hajiev, Azerbaijan National Academy of Sciences and Baku State University
  • Time: Friday, October 19th 4:00 pm - 5:00 pm
  • Place: Duques 651 (2201 G Street, NW, Washington, DC 20052).
  • Directions: Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/explore/visitingcampus/campusmaps.
  • Sponsor: The George Washington University, The Institute for Integrating Statistics in Decision Sciences, the Department of Decision Sciences and the Department of Statistics. See http://business.gwu.edu/decisionsciences/i2sds/seminars.cfm for a list of seminars.

Abstract:

Asaf Hajiev is a Professor of Mathematics at Baku State University and is a Corresponding Member of the Azerbaijan National Academy of Sciences. He received his doctorate in probability theory from Moscow State University under the supervision of Yuri Belyaev, with a specialization in queueing and reliability. His current interests are in probability modeling and statistical inference. He has been a visiting professor at many institutions, including UC Berkeley and Bogazici University in Istanbul. During his student days at Moscow State University he had first-hand interactions with Kolmogorov, as a teacher, a mentor, an advisor, and a friend. In this talk Asaf will relate his experiences with Kolmogorov with a slant towards the personal and the non-academic, and give us some interesting stories about Kolmogorov's modus operandi with his colleagues, students, and a bevy of scientists and mathematicians who visited him.

Academician Hajiev is unusual among us. Besides his teaching and research duties he is also in public service as a Member of Parliament in Azerbaijan, representing his home district of Ganja. Science education is one among his many portfolios.

Return to top

Title: Detection of Structural Breaks and Outliers in Time Series

Abstract:

Often, time series data exhibit nonstationarity in which segments look stationary, but the whole ensemble is nonstationary. In this lecture, we consider the problem of modeling a class of non-stationary time series with outliers using piecewise autoregressive (AR) processes. The number and locations of the piecewise autoregressive segments, as well as the orders of the respective AR processes, are assumed to be unknown, and each piece may be contaminated with an unknown number of innovational and additive outliers. The minimum description length (MDL) principle is applied to compare various segmented AR fits to the data. The goal is to find the "best" combination of the number of segments, the lengths of the segments, the orders of the piecewise AR processes, and the number and type of outliers. Such a "best" combination is implicitly defined as the optimizer of an MDL criterion. Since the optimization is carried over a large number of configurations of segments and positions of outliers, a genetic algorithm is used to find optimal or near optimal solutions. Strategies for accelerating the procedure will also be described. Numerical results from simulation experiments and real data analyses show that the procedure enjoys excellent empirical properties. The theory behind this procedure will also be discussed. (This is joint work with Thomas Lee and Gabriel Rodriguez-Yam.)

Return to top

Title: Weight calibration and the survey bootstrap

  • Speaker: Stas Kolenikov
  • Chair: Charles Day, Substance Abuse and Mental Health Services Administration
  • Date/Time: October 22, 2012 / 12:30 pm - 3:30 pm
  • Location: Bureau of Labor Statistics, Conference Center
    To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 3 business days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsors: WSS Methodology Program

Abstract:

In this talk, Dr. Kolenikov will discuss the interplay between weight calibration, aimed at increasing precision of the survey estimates, and resampling variance estimation procedures, namely the jackknife and the family of bootstrap methods. He will introduce weight calibration using the pseudo-empirical likelihood objective function. In a simulation study based on a 5 percent sample from the 2000 U. S. Census, the various variance estimators will be compared in terms of bias, stability, and accuracy of the confidence interval coverage.
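In its simplest form, weight calibration can be sketched as raking (iterative proportional fitting) of base weights to known margins; the snippet below illustrates the general idea only and is not the pseudo-empirical-likelihood calibration discussed in the talk.

```python
def rake(weights, rows, cols, row_totals, col_totals, iters=50):
    """Calibrate survey weights by raking: alternately rescale the weights
    so that weighted totals match the known row-margin benchmarks, then
    the known column-margin benchmarks, and repeat to convergence."""
    w = list(weights)
    for _ in range(iters):
        for labels, totals in ((rows, row_totals), (cols, col_totals)):
            current = {}
            for wi, lab in zip(w, labels):
                current[lab] = current.get(lab, 0.0) + wi
            w = [wi * totals[lab] / current[lab]
                 for wi, lab in zip(w, labels)]
    return w
```

After calibration, weighted counts reproduce the benchmark margins (for consistent margins), and it is exactly this adjustment that resampling variance estimators such as the jackknife and bootstrap must replicate within each resample.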

Return to top

U.S. Census Bureau
DSMD Distinguished Seminar Series

Title: Uses of Models in Survey Design and Estimation

  • Presenter: Dr. Richard Valliant, University of Michigan and Joint Program in Survey Methodology at the University of Maryland
  • Discussant: Model Checking Using Survey Data, Dr. Joseph Sedransk, Case Western Reserve University and Joint Program in Survey Methodology at the University of Maryland
  • Date: Monday, October 22, 2012
  • Time: 2:00 pm - 3:15 pm
  • Where: Conference Rooms 1&2, U.S. Census Bureau, 4600 Silver Hill Road, Suitland, Maryland
  • Contact: Cynthia Wellons-Hazer, 301-763-4277, Cynthia.L.Wellons.Hazer@census.gov

Abstract:

Survey statisticians rely on a list of probability distributions when designing samples and selecting estimators. These can be explicit or implicit and include a randomization distribution for sample selection, structural superpopulation models to describe analysis variables, a random coverage model to describe omissions from the frame, a model for how units respond, and a model for imputing missing data, among others. This talk surveys some of the uses of models to design samples and construct estimators. Topics will include approximation of optimum selection probabilities, creation of strata, relationship of balanced sampling to methods used in practice, using models to evaluate candidate estimators, selecting covariates for estimators, and the use of paradata in constructing models for nonresponse adjustment. A little of the history of the sometimes controversial use of models in surveys will also be discussed.

Return to top

DC-AAPOR AND WSS PROUDLY PRESENT THE 2012 HERRIOT AWARD WINNER

Title: Issues in the Evaluation of Data Quality for Business Surveys

  • Speaker: Paul Biemer
  • Chair: Jill Dever, RTI International
  • Date/Time: October 23, 2012 / 12:30 - 2:00pm
  • Location: Bureau of Labor Statistics, Conference Center, Rooms 1 and 3
    To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 3 business days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsors: WSS Methodology Program

Abstract:

This presentation focuses on a number of key issues in the evaluation of data quality for business surveys. Business surveys have many of the same error sources as household surveys: errors due to nonresponse, measurement, data processing, and the frame. But there are additional sources as well; for example, errors due to company profiling, estimates revision, and combining data from multiple sources to produce national accounts statistics. Moreover, some error sources - such as editing error and specification error - have even greater importance in business surveys than in household surveys. Evaluating data quality can be particularly challenging, yet there is scant literature on the topic for business surveys. This overview lecture will consider some of the main techniques for evaluating business survey data quality and will describe some recent developments in the field.

The Social Statistics and Government Statistics Sections of the American Statistical Association (ASA) along with the Washington Statistical Society (a chapter of ASA) established the Roger Herriot Award for Innovation in Federal Statistics. The award is intended to recognize individuals who develop unique and innovative approaches to the solution of statistical problems in federal data collection programs.

Return to top

Title: On the Nile problem by Ronald Fisher

  • Speaker: Dr. Yaakov Malinovsky (Dept. of Mathematics and Statistics, UMBC)
  • Date/Time: Thursday, October 25th at 3:30pm
  • Location: Room 1313, Math Building, University of Maryland College Park (directions).
  • Sponsor: University of Maryland, Statistics Program (seminar updates).

Abstract:

The Nile problem of Ronald Fisher may be interpreted as the problem of making statistical inference for a special curved exponential family when the minimal sufficient statistic is incomplete. The problem itself and its versions for general curved exponential families pose a mathematical-statistical challenge: studying the subalgebras of ancillary statistics within the σ-algebra of the (incomplete) minimal sufficient statistics, and a closely related question on the structure of UMVUEs. In the talk a new method is presented that proves that in the classical Nile problem no statistic subject to mild natural conditions is a UMVUE. The result almost solves an old problem on the existence of UMVUEs. The method is purely statistical (vs. analytical) and requires the existence of an ancillary subalgebra. An analytical method that uses only first-order ancillarity (and thus works in setups where the existence of an ancillary subalgebra is an open problem) proves the nonexistence of UMVUEs for curved exponential families with polynomial constraints on the parameters. (Joint work with Abram Kagan)

Tea Served After Seminar in Room 3201.

Return to top

Title: Modeling of Complex Stochastic Systems via Latent Factors

Abstract:

Factor models, and related statistical tools for dimension reduction, have been widely and routinely used in psychometrics, item response theory, geology, econometrics, and biology, among many other fields, since the late 1960s, when Karl G. Joereskog, a Swedish statistician, proposed the first reliable numerical method for maximum likelihood estimation (MLE) in factor analysis (Joereskog, 1969). Such developments happened, certainly not by chance, around the same time the computer industry was experiencing major advances. From a Bayesian perspective, Martin and McDonald (1975) showed that MLE suffers from several inconsistency issues (for instance, negative idiosyncratic variances). Nonetheless, Bayesian researchers themselves could not produce general algorithms for exact posterior inference for factor models until the early 1990's when the computer industry had another wave of major advances and Markov chain Monte Carlo (MCMC) schemes were almost instantly customized for all fields cited above. In this talk, my goal is to illustrate how such advances, both in factor modeling and statistical computing, have driven my own research in financial econometrics, spatio-temporal modeling and macro- and microeconomics, among others.

Return to top

Title: Marginal Additive Hazards Model for Case-cohort Studies with Multiple Disease Outcomes

  • Speaker: Jianwen Cai, Ph.D., Professor and Associate Chair, Department of Biostatistics, University of North Carolina at Chapel Hill, Gillings School of Global Public Health
  • Date: Friday, October 26, 2012
  • Time: 10:00-11:00 am
  • Q&A: 11:00-11:30 am
  • Location: Warwick Evans Conference Room, Building D, 4000 Reservoir Rd, Washington, DC
  • Directions: http://dbbb.georgetown.edu/mastersprogram/visitors/. Medical Campus map: http://bit.ly/T1UsxK
  • Sponsor: Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University. Part of the Bio3 Seminar Series.

Abstract:

We consider fitting marginal additive hazards regression models for case-cohort studies with multiple disease outcomes. Most modern analyses of survival data focus on multiplicative models for relative risk using proportional hazards models. However, in many biomedical studies, the proportional hazards assumption might not hold, or the investigators are interested in risk differences. The additive hazards model, which models the risk differences, has often been suggested as an alternative to the proportional hazards model. We consider a weighted estimating equation approach for the estimation of model parameters. The asymptotic properties of the proposed estimators are derived and their finite sample properties are assessed via simulation studies. The proposed method is applied to the Atherosclerosis Risk in Communities (ARIC) Study for illustration.

Return to top

Title: Sparse estimation for estimating equations using decomposable norm-based regularizers

Abstract:

We propose a new estimating equation-based regularization method for simultaneous estimation and variable selection. Our method can be used even when the number of covariates exceeds the number of samples, and can be implemented using well-studied algorithms from the non-linear constrained optimization literature. Furthermore, for a certain class of estimating equations and a certain class of regularizers, which includes the lasso and group lasso, we prove a finite-sample probability bound on the accuracy of our estimator. Our research was motivated by practical problems, from a genome-wide association study of non-small-cell lung cancer patients and a clinical trial of therapies for head and neck cancer, that are difficult to analyze under the likelihood setting. In simulations we show that our procedure outperforms competing methods, and we use it to analyze the aforementioned studies. (Joint work with Dave Zhao from UPenn).

Return to top

Title: On the Foundations and Philosophy of Info-Metrics

Abstract:

Info-metrics is the science and art of quantitative information processing and inference. It crosses the boundaries of all sciences and provides the universal mathematical and philosophical foundations for inference with finite, noisy or incomplete information. Info-metrics lies in the intersection of information theory, inference, mathematics, statistics, complexity, decision analysis and the philosophy of science. From mystery solving to the formulation of theories, we must infer with limited and blurry observable information. The study of info-metrics helps in resolving a major challenge for all scientists and all decision makers: how to reason under conditions of incomplete information. Though optimal inference and efficient information processing are at the heart of info-metrics, these issues cannot be developed and studied without understanding entropy, statistical inference, probability theory, information and complexity theory, as well as the meaning and value of information, data analysis and other related concepts from across the sciences. In this talk I will discuss some of the issues related to information and information processing. I will concentrate on the basic problem of inference with finite information and with a minimal set of assumptions or structure, and will discuss some of the open questions in that field.

Return to top

Title: Case Studies in Nutrition and Disease Prevention: what went wrong?

  • Speaker: Harold Seifried, Ph.D., Acting Chief, Nutritional Science Research Group, DHHS/NIH/NCI
  • Date/Time: Friday, November 2nd, 1:30pm-3:30pm
  • Location: Executive Plaza North (EPN), Conference Room H, 6130 Executive Boulevard, Rockville MD.
    Photo ID and sign-in required. Metro: Get off at the White Flint stop on the red line, and take Nicholson lane to Executive blvd. Make a Right and continue, crossing Old Georgetown Rd. When the road bends to the right make a left turn to enter the executive plaza complex parking lot. EPN will be the right most of the two twin buildings.
  • Map: http://dceg.cancer.gov/images/localmap.gif
  • Sponsor: Public Health and Biostatistics Section, WSS and the NCI

Abstract:

Many times in the design, analysis, or interpretation of biomedical studies, the best-intentioned investigators with the highest level of integrity commit grave errors by overlooking some crucial aspect of the study design, with the result that the wrong conclusions are drawn from the study results. This is especially true in prevention studies with a diet or dietary-supplement intervention. We will present well-known studies: dietary fat versus breast cancer incidence; two efficacy studies, Iressa and Tarceva; and a cancer prevention study of beta-carotene and cancer. The errors committed in these studies include misjudging the true differences in level of exposure, failure to characterize and target appropriate sub-populations, and improper study selection in the design of a meta-analysis.

Return to top

Title: Jigsaw Percolation: Which networks can solve a puzzle?

Abstract:

I will introduce a new mathematical model for how individuals in a social network might combine partial ideas to solve a complex problem. For the simplest version of this theoretical model, my collaborators and I have determined which networks can and cannot solve a large class of puzzles. I will present our results and some open questions.
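A minimal sketch of the jigsaw percolation model as it is described in the literature: two graphs on the same vertices, a social graph and a puzzle graph, and clusters merge whenever they are joined by an edge in both. This is offered as an illustration rather than the speaker's exact formulation:

```python
def jigsaw_solves(n, people_edges, puzzle_edges):
    """Jigsaw percolation on n vertices: starting from singleton clusters,
    merge any two clusters joined by an edge in BOTH the social graph and
    the puzzle graph; the network 'solves the puzzle' if a single cluster
    remains. Edges are (u, v) pairs of vertex indices."""
    clusters = [{i} for i in range(n)]

    def linked(a, b, edges):
        return any((u in a and v in b) or (u in b and v in a) for u, v in edges)

    changed = True
    while changed and len(clusters) > 1:
        changed = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                if linked(a, b, people_edges) and linked(a, b, puzzle_edges):
                    clusters[i] = a | b  # merge the two clusters
                    del clusters[j]
                    changed = True
                    break
            if changed:
                break
    return len(clusters) == 1
```

Note that a social edge alone is not enough: two people who know each other but hold non-adjacent puzzle pieces contribute nothing, which is what makes the solvability question depend on the interaction of the two graphs.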

Return to top

Title: Molecular Gene-signatures and Cancer Clinical Trials

  • Speaker: Mei-Ling Ting Lee, Ph.D., Professor and Chair, Department of Epidemiology and Biostatistics; Director, Biostatistics and Risk Assessment Center (BRAC); Editor-in-Chief, Lifetime Data Analysis; University of Maryland, College Park
  • Date: Friday, November 2, 2012
  • Time: 12:00-1:00
  • Place: Room 107, Howard Hall, 660 W. Redwood St
  • Sponsors: Division of Biostatistics, University of Maryland Marlene and Stewart Greenebaum Cancer Center & Department of Epidemiology and Public Health, University of Maryland, Baltimore (UMB).

Abstract:

Over the last dozen years the process to develop molecular biomarkers and genomic tests for assessing the risk of cancer and cancer recurrence has been evolving. High-throughput technologies have increased the rate of discovery of potential new markers and facilitated the development of composite gene signatures that provide prognostic or predictive information about tumors. The traditional method to assess the risk of cancer recurrence is based on clinical/pathological criteria. The conventional design has been challenged, especially when the diseases may be heterogeneous due to underlying genomic characteristics.

Recently there has been an increase in cancer clinical trials using gene signatures to assess cancer aggressiveness. For example, in some breast cancer studies, it was hypothesized that by using newly developed gene-signature tools one can identify a subgroup of patients who will respond significantly to post-surgery (adjuvant) chemotherapy. Future treatments can then be tailored to the individual patient, sparing a large subgroup of potentially non-responsive patients the side effects of treatment.

On the other hand, a parallel goal is to identify the best treatment for patients: chemotherapy or hormonal therapy. It is important to note that, if one of the major goals of using genomic biomarkers is to move closer to individualized treatment, the biomarkers or gene signatures need to be both prognostic and predictive. Many genomic biomarker studies and clinical investigations have been conducted in the past few years. In this talk, we review these investigations and their results.

Note: Part of the UMB Biostatistics and Quantitative Research Workshop and Seminar Series. This seminar series is intended to be a forum for current and ongoing methodological development in bioinformatics and biostatistics that can potentially influence cancer research, as well as cancer studies with a strong quantitative basis. The series includes both chalk talks on ongoing research and open unresolved problems and more formal presentations from both inside and outside speakers.

Return to top

Title: Privacy-Utility Paradigm using Synthetic Data

  • Speaker: Anand N. Vidyashankar, Ph.D., Department of Statistics, Volgeneau School of Engineering, George Mason University
  • Chair: Mike Fleming
  • Date/time: Thursday, November 8, 2012, 12:30 - 1:30 p.m.
  • Location: Bureau of Labor Statistics, Conference Center
    To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: WSS Agriculture and Natural Resources

Abstract:

Data confidentiality is an important issue that arises in several applications involving privacy and security, encompassing areas such as healthcare, social networks, and financial applications. Synthetic data is one of the tools used to address data privacy concerns, and differential privacy is a metric to evaluate the quality of privacy delivered by a perturbation mechanism. In this talk, we first describe a class of methods for generating synthetic data in high dimensions and the role of copulas and mixed-effects models. Second, we describe the properties of data generated using the proposed methodology, and methods for determining the parameters of the perturbation process accounting for privacy and utility. Third, we describe a robust methodology for inference concerning the parameters of the process and study the properties of these estimators. Finally, we apply our methodology to several data sets and evaluate the trade-off between privacy and utility. Applications to network data will also be presented.
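As a point of reference for the differential privacy metric mentioned above, the basic epsilon-differentially-private release mechanism adds Laplace noise scaled to the query's sensitivity. A minimal sketch for illustration only; it is not the speaker's methodology:

```python
import math
import random

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release true_value plus Laplace(scale = sensitivity / epsilon) noise,
    the basic mechanism achieving epsilon-differential privacy.
    Smaller epsilon (stronger privacy) means larger noise."""
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    # Inverse-CDF sampling of a Laplace variate from u ~ Uniform(-0.5, 0.5)
    u = rng.random() - 0.5
    return true_value - scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
```

With a very large epsilon the released value is essentially the true value; as epsilon shrinks, the perturbation (and privacy protection) grows.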

Point of contact: Anand Vidyashankar, avidyash@gmu.edu

Return to top

Title: Penalized Quantile Regression in Ultra-high Dimensional Data

  • Speaker: Professor Runze Li, Department of Statistics, Penn State University
  • Date/Time: Thursday, November 8th, 2012, 3:30pm
  • Location: Room 1313, Math Building, University of Maryland College Park (directions).
  • Sponsor: University of Maryland, Statistics Program (seminar updates).

Abstract:

Ultra-high dimensional data often display heterogeneity due to either heteroscedastic variance or other forms of non-location-scale covariate effects. To accommodate heterogeneity, we advocate a more general interpretation of sparsity which assumes that only a small number of covariates influence the conditional distribution of the response variable given all candidate covariates; however, the sets of relevant covariates may differ when we consider different segments of the conditional distribution. In this talk, I first introduce recent development on the methodology and theory of nonconvex penalized quantile linear regression in ultra-high dimension. I further propose a two-stage feature screening and cleaning procedure to study the estimation of the index parameter in heteroscedastic single-index models with ultrahigh dimensional covariates.

Sampling properties of the proposed procedures are studied. The finite-sample performance of the proposed procedures is examined in Monte Carlo simulation studies. A real data example is used to illustrate the proposed methodology.
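At the heart of any penalized quantile regression is the Koenker-Bassett check (pinball) loss. A minimal illustration, fitting only an unconditional sample quantile; the nonconvex-penalized, ultra-high-dimensional methodology of the talk is far beyond this sketch:

```python
def check_loss(u, tau):
    """Koenker-Bassett check loss rho_tau(u) = u * (tau - 1{u < 0}),
    whose expected value is minimized by the tau-th conditional quantile."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def sample_quantile(xs, tau):
    """The tau-th sample quantile minimizes the total check loss over q;
    a minimizer can always be taken to be one of the data points."""
    return min(xs, key=lambda q: sum(check_loss(x - q, tau) for x in xs))
```

For tau = 0.5 this recovers the median; the heavy right tail of [1, 3, 100] pulls the mean but not the check-loss minimizer.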

Return to top

Title: Some Statistical Issues in Diagnostic Studies with Three Ordinal Disease Stages

  • Speaker: Lili Tian, Ph.D., Associate Professor and Director of Graduate Studies, Department of Biostatistics, University of Buffalo
  • Date: Friday, November 9, 2012
  • Time: 10:00-11:00 am
  • Q&A: 11:00-11:30 am. Please reply to Lindsay Seidenberg (lcb48@georgetown.edu) if you are interested in meeting for 30 minutes with the seminar speaker in the afternoon.
  • Location: Warwick Evans Conference Room, Building D, 4000 Reservoir Rd, Washington, DC.
  • Directions: http://dbbb.georgetown.edu/mastersprogram/visitors/. Medical Campus map: http://bit.ly/T1UsxK
  • Parking: Metered street parking is available along Reservoir Road. To park on campus, drive into Entrance 1 via Reservoir Road and drive straight back to Leavey Garage. Parking fees are $3.00 per hour.
  • Sponsor: Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University. Part of the Bio3 Seminar Series.

Abstract:

The receiver operating characteristic (ROC) curve is a useful tool for assessing the accuracy of a classifier or predictor for diseases with binary classes, e.g., diseased vs. non-diseased, and the area under the ROC curve (AUC) has been widely used as a quantitative index of the discriminating ability of a continuous biomarker. In practice, many disease processes (e.g., Alzheimer's disease) have three ordinal disease stages, i.e., "non-diseased", "early stage" and "diseased". This talk will mainly address two important issues in this setting: 1) how to combine several markers to increase diagnostic accuracy; and 2) how to estimate the diagnostic ability of a marker for detection of the early disease stage. Parametric and nonparametric statistical methods will be presented for both issues. The proposed solutions are applied to a real data set from a cohort study of Alzheimer's disease (AD) from the Washington University Knight Alzheimer's Disease Research Center.
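For three ordered disease stages, one standard generalization of the AUC is the volume under the ROC surface (VUS): the probability that a random triple, one subject per stage, is correctly ordered by the marker. A minimal nonparametric estimator, offered as background rather than the speaker's method:

```python
from itertools import product

def vus(non_diseased, early, diseased):
    """Nonparametric volume under the ROC surface for three ordered groups:
    the proportion of triples (x, y, z), one marker value per group,
    with x < y < z. Chance level is 1/6; 1.0 is perfect separation."""
    total = correct = 0
    for x, y, z in product(non_diseased, early, diseased):
        total += 1
        if x < y < z:
            correct += 1
    return correct / total
```

A marker that perfectly separates the stages gives VUS = 1; one ordered the wrong way gives 0.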

Return to top

Title: On the dynamic control of matching queues

Abstract:

We consider the optimal control of matching queues with dynamically arriving jobs. Jobs arrive to their dedicated queues and wait to be matched with jobs from other (possibly multiple) queues. Our approach to this problem falls within the now broad literature on the optimal control of stochastic processing networks and may be considered a specialization of that theory to this context -- in the spirit of studies conducted for (capacitated) parallel server queues. While our model is somewhat reminiscent of the latter, a fundamental distinguishing feature of matching queues is the duality of demand and supply whereby arriving jobs play simultaneously the role of "customers" waiting for service and of supply of matching-opportunities for other jobs. Paralleling the notions of resource pooling and equivalent workload formulation in the capacitated queueing context, we characterize a match pooling condition and a shortage formulation that have appealing interpretations within the matching context and allow for significant simplifications to the optimal control problem. Subsequently, we offer a policy that, when implemented, provides nearly optimal performance for networks with large arrival rates regardless of whether they are balanced, unbalanced, or alternate between the two modes.
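The duality of demand and supply described above can be seen in even the simplest matching discipline. A toy first-come-first-served matcher for two job types, illustrative only and not the control policy of the talk:

```python
from collections import deque

def match_fcfs(stream):
    """FCFS matching of two job types: an arriving 'a' job is matched
    instantly with a waiting 'b' job if one exists (and vice versa),
    otherwise it joins its dedicated queue. Each arrival thus acts both
    as a customer (it may wait) and as supply for the other queue.
    Returns (number of matches, final length of queue a, of queue b)."""
    q = {'a': deque(), 'b': deque()}
    other = {'a': 'b', 'b': 'a'}
    matches = 0
    for job in stream:
        if q[other[job]]:
            q[other[job]].popleft()  # the arrival supplies a waiting job
            matches += 1
        else:
            q[job].append(job)       # the arrival waits as a customer
    return matches, len(q['a']), len(q['b'])
```

An unbalanced arrival stream leaves a residual queue of the over-represented type, the "shortage" that the talk's formulation makes precise.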

Return to top

Title: Adjusting for Nonresponse in the Occupational Employment Statistics Survey

  • Organizer: Dan Liao, WSS Methodology Program Chair
  • Chair: Dan Liao, WSS Methodology Program Chair
  • Speaker: Nicholas Horton, Smith College
  • Discussant: Nathaniel Schenker, National Center for Health Statistics
  • Date/time: November 14, 2012 / 12:30 p.m. - 2:00 p.m.
  • Location: Bureau of Labor Statistics, Conference Center Room 10
    To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: WSS Methodology Program

Abstract:

Past research indicates that employment size, industry sector, multi-establishment status, and metropolitan area size, along with important interactions, have a significant impact on an establishment's propensity to respond to the Bureau of Labor Statistics Occupational Employment Statistics (OES) survey. Using administrative wage data linked to the sample, we find that these establishment characteristics are related to wages; wage estimates are a major OES outcome variable. In this paper, we investigate the use of the administrative data for imputing missing data due to nonresponse. The multiple imputation method focuses on adjusting the OES wage estimates with this auxiliary data to reduce potential bias.

Return to top

Title: Quality Assurance Tests of Tablet Content Uniformity: Small Sample US Pharmacopeia and Large Sample Tests

  • Speaker: Professor Yi Tsong, Office of Biostatistics, CDER, FDA - (Based on Joint works with Meiyu Shen, Jinglin Zhong and Xiaoyu Dong of CDER, FDA)
  • Date/Time: Thursday, November 15th, 2012, 3:30pm
  • Location: Room 1313, Math Building, University of Maryland College Park (directions).
  • Sponsor: University of Maryland, Statistics Program (seminar updates).

Abstract:

The small-sample United States Pharmacopeia (USP) content uniformity sampling acceptance plan, a two-stage sampling plan with criteria on the sample mean and the number of out-of-range tablets, was the compendial standard. It is, however, often used mistakenly for quality assurance of a lot. Both FDA and the EMA (European Medicines Agency) have proposed large-sample quality assurance tests using a tolerance interval approach. EMA proposed a test using a modified two-sided tolerance interval as an extension of USP. Its quality assurance is characterized by controlling the required percentage of the lot within pre-specified specification limits. FDA statisticians, on the other hand, proposed an approach based on two one-sided tolerance intervals that provides quality assurance by controlling the below-specification (low efficacy) and above-specification (potential overdose) portions of the lot separately. FDA further proposed the large-sample approach with sample-size-adjusted specifications in order to assure that an accepted lot will have more than a 90% chance of passing the small-sample USP compendial test during the lot's lifetime. Operating characteristic curves are generated to characterize the approaches and demonstrate the differences between the two approaches.
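For orientation, a one-sided normal tolerance factor can be approximated in closed form. The sketch below uses Natrella's approximation; exact factors use the noncentral t distribution, and the regulatory criteria discussed in the talk involve more than this:

```python
from statistics import NormalDist

def one_sided_k(n, coverage=0.90, confidence=0.95):
    """Approximate one-sided normal tolerance factor k (Natrella's
    approximation): with confidence `confidence`, at least `coverage`
    of the population lies above x_bar - k*s (or below x_bar + k*s)
    for a sample of size n."""
    zp = NormalDist().inv_cdf(coverage)    # coverage quantile
    za = NormalDist().inv_cdf(confidence)  # confidence quantile
    a = 1 - za ** 2 / (2 * (n - 1))
    b = zp ** 2 - za ** 2 / n
    return (zp + (zp ** 2 - a * b) ** 0.5) / a
```

A two one-sided test in the spirit of the FDA proposal would then accept a lot only if both x_bar - k*s and x_bar + k*s fall inside the specification limits; larger samples yield smaller k and hence a sharper test.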

Tea to follow in Room 3201 (3rd Floor)

Return to top

Title: Sequential Tests of Multiple Hypotheses

Abstract:

This talk concerns the following general scenario: A scientist wishes to perform a battery of experiments, each generating a sequential data stream, to investigate some phenomenon. She would like to control the overall error rate in order to draw statistically-valid conclusions from each experiment, but also to be as efficient as possible, "dropping" streams whenever possible. The between-stream data may differ in distribution and dimension but at the same time may be highly correlated, even duplicated exactly in some cases.

Treating each experiment as a hypothesis test and adopting the familywise error rate (FWER) metric, we give a general framework for combining a battery of sequential hypothesis tests into a sequential multiple testing procedure that controls FWER, and another that controls the type I and II FWERs when alternative hypotheses are specified for each data stream along with the null hypotheses. In both versions, dramatic savings in expected sample size can be achieved relative to fixed sample, sequential Bonferroni, and other recently proposed sequential procedures, often with much less conservative error control. The proposed procedures are based on various sequential extensions of Holm's (1979) step-down procedure. If there is enough time I will also mention sequential control of false discovery rate, another popular multiple testing error metric.
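The fixed-sample building block that the sequential procedures above extend, Holm's (1979) step-down procedure, can be sketched as follows (a textbook version, not the sequential extension of the talk):

```python
def holm_rejections(pvalues, alpha=0.05):
    """Holm (1979) step-down procedure controlling the familywise error
    rate at level alpha. Returns the set of indices of rejected nulls."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    rejected = set()
    for step, i in enumerate(order):
        # Compare the (step+1)-th smallest p-value against alpha / (m - step)
        if pvalues[i] <= alpha / (m - step):
            rejected.add(i)
        else:
            break  # step-down: stop at the first non-rejection
    return rejected
```

Note how the threshold relaxes from the Bonferroni level alpha/m upward as hypotheses are rejected, which is the source of Holm's extra power over plain Bonferroni.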

This is joint work with Jinlin Song, my PhD student at USC.

Return to top

Title: Statistical Confidentiality: Modern Techniques to Protect Sensitive Cells when Publishing Tables

Abstract:

In 1994, the United Nations set out its Fundamental Principles of Official Statistics. Principle 6 states, "Individual data collected by statistical agencies for statistical compilation, whether they refer to natural or legal persons, are to be strictly confidential and used exclusively for statistical purposes". The implication of this principle is simple: statistical confidentiality is vital to the stewardship of statistical data. Astonishing advances in technology for computing and telecommunications have boosted on one trajectory the perceived benefits of statistical information and, on an opposing trajectory, anxiety about confidentiality. Data are mainly disseminated as microdata and tables, and they both demand techniques to protect sensitive information. In this seminar we will discuss and set up the basic definition of protection.

We want to make clear when a table is or is not protected before being released. This is not a trivial concept, as different organizations may have different definitions in mind. We will concentrate the discussion on protecting tabular data, not microdata. Once a definition of protection is established, we then analyze widely-used techniques to solve the problem of finding a protected table. These techniques include cell suppression and controlled rounding. We will discuss examples and explore the limits of modern algorithms to effectively apply these techniques to real-world tables.
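As background to the rounding family of techniques, a cell count is often perturbed by unbiased random rounding to a base. A minimal sketch; controlled rounding, which additionally preserves table additivity exactly, requires an optimization step not shown here:

```python
import random

def random_round(x, base=3, rng=None):
    """Unbiased random rounding of a nonnegative count to a multiple of
    `base`: with r = x mod base, round down with probability 1 - r/base
    and up with probability r/base, so that E[rounded value] = x."""
    rng = rng or random.Random()
    r = x % base
    low = x - r
    if r == 0:
        return low  # already a multiple of the base
    return low + base if rng.random() < r / base else low
```

Because the expectation equals the true count, published totals remain unbiased even though individual sensitive cells are perturbed.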

Return to top

Title: Conditional Correlation Models of Autoregressive Conditional Heteroskedasticity with Nonstationary GARCH Equations

Abstract:

We investigate the effects of carefully modelling the long-run dynamics of the volatilities of stock market returns on the conditional correlation structure. To this end we allow the individual unconditional variances in Conditional Correlation GARCH models to change smoothly over time by incorporating a nonstationary component in the variance equations. The modelling technique to determine the parametric structure of this time-varying component is based on a sequence of specification Lagrange multiplier-type tests derived in Amado and Teräsvirta (2011). The variance equations combine the long-run and the short-run dynamic behaviour of the volatilities. The structure of the conditional correlation matrix is assumed to be either time-independent or to vary over time. We apply our model to pairs of seven daily stock returns belonging to the S&P 500 composite index and traded on the New York Stock Exchange. The results suggest that accounting for deterministic changes in the unconditional variances considerably improves the fit of the multivariate Conditional Correlation GARCH models to the data. The effect of careful specification of the variance equations on the estimated correlations is variable: in some cases rather small, in others more discernible. In addition, we found that portfolio volatility-timing strategies based on time-varying unconditional variances often outperform the strategy based on unmodelled long-run variances out-of-sample. As a by-product, we generalize news impact surfaces to the situation in which both the GARCH equations and the conditional correlations contain a deterministic component that is a function of time.
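A toy simulation of the multiplicative idea, a GARCH(1,1) short-run variance scaled by a smooth deterministic long-run component. The logistic transition and all parameter values are illustrative assumptions; the specification-test machinery of the paper is not reproduced:

```python
import math
import random

def simulate_tvgarch(n, omega, alpha, beta, rng=None):
    """Simulate n returns with conditional variance h_t * g(t/n): a
    GARCH(1,1) short-run component h_t multiplied by a smooth
    deterministic long-run component g (here a logistic transition
    from 1 to 2), in the spirit of a multiplicative decomposition."""
    rng = rng or random.Random(0)
    h, prev_shock, returns = 1.0, 0.0, []
    for t in range(n):
        g = 1.0 + 1.0 / (1.0 + math.exp(-10 * (t / n - 0.5)))  # smooth shift
        h = omega + alpha * prev_shock ** 2 + beta * h          # short-run GARCH
        r = math.sqrt(h * g) * rng.gauss(0.0, 1.0)
        prev_shock = r / math.sqrt(g)  # recursion driven by the short-run shock
        returns.append(r)
    return returns
```

Fitting a plain GARCH model to such data and ignoring g would misattribute the deterministic variance shift to spurious near-integrated GARCH persistence, which is the motivation for modelling the long-run component explicitly.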

Return to top

Title: Understanding and Improving Propensity Score Methods

  • Speaker: Prof. Zhiqiang Tan, Dept. of Statistics, Rutgers University
  • Date/Time: Thursday, November 29, 2012, 3:30pm
  • Location: Room 1313, Math Building, University of Maryland College Park (directions).
  • Sponsor: University of Maryland, Statistics Program (seminar updates).

Abstract:

Consider estimating the mean of an outcome in the presence of missing data or estimating population average treatment effects in causal inference. The propensity score is the conditional probability of non-missingness given explanatory variables. In this talk, we will discuss propensity score methods including doubly robust estimators that are consistent if either a propensity score model or an outcome regression model is correctly specified. The focus will be to understand propensity score methods (compared with those based on outcome regression) and to show recent advances of these methods.
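A doubly robust (augmented inverse-probability-weighted) estimate of a mean with missing outcomes can be sketched as follows, taking the fitted propensities and outcome-model predictions as given inputs. This is an illustrative sketch of the standard AIPW form, not the speaker's code:

```python
def aipw_mean(y, observed, prop, mhat):
    """Augmented IPW (doubly robust) estimate of E[Y] with missing
    outcomes. y[i] is used only when observed[i] is True; prop[i] is the
    estimated propensity P(observed | X_i); mhat[i] is the outcome-
    regression prediction E[Y | X_i]. Consistent if either the propensity
    model or the outcome model is correctly specified."""
    n = len(prop)
    total = 0.0
    for i in range(n):
        r = 1.0 if observed[i] else 0.0
        # Outcome-model prediction plus an inverse-weighted residual correction
        total += mhat[i] + r * (y[i] - mhat[i]) / prop[i]
    return total / n
```

With all outcomes observed and unit propensities the estimator reduces to the sample mean; with everything missing it falls back entirely on the outcome model.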

Return to top

Title: Parametric and Topological Inference for Masked System Lifetime Data

  • Speaker: Simon Wilson, School of Computer Science and Statistics, Trinity College, Dublin, Ireland
  • Time: Friday, November 30th 3:30-4:30 pm
  • Place: Duques 650 (2201 G Street, NW, Washington, DC 20052).
  • Directions: Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/explore/visitingcampus/campusmaps.
  • Sponsor: The George Washington University, The Institute for Integrating Statistics in Decision Sciences and the Department of Decision Sciences. See http://business.gwu.edu/decisionsciences/i2sds/seminars.cfm for a list of seminars.

Abstract:

Commonly, reliability data consist of lifetimes (or censoring information) on all components and systems under examination. However, masked system lifetime data represent an important class of problems where the information available for statistical analysis is more limited: one has failure times only for the system as a whole, with no direct data on the component lifetimes, or even on which components had failed. Such data can arise, for example, when system autopsy is impractical or cost-prohibitive. A novel signature-based data augmentation scheme is presented which enables inference for a wide class of component lifetime models for an exchangeable population of systems. It is shown that the approach can be extended to enable topological inference of the underlying system design. A number of illustrative examples are included, such as the usual i.i.d. exponential case, an exchangeable case, and a phase-type component reliability case.
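To make the masking concrete, here is a toy generator of masked data. The three-component structure min(T1, max(T2, T3)) and the unit-rate exponential lifetimes are illustrative assumptions, not the talk's models:

```python
import random

def masked_sample(n_systems, rng=None):
    """Generate masked lifetime data for a 3-component system with
    structure min(T1, max(T2, T3)): component 1 in series with a
    parallel pair. Only the system failure time is recorded; which
    components failed, and when, is 'masked'."""
    rng = rng or random.Random(0)
    data = []
    for _ in range(n_systems):
        t1, t2, t3 = (rng.expovariate(1.0) for _ in range(3))
        data.append(min(t1, max(t2, t3)))
    return data
```

Inference then has to recover the component lifetime distribution (and, in the topological extension, the structure function itself) from these system-level times alone.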

Return to top

Title: Adaptive Inference After Model Selection

Abstract:

Penalized maximum likelihood methods that perform automatic variable selection have been developed, studied, and deployed in almost every area of statistical research. A prominent example is the LASSO of Tibshirani (1996), with its numerous variants. It is now well known, however, that these estimators are nonregular and consequently have limiting distributions that can be highly sensitive to small perturbations of the underlying generative model. This is the case even in the fixed "p" framework. Hence, the usual asymptotic methods for inference, like the bootstrap and series approximations, often perform poorly in small samples and require modification. Here, we develop locally asymptotically consistent confidence intervals for regression coefficients when estimation is done using the Adaptive LASSO (Zou, 2006) in the fixed "p" framework. We construct the confidence intervals by sandwiching the nonregular functional of interest between two smooth, data-driven, upper and lower bounds and then approximating the distribution of the bounds using the bootstrap. We leverage the smoothness of the bounds to obtain consistent inference for the nonregular functional under both fixed and local alternatives. The bounds are adaptive to the amount of underlying non-regularity in the sense that they deliver asymptotically exact coverage whenever the underlying generative model is such that the Adaptive LASSO estimators are consistent and asymptotically normal, and conservative coverage otherwise. The resultant confidence intervals possess a certain tightness property among all regular bounds. Although we focus on the case of the Adaptive LASSO, our approach generalizes to other penalized methods, including the elastic net and SCAD.
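For orientation, the Adaptive LASSO has a closed form in the special case of an orthonormal design, where each coefficient is soft-thresholded with a data-driven, coefficient-specific penalty weight. A minimal sketch of that special case only; the confidence-interval construction of the talk is not reproduced:

```python
def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def adaptive_lasso_orthonormal(beta_ols, lam, gamma=1.0):
    """Adaptive LASSO (Zou, 2006) under an orthonormal design: each
    coefficient solves S(beta_ols_j, lam * w_j) with data-driven weight
    w_j = 1 / |beta_ols_j|**gamma. Large initial estimates are penalized
    less, small ones more -- the source of the oracle property."""
    out = []
    for b in beta_ols:
        if b == 0.0:
            out.append(0.0)  # infinite weight: the coefficient stays at zero
        else:
            out.append(soft_threshold(b, lam / abs(b) ** gamma))
    return out
```

A large initial estimate is barely shrunk while a small one is thresholded to exactly zero, which is why the estimator's distribution is nonregular near zero coefficients.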

Refreshments will be served.

Return to top

Title: Using Safety Signals To Detect Subpopulations: A Population Pharmacokinetic/Pharmacodynamic Mixture Modeling Approach

  • Speaker: Junshan Qiu US FDA
  • Date/Time: Friday, November 30, 12:00-1:00 pm. Coffee and refreshments will be served.
  • Location: Executive Plaza North (EPN), 6130 Executive Boulevard, Rockville, MD. Room H. Photo ID and sign-in required.
  • Metro: Get off at the White Flint stop on the Red Line and take Nicholson Lane to Executive Boulevard. Turn right and continue, crossing Old Georgetown Road. When the road bends to the right, turn left to enter the Executive Plaza complex parking lot. EPN is the rightmost of the two twin buildings.
  • Map: http://dceg.cancer.gov/images/localmap.gif
  • Sponsor: Public Health and Biostatistics Section, WSS

Abstract:

Safety signals generated from adverse events data not only can help make drug development decisions but also can help identify subpopulations characterized by different biomarkers. The current research focuses on a pharmacokinetic/pharmacodynamic (PK/PD) mixture modeling approach. A zero-inflated ordinal logistic regression model was used to interpret the PD data and was further linked with the PK models. This PK/PD mixture model can be used to analyze the PK/PD data simultaneously. Selection of appropriate statistical algorithms to approximate and maximize the likelihood function of the PK/PD mixture model was performed based upon simulation studies. Further, the PK/PD mixture model, coupled with a stochastic approximation expectation-maximization (SAEM) algorithm, was used to analyze simulated data for a population with different characteristics dosed orally with drug A.

Return to top

Title: The Under-Appreciation of the Insights Provided by Non-parametric and Robust Methods in the Analysis of Data Arising in Law and Public Policy

Abstract:

Because nonparametric and robust statistical methods are valid under fewer assumptions than many parametric methods, they have the potential for obtaining sound inferences for a wide variety of data sets of modest size that are submitted as evidence in legal cases or to government agencies formulating a regulation. This talk will show how robust methods can be applied in a variety of areas, including securities law, environmental regulation, evaluating the fairness of peremptory challenges made by the prosecution in serious criminal cases, and estimating the rate of increase in income inequality in the United States from 1967 to 2011. In particular, it will be seen that a proper analysis of the data submitted to the SEC by Goldman Sachs in the well-known Abacus mortgage security case shows that the final Abacus portfolio performed statistically significantly worse than the universe of similar securities, and that appropriate statistical methodology would provide stronger evidence of discrimination in the peremptory challenges made by North Carolina prosecutors in death penalty cases. A new, robust version of the Gini index of inequality will be described, which shows that income inequality in the nation rose at about twice the rate indicated by the standard Gini index.
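For context on the final point, the standard Gini index mentioned in the abstract is the mean absolute difference between all pairs of incomes, scaled by twice the mean. The robust version described in the talk is not specified in the abstract; the sketch below shows only the standard index.

```python
import numpy as np

def gini(x):
    """Standard (unweighted) Gini index: mean absolute difference
    over all ordered pairs, divided by twice the mean income."""
    x = np.asarray(x, dtype=float)
    mad = np.abs(x[:, None] - x[None, :]).mean()   # includes self-pairs
    return mad / (2.0 * x.mean())

print(gini([1, 1, 1, 1]))   # perfect equality -> 0.0
print(gini([0, 0, 0, 1]))   # one person holds everything -> (n-1)/n = 0.75
```

Because the index depends on the full set of pairwise differences, extreme top incomes (often poorly measured or top-coded in survey data) can distort it, which motivates robust alternatives of the kind the talk describes.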

Return to top

Title: Recent Developments in Machine Learning and Personalized Medicine

Abstract:

Personalized medicine is an important and active area of clinical research involving high dimensional data. In this talk, we describe some recent design and methodological developments in clinical trials for discovery and evaluation of personalized medicine. Statistical learning tools from artificial intelligence, including machine learning, reinforcement learning and several newer learning methods, are beginning to play increasingly important roles in these areas. We present illustrative examples in treatment of depression and cancer. The new approaches have significant potential to improve health and well-being.

Return to top

Title: Improving the Design and Analysis of Case-Control Studies of Rare Variation in the Presence of Confounders

Abstract:

Recent advances in next-generation sequencing technology have enabled investigators to assess the role of rare genetic variation in the origins of complex human diseases. An open issue with such resequencing studies is their validity in the presence of potential confounders, such as population stratification. In this talk, I describe the use of a measure called the stratification score (defined as the odds of disease given confounders) to resolve confounding in case-control resequencing studies. First, I show how one can use the stratification score to choose a subset of cases and controls for resequencing from a larger GWAS sample that are well matched on confounders. Second, I describe how one can use the stratification score to adjust existing rare-variant association tests (many of which rely on statistical frameworks that do not allow for covariates) for confounders. We illustrate our approaches using both simulated data and real data from existing studies of psychiatric and metabolic phenotypes. This is joint work with Drs. Glen Satten and Andrew Allen.
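A minimal sketch of the stratification score idea, under stated assumptions: the score is the fitted odds of disease given the confounders, estimated here with a hand-rolled logistic regression by gradient ascent (a real analysis would use a vetted GLM routine, and the simulated "confounders" stand in for quantities like ancestry principal components).

```python
import numpy as np

def stratification_score(X, y, iters=2000, lr=0.5):
    """Fitted odds of disease given confounders X, from a logistic
    regression of case-control status y. Illustrative implementation."""
    X1 = np.column_stack([np.ones(len(X)), X])      # add intercept
    beta = np.zeros(X1.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        beta += lr * X1.T @ (y - p) / len(y)        # ascend the log-likelihood
    p = 1.0 / (1.0 + np.exp(-X1 @ beta))
    return p / (1.0 - p)                            # odds scale, as in the talk

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))                         # confounders
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X[:, 0])))   # disease depends on X[:, 0]
odds = stratification_score(X, y)
# Cases and controls can then be matched (or stratified) on these odds,
# either to pick a well-matched subset for sequencing or to adjust tests.
```

Matching on this one-dimensional score is what lets rare-variant tests that cannot accept covariates still be adjusted for confounding.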

Return to top

U.S. Census Bureau
DSMD Distinguished Seminar Series

Title: Adjustment and Stabilization: Identifying and Meeting Goals

  • Presenter: Dr. Thomas A. Louis, Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University
  • Discussant: Dr. Roderick Little, University of Michigan and U.S. Census Bureau
  • Chair: Ruth Ann Killion, Chief, Demographic Statistical Methods Division, U.S. Census Bureau
  • Date: Monday, December 10, 2012
  • Time: 10:00 am - 11:30 am
  • Where: Conference Rooms 1&2, U.S. Census Bureau, 4600 Silver Hill Road, Suitland, Maryland
  • Contact: Cynthia Wellons-Hazer, 301-763-4277, Cynthia.L.Wellons.Hazer@census.gov

Abstract:

The Centers for Medicare and Medicaid Services (CMS) annually compares hospitals with respect to mortality, readmissions, and other outcomes. Performance metrics are based on information for each hospital as to how well it performs with its patients as compared to a counterfactual hospital treating the same patients but operating at the national norm. CMS uses an empirical Bayes logistic regression with a Gaussian prior to estimate and stabilize the comparisons. The approach has generated several criticisms, including that it fails to reveal provider performance variation, masks the performance of small hospitals, does not incorporate hospital-level characteristics in developing shrinkage targets, and does not use them in estimating the risk model to reduce confounding. Other than the last of these, the criticisms are only in play because low-volume hospitals produce high-variance estimates that are moved considerably toward the national norm. The foregoing and related issues apply to many other contexts, including small area estimation and genomics.

Using a report prepared by the Committee of Presidents of Statistical Societies as background and motivation, I identify inferential goals and outline candidate approaches to address the criticisms. Approaches include using a fixed-effects model with hospital-specific intercepts (with the associated wide confidence intervals and high year-to-year variation for the low-volume hospitals), using hospital-level attributes in the risk model or in determining shrinkage targets, use of a prior distribution other than Gaussian, limiting the shrinkage, replacing the posterior means by ensemble estimates, and modified reporting of results. In closing, I outline other features that require consideration, for example, selection effects that can bias assessments.
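The core shrinkage behavior the abstract discusses can be sketched with a simple Gaussian-approximation version of empirical Bayes (an illustrative stand-in for the CMS logistic model; the prior variance here is an assumed constant, not an estimate):

```python
import numpy as np

def eb_shrink(rates, ns, prior_var=0.01):
    """Shrink per-hospital event rates toward the volume-weighted
    national norm; low-volume hospitals are pulled hardest."""
    rates = np.asarray(rates, dtype=float)
    ns = np.asarray(ns, dtype=float)
    norm = np.average(rates, weights=ns)          # national norm
    samp_var = norm * (1.0 - norm) / ns           # binomial sampling variance
    w = prior_var / (prior_var + samp_var)        # shrinkage weight in [0, 1]
    return w * rates + (1.0 - w) * norm

rates = np.array([0.30, 0.10, 0.15])   # observed mortality rates
ns    = np.array([20, 5000, 400])      # hospital volumes
print(eb_shrink(rates, ns))            # the n=20 hospital moves most toward the norm
```

This is precisely the behavior the criticisms target: the small hospital's apparently poor rate is pulled far toward the norm, stabilizing the estimate but masking its performance, which motivates the alternatives (non-Gaussian priors, limited shrinkage, ensemble estimates) listed above.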

Return to top

Title: Uses of Models in Survey Design and Estimation

  • Speaker: Dr. Richard Valliant, University of Michigan and Joint Program in Survey Methodology at the University of Maryland
  • Date & Time: Tuesday, December 11, 11:00am-12:30 pm
  • Locations:
    1. 1208 Lefrak Hall, University of Maryland, College Park, MD 20742 (LIVE)
      Use the Green Line to College Park and take the campus bus or walk. The Mowatt Lane parking garage near Lefrak Hall also has visitor parking.
    2. Bureau of Labor Statistics, Conference Center Room 10 (We can join the seminar via video conference at this location.)
      To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
    3. National Center for Health Statistics, Hyattsville MD, NCHS Room 1406 (Only for NCHS staff)
  • Sponsor: WSS Methodology Program

Abstract:

Survey statisticians rely on a list of probability distributions when designing samples and selecting estimators. These can be explicit or implicit and include a randomization distribution for sample selection, structural superpopulation models to describe analysis variables, a random coverage model to describe omissions from the frame, a model for how units respond, and a model for imputing missing data, among others. This talk surveys some of the uses of models to design samples and construct estimators. Topics will include approximation of optimum selection probabilities, creation of strata, relationship of balanced sampling to methods used in practice, using models to evaluate candidate estimators, selecting covariates for estimators, and the use of paradata in constructing models for nonresponse adjustment. A little of the history of the sometimes controversial use of models in surveys will also be discussed.
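One concrete instance of "using models to construct estimators" is the model-assisted ratio estimator, which improves on the pure design-based Horvitz-Thompson estimator when an auxiliary total is known. A minimal sketch (variable names and the toy numbers are assumptions for illustration):

```python
import numpy as np

def ht_total(y, pi):
    """Horvitz-Thompson estimator of a population total: weight each
    sampled value by the inverse of its inclusion probability."""
    return np.sum(np.asarray(y, dtype=float) / np.asarray(pi, dtype=float))

def ratio_total(y, x, pi, x_total):
    """Model-assisted ratio estimator: exploits a known auxiliary
    total x_total under the implicit working model E[y_i] ~ x_i."""
    return x_total * ht_total(y, pi) / ht_total(x, pi)

y  = np.array([2.0, 4.0, 6.0])    # study variable on the sample
x  = np.array([1.0, 2.0, 3.0])    # auxiliary variable on the sample
pi = np.full(3, 0.5)              # inclusion probabilities
print(ht_total(y, pi))                       # 24.0
print(ratio_total(y, x, pi, x_total=10.0))   # 20.0: y = 2x exactly, so 2 * 10
```

The estimator stays design-consistent even if the working model is wrong, which is the sense in which models in surveys can assist estimation without being assumed true.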

Return to top


Methodology