# Washington Statistical Society Seminars: 2011

#### Title: Estimating the Distribution of Usual Daily Energy Expenditure

• Speaker: Nick Beyler, Mathematica Policy Research, Inc.
• Organizer: David Judkins, Westat, WSS Methodology Section Chair
• Chair: Brian Meekins, BLS
• Date/Time: Wednesday, January 12, 2011, 12:30-2:00pm
• Location: Bureau of Labor Statistics, Conference Center. To be placed on the seminar attendance list at the Bureau of Labor Statistics, you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.

Abstract:

When making inferences about physical activity, public health researchers are often interested in the distribution of usual (long-term average) daily energy expenditure for individuals in population groups, such as groups defined by gender and age. Because usual daily energy expenditure cannot be directly measured, self-reported activity data are collected from a sample of individuals and converted to energy expenditure. Daily energy expenditure measurements from self-report instruments contain measurement errors that lead to biased estimates of usual daily energy expenditure parameters. Activity monitors give more accurate measures of energy expenditure, but are expensive to administer. If monitors are used on a subsample of respondents, the reference monitor data can be used in a measurement error model framework to adjust estimates for the presence of errors. We propose a method for estimating usual daily energy expenditure parameters for multiple population groups. The research extends an existing method for estimating usual dietary intake distributions using 24-hour recall data (without a reference measure) for a single subpopulation. The new method involves transforming daily energy expenditure values to normality. The distribution of individuals' mean daily energy expenditure in the transformed scale is estimated using measurement error models to characterize sources of bias and random error for population groups. An estimated distribution of usual daily energy expenditure is generated in the original scale for each of the groups and parameters of usual daily energy expenditure are estimated from the estimated distributions. We illustrate our method with preliminary data from the Physical Activity Measurement Survey (PAMS).

#### Title: Statistical Disclosure: Independent Rounding of Discrete Tabular Data and Efforts to Perturb Data

• Organizer: J. Neil Russell, NCES
• Chair: Adam Safir, BLS, WSS Methodology Program Chair
• Speakers: Ramesh A. Dandekar, U.S. Department of Energy and Tom Krenzke, Westat
• Date/Time: Tuesday, January 18, 2011, 12:30-2:00 pm
• Location: Bureau of Labor Statistics, Conference Center Room 10

Abstracts:

(In)Effectiveness of Independent Rounding of Discrete Tabular Data as a Statistical Disclosure Control Strategy, Ramesh A. Dandekar

In an attempt to protect sensitive counts data (discrete data), independent rounding of tabular data cells has been proposed and implemented by statistical agencies all over the world. In this presentation, we demonstrate that such a practice (1) degrades tabular data quality and (2) produces non-additive tables. The strategy also fails to provide adequate protection from statistical disclosure for low-count tabular data cells.
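The non-additivity is easy to reproduce. The sketch below (an illustration, not Dandekar's own procedure) independently rounds a row of small counts to multiples of 3 and shows that the rounded interior cells no longer sum to the rounded margin:

```python
def round_to_base(x, base=3):
    """Independently round a count to the nearest multiple of `base`
    (ties broken downward for determinism)."""
    lower = (x // base) * base
    upper = lower + base
    return lower if x - lower <= upper - x else upper

# A 1x3 row of small counts and its marginal total.
cells = [1, 1, 1]
true_total = sum(cells)                               # 3

rounded_cells = [round_to_base(c) for c in cells]     # each 1 -> 0
rounded_total = round_to_base(true_total)             # 3 -> 3

# The independently rounded cells no longer add to the rounded margin:
# 0 + 0 + 0 != 3.
print(rounded_cells, sum(rounded_cells), rounded_total)
```

The same example hints at the disclosure weakness for low counts: a cell rounded to 0 (or 3) reveals that the true count lay in a narrow, small range.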

Efforts to Perturb ACS Data for the Census Transportation Planning Products, Tom Krenzke

The main disclosure control practice that has been used on certain Census Transportation Planning Products (CTPP) tabulations is cell suppression. The underlying data for the CTPP are moving from the Census Long Form data to the smaller American Community Survey (ACS) five-year combined sample. It is clear that the data loss at finer geographic areas, such as planned travel analysis zones (TAZs), will be substantial on five-year ACS data due to Census Bureau Disclosure Review Board (DRB) disclosure rules. For this reason, research is being conducted on operationally practical ways to generate perturbed ACS values that retain the usability of the data tabulations, satisfy the transportation data user community's analytical needs, and pass the disclosure rules set by the DRB. Here we report on some initial results from an evaluation of data utility from three perturbation approaches: parametric model-based, semi-parametric model-assisted, and a constrained hot deck.

#### Title: When Should You Sell Your Mansion?

• Speaker: Anna Amirdjanova, Ph.D, Department of Mathematics and Statistics, American University
• Date/Time: 3:35pm, Tuesday, January 18th, 2011
• Location: Bentley Lounge, Gray Hall 130, American University
• Directions: Metro RED line to Tenleytown-AU. AU shuttle bus stop is next to the station. Please see campus map on http://www.american.edu/media/directions.cfm for more details
• Contact: Stacey Lucien, 202-885-3124, mathstat@american.edu
• Sponsor: American University Department of Mathematics and Statistics Colloquium

Abstract:

In this paper a class of mixed stochastic control/optimal stopping problems, arising in the search for the best time to sell an indivisible non-traded real asset owned by a risk-averse, utility-maximizing agent, is considered. The agent has a power-type utility based on the $\ell_{\alpha}$-type aggregator and has access to a frictionless financial market, which can be used to partially hedge the risk associated with the real asset when the correlations between the financial assets and the real asset value are nonzero. The solution to the problem of finding the optimal time to sell the real asset is characterized in terms of the solution to a certain free boundary problem. The latter involves a highly nonlinear partial differential equation and generalizes known cases of the Hamilton-Jacobi-Bellman equation with simpler preferences. Comparisons with the case of exponential utility are also given.

#### Title: Methodology of the Fourth National Incidence Study of Child Abuse and Neglect

• Speaker: Andrea J. Sedlak, Westat
• Chair: Arthur Kendall, Capital Area Social Psychological Association
• Date/Time: Wednesday, January 19, 12:30 - 2:00 p.m.
• Location: American Psychological Association (APA), 750 First Street NE, Washington, DC 20002-4242. (This is about 1 block north of the First Street NW Metro exit, Union Station on the Red Line. There is also pay parking at Union Station.)
• RSVP Instructions: Due to security regulations please RSVP to Ron Schlittler at (202)336-6041 or rschlittler@apa.org
• Sponsors: WSS Human Rights, DC-AAPOR, and Capital Area Social Psychological Association
• Presentation material:
Slides (pdf, ~21.7mb)

Abstracts:

The National Incidence Study of Child Abuse and Neglect (NIS) is a congressionally mandated, periodic research effort to assess the incidence of child abuse and neglect in the United States. The NIS gathers information from multiple sources to estimate the number of children who are abused or neglected, providing information about the nature and severity of the maltreatment; the characteristics of the children, perpetrators, and families; and the extent of changes in the incidence or distribution of child maltreatment since the last national incidence study.

The NIS design assumes that the maltreated children who are investigated by child protective services represent only the "tip of the iceberg," so although the NIS estimates include children investigated by child protective services, they also include maltreated children who are identified by professionals in a wide range of agencies in representative communities. These professionals, called "sentinels," are asked to remain on the lookout for children they believe are maltreated during the study period. Children identified by sentinels and those whose alleged maltreatment is investigated by child protective services during the same period are evaluated against standardized definitions of abuse and neglect. The data are unduplicated so that a given child is counted only once in the study estimates.

This talk will discuss the methodology for the Fourth National Incidence Study of Child Abuse and Neglect (NIS-4).

For additional information about NIS-4, the Report to Congress is complete and is available at http://www.acf.hhs.gov/programs/opre/abuse_neglect/natl_incid/index.html

For further information about the seminar, contact Michael P. Cohen, mpcohen@juno.com or (202) 232-4651.

#### Title: Providing Double Protection Against Unit Nonresponse Bias with a Nonlinear Calibration Routine

• Organizer: David Judkins, Westat
• Speaker: Phillip S. Kott, Senior Research Statistician, RTI International
• Date/Time: Wednesday, January 26, 12:30-2:00pm
• Location: Bureau of Labor Statistics, Conference Center

Abstract:

There are at least two reasons to calibrate survey weights: (1) to force estimators to be unbiased under a prediction model, and (2) to adjust for the bias caused by unit nonresponse. When there are no nonresponse or coverage errors, many forms of calibration are asymptotically equivalent.

When there is unit nonresponse, Lundström and Särndal (JOS 1999) argue that a unit's weight adjustment under linear calibration is an estimate of the inverse of its response probability. A more natural calibration approach for response-probability modeling employs an extension of generalized raking proposed by Folsom and Singh (SRMS PROCEEDINGS 2000). Surprisingly, fitting a logistic model indirectly through calibration usually leads to smaller mean squared errors than fitting it directly through weighted or unweighted maximum likelihood methods. This is because calibration weighting produces estimators that have good properties under a linear prediction model. These properties can hold even when the response model under which the calibration was developed fails. Conversely, properties under the response-probability model do not depend on the prediction model holding.

#### Title: Convergence of Posterior Distributions in Infinite Dimension — A Decade of Success Stories

• Speaker: Subhashis Ghoshal, Department of Statistics, North Carolina State University
• Date/Time: Friday, January 28, 11-12pm
• Location: The George Washington University, Phillips Hall, Room 108, (801 22nd Street NW, Washington, DC 20052)
• Sponsor: The George Washington University, Department of Statistics

Abstract:

It was long realized that for parametric inference problems, posterior distributions based on a large class of reasonable prior distributions possess very desirable large sample convergence properties, even if viewed from purely frequentist angles. For nonparametric or semiparametric problems, the story gets complicated, but still good frequentist convergence properties are enjoyed by Bayesian methods if a prior distribution is carefully constructed. The last ten years have witnessed the most significant progress in the study of consistency, convergence rates and finer frequentist properties. It is now well understood that the properties are controlled by the concentration of prior mass near the true value, as well as the effective size of the model, measured in terms of the metric entropy. Results have poured in for independent and identically distributed data, independent and non-identically distributed data and dependent data, as well as for a wide spectrum of inference problems such as density estimation, nonparametric regression, classification, and so on. Nonparametric mixtures, random series and Gaussian processes play particularly significant roles in the construction of the "right" priors. In this talk, we try to outline the most significant developments that took place in the last decade. In particular, we emphasize the ability of the posterior distribution to effortlessly choose the right model and adapt to the unknown level of smoothness.

#### Title: Optimal Dynamic Return Management of Fixed Inventories

• Speaker: Mehmet Altug, Department of Decision Sciences, The George Washington University
• Time: Wednesday, February 2nd 11:00-12:15 pm
• Location: The George Washington University, Funger 620 (22nd and G Street NW)
• Directions: Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/~map.
• Sponsor: The George Washington University, The Institute for Integrating Statistics in Decision Sciences & Department of Statistics. List of all GWU Statistics seminars this semester: http://www.gwu.edu/~stat/seminars/spring2011.htm.

Abstract:

While the primary effort of all retailers is to generate the initial sale, return management is generally treated as a secondary issue that does not require the same level of planning. In this paper, we position return management as a process at the interface of both inventory and revenue management by explicitly incorporating the retailer's return policy in the consumer's valuation. We consider a retailer that sells a fixed amount of inventory over a finite horizon. We assume that the return policy is a decision variable that can be changed dynamically in every period. According to a hypothesis that is quite prevalent in the retailing industry, while flexible and more generous return policies increase consumer valuation and generate more demand, they also induce more returns. In this environment, we characterize the optimal dynamic return policies under two cost-of-return scenarios. We show a conditional monotonicity result and discuss how these return policies change with respect to the retailer's inventory position and time. We then propose a heuristic and prove that it is asymptotically optimal. We also study the joint dynamic pricing and dynamic return management problem in the same setting and propose two more heuristics whose performance is tested numerically and found to be close to optimal at higher inventory levels. We finally extend the model to multiple competing retailers and characterize the resulting equilibrium return policies and prices.

#### Title: Weighted Empirical Likelihood, Censored Data and Logistic Regression

• Speaker: Prof. Jian-Jian Ren, University of Central Florida
• Date/Time: Thursday, February 10, 2011, 9:30am (Note time change)
• Location: Colloquium Room 3206, Math Building, University of Maryland College Park (directions).

Abstract:

In this talk, we will review the concepts of parametric likelihood and the maximum likelihood estimator (MLE), and will review the concepts of nonparametric likelihood, called empirical likelihood (Owen, 1988), and the nonparametric MLE. We then introduce a new likelihood function, called weighted empirical likelihood (Ren, 2001, 2008), which is formulated in a unified form for various types of censored data. We show that the weighted empirical likelihood method provides a useful tool for solving a broad class of nonparametric and semiparametric inference problems involving complicated types of censored data, such as doubly censored data, interval censored data, partly interval censored data, etc. These problems are mathematically challenging and practically important due to applications in cancer research, AIDS research, etc. As an example, some related new statistical methods and data examples on the logistic regression model with censored data will be presented.

#### Title: Pooling Designs for Outcomes Under a Gaussian Random Effects Model

• Speaker: Yaakov Malinovsky, National Institutes of Health, Eunice Kennedy Shriver National Institute of Child Health & Human Development, Epidemiology Branch
• Date/Time: Friday, February 11, 11-12pm
• Location: The George Washington University, Phillips Hall, Room 108 (801 22nd Street NW, Washington, DC 20052).
• Directions: Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/~map.
• Sponsor: The George Washington University, Department of Statistics. List of all GWU Statistics seminars this semester: http://www.gwu.edu/~stat/seminars/spring2011.htm

Abstract:

Due to the rising cost of laboratory assays, it has become increasingly common in epidemiological studies to pool biospecimens. This is particularly true in longitudinal studies, where the cost of performing multiple assays over time can be prohibitive. In this work, we consider the problem of estimating the parameters of a Gaussian random effects model when the repeated outcome is subject to pooling. We consider different pooling designs for the efficient maximum-likelihood estimation of variance components, with particular attention to estimating the intraclass correlation coefficient. We evaluate the efficiency of different pooling design strategies using analytic and simulation study results. We discuss the robustness to the normal assumptions of the error and between subject variation, and consider the adaptation of our design recommendations to unbalanced designs. Further, we discuss the generalization of our design recommendations to models with additional sources of variation. The design methodology is illustrated with a longitudinal study of pre-menopausal women focusing on assessing the reproducibility of F2-isoprostane, a biomarker of oxidative stress, over the menstrual cycle.
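A rough intuition for why variance components remain estimable under pooling: assaying the average of g specimens shrinks only the within-subject variance, so Var(pooled outcome) = sigma_b^2 + sigma_w^2/g and both components stay identifiable. The simulation below is a hypothetical balanced setup, not the authors' design, recovering the intraclass correlation from pooled data by method of moments:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma_b2, sigma_w2 = 1.0, 1.0          # between- and within-subject variances
icc_true = sigma_b2 / (sigma_b2 + sigma_w2)

n_subjects, n_reps, pool_size = 2000, 6, 2
subject = rng.normal(0.0, np.sqrt(sigma_b2), (n_subjects, 1))
y = subject + rng.normal(0.0, np.sqrt(sigma_w2), (n_subjects, n_reps))

# Assay the average of `pool_size` consecutive specimens per subject.
pooled = y.reshape(n_subjects, n_reps // pool_size, pool_size).mean(axis=2)
n_pools = n_reps // pool_size

w_hat = pooled.var(axis=1, ddof=1).mean()              # est. sigma_w2 / pool_size
sigma_w2_hat = w_hat * pool_size
b_hat = pooled.mean(axis=1).var(ddof=1) - w_hat / n_pools
icc_hat = b_hat / (b_hat + sigma_w2_hat)

print(round(icc_hat, 3), "vs true ICC", icc_true)
```

The design question the talk addresses is which pooling configurations make such estimators most efficient, particularly for the ICC.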

#### Title: Two Criteria for Evaluating Risk Prediction Models

• Speaker: Ruth Pfeiffer, Ph.D., Biostatistics Branch, Div. of Cancer Epidemiology and Genetics, National Cancer Institute
• Date/Time: 3:35pm, Tuesday, February 15th, 2011
• Location: Bentley Lounge, Gray Hall 130, American University
• Directions: Metro RED line to Tenleytown-AU. AU shuttle bus stop is next to the station. Please see campus map on http://www.american.edu/media/directions.cfm for more details
• Contact: Stacey Lucien, 202-885-3124, mathstat@american.edu
• Sponsor: American University Department of Mathematics and Statistics Colloquium

Abstract:

We propose and study two criteria to assess the usefulness of models that predict risk of disease incidence for screening and prevention, or the usefulness of prognostic models for management following disease diagnosis. The first criterion, the proportion of cases followed, PCF(q), is the proportion of individuals who will develop disease who are included in the proportion q of individuals in the population at highest risk. The second criterion is the proportion needed to follow-up, PNF(p), namely the proportion of the general population at highest risk that one needs to follow in order that a proportion p of those destined to become cases will be followed. PCF(q) assesses the effectiveness of a program that follows 100q% of the population at highest risk. PNF(p) assesses the feasibility of covering 100p% of cases by indicating how much of the population at highest risk must be followed. We show the relationship of these two criteria to the Lorenz curve and its inverse, and present distribution theory for estimates of PCF and PNF. We develop new methods, based on influence functions, for inference for a single risk model, and also for comparing the PCFs and PNFs of two risk models, both evaluated on the same validation data. We illustrate the methods using data from a validation study for a colorectal cancer risk prediction model.
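Both criteria have direct empirical analogues. A minimal sketch (toy data, not the validation study's data) that computes PCF(q) and PNF(p) from predicted risks and observed case status:

```python
import numpy as np

def pcf(risk, case, q):
    """Empirical PCF(q): share of cases found among the fraction q of
    the population with the highest predicted risk."""
    order = np.argsort(-risk)                    # highest risk first
    n_follow = int(np.ceil(q * len(risk)))
    return case[order[:n_follow]].sum() / case.sum()

def pnf(risk, case, p):
    """Empirical PNF(p): smallest population fraction, taken from the
    top of the risk distribution, that captures a proportion p of cases."""
    order = np.argsort(-risk)
    cum_cases = np.cumsum(case[order]) / case.sum()
    return (np.searchsorted(cum_cases, p) + 1) / len(risk)

# Toy data: an informative but imperfect risk score.
rng = np.random.default_rng(0)
risk = rng.uniform(size=1000)
case = (rng.uniform(size=1000) < risk**2).astype(int)

print(pcf(risk, case, 0.2))   # share of cases in the riskiest 20%
print(pnf(risk, case, 0.8))   # population share needed to cover 80% of cases
```

Plotting PCF(q) against q traces out the (inverse) Lorenz-type curve the abstract refers to; a useless risk score gives the diagonal, PCF(q) = q.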

#### Title: Predicting False Discovery Proportion Using Mixture Models

• Speaker: Anindya Roy, Department of Mathematics and Statistics, University of Maryland, Baltimore County
• Date/Time: Friday, February 25, 11-12pm
• Location: Phillips Hall, Room 108 (801 22nd Street NW, Washington, DC 20052)
• Directions: Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/~map.
• Sponsor: The George Washington University, Department of Statistics. List of all GWU Statistics seminars this semester: http://www.gwu.edu/~stat/seminars/spring2011.htm

Abstract:

We present a framework for estimating error measures in multiple testing problems. Our approach is based on modeling p-value distributions, on a transformed scale, by mixtures of skewed distributions. The model can incorporate dependence among p-values and also allows for shape restrictions on the p-value density. A nonparametric Bayesian scheme for estimating the components of the mixture model is outlined. The scheme allows us to predict the false discovery proportion for a given sample. An expression for the positive false discovery rate is given which can be used for estimation under certain dependence structures among p-values. Some results on identifiability of the proportion of null hypotheses are also presented.
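For intuition about predicting the false discovery proportion, a simple plug-in calculation can be done with a Storey-type estimator of the null proportion; note this is a much cruder device than the talk's nonparametric Bayesian mixture scheme:

```python
import numpy as np

def storey_pi0(pvals, lam=0.5):
    """Storey-type estimator of the proportion of true null hypotheses:
    p-values above `lam` are assumed to come almost entirely from nulls."""
    return min(1.0, np.mean(pvals > lam) / (1 - lam))

def predicted_fdp(pvals, t):
    """Plug-in predicted false discovery proportion when rejecting p <= t:
    expected null rejections (pi0 * t * m) over observed rejections."""
    m = len(pvals)
    n_reject = max((pvals <= t).sum(), 1)
    return storey_pi0(pvals) * t * m / n_reject

rng = np.random.default_rng(1)
# 900 null p-values (uniform) mixed with 100 non-null ones (skewed to 0).
pvals = np.concatenate([rng.uniform(size=900), rng.beta(0.1, 5, size=100)])
print(round(predicted_fdp(pvals, 0.05), 3))
```

The mixture-model framework of the talk replaces the uniform-null assumption above with a fitted mixture of skewed densities, which is what lets it handle dependence and shape restrictions.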

#### Title: An Application of a Two-Phase Address-Based Approach to Sampling For Subpopulations

• Organizer: David Judkins, Westat, WSS Methodology Section Chair
• Chair: Nick Beyler, Mathematica Policy Research, Inc.
• Speakers: Jill M. Montaquila, Douglas Williams, Daifeng Han, Westat
• Date/Time: Tuesday, March 1, 2011, 12:30-2:00pm
• Location: Mathematica Policy Research, Inc., Main Conference Room. Virtual attendance is also available by webinar or audio feed by phone

To be placed on the seminar attendance list, please RSVP to Bruno Vizcarra at bvizcarra@mathematica-mpr.com or (202) 484-4231 by noon at least two days in advance of the seminar. Provide your name, affiliation, contact information (email is preferred) and the seminar date. Once on the seminar list, you will be provided with information about webinar and phone viewing for the seminar should you choose not to attend in person. Mathematica is located at 600 Maryland Ave., SW, Suite 550, Washington, DC 20024. If traveling by Metro, take the Orange, Blue, Green, or Yellow Line to the L'Enfant Plaza Station and follow signs to exit at 7th and Maryland. The entrance to the building will be to your right at the top of the escalators. If traveling by car, pay parking is available in the building parking garage, which is located on 6th Street SW, across from the Holiday Inn. Once in the building, take the elevators by Wachovia to the 5th floor lobby and inform the secretary that you are attending the WSS seminar. Please call Mathematica's main office number ((202) 484-9220) if you have trouble finding the building or the 5th floor lobby.

Abstract:

Historically, random digit dial (RDD) sampling with computer assisted telephone interview (CATI) administration has been viewed as an efficient approach for administering subpopulation surveys. However, due to declines in response rates and coverage, alternatives to landline RDD sampling are being considered. One alternative is address-based sampling (ABS), but methods appropriate for subpopulations have not been examined adequately. In the Fall of 2009, we conducted a pilot study to evaluate ABS, with mail as the primary mode of collection, for a subpopulation. The goal was to replace a periodic survey of preschoolers and school-age children that previously had been conducted using RDD. This study included a screening phase to determine a household's eligibility, followed by a topical survey administered in eligible households.

#### Title: Skewed Factor Models and Maximum Likelihood Skew-Normal Factor Analysis

• Speaker: Beverly J. Gaucher, M.S., Department of Statistics, Texas A&M University
• Date/Time: 3:35pm, Tuesday, March 1st, 2011
• Location: Bentley Lounge, Gray Hall 130, American University
• Directions: Metro RED line to Tenleytown-AU. AU shuttle bus stop is next to the station. Please see campus map on http://www.american.edu/media/directions.cfm for more details
• Contact: Stacey Lucien, 202-885-3124, mathstat@american.edu
• Sponsor: American University Department of Mathematics and Statistics Colloquium

Abstract:

This research computes maximum likelihood estimates for factor analysis of skew-normal data. Factor analysis is applied to skewed distributions for a general skew model and for the skew-normal model at all sample sizes. The skewed models are formed using selection distribution theory, which is based on Rao's weighted distribution theory. The skew factor model takes the distribution of the unobserved common factors to be skew-normal and the unobserved unique factors to be Gaussian noise. The log-likelihood of the skew-normal factor analysis model is derived, from which the maximum likelihood factor loading estimates are calculated.

#### Title: Genomic Anti-profiles: Modeling Gene Expression and DNA Methylation Variability in Cancer Populations for Prediction and Prognosis

• Speaker: Prof. Hector Corrada Bravo, CMPS-Computer Science, UMCP
• Date/Time: Thursday, March 10, 2011, 3:30pm
• Location: Room 1313, Math Building, University of Maryland College Park (directions).

Abstract:

Predictive models of disease based on genomic measurements, e.g., gene expression or DNA methylation, usually focus on finding distinct representative profiles for healthy and diseased populations. However, some diseases, such as cancer, exhibit increased heterogeneity in the disease population. In this talk, I will discuss recent results and methods that use the idea of anti-profiles, based on the observation of increased variation in cancer populations, as predictive and prognostic models.

#### Title: Recent Results For Random Key Graphs: Connectivity, Triangles, Etc.

• Speaker: Prof. Armand Makowski, University of Maryland
• Date/Time: Thursday, March 17, 2011, 3:30pm
• Location: Room 1313, Math Building, University of Maryland College Park (directions).

Abstract:

Random key graphs, also known as uniform random intersection graphs, appear in application areas as diverse as clustering analysis, collaborative filtering in recommender systems and key distribution in wireless sensor networks (WSNs). In this last context random key graphs are naturally associated with a random key predistribution scheme proposed by Eschenauer and Gligor. In this talk we present some recent results concerning the structure of random key graphs. Similarities and differences with Erdős-Rényi graphs are given. We also discuss performance implications for the scheme of Eschenauer and Gligor. Highlights include: (i) A zero-one law for graph connectivity (and its critical scaling) as the number of nodes becomes unboundedly large; (ii) A zero-one law (and its critical scaling) for the appearance of triangles; and (iii) Clustering coefficients and the "small world" property of random key graphs. This is joint work with Ph.D. student Osman Yagan.
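The construction behind these results is simple to state: each node independently draws a fixed-size set of keys from a common pool, and two nodes are adjacent exactly when their key sets intersect. A minimal sketch (parameter values chosen for illustration, not from the talk):

```python
import random
from collections import deque

def random_key_graph(n, pool_size, k, seed=0):
    """Random key graph: each of n nodes draws k distinct keys from a
    pool of pool_size; nodes are adjacent iff their key sets intersect."""
    rng = random.Random(seed)
    keys = [frozenset(rng.sample(range(pool_size), k)) for _ in range(n)]
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if keys[i] & keys[j]:     # shared key => edge
                adj[i].append(j)
                adj[j].append(i)
    return adj

def is_connected(adj):
    """Breadth-first search from node 0; connected iff all nodes reached."""
    seen, queue = {0}, deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(adj)

# Here k^2/pool_size = 0.25, well above (log n)/n, so connectivity is
# overwhelmingly likely for this instance.
adj = random_key_graph(n=200, pool_size=100, k=5)
print(is_connected(adj))
```

The zero-one laws in the talk describe how connectivity (and triangle counts) flip from overwhelmingly absent to overwhelmingly present as these parameters cross their critical scalings.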

#### Title: Ignoring power and over-reliance on statistical significance: Implications for Dukes v. Walmart

• Speakers: Efstathia Bura and Joe Gastwirth, Department of Statistics, George Washington University
• Date/Time: Friday, March 11, 11-12pm
• Location: Phillips Hall, Room 108 (801 22nd Street NW, Washington, DC 20052)
• Directions: Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/~map.
• Sponsor: The George Washington University, Department of Statistics. List of all GWU Statistics seminars this semester: http://www.gwu.edu/~stat/seminars/spring2011.htm.

Abstract:

In situations where many individual plaintiffs have similar claims against the same defendant, it is often more efficient for them to be combined into a single class action. Statistical evidence is often submitted to establish that the members of the proposed class were affected by a common event or policy. In equal employment cases involving an employer with a number of locations or sub-units, defendants argue that the data should be examined separately for each unit, while the plaintiffs wish to pool the data into one or several much larger samples or focus on a few units in which statistical significance was observed. Courts often require plaintiffs to demonstrate a statistically significant disparity in a majority of the sub-units. It will be shown that this can lead to absurd results. When many statistical tests are carried out, a small percentage will yield significant results even if there is no disparity. Using the concept of power, one can calculate the expected number, E, of sub-units in which a statistically significant result would occur if there were a legally meaningful disparity. When the observed number of units with a significant disparity is close to E, the data are consistent with a pattern of unfair treatment. When the observed number is clearly less than E, the data do not indicate a pattern of unfairness. Our reanalysis of plaintiffs' promotion data for the 40 or 41 regions in the Wal-Mart case is consistent with an overall system in which the odds an eligible female had of being promoted were about 70 to 80 percent of those of a male. Applying an appropriate trend test to the summary p-values of Wal-Mart's store regressions indicates a general pattern of underpayment of female employees relative to that of similarly qualified males.
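The expected-count argument can be made concrete. For a one-sided z-test, the power in a sub-unit of size n against a standardized disparity delta is approximately Phi(delta * sqrt(n) - z_alpha), and E is the sum of these powers across sub-units. The sketch below uses hypothetical sub-unit sizes and effect size, not the Wal-Mart data:

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def power_one_sided(effect, n, alpha_z=1.645):
    """Approximate power of a one-sided z-test (alpha = 0.05 by default)
    against a standardized effect `effect` with n observations."""
    return norm_cdf(effect * math.sqrt(n) - alpha_z)

# Hypothetical sub-unit sample sizes: with a real but modest disparity
# (standardized effect 0.1), most units lack the power to reach
# significance on their own.
unit_sizes = [50, 80, 120, 200, 400, 60, 90, 150, 300, 100]
powers = [power_one_sided(0.1, n) for n in unit_sizes]
E = sum(powers)     # expected number of significant sub-units

print(round(E, 1), "of", len(unit_sizes), "units expected significant")
```

Even under a genuine disparity, E here is only a fraction of the number of units, which is why "significance in a majority of sub-units" is an unreasonable bar.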

#### Title: Handling Nonresponse in Longitudinal Studies: An Overview of Recent Research Developments

• Organizer: David Judkins, Westat, WSS Methodology Section Chair
• Chair: Nick Beyler, Mathematica Policy Research, Inc.
• Speaker: Jun Shao, Professor, University of Wisconsin
• Date/Time: Tuesday, March 15, 2011, 12:30-2:00pm
• Location: Mathematica Policy Research, Inc., Main Conference Room
600 Maryland Ave., SW, Suite 550,Washington, DC 20024-2512
Virtual attendance is also available by webinar or audio feed by phone
• Directions: To be placed on the seminar attendance list, please RSVP to Bruno Vizcarra at bvizcarra@mathematica-mpr.com or (202) 484-4231 by noon at least two days in advance of the seminar. Provide your name, affiliation, contact information (email is preferred) and the seminar date. Once on the seminar list, you will be provided with information about webinar and phone viewing for the seminar should you choose not to attend in person. Mathematica is located at 600 Maryland Ave., SW, Suite 550, Washington, DC 20024. If traveling by Metro, take the Orange, Blue, Green, or Yellow Line to the L'Enfant Plaza Station and follow signs to exit at 7th and Maryland. The entrance to the building will be to your right at the top of the escalators. If traveling by car, pay parking is available in the building parking garage, which is located on 6th Street SW, across from the Holiday Inn. Once in the building, take the elevators by Wachovia to the 5th floor lobby and inform the secretary that you are attending the WSS seminar. Please call Mathematica's main office number ((202) 484-9220) if you have trouble finding the building or the 5th floor lobby.

Abstract:

Nonresponse often occurs in longitudinal studies. When the nonresponse mechanism depends on the observed or unobserved values of the variable subject to nonresponse, statistical analysis is a great challenge. This presentation provides an overview of recent research developments on this problem. Several semi-parametric approaches to handling nonresponse are introduced, and assumptions under which these methods produce approximately unbiased and consistent estimators are discussed. Some empirical results are also presented.

#### Title: Techniques for High Accuracy in Estimation of the Mean Squared Prediction Error in General Small Area Model

• Speaker: Dr. Snigdhansu Chatterjee, School of Statistics, University of Minnesota
• Discussant Title: Combining Multiple Sources of Existing Databases to Reduce Estimation Errors: an Alternative to New Data Collection
• Discussant: Dr. Partha Lahiri, Joint Program in Survey Methodology, University of Maryland
• Date/Time: 10:00AM-11:30AM, Wednesday, March 23, 2011
• Location: Seminar room 5K410, US Census Bureau HQ, 4600 Silver Hill Road, Suitland, MD 20746
• Direction: Metro Green Line to Suitland station
• Contact: Yang Cheng, 301-763-3287, yang.cheng@census.gov
• Sponsor: Statistical Seminar in the Governments Division, US Census Bureau

Abstract of Talk:

A general small area model is a hierarchical two-stage model whose special cases include mixed linear models, generalized linear mixed models, and hierarchical generalized linear models. In such models, the variability of predictors (such as the empirical best predictor or the empirical best linear unbiased predictor) can be quantified by their mean squared prediction error (MSPE) or other risk functions. Small area predictors and estimators of their MSPE are generally not available outside some special cases. First, we extend the notion of a small area predictor and its MSPE in the presence of benchmarks, censoring or Winsorization constraints, and other standard restrictions. Second, we propose a simple resampling-based estimator of the MSPE for any general small area model. Third, we propose three techniques for improving on the basic resampling-based MSPE estimator to achieve high-order accuracy. These three techniques involve multi-layered bootstraps such as the double bootstrap, Taylor series-based methods, and a new method combining a limited-resample-size technique with the parametric bootstrap. Computational issues and other properties of these improved MSPE estimators will be discussed. We also present real data examples and applications.
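The resampling idea can be illustrated, in much simplified form, with a parametric-bootstrap MSPE estimate for a toy area-level model. Everything below (the model, the moment estimator, and the data) is a hypothetical sketch; the talk's general framework and its higher-order corrections go well beyond this.

```python
import random
import statistics

def eblup(y, d, mu, a):
    # Shrinkage predictor for a toy area-level model:
    # theta_i ~ N(mu, a), direct estimates y_i | theta_i ~ N(theta_i, d).
    g = a / (a + d)
    return [mu + g * (yi - mu) for yi in y]

def fit(y, d):
    # Crude moment estimates of the model parameters.
    mu = statistics.mean(y)
    a = max(statistics.variance(y) - d, 0.01)
    return mu, a

def bootstrap_mspe(y, d, reps, rng):
    """Parametric-bootstrap MSPE: regenerate data from the fitted model,
    re-fit and re-predict, then average the squared prediction errors."""
    mu, a = fit(y, d)
    m = len(y)
    sums = [0.0] * m
    for _ in range(reps):
        theta_b = [rng.gauss(mu, a ** 0.5) for _ in range(m)]
        y_b = [rng.gauss(t, d ** 0.5) for t in theta_b]
        mu_b, a_b = fit(y_b, d)
        pred_b = eblup(y_b, d, mu_b, a_b)
        for i in range(m):
            sums[i] += (pred_b[i] - theta_b[i]) ** 2
    return [s / reps for s in sums]

rng = random.Random(5)
y = [10.2, 7.1, 13.5, 8.7, 11.9, 9.8, 12.4, 6.5]   # hypothetical direct estimates
mspe = bootstrap_mspe(y, d=4.0, reps=2000, rng=rng)
# The shrinkage predictor's estimated MSPE should sit below the direct
# estimator's sampling variance d in every area.
print(mspe)
```

The same loop, nested once more (re-bootstrapping from each fitted bootstrap model), gives the double-bootstrap correction the abstract mentions.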

Abstract of Discussant:

There is a growing demand from various federal and local government agencies to produce high quality estimates on a wide range of parameters for a target population and its different subgroups in order to assess the well-being of a nation in terms of various socio-economic and health issues. I will first discuss a few situations where there is a need for extracting relevant information from surveys, administrative records, and census data using state-of-the-art statistical techniques, and then cite some potential applications of the parametric bootstrap in Census Bureau projects. The suggested methodology implies possible cost savings, since combining data would serve as an alternative to collecting new survey data and may potentially reduce estimation errors by taking advantage of existing databases.

#### Title: Estimation and Forecasting of Dynamic Conditional Covariance: A Semiparametric Multivariate Model

Abstract:

We propose a semiparametric conditional covariance (SCC) estimator that combines a first-stage parametric conditional covariance (PCC) estimator with a second-stage nonparametric correction estimator in a multiplicative way. We prove the asymptotic normality of our SCC estimator, propose a nonparametric test for the correct specification of PCC models, and study its asymptotic properties. We evaluate the finite sample performance of our test and SCC estimator and compare the latter with those of the PCC estimator, a purely nonparametric estimator, and the Hafner, van Dijk, and Franses (2006) estimator in terms of mean squared error and Value-at-Risk losses via simulations and real data analyses.

#### Title: Subsample Ignorable Likelihood for Regression Analysis with Missing Data

• Speaker: Rod Little, Department of Biostatistics, University of Michigan and Associate Director for Research and Methodology, Bureau of the Census
• Date/Time: Friday, March 25, 11-12pm
• Location: Phillips Hall, Room 108 (801 22nd Street NW, Washington, DC 20052)
• Directions: Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/~map.
• Sponsor: The George Washington University, Department of Statistics. List of all GWU Statistics seminars this semester: http://www.gwu.edu/~stat/seminars/spring2011.htm.

Abstract:

Two common approaches to regression with missing covariates are complete-case analysis (CC) and ignorable likelihood (IL) methods. We review these approaches, and propose a hybrid class, subsample ignorable likelihood (SSIL) methods, which applies an IL method to the subsample of observations that are complete on one set of variables, but possibly incomplete on others. Conditions on the missing data mechanism are presented under which SSIL gives consistent estimates, but both complete-case analysis and IL methods are inconsistent. We motivate and apply the proposed method to data from the National Health and Nutrition Examination Survey, and illustrate properties of the methods by simulation. Extensions to non-likelihood analyses are also mentioned. (Joint work with Nanhua Zhang)

#### Title: A Hierarchical Model for Environmental Correlated Count Processes

Abstract:

Environmental count processes are often characterized by imperfect sampling, zero-inflation, and complex spatial and temporal variability. Moreover, the effects of environmental factors (e.g., temperature) on these count processes often exhibit non-linear patterns which require advanced modeling techniques. The ability to model correlated count processes allows researchers to better understand the underlying linkages between components of an environmental system. Recently, there has been increasing interest in modeling correlated counts (e.g., correlated environmental processes; count processes from ecological communities) for inferential and prediction purposes, particularly given the growing interest in studying the effects of climate change on environmental processes. Hierarchical Bayesian modeling approaches provide a flexible and effective tool for modeling these complex problems. In this work, we propose a hierarchical semiparametric modeling approach for zero-inflated bivariate count processes. We present a general modeling framework with discussion of extensions to account for spatial and temporal structures. Finally, we discuss an application of a hierarchical semiparametric bivariate zero-inflated Poisson model for multi-species fish catch data from a monitoring program conducted by the U.S. Geological Survey (USGS) and the Army Corps of Engineers.
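The zero-inflation ingredient can be sketched in its simplest univariate form (made-up parameters; the talk's hierarchical, bivariate, semiparametric structure is not attempted here): with probability pi an observation is a structural zero, otherwise it is Poisson.

```python
import math
import random

def zip_loglik(counts, pi, lam):
    """Log-likelihood of a zero-inflated Poisson: with probability pi the
    count is a structural zero, otherwise it is Poisson(lam)."""
    ll = 0.0
    for y in counts:
        if y == 0:
            ll += math.log(pi + (1 - pi) * math.exp(-lam))
        else:
            ll += math.log(1 - pi) - lam + y * math.log(lam) - math.lgamma(y + 1)
    return ll

def simulate_zip(n, pi, lam, rng):
    out = []
    for _ in range(n):
        if rng.random() < pi:
            out.append(0)          # structural zero
        else:
            # Poisson draw by inversion of the CDF
            u, k, p = rng.random(), 0, math.exp(-lam)
            c = p
            while u > c:
                k += 1
                p *= lam / k
                c += p
            out.append(k)
    return out

rng = random.Random(1)
data = simulate_zip(2000, 0.3, 2.5, rng)
# The likelihood should prefer the true zero-inflated parameters
# over a plain Poisson fit (pi = 0) at the same rate.
print(zip_loglik(data, 0.3, 2.5) > zip_loglik(data, 0.0, 2.5))
```

In the hierarchical version described in the abstract, pi and lam would themselves get priors and regression structure rather than being fixed constants.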

#### Title: Inference on Treatment Effects from a Randomized Clinical Trial in the Presence of Premature Treatment Discontinuation: The SYNERGY Trial

• Organizer: David Judkins, Westat
• Speaker: Marie Davidian. William Neal Reynolds Professor of Statistics, North Carolina State University
• Date/Time: Wednesday, March 30, 12:30-2:00pm
• Location: Bureau of Labor Statistics, Conference Center. To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.

Abstract:

The Superior Yield of the New Strategy of Enoxaparin, Revascularization, and GlYcoprotein IIb/IIIa inhibitors (SYNERGY) trial was a randomized, open-label, multi-center clinical trial comparing two anticoagulant drugs (enoxaparin and unfractionated heparin, UFH) on the basis of various time-to-event endpoints. In contrast to other studies of these agents, the primary, intent-to-treat analysis did not find sufficient evidence of a difference, leading to speculation that premature discontinuation of the study agents by some subjects might have attenuated the treatment effect. As is often the case in such trials, some subjects discontinued (stopped or switched) their assigned treatment prematurely, either because of an adverse event or other condition under which discontinuation was mandated by the protocol, or for other reasons, e.g., switching to the other treatment at his/her provider's discretion (with more subjects switching from enoxaparin to UFH than vice versa). In this situation, interest often focuses on "the difference in survival distributions had no subject discontinued his/her assigned treatment," inference on which is often attempted via standard analyses where event/censoring times for subjects discontinuing assigned treatment are artificially censored at the time of discontinuation. However, this and other common ad hoc approaches may not yield reliable information because they are not based on a formal definition of the treatment effect of interest. We use SYNERGY as a context in which to describe how such an effect may be conceptualized properly and to present a statistical framework in which it may be identified, which leads naturally to the use of inverse probability weighted methods.

#### Title: On Identification of Bayesian DSGE Models

Abstract:

In recent years there has been increasing concern about the identification of parameters in dynamic stochastic general equilibrium (DSGE) models. Given the structure of DSGE models it may be difficult to determine whether a parameter is identified. For the researcher using Bayesian methods, a lack of identification may not be evident since the posterior of a parameter of interest may differ from its prior even if the parameter is unidentified. We show that this can be the case even if the priors assumed on the structural parameters are independent. We suggest two Bayesian identification indicators that do not suffer from this difficulty and are relatively easy to compute. The first applies to DSGE models where the parameters can be partitioned into those that are known to be identified and the rest, where it is not known whether they are identified. In such cases the marginal posterior of an unidentified parameter will equal the posterior expectation of the prior for that parameter conditional on the identified parameters. The second indicator is more generally applicable and considers the rate at which the posterior precision gets updated as the sample size (T) is increased. For identified parameters the posterior precision rises with T, whilst for an unidentified parameter its posterior precision may be updated but at a rate slower than T. This result assumes that the identified parameters are √T-consistent, but similar differential rates of update for identified and unidentified parameters can be established in the case of super-consistent estimators. These results are illustrated by means of simple DSGE models.
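The second indicator can be seen in a stylized, analytically tractable example (entirely hypothetical, not from the paper): in the model y_t = a + b + e_t only the sum s = a + b is identified, and conjugate normal updating shows the posterior precision of s growing linearly in T while the posterior variance of a plateaus at a positive constant.

```python
def posterior_vars(T, sigma2=1.0):
    """Toy model: y_t = a + b + e_t with e_t ~ N(0, sigma2).
    Only s = a + b is identified. With independent priors a, b ~ N(0, 1),
    the induced priors are s ~ N(0, 2) and (a - b) ~ N(0, 2), and the data
    never update (a - b)."""
    var_s = 1.0 / (0.5 + T / sigma2)      # conjugate normal update for s
    # a = s/2 + (a - b)/2, and (a - b) keeps its prior variance of 2:
    var_a = var_s / 4.0 + 0.5             # plateaus at 0.5 as T grows
    return var_s, var_a

for T in (10, 100, 1000):
    vs, va = posterior_vars(T)
    print(T, round(vs, 4), round(va, 4))
```

The posterior variance of the identified s shrinks like 1/T, while the posterior variance of the unidentified a is updated (it falls from its prior value of 1) but stalls at 0.5, exactly the slower-than-T behavior the indicator exploits.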

#### Title: Bridging the Gap — Bayesian Contributions to Survey Sampling Inference

• Organizer: Mike Brick, Westat, WSS President
• Chair: Adam Safir, BLS, WSS Methodology Program Chair
• Speaker: Murray Aitkin, Department of Mathematics and Statistics, University of Melbourne
• Discussant: TBD
• Date/Time: Tuesday, April 5, 2011, 12:30-2:00pm
• Location: Bureau of Labor Statistics, Conference Center Room 10
To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.

Abstract:

Bayesian contributions to survey sampling inference have been few, but they are very significant. This talk describes the multinomial model for finite populations, and the use of empirical likelihood and the Bayesian bootstrap for frequentist and Bayesian inference respectively. Extensions of the Bayesian approach allow for clustering and stratification in a fully Bayesian posterior framework, and also provide formal procedures for goodness of fit of parametric models.
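A minimal sketch of the Bayesian bootstrap mentioned above (Rubin's scheme for posterior inference on a mean; the data are invented, and the clustered and stratified extensions discussed in the talk are not attempted):

```python
import random

def bayesian_bootstrap_means(sample, draws, rng):
    """Rubin's Bayesian bootstrap: each posterior draw reweights the observed
    values with Dirichlet(1,...,1) weights, generated as the gaps between
    sorted uniforms."""
    n = len(sample)
    out = []
    for _ in range(draws):
        cuts = sorted(rng.random() for _ in range(n - 1))
        weights = [b - a for a, b in zip([0.0] + cuts, cuts + [1.0])]
        out.append(sum(w * y for w, y in zip(weights, sample)))
    return out

rng = random.Random(7)
sample = [12, 15, 9, 22, 18, 14, 11, 20]    # hypothetical survey values
post = bayesian_bootstrap_means(sample, 5000, rng)
post_mean = sum(post) / len(post)
# The posterior for the population mean centers near the sample mean,
# with the spread of `post` supplying the uncertainty interval.
print(round(post_mean, 2))
```

Unlike the frequentist bootstrap, no observation is ever omitted from a draw; only its weight varies, which is what makes the scheme a proper posterior under the multinomial model for the finite population.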

#### Title: Statistical Analysis Using Combined Data Sources

• Speaker: Ray Chambers, Professor of Statistical Methodology Centre for Statistical and Survey Methodology
School of Mathematics and Applied Statistics
University of Wollongong
• Discussants:
Mike Elliot, Associate Professor, Biostatistics Department
Associate Research Professor, Institute for Social Research
University of Michigan
Associate Research Professor, Joint Program in Survey Methodology
University of Maryland

Nathaniel Schenker, Associate Director for Research and Methodology
National Center for Health Statistics
Centers for Disease Control and Prevention
• Date/Time: Thursday, April 7 at 3:00 PM
• Location: 1524 Van Munching Hall, University of Maryland, College Park.
• Directions: Please RSVP for the reception afterwards with Sarah Gebremicael sgebremicael@survey.umd.edu. For technical questions please contact Duane Gilbert dgilbert@survey.umd.edu. Directions can be found at: http://www.cvs.umd.edu/visitors/maps.html

Abstract:

Information is required to understand, monitor and improve any social, economic, commercial or industrial process, and there are many potential sources of auxiliary data that can be used in conjunction with specially designed data collections, of which sample surveys are a major example, to provide this information. In this context, statistical analysis that combines individual data from a number of data sources offers the prospect of substantial improvements in efficiency compared to the traditional approach of carrying out separate analyses on the data obtained from each source. The need for such data integration has increased dramatically in recent times, fuelled by continuing demand for high quality data through the integration of available sources (i.e. surveys, administrative records, census) as well as pressure for combining available data sources in order to produce improved sub-population level estimates at minimal cost. Data integration is seen as a solution to these issues. However, in spite of extensive theoretical development, statistical analysis based on integrated data still faces considerable methodological challenges. This presentation will address some of these issues, focusing on analysis of data obtained via data linking and data pooling.

#### Title: Particle Learning for Fat-tailed Distributions

• Speaker: Hedibert Lopes, University of Chicago Booth School of Business
• Time: Thursday, April 7th 11:30-12:30pm
• Place: Duques 652 (2201 G Street, NW)
• Directions: Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/~map.
• Sponsor: The George Washington University, The Institute for Integrating Statistics in Decision Sciences. List of all GWU Statistics seminars this semester: http://www.gwu.edu/~stat/seminars/spring2011.htm.

Abstract:

It is well-known that parameter estimates and forecasts are sensitive to assumptions about the tail behavior of the error distribution. In this paper we develop an approach to sequential inference that also simultaneously estimates the tail of the accompanying error distribution. Our simulation-based approach models errors with a t-distribution and, as new data arrives, we sequentially compute the marginal posterior distribution of the tail thickness. Our method naturally incorporates fat-tailed error distributions and can be extended to other data features such as stochastic volatility. We show that the sequential Bayes factor provides an optimal test of fat-tails versus normality. We provide an empirical and theoretical analysis of the rate of learning of tail thickness under a default Jeffreys prior. We illustrate our sequential methodology on the British pound/US dollar daily exchange rate data and on data from the 2008-2009 credit crisis using daily S&P500 returns. Our method naturally extends to multivariate and dynamic panel data.
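The flavor of sequentially learning tail thickness can be mimicked with a crude grid posterior over the t degrees of freedom. This is only a stand-in for the paper's particle-learning machinery: the grid, prior, and synthetic data below are all assumptions of this sketch.

```python
import math
import random

def t_logpdf(x, v):
    # Log density of the standard Student-t with v degrees of freedom.
    return (math.lgamma((v + 1) / 2) - math.lgamma(v / 2)
            - 0.5 * math.log(v * math.pi)
            - (v + 1) / 2 * math.log(1 + x * x / v))

def sequential_tail_posterior(data, grid):
    """Discrete posterior over candidate degrees of freedom, updated one
    observation at a time (flat prior on the grid)."""
    logpost = [0.0] * len(grid)
    for x in data:
        logpost = [lp + t_logpdf(x, v) for lp, v in zip(logpost, grid)]
    m = max(logpost)
    w = [math.exp(lp - m) for lp in logpost]
    s = sum(w)
    return [wi / s for wi in w]

rng = random.Random(3)

def t3_draw():
    # Standard t with 3 degrees of freedom via a normal/chi ratio.
    z = rng.gauss(0, 1)
    chisq = sum(rng.gauss(0, 1) ** 2 for _ in range(3))
    return z / math.sqrt(chisq / 3)

data = [t3_draw() for _ in range(1500)]
grid = [2, 3, 5, 10, 30, 100]          # candidate degrees of freedom
post = sequential_tail_posterior(data, grid)
best = grid[post.index(max(post))]
print(best)
```

With fat-tailed data the posterior mass drifts away from the near-normal end of the grid (v = 30, 100) toward small degrees of freedom, which is the sequential discrimination between fat tails and normality the abstract describes.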

#### Title: Hitting-time Distributions for Markov Chains

• Speaker: Jim Fill, Department of Applied Mathematics and Statistics, Johns Hopkins University
• Date/Time: Friday, April 8, 11-12pm
• Location: Phillips Hall, Room 108, (801 22nd Street NW, Washington, DC 20052)
• Directions: Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/~map.
• Sponsor: The George Washington University, Department of Statistics. List of all GWU Statistics seminars this semester: http://www.gwu.edu/~stat/seminars/spring2011.htm.

Abstract:

I will discuss several representations of hitting-time distributions for (finite-state, ergodic, time-reversible, continuous-time) Markov chains and stochastic constructions corresponding to these representations. Examples of representations of distributions considered, each of which has a link to published work of Mark Brown, are those of

(i) the hitting time from state 0 of any given state for a birth-and-death chain on the nonnegative integers, as a convolution of exponential distributions;

(ii) the hitting time from stationarity of any given state, as a mixture of N-fold convolution powers of a certain distribution, with N geometrically distributed; and

(iii) the hitting time from stationarity of any given set of states, as a convolution of certain modified-exponential distributions that relate to the interlacing eigenvalue theorem for bordered symmetric matrices.

Intertwinings of Markov semigroups (I'll explain what these are) play a key role in the stochastic constructions.

This is joint work with my Ph.D. advisee Vince Lyzinski.

#### Title: Overview of Legal Framework Affecting Data Confidentiality and Privacy

• Chair: Grace O'Neill, U.S. Energy Information Administration
• Speaker: Jacob Bournazian, U.S. Energy Information Administration
• Date/Time: Tuesday April 12, 2011, 12:30-2:00 pm
• Location: Bureau of Labor Statistics, Conference Center, Room 10
To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
• Sponsor: Data Collection Methods, WSS
• Presentation material:
Slides (pdf, ~164kb)

Abstracts:

Federal legislation provides the framework for federal statistical agencies to adopt policies and practices for collecting and releasing statistical information. These statutes also provide the framework for providing researcher access to federal statistical data and the development of various types of restricted access research programs across the federal statistical agencies. There are two kinds of federal legislation: statutes that apply across all federal agencies, and agency-specific statutes. This presentation will provide an overview of the main features of relevant federal legislation that applies across all federal statistical agencies, for statisticians working in the field of privacy and confidentiality.

Jake Bournazian is the Data Confidentiality Officer and Privacy Point of Contact for the U.S. Energy Information Administration. He is also a member of the Tricare Management Activity Privacy Board for the U.S. Department of Defense. He was chair of the Confidentiality and Data Access Committee during 2001-2003. He holds an MA in Quantitative Economics from the University of Delaware and a Juris Doctor degree from The George Washington University. He is admitted to practice law in Maryland, Virginia, and the District of Columbia, and before the U.S. District Courts for the Eastern District of Virginia and the District of Columbia and the U.S. Supreme Court.

#### Title: Issues Regarding the Use of Administrative Records in Federal Surveys

• Organizer: John Eltinge, BLS, WSS Past-President
• Chair: Adam Safir, BLS, WSS Methodology Program Chair
• Speakers: Nancy Bates, U.S. Bureau of the Census and Joanne Pascale, U.S. Bureau of the Census
• Discussant: TBD
• Location: Bureau of Labor Statistics, Conference Center Room 10
To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.

Abstracts:

Concerns about Privacy, Trust in Government, and Willingness to Use Administrative Records to Improve the Decennial Census

For at least twenty years, the Census Bureau has been considering the use of administrative records to enhance data collection for the decennial census. Beginning with a series of focus groups in 1992, public reactions to the potential use of administrative records have intermittently been the focus of Census Bureau research. In 1995, the Census Bureau sponsored a survey of knowledge, beliefs, and attitudes toward the Census Bureau's confidentiality practices and the use of administrative records to supplement or replace household enumeration. The following year, the survey was replicated. Two more such surveys were carried out in 1999 and 2000 and finally replicated again in 2010.

This presentation will cover two areas. First, we trace trends in attitudes toward administrative record use as well as related constructs such as privacy and confidentiality concerns, trust in government, and importance attached to the decennial census by means of the replicated surveys described above. As we shall show, these trends are predominantly negative, especially in the most recent period. Second, we model attitudes toward using administrative records to augment the census from a set of demographic characteristics, including age, gender, race, ethnicity, education, party identification, and region. We also include attitudes toward privacy, toward government, and toward the census itself as predictors. The aim is to understand the basis for attitudes toward administrative records use to the extent that this is possible.

Requesting Consent to Link Survey Data to Administrative Records: Preliminary Results from the Survey of Health Insurance and Program Participation (SHIPP)

Administrative records have begun to play a key role in survey research and, while policies regarding consent are still in flux, there is general agreement that research is needed on how to request consent from respondents to link their survey data with administrative records. Previous research found that 26 percent of those initially opposed to data sharing shifted their position when prompted with arguments about potential improvements in accuracy and reductions in cost (Singer and Presser, 1996). In order to take these findings further, in the spring of 2010 a field experiment was carried out by the US Census Bureau which included three panels, each presenting a different rationale to the respondent for data linkage: improved accuracy, reduced costs, and reduced respondent burden.

Somewhat contrary to expectations, there was no statistically significant difference in consent rates across the three versions of the consent question. Overall levels of consent, however, were rather high (84 percent), and represented a shift of more than 20 percentage points compared to a similar study in 2004. Demographic analysis indicated that age and non-response to a household income question were predictors of both levels of consent and missing data on key variables needed to make that linkage. Education was also associated with levels of consent. And finally, there was some evidence of interviewer effects; one of the three interviewer groups had a lower rate of respondent objections to consent and a higher rate of obtaining complete reporting on key fields of data used for record linkage.

#### Title: Key Management and Key Pre-distribution

• Speaker: Dr. Bimal Roy, Director, Indian Statistical Institute, Kolkata, India
• Date/Time: Wednesday, April 20, 3:30-4:30pm
• Location: Duques Hall, Room 553, (2201 G Street NW, Washington, DC 20052)
• Directions: Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/~map.
• Sponsor: The George Washington University, Department of Statistics. List of all GWU Statistics seminars this semester: http://www.gwu.edu/~stat/seminars/spring2011.htm.

Abstract:

In modern cryptography, the security of a cryptosystem lies in the secrecy of the key, not in the secrecy of the encryption algorithm; hence key management is a very important issue. There are several methods for key management, but most are based on Public Key Cryptography, which is typically grounded in Number Theory. Key Pre-Distribution is an alternative method based on Combinatorics. This method may be used in scenarios where the security requirement is less stringent.
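The idea of loading nodes with keys before deployment can be sketched with the probabilistic Eschenauer-Gligor scheme, a well-known relative of the combinatorial designs the talk covers (the pool and ring sizes below are arbitrary choices for illustration):

```python
import random

def assign_key_rings(num_nodes, pool_size, ring_size, rng):
    """Random key pre-distribution: before deployment, each node is loaded
    with a random subset ("key ring") of keys drawn from a common pool."""
    return [set(rng.sample(range(pool_size), ring_size))
            for _ in range(num_nodes)]

def can_communicate(ring_a, ring_b):
    # Two nodes can set up a secure link iff their key rings intersect.
    return len(ring_a & ring_b) > 0

rng = random.Random(42)
rings = assign_key_rings(num_nodes=50, pool_size=1000, ring_size=75, rng=rng)

# Empirical fraction of node pairs sharing at least one key; with these
# sizes the sharing probability is roughly 1 - (1 - 75/1000)**75, i.e. high.
pairs = [(i, j) for i in range(50) for j in range(i + 1, 50)]
shared = sum(can_communicate(rings[i], rings[j]) for i, j in pairs)
print(shared / len(pairs))
```

Combinatorial (design-based) pre-distribution replaces the random subsets with blocks of a combinatorial design, which guarantees rather than merely makes probable that chosen pairs of nodes share a key.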

#### Title: Bridging Livestock Survey Results to Published Estimates through State-Space Models: A Time Series Approach

• Organizer: National Agricultural Statistics Service
• Chair: Mel Kollander
• Speaker: Stephen M. Busselberg
• Date/Time: Thursday, April 21, 2011 12:30 - 1:30 p.m
• Location: Bureau of Labor Statistics, Conference Center
To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
• Sponsors: Agriculture & Natural Resources and Economics, WSS
• Point of Contact: Stephen_Busselberg@nass.usda.gov

Abstract:

Survey sampling to estimate livestock inventories on farms in the United States is challenging for several reasons: there is undercoverage due to a rapidly changing industry; control data are not strongly correlated with actual inventory, which limits the sampling design; and external sources provide accurate information about related statistics, such as federally inspected slaughter counts, commodity imports, and exports. For these reasons the livestock inventory estimates produced from a survey sampling design are often incongruous with these nonproprietary statistics, and the exact survey results are not published. The National Agricultural Statistics Service (NASS) plans to address this problem through the application of a state-space modeling system. This presentation demonstrates, through the example of the hog industry, how a state-space system, in conjunction with the Kalman filter, can achieve estimates concordant with external data.
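The Kalman filter ingredient can be sketched in one dimension (a random-walk state observed with noise; the readings and variances below are invented, and the NASS system's state vector and external-data inputs are of course far richer):

```python
# A minimal one-dimensional Kalman filter: the latent state (an inventory
# level) follows a random walk, and each survey reading observes it with noise.
def kalman_1d(observations, q, r, x0, p0):
    """q: state innovation variance, r: observation noise variance,
    x0/p0: initial state mean and variance."""
    x, p = x0, p0
    filtered = []
    for z in observations:
        p = p + q                 # predict: state uncertainty grows
        k = p / (p + r)           # Kalman gain
        x = x + k * (z - x)       # update: move toward the new observation
        p = (1 - k) * p
        filtered.append(x)
    return filtered

# Hypothetical survey-based inventory readings (noisy around a stable level).
obs = [102.0, 97.5, 101.0, 99.0, 103.5, 98.0, 100.5]
est = kalman_1d(obs, q=0.5, r=4.0, x0=100.0, p0=1.0)
print([round(x, 2) for x in est])
```

External signals such as slaughter counts would enter a fuller model as additional rows of the observation equation, so each update blends the survey reading with the administrative data in proportion to their precisions.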

#### Title: Data Compression, Secrecy Generation and Secure Computation

• Speaker: Prof. Prakash Narayan, Dept. of Electrical and Computer Engineering and Institute for Systems Research
• Date/Time:Thursday, April 21st at 3:30 pm
• Location: Room 1313, Math Building, University of Maryland College Park (directions).

Abstract:

This talk addresses connections between the information theoretic notions of multiterminal data compression, secrecy generation, and secure function computation. It is based on joint work with Imre Csiszar, Himanshu Tyagi, and Chunxuan Ye.

Consider a situation in which multiple terminals observe separate but correlated signals and seek to devise a secret key through public communication, in such a way that the key is concealed from an eavesdropper who observes the communication. We show how this problem is connected to a multiterminal data compression problem (without secrecy constraints), and illustrate the connection with a simple key construction. Next, consider the situation in which the same set of terminals seek to compute a given function of their observed signals using public communication; it is required now that the value of the function be kept secret from an eavesdropper with access to the communication. We show that the feasibility of such secure function computation is tied tightly to the previous secret key generation problem.

#### Title: Joint Models for Longitudinal Data Using Latent Variable Approaches

• Speaker: John Jackson, Department of Statistics, George Washington University
• Date/Time: Friday, April 22, 11-12pm
• Location: Phillips Hall, Room 108, (801 22nd Street NW, Washington, DC 20052)
• Directions: Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/~map.
• Sponsor: The George Washington University, Department of Statistics. List of all GWU Statistics seminars this semester: http://www.gwu.edu/~stat/seminars/spring2011.htm.

Abstract:

Understanding the association between risky driving and crash events helps yield insight into teenage driving behavior. Further, prediction of crash events from previously observed kinematic behavior is important from a public health perspective. The Naturalistic Teenage Driving Study (NTDS) is the first U.S. study to document continuous driving performance and crash/near-crash experience of newly licensed teenagers during their first 18 months of licensure. I present two modeling approaches to these data. The first uses a binary and ordinal latent variable model for investigating these associations. Models are also developed where random effects are included to address heterogeneity in drivers' propensity for risky driving; I discuss the estimation of these models using the EM algorithm for the models without random effects, and the Monte Carlo EM algorithm for the random effects models. A second approach implements a joint model where the crash outcomes are linked to the kinematic measures with a mixed hidden Markov model for the latent states; I discuss a unique estimation procedure that readily provides insight into the hidden states as well as a computational advantage over other estimation methods.

#### Title: Statistical Analysis of Neural Spike Train Data

• Speaker: Prof. Robert Kass, Carnegie-Mellon University
• Date/Time: Tuesday, May 3, 2011, 3:30pm
• Location: Room 1313, Math Building, University of Maryland College Park (directions).

Abstract:

One of the most important techniques in learning about the functioning of the brain has involved examining neuronal activity in laboratory animals under differing experimental conditions. Neural information is represented and communicated through series of action potentials, or spike trains, and the central scientific issue in many studies concerns the physiological significance that should be attached to a particular neuron firing pattern in a particular part of the brain. Because repeated presentations of stimuli often produce quite variable neural responses, statistical models have played an important role in advancing neuroscientific knowledge. In my talk I will briefly outline some of the progress made, by many people, over the past 10 years, highlighting work my colleagues and I have contributed. I will also comment on the perspective provided by statistical thinking.

#### Title: CropScape: A New Web Based Visualization Portal for the Dissemination of NASS Geospatial Cropland Product

• Organizer: National Agricultural Statistics Service
• Speaker: Rick Mueller
• Chair: Mel Kollander
• Date/time: Wednesday, May 11, 2011 12:30 - 1:30 p.m.
• Location: Bureau of Labor Statistics Conference Center
To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
• Sponsor: Agriculture and Natural Resources, WSS
• Point of Contact: Rick_Mueller@nass.usda.gov

Abstract:

The National Agricultural Statistics Service (NASS) began producing national crop-specific land cover classifications, called the Cropland Data Layer (CDL), in 2009. The CDL data are widely used in a variety of applications, including climate and environmental ecosystem monitoring, health research, agribusiness decision support, and, internally, NASS crop acreage and yield estimation. To facilitate the application and dissemination of this data layer, a web-service-based interactive map visualization, dissemination, and querying system called CropScape (http://nassgeodata.gmu.edu/CropScape) was constructed. CropScape adopts a service-oriented architecture and web services compliant with Open Geospatial Consortium standards and specifications, and re-uses functions and algorithms from the GeoBrain Technology developed at George Mason University. CropScape enables on-line exploration, visualization, querying, and dissemination of US cropland data via interactive maps. The data can be directly exported to ArcGIS Explorer and Google Earth for mashups, or delivered to other applications via web services. This system greatly improves the accessibility, visualization, and dissemination of crop geospatial information.

#### Title: Results of the Congressionally Mandated Fourth National Incidence Study of Child Abuse and Neglect

• Speaker: Andrea J. Sedlak, Ph.D., Westat
• Chair: Arthur Kendall, Ph.D., Capital Area Social Psychological Association
• Date/Time: Wednesday, May 18, 12:30 - 2:00 p.m.
• Location: American Psychological Association (APA), 750 First Street NE, Washington, DC 20002-4242. This is about 1 block north of the First Street NE Metro exit, Union Station on the Red Line. There is pay parking at Union Station, and nearby.
• RSVP Instructions: Due to security regulations please RSVP to Ron Schlittler at (202)336-6041 or rschlittler@apa.org. Upon arrival, please check in at the front desk and await escort to the room for the presentation.
• Sponsors:
Washington Statistical Society Human Rights (WSS/HR)
Capital Area Social Psychological Association (CASPA)
DC chapter American Association for Public Opinion Research (DC-AAPOR)
District of Columbia Sociological Society (DCSS)
Federation of Associations in Behavioral and Brain Sciences (FABBS)

Abstract:

This talk will discuss the analysis and results of the Fourth National Incidence Study of Child Abuse and Neglect (NIS-4).

The NIS is a congressionally mandated, periodic research effort to assess the incidence of child abuse and neglect in the United States. The NIS gathers information from multiple sources to estimate the number of children who are abused or neglected, providing information about the nature and severity of the maltreatment; the characteristics of the children, perpetrators, and families; and the extent of changes in the incidence or distribution of child maltreatment since the last national incidence study.

The NIS design assumes that the maltreated children who are investigated by child protective services represent only the "tip of the iceberg," so although the NIS estimates include children investigated at child protective services they also include maltreated children who are identified by professionals in a wide range of agencies in representative communities. These professionals, called "sentinels," are asked to remain on the lookout for children they believe are maltreated during the study period. Children identified by sentinels and those whose alleged maltreatment is investigated by child protective services during the same period are evaluated against standardized definitions of abuse and neglect. The data are unduplicated so a given child is counted only once in the study estimates.

For additional information about NIS-4, the Report to Congress is complete and is available at http://www.acf.hhs.gov/programs/opre/abuse_neglect/natl_incid/index.html

For further information about the seminar, contact Michael P. Cohen, mpcohen@juno.com or (202) 232-4651.

#### Title: Enumerative and Evaluative Surveys Compared: Obvious When Known?

• Organizer: David Judkins, Westat, WSS Methodology Section Chair
• Chair: Nick Beyler, Mathematica Policy Research, Inc.
• Speakers:
Fritz Scheuren, PhD. Vice President, Statistics and Methodology Department, NORC at the University of Chicago
Nate Allen, Research Analyst, International Projects Department, NORC at the University of Chicago
• Discussants:
Michael L. Cohen, Senior Program Officer, Committee on National Statistics
Nate Allen, Research Analyst, International Projects Department, NORC at the University of Chicago
• Date/Time: Thursday, May 19, 2011, 12:30-2:00pm
• Location: Mathematica Policy Research, Inc., Main Conference Room
600 Maryland Ave., SW, Suite 550, Washington, DC 20024-2512. Virtual attendance is also available by webinar or by audio feed by phone.

To be placed on the seminar attendance list, please RSVP to Bruno Vizcarra at bvizcarra@mathematica-mpr.com or (202) 484-4231 by noon at least two days in advance of the seminar. Provide your name, affiliation, contact information (email is preferred) and the seminar date. Once on the seminar list, you will be provided with information about webinar and phone viewing for the seminar in case you choose not to attend in person. Mathematica is located at 600 Maryland Ave., SW, Suite 550, Washington, DC 20024. If traveling by Metro, take the Orange, Blue, Green, or Yellow Line to the L'Enfant Plaza Station and follow signs to exit at 7th and Maryland. The entrance to the building will be to your right at the top of the escalators. If traveling by car, pay parking is available in the building parking garage, which is located on 6th Street SW, across from the Holiday Inn. Once in the building, take the elevators by Wachovia to the 5th floor lobby and inform the secretary that you are attending the WSS seminar. Please call Mathematica's main office number, (202) 484-9220, if you have trouble finding the building or the 5th floor lobby.

Abstract:

In our talk, we will seek to directly examine the mixed and analytic applications of enumerative surveys.

Enumerative surveys are derived from censuses and designed to count a finite population. Unlike censuses, enumerative surveys employ samples rather than attempting a complete population count. The seminal book by Hansen, Hurwitz and Madow, originally published in 1953, represents the enumerative tradition well. Analytic surveys, which arise in evaluation, are designed to assess the underlying causes that generate the finite population observations. Deming (1950), widely considered one of the authoritative texts on sampling theory and practice, contains analytic survey ideas that did not make it into the 1953 HHM book, for reasons about which we can only speculate. Sample survey books by Cochran and others from that same era are similar in scope, although Cochran does have a chapter (Chapter 2) that deals with some aspects of analytic survey inference.

In official statistical settings, enumerative surveys dominate practice, but are increasingly being applied analytically to address policy "What If" questions. Examples of such surveys include the Current Population Survey (CPS) in the United States, which is employed as an input to a micro-simulation model, and the U.S. Agency for International Development's Demographic Health Survey, which is being used to supplement baseline data for a health evaluation we are conducting in the African nation of Lesotho. In examining analytic and mixed applications for enumerative surveys, we touch on these and several other U.S.-based and international examples. We discuss how data collected for primarily enumerative purposes can be used in conjunction with or supplemented by more analytically motivated data collection. We find that such enumerative surveys can be successfully adapted for work in evaluation, but key distinctions in the design and objectives of both types of surveys should be considered.

#### Title: Why One Should Incorporate the Design Weights When Adjusting for Unit Nonresponse Using Response Homogeneity Groups

• Organizer: David Judkins, WSS Methodology Section Chair
• Chair: Adam Safir, WSS Methodology Program Chair
• Speaker: Phil Kott, RTI International
• Discussant: Roderick J.A. Little, U.S. Census Bureau and University of Michigan
• Date & Time: Thursday, June 9, 12:30pm-2:00pm
• Location: Bureau of Labor Statistics, Conference Center Room 10
To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
• Presentation material:
Phil Kott's Slides (pdf, 928kb)
Roderick Little's Slides (pdf, 188kb)

Abstract:

When there is unit nonresponse in a survey sample drawn using probability-sampling principles, a common practice is to divide the sample into mutually exclusive groups or "adjustment cells" in such a way that it is reasonable to assume that each sampled element in a group is equally likely to be a survey nonrespondent. In the introduction to Little and Vartivarian (Statistics in Medicine, 2003), the authors write, "if adjustment cells are formed in this way [homogeneous with respect to response probability], then weighting the non-response rates is unnecessary and inefficient, that is, it adds variance to estimates." We will see that this statement is, at best, misleading. Moreover, it is not supported by Little and Vartivarian's own simulations. In fact, if the goal is to estimate the population mean of a survey variable that roughly behaves as if it were a random variable with a constant mean within each group regardless of the original design weights, then incorporating the design weights into the adjustment factors will usually be more efficient than not incorporating them under the assumed response model. Moreover, if the survey variable behaved exactly like such a random variable, then the estimated population mean computed with the design-weighted adjustment factors would be nearly unbiased in some sense even when the sampled elements within a group are not equally likely to respond.
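To make the distinction concrete, here is a toy sketch (not the speaker's code; the design weights and response flags are made up) contrasting the unweighted and design-weighted adjustment factors within a single response homogeneity group:

```python
# Hypothetical members of one adjustment cell: design weights and
# whether each sampled element responded.
design_weights = [10.0, 10.0, 50.0, 50.0, 50.0]
responded      = [True, False, True, True, False]

# Unweighted adjustment factor: cell count over respondent count.
f_unweighted = len(design_weights) / sum(responded)

# Design-weighted factor: weighted cell total over weighted respondent total.
w_total = sum(design_weights)
w_resp  = sum(w for w, r in zip(design_weights, responded) if r)
f_weighted = w_total / w_resp

# Adjusted weights for the respondents under each scheme.
adj_unweighted = [w * f_unweighted for w, r in zip(design_weights, responded) if r]
adj_weighted   = [w * f_weighted   for w, r in zip(design_weights, responded) if r]

print(f_unweighted, f_weighted)   # 5/3 vs 170/110
print(sum(adj_unweighted))        # 110 * 5/3: does not recover the cell total
print(sum(adj_weighted))          # 170.0: reproduces the weighted cell total
```

The design-weighted factor reproduces the weighted cell total among respondents, which is one way to see why ignoring the design weights in the adjustment is not innocuous.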

#### Title: Using Statistics to Solve the Medical Mystery of the Sudden Infant Death Syndrome (SIDS)

• Speaker: David T. Mage (WHO and USEPA, retired)
• Chair: Mel Kollander
• Date/time: Wednesday, June 15, 2011 12:30 - 1:30 p.m.
• Location: Bureau of Labor Statistics Conference Center
To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
• Sponsor: Agriculture and Natural Resources, WSS
• Point of Contact: MageDonner@aol.com

Abstract:

The Sudden Infant Death Syndrome (SIDS) is a diagnosis of exclusion, made in the absence of any sufficient cause of death found at autopsy and death scene investigation. The ages of SIDS deaths follow a 4-parameter lognormal (Johnson SB) distribution, which is apparently SIDS' most distinctive and characteristic property, so any theory for the cause of SIDS must explain its universality.

The four parameters of this distribution appear to apply to all countries and have remained virtually constant during the period of redefinition of SIDS and the change of preferred sleeping position from prone to supine, during which the SIDS rate decreased dramatically. These data are shown to obey Cramer's Theorem in that the age distributions of male SIDS, female SIDS, and Sudden Unexpected Death in Infancy (SUDI) [defined here as 'SIDS mentioned on the death certificate as a possibility but another possible cause assigned'] are similar to the age distribution of SIDS where it is the only cause of death given on the certificate.

The proposed universal SIDS age normal transform is y = Log[(m - a)/(b - m)], where y = mu + sigma*z, m is age in months, z is a standard normal deviate, and mu = -1.05 and sigma = 0.291 are the median and standard deviation of y; a = -0.31 month and b = 41.2 months are the 3rd and 4th parameters, respectively. For these fixed 3rd and 4th parameters, the relative constancy of the mean and standard deviation of the SIDS and SUDI age distributions over many years implies that the parameters are both universal and independent of sleep position. [International Journal of Pediatrics, 2009 (free on-line); Scandinavian Journal of Forensic Sciences, 2010;16(1) and 2011;17(1), in press]
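For readers who want to try the transform, here is a short sketch. The base of "Log" is not stated in the abstract; base 10 is assumed here because it places the implied median SIDS age near 3 months, which looks plausible, whereas the natural log would place it above 10 months.

```python
import math

# Parameters quoted in the abstract for y = Log[(m - a)/(b - m)].
A, B = -0.31, 41.2          # 3rd and 4th parameters (months)
MU, SIGMA = -1.05, 0.291    # median and standard deviation of y

def to_normal(m):
    """Map an age m in months (A < m < B) to a standard normal deviate z."""
    y = math.log10((m - A) / (B - m))
    return (y - MU) / SIGMA

def from_normal(z):
    """Invert the transform: map z back to an age in months."""
    y = MU + SIGMA * z
    return (A + B * 10.0 ** y) / (1.0 + 10.0 ** y)

print(round(from_normal(0.0), 2))   # median age implied by the fit: 3.09 months
```

The bounded support (a, b) is what makes this a Johnson SB rather than an ordinary lognormal fit.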

#### Title: Design and Analysis of a Dual-Frame Survey: The National Bald Eagle Post-Delisting Survey

• Speakers:
Mark C. Otto, U.S. Fish and Wildlife Service
John R. Sauer, U.S. Geological Survey
Emily Bjerre, U.S. Fish and Wildlife Service
• Chair: Mel Kollander
• Date/time: Wednesday, 29 June 2011 12:30 - 1:30 p.m.
• Location: 228 Gabrielson Hall, Patuxent Wildlife Research Refuge, Laurel, MD 20708
• Directions: From the NE portion of the beltway, go north on the Baltimore Washington Parkway (295). Go east at the third exit, Powder Mill Road (212), and go to the end. Cross 197 into the Patuxent Wildlife Research Refuge. Go about a mile curving around to the right until you drive into a parking lot. There will be a pond on the right and two buildings on the left. Park and walk between the two buildings to Gabrielson, the larger, modern building. Conference room 228 is upstairs at the end of the main hall on the left.
• Point of Contact: Mark Otto (301) 497-5872, Mark_Otto@FWS.Gov.

Abstract:

The U.S. Fish and Wildlife Service's dual-frame bald eagle (Haliaeetus leucocephalus) post-delisting survey posed several statistical survey design and analysis challenges. Bald eagles in the contiguous 48 States are statistically rare. The dual-frame sample design took advantage of existing valuable information to target samples. A nationwide list of known bald eagle nest sites provided by State wildlife agencies formed the primary sampling frame (the list frame). An additional area-based sample (the area frame) was selected from a grid of 10 km x 10 km plots in 18 strata in 20 States with high densities of bald eagle nests, totaling approximately 400 plots. Surveys were conducted during the spring of 2009, using fixed-wing aircraft in most states and helicopters in Washington and Oregon. Double-observer methods were used in the aircraft to estimate detectability of nests.

Sampling strata were developed using historical data on densities of occupied nests. These densities were summarized over 165 Bird Conservation Regions (BCRs) within States, then aggregated into 20 larger strata sequentially by combining adjacent State-BCR regions with similar densities while monitoring increases in the survey total standard error. Sample sizes had to be allocated over not only the strata but also the list and area sampling frames. Allocation of samples to the list frame depended on the unknown coverage of the State nest lists. A general method of obtaining optimal stratum allocations and cost-variance curves was developed: simulating over a reasonable range of uncertain (nest occupancy rate) and unknown (coverage of the State nest lists) parameters, then fixing the cost and minimizing the variance in the cost-variance equation. The Haines-Pollock dual frame screening estimator was used to estimate the number of active nests in the list frame plus the number of "new" nests found in only the area frame. When separate occupancy surveys or censuses were done, an occupancy misclassification rate could also be estimated to account for the birds missed on single visits in the area frame.
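The screening logic can be sketched numerically. All counts and sampling fractions below are hypothetical, and the simple expansion estimators deliberately ignore the stratification and detectability adjustments described above:

```python
# Screening form of a dual-frame estimate: the list-frame estimate of
# active listed nests plus an area-frame expansion of "new" nests that
# were screened against the list and found not to be on it.

# List frame: known nest sites subsampled at a hypothetical 25% rate.
list_sampled, list_active, list_fraction = 200, 120, 0.25
list_total = list_active / list_fraction        # estimated active listed nests

# Area frame: new (unlisted) nests found on each sampled plot in one stratum.
plot_new_nests = [0, 2, 1, 0, 3]                # illustrative plot counts
plots_in_stratum, plots_sampled = 400, 5
area_total = sum(plot_new_nests) * plots_in_stratum / plots_sampled

dual_frame_total = list_total + area_total
print(dual_frame_total)
```

Because area-frame nests already on the list are screened out, the two components estimate disjoint sets of nests and can simply be added.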

Results from a 2008 New Brunswick Canadian Provincial survey and the 2009 baseline survey give some idea of the efficacy of the dual-frame design. The dual-frame design allowed us to estimate the completeness of the nest lists.

• Speaker: Joe Sakshaug, University of Michigan, Program in Survey Methodology
• Date/Time: Thursday, August 18th, 2011 from 3:30-4:30 pm
• Location: Bureau of Labor Statistics, Conference Room 10
2 Massachusetts Ave, NE. Use the Red Line to Union Station
• Registration: To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar.
• Sponsor: DC-AAPOR and WSS Data Collection Section
• Presentation material:
Slides (pdf, ~21.7mb)

Speaker Background:

Joe Sakshaug is a recent PhD graduate of the Michigan Program in Survey Methodology. He received an MS from the same program in 2007 and a BA in Mathematics from the University of Washington in 2003. His methodological research interests include statistical disclosure control, administrative data linkage, small area estimation, multiple imputation, the collection of biomarkers in population-based surveys, and survey nonresponse and measurement error. Joe's dissertation, titled "Synthetic Data for Small Area Estimation," was funded by grants from the U.S. Census Bureau, the Centers for Disease Control and Prevention, and the National Science Foundation. He is the recipient of an Alexander von Humboldt Research Fellowship for postdoctoral studies in Germany.

#### Title: Dual-Frame RDD Methodology — A Better Approach

• Organizer: Adam Safir, WSS Methodology Section Chair
• Chair: Adam Safir, WSS Methodology Section Chair
• Speaker: Mansour Fahimi, Marketing Systems Group
• Discussant: Richard Sigman, Westat
• Date & Time: Thursday, September 1, 12:30pm-2:00pm
• Location: Bureau of Labor Statistics, Conference Center Room 10
• Directions: To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
• Sponsor: WSS Methodology Program, WSS Data Collection Section, DC-AAPOR

Abstract:

Dual-frame RDD sampling methodology, which employs a mixture of landline and cellular telephone numbers, was devised to improve the coverage of traditional landline RDD samples. Current practice of this methodology, however, is subject to technical inefficiencies. On the one hand, due to unavailability of counts of cell-only households in small geographic domains, survey researchers have to rely on crude assumptions to determine the mixture of their samples. This creates inconsistencies for both sample selection and weighting applications. On the other hand, with advances in the telecommunication technologies, a growing number of households are now relying on quasi-landline alternatives such as Voice over Internet Protocol (VoIP). Since the majority of such alternatives can only provide unlisted telephone numbers, many of which are not included in the traditional landline frame, this results in coverage problems for the landline component of samples.

Marketing Systems Group is introducing a pioneering methodology for obtaining counts of cell-only households for all Census geographic domains, down to each of the 3,143 counties in the US. This will eliminate guesswork when designing dual-frame RDD samples by determining the optimal sample mixture, as well as provide the needed population counts for proper construction of survey weights. Moreover, a new methodology for landline RDD frame construction will be introduced that includes virtually all landline and quasi-landline telephone numbers. While diminishing the undercoverage of landline RDD samples, for the first time counts of both listed and unlisted numbers will be provided for nearly all landline 100-series telephone banks to accommodate various disproportionate sample selection options.

Dr. Mansour Fahimi is the Vice President of Statistical Research Services at Marketing Systems Group. Mansour has over 20 years of experience working on the design and administration of complex surveys; data analysis involving multivariate techniques, nonparametric methods, and experimental designs; as well as design-based procedures for the analysis of data from complex surveys. As an adjunct professor, Dr. Fahimi teaches courses in statistics while working as a consultant on tasks that involve research design, data analysis, and workshops on techniques for enhancing data quality through minimization of the total survey error.

#### Title: Order Restricted Inference: Computational Algorithms

• Speaker: Edward J. Wegman, School of Physics, Astronomy, and Computational Sciences and Department of Statistics, George Mason University
• Time: 10:30 a.m. Refreshments, 10:45 a.m. Colloquium Talk
• Date: Friday, September 2, 2011
• Location: Research 1, Room 301, Fairfax Campus, George Mason University, 4400 University Drive, Fairfax, VA 22030
• Sponsor: George Mason University CDS/CCDS/Statistics Colloquium

Abstract:

Order-restricted inference gained much currency in the 1960s, an era when mainframe computing was the norm. In the forty-five years since, computation and data collection methods have made enormous strides. In the 1960s, people thought in terms of, perhaps, a few hundred observations. Maximum likelihood methods for isotonic density estimation relied on naive algorithms that were satisfactory in the context of a few hundred observations but become computationally infeasible in the context of millions of observations. I will demonstrate a quick way to narrow the scope, including a suggestion of a dynamic programming algorithm. This presentation is dedicated to the memory of Professor Tim Robertson.
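To give a flavor of the computations involved, here is a minimal sketch of the classical pool-adjacent-violators algorithm (PAVA) for isotonic least-squares regression, central to the Robertson-era order-restricted literature. This is my own illustration, not the speaker's code; the stack-based formulation shown runs in linear time, whereas naive repeated-scan implementations are quadratic.

```python
def pava(y, w=None):
    """Pool Adjacent Violators: least-squares nondecreasing fit to y."""
    if w is None:
        w = [1.0] * len(y)
    # Each block stores [weighted mean, total weight, run length].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge backwards while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, n1 + n2])
    # Expand block means back to one fitted value per observation.
    return [m for m, _, n in blocks for _ in range(n)]

print(pava([1.0, 3.0, 2.0, 4.0]))   # [1.0, 2.5, 2.5, 4.0]: the 3,2 violation pools
```

Each violation is resolved by replacing the offending adjacent blocks with their weighted mean, which is exactly the order-restricted maximum likelihood solution under normal errors.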

#### Title: Conditional Logistic Regression with Survey Data

Abstract:

When data consist of clusters of potentially correlated observations, conditional logistic regression can be used to estimate the association between a binary outcome and covariates conditionally on the cluster effects. Surveys can use multistage sampling with potentially differential probabilities of sampling individuals from the same conditioning cluster (e.g., family). We show that conditional logistic regression of survey data using standard inflation (weighted) estimation (i.e., observations are weighted by the inverse of their inclusion probabilities) can result in biased estimators when the sample sizes of the observations sampled from the conditioning clusters are small. We propose methods based on weighted pseudo pairwise likelihoods that combine the conditional logistic likelihoods for all pairs of observations consisting of a positive and a negative outcome within a conditioning cluster and weight the pairwise likelihoods by the inverse of their joint inclusion probabilities within the cluster. Design-based variance estimators for the regression coefficient estimators are provided. Limited simulations demonstrate that the proposed methods produce approximately unbiased regression coefficients and variance estimates, but can be considerably less efficient than maximum likelihood estimation when the sampling is uninformative. The proposed methods are illustrated with an analysis of data from the Hispanic Health and Nutrition Examination Survey.
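A hypothetical sketch of the weighted pseudo pairwise log-likelihood described above. The function name is my own, and the joint pair inclusion probability is simplified to the product of the unit probabilities, which holds only under independent within-cluster sampling; the authors' actual weights would use the true joint probabilities.

```python
import math

def pairwise_loglik(beta, clusters):
    """clusters: list of clusters, each a list of (x, y, pi) tuples with
    x a covariate vector, y in {0, 1}, and pi the unit inclusion
    probability within the cluster."""
    total = 0.0
    for units in clusters:
        cases = [(x, p) for x, y, p in units if y == 1]
        controls = [(x, p) for x, y, p in units if y == 0]
        # Every (case, control) pair within the cluster contributes one
        # conditional-logistic term, weighted by the inverse of the
        # (here, product-approximated) joint inclusion probability.
        for xi, pi_i in cases:
            for xj, pi_j in controls:
                eta_i = sum(b * v for b, v in zip(beta, xi))
                eta_j = sum(b * v for b, v in zip(beta, xj))
                # log P(i is the case | exactly one of {i, j} is)
                ll = eta_i - math.log(math.exp(eta_i) + math.exp(eta_j))
                total += ll / (pi_i * pi_j)
    return total

demo = [[([1.0], 1, 0.5), ([0.0], 0, 0.5)]]
print(pairwise_loglik([0.0], demo))   # -log(2) / 0.25 for this single pair
```

Maximizing this criterion over beta gives the pseudo-likelihood estimator; the design-based variance estimators mentioned in the abstract are beyond this sketch.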

#### Title: Unlocking Online Communities: How to measure the weight of the silent masses in authority ranking?

Abstract:

Social media has increasingly been used by enterprises for reaching out to their customers for advertising campaigns, receiving product reviews, and learning users' preferences for new product development. This requires extraction and aggregation of information in the social media space to facilitate the decision-making process. A key challenge is to automate this process of information discovery, extraction, and aggregation along relevant dimensions such as age, gender, location, interest, sentiment, and authority. We have developed iPointTM, a system that enables the discovery, extraction, and aggregation of social media, measuring the sentiments expressed online and providing an authority score for each author based on their interests, along with the author's age, gender, and location. We then use this information in conjunction with our ad server iServeTM: the derived intelligence from iPointTM serves as a daily updated internet panel that measures the internet waves to help distribute ads within advertising networks. Positive results are recorded in comparison to existing targeting technologies using both the Yahoo! Right Media Exchange and the Google content network.

In this presentation we will focus on our authority ranking model, which depends on eigenvalue calculations that consider the number of posts by each author, the number of links and back comments on the posts, the relevancy of each post within its community, and the amount of silent interaction with the posts. We present how we calculate the silent interactions in our model and how we use sparse matrix properties to optimize calculation and storage time. The authority rank influences the general sentiment of a topic's interest level: sentiments from a highly ranked, more influential author carry more weight than those of a less influential author, and thus drive the community's direction.

#### Title: Twitter as an Environmental Sensor

• Speaker: Elizabeth M. Hohman, Naval Surface Warfare Center, Dahlgren Division
• Time: 10:30 a.m. Refreshments, 10:45 a.m. Colloquium Talk
• Date: Friday, September 16, 2011
• Location: Research 1, Room 301, Fairfax Campus George Mason University, 4400 University Drive, Fairfax, VA 22030. Directions to GMU and maps are available at http://www.gmu.edu/resources/welcome/Directions-to-GMU.html.
• Sponsor: George Mason University CDS/CCDS/Statistics Colloquium

Abstract:

Can we use Twitter as a sensor of the opinions or interests of a population? This work investigates using Twitter to measure what people are talking about and how their tweets are changing in time. The data presented consists of public tweets centered in Manama, Bahrain in March 2011. We first describe some basic preprocessing techniques for Twitter data. Next we apply some standard statistical text processing approaches, including topic models, to identify clusters of users who tweet about similar topics. Finally, we look at tweets over time in order to identify changes in the environment. This preliminary work shows that with a small amount of preprocessing, we may be able to extract signal from the noise of Twitter.

#### Title: Subsufficient algebra related to the structure of UMVUEs

• Speaker: Prof. Abram Kagan, University of Maryland, College Park, MD
• Date/Time: Thursday, September 22nd, 3:30 PM
• Location: Room 1313, Math Building, University of Maryland College Park (directions).

Abstract:

Statistics T(X) (or, more generally, subalgebras) with the property that any function of T(X) is a UMVUE are studied. Though they are functions of the minimal sufficient statistic, their construction in the case of categorical X is entirely different from that of the minimal sufficient statistic.

#### Title: Curvature, Robustness and Optimal Design in Applied Generalized Nonlinear Regression Modelling

Abstract:

Researchers often find that nonlinear regression models are more applicable than linear ones for modelling various biological, physical, and chemical processes, since they tend to fit the data well and since these models (and model parameters) are more scientifically meaningful. These researchers are thus often in a position of requiring optimal or near-optimal designs for a given nonlinear model. A common shortcoming of most optimal designs for nonlinear models used in practical settings, however, is that they typically focus only on (first-order) parameter variance or predicted variance, and thus ignore the inherent nonlinearity of the assumed model function. Another shortcoming of optimal designs is that they often have only p support points, where p is the number of model parameters.

Measures of marginal curvature, first introduced in Clarke (1987) and further developed in Haines et al (2004), provide a useful means of assessing this nonlinearity. Other relevant developments are the second-order volume design criterion introduced in Hamilton and Watts (1985) and extended in O'Brien (1992, 2010), and the second-order MSE criterion developed and illustrated in Clarke and Haines (1995).

This talk examines various robust design criteria and those based on second-order (curvature) considerations. These techniques, coded in the GAUSS and SAS/IML software packages, are illustrated with several examples including one from a preclinical dose-response setting encountered in a recent consulting session.

#### Title: Participatory Development Need Elicitation in Conflict Environments: A Process and a Case Study of Uruzgan, Afghanistan

• Speakers: Seyed Mohammad Mussavi Rizi and Maciej Latek, Krasnow Institute for Advanced Study
• Time: 10:30 a.m. Refreshments, 10:45 a.m. Colloquium Talk
• Date: September 23, 2011
• Location: Research 1, Room 301, Fairfax Campus, George Mason University, 4400 University Drive, Fairfax, VA 22030
• Sponsor: George Mason University CDS/CCDS/Statistics Colloquium

Abstract:

We outline a participatory process of development need elicitation in armed conflict environments where conducting effective surveys and running experiments are stymied by combat conditions and distrustful participants. We describe one field implementation of the process in a case study of development projects in Uruzgan, Afghanistan. We discuss the challenges of analyzing data from such processes and propose a behavioral solution that enables measuring the uncertainty associated with data on the development needs of a community. Finally, we derive development project portfolios that are robust to data uncertainty and fulfill equity constraints.

#### Title: Salvaging an eQTL project: Identifying and correcting sample mix-ups

Abstract:

In a mouse intercross with more than 500 animals and genome-wide gene expression data on six tissues, we identified a high proportion of sample mix-ups in the genotype data, on the order of 15%. Local eQTL (genetic loci influencing gene expression) with extremely large effect may be used to form a classifier for predicting an individual's eQTL genotype from its gene expression value. By considering multiple eQTL and their related transcripts, we identified numerous individuals whose predicted eQTL genotypes (based on their expression data) did not match their observed genotypes, and then went on to identify other individuals whose genotypes did match the predicted eQTL genotypes. The concordance of predictions across six tissues indicated that the problem was due to mix-ups in the genotypes. Consideration of the plate positions of the samples indicated a number of off-by-one and off-by-two errors, likely the result of pipetting errors. Such sample mix-ups can be a problem in any genetic study. As we show, eQTL data allow us to identify, and even correct, such problems.
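The mix-up detection idea can be illustrated with a toy sketch (fabricated data, not the authors' analysis): a large-effect local eQTL makes expression nearly separable by genotype, so a simple threshold classifier predicts each individual's eQTL genotype from expression, and observed-versus-predicted mismatches flag candidate sample mix-ups.

```python
# Fabricated genotypes and expression values at one large-effect local
# eQTL; in the real study, many eQTL-transcript pairs and six tissues
# are combined before calling a sample a mix-up.
observed_genotype = ["AA", "AA", "BB", "BB", "AA", "BB"]
expression        = [1.1,  0.9,  3.2,  3.0,  3.1,  1.0]

threshold = 2.0   # hypothetical cut separating the two genotype classes
predicted = ["AA" if e < threshold else "BB" for e in expression]

# Individuals whose expression contradicts their recorded genotype;
# here the last two samples look swapped with each other.
mismatches = [i for i, (o, p) in enumerate(zip(observed_genotype, predicted))
              if o != p]
print(mismatches)   # [4, 5]
```

Concordant mismatch calls across many eQTL, and across tissues, are what let the authors not just flag but actually reassign mixed-up samples.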

#### Title: Integration of Multiple Prognostic Factors into the TNM Cancer Staging System by a Statistical Learning Algorithm

• Speaker: Dechang Chen, Ph.D., Division of Epidemiology and Biostatistics, Uniformed Services University of the Health Sciences
• Date/Time: 2:50 pm, Thursday, September 29th, 2011 (Refreshments will be served at 2:30 pm)
• Location: Bentley Lounge, Gray Hall 130, American University
• Directions: Metro RED line to Tenleytown-AU. AU shuttle bus stop is next to the station. Please see campus map on http://www.american.edu/media/directions.cfm for more details.
• Contact: Linda Greene, 202-885-3120, mathstat@american.edu
• Sponsor: Department of Mathematics and Statistics Colloquium, American University

Abstract:

The traditional TNM staging system uses three variables to stratify cases into prognostic and treatment groups: tumor size, degree of spread to regional lymph nodes, and presence of metastasis. The system is limited by its inability to include many other factors (e.g., demographic, pathologic, and molecular factors) that also impact prognosis. In this talk, I will introduce an ensemble algorithm for clustering cancer data (EACCD) that can be used to create predictive systems for cancer patients. A predictive system incorporates additional host features and tumor prognostic factors and is more accurate and more adaptable than the TNM system. EACCD is a three-step clustering method. In the first step, an initial dissimilarity is computed using a test statistic. In the second step, a learnt dissimilarity measure is attained by a partitioning method, and in the third step, the learnt dissimilarity is used with a hierarchical clustering algorithm to obtain clusters of patients. A demonstration of the algorithm will be given using lung cancer data from the Surveillance, Epidemiology, and End Results Program of the National Cancer Institute, USA, for the years 1988-1998.

#### Title: Estimation of Complex Small Area Parameters with Application to Poverty Indicators

• Presenter: Dr. J.N.K. Rao
• Chair: Dr. Graham Kalton, Senior Vice President, Westat
• Discussants:
1. Dr. Peter Lanjouw, Manager, Poverty and Inequality Group, Development Economics Research Group (DECRG), the World Bank
2. Dr. Partha Lahiri, Professor, Joint Program in Survey Methodology, University of Maryland, College Park
• Date: Friday, September 30, 2011
• Time: 3:00PM - 5:00PM
• Where: 1524 Van Munching Hall, University of Maryland, College Park
• Reception: 5:00PM - 6:00PM
• Contact Person: Dr. Yang Cheng, Branch Chief, Governments Division, US Census Bureau, yang.cheng@census.gov
• For direction, parking, and other information: http://www.jpsm.umd.edu/jpsm/?events/specialevents/distinguished_lecture_2011_09_30/index.htm

Presenter Background:

Dr. J. N. K. Rao is a Distinguished Research Professor at Carleton University, Ottawa, Canada, a consultant to Statistics Canada, and a member of Statistics Canada's Advisory Committee on Methodology. Among his awards and honors, Professor Rao has received the Waksberg Award for Survey Methodology and the Gold Medal of the Statistical Society of Canada, has been elected to the Royal Society of Canada, and holds an honorary doctorate from the University of Waterloo. He has made fundamental contributions to the design-based classical theory of sampling, to the foundations of sampling during the debates of the 1960s and 70s, to a variety of aspects of variance estimation, to the analysis of complex survey data, and to small area estimation.

Abstract:

Model-based small area estimation has largely focused on means or totals, using either area level models or unit level models. Empirical best linear unbiased prediction (EBLUP), empirical Bayes or empirical best (EB), and hierarchical Bayes (HB) methods have been extensively used for point estimation and for measuring the variability of the estimators. The primary purpose of this presentation is to study the estimation of complex non-linear small area parameters using EB and HB methods. Our methodology is generally applicable, but we focus on poverty indicators, in particular on the class of poverty measures called FGT poverty measures (Foster, Greer and Thorbecke, 1984). The World Bank has been releasing small area estimates of the FGT measures for several countries, using the methodology of Elbers, Lanjouw and Lanjouw (2003). The ELL methodology assumes a unit level nested error linear regression model that combines both census and survey data and produces simulated censuses of the variables of interest using the bootstrap. Estimates for any desired small areas are produced from the simulated censuses. The average of the resulting estimates is taken as the area estimate, and the variance of the estimates is taken as a measure of variability of the area estimate. We present EB estimation of FGT poverty measures for small areas using best prediction methodology based on the joint predictive density of the non-observed values given the observed data, assuming normality for a suitably transformed value of the variable of interest, for example the log of the welfare variable. A nested error linear regression unit-level model with random small area effects is assumed on the transformed variable. We show that values from the joint predictive density under the unit level model can be obtained by generating only univariate normal variables.
For comparison with the ELL method, we assume the same model with small area random effects for ELL, although ELL did not include small area effects in their models. We use a parametric bootstrap method for estimating the mean squared error (MSE) of the EB estimators. We develop a census EB method that can be used when the sample data cannot be linked to census auxiliary data. We also study HB estimation under normality, assuming a diffuse prior on the model parameters. We show that the posterior mean and the posterior variance of small area parameters can be obtained using a grid method that avoids the use of Markov chain Monte Carlo methods for generating values from the posterior density of the parameter of interest. If the distribution of random effects and/or unit errors in the unit level model deviates significantly from normality, in particular under significant skewness, the normality-based EB or HB estimators can be biased. We extend the EB method to skew normal random effects and/or unit errors. We present the results of a model-based simulation study on the relative performance of the EB, ELL and HB estimators.
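The FGT class mentioned above has a simple closed form, FGT_alpha = (1/N) sum over poor units of ((z - y_i)/z)^alpha, where alpha = 0, 1, 2 give the headcount ratio, the poverty gap, and a severity measure. A minimal sketch with hypothetical welfare values and poverty line:

```python
import numpy as np

def fgt(income, z, alpha):
    """Foster-Greer-Thorbecke poverty measure:
    (1/N) * sum over poor units of ((z - y_i) / z) ** alpha."""
    income = np.asarray(income, dtype=float)
    gap = (z - income[income < z]) / z   # normalized shortfalls of the poor
    return float(np.sum(gap ** alpha) / income.size)

y = np.array([20, 40, 55, 80, 120, 200], dtype=float)  # hypothetical welfare values
z = 60.0                                               # hypothetical poverty line

headcount = fgt(y, z, alpha=0)    # share of units below the line
poverty_gap = fgt(y, z, alpha=1)  # average normalized shortfall
severity = fgt(y, z, alpha=2)     # squared-gap (inequality-sensitive) measure
```

Larger alpha weights the poorest units more heavily, which is why the talk treats the whole class rather than a single index.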

#### Title: Unit Root Tests - A Review

• Speaker: Sastry G. Pantula, Director of the Division of Mathematical Sciences at the National Science Foundation
• Time: Friday, September 30th, 2011, 4:00-5:00 pm
• Place: Duques 553 (2201 G Street, NW, Washington, DC 20052). Followed by wine and cheese reception.
• Directions: Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/explore/visitingcampus/campusmaps.
• Sponsor: The George Washington University, The Institute for Integrating Statistics in Decision Sciences. See http://business.gwu.edu/decisionsciences/i2sds/seminars.cfm for a list of seminars.

Abstract:

Unit root tests in time series analysis have received a considerable amount of attention since the seminal work of Dickey and Fuller (1976). In this talk, some of the existing unit root test criteria will be reviewed, including their size, power, and robustness to model misspecification. Tests in which the alternative hypothesis is a unit root process, as well as tests of trend stationarity versus difference stationarity, will also be covered briefly, along with current work on unit root test criteria. Examples of unit root testing for time series will be presented, and extensions to multivariate and heteroscedastic models will be discussed.
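As a concrete illustration of the Dickey-Fuller idea, the sketch below computes only the test statistic, regressing the differenced series on its lagged level (no constant); in practice one would compare against tabulated critical values or use an implementation such as statsmodels' `adfuller`:

```python
import numpy as np

def df_stat(y):
    """Dickey-Fuller t-ratio: regress delta-y_t on y_{t-1} (no constant).
    Large negative values argue against a unit root."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    ylag = y[:-1]
    rho = (ylag @ dy) / (ylag @ ylag)       # OLS slope, estimates phi - 1
    resid = dy - rho * ylag
    s2 = (resid @ resid) / (len(dy) - 1)    # residual variance
    se = np.sqrt(s2 / (ylag @ ylag))        # standard error of the slope
    return rho / se

rng = np.random.default_rng(1)
e = rng.normal(size=500)
random_walk = np.cumsum(e)                  # unit-root process
ar1 = np.empty(500)
ar1[0] = e[0]
for t in range(1, 500):
    ar1[t] = 0.5 * ar1[t - 1] + e[t]        # stationary AR(1)
```

For the stationary series the statistic is strongly negative; for the random walk it stays near the nonstandard Dickey-Fuller null distribution, which is why ordinary t critical values cannot be used.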

Sastry Pantula received his B.Stat. and M.Stat. from the Indian Statistical Institute, Kolkata, and a Ph.D. in Statistics from Iowa State University. He has been a faculty member at North Carolina State University, where he served as Director of Graduate Programs from 1994 to 2002 and as Department Head from 2002 to 2010. He was the 2010 ASA President. Currently, he is on loan to the National Science Foundation, serving as Director of the Division of Mathematical Sciences.

#### Title: Analyses of Health Informatics Databases for Interventions Related to Negative Outcomes

• Speaker: Byeonghwa Park, PhD Program in Computational Sciences and Informatics, School of Physics, Astronomy, and Computational Sciences
• Time: 10:30 a.m. Refreshments, 10:45 a.m. Colloquium Talk
• Date: Friday, September 30, 2011
• Location: Research 1, Room 301, Fairfax Campus, George Mason University, 4400 University Drive, Fairfax, VA 22030
• Sponsor: George Mason University SPACS/CCDS/Statistics Colloquium

Abstract:

Substantial evidence shows that alcohol use is significantly associated with negative outcomes such as legal, social, and health problems. Alcohol-related problems are most often found in young males and are prevalent on campuses as well. Negative outcomes affect not only individuals but also the communities surrounding them. A better understanding of the relationships between alcohol-related problems and alcohol use is a critical prerequisite to developing and implementing effective ways to reduce these problems. A common theme of this research is to shed light on relationships between alcohol use and negative outcomes among adolescents in different settings, such as colleges and a large metropolitan area (the City of Buffalo, NY). One research piece identifies the relationship of on-campus alcohol-related problems with policy, prevention, and staffing/resource efforts pertaining to alcohol consumption at colleges and universities, using multinomial logistic regression and correspondence analysis. The second research piece uses structural equation modeling to test an integrated theory, a combination of availability theory and social learning theory, in order to examine the relationships of alcohol use and delinquency with physical and social availability among young males in a metropolitan area. The results from the first research piece demonstrate that permitting alcohol to be consumed on campus plays a very important role in undesirable outcomes, and that violence/alcohol education and prevention efforts focused on particular periods of time can be effective in decreasing health problems, sexual problems, and violence pertaining to alcohol consumption. The findings from the second research piece show that social availability affects alcohol use more than physical availability. 
Social availability, which comprises social norms, parental supervision, and the social context of drinking in groups, is a very important factor in implementing effective prevention of and intervention in youth drinking.

This seminar is also the final defense of the PhD dissertation of Mr. Park. Students in the PhD program are encouraged to attend to see what life is like at this stage of their education.

#### Title: Maximum Likelihood Estimates of the Treatment Effect for Randomized Discontinuation Trials with Continuous Responses

Abstract:

Randomized Discontinuation Trial (RDT) designs consist of two stages. The first stage (a single-arm trial) serves as a filter for removing those patients who are unlikely to respond to the treatment. The second stage is a two-arm blinded randomized trial that aims to measure the treatment effect. Fedorov and Liu (2005, 2008) developed the maximum likelihood estimates of the treatment effect for binary responses using all the information from both stages of the experiment. They examined the efficiency of this approach compared to the traditional two-arm randomized clinical trial (RCT). In this talk, we first review RDT designs and the maximum likelihood estimates of the treatment effect for binary responses. Then, we consider a continuous response model for RDT designs. Maximum likelihood estimates of the treatment effect are developed using all the information, and the efficiency of this approach compared to the RCT is studied. This is joint work with Dr. Nancy Flournoy.

#### Title: Signal Extraction for Nonstationary Multivariate Time Series with Applications to Trend Inflation

• Speaker: Dr. Tucker McElroy, Center for Statistical Research and Methodology, Census Bureau
• Date/Time: Thursday, October 6, 2011, 3:30pm
• Location: Room 1313, Math Building, University of Maryland College Park (directions).

Abstract:

We advance the theory of signal extraction by developing the optimal treatment of nonstationary vector time series that may have common trends. We present new formulas for exact signal estimation for both theoretical bi-infinite and finite samples. The formulas reveal the specific roles of inter-relationships among variables for sets of optimal filters, make fast and direct calculation feasible, and show rigorously how the optimal asymmetric filters are constructed near the end points for a set of series. We develop a class of model-based low-pass filters for trend estimation and illustrate the methodology by studying statistical estimates of trend inflation.
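As a toy illustration of low-pass trend filtering, the sketch below uses a plain symmetric moving average with truncated windows at the end points (the data and window width are invented; the talk develops model-based filters and treats the end-point problem rigorously):

```python
import numpy as np

def lowpass_trend(x, half_width=6):
    """Symmetric moving-average low-pass filter. Near the series end
    points the window is truncated, giving a crude asymmetric filter."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    trend = np.empty(n)
    for t in range(n):
        lo, hi = max(0, t - half_width), min(n, t + half_width + 1)
        trend[t] = x[lo:hi].mean()
    return trend

rng = np.random.default_rng(2)
t = np.arange(120)
series = 0.05 * t + rng.normal(scale=0.5, size=120)  # linear trend + noise
trend = lowpass_trend(series)
```

The filtered series is much smoother than the input, which is the defining property of a low-pass filter; the multivariate formulas in the talk extend this idea to vectors of series with common trends.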

Note: Tea to follow in Room 3201.

#### Title: Perspectives on Machine Bias Versus Human Bias

• Speaker: Dr. S. Ejaz Ahmed, Department of Mathematics and Statistics, University of Windsor
• Organizer: Adam Safir, WSS Methodology Section Chair
• Chair: Charles Day, WSS Methodology Program Chair
• Date/ Time: Thursday, October 6, 2011, 3:00 pm - 4:30 pm
• Location: Bureau of Labor Statistics, Conference Center Room 10, 2 Massachusetts Ave, NE. Use the Red Line to Union Station
• Registration: To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar.

Abstract:

Penalized regression has been widely used in high-dimensional data analysis. Much recent work has been done on the study of penalized least squares methods in linear models. In this talk, I consider estimation in generalized linear models when there are many potential predictor variables and some of them may not have influence on the response of interest. In the context of two competing models, where one model includes all predictors and the other restricts variable coefficients to a candidate linear subspace based on prior knowledge, we investigate the relative performances of the absolute penalty estimator (APE), shrinkage in the direction of the subspace, and candidate subspace restricted type estimators. We develop large sample theory for the shrinkage estimators, including derivation of the asymptotic bias and mean squared error. The asymptotics and a Monte Carlo simulation study show that the shrinkage estimator performs best overall and, in particular, performs better than the APE when the dimension of the restricted parameter space is large. The estimation strategies considered in this talk are also applied to a real-life data set for illustrative purposes.

#### Title: Sufficient Dimension Reduction for Longitudinally Measured Predictors

Abstract:

We propose a method to combine several predictors (markers) that are measured repeatedly over time into a composite marker score, without assuming a model and only requiring a mild condition on the predictor distribution. Assuming that the first and second moments of the predictors can be decomposed into a time and a marker component via a Kronecker product structure that accommodates the longitudinal nature of the predictors, we develop first-moment sufficient dimension reduction techniques to replace the original markers with linear transformations that contain sufficient information for the regression of the outcome on the predictors. These linear combinations can then be combined into a score that has better predictive performance than a score built under a general model that ignores the longitudinal structure of the data. Our methods can be applied to either continuous or categorical outcome measures. In simulations we focus on binary outcomes and show that our method outperforms existing alternatives, using the AUC, the area under the receiver operating characteristic (ROC) curve, as a summary measure of the discriminatory ability of a single continuous diagnostic marker for binary disease outcomes.

This is joint work with Ruth Pfeiffer and Liliana Forzani.
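The Kronecker (separable) moment structure assumed in the abstract can be illustrated directly; the dimensions and covariance components below are made up for the sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
p, q = 3, 4          # hypothetical: p markers measured at q time points

# Positive-definite marker- and time-component covariances.
A = rng.normal(size=(p, p))
sigma_marker = A @ A.T + p * np.eye(p)
B = rng.normal(size=(q, q))
sigma_time = B @ B.T + q * np.eye(q)

# Separable covariance of the stacked (q*p)-vector of longitudinal
# measurements: block (s, t) equals sigma_time[s, t] * sigma_marker.
sigma = np.kron(sigma_time, sigma_marker)

# The payoff: p*p + q*q parameters instead of (p*q)**2, and every
# time block is proportional to the marker covariance.
block_00 = sigma[:p, :p]
block_01 = sigma[:p, p:2 * p]
ratio = sigma_time[0, 1] / sigma_time[0, 0]
```

The proportionality of the blocks is what lets the dimension reduction respect the longitudinal layout instead of treating the stacked vector as unstructured.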

#### Title: A Regularized Estimator and Cramer-Rao Lower Bound for a Nonlinear Signal Processing Problem

• Speaker: Prof. Radu Balan, Department of Mathematics, UMCP
• Date/Time: Thursday, October 13, 2011, 3:30pm
• Location: Room 1313, Math Building, University of Maryland College Park (directions).

Abstract:

In this talk we present an algorithm for signal reconstruction from the absolute values of frame coefficients, and compare its performance to the Cramer-Rao Lower Bound (CRLB) at high signal-to-noise ratio. To fix notation, assume {f_i; 1 <= i <= m} is a spanning set (hence a frame) in R^n. Given noisy measurements d_i = |<x, f_i>|^2 + \nu_i, 1 <= i <= m, the problem is to recover x \in R^n up to a global sign. The reconstruction algorithm solves a regularized least squares criterion of the form I(x) = \sum_{i=1}^m (|<x, f_i>|^2 - d_i)^2 + \lambda ||x||^2. This criterion is modified in the following way: 1) the vector x is replaced by an n x r matrix L; 2) the criterion is augmented to allow an iterative procedure. Once the matrix L has been obtained, an estimate for x is obtained through an SVD factorization.
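A naive sketch of minimizing a criterion of this form on a synthetic frame is given below. This uses safeguarded gradient descent on the raw least-squares criterion, not the matrix-lifted iterative algorithm of the talk; frame size, noise level, and lambda are invented:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 5, 20
F = rng.normal(size=(m, n))                 # frame vectors f_i as rows
x_true = rng.normal(size=n)
d = (F @ x_true) ** 2 + rng.normal(scale=0.01, size=m)  # noisy |<x, f_i>|^2
lam = 1e-3

def criterion(x):
    # I(x) = sum_i (|<x, f_i>|^2 - d_i)^2 + lambda * ||x||^2
    return np.sum(((F @ x) ** 2 - d) ** 2) + lam * (x @ x)

def grad(x):
    r = (F @ x) ** 2 - d
    return 4 * F.T @ (r * (F @ x)) + 2 * lam * x

x = rng.normal(size=n)
c0 = c = criterion(x)
step = 1e-4
for _ in range(5000):
    x_new = x - step * grad(x)
    c_new = criterion(x_new)
    if c_new < c:                 # accept the step and grow it
        x, c, step = x_new, c_new, step * 1.1
    else:                         # reject the step and shrink it
        step *= 0.5

# Recovery is only possible up to a global sign.
err = min(np.linalg.norm(x - x_true), np.linalg.norm(x + x_true))
```

The sign ambiguity in the last line is exactly the "up to a global sign" identifiability issue the abstract mentions, since d depends on x only through squared inner products.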

Note: Tea to follow in Room 3201.

#### Title: On implications of demand censoring in the newsvendor problem

• Speaker: Alp Muharremoglu, School of Business, UT Dallas
• Time: Friday, October 14th 11:00 am - 12:15 pm
• Place: Duques Room 553, 2201 G Street, NW, Washington DC 20052
• Directions: Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/explore/visitingcampus/campusmaps.
• Sponsor: The Institute for Integrating Statistics in Decision Sciences, George Washington University.

Abstract:

We consider a repeated newsvendor problem in which the decision-maker (DM) does not have access to the underlying distribution of discrete demand. We analyze three informational settings: i.) the DM observes realized demand in each period; ii.) the DM only observes realized sales; and iii.) the DM observes realized sales but also a lost sales indicator that records whether demand was censored or not. We analyze the implications of censoring on performance and key characteristics that effective policies should possess. We provide a characterization of the best achievable performance in each of these cases, where we measure performance in terms of regret: the worst case difference between the cumulative costs of any policy and the optimal cumulative costs with knowledge of the demand distribution. In particular, we show that for both the first and the third settings, the best achievable performance is bounded (i.e., does not scale with the number of periods) while in the second setting, it grows logarithmically with the number of periods. We link the latter degradation in performance to the need for continuous exploration with sub-optimal decisions and provide a characterization of the frequency with which this should occur.

#### Title: Using Innovative Designs for Clinical trials: My experiences with PhRMA Working Group on Adaptive Dose-Ranging Studies

Abstract:

Several initiatives have been formed recently in the pharmaceutical industry in response to the high attrition rates observed in drug development over the past decade or so. Examples of such initiatives include the FDA's Critical Path Initiative and PhRMA's Pharmaceutical Innovation Steering Committees (PISCs) with multiple working groups, both aiming at improving productivity and efficiency in drug development. One such working group, the Adaptive Dose-Ranging Studies Working Group, was created specifically to address the impact that improved knowledge of the dose-response relationship, gained through the use of innovative designs in clinical trials, has on the success rate of the overall program. This talk highlights key statistical methods and conclusions from the extensive simulation study the group conducted to evaluate the performance of several innovative designs (most of which are rooted in optimal design theory) with respect to the quality of information obtained at the end of such trials.

#### Title: Missing Conceptual Components and Design-Based/Model-Based Viewpoints

• Speakers: Robert M. Groves and Roderick J. Little, U.S. Census Bureau
• Discussant: Natalie Shlomo, Southampton Statistical Sciences Research Institute, University of Southampton, United Kingdom
• Date: Tuesday, October 18, 2011
• Time: 3:30-5:30 p.m.
• Location: Jefferson Auditorium of the U.S. Department of Agriculture's South Building (Independence Avenue, SW, between 12th and 14th Streets); Smithsonian Metro Stop (Blue/Orange Lines). Enter through Wing 5 or Wing 7 from Independence Ave. (The special assistance entrance is at 12th & Independence). A photo ID is required.
• Sponsors: The Washington Statistical Society, Westat, and the National Agricultural Statistics Service.

Abstract:

The Total Survey Error paradigm, with its explicit decomposition of mean squared error into components of sampling and nonsampling error, is the conceptual foundation of the field of survey methodology. However, it is argued that the current paradigm has some limitations. Key quality concepts are not included, notably those of the user. Quantitative measurement of many components is burdensome and lagging, and error measurement has not been incorporated in inferences for practical surveys to the extent desirable. We seek to address these limitations by extensions of the Total Survey Error concept, and a model-based interpretation of Total Survey Error based on missing data ideas.

A reception will follow at 5:30 pm in the Patio of the Department of Agriculture Jamie L. Whitten Building. Please pre-register for this event to help facilitate access to the building online at http://www.nass.usda.gov/morrishansen/.

#### Title: Communicating Disclosure Risk to Non-Statisticians

• Chair: Jennifer Park, OMB
• Speakers:
George Zipf from HHS/ CDC/ NCHS will represent NHANES
Elise Christopher from DoEd/ IES/ NCES will represent HSLS
Steven Hirschfeld from HHS/ NIH/ Eunice Kennedy Shriver National Institute for Child Health and Human Development (NICHD) will represent the NCS
• Date/ Time: Wednesday, October 18th, 2011 from 12:30 - 2:00 pm
• Location: Bureau of Labor Statistics, Conference Center, 2 Massachusetts Ave, NE. Use the Red Line to Union Station
• Registration: To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar.
• Sponsor: WSS Data Collection Section and DC-AAPOR
• Presentation material:
Jennifer Park Slides (pdf, ~712kb)
Laura LoGerfo Slides (pdf, ~612kb)
George Zipf Slides (pdf, ~860kb)

Abstract:

Community engagement can be essential to successful design, implementation, and information goals of statistical studies. Yet, studies conducted under a federal pledge of confidentiality limit the ways in which study data can be communicated responsibly. As a result, communicating disclosure risk to lay persons, particularly in relation to indirect disclosure, is often extremely important and yet also particularly challenging. Federal statistical guidance is generally oriented to persons well-acquainted with statistical approaches and procedures.

This WSS-facilitated discussion will feature representatives from the National Children's Study, the High School Longitudinal Survey, and the National Health and Nutrition Examination Survey to share their approaches to, experiences with, and recommendations for communicating disclosure risk to local officials and community stakeholders. The issue has relevance to agencies and groups who conduct federal statistical surveys and interact with local officials and community stakeholders in the development of sampling frames, participant recruitment and engagement activities, and the review of survey results.

#### Title: Bayes' Rule: The Theory That Would Not Die

• Speaker: Sharon Bertsch McGrayne
• Time: Friday, October 21, 2011, 4:00 pm
• Place: Duques Room 651, 2201 G Street, NW, Washington DC 20052. Wine and Cheese Reception and Book Signing follow the talk. Kindly RSVP by e-mail: i2sds@gwu.edu
• Directions: Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/explore/visitingcampus/campusmaps.
• Sponsor: The Departments of Physics, Statistics, The Institute for Integrating Statistics in Decision Sciences, and The Institute for Reliability and Risk Analysis of the George Washington University.

Abstract:

From spam filters and machine translation to the drones over bin Laden's compound, Bayes' rule pervades modern life. Thomas Bayes and Pierre-Simon Laplace discovered the rule roughly 250 years ago but, for most of the 20th century, it was deeply controversial, almost taboo among academics. My talk will range over the history of Bayes' rule, highlighting Alan Turing who decrypted the German Enigma code and Jerome Cornfield of NIH and George Washington University who established smoking as a cause of lung cancer and high cholesterol as a cause of cardiovascular disease. The talk will be based on my recent book, The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines & Emerged Triumphant from Two Centuries of Controversy (Yale University Press).

Sharon Bertsch McGrayne is also the author of Nobel Prize Women in Science (National Academy Press), and Prometheans in the Lab (McGraw-Hill).

A former newspaper reporter, she is also co-author of The Atom, Electricity & Magnetism (Encyclopaedia Britannica). She has been a panelist on NPR's Science Friday, and her work has been featured on Charlie Rose. She has written for Scientific American, APS News, Science, Isis, the Times Higher Education Supplement, and other publications. Her books have been reviewed by Nature, Chemical & Engineering News, New Scientist, JAMA, Physics Today, Scientific American, Science Teacher, American Journal of Physics, Physics Teacher, Popular Mechanics, and others. Her webpage is at www.McGrayne.com.

#### Title: The Effects of the Great Recession on Our Economy and Society: Insights from Public Data

• Speakers:
Katherine Wallman, Chief Statistician, Office of Management & Budget
Alan Zaslavsky, CNSTAT, Harvard Medical School
Michael Hout, CNSTAT, University of California, Berkeley
S. Philip Morgan, Duke University
• Discussant: Lisa Lynch, CNSTAT, Brandeis University
• Date: Friday, October 21, 2011
• Time: 1:30 - 5:30 pm
• Place: Room 100 Keck Center of the National Academies, 500 Fifth Street NW, Washington, DC
• Registration: Open to the public, but for planning purposes please register by email to cnstat@nas.edu or call Bridget Edmonds at (202) 334-3096.

Abstract:

The Great Recession—the longest and deepest recession in the United States since the Great Depression of the 1930s—has had and will continue to have significant effects on many aspects of American society and the economy. A publication from the Russell Sage Foundation, The Consequences of the Great Recession (2011 forthcoming, edited by David Grusky, Bruce Western, and Christopher Wimer), argues that, even at this early date, the Great Recession can be seen as transformative in its effects on employment, poverty, income, wealth, consumption, fertility, mortality, marriage, attitudes, charitable giving, and much more. Three contributors to the Russell Sage volume—Michael Hout, Timothy Smeeding, and S. Philip Morgan—will trace the Great Recession's effects on employment, income and poverty, and family composition, highlighting the uses of statistical data from a variety of sources and identifying important data gaps. Lisa Lynch, former chief economist in the U.S. Department of Labor and chair of the Board of Directors of the Federal Reserve Bank of Boston, will consider the implications of this work for the federal statistical system and other sources of public data to support continued analysis of this watershed event for the United States and the world.

#### Title: Likelihood-based Methods for Regression Analysis with Binary Exposure Status Assessed by Pooling

• Speaker: Robert H. Lyles, PhD, Associate Professor, Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University
• Date: Friday, October 28, 2011
• Time: 10:00-11:00 am
• Location: Warwick Evans Conference Room, Building D, 4000 Reservoir Rd, Washington, DC
• Directions: http://dbbb.georgetown.edu/mastersprogram/visitors/. Campus map: http://maps.georgetown.edu/
• Sponsor: Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University. Part of the Bio3 Seminar Series.

Abstract:

The need for resource-intensive laboratory assays to assess exposures in many epidemiologic studies provides ample motivation to consider study designs that incorporate pooled samples. In this talk, we consider the case in which specimens are combined for the purpose of determining the presence or absence of a pool-wise exposure, in lieu of assessing the actual binary exposure status for each member of the pool. We presume a primary logistic regression model for an observed binary outcome, together with a secondary regression model for exposure. We facilitate maximum likelihood analysis by complete enumeration of the possible implications of a positive pool, and we discuss the applicability of this approach under both cross-sectional and case-control sampling. We also provide a maximum likelihood approach for longitudinal or repeated measures studies where the binary outcome and exposure are assessed on multiple occasions and within-subject pooling is conducted for exposure assessment. Simulation studies illustrate the performance of the proposed approaches along with their computational feasibility using widely available software. We apply the methods to investigate gene-disease association in a population-based case-control study of colorectal cancer.
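The "complete enumeration" device can be illustrated for a single pool: under an any-positive testing rule, summing over all member-exposure configurations consistent with a positive pool reproduces the closed-form probability. Pool size and exposure probabilities below are invented:

```python
from itertools import product

def pool_positive_prob(p):
    """P(pool tests positive) when the pool is positive iff at least one
    member is exposed; p gives each member's exposure probability."""
    total = 0.0
    for combo in product([0, 1], repeat=len(p)):
        if any(combo):  # configurations consistent with a positive pool
            pr = 1.0
            for exposed, pj in zip(combo, p):
                pr *= pj if exposed else (1.0 - pj)
            total += pr
    return total

p = [0.1, 0.2, 0.3]
enumerated = pool_positive_prob(p)
closed_form = 1 - (1 - 0.1) * (1 - 0.2) * (1 - 0.3)
```

In the likelihood methods of the talk, exposure probabilities depend on covariates through the secondary regression model, and the same enumeration over consistent configurations supplies the terms of the likelihood.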

#### Title: Understanding Genetic and Environmental Contributions to Melanoma Development

• Speaker: Professor Christopher Amos, Department of Epidemiology, The University of Texas MD Anderson Cancer Center
• Date: Friday, October 28th, 2011
• Time: 11:00-12:00 noon
• Place: Phillips Hall, Room 411 (801 22nd Street, NW, Washington, DC 20052)
• Directions: Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/explore/visitingcampus/campusmaps.
• Sponsor: The Department of Statistics, George Washington University.

Abstract:

In this presentation I describe the approach that we have taken in conducting genetic and epidemiological studies to investigate the causes of melanoma. I will first discuss the design and analysis of a genome-wide scan of melanoma risk and of Breslow thickness, a measure of severity. I will then discuss approaches to gene-gene interaction modeling, followed by the development and application of new approaches for modeling gene-gene and gene-environment interaction. Finally, I describe the application of these methods to the study of multiple genetic factors influencing melanoma risk.

#### Title: On Principal Component Analysis (PCA) with Applications to Whole-Genome Scans

Abstract:

Association studies using unrelated individuals have become the most popular design for mapping complex traits. Among the major challenges of association mapping is avoiding spurious association due to population stratification. Principal component analysis is one of the leading stratification-control methods. The popular practice of selecting principal components is based on significance of eigenvalues alone, regardless of their association with the phenotype. We propose a new approach, called EigenCorr, which selects principal components based on both their eigenvalues and their correlation with the phenotype. Our approach tends to select fewer principal components for stratification control than does testing of eigenvalues alone, providing substantial computational savings and improvements in power. Furthermore, a number of settings arise in which it is of interest to predict PC scores for new observations using data from an initial sample. In high-dimensional settings, we demonstrate that naive approaches to PC score prediction can be substantially biased towards 0. This shrinkage phenomenon is largely related to known inconsistency results for sample eigenvalues and eigenvectors. Under the spiked population model, we derive the asymptotic shrinkage factor, based on which we propose a bias-adjusted PC score prediction.
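A sketch of EigenCorr-style selection on synthetic data is shown below; the data-generating setup and both thresholds are invented for illustration and are not the criteria proposed in the talk:

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 200, 50
G = rng.normal(size=(n, p))                   # genotype-like matrix
strat = rng.normal(size=n)                    # hidden stratification axis
G = G + np.outer(strat, rng.normal(size=p))   # stratification shifts all markers
y = 2.0 * strat + rng.normal(size=n)          # phenotype tracks stratification

# PCA via SVD of the column-centered matrix.
Gc = G - G.mean(axis=0)
U, s, Vt = np.linalg.svd(Gc, full_matrices=False)
eigvals = s ** 2 / (n - 1)
scores = U * s                                # PC scores, one column per PC

# EigenCorr-style selection: keep PCs whose eigenvalue is large AND
# whose scores correlate with the phenotype.
cors = np.array([abs(np.corrcoef(scores[:, j], y)[0, 1])
                 for j in range(scores.shape[1])])
selected = np.where((eigvals > eigvals.mean()) & (cors > 0.2))[0]
```

Requiring both conditions prunes high-variance PCs that are unrelated to the phenotype, which is the source of the computational and power gains claimed in the abstract.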

#### Title: Evaluation of Automated Edit and Imputation using BANFF on the USDA NASS Hogs and Pigs Survey

• Organizer: Charles Day, WSS Methodology Program Chair
• Chair: Charles Day, WSS Methodology Program Chair
• Speaker: James Johanson, National Agricultural Statistics Service, USDA
• Date & Time: Tuesday, November 1, 12:30pm-2:00 pm
• Location: Bureau of Labor Statistics, Conference Center Room 10
To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.

Abstract:

The National Agricultural Statistics Service (NASS) conducts surveys within the U.S. Department of Agriculture. Traditionally, editing and imputation have been done manually via Blaise, which is labor intensive. To reduce costs, NASS purchased Statistics Canada's Banff system for statistical editing and imputation. The system was applied to the hog survey in Minnesota and Nebraska for December 2010 and March 2011. After data collection and before any manual editing, the original data were processed through the Banff editing code, consisting of only commodity-specific edits. Banff error localization and imputation reduced the error count by 90%. Records still in error were given the values derived from manual editing for calculating indications. The resulting indications were not significantly different from those derived from the manually edited data in 85% of comparisons. Work is underway to improve the questionnaire in response to some of the findings. These favorable results motivate implementation in the operational program.

#### Title: Overview of Data Confidentiality in the Federal Context

The Confidentiality and Data Access Committee (CDAC) and the Washington Statistical Society (WSS) will be presenting an overview on data confidentiality in the federal context on Thursday, November 3, 2011. The day-long seminar will be held at the Bureau of Labor Statistics Conference Center.

• 8:30 Registration
• 9:00 Welcome and Overview (Jonaki Bose, SAMHSA)
• 9:10 Governing Laws and Legal Issues (Chris Chapman, BLS)
• 10:00 Assessing Disclosure Risk (Ramona Rantala, BJS)
• 10:45 Break
• 11:00 Restricted Access Procedures (Jonaki Bose, SAMHSA)
• 12:00 Lunch (on your own)
• 1:15 Statistical Disclosure Limitation Methods for Tabular Data (Mike Buso, BLS)
• 2:00 Statistical Disclosure Limitation Methods for Microdata (J. Neil Russell, SAMHSA)
• 2:45 Examples of Disclosure Problems and Disclosure Limitation Strategies (Larry Cox, NISS)
• 3:15 Conclusion/Questions (Jonaki Bose, SAMHSA)

#### Title: Modeling U.S. Cause-Specific Mortality Rates Using an Age-Segmented Lee Carter Model

• Speaker: Jiraphan Suntornchost, Department of Mathematics, UMCP
• Date/Time: Thursday, November 3rd at 3:30 PM
• Location: Room 1313, Math Building, University of Maryland College Park (directions).

Abstract:

In many demographic and public-health applications, it is important to summarize mortality curves and time trends from population-based age-specific mortality data collected over successive years, and this is often done through the well-known model of Lee and Carter (1992). In this paper, we propose a modification of the Lee-Carter model which combines an age-segmented Lee-Carter model with spline-smoothed period effects within each age segment. With different period effects across age groups, the segmented Lee-Carter model is fitted using iterative penalized least squares and Poisson likelihood methods. The new methods are applied to the 1971-2006 public-use mortality data sets released by the National Center for Health Statistics (NCHS). Mortality rates for three leading causes of death (heart disease, cancer, and accidents) are studied in this research. The results from the data analysis suggest that the age-segmented method improves the performance of the Lee-Carter method in capturing period effects across ages.
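For reference, the classical Lee and Carter (1992) model that the age-segmented approach modifies expresses the log age-specific mortality rate as an age pattern plus an age-modulated period effect:

```latex
% Classical Lee-Carter model: m_{x,t} is the mortality rate at age x in year t
\log m_{x,t} = a_x + b_x k_t + \varepsilon_{x,t},
\qquad \sum_x b_x = 1, \quad \sum_t k_t = 0,
```

where $a_x$ is the average log-rate at age $x$, $k_t$ is the period index, $b_x$ measures how strongly age $x$ responds to $k_t$, and the two constraints identify the parameters. The age-segmented variant described in the talk fits separate, spline-smoothed period effects within each age segment.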

#### Title: An Overview of Drinking Water Laws, Regulations, and Policy

• Speaker: J. Alan Roberson, Director of Federal Relations, American Water Works Association
• Time: Friday, November 4th 3:30 pm - 4:30 pm
• Location: The George Washington University, Duques 553 (2201 G Street, NW)
• Directions: Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/~map.
• Sponsor: The George Washington University, The Institute for Integrating Statistics in Decision Sciences & Department of Statistics. List of all GWU Statistics seminars this semester: http://www.gwu.edu/~stat/seminars/spring2011.htm.

Abstract:

This presentation will summarize the evolution of drinking water laws and regulation starting with the passage of the initial Safe Drinking Water Act (SDWA) in 1974 and subsequent amendments in 1986 and 1996. The Environmental Protection Agency (EPA) has published 18 major drinking water regulations between 1976 and 2006, and the evolution of these regulations will be discussed - how contaminants are selected for regulation and how the numerical standards are developed. The policy aspects of the regulatory development process will be discussed, along with how politics can shape drinking water regulations within the current statutory framework.

#### Title: Length-biased Data Analysis

Abstract:

Statistical inference can be challenging when analyzing data from epidemiologic prevalent cohort studies in which subjects are not randomly sampled. Because of the biased sampling scheme, the independent censoring assumption is often violated. Although the biased inference caused by length-biased sampling has been widely recognized in the statistical, epidemiological, and economics literature, there is no satisfactory solution for two-sample testing and regression modeling. We propose an asymptotically most efficient nonparametric test that properly adjusts for length-biased sampling. We will also describe some important properties of the standard logrank test under different biased sampling schemes and right-censoring mechanisms. This is joint work with Jing Ning and Jing Qin.
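The bias that length-biased sampling induces, and the classical inverse-length correction, can be sketched in a few lines. This is a generic illustration, not the proposed test: the exponential model and sample size are arbitrary assumptions, and the correction shown is the standard weighted estimator rather than the authors' method:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: true durations are Exp(1), so the true mean is 1.
# A length-biased sample observes durations with probability proportional to
# their length; for Exp(1) the biased density t*exp(-t) is Gamma(shape=2).
t = rng.gamma(shape=2.0, scale=1.0, size=100_000)

naive_mean = t.mean()                       # biased toward long durations (near 2)
corrected_mean = len(t) / (1.0 / t).sum()   # inverse-length weighting (near 1)

print(naive_mean, corrected_mean)
```

The naive mean roughly doubles the true mean because long durations are over-sampled; weighting each observation by the reciprocal of its length undoes the sampling bias.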

#### Title: Clusters in Irregular Areas and Lattices

• Speaker: Professor William Wieczorek, Center for Health and Social Research, Buffalo State College
• Time: 10:30 a.m. Refreshments, 10:45 a.m. Colloquium Talk
• Date: Friday, September 30, 2011
• Location: Research 1, Room 301, Fairfax Campus, George Mason University, 4400 University Drive, Fairfax, VA 22030
• Sponsor: George Mason University SPACS/CCDS/Statistics Colloquium

Abstract:

In this talk, I present an overview of the development, capabilities, and utilization of geographic information systems (GISs). The applications relevant to GIS are nearly unlimited because virtually all human interactions, natural and man-made features, resources, and populations have a geographic component. Everything happens somewhere, and the location often has a role that affects what occurs. This role is often called spatial dependence or spatial autocorrelation, which exists when a phenomenon is not randomly geographically distributed. GIS has a number of key capabilities that are required to conduct a spatial analysis to assess this spatial dependence. This talk presents these capabilities (e.g., georeferencing, adjacency/distance measures, and overlays) and provides a case study to illustrate how GIS can be used for both research and planning. Although GIS has developed into a relatively mature application for basic functions, further development is required to integrate spatial statistics and models more seamlessly.

This is joint work with Alan M. Delmerico.

#### Career Panel: What Can I Do with a Degree in Mathematics, Statistics or Physics and Related Fields?

• Tentative Speakers:
  • Business/Industry: an operations research major working at PricewaterhouseCoopers
  • Intelligence Community: not yet fully confirmed
  • U.S. Government: a mathematics major working at the U.S. Energy Information Administration
  • Private Contractor: a statistics/political science major working at Mathematica Policy Research
• Date: Wednesday, November 9, 2011
• Time: 4:00 p.m. to 7:00 p.m.
• Location: U.S. Department of Energy, Forrestal 2E-069/2E-081, 1000 Independence Avenue SW, Washington, D.C. 20585
• Sponsors: U.S. Department of Energy, U.S. Energy Information Administration, University of the District of Columbia Department of Mathematics, Georgetown University Career Education Center, the American Statistical Association, and the Washington Statistical Society.
• Co-organizers: Caitlin Donahue (Student-Georgetown University), Carol Joyce Blumberg (Statistician-U.S. Energy Information Administration), and Debbie Jones (CGW STEM Program Manager, Office of the Chief Human Capital Officer).
• Directions: Closest Metro station is the Smithsonian-Independence Ave. exit. The Visitors' entrances to the Department of Energy are located across the street from the Smithsonian Castle.
• Security: Due to Department of Energy regulations, only U.S. citizens may attend this event. Those bringing laptops should arrive by 4:30 p.m. (if possible) to facilitate inspection of laptops; those without laptops should arrive by 4:40 p.m. (if possible) to facilitate checking in. All attendees will be required to go through a screening process similar to airport screening, but without full-body scanners and with no limit on liquid and powder amounts.
• RSVP: Deborah Jones at deborah.jones@hq.doe.gov by Wednesday, November 2, 2011.

The U.S. Department of Energy will be hosting a career panel event featuring professionals who will talk about their careers and the career options for college students pursuing degrees in mathematics, statistics, computing, technology, engineering, and physics.

The event will particularly emphasize the career opportunities available for women.

• 4:00 p.m. Arrival, Security Check-in, & Informal Networking
• 4:50 p.m. Introduction
• 5:00 p.m. Career Panel
• 6:00 p.m. Pizza, Refreshments & Informal Networking

#### Title: Which Sample Survey Strategy? A Review of Three Different Approaches

• Speaker: R. L. Chambers, Centre for Statistical and Survey Methodology, University of Wollongong
• Date/Time: Wednesday, November 16, 2011, 11:00am - 12:30pm
• Location: Bureau of Labor Statistics, Conference Center. To be placed on the seminar attendance list at the Bureau of Labor Statistics, you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.

Abstract:

We review the essential characteristics of the three different approaches to specifying a sampling strategy: the design-based approach, the model-assisted approach, and the model-based approach. We then describe a unified framework for survey design and estimation that incorporates all three approaches, allowing us to contrast them in terms of their concepts of efficiency as well as their robustness to assumptions about the characteristics of the finite population. Our conclusion is that although no one approach delivers both efficiency and robustness, the model-based approach seems to achieve the best compromise between these typically conflicting objectives.

#### Title: Equivariance and Pitman Closeness in Statistical Estimation and Prediction

• Speaker: Professor Tapan Nayak, Statistics Department, George Washington University
• Date/Time: Friday, November 18, 11am -12pm
• Location: The George Washington University, Monroe Hall, Room B32 (2115 G Street, NW, Washington, DC 20052).
• Directions: Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/~map.
• Sponsor: The George Washington University, Department of Statistics. List of all GWU Statistics seminars this semester: http://www.gwu.edu/~stat/seminars/spring2011.htm

Abstract:

For location, scale, and location-scale models, which are common in practical applications, we derive optimum equivariant estimators and predictors using the Pitman closeness criterion. This approach is very robust with respect to the choice of the loss function, as it only requires the loss function to be strictly monotone. We also prove that, in general, the Pitman closeness comparison of any two equivariant predictors depends on the unknown parameter only through a maximal invariant, and hence is independent of the parameter when the group acts transitively on the parameter space. We present several examples illustrating applications of our theoretical results.

#### Title: Neyman, Markov Processes and Survival Analysis

• Speaker: Professor Grace Yang (UMCP)
• Date/Time: Thursday, December 1st at 3:30 PM
• Location: Room 1313, Math Building, University of Maryland College Park (directions).

Abstract:

Neyman used stochastic processes, particularly Markov processes, extensively in his applied work. One example is his use of Markov models to compare different treatments of breast cancer, work that gave rise to the celebrated Fix-Neyman competing risks model. In this talk we revisit the Fix-Neyman model and one of its extensions, the nonparametric analysis of Altshuler (1970). This will be followed by a comparison of the Fix-Neyman model with current developments in survival analysis. We shall illustrate that the Markov models advocated by Neyman offer a very general approach to many problems in survival analysis.

#### Title: Geometric Quantization

• Speaker: Yang Xu, School of Physics, Astronomy and Computational Sciences, George Mason University
• Time: 10:30 a.m. Refreshments, 10:45 a.m. Colloquium Talk
• Date: Friday, December 9, 2011
• Location: Research 1, Room 301, Fairfax Campus, George Mason University, 4400 University Drive, Fairfax, VA 22030
• Sponsor: George Mason University SPACS/CCDS/Statistics Colloquium

Abstract:

In this talk, I discuss a nonparametric method for data quantization that reduces massive data sets to more manageable sizes. I present the probabilistic foundation of the quantization process and demonstrate statistical results for it, describe optimal geometric quantization procedures, and examine the computational and storage complexity of these procedures.

This discussion is based on a paper by Drs. Nkem-Amin Kuhmbah and Edward Wegman published in 2003.
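As a generic illustration of quantization for data reduction (not the procedure of the 2003 paper), the sketch below collapses a large one-dimensional sample into a handful of bin means and counts; the bin count `k` and the equal-frequency binning rule are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(2)

# Quantize 100,000 points down to k representative points by equal-frequency
# binning, keeping each bin's mean and count as the quantized summary.
x = np.sort(rng.normal(size=100_000))
k = 50
bins = np.array_split(x, k)                    # k bins with nearly equal counts
centers = np.array([b.mean() for b in bins])   # one representative per bin
counts = np.array([b.size for b in bins])

# The count-weighted mean of the summary reproduces the full-sample mean.
quantized_mean = np.average(centers, weights=counts)
print(x.mean(), quantized_mean)
```

The quantized summary (50 pairs) preserves moments such as the mean exactly while occupying a tiny fraction of the original storage, which is the kind of trade-off the talk's complexity analysis addresses.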