Washington Statistical Society on Meetup

Washington Statistical Society Seminars: 2013

January 2013
10
Thur.
Analyze US Government Survey Data with R
15
Tues.
Inappropriate Use of Statistical Measures in the Name of Balancing Data Quality and Confidentiality of Tabular Format Magnitude Data
25
Fri.
Georgetown University
Department of Biostatistics, Bioinformatics and Biomathematics
Getting A Seat at the Table: Tips to Biostatisticians
25
Fri.
George Washington University
The Institute for Integrating Statistics in Decision Sciences & Department of Decision Sciences
Stable Distributions: Models for Heavy Tailed Data
February 2013
6
Wed.
Statistical Diplomacy in North Korea
7
Thur.
George Mason University
Department of Statistics
Large Covariance Matrix Estimation With Factor Analysis
7
Thur.
University of Maryland
Department of Statistics
Renyi Entropy and Large Probability Sets
8
Fri.
Georgetown University
Department of Biostatistics, Bioinformatics and Biomathematics
How do I know if a Recommended Treatment Works?
20
Wed.
U.S. Census Bureau
DSMD Distinguished Seminar Series
Replication Variance Estimation for Rejective Sampling
22
Fri.
Georgetown University
Department of Mathematics & Statistics
Designing and Using an Online Course to Support Teaching Introductory Statistics: Experiences and Assessments
26
Tues.
Testing for Hardy Weinberg Equilibrium in National Genetic Household Surveys
March 2013
1
Fri.
George Washington University
The Institute for Integrating Statistics in Decision Sciences & Department of Decision Sciences
Adversarial Risk Analysis: Games and Auctions
7
Thurs.
Data Science DC
Estimating Effect Sizes in Machine Learning Predictive Models
7
Thur.
University of Maryland
Department of Statistics
Nonparametric Instrumental Variable Regression - Statistics
8
Fri.
George Mason University
Department of Statistics
Density Estimation for Incomplete Data Model
22
Fri.
George Mason University
Department of Statistics
Nonparametric Estimation of Conditional Distributions and Rank-Tracking Probabilities With Time-Varying Transformation Models in Longitudinal Studies
27
Wed.
American University
Info-Metrics Institute
Local Linear GMM Estimation of Functional Coefficient IV Models with an Application to Estimating the Rate of Return to Schooling
28
Thurs.
One Step or Two? Calibration Weighting from a Complete List Frame with Nonresponse
28
Thurs.
U.S. Census Bureau
DSMD Distinguished Seminar Series
Best Predictive Small Area Estimation
28
Thur.
University of Maryland
Department of Statistics
A Reversible Jump Hidden Markov Model Analysis of Search for Cancerous Nodules in X-ray Images
29
Fri.
George Mason University
Department of Statistics
Sequential Change-Point Detection in Sensor Networks
29
Fri.
George Washington University
The Institute for Integrating Statistics in Decision Sciences & Department of Decision Sciences
Should Event Organizers Prevent Resale of Tickets?
April 2013
2
Tues.
American Demographic History Chartbook: 1790 to 2010
4
Thurs.
American University
Department of Mathematics and Statistics
Math + : What else can you do with mathematics?
5
Fri.
JPSM Distinguished Lecture Series
Public Opinion Polls in the News
12
Fri.
George Washington University
The Institute for Integrating Statistics in Decision Sciences & Department of Decision Sciences
Vast Search Effect
19
Fri.
George Washington University
The Institute for Integrating Statistics in Decision Sciences & Department of Decision Sciences
The Role of Predictive Distributions in Process Optimization
19
Thur.
George Mason University
Department of Statistics
Estimating Restricted Mean Job Tenures in Semi-competing Risk Data Compensating Victims of Discrimination
25
Thur.
University of Maryland
Department of Statistics
Shrink Large Covariance Matrix without Penalty: An Empirical Nonparametric Bayesian Framework for Brain Connectivity Network Analysis
26
Fri.
Georgetown University
Department of Biostatistics, Bioinformatics and Biomathematics
Analysis of the OMOP Results Database — Does the Method Matter More than the Truth?
May 2013
22
Fri.
2013 WSS President's Invited Seminar
From Multiple Modes for Surveys to Multiple Sources for Estimates
June 2013
4
Tues.
Record Linkage Applications and Statistical Considerations
25
Tues.
Gertrude M. Cox Statistics Award Presentation
Big Data in Survey Research: Analyzing Process Information (Paradata)
25
Tues.
U.S. Census Bureau
DSMD Distinguished Seminar Series
Identification and Multiple Imputation of Implausible Gestational Ages for the Study of Preterm Births
July 2013
17
Wed.
Nonresponse Modeling in Repeated Independent Surveys in a Closed Stable Population--Did the Local Election Officials (LEOs) Roar in 2012?
30-31
Tue.-Wed.
Summer Conference Preview/Review 2013 (partnership with DC-AAPOR)
August 2013
20
Tues.
Stochastic Gradient Estimation: Tutorial Review, Recent Research
September 2013
6
Tues.
George Washington University
The Institute for Integrating Statistics in Decision Sciences & Department of Decision Sciences
An Overview of an Analytic Approach for Branching Processes
17
Tues.
The 2013 Roger Herriot Award
The 1973 Exact Match Study
19
Thur.
University of Maryland
Department of Statistics
Averaged Regression Quantiles
20
Fri.
George Mason University
Department of Statistics
First-hitting-time Based Threshold Regression Models for Time-to-event Data
24
Tues.
U.S. Census Bureau
DSMD Distinguished Seminar Series
Statistical Inference under Nonignorable Sampling and Nonresponse: An Empirical Likelihood Approach
24
Tues.
University of Maryland
Department of Statistics
Complexity Penalization in Sparse and Low Rank Matrix Recovery
27
Fri.
Georgetown University
Department of Biostatistics, Bioinformatics and Biomathematics
Enriched Ensemble methods for Classification of High-Dimensional Data
27
Fri.
Georgetown University
Department of Mathematics & Statistics
Population Dynamics of Species-Rich Ecosystems: The Mixture of Matrix Population Models Approach
27
Fri.
George Washington University
The Institute for Integrating Statistics in Decision Sciences & Department of Decision Sciences
Stochastic Optimization Problems with Multivariate Stochastic Constraints
27
Fri.
George Mason University
Department of Statistics
Novel Statistical Frameworks for Modeling Functional Brain Connectivity Using fMRI Data
October 2013
3
Thur.
University of Maryland
Department of Statistics
Some Old Problems Revisited
4
Fri.
George Washington University
The Institute for Integrating Statistics in Decision Sciences & Department of Decision Sciences
CPS Unemployment Estimates by Rotation Panel and Research Topics
4
Fri.
George Mason University
Department of Statistics
Combination of Longitudinal Biomarkers in Predicting Binary Events With Application to a Fetal Growth Study

10
Thur.
University of Maryland
Department of Statistics
Within-Cluster Resampling Methods for Clustered ROC Data
11
Fri.
Georgetown University
Department of Biostatistics, Bioinformatics and Biomathematics
Determining Change-points in Tumor Blood Flow using a Modified Information Criteria to Better Balance Complexity and Fit in a Semi-parametric Model
11
Fri.
George Mason University
Department of Statistics
Matern Class of Cross-Covariance Functions for Multivariate Random Fields
18
Fri.
George Washington University
Department of Statistics
Estimating Restricted Mean Job Tenures in Semi-Competing Risk Data Compensating Victims of Discrimination
25
Fri.
Georgetown University
Department of Biostatistics, Bioinformatics and Biomathematics
Research at Census, including links to Biostatistics/Informatics
25
Fri.
George Mason University
Department of Statistics
Some Statistical Problems in Models for Complex Networks

30
Wed.
Unseasonal Seasonals?
31
Thur.
University of Maryland
Department of Statistics
Asymptotic Normality and Optimalities in Estimation of Large Gaussian Graphical Model
November 2013
1
Fri.
Georgetown University
Department of Mathematics & Statistics
New Classes of Nonseparable Space-Time Covariance Functions
1
Fri.
George Mason University
Department of Statistics
Marginal Analysis of Measurement Agreement Data Among Multiple Raters With Missing Ratings

6
Wed.
Measuring the Real Size of the World Economy—Methodology and Challenges
7
Thur.
George Mason University
Department of Statistics
Fast Community Detection in Large Sparse Networks
8
Fri.
Georgetown University
Department of Biostatistics, Bioinformatics and Biomathematics
Estimation of Mean Response Via Effective Balancing Score
8
Fri.
George Washington University
Department of Statistics
New Classes of Nonseparable Space-Time Covariance Functions
15
Fri.
George Mason University
Department of Statistics
Information and Heuristic Creation
15
Fri.
George Washington University
Department of Statistics
Has The Time Come To Give Up Blinding In Randomized Clinical Trials?
19
Tues.
The Remarkable Robustness of Ordinary Least Squares in Randomized Clinical Trials
22
Fri.
Georgetown University
Department of Biostatistics, Bioinformatics and Biomathematics
SHARE: Statistical and Synthetic Health Information Release With Differential Privacy
22
Fri.
George Washington University
Department of Statistics
Large Panel Test of Factor Pricing Models
22
Fri.
George Washington University
The Institute for Integrating Statistics in Decision Sciences & Department of Decision Sciences
On Aggregating Probabilistic Information: The Wisdom of (and Problem with) Crowds
December 2013
6
Fri.
George Mason University
Department of Statistics
Distributional Convergence for the Number of Symbol Comparisons Used by QuickSort
16
Mon.
Julius Shiskin Award Seminar
Micro Data Research and Macro Level Understanding: Innovation at U.S. Statistical Agencies


Title: Analyze US Government Survey Data with R

  • Speaker: Anthony Damico, Statistical Analyst at the Henry J. Kaiser Family Foundation
  • Date and Time: Thursday, January 10, 2013, at 6:30 p.m.
  • Location: Washington Post Auditorium, 1150 15th Street NW, Washington, DC
  • Sponsor: R Users DC
  • Website: Register to attend at http://meetup.com/R-users-DC/events/95903742/

Abstract:

This presentation will outline why the R language is well-positioned to become the "lingua statistica" of survey methodologists, how the R survey and sqlsurvey packages work, and how to get started using one of the government survey data sets. There will also be a brief introduction to the column-oriented database MonetDB and a new method of communication with the R language. [more at the website]

Title: Inappropriate Use of Statistical Measures in the Name of Balancing Data Quality and Confidentiality of Tabular Format Magnitude Data

  • Organizer: Dan Liao, WSS Methodology Program Chair
  • Chair: Darryl Creel, RTI International
  • Speaker: Ramesh A. Dandekar, Energy Information Administration
  • Discussant: Michael L. Cohen, Committee on National Statistics
  • Date & Time: Tuesday, January 15, 12:30pm-2:00 pm
  • Location: Bureau of Labor Statistics, Conference Center Room 8
    To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: WSS Methodology Program
  • Presentation material:
    Ramesh A. Dandekar Presentation Slides (pdf, ~1.7mb)
    Michael L. Cohen Discussion Slides (pdf, ~60kb)

Abstract:

Statisticians are aware that measures such as the mean, variance, and Pearson correlation coefficient are disproportionately influenced by relatively few extremely large observations and are therefore unreliable for comparing the overall quality of data with an extremely skewed distribution. Tabular data cells follow an extremely skewed distribution. In this paper we show that linear-programming-based controlled tabular adjustment (CTA), which generates synthetic tabular data (Dandekar 2001), makes use of a least absolute difference linear regression model and is well-suited to control overall data quality on its own, without the additional steps proposed by quality-preserving controlled tabular adjustment (QP-CTA), which has been heavily promoted to the statistical community since 2003.

Title: Getting A Seat at the Table: Tips to Biostatisticians

  • Speaker: Janet Wittes, Ph.D., President and Founder of Statistics Collaborative, Inc.
  • Date: Friday, January 25, 2013
  • Time: 10-11 am
  • Q&A: 11:00-11:30 am. Please reply to Lindsay Seidenberg (lcb48@georgetown.edu) if you are interested in meeting for 30 minutes with the seminar speaker in the afternoon.
  • Location: Warwick Evans Conference Room, Building D, 4000 Reservoir Rd, Washington, DC.
  • Directions: http://dbbb.georgetown.edu/mastersprogram/visitors/. Medical Campus map: http://bit.ly/X8OKBN
  • Parking: Metered street parking is available along Reservoir Road. To park on campus, drive into Entrance 1 via Reservoir Road and drive straight back to Leavey Garage. Parking fees are $3.00 per hour.
  • Sponsor: Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University. Part of the Bio3 Seminar Series.

Abstract:

As statisticians we have much to offer study teams in ways not strictly related to statistical method and theory. Our experience is often very broad, so we can bring to discussions insights from a variety of fields. We have a solid grounding in the scientific method and often in science itself. In the context of drug development, we understand how studies relate to each other and how the FDA interprets data. Nonetheless, we are often not invited to participate in discussions of strategy and of interpretation. This talk presents some examples - some based on personal experiences, some based on hearsay - of cases where the statistician was excluded from discussions where statistical insight could have been important. I ask how often this exclusion is due to failure of physicians to appreciate our contribution and how often it is based on our failure to demonstrate our value. The talk will address some ways in which statisticians act in ways that show a lack of intellectual and emotional involvement with studies. Finally, I shall suggest strategies to help us insinuate ourselves into discussions where the unique perspective of the statistician can contribute positively to decisions.

Title: Stable Distributions: Models for Heavy Tailed Data

  • Speaker: John Nolan, American University, Washington, D.C.
  • Time: Friday, January 25th 3:30 pm - 4:30 pm (Followed by Wine and Cheese Reception)
  • Place: Duques 553 (2201 G Street, NW, Washington, DC 20052).
  • Directions: Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/explore/visitingcampus/campusmaps.
  • Sponsor: The George Washington University, The Institute for Integrating Statistics in Decision Sciences and the Department of Decision Sciences. See http://business.gwu.edu/decisionsciences/i2sds/seminars.cfm for a list of seminars.

Abstract:

Stable distributions are a class of heavy tailed probability distributions that generalize the Gaussian distribution and that can be used to model a variety of problems. An overview of univariate stable laws is given, with emphasis on the practical aspects of working with stable distributions. Then a range of statistical applications will be explored. If there is time, a brief introduction to multivariate stable distributions will be given.
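
As one illustration of the practical side mentioned in the abstract, the sketch below simulates symmetric stable variates with the Chambers-Mallows-Stuck transformation. It is not taken from the talk: the function names are hypothetical, and only the symmetric case (zero skewness, unit scale, zero location) is covered.

```python
import math
import random


def symmetric_stable_from(u, w, alpha):
    """Map u in (-pi/2, pi/2) and w > 0 to a symmetric alpha-stable value.

    Chambers-Mallows-Stuck transformation for zero skewness and unit scale.
    """
    if not 0 < alpha <= 2:
        raise ValueError("alpha must lie in (0, 2]")
    if alpha == 1:
        return math.tan(u)  # Cauchy case
    return (math.sin(alpha * u) / math.cos(u) ** (1 / alpha)
            * (math.cos((1 - alpha) * u) / w) ** ((1 - alpha) / alpha))


def sample_symmetric_stable(alpha, n, seed=0):
    """Draw n symmetric alpha-stable variates (illustrative helper)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        u = rng.uniform(-math.pi / 2, math.pi / 2)
        w = rng.expovariate(1.0)
        while w == 0.0:  # guard against an (extremely unlikely) zero draw
            w = rng.expovariate(1.0)
        draws.append(symmetric_stable_from(u, w, alpha))
    return draws
```

At alpha = 2 the transformation reduces to 2 sin(u) sqrt(w), a normal variate with variance 2, and at alpha = 1 to tan(u), the Cauchy case; smaller values of alpha give heavier tails.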

Title: Statistical Diplomacy in North Korea

  • Speakers: Chancellor and Professor Chan-Mo Park of Pyongyang University of Science and Technology, DPR Korea; President of Pohang University of Science and Technology/POSTECH (2003 - 2007), Republic of Korea; Senior Adviser to President of the Republic of Korea in Science and Technology (2008-2009); and Dr. Asaph Young Chun, Program Chair of ASA Statistics Without Borders; Director of Pyongyang Summer Institute in Survey Science and Quantitative Methodology. See below for short biographies of the speakers.
  • Chair: Michael P. Cohen, American Institutes for Research
  • Date/Time: Wednesday, February 6, 2013 / 12:30 - 2:00 p.m.
  • Location: Pew Research Center, Kellermann Room, 1615 L Street, NW, Suite 700, Washington, DC 20036
  • Directions: To be placed on the attendance list, please RSVP to http://www.pyongyangsummerinstitute.org/events.html at least 24 hours in advance of the seminar.
  • Metro Accessible: Accessible from the Farragut North (Red Line) or Farragut West (Orange & Blue Lines) stations.
  • Parking: Parking is available in the building on a first-come, first-served basis. The rate is $16.00 a day. Also, there is additional parking directly across the street.
  • Sponsors: Washington Statistical Society, the ASA Statistics Without Borders, and Capital Area Social Psychological Association

Abstract:

The purpose of this special, late-breaking session is to discuss statistical science diplomacy in DPRK (North Korea): ingredients for launching a modestly successful survey methodology program last summer in DPRK and challenges to overcome this year. The Pyongyang Summer Institute in Survey Science and Quantitative Methodology (PSI) began with about 250 North Korean undergraduate and graduate students last summer, taught by 13 pro bono instructors from the U.S. and European nations. Modeled after the 65-year-old University of Michigan Summer Institute, a world-class training ground in survey methodology, PSI was launched by the 501(c)(3) International Strategy and Reconciliation Foundation in collaboration with academic and international professional organizations, such as ASA, Statistics Without Borders, AAPOR, American Association for the Advancement of Science (AAAS), and European Survey Research Association (ESRA). The Pyongyang University of Science and Technology (PUST), the first and only private university in North Korea, hosted PSI, the first higher education survey methodology program in North Korean history. This session will demonstrate how a multilateral team of members of ASA, AAPOR, AAAS and ESRA has evolved as a joint force to design and implement an unprecedented program that integrates statistical science and scholar-to-scholar diplomacy in North Korea.

Short biographies of speakers:

Chancellor and Professor Chan-Mo Park
Pyongyang University of Science and Technology, DPRK (North Korea)
Chancellor Chan-Mo Park was the 4th President of Pohang University of Science & Technology (POSTECH), a Republic of Korea version of MIT, from September, 2003 to August, 2007. After retiring from POSTECH, Professor Park served as a Special Advisor to the President of ROK in Science and Technology and the first President of National Research Foundation (NRF) of Korea, a South Korean version of NSF, before he assumed the Chancellor position with Pyongyang University of Science & Technology (PUST) in September, 2009. Dr. Park's experience includes: professorship in the Computer Science Departments at the University of Maryland, College Park and KAIST, ROK; Professor and Chairman at The Catholic University of America, Washington D.C.; and professorship at PUST. He has also taught for the Boston University Overseas Program in Germany and the University of Maryland Asian Division in Japan. His major research interests are Digital Image Processing, Computer Graphics, Virtual Reality, and System Simulation. For the past several years, he has been involved with research activities concerning information technology (IT) development in DPRK (North Korea) and carried out joint research on virtual reality with Pyongyang Informatics Center (PIC) in Pyongyang, DPRK for seven years. Dr. Park was decorated by the Republic of Korea with the National Order of Camellia in 1986 for his contributions to the advancement of science and technology in Korea and received the Teacher of the Year Award from The Catholic University of America in 1987 for his excellence in teaching. In June, 2005 he was decorated by the Republic of Korea with the Blue Stripes Order of Service Merit for his contributions to information technology development in Korea and collaborations with DPRK. Prof. Park also received the International Alumnus Award from the University of Maryland, College Park in April, 2009 for providing significant leadership to another country's educational, cultural, social and/or economic development. Professor Park received his B.S. from Seoul National University and M.S. and Ph.D. both from the University of Maryland, College Park. He earned an Honorary Doctor of Letters degree from the University of Maryland University College in 2001 in recognition of his scholarly achievements and distinguished service. In 1995, he was also elected as a Fellow by the Korean Academy of Sciences and Technology.

Asaph Young Chun, Ph.D. Program Chair of ASA Statistics without Borders; Director, Pyongyang Summer Institute in Survey Science and Quantitative Methodology
Dr. Chun is a survey methodologist and sociologist with about 25 years of experience in large-scale survey and census research conducted for U.S. federal agencies, such as the Bureau of Labor Statistics, the National Center for Education Statistics, Department of Health and Human Services, and National Science Foundation as well as the US Census Bureau. His research focuses on theory-driven nonresponse and measurement errors with current research devoted to survey costs and errors modeling, and the use of administrative records and big data with a theory of "pandata." Beginning his early career with academic institutions like the University of Michigan Institute for Social Research (data archive specialist) and University of Maryland (faculty instructor), he also worked for academic research institutions, such as NORC at the University of Chicago (senior survey methodologist) and American Institutes for Research (senior research scientist). As a SWB member, he supported the Haiti project by providing human subject review required for an assessment survey of the 2010 earthquake and provided methodological assistance to African projects. Teaming with volunteer professionals of SWB, AAPOR, AAAS, and AEA, he currently directs a higher education PSI program that offers intensive summer graduate training in statistics and survey methodology in DPRK. He earned his A.B. and M.A. in Communication Studies with emphasis on survey methods at the University of Michigan and Ph.D. in Sociology at University of Maryland.

Title: Large Covariance Matrix Estimation With Factor Analysis

Abstract:

Sparsity is one of the key assumptions to effectively estimate a high-dimensional covariance matrix. However, in many applications, the sparsity does not hold due to the existence of some common factors. Therefore, in practice a more reliable approach for estimating a large covariance matrix is to first take out the possible common factors before applying the thresholding of Bickel and Levina (2008). In this talk, I will give a detailed explanation of the theory and method of high-dimensional factor analysis. The key feature is that in a high-dimensional factor model, the covariance matrix has a few very large eigenvalues that diverge fast with the dimensionality. I will introduce an effective covariance estimator when sparsity does not hold, called POET, with the help of factor analysis. Some immediate applications in finance and econometrics will also be presented.

Title: Renyi Entropy and Large Probability Sets

  • Speaker: Himanshu Tyagi, Dept. of Electrical Engineering, Univ. of Maryland
  • Date/Time: February 7, 2013, 3:30pm
  • Location: Room 1313, Math Building, University of Maryland College Park (directions).
  • Sponsor: University of Maryland, Statistics Program (seminar updates).

Abstract:

We provide an estimate of the size of a large probability set, associated with a general random variable (rv), in terms of Renyi entropy. This result has several potential applications. For instance, in data compression, Renyi entropy serves to represent a general sequence of rvs just as Shannon entropy represents the minimum rate of bits needed to represent i.i.d. rvs. We also discuss another application in the context of multiterminal security.

This talk is based on joint work with Prakash Narayan.
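
For readers unfamiliar with the quantity, the sketch below computes the Renyi entropy of a finite distribution from the textbook definition, H_alpha(P) = log2(sum of p^alpha) / (1 - alpha), with Shannon entropy as the alpha = 1 case. It is only an illustration of the definition, not the estimate derived in the talk, and the function name is hypothetical.

```python
import math


def renyi_entropy(probs, alpha):
    """Renyi entropy of order alpha, in bits, of a finite distribution."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    support = [p for p in probs if p > 0]
    if alpha == 1:
        # Shannon entropy is the limit of the Renyi entropy as alpha -> 1.
        return -sum(p * math.log2(p) for p in support)
    return math.log2(sum(p ** alpha for p in support)) / (1 - alpha)
```

For a uniform distribution every order gives the same value (2 bits for four equally likely outcomes); for a non-uniform distribution the entropy does not increase as alpha grows.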

Title: How do I know if a Recommended Treatment Works?

  • Speaker: Nancy L. Geller, Ph.D., Director, Office of Biostatistics Research at National Heart, Lung, and Blood Institute
  • Date: Friday, February 8, 2013
  • Time: 10-11 am
  • Q&A: 11:00-11:30 am
  • Location: Warwick Evans Conference Room, Building D, Georgetown University Medical Campus
  • Directions: http://dbbb.georgetown.edu/mastersprogram/visitors/. Medical Campus map: http://bit.ly/X8OKBN
  • Parking: Metered street parking is available along Reservoir Road. To park on campus, drive into Entrance 1 via Reservoir Road and drive straight back to Leavey Garage. Parking fees are $3.00 per hour.
  • Sponsor: Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University. Part of the Bio3 Seminar Series.

Abstract:

We are often faced with the decision of whether to take a recommended treatment or follow a recommended prevention regimen. How can we assess whether these will be effective? We discuss designs of medical studies and compare other study designs to a randomized clinical trial (RCT), demonstrating its advantages. Examples of recent RCTs of treatments widely believed to work are discussed, including hormone replacement therapy for prevention of heart disease in post-menopausal women, whether an over-the-counter lozenge helps avoid colds, and whether someone with a vertebral fracture should undergo a procedure which surgeons recommend. A number of statistical concepts emerge en route. How to draw conclusions from studies such as these is emphasized.

U.S. Census Bureau
DSMD Distinguished Seminar Series

Title: Replication Variance Estimation for Rejective Sampling

  • Presenter: Dr. Wayne A. Fuller, Department of Statistics, Iowa State University
  • Discussant #1: Dr. John Eltinge, Bureau of Labor Statistics
  • Discussant #2: Dr. Phillip Kott, RTI International
  • Chair: Ruth Ann Killion, Chief, Demographic Statistical Methods Division, U.S. Census Bureau
  • Date: Wednesday, February 20, 2013
  • Time: 2:00 pm - 3:30 pm
  • Where: Conference Rooms 1&2, U.S. Census Bureau, 4600 Silver Hill Road, Suitland, Maryland
  • Contact: Cynthia Wellons-Hazer, 301-763-4277, Cynthia.L.Wellons.Hazer@census.gov

Abstract:

The rejective sampling method of Fuller (2009) is reviewed and illustrated. Replication procedures for estimating the variance of estimators constructed with rejective samples are suggested. A bootstrap sample that is a rejective, unequal probability, replacement sample selected from the original sample is described. Simulations for Poisson and stratified samples support the use of replicates in estimating the variance of the regression estimator for rejective samples.

Title: Designing and Using an Online Course to Support Teaching Introductory Statistics: Experiences and Assessments

  • Speaker: Professor Oded Meyer, Department of Mathematics & Statistics, Georgetown University
  • Date: Friday, February 22, 2013
  • Time: 3:15 pm
  • Location: St. Mary's 326, Georgetown University, Washington, DC.
  • Directions: maps.georgetown.edu
  • Sponsor: Department of Mathematics and Statistics, Georgetown University (math.georgetown.edu)

Abstract:

As part of the Open Learning Initiative (OLI) project, Carnegie Mellon University was funded to develop a web-based introductory statistics course, openly and freely available to individual learners online and designed so that students can learn effectively without an instructor. In practice, this course is often used by instructors in "blended" mode, to support and complement face-to-face classroom instruction.

The course was designed to provide pedagogical scaffolding to students as they acquire new knowledge and at the same time provide detailed feedback to the instructor about the students' progress and performance on the different course learning objectives.

The presentation will discuss the course design in terms of content and structure, demonstrate both the student's and instructor's experience when interacting with the system, and will describe the design and results of studies in which the course's effectiveness was assessed in the hybrid instructional model.

The presentation will conclude with a discussion on how instructors can utilize Carnegie Mellon's OLI platform to develop their own courses and describe an ongoing project with the School of Foreign Service at Georgetown University.

Title: Testing for Hardy Weinberg Equilibrium in National Genetic Household Surveys

  • Organizer/Chair: Dan Liao, WSS Methodology Program Chair/RTI International
  • Speaker: Dr. Yan Li, Joint Program in Survey Methodology at the University of Maryland
  • Date & Time: Tuesday, February 26, 12:30pm-2:00 pm
  • Location: Bureau of Labor Statistics, Conference Center Room 3
    To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: WSS Methodology Program

Abstract:

In population-based household surveys, for example, the National Health and Nutrition Examination Survey (NHANES), blood-related individuals are often sampled from the same household. Therefore, genetic data collected from national household surveys are often correlated due to two levels of clustering (correlation), with one induced by the multistage geographical cluster sampling, and the other induced by biological inheritance among multiple participants within the same sampled household. To address this problem, we develop efficient statistical methods that consider the weighting effect induced by the differential selection probabilities in complex sample designs, as well as the clustering (correlation) effects described above. We examine and compare the magnitude of each level of clustering effects under different scenarios and identify the scenario under which the clustering effect induced by one level dominates the other. The proposed method is evaluated via Monte Carlo simulation studies and illustrated using the Hispanic Health and Nutrition Examination Survey (HHANES) with simulated genotype data.
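
As a point of reference, the sketch below shows the classical chi-square test of Hardy-Weinberg equilibrium for a single biallelic locus. It assumes independent, equally weighted individuals, so it deliberately ignores the survey weights and the two levels of clustering that the talk addresses; it is the naive baseline, not the proposed method, and the function name is hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg equilibrium for one biallelic locus.

    Assumes a simple random sample of unrelated individuals (no survey
    weights, no household clustering). Returns (statistic, p_value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype must be observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        raise ValueError("both alleles must be observed")
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

With genotype counts of 30, 40 and 30, the estimated allele frequency is 0.5, the expected counts are 25, 50 and 25, and the statistic is 4 on one degree of freedom.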

Title: Estimating Restricted Mean Job Tenures in Semi-competing Risk Data Compensating Victims of Discrimination

Abstract:

When plaintiffs prevail in a discrimination case, a major component of the calculation of economic loss is the length of time they would have been in the higher position had they been treated fairly during the period in which the employer practised discrimination. This problem is complicated by the fact that one's eligibility for promotion is subject to termination by retirement and both the promotion and retirement processes may be affected by discriminatory practices. This semi-competing risk process is modeled by a retirement process and a promotion process among the employees. Predictions for the purpose of compensation are made by utilizing the expected promotion and retirement probabilities of similarly qualified members of the non-discriminated group. The restricted mean durations of three periods are estimated - the time an employee would be at the lower position, at the higher level and in retirement. The asymptotic properties of the estimators are presented and examined through simulation studies. The proposed restricted mean job duration estimators are shown to be robust in the presence of an independent frailty term. Data from the reverse discrimination case, Alexander v. Milwaukee, where White-male lieutenants were discriminated against in promotion to captain, are reanalyzed. While the appellate court upheld liability, it reversed the original damage calculations, which heavily depended on the time a plaintiff would have been in each position. The results obtained by the proposed method are compared to those made at the first trial. Substantial differences in both directions are observed.

Return to top

Title: Shrink Large Covariance Matrix without Penalty: An Empirical Nonparametric Bayesian Framework for Brain Connectivity Network Analysis

  • Speaker: Dr. Shuo Chen, Dept. of Epidemiology and Biostatistics, Univ. of Maryland
  • Date/Time: April 25, 2013, 3:30pm
  • Location: Room 1313, Math Building, University of Maryland College Park (directions).
  • Sponsor: University of Maryland, Statistics Program (seminar updates).

Abstract:

In neuroimaging, brain connectivity generally refers to associations between neural units from distinct brain locations. We use nodes (vertices) to represent the neural processing units and edges to denote connectivity between those units, as in graph theory. In brain network analysis, the edge intensities (connectivity strengths) are usually taken as input data. For statistical modeling, the covariance between edges yields important information because it reflects not only the correlation structure between edges but also the spatial structure of nodes. However, the dimension of covariance parameters is very high; for example, 300 nodes will lead to more than one billion covariance parameters between edges. Also, the correlations between edges within and out of brain networks show different distributions. We propose a novel empirical nonparametric Bayesian framework that can efficiently shrink the number of covariance parameters between edges with a spatial structure constraint rather than a penalty term and yield inferences of brain networks. We apply this method to an fMRI study and simulated data sets to demonstrate the properties of our method.
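
The "more than one billion" figure in the abstract can be checked with a short calculation. The sketch below is only an illustration and makes two assumptions that the abstract does not state: the graph is undirected with no self-loops, so n nodes give n(n-1)/2 edges, and the covariance matrix between edges is an unstructured symmetric matrix, so E edges give E(E+1)/2 free parameters (variances included). The function names are hypothetical.

```python
from math import comb


def edge_count(nodes: int) -> int:
    """Number of edges in an undirected graph without self-loops (assumed)."""
    return comb(nodes, 2)


def covariance_parameter_count(nodes: int) -> int:
    """Free parameters in an unstructured symmetric edge-by-edge covariance matrix.

    With E edges there are E variances plus E*(E-1)/2 covariances,
    which is E*(E+1)/2 in total.
    """
    edges = edge_count(nodes)
    return edges * (edges + 1) // 2


if __name__ == "__main__":
    print(edge_count(300))                  # 44850 edges
    print(covariance_parameter_count(300))  # 1005783675 parameters
```

Under these assumptions, 300 nodes give 44,850 edges and about 1.006 billion covariance parameters, which agrees with the abstract.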

Return to top

Title: Analysis of the OMOP Results Database -- Does the Method Matter More than the Truth?

  • Speaker: Alan Karr, Ph.D., Director, National Institute of Statistical Sciences
  • When: Friday, April 26, 2013, from 10-11 am, with Q&A time until 11:30 am
  • Where: Warwick Evans Conference Room, Building D, Georgetown University Medical Campus
  • Address: 4000 Reservoir Road NW, Washington, DC 20057
  • Directions and Parking: Detailed directions for public transportation, car, and parking can be found here: http://dbbb.georgetown.edu/about/Visitors/. Medical Campus map: http://bit.ly/X8OKBN
  • Parking: Metered street parking is available along Reservoir Road. To park on campus, drive into Entrance 1 via Reservoir Road and drive straight back to Leavey Garage. Parking fees are $3.00 per hour.
  • Sponsor: Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University. Part of the Bio3 Seminar Series.

Abstract:

The results database constructed by the Observational Medical Outcomes Partnership (OMOP) contains more than 6 million records, each representing the results of a particular statistical method (7 methods, some with multiple variants) on nearly 400 drug-adverse outcome pairs, on patient data from five medical records databases. Importantly, for each drug-outcome pair, there is a (scientifically) known ground truth: whether the drug is associated with the outcome. Each analysis variant produces a relative risk, together with supporting statistics. This talk describes initial analyses of the results database. Soberingly, some of these analyses show that relative risk depends more on the analysis method (and its parameters) than on ground truth. We also discuss how to identify methods that seem to have high fidelity to ground truth.

Return to top

Title: Adversarial Risk Analysis: Games and Auctions

Abstract:

Adversarial risk analysis is a decision-analytic approach to strategic games. It builds a Bayesian model for the solution concept, goals, and resources of the opponent, and the analyst can then make the choice that maximizes expected utility against that model. Adversarial risk analysis operationalizes the perspective in Kadane and Larkey (1982), and it often enables the analyst to incorporate empirical data from behavioral game theory. The methodology is illustrated in the context of La Relance, a routing game, and auctions.

Return to top

Title: Estimating Effect Sizes in Machine Learning Predictive Models

  • Speaker: Abhijit Dasgupta, Consultant at NIH and other organizations
  • Date and Time: Thursday, March 7th, 2013, from 6:30pm - 8:30pm
  • Location: GWU, Funger Hall, Room 103, 2201 G St. NW, Washington, DC
  • Sponsor: Data Science DC
  • Website: Register to attend at http://www.meetup.com/Data-Science-DC/events/105714512/

Abstract:

When using classical regression models, it is relatively easy to estimate effect size. But when your predictive model is a black box, such as a random forest or neural network, this valuable information is typically unattainable. This talk will describe practical new methods for estimating effect size when using machine learning predictive models. (more at the website).

Return to top

Title: Nonparametric Instrumental Variable Regression - Statistics

  • Speaker: Prof. Yuan Liao, Dept. of Mathematics, UMCP
  • Date/Time: March 7, 2013, 3:30pm
  • Location: Room 1313, Math Building, University of Maryland College Park (directions).
  • Sponsor: University of Maryland, Statistics Program (seminar updates).

Abstract:

In nonparametric regressions, when the regressor is correlated with the error term, both the estimation and identification of the nonparametric function are ill-posed problems. In the econometric literature, people have been using instrumental variables to solve the problem. But the problem is still very difficult because the identification involves inverting a "Fredholm integral equation of the first kind", whose inverse either does not exist or is unbounded. I will start by motivating this problem with an application to the effect of education on wages, then explain the concepts of instrumental variables and the Fredholm integral equation of the first kind. My proposed Bayesian method does not require the nonparametric function to be identified; without identification, the function can never be consistently estimated. Instead, a new consistency concept based on "partial identification" will be introduced.

This is joint work with my Ph.D. advisor Professor Wenxin Jiang.

Return to top

Title: Density Estimation for Incomplete Data Model

Abstract:

For incomplete data models, the commonly used density estimator of the underlying distribution cannot be computed directly, as the corresponding kernel requires all original data to be available. To estimate the density function under such an incomplete data model using only the observed data, we propose to use a conditional version of the kernel given the observed data. We study such kernel density estimators for several commonly used incomplete data models. Some large-sample properties of the proposed estimators are investigated.

Return to top

Title: Nonparametric Estimation of Conditional Distributions and Rank-Tracking Probabilities With Time-Varying Transformation Models in Longitudinal Studies

Abstract:

An important objective of longitudinal analysis is to estimate the conditional distributions of an outcome variable through a regression model. The approaches based on modeling the conditional means are not appropriate for this task when the conditional distributions are skewed or cannot be approximated by a normal distribution through a known transformation. We study a class of time-varying transformation models and a two-step smoothing method for the estimation of the conditional distribution functions. Based on our models, we propose a rank-tracking probability and a rank-tracking probability ratio to measure the strength of the tracking ability of an outcome variable at two different time points. Our models and estimation method can be applied to a wide range of scientific objectives that cannot be evaluated by conditional-mean-based models. We derive the asymptotic properties for the two-step local polynomial estimators of the conditional distribution functions. Finite sample properties of our procedures are investigated through a simulation study. Application of our models and estimation method is demonstrated through a large epidemiological study of childhood growth and blood pressure.

* This is joint work with Xin Tian (OBR/NHLBI)

Return to top

Title: Local Linear GMM Estimation of Functional Coefficient IV Models with an Application to Estimating the Rate of Return to Schooling

Abstract:

We consider the local linear GMM estimation of functional coefficient models with a mix of discrete and continuous data and in the presence of endogenous regressors. We establish the asymptotic normality of the estimator and derive the optimal instrumental variable that minimizes the asymptotic variance-covariance matrix among the class of all local linear GMM estimators. Data-dependent bandwidth sequences are also allowed for. We propose a nonparametric test for the constancy of the functional coefficients, study its asymptotic properties under the null hypothesis as well as a sequence of local alternatives and global alternatives, and propose a bootstrap version for it. Simulations are conducted to evaluate both the estimator and test. Applications to the 1985 Australian Longitudinal Survey data indicate a clear rejection of the null hypothesis of a constant rate of return to education, and that the returns to education obtained in earlier studies tend to be overestimated at all levels of work experience.

Return to top

Title: One Step or Two? Calibration Weighting from a Complete List Frame with Nonresponse

  • Speaker: Dr. Philip S. Kott, RTI International
  • Date & Time: Thursday, March 28, 12:30pm-2:00 pm
  • Location: Bureau of Labor Statistics, Conference Center Room 7
    To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: WSS Methodology Program

Abstract:

When a random sample drawn from a complete list frame suffers from unit nonresponse, calibration weighting to population totals can be used to remove nonresponse bias under either an assumed response (selection) or an assumed prediction (outcome) model. Calibration weighting in this way can not only provide double protection against nonresponse bias, it can also decrease variance. By employing a simple trick one can estimate the variance under the assumed prediction model and the mean squared error under the combination of an assumed response model and the probability-sampling mechanism simultaneously.

Unfortunately, there is a practical limitation on what response model can be assumed when calibrating in a single step. In particular, the choice for the response function cannot always be logistic. That limitation does not hinder calibration weighting when performed in two steps: one to remove the response bias and one to decrease variance. There are potential efficiency advantages from using the two-step approach as well even when the calibration variables employed in both steps are the same or a subset of the calibration variables in the single step. Simultaneous mean-squared-error estimation using linearization is possible, but more complicated than when calibrating in a single step.

An empirical example demonstrates (1) that double protection works unless both models fail badly, and (2) that calibration weighting in two steps can be more efficient than in one, although it may not be worth the effort.

Return to top

U.S. Census Bureau
DSMD Distinguished Seminar Series

Title: Best Predictive Small Area Estimation

  • Presenter: Dr. Jiming Jiang, Department of Statistics, University of California, Davis
  • Discussant: Dr. Partha Lahiri, Joint Program in Survey Methodology, University of Maryland
  • Chair: Ruth Ann Killion, Chief, Demographic Statistical Methods Division, U.S. Census Bureau
  • Date: Thursday, March 28, 2013
  • Time: 2:00 pm - 3:15 pm
  • Where: Seminar Room 5K410, U.S. Census Bureau, 4600 Silver Hill Road, Suitland, Maryland
  • Contact: Cynthia Wellons-Hazer, 301-763-4277, Cynthia.L.Wellons.Hazer@census.gov

Abstract:

We propose a new method for small area estimation, known as the observed best prediction (OBP; Jiang, Nguyen & Rao 2011, 2012). The OBP is different from the traditional empirical best linear unbiased prediction (EBLUP) in that the unknown model parameters are estimated via the best predictive estimator (BPE), rather than the maximum likelihood or restricted maximum likelihood estimators. One important feature of the OBP is that it is more robust against misspecification of the underlying model, either in terms of the mean function or in terms of the variance function, compared to the EBLUP. We use both theoretical arguments and empirical studies to demonstrate that the OBP can significantly outperform EBLUP in terms of the mean squared prediction error (MSPE), if the underlying model is misspecified. On the other hand, when the underlying model is correctly specified, the overall predictive performance of the OBP is very similar to that of the EBLUP, if the number of small areas is large. Two real data examples are used to illustrate the method.

This work is joint with Thuan Nguyen of the Oregon Health & Science University, and J. Sunil Rao of the University of Miami.

Return to top

Title: A Reversible Jump Hidden Markov Model Analysis of Search for Cancerous Nodules in X-ray Images

  • Speaker: Jin Yan (Department of Mathematics, UMCP)
  • Date/Time: March 28, 2013, 3:30pm
  • Location: Room 1313, Math Building, University of Maryland College Park (directions).
  • Sponsor: University of Maryland, Statistics Program (seminar updates).

Abstract:

Nodules that may represent lung cancer are often missed in chest X-rays. Research has investigated factors affecting search effectiveness based on eye movement patterns, but statistical modeling of these patterns is rare. We analyze eye tracking data of participants looking at chest X-rays with a potential cancerous nodule to find out what areas on the images attract participants' attention more, how their eyes jump among these areas, and which scan pattern is related to an effective capture of the nodule. By using the hidden Markov model and a modified reversible jump Markov chain Monte Carlo algorithm, we estimated the total number of areas of interest (AOIs) on each image, as well as their centers, sizes and orientations. We also use the pixel luminance as prior information, as nodules are often brighter and luminance may thus affect the AOIs. We found that the average number of AOIs per image is about 7, and that participants' switching rate between AOIs is 4.1% on average. One of the AOIs covers the nodule precisely. Differences in scan patterns between those who found the nodule and those who didn't are discussed.

Return to top

Title: Sequential Change-Point Detection in Sensor Networks

Abstract:

Sequential change-point detection problems have a variety of applications including industrial quality control, reliability, fault detection, surveillance, and security systems. By monitoring data streams which are generated from a process, we are interested in quickly detecting malfunctioning once the process goes out of control, while keeping false alarms as infrequent as possible when the process is in control. The classical version of this problem, where one monitors a data stream of one (or low) dimension at a centralized location, is a well-developed area. In this talk, we investigate the recent setting where the information available is distributed across a set of sensors. Each sensor receives a sequence of raw observations, and sends a sequence of sensor messages to a central processor, called the fusion center, which makes a final decision when observations are stopped. Two concrete scenarios will be studied. The first one is the "decentralized" case in which the sensor raw observations are required to be quantized into sensor messages that belong to a finite alphabet before being sent to the fusion center (due to the need for data compression and limitations of channel bandwidth), and asymptotically optimal procedures will be developed based on Monotone Likelihood Ratio Quantizers (with possibly adaptive thresholds). The second scenario is the "distributed" case in which an occurring event may affect an unknown subset of a large number of sensors at an unknown time, and we propose to develop efficient scalable procedures via soft-thresholding shrinkage.

Return to top

Title: Should Event Organizers Prevent Resale of Tickets?

Abstract:

We are interested in whether preventing resale of tickets benefits the capacity providers for sporting and entertainment events. Common wisdom suggests that ticket resale is harmful to event organizers' revenues, and event organizers have tried to prevent resale of tickets. For instance, Ticketmaster has recently proposed paperless (non-transferable) ticketing, which would severely limit the opportunity to resell tickets. Surprisingly, we find that this wisdom is incorrect when event organizers use fixed pricing policies; in fact, event organizers benefit from reductions in consumers' (and speculators') transaction costs of resale. Even when multi-period pricing policies are used, we find that an event organizer may still benefit from ticket resale if his capacity is small. Given that limiting ticket resale by making it more difficult has resulted in adverse consumer reactions, we propose a novel ticket pricing mechanism of ticket options. We show that ticket options (where consumers would initially buy an option to buy a ticket and then execute at a later date) naturally result in reducing ticket resale significantly and result in significant increases in event organizers' revenues. Furthermore, since a consumer only risks the option price (and not the whole ticket price) if she cannot attend the event, options may face less consumer resistance than paperless tickets. (This is joint work with Yao Cui and Izak Duenyas).

Return to top

Title: American Demographic History Chartbook: 1790 to 2010

  • Speaker: Campbell Gibson, Ph. D. (retired in 2006), Senior Demographer, U.S. Census Bureau
  • Chair: Mike Fleming
  • Date/Time: Tuesday, April 2, 2013 12:30 - 1:30 p.m.
  • Location: Bureau of Labor Statistics Conference Center.
    To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: WSS Agriculture and Natural Resources
  • Point of contact: cgibson47@cox.net

Abstract:

Graphics developed from decennial census data are presented to show the demographic history of the United States and to show how these data can be used in teaching American history. These data have been underutilized because of the limited availability of historical time series. The following topics are included: population growth and distribution, type of residence (urban-rural and metropolitan-nonmetropolitan), race and Hispanic origin, age and sex structure, households and home ownership, marital status, fertility, education (attendance, literacy, and attainment), internal migration, the foreign-born population and country of birth, labor force, and occupation. An optional quiz is included for those who would like to test their knowledge of historical demographic trends and differentials in the United States.

Return to top

Title: Math + : What else can you do with mathematics?

  • Speaker: Mary Gray
  • Date/Time: Thursday, April 4, 2013, 4:00 pm - 5 pm.
  • Place: American University, 4400 Massachusetts Ave. NW, Washington, DC 20016, Gray Hall, Bentley Lounge
  • Directions: http://www.american.edu/media/directions.cfm
  • Sponsor: American University, Department of Mathematics and Statistics

Abstract:

A non-technical talk about the use of mathematics and statistics in working for human rights, women's rights, international development, diversity and inclusiveness.

Return to top

2013 JPSM DISTINGUISHED LECTURE

Title: Public Opinion Polls in the News

  • Speaker: Michael Traugott, Professor of Communication Studies and Political Science and Senior Research Scientist in the Center for Political Studies at the Institute for Social Research
  • Discussants:
    Clyde Tucker, AIR & CNN
    Mark Blumenthal, Huffington Post, University of Maryland
  • Date/Time: Friday, April 5 at 3:00 PM, reception afterwards
  • Location: 1218 LeFrak Hall, University of Maryland, College Park.
  • Directions: http://www.cvs.umd.edu/visitors/maps.html

Abstract:

The mass media play an important role in collecting and disseminating polling data about what the public thinks about a number of important issues of the day. In the United States, there has been a symbiotic relationship between pollsters and news organizations for more than 70 years. After a period of steady expansion in the number of polls reported, the recent downturn in the economics of the news business has impacted the frequency of polls and the quality of some data collections. This talk will review these trends and cite examples of issues in low cost data collection (LCDC) and analysis and how they affect the reporting of public opinion as part of political news.

Return to top

Title: Vast Search Effect

Abstract:

The American public is often confronted with sensationalized studies that show statistically significant results for one new phenomenon or another. For example, you might read a headline such as this: "Mother's Depression Linked to Child's Shorter Height," ABC News, Sept 10, 2012. The headline is flashy enough to grab attention. But a new level of truth becomes apparent when one takes the time to learn how the study was designed and how the information was analyzed.

Studies such as this tend to suffer from Vast Search Effect, affectionately called data dredging at Elder Research, Inc. If an analyst conducts continuous analysis of one aspect of data without reflecting on other contributing factors, some kind of relationship in the data is likely to be "discovered." The result may be a random occurrence in the particular sample of data analyzed or may not actually be the most interesting insight contained in the data. Vast Search Effect is the reason why the public hears conflicting messages, such as coffee is good for you … no, coffee causes a certain type of cancer and hypertension … ooops, actually now coffee is good for you again. These contradictory conclusions cause confusion and weaken people's confidence in studies and analytics in general.

Please join Elder Research, Inc. in the discussion of Vast Search Effect. The outcome of our discussion is to create a healthy skepticism for the most promising data discoveries, which should then motivate us to consider the boundary between data mining and data dredging in our own analysis. ERI will discuss how to recognize data dredging using a recent example covered in the press.
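
The reason a broad search is likely to "discover" something can be illustrated with a small calculation. The sketch below is not taken from the talk. It assumes that every candidate relationship examined is in fact null, that the tests are independent, and that each is run at a significance level of 0.05. The function names are hypothetical.

```python
def expected_false_positives(num_tests: int, alpha: float = 0.05) -> float:
    """Expected number of spurious 'significant' results when every null is true."""
    return num_tests * alpha


def prob_at_least_one_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of num_tests independent null tests is 'significant'."""
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    for k in (1, 10, 100):
        print(k,
              round(expected_false_positives(k), 2),
              round(prob_at_least_one_false_positive(k), 4))
```

Under these assumptions, a search over 100 unrelated relationships is expected to produce about 5 spurious findings, and the chance of at least one is above 99%.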

Return to top

Title: The Role of Predictive Distributions in Process Optimization

  • Speaker: John J. Peterson, Quantitative Sciences Department, GlaxoSmithKline Pharmaceuticals
  • Time: Friday, April 19th 11:00 am - 12:00 noon
  • Place: Duques 151 (2201 G Street, NW, Washington, DC 20052).
  • Directions: Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/explore/visitingcampus/campusmaps.
  • Sponsor: The George Washington University, The Institute for Integrating Statistics in Decision Sciences and the Department of Decision Sciences. See http://business.gwu.edu/decisionsciences/i2sds/seminars.cfm for a list of seminars.

Abstract:

Quality improvement has been described in a nutshell as "reduction in variation about a target". Such reduction is driven by the desire to have a high probability of meeting process specifications. However, many statistical quantifications and decisions related to process optimization and response surface analyses are focused only on means, without careful thought to the role of variation and risk assessment. A focus on inference for means is also evident from a review of classical response surface methodology textbooks and popular statistical packages for process optimization. This has caused many scientists and engineers to ignore careful modeling of key sources of variation and to propose regions of process operation that are far too large, thereby harboring process operating conditions associated with poor process performance. This talk will illustrate some of the dangers of failing to account for process variation properly. It will also show how predictive distributions can be used for better process optimization.

Return to top

2013 WSS PRESIDENT'S INVITED SEMINAR

Title: From Multiple Modes for Surveys to Multiple Sources for Estimates

  • Speaker: Constance F. Citro, Director, Committee on National Statistics, National Academy of Sciences/National Research Council
  • Chair: Keith Rust, WSS President
  • Date/Time: May 22, 2013, 1:00 - 3:00 p.m. Light refreshments will be served following the talk. Video conferencing will not be available.
  • Location: Bureau of Labor Statistics Conference Center, Room 7.
    To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.

Abstract:

Users of federal statistics want estimates that are "wider, deeper, quicker, better, cheaper" (channeling Tim Holt, former head of the UK Office for National Statistics). Each of these adjectives poses challenges and opportunities for those who produce statistics. I am going to focus on "better." Since World War II, we have relied on the probability sample survey as the best we could do--and that best being very good, indeed--for estimates on everything from household income to self-reported health status, unemployment, crime victimization, and many other topics. Faced with secularly declining unit and item response rates, we have responded in many ways, including the use of multiple survey modes, more sophisticated imputation methods, etc. In the business sector, we also long ago moved away from relying solely on surveys to produce needed estimates, but, to date, we have not done that for household surveys. I argue that we can and must move from a paradigm of producing the best estimates possible from a survey to that of producing the best possible estimates from multiple data sources, including administrative records and, in the future, perhaps other kinds of data. I use household income as my main example. From my 45 years working in and around the federal statistical system (28 years with CNSTAT), mostly as a user, I also offer some observations about ways in which a user perspective can productively become a more integral part of the DNA of the federal statistical system.

Return to top


Title: Record Linkage Applications and Statistical Considerations

  • Speakers:
    William Winkler, Ph.D, Principal Researcher, United States Census Bureau
    Jennifer Parker, Ph.D, Chief, Special Projects Branch, National Center for Health Statistics
    Deborah Wagner, Chief, Census Applications Group, United States Census Bureau
  • Organizer: Mary Layne, United States Census Bureau
  • Date & Time: June 4, 2013 12:30pm-2:00pm
  • Location: Bureau of Labor Statistics Conference Center
    To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: WSS Methodology Program

Abstract:

This seminar will feature an interagency discussion by the United States Census Bureau and the National Center for Health Statistics (NCHS) on the application of record linkage, its challenges, and statistical considerations when using linked data.

William Winkler

The Census Bureau analyzes linked files for many of its research projects. The record linkage process, because of its probabilistic nature, has false match and false non-match error. When using linked files for statistical analysis, it is necessary to adjust for record linkage error. This talk provides an overview of techniques and suggests methods to adjust statistical analysis for linkage error.

Jennifer Parker

NCHS' record linkage program links the Center's population health surveys to administrative records from the Centers for Medicare and Medicaid Services (CMS) and the Social Security Administration (SSA). In addition, linkage with the National Death Index ascertains mortality status for survey participants. This talk will focus on analytic challenges and opportunities when using the NCHS-CMS-SSA linked data files.

Deborah Wagner

Many research projects at the Census Bureau involve matching persons across surveys and federal data to enhance the understanding of participation in various Federal programs. Fundamental to this work is a method to ensure the same person is linked across multiple files. The Census Bureau's Person Identification Validation System (PVS) uses probabilistic matching to assign unique person and address identifiers to federal, commercial, and survey data to facilitate linking persons across these files. This talk will discuss the PVS and its methods.

Return to top

U.S. Census Bureau
DSMD Distinguished Seminar Series

Title: Identification and Multiple Imputation of Implausible Gestational Ages for the Study of Preterm Births

  • Presenter: Dr. Nathaniel Schenker, National Center for Health Statistics
  • Discussant: Dr. Joseph Schafer, U.S. Census Bureau
  • Date: Tuesday, June 25, 2013
  • Time: 2:00 pm - 3:15 pm
  • Where: Seminar Room 5K410, US Census Bureau, 4600 Silver Hill Road, Suitland, Maryland
  • Contact: Cynthia Wellons-Hazer, 301-763-4277, Cynthia.L.Wellons.Hazer@census.gov

Abstract:

Gestational age is an important variable in the study of infant health. However, information on gestational age compiled from birth records has been known to have inaccuracies, largely due to mis-estimation of gestational ages based on time of last menstrual period. Simply deleting implausible cases from analyses can lead to loss of information as well as bias. This talk describes work in progress at the National Center for Health Statistics on methods for (a) identifying implausible reported gestational ages using mixture models for the distribution of birth weights conditional on reported gestational ages, and (b) multiply imputing for implausible reported gestational ages using the mixture models together with prediction models for gestational age. The multiple imputation framework allows the reflection of uncertainty in both the assessment of a reported gestational age as incorrect and the prediction of a true gestational age for an incorrectly reported one.

Return to top

Title: Nonresponse Modeling in Repeated Independent Surveys in a Closed Stable Population--Did the Local Election Officials (LEOs) Roar in 2012?

  • Speakers: Tim Markham and Eric Falk, Defense Manpower Data Center, and Fritz Scheuren, NORC
  • Chair: TBA
  • Date/Time: Wednesday, July 17, 12:30 p.m. to 2:00 p.m.
  • Sponsor: Washington Statistical Society Methodology Section
  • Location: New Offices of Mathematica-MPR, 1101 First Street NE, 12th Floor, Washington DC 20002, near L Street, north of Union Station
  • Directions and Remote Viewing: To be placed on the attendance list for webinar and phone viewing, please RSVP to Alyssa Maccarone at amaccarone@mathematica-mpr.com or (202) 250-3570 at least 1 day in advance of the seminar (in-person attendees do not need to RSVP). Provide your name, affiliation, contact information (email is preferred) and the seminar date. Once on the list, you will be provided with information about webinar and phone viewing. For those who choose to attend in person, Mathematica is located at 1100 1st Street, NE, 12th Floor, Washington, DC 20002. If traveling by Metro, take the Red Line to either the New York Ave Station or Union Station. From the New York Ave Station, follow signs to exit at M Street out of the station and walk 1 block west on M Street and 2 blocks south on 1st Street (the building will be on your right). From Union Station, walk north along 1st Street for about 4-5 blocks until you reach L Street (the building will be on your left after crossing L Street). If traveling by car, pay parking is available in the building parking garage, which is located 1 block east of North Capitol on L Street NE. Once in the building, take the elevators to the 12th floor and inform the secretary that you are attending the WSS seminar. Please call Mathematica's main office number (202 484-9220) if you have trouble finding the building.

Abstract:

In the past, models of survey unit nonresponse typically were implicit (for example, Oh and Scheuren 1983). If nonresponse was sizable, then adjustments were made. These generally employed some form of a missing-at-random model (for example, Rubin 1983). One such model (Zhang and Scheuren 2011), explicit this time, is discussed in our presentation.

This presentation is a follow-up to one given in June 2012, and the application is to large repeated independent cross-section samples from a closed population. The units are voting jurisdictions, which are required to report on the details of each national election within a reasonable period afterwards. The survey was conducted in 2008, 2010, and 2012 and was initially administered to the local election official to gather individual jurisdictional information. The response rate for these initial surveys was relatively low. For the 2012 survey, the state election officials in many states were used to gather information at the jurisdiction level. This led to higher response rates in 2012, but the non-ignorable nonresponse must still be addressed. This talk discusses how this problem might be handled statistically going forward; the work is currently at the evaluation stage.

Return to top

Title: Stochastic Gradient Estimation: Tutorial Review, Recent Research

Abstract:

Stochastic gradient estimation techniques are methodologies for deriving computationally efficient estimators used in simulation optimization and sensitivity analysis of complex stochastic systems that require simulation to estimate their performance. Using a simple illustrative example, the three most well-known direct techniques that lead to unbiased estimators are presented: perturbation analysis, the likelihood ratio (score function) method, and weak derivatives (also known as measure-valued differentiation). A few real-world applications are discussed and then some recent research is summarized.
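As a rough illustration of two of the three techniques named above, the sketch below estimates the derivative of E[X] with respect to theta when X is exponential with mean theta, so the true derivative is 1. The choice of distribution and the function name `gradient_estimates` are assumptions made for this example and are not taken from the talk; weak derivatives are not shown.

```python
import random

def gradient_estimates(theta, n, seed=0):
    """Estimate d/d(theta) E[X] for X ~ Exponential(mean=theta); the true value is 1.

    Returns a pair (perturbation-analysis estimate, likelihood-ratio estimate).
    """
    rng = random.Random(seed)
    ipa_total = 0.0
    lr_total = 0.0
    for _ in range(n):
        x = rng.expovariate(1.0 / theta)  # rate = 1/theta, so the mean is theta
        # Perturbation analysis: X = -theta * ln(U), hence dX/dtheta = X / theta.
        ipa_total += x / theta
        # Likelihood ratio: X times the score, d/dtheta ln f(x; theta) = x/theta^2 - 1/theta.
        lr_total += x * (x / theta**2 - 1.0 / theta)
    return ipa_total / n, lr_total / n

if __name__ == "__main__":
    ipa, lr = gradient_estimates(theta=2.0, n=200_000)
    print(f"perturbation analysis: {ipa:.3f}, likelihood ratio: {lr:.3f}")
```

Both estimators are unbiased in this toy case, but the likelihood ratio estimator has the larger variance, which is one reason the choice of technique matters in practice.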

Speaker:

Michael C. Fu is Ralph J. Tyser Professor of Management Science in the Decision, Operations, and Information Technologies department of the Robert H. Smith School of Business, with a joint appointment in the Institute for Systems Research and an affiliate appointment in the Department of Electrical & Computer Engineering (both in the Clark School of Engineering), all at the University of Maryland, College Park. He received degrees in mathematics and EECS from MIT, and his Ph.D. in applied mathematics from Harvard University. His research interests include simulation optimization and applied probability, particularly with applications towards supply chain management and financial engineering. In 2004 he was named a Distinguished Scholar-Teacher at the University of Maryland. He served as Program Chair for the 2011 Winter Simulation Conference. From September 2010 through August 2012, he served as the Operations Research Program Director at the National Science Foundation. He is a Fellow of INFORMS and IEEE.

Return to top

Title: An Overview of an Analytic Approach for Branching Processes

Abstract:

One approach to solving some questions in probability theory—especially questions about asymptotic properties of algorithms and data structures—is to take an analytic approach, i.e., to utilize complex-valued methods of attack. These methods are especially useful with several types of branching processes, leader election algorithms, pattern matching in trees, data compression, etc. This talk will focus on some of the highlights of this approach. I endeavor to keep it at a level that is accessible for graduate students.

Return to top

The 2013 Roger Herriot Award

Title: The 1973 Exact Match Study

  • Speaker: Michael Messner, U.S. Bureau of Labor Statistics
  • Date/Time: September 17, 2013, 12:30 - 2:30 p.m.
    Light refreshments will be served following the talk
    * Video conferencing will not be available.
  • Chair: Fritz Scheuren, NORC at the University of Chicago
  • Some Remembrance:
    TBA, U.S. Census Bureau
    Benjamin Bridges, U.S. Social Security Administration
    Peter Sailer, U.S. Internal Revenue Service (Consultant)
  • Exact Match Study Today:
    Bertram Kestenbaum, U.S. Social Security Administration
  • Location: Bureau of Labor Statistics, Conference Center, Room 2
    To be placed on the seminar attendance list at the Bureau of Labor Statistics you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.

Abstract:

It has been almost 20 years since the untimely death of Roger Herriot. Throughout his career at the National Center for Education Statistics and earlier at the U.S. Census Bureau, Roger demonstrated an enduring commitment to improving federal statistical data and encouraging fresh and original approaches to developing new and better data sources. It was, therefore, most appropriate that his passing should lead to the establishment of the Roger Herriot Award for Innovation in Federal Statistics, which is "intended to recognize individuals or teams who, like Roger, develop unique and innovative approaches to the solution of statistical problems in federal data collection programs."

This year marks the 40th anniversary of a unique project that encapsulates the kind of innovation that Roger Herriot was recognized for. He actually co-led the effort, which continues, even now, to draw on the cooperative spirit and creativity of federal statisticians and economists from multiple agencies. This was the 1973 Current Population Survey (CPS) - Internal Revenue Service (IRS) - Social Security Administration (SSA) Exact Match Study.

The 1973 Exact Match Study was a joint undertaking of the SSA and the Census Bureau that linked survey records for persons in the March 1973 CPS to their respective earnings and benefit information in SSA administrative records and, with full IRS cooperation, to selected items from 1972 IRS individual income tax returns. All three agencies offered staff to work on the undertaking, so it was an interagency effort, not just something done by isolated individuals. Needless to say, careful disclosure safeguards, both physical and legal, were imposed. This first-of-its-kind initiative not only encouraged better cooperation among statistical agencies, but also helped to kick-start major advances in record linkage techniques and important research into the use of administrative data to improve survey methods and output. Research with these and other data sets continues to this day.

In recognition of this seminal interagency work, the 1973 Exact Match Study was selected as this year's winner of the Roger Herriot Award for Innovation in Federal Statistics. To celebrate that event, WSS will hold a special session at the BLS Conference Center. Alumni of the Exact Match Study and subsequent users of the data are especially encouraged to attend. Refreshments will be served. Join us in commemorating this well-deserved honor and re-connect with old friends!

Return to top

Title: Averaged Regression Quantiles

  • Speaker: Professor Jana Jureckova (Dept. of Statistics, Charles University, Prague)
  • Date/Time: Thursday, September 19, 2013, 3:30 pm
  • Location: Room 1313, Math Building, University of Maryland College Park (directions).
  • Sponsor: University of Maryland, Statistics Program (seminar updates).

Abstract:

Please Click here to download the abstract.

Return to top

Title: Complexity Penalization in Sparse and Low Rank Matrix Recovery

  • Speaker: Professor Vladimir Koltchinskii, School of Mathematics, Georgia Institute of Technology
  • Date/Time: Wednesday, September 25, 2013 - 11:00am
  • Location: Room 3206, Math Building, University of Maryland College Park (directions).
  • Sponsor: University of Maryland, Statistics Program (seminar updates).

Abstract:

Please Click here to download the abstract.

Return to top

Title: First-hitting-time Based Threshold Regression Models for Time-to-event Data

Abstract:

Cox regression is well known. It relies, however, on a strong proportional hazards assumption. In many medical contexts, a disease progresses until a failure event (such as death) is triggered when the health level first reaches a failure threshold. I'll present a model for the health process that requires few assumptions and, hence, is quite general in its potential application. Both parametric and distribution-free methods for estimation and prediction will be discussed. The methodology provides medical researchers and biostatisticians with new and robust statistical tools for estimating treatment effects and assessing a survivor's remaining life. Several case examples will be discussed.

This is a joint work with G.A. Whitmore of McGill University.

Return to top

U.S. Census Bureau
DSMD Distinguished Seminar Series

Title: Statistical Inference under Nonignorable Sampling and Nonresponse: An Empirical Likelihood Approach

  • Presenter: Professor Danny Pfeffermann, Hebrew University of Jerusalem, University of Southampton, and Central Bureau of Statistics of Israel
  • Discussant #1: Dr. Michail Sverchkov, Bureau of Labor Statistics
  • Discussant #2: Dr. Daniel Bonnery, University of Maryland and US Census Bureau
  • Date: Tuesday, September 24, 2013
  • Time: 2:00 pm - 3:30 pm
  • Where: Seminar Room 5K410, US Census Bureau, 4600 Silver Hill Road, Suitland, Maryland
  • Contact: Cynthia Wellons-Hazer, 301-763-4277, Cynthia.L.Wellons.Hazer@census.gov

Abstract:

When the sample selection probabilities and/or the response propensities are related to the values of the inference target variable, the distribution of the target variable in the sample may be very different from the distribution in the population from which the sample is taken. Ignoring the sample selection or response mechanism in this case may result in highly biased inference. Accounting for sample selection bias is relatively simple because the sample selection probabilities are usually known, and several approaches have been proposed in the literature to deal with this problem. On the other hand, accounting for a nonignorable response mechanism is much harder since the response probabilities are generally unknown, requiring assuming some structure on the response mechanism.

In this talk, we develop a new approach for modelling complex survey data, which accounts simultaneously for nonignorable sampling and nonresponse. Our approach combines the nonparametric empirical likelihood with a parametric model for the response probabilities, which contains the outcome variable as one of the covariates. The sampling weights also feature in the inference process after appropriate smoothing. We discuss estimation issues and propose simple test statistics for testing the model. Combining the population model with the sample selection probabilities and the model for the response probabilities defines the model holding for the missing data and enables imputing the missing sample data from this model. Simulation results illustrate the performance of the proposed approach.

About Professor Danny Pfeffermann:

Danny Pfeffermann is the Government Statistician and Director of the Central Bureau of Statistics of Israel. He is also Professor of Statistics at the Hebrew University of Jerusalem in Israel and at the University of Southampton in the UK. His main research areas are analytic inference from complex sample surveys, small area estimation, seasonal adjustment and trend estimation and more recently, observational studies and nonresponse. He has more than 60 publications in refereed journals and co-edited the two-volume handbook "Sample Surveys", published by North-Holland. Professor Pfeffermann was the president of the Israel Statistical Society; he is a Fellow of the American Statistical Association and an elected member of the International Statistical Institute, and is now the President of the International Association of Survey Statisticians (IASS). He is the recipient of the 2011 Waksberg award for "outstanding contributions to survey methodology".

Return to top

Title: Enriched Ensemble methods for Classification of High-Dimensional Data

  • Speaker: Dhammika Amaratunga, Ph.D., Senior Director and Janssen Fellow in Nonclinical Biostatistics at Janssen Research & Development
  • Date: Friday, September 27, 2013, from 10-11 am, with Q&A time until 11:30 am
  • Location: Warwick Evans Conference Room, Building D, Georgetown University Medical Campus
  • Directions: http://dbbb.georgetown.edu/mastersprogram/visitors/. Medical Campus map: http://bit.ly/X8OKBN
  • Parking: Metered street parking is available along Reservoir Road. To park on campus, drive into Entrance 1 via Reservoir Road and drive straight back to Leavey Garage. Parking fees are $3.00 per hour.
  • Sponsor: Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University. Part of the Bio3 Seminar Series.

Abstract:

A spate of technological advances has led to an explosion of high-dimensional data. One of the challenges of modern statistics is how to deal with this type of data. We will consider data from biomedical research that are characterized by the fact that they are comprised of a large number of variables measured on relatively few subjects, such as microarray or deep sequencing data. Classification and regression techniques are often used for analyzing this data, both for prediction as well as for identifying combinations of a few key variables associated with response. However, standard methods do not work well in this setting, due to the small sample size and surfeit of variables, a problem sometimes also exacerbated by the presence of non-specific signals. Enriched methods are a way of circumventing these difficulties. We will describe enriched methods, particularly enriched ensemble methods, that work well with this type of data. Real examples will be used to illustrate the methodology.

* Joint work with Javier Cabrera and others

Return to top

Title: Population Dynamics of Species-Rich Ecosystems: The Mixture of Matrix Population Models Approach

  • Speaker: Frédéric Mortier, Département Environnements et Sociétés du CIRAD
  • Date & Time: Friday September 27, 3:15 - 4:15 pm
  • Location: St. Mary's 326, Georgetown University, Washington, DC.
  • Directions: maps.georgetown.edu
  • Sponsor: Department of Mathematics and Statistics, Georgetown University (math.georgetown.edu)

Abstract:

Matrix population models are widely used to predict population dynamics but, when applied to species rich ecosystems with many rare species, the small population sample sizes hinder a good fit of species-specific models. This issue can be overcome by assigning species to groups to increase the size of the calibration data sets. However, the species classification is often disconnected from the models and from the parameter estimation, thus bringing species groups that may not be optimal with respect to the predicted community dynamics. We propose a method that jointly classifies species into groups and fits the matrix models in an integrated way. The model is a special case of mixture with unknown number of components and is cast in a Bayesian framework. An MCMC algorithm is developed to infer the unknown parameters: the number of groups, the group of each species and the dynamics parameters. We apply the method to a data set from a tropical rain forest in French Guiana.

Return to top

Title: Stochastic Optimization Problems with Multivariate Stochastic Constraints

Abstract:

Stochastic orders formalize preferences among random outcomes and are widely used in statistics and economics. We focus on stochastic optimization problems involving stochastic-order relations as constraints that relate performance functionals, depending on our decisions, to benchmark random outcomes. Necessary and sufficient conditions of optimality and duality theory for these problems involve expected utility theory, dual (rank-dependent) utility theory, and coherent measures of risk, providing a link between various approaches for risk-averse optimization. We discuss the relation of univariate and multivariate stochastic orders to utility functions, conditional value at risk, as well as general coherent measures of risk. The main focus of the talk is on risk-averse two-stage optimization problems involving stochastic-order constraints. We describe primal and dual decomposition methods to solve the problems. Numerical results confirm the efficiency of the methods. Some applications will be outlined.

Return to top

Title: Novel Statistical Frameworks for Modeling Functional Brain Connectivity Using fMRI Data

Abstract:

In neuroimaging, brain connectivity generally refers to associations between neural units from distinct brain locations. The brain locations and the connectivity between them comprise the vertices and edges of the brain connectivity network for each subject. In this talk, we focus on statistical modeling strategies to address current brain connectivity analysis challenges, including hierarchical spatial structure, high dimensionality of the vertices, and correlation between edges in networks. We develop multilevel Bayesian frameworks to efficiently shrink the number of covariance parameters between edges using a spatial structure constraint rather than a penalty term, and to identify brain communities. We apply this method to an fMRI study and simulated data sets to demonstrate the properties of our method.

Return to top

Title: Some Old Problems Revisited

  • Speaker: Professor Abram Kagan (Dept. of Mathematics, UMCP)
  • Date/Time: Thursday, October 3, 2013 - 3:30pm
  • Location: Room 1313, Math Building, University of Maryland College Park (directions).
  • Sponsor: University of Maryland, Statistics Program (seminar updates).

Abstract:

In the talk, a few problems in mathematical statistics will be discussed whose common denominator is that they were all treated jointly with Larry Shepp.

Lawrence (Larry) A. Shepp, a first class probabilist and applied statistician, spent most of his career at AT&T Bell Labs. He was elected to the NAS, Institute of Medicine and Acad. of Arts and Sciences in Boston.

At the time of his death on April 23, 2013, as a result of a trauma suffered some three months before, he was the Patrick T. Harter Professor of Statistics at the Wharton School of Business. Larry was a speaker at our faculty colloquium and statistics seminar and a good friend for many years.

The talk is a tribute to his memory.

The problems to be considered are:

  • Testing the null hypothesis a1 = a2 = … = 0 vs the simple alternative ai = ai0, i = 1, 2, … based on X1 + a1, X2 + a2, … where X1, X2, … are iid with a known distribution
  • Maximum correlation between partial sums of iid random variables
  • Symmetrization of random variables
  • The Nile problem by Ronald Fisher
  • A problem in meta-analysis
Return to top

Title: CPS Unemployment Estimates by Rotation Panel and Research Topics

  • Speaker: Michael D. Larsen, PhD; Associate Professor, Department of Statistics
  • Date: Friday, October 4, 2013
  • Time: 11:00AM-noon
  • Room: Duques 652 (2201 G Street, NW, Washington, DC 20052).
  • Directions: Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/explore/visitingcampus/campusmaps.
  • Sponsor: The George Washington University, The Institute for Integrating Statistics in Decision Sciences and the Department of Decision Sciences. See http://business.gwu.edu/decisionsciences/i2sds/seminars.cfm for a list of seminars.

Abstract:

The Current Population Survey (CPS) is a monthly household survey of 72,000 households conducted by the U.S. Census Bureau for the U.S. Bureau of Labor Statistics to measure employment, unemployment, and other characteristics of the civilian non-institutionalized population in the United States. The CPS began in 1940 and provides data for key economic indicators. In this talk, we will study CPS rotation panel bias, investigate whether the estimates of unemployment for different Month-in-Sample (MIS) panels are statistically significantly different, and explore the assumptions underlying the AK composite estimator. We are particularly interested in the comparison of MIS 1 versus MIS 5 because they are based on personal interviews and are key elements in the AK composite estimator. We apply a nonparametric statistical method with dependent permutations across time to real CPS data from 2006-2010. Additional critical statistical research topics for the CPS are described.

Return to top

Title: Combination of Longitudinal Biomarkers in Predicting Binary Events With Application to a Fetal Growth Study

Abstract:

In disease screening, the combination of multiple biomarkers often substantially improves the diagnostic accuracy over a single marker. When one or multiple biomarkers are measured repeatedly over time, the disease diagnosis should take into account their trajectory and correlation structure. We propose a pattern mixture model (PMM) framework to predict a binary disease status from a longitudinal sequence of biomarkers. The marker distribution given the disease status is estimated from a linear mixed effects model. A likelihood ratio statistic is computed as the combination rule, which is optimal if the mixed effects model is correct. The individual disease risk score is then estimated by Bayes' theorem, and we derive the analytical form of the 95% confidence interval. We show that this PMM is an approximation to the shared random effects (SRE) model proposed by Albert (2012). Further, with extensive simulation studies, we found that the PMM is more robust than the SRE under wide classes of models. This new PMM approach for combining biomarkers is motivated by and applied to a fetal growth study, where the interest is in predicting macrosomia using longitudinal ultrasound measurements.
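The Bayes' theorem step mentioned in the abstract can be sketched generically: a likelihood ratio is combined with a prior disease probability to give a posterior risk. The function name `risk_score` and the numbers in the usage line are hypothetical, and the mixed-model likelihood ratio itself is assumed to be computed elsewhere; this is not the authors' implementation.

```python
def risk_score(likelihood_ratio, prevalence):
    """Posterior probability of disease from a likelihood ratio and a prior prevalence.

    likelihood_ratio: f(markers | diseased) / f(markers | not diseased), assumed given.
    prevalence: prior probability of disease, strictly between 0 and 1.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    prior_odds = prevalence / (1.0 - prevalence)
    posterior_odds = prior_odds * likelihood_ratio  # Bayes' theorem in odds form
    return posterior_odds / (1.0 + posterior_odds)

if __name__ == "__main__":
    # Hypothetical inputs: prior risk 0.2, marker evidence four times likelier under disease.
    print(risk_score(likelihood_ratio=4.0, prevalence=0.2))
```

A likelihood ratio of 1 leaves the prior unchanged, which is a quick sanity check on the formula.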

Return to top

Title: Within-Cluster Resampling Methods for Clustered ROC Data

  • Speaker: Dr. Larry Tang (Dept. of Statistics, George Mason University)
  • Date/Time: Thursday, October 10, 2013 - 3:30pm
  • Location: Room 1313, Math Building, University of Maryland College Park (directions).
  • Sponsor: University of Maryland, Statistics Program (seminar updates).

Abstract:

Clustered ROC data are data in which each subject has multiple diseased and nondiseased observations. Within the same subject, observations are naturally correlated, and the cluster sizes may be informative of the subject's disease status. Applying traditional ROC methods to clustered data could result in large bias and lead to incorrect statistical inference. We introduce within-cluster resampling (WCR) methods for clustered ROC data to account for within-cluster correlation and informative cluster sizes. The WCR methods work as follows. First, one observation is randomly selected from each patient, and then the traditional ROC methods are applied to the resampled data to obtain ROC estimates. These steps are performed multiple times, and the average of the resampled ROC estimates is the final estimator. The proposed method does not require a specific within-cluster correlation structure and yields a valid estimator when the cluster sizes are informative. We compare the proposed methods to existing methods in extensive simulation studies.
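The resampling steps described above can be sketched as follows, using the area under the ROC curve as the "traditional ROC method". The data layout (a list of subjects, each a list of `(score, is_diseased)` pairs), the function names, and the rule of skipping resamples that lack either class are assumptions made for this illustration, not details from the talk.

```python
import random

def auc(diseased, healthy):
    """Mann-Whitney estimate of the area under the ROC curve (ties count one half)."""
    wins = sum((d > h) + 0.5 * (d == h) for d in diseased for h in healthy)
    return wins / (len(diseased) * len(healthy))

def wcr_auc(clusters, n_resamples=1000, seed=0):
    """Within-cluster resampling estimate of the AUC.

    clusters: list of subjects; each subject is a non-empty list of
    (score, is_diseased) pairs.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        # Step 1: randomly select one observation from each subject.
        picks = [rng.choice(observations) for observations in clusters]
        diseased = [score for score, is_diseased in picks if is_diseased]
        healthy = [score for score, is_diseased in picks if not is_diseased]
        # Step 2: apply the ordinary estimator to the resampled data.
        # Assumption: resamples missing either class are skipped.
        if diseased and healthy:
            estimates.append(auc(diseased, healthy))
    if not estimates:
        raise ValueError("no resample contained both diseased and nondiseased observations")
    # Step 3: average the resampled estimates.
    return sum(estimates) / len(estimates)

if __name__ == "__main__":
    toy = [
        [(0.9, True), (0.8, True), (0.1, False)],
        [(0.2, False), (0.7, True)],
        [(0.3, False)],
        [(0.95, True)],
    ]
    print(wcr_auc(toy, n_resamples=200))
```

Because each resample contains only one observation per subject, the observations within a resample are independent across subjects, which is why the ordinary estimator can be applied at step 2.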

Return to top

Title: Determining Change-points in Tumor Blood Flow using a Modified Information Criteria to Better Balance Complexity and Fit in a Semi-parametric Model

  • Speaker: Mary E. Putt, Ph.D., Sc.D., Associate Professor of Biostatistics in Biostatistics and Epidemiology at the Hospital of the University of Pennsylvania, Perelman School of Medicine
  • Date: Friday, October 11, 2013, from 10-11 am, with Q&A time until 11:30 am
  • Location: Warwick Evans Conference Room, Building D, Georgetown University Medical Campus
  • Directions: http://dbbb.georgetown.edu/mastersprogram/visitors/. Medical Campus map: http://bit.ly/X8OKBN
  • Parking: Metered street parking is available along Reservoir Road. To park on campus, drive into Entrance 1 via Reservoir Road and drive straight back to Leavey Garage. Parking fees are $3.00 per hour.
  • Sponsor: Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University. Part of the Bio3 Seminar Series.

Abstract:

Our work is motivated by a tumor biology study where blood flow naturally follows a non-linear pattern. An experimental cancer treatment disrupts blood flow; the duration and rate of decline appear to reflect treatment efficacy. With knowledge of the non-linear baseline blood flow, we used a smoothing spline model with unknown change-points to estimate the time of the change in flow, and blood flow at the change-points. We found that the choice of the smoothing parameter strongly influences the estimation of the change-point locations, and the function at the change-points. Choosing the smoothing parameter based on minimizing generalized cross validation, GCV, gave unsatisfactory estimates of the change-points. We propose a new method, aGCV, that re-weights the residual sum of squares and generalized degrees of freedom terms from GCV. The weight is chosen to maximize the decrease in the generalized degrees of freedom as a function of the weight, while simultaneously minimizing aGCV as a function of the smoothing parameter and the change-points. Compared to GCV, simulation studies suggest that the aGCV method yields substantially improved estimates of the change-points, as well as estimation of the function at the change-points. Remaining challenges involved in the development of valid and precise confidence intervals of the function at the change-points, as well as computational challenges, will be discussed.

Return to top

Title: Matern Class of Cross-Covariance Functions for Multivariate Random Fields

Abstract:

Data indexed by spatial coordinates have become ubiquitous in a large number of applications, for instance in environmental, climate and social sciences, hydrology and ecology. Recently, the availability of high resolution microscopy together with advances in imaging technology has increased the importance of spatial data to detect meaningful patterns as well as to make predictions in medical applications (brain imaging) and systems biology (images of fluorescently labeled proteins, lipids, DNA). The defining feature of multivariate spatial data is the availability of several measurements at each spatial location. Such data may exhibit not only correlation between variables at each site but also spatial correlation within each variable and spatial cross-correlation between variables at neighboring sites. Any analysis or modeling must therefore allow for flexible but computationally tractable specifications for the multivariate spatial effects processes. In practice we assume that such processes, probably after some transformation, are not too far from Gaussian and characterized well by the first two moments. The model for the mean follows from the context. However, the challenge is to find a valid specification for cross-covariance matrices, which is estimable and yet flexible enough to incorporate a wide range of correlation structures. Recent literature advocates the use of Matern family for univariate processes. I will introduce a valid parametric family of cross-covariance functions for multivariate spatial random fields where each component has a covariance function from Matern class (Apanasovich et al (2012)). Unlike previous attempts, our model indeed allows for various smoothness and rates of correlation decay for any number of vector components.

The application of the proposed methodologies will be illustrated on the datasets from environmental science, meteorology and systems biology.

Return to top

Title: Estimating Restricted Mean Job Tenures in Semi-Competing Risk Data Compensating Victims of Discrimination

Abstract:

When plaintiffs prevail in a discrimination case, a major component of the calculation of economic loss is the length of time they would have been in the higher position had they been treated fairly during the period in which the employer practiced discrimination. This problem is complicated by the fact that one's eligibility for promotion is subject to termination by retirement and both the promotion and retirement processes may be affected by discriminatory practices. This semi-competing risk setup is decomposed into a retirement process and a promotion process among the employees. Predictions for the purpose of compensation are made by utilizing the expected promotion and retirement probabilities of similarly qualified members of the nondiscriminated group. The restricted mean durations of three periods are estimated--the time an employee would be at the lower position, at the higher level and in retirement. The asymptotic properties of the estimators are presented and examined through simulation studies. The proposed restricted mean job duration estimators are shown to be robust in the presence of an independent frailty term. Data from the reverse discrimination case, Alexander v. Milwaukee, where White-male lieutenants were discriminated against in promotion to captain, are reanalyzed. While the appellate court upheld liability, it reversed the original damage calculations, which heavily depended on the time a plaintiff would have been in each position. The results obtained by the proposed method are compared to those made at the first trial. Substantial differences in both directions are observed.

If time permits, the second part of the talk will showcase progress on my current project on genetic disparities and risk predictions for microvascular complications among Type 1 diabetes patients. Specifically, statistical methods for selecting time-varying SNP effects, with adaptive weights emphasizing maximum effects over time, will be presented.

Return to top

Title: Research at Census, including links to Biostatistics/Informatics

  • Speaker: Thomas A. Louis, Ph.D., Research & Methodology Directorate, U.S. Census Bureau; Professor, Department of Biostatistics, Johns Hopkins
  • Date: Friday, October 25, 2013, from 10-11 am, with Q&A time until 11:30 am
  • Location: Warwick Evans Conference Room, Building D, Georgetown University Medical Campus
  • Directions: http://dbbb.georgetown.edu/mastersprogram/visitors/. Medical Campus map: http://bit.ly/X8OKBN
  • Parking: Metered street parking is available along Reservoir Road. To park on campus, drive into Entrance 1 via Reservoir Road and drive straight back to Leavey Garage. Parking fees are $3.00 per hour.
  • Sponsor: Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University. Part of the Bio3 Seminar Series.

Abstract:

To meet the challenges of efficiently obtaining valid information and making it available to the public, research at the U.S. Census Bureau, and survey research more generally, is burgeoning. Many research goals and methods are similar to those addressed by and used in Biostatistics or Informatics. To set the scene, I briefly describe the Census Research & Methodology directorate, list major issues and approaches, then provide details on a small subset. Candidate topics include adaptive design (dynamic survey modes, R-factors in the National Survey of College Graduates, timing of mailing hard copy based on K-M curves, challenges of learning from experience), stopping rules, randomized experiments (the effect of interviewer training in the National Crime Victimization Survey), record matching, prediction (of response propensity, of occupancy, of the "fitness for use" of administrative records), imputation, Bayesian methods (design-consistent analyses, post-processed {confidence} intervals, benchmarking), small area/spatio-temporal analysis (estimation of poverty rates, estimating omissions in the master address file), development and use of paradata (in the National Health Interview Survey), double-robustness, dynamic data posting ("OnTheMap" Local Origin-Destination Employment Statistics), disclosure avoidance/limitation, Big Data (opportunities and challenges), micro-simulation (benefits of research in designing the 2020 Census), and IT infrastructure (the Multi-mode Operational Control System). I close with a call for increased collaboration among statistical agencies and academe, building on the NSF-Census Bureau Research Network.

Title: Some Statistical Problems in Models for Complex Networks

Abstract:

The last few years have seen an explosion in the amount of data on many real world networks. This has resulted in an interdisciplinary effort in formulating models to understand the data. We explore various theoretical questions arising from such data, including:

1. Reconstruction of routing trees (Network Tomography): In a number of problems that arise from trying to discover the underlying structure of the Internet, it is often impossible to take direct measurements at the routers. We shall describe progress in trying to reconstruct the "Multicast" tree exactly using only "end-to-end" measurements. Using fundamental results from algorithms used to reconstruct Phylogenies, we show that this can be done using very few samples.

2. MCMC simulation of exponential random graphs: Exponential random graphs are one of the most used models in social network theory. The basic intuition is as follows: In social networks we see more triangles, cliques, etc. than we would expect in a random graph, since if A is a friend of B and A is a friend of C then it is quite likely that B and C are friends. One way to model such a phenomenon is to attach, to every graph G, a Hamiltonian given by, say,

H(G) = β#E(G) + γ#T(G)

where #E(G) and #T(G) are the numbers of edges and triangles, respectively, and then to look at the Gibbs distribution induced by this Hamiltonian. Simulating from these models is of paramount interest.

Using the modern day theory of Markov chains we find, in the ferromagnetic setup, exactly when one can simulate from this model efficiently and when it would take exponentially long to simulate from this model.

3. Modeling retweet networks and non-local preferential attachment: A wide array of real world networks exhibit a "superstar" phenomenon, including Twitter event networks, where one vertex contains a finite fraction of the edges of the network. We describe a simple variant of preferential attachment which seems to perform much better on empirical data than the standard model. We will describe mathematical techniques from continuous time branching processes required to rigorously analyze this model.
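
As an illustration of the simulation problem in item 2, the following is a minimal sketch of a single-edge-flip Metropolis chain for the edge-triangle Hamiltonian. It is not the speaker's algorithm. It assumes the Gibbs distribution is P(G) proportional to exp(H(G)) (the abstract does not fix the sign convention), and the function names `sample_ergm`, `count_edges`, and `count_triangles` are hypothetical.

```python
import math
import random


def count_edges(adj):
    """Number of undirected edges in an adjacency-set representation."""
    return sum(len(neighbours) for neighbours in adj) // 2


def count_triangles(adj):
    """Number of triangles: each one is seen once from each of its three edges."""
    total = 0
    for u, neighbours in enumerate(adj):
        for v in neighbours:
            if u < v:
                total += len(adj[u] & adj[v])
    return total // 3


def sample_ergm(n, beta, gamma, steps, seed=0):
    """Run a single-edge-flip Metropolis chain on graphs with n vertices.

    Assumed target: P(G) proportional to exp(beta * #edges + gamma * #triangles).
    Returns the final graph as a list of neighbour sets.
    """
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]  # start from the empty graph
    for _ in range(steps):
        u, v = rng.sample(range(n), 2)  # propose flipping one uniformly chosen pair
        common = len(adj[u] & adj[v])   # triangles gained or lost by the flip
        change = beta + gamma * common  # change in H if the edge is added
        present = v in adj[u]
        delta = -change if present else change
        # Metropolis rule: accept with probability min(1, exp(delta)).
        if delta >= 0 or rng.random() < math.exp(delta):
            if present:
                adj[u].discard(v)
                adj[v].discard(u)
            else:
                adj[u].add(v)
                adj[v].add(u)
    return adj
```

A call such as `sample_ergm(50, -1.0, 0.1, 100000)` returns one state of the chain; how many steps are needed before that state is close to a draw from the target is exactly the mixing-time question the talk addresses.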

Title: Unseasonal Seasonals?

  • Speaker: Jonathan Wright, Johns Hopkins University
  • Discussant: David Findley, Bureau of the Census
  • Date & Time: Wednesday, October 30, 10:00 am-12:00 pm
  • Location: Bureau of Labor Statistics, Conference Center Room 8
    To be placed on the seminar attendance list at the Bureau of Labor Statistics, you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: WSS Methodology Program

Abstract:

In any seasonal adjustment filter, some cyclical variation will be mis-attributed to seasonal factors and vice-versa. The issue is well known, but has resurfaced since the timing of the sharp downturn during the Great Recession appears to have distorted seasonals. In this paper, I find that initially this effect pushed reported seasonally adjusted nonfarm payrolls up in the first half of the year and down in the second half of the year, by a bit more than 100,000 in both cases. But the effect declined in later years and is quite small at the time of writing. In addition, I make a case for using filters that constrain the seasonal factors to vary less over time than the default filters used by US statistical agencies, and also for using filters that are based on estimation of a state-space model. Finally, I report some evidence of predictability in revisions to seasonal factors.

Title: Asymptotic Normality and Optimalities in Estimation of Large Gaussian Graphical Model

  • Speaker: Dr. Tingni Sun (Wharton School of Business, Univ. of Pennsylvania)
  • Date/Time: Thursday, October 31, 2013 - 3:30pm
  • Location: Room 1313, Math Building, University of Maryland College Park (directions).
  • Sponsor: University of Maryland, Statistics Program (seminar updates).

Abstract:

The Gaussian graphical model has a wide range of applications. In this talk, we consider a fundamental question: When is it possible to estimate low-dimensional parameters at the parametric square-root rate in a large Gaussian graphical model? A novel regression approach is proposed to obtain asymptotically efficient estimation of each entry of a precision matrix under a sparseness condition relative to the sample size. The proposed estimator is also applied to test the presence of an edge in the Gaussian graphical model or to recover the support of the entire model. Theoretical properties are studied under a sparsity condition on the precision matrix and a side condition on the range of its spectrum, which significantly relaxes some commonly imposed conditions, e.g., the irrepresentable condition or an $\ell_1$ constraint on the precision matrix.

This is joint work with Zhao Ren, Cun-Hui Zhang and Harrison Zhou.
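
The idea of estimating a precision-matrix entry by regression can be sketched in a low-dimensional setting. For mean-zero Gaussian data, the residual covariance of (X_i, X_j) after regressing both on all other variables is the inverse of the corresponding 2x2 block of the precision matrix. The sketch below uses ordinary least squares, which is an assumption that only makes sense when the sample size is much larger than the dimension; a large-model setting such as the one in the talk would need a regularized regression in its place. The function name `precision_block` is hypothetical.

```python
import numpy as np


def precision_block(X, i, j):
    """Estimate the 2x2 precision-matrix block for variables i and j.

    X is an (n, p) array of mean-zero observations. Both variables are
    regressed on the remaining p - 2 variables by least squares, and the
    inverse of the residual covariance is returned.
    """
    n, p = X.shape
    rest = [k for k in range(p) if k not in (i, j)]
    targets = X[:, [i, j]]
    predictors = X[:, rest]
    coef, *_ = np.linalg.lstsq(predictors, targets, rcond=None)
    residuals = targets - predictors @ coef
    residual_cov = residuals.T @ residuals / n
    return np.linalg.inv(residual_cov)
```

The off-diagonal entry of the returned block estimates the (i, j) entry of the precision matrix, so a value near zero is evidence against an edge between i and j in the graph.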

Title: New Classes of Nonseparable Space-Time Covariance Functions

  • Speaker: Tatiyana Apanasovich, Department of Statistics, George Washington University
  • Date: Friday, November 1, 2013
  • Time: 3:15 pm
  • Location: St. Mary's 326, Georgetown University, Washington, DC.
  • Directions: maps.georgetown.edu
  • Parking: Metered street parking is available along Reservoir Road. To park on campus, drive into Entrance 1 via Reservoir Road and drive straight back to Leavey Garage. Parking fees are $3 per hour.
  • Sponsor: Department of Mathematics and Statistics, Georgetown University (math.georgetown.edu)

Abstract:

Space-time data arise in many different scientific areas, such as environmental sciences, epidemiology, geology, and marine biology, to name but a few. Hence, there is a growing need for statistical models and methods that can deal with such data. In this talk we address the need for parametric covariance models which account not only for spatial and temporal dependencies but also for their interactions. It has been argued by many researchers that separability of the covariance function can be a very unrealistic assumption in many settings. Hence we propose nonseparable space-time covariance structures which have the celebrated Matern family for their spatial margins. Our covariances possess many desirable properties, as we demonstrate. For example, the proposed structures allow for different degrees of smoothness for the process in space and time. Moreover, our covariances are smoother along their axes than at the origin. We also describe a simple modification to our family to address the lack of symmetry.

Title: Marginal Analysis of Measurement Agreement Data Among Multiple Raters With Missing Ratings

Abstract:

In diagnostic medicine, several measurements have been developed to evaluate the agreement among raters when the data are complete. In practice, raters may not be able to give definitive ratings to some participants because symptoms may not be clear-cut. Simply removing participants with missing ratings may produce biased estimates and result in loss of efficiency. In this article, we propose a within-cluster resampling (WCR) procedure and a marginal approach to handle non-ignorable missing data in measurement agreement data. Simulation studies show that both WCR and the marginal approach provide unbiased estimates and have coverage probabilities close to the nominal level. The proposed methods are applied to a data set from the Physician Reliability Study in diagnosing endometriosis.

Title: Measuring the Real Size of the World Economy—Methodology and Challenges

  • Organizer: Frederic A Vogel, Deputy Chair, International Comparison Program (ICP) Technical Advisory Group, World Bank
  • Chair: Grant Cameron, Manager, Development Economics Data Group, World Bank
  • Speaker: Michel Mouyelo-Katoula, Global Manager, (ICP), World Bank
  • Discussant: Alan Heston, Professor Emeritus, Department of Economics, University of Pennsylvania and member of ICP Technical Advisory Group
  • Date & Time: Wednesday, November 6, 2013, 12:30-2:00 PM
  • Location: Bureau of Labor Statistics, Conference Center Room 1
    To be placed on the seminar attendance list at the Bureau of Labor Statistics, you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Presentation material:
    Presentation Slides (pdf, ~3.6mb)

Abstract:

The International Comparison Program (ICP) has become the world's largest international statistical activity that includes over 140 countries and economies plus the 46 additional countries in the Eurostat-OECD program. The ICP is a global statistical initiative that supports inter-country comparisons of Gross Domestic Product and its components using Purchasing Power Parities as a currency converter. The foundation of the ICP is the comparison of national prices of a well defined basket of goods and services under the conceptual framework of the System of National Accounts. While the ICP shares a common language and conceptual framework with national statistical systems for measuring the Consumer Price Index and their national accounts, it faces unique challenges in providing statistical methodology that can be carried out in practice by countries differing in size, culture, the diversity of goods and services available to their population, and statistical capabilities.

The next publication of PPPs and related measures of the real size of the World Economy will be in December 2013. The seminar will provide an overview of the statistical methods used to estimate Purchasing Power Parities, changes made from the previous benchmark survey, and the possible impact on the final results.

Title: Fast Community Detection in Large Sparse Networks

Abstract:

Community detection is one of the fundamental problems in network analysis, with many diverse applications, and a lot of work has been done on models and algorithms that find communities. Perhaps the most commonly used probabilistic model for a network with communities is the stochastic block model, and many algorithms for fitting it have been proposed. Since finding communities involves optimizing over all possible assignments of discrete labels, most existing algorithms do not scale well to large networks, and many fail on sparse networks. In this talk, we propose a pseudo-likelihood approach for fitting the stochastic block model to address these shortcomings. Pseudo-likelihood is a general statistical principle that involves trading off some of the model complexity against computational efficiency. We also derive a variant that allows for arbitrary degree distributions in the network, making it suitable for fitting the more flexible degree-corrected stochastic block model. The pseudo-likelihood algorithm scales easily to networks with millions of nodes, performs well empirically under a range of settings, including on very sparse networks, and is asymptotically consistent under reasonable conditions. If time allows, I will also discuss spectral clustering with perturbations, a new method of independent interest that we use to initialize pseudo-likelihood and that works well on sparse networks where regular spectral clustering fails.

Title: Estimation of Mean Response Via Effective Balancing Score

  • Speaker: Zonghui Hu, Ph.D., National Institutes of Health
  • Date: Friday, November 8, 2013, from 10-11 am, with Q&A time until 11:30 am
  • Location: Warwick Evans Conference Room, Building D, Georgetown University Medical Campus
  • Directions: http://dbbb.georgetown.edu/mastersprogram/visitors/. Medical Campus map: http://bit.ly/X8OKBN
  • Parking: Metered street parking is available along Reservoir Road. To park on campus, drive into Entrance 1 via Reservoir Road and drive straight back to Leavey Garage. Parking fees are $3.00 per hour.
  • Sponsor: Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University. Part of the Bio3 Seminar Series.

Abstract:

We introduce effective balancing scores for estimation of the mean response under MAR (missing at random). Unlike conventional balancing scores, the effective balancing scores are constructed via dimension reduction free of model specification. Three types of effective balancing scores are introduced, carrying the covariate information about the missingness, the response, or both. They lead to consistent estimation with little or no loss in efficiency. Compared to existing estimators, the effective balancing score based estimator relieves the burden of model specification and is the most robust. It is a near-automatic procedure that is most appealing when high dimensional covariates are involved. We investigate both the asymptotic and the numerical properties, and demonstrate the proposed method in a study of HIV disease.

Full list of authors for this paper: Hu Z., Follmann D.A., Wang N.

Title: New Classes of Nonseparable Space-Time Covariance Functions

Abstract:

Statistical methods for the analysis of space-time data are of great interest for many areas of application. Geostatistical approaches to spatiotemporal estimation and prediction heavily rely on appropriate covariance models. In my talk I will first give an overview of techniques to build valid space-time covariances that must satisfy the positive definiteness constraint. Then, I will discuss the specific properties of covariance functions and how they relate to spatial and temporal marginal processes as well as their interaction. The highlighted critical aspects to model building will be used to motivate the proposed family of nonseparable space-time covariance structures which have the celebrated Matern family for their spatial margins. I will also describe a simple modification of the new family to address the lack of symmetry. The application of the proposed methodologies will be illustrated on the datasets from environmental science and meteorology.

Title: Information and Heuristic Creation

Abstract:

Many important and practical problems, such as measuring various properties of networks, are computationally intractable (NP-hard). In order to be able to compute these values, we can use alternate approaches such as using approximations or using heuristics (algorithms that almost always work). We examine how to use information content to reason on and create heuristics for these problems.

Title: Has The Time Come To Give Up Blinding In Randomized Clinical Trials?

Abstract:

Should all trials be double blinded, that is, should treatment allocation be concealed from both the subjects and those administering the treatment? In the late 1980's and early 1990's trialists advocated strongly for double blinding of clinical trials, yet in the past 15 years, we have seen more and more clinical trials that are unblinded. While it is relatively easy to double blind a placebo controlled trial of an orally administered medication, reasons for not blinding include that in some situations it is too difficult (or expensive) to blind, in some situations it may be unethical to blind, and in other situations it is impossible to blind. Complex interventions may make blinding especially difficult. Comparative effectiveness studies also encourage unblinded trials because "blinding is not done in the real world." We give several examples of recent trials which have not been blinded and examine the consequences.

Title: The Remarkable Robustness of Ordinary Least Squares in Randomized Clinical Trials

  • Chair: Dan Liao, WSS Methodology Section Chair
  • Speaker: David R. Judkins, Abt Associates
  • Date & Time: Tuesday, November 19, 12:30pm-2:00 pm
  • Location: Bureau of Labor Statistics, Conference Center Room 8
    To be placed on the seminar attendance list at the Bureau of Labor Statistics, you need to e-mail your name, affiliation, and seminar name to wss_seminar@bls.gov (underscore after 'wss') by noon at least 2 days in advance of the seminar or call 202-691-7524 and leave a message. Bring a photo ID to the seminar. BLS is located at 2 Massachusetts Avenue, NE. Use the Red Line to Union Station.
  • Sponsor: WSS Methodology Program

Abstract:

There has been a series of occasional papers in Statistics in Medicine about robust covariate control in the analysis of clinical trials. The robust semiparametric and nonparametric methods for statistical inference of estimated effects are fairly easy to apply with 21st century computers, but many prefer to continue using t-tests and confidence intervals based on ordinary least squares for outcomes that clearly do not follow normal distributions. Presumably, issues of tradition and communication make it very hard to deflect this inertia. In addition, recent papers have demonstrated that the tests are asymptotically equivalent and the more complex but less parametric procedures make little difference in practice. However, in this journal, there is not sufficient examination of whether these tests and confidence intervals are robust to substantial excess kurtosis, particularly in small sample sizes. This paper indicates through simulation where the boundaries lie for two types of strongly nonnormal outcomes: binary outcomes and compound binary/gamma outcomes. We found that traditional ANCOVA methods work very well down to very small sample sizes for these outcomes.
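
The kind of simulation described above can be sketched for the simplest case: a two-arm trial with a binary outcome, no covariates, and no true treatment effect, analyzed with the equal-variance two-sample t-test (which coincides with OLS on a treatment indicator). This is only an illustrative sketch, not the paper's simulation design; the function name `t_test_rejection_rate` and all parameter values in the usage note are assumptions.

```python
import numpy as np


def t_test_rejection_rate(n_per_arm, p, t_crit, n_sims=20000, seed=0):
    """Monte Carlo type I error of the pooled two-sample t-test.

    Both arms draw Bernoulli(p) outcomes, so the null hypothesis holds.
    t_crit is the two-sided critical value for 2 * n_per_arm - 2 degrees
    of freedom, supplied by the caller.
    """
    rng = np.random.default_rng(seed)
    a = rng.binomial(1, p, size=(n_sims, n_per_arm)).astype(float)
    b = rng.binomial(1, p, size=(n_sims, n_per_arm)).astype(float)
    diff = a.mean(axis=1) - b.mean(axis=1)
    pooled_var = (a.var(axis=1, ddof=1) + b.var(axis=1, ddof=1)) / 2.0
    se = np.sqrt(pooled_var * 2.0 / n_per_arm)
    # If every outcome in a replicate is identical the standard error is zero;
    # such replicates are counted as non-rejections.
    t = np.divide(diff, se, out=np.zeros_like(diff), where=se > 0)
    return float(np.mean(np.abs(t) > t_crit))
```

For example, `t_test_rejection_rate(20, 0.3, 2.024)` uses 20 subjects per arm and the approximate 97.5th percentile of the t distribution with 38 degrees of freedom; a result near 0.05 would indicate that the nominal level is roughly maintained for that configuration.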

Title: Large Panel Test of Factor Pricing Models

Abstract:

We consider testing the high-dimensional multi-factor pricing model, with the number of assets much larger than the length of the time series. Most of the existing tests are based on a quadratic form of estimated alphas. They suffer from low power, however, because the accumulation of errors in estimating high-dimensional parameters overrides the signals of non-vanishing alphas. To resolve this issue, we develop a new class of tests, called "power enhancement" tests. It strengthens the power of existing tests in important sparse alternative hypotheses where market inefficiency is caused by a small portion of stocks with significant alphas. The power enhancement component is asymptotically negligible under the null hypothesis and hence does not much distort the size of the original test. Yet, it becomes large in a specific region of the alternative hypothesis and therefore significantly enhances the power. In particular, we design a screened Wald test that enables us to detect and identify individual stocks with significant alphas. We also develop a feasible Wald statistic using a regularized high-dimensional covariance matrix. By combining those two, our proposed method achieves power enhancement while controlling the size, which is illustrated by extensive simulation studies and empirically applied to the components in the S&P 500 index. Our empirical study shows that market inefficiency is primarily caused by merely a few stocks with significant alphas, most of which are positive, instead of a large portion of slightly mis-priced assets.

Title: On Aggregating Probabilistic Information: The Wisdom of (and Problem with) Crowds

  • Speaker: Victor R. Jose, McDonough School of Business, Georgetown University
  • Date: Friday, November 22nd, 2013
  • Time: 11:00AM-noon
  • Room: Duques 353 (2201 G Street, NW, Washington, DC 20052).
  • Directions: Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/explore/visitingcampus/campusmaps.
  • Sponsor: The George Washington University, The Institute for Integrating Statistics in Decision Sciences and the Department of Decision Sciences. See http://business.gwu.edu/decisionsciences/i2sds/seminars.cfm for a list of seminars.

Abstract:

Research related to the wisdom of crowds has often shown that aggregation of forecasts through linear opinion pools can provide a much better point estimate of unknown quantities than individual experts/forecasters. We examine how well this idea is translated when dealing with probability forecasts. One of the issues that we quickly see in dealing with linear opinion pools of probability forecasts is poor calibration. For example, as the crowd's diversity increases, the aggregate tends toward underconfidence. In this talk, I discuss a simple robust approach to combining individual probability forecasts called trimmed opinion pools that is able to address issues related to underconfidence. We also suggest a novel alternative in the case of overconfidence. Using probability forecast data from the US and European Surveys of Professional Forecasters, we demonstrate empirically that these simple robust approaches to opinion pools can outperform the linear opinion pool.
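
The contrast between a linear and a trimmed opinion pool can be sketched for the simplest case, in which each forecaster reports a single probability for a binary event. This is a simplified illustration rather than the speaker's method, which may trim entire forecast distributions; the function names and the symmetric trimming rule are assumptions.

```python
def linear_pool(probs):
    """Linear opinion pool: the equally weighted average of the forecasts."""
    return sum(probs) / len(probs)


def trimmed_pool(probs, trim_each_side):
    """Average the forecasts after dropping the lowest and highest ones.

    trim_each_side is the number of forecasts removed from each end of the
    sorted list (an assumed, symmetric trimming rule).
    """
    if trim_each_side < 0 or 2 * trim_each_side >= len(probs):
        raise ValueError("trimming would remove every forecast")
    kept = sorted(probs)[trim_each_side:len(probs) - trim_each_side]
    return sum(kept) / len(kept)
```

With forecasts `[0.9, 0.95, 0.85, 0.9, 0.2]`, the linear pool is pulled toward the single dissenting forecast, while `trimmed_pool(..., 1)` averages only the three central values and so stays closer to the majority view.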

Title: Distributional Convergence for the Number of Symbol Comparisons Used by QuickSort

Abstract:

We will begin by reviewing the operation of the sorting algorithm QuickSort. Most previous analyses of QuickSort have used the number of key comparisons as a measure of the cost of executing the algorithm. In contrast, we suppose that the n independent and identically distributed (iid) keys are each represented as a sequence of symbols from a probabilistic source and that QuickSort operates on individual symbols, and we measure the execution cost as the number of symbol comparisons. Assuming only a mild "tameness" condition on the source, we show that there is a limiting distribution for the number of symbol comparisons after normalization: first centering by the mean and then dividing by n. Additionally, under a condition that grows more restrictive as p increases, we have convergence of moments of orders p and smaller. In particular, we have convergence in distribution and convergence of moments of every order whenever the source is memoryless, i.e., whenever each key is generated as an infinite string of iid symbols. This is somewhat surprising: Even for the classical model that each key is an iid string of unbiased ("fair") bits, the mean exhibits periodic fluctuations of order n.

Julius Shiskin Award Seminar

Title: Micro Data Research and Macro Level Understanding: Innovation at U.S. Statistical Agencies

  • Speaker: John Haltiwanger, University of Maryland
  • Time: Monday, Dec 16, 2013, 12:30pm - 2pm
  • Where: Conference Room 4, U.S. Census Bureau, 4600 Silver Hill Road, Suitland, Maryland
  • Contact: To be placed on the seminar attendance list at the Bureau of the Census, e-mail your name, affiliation, citizenship (if other than U.S. Citizen) and seminar name to maria.s.cantwell@census.gov by noon of December 11th or call 301-763-2583 and leave a message. Bring a photo ID (passport, if other than U.S. Citizen) to the seminar. The Census Bureau is located next to the Suitland Green Line Station in

Abstract:

Understanding the U.S. economy and its people at the macro-level requires delving into micro-level data. Micro data research at U.S. statistical agencies has produced innovations that enhance our understanding of U.S. businesses and people. Such research has played a critical role in the development of new data products, discovering innovative methodologies, and assessing the quality and improving existing data products. Successful research programs at the statistical agencies have involved a strong internal research staff as well as active collaboration with external researchers in academia. A critical component of the latter has been programs to facilitate access to the micro data by external researchers. These access programs enable the statistical agencies to harness the creative energy of the U.S. academic community for the benefit of the entire U.S. statistical system. The discussion will focus on critical areas that the statistical agencies should be addressing and the role that micro data research access could play in addressing these challenges.

Speaker Biography:

Professor Haltiwanger joined the faculty at the University of Maryland in 1987 after teaching several years at UCLA and Johns Hopkins University. At Maryland, he was made a Distinguished University Professor in 2010, and was named the first recipient of the Dillard Professorship in Economics in 2013. He began his association with the Census Bureau in 1987 as a Research Associate at the Bureau's Center for Economic Studies (CES), became the Bureau's first Chief Economist in 1996, and headed the CES from 1997 to 1999. He has continued his association with the Bureau as a Research Associate at the CES and as a Senior Research Fellow for the Longitudinal Employer-Household Dynamics (LEHD) program. He will be recognized as one of the two recipients of the 2013 Julius Shiskin Award for his initiatives to educate users and producers of key federal economic statistics.

Methodology