May 2008
Contents:
- Announcement of the 2008 WSS Election
- Annual Dinner
- Save The Date - Book Signing at Reiter's Books
- Seminars, Conferences, Symposia & Call For Papers:
- Education Announcements:
- Master of Science Program in Biostatistics (Georgetown University)
- Students' Corner
- Short Courses (includes JPSM short courses)
- SIGSTAT Topics for Spring 2008
- Employment Opportunities
- Note From The WSS NEWS Editor
- WSS People
Announcement of the 2008 WSS Election
The 2008 WSS election will be held online from Monday, May 5, through Friday, May 30. Instructions on voting will be sent to members the first week of May. Below are the candidates for this year's election. All WSS members are urged to vote once the balloting begins. The results will be announced at the WSS Annual Dinner on June 25.
Candidates For President (select one)
John L. Eltinge, BLS
John L. Eltinge received a Ph.D. from the Department of Statistics at Iowa State University in 1987. From 1988 to 1999, he served as an assistant and associate professor in the Department of Statistics at Texas A&M University, where he taught statistics and survey sampling, and consulted with the BLS, the NCHS and several Texas state agencies. In 1992-1993, he was an ASA/NSF/BLS Research Fellow. From 1999 to 2004, he worked as the Senior Mathematical Statistician in the Office of Survey Methods Research at the BLS. Since 2004 he has served as the Associate Commissioner for Survey Methods Research at the BLS.
His previous professional service includes work as the program chair for the ASA Survey Research Methods Section (2000); the overall program chair for the Joint Statistical Meetings (2003); Associate Editor for Journal of the American Statistical Association (1992-1998); Associate Editor for The American Statistician (2000-2004); Associate Editor for Survey Methodology (1998-present); and Associate Editor for Journal of Official Statistics (2001-present). He is a fellow of the American Statistical Association, and has primary research interests in survey inference, cost structures, optimization, nonresponse and small area estimation.
Marilyn McMillen Seastrom, NCES
Marilyn Seastrom serves as the Chief Statistician and Director of the Statistical Standards Program at the National Center for Education Statistics (NCES) in the US Department of Education. Prior to joining NCES in 1988, she conducted demographic and health analyses for seven years in various components of HHS.
Seastrom received her Ph.D. in Sociology (Demography and Applied Social Statistics) and an MS from the University of Illinois at Urbana-Champaign. In her current position, she led the 2002 revision of the NCES Statistical Standards, which serve as a guide for NCES staff and contractors in conducting surveys and reporting on survey results. She has published both substantive and methodological work using administrative records and sample survey data in government reports and a number of refereed journals.
Seastrom is a Fellow of the American Statistical Association, and has previously served on the Executive Committees of both the Survey Research Methods and Social Statistics Sections of the ASA. She has been a member of the American Statistical Association for 30 years and is also a member of the Washington Statistical Society, where she served as Co-Chair of the Social Arrangements Committee from 1986 to 1987 and as a member of the Herriot Award Committee in 2004. Seastrom has been a member of the OMB Federal Committee on Statistical Methodology since 2003.
Candidates for Methodology Program Chair (select one)
Brian Meekins, BLS
Brian Meekins is currently a research statistician at the Bureau of Labor Statistics. His primary areas of study are latent variable models, measurement error, and telephone survey methodology. Brian received his doctorate from the University of Virginia, where he worked at the Center for Survey Research for a number of years. Recently, Brian served as the SRMS Section Newsletter Editor from 2004 to 2006.
Y. Michael Yang, ICF International
Dr. Y. Michael Yang is Chief Statistician at the Survey Research Center at ICF International, a global consulting services firm headquartered in Fairfax, VA. Dr. Yang's research encompasses survey sample design, post-survey data adjustment, and survey data analysis. He has served as senior statistician on many large scale survey projects, responsible for sample design, estimation, and related methodological research. He joined ICF from the National Opinion Research Center (NORC) at the University of Chicago, where he was Senior Statistician in the Statistics and Methodology Department from 2000 to 2007. He has also worked as Senior Methodologist/Statistician at The Gallup Organization.
Candidates for WSS Representative-at-Large (select two)
Jim Knaub, EIA
Jim Knaub has worked for the Federal Government for 30 years, with a variety of science-oriented job titles, and has been a member of ASA and WSS for 18 years. Qualifications include six years on the Mathematics Advisory Committee for the Arlington Public Schools, including one year as the chair, suggesting and following through with projects, and also substantial work as a referee or editor for statistics journals, including the Journal of Official Statistics (JOS), and InterStat (an online journal at http://interstat.statjournals.net/), with an emphasis on constructive help for authors. He has presented numerous proceedings papers, most (at least 15) for JSM sessions, and has twice chaired at the JSM, presented two WSS Seminars, and has also published articles on InterStat. In June 2006 his book review of Estimation in Surveys with Nonresponse, by Carl-Erik Sarndal and Sixten Lundstrom (Wiley), appeared in JOS (http://www.jos.nu/Articles/abstract.asp?article=222351). Jim has written two entries for the Sage Encyclopedia of Measurement and Statistics, and two other entries for the yet-to-be-released Sage Encyclopedia of Survey Research Methods, including one entry on cutoff sampling. Further, Jim has presented several times to the ASA Energy Committee; is a former member of the ASA Human Rights Committee; and has been a WSS science fair judge and presented for WSS to scouts. At the Electric Power Division within the Energy Information Administration, Jim developed estimation procedures and samples for electric power surveys with emphasis on integrating with the total survey systems, including data quality and error measurement considerations. Over a number of years he has developed an extensive system, and as lead statistician in that area, now organizes the work of others. His first Govt. supervisor called Jim a problem solver and self-starter, and he continues to strive for that. Jim considers himself well rounded due to other interests. 
(Come hear the band: http://www.viennacommunityband.org/concertschedule.php! Jim plays trombone in a concert band.)
Tim Kennel, U.S. Census Bureau
Tim Kennel enjoys learning about statistics, using statistics to improve surveys, and interacting with the wonderful Washington statistical community. Presently, Tim is a second year doctoral student studying survey methodology at the University of Maryland and a Mathematical Statistician at the US Census Bureau. When he is not working or studying, Tim likes cooking, going to museums, and spending time with his delightful wife. For the past four years, Tim has humbly served on the WSS membership committee with joy. He hopes to serve this community in a new way.
Elizabeth H. Margosches, US EPA
Elizabeth Margosches would like the opportunity to be more active with the WSS, and serving as a representative at large seems to her a good way to start and a way her other experiences can best inform her service. Currently Statistician in the Existing Chemicals Assessment Branch, Risk Assessment Division, Office of Pollution Prevention and Toxics (OPPT), Office of Prevention, Pesticides and Toxic Substances (OPPTS), US Environmental Protection Agency (USEPA), Washington, DC, Dr. Margosches has been in the Agency since 1980 in various risk-assessment-related roles and serving on a variety of Agency and external workgroups and committees. Coming to the EPA fresh from her PhD and MPH in Biostatistics at the University of Michigan, she joined the Washington Statistical Society soon after (ASA membership was not a requirement at the time). Then professionally junior, she was encouraged by the WSS's offshoot of a group (the Washington Alliance of Statisticians) to mentor statisticians in the early years of their careers. Eventually, she did join the American Statistical Association (where she has served on the Committee on Women in Statistics since 2003, as chair in 2005) and the Government Statistics Section (this latter about 5 years ago). She also belongs to the International Biometric Society/Eastern North American Region (ENAR; Regional Advisory Board, 2002-2004), the Caucus for Women in Statistics (Representative at Large, 1992-1994; President, 1998 with 1997-2000 term), and the Association for Women in Science/DC Chapter (Treasurer 1988-2003).
Darryl Creel, RTI International
Darryl V. Creel is a Research Statistician at RTI International and has worked there for three and a half years. Prior to working at RTI, he worked at Mathematica Policy Research. In 1997, he received an M.S. in Statistical Science from George Mason University and became a member of the Washington Statistical Society. His general research interest is the survey process. His specific research areas of interest are imputation, nonresponse adjustments, and the analysis of survey data. For a decade, he has enjoyed the benefits of being a WSS member and would like the opportunity to give something back to the statistical community.
Candidate for Treasurer (select one)
Jane Li, Westat
Jane Li has been a sampling statistician at Westat since August 2007. She received her master's degree in Economics from the University of Delaware in December 2002. After that she was admitted to the doctoral program in the Joint Program in Survey Methodology at the University of Maryland and received her Ph.D. in December 2007.
Annual Dinner
Wednesday June 25, 2008, 6:30 PM
Cocktails 6:00 PM
Meiwah Restaurant, 4457 Willard Ave, Chevy Chase, MD
(Metro: Friendship Heights on the Red Line)
Keynote Presentation
Thomas Lumley, University of Washington
Open Source Statistical Software: Why? When? Where?
Made Possible by Funding from RTI International
Cash Bar
Variety of Dishes Including:
Crispy Fried Shredded Beef
Chicken with Black Bean Sauce
Shanghai Bokchoi with Fresh Mushroom
Price
$45.00 per person
Registration
Please register at: https://www.123signup.com/event?id=tmkbh
Or call/email: Yves Thibaudeau, (301) 763-1706, Yves.Thibaudeau@census.gov
Download the flyer (pdf)
Master of Science Program in Biostatistics
Georgetown University
Department of Biostatistics, Bioinformatics and Biomathematics
Are you interested in genetics, bioterrorism, international health, bioinformatics, epidemiology or health policy? Get the analytic tools to meet the demands of the 21st century!
You are invited to apply for the Master of Science Program in Biostatistics with tracks in Bioinformatics and/or Epidemiology for Fall 2008. Application deadline is June 30, 2008.
Graduates with an MS degree in Biostatistics go on to successful and lucrative positions in academic centers, pharmaceutical companies, biotech companies and private consulting firms.
For more information, visit:
http://www1.georgetown.edu/gumc/departments/biostatistics/ or
e-mail Caroline at ctw26@georgetown.edu.
Apply now for Fall 2008 at:
http://grad.georgetown.edu/pages/apply_online.cfm
106th Meeting of the Committee on National Statistics
The National Academies, NAS building, 2100 C St., NW, Washington, DC
Friday, May 9, 2008
Open Session
Public Seminar and Reception
2:00 p.m.  Light refreshments for seminar guests (NAS Great Hall)
Seminar (Auditorium)
2:30 p.m.  Welcome and Introduction: Bill Eddy, CNSTAT and Carnegie Mellon University
2:35 p.m.  Developments at the OMB Statistical and Science Policy Office: Katherine Wallman, Chief Statistician
2:45 p.m.  Featured Topic: 25 Years (and Counting) of Cognitive Survey Research: Accomplishments, Current Work, and Future Opportunities
           Panel: Roger Tourangeau, Gordon Willis, and Fred Conrad
3:45 p.m.  Floor Discussion
4:00 p.m.  Reception (Great Hall)
5:00 p.m.  Adjourn
Abstract: The May 2008 seminar celebrates the 25th anniversary of the NSF-funded "advanced research seminar" organized under CNSTAT. This intense activity brought together survey researchers and cognitive psychologists to brainstorm about how cognitive research insights could enhance the design of survey questionnaires to improve the completeness and accuracy of reporting.
The seminar's report, Cognitive Aspects of Survey Methodology (NRC, 1984), boosted efforts by statistical agencies and other organizations to adopt cognitive questionnaire testing as standard best practice for survey work.
A distinguished panel will offer their observations on cognitive survey research: past, present, and future. Roger Tourangeau, director of the Joint Program in Survey Methodology, University of Maryland at College Park, helped produce the 1984 CASM report; he will review the history and past accomplishments of the field. Gordon Willis, who used cognitive methods extensively at the National Center for Health Statistics and is now Cognitive Psychologist at the National Cancer Institute, will discuss current cognitive survey work. Fred Conrad, research associate professor, Institute for Social Research, University of Michigan, will challenge the field to strengthen its grounding in evidence-based research as it looks to the future.
Special guests will include the chair and members of the original CASM advanced research seminar: Judy Tanur, SUNY Stony Brook (chair), Tom Jabine, CNSTAT consultant, Norman Bradburn, NORC, Monroe Sirken, NCHS retired, and Miron Straf, National Academies.
Note: All venues are handicapped-accessible. There is limited first-come, first-served parking in the visitors' lot on 21st St. The nearest Metro station is Foggy Bottom at 23rd and I Sts., NW (Blue and Orange Lines).
Please RSVP by May 6 to Bridget Edmonds at 202-334-3096 or cnstat@nas.edu.
Save The Date
Book Signing at Reiter's Books
Wendy Alvey and Fritz Scheuren
Elections and Exit Polls
Wednesday, June 11th 12:00 noon-2:00 pm
Reiter's Books
1990 K Street NW
Washington DC
Light refreshments (wine, cheese, and soft drinks) provided.
Sponsored by WSS Human Rights Section and WSS Public Policy Section
Look forward to seeing you there!
Students' Corner
Jill Montaquila, former president of the Washington Statistical Society, recently forwarded the following information to me. The Consortium for the Advancement of Undergraduate Statistics Education (CAUSE) has announced its second biennial Undergraduate Statistics Project Competition (USPROC). The competition is open to undergraduate students globally. Under the guidance of at least one instructor sponsor, working in groups of up to three people, students participating in the competition will present posters showcasing a study in which they have applied statistical methods to real-world data. The top four projects will earn monetary awards of $750 for first place, $500 for second place, $250 for third place and $100 for fourth place.
Students conducting their projects in Spring of 2008 are encouraged to submit their projects as early as May 2008. CAUSE will be accepting submissions for USPROC starting May 5th, 2008, up to January 26th, 2009. Winners will be announced March 7th, 2009. To learn more about the competition, including project scope, the criteria by which projects will be judged, and guidelines for submission, see this document:
http://bist.pbwiki.com/f/Guidelines_USPROC_2009.doc
Or visit the CAUSE website:
http://www.causeweb.org/usproc/
Thanks for this information, Jill!
If you're a student attending a high school or middle school in the Washington, D.C. area, here's an announcement for a competition for you to consider: the Curtis Jacobs Memorial Prize For Outstanding Statistics Project. Students must have a teacher or an advisor who will cover material on statistics during the 2007/2008 academic year, and must submit a five- to ten-page typewritten report of their statistical project. The deadline is May 18, 2008.
See this web page for details:
http://www.scs.gmu.edu/~wss/jacobs.html
If you don't have plans for the summer, consider taking a summer course in statistics. I recently received in the mail an announcement for the ICPSR Summer Program in Quantitative Methods of Social Research. It has a fairly large and varied course offering, ranging from undergraduate to graduate-level and beyond. Most of them are held in Ann Arbor, Michigan, at the University of Michigan. For more details, see this web page: http://www.icpsr.umich.edu/summerprog. I note that by the time you read this, the deadline will unfortunately have passed for most of the offerings. But perhaps you can still take one of the 3- or 5-day workshops.
If you're in the situation where there's a topic that you simply must learn about, then perhaps you might consider taking one of the workshops. For example, maybe you have sunk a lot of time into your thesis when you realize that there is some statistical topic that you need to learn about, but that isn't offered by your department. Perhaps you can take one of the summer workshops, then.
Registration for the Joint Statistical Meetings opens May 1, 2008. See this web page for details:
http://www.amstat.org/meetings/jsm/2008/
This month, I'll present the second part of this two-part tutorial on getting started with C++. Last month, we installed Dev-C++, a full-featured C++ integrated development environment. Then we used Dev-C++ to compile a small "Hello World" program, and to compile Newmat, a C++ implementation of useful linear algebra classes and functions, into a statically linked library. This month, we'll make a small program that demonstrates some of the possibilities with Newmat and with C++ in general. Afterwards, I'll give some suggestions for further projects. I will assume that you have already gotten through the first part of this tutorial.
Let's continue, then.
- Create New C++ Project: Calling a Function, and Matrix Manipulations
Here, we'll compile a simple C++ program that demonstrates how to use the Newmat source code to perform linear algebra computations, using Dev-C++. This program first calls a function that reads data from a plain text file into a Newmat matrix. It then uses matrix transpose and inverse operations to compute a least-squares estimate, and then finally it demonstrates how to compute Singular Value Decomposition using Newmat.
- Repeat Steps III(B) - III(C) from Part I of this tutorial to create a new C++ Console Program project named matrixdemo. Save the project into the matrixdemo folder we created in step II, accepting the default name of matrixdemo.dev.
- As in Step III(D), a small default program appears. Modify it so that it looks like this (if you're reading an electronic copy of this tutorial, copying from this page and pasting into Dev-C++ would be most convenient here):
#include "matrixdemo.h"
int main(int argc, char *argv[])
{
    // Read data matrix into Matrix 'rawdata'
    Matrix rawdata;
    ReadTextFileIntoMatrix(argv[1],rawdata);
    // Copy the first column of 'rawdata' into ColumnVector 'Y'
    // Copy the rest of 'rawdata' into Matrix 'X'
    ColumnVector Y = rawdata.SubMatrix(1,10,1,1);
    Matrix X = rawdata.SubMatrix(1,10,2,4);
    // Compute Bhat = inv(X'*X)*X'*Y.
    // If 'X' is a Newmat matrix, then
    // X.t() is the transpose and X.i() is the inverse
    // (X.t()*X).i() is therefore the inverse of (X'*X)
    Matrix Bhat = (X.t()*X).i() * X.t() * Y;
    cout << "Least Squares Estimates:" << endl;
    cout << "Y = " << endl << Y << endl;
    cout << "X = " << endl << X << endl;
    cout << "Bhat = " << endl << Bhat << endl;
    // Compute Singular Value Decomposition of X
    DiagonalMatrix D;
    Matrix U,V;
    SVD(X,D,U,V);
    cout << "Singular Value Decomposition of X:" << endl;
    cout << "U = " << endl << U << endl;
    cout << "D = " << endl << D << endl;
    cout << "V = " << endl << V << endl;
    system("PAUSE");
    return EXIT_SUCCESS;
}
- As in Step III(E), save the program as a file named main.cpp in the matrixdemo folder.
- Create a new source code file by selecting Project -> New file. This opens up a new tabbed pane with a temporary name like [*] Untitled1. Copy and paste the lines of code shown below into this new tabbed pane, and then click on the Save button to save the content into a file. Save the file in the matrixdemo folder, naming it matrixdemo.h. This header file contains two lines that include two header files from the Newmat source code, which are required if we want to use Newmat. The last three lines are function declarations.
#include <cstdio>
#include <cstdlib>
#include <iostream>
using namespace std;
#define MAX_STR_LEN 1024
#include "newmatio.h"
#include "newmatap.h"
int CountWordsInFile(char *filename);
int CountLinesInFile(char *filename);
void ReadTextFileIntoMatrix(char *filename, Matrix &M);
When you're done, the Dev-C++ window should look like the above picture. Note that there are now two tabs above the editor pane, one labeled main.cpp and one labeled matrixdemo.h. Clicking on either tab allows you to edit the corresponding file.
- Repeat Step VI(D), saving the following lines of code into a new file named CountWordsInFile.cpp. This function's name tells you its function: it counts the words in a file. MAX_STR_LEN is a macro which was defined in matrixdemo.h. The fscanf function, called here with the "%s" format, reads the next whitespace-delimited word from the file into a variable.
#include "matrixdemo.h"
int CountWordsInFile(char *filename) {
    int count = 0;
    char dummy[MAX_STR_LEN];
    FILE *fp = fopen(filename,"r");
    while (fscanf(fp,"%s",dummy) != EOF)
        count++;
    fclose(fp);
    return count;
}
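One thing the listing above does not do is check whether fopen succeeded; if the file name is wrong, fopen returns NULL and the subsequent fscanf call will misbehave. Here is a minimal sketch of a more defensive variant. The name CountWordsChecked and the choice of returning -1 on failure are my own assumptions for illustration, not part of the tutorial's project, and the "%1023s" width is added so that an overlong word cannot overrun the buffer.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical variant of CountWordsInFile (not part of the matrixdemo project).
// Returns the number of whitespace-delimited words, or -1 if the file
// cannot be opened.
int CountWordsChecked(const char *filename) {
    assert(filename != NULL);          // a null file name is a caller error
    FILE *fp = fopen(filename, "r");
    if (fp == NULL)
        return -1;                     // file missing or unreadable
    int count = 0;
    char word[1024];
    // "%1023s" reads one word but never more than 1023 characters,
    // leaving room for the terminating '\0'.
    while (fscanf(fp, "%1023s", word) == 1)
        count++;
    fclose(fp);
    return count;
}
```

A caller could test the return value before going any further, for example printing a message and exiting when it is negative.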
- Repeat Step VI(D), saving the following lines of code into a new file named CountLinesInFile.cpp. This function is very similar to the preceding function, but it counts the lines rather than the words in a file. Compare it to the preceding function; the fgets function reads in a line with a specified maximum length.
#include "matrixdemo.h"
int CountLinesInFile(char *filename) {
    int count = 0;
    char dummy[MAX_STR_LEN];
    FILE *fp = fopen(filename,"r");
    while (fgets(dummy,MAX_STR_LEN,fp)!=NULL)
        count++;
    fclose(fp);
    return count;
}
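A caveat about the fgets approach above: fgets reads at most MAX_STR_LEN-1 characters per call, so a line longer than that would be counted more than once. For our small data file this does not matter, but as a sketch of an alternative, the C++ standard library's std::getline reads a whole line into a std::string regardless of its length. The function name CountLinesWithGetline and the -1 return value for an unreadable file are assumptions made for this illustration.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical alternative to CountLinesInFile (not part of the matrixdemo
// project). Returns the number of lines, or -1 if the file cannot be opened.
int CountLinesWithGetline(const char *filename) {
    assert(filename != 0);             // a null file name is a caller error
    std::ifstream in(filename);
    if (!in)
        return -1;                     // file missing or unreadable
    int count = 0;
    std::string line;
    // std::getline grows 'line' as needed, so there is no fixed buffer
    // and a very long line is still counted exactly once.
    while (std::getline(in, line))
        count++;
    return count;
}
```

A final line that lacks a trailing newline is still counted, which matches what the fgets version does.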
- Repeat Step VI(D), saving the following lines of code into a new file named ReadTextFileIntoMatrix.cpp. The "Matrix &M" in the argument list of this function, specifically the ampersand, indicates that matrix M is being passed by reference. This is an efficient way to pass relatively large objects (such as a matrix) between functions. The atof function converts a character string to a floating point numeric value.
#include "matrixdemo.h"

void ReadTextFileIntoMatrix(char *filename, Matrix &M) {
    // Count number of words and rows in text file.
    int words = CountWordsInFile(filename);
    int rows = CountLinesInFile(filename);
    int columns = words/rows;
    // Read values from text file into Matrix
    M.ReSize(rows,columns);
    char dummy[MAX_STR_LEN];
    FILE *fp = fopen(filename,"r");
    for (int i=1; i<= rows; i++) {
        for (int j=1; j<=columns; j++) {
            fscanf(fp,"%s",dummy);
            M(i,j) = (Real) atof(dummy);
        }
    }
    fclose(fp);
}
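If you'd like to experiment with the pass-by-reference idea without Newmat, here is a minimal sketch under some assumptions of my own: the Table struct, the function names, and the bool return value are all hypothetical stand-ins, not Newmat's API. Instead of computing columns as words/rows (which silently gives a wrong answer when the rows are not all the same length), this version reads the file one line at a time and refuses ragged input.

```cpp
#include <cassert>
#include <cstdlib>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical Newmat-free stand-in for a matrix: row-major values
// plus the dimensions.
struct Table {
    int rows;
    int columns;
    std::vector<double> values;
};

// 'out' is passed by reference (note the ampersand), so the caller's
// Table is filled in place rather than copied.
// Returns false if the file cannot be opened, is empty, or is ragged.
bool ReadTextFileIntoTable(const char *filename, Table &out) {
    std::ifstream in(filename);
    if (!in)
        return false;
    out.rows = 0;
    out.columns = 0;
    out.values.clear();
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream words(line);
        std::string word;
        int n = 0;
        while (words >> word) {
            out.values.push_back(std::atof(word.c_str()));
            n++;
        }
        if (n == 0)
            continue;                  // skip blank lines
        if (out.rows == 0)
            out.columns = n;           // first row fixes the column count
        else if (n != out.columns)
            return false;              // ragged row
        out.rows++;
    }
    return out.rows > 0;
}

// 1-based element access, mirroring the M(i,j) style used above.
double At(const Table &t, int i, int j) {
    assert(i >= 1 && i <= t.rows && j >= 1 && j <= t.columns);
    return t.values[(i - 1) * t.columns + (j - 1)];
}
```

Had the Table been passed by value (without the ampersand), the function would have filled a private copy and the caller's Table would have been left untouched.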
- OK, this one is different. Select File -> New -> Source File. A dialog box labeled Confirm will appear, asking whether to add the new file to the current project. Click on the No button. A new tabbed editor pane will appear, with a label like "Untitled1." Since we have chosen not to make this file part of the matrixdemo project, an entry for it will not appear in the Project pane on the left.
Copy the following lines of data into the new "Untitled" tabbed editor pane, and then save it into a new file in the matrixdemo folder named data.txt. This is sample data that our program will read in. The first column will be the observed data vector, Y, while the 2nd through 4th columns will be the design matrix, X.
-9.1853 1.0000 -9.0000 -2.0000
-5.0942 1.0000 -7.0000 -2.0000
3.1270 1.0000 -5.0000 3.0000
7.9134 1.0000 -3.0000 3.0000
6.6324 1.0000 -1.0000 -2.0000
10.0975 1.0000 1.0000 -2.0000
19.2785 1.0000 3.0000 3.0000
23.5469 1.0000 5.0000 3.0000
22.9575 1.0000 7.0000 -2.0000
26.9649 1.0000 9.0000 -2.0000
Note that you really do not want to make this file part of the project. I tried that, and the compiler attempted to compile this text file as if it were C++ source code, generating an error!
- We need to tell the compiler where to find the Newmat static library, as well as the Newmat header files. Here's how to do this.
- Select Project -> Project Options. This will pop up the Project Options window.
- In the Project Options window, click on the tab labeled Parameters and do the following.
- Click on the button labeled Add Library or Object, in the lower right corner. This pops up a file browser window labeled Open.
- In the Open window, browse to the newmat folder that we created in Step II. Select the file named newmat.a, that we created in step V(E). Then click on the OK button. This dismisses the Open window, and causes an entry for the Newmat library to be entered into the list under Linker, in the Project Options window. This tells the compiler where to find the Newmat library. The Project Options window should now look like this:
- In the Project Options window, click on the Directories tab and do the following.
- Click on the tab labeled Include Directories.
- Click on the button that looks like a folder with content in the lower right corner. This pops up a window labeled Browse for Folder.
- In the Browse for Folder window, browse to and select the folder newmat. We placed the Newmat files in this folder, so it now contains the newmatio.h and newmatap.h header files that are referenced in the include statements in matrixdemo.h. Selecting the newmat folder will turn it blue; this tells the compiler where to find those header files.
- Click on the OK button; this will dismiss the Browse for Folder window, and cause the full path to the newmat folder to appear in the Project Options window.
- In the Project Options window, click on the button labeled Add. This causes an entry for the newmat folder to appear in the list under Include Directories, in the Project Options window. Do not forget to click on the Add button, otherwise your folder selection won't "take"! When you're done, the Project Options window should now look like this:
Note that the newmat folder is now listed in the large white pane.
- In the Project Options window, click on the OK button. This dismisses the Project Options window.
- Select Execute -> Parameters... This opens up a window named Parameters. Type the filename
data.txt
into the entry field labeled Parameters to pass to your program:
Then click on the OK button.
Now that you've done this, when you run the program from within Dev-C++, the name of the data file will be included as the command line argument. It will be as if you had gotten into a DOS window, navigated to the matrixdemo folder, and in there typed in the command
matrixdemo data.txt
i.e., the word "data.txt" will then be passed to the main function of the program, and is referred to within the main function as argv[1] (see Step VI(B) above). If you skip this step, you'll see an error like the following if you attempt to run the program!
- Press the F9 key to Compile and Run the program. If you watch the Status of the compilation closely, you'll notice that there's a distinct Linking step; this is where the main, ReadTextFileIntoMatrix, CountWordsInFile, and CountLinesInFile functions as well as the Newmat library all get "linked" together to build the final stand-alone binary executable file, matrixdemo.exe.
- If you get error messages, diagnostic information will be printed out at the bottom of the Dev-C++ window, pinpointing the location (file and line number) of the error. If there are errors, go back to the tabbed editor pane for the file containing the error, fix the error, and then try a Compile and Run again. Otherwise, if your code is error-free, you should see the following output displayed in a DOS window.
Least Squares Estimates:
Y =
-9.185300
-5.094200
3.127000
7.913400
6.632400
10.097500
19.278500
23.546900
22.957500
26.964900
X =
1.000000 -9.000000 -2.000000
1.000000 -7.000000 -2.000000
1.000000 -5.000000 3.000000
1.000000 -3.000000 3.000000
1.000000 -1.000000 -2.000000
1.000000 1.000000 -2.000000
1.000000 3.000000 3.000000
1.000000 5.000000 3.000000
1.000000 7.000000 -2.000000
1.000000 9.000000 -2.000000
Bhat =
10.623860
2.004162
0.947530
Singular Value Decomposition of X:
U =
-0.495434 0.258199 -0.316228
-0.385337 0.258199 -0.316228
-0.275241 -0.387298 -0.316228
-0.165145 -0.387298 -0.316228
-0.055048 0.258199 -0.316228
0.055048 0.258199 -0.316228
0.165145 -0.387298 -0.316228
0.275241 -0.387298 -0.316228
0.385337 0.258199 -0.316228
0.495434 0.258199 -0.316228
D =
18.165902
7.745967
3.162278
V =
0.000000 0.000000 -1.000000
1.000000 0.000000 0.000000
-0.000000 -1.000000 0.000000
Press any key to continue . . .
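As a sanity check on the Bhat values printed above, here is a Newmat-free sketch that forms X'X and X'Y for the same sample data and solves the resulting 3-by-3 system by Gaussian elimination. This is not how Newmat computes the inverse internally; the function names (Solve, LeastSquares, SampleData) and the Mat typedef are my own inventions for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

typedef std::vector<std::vector<double> > Mat;

// Solve A*b = c by Gaussian elimination with partial pivoting.
// A and c are taken by value because they are modified along the way.
std::vector<double> Solve(Mat A, std::vector<double> c) {
    const int n = static_cast<int>(A.size());
    for (int k = 0; k < n; k++) {
        int p = k;                              // find the pivot row
        for (int r = k + 1; r < n; r++)
            if (std::fabs(A[r][k]) > std::fabs(A[p][k])) p = r;
        assert(std::fabs(A[p][k]) > 1e-12);     // matrix must be nonsingular
        std::swap(A[k], A[p]);
        std::swap(c[k], c[p]);
        for (int r = k + 1; r < n; r++) {       // eliminate below the pivot
            const double f = A[r][k] / A[k][k];
            for (int j = k; j < n; j++) A[r][j] -= f * A[k][j];
            c[r] -= f * c[k];
        }
    }
    std::vector<double> b(n);
    for (int k = n - 1; k >= 0; k--) {          // back substitution
        double s = c[k];
        for (int j = k + 1; j < n; j++) s -= A[k][j] * b[j];
        b[k] = s / A[k][k];
    }
    return b;
}

// Least squares via the normal equations: (X'X) Bhat = X'Y.
std::vector<double> LeastSquares(const Mat &X, const std::vector<double> &Y) {
    assert(!X.empty() && X.size() == Y.size());
    const int n = static_cast<int>(X.size());
    const int p = static_cast<int>(X[0].size());
    Mat XtX(p, std::vector<double>(p, 0.0));
    std::vector<double> XtY(p, 0.0);
    for (int i = 0; i < n; i++)
        for (int a = 0; a < p; a++) {
            XtY[a] += X[i][a] * Y[i];
            for (int b = 0; b < p; b++) XtX[a][b] += X[i][a] * X[i][b];
        }
    return Solve(XtX, XtY);
}

// The ten rows of data.txt: column 1 is Y, columns 2 through 4 are X.
void SampleData(Mat &X, std::vector<double> &Y) {
    static const double raw[10][4] = {
        { -9.1853, 1.0, -9.0, -2.0 }, { -5.0942, 1.0, -7.0, -2.0 },
        {  3.1270, 1.0, -5.0,  3.0 }, {  7.9134, 1.0, -3.0,  3.0 },
        {  6.6324, 1.0, -1.0, -2.0 }, { 10.0975, 1.0,  1.0, -2.0 },
        { 19.2785, 1.0,  3.0,  3.0 }, { 23.5469, 1.0,  5.0,  3.0 },
        { 22.9575, 1.0,  7.0, -2.0 }, { 26.9649, 1.0,  9.0, -2.0 }
    };
    X.assign(10, std::vector<double>(3, 0.0));
    Y.assign(10, 0.0);
    for (int i = 0; i < 10; i++) {
        Y[i] = raw[i][0];
        for (int j = 0; j < 3; j++) X[i][j] = raw[i][j + 1];
    }
}
```

Calling SampleData and then LeastSquares gives coefficients that agree with the Bhat listing to the printed precision. Incidentally, the three columns of X in this sample happen to be mutually orthogonal, so X'X is diagonal with entries 10, 330 and 60; the square roots of those numbers (3.162278, 18.165902 and 7.745967) are the singular values shown in D.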
- Here's another way to run the compiled program. Get into a Windows file browser and navigate to the matrixdemo folder. Click-and-drag the file data.txt onto the file matrixdemo.exe. You should see the same output that you saw in the previous step.
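Either way of running the program hands the file name to main as argv[1]. Our main function uses argv[1] without checking that it exists, which is why the program fails when no parameter is supplied. As a sketch of one way to guard against that, the hypothetical helper below (the name DataFileFromArgs and the wording of the usage message are my own assumptions) returns the file name when one was given and NULL otherwise.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper (not part of the matrixdemo project): returns the
// first command line argument, or NULL after printing a usage message
// when none was supplied.
const char *DataFileFromArgs(int argc, char *argv[]) {
    assert(argv != NULL);
    if (argc < 2) {
        // argv[0] is normally the program name; fall back to a fixed
        // string in the unusual case that argc is 0.
        fprintf(stderr, "usage: %s datafile\n",
                argc > 0 ? argv[0] : "matrixdemo");
        return NULL;
    }
    return argv[1];
}
```

In main, one could call this helper first and return EXIT_FAILURE when it yields NULL, instead of passing argv[1] straight to ReadTextFileIntoMatrix.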
- Finally, here's a taste of Dev-C++'s debugger. I must admit that in my hands, the debugger seems a little buggy; sometimes if I did certain things in a certain order, Dev-C++ seemed to hang. Perhaps it is still a work in progress. But let's give this a try anyway. Go to the tabbed editor pane and click on the ReadTextFileIntoMatrix.cpp tab to get to that function. Position the cursor at the very beginning of the 9th line in the file (the line where the Matrix M is resized), and single left-mouse-click there. In the lower left corner of the Dev-C++ window, the numbers "9: 1" should appear, indicating that the editor's cursor is now positioned at the first character of the 9th line.
- Select Debug -> Run to Cursor; or equivalently, press <Shift-F4>. The DOS window appears as before, but for now contains no text (because we haven't gotten to that part of the program yet). At the same time, the 9th line in ReadTextFileIntoMatrix.cpp is highlighted in blue. This means that the program was executed up to that 9th line, but that execution has now been paused at that line. Notice also that at the same time, the white field on the left has changed; the Debug tab is now on top.
- Move your mouse cursor so that it is positioned over any occurrence of the variable words in the function, and wait for about two seconds. An entry showing the current value of the variable words should appear in the Debug pane on the left. If not, slightly reposition the cursor over the word words and wait two seconds again. Do the same for the rows, columns, dummy, and filename variables. Can you get entries for the variables i and j to appear, too? What about the variable M? (I must admit that I wasn't able to get the debugger to display the actual contents of the Matrix; if you're able to figure this out, please email me and tell me how you did it!) The Dev-C++ window will now look something like this:
- Press the F7 key about twenty times, slowly. Each time you press it, the blue highlight advances one line in ReadTextFileIntoMatrix.cpp, and each line is executed in sequence. At the same time, the displayed values of the variables i, j, and dummy are updated in the Debug tab.
- It would be too boring to have to press the F7 key all the way to the end of the ReadTextFileIntoMatrix function. So, let's skip all the loops to get to the end of the function. Position the cursor at the very beginning of the 18th line in the file (the line where the fclose command appears), and single left-mouse-click there. In the lower left corner of the Dev-C++ window, the numbers "18: 1" should appear. Press <Shift-F4>, so that execution then runs to that line. The fclose command is now highlighted in blue, indicating that execution has run up to that point.
- Press the F7 key twice. Program execution passes back to the main function, so the main.cpp tab is brought to the front of the tabbed editor pane.
- Press the F4 key; this should bring up a window named New Variable Watch. In this window, type
(char*)argv[1]
so that the window looks like this:
Then click the OK button. This adds the first command line argument to the variables being watched in the Debug pane on the left. It should indicate that the current value of argv[1] is "data.txt". (A hexadecimal number is also shown before "data.txt" in the Debug pane; this is probably the memory address.)
- Press the F7 key about fifteen times, slowly, and watch the DOS window as execution proceeds through the rest of the program. Text appears in the DOS window as each cout command is executed.
- When you finally get to the system("PAUSE") command, you'll have to select the DOS window by clicking on it. Then press any key to continue, as before.
- Back in the Dev-C++ window, you may need to press the F7 key one or two more times to complete execution of the program, and to make the DOS window finally disappear.
- Click on the tab labeled Include Directories.
- Click on the button labeled Add Library or Object, in the lower right corner. This pops up a file browser window labeled Open.
This concludes the tutorial. Of course, we've only barely scratched the surface here. Newmat has many more linear algebra operations for you to use. Similarly, Dev-C++ has many features, and we've only seen the basics. As for C++, there are many books and online tutorials for you to try; I suggest one possible book for further study in the References section below.
- Select Project -> Project Options. This will pop up the Project Options window.
- Repeat Steps III(B) - III(C) from Part I of this tutorial to create a new C++ Console Program project named matrixdemo. Save the project into the matrixdemo folder we created in step II, accepting the default name of matrixdemo.dev.
- Suggestions for Further Exploration
- Learn more about Newmat's functionality. I refer you to Dr. Robert Davies' website, where you can find excellent documentation for Newmat. See the References section below.
- Dr. Davies has his own suggestions on how to compile Newmat using Dev-C++; see http://www.robertnz.net/Dev_C.html. Compare his approach to what we did in Steps V and VI. How do the two approaches differ?
- Step VI(G) featured a function ReadTextFileIntoMatrix. Develop a function named WriteMatrixToTextFile that does the reverse operation, i.e., that writes a Newmat matrix into a tab- or space-delimited text file, with one row in the text file for every row in the matrix.
- Write a program that reads a set of p-values from a plain text file into a Newmat ColumnVector. Sort the p-values from smallest to largest, using the built-in Newmat function for sorting the values in a vector. Then compute a threshold based on the False Discovery Rate from the sorted p-values (Benjamini and Hochberg, 1995; Tobias et al., 1999).
- Obtain a copy of the famous Numerical Recipes book. (The 3rd edition, a milestone in scientific computing, has recently been published; its title has been shortened from Numerical Recipes in C.) Using Numerical Recipes code, write software to convert a test statistic (e.g., a t or F statistic) to a p-value, given the degrees of freedom. Can you figure out how to write software to do the reverse operation: given a p-value and degrees of freedom, compute the corresponding critical value of the test statistic?
- Using what you developed in VII(B) above, extend the simple least-squares estimates program we wrote in step VI to compute p-values.
- Take all of the Numerical Recipes functions and build a statically linked library out of them, like we did in Step V above. Now you have two nifty libraries at your disposal, the Newmat library and the Numerical Recipes library!
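As a starting point for the WriteMatrixToTextFile exercise above, here is a minimal sketch. It is only one possible approach, not the tutorial's own code. To keep it compilable without Newmat, it uses a hypothetical stand-in type, DenseMatrix (a vector of row vectors); the delimiter parameter and the "%.10g" output format are also assumptions of mine. With a real Newmat Matrix you would instead loop from 1 to M.Nrows() and from 1 to M.Ncols(), and read each element as M(i,j).

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for a Newmat Matrix, so this sketch needs no Newmat.
typedef std::vector<std::vector<double> > DenseMatrix;

// Writes one line of text per matrix row, with values separated by
// `delimiter` (tab by default). Returns true on success, false if the
// file could not be opened or a write failed.
bool WriteMatrixToTextFile(const DenseMatrix& M, const std::string& filename,
                           char delimiter = '\t')
{
    std::FILE* fp = std::fopen(filename.c_str(), "w");
    if (fp == NULL) return false;

    bool ok = true;
    for (std::size_t i = 0; i < M.size() && ok; ++i) {
        for (std::size_t j = 0; j < M[i].size() && ok; ++j) {
            // Delimiter goes between values, not after the last one.
            if (j > 0) ok = (std::fputc(delimiter, fp) != EOF);
            // %.10g prints up to ten significant digits.
            if (ok) ok = (std::fprintf(fp, "%.10g", M[i][j]) >= 0);
        }
        if (ok) ok = (std::fputc('\n', fp) != EOF);
    }
    // fclose flushes buffered output, so a failure here also counts.
    if (std::fclose(fp) != 0) ok = false;
    return ok;
}
```

Calling WriteMatrixToTextFile(M, "out.txt") would write a tab-delimited file; pass ' ' as the third argument for a space-delimited one.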
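For the False Discovery Rate exercise above, the Benjamini-Hochberg step-up rule can be sketched as follows: sort the m p-values in ascending order, find the largest k such that p(k) <= (k/m)q, and use that p(k) as the threshold. This sketch makes several assumptions of its own. It uses std::vector and std::sort in place of a Newmat ColumnVector and Newmat's sorting function, the function name BenjaminiHochbergThreshold is hypothetical, and returning -1.0 when no p-value qualifies is just one convention.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns the Benjamini-Hochberg p-value threshold at FDR level q:
// the largest sorted p-value p(k) satisfying p(k) <= (k/m) * q.
// Returns -1.0 if no p-value qualifies (nothing is declared significant).
// The vector is passed by value, so the caller's data are not reordered.
double BenjaminiHochbergThreshold(std::vector<double> pvalues, double q)
{
    const std::size_t m = pvalues.size();
    std::sort(pvalues.begin(), pvalues.end());   // ascending order

    double threshold = -1.0;
    for (std::size_t k = 1; k <= m; ++k) {
        const double cutoff = (static_cast<double>(k) / m) * q;
        if (pvalues[k - 1] <= cutoff)
            threshold = pvalues[k - 1];   // keep the largest qualifying p(k)
    }
    return threshold;
}
```

Every p-value less than or equal to the returned threshold is then declared significant. Note that the loop deliberately does not stop at the first p-value that fails its cutoff, because in a step-up procedure a later p(k) may still qualify.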
That's all for this month. If you have any feedback on this column or ideas for future topics, please email me at jmm97@georgetown.edu. As always, your thoughts will be greatly appreciated.
Joe Maisog
Georgetown University / Medical Numerics
(again, with thanks to Lanlan Yin for test-driving this tutorial; any errors remain my own).
References
Benjamini Y and Hochberg Y, Controlling the false discovery rate: A practical and powerful approach to multiple testing. J R Stat Soc Ser B 57: 289 - 300 (1995).
Brown D and Satir G, C++: The Core Language, Cambridge, MA: O'Reilly, Inc., 1995.
Davies R, Newmat website, http://www.robertnz.net/nm11.htm
Dev-C++ website, http://www.bloodshed.net/dev/devcpp.html
Press WH, Teukolsky SA, Vetterling WT, and Flannery BP, Numerical Recipes 3rd Edition: The Art of Scientific Computing, 3rd ed., Cambridge: Cambridge University Press, 2007.
Tobias RD, Rom D, Wolfinger RD, Hochberg Y (authors), Westfall PH (ed.), SAS Institute (corporate author), Multiple Comparisons and Multiple Tests Using the SAS System, Cary, NC: SAS Institute, 1999.
SIGSTAT Topics for Spring 2008
May 21, 2008: Survival Models in SAS: PROC PHREG - Part 2
(http://www.sas.com/apps/pubscat/bookdetails.jsp?pc=55233)
Continuing the series of talks based on the book "Survival Analysis Using the SAS System: A Practical Guide" by Paul Allison begun in October 2007, we'll start Chapter 5: Estimating Cox Regression Models with PROC PHREG.
Topics covered are: Tied data
June 18, 2008: Survival Models in SAS: PROC PHREG - Part 3
(http://www.sas.com/apps/pubscat/bookdetails.jsp?pc=55233)
Continuing the series of talks based on the book "Survival Analysis Using the SAS System: A Practical Guide" by Paul Allison begun in October 2007, we'll continue with Chapter 5: Estimating Cox Regression Models with PROC PHREG.
Topics covered are: Time-Dependent Covariates
SIGSTAT is the Special Interest Group in Statistics for the CPCUG, the Capital PC User Group, and WINFORMS, the Washington Institute for Operations Research Service and Management Science.
All meetings are in Room S3031, 1800 M St, NW from 12:00 to 1:00. Enter the South Tower & take the elevator to the 3rd floor to check in at the guard's desk.
First-time attendees should contact Charlie Hallahan, 202-694-5051, hallahan@ers.usda.gov, and leave their name. Directions to the building & many links of statistical interest can be found at the SIGSTAT website, http://www.cpcug.org/user/sigstat/.
Note From The WSS NEWS Editor
Items for publication in the June issue of the WSS NEWS should be submitted no later than May 15, 2008. E-mail items to Michael Feil at michael.feil@usda.gov.
Click here to see the WSS Board Listing (pdf)