Title: Mining in the Presence of Class Imbalance: Precision-Recall Curves and the F-Measure
- Speaker: Jacqueline M. Hughes-Oliver
- Date & Time: Friday, April 11, 11-12 pm
- Location: Phillips Hall, Room 110 (801 22nd Street, NW, Washington, DC 20052)
- Directions: Foggy Bottom-GWU Metro Stop on the Orange and Blue Lines. The campus map is at http://www.gwu.edu/explore/visitingcampus/campusmaps.
- Sponsor: The George Washington University, Department of Statistics. See http://departments.columbian.gwu.edu/statistics/academics/seminars for a list of seminars.
Abstract:
Algorithms for anomaly detection and information retrieval are designed to identify and characterize "unusual" subjects. As a result, they are typically applied in situations where class membership is not balanced and may even be highly imbalanced. Assessments of such algorithms have increasingly abandoned overall accuracy and error rates, which cannot distinguish between different types of errors. Even the popular receiver operating characteristic (ROC) curve is being pushed aside because it is independent of class imbalance. The precision-recall (PR) curve is gaining popularity because it assesses an algorithm with respect to both its accuracy (as measured by sensitivity, also known as the true positive rate or recall) and its utility (as measured by the positive predictive value, also known as precision). In this work, we investigate properties of the PR curve and some related summary measures. Discussion is aided by application to real and simulated datasets.
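
The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the talk: the function names (`precision_recall_f1`, `precision_at_prevalence`) and all numbers are hypothetical, and the F-measure shown is the balanced F1 (the harmonic mean of precision and recall), which is only one member of the F-measure family. The second function uses Bayes' rule to show why a fixed ROC operating point (true positive rate, false positive rate) can yield very different precision as the positive class becomes rarer.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts.

    tp, fp, fn are counts of true positives, false positives and
    false negatives. Undefined ratios are reported as 0.0 (a convention
    assumed here for illustration).
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def precision_at_prevalence(tpr, fpr, prevalence):
    """Precision implied by a fixed ROC point at a given class prevalence.

    By Bayes' rule:
        precision = tpr * p / (tpr * p + fpr * (1 - p))
    where p is the proportion of positives in the population.
    """
    true_pos_mass = tpr * prevalence
    false_pos_mass = fpr * (1 - prevalence)
    total = true_pos_mass + false_pos_mass
    return true_pos_mass / total if total > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical counts, chosen only for illustration.
    print(precision_recall_f1(tp=90, fp=10, fn=10))

    # Same ROC operating point, two hypothetical prevalences.
    for p in (0.5, 0.01):
        print(p, precision_at_prevalence(tpr=0.9, fpr=0.1, prevalence=p))
```

With a true positive rate of 0.9 and a false positive rate of 0.1, precision is 0.9 when the classes are balanced but falls to roughly 0.083 when only 1% of subjects are positive, even though the ROC point is unchanged.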