Improving the accuracy of automated occupation coding

(School of Social Sciences)

06 May 2016

3 - 4pm

Venue: Room 104, Fale Pasifika Complex (273-104)

Contact info: Dr Barry Milne

Website: COMPASS

Professor Matthias Schonlau, University of Waterloo

Occupation coding, an important task in official statistics, refers to coding a respondent's text answer into one of many hundreds of occupation codes. To date, occupation coding is still at least partially conducted manually at great expense. We propose two new methods for automatic coding that also apply when only a fixed proportion of the text answers are to be coded automatically. Using data from the German General Social Survey (ALLBUS), we show that both methods improve on both the coding accuracy of the underlying statistical/machine learning algorithm and the coding accuracy of duplicates where duplicates exist. (Co-authors: Hyukjun Gweon, U of W; Lars Kaczmirek, GESIS, Germany; Michael Blohm, GESIS, Germany; Stefan Steiner, U of W.)

Matthias Schonlau is a Professor in the Department of Statistics and Actuarial Science at the University of Waterloo. Prior to his academic career, he spent 14 years at the RAND Corporation (USA), the Max Planck Institute for Human Development in Berlin (Germany), the German Institute for Economic Analysis (DIW), the National Institute of Statistical Sciences (USA), and AT&T Labs Research (USA). He is on sabbatical at the University of Auckland until July 2016. His research interests revolve around survey methodology, with a current emphasis on categorizing open-ended questions using text mining. He is the lead author of the book "Conducting Research Surveys via E-Mail and the Web".
