David Jurgens

It's me, David Jurgens!

I research how humans behave by observing the things we say, what we do, and who we are. My research combines linguistic analysis and network science to understand behavior in its natural social context. I collaborate with colleagues from areas such as Psychology, Linguistics, Digital Humanities, and Sociology to improve our theories using data-driven insights and methodologies.

Prospective students: I am not actively recruiting PhD students who would start in Fall 2025. However, I will still look at applications in CSE from those who work on Natural Language Processing. I am most interested in students who have significant experience with NLP methods and a strong interest (or coursework) in social psychology or experience with experiments.

Publications

2024
Not all good Wikipedia articles stay good. Why is that? Read our paper to find out.
A Test of Time: Predicting the Sustainable Success of Online Collaboration in Wikipedia.
Abraham Israeli, David Jurgens, and Daniel Romero.
preprint.
paper  ·  code and data
Optimizing the system and task parts of the prompt can have huge benefits
SPRIG: Improving Large Language Model Performance by System Prompt Optimization.
Lechen Zhang, Tolga Ergen, Lajanugen Logeswaran, Moontae Lee, and David Jurgens.
preprint.
paper  ·  code and data
The prompt matters in how human an LLM can seem
Real or Robotic? Assessing Whether LLMs Accurately Simulate Qualities of Human Responses in Dialogue.
Johnathan Ivey, Shivani Kumar, Jiayu Liu, Hua Shen, Sushrita Rakshit, Rohan Raju, Haotian Zhang, Aparna Ananthasubramaniam, Junghwan Kim, Bowen Yi, Dustin Wright, Abraham Israeli, Anders Giovanni Møller, Lechen Zhang, David Jurgens.
preprint.
paper  ·  code and data
Pathways of linguistic diffusion seen on Twitter
Networks and Identity Drive Geographic Properties of the Diffusion of Linguistic Innovation
Aparna Ananthasubramaniam, David Jurgens, Daniel M. Romero.
npj Complexity. 2024.
pdf
The pipeline for collecting data of traumatic events
The Language of Trauma: Modeling Traumatic Event Descriptions Across Domains with Explainable AI
Miriam Schirmer, Tobias Leemann, Gjergji Kasneci, Jürgen Pfeffer, and David Jurgens.
Findings of EMNLP. 2024.
pdf
Communities respond differently to the same message depending on their underlying values
ValueScope: Unveiling Implicit Norms and Values via Return Potential Model of Social Interactions
Chan Young Park, Shuyue Stella Li, Hayoung Jung, Svitlana Volkova, Tanushree Mitra, David Jurgens, and Yulia Tsvetkov.
Findings of EMNLP. 2024.
paper  ·  code and data
Tables are data too. Maybe they can be text as well!
Tab2Text - A framework for deep learning with tabular data
Tong Lin*, Jason Yan*, David Jurgens, and Sabina Tomkins.
Findings of EMNLP. 2024.
preprint forthcoming
LLMs answer questions more or less accurately depending on the social roles in the question prompt
Is "A Helpful Assistant" the Best Role for Large Language Models? A Systematic Evaluation of Social Roles in System Prompts
Mingqian Zheng, Jiaxin Pei, Lajanugen Logeswaran, Moontae Lee, and David Jurgens.
Findings of EMNLP. 2024.
paper  ·  code and data
Human-AI Alignment is bidirectional
Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions.
Hua Shen, Tiffany Knearem, Reshmi Ghosh, Kenan Alkiek, Kundan Krishna, Yachuan Liu, Ziqiao Ma, Savvas Petridis, Yi-Hao Peng, Li Qiwei, Sushrita Rakshit, Chenglei Si, Yutong Xie, Jeffrey P. Bigham, Frank Bentley, Joyce Chai, Zachary Lipton, Qiaozhu Mei, Rada Mihalcea, Michael Terry, Diyi Yang, Meredith Ringel Morris, Paul Resnick, and David Jurgens.
preprint.
paper
Socially aware language technologies and their connections with linguistics, social sciences, and NLP
The Call for Socially Aware Language Technologies.
Diyi Yang, Dirk Hovy, David Jurgens, and Barbara Plank.
preprint.
paper
A Multilingual Similarity Dataset for News Article Frame.
Xi Chen, Mattia Samory, Scott Hale, David Jurgens, Przemyslaw A Grabowicz.
Proceedings of the International AAAI Conference on Web and Social Media (ICWSM).
paper  ·  data
Large language models are bad at answering psychological questionnaires consistently
You don't need a personality test to know these models are unreliable: Assessing the Reliability of Large Language Models on Psychometric Instruments
Bangzhao Shu*, Lechen Zhang*, Minje Choi, Lavinia Dunagan, Dallas Card, and David Jurgens.
Proceedings of the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics.
paper  ·  code and data
Memes are multimodal constructions where the base image template and additional text fills both have semantic value.
Social Meme-ing: Measuring Linguistic Variation in Memes
Naitian Zhou, David Jurgens, and David Bamman.
Proceedings of the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics.
paper  ·  code and data
The empathetic alignment between an author and responder on Reddit shows most people just give advice.
Modeling Empathetic Alignment in Conversation
Jiamin Yang and David Jurgens.
Proceedings of the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics.
paper  ·  code, models, and data  ·  Jiamin's amazing annotation tool
Strong influence connections in the global news network.
Global News Synchrony During the Start of the COVID-19 Pandemic
Xi Chen, Scott A. Hale, David Jurgens, Mattia Samory, Ethan Zuckerman, Przemyslaw Adam Grabowicz.
Proceedings of the 2024 Web Conference.
paper  ·  code and data
The network model for estimating contextual informativeness.
Finding Educationally Supportive Contexts for Vocabulary Learning with Attention-Based Models
Sungjin Nam, Kevyn Collins-Thompson, David Jurgens and Xin Tong.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING).
paper
Author mentions in science news reveal widespread disparities across name-inferred ethnicities.
Hao Peng, Misha Teplitskiy, David Jurgens.
Journal of Quantitative Social Sciences.
pdf (preprint)
2023
The answers of LLMs align with the perceptions of specific social groups.
Aligning with Whom? Large Language Models Have Gender and Racial Biases in Subjective NLP Tasks
Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens.
preprint.
paper  ·  code and data
Zero-shot LLM performance on social language tasks
Do LLMs Understand Social Knowledge? Evaluating the Sociability of Large Language Models with SocKET Benchmark
Minje Choi,* Jiaxin Pei,* Sagar Kumar, Chang Shu and David Jurgens.
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). 2023.
paper  ·  code and data
Media storms over time with labels
When it Rains, it Pours: Modeling Media Storms and the News Ecosystem
Ben Litterer, David Jurgens, and Dallas Card.
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). 2023.
paper (forthcoming)
Effect sizes of identity disclosure on tweet and retweet-level activities.
Profile Update: The Effects of Identity Disclosure on Network Connections and Language
Minje Choi, Daniel Romero, David Jurgens.
preprint.
paper
The causal DAG for the paper.
RCT Rejection Sampling for Causal Estimation Evaluation
Katherine A. Keith, Sergey Feldman, David Jurgens, Jonathan Bragg, Rohit Bhattacharya.
preprint. 2023.
paper  ·  code and data
The probability that, given an appropriate message for the relationships represented by a row, the message will also be appropriate in another relationship listed in the column. Probabilities are calculated across the entire data.
Your spouse needs professional help: Determining the Contextual Appropriateness of Messages through Modeling Social Relationships
David Jurgens,* Agrima Seth,* Jackson Sargent, Athena Aghighi, and Michael Geraci.
Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL). 2023.
paper  ·  code and data
Relative use of politeness strategies when annotators rewrite emails to be more polite
When Do Annotator Demographics Matter? Measuring The Influence of Annotator Demographics with the POPQUORN Dataset
Jiaxin Pei and David Jurgens.
Proceedings of the 17th Linguistic Annotation Workshop (LAW-XVII) at ACL. 2023.
paper  ·  code and data
The causal-estimated effect of banning on users matching the style of others
Exploring Linguistic Style Matching in Online Communities: The Role of Social Context and Conversation Dynamics
Aparna Ananthasubramaniam, Hong Chen, Jason Yan, Kenan Alkiek, Jiaxin Pei, Agrima Seth, Lavinia Dunagan, Minje Choi, Benjamin Litterer and David Jurgens.
(Best Paper)
Proceedings of the 1st Workshop on Social Influence in Conversations (SICon) at ACL. 2023.
paper  ·  code and data
Overall performance on each language. The box indicates the lower quartile to the upper quartile and the whisker indicates the maximum and the minimum. Outliers are shown as dots. Participants generally achieve better performances on languages in the training set and achieved good performance on Arabic and Dutch. Predicting intimacy in Hindi and Korean remains challenging. Moreover, performances on unseen languages generally have larger variances.
SemEval 2023 Task 9: Multilingual Tweet Intimacy Analysis
Jiaxin Pei, Vítor Silva, Maarten Bos, Yozon Liu, Leonardo Neves, David Jurgens, and Francesco Barbieri.
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval).
paper  ·  data and competition
The effects of personal shocks on people's social media activities
Analyzing the Engagement of Social Relationships During Life Event Shocks in Social Media
Minje Choi, David Jurgens, and Daniel Romero.
Proceedings of the International Conference on Web and Social Media (ICWSM). 2023.
paper  ·  code and data
The influence of multilingual individuals on social connectedness in Europe
Bridging Nations: Quantifying the Role of Multilinguals in Communication on Social Media
Julia Mendelsohn, Sayan Ghosh, David Jurgens, and Ceren Budak.
(Best Methodology Paper)
Proceedings of the International Conference on Web and Social Media (ICWSM). 2023.
paper  ·  code and data
2022
The way the press portrays certain scientific results differs by where those results were described in the paper
Modeling Information Change in Science Communication with Semantically Matched Paraphrases
Dustin Wright, Jiaxin Pei, David Jurgens, and Isabelle Augenstein.
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). 2022.
paper  ·  code and data
Not all empathy papers use empathy in the same way
A Critical Reflection and Forward Perspective on Empathy and Natural Language Processing
Allison Claire Lahnala, Charles Welch, David Jurgens, and Lucie Flek.
Proceedings of the Findings of Empirical Methods in Natural Language Processing (EMNLP Findings). 2022.
paper
Potatoes are delicious
POTATO: The Portable Text Annotation Tool
Jiaxin Pei, Aparna Kamakshi Ananthasubramaniam, Xingyao Wang, Naitian Zhou, Apostolos Dedeloudis, Jackson Sargent and David Jurgens.
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP): Systems Demonstrations. 2022.
paper  ·  code
Citation context sizes
MultiCite: Modeling realistic citations requires moving beyond the single-sentence single-label setting
Anne Lauscher, Brandon Ko, Bailey Kuhl, Sophie Johnson, Arman Cohan, David Jurgens, Kyle Lo.
Proceedings of the 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). 2022.
paper  ·  code and data
The subtle language of exclusion: Identifying the Toxic Speech of Trans-exclusionary Radical Feminists
Christina Lu and David Jurgens.
Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH). 2022.
paper  ·  code and data
Correlations between the ways in which two news articles can be similar.
SemEval-2022 Task 8: Multilingual news article similarity
Xi Chen, Ali Zeynali, Chico Camargo, Fabian Flöck, Devin Gaffney, Przemyslaw Grabowicz, Scott Hale, David Jurgens, and Mattia Samory.
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022). 2022.
paper  ·  data
The effect of curriculum ordering on word similarity tasks
An Attention-Based Model for Predicting Contextual Informativeness and Curriculum Learning Applications
Sungjin Nam, David Jurgens, and Kevyn Collins-Thompson.
in submission. 2022.
pdf
The effects of mentorship
Diversifying the Professoriate
Bas Hofstra, Daniel A. McFarland, Sanne Smith, David Jurgens.
Socius. 2022.
pdf
Similarities in Redditor political affiliations and commenting activity
Classification without (Proper) Representation: Political Heterogeneity in Social Media and Its Implications for Classification and Behavioral Analysis
Kenan Alkiek, Bohan Zhang, and David Jurgens.
ACL Findings. 2022.
pdf  ·  code
Multilingual performance on grapheme to phoneme conversion
ByT5 model for massively multilingual grapheme-to-phoneme conversion
Jian Zhu, Cong Zhang, and David Jurgens.
Interspeech 2022.
pdf · code
Food healthiness ratings
Language in Popular American Culture Constructs the Meaning of Healthy and Unhealthy Eating: Narratives of Craveability, Excitement, and Social Connection in Movies, Television, Social Media, Recipes, and Food Reviews
Bradley P. Turnwald, Margaret A. Perry, David Jurgens, Vinodkumar Prabhakaran, Dan Jurafsky, Hazel R. Markus, Alia J. Crum.
Appetite. 2022.
pdf
Phone-to-audio alignment without text: A Semi-supervised Approach
Jian Zhu, Cong Zhang, and David Jurgens.
Proceedings of the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing.
pdf · code
Work Expectations, Depressive Symptoms, and Passive Suicidal Ideation Among Older Adults: Evidence From the Health and Retirement Study
Briana Mezuk, Linh Dang, David Jurgens, Jacqui Smith.
The Gerontologist 62(10), 1454-1465. 2022.
paper
2021
Latent classes of biased words and their effects on toxicity
Detecting Cross-Geographic Biases in Toxicity Modeling on Social Media
Sayan Ghosh, Dylan Baker, David Jurgens, and Vinodkumar Prabhakaran.
(Best Paper)
Proceedings of the 7th Workshop on Noisy User-generated Text (W-NUT).
pdf
Using Sociolinguistic Variables to Reveal Changing Attitudes Towards Sexuality and Gender.
Sky Wang and David Jurgens.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP).
pdf
Idiosyncratic but not Arbitrary: Learning Idiolects in Online Registers Reveals Distinctive yet Consistent Individual Styles.
Jian Zhu and David Jurgens.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP).
pdf  ·  code and data
Measuring Sentence-Level and Aspect-Level Certainty in Science Communications
Jiaxin Pei and David Jurgens.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP).
pdf  ·  code and data
Detecting Community Sensitive Norm Violations in Online Conversations.
Chan Young Park, Julia Mendelsohn, Karthik Radhakrishnan, Kinjal Jain, Tushar Kanakagiri, David Jurgens and Yulia Tsvetkov.
Proceedings of the Findings of the 2021 Conference on Empirical Methods in Natural Language Processing (Findings of EMNLP).
pdf
An Animated Picture Says at Least a Thousand Words: Selecting Gif-based Replies in Multimodal Dialog.
Xingyao Wang and David Jurgens.
Proceedings of the Findings of the 2021 Conference on Empirical Methods in Natural Language Processing (Findings of EMNLP).
pdf  ·  code and data  ·  Slack gif-bot App
Driving cessation pipeline
A Data Science Approach to Estimating the Frequency of Driving Cessation Associated Suicide in the US: Evidence From the National Violent Death Reporting System
Tomohiro M. Ko, Viktoryia A. Kalesnikava, David Jurgens, and Briana Mezuk.
Frontiers in Public Health.
pdf
Teaching is serious business
Learning PyTorch Through A Neural Dependency Parsing Exercise
David Jurgens.
Proceedings of the Fifth Workshop on Teaching NLP, 2021.
pdf
Teaching is serious business
Learning about Word Vector Representations and Deep Learning through Implementing Word2vec
David Jurgens.
Proceedings of the Fifth Workshop on Teaching NLP, 2021.
pdf
Temporal dynamics of relationships on Twitter
More than meets the tie: Examining the Role of Interpersonal Relationships in Social Networks
Minje Choi, Ceren Budak, Daniel Romero, and David Jurgens.
International Conference on Web and Social Media (ICWSM), 2021.
pdf  ·  code
The structure of two online subreddits, which is predictive of their rate of lexical change
The Structure of Online Social Networks Modulates the Rate of Lexical Change
Jian Zhu and David Jurgens.
Proceedings of the North American Meeting of the Association for Computational Linguistics (NAACL), 2021.
pdf  ·  code
The effects of framing on audience response to immigration tweets
Modeling Framing in Immigration Discourse on Social Media
Julia Mendelsohn, Ceren Budak, and David Jurgens.
Proceedings of the North American Meeting of the Association for Computational Linguistics (NAACL), 2021.
pdf  ·  code
The main architecture for forecasting prosocial behavior
Conversations Gone Alright: Quantifying and Predicting Prosocial Outcomes in Online Conversations
Jiajun Bao*, Junjie Wu*, Yiming Zhang*, Eshwar Chandrasekharan, and David Jurgens.
Proceedings of the Web Conference (WebConf), 2021.
pdf  ·  code
2020
Quantifying Intimacy In Language
Jiaxin Pei and David Jurgens.
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.
pdf  ·  project webpage  ·  code  ·  pip-installable package
Condolence and Empathy in Online Communities
Naitian Zhou and David Jurgens.
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.
pdf  ·  data request form
Still out there: Modeling and Identifying Russian Troll Accounts on Twitter.
(Best Paper Runner-Up)
Jane Im, Eshwar Chandrasekharan, Jackson Sargent, Paige Lighthammer, Taylor Denby, Ankit Bhargava, Libby Hemphill, David Jurgens, Eric Gilbert.
Proceedings of Web Science, 2020.
pdf
Measuring the predictability of life outcomes with a scientific mass collaboration.
Matthew J. Salganik, Ian Lundberg, Alexander T. Kindel, Caitlin E. Ahearn, Khaled Al-Ghoneim, Abdullah Almaatouq, Drew M. Altschul, Jennie E. Brand, Nicole Bohme Carnegie, Ryan James Compton, Debanjan Datta, Thomas Davidson, Anna Filippova, Connor Gilroy, Brian J. Goode, Eaman Jahani, Ridhi Kashyap, Antje Kirchner, Stephen McKay, Allison C. Morgan, Alex “Sandy” Pentland, Kivan Polimis, Louis Raes, Daniel E. Rigobon, Claudia V. Roberts, Diana M. Stanescu, Yoshihiko Suhara, Adaner Usmani, Erik H. Wang, Muna Adem, Abdulla Alhajri, Bedoor AlShebli, Redwane Amin, Ryan B. Amos, Lisa P. Argyle, Livia Baer-Bositis, Moritz Büchi, Bo-Ryehn Chung, William Eggert, Gregory Faletto, Zhilin Fan, Jeremy Freese, Tejomay Gadgil, Josh Gagné, Yue Gao, Andrew Halpern-Manners, Sonia P. Hashim, Sonia A. Hausen, Guanhua He, Kimberly Higuera, Bernie Hogan, Ilana M. Horwitz, Lisa M. Hummel, Naman Jain, Kun Jin, David Jurgens, Patrick C. Kaminski, Areg Karapetyan, E. H. Kim, Ben Leizman, Naijia Liu, Malte Möser, Andrew E. Mack, Mayank Mahajan, Noah Mandell, Helge-Johannes Marahrens, Diana Mercado-Garcia, Viola Mocz, Katariina Mueller-Gastell, Ahmed Musse, Qiankun Niu, William P. Nowak, Hamidreza Omidvar, Andrew Or, Karen Ouyang, Katy M. Pinto, Ethan Porter, Kristin E. Porter, Crystal Qian, Tamkinat Rauf, Anahit Sargsyan, Thomas Schaffner, Landon Schnabel, Bryan Schonfeld, Ben Sender, Jonathan D. Tang, Emma Tsurkov, Austin van Loon, Onur Varol, Xiafei Wang, Zhi Wang, Julia Wang, Flora Wang, Samantha Weissman, Kirstie Whitaker, Maria K Wolters, Wei Lee Woon, James Wu, Catherine Wu, Kengran Yang, Jingwen Yin, Bingyu Zhao, Chenyun Zhu, Jeanne Brooks-Gunn, Barbara E. Engelhardt, Moritz Hardt, Dean Knox, Karen Levy, Arvind Narayanan, Brandon M. Stewart, Duncan J. Watts, and Sara McLanahan.
Proceedings of the National Academy of Sciences. Mar 2020, 201915006; DOI: 10.1073/pnas.1915006117
pdf
2019
Finding Microaggressions in the Wild: A Case for Locating Elusive Phenomena in Social Media Posts
Luke Breitfeller, Emily Ahn, David Jurgens, and Yulia Tsvetkov.
Proceedings of 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019.
pdf
Perceptions of social roles across cultures.
(Nominated for Best Paper)
Meixing Dong, David Jurgens, Carmen Banea and Rada Mihalcea.
Proceedings of Social Informatics (SocInfo), 2019.
pdf
Suicide Among Older Adults Living in or Transitioning to Residential Long-term Care, 2003 to 2015
Briana Mezuk, Tomohiro M. Ko, Viktoryia A. Kalesnikava, and David Jurgens.
JAMA Network Open 2019;2(6):e195627
pdf
Wetin dey with these comments? Modeling Sociolinguistic Factors Affecting Code-switching Behavior in Nigerian Online Discussions
Innocent Ndubuisi-Obi*, Sayan Ghosh*, David Jurgens.
Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2019
pdf
The spectrum of abusive behaviors
A Just and Comprehensive Strategy for Using NLP to Address Online Abuse
David Jurgens, Libby Hemphill and Eshwar Chandrasekharan.
Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2019
pdf
Caste attitudes
Smart, Responsible, and Upper Caste Only: Measuring Caste Attitudes through Large-Scale Analysis of Matrimonial Profiles
(Best Paper Award)
Ashwin Rajadesingan, Ramaswami Mahalingam, David Jurgens.
Proceedings of the AAAI International Conference on Web and Social Media (ICWSM), 2019
pdf  ·  Press: Times of India, Devdiscourse, Science X, Business Standard
Population inference
Demographic Inference and Representative Population Estimates from Multilingual Social Media Data.
Zijian Wang, Scott Hale, David Ifeoluwa Adelani, Przemyslaw Grabowicz, Timo Hartmann, Fabian Flöck and David Jurgens*.
Proceedings of the Web Conference, 2019
*Corresponding senior author
pdf  ·  demo  ·  code  ·  poster (Best Poster Presentation Award)
Group success
Are All Successful Communities Alike? Characterizing and Predicting the Success of Online Communities.
Tiago Cunha, David Jurgens, Chenhao Tan and Daniel Romero.
Proceedings of the Web Conference, 2019
pdf
2018
It's going to be okay: Measuring Access to Support in Online Communities.
Zijian Wang and David Jurgens.
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018
pdf  ·  supplementary  ·  website and data  ·  code
RtGender: A Corpus of Responses to Gender for Studying Gender Bias.
Rob Voigt, David Jurgens, Vinodkumar Prabhakaran, Dan Jurafsky, and Yulia Tsvetkov.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC), 2018
pdf  ·  data
Measuring the Evolution of a Scientific Field through Citation Frames.
David Jurgens, Srijan Kumar, Raine Hoover, Dan McFarland, Dan Jurafsky.
Transactions of the Association for Computational Linguistics (TACL). 2018.
pdf  ·  website and data  ·  code  ·  video
2017
An Analysis of Individuals' Behavior Change in Online Groups.
David Jurgens, James McCorriston, and Derek Ruths.
Proceedings of the 9th International Conference on Social Informatics (SocInfo). 2017.
pdf (preprint)
Writer Profiling Without the Writer's Text.
David Jurgens, Yulia Tsvetkov, and Dan Jurafsky.
Proceedings of the 9th International Conference on Social Informatics (SocInfo). 2017.
pdf (preprint)
Language from Police Body Camera Footage Shows Racial Disparities in Officer Respect.
Rob Voigt, Nicholas P. Camp, Vinod Prabhakaran, William L. Hamilton, Rebecca C. Hetey, Camilla M. Griffiths, David Jurgens, Dan Jurafsky, and Jennifer L. Eberhardt.
Proceedings of the National Academy of Sciences (PNAS). 2017.
pdf
Incorporating Dialectal Variability for Socially Equitable Language Identification.
David Jurgens, Yulia Tsvetkov, Dan Jurafsky.
Proceedings of the Annual Meeting of the Association for Computational Linguistics. 2017.
pdf  ·  code  ·  slides
2016
User Migration in Online Social Networks: A Case Study on Reddit During A Period of Community Unrest.
Edward Newell*, David Jurgens*, Hardik Vala, Jad Sassine, Caitrin Armstrong, Derek Ruths and Haji Mohammad Saleem.
Proceedings of the 10th International AAAI Conference on Web and Social Media (ICWSM). 2016
pdf
Annotating Characters in Literary Corpora: A Scheme, the CHARLES Tool, and an Annotated Novel.
Hardik Vala, Stefan Dimitrov, David Jurgens, Andrew Piper and Derek Ruths.
Proceedings of the 10th edition of the Language Resources and Evaluation Conference (LREC). 2016.
pdf
Semi-supervised Learning with Induced Word Senses for State of the Art Word Sense Disambiguation.
Osman Baskaya and David Jurgens.
Journal of Artificial Intelligence Research (JAIR). 55(1) pp. 1025-1058.
pdf
SemEval-2016 Task 14: Semantic Taxonomy Enrichment.
David Jurgens and Mohammad Taher Pilehvar.
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval). 2016.
pdf  ·  website
2015
Mr. Bennet, his coachman, and the Archbishop walk into a bar but only one of them gets recognized: On The Difficulty of Detecting Characters in Literary Texts.
Hardik Vala, David Jurgens, Andrew Piper, and Derek Ruths.
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). 2015.
pdf  ·  data
Evaluating learning language representations.
J. Karlgren, J. Callin, K. Collins-Thompson, A.C. Gyllensten, A. Ekgren, D. Jurgens, A. Korhonen, F. Olsson, M. Sahlgren, and H. Schütze.
Proceedings of Conference and Labs of Evaluation Forum (CLEF). 2015.
pdf
Reading Between the Lines: Overcoming Data Sparsity for Accurate Classification of Lexical Relationships.
Silvia Necsulescu, Sara Mendes, David Jurgens, Núria Bel, and Roberto Navigli.
Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics (*SEM). 2015.
pdf
Everyone's Invited: A New Paradigm For Evaluation on Non-transferable Datasets.
David Jurgens, Tyler Finethy, Caitrin Armstrong, and Derek Ruths.
Proceedings of the ICWSM Workshop on Standards and Practices in Large-Scale Social Media Research. 2015.
pdf  ·  code  ·  FREESR code  ·  FREESR website  ·  project website
Geolocation Prediction in Twitter Using Social Networks: A Critical Analysis and Review of Current Practice.
David Jurgens, Tyler Finethy, James McCorriston, Yi Tian Xu, and Derek Ruths.
Proceedings of the 9th International AAAI Conference on Web and Social Media (ICWSM). 2015
pdf  ·  poster  ·  code  ·  website
An Analysis of Exercising Behavior in Online Populations.
David Jurgens, James McCorriston, and Derek Ruths.
Proceedings of the 9th International AAAI Conference on Web and Social Media (ICWSM). 2015
pdf  ·  poster  ·  website
Organizations are Users Too: Characterizing and Detecting the Presence of Organizations on Twitter.
James McCorriston, David Jurgens, and Derek Ruths.
Proceedings of the 9th International AAAI Conference on Web and Social Media (ICWSM). 2015
pdf  ·  website  ·  code  ·  data
Cross Level Semantic Similarity: An Evaluation Framework for Universal Measures of Similarity.
David Jurgens, Mohammad Taher Pilehvar, and Roberto Navigli.
Journal of Language Resources and Evaluation. 50(1) pp. 5-30.
pdf (preprint)
Reserating the awesometastic: An automatic extension of the WordNet taxonomy for novel terms.
David Jurgens and Mohammad Taher Pilehvar.
Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics – Human Language Technologies (NAACL-HLT). 2015.
pdf  ·  poster  ·  download  ·  website
2014
It's All Fun and Games until Someone Annotates: Video Games with a Purpose for Linguistic Annotation.
David Jurgens and Roberto Navigli.
Transactions of the Association for Computational Linguistics (TACL) 2014.
pdf  ·  slides: pdf, pptx  ·  games!
Geotagging One Hundred Million Twitter Accounts with Total Variation Minimization.
Ryan Compton, David Jurgens, and David Allan.
Proceedings of the IEEE International Conference on Big Data. 2014.
pdf
Press: Forbes, MIT Technology Review, Business Insider, Daily Caller, Schneier on Security
Twitter users #CodeSwitch hashtags! #MoltoImportante #wow #헐.
David Jurgens, Stefan Dimitrov, and Derek Ruths.
Proceedings of The First Workshop on Computational Approaches to Code Switching. 2014.
pdf  ·  blog post
SemEval-2014 Task 3: Cross-Level Semantic Similarity.
David Jurgens, Mohammad Taher Pilehvar, and Roberto Navigli.
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval) 2014.
pdf  ·  slides  ·  website
Validating and Extending Semantic Knowledge Bases using Video Games with a Purpose.
Daniele Vannella, David Jurgens, Daniele Scarfini, Domenico Toscani, and Roberto Navigli.
Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL) 2014.
pdf  ·  poster  ·  games!
An analysis of ambiguity in word sense annotations.
David Jurgens.
Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC) 2014.
pdf
2013
Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity.
(Best paper nominee)
Mohammad T. Pilehvar, David Jurgens, and Roberto Navigli.
Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL) 2013.
pdf  ·  slides  ·  code
That's what friends are for: Inferring location in online communities based on social relationships.
David Jurgens.
Proceedings of the 7th International AAAI Conference on Weblogs and Social Media (ICWSM) 2013.
pdf  ·  slides  ·  video  ·  Press: Follow the Crowd, MIT Technology Review
Embracing Ambiguity: A Comparison of Annotation Methodologies for Crowdsourcing Word Sense Labels.
David Jurgens.
Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL) 2013.
pdf  ·  poster
Characterizing Online Discussions in Microblogs Using Network Analysis.
Veronika Strnadova, David Jurgens, and Tsai-Ching Lu.
Proceedings of the AAAI Spring Symposium on Analyzing Microtext, 2013.
pdf
SemEval-2013 Task 13: Word Sense Induction for Graded and Non-Graded Senses.
David Jurgens and Ioannis Klapaftis.
Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval) 2013.
pdf  ·  errata  ·  slides  ·  website
SemEval-2013 Task 12: Multilingual Word Sense Disambiguation.
Roberto Navigli, David Jurgens, and Daniele Vannella.
Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval) 2013.
pdf  ·  website
2012
Temporal Motifs Reveal the Dynamics of Editor Interactions in Wikipedia.
David Jurgens and Tsai-Ching Lu.
Proceedings of the 6th International AAAI Conference on Weblogs and Social Media (ICWSM) 2012.
pdf  ·  video
SemEval-2012 Task 2: Measuring Degrees of Relational Similarity.
David Jurgens, Saif M Mohammad, Peter D Turney, and Keith J Holyoak.
Proceedings of the 6th International Workshop on Semantic Evaluation (SemEval), 2012.
pdf  ·  slides
An Evaluation of Graded Sense Disambiguation using Word Sense Induction.
David Jurgens.
Proceedings of the First Joint Conference on Lexical and Computational Semantics (*SEM), 2012.
pdf  ·  slides
Friends, Enemies, and Lovers: Detecting Communities in Networks Where Relationships Matter.
David Jurgens and Tsai-Ching Lu.
Proceedings of Web Science, 2012.
pdf
2011
Word sense induction by community detection.
David Jurgens.
Proceedings of the Workshop on Graph-based Methods for Natural Language Processing (TextGraphs), 2011.
pdf
Measuring the impact of sense similarity on word sense induction.
David Jurgens and Keith Stevens.
Proceedings of the First Workshop on Unsupervised Learning in NLP, 2011.
pdf
2010
The S-Space Package: An Open Source Package for Word Space Models.
David Jurgens and Keith Stevens.
Proceedings of the ACL 2010 System Demonstrations, 2010.
pdf  ·  website  ·  Mailing Lists: Users, Developers
Capturing nonlinear structure in word spaces through dimensionality reduction.
David Jurgens and Keith Stevens.
Proceedings of the ACL Workshop on GEometrical Models of Natural Language (GEMS), 2010.
pdf
HERMIT: Flexible clustering for the SemEval-2 WSI task.
David Jurgens and Keith Stevens.
Proceedings of the 5th International Workshop on Semantic Evaluation (SemEval), 2010.
pdf
2009
Event detection in blogs using temporal random indexing.
David Jurgens and Keith Stevens.
Proceedings of the Workshop on Events in Emerging Text Types, 2009.
pdf
2004
Road extraction from motion cues in aerial video.
Robert Pless and David Jurgens.
Proceedings of the 12th annual ACM international workshop on Geographic information systems, 2004.
pdf

Biographical Sketch

David Jurgens is an assistant professor in the School of Information at the University of Michigan. He holds a PhD from the University of California, Los Angeles and was a postdoctoral scholar in the Department of Computer Science at Stanford University and, before that, at McGill University. His research combines natural language processing, network science and data science to discover, explain and predict human behavior in large social systems. His research has been published in top computational social science and natural language processing venues including PNAS, WWW, ACL, ICWSM, EMNLP, and others. His work has won the Cozzarelli Prize from the National Academy of Sciences, the Cialdini Prize from the Society for Personality and Social Psychology, best paper at ICWSM and W-NUT, best paper nomination at ACL and Web Science, and has been featured in news outlets such as the BBC, Time, MIT Technology Review, New Scientist, and Forbes.

How he got there: Before joining UMSI, David was a postdoctoral scholar, jointly in the Stanford NLP and SNAP Groups under Dan Jurafsky, Jure Leskovec and Dan McFarland. Prior, he ventured beyond the wall to the cold regions of Montreal (don't let the idyllic summers fool you) and was a postdoctoral scholar at McGill University in the Network Dynamics group with Derek Ruths. Before finishing his PhD, he was a research scientist at the Linguistics Computing Laboratory at Sapienza University of Rome under Roberto Navigli. During his PhD, he was concurrently a visiting researcher at the Information and Systems Science Lab at HRL Laboratories. After trips abroad and to Malibu, he received his PhD in Computer Science from the University of California, Los Angeles under Michael Dyer. Early in his career, before he discovered you could study language and people, he received his BA in Philosophy and Political Science and an MS in Computer Science on Computer Vision under Robert Pless from Washington University in St. Louis.

Teaching

Fall 2017, 2018: SI 671 -- Data Mining: Methods and Applications

Winter 2017—present: SI 630 -- Natural Language Processing: Algorithms and People

Fall 2020—present: SI 650 / EECS 549 -- Information Retrieval

Winter 2023: SI 330 -- Data Manipulation

Fall 2019: SI 710 -- PhD Seminar: Computational Sociolinguistics
A new course! On computational sociolinguistics! With actual computation and actual sociolinguistics! If any of that excites you, drop me an email and I can share more details.

Activities

Being faculty involves too many things to reasonably keep track of, so this page is just left out of date in favor of staying sane. For folks who are still very interested, please see my CV for a more up-to-date list.

Current

  • Co-editor of a Frontiers special issue on Computational Sociolinguistics. The journal has a rolling deadline so feel free to submit here, or just read the papers as they appear.
  • Area chair for Social Media for ACL and EMNLP
  • Sponsorship chair for ICWSM 2019.
  • Senior PC for WWW 2020 (Web & Society)
  • Co-chair of the International Workshop on NLP and Computational Social Science at ACL-2017 with Dirk Hovy, David Bamman, Oren Tsur, and Svitlana Volkova.

Past

Research

Broadly, I conduct research in the areas of natural language processing, computational social science, and data science to discover, explain and predict human behavior in large social systems. My current research focuses on four central themes spanning these fields:

  • Who are the people communicating? Humans create nearly all of the text we see, yet often who is communicating is overlooked. I design new methods for identifying the demographics of communicators in order to understand large social systems.
  • How are people related? There are certain things you would tell your friends but not your mother. Yet, when we study people's communication patterns, we often focus on the message rather than the relationship between them. I work on learning to recognize how people are related to one another and what effect this has on their communication and behavior.
  • How does our online behavior affect our offline behavior? As our lives are increasingly lived online, what effect does this have on our offline behavior? I study how the experiences we have online --both good and bad-- affect the actions we take, such as exercising, eating, and traveling.
  • How can we make online behavior more civil? Anyone who has scrolled through a few YouTube comments or read a newspaper article's comment section knows how quickly conversation can devolve into incivility. I research what effect this has on people and how we can not only detect such behavior but also mitigate it by improving empathy.

My long-term research goal combines human and language technologies to create social understanding that reflects both the content and people involved in communication. In all my research, I strive to improve social equality by representing all people participating in these social systems.

Prospective Students

PhD students

I admit roughly one PhD student per year. Sometimes students are co-admitted or co-advised, so the number of admits can vary.

For prospective PhDs, I especially like students who come with a strong computational background and some experience in social science. There are no set criteria, but your chances of admission are much better if you (or your advisor) have contacted me and let me know of your interests and goals. Make sure to look over these pages carefully; the match should be pretty strong. A PhD student is very costly - in time and money - and I select students for my research group carefully.

If you're a current PhD student outside of CSE or SI, I'm open to collaborations. One of the best things about SI and CSE is the interdisciplinary environment here and I'm potentially open to hosting students outside my home departments (but inside UM) in the lab or co-advising on projects where it makes sense. Regardless, I'd love to hear from you and you're always welcome to come take my classes.

Masters

For current masters students, I typically ask that you take one of my classes and do well in it before starting on research. Often the research you do in one of my graduate courses can continue on through my lab, if you're interested. If you've already taken classes before that are similar, send me an email and we can discuss possibilities.

Due to the technical work that we do in the lab, I typically require students to have taken some graduate-level class on NLP (e.g., EECS 595 or SI 630) or advanced Machine Learning to give them the requisite skills. Without those classes, we end up teaching you many of the same techniques in a less principled way, which takes more time. If you took a class like this as an undergraduate and think it was sufficient, or if you already have significant research experience (just not in NLP or ML), please feel free to reach out and describe your background.

For prospective masters students, I do not have any role in MSI admissions but please feel free to drop me an email once you've been accepted. Showing interest in research prior to starting is usually a good sign and gives you more opportunities during your masters program to see if research is something you might want to consider.

Undergraduates

I enjoy working with undergraduates and generally have a few students working on various projects in my lab. I do look for some serious work and effort associated with your research project, but in turn, if you want a recommendation letter later, it will reflect that effort. Typically, the biggest issue I face is students who over-estimate their time availability and disappear as coursework starts to pile up. Good research takes time and consistent effort; bugs will happen, experiments will fail, and programs will take time to run, so being able to dedicate a few hours every day (no joke!) to research is critical. If you can't see yourself doing that, it's best to reach out during a semester you think you will have more time.

I have the easiest time working with technically-oriented students, who already have good programming skills. You will typically be paired with a PhD student who can serve as an extra mentor and answer your questions. For some exceptionally strong students, we can try to get you a project all to yourself, but this usually requires you having done research prior. Occasionally, for bigger projects, I might pair you with another undergrad so you get to be your own research team.

As an undergrad, you will be expected to join our group's research meetings. During the school year, we'll typically have one meeting a week that is also with the project's co-supervisor (a PhD student or postdoc). During the summer, I typically also try to meet undergrads twice per week for short meetings so I can provide more rapid feedback and keep the ball rolling; it's a lot easier to course correct every few days than once per week.

Due to the technical work that we do in the lab, I typically require students to have taken some upper-division class on NLP or Machine Learning to give them the requisite skills. Without those classes, we end up teaching you many of the same techniques in a less principled way, which takes more time. That said, if you think you have the skills without the classes, please feel free to reach out and describe your background.

Non-PhD External Students not at the University of Michigan

I unfortunately rarely work with undergraduate and masters students who are not physically at the University of Michigan. I still get emails from external students asking if we could work together on something remotely and I really would love to, but my priority is to advise the current students at UM given the limited bandwidth I have for advising. Your best bet to work with me is to get admitted to one of our programs and then drop me an email.

However, I do on occasion have self-funded visiting students for the summer and (rarely) during the academic year. In these cases, usually the student has some prior research experience, similar research interests, and (due to luck) there is a current project going on in my lab where they would be a good fit. If you are a self-funded student who wants to come for the summer, you should send an email in February or March before the summer that

  1. Describes your research experience and states what parts are relevant to the work going on in my group.
  2. Clearly states why you want to work with me. This lets me know that you are familiar with my work and know what you might be working on!
  3. Explains what you want to get out of a summer research experience and what you want to learn. This helps me make sure that the trip is a success for you as well.
I don't consider self-funded visiting students as "free labor" and strongly want to make sure that your stay is productive and a success for your career goals.

If you're a PhD student somewhere else and want to work with me (while being external), this could happen under the right circumstances. Typically, your advisor at your primary institution and I would co-advise you. I typically only do these kinds of arrangements when I know your advisor (more common) or when the collaborative project makes sense (rare). To get this started, have your advisor email me (not you directly) about what the project is.

Postdocs

I would love to have you all in my lab but this is generally dependent on funding (but seriously, I would take you all if I could). Email me if you think you're a good match and tell me why and we might be able to figure something out. That said, at the moment, I'm not actively seeking postdocs (due to funding, of course). If you're coming with your own funding, that changes everything, so drop me a line then.

FAQs

Q: How do I pronounce your last name?
A: Like you would in the old country

Q: Which old country is that?
A: 🤷

Q: Can I research with you or be a member of the Blablablab?
A: For an overly-detailed answer, click the "Prospective students" tab thingie above. That should cover everything.