Wednesday, November 13, 2024 11am to 12:15pm
About this Event
Engineering 2 1156 High Street, Santa Cruz, California 95064
Background: With the rise of large language models and other opaque machine learning approaches, researchers interested in deploying these systems need reassurance that they will have equitable impact across groups. Identifying systematic differences in model performance between groups, often called algorithmic bias, is key to determining model safety. In this talk, I discuss the application of Behavioral Testing in two Natural Language Processing (NLP) tasks on unstructured clinical notes: de-identification and predicting diagnosis codes.
Methods: The general paradigm of Behavioral Testing is to systematically replace parts of a model’s input while monitoring for parallel systematic output differences. In the de-identification task, we probed six de-identification systems for biases against patient names representative of different demographic groups. We created themed parallel corpora from the same base corpus by replacing the names with others representative of a specific race/ethnicity and monitored for differences in name identification performance. In the prediction task, we replaced words with emotional affect to create corpora with only pejorative, laudative, or neutral terms. We hypothesized that, just as pejorative terms negatively influence clinicians reading notes, a stigmatizing framing of a patient (e.g., “unkempt”) will negatively alter predictions made by language models. We then monitored for differences in diagnosis code predictions for state-of-the-art BERT-based word embedding models.
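The name-replacement setup described above can be illustrated with a minimal Python sketch. It is not the speaker's implementation: the `{NAME}` placeholder, the group labels, the made-up names, and the lexicon-based toy detector are all assumptions chosen only to show the shape of the paradigm, which is to build parallel corpora from one base corpus and compare per-group name recall.

```python
import re
from typing import Callable, Dict, List, Set, Tuple

PLACEHOLDER = "{NAME}"  # hypothetical marker for where a patient name goes


def build_parallel_corpora(
    templates: List[str], names_by_group: Dict[str, List[str]]
) -> Dict[str, List[Tuple[str, str]]]:
    """Fill every template with names from each group, cycling through the list.

    Returns, per group, a list of (note_text, inserted_name) pairs.
    """
    corpora = {}
    for group, names in names_by_group.items():
        notes = []
        for i, template in enumerate(templates):
            name = names[i % len(names)]
            notes.append((template.replace(PLACEHOLDER, name), name))
        corpora[group] = notes
    return corpora


def recall_by_group(
    corpora: Dict[str, List[Tuple[str, str]]],
    detect_names: Callable[[str], Set[str]],
) -> Dict[str, float]:
    """Fraction of inserted names that the detector finds, per group."""
    results = {}
    for group, notes in corpora.items():
        hits = sum(1 for text, name in notes if name in detect_names(text))
        results[group] = hits / len(notes)
    return results


def toy_detector(text: str) -> Set[str]:
    """Stand-in for a de-identification system: it only knows a tiny lexicon."""
    lexicon = {"Alice", "Bob"}
    return {word for word in re.findall(r"[A-Za-z]+", text) if word in lexicon}


# Illustrative base corpus and made-up name lists (not real study data).
templates = [
    "Patient {NAME} reports chest pain.",
    "{NAME} was discharged home.",
]
names_by_group = {
    "group_a": ["Alice", "Bob"],
    "group_b": ["Zorv", "Alice"],
}

corpora = build_parallel_corpora(templates, names_by_group)
recalls = recall_by_group(corpora, toy_detector)
print(recalls)
```

Because every group's corpus is built from the same templates, any gap in recall between groups can only come from the substituted names, which is the point of keeping the corpora parallel.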
Implications: While disaggregated model evaluation is important for building trust in models, our studies indicate that researchers can also benefit from Behavioral Testing, which provides a very fine-grained means for controlling clinical factors.
Bio: Paul M. Heider, PhD, is an Assistant Professor in the Biomedical Informatics Center (BMIC) and the Department of Public Health Sciences at MUSC. He serves as head of the NLP Core, a service center for researchers who want to use natural language processing (NLP) tools to extract structured coded data from unstructured data sources like clinical notes. Prior to joining MUSC, he built and designed NLP systems for start-ups and multi-national organizations on the scale of tens of millions of documents a day. He received his PhD from the Linguistics Department at SUNY Buffalo and his BA from Grinnell College.