Explaining Crisis Alerts from Humans and Automated Scoring Engines Using Annotations
Explaining how human scorers and scoring engines arrive at scores is an important yet difficult problem. Typically, humans and scoring engines assign scores but provide no explanation of how a score was determined for a particular piece of text. We are aware of few studies that examine engine-based annotations, and even fewer that compare engine annotations with human annotations.
We examine this issue by having both humans and scoring engines annotate, that is, highlight, the text associated with a crisis paper "score," called an Alert. Our evaluation focused on how well two human annotators agreed with one another, and on how well human annotations agreed with annotations from a scoring engine. The results suggest that human annotators, at least on crisis responses, perform reasonably well. Engine annotations, computed as a post-training analysis of the scoring model, performed worse than the human annotators. However, the engine annotations agreed with human annotations at a level suggesting that they can be improved, and that they could provide validity evidence about how the engine arrives at the scores it assigns.
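The abstract does not name the agreement metric used in the evaluation. As a minimal sketch of how annotator agreement on highlighted text is commonly quantified, the following computes Cohen's kappa over per-token highlight decisions; the function name and the toy label sequences are illustrative, not taken from the study.

```python
# Hypothetical illustration: token-level agreement between two annotators.
# Each sequence marks, per token, whether that annotator highlighted it
# (1) or not (0). Cohen's kappa corrects raw agreement for chance.

def cohens_kappa(a, b):
    """Cohen's kappa for two binary label sequences of equal length."""
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    # Observed proportion of tokens on which the annotators agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's marginal highlight rate.
    p_a1, p_b1 = sum(a) / n, sum(b) / n
    expected = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    if expected == 1:
        return 1.0
    return (observed - expected) / (1 - expected)

# Toy example: two annotators highlighting an eight-token response.
human_a = [1, 1, 0, 0, 1, 0, 0, 1]
human_b = [1, 0, 0, 0, 1, 0, 1, 1]
print(cohens_kappa(human_a, human_b))  # prints 0.5
```

The same function could compare an engine's highlight sequence against a human's, which matches the human-versus-engine comparison described above.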