Explaining Crisis Alerts from Humans and Automated Scoring Engines Using Annotations

The Problem

Explaining how human scorers and scoring engines arrive at scores is an important but difficult problem. Typically, both humans and scoring engines assign a score without explaining how that score was determined for a particular piece of text. We are aware of few studies that examine engine-based annotations, and even fewer that compare engine annotations with human annotations.

The Solution

We examine this issue by having both human scorers and a scoring engine annotate (that is, highlight) the text associated with a crisis paper “score,” called an Alert. Our evaluation focused on how well two human annotators agreed with each other and how well human annotations agreed with annotations from a scoring engine. The results suggest that human annotators perform reasonably well, at least on crisis responses. The results also indicate that engine annotations, computed as a post-training analysis of a scoring model, perform worse than human annotations. However, the engine annotations agreed with human annotations at a level suggesting that they can be improved and could potentially provide validity evidence about how the engine arrives at the scores it assigns.
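This page does not say which agreement statistic the white paper uses. As one illustration, assuming each annotation is recorded as the set of highlighted token positions in a response, agreement between two annotators (human and human, or human and engine) could be measured token by token with Cohen's kappa. The sketch below is hypothetical: the function names `to_mask` and `cohen_kappa`, the choice of kappa, and the toy highlights are all assumptions for illustration, not the study's method or results.

```python
from typing import Sequence

# Hypothetical sketch: the agreement metric (token-level Cohen's kappa) is an
# assumption for illustration, not necessarily what the white paper uses.


def to_mask(highlighted: set[int], n_tokens: int) -> list[int]:
    """Convert a set of highlighted token indices into one 0/1 label per token."""
    return [1 if i in highlighted else 0 for i in range(n_tokens)]


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two equal-length binary label sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: share of tokens where both annotators made the same call.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's highlight rate.
    p_a = sum(a) / n
    p_b = sum(b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        # Both highlighted nothing (or everything); kappa is undefined here,
        # so identical labelings are treated as perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    n = 10  # a hypothetical 10-token crisis response (toy data)
    human_a = to_mask({2, 3, 4, 5}, n)
    human_b = to_mask({3, 4, 5, 6}, n)
    engine = to_mask({4, 5, 6}, n)
    print(f"human vs human:  {cohen_kappa(human_a, human_b):.3f}")  # 0.583
    print(f"human vs engine: {cohen_kappa(human_a, engine):.3f}")   # 0.348
```

Kappa is used here rather than raw percent agreement because highlights are usually sparse, so two annotators who highlight almost nothing would otherwise appear to agree very well by chance.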


Cambium Assessment leads the industry in K–12 online testing*

Administered 115.8M online tests
Served 1.5M simultaneous testers
Tested 15.6M individual students
Achieved 4 industry-standard certifications

* Data from the 2022–23 academic year
