Automated Speech Scoring Methods and Results

09/30/2022 | White Paper

The Problem

Many states are moving toward automated scoring to reduce scoring costs, return scores to teachers and students more quickly, and provide consistent scoring against program-defined rubrics. English language learner (ELL) assessments, particularly screener tests, stand to benefit from automated scoring because, with models trained on high-quality data, students can be quickly identified and placed into English language development services. Just as importantly, automated scoring can reduce teacher scoring time. ELL tests are administered in four domains (speaking, writing, listening, and reading), and all four domains must be scored before a student can be placed into services. Speaking and writing tests often use constructed-response items that require either automated or hand scoring; thus, automated scoring must be implemented for both domains. While automated scoring of writing is well established, automated scoring of speech is less common. In this white paper, we describe the performance of Cambium Assessment, Inc.'s (CAI's) automated scoring engine on English language learner speaking items.

Our Solution

CAI's scoring engine relies on deep learning (multi-layer neural networks) for both its transcription model and its scoring model. To our knowledge, most automated speech engines rely on explicitly defined or algorithmic features to produce the transcription (i.e., the conversion of speech to text) and/or the features used to predict scores. In contrast, CAI's engine uses multi-layer neural networks to learn features from the data, for both transcription and scoring, rather than relying on explicitly engineered features. The engine architecture relies heavily on the Transformer approach, which has produced state-of-the-art results in both automatic speech recognition and text-based scoring. On 15 English language learner (ELL) screener speaking items, CAI's engine performed slightly better than human raters. The engine's transcription performance was also comparable to that of other state-of-the-art transcription systems.
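The core operation of the Transformer approach mentioned above is scaled dot-product attention, which lets the model weight every element of a sequence against every other when learning features. The sketch below is purely illustrative (it is not CAI's implementation, and the toy matrices are invented for the example); it shows the computation softmax(QKᵀ/√d)·V in plain Python.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Q, K, V are lists of vectors (lists of floats); each query
    produces a weighted average of the value vectors.
    """
    d = len(Q[0])  # key/query dimension, used for the sqrt(d) scaling
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Toy example: one query attending over two key/value pairs.
print(attention([[1.0, 0.0]],
                [[1.0, 0.0], [0.0, 1.0]],
                [[1.0, 0.0], [0.0, 1.0]]))
```

Because the attention weights form a probability distribution, each output row is a convex combination of the value vectors; stacking such layers (with learned projections for Q, K, and V) is what allows a Transformer to learn features directly from data rather than from hand-engineered inputs.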
