Speech-to-text applications have great potential for helping students with English language
comprehension and pronunciation practice. This study examines the functionality of five speech-to-text (STT) applications (the Google Docs voice typing tool, Apple Dictation, Windows 10 Dictation, Dictation.io [a website service], and “Transcribe” [an iOS app]) by measuring how accurately they transcribe American English. In the experiment, 30 nonnative speakers performed four speaking tasks, and their speech was recorded and transcribed with each application. The
transcriptions produced by the applications were then compared with human-made transcriptions to evaluate each application’s transcription accuracy. The results revealed that transcription accuracy depends not only on an application’s automatic speech recognition capability but also on the type of speech produced and on each speaker’s L1 influence on the L2
(English). The study also offers examples of Japanese speakers’ pronunciation errors identified through STT transcription, demonstrating the pedagogical potential of these applications for pronunciation practice and assessment in English classrooms.
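
The abstract does not specify how the accuracy rate was calculated. As a purely illustrative sketch (not the authors’ procedure), transcription accuracy is often derived from word error rate (WER): the minimum number of word-level substitutions, insertions, and deletions needed to turn the machine transcription into the human reference, divided by the reference length. The function name and the sample sentences below are hypothetical.

    # Illustrative only: a WER-style comparison of an STT transcription against a
    # human reference transcription. The paper's actual metric and any text
    # normalization steps are not stated in the abstract.

    def word_error_rate(reference: str, hypothesis: str) -> float:
        """Word-level edit distance divided by the number of reference words."""
        ref = reference.lower().split()
        hyp = hypothesis.lower().split()
        # Standard Levenshtein distance over words, computed row by row.
        prev = list(range(len(hyp) + 1))
        for i, r in enumerate(ref, start=1):
            curr = [i] + [0] * len(hyp)
            for j, h in enumerate(hyp, start=1):
                cost = 0 if r == h else 1
                curr[j] = min(prev[j] + 1,         # deletion
                              curr[j - 1] + 1,     # insertion
                              prev[j - 1] + cost)  # substitution / match
            prev = curr
        return prev[-1] / max(len(ref), 1)

    # Hypothetical example: the STT output substitutes one word ("see" for "sea").
    reference = "she sells sea shells by the sea shore"
    hypothesis = "she sells see shells by the sea shore"
    wer = word_error_rate(reference, hypothesis)
    print(f"WER: {wer:.2%}, accuracy: {1 - wer:.2%}")  # WER: 12.50%, accuracy: 87.50%
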
endingpage:
21
format.extent:
21
identifier.citation:
Hirai, A., & Kovalyova, A. (2024). Speech-to-text applications’ accuracy in English language learners’ speech transcription. Language Learning & Technology, 28(1), 1–21. https://hdl.handle.net/10125/73555
identifier.issn:
1094-3501
identifier.uri:
https://hdl.handle.net/10125/73555
language:
eng
number:
1
publicationname:
Language Learning & Technology
publisher:
University of Hawaii National Foreign Language Resource Center and Center for Language & Technology
rights.license:
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License