A model for speech-driven lesson summary generation in a noisy educational environment

Blunt, Phillip John

Title: A model for speech-driven lesson summary generation in a noisy educational environment
Creator: Blunt, Phillip John
Subject: Automatic speech recognition
Subject: Speech processing systems
Subject: Educational technology
Date Issued: 2024-04
Date: 2024-04
Type: Master's theses
Type: text
Identifier: http://hdl.handle.net/10948/64500
Identifier: vital:73741
Description: The application of Automatic Speech Recognition (ASR) technology to generate lesson transcripts and closed captions in the classroom has been shown to improve the learning experience of students from disadvantaged groups. This dissertation proposes a concept model for applying ASR technology in the educational environment for lesson transcription or closed captioning. The model further aims to bolster students’ secondary contact with the lesson content by identifying keywords and subsequently associating them with known course content material to generate a summary of the educator’s key points. To reinforce this concept, this work discusses three core theoretical areas: the existing applications of ASR technology in the classroom; the prominent machine-learning solutions capable of performing ASR, whether for keyword spotting or for continuous speech recognition; and the speech enhancement techniques used to mitigate the negative effects of environmental noise in the educational space. Following a groundwork investigation into these three areas, an initial model was created for incorporating an ASR system into the educational environment, using the educator’s speech to drive the generation of the lesson summary. During analysis for prototype development, an investigation into the feasibility of training a machine-learning keyword-spotting model on South African speech data revealed a number of challenges. It was therefore decided that a cloud-based ASR solution would be more appropriate for establishing proof of concept in a prototype system. Moreover, adopting a cloud-based ASR solution meant that a more reliable lesson transcript could be generated; as a result, the focus of this work shifted towards exploiting lesson transcription to generate a meaningful lesson summary.
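The keyword identification and association step described above can be illustrated with a minimal Python sketch. It assumes that the ASR system has already produced a plain-text transcript and that the course content database can be represented as a list of sections, each holding its educator-defined keyword terms. The names `CourseItem`, `identify_keywords` and `summarise`, and the sample course data, are hypothetical and are not taken from the thesis prototype. Keyword prevalence is counted with whole-word, case-insensitive regular-expression matching, which is only one simple choice among many.

```python
import re
from dataclasses import dataclass, field


@dataclass
class CourseItem:
    """Hypothetical course content section tied to educator-defined keywords."""
    title: str
    keywords: list[str] = field(default_factory=list)


def identify_keywords(transcript: str, keywords: list[str]) -> dict[str, int]:
    """Count whole-word, case-insensitive occurrences of each keyword term."""
    text = transcript.lower()
    counts = {}
    for kw in keywords:
        pattern = r"\b" + re.escape(kw.lower()) + r"\b"
        counts[kw] = len(re.findall(pattern, text))
    return counts


def summarise(transcript: str, course: list[CourseItem]) -> dict:
    """Build a summary: transcript, keyword prevalence, and keyword-to-section links."""
    all_keywords = sorted({kw for item in course for kw in item.keywords})
    counts = identify_keywords(transcript, all_keywords)
    # Associate each course section with the keywords actually heard in the lesson.
    associations = {
        item.title: {kw: counts[kw] for kw in item.keywords if counts[kw] > 0}
        for item in course
    }
    return {
        "transcript": transcript,
        "keyword_counts": {kw: n for kw, n in counts.items() if n > 0},
        "associations": {title: kws for title, kws in associations.items() if kws},
    }


# Usage with made-up course data and transcript (illustrative only).
course = [
    CourseItem("Sorting algorithms", ["merge sort", "quicksort"]),
    CourseItem("Data structures", ["linked list", "stack"]),
]
transcript = ("Today we compare merge sort and quicksort. Merge sort is stable, "
              "and the call stack grows with recursion.")
summary = summarise(transcript, course)
print(summary["keyword_counts"])
print(summary["associations"])
```

Sections with no matching keywords are dropped from the associations, so the summary only points students towards course content that the educator actually mentioned.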
An initial prototype was then implemented based on the initial model, using a cloud-based ASR approach. The final model presented in this work uses keyword identification during transcription, together with a course content database, to identify known, educator-defined keyword terms during a lesson; these terms are tied to relevant course content items for the specified lesson. Whenever either the model or the prototype was improved or adapted, the other was modified accordingly, ensuring that each reflected both the theoretical and practical aspects of the other. After a series of improvement cycles, a final version of the model was established, supported by a performance evaluation of an acceptable prototype system. Ultimately, the prototype proved capable of generating a lesson summary that is presented to students to bolster secondary contact with lesson content. This lesson summary not only provides students with a lesson transcript but also helps them to monitor educator-defined keyword terms, how prevalent each term was in the educator’s delivery of the lesson, and their associations with educator-defined sections of course content. The prototype was developed with a modular approach so that its speech recognition component was interchangeable between CMU’s Sphinx and Google Cloud’s Speech-to-Text speech recognition systems, both accessed via a cloud-based programming library. In addition to the ASR module, a speech enhancement module implementing noise injection, noise cancellation and noise reduction was introduced to demonstrate the effects of noise on the prototype. The prototype was tested using different configurations of speech recognition and speech enhancement techniques to demonstrate how the accuracy of lesson summary generation changed.
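Noise injection, used here to test the prototype under noisy conditions, is commonly performed by scaling a noise recording so that the mixture reaches a target signal-to-noise ratio (SNR). If the speech has mean power P_s and the noise has mean power P_n, a scale factor a = sqrt(P_s / (P_n · 10^(SNR/10))) makes 10·log10(P_s / (a²·P_n)) equal the requested SNR in decibels. The sketch below assumes NumPy, mono floating-point signals at the same sample rate, and an SNR given in dB; the function `inject_noise` and the synthetic test signals are illustrative assumptions, not the thesis’s actual noise data or code.

```python
import numpy as np


def inject_noise(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix noise into speech so that the mixture has the requested SNR in dB.

    Assumes mono signals at the same sample rate; samples are converted to float64.
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    # Loop the noise if it is shorter than the speech, then trim to length.
    if len(noise) < len(speech):
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0:
        raise ValueError("noise signal is silent; cannot reach a finite SNR")

    # Choose the scale so that 10*log10(speech_power / (scale**2 * noise_power)) == snr_db.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise


# Usage with synthetic signals (a 220 Hz tone standing in for speech).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000
speech = 0.5 * np.sin(2 * np.pi * 220 * t)
noise = rng.normal(0.0, 1.0, 8000)
noisy = inject_noise(speech, noise, snr_db=10.0)
measured = 10 * np.log10(np.mean(speech ** 2) / np.mean((noisy - speech) ** 2))
print(f"measured SNR: {measured:.2f} dB")
```

Fixing the SNR rather than a raw noise volume makes results comparable across lessons recorded at different loudness levels.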
Proof of concept was established using the Google Cloud Continuous Speech Recognition System, which outperformed CMU’s Sphinx and enabled the prototype to achieve 100,00% accuracy in keyword identification and subsequent association on noise-free speech, compared with 96,93% accuracy on noise-polluted speech when noise cancellation was applied.
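The abstract does not state how the keyword identification accuracy figures were calculated. One hypothetical way to express such a figure is the percentage of keyword occurrences in a reference result (for example, from noise-free speech) that are also recovered from a noisy result, with each keyword credited at most its reference count. The function `keyword_accuracy` below is purely illustrative and should not be read as the metric used in the thesis; among other simplifications, it ignores false positives.

```python
def keyword_accuracy(reference: dict[str, int], detected: dict[str, int]) -> float:
    """Hypothetical metric: percentage of reference keyword occurrences also detected.

    Each keyword contributes at most its reference count, so over-detection
    is not rewarded; false positives are not penalised.
    """
    total = sum(reference.values())
    if total == 0:
        return 100.0
    hits = sum(min(n, detected.get(kw, 0)) for kw, n in reference.items())
    return 100.0 * hits / total


# Usage with made-up counts: one "quicksort" occurrence missed in the noisy run.
reference = {"merge sort": 2, "quicksort": 1, "stack": 1}
detected = {"merge sort": 2, "stack": 1}
print(keyword_accuracy(reference, detected))
```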
Description: Thesis (MIT) -- Faculty of Engineering, the Built Environment, and Technology, School of Information Technology, 2024
Format: computer
Format: online resource
Format: application/pdf
Format: 1 online resource (xv, 166 pages)
Format: pdf
Publisher: Nelson Mandela University
Publisher: Faculty of Engineering, the Built Environment, and Technology
Language: English
Rights: Nelson Mandela University
Rights: All Rights Reserved
Rights: Open Access


Collection: NMU School of Information and Communication Technology

File: Blunt, PJ.pdf (SOURCE1), 4 MB, Adobe Acrobat PDF