Optimization of automatic speech recognition under noisy environment using machine learning techniques

Yamkela, Melane

Title: Optimization of automatic speech recognition under noisy environment using machine learning techniques
Creator: Yamkela, Melane
Subject: Automatic speech recognition
Subject: Speech processing systems
Subject: Computational linguistics
Date Issued: 2024-04
Date: 2024-04
Type: Master's theses
Type: text
Identifier: http://hdl.handle.net/10353/29954
Identifier: vital:79216
Description: Speech recognition technology enables machines to comprehend and interpret human speech. It allows users to interact with computers, smartphones, and other devices using spoken commands rather than traditional input methods such as typing. Speech recognition systems analyse audio input, typically spoken words or phrases, and convert it into text or commands that computers can understand. The technology has evolved from simple command-based systems to advanced natural language processing algorithms capable of understanding context, accents, and even emotions. Although speech recognition has made significant strides, challenges persist, particularly in handling noisy environments accurately and in distinguishing between similar-sounding words. This study aimed to develop an optimal automatic speech recognition system for noisy environments using machine learning techniques, and to evaluate the performance of the developed system. Speech recognition methodology involves several key steps to transform spoken words accurately into written text or commands: audio input, preprocessing, feature extraction, acoustic modeling, and language modeling. The model was developed using Google Colab and TensorFlow, an open-source machine-learning platform. It used a pre-trained model from the Hugging Face Transformers library. This pre-trained model deploys convolutional neural networks and was trained with data collected by Facebook (wav2vec). For evaluation, the model made use of a confusion matrix together with precision and accuracy metrics; it was tested on real-time data and good results were achieved. Evaluation is continuing, to observe the model's performance under different noisy backgrounds.
This research adds to the body of knowledge in the field of speech recognition. Future work will seek to use large live data and to investigate the error rate.
Description: Thesis (MSci) -- Faculty of Science and Agriculture, 2024
Format: computer
Format: online resource
Format: application/pdf
Format: 1 online resource (67 leaves)
Format: pdf
Publisher: University of Fort Hare
Publisher: Faculty of Science and Agriculture
Language: English
Rights: rights holder
Rights: All Rights Reserved
Rights: Open Access

Collections

UFH Department of Computer Science

File	Description	Size	Format
SOURCE1	Final Dissertation - Yamkela Melane (201608576).pdf	21 MB	Adobe Acrobat PDF