- Title
- Optimization of automatic speech recognition under noisy environment using machine learning techniques
- Creator
- Yamkela, Melane
- Subject
- Automatic speech recognition
- Subject
- Speech processing systems
- Subject
- Computational linguistics
- Date Issued
- 2024-04
- Date
- 2024-04
- Type
- Master's theses
- Type
- text
- Identifier
- http://hdl.handle.net/10353/29954
- Identifier
- vital:79216
- Description
- Speech recognition technology enables machines to comprehend and interpret human speech. It allows users to interact with computers, smartphones, and other devices using spoken commands rather than traditional input methods such as typing. Speech recognition systems analyse audio input, typically spoken words or phrases, and convert it into text or commands that computers can process. The technology has evolved from simple command-based systems to natural language processing algorithms that can take context, accents, and even emotion into account. Despite these advances, challenges persist, particularly in handling noisy environments and distinguishing between similar-sounding words. This study aimed to develop an optimal automatic speech recognition (ASR) system for noisy environments using machine learning techniques, and to evaluate the performance of the developed system. Speech recognition involves several key steps that transform spoken words into text or commands: audio input, preprocessing, feature extraction, acoustic modeling, and language modeling. The model was developed in Google Colab using TensorFlow, an open-source machine-learning platform. It builds on Facebook's pre-trained wav2vec 2.0 model, accessed through the Hugging Face Transformers library; this architecture combines a convolutional feature encoder with a transformer network. For evaluation, the model was assessed with a confusion matrix and with precision and accuracy metrics; it was tested on real-time data, and good results were achieved. Evaluation is continuing in order to observe the model's performance under different noisy backgrounds.
This research adds to the body of knowledge in the field of speech recognition. In future work, the study will use larger volumes of live data and investigate the error rate.
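The abstract reports evaluation with a confusion matrix, precision and accuracy, and names the error rate as future work, but it gives no formulas. The sketch below shows the standard definitions under two assumptions that are not stated in the record: predictions are compared as discrete word or command labels, and "error rate" is taken to mean word error rate (WER), the usual ASR metric. The function names and the example data are hypothetical, not taken from the thesis.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count (true, predicted) pairs; rows are true labels, columns predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference label."""
    if not y_true:
        raise ValueError("no samples")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, label):
    """Of the samples predicted as `label`, the fraction whose true label is `label`."""
    predicted = [t for t, p in zip(y_true, y_pred) if p == label]
    if not predicted:
        return 0.0  # convention assumed here when the label is never predicted
    return sum(t == label for t in predicted) / len(predicted)


def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # d[i][j] = minimum word edits turning ref[:i] into hyp[:j] (Levenshtein distance)
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)


# Hypothetical example: isolated voice commands, not data from the thesis.
labels = ["yes", "no", "stop"]
y_true = ["yes", "no", "stop", "yes"]
y_pred = ["yes", "no", "yes", "yes"]
print(confusion_matrix(y_true, y_pred, labels))  # [[2, 0, 0], [0, 1, 0], [1, 0, 0]]
print(accuracy(y_true, y_pred))                  # 0.75
print(word_error_rate("turn on the light", "turn the lights"))  # 0.5
```

The confusion matrix shows where errors concentrate (in the toy example, "stop" recognised as "yes"), while precision and accuracy summarise it as single numbers. WER compares whole transcripts word by word, so it does not require a fixed label set.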
- Description
- Thesis (MSci) -- Faculty of Science and Agriculture, 2024
- Format
- computer
- Format
- online resource
- Format
- application/pdf
- Format
- 1 online resource (67 leaves)
- Publisher
- University of Fort Hare
- Publisher
- Faculty of Science and Agriculture
- Language
- English
- Rights
- rights holder
- Rights
- All Rights Reserved
- Rights
- Open Access
File | Description | Size | Format
---|---|---|---
SOURCE1 | Final Dissertation - Yamkela Melane (201608576).pdf | 21 MB | Adobe Acrobat PDF
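The title and abstract centre on recognition in noisy environments, and the abstract notes that evaluation continues under different noisy backgrounds. However, the record does not say how the noisy test conditions were produced. One common approach, assumed here rather than taken from the thesis, is to add recorded background noise to clean speech at a controlled signal-to-noise ratio, SNR = 10·log10(P_speech / P_noise) dB, where P is mean signal power. The helpers `mix_at_snr` and `snr_db` are hypothetical, and the example assumes mono floating-point audio at 16 kHz, the sampling rate wav2vec 2.0 models expect.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that 10*log10(P_clean / P_noise) equals snr_db, then add it to `clean`."""
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    if noise.shape[0] < clean.shape[0]:
        # Loop the noise recording if it is shorter than the speech clip.
        reps = int(np.ceil(clean.shape[0] / noise.shape[0]))
        noise = np.tile(noise, reps)
    noise = noise[: clean.shape[0]]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    if p_noise == 0:
        raise ValueError("noise signal is silent")
    # Target noise power is p_clean / 10**(snr_db / 10); scale amplitude accordingly.
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise


def snr_db(clean, noisy):
    """Measured SNR in dB, treating everything except `clean` as noise."""
    clean = np.asarray(clean, dtype=np.float64)
    residual = np.asarray(noisy, dtype=np.float64) - clean
    return 10 * np.log10(np.mean(clean ** 2) / np.mean(residual ** 2))


# Hypothetical signals: a 1 s, 440 Hz tone standing in for speech, and 0.5 s of white noise.
sr = 16_000
t = np.arange(sr) / sr
clean = 0.5 * np.sin(2 * np.pi * 440 * t)
noise = np.random.default_rng(0).standard_normal(sr // 2)
noisy = mix_at_snr(clean, noise, 5.0)
print(round(snr_db(clean, noisy), 6))  # 5.0
```

Sweeping the SNR over several levels (for example 20, 10, 5 and 0 dB) and re-running the recogniser at each one gives a simple picture of how performance degrades as background noise increases.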