Investigating unimodal isolated signer-independent sign language recognition
- Authors: Marais, Marc Jason
- Date: 2024-04-04
- Subjects: Convolutional neural network , Sign language recognition , Human activity recognition , Pattern recognition systems , Neural networks (Computer science)
- Language: English
- Type: Academic theses , Master's theses , text
- Identifier: http://hdl.handle.net/10962/435343 , vital:73149
- Description: Sign language serves as the mode of communication for the Deaf and Hard of Hearing community, embodying a rich linguistic and cultural heritage. Recent Sign Language Recognition (SLR) system developments aim to facilitate seamless communication between the Deaf community and the broader society. However, most existing systems are limited by signer-dependent models, hindering their adaptability to diverse signing styles and signers and thus impeding their practical implementation in real-world scenarios. This research explores various unimodal approaches, both pose-based and vision-based, for isolated signer-independent SLR using RGB video input on the LSA64 and AUTSL datasets. The unimodal RGB-only input strategy provides a realistic SLR setting where alternative data sources are either unavailable or necessitate specialised equipment. Through systematic testing scenarios, isolated signer-independent SLR experiments are conducted on both datasets, primarily focusing on AUTSL – a signer-independent dataset. The vision-based R(2+1)D-18 model emerged as the top performer, achieving 90.64% accuracy on the unseen AUTSL test split, closely followed by the pose-based Spatio-Temporal Graph Convolutional Network (ST-GCN) model with an accuracy of 89.95%. Notably, the pose-based approach demonstrates robust generalisation to substantial background and signer variation, while demanding significantly less computational power and training time than the vision-based approach. Both the proposed unimodal pose-based and vision-based systems were concluded to be effective at classifying sign classes in the LSA64 and AUTSL datasets. , Thesis (MSc) -- Faculty of Science, Ichthyology and Fisheries Science, 2024
- Full Text:
- Date Issued: 2024-04-04
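The spatial aggregation that a Spatio-Temporal Graph Convolutional Network performs over skeleton joints can be sketched in a few lines of NumPy. This is a minimal single-layer illustration with a toy three-joint skeleton; the actual ST-GCN additionally uses partitioned adjacency matrices, temporal convolutions, and learned edge importance weights:

```python
import numpy as np

def spatial_graph_conv(X, A, W):
    """One spatial graph-convolution step in the spirit of ST-GCN.

    X: joint features, shape (T, V, C) - T frames, V skeleton joints
    A: joint adjacency matrix, shape (V, V), including self-loops
    W: feature transform, shape (C, C_out)
    """
    D = A.sum(axis=1)                 # degree of each joint
    A_norm = A / D[:, None]           # row-normalised adjacency
    # Each joint aggregates features from its neighbours, then projects.
    return np.einsum('uv,tvc,co->tuo', A_norm, X, W)

# Toy skeleton: three joints in a chain, 2 frames, 4 input features.
A = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]], dtype=float)
X = np.ones((2, 3, 4))
W = np.ones((4, 5))
out = spatial_graph_conv(X, A, W)
print(out.shape)  # (2, 3, 5)
```

Stacking such layers (interleaved with temporal convolutions over the frame axis) is what lets a pose-based model classify signs from joint coordinates alone, at far lower cost than processing full RGB frames.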
A model for measuring and predicting stress for software developers using vital signs and activities
- Authors: Hibbers, Ilze
- Date: 2024-04
- Subjects: Machine learning , Neural networks (Computer science) , Computer software developers
- Language: English
- Type: Master's theses , text
- Identifier: http://hdl.handle.net/10948/63799 , vital:73614
- Description: Occupational stress is a well-recognised issue that affects individuals across professions and industries. Reducing occupational stress has multiple benefits, such as improving employees' health and performance. This study proposes a model to measure and predict occupational stress using data collected in a real IT office environment. Different data sources, such as questionnaires, application-monitoring software (RescueTime) and Fitbit smartwatches, were used to collect heart rate (HR), facial emotions, computer interactions, and application usage. The results of the Demand-Control-Support and Effort-Reward questionnaires indicated that the participants experienced high social support and an average level of workload. Participants also reported their daily perceived stress and workload levels on a 5-point scale. The perceived stress of the participants was overall neutral. No correlation was found between HR, interactions, fear, and meetings. The K-means and Bernoulli clustering algorithms were applied to the dataset and two well-separated clusters were formed. The centroids indicated that higher heart rates were grouped either with meetings or with a larger difference in the centre-point values for interactions. Silhouette scores and 5-fold cross-validation were used to measure the quality of the clusters. However, these clusters were unable to predict the daily reported stress levels. Calculations were done on the computer-usage data to measure interaction speeds and time spent working, in meetings, or away from the computer. These calculations were used, together with the reported daily stress levels, as input to a decision tree. The resulting tree helped to identify which patterns lead to stressful days, indicating that days with high time pressure led to more reported stress. A new, more general tree was developed, which was able to predict 82 per cent of the reported daily stress.
The main discovery of the research was that stress does not have a straightforward connection with computer interactions, facial emotions, or meetings. High interaction levels sometimes led to stress and other times did not. Predicting stress therefore involves finding patterns in how data from the different sources interact with each other. Future work will revolve around validating the model in more office environments around South Africa. , Thesis (MSc) -- Faculty of Science, School of Computer Science, Mathematics, Physics and Statistics, 2024
- Full Text:
- Date Issued: 2024-04
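The decision-tree step described above can be illustrated with scikit-learn. The feature names and values below are hypothetical stand-ins for the study's computed usage features (interaction speed, meeting time, time away from the computer), not the actual dataset:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical daily features: [interaction speed, hours in meetings,
# hours away from computer]. Labels: 1 = day reported as stressful.
X = np.array([
    [0.90, 4.0, 0.5],   # fast typing, many meetings
    [0.80, 3.5, 0.4],
    [0.30, 0.5, 2.0],   # relaxed day
    [0.20, 1.0, 2.5],
    [0.70, 3.0, 0.6],
    [0.25, 0.8, 1.8],
])
y = np.array([1, 1, 0, 0, 1, 0])

# A shallow tree keeps the learned day-patterns interpretable.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(tree.predict([[0.85, 3.8, 0.3]]))  # [1] - a high-pressure day
```

Inspecting the fitted splits (e.g. via `sklearn.tree.export_text`) is what makes this approach attractive here: the tree's thresholds read directly as "which daily patterns lead to stressful days".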
Augmenting the Moore-Penrose generalised inverse to train neural networks
- Authors: Fang, Bobby
- Date: 2024-04
- Subjects: Neural networks (Computer science) , Machine learning , Mathematical optimization -- Computer programs
- Language: English
- Type: Master's theses , text
- Identifier: http://hdl.handle.net/10948/63755 , vital:73595
- Description: An Extreme Learning Machine (ELM) is a fast, non-iterative feedforward neural network training algorithm which uses the Moore-Penrose (MP) generalised inverse of a matrix to compute the weights of the output layer of the neural network, using a random initialisation for the hidden layer. While ELM has been used to train feedforward neural networks, the effectiveness of the MP generalised inverse for training recurrent neural networks is yet to be investigated. The primary aim of this research was to investigate how biases in the output layer and the MP generalised inverse can be used to train recurrent neural networks. To accomplish this, the Bias Augmented ELM (BA-ELM), which concatenates the hidden-layer output matrix with a ones-column vector to simulate the biases in the output layer, was proposed. A variety of datasets generated from optimisation test functions, as well as real-world regression and classification datasets, were used to validate BA-ELM. The results showed that in specific circumstances BA-ELM was able to perform better than ELM. Following this, Recurrent ELM (R-ELM) was proposed, which uses a recurrent hidden layer instead of a feedforward hidden layer, since recurrent neural networks rely on functional feedback connections in the recurrent layer. A hybrid training algorithm, Recurrent Hybrid ELM (R-HELM), was also proposed, which uses a gradient-based algorithm to optimise the recurrent layer and the MP generalised inverse to compute the output weights. The evaluation of the R-ELM and R-HELM algorithms was carried out using three different recurrent architectures on two recurrent tasks derived from the Susceptible-Exposed-Infected-Removed (SEIR) epidemiology model. Various training hyperparameters were investigated to evaluate their effect on the hybrid training algorithm.
With optimal hyperparameters, the hybrid training algorithm was able to achieve better performance than the conventional gradient-based algorithm. , Thesis (MSc) -- Faculty of Science, School of Computer Science, Mathematics, Physics and Statistics, 2024
- Full Text:
- Date Issued: 2024-04
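The core ELM solve, and the ones-column augmentation that BA-ELM uses to simulate output-layer biases, can be sketched in NumPy as follows. The layer sizes, tanh activation, and random data are illustrative, not the thesis's exact configuration:

```python
import numpy as np

def elm_output_weights(H, T, augment_bias=False):
    """Solve for output-layer weights via the MP pseudoinverse.

    H: hidden-layer output matrix, shape (n_samples, n_hidden)
    T: target matrix, shape (n_samples, n_outputs)
    augment_bias: if True, append a ones column to H so the solution
    also yields output-layer biases (the BA-ELM idea).
    """
    if augment_bias:
        H = np.hstack([H, np.ones((H.shape[0], 1))])
    return np.linalg.pinv(H) @ T   # beta = H^+ T, no iterative training

# Tiny demonstration with a random (untrained) hidden layer.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))       # inputs
W_in = rng.standard_normal((5, 20))     # random input weights, kept fixed
H = np.tanh(X @ W_in)                   # hidden-layer activations
T = rng.standard_normal((100, 2))       # targets
beta = elm_output_weights(H, T, augment_bias=True)
print(beta.shape)  # (21, 2): 20 hidden units + 1 bias row
```

The single pseudoinverse solve is what makes ELM non-iterative; R-HELM, as described above, keeps this solve for the output layer while handing the recurrent layer to a gradient-based optimiser.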
Self-attentive vision in evolutionary robotics
- Authors: Botha, Bouwer
- Date: 2024-04
- Subjects: Evolutionary robotics , Robotics , Neural networks (Computer science)
- Language: English
- Type: Master's theses , text
- Identifier: http://hdl.handle.net/10948/63628 , vital:73566
- Description: The autonomy of a robot refers to its ability to achieve a task in an environment with minimal human supervision. This may require autonomous solutions to perceive their environment in order to inform their decisions. An inexpensive and highly informative way for robots to perceive the environment is through vision. The autonomy of a robot is reliant on the quality of the robotic controller. These controllers are the software interface between the robot and environment that determines the actions of the robot based on the perceived environment. Controllers are typically created using manual programming techniques, which become progressively more challenging with increasing complexity of both the robot and the task. An alternative to manual programming is the use of machine learning techniques such as those used by Evolutionary Robotics (ER). ER is an area of research that investigates the automatic creation of controllers. Instead of manually programming a controller, an Evolutionary Algorithm can be used to evolve the controller through repeated interactions with the task environment. Employing the ER approach on camera-based controllers, however, has presented problems for conventional ER methods. Firstly, existing architectures that are capable of automatically processing images have a large number of trained parameters. These architectures over-encumber the evolutionary process due to the large search space of possible configurations. Secondly, the evolution of complex controllers needs to be done in simulation, which requires either (a) the construction of a photo-realistic virtual environment with accurate lighting, texturing and models, or (b) a potential reduction of controller capability by simplifying the problem via image preprocessing. Any controller trained in simulation also raises the inherent concern that it may not transfer to the real world.
This study proposes a new technique for the evolution of camera-based controllers in ER that aims to address the highlighted problems. Self-attention is used to facilitate the evolution of compact controllers that evolve specialised sets of task-relevant features from unprocessed images by focussing on important image regions. Furthermore, a new neural network-based simulation approach, Generative Neuro-Augmented Vision (GNAV), is proposed to simplify simulation construction. GNAV makes use of random data collected in a simple virtual environment and in the real world. A neural network is trained to overcome the visual discrepancies between these two environments. GNAV enables a controller to be trained in a simple simulated environment that appears similar to the real environment, while requiring minimal human supervision. The capabilities of the new technique were demonstrated using a series of real-world navigation tasks based on camera vision. Controllers utilising the proposed self-attention mechanism were trained using GNAV and transferred to a real camera-equipped robot, where they were shown to be able to perform the same tasks in the real world. , Thesis (MSc) -- Faculty of Science, School of Computer Science, Mathematics, Physics and Statistics, 2024
- Full Text:
- Date Issued: 2024-04
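A minimal single-head self-attention pass over flattened image patches, in the spirit of the compact controllers described above, can be sketched as follows. The patch count, feature dimensions, and random projections are illustrative; the thesis's exact mechanism may differ:

```python
import numpy as np

def self_attention(patches, Wq, Wk, Wv):
    """Single-head self-attention over image patches (minimal sketch).

    patches: (N, D) - N flattened patches with D features each
    Wq, Wk, Wv: (D, d) projection matrices
    Returns (N, d): each patch re-expressed as an attention-weighted
    mix of all patches, which lets a small controller concentrate on
    task-relevant image regions.
    """
    Q, K, V = patches @ Wq, patches @ Wk, patches @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])          # scaled dot-product
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)               # row-wise softmax
    return w @ V

rng = np.random.default_rng(0)
patches = rng.standard_normal((16, 32))   # e.g. 16 patches from one frame
projs = [rng.standard_normal((32, 8)) for _ in range(3)]
out = self_attention(patches, *projs)
print(out.shape)  # (16, 8)
```

The appeal for ER is the parameter count: the three small projection matrices replace the bulk of a conventional image-processing network, keeping the evolutionary search space tractable.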
Deep neural networks for robot vision in evolutionary robotics
- Authors: Watt, Nathan
- Date: 2021-04
- Subjects: Gqeberha (South Africa) , Eastern Cape (South Africa) , Neural networks (Computer science)
- Language: English
- Type: Master's theses , text
- Identifier: http://hdl.handle.net/10948/52100 , vital:43448
- Description: Advances in electronics manufacturing have made robots and their sensors cheaper and more accessible. Robots can have a variety of sensors, such as touch sensors, distance sensors and cameras. A robot’s controller is the software which interprets its sensors and determines how the robot will behave. The difficulty of programming robot controllers increases with complex robots and complicated tasks, forming a barrier to deploying robots for real-world applications. Robot controllers can be automatically created with Evolutionary Robotics (ER). ER makes use of an Evolutionary Algorithm (EA) to evolve controllers to complete a particular task. Instead of manually programming controllers, an EA can evolve controllers when provided with the robot’s task. ER has been used to evolve controllers for many different kinds of robots with a variety of sensors; however, the use of robots with on-board camera sensors has been limited. The nature of EAs makes evolving a controller for a camera-equipped robot particularly difficult. There are two main challenges which complicate the evolution of vision-based controllers. First, every image from a camera contains a large amount of information, and a controller needs many parameters to receive that information; however, it is difficult to evolve controllers with such a large number of parameters using EAs. Second, during the process of evolution, it is necessary to evaluate the fitness of many candidate controllers. This is typically done in simulation; however, creating a simulator for a camera sensor is a tedious and time-consuming task, as building a photo-realistic simulated environment requires handcrafted 3-dimensional models, textures and lighting. Two techniques have been used in previous experiments to overcome the challenges associated with evolving vision-based controllers.
Either the controller was provided with extremely low-resolution images, or a task-specific algorithm was used to preprocess the images, only providing the necessary information to the controller. , Thesis (MSc) -- Faculty of Science, School of Computer Science, Mathematics, Physics and Statistics, 2021
- Full Text: false
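The first workaround mentioned above, feeding controllers extremely low-resolution images, can be sketched as simple block-average downsampling; this shrinks the number of inputs an evolved controller must consume. The frame size and factor here are illustrative:

```python
import numpy as np

def downsample(image, factor):
    """Reduce a greyscale image by block averaging.

    A 64x64 frame (4096 values) becomes an 8x8 grid (64 values),
    so a controller needs far fewer evolvable input weights.
    """
    h = image.shape[0] // factor * factor   # crop to a multiple of factor
    w = image.shape[1] // factor * factor
    blocks = image[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))         # average each factor x factor block

frame = np.arange(64 * 64, dtype=float).reshape(64, 64)
tiny = downsample(frame, 8)
print(tiny.shape)  # (8, 8)
```

The trade-off motivating the later work is clear from the sketch: the reduction is task-agnostic, so fine detail a task might need is averaged away, which is why task-specific preprocessing was the usual alternative.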