Automated machine learning plankton taxonomy pipeline

Du Toit, Ian Charles

Title: Automated machine learning plankton taxonomy pipeline
Creator: Du Toit, Ian Charles
Subject: machine learning
Subject: Plankton -- Ecology
Date Issued: 2022-04
Date: 2022-04
Type: Master's theses
Type: text
Identifier: http://hdl.handle.net/10948/58363
Identifier: vital:59003
Description: Plankton taxonomy is considered a multi-class classification problem. The current state-of-the-art developments in machine learning and phytoplankton taxonomy, such as MorphoCluster, use a convolutional neural network as a feature extractor and hierarchical density-based clustering (HDBSCAN) to classify plankton and identify outliers. These convolutional feature extraction algorithms achieved accuracies of 0.78 during the classification process. However, these feature extraction models are trained on clean datasets. They perform very well when analysing previously encountered and well-defined classes but do not perform well when tested on the raw datasets expected in field deployment. Raw plankton datasets are unbalanced: some classes have only one or two samples, whereas others can have thousands. They also exhibit many inter-class similarities with significant size differences. The data can also be in the form of low-resolution, noisy images. Phytoplankton species are also highly biodiverse, meaning that there is always a high chance of a network encountering unknown sample types. Some samples, such as the various body parts of organisms, are easily confused with the species itself. Marine experts classifying plankton tend to group ambiguous samples according to the highest order to which they are confident they belong. This system leads to a dataset containing conflicting classes and forces the feature extraction network to overfit when training. This research aims to address these spatial issues and present a feature extraction methodology built upon existing research and novel concepts. The proposed algorithm uses feature extraction methods designed around real-world sample sets and offers an alternative approach to optimizing the features extracted and supplied to the clustering algorithm. The proposed feature extraction methods achieved scores of 0.821 when tested on the same datasets as the general feature extractor.
The algorithm also includes auxiliary softmax classification branches, which report the class predicted by the feature extraction models. These branches allow the clusters formed when HDBSCAN is run on the extracted features to be labelled autonomously. The result is a fully automated, semi-supervised plankton taxonomy pipeline that achieves a classification score of 0.775 on a real-life sample set.
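The cluster-labelling step described above can be illustrated with a minimal sketch. It is not the thesis implementation: the voting rule (the class with the highest mean softmax probability within a cluster), the function name label_clusters, and the example class names are all assumptions made for illustration. The only convention taken from common HDBSCAN implementations is that outliers carry the cluster id -1.

```python
from collections import defaultdict


def label_clusters(cluster_ids, softmax_probs, class_names):
    """Assign each cluster the class with the highest mean softmax probability.

    cluster_ids   : one cluster id per sample; -1 marks an outlier
                    (the convention used by common HDBSCAN implementations).
    softmax_probs : one probability vector per sample, as produced by an
                    auxiliary softmax branch (assumed input).
    class_names   : class name for each position in the probability vector
                    (hypothetical names in the example below).
    """
    sums = defaultdict(lambda: [0.0] * len(class_names))
    counts = defaultdict(int)

    for cid, probs in zip(cluster_ids, softmax_probs):
        if cid == -1:
            # Outliers are left unlabelled rather than forced into a class.
            continue
        counts[cid] += 1
        for k, p in enumerate(probs):
            sums[cid][k] += p

    labels = {}
    for cid, total in sums.items():
        mean = [s / counts[cid] for s in total]
        best = max(range(len(mean)), key=mean.__getitem__)
        labels[cid] = class_names[best]
    return labels


# Toy example: five samples, two clusters, one outlier.
example_ids = [0, 0, 1, -1, 1]
example_probs = [
    [0.8, 0.2],
    [0.6, 0.4],
    [0.1, 0.9],
    [0.5, 0.5],
    [0.3, 0.7],
]
example_labels = label_clusters(example_ids, example_probs, ["copepod", "diatom"])
```

Averaging the probability vectors, rather than counting hard votes, lets confident predictions outweigh uncertain ones. A majority vote over the per-sample argmax would be an equally plausible choice.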
Description: Thesis (MA) -- Faculty of Engineering, the Built Environment, and Technology, 2022
Format: computer
Format: online resource
Format: application/pdf
Format: 1 online resource (158 pages)
Format: pdf
Publisher: Nelson Mandela University
Publisher: Faculty of Engineering, the Built Environment, and Technology
Language: English
Rights: Nelson Mandela University
Rights: All Rights Reserved
Rights: Open Access

Collections

NMU School of Engineering

File	Description	Size	Format
SOURCE1	Du Toit, IC.pdf	4 MB	Adobe Acrobat PDF