An analysis of the efficiency of ontology and symbolic learning algorithms in indigenous knowledge representation

Dzimba, Jesman

Title: An analysis of the efficiency of ontology and symbolic learning algorithms in indigenous knowledge representation
Creator: Dzimba, Jesman
Subject: Ontology; Computer algorithms
Date Issued: 2016
Date: 2016
Type: Thesis
Type: Masters
Type: MSc
Identifier: http://hdl.handle.net/10353/11939
Identifier: vital:39120
Description: Machine learning was undoubtedly a focus of the early days of artificial intelligence, but the early neural network approach suffered from shortcomings that led to a temporary decline in research activity. New symbolic learning techniques have since emerged, yielding promising results and reviving research in machine learning. Many researchers have focused on these techniques and experimented with them by comparing their performance across different applications. With that in mind, this research analyzed the symbolic approach against other approaches, such as the neural network (connectionist) approach, to evaluate the power of the former. This was done by first generating an ontology that represents collected indigenous knowledge, and then generating a dataset from that ontology. The dataset was made ambiguous in order to test the learning power of classifiers on such data. Two experiments were done, one using WEKA and the other using Orange, because no single tool contained all the required learning algorithms: the research wanted to use the ID3 and CN2 symbolic algorithms, but WEKA had ID3 and not CN2, while Orange had CN2 and not ID3. The most important attributes in the ontology regarding the indigenous knowledge were the name of the plant, the province in which it is found, and the disease the plant treats. The dataset therefore had two features, disease and province, and one label, the name of the plant. The learning algorithm was to use the two features to generate rules that predict the label. However, the dataset was ambiguous: two different labels could share the same feature values, which led to misclassification. This was the core of the research.
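The ambiguity described above can be illustrated with a small sketch. The Python code below is a minimal ID3-style decision tree (entropy and information gain over categorical features), not the WEKA implementation used in the thesis; the plant, disease and province values are hypothetical placeholders rather than data from the ontology. Two training rows share the same disease and province but carry different plant labels, so the tree ends in a mixed leaf and has to predict a single label for both.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def info_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one categorical feature."""
    total = len(rows)
    remainder = 0.0
    for value in set(row[feature] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def majority(labels):
    """Most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def id3(rows, labels, features):
    """Build a tree as nested dicts; leaves are plain label strings."""
    if len(set(labels)) == 1:
        return labels[0]              # pure node
    if not features:
        return majority(labels)       # conflicting rows collapse to one label here
    best = max(features, key=lambda f: info_gain(rows, labels, f))
    remaining = [f for f in features if f != best]
    node = {"feature": best, "default": majority(labels), "branches": {}}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        node["branches"][value] = id3([rows[i] for i in idx],
                                      [labels[i] for i in idx], remaining)
    return node


def predict(tree, row):
    """Follow branches to a leaf; unseen values fall back to the node's majority label."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["feature"]], tree["default"])
    return tree


# Hypothetical toy data: rows 0 and 1 have identical features but different plants.
rows = [
    {"disease": "disease_1", "province": "province_X"},
    {"disease": "disease_1", "province": "province_X"},
    {"disease": "disease_2", "province": "province_Y"},
    {"disease": "disease_3", "province": "province_X"},
]
labels = ["plant_A", "plant_B", "plant_C", "plant_D"]

tree = id3(rows, labels, ["disease", "province"])
predictions = [predict(tree, row) for row in rows]
print(predictions)  # ['plant_A', 'plant_A', 'plant_C', 'plant_D']
```

Row 1 (plant_B) is scored as a misclassification even though plant_A is also a valid answer for that disease and province, which is the situation the abstract goes on to discuss.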
Even though the learning model counted this situation as a misclassification, in real-world use a system built on the same learning model would still provide desired and correct results. The only flaw was that the learning model simply used one label for prediction and ignored the other label with the same features. This was identified as a flaw of both symbolic and non-symbolic algorithms: there is no way of giving suggestions when a user wants a different plant with the same features. Therefore, for classification on an ambiguous dataset, both approaches proved to have the aforementioned flaw. The research then used recall to analyze the power of these approaches. It was found that ID3 has better recall than the Multilayer Perceptron and Naïve Bayes algorithms when evaluated on the training set. ID3 recalled three of its classes clearly and effectively, each with a recall of 1, while Bayes Net had only one class with a recall of 1. To investigate recall further, cross-validation was used to compare the recall of the three algorithms, in an attempt to strengthen the assertion that ID3 indeed has better recall than the other two. Three stages of cross-validation were done: 10-fold, 20-fold, and 50-fold. In all stages of cross-validation, Bayes Net performed better in terms of recall than the other two algorithms. In cross-validation, MLP recalled above approximately 88% of the available instances, in contrast to the training-set evaluation, where it recalled only two of 18 instances. Overall, the symbolic approach proved to be commendable for use over the non-symbolic approach. The study of machine learning involves building learning algorithms, improving upon them, or comparing them.
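Per-class recall, as used above, is the fraction of a class's true instances that the classifier labels correctly. The sketch below computes it and generates the train/test splits of a plain k-fold cross-validation. It assumes unstratified, contiguous folds; WEKA's own cross-validation randomizes and stratifies the data, so its fold contents will differ. The function names are hypothetical, and the example labels are placeholders rather than thesis results.

```python
from collections import defaultdict


def per_class_recall(y_true, y_pred):
    """Recall for class c = correctly predicted instances of c / all true instances of c."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for true, pred in zip(y_true, y_pred):
        totals[true] += 1
        if true == pred:
            hits[true] += 1
    return {c: hits[c] / totals[c] for c in totals}


def k_fold_indices(n, k):
    """Yield (train, test) index lists for plain, unstratified k-fold cross-validation."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of instances")
    # Fold sizes differ by at most one instance.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


# Per-class recall for an ambiguous case: plant_B is never predicted.
y_true = ["plant_A", "plant_B", "plant_C", "plant_D"]
y_pred = ["plant_A", "plant_A", "plant_C", "plant_D"]
print(per_class_recall(y_true, y_pred))
# {'plant_A': 1.0, 'plant_B': 0.0, 'plant_C': 1.0, 'plant_D': 1.0}
```

In k-fold cross-validation every instance is used for testing exactly once, so recall is measured on data the model did not train on, unlike the training-set evaluation.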
The research raised awareness of improvements needed not only in symbolic algorithms but in non-symbolic ones as well. These include improving existing algorithms, or developing new ones, that suggest alternative predictions in cases of ambiguity instead of making a single misclassification without considering other possibilities.
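One possible direction for this improvement, sketched here as an assumption rather than a method from the thesis, is to keep every label seen for a given feature combination and return the labels ranked by frequency, so that an ambiguous query yields alternatives instead of a single forced label. The names fit_candidates and suggest are hypothetical, and the data are placeholders.

```python
from collections import Counter, defaultdict


def fit_candidates(rows, labels):
    """Map each (disease, province) tuple to a Counter of the plants seen with it."""
    table = defaultdict(Counter)
    for features, label in zip(rows, labels):
        table[features][label] += 1
    return table


def suggest(table, features, top_n=3):
    """Return up to top_n candidate labels, most frequent first; [] if unseen."""
    counts = table.get(features)  # .get does not insert a new key into the defaultdict
    if not counts:
        return []
    return [label for label, _ in counts.most_common(top_n)]


# Hypothetical placeholder data, not taken from the thesis dataset.
rows = [("disease_1", "province_X"), ("disease_1", "province_X"),
        ("disease_2", "province_Y")]
labels = ["plant_A", "plant_B", "plant_C"]

table = fit_candidates(rows, labels)
print(suggest(table, ("disease_1", "province_X")))  # ['plant_A', 'plant_B']
```

A lookup table like this only covers feature combinations seen in training; a learned model such as a decision tree or rule set could return the full class distribution at a leaf in the same spirit.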
Format: 125 leaves
Format: pdf
Publisher: University of Fort Hare
Publisher: Faculty of Science and Agriculture
Language: English
Rights: University of Fort Hare

Collections

UFH Department of Computer Science

File: Final Masters Jesman 2016.pdf (33 MB, Adobe Acrobat PDF)