Targeted attack detection by means of free and open source solutions
- Authors: Bernardo, Louis F
- Date: 2019
- Subjects: Computer networks -- Security measures , Information technology -- Security measures , Computer security -- Management , Data protection
- Language: English
- Type: text , Thesis , Masters , MSc
- Identifier: http://hdl.handle.net/10962/92269 , vital:30703
- Description: Compliance requirements are part of everyday business requirements in various sectors, such as retail and medical services. As part of compliance, it may be required to have infrastructure in place to monitor activities in the environment, to ensure that the relevant data and environment are sufficiently protected. At the core of such monitoring solutions one would find some type of data repository, or database, to store and ultimately correlate the captured events. Such solutions are commonly called Security Information and Event Management (SIEM) systems. Larger companies have been known to use commercial solutions such as IBM's QRadar, LogRhythm, or Splunk. However, these come at significant cost and are not suitable for smaller businesses with limited budgets. These solutions require manual configuration of event correlation to detect activities that place the environment in danger, which usually requires vendor implementation assistance that also comes at a cost. Alternatively, there are open source solutions that provide the required functionality. This research demonstrates the building of an open source solution, with minimal to no cost for hardware or software, while still maintaining the capability of detecting targeted attacks. The solution presented in this research includes Wazuh, which is a combination of OSSEC and the ELK stack, integrated with a Network Intrusion Detection System (NIDS). The success of the integration is determined by measuring positive attack detection across the different configuration options. To perform the testing, a deliberately vulnerable platform named Metasploitable was used as the victim host; its vulnerabilities were created specifically to serve as targets for Metasploit. The attacks were generated using the Metasploit Framework on a prebuilt Kali Linux host. (A hedged alert-monitoring sketch follows this record.)
- Full Text:
- Date Issued: 2019
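As a hedged illustration of the low-cost tooling such a Wazuh-based SIEM enables, the sketch below tails the line-delimited JSON alert log that a default Wazuh install writes and counts alerts by rule level. The log path and the `rule.level` field follow Wazuh's usual alert schema, but both should be verified against the installed version.

```python
import json
from collections import Counter

ALERTS_LOG = "/var/ossec/logs/alerts/alerts.json"  # default Wazuh path; adjust per install

def summarise_alerts(path: str) -> Counter:
    """Count alerts by rule level from Wazuh's line-delimited JSON alert log."""
    levels = Counter()
    with open(path) as fh:
        for line in fh:
            try:
                alert = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip partially written lines
            levels[alert.get("rule", {}).get("level", 0)] += 1
    return levels

if __name__ == "__main__":
    for level, count in sorted(summarise_alerts(ALERTS_LOG).items(), reverse=True):
        print(f"level {level:>2}: {count} alert(s)")
```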
Towards understanding and mitigating attacks leveraging zero-day exploits
- Authors: Smit, Liam
- Date: 2019
- Subjects: Computer crimes -- Prevention , Data protection , Hacking , Computer security , Computer networks -- Security measures , Computers -- Access control , Malware (Computer software)
- Language: English
- Type: text , Thesis , Masters , MSc
- Identifier: http://hdl.handle.net/10962/115718 , vital:34218
- Description: Zero-day vulnerabilities are unknown and therefore not addressed, with the result that they can be exploited by attackers to gain unauthorised system access. In order to understand and mitigate attacks leveraging zero-days or unknown techniques, it is necessary to study the vulnerabilities, exploits and attacks that make use of them. In recent years there have been a number of leaks publishing such attacks using various methods to exploit vulnerabilities. This research seeks to understand what types of vulnerabilities exist, why and how these are exploited, and how to defend against such attacks by mitigating either the vulnerabilities or the method / process of exploiting them. By moving beyond merely remedying the vulnerabilities to defences that are able to prevent or detect the actions taken by attackers, the security of the information system will be better positioned to deal with future unknown threats. An interesting finding is how attackers move beyond the observable bounds of a system to circumvent security defences, for example, by compromising syslog servers, or going down to lower system rings to gain access. However, defenders can counter this by employing defences that are external to the system, preventing attackers from disabling them or removing collected evidence after gaining system access. Attackers are able to defeat air-gaps via the leakage of electromagnetic radiation, as well as misdirect attribution by planting false artefacts for forensic analysis and by attacking from third-party information systems. They analyse the methods of other attackers to learn new techniques. An example of this is the Umbrage project, whereby malware is analysed to decide whether it should be implemented as a proof of concept. Another important finding is that attackers respect defence mechanisms such as remote syslog (e.g. firewall), core dump files, database auditing, and Tripwire (e.g. SlyHeretic). These defences all have the potential to result in the attacker being discovered. Attackers must either negate the defence mechanism or find unprotected targets. Defenders can use technologies such as encryption to defend against interception and man-in-the-middle attacks. They can also employ honeytokens and honeypots to alarm, misdirect, slow down and learn from attackers. By employing various tactics, defenders are able to improve both their chance of detecting attacks and their time to react, even to attacks exploiting hitherto unknown vulnerabilities. To summarize the information presented in this thesis and to show its practical importance, an examination is presented of the NSA's network intrusion into the SWIFT organisation. It shows that the firewalls were exploited with remote code execution zero-days. This attack has a striking parallel in the approach used by the recent VPNFilter malware. If nothing else, the leaks provide information to other actors on how to attack and what to avoid. However, by studying state actors, we can gain insight into what other actors with fewer resources may do in the future. (A minimal honeytoken sketch follows this record.)
- Full Text:
- Date Issued: 2019
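The honeytoken defence noted in the abstract can be prototyped in a few lines. The sketch below is a minimal, assumption-laden illustration (the decoy path, poll interval and bait content are arbitrary, and it is not tooling from the thesis): it plants a decoy credentials file and polls its access time, flagging any read.

```python
import os
import time

BAIT = "/tmp/passwords-backup.txt"  # decoy path; any enticing name works

def plant_honeytoken(path: str) -> float:
    """Write a decoy file and record its initial access time."""
    with open(path, "w") as fh:
        fh.write("admin:not-a-real-password\n")
    return os.stat(path).st_atime

def watch(path: str, baseline: float, interval: float = 5.0) -> None:
    """Poll the access time; any read of the bait file changes it.

    Note: filesystems mounted with noatime will not update access times,
    so a real deployment would hook file events instead of polling.
    """
    while True:
        time.sleep(interval)
        atime = os.stat(path).st_atime
        if atime > baseline:
            print(f"ALERT: honeytoken {path} was accessed")
            baseline = atime

if __name__ == "__main__":
    watch(BAIT, plant_honeytoken(BAIT))
```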
A comparison of exact string search algorithms for deep packet inspection
- Authors: Hunt, Kieran
- Date: 2018
- Subjects: Algorithms , Firewalls (Computer security) , Computer networks -- Security measures , Intrusion detection systems (Computer security) , Deep Packet Inspection
- Language: English
- Type: text , Thesis , Masters , MSc
- Identifier: http://hdl.handle.net/10962/60629 , vital:27807
- Description: Every day, computer networks throughout the world face a constant onslaught of attacks. To combat these, network administrators are forced to employ a multitude of mitigating measures. Devices such as firewalls and Intrusion Detection Systems are prevalent today and employ extensive Deep Packet Inspection to scrutinise each piece of network traffic. Systems such as these usually require specialised hardware to meet the demand imposed by high-throughput networks. Hardware like this is extremely expensive and singular in its function. It is with this in mind that the string search algorithms are introduced. These algorithms have been proven to perform well when searching through large volumes of text and may be able to perform equally well in the context of Deep Packet Inspection. String search algorithms are designed to match a single pattern to a substring of a given piece of text. This is not unlike the heuristics employed by traditional Deep Packet Inspection systems. This research compares the performance of a large number of string search algorithms during packet processing. Deep Packet Inspection places stringent restrictions on the reliability and speed of the algorithms due to increased performance pressures. A test system had to be designed in order to properly test the string search algorithms in the context of Deep Packet Inspection. The system allowed for precise and repeatable tests of each algorithm and then for their comparison. Of the algorithms tested, the Horspool and Quick Search algorithms posted the best results for both speed and reliability, while the Not So Naive and Rabin-Karp algorithms were slowest overall. (A sketch of the Horspool algorithm follows this record.)
- Full Text:
- Date Issued: 2018
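For reference, a plain Python rendering of the Boyer-Moore-Horspool algorithm that the study found among the fastest; the search skips ahead using a bad-character table built from every pattern byte except the last.

```python
def horspool(pattern: bytes, text: bytes) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return 0 if m == 0 else -1
    # Bad-character shift table: distance of each byte (except the last)
    # from the end of the pattern; bytes not in the table shift by m.
    shift = {b: m - 1 - i for i, b in enumerate(pattern[:-1])}
    i = 0
    while i <= n - m:
        if text[i:i + m] == pattern:
            return i
        # Shift the window by the value for its last byte.
        i += shift.get(text[i + m - 1], m)
    return -1

# Example: locate a protocol keyword inside a packet payload.
assert horspool(b"GET /", b"POST /x HTTP\r\nGET / HTTP/1.1") == 14
```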
A convenient approach to the deterministic routing of MIDI messages
- Authors: Shaw, Brent Roy
- Date: 2018
- Subjects: MIDI (Standard) , Microcontrollers , XMOS Limited , Computer architecture , Embedded computer systems
- Language: English
- Type: text , Thesis , Masters , MSc
- Identifier: http://hdl.handle.net/10962/63256 , vital:28387
- Description: This research investigates the design and development of a Wireless MIDI Connection Management solution in order to create a deterministic MIDI transmission system. An investigation of the MIDI protocol shows it to have certain limitations that can be overcome through the use of transmission solutions. These solutions can be used to improve the versatility of MIDI while overcoming MIDI's notorious cable-length limitation. XMOS's deterministic XS1 microcontrollers are used to enable the design of a real-time system. The MIDINet system is investigated to identify both the strengths and weaknesses of such a connection management system, while other systems for network transmission of MIDI messages are also reviewed. These investigations lead to a design concept for a new network MIDI transmission system that allows for the remote management of connections. The design and subsequent implementation of both the transmission system and the connection management system are then detailed. A testing methodology is then devised to allow the newly created connection management system to be compared to the MIDINet system. The findings show the deterministic system to have lower latency than the MIDINet system, while utilising more compact and power-efficient hardware. (A message-decoding sketch follows this record.)
- Full Text:
- Date Issued: 2018
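To ground the protocol discussion, the sketch below decodes the three-byte MIDI 1.0 channel-voice messages (Note On/Off, Control Change) that a router such as this would forward. The byte layout follows the MIDI 1.0 specification; the function and table names are illustrative only.

```python
# MIDI 1.0 channel-voice messages: a status byte (high bit set) carries the
# message type in its upper nibble and the channel in its lower nibble,
# followed by two 7-bit data bytes.
STATUS_NAMES = {0x8: "note_off", 0x9: "note_on", 0xB: "control_change"}

def parse_midi(msg: bytes):
    """Decode a 3-byte channel-voice message into (type, channel, data1, data2)."""
    status, data1, data2 = msg
    if not status & 0x80:
        raise ValueError("first byte must be a status byte")
    kind = STATUS_NAMES.get(status >> 4, "other")
    return kind, status & 0x0F, data1, data2

# Example: Note On, channel 0 (0x90), middle C (60), velocity 100.
assert parse_midi(bytes([0x90, 60, 100])) == ("note_on", 0, 60, 100)
```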
A framework for malicious host fingerprinting using distributed network sensors
- Authors: Hunter, Samuel Oswald
- Date: 2018
- Subjects: Computer networks -- Security measures , Malware (Computer software) , Multisensor data fusion , Distributed Sensor Networks , Automated Reconnaissance Framework , Latency Based Multilateration
- Language: English
- Type: text , Thesis , Masters , MSc
- Identifier: http://hdl.handle.net/10962/60653 , vital:27811
- Description: Numerous software agents exist and are responsible for the increasing volumes of malicious traffic observed on the Internet today. From a technical perspective, the existing techniques for monitoring malicious agents and traffic were not developed to allow for the interrogation of the source of malicious traffic. This interrogation, or reconnaissance, would be considered active analysis, as opposed to existing, mostly passive analysis. Unlike passive analysis, the active techniques are time-sensitive and their results become increasingly inaccurate as the time delta between observation and interrogation increases. In addition to this, some studies have shown that the geographic separation of hosts on the Internet has resulted in pockets of different malicious agents and traffic targeting victims. As such, it is important to perform any kind of data collection from various sources and across distributed IP address space. The data gathering and exposure capabilities of sensors such as honeypots and network telescopes were extended through the development of near-realtime Distributed Sensor Network modules that allowed for the near-realtime analysis of malicious traffic from distributed, heterogeneous monitoring sensors. In order to utilise the data exposed by the near-realtime Distributed Sensor Network modules, an Automated Reconnaissance Framework (AR-Framework) was created; this framework was tasked with active and passive information collection and analysis of data in near-realtime, and was designed from an adapted Multi Sensor Data Fusion model. The hypothesis was made that if sufficiently distinct characteristics of a host could be identified, then, combined, they could act as a unique fingerprint for that host, potentially allowing for the re-identification of that host even if its IP address had changed. To this end the concept of Latency Based Multilateration was introduced, acting as an additional metric for remote host fingerprinting. The vast amount of information gathered by the AR-Framework required the development of visualisation tools which could illustrate this data in near-realtime and also provide various degrees of interaction to accommodate human interpretation of such data. Ultimately, the data collected through the application of the near-realtime Distributed Sensor Network and the AR-Framework provided a unique perspective of a malicious host demographic, allowing new correlations to be drawn between attributes such as common open ports and operating systems, location, and the inferred intent of these malicious hosts. The result expands our current understanding of malicious hosts on the Internet and enables further research in the area. (A toy latency-fingerprint sketch follows this record.)
- Full Text:
- Date Issued: 2018
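A toy reading of the Latency Based Multilateration idea, under stated assumptions: each distributed sensor contributes one round-trip time to the target, the ordered RTT vector serves as one fingerprint component, and two observations are compared by Euclidean distance. The threshold and values below are invented for illustration, not drawn from the thesis.

```python
import math

def latency_vector(rtts_ms: list) -> tuple:
    """One RTT (ms) per distributed sensor, in a fixed sensor order."""
    return tuple(rtts_ms)

def distance(a: tuple, b: tuple) -> float:
    """Euclidean distance between two latency vectors from the same sensors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def likely_same_host(a: tuple, b: tuple, threshold_ms: float = 15.0) -> bool:
    """Crude re-identification test: close latency vectors suggest one host."""
    return distance(a, b) < threshold_ms

# A host seen from three sensors, then re-observed later on a new IP address.
first = latency_vector([120.4, 45.2, 210.9])
later = latency_vector([118.7, 47.1, 208.3])
print(likely_same_host(first, later))  # True: the vectors differ by a few ms
```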
An analysis of fusing advanced malware email protection logs, malware intelligence and active directory attributes as an instrument for threat intelligence
- Authors: Vermeulen, Japie
- Date: 2018
- Subjects: Malware (Computer software) , Computer networks -- Security measures , Data mining , Phishing , Data logging , Quantitative research
- Language: English
- Type: text , Thesis , Masters , MSc
- Identifier: http://hdl.handle.net/10962/63922 , vital:28506
- Description: After more than four decades, email is still the most widely used electronic communication medium today. This medium has evolved into an electronic weapon of choice for cyber criminals ranging from the novice to the elite. As cyber criminals evolve their tools, tactics and procedures, so too are technology vendors coming forward with a variety of advanced malware protection systems. However, even if an organization adopts such a system, there is still the daily challenge of interpreting the log data and understanding the type of malicious email attack, including who the target was and what the payload was. This research examines a six-month data set obtained from an advanced malware email protection system at a bank in South Africa. Extensive data fusion techniques are used to provide deeper insight into the data by blending it with malware intelligence and business context. The primary data set is fused with malware intelligence to identify the different malware families associated with the samples. Active Directory attributes such as the business cluster, department and job title of users targeted by malware are also fused into the combined data. This study provides insight into malware attacks experienced in the South African financial services sector. For example, most of the malware samples identified belonged to different types of ransomware families distributed by known botnets. However, indicators of targeted attacks were observed, based on particular employees being targeted with exploit code and specific strains of malware. Furthermore, a short time span between newly discovered vulnerabilities and the use of malicious code to exploit such vulnerabilities through email was observed in this study. The fused data set provided the context to answer the “who”, “what”, “where” and “when”. The proposed methodology can be applied to any organization to provide insight into the malware threats identified by advanced malware email protection systems. In addition, the fused data set provides threat intelligence that could be used to strengthen the cyber defences of an organization against cyber threats. (A minimal fusion sketch follows this record.)
- Full Text:
- Date Issued: 2018
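A minimal sketch of the fusion step described above, with hypothetical field names throughout (real email-protection logs and AD exports will differ): each alert is left-joined to Active Directory attributes on the recipient's account name, so that it gains business context such as department and job title.

```python
# Hypothetical records; real logs and AD exports carry different fields.
email_alerts = [
    {"recipient": "jsmith", "malware_family": "Locky", "subject": "Invoice 8842"},
    {"recipient": "pmokoena", "malware_family": "Dridex", "subject": "Payment advice"},
]
active_directory = {
    "jsmith": {"department": "Treasury", "title": "Analyst", "cluster": "Retail"},
    "pmokoena": {"department": "IT", "title": "Engineer", "cluster": "Operations"},
}

def fuse(alerts, directory):
    """Left-join alerts to AD attributes; unknown accounts are flagged."""
    for alert in alerts:
        context = directory.get(alert["recipient"], {"department": "UNKNOWN"})
        yield {**alert, **context}

for enriched in fuse(email_alerts, active_directory):
    print(enriched["malware_family"], "->", enriched["department"],
          enriched.get("title", "?"))
```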
Building IKhwezi, a digital platform to capture everyday Indigenous Knowledge for improving educational outcomes in marginalised communities
- Authors: Ntšekhe, Mathe V K
- Date: 2018
- Subjects: Information technology , Knowledge management , Traditional ecological knowledge , Pedagogical content knowledge , Traditional ecological knowledge -- Technological innovations , IKhwezi , ICT4D , Indigenous Technological Pedagogical Content Knowledge (I-TPACK) , Siyakhula Living Lab
- Language: English
- Type: text , Thesis , Doctoral , PhD
- Identifier: http://hdl.handle.net/10962/62505 , vital:28200
- Description: Aptly captured in the name, the broad mandate of Information and Communications Technologies for Development (ICT4D) is to facilitate the use of Information and Communication Technologies (ICTs) in society to support development. Education, as often stated, is the cornerstone of development, imparting knowledge for conceiving and realising development. In this thesis, we explore how everyday Indigenous Knowledge (IK) can be collected digitally to enhance the educational outcomes of learners from marginalised backgrounds, by stimulating the production of teaching and learning materials that include local imagery to have resonance with the learners. As part of the exploration, we reviewed a framework known as Technological Pedagogical Content Knowledge (TPACK), which spells out the different kinds of knowledge needed by teachers to teach effectively with ICTs. In this framework, IK is not present explicitly, but only through the concept of context(s). Using Afrocentric and Pan-African scholarship, we argue that this logic is linked to colonialism and that a critical decolonising pedagogy necessarily demands the explication of IK: to make visible the cultures of the learners in the margins (e.g. Black rural learners). On the strength of this argument, we have proposed that TPACK be augmented to become Indigenous Technological Pedagogical Content Knowledge (I-TPACK). Through this augmentation, I-TPACK becomes an Afrocentric framework for multicultural education in the digital era. The design of the digital platform for capturing IK relevant to formal education was done in the Siyakhula Living Lab (SLL). The core idea of a Living Lab (LL) is that users must be understood in the context of their lived everyday reality. Further, they must be involved as co-creators in the design and innovation processes. On a methodological level, the LL environment allowed for the fusing together of multiple methods that can help to create a fitting solution. In this thesis, we followed an iterative user-centred methodology rooted in ethnography and phenomenology. Specifically, through long-term conversations and interaction with teachers and ethnographic observations, we conceptualized a platform, IKhwezi, that facilitates the collection of context-sensitive content, collaboratively, and with cost and convenience in mind. We implemented this platform using MediaWiki, based on a number of considerations. From the ICT4D disciplinary point of view, a major consideration was being open to the possibility that other forms of innovation, and not just ‘technovelty’ (i.e. technological/technical innovation), can provide a breakthrough or ingenious solution to the problem at hand. In a sense, we were reinforcing the growing sentiment within the discipline that technology is not the goal, but the means to foregrounding the commonality of the human experience in working towards development. Testing confirmed that there is some value in the platform, despite the challenges of onboarding users in pursuit of more content that could bolster the value of everyday IK in improving the educational outcomes of all learners.
- Full Text:
- Date Issued: 2018
De-identification of personal information for use in software testing to ensure compliance with the Protection of Personal Information Act
- Authors: Mark, Stephen John
- Date: 2018
- Subjects: Data processing , Information technology -- Security measures , Computer security -- South Africa , Data protection -- Law and legislation -- South Africa , Data encryption (Computer science) , Python (Computer program language) , SQL (Computer program language) , Protection of Personal Information Act (POPI)
- Language: English
- Type: text , Thesis , Masters , MSc
- Identifier: http://hdl.handle.net/10962/63888 , vital:28503
- Description: Encryption of Personally Identifiable Information stored in a Structured Query Language (SQL) database has been difficult for a long time. This is owing to block-cipher encryption algorithms changing the length and type of the input data when encrypting, such that the ciphertext cannot subsequently be stored in the database without altering its structure. As the enactment of the South African Protection of Personal Information Act, No 4 of 2013 (POPI), was set in motion with the appointment of the Information Regulator's Office in December 2016, South African companies are intensely focused on implementing compliance strategies and processes. The legislation, promulgated in 2013, encompasses the processing and storage of personally identifiable information (PII), ensuring that corporations act responsibly when collecting, storing and using individuals' personal data. The Act comprises eight broad conditions that will become legislation once the new Information Regulator's Office is fully equipped to carry out its duties. POPI requires that individuals' data be kept confidential from all but those who specifically have permission to access the data. This means that not all members of IT teams should have access to the data unless it has been de-identified. This study tests an implementation of the FF1 (Fixed Feistel 1) algorithm from the National Institute of Standards and Technology (NIST) “Special Publication 800-38G: Recommendation for Block Cipher Modes of Operation: Methods for Format-Preserving Encryption” using the LibFFX Python library. The Python scripting language was used for the experiments. The research shows that it is indeed possible to encrypt data in an SQL database without changing the database schema, using the new format-preserving encryption technique from NIST 800-38G. Quality Assurance software testers can then run their full set of tests on the encrypted database. There is no reduction in encryption strength when using the FF1 encryption technique compared to the underlying AES-128 encryption algorithm. It further shows that the utility of the data is not lost once it is encrypted. (A toy format-preserving sketch follows this record.)
- Full Text:
- Date Issued: 2018
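To make the format-preserving idea concrete, the sketch below implements a toy balanced-Feistel cipher over even-length digit strings with an HMAC-based round function. It is not the standardised FF1 algorithm (FF1's round function, tweak handling and round count differ), but it shows the defining property: ciphertext with the same length and alphabet as the plaintext, so it fits the original database column.

```python
import hashlib
import hmac

def _round(key: bytes, r: int, half: str, width: int) -> int:
    """Pseudorandom round function reduced modulo 10**width."""
    digest = hmac.new(key, bytes([r]) + half.encode(), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % 10 ** width

def fpe_encrypt(key: bytes, digits: str, rounds: int = 10) -> str:
    """Encrypt an even-length decimal string to another decimal string."""
    w = len(digits) // 2
    left, right = digits[:w], digits[w:]
    for r in range(rounds):
        # Feistel step: swap halves, mixing the round function into one side.
        left, right = right, f"{(int(left) + _round(key, r, right, w)) % 10 ** w:0{w}d}"
    return left + right

def fpe_decrypt(key: bytes, digits: str, rounds: int = 10) -> str:
    """Invert fpe_encrypt by undoing the rounds in reverse order."""
    w = len(digits) // 2
    left, right = digits[:w], digits[w:]
    for r in reversed(range(rounds)):
        left, right = f"{(int(right) - _round(key, r, left, w)) % 10 ** w:0{w}d}", left
    return left + right

token = fpe_encrypt(b"secret key", "4111111111111111")
assert len(token) == 16 and token.isdigit()  # same length, same alphabet
assert fpe_decrypt(b"secret key", token) == "4111111111111111"
```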
Designing and prototyping WebRTC and IMS integration using open source tools
- Authors: Motsumi, Tebagano Valerie
- Date: 2018
- Subjects: Internet Protocol multimedia subsystem , Session Initiation Protocol (Computer network protocol) , Computer software -- Development , Web Real-time Communications (WebRTC)
- Language: English
- Type: text , Thesis , Masters , MSc
- Identifier: http://hdl.handle.net/10962/63245 , vital:28386
- Description: WebRTC, or Web Real-time Communications, is a collection of web standards that detail the mechanisms, architectures and protocols that work together to deliver real-time multimedia services to the web browser. It represents a significant shift from the historical approach of using browser plugins, which over time have proven cumbersome and problematic. Furthermore, it adopts various Internet standards in areas such as identity management, peer-to-peer connectivity, data exchange and media encoding, to provide a system that is truly open and interoperable. Given that WebRTC enables the delivery of multimedia content to any Internet Protocol (IP)-enabled device capable of hosting a web browser, this technology could potentially be used and deployed over millions of smartphones, tablets and personal computers worldwide. This service and device convergence remains an important goal of telecommunication network operators, who seek to enable it through a converged network based on the IP Multimedia Subsystem (IMS). IMS is an IP-based subsystem that sits at the core of a modern telecommunication network and acts as the main routing substrate for media services and applications such as those that WebRTC realises. The combination of WebRTC and IMS represents an attractive coupling, and as such, a protracted investigation could help to answer important questions around the technical challenges involved in their integration and the merits of the various design alternatives that present themselves. This thesis is the result of such an investigation and culminates in the presentation of a detailed architectural model that is validated with a prototypical implementation in an open source testbed. The model is built on six requirements which emerge from an analysis of the literature, including previous interventions in IMS networks and a key technical report on design alternatives. Furthermore, this thesis argues that the client architecture requires support for web-oriented signalling, identity and call handling techniques, leading to the potential for IMS networks to natively support these techniques as operator networks continue to grow and develop. The proposed model advocates the use of SIP over WebSockets for signalling and DTLS-SRTP for media to enable one-to-one communication, and can be extended through additional functions, resulting in a modular architecture. The model was implemented using open source tools which were assembled to create an experimental network testbed, and tests were conducted demonstrating successful cross-domain communications under various conditions. The thesis has a strong focus on enabling ordinary software developers to assemble a prototypical network such as the one presented, and aims to enable experimentation in application use cases for integrated environments. (A signalling sketch follows this record.)
- Full Text:
- Date Issued: 2018
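A rough sketch of the signalling leg the thesis advocates, SIP over WebSockets (RFC 7118), assuming the third-party `websockets` Python package and a hypothetical gateway URI; the REGISTER message is heavily simplified, and a real proxy would answer with a 401 challenge requiring authentication.

```python
# Requires: pip install websockets
import asyncio
import websockets

# Hypothetical endpoint; a real deployment terminates wss:// at a SIP proxy
# or gateway that understands the "sip" WebSocket subprotocol (RFC 7118).
WS_URI = "wss://ims-gw.example.net/ws"

REGISTER = (
    "REGISTER sip:example.net SIP/2.0\r\n"
    "Via: SIP/2.0/WSS client.invalid;branch=z9hG4bK1\r\n"
    "Max-Forwards: 70\r\n"
    "From: <sip:alice@example.net>;tag=a1\r\n"
    "To: <sip:alice@example.net>\r\n"
    "Call-ID: deadbeef@client.invalid\r\n"
    "CSeq: 1 REGISTER\r\n"
    "Contact: <sip:alice@client.invalid;transport=ws>\r\n"
    "Content-Length: 0\r\n\r\n"
)

async def register() -> None:
    """Open a WebSocket with the SIP subprotocol and send a bare REGISTER."""
    async with websockets.connect(WS_URI, subprotocols=["sip"]) as ws:
        await ws.send(REGISTER)
        print(await ws.recv())  # expect a 401 challenge or 200 OK

asyncio.run(register())
```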
Gaining cyber security insight through an analysis of open source intelligence data: an East African case study
- Authors: Chindipha, Stones Dalitso
- Date: 2018
- Subjects: Open source intelligence -- Africa, East , Computer security -- Africa, East , Computer networks -- Security measures -- Africa, East , Denial of service attacks -- Africa, East , Sentient Hyper-Optimised Data Access Network (SHODAN) , Internet Background Radiation (IBR)
- Language: English
- Type: text , Thesis , Masters , MSc
- Identifier: http://hdl.handle.net/10962/60618 , vital:27805
- Description: With each passing year the number of Internet users and connected devices grows, and this is particularly so in Africa. This growth brings with it an increase in the prevalence of cyber-attacks. Looking at the current state of affairs, cybersecurity incidents are likely to increase in African countries, mainly due to the increased prevalence and affordability of broadband connectivity, coupled with a lack of online security awareness. The adoption of mobile banking has aggravated the situation, making the continent more attractive to hackers who bank on the malpractices of users. Using Open Source Intelligence (OSINT) data sources like the Sentient Hyper-Optimised Data Access Network (SHODAN) and Internet Background Radiation (IBR), this research explores the prevalence of vulnerabilities and their accessibility to cyber threat actors. The research focuses on the East African Community (EAC), comprising Tanzania, Kenya, Malawi, and Uganda. An IBR data set collected by a Rhodes University network telescope spanning 72 months was used in this research, along with two snapshot periods of data from the SHODAN project. The findings show that there is significant risk to systems within the EAC, particularly when using the SHODAN data. The MITRE CVSS threat scoring system was applied in this research, using FREAK and Heartbleed as sample vulnerabilities identified in the EAC. When looking at IBR, the research has shown that attackers can use either destination ports or source IP addresses to perform an attack which, if not attended to, may be reused year after year, later moving into the allocated IP address space once random probing begins. The moment such an attack finds one vulnerable client on the network, it spreads throughout like a worm; DDoS is one of the attacks that can be identified from IBR. Since the SHODAN dataset had two collection points, the study has shown the changes that occurred in Malawi and Tanzania over a period of 14 months using three variables, i.e. device type, operating systems, and ports. The research has also identified vulnerable devices in all four countries. Apart from that, the study identified operating systems, products, OpenSSL, ports and ISPs as some of the variables that can be used to identify vulnerabilities in systems. In the case of OpenSSL and products, this research went further by identifying the type of attack that can occur and its associated CVE-ID. (A minimal SHODAN query sketch follows this record.)
- Full Text:
- Date Issued: 2018
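A minimal sketch of the kind of SHODAN lookup used in such a study, assuming the official `shodan` Python client and a valid API key; the country codes cover the four states examined, and the fields read from each match (`ip_str`, `port`, `org`) follow the client's usual result layout.

```python
# Requires: pip install shodan, plus a valid API key.
import shodan

api = shodan.Shodan("YOUR_API_KEY")  # placeholder key

# Survey exposed services per country in the study's scope.
for cc in ("TZ", "KE", "MW", "UG"):
    results = api.search(f"country:{cc}", limit=100)  # limit keeps the demo small
    print(cc, "total matches:", results["total"])
    for match in results["matches"][:3]:
        print("  ", match["ip_str"], match.get("port"), match.get("org", "n/a"))
```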
Investigating combinations of feature extraction and classification for improved image-based multimodal biometric systems at the feature level
- Authors: Brown, Dane
- Date: 2018
- Language: English
- Type: text , Thesis , Doctoral , PhD
- Identifier: http://hdl.handle.net/10962/63470 , vital:28414
- Description: Multimodal biometrics has become a popular means of overcoming the limitations of unimodal biometric systems. However, the rich information particular to the feature level is of a complex nature, and leveraging its potential without overfitting a classifier is not well studied. This research investigates feature-classifier combinations on the fingerprint, face, palmprint, and iris modalities to effectively fuse their feature vectors for a complementary result. The effects of different feature-classifier combinations are thus isolated to identify novel or improved algorithms. A new face segmentation algorithm is shown to increase consistency in nominal and extreme scenarios. Moreover, two novel feature extraction techniques demonstrate better adaptation to dynamic lighting conditions, while reducing feature dimensionality to the benefit of classifiers. A comprehensive set of unimodal experiments is carried out to evaluate both verification and identification performance on a variety of datasets, using four classifiers, namely Eigen, Fisher, Local Binary Pattern Histogram and linear Support Vector Machine, on various feature extraction methods. The proposed algorithms are shown to outperform the vast majority of related studies when using the same dataset under the same test conditions. In the unimodal comparisons presented, the proposed approaches outperform existing systems even when given a handicap such as fewer training samples or data with a greater number of classes. A separate comprehensive set of experiments on feature fusion shows that combining modality data provides a substantial increase in accuracy, with only a few exceptions that occur when differences in the image data quality of two modalities are substantial. However, when two poor-quality datasets are fused, noticeable gains in recognition performance are realized when using the novel feature extraction approach. Finally, feature-fusion guidelines are proposed to provide the necessary insight to leverage the rich information effectively when fusing multiple biometric modalities at the feature level. These guidelines serve as the foundation to better understand and construct biometric systems that are effective in a variety of applications. (A schematic fusion sketch follows this record.)
- Full Text:
- Date Issued: 2018
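The feature-level fusion described above can be sketched as simple vector concatenation ahead of a single classifier. The sketch below uses random arrays in place of real face and palmprint feature vectors and a linear SVM from scikit-learn; the thesis's own extraction, normalisation and datasets are not reproduced.

```python
# Minimal sketch of feature-level fusion: concatenate per-modality feature
# vectors, then train one linear SVM on the fused representation.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, d_face, d_palm = 200, 128, 64
face_feats = rng.normal(size=(n, d_face))   # stand-in for face features
palm_feats = rng.normal(size=(n, d_palm))   # stand-in for palmprint features
labels = rng.integers(0, 10, size=n)        # 10 hypothetical subjects

fused = np.hstack([face_feats, palm_feats])  # feature-level fusion
X_tr, X_te, y_tr, y_te = train_test_split(fused, labels, random_state=0)

# Scaling the fused vector helps keep the higher-dimensional input
# from overfitting the classifier.
clf = make_pipeline(StandardScaler(), LinearSVC(max_iter=5000))
clf.fit(X_tr, y_tr)
print("identification accuracy:", clf.score(X_te, y_te))
```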
NetwIOC: a framework for the automated generation of network-based IOCs for malware information sharing and defence
- Authors: Rudman, Lauren Lynne
- Date: 2018
- Subjects: Malware (Computer software) , Computer networks -- Security measures , Computer security , Python (Computer program language)
- Language: English
- Type: text , Thesis , Masters , MSc
- Identifier: http://hdl.handle.net/10962/60639 , vital:27809
- Description: With the substantial number of new malware variants found each day, it is useful to have an efficient way to retrieve Indicators of Compromise (IOCs) from the malware in a format suitable for sharing and detection. In the past, these indicators were manually created after inspection of binary samples and network traffic. The Cuckoo Sandbox is an existing dynamic malware analysis system which meets the requirements for the proposed framework and was extended by adding a few custom modules. This research explored a way to automate the generation of detailed network-based IOCs in a popular format which can be used for sharing. This was done through careful filtering and analysis of the PCAP file generated by the sandbox, and placing these values into the correct type of STIX objects using Python. Through several evaluations, an analysis of what type of network traffic can be expected for the creation of IOCs was conducted, including a brief case study that examined the effect of analysis time on the number of IOCs created. Using the automatically generated IOCs to create defence and detection mechanisms for the network was evaluated and proved successful. A proof-of-concept sharing platform developed for the STIX IOCs is showcased at the end of the research.
- Full Text:
- Date Issued: 2018
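A minimal sketch of the kind of pipeline described above: destination IP addresses are pulled from a sandbox PCAP (here with scapy) and wrapped in STIX Indicator objects via the stix2 library. The capture filename and the absence of any traffic filtering are simplifications; the actual NetwIOC modules integrate with Cuckoo and apply careful whitelisting before emitting indicators.

```python
# Minimal sketch: PCAP -> STIX 2 network indicators.
from datetime import datetime, timezone

from scapy.all import rdpcap, IP
from stix2 import Indicator, Bundle

packets = rdpcap("analysis.pcap")  # placeholder path to the sandbox capture
dst_ips = {pkt[IP].dst for pkt in packets if IP in pkt}  # unfiltered for brevity

indicators = [
    Indicator(
        name=f"Malware network traffic to {ip}",
        pattern=f"[ipv4-addr:value = '{ip}']",
        pattern_type="stix",
        valid_from=datetime.now(timezone.utc),
    )
    for ip in sorted(dst_ips)
]
# A STIX Bundle is the shareable unit for platforms and feeds.
print(Bundle(objects=indicators).serialize(pretty=True))
```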
Practical application of distributed ledger technology in support of digital evidence integrity verification processes
- Authors: Weilbach, William Thomas
- Date: 2018
- Subjects: Digital forensic science , Blockchains (Databases) , Bitcoin , Distributed databases , Computer systems -- Verification
- Language: English
- Type: text , Thesis , Masters , MSc
- Identifier: http://hdl.handle.net/10962/61872 , vital:28070
- Description: After its birth in cryptocurrencies, distributed ledger (blockchain) technology rapidly grew in popularity in other technology domains. Alternative applications of this technology range from digitizing the bank guarantees process for commercial property leases (Anz and IBM, 2017) to tracking the provenance of high-value physical goods (Everledger Ltd., 2017). As a whole, distributed ledger technology has acted as a catalyst for the rise of many innovative solutions to existing problems, mostly associated with trust and integrity. In this research, a niche application of this technology is proposed for use in digital forensics: providing a mechanism for the transparent and irrefutable verification of digital evidence, ensuring its integrity, since established blockchains serve as an ideal mechanism against which to store and validate arbitrary data. Evaluation and identification of candidate technologies in this domain is based on a set of requirements derived from previous work in this field (Weilbach, 2014). OpenTimestamps (Todd, 2016b) is chosen as the foundation of further work for its robust architecture, transparent nature and multi-platform support. A robust evaluation and discussion of OpenTimestamps is performed to reinforce why it can be trusted as an implementation and protocol. An implementation of OpenTimestamps is designed for the popular open source forensic tool, Autopsy, and an Autopsy module is subsequently developed and released to the public. OpenTimestamps is tested at scale and found to have insignificant error rates for the verification of timestamps. Through practical implementation and extensive testing, it is shown that OpenTimestamps has the potential to significantly advance the practice of digital evidence integrity verification. A conclusion is reached by discussing some of the limitations of OpenTimestamps in terms of accuracy and error rates. It is shown that although OpenTimestamps makes very specific timing claims in its attestations, with a near-zero error rate, an attestation is in fact accurate only to within a day. This is followed by proposing potential avenues for future work.
- Full Text:
- Date Issued: 2018
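The timestamping workflow that the Autopsy module automates can be approximated with the OpenTimestamps client (pip install opentimestamps-client). The sketch below drives the real ots commands from Python; the evidence filename is a placeholder, and the Bitcoin attestation only becomes independently verifiable once the submitted hash has been anchored in a block, hence the upgrade step.

```python
# Minimal sketch: stamp and verify a disk image with the ots CLI.
import subprocess

image = "evidence.dd"  # placeholder evidence file

# "ots stamp" hashes the file, submits the hash to calendar servers and
# writes evidence.dd.ots alongside the original.
subprocess.run(["ots", "stamp", image], check=True)

# Later, once the hash is anchored in Bitcoin, upgrade the proof and verify it.
subprocess.run(["ots", "upgrade", image + ".ots"], check=False)
subprocess.run(["ots", "verify", image + ".ots"], check=False)
```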
Preimages for SHA-1
- Authors: Motara, Yusuf Moosa
- Date: 2018
- Subjects: Data encryption (Computer science) , Computer security -- Software , Hashing (Computer science) , Data compression (Computer science) , Preimage , Secure Hash Algorithm 1 (SHA-1)
- Language: English
- Type: text , Thesis , Doctoral , PhD
- Identifier: http://hdl.handle.net/10962/57885 , vital:27004
- Description: This research explores the problem of finding a preimage — an input that, when passed through a particular function, will result in a pre-specified output — for the compression function of the SHA-1 cryptographic hash. This problem is much more difficult than the problem of finding a collision for a hash function, and preimage attacks are known for very few popular hash functions. The research begins by introducing the field and giving an overview of the existing work in the area. A thorough analysis of the compression function is made, resulting in alternative formulations for both parts of the function, and both statistical and theoretical tools to determine the difficulty of the SHA-1 preimage problem. Different representations (And-Inverter Graph, Binary Decision Diagram, Conjunctive Normal Form, Constraint Satisfaction form, and Disjunctive Normal Form) and associated tools to manipulate and/or analyse these representations are then applied and explored, and results are collected and interpreted. In conclusion, the SHA-1 preimage problem remains unsolved and insoluble for the foreseeable future. The primary issue is one of efficient representation; despite a promising theoretical difficulty, both the diffusion characteristics and the depth of the tree stand in the way of efficient search. Despite this, the research served to confirm and quantify the difficulty of the problem both theoretically, using Schaefer's Theorem, and practically, in the context of different representations.
- Full Text:
- Date Issued: 2018
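For readers unfamiliar with the object of study, the following self-contained sketch implements the standard SHA-1 compression function (per FIPS 180-4) and checks it against hashlib. Computing it forwards is trivial; the preimage problem is inverting it, i.e. recovering an input block that maps a given state to a chosen output. This is the plain algorithm, not any of the alternative representations (AIG, BDD, CNF, CSP, DNF) the thesis analyses.

```python
# The SHA-1 compression function: 160-bit state + 512-bit block -> 160-bit state.
import hashlib
import struct

def _rotl(x, n):
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def sha1_compress(state, block):
    """One application of the compression function (state: five 32-bit words,
    block: 64 bytes)."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 80):                       # message schedule expansion
        w.append(_rotl(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1))
    a, b, c, d, e = state
    for i in range(80):                           # 80 mixing rounds
        if i < 20:
            f, k = (b & c) | (~b & d), 0x5A827999
        elif i < 40:
            f, k = b ^ c ^ d, 0x6ED9EBA1
        elif i < 60:
            f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
        else:
            f, k = b ^ c ^ d, 0xCA62C1D6
        a, b, c, d, e = ((_rotl(a, 5) + f + e + k + w[i]) & 0xFFFFFFFF,
                         a, _rotl(b, 30), c, d)
    # Davies-Meyer feed-forward: add the input state back in.
    return tuple((s + t) & 0xFFFFFFFF for s, t in zip(state, (a, b, c, d, e)))

# Sanity check against hashlib for the one-block padded message "abc".
IV = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)
padded = b"abc" + b"\x80" + b"\x00" * 52 + struct.pack(">Q", 24)  # 24-bit length
digest = b"".join(struct.pack(">I", w) for w in sha1_compress(IV, padded))
assert digest == hashlib.sha1(b"abc").digest()
```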
Pursuing cost-effective secure network micro-segmentation
- Authors: Fürst, Mark Richard
- Date: 2018
- Subjects: Computer networks -- Security measures , Computer networks -- Access control , Firewalls (Computer security) , IPSec (Computer network protocol) , Network micro-segmentation
- Language: English
- Type: text , Thesis , Masters , MSc
- Identifier: http://hdl.handle.net/10962/131106 , vital:36524
- Description: Traditional network segmentation allows discrete trust levels to be defined for different network segments, using physical firewalls or routers that control north-south traffic flowing between different interfaces. This technique reduces the attack surface area should an attacker breach one of the perimeter defences. However, east-west traffic flowing between endpoints within the same network segment does not pass through a firewall, and an attacker may be able to move laterally between endpoints within that segment. Network micro-segmentation was designed to address the challenge of controlling east-west traffic, and various solutions have been released with differing levels of capabilities and feature sets. These approaches range from simple Access Control List based segmentation on network switches to complex hypervisor-based software-defined security segments, defined down to the individual workload, container or process level and enforced via policy-based security controls for each segment. Several commercial solutions for network micro-segmentation exist, but these are primarily focused on physical and cloud data centres, and are often accompanied by significant capital outlay and resource requirements. Given these constraints, this research determines whether existing tools provided with operating systems can be re-purposed to implement micro-segmentation and restrict east-west traffic within one or more network segments for a small-to-medium sized corporate network. To this end, a proof-of-concept lab environment was built with a heterogeneous mix of Windows and Linux virtual servers and workstations deployed in an Active Directory domain. The use of Group Policy Objects to deploy IPsec Server and Domain Isolation for controlling traffic between endpoints is examined, in conjunction with IPsec Authenticated Header and Encapsulating Security Payload modes as an additional layer of security. The outcome of the research shows that revisiting existing tools can enable organisations to implement an additional, cost-effective secure layer of defence in their network.
- Full Text:
- Date Issued: 2018
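The enforcement mechanism examined above can be illustrated with Windows' built-in tooling. The sketch below creates a local IPsec server-isolation rule with the NetSecurity PowerShell cmdlet New-NetIPsecRule, invoked from Python; in the thesis the equivalent policy is distributed via Group Policy Objects rather than configured per host. The subnet and rule name are placeholders, and the command must run with administrative rights.

```python
# Minimal sketch: require IPsec for inbound east-west traffic within a subnet.
import subprocess

rule = (
    'New-NetIPsecRule -DisplayName "Server Isolation 192.0.2.0/24" '
    "-LocalAddress 192.0.2.0/24 -RemoteAddress 192.0.2.0/24 "
    # Require authenticated inbound traffic, request it outbound, so
    # unauthenticated lateral movement inside the segment is blocked.
    "-InboundSecurity Require -OutboundSecurity Request"
)
subprocess.run(["powershell", "-NoProfile", "-Command", rule], check=True)
```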
Towards a collection of cost-effective technologies in support of the NIST cybersecurity framework
- Authors: Shackleton, Bruce Michael Stuart
- Date: 2018
- Subjects: National Institute of Standards and Technology (U.S.) , Computer security , Computer networks -- Security measures , Small business -- Information technology -- Cost effectiveness , Open source software
- Language: English
- Type: text , Thesis , Masters , MSc
- Identifier: http://hdl.handle.net/10962/62494 , vital:28199
- Description: The NIST Cybersecurity Framework (CSF) is a specific risk and cybersecurity framework. It provides guidance on controls that can be implemented to help improve an organisation’s cybersecurity risk posture. The CSF Functions consist of Identify, Protect, Detect, Respond, and Recover. Like most Information Technology (IT) frameworks, there are elements of people, processes, and technology; the same elements are required to successfully implement the NIST CSF. This research specifically focuses on the technology element. While there are many commercial technologies available to a small to medium-sized business, their costs can be prohibitive. Therefore, this research investigates cost-effective technologies and assesses their alignment to the NIST CSF. The assessment was made against the NIST CSF subcategories. Each subcategory was analysed to identify where a technology would likely be required. The framework provides a list of Informative References. These Informative References were used to create high-level technology categories, as well as to identify the technical controls against which the technologies were measured. The technologies tested were either open source or proprietary. All open source technologies tested were free to use, or have a free community edition. Proprietary technologies had to be free to use, or considered generally available to most organisations, such as components contained within Microsoft platforms. The results from the experimentation demonstrated that there are multiple cost-effective technologies that can support the NIST CSF. Once all technologies were tested, the NIST CSF was extended: two new columns were added, namely high-level technology category and tested technology, and these were populated with output from the research. This extended framework begins an initial collection of cost-effective technologies in support of the NIST CSF.
- Full Text:
- Date Issued: 2018
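An illustrative sketch of the framework extension described above, with the two added columns rendered as CSV. The subcategory texts are abridged from NIST CSF v1.1, and the technology pairings shown are plausible examples of the kind of mapping produced, not the thesis's actual result rows.

```python
# Minimal sketch: the extended CSF as (subcategory, category, technology) rows.
import csv
import sys

rows = [
    # (CSF subcategory, high-level technology category, tested technology)
    # Illustrative pairings only; not the thesis's result rows.
    ("PR.DS-1: Data-at-rest is protected", "Disk encryption", "VeraCrypt"),
    ("DE.CM-1: The network is monitored", "Network IDS", "Snort"),
    ("PR.PT-1: Audit/log records are maintained", "Log management", "ELK stack"),
]

writer = csv.writer(sys.stdout)
writer.writerow(["CSF subcategory", "High-level technology category",
                 "Tested technology"])
writer.writerows(rows)
```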
A risk-based framework for new products: a South African telecommunication’s study
- Authors: Jeffries, Michael
- Date: 2017
- Subjects: Telephone companies -- Risk management -- South Africa , Telephone companies -- South Africa -- Case studies , Telecommunication -- Security measures -- South Africa
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: http://hdl.handle.net/10962/4765 , vital:20722
- Description: The integrated reports of Vodacom, Telkom and MTN — telecommunication organisations in South Africa — show that they are diversifying their product offerings from traditional voice and data services. These organisations are including new offerings covering financial, health, insurance and mobile education services. The potential exists for these organisations to launch products that are substandard and which either do not take into account customer needs or do not comply with current legislation or regulations. Most telecommunication organisations have a well-defined enterprise risk management programme to ensure compliance with King III; however, risk management processes specifically for new products and services might be lacking. The responsibility for the implementation of robust products usually resides with the product managers; however, they do not always have the correct skillset to ensure adherence to governance requirements, and therefore might not be aware of which laws they are not adhering to, or might not understand the customers’ requirements. More complex products, additional competition, changes to technology and new business ventures have reinforced the need to manage risk on telecommunication products. Failure to take risk requirements into account could lead to fines and damage to the organisation’s reputation, which could in turn lead to customers churning from these service providers. This research analyses three periods of data captured from a mobile telecommunication organisation to assess the current status quo of risk management maturity within the organisation’s product and service environment. Based on the analysis, as well as industry best practices, a risk management framework for products is proposed that can assist product managers in analysing concepts to ensure adherence to governance requirements. This could help ensure that new product or service offerings in the marketplace do not create a perception of distrust among consumers.
- Full Text:
- Date Issued: 2017
A unified data repository for rich communication services
- Authors: Sogunle, Oluwasegun Francis
- Date: 2017
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: http://hdl.handle.net/10962/5777 , vital:20974
- Description: Rich Communication Services (RCS) is a framework that defines a set of IP-based services for the delivery of multimedia communications to mobile network subscribers. The framework unifies a set of pre-existing communication services under a single name, and permits network operators to re-use investments in existing network infrastructure, especially the IP Multimedia Subsystem (IMS), which is a core part of a mobile network and also acts as a docking station for RCS services. RCS generates and utilises disparate subscriber data sets during execution; however, it lacks a harmonised repository for the management of such data sets, thus making it difficult to obtain a unified view of heterogeneous subscriber data. This thesis proposes the creation of a unified data repository for RCS which is based on the User Data Convergence (UDC) standard. The standard was proposed by the 3rd Generation Partnership Project (3GPP), a major telecommunications standardisation group. UDC provides an approach for consolidating subscriber data into a single logical repository without adversely affecting existing network infrastructure, such as the IMS. Thus, this thesis details the design and development of a prototypical implementation of a unified repository, named Converged Subscriber Data Repository (CSDR). It adopts a polyglot persistence model for the underlying data store and exposes heterogeneous data through the Open Data Protocol (OData), which is a candidate implementation of the Ud interface defined in the UDC architecture. With the introduction of polyglot persistence, multiple data stores can be used within the CSDR and disparate network data sources can access heterogeneous data sets using OData as a standard communications protocol. As the CSDR persistence model becomes more complex due to the inclusion of more storage technologies, polyglot persistence ensures a consistent conceptual view of these data sets through OData. Importantly, the CSDR prototype was integrated into a popular open-source implementation of the core part of an IMS network known as the Open IMS Core. The successful integration of the prototype demonstrates its ability to manage and expose a consolidated view of heterogeneous subscriber data, which are generated and used by different RCS services deployed within IMS.
- Full Text:
- Date Issued: 2017
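A minimal sketch of how a consumer might read converged subscriber data from the CSDR over OData. The endpoint URL, entity set and property names below are hypothetical; they illustrate the protocol's style (addressable entities plus $expand and $format query options) rather than the prototype's actual schema.

```python
# Minimal sketch: fetch one subscriber's converged data over OData.
import requests

BASE = "http://csdr.example.org/odata"  # hypothetical CSDR endpoint

resp = requests.get(
    # "Subscribers" and "PresenceData" are hypothetical entity names.
    f"{BASE}/Subscribers('sip:alice@ims.example.net')",
    params={"$expand": "PresenceData", "$format": "json"},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())
```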
Evolving an efficient and effective off-the-shelf computing infrastructure for schools in rural areas of South Africa
- Authors: Siebörger, Ingrid Gisélle
- Date: 2017
- Language: English
- Type: Thesis , Doctoral , PhD
- Identifier: http://hdl.handle.net/10962/14557 , vital:21938
- Description: Upliftment of rural areas and poverty alleviation are priorities for development in South Africa. Information and knowledge are key strategic resources for social and economic development, and ICTs act as tools to support them, enabling innovative and more cost-effective approaches. In order for ICT interventions to be possible, infrastructure has to be deployed. For the deployment to be effective and sustainable, the local community needs to be involved in shaping and supporting it. This study describes the technical work done in the Siyakhula Living Lab (SLL), a long-term ICT4D experiment in the Mbashe Municipality, with a focus on the deployment of ICT infrastructure in schools, for teaching and learning but also for use by the communities surrounding the schools. As a result of this work, computing infrastructure was deployed, in various phases, in 17 schools in the area and a “broadband island” connecting them was created. The dissertation reports on the initial deployment phases, discussing theoretical underpinnings and policies for using technology in education, as well as various computing and networking technologies and associated policies available and appropriate for use in rural South African schools. This information forms the backdrop of a survey conducted with teachers from six schools in the SLL, together with experimental work towards the provision of an evolved, efficient and effective off-the-shelf computing infrastructure in selected schools, in order to attempt to address the shortcomings of the computing infrastructure deployed initially in the SLL. The result of the study is the proposal of an evolved computing infrastructure model for use in rural South African schools.
- Full Text:
- Date Issued: 2017
Transcription factor binding specificity and occupancy : elucidation, modelling and evaluation
- Authors: Kibet, Caleb Kipkurui
- Date: 2017
- Subjects: Transcription factors , Transcription factors -- Data processing , Motif Assessment and Ranking Suite
- Language: English
- Type: Thesis , Doctoral , PhD
- Identifier: vital:21185 , http://hdl.handle.net/10962/6838
- Description: The major contributions of this thesis are addressing the need for an objective quality evaluation of a transcription factor binding model, demonstrating the value of the tools developed to this end, and elucidating how in vitro and in vivo information can be utilized to improve TF binding specificity models. Accurate elucidation of TF binding specificity remains an ongoing challenge in gene regulatory research. Several in vitro and in vivo experimental techniques have been developed, followed by a proliferation of algorithms and, ultimately, binding models. This increase led to a choice problem for the end users: which tools to use, and which is the most accurate model for a given TF? Therefore, the first section of this thesis investigates the motif assessment problem: how scoring functions, the choice and processing of benchmark data, and the statistics used in evaluation affect motif ranking. This analysis revealed that TF motif quality assessment requires a systematic comparative analysis, and that the scoring functions used have a TF-specific effect on motif ranking. These results informed the design of a Motif Assessment and Ranking Suite, MARS, supported by PBM and ChIP-seq benchmark data and an extensive collection of PWM motifs. MARS implements consistency, enrichment, and scoring and classification-based motif evaluation algorithms. Transcription factor binding is also influenced and determined by contextual factors: chromatin accessibility, competition or cooperation with other TFs, cell line or condition specificity, binding locality (e.g. proximity to transcription start sites) and the shape of the binding site (DNA-shape). In vitro techniques do not capture such context; therefore, this thesis also combines PBM and DNase-seq data using a comparative k-mer enrichment approach that compares open chromatin with genome-wide prevalence, achieving a modest performance improvement when benchmarked on ChIP-seq data. Finally, since statistical and probabilistic methods cannot capture all the information that determines binding, a machine learning approach (XGBoost) was implemented to investigate how the features contribute to TF specificity and occupancy. This combinatorial approach improves the predictive ability of TF specificity models, with the most predictive feature being chromatin accessibility, while the DNA-shape and conservation information all significantly improve on the baseline model of k-mer and DNase data. The results and the tools introduced in this thesis are useful for systematic comparative analysis (via MARS) and a combinatorial approach to modelling TF binding specificity, including appropriate feature engineering practices for machine learning modelling.
- Full Text:
- Date Issued: 2017
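The comparative k-mer enrichment idea described above can be sketched as a ratio of normalised k-mer frequencies between open-chromatin sequences and a genome-wide background. The toy sequences below are illustrative, and the thesis's smoothing and normalisation details are not reproduced.

```python
# Minimal sketch: k-mer enrichment in open chromatin vs. genomic background.
from collections import Counter

def kmer_counts(seqs, k=8):
    """Count all overlapping k-mers in a list of DNA sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

def enrichment(open_seqs, background_seqs, k=8):
    """Ratio of normalised k-mer frequencies, with add-one smoothing."""
    fg, bg = kmer_counts(open_seqs, k), kmer_counts(background_seqs, k)
    fg_total, bg_total = sum(fg.values()) or 1, sum(bg.values()) or 1
    return {
        kmer: ((fg[kmer] + 1) / fg_total) / ((bg[kmer] + 1) / bg_total)
        for kmer in set(fg) | set(bg)
    }

open_chromatin = ["ACGTACGTGGATTA", "TTGGATTACCAGTA"]  # toy DNase-seq peaks
background = ["ACGTACGTACGTAA", "CCCCGGGGTTTTAA"]      # toy genomic sample
top = sorted(enrichment(open_chromatin, background).items(),
             key=lambda kv: kv[1], reverse=True)[:3]
print(top)  # most open-chromatin-enriched k-mers
```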