Suspicious activity reports: Enhancing the detection of terrorist financing and suspicious transactions in migrant remittances
- Authors: Mbiva, Stanley Munamato
- Date: 2024-10-11
- Subjects: Migrant remittances , Terrorism financing , Machine learning , Outliers (Statistics) , Anomaly detection (Computer security) , Unsupervised learning
- Language: English
- Type: Academic theses , Master's theses , text
- Identifier: http://hdl.handle.net/10962/465058 , vital:76569
- Description: Migrant remittances have become an important factor in poverty alleviation and microeconomic development in low-income nations. Global migrant remittances are expected to exceed US$630 billion by 2023, according to the World Bank. In addition to offering an alternative source of income that supplements the recipient's household earnings, they are less likely to be affected by global economic downturns, ensuring stability and a consistent stream of revenue. However, the ease of global migrant remittance transfers exposes them to abuse by terrorist organizations seeking to move and conceal operating cash quickly, which facilitates terrorist financing. This study aims to develop an unsupervised machine-learning model capable of detecting suspicious financial transactions associated with terrorist financing in migrant remittances. The data used in this study came from a World Bank survey of migrant remitters in Belgium. To understand the natural structures and groupings in the dataset, agglomerative hierarchical clustering and k-prototype clustering techniques were employed. This established the number of clusters present in the dataset, making it possible to compare individual migrant remittances with their peers. A Structural Equation Model (SEM) and a Local Outlier Factor - Isolation Forest (LOF-IF) algorithm were applied to analyze and detect suspicious transactions in the dataset. A traditional Rule-Based Method (RBM) was also created as a benchmark against which model performance is evaluated. The results show that the SEM classifies a very high number of transactions as suspicious, making it prone to false positives. Finally, the study applied the proposed ensemble outlier detection model to detect suspicious transactions in the same dataset. The proposed ensemble model utilized an Isolation Forest (IF) for pruning and a Local Outlier Factor (LOF) to detect local outliers. The model performed exceptionally well, detecting over 90% of suspicious transactions in the testing dataset during model cross-validation. , Thesis (MSc) -- Faculty of Science, Statistics, 2024
- Date Issued: 2024-10-11
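
The abstract above describes an ensemble in which an Isolation Forest prunes the data and a Local Outlier Factor then scores what remains. The following is only a minimal, standard-library sketch of that general idea, not the thesis's implementation. The function names, the `keep_fraction`, `k` and `lof_threshold` parameters, and their default values are all hypothetical, and the isolation trees are evaluated lazily along each point's own branch rather than built once and reused, which is an assumption made to keep the example short.

```python
import math
import random
from functools import lru_cache


def c_factor(n):
    """Average path length of an unsuccessful search in a binary tree of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def path_length(point, sample, rng, depth, max_depth):
    """Follow one random isolation tree along the branch that contains `point`."""
    if depth >= max_depth or len(sample) <= 1:
        return depth + c_factor(len(sample))
    dim = rng.randrange(len(point))
    lo = min(row[dim] for row in sample)
    hi = max(row[dim] for row in sample)
    if lo == hi:
        return depth + c_factor(len(sample))
    split = rng.uniform(lo, hi)
    same_side = [row for row in sample if (row[dim] < split) == (point[dim] < split)]
    return path_length(point, same_side, rng, depth + 1, max_depth)


def isolation_scores(X, n_trees=50, sample_size=64, seed=0):
    """Isolation Forest style score per row; values closer to 1 are more anomalous.

    Assumes X is a list of equal-length numeric tuples with at least two rows.
    """
    rng = random.Random(seed)
    size = min(sample_size, len(X))
    max_depth = math.ceil(math.log2(size))
    scores = []
    for point in X:
        mean_depth = sum(
            path_length(point, rng.sample(X, size), rng, 0, max_depth)
            for _ in range(n_trees)
        ) / n_trees
        scores.append(2.0 ** (-mean_depth / c_factor(size)))
    return scores


def lof_scores(X, indices, k):
    """Local Outlier Factor for the rows in `indices`, with neighbours taken from all of X."""

    @lru_cache(maxsize=None)
    def knn(i):
        # (distance, index) pairs of the k nearest neighbours of row i.
        ranked = sorted((math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i)
        return tuple(ranked[:k])

    @lru_cache(maxsize=None)
    def lrd(i):
        # Local reachability density: inverse of the mean reachability distance.
        reach = sum(max(knn(j)[-1][0], d) for d, j in knn(i))
        return len(knn(i)) / max(reach, 1e-12)

    return {i: sum(lrd(j) for _, j in knn(i)) / (len(knn(i)) * lrd(i)) for i in indices}


def if_lof_outliers(X, keep_fraction=0.3, k=10, lof_threshold=1.5, seed=0):
    """Prune with isolation scores, then flag surviving candidates with a high LOF."""
    iso = isolation_scores(X, seed=seed)
    n_keep = max(1, math.ceil(keep_fraction * len(X)))
    candidates = sorted(range(len(X)), key=lambda i: iso[i], reverse=True)[:n_keep]
    lof = lof_scores(X, candidates, k)
    return sorted(i for i in candidates if lof[i] > lof_threshold)
```

Calling `if_lof_outliers(rows)` on a list of numeric tuples returns the indices of the rows that survive pruning and exceed the LOF threshold. Real remittance records mix numeric and categorical fields, so they would need encoding first; that step is outside the scope of this sketch.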
Natural Language Processing with machine learning for anomaly detection on system call logs
- Authors: Goosen, Christo
- Date: 2023-10-13
- Subjects: Natural language processing (Computer science) , Machine learning , Information security , Anomaly detection (Computer security) , Host-based intrusion detection system
- Language: English
- Type: Academic theses , Master's theses , text
- Identifier: http://hdl.handle.net/10962/424699 , vital:72176
- Description: Host intrusion detection systems and machine learning have been studied for many years, especially on datasets like KDD99. Current research and systems focus on problems with low training and processing complexity, such as system call returns, which lack the system call arguments and potential traces of exploits run against a system. With respect to malware and vulnerabilities, signatures are relied upon, and the potential for natural language processing of the resulting logs and system call traces needs further experimentation. This research looks at unstructured raw system call traces from x86_64 GNU/Linux operating systems with natural language processing and supervised and unsupervised machine learning techniques to identify current and unseen threats. The research explores whether these tools are within the skill set of information security professionals, or require data science professionals. The research makes use of a modern academic system call dataset from Leipzig University and applies two machine learning models based on decision trees. Random Forest, the supervised algorithm, is compared to the unsupervised Isolation Forest algorithm, with each experiment repeated after hyper-parameter tuning. The research finds conclusive evidence that the Isolation Forest algorithm is effective, when paired with Principal Component Analysis, in identifying anomalies in the modern Leipzig Intrusion Detection Data Set (LID-DS) combined with samples of executed malware from the VirusTotal Academic dataset. The base or default model parameters produce sub-optimal results, whereas hyper-parameter tuning increases the accuracy to promising levels for anomaly and potential zero-day detection. , Thesis (MSc) -- Faculty of Science, Computer Science, 2023
- Date Issued: 2023-10-13
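
The abstract above applies natural language processing to raw system call traces before anomaly detection, but it does not say which text representation was used. As one illustrative possibility only, the sketch below assumes a TF-IDF weighting over whitespace-separated tokens. The `tfidf_vectors` function and the sample traces are hypothetical and are not taken from the thesis or from LID-DS.

```python
import math
from collections import Counter


def tfidf_vectors(traces):
    """Turn whitespace-separated system call traces into sparse TF-IDF dictionaries."""
    docs = [Counter(trace.split()) for trace in traces]
    n_docs = len(docs)
    # Number of traces in which each token appears at least once.
    doc_freq = Counter(token for doc in docs for token in doc)
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            token: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[token])) + 1)
            for token, count in doc.items()
        })
    return vectors


# Hypothetical traces: the third contains calls that the others never make.
traces = [
    "open read read close",
    "open read write close",
    "open mmap execve ptrace close",
]
vectors = tfidf_vectors(traces)
```

Under this weighting, tokens that occur in few traces receive more weight than ubiquitous ones, so an unusual call stands out in its vector. Vectors of this kind could then be reduced in dimension and passed to an anomaly detector, in the spirit of the Principal Component Analysis and Isolation Forest pairing that the abstract reports.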