- Title
- An information extraction model for recommending the most applied case
- Creator
- Padayachy, Thashen
- Subject
- Information technology
- Subject
- Information storage and retrieval systems -- System design
- Date Issued
- 2019
- Date
- 2019
- Type
- Thesis
- Type
- Masters
- Type
- MSc
- Identifier
- http://hdl.handle.net/10948/43325
- Identifier
- vital:36794
- Description
- The amount of information produced by different domains is constantly increasing. One domain that particularly produces large amounts of information is the legal domain, where information is mainly used for research purposes. However, too much time is spent by legal researchers on searching for useful information. Information is found by using special search engines or by consulting hard copies of legal literature. The main research question that this study addressed is “What techniques can be incorporated into a model that recommends the most applied case for a field of law?”. The Design Science Research (DSR) methodology was used to address the research objectives. The model developed is the theoretical contribution produced from following the DSR methodology. A case study organisation, called LexisNexis, was used to help investigate the real-world problem. The initial investigation into the real-world problem revealed that too much time is spent on searching for the Most Applied Case (MAC) and no formal or automated processes were used. An analysis of an informal process followed by legal researchers enabled the identification of different concepts that could be combined to create a prescriptive model to recommend the MAC. A critical analysis of the literature was conducted to obtain a better understanding of the legal domain and the techniques that can be applied to assist with problems faced in this domain, related to information retrieval and extraction. This resulted in the creation of an IE Model based only on theory. Questionnaires were sent to experts to obtain a further understanding of the legal domain, highlight problems faced, and identify which attributes of a legal case can be used to help recommend the MAC. During the Design and Development activity of the DSR methodology, a prescriptive MAC Model for recommending the MAC was created based on findings from the literature review and questionnaires.
The MAC Model consists of processes concerning: Information retrieval (IR); Information extraction (IE); Information storage; and Query-independent ranking. Analysis of IR and IE helped to identify problems experienced when processing text. Furthermore, appropriate techniques and algorithms were identified that can process legal documents and extract specific facts. The extracted facts were then further processed to allow for storage and processing by query-independent ranking algorithms. The processes incorporated into the model were then used to create a proof-of-concept prototype called the IE Prototype. The IE Prototype implements two processes called the IE process and the Database process. The IE process analyses different sections of a legal case to extract specific facts. The Database process then ensures that the extracted facts are stored in a document database for future querying purposes. The IE Prototype was evaluated using the technical risk and efficacy strategy from the Framework for Evaluation of Design Science. Both formative and summative evaluations were conducted. Formative evaluations were conducted to identify functional issues of the prototype whilst summative evaluations made use of real-world legal cases to test the prototype. Multiple experiments were conducted on legal cases, known as source cases, that resulted in facts from the source cases being extracted. For the purpose of the experiments, the term “source case” was used to distinguish between a legal case in its entirety and a legal case’s list of cases referred to. Two types of NoSQL databases were investigated for implementation, namely a graph database and a document database. Setting up the graph database required little time. However, development issues prevented the graph database from being successfully implemented in the proof-of-concept prototype. A document database was successfully implemented as an alternative for the proof-of-concept prototype.
Analysis of the source cases used to evaluate the IE Prototype revealed that 96% of the source cases were categorised as being partially extracted. The results also revealed that the IE Prototype was capable of processing large numbers of source cases at a time.
- Format
- xi, 161 leaves
- Publisher
- Nelson Mandela University
- Publisher
- Faculty of Science
- Language
- English
- Rights
- Nelson Mandela University
File | Description | Size | Format
---|---|---|---
SOURCE1 | PadayachyTM_dissertation final submission April 2019.pdf | 4 MB | Adobe Acrobat PDF