Investigating tools and techniques for improving software performance on multiprocessor computer systems
- Authors: Tristram, Waide Barrington
- Date: 2012
- Subjects: Multiprocessors , Multiprogramming (Electronic computers) , Parallel programming (Computer science) , Linux , Abstract data types (Computer science) , Threads (Computer programs) , Computer programming
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:4655 , http://hdl.handle.net/10962/d1006651 , Multiprocessors , Multiprogramming (Electronic computers) , Parallel programming (Computer science) , Linux , Abstract data types (Computer science) , Threads (Computer programs) , Computer programming
- Description: The availability of modern commodity multicore processors and multiprocessor computer systems has resulted in the widespread adoption of parallel computers in a variety of environments, ranging from the home to workstation and server environments in particular. Unfortunately, parallel programming is harder and requires more expertise than the traditional sequential programming model. The variety of tools and parallel programming models available to the programmer further complicates the issue. The primary goal of this research was to identify and describe a selection of parallel programming tools and techniques to aid novice parallel programmers in the process of developing efficient parallel C/C++ programs for the Linux platform. This was achieved by highlighting and describing the key concepts and hardware factors that affect parallel programming, providing a brief survey of commonly available software development tools and parallel programming models and libraries, and presenting structured approaches to software performance tuning and parallel programming. Finally, the performance of several parallel programming models and libraries was investigated, along with the programming effort required to implement solutions using the respective models. A quantitative research methodology was applied to the investigation of the performance and programming effort associated with the selected parallel programming models and libraries, which included automatic parallelisation by the compiler, Boost Threads, Cilk Plus, OpenMP, POSIX threads (Pthreads), and Threading Building Blocks (TBB). Additionally, the performance of the GNU C/C++ and Intel C/C++ compilers was examined. The results revealed that the choice of parallel programming model or library is dependent on the type of problem being solved and that there is no overall best choice for all classes of problem. However, the results also indicate that parallel programming models with higher levels of abstraction require less programming effort and provide similar performance compared to explicit threading models. The principle conclusion was that the problem analysis and parallel design are an important factor in the selection of the parallel programming model and tools, but that models with higher levels of abstractions, such as OpenMP and Threading Building Blocks, are favoured.
- Full Text:
- Date Issued: 2012
- Authors: Tristram, Waide Barrington
- Date: 2012
- Subjects: Multiprocessors , Multiprogramming (Electronic computers) , Parallel programming (Computer science) , Linux , Abstract data types (Computer science) , Threads (Computer programs) , Computer programming
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:4655 , http://hdl.handle.net/10962/d1006651 , Multiprocessors , Multiprogramming (Electronic computers) , Parallel programming (Computer science) , Linux , Abstract data types (Computer science) , Threads (Computer programs) , Computer programming
- Description: The availability of modern commodity multicore processors and multiprocessor computer systems has resulted in the widespread adoption of parallel computers in a variety of environments, ranging from the home to workstation and server environments in particular. Unfortunately, parallel programming is harder and requires more expertise than the traditional sequential programming model. The variety of tools and parallel programming models available to the programmer further complicates the issue. The primary goal of this research was to identify and describe a selection of parallel programming tools and techniques to aid novice parallel programmers in the process of developing efficient parallel C/C++ programs for the Linux platform. This was achieved by highlighting and describing the key concepts and hardware factors that affect parallel programming, providing a brief survey of commonly available software development tools and parallel programming models and libraries, and presenting structured approaches to software performance tuning and parallel programming. Finally, the performance of several parallel programming models and libraries was investigated, along with the programming effort required to implement solutions using the respective models. A quantitative research methodology was applied to the investigation of the performance and programming effort associated with the selected parallel programming models and libraries, which included automatic parallelisation by the compiler, Boost Threads, Cilk Plus, OpenMP, POSIX threads (Pthreads), and Threading Building Blocks (TBB). Additionally, the performance of the GNU C/C++ and Intel C/C++ compilers was examined. The results revealed that the choice of parallel programming model or library is dependent on the type of problem being solved and that there is no overall best choice for all classes of problem. However, the results also indicate that parallel programming models with higher levels of abstraction require less programming effort and provide similar performance compared to explicit threading models. The principle conclusion was that the problem analysis and parallel design are an important factor in the selection of the parallel programming model and tools, but that models with higher levels of abstractions, such as OpenMP and Threading Building Blocks, are favoured.
- Full Text:
- Date Issued: 2012
Analyzing communication flow and process placement in Linda programs on transputers
- De-Heer-Menlah, Frederick Kofi
- Authors: De-Heer-Menlah, Frederick Kofi
- Date: 1992 , 2012-11-28
- Subjects: LINDA (Computer system) , Transputers , Parallel programming (Computer science)
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:4675 , http://hdl.handle.net/10962/d1006698 , LINDA (Computer system) , Transputers , Parallel programming (Computer science)
- Description: With the evolution of parallel and distributed systems, users from diverse disciplines have looked to these systems as a solution to their ever increasing needs for computer processing resources. Because parallel processing systems currently require a high level of expertise to program, many researchers are investing effort into developing programming approaches which hide some of the difficulties of parallel programming from users. Linda, is one such parallel paradigm, which is intuitive to use, and which provides a high level decoupling between distributable components of parallel programs. In Linda, efficiency becomes a concern of the implementation rather than of the programmer. There is a substantial overhead in implementing Linda, an inherently shared memory model on a distributed system. This thesis describes the compile-time analysis of tuple space interactions which reduce the run-time matching costs, and permits the distributon of the tuple space data. A language independent module which partitions the tuple space data and suggests appropriate storage schemes for the partitions so as to optimise Linda operations is presented. The thesis also discusses hiding the network topology from the user by automatically allocating Linda processes and tuple space partitons to nodes in the network of transputers. This is done by introducing a fast placement algorithm developed for Linda. , KMBT_223
- Full Text:
- Date Issued: 1992
- Authors: De-Heer-Menlah, Frederick Kofi
- Date: 1992 , 2012-11-28
- Subjects: LINDA (Computer system) , Transputers , Parallel programming (Computer science)
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:4675 , http://hdl.handle.net/10962/d1006698 , LINDA (Computer system) , Transputers , Parallel programming (Computer science)
- Description: With the evolution of parallel and distributed systems, users from diverse disciplines have looked to these systems as a solution to their ever increasing needs for computer processing resources. Because parallel processing systems currently require a high level of expertise to program, many researchers are investing effort into developing programming approaches which hide some of the difficulties of parallel programming from users. Linda, is one such parallel paradigm, which is intuitive to use, and which provides a high level decoupling between distributable components of parallel programs. In Linda, efficiency becomes a concern of the implementation rather than of the programmer. There is a substantial overhead in implementing Linda, an inherently shared memory model on a distributed system. This thesis describes the compile-time analysis of tuple space interactions which reduce the run-time matching costs, and permits the distributon of the tuple space data. A language independent module which partitions the tuple space data and suggests appropriate storage schemes for the partitions so as to optimise Linda operations is presented. The thesis also discusses hiding the network topology from the user by automatically allocating Linda processes and tuple space partitons to nodes in the network of transputers. This is done by introducing a fast placement algorithm developed for Linda. , KMBT_223
- Full Text:
- Date Issued: 1992
Towards a portable occam
- Authors: Hill, David Timothy
- Date: 1988 , 2013-03-07
- Subjects: occam (Computer program language) , Transputers , Parallel programming (Computer science)
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:4587 , http://hdl.handle.net/10962/d1004749 , occam (Computer program language) , Transputers , Parallel programming (Computer science)
- Description: Occam is designed for concurrent programming on a network of transputers. AIlocation and partitioning of the program is specified within the source code, binding the program to a specific network. An altemative approach is proposed which completely separates the source code from hardware considerations. Static allocation is performed as a separate phase and should, ideally, be automatic but at present is manual. Complete hardware abstraction requires that non-local, shared communication be provided for, introducing an efficiency overhead which can be minimised by the allocation. The proposal was implemented on a network of IBM PCs, modelled on a transputer network, and implementation issues are discussed
- Full Text:
- Date Issued: 1988
- Authors: Hill, David Timothy
- Date: 1988 , 2013-03-07
- Subjects: occam (Computer program language) , Transputers , Parallel programming (Computer science)
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:4587 , http://hdl.handle.net/10962/d1004749 , occam (Computer program language) , Transputers , Parallel programming (Computer science)
- Description: Occam is designed for concurrent programming on a network of transputers. AIlocation and partitioning of the program is specified within the source code, binding the program to a specific network. An altemative approach is proposed which completely separates the source code from hardware considerations. Static allocation is performed as a separate phase and should, ideally, be automatic but at present is manual. Complete hardware abstraction requires that non-local, shared communication be provided for, introducing an efficiency overhead which can be minimised by the allocation. The proposal was implemented on a network of IBM PCs, modelled on a transputer network, and implementation issues are discussed
- Full Text:
- Date Issued: 1988
Parallel process placement
- Authors: Handler, Caroline
- Date: 1989
- Subjects: Parallel programming (Computer science)
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:4568 , http://hdl.handle.net/10962/d1002033
- Description: This thesis investigates methods of automatic allocation of processes to available processors in a given network configuration. The research described covers the investigation of various algorithms for optimal process allocation. Among those researched were an algorithm which used a branch and bound technique, an algorithm based on graph theory, and an heuristic algorithm involving cluster analysis. These have been implemented and tested in conjunction with the gathering of performance statistics during program execution, for use in improving subsequent allocations. The system has been implemented on a network of loosely-coupled microcomputers using multi-port serial communication links to simulate a transputer network. The concurrent programming language occam has been implemented, replacing the explicit process allocation constructs with an automatic placement algorithm. This enables the source code to be completely separated from hardware considerations
- Full Text:
- Date Issued: 1989
- Authors: Handler, Caroline
- Date: 1989
- Subjects: Parallel programming (Computer science)
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:4568 , http://hdl.handle.net/10962/d1002033
- Description: This thesis investigates methods of automatic allocation of processes to available processors in a given network configuration. The research described covers the investigation of various algorithms for optimal process allocation. Among those researched were an algorithm which used a branch and bound technique, an algorithm based on graph theory, and an heuristic algorithm involving cluster analysis. These have been implemented and tested in conjunction with the gathering of performance statistics during program execution, for use in improving subsequent allocations. The system has been implemented on a network of loosely-coupled microcomputers using multi-port serial communication links to simulate a transputer network. The concurrent programming language occam has been implemented, replacing the explicit process allocation constructs with an automatic placement algorithm. This enables the source code to be completely separated from hardware considerations
- Full Text:
- Date Issued: 1989
Algorithmic skeletons as a method of parallel programming
- Authors: Watkins, Rees Collyer
- Date: 1993
- Subjects: Parallel programming (Computer science) , Algorithms
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:4609 , http://hdl.handle.net/10962/d1004889 , Parallel programming (Computer science) , Algorithms
- Description: A new style of abstraction for program development, based on the concept of algorithmic skeletons, has been proposed in the literature. The programmer is offered a variety of independent algorithmic skeletons each of which describe the structure of a particular style of algorithm. The appropriate skeleton is used by the system to mould the solution. Parallel programs are particularly appropriate for this technique because of their complexity. This thesis investigates algorithmic skeletons as a method of hiding the complexities of parallel programming from the user, and for guiding them towards efficient solutions. To explore this approach, this thesis describes the implementation and benchmarking of the divide and conquer and task queue paradigms as skeletons. All but one category of problem, as implemented in this thesis, scale well over eight processors. The rate of speed up tails off when there are significant communication requirements. The results show that, with some user knowledge, efficient parallel programs can be developed using this method. The evaluation explores methods for fine tuning some skeleton programs to achieve increased efficiency.
- Full Text:
- Date Issued: 1993
- Authors: Watkins, Rees Collyer
- Date: 1993
- Subjects: Parallel programming (Computer science) , Algorithms
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:4609 , http://hdl.handle.net/10962/d1004889 , Parallel programming (Computer science) , Algorithms
- Description: A new style of abstraction for program development, based on the concept of algorithmic skeletons, has been proposed in the literature. The programmer is offered a variety of independent algorithmic skeletons each of which describe the structure of a particular style of algorithm. The appropriate skeleton is used by the system to mould the solution. Parallel programs are particularly appropriate for this technique because of their complexity. This thesis investigates algorithmic skeletons as a method of hiding the complexities of parallel programming from the user, and for guiding them towards efficient solutions. To explore this approach, this thesis describes the implementation and benchmarking of the divide and conquer and task queue paradigms as skeletons. All but one category of problem, as implemented in this thesis, scale well over eight processors. The rate of speed up tails off when there are significant communication requirements. The results show that, with some user knowledge, efficient parallel programs can be developed using this method. The evaluation explores methods for fine tuning some skeleton programs to achieve increased efficiency.
- Full Text:
- Date Issued: 1993
The role of parallel computing in bioinformatics
- Authors: Akhurst, Timothy John
- Date: 2005
- Subjects: Bioinformatics , Parallel programming (Computer science) , LINDA (Computer system) , Java (Computer program language) , Parallel processing (Electronic computers) , Genomics -- Data processing
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:3986 , http://hdl.handle.net/10962/d1004045 , Bioinformatics , Parallel programming (Computer science) , LINDA (Computer system) , Java (Computer program language) , Parallel processing (Electronic computers) , Genomics -- Data processing
- Description: The need to intelligibly capture, manage and analyse the ever-increasing amount of publicly available genomic data is one of the challenges facing bioinformaticians today. Such analyses are in fact impractical using uniprocessor machines, which has led to an increasing reliance on clusters of commodity-priced computers. An existing network of cheap, commodity PCs was utilised as a single computational resource for parallel computing. The performance of the cluster was investigated using a whole genome-scanning program written in the Java programming language. The TSpaces framework, based on the Linda parallel programming model, was used to parallelise the application. Maximum speedup was achieved at between 30 and 50 processors, depending on the size of the genome being scanned. Together with this, the associated significant reductions in wall-clock time suggest that both parallel computing and Java have a significant role to play in the field of bioinformatics.
- Full Text:
- Date Issued: 2005
- Authors: Akhurst, Timothy John
- Date: 2005
- Subjects: Bioinformatics , Parallel programming (Computer science) , LINDA (Computer system) , Java (Computer program language) , Parallel processing (Electronic computers) , Genomics -- Data processing
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:3986 , http://hdl.handle.net/10962/d1004045 , Bioinformatics , Parallel programming (Computer science) , LINDA (Computer system) , Java (Computer program language) , Parallel processing (Electronic computers) , Genomics -- Data processing
- Description: The need to intelligibly capture, manage and analyse the ever-increasing amount of publicly available genomic data is one of the challenges facing bioinformaticians today. Such analyses are in fact impractical using uniprocessor machines, which has led to an increasing reliance on clusters of commodity-priced computers. An existing network of cheap, commodity PCs was utilised as a single computational resource for parallel computing. The performance of the cluster was investigated using a whole genome-scanning program written in the Java programming language. The TSpaces framework, based on the Linda parallel programming model, was used to parallelise the application. Maximum speedup was achieved at between 30 and 50 processors, depending on the size of the genome being scanned. Together with this, the associated significant reductions in wall-clock time suggest that both parallel computing and Java have a significant role to play in the field of bioinformatics.
- Full Text:
- Date Issued: 2005
A distributed Linda server on a network of heterogeneous processors
- Authors: Smith, Graham Leslie
- Date: 1993
- Subjects: LINDA (Computer system) , Parallel programming (Computer science)
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:4610 , http://hdl.handle.net/10962/d1004890 , LINDA (Computer system) , Parallel programming (Computer science)
- Description: Linda is an approach to parallelism which relies on a virtual associative shared memory called tuple space. Tuple space is accessed through a small set of primitive operations and is conceptually easy to understand and manipulate. The physical implementation of a Linda tuple space may of course be completely different from the conceptual model. Rhodes has implemented versions of Linda on a ring of RS-232 joined PC's and on a cluster of T800 transputers with a single copy of tuple space on one transputer. Current research targets the implementation of a distributed Linda server on a network of heterogeneous processors. This work describes the design and implementation of a distributed Linda server. Emphasis is placed on aspects of the design which enhance portability and efficiency.
- Full Text:
- Date Issued: 1993
- Authors: Smith, Graham Leslie
- Date: 1993
- Subjects: LINDA (Computer system) , Parallel programming (Computer science)
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:4610 , http://hdl.handle.net/10962/d1004890 , LINDA (Computer system) , Parallel programming (Computer science)
- Description: Linda is an approach to parallelism which relies on a virtual associative shared memory called tuple space. Tuple space is accessed through a small set of primitive operations and is conceptually easy to understand and manipulate. The physical implementation of a Linda tuple space may of course be completely different from the conceptual model. Rhodes has implemented versions of Linda on a ring of RS-232 joined PC's and on a cluster of T800 transputers with a single copy of tuple space on one transputer. Current research targets the implementation of a distributed Linda server on a network of heterogeneous processors. This work describes the design and implementation of a distributed Linda server. Emphasis is placed on aspects of the design which enhance portability and efficiency.
- Full Text:
- Date Issued: 1993
- «
- ‹
- 1
- ›
- »