Design patterns and software techniques for large-scale, open and reproducible data reduction

Molenaar, Gijs Jan

Title: Design patterns and software techniques for large-scale, open and reproducible data reduction
Creator: Molenaar, Gijs Jan
Subject: Radio astronomy -- Data processing
Subject: Radio astronomy -- Data processing -- Software
Subject: Radio astronomy -- South Africa
Subject: ASTRODECONV2019 dataset
Subject: Radio telescopes -- South Africa
Subject: KERN (Computer software)
Date Issued: 2021
Date: 2021
Type: text
Type: Thesis
Type: Doctoral
Type: PhD
Identifier: http://hdl.handle.net/10962/172169
Identifier: vital:42172
Identifier: 10.21504/10962/172169
Description: The preparation for the construction of the Square Kilometre Array, and the introduction of its operational precursors, such as LOFAR and MeerKAT, mark the beginning of an exciting era for astronomy. Impressive new data containing valuable science just waiting for discovery is already being generated, and these instruments will produce far more data than has ever been collected before. However, with every new instrument, data rates grow to unprecedented levels, requiring novel data-processing tools. In addition, creating science-grade data from the raw data still requires significant expert knowledge. The software used is often developed by a scientist who lacks proper training in software development, so the software does not progress beyond prototype quality. In the first chapter, we explore various organisational and technical approaches to address these issues by providing a historical overview of the development of radio astronomy pipelines since the inception of the field in the 1940s. In that overview, the steps required to create a radio image are investigated. We use the lessons learned to identify patterns in the challenges experienced and in the solutions created to address them over the years. The second chapter describes the mathematical foundations that are essential for radio imaging. In the third chapter, we discuss the production of the KERN Linux distribution, which is a set of software packages containing most radio astronomy software currently in use. Considerable effort was put into making sure that the contained software installs correctly, with all packages side by side on the same system. Where required and possible, bugs and portability issues were fixed and reported to the upstream maintainers. The KERN project also has a website and an issue tracker, where users can report bugs and maintainers can coordinate the packaging effort and new releases.
The software packages can be used inside Docker and Singularity containers, enabling the installation of these packages on a wide variety of platforms. In the fourth and fifth chapters, we discuss methods and frameworks for combining the available data reduction tools into recomposable pipelines and introduce the Kliko specification and software. This framework was created to enable end-user astronomers to chain and containerise operations of software in KERN packages. Next, we discuss the Common Workflow Language (CommonWL), a similar but more advanced and mature pipeline framework created by bioinformatics scientists. CommonWL is already supported by a wide range of tools, among others schedulers, visualisers and editors. Consequently, a pipeline made with CommonWL can be deployed and manipulated with a wide range of tools. In the final chapter, we attempt something unconventional: applying a generative adversarial network based on deep learning techniques to the task of sky brightness reconstruction. Since deep learning methods often require a large number of training samples, we constructed a CommonWL simulation pipeline for creating dirty images and corresponding sky models. This simulated dataset has been made publicly available as the ASTRODECONV2019 dataset. We show that this method can perform the restoration and matches the performance of a single clean cycle. In addition, we incorporated domain knowledge by adding the point spread function to the network and by utilising a custom loss function during training. Although it was not possible to improve on the cleaning performance of commonly used existing tools, the computational time performance of the approach looks very promising. We suggest that further studies start from a smaller scope, and that optimising the training of the neural network could produce the desired results.
Format: 134 pages
Format: pdf
Publisher: Rhodes University
Publisher: Faculty of Science, Physics and Electronics
Language: English
Rights: Molenaar, Gijs Jan

Collection: RU Department of Physics and Electronics

File: MOLENAAR-PHD-TR21-49.pdf (SOURCE1, 2 MB, Adobe Acrobat PDF)