Introduction
This chapter introduces the dissertation by describing its context, the identified challenges, how the chosen challenge was met, the achieved impact, relevant publications, and how the thesis is structured.
1.1 Context
In science, data is the essential focal point of today's computational and quantitative approaches to scientific knowledge gain. Computational simulations enable far-reaching explorations of modeled realities, while quantitative methods gather data to improve the understanding of observed phenomena. These methods are increasingly viable only with high-end storage and large-scale High Performance Computing resources, and individual requirements are rising dramatically. Sustained data throughputs reach gigabytes per second, data volumes are on the order of petabytes, continuous file rates reach tens of files per second, and a vast universe of complex data representations exists. The great potential of such data is evident from the current trend of Big Data in science, which aims at large-scale information extraction to foster scientific discoveries. This is fundamentally enabled by handling data intelligently and by combining a large variety of information technology methods into so-called data life cycles. In principle, these consist of data sources, systems for managing data, compute resources, methods for access rights management, interfaces for utilization, and data sinks. Scientists naturally focus on their particular research rather than on data management. Metadata is therefore an essential step toward more efficient use, as it enables managing data based on its content instead of its location. Tailored data life cycles free scientists from having to deal extensively with IT infrastructures, while still allowing them to exploit these infrastructures for their extensive data and computing demands and thereby drive their research.
In this complex technological environment, numerous significant challenges hinder the advancement of the state of the art in data-driven knowledge gain.
1.2 Challenges
The challenges in managing data life cycles are manifold. Federated authentication and authorization infrastructures need to be integrated while maintaining the overall resilience of increasingly complex data life cycles. Growing numbers of files and data volumes need to be managed by Big Data systems. These systems in turn must be efficiently integrated with High Performance Computing resources for analysis, which requires advanced interoperability. Besides automated pre- and postprocessing, the user-friendly creation and execution of workflows that encapsulate complex analysis procedures need to be supported. Integrated scientific environments are needed that hide the underlying complexity while still enabling the use of the underlying resources. Building trust that an infrastructure delivers what it promises is also essential. Closely connected to this is the transition from a fixed-term build-up phase to a sustainable operation phase. As these goals partly conflict with each other, an effective balance between them needs to be developed for each data life cycle.
The dissertation focuses on the major challenge of organizing large numbers of files, in the range of millions, using information about data, so-called metadata. Currently, solutions are often either use-case-specific or missing entirely, which prevents easy access and re-use. Without metadata, users have to remember where each individual file is located. With a large number of files, this is inefficient if not impossible. This holds especially true for Big Data use cases involving many files with complex content stored in distributed locations.
At present, significant effort is required to implement even narrowly applicable, pragmatic metadata handling solutions for every new scientific experiment.
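To illustrate the difference between locating files by path and locating them by content, the following is a minimal sketch, assuming a simple in-memory inverted index from metadata key/value pairs to file paths. It is not the approach developed in this dissertation; the class name `MetadataIndex`, its methods, and all attribute names and paths are hypothetical and chosen only for illustration. A user asks for "temperature data from 2013" instead of remembering which directories hold those files.

```python
from collections import defaultdict


class MetadataIndex:
    """Hypothetical minimal index mapping (key, value) metadata pairs to file paths."""

    def __init__(self):
        # (key, value) -> set of file paths carrying that attribute
        self._index = defaultdict(set)

    def register(self, path, metadata):
        """Record a file together with its descriptive attributes (values must be hashable)."""
        for key, value in metadata.items():
            self._index[(key, value)].add(path)

    def query(self, **criteria):
        """Return the sorted paths whose metadata matches all given key=value pairs."""
        if not criteria:
            return []
        # Look up each criterion; unknown pairs yield an empty set
        matches = [self._index.get((k, v), set()) for k, v in criteria.items()]
        # A file qualifies only if it satisfies every criterion
        return sorted(set.intersection(*matches))


# Illustrative usage: files with related content live in unrelated locations
idx = MetadataIndex()
idx.register("/archive/run042/out_0001.h5",
             {"experiment": "climate", "variable": "temperature", "year": 2013})
idx.register("/scratch/u17/t_2013.nc",
             {"experiment": "climate", "variable": "temperature", "year": 2013})
idx.register("/archive/run043/out_0001.h5",
             {"experiment": "climate", "variable": "pressure", "year": 2013})
print(idx.query(variable="temperature", year=2013))
# prints ['/archive/run042/out_0001.h5', '/scratch/u17/t_2013.nc']
```

An in-memory dictionary is used only to keep the idea visible; with files in the million range and distributed storage, such an index would in practice have to be persistent and scalable, which is exactly the kind of challenge described above.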