[23] | Harald Richter, Richard Kleber, Hermann Hellwagner, Cost-Efficient SCI-based Banyan Networks, In Proceedings of the High Performance Computing Symposium, 1998.
[bib] [pdf] |
[22] | Michael Gerndt, Hermann Hellwagner, Implementing Automatic Coordination on Networks of Workstations, IEEE Computer Society, pp. 10, 1998.
[bib] |
[21] | Michael Eberl, Hermann Hellwagner, Bjarne Geir Herland, Common Messaging Layer for MPI and PVM over SCI, In Proceedings of HPCN-Europe 98 (Peter Sloot, Marian Bubak, Bob Hertzberger, eds.), Springer Verlag, pp. 576-587, 1998.
[bib] [abstract]
Abstract: This paper describes the design of a common message passing layer for implementing both MPI and PVM over the SCI interconnect in a workstation or PC cluster. The design is focused on obtaining low latency. The message layer encapsulates all necessary knowledge of the underlying interconnect and operating system. Yet, we claim that it can be used to implement such different message passing libraries as MPI and PVM without sacrificing efficiency. Initial results obtained from using the message layer in SCI clusters are presented.
|
[20] | Wolfgang Mayerle, Hermann Hellwagner, Konzepte und funktionaler Vergleich von Thread-Systemen (2), In Praxis der Informationsverarbeitung und Kommunikation (Otto Spaniol, ed.), vol. 20, no. 4, Mannheim, Germany, pp. 225-229, 1997.
[bib] |
[19] | Wolfgang Mayerle, Hermann Hellwagner, Konzepte und funktionaler Vergleich von Thread-Systemen (1), In Praxis der Informationsverarbeitung und Kommunikation (Otto Spaniol, ed.), vol. 20, Mannheim, Germany, pp. 164-174, 1997.
[bib] [pdf] [abstract]
Abstract: This paper gives a general introduction to threads and compares several thread systems currently available for workstations. Building on a motivation and a basic explanation of the thread concept, important aspects and problems of thread libraries are presented. After some notes on programming with threads, several implementations are compared with one another.
|
[18] | Hermann Hellwagner, Arbeitsspeicher- und Bussysteme, In Informatik-Handbuch (Peter Rechenberg, Gustav Pomberger, eds.), Carl Hanser Verlag, München, pp. 239-255, 1997.
[bib] |
[17] | Hermann Hellwagner, Wolfgang Karl, Markus Leberecht, Enabling a PC Cluster for High-Performance Computing, In Speedup Journal, Proceedings, 21st Workshop, March 13-14, 1997, Cadro-Lugano, vol. 11, no. 1, pp. 18-23, 1997.
[bib] [pdf] [abstract]
Abstract: Due to their excellent cost/performance ratio, clusters of PCs can be attractive high-performance computing (HPC) platforms. Yet, their limited communication performance over standard LANs is still prohibitive for parallel applications. The project "Shared Memory in a LAN-like Environment" (SMiLE) at LRR-TUM adopts Scalable Coherent Interface (SCI) interconnect technology to build, and provide software for, a PC cluster which, with hardware-based distributed shared memory (DSM) and high-performance communication characteristics, is regarded as well suited for HPC. The paper describes the key features of the enabling technology, SCI. It then discusses the developments and important results of the SMiLE project so far: the development and initial performance of a PCI/SCI interface card, and the design and initial performance results of low-latency communication layers, Active Messages and a sockets emulation library.
|
[16] | Hermann Hellwagner, High-Level Programming Models and Supportive Environments (HIPS'97), IEEE Computer Society, Los Alamitos, CA, pp. 900, 1997.
[bib] |
[15] | Hermann Hellwagner, Proceedings Second International Workshop on High-Level Parallel Programming Models and Supportive Environments, IEEE, pp. 135, 1997.
[bib] |
[14] | Hermann Hellwagner, Wolfgang Karl, Markus Leberecht, Fast Communication Mechanisms--Coupling Hardware Distributed Shared Memory and User-Level Messaging, In Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 1997: PDPTA (Hamid R Arabnia, ed.), CSREA Press, Las Vegas, Nevada, USA, pp. 8, 1997.
[bib] [pdf] [abstract]
Abstract: Low latencies for small messages are an important factor of efficient fine-grained parallel computation. The Active Messages concept provides this minimal overhead by eliminating certain parts of the critical path of sending and receiving messages, that is the context switch into the operating system kernel when using user-mode I/O, and multiple buffering in the network layer. Hardware-supported distributed shared memory (DSM) architectures exhibit various properties that make them particularly useful for an implementation of the aforementioned messaging mechanisms. This paper thus describes the concept, implementation, and the performance of a DSM-based Active Messages layer.
|
[13] | Michael Eberl, Hermann Hellwagner, Bjarne Geir Herland, Martin Schulz, SISCI - Implementing a Standard Software Infrastructure on an SCI Cluster, In Tagungsband zum 1. Workshop Cluster Computing (Wolfgang Rehm, ed.), pp. 49-61, 1997.
[bib] [pdf] [abstract]
Abstract: To enable the efficient utilization of clusters of workstations it is crucial to develop a stable and rich software infrastructure. The ESPRIT Project SISCI will provide two widely used message-passing interfaces, MPI and PVM, as well as a POSIX compliant, distributed thread package (Pthreads) on multiple SCI-based clusters. This paper presents motivation and background on this project as well as details of the two core components: the common messaging layer and the Pthreads package.
|
[12] | Michael Eberl, Hermann Hellwagner, Wolfgang Karl, Markus Leberecht, Sicherheit und Effizienz in einer Active-Message-Kommunikationsschicht, In Architektur von Rechensystemen: Arbeitsteilige Systemarchitekturen - Konzepte, Lösungen, Anwendungen, Trends (ARCS '97) (Djamshid Tavangarian, ed.), VDE Verlag, pp. 211-220, 1997.
[bib] [pdf] [abstract]
Abstract: Active Messages have established themselves as an efficient communication mechanism, particularly on communication technologies that allow direct user access without operating system intervention. A drawback of this lightweight communication, however, has been the lack of adequate protection mechanisms, above all when several processes use an Active Message library simultaneously. Version 2.0 of the Berkeley Active Messages specification now attempts to provide protection abstractions for this well-known and fast communication mechanism. This paper describes the implementation of such a version 2.0 Active Message layer on a cluster of SCI-connected workstations. We can show that the additional protection mechanisms have only little impact on performance, so that the advantage of Active Messages, lightweight, fine-grained communication, is preserved.
|
[10] | Georg Acher, Hermann Hellwagner, Wolfgang Karl, Markus Leberecht, Eine PCI-SCI-Adapterkarte für ein PC-Cluster mit verteiltem gemeinsamen Speicher, In Arbeitsplatz-Rechensysteme: Anwendungen, Architekturen, Betriebssysteme und Netzwerke, 1997.
[bib] |
[9] | Hermann Hellwagner, Ivan Zoraja, Vaidy Sunderam, PVM Data Transfers on SCI Workstation Clusters, In Proceedings PVM User Group Meeting (Arndt Bode, Jack Dongarra, Thomas Ludwig, Vaidy Sunderam, eds.), Springer, 1996.
[bib] |
[8] | Günter Böckle, Hermann Hellwagner, Roland Lepold, Gerd Sandweg, Burghardt Schallenberger, Raimar Thudt, Stefan Wallstab, Structured Evaluation of Computer Systems, In IEEE Computer, vol. 29, no. 6, pp. 45-51, 1996.
[bib] [doi] [pdf] [abstract]
Abstract: Evaluating computers and other systems is difficult for a couple of reasons. First, the goal of evaluation is typically ill-defined: customers, sometimes even designers, either don't know or can't specify exactly what result they expect. Often, they don't specify the architectural variants to consider, and often the metrics and workload they expect you to use are ill-defined. Second, they rarely clarify which kind of model and evaluation method best suit the evaluation problem. These problems have consequences. For one thing, the decision-maker may not trust the evaluation. For another, poor planning means the evaluation cannot be reproduced if any of the parameters are changed slightly. Finally, the evaluation documentation is usually inadequate, and so some time after the evaluation you might ask yourself, how did I come to that conclusion? An approach developed at Siemens makes decisions explicit and the process reproducible.
|
[7] | Arndt Bode, Michael Gerndt, R Hackenberg, Hermann Hellwagner, High-Level Programming Models and Supportive Environments (HIPS'96), In Proceedings of IPPS '96, The 10th International Parallel Processing Symposium, IEEE Computer Society, 1996.
[bib] |
[6] | Arndt Bode, Michael Gerndt, R G Hackenberg, Hermann Hellwagner, Proceedings First International Workshop on High-Level Parallel Programming Models and Supportive Environments, IEEE Computer Society Press, pp. 128, 1996.
[bib] |
[5] | Günter Böckle, Hermann Hellwagner, Systematic Assessment of Computer Systems Architectures, In Innovationen bei Rechen- und Kommunikationssystemen, Eine Herausforderung für die Informatik (Bernd E Wolfinger, ed.), Springer Verlag, pp. 310-317, 1994.
[bib] |
[4] | Hermann Hellwagner, Randomized Shared Memory - Concept and Efficiency of a Scalable Shared Memory Scheme, In Parallel Computer Architectures: Theory, Hardware, Software, Applications (Arndt Bode, Mario Dal Cin, eds.), Springer Verlag, London, UK, pp. 102-117, 1993.
[bib] [abstract]
Abstract: Our work explores the practical relevance of Randomized Shared Memory (RSM), a theoretical concept that has been proven to enable an (asymptotically) optimally efficient implementation of scalable and universal shared memory in a distributed-memory parallel system. RSM (address hashing) pseudo-randomly distributes global memory addresses throughout the nodes' local memories. High memory access latencies are masked through massive parallelism. This paper introduces the basic principles and properties of RSM and analyzes its practical efficiency in terms of constant factors through simulation studies, assuming a state-of-the-art parallel architecture. Bottlenecks in the architecture are pointed out, and improvements are made and their effects assessed quantitatively. The results show that RSM efficiency is encouragingly high, even in a non-optimized architecture. We propose architectural features to support RSM and conclude that RSM may indeed be a feasible shared-memory implementation in future massively parallel computers.
|
[3] | Hermann Hellwagner, Design Considerations for Scalable Parallel File Systems, In The Computer Journal - Parallel Processing, vol. 36, no. 8, pp. 741-755, 1993.
[bib] [pdf] [abstract]
Abstract: This paper addresses the problem of providing high-performance disk I/O in massively parallel computers. Resolving the fundamental I/O bottleneck in parallel architectures involves both hardware and software issues. We review previous work on disk arrays and I/O architectures aimed at providing highly parallel disk I/O subsystems. We then focus on the requirements and design of parallel file systems (PFSs), which are responsible for making the parallelism offered by the hardware and a declustered file organization available to application programs. We present the design strategy and key concepts of a general-purpose file system for a parallel computer with scalable distributed shared memory. The principal objectives of the PFS are to fully exploit the parallelism inherent among and within file accesses, and to provide scalable I/O performance. The machine model underlying the design is described, with an emphasis on the innovative architectural features supporting scalability of the shared memory. Starting from a classification of various scenarios of concurrent I/O requests, the features of the PFS design essential for achieving the goals are described and justified. It is argued that the inter- and intra-request parallelism of the I/O load can indeed be effectively exploited and supported by the parallel system resources. Scalability of I/O performance and of the PFS software can be ensured by avoiding serial bottlenecks through the use of the powerful architectural features.
|
[2] | Hermann Hellwagner, On the Practical Efficiency of Randomized Shared Memory, In Parallel Processing: CONPAR 92 - VAPP V, Second Joint International Conference on Vector and Parallel Processing (Luc Bougé, Michel Cosnard, Yves Robert, Denis Trystram, eds.), Springer, Berlin-Heidelberg, pp. 429-440, 1992.
[bib] [abstract]
Abstract: This paper analyzes the efficiency of Randomized Shared Memory (RSM) in terms of constant factors. RSM or memory hashing, that is, pseudorandom distribution of global memory addresses throughout local memories in a distributed-memory parallel system, has been proven to enable an (asymptotically) optimally efficient implementation of scalable and universal shared memory. High memory access latencies are hidden through massive parallelism. Our work examines the practical relevance and feasibility of this potentially significant theoretical result. After an introduction of the background, principles, and desirable properties of RSM and an outline of the approach to determine RSM efficiency, the major results of our simulations are presented. The results show that RSM efficiency is encouragingly high (up to 20% efficiency of idealized shared memory), even in an architecture modelled on the basis of state-of-the-art technology. Performance-limiting factors are identified from the results and architectural features to increase efficiency are proposed, most notably extremely fast process switching and a combining network. Several novel machine designs document the increased interest in RSM and hardware support.
|
[1] | Hermann Hellwagner, ed., Systolische Architekturen für die Verallgemeinerte Diskrete Fourier-Transformation, VWGO, Wien, pp. 150, 1989.
[bib] |