[31] | Matthias Ohlenroth, Hermann Hellwagner, RTP Packetization of MPEG-4 Elementary Streams, In ICME Proceedings (IEEE, ed.), IEEE Xplore, NA, pp. 1-4, 2002.
[bib][url] [pdf] [abstract]
Abstract: Multimedia streaming becomes ever more popular. The multimedia standard MPEG-4 has been designed to support scenes of different levels of complexity and applications with low bandwidth requirements up to very high bandwidth requirements. One protocol suitable to transfer this kind of data over IP networks is the real-time transport protocol (RTP). This report describes standardized and proposed payload formats that support the transport of MPEG-4 elementary streams over RTP connections. These RTP packetization formats are compared w.r.t. their suitability for the adaptation (scaling) of the media data within the network, i.e., by advanced routers or proxy caches. This adaptation process is governed by metadata that need to be transferred and inspected in conjunction with the media streams.
|
[30] | Hermann Hellwagner, Matthias Ohlenroth, VI Architecture Communication Features and Performance on the Giganet Cluster LAN, In Future Generation Computer Systems, Elsevier B.V., vol. 18, no. 3, Amsterdam, Netherlands, pp. 421-433, 2002.
[bib][url] [doi] [pdf] [abstract]
Abstract: The virtual interface (VI) architecture standard was developed to satisfy the need for a high throughput, low latency communication system required for cluster computing. VI architecture aims to close the performance gap between the bandwidths and latencies provided by the communication hardware and visible to the application, respectively, by minimizing the software overhead on the critical path of the communication. This paper presents the results of a performance study of one VI architecture hardware implementation, the Giganet cLAN (cluster LAN). The focus of the study is to assess and compare the performance of different VI architecture data transfer modes and specific features that are available to higher-level communication software like MPI in order to aid the implementor to decide which VI architecture options to employ for various communication scenarios. Examples of such options include the use of send/receive vs. RDMA data transfers, polling vs. blocking to check completion of communication operations, multiple VIs, completion queues and scatter capabilities of VI architecture.
|
[29] | Balázs Csizmazia, Hermann Hellwagner, The design and implementation of the A2QM3 System, In Proceedings Fourth International Workshop on Active Middleware Services (A N, ed.), IEEE Computer Society, Washington, DC, USA, pp. 19-27, 2002.
[bib] [pdf] [abstract]
Abstract: In this paper we present the design, architecture and implementation of the A2QM3 System. It provides programmers with reusable QoS-aware Control Objects for building a complete middleware for adaptive applications over active networks. We introduce the programming model and the system architecture, and show the parts that make this system a full-featured middleware supporting QoS-aware reliable stream-oriented communication, communication using the request/reply-based CORBA model, and real-time streaming for continuous multimedia content.
|
[28] | Laszlo Böszörmenyi, Mario Döller, Hermann Hellwagner, Harald Kosch, Mulugeta Libsie, Peter Schojer, Comprehensive Treatment of Adaptation in Distributed Multimedia Systems in the ADMITS Project, In Proceedings of the tenth ACM international conference on Multimedia (Juan Les Pins, ed.), ACM, New York, pp. 429-430, 2002.
[bib][url] [abstract]
Abstract: Adaptation is becoming an increasingly important tool for resource and media management in distributed multimedia systems. Best-effort scheduling and worst-case reservation of resources are two extreme cases, neither of them well suited to cope with large-scale, dynamic multimedia systems. The middle course can be met by a system which dynamically adapts its data, resource requirements, and processing components to achieve user satisfaction. Nevertheless, there is no agreement about the questions of where, when, what and who should adapt. A number of papers have been published in recent years in which adaptation is a central issue, however, in widely differing interpretations and generally in a somewhat limited scope; e.g., [1, 2, 8, 9, 10, 12]. A distributed multimedia system comprises several types of components, such as media servers, meta-databases, proxies, routers, and clients. Also, a large number of adaptation possibilities exist, from simple frame dropping up to virtual server systems which dynamically allocate new resources on demand. The main question is which kind of component can best be used for what kind of adaptation. In the ADMITS project (Adaptation in Distributed Multimedia IT Systems), we are seeking answers to exactly this basic question, and to a number of related questions.
|
[27] | Matthias Ohlenroth, Hermann Hellwagner, Quality Adaptation Options of MPEG-4 Video Streams, Technical report, Institute of Information Technology (ITEC), Klagenfurt University, no. TR/ITEC/01/1.03., Klagenfurt, Austria, pp. 20, 2001.
[bib] |
[26] | Harald Kosch, Laszlo Böszörmenyi, Hermann Hellwagner, Modeling Quality Adaptation Capabilities of Audio-Visual Data, In Proceedings of the 12th International Conference on Database and Expert Systems Application - DEXA 2001, Munich, Germany, September 3-5, 2001 (HC Mayr, J Lazansky, G Quichmayr, P Vogel, eds.), Springer Verlag, Berlin [u. a.], pp. 744-753, 2001.
[bib] |
[25] | Hermann Hellwagner, Erich Kargl, A Cluster-Based QoS Testbed for Multimedia Communications, In SCI 2001 Proceedings of the 5th World Multi-Conference on Systemics, Cybernetics and Informatics, Volume XV, IEEE CS, July 2001 (N Callaos, W Badawy, S Bozinovski, eds.), IEEE, --, pp. 362-367, 2001.
[bib] [pdf] [abstract]
Abstract: This paper presents an inexpensive cluster-based QoS networking testbed that can be employed to "emulate" different networks for multimedia communication experiments. Such a network can be built using standard PC and Ethernet hardware and open-source software components, e.g., IP routing and traffic control available in recent Linux kernels as well as a Differentiated Services package built atop these building blocks. The testbed can flexibly be configured to model various link bandwidths as well as IP routers capable of classifying, queuing (with various disciplines), forwarding and/or dropping packets and shaping traffic. The QoS components and facilities of the testbed are introduced and initial performance analysis experiments and results are reported. A simple video streaming application under QoS control is presented to show the usefulness of the testbed.
|
[24] | Laszlo Böszörmenyi, Hermann Hellwagner, Harald Kosch, Multimedia Technologies for E-Business Systems and Process, In Elektronische Geschäftsprozesse: Grundlagen, Sicherheitsaspekte, Realisierungen, Anwendungen. Tagungsband zur gemeinsamen Arbeitskonferenz GI/VOI/BITKOM/OCG/TeleTrusT (Patrick Horster, ed.), it Verlag, Höhenkirchen, pp. 471-481, 2001.
[bib] |
[23] | Christian Weiß, Hermann Hellwagner, Linda Stals, Ulrich Rüde, Data Locality Optimizations to Improve The Efficiency of Multigrid Methods, In Concepts of Numerical Software, NA, NA, pp. 1-10, 2000.
[bib] [pdf] [abstract]
Abstract: Current superscalar microprocessors are able to operate at a peak performance of up to 1 GFlop/sec. However, current main memory technology does not provide the data needed fast enough to keep the CPU busy. To minimize idle times of the CPU, caches are used to speed up accesses to frequently used data. To exploit caches, the software must be aware of them and reuse data in the cache before it is being replaced. Unfortunately, all conventional multigrid codes are not cache-aware and hence exploit less than 10 percent of the peak performance of cache based machines. Our studies with linear PDEs with constant coefficients show that it is possible to speed up the execution of our multigrid method by a large factor and hence solve a Poisson’s equation with one million unknowns in less than 3 seconds. The optimized reuse of data in the cache allows us to exploit 30 percent of the peak performance of the CPU, in contrast to mgd9v for instance, which achieves less than 5 percent on the same machine. To achieve this, we used several techniques like loop unrolling and loop fusion to better exploit the memory hierarchy and the superscalar CPU. We study the effects of these techniques on the runtime performance in detail. We also study several tools which guide the optimizations and help to restructure the code.
|
[22] | Hermann Hellwagner, Klaus Leopold, Ralf Schlatterbeck, Carsten Weich, Performance Tuning of Parallel Real-Time Voice Communication Software, In Proceedings Distributed and Parallel Systems (Peter Kascuk, Gabriele Kotsis, eds.), Kluwer Academic Publishers, Norwell, MA, USA, pp. 57-60, 2000.
[bib] [pdf] [abstract]
Abstract: This paper describes an unconventional way to apply a performance analysis tool for parallel programs (Vampir) to understand and tune the performance of the real-time voice and data communication software running on top of Frequentis’ V4 switch. The execution schedule of the strictly time-triggered V4 switching software is computed off-line; analyzing the schedule to identify e.g. performance bottlenecks used to be a complex and time-consuming process. We present our approach to transform the V4 software schedule’s information into Vampir trace files and use this tool’s facilities to provide a visualization of the schedule. A case study illustrates the benefits of this approach.
|
[21] | Hermann Hellwagner, Markus Lachowitz, Matthias Ohlenroth, Exploring the Performance of VI Architecture Communication Features in the Giganet Cluster LAN, In Proceedings International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA´2000), June 26, 2000, Las Vegas, Vol. 5 (Hamid R Arabnia, ed.), CSREA Press, [Athens, Ga.], pp. 2615-2621, 2000.
[bib] [doi] [pdf] [abstract]
Abstract: The Virtual Interface (VI) Architecture standard was developed to satisfy the need for a high-throughput, low-latency communication system required for cluster computing. This paper presents the results of a performance study of one VI Architecture hardware implementation, the Giganet cLAN (Cluster LAN). The focus of the study is to assess and compare the performance of different VI Architecture data transfer modes and specific features that are available to higher-level communication software like MPI, in order to aid the implementor to decide which VI Architecture options to employ for various communication scenarios. Examples of such options include the use of send/receive vs. RDMA data transfers, polling vs. blocking to check completion of communication operations, multiple VIs, completion queues, and scatter capabilities of VI Architecture.
|
[20] | Hermann Hellwagner, Ivan Zoraja, Vaidy Sunderam, SCIPVM: Parallel Distributed Computing on SCI Workstation Clusters, In Concurrency: Practice and Experience, N, A, vol. 11, no. 3, N, A, pp. 121-138, 1999.
[bib] [pdf] [abstract]
Abstract: Workstation and PC clusters interconnected by SCI (Scalable Coherent Interface) are very promising technologies for high performance cluster computing. Using commercial SBus to SCI interface cards and early system software and drivers, a two-workstation cluster has been constructed for initial testing and evaluation. The PVM system has been adapted to operate on this cluster using raw device access to the SCI interconnect, and preliminary communications performance tests have been carried out. Our preliminary results indicate that communications throughput in the range of 3.5 MBytes/s, and latencies of 620 µs can be achieved on SCI clusters. These figures are significantly better (by a factor of 3 to 4) ... (Research supported by the Applied Mathematical Sciences program, Office of Basic Energy Sciences, U. S. Department of Energy, under Grant No. DE-FG05-91ER25105, the National Science Foundation, under Award Nos. ASC-9527186 and ASC-9214149, and the German Science Foundation SFB342.)
|
[19] | Martin Schulz, Hermann Hellwagner, Global Virtual Memory based on SCI-DSM, In Proceedings of SCI-Europe ´98 (A N, ed.), N, A, N, A, pp. 59-67, 1998.
[bib] |
[18] | Martin Schulz, Hermann Hellwagner, Extending NT Virtual Memory by SCI-based Hardware DSM, In Proceedings of 2nd USENIX Windows NT Symposium (A N, ed.), USENIX Association, Seattle, WA, USA, pp. -, 1998.
[bib] |
[17] | Harald Richter, Richard Kleber, Hermann Hellwagner, Cost-Efficient SCI-based Banyan Networks, In Proceedings of the High Performance Computing Symposium (A N, ed.), N, A, N, A, pp. -, 1998.
[bib] [pdf] |
[16] | Michael Eberl, Hermann Hellwagner, Bjarne Geir Herland, Common Messaging Layer for MPI and PVM over SCI, In Proceedings of HPCN-Europe 98 (Peter Sloot, Marian Bubak, Bob Hertzberger, eds.), Springer Verlag, NA, pp. 576-587, 1998.
[bib] [abstract]
Abstract: This paper describes the design of a common message passing layer for implementing both MPI and PVM over the SCI interconnect in a workstation or PC cluster. The design is focused on obtaining low latency. The message layer encapsulates all necessary knowledge of the underlying interconnect and operating system. Yet, we claim that it can be used to implement such different message passing libraries as MPI and PVM without sacrificing efficiency. Initial results obtained from using the message layer in SCI clusters are presented.
|
[15] | Wolfgang Mayerle, Hermann Hellwagner, Konzepte und funktionaler Vergleich von Thread-Systemen (2), In Praxis der Informationsverarbeitung und Kommunikation, Spani, vol. 20, no. 4, Mannheim, Germany, pp. 225-229, 1997.
[bib] |
[14] | Wolfgang Mayerle, Hermann Hellwagner, Konzepte und funktionaler Vergleich von Thread-Systemen (1), In Praxis der Informationsverarbeitung und Kommunikation, Spaniol, Otto, vol. 20, Mannheim, Germany, pp. 164-174, 1997.
[bib] [pdf] [abstract]
Abstract: This paper gives a general introduction to threads and compares several thread systems currently available for workstations. Building on a motivation and basic explanation of the thread concept, important aspects and problems of thread libraries are presented. After some notes on programming with threads, several implementations are compared with one another.
|
[13] | Hermann Hellwagner, Arbeitsspeicher- und Bussysteme, In Informatik-Handbuch (Peter Rechenberg, Gustav Pomberger, eds.), Carl Hanser Verlag, München, pp. 239-255, 1997.
[bib] |
[12] | Hermann Hellwagner, Wolfgang Karl, Markus Leberecht, Enabling a PC Cluster for High-Performance Computing, In Speedup Journal, Proceedings, 21st Workshop, March 13-14, 1997, Cadro-Lugano, N, A, vol. 11, no. 1, N, A, pp. 18-23, 1997.
[bib] [pdf] [abstract]
Abstract: Due to their excellent cost/performance ratio, clusters of PCs can be attractive high-performance computing (HPC) platforms. Yet, their limited communication performance over standard LANs is still prohibitive for parallel applications. The project "Shared Memory in a LAN-like Environment" (SMiLE) at LRR-TUM adopts Scalable Coherent Interface (SCI) interconnect technology to build, and provide software for, a PC cluster which, with hardware-based distributed shared memory (DSM) and high-performance communication characteristics, is regarded as well suited for HPC. The paper describes the key features of the enabling technology, SCI. It then discusses the developments and important results of the SMiLE project so far: the development and initial performance of a PCI/SCI interface card, and the design and initial performance results of low-latency communication layers, Active Messages and a sockets emulation library.
|
[11] | Hermann Hellwagner, Wolfgang Karl, Markus Leberecht, Fast Communication Mechanisms--Coupling Hardware Distributed Shared Memory and User-Level Messaging, In Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 1997: PDPTA (Hamid R Arabnia, ed.), CSREA Press, Las Vegas, Nevada, USA, pp. 8, 1997.
[bib] [pdf] [abstract]
Abstract: Low latencies for small messages are an important factor of efficient fine-grained parallel computation. The Active Messages concept provides this minimal overhead by eliminating certain parts of the critical path of sending and receiving messages, that is the context switch into the operating system kernel when using user-mode I/O, and multiple buffering in the network layer. Hardware-supported distributed shared memory (DSM) architectures exhibit various properties that make them particularly useful for an implementation of the aforementioned messaging mechanisms. This paper thus describes the concept, implementation, and the performance of a DSM-based Active Messages layer.
|
[10] | Michael Eberl, Hermann Hellwagner, Bjarne Geir Herland, Martin Schulz, SISCI - Implementing a Standard Software Infrastructure on an SCI Cluster, In Tagungsband zum 1. Workshop Cluster Computing (Wolfgang Rehm, ed.), N, A, N, A, pp. 49-61, 1997.
[bib] [pdf] [abstract]
Abstract: To enable the efficient utilization of clusters of workstations it is crucial to develop a stable and rich software infrastructure. The ESPRIT Project SISCI will provide two widely used message-passing interfaces, MPI and PVM, as well as a POSIX compliant, distributed thread package (Pthreads) on multiple SCI-based clusters. This paper features motivation and background on this project as well as details of the two core components: the common messaging layer and the Pthreads package.
|
[9] | Michael Eberl, Hermann Hellwagner, Wolfgang Karl, Markus Leberecht, Sicherheit und Effizienz in einer Active-Message-Kommunikationsschicht, In Architektur von Rechensystemen: Arbeitsteilige Systemarchitekturen - Konzepte, Lösungen, Anwendungen, Trends (ARCS´97) (Djamshid Tavangarian, ed.), VDE Verlag, N, A, pp. -, 1997.
[bib] [pdf] [abstract]
Abstract: Active Messages have become established as an efficient communication mechanism, particularly on communication technologies that allow direct user access without operating system intervention. A drawback of this lightweight communication, however, has been its insufficient protection mechanisms, especially when multiple processes use an Active Message library at the same time. Version 2.0 of the Berkeley Active Messages specification now attempts to provide protection abstractions for this well-known and fast communication mechanism. This paper describes the implementation of such a version 2.0 Active Message layer on a cluster of SCI-connected workstations. We show that the additional protection mechanisms have only a small impact on performance, so that the advantage of Active Messages, namely lightweight, fine-grained communication, is preserved.
|
[8] | Georg Acher, Hermann Hellwagner, Wolfgang Karl, Markus Leberecht, Eine PCI-SCI-Adapterkarte für ein PC-Cluster mit verteiltem gemeinsamen Speicher, In Arbeitsplatz-Rechensysteme: Anwendungen, Architekturen, Betriebssysteme und Netzwerke (A N, ed.), N, A, N, A, pp. -, 1997.
[bib] |
[7] | Hermann Hellwagner, Ivan Zoraja, Vaidy Sunderam, PVM Data Transfers on SCI Workstation Clusters, In Proceedings PVM User Group Meeting (Arndt Bode, Jack Dongarra, Thomas Ludwig, Vaidy Sunderam, eds.), Springer, N, A, pp. -, 1996.
[bib] |