Abstract: Results of internet searches are typically presented as lists. When searching for digital photos, however, different search result presentations offer different benefits. If users are primarily interested in the visual content of images, a thumbnail grid may be more appropriate than a list. For people searching for photos taken at a specific place, image metadata in the result presentation is of interest too. In this paper we present an application which monitors a user's behavior while searching for digital photos and classifies the user's intention. Based on the intention, the result is adapted to support the user in an optimal way.

Abstract: Search queries are typically interpreted as a specification of a user's information need. Typically the search query is interpreted either as is or based on the context of the user, for instance a user profile, his/her previous searches, or any other background information. The actual intent of the user – the goal s/he wants to achieve with information retrieval – is an important part of a user’s context. In this paper we present the results of an exploratory study on the interplay between the goals of users and their search behavior in multimedia retrieval.

Abstract: This paper presents a novel approach that combines both in-network, application-layer adaptation and network-layer traffic control of scalable video streams based on the H.264/SVC standard. In the IPTV/VoD scenario considered, an intercepting RTSP/RTP proxy performs admission control of the requested video, based on the signaled scalability information, and decides whether the content can be streamed without changes or in an adapted version. The proxy configures the network layer appropriately in order to separate the video stream from best-effort traffic on the same link. Rather than performing fixed bandwidth allocation, our proxy approach uses the Hierarchical Token Bucket (HTB) queuing discipline to allow for borrowing bandwidth between traffic classes. In that setting, two different allocation policies are introduced. The Hard Reservation Policy (HRP) performs admission control and adaptation on the video streams and does not modify video bandwidth allocation after admission. In contrast, the Flexible Borrowing Policy (FBP) restricts the admission control to the base layer of the SVC stream. The packets carrying MGS enhancement layer data are marked with priorities by the proxy and are handled at the network layer by a priority-based queuing mechanism. Both a qualitative comparison and an experimental evaluation of the two policies are given.

Abstract: One of the most active research topics in the field of video signal processing is scalable video coding (SVC). The recently published extension of the H.264/AVC video coding standard introduces scalability features by employing a layered encoding of the video stream. In our work we investigated the usage of this scalable extension of H.264/AVC for in-network multimedia adaptation. We developed an RTSP/RTP-based proxy which exploits the layered encoding of the video and can perform real-time video adaptation on an inexpensive off-the-shelf WiFi router. This is achieved by applying a stateful, packet-based adaptation approach that keeps the computational costs at a minimum. With that approach it is possible to simultaneously adapt multiple video streams to varying network conditions or to the capabilities of the consumers' end-devices. In our demonstration we show the streaming of two scalable video streams from a server to a client and the in-network adaptation of the video at the WiFi router. The adaptation can be controlled interactively in the temporal, spatial and SNR domains.

Abstract: Planning Video-on-Demand (VoD) services based on the server architecture and the available equipment is always a challenging task. We created a formal model to support the design of distributed video servers that adapt dynamically and automatically to changing client demands and to network and host parameters. The model makes it possible to estimate the available throughput, and defines evaluation criteria for VoD services relating to utilization and load balance, video usage, client satisfaction and costs. The dynamism of the frame model originates from the possible state transitions which have to be defined in a core model. The core model is responsible for configuration recommendation, which determines how clients are served depending on the properties of their requests, system configuration and system load. Furthermore, it decides on the optimal placement of the server components in the network. The usability of the model is illustrated by examples.

Abstract: This paper addresses the efficient adaptation of encrypted scalable video content (H.264/SVC). RTP-based in-network adaptation schemes on a media aware network element (MANE) in an IPTV and VoD scenario are considered. Two basic alternatives to implement encryption and adaptation of H.264/SVC content are investigated: (i) full, format-independent encryption making use of Secure RTP (SRTP); (ii) SVC-specific encryption that leaves the metadata relevant for adaptation (NAL unit headers) unencrypted. The SRTP-based scheme (i) is straightforward to deploy, but requires the MANE to be in the security context of the delivery, i.e., to be a trusted node. For adaptation, the content needs to be decrypted, scaled, and re-encrypted. The SVC-specific approach (ii) enables both full and selective encryption, e.g., of the base layer only. SVC-specific encryption is based on our own previous work, which is substantially extended and detailed in this paper. The adaptation MANE can now be an untrusted node; adaptation becomes a low-complexity process, avoiding full decryption and re-encryption of the content. This paper presents the first experimental comparison of these two approaches and evaluates whether multimedia-specific encryption can lead to performance and application benefits. Potential security threats and security properties of the two approaches in the IPTV and VoD scenario are analyzed at a basic level. In terms of runtime performance on the MANE, our SVC-specific encryption scheme significantly outperforms the SRTP-based approach. SVC-specific encryption is also superior in terms of induced end-to-end delays. The performance can be improved even further by selective application of the SVC-specific encryption scheme. The results indicate that efficient adaptation of SVC-encrypted content on low-end, untrusted network devices is feasible.

2009
[33]	Theodore Zahariadis, Catherine Lamy-Bergot, Thomas Schierl, Karsten Grüneberg, Luca Celetto, Christian Timmerer, Content Adaptation Issues in the Future Internet, Chapter in Towards the Future Internet - A European Research Perspective (Georgios Tselentis, John Domingue, Alex Galis, Anastasius Gavras, David Hausheer, Srdjan Krco, Volkmar Lotz, Theodore Zahariadis, eds.), IOS Press, Amsterdam, Netherlands, pp. 283-292, 2009. Abstract: Future Media Internet is envisaged to provide the means to share and distribute (advanced) multimedia content and services with superior quality and striking flexibility, in a trusted and personalized way, improving citizens' quality of life, working conditions, edutainment and safety. Based on work that has taken place in projects ICT SEA and ICT OPTIMIX, and the Media Delivery Platforms Cluster of projects, we try to provide the challenges and the way ahead in the area of content adaptation.
[32]	Markus Waltl, Christian Timmerer, Hermann Hellwagner, A Test-Bed for Quality of Multimedia Experience Evaluation of Sensory Effects, In Proceedings of the First International Workshop on Quality of Multimedia Experience (QoMEX 2009) (Touradj Ebrahim, Khaled El-Maleh, Gokce Dane, Lina Karam, eds.), IEEE, Los Alamitos, CA, USA, pp. 145-150, 2009. Abstract: This paper introduces a prototype test-bed for triggering sensory effects like light, wind, or vibration when presenting audiovisual resources, e.g., a video, to users. The ISO/IEC MPEG is currently standardizing the Sensory Effect Description Language (SEDL) for describing such effects. This language is briefly described in the paper, and the test-bed, which is intended for evaluating the quality of the multimedia experience of users, is presented. It consists of a video annotation tool for sensory effects, a corresponding simulation tool, and a real test system. Initial experiments and results on determining the color of light effects from the video content are reported.
[31]	Christian Timmerer, Jean Gelissen, Markus Waltl, Hermann Hellwagner, Interfacing with Virtual Worlds, In Proceedings of the 2009 NEM Summit (Halid Hrasnica, ed.), Eurescom – the European Institute for Research and Strategic Studies in Telecommunications – GmbH, Heidelberg, pp. 118-123, 2009. Abstract: Virtual worlds (often referred to as 3D3C for 3D visualization & navigation and the 3C’s of Community, Creation and Commerce) integrate existing and emerging (media) technologies (e.g., instant messaging, video, 3D, VR, AI, chat, voice, etc.) that allow for the support of existing and the development of new kinds of networked services. The emergence of virtual worlds as platforms for networked services is recognized by businesses as an important enabler as it offers the power to reshape the way companies interact with their environments (markets, customers, suppliers, creators, stakeholders, etc.) in a fashion comparable to the Internet and to allow for the development of new (breakthrough) business models, services, applications and devices. Each virtual world, however, has a different culture and audience making use of these specific worlds for a variety of reasons. These differences in existing Metaverses permit users to have unique experiences. In order to bridge these differences in existing and emerging Metaverses, a standardized framework is required, i.e., MPEG-V Media Context and Control (ISO/IEC 23005), that will provide a lower entry level to (multiple) virtual worlds both for the provider of goods and services and for the user. The aim of this paper is to provide an overview of MPEG-V and its intended standardization areas. Additionally, a review of MPEG-V’s most advanced part – Sensory Information – is given.
[30]	Christian Timmerer, Johannes Jaborning, Hermann Hellwagner, A Comparison and Mapping Model, In Proceedings of the 9th Workshop on Multimedia Metadata (WMM'09) (Ralf Klamma, Romulus Grigoras, Vincent Charvillat, Harald Kosch, eds.), http://ceur-ws.org, Aachen, Germany, pp. 18, 2009. Abstract: Nowadays, mobile devices have implemented several transmission technologies which enable access to the Internet and increase the bit rate for data exchange. Despite modern mobile processors and high-resolution displays, mobile devices will never reach the stage of a powerful notebook or desktop system (for example, due to battery-powered CPUs or simply the small-sized displays). Due to these limitations, the deliverable content for these devices should be adapted based on their capabilities, including a variety of aspects (e.g., from terminal to network characteristics). These capabilities should be described in an interoperable way. In practice, however, there are many standards available and a common mapping model between these standards is not in place. Therefore, in this paper we describe such a mapping model and its implementation aspects. In particular, we focus on the whole delivery context (i.e., terminal capabilities, network characteristics, user preferences, etc.) and investigate the two most prominent state-of-the-art description schemes, namely User Agent Profile (UAProf) and Usage Environment Description (UED).
[29]	Anita Sobe, Laszlo Böszörmenyi, Non-sequential Multimedia Caching, In 2009 First International Conference on Advances in Multimedia (Dan Burdescu, Petre Dini, eds.), IEEE, Los Alamitos, CA, USA, pp. 158-161, 2009.
[28]	Anita Sobe, Single Sign-On in IMS-based IPTV Systems, VdM, Saarbrücken, Germany, pp. 80, 2009.
[27]	Klaus Schoeffmann, Mathias Lux, Mario Taschwer, Laszlo Böszörmenyi, Visualization of Video Motion in Context of Video Browsing, In ICME'09 Proceedings of the 2009 IEEE international Conference on Multimedia and Expo (CY Lin, I Cox, eds.), IEEE, Los Alamitos, CA, USA, pp. 658-661, 2009. Abstract: We present a new approach for video browsing using visualization of motion direction and motion intensity statistics by color and brightness variations. Statistics are collected from motion vectors of H.264/AVC encoded video streams, so full video decoding is not required. By interpreting visualized motion patterns of video segments, users are able to quickly identify scenes similar to a prototype scene or identify potential scenes of interest. We give some examples of motion patterns with different semantic value, including camera zooms, hill jumps of ski-jumpers, and the repeated appearance of a news speaker. In a user study we show that certain scenes of interest can be found significantly faster using our video browsing tool than using a video player with VCR-like controls.
[26]	Klaus Schoeffmann, Mario Taschwer, Laszlo Böszörmenyi, Video Browsing Using Motion Visualization, In Proceedings of the International Conference on Multimedia and Expo 2009 (CY Lin, I Cox, eds.), IEEE, Los Alamitos, CA, USA, pp. 1835-1836, 2009. Abstract: We present a video browsing tool that uses a novel and powerful visualization technique of video motion. The tool provides an interactive navigation index that allows users to quickly and easily recognize content semantics like scenes with fast/slow motion (in general or according to a specific direction), scenes showing still/moving objects in front of a still/moving background, camera pans, or camera zooms. Moreover, the visualization facilitates identification of similar segments in a video. A first user study has shown encouraging results.
[25]	Klaus Schoeffmann, Laszlo Böszörmenyi, Video Browsing Using Interactive Navigation Summaries, In Content-Based Multimedia Indexing, 2009. CBMI '09 (Yannis Avrithis, Stefanos Kollias, eds.), IEEE, Los Alamitos, CA, USA, pp. 243-248, 2009. Abstract: A new approach for interactive video browsing is described. The novelty of the proposed approach is the flexible concept of interactive navigation summaries. Similar to time sliders, commonly used with standard soft video players, navigation summaries allow random access to a video. In addition, they also provide abstract visualizations of the content at a user-defined level of detail and, thus, quickly communicate content characteristics to the user. Navigation summaries can provide visual information about both low-level and high-level features. The concept fully integrates the user, who knows best which navigation summary at which level of detail could be most beneficial for his/her current video browsing task, and provides him/her with a flexible set of navigation means. A first user study has shown that our approach can significantly outperform standard soft video players - the state-of-the-art “poor man’s” video browsing tool.
[24]	Klaus Schoeffmann, Laszlo Böszörmenyi, Interactive Video Browsing of H.264 Content Based on Just-in-Time Analysis, In Advances in Semantic Media Adaptation and Personalization (Marios C Angelides, Phivos Mylonas, eds.), Auerbach Publications, Boca Raton, FL, USA, pp. 159-179, 2009.
[23]	Klaus Schoeffmann, Mathias Lux, Laszlo Böszörmenyi, A Novel Approach for Fast and Accurate Commercial Detection in H.264/AVC Bit Streams Based on Logo Identification, In Advances in Multimedia Modeling (Benoit Huet, Alan Smeaton, Ketan Mayer-Patel, Yannis Avrithis, eds.), Springer, Berlin, Heidelberg, New York, pp. 119-127, 2009. Abstract: Commercial blocks provide no extra value for video indexing, retrieval, archiving, or summarization of TV broadcasts. Therefore, automatic detection of commercial blocks is an important topic in the domain of multimedia information systems. We present a commercial detection approach which is based on logo detection performed in the compressed domain. The novelty of our approach is that by taking advantage of advanced features of the H.264/AVC coding, it is both significantly faster and more exact than existing approaches working directly on compressed data. Our approach enables removal of commercials in a fraction of real-time while achieving an average recall of 97.33% with an average precision of 99.31%. Moreover, due to its run-time performance, our approach can also be employed on low performance devices, for instance DVB recorders.
[22]	Bernhard Reiterer, Cyril Concolato, Hermann Hellwagner, Natural-Language-based Conversion of Images to Mobile Multimedia Experiences, In Proceedings of 1st International ICST Conference on User Centric Media - UCMedia 2009 (Patros Daras, Imrich Chlamtac, eds.), Springer, Berlin, Heidelberg, New York, pp. 4 - CD, 2009. Abstract: We describe an approach for viewing any large, detail-rich picture on a small display by generating a video from the image, as taken by a virtual camera moving across it at varying distance. Our main innovation is the ability to build the virtual camera's motion from a textual description of a picture, e.g., a museum caption, so that relevance and ordering of image regions are determined by co-analyzing image annotations and natural language text. Furthermore, our system arranges the resulting presentation such that it is synchronized with an audio track generated from the text by use of a text-to-speech system.
[21]	Bernhard Reiterer, Hermann Hellwagner, Animated Picture Presentation Steered by Natural Language, In Proceedings International InterMedia Summer School 2009 (Magnenat-Thalmann Nadia, Han Seunghyun, Potopsaltou Dimitris, eds.), MIRALab at University of Geneva, Geneva, pp. 24-32, 2009. Abstract: In this paper, we present an approach for presenting large, feature-rich pictures on small displays by generating an animation and subsequently a video from the image, as it could be taken by a virtual camera moving across the image. Our main innovation is the ability to build the virtual camera's motion upon a textual description of a picture, as from a museum caption, so that relevance and ordering of image regions are determined by co-analyzing image annotations and text. Furthermore, our system can arrange the resulting presentation in such a way that it is synchronized with an audio track generated from the text by use of a text-to-speech system.
[20]	Bernhard Reiterer, Janine Lachner, Andreas Lorenz, Andreas Zimmermann, Hermann Hellwagner, Research Directions Toward User-centric Multimedia, In Advances in Semantic Media Adaptation and Personalization (Marios C Angelides, Phivos Mylonas, Manolis Wallace, eds.), Auerbach Publications, Boca Raton (Florida), pp. 21-42, 2009. Abstract: Currently, much research aims at coping with the shortcomings in multimedia consumption that may exist in a user's current context, e.g., due to the absence of appropriate devices at many locations, a lack of capabilities of mobile devices, restricted access to content, or non-personalized user interfaces. Recently, solutions to specific problems have been emerging, e.g., wireless access to multimedia repositories over standardized interfaces; however, due to usability restrictions the user has to spend much effort on, or is even incapable of, fulfilling his/her demands. The vision of user-centric multimedia places the user in the center of multimedia services to support his/her multimedia consumption intelligently, dealing with the aforementioned issues while minimizing required work. Essential features of such a vision are comprehensive context awareness, personalized user interfaces, and multimedia content adaptation. These aspects are addressed in this paper as major challenges toward a user-centric multimedia framework.
[19]	Mathias Lux, An Evaluation of Metrics for Retrieval of MPEG-7 Semantic Descriptions, In Multimedia, 2009. ISM '09. 11th IEEE International Symposium on (Jeffrey Tsai, Ramesh Jain, eds.), IEEE, Los Alamitos, CA, USA, pp. 546-551, 2009. Abstract: MPEG-7 is an extensive multimedia metadata standard covering a huge number of aspects of metadata. However, as with most metadata standards, details of usage and application of the standard are – at least partially – open to interpretation. In the case of MPEG-7, storage and transmission of high-level metadata on the concept level are defined, but retrieval methods are not proposed. So if, for instance, a user annotates photos using the MPEG-7 semantic description scheme, there are no standardized ways to retrieve the photos based on the annotation. In this paper we propose metrics for retrieval based on the MPEG-7 semantic description scheme and evaluate them in a digital photo retrieval scenario.
[18]	Mathias Lux, Oge Marques, Klaus Schoeffmann, Laszlo Böszörmenyi, Georg Lajtai, A novel tool for summarization of arthroscopic videos, In Multimedia Tools and Applications, Springer, Berlin, Heidelberg, New York, pp. 521-544, 2009. [bib][url] [abstract] Abstract: Arthroscopic surgery is a minimally invasive procedure that uses a small camera to generate video streams, which are recorded and subsequently archived. In this paper we present a video summarization tool and demonstrate how it can be successfully used in the domain of arthroscopic videos. The proposed tool generates a keyframe-based summary, which clusters visually similar frames based on user-selected visual features and appropriate dissimilarity metrics. We discuss how this tool can be used for arthroscopic videos, taking advantage of several domain-specific aspects, without losing its ability to work on general-purpose videos. Experimental results confirm the feasibility of the proposed approach and encourage extending it to other application domains.
[17]	Mathias Lux, Klaus Schoeffmann, Oge Marques, Laszlo Böszörmenyi, A Novel Tool for Quick Video Summarization using Keyframe Extraction Techniques, In 9th Workshop on Multimedia Metadata (WMM'09) (Romulus Grigoras, Vincent Charvillat, Ralf Klamma, Harald Kosch, eds.), http://ceur-ws.org, Aachen, Germany, pp. 62-76, 2009. [bib][url] [abstract] Abstract: The increasing availability of short, unstructured video clips on the Web has generated an unprecedented need to organize, index, annotate and retrieve video contents to make them useful to potential viewers. This paper presents a novel, simple, and easy-to-use tool to benchmark different low-level features for video summarization based on keyframe extraction. Moreover, it shows the usefulness of the benchmarking tool by developing hypotheses for a chosen domain through an exploratory study. It discusses the results of exploratory studies involving users and their judgment of what makes the summary generated by the tool a good one.
[16]	Harald Kosch, Christian Timmerer, Multimedia Metadata and Semantic Management, In IEEE Computing Now, IEEE, vol. Multimedia Metadata and Semantic Management, no. December 2009, Los Alamitos, CA, USA, pp. 00, 2009. [bib]
[15]	Marian Kogler, Manfred Del Fabro, Mathias Lux, Klaus Schoeffmann, Laszlo Böszörmenyi, Global vs. Local Feature in Video Summarization: Experimental Results, In Proceedings of the 10th International Workshop of the Multimedia Metadata Community on Semantic Multimedia Database Technologies (SeMuDaTe'09) in conjunction with the 4th International Conference on Semantic and Digital Media Technologies (SAMT 2009) (Ralf Klamma, Harald Kosch, Mathias Lux, Florian Stegmaier, eds.), http://ceur-ws.org, Aachen, Germany, pp. 6, 2009. [bib][url]
[14]	Christoph Kofler, Mathias Lux, Dynamic presentation adaptation based on user intent classification, In MM '09 Proceedings of the 17th ACM international conference on Multimedia (Wen Gao, Yong Rui, Alan Hanjalic, eds.), NA, NA, pp. 1117-1118, 2009. [bib][url] [doi] [abstract] Abstract: Results of internet searches are typically presented as lists. When searching for digital photos, however, different search result presentations offer different benefits. If users are primarily interested in the visual content of images, a thumbnail grid may be more appropriate than a list. For people searching for photos taken at a specific place, image metadata in the result presentation is of interest too. In this paper we present an application which monitors a user's behavior while searching for digital photos and classifies the user's intention. Based on the intention, the result is adapted to support the user in an optimal way.
[13]	Christoph Kofler, Mathias Lux, An Exploratory Study on the Explicitness of User Intentions in Digital Photo Retrieval, In Proceedings of I-KNOW ’09 and I-SEMANTICS ’09 (Klaus Tochtermann, Hermann Maurer, eds.), TU Graz & Know Center, Graz, Austria, pp. 208-214, 2009. [bib][url] [abstract] Abstract: Search queries are typically interpreted as a specification of a user's information need. Typically, the search query is either interpreted as is or based on the context of a user, for instance a user profile, his/her previously undertaken searches, or any other background information. The actual intent of the user – the goal s/he wants to achieve with information retrieval – is an important part of a user’s context. In this paper we present the results of an exploratory study on the interplay between the goals of users and their search behavior in multimedia retrieval.
[12]	Ingo Kofler, Robert Kuschnig, Hermann Hellwagner, Improving IPTV Services by H.264/SVC Adaptation and Traffic Control, In Proceedings of the IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB) (Pablo Angueira, Ulrich Reimers, eds.), IEEE, Los Alamitos, CA, USA, pp. 1-6, 2009. [bib][url] [doi] [abstract] Abstract: This paper presents a novel approach that combines both in-network, application-layer adaptation and network-layer traffic control of scalable video streams based on the H.264/SVC standard. In the IPTV/VoD scenario considered, an intercepting RTSP/RTP proxy performs admission control of the requested video, based on the signaled scalability information, and decides whether the content can be streamed without changes or in an adapted version. The proxy configures the network layer appropriately in order to separate the video stream from best-effort traffic on the same link. Rather than performing fixed bandwidth allocation, our proxy approach uses the Hierarchical Token Bucket (HTB) queuing discipline to allow for borrowing bandwidth between traffic classes. In that setting, two different allocation policies are introduced. The Hard Reservation Policy (HRP) performs admission control and adaptation on the video streams and does not modify video bandwidth allocation after admission. In contrast, the Flexible Borrowing Policy (FBP) restricts the admission control to the base layer of the SVC stream. The packets carrying MGS enhancement layer data are marked with priorities by the proxy and are handled at the network layer by a priority-based queuing mechanism. Both a qualitative comparison and an experimental evaluation of the two policies are given.
[11]	Ingo Kofler, Robert Kuschnig, Hermann Hellwagner, In-Network Real-Time Adaptation of Scalable Video Content on a WiFi Router, In Proceedings of the 6th IEEE Consumer Communications and Networking Conference (CCNC) (Simon Gibbs, Alan Messer, eds.), IEEE, Los Alamitos, CA, USA, pp. 2, 2009. [bib] [doi] [pdf] [abstract] Abstract: One of the most active research topics in the field of video signal processing is scalable video coding (SVC). The recently published extension of the H.264/AVC video coding standard introduces scalability features by employing a layered encoding of the video stream. In our work we investigated the usage of this scalable extension of H.264/AVC for in-network multimedia adaptation. We developed an RTSP/RTP-based proxy which exploits the layered encoding of the video and can perform real-time video adaptation on an inexpensive off-the-shelf WiFi router. This is achieved by applying a stateful, packet-based adaptation approach that keeps the computational costs at a minimum. With that approach it is possible to simultaneously adapt multiple video streams to varying network conditions or to the capabilities of the consumers' end-devices. In our demonstration we show the streaming of two scalable video streams from a server to a client and the in-network adaptation of the video at the WiFi router. The adaptation can be controlled interactively in the temporal, spatial and SNR domains.
[10]	Peter Karpati, Tibor Szkaliczki, Laszlo Böszörmenyi, Designing and scaling distributed VoD servers, In Multimedia Tools and Applications, Springer, vol. 41, no. 1, Berlin, Heidelberg, New York, pp. 55-91, 2009. [bib][url] [abstract] Abstract: Planning Video-on-Demand (VoD) services based on the server architecture and the available equipment is always a challenging task. We created a formal model to support the design of distributed video servers that adapt dynamically and automatically to the changing client demands, network and host parameters. The model makes it possible to estimate the available throughput, and defines evaluation criteria for VoD services relating to utilization and load balance, video usage, client satisfaction and costs. The dynamism of the frame model originates from the possible state transitions which have to be defined in a core model. The core model is responsible for configuration recommendation which determines how clients are served depending on the properties of their requests, system configuration and system load. Furthermore, it decides on the optimal placement of the server components in the network. The usability of the model is illustrated with examples.
[9]	Hermann Hellwagner, Robert Kuschnig, Thomas Stütz, Andreas Uhl, Efficient In-Network Adaptation of Encrypted H.264/SVC Content, In Journal on Signal Processing: Image Communication, Elsevier B.V., vol. 24, no. 9, Amsterdam, pp. 740-758, 2009. [bib] [pdf] [abstract] Abstract: This paper addresses the efficient adaptation of encrypted scalable video content (H.264/SVC). RTP-based in-network adaptation schemes on a media aware network element (MANE) in an IPTV and VoD scenario are considered. Two basic alternatives to implement encryption and adaptation of H.264/SVC content are investigated: (i) full, format-independent encryption making use of Secure RTP (SRTP); (ii) SVC-specific encryption that leaves the metadata relevant for adaptation (NAL unit headers) unencrypted. The SRTP-based scheme (i) is straightforward to deploy, but requires the MANE to be in the security context of the delivery, i.e., to be a trusted node. For adaptation, the content needs to be decrypted, scaled, and re-encrypted. The SVC-specific approach (ii) enables both full and selective encryption, e.g., of the base layer only. SVC-specific encryption is based on our own previous work, which is substantially extended and detailed in this paper. The adaptation MANE can now be an untrusted node; adaptation becomes a low-complexity process, avoiding full decryption and re-encryption of the content. This paper presents the first experimental comparison of these two approaches and evaluates whether multimedia-specific encryption can lead to performance and application benefits. Potential security threats and security properties of the two approaches in the IPTV and VoD scenario are elementarily analyzed. In terms of runtime performance on the MANE our SVC-specific encryption scheme significantly outperforms the SRTP-based approach. SVC-specific encryption is also superior in terms of induced end-to-end delays. The performance can even be improved by selective application of the SVC-specific encryption scheme. The results indicate that efficient adaptation of SVC-encrypted content on low-end, untrusted network devices is feasible.