[852] | Vladislav Kashansky, Dragi Kimovski, Radu Prodan, Prateek Agrawal, Fabrizio Marozzo, Gabriel Iuhasz, Marek Marozzo, Javier Garcia-Blas, M3AT: Monitoring Agents Assignment Model for Data-Intensive Applications, In 2020 28th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), IEEE, pp. 72-79, 2020.
Abstract: Nowadays, massive amounts of data are acquired, transferred, and analyzed nearly in real-time by utilizing a large number of computing and storage elements interconnected through high-speed communication networks. However, one issue that still requires research effort is to enable efficient monitoring of applications and infrastructures of such complex systems. In this paper, we introduce an Integer Linear Programming (ILP) model called M3AT for optimised assignment of monitoring agents and aggregators on large-scale computing systems. We identified a set of requirements from three representative data-intensive applications and exploited them to define the model’s input parameters. We evaluated the scalability of M3AT using the Constraint Integer Programming (SCIP) solver with its default configuration on synthetic data sets. Preliminary results show that the model provides optimal assignments for systems composed of up to 200 monitoring agents while keeping the number of aggregators constant, and demonstrates variable sensitivity with respect to the scale of monitoring data aggregators and the limitation policies imposed.
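As a toy illustration of the assignment structure that M3AT encodes as an ILP (a cost matrix of agents against aggregators plus per-aggregator capacity limits), the sketch below brute-forces the same kind of problem at small scale in plain Python. It is neither the paper's model nor its SCIP formulation; all costs, capacities, and names are made up for illustration.

```python
from itertools import product

def assign_agents(costs, capacity):
    """Brute-force the cheapest agent-to-aggregator assignment.

    costs[i][j]  -- cost of assigning monitoring agent i to aggregator j
    capacity[j]  -- maximum number of agents aggregator j may serve
    Returns (best_assignment, best_cost), or (None, None) if infeasible.
    """
    n_agents = len(costs)
    n_aggs = len(capacity)
    best, best_cost = None, float("inf")
    # enumerate every mapping of agents to aggregators (toy scale only)
    for assignment in product(range(n_aggs), repeat=n_agents):
        load = [0] * n_aggs
        for j in assignment:
            load[j] += 1
        # enforce per-aggregator capacity limits
        if any(load[j] > capacity[j] for j in range(n_aggs)):
            continue
        cost = sum(costs[i][assignment[i]] for i in range(n_agents))
        if cost < best_cost:
            best, best_cost = assignment, cost
    return (best, best_cost) if best is not None else (None, None)
```

An ILP solver such as SCIP explores the same feasible set far more efficiently via branch-and-cut, which is what makes the 200-agent instances in the paper tractable.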
|
[851] | Jeroen van der Hooft, Maria Torres Vega, Tim Wauters, Christian Timmerer, Ali C. Begen, Filip De Turck, Raimund Schatz, From Capturing to Rendering: Volumetric Media Delivery with Six Degrees of Freedom, In IEEE Communications Magazine, Institute of Electrical and Electronics Engineers (IEEE), vol. 58, no. 10, pp. 49-55, 2020.
Abstract: Technological improvements are rapidly advancing holographic-type content distribution. Significant research efforts have been made to meet the low-latency and high-bandwidth requirements set forward by interactive applications such as remote surgery and virtual reality. Recent research made six degrees of freedom (6DoF) for immersive media possible, where users may both move their heads and change their position within a scene. In this article, we present the status and challenges of 6DoF applications based on volumetric media, focusing on the key aspects required to deliver such services. Furthermore, we present results from a subjective study to highlight relevant directions for future research.
|
[850] | Jeroen van der Hooft, Maria Torres Vega, Christian Timmerer, Ali C. Begen, Filip De Turck, Raimund Schatz, Objective and Subjective QoE Evaluation for Adaptive Point Cloud Streaming, In 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX), IEEE, 2020.
Abstract: Volumetric media has the potential to provide the six degrees of freedom (6DoF) required by truly immersive media. However, achieving 6DoF requires ultra-high bandwidth transmissions, which real-world wide area networks cannot provide economically. Therefore, recent efforts have started to target efficient delivery of volumetric media, using a combination of compression and adaptive streaming techniques. It remains, however, unclear how the effects of such techniques on the user-perceived quality can be accurately evaluated. In this paper, we present the results of an extensive objective and subjective quality of experience (QoE) evaluation of volumetric 6DoF streaming. We use PCC-DASH, a standards-compliant means for HTTP adaptive streaming of scenes comprising multiple dynamic point cloud objects. By means of a thorough analysis we investigate the perceived quality impact of the available bandwidth, rate adaptation algorithm, viewport prediction strategy and the user’s motion within the scene. We determine which of these aspects has more impact on the user’s QoE, and to what extent subjective and objective assessments are aligned.
|
[849] | Samira Hayat, Evsen Yanmaz, Christian Bettstetter, Timothy X. Brown, Multi-objective drone path planning for search and rescue with quality-of-service requirements, In Autonomous Robots, Springer Science and Business Media LLC, vol. 44, no. 7, pp. 1183-1198, 2020.
Abstract: We incorporate communication into the multi-UAV path planning problem for search and rescue missions to enable dynamic task allocation via information dissemination. Communication is not treated as a constraint but a mission goal. While achieving this goal, our aim is to avoid compromising the area coverage goal and the overall mission time. We define the mission tasks as: search, inform, and monitor at the best possible link quality. Building on our centralized simultaneous inform and connect (SIC) path planning strategy, we propose two adaptive strategies: (1) SIC with QoS (SICQ): optimizes search, inform, and monitor tasks simultaneously and (2) SIC following QoS (SIC+): first optimizes search and inform tasks together and then finds the optimum positions for monitoring. Both strategies utilize information as soon as it becomes available to determine UAV tasks. The strategies can be tuned to prioritize certain tasks in relation to others. We illustrate that more tasks can be performed in the given mission time by efficient incorporation of communication in the path design. We also observe that the quality of the resultant paths improves in terms of connectivity.
|
[848] | Cathal Gurrin, Tu-Khiem Le, Van-Tu Ninh, Duc-Tien Dang-Nguyen, Björn Thor Jonsson, Jakub Lokoč, Wolfgang Hürst, Minh-Triet Tran, Klaus Schöffmann, Introduction to the Third Annual Lifelog Search Challenge (LSC'20), In Proceedings of the 2020 International Conference on Multimedia Retrieval, ACM, pp. 584-585, 2020.
Abstract: The Lifelog Search Challenge (LSC) is an annual comparative benchmarking activity for comparing approaches to interactive retrieval from multi-modal lifelogs. LSC'20, the third such challenge, attracts fourteen participants with their interactive lifelog retrieval systems. These systems are comparatively evaluated in front of a live audience at the LSC workshop at ACM ICMR'20 in Dublin, Ireland. This overview motivates the challenge, presents the dataset and system configuration used in the challenge, and briefly introduces the participating teams.
|
[847] | Negin Ghamsarian, Klaus Schoeffmann, Morteza Khademi, Blind MV-based video steganalysis based on joint inter-frame and intra-frame statistics, In Multimedia Tools and Applications, Springer Science and Business Media LLC, vol. 80, no. 6, pp. 1-23, 2020.
Abstract: Despite all its irrefutable benefits, the development of steganography methods has sparked ever-increasing concerns over steganography abuse in recent decades. To prevent the inimical usage of steganography, steganalysis approaches have been introduced. Since motion vector manipulation leads to random and indirect changes in the statistics of videos, MV-based video steganography has been the center of attention in recent years. In this paper, we propose a 54-dimensional feature set exploiting spatio-temporal features of motion vectors to blindly detect MV-based stego videos. The idea behind the proposed features originates from two facts. First, there are strong dependencies among neighboring MVs due to utilizing rate-distortion optimization techniques and belonging to the same rigid object or static background. Accordingly, MV manipulation can leave important clues in the differences between each MV and the MVs of the neighboring blocks. Second, a majority of MVs in original videos are locally optimal after decoding with respect to the Lagrangian multiplier, notwithstanding the information loss during compression. Motion vector alteration during information embedding can affect these statistics, which can be utilized for steganalysis. Experimental results have shown that our features’ performance far exceeds that of state-of-the-art steganalysis methods. This outstanding performance lies in the utilization of complementary spatio-temporal statistics affected by MV manipulation, as well as feature dimensionality reduction applied to prevent overfitting. Moreover, unlike other existing MV-based steganalysis methods, our proposed features can be adjusted to various settings of state-of-the-art video codec standards, such as sub-pixel motion estimation and variable-block-size motion estimation.
|
[846] | Negin Ghamsarian, Mario Taschwer, Klaus Schoeffmann, Deblurring Cataract Surgery Videos Using a Multi-Scale Deconvolutional Neural Network, In 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI), IEEE, pp. 872-876, 2020.
Abstract: A common quality impairment observed in surgery videos is blur, caused by object motion or a defocused camera. Degraded image quality hampers the progress of machine-learning-based approaches in learning and recognizing semantic information in surgical video frames like instruments, phases, and surgical actions. This problem can be mitigated by automatically deblurring video frames as a preprocessing method for any subsequent video analysis task. In this paper, we propose and evaluate a multi-scale deconvolutional neural network to deblur cataract surgery videos. Experimental results confirm the effectiveness of the proposed approach in terms of the visual quality of frames as well as PSNR improvement.
|
[845] | Negin Ghamsarian, Enabling Relevance-Based Exploration of Cataract Videos, In Proceedings of the 2020 International Conference on Multimedia Retrieval, ACM, pp. 378-382, 2020.
Abstract: Training new surgeons, as one of the major duties of experienced expert surgeons, demands a considerable supervisory investment from them. To expedite the training process and subsequently reduce the extra workload on their tight schedule, surgeons are seeking a surgical video retrieval system. Automatic workflow analysis approaches can optimize the training procedure by indexing the surgical video segments to be used for online video exploration. The aim of the doctoral project described in this paper is to provide the basis for a cataract video exploration system that is able to (i) automatically analyze and extract the relevant segments of videos from cataract surgery, and (ii) provide interactive exploration means for browsing archives of cataract surgery videos. In particular, we apply deep-learning-based classification and segmentation approaches to cataract surgery videos to enable automatic phase and action recognition and similarity detection.
|
[844] | Negin Ghamsarian, Hadi Amirpourazarian, Christian Timmerer, Mario Taschwer, Klaus Schöffmann, Relevance-Based Compression of Cataract Surgery Videos Using Convolutional Neural Networks, In Proceedings of the 28th ACM International Conference on Multimedia, ACM, pp. 3577-3585, 2020.
Abstract: Recorded cataract surgery videos play a prominent role in training and investigating the surgery, and enhancing the surgical outcomes. Due to storage limitations in hospitals, however, the recorded cataract surgeries are deleted after a short time and this precious source of information cannot be fully utilized. Lowering the quality to reduce the required storage space is not advisable since the degraded visual quality results in the loss of relevant information that limits the usage of these videos. To address this problem, we propose a relevance-based compression technique consisting of two modules: (i) relevance detection, which uses neural networks for semantic segmentation and classification of the videos to detect relevant spatio-temporal information, and (ii) content-adaptive compression, which restricts the amount of distortion applied to the relevant content while allocating less bitrate to irrelevant content. The proposed relevance-based compression framework is implemented considering five scenarios based on the definition of relevant information from the target audience's perspective. Experimental results demonstrate the capability of the proposed approach in relevance detection. We further show that the proposed approach can achieve high compression efficiency by abstracting substantial redundant information while retaining the high quality of the relevant content.
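The content-adaptive compression module described above restricts distortion on relevant content while spending fewer bits on the rest. A minimal sketch of that idea, not the paper's implementation: map a binary per-block relevance mask to quantization parameters, where relevant blocks receive a lower (finer) QP. The base QP and the offsets are hypothetical values chosen for illustration.

```python
def qp_map(relevance_mask, base_qp=32, relevant_offset=-6, irrelevant_offset=6):
    """Derive a per-block quantization parameter grid from a binary
    relevance mask: relevant blocks are quantized more finely (lower QP),
    irrelevant blocks more coarsely (higher QP)."""
    return [[base_qp + (relevant_offset if r else irrelevant_offset) for r in row]
            for row in relevance_mask]
```

In an actual encoder this QP grid would be fed to the rate control, e.g. as per-CTU delta-QP values.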
|
[843] | Markus Fox, Mario Taschwer, Klaus Schoeffmann, Pixel-Based Tool Segmentation in Cataract Surgery Videos with Mask R-CNN, In 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), IEEE, pp. 565-568, 2020.
Abstract: Automatically detecting surgical tools in recorded surgery videos is an important building block of further content-based video analysis. In ophthalmology, the results of such methods can support training and teaching of operation techniques and enable investigation of medical research questions on a dataset of recorded surgery videos. While previous methods used frame-based classification techniques to predict the presence of surgical tools without localizing them, we apply a recent deep-learning segmentation method (Mask R-CNN) to localize and segment surgical tools used in ophthalmic cataract surgery. We add ground-truth annotations for multi-class instance segmentation to two existing datasets of cataract surgery videos and make the resulting datasets publicly available for research purposes. In the absence of comparable results from the literature, we tune and evaluate the Mask R-CNN approach on these datasets for instrument segmentation/localization and achieve promising results (61% mean average precision at 50% intersection over union for instance segmentation, working even better for bounding box detection or binary segmentation), establishing a reasonable baseline for further research. Moreover, we experiment with common data augmentation techniques and analyze the achieved segmentation performance with respect to each class (instrument), providing evidence for future improvements of this approach.
|
[842] | Hamid Mohammadi Fard, Radu Prodan, Felix Wolf, Dynamic Multi-objective Scheduling of Microservices in the Cloud, In 2020 IEEE/ACM 13th International Conference on Utility and Cloud Computing (UCC), IEEE, pp. 386-393, 2020.
Abstract: For many applications, a microservices architecture promises better performance and flexibility compared to a conventional monolithic architecture. In spite of the advantages of a microservices architecture, deploying microservices poses various challenges for service developers and providers alike. One of these challenges is the efficient placement of microservices on the cluster nodes. Improper allocation of microservices can quickly waste resource capacities and cause low system throughput. In the last few years, new technologies in orchestration frameworks, such as the possibility of multiple schedulers for pods in Kubernetes, have improved microservice scheduling, but using these technologies requires involving both the service developer and the service provider in analyzing the behavior of workloads. Using memory and CPU requests specified in the service manifest, we propose a general microservices scheduling mechanism that can operate efficiently in private clusters or enterprise clouds. We model the scheduling problem as a complex variant of the knapsack problem and solve it using a multi-objective optimization approach. Our experiments show that the proposed mechanism is highly scalable and simultaneously increases utilization of both memory and CPU, which in turn leads to better throughput when compared to the state-of-the-art.
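A placement heuristic in the spirit of this multi-objective, knapsack-style formulation can be sketched in a few lines: score every node the pod fits on so that CPU and memory are consumed in balance, preventing one resource from stranding the other. This is an illustrative stand-in, not the paper's mechanism; the resource units, weights, and dictionary layout are assumptions.

```python
def pick_node(nodes, pod):
    """Place a pod on the node that keeps CPU and memory utilization
    both high and balanced (a toy multi-objective scoring rule).

    nodes -- list of dicts with free 'cpu' and 'mem' capacities
    pod   -- dict with requested 'cpu' and 'mem'
    Returns the index of the chosen node, or None if the pod fits nowhere.
    """
    best, best_score = None, float("inf")
    for i, node in enumerate(nodes):
        if pod["cpu"] > node["cpu"] or pod["mem"] > node["mem"]:
            continue  # pod does not fit on this node
        cpu_left = node["cpu"] - pod["cpu"]
        mem_left = node["mem"] - pod["mem"]
        # prefer placements that leave CPU and memory evenly consumed
        # and tightly packed, so neither resource is wasted
        score = abs(cpu_left - mem_left) + 0.5 * (cpu_left + mem_left)
        if score < best_score:
            best, best_score = i, score
    return best
```

A production scheduler (e.g. a custom Kubernetes scheduler) would plug a rule like this into its node-scoring phase after filtering infeasible nodes.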
|
[841] | Hamid Mohammadi Fard, Radu Prodan, Felix Wolf, A Container-Driven Approach for Resource Provisioning in Edge-Fog Cloud, Chapter in Algorithmic Aspects of Cloud Computing, Springer International Publishing, no. 12041, pp. 59-76, 2020.
Abstract: With the emerging Internet of Things (IoT), distributed systems enter a new era. While pervasive and ubiquitous computing already became reality with the use of the cloud, IoT networks present new challenges because the ever-growing number of IoT devices increases the latency of transferring data to central cloud data centers. Edge and fog computing represent practical solutions to counter the huge communication needs between IoT devices and the cloud. Considering the complexity and heterogeneity of edge and fog computing, however, resource provisioning remains the Achilles' heel of efficiency for IoT applications. Given the importance of operating-system virtualization (so-called containerization), we propose an application-aware container scheduler that helps to orchestrate dynamic heterogeneous resources of edge and fog architectures. By considering available computational capacity, the proximity of computational resources to data producers and consumers, and the dynamic system status, our proposed scheduling mechanism selects the most adequate host to achieve the minimum response time for a given IoT service. We show how a hybrid use of containers and serverless microservices improves the performance of running IoT applications in fog-edge clouds and lowers usage fees. Moreover, our approach outperforms the scheduling mechanisms of Docker Swarm.
|
[840] | Alireza Erfanian, Farzad Tashtarian, Reza Farahani, Christian Timmerer, Hermann Hellwagner, On Optimizing Resource Utilization in AVC-based Real-time Video Streaming, In 2020 6th IEEE Conference on Network Softwarization (NetSoft), IEEE, pp. 301-309, 2020.
Abstract: Real-time video streaming traffic and related applications have witnessed significant growth in recent years. However, this has been accompanied by some challenging issues, predominantly resource utilization. IP multicasting, as a solution, suffers from many practical issues. Scalable video coding could not gain wide adoption in the industry, due to reduced compression efficiency and additional computational complexity. The emerging software-defined networking (SDN) and network function virtualization (NFV) paradigms enable researchers to cope with IP multicasting issues in novel ways. In this paper, by leveraging the SDN and NFV concepts, we introduce a cost-aware approach to provide advanced video coding (AVC)-based real-time video streaming services in the network. In this study, we use two types of virtualized network functions (VNFs): virtual reverse proxy (VRP) and virtual transcoder (VTF) functions. At the edge of the network, VRPs are responsible for collecting clients’ requests and sending them to an SDN controller. Then, executing a mixed-integer linear program (MILP) determines an optimal multicast tree from an appropriate set of video source servers to the optimal group of transcoders. The desired video is sent over the multicast tree. The VTFs transcode the received video segments and stream them to the requesting VRPs over unicast paths. To mitigate the time complexity of the proposed MILP model, we propose a heuristic algorithm that determines a near-optimal solution in a reasonable amount of time. Using the MiniNet emulator, we evaluate the proposed approach and show it achieves better performance in terms of cost and resource utilization in comparison with traditional multicast and unicast approaches.
|
[839] | Ekrem Cetinkaya, M. Furkan Kıraç, Image denoising using deep convolutional autoencoder with feature pyramids, In Turkish Journal of Electrical Engineering and Computer Sciences, vol. 28, no. 4, pp. 2096-2109, 2020.
Abstract: Image denoising is one of the fundamental problems in the image processing field since it is the preliminary step for many computer vision applications. Various approaches have been used for image denoising throughout the years, from spatial filtering to model-based approaches. Having outperformed all traditional methods, neural-network-based discriminative methods have gained popularity in recent years. However, most of these methods still struggle to achieve flexibility against various noise levels and types. In this paper, a deep convolutional autoencoder combined with a variant of feature pyramid network is proposed for image denoising. Simulated data generated by Blender software along with corrupted natural images are used during training to improve robustness against various noise levels. Experimental results show that the proposed method can achieve competitive performance in blind Gaussian denoising with significantly less training time required compared to state-of-the-art methods. Extensive experiments showed the proposed method gives promising performance in a wide range of noise levels with a single network.
|
[838] | Ekrem Cetinkaya, Hadi Amirpour, Christian Timmerer, Mohammad Ghanbari, FaME-ML: Fast Multirate Encoding for HTTP Adaptive Streaming Using Machine Learning, In 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP), IEEE, pp. 87-90, 2020.
Abstract: HTTP Adaptive Streaming (HAS) is the most common approach for delivering video content over the Internet. The requirement to encode the same content at different quality levels (i.e., representations) in HAS is a challenging problem for content providers. Fast multirate encoding approaches try to accelerate this process by reusing information from previously encoded representations. In this paper, we propose to use convolutional neural networks (CNNs) to speed up the encoding of multiple representations with a specific focus on parallel encoding. In parallel encoding, the overall time-complexity is limited to the maximum time-complexity of one of the representations that are encoded in parallel. Therefore, instead of reducing the time-complexity for all representations, the highest time-complexities are reduced. Experimental results show that FaME-ML achieves significant time-complexity savings in parallel encoding scenarios (41% on average) with a slight increase in bitrate and quality degradation compared to the HEVC reference software.
|
[837] | Hanna Borgli, Vajira Thambawita, Pia H. Smedsrud, Steven Hicks, Debesh Jha, Sigrun L. Eskeland, Kristin Ranheim Randel, Konstantin Pogorelov, Mathias Lux, Duc-Tien Dang-Nguyen, Dag Johansen, Carsten Griwodz, Håkon K. Stensland, Enrique Garcia-Ceja, Peter T. Schmidt, Hugo L. Hammer, Michael A. Riegler, Paal Halvorsen, Thomas de Lange, HyperKvasir, a comprehensive multi-class image and video dataset for gastrointestinal endoscopy, In Scientific Data, Springer Science and Business Media LLC, vol. 7, no. 1, 2020.
Abstract: Artificial intelligence is currently a hot topic in medicine. However, medical data is often sparse and hard to obtain due to legal restrictions and the lack of medical personnel to carry out the cumbersome and tedious process of manually labeling training data. These constraints make it difficult to develop systems for automatic analysis, like detecting disease or other lesions. In this respect, this article presents HyperKvasir, the largest image and video dataset of the gastrointestinal tract available today. The data is collected during real gastro- and colonoscopy examinations at Bærum Hospital in Norway and partly labeled by experienced gastrointestinal endoscopists. The dataset contains 110,079 images and 374 videos, and represents anatomical landmarks as well as pathological and normal findings. The total number of images and video frames together is around 1 million. Initial experiments demonstrate the potential benefits of artificial intelligence-based computer-assisted diagnosis systems. The HyperKvasir dataset can play a valuable role in developing better algorithms and computer-assisted examination systems not only for gastro- and colonoscopy, but also for other fields in medicine.
|
[836] | Neha Bhadwal, Prateek Agrawal, Vishu Madaan, A Machine Translation System from Hindi to Sanskrit Language using Rule based Approach, In Scalable Computing: Practice and Experience, vol. 21, no. 3, pp. 543-554, 2020.
Abstract: Machine Translation is an area of Natural Language Processing which can replace the laborious task of manual translation. Sanskrit language is among the ancient Indo-Aryan languages. There are numerous works of art and literature in Sanskrit. It has also been a medium for creating treatises of philosophical work as well as works on logic, astronomy and mathematics. On the other hand, Hindi is the most prominent language of India. Moreover, it is among the most widely spoken languages across the world. This paper is an effort to bridge the language barrier between Hindi and Sanskrit such that any text in Hindi can be translated to Sanskrit. The technique used for achieving the aforesaid objective is rule-based machine translation. The salient linguistic features of the two languages are used to perform the translation. The results are produced in the form of two confusion matrices wherein a total of 50 random sentences and 100 tokens (Hindi words or phrases) were taken for system evaluation. The semantic evaluation of 100 tokens produces an accuracy of 94% while the pragmatic analysis of 50 sentences produces an accuracy of around 86%. Hence, the proposed system can be used to understand the whole translation process and can further be employed as a tool for learning as well as teaching. Further, this application can be embedded in local-communication-based assistive Internet of Things (IoT) devices like Alexa or Google Assistant.
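The general shape of a rule-based translation pipeline, dictionary lookup followed by morphological rewrite rules, can be sketched generically as follows. The lexicon entries and suffix rules used here are hypothetical transliterated examples for illustration only; they are not the paper's actual Hindi-Sanskrit rule base.

```python
def translate(sentence, lexicon, suffix_rules):
    """Toy rule-based translator: direct lexicon lookup first, then
    suffix rewriting for inflected forms, passing unknown tokens through.

    lexicon      -- dict mapping source words/stems to target words
    suffix_rules -- list of (source_suffix, target_suffix) pairs
    """
    out = []
    for token in sentence.split():
        if token in lexicon:                     # direct dictionary match
            out.append(lexicon[token])
            continue
        for src_suffix, dst_suffix in suffix_rules:
            if token.endswith(src_suffix):       # morphological rule fires
                stem = token[: -len(src_suffix)]
                out.append(lexicon.get(stem, stem) + dst_suffix)
                break
        else:
            out.append(token)                    # unknown token: keep as-is
    return " ".join(out)
```

A real system layers many more rules on top (gender, number, case agreement), which is where the paper's linguistic analysis of the two languages comes in.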
|
[835] | Abdelhak Bentaleb, Christian Timmerer, Ali C. Begen, Roger Zimmermann, Performance Analysis of ACTE: a Bandwidth Prediction Method for Low-Latency Chunked Streaming, In ACM Transactions on Multimedia Computing, Communications, and Applications, Association for Computing Machinery (ACM), vol. 16, no. 2s, pp. 1-24, 2020.
Abstract: HTTP adaptive streaming with chunked transfer encoding can offer low-latency streaming without sacrificing coding efficiency. This allows media segments to be delivered while still being packaged. However, conventional schemes often make widely inaccurate bandwidth measurements due to the presence of idle periods between the chunks, and hence cause sub-optimal adaptation decisions. To address this issue, we earlier proposed ACTE (ABR for Chunked Transfer Encoding), a bandwidth prediction scheme for low-latency chunked streaming. While ACTE was a significant step forward, in this study we focus on two still remaining open areas, namely (i) quantifying the impact of encoding parameters, including chunk and segment durations, bitrate levels, minimum interval between IDR-frames and frame rate, on ACTE, and (ii) exploring the impact of video content complexity on ACTE. We thoroughly investigate these questions and report on our findings. We also discuss some additional issues that arise in the context of pursuing very low latency HTTP video streaming.
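The core idea of predicting bandwidth from per-chunk throughput samples over a sliding window can be approximated with a plain least-squares line fit; the sketch below is a simplified stand-in for ACTE's actual filter, and assumes the idle periods between chunks have already been excluded from the samples.

```python
def predict_bandwidth(samples, window=5):
    """Predict the next bandwidth value by fitting a straight line to the
    last `window` per-chunk throughput samples and extrapolating one step.

    samples -- chronological list of throughput measurements (e.g. Mbps)
    """
    recent = samples[-window:]
    n = len(recent)
    if n < 2:
        return recent[-1] if recent else 0.0   # too little history to fit
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(recent) / n
    # ordinary least-squares slope over the window
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, recent))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    # evaluate the fitted line at the next index, x = n
    return mean_y + slope * (n - mean_x)
```

The window size trades responsiveness against noise, which mirrors the parameter-sensitivity questions the paper investigates.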
|
[834] | Michal Barcis, Agata Barcis, Hermann Hellwagner, Information Distribution in Multi-Robot Systems: Utility-Based Evaluation Model, In Sensors, MDPI AG, vol. 20, no. 3, 2020.
Abstract: This work addresses the problem of information distribution in multi-robot systems, with an emphasis on multi-UAV (unmanned aerial vehicle) applications. We present an analytical model that helps evaluate and compare different information distribution schemes in a robotic mission. It serves as a unified framework to represent the usefulness (utility) of each message exchanged by the robots. It can be used either on its own in order to assess the information distribution efficacy or as a building block of solutions aimed at optimizing information distribution. Moreover, we present multiple examples of instantiating the model for specific missions. They illustrate various approaches to defining the utility of different information types. Finally, we introduce a proof of concept showing the applicability of the model in a robotic system by implementing it in Robot Operating System 2 (ROS 2) and performing a simple simulated mission using a network emulator. We believe the introduced model can serve as a basis for further research on generic solutions for assessing or optimizing information distribution.
|
[833] | Hadi Amirpour, Ekrem Cetinkaya, Christian Timmerer, Mohammad Ghanbari, Fast Multi-rate Encoding for Adaptive HTTP Streaming, In 2020 Data Compression Conference (DCC), IEEE, 2020.
Abstract: Adaptive HTTP streaming is the preferred method to deliver multimedia content on the Internet. It provides multiple representations of the same content in different qualities (i.e. bit-rates and resolutions) and allows the client to request segments from the available representations in a dynamic, adaptive way depending on its context. The growing number of representations in adaptive HTTP streaming makes encoding of one video segment at different representations a challenging task in terms of encoding time-complexity. In this paper, information from both the highest and lowest quality representations is used to limit Rate Distortion Optimization (RDO) for each Coding Tree Unit (CTU) in High Efficiency Video Coding. Our proposed method first encodes the highest quality representation and subsequently uses it to encode the lowest quality representation. In particular, the block structure and the selected reference frame of both the highest and lowest quality representations are then used to predict and shorten the RDO process of each CTU for intermediate quality representations. Our proposed method introduces a delay of two CTUs thanks to employing parallel processing techniques. Experimental results show a significant reduction in time-complexity compared to the reference software (38%) and the state of the art (10%), while quality degradation is negligible.
|
[832] | Hadi Amirpour, Christian Timmerer, Mohammad Ghanbari, Towards View-Aware Adaptive Streaming of Holographic Content, In 2020 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), IEEE, 2020.
Abstract: Holography is able to reconstruct a three-dimensional structure of an object by recording full wave fields of light emitted from the object. This requires a huge amount of data to be encoded, stored, transmitted, and decoded for holographic content, making its practical usage challenging especially for bandwidth-constrained networks and memory-limited devices. In the delivery of holographic content via the internet, bandwidth wastage should be avoided to tackle high bandwidth demands of holography streaming. For real-time applications, encoding time-complexity is also a major problem. In this paper, the concept of dynamic adaptive streaming over HTTP (DASH) is extended to holography image streaming and view-aware adaptation techniques are studied. As each area of a hologram contains information of a specific view, instead of encoding and decoding the entire hologram, just the part required to render the selected view is encoded and transmitted via the network based on the users’ interactivity. Four different strategies, namely, monolithic, single view, adaptive view, and non-real time streaming strategies are explained and compared in terms of bandwidth requirements, encoding time-complexity, and bitrate overhead. Experimental results show that the view-aware methods reduce the required bandwidth for holography streaming at the cost of a bitrate increase.
|
[831] | Jesus Aguilar-Armijo, Babak Taraghi, Christian Timmerer, Hermann Hellwagner, Dynamic Segment Repackaging at the Edge for HTTP Adaptive Streaming, In 2020 IEEE International Symposium on Multimedia (ISM), IEEE, pp. 17-24, 2020.
[bib] [doi] [abstract]
Abstract: Adaptive video streaming systems typically support different media delivery formats, e.g., MPEG-DASH and HLS, replicating the same content multiple times into the network. Such a diversified system results in inefficient use of storage, caching, and bandwidth resources. The Common Media Application Format (CMAF) emerged to simplify HTTP Adaptive Streaming (HAS), providing a single encoding and packaging format for segmented media content and offering opportunities for bandwidth savings, more cache hits, and reduced storage requirements. However, CMAF is not yet supported by most devices. To solve this issue, we present a solution where we maintain the main advantages of CMAF while supporting heterogeneous devices using different media delivery formats. For that purpose, we propose to dynamically convert the content from CMAF to the desired media delivery format at an edge node. We study the bandwidth savings of our proposed approach using an analytical model and simulation, resulting in bandwidth savings of up to 20% with different media delivery format distributions. We analyze the runtime impact of the required operations on the segmented content in two scenarios: the classic one, with four different media delivery formats, and the proposed one, using CMAF-only delivery through the network. We compare both scenarios under different edge compute power assumptions. Finally, we perform experiments in a real video streaming testbed delivering MPEG-DASH using CMAF content to serve a DASH and an HLS client, performing the media conversion for the latter.
|
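The bandwidth benefit described in the abstract above can be illustrated with a toy backhaul model (not the paper's analytical model, whose details are not given here): in the classic setup the edge must fetch one copy per distinct delivery format requested, while with CMAF-only delivery it fetches a single copy and repackages at the edge.

```python
# Toy model, illustrative assumptions only: origin fetches per segment behind
# one edge cache, as a function of the delivery formats requested by clients.

def classic_backhaul(formats_requested: set) -> int:
    """Classic setup: one origin fetch per distinct format requested."""
    return len(formats_requested)

def cmaf_backhaul(formats_requested: set) -> int:
    """CMAF-only delivery: a single fetch, repackaged per client at the edge."""
    return 1 if formats_requested else 0

def backhaul_saving(formats_requested: set) -> float:
    """Relative backhaul traffic saved by CMAF-only delivery."""
    classic = classic_backhaul(formats_requested)
    return 0.0 if classic == 0 else 1 - cmaf_backhaul(formats_requested) / classic
```

With the four formats of the classic scenario all in demand, the toy model saves 75% of backhaul fetches; the paper's reported up-to-20% figure reflects realistic, non-uniform format distributions.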
[830] | Prateek Agrawal, Deepak Chaudhary, Vishu Madaan, Anatoliy Zabrovskiy, Radu Prodan, Dragi Kimovski, Christian Timmerer, Automated bank cheque verification using image processing and deep learning methods, In Multimedia Tools and Applications, Springer Science and Business Media LLC, vol. 80, no. 4, pp. 5319-5350, 2020.
[bib][url] [doi] [abstract]
Abstract: Automated bank cheque verification using image processing is an attempt to complement the present cheque truncation system, as well as to provide an alternate methodology for the processing of bank cheques with minimal human intervention. The clearance of bank cheques and monetary transactions should not only be reliable and robust but also time-saving, a major factor for countries with large populations. To perform the task of cheque verification, we developed a tool which acquires the key components of a cheque leaflet essential for cheque clearance, using image processing and deep learning methods. These components include the bank branch code, cheque number, legal as well as courtesy amount, account number, and signature patterns. Our work aims at benefiting the banking system by providing a competent, automated cheque-based monetary transaction system. For this research, we used the Institute for Development and Research in Banking Technology (IDRBT) cheque dataset and deep-learning-based convolutional neural networks (CNNs), which gave us an accuracy of 99.14% for handwritten numeric character recognition, resulting in improved accuracy and precise assessment of the handwritten components of bank cheques. For machine-printed script, we used the MATLAB in-built OCR method and achieved a satisfactory accuracy of 97.7%. For signature verification, we used the Scale Invariant Feature Transform (SIFT) for feature extraction and a Support Vector Machine (SVM) as classifier, achieving an accuracy of 98.10%.
|
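The final decision stage implied by the abstract above can be sketched as a simple consistency check: a cheque clears only if the recognized courtesy (numeric) and legal (worded) amounts agree and the signature matched the stored specimen. This is a hypothetical sketch; the field names and the injected `words_to_number` converter are assumptions, not the authors' code.

```python
# Hypothetical decision logic combining the recognized cheque fields.
from dataclasses import dataclass

@dataclass
class ChequeFields:
    branch_code: str
    cheque_number: str
    courtesy_amount: str   # numeric amount, as recognized by the digit CNN
    legal_amount: str      # worded amount, as recognized by OCR
    account_number: str
    signature_ok: bool     # outcome of the SIFT + SVM signature check

def verify_cheque(fields: ChequeFields, words_to_number) -> bool:
    """Clear the cheque only if both amounts agree and the signature passed.

    `words_to_number` converts a worded amount to an integer; it is injected
    here to keep the sketch self-contained.
    """
    return fields.signature_ok and \
        words_to_number(fields.legal_amount) == int(fields.courtesy_amount)
```

In practice each field would arrive with a recognition confidence, and low-confidence cheques would be routed to a human clerk rather than rejected outright.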
[829] | Prateek Agrawal, Anatoliy Zabrovskiy, Adithyan Ilangovan, Christian Timmerer, Radu Prodan, FastTTPS: fast approach for video transcoding time prediction and scheduling for HTTP adaptive streaming videos, In Cluster Computing, Springer Science and Business Media LLC, pp. 1-17, 2020.
[bib][url] [doi] [abstract]
Abstract: HTTP adaptive streaming of video content has become an integral part of the internet and dominates other streaming protocols and solutions. The duration of creating video content for adaptive streaming ranges from seconds up to several hours or days, due to the plethora of video transcoding parameters and video source types. Although the computing resources of different transcoding platforms and services constantly increase, accurate and fast transcoding time prediction and scheduling are still crucial. In this paper we propose a novel method called fast video transcoding time prediction and scheduling (FastTTPS) of x264-encoded videos based on three phases: (i) transcoding data engineering, (ii) transcoding time prediction, and (iii) transcoding scheduling. The first phase is responsible for video sequence selection, segmentation, and the collection of the feature data required for predicting the transcoding time. The second phase develops an artificial neural network (ANN) model for segment transcoding time prediction based on transcoding parameters and derived video complexity features. The third phase compares a number of parallel schedulers to map the predicted transcoding segments onto the underlying high-performance computing resources. Experimental results show that our predictive ANN model minimizes the transcoding mean absolute error (MAE) and mean square error (MSE) by up to 1.7 and 26.8, respectively. In terms of scheduling, our method reduces the transcoding time by up to 38% using a Max–Min algorithm compared to the actual transcoding time without prediction information.
|
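The scheduling phase named in the abstract above uses a Max–Min heuristic. A minimal sketch, assuming identical workers (where Max–Min reduces to largest-predicted-segment-first list scheduling); the paper's actual scheduler may differ in detail:

```python
# Max-Min heuristic sketch: repeatedly take the segment with the largest
# predicted transcoding time and give it to the worker that would finish it
# earliest. Assumes identical workers; with heterogeneous speeds the
# completion time would be loads[i] + t / speed[i].

def max_min_schedule(pred_times, n_workers):
    """Return (per-worker segment lists, makespan) for predicted times."""
    loads = [0.0] * n_workers
    assignment = [[] for _ in range(n_workers)]
    # Largest predicted segment first (Max part of Max-Min).
    for seg, t in sorted(enumerate(pred_times), key=lambda p: -p[1]):
        # Worker with the earliest completion time (Min part of Max-Min).
        w = min(range(n_workers), key=lambda i: loads[i] + t)
        loads[w] += t
        assignment[w].append(seg)
    return assignment, max(loads)
```

For predicted times `[4, 2, 3, 1]` on two workers this yields a makespan of 5, i.e., a balanced split, which is why accurate per-segment predictions from the ANN phase matter: the heuristic is only as good as its time estimates.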
[828] | Jesus Carretero, Emmanuel Jeannot, Albert Y. Zomaya, eds., Ultrascale Computing Systems, Institution of Engineering and Technology, 2019.
[bib][url] [doi] [abstract]
Abstract: With the spread of the Internet, applications and web-based services, distributed computing infrastructures, local parallel systems, and the availability of huge amounts of dispersed data, software-dependent systems will become more and more connected and networked, leading to the creation of supersystems. The phrase ultrascale computing systems (UCSs) refers to this type of IT supersystem. UCSs are complex large-scale ecosystems aggregating high-performance parallel and distributed computing infrastructures. These systems provide the end user with intrinsically heterogeneous solutions, located at multiple sites and capable of delivering tremendous performance boosts. They are indispensable to applications requiring several orders of magnitude increase in the size of data and in the computing power relative to today's conventional technologies. However, to truly speak of UCSs, we must consider several orders of magnitude increase in the size of data, in the computing power, and in the network complexity relative to what exists now.
|