
Showing papers in "IEEE Journal of Selected Topics in Signal Processing in 2012"


Journal ArticleDOI
TL;DR: It is shown that with the proposed games, global optimization is achieved with local information: specifically, the local altruistic game maximizes the network throughput and the local congestion game minimizes the network collision level.
Abstract: We investigate the problem of achieving global optimization for distributed channel selections in cognitive radio networks (CRNs), using game theoretic solutions. To cope with the lack of centralized control and local influences, we propose two special cases of local interaction game to study this problem. The first is local altruistic game, in which each user considers the payoffs of itself as well as its neighbors rather than considering itself only. The second is local congestion game, in which each user minimizes the number of competing neighbors. It is shown that with the proposed games, global optimization is achieved with local information. Specifically, the local altruistic game maximizes the network throughput and the local congestion game minimizes the network collision level. Also, the concurrent spatial adaptive play (C-SAP), which is an extension of the existing spatial adaptive play (SAP), is proposed to achieve the global optimum both autonomously and rapidly.

300 citations
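
To make the game-theoretic machinery concrete, here is a minimal Python sketch of spatial-adaptive-play-style (log-linear) updates in a local altruistic channel-selection game. The conflict graph, the binary collision payoff, and the temperature are illustrative assumptions, not the paper's model; the paper's C-SAP additionally lets non-interfering users update concurrently.

```python
import numpy as np

rng = np.random.default_rng(0)
N, C, beta = 6, 3, 10.0                 # users, channels, inverse temperature (assumed)
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
ch = rng.integers(0, C, N)              # initial channel selections

def payoff(u, ch):
    # toy throughput: zero if any neighbor on the conflict graph collides
    return 0.0 if any(ch[v] == ch[u] for v in adj[u]) else 1.0

def altruistic_utility(u, ch):
    # local altruistic game: own payoff plus the neighbors' payoffs
    return payoff(u, ch) + sum(payoff(v, ch) for v in adj[u])

for _ in range(500):                    # spatial adaptive play: one user at a time
    u = int(rng.integers(N))
    utils = []
    for c in range(C):
        trial = ch.copy(); trial[u] = c
        utils.append(altruistic_utility(u, trial))
    prob = np.exp(beta * np.array(utils)); prob /= prob.sum()
    ch[u] = rng.choice(C, p=prob)       # Boltzmann (log-linear) action selection
print("channels:", ch, "| colliding users:", sum(payoff(u, ch) == 0 for u in range(N)))
```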


Journal ArticleDOI
TL;DR: The general conclusion is that existing VQA algorithms are not well-equipped to handle distortions that vary over time.
Abstract: We introduce a new video quality database that models video distortions in heavily-trafficked wireless networks and that contains measurements of human subjective impressions of the quality of videos. The new LIVE Mobile Video Quality Assessment (VQA) database consists of 200 distorted videos created from 10 RAW HD reference videos, obtained using a RED ONE digital cinematographic camera. While the LIVE Mobile VQA database includes distortions that have been previously studied such as compression and wireless packet-loss, it also incorporates dynamically varying distortions that change as a function of time, such as frame-freezes and temporally varying compression rates. In this article, we describe the construction of the database and detail the human study that was performed on mobile phones and tablets in order to gauge the human perception of quality on mobile devices. The subjective study portion of the database includes both the differential mean opinion scores (DMOS) computed from the ratings that the subjects provided at the end of each video clip, and the continuous temporal scores that the subjects recorded as they viewed the video. The study involved over 50 subjects and resulted in 5,300 summary subjective scores and time-sampled subjective traces of quality. In the behavioral portion of the article we analyze human opinion using statistical techniques, and also study a variety of models of temporal pooling that may reflect strategies that the subjects used to make the final decision on video quality. Further, we compare the quality ratings obtained from the tablet and the mobile phone studies in order to study the impact of these different display modes on quality. We also evaluate several objective image and video quality assessment (IQA/VQA) algorithms with regard to their efficacy in predicting visual quality. A detailed correlation analysis and statistical hypothesis testing is carried out. Our general conclusion is that existing VQA algorithms are not well-equipped to handle distortions that vary over time. The LIVE Mobile VQA database, along with the subject DMOS and the continuous temporal scores, is being made available to researchers in the field of VQA at no cost in order to further research in the area of video quality assessment.

299 citations
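
For readers unfamiliar with DMOS, the following sketch shows one common recipe for computing it from raw per-subject ratings; actual studies, including this one, typically add z-scoring and subject screening, which are omitted here.

```python
import numpy as np

def dmos(ref_scores, test_scores):
    """DMOS from per-subject ratings of a reference clip and its distorted
    version (one rating per subject). One common recipe: take per-subject
    difference scores, then average across subjects. Details (z-scoring,
    subject rejection) vary between studies."""
    diff = np.asarray(ref_scores, float) - np.asarray(test_scores, float)
    return diff.mean(), diff.std(ddof=1) / np.sqrt(len(diff))  # larger = worse

mean_dmos, stderr = dmos([90, 85, 88, 92], [60, 55, 70, 58])   # made-up ratings
print(f"DMOS = {mean_dmos:.1f} +/- {stderr:.1f}")
```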


Journal ArticleDOI
TL;DR: Several criteria for quantitative comparisons of source content, test conditions, and subjective ratings are proposed, which will allow researchers to make better-informed decisions about databases and may also guide the creation of additional test material and the design of future experiments.
Abstract: Databases of images or videos annotated with subjective ratings constitute essential ground truth for training, testing, and benchmarking algorithms for objective quality assessment. More than two dozen such databases are now available in the public domain; they are presented and analyzed in this paper. We propose several criteria for quantitative comparisons of source content, test conditions, and subjective ratings, which are used as the basis for the ensuing analyses and discussion. This information will allow researchers to make better-informed decisions about databases, and may also guide the creation of additional test material and the design of future experiments.

270 citations


Journal ArticleDOI
TL;DR: The main contribution of this paper is the insight that the transmitters' knowledge of channel coherence intervals alone (without any knowledge of the values of channel coefficients) can be surprisingly useful in a multiuser setting, illustrated by the idea of blind interference alignment.
Abstract: The main contribution of this paper is the insight that the transmitters' knowledge of channel coherence intervals alone (without any knowledge of the values of channel coefficients) can be surprisingly useful in a multiuser setting, illustrated by the idea of blind interference alignment that is introduced in this work. Specifically, we explore five network communication problems where the possibility of interference alignment, and consequently the total number of degrees of freedom (DoF) with channel uncertainty at the transmitters, are unknown. These problems share the common property that in each case the best known outer bounds are essentially robust to channel uncertainty and represent the outcome with interference alignment, but the best inner bounds-in some cases conjectured to be optimal-predict a total collapse of DoF, thus indicating the infeasibility of interference alignment under channel uncertainty at transmitters. For each of these settings we show that even with no knowledge of channel coefficient values at the transmitters, under certain heterogeneous block fading models, i.e., when certain users experience smaller coherence time/bandwidth than others, blind interference alignment can be achieved. In each case we also establish the DoF optimality of the blind interference alignment scheme.

265 citations


Journal ArticleDOI
TL;DR: A new compressed sensing framework is proposed for extracting useful second-order statistics of wideband random signals from digital samples taken at sub-Nyquist rates, exploiting the unique sparsity property of the two-dimensional cyclic spectra of communications signals.
Abstract: For cognitive radio networks, efficient and robust spectrum sensing is a crucial enabling step for dynamic spectrum access. Cognitive radios need to not only rapidly identify spectrum opportunities over very wide bandwidth, but also make reliable decisions in noise-uncertain environments. Cyclic spectrum sensing techniques work well under noise uncertainty, but require high-rate sampling which is very costly in the wideband regime. This paper develops robust and compressive wideband spectrum sensing techniques by exploiting the unique sparsity property of the two-dimensional cyclic spectra of communications signals. To do so, a new compressed sensing framework is proposed for extracting useful second-order statistics of wideband random signals from digital samples taken at sub-Nyquist rates. The time-varying cross-correlation functions of these compressive samples are formulated to reveal the cyclic spectrum, which is then used to simultaneously detect multiple signal sources over the entire wide band. Because the proposed wideband cyclic spectrum estimator utilizes all the cross-correlation terms of compressive samples to extract second-order statistics, it is also able to recover the power spectra of stationary signals as a special case, permitting lossless rate compression even for non-sparse signals. Simulation results demonstrate the robustness of the proposed spectrum sensing algorithms against both sampling rate reduction and noise uncertainty in wireless networks.

249 citations
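
A toy numpy illustration of the central idea, recovering second-order statistics (and hence the power spectrum) from sub-Nyquist samples: average the cross-correlations of compressive measurements and invert the known linear map from autocorrelation lags to the compressed covariance. The random sampling matrix, block model, and plain least-squares inversion are simplifications of the paper's cyclic-spectrum estimator.

```python
import numpy as np
from scipy.linalg import toeplitz
from scipy.signal import lfilter

rng = np.random.default_rng(1)
N, M, B = 32, 16, 4000                 # Nyquist block length, compressed length, # blocks
Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # 2x sub-Nyquist sampling matrix (assumed)

# stationary test signal: white noise through a lowpass AR(1) filter
x = lfilter([1.0], [1.0, -0.9], rng.standard_normal(N * B)).reshape(B, N)
Z = x @ Phi.T                          # compressive samples, one block per row
Rz = (Z.T @ Z) / B                     # averaged cross-correlations of compressive samples

# linear map from lags r[0..N-1] to Phi R_x Phi^T (R_x symmetric Toeplitz),
# inverted by least squares to recover the signal's second-order statistics
basis = [toeplitz((np.arange(N) == l).astype(float)) for l in range(N)]
A = np.column_stack([(Phi @ T @ Phi.T).ravel() for T in basis])
r = np.linalg.lstsq(A, Rz.ravel(), rcond=None)[0]
psd = np.abs(np.fft.rfft(np.r_[r, r[-2:0:-1]]))  # power spectrum from recovered lags
print("lowpass spectrum recovered:", bool(np.argmax(psd) == 0))
```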


Journal ArticleDOI
TL;DR: It is shown that it is possible to achieve more than one degree of freedom (DoF) based on only delayed CSIT in the 3-user interference channel and the 2-user X channel consisting of only single antenna nodes, and that retrospective interference alignment is feasible in other settings as well.
Abstract: Maddah-Ali and Tse recently introduced the idea of retrospective interference alignment, i.e., achieving interference alignment with only outdated (stale) channel state information at the transmitter (CSIT), in the context of the vector broadcast channel. Since the scheme relies on the centralized transmitter's ability to reconstruct all the interference seen in previous symbols, it is not clear if retrospective interference alignment can be applied in interference networks consisting of distributed transmitters and receivers, where interference is contributed by multiple transmitters, each of whom can reconstruct only the part of the interference caused by themselves. In this work, we prove that even in such settings, retrospective interference alignment is feasible. Specifically, we show that it is possible to achieve more than one degree of freedom (DoF) based on only delayed CSIT in the 3-user interference channel and the 2-user X channel consisting of only single antenna nodes. Retrospective interference alignment is also shown to be possible in other settings, such as the 2-user multiple-input multiple-output (MIMO) interference channel and with delayed channel output feedback.

222 citations


Journal ArticleDOI
TL;DR: Closed-form formulas are derived to calculate appropriate REBs for two different range expansion strategies; both DL and uplink (UL) inter-cell interference coordination (ICIC) are investigated to enhance picocell performance; and a new macrocell-picocell cooperative scheduling scheme is proposed to mitigate both DL and UL interference caused by macrocells to ER PUEs.
Abstract: In order to expand the downlink (DL) coverage areas of picocells in the presence of an umbrella macrocell, the concept of range expansion has been recently proposed, in which a positive range expansion bias (REB) is added to the DL received signal strengths (RSSs) of picocell pilot signals at user equipments (UEs). Although range expansion may increase DL footprints of picocells, it also results in severe DL inter-cell interference in picocell expanded regions (ERs), because ER picocell user equipments (PUEs) are not connected to the cells that provide the strongest DL RSSs. In this paper, we derive closed-form formulas to calculate appropriate REBs for two different range expansion strategies, investigate both DL and uplink (UL) inter-cell interference coordination (ICIC) to enhance picocell performance, and propose a new macrocell-picocell cooperative scheduling scheme to mitigate both DL and UL interference caused by macrocells to ER PUEs. Simulation results provide insights on REB selection approaches at picocells, and demonstrate the benefits of the proposed macrocell-picocell cooperative scheduling scheme over alternative approaches.

196 citations
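
The biased cell-selection rule itself is simple; the sketch below, with made-up numbers, shows how a REB shifts a UE from the macrocell into the picocell expanded region.

```python
import numpy as np

def serving_cell(rss_dbm, reb_db):
    # biased selection: attach to the cell maximizing DL RSS (dBm) + REB (dB)
    return int(np.argmax(np.asarray(rss_dbm) + np.asarray(reb_db)))

rss = [-78.0, -86.0]                    # [macrocell, picocell] pilot RSS (made up)
print(serving_cell(rss, [0.0, 0.0]))    # 0: unbiased UE stays on the macrocell
print(serving_cell(rss, [0.0, 10.0]))   # 1: a 10 dB REB pulls it into the picocell ER
```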


Journal ArticleDOI
TL;DR: Numerical results confirm that the proposed resource allocation schemes are effective in increasing the network energy efficiency (as compared to rate-maximizing schemes), thus permitting optimization of the use of the energy stored in the battery.
Abstract: The problem of noncooperative resource allocation in multicell uplink orthogonal frequency division multiple access (OFDMA) systems is considered in this paper. Noncooperative games for subcarrier allocation and transmit power control are considered, aiming at maximizing the users' SINRs and, most notably, the users' energy efficiency, measured in bit/Joule and representing the number of error-free delivered bits for each Joule of energy used for transmission. The theory of potential games is used to come up with several noncooperative games admitting Nash equilibrium points. Since the proposed resource allocation games exhibit a computational complexity that may be in some cases prohibitive, approximate, reduced-complexity implementations are also considered. For comparison purposes, some considerations on social-optimum solutions are also discussed. Numerical results confirm that the proposed resource allocation schemes are effective in increasing the network energy efficiency (as compared to rate-maximizing schemes), thus permitting optimization of the use of the energy stored in the battery. Moreover, the proposed approximate implementations exhibit a performance very close to that of the exact procedures.

167 citations
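
A minimal sketch of the bit/Joule utility and a single user's best response, assuming a Shannon-rate model and a fixed circuit-power term; the paper's games couple many such users over subcarriers, which this deliberately omits.

```python
import numpy as np

def energy_efficiency(p, gain, noise, bw=1.0, p_circuit=0.1):
    # bit/Joule: error-free rate divided by total (transmit + circuit) power;
    # the rate model and circuit term are illustrative assumptions
    rate = bw * np.log2(1.0 + gain * p / noise)
    return rate / (p + p_circuit)

p_grid = np.linspace(1e-3, 2.0, 2000)          # one user's best response by search
ee = energy_efficiency(p_grid, gain=0.8, noise=0.05)
print(f"EE-optimal power: {p_grid[np.argmax(ee)]:.3f} W, {ee.max():.2f} bit/Joule")
```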


Journal ArticleDOI
TL;DR: A comparison of the most promising methods for multi-view human action recognition using two publicly available datasets: the INRIA Xmas Motion Acquisition Sequences (IXMAS) Multi-View Human Action Dataset, and the i3DPost Multi-View Human Action and Interaction Dataset.
Abstract: This paper presents a review and comparative study of recent multi-view approaches for human 3D pose estimation and activity recognition. We discuss the application domain of human pose estimation and activity recognition and the associated requirements, covering: advanced human–computer interaction (HCI), assisted living, gesture-based interactive games, intelligent driver assistance systems, movies, 3D TV and animation, physical therapy, autonomous mental development, smart environments, sport motion analysis, video surveillance, and video annotation. Next, we review and categorize recent approaches which have been proposed to comply with these requirements. We report a comparison of the most promising methods for multi-view human action recognition using two publicly available datasets: the INRIA Xmas Motion Acquisition Sequences (IXMAS) Multi-View Human Action Dataset, and the i3DPost Multi-View Human Action and Interaction Dataset. To compare the proposed methods, we give a qualitative assessment of methods which cannot be compared quantitatively, and analyze some prominent 3D pose estimation techniques for applications where not only the performed action needs to be identified but also a more detailed description of the body pose and joint configuration is required. Finally, we discuss some of the shortcomings of multi-view camera setups and outline our thoughts on future directions of 3D body pose estimation and human action recognition.

162 citations


Journal ArticleDOI
TL;DR: This paper proposes a technique performing a classification of the features extracted with EAPs computed on both optical and LiDAR images, leading to a fusion of the spectral, spatial and elevation data.
Abstract: Extended Attribute Profiles (EAPs), which are obtained by applying morphological attribute filters to an image in a multilevel architecture, can be used for the characterization of the spatial characteristics of objects in a scene. EAPs have proved to be discriminant features when considered for thematic classification in remote sensing applications, especially when dealing with very high resolution images. Altimeter data (such as LiDAR) can provide important information which, being complementary to the spectral one, can be valuable for a better characterization of the surveyed scene. In this paper, we propose a technique performing a classification of the features extracted with EAPs computed on both optical and LiDAR images, leading to a fusion of the spectral, spatial and elevation data. The experiments were carried out on LiDAR data together with a hyperspectral image and a multispectral image, acquired on a rural and an urban area of the city of Trento (Italy), respectively. The classification accuracies obtained pointed out the effectiveness of the features extracted by EAPs on both optical and LiDAR data for classification.

144 citations
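
A sketch of the EAP building block using scikit-image area openings/closings as the attribute filters; the bands, thresholds, and the area attribute are illustrative stand-ins for the paper's profiles on hyperspectral principal components and LiDAR rasters.

```python
import numpy as np
from skimage.morphology import area_closing, area_opening

def attribute_profile(band, thresholds=(25, 100, 400)):
    # one EAP building block: attribute thinnings/thickenings of a single band,
    # here with the connected-component area attribute at several thresholds
    return np.stack([area_opening(band, t) for t in thresholds] + [band]
                    + [area_closing(band, t) for t in thresholds])

rng = np.random.default_rng(8)
optical = rng.integers(0, 255, (64, 64)).astype(np.uint8)  # stand-in spectral band
lidar = rng.integers(0, 255, (64, 64)).astype(np.uint8)    # stand-in elevation raster
features = np.concatenate([attribute_profile(optical), attribute_profile(lidar)])
print(features.shape)   # (14, 64, 64): per-pixel feature stack fed to the classifier
```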


Journal ArticleDOI
TL;DR: Experimental results show that HHF and depth adaptive HHF yield virtual images and stereoscopic videos that are free of geometric distortions and have better rendering quality, both subjectively and objectively, than traditional hole-filling approaches.
Abstract: Three-dimensional television (3DTV) is believed to be the future of television broadcasting that would replace current 2D HDTV technology. Future 3DTV would bring a more life-like and visually immersive home entertainment experience, in which users will have the freedom to navigate through the scene to choose a different viewpoint. A desired view can be synthesized at the receiver side using depth image-based rendering (DIBR). While this approach has many advantages, one of the key challenges in DIBR is how to fill the holes caused by disocclusion regions and wrong depth values. In this paper, we propose two new approaches for disocclusion removal in DIBR. Both approaches, namely hierarchical hole-filling (HHF) and depth adaptive hierarchical hole-filling, eliminate the need for any smoothing or filtering of the depth map. Both techniques use a pyramid-like approach to estimate the hole pixels from lower resolution estimates of the 3D warped image. The lower resolution estimates involve a pseudo zero canceling plus Gaussian filtering of the warped image. The depth adaptive HHF incorporates the depth information to produce a higher resolution rendering around previously occluded areas. Experimental results show that HHF and depth adaptive HHF yield virtual images and stereoscopic videos that are free of geometric distortions and have better rendering quality, both subjectively and objectively, than traditional hole-filling approaches.
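
A simplified numpy sketch of the pyramid idea behind HHF: build lower-resolution estimates from valid pixels only, then propagate them back up into the holes. The 2x2 valid-pixel average stands in for the authors' pseudo zero canceling plus Gaussian reduction, and the depth-adaptive weighting is omitted.

```python
import numpy as np
from scipy.ndimage import zoom

def fill_holes_pyramid(img, hole):
    """Pyramid-style hole filling (simplified sketch): lower-resolution levels
    average only valid pixels; their upsampled values are written back into
    hole pixels only. `hole` is a boolean mask of disoccluded pixels."""
    if not hole.any():
        return img
    if min(img.shape) <= 2:                      # base case: fill with global mean
        return np.where(hole, img[~hole].mean() if (~hole).any() else 0.0, img)
    h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
    valid = (~hole[:h, :w]).astype(float).reshape(h // 2, 2, w // 2, 2).sum((1, 3))
    sums = np.where(hole[:h, :w], 0.0, img[:h, :w]).reshape(h // 2, 2, w // 2, 2).sum((1, 3))
    low = np.where(valid > 0, sums / np.maximum(valid, 1), 0.0)
    low = fill_holes_pyramid(low, valid == 0)    # recurse until no holes remain
    up = zoom(low, (img.shape[0] / low.shape[0], img.shape[1] / low.shape[1]), order=1)
    return np.where(hole, up, img)               # fill only the hole pixels

img = np.tile(np.linspace(0, 1, 64), (64, 1))    # toy warped image (a ramp)
hole = np.zeros_like(img, bool); hole[20:40, 25:35] = True
filled = fill_holes_pyramid(np.where(hole, 0.0, img), hole)
print("mean residual in hole:", float(np.abs(filled - img)[hole].mean()))
```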

Journal ArticleDOI
TL;DR: This paper thoroughly reviews the recent advances of perceptual video compression mainly in terms of the three major components, namely, perceptual model definition, implementation of coding, and performance evaluation.
Abstract: With the advances in understanding perceptual properties of the human visual system and constructing their computational models, efforts toward incorporating human perceptual mechanisms in video compression to achieve maximal perceptual quality have received great attention. This paper thoroughly reviews the recent advances of perceptual video compression mainly in terms of the three major components, namely, perceptual model definition, implementation of coding, and performance evaluation. Furthermore, open research issues and challenges are discussed in order to provide perspectives for future research trends.

Journal ArticleDOI
TL;DR: This paper builds an image retargeting quality database, and demonstrates that the metric performance can be further improved, by fusing the descriptors of shape distortion and content information loss.
Abstract: This paper presents the result of a recent large-scale subjective study of image retargeting quality on a collection of images generated by several representative image retargeting methods. Owing to the many approaches to image retargeting that have been developed, there is a need for a diverse, independent, public database of retargeted images and corresponding subjective scores to be freely available. We build an image retargeting quality database, in which 171 retargeted images (obtained from 57 natural source images of different contents) were created by several representative image retargeting methods. The perceptual quality of each image was subjectively rated by at least 30 viewers, and the mean opinion scores (MOS) were obtained. It is revealed that the subject viewers arrived at a reasonable agreement on the perceptual quality of the retargeted images. Therefore, the MOS values obtained can be regarded as the ground truth for evaluating the quality metric performances. The database is made publicly available (Image Retargeting Subjective Database, [Online]. Available: http://ivp.ee.cuhk.edu.hk/projects/demo/retargeting/index.html) to the research community in order to further research on the perceptual quality assessment of the retargeted images. Moreover, the built image retargeting database is analyzed from the perspectives of the retargeting scale, the retargeting method, and the source image content. We discuss how to retarget the images according to the scale requirement and the source image attribute information. Furthermore, several publicly available quality metrics for the retargeted images are evaluated on the built database. How to develop an effective quality metric for retargeted images is discussed through a specifically designed subjective testing process. It is demonstrated that the metric performance can be further improved by fusing the descriptors of shape distortion and content information loss.

Journal ArticleDOI
TL;DR: Realistic and comprehensive mathematical models of the OFDM-based mobile Worldwide Interoperability for Microwave Access (WiMAX) and Third Generation Partnership Project Long Term Evolution (3GPP LTE) signals are developed, and their second-order cyclostationarity is studied.
Abstract: Spectrum sensing and awareness are challenging requirements in cognitive radio (CR). To adequately adapt to the changing radio environment, it is necessary for the CR to detect the presence and classify the on-the-air signals. The wireless industry has shown great interest in orthogonal frequency division multiplexing (OFDM) technology. Hence, classification of OFDM signals has been intensively researched recently. Generic signals have been mainly considered, and there is a need to investigate OFDM standard signals, and their specific discriminating features for classification. In this paper, realistic and comprehensive mathematical models of the OFDM-based mobile Worldwide Interoperability for Microwave Access (WiMAX) and Third Generation Partnership Project Long Term Evolution (3GPP LTE) signals are developed, and their second-order cyclostationarity is studied. Closed-form expressions for the cyclic autocorrelation function (CAF) and cycle frequencies (CFs) of both signal types are derived, based on which an algorithm is proposed for their classification. The proposed algorithm does not require carrier, waveform, and symbol timing recovery, and is immune to phase, frequency, and timing offsets. The classification performance of the algorithm is investigated versus signal-to-noise ratio (SNR), for diverse observation intervals and channel conditions. In addition, the computational complexity is explored versus the signal type. Simulation results show the efficiency of the algorithm in terms of classification performance, and the complexity study proves the real-time applicability of the algorithm.
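
A standard consistent estimator of the cyclic autocorrelation function, useful for seeing how symbol-rate cyclostationarity shows up in practice; the BPSK toy signal and the chosen lag are illustrative, not the WiMAX/LTE closed forms derived in the paper.

```python
import numpy as np

def caf(x, alpha, tau, fs=1.0):
    # cyclic autocorrelation estimate at cycle frequency alpha (Hz), integer lag tau
    n = np.arange(len(x) - tau)
    return np.mean(x[n + tau] * np.conj(x[n]) * np.exp(-2j * np.pi * alpha * n / fs))

rng = np.random.default_rng(2)
sym = rng.choice([-1.0, 1.0], 4096)
x = np.repeat(sym, 8) + 0.5 * rng.standard_normal(4096 * 8)  # BPSK, 8 samples/symbol

# a half-symbol lag exposes the symbol-rate cyclostationarity at alpha = 1/8
print(abs(caf(x, alpha=1 / 8, tau=4)))     # large (~0.3): 1/8 is a cycle frequency
print(abs(caf(x, alpha=0.0401, tau=4)))    # near zero: not a cycle frequency
```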

Journal ArticleDOI
TL;DR: It is shown that, by using the proposed algorithm, the SUs can self-organize into a network partition composed of disjoint coalitions, with the members of each coalition cooperating to jointly optimize their sensing and access performance.
Abstract: Unlicensed secondary users (SUs) in cognitive radio networks are subject to an inherent tradeoff between spectrum sensing and spectrum access. Although each SU has an incentive to sense the primary user (PU) channels for locating spectrum holes, this exploration of the spectrum can come at the expense of a shorter transmission time, and, hence, a possibly smaller capacity for data transmission. This paper investigates the impact of this tradeoff on the cooperative strategies of a network of SUs that seek to cooperate in order to improve their view of the spectrum (sensing), reduce the possibility of interference among each other, and improve their transmission capacity (access). The problem is modeled as a coalitional game in partition form and an algorithm for coalition formation is proposed. Using the proposed algorithm, the SUs can make individual distributed decisions to join or leave a coalition while maximizing their utilities which capture the average time spent for sensing as well as the capacity achieved while accessing the spectrum. It is shown that, by using the proposed algorithm, the SUs can self-organize into a network partition composed of disjoint coalitions, with the members of each coalition cooperating to jointly optimize their sensing and access performance. Simulation results show the performance improvement that the proposed algorithm yields with respect to the noncooperative case. The results also show how the algorithm allows the SUs to self-adapt to changes in the environment such as changes in the traffic of the PUs, or slow mobility.
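
A toy hedonic coalition-formation loop in the spirit of the paper's distributed join/leave decisions; the size-only utility (cooperative-sensing gain times access-time loss) and all parameters are invented for illustration, whereas the paper's utilities depend on the actual sensing and access performance of each coalition.

```python
def su_utility(size, miss=0.3, overhead=0.15):
    # invented per-SU utility: cooperative sensing improves detection with
    # coalition size, but sensing/reporting overhead eats into access time
    detection = 1.0 - miss ** size
    access = max(0.0, 1.0 - overhead * size)
    return detection * access

partition = [{i} for i in range(6)]        # start from the noncooperative case
changed = True
while changed:                             # individual distributed join/leave moves
    changed = False
    for i in range(6):
        cur = next(c for c in partition if i in c)
        best = max(partition, key=lambda c: su_utility(len(c) + (i not in c)))
        if best is not cur and su_utility(len(best) + 1) > su_utility(len(cur)):
            cur.remove(i); best.add(i)
            partition = [c for c in partition if c]
            changed = True
print([sorted(c) for c in partition])      # disjoint coalitions (pairs, here)
```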

Journal ArticleDOI
TL;DR: A power control game in the interference channel is used to highlight the advantages of modeling QoS problems following the notion of SE rather than other equilibrium concepts, e.g., generalized Nash equilibrium.
Abstract: This paper introduces a particular game formulation and its corresponding notion of equilibrium, namely the satisfaction form (SF) and the satisfaction equilibrium (SE). A game in SF models the case where players are uniquely interested in the satisfaction of some individual performance constraints, instead of individual performance optimization. Under this formulation, the notion of equilibrium corresponds to the situation where all players can simultaneously satisfy their individual constraints. The notion of SE models the problem of QoS provisioning in decentralized self-configuring networks. Here, radio devices are satisfied if they are able to provide the requested QoS. Within this framework, the concept of SE is formalized for both pure and mixed strategies considering finite sets of players and actions. In both cases, sufficient conditions for the existence and uniqueness of the SE are presented. When multiple SEs exist, we introduce the idea of effort or cost of satisfaction and we propose a refinement of the SE, namely the efficient SE (ESE). At the ESE, all players adopt the action which requires the lowest effort for satisfaction. A learning method that allows radio devices to achieve an SE in pure strategies in finite time and requiring only one-bit feedback is also presented. Finally, a power control game in the interference channel is used to highlight the advantages of modeling QoS problems following the notion of SE rather than other equilibrium concepts, e.g., generalized Nash equilibrium.
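
A toy two-link version of the trial-and-error idea behind one-bit-feedback SE learning: satisfied radios keep their action, unsatisfied ones re-draw at random. The gains, power levels, and SINR constraint are made up, and this simplified rule only conveys the flavor of the paper's learning method.

```python
import numpy as np

rng = np.random.default_rng(3)
actions = np.array([0.1, 0.5, 1.0, 2.0])        # candidate transmit powers (made up)
K = 2                                           # two interfering radio links
gain = np.array([[1.0, 0.3], [0.25, 1.0]])      # direct/cross channel gains (made up)
noise, sinr_min = 0.1, 1.5                      # per-link QoS constraint

def satisfied(k, a):
    p = actions[a]
    interf = sum(gain[k, j] * p[j] for j in range(K) if j != k)
    return gain[k, k] * p[k] / (noise + interf) >= sinr_min   # the one-bit feedback

a = rng.integers(len(actions), size=K)          # initial action indices
for _ in range(200):                            # trial and error: satisfied players
    for k in range(K):                          # repeat their action, unsatisfied
        if not satisfied(k, a):                 # players explore at random
            a[k] = rng.integers(len(actions))
print("powers:", actions[a], "| all satisfied:", all(satisfied(k, a) for k in range(K)))
```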

Journal ArticleDOI
Jia Meng, Wotao Yin, Yingying Li, Nam Tuan Nguyen, Zhu Han
TL;DR: It is theoretically shown that in the proposed system, an N-resolution channel can be faithfully obtained with an ADC speed of M = O(S^2 log(N/S)), where N is also the DAC speed and S is the channel impulse response sparsity; the Cramér-Rao lower bound is also derived.
Abstract: Orthogonal frequency division multiplexing (OFDM) is a technique that will prevail in the next-generation wireless communication. Channel estimation is one of the key challenges in OFDM, since high-resolution channel estimation can significantly improve the equalization at the receiver and consequently enhance the communication performance. In this paper, we propose a system with an asymmetric digital-to-analog converter/analog-to-digital converter (DAC/ADC) pair and formulate OFDM channel estimation as a compressive sensing problem. By skillfully designing pilots and taking advantage of the sparsity of the channel impulse response, the proposed system realizes high-resolution channel estimation at a low cost. The pilot design, the use of a high-speed DAC and a regular-speed ADC, and the estimation algorithm tailored for channel estimation distinguish the proposed approach from the existing estimation approaches. We theoretically show that in the proposed system, an N-resolution channel can be faithfully obtained with an ADC speed of M = O(S^2 log(N/S)), where N is also the DAC speed and S is the channel impulse response sparsity. Since S is small and increasing the DAC speed to N > M is relatively cheap, we obtain a high-resolution channel at a low cost. We also present a novel estimator that is both faster and more accurate than typical l1 minimization. In the numerical experiments, we simulated various numbers of multipaths and different SNRs and let the transmitter DAC run at 16 times the speed of the receiver ADC for estimating channels at 16 times the resolution. While there are no similar approaches (for asymmetric DAC/ADC pairs) to compare with, we derive the Cramér-Rao lower bound.
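
A generic sparse-recovery sketch of the measurement model: a few pilot observations of an S-sparse impulse response through a partial DFT matrix. Orthogonal matching pursuit stands in for the paper's faster tailored estimator, and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
N, P, S = 256, 64, 4              # channel length, pilots (~O(S^2 log(N/S))), sparsity
pilots = np.sort(rng.choice(N, P, replace=False))
F = np.exp(-2j * np.pi * np.outer(pilots, np.arange(N)) / N)  # partial DFT sensing matrix

h = np.zeros(N, complex)          # S-sparse channel impulse response
taps = rng.choice(N, S, replace=False)
h[taps] = rng.standard_normal(S) + 1j * rng.standard_normal(S)
y = F @ h + 0.01 * (rng.standard_normal(P) + 1j * rng.standard_normal(P))

# orthogonal matching pursuit: greedily grow the support, refit by least squares
supp, resid = [], y.copy()
for _ in range(S):
    supp.append(int(np.argmax(np.abs(F.conj().T @ resid))))
    coef, *_ = np.linalg.lstsq(F[:, supp], y, rcond=None)
    resid = y - F[:, supp] @ coef
print("support recovered:", sorted(supp) == sorted(taps.tolist()))
```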

Journal ArticleDOI
TL;DR: This paper addresses the question of determining the most suitable way to conduct audiovisual subjective testing on a wide range of audiovisual quality, and analyses show that the results of experiments done in pristine, laboratory environments are highly representative of those obtained with devices in actual use, in a typical user environment.
Abstract: Traditionally, audio quality and video quality are evaluated separately in subjective tests. Best practices within the quality assessment community were developed before many modern mobile audiovisual devices and services came into use, such as internet video, smart phones, tablets and connected televisions. These devices and services raise unique questions that require jointly evaluating both the audio and the video within a subjective test. However, audiovisual subjective testing is a relatively under-explored field. In this paper, we address the question of determining the most suitable way to conduct audiovisual subjective testing on a wide range of audiovisual quality. Six laboratories from four countries conducted a systematic study of audiovisual subjective testing. The stimuli and scale were held constant across experiments and labs; only the environment of the subjective test was varied. Some subjective tests were conducted in controlled environments and some in public environments (a cafeteria, patio or hallway). The audiovisual stimuli spanned a wide range of quality. Results show that these audiovisual subjective tests were highly repeatable from one laboratory and environment to the next. The number of subjects was the most important factor. Based on this experiment, 24 or more subjects are recommended for Absolute Category Rating (ACR) tests. In public environments, 35 subjects were required to obtain the same Student's t-test sensitivity. The second most important variable was individual differences between subjects. Other environmental factors had minimal impact, such as language, country, lighting, background noise, wall color, and monitor calibration. Analyses indicate that Mean Opinion Scores (MOS) are relative rather than absolute. Our analyses show that the results of experiments done in pristine, laboratory environments are highly representative of those obtained with devices in actual use, in a typical user environment.
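
A small illustration of the per-stimulus statistics involved: comparing MOS from two environments with a Welch t-test. The ratings below are synthetic, and the sample sizes merely echo the 24/35-subject guidance from the study.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
# synthetic ACR ratings (1..5) of one stimulus from two test environments
lab = np.clip(np.round(rng.normal(3.5, 0.8, 24)), 1, 5)     # controlled lab, 24 subjects
public = np.clip(np.round(rng.normal(3.4, 1.0, 35)), 1, 5)  # public space, 35 subjects
t, p = stats.ttest_ind(lab, public, equal_var=False)
print(f"MOS {lab.mean():.2f} vs {public.mean():.2f}, Welch t-test p = {p:.2f}")
```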

Journal ArticleDOI
TL;DR: The results provide proof-of-concept for a fully automated crack detection system based on the presented method, which utilizes morphological image processing and statistical classification by logistic regression on 3D profile data of steel slab surfaces.
Abstract: Continuous casting is a highly efficient process used to produce most of the world's steel production tonnage, but can cause cracks in the semi-finished steel product output. These cracks may cause problems further down the production chain, and detecting them early in the process would avoid unnecessary and costly processing of the defective goods. In order for a crack detection system to be accepted in industry, however, false detection of cracks in non-defective goods must be avoided. This is further complicated by the presence of scales: a brittle, often cracked, top layer originating from the casting process. We present an approach for an automated on-line crack detection system, based on 3D profile data of steel slab surfaces, utilizing morphological image processing and statistical classification by logistic regression. The initial segmentation successfully extracts 80% of the crack length present in the data, while discarding most potential pseudo-defects (non-defect surface features similar to defects). The subsequent statistical classification individually has a crack detection accuracy of over 80% (with respect to total segmented crack length), while discarding all remaining manually identified pseudo-defects. Taking more ambiguous regions into account gives a worst-case false classification of 131 mm within the 30 600 mm long sequence of 150 mm wide regions used as validation data. The combined system successfully identifies over 70% of the manually identified (unambiguous) crack length, while missing only a few crack regions containing short crack segments. The results provide proof-of-concept for a fully automated crack detection system based on the presented method.
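
A compressed sketch of the two-stage pipeline on a synthetic groove: morphological candidate segmentation (a black top-hat stands in for the paper's morphology) followed by logistic-regression classification on region features. The training set, thresholds, and features are placeholders, not the paper's.

```python
import numpy as np
from skimage import measure, morphology
from sklearn.linear_model import LogisticRegression

# stage 1: morphological segmentation of crack candidates on a synthetic profile
surface = np.ones((64, 64))                   # height map of a slab surface
surface[30:32, 10:50] -= 0.3                  # a narrow groove standing in for a crack
bth = morphology.black_tophat(surface, morphology.disk(5))  # highlights thin depressions
mask = morphology.remove_small_objects(bth > 0.15, min_size=20)
labels = measure.label(mask)

# stage 2: logistic regression separates cracks from pseudo-defects using
# region features; the tiny training set here is a placeholder
feats = [[r.eccentricity, r.mean_intensity]
         for r in measure.regionprops(labels, intensity_image=bth)]
X_train = [[0.98, 0.30], [0.55, 0.05], [0.97, 0.25], [0.40, 0.04]]
y_train = [1, 0, 1, 0]                        # 1 = crack, 0 = pseudo-defect
clf = LogisticRegression().fit(X_train, y_train)
print("crack predictions:", clf.predict(feats))   # -> [1] for the toy groove
```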

Journal ArticleDOI
TL;DR: It is shown that a certain EEG technique, event-related-potentials (ERP) analysis, is a useful and valid tool in quality research, and that quality degradations can be monitored in conscious and presumably non-conscious stages of processing.
Abstract: Common speech quality evaluation methods rely on self-reported opinions after perceiving test stimuli. Whereas these methods, when carefully applied, provide valid and reliable quality indices, they provide little insight into the processes underlying perception and judgment. In this paper, we analyze the performance of electroencephalography (EEG) for indicating different types of degradations in speech stimuli. We show that a certain EEG technique, event-related-potentials (ERP) analysis, is a useful and valid tool in quality research. Three experiments are reported which show that quality degradations can be monitored in conscious and presumably non-conscious stages of processing. The potential and limitations of the approach are discussed, and lines of future research are drawn.
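
The core of ERP analysis is averaging stimulus-locked epochs so that time-locked brain responses survive while background EEG cancels; a toy sketch with an invented 300-ms component follows.

```python
import numpy as np

rng = np.random.default_rng(9)
fs, n_trials = 250, 200                      # sampling rate (Hz), stimulus repetitions
t = np.arange(0, 0.8, 1 / fs)                # 0..800 ms post-stimulus
component = 2e-6 * np.exp(-((t - 0.3) ** 2) / (2 * 0.05 ** 2))  # toy 300-ms deflection
epochs = component + 5e-6 * rng.standard_normal((n_trials, t.size))  # noise >> signal

erp = epochs.mean(axis=0)                    # averaging reveals the time-locked ERP
print("peak latency (ms):", 1000 * t[np.argmax(erp)])   # ~300 ms
```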

Journal ArticleDOI
TL;DR: It is proposed in this paper that, as a complement to assessing the quality degradation due to coding or transmission, the appropriateness of the non-distorted signal should be addressed.
Abstract: 3D video quality of experience (QoE) is a multidimensional problem; many factors, such as image quality, depth perception and visual discomfort, contribute to the global rating. Due to this multidimensionality, it is proposed in this paper that, as a complement to assessing the quality degradation due to coding or transmission, the appropriateness of the non-distorted signal should be addressed. One important factor here is the depth information provided by the source sequences. From an application-perspective, the depth-characteristics of source content are of relevance for pre-validating whether the content is suitable for 3D video services. In addition, assessing the interplay between binocular and monocular depth features and depth perception are relevant topics for 3D video perception research. To achieve the evaluation of the suitability of 3D content, this paper describes both a subjective experiment and a new objective indicator to evaluate depth as one of the added values of 3D video.

Journal ArticleDOI
TL;DR: This work analyzes the asymptotic convergence of the risk measure of sample minimum variance portfolios of arbitrarily high dimension and proposes a generalized consistent estimator of the out-of-sample portfolio variance that only depends on the set of observed returns.
Abstract: We study the realized variance of sample minimum variance portfolios of arbitrarily high dimension. We consider the use of covariance matrix estimators based on shrinkage and weighted sampling. For such improved portfolio implementations, the otherwise intractable problem of characterizing the realized variance is tackled here by analyzing the asymptotic convergence of the risk measure. Rather than relying on less insightful classical asymptotics, we manage to deliver results in a practically more meaningful limiting regime, where the number of assets remains comparable in magnitude to the sample size. Under this framework, we provide accurate estimates of the portfolio realized risk in terms of the model parameters and the underlying investment scenario, i.e., the unknown asset return covariance structure. In-sample approximations in terms of only the available data observations are known to considerably underestimate the realized portfolio risk. If not corrected, these deviations might lead in practice to inaccurate and overly optimistic investment decisions. Therefore, along with the asymptotic analysis, we also provide a generalized consistent estimator of the out-of-sample portfolio variance that only depends on the set of observed returns. Based on this estimator, the model's free parameters, i.e., the sample weighting coefficients and the shrinkage intensity defining the minimum variance portfolio implementation, can be optimized so as to minimize the realized variance while taking into account the effect of estimation risk. Our results are based on recent contributions in the field of random matrix theory. Numerical simulations based on both synthetic and real market data validate our theoretical findings under a non-asymptotic, finite-dimensional setting. Finally, our proposed portfolio estimator is shown to consistently outperform a widely applied benchmark implementation.
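
A textbook-style sketch of a shrinkage-based minimum-variance portfolio and its realized (out-of-sample) volatility; the scaled-identity target and fixed shrinkage intensity are simplifications, whereas the paper calibrates these free parameters via random matrix theory to minimize realized variance.

```python
import numpy as np

def min_variance_weights(returns, rho):
    """Global minimum-variance weights from a shrinkage covariance estimate
    S_sh = (1 - rho) * S + rho * (tr(S)/N) * I; rho is hand-set here, not
    optimized as in the paper."""
    S = np.cov(returns, rowvar=False)
    N = S.shape[0]
    S_sh = (1 - rho) * S + rho * (np.trace(S) / N) * np.eye(N)
    w = np.linalg.solve(S_sh, np.ones(N))
    return w / w.sum()              # w = S_sh^{-1} 1 / (1' S_sh^{-1} 1)

rng = np.random.default_rng(5)
N = 80                              # dimension comparable to the sample size
R = rng.standard_normal((200, N)) * rng.uniform(0.5, 2.0, N)   # synthetic returns
train, test = R[:100], R[100:]
for rho in (0.0, 0.3):              # shrinkage typically reduces realized risk here
    w = min_variance_weights(train, rho)
    print(f"rho={rho}: realized out-of-sample vol = {(test @ w).std():.3f}")
```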

Journal ArticleDOI
TL;DR: This paper proposes to apply the Kalman Temporal Differences (KTD) framework to the problem of dialogue strategy optimization so as to address all these issues in a comprehensive manner with a single framework.
Abstract: Reinforcement learning is now an acknowledged approach for optimizing the interaction strategy of spoken dialogue systems. While the first algorithms considered were quite basic (like SARSA), recent works have concentrated on more sophisticated methods. More attention has been paid to off-policy learning, dealing with the exploration-exploitation dilemma, sample efficiency or handling non-stationarity. New algorithms have been proposed to address these issues and have been applied to dialogue management. However, each algorithm often solves a single issue at a time, while dialogue systems exhibit all the problems at once. In this paper, we propose to apply the Kalman Temporal Differences (KTD) framework to the problem of dialogue strategy optimization so as to address all these issues in a comprehensive manner with a single framework. Our claims are illustrated by experiments conducted on two real-world goal-oriented dialogue management frameworks, DIPPER and HIS.
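
A linear, tabular special case conveying the KTD principle: treat the value-function weights as the hidden state of a Kalman filter whose observations are rewards. The paper uses the unscented variant to handle more general settings; this deterministic chain-MDP sketch is only illustrative.

```python
import numpy as np

gamma, q, r_var = 0.9, 1e-4, 1e-2       # discount, process noise, observation noise
n = 4                                   # cyclic chain MDP: 0 -> 1 -> 2 -> 3 -> 0
phi = np.eye(n)                         # one-hot features, so theta holds V directly
theta, P = np.zeros(n), np.eye(n)

s = 0
for _ in range(600):
    s2 = (s + 1) % n
    reward = 1.0 if s2 == 3 else 0.0    # reward on entering state 3
    H = phi[s] - gamma * phi[s2]        # observation model: reward = H . theta + noise
    Pp = P + q * np.eye(n)              # prediction: weights follow a random walk
    K = Pp @ H / (H @ Pp @ H + r_var)   # Kalman gain
    theta = theta + K * (reward - H @ theta)   # correction by the TD-like innovation
    P = Pp - np.outer(K, H) @ Pp
    s = s2
print(theta.round(3))                   # approx [2.355, 2.617, 2.908, 2.120]
```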

Journal ArticleDOI
TL;DR: A bag-of-words (BoW) vocabulary of human actions is built, which is compressed and classified using agglomerative information bottleneck (AIB) and support vector machines (SVMs), respectively, to improve the discrimination between arm- and leg-based actions.
Abstract: In this paper, we address the problem of human action recognition in reconstructed 3-D data acquired by multi-camera systems. We contribute to this field by introducing a novel 3-D action recognition approach based on detection of 4-D (3-D space + time) spatio-temporal interest points (STIPs) and local description of 3-D motion features. STIPs are detected in multi-view images and extended to 4-D using 3-D reconstructions of the actors and pixel-to-vertex correspondences of the multi-camera setup. Local 3-D motion descriptors, histogram of optical 3-D flow (HOF3D), are extracted from estimated 3-D optical flow in the neighborhood of each 4-D STIP and made view-invariant. The local HOF3D descriptors are divided using 3-D spatial pyramids to capture and improve the discrimination between arm- and leg-based actions. Based on these pyramids of HOF3D descriptors we build a bag-of-words (BoW) vocabulary of human actions, which is compressed and classified using agglomerative information bottleneck (AIB) and support vector machines (SVMs), respectively. Experiments on the publicly available i3DPost and IXMAS datasets show promising state-of-the-art results and validate the performance and view-invariance of the approach.
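
A minimal BoW action-classification sketch with random vectors standing in for HOF3D descriptors; k-means builds the vocabulary and an SVM classifies the histograms, while the AIB compression and spatial pyramids of the paper are omitted.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(6)
D, V, K = 16, 40, 8              # descriptor dim, # videos, vocabulary size (assumed)
# stand-ins for HOF3D descriptors: each video yields a variable-size set of
# local motion descriptors whose distribution depends on the action class
videos = [rng.standard_normal((int(rng.integers(30, 60)), D)) + label
          for label in (0.0, 1.0) for _ in range(V // 2)]
y = np.repeat([0, 1], V // 2)

vocab = KMeans(n_clusters=K, n_init=10, random_state=0).fit(np.vstack(videos))

def bow(desc):
    # bag-of-words: normalized histogram of visual-word assignments
    h = np.bincount(vocab.predict(desc), minlength=K)
    return h / h.sum()

X = np.array([bow(v) for v in videos])
clf = SVC(kernel="rbf").fit(X[::2], y[::2])            # train on half the videos
print("held-out accuracy:", (clf.predict(X[1::2]) == y[1::2]).mean())
```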

Journal ArticleDOI
TL;DR: Two high-resolution missing-data spectral estimation algorithms are presented: the Iterative Adaptive Approach and the Sparse Learning via Iterative Minimization method, which can significantly improve the spectral estimation performance, including enhanced resolution and reduced sidelobe levels.
Abstract: We consider nonparametric adaptive spectral analysis of complex-valued data sequences with missing samples occurring in arbitrary patterns. We first present two high-resolution missing-data spectral estimation algorithms: the Iterative Adaptive Approach (IAA) and the Sparse Learning via Iterative Minimization (SLIM) method. Both algorithms can significantly improve the spectral estimation performance, including enhanced resolution and reduced sidelobe levels. Moreover, we consider fast implementations of these algorithms using the Conjugate Gradient (CG) technique and the Gohberg-Semencul-type (GS) formula. Our proposed implementations fully exploit the structure of the steering matrices and maximize the usage of the fast Fourier transform (FFT), resulting in much lower computational complexities as well as much reduced memory requirements. The effectiveness of the adaptive spectral estimation algorithms is demonstrated via several numerical examples including both 1-D spectral estimation and 2-D interrupted synthetic aperture radar (SAR) imaging examples.
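
A basic, non-accelerated IAA iteration for a record with missing samples, without the CG/GS fast solvers that are the paper's focus; the sizes, on-grid frequencies, and tiny diagonal loading are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(7)
N, M, K = 64, 40, 256                       # record length, observed samples, grid size
t = np.arange(N)
obs = np.sort(rng.choice(N, M, replace=False))    # arbitrary missing-sample pattern
f = np.arange(K) / K
A = np.exp(2j * np.pi * np.outer(obs, f))         # steering matrix, observed rows only

x = np.exp(2j * np.pi * (52 / 256) * t) + 0.5 * np.exp(2j * np.pi * (60 / 256) * t)
y = x[obs] + 0.05 * (rng.standard_normal(M) + 1j * rng.standard_normal(M))

P = np.abs(A.conj().T @ y / M) ** 2               # periodogram initialization
for _ in range(15):                               # basic IAA iteration
    R = (A * P) @ A.conj().T + 1e-9 * np.eye(M)   # R = A diag(P) A^H (+ tiny loading)
    Ri_y = np.linalg.solve(R, y)
    Ri_A = np.linalg.solve(R, A)
    alpha = (A.conj().T @ Ri_y) / np.einsum('ik,ik->k', A.conj(), Ri_A)
    P = np.abs(alpha) ** 2
print("two largest peaks:", np.sort(f[np.argsort(P)[-2:]]))   # ~ [0.203, 0.234]
```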

Journal ArticleDOI
TL;DR: This work combines a quality guided phase unwrapping approach with absolute phase estimates from the stereo cameras to solve for the absolute phase of connected regions.
Abstract: Phase shifted sinusoidal patterns have proven to be effective in structured light systems, which typically consist of a camera and projector. They offer low decoding complexity, require as few as three projection frames per reconstruction, and are well suited for capturing dynamic scenes. In these systems, depth is reconstructed by determining the phase projected onto each pixel in the camera and establishing correspondences between camera and projector pixels. Typically, multiple periods are projected within the set of sinusoidal patterns, thus requiring phase unwrapping on the phase image before correspondences can be established. A second camera can be added to the structured light system to help with phase unwrapping. In this work, we present two consistent phase unwrapping methods for two-camera stereo structured light systems. The first method enforces viewpoint consistency by phase unwrapping in the projector domain. Loopy belief propagation is run over the graph of projector pixels to select pixel correspondences between the left and right camera that align in 3-D space and are spatially smooth in each 2-D image. The second method enforces temporal consistency by unwrapping across space and time. We combine a quality guided phase unwrapping approach with absolute phase estimates from the stereo cameras to solve for the absolute phase of connected regions. We present results for both methods to show their effectiveness on real world scenes.
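
For context, the wrapped phase that the two proposed methods then unwrap consistently comes from standard phase-shift decoding; a three-step sketch (the pattern model and amplitudes are illustrative):

```python
import numpy as np

def wrapped_phase(I1, I2, I3):
    """Standard 3-step decoding for patterns I_k = A + B*cos(phi + (k-2)*2*pi/3);
    returns phi wrapped to (-pi, pi]. Consistent unwrapping across the two
    cameras (the paper's contribution) starts from wrapped phase like this."""
    return np.arctan2(np.sqrt(3.0) * (I1 - I3), 2.0 * I2 - I1 - I3)

phi_true = np.linspace(0.0, 6 * np.pi, 400)         # 3 projected periods
shifts = (-2 * np.pi / 3, 0.0, 2 * np.pi / 3)
I1, I2, I3 = (100 + 50 * np.cos(phi_true + s) for s in shifts)
phi = wrapped_phase(I1, I2, I3)
err = np.angle(np.exp(1j * (phi - phi_true)))       # compare modulo 2*pi
print("max decoding error:", np.abs(err).max())     # tiny: exact up to wrapping
```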

Journal ArticleDOI
TL;DR: The open problem of the generalization of mathematical morphology to vector images is handled using the paradigm of depth functions; the fundamental assumption of this data-driven approach is the existence of a “background/foreground” image representation.
Abstract: The open problem of the generalization of mathematical morphology to vector images is handled in this paper using the paradigm of depth functions. Statistical depth functions provide a “center-outward ordering” of a multidimensional data distribution from the “deepest” point, and they can therefore be used to construct morphological operators. The fundamental assumption of this data-driven approach is the existence of a “background/foreground” image representation. Examples in real color and hyperspectral images illustrate the results.
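
One plausible instantiation of the idea: order color vectors by a statistical depth (Mahalanobis depth here) and let window-wise selection of the extremal-depth pixel play the role of erosion/dilation. The paper's framework admits other depth functions, and which extreme corresponds to which operator follows the chosen background/foreground convention.

```python
import numpy as np

def mahalanobis_depth(X):
    # depth of each vector pixel: large = near the "deepest" (most central) point
    mu = X.mean(0)
    Sinv = np.linalg.inv(np.cov(X, rowvar=False))
    d2 = np.einsum('ij,jk,ik->i', X - mu, Sinv, X - mu)
    return 1.0 / (1.0 + d2)

def depth_select(img, op=np.argmax, size=3):
    # window-wise selection of the extremal-depth pixel vector (morphological
    # sup/inf under the center-outward ordering induced by the depth values)
    h, w, c = img.shape
    depth = mahalanobis_depth(img.reshape(-1, c)).reshape(h, w)
    pad, out = size // 2, img.copy()
    for i in range(pad, h - pad):
        for j in range(pad, w - pad):
            win = np.s_[i - pad:i + pad + 1, j - pad:j + pad + 1]
            out[i, j] = img[win].reshape(-1, c)[op(depth[win])]
    return out

rng = np.random.default_rng(13)
color = rng.random((32, 32, 3))
print(depth_select(color, op=np.argmin).shape)   # erosion-like result on vector pixels
```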

Journal ArticleDOI
TL;DR: A novel segmentation scheme where multidimensional vectors are used to jointly represent color and depth data and normalized cuts spectral clustering is applied to them in order to segment the scene.
Abstract: Scene segmentation is a well-known problem in computer vision traditionally tackled by exploiting only the color information from a single scene view. Recent hardware and software developments allow real-time estimation of scene geometry and open the way for new scene segmentation approaches based on the fusion of both color and depth data. This paper follows this rationale and proposes a novel segmentation scheme where multidimensional vectors are used to jointly represent color and depth data and normalized cuts spectral clustering is applied to them in order to segment the scene. The critical issue of how to balance the two sources of information is solved by an automatic procedure based on an unsupervised metric for the segmentation quality. An extension of the proposed approach based on the exploitation of both images in stereo vision systems is also proposed. Different acquisition setups, like time-of-flight cameras, the Microsoft Kinect device and stereo vision systems have been used for the experimental validation. A comparison of the effectiveness of the different depth imaging systems for segmentation purposes is also presented. Experimental results show how the proposed algorithm outperforms scene segmentation algorithms based on geometry or color data alone and also other approaches that exploit both clues.
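
A minimal sketch of the joint representation: stack color and weighted depth into one vector per pixel and run spectral clustering. scikit-learn's SpectralClustering stands in for the paper's normalized-cuts implementation, and the balance term lam is hand-set here rather than chosen by the paper's unsupervised quality metric.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def segment_rgbd(rgb, depth, lam=1.0, n_segments=2):
    # joint vector per pixel: [r, g, b, lam * z]; the paper also uses x, y
    h, w, _ = rgb.shape
    feats = np.column_stack([rgb.reshape(-1, 3), lam * depth.reshape(-1, 1)])
    sc = SpectralClustering(n_clusters=n_segments, affinity="nearest_neighbors",
                            n_neighbors=10, random_state=0)
    return sc.fit_predict(feats).reshape(h, w)

rng = np.random.default_rng(14)
rgb = rng.normal(0.5, 0.02, (20, 20, 3)); rgb[:, 10:] += 0.1   # weak color edge
depth = rng.normal(1.0, 0.02, (20, 20)); depth[:, 10:] += 1.0  # strong depth edge
seg = segment_rgbd(rgb, depth)
print("pixels per segment:", np.bincount(seg.ravel()))          # ~200 / 200 split
```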

Journal ArticleDOI
TL;DR: Experimental results in real outdoor scenarios are provided showing the viability of the proposed multimodal acquisition system and that the proposed approach is not restricted to a specific domain.
Abstract: This paper proposes an imaging system for computing sparse depth maps from multispectral images. A special stereo head consisting of an infrared and a color camera defines the proposed multimodal acquisition system. The cameras are rigidly attached so that their image planes are parallel. Details about the calibration and image rectification procedure are provided. Sparse disparity maps are obtained by the combined use of mutual information enriched with gradient information. The proposed approach is evaluated using a Receiver Operating Characteristic (ROC) curve. Furthermore, a multispectral dataset, color and infrared images, together with their corresponding ground truth disparity maps, is generated and used as a test bed. Experimental results in real outdoor scenarios are provided showing its viability and that the proposed approach is not restricted to a specific domain.
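
A sketch of the window-based mutual information cost that underlies such multimodal matching (the gradient enrichment is omitted); higher MI indicates a better infrared/color correspondence even when the intensities are nonlinearly related.

```python
import numpy as np

def mutual_information(a, b, bins=16):
    h, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p = h / h.sum()                                  # joint histogram -> joint pmf
    px, py = p.sum(1, keepdims=True), p.sum(0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (px * py)[nz])).sum())

rng = np.random.default_rng(15)
ir = rng.random((21, 21))                            # stand-in infrared window
color = 1.0 - ir ** 2 + 0.05 * rng.random((21, 21))  # nonlinearly related color window
print(mutual_information(ir, color))                 # markedly higher: correct match
print(mutual_information(ir, np.roll(color, 7, 1)))  # lower: wrong correspondence
```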

Journal ArticleDOI
TL;DR: The Dirichlet problem associated with the new p-Laplacian equation is studied, existence and uniqueness of its solution are proved, and it is proposed to use these operators as a unified framework for the solution of many inverse problems in image processing and machine learning.
Abstract: In this paper, we introduce a new class of non-local p-Laplacian operators that interpolate between the non-local Laplacian and the infinity Laplacian. These operators are discrete analogues of the game p-Laplacian operators on Euclidean spaces, and involve a discrete morphological gradient on graphs. We study the Dirichlet problem associated with the new p-Laplacian equation and prove existence and uniqueness of its solution. We also consider non-local diffusion on graphs involving these operators. Finally, we propose to use these operators as a unified framework for the solution of many inverse problems in image processing and machine learning.
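
A hedged sketch of one common discretization on a 4-neighbor grid graph: interpolate between the neighborhood average (2-Laplacian part) and the min/max midrange (infinity-Laplacian part), then iterate to solve a Dirichlet problem. This conveys the interpolation idea but is not necessarily the paper's exact operator, which uses a discrete morphological gradient on general weighted graphs.

```python
import numpy as np

def dirichlet_p_laplacian(u, boundary, alpha, n_iter=2000):
    """Fixed-point iteration for a 'game p-Laplacian'-style Dirichlet problem:
    each interior value becomes a convex combination of the neighbor average
    (alpha=0: harmonic, p=2) and the min/max midrange (alpha=1: infinity-
    harmonic). Boundary values stay fixed throughout."""
    u = u.copy()
    for _ in range(n_iter):
        nb = np.stack([np.roll(u, s, a) for s in (-1, 1) for a in (0, 1)])
        new = (1 - alpha) * nb.mean(0) + alpha * 0.5 * (nb.max(0) + nb.min(0))
        u[~boundary] = new[~boundary]
    return u

u = np.zeros((32, 32)); boundary = np.zeros((32, 32), bool)
boundary[0, :] = boundary[-1, :] = boundary[:, 0] = boundary[:, -1] = True
u[-1, :] = 1.0                              # bottom edge held at 1, rest of boundary at 0
for alpha in (0.0, 0.5, 1.0):               # sweep from p = 2 toward p = infinity
    print(alpha, dirichlet_p_laplacian(u, boundary, alpha)[16, 16].round(3))
```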