
Showing papers by "Ioannis Pitas" published in 2018


Proceedings ArticleDOI
23 Jul 2018
TL;DR: The outlined issues are partitioned into challenges deriving from ethical/legal/safety considerations and from operational/production requirements, and a brief survey of current technological solutions, including their limitations, is provided.
Abstract: Autonomous UAV cinematography is an active research field with exciting potential for the media industry. It bears the promise of greatly facilitating UAV shooting for various applications, while significantly reducing the costs compared to manual shooting. However, the general problem has not been clearly defined and the challenges arising from current legislation and technology restrictions have not been fully charted. A complete overview of issues related to autonomous UAV cinematography is needed, pertaining to the current situation in the field, so as to guide immediate-future research. The purpose of this paper is to lay exactly this groundwork, with the expectation of providing a global perspective to multiple domain-specific research communities. The outlined issues are partitioned into challenges deriving from ethical/legal/safety considerations and from operational/production requirements. A brief survey of current technological solutions, including their limitations, is also provided for each issue.

50 citations


Journal ArticleDOI
TL;DR: Three specific, novel video summarization methods are derived from a flexible definition of an activity video summary, as the set of key-frames that can both reconstruct the original, full-length video and simultaneously represent its most salient parts.

42 citations


Journal ArticleDOI
TL;DR: Experimental results showed that the incorporation of negative label information increases, in all cases, the classification accuracy of the state of the art.
Abstract: This paper extends the state-of-the-art label propagation (LP) framework to the propagation of negative labels. More specifically, state-of-the-art LP methods propagate information of the form “the sample $i$ should be assigned the label $k$.” The proposed method extends this framework by considering additional information of the form “the sample $i$ should not be assigned the label $k$.” A theoretical analysis is presented in order to include negative LP in the problem formulation. Moreover, a method is presented for selecting the negative labels in cases where they are not inherent in the data structure. Furthermore, the incorporation of negative label information in two multigraph LP methods is presented. Finally, the extension of the proposed algorithm to out-of-sample data, as well as scalability issues, is discussed. Experimental results in various scenarios showed that the incorporation of negative label information increases, in all cases, the classification accuracy of the state of the art.

27 citations


Journal ArticleDOI
TL;DR: The proposed Semi-Supervised Subclass Support Vector Data Description method is a novel extension of the standard SVDD method that introduces two additional terms, resulting in a regularized feature space where low-variance directions are emphasized while local geometric data information is preserved.

26 citations


Proceedings ArticleDOI
05 Oct 2018
TL;DR: This paper focuses on formalizing and geometrically modelling common target-following UAV motion types, in order to analytically determine the maximum permissible camera focal length (therefore, the range of feasible shot types) for avoiding visual target tracking failure.
Abstract: Camera-equipped drones have recently revolutionized aerial cinematography, allowing easy acquisition of impressive footage. Although they are currently manually operated, autonomous functionalities based on machine learning and computer vision are becoming popular. However, the emerging area of autonomous UAV filming has to face several challenges, especially when visually tracking fast and unpredictably moving targets. In the latter case, an important issue is how to determine the shot types that are achievable without risking failure of the 2D visual tracker. This paper studies the constraints imposed on cinematography decision-making during autonomous UAV shooting. It focuses on formalizing and geometrically modelling common target-following UAV motion types, in order to analytically determine the maximum permissible camera focal length (and therefore the range of feasible shot types) for avoiding visual target tracking failure.

23 citations


Proceedings ArticleDOI
01 Oct 2018
TL;DR: The behavior of various model configurations in object detection tasks is investigated, and a comparative study is performed on inference optimization methods that aim to reduce the computational cost of Convolutional Neural Networks, examining the effect of such methods on performance and proposing architecture modifications for this purpose.
Abstract: Over the past decade, Deep Convolutional Neural Networks with heavy architectures and large numbers of parameters have achieved state-of-the-art results and eclipsed other methods in multiple visual analysis tasks, including object detection. However, the real-time requirements of such tasks directly conflict with the restricted computational capabilities of embedded systems, prohibiting the immediate deployment of bulky models, and necessitating their optimization for inference. Parameter pruning techniques reduce the number of parameters while reducing the input size leads to smaller internal representations, leading by extension to fewer computational operations. Furthermore, inference optimization schemes provided by Deep Learning frameworks can yield significant speed ups, for example by allowing half-precision floating point operations. We investigate the behavior of various model configurations in object detection tasks and perform a comparative study on inference optimization methods which aim to reduce the computational cost of Convolutional Neural Networks, while examining the effect of such methods on their performance, and propose architecture modifications for this purpose.

17 citations


Proceedings ArticleDOI
27 May 2018
TL;DR: A fast and efficient proportional-integral-derivative (PID) based control algorithm that relies solely on 2D visual information is proposed, and it is demonstrated that the camera can be accurately controlled without inferring the 3D position of the target.
Abstract: Using Unmanned Aerial Vehicles (UAVs), also known as drones, for covering public sport events, such as bicycle races, is becoming increasingly popular. Even though the problem of controlling the flight path of a drone is well studied in the literature, little work has been done on controlling the shooting camera for producing professional-grade video footage. In this work we propose a fast and efficient proportional-integral-derivative (PID) based control algorithm that relies solely on 2D visual information, and we demonstrate that it is possible to accurately control the camera without inferring the 3D position of the target. To ensure that the proposed method will not exhibit undesired behavior, a genetic algorithm is used to tune its parameters using a properly defined fitness function. The proposed method is evaluated using two datasets that contain actual drone footage: a dataset that contains videos of a single cyclist, and a dataset that contains actual footage from a bicycle race event, the Giro d'Italia.

13 citations
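
A minimal sketch of the kind of PID loop the abstract above describes, acting only on a 2D image-plane error. The class name `PixelPID`, the gain values, and the toy plant inside `simulate` are assumptions made for illustration; the paper tunes its parameters with a genetic algorithm, which is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PixelPID:
    """PID controller driven by a pixel offset (hypothetical helper, not the paper's code)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def step(self, error_px: float, dt: float) -> float:
        """Return a control command for the current pixel error."""
        self.integral += error_px * dt
        if self.prev_error is None:
            derivative = 0.0  # no previous sample on the first call
        else:
            derivative = (error_px - self.prev_error) / dt
        self.prev_error = error_px
        return self.kp * error_px + self.ki * self.integral + self.kd * derivative


def simulate(target_x: float, frame_center_x: float,
             steps: int = 200, dt: float = 0.04) -> float:
    """Toy closed loop: the command shifts the target's pixel position toward the centre."""
    pid = PixelPID(kp=2.0, ki=0.1, kd=0.05)  # illustrative gains, chosen by hand
    x = target_x
    for _ in range(steps):
        error = x - frame_center_x   # horizontal offset from the image centre, in pixels
        rate = pid.step(error, dt)   # assumed pan-rate command, in pixels per second
        x -= rate * dt               # assumed plant: panning moves the target by rate * dt
    return x
```

Only the target's position in the frame enters the loop; no 3D position is estimated, which is the property the abstract emphasizes.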


Journal ArticleDOI
TL;DR: DSR proved to be a useful diagnostic tool for the detection of vertical root fractures in these four clinical cases, and the use of contrast enhancement and pseudocolouring techniques assisted with the diagnosis of vertical root fractures.
Abstract: Vertical root fractures are commonly associated with root-filled teeth. Diagnosis is challenging because the clinical signs are not completely pathognomonic, and conventional periapical radiography is often unreliable. Digital subtraction radiography (DSR) is able to detect small radiographic changes between two successive radiographs by subtracting out consistent radiographic elements. Its use could possibly assist in the diagnostic procedure. Four cases are presented to demonstrate the potential use of DSR in the detection of vertical root fractures in endodontically treated teeth. After the digital subtractions had been carried out, a dark line in the body of the roots was distinguishable, raising the possibility of the presence of a vertical root fracture. The use of contrast enhancement and pseudocolouring techniques assisted with the diagnosis of vertical root fractures. DSR proved to be a useful diagnostic tool for the detection of vertical root fractures in these four clinical cases.

11 citations
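
The subtraction step described in the abstract above can be illustrated with a small sketch. It assumes two already aligned 8-bit grayscale radiographs stored as nested lists; the function names and the mid-gray offset are illustrative choices, not taken from the paper, and pseudocolouring is omitted.

```python
from typing import List

Image = List[List[int]]  # 8-bit grayscale, row-major


def subtract_radiographs(baseline: Image, follow_up: Image) -> Image:
    """Pixelwise difference, offset so that unchanged regions map to mid-gray (128)."""
    same_shape = len(baseline) == len(follow_up) and all(
        len(a) == len(b) for a, b in zip(baseline, follow_up)
    )
    if not same_shape:
        raise ValueError("radiographs must have identical dimensions")
    return [
        # the clamp guards against inputs outside the 0..255 range
        [max(0, min(255, 128 + (f - b) // 2)) for b, f in zip(row_b, row_f)]
        for row_b, row_f in zip(baseline, follow_up)
    ]


def stretch_contrast(image: Image) -> Image:
    """Linear contrast stretch to the full 0..255 range."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [row[:] for row in image]  # flat image: nothing to stretch
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in image]
```

Structures that are identical in both radiographs cancel to a uniform gray, so a localized change shows up as a darker or lighter region once the contrast is stretched.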


Proceedings ArticleDOI
18 Apr 2018
TL;DR: This paper improves upon the salient dictionary learning framework by replacing the video frame saliency estimation term with one based on Regularized SVD-based Low Rank Approximation, taking advantage of the well-established correlation between midrange matrix singular values and salient regions.
Abstract: Storage, browsing and analysis of human activity videos can be significantly facilitated by automated video summarization. Unsupervised key-frame extraction remains the most widely applicable technique for summarizing activity videos. However, their specific properties make the problem difficult to solve. Typical relevant algorithms fall under the video frame clustering or the dictionary-of-representatives families, with salient dictionary learning having been recently proposed. Under this formulation, the video frames selected as key-frames are the ones which simultaneously best reconstruct the entire video and are salient compared to the rest. This paper improves upon such a method by replacing the video frame saliency estimation term with one based on Regularized SVD-based Low Rank Approximation, taking advantage of the well-established correlation between midrange matrix singular values and salient regions. Extensive empirical evaluation showcases the high performance of both the salient dictionary learning framework and the specific proposed method.

11 citations


Journal ArticleDOI
TL;DR: The proposed methodology proved able to measure the curvature of the root canal and its 3D modification after instrumentation, which decreased the curvature by 30.23% (on average) in all groups.
Abstract: Objective: In this study, the three-dimensional (3D) modification of root canal curvature was measured, after the application of Reciproc instrumentation technique, by using cone beam computed tomo...

5 citations


Proceedings ArticleDOI
01 Sep 2018
TL;DR: A hypothesis is proposed and empirically tested, namely that more salient data points can be obtained by attempting to restrain reconstruction error separately for each original data point, and salient dictionary learning is extended by adding a third term to the objective function, pushing towards optimal point reconstruction.
Abstract: Salient dictionary learning has recently proven to be effective for unsupervised activity video summarization by key-frame extraction. All relevant methods select a small subset of the original data points/video frames as dictionary atoms/representatives that, in concert, both optimally reconstruct the original entire dataset/video sequence and are salient. Therefore, they attempt to simultaneously optimize a reconstruction term, pushing towards a dictionary/summary that best reconstructs the entire dataset, and a saliency term, pushing towards a dictionary composed of salient data points. In this paper, a hypothesis is proposed and empirically tested, namely that more salient data points can be obtained by attempting to restrain reconstruction error separately for each original data point. Thus, salient dictionary learning is extended by adding a third term to the objective function, pushing towards optimal point reconstruction. A pre-existing greedy, iterative algorithm for salient dictionary learning is modified according to the proposed extension in two alternative ways. The resulting methods achieve state-of-the-art performance in three databases, verifying the validity of our hypothesis.

Journal ArticleDOI
TL;DR: Experimental results are used to evaluate several combinations of clustering framework components in terms of resources, time and performance, and to provide a recommendation on how to approach a Big Data clustering problem.
Abstract: Data clustering is an unsupervised learning task that has found many applications in various scientific fields. The goal is to find subgroups of closely related data samples (clusters) in a set of ...

Proceedings ArticleDOI
01 Sep 2018
TL;DR: These methods are capable of reducing correct face identification rates of the VGG-face network by over 90%, and it is shown that at these error rates adequate image quality is preserved, as demonstrated by the values of the complex wavelet structural similarity index, allowing face recognition by humans, contrary to most face de-identification methods.
Abstract: In this paper, two face de-identification methods are proposed for hindering face identification by a deep neural network. Our work focuses on achieving a delicate balance, so that the facial images are misclassified by the deep network, while the human observer can still identify the persons depicted in a scene. The proposed methods achieve face de-identification by partly degrading image quality in order to hinder face recognition by deep neural networks, while at the same time maintaining the highest possible image quality. To this end, we employ de-identification methods based on singular value decomposition and image hypersphere projections, respectively. From the conducted experiments, it can be concluded that these methods are capable of reducing correct face identification rates of the VGG-face network by over 90%. Moreover, it is shown that at these error rates adequate image quality is preserved, as demonstrated by the values of the complex wavelet structural similarity index, allowing face recognition by humans, contrary to most face de-identification methods.

Proceedings ArticleDOI
01 Oct 2018
TL;DR: A novel multimedia data label propagation method is presented that can incorporate pairwise facial image similarity and dissimilarity constraints into the objective function of the state-of-the-art MLPP-CLP algorithm.
Abstract: In this paper, a novel label propagation method for multimedia data (specifically facial images) is presented, based on the inclusion of labelling constraints in the objective function of the state-of-the-art MLPP-CLP algorithm. The proposed method can incorporate pairwise facial image similarity and dissimilarity constraints into the objective function of the aforementioned method. Experiments conducted on facial image labelling in three stereoscopic movies confirm the increased labelling accuracy of the proposed method.

Journal ArticleDOI
TL;DR: Experiments conducted on facial image labeling in three stereoscopic movies confirm the increased labeling accuracy and the reduced computational cost of the proposed method.
Abstract: In this paper, a novel fast labeling method for video data (more specifically, facial images) is presented, which aims to accelerate a state-of-the-art facial identity label propagation technique. Our method assumes that facial images are derived by applying facial image tracking on stereoscopic videos and thus are temporally ordered. The proposed method utilizes a pruned similarity matrix, so that the facial label inference is conducted using fewer entries in this matrix, namely the pairwise similarities of the facial images that lie on the main diagonal and the N upper and lower off-diagonals. The proposed method can also incorporate pairwise facial image similarity and dissimilarity constraints into the objective function of the label propagation. Experiments conducted on facial image labeling in three stereoscopic movies confirm the increased labeling accuracy and the reduced computational cost of the proposed method.
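
The pruning described in the abstract above, keeping only the main diagonal and the N upper and lower off-diagonals of the similarity matrix, can be sketched as follows. The function names and the dense nested-list representation are assumptions for illustration; a practical implementation would more likely use a banded or sparse matrix format.

```python
from typing import List

Matrix = List[List[float]]


def prune_to_band(similarity: Matrix, n: int) -> Matrix:
    """Zero every entry more than n positions away from the main diagonal."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return [
        [value if abs(i - j) <= n else 0.0 for j, value in enumerate(row)]
        for i, row in enumerate(similarity)
    ]


def retained_entries(size: int, n: int) -> int:
    """Count the entries inside the band of a size x size matrix."""
    return sum(1 for i in range(size) for j in range(size) if abs(i - j) <= n)
```

Because the facial images are temporally ordered, entries near the diagonal correspond to images that are close in time. For a fixed N, the number of retained entries grows linearly with the number of images rather than quadratically.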

Proceedings ArticleDOI
01 Oct 2018
TL;DR: A classification method is presented that emphasizes learning the hyperplane that separates the training data with the maximum margin in a regularized space, which is derived by exploiting multiple graph structures in the SVM optimization process.
Abstract: A classification method that emphasizes learning the hyperplane that separates the training data with the maximum margin in a regularized space is presented. In the proposed method, this regularized space is derived by exploiting multiple graph structures in the SVM optimization process. Each of the employed graph structures carries some information concerning a geometric or semantic property of the training data, e.g., local neighborhood area and global geometric data relationships. The proposed method introduces information from each graph type to the standard SVM objective, as a projection of the SVM hyperplane to such a direction, where a specific property of the training data is highlighted. We show that each data property can be encoded in a regularized kernel matrix. Finally, response in the optimal classification space can be obtained by exploiting a weighted combination of multiple regularized kernel matrices. Experimental results in face recognition and object classification denote the effectiveness of the proposed method.

Proceedings ArticleDOI
01 Jan 2018
TL;DR: A novel method implementing computational UAV cinematography for assisting sports coverage, based on semantic, human-centered visual analysis is proposed in this work, and promising results are obtained.
Abstract: As audiovisual coverage of sports events using Unmanned Aerial Vehicles (UAVs) is becoming increasingly popular, intelligent audiovisual (A/V) shooting tools are needed to assist the cameramen and directors. Several challenges also arise from employing autonomous UAVs, including the accurate identification of the 2D region of cinematographic attention (RoCA) depicting rapidly moving target ensembles (e.g., athletes) and the automatic control of the UAVs so as to take informative and aesthetically pleasing A/V shots, by performing automatic or semiautomatic visual content analysis with no or minimal human intervention. A novel method implementing computational UAV cinematography for assisting sports coverage, based on semantic, human-centered visual analysis, is proposed in this work. Athlete detection and tracking, as well as the spatial distribution of athletes on the image plane, are the semantic features extracted from an aerial video feed captured by a UAV and exploited for the extraction of the RoCA, based solely on present and past athlete detections and their regions of interest (ROIs). A PID controller that visually controls a real or virtual camera in order to track the sports RoCA and produce aesthetically pleasing shots, without using 3D location-related information, is subsequently employed. The proposed method is evaluated on actual UAV A/V footage from soccer matches and promising results are obtained.