
Showing papers by "Shai Avidan published in 2012"


Proceedings ArticleDOI
16 Jun 2012
TL;DR: This work provides a probabilistic model of the object variations over time and shows LOT's tracking capabilities on challenging video sequences, both commonly used and new, demonstrating performance comparable to state-of-the-art methods.
Abstract: Locally Orderless Tracking (LOT) is a visual tracking algorithm that automatically estimates the amount of local (dis)order in the object. This lets the tracker specialize in both rigid and deformable objects on-line and with no prior assumptions. We provide a probabilistic model of the object variations over time. The model is implemented using the Earth Mover's Distance (EMD) with two parameters that control the cost of moving pixels and changing their color. We adjust these costs on-line during tracking to account for the amount of local (dis)order in the object. We show LOT's tracking capabilities on challenging video sequences, both commonly used and new, demonstrating performance comparable to state-of-the-art methods.
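The EMD underlying LOT can be illustrated with a small transportation-problem solver. The `alpha`/`beta` ground-cost weights below stand in for the paper's two parameters (cost of moving pixels vs. changing their color); the function names and the LP formulation are an illustrative sketch, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import linprog

def lot_ground_cost(pos_p, col_p, pos_q, col_q, alpha, beta):
    """Ground distance mixing spatial movement and color change,
    weighted by alpha and beta (the two costs LOT adapts on-line)."""
    d_pos = np.linalg.norm(pos_p[:, None, :] - pos_q[None, :, :], axis=-1)
    d_col = np.abs(col_p[:, None] - col_q[None, :])
    return alpha * d_pos + beta * d_col

def emd(w_p, w_q, cost):
    """Earth Mover's Distance via the transportation LP (equal total mass)."""
    n, m = cost.shape
    A_eq, b_eq = [], []
    for i in range(n):            # supply constraints: source i ships w_p[i]
        row = np.zeros(n * m)
        row[i * m:(i + 1) * m] = 1.0
        A_eq.append(row); b_eq.append(w_p[i])
    for j in range(m):            # demand constraints: sink j receives w_q[j]
        col = np.zeros(n * m)
        col[j::m] = 1.0
        A_eq.append(col); b_eq.append(w_q[j])
    res = linprog(cost.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None), method="highs")
    return res.fun
```

Raising `alpha` penalizes pixel movement (suited to rigid objects), while raising `beta` penalizes appearance change; adapting these on-line is the paper's core idea.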

262 citations


Proceedings ArticleDOI
16 Jun 2012
TL;DR: A new spatially-constrained similarity measure (SCSM) is proposed to handle object rotation, scaling, viewpoint change and appearance deformation, along with a novel and robust re-ranking method that uses the k-nearest neighbors of the query to automatically refine the initial search results.
Abstract: One fundamental problem in object retrieval with the bag-of-visual words (BoW) model is its lack of spatial information. Although various approaches have been proposed to incorporate spatial constraints into the BoW model, most of them are either too strict or too loose, so they are effective only in limited cases. We propose a new spatially-constrained similarity measure (SCSM) to handle object rotation, scaling, viewpoint change and appearance deformation. The similarity measure can be efficiently calculated by a voting-based method using inverted files. Object retrieval and localization are then simultaneously achieved without post-processing. Furthermore, we introduce a novel and robust re-ranking method with the k-nearest neighbors of the query for automatically refining the initial search results. Extensive performance evaluations on six public datasets show that SCSM significantly outperforms other spatial models, while k-NN re-ranking outperforms most state-of-the-art approaches using query expansion.
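A translation-only caricature of the voting scheme might look like the following (the paper's full measure also handles rotation, scaling and deformation; the bin sizes and word/position tuples here are illustrative). Each visual word shared between query and database image votes for the translation that would align them, and the peak of the voting map both scores and localizes the object.

```python
from collections import defaultdict
import numpy as np

def build_inverted_index(database):
    """Inverted file: visual word -> list of (image_id, x, y) postings."""
    index = defaultdict(list)
    for img_id, features in database.items():
        for word, x, y in features:
            index[word].append((img_id, x, y))
    return index

def scsm_vote(query_features, index, half_bins=8, cell=16.0):
    """Each shared visual word votes for the translation aligning the
    query with a database image; the voting-map peak scores it."""
    size = 2 * half_bins + 1
    maps = defaultdict(lambda: np.zeros((size, size)))
    for word, qx, qy in query_features:
        for img_id, x, y in index[word]:
            bx = int(round((x - qx) / cell)) + half_bins
            by = int(round((y - qy) / cell)) + half_bins
            if 0 <= bx < size and 0 <= by < size:
                maps[img_id][by, bx] += 1.0
    return {img_id: m.max() for img_id, m in maps.items()}
```

Because the votes are accumulated per posting, the cost is linear in the inverted-file traversal, which is what makes the voting-based evaluation efficient.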

223 citations


Book ChapterDOI
07 Oct 2012
TL;DR: It is shown that a sequence of key design decisions can make k-d trees run as fast as recently proposed state-of-the-art methods, and because of image coherency it is enough to consider only a sparse grid of patches across the image plane.
Abstract: TreeCANN is a fast algorithm for approximately matching all patches between two images. It does so by following the established convention of finding an initial set of matching patch candidates between the two images and then propagating good matches to neighboring patches in the image plane. TreeCANN accelerates each of these components substantially, leading to an algorithm that is ×3 to ×5 faster than existing methods. Seed matching is achieved using a properly tuned k-d tree on a sparse grid of patches. In particular, we show that a sequence of key design decisions can make k-d trees run as fast as recently proposed state-of-the-art methods, and that because of image coherency it is enough to consider only a sparse grid of patches across the image plane. We then develop a novel propagation step that is based on the integral image, which drastically reduces the computational load that is dominated by the need to repeatedly measure similarity between pairs of patches. As a by-product we give an optimal algorithm for exact matching that is based on the integral image. The proposed exact algorithm is faster than previously reported results and depends only on the size of the images, not on the size of the patches. We report results on large and varied data sets and show that TreeCANN is orders of magnitude faster than exact NN search, yet produces matches that are within 1% error compared to the exact NN search.
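The integral-image trick behind the patch-size-independent exact computation can be sketched as follows: for a fixed translation between the two images, a single squared-difference image plus one box filter yields the SSD of every patch at that offset. The function names below are my own, not the paper's.

```python
import numpy as np

def integral_image(a):
    """Summed-area table with a zero border, so any box sum is 4 lookups."""
    s = a.cumsum(axis=0).cumsum(axis=1)
    return np.pad(s, ((1, 0), (1, 0)))

def box_sums(ii, p):
    """Sum over every p x p window, O(1) per window via the integral image."""
    return ii[p:, p:] - ii[:-p, p:] - ii[p:, :-p] + ii[:-p, :-p]

def ssd_at_offset(A, B, dy, dx, p):
    """SSD between each p x p patch of A and the patch at offset (dy, dx)
    in B. Cost depends only on the image size, never on the patch size p."""
    H, W = A.shape
    ys, xs = max(0, -dy), max(0, -dx)
    ye, xe = min(H, H - dy), min(W, W - dx)
    diff2 = (A[ys:ye, xs:xe] - B[ys + dy:ye + dy, xs + dx:xe + dx]) ** 2
    return box_sums(integral_image(diff2), p)
```

Sweeping `(dy, dx)` over all offsets and keeping the per-patch minimum gives an exact nearest-neighbor field whose cost is governed by the image size alone, which is the property the abstract highlights.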

53 citations


Book ChapterDOI
07 Oct 2012
TL;DR: This work proposes a method for photo-sequencing --- temporally ordering a set of still images taken asynchronously by a set of uncalibrated cameras --- and uses rank aggregation to combine them into a globally consistent temporal order of images.
Abstract: Dynamic events such as family gatherings, concerts or sports events are often captured by a group of people. The set of still images obtained this way is rich in dynamic content but lacks accurate temporal information. We propose a method for photo-sequencing --- temporally ordering a set of still images taken asynchronously by a set of uncalibrated cameras. Photo-sequencing is an essential tool in analyzing (or visualizing) a dynamic scene captured by still images. The first step of the method detects sets of corresponding static and dynamic feature points across images. The static features are used to determine the epipolar geometry between pairs of images, and each dynamic feature votes for the temporal order of the images in which it appears. The partial orders provided by the dynamic features are not necessarily consistent, and we use rank aggregation to combine them into a globally consistent temporal order of images. We demonstrate successful photo sequencing on several challenging collections of images taken using a number of mobile phones.
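The rank-aggregation step can be approximated with a simple pairwise-precedence (Borda-style) scoring scheme; the paper's actual aggregation method may differ, so treat this as an illustrative sketch of how inconsistent partial orders combine into one global order.

```python
from collections import defaultdict
from itertools import combinations

def aggregate_orders(partial_orders):
    """Combine possibly-inconsistent partial orders of image ids into one
    global order by scoring pairwise precedence votes (Borda-style)."""
    score = defaultdict(int)
    for order in partial_orders:
        for earlier, later in combinations(order, 2):
            score[earlier] += 1   # 'earlier' won a precedence vote
            score[later] -= 1
    return sorted(score, key=lambda item: -score[item])
```

Here each dynamic feature contributes one partial order (the temporal order of the images it appears in), and disagreements between features simply cancel in the vote totals.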

46 citations


Proceedings ArticleDOI
16 Jun 2012
TL;DR: The core idea is to use a dense multi-camera array to construct a novel, dense 3D volumetric representation of the 3D space where each voxel holds an estimated intensity value and a confidence measure of this value.
Abstract: We propose a method for estimating the 3D structure and the dense 3D motion (scene flow) of a dynamic nonrigid 3D scene, using a camera array. The core idea is to use a dense multi-camera array to construct a novel, dense 3D volumetric representation of the 3D space where each voxel holds an estimated intensity value and a confidence measure of this value. The problem of 3D structure and 3D motion estimation of a scene is thus reduced to a nonrigid registration of two volumes, hence the term "Scene Registration". Registering two dense 3D scalar volumes does not require recovering the 3D structure of the scene as a preprocessing step, nor does it require explicit reasoning about occlusions. From this nonrigid registration we accurately extract the 3D scene flow and the 3D structure of the scene, and successfully recover the sharp discontinuities in both time and space. We demonstrate the advantages of our method on a number of challenging synthetic and real data sets.
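One plausible reading of the per-voxel representation, assuming the confidence measure is driven by cross-camera agreement (the abstract does not specify the exact formula, so both the variance-based confidence and the array layout below are assumptions):

```python
import numpy as np

def voxel_volume(samples):
    """samples: (num_cameras, D, H, W) array of the intensity each camera's
    ray contributes to each voxel. Returns a per-voxel intensity estimate
    and a confidence that is high where the cameras agree."""
    intensity = samples.mean(axis=0)
    confidence = 1.0 / (1.0 + samples.var(axis=0))
    return intensity, confidence
```

Voxels on a true surface receive consistent intensities from all unoccluded cameras (high confidence), while voxels in free space or behind occluders do not, which is what lets the registration proceed without explicit occlusion reasoning.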

36 citations


Proceedings ArticleDOI
16 Jun 2012
TL;DR: A method for browsing multiple videos with a common theme, such as the result of a search query on a video sharing website, or videos of an event covered by multiple cameras, is proposed.
Abstract: We propose a method for browsing multiple videos with a common theme, such as the result of a search query on a video sharing website, or videos of an event covered by multiple cameras. Given the collection of videos we first align each video with all others. This pairwise video alignment forms the basis of a novel browsing interface, termed the Browsing Companion. It is used to play a primary video and, alongside it as thumbnails, other video clips that are temporally synchronized with it. The user can, at any time, click on one of the thumbnails to make it the primary. We also show that video alignment can be used for other applications such as automatic highlight detection and multi-video summarization.

27 citations


Proceedings ArticleDOI
01 Jan 2012
TL;DR: This work introduces an automatic system that receives a set of natural images taken at running sports events and outputs the participants' Racing Bib Numbers (RBNs), which are used to identify competitors during the race.
Abstract: Running races, such as marathons, are broadly covered by professional as well as amateur photographers. This leads to a constantly growing number of photos covering a race, making the process of identifying a particular runner in such datasets difficult. Today, such identification is often done manually. In running races, each competitor has an identification number, called the Racing Bib Number (RBN), used to identify that competitor during the race. RBNs are usually printed on a paper or cardboard tag and pinned onto the competitor's T-shirt during the race. We introduce an automatic system that receives a set of natural images taken at running sports events and outputs the participants' RBNs.

24 citations