
Showing papers by "Patrick Lucey published in 2014"


Proceedings ArticleDOI
14 Dec 2014
TL;DR: In this article, the authors present a role-based representation that dynamically updates each player's relative role at each frame and demonstrate how this captures the short-term context to enable both individual player and team analysis.
Abstract: Although the collection of player and ball tracking data is fast becoming the norm in professional sports, large-scale mining of such spatiotemporal data has yet to surface. In this paper, given an entire season's worth of player and ball tracking data from a professional soccer league (≈400,000,000 data points), we present a method which can conduct both individual player and team analysis. Due to the dynamic, continuous and multi-player nature of team sports like soccer, a major issue is aligning player positions over time. We present a "role-based" representation that dynamically updates each player's relative role at each frame and demonstrate how this captures the short-term context to enable both individual player and team analysis. We discover role directly from data by utilizing a minimum entropy data partitioning method and show how this can be used to accurately detect and visualize formations, as well as analyze individual player behavior.

132 citations
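The role-based alignment described above can be illustrated with a small sketch. The snippet below is not the paper's implementation: it reduces the idea to per-frame minimum-cost matching of players against a set of role template positions (the paper discovers roles with a minimum entropy data partitioning method), and the shapes, the running-mean template update, and the toy data are all illustrative assumptions.

```python
# Minimal sketch of per-frame role assignment: match each player to the
# role whose template position is nearest, via optimal bipartite matching.
import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_roles(frames, templates, lr=0.01):
    """frames: (T, N, 2) player xy positions; templates: (N, 2) role positions."""
    templates = templates.copy()
    roles = np.empty(frames.shape[:2], dtype=int)
    for t, players in enumerate(frames):
        # cost[i, j] = squared distance from player i to role template j
        cost = ((players[:, None, :] - templates[None, :, :]) ** 2).sum(-1)
        rows, cols = linear_sum_assignment(cost)   # optimal 1-to-1 matching
        roles[t, rows] = cols
        # slowly adapt each role template toward its assigned player
        templates[cols] += lr * (players[rows] - templates[cols])
    return roles, templates

# toy usage: 10 players jittering around a fixed template formation
rng = np.random.default_rng(0)
template = rng.uniform(0, 100, size=(10, 2))
frames = template + rng.normal(0, 3, size=(500, 10, 2))
roles, fitted = assign_roles(frames, template)
print(roles[0])   # role index assigned to each player in frame 0
```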


Proceedings ArticleDOI
14 Dec 2014
TL;DR: Focusing on basketball, the authors employ a latent factor modeling approach that yields a compact data representation, enabling efficient prediction from raw spatiotemporal tracking data and accurate in-game predictions.
Abstract: We consider the problem of learning predictive models for in-game sports play prediction. Focusing on basketball, we develop models for anticipating near-future events given the current game state. We employ a latent factor modeling approach, which leads to a compact data representation that enables efficient prediction given raw spatiotemporal tracking data. We validate our approach using tracking data from the 2012-2013 NBA season, and show that our model can make accurate in-game predictions. We provide a detailed inspection of our learned factors, and show that our model is interpretable and corresponds to known intuitions of basketball game play.

96 citations
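As a rough illustration of the latent-factor idea, the sketch below compresses a synthetic play-by-feature matrix with a truncated SVD and predicts a toy near-future event from the compact factors via logistic regression. The data, shapes, and the choice of SVD are stand-ins for the paper's actual model, not a reproduction of it.

```python
# Sketch: low-rank factors as a compact play representation for prediction.
import numpy as np

rng = np.random.default_rng(1)
n, d, k_true = 2000, 400, 5
F = rng.normal(size=(n, k_true))                    # latent per-play factors
X = np.maximum(F @ rng.normal(size=(k_true, d))
               + rng.normal(0, 0.5, (n, d)), 0)     # raw spatiotemporal features
y = (F[:, 0] + 0.5 * F[:, 1] > 0).astype(float)     # toy near-future event label

U, S, _ = np.linalg.svd(X - X.mean(0), full_matrices=False)
k = 10
Z = U[:, :k] * S[:k]                                # compact representation
Z /= Z.std(0)                                       # standardise each factor

w, b = np.zeros(k), 0.0                             # logistic regression on factors
for _ in range(500):
    p = 1 / (1 + np.exp(-(Z @ w + b)))
    g = p - y
    w -= 0.1 * (Z.T @ g) / n
    b -= 0.1 * g.mean()
print("train accuracy:", ((p > 0.5) == y).mean())
```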


Journal Article
TL;DR: This paper presents a method which can accurately determine the identity of a team from spatiotemporal player tracking data by utilizing a formation descriptor which is found by minimizing the entropy of role-specific occupancy maps.
Abstract: To the trained eye, a team can often be identified by its unique style of play: its movement, passing and interactions. In this paper, we present a method which can accurately determine the identity of a team from spatiotemporal player tracking data. We do this by utilizing a formation descriptor which is found by minimizing the entropy of role-specific occupancy maps. We show how our approach is significantly better at identifying different teams compared to standard measures (i.e., shots, passes etc.). We demonstrate the utility of our approach using an entire season of Prozone player tracking data from a top-tier professional soccer league.

60 citations
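A hedged sketch of the formation-descriptor idea follows: build a spatial occupancy map per role, score the descriptor by its entropy, and identify a team by nearest-neighbour matching of descriptors. The grid resolution, distance measure, and toy data are assumptions, and the paper's entropy minimisation over role assignments is not reproduced here.

```python
# Sketch: role-specific occupancy maps as a team "formation descriptor".
import numpy as np

def occupancy_maps(positions, roles, n_roles, bins=10, extent=100.0):
    """positions: (T, N, 2) player xy; roles: (T, N) role ids -> per-role maps."""
    maps = np.zeros((n_roles, bins, bins))
    for r in range(n_roles):
        pts = positions[roles == r]                          # all xy for role r
        h, _, _ = np.histogram2d(pts[:, 0], pts[:, 1],
                                 bins=bins, range=[[0, extent]] * 2)
        maps[r] = h / max(h.sum(), 1.0)                      # normalise to a pmf
    return maps

def mean_role_entropy(maps, eps=1e-12):
    p = maps.reshape(len(maps), -1)
    return float(-(p * np.log(p + eps)).sum(axis=1).mean())

def identify_team(query, gallery):
    """gallery: {team_name: maps}; nearest descriptor by L1 distance."""
    return min(gallery, key=lambda t: np.abs(query - gallery[t]).sum())

rng = np.random.default_rng(2)
pos = rng.normal(50, 8, size=(1000, 10, 2)).clip(0, 100)
roles = np.tile(np.arange(10), (1000, 1))
maps = occupancy_maps(pos, roles, n_roles=10)
print("mean role entropy:", mean_role_entropy(maps))
```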


Proceedings ArticleDOI
01 Dec 2014
TL;DR: In this paper, the authors present a method which can accurately determine the identity of a team from spatiotemporal player tracking data by utilizing a formation descriptor which is found by minimizing the entropy of role-specific occupancy maps.
Abstract: To the trained eye, a team can often be identified by its unique style of play: its movement, passing and interactions. In this paper, we present a method which can accurately determine the identity of a team from spatiotemporal player tracking data. We do this by utilizing a formation descriptor which is found by minimizing the entropy of role-specific occupancy maps. We show how our approach is significantly better at identifying different teams compared to standard measures (i.e., shots, passes etc.). We demonstrate the utility of our approach using an entire season of Prozone player tracking data from a top-tier professional soccer league.

49 citations


01 Mar 2014
TL;DR: In this article, an automatic formation detection method is presented and showcased by investigating the "home advantage" in a recent top-tier professional soccer league: home teams had significantly more possession in the forward third, which correlated with more shots and goals, while shooting and passing proficiencies were the same.
Abstract: In terms of analyzing soccer matches, two of the most important factors to consider are: 1) the formation the team played (e.g., 4-4-2, 4-2-3-1, 3-5-2 etc.), and 2) the manner in which they executed it (e.g., conservative, sitting deep, or aggressive, pressing high). Despite the existence of ball and player tracking data, no current methods exist which can automatically detect and visualize formations. Using an entire season of Prozone data, which consists of ball and player tracking information from a recent top-tier professional league, we showcase an automatic formation detection method by investigating the "home advantage". In a paper we published recently, using an entire season of ball tracking data, we showed that home teams had significantly more possession in the forward third, which correlated with more shots and goals, while the shooting and passing proficiencies were the same. Using our automatic formation analysis, we extend this analysis and show that while teams tend to play the same formation at home as they do away, the manner in which they execute the formation is significantly different. Specifically, we show that the position of the formation of teams at home is significantly higher up the field compared to when they play away. This conservative approach in away games suggests that coaches aim to win their home games and draw their away games. Additionally, we show that our method can visually summarize a game, giving an indication of dominance and tactics. Beyond enabling new discoveries of team behavior which can enhance analysis, our automatic formation detection method is the first to be developed.

38 citations
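The home/away comparison lends itself to a compact illustration. The sketch below uses synthetic per-match formation positions rather than the Prozone data, and a two-sample t-test stands in for whatever significance test the authors used.

```python
# Sketch: test whether home formations sit higher up the pitch than away ones.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(2)
# per-match mean up-field coordinate of the detected formation
# (0 = own goal line, 100 = opponent's goal line); synthetic numbers
home = rng.normal(52, 4, size=19)     # home formations higher up the pitch
away = rng.normal(47, 4, size=19)

t_stat, p_val = ttest_ind(home, away)
print(f"home {home.mean():.1f} vs away {away.mean():.1f}, "
      f"t={t_stat:.2f}, p={p_val:.4f}")
```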


Proceedings ArticleDOI
24 Mar 2014
TL;DR: In this paper, a method is proposed for representing audience behavior through facial and body motions from a single video stream and using these features to predict the rating of feature-length movies.
Abstract: We propose a method of representing audience behavior through facial and body motions from a single video stream, and use these features to predict the rating for feature-length movies. This is a very challenging problem as: i) the movie viewing environment is dark and contains views of people at different scales and viewpoints; ii) the duration of feature-length movies is long (80–120 mins) so tracking people uninterrupted for this length of time is still an unsolved problem; and iii) expressions and motions of audience members are subtle, short and sparse, making labeling of activities unreliable. To circumvent these issues, we use an infrared illuminated test-bed to obtain a visually uniform input. We then utilize motion-history features which capture the subtle movements of a person within a pre-defined volume, and form a group representation of the audience by a histogram of pair-wise correlations over a small window of time. Using this group representation, we learn our movie rating classifier from crowd-sourced ratings collected by rottentomatoes.com and show our prediction capability on audiences from 30 movies across 250 subjects (> 50 hrs).

31 citations
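A minimal sketch of the two ingredients just described, with invented shapes and thresholds: a motion-history image that decays old motion within one person's volume, and a group descriptor built as a histogram of pairwise correlations of the members' motion-energy signals over a short window.

```python
# Sketch: motion-history features and a pairwise-correlation group histogram.
import numpy as np

def motion_history(frames, tau=30, thresh=10):
    """frames: (T, H, W) grayscale video of one person's volume."""
    mhi = np.zeros(frames.shape[1:])
    for prev, cur in zip(frames, frames[1:]):
        moving = np.abs(cur.astype(int) - prev.astype(int)) > thresh
        mhi = np.where(moving, tau, np.maximum(mhi - 1, 0))   # decay old motion
    return mhi

def group_histogram(energy, bins=10):
    """energy: (P, T) per-person motion energy over a window -> correlation hist."""
    c = np.corrcoef(energy)                       # (P, P) pairwise correlations
    upper = c[np.triu_indices(len(c), k=1)]       # each pair counted once
    h, _ = np.histogram(upper, bins=bins, range=(-1, 1))
    return h / h.sum()

rng = np.random.default_rng(3)
mhi = motion_history(rng.integers(0, 256, size=(60, 32, 32)))
energy = rng.normal(size=(25, 240))               # 25 people, 240-frame window
print(group_histogram(energy))
```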


01 Feb 2014
TL;DR: In this article, the authors use ball and player tracking data from STATS SportVU from the 2012-2013 NBA season to analyze offensive and defensive formations of teams and demonstrate the utility of their approach by analyzing all the plays that resulted in a 3-point shot attempt.
Abstract: In this paper, we use ball and player tracking data from STATS SportVU from the 2012-2013 NBA season to analyze offensive and defensive formations of teams. We move beyond current analysis that uses only play-by-play event-driven statistics (i.e., rebounds, shots) and look at the spatiotemporal changes in a team's formation. A major concern, which also gives a clue to unlocking this problem, is that of permutations caused by the constant movement and interchanging of positions by players. In this paper, we use a method that represents a team via "role", which is immune to the problem of permutations. We demonstrate the utility of our approach by analyzing all the plays that resulted in a 3-point shot attempt in the 2012-2013 NBA season. We analyzed close to 20,000 shots and found that when a player is "open" the shooting percentage is around 40%, compared to a "pressured" shot, which is close to 32%. There is nothing groundbreaking behind this finding (i.e., putting more defensive pressure on the shooter reduces shooting percentages), but finding how teams get shooters open is. Using our method, we show that the number of defensive role-swaps is predictive of getting an open shot, and that this measure can be used to gauge the defensive effectiveness of a team. Additionally, our role representation allows for large-scale retrieval of plays by using the tracking data as the input query rather than a text label; this "video Google" approach allows for quick and accurate play retrieval.

31 citations
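The role-swap count at the heart of the finding above can be sketched in a few lines. Given per-frame defensive role assignments (as produced by a role-assignment step like the one sketched earlier), the snippet counts how often defenders exchange roles within a play; variable names and the toy data are assumptions.

```python
# Sketch: counting defensive role-swap events within a play.
import numpy as np

def count_role_swaps(roles):
    """roles: (T, N) role id per defender per frame -> number of swap events."""
    swaps = 0
    for prev, cur in zip(roles, roles[1:]):
        changed = np.flatnonzero(prev != cur)
        if changed.size:                  # some defenders permuted their roles
            swaps += changed.size // 2    # each swap event involves >= 2 players
    return swaps

# toy usage: 5 defenders, 100 frames, 8 injected pairwise role exchanges
rng = np.random.default_rng(4)
roles = np.tile(np.arange(5), (100, 1))
for t in rng.choice(np.arange(1, 100), size=8, replace=False):
    i, j = rng.choice(5, size=2, replace=False)
    roles[t:, [i, j]] = roles[t:, [j, i]]
print("role swaps in play:", count_role_swaps(roles))
```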


Book ChapterDOI
01 Nov 2014
TL;DR: This paper proposes an “augmented-Hidden Conditional Random Field” (a-HCRF) which incorporates the local observations within the HCRF, boosting its forecasting performance, and shows that this approach outperforms current state-of-the-art methods in forecasting short-term events in both soccer and tennis.
Abstract: In highly dynamic and adversarial domains such as sports, short-term predictions are made by incorporating both local immediate and global situational information. For forecasting complex events, higher-order models such as the Hidden Conditional Random Field (HCRF) have been used to good effect as they capture the long-term, high-level semantics of the signal. However, as the prediction is based solely on the hidden layer, fine-grained local information is not incorporated, which reduces its predictive capability. In this paper, we propose an “augmented-Hidden Conditional Random Field” (a-HCRF) which incorporates the local observations within the HCRF, boosting its forecasting performance. Given an enormous amount of tracking data from vision-based systems, we show that our approach outperforms current state-of-the-art methods in forecasting short-term events in both soccer and tennis. Additionally, as the tracking data is long-term and continuous, we show our model can be adapted to recent data, which improves performance.

29 citations
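To make the model form concrete, here is a toy, brute-force sketch of an a-HCRF-style score: a standard HCRF couples the label to the observations only through hidden states (unary, transition, and label-compatibility potentials), and the augmentation adds a direct label-observation term so fine-grained local evidence reaches the label. Parameters are random, the tiny size permits exact enumeration, and this illustrates the structure only, not the authors' model or training procedure.

```python
# Toy a-HCRF-style scoring with exact enumeration over hidden sequences.
import itertools
import numpy as np

rng = np.random.default_rng(5)
T, H, Y, D = 4, 3, 2, 6                   # frames, hidden states, labels, feat dim
x = rng.normal(size=(T, D))               # local observations
W = rng.normal(size=(H, D))               # hidden-state unary weights
A = rng.normal(size=(H, H))               # hidden-state transitions
C = rng.normal(size=(Y, H))               # label/hidden compatibility
V = rng.normal(size=(Y, D)) * 0.5         # the augmentation: direct y <-> x_t term

def score(y, h):
    s = sum(W[h[t]] @ x[t] + C[y, h[t]] + V[y] @ x[t] for t in range(T))
    s += sum(A[h[t - 1], h[t]] for t in range(1, T))
    return s

# P(y | x) by summing over all hidden sequences (feasible at this toy size)
logZ_y = np.array([
    np.logaddexp.reduce([score(y, h)
                         for h in itertools.product(range(H), repeat=T)])
    for y in range(Y)])
p = np.exp(logZ_y - np.logaddexp.reduce(logZ_y))
print("P(y|x) =", p)
```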


Proceedings ArticleDOI
24 Mar 2014
TL;DR: In this paper, a vision-based system is proposed which estimates the location of the swimmer in each frame and detects the stroke rate across a large collection of archived swimming races, footage which otherwise does not lend itself to large-scale analysis and retrieval.
Abstract: In elite sports, nearly all performances are captured on video. Despite the massive amounts of video that has been captured in this domain over the last 10–15 years, most of it remains in an “unstructured” or “raw” form, meaning it can only be viewed or manually annotated/tagged with higher-level event labels, which is time-consuming and subjective. As such, depending on the detail or depth of annotation, the value of the collected repositories of archived data is minimal as it does not lend itself to large-scale analysis and retrieval. One such example is swimming, where each race of a swimmer is captured on a camcorder and, in addition to the split times (i.e., the time it takes for each lap), stroke rates and stroke lengths are manually annotated. In this paper, we propose a vision-based system which effectively “digitizes” a large collection of archived swimming races by estimating the location of the swimmer in each frame, as well as detecting the stroke rate. As the videos are captured from moving hand-held cameras which are located at different positions and angles, we show our hierarchical approach to tracking the swimmer and their different parts is robust to these issues and allows us to accurately estimate the swimmer location and stroke rates.

23 citations
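The stroke-rate step invites a compact illustration: arm motion over a stroke cycle yields a roughly periodic signal, whose dominant frequency is the stroke rate. The sketch below uses a synthetic signal in place of the paper's tracker output.

```python
# Sketch: stroke rate as the dominant frequency of a periodic motion signal.
import numpy as np

fps = 30.0
t = np.arange(0, 20, 1 / fps)                         # 20 s of a race
rng = np.random.default_rng(6)
signal = np.sin(2 * np.pi * 0.75 * t) + 0.3 * rng.normal(size=t.size)
# 0.75 Hz -> 45 strokes per minute (synthetic ground truth)

spec = np.abs(np.fft.rfft(signal - signal.mean()))
freqs = np.fft.rfftfreq(signal.size, d=1 / fps)
dominant = freqs[spec.argmax()]
print(f"estimated stroke rate: {dominant * 60:.1f} strokes/min")
```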


Patent
13 Jan 2014
TL;DR: In this article, a user interface is coupled with at least three cameras that share substantially the same vantage point; the user interface can switch between the cameras' views and control the cameras to capture different portions of the context view.
Abstract: To generate a media presentation of a live event, a user interface is coupled to at least three cameras that share substantially the same vantage point. One of the cameras (e.g., a context camera) provides a context view of the event that is displayed on a screen of the user interface. The views of the other two cameras are superimposed onto the context view to define sub-portions that are visually demarcated within the context view. In one embodiment, only one of the views is visually demarcated in the context view at any given time. Based on user interaction, the user interface can switch between the cameras' views and control the cameras to capture different portions of the context view. Based on the image data captured by the views of the cameras within the context view, the user interface generates a media presentation that may be broadcast to multiple viewers.

15 citations


Book ChapterDOI
01 Jan 2014
TL;DR: A bilinear spatiotemporal basis model that operates in a low-dimensional space and uses a role representation to clean up noisy detections is proposed.
Abstract: Due to their unobtrusive nature, vision-based approaches to tracking sports players have been preferred over wearable sensors as they do not require the players to be instrumented for each match. Unfortunately, however, due to the heavy occlusion between players, variation in resolution and pose, in addition to fluctuating illumination conditions, tracking players continuously is still an unsolved vision problem. For tasks like clustering and retrieval, having noisy data (i.e., missing and false player detections) is problematic as it generates discontinuities in the input data stream. One method of circumventing this issue is to use an occupancy map, where the field is discretised into a series of zones and a count of player detections in each zone is obtained. A series of frames can then be concatenated to represent a set-play or example of team behaviour. A problem with this approach, though, is that the compressibility is low (i.e., the variability in the feature space is incredibly high). In this paper, we propose the use of a bilinear spatiotemporal basis model with a role representation, which operates in a low-dimensional space, to clean up the noisy detections. To evaluate our approach, we used a fully instrumented field-hockey pitch with 8 fixed high-definition (HD) cameras and evaluated our approach on approximately 200,000 frames of data from a state-of-the-art real-time player detector, comparing it to manually labeled data.
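A simplified sketch of a bilinear spatiotemporal basis model follows: role-ordered player coordinates over time are projected onto a truncated temporal DCT basis and a spatial PCA basis, and reconstructing from the low-dimensional bilinear coefficients smooths the noisy detections. The basis sizes, the synthetic data (generated from the model itself), and the least-squares fit are assumptions; missing-data handling is omitted.

```python
# Sketch: bilinear (temporal x spatial) basis denoising of player detections.
import numpy as np
from scipy.fft import idct

def dct_basis(T, k):
    """First k orthonormal DCT basis vectors as columns of a (T, k) matrix."""
    return idct(np.eye(T)[:, :k], axis=0, norm='ortho')

rng = np.random.default_rng(7)
T, N, kt, ks = 300, 10, 20, 6            # frames, players, temporal/spatial ranks
Theta = dct_basis(T, kt)                 # temporal basis (smooth over time)
truth = Theta @ rng.normal(0, 5, (kt, ks)) @ rng.normal(size=(2 * N, ks)).T
noisy = truth + rng.normal(0, 1.0, truth.shape)   # noisy role-ordered detections

mu = noisy.mean(0)
_, _, Vt = np.linalg.svd(noisy - mu, full_matrices=False)
B = Vt[:ks].T                                     # spatial (shape) basis
Omega = np.linalg.pinv(Theta) @ (noisy - mu) @ B  # bilinear coefficients
recon = Theta @ Omega @ B.T + mu                  # denoised reconstruction
print("noisy error:", np.abs(noisy - truth).mean().round(3),
      "| reconstruction error:", np.abs(recon - truth).mean().round(3))
```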

Proceedings ArticleDOI
01 Jun 2014
TL;DR: In this paper, a “histogram of facial action units” representation using Active Appearance Model (AAM) face features is developed to integrate rigid and non-rigid head motion, and a Hidden Conditional Random Field (HCRF) is used to capture the salient temporal patterns.
Abstract: Automatic pain monitoring has the potential to greatly improve patient diagnosis and outcomes by providing a continuous objective measure. One of the most promising methods is to do this via automatically detecting facial expressions. However, current approaches have failed due to their inability to: 1) integrate the rigid and non-rigid head motion into a single feature representation, and 2) incorporate the salient temporal patterns into the classification stage. In this paper, we tackle the first problem by developing a “histogram of facial action units” representation using Active Appearance Model (AAM) face features, and then utilize a Hidden Conditional Random Field (HCRF) to overcome the second issue. We show that both of these methods improve performance on the task of pain detection at the sequence level compared to current state-of-the-art methods on the UNBC-McMaster Shoulder Pain Archive.

Journal Article
TL;DR: This paper develops a “histogram of facial action units” representation using Active Appearance Model (AAM) face features, and utilizes a Hidden Conditional Random Field (HCRF) to overcome the second issue.
Abstract: Automatic pain monitoring has the potential to greatly improve patient diagnosis and outcomes by providing a continuous objective measure. One of the most promising methods is to do this via automatically detecting facial expressions. However, current approaches have failed due to their inability to: 1) integrate the rigid and non-rigid head motion into a single feature representation, and 2) incorporate the salient temporal patterns into the classification stage. In this paper, we tackle the first problem by developing a “histogram of facial action units” representation using Active Appearance Model (AAM) face features, and then utilize a Hidden Conditional Random Field (HCRF) to overcome the second issue. We show that both of these methods improve performance on the task of pain detection at the sequence level compared to current state-of-the-art methods on the UNBC-McMaster Shoulder Pain Archive.
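The “histogram of facial action units” step can be sketched compactly: per-frame AU activation scores (here random stand-ins for the AAM-derived features) are quantised into levels and pooled over the whole sequence into one fixed-length histogram, which a sequence classifier such as an HCRF could then consume. The number of AUs, the quantisation scheme, and the data are assumptions.

```python
# Sketch: pooling per-frame action-unit activations into a sequence histogram.
import numpy as np

def au_histogram(au_scores, levels=4):
    """au_scores: (T, K) per-frame activation of K action units in [0, 1]."""
    q = np.clip((au_scores * levels).astype(int), 0, levels - 1)   # quantise
    T, K = q.shape
    hist = np.zeros((K, levels))
    for k in range(K):
        hist[k] = np.bincount(q[:, k], minlength=levels) / T
    return hist.ravel()                     # (K * levels,) sequence descriptor

rng = np.random.default_rng(8)
seq = rng.beta(2, 5, size=(150, 12))        # 150 frames, 12 AUs
print(au_histogram(seq).shape)              # -> (48,)
```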

Patent
02 Jun 2014
TL;DR: In this article, a future event prediction method is proposed that employs an augmented-hidden-conditional-random-field (a-HCRF) predictor to generate predictions based on a parameter-vector input, hidden states, and spatiotemporal data.
Abstract: Systems and methods are disclosed for a future event prediction. Embodiments include capturing spatiotemporal data pertaining to activities, wherein the activities include a plurality of events, and employing an augmented-hidden-conditional-random-field (a-HCRF) predictor to generate a future event prediction based on a parameter-vector input, hidden states, and the spatiotemporal data. Methods therein utilize a graph including a first node associated with random variables corresponding to a future event state, a second node associated with random variables corresponding to spatiotemporal input data, a first group of nodes, each node therein associated with random variables corresponding to a subset of the spatiotemporal input data, a second group of nodes, each node therein associated with random variables corresponding to a hidden-state; wherein the edges connect the first node with the second node, the first node with the second group of nodes, and the first group of nodes with the second group of nodes.

Patent
26 Sep 2014
TL;DR: In this article, a formation analysis system is described for discovering a formation associated with an agent group engaging in an activity over a window of time; first and second results for an objective function are computed from first and second sets of role assignments for the agents in the group.
Abstract: Approaches are described for discovering a formation associated with an agent group engaging in an activity over a window of time. A formation analysis system computes first and second results for an objective function based on first and second sets of role assignments for each agent in the agent group at first and second moments in time, respectively. The formation analysis system iterates by: replacing the first set of role assignments with the second set of role assignments, and determining whether completion criteria have been met based at least in part on comparing the first result with the second result. If the completion criteria have not been met, then the formation analysis system replaces the second set of role assignments with a third set of role assignments that associates each agent in the agent group with a different role assignment at a third moment in time. If the completion criteria have been met, then the formation analysis system determines the formation based on the second result.
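The iteration the patent describes resembles an alternating minimisation, sketched below with a total-squared-assignment-cost objective standing in for the patent's unspecified objective function: assign roles per frame by minimum-cost matching against the current formation estimate, re-estimate the formation as the per-role mean, and stop when the result stops improving (the completion criterion). Shapes and data are illustrative.

```python
# Sketch: alternating role assignment and formation re-estimation.
import numpy as np
from scipy.optimize import linear_sum_assignment

def discover_formation(frames, n_iter=50, tol=1e-6):
    """frames: (T, N, 2) agent positions -> (N, 2) formation, (T, N) role ids."""
    T, N, _ = frames.shape
    formation = frames[0].copy()                   # initial role positions
    prev_obj = np.inf
    for _ in range(n_iter):
        roles = np.empty((T, N), dtype=int)
        obj = 0.0
        for t in range(T):                         # (a) per-frame role assignment
            cost = ((frames[t, :, None] - formation[None]) ** 2).sum(-1)
            r, c = linear_sum_assignment(cost)
            roles[t, r] = c
            obj += cost[r, c].sum()
        for j in range(N):                         # (b) re-estimate the formation
            formation[j] = frames[roles == j].mean(0)
        if prev_obj - obj < tol:                   # completion criterion
            break
        prev_obj = obj
    return formation, roles

# toy usage: agents jitter around a fixed formation but swap positions randomly
rng = np.random.default_rng(9)
base = rng.uniform(0, 100, size=(10, 2))
perm = rng.permuted(np.tile(np.arange(10), (400, 1)), axis=1)
frames = base[perm] + rng.normal(0, 2, size=(400, 10, 2))
formation, roles = discover_formation(frames)
# each discovered role position should sit near one of the true base positions
d = ((formation[:, None] - base[None]) ** 2).sum(-1).min(1) ** 0.5
print("max distance to a true position:", d.max().round(2))
```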

Journal Article
TL;DR: A method is proposed for representing audience behavior through facial and body motions from a single video stream and using these features to predict the rating of feature-length movies.
Abstract: We propose a method of representing audience behavior through facial and body motions from a single video stream, and use these features to predict the rating for feature-length movies. This is a very challenging problem as: i) the movie viewing environment is dark and contains views of people at different scales and viewpoints; ii) the duration of feature-length movies is long (80–120 mins) so tracking people uninterrupted for this length of time is still an unsolved problem; and iii) expressions and motions of audience members are subtle, short and sparse, making labeling of activities unreliable. To circumvent these issues, we use an infrared illuminated test-bed to obtain a visually uniform input. We then utilize motion-history features which capture the subtle movements of a person within a pre-defined volume, and form a group representation of the audience by a histogram of pair-wise correlations over a small window of time. Using this group representation, we learn our movie rating classifier from crowd-sourced ratings collected by rottentomatoes.com and show our prediction capability on audiences from 30 movies across 250 subjects (> 50 hrs).

Patent
13 Jan 2014
TL;DR: In this paper, a user interface is coupled to at least three cameras that share substantially the same vantage point; the user interface can switch between the cameras' views and control the cameras to capture different portions of the context view.
Abstract: To generate a media presentation of a live event, a user interface is coupled to at least three cameras that share substantially the same vantage point. One of the cameras (e.g., a context camera) provides a context view of the event that is displayed on a screen of the user interface. The views of the other two cameras are superimposed onto the context view to define sub-portions that are visually demarcated within the context view. Based on user interaction, the user interface can switch between the cameras' views and control the cameras to capture different portions of the context view. Based on the image data captured by the views of the cameras within the context view, the user interface generates a media presentation that may be broadcast to multiple viewers.

Proceedings Article
01 Jan 2014
TL;DR: This paper focuses on two recent works: predicting the location of shots in tennis using Hawk-Eye tracking data, and clustering spatiotemporal plays in soccer from a professional league to discover how teams get a shot on goal.
Abstract: In this paper, we summarize our recent work in analyzing and predicting behaviors in sports using spatiotemporal data. We specifically focus on two recent works: 1) predicting the location of shots in tennis using Hawk-Eye tennis data, and 2) clustering spatiotemporal plays in soccer from a professional league to discover the methods by which teams get a shot on goal.
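As a flavour of the play-clustering idea mentioned above, the sketch below resamples synthetic play trajectories to a fixed length, flattens them, and clusters them with k-means; the trajectory source, the resampling scheme, the cluster count, and the use of scikit-learn are all illustrative assumptions rather than the paper's pipeline.

```python
# Sketch: clustering variable-length play trajectories with k-means.
import numpy as np
from sklearn.cluster import KMeans

def resample(traj, n=20):
    """traj: (L, 2) trajectory -> (n, 2) index-uniform resample."""
    idx = np.linspace(0, len(traj) - 1, n)
    return np.column_stack([np.interp(idx, np.arange(len(traj)), traj[:, d])
                            for d in range(2)])

rng = np.random.default_rng(10)
plays = [rng.normal(0, 1, (rng.integers(30, 80), 2)).cumsum(0)
         for _ in range(200)]                            # 200 synthetic plays
X = np.stack([resample(p).ravel() for p in plays])       # (200, 40) vectors
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
print("plays per cluster:", np.bincount(labels))
```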