
Learning Video Object Segmentation from Static Images

TLDR
In this paper, the authors use a combination of offline and online learning strategies, where the former produces a refined mask from the previous frame's estimate and the latter captures the appearance of the specific object instance.


Learning Video Object Segmentation from Static Images
Federico Perazzi 1,2*   Anna Khoreva 3*   Rodrigo Benenson 3   Bernt Schiele 3   Alexander Sorkine-Hornung 1
1 Disney Research   2 ETH Zurich   3 Max Planck Institute for Informatics, Saarbrücken, Germany
Abstract
Inspired by recent advances of deep learning in instance segmentation and object tracking, we introduce the concept of convnet-based guidance applied to video object segmentation. Our model proceeds on a per-frame basis, guided by the output of the previous frame towards the object of interest in the next frame. We demonstrate that highly accurate object segmentation in videos can be enabled by using a convolutional neural network (convnet) trained with static images only. The key component of our approach is a combination of offline and online learning strategies, where the former produces a refined mask from the previous frame's estimate and the latter allows capturing the appearance of the specific object instance. Our method can handle different types of input annotations, such as bounding boxes and segments, while leveraging an arbitrary amount of annotated frames. Therefore our system is suitable for diverse applications with different requirements in terms of accuracy and efficiency. In our extensive evaluation, we obtain competitive results on three different datasets, independently of the type of input annotation.
1. Introduction
Convolutional neural networks (convnets) have shown outstanding performance in many fundamental areas of computer vision, enabled by the availability of large-scale annotated datasets (e.g., ImageNet classification [24, 43]). However, some important challenges in video processing can be difficult to approach using convnets, since creating a sufficiently large body of densely, pixel-wise annotated video data for training is usually prohibitive.
One example of such a domain is video object segmentation. Given only one or a few frames annotated with segmentation masks of a particular object instance, the task of video object segmentation is to accurately segment the same instance in all other frames of the video. Current top-performing approaches either interleave box tracking and segmentation [53], or propagate the first frame mask annotation in space-time via CRF or GrabCut-like techniques [29, 49].

* The first two authors contributed equally.

Figure 1: Given a rough mask estimate from the previous frame t-1, we train a convnet to provide a refined mask output for the current frame t. The MaskTrack ConvNet takes the input frame t and the mask estimate from frame t-1 and produces the refined mask for frame t.
One of the key insights and contributions of this paper is that fully annotated video data is not necessary. We demonstrate that highly accurate video object segmentation can be enabled using a convnet trained with static images only. We show that a convnet designed for semantic image segmentation [8] can be utilized to perform per-frame instance segmentation, i.e., segmentation of generic objects while distinguishing different instances of the same class. For each new video frame the network is guided towards the object of interest by feeding in the previous frame's mask estimate. We therefore refer to our approach as guided instance segmentation. To the best of our knowledge, it represents the first fully trained approach to video object segmentation.
Our system is efficient due to its feed-forward architecture and can generate high quality results in a single pass over the video, without the need for considering more than one frame at a time. This is in stark contrast to many other video segmentation approaches, which usually require global connections over multiple frames or even the whole video sequence in order to achieve coherent results. Furthermore, our method can handle different types of annotations, and even simple bounding boxes as input are sufficient to obtain competitive results, making our method flexible with respect to various practical applications with different requirements in terms of human supervision.
Key to the video segmentation quality of our approach is the combination of offline and online learning strategies. In the offline phase, we use deformation and coarsening on the image masks in order to train the network to produce accurate output masks from their rough estimates. An online training phase extends ideas from previous works on object tracking [12, 32] to the task of video segmentation and enables the method to be easily optimized with respect to an object of interest in a novel input video.

The result is a single, homogeneous system that compares favourably to most classical approaches on three extremely heterogeneous video segmentation benchmarks, despite using the same model and parameters across all videos. We provide a detailed ablation study and explore the impact of varying the number and types of annotations. Moreover, we discuss extensions of the proposed model that allow us to improve the quality even further.
2. Related Work
The idea of performing video object segmentation via tracking at the pixel level is at least a decade old [40]. Recent approaches interweave box tracking with box-driven segmentation (e.g. TRS [53]), or propagate the first frame segmentation via graph labeling approaches.
Local propagation. JOTS [52] builds a graph over neighboring frames connecting superpixels and (generic) object parts to solve the video labeling task. ObjFlow [49] builds a graph over pixels and superpixels, uses convnet-based appearance terms, and interleaves labeling with optical flow estimation. Instead of using superpixels or proposals, BVS [29] formulates a fully-connected pixel-level graph between frames and efficiently infers the labeling over the vertices of a spatio-temporal bilateral grid [7]. Because these methods propagate information only across neighboring frames, they have difficulties capturing long range relationships and ensuring globally consistent segmentation.
Global propagation. In order to overcome these limitations, some methods have proposed to use long-range connections between video frames [15, 25, 48, 55]. In particular, we compare to FCP [35], Z15 [56], NLC [15] and W16 [50], which build a global graph structure over object proposal segments and then infer a consistent segmentation. A limitation of methods utilizing long-range connections is that they have to operate on larger image regions such as superpixels or object proposals for acceptable speed and memory usage, compromising their ability to handle fine details.
Unsupervised segmentation. Another family of works performs moving object segmentation (over all parts of the image) and selects post-hoc the space-time tube that best matches the annotation [18, 26, 33, 53]. In contrast, our approach side-steps the use of any intermediate tracked boxes, superpixels or object proposals and proceeds on a per-frame basis, therefore efficiently handling even long sequences at full detail. We focus on propagating the first frame segmentation forward onto future frames, using an online fine-tuned convnet as appearance model for segmenting the object of interest in the next frames.
Box tracking. Some previous works have investigated approaches that improve segmentation quality by leveraging object tracking and vice versa [10, 13, 17, 40, 53]. More recent, state-of-the-art tracking methods are based on discriminative correlation filters over handcrafted features (e.g. HOG) and over frozen deep learned features [11, 12], or are convnet-based trackers in their own right [20, 32]. Our approach is most closely related to the latter group. GOTURN [20] proposes to train a convnet offline so as to directly regress the bounding box in the current frame based on the object position and appearance in the previous frame. MDNet [32] proposes to use online fine-tuning of a convnet to model the object appearance. Our training strategy is inspired by GOTURN for the offline part, and by MDNet for the online stage. Compared to the aforementioned methods, our approach operates on pixel-level masks instead of boxes. Differently from MDNet, we do not replace the domain-specific layers, instead fine-tuning all the layers on the available annotations for each individual video sequence.
Instance segmentation. At each frame, video object segmentation outputs a single instance segmentation. Given an estimate of the object location and size, bottom-up segment proposals [38] or GrabCut [42] variants can be used as shape guesses. Specific convnet architectures have also been proposed for instance segmentation [19, 36, 37, 54]. Our approach outputs per-frame instance segmentations using a convnet architecture, inspired by works from other domains like [6, 44, 54]. A concurrent work [5] also exploits convnets for video object segmentation. Differently from our approach, their segmentation is not guided, and therefore it cannot distinguish multiple instances of the same object.
Interactive video segmentation. Applications such as video editing for movie production often require a level of accuracy beyond the current state-of-the-art. Thus several works have also considered video segmentation with variable annotation effort, enabling human interaction using clicks [22, 47, 51] or strokes [1, 16, 57]. In this work we consider instead box or segment annotations on multiple frames. In § 5 we report results when varying the amount of annotation effort, from one frame per video to all frames.
3. Method
We approach the video object segmentation problem from a different perspective, which we refer to as convnet-based guided instance segmentation. For each new frame we wish to label pixels as object/non-object of interest; for this we build upon the architecture of an existing pixel labelling convnet and train it to generate per-frame instance segments. We pick DeepLabv2 [8], but our approach is agnostic of the specific architecture selected. The challenge is then: how to inform the network which instance to segment? We solve this by using two complementary strategies. First, we guide the network towards the instance of interest by feeding in the previous frame's mask estimate during offline training (§ 3.1). Second, we employ online training to fine-tune the model to incorporate specific knowledge of the object instance (§ 3.2).
3.1. Offline Training
In order to guide the pixel labeling network to segment
the object of interest, we begin by expanding the convnet
input from RGB to RGB+mask channels. The extra mask
channel is meant to provide an estimate of the visible area
of the object in the current frame, its approximate location
and shape. We can then train the labelling convnet to output
an accurate segmentation of the object, given as input the
current image and a rough estimate of the object mask. Our
tracking network is de facto a "mask refinement" network.
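To make the guidance concrete, the following minimal sketch (ours, not code from the paper; it assumes a generic pixel-labelling backbone whose first convolution accepts four input channels, standing in for the actual DeepLabv2 model) shows how the rough mask is simply appended to the RGB input:

```python
import torch
import torch.nn as nn

class GuidedSegmentationNet(nn.Module):
    """Sketch of the guided instance segmentation input: RGB + rough mask."""

    def __init__(self, backbone: nn.Module):
        super().__init__()
        # `backbone` is assumed to be any pixel-labelling convnet whose first
        # convolution takes 4 input channels (RGB + mask), e.g. a modified DeepLab.
        self.backbone = backbone

    def forward(self, rgb: torch.Tensor, rough_mask: torch.Tensor) -> torch.Tensor:
        # rgb: (B, 3, H, W); rough_mask: (B, 1, H, W), values in [0, 1]
        x = torch.cat([rgb, rough_mask], dim=1)  # (B, 4, H, W) guided input
        return self.backbone(x)                  # per-pixel foreground scores
```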
There are two key observations that make this approach practical. First, very rough input masks are enough for our trained network to provide sensible output segments. Even a large bounding box as input will result in a reasonable output (see § 5.2). The main role of the input mask is to point the convnet towards the correct object instance to segment. Second, this particular approach does not require us to use video as training data, such as done in [3, 5, 20, 32]. Because we only use a mask as additional input, instead of an image crop as in [3, 20], we can synthesize training samples from single frame instance segmentation annotations. This allows us to train from a large set of diverse images, instead of having to rely on scarce video annotations.
Figure 1 shows our simplified model. To simulate the noise of the previous frame output, during offline training we generate input masks by deforming the annotations via affine transformation as well as non-rigid deformations via thin-plate splines [4], followed by a coarsening step (dilation morphological operation) to remove details of the object contour. We apply this data generation procedure over a dataset of 10^4 images containing diverse object instances, see examples in the supplementary material. At test time, given the mask estimate at time t-1, we apply the dilation operation and use the resulting rough mask as input for object segmentation in frame t.
The affine transformations and non-rigid deformations aim at modelling the expected motion of an object between two frames. The coarsening permits us to generate training samples that resemble the test time data, simulating the blobby shape of the output mask given by the convnet from the previous frame. These two ingredients make the estimation more robust to noisy segmentation estimates while helping to avoid accumulation of errors from the preceding frames. The trained convnet has learnt to do guided instance segmentation similar to networks like SharpMask [37], DeepMask [36] and Hypercolumns [19], but instead of taking a bounding box as guidance, we can use an arbitrary input mask. The training details are described in § 4.
When using offline training only, the segmentation procedure consists of two steps: the previous frame mask is coarsened and then fed into the trained network to estimate the current frame mask. Since objects have a tendency to move smoothly through space, the object mask in the preceding frame provides a good guess in the current frame, and simply copying the coarse mask from the previous frame is enough. This approach is fast and already provides good results. We also experimented with using optical flow to propagate the mask from one frame to the next, but found the optical flow errors to offset the gains.
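This per-frame propagation with the offline-trained model can be summarized by the following sketch (ours, assuming a `refine` callable that wraps the trained convnet with its pre- and post-processing; the 5-pixel dilation radius matches the coarsening value reported in § 4):

```python
import cv2
import numpy as np

def propagate_masks(frames, first_mask, refine, dilation_radius=5):
    """Per-frame mask propagation loop (sketch).

    frames: list of HxWx3 uint8 images; first_mask: HxW binary mask for frame 0;
    refine: callable (frame, rough_mask) -> refined binary mask, assumed to wrap
    the offline-trained convnet.
    """
    size = 2 * dilation_radius + 1
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (size, size))
    masks = [first_mask.astype(np.uint8)]
    for frame in frames[1:]:
        # Coarsen the previous estimate, then let the convnet refine it.
        rough = cv2.dilate(masks[-1], kernel)
        masks.append(refine(frame, rough).astype(np.uint8))
    return masks
```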
With only the offline trained network, the proposed approach allows us to achieve competitive performance compared to previously reported results (see § 5.2). However, the performance can be further improved by integrating an online training strategy, as described in the next section.
3.2. Online Training
To further boost the video segmentation quality, we borrow and extend ideas that were originally proposed for object tracking. Current top performing tracking techniques [12, 32] use some form of online training. We thus consider improving results by adding online fine-tuning as a second strategy.

The idea is to use, at test time, the segment annotation of the first video frame as additional training data. Using augmented versions of this single frame annotation, we proceed to fine-tune the model to become more specialized for the specific object instance at hand. We use a similar data augmentation as for offline training. On top of affine and non-rigid deformations for the input mask, we also add image flipping and rotations. We generate 10^3 training samples from this single annotation, and proceed to fine-tune the model previously trained offline.
With online fine-tuning, the network weights partially capture the appearance of the specific object being tracked. The model aims to strike a balance between general instance segmentation (so as to generalize to the object changes) and specific instance segmentation (so as to leverage the common appearance across video frames). The details of the online fine-tuning are provided in § 4. In our experiments we only perform fine-tuning using the annotated frame(s).

Figure 2: Examples of optical flow magnitude images. Top: RGB images. Bottom: corresponding motion magnitude estimates encoded as gray-scale images.
To the best of our knowledge, our approach is the first to use a pixel labelling network (like DeepLabv2 [8]) for the task of video object segmentation. We name our full approach, using both offline and online training, MaskTrack.
3.3. Variants
Additionally, we consider variations of the proposed model. First, we demonstrate that our approach is flexible and can handle different types of input annotations, using less supervision in the first frame annotation. Second, we describe how motion information can be easily integrated into the system, improving the quality of the object segments.
Box annotation. In this paragraph, we discuss a variant named MaskTrack-Box, which takes a bounding box annotation in the first frame as input supervision instead of a segmentation mask. To this end, we train a similar convnet that, when fed with a bounding-box annotation as input, outputs a segment. Once the first frame bounding box is converted to a segment, we switch back to the MaskTrack model that uses as guidance the output mask from the previous frame.
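A minimal sketch of this hand-over (ours; `box_net` is a hypothetical wrapper around the box-trained convnet, and the filled rectangle simply plays the role of the rough guidance channel):

```python
import numpy as np

def first_frame_segment_from_box(frame, box, box_net):
    """Convert a first-frame box annotation into an initial segment (sketch).

    box: (x0, y0, x1, y1) in pixel coordinates; box_net: a callable
    (frame, rough_mask) -> segment, assumed to be the convnet trained with
    box-shaped guidance as described above. The returned segment is then used
    to start the regular MaskTrack propagation.
    """
    h, w = frame.shape[:2]
    x0, y0, x1, y1 = (int(v) for v in box)
    rough = np.zeros((h, w), dtype=np.uint8)
    rough[y0:y1, x0:x1] = 1          # rasterize the box as a rough mask
    return box_net(frame, rough)
```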
Optical flow. On top of MaskTrack, we consider employing optical flow as a source of additional information to guide the segmentation. Given a video sequence, we compute the optical flow using EpicFlow [41] with Flow Fields matches [2] and convolutional boundaries [30]. In parallel to the vanilla MaskTrack, we proceed to compute a second output mask using the magnitude of the optical flow as input image (replicated into a three channel image). The model is used as-is, without retraining. Although it has been trained on RGB images, this strategy works because the object flow magnitude roughly looks like a gray-scale object and still captures useful object shape information, see examples in Figure 2. Using the RGB model allows us to avoid training the convnet on video datasets annotated with masks. We then fuse by averaging the output scores given by the two parallel networks, respectively fed with RGB images and optical flow magnitude as input. As shown in Table 1, optical flow provides complementary information to MaskTrack with RGB images, improving the overall performance.
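The fusion step itself is a plain average of the two score maps; a sketch (ours, assuming `net_scores` wraps the RGB-trained model and returns a per-pixel foreground probability map):

```python
import numpy as np

def fuse_rgb_and_flow(frame, flow_magnitude, rough_mask, net_scores, threshold=0.5):
    """Average the outputs of the RGB and flow-magnitude branches (sketch).

    net_scores: callable (image, rough_mask) -> HxW foreground probabilities.
    The same RGB-trained model is applied twice: once to the RGB frame and once
    to the flow magnitude replicated into three channels.
    """
    flow_img = np.repeat(flow_magnitude[..., None], 3, axis=2)  # gray -> 3 channels
    scores_rgb = net_scores(frame, rough_mask)
    scores_flow = net_scores(flow_img, rough_mask)
    fused = 0.5 * (scores_rgb + scores_flow)
    return (fused > threshold).astype(np.uint8)  # final binary mask
```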
4. Network Implementation and Training
In the following, we describe the implementation details of our approach. Specifically, we provide additional information regarding the network initialization, the offline and online training strategies, and the data augmentation.
Network. For all our experiments we use the training and test parameters of the DeepLabv2-VGG network [8]. The model is initialized from a VGG16 network pre-trained on ImageNet [46]. For the extra mask channel of the filters in the first convolutional layer we use Gaussian initialization. We also tried zero initialization, but observed no difference.
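A sketch of this initialization (ours, in PyTorch; the standard deviation of the Gaussian is an assumption, since the paper only states that the extra channel is Gaussian-initialized):

```python
import torch
import torch.nn as nn

def expand_first_conv_to_rgb_plus_mask(conv: nn.Conv2d, std: float = 0.01) -> nn.Conv2d:
    """Widen a pretrained 3-channel first convolution to 4 input channels (sketch).

    The pretrained RGB filter weights are copied; the extra mask channel is
    drawn from a zero-mean Gaussian.
    """
    new_conv = nn.Conv2d(4, conv.out_channels, conv.kernel_size,
                         stride=conv.stride, padding=conv.padding,
                         bias=conv.bias is not None)
    with torch.no_grad():
        new_conv.weight[:, :3] = conv.weight                # keep RGB filters
        new_conv.weight[:, 3:].normal_(mean=0.0, std=std)   # Gaussian init, mask channel
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias)
    return new_conv
```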
Offline training. The advantage of our method is that it does not require expensive pixel-wise video annotations for training. Thus we can employ existing image datasets. However, in order for our model to generalize well across different videos, we avoid training on datasets that are biased towards certain semantic classes, such as COCO [28] or Pascal [14]. Instead we combine images and annotations from several saliency segmentation datasets (ECSSD [45], MSRA10K [9], SOD [31], and PASCAL-S [27]), resulting in an aggregated set of 11,282 training images.
The input masks for the extra channel are generated by deforming the binary segmentation masks via affine transformation and non-rigid deformations, as discussed in § 3.1. For the affine transformation we consider random scaling (±5% of object size) and translation (±10% shift). Non-rigid deformations are done via thin-plate splines [4], using 5 control points and randomly shifting the points in x and y directions within a ±10% margin of the original segmentation mask width and height. Next, the mask is coarsened using a dilation operation with a 5 pixel radius. This mask deformation procedure is applied over all object instances in the training set. For each image two different masks are generated. We refer the reader to the supplementary material for visual examples of deformed masks.
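For illustration, a simplified version of this mask deformation (ours; only the affine part and the dilation are implemented, and the thin-plate-spline non-rigid deformation is omitted for brevity):

```python
import cv2
import numpy as np

def deform_and_coarsen_mask(mask, rng=np.random, dilation_radius=5):
    """Generate a rough training mask from a clean binary mask (simplified sketch).

    Random scaling (+/-5%) and translation (+/-10% of the object size) follow the
    parameters above; the thin-plate-spline deformation is left out here.
    """
    h, w = mask.shape
    ys, xs = np.nonzero(mask)
    obj_h, obj_w = ys.ptp() + 1, xs.ptp() + 1            # object bounding-box size
    scale = 1.0 + rng.uniform(-0.05, 0.05)
    tx = rng.uniform(-0.10, 0.10) * obj_w
    ty = rng.uniform(-0.10, 0.10) * obj_h
    cx, cy = xs.mean(), ys.mean()                        # scale around object center
    M = np.float32([[scale, 0.0, (1.0 - scale) * cx + tx],
                    [0.0, scale, (1.0 - scale) * cy + ty]])
    warped = cv2.warpAffine(mask.astype(np.uint8), M, (w, h), flags=cv2.INTER_NEAREST)
    size = 2 * dilation_radius + 1
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (size, size))
    return cv2.dilate(warped, kernel)                    # coarsen: drop contour detail
```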
The convnet training parameters are identical to those proposed in [8]. Therefore we use stochastic gradient descent (SGD) with mini-batches of 10 images and a polynomial learning policy with an initial learning rate of 0.001. The momentum and weight decay are set to 0.9 and 0.0005, respectively. The network is trained for 20k iterations.
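The "poly" policy referenced here decays the learning rate polynomially over the 20k iterations; a sketch (ours; the power of 0.9 is the usual DeepLab default and is an assumption, as the paper does not state it):

```python
def poly_learning_rate(iteration, base_lr=0.001, max_iter=20000, power=0.9):
    """Polynomial ("poly") learning-rate policy, as used in DeepLab-style training."""
    return base_lr * (1.0 - iteration / float(max_iter)) ** power

# Example: learning rate at the start, midway, and near the end of offline training.
for it in (0, 10000, 19999):
    print(it, poly_learning_rate(it))
```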
Online training. For online adaptation we fine-tune the model previously trained offline on the first frame for 200 iterations, with training samples generated from the first frame annotation. We augment the first frame by image flipping and rotations, as well as by deforming the annotated masks for the extra channel via affine and non-rigid deformations with the same parameters as for the offline training. This results in an augmented set of 10^3 training images. The network is trained with the same learning parameters as for offline training, fine-tuning all convolutional layers.
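A compact sketch of this online adaptation step (ours, in PyTorch; the binary cross-entropy loss is an assumption standing in for the DeepLab training loss, and `first_frame_batches` is a hypothetical iterable over the roughly 10^3 augmented first-frame samples):

```python
import itertools
import torch

def online_finetune(model, first_frame_batches, iters=200, base_lr=0.001):
    """Fine-tune the offline-trained model on augmented first-frame samples (sketch)."""
    opt = torch.optim.SGD(model.parameters(), lr=base_lr,
                          momentum=0.9, weight_decay=0.0005)
    criterion = torch.nn.BCEWithLogitsLoss()
    model.train()
    batches = itertools.cycle(first_frame_batches)       # reuse samples as needed
    for _ in range(iters):
        inputs, target = next(batches)                   # (B,4,H,W), (B,1,H,W)
        opt.zero_grad()
        loss = criterion(model(inputs), target)
        loss.backward()
        opt.step()
    return model
```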
Figure 3: By propagating annotation from the 1st frame, either from segment or just bounding box annotations, our system generates results comparable to ground truth. (Panels: the 1st frame image with box and segment annotations; for the 13th frame, the ground truth, the MaskTrack-Box result, and the MaskTrack result.)
At test time our base MaskTrack system runs at about 12 seconds per frame (averaged over DAVIS, amortizing the online fine-tuning time over all video frames), which is an order of magnitude faster than ObjFlow [49] (2 minutes per frame, averaged over DAVIS).
5. Results
In this section we describe our evaluation protocol (§ 5.1), study the importance of the different components of our system (§ 5.2), and report results comparing to state-of-the-art techniques over three datasets (§ 5.3), as well as comparing the effects of different amounts of annotation on the resulting segmentation quality (§ 5.4). Additional results are provided in the supplementary material.
5.1. Experimental setup
Datasets. We evaluate the proposed approach on three different video object segmentation datasets: DAVIS [34], YoutubeObjects [39], and SegTrack-v2 [26]. These datasets include assorted challenges such as appearance change, occlusion, motion blur and shape deformation.
DAVIS [34] consists of 50 high quality videos, totaling 3,455 frames. Pixel-level segmentation annotations are provided for each frame, where one single object or two connected objects are separated from the background.
YoutubeObjects [39] includes videos with 10 object categories. We consider the subset of 126 videos with more than 20,000 frames, for which the pixel-level ground truth segmentation masks are provided by [21].
SegTrack-v2 [26] contains 14 video sequences with 24 objects and 947 frames. Every frame is annotated with a pixel-level object mask. As instance-level annotations are provided for sequences with multiple objects, each specific instance segmentation is treated as a separate problem.
Evaluation. We evaluate using the standard mIoU metric: the intersection-over-union of the estimated segmentation and the ground truth binary mask, also known as the Jaccard index, averaged across videos. For DAVIS we use the provided benchmark code [34], which excludes the first and the last frames from the evaluation. For YoutubeObjects and SegTrack-v2 only the first frame is excluded.
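For reference, a minimal implementation of this per-video measure (ours; frame skipping follows the protocol above):

```python
import numpy as np

def jaccard(pred, gt):
    """Intersection-over-union (Jaccard index) of two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / float(union) if union else 1.0

def video_miou(pred_masks, gt_masks, skip_first=True, skip_last=False):
    """Mean IoU over one video, skipping frames as in the evaluation protocol
    (first and last frames for DAVIS, only the first frame otherwise)."""
    frames = list(zip(pred_masks, gt_masks))
    frames = frames[1:] if skip_first else frames
    frames = frames[:-1] if skip_last else frames
    return float(np.mean([jaccard(p, g) for p, g in frames]))
```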
Previous works used different evaluation procedures. To ensure a consistent comparison between methods, when needed, we re-computed scores from the publicly available output masks, or reproduced the results using the available open source code. In particular, we collected new results for ObjFlow [49] and BVS [29] in order to present these methods with results across all three datasets.
5.2. Ablation study
We first study different ingredients of our method. We experiment on the DAVIS dataset and measure the performance using the mean intersection-over-union metric (mIoU). Table 1 shows the importance of each of the ingredients described in § 3 and reports the improvement of adding extra components to the MaskTrack model.
Add-ons. We first study the effect of adding a couple of ingredients on top of our base MaskTrack system, which are specifically fine-tuned for DAVIS. We see that optical flow provides complementary information to the appearance, further boosting the results (74.8 → 78.4). Adding on top a well-tuned post-processing CRF [23] can gain a couple of mIoU points, reaching 80.3% mIoU on DAVIS, the best known result on this dataset.
Although optical flow can provide interesting gains, we found it to be brittle when going across different datasets. Different strategies to handle optical flow provide 1-4% on each dataset, but none provides consistent gains across all datasets, mainly due to failure modes of the optical flow algorithms. For the sake of presenting a single model with fixed parameters across all datasets, we refrain from using a per-dataset tuned optical flow in the results of § 5.3.
Training. We next study the effect of offline/online training of the network. By disabling online fine-tuning and relying only on offline training, we see a drop of 5 IoU percentage points, showing that online fine-tuning indeed expands the tracking capabilities. If instead we skip offline training and rely only on online fine-tuning, performance drops drastically, although the absolute quality (57.6 mIoU) is surprisingly high for a system trained on ImageNet plus a single frame.
By reducing the amount of training data from 11k to 5k images we only see a minor decrease in mIoU; this indicates that even with a small amount of training data we can achieve reasonable performance. That being said, a further increase of the training data volume would lead to improved results.
Additionally, we explore the effect of offline training on video data instead of static images. We train the model on the annotated frames of two combined datasets, SegTrack-v2 and YoutubeObjects. By switching to training on video data we observe a minor decrease in mIoU; this could be explained by the lack of diversity in the video training data.
Citations
Fast Online Object Tracking and Segmentation: A Unifying Approach (Proceedings Article)
TL;DR: This method improves the offline training procedure of popular fully-convolutional Siamese approaches for object tracking by augmenting their loss with a binary segmentation task, and operates online, producing class-agnostic object segmentation masks and rotated bounding boxes at 55 frames per second.

One-Shot Video Object Segmentation (Proceedings Article)
TL;DR: One-shot video object segmentation (OSVOS) is based on a fully-convolutional neural network architecture that successively transfers generic semantic information, learned on ImageNet, to the task of foreground segmentation, and finally learns the appearance of a single annotated object of the test sequence.

Video Salient Object Detection via Fully Convolutional Networks (Journal Article)
TL;DR: Proposes a deep video saliency network consisting of two modules, for capturing the spatial and temporal saliency information, respectively, which can directly produce spatio-temporal saliency inference without time-consuming optical flow computation.

Advances in Computer Vision-Based Civil Infrastructure Inspection and Monitoring (Journal Article)
TL;DR: An overview of recent advances in computer vision techniques as they apply to the problem of civil infrastructure condition assessment, together with some of the key challenges that persist toward the goal of automated vision-based civil infrastructure inspection and monitoring.

SegFlow: Joint Learning for Video Object Segmentation and Optical Flow (Proceedings Article)
TL;DR: SegFlow has two branches in which useful information of object segmentation and optical flow is propagated bidirectionally in a unified framework; the framework is trained iteratively offline to learn a generic notion and fine-tuned online for specific objects.
References
ImageNet Classification with Deep Convolutional Neural Networks (Proceedings Article)
TL;DR: A deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax, achieves state-of-the-art performance on ImageNet classification.

Very Deep Convolutional Networks for Large-Scale Image Recognition (Proceedings Article)
TL;DR: Investigates the effect of convolutional network depth on accuracy in the large-scale image recognition setting and shows that a significant improvement over prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.

ImageNet Large Scale Visual Recognition Challenge (Journal Article)
TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object category classification and detection on hundreds of object categories and millions of images, run annually from 2010 to present and attracting participation from more than fifty institutions.

Microsoft COCO: Common Objects in Context (Book Chapter)
TL;DR: A new dataset aimed at advancing the state of the art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding, gathering images of complex everyday scenes containing common objects in their natural context.

The Pascal Visual Object Classes Challenge: A Retrospective (Journal Article)
TL;DR: A review of the Pascal Visual Object Classes challenge from 2008-2012, with an appraisal of the aspects of the challenge that worked well and those that could be improved in future challenges.