HAL Id: hal-03390570
https://hal.archives-ouvertes.fr/hal-03390570
Submitted on 21 Oct 2021
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Low-shot learning of plankton categories
Simon-Martin Schröder, Rainer Kiko, Jean-Olivier Irisson, Reinhard Koch
To cite this version:
Simon-Martin Schröder, Rainer Kiko, Jean-Olivier Irisson, Reinhard Koch. Low-shot learning of plankton categories. Pattern Recognition: GCPR 2018, Lecture Notes in Computer Science, vol. 11269, 2019. ISBN 978-3-030-12939-2. ⟨hal-03390570⟩
Low-Shot learning of plankton categories

Simon-Martin Schröder¹[0000−0002−6603−9907], Rainer Kiko²[0000−0002−7851−9107], Jean-Olivier Irisson³[0000−0003−4920−3880], and Reinhard Koch¹[0000−0003−4398−1569]

¹ Department of Computer Science, Kiel University, Kiel, Germany
{sms,rk}@informatik.uni-kiel.de
² GEOMAR Helmholtz-Centre for Ocean Research, Kiel, Germany
rkiko@geomar.de
³ Sorbonne Université, CNRS, Laboratoire d'Océanographie de Villefranche, LOV, Villefranche-sur-mer, France
irisson@obs-vlfr.fr
Abstract.
The size of current plankton image datasets renders manual
classification virtually infeasible. The training of models for machine
classification is complicated by the fact that a large number of classes
consist of only a few examples. We employ the recently introduced weight
imprinting technique in order to use the available training data to train accurate classifiers in the absence of sufficient examples for some classes. The model architecture used in this work handles the unique challenges of identifying plankton with machine learning, i.e. a limited number of training examples and a severely skewed class-size distribution.
Weight imprinting enables a neural network to recognize small classes
immediately without re-training. This permits the mining of examples
for novel classes.
Fig. 1: Example images from both datasets: (a) UVP5, (b) ZooScan.
1 Introduction
Planktonic organisms – drifters in the ocean – cover a large size range from
nanometer-sized bacteria to meter-sized jellyfish. While some of these organisms, such as the planktonic copepods, can be observed nearly everywhere, others occupy only small niches. Past observations provide an overview of the most abundant groups, but we can expect the number of classes to keep increasing with increasing sampling effort.
Current imaging systems (e.g. UVP5, ZooScan, ISIIS, FlowCytoBot [25,13,6,21]) that target the micro- to macroplankton size range (approx. 10 µm to 10 cm) yield
large amounts of image data every day. The size of the resulting datasets renders
manual classification virtually infeasible. Therefore, accurate machine classification is a critical step in the processing of these data. Usually, the result is
later verified by human experts. Even the annotation of pre-classified data is still
labor-intensive [7,12], which is why maximally accurate models are crucial.
This work is part of a larger undertaking with the aim of continually monitoring newly acquired data for classes that have been overlooked so far. The
observation of new kinds of objects means that the machine classification models
need to be updated to incorporate these novel classes. In addition, plankton
image datasets typically consist of few classes with many examples and many
classes with only a few examples. A major problem is therefore the scarcity of
training data for a large number of classes.
Here we tackle the question of how available labeled data can be used to
train accurate machine classifiers when some class sizes in the training data
set are very small, which is known as low-shot learning. We employ a recently
presented method for low-shot learning called weight imprinting [27] that is able
to incorporate new classes into a model without re-training it from scratch.
The contribution of the present paper is a rigorous evaluation of whether
weight imprinting works satisfactorily for two plankton image datasets. We also
examine the necessity of the architectural choices made in [27].
Our hypothesis is that once we have trained a classifier, we can use it to
find more examples for underrepresented and novel classes within a large set of
unlabeled data. In this current work, we therefore focus on the smaller classes
instead of maximizing overall accuracy.
The remaining part of this paper is structured as follows. In section 2 we
introduce two plankton image datasets. Then we review the related work in
section 3. Section 4 reproduces the most important aspects of the weight imprinting technique. In section 5 we apply weight imprinting to both plankton
datasets. Subsequently, we report and discuss our results in section 6 and draw a
conclusion in section 7.
2 Datasets
We evaluate the approach on two datasets extracted from the plankton image
database EcoTaxa [24]. The objects were sampled on numerous cruises in many
parts of the world’s oceans. The first dataset (UVP5) consists of 588,121 pelagic
underwater images acquired with the UVP5 [25]. The images were sorted by
experts into 65 classes. The dataset is available from the authors upon reasonable
request. The second dataset (ZooScan) [10] consists of 1,433,282 wet net samples
[Fig. 2: bar chart of per-class object counts on a logarithmic #Objects axis (10¹ to 10⁵), ranging from fluffy_dark (116,090 objects) down to temporary_t005 (2 objects).]
Fig. 2: UVP5 dataset: Classes ordered by their size in the training set. The class sizes span five orders of magnitude.
digitized with the ZooScan system [13] and sorted into 93 classes. We use a subset
of 1,146,684 images for training and validation.
Both datasets are severely imbalanced, as shown in Figure 2 for the UVP5 dataset. The 10% most populated classes contain more than 77% of all objects and the class sizes span multiple orders of magnitude. Figure 1 shows some exemplary objects from both datasets.
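This kind of imbalance statistic can be computed directly from the per-class counts. The sketch below uses a small made-up subset of counts rather than the full UVP5 table, so the resulting fraction differs from the 77% reported for the complete dataset; the function name top_decile_share is ours, not from the paper's code.

```python
import math

def top_decile_share(class_counts):
    """Fraction of all objects contained in the 10% largest classes."""
    counts = sorted(class_counts, reverse=True)
    k = max(1, math.ceil(0.10 * len(counts)))  # size of the top decile
    return sum(counts[:k]) / sum(counts)

# a small made-up subset of per-class counts (NOT the full UVP5 table)
counts = [116090, 57199, 45485, 15790, 8470, 2762, 981, 456, 96, 21, 8, 2]
share = top_decile_share(counts)
print(f"top 10% of classes hold {share:.0%} of the objects")
```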
3 Related Work
One-shot and low-shot learning. One-shot and low-shot learning is concerned with training a model with only one or a few training examples for each class. Low-shot learning using neural networks usually incorporates two phases [15].
In the representation learning phase, the learner finds a suitable feature space,
usually guided by a set of base classes with abundant examples. In the low-shot learning phase, a classifier is trained that incorporates both base and low-shot classes. Different approaches emphasize different aspects of the process [3]:
the discriminative approach is concerned with learning powerful features, the
generative approach enlarges the training set by augmentation or generation and
the network structural approach utilizes new types of classifiers.
Weight imprinting [27], label diffusion [9], and metric learning [20] belong to
the third category. They provide low-shot learning without having to retrain the
whole model from scratch.
Classification of plankton images. Classification of plankton images is traditionally performed using shallow models, like Support Vector Machines or Random Forests, trained with handcrafted local features measured on the image (e.g. size, grey level distribution, etc.) [8,1,28,13,11].
Since Kaggle’s National Data Science Bowl competition to sort data from
ISIIS [6], there has been a slow transition towards deep models [26,18,14,22,4].
In the representation learning phase, we rely on the observations of [22]
regarding the classification of plankton images with deep learning models, i.e.
that the initialization with pre-trained weights outperforms random initialization.
4 Weight imprinting
In this section, we outline the most important aspects of weight imprinting as
introduced by [27].
The technique follows the two-phase paradigm of [15]: The set of all classes C is partitioned into base classes C⁰ with enough training data and the smaller low-shot classes C⁺, i.e. C = C⁰ ∪ C⁺.
In the representation learning phase, a convolutional neural network (CNN) is trained to distinguish the base classes with enough training data C⁰. In the low-shot learning phase, the classifier is then updated with calculated weights (see section 4.2 for details) to also distinguish the smaller low-shot classes C⁺. Finally, the whole model can be fine-tuned to further increase its predictive power.
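The core of the low-shot update is brief: the weight for a new class is computed from the (already L2-normalized) embeddings of its few examples and appended to the classifier, without retraining. The following is a minimal numpy sketch under our own naming (l2_normalize, imprint_class); the embeddings are assumed to come from the trained feature extractor.

```python
import numpy as np

def l2_normalize(v, axis=-1, eps=1e-12):
    """Scale vectors along `axis` to unit L2 length."""
    return v / (np.linalg.norm(v, axis=axis, keepdims=True) + eps)

def imprint_class(W_hat, new_class_embeddings):
    """Append a unit-length weight column for a novel class.

    W_hat                -- (d, |C|) matrix of unit-length class templates
    new_class_embeddings -- (n, d) L2-normalized features of the n examples
    """
    w_plus = l2_normalize(new_class_embeddings.mean(axis=0))
    return np.concatenate([W_hat, w_plus[:, None]], axis=1)

# toy usage: d=4 feature space, 3 base classes, imprint a 4th class
# from the embeddings of just 2 examples
rng = np.random.default_rng(0)
W_hat = l2_normalize(rng.normal(size=(4, 3)), axis=0)  # base templates
feats = l2_normalize(rng.normal(size=(2, 4)))          # low-shot features
W_new = imprint_class(W_hat, feats)
print(W_new.shape)  # (4, 4)
```

After this update the extended model can be fine-tuned as described above to further sharpen the imprinted templates.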
4.1 Neural network model
The model consists of two stages: A feature extractor network f : I → ℝᵈ maps an input image x ∈ I to an L2-normalized d-dimensional feature vector ŷ. The second stage is a modified softmax classifier g : ℝᵈ → [0, 1]^|C| that maps the feature activations to a discrete probability distribution over the |C| classes.
gᵢ(ŷ) = exp(s · ŵᵢᵀŷ) / Σ_{j∈C} exp(s · ŵⱼᵀŷ)    (1)
ŵᵢ is the weight vector corresponding to class i and is normalized to unit length as well. The scalar product ŵᵢᵀŷ is the cosine similarity [19] between the feature vector and the weight vector, i.e. the cosine of the angle between them. A weight vector ŵᵢ therefore acts as a template for class i. s is a learnable scale factor that allows the probabilities to match the one-hot encoding of classes [29].
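Eq. (1) can be written out as a short function. The following is a minimal numpy sketch; s is fixed to an arbitrary value here, whereas in the model it is learned:

```python
import numpy as np

def cosine_softmax(y_hat, W_hat, s=10.0):
    """Eq. (1): softmax over scaled cosine similarities.

    y_hat -- (d,) L2-normalized feature vector
    W_hat -- (d, |C|) matrix of unit-length class weight vectors
    s     -- scale factor (learned during training; fixed here)
    """
    logits = s * (W_hat.T @ y_hat)  # scaled cosine similarities
    logits -= logits.max()          # shift for numerical stability
    e = np.exp(logits)
    return e / e.sum()              # probability distribution over |C|

# toy usage: three orthonormal class templates, feature aligned with class 0
W_hat = np.eye(3)
y_hat = np.array([1.0, 0.0, 0.0])
p = cosine_softmax(y_hat, W_hat)
print(p.argmax())  # 0
```

Because both ŷ and the columns of Ŵ are unit length, the logits lie in [-s, s], which is why the learned scale s is needed for the softmax output to approach a one-hot distribution.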
4.2 Low-shot learning
To learn a new class c⁺ ∈ C⁺, the weight matrix is extended by a column w⁺. It follows from the above characterization of weight vectors ŵᵢ and image feature vectors ŷ that they are interchangeable. Therefore, w⁺ can be calculated directly from the feature vectors of the examples of class c⁺. In the simplest case, if only