
Machine Learning with partially labeled Data for Indoor Outdoor Detection

TL;DR: The feasibility of a hybrid/semi-supervised classification method for detecting the environment of an active mobile phone, based on both labeled and unlabeled cellular radio data, is demonstrated.
Abstract: This paper demonstrates the feasibility of a hybrid/semi-supervised classification method for detecting the environment of an active mobile phone, based on both labeled and unlabeled cellular radio data. Precisely, we provide answers to the following question: what is the environment of the mobile user when it is/was experiencing a mobile service/application: indoor or outdoor? Implementing this method within the mobile network is interesting for mobile operators since it has low complexity, is less intrusive for humans (minimal intervention of mobile users) and is more accurate. The semi-supervised classification algorithm learns to identify the environment using a large set of real 3GPP signal measurements. Compared to existing work, in addition to the existing parameters used for classification, we propose to also use a radio metric called Timing Advance, which is computed within the mobile network. We empirically validate the innovative semi-supervised algorithm using new real-time radio measurements, with partial ground truth information, gathered daily, weekly, monthly, from indoor and outdoor locations and from multiple typical and diversified environments crossed by mobile users. The study confirms the effectiveness of the proposed scheme compared to the existing supervised classification methods including SVM and Deep Learning.

Summary (2 min read)

Introduction

  • The authors also propose to extend it to mobile networks to deal with the challenge of detecting the environmental context of mobile users from network side.
  • The data measured by multiple UEs during their connection is sent to eNB, using standardized procedures.
  • The authors are interested in Machine Learning (ML), one of the popular techniques, for automatic IOD.
  • Among ML families, the authors consider supervised learning and more particularly semi-supervised learning which can be seen as a mix of supervised and unsupervised approaches.

III. COLLECTED DATA FOR IOD

  • The authors analyze the statistical differences between indoor and outdoor environments by focusing on the empirical cumulative distribution function (CDF), using a large and real dataset collected at multiple places and in many environments.
  • The authors illustrate the impact of the two environments on the empirical CDFs, according to where the data is collected.

A. Data Description

  • The authors' large data set consists of Time, 3 LUMD radio signals, the metric Timing Advance (TA) and the label when it is known.
  • The signals were collected over 9 months, 24/7 (from October 2017 until June 2018), with an average of 1 measurement per 15 seconds while the mobile phone session is active and 1 measurement per 2 minutes otherwise.
  • The dataset is made of 40% of labeled data and 60% of unlabelled data.

B. Data collection: crowdsourcing vs. drive-test mode

  • In crowdsourcing mode, the collected data consists of signals measured by the mobile phone and sent to the eNB.
  • The significant offset between the indoor and the outdoor curves results from substantial differences in radio signal propagation and attenuation.
  • Also, the extreme values in the tails of the indoor and outdoor CDFs become similar, and the division between the two gets blurred.
  • To model this way of collecting data, referred to as drive-test mode, the authors extract a portion of the data (EPD) from the whole dataset.

IV. CLASSIFICATION USING SUPERVISED LEARNING OR CLUSTERING

  • After analyzing the statistical properties of I/O environments, the authors first evaluate the accuracy and the performance of supervised classifiers for IOD.
  • For this, the authors use the accuracy metric, which is the ratio of correctly classified instances to the total number of instances, and the F1 score, which is by definition the harmonic mean of Precision and Recall: F1 score = 2 · Precision · Recall / (Precision + Recall).
  • Additionally, Tables I, and III show that (RSRP, CQI) as input provides similar results as (RSRP, RSRQ) when used for classifying EPD or crowdsourcing data.
  • The results show that the information contained in CQI is also useful for IOD and thus, (RSRP, CQI) is also a good candidate for IOD.
  • Learning the user environment, only based on drive-test data, is thus not enough to learn the complexities of users’ real life.

VI. RESULTS AND DISCUSSION

  • This section evaluates the performance of HSSL on the crowdsourced data.
  • Actually, ReLU is the most widely used activation function while designing neural networks today.
  • The system receives both labeled and unlabeled data as inputs.
  • For this, the authors aim to compare HSSL (including SVM or DL) with SVM and DL alone, when trained over the same amount of tagged data (the only difference being that HSSL also uses untagged data).
  • The authors observe that IOD performs better using DL than using SVM in both cases.

VII. CONCLUSION

  • The authors investigated the problem of IOD performed at network side using 3GPP signals and Timing Advance data collected inside the infrastructure.
  • The authors first showed that using a drive test dataset is insufficient to mimic the real world complexity and reveal the real user behavior.
  • By diversifying the environments more (using a highly representative crowdsourced dataset) during the training phase, the authors showed that the more environments they have for the training phase, the better the supervised classifier performs.
  • The authors also showed that adding a new parameter, Timing Advance, can improve IOD performance.
  • The HSSL system presents satisfactory performance even when facing unknown environments.


HAL Id: hal-02011454
https://hal.archives-ouvertes.fr/hal-02011454
Submitted on 12 Feb 2019
Machine Learning with partially labeled Data for Indoor
Outdoor Detection
Illyyne Saar, Marie-Line Alberi-Morel, Kamal Deep Singh, César Viho
To cite this version:
Illyyne Saar, Marie-Line Alberi-Morel, Kamal Deep Singh, César Viho. Machine Learning with par-
tially labeled Data for Indoor Outdoor Detection. CCNC 2019 - 16th IEEE Consumer Communications
& Networking Conference, Jan 2019, Las Vegas, United States. pp.1-7, �10.1109/CCNC.2019.8651736�.
�hal-02011454�

Machine Learning with partially labeled Data for
Indoor Outdoor Detection
Illyyne Saffar
Service Automation,
Nokia Bell Labs
Nozay, France
illyyne.saffar@nokia.com
Marie Line Alberi Morel
Service Automation,
Nokia Bell Labs
Nozay, France
marie line.alberi-morel@nokia.com
Kamal Deep Singh
Laboratoire Hubert Curien,
University of Saint-Etienne,
Saint-Etienne, France
kamal.singh@univ-st-etienne.fr
Cesar Viho
IRISA - INRIA,
University of Rennes 1,
Rennes, France
Cesar.Viho@irisa.fr
Abstract—This paper demonstrates the feasibility of a
hybrid/semi-supervised classification method for detecting the
environment of an active mobile phone, based on both labeled and
unlabeled cellular radio data. Precisely, we provide answers to
the following question: what is the environment of the mobile user
when it is/was experiencing a mobile service/application: indoor
or outdoor? Implementing this method within the mobile network
is interesting for mobile operators since it has low complexity,
is less intrusive for humans (minimal intervention of mobile users)
and is more accurate. The semi-supervised classification algorithm
learns to identify the environment using a large set of real
3GPP signal measurements. Compared to existing work,
in addition to the existing parameters used for classification, we
propose to also use a radio metric called Timing Advance, which is
computed within the mobile network. We empirically validate the
innovative semi-supervised algorithm using new real-time radio
measurements, with partial ground truth information, gathered
daily, weekly, monthly, from indoor and outdoor locations and
from multiple typical and diversified environments crossed by
mobile users. The study confirms the effectiveness of the
proposed scheme compared to the existing supervised classification
methods including SVM and Deep Learning.
Index Terms—Environment classification, Machine Learning,
Indoor Outdoor Detection, 3GPP radio measurement, crowd-
sourcing, real user activity.
I. INTRODUCTION
Recent technological breakthroughs have extended the features,
functions and capabilities of mobile phones, which are
now used for more than just communicating or hosting
applications. Mobile devices are now also being used to learn
the consumption habits of individuals and communities [1], [2],
[3]. Our purpose is to inject this learned cognition into
mobile 5G networks to help them grow smarter and be more
efficient when faced with the increasing complexity of network
management combined with numerous new applications and
their heterogeneous needs.
As a first step to bring such additional knowledge to the net-
work, we target Indoor/Outdoor Detection (IOD) in this paper.
IOD refers to the estimation of the mobile users’ environments,
that is to infer whether the user is Indoor or Outdoor. IOD is
a cornerstone of the user behavior contextualization, which
in turn can be used for learning the user behavior, adapting
mobile network resources, etc [4], [5]. The idea is to have
more information on the user like knowing his environment
type or his location.
IOD can be performed automatically and in real-time using
machine learning techniques, which in turn need data for
learning. Thus, data collection is the first phase of designing
an IOD solution based on machine learning. Recently, a
crowd-sourcing approach [6], [7] has become popular for
collecting and analyzing real and large network measurement
datasets coming from mobile phones or any other connected
devices. This method exploits smartphones (with built-in cellu-
lar network interface) with their various measurement sensors.
Additionally, data obtained from smartphones has the natural
mobility vector of people carrying them. This ensures cost-
effective, continual and fine-grained spatio-temporal
monitoring and analysis of mobile networks. For our work, we
propose to investigate this concept of large and real
crowdsourced measurements for IOD. We also propose to extend it
to mobile networks to deal with the challenge of detecting the
environmental context of mobile users from the network side. The
idea is to collect data which is measured or derived within
the network, and then use it as input for the machine
learning based classifier, for training and then for
detection. The data measured by multiple UEs during their
connection is sent to the eNB using standardized procedures.
Such solutions are interesting for mobile network operators
that wish to exploit cognition of user behavior to
optimize/customize their service delivery with minimal
intervention of the users. Furthermore, such measurements, as an
alternative to coverage modeling or drive tests [6], capture
reality well and reveal the real life of a mobile user, while
being less expensive. This method can then be
implemented by the operators in their networks as a generic
solution, independent of the implementations of particular
manufacturers. Consequently, it allows the mobile network to
exploit direct measurements at the user side to deduce contextual
factors such as the user environment.
In 4G/5G cellular networks, such solutions are technically
feasible since an enormous amount of mobile measurement data
is collected by the mobile terminal. This data is regularly
sent to the network using standardized protocols and interfaces
during each UE's connection to the cell (on a per-procedure
basis and on a network-defined event basis). This measurement
data is referred to as LTE UE Measurement Data (LUMD) [8].
LUMD contains rich information on mobile performance and
RF metrics such as signal strength (Reference Signal Received
Power or RSRP) and signal quality with respect to interference
and noise (Reference Signal Received Quality or RSRQ). It also
includes the Channel Quality Indicator (CQI), which is a function
of SINR.
In this work, we aim to achieve the following objectives:
  (1) infer the user environmental context from certain LUMD metrics collected in crowdsourcing mode and from the radio metric Timing Advance, assessed by the network when the user is connected to a session. The environment considered is divided into two main types:
    • Indoor: at home, in a restaurant, in a cafe, at work or in other building types, etc.
    • Outdoor: pedestrian, running, or in a car moving at high speed.
  (2) consider the constraint that the inference shall be done at the network side with minimal human interaction or intervention.
To achieve (1) and (2), we design a method for training
an automatic IOD classifier based on a weakly or partially labeled
crowdsourced dataset. Such a dataset reduces human
intervention to the lowest possible level. Indeed, the labeled data
used for machine learning training is tagged either manually or
automatically. Manual data tagging can be expensive, complex
and even infeasible for mobile operators if they have to tag
all the collected crowdsourced data.
In this paper, we are interested in Machine Learning (ML),
one of the popular techniques, for automatic IOD. Among ML
families, we consider supervised learning and more particu-
larly semi-supervised learning which can be seen as a mix of
supervised and unsupervised approaches. Supervised learning
is more adapted for classification tasks. It uses labeled data to
learn the mapping between data and the labels. Unsupervised
learning looks for patterns and structures within the data for
tasks such as clustering. Semi-supervised learning, which
is a hybrid approach, is becoming popular with the growing
abundance of data. It proposes a learning scheme
based on a partially or weakly labeled dataset in order to achieve
a classification task or a function approximation task. In our
case, semi-supervised learning allows the mobile operator
to use labeled data from a few users and combine it with
a lot of unlabeled and easily available data collected from
several users. This combination makes it possible to learn all
possible environment types related to the user behavior.
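To make the idea of combining a few labels with many unlabeled samples concrete, the following is a minimal self-training (pseudo-labeling) sketch. It is not the semi-supervised scheme proposed in this paper: the nearest-centroid classifier, the `min_margin` confidence rule, the function names and the (RSRP, CQI) sample values are all assumptions chosen for illustration.

```python
from math import dist


def centroids(points, labels):
    """Mean feature vector of each class."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(v / counts[y] for v in acc) for y, acc in sums.items()}


def predict(cents, p):
    """Return (label, margin) for point p; assumes at least two classes.

    The margin is how much closer p is to the nearest centroid than to
    the runner-up, used here as a crude confidence score.
    """
    ranked = sorted((dist(p, c), y) for y, c in cents.items())
    (d0, y0), (d1, _) = ranked[0], ranked[1]
    return y0, d1 - d0


def self_train(labeled, unlabeled, min_margin=5.0, max_rounds=10):
    """Repeatedly pseudo-label confident unlabeled points and refit."""
    points = [p for p, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(points, labels)
        confident, rest = [], []
        for p in pool:
            y, margin = predict(cents, p)
            (confident if margin >= min_margin else rest).append((p, y))
        if not confident:
            break  # nothing left that the model is sure about
        for p, y in confident:
            points.append(p)
            labels.append(y)
        pool = [p for p, _ in rest]
    return centroids(points, labels)


# Hypothetical (RSRP in dBm, CQI) samples; features are left unscaled,
# so RSRP dominates the distance in this toy example.
labeled = [((-110.0, 5.0), "indoor"), ((-85.0, 12.0), "outdoor")]
unlabeled = [(-112.0, 4.0), (-108.0, 6.0), (-83.0, 13.0),
             (-88.0, 11.0), (-98.0, 8.0)]

model = self_train(labeled, unlabeled)
print(predict(model, (-105.0, 6.0))[0])
print(predict(model, (-80.0, 14.0))[0])
```

In this toy run the ambiguous point (-98, 8) never reaches the margin threshold and is left unlabeled, which mirrors the kind of borderline sample that a partially labeled dataset tends to contain.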
The rest of this paper is organized as follows. Section
II describes the main IOD works in the literature. In Section
III, a comparative analysis of crowdsourcing and drive-test
data collection modes is provided. In Section IV, results with
supervised classification and clustering algorithms are given.
Sections V and VI present a new Deep Learning-based
semi-supervised learning approach proposed for IOD from the
network side. Section VI discusses the results.
II. RELATED WORK
In the literature, the IOD issue has not been widely studied:
only a few works address it. Proposed solutions are usually
divided into two categories [9]. IOD is either considered as
a statistical issue, where a weighted score or a threshold is
defined to determine the mobile environment, or as a
classification problem sorting mobile users into multiple classes.
In most of these works, only two classes are considered
(Indoor/Outdoor) but, in some works, three classes are selected
(e.g. Indoor/Semi-Outdoor/Outdoor). Figure 1 illustrates
how the existing classes relate to one another.
Fig. 1. Example of IOD classification scheme: in 3 main classes
In addition to such categorization, IOD problem can also be
distinguished based on the location where IOD is performed,
either at the mobile terminal side or at the mobile network side.
In the following, we highlight some of the works dealing with
the IOD issue, presenting them according to this classification.
In the first category, [10] applies thresholds to signals
collected from several phone sensors (radio signals, cell
signal strength, light intensity and the magnetic sensor) to
infer whether the mobile user is indoor or outdoor. However,
such a threshold is specific to the experimental settings in which
it is calculated and does not generalize to new environments. Thus,
using just a threshold decreases the IOD accuracy. Similar to
[10], the work in [5] uses the same signals, but also
considers sound intensity, battery temperature and the proximity
sensor. For IOD, they propose a semi-supervised approach:
a co-training solution. They use 2 classifiers in parallel with
a weighted score of classification probability to improve the
final performance of IOD. For every classifier, they select
a different set of sensors to learn different perspectives and
patterns. This work shows high performance (more than 90%
of accuracy) in the detection of new instances in unknown
environments. However, the impact of this work is limited
since their database is not highly representative. Indeed, the
used data set was only collected in three places (the campus
area, city center, residential area) which are not enough to
train a general IOD system.
The work in [4] proposes a video streaming optimization
based on adaptation as a function of the user location in
time. For that, IOD is computed via a Bayesian detector
that combines measurements from two smartphone sensors to
decide the user environment type.
In the second category, the authors of [11] optimize the use of
radio measurements in wireless networks. Specifically, they use
radio signal measurements collected in different mobility
situations with varying speed (low, medium, high), namely
pedestrian, in-car and unmoving. They dynamically estimate
the signal attenuation. This in turn helps them to efficiently
classify the mobile user environment (pedestrian, in-car, unmoving)
and finally improves the handover process. The authors assume
that once the signal power attenuation is estimated correctly,
one can easily classify whether the mobile user is a
pedestrian, in a car or unmoving. This is because the measured
signal power for an unmoving user does not vary much,
unlike in the in-car or pedestrian cases. Nevertheless,
this proposal is still at an early stage and has not
been thoroughly developed yet. In [8], the main issue is to
localize the mobile user by estimating its longitude and latitude
as accurately as possible. For this, the authors assumed
that mobile users are outdoor, which highlights
the importance of IOD and the need to classify the
user environment. For the classification task, they used RSRP
and RSRQ signals and tested several algorithms: SVM, logistic
regression and random forest. SVM was retained
since it performed best.
In this paper, we focus on automating IOD on the
network side using machine learning algorithms. They are
trained on a large real dataset while minimizing mobile
user interaction (minimal labels). We look at the performance,
in terms of F1 scores, of supervised and semi-supervised
IOD methods. The goal is to evaluate the minimal amount of
labeled data required for obtaining good IOD performance.
III. COLLECTED DATA FOR IOD
In this section, we analyze the statistical differences between
indoor and outdoor environments by focusing on the empirical
cumulative distribution function (CDF), using a large
and real dataset collected at multiple places and in many
environments. We illustrate the impact of the two environments on
the empirical CDFs, according to where the data is collected.
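For readers who want to reproduce this kind of comparison, an empirical CDF F(x) is simply the fraction of samples less than or equal to x. The sketch below computes it with the standard library only; the helper name `empirical_cdf` and the RSRP values are illustrative assumptions, not data from the paper.

```python
from bisect import bisect_right


def empirical_cdf(samples):
    """Return a function F where F(x) is the fraction of samples <= x."""
    ordered = sorted(samples)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one sample")

    def cdf(x):
        # bisect_right gives the number of sorted values that are <= x
        return bisect_right(ordered, x) / n

    return cdf


# Hypothetical RSRP readings in dBm (illustrative values only)
indoor_rsrp = [-118.0, -112.0, -109.0, -105.0, -101.0]
outdoor_rsrp = [-99.0, -95.0, -90.0, -84.0, -78.0]

F_indoor = empirical_cdf(indoor_rsrp)
F_outdoor = empirical_cdf(outdoor_rsrp)

# An indoor curve that rises earlier than the outdoor one corresponds to
# the horizontal offset discussed in this section.
for x in (-110.0, -100.0, -90.0):
    print(x, F_indoor(x), F_outdoor(x))
```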
A. Data Description
Our large data set consists of Time, 3 LUMD radio signals,
the metric Timing Advance (TA) and the label when it is
known. Thus, it has a vector of 6 features with the label:
  • Time: time of signal record.
  • RSRP: the average received power of the Reference Signal (RS) sent by the eNB, ranging from -140 dBm to -44 dBm [12].
  • RSRQ: the ratio between RSRP and RSSI (Received Signal Strength Indicator), ranging from -19.5 dB to -3 dB [12]. RSSI represents the total power of the received signal (including the transmitted signal, the noise and the interference).
  • CQI: indicator reported by the UE to the eNB that gives the most appropriate modulation and coding scheme to be used for transmission [13].
  • TA: used to control uplink signal transmission timing. It is indicated by the eNB to the UE via a Timing Advance command [14].
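One possible in-memory representation of such a record is sketched below. The class name `Measurement`, the field names and types, and the sample values are hypothetical; only the RSRP and RSRQ ranges come from the description above. The optional label reflects the fact that the dataset is only partially labeled.

```python
from dataclasses import dataclass
from typing import Optional

RSRP_RANGE_DBM = (-140.0, -44.0)  # range quoted in the text [12]
RSRQ_RANGE_DB = (-19.5, -3.0)     # range quoted in the text [12]


@dataclass
class Measurement:
    time: float                  # timestamp of the record (assumed: seconds)
    rsrp_dbm: float
    rsrq_db: float
    cqi: int
    ta: int                      # Timing Advance value
    label: Optional[str] = None  # "indoor", "outdoor", or None if unlabeled

    def in_range(self) -> bool:
        """Check RSRP and RSRQ against the ranges given in the text."""
        lo_p, hi_p = RSRP_RANGE_DBM
        lo_q, hi_q = RSRQ_RANGE_DB
        return lo_p <= self.rsrp_dbm <= hi_p and lo_q <= self.rsrq_db <= hi_q


def labeled_fraction(records):
    """Share of records that carry a ground-truth label."""
    if not records:
        return 0.0
    return sum(r.label is not None for r in records) / len(records)


# Illustrative records only (not taken from the paper's dataset)
records = [
    Measurement(0.0, -105.0, -11.0, 7, 2, "indoor"),
    Measurement(15.0, -88.0, -8.5, 12, 5, "outdoor"),
    Measurement(30.0, -97.0, -10.0, 9, 3),
    Measurement(45.0, -120.0, -15.0, 4, 1),
    Measurement(60.0, -92.0, -9.0, 11, 4),
]
print(labeled_fraction(records))
print(all(r.in_range() for r in records))
```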
These signals were collected over 9 months,
24/7 (from October 2017 until June 2018), with an average
of 1 measurement per 15 seconds while the mobile phone
session is active and 1 measurement per 2 minutes otherwise.
The dataset is made of 40% labeled data and 60%
unlabeled data. The 9-month collection was performed
in many different environments such as mountains, beaches, forests,
companies, cafes, streets, bars, parks, restaurants, lakes, etc.
It was also performed in many kinds of places (countryside,
villages, small cities, metropolises) and in different countries,
but for this paper we only study the data collected in
France (Figure 2). This long collection period gives us
data reflecting all weather types (heavy rain, fog,
sun, snow, wind, rain, ...), i.e. almost all 4 seasons.
With this measurement campaign, we therefore try to
be as close as possible to the complexity and variety experienced
by a mobile user moving in the real world.
Fig. 2. Data collection Points in France: multiple environments and places
B. Data collection: crowdsourcing vs. drive-test mode
In crowdsourcing mode, the collected data consists of
signals measured by the mobile phone and sent to the eNB. Our
dataset described in the previous subsection has been collected
using this mode. Figure 3 shows the empirical cumulative
distribution functions (CDFs) of RSRP and CQI obtained
with the dataset. The significant offset between the indoor
and the outdoor curves results from substantial differences
in radio signal propagation and attenuation. It is
mainly due to the reflection, diffraction, dispersion and attenuation
experienced in indoor environments. However, we note that
there is some overlap between the ranges of RSRP and CQI
values. Also, the extreme values in the tails of the indoor and
outdoor CDFs become similar, and the division
between the two gets blurred. The behaviour at the juncture
of extreme values can be explained by the ambiguous
characteristics of the environment when a user is moving at high
speed (train, car, ...) or is in a semi-indoor environment (such as
a balcony, a semi-open building or near a window). We argue
that these points are ambiguous and will pose a real challenge
for supervised classification, since they can equally well be
classed as indoor or outdoor.

[Figure 3: two panels plotting the empirical CDF F(x) against RSRP (dBm, -140 to -40) and against CQI (0 to 18), each with Indoor and Outdoor curves.]
Fig. 3. Empirical CDF for measured RSRP (left) and CQI (right) in crowdsourcing mode: multiple environments and places - Indoor (red) and Outdoor (green).
[Figure 4: two panels plotting the empirical CDF F(x) against RSRP (dBm, -140 to -40) and against CQI (0 to 18), each with Indoor and Outdoor curves.]
Fig. 4. Empirical CDF for measured RSRP (left) and CQI (right) in drive-test type mode: specific environments and places - Indoor (red) and Outdoor (green).
An alternative data collection mode, widely used in practice,
is the drive-test mode. However, data collected in this mode
captures reality only to a limited extent:
such campaigns are run for limited hours per
day, during a short period (a couple of weeks) and at some specific
places. To model this way of collecting data, referred to as
drive-test mode, we extract a portion of the data (EPD) from the
whole dataset. With this EPD, we aimed to be as close
as possible to the type of places where the drive test was
performed by one of the top 3 American operators in New
York City in [8]. Therefore, to build EPD we consider only data
from the metropolis (Paris and southern suburbs, see Figure 5).
Indeed, Paris, as a metropolis, has a dense and specific
architecture which allows better comparison with NYC. Concerning
indoor data, we selected instances where the user was strictly
indoor and, thus, not in “semi-indoor” positions like
semi-open buildings or balconies. For outdoor data, we chose
the instances where the user was either a pedestrian or in a vehicle
in different city streets (limited speed). Thus, to mimic
drive tests we ignored data coming from environments
like subways, countryside, forests, beaches, mountains, etc.
We did this to enable a fair comparison between the two
modes. Figure 4 shows well-separated RSRP empirical CDFs
for the indoor and outdoor classes. The overlapping
points of the two CDFs, which we judged ambiguous, have
disappeared, owing to the significant
distance between the indoor and the outdoor curves. We notice
a similar phenomenon for the CQI CDFs. This analysis
allows us to argue that supervised classification will perform better
on a labeled dataset collected in drive-test mode than on one
obtained through crowdsourcing mode.
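The selection rules described above can be read as a simple filter over the crowdsourced records. The sketch below is one hedged interpretation: the `Sample` class, the `region` and `place` tags and their values are hypothetical, since the actual schema and selection code are not given here.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    region: str  # hypothetical tag, e.g. "paris", "countryside"
    place: str   # hypothetical tag, e.g. "office", "balcony", "subway"
    label: str   # "indoor" or "outdoor"


# Assumed tag vocabularies, chosen to mirror the rules in the text
METROPOLIS = {"paris", "southern_suburbs"}
SEMI_INDOOR = {"balcony", "semi_open_building"}
CITY_STREET = {"street_pedestrian", "street_vehicle"}


def in_epd(s: Sample) -> bool:
    """Keep only metropolis samples that are strictly indoor,
    or outdoor in city streets (pedestrian or in a vehicle)."""
    if s.region not in METROPOLIS:
        return False
    if s.label == "indoor":
        return s.place not in SEMI_INDOOR
    return s.place in CITY_STREET


samples = [
    Sample("paris", "office", "indoor"),
    Sample("paris", "balcony", "indoor"),             # semi-indoor: dropped
    Sample("paris", "street_pedestrian", "outdoor"),
    Sample("southern_suburbs", "street_vehicle", "outdoor"),
    Sample("paris", "subway", "outdoor"),             # not a city street
    Sample("countryside", "home", "indoor"),          # outside metropolis
]

epd = [s for s in samples if in_epd(s)]
print(len(epd), "of", len(samples), "samples kept")
```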
Fig. 5. The Data collection Points of EPD in drive-test like mode: Paris and
southern suburbs
IV. CLASSIFICATION USING SUPERVISED LEARNING OR
CLUSTERING
After analyzing the statistical properties of I/O
environments, we first evaluate the accuracy and the performance of
supervised classifiers for IOD. For this, we use the accuracy
metric, which is the ratio of correctly classified instances
to the total number of instances, and the F1 score, which
is by definition the harmonic mean of Precision and Recall:
$$F_1\ \text{score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
where precision is the number of correct positive results
divided by the number of all positive results returned by the
classifier, and recall is the number of correct positive results
divided by the number of all relevant samples. The F1 score
is one of the most used metrics in the case of unbalanced
classes. Indeed, the statistics of our data show that the
proportion between indoor and outdoor classes is unbalanced:
65% Indoor vs. 35% Outdoor. This reflects reality, since
people in general spend more time at home or in indoor
environments than in outdoor environments. For the experiments,
we divided the dataset as follows: 70% for training, 30% for
validation and test. We evaluate the impact of both input pairs,
(RSRP, RSRQ), which is the reference input for IOD in the
literature, vs. (RSRP, CQI), in three cases:
  • Training and evaluation on labeled EPD collected in drive-test like mode (see Table I),
  • Training on labeled EPD and evaluation on the rest of the labeled data of crowdsourcing mode, thus operating with unknown environments (see Table II) and,
  • Training and evaluation on labeled data collected in crowdsourcing mode (see Table III).
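As a worked illustration of the two metrics defined earlier in this section, the sketch below computes accuracy, precision, recall and F1 from predicted and true labels. The function names and the 20-sample toy data (13 indoor, 7 outdoor, echoing the 65%/35% imbalance) are assumptions for illustration and are not results from the paper.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def scores(y_true, y_pred, positive):
    """Return (accuracy, precision, recall, f1); 0.0 when undefined."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1


# Toy labels: 13 indoor and 7 outdoor samples
y_true = ["indoor"] * 13 + ["outdoor"] * 7
y_pred = ["indoor"] * 12 + ["outdoor"] * 6 + ["indoor"] * 2

accuracy, precision, recall, f1 = scores(y_true, y_pred, positive="outdoor")
print(accuracy, precision, recall, f1)

# A classifier that always answers "indoor" still reaches 65% accuracy,
# but its F1 score on the outdoor class is 0.
base_acc, _, _, base_f1 = scores(y_true, ["indoor"] * 20, positive="outdoor")
print(base_acc, base_f1)
```

The always-indoor baseline shows why accuracy alone can be misleading on unbalanced classes, which is the motivation given above for also reporting the F1 score.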
As shown in Table I, running either classification (SVM,
Random Forest, Neural Network) or clustering (k-means)
algorithms on EPD, obtained in drive-test like mode, shows
excellent performance with an F1 score of 99%, which is

Citations
Posted Content
21 Dec 2020
TL;DR: A comprehensive survey of ML-based localization solutions that use RF signals is provided in this paper, where the authors provide a concise review of the main ML and wireless propagation concepts, which shall help the researchers in either field navigate through the surveyed solutions.
Abstract: The last few decades have witnessed a growing interest in location-based services. Using localization systems based on Radio Frequency (RF) signals has proven its efficacy for both indoor and outdoor applications. However, challenges remain with respect to both complexity and accuracy of such systems. Machine Learning (ML) is one of the most promising methods for mitigating these problems, as ML (especially deep learning) offers powerful practical data-driven tools that can be integrated into localization systems. In this paper, we provide a comprehensive survey of ML-based localization solutions that use RF signals. The survey spans different aspects, ranging from the system architectures, to the input features, the ML methods, and the datasets. A main point of the paper is the interaction between the domain knowledge arising from the physics of localization systems, and the various ML approaches. Besides the ML methods, the utilized input features play a major role in shaping the localization solution; we present a detailed discussion of the different features and what could influence them, be it the underlying wireless technology or standards or the preprocessing techniques. A detailed discussion is dedicated to the different ML methods that have been applied to localization problems, discussing the underlying problem and the solution structure. Furthermore, we summarize the different ways the datasets were acquired, and then list the publicly available ones. Overall, the survey categorizes and partly summarizes insights from almost 400 papers in this field. This survey is self-contained, as we provide a concise review of the main ML and wireless propagation concepts, which shall help the researchers in either field navigate through the surveyed solutions, and suggested open problems.

12 citations

Journal ArticleDOI
TL;DR: This paper proposes and evaluates three approaches for using quantum machine learning for a specific task in mobile networks: indoor–outdoor detection, indoor-outdoor learning and the potential the approaches have when larger systems become available.
Abstract: Communication networks are managed more and more by using artificial intelligence. Anomaly detection, network monitoring and user behaviour are areas where machine learning offers advantages over more traditional methods. However, computer power is increasingly becoming a limiting factor in machine learning tasks. The rise of quantum computers may be helpful here, especially where machine learning is one of the areas where quantum computers are expected to bring an advantage. This paper proposes and evaluates three approaches for using quantum machine learning for a specific task in mobile networks: indoor–outdoor detection. Where current quantum computers are still limited in scale, we show the potential the approaches have when larger systems become available.

9 citations

Proceedings ArticleDOI
11 Sep 2019
TL;DR: A Deep Learning based model is introduced to intelligently detect the user environment, using supervised and semi-supervised multi-output classification, and relevant multi-class schemes are proposed to efficiently regroup the multiple environment categories in more than two classes.
Abstract: Future mobile networks can hugely benefit from cognition of mobile user behavior. Indeed, knowing what/when/where/how the user consumes their mobile services can notably improve the self-adaptation and self-optimization capabilities of these networks and, in turn, ensure user satisfaction. The cognition of mobile user behavior will thus help 5G networks to face the variable consuming habits of users which in turn impact the network conditions, by predicting them in advance. In this paper, we focus on the "where" part, i.e., the detection of the environment where a given user consumes different mobile applications. A statistical study on the real activity of users reveals that there are multiple various environment types corresponding to the mobile phone usage. A Deep Learning based model is introduced to intelligently detect the user environment, using supervised and semi-supervised multi-output classification. Relevant multi-class schemes are proposed to efficiently regroup the multiple environment categories in more than two classes. We empirically evaluate the effectiveness of the proposed model using new real-time radio data, gathered massively from multiple typical and diversified environments of mobile users.

7 citations


Cites background or methods from "Machine Learning with partially lab..."

  • ...This unbalancing between the categories of various environments, observed in case of two classes in [3], is also augmented when classifying with more than two classes....

    [...]

  • ...In [12], authors used the same signals as [3], but with addition of a mobility indicator to solve some difficult cases of detection like when the user is in train (outdoor environment), and suffers from a drastic deterioration of the RSRP signal, then the Mobility Indicator helps to better detect such case....

    [...]

  • ...Both papers [3] and [12] show good performance of the UED...

    [...]

  • ...collected by the phone device and sent to the mobile network via 3GPP procedures [3]....

    [...]

  • ...In [3], [12] authors used grid search to optimize the hyperparameters of their deep learning model....

    [...]

Proceedings ArticleDOI
30 Sep 2019
TL;DR: This paper proposes a complete indoor-outdoor infrastructure-free positioning prototype including a foot-mounted reference navigation system named Pedestrian Reference System (PERSY) and a Ublox High Sensitivity GNSS (HS-GNSS) receiver (M8P), with a loosely-coupled architecture between the GNSS receiver and PERSY.
Abstract: With the rapid development of navigation techniques during the past decades, the demand for seamless indoor-outdoor navigation is growing in different application fields, especially for the military and first-response emergency services. For military applications, one of the key performance requirements is the availability of positioning solutions for all kinds of dynamics in different environments. Furthermore, due to the stealth requirement in some military actions, it is impossible for military vehicles or personnel to emit signals that could be detected by their opponents. This limitation prevents the use of infrastructure-based cooperative localization techniques. The research work of this paper aims at facing the following challenging issues: firstly, to design a positioning filter which is adaptive to the dynamic changes between walking and driving; secondly, to find an approach that correctly identifies the transition between outdoor and indoor with reduced latency; finally, to construct a loosely coupled GNSS/IMU scheme which takes into account the GNSS signal distortion in indoor and urban spaces. In this context, we propose a complete indoor-outdoor infrastructure-free positioning prototype including a foot-mounted reference navigation system named Pedestrian Reference System (PERSY) and a Ublox High Sensitivity GNSS (HS-GNSS) receiver (M8P). A loosely-coupled architecture between the GNSS receiver and PERSY is employed, using an indicator of horizontal position accuracy, PACCH, provided by the GNSS Ublox M8P receiver. This indicator allows qualifying the position solutions delivered by the GNSS receiver as well as detecting indoor/outdoor transitions, which helps PERSY to update with absolute positions from GNSS. This positioning prototype can take advantage of both GNSS and PERSY to realize seamless indoor-outdoor positioning for pedestrians and vehicles.
The proposed system is evaluated in two scenarios over respectively 2.17 km and 2.68 km, including indoor, outdoor and in-vehicle phases. The median horizontal position errors for the two scenarios are respectively 2.23 m and 1.93 m.

7 citations


Cites background or methods from "Machine Learning with partially lab..."

  • ...On the other hand, the Indoor-Outdoor (IO) can be detected by analyzing signal strengths from external sensors [8], [9], by recognizing certain landmarks with image processing [10], or by looking for specific signal features with the help of machine learning algorithms [11]–[13]....

    [...]

  • ...This can be realized by machine learning methods such as in [8], [11]–[13], [21] as well as by image processing methods to recognize certain landmarks [10]....

    [...]

Proceedings ArticleDOI
07 Jun 2020
TL;DR: This paper focuses on Channel Quality Indicator (CQI) reports that are periodically sent from a UE to the base station, and proposes mechanisms to optimize the reporting process with the aim of reducing signaling overhead and avoiding the associated channel overloads.
Abstract: Channel quality feedback is crucial for the operation of 4G and 5G radio networks, as it allows control of User Equipment (UE) connectivity, transmission scheduling, and the modulation and rate of the data transmitted over the wireless link. However, when such feedback is frequent and the number of UEs in a cell is large, the channel may be overloaded by signaling messages, resulting in lower throughput and data loss. Optimizing this signaling process thus represents a key challenge. In this paper, we focus on Channel Quality Indicator (CQI) reports that are periodically sent from a UE to the base station, and propose mechanisms to optimize the reporting process with the aim of reducing signaling overhead and avoiding the associated channel overloads, particularly when channel conditions are stable. To this end, we apply machine learning mechanisms to predict channel stability, which can be used to decide whether the CQI of a UE needs to be reported, and in turn to control the reporting frequency. We study two machine learning models for this purpose, namely Support Vector Machines (SVM) and Neural Networks (NN). Simulation results show that both provide a high prediction accuracy, with NN consistently outperforming SVM in our settings, especially as CQI reporting frequency reduces.

4 citations


Additional excerpts

  • ...In fact, different types of data representing the channel state may be used, such as SNIR, CQI, and others [17]....

    [...]

References
Journal Article
TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems, focusing on bringing machine learning to non-specialists using a general-purpose high-level language.
Abstract: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net.

47,974 citations


"Machine Learning with partially lab..." refers methods in this paper

  • ...We have used both scikit-learn [20] and keras [21] in python for the HSSL implementation....

    [...]

Posted Content
TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems.
Abstract: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from this http URL.

28,898 citations

Journal ArticleDOI
TL;DR: The goal is to assist the readers in refining the motivation, problem formulation, and methodology of powerful machine learning algorithms in the context of future networks in order to tap into hitherto unexplored applications and services.
Abstract: Next-generation wireless networks are expected to support extremely high data rates and radically new applications, which require a new wireless radio technology paradigm. The challenge is that of assisting the radio in intelligent adaptive learning and decision making, so that the diverse requirements of next-generation wireless networks can be satisfied. Machine learning is one of the most promising artificial intelligence tools, conceived to support smart radio terminals. Future smart 5G mobile terminals are expected to autonomously access the most meritorious spectral bands with the aid of sophisticated spectral efficiency learning and inference, in order to control the transmission power, while relying on energy efficiency learning/inference and simultaneously adjusting the transmission protocols with the aid of quality of service learning/inference. Hence we briefly review the rudimentary concepts of machine learning and propose their employment in the compelling applications of 5G networks, including cognitive radios, massive MIMOs, femto/small cells, heterogeneous networks, smart grid, energy harvesting, device-to-device communications, and so on. Our goal is to assist the readers in refining the motivation, problem formulation, and methodology of powerful machine learning algorithms in the context of future networks in order to tap into hitherto unexplored applications and services.

958 citations

Proceedings ArticleDOI
02 Nov 2011
TL;DR: This paper presents results on app usage at a national level using anonymized network measurements from a tier-1 cellular carrier in the U.S. and identifies traffic from distinct marketplace apps based on HTTP signatures and presents aggregate results on their spatial and temporal prevalence, locality, and correlation.
Abstract: Smartphone users are increasingly shifting to using apps as "gateways" to Internet services rather than traditional web browsers. App marketplaces for iOS, Android, and Windows Phone platforms have made it attractive for developers to deploy apps and easy for users to discover and start using many network-enabled apps quickly. For example, it was recently reported that the iOS AppStore has more than 350K apps and more than 10 billion downloads. Furthermore, the appearance of tablets and mobile devices with other form factors, which also use these marketplaces, has increased the diversity in apps and their user population. Despite the increasing importance of apps as gateways to network services, we have a much sparser understanding of how, where, and when they are used compared to traditional web services, particularly at scale. This paper takes a first step in addressing this knowledge gap by presenting results on app usage at a national level using anonymized network measurements from a tier-1 cellular carrier in the U.S. We identify traffic from distinct marketplace apps based on HTTP signatures and present aggregate results on their spatial and temporal prevalence, locality, and correlation.

440 citations


"Machine Learning with partially lab..." refers background in this paper

  • ...Recently, mobile devices are being utilized to know the consuming habits of individuals and communities [1], [2], [3]....

    [...]

Journal ArticleDOI
Jun Zhang, Xiao Chen, Yang Xiang, Wanlei Zhou, Jie Wu
TL;DR: The proposed RTC scheme has the capability of identifying the traffic of zero-day applications as well as accurately discriminating predefined application classes and is significantly better than four state-of-the-art methods.
Abstract: As a fundamental tool for network management and security, traffic classification has attracted increasing attention in recent years. A significant challenge to the robustness of classification performance comes from zero-day applications previously unknown in traffic classification systems. In this paper, we propose a new scheme of Robust statistical Traffic Classification (RTC) by combining supervised and unsupervised machine learning techniques to meet this challenge. The proposed RTC scheme has the capability of identifying the traffic of zero-day applications as well as accurately discriminating predefined application classes. In addition, we develop a new method for automating the RTC scheme parameters optimization process. The empirical study on real-world traffic data confirms the effectiveness of the proposed scheme. When zero-day applications are present, the classification performance of the new scheme is significantly better than four state-of-the-art methods: random forest, correlation-based classification, semi-supervised clustering, and one-class SVM.

330 citations


"Machine Learning with partially lab..." refers methods in this paper

  • ...As in [5], [15], [16], our approach uses both tagged and untagged data in order to improve the IOD classifier training, while maintaining the same good performances for a given ratio of tagged and untagged data....

    [...]
