
Soft-Output Sphere Decoding: Performance and Implementation Aspects

Abstract
Multiple-input multiple-output (MIMO) detection algorithms providing soft information for a subsequent channel decoder pose significant implementation challenges due to their high computational complexity. In this paper, we show how sphere decoding can be used as an efficient tool to implement soft-output MIMO detection with flexible trade-offs between computational complexity and (error rate) performance. In particular, we demonstrate that single tree search, ordered QR decomposition, channel matrix regularization, and log-likelihood ratio clipping are the key ingredients for realizing soft-output MIMO detectors with near max-log performance at a computational complexity that is reasonably close to that of hard-output sphere decoding.


C. Studer, M. Wenk, and A. Burg
Integrated Systems Laboratory, ETH Zurich, Switzerland
email: {studer, mawenk, apburg}@iis.ee.ethz.ch
H. Bölcskei
Communication Technology Laboratory, ETH Zurich, Switzerland
email: boelcskei@nari.ee.ethz.ch
I. INTRODUCTION
Multiple-input multiple-output (MIMO) wireless systems employ multiple antennas on both sides of the wireless link and offer increased spectral efficiency (compared to single-antenna systems) by transmitting multiple data streams concurrently and in the same frequency band (spatial multiplexing). MIMO technology constitutes the basis for upcoming wireless communication standards, such as IEEE 802.11n and IEEE 802.16e.
The main challenge in the practical realization of MIMO wireless systems lies in the efficient implementation of the detector, which needs to separate the spatially multiplexed data streams. To this end, a wide range of algorithms offering various trade-offs between performance and computational complexity have been developed [1]. Linear detection producing hard-decision outputs constitutes one extreme of the complexity/performance trade-off region, while computationally demanding a posteriori probability (APP) detection algorithms constitute the opposite extreme. The computational complexity of a MIMO detection algorithm depends on the symbol constellation size and the number of spatially multiplexed data streams, but often also on the instantaneous MIMO channel realization and the signal-to-noise ratio (SNR). On the other hand, the overall decoding effort is typically constrained by system bandwidth, latency requirements, and limitations on power consumption. Implementing different algorithms, each optimized for a maximum allowed decoding effort and/or a particular system configuration, would entail a considerable hardware overhead and in addition be highly inefficient, since large portions of the chip would remain idle most of the time. A practical MIMO receiver design must therefore be able to cover a wide range of complexity/performance trade-offs using a single tunable detection algorithm.

This work was supported by the STREP project No. IST-026905 (MASCOT) within the sixth framework programme (FP6) of the European Commission.
Contributions: In this (predominantly tutorial) paper, we provide a formulation of the sphere decoder [2], [3] as a tunable MIMO detector with performance ranging from that of successive interference cancellation (SIC) to that of max-log APP detection. Tuning of the detector is achieved through log-likelihood ratio (LLR) clipping, preprocessing, and imposing constraints on the maximum computational complexity of the decoder. We formulate a framework for systematically characterizing the resulting complexity/performance trade-offs. Finally, we elaborate on, and provide some refinements of, the tree-search algorithm introduced in [4] and the LLR clipping approach proposed in [5].
Outline: The remainder of this paper is organized as follows. Section II reviews the transformation of the MIMO detection and LLR computation problems into a tree-search problem. Section III reviews max-log APP sphere decoding and proposes some refinements of existing algorithms. In Section IV, we describe methods for reducing the tree-search complexity. A framework for evaluating the complexity/performance trade-offs of the resulting class of detectors is introduced in Section V. We conclude in Section VI.
II. SOFT-OUTPUT SPHERE DECODING
Consider a MIMO system with $M_T$ transmit and $M_R \geq M_T$ receive antennas. The coded bit-stream is mapped to $M_T$-dimensional transmit vector symbols $\mathbf{s} \in \mathcal{O}^{M_T}$, where $\mathcal{O}$ stands for the underlying complex-valued scalar constellation of cardinality $2^Q$. The individual coded bits are denoted by $x_{j,b}$, where the indices $j$ and $b$ refer to the $b$th bit in the binary label of the $j$th entry of $\mathbf{s}$. The resulting complex baseband input-output relation is given by
$$\mathbf{y} = \mathbf{H}\mathbf{s} + \mathbf{n} \tag{1}$$
where $\mathbf{H}$ denotes the $M_R \times M_T$ channel matrix and $\mathbf{n}$ is an i.i.d. proper complex Gaussian distributed $M_R$-dimensional noise vector with variance $N_o$ per complex entry.

A. Max-Log Soft-Output Computation
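Soft-output MIMO detection requires the computation of LLRs for all coded bits. In order to reduce the corresponding computational complexity, we employ the max-log approximation [6]
$$L(x_{j,b}) = \min_{\mathbf{s} \in \mathcal{X}_{j,b}^{(0)}} \|\mathbf{y} - \mathbf{H}\mathbf{s}\|^2 - \min_{\mathbf{s} \in \mathcal{X}_{j,b}^{(1)}} \|\mathbf{y} - \mathbf{H}\mathbf{s}\|^2 \tag{2}$$
where $\mathcal{X}_{j,b}^{(0)}$ and $\mathcal{X}_{j,b}^{(1)}$ are the disjoint sets of vector symbols that have the $b$th bit in the label of the $j$th scalar symbol equal to 0 and 1, respectively, and the LLRs in (2) are normalized to avoid dependence on the noise variance. For each bit, one of the two minima in (2) is given by $\lambda^{\text{ML}} = \|\mathbf{y} - \mathbf{H}\mathbf{s}^{\text{ML}}\|^2$, where
$$\mathbf{s}^{\text{ML}} = \arg\min_{\mathbf{s} \in \mathcal{O}^{M_T}} \|\mathbf{y} - \mathbf{H}\mathbf{s}\|^2 \tag{3}$$
is the maximum likelihood (ML) solution. The other minimum in (2) is given by
$$\lambda_{j,b}^{\overline{\text{ML}}} = \min_{\mathbf{s} \in \mathcal{X}_{j,b}^{\left(\overline{x_{j,b}^{\text{ML}}}\right)}} \|\mathbf{y} - \mathbf{H}\mathbf{s}\|^2 \tag{4}$$
where the counter-hypothesis $\overline{x_{j,b}^{\text{ML}}}$ denotes the binary complement of the $b$th bit in the binary label of the $j$th entry of $\mathbf{s}^{\text{ML}}$. With (3) and (4) the max-log LLRs can be written as
$$L(x_{j,b}) = \begin{cases} \lambda^{\text{ML}} - \lambda_{j,b}^{\overline{\text{ML}}}, & x_{j,b}^{\text{ML}} = 0 \\ \lambda_{j,b}^{\overline{\text{ML}}} - \lambda^{\text{ML}}, & x_{j,b}^{\text{ML}} = 1. \end{cases} \tag{5}$$
From (5) we can conclude that efficient max-log APP MIMO detection reduces to efficiently identifying $\mathbf{s}^{\text{ML}}$, $\lambda^{\text{ML}}$, and $\lambda_{j,b}^{\overline{\text{ML}}}$ for $j = 1, 2, \ldots, M_T$ and $b = 1, 2, \ldots, Q$ [7].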
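As a reference point for the tree-search methods that follow, the max-log LLRs in (2) can be evaluated by exhaustive enumeration for small systems. The sketch below is illustrative only: the QPSK labeling, the function names, and the 0-based indexing of $j$ and $b$ are assumptions and do not come from the paper.

```python
import itertools

# Hypothetical QPSK constellation with Q = 2 bits per symbol (labeling is an assumption).
QPSK = {(0, 0): 1 + 1j, (0, 1): 1 - 1j, (1, 0): -1 + 1j, (1, 1): -1 - 1j}


def sq_dist(y, H, s):
    """Return ||y - H s||^2 for y, s given as lists and H as a list of rows."""
    return sum(abs(y_r - sum(h * s_j for h, s_j in zip(row, s))) ** 2
               for y_r, row in zip(y, H))


def max_log_llrs(y, H, constellation=QPSK):
    """Evaluate (2) exhaustively; returns {(j, b): L(x_{j,b})} with 0-based j and b."""
    m_t = len(H[0])
    labels = list(constellation)
    q = len(labels[0])
    best = {}  # (j, b, bit value) -> smallest distance seen so far
    for combo in itertools.product(labels, repeat=m_t):
        d = sq_dist(y, H, [constellation[label] for label in combo])
        for j, label in enumerate(combo):
            for b, bit in enumerate(label):
                key = (j, b, bit)
                best[key] = min(best.get(key, float("inf")), d)
    # Difference of the two minima in (2): bit = 0 set minus bit = 1 set.
    return {(j, b): best[(j, b, 0)] - best[(j, b, 1)]
            for j in range(m_t) for b in range(q)}


# Noiseless 2x2 example with an identity channel (values chosen for illustration).
H_demo = [[1, 0], [0, 1]]
y_demo = [1 + 1j, -1 - 1j]
llrs_demo = max_log_llrs(y_demo, H_demo)
```

The enumeration visits all $2^{Q M_T}$ candidate vectors, which is exactly the cost that the sphere decoder is meant to avoid.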
B. Max-Log APP MIMO Detection as a Tree Search
Transforming (3) and (4) into tree-search problems and using the sphere decoding algorithm [2], [3] allows the LLRs in (5) to be computed efficiently. To this end, the channel matrix $\mathbf{H}$ is first QR-decomposed according to $\mathbf{H} = \mathbf{Q}\mathbf{R}$, where $\mathbf{Q}$ is unitary of dimension $M_R \times M_T$ and the upper-triangular $M_T \times M_T$ matrix $\mathbf{R}$ has real-valued positive entries on its main diagonal. Left-multiplying (1) by $\mathbf{Q}^H$ (the superscript $H$ stands for conjugate transposition) leads to the modified input-output relation
$$\tilde{\mathbf{y}} = \mathbf{R}\mathbf{s} + \mathbf{Q}^H\mathbf{n} \quad \text{with} \quad \tilde{\mathbf{y}} = \mathbf{Q}^H\mathbf{y}$$
and hence, noting that $\mathbf{Q}^H\mathbf{n}$ has the same statistics as $\mathbf{n}$, to the equivalent formulation of $\lambda^{\text{ML}}$ and $\lambda_{j,b}^{\overline{\text{ML}}}$ as
$$\lambda^{\text{ML}} = \min_{\mathbf{s} \in \mathcal{O}^{M_T}} \|\tilde{\mathbf{y}} - \mathbf{R}\mathbf{s}\|^2 \tag{6}$$
$$\lambda_{j,b}^{\overline{\text{ML}}} = \min_{\mathbf{s} \in \mathcal{X}_{j,b}^{\left(\overline{x_{j,b}^{\text{ML}}}\right)}} \|\tilde{\mathbf{y}} - \mathbf{R}\mathbf{s}\|^2. \tag{7}$$
We next define the partial symbol vectors (PSVs) $\mathbf{s}^{(i)} = [\, s_i \;\; s_{i+1} \;\; \cdots \;\; s_{M_T} \,]^T$ and note that the $\mathbf{s}^{(i)}$ can be arranged in a tree that has its root just above level $i = M_T$ and leaves, which correspond to possible candidate symbol vectors, on level $i = 1$. After initializing $d_{M_T+1} = 0$, the Euclidean distances $d(\mathbf{s}) = \|\tilde{\mathbf{y}} - \mathbf{R}\mathbf{s}\|^2$ in (6) and (7) can be computed recursively as $d(\mathbf{s}) = d_1$ with the partial Euclidean distances (PEDs)
$$d_i = d_{i+1} + |e_i|^2, \quad i = M_T, M_T - 1, \ldots, 1 \tag{8}$$
and the distance increments (DIs)
$$|e_i|^2 = \Big| \tilde{y}_i - \sum_{j=i}^{M_T} R_{i,j} s_j \Big|^2. \tag{9}$$
Since the dependence of the PEDs $d_i$ on the symbol vector $\mathbf{s}$ is only through $\mathbf{s}^{(i)}$, we have transformed ML detection and the computation of the max-log LLRs into a weighted tree-search problem: PEDs and PSVs are associated with nodes, branches correspond to DIs. Each path from the root down to a leaf corresponds to a symbol vector $\mathbf{s} \in \mathcal{O}^{M_T}$. The leaf associated with the smallest metric in $\mathcal{O}^{M_T}$ and $\mathcal{X}_{j,b}^{\left(\overline{x_{j,b}^{\text{ML}}}\right)}$ corresponds to the solution of (6) and (7), respectively. The basic building block underlying the two tree traversal strategies described in the next section is the Schnorr-Euchner sphere decoder (SESD) with radius reduction [8], briefly summarized as follows: The SESD constrains the search to nodes which lie within a radius $r$ around $\tilde{\mathbf{y}}$ and traverses the tree depth-first, visiting the children of a given node in ascending order of their PEDs. The basic idea of radius reduction is to start the algorithm with $r = \infty$ and to update the radius according to $r^2 \leftarrow d(\mathbf{s})$ whenever a leaf $\mathbf{s}$ has been reached. This avoids the problem of selecting a suitable (initial) radius and leads to efficient pruning of the tree.
Throughout this paper, computational complexity is defined as the number of visited nodes. This complexity measure is directly related to the throughput of corresponding VLSI implementations [9].
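The PED recursion (8), (9) and the SESD with radius reduction can be sketched as a short recursive search. This is a minimal hard-output illustration, assuming that $\tilde{\mathbf{y}}$ and the upper-triangular $\mathbf{R}$ are already available; the function names, the 0-based level indexing, and the convention of counting only nodes inside the current radius as visited are assumptions made here, not details taken from the paper.

```python
import math


def distance_increment(y_t, R, s, i):
    """|e_i|^2 from (9); s must hold the symbols of levels i..M_T-1 (0-based)."""
    m_t = len(R)
    return abs(y_t[i] - sum(R[i][j] * s[j] for j in range(i, m_t))) ** 2


def sesd(y_t, R, alphabet):
    """Depth-first Schnorr-Euchner search with radius reduction.

    Returns (s_ML, lambda_ML, number of visited nodes).
    """
    m_t = len(R)
    best = {"s": None, "d": math.inf, "visited": 0}
    s = [0] * m_t

    def search(i, ped_parent):
        # Compute the PEDs (8) of all children and sort them in ascending order.
        children = []
        for sym in alphabet:
            s[i] = sym
            children.append((ped_parent + distance_increment(y_t, R, s, i), sym))
        children.sort(key=lambda child: child[0])
        for ped, sym in children:
            if ped >= best["d"]:
                break  # outside the radius; later siblings have larger PEDs
            best["visited"] += 1
            s[i] = sym
            if i == 0:
                best["s"], best["d"] = list(s), ped  # radius update r^2 <- d(s)
            else:
                search(i - 1, ped)

    search(m_t - 1, 0.0)  # the root sits just above the top level
    return best["s"], best["d"], best["visited"]


# Illustrative real-valued example: BPSK alphabet and noiseless y_t = R s with s = [1, -1].
R_demo = [[2, 1], [0, 1]]
s_hat, lam_ml, visited = sesd([1, -1], R_demo, [-1, 1])
```

In this noiseless example the first path to a leaf already yields the ML solution, so only $M_T = 2$ nodes are visited.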
III. TREE-TRAVERSAL STRATEGIES
Computing the LLRs as in (5) requires determining the metric $\lambda_{j,b}^{\overline{\text{ML}}}$, which is achieved by traversing only those parts of the tree that have leaves in $\mathcal{X}_{j,b}^{\left(\overline{x_{j,b}^{\text{ML}}}\right)}$. Since this computation has to be carried out for every coded bit, it is immediately obvious that the resulting need for repeated tree traversals can lead to a major computational burden. In the following, we review two alternative tree-traversal strategies, proposed in [7] and [4], respectively, for solving (6) and (7). In addition, we propose some minor refinements of the tree-search algorithm introduced in [4].
A. Repeated Tree Search
An algorithm for computing the LLRs based on repeated tree search (RTS) was described in [7]. The basic idea is to start by solving (6) (using the SESD) and to rerun the SESD to solve (7) for each coded bit (i.e., $QM_T$ times) in the vector symbol. When rerunning the SESD to determine $\lambda_{j,b}^{\overline{\text{ML}}}$, the search tree is prepruned by forcing the decoder to exclude all nodes (and the corresponding subtrees) from the search for which $x_{j,b} = x_{j,b}^{\text{ML}}$. This prepruning procedure is illustrated in Fig. 1. Initializing the SESD with $r = \infty$ in each of the $QM_T$ runs required to obtain $\lambda_{j,b}^{\overline{\text{ML}}}$ will lead to high computational complexity. It is therefore important to realize that, without compromising max-log optimality, we can initialize the search radius $r_{j,b}$ by setting it equal to the minimum value of $\|\tilde{\mathbf{y}} - \mathbf{R}\mathbf{s}\|$ over all $\mathbf{s} \in \mathcal{X}_{j,b}^{\left(\overline{x_{j,b}^{\text{ML}}}\right)}$ found during preceding tree traversals.
Fig. 1. Example of the prepruning procedure in the RTS approach for $\mathbf{x}^{\text{ML}} = [\, 0 \;\; 1 \;\; 1 \,]$. Counter-hypotheses to the ML solution are found by forcing the algorithm through the dashed branches.
The main advantage of the RTS strategy lies in the fact that each traversal of the tree can be performed using a hard-decision SESD with minimal modifications to account for the search being carried out on a prepruned tree. The main disadvantage is the repeated traversal of large parts of the tree. As noted in [10], this problem can be mitigated somewhat by changing the detection order in each run. Unfortunately, the resulting need for multiple QR-decompositions typically leads to prohibitive overall computational complexity.
B. Single Tree Search
The key to a more efficient (compared to RTS) tree-search strategy is to ensure that every node in the tree is visited at most once. This can be accomplished by searching for the ML solution and all counter-hypotheses concurrently. The basic idea behind such a single tree search (STS) approach has been outlined in [4]. In the following, we shall elaborate on the idea presented in [4] and describe some minor refinements. Specifically, we formulate update rules and a pruning criterion based on a list containing the metrics $\lambda^{\text{ML}}$ and $\lambda_{j,b}^{\overline{\text{ML}}}$.
The main concept is to have a list containing the metric $\lambda^{\text{ML}}$ along with the corresponding bit sequence $\mathbf{x}^{\text{ML}}$ and the metrics $\lambda_{j,b}^{\overline{\text{ML}}}$ of all counter-hypotheses and to search the subtree originating from a given node only if the result can lead to an update of either $\lambda^{\text{ML}}$ or one of the $\lambda_{j,b}^{\overline{\text{ML}}}$.
List administration: The algorithm is initialized with $\lambda^{\text{ML}} = \lambda_{j,b}^{\overline{\text{ML}}} = \infty$ ($\forall j, b$). Whenever a leaf with corresponding binary label $\mathbf{x}$ has been reached, the decoder distinguishes between two cases:
1) If a new ML hypothesis is found, i.e., $d(\mathbf{x}) < \lambda^{\text{ML}}$, all $\lambda_{j,b}^{\overline{\text{ML}}}$ for which $x_{j,b} = \overline{x_{j,b}^{\text{ML}}}$ are set to $\lambda^{\text{ML}}$ followed by the updates $\lambda^{\text{ML}} \leftarrow d(\mathbf{x})$ and $\mathbf{x}^{\text{ML}} \leftarrow \mathbf{x}$. In other words, for each bit in the ML hypothesis that is changed in the process of the update, the metric of the former ML hypothesis becomes the metric of the new counter-hypothesis, followed by an update of the ML hypothesis. This procedure ensures that all $\lambda_{j,b}^{\overline{\text{ML}}}$ always contain the metric associated with a valid counter-hypothesis to the current ML hypothesis.
2) In the case where $d(\mathbf{x}) \geq \lambda^{\text{ML}}$, only the counter-hypotheses have to be checked. For all $j$ and $b$ for which $d(\mathbf{x}) < \lambda_{j,b}^{\overline{\text{ML}}}$ and $x_{j,b} = \overline{x_{j,b}^{\text{ML}}}$, the decoder updates $\lambda_{j,b}^{\overline{\text{ML}}} \leftarrow d(\mathbf{x})$.
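The two list-administration cases can be expressed as a single update routine applied at every leaf. The following sketch is only an illustration under stated assumptions: the bits $x_{j,b}$ are flattened into one tuple indexed by a hypothetical position `k`, and the dictionary layout and function names are not taken from the paper.

```python
import math


def init_list(num_bits):
    """Initialize lambda_ML and all counter-hypothesis metrics to infinity."""
    return {"lam_ml": math.inf, "x_ml": None, "lam_bar": [math.inf] * num_bits}


def update_list(state, x, d):
    """Apply the leaf update for a leaf with binary label x and metric d."""
    if d < state["lam_ml"]:
        # Case 1: new ML hypothesis. Every bit that flips inherits the
        # metric of the former ML hypothesis as its counter-hypothesis metric.
        if state["x_ml"] is not None:
            for k, bit in enumerate(x):
                if bit != state["x_ml"][k]:
                    state["lam_bar"][k] = state["lam_ml"]
        state["lam_ml"], state["x_ml"] = d, tuple(x)
    else:
        # Case 2: only counter-hypotheses whose bit differs from the ML bit
        # can be improved by this leaf.
        for k, bit in enumerate(x):
            if bit != state["x_ml"][k] and d < state["lam_bar"][k]:
                state["lam_bar"][k] = d
    return state


# Illustrative run over four leaves with two bits each (metrics are made up).
sts_state = init_list(2)
for label, metric in [((0, 0), 3.0), ((0, 1), 1.0), ((1, 0), 2.0), ((1, 1), 5.0)]:
    update_list(sts_state, label, metric)
```

After all four leaves have been processed, the list holds the ML metric 1.0 for label (0, 1) and the counter-hypothesis metric 2.0 for both bits, which matches what an exhaustive evaluation of (6) and (7) gives for these leaf metrics.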
Pruning criterion: The key aspect of this algorithm is the following pruning criterion. A given node $\mathbf{s}^{(i)}$ on level $i$ and the subtree originating from that node have the partial binary label $\mathbf{x}^{(i)}$ consisting of the bits $x_{j,b}$ ($b = 1, 2, \ldots, Q$ and $j = i, i + 1, \ldots, M_T$). The remaining bits $x_{j,b}$ ($j = 1, 2, \ldots, i - 1$) corresponding to the subtree are unknown at this point. The pruning criterion for $\mathbf{s}^{(i)}$ along with its subtree is compiled from two conditions. First, the bits in the partial binary label $\mathbf{x}^{(i)}$ are compared with the corresponding bits in the binary label of the current ML hypothesis. In this comparison, for all $j, b$ with $x_{j,b} = \overline{x_{j,b}^{\text{ML}}}$, the corresponding counter-hypotheses $\lambda_{j,b}^{\overline{\text{ML}}}$ might be affected when further searching the node's subtree. Second, all counter-hypotheses corresponding to the subtree of $\mathbf{s}^{(i)}$ with the associated metrics $\lambda_{j,b}^{\overline{\text{ML}}}$ ($j = 1, 2, \ldots, i - 1$) may also be updated since the corresponding bits are not yet known. In summary, the metrics which may be affected during further search in the subtree emanating from a node $\mathbf{s}^{(i)}$ are given by the set
$$\mathcal{A} = \{a_l\} = \Big\{ \lambda_{j,b}^{\overline{\text{ML}}} \;\Big|\; x_{j,b} = \overline{x_{j,b}^{\text{ML}}},\; j \geq i \Big\} \cup \Big\{ \lambda_{j,b}^{\overline{\text{ML}}} \;\Big|\; j < i \Big\}.$$
The node $\mathbf{s}^{(i)}$ along with its subtree is pruned if its PED $d\big(\mathbf{s}^{(i)}\big)$ satisfies
$$d\big(\mathbf{s}^{(i)}\big) > \max_{a_l \in \mathcal{A}} a_l. \tag{10}$$
This pruning criterion (illustrated in Fig. 2) ensures that the subtree of a given node is explored only if it can lead to an update of either the ML hypothesis or of at least one of the counter-hypotheses. Note that $\lambda^{\text{ML}}$ does not appear in (10) as $\lambda^{\text{ML}} \leq \lambda_{j,b}^{\overline{\text{ML}}}$ ($\forall j, b$).
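The pruning test (10) amounts to collecting the set $\mathcal{A}$ for the current node and comparing its maximum against the node's PED. The sketch below assumes a hypothetical data layout: bit positions $(j, b)$ are flattened into integer keys, `known_bits` maps the positions fixed by the partial label $\mathbf{x}^{(i)}$ to their values, and all remaining positions are treated as not yet known. The handling of the two corner cases (no ML hypothesis yet, empty set $\mathcal{A}$) is a choice made here for the illustration.

```python
def should_prune(ped, known_bits, state):
    """Return True if the node with the given PED satisfies criterion (10).

    known_bits: dict mapping flattened bit positions to bit values for j >= i.
    state: dict with keys "x_ml" (tuple of bits or None) and "lam_bar" (list).
    """
    x_ml = state["x_ml"]
    if x_ml is None:
        return False  # no leaf reached yet, all metrics are still infinite
    # Set A: counter-hypothesis metrics of known bits that differ from the ML
    # bit, plus the metrics of all bits that are still unknown (j < i).
    affected = [lam for k, lam in enumerate(state["lam_bar"])
                if k not in known_bits or known_bits[k] != x_ml[k]]
    # An empty set A means no list entry can change, so the node is pruned.
    return not affected or ped > max(affected)


# Illustrative list state with two bits (values are made up).
demo_state = {"lam_ml": 1.0, "x_ml": (0, 1), "lam_bar": [2.0, 4.0]}
prune_same_bit = should_prune(3.0, {1: 1}, demo_state)   # only bit 0 can change
prune_flipped = should_prune(3.0, {1: 0}, demo_state)    # bits 0 and 1 can change
```

In the first call the known bit agrees with the ML hypothesis, so only the metric 2.0 of the unknown bit is in $\mathcal{A}$ and the node with PED 3.0 is pruned; in the second call the metric 4.0 also enters $\mathcal{A}$ and the node is kept.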
IV. METHODS FOR COMPLEXITY REDUCTION
So far we have discussed tree-search strategies which solve (2) exactly and hence do not compromise the performance of the max-log APP decoder. The goal of this section is to describe methods that allow decoder complexity to be traded off against (error rate) performance.
A. LLR Clipping
The dynamic range of LLRs is typically not bounded. However, practical systems need to constrain the maximum LLR value to enable fixed-point implementations. Evidently this will lead to a performance degradation. A straightforward way of ensuring that LLR values are bounded is to clip them after the detection stage so that
$$\big| L(x_{j,b}) \big| \leq L_{\max} \quad \forall j, b. \tag{11}$$
We emphasize that the constraint in (11) refers to the normalized LLRs $L(x_{j,b})$ as defined in (2). It has been noted in [5] that (11) can be built into the tree-search algorithm such that it leads to a reduction in search complexity. In the following, we briefly describe the application of the idea proposed in [5] to the RTS and the STS tree-traversal strategies.
Fig. 2. Example of the STS pruning criterion ($M_T = 5$ and two bits per symbol): The partial binary label $\mathbf{x}^{(i)}$ determines which counter-hypotheses may be affected during the search of the subtree emanating from the current node.
a) LLR Clipping for RTS: Whenever the RTS algorithm starts to search for a counter-hypothesis, with the search radius $r_{j,b}$ initialized as described in Section III-A, we first update
$$r_{j,b} \leftarrow \min\big\{ r_{j,b},\; \lambda^{\text{ML}} + L_{\max} \big\} \tag{12}$$
which ensures that (11) is satisfied. Metrics associated with counter-hypotheses for which no valid lattice point can be found are set to $\lambda^{\text{ML}} + L_{\max}$.
b) LLR Clipping for STS: Whenever a leaf has been reached and a new ML hypothesis has been found after carrying out the steps in Case 1 in Section III-B, the counter-hypotheses have to be updated according to
$$\lambda_{j,b}^{\overline{\text{ML}}} \leftarrow \min\big\{ \lambda_{j,b}^{\overline{\text{ML}}},\; \lambda^{\text{ML}} + L_{\max} \big\} \quad \forall j, b. \tag{13}$$
For $L_{\max} = \infty$, we obviously get the exact max-log solution, whereas for $L_{\max} \to 0$, the decoder performance approaches that of a hard-output ML detector. On the other hand, a smaller $L_{\max}$ leads to a reduction in complexity, as more aggressive pruning is performed. The parameter $L_{\max}$ can therefore be used to adjust the complexity/performance trade-off (cf. Section V).
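The effect of the update (13) on the resulting LLRs can be checked with a few lines of code. This is a minimal sketch assuming flattened bit positions and hypothetical function names; it applies (13) to a list of counter-hypothesis metrics and then forms the LLRs according to (5).

```python
import math


def clip_counter_hypotheses(lam_ml, lam_bar, l_max):
    """Apply (13): no counter-hypothesis metric may exceed lambda_ML + L_max."""
    return [min(lam, lam_ml + l_max) for lam in lam_bar]


def llrs_from_list(lam_ml, x_ml, lam_bar):
    """Form the max-log LLRs according to (5) from the list entries."""
    return [(lam_ml - lam) if bit == 0 else (lam - lam_ml)
            for bit, lam in zip(x_ml, lam_bar)]


# Illustrative values: one counter-hypothesis was never found (infinite metric).
clipped = clip_counter_hypotheses(1.0, [2.0, math.inf, 1.2], 0.5)
llrs = llrs_from_list(1.0, (0, 1, 1), clipped)
```

Because every clipped metric lies between $\lambda^{\text{ML}}$ and $\lambda^{\text{ML}} + L_{\max}$, the magnitudes of the resulting LLRs are bounded by $L_{\max}$ as required by (11), and the clipped metrics also tighten the right-hand side of the pruning criterion (10).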
B. Ordering and Regularization
Ordering: A common approach to reduce complexity in sphere decoding without compromising the decoder's performance is to adapt the detection ordering of the spatial streams to the geometry of the instantaneous channel realization by performing a QR-decomposition on $\mathbf{H}\mathbf{P}$ (rather than $\mathbf{H}$), where $\mathbf{P}$ is a suitably chosen permutation matrix. More efficient pruning of the search tree closer to the root is obtained if "stronger streams" correspond to the levels closer to the root, i.e., $\mathbf{P}$ is chosen such that the main diagonal entries of $\mathbf{R}$ in $\mathbf{H}\mathbf{P} = \mathbf{Q}\mathbf{R}$ are sorted in ascending order. In the following, this approach is termed sorted QR-decomposition (SQRD) [11].
Regularization: Poorly conditioned channel realizations $\mathbf{H}$ lead to significant search complexity due to the low effective SNR on one or multiple of the effective spatial streams. An efficient way to counter this problem is to perform the tree-search on a regularized channel matrix by computing
$$\begin{bmatrix} \mathbf{H} \\ \alpha \mathbf{I} \end{bmatrix} \mathbf{P} = \begin{bmatrix} \mathbf{Q}_1 \\ \mathbf{Q}_2 \end{bmatrix} \mathbf{R}$$
where $\mathbf{I}$ is the $M_T \times M_T$ identity matrix, $\mathbf{Q}_1$ is of dimension $M_R \times M_T$, $\mathbf{Q}_2$ and $\mathbf{R}$ are of dimension $M_T \times M_T$, and $\alpha > 0$ is a suitably chosen regularization parameter. Note that $\mathbf{Q}_1$ is, in general, not unitary. LLRs are then computed according to
$$L(x_{j,b}) = \min_{\tilde{\mathbf{s}} \in \mathcal{X}_{j,b}^{(0)}} \|\tilde{\mathbf{y}} - \mathbf{R}\tilde{\mathbf{s}}\|^2 - \min_{\tilde{\mathbf{s}} \in \mathcal{X}_{j,b}^{(1)}} \|\tilde{\mathbf{y}} - \mathbf{R}\tilde{\mathbf{s}}\|^2 \tag{14}$$
where $\tilde{\mathbf{y}} = \mathbf{Q}_1^H \mathbf{y}$ and $\tilde{\mathbf{s}} = \mathbf{P}\mathbf{s}$. Note that the LLRs in (14) need to be reordered at the end of the decoding process to account for the permutation induced by $\mathbf{P}$. Operating on a regularized version of the channel matrix clearly entails an (error rate) performance loss. However, we shall see in Section V that choosing $\alpha$ according to the minimum mean squared error (MMSE) criterion (resulting in MMSE-SQRD) as outlined in [12], degrades the performance only slightly while leading to considerable savings in terms of search complexity.
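The regularized decomposition can be illustrated with a small pure-Python QR routine applied to the augmented matrix. The sketch below makes several simplifying assumptions that are not from the paper: no ordering is applied ($\mathbf{P} = \mathbf{I}$), the QR factorization uses modified Gram-Schmidt, the value of $\alpha$ is arbitrary rather than chosen by the MMSE criterion, and all function names are hypothetical.

```python
import math


def thin_qr(A):
    """Thin QR of a tall complex matrix (list of rows) via modified Gram-Schmidt."""
    m, n = len(A), len(A[0])
    q_cols = []
    R = [[0j] * n for _ in range(n)]
    for c in range(n):
        v = [complex(A[r][c]) for r in range(m)]
        for k, q in enumerate(q_cols):
            R[k][c] = sum(qe.conjugate() * ve for qe, ve in zip(q, v))
            v = [ve - R[k][c] * qe for ve, qe in zip(v, q)]
        norm = math.sqrt(sum(abs(ve) ** 2 for ve in v))
        R[c][c] = norm  # real-valued and positive by construction
        q_cols.append([ve / norm for ve in v])
    Q = [[q_cols[c][r] for c in range(n)] for r in range(m)]
    return Q, R


def regularized_qr(H, alpha):
    """QR of the augmented matrix [H; alpha*I] with P = I (an assumption)."""
    m_r, m_t = len(H), len(H[0])
    aug = [list(row) for row in H]
    aug += [[alpha if r == c else 0.0 for c in range(m_t)] for r in range(m_t)]
    Q, R = thin_qr(aug)
    return Q[:m_r], Q[m_r:], R  # Q1 (M_R x M_T), Q2 (M_T x M_T), R


def apply_q1_hermitian(Q1, y):
    """Compute y_tilde = Q1^H y."""
    return [sum(Q1[r][c].conjugate() * y[r] for r in range(len(y)))
            for c in range(len(Q1[0]))]


# Illustrative 2x2 channel and regularization parameter (values are made up).
H_reg = [[1 + 1j, 0.5], [0.2j, 1.0]]
Q1, Q2, R_reg = regularized_qr(H_reg, 0.5)
y_tilde = apply_q1_hermitian(Q1, [1.0, -1.0])
```

The top block still satisfies $\mathbf{H} = \mathbf{Q}_1\mathbf{R}$, but the columns of $\mathbf{Q}_1$ have norm smaller than one because part of each orthonormal column lies in $\mathbf{Q}_2$, which is why $\mathbf{Q}_1$ is not unitary.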
C. Run-Time Constraints
A disadvantage of all SDs is that the computational complexity required to find the ML solution (and the LLR values) depends on the realization of the channel matrix and the noise; the worst-case complexity corresponds to an exhaustive search. On the other hand, in order to meet the practically important requirement of a fixed throughput, the algorithm run-time must be constrained, which leads to a constraint on the maximum detection effort. This, in turn, generally prevents the detector from achieving ML or max-log APP performance.
A straightforward way of enforcing a run-time constraint is to terminate the search, on a symbol vector by symbol vector basis, after a maximum number of visited nodes. The detector then returns the best solution found so far, i.e., the current ML and counter-hypotheses. A better solution is to impose an aggregate run-time constraint of $N D_{\text{avg}}$ visited nodes for an entire block of $N$ vector symbols (in an OFDM-based MIMO system, $N$ would, for example, be the number of OFDM tones). The maximum complexity allocated to the detection of the $k$th vector symbol can, for example, be chosen according to the maximum-first (MF) scheduling strategy [13] as
$$D_{\max}(k) = N D_{\text{avg}} - \sum_{i=1}^{k-1} D(i) - (N - k) M_T \tag{15}$$
where $D(i)$ denotes the actual number of visited nodes for the $i$th vector symbol. The concept behind (15) is that a vector symbol is allowed to use up all of the remaining run-time within the block up to a safety margin of $(N - k) M_T$ visited nodes, which suffices to find at least the decision-feedback solution for the remaining vector symbols. Setting $D_{\text{avg}} = M_T$ maximizes the throughput but reduces the performance to that of hard-decision SIC.
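The budget rule (15) is simple enough to trace by hand. The following sketch evaluates it for a hypothetical block; the function name, the argument names, and the numbers are illustrative assumptions.

```python
def mf_budget(k, n, d_avg, m_t, used):
    """D_max(k) from (15); k is 1-based and used[i - 1] = D(i) for i < k."""
    return n * d_avg - sum(used[:k - 1]) - (n - k) * m_t


# Hypothetical block: N = 4 vector symbols, D_avg = 10, M_T = 4, and every
# symbol is assumed to consume its full budget.
N, D_AVG, M_T = 4, 10, 4
used_nodes = []
for k in range(1, N + 1):
    used_nodes.append(mf_budget(k, N, D_AVG, M_T, used_nodes))
```

In this worst case the first symbol receives 28 nodes and each of the remaining three receives exactly $M_T = 4$ nodes, so the block total equals $N D_{\text{avg}} = 40$.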
V. PERFORMANCE/COMPLEXITY TRADE-OFFS
In practice, system engineers are typically faced with the problem of designing a receiver that achieves a given target frame error rate (FER) at a given throughput. The quality of the receiver implementation can then be measured by the minimum SNR required to achieve this target FER. In the following, we assess the complexity/performance trade-offs of the concepts described in Sections III and IV by plotting the average (over independent channel and noise realizations) number of visited nodes as a function of this minimum SNR. Since the number of visited nodes translates directly to the required chip area per throughput [9], the corresponding charts allow an SNR penalty to be associated with a reduction in hardware complexity.
All simulation results are for a rate 1/2 (generator polynomials $[133_o \;\; 171_o]$ and constraint length 7) convolutionally encoded $4 \times 4$ MIMO-OFDM system with 16-QAM constellation (using Gray mapping) and $N = 64$ tones. A soft-in Viterbi decoder [14] is employed. One frame consists of 1024 randomly interleaved (across space and frequency) bits and a TGn type C channel model [15] is used.
A. Comparison of Tree-Search Strategies
Fig. 3 compares the performance of RTS and STS max-log APP decoders, and the list sphere decoder (LSD) [6] for different target FERs, different values of $L_{\max}$ and, in the case of the LSD, for different list sizes. Changing the list size allows the complexity/performance trade-off of the LSD to be adjusted.
The STS approach is seen to clearly outperform the RTS strategy in terms of average complexity. We can furthermore see that for this setup max-log APP performance is achieved for $L_{\max} = 0.2$. Increasing the LLR clipping level beyond this value only increases complexity without improving performance.
The implementation of the LSD requires additional memory and logic for the administration of the candidate list, which is not accounted for in this comparison. Fig. 3 shows that even when this additional complexity is ignored, the LSD is still inferior to the STS algorithm.
B. Impact of Preprocessing and Regularization
Fig. 4 compares the impact of SQRD, MMSE-SQRD, and standard (unordered) QRD-based preprocessing on the complexity/performance trade-off of the STS algorithm at a target FER of 0.01. It can be seen that the improvement resulting from SQRD compared to unordered QRD becomes significant in the low (but realistic) complexity region. Further (minor) improvements are obtained from regularization using MMSE-SQRD. In the region where the average complexity is very high, the performance penalty resulting from regularization eventually renders MMSE-SQRD inferior to SQRD.
Fig. 3. Comparison of repeated tree search (RTS), single tree search (STS) and the list sphere decoder (LSD) as proposed in [6]. The numbers next to the curves correspond to $L_{\max}$ for RTS and STS and to the list size in the case of the LSD. (Plot: average number of visited nodes versus minimum SNR for a given FER; curves for RTS, STS, and LSD [6], each at FER = 0.04 and FER = 0.01.)
Fig. 4. Comparison of unordered QRD, SQRD and MMSE-SQRD preprocessing applied to STS at a target FER of 0.01. The numbers next to the curves correspond to $L_{\max}$. For $L_{\max} \to 0$, the performance approaches that of hard-output SESD. (Plot: average number of visited nodes versus minimum SNR for a given FER; curves for QRD, SQRD, and MMSE-SQRD, with the hard-output SESD marked as reference.)
C. LLR Clipping
Both Fig. 3 and Fig. 4 show that adjusting the LLR clipping level $L_{\max}$ makes it possible to sweep an entire family of sphere decoders ranging from the exact max-log APP SESD (obtained, in our setup, for $L_{\max} \geq 0.2$) to hard-output SESD ($L_{\max} = 0$). The LLR clipping level is therefore an important design parameter which can be used to conveniently adjust the decoder at runtime to a given complexity constraint.

References
Iterative decoding of binary block and convolutional codes
Achieving near-capacity on a multiple-antenna channel
Closest point search in lattices
Improved methods for calculating vectors of short length in a lattice, including a complexity analysis
Lattice basis reduction: improved practical algorithms and solving subset sum problems