
VLSI Implementation of a Soft-Output Signal Detector for Multimode Adaptive Multiple-Input Multiple-Output Systems

Abstract
This paper presents a multimode soft-output multiple-input multiple-output (MIMO) signal detector that is efficient in hardware cost and energy consumption. The detector handles spatial-multiplexing (SM), space-division-multiple-access (SDMA), and spatial-diversity (SD) signals with 4 × 4 antennas and 64-QAM modulation. Implementation-friendly algorithms, which reuse most of the mathematical operations in these three MIMO modes, are proposed to provide accurate soft detection information, i.e., the log-likelihood ratio, with much reduced complexity. A unified reconfigurable VLSI architecture has been developed to avoid implementing multiple detector modules. In addition, several block-level techniques, such as parallel metric update and fast bit-flipping, are adopted to enable a more efficient design. To evaluate the proposed techniques, we implemented the triple-mode MIMO detector in a 65-nm CMOS technology. The core area is 0.25 mm² with 83.7 K gates. The maximum detection throughput is 1 Gb/s at 167-MHz clock frequency and 1.2-V supply, which achieves the data rate envisioned by the emerging long-term evolution advanced standard. Under frequency-selective channels, the detector consumes 59.3, 10.5, and 169.6 pJ of energy per detected bit in SM, SD, and SDMA modes, respectively.


VLSI Implementation of a Soft-Output Signal Detector for Multi-Mode Adaptive MIMO
Systems
Liu, Liang; Löfgren, Johan; Nilsson, Peter; Öwall, Viktor
Published in:
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
DOI:
10.1109/TVLSI.2012.2231706
2013
Citation for published version (APA):
Liu, L., Löfgren, J., Nilsson, P., & Öwall, V. (2013). VLSI Implementation of a Soft-Output Signal Detector for Multi-Mode Adaptive MIMO Systems. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 21(12), 2262-2273. https://doi.org/10.1109/TVLSI.2012.2231706

Copyright (c) 2013 IEEE. Personal use of this material is permitted. However, permission to use this material for any other
purposes must be obtained from the IEEE by sending an email to pubs-permissions@ieee.org.
VLSI Implementation of a Soft-Output Signal Detector for Multi-Mode Adaptive MIMO Systems
Liang Liu, Member, IEEE, Johan Löfgren, Student Member, IEEE, Peter Nilsson, Senior Member, IEEE, and Viktor Öwall, Member, IEEE
Abstract—This paper presents a multi-mode soft-output multiple-input multiple-output (MIMO) signal detector that is efficient in hardware cost and energy consumption. The detector handles spatial-multiplexing (SM), space-division-multiple-access (SDMA), and spatial-diversity (SD) signals with 4×4 antennas and 64-QAM modulation. Implementation-friendly algorithms, which reuse most of the mathematical operations in these three MIMO modes, are proposed to provide accurate soft detection information, i.e., the log-likelihood ratio (LLR), with much reduced complexity. A unified reconfigurable VLSI architecture has been developed to avoid implementing multiple detector modules. In addition, several block-level techniques, such as parallel metric update and fast bit-flipping, are adopted to enable a more efficient design. To evaluate the proposed techniques, we implemented the triple-mode MIMO detector in a 65-nm CMOS technology. The core area is 0.25 mm² with 83.7 K gates. The maximum detection throughput is 1 Gb/s at 167-MHz clock frequency and 1.2-V supply, which achieves the data rate envisioned by the emerging long-term evolution advanced (LTE-A) standard. Under frequency-selective channels, the detector consumes 59.3 pJ, 10.5 pJ, and 169.6 pJ of energy per detected bit in SM, SD, and SDMA modes, respectively.
Index Terms—Multiple-input multiple-output (MIMO), signal
detector, soft-output, spatial-multiplexing (SM), spatial-diversity
(SD), space-division-multiple-access (SDMA), very-large scale
integration (VLSI).
I. INTRODUCTION
To meet the growing demands for better user experience, the International Telecommunication Union (ITU) has released its requirements for next-generation wireless networks, where much higher spectral efficiency, wider coverage, and lower latencies are expected [1]. It is broadly agreed that enhanced multiple-input multiple-output (MIMO) technologies play an essential role in emerging wireless standards, e.g., IEEE 802.16m (WiMAX Profile 2.0) [2] and 3GPP Long Term Evolution Advanced (3GPP LTE-A) (Release 10) [3], in achieving or exceeding the International Mobile Telecommunications Advanced (IMT-A) target.
Cellular systems experience highly dynamic channel conditions, where the signal-to-noise ratio (SNR) and fading properties vary within huge ranges. To guarantee the quality of service (QoS) for users with a specified error rate and data throughput, the system must be equipped with multiple MIMO technologies that are dynamically adapted to the fluctuating channels, because single-mode MIMO schemes have shown their limitations in satisfying such requirements. For example, the widely used spatial multiplexing (SM) technique [4] suffers from a large performance loss when the spatial channel becomes highly correlated [5]. Currently, extensive discussions are ongoing about multi-mode adaptive MIMO schemes in 3GPP-LTE and WiMAX [6]. The MIMO transmission techniques to be switched include SM [7], space-division-multiple-access (SDMA) [5], [8], and spatial diversity (SD) [9]. For such an adaptive system, multiple signal detectors are needed at the receiver side, one for each mode. A straightforward implementation strategy incurs considerable silicon area overhead and is highly inefficient, since most of the modules remain idle for a large part of the time. Consequently, an efficient implementation is expected to integrate multiple MIMO detectors into a single module that can be reconfigured for the respective mode at run-time. Moreover, in real-life wireless systems, signal detectors are usually paired with channel decoders to provide robustness against noise and fading. Therefore, a detector should be capable of providing not only the binary estimate of each bit but also its reliability measure, e.g., the log-likelihood ratio (LLR) [7], to achieve further performance enhancement. Finally, the chip area and power consumption should be low enough for adoption in practical systems, especially hand-held devices, where high performance and flexibility need to be combined with energy efficiency. To the best of our knowledge, a VLSI implementation of such a reconfigurable multi-mode soft-output MIMO detector is still missing in the open literature.

L. Liu, J. Löfgren, P. Nilsson, and V. Öwall are with the Department of Electrical and Information Technology, Lund University, Lund, Sweden (email: {Liang.Liu, Johan.Lofgren, Peter.Nilsson, Viktor.Owall}@eit.lth.se).
In an attempt to fill this gap, this paper proposes a soft-output signal detector that supports 64-QAM modulated SM/SDMA/SD triple-mode signals for up to 4×4 MIMO transmission. Furthermore, it achieves near maximum a posteriori probability (MAP) detection performance and provides gigabit-per-second throughput. The unification of multi-mode processing is mainly realized at the algorithm level, where the algorithms for each mode consist of similar mathematical operations to enable substantial hardware reuse. First, we develop soft-output detection algorithms for SM and SDMA modes based on an efficient extension and modification of the hard-output fixed-complexity sphere decoder (FSD) [10]. More specifically, we introduce a symbol-level bit-flipping scheme, which generates accurate LLR values with a marginal hardware increment. Additionally, a polygon-shaped constraint technique is adopted to reduce unnecessary node extensions in the tree search procedure. For SD signal detection, e.g., Alamouti space-frequency block codes (SFBC) [11], [12], we propose a low-complexity MAP algorithm with a unified detection procedure that is independent of the number of antennas. It allows for parallel detection of the real and imaginary parts of each transmitted symbol with the help of QR decomposition of the orthogonal real-valued channel matrix. Taking advantage of these implementation-oriented algorithms, a unified VLSI architecture is subsequently developed, capable of being reconfigured to support different MIMO modes at run-time. To further improve the implementation efficiency, e.g., reduce the detection latency, we introduce a parallel metric update strategy, which processes multiple candidate vectors simultaneously for soft-value computation, and a fast bit-flipping scheme, which selects the bit-flipped symbol with simple boundary-check operations. To validate the effectiveness of the foregoing design solutions, we designed the proposed triple-mode soft-output signal detector using Synopsys tools with a 65-nm CMOS standard cell library. Occupying only 0.25 mm² core area (83.7K equivalent gate count), the detector achieves 1 Gb/s throughput in SM and SD modes with the 4×4 64-QAM configuration, representing a 44% saving relative to the state of the art in terms of hardware efficiency. The throughput for detecting SDMA signals is 250 Mb/s. Operating over frequency-selective channels, e.g., the extended vehicular A (EVA) channel specified in the LTE standard [13], the detector consumes 59.3 mW of power in SM mode, resulting in an energy consumption of 59.3 pJ/bit. The energy needed to detect a bit in SD and SDMA modes is 10.5 pJ and 169.6 pJ, respectively.
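The energy-per-bit figure quoted for SM mode follows directly from dividing power by throughput. A minimal sketch of that arithmetic is given here; the function names are illustrative and are not taken from the paper.

```python
def energy_per_bit_pj(power_mw: float, throughput_mbps: float) -> float:
    """Energy per detected bit in pJ.

    mW / (Mb/s) gives nJ per bit, so multiply by 1000 to obtain pJ per bit.
    """
    return power_mw / throughput_mbps * 1000.0


def bits_per_cycle(throughput_mbps: float, clock_mhz: float) -> float:
    """Average number of bits detected per clock cycle."""
    return throughput_mbps / clock_mhz


# SM mode: 59.3 mW at 1 Gb/s, clocked at 167 MHz (figures quoted in the text).
sm_energy = energy_per_bit_pj(59.3, 1000.0)
sm_bits_per_cycle = bits_per_cycle(1000.0, 167.0)
print(f"SM energy: {sm_energy:.1f} pJ/bit, {sm_bits_per_cycle:.2f} bits/cycle")
```

With the quoted numbers this reproduces 59.3 pJ/bit and shows that roughly six bits must be resolved per clock cycle to sustain 1 Gb/s at 167 MHz.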
The remainder of this paper is organized as follows: Section II briefly introduces the system model and soft-output MIMO signal detection. Section III describes and evaluates the proposed detection algorithms. Section IV shows the VLSI architecture and module circuit design. The implementation results and performance comparison are presented in Section V, and conclusions are drawn in Section VI.
II. BACKGROUND
A. System Model
As illustrated in Fig. 1, we consider a downlink switching SM/SDMA/SD MIMO system with one base station (BS) and $K$ user equipments (UEs). Both the BS and UEs are equipped with $N$ transmit and receive antennas. The received $N \times 1$ complex signal vector at the $k$th UE is given by

$$\tilde{\mathbf{r}}_k = \tilde{\mathbf{H}}_k^c \sum_{k=1}^{K} \tilde{\mathbf{P}}_k \tilde{\mathbf{s}}_k + \tilde{\mathbf{n}}_k, \tag{1}$$
where $\tilde{\mathbf{s}}_k = [\tilde{s}_k^{(0)}, \ldots, \tilde{s}_k^{(L-1)}]^T$ is the $L$-layer transmitted vector for user $k$, in which each component is taken independently from a set of Gray-labeled $M$-QAM constellation points. Each symbol vector $\tilde{\mathbf{s}}_k$ is associated with a bit-level vector $\mathbf{b}_k$ (i.e., $\tilde{\mathbf{s}}_k = \mathrm{MAP}(\mathbf{b}_k)$), which is obtained by applying error correction coding (ECC) to the original binary source. In (1), $\tilde{\mathbf{n}}_k$ is the vector of independent Gaussian noise samples with zero mean and variance $N_0/2$, $\tilde{\mathbf{H}}_k^c$ is the $N \times N$ complex channel matrix between the BS and the $k$th UE, and $\tilde{\mathbf{P}}_k$ is the $N \times L$ pre-coding matrix, which is selected from a pre-defined codebook and is assumed to be known to both the BS and the UE [14]. The switch between different MIMO transmissions is realized by changing the matrix $\tilde{\mathbf{P}}_k$. Throughout this paper, we set $\tilde{\mathbf{P}}_k$ to be an $N \times N$ identity matrix ($\mathbf{I}_N$) in SM mode, while in SD mode $\tilde{\mathbf{P}}_k$ is an Alamouti coding matrix [12]. For an SDMA system, $\tilde{\mathbf{P}}_k$ is a unitary pre-coding matrix such that $\tilde{\mathbf{P}}_k^H \tilde{\mathbf{P}}_k = 1$ and $\tilde{\mathbf{P}}_k^H \tilde{\mathbf{P}}_l = 0$ for $l \neq k$, where $(\cdot)^H$ means Hermitian transposition. Moreover, we assume point-to-point transmission in SM and SD modes, i.e., $K = 1$ and $L = N$. Finally, $L$ is set to 1 in SDMA mode, because the number of layers per UE is limited to one in LTE [14].

Fig. 1. LTE downlink multi-mode MIMO transmission. (Block diagram: at the BS, each source $\mathbf{b}_k$ is encoded, mapped to $\tilde{\mathbf{s}}_k$, and pre-coded with $\tilde{\mathbf{P}}_k$ before transmission over antennas 1 to $N$; each UE receives over its channel $\tilde{\mathbf{H}}_k^c$ with $N$ antennas and performs channel estimation, soft detection producing LLRs, and decoding.)
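The complex system can be transformed to its real-valued representation $\mathbf{r}_k = \mathbf{H}[\mathbf{s}_1, \ldots, \mathbf{s}_K]^T + \mathbf{n}_k$, where

$$\begin{aligned}
\mathbf{r}_k &= [\Re(\tilde{r}_{k,1}), \Im(\tilde{r}_{k,1}), \ldots, \Re(\tilde{r}_{k,N}), \Im(\tilde{r}_{k,N})]^T \\
\mathbf{s}_k &= [\Re(\tilde{s}_{k,1}), \Im(\tilde{s}_{k,1}), \ldots, \Re(\tilde{s}_{k,L}), \Im(\tilde{s}_{k,L})] \\
\mathbf{n}_k &= [\Re(\tilde{n}_{k,1}), \Im(\tilde{n}_{k,1}), \ldots, \Re(\tilde{n}_{k,N}), \Im(\tilde{n}_{k,N})]^T,
\end{aligned} \tag{2}$$

and

$$\mathbf{H} = \begin{bmatrix}
\Re(\tilde{H}_{1,1}) & -\Im(\tilde{H}_{1,1}) & \cdots & \Re(\tilde{H}_{1,N}) & -\Im(\tilde{H}_{1,N}) \\
\Im(\tilde{H}_{1,1}) & \Re(\tilde{H}_{1,1}) & \cdots & \Im(\tilde{H}_{1,N}) & \Re(\tilde{H}_{1,N}) \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\Re(\tilde{H}_{N,1}) & -\Im(\tilde{H}_{N,1}) & \cdots & \Re(\tilde{H}_{N,N}) & -\Im(\tilde{H}_{N,N}) \\
\Im(\tilde{H}_{N,1}) & \Re(\tilde{H}_{N,1}) & \cdots & \Im(\tilde{H}_{N,N}) & \Re(\tilde{H}_{N,N})
\end{bmatrix}. \tag{3}$$

In (2) and (3), $\tilde{\mathbf{H}} = \tilde{\mathbf{H}}_k^c[\tilde{\mathbf{P}}_1, \ldots, \tilde{\mathbf{P}}_K]$ is the equivalent channel, $[\cdot]^T$ means vector transposition, and $\Re(\cdot)$ and $\Im(\cdot)$ represent the real and imaginary parts of a complex number, respectively.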
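The real-valued expansion in (2) and (3) can be checked numerically. The following is a minimal sketch in plain Python, assuming a noise-free 2 × 2 example; the helper names and the numeric values are illustrative only.

```python
def real_valued_channel(H):
    """Expand an N x N complex matrix into the 2N x 2N real matrix of (3)."""
    rows = []
    for row in H:
        top, bottom = [], []
        for h in row:
            top += [h.real, -h.imag]     # row producing the real part
            bottom += [h.imag, h.real]   # row producing the imaginary part
        rows += [top, bottom]
    return rows


def interleave(v):
    """Stack a complex vector as [Re v1, Im v1, ..., Re vN, Im vN], as in (2)."""
    out = []
    for x in v:
        out += [x.real, x.imag]
    return out


def matvec(A, x):
    """Plain matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


# Hypothetical 2 x 2 equivalent channel and symbol vector, noise omitted.
H_c = [[1 + 2j, 0.5 - 1j], [-0.3 + 0.7j, 2 - 0.5j]]
s_c = [3 - 1j, -1 + 5j]

r_complex = matvec(H_c, s_c)
r_real = matvec(real_valued_channel(H_c), interleave(s_c))
```

Both models describe the same received signal: `r_real` equals `interleave(r_complex)` up to rounding.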
B. Soft-Output MIMO Signal Detection
Hard-output signal detectors try to recover the original vector $\mathbf{s}_k$, given $\mathbf{r}_k$ and $\mathbf{H}$, whereas the objective of a soft-output detector is to provide reliability information by computing the LLR for each bit of $\mathbf{b}_k$. For the $l$th bit, we have

$$L(b_{k,l} \mid \mathbf{r}_k) = \ln\frac{P(b_{k,l} = 1 \mid \mathbf{r}_k)}{P(b_{k,l} = 0 \mid \mathbf{r}_k)} = L_E(b_{k,l} \mid \mathbf{r}_k) + L_A(b_{k,l}). \tag{4}$$

In (4), $L_A(b_{k,l})$ is the a priori information and $L_E(b_{k,l} \mid \mathbf{r}_k)$ is the extrinsic information. For simplification, we will omit the user index $k$ in the following. According to [7], $L_E(b_l \mid \mathbf{r})$ can be rewritten as

$$L_E(b_l \mid \mathbf{r}) = \ln\frac{\sum_{\mathbf{b} \in \chi_l^1} P(\mathbf{r} \mid b_l)\exp\left(\tfrac{1}{2}\mathbf{b}_{[l]}^T \mathbf{L}_{A[l]}\right)}{\sum_{\mathbf{b} \in \chi_l^0} P(\mathbf{r} \mid b_l)\exp\left(\tfrac{1}{2}\mathbf{b}_{[l]}^T \mathbf{L}_{A[l]}\right)}, \tag{5}$$
where $\chi_l^1$ and $\chi_l^0$ are the sets of bit-level vectors having the $l$th bit equal to 1 and 0, respectively, $\mathbf{b}_{[l]}$ denotes the sub-vector of $\mathbf{b}$ with the $l$th bit $b_l$ omitted, and $\mathbf{L}_{A[l]}$ is the sub-vector of the a priori information vector $\mathbf{L}_A = [L_A(b_1), L_A(b_2), \ldots, L_A(b_{N\log_2 M})]^T$ omitting $L_A(b_l)$. The computation of (5) is usually simplified with the max-log approximation, yielding the maximum a posteriori probability (MAP) algorithm as

$$L(b_l \mid \mathbf{r}) \approx \min_{\mathbf{b} \in \chi_l^0} \frac{1}{N_0}|\mathbf{r} - \mathbf{H}\mathbf{s}|^2 - \min_{\mathbf{b} \in \chi_l^1} \frac{1}{N_0}|\mathbf{r} - \mathbf{H}\mathbf{s}|^2. \tag{6}$$

Note that the a priori information is not considered in (6), meaning that we do not take into account the turbo receiver scheme where the inner detector and the outer decoder exchange extrinsic information iteratively [7].
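To make (6) concrete, the following sketch evaluates the max-log LLRs by exhaustive search on a toy real-valued system. It assumes 4-PAM per real dimension with one possible Gray labeling (the labeling table, the channel values, and the function name are illustrative assumptions, not the paper's configuration).

```python
from itertools import product

# Assumed Gray labeling of 4-PAM: adjacent amplitudes differ in one bit.
GRAY_4PAM = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}


def max_log_llr(r, H, n0):
    """Evaluate (6) by brute force over every candidate symbol vector."""
    n_dim = len(H[0])
    labels = list(GRAY_4PAM)
    best = {}  # (bit index, bit value) -> smallest scaled distance seen
    for combo in product(labels, repeat=n_dim):
        s = [GRAY_4PAM[lab] for lab in combo]
        bits = [b for lab in combo for b in lab]
        dist = sum(
            (ri - sum(h * x for h, x in zip(row, s))) ** 2
            for ri, row in zip(r, H)
        ) / n0
        for l, b in enumerate(bits):
            if (l, b) not in best or dist < best[(l, b)]:
                best[(l, b)] = dist
    # Positive values favour bit 1, matching the sign convention of (6).
    return [best[(l, 0)] - best[(l, 1)] for l in range(2 * n_dim)]


# Noise-free toy example: s = [1, -3] carries the bits 1,1,0,0.
H_example = [[1.0, 0.25], [0.5, 1.0]]
r_example = [0.25, -2.5]
llrs = max_log_llr(r_example, H_example, n0=0.5)
hard_bits = [1 if value > 0 else 0 for value in llrs]
```

The exhaustive loop visits every candidate vector, which is exactly the cost that the tree-search methods discussed next are meant to avoid.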
From a hardware design perspective, tree-search algorithms [15]–[21] are promising alternatives to the direct implementation of (6) due to their effectiveness in confining the detection procedure within a much smaller search space. A tree-search algorithm formulates the detection as a $2N$-depth $\sqrt{M}$-ary tree search problem by rewriting the Euclidean distance as $|\mathbf{y} - \mathbf{R}\mathbf{s}|^2$, where $\mathbf{R}$ is an upper triangular matrix obtained by $\mathbf{H} = \mathbf{Q}\mathbf{R}$, $\mathbf{y} = \mathbf{Q}^H\mathbf{r}$, and $\mathbf{Q}$ is a unitary matrix. Starting from the top ($2N$th) layer, the calculation of the Euclidean distance $T$ is carried out recursively as

$$T_i = T_{i+1} + \mathrm{inc}_i, \qquad \mathrm{inc}_i = \Big|y_i - \sum_{j=i+1}^{2N} R_{ij}s_j - R_{ii}s_i\Big|^2 = |y'_i - R_{ii}s_i|^2, \tag{7}$$

where $T_i$ is the partial Euclidean distance (PED) at the $i$th layer. The soft-output tree-search algorithm generates a list $\mathcal{L}$ of candidate vectors by going through the tree and finds the two elements of (6) within the list, i.e.,

$$L(b_l \mid \mathbf{r}) \approx \min_{\mathbf{b} \in \mathcal{L} \cap \chi_l^0} \frac{1}{N_0}|\mathbf{r} - \mathbf{H}\mathbf{s}|^2 - \min_{\mathbf{b} \in \mathcal{L} \cap \chi_l^1} \frac{1}{N_0}|\mathbf{r} - \mathbf{H}\mathbf{s}|^2. \tag{8}$$

In tree-search detection, $\mathcal{L} \cap \chi_l^{1/0}$ can be empty. Under such circumstances, a constant value is usually adopted to indicate that $b_l$ equals 1 or 0 with a large probability [16]. In this paper, the breadth-first fixed-complexity sphere decoder (FSD) [10], [19] is explored because of its low computational complexity, completely regular and feed-forward-only dataflow, and near-optimal performance.
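The recursion in (7) can be illustrated with a short sketch that accumulates the PED from the top layer downwards for one candidate vector. It assumes that an upper triangular R and the rotated vector y are already available (the QR step itself is not shown), uses 0-based indices, and the numeric values are made up for illustration.

```python
def ped_path(y, R, s):
    """Return the PED T_i for every layer, accumulated from the top layer down.

    Layers are indexed 0 .. n-1 here, so the top layer of (7) is index n-1
    and peds[0] is the full distance |y - R s|^2.
    """
    n = len(y)
    total = 0.0
    peds = [0.0] * n
    for i in range(n - 1, -1, -1):
        # y'_i: received value with already-decided symbols cancelled.
        y_prime = y[i] - sum(R[i][j] * s[j] for j in range(i + 1, n))
        total += (y_prime - R[i][i] * s[i]) ** 2   # inc_i
        peds[i] = total
    return peds


# Hypothetical 3-layer example.
R_example = [[2.0, 1.0, -1.0], [0.0, 1.5, 0.5], [0.0, 0.0, 1.0]]
y_example = [0.5, 1.0, 2.0]
s_example = [1, -1, 3]
peds = ped_path(y_example, R_example, s_example)
```

Because R is upper triangular, each increment depends only on symbols already chosen in higher layers, which is what allows the distance to be built up layer by layer during the tree search.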
III. SOFT-OUTPUT DETECTION ALGORITHMS
In this section, we develop detection algorithms for SM, SDMA, and SD MIMO modes, respectively. The proposed algorithms feature low computational complexity and share similar mathematical operations that can be conveniently integrated into a single VLSI architecture. In detail, we focus on the modification of FSD for SM and SDMA modes. For SD mode, we propose an extensively simplified MAP algorithm by leveraging the orthogonality of Alamouti signals and the matrix-decomposition operations.
A. Low-Complexity LLR Generation Based on FSD
FSD divides the real-valued search tree into two parts using a parameter $D$. A full search is performed in the first $D$ layers, exhaustively expanding all $\sqrt{M}$ branches per node, while in the remaining $(2N - D)$ layers a single search is adopted, expanding only the one best branch per node. It has been analyzed in [22] that FSD achieves close-to-ML performance if $(D + 1)^2 \geq 2N$. For example, $D = 2$ allows the FSD to present asymptotic ML performance for a MIMO system with $N = 4$. However, FSD is more efficient at finding the ML solution in a hard-output scenario than at generating a list of vectors around the ML result, which results in poor performance from a soft-output perspective [19]. In this section, we extend the original FSD to provide accurate soft values while maintaining its low computational complexity. To this end, we utilize a symbol-level bit-flipping scheme for performance improvement and a polygon-shaped constraint technique to reduce unnecessary node extensions.
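The two-stage FSD traversal described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: it assumes a real-valued upper triangular R with non-zero diagonal, uses 0-based layer indices with the top layer last, and the function names and example values are hypothetical.

```python
from itertools import product


def meets_fsd_condition(d_full, n_antennas):
    """Check the close-to-ML condition (D + 1)^2 >= 2N quoted from [22]."""
    return (d_full + 1) ** 2 >= 2 * n_antennas


def nearest(alphabet, x):
    """Slice x to the closest constellation point."""
    return min(alphabet, key=lambda a: abs(a - x))


def fsd_candidates(y, R, alphabet, d_full):
    """Full expansion in the top d_full layers, single expansion below."""
    n = len(y)
    candidates = []
    for top in product(alphabet, repeat=d_full):
        s = [0.0] * n
        dist = 0.0
        for i in range(n - 1, -1, -1):
            y_prime = y[i] - sum(R[i][j] * s[j] for j in range(i + 1, n))
            depth = n - 1 - i
            if depth < d_full:
                s[i] = top[depth]                             # every branch
            else:
                s[i] = nearest(alphabet, y_prime / R[i][i])   # best branch only
            dist += (y_prime - R[i][i] * s[i]) ** 2
        candidates.append((dist, s))
    return candidates


# Hypothetical noise-free 4-layer example with 4-PAM symbols.
PAM4 = [-3, -1, 1, 3]
R_example = [
    [2.0, 0.5, -0.25, 1.0],
    [0.0, 1.5, 0.5, -0.5],
    [0.0, 0.0, 1.0, 0.25],
    [0.0, 0.0, 0.0, 0.5],
]
y_example = [-1.25, -2.5, 2.75, -0.5]   # equals R_example applied to [1, -3, 3, -1]
cands = fsd_candidates(y_example, R_example, PAM4, d_full=1)
best_dist, best_vector = min(cands, key=lambda c: c[0])
```

The list always holds exactly `len(alphabet) ** d_full` vectors, which is the fixed-complexity property; for the 4 × 4 case in the text, `meets_fsd_condition(2, 4)` holds since 9 ≥ 8.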
1) LLR Accuracy Improvement by Modified Bit-Flipping:
Compared to the hard-output ML detection, i.e.,

$$\mathbf{s}^{ML} = \arg\min_{\mathbf{s} \in \mathcal{O}_{\sqrt{M}}^{2N}} \frac{1}{N_0}|\mathbf{r} - \mathbf{H}\mathbf{s}|^2, \tag{9}$$

where $\mathcal{O}_{\sqrt{M}}^{2N}$ denotes the set of all $2N$-dimensional real-valued symbol vectors, the soft-output detection consists of two minima search procedures, as demonstrated in (6). One of them is obtained by (9) and is referred to as $T^{ML} = \frac{1}{N_0}|\mathbf{r} - \mathbf{H}\mathbf{s}^{ML}|^2$. The other can then be formulated as

$$T_l^{\overline{ML}} = \min_{\mathbf{b} \in \chi_l^{\overline{ML}}} \frac{1}{N_0}|\mathbf{r} - \mathbf{H}\mathbf{s}_l^{\overline{ML}}|^2, \tag{10}$$

in which $\chi_l^{\overline{ML}}$ is the set of bit vectors whose $l$th bit is the binary complement of the $l$th bit in the ML bit vector $\mathbf{b}^{ML}$. Basically, there are two major reasons why FSD tends to generate poor-quality LLRs. The first is the occurrence of vacant bits in the candidate list $\mathcal{L}$ corresponding to $\chi_l^{\overline{ML}}$ (i.e., $\mathcal{L} \cap \chi_l^{\overline{ML}} = \emptyset$), which exists in most tree-search algorithms. The second is that, even for the bits that are present, FSD cannot ensure the minimization of (10). This is because, unlike K-Best detection, where strict sorting is performed at every layer, FSD simply extends all nodes at the first $D$ layers and only one at the remaining layers. Such a tree traversal scheme does not guarantee the inclusion of the best vectors (i.e., vectors with the smallest Euclidean distances) in the candidate list.
To tackle these two issues with reasonable complexity overhead, we suggest a modified bit-flipping scheme that replaces the whole-vector re-calculation [23] with a per-symbol re-calculation. Its basic idea is as follows: when calculating the LLR $L(b_{i,l})$ for the $l$th bit in the $i$th scalar symbol $s_i$, the strategy is to first find the locally best symbol whose $l$th bit value differs from $b_{i,l}^{ML}$, i.e.,

$$s_{i,l}^{BF} = \arg\min_{b_{i,l} \neq b_{i,l}^{ML}} |y'_i - R_{i,i}s_i|^2, \tag{11}$$

and then compute the bit-flipped LLR by

$$L^{BF}(b_{i,l} \mid y'_i) = |y'_i - R_{i,i}s_i^{ML}|^2 - |y'_i - R_{i,i}s_{i,l}^{BF}|^2 = \mathrm{inc}_i^{ML} - \mathrm{inc}_{i,l}^{BF}. \tag{12}$$

In (11) and (12), $y'_i$ is the received symbol at the $i$th layer with the interference from previously detected signals canceled, and $\mathbf{b}_i^{ML}$ is the bit-level vector corresponding to $s_i^{ML}$, the $i$th scalar symbol of the ML vector. It should be pointed out that although the ML result is obtained by minimizing $|\mathbf{y} - \mathbf{R}\mathbf{s}|^2$, it is not guaranteed to be locally best, i.e., $\mathrm{inc}_i^{ML}$ is not necessarily smaller than $\mathrm{inc}_{i,l}^{BF}$. Therefore, the sign of $L^{BF}(b_{i,l} \mid y'_i)$ should be adjusted to positive or negative according to the corresponding bit value $b_{i,l}^{ML}$. It should also be mentioned that $\mathrm{inc}_i^{ML}$ in (12) has already been calculated during the tree search and can thus be reused in bit-flipping to save hardware.

Fig. 2. Polygon-shaped constraint with $L_{2N-1} = [5, 5, 3, 3, 1, 1, 0, 0]$.
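The symbol-level bit-flipping of (11) and (12) can be sketched for a single real-valued layer as follows. Several points here are assumptions made for illustration and are not confirmed by the text: the specific Gray labeling, the use of the magnitude of the increment difference, the sign convention (positive when the ML bit is 1, in line with (6)), and the omission of any 1/N0 scaling.

```python
def gray_pam(bits_per_dim):
    """Build an assumed Gray labeling: bit tuple -> PAM amplitude."""
    m = 1 << bits_per_dim
    table = {}
    for k in range(m):
        g = k ^ (k >> 1)   # binary-reflected Gray code of the level index
        label = tuple((g >> (bits_per_dim - 1 - b)) & 1 for b in range(bits_per_dim))
        table[label] = 2 * k - (m - 1)
    return table


def bit_flip_llrs(y_prime, r_ii, s_ml, table):
    """Per-bit LLRs for one layer from (11) and (12)."""
    inverse = {amp: label for label, amp in table.items()}
    b_ml = inverse[s_ml]
    inc_ml = (y_prime - r_ii * s_ml) ** 2   # already known from the tree search
    llrs = []
    for l, bit in enumerate(b_ml):
        # (11): locally best symbol whose l-th bit differs from the ML bit.
        inc_bf = min(
            (y_prime - r_ii * amp) ** 2
            for label, amp in table.items()
            if label[l] != bit
        )
        magnitude = abs(inc_ml - inc_bf)    # (12), sign handled separately
        llrs.append(magnitude if bit == 1 else -magnitude)
    return llrs


# Hypothetical layer: y' = 0.8, R_ii = 1, ML symbol +1 with label (1, 1).
table_4pam = gray_pam(2)
layer_llrs = bit_flip_llrs(0.8, 1.0, 1, table_4pam)
```

The exhaustive `min` over flipped symbols is used here for clarity; a hardware-oriented design would replace it with a cheaper selection rule.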
So far, we may have two possible LLRs for each bit, which are acquired by the FSD tree search ($L_{b_l}^{FSD}$) and by the bit-flipping scheme presented above ($L_{b_l}^{BF}$). The final result is selected according to the magnitude of these two candidates:

$$L(b_l) = \begin{cases} L_{b_l}^{BF}, & \text{if } |L_{b_l}^{BF}| < |L_{b_l}^{FSD}| \text{ or } \mathcal{L} \cap \chi_l^{\overline{ML}} = \emptyset \\ L_{b_l}^{FSD}, & \text{otherwise.} \end{cases} \tag{13}$$

The selection criterion in (13) finds the minimum of $|L_{b_l}^{BF}|$ and $|L_{b_l}^{FSD}|$, which is efficient in relieving the problem of obtaining a pseudo-minimum of (10), leading to a more accurate approximation of the MAP result.
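The selection rule of (13) amounts to a small comparison per bit. The sketch below is illustrative; representing an empty counter-hypothesis set by `None` is a convention chosen here, and the strict comparison is an assumption about how ties are resolved.

```python
from typing import Optional


def select_llr(llr_bf: float, llr_fsd: Optional[float]) -> float:
    """Pick the final LLR for one bit following (13).

    llr_fsd is None when the FSD candidate list contains no vector whose
    bit is the complement of the ML bit, in which case the bit-flipped
    value is the only one available.
    """
    if llr_fsd is None or abs(llr_bf) < abs(llr_fsd):
        return llr_bf
    return llr_fsd


# Hypothetical per-bit values from the two sources.
final_llrs = [
    select_llr(3.2, 5.0),     # bit-flipping found the smaller magnitude
    select_llr(-4.8, -1.5),   # the tree-search value is kept
    select_llr(2.0, None),    # vacant bit in the candidate list
]
```

Taking the smaller magnitude is conservative: the detector never reports more confidence in a bit than either source supports.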
2) Complexity Reduction with Polygon-Shaped Constraint:
According to the analysis in Section III-A1, the exhaustive expansion at the upper layers of the FSD tree introduces a lot of computational waste by including vectors with large Euclidean distances in the candidate list. To reduce such unnecessary node visits, we adopt the imbalanced-expansion technique proposed in [24] to find a list of vectors closer to the ML result. This technique is briefly repeated here for convenience of presentation.
The concept of imbalanced expansion is to approximate the circular-shaped constraint in a sphere decoder [17] with a polygon-shaped constraint, as illustrated in Fig. 2. The polygon constraint is realized by introducing an extension number limitation $L_i^m$, by which only the $L_i^m$ best nodes are extended from the $m$th father node at the $i$th ($i > 2N - D$) layer. The detailed explanation can be found in [24], where a smaller $L_i^m$ is set for a node with larger PED, so that more branches are expanded from more reliable nodes and fewer from less reliable ones. Moreover, considering that the FSD performs full extension only at the first two real-valued tree-search layers for a 4 × 4 system, $L_i^m$ is applied only at the $(2N-1)$th layer, i.e., $i = 2N - 1$ (the constraint on the top layer is accomplished by setting the corresponding $L_{2N-1}^m$ to 0).
Compared to the radius constraint, the polygon-shaped constraint is more efficient from a hardware implementation perspective. First, with a radius constraint $r^2$, a node that violates the constraint is not pruned until its PED has been completely calculated and compared with $r^2$. In contrast, the polygon-shaped constraint with extension number limitation prunes less reliable paths early, before PED calculation, by using the well-known zigzag enumeration technique [25]. Moreover, the number of nodes to be extended is fixed for a given constraint $L_{2N-1} = [L_{2N-1}^1, \cdots, L_{2N-1}^{\sqrt{M}}]$, which is not the case for radius-constrained algorithms, where the node extension number varies with the channel and the noise. Therefore, the polygon-constraint algorithm has a very regular data flow, and the corresponding control circuitry can be significantly simplified. Finally, the proposed scheme makes it convenient to tune the complexity-performance tradeoff by setting $L_{total} = \sum L_{2N-1}$ to a smaller or larger number.
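The extension-number limitation can be illustrated for the top two layers with a short sketch. It uses the limitation vector from the caption of Fig. 2 and assumes 8-PAM per real dimension (64-QAM); the sorting-based `closest_points` helper stands in for true zigzag enumeration, and all names and channel values are hypothetical.

```python
def closest_points(alphabet, x, count):
    """Return the `count` constellation points nearest to x, nearest first.

    A zigzag enumerator would produce the same order without sorting;
    sorting is used here only to keep the sketch short.
    """
    return sorted(alphabet, key=lambda a: abs(a - x))[:count]


def polygon_expand(y, R, alphabet, limits):
    """Expand the top two layers under the limitation vector `limits`.

    Parents at the top layer are ranked by PED; the m-th best parent is
    allowed limits[m] children, and a limit of 0 prunes that parent.
    """
    n = len(y)
    top, second = n - 1, n - 2
    parents = sorted(alphabet, key=lambda a: (y[top] - R[top][top] * a) ** 2)
    paths = []
    for parent, limit in zip(parents, limits):
        ped_top = (y[top] - R[top][top] * parent) ** 2
        y_prime = y[second] - R[second][top] * parent
        for child in closest_points(alphabet, y_prime / R[second][second], limit):
            ped = ped_top + (y_prime - R[second][second] * child) ** 2
            paths.append((ped, parent, child))
    return paths


PAM8 = [-7, -5, -3, -1, 1, 3, 5, 7]
LIMITS = [5, 5, 3, 3, 1, 1, 0, 0]    # vector shown in the caption of Fig. 2
R_top = [[1.0, 0.5], [0.0, 1.0]]      # hypothetical 2 x 2 corner of R
y_top = [0.3, 2.2]
paths = polygon_expand(y_top, R_top, PAM8, LIMITS)
```

The number of surviving paths is always `sum(LIMITS)` regardless of the channel or noise, compared with 64 for full expansion of two 8-ary layers.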
3) Application to SDMA Mode: The aforementioned FSD detection was originally developed for SM signals. In the following, the algorithm is modified for the SDMA mode. Detecting downlink SDMA signals is unique in that only the signals dedicated to the $k$th user (i.e., $\tilde{s}_k$) are kept, while the signals intended for other users (i.e., $\tilde{s}_l$, $l \neq k$) are discarded after detection [26]. To fully exploit this feature, we add a layer-reordering step to the pre-processing stage of FSD such that the desired signal $\tilde{s}_k$ is moved to the top layer of the FSD search tree, where multiple candidates are extended. The reordering is accomplished by a permutation matrix $\mathbf{W}_k$, which moves the $k$th column of the channel matrix $\tilde{\mathbf{H}}_k$ to the last position, i.e., $\mathbf{W}_k = [\mathbf{w}_1, \ldots, \mathbf{w}_{k-1}, \mathbf{w}_{k+1}, \ldots, \mathbf{w}_K, \mathbf{w}_k]$, where $\mathbf{w}_i$ denotes an $N \times 1$ vector whose $i$th element is one and whose other elements are zero. Therefore, the system model can be rewritten as

$$\tilde{\mathbf{r}}_k = \tilde{\mathbf{H}}_k \mathbf{W}_k \tilde{\mathbf{s}}_k^P + \tilde{\mathbf{n}}_k = \tilde{\mathbf{H}}_k^P \tilde{\mathbf{s}}_k^P + \tilde{\mathbf{n}}_k, \tag{14}$$
where $\tilde{\mathbf{s}}_k^P = [\tilde{s}_1, \ldots, \tilde{s}_{k-1}, \tilde{s}_{k+1}, \ldots, \tilde{s}_K, \tilde{s}_k]^T$ is the transmit vector with $\tilde{s}_k$ moved to the last position. Taking the reordered channel matrix $\tilde{\mathbf{H}}_k^P$ as an input, the imbalanced-FSD tree search in Section III-A2 is then conducted to obtain a list of candidate vectors, based on which the LLRs corresponding to $\tilde{s}_k$ are computed.
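The layer reordering can be checked with a few lines of code: moving the $k$th column of the channel and the $k$th symbol to the last position together leaves the received vector unchanged. This is a minimal sketch with made-up real-valued numbers and hypothetical helper names.

```python
def move_to_last(items, k):
    """Move item k (1-based, as in the text) to the last position."""
    idx = k - 1
    return items[:idx] + items[idx + 1:] + [items[idx]]


def permute_columns(H, k):
    """Apply the column permutation W_k to every row of H."""
    return [move_to_last(row, k) for row in H]


def matvec(A, x):
    """Plain matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


# Hypothetical 3-user example in which user k = 2 is the desired one.
H_example = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
s_example = [1, -1, 3]
k = 2

H_reordered = permute_columns(H_example, k)
s_reordered = move_to_last(s_example, k)
r_original = matvec(H_example, s_example)
r_reordered = matvec(H_reordered, s_reordered)
```

After reordering, the desired user's symbol sits in the last column, which corresponds to the top layer of the search tree once the QR decomposition is applied.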
Since multiple candidates of $\tilde{s}_k$ are extended at the upper layers, it can be expected that $\tilde{s}_k$ has good diversity in its bit values in the final list. Moreover, the single extension at the remaining layers attaches only the best node of $\tilde{s}_l$, $l \neq k$, to the candidates of $\tilde{s}_k$. Hence, it is highly likely that the final list contains the actual minimum of (10) for the bits corresponding to $\tilde{s}_k$. In view of the above analysis, the candidate list obtained by the layer-reordered FSD tree search is good enough to generate high-quality soft values for $\tilde{s}_k$. We can therefore turn off the bit-flipping operation in SDMA mode to reduce power consumption.
