A* Algorithm Inspired Memory-Efficient Detection for MIMO Systems

18 Jul 2012-IEEE Wireless Communications Letters (IEEE)-Vol. 1, Iss: 5, pp 508-511

TL;DR: Modified best-first detection algorithms in which the order of nodes is determined by both the original cost and the estimated future cost associated with each node are proposed, as inspired by an improved shortest path algorithm (A* algorithm).

Abstract: Implementation of a best-first detection algorithm for multiple-input multiple-output (MIMO) systems requires large amounts of memory, especially in large systems with high-order modulation. In this letter, we propose modified best-first detection algorithms in which the order of nodes is determined by both the original cost and the estimated future cost associated with each node, as inspired by an improved shortest path algorithm (A* algorithm). The modified algorithms maintain the detection optimality, reduce the memory requirement and sorting complexity, and achieve improved detection performance in memory-constrained scenarios.

Topics: A* search algorithm (56%), Dijkstra's algorithm (55%), Sorting (53%), MIMO (52%)

Summary (1 min read)

Introduction

  • Best-first search (BFS) detection schemes [2]–[6] based on Dijkstra’s (or stack) algorithm maintain a list (stack) of nodes sorted by some defined cost and explore the nodes in that order.
  • Imposing a memory constraint [6] facilitates hardware implementation and reduces the search complexity at the cost of some performance degradation.
  • The proposed methods are described in Sec. III, with complexity and performance results presented in Sec. IV.

II. TRANSMISSION SYSTEM AND BEST-FIRST DETECTION

  • Transmitted symbol vector $\tilde{\mathbf{x}}_c$ contains uncorrelated entries selected equiprobably from the squared quadrature amplitude modulation (QAM) alphabet $\mathcal{S} = \{a + ib \mid a, b \in \mathcal{Q}\}$ and has zero mean and covariance matrix $\sigma_x^2 \mathbf{I}_{N_T}$, where $\mathcal{Q}$ is the pulse amplitude modulation (PAM) alphabet and $\mathbf{I}_{N_T}$ is the $N_T \times N_T$ identity matrix.
  • $\mathbf{H}_c$ has independent and identically distributed (i.i.d.) Gaussian entries with zero mean and covariance matrix $\sigma_H^2 \mathbf{I}_{N_R}$, where $\sigma_H^2 = 1$.
  • The channel information is assumed perfectly known to the receiver.
  • The authors reach (9) by rewriting the objective function, where the second and third terms do not depend on $\mathbf{x}_1^{k-1}$.

A. Complexity Evaluation

  • Here, the authors evaluate the overall computational complexity of the proposed algorithms in comparison with conventional methods.
  • Since all processing is conducted on real values based on (2), all the calculations below refer to real operations.
  • The complexity of a tree-search detection scheme is evaluated in terms of the number of nodes visited and expanded (defined respectively by nodes that ever occupy a position and become the best node in the node list).
  • Similar calculations can be carried out for the BFS-LA2 algorithm.

B. Simulation Results

  • Here, the authors present the simulation results: symbol error rate (SER) performance in Fig. 1, memory usage in Fig. 2, and complexity in terms of floating-point operations in Table I (one real multiplication/addition each counts a flop).
  • Similar observations can be made in Fig. 1(b).
  • Fig. 2 illustrates the memory-reduction capability of the proposed schemes.

V. CONCLUSION

  • Modified BFS-based MIMO detection algorithms incorporating an efficient look-ahead mechanism have been presented.
  • Simulation results demonstrated that the proposed algorithms maintain exact ML detection capability while achieving memory savings and enhanced performance in memory-constrained scenarios.
  • Complexity analysis was conducted to confirm the computational feasibility of the proposed algorithms.


508 IEEE WIRELESS COMMUNICATIONS LETTERS, VOL. 1, NO. 5, OCTOBER 2012
A* Algorithm Inspired Memory-Efficient Detection for MIMO Systems
Ronald Y. Chang, Wei-Ho Chung, Member, IEEE, and Sian-Jheng Lin
Abstract: Implementation of a best-first detection algorithm for multiple-input multiple-output (MIMO) systems requires large amounts of memory, especially in large systems with high-order modulation. In this letter, we propose modified best-first detection algorithms in which the order of nodes is determined by both the original cost and the estimated future cost associated with each node, as inspired by an improved shortest path algorithm (A* algorithm). The modified algorithms maintain the detection optimality, reduce the memory requirement and sorting complexity, and achieve improved detection performance in memory-constrained scenarios.
Index Terms: Maximum likelihood (ML) decoding, multiple-input multiple-output (MIMO) systems, tree-search detection, Dijkstra’s algorithm, A* algorithm, memory efficiency.
I. INTRODUCTION
The multiple-input multiple-output (MIMO) detection problem can be viewed as a tree-search problem [1]. Best-first search (BFS) detection schemes [2]–[6] based on Dijkstra’s (or stack) algorithm maintain a list (stack) of nodes sorted by some defined cost and explore the nodes in that order. While BFS detection can minimize the number of searched nodes needed to establish the maximum likelihood (ML) solution [4], it has a prohibitively large memory requirement. Imposing a memory constraint [6] facilitates hardware implementation and reduces the search complexity at the cost of some performance degradation. Modifying the sorting criterion, such as by using a biased cost [3], can improve the error and complexity performance of a BFS scheme in memory-constrained scenarios, yet at the loss of detection optimality. The choice of the bias is also heuristic, requiring empirical efforts to determine proper values. Optimal detection performance in memory-constrained scenarios is guaranteed by the scheme proposed in [7], which combines the memory-efficient sphere decoder and the computationally efficient Dijkstra’s algorithm. However, it generally searches more nodes than the original BFS scheme.
In this letter, we propose an optimal BFS detection scheme inspired by the A* algorithm [8], which speeds up the original Dijkstra’s algorithm without losing algorithm optimality (shortest path is guaranteed). By adding a new term to the original cost of nodes, the proposed modified BFS scheme demonstrates memory efficiency and improved error performance over conventional BFS detectors in memory-constrained scenarios. Simulation and complexity studies also show the tradeoff between error and memory performances and the computations required for obtaining the added term.
This letter is organized as follows. Sec. II presents the system model and BFS detection. The proposed methods are described in Sec. III, with complexity and performance results presented in Sec. IV. Conclusion is given in Sec. V.

Manuscript received June 18, 2012. The associate editor coordinating the review of this letter and approving it for publication was G. Vitetta.
This work was supported by the National Science Council of Taiwan under Grant NSC 100-2221-E-001-004.
The authors are with the Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan (e-mail: yjrchang@gmail.com, {whc, sjlin}@citi.sinica.edu.tw).
Digital Object Identifier 10.1109/WCL.2012.071612.120450
II. TRANSMISSION SYSTEM AND BEST-FIRST DETECTION
We consider an uncoded MIMO transmission system with $N_T$ ($N_R$) transmit (receive) antennas (denoted by an $N_T \times N_R$ system). The baseband signal model is given by

$$\mathbf{y}_c = \mathbf{H}_c \tilde{\mathbf{x}}_c + \mathbf{v}_c \tag{1}$$

where $\mathbf{y}_c$ is the $N_R \times 1$ received signal containing the $N_T \times 1$ transmitted signal $\tilde{\mathbf{x}}_c$ perturbed by the $N_R \times N_T$ uncorrelated flat-fading channel $\mathbf{H}_c$ and the $N_R \times 1$ noise $\mathbf{v}_c$. Transmitted symbol vector $\tilde{\mathbf{x}}_c$ contains uncorrelated entries selected equiprobably from the squared quadrature amplitude modulation (QAM) alphabet $\mathcal{S} = \{a + ib \mid a, b \in \mathcal{Q}\}$ and has zero mean and covariance matrix $\sigma_x^2 \mathbf{I}_{N_T}$, where $\mathcal{Q}$ is the pulse amplitude modulation (PAM) alphabet and $\mathbf{I}_{N_T}$ is the $N_T \times N_T$ identity matrix. Complex-valued channel matrix $\mathbf{H}_c$ has independent and identically distributed (i.i.d.) Gaussian entries with zero mean and covariance matrix $\sigma_H^2 \mathbf{I}_{N_R}$, where $\sigma_H^2 = 1$. The channel information is assumed perfectly known to the receiver. Noise $\mathbf{v}_c$ is additive white Gaussian noise (AWGN) with i.i.d. complex elements and has zero mean and covariance matrix $\sigma_v^2 \mathbf{I}_{N_R}$.
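The complex signal model in (1) can be transformed into an equivalent real signal model by defining $\mathbf{y}' = [\Re(\mathbf{y}_c)^T \; \Im(\mathbf{y}_c)^T]^T$, $\tilde{\mathbf{x}} = [\Re(\tilde{\mathbf{x}}_c)^T \; \Im(\tilde{\mathbf{x}}_c)^T]^T$, $\mathbf{v} = [\Re(\mathbf{v}_c)^T \; \Im(\mathbf{v}_c)^T]^T$, and

$$\mathbf{H} = \begin{bmatrix} \Re(\mathbf{H}_c) & -\Im(\mathbf{H}_c) \\ \Im(\mathbf{H}_c) & \Re(\mathbf{H}_c) \end{bmatrix}$$

where $\Re(\cdot)$ and $\Im(\cdot)$ denote the real and imaginary parts of their argument, respectively. The real signal model is given by

$$\mathbf{y}' = \mathbf{H} \tilde{\mathbf{x}} + \mathbf{v} \tag{2}$$

where $\mathbf{y}' \in \mathbb{R}^n$, $\mathbf{H} \in \mathbb{R}^{n \times m}$, $\tilde{\mathbf{x}} \in \mathcal{Q}^m$, and $\mathbf{v} \in \mathbb{R}^n$, with $n = 2N_R$ and $m = 2N_T$. We hereafter assume $m = n$ for presentation brevity.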
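The complex-to-real mapping described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names `to_real_model`, `stack_real`, and `matvec` are hypothetical and do not come from the letter, and matrices are represented as lists of rows.

```python
def to_real_model(Hc, yc):
    """Map a complex channel Hc (list of rows) and received vector yc
    to the real-valued pair (H, y) of the equivalent real model."""
    # Top block row: [Re(Hc), -Im(Hc)]
    top = [[h.real for h in row] + [-h.imag for h in row] for row in Hc]
    # Bottom block row: [Im(Hc), Re(Hc)]
    bottom = [[h.imag for h in row] + [h.real for h in row] for row in Hc]
    y = [v.real for v in yc] + [v.imag for v in yc]
    return top + bottom, y


def stack_real(xc):
    """Stack real parts over imaginary parts of a complex vector."""
    return [v.real for v in xc] + [v.imag for v in xc]


def matvec(H, x):
    """Plain matrix-vector product for lists of rows."""
    return [sum(h * v for h, v in zip(row, x)) for row in H]
```

For a noise-free check, multiplying the real matrix by the stacked symbol vector should reproduce the stacked complex product, which is the property the real model relies on.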
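Given the model in (2), the ML symbol detection is to solve

$$\tilde{\mathbf{x}}_{\mathrm{ML}} = \arg\min_{\mathbf{x} \in \mathcal{Q}^m} \|\mathbf{y}' - \mathbf{H}\mathbf{x}\|^2 \tag{3}$$

where $\|\cdot\|$ denotes the $\ell_2$-norm of a vector. By performing the QR decomposition on $\mathbf{H}$ ($\mathbf{H} = \mathbf{Q}\mathbf{R}$), we formulate (3) into an equivalent expression $\tilde{\mathbf{x}}_{\mathrm{ML}} = \arg\min_{\mathbf{x} \in \mathcal{Q}^m} \|\mathbf{y} - \mathbf{R}\mathbf{x}\|^2$, where $\mathbf{y} = \mathbf{Q}^T \mathbf{y}'$. The upper-triangular structure of $\mathbf{R}$ enables the expansion of $\|\mathbf{y} - \mathbf{R}\mathbf{x}\|^2$ in the form

$$(y_m - r_{m,m} x_m)^2 + \cdots + \Bigl(y_1 - \sum_{i=1}^{m} r_{1,i} x_i\Bigr)^2 \tag{4}$$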
where $y_i$ is the $i$th element of $\mathbf{y}$, $x_i$ is the $i$th element of $\mathbf{x}$, and $r_{i,j}$ is the $(i, j)$-entry of $\mathbf{R}$. We denote the $(m - k + 1)$th term in (4) by $b(\mathbf{x}_k^m)$ and the summation of the first $m - k + 1$ terms by $d(\mathbf{x}_k^m)$ ($k = 1, 2, \ldots, m$), where $\mathbf{x}_k^m \triangleq (x_k, \ldots, x_m)^T \in \mathcal{Q}^{m-k+1}$ represents the partial symbol vector. A (rooted) detection tree is created from (4), which consists of a virtual root node, the nonleaf nodes in layers $1, \ldots, m - 1$ each having $|\mathcal{Q}|$ child nodes, and the leaf nodes in layer $m$, where $|\cdot|$ is the cardinality of a set. Each node in layer $m - k + 1$ uniquely represents an $\mathbf{x}_k^m$ and has an associated path metric $d(\mathbf{x}_k^m)$ and branch metric $b(\mathbf{x}_k^m)$. Since $d(\mathbf{x}_1^m)$ of a leaf node equals $\|\mathbf{y} - \mathbf{R}\mathbf{x}\|^2$ evaluated for $\mathbf{x} = \mathbf{x}_1^m$ represented by the node, the objective of optimal detection is to find the leaf node with the smallest path metric among all leaf nodes.

2162-2337/12$31.00 © 2012 IEEE
The BFS detection algorithm maintains a list of nodes sorted in ascending order of their defined cost (denoted by $c$). The cost can be a node’s path metric $d$ [4]–[6], or its biased path metric $d - \epsilon k$ [3] if this node is in layer $k$, where $\epsilon > 0$ is the bias.¹ The conventional BFS algorithm with cost $c$ and list-size constraint $L$ consists of the following iterative steps:
0) Initially, the node list $\mathcal{N}$ contains only the root node.
1) Select the best (first) node from $\mathcal{N}$; if this node is in layer $m$, terminate the algorithm and output it as the solution.
2) Expand the best node by adding all its child nodes to $\mathcal{N}$ and removing itself from $\mathcal{N}$.
3) Order the nodes in $\mathcal{N}$ in ascending order of the cost $c$ and discard nodes beyond the first $\min(|\mathcal{N}|, L)$ nodes.
Slightly abusing the notation, we use BFS($L$) to denote the above algorithm with $c = d$ and BFS($L$, $\epsilon$) to denote the algorithm with $c = d - \epsilon k$.
III. THE PROPOSED BEST-FIRST DETECTION ALGORITHMS
The A* algorithm [8] speeds up the search of the shortest path in a graph by considering both the travelled distance thus far and the estimated distance ahead (the heuristic). If the heuristic is admissible (not over-estimating the real distance), the shortest path is guaranteed. The more accurate the estimate, the better the performance of the algorithm [9]. In a graph where the edge length represents the geographic distance, an admissible heuristic can be the straight-line distance from a node to the destination.
The idea of including a heuristic may be applied to enhance a BFS detection scheme which finds the shortest path from the source (the root node) to the destination (the grouping of all leaf nodes). Since there is no notion of straight-line distances in the detection tree, the heuristic can only be obtained by calculation. The novelty of this work is that two methods of finding admissible heuristics are developed without involving an exhaustive search of the unexplored part of the tree (requiring exponential complexity) and without simply precomputing the BFS iterations (trivial modification). The inclusion of the heuristic is termed “look-ahead (LA).”
A. Look-Ahead One Layer
Suppose that, at some point of the BFS algorithm, a node in layer $m - k + 1$ that represents a specific $\breve{\mathbf{x}}_k^m = (\breve{x}_k, \ldots, \breve{x}_m)^T$ is visited. The existing (known) cost from the source to this node is given by the path metric of this node, $d(\breve{\mathbf{x}}_k^m) = \sum_{j=k}^{m} b(\breve{\mathbf{x}}_j^m)$, and the future (unknown) cost from this node to the destination is given by $\sum_{j=1}^{k-1} b\bigl((\mathbf{x}_j^{k-1}, \breve{\mathbf{x}}_k^m)\bigr)$, where $(\cdot, \cdot)$ denotes the concatenation of two column vectors by placing the second one under the first one. Any lower bound on this future cost constitutes an admissible heuristic for this node. One lower bound is given by the minimum of the $|\mathcal{Q}|$ immediate branch metrics under this node, i.e.,

$$\sum_{j=1}^{k-1} b\bigl((\mathbf{x}_j^{k-1}, \breve{\mathbf{x}}_k^m)\bigr) \ge \min_{x_{k-1} \in \mathcal{Q}} b\bigl((x_{k-1}, \breve{\mathbf{x}}_k^m)\bigr) = \min_{x_{k-1} \in \mathcal{Q}} \bigl(y_{k-1}^{(k)} - r_{k-1,k-1} x_{k-1}\bigr)^2 \triangleq h_1(\breve{\mathbf{x}}_k^m) \tag{5}$$

¹ In this letter, we use $d$ to denote the path metric generally and use $d(\mathbf{x}_k^m)$ to denote the path metric of a specific node; same for $c$ and, later, $h_1$ and $h_2$.
where $y_{k-1}^{(k)}$ is the $(k - 1)$th element of $\mathbf{y}^{(k)} = \mathbf{y} - \sum_{i=k}^{m} \breve{x}_i \mathbf{r}_i$, with $\mathbf{r}_i$ being the $i$th column of $\mathbf{R}$. Note that if finding the minimum in (5) required an exhaustive search over $x_{k-1} \in \mathcal{Q}$, this look-ahead would present little advantage, as it would reduce to performing node expansion in BFS one layer ahead and require the same amount of computation. Fortunately, the minimizing $x_{k-1}$ in (5) is directly given by the one-dimensional zero-forcing (ZF) solution after slicing, i.e., $\bigl\lfloor y_{k-1}^{(k)} / r_{k-1,k-1} \bigr\rceil_{\mathcal{Q}}$, which can be obtained without actually computing the division and slicing. For example, for 4-QAM with $\mathcal{Q} = \{-1, 1\}$, the minimizing $x_{k-1}$ is given by $\mathrm{sgn}\bigl(y_{k-1}^{(k)}\bigr)$, where $\mathrm{sgn}(x) = 1$ if $x \ge 0$ and $-1$ if $x < 0$ (note that $r_{k-1,k-1} > 0$); for 16-QAM with $\mathcal{Q} = \{-3, -1, 1, 3\}$, the minimizing $x_{k-1}$ is given by $2\,\mathrm{sgn}\bigl(y_{k-1}^{(k)}\bigr) + \mathrm{sgn}\bigl(y_{k-1}^{(k)} - 2\,\mathrm{sgn}(y_{k-1}^{(k)})\, r_{k-1,k-1}\bigr)$. As a result, $\bigl(y_{k-1}^{(k)} - r_{k-1,k-1} x_{k-1}\bigr)^2$ in (5) needs to be computed just once rather than $|\mathcal{Q}|$ times. The first heuristic is given by

$$h_1(\breve{\mathbf{x}}_k^m) = \Bigl(y_{k-1}^{(k)} - r_{k-1,k-1} \cdot \bigl\lfloor y_{k-1}^{(k)} / r_{k-1,k-1} \bigr\rceil_{\mathcal{Q}}\Bigr)^2. \tag{6}$$

The new algorithm, referred to as the BFS-LA1($L$) algorithm, modifies the conventional BFS($L$) algorithm by adopting the cost $c = d + h_1$ in Step 3 and terminating the algorithm when the best node selected is in layer $m - 1$ in Step 1.
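The sign-based slicing rules and the heuristic in (6) can be sketched as below. The function names (`sgn`, `slice_4qam`, `slice_16qam`, `h1`, `h1_exhaustive`) are hypothetical, the arguments `y` and `r` stand for the scalars $y_{k-1}^{(k)}$ and $r_{k-1,k-1} > 0$, and the exhaustive variant is included only to cross-check the closed-form rules.

```python
def sgn(x):
    """sgn(x) = 1 if x >= 0 and -1 if x < 0, as defined in the text."""
    return 1 if x >= 0 else -1


def slice_4qam(y, r):
    """Minimizer over Q = {-1, 1}; with r > 0 only the sign of y matters."""
    return sgn(y)


def slice_16qam(y, r):
    """Minimizer over Q = {-3, -1, 1, 3} using two sign tests (r > 0)."""
    s = sgn(y)
    return 2 * s + sgn(y - 2 * s * r)


def h1(y, r, slicer):
    """Heuristic (6): a single squared residual at the sliced ZF point."""
    return (y - r * slicer(y, r)) ** 2


def h1_exhaustive(y, r, alphabet):
    """Reference value: minimum over all |Q| branch metrics, as in (5)."""
    return min((y - r * s) ** 2 for s in alphabet)
```

The closed-form version evaluates one squared residual per node, whereas the reference version evaluates one per alphabet element, which is the saving the paragraph above describes.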
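B. Look-Ahead Multiple Layers
The second and tighter lower bound on the future cost that can be obtained without an exhaustive constellation search and that can construct a meaningful modification is derived by relaxing the constraint on the minimization variable from $\mathbf{x}_1^{k-1} \in \mathcal{Q}^{k-1}$ to $\mathbf{x}_1^{k-1} \in \mathbb{R}^{k-1}$. Specifically,

$$\sum_{j=1}^{k-1} b\bigl((\mathbf{x}_j^{k-1}, \breve{\mathbf{x}}_k^m)\bigr) \ge \min_{\mathbf{x}_1^{k-1} \in \mathcal{Q}^{k-1}} \sum_{j=1}^{k-1} b\bigl((\mathbf{x}_j^{k-1}, \breve{\mathbf{x}}_k^m)\bigr) \tag{7}$$

$$= \min_{\mathbf{x}_1^{k-1} \in \mathcal{Q}^{k-1}} \bigl\|\mathbf{y}^{(k)} - \mathbf{R}^{(k)} \mathbf{x}_1^{k-1}\bigr\|^2 \tag{8}$$

$$= \min_{\mathbf{x}_1^{k-1} \in \mathcal{Q}^{k-1}} \bigl(\mathbf{x}_1^{k-1} - \hat{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1}\bigr)^T \mathbf{R}^{(k)T} \mathbf{R}^{(k)} \bigl(\mathbf{x}_1^{k-1} - \hat{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1}\bigr) + \bigl\|\mathbf{y}^{(k)}\bigr\|^2 - \bigl\|\mathbf{R}^{(k)} \hat{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1}\bigr\|^2 \tag{9}$$

$$\ge \min_{\substack{\mathbf{x}_1^{k-1} \in \mathbb{R}^{k-1} \\ \|\mathbf{x}_1^{k-1} - \hat{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1}\|^2 = \alpha^2}} \bigl(\mathbf{x}_1^{k-1} - \hat{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1}\bigr)^T \mathbf{R}^{(k)T} \mathbf{R}^{(k)} \bigl(\mathbf{x}_1^{k-1} - \hat{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1}\bigr) + \bigl\|\mathbf{y}^{(k)}\bigr\|^2 - \bigl\|\mathbf{R}^{(k)} \hat{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1}\bigr\|^2 \triangleq h_2(\breve{\mathbf{x}}_k^m) \tag{10}$$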
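where $\mathbf{R}^{(k)}$ is an $m \times (k - 1)$ submatrix of $\mathbf{R}$ with columns $\mathbf{r}_k, \mathbf{r}_{k+1}, \ldots, \mathbf{r}_m$ removed, $\hat{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1} = \bigl(\mathbf{R}^{(k)T} \mathbf{R}^{(k)}\bigr)^{-1} \mathbf{R}^{(k)T} \mathbf{y}^{(k)}$ is the unconstrained ZF solution for the reduced-dimension detection problem in (8), and $\alpha^2 \triangleq \bigl\|\tilde{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1} - \hat{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1}\bigr\|^2$ is the squared distance between the constrained ZF solution $\tilde{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1} = \bigl\lfloor \hat{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1} \bigr\rceil_{\mathcal{Q}^{k-1}}$ and $\hat{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1}$. We reach (9) by rewriting the objective function, where the second and third terms do not depend on $\mathbf{x}_1^{k-1}$. From (9) to (10) we have used the fact that the first term in (9) is a convex function in $\mathbf{x}_1^{k-1}$ with the minimum value of zero achieved at $\mathbf{x}_1^{k-1} = \hat{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1}$ when there is no constraint on $\mathbf{x}_1^{k-1}$ (i.e., $\mathbf{x}_1^{k-1} \in \mathbb{R}^{k-1}$). By restricting $\mathbf{x}_1^{k-1}$ to the hypersphere $\bigl\|\mathbf{x}_1^{k-1} - \hat{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1}\bigr\|^2 = \alpha^2$, we are guaranteed to find a positive-valued minimum no greater than that yielded by any $\mathbf{x}_1^{k-1}$ outside the hypersphere (due to convexity) as well as that yielded by $\mathbf{x}_1^{k-1} = \tilde{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1}$ (one special point on the hypersphere). Since $\tilde{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1}$ is the nearest lattice point to $\hat{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1}$ in $\mathcal{Q}^{k-1}$, we have obtained a lower bound.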
The first term in (10) is given by $\alpha^2 \lambda_{\min}$, achieved at $\mathbf{x}_1^{k-1} = \alpha \mathbf{v}_{\min} + \hat{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1}$, where $\lambda_{\min}$ and $\mathbf{v}_{\min}$ are the minimum eigenvalue of $\mathbf{R}^{(k)T} \mathbf{R}^{(k)}$ and the corresponding unit-length eigenvector, respectively. Thus, the second heuristic is given by

$$h_2(\breve{\mathbf{x}}_k^m) = \alpha^2 \lambda_{\min} + \bigl\|\mathbf{y}^{(k)}\bigr\|^2 - \bigl\|\mathbf{R}^{(k)} \hat{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1}\bigr\|^2. \tag{11}$$

As will be verified in Sec. IV, $h_2$ requires more computations than $h_1$ but yields better performance and memory efficiency. When the considered node $\breve{\mathbf{x}}_k^m$ is in layer $m - 1$, “look-ahead multiple layers” reduces to “look-ahead one layer,” and $h_2$ and $h_1$ become identical. Substituting the new cost $c = d + h_2$ in the BFS-LA1($L$) algorithm gives the BFS-LA2($L$) algorithm.
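To make the chain from (8) to (11) concrete, the sketch below evaluates the right-hand side of (11) for the special case of two remaining symbols ($k - 1 = 2$) and compares it with an exhaustive minimum of the objective in (8). Several things here are assumptions made for brevity and are not taken from the letter: the restriction to two unknowns (so that the 2 × 2 inverse and smallest eigenvalue have closed forms, in place of the decomposition and power method mentioned in Sec. IV), the nearest-point slicer, and the names `h2_two_unknowns` and `objective_min`.

```python
from itertools import product
from math import sqrt


def h2_two_unknowns(Rk, yk, alphabet):
    """Evaluate (11) when two symbols remain (k - 1 = 2).

    Rk: m x 2 matrix as a list of rows (stands for R^(k)).
    yk: length-m vector (stands for y^(k)).
    """
    # Gram matrix A = Rk^T Rk = [[a, b], [b, c]] and g = Rk^T yk.
    a = sum(row[0] * row[0] for row in Rk)
    b = sum(row[0] * row[1] for row in Rk)
    c = sum(row[1] * row[1] for row in Rk)
    g0 = sum(row[0] * v for row, v in zip(Rk, yk))
    g1 = sum(row[1] * v for row, v in zip(Rk, yk))
    det = a * c - b * b
    # Unconstrained ZF solution x_hat = A^{-1} g (closed-form 2 x 2 inverse).
    xh = ((c * g0 - b * g1) / det, (a * g1 - b * g0) / det)
    # Constrained ZF solution: slice each entry to the nearest alphabet point.
    xt = tuple(min(alphabet, key=lambda s: abs(s - v)) for v in xh)
    alpha2 = sum((t - v) ** 2 for t, v in zip(xt, xh))
    # Smallest eigenvalue of the symmetric 2 x 2 matrix A.
    lam_min = 0.5 * ((a + c) - sqrt((a - c) ** 2 + 4 * b * b))
    # ||Rk x_hat||^2
    fit = sum((row[0] * xh[0] + row[1] * xh[1]) ** 2 for row in Rk)
    return alpha2 * lam_min + sum(v * v for v in yk) - fit


def objective_min(Rk, yk, alphabet):
    """Exhaustive minimum of ||yk - Rk x||^2 over Q^2, as in (8)."""
    return min(
        sum((v - row[0] * x0 - row[1] * x1) ** 2 for row, v in zip(Rk, yk))
        for x0, x1 in product(alphabet, repeat=2)
    )
```

On any full-column-rank input, the value returned by `h2_two_unknowns` should not exceed the exhaustive minimum, which is the inequality between (8) and (10).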
IV. RESULTS AND DISCUSSIONS
A. Complexity Evaluation
Here, we evaluate the overall computational complexity of the proposed algorithms in comparison with conventional methods. Since all processing is conducted on real values based on (2), all the calculations below refer to real operations.
The complexity of a tree-search detection scheme is evaluated in terms of the number of nodes visited and expanded (defined respectively by nodes that ever occupy a position and become the best node in the node list). We let $I_k^{(\cdot)}$ and $J_k^{(\cdot)}$ be the number of visited and expanded nodes in layer $m - k + 1$, respectively, for the scheme named in the superscript with some $L$ and, if applicable, $\epsilon$. Note that visiting a node in layer $m - k + 1$ entails computing $b(\mathbf{x}_k^m)$ ($m - k + 2$ multiplications and $m - k + 1$ additions) and summing up $d(\mathbf{x}_{k+1}^m)$ and $b(\mathbf{x}_k^m)$ (one addition). Therefore, the total complexity of the BFS algorithm is given by $\sum_{k=1}^{m} I_k^{(\mathrm{BFS})} (m - k + 2)$ multiplications and additions. The complexity of the BFS-LA1 algorithm includes the complexity of running the regular search iterations from layer 1 to $m - 1$, which requires $\sum_{k=2}^{m} I_k^{(\mathrm{BFS\text{-}LA1})} (m - k + 2)$ multiplications and additions, and the complexity of look-ahead (one layer). The complexity of look-ahead for a node in layer $m - k + 1$ is equal to the sum of the complexity of computing $b(\mathbf{x}_{k-1}^m)$ once ($m - k + 3$ multiplications and $m - k + 2$ additions) and the complexity of adding up $d$ and $h_1$ (one addition). The number of nodes that require such computations is equal to the number of visited but nonexpanded nodes, which is $I_k^{(\mathrm{BFS\text{-}LA1})} - J_k^{(\mathrm{BFS\text{-}LA1})}$ for layers $1, \ldots, m - 2$ and $I_k^{(\mathrm{BFS\text{-}LA1})}$ for layer $m - 1$. Collecting these results, the total complexity of the BFS-LA1 algorithm is given by $\sum_{k=2}^{m} I_k^{(\mathrm{BFS\text{-}LA1})} (2m - 2k + 5) - \sum_{k=3}^{m} J_k^{(\mathrm{BFS\text{-}LA1})} (m - k + 3)$ multiplications and additions.
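The two operation counts above translate directly into code. The sketch below is illustrative only: the function names `bfs_ops` and `bfs_la1_ops` are hypothetical, and the node counts are passed as lists indexed by $k$ (index 0 unused), an indexing convention chosen here for readability. Each function returns the number of multiplications, which by the expressions above equals the number of additions.

```python
def bfs_ops(I):
    """Multiplications (= additions) for BFS.

    I[k] is the number of visited nodes in layer m-k+1, k = 1..m;
    I[0] is unused padding so that indices match the text.
    """
    m = len(I) - 1
    return sum(I[k] * (m - k + 2) for k in range(1, m + 1))


def bfs_la1_ops(I, J):
    """Multiplications (= additions) for BFS-LA1.

    I[k], J[k]: visited and expanded nodes in layer m-k+1 (index 0 unused).
    """
    m = len(I) - 1
    visit = sum(I[k] * (2 * m - 2 * k + 5) for k in range(2, m + 1))
    saved = sum(J[k] * (m - k + 3) for k in range(3, m + 1))
    return visit - saved
```

As a small worked case with $m = 3$, visited counts $I_2 = I_3 = 4$ and expanded counts $J_2 = J_3 = 1$ give $20$ operations for the regular iterations plus $16 + 9$ for the look-ahead, i.e. $45$ in total, which is what `bfs_la1_ops` returns.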
Similar calculations can be carried out for the BFS-LA2 algorithm. Here, the complexity of look-ahead (multiple layers) includes the computation of $\lambda_{\min}$, $\hat{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1}$, and some matrix/vector manipulations (note that $\mathbf{y}^{(k)}$ is already available given $d(\breve{\mathbf{x}}_k^m)$). Note that $\bigl(\mathbf{R}^{(k)T} \mathbf{R}^{(k)}\bigr)^{-1} \mathbf{R}^{(k)T}$ and $\lambda_{\min}$ can be precomputed, one time per layer, but $\alpha^2$, $\bigl\|\mathbf{y}^{(k)}\bigr\|^2$, and $\bigl\|\mathbf{R}^{(k)} \hat{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1}\bigr\|^2$ need to be computed for each node visited. We compute matrix/vector computations by direct multiplications and accumulations, matrix inverse by the efficient $\mathbf{LDL}^H$ decomposition method [10], and $\lambda_{\min}$ by the power method [11] applied on $\bigl(\mathbf{R}^{(k)T} \mathbf{R}^{(k)}\bigr)^{-1}$ to obtain its dominant (largest) eigenvalue. Summing numbers up, the total computation counts for the BFS-LA2 algorithm are

$$\sum_{k=2}^{m} \Bigl[ I_k^{(\mathrm{BFS\text{-}LA2})} (km - k + m + 3) + \tfrac{7}{6} k^3 - \tfrac{8}{3} k + \tfrac{3}{2} \Bigr]$$

multiplications and

$$\sum_{k=2}^{m} \Bigl[ I_k^{(\mathrm{BFS\text{-}LA2})} (km - k + m + 2) + \tfrac{7}{6} k^3 - \tfrac{5}{2} k^2 + \tfrac{17}{6} k - \tfrac{3}{2} \Bigr]$$

additions, where we have assumed equal numbers of multiplications and additions in the complexity of the power method approximated by $4(k - 1)^2 + 3(k - 1)$ [11]. The complexity of ML detection, considered for comparison as a baseline scheme, is given by $|\mathcal{Q}|^m (m^2 + m)$ multiplications and $|\mathcal{Q}|^m (m^2 + m - 1)$ additions from (3).
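As a quick sanity check on the ML baseline, the count above can be evaluated for the two 4 × 4 configurations used in the simulations, where $m = 8$ and $|\mathcal{Q}|$ is 4 for 16-QAM and 8 for 64-QAM. The function name `ml_flops` is hypothetical; one multiplication or addition is counted as one flop, as in Table I.

```python
def ml_flops(m, q):
    """Total flops of exhaustive ML detection for real dimension m and
    PAM alphabet size q, counting each multiplication/addition as one."""
    mults = q ** m * (m * m + m)
    adds = q ** m * (m * m + m - 1)
    return mults + adds
```

Evaluating `ml_flops(8, 4)` gives 9,371,648 and `ml_flops(8, 8)` gives 2,399,141,888, which round to the ML entries $9.4 \times 10^6$ and $2.4 \times 10^9$ reported in Table I.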
B. Simulation Results
Here, we present the simulation results: symbol error rate (SER) performance in Fig. 1, memory usage in Fig. 2, and complexity in terms of floating-point operations (flops) in Table I (each real multiplication or addition counts as one flop). A standard minimum-mean-square-error (MMSE) linear detector is adopted in Fig. 1 for comparison. The memory usage counts the number of memory units required for running an algorithm, where each unit is used to store the (partial) symbol vector represented by a node and the cost associated with a node. Let $I$ and $J$ be the total number of visited and expanded nodes, respectively, where $I = J|\mathcal{Q}|$. Then, the memory usage for a BFS-based detection scheme is given by $J(|\mathcal{Q}| - 1) + 1$ or $I(1 - 1/|\mathcal{Q}|) + 1$ units for the case of unlimited memory, and $\min\bigl\{J(|\mathcal{Q}| - 1) + 1,\ (|\mathcal{Q}| - 1) + L\bigr\}$ units for the case of limited memory with list-size constraint $L$. The signal-to-noise ratio (SNR) in the plots is defined as $E[\|\mathbf{H}_c \tilde{\mathbf{x}}_c\|^2] / E[\|\mathbf{v}_c\|^2] = N_T \sigma_x^2 / \sigma_v^2$.
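The memory-usage expressions above can be written as a small helper. The name `memory_units` and its argument names are hypothetical; `J` is the total number of expanded nodes, `q` is $|\mathcal{Q}|$, and `L` is the list-size constraint (with `None` standing for unlimited memory).

```python
def memory_units(J, q, L=None):
    """Memory units used by a BFS-based detector.

    J: total number of expanded nodes.
    q: size |Q| of the real PAM alphabet.
    L: list-size constraint, or None for unlimited memory.
    """
    unlimited = J * (q - 1) + 1
    if L is None:
        return unlimited
    return min(unlimited, (q - 1) + L)
```

For instance, with `J = 8` expansions and `q = 4`, the unlimited-memory usage is `8 * 3 + 1 = 25` units, while a list-size constraint of `L = 4` caps it at `3 + 4 = 7` units.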
Fig. 1 shows that, in the unlimited-memory setting, BFS, BFS-LA1, and BFS-LA2 all achieve optimal performance. BFS($\infty$, $\epsilon$) has various degrees of performance degradation but generally achieves memory savings (Fig. 2) and reduced complexity (Table I). The performance penalty for BFS($\infty$, $\epsilon$) is moderate in Fig. 1(a) and very high in Fig. 1(b) with the same bias $\epsilon = 0.01$. This shows that the selection of bias is scenario-dependent and requires a manual effort. In the memory-constrained setting, the proposed BFS-LA1 and BFS-LA2 schemes demonstrate significant SER performance advantage over conventional schemes. The improved memory efficiency of the proposed schemes when there is no memory constraint (Fig. 2) results in a smaller SER performance degradation when a memory constraint is imposed. In Fig. 1(a), BFS-LA1(4) achieves a 3–4 dB gain over BFS(4) at SER = $3 \times 10^{-3}$, at moderate additional cost (e.g., 560 vs. 364 flops at SNR = 34 dB). BFS-LA2(4) achieves another 1–2 dB gain at high additional cost (e.g., 5,132 flops at SNR = 34 dB). Similar observations can be made in Fig. 1(b). Clearly, there is a tradeoff between the computations required in obtaining the heuristic and the SER and memory performances yielded as a result of using the heuristic.

Fig. 1. SER performance of MIMO detection schemes (SER versus SNR in dB). (a) 4 × 4 MIMO with 16-QAM. (b) 4 × 4 MIMO with 64-QAM. Curves: MMSE, BFS($L$), BFS($L$, 0.01), BFS-LA1($L$), BFS-LA2($L$), BFS($\infty$, 0.01), and BFS($\infty$) = BFS-LA1($\infty$) = BFS-LA2($\infty$) = ML, with $L = 4$ in (a) and $L = 8$ in (b). Notations follow those in Table I.

TABLE I
COMPLEXITY (FLOPS) COMPARISONS OF MIMO DETECTION SCHEMES (LIST-SIZE CONSTRAINT $L = 4$ FOR 4 × 4 16-QAM AND $L = 8$ FOR 4 × 4 64-QAM; BIAS $\epsilon = 0.01$)

| Scheme (4 × 4 MIMO) | 16-QAM, 22 dB | 16-QAM, 28 dB | 16-QAM, 34 dB | 64-QAM, 30 dB | 64-QAM, 36 dB | 64-QAM, 42 dB |
|---|---|---|---|---|---|---|
| BFS($L$, $\epsilon$) | 618 | 396 | 357 | 1,102 | 758 | 719 |
| BFS($L$) | 655 | 421 | 364 | 1,411 | 842 | 727 |
| BFS-LA1($L$) | 930 | 618 | 560 | 2,102 | 1,303 | 1,184 |
| BFS-LA2($L$) | 6,666 | 5,432 | 5,132 | 11,515 | 8,316 | 7,734 |
| BFS($\infty$, $\epsilon$) | 747 | 398 | 357 | 1,153 | 758 | 719 |
| BFS($\infty$) | 846 | 436 | 366 | 1,679 | 861 | 730 |
| BFS-LA1($\infty$) | 1,084 | 629 | 560 | 2,340 | 1,318 | 1,185 |
| BFS-LA2($\infty$) | 7,183 | 5,466 | 5,133 | 11,695 | 8,355 | 7,736 |
| ML | $9.4 \times 10^6$ (all SNRs) | | | $2.4 \times 10^9$ (all SNRs) | | |
Fig. 2 illustrates the memory-reduction capability of the proposed schemes. Comparing different schemes without a list-size constraint (i.e., $L = \infty$) shows the nature of the algorithms in terms of memory performance. As SNR increases, the memory usage converges to $m(|\mathcal{Q}| - 1) + 1$ for BFS with/without the bias, and $(m - 1)(|\mathcal{Q}| - 1) + 1$ for BFS-LA1/BFS-LA2. This suggests that the proposed schemes are asymptotically more memory-efficient than conventional schemes at high SNR. In other SNR regions, various degrees of memory saving are achieved for the proposed schemes since fewer iterations are needed due to look-ahead. The reduced memory usage also leads to reduced sorting complexity, since generating a sorted list (or finding the minimum-cost node in the case of $L = \infty$) is required at each iteration of the algorithm.
Fig. 2. Memory usage (units) versus SNR (dB) for BFS-based detection schemes. (a) 4 × 4 MIMO with 16-QAM and 64-QAM. (b) 8 × 8 MIMO with 16-QAM and 64-QAM. Curves in each panel: BFS($\infty$), BFS($\infty$, 0.01), BFS-LA1($\infty$), and BFS-LA2($\infty$). Notations follow those in Table I.
V. CONCLUSION
Modified BFS-based MIMO detection algorithms incorporating an efficient look-ahead mechanism have been presented. Simulation results demonstrated that the proposed algorithms maintain exact ML detection capability while achieving memory savings and enhanced performance in memory-constrained scenarios. Complexity analysis was conducted to confirm the computational feasibility of the proposed algorithms.
REFERENCES
[1] E. G. Larsson, “MIMO detection methods: How they work,” IEEE Signal Process. Mag., vol. 26, no. 3, pp. 91–95, May 2009.
[2] F. Jelinek, “Fast sequential decoding algorithm using a stack,” IBM J. Research and Development, vol. 13, no. 6, pp. 675–685, Nov. 1969.
[3] A. D. Murugan, H. El Gamal, M. O. Damen, and G. Caire, “A unified framework for tree search decoding: rediscovering the sequential decoder,” IEEE Trans. Inf. Theory, vol. 52, no. 3, pp. 933–953, Mar. 2006.
[4] K. Su, “Efficient maximum likelihood detection for communication over multiple input multiple output channels,” Ph.D. dissertation, Univ. of Cambridge, 2005.
[5] T. Fukatani, R. Matsumoto, and T. Uyematsu, “Two methods for decreasing the computational complexity of the MIMO ML decoder,” IEICE Trans. Fundamentals, vol. E87–A, no. 10, pp. 2571–2576, Oct. 2004.
[6] A. Okawado, R. Matsumoto, and T. Uyematsu, “Near ML detection using Dijkstra’s algorithm with bounded list size over MIMO channels,” in Proc. 2008 IEEE International Symp. on Inform. Theory, pp. 2022–2025.
[7] Y. Dai and Z. Yan, “Memory-constrained tree search detection and new ordering schemes,” IEEE J. Sel. Topics Signal Process., vol. 3, no. 6, pp. 1026–1037, Dec. 2009.
[8] P. E. Hart, N. J. Nilsson, and B. Raphael, “A formal basis for the heuristic determination of minimum cost paths,” IEEE Trans. Systems Science and Cybernetics, vol. 4, no. 2, pp. 100–107, July 1968.
[9] R. Dechter and J. Pearl, “Generalized best-first search strategies and the optimality of A*,” J. ACM, vol. 32, no. 3, pp. 505–536, July 1985.
[10] T.-H. Liu and Y.-L. Y. Liu, “Modified fast recursive algorithm for efficient MMSE-SIC detection of the V-BLAST system,” IEEE Trans. Wireless Commun., vol. 7, no. 10, pp. 3713–3717, Oct. 2008.
[11] I. Dimov and A. Karaivanova, “A power method with Monte Carlo iterations,” in Recent Advances in Numerical Methods and Applications, World Scientific, Singapore, 1999, pp. 239–247.
Citations

Journal ArticleDOI
Abstract: The emerging massive/large-scale multiple-input multiple-output (LS-MIMO) systems that rely on very large antenna arrays have become a hot topic of wireless communications. Compared to multi-antenna aided systems being built at the time of this writing, such as the long-term evolution (LTE) based fourth generation (4G) mobile communication system which allows for up to eight antenna elements at the base station (BS), the LS-MIMO system entails an unprecedented number of antennas, say 100 or more, at the BS. The huge leap in the number of BS antennas opens the door to a new research field in communication theory, propagation and electronics, where random matrix theory begins to play a dominant role. Interestingly, LS-MIMOs also constitute a perfect example of one of the key philosophical principles of the Hegelian Dialectics, namely, that “quantitative change leads to qualitative change.” In this treatise, we provide a recital on the historic heritages and novel challenges facing LS-MIMOs from a detection perspective. First, we highlight the fundamentals of MIMO detection, including the nature of co-channel interference (CCI), the generality of the MIMO detection problem, the received signal models of both linear memoryless MIMO channels and dispersive MIMO channels exhibiting memory, as well as the complex-valued versus real-valued MIMO system models. Then, an extensive review of the representative MIMO detection methods conceived during the past 50 years (1965–2015) is presented, and relevant insights as well as lessons are inferred for the sake of designing complexity-scalable MIMO detection algorithms that are potentially applicable to LS-MIMO systems. Furthermore, we divide the LS-MIMO systems into two types, and elaborate on the distinct detection strategies suitable for each of them. 
The type-I LS-MIMO corresponds to the case where the number of active users is much smaller than the number of BS antennas, which is currently the mainstream definition of LS-MIMO. The type-II LS-MIMO corresponds to the case where the number of active users is comparable to the number of BS antennas. Finally, we discuss the applicability of existing MIMO detection algorithms in LS-MIMO systems, and review some of the recent advances in LS-MIMO detection.

422 citations


Journal ArticleDOI
TL;DR: Numerical evaluations suggest that WESN can significantly improve the symbol detection performance as well as effectively mitigate model mismatch effects using very limited training symbols.
Abstract: In this paper, we introduce a reservoir computing (RC) structure, namely, windowed echo state network (WESN), for multiple-input-multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM) symbol detection. We show that adding buffers in input layers is able to bring an enhanced short-term memory (STM) to the standard echo state network. A unified training framework is developed for the introduced WESN MIMO-OFDM symbol detector using both comb and scattered patterns, where the training set size is compatible with those adopted in 3GPP LTE/LTE-Advanced standards. Complexity analysis demonstrates the advantages of WESN based symbol detector over state-of-the-art symbol detectors when the number of OFDM sub-carriers is large, where the benchmark methods are chosen as linear minimum mean square error (LMMSE) detection and sphere decoder. Numerical evaluations suggest that WESN can significantly improve the symbol detection performance as well as effectively mitigate model mismatch effects using very limited training symbols.

16 citations


Additional excerpts

  • ...complexity reduced sphere decoding algorithm proposed in [35] for the evaluation....

    [...]


Posted Content
Abstract: This paper investigates the optimal signal detection problem with a particular interest in large-scale multiple-input multiple-output (MIMO) systems. The problem is NP-hard and can be solved optimally by searching the shortest path on the decision tree. Unfortunately, the existing optimal search algorithms often involve prohibitively high complexities, which indicates that they are infeasible in large-scale MIMO systems. To address this issue, we propose a general heuristic search algorithm, namely, hyperaccelerated tree search (HATS) algorithm. The proposed algorithm employs a deep neural network (DNN) to estimate the optimal heuristic, and then use the estimated heuristic to speed up the underlying memory-bounded search algorithm. This idea is inspired by the fact that the underlying heuristic search algorithm reaches the optimal efficiency with the optimal heuristic function. Simulation results show that the proposed algorithm reaches almost the optimal bit error rate (BER) performance in large-scale systems, while the memory size can be bounded. In the meanwhile, it visits nearly the fewest tree nodes. This indicates that the proposed algorithm reaches almost the optimal efficiency in practical scenarios, and thereby it is applicable for large-scale systems. Besides, the code for this paper is available at https://github.com/skypitcher/hats.

7 citations


Posted Content
25 Jun 2019
TL;DR: Numerical evaluations corroborate that the improvement of the STM introduced by the WESN can significantly improve the symbol detection performance as well as effectively mitigate model mismatch effects as opposed to existing methods.
Abstract: Reservoir computing (RC) is a special neural network which consists of a fixed high dimensional feature mapping and trained readout weights. In this paper, we consider a new RC structure for MIMO-OFDM symbol detection, namely windowed echo state network (WESN). It is introduced by adding buffers in input layers which brings an enhanced short-term memory (STM) of the underlying neural network through our theoretical proof. A unified training framework is developed for the WESN MIMO-OFDM symbol detector using both comb and scattered pilot patterns, where the utilized pilots are compatible with the structure adopted in 3GPP LTE/LTE-Advanced systems. Complexity analysis reveals the advantages of the WESN based symbol detector over the state-of-the-art symbol detectors such as the linear minimum mean square error (LMMSE) detection and the sphere decoder when the system is employed with a large number of OFDM sub-carriers. Numerical evaluations corroborate that the improvement of the STM introduced by the WESN can significantly improve the symbol detection performance as well as effectively mitigate model mismatch effects as opposed to existing methods.

4 citations


Cites methods from "A* Algorithm Inspired Memory-Effici..."

  • ...We choose a complexity reduced sphere decoding algorithm proposed in [39] for the evaluation....

    [...]


Journal ArticleDOI
Abstract: Reservoir computing (RC) is a special recurrent neural network which consists of a fixed high dimensional feature mapping and trained readout weights. In this paper, we introduce a new RC structure for multiple-input, multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM) symbol detection, namely windowed echo state network (WESN). The theoretical analysis shows that adding buffers in input layers can bring an enhanced short-term memory (STM) to the underlying neural network. Furthermore, a unified training framework is developed for the WESN MIMO-OFDM symbol detector using both comb and scattered pilot patterns that are compatible with the structure adopted in 3GPP LTE/LTE-Advanced systems. Complexity analysis suggests the advantages of WESN based symbol detector over state-of-the-art symbol detectors such as the linear minimum mean square error (LMMSE) detection and the sphere decoder, when the system is employed with a large number of OFDM sub-carriers. Numerical evaluations illustrate the advantage of the introduced WESN-based symbol detector and demonstrate that the improvement of STM can significantly improve symbol detection performance as well as effectively mitigate model mismatch effects compared to existing methods.

3 citations


References

Journal ArticleDOI
TL;DR: How heuristic information from the problem domain can be incorporated into a formal mathematical theory of graph searching is described and an optimality property of a class of search strategies is demonstrated.
Abstract: Although the problem of determining the minimum cost path through a graph arises naturally in a number of interesting applications, there has been no underlying theory to guide the development of efficient search procedures. Moreover, there is no adequate conceptual framework within which the various ad hoc search strategies proposed to date can be compared. This paper describes how heuristic information from the problem domain can be incorporated into a formal mathematical theory of graph searching and demonstrates an optimality property of a class of search strategies.

8,780 citations


"A* Algorithm Inspired Memory-Effici..." refers methods in this paper

  • ...THE PROPOSED BEST-FIRST DETECTION ALGORITHMS The A∗ algorithm [8] speeds up the search of the shortest path in a graph by considering both the travelled distance thus far and the estimated distance ahead (the heuristic)....

    [...]

  • ...In this letter, we propose an optimal BFS detection scheme inspired by the A∗ algorithm [8] which speeds up the original Dijkstra’s algorithm without losing algorithm optimality (shortest path is guaranteed)....

    [...]
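The ordering rule described in these excerpts (travelled distance plus estimated distance ahead) can be illustrated with a minimal sketch. The toy graph, the node names, and the estimate table `est` below are hypothetical and chosen only for illustration; they are not taken from the letter or from [8]. Setting the estimate to zero recovers Dijkstra's ordering.

```python
import heapq

def a_star(graph, start, goal, h):
    """Return (cost, expansions) for the cheapest start -> goal path.

    graph: dict mapping node -> list of (neighbor, edge_cost)
    h:     function node -> estimated remaining cost (assumed never to overestimate)
    """
    open_list = [(h(start), 0, start)]        # entries are (f = g + h, g, node)
    best_g = {start: 0}
    expansions = 0
    while open_list:
        f, g, node = heapq.heappop(open_list)
        if g > best_g.get(node, float("inf")):
            continue                          # stale entry: a cheaper path was found later
        if node == goal:
            return g, expansions
        expansions += 1
        for nxt, cost in graph.get(node, []):
            g_new = g + cost
            if g_new < best_g.get(nxt, float("inf")):
                best_g[nxt] = g_new
                heapq.heappush(open_list, (g_new + h(nxt), g_new, nxt))
    return None, expansions

# Hypothetical toy graph: the shortest path is S -> A -> G with cost 4.
graph = {
    "S": [("A", 1), ("B", 1)],
    "A": [("G", 3)],
    "B": [("C", 1)],
    "C": [("D", 1)],
    "D": [("G", 5)],
}
# Hypothetical estimates, each no larger than the true remaining cost.
est = {"S": 3, "A": 3, "B": 5, "C": 4, "D": 4, "G": 0}

dijkstra_result = a_star(graph, "S", "G", lambda n: 0)      # (4, 5)
a_star_result = a_star(graph, "S", "G", lambda n: est[n])   # (4, 2)
```

Both runs return the same path cost, but on this graph the heuristic-guided run expands two nodes instead of five, which is the speed-up the excerpts refer to.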


Journal ArticleDOI
TL;DR: It is shown that several known properties of A* retain their form and it is also shown that no optimal algorithm exists, but if the performance tests are confined to cases in which the estimates are also consistent, then A* is indeed optimal.
Abstract: This paper reports several properties of heuristic best-first search strategies whose scoring functions f depend on all the information available from each candidate path, not merely on the current cost g and the estimated completion cost h. It is shown that several known properties of A* retain their form (with the minmax of f playing the role of the optimal cost), which helps establish general tests of admissibility and general conditions for node expansion for these strategies. On the basis of this framework the computational optimality of A*, in the sense of never expanding a node that can be skipped by some other algorithm having access to the same heuristic information that A* uses, is examined. A hierarchy of four optimality types is defined and three classes of algorithms and four domains of problem instances are considered. Computational performances relative to these algorithms and domains are appraised. For each class-domain combination, we then identify the strongest type of optimality that exists and the algorithm for achieving it. The main results of this paper relate to the class of algorithms that, like A*, return optimal solutions (i.e., admissible) when all cost estimates are optimistic (i.e., h ≤ h*). On this class, A* is shown to be not optimal and it is also shown that no optimal algorithm exists, but if the performance tests are confined to cases in which the estimates are also consistent, then A* is indeed optimal. Additionally, A* is also shown to be optimal over a subset of the latter class containing all best-first algorithms that are guided by path-dependent evaluation functions.

967 citations


"A* Algorithm Inspired Memory-Effici..." refers background in this paper

  • ...The more accurate is the estimate, the better performance of the algorithm can be achieved [9]....

    [...]
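For reference, the standard A* quantities behind the abstract and the excerpt above can be written out as follows, using the abstract's symbols g, h and h*. The edge-cost symbol c(n, n') is an assumed notation that does not appear in the text above.

```latex
f(n) = g(n) + h(n), \qquad
\text{optimistic (admissible): } h(n) \le h^{*}(n) \ \ \forall n, \qquad
\text{consistent: } h(n) \le c(n, n') + h(n') \ \ \forall \text{ edges } (n, n').
```

Here h*(n) is the true cheapest remaining cost from node n. Choosing h(n) = 0 for all n satisfies both conditions and reduces the ordering to that of Dijkstra's algorithm; an estimate closer to h* while still satisfying these conditions generally lets the search skip more nodes.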


Journal ArticleDOI
TL;DR: A new sequential decoding algorithm is introduced that uses stack storage at the receiver that is much simpler to describe and analyze than the Fano algorithm, and is about six times faster than the latter at transmission rates equal to Rcomp.
Abstract: In this paper a new sequential decoding algorithm is introduced that uses stack storage at the receiver. It is much simpler to describe and analyze than the Fano algorithm, and is about six times faster than the latter at transmission rates equal to Rcomp, the rate below which the average number of decoding steps is bounded by a constant. Practical problems connected with implementing the stack algorithm are discussed and a scheme is described that facilitates satisfactory performance even with limited stack storage capacity. Preliminary simulation results estimating the decoding effort and the needed stack size are presented.

624 citations


"A* Algorithm Inspired Memory-Effici..." refers methods in this paper

  • ...Best-first search (BFS) detection schemes [2]–[6] based on Dijkstra’s (or stack) algorithm maintain a list (stack) of nodes sorted by some defined cost and explore the nodes in that order....

    [...]
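The list-based search in the excerpt above can be sketched as follows. This is a minimal illustration under stated assumptions, not the letter's algorithm: it assumes a real-valued model with an upper-triangular matrix `R` (for example after a QR decomposition), orders nodes by accumulated cost only, and uses a hypothetical memory policy (`max_list`) that simply drops the highest-cost stored nodes.

```python
import heapq

def best_first_detect(R, y, alphabet, max_list=None):
    """Best-first (stack) tree search for min ||y - R x||^2, R upper triangular.

    Returns (x_hat, cost). Symbols are decided from the last layer upward.
    max_list (hypothetical policy) caps the number of stored nodes by
    discarding the highest-cost ones, which may sacrifice optimality.
    """
    n = len(y)
    stack = [(0.0, ())]                        # (partial cost, symbols x[n-1], x[n-2], ...)
    while stack:
        cost, tail = heapq.heappop(stack)      # stored node with the smallest partial cost
        if len(tail) == n:
            return list(reversed(tail)), cost  # first complete node popped ends the search
        k = n - 1 - len(tail)                  # layer to decide next
        for s in alphabet:
            cand = tail + (s,)
            # x[j] for j = k..n-1 is stored at cand[n - 1 - j]
            resid = y[k] - sum(R[k][j] * cand[n - 1 - j] for j in range(k, n))
            heapq.heappush(stack, (cost + resid ** 2, cand))
        if max_list is not None and len(stack) > max_list:
            stack = heapq.nsmallest(max_list, stack)  # keep only the cheapest nodes
            heapq.heapify(stack)
    return None, float("inf")

# Hypothetical 2x2 example: x = [1, -1] sent, small perturbation added to y.
R = [[2.0, 1.0],
     [0.0, 1.0]]
y = [1.2, -0.8]
x_hat, cost = best_first_detect(R, y, alphabet=(-1, 1))
x_hat_small, cost_small = best_first_detect(R, y, alphabet=(-1, 1), max_list=2)
```

Because every layer adds a non-negative term to the cost, the first full-length node taken from the list is a minimum-cost candidate when the list is unbounded. With a cap, the list can discard the branch leading to that candidate, which is the performance degradation that memory-constrained variants trade against storage.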


Journal ArticleDOI
TL;DR: The excellent performance-complexity tradeoff achieved by the proposed MMSE-DFE Fano decoder is established via simulation results and analytical arguments in several multiple-input multiple-output (MIMO) and intersymbol interference (ISI) scenarios.
Abstract: We consider receiver design for coded transmission over linear Gaussian channels. We restrict ourselves to the class of lattice codes and formulate the joint detection and decoding problem as a closest lattice point search (CLPS). Here, a tree search framework for solving the CLPS is adopted. In our framework, the CLPS algorithm is decomposed into the preprocessing and tree search stages. The role of the preprocessing stage is to expose the tree structure in a form matched to the search stage. We argue that the forward and feedback (matrix) filters of the minimum mean-square error decision feedback equalizer (MMSE-DFE) are instrumental for solving the joint detection and decoding problem in a single search stage. It is further shown that MMSE-DFE filtering allows for solving underdetermined linear systems and using lattice reduction methods to diminish complexity, at the expense of a marginal performance loss. For the search stage, we present a generic method, based on the branch and bound (BB) algorithm, and show that it encompasses all existing sphere decoders as special cases. The proposed generic algorithm further allows for an interesting classification of tree search decoders, sheds more light on the structural properties of all known sphere decoders, and inspires the design of more efficient decoders. In particular, an efficient decoding algorithm that resembles the well-known Fano sequential decoder is identified. The excellent performance-complexity tradeoff achieved by the proposed MMSE-DFE Fano decoder is established via simulation results and analytical arguments in several multiple-input multiple-output (MIMO) and intersymbol interference (ISI) scenarios.

329 citations


Journal ArticleDOI
TL;DR: An overview of approaches for detection for MIMO, in the communications receiver context, finds that notions that are important in slow fading are less important in fast fading, where diversity is provided anyway by time variations.
Abstract: The goal of this lecture has been to provide an overview of approaches to MIMO detection in the communications receiver context. Which method is the best in practice? This depends much on the purpose of solving the detection problem: what error rate can be tolerated, what is the ultimate measure of performance (e.g., frame-error-rate, worst-case complexity, or average complexity), and what computational platform is used. Additionally, the bits in s may be part of a larger code word and different s vectors in that code word may either see the same H (slow fading) or many different realizations of H (fast fading). This complicates the picture, because notions that are important in slow fading (such as spatial diversity) are less important in fast fading, where diversity is provided anyway by time variations. Detection for MIMO has been an active field for more than ten years, and this research will probably continue for some time.

213 citations


Frequently Asked Questions (1)
Q1. What are the contributions in this paper?

In this letter, the authors propose modified best-first detection algorithms in which the order of nodes is determined by both the original cost and the estimated future cost associated with each node, as inspired by an improved shortest path algorithm (A∗ algorithm).