
A* Algorithm Inspired Memory-Efficient Detection for MIMO Systems

TL;DR: Modified best-first detection algorithms in which the order of nodes is determined by both the original cost and the estimated future cost associated with each node are proposed, as inspired by an improved shortest path algorithm (A* algorithm).
Abstract: Implementation of a best-first detection algorithm for multiple-input multiple-output (MIMO) systems requires large amounts of memory especially in large systems with high-order modulation. In this letter, we propose modified best-first detection algorithms in which the order of nodes is determined by both the original cost and the estimated future cost associated with each node, as inspired by an improved shortest path algorithm (A* algorithm). The modified algorithms maintain the detection optimality, reduce the memory requirement and sorting complexity, and achieve improved detection performance in memory-constrained scenarios.

Summary (1 min read)

Introduction

  • Best-first search (BFS) detection schemes [2]–[6] based on Dijkstra’s (or stack) algorithm maintain a list of nodes sorted by a defined cost and explore the nodes in that order.
  • Imposing a memory constraint [6] facilitates hardware implementation and reduces the search complexity at the cost of some performance degradation.
  • The proposed methods are described in Sec. III, with complexity and performance results presented in Sec. IV.

II. TRANSMISSION SYSTEM AND BEST-FIRST DETECTION

  • Transmitted symbol vector $\tilde{\mathbf{x}}_c$ contains uncorrelated entries selected equiprobably from the squared quadrature amplitude modulation (QAM) alphabet $\mathcal{S} = \{a + ib \mid a, b \in \mathcal{Q}\}$ and has zero mean and covariance matrix $\sigma_x^2 \mathbf{I}_{N_T}$, where $\mathcal{Q}$ is the pulse amplitude modulation (PAM) alphabet and $\mathbf{I}_{N_T}$ is the $N_T \times N_T$ identity matrix.
  • $\mathbf{H}_c$ has independent and identically distributed (i.i.d.) Gaussian entries with zero mean and covariance matrix $\sigma_H^2 \mathbf{I}_{N_R}$, where $\sigma_H^2 = 1$.
  • The channel information is assumed perfectly known to the receiver.
  • The authors reach (9) by rewriting the objective function, where the second and third terms do not depend on $\mathbf{x}_1^{k-1}$.

A. Complexity Evaluation

  • Here, the authors evaluate the overall computational complexity of the proposed algorithms in comparison with conventional methods.
  • Since all processing is conducted on real values based on (2), all the calculations below refer to real operations.
  • The complexity of a tree-search detection scheme is evaluated in terms of the number of nodes visited and expanded (defined respectively by nodes that ever occupy a position and become the best node in the node list).
  • Similar calculations can be carried out for the BFS-LA2 algorithm.

B. Simulation Results

  • Here, the authors present the simulation results: symbol error rate (SER) performance in Fig. 1, memory usage in Fig. 2, and complexity in terms of floating-point operations in Table I (one real multiplication/addition each counts a flop).
  • Similar observations can be made in Fig. 1(b).
  • Fig. 2 illustrates the memory-reduction capability of the proposed schemes.

V. CONCLUSION

  • Modified BFS-based MIMO detection algorithms incorporating an efficient look-ahead mechanism have been presented.
  • Simulation results demonstrated that the proposed algorithms maintain exact ML detection capability while achieving memory savings and enhanced performance in memory-constrained scenarios.
  • Complexity analysis was conducted to confirm the computational feasibility of the proposed algorithms.


508 IEEE WIRELESS COMMUNICATIONS LETTERS, VOL. 1, NO. 5, OCTOBER 2012
A* Algorithm Inspired Memory-Efficient Detection for MIMO Systems
Ronald Y. Chang, Wei-Ho Chung, Member, IEEE, and Sian-Jheng Lin
Abstract—Implementation of a best-first detection algorithm for multiple-input multiple-output (MIMO) systems requires large amounts of memory, especially in large systems with high-order modulation. In this letter, we propose modified best-first detection algorithms in which the order of nodes is determined by both the original cost and the estimated future cost associated with each node, as inspired by an improved shortest path algorithm (A* algorithm). The modified algorithms maintain the detection optimality, reduce the memory requirement and sorting complexity, and achieve improved detection performance in memory-constrained scenarios.
Index Terms—Maximum likelihood (ML) decoding, multiple-input multiple-output (MIMO) systems, tree-search detection, Dijkstra’s algorithm, A* algorithm, memory efficiency.
I. INTRODUCTION
The multiple-input multiple-output (MIMO) detection
problem can be viewed as a tree-search problem [1].
Best-first search (BFS) detection schemes [2]–[6] based on
Dijkstra’s (or stack) algorithm maintain a list (stack) of
nodes sorted by a defined cost and explore the nodes in
that order. While BFS detection can minimize the number of
searched nodes needed to establish the maximum likelihood
(ML) solution [4], it has a prohibitively large memory require-
ment. Imposing a memory constraint [6] facilitates hardware
implementation and reduces the search complexity at the cost
of some performance degradation. Modifying the sorting crite-
rion such as the use of a biased cost [3] can improve the error
and complexity performance of a BFS scheme in memory-
constrained scenarios, yet at the loss of detection optimality.
The choice of the bias is also heuristic, requiring empirical
efforts to determine proper values. The optimal detection
performance in memory-constrained scenarios is guaranteed in
the proposed scheme in [7] that combines the memory-efficient
sphere decoder and the computationally-efficient Dijkstra’s
algorithm. However, it generally searches more nodes than the
original BFS scheme.
In this letter, we propose an optimal BFS detection scheme
inspired by the A* algorithm [8], which speeds up the original
Dijkstra’s algorithm without losing algorithm optimality
(shortest path is guaranteed). By adding a new term to the
original cost of nodes, the proposed modified BFS scheme
demonstrates memory efficiency and improved error perfor-
mance over conventional BFS detectors in memory-constrained
scenarios. Simulation and complexity studies also show the
tradeoff between error and memory performances and the
computations required for obtaining the added term.
This letter is organized as follows. Sec. II presents the system
model and BFS detection. The proposed methods are described
in Sec. III, with complexity and performance results presented
in Sec. IV. Conclusion is given in Sec. V.

Manuscript received June 18, 2012. The associate editor coordinating the
review of this letter and approving it for publication was G. Vitetta.
This work was supported by the National Science Council of Taiwan under
Grant NSC 100-2221-E-001-004.
The authors are with the Research Center for Information Technology
Innovation, Academia Sinica, Taipei, Taiwan (e-mail: yjrchang@gmail.com,
{whc, sjlin}@citi.sinica.edu.tw).
Digital Object Identifier 10.1109/WCL.2012.071612.120450
II. TRANSMISSION SYSTEM AND BEST-FIRST DETECTION
We consider an uncoded MIMO transmission system with $N_T$ ($N_R$) transmit (receive) antennas (denoted by an $N_T \times N_R$ system). The baseband signal model is given by

$$\mathbf{y}_c = \mathbf{H}_c \tilde{\mathbf{x}}_c + \mathbf{v}_c \tag{1}$$

where $\mathbf{y}_c$ is the $N_R \times 1$ received signal containing the $N_T \times 1$ transmitted signal $\tilde{\mathbf{x}}_c$ perturbed by the $N_R \times N_T$ uncorrelated flat-fading channel $\mathbf{H}_c$ and the $N_R \times 1$ noise $\mathbf{v}_c$. Transmitted symbol vector $\tilde{\mathbf{x}}_c$ contains uncorrelated entries selected equiprobably from the squared quadrature amplitude modulation (QAM) alphabet $\mathcal{S} = \{a + ib \mid a, b \in \mathcal{Q}\}$ and has zero mean and covariance matrix $\sigma_x^2 \mathbf{I}_{N_T}$, where $\mathcal{Q}$ is the pulse amplitude modulation (PAM) alphabet and $\mathbf{I}_{N_T}$ is the $N_T \times N_T$ identity matrix. Complex-valued channel matrix $\mathbf{H}_c$ has independent and identically distributed (i.i.d.) Gaussian entries with zero mean and covariance matrix $\sigma_H^2 \mathbf{I}_{N_R}$, where $\sigma_H^2 = 1$. The channel information is assumed perfectly known to the receiver. Noise $\mathbf{v}_c$ is additive white Gaussian noise (AWGN) with i.i.d. complex elements and has zero mean and covariance matrix $\sigma_v^2 \mathbf{I}_{N_R}$.
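The complex signal model in (1) can be transformed into an equivalent real signal model by defining $\bar{\mathbf{y}} = [\Re(\mathbf{y}_c)^T\ \Im(\mathbf{y}_c)^T]^T$, $\tilde{\mathbf{x}} = [\Re(\tilde{\mathbf{x}}_c)^T\ \Im(\tilde{\mathbf{x}}_c)^T]^T$, $\mathbf{v} = [\Re(\mathbf{v}_c)^T\ \Im(\mathbf{v}_c)^T]^T$, and

$$\mathbf{H} = \begin{bmatrix} \Re(\mathbf{H}_c) & -\Im(\mathbf{H}_c) \\ \Im(\mathbf{H}_c) & \Re(\mathbf{H}_c) \end{bmatrix}$$

where $\Re(\cdot)$ and $\Im(\cdot)$ denote the real and imaginary parts of their argument, respectively. The real signal model is given by

$$\bar{\mathbf{y}} = \mathbf{H}\tilde{\mathbf{x}} + \mathbf{v} \tag{2}$$

where $\bar{\mathbf{y}} \in \mathbb{R}^n$, $\mathbf{H} \in \mathbb{R}^{n \times m}$, $\tilde{\mathbf{x}} \in \mathcal{Q}^m$, and $\mathbf{v} \in \mathbb{R}^n$, with $n = 2N_R$ and $m = 2N_T$. We hereafter assume $m = n$ for presentation brevity.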
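The complex-to-real stacking above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the helper names `complex_to_real` and `matvec` and the toy channel values are assumptions made for the example and do not come from the letter.

```python
def complex_to_real(Hc, yc):
    """Stack real and imaginary parts as in the real-valued model (2)."""
    # H = [[Re(Hc), -Im(Hc)], [Im(Hc), Re(Hc)]], of size 2*N_R x 2*N_T
    top = [[h.real for h in row] + [-h.imag for h in row] for row in Hc]
    bottom = [[h.imag for h in row] + [h.real for h in row] for row in Hc]
    y = [v.real for v in yc] + [v.imag for v in yc]
    return top + bottom, y


def matvec(A, x):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


# Noise-free toy example (hypothetical values): 2x2 channel, 16-QAM symbols
# whose real and imaginary parts come from the PAM alphabet {-3, -1, 1, 3}.
Hc = [[1 + 2j, 0.5 - 1j], [-1 + 0.5j, 2 + 0j]]
xc = [3 - 1j, -1 + 1j]
yc = [sum(h * x for h, x in zip(row, xc)) for row in Hc]

H, y = complex_to_real(Hc, yc)
x = [s.real for s in xc] + [s.imag for s in xc]  # real-valued symbol vector
```

With no noise, `matvec(H, x)` reproduces `y` up to rounding, which is a quick way to check that the block structure of `H` has the signs in the right places.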
Given the model in (2), the ML symbol detection is to solve

$$\tilde{\mathbf{x}}_{\mathrm{ML}} = \arg\min_{\mathbf{x} \in \mathcal{Q}^m} \|\bar{\mathbf{y}} - \mathbf{H}\mathbf{x}\|^2 \tag{3}$$

where $\|\cdot\|$ denotes the $l_2$-norm of a vector. By performing the QR decomposition on $\mathbf{H}$ ($\mathbf{H} = \mathbf{Q}\mathbf{R}$), we formulate (3) into an equivalent expression $\tilde{\mathbf{x}}_{\mathrm{ML}} = \arg\min_{\mathbf{x} \in \mathcal{Q}^m} \|\mathbf{y} - \mathbf{R}\mathbf{x}\|^2$, where $\mathbf{y} = \mathbf{Q}^T\bar{\mathbf{y}}$. The upper-triangular structure of $\mathbf{R}$ enables the expansion of $\|\mathbf{y} - \mathbf{R}\mathbf{x}\|^2$ in the form

$$(y_m - r_{m,m}x_m)^2 + \cdots + \Big(y_1 - \sum_{i=1}^{m} r_{1,i}x_i\Big)^2 \tag{4}$$

where $y_i$ is the $i$th element of $\mathbf{y}$, $x_i$ is the $i$th element of $\mathbf{x}$, and $r_{i,j}$ is the $(i, j)$-entry of $\mathbf{R}$. We denote the $(m-k+1)$th term in (4) by $b(\mathbf{x}_k^m)$ and the summation of the first $m-k+1$ terms by $d(\mathbf{x}_k^m)$ ($k = 1, 2, \ldots, m$), where $\mathbf{x}_k^m \triangleq (x_k, \ldots, x_m)^T \in \mathcal{Q}^{m-k+1}$ represents the partial symbol vector. A (rooted) detection tree is created from (4), which consists of a virtual root node, the nonleaf nodes in layers $1, \ldots, m-1$ each having $|\mathcal{Q}|$ child nodes, and the leaf nodes in layer $m$, where $|\cdot|$ is the cardinality of a set. Each node in layer $m-k+1$ uniquely represents an $\mathbf{x}_k^m$ and has an associated path metric $d(\mathbf{x}_k^m)$ and branch metric $b(\mathbf{x}_k^m)$. Since $d(\mathbf{x}_1^m)$ of a leaf node equals $\|\mathbf{y} - \mathbf{R}\mathbf{x}\|^2$ evaluated for $\mathbf{x} = \mathbf{x}_1^m$ represented by the node, the objective of optimal detection is to find the leaf node with the smallest path metric among all leaf nodes.

2162-2337/12$31.00 © 2012 IEEE
The BFS detection algorithm maintains a list of nodes sorted in ascending order of their defined cost (denoted by $c$). The cost can be a node’s path metric $d$ [4]–[6], or its biased path metric $d - \epsilon k$ [3] if this node is in layer $k$, where $\epsilon > 0$ is the bias.¹ The conventional BFS algorithm with cost $c$ and list-size constraint $L$ consists of the following iterative steps:
0) Initially, the node list $\mathcal{N}$ contains only the root node.
1) Select the best (first) node from $\mathcal{N}$; if this node is in layer $m$, terminate the algorithm and output it as the solution.
2) Expand the best node by adding all its child nodes to $\mathcal{N}$ and removing itself from $\mathcal{N}$.
3) Order the nodes in $\mathcal{N}$ in ascending order of the cost $c$ and discard nodes beyond the first $\min(|\mathcal{N}|, L)$ nodes.
Slightly abusing the notation, we use BFS($L$) to denote the above algorithm with $c = d$ and BFS($L$, $\epsilon$) to denote the algorithm with $c = d - \epsilon k$.
III. THE PROPOSED BEST-FIRST DETECTION ALGORITHMS
The A* algorithm [8] speeds up the search of the shortest
path in a graph by considering both the travelled distance
thus far and the estimated distance ahead (the heuristic).
If the heuristic is admissible (not over-estimating the real
distance), the shortest path is guaranteed. The more accurate
is the estimate, the better performance of the algorithm can be
achieved [9]. In a graph where the edge length represents the
geographic distance, an admissible heuristic can be the straight-
line distance from a node to the destination.
The idea of including a heuristic may be applied to enhance
a BFS detection scheme which finds the shortest path from
the source (the root node) to the destination (the grouping
of all leaf nodes). Since there is no notion of straight-line
distances in the detection tree, the heuristic can only be
obtained by calculation. The novelty of this work is that two
methods of finding admissible heuristics are developed without
involving an exhaustive search of the unexplored part of the
tree (requiring exponential complexity) and without simply
precomputing the BFS iterations (trivial modification). The
inclusion of the heuristic is termed “look-ahead” (LA).
A. Look-Ahead One Layer
Consider that, at some point of the BFS algorithm, a node in layer $m-k+1$ that represents a specific $\breve{\mathbf{x}}_k^m = (\breve{x}_k, \ldots, \breve{x}_m)^T$ is visited. The existing (known) cost from the source to this node is given by the path metric of this node $d(\breve{\mathbf{x}}_k^m) = \sum_{j=k}^{m} b(\breve{\mathbf{x}}_j^m)$, and the future (unknown) cost from this node to the destination is given by $\sum_{j=1}^{k-1} b\big((\mathbf{x}_j^{k-1}, \breve{\mathbf{x}}_k^m)\big)$, where $(\cdot, \cdot)$ denotes the concatenation of two column vectors by placing the second one under the first one. Any lower bound on this future cost constitutes an admissible heuristic for this node. One lower bound is given by the minimum of the $|\mathcal{Q}|$ immediate branch metrics under this node, i.e.,

$$\sum_{j=1}^{k-1} b\big((\mathbf{x}_j^{k-1}, \breve{\mathbf{x}}_k^m)\big) \ge \min_{x_{k-1} \in \mathcal{Q}} b\big((x_{k-1}, \breve{\mathbf{x}}_k^m)\big) = \min_{x_{k-1} \in \mathcal{Q}} \big(y_{k-1}^{(k)} - r_{k-1,k-1}x_{k-1}\big)^2 \triangleq h_1(\breve{\mathbf{x}}_k^m) \tag{5}$$

where $y_{k-1}^{(k)}$ is the $(k-1)$th element of $\mathbf{y}^{(k)} = \mathbf{y} - \sum_{i=k}^{m} \breve{x}_i \mathbf{r}_i$, with $\mathbf{r}_k$ being the $k$th column of $\mathbf{R}$. Note that if finding the minimum in (5) requires an exhaustive search of $x_{k-1} \in \mathcal{Q}$, then this look-ahead presents little advantage, as it reduces to performing node expansion in BFS one layer ahead and requires the same amount of computation. Fortunately, the minimizing $x_{k-1}$ in (5) is directly given by the one-dimensional zero-forcing (ZF) solution after slicing, i.e., $\big\lfloor y_{k-1}^{(k)}/r_{k-1,k-1} \big\rceil_{\mathcal{Q}}$, which can be obtained without actually computing the division and slicing. For example, for 4-QAM with $\mathcal{Q} = \{-1, 1\}$, the minimizing $x_{k-1}$ is given by $\mathrm{sgn}\big(y_{k-1}^{(k)}\big)$, where $\mathrm{sgn}(x) = 1$ if $x \ge 0$ and $-1$ if $x < 0$ (note that $r_{k-1,k-1} > 0$); for 16-QAM with $\mathcal{Q} = \{-3, -1, 1, 3\}$, the minimizing $x_{k-1}$ is given by $2\,\mathrm{sgn}\big(y_{k-1}^{(k)}\big) + \mathrm{sgn}\big(y_{k-1}^{(k)} - 2\,\mathrm{sgn}(y_{k-1}^{(k)})\,r_{k-1,k-1}\big)$. As a result, $\big(y_{k-1}^{(k)} - r_{k-1,k-1}x_{k-1}\big)^2$ in (5) needs to be computed just once rather than $|\mathcal{Q}|$ times. The first heuristic is given by

$$h_1(\breve{\mathbf{x}}_k^m) = \Big(y_{k-1}^{(k)} - r_{k-1,k-1} \cdot \big\lfloor y_{k-1}^{(k)}/r_{k-1,k-1} \big\rceil_{\mathcal{Q}}\Big)^2. \tag{6}$$

The new algorithm, referred to as the BFS-LA1($L$) algorithm, modifies the conventional BFS($L$) algorithm by adopting the cost $c = d + h_1$ in Step 3 and terminating the algorithm when the best node selected is in layer $m-1$ in Step 1.

¹ In this letter, we use $d$ to denote the path metric generally and use $d(\mathbf{x}_k^m)$ to denote the path metric of a specific node; same for $c$ and, later, $h_1$ and $h_2$.
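The sign-based slicing rules and the heuristic in (6) can be sketched as below, together with a brute-force reference that evaluates all $|\mathcal{Q}|$ branch metrics as in (5). The function names are hypothetical and chosen only for this illustration; `y_res` stands for $y_{k-1}^{(k)}$ and `r` for $r_{k-1,k-1}$, assumed positive as stated in the text.

```python
def sgn(v):
    """sgn(v) = 1 if v >= 0 and -1 otherwise, as defined in the text."""
    return 1 if v >= 0 else -1


def slice_2pam(y_res, r):
    """Minimizer of (y_res - r*x)^2 over {-1, 1}, assuming r > 0."""
    return sgn(y_res)


def slice_4pam(y_res, r):
    """Minimizer of (y_res - r*x)^2 over {-3, -1, 1, 3}, assuming r > 0."""
    return 2 * sgn(y_res) + sgn(y_res - 2 * sgn(y_res) * r)


def h1(y_res, r, slicer):
    """One-layer look-ahead heuristic (6): a single squared residual."""
    return (y_res - r * slicer(y_res, r)) ** 2


def h1_exhaustive(y_res, r, alphabet):
    """Reference value: minimum over all |Q| branch metrics, as in (5)."""
    return min((y_res - r * x) ** 2 for x in alphabet)
```

For any `y_res` and any `r > 0`, `h1(y_res, r, slice_4pam)` agrees with `h1_exhaustive(y_res, r, (-3, -1, 1, 3))` while evaluating only one squared residual, which is the saving the paragraph above describes.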
B. Look-Ahead Multiple Layers
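The second and tighter lower bound on the future cost that can be obtained without an exhaustive constellation search and that can construct a meaningful modification is derived by relaxing the constraint on the minimization variable from $\mathbf{x}_1^{k-1} \in \mathcal{Q}^{k-1}$ to $\mathbf{x}_1^{k-1} \in \mathbb{R}^{k-1}$. Specifically,

$$\sum_{j=1}^{k-1} b\big((\mathbf{x}_j^{k-1}, \breve{\mathbf{x}}_k^m)\big) \ge \min_{\mathbf{x}_1^{k-1} \in \mathcal{Q}^{k-1}} \sum_{j=1}^{k-1} b\big((\mathbf{x}_j^{k-1}, \breve{\mathbf{x}}_k^m)\big) \tag{7}$$

$$= \min_{\mathbf{x}_1^{k-1} \in \mathcal{Q}^{k-1}} \big\|\mathbf{y}^{(k)} - \mathbf{R}^{(k)}\mathbf{x}_1^{k-1}\big\|^2 \tag{8}$$

$$= \min_{\mathbf{x}_1^{k-1} \in \mathcal{Q}^{k-1}} \big(\mathbf{x}_1^{k-1} - \hat{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1}\big)^T \mathbf{R}^{(k)T}\mathbf{R}^{(k)} \big(\mathbf{x}_1^{k-1} - \hat{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1}\big) + \big\|\mathbf{y}^{(k)}\big\|^2 - \big\|\mathbf{R}^{(k)}\hat{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1}\big\|^2 \tag{9}$$

$$\ge \min_{\substack{\mathbf{x}_1^{k-1} \in \mathbb{R}^{k-1} \\ \|\mathbf{x}_1^{k-1} - \hat{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1}\|^2 = \alpha^2}} \big(\mathbf{x}_1^{k-1} - \hat{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1}\big)^T \mathbf{R}^{(k)T}\mathbf{R}^{(k)} \big(\mathbf{x}_1^{k-1} - \hat{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1}\big) + \big\|\mathbf{y}^{(k)}\big\|^2 - \big\|\mathbf{R}^{(k)}\hat{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1}\big\|^2 \triangleq h_2(\breve{\mathbf{x}}_k^m) \tag{10}$$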
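where $\mathbf{R}^{(k)}$ is an $m \times (k-1)$ submatrix of $\mathbf{R}$ with columns $\mathbf{r}_k, \mathbf{r}_{k+1}, \ldots, \mathbf{r}_m$ removed, $\hat{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1} = \big(\mathbf{R}^{(k)T}\mathbf{R}^{(k)}\big)^{-1}\mathbf{R}^{(k)T}\mathbf{y}^{(k)}$ is the unconstrained ZF solution for the reduced-dimension detection problem in (8), and $\alpha^2 \triangleq \big\|\tilde{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1} - \hat{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1}\big\|^2$ is the squared distance between the constrained ZF solution $\tilde{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1} = \big\lfloor \hat{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1} \big\rceil_{\mathcal{Q}^{k-1}}$ and $\hat{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1}$. We reach (9) by rewriting the objective function, where the second and third terms do not depend on $\mathbf{x}_1^{k-1}$. From (9) to (10) we have used the fact that the first term in (9) is a convex function in $\mathbf{x}_1^{k-1}$ with the minimum value of zero achieved at $\mathbf{x}_1^{k-1} = \hat{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1}$ when there is no constraint on $\mathbf{x}_1^{k-1}$ (i.e., $\mathbf{x}_1^{k-1} \in \mathbb{R}^{k-1}$). By restricting $\mathbf{x}_1^{k-1}$ to the hypersphere of $\big\|\mathbf{x}_1^{k-1} - \hat{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1}\big\|^2 = \alpha^2$, we are guaranteed to find a positive-valued minimum no greater than that yielded by any $\mathbf{x}_1^{k-1}$ outside the hypersphere (due to convexity) as well as that yielded by $\mathbf{x}_1^{k-1} = \tilde{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1}$ (one special point on the hypersphere). Since $\tilde{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1}$ is the nearest lattice point to $\hat{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1}$ in $\mathcal{Q}^{k-1}$, we have obtained a lower bound.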
The first term in (10) is given by $\alpha^2\lambda_{\min}$, achieved at $\mathbf{x}_1^{k-1} = \alpha\mathbf{v}_{\min} + \hat{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1}$, where $\lambda_{\min}$ and $\mathbf{v}_{\min}$ are the minimum eigenvalue of $\mathbf{R}^{(k)T}\mathbf{R}^{(k)}$ and the corresponding unit-length eigenvector, respectively. Thus, the second heuristic is given by

$$h_2(\breve{\mathbf{x}}_k^m) = \alpha^2\lambda_{\min} + \big\|\mathbf{y}^{(k)}\big\|^2 - \big\|\mathbf{R}^{(k)}\hat{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1}\big\|^2. \tag{11}$$

As will be verified in Sec. IV, $h_2$ requires more computations than $h_1$ but yields better performance and memory efficiency. When the considered node $\breve{\mathbf{x}}_k^m$ is in layer $m-1$, “look-ahead multiple layers” reduces to “look-ahead one layer,” and $h_2$ and $h_1$ become identical. Substituting the new cost $c = d + h_2$ in the BFS-LA1($L$) algorithm gives the BFS-LA2($L$) algorithm.
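The rewriting that leads from (8) to (9), and the value $\alpha^2\lambda_{\min}$ of the first term in (10), can be checked with a short completing-the-square argument. The shorthand $\mathbf{A}$, $\mathbf{x}$, $\hat{\mathbf{x}}$, and $\mathbf{u}$ below is introduced only for this sketch and is not notation from the letter.

```latex
% Shorthand (assumed for brevity):
%   A = R^{(k)T} R^{(k)},  x = x_1^{k-1},  \hat{x} = \hat{x}_{1,ZF}^{k-1},  u = x - \hat{x}.
\begin{align*}
\|\mathbf{y}^{(k)} - \mathbf{R}^{(k)}\mathbf{x}\|^2
 &= \mathbf{x}^T\mathbf{A}\mathbf{x} - 2\,\mathbf{x}^T\mathbf{R}^{(k)T}\mathbf{y}^{(k)} + \|\mathbf{y}^{(k)}\|^2 \\
 &= \mathbf{x}^T\mathbf{A}\mathbf{x} - 2\,\mathbf{x}^T\mathbf{A}\hat{\mathbf{x}} + \|\mathbf{y}^{(k)}\|^2
   && \text{since } \mathbf{R}^{(k)T}\mathbf{y}^{(k)} = \mathbf{A}\hat{\mathbf{x}} \\
 &= (\mathbf{x}-\hat{\mathbf{x}})^T\mathbf{A}(\mathbf{x}-\hat{\mathbf{x}})
    - \hat{\mathbf{x}}^T\mathbf{A}\hat{\mathbf{x}} + \|\mathbf{y}^{(k)}\|^2 \\
 &= (\mathbf{x}-\hat{\mathbf{x}})^T\mathbf{A}(\mathbf{x}-\hat{\mathbf{x}})
    + \|\mathbf{y}^{(k)}\|^2 - \|\mathbf{R}^{(k)}\hat{\mathbf{x}}\|^2 .
\end{align*}
% Rayleigh-quotient bound for the quadratic form on the sphere \|u\|^2 = \alpha^2:
\[
\min_{\|\mathbf{u}\|^2 = \alpha^2} \mathbf{u}^T\mathbf{A}\mathbf{u}
  = \alpha^2\,\lambda_{\min}(\mathbf{A}),
\qquad \text{attained at } \mathbf{u} = \alpha\,\mathbf{v}_{\min}.
\]
```

The last line of the first display is the objective in (9), and substituting the Rayleigh-quotient minimum for its first term gives (11).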
IV. RESULTS AND DISCUSSIONS
A. Complexity Evaluation
Here, we evaluate the overall computational complexity of the proposed algorithms in comparison with conventional methods. Since all processing is conducted on real values based on (2), all the calculations below refer to real operations.
The complexity of a tree-search detection scheme is evaluated in terms of the number of nodes visited and expanded (defined respectively by nodes that ever occupy a position and become the best node in the node list). We let $I_k^{(\cdot)}$ and $J_k^{(\cdot)}$ be the number of visited and expanded nodes in layer $m-k+1$, respectively, for scheme $(\cdot)$ with some $L$ and, if applicable, $\epsilon$. Note that visiting a node in layer $m-k+1$ entails computing $b(\mathbf{x}_k^m)$ ($m-k+2$ multiplications and $m-k+1$ additions) and summing up $d(\mathbf{x}_{k+1}^m)$ and $b(\mathbf{x}_k^m)$ (one addition). Therefore, the total complexity of the BFS algorithm is given by $\sum_{k=1}^{m} I_k^{(\mathrm{BFS})}(m-k+2)$ multiplications and additions. The complexity of the BFS-LA1 algorithm includes the complexity of running the regular search iterations from layer 1 to $m-1$, which requires $\sum_{k=2}^{m} I_k^{(\mathrm{BFS\text{-}LA1})}(m-k+2)$ multiplications and additions, and the complexity of look-ahead (one layer). The complexity of look-ahead for a node in layer $m-k+1$ is equal to the sum of the complexity of computing $b(\mathbf{x}_{k-1}^m)$ once ($m-k+3$ multiplications and $m-k+2$ additions) and the complexity of adding up $d$ and $h_1$ (one addition). The number of nodes that require such computations is equal to the number of visited but nonexpanded nodes, which is $I_k^{(\mathrm{BFS\text{-}LA1})} - J_k^{(\mathrm{BFS\text{-}LA1})}$ for layer $1, \ldots, m-2$ and $I_k^{(\mathrm{BFS\text{-}LA1})}$ for layer $m-1$. Collecting these results, the total complexity of the BFS-LA1 algorithm is given by $\sum_{k=2}^{m} I_k^{(\mathrm{BFS\text{-}LA1})}(2m-2k+5) - \sum_{k=3}^{m} J_k^{(\mathrm{BFS\text{-}LA1})}(m-k+3)$ multiplications and additions.
Similar calculations can be carried out for the BFS-LA2 algorithm. Here, the complexity of look-ahead (multiple layers) includes the computation of $\lambda_{\min}$, $\hat{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1}$, and some matrix/vector manipulations (note that $\mathbf{y}^{(k)}$ is already available given $d(\breve{\mathbf{x}}_k^m)$). Note that $\big(\mathbf{R}^{(k)T}\mathbf{R}^{(k)}\big)^{-1}\mathbf{R}^{(k)T}$ and $\lambda_{\min}$ can be precomputed, one time per layer, but $\alpha^2$, $\big\|\mathbf{y}^{(k)}\big\|^2$, and $\big\|\mathbf{R}^{(k)}\hat{\mathbf{x}}_{1,\mathrm{ZF}}^{k-1}\big\|^2$ need to be computed for each node visited. We compute matrix/vector computations by direct multiplications and accumulations, matrix inverse by the efficient LDL$^H$ decomposition method [10], and $\lambda_{\min}$ by the power method [11] applied on $\big(\mathbf{R}^{(k)T}\mathbf{R}^{(k)}\big)^{-1}$ to obtain its dominant (largest) eigenvalue, which is the reciprocal of $\lambda_{\min}$. Summing numbers up, the total computation counts for the BFS-LA2 algorithm are

$$\sum_{k=2}^{m}\Big[ I_k^{(\mathrm{BFS\text{-}LA2})}(km-k+m+3) + (7/6)k^3 - (8/3)k + (3/2) \Big]$$

multiplications and

$$\sum_{k=2}^{m}\Big[ I_k^{(\mathrm{BFS\text{-}LA2})}(km-k+m+2) + (7/6)k^3 - (5/2)k^2 + (17/6)k - (3/2) \Big]$$

additions, where we have assumed equal numbers of multiplications and additions in the complexity of the power method approximated by $4(k-1)^2 + 3(k-1)$ [11]. The complexity of ML detection, considered for comparison as a baseline scheme, is given by $|\mathcal{Q}|^m(m^2+m)$ multiplications and $|\mathcal{Q}|^m(m^2+m-1)$ additions from (3).
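The closed-form counts for BFS, BFS-LA1, and exhaustive ML can be evaluated with a short sketch. The function names are hypothetical, and the per-layer counts `I_ideal` and `J_ideal` are an idealized assumption (one expansion per layer, so $|\mathcal{Q}|$ visited nodes and one expanded node per layer), not values taken from the letter. The dictionaries are keyed by $k$, so that `I[k]` refers to layer $m-k+1$.

```python
def bfs_ops(I, m):
    """Multiplications (= additions) for BFS: sum_{k=1}^{m} I_k (m - k + 2)."""
    return sum(I[k] * (m - k + 2) for k in range(1, m + 1))


def bfs_la1_ops(I, J, m):
    """Multiplications (= additions) for BFS-LA1, per the closed form in the text."""
    visited = sum(I[k] * (2 * m - 2 * k + 5) for k in range(2, m + 1))
    expanded = sum(J[k] * (m - k + 3) for k in range(3, m + 1))
    return visited - expanded


def ml_flops(m, q):
    """Exhaustive ML: q^m (m^2 + m) multiplications plus q^m (m^2 + m - 1) additions."""
    return q ** m * (m * m + m) + q ** m * (m * m + m - 1)


# 4 x 4 MIMO gives m = 8 real dimensions; 16-QAM gives |Q| = 4.
m, q = 8, 4
I_ideal = {k: q for k in range(1, m + 1)}  # assumed: q visited nodes per layer
J_ideal = {k: 1 for k in range(1, m + 1)}  # assumed: one expanded node per layer

bfs_total = 2 * bfs_ops(I_ideal, m)                  # multiplications + additions
la1_total = 2 * bfs_la1_ops(I_ideal, J_ideal, m)
ml_total = ml_flops(m, q)
```

Under this idealization the totals come to 352 flops for BFS and 550 flops for BFS-LA1, slightly below the 366 and 560 flops that Table I lists for the unconstrained schemes at 34 dB with 16-QAM, and `ml_total` evaluates to roughly $9.4 \times 10^6$, in line with the ML entry of the table.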
B. Simulation Results
Here, we present the simulation results: symbol error rate (SER) performance in Fig. 1, memory usage in Fig. 2, and complexity in terms of floating-point operations (flops) in Table I (each real multiplication or addition counts as one flop). A standard minimum-mean-square-error (MMSE) linear detector is adopted in Fig. 1 for comparison. The memory usage counts the number of memory units required for running an algorithm, where each unit is used to store the (partial) symbol vector represented by a node and the cost associated with a node. Let $I$ and $J$ be the total number of visited and expanded nodes, respectively, where $I = J|\mathcal{Q}|$. Then, the memory usage for a BFS-based detection scheme is given by $J(|\mathcal{Q}|-1)+1$ or $I(1-1/|\mathcal{Q}|)+1$ units for the case of unlimited memory, and $\min\{J(|\mathcal{Q}|-1)+1,\ (|\mathcal{Q}|-1)+L\}$ units for the case of limited memory with list-size constraint $L$. The signal-to-noise ratio (SNR) in the plots is defined as $E[\|\mathbf{H}_c\tilde{\mathbf{x}}_c\|^2]/E[\|\mathbf{v}_c\|^2] = N_T\sigma_x^2/\sigma_v^2$.
Fig. 1 shows that, in the unlimited-memory setting, BFS, BFS-LA1, and BFS-LA2 all achieve optimal performance. BFS($\infty$, $\epsilon$) has various degrees of performance degradation but generally achieves memory savings (Fig. 2) and reduced complexity (Table I). The performance penalty for BFS($\infty$, $\epsilon$) is moderate in Fig. 1(a) and very high in Fig. 1(b) with the same bias $\epsilon = 0.01$. This shows that the selection of bias is scenario-dependent and requires a manual effort. In the memory-constrained setting, the proposed BFS-LA1 and BFS-LA2 schemes demonstrate significant SER performance advantage over conventional schemes. The improved memory efficiency of the proposed schemes when there is no memory constraint (Fig. 2) results in a smaller SER performance degradation when a memory constraint is imposed. In Fig. 1(a), BFS-LA1(4) achieves a 3–4 dB gain over BFS(4) at SER = $3 \times 10^{-3}$, at moderate additional cost (e.g., 560 vs. 364 flops at SNR = 34 dB). BFS-LA2(4) achieves another 1–2 dB gain at high additional cost (e.g., 5,132 flops at SNR = 34 dB). Similar observations can be made in Fig. 1(b). Clearly, there is a tradeoff between the computations required in obtaining the heuristic and the SER and memory performances yielded as a result of using the heuristic.

[Figure: SER versus SNR (dB) for MMSE, BFS($L$), BFS($L$, 0.01), BFS-LA1($L$), BFS-LA2($L$), BFS($\infty$, 0.01), and BFS($\infty$), BFS-LA1($\infty$), BFS-LA2($\infty$) = ML; panel (a) uses $L = 4$ over 22–34 dB and panel (b) uses $L = 8$ over 30–42 dB.]
Fig. 1. SER performance of MIMO detection schemes. (a) 4 × 4 MIMO with 16-QAM. (b) 4 × 4 MIMO with 64-QAM. Notations follow those in Table I.

TABLE I
COMPLEXITY (FLOPS) COMPARISONS OF MIMO DETECTION SCHEMES (LIST-SIZE CONSTRAINT L = 4 FOR 4 × 4 16-QAM AND L = 8 FOR 4 × 4 64-QAM; BIAS $\epsilon$ = 0.01)

MIMO System: 4 × 4
Scheme              | 16-QAM, SNR (dB)          | 64-QAM, SNR (dB)
                    |     22 |     28 |     34  |     30 |     36 |     42
BFS(L, $\epsilon$)        |    618 |    396 |    357  |  1,102 |    758 |    719
BFS(L)              |    655 |    421 |    364  |  1,411 |    842 |    727
BFS-LA1(L)          |    930 |    618 |    560  |  2,102 |  1,303 |  1,184
BFS-LA2(L)          |  6,666 |  5,432 |  5,132  | 11,515 |  8,316 |  7,734
BFS($\infty$, $\epsilon$) |    747 |    398 |    357  |  1,153 |    758 |    719
BFS($\infty$)             |    846 |    436 |    366  |  1,679 |    861 |    730
BFS-LA1($\infty$)         |  1,084 |    629 |    560  |  2,340 |  1,318 |  1,185
BFS-LA2($\infty$)         |  7,183 |  5,466 |  5,133  | 11,695 |  8,355 |  7,736
ML                  | $9.4 \times 10^6$ (all SNRs)   | $2.4 \times 10^9$ (all SNRs)
Fig. 2 illustrates the memory-reduction capability of the proposed schemes. Comparing different schemes without a list-size constraint (i.e., $L = \infty$) shows the nature of the algorithms in terms of memory performance. As SNR increases, the memory usage converges to $m(|\mathcal{Q}|-1)+1$ for BFS with/without the bias, and $(m-1)(|\mathcal{Q}|-1)+1$ for BFS-LA1/BFS-LA2. This suggests that the proposed schemes are asymptotically more memory-efficient than conventional schemes at high SNR. In other SNR regions, various degrees of memory saving are achieved for the proposed schemes since fewer iterations are needed due to look-ahead. The reduced memory usage also leads to reduced sorting complexity, since generating a sorted list (or finding the minimum-cost node in the case of $L = \infty$) is required at each iteration of the algorithm.
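The memory-usage expressions and the high-SNR limits quoted above can be evaluated directly. The function name `memory_units` is an assumption made for this sketch. The high-SNR values follow from taking $J = m$ expanded nodes for BFS and $J = m-1$ for the look-ahead schemes, which is how the limiting expressions in the text arise.

```python
def memory_units(J, q, L=None):
    """Memory units for a BFS-based detector with J expanded nodes and |Q| = q.

    L=None corresponds to unlimited memory; otherwise L is the list-size constraint.
    """
    unlimited = J * (q - 1) + 1
    if L is None:
        return unlimited
    return min(unlimited, (q - 1) + L)


m = 8  # 4 x 4 MIMO in the real-valued model
high_snr = {
    "BFS, 16-QAM": memory_units(m, 4),          # m(|Q| - 1) + 1
    "BFS-LA, 16-QAM": memory_units(m - 1, 4),   # (m - 1)(|Q| - 1) + 1
    "BFS, 64-QAM": memory_units(m, 8),
    "BFS-LA, 64-QAM": memory_units(m - 1, 8),
}
```

For the 4 × 4 system this gives 25 versus 22 units with 16-QAM and 57 versus 50 units with 64-QAM, so the asymptotic saving of the look-ahead schemes is $|\mathcal{Q}|-1$ units.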
[Figure: memory usage (units) versus SNR (dB) for BFS($\infty$), BFS($\infty$, 0.01), BFS-LA1($\infty$), and BFS-LA2($\infty$), each shown for 16-QAM and 64-QAM.]
Fig. 2. Memory usage for BFS-based detection schemes. (a) 4 × 4 MIMO with 16-QAM and 64-QAM. (b) 8 × 8 MIMO with 16-QAM and 64-QAM. Notations follow those in Table I.
V. CONCLUSION
Modified BFS-based MIMO detection algorithms incorpo-
rating an efficient look-ahead mechanism have been presented.
Simulation results demonstrated that the proposed algorithms
maintain exact ML detection capability while achieving mem-
ory savings and enhanced performance in memory-constrained
scenarios. Complexity analysis was conducted to confirm the
computational feasibility of the proposed algorithms.
REFERENCES
[1] E. G. Larsson, “MIMO detection methods: How they work,” IEEE Signal Process. Mag., vol. 26, no. 3, pp. 91–95, May 2009.
[2] F. Jelinek, “Fast sequential decoding algorithm using a stack,” IBM J. Research and Development, vol. 13, no. 6, pp. 675–685, Nov. 1969.
[3] A. D. Murugan, H. El Gamal, M. O. Damen, and G. Caire, “A unified framework for tree search decoding: rediscovering the sequential decoder,” IEEE Trans. Inf. Theory, vol. 52, no. 3, pp. 933–953, Mar. 2006.
[4] K. Su, “Efficient maximum likelihood detection for communication over multiple input multiple output channels,” Ph.D. dissertation, Univ. of Cambridge, 2005.
[5] T. Fukatani, R. Matsumoto, and T. Uyematsu, “Two methods for decreasing the computational complexity of the MIMO ML decoder,” IEICE Trans. Fundamentals, vol. E87–A, no. 10, pp. 2571–2576, Oct. 2004.
[6] A. Okawado, R. Matsumoto, and T. Uyematsu, “Near ML detection using Dijkstra’s algorithm with bounded list size over MIMO channels,” in Proc. 2008 IEEE International Symp. on Inform. Theory, pp. 2022–2025.
[7] Y. Dai and Z. Yan, “Memory-constrained tree search detection and new ordering schemes,” IEEE J. Sel. Topics Signal Process., vol. 3, no. 6, pp. 1026–1037, Dec. 2009.
[8] P. E. Hart, N. J. Nilsson, and B. Raphael, “A formal basis for the heuristic determination of minimum cost paths,” IEEE Trans. Systems Science and Cybernetics, vol. 4, no. 2, pp. 100–107, July 1968.
[9] R. Dechter and J. Pearl, “Generalized best-first search strategies and the optimality of A*,” J. ACM, vol. 32, no. 3, pp. 505–536, July 1985.
[10] T.-H. Liu and Y.-L. Y. Liu, “Modified fast recursive algorithm for efficient MMSE-SIC detection of the V-BLAST system,” IEEE Trans. Wireless Commun., vol. 7, no. 10, pp. 3713–3717, Oct. 2008.
[11] I. Dimov and A. Karaivanova, “A power method with Monte Carlo iterations,” in Recent Advances in Numerical Methods and Applications, World Scientific, Singapore, 1999, pp. 239–247.
Citations
Journal ArticleDOI
TL;DR: In this article, the authors provide a recital on the historic heritages and novel challenges facing massive/large-scale multiple-input multiple-output (LS-MIMO) systems from a detection perspective.
Abstract: The emerging massive/large-scale multiple-input multiple-output (LS-MIMO) systems that rely on very large antenna arrays have become a hot topic of wireless communications. Compared to multi-antenna aided systems being built at the time of this writing, such as the long-term evolution (LTE) based fourth generation (4G) mobile communication system which allows for up to eight antenna elements at the base station (BS), the LS-MIMO system entails an unprecedented number of antennas, say 100 or more, at the BS. The huge leap in the number of BS antennas opens the door to a new research field in communication theory, propagation and electronics, where random matrix theory begins to play a dominant role. Interestingly, LS-MIMOs also constitute a perfect example of one of the key philosophical principles of the Hegelian Dialectics, namely, that “quantitative change leads to qualitative change.” In this treatise, we provide a recital on the historic heritages and novel challenges facing LS-MIMOs from a detection perspective. First, we highlight the fundamentals of MIMO detection, including the nature of co-channel interference (CCI), the generality of the MIMO detection problem, the received signal models of both linear memoryless MIMO channels and dispersive MIMO channels exhibiting memory, as well as the complex-valued versus real-valued MIMO system models. Then, an extensive review of the representative MIMO detection methods conceived during the past 50 years (1965–2015) is presented, and relevant insights as well as lessons are inferred for the sake of designing complexity-scalable MIMO detection algorithms that are potentially applicable to LS-MIMO systems. Furthermore, we divide the LS-MIMO systems into two types, and elaborate on the distinct detection strategies suitable for each of them. 
The type-I LS-MIMO corresponds to the case where the number of active users is much smaller than the number of BS antennas, which is currently the mainstream definition of LS-MIMO. The type-II LS-MIMO corresponds to the case where the number of active users is comparable to the number of BS antennas. Finally, we discuss the applicability of existing MIMO detection algorithms in LS-MIMO systems, and review some of the recent advances in LS-MIMO detection.

626 citations

Journal ArticleDOI
TL;DR: Numerical evaluations suggest that WESN can significantly improve the symbol detection performance as well as effectively mitigate model mismatch effects using very limited training symbols.
Abstract: In this paper, we introduce a reservoir computing (RC) structure, namely, windowed echo state network (WESN), for multiple-input-multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM) symbol detection. We show that adding buffers in input layers is able to bring an enhanced short-term memory (STM) to the standard echo state network. A unified training framework is developed for the introduced WESN MIMO-OFDM symbol detector using both comb and scattered patterns, where the training set size is compatible with those adopted in 3GPP LTE/LTE-Advanced standards. Complexity analysis demonstrates the advantages of WESN based symbol detector over state-of-the-art symbol detectors when the number of OFDM sub-carriers is large, where the benchmark methods are chosen as linear minimum mean square error (LMMSE) detection and sphere decoder. Numerical evaluations suggest that WESN can significantly improve the symbol detection performance as well as effectively mitigate model mismatch effects using very limited training symbols.

27 citations


Additional excerpts

  • ...complexity reduced sphere decoding algorithm proposed in [35] for the evaluation....

    [...]

Posted Content
TL;DR: In this paper, a hyperaccelerated tree search (HATS) algorithm was proposed to solve the optimal signal detection problem in large-scale MIMO systems, which employs a deep neural network (DNN) to estimate the optimal heuristic, and then uses the estimated heuristic to speed up the underlying memory-bounded search algorithm.
Abstract: This paper investigates the optimal signal detection problem with a particular interest in large-scale multiple-input multiple-output (MIMO) systems. The problem is NP-hard and can be solved optimally by searching the shortest path on the decision tree. Unfortunately, the existing optimal search algorithms often involve prohibitively high complexities, which indicates that they are infeasible in large-scale MIMO systems. To address this issue, we propose a general heuristic search algorithm, namely, hyperaccelerated tree search (HATS) algorithm. The proposed algorithm employs a deep neural network (DNN) to estimate the optimal heuristic, and then use the estimated heuristic to speed up the underlying memory-bounded search algorithm. This idea is inspired by the fact that the underlying heuristic search algorithm reaches the optimal efficiency with the optimal heuristic function. Simulation results show that the proposed algorithm reaches almost the optimal bit error rate (BER) performance in large-scale systems, while the memory size can be bounded. In the meanwhile, it visits nearly the fewest tree nodes. This indicates that the proposed algorithm reaches almost the optimal efficiency in practical scenarios, and thereby it is applicable for large-scale systems. Besides, the code for this paper is available at https://github.com/skypitcher/hats.

19 citations

Journal ArticleDOI
TL;DR: In this paper, a windowed echo state network (WESN) was proposed for symbol detection in MIMO-OFDM systems, where buffers in input layers can bring an enhanced short-term memory (STM) to the underlying neural network.
Abstract: Reservoir computing (RC) is a special recurrent neural network which consists of a fixed high dimensional feature mapping and trained readout weights. In this paper, we introduce a new RC structure for multiple-input, multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM) symbol detection, namely windowed echo state network (WESN). The theoretical analysis shows that adding buffers in input layers can bring an enhanced short-term memory (STM) to the underlying neural network. Furthermore, a unified training framework is developed for the WESN MIMO-OFDM symbol detector using both comb and scattered pilot patterns that are compatible with the structure adopted in 3GPP LTE/LTE-Advanced systems. Complexity analysis suggests the advantages of WESN based symbol detector over state-of-the-art symbol detectors such as the linear minimum mean square error (LMMSE) detection and the sphere decoder, when the system is employed with a large number of OFDM sub-carriers. Numerical evaluations illustrate the advantage of the introduced WESN-based symbol detector and demonstrate that the improvement of STM can significantly improve symbol detection performance as well as effectively mitigate model mismatch effects compared to existing methods.

16 citations

Journal ArticleDOI
TL;DR: In this paper, a hyper-accelerated tree search (HATS) algorithm was proposed to solve the optimal signal detection problem in large-scale multiple-input multiple-output (MIMO) systems, which employs a deep neural network (DNN) to estimate the optimal heuristic, and then uses the estimated heuristic to speed up the underlying memory-bounded search algorithm.
Abstract: This paper investigates the optimal signal detection problem with a particular interest in large-scale multiple-input multiple-output (MIMO) systems. The problem is NP-hard and can be solved optimally by searching the shortest path on the decision tree. Unfortunately, the existing optimal search algorithms often involve prohibitively high complexities, which indicates that they are infeasible in large-scale MIMO systems. To address this issue, we propose a general heuristic search algorithm, namely, hyper-accelerated tree search (HATS) algorithm. The proposed algorithm employs a deep neural network (DNN) to estimate the optimal heuristic, and then uses the estimated heuristic to speed up the underlying memory-bounded search algorithm. This idea is inspired by the fact that the underlying heuristic search algorithm reaches the optimal efficiency with the optimal heuristic function. Simulation results show that the proposed algorithm reaches almost the optimal bit error rate (BER) performance in large-scale systems, while the memory size can be bounded. Meanwhile, it visits nearly the fewest tree nodes. This indicates that the proposed algorithm reaches almost the optimal efficiency in practical scenarios, and thereby it is applicable for large-scale systems. The code for this paper is available at https://github.com/skypitcher/hats.

12 citations

References
Proceedings ArticleDOI
05 Jun 2005
TL;DR: It is argued that the minimum mean square error decision feedback (MMSE-DFE) frontend is instrumental for solving the joint detection and decoding problem in a single search stage and shown that MMSE-DFE filtering allows for using lattice reduction methods to reduce complexity, at the expense of a marginal performance loss, and solving under-determined linear systems.
Abstract: We consider receiver design for coded transmission over linear Gaussian channels. We restrict ourselves to the class of lattice codes and formulate the joint detection and decoding problem as a closest lattice point search (CLPS). Here, a tree search framework for solving the CLPS is adopted. In our framework, the CLPS algorithm decomposes into preprocessing and tree search stages. The role of the preprocessing stage is to expose the tree structure in a form matched to the search stage. Here, it is argued that the minimum mean square error decision feedback (MMSE-DFE) frontend is instrumental for solving the joint detection and decoding problem in a single search stage. It is further shown that MMSE-DFE filtering allows for using lattice reduction methods to reduce complexity, at the expense of a marginal performance loss, and solving under-determined linear systems. For the search stage, we present a generic method, based on the branch and bound (BB) algorithm, and show that it encompasses all existing sphere decoders as special cases. The proposed generic algorithm further allows for an interesting classification of tree search decoders, sheds more light on the structural properties of all known sphere decoders, and inspires the design of more efficient decoders. In particular, an efficient decoding algorithm that resembles the well known Fano sequential decoder is identified. The excellent performance-complexity tradeoff achieved by the proposed MMSE-Fano decoder is established via simulation results and analytical arguments in several MIMO and ISI scenarios.

124 citations


"A* Algorithm Inspired Memory-Effici..." refers background in this paper

  • ...Modifying the sorting criterion such as the use of a biased cost [3] can improve the error and complexity performance of a BFS scheme in memory-constrained scenarios, yet at the loss of detection optimality....

    [...]

  • ...The cost can be a node’s path metric d [4]–[6], or its biased path metric d − k [3] if this node is in layer k, where > 0 is the bias....

    [...]

Journal ArticleDOI
TL;DR: The modified FRA can be used with the transformation based successive interference cancellation procedure to process a V-BLAST frame that contains the preamble and payload and compute an ordered set of nulling vectors from the estimated channel information.
Abstract: Detection of transmitted symbols in a V-BLAST system using the minimum mean squared error criterion with successive interference cancellation (MMSE-SIC) can provide satisfactory bit error rate performance at the cost of moderate computational complexity. The fast recursive algorithm (FRA), developed by Benesty et al., is one of the well-known implementation algorithms for the MMSE-SIC detector. We modify the FRA to compute an ordered set of nulling vectors from the estimated channel information. Our modified FRA can be used with the transformation based successive interference cancellation procedure to process a V-BLAST frame that contains the preamble and payload. To our knowledge, such implementation of the MMSE-SIC detector requires the lowest complexity to process a frame of preamble and payload.

51 citations


"A* Algorithm Inspired Memory-Effici..." refers methods in this paper

  • ...We compute matrix/vector computations by direct multiplications and accumulations, matrix inverse by the efficient LDL decomposition method [10], and $\lambda_{\min}$ by the power method [11] applied on $(R^T R)^{-1}$ to obtain its dominant (largest) eigenvalue....

    [...]

Journal Article
TL;DR: QR factorization with sort and Dijkstra’s algorithm for decreasing the computational complexity of the sphere decoder that is used for ML detection of signals on the multi-antenna fading channel is proposed.
Abstract: We propose the use of QR factorization with sort and Dijkstra’s algorithm for decreasing the computational complexity of the sphere decoder that is used for ML detection of signals on the multi-antenna fading channel. QR factorization with sort decreases the complexity of the searching part of the decoder with a small increase in the complexity required for the preprocessing part of the decoder. Dijkstra’s algorithm decreases the complexity of the searching part of the decoder with an increase in the storage complexity. The computer simulation demonstrates that the complexity of the decoder is reduced significantly by the proposed methods.

34 citations

Journal ArticleDOI
TL;DR: This paper proposes a memory-constrained tree search (MCTS) algorithm that bridges the gap between the sphere decoding (SD) and stack algorithms and proposes novel ordering schemes that can be easily embedded in the QR decomposition.
Abstract: Hardware implementations of tree search-based multiple-input multiple-output (MIMO) detection often have limited performance due to large memory requirement or high computational complexity of sophisticated MIMO detection algorithms. In this paper, we propose new tree search-based detection algorithms that achieve maximum-likelihood (ML) performance under any given memory constraints and with reduced computational complexity. To this end, we make two main contributions. First, we propose a memory-constrained tree search (MCTS) algorithm that bridges the gap between the sphere decoding (SD) and stack algorithms. Our MCTS algorithm dynamically adapts to any pre-specified memory constraint and offers a graceful tradeoff between computational complexity and memory requirement while maintaining the ML performance. When the memory size is set as the minimum, our MCTS algorithm is similar to the SD algorithm. As the memory size increases, the average computational complexity of our MCTS algorithm decreases. When the memory size becomes large, our MCTS algorithm is similar to the stack algorithm, having similar average computational complexity but requiring significantly less memory. To further reduce the computational complexity of tree search-based ML detection algorithms, we propose novel ordering schemes that can be easily embedded in the QR decomposition and take into account both the channel matrix and the received signal (noise); simulation results show that our ordering schemes lead to reduced average computational complexity for the SD and MCTS algorithms, and the reduction is significant at low to medium signal-to-noise ratio region.

32 citations


"A* Algorithm Inspired Memory-Effici..." refers methods in this paper

  • ...The optimal detection performance in memory-constrained scenarios is guaranteed in the proposed scheme in [7] that combines the memory-efficient sphere decoder and the computationally-efficient Dijkstra’s algorithm....

    [...]

Proceedings ArticleDOI
01 Jul 1999

13 citations


"A* Algorithm Inspired Memory-Effici..." refers methods in this paper

  • ...numbers of multiplications and additions in the complexity of the power method approximated by $4(k-1)^2 + 3(k-1)$ [11]....

    [...]

  • ...We compute matrix/vector computations by direct multiplications and accumulations, matrix inverse by the efficient LDL decomposition method [10], and $\lambda_{\min}$ by the power method [11] applied on $(R^T R)^{-1}$ to obtain its dominant (largest) eigenvalue....

    [...]
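
The excerpt above obtains the smallest eigenvalue of R^T R from the dominant eigenvalue of its inverse. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not the paper's implementation: R is assumed to be a real, upper-triangular, nonsingular matrix, the function names (`solve_lower`, `solve_upper`, `lambda_min`) and the iteration count are hypothetical, and the explicit inverse via LDL decomposition mentioned in the excerpt is swapped for two triangular solves per iteration.

```python
import math

def solve_lower(L, b):
    """Forward substitution: solve L z = b for lower-triangular L."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        s = sum(L[i][j] * z[j] for j in range(i))
        z[i] = (b[i] - s) / L[i][i]
    return z

def solve_upper(U, b):
    """Back substitution: solve U x = b for upper-triangular U."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / U[i][i]
    return x

def lambda_min(R, iters=200):
    """Smallest eigenvalue of R^T R via the power method on (R^T R)^{-1}.

    Assumes R is real, upper triangular and nonsingular (an assumption of
    this sketch). Each step applies (R^T R)^{-1} through two triangular
    solves instead of forming the inverse explicitly.
    """
    n = len(R)
    Rt = [[R[j][i] for j in range(n)] for i in range(n)]  # R^T, lower triangular
    x = [1.0 / math.sqrt(n)] * n                           # unit-norm start vector
    mu = 0.0
    for _ in range(iters):
        y = solve_upper(R, solve_lower(Rt, x))             # y = (R^T R)^{-1} x
        mu = sum(a * b for a, b in zip(x, y))              # Rayleigh quotient (x has unit norm)
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return 1.0 / mu                                        # dominant eigenvalue of the inverse is 1 / lambda_min
```

For example, `lambda_min([[2.0, 0.0], [0.0, 1.0]])` should approach 1, since R^T R is then diag(4, 1). A fixed iteration count is used only for brevity; a convergence test on successive estimates would be the more usual stopping rule.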

Frequently Asked Questions (1)
Q1. What are the contributions in this paper?

In this letter, the authors propose modified best-first detection algorithms in which the order of nodes is determined by both the original cost and the estimated future cost associated with each node, as inspired by an improved shortest path algorithm (A* algorithm).
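
To make the ordering rule concrete, here is a minimal Python sketch of a best-first tree search whose priority is the path metric plus an estimate of the future cost. Everything in it is an assumption made for illustration: the real-valued model y ≈ Rx with upper-triangular R, the function names, and in particular `one_step_bound`, a simple one-layer lookahead that never overestimates the remaining cost. It is not the heuristic proposed in the letter, and no memory constraint is modeled.

```python
import heapq
from itertools import product

def layer_cost(R, y, i, tail):
    """Squared residual of layer i; tail holds x[i], x[i+1], ..., x[n-1]."""
    est = sum(R[i][i + j] * s for j, s in enumerate(tail))
    return (y[i] - est) ** 2

def one_step_bound(R, y, alphabet, tail):
    """Hypothetical future-cost estimate: cheapest possible cost of the next layer.

    The true remaining cost is at least this value, so the estimate never
    overestimates and the first leaf popped is still the optimal one.
    """
    k = len(R) - len(tail)              # layers 0..k-1 are still undecided
    if k == 0:
        return 0.0
    return min(layer_cost(R, y, k - 1, (s,) + tail) for s in alphabet)

def best_first_detect(R, y, alphabet, heuristic=None):
    """Return (symbols, path metric, number of expanded nodes).

    With heuristic=None the list is ordered by path metric alone
    (Dijkstra-style); otherwise by path metric plus estimated future cost.
    """
    n = len(R)
    h = heuristic or (lambda R, y, alphabet, tail: 0.0)
    heap = [(h(R, y, alphabet, ()), 0.0, ())]   # entries: (d + h, d, decided symbols)
    expanded = 0
    while heap:
        _, d, tail = heapq.heappop(heap)
        if len(tail) == n:                      # first complete path popped is optimal
            return list(tail), d, expanded
        expanded += 1
        i = n - len(tail) - 1                   # detect from the last layer upward
        for s in alphabet:
            child = (s,) + tail
            dc = d + layer_cost(R, y, i, child)
            heapq.heappush(heap, (dc + h(R, y, alphabet, child), dc, child))
    raise ValueError("empty search tree")

def brute_force(R, y, alphabet):
    """Exhaustive reference detector, used only to check small examples."""
    n = len(R)
    cost = lambda x: sum(layer_cost(R, y, i, tuple(x[i:])) for i in range(n))
    return list(min(product(alphabet, repeat=n), key=cost))
```

Calling `best_first_detect(R, y, alphabet)` and `best_first_detect(R, y, alphabet, one_step_bound)` on the same small problem should return the same symbol vector as `brute_force`; comparing the third return value gives a rough sense of how much the future-cost estimate reduces the number of expanded nodes.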