Journal ArticleDOI

Performance Analysis of Matrix-Vector Multiplication in Hybrid 'MPI + OpenMP'

31 May 2011-International Journal of Computer Applications (Foundation of Computer Science (FCS))-Vol. 22, Iss: 5, pp 22-25
TL;DR: The proposed hybrid model combines both approaches to reduce their individual weaknesses, and the results indicate that the hybrid approach outperforms the pure MPI and OpenMP approaches.
Abstract: Computing multiple tasks simultaneously on multiple processors is called parallel computing. A parallel program consists of multiple active processes simultaneously solving a given problem. Parallel computers can be roughly classified as multi-processor and multi-core; in both classes the hardware supports parallelism, with a compute node having multiple processing elements in a single machine, on more than one distinct chip or within a single chip package, respectively. Parallel programming, the ability of a program to exploit this infrastructure, is still a difficult and complex task. Two widely used approaches in parallel environments are MPI and OpenMP, each with its own merits and demerits. The hybrid model combines both approaches in order to reduce their individual weaknesses. The proposed approach takes a pair of matrices and produces another matrix using a matrix-vector multiplication algorithm; the resulting matrix agrees with the composition of the linear transformations represented by the two original matrices. This algorithm is implemented in MPI, OpenMP, and hybrid modes and is tested on varying numbers of nodes with different matrix sizes. The results indicate that the hybrid approach outperforms the pure MPI and OpenMP approaches.

Summary (2 min read)

1. INTRODUCTION

  • Matrices are a key tool in linear algebra.
  • For a square matrix, the determinant and inverse matrix (when it exists) govern the behavior of solutions to the corresponding system of linear equations, and eigenvalues and eigenvectors provide insight into the geometry of the associated linear transformation.
  • The custom supercomputer of yesteryear has given way to commodity-based supercomputing, or what is now called High Performance Computing (HPC).
  • Competitors that use HPC won’t talk much about it because it’s considered a competitive advantage [3].
  • The main goal of writing a parallel program is to get better performance over the Serial version.

1.1 MPI:

  • The generic form of message passing in parallel processing is the Message Passing Interface (MPI), which is used as the medium of communication.
  • The Message Passing Interface (MPI) standard was originally designed for writing applications and libraries for distributed-memory environments.
  • MPI does provide message-passing routines for exchanging all the information needed to allow a single MPI implementation to operate in a heterogeneous environment [1].

1.2 OpenMP:

  • OpenMP is an Application Program Interface (API) that may be used to explicitly direct multi-threaded, shared memory parallelism.
  • The programming environment available on most multi-core processors must address thread-to-core affinity and the overheads of the OpenMP programming environment.

1.3 HYBRID:

  • Combining shared-memory and distributed-memory programming models is an old idea [1].
  • One wants to exploit the strengths of both models: the efficiency, memory savings, and ease of programming of the shared-memory model and the scalability of the distributed-memory model.
  • The idea of using OpenMP [3] threads to exploit the multiple cores per node while using MPI to communicate among the nodes appears obvious.
  • Yet one can also use an “MPI everywhere” approach on these architectures, and the data on which approach is better is confusing and inconclusive.
  • It appears to be heavily dependent on the hardware, the MPI and OpenMP implementations, and above all on the application and the skill of the application writer.

2. IMPLEMENTATION DETAILS

  • The Matrix product is the most commonly used type of product of matrices.
  • Matrices offer a concise way of representing linear transformations between vector spaces, and matrix multiplication corresponds to the composition of linear transformations [4].
  • The matrix product of two matrices can be defined when their entries belong to the same ring, and hence can be added and multiplied [5], and, additionally, the number of the columns of the first matrix matches the number of the rows of the second matrix.
  • This definition can be restated by postulating that the matrix product is left and right distributive and that the matrix units are multiplied according to the rule E_{ik} E_{lj} = δ_{kl} E_{ij}.
  • Here the first factor is the m×n matrix with 1 at the intersection of the ith row and the kth column and zeros elsewhere, and the second factor is the n×p matrix with 1 at the intersection of the lth row and the jth column and zeros elsewhere.

3. RESULTS

  • Performance of pure MPI versus hybrid (MPI+OpenMP) was analysed using matrix multiplication, with MPI running (1+3) tasks on the dual-core node and 2 tasks on each single-core node, and the same layout for the hybrid model.
  • As seen in Figures 3.1, 3.2, 3.3, and 3.4, hybrid programming gives better results than pure MPI programming because the load-balancing problem of MPI is reduced by using OpenMP threads within the MPI processes.

4. CONCLUSION

  • This paper compares the performance of a program implemented using MPI, OpenMP, and Hybrid (MPI+OpenMP).
  • It is observed that the Hybrid mixed-mode programming model gives better performance than the pure MPI and OpenMP programming models for the numbers of tasks and threads assigned to each processor, and that it scales well.
  • Hence a combination of shared-memory and message-passing parallelization paradigms within the same application (mixed-mode programming) may provide a more efficient parallelization strategy than pure MPI or OpenMP.



Performance Analysis of Matrix-Vector Multiplication in
Hybrid (MPI + OpenMP)
Vivek N. Waghmare, Sandip V. Kendre and Sanket G. Chordiya
Assistant Professor
Sandip Institute of Tech. & Research Centre, Nashik
Maharashtra (INDIA)
ABSTRACT
Computing multiple tasks simultaneously on multiple processors
is called parallel computing. A parallel program consists of
multiple active processes simultaneously solving a given
problem. Parallel computers can be roughly classified as
multi-processor and multi-core; in both classes the hardware
supports parallelism, with a compute node having multiple
processing elements in a single machine, on more than one
distinct chip or within a single chip package, respectively.
Parallel programming, the ability of a program to exploit this
infrastructure, is still a difficult and complex task. Two widely
used approaches in parallel environments are MPI and OpenMP,
each with its own merits and demerits. The hybrid model
combines both approaches in order to reduce their individual
weaknesses.
The proposed approach takes a pair of matrices and produces
another matrix using a matrix-vector multiplication algorithm.
The resulting matrix agrees with the result of composing the
linear transformations represented by the two original matrices.
This algorithm is implemented in MPI, OpenMP, and hybrid
modes and is tested on varying numbers of nodes with different
matrix sizes. The results indicate that the hybrid approach
outperforms the pure MPI and OpenMP approaches.
Keywords: MPI, OpenMP, Hybrid (MPI+OpenMP),
Matrix-Vector Multiplication Algorithm
1. INTRODUCTION
Matrices are a key tool in linear algebra. One use of matrices is
to represent linear transformations, which are higher-
dimensional analogs of linear functions of the form f(x) = cx,
where c is a constant; matrix multiplication corresponds to
composition of linear transformations. Matrices can also keep
track of the coefficients in a system of linear equations [5]. For
a square matrix, the determinant and inverse matrix (when it
exists) govern the behavior of solutions to the corresponding
system of linear equations, and eigenvalues and eigenvectors
provide insight into the geometry of the associated linear
transformation. Matrices find many applications. Physics makes
use of matrices in various domains, for example in geometrical
optics and matrix mechanics; the latter led to studying in more
detail matrices with an infinite number of rows and columns.
Graph theory uses matrices to keep track of distances between
pairs of vertices in a graph. Computer graphics uses matrices to
project 3-dimensional space onto a 2-dimensional screen [4].
Matrix calculus generalizes classical analytical notions such as
derivatives of functions or exponentials to matrices. The latter
is a recurring need in solving ordinary differential equations.
Serialism and dodecaphonism are musical movements of the
20th century that use a square mathematical matrix to determine
the pattern of music intervals.
Mention the word supercomputer to someone and they
automatically think of monstrously complicated machines
solving problems no one really understands. Maybe they think
of flashing lights and some super intelligence that can beat
humans at chess or figure out the meaning of life, the universe,
and everything. Back in the day, this was not an altogether
untrue view of supercomputing. With an entry fee of at least
seven figures, supercomputing was for the serious scientists and
engineers who needed to crunch numbers as fast as possible.
Today we have a different world. The custom supercomputer of
yesteryear has given way to commodity-based supercomputing,
or what is now called High Performance Computing (HPC). In
today’s HPC world, it is not uncommon for the supercomputer
to use the same hardware found in Web servers and even
desktop workstations.
The HPC world is now open to almost everyone because
the cost of entry is at an all-time low. To many organizations,
HPC is now considered an essential part of business success.
Your competition may be using HPC right now. They won’t
talk much about it because it’s considered a competitive
advantage [3]. Of one thing you can be sure, however; they’re
designing new products, optimizing manufacturing and delivery
processes, solving production problems, mining data, and
simulating everything from business process to shipping crates
all in an effort to become more competitive, profitable, and
“green”. HPC may very well be the new secret weapon. The
main goal of writing a parallel program is to get better
performance over the Serial version. With this in mind, there
are several issues that one needs to consider when designing the
parallel code to obtain the best performance possible within the
constraints of the problem being solved.
1.1 MPI:
The generic form of message passing in parallel processing is
the Message Passing Interface (MPI), which is used as the
medium of communication between processes. The MPI standard
was originally designed for writing applications and libraries
for distributed-memory environments.
However, MPI does provide message-passing routines for
exchanging all the information needed to allow a single MPI
implementation to operate in a heterogeneous environment [1].
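As a minimal sketch of this message-passing model (written for
illustration here, not taken from the paper), the fragment below
scatters pieces of a vector across MPI processes, lets each rank
compute a partial sum, and reduces the partial sums back onto rank 0.
The problem size N and the data values are assumptions chosen only to
keep the example self-contained.

/* Minimal sketch (not from the paper): the message-passing model in MPI.
 * Rank 0 scatters pieces of a vector, every rank computes a partial sum,
 * and MPI_Reduce combines the partial sums on rank 0. */
#include <mpi.h>
#include <stdio.h>

#define N 8   /* illustrative size, assumed divisible by the number of ranks */

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double v[N], local[N];
    int chunk = N / size;
    if (rank == 0)
        for (int i = 0; i < N; i++) v[i] = i + 1.0;   /* data lives on rank 0 only */

    /* Each rank receives its own chunk of the vector. */
    MPI_Scatter(v, chunk, MPI_DOUBLE, local, chunk, MPI_DOUBLE,
                0, MPI_COMM_WORLD);

    double partial = 0.0, total = 0.0;
    for (int i = 0; i < chunk; i++) partial += local[i];

    /* Combine the partial sums on rank 0. */
    MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("sum = %f\n", total);

    MPI_Finalize();
    return 0;
}

Compiled with mpicc and launched with mpirun, only explicit messages
(here the scatter and the reduction) move data between the separate
address spaces of the ranks.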
1.2 OpenMP:
OpenMP is an Application Program Interface (API) that may be
used to explicitly direct multi-threaded, shared memory
parallelism. It is a specification for a set of compiler directives,
library routines and environment variables [2]. The available
programming environment on most multi-core processors must
address thread-to-core affinity and the overheads of the OpenMP
programming environment.
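A shared-memory counterpart needs only a compiler directive on an
independent loop. The sketch below (illustrative, not the paper's code)
parallelizes the row loop of a matrix-vector product y = A*x; the matrix
order N, the static schedule, and the chunk size of 50 are assumptions
chosen to mirror the settings quoted later in Section 3.

/* Minimal OpenMP sketch (not the authors' code): y = A*x with the row
 * loop shared among threads. Rows are independent, so no synchronisation
 * is needed inside the loop. */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N 1000   /* illustrative matrix order */

int main(void) {
    double *A = malloc((size_t)N * N * sizeof(double));
    double *x = malloc(N * sizeof(double));
    double *y = malloc(N * sizeof(double));

    for (int i = 0; i < N; i++) {
        x[i] = 1.0;
        for (int j = 0; j < N; j++) A[i * N + j] = (double)(i + j);
    }

    /* Each thread takes chunks of 50 rows at a time. */
    #pragma omp parallel for schedule(static, 50)
    for (int i = 0; i < N; i++) {
        double sum = 0.0;
        for (int j = 0; j < N; j++) sum += A[i * N + j] * x[j];
        y[i] = sum;
    }

    printf("max threads: %d, y[0] = %f\n", omp_get_max_threads(), y[0]);
    free(A); free(x); free(y);
    return 0;
}

The thread count is normally controlled with OMP_NUM_THREADS or a
num_threads clause, which is where the thread-affinity and overhead
concerns mentioned above come into play.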
1.3 HYBRID:
Combining shared-memory and distributed-memory
programming models is an old idea [1]. One wants to exploit
the strengths of both models: the efficiency, memory savings,
and ease of programming of the shared-memory model and the
scalability of the distributed-memory model. Until recently, the
relevant models, languages, and libraries for shared-memory
and distributed-memory architectures have evolved separately,
with MPI becoming the dominant approach for the distributed
memory, or message-passing, model, and OpenMP [2, 3]
emerging as the dominant “high-level” approach for shared
memory with threads.
The idea of using OpenMP [3] threads to exploit the multiple
cores per node while using MPI to communicate among the
nodes appears obvious. Yet one can also use an “MPI
everywhere” approach on these architectures, and the data on
which approach is better is confusing and inconclusive. It
appears to be heavily dependent on the hardware, the MPI and
OpenMP implementations, and above all on the application and
the skill of the application writer.
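A minimal sketch of this hybrid pattern (an illustration, not the
authors' implementation) combines the two models: MPI scatters blocks
of rows of A across the ranks, and OpenMP threads share the local block
within each rank. The matrix order, the 2 threads per rank, and the
chunk size of 50 are assumptions echoing the configuration described in
Section 3, and N is assumed divisible by the number of ranks.

/* Hybrid MPI+OpenMP sketch (illustrative): y = A*x with rows distributed
 * over MPI ranks and the local rows shared among OpenMP threads. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N 1000   /* illustrative matrix order */

int main(int argc, char **argv) {
    int rank, size, provided;
    /* FUNNELED: only the master thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int rows = N / size;              /* rows owned by each rank */
    double *A = NULL, *y = NULL;
    double *x    = malloc(N * sizeof(double));
    double *Aloc = malloc((size_t)rows * N * sizeof(double));
    double *yloc = malloc(rows * sizeof(double));

    if (rank == 0) {                  /* rank 0 holds the full problem */
        A = malloc((size_t)N * N * sizeof(double));
        y = malloc(N * sizeof(double));
        for (int i = 0; i < N; i++) {
            x[i] = 1.0;
            for (int j = 0; j < N; j++) A[i * N + j] = (double)(i + j);
        }
    }

    /* MPI layer: broadcast x, scatter blocks of rows of A. */
    MPI_Bcast(x, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Scatter(A, rows * N, MPI_DOUBLE, Aloc, rows * N, MPI_DOUBLE,
                0, MPI_COMM_WORLD);

    /* OpenMP layer: threads share the local rows inside one rank. */
    #pragma omp parallel for num_threads(2) schedule(static, 50)
    for (int i = 0; i < rows; i++) {
        double sum = 0.0;
        for (int j = 0; j < N; j++) sum += Aloc[i * N + j] * x[j];
        yloc[i] = sum;
    }

    /* MPI layer again: collect the partial results on rank 0. */
    MPI_Gather(yloc, rows, MPI_DOUBLE, y, rows, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("y[0] = %f\n", y[0]);

    MPI_Finalize();
    return 0;
}

Launching, for example, one MPI rank per node with two threads per rank
keeps intra-node work in shared memory, so only inter-node traffic goes
through MPI.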
2. IMPLEMENTATION DETAILS
The Matrix product is the most commonly used type of product
of matrices. Matrices offer a concise way of representing linear
transformations between vector spaces, and matrix
multiplication corresponds to the composition of linear
transformations [4]. The matrix product of two matrices can be
defined when their entries belong to the same ring, and hence
can be added and multiplied [5], and, additionally, the number
of the columns of the first matrix matches the number of the
rows of the second matrix. The product of an m×p matrix A
with a p×n matrix B is an m×n matrix denoted AB whose
entries are

(AB)_{ij} = \sum_{k=1}^{p} A_{ik} B_{kj},

where 1 ≤ i ≤ m is the row index and 1 ≤ j ≤ n is the column
index.
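Translated directly into code, the definition above is a triple loop.
The short serial sketch below (illustrative only, with small hard-coded
matrices) computes C = AB for an m×p matrix A and a p×n matrix B.

/* Serial triple loop implementing (AB)_ij = sum_k A_ik * B_kj
 * for a 2x3 matrix A and a 3x2 matrix B (illustrative values). */
#include <stdio.h>

#define M 2
#define P 3
#define NCOLS 2

int main(void) {
    double A[M][P]     = {{1, 2, 3}, {4, 5, 6}};
    double B[P][NCOLS] = {{7, 8}, {9, 10}, {11, 12}};
    double C[M][NCOLS];

    for (int i = 0; i < M; i++)
        for (int j = 0; j < NCOLS; j++) {
            double sum = 0.0;
            for (int k = 0; k < P; k++)   /* sum over the shared index k */
                sum += A[i][k] * B[k][j];
            C[i][j] = sum;
        }

    for (int i = 0; i < M; i++) {         /* prints 58 64 / 139 154 */
        for (int j = 0; j < NCOLS; j++) printf("%6.1f ", C[i][j]);
        printf("\n");
    }
    return 0;
}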
This definition can be restated by postulating that the matrix
product is left and right distributive and that the matrix units
are multiplied according to the following rule:

E_{ik} E_{lj} = δ_{kl} E_{ij},

where the first factor is the m×n matrix with 1 at the
intersection of the ith row and the kth column and zeros
elsewhere, and the second factor is the n×p matrix with 1 at the
intersection of the lth row and the jth column and zeros
elsewhere.
In general, matrix multiplication is not commutative. More
precisely, AB and BA need not be simultaneously defined; if
they are, they may have different dimensions; and even if A and
B are square matrices of the same order n, so that AB and BA
are also square matrices of order n, then for n ≥ 2, AB need not
be equal to BA. For example,

E_{11} E_{12} = E_{12}, whereas E_{12} E_{11} = 0.
However, if A and B are both diagonal square matrices of the
same order then AB = BA.
Matrix multiplication is associative:

A(BC) = (AB)C.

Matrix multiplication is distributive over matrix addition:

A(B + C) = AB + AC,
(A + B)C = AC + BC,

provided that the expression on either side of each identity is
defined.
The matrix product is compatible with scalar multiplication:

c(AB) = (cA)B = A(cB),

where c is a scalar (for the second identity to hold, c must
belong to the center of the ground ring; this condition is
automatically satisfied if the ground ring is commutative, in
particular for matrices over a field).
If A and B are both n×n matrices with entries in a field, then
the determinant of their product is the product of their
determinants:

det(AB) = det(A) det(B).
In particular, the determinants of AB and BA coincide.
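As a concrete check of the non-commutativity and determinant
statements above (a worked 2×2 example added for illustration, not
taken from the paper):

A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \quad
B = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad
AB = \begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix}, \quad
BA = \begin{pmatrix} 3 & 4 \\ 1 & 2 \end{pmatrix},

so AB ≠ BA, while det(A) = -2, det(B) = -1, and
det(AB) = det(BA) = 2 = det(A) det(B).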
Let U, V, and W be vector spaces over the same field with
given bases, let S: V → W and T: U → V be linear
transformations, and let ST: U → W be their composition.
Suppose that A, B, and C are the matrices of T, S, and ST with
respect to the given bases. Then

BA = C.

Thus the matrix of the composition (or the product) of linear
transformations is the product of their matrices with respect to
the given bases.
Figure 2.1 illustrates the product of two matrices A and B,
showing how each intersection in the product matrix corresponds
to a row of A and a column of B. The size of the output matrix is
always the largest possible, i.e. for each row of A and for each
column of B there are always corresponding intersections in the
product matrix [6]. The product matrix AB consists of all
combinations of dot products of rows of A and columns of B.
Figure 2.1 The product of two matrices A & B.

The values at the intersections marked with circles are:

x_{1,2} = (a_{1,1}, a_{1,2}) · (b_{1,2}, b_{2,2}) = a_{1,1} b_{1,2} + a_{1,2} b_{2,2}

x_{3,3} = (a_{3,1}, a_{3,2}) · (b_{1,3}, b_{2,3}) = a_{3,1} b_{1,3} + a_{3,2} b_{2,3}
3. RESULTS
Performance of pure MPI versus hybrid (MPI+OpenMP) was
analysed using matrix multiplication, with MPI running (1+3)
tasks on the dual-core node and 2 tasks on each single-core
node; the same layout was used for the hybrid model. We use 2
threads per MPI task, a constant chunk size of 50, and 2 nodes,
as shown in Table 3.1 and Figure 3.1.
Table 3.1 Performance of MPI time vs HYBRID time on 2
nodes with matrix multiplication.

Matrix Size     MPI_Time (sec)    HYBRID_Time (sec)
100 * 100            0.1053            0.065180
200 * 200            1.60451           1.02532
400 * 400           12.451664          7.295252
600 * 600           27.401632         21.667435
1000 * 1000         81.102446         54.757333
The same comparison of pure MPI versus hybrid (MPI+OpenMP)
was repeated with the same task layout, using 2 threads per MPI
task, a constant chunk size of 50, and 2, 3, and 4 nodes, as
shown in Tables 3.2, 3.3, 3.4 and Figures 3.2, 3.3, 3.4.
Figure 3.1 Performance of MPI time Vs HYBRID time on 2
nodes with matrix multiplication.
Table 3.2 Performance of MPI time vs HYBRID time on 4
nodes with matrix multiplication.

Matrix Size     MPI_Time (sec)    HYBRID_Time (sec)
1000 * 1000         74.3216           53.6614
2000 * 2000        356.2699          271.5930
4000 * 4000       2130.9338         1697.5930
Figure 3.2 performance of MPI time Vs HYBRID time on 4
nodes with matrix multiplication.

Table 3.3 Performance of MPI time vs HYBRID time with
matrix multiplication for 1 to 4 nodes.

Node Number     MPI_Time (sec)    HYBRID_Time (sec)
1                  451.1239          311.9054
2                  367.7262          304.9193
3                  362.8631          284.3486
4                  356.2699          271.5930
Figure 3.3 Performance of MPI time vs HYBRID time with
matrix multiplication for 1 to 4 nodes.
Table 3.4 Performance of MPI time vs HYBRID time with
matrix multiplication for 1 to 4 nodes.

Node Number     MPI_Time (sec)    HYBRID_Time (sec)
1                   27.9311           20.9770
2                    1.60665           7.895560
3                    7.98376           6.174830
4                    6.56270           4.843306
Figure 3.4 Performance of MPI time vs HYBRID time with
matrix multiplication for 1 to 4 nodes.
As seen in Figures 3.1, 3.2, 3.3, and 3.4, the results obtained
with hybrid programming are better than those obtained with pure
MPI programming, because the load-balancing problem of MPI
is reduced by using OpenMP threads within the MPI processes.
4. CONCLUSION
This paper compares the performance of a program implemented
using MPI, OpenMP, and Hybrid (MPI+OpenMP). It is observed
that the Hybrid mixed-mode programming model gives better
performance than the pure MPI and OpenMP programming
models for the numbers of tasks and threads assigned to each
processor, and that it scales well.
Hence a combination of shared-memory and message-passing
parallelization paradigms within the same application (mixed-
mode programming) may provide a more efficient
parallelization strategy than pure MPI or OpenMP.
5. REFERENCES
[1] Message Passing Interface Forum, "MPI: A Message-Passing
    Interface Standard", June 1995. http://www.mpi-forum.org.
[2] OpenMP Architecture Review Board, "The OpenMP API".
    http://www.OpenMP.org/.
[3] D. Klepacki, "Mixed-mode programming", T. J. Watson
    Research Center presentations, IBM, 1999.
    http://www.research.ibm.com/actc/Talks/DavidKlepacki/MixedMode/htm.
[4] Henry Cohn, Robert Kleinberg, Balazs Szegedy, and Chris
    Umans, "Group-theoretic Algorithms for Matrix Multiplication",
    arXiv:math.GR/0511460, in Proceedings of the 46th Annual
    Symposium on Foundations of Computer Science, 23–25 October
    2005, Pittsburgh, PA, IEEE Computer Society, pp. 379–388.
[5] Horn, Roger A.; Johnson, Charles R. (1985), Matrix Analysis,
    Cambridge University Press, ISBN 978-0-521-38632-6.
[6] Ran Raz, "On the complexity of matrix product", in
    Proceedings of the Thirty-Fourth Annual ACM Symposium on
    Theory of Computing, ACM Press, 2002.
    doi:10.1145/509907.509932.
[7] P. R. C. Kent, R. Q. Hood, A. J. Williamson, R. J. Needs,
    W. M. C. Foulkes, and G. Rajagopal, "Finite-size errors in
    quantum many-body simulations of extended systems",
    Phys. Rev. B 59, pp. 1917–1929, 1999.
[8] P. Lanucara and S. Rovida, "Conjugate-Gradient algorithms:
    an MPI-OpenMP implementation on distributed shared memory
    systems", in Proceedings of the 1st European Workshop on
    OpenMP, Lund, Sweden, 1999.
[9] D. K. Tafti, "Computational power balancing: help for the
    overloaded processor".
    http://access.ncsa.uiuc.edu/Features/Load-Balancing/
Citations
Journal ArticleDOI
TL;DR: This methodology achieves higher execution speed than ATLAS state-of-the-art library by fully exploiting the combination of the software and hardware parameters which are considered simultaneously as one problem and not separately, giving a smaller search space and high-quality solutions.
Abstract: In this paper, a new methodology for computing the Dense Matrix Vector Multiplication, for both embedded (processors without SIMD unit) and general purpose processors (single and multi-core processors, with SIMD unit), is presented. This methodology achieves higher execution speed than the ATLAS state-of-the-art library (speedup from 12 up to 145). This is achieved by fully exploiting the combination of the software (e.g., data reuse) and hardware parameters (e.g., data cache associativity), which are considered simultaneously as one problem and not separately, giving a smaller search space and high-quality solutions. The proposed methodology produces a different schedule for different values of the (i) number of the levels of data cache; (ii) data cache sizes; (iii) data cache associativities; (iv) data cache and main memory latencies; (v) data array layout of the matrix and (vi) number of cores.

12 citations


Cites background from "Performance Analysis of MatrixVecto..."

  • ...Furthermore, [39] shows that the optimal performance can be achieved by combining distributed-memory and sharedmemory programming models (MPI and OpenMP approaches, respectively) instead of MPI only....

    [...]

Journal Article
TL;DR: For the product of two matrices over the real or complex numbers, a lower bound of Ω(m^(2 + 1/O(d))) was shown in this paper for arithmetic circuits of depth d.
Abstract: Our main result is a lower bound of $\Omega(m^2 \log m)$ for the size of any arithmetic circuit for the product of two matrices, over the real or complex numbers, as long as the circuit does not use products with field elements of absolute value larger than 1 (where m × m is the size of each matrix). That is, our lower bound is superlinear in the number of inputs and is applied for circuits that use addition gates, product gates, and products with field elements of absolute value up to 1. We also prove size-depth tradeoffs for such circuits: We show that if a circuit, as above, is of depth d, then its size is $\Omega(m^{2+ 1/O(d)})$.

7 citations

References
Book
01 Jan 1985
TL;DR: In this article, the authors present results of both classic and recent matrix analyses using canonical forms as a unifying theme, and demonstrate their importance in a variety of applications, such as linear algebra and matrix theory.
Abstract: Linear algebra and matrix theory are fundamental tools in mathematical and physical science, as well as fertile fields for research. This new edition of the acclaimed text presents results of both classic and recent matrix analyses using canonical forms as a unifying theme, and demonstrates their importance in a variety of applications. The authors have thoroughly revised, updated, and expanded on the first edition. The book opens with an extended summary of useful concepts and facts and includes numerous new topics and features, such as: - New sections on the singular value and CS decompositions - New applications of the Jordan canonical form - A new section on the Weyr canonical form - Expanded treatments of inverse problems and of block matrices - A central role for the Von Neumann trace theorem - A new appendix with a modern list of canonical forms for a pair of Hermitian matrices and for a symmetric-skew symmetric pair - Expanded index with more than 3,500 entries for easy reference - More than 1,100 problems and exercises, many with hints, to reinforce understanding and develop auxiliary themes such as finite-dimensional quantum systems, the compound and adjugate matrices, and the Loewner ellipsoid - A new appendix provides a collection of problem-solving hints.

23,986 citations

01 Apr 1994
TL;DR: This document contains all the technical features proposed for the interface and the goal of the Message Passing Interface, simply stated, is to develop a widely used standard for writing message-passing programs.
Abstract: The Message Passing Interface Forum (MPIF), with participation from over 40 organizations, has been meeting since November 1992 to discuss and define a set of library standards for message passing. MPIF is not sanctioned or supported by any official standards organization. The goal of the Message Passing Interface, simply stated, is to develop a widely used standard for writing message-passing programs. As such, the interface should establish a practical, portable, efficient and flexible standard for message passing. This is the final report, Version 1.0, of the Message Passing Interface Forum. This document contains all the technical features proposed for the interface. This copy of the draft was processed by LaTeX on April 21, 1994. Please send comments on MPI to mpi-comments@cs.utk.edu. Your comment will be forwarded to MPIF committee members who will attempt to respond.

3,181 citations

Proceedings ArticleDOI
23 Oct 2005
TL;DR: The group-theoretic approach to fast matrix multiplication introduced by Cohn and Umans is developed, and for the first time it is used to derive algorithms asymptotically faster than the standard algorithm.
Abstract: We further develop the group-theoretic approach to fast matrix multiplication introduced by Cohn and Umans, and for the first time use it to derive algorithms asymptotically faster than the standard algorithm. We describe several families of wreath product groups that achieve matrix multiplication exponent less than 3, the asymptotically fastest of which achieves exponent 2.41. We present two conjectures regarding specific improvements, one combinatorial and the other algebraic. Either one would imply that the exponent of matrix multiplication is 2.

182 citations


"Performance Analysis of MatrixVecto..." refers background or methods in this paper

  • ...Matrices offer a concise way of representing linear transformations between vector spaces, and matrix multiplication corresponds to the composition of linear transformations [4]....

    [...]

  • ...Computer graphics uses matrices to project 3-dimensional space onto a 2-dimensional screen [4]....

    [...]

Journal ArticleDOI
TL;DR: Williamson et al. as discussed by the authors introduced the model periodic Coulomb interaction, which greatly reduces the finite-size errors in quantum many-body simulations of extended systems using periodic boundary conditions, and demonstrated the practical application of their techniques with Hartree-Fock and variational and diffusion quantum Monte Carlo calculations for ground and excited-state calculations.
Abstract: Further developments are introduced in the theory of finite-size errors in quantum many-body simulations of extended systems using periodic boundary conditions. We show that our recently introduced model periodic Coulomb interaction [A. J. Williamson et al., Phys. Rev. B 55, R4851 (1997)] can be applied consistently to all Coulomb interactions in the system. The model periodic Coulomb interaction greatly reduces the finite-size errors in quantum many-body simulations. We illustrate the practical application of our techniques with Hartree-Fock and variational and diffusion quantum Monte Carlo calculations for ground- and excited-state calculations. We demonstrate that the finite-size effects in electron promotion and electron addition/subtraction excitation energy calculations are very similar.

98 citations

Proceedings ArticleDOI
19 May 2002
TL;DR: For any c = c(m) ≥ 1, a lower bound of Ω(m^2 log_{2c} m) is obtained for the size of any arithmetic circuit for the product of two matrices, as long as the circuit doesn't use products with field elements of absolute value larger than c.
Abstract: We prove a lower bound of Ω(m^2 log m) for the size of any arithmetic circuit for the product of two matrices, over the real or complex numbers, as long as the circuit doesn't use products with field elements of absolute value larger than 1 (where m×m is the size of each matrix). That is, our lower bound is super-linear in the number of inputs and is applied for circuits that use addition gates, product gates and products with field elements of absolute value up to 1. More generally, for any c = c(m) ≥ 1, we obtain a lower bound of Ω(m^2 log_{2c} m) for the size of any arithmetic circuit for the product of two matrices (over the real or complex numbers), as long as the circuit doesn't use products with field elements of absolute value larger than c. We also prove size-depth tradeoffs for such circuits.

95 citations

Frequently Asked Questions (1)
Q1. What are the contributions in "Performance analysis of matrix-vector multiplication in hybrid (mpi + openmp)" ?

In this paper, a matrix-vector multiplication algorithm is implemented in MPI, OpenMP, and hybrid (MPI + OpenMP) modes; the hybrid model combines both approaches to reduce their individual weaknesses and is shown to outperform the pure MPI and OpenMP approaches.