# Performance Analysis of Matrix-Vector Multiplication in Hybrid (MPI + OpenMP)

31 May 2011-International Journal of Computer Applications (Foundation of Computer Science (FCS))-Vol. 22, Iss: 5, pp 22-25

TL;DR: The proposed Hybrid model combines the MPI and OpenMP approaches in order to reduce the weaknesses of each, and the results indicate that the Hybrid approach outperforms both the pure MPI and the pure OpenMP approach.

Abstract: Computing multiple tasks simultaneously on multiple processors is called Parallel Computing. A parallel program consists of multiple active processes simultaneously solving a given problem. Parallel computers can be roughly classified as Multi-Processor and Multi-Core. In both classifications the hardware supports parallelism, with a computer node having multiple processing elements in a single machine, either on more than one distinct chip or in a single chip package, respectively. Parallel programming is the ability of a program to run on this infrastructure, which is still quite a difficult and complex task to achieve. Of the many approaches used in parallel environments, two are MPI and OpenMP, each with its own merits and demerits. The Hybrid model combines both approaches in the pursuit of reducing the weaknesses of each. The proposed approach takes a pair of matrices and produces another matrix by using the Matrix-Vector Multiplication Algorithm. The resulting matrix agrees with the result of composition of the linear transformations represented by the two original matrices. This algorithm is implemented in MPI, OpenMP, and Hybrid mode. The algorithm is tested on different numbers of nodes with different matrix sizes. The results indicate that the Hybrid approach outperforms the MPI and OpenMP approaches.

## Summary (2 min read)


### 1. INTRODUCTION

- Matrices are a key tool in linear algebra.
- For a square matrix, the determinant and inverse matrix (when it exists) govern the behavior of solutions to the corresponding system of linear equations, and eigenvalues and eigenvectors provide insight into the geometry of the associated linear transformation.
- The custom supercomputer of yesteryear has given way to commodity-based supercomputing, or what is now called High Performance Computing (HPC).
- Competitors that use HPC won’t talk much about it because it’s considered a competitive advantage [3].
- The main goal of writing a parallel program is to get better performance over the Serial version.

### 1.1 MPI:

- Message passing in parallel processing generally takes the form of the Message Passing Interface (MPI), which is used as the medium of communication.
- The Message Passing Interface (MPI) standard was originally designed for writing applications and libraries for distributed-memory environments.
- MPI does provide message-passing routines for exchanging all the information needed to allow a single MPI implementation to operate in a heterogeneous environment [1].

### 1.2 OpenMP:

- OpenMP is an Application Program Interface (API) that may be used to explicitly direct multi-threaded, shared memory parallelism.
- The programming environment available on most multi-core processors addresses thread-to-core affinity and the overheads of the OpenMP programming environment.

### 1.3 HYBRID:

- Combining shared-memory and distributed-memory programming models is an old idea [1].
- One wants to exploit the strengths of both models: the efficiency, memory savings, and ease of programming of the shared-memory model and the scalability of the distributed-memory model.
- The idea of using OpenMP [3] threads to exploit the multiple cores per node while using MPI to communicate among the nodes appears obvious.
- Yet one can also use an “MPI everywhere” approach on these architectures, and the data on which approach is better is confusing and inconclusive.
- It appears to be heavily dependent on the hardware, the MPI and OpenMP implementations, and above all on the application and the skill of the application writer.

### 2. IMPLEMENTATION DETAILS

- The Matrix product is the most commonly used type of product of matrices.
- Matrices offer a concise way of representing linear transformations between vector spaces, and matrix multiplication corresponds to the composition of linear transformations [4].
- The matrix product of two matrices can be defined when their entries belong to the same ring, and hence can be added and multiplied [5], and, additionally, the number of the columns of the first matrix matches the number of the rows of the second matrix.
- This definition can be restated by postulating that the matrix product is left and right distributive and the matrix units are multiplied according to the following rule: $E_{ik} E_{lj} = \delta_{kl} E_{ij}$,
- where the first factor is the m×p matrix with 1 at the intersection of the ith row and the kth column and zeros elsewhere, and the second factor is the p×n matrix with 1 at the intersection of the lth row and the jth column and zeros elsewhere.

### 3. RESULTS

- Performance analysis of pure MPI vs. HYBRID (MPI+OpenMP) using matrix multiplication, with MPI (1+3) tasks on dual core and 2 tasks on each single core, and the same for the hybrid model.
- As seen in Figures 3.1, 3.2, 3.3 and 3.4, Hybrid programming gives better results than MPI programming, because the load-balancing problem of MPI programming is reduced by the use of OpenMP threads within MPI communication.

### 4. CONCLUSION

- This paper compares the performance of a program implemented using MPI, OpenMP, and Hybrid (MPI+OpenMP).
- It is observed that the Hybrid mixed-mode programming model gives better performance than the MPI and OpenMP programming models for the number of tasks and threads assigned to each processor, and is scalable.
- Hence a combination of shared-memory and message-passing parallelization paradigms within the same application (mixed-mode programming) may provide a more efficient parallelization strategy than pure MPI and OpenMP.


International Journal of Computer Applications (0975 – 8887)

Volume 22– No.5, May 2011


Performance Analysis of Matrix-Vector Multiplication in Hybrid (MPI + OpenMP)

Vivek N. Waghmare, Sandip V. Kendre and Sanket G. Chordiya

Assistant Professor

Sandip Institute of Tech. & Research Centre, Nashik

Maharashtra (INDIA)

ABSTRACT

Computing multiple tasks simultaneously on multiple processors is called Parallel Computing. A parallel program consists of multiple active processes simultaneously solving a given problem. Parallel computers can be roughly classified as Multi-Processor and Multi-Core. In both classifications the hardware supports parallelism, with a computer node having multiple processing elements in a single machine, either on more than one distinct chip or in a single chip package, respectively. Parallel programming is the ability of a program to run on this infrastructure, which is still quite a difficult and complex task to achieve. Of the many approaches used in parallel environments, two are MPI and OpenMP, each with its own merits and demerits. The Hybrid model combines both approaches in the pursuit of reducing the weaknesses of each.

The proposed approach takes a pair of matrices and produces another matrix by using the Matrix-Vector Multiplication Algorithm. The resulting matrix agrees with the result of composition of the linear transformations represented by the two original matrices. This algorithm is implemented in MPI, OpenMP, and Hybrid mode. The algorithm is tested on different numbers of nodes with different matrix sizes. The results indicate that the Hybrid approach outperforms the MPI and OpenMP approaches.

Keywords: MPI, OpenMP, Hybrid (MPI+OpenMP), Matrix-Vector Multiplication Algorithm

1. INTRODUCTION

Matrices are a key tool in linear algebra. One use of matrices is to represent linear transformations, which are higher-dimensional analogs of linear functions of the form f(x) = cx, where c is a constant; matrix multiplication corresponds to composition of linear transformations. Matrices can also keep track of the coefficients in a system of linear equations [5]. For a square matrix, the determinant and inverse matrix (when it exists) govern the behavior of solutions to the corresponding system of linear equations, and eigenvalues and eigenvectors provide insight into the geometry of the associated linear transformation. Matrices find many applications. Physics makes use of matrices in various domains, for example in geometrical optics and matrix mechanics; the latter led to studying in more detail matrices with an infinite number of rows and columns. Graph theory uses matrices to keep track of distances between pairs of vertices in a graph. Computer graphics uses matrices to project 3-dimensional space onto a 2-dimensional screen [4]. Matrix calculus generalizes classical analytical notions such as derivatives of functions or exponentials to matrices. The latter is a recurring need in solving ordinary differential equations. Serialism and dodecaphonism are musical movements of the 20th century that use a square mathematical matrix to determine the pattern of music intervals.

Mention the word supercomputer to someone and they automatically think of monstrously complicated machines solving problems no one really understands. Maybe they think of flashing lights and some super intelligence that can beat humans at chess or figure out the meaning of life, the universe, and everything. Back in the day, this was not an altogether untrue view of supercomputing. With an entry fee of at least seven figures, supercomputing was for the serious scientists and engineers who needed to crunch numbers as fast as possible. Today we have a different world. The custom supercomputer of yesteryear has given way to commodity-based supercomputing, or what is now called High Performance Computing (HPC). In today’s HPC world, it is not uncommon for the supercomputer to use the same hardware found in Web servers and even desktop workstations.

The HPC world is now open to almost everyone because the cost of entry is at an all-time low. To many organizations, HPC is now considered an essential part of business success. Your competition may be using HPC right now. They won’t talk much about it because it’s considered a competitive advantage [3]. Of one thing you can be sure, however; they’re designing new products, optimizing manufacturing and delivery processes, solving production problems, mining data, and simulating everything from business process to shipping crates all in an effort to become more competitive, profitable, and “green”. HPC may very well be the new secret weapon. The main goal of writing a parallel program is to get better performance over the serial version. With this in mind, there are several issues that one needs to consider when designing the parallel code to obtain the best performance possible within the constraints of the problem being solved.

1.1 MPI:

Message passing in parallel processing generally takes the form of the Message Passing Interface (MPI), which is used as the medium of communication. The MPI standard was originally designed for writing applications and libraries for distributed-memory environments. However, MPI does provide message-passing routines for exchanging all the information needed to allow a single MPI implementation to operate in a heterogeneous environment [1].

1.2 OpenMP:

OpenMP is an Application Program Interface (API) that may be used to explicitly direct multi-threaded, shared-memory parallelism. It is a specification for a set of compiler directives, library routines and environment variables [2]. The programming environment available on most multi-core processors addresses thread-to-core affinity and the overheads of the OpenMP programming environment.

1.3 HYBRID:

Combining shared-memory and distributed-memory programming models is an old idea [1]. One wants to exploit the strengths of both models: the efficiency, memory savings, and ease of programming of the shared-memory model and the scalability of the distributed-memory model. Until recently, the relevant models, languages, and libraries for shared-memory and distributed-memory architectures have evolved separately, with MPI becoming the dominant approach for the distributed-memory, or message-passing, model, and OpenMP [2, 3] emerging as the dominant “high-level” approach for shared memory with threads.

The idea of using OpenMP [3] threads to exploit the multiple cores per node while using MPI to communicate among the nodes appears obvious. Yet one can also use an “MPI everywhere” approach on these architectures, and the data on which approach is better is confusing and inconclusive. It appears to be heavily dependent on the hardware, the MPI and OpenMP implementations, and above all on the application and the skill of the application writer.
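The two-level split described above (MPI between nodes, OpenMP threads within a node) can be illustrated with a small Python sketch. It is not the authors' code, which the paper does not list. The function names `block_ranges`, `static_chunks` and `hybrid_plan` are hypothetical, and both the contiguous row-block distribution across ranks and the round-robin chunk assignment (modelled on an OpenMP-style `schedule(static, chunk)`) are assumptions chosen for illustration. The sketch only computes which matrix rows each (rank, thread) pair would own; no MPI or OpenMP runtime is involved.

```python
def block_ranges(n_rows, n_parts):
    """Split range(n_rows) into n_parts contiguous blocks whose sizes differ by at most 1."""
    base, extra = divmod(n_rows, n_parts)
    ranges, start = [], 0
    for p in range(n_parts):
        size = base + (1 if p < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def static_chunks(rows, n_threads, chunk):
    """Deal fixed-size chunks of `rows` to threads round-robin
    (assumed to mimic an OpenMP-style schedule(static, chunk))."""
    per_thread = [[] for _ in range(n_threads)]
    for c, start in enumerate(range(0, len(rows), chunk)):
        per_thread[c % n_threads].extend(rows[start:start + chunk])
    return per_thread


def hybrid_plan(n_rows, n_ranks, n_threads, chunk):
    """Return plan[rank][thread] = list of row indices owned by that thread."""
    return [static_chunks(list(r), n_threads, chunk)
            for r in block_ranges(n_rows, n_ranks)]


# Illustrative sizes only: 400 rows, 2 ranks, 2 threads per rank, chunk of 50 rows.
plan = hybrid_plan(n_rows=400, n_ranks=2, n_threads=2, chunk=50)
for rank, threads in enumerate(plan):
    for tid, rows in enumerate(threads):
        print(f"rank {rank} thread {tid}: {len(rows)} rows, first row {rows[0]}")
```

In a real hybrid program the outer level would correspond to MPI processes and the inner level to threads in a parallel loop; the point here is only that every row is owned by exactly one (rank, thread) pair.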

2. IMPLEMENTATION DETAILS

The matrix product is the most commonly used type of product of matrices. Matrices offer a concise way of representing linear transformations between vector spaces, and matrix multiplication corresponds to the composition of linear transformations [4]. The matrix product of two matrices can be defined when their entries belong to the same ring, and hence can be added and multiplied [5], and, additionally, the number of columns of the first matrix matches the number of rows of the second matrix. The product of an m×p matrix A with a p×n matrix B is an m×n matrix denoted AB whose entries are

$$(AB)_{i,j} = \sum_{k=1}^{p} A_{ik} B_{kj}$$

where 1 ≤ i ≤ m is the row index and 1 ≤ j ≤ n is the column index.
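As a minimal sketch of this definition (not the implementation benchmarked in the paper), the entry formula translates directly into three nested loops. The function name `matmul` and the sample matrices are illustrative choices, not taken from the source.

```python
def matmul(A, B):
    """Product of an m x p matrix A and a p x n matrix B, both given as lists of rows."""
    m, p = len(A), len(A[0])
    if len(B) != p:
        raise ValueError("columns of A must match rows of B")
    n = len(B[0])
    C = [[0] * n for _ in range(m)]
    for i in range(m):          # row index of the result
        for j in range(n):      # column index of the result
            # (AB)_{i,j} = sum over k of A_{ik} * B_{kj}
            C[i][j] = sum(A[i][k] * B[k][j] for k in range(p))
    return C


A = [[1, 2], [3, 4], [5, 6]]        # 3 x 2
B = [[7, 8, 9], [10, 11, 12]]       # 2 x 3
print(matmul(A, B))  # [[27, 30, 33], [61, 68, 75], [95, 106, 117]]
```

Each result row depends only on the matching row of A and all of B, which is why the outer loop over `i` is the usual candidate for distribution across processes or threads.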

This definition can be restated by postulating that the matrix product is left and right distributive and the matrix units are multiplied according to the following rule:

$$E_{ik} E_{lj} = \delta_{kl} E_{ij}$$

where the first factor is the m×p matrix with 1 at the intersection of the ith row and the kth column and zeros elsewhere, and the second factor is the p×n matrix with 1 at the intersection of the lth row and the jth column and zeros elsewhere.

In general, matrix multiplication is not commutative. More precisely, AB and BA need not be simultaneously defined; if they are, they may have different dimensions; and even if A and B are square matrices of the same order n, so that AB and BA are also square matrices of order n, if n is greater than or equal to 2, AB need not be equal to BA. For example,

$$E_{11} E_{12} = E_{12}, \quad \text{whereas} \quad E_{12} E_{11} = 0.$$

However, if A and B are both diagonal square matrices of the same order then AB = BA.
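The matrix-unit example can be checked numerically. The following sketch assumes 2×2 matrix units; the helpers `matmul` and `unit` and the two diagonal matrices are hypothetical names and values introduced only for this illustration.

```python
def matmul(A, B):
    """Matrix product of two lists-of-rows with compatible shapes."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def unit(i, j, n=2):
    """n x n matrix unit E_ij (1-based indices): 1 at row i, column j, zeros elsewhere."""
    return [[1 if (r, c) == (i - 1, j - 1) else 0 for c in range(n)] for r in range(n)]


E11, E12 = unit(1, 1), unit(1, 2)
print(matmul(E11, E12))  # [[0, 1], [0, 0]], which is E12
print(matmul(E12, E11))  # [[0, 0], [0, 0]], the zero matrix

# Two diagonal matrices of the same order do commute.
D1, D2 = [[2, 0], [0, 3]], [[5, 0], [0, 7]]
print(matmul(D1, D2) == matmul(D2, D1))  # True
```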

- Matrix multiplication is associative:

  $$A(BC) = (AB)C$$

- Matrix multiplication is distributive over matrix addition:

  $$A(B + C) = AB + AC, \qquad (A + B)C = AC + BC,$$

  provided that the expression on either side of each identity is defined.

- The matrix product is compatible with scalar multiplication:

  $$c(AB) = (cA)B = A(cB),$$

  where c is a scalar (for the second identity to hold, c must belong to the center of the ground ring; this condition is automatically satisfied if the ground ring is commutative, in particular for matrices over a field).

- If A and B are both n×n matrices with entries in a field, then the determinant of their product is the product of their determinants:

  $$\det(AB) = \det(A)\det(B)$$

  In particular, the determinants of AB and BA coincide.

- Let U, V, and W be vector spaces over the same field with certain bases, let S: V → W and T: U → V be linear transformations, and let ST: U → W be their composition. Suppose that A, B, and C are the matrices of T, S, and ST with respect to the given bases. Then

  $$AB = C$$

  Thus the matrix of the composition (or the product) of linear transformations is the product of their matrices with respect to the given bases.
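The algebraic identities listed above can be spot-checked on small integer matrices. This is only a numerical illustration on one hand-picked example, not a proof; the helper names (`matmul`, `add`, `scale`, `det2`) and the sample matrices are assumptions made for the sketch.

```python
def matmul(A, B):
    """Matrix product of two lists-of-rows with compatible shapes."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def add(A, B):
    """Entrywise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def scale(c, A):
    """Multiply every entry of A by the scalar c."""
    return [[c * a for a in row] for row in A]


def det2(A):
    """Determinant of a 2 x 2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -2]]
C = [[2, 0], [1, 3]]
c = 3

print(matmul(A, matmul(B, C)) == matmul(matmul(A, B), C))           # associativity
print(matmul(A, add(B, C)) == add(matmul(A, B), matmul(A, C)))      # left distributivity
print(matmul(add(A, B), C) == add(matmul(A, C), matmul(B, C)))      # right distributivity
print(scale(c, matmul(A, B)) == matmul(scale(c, A), B) == matmul(A, scale(c, B)))
print(det2(matmul(A, B)), det2(A) * det2(B), det2(matmul(B, A)))    # 10 10 10
```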

Figure 2.1 illustrates the product of two matrices A and B, showing how each intersection in the product matrix corresponds to a row of A and a column of B. The size of the output matrix is always the largest possible, i.e. for each row of A and for each column of B there are always corresponding intersections in the product matrix [6]. The product matrix AB consists of all combinations of dot products of rows of A and columns of B.

Figure 2.1 the product of two Matrices A & B.

The values at the intersections marked with circles are:

$$x_{1,2} = (a_{1,1}, a_{1,2}) \cdot (b_{1,2}, b_{2,2}) = a_{1,1} b_{1,2} + a_{1,2} b_{2,2}$$

$$x_{3,3} = (a_{3,1}, a_{3,2}) \cdot (b_{1,3}, b_{2,3}) = a_{3,1} b_{1,3} + a_{3,2} b_{2,3}$$

3. RESULTS

Performance analysis of pure MPI vs. HYBRID (MPI+OpenMP) using matrix multiplication, with MPI (1+3) tasks on dual core and 2 tasks on each single core, and the same for the hybrid model. We use 2 threads and chunk = 50, with a constant number of nodes = 2, as shown in Table 3.1 and Figure 3.1.

Table 3.1 Performance of MPI time vs. HYBRID time on 2 nodes with matrix multiplication.

| Matrix Size | MPI_Time (Sec) | HYBRID_Time (Sec) |
|---|---|---|
| 100 * 100 | 0.1053 | 0.065180 |
| 200 * 200 | 1.60451 | 1.02532 |
| 400 * 400 | 12.451664 | 7.295252 |
| 600 * 600 | 27.401632 | 21.667435 |
| 1000 * 1000 | 81.102446 | 54.757333 |
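To make the comparison in Table 3.1 easier to read, the ratio of MPI time to HYBRID time can be computed for each matrix size. The timings below are copied from the table; the ratio itself is a derived quantity that the paper does not report, and the name `table_3_1` is just an illustrative label.

```python
# Matrix order -> (MPI time in s, HYBRID time in s), copied from Table 3.1.
table_3_1 = {
    100: (0.1053, 0.065180),
    200: (1.60451, 1.02532),
    400: (12.451664, 7.295252),
    600: (27.401632, 21.667435),
    1000: (81.102446, 54.757333),
}

# A ratio above 1 means the hybrid run was faster than the pure MPI run.
ratios = {n: mpi / hybrid for n, (mpi, hybrid) in table_3_1.items()}

for n, r in ratios.items():
    print(f"{n} x {n}: MPI time / HYBRID time = {r:.2f}")
```

On these figures the hybrid run is faster for every listed size, with the ratio varying between sizes rather than growing steadily.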

Performance analysis of pure MPI vs. HYBRID (MPI+OpenMP) using matrix multiplication, with MPI (1+3) tasks on dual core and 2 tasks on each single core, and the same for the hybrid model. We use 2 threads and chunk = 50, with a constant number of nodes = 2, 3, 4, as shown in Tables 3.2, 3.3, 3.4 and Figures 3.2, 3.3, 3.4.

Figure 3.1 Performance of MPI time vs. HYBRID time on 2 nodes with matrix multiplication.

Table 3.2 Performance of MPI time vs. HYBRID time on 4 nodes with matrix multiplication.

| Matrix Size | MPI_Time (Sec) | HYBRID_Time (Sec) |
|---|---|---|
| 1000 * 1000 | 74.3216 | 53.6614 |
| 2000 * 2000 | 356.2699 | 271.5930 |
| 4000 * 4000 | 2130.9338 | 1697.5930 |

Figure 3.2 Performance of MPI time vs. HYBRID time on 4 nodes with matrix multiplication.

Table 3.3 Performance of MPI time vs. HYBRID time on 4 nodes with matrix multiplication.

| Node_Number | MPI_Time (Sec) | HYBRID_Time (Sec) |
|---|---|---|
| 1 | 451.1239 | 311.9054 |
| 2 | 367.7262 | 304.9193 |
| 3 | 362.8631 | 284.3486 |
| 4 | 356.2699 | 271.5930 |
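One way to read Table 3.3 is through speedup and parallel efficiency relative to the one-node run. These are standard derived metrics, but the paper does not report them, so the sketch below is an added interpretation under that assumption. The timings are copied from the table; the function name `scaling` is hypothetical.

```python
# Node count -> time in seconds, copied from Table 3.3.
mpi_times = {1: 451.1239, 2: 367.7262, 3: 362.8631, 4: 356.2699}
hybrid_times = {1: 311.9054, 2: 304.9193, 3: 284.3486, 4: 271.5930}


def scaling(times):
    """Return {nodes: (speedup, efficiency)} relative to the 1-node time.

    speedup    = T(1) / T(n)
    efficiency = speedup / n
    """
    t1 = times[1]
    return {n: (t1 / t, t1 / (t * n)) for n, t in times.items()}


for label, times in (("MPI", mpi_times), ("HYBRID", hybrid_times)):
    for n, (speedup, efficiency) in scaling(times).items():
        print(f"{label:6s} nodes={n}: speedup={speedup:.2f}, efficiency={efficiency:.2f}")
```

Read this way, the hybrid version has the lower absolute time at every node count, while the gain from adding nodes is modest for both versions in this table.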

Figure 3.3 Performance of MPI time vs. HYBRID time on 4 nodes with matrix multiplication.

Table 3.4 Performance of MPI time vs. HYBRID time on 4 nodes with matrix multiplication.

| Node_Number | MPI_Time (Sec) | HYBRID_Time (Sec) |
|---|---|---|
| 1 | 27.9311 | 20.9770 |
| 2 | 1.60665 | 7.895560 |
| 3 | 7.98376 | 6.174830 |
| 4 | 6.56270 | 4.843306 |

Figure 3.4 Performance of MPI time vs. HYBRID time on 4 nodes with matrix multiplication.

As seen in Figures 3.1, 3.2, 3.3 and 3.4, Hybrid programming gives better results than MPI programming, because the load-balancing problem of MPI programming is reduced by the use of OpenMP threads within MPI communication.

4. CONCLUSION

This paper compares the performance of a program implemented using MPI, OpenMP, and Hybrid (MPI+OpenMP). It is observed that the Hybrid mixed-mode programming model gives better performance than the MPI and OpenMP programming models for the number of tasks and threads assigned to each processor, and is scalable.

Hence a combination of shared-memory and message-passing parallelization paradigms within the same application (mixed-mode programming) may provide a more efficient parallelization strategy than pure MPI and OpenMP.

5. REFERENCES

[1] MPI, “MPI: A Message-Passing Interface Standard”, Message Passing Interface Forum, June 1995. http://www.mpi-forum.org.

[2] OpenMP, The OpenMP ARB. http://www.OpenMP.org/.

[3] D. Klepacki, “Mixed-mode programming”, T.J. Watson Research Center presentations, IBM, 1999. http://www.research.ibm.com/actc/Talks/DavidKlepacki/MixedMode/htm.

[4] Henry Cohn, Robert Kleinberg, Balazs Szegedy, and Chris Umans, “Group-theoretic Algorithms for Matrix Multiplication”, arXiv:math.GR/0511460. Proceedings of the 46th Annual Symposium on Foundations of Computer Science, 23–25 October 2005, Pittsburgh, PA, IEEE Computer Society, pp. 379–388.

[5] Roger A. Horn and Charles R. Johnson, Matrix Analysis, Cambridge University Press, 1985, ISBN 978-0-521-38632-6.

[6] Ran Raz, “On the complexity of matrix product”, in Proceedings of the thirty-fourth annual ACM symposium on Theory of computing, ACM Press, 2002. doi:10.1145/509907.509932.

[7] P.R.C. Kent, R.Q. Hood, A.J. Williamson, R.J. Needs, W.M.C. Foulkes, and G. Rajagopal, “Finite-size errors in quantum many-body simulations of extended systems”, Phys. Rev. B 59, pp. 1917-1929, 1999.

[8] P. Lanucara and S. Rovida, “Conjugate-Gradient algorithms: An MPI-OpenMP implementation on distributed shared memory systems”, Proceedings of the 1st European Workshop on OpenMP, Lund, Sweden, 1999.

[9] D.K. Tafti, “Computational power balancing”, Help for the overloaded processor. http://access.ncsa.uiuc.edu/Features/Load-Balancing/

##### Citations


TL;DR: This methodology achieves higher execution speed than ATLAS state-of-the-art library by fully exploiting the combination of the software and hardware parameters which are considered simultaneously as one problem and not separately, giving a smaller search space and high-quality solutions.

Abstract: In this paper, a new methodology for computing the Dense Matrix Vector Multiplication, for both embedded (processors without SIMD unit) and general purpose processors (single and multi-core processors, with SIMD unit), is presented. This methodology achieves higher execution speed than ATLAS state-of-the-art library (speedup from 12 up to 145). This is achieved by fully exploiting the combination of the software (e.g., data reuse) and hardware parameters (e.g., data cache associativity) which are considered simultaneously as one problem and not separately, giving a smaller search space and high-quality solutions. The proposed methodology produces a different schedule for different values of the (i) number of the levels of data cache; (ii) data cache sizes; (iii) data cache associativities; (iv) data cache and main memory latencies; (v) data array layout of the matrix and (vi) number of cores.

12 citations

### Cites background from "Performance Analysis of MatrixVecto..."

...Furthermore, [39] shows that the optimal performance can be achieved by combining distributed-memory and sharedmemory programming models (MPI and OpenMP approaches, respectively) instead of MPI only....


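TL;DR: For the product of two matrices over the real or complex numbers, a lower bound of $\Omega(m^2 \log m)$ on arithmetic circuit size is shown, together with a size-depth tradeoff of $\Omega(m^{2+1/O(d)})$ for circuits of depth d.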

Abstract: Our main result is a lower bound of $\Omega(m^2 \log m)$ for the size of any arithmetic circuit for the product of two matrices, over the real or complex numbers, as long as the circuit does not use products with field elements of absolute value larger than 1 (where m × m is the size of each matrix). That is, our lower bound is superlinear in the number of inputs and is applied for circuits that use addition gates, product gates, and products with field elements of absolute value up to 1.
We also prove size-depth tradeoffs for such circuits: We show that if a circuit, as above, is of depth d, then its size is $\Omega(m^{2+ 1/O(d)})$.

7 citations

##### References


TL;DR: In this article, the authors present results of both classic and recent matrix analyses using canonical forms as a unifying theme, and demonstrate their importance in a variety of applications, such as linear algebra and matrix theory.

Abstract: Linear algebra and matrix theory are fundamental tools in mathematical and physical science, as well as fertile fields for research. This new edition of the acclaimed text presents results of both classic and recent matrix analyses using canonical forms as a unifying theme, and demonstrates their importance in a variety of applications. The authors have thoroughly revised, updated, and expanded on the first edition. The book opens with an extended summary of useful concepts and facts and includes numerous new topics and features, such as: - New sections on the singular value and CS decompositions - New applications of the Jordan canonical form - A new section on the Weyr canonical form - Expanded treatments of inverse problems and of block matrices - A central role for the Von Neumann trace theorem - A new appendix with a modern list of canonical forms for a pair of Hermitian matrices and for a symmetric-skew symmetric pair - Expanded index with more than 3,500 entries for easy reference - More than 1,100 problems and exercises, many with hints, to reinforce understanding and develop auxiliary themes such as finite-dimensional quantum systems, the compound and adjugate matrices, and the Loewner ellipsoid - A new appendix provides a collection of problem-solving hints.

23,986 citations

01 Apr 1994

TL;DR: This document contains all the technical features proposed for the interface and the goal of the Message Passing Interface, simply stated, is to develop a widely used standard for writing message-passing programs.

Abstract: The Message Passing Interface Forum (MPIF), with participation from over 40 organizations, has been meeting since November 1992 to discuss and define a set of library standards for message passing. MPIF is not sanctioned or supported by any official standards organization. The goal of the Message Passing Interface, simply stated, is to develop a widely used standard for writing message-passing programs. As such the interface should establish a practical, portable, efficient and flexible standard for message passing. This is the final report, Version 1.0, of the Message Passing Interface Forum. This document contains all the technical features proposed for the interface. This copy of the draft was processed by LaTeX on April 21, 1994. Please send comments on MPI to mpi-comments@cs.utk.edu. Your comment will be forwarded to MPIF committee members who will attempt to respond.

3,181 citations

TL;DR: The group-theoretic approach to fast matrix multiplication introduced by Cohn and Umans is developed, and for the first time it is used to derive algorithms asymptotically faster than the standard algorithm.

Abstract: We further develop the group-theoretic approach to fast matrix multiplication introduced by Cohn and Umans, and for the first time use it to derive algorithms asymptotically faster than the standard algorithm. We describe several families of wreath product groups that achieve matrix multiplication exponent less than 3, the asymptotically fastest of which achieves exponent 2.41. We present two conjectures regarding specific improvements, one combinatorial and the other algebraic. Either one would imply that the exponent of matrix multiplication is 2.

182 citations

### "Performance Analysis of MatrixVecto..." refers background or methods in this paper

...Matrices offer a concise way of representing linear transformations between vector spaces, and matrix multiplication corresponds to the composition of linear transformations [4]....

[...]

...Computer graphics uses matrices to project 3-dimensional space onto a 2-dimensional screen [4]....


TL;DR: Williamson et al. as discussed by the authors introduced the model periodic Coulomb interaction, which greatly reduces the finite-size errors in quantum many-body simulations of extended systems using periodic boundary conditions, and demonstrated the practical application of their techniques with Hartree-Fock and variational and diffusion quantum Monte Carlo calculations for ground and excited-state calculations.

Abstract: Further developments are introduced in the theory of finite-size errors in quantum many-body simulations of extended systems using periodic boundary conditions. We show that our recently introduced model periodic Coulomb interaction [A. J. Williamson et al., Phys. Rev. B 55, R4851 (1997)] can be applied consistently to all Coulomb interactions in the system. The model periodic Coulomb interaction greatly reduces the finite-size errors in quantum many-body simulations. We illustrate the practical application of our techniques with Hartree-Fock and variational and diffusion quantum Monte Carlo calculations for ground- and excited-state calculations. We demonstrate that the finite-size effects in electron promotion and electron addition/subtraction excitation energy calculations are very similar.

98 citations

19 May 2002

TL;DR: For any $c = c(m) \ge 1$, a lower bound of $\Omega(m^2 \log_{2c} m)$ is obtained for the size of any arithmetic circuit for the product of two matrices, as long as the circuit doesn't use products with field elements of absolute value larger than c.

Abstract: We prove a lower bound of $\Omega(m^2 \log m)$ for the size of any arithmetic circuit for the product of two matrices, over the real or complex numbers, as long as the circuit doesn't use products with field elements of absolute value larger than 1 (where m × m is the size of each matrix). That is, our lower bound is super-linear in the number of inputs and is applied for circuits that use addition gates, product gates and products with field elements of absolute value up to 1. More generally, for any $c = c(m) \ge 1$, we obtain a lower bound of $\Omega(m^2 \log_{2c} m)$ for the size of any arithmetic circuit for the product of two matrices (over the real or complex numbers), as long as the circuit doesn't use products with field elements of absolute value larger than c. We also prove size-depth tradeoffs for such circuits.

95 citations