AltiVec extension to PowerPC accelerates media processing

doi:10.1109/40.848475

Journal Article•DOI•

K. Diefendorff, Pradeep Dubey, R. Hochsprung, H. Scales

01 Mar 2000-IEEE Micro (IEEE Computer Society Press)-Vol. 20, Iss: 2, pp 85-95

TL;DR: PowerPC's AltiVec speeds not only media processing but also nearly any application in which data parallelism exists, as demonstrated by a cycle-accurate simulation of Motorola's MPC 7400, the heart of Apple G4 systems.

Abstract: There is a clear trend in personal computing toward multimedia-rich applications. These applications will incorporate a wide variety of multimedia technologies, including audio and video compression, 2D image processing, 3D graphics, speech and handwriting recognition, media mining, and narrow/broadband signal processing for communication. In response to this demand, major microprocessor vendors have announced architectural extensions to their general-purpose processors in an effort to improve their multimedia performance. Intel extended IA-32 with MMX and SSE (alias KNI), Sun enhanced Sparc with VIS, Hewlett-Packard added MAX to its PA-RISC architecture, Silicon Graphics extended the MIPS architecture with MDMX, and Digital (now Compaq) added MVI to Alpha. This article describes the most recent, and what we believe to be the most comprehensive, addition to this list: PowerPC's AltiVec. AltiVec speeds not only media processing but also nearly any application in which data parallelism exists, as demonstrated by a cycle-accurate simulation of Motorola's MPC 7400, the heart of Apple G4 systems.

Citations

The RISC-V Instruction Set Manual

Andrew Waterman, Yunsup Lee, David A. Patterson, Krste Asanović

01 Jan 2014

TL;DR: This draft specification may change before being accepted as standard by the RISC-V Foundation, and it remains possible that implementations made to this draft specification will not conform to the future standard.

Abstract: Volume II: Privileged Architecture, Version 1.10 (Document Version 1.10). Warning! This draft specification may change before being accepted as standard by the RISC-V Foundation. While the editors intend future changes to this specification to be forward compatible, it remains possible that implementations made to this draft specification will not conform to the future standard.

583 citations

Journal Article•DOI•

Synergistic Processing in Cell's Multicore Architecture

Michael K. Gschwind¹, Harm Peter Hofstee¹, Brian Flachs¹, M. Hopkins¹, Y. Watanabe², Takeshi Yamazaki³•Institutions (3)

IBM¹, Toshiba², Sony Computer Entertainment³

01 Mar 2006-IEEE Micro

TL;DR: The streamlined architecture provides an efficient multithreaded execution environment for both scalar and SIMD threads and represents a reaffirmation of the RISC principles of combining leading edge architecture and compiler optimizations.

Abstract: Eight synergistic processor units enable the Cell Broadband Engine's breakthrough performance. The SPU architecture implements a novel, pervasively data-parallel architecture combining scalar and SIMD processing on a wide data path. A large number of SPUs per chip provide high thread-level parallelism. The streamlined architecture provides an efficient multithreaded execution environment for both scalar and SIMD threads and represents a reaffirmation of the RISC principles of combining leading edge architecture and compiler optimizations. These design decisions have enabled the Cell BE to deliver unprecedented supercomputer-class compute power for consumer applications.

463 citations

Proceedings Article•DOI•

Conditional random fields for activity recognition

Douglas L. Vail¹, Manuela Veloso¹, John Lafferty¹•Institutions (1)

Carnegie Mellon University¹

14 May 2007

TL;DR: It is found that the discriminatively trained CRF performs as well as or better than an HMM even when the model features do not violate the independence assumptions of the HMM, and it is confirmed that CRFs are robust against any degradation in performance.

Abstract: Activity recognition is a key component for creating intelligent, multi-agent systems. Intrinsically, activity recognition is a temporal classification problem. In this paper, we compare two models for temporal classification: hidden Markov models (HMMs), which have long been applied to the activity recognition problem, and conditional random fields (CRFs). CRFs are discriminative models for labeling sequences. They condition on the entire observation sequence, which avoids the need for independence assumptions between observations. Conditioning on the observations vastly expands the set of features that can be incorporated into the model without violating its assumptions. Using data from a simulated robot tag domain, chosen because it is multi-agent and produces complex interactions between observations, we explore the differences in performance between the discriminatively trained CRF and the generative HMM. Additionally, we examine the effect of incorporating features which violate independence assumptions between observations; such features are typically necessary for high classification accuracy. We find that the discriminatively trained CRF performs as well as or better than an HMM even when the model features do not violate the independence assumptions of the HMM. In cases where features depend on observations from many time steps, we confirm that CRFs are robust against any degradation in performance.

377 citations

Cites methods from "AltiVec extension to PowerPC accele..."

...field heuristic was carefully optimized using a profiler and uses single-instruction-multiple-data (SIMD) vector instructions from either the AltiVec [22] or SSE [93] CPU extensions to compute the approximate gains....

Book Chapter•DOI•

Efficient Rijndael Encryption Implementation with Composite Field Arithmetic

Atri Rudra, Pradeep Dubey, C. S. Jutla¹, Vijay Kumar, Josyula R. Rao¹, Pankaj Rohatgi¹•Institutions (1)

IBM¹

14 May 2001

TL;DR: This work explores the use of subfield arithmetic for efficient implementations of Galois Field arithmetic especially in the context of the Rijndael block cipher and describes how to select a representation which minimizes the computation cost of the relevant arithmetic.

Abstract: We explore the use of subfield arithmetic for efficient implementations of Galois Field arithmetic especially in the context of the Rijndael block cipher. Our technique involves mapping field elements to a composite field representation. We describe how to select a representation which minimizes the computation cost of the relevant arithmetic, taking into account the cost of the mapping as well. Our method results in a very compact and fast gate circuit for Rijndael encryption. In conjunction with bit-slicing techniques applied to newly proposed parallelizable modes of operation, our circuit leads to a high-performance software implementation for Rijndael encryption which offers significant speedup compared to previously reported implementations.

290 citations

Journal Article•DOI•

SODA: A Low-power Architecture For Software Radio

Yuan Lin¹, Hyunseok Lee¹, Mark Woh¹, Yoav Harel¹, Scott Mahlke¹, Trevor Mudge¹, Chaitali Chakrabarti², Krisztian Flautner•Institutions (2)

University of Michigan¹, Arizona State University²

01 May 2006

TL;DR: This paper presents a design study for a fully programmable architecture, SODA, that supports software defined radio - a high-end signal processing application and shows that a four processor design is capable of meeting the throughput requirements of the W-CDMA and 802.11a protocols, while operating within the strict power constraints of a mobile terminal.

Abstract: The physical layer of most wireless protocols is traditionally implemented in custom hardware to satisfy the heavy computational requirements while keeping power consumption to a minimum. These implementations are time consuming to design and difficult to verify. A programmable hardware platform capable of supporting software implementations of the physical layer, or software defined radio, has a number of advantages. These include support for multiple protocols, faster time-to-market, higher chip volumes, and support for late implementation changes. The challenge is to achieve this without sacrificing power. In this paper, we present a design study for a fully programmable architecture, SODA, that supports software defined radio, a high-end signal processing application. Our design achieves high performance, energy efficiency, and programmability through a combination of features that include single-instruction multiple-data (SIMD) parallelism and hardware optimized for 16-bit computations. The basic processing element is an asymmetric processor consisting of a scalar and SIMD pipeline, and a set of distributed scratchpad memories that are fully managed in software. Results show that a four processor design is capable of meeting the throughput requirements of the W-CDMA and 802.11a protocols, while operating within the strict power constraints of a mobile terminal.

257 citations

References

IEEE Standard for Binary Floating Point Arithmetic

ANSI/IEEE

01 Jan 1985

1,045 citations

Journal Article•DOI•

MMX technology extension to the Intel architecture

Alexander D. Peleg¹, Uri Weiser¹•Institutions (1)

Intel¹

01 Aug 1996-IEEE Micro

TL;DR: MMX technology extends the Intel architecture to improve the performance of multimedia, communications, and other numeric-intensive applications by introducing data types and instructions to the IA that exploit the parallelism in these applications.

Abstract: Designed to accelerate multimedia and communications software, MMX technology improves performance by introducing data types and instructions to the IA that exploit the parallelism in these applications. MMX technology extends the Intel architecture (IA) to improve the performance of multimedia, communications, and other numeric-intensive applications. It uses a SIMD (single-instruction, multiple-data) technique to exploit the parallelism inherent in many algorithms, producing full application performance of 1.5 to 2 times faster than the same applications run on the same processor without MMX. The extension also maintains full compatibility with existing IA microprocessors, operating systems, and applications while providing new instructions and data types that applications can use to achieve a higher level of performance on the host CPU.

552 citations

Journal Article•DOI•

Subword parallelism with MAX-2

Ruby B. Lee¹•Institutions (1)

Hewlett-Packard¹

01 Aug 1996-IEEE Micro

TL;DR: It is proposed that subword parallelism (parallel computation on lower precision data packed into a word) is an efficient and effective solution for accelerating media processing.

Abstract: MAX-2 illustrates how a small set of instruction extensions can provide subword parallelism to accelerate media processing and other data-parallel programs. This article proposes that subword parallelism (parallel computation on lower precision data packed into a word) is an efficient and effective solution for accelerating media processing. As an example, it describes MAX-2, a very lean, RISC-like set of media acceleration primitives included in the 64-bit PA-RISC 2.0 architecture. Because MAX-2 strives to be a minimal set of instructions, the article discusses both instructions included and excluded. Several examples illustrate the use of MAX-2 instructions, which provide subword parallelism in a word-oriented general-purpose processor at essentially no incremental cost.

382 citations

Journal Article•DOI•

VIS speeds new media processing

Marc Tremblay, J.M. O'Connor, V. Narayanan, Liang He

01 Aug 1996-IEEE Micro

TL;DR: UltraSparc's Visual Instruction Set, described here in detail, accelerates some widely used media-processing algorithms by as much as seven times.

Abstract: UltraSparc's Visual Instruction Set, described here in detail, accelerates some widely used media-processing algorithms by as much as seven times. Today's new media, increasingly sophisticated 3D graphics environments, videoconferencing, MPEG video playback, 3D visualization, image processing, and so on, demand enhancements to conventional RISC instruction sets, which were not originally designed to handle such applications. The Visual Instruction Set (VIS) is a comprehensive set of RISC-style instructions targeted at accelerating this new media processing.

266 citations

Journal Article•DOI•

How multimedia workloads will change processor design

K. Diefendorff¹, Pradeep Dubey²•Institutions (2)

Apple Inc.¹, IBM²

01 Sep 1997-IEEE Computer

TL;DR: The authors predict high-performance, general-purpose processors will incorporate more media processing capabilities, eventually bringing about the demise of specialized media processors, except perhaps, in embedded applications.

Abstract: Workloads drive architecture design and will change in the next two decades. For high-performance, general-purpose processors, there is a consensus that multimedia will continue to grow in importance. The authors predict these processors will incorporate more media processing capabilities, eventually bringing about the demise of specialized media processors, except perhaps, in embedded applications. These enhanced general-purpose processor capabilities will arise from multimedia applications that require real-time response, continuous-media data types and significant fine-grained data parallelism.

231 citations