
Showing papers on "Program counter published in 1991"


Proceedings ArticleDOI
01 Aug 1991
TL;DR: In this article, a new hardware prefetching scheme based on the prediction of the execution of the instruction stream and associated operand references is proposed; the scheme consists of a reference prediction table, a look-ahead program counter, and associated logic.
Abstract: Conventional cache prefetching approaches can be either hardware-based, generally by using a one-block-lookahead technique, or compiler-directed, with insertions of non-blocking prefetch instructions. We introduce a new hardware scheme based on the prediction of the execution of the instruction stream and associated operand references. It consists of a reference prediction table and a look-ahead program counter and its associated logic. With this scheme, data with regular access patterns is preloaded, independently of the stride size, and preloading of data with irregular access patterns is prevented. We evaluate our design through trace-driven simulation by comparing it with a pure data cache approach under three different memory access models. Our experiments show that this scheme is very effective for reducing the data access penalty for scientific programs and that it has moderate success for other applications.
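
The mechanism described above can be approximated in software. Below is a minimal C sketch of a reference prediction table indexed by the instruction address, tracking the last operand address and stride and issuing a preload only once the stride has been confirmed. The table size, state names, transition rules, and the prefetch() stub are illustrative assumptions, not the paper's actual design.

```c
#include <stdint.h>
#include <stdio.h>

#define RPT_ENTRIES 64            /* assumed table size */

enum rpt_state { INITIAL, TRANSIENT, STEADY, NO_PREFETCH };

struct rpt_entry {
    uint32_t tag;                 /* PC of the load/store instruction */
    uint32_t last_addr;           /* last operand address it generated */
    int32_t  stride;              /* last observed stride */
    enum rpt_state state;
};

static struct rpt_entry rpt[RPT_ENTRIES];

/* Stand-in for the hardware preloading one cache block. */
static void prefetch(uint32_t addr) { printf("prefetch 0x%08x\n", addr); }

/* Called once per executed memory reference: pc is the address of the
 * load/store instruction, addr the operand address it accessed. */
void rpt_reference(uint32_t pc, uint32_t addr)
{
    struct rpt_entry *e = &rpt[(pc >> 2) % RPT_ENTRIES];

    if (e->tag != pc) {                       /* new instruction: (re)allocate */
        e->tag = pc; e->last_addr = addr; e->stride = 0; e->state = INITIAL;
        return;
    }

    int32_t stride = (int32_t)(addr - e->last_addr);
    if (stride == e->stride)                  /* regular pattern confirmed */
        e->state = (e->state == NO_PREFETCH) ? TRANSIENT : STEADY;
    else {                                    /* irregular pattern: back off */
        e->state = (e->state == STEADY) ? INITIAL : NO_PREFETCH;
        e->stride = stride;
    }
    e->last_addr = addr;

    if (e->state == STEADY)                   /* preload only regular strides */
        prefetch(addr + (uint32_t)e->stride);
}
```

In hardware the look-ahead program counter runs ahead of the real one and drives these lookups early; the sketch only shows the table update and the "regular pattern" gate on prefetching.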

458 citations


Patent
07 Mar 1991
TL;DR: In this paper, a computer program of complex instruction set code (CISC) is translated to produce a program of reduced instruction set code (RISC), and each CISC instruction is translated into a sequence of RISC instructions.
Abstract: A computer program of complex instruction set code (CISC) is translated to produce a program of reduced instruction set code (RISC). Each CISC instruction is translated into a sequence of RISC instructions. The sequence includes in order four groups of instructions. The first group includes instructions that get inputs and place them in temporary storage. The second group includes instructions that operate on the inputs and place results in temporary storage. The third group includes instructions that update memory or register state and are subject to possible exceptions. The fourth group includes instructions that update memory or register state and are free of possible exceptions. When execution of the RISC program is interrupted by an asynchronous event, the RISC instruction being executed at the time of the interrupt is recorded and allowed to complete. The recorded instruction is checked against a bit map to determine whether it is a boundary instruction for the instruction sequence being executed, and if it is, then asynchronous event processing is permitted. If not, then a program counter for the RISC code is aligned with the next backup boundary instruction if any instruction remaining to be executed is subject to a possible exception. If no instruction subject to a possible exception is found, the remaining instructions in the sequence are executed while moving the program counter to the next forward boundary instruction and thereafter permitting asynchronous event processing.
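
As a rough illustration of the interrupt-handling step, the C sketch below consults a bit map to decide whether the interrupted RISC instruction sits on a CISC-sequence boundary; only the boundary test is shown, and all names, the 4-byte instruction size, and the bitmap layout are assumptions, not the patent's circuitry.

```c
#include <stdint.h>
#include <stdbool.h>

/* One bit per 4-byte RISC instruction: the bit is set if that instruction
 * is the first of a translated CISC sequence (a "boundary" instruction).
 * The bitmap is assumed to be produced by the translator. */
static bool is_boundary(const uint8_t *bitmap, uint32_t code_base, uint32_t risc_pc)
{
    uint32_t index = (risc_pc - code_base) >> 2;   /* instruction number */
    return (bitmap[index >> 3] >> (index & 7)) & 1;
}

/* Called when an asynchronous event interrupts the translated program:
 * the interrupted instruction is first allowed to complete (not modeled
 * here), and the event is delivered immediately only on a boundary. */
bool may_deliver_async_event(const uint8_t *bitmap, uint32_t code_base,
                             uint32_t interrupted_pc)
{
    return is_boundary(bitmap, code_base, interrupted_pc);
}
```

When the test fails, the patent's scheme moves the program counter backward or forward to a boundary instruction before permitting event processing; that alignment logic is not modeled in the sketch.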

113 citations


Patent
27 Jun 1991
TL;DR: In this article, a conditional move instruction tests a register and moves a second register to a third if the condition is met; this function can be substituted for short branches and thus maintain the sequentiality of the instruction stream.
Abstract: A high-performance CPU of the RISC (reduced instruction set) type employs a standardized, fixed instruction size, and permits only simplified memory access data width and addressing modes. The instruction set is limited to register-to-register operations and register load/store operations. Byte manipulation instructions, included to permit use of previously-established data structures, include the facility for doing in-register byte extract, insert and masking, along with non-aligned load and store instructions. The provision of load/locked and store/conditional instructions permits the implementation of atomic byte writes. By providing a conditional move instruction, many short branches can be eliminated altogether. A conditional move instruction tests a register and moves a second register to a third if the condition is met; this function can be substituted for short branches and thus maintain the sequentiality of the instruction stream. Performance can be speeded up by predicting the target of a branch and prefetching the new instruction based upon this prediction; a branch prediction rule is followed that requires all forward branches to be predicted not-taken and all backward branches (as is common for loops) to be predicted as taken. Another performance improvement makes use of unused bits in the standard-sized instruction to provide a hint of the expected target address for jump and jump to subroutine instructions or the like. The target can thus be prefetched before the actual address has been calculated and placed in a register. In addition, the unused displacement part of the jump instruction can contain a field to define the actual type of jump, i.e., jump, jump to subroutine, return from subroutine, and thus place a predicted target address in a stack to allow prefetching before the instruction has been executed. The processor can employ a variable memory page size, so that the entries in a translation buffer for implementing virtual addressing can be optimally used. A granularity hint is added to the page table entry to define the page size for this entry. An additional feature is the addition of a prefetch instruction which serves to move a block of data to a faster-access cache in the memory hierarchy before the data block is to be used.
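
The branch-prediction rule quoted above (backward branches predicted taken, forward branches predicted not-taken) is easy to state in code. The following C fragment is a generic sketch, not the patent's hardware; the sign and scaling of the displacement field are assumptions about the encoding.

```c
#include <stdint.h>
#include <stdbool.h>

/* Static prediction: backward branches (negative displacement, typical
 * of loops) are predicted taken; forward branches predicted not-taken. */
static bool predict_taken(int32_t branch_displacement)
{
    return branch_displacement < 0;
}

/* Prefetch target implied by the prediction; the displacement is assumed
 * to be already scaled and relative to the branch's own address. */
static uint32_t prefetch_target(uint32_t pc, int32_t disp, uint32_t insn_size)
{
    return predict_taken(disp) ? (uint32_t)(pc + disp) : pc + insn_size;
}
```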

90 citations


Patent
15 May 1991
TL;DR: In this article, an error occurring during execution of the second computer program is reported in the context of the first program by aborting execution when the error occurs, and using the second address to indicate that the error is associated with the instruction in the first computer program.
Abstract: In a situation where a first computer program has been translated to obtain a second computer program, an error occurring during execution of the second computer program is reported in the context of the first program. This is done by aborting execution of the second computer program when the error occurs; determining a first address which is the address of the instruction in the second computer program that caused the error; determining from the first address a second address of an instruction in the first computer program from which the instruction in the second computer program was translated; and reporting that the error occurred, and using the second address to indicate that the error is associated with the instruction in the first computer program. Preferably the second address is used to reference traceback and symbolic name information generated when the first computer program is compiled from source code. The traceback information provides the line number of the source code from which the instruction in the first computer program was compiled, and the symbolic name information provides the name of a routine containing the instruction in the first program or a variable used by the instruction.
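
One way to picture the address-correlation step is a table, produced as a side output of translation, that maps each translated-program instruction address back to the originating instruction address. The C sketch below does a binary search over such a table; the structure and function names are invented for illustration and do not come from the patent.

```c
#include <stdint.h>
#include <stddef.h>

struct addr_map_entry {
    uint32_t translated_addr;   /* address in the second (translated) program */
    uint32_t original_addr;     /* address it was translated from */
};

/* map[] is assumed sorted by translated_addr and non-empty, one entry per
 * translated sequence.  Returns the original-program address against which
 * the error should be reported. */
static uint32_t map_to_original(const struct addr_map_entry *map, size_t n,
                                uint32_t faulting_addr)
{
    size_t lo = 0, hi = n;
    while (lo + 1 < hi) {               /* find last entry <= faulting_addr */
        size_t mid = (lo + hi) / 2;
        if (map[mid].translated_addr <= faulting_addr)
            lo = mid;
        else
            hi = mid;
    }
    return map[lo].original_addr;
}
```

The returned address is then used, as the abstract describes, to index traceback and symbolic name information generated when the first program was compiled.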

80 citations


Patent
27 Mar 1991
TL;DR: In this article, a conditional break instruction, BRKcc, is inserted within a looping instruction to conditionally terminate the looping instructions with a minimum number of instruction cycles and a conditional repeat instruction, REPcc, allows a subsequent instruction to be conditionally terminated during execution.
Abstract: A data processor (10) having an instruction fetch unit (12), a decode and control unit (14), and an execution unit (16) performs conditionally executed instructions in hardware. A conditional break instruction, BRKcc, is inserted within a looping instruction to conditionally terminate the looping instruction with a minimum number of instruction cycles. A conditional do-loop instruction, DO#0, prevents the data processor (10) from executing a do-loop when the loop count in a loop counter (24) is zero upon entry. A conditional repeat instruction, REP#0, prevents a repeat instruction from being executed if a loop count is zero upon entry. A conditional repeat instruction, REPcc, allows a subsequent instruction to be conditionally terminated during execution.

37 citations


Patent
Ooi Yasushi1
21 Nov 1991
TL;DR: In this paper, a nesting management mechanism for use in a loop controlling system comprises a program counter coupled to a program counter bus and incremented each time one instruction is executed, and a loop counter coupled with the program counter bus and set with the number of loops to be executed when a loop execution starts.
Abstract: A nesting management mechanism for use in a loop controlling system, comprises a program counter coupled to a program counter bus and incremented each time one instruction is executed, and a loop counter coupled with the program counter bus and set with the number of loops to be executed when a loop execution is executed. The loop counter is decremented each time one loop is completed. A loop start address register is coupled to the program counter bus and set with a loop start address when the loop execution is executed, and a loop end address register is coupled to the program counter bus and set with a loop end address when the loop execution is executed. First, second and third independent hardware stacks of a first-in last-out type are provided for the loop counter, the loop start address register, and the loop end address register, respectively, so as to save respective contents of the loop counter, the loop start address register, and the loop end address register at the time of a loop nesting.
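
A software analogue of the three hardware stacks might look like the following C sketch: entering a nested loop pushes the current loop counter, start address, and end address, and finishing a loop pops them back. The nesting depth, names, and the exact sequencing are assumptions for illustration only.

```c
#include <stdint.h>

#define MAX_NESTING 8                  /* assumed hardware nesting depth */

struct loop_state {
    uint32_t count;                    /* remaining iterations (loop counter) */
    uint32_t start_addr;               /* loop start address register */
    uint32_t end_addr;                 /* loop end address register */
};

static struct loop_state current;      /* the three active registers */
static struct loop_state stack[MAX_NESTING];
static int sp;                         /* shared depth of the three stacks */

/* On a new loop instruction: save the enclosing loop's registers.
 * Overflow beyond MAX_NESTING is not checked in this sketch. */
void loop_enter(uint32_t count, uint32_t start, uint32_t end)
{
    stack[sp++] = current;             /* push counter, start, end together */
    current.count = count;
    current.start_addr = start;
    current.end_addr = end;
}

/* At the loop end address: either branch back or restore the outer loop. */
uint32_t loop_end(uint32_t next_pc)
{
    if (--current.count > 0)
        return current.start_addr;     /* repeat the loop body */
    current = stack[--sp];             /* pop the enclosing loop's state */
    return next_pc;                    /* fall through past the loop */
}
```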

34 citations


Patent
30 Aug 1991
TL;DR: In this paper, a vector pipeline instruction is read out from the program memory and is decoded by the decoder, and the program controller stops the program counter and outputs a start signal, and thereafter, controls an operation of the data processor according to the contents of the pipeline instruction.
Abstract: In a program control type processor for executing plural instructions including a vector pipeline instruction including a data processor for executing a pipeline operation, there is provided a program controller including a program memory, a program counter and a decoder, and is further provided an address generator and a data memory. When the vector pipeline instruction is read out from the program memory and is decoded by the decoder, the program controller stops the program counter and outputs a start signal, and thereafter, controls an operation of the data processor according to the contents of the vector pipeline instruction. The data processor executes the pipeline operation for the data outputted from the data memory by being controlled by the program controller, and the program controller detects completion of the pipeline operation performed in response to the vector pipeline instruction a predetermined number of cycles after receiving the end signal, and thereafter, sequentially executes instructions following the vector pipeline instruction.

28 citations


Book ChapterDOI
01 Jun 1991
TL;DR: This work considers a class of abstract machines used for specifying the evaluation of intermediate-level programming languages, and demonstrates how various abstract aspects of these machines can be made concrete, providing for their direct implementation in a low-level microcoded architecture.
Abstract: We consider a class of abstract machines used for specifying the evaluation of intermediate-level programming languages, and demonstrate how various abstract aspects of these machines can be made concrete, providing for their direct implementation in a low-level microcoded architecture. We introduce the concept of stored programs and data to abstract machines. We demonstrate how machines that dynamically manipulate programs with abstract operations can be translated into machines that only need to read instructions from a fixed program. We show how familiar architectural features, such as a program counter, instruction register and a display can all be introduced very naturally into an abstract machine architecture once programs are represented as objects stored in memory. This translation lowers the level of the abstract machine, making it less abstract or, equivalently, more concrete. The resulting machines bear a close resemblance to a microcoded architecture in which the abstract machine instructions are defined in terms of a small set of micro-instructions that manipulate registers and memory. This work provides a further basis for the formal construction and implementation of abstract machines used for implementing programming languages. We demonstrate our results on an abstract machine that is a slight variant of the Categorical Abstract Machine.
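
The "stored program plus program counter" step the chapter describes is essentially a fetch-decode-execute loop. A deliberately tiny C sketch of that loop follows; the opcode set is invented and has no relation to the Categorical Abstract Machine the paper actually uses.

```c
#include <stdint.h>
#include <stdio.h>

enum { OP_PUSH, OP_ADD, OP_JMP, OP_HALT };    /* invented opcode set */

static void run(const int32_t *program)
{
    int32_t stack[64];
    int sp = 0;
    uint32_t pc = 0;                          /* the program counter */

    for (;;) {
        int32_t ir = program[pc++];           /* fetch into instruction register */
        switch (ir) {                         /* decode and execute */
        case OP_PUSH: stack[sp++] = program[pc++]; break;
        case OP_ADD:  sp--; stack[sp - 1] += stack[sp]; break;
        case OP_JMP:  pc = (uint32_t)program[pc]; break;
        case OP_HALT: printf("top of stack: %d\n", sp ? stack[sp - 1] : 0); return;
        }
    }
}

int main(void)
{
    const int32_t prog[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_HALT };
    run(prog);                                /* prints "top of stack: 5" */
    return 0;
}
```

The point of the chapter is that once programs are objects in memory, the program counter and instruction register of such a loop arise naturally from a previously abstract machine description.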

23 citations


Roni Potasman
01 Jan 1991
TL;DR: This thesis investigates parallelism and hardware design trade-offs of parallel and pipelined architectures and develops a retargetable compiler based on a set of powerful code transformations called Percolation Scheduling that map programs with real-time constraints and/or massive time requirements onto synchronous, parallel, high-performance or semi-custom architectures.
Abstract: This thesis investigates parallelism and hardware design trade-offs of parallel and pipelined architectures. To explore these trade-offs we developed a retargetable compiler based on a set of powerful code transformations called Percolation Scheduling (PS) that map programs with real-time constraints and/or massive time requirements onto synchronous, parallel, high-performance or semi-custom architectures. High performance is achieved through extraction of the application's inherent fine-grain parallelism and the use of a suitable architecture. Exploiting fine-grain parallelism is a critical part of exploiting all of the parallelism available in a given program, particularly since highly irregular forms of parallelism are often not visible at coarser levels and since the use of low-level parallelism has a multiplicative effect on the overall performance. To extract substantial parallelism from both the hardware and the compiler, we use a clean, highly parallel VLIW-like architecture that is synchronous, has multiple functional units and has a single program counter. The use of a hazard-free and homogeneous architecture results not only in a better VLSI design but also considerably increases the compiler's ability to produce better code. To further enhance parallelism we modified the uni-cycle VLIW model and extended the transformations such that pipelined units that provide extra parallelism are used. Another approach presented is that of resource-constrained scheduling (RCS). Since the RCS problem is known to be NP-hard, in practice it may be solved only by a heuristic approach. We argue that applying the heuristic after extraction of the unlimited-resources schedule may yield better results than applying it at the beginning of the scheduling process. Through a series of benchmarks we evaluate hardware design trade-offs and show that speed-ups of one order of magnitude on average are feasible with sufficient functional units. However, when resources are limited we show that the number of functional units needed may be optimized for a particular suite of application programs.

23 citations


Patent
Masao Inoue1
13 May 1991
TL;DR: In this article, the value of a program counter is adjusted so that new instructions at subsequent addresses can be supplied to the buffers left empty by the above conversion, and instructions which require multiple cycles to execute can be processed in one cycle.
Abstract: An information processing apparatus wherein a plurality of instructions are checked in an instruction buffer circuit, the plurality of instructions excluding instructions being executed. If there are instructions which can be executed simultaneously, then the instructions are converted to one instruction and executed. By adjusting the value of a program counter so that new instructions at subsequent addresses can be supplied to the buffers left empty by the above conversion, instructions which require multiple cycles to execute can be processed in one cycle. Using one selector control signal, the necessary number of instructions can be supplied from the instruction cache and executed continuously without leaving the instruction buffer empty after multiple instructions are converted to one instruction and executed. Load instructions which do not need to be executed repeatedly in a program containing loops can be canceled.

20 citations


Patent
22 May 1991
TL;DR: In this article, the effective capacity of an instruction cache in a modified HARVARD architecture is enhanced by decoding a current instruction to determine whether it is a program memory data access (PMDA) instruction that requires a data transfer from the program memory when the next instruction is fetched from program memory.
Abstract: The effective capacity of an instruction cache in a digital signal processor with a modified HARVARD architecture is enhanced by decoding a current instruction to be executed to determine whether it is a program memory data access (PMDA) instruction that requires a data transfer from the program memory when the next instruction is fetched from the program memory. If it is a PMDA instruction, the next instruction is loaded into a cache, which then provides the stored instruction each time the PMDA instruction reappears. This relieves a bottleneck resulting from a simultaneous call for both the next instruction, and datum for the current instruction, from the program memory. The cache is used only for an instruction following a PMDA instruction, and can thus have a substantially smaller capacity than previously.

Patent
20 May 1991
TL;DR: In this paper, a patch implementation module is provided for correcting program errors discovered in the ROM after it has been programmed, which can be used to restore the normal address sequence at a point beyond the erroneous program fragment.
Abstract: A mask-programmable microprocessor, of the kind incorporating a ROM containing its program and a program counter for sequentially addressing the ROM to read out the program, is provided with a patch implementation module for use in correcting program errors discovered in the ROM after it has been programmed. The patch implementation module comprises a first address register for containing an address in the ROM just before that of the start of the program fragment containing the error, a second address register containing the address of the start of an additional program fragment which corrects the error and which is entered in a RAM associated with the microprocessor, e.g. from non-volatile memory on power-up, and a comparator which is responsive to the program counter reaching the address in the first register to load the address in the second register into the program counter. The additional program fragment is therefore sequentially addressed in place of the erroneous one, and is then arranged to restore the microprocessor to its normal address sequence at a point beyond the erroneous program fragment.
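
In software terms the patch module behaves like a comparator on the program counter. The C sketch below models that check on each fetch; the register and function names are invented, only a single patch pair is modeled where a real device may provide several, and word-sequential ROM addressing is assumed.

```c
#include <stdint.h>
#include <stdbool.h>

/* Patch registers, assumed loaded from non-volatile memory at power-up. */
struct patch_regs {
    uint32_t match_addr;   /* ROM address just before the faulty fragment */
    uint32_t patch_addr;   /* RAM address of the correcting fragment */
    bool     enabled;
};

/* Called on each instruction fetch with the current program counter;
 * returns the address the program counter should take next. */
uint32_t next_fetch_addr(const struct patch_regs *p, uint32_t pc)
{
    if (p->enabled && pc == p->match_addr)
        return p->patch_addr;      /* comparator hit: jump into the RAM patch */
    return pc + 1;                 /* normal sequential ROM addressing */
}
```

The correcting fragment in RAM would end with a jump back past the erroneous ROM fragment, restoring the normal address sequence as the abstract describes.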

Book ChapterDOI
01 Jun 1991
TL;DR: The dataflow model and control-flow model are generally viewed as two extremes of computation models on which a spectrum of architectures are based.
Abstract: The dataflow model and control-flow model are generally viewed as two extremes of computation models on which a spectrum of architectures are based.

Patent
12 Mar 1991
TL;DR: In this article, a method and apparatus for performing fast jump address calculation is disclosed, and a field from the instruction is provided to an adder, on the assumption that it is the displacement value, without actually determining whether it is a displacement value.
Abstract: A method and apparatus for performing a fast jump address calculation is disclosed. A field from the instruction is provided to an adder, on the assumption that it is the displacement value, without actually determining whether it is a displacement value. A fixed instruction length is also provided to the adder, on the assumption that the instruction will have that length. Finally, the current instruction address bits from the program counter are provided to the adder. These are added together to provide a jump address.
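
The speculative addition the patent describes can be written out directly. The following C sketch adds the presumed displacement field, a fixed instruction length, and the current program counter before decode has confirmed that the field really is a displacement; the field position, width, and instruction length are assumptions.

```c
#include <stdint.h>

#define FIXED_INSN_LEN 4u                 /* assumed fixed instruction size */

/* Speculatively compute a jump target from the raw instruction word and
 * the program counter, before decode confirms the field is a displacement. */
static uint32_t speculative_jump_target(uint32_t pc, uint32_t insn_word)
{
    /* Assume the low 16 bits hold a signed displacement (illustrative). */
    int32_t disp = (int32_t)(int16_t)(insn_word & 0xFFFFu);
    return pc + FIXED_INSN_LEN + (uint32_t)disp;
}
/* If decode later shows the instruction is not a jump, or has a different
 * length, the speculative result is simply discarded. */
```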

Patent
Yasushi Ooi1, Yoshikuni Sato1
04 Sep 1991
TL;DR: In this paper, a processor capable of processing a variable word length instruction has a program counter controlled to indicate the head of an instruction by the value of the program counter, and an adder for summing the length of decoded portions in the variable-word length instruction in accordance with the progress of the instruction decoding.
Abstract: A processor capable of processing a variable word length instruction has a program counter controlled to indicate the head of an instruction by the value of the program counter. There are provided an adder for summing the length of decoded portions in the variable word length instruction in accordance with the progress of the instruction decoding, and another adder for adding the length of the decoded instruction portions to the program counter so as to update the program counter. Further, there is provided a circuit for calculating an operand effective address by using the value of the program counter in the course of the variable word length instruction decoding. Thus, the updating of the program counter and the generation of the effective address are concurrently executed.

Patent
19 Sep 1991
TL;DR: In this article, a processor data memory address generator is adapted to receive a control word from a program controller receiving instructions from program memory addressed by an instruction counter and producing a program signal addressed to an arithmetic and logic unit.
Abstract: A processor data memory address generator is adapted to receive a control word from a program controller receiving instructions from a program memory addressed by an instruction counter and producing a program signal addressed to an arithmetic and logic unit. The instruction counter is incremented by a clock signal and reset by the program controller. The control word comprises location information and selection information and the address generator is adapted to produce a data address having a first part comprising bits of said location information and a second part formed by a selected set of bits of the address of the current instruction identified by said selection information.
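
The address construction can be stated as plain bit manipulation. In the C sketch below, the control word supplies the location bits and a selection mask that picks bits out of the current instruction address; the 16-bit widths and the exact packing are invented for illustration and are not taken from the patent.

```c
#include <stdint.h>

/* Build a data memory address whose one part comes from the control
 * word's location field and whose other part is the set of bits of the
 * current instruction address chosen by the selection mask. */
static uint16_t gen_data_addr(uint16_t location_bits,
                              uint16_t selection_mask,
                              uint16_t instruction_addr)
{
    return (uint16_t)((location_bits & ~selection_mask) |
                      (instruction_addr & selection_mask));
}
```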

Proceedings ArticleDOI
18 Nov 1991
TL;DR: A programmable high-performance and high-speed neurocomputer for a large neural network is developed using an application specific IC (ASIC) neurocomputing chip made by CMOS VLSI technology.
Abstract: A programmable high-performance and high-speed neurocomputer for a large neural network is developed using an application specific IC (ASIC) neurocomputing chip made by CMOS VLSI technology. The neurocomputer consists of one master node and multiple slave nodes which are connected by two data paths, a broadcast bus and a ring bus. The neurocomputer was built on one printed circuit board having 50 VLSI chips that offers 1-2 billion connections/s. This computer uses SIMD (single-instruction multiple-data stream) to simplify hardware and operation, and to ease programming. Only the master node has a program counter that controls both master and slave instructions. The same slave instructions are fed to all slave nodes simultaneously. It can execute complicated computations, memory access and memory address control, and data path control in a single instruction and in a single time step using a pipeline. The neurocomputer processes forward and backward calculation of multilayer perceptron type neural networks, feedback type neural networks, and any other type of programming.

Patent
20 Mar 1991
TL;DR: In this paper, the authors propose to execute a different instruction in each processor group among plural processors to which a single instruction is applied, where the secondary instruction can also be set up as a NO OPERATION instruction.
Abstract: PURPOSE: To execute a different instruction in each processor group among plural processors to which a single instruction is applied. CONSTITUTION: A microcontroller 30 outputs a primary instruction to a bus 31, outputs a coincidence word specifying a processor zone to an address bus 39, and outputs a signal AMODE specifying an operation mode to a bus 34. Each processor has a circuit for judging whether it belongs to the specified processor zone, and a circuit for forming a secondary instruction by modifying the inputted primary instruction when the mode signal indicates an instruction-modifying mode. The secondary instruction can also be set up as a NO OPERATION instruction. Since the secondary instruction is generated by each processor's own circuit, a different instruction can be executed in each processor zone. COPYRIGHT: (C)1992,JPO&Japio

Patent
Jacob K. White1
27 Nov 1991
TL;DR: In this article, a pipeline data processor is simultaneously operable in a pipeline mode, a parallel mode and a vector mode which is a special case of the pipeline mode and each pipeline stage has its own stage program counter 71, 72, 73, 74, 75 and 76.
Abstract: A pipeline data processor is simultaneously operable in a pipeline mode, a parallel mode and a vector mode which is a special case of the pipeline mode. Each pipeline stage has its own stage program counter 71, 72, 73, 74, 75 and 76. A global program counter 34 is incremented in the pipeline mode. The instruction addresses generated in the global program counter 34 are distributed to those pipeline stages which first become available to perform pipelined data processing. Any given pipeline stage may dynamically switch between pipeline mode and a parallel mode in which the stage program counter counts and supplies instruction addresses independently of any other pipeline stage. A vector mode uses pipeline instructions which are repeated to enable any number of the pipeline stages to participate in vector calculations. In the vector mode, one pipeline instruction address is held in the global program counter to be repeatedly supplied to respective first available pipeline stages until the vector calculations are completed.

Patent
14 Aug 1991
TL;DR: In this article, the authors propose a method to facilitate modification of a program by using program-altering RAM incorporated in a one-chip microcomputer together with an externally connected nonvolatile memory that stores program-altering data and is capable of serial communication.
Abstract: PURPOSE: To facilitate modification of a program by using a program-altering RAM incorporated in a one-chip microcomputer and a nonvolatile memory which is externally connected to the microcomputer, stores program-altering data, and is capable of serial communication. CONSTITUTION: When alteration address data A1, which is set in an address data area 11a, coincides with the output of a program counter 1, an output permission signal is transmitted from a coincidence detection means 13a to a vector table 12a, and only the data output of the vector table 12a is transmitted to a program counter value alteration means 14. When the output of the coincidence detection means 13a is simultaneously inputted to an interruption generation circuit 16 through an OR circuit 15, an interruption occurs and a vector address 1 from the vector table 12a is inputted to the program counter alteration means 14 by the interruption. The means 14 refers to the content AAAA of the vector address 1 and alters the content of the program counter 1 to AAAA. Thus, the content of a ROM program can easily be altered.

Patent
25 Apr 1991
TL;DR: In this article, a stack operation for saving information necessary to restart a program execution which is stopped by the interruption to a stack memory is performed before start of the interruption operation, whereby an improved processor with less overhead can be provided.
Abstract: An information processor has at least one interface unit by which the processor is coupled to peripheral equipment. The interface unit can selectively generate either a first mode signal or a second mode signal when the processor performs an interruption operation according to a request from the peripheral equipment. When the processor performs the interruption operation in response to the first mode signal, a stack operation for saving, to a stack memory, the information necessary to restart the program execution stopped by the interruption is performed before the start of the interruption operation. The processor can perform the interruption operation in response to the second mode signal without the stack operation, whereby an improved processor with less overhead can be provided.

Patent
28 Jun 1991
TL;DR: In this paper, a conditional move instruction tests a register and moves a second register to a third if the condition is met; this function can be substituted for short branches and thus maintain the sequentiality of the instruction stream.
Abstract: A high-performance CPU of the RISC (reduced instruction set) type employs a standardized, fixed instruction size, and permits only simplified memory access data width and addressing modes. The instruction set is limited to register-to-register operations and register load/store operations. Byte manipulation instructions, included to permit use of previously-established data structures, include the facility for doing in-register byte extract, insert and masking, along with non-aligned load and store instructions. The provision of load/locked and store/conditional instructions permits the implementation of atomic byte writes. By providing a conditional move instruction, many short branches can be eliminated altogether. A conditional move instruction tests a register and moves a second register to a third if the condition is met; this function can be substituted for short branches and thus maintain the sequentiality of the instruction stream. Performance can be speeded up by predicting the target of a branch and prefetching the new instruction based upon this prediction; a branch prediction rule is followed that requires all forward branches to be predicted not-taken and all backward branches (as is common for loops) to be predicted as taken. Another performance improvement makes use of unused bits in the standard-sized instruction to provide a hint of the expected target address for jump and jump to subroutine instructions or the like. The target can thus be prefetched before the actual address has been calculated and placed in a register. In addition, the unused displacement part of the jump instruction can contain a field to define the actual type of jump, i.e., jump, jump to subroutine, return from subroutine, and thus place a predicted target address in a stack to allow prefetching before the instruction has been executed. The processor can employ a variable memory page size, so that the entries in a translation buffer for implementing virtual addressing can be optimally used. A granularity hint is added to the page table entry to define the page size for this entry. An additional feature is the addition of a prefetch instruction which serves to move a block of data to a faster-access cache in the memory hierarchy before the data block is to be used.

Book ChapterDOI
01 Jan 1991
TL;DR: All of the advanced microprocessors discussed in the preceding chapters are von Neumann-type; nevertheless, notwithstanding the sequential execution of a control-flow processor, such a processor can display various forms of parallelism.
Abstract: All of the advanced microprocessors discussed in the preceding chapters are von Neumann-type. A program is executed in a sequence of ordered operations that a programmer specifies. A program counter points to the memory locations in which these operations and their operands reside, and the processor’s control unit generates the appropriate control signals to activate within the processor those components that eventually will execute a particular operation. Due to this “controlled” flow of instructions and data, this type of processor sometimes is referred to as a control-flow processor. Remember, however, that, notwithstanding the sequential execution by a control-flow processor, it can display various forms of parallelism. Consider, for example, the i860, whose floating-point unit can execute various processes between the adder and multiplier units. Although these processes are sequential in nature, they can be executed in parallel and can transmit data to or receive data from each other. See also the discussion of the transputer in Chapter 6.

Patent
24 Dec 1991
TL;DR: In this paper, an infinite program loop is detected by saving the program counter at each fixed time interval, comparing the current value with the old value while excluding the lower m bits, and deciding that a loop exists when the coincidence persists.
Abstract: PURPOSE: To detect an infinite program loop by saving the program counter at each fixed time interval, comparing the current value with the old value, and deciding that a loop exists when the coincident state of the comparison continues, with the lower m bits excluded. CONSTITUTION: An old program counter 6 is compared with a current program counter 5 at each fixed time interval, and the coincidence of addresses is checked with the specified lower m bits excluded. When the coincidence of addresses is confirmed, a loop counter 8 is counted up. A program loop detection signal is then outputted when the counter 8 reaches a specified value. Based on this detection signal, the contents of a restart address register 10 are transferred to the counter 5 and the recovery processing is started. Thus an infinite program loop can be detected.
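
The detection rule translates into a few lines of C: sample the program counter on a timer, compare it with the previous sample while masking the low-order m bits, and raise the loop signal after enough consecutive matches. The threshold, mask width, and function names below are assumptions, not values from the patent.

```c
#include <stdint.h>
#include <stdbool.h>

#define MASK_BITS    6u      /* ignore the lower m = 6 address bits (assumed) */
#define LOOP_THRESH  8u      /* consecutive matches that count as a loop */

static uint32_t old_pc;       /* saved program counter (register 6 in the patent) */
static uint32_t loop_count;   /* loop counter (register 8 in the patent) */

/* Called from a fixed-interval timer with the current program counter;
 * returns true when an infinite loop is assumed and recovery should start
 * (the patent then reloads the PC from a restart address register). */
bool check_infinite_loop(uint32_t current_pc)
{
    uint32_t mask = ~((1u << MASK_BITS) - 1u);

    if ((current_pc & mask) == (old_pc & mask))
        loop_count++;                  /* still circling in the same region */
    else
        loop_count = 0;                /* progress observed: reset */

    old_pc = current_pc;
    return loop_count >= LOOP_THRESH;
}
```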

Patent
25 Feb 1991
TL;DR: In this paper, a sequence in which M instructions are repeated N times is performed as a single block in a pipeline processing system by setting the head address of the block, held in an address holding register, back into the program counter when the value of the counter equals M.
Abstract: PURPOSE: To perform a sequence in which M instructions are repeated N times as a single block in a pipeline processing system by setting the head address of the block, held in an address holding register, back into the program counter when the value of this counter is equal to M. CONSTITUTION: The value of a program counter 24 is increased by one every time an instruction of the pipeline process is carried out by one step. When the value of the counter 24 is equal to M, the address held in an address holding register 22 is set again into the counter 24. The repeating frequency is held in a sequence frequency register 23. When the repeating frequency is equal to N, the repeating process is stopped and the next process is carried out. Thus it is possible to perform a sequence in which M instructions are repeated N times as a single block in a pipeline processing system.
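
Functionally this is a block-repeat counter pair. A C sketch of the sequencing decision follows; the field names, the per-instruction counting, and the fall-through behavior are assumptions made for illustration rather than details taken from the patent.

```c
#include <stdint.h>

/* Block-repeat state: repeat the M instructions starting at start_addr
 * for N passes, treated as a single block by the pipeline's sequencer. */
struct block_repeat {
    uint32_t start_addr;   /* head address of the block (register 22) */
    uint32_t length_m;     /* number of instructions M in the block */
    uint32_t passes_left;  /* remaining repetitions N (register 23) */
    uint32_t issued;       /* instructions issued in the current pass */
};

/* Called once per issued instruction; returns the next program counter. */
uint32_t next_pc(struct block_repeat *br, uint32_t pc)
{
    if (++br->issued < br->length_m)
        return pc + 1;                 /* still inside the current pass */

    br->issued = 0;
    if (--br->passes_left > 0)
        return br->start_addr;         /* re-enter the block from its head */
    return pc + 1;                     /* N passes done: fall through */
}
```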

Patent
Yuko Ohde1, Hideo Tanaka1, Ichiro Kuroda1
01 May 1991
TL;DR: In this paper, a memory stores a sequence of instructions, and a controller modifies an instruction read from the memory into a no-operation instruction when the content of a counter reaches a predetermined value.
Abstract: A program control circuit which comprises a register for holding the repetition number of a program operation to be repeated. The control circuit further comprises a counter for receiving the content of the register and adapted to be decremented in response to each execution of the program operation to be repeated. A memory stores a sequence of instructions. A controller transfers an instruction read from the memory without modification in a normal condition, and modifies the instruction read from the memory into a no-operation instruction when the content of the counter reaches a predetermined value.

Patent
22 Nov 1991
TL;DR: In this paper, a large scale integrated circuit device (LSI) is provided with a program instruction memory part 10, a random number initial value part 40, a random number generating circuit part 20, a random number store memory part 41, and a logical arithmetic part 30, whose output terminal is connected to an instruction decoder part 43.
Abstract: PURPOSE: To output data whose decoding is difficult by converting stored data with random number data generated by a random number generating circuit part, thereby enciphering the data into an irregular form. CONSTITUTION: An LSI (large scale integrated circuit device) 4 is provided with a program instruction memory part 10, a random number initial value part 40, a random number generating circuit part 20, a random number store memory part 41, and a logical arithmetic part 30; a program counter 42 is connected to the program instruction memory part 10, and an output terminal of the logical arithmetic part 30 is connected to an instruction decoder part 43. In this configuration, random number data with a pattern intrinsically related in advance to a certain initial value is generated, and the stored data is converted by this random number data into data having an irregularity different from that of the stored data. In this way, an LSI having a data enciphering circuit whose decoding is difficult is obtained.

Patent
22 Jan 1991
TL;DR: In this article, a binary decision apparatus is described, comprising a control unit which generates the binary decision apparatus' control signals from an operation code contained in the first byte of an instruction, a program counter which provides addressing for an external program memory, and a memory buffer register for holding digital instruction data provided by the program memory.
Abstract: A binary decision apparatus comprising a control unit which generates the binary decision apparatus' control signals from an operation code contained in the first byte of an instruction, a program counter which provides addressing for an external program memory, and a memory buffer register for holding digital instruction data provided by the external program memory. External control signals provided to the binary decision apparatus include a single phase system clock, a system reset signal and a wait signal that can be used to single-step the binary decision apparatus. Program instructions are provided from the external program memory to the binary decision apparatus via an eight-bit data bus, while an internal twelve-bit data bus routes digital information between the registers and counters of the binary decision apparatus. The binary decision apparatus of the present invention also includes an input register for receiving and then latching into the register external binary signals, an output register which is a bit or word address register that provides the digital logic output signals for the binary decision apparatus, a flag register in which status bits are stored and counters and registers for performing the counting and other functions/operations of the binary decision apparatus.

Patent
27 Jun 1991
TL;DR: In this paper, a conditional move instruction tests a register and moves a second register to a third if the condition is met; this function can be substituted for short branches and thus maintain the sequentiality of the instruction stream.
Abstract: A high-performance CPU of the RISC (reduced instruction set) type employs a standardized, fixed instruction size, and permits only simplified memory access data width and addressing modes. The instruction set is limited to register-to-register operations and register load/store operations. Byte manipulation instructions, included to permit use of previously-established data structures, include the facility for doing in-register byte extract, insert and masking, along with non-aligned load and store instructions. The provision of load/locked and store/conditional instructions permits the implementation of atomic byte writes. By providing a conditional move instruction, many short branches can be eliminated altogether. A conditional move instruction tests a register and moves a second register to a third if the condition is met; this function can be substituted for short branches and thus maintain the sequentiality of the instruction stream. Performance can be speeded up by predicting the target of a branch and prefetching the new instruction based upon this prediction; a branch prediction rule is followed that requires all forward branches to be predicted not-taken and all backward branches (as is common for loops) to be predicted as taken. Another performance improvement makes use of unused bits in the standard-sized instruction to provide a hint of the expected target address for jump and jump to subroutine instructions or the like. The target can thus be prefetched before the actual address has been calculated and placed in a register. In addition, the unused displacement part of the jump instruction can contain a field to define the actual type of jump, i.e., jump, jump to subroutine, return from subroutine, and thus place a predicted target address in a stack to allow prefetching before the instruction has been executed. The processor can employ a variable memory page size, so that the entries in a translation buffer for implementing virtual addressing can be optimally used. A granularity hint is added to the page table entry to define the page size for this entry. An additional feature is the addition of a prefetch instruction which serves to move a block of data to a faster-access cache in the memory hierarchy before the data block is to be used.

Patent
16 Oct 1991
TL;DR: In this paper, when a branch instruction is received, the branch destination address corresponding to the output of a program counter 100 is retrieved from and outputted by branch-address-storing associative memories 611-613.
Abstract: PURPOSE: To increase the data processing speed and to improve the data processing efficiency by using hardware to carry out a branch instruction. CONSTITUTION: When a branch instruction is received, the branch destination address corresponding to the output of a program counter 100 is retrieved from and outputted by branch-address-storing associative memories 611-613. At the same time, incrementing of the count value of the counter 100 is stopped by the output of an instruction decoder 310. The branch addresses outputted from the memories 611-613 are then selected based on the branch conditions and applied to the counter 100. The output of the counter 100 is applied to an instruction storing memory 200. COPYRIGHT: (C)1993,JPO&Japio