The system provides a gestural interface to various visually presented elements, presented on a display screen or screens. A gestural vocabulary includes ‘instantaneous’ commands, in which forming one or both hands into the appropriate ‘pose’ results in an immediate, one-time action; and ‘spatial’ commands, in which the operator either refers directly to elements on the screen by way of literal ‘pointing’ gestures or performs navigational maneuvers by way of relative or “offset” gestures. The system contemplates the ability to identify the users hands in the form of a glove or gloves with certain indicia provided thereon, or any suitable means for providing recognizable indicia on a user's hands or body parts. A system of cameras can detect the position, orientation, and movement of the user's hands and translate that information into executable commands.

System and method for gesture based control system

High-performance computing systems are of steadily growing interest to provide new levels of computational capability for an increasing range of applications. The growing use of and dependence on optical interconnects to meet these system's scaling bandwidth demands has given rise to “computercom” as a distinct market segment, alongside the traditional datacom and telecom markets. This paper discusses the trends, requirements, tradeoffs, and potential technologies for this market.

Optical Interconnects for High-Performance Computing

The present invention is directed to a method and apparatus for sending and retrieving location relevant information to a user by selecting and designating a point of interest that is displayed on a graphical user interface and sending the location information associated with that point of interest to a receiver that is also selected using the graphical user interface. The location relevant information may also include mapped routes, waypoints, geo-fenced areas, moving vehicles etc. Updated location relevant information may also be continuously sent to the user while generating updated mapping information on the graphical user interface. The present invention may be practiced by using communication devices such as a personal computer, a personal digital assistance, in-vehicle navigation systems, or a mobile telephone.

Method and apparatus for sending, retrieving and planning location relevant information

Systems and methods are provided for implementing: a rings architecture for communications and data handling systems; an enumeration process for automatically configuring the ring topology; automatic routing of messages through bridges; extending a ring topology to external devices; write-ahead functionality to promote efficiency; wait-till-reset operation resumption; in-vivo scan through rings topology; staggered clocking arrangement; and stray message detection and eradication. Other inventive elements conveyed include: an architectural overview of a packet processor; a programming model for a packet processor; an instruction pipeline for a packet processor; and use of a packet processor as a module on a rings-based architecture. Additional inventive elements conveyed include: an architectural overview of a communications processor; a data path protocol support model for a communications processor; an exemplary network processor employed as the core packet processor for the communications processor; an exemplary rings-based SOC switch fabric architecture; and a variety of quality of support features.

Communications system using rings architecture

A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view The Supercomputer exploits technological advances in VLSI that enables a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC) Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, with each having full access to all system resources and enabling adaptive partitioning of the processors to functions such as compute or messaging I/O on an application by application basis, and preferably, enable adaptive partitioning of functions in accordance with various algorithmic phases within an application, or if I/O or other processors are underutilized, then can participate in computation or communication nodes are interconnected by a five dimensional torus network with DMA that optimally maximize the throughput of packet communications between nodes and minimize latency

Multi-petascale highly efficient parallel supercomputer

The PERCS system was designed by IBM in response to a DARPA challenge that called for a high-productivity high-performance computing system. A major innovation in the PERCS design is the network that is built using Hub chips that are integrated into the compute nodes. Each Hub chip is about 580 mm$^2$ in size, % uses 45 nm IBM CMOS 12S0 SOI technology with 13 levels of metal, has over 3700 signal I/Os, and is packaged in a module that also contains LGA-attached optical electronic devices. The Hub module implements five types of high-bandwidth interconnects with multiple links that are fully-connected with a high-performance internal crossbar switch. These links provide over 9 Tbits/second of raw bandwidth and are used to construct a two-level direct-connect topology spanning up to tens of thousands of \PS{} chips with high bisection bandwidth and low latency. The Blue Waters System, which is being constructed at NCSA, is an exemplar large-scale PERCS installation. Blue Waters is expected to deliver sustained Pet scale performance over a wide range of applications. The Hub chip supports several high-performance computing protocols (e.g., MPI, RDMA, IP) and also provides a non-coherent system-wide global address space. Collective communication operations such as barriers, reductions, and multi-cast are supported directly in hardware. Multiple routing modes including deterministic as well as hardware-directed random routing are also supported. Finally, the Hub module is capable of operating in the presence of many types of hardware faults and gracefully degrades performance in the presence of lane failures.

/pdf/the-percs-high-performance-interconnect-rx1ovyeryi.pdf

The PERCS High-Performance Interconnect

A multiprocessor data processing system comprising a plurality of processing units, a plurality of caches, that is each affiliated with one of the processing units, and processing logic that, responsive to a receipt of a first system bus response to a coherency operation, causes the requesting processor to execute operations utilizing super-coherent data. The data processing system further includes logic eventually returning to coherent operations with other processing units responsive to an occurrence of a pre-determined condition. The coherency protocol of the data processing system includes a first coherency state that indicates that modification of data within a shared cache line of a second cache of a second processor has been snooped on a system bus of the data processing system. When the cache line is in the first coherency state, subsequent requests for the cache line is issued as a Z 1 read on a system bus and one of two responses are received. If the response to the Z 1 read indicates that the first processor should utilize local data currently available within the cache line, the first coherency state is changed to a second coherency state that indicates to the first processor that subsequent request for the cache line should utilize the data within the local cache and not be issued to the system interconnect. Coherency state transitions to the second coherency state is completed via the coherency protocol of the data processing system. Super-coherent data is provided to the processor from the cache line of the local cache whenever the second coherency state is set for the cache line and a request is received.

High performance symmetric multiprocessing systems via super-coherent data mechanisms

Disclosed is a processor, which reduces issuing of unnecessary barrier operations during instruction processing. The processor comprises an instruction sequencing unit and a load store unit (LSU) that issues a group of memory access requests that precede a barrier instruction in an instruction sequence. The processor also includes a controller, which in response to a determination that all of the memory access requests hit in a cache affiliated with the processor, withholds issuing on an interconnect a barrier operation associated with the barrier instruction. The controller further directs the load store unit to ignore the barrier instruction and complete processing of a next group of memory access requests following the barrier instruction in the instruction sequence without receiving an acknowledgment.

Multi-level multiprocessor speculation mechanism

A method of handling load-and-reserve instructions in a multi-processor computer system wherein the processing units have multi-level caches. Symmetric multi-processor (SMP) computers use cache coherency to ensure the same values for a given memory address are provided to all processors in the system. Load-and-reserve instructions used, for example, in quick read-and-write operations, can become unnecessarily complicated. The present invention provides a method of accessing values in the computer's memory by loading the value from the memory device into all of said caches, and sending a reserve bus operation from a higher-level cache to the next lower-level cache only when the value is to be cast out of the higher cache, and thereafter casting out the value from the higher cache after sending the reserve bus operation. This procedure is preferably used for all caches in a multi-level cache architecture, i.e., when the value is to be cast out of any given cache, a reserve bus operation is sent from the given cache to the next lower-level cache (i.e., the adjacent cache which lies closer to the bus), but the reserve bus operation is not sent to all lower caches. Any attempt by any other processing unit in the computer system to write to an address of the memory device which is associated with the value will then be forwarded to all higher-level caches. The marking of the block as reserved is removed in response to any such attempt to write to the address.

Demand-based larx-reserve protocol for SMP system buses

A method of maintaining cache coherency, by designating one cache that owns a line as a highest point of coherency (HPC) for a particular memory block, and sending a snoop response from the cache indicating that it is currently the HPC for the memory block and can service a request The designation may be performed in response to a particular coherency state assigned to the cache line, or based on the setting of a coherency token bit for the cache line The processing units may be grouped into clusters, while the memory is distributed using memory arrays associated with respective clusters One memory array is designated as the lowest point of coherency (LPC) for the memory block (ie, a fixed assignment) while the cache designated as the HPC is dynamic (ie, changes as different caches gain ownership of the line) An acknowledgement snoop response is sent from the LPC memory array, and a combined response is returned to the requesting device which gives priority to the HPC snoop response over the LPC snoop response

Ravi Kumar Arimilli

Papers

The PERCS High-Performance Interconnect

High performance symmetric multiprocessing systems via super-coherent data mechanisms

Multi-level multiprocessor speculation mechanism

Demand-based larx-reserve protocol for SMP system buses

Multiprocessor system in which a cache serving as a highest point of coherency is indicated by a snoop response