Please note: Do It Yourself articles and guides are intended for technically advanced users. Please review important cautionary information at the end of this page. Republished articles presented in the Do It Yourself section do not necessarily reflect the opinions or positions of AMD.
1GHz HyperTransport™ Technology: A Milestone in Speed
“Texpert”

When the AMD Opteron™ processor emerged in 2003 – closely followed by the AMD Athlon™ 64 and AMD Athlon 64 FX processor families – it featured a number of innovations, including an integrated memory controller, large cache repositories, and 64-bit extensions. The new processor also featured HyperTransport™ technology, an optimized board-level architecture also incorporated into products by industry leaders such as NVIDIA, Broadcom, Sandcraft, Altera, Teradyne, and Dolphin Technology.
HyperTransport™ Technology: A Reaction to the Need for Speed
The Right Tool for the Job
The Inner Workings of HyperTransport Technology
Enabling Low Latency: Going Parallel
AMD64 and HyperTransport Technology
Making the Connection
Conclusion
HyperTransport™ Technology: A Reaction to the Need for Speed

It is no secret that enthusiasts drive the creation of new standards and technologies. The development of HyperTransport technology grew out of their fear that I/O buses were inadequate for emerging bandwidth demands at a time when most board-level designs centered on shared, parallel, multi-drop buses.
According to the “old way,” a Northbridge device – connected to a Southbridge – enabled communication over a fast processor bus between the CPU, system memory, and a graphics adapter. A slower Southbridge device managed several I/O interfaces, operating through the Northbridge over another proprietary connection*.
AMD began developing HyperTransport technology in 1997 to provide more bandwidth, lower latency, and lower pin counts to help contend with power consumption and the thermal constraints of an expanding array of form factors. Eight other prominent companies joined the HyperTransport Technology Consortium, a non-profit organization responsible for managing the royalty-free I/O standard.
*Peripheral Component Interconnect (PCI) has long been the bus of choice for board-to-board and chip-to-chip interoperability. With time, PCI’s usefulness has eroded. In video editing, for example, high-definition content is too intensive for the Accelerated Graphics Port (AGP) bus adaptation of PCI. Ever-faster processors compounded these issues, quickly outstripping their buses’ ability to keep up.
The Right Tool for the Job

HyperTransport™ technology is one among several technologies aimed at improving I/O, including RapidIO and PCI Express (PCIE). Each is intended for slightly different applications.
Consider a single PC, with its components and subsystems – its internal I/O buses, which enable communication, are “inside-the-box” technologies. Inside-the-box latency is a significant problem, since the processor can use data faster than the bus can deliver it. (When you deal with multiple networked systems, you deal with “outside-the-box” links, which are less sensitive to component latency.) Low latencies keep processors fed with data.
- RapidIO includes both a serial and a parallel I/O specification – the parallel spec is similar to HyperTransport technology, but suffers from larger packet overhead and does not offer full transparency for PCI
- PCIE employs a point-to-point serial topology to achieve higher throughput than PCI – but while its latency is acceptable between board-level devices, it is poorly suited to communication between multiple processors
- This is where HyperTransport technology comes in – it “naturally” replaces proprietary, shared, multiplexed bus technologies by offering dedicated bandwidth between point-to-point links, while being fully compatible with PCI
The Inner Workings of HyperTransport™ Technology

HyperTransport technology has several advantages over other I/O technologies, particularly in chip-to-chip applications.
Its point-to-point parallel topology employs dual unidirectional links – one upstream, another downstream – to carry standard load/store data and HyperTransport technology packet information. Paths are 2, 4, 8, 16, or 32 bits wide, depending on device-specific bandwidth requirements. It achieves high data rates by using an enhanced, low-swing, 1.2V Low Voltage Differential Signaling (LVDS) scheme that needs fewer sideband signals than a multiplexed bus. Therefore, it employs fewer pins and wires, reducing cost and power requirements. Because LVDS cancels magnetic fields, it radiates less noise than a single-ended signal and can run at much higher frequencies.
HyperTransport technology devices are organized in a daisy chain, which starts with a required “host” device, terminates with a “cave” device, and may include “bridges” – which allow HyperTransport technology links to interface with other interconnect technologies, such as PCIE – and “tunnels” – which allow the link to pass from one device to the next. A processor usually serves as the host, communicating with up to 32 HyperTransport technology devices on a single chain.
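The chain rules described above can be sketched as a small validity check. This is only an illustrative model, not part of any HyperTransport technology specification or tool: the `Kind` enum and `is_valid_chain` function are hypothetical names, the 32-device limit is taken from the text (with the host not counted), and the rule that only tunnels and bridges sit between host and cave is a simplifying assumption.

```python
from enum import Enum
from typing import Sequence


class Kind(Enum):
    HOST = "host"
    TUNNEL = "tunnel"
    BRIDGE = "bridge"
    CAVE = "cave"


# Device limit per chain as stated in the article (the host is not counted).
MAX_DEVICES_PER_CHAIN = 32


def is_valid_chain(chain: Sequence[Kind]) -> bool:
    """Check the simplified daisy-chain rules described in the text."""
    if len(chain) < 2:
        return False  # need at least a host and a terminating device
    if chain[0] is not Kind.HOST or chain[-1] is not Kind.CAVE:
        return False  # must start with a host and end with a cave
    if len(chain) - 1 > MAX_DEVICES_PER_CHAIN:
        return False  # too many devices behind the host
    # Assumption: only tunnels and bridges may sit between host and cave.
    return all(kind in (Kind.TUNNEL, Kind.BRIDGE) for kind in chain[1:-1])


if __name__ == "__main__":
    print(is_valid_chain([Kind.HOST, Kind.TUNNEL, Kind.CAVE]))
    print(is_valid_chain([Kind.TUNNEL, Kind.CAVE]))
```

A host followed by one tunnel and a terminating cave passes the check, while a chain without a host does not.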
Consider, for example, AMD’s 8151 AGP 3.0 Graphics Tunnel. One side boasts 16-bit input and output paths, the other 8-bit paths, and an integrated AGP bridge facilitates communication with a compatible graphics device. With 3.2GBps per path on the first side, HyperTransport technology delivers more than enough bandwidth to support the fastest AGP 8x protocol.
HyperTransport technology bundles data into packets for efficient transfer and minimal overhead. Command information flows over the data path in 4- or 8-byte packets. The data packet comprises an 8- or 12-byte header and a 4-64 byte payload. In addition, each HyperTransport technology link also contains one clock line per 8-bit data path and one control line, enabling the insertion of a control packet in the middle of a long data packet, thus reducing latency.
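To get a feel for what "minimal overhead" means with these packet sizes, the following sketch computes the share of transmitted bytes that is actual payload. It uses only the header and payload sizes quoted above; the function name `payload_efficiency` is hypothetical, and the model ignores everything else on the link (control packets, flow control, idle time), so treat the numbers as rough upper bounds.

```python
# Sizes in bytes, as quoted in the article.
HEADER_SIZES = (8, 12)
MIN_PAYLOAD, MAX_PAYLOAD = 4, 64


def payload_efficiency(payload_bytes: int, header_bytes: int = 8) -> float:
    """Fraction of a data packet's bytes that carry payload."""
    if header_bytes not in HEADER_SIZES:
        raise ValueError("header must be 8 or 12 bytes")
    if not MIN_PAYLOAD <= payload_bytes <= MAX_PAYLOAD:
        raise ValueError("payload must be between 4 and 64 bytes")
    return payload_bytes / (payload_bytes + header_bytes)


if __name__ == "__main__":
    # Largest payload with the short header, then smallest with the long one.
    print(f"{payload_efficiency(64, 8):.1%}")
    print(f"{payload_efficiency(4, 12):.1%}")
```

Under these assumptions a full 64-byte payload behind an 8-byte header is roughly 89 percent efficient, while a 4-byte payload behind a 12-byte header is 25 percent efficient, which is why long transfers benefit most.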
Enabling Low Latency: Going Parallel

The storage market is transitioning from parallel IDE and SCSI technologies to Serial ATA and Serial Attached SCSI (SAS), both designed to enable better performance through higher operating frequencies, thinner, longer cables, and improved scalability. At the same time, PCIE is replacing PCI for board-to-board I/O and high-definition audio and peripheral connectivity.
When you compare PCIE and HyperTransport™ technology, it is easy to blur the line between serial and parallel technologies. On the one hand, PCIE employs a dual-simplex connection using pairs of differentially driven signals – one pair sends information, the other receives it. On the other hand, HyperTransport technology is also divided into pairs of unidirectional links for communicating in a point-to-point manner. However, there is a simple distinction between the two – HyperTransport technology’s parallel link structure forwards the clock signal, while PCIE eliminates the separate clock by encoding it into the data stream, which increases latency.
Priority Request Interleaving (PRI) contributes to HyperTransport technology's low-latency approach by allowing an 8-byte priority request command to be inserted within a longer, lower-priority data transfer. So if a peripheral device is transferring data to the host and another needs data from it, PRI inserts a control packet into the first transfer. Once the host starts receiving this data, it initiates a transfer to the second peripheral, even as the first transfer remains in progress, thereby improving utilization and responsiveness.
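The interleaving idea can be illustrated with a toy model of the order in which items leave a link. This is a conceptual sketch only: the `interleave` function, the chunk labels, and the notion of an "arrival index" are hypothetical and do not reflect how the hardware schedules packets.

```python
from typing import Iterator, Sequence


def interleave(data_chunks: Sequence[str], priority_packet: str,
               arrival_index: int) -> Iterator[str]:
    """Yield the transmit order when a priority request shows up after
    `arrival_index` chunks of a long transfer have already been sent."""
    for i, chunk in enumerate(data_chunks):
        if i == arrival_index:
            # Slot the priority request in before the next data chunk
            # instead of making it wait for the whole transfer.
            yield priority_packet
        yield chunk
    if arrival_index >= len(data_chunks):
        # The transfer finished before the request arrived.
        yield priority_packet


if __name__ == "__main__":
    transfer = ["D0", "D1", "D2", "D3"]
    print(list(interleave(transfer, "CTRL", 2)))
    # ['D0', 'D1', 'CTRL', 'D2', 'D3']
```

Without interleaving, the request in this example would have to wait behind the two remaining data chunks; with it, the request goes out immediately and the original transfer simply resumes.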
AMD64 and HyperTransport™ Technology

Not only is HyperTransport technology an integral part of the AMD Opteron™ processor’s board-level architecture, it also shapes how multiple AMD Opteron CPUs communicate with each other.
All AMD Opteron processors feature three 16-bit HyperTransport technology links running at 800MHz and theoretically capable of 1.6 gigatransfers per second, or 3.2GBps of bandwidth, in each direction. In a single-processor configuration, the links are non-coherent – all three are available for connection with system I/O components. In a multi-processor configuration, one must be coherent, leaving the others for I/O responsibilities. The coherent link allows each processor to access another’s memory. Thus, an AMD Opteron processor 800-series CPU, designed for four- and eight-way configurations, needs three coherent HyperTransport technology links to communicate with its neighboring processors.
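The link figures above follow from simple arithmetic, sketched here. The function name `link_bandwidth_gbps` is hypothetical, and the model assumes two transfers per clock cycle, which is what the quoted 800MHz and 1.6 gigatransfers per second imply.

```python
def link_bandwidth_gbps(width_bits: int, clock_mhz: float) -> float:
    """Theoretical peak bandwidth of one direction of a link, in GB/s
    (decimal gigabytes), assuming two transfers per clock cycle."""
    transfers_per_second = clock_mhz * 1e6 * 2  # double-pumped clock
    bytes_per_transfer = width_bits / 8
    return transfers_per_second * bytes_per_transfer / 1e9


if __name__ == "__main__":
    # 16-bit link at 800MHz, as on the AMD Opteron processor.
    print(link_bandwidth_gbps(16, 800))
    # Same width at the newer 1GHz link speed.
    print(link_bandwidth_gbps(16, 1000))
```

For a 16-bit link at 800MHz this gives 3.2GBps per direction, matching the figure in the text; the same width at 1GHz works out to 4GBps per direction.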
Each AMD64 architecture-based processor features an integrated DDR memory controller. Because coherent HyperTransport technology links allow one processor to access the information contained in another's memory, the total system bandwidth aggregates. So a dual-processor system with DDR400 memory boasts 12.8GBps of total memory bandwidth, a four-processor system boasts 25.6GBps, and so on. Compared to shared architectures that divide available bus bandwidth for every additional processor, AMD Opteron processors enjoy excellent scaling characteristics.
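The scaling claim can be checked with a short calculation. The sketch assumes each processor's controller drives two 64-bit channels of DDR400 (400 million transfers per second), which is consistent with the 12.8GBps dual-processor figure above; the function names are hypothetical, and the shared-bus comparison is a deliberately crude model in which a fixed bus bandwidth is split evenly.

```python
DDR400_TRANSFERS_PER_SECOND = 400e6
CHANNEL_WIDTH_BYTES = 8      # one 64-bit memory channel
CHANNELS_PER_PROCESSOR = 2   # assumption: dual-channel controller


def per_processor_memory_gbps() -> float:
    """Peak memory bandwidth of one processor's own controller, in GB/s."""
    return (DDR400_TRANSFERS_PER_SECOND * CHANNEL_WIDTH_BYTES
            * CHANNELS_PER_PROCESSOR / 1e9)


def aggregate_memory_gbps(processors: int) -> float:
    """Total peak memory bandwidth when every processor brings its own controller."""
    return processors * per_processor_memory_gbps()


def shared_bus_share_gbps(bus_gbps: float, processors: int) -> float:
    """Per-processor share when a single bus is divided evenly (crude model)."""
    return bus_gbps / processors


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, round(aggregate_memory_gbps(n), 1))
```

With these assumptions one processor has 6.4GBps, two have 12.8GBps, and four have 25.6GBps in aggregate, whereas on a shared bus each added processor shrinks every processor's share.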
Making the Connection
A processor is inconsequential without a robust supporting platform. So while HyperTransport™ technology pays dividends for high-performance, low-latency inter-processor communication, it also lays the foundation for third-party manufacturers to design complementary core logic.
Spearheading platform development, AMD unveiled the AMD-8131 PCI-X Tunnel and the AMD-8111 I/O Hub to coincide with its AMD Opteron™ processor launch.
The PCI-X tunnel connects to the host processor through 16-bit upstream and downstream connections, continuing the HyperTransport technology link through 8-bit up and downstream paths that interface with the next device in the daisy chain. The tunnel features two PCI-X bridges, each supporting a 64-bit data bus, 133MHz devices, and up to five PCI masters apiece.
AMD’s I/O Hub adds the ancillary functions you would normally expect from a Southbridge, including USB compatibility, AC'97 audio, a 10/100 Ethernet controller, IDE support, and its own PCI bus. Because it does not push much bandwidth, the I/O Hub employs 8-bit upstream and downstream paths, delivering 800MBps of bandwidth.
The other components in AMD’s HyperTransport technology chipset lineup are the AMD-8151 AGP 3.0 Graphics Tunnel and AMD-8132 PCI-X 2.0 Tunnel.
Naturally, the graphics tunnel facilitates the connection of an AGP 8x video card, a still essential component for gamers and graphics workstation professionals. (With the increased popularity of high-end PCIE graphics cards, HyperTransport technology tunnels with PCIE bridges will likely replace AGP tunnels.)
The AMD-8132 tunnel builds on the previous PCI-X implementation, adding support for the new 1GHz HyperTransport technology link, up to 266MHz PCI-X peripherals, and 16-bit upstream and downstream on both sides of the tunnel. It uses two bridges, each capable of accommodating five PCI masters.
Because AMD focuses more on delivering compelling content to enthusiasts through processor designs than on its role as a chipset manufacturer, a number of other companies have dedicated significant resources to developing exciting chipset technologies with HyperTransport technology – VIA and SiS offer Southbridges that connect through proprietary bus technologies, while NVIDIA's nForce3 is a single-chip solution with complementary I/O logic built onto the same die.
The new generation of Socket 939 chipsets uses HyperTransport technology and PCIE cooperatively. HyperTransport technology serves as a chip-to-chip inter-connect, while PCIE handles board-to-board communications. Both are compatible with PCI ordering and configuration specifications.
Conclusion
As microprocessor performance climbs, I/O performance scales more slowly, reducing the overall effect of any advances. Meanwhile, networking, graphics, and storage are performing at new levels and demanding a revamped interconnect technology.
HyperTransport™ technology delivers the raw throughput and low latency necessary for chip-to-chip communication. It increases I/O bandwidth, cuts down the number of different system buses, reduces power consumption, provides a flexible, modular bridge architecture, and ensures compatibility with PCI.
HyperTransport technology also enables AMD Opteron™ processors to function in highly scalable 1-, 2-, 4- or 8-way configurations, aggregating memory bandwidth by maintaining cache coherency instead of dividing bandwidth over a shared bus. Single-processor systems, such as those powered by the AMD Athlon™ 64 FX processor, exploit HyperTransport technology to eliminate I/O bottlenecks.
In tomorrow’s computing world, HyperTransport technology will, for example, serve as the backbone for real-time, high-definition video editing, 10Gbps Ethernet, and SAS on a chip level, while PCIE opens up board-level functionality.
The future looks bright for immersive computing with AMD and HyperTransport technology.
Cautionary Statement
Activities and projects described herein may involve the use of tools and materials that may present health and safety hazards. These must be handled carefully and all tools and products should be used strictly according to manufacturers' precautions and instructions for the safe use of the respective tool or product. The techniques described herein may result in the voiding of manufacturers' warranties. The user assumes all risks associated with the techniques described in this article/guide. THIS INFORMATION IS PROVIDED “AS IS” WITH NO WARRANTY, EXPRESS OR IMPLIED. AMD ASSUMES NO RESPONSIBILITY FOR ANY ERRORS CONTAINED IN THIS ARTICLE/GUIDE AND HAS NO LIABILITY OR OBLIGATION FOR ANY DAMAGES ARISING FROM OR IN CONNECTION WITH THE USE OF THIS ARTICLE/GUIDE.