The Wishbone II transaction bus: Another grade of speed
May 09, 2008
Wishbone specifications have been released by OpenCores and Silicore with the aim to provide a standard IP core interconnection scheme to fulfill requirements of modern System-on-Chip (SoC) designs, including CPUs, DMA engines, memory interfaces, peripheral interfaces, and so on.
The andEuros company has used the Wishbone specification since its inception and has developed an improved version of the Wishbone bus, called Wishbone II. It proposes an advanced pipelined architecture in which read and write transactions are separated and the bus acts as a transaction bus. In this way, multiple transactions can be in flight at the same time, which removes the latencies along the path, and a new per-cell locking concept removes the bus stalls caused by RMW cycles. The ultimate benefit is that bus throughput is raised toward its maximum.
Design and development of large-scale FPGA/ASIC SoC designs have forced designers to implement a modular architecture with a standardized module interface that connects various IP modules in any possible configuration. One of the most popular interconnect architectures is the Wishbone B.3 bus, released by OpenCores (www.opencores.org). In a similar way, Altera has introduced its own interconnect scheme called Avalon Bus (www.altera.com), around which SOPC Builder and Nios (II) Systems are built. Xilinx has also introduced its own bus called the On-Chip Peripheral Bus, combined with the Processor Local Bus (www.xilinx.com).
These interconnect architectures are single-transaction and master/slave oriented, meaning that a CPU requesting a word from a given address stalls itself and the path (bus) to the destination until this word is received. Many bus cycles are lost in this way, giving lower actual data throughput than expected despite the relatively high system bus frequency. Even with fast burst reads and writes introduced by special signals, bus cycles are still lost until the first word is received, and this comes at the additional cost of duplicating the burst logic on both sides, source and destination. Bus stalling is more evident when accessing slower modules with greater latencies. In these cases, system performance degrades dramatically; for example, a 100 MHz system may see its throughput fall as low as a few MB per second.
That is why there was a desperate need to develop bus architectures employing new concepts. Some new signals have been introduced to support new transaction bus concepts based on the Wishbone B.3 architecture, overcoming latency issues while maintaining backwards compatibility.
Wishbone II transaction bus concept
In our proposed bus, transactions are represented by a transaction vector containing:
- Source (module) address
- Destination (module) address
- Operator
- Data
Source and destination addresses define the path; the operator describes one or more operations to be executed along the path and/or at the destination address; and some operations require supplemental data to complete the transaction. An actual implementation requires additional handshaking signals.
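The four fields of a transaction vector can be pictured as a small record. The following is only an illustrative sketch in Python, not part of the Wishbone II proposal: the names `TransactionVector` and `Operator`, the two example operators, and the use of plain integers for addresses and data are all assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Operator(Enum):
    """Hypothetical operator codes; the real encoding is not given in the text."""
    READ = "read"
    WRITE = "write"


@dataclass(frozen=True)
class TransactionVector:
    source: int                  # source (module) address
    destination: int             # destination (module) address
    operator: Operator           # operation to execute along the path or at the destination
    data: Optional[int] = None   # supplemental data, only for operations that need it

    def is_complete(self) -> bool:
        # Assumption: a write must carry data, a read request must not.
        if self.operator is Operator.WRITE:
            return self.data is not None
        return self.data is None


write_vec = TransactionVector(source=1, destination=0x40, operator=Operator.WRITE, data=0xAB)
read_vec = TransactionVector(source=1, destination=0x40, operator=Operator.READ)
```

Making the record immutable mirrors the idea that, once placed, a vector is owned by the bus rather than by its source.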
Transaction vectors are placed onto a transaction bus, which transports each vector from source to destination and executes bus-oriented operations as requested by the vector. Once the transaction vector is placed (sent), the source has no further responsibility and the transaction bus takes complete control over it. The source is then ready to issue the next transaction vector. Multiple requests may be issued in advance, one per bus cycle, which removes the need for prediction logic at the destination module to support the various kinds of burst reads or writes.
There are two kinds of transactions:
- Independent
- Dependent (when their order is important)
To support dependent transactions, the transaction bus must never change the order of already placed transactions. The transaction bus features a fully acknowledged mechanism to accept new transaction vectors, execute internal forwarding, and deliver to the destination module. From the outside, the architecture appears as a simple input-output black box; internally, the implementation is based on a multi-pipelined structure where each (FIFO) line holds one transaction vector.
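The ordering guarantee can be illustrated with a toy software model of a single pipeline. This is a sketch under simplifying assumptions, not the authors' implementation: the class name `PipelinedBus`, the `offer` and `step` methods, and the single linear pipeline are invented for this example, and the boolean returned by `offer` merely stands in for the acknowledge.

```python
from typing import List, Optional


class PipelinedBus:
    """Toy model: a row of stages, each holding at most one transaction vector."""

    def __init__(self, depth: int) -> None:
        self.stages: List[Optional[object]] = [None] * depth

    def offer(self, vector: object) -> bool:
        """Try to place a vector; True plays the role of the acknowledge."""
        if self.stages[0] is not None:
            return False  # first stage busy: the source must retry later
        self.stages[0] = vector
        return True

    def step(self, destination_ready: bool = True) -> Optional[object]:
        """Advance one bus cycle and return the vector delivered, if any."""
        delivered = None
        if destination_ready and self.stages[-1] is not None:
            delivered = self.stages[-1]
            self.stages[-1] = None
        # Walk from the back so each vector advances at most one stage per cycle.
        for i in range(len(self.stages) - 1, 0, -1):
            if self.stages[i] is None and self.stages[i - 1] is not None:
                self.stages[i] = self.stages[i - 1]
                self.stages[i - 1] = None
        return delivered


bus = PipelinedBus(depth=2)
accepted_a = bus.offer("A")
refused_b = bus.offer("B")      # refused: stage 0 still holds "A"
first = bus.step()              # "A" moves forward, nothing delivered yet
accepted_b = bus.offer("B")
order = [bus.step(), bus.step(), bus.step()]
```

Because vectors only ever move forward into an empty neighbouring stage, one cannot overtake another, which is the property dependent transactions rely on.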
The Wishbone II transaction bus proposes four basic operations only:
- Single read
- Single write
- Cell lock
- Bus lock
Single reads and writes are issued by modules, whereas cell and bus locking operations belong to the transaction bus domain. Burst reads and burst writes are accomplished by issuing a stream of read or write transactions. RMW cycles are supported through the bus lock or, better, through the new cell locking concept, which instead of stalling the complete SoC bus locks only one or more memory cells to a given owner. These cells cannot be accessed by others until they are unlocked.
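Expressing a burst as a stream of single transactions is simple to show in code. The sketch below is illustrative only: `SingleRead` and `burst_read` are hypothetical names, and the assumption that consecutive burst addresses differ by one is made for this example.

```python
from typing import Iterator, NamedTuple


class SingleRead(NamedTuple):
    source: int    # issuing module
    address: int   # destination address to read


def burst_read(source: int, base: int, length: int) -> Iterator[SingleRead]:
    """Yield one single-read transaction per word, intended as one per bus cycle."""
    for offset in range(length):
        yield SingleRead(source=source, address=base + offset)


burst = list(burst_read(source=1, base=0x100, length=3))
```

The destination sees only ordinary single reads, so it needs no dedicated burst logic.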
Wishbone II signals
A Wishbone II transaction vector builds on the signals of the Wishbone B.3 specifications and introduces the following new signals:
- WB_ACW: Write Acknowledge
- WB_ACR: Read Acknowledge
- WB_TGA: Address Tag in both directions
- WB_ALK: Address Lock
In the text that follows, the prefix WB may be replaced by WBM to denote a master interface or by WBS to denote a slave interface, or it may be omitted to describe any master or slave interface. Input signals carry the suffix _I and output signals the suffix _O. The proposed bus discards the Wishbone B.3 ACK signal, since its functionality is now split between the ACR and ACW signals. Complete basic signal descriptions for master and slave are listed in Table 1. New signals are marked in bold.
Wishbone II bus transactions
Write transactions
A write transaction is almost identical to the write transaction given in the Wishbone B.3 specifications, except that Wishbone II uses the ACW signal to acknowledge a write cycle. Read requests are placed on the bus in the same way as write transactions and differ only in the state of the destination operation signal WE, which is set for a write and cleared for a read.
Read transactions
A read transaction is composed of two transactions:
- Read request transaction issued by source
- Read response transaction issued by destination
A read request is sent by the master module, representing the source, by issuing a write-style transaction with the destination operation WE set to read. The master should set the Address Tag Write vector to identify the read response. (If there is a single master, this is not necessary.) The read request transaction is acknowledged in the same way as the write transaction.
The destination completes the transaction by returning a separate read response transaction marked by the acknowledge signal ACR and providing valid data and Address Tag Read information. Address Tag Read is a copy of the Address Tag Write.
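The role of the address tag in pairing responses with requesters can be sketched as follows. This is a simplified software model rather than a description of the actual signalling: `ReadRequest`, `ReadResponse`, `serve`, and `deliver` are hypothetical names, the memory is a plain dictionary, and the tag is assumed to identify the issuing master.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List


@dataclass(frozen=True)
class ReadRequest:
    address: int
    tga: int   # Address Tag Write, assumed here to identify the issuing master


@dataclass(frozen=True)
class ReadResponse:
    data: int
    tga: int   # Address Tag Read, a copy of the request's Address Tag Write


def serve(requests: Deque[ReadRequest], memory: Dict[int, int]) -> List[ReadResponse]:
    """Answer queued read requests in arrival order, echoing each tag."""
    responses = []
    while requests:
        req = requests.popleft()
        responses.append(ReadResponse(data=memory[req.address], tga=req.tga))
    return responses


def deliver(responses: List[ReadResponse]) -> Dict[int, List[int]]:
    """Route each response back to the master named by its tag."""
    per_master: Dict[int, List[int]] = {}
    for resp in responses:
        per_master.setdefault(resp.tga, []).append(resp.data)
    return per_master


memory = {0x10: 0xAA, 0x11: 0xBB, 0x40: 0xCC}
queue = deque([ReadRequest(0x10, tga=1), ReadRequest(0x40, tga=2), ReadRequest(0x11, tga=1)])
received = deliver(serve(queue, memory))
```

With a single master every response carries the same tag, which is why the tag can be omitted in that case.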
Figure 1 shows an example system with one pipeline stage on each of the write (input) and read (output) paths between the source (master) and destination (slave) devices. The system has 1 cycle of latency in each direction; therefore, a request-response loop takes at least 2 wait cycles. The slave (memory) may also perform internal management such as refresh, which adds to the total number of wait states.
Figure 1
Figure 2 depicts the transaction bus data flow for this example: three read request transactions placed by the master as AD0, AD1, and AD2, and the associated read response transactions returned as DO0, DO1, and DO2. The signal WE is assumed to be cleared for all three transactions to indicate read operations. Transactions AD0 and AD1 form a burst, meaning that AD1 = AD0 + 1, while AD2 is an independent transaction triggered in the meantime, for example by an external interrupt that loads its interrupt vector.
Figure 2
Each read request transaction is acknowledged by the ACW signal, and the returned read response transaction is marked (acknowledged) by the ACR signal. Note that latencies may differ from one transaction to the next, due to other higher-priority master(s), memory refresh functions, and so on. In the previous example, AD0 is acknowledged immediately but it takes 3 wait cycles to return DO0; AD1 is acknowledged 1 cycle later while DO1 is returned in only 2 wait cycles; and DO2 again takes 3 wait cycles. All three transactions are completed in 9 cycles; theoretically, without the two illustrative wait cycles, they would complete in only 7 cycles. Using the Wishbone B.3 specifications, the same scenario is shown in Figure 3.
Figure 3
Here again AD0 and AD1 form a burst, AD1 = AD0 + 1, and AD2 is an independent request. All three transactions are completed in 12 cycles, which decreases performance by 41 percent (compared with the minimum of 7 cycles in Wishbone II), even at the additional silicon cost of implementing memory burst logic on both sides, source and destination.
In a continuous burst, Wishbone II would perform with 0 wait cycles (latency is completely hidden), and there would be no loss at the slave side (again 0 wait cycles) for issuing the first word when more than one master coexists in the system. As an illustration, for a system running at 150 MHz, long bursts with a fixed latency of 2 cycles would yield a Wishbone II bandwidth of 150 Mwords per second, but a Wishbone B.3 bandwidth of only 50 Mwords per second.
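The figures quoted above can be reproduced with a few lines of arithmetic. The sketch below rests on two stated assumptions: under Wishbone B.3 every word occupies the bus for one transfer cycle plus the full latency, and under Wishbone II the pipeline stays full so that one word completes per cycle. The function names are invented for this example.

```python
def b3_words_per_second(clock_hz: int, latency_cycles: int) -> float:
    # Assumption: each word costs one transfer cycle plus the full latency.
    return clock_hz / (latency_cycles + 1)


def wb2_words_per_second(clock_hz: int) -> float:
    # Assumption: the pipeline is full, so one word completes every cycle.
    return float(clock_hz)


def relative_slowdown(slow_cycles: int, fast_cycles: int) -> float:
    """Fraction of the slower run that the faster run saves."""
    return (slow_cycles - fast_cycles) / slow_cycles


b3_rate = b3_words_per_second(150_000_000, latency_cycles=2)
wb2_rate = wb2_words_per_second(150_000_000)
slowdown = relative_slowdown(slow_cycles=12, fast_cycles=7)
```

Under these assumptions the two rates are 50 and 150 Mwords per second, and the slowdown is 5/12, which corresponds to the quoted 41 percent.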
Read-modify-write cycles and exclusive bus/address locking
A read-modify-write cycle can be performed using the bus LOCK signal: the master issues the read request with the LOCK signal set, waits for the read response, performs the write, and finally releases LOCK. To avoid stalling the complete bus, Wishbone II introduces a per-cell memory locking feature using the ALK signal. It is used in almost the same way as the Wishbone LOCK signal, but instead of stalling the complete bus it grants exclusive access to the locked cells to a given module, identified by its source TGA.
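A per-cell lock can be modelled in software as a table mapping each locked address to its owner. The following is a minimal sketch of that idea, not the hardware mechanism itself: `CellLockTable` and `read_modify_write` are hypothetical names, the owner is identified by an integer standing in for the source TGA, and the memory is a plain dictionary.

```python
from typing import Callable, Dict


class CellLockTable:
    """Toy model of ALK: each locked cell remembers the TGA of its owner."""

    def __init__(self) -> None:
        self._owner: Dict[int, int] = {}

    def lock(self, address: int, tga: int) -> bool:
        # Succeeds if the cell is free or already owned by the same module.
        return self._owner.setdefault(address, tga) == tga

    def unlock(self, address: int, tga: int) -> bool:
        if self._owner.get(address) == tga:
            del self._owner[address]
            return True
        return False

    def may_access(self, address: int, tga: int) -> bool:
        owner = self._owner.get(address)
        return owner is None or owner == tga


def read_modify_write(memory: Dict[int, int], locks: CellLockTable, address: int,
                      tga: int, modify: Callable[[int], int]) -> bool:
    """Lock one cell, update it, and unlock it; other cells stay accessible."""
    if not locks.lock(address, tga):
        return False
    memory[address] = modify(memory[address])
    locks.unlock(address, tga)
    return True


memory = {0x20: 5, 0x21: 9}
locks = CellLockTable()
locks.lock(0x20, tga=1)                       # master 1 holds cell 0x20
blocked = locks.may_access(0x20, tga=2)       # master 2 is kept out of that cell
other_cell = locks.may_access(0x21, tga=2)    # but the rest of memory is not stalled
refused = read_modify_write(memory, locks, 0x20, tga=2, modify=lambda v: v + 1)
locks.unlock(0x20, tga=1)
done = read_modify_write(memory, locks, 0x20, tga=2, modify=lambda v: v + 1)
```

The point of the model is the contrast with a bus-wide lock: only accesses to the locked cell are refused, while accesses to every other address proceed.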
Wishbone II races into the future
The Wishbone II bus proposes an advanced transaction bus-oriented architecture for SoC designs in FPGAs and ASICs, in which write and read operations are handled as separate write and read transactions. Each transaction is stored in a single line, and the multi-pipeline architecture acts as a FIFO buffer transporting multiple transactions from and to multiple source and destination modules. A temporary per-cell locking mechanism prevents RMW cycles from stalling the complete bus. In this way, overall data throughput is raised toward its maximum while the design successfully integrates slow- and high-speed, low- and high-latency peripherals and CPUs.
Uros Platise has been R&D manager for more than 10 years at andEuros, specializing in electronics, robotics, and software engineering. His expertise includes FPGA architectures, price/performance optimizations, communication protocols, sensor networks, and so forth. He will receive his PhD in Isotropic Networks, from JSI in Ljubljana, Slovenia. Uros can be contacted at [email protected].
andEuros
+385-52-777-341
www.andEuros.org/erd