Apr 11, 2013

Optimising the JTAG scan

Aficionados of 2Wire kit will already know of the Tripod site, an excellent unofficial resource for these popular and powerful devices. The website can be found at [1].

Bill, the webmaster of the Tripod site, has done a sterling job for some years now. He tracks new firmware for the 2Wire, monitors firmware incompatibilities and follows new firmware rollout programmes. Bill also documents the ways to block or circumvent any undesirable firmware ‘features’.

Recently, Bill recounted his own experience of the painfully slow process of re-flashing an embedded device via JTAG [2]. We are developing an open source JTAG tool for the TriMedia, and Bill’s account served as a timely reminder to optimise the JTAG scan chain whenever possible!

The designers of the TriMedia core, Philips Semiconductors, clearly recognised the issue of long scan times. Several useful mechanisms are built into the TM3260 JTAG controller to mitigate the problem.

Described below are two key techniques for JTAG optimisation which are found in the TriMedia.

One optimisation method sees two JTAG registers combined into one. The ifull handshaking bit of the CTRL2 JTAG register and the 32-bit DATA IN JTAG register are joined in serial to form a single virtual register of 33 bits. This virtual register is named IFULLIN. This combination of control and data register can see a dramatic reduction in JTAG scan-in time. This is illustrated best by reference to the JTAG state machine diagram, above.

Without the virtual register, transferring 32 bits of data to the TriMedia over JTAG would involve the following steps. The instruction to select the DATAIN register would be shifted in. The 32 bits would then be scanned in to the JTAG data register (DR). Then the instruction to select the CTRL2 register would be shifted in. Finally, to tell the TriMedia target that data is ready, the 8 control bits, including the ifull handshaking bit, would need to be scanned in to the DR.

However, with a virtual register combining both the DATAIN register and the ifull bit from the CTRL2 register, the scan is shortened as follows: the JTAG controller need only shift in one instruction (to select the virtual register) before scanning in the data register. The scanned-in data is 33 bits long and contains the CTRL2.ifull bit as well as the 32 data bits.

By eliminating the second of those two-part operations – the instruction shift in to select the CTRL2 register and then the 8-bit scan in of the data register – the time needed to download object code to a TriMedia for a JTAG boot is reduced by 45% according to our tests.
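To make the saving concrete, here is a rough model that counts TCK cycles for each approach, following the standard IEEE 1149.1 TAP state machine and starting and ending each scan in Run-Test/Idle. It is only a sketch: the 5-bit instruction register length (IR_BITS) is an assumption for illustration, the function names are ours, and the model ignores cable and USB overhead, so it will not reproduce the measured figure exactly.

```python
# Rough TCK-cycle model of JTAG scans that start and end in Run-Test/Idle.
# IR_BITS is an ASSUMPTION for illustration; check the TM3260 documentation.
IR_BITS = 5


def ir_scan_clocks(ir_bits: int = IR_BITS) -> int:
    """TCK cycles for one instruction scan.

    Run-Test/Idle -> Select-DR -> Select-IR -> Capture-IR -> Shift-IR: 4 clocks
    ir_bits clocks in Shift-IR (the last one also moves to Exit1-IR)
    Exit1-IR -> Update-IR -> Run-Test/Idle: 2 clocks
    """
    return 4 + ir_bits + 2


def dr_scan_clocks(dr_bits: int) -> int:
    """TCK cycles for one data scan.

    Run-Test/Idle -> Select-DR -> Capture-DR -> Shift-DR: 3 clocks
    dr_bits clocks in Shift-DR (the last one also moves to Exit1-DR)
    Exit1-DR -> Update-DR -> Run-Test/Idle: 2 clocks
    """
    return 3 + dr_bits + 2


def separate_registers(ir_bits: int = IR_BITS) -> int:
    """Select DATAIN, scan 32 bits, select CTRL2, scan 8 bits."""
    return (ir_scan_clocks(ir_bits) + dr_scan_clocks(32)
            + ir_scan_clocks(ir_bits) + dr_scan_clocks(8))


def virtual_register(ir_bits: int = IR_BITS) -> int:
    """Select IFULLIN, then scan all 33 bits in one go."""
    return ir_scan_clocks(ir_bits) + dr_scan_clocks(33)


if __name__ == "__main__":
    print("separate registers:", separate_registers(), "clocks per word")
    print("virtual register:  ", virtual_register(), "clocks per word")
    print("IFULLIN, IR kept:  ", dr_scan_clocks(33), "clocks per word")
```

Under the assumed 5-bit instruction register, the model gives 72 clocks per word with separate registers, 49 with the virtual register, and 38 if the instruction is shifted in only once and every later word needs just the 33-bit data scan.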

Similar savings in scan time are obtained by combining the CTRL1 and the DATA OUT registers into another virtual register of 33 bits. This virtual register is labelled OFULLOUT.

The virtual JTAG registers allow us to greatly optimise the download function. We need shift in the instruction to select the IFULLIN virtual register only once. In the main loop of the function, we repeatedly scan in the data register containing all 33 bits of IFULLIN. Those bits are the 32 bits of data followed by the handshaking control bit, CTRL2.ifull. This is the most efficient method for downloading.
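The shape of that loop can be sketched as follows. This is not the code from our tool, just a minimal illustration: shift_ir and shift_dr are hypothetical callbacks standing in for whatever the cable driver provides, the opcode passed as select_ifullin is a placeholder, and both the little-endian word order and the placement of ifull as the bit above the 32 data bits are assumptions. The sketch also leaves out any pacing between words.

```python
import struct
from typing import Callable, Iterator

# ASSUMPTION: ifull sits just above the 32 data bits in the 33-bit frame.
IFULL_BIT = 1 << 32


def ifullin_frames(payload: bytes) -> Iterator[int]:
    """Yield one 33-bit IFULLIN value per 32-bit word of payload."""
    # Pad to a whole number of 32-bit words (little-endian is an assumption).
    padded = payload + b"\x00" * (-len(payload) % 4)
    for (word,) in struct.iter_unpack("<I", padded):
        yield IFULL_BIT | word


def download(payload: bytes,
             shift_ir: Callable[[int], None],
             shift_dr: Callable[[int, int], None],
             select_ifullin: int) -> int:
    """Send payload through IFULLIN and return the number of words sent."""
    shift_ir(select_ifullin)      # select the virtual register once only
    count = 0
    for frame in ifullin_frames(payload):
        shift_dr(frame, 33)       # 32 data bits plus ifull in a single scan
        count += 1
    return count


if __name__ == "__main__":
    ir_log, dr_log = [], []
    sent = download(b"\x01\x02\x03\x04\xff",
                    ir_log.append,
                    lambda value, bits: dr_log.append((value, bits)),
                    select_ifullin=0x00)  # placeholder, not the real opcode
    print(sent, "words,", len(ir_log), "instruction scan(s)")
```

However many words are sent, the instruction scan happens once; everything else is a run of 33-bit data scans.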

A second, less obvious saving in scan time is achieved by capturing the state of the CTRL bits during a shift in of a TM32 JTAG instruction. This mechanism removes the need to explicitly select and scan out a CTRL register, just to obtain the control flag status.

The shortcut allows the status of the JTAG control bits to be incidentally obtained from the output captured from scanning in any JTAG instruction.

Here, however, the operation of the TM3260 JTAG controller and the official Philips documentation for the controller were found not to tally. [3]

One of the major issues we discovered with the TriMedia’s JTAG controller is that the CTRL2.ifull bit cannot be reliably read from the TAP interface. This issue runs contrary to the claims in the official documentation. The CTRL2.ifull bit and the ofull bit are vital for handshaking between the TM32 target and the JTAG host that connects to the TriMedia via the TAP interface.

Several methods for reading the ifull bit via the TAP interface were tried without success:
  • select the CTRL2 register, scan out the contents, including the ifull bit.
  • select and scan out the 33-bit IFULLIN virtual register, including the ifull bit.
  • obtain control bits (ifull, ofull and sleepless) in captured output from a shifted-in instruction.
None of these methods can reliably capture the state of the CTRL2.ifull handshaking bit from the TAP interface.

It was also found, regarding the third method listed above, that the control bits are not in the bit positions described in the official TriMedia literature. The source code for our tool, and the output below, clarify our findings on the true positions of those control bits:

asbokid@home:~/asboapps$ sudo ./2wiglet -c usbblaster -B testfile.bin
2Wiglet v0.5 - (c) 2011 asbokid
JTAG tool for 2Wire Routers with a TriMedia TM32 core

Searching for cable driver: usbblaster
usbblaster USB cable driver found
Connected to libftdi driver.
Connected to UsbBlaster cable
Waiting for JTAG chain to stabilise
Received IDCODE 3269b4c1 (2Wire Ares)
Current ctrl flags: 0x00 [ | | ]
L1BOOT_READY = 12340002
Current ctrl flags: 0x15 [ofull|ifull|sleepless]
MMIO_BASE = 1be00000 want: 1be00000
Current ctrl flags: 0x05 [ |ifull|sleepless]
Current ctrl flags: 0x15 [ofull|ifull|sleepless]
DRAM_LO = 40000000 want: 40000000
Current ctrl flags: 0x05 [ |ifull|sleepless]
Current ctrl flags: 0x15 [ofull|ifull|sleepless]
DRAM_HI = 44000000 want: 44000000
Current ctrl flags: 0x05 [ |ifull|sleepless]
Current ctrl flags: 0x15 [ofull|ifull|sleepless]
DRAM_CLIMIT = 44000000 want: 44000000
Current ctrl flags: 0x05 [ |ifull|sleepless]
LOAD ADDRESS = 40100000
Current ctrl flags: 0x05 [ |ifull|sleepless]
Current ctrl flags: 0x11 [ofull| |sleepless]

In summary: it was found that the CTRL2.ifull bit behaves like an interrupt control line. It can be asserted externally, but the bit itself can only be reliably read (and cleared) internally, by the JTAG controller on the TriMedia core. Consequently, the ifull bit must be considered as a one-way handshaking flag.
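The flag values printed in the log above imply the bit masks used in the small decoder below. The masks are inferred purely from that output (0x15, 0x05, 0x11 and 0x00), not from the Philips documentation, and the function name is ours. The log does not show what the remaining bits mean, so the sketch ignores them.

```python
# Masks inferred from the log output above, NOT from the official literature.
OFULL = 0x10
IFULL = 0x04
SLEEPLESS = 0x01

_FLAG_NAMES = [(OFULL, "ofull"), (IFULL, "ifull"), (SLEEPLESS, "sleepless")]


def format_ctrl_flags(flags: int) -> str:
    """Render captured control bits in the same style as the log lines."""
    # A flag that is clear is shown as a single space, as in the log.
    cells = [name if flags & mask else " " for mask, name in _FLAG_NAMES]
    return "Current ctrl flags: 0x%02x [%s]" % (flags, "|".join(cells))


if __name__ == "__main__":
    for value in (0x00, 0x15, 0x05, 0x11):
        print(format_ctrl_flags(value))
```

Run against the four values seen in the log, the decoder reproduces the corresponding log lines.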

Those limitations aside, it is certainly still possible to use the TM3260 JTAG controller for the efficient and reliable transfer of data both to and from the TriMedia.

In tests, a net rate of ~13.5kBytes per second was achieved in transfers from host to TM32 using a clone Altera USB-Blaster JTAG programmer. The USB-Blaster was connected to an x86 PC via a USB 2.0 bus.

Below are logs of the download of 1Mbyte of randomly-generated data. The transfer took 76.31 seconds and attained 13.42kBytes/sec.

At that speed it would take roughly 40 minutes to transfer the whole 32MBytes of data stored in the NAND flash array on the 2Wire 2701 PCB.

asbokid@home:~/asboapps$ dd if=/dev/urandom of=testfile1M.bin bs=1K count=1K
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB) copied.

asbokid@home:~/asboapps$ sudo ./2wiglet -c usbblaster -B testfile1M.bin,0x40100000
2Wiglet v1.0 - (c) 2012 asbokid
JTAG tool for 2Wire Routers with a TriMedia TM32 core

Found USB cable driver for usbblaster
Connected to libftdi driver.
Connected to UsbBlaster cable
Waiting for JTAG chain to stabilise
Received IDCODE 3269b4c1 (2Wire Ares)
Download 1048576 bytes from 'testfile1M.bin' to 40100000

Waiting for L1BOOT_READY from TM32 target
L1BOOT_READY <- 12340002
MMIO_BASE    1be00000  expected: 1be00000
DRAM_LO      40000000  expected: 40000000
DRAM_HI      44000000  expected: 44000000
DRAM_CLIMIT  44000000  expected: 44000000
LOAD ADDRESS -> 40100000
CODE SIZE -> 00100000 (1048576)

Started L2 download..
L2 load done.

Comparing checksums:
PC MONITOR=07f922c7, TM32 TARGET=07f922c7
Checksums good.

Download complete
Elapsed time 76.31 secs (avg 13.42kB/sec)
Freed buses and JTAG chain

That’s hardly an earth-shattering transfer speed but it appears to be the maximum for that particular programmer using the standard JTAG monitor code running on the TriMedia, and after applying the optimisations described above.
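For anyone wanting to check the figures quoted earlier, the arithmetic is worked through below. It assumes that kBytes means 1024 bytes and that 32MBytes means 32 x 1024 x 1024 bytes; the function names are ours.

```python
def throughput_kib_per_s(num_bytes: int, seconds: float) -> float:
    """Average transfer rate in units of 1024 bytes per second."""
    return num_bytes / 1024 / seconds


def transfer_minutes(num_bytes: int, kib_per_s: float) -> float:
    """Time in minutes to move num_bytes at the given rate."""
    return num_bytes / 1024 / kib_per_s / 60


# Figures from the 1Mbyte test download logged above.
rate = throughput_kib_per_s(1048576, 76.31)

# Projected time to move the whole 32MByte NAND flash at the same rate.
minutes = transfer_minutes(32 * 1024 * 1024, rate)

if __name__ == "__main__":
    print("rate: %.2f kB/sec" % rate)
    print("32MBytes: %.1f minutes" % minutes)
```

The rate works out at 13.42kBytes/sec and the projected time for 32MBytes at about 40.7 minutes, in line with the figures above.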

[1] http://bt2700hgv.tripod.com/
[2] http://logikir100.tripod.com/JTAG.htm
[3] http://www.tridentmicro.com/wp-content/uploads/2010/01/UM101041.pdf

