16 Nov 2015
SoC development: do we really understand what is going on?
Gajinder Panesar, UltraSoC CTO, provides some essential guidelines for making system-on-chip (SoC) designs easier and more successful to implement.
The past two decades of SoC evolution have seen an exponential increase in complexity. Today’s devices have multiple processing units, CPUs, GPUs, DSPs, DMAs, third-party IP blocks and custom logic.
As if the hardware is not complicated enough, there will of course be substantial amounts of sophisticated software code running on the SoC, introducing a whole new set of dependencies and uncertainties.
Under these circumstances it is quite literally impossible to fully predict the behaviour of the SoC, and this in turn makes system development difficult. Post-silicon bring-up, hardware/software integration, and testing are made very much harder. The commercial consequences of problems in the development flow are potentially dire: products may be delayed (with potentially damaging effects on their market success); engineering costs increase; products ship with lingering bugs that are simply too difficult to fix; subtle problems occur in the field when there is no effective means to detect or correct them.
In many ways, the worst problems are those which do not produce an outright fault condition, but instead have a more subtle impact. The SoC may consume more power than expected; or perhaps, even though designed with substantial margins in the specification, it will deliver the minimum required data rate and no more.
Getting a grip on all this complexity calls for a fundamental rethink in the way we do SoC development and debug. In particular, it requires robust analytical tools that give the development team actionable information on how the chip is operating as a system.
Moreover, these tools need to be based on more than software instrumentation, piecemeal analysis of subsystems and legacy interfaces like JTAG.
The solution is to build instrumentation, filtering and analytics capabilities into the hardware itself. Yes, there is a penalty in terms of silicon real estate, but the benefits substantially outweigh the costs. By placing non-intrusive, simple but intelligent, configurable blocks capable of monitoring buses or custom logic signals into the design, the development team will get to see how the design is really behaving, at wire speed. The SoC will be released faster (and with more commercial success); development cost and risk will be reduced; there will be fewer bugs in the field, and any issues that do occur will be identified and can be resolved more quickly.
One way of achieving this change in paradigm is to employ a third-party suite of debug and performance analysis tools. It is possible to provide a fully message-based platform that enables concurrent access by multiple performance analysis tools in real time. The architecture is highly modular and comprises three classes of modules: advanced modules, message modules and communicators.
Advanced modules can be thought of as probes that can be integrated into the system, for example, by connecting to the block-level interfaces of system components such as bus fabric links. Message modules can be used to construct an on-chip message passing fabric which is independent of the system interconnect. Communicators interface the various components to debug and performance tools, which can be outside of the SoC.
The Status Monitor provides a wide variety of monitoring functions that can be used for debug, diagnostics and performance profiling. It implements the following functions:
- Detection (identification of activity to monitor) using:
  - Simple matching against the input bus: indicates when the input bus matches a predetermined value.
  - Advanced filtering against the input bus: enables selection of a set of signals from the input bus and comparison against a user-defined value to produce a choice of filter outputs (=, ≠, >, ≥, < or ≤).
  - A detected-condition sequencer: provides logic-analyser-type functionality that can walk through different filters when certain conditions have been qualified.
- Monitoring actions for detected activity:
  - Event message generation
  - Match message generation
  - Internal trigger generation
  - Trace unit to capture data values
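The detection functions listed above can be pictured with a small software model. The sketch below is purely illustrative and is not UltraSoC's implementation or API: the names `make_filter` and `run_sequencer`, the 8-bit bus layout and the sample values are all assumptions made for the example. It shows a filter that selects a field from the input bus and compares it with a user-defined value, and a sequencer that walks through filters in order.

```python
import operator

# The comparison choices named in the text: =, ≠, >, ≥, <, ≤
COMPARATORS = {
    "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
}


def make_filter(mask, shift, op, value):
    """Return a predicate that extracts a field from a bus word and compares it."""
    compare = COMPARATORS[op]

    def matches(bus_word):
        field = (bus_word & mask) >> shift
        return compare(field, value)

    return matches


def run_sequencer(filters, samples):
    """Step through the filters in order, advancing at most one stage per sample.

    Returns the index of the sample on which the last filter matched,
    or None if the sequence never completed.
    """
    stage = 0
    for index, word in enumerate(samples):
        if filters[stage](word):
            stage += 1
            if stage == len(filters):
                return index
    return None


# Hypothetical 8-bit bus: bits [7:4] carry a state field, bits [3:0] a level.
state_is_3 = make_filter(0xF0, 4, "==", 3)
level_high = make_filter(0x0F, 0, ">=", 12)

samples = [0x10, 0x32, 0x3C, 0x2D, 0x3F]
trigger_index = run_sequencer([state_is_3, level_high], samples)
```

In this model the first filter qualifies on the second sample and the second filter completes the sequence on the third, which is the point at which a real monitor might raise an event, a match message or an internal trigger.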
As its name suggests, the Bus Monitor provides monitoring functions for master and/or slave interfaces within a system fabric or internal bus. These include:
- AXI-3 or AXI-4
- OCP 3 or lower
- Comprehensive filtering on bus protocol fields to detect transactions of interest
  - Cascading filters in series
- Monitoring actions for transactions of interest:
  - Event message generation
  - Match message generation
  - Internal trigger generation
  - Counting various protocol metrics
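As a rough illustration of filtering on protocol fields, cascading filters in series and counting metrics, the following sketch models the idea in software. Everything in it is an assumption made for the example: the `Transaction` fields, the address window, the master ID and the metric names are hypothetical and do not describe the Bus Monitor's actual configuration interface.

```python
from collections import Counter, namedtuple

# Hypothetical, simplified view of one bus transaction.
Transaction = namedtuple("Transaction", "master_id address is_write burst_len")


def cascade(*filters):
    """Combine filters in series: a transaction passes only if every stage accepts it."""
    def passes(txn):
        return all(f(txn) for f in filters)
    return passes


def in_ddr_window(txn):
    # Assumed address range, for illustration only.
    return 0x8000_0000 <= txn.address < 0xC000_0000


def from_dma(txn):
    # Assumed master ID for the DMA engine.
    return txn.master_id == 2


def is_read(txn):
    return not txn.is_write


def count_metrics(transactions, predicate):
    """Count all transactions seen, those of interest, and their data beats."""
    counts = Counter()
    for txn in transactions:
        counts["seen"] += 1
        if predicate(txn):
            counts["matched"] += 1
            counts["beats"] += txn.burst_len
    return counts


of_interest = cascade(in_ddr_window, from_dma, is_read)

traffic = [
    Transaction(2, 0x8000_1000, False, 4),
    Transaction(1, 0x8000_2000, False, 8),
    Transaction(2, 0x8000_3000, True, 4),
    Transaction(2, 0x9000_0000, False, 16),
]
metrics = count_metrics(traffic, of_interest)
```

Only the two DMA reads pass all three stages, so reporting just those counters rather than every transaction is what keeps the volume of monitoring data small.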
Data captured by the Status Monitor and Bus Monitor is passed on to the message infrastructure and may then be routed off-chip for analysis. Both monitor types can be configured to intelligently filter data to minimize the amount of data (which can be timestamped) being routed off-chip.
The message infrastructure provides a low-latency message and routing function to all the other on-chip performance and debug blocks, independent of the SoC’s interconnect and configurable from two bits to 512 bits wide. This allows for a network balanced for silicon area and the bandwidth required.
Communicators and USB Hub
Communicators are used to pass information to the outside world. One key capability is the USB hub (Figure 1), which uses the USB PHY within the SoC to expose collected data, requiring no additional software. When enabled, performance and debug messages can co-exist with the modem’s USB communication. The hub also provides for a layered protocol which can enable services like encryption of the debug data.
Figure 1 High-level view of USB Hub
Alternatively, debug information may be stored in system memory for later recovery. This mode of operation is especially useful for field-trials and “in use” analytics, facilitating identification of rare or sparse problems that only occur intermittently and with large numbers of systems and real use cases – making them very hard to identify in the laboratory.
Use Case: on-chip debug and performance monitoring of a baseband SoC
The ability of an on-chip performance monitoring and debug solution to solve a variety of complex, subtle issues can be appreciated by considering the case of an LTE-A modem SoC (Figure 2), equipped with UltraSoC IP.
The on-chip blocks allow debug and monitoring of the entire SoC - not just a specific processor core. This is particularly important in multiprocessor SoC devices where there are several processors, possibly obtained from multiple sources, in addition to a number of complex co-processors. The modular nature of the monitoring and debug system, and message-based architecture allows for non-intrusive monitoring of the underlying system. In this example, the USB hub is used to transport data off-chip.
Figure 2 UltraSoC enabled LTE Advanced Modem SoC
The relatively simple additions to the modem allow a large number of common problems to be spotted and diagnosed. For example, samples passing to and from the radio will typically be temporarily stored in FIFOs, which in turn interface with main memory via DMA. A Status Monitor observing and sending signals from within the Radio IF allows the level, state and behaviour of the FIFOs to be monitored: the development team obtains information on how large the FIFOs need to be to ensure the modem is conformant, while not needing to over-engineer them at the expense of silicon area and power efficiency.
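To make the FIFO sizing argument concrete, here is a minimal sketch of the kind of post-processing a team might apply to occupancy samples reported by a monitor. The function name `fifo_profile`, the 64-entry depth and the sample values are assumptions for illustration, not measurements from a real device.

```python
def fifo_profile(levels, depth):
    """Summarise sampled FIFO occupancy against the implemented depth."""
    high_water = max(levels)
    return {
        "high_water": high_water,           # deepest occupancy observed
        "headroom": depth - high_water,     # entries never used in this run
        "full_samples": sum(1 for level in levels if level >= depth),
        "empty_samples": sum(1 for level in levels if level == 0),
    }


# Hypothetical occupancy samples (in entries) from a 64-entry receive FIFO.
rx_levels = [0, 3, 9, 17, 22, 14, 6, 0, 11, 25, 19, 4]
profile = fifo_profile(rx_levels, depth=64)
```

A large headroom across representative workloads suggests the FIFO could be made smaller, while any samples at full depth point to a risk of back-pressure or data loss.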
DMA operation is another key area with a potential impact on the overall performance of this LTE design: a great deal of data needs to be made available to the DSPs from internal memory, in a fashion that keeps up with the strict LTE timing constraints. A Bus Monitor module is used to observe individual transactions performed by the DMA engines and to calculate latencies (Figure 3 shows an example of read latencies to DDR). In addition to obtaining a histogram of all the transactions, it is also possible to calculate minimum, maximum and average latencies, as well as the time taken for all the DMA transactions.
Figure 3 DMA transaction latencies
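The latency figures described above are straightforward to derive once each transaction has a start and an end time. The sketch below assumes the monitor reports a (request cycle, response cycle) pair per transaction. That record format, the function name `latency_stats` and the sample numbers are assumptions for the example, and "elapsed" is one possible reading of the time taken for all the transactions (first request to last response).

```python
from collections import Counter


def latency_stats(transactions, bin_width):
    """Compute latency statistics from (request_cycle, response_cycle) pairs."""
    latencies = [done - start for start, done in transactions]
    # Bucket each latency by the lower edge of its histogram bin.
    histogram = Counter((lat // bin_width) * bin_width for lat in latencies)
    return {
        "min": min(latencies),
        "max": max(latencies),
        "avg": sum(latencies) / len(latencies),
        "elapsed": max(done for _, done in transactions)
                   - min(start for start, _ in transactions),
        "histogram": dict(sorted(histogram.items())),
    }


# Hypothetical read transactions, timestamps in clock cycles.
reads = [(0, 42), (10, 58), (20, 61), (30, 140), (50, 95)]
stats = latency_stats(reads, bin_width=20)
```

In this made-up data set four reads fall in the 40-cycle bin and one outlier lands in the 100-cycle bin, the kind of tail that an average alone would hide.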
One important point to note is the need for cross-triggering within the monitoring system. For example, while a simple Status Monitor can be used to count the number of stall cycles for each processor, and thus measure how efficiently the system is utilizing the available processing power, in practice the design team will also want to look more deeply into cases of poor utilization. This can be done by configuring Bus Monitors to capture traces in a circular buffer and, when the CPU’s Status Monitor detects a stall, trigger the Bus Monitors to send the captured traces out via the messaging system, allowing the development team to look at the behaviour of the system leading up to the stall.
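The cross-triggering arrangement can be modelled in a few lines. This is a behavioural sketch only: the `TraceBuffer` class, the event tuples and the buffer depth are assumptions chosen for illustration and do not reflect how the hardware is actually configured.

```python
from collections import deque


class TraceBuffer:
    """Keeps only the most recent `depth` bus transactions (a circular buffer)."""

    def __init__(self, depth):
        self._entries = deque(maxlen=depth)

    def capture(self, transaction):
        # Once the buffer is full, the oldest entry is silently overwritten.
        self._entries.append(transaction)

    def flush(self):
        """Return the buffered history, oldest first, and clear the buffer."""
        history = list(self._entries)
        self._entries.clear()
        return history


def run(events, depth):
    """Process ("bus", payload) and ("stall", cpu_id) events in order."""
    buffer = TraceBuffer(depth)
    messages = []
    for kind, payload in events:
        if kind == "bus":
            buffer.capture(payload)
        elif kind == "stall":
            # Cross-trigger: the stall causes the buffered trace to be sent out.
            messages.append({"cpu": payload, "trace": buffer.flush()})
    return messages


events = [
    ("bus", "rd 0x1000"), ("bus", "wr 0x2000"), ("bus", "rd 0x3000"),
    ("bus", "rd 0x4000"), ("stall", 0), ("bus", "wr 0x5000"),
]
messages = run(events, depth=3)
```

Nothing is sent while the system runs normally. Only when the stall occurs are the last three transactions emitted, which is what keeps the approach cheap in off-chip bandwidth.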
Intermittent deadlock conditions are amongst the most difficult problems for development teams to identify and diagnose. There are various causes of deadlock in systems, and on-chip performance monitoring can be used to identify these, whether caused by hardware, software or an interaction between the two. As an example of how to help track down an interconnect-related lock-up, consider the problem where a Master asserts a “valid” signal but the Slave never asserts a “ready” signal.
To detect this situation, a Bus Monitor can be configured in a mode where trace capture is performed without waiting for ready and is driven by an asynchronous trigger instead. Asynchronous triggers are anything not directly derived from a particular bus phase acceptance: for example an event, the interval timer, a monitor_snapshot message, or a transaction exceeding a threshold metric.
For example, trace could be configured in 'capture-to' mode, triggering automatically when the transaction duration exceeds some long threshold. Alternatively, some other block could detect that the bus is hung and send in a monitor_snapshot message.
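The threshold-based trigger can be illustrated with a simple cycle-by-cycle model of a two-signal handshake in which one side asserts its signal and the other side never responds. The names `find_hangs`, `request` and `acknowledge` are generic stand-ins chosen for this sketch, and the threshold value is arbitrary.

```python
def find_hangs(samples, threshold):
    """Flag cycles at which a pending handshake has waited `threshold` cycles.

    samples: per-cycle (request, acknowledge) booleans for one handshake.
    Each stalled handshake is reported once, when the threshold is reached.
    """
    hangs = []
    waiting = 0
    for cycle, (request, acknowledge) in enumerate(samples):
        if request and not acknowledge:
            waiting += 1
            if waiting == threshold:
                hangs.append(cycle)
        else:
            # Handshake completed or bus idle: restart the count.
            waiting = 0
    return hangs


# Hypothetical trace: one normal handshake, then a request that is never answered.
samples = (
    [(True, False)] * 2 + [(True, True)] + [(False, False)]
    + [(True, False)] * 5
)
hang_cycles = find_hangs(samples, threshold=4)
```

The first request is answered after two cycles and is ignored. The second is still pending four cycles later and is flagged, which is the moment at which a monitor could stop its trace and report what led up to the hang.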
Conclusion – and the future
SoCs and their associated software have become ever more complex over the last twenty years: so complex that their behaviour, and the obscure interactions of their component parts, make such systems very difficult to understand and analyse.
Traditional approaches may take many man-years and may result in delayed products and lost revenue. But by placing non-intrusive, simple but intelligent, configurable blocks capable of monitoring buses, CPUs or custom logic signals into the design, these tasks become much simpler.
This has significant commercial value: faster time-to-market, lower cost, lower risk and fewer bugs in the field. Looking forward, such capabilities could also be used to reduce liability and litigation concerns from products that fail in the field. They could also enable “forensics”: analysis, after a field failure, of what went wrong and caused the problem. Finally, the ability to optimize systems will be important. By monitoring and reporting on actual use in real-life scenarios, it may be possible to improve performance or reduce power consumption by adjusting software behaviour. In other cases, the information gathered could be used in the definition of next-generation devices (for example reducing sizes of buffers, or balancing the performance of processors and interconnect).
On-chip debug and monitoring is a powerful concept: but for many development teams its use requires a fundamental re-think. It’s a change in approach that cannot come too soon.
About the author
One of Europe’s leading SoC architects, Gadge Panesar's experience includes senior architecture definition and design roles within both blue-chip and start-up environments. He holds more than 20 patents and is the author of more than 20 published works. Prior to joining UltraSoC, he served at NVIDIA (NASDAQ:NVDA). As Chief Architect at Picochip he created the architecture of the company’s market-defining small-cell SoCs, and continued in this capacity after the company’s acquisition by Mindspeed Inc (NASDAQ:MSPD). His previous experience includes roles at STMicroelectronics, INMOS, and Acorn Computers. He is a former Research Fellow at the UK’s Southampton University, and a former Visiting Fellow at the University of Amsterdam.
UltraSoC is transforming the way companies develop and deliver next-generation electronic devices and systems. Its semiconductor IP and software products help SoC designers equip their chips with advanced capabilities including “bare metal” security and performance monitoring. The company also addresses a burgeoning crisis in the SoC development process itself: today’s chips are so complex that it is impossible for the design teams who create them to understand their operation using conventional means. By hard-wiring non-intrusive analytics circuitry into the chip, UltraSoC accelerates time-to-market, reduces bugs and increases quality, de-risking the SoC development process.