Proceedings ArticleDOI

Diagnosis of Embedded Software Using Program Spectra

TL;DR: This paper discusses the application of a specific automated debugging technique, namely software fault localization through the analysis of program spectra, in the area of embedded software in high-volume consumer electronics products, and demonstrates that it can lead to highly accurate diagnoses of realistic errors.
Abstract: Automated diagnosis of errors detected during software testing can improve the efficiency of the debugging process, and can thus help to make software more reliable. In this paper we discuss the application of a specific automated debugging technique, namely software fault localization through the analysis of program spectra, in the area of embedded software in high-volume consumer electronics products. We discuss why the technique is particularly well suited for this application domain, and through experiments on an industrial test case we demonstrate that it can lead to highly accurate diagnoses of realistic errors

Summary (4 min read)

1 Introduction

  • Software reliability can generally be improved through extensive testing and debugging, but this is often in conflict with market conditions: software cannot be tested exhaustively, and of the bugs that are found, only those with the highest impact on the user-perceived reliability can be solved before the release.
  • Testing reveals more bugs than can be solved, and debugging is a bottleneck for improving reliability.
  • Locating a fault is an important step in actually solving it, and program spectra have successfully been applied for this purpose in several tools focusing on various application domains, such as Pinpoint [4], which focuses on large, dynamic on-line transaction processing systems, AMPLE [5], which focuses on object-oriented software, and Tarantula [9], which focuses on C programs.
  • The remainder of this paper is organized as follows.
  • In Section 2 the authors explain the diagnosis technique in more detail, and in Section 3 they discuss its applicability to embedded software in consumer electronics products.

2.1 Failures, Errors, and Faults

  • As defined in [3], the authors use the following terminology.
  • An error is the part of the total state of the system that may cause a failure.
  • To illustrate these concepts, consider the C function in Figure 1.
  • A failure occurs when applying RationalSort yields anything other than a sorted version of its input.
  • In a software context, faults are often called bugs, and diagnosis is part of debugging.

2.2 Program Spectra

  • A program spectrum [11] is a collection of data that provides a specific view on the dynamic behavior of software.
  • As an example, a block count spectrum tells how often each block of code is executed during a run of a program.
  • A block of code is a C language statement, where the authors do not distinguish between the individual statements of a compound statement, but where they do distinguish between the cases of a switch statement.
  • Block 5, the RationalGT function body, is executed six times: once for every iteration of the inner loop.
  • Beside block count/hit spectra, many other forms of program spectra exist.

2.3 Fault Diagnosis

  • The hit spectra of M runs constitute a binary matrix, whose columns correspond to N different parts of the program (see Figure 2).
  • In their case, these parts are blocks of C code (a slightly different notion than a basic block, which is a block of code that has no branch).
  • This vector corresponds to a hypothetical part of the program that is responsible for all observed errors.
  • In the field of data clustering, resemblances between vectors of binary, nominally scaled data, such as the columns in their matrix of program spectra, are quantified by means of similarity coefficients (see, e.g., [8]).
  • I3 is not sorted, but the denominators in this sequence happen to be equal, in which case no error occurs.

3 Relevance to Embedded Software

  • The effectiveness of the diagnosis technique described in the previous section has already been demonstrated in several articles (see, e.g., [1], [4], [9]).
  • Especially because of constraints imposed by the market, the conditions under which this software is developed are somewhat different from those for other software products.
  • As a black-box diagnosis technique, it can be applied without any additional modeling effort, which matters because concurrent systems are difficult to model.
  • The technique improves insight in the run-time behavior.
  • Profiling tools such as gcov are convenient for obtaining program spectra, but they are typically not available in a development environment for embedded software.

4.1 Platform

  • The subject of their experiments is the control software in a particular product line of analog television sets.
  • All audio and video processing is implemented in hardware, but the software is responsible for tasks such as decoding remote control input, displaying the on-screen menu, and coordinating the hardware (e.g., optimizing parameters for audio and video processing based on an analysis of the signals).
  • Most teletext functionality is also implemented in software.
  • Essentially, the run-time environment consists of several threads with increasing priorities, and for synchronization purposes, the work on these threads is organized in 315 logical threads inside the various components.
  • The total available RAM in consumer sets is two megabytes, but in the special developer version that the authors used for their experiments, another two megabytes were available.

4.2 Faults

  • The authors diagnosed two faults, one existing, and one that was seeded to reproduce an error from a different product line.
  • The CPU load clearly increases around the 60th sample, when the teletext viewing starts, but never returns to its initial level after sample 90, when the authors switch back to TV mode.
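  • An existing fault in this functionality entails that searching in a page without visible content locks up the teletext system. A likely cause for the lock-up is an inconsistency in the values of two state variables in different components, for which only specific combinations are allowed.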
  • The authors hardcoded a remote control key-sequence that injects this error on their test platform.

4.3 Implementation

  • The authors wrote a small Koala component for recording and storing program spectra, and for transmitting them off the television set via the serial connection.
  • The transmission is done on a low-priority thread while the CPU is otherwise idle, in order to minimize the impact on the timing behavior.
  • Pending their transmission via the serial connection, their component caches program spectra in the extra memory available in their developer version of the hardware.
  • For diagnosing the load problem the authors obtained hit spectra for the logical threads mentioned in Section 4.1, resulting in spectra of 315 binary flags.
  • For the lock-up problem, the authors define a transaction as the computation in between two key-presses on the remote control.

4.4 Diagnosis

  • For the load problem the authors used the scenario of Figure 3.
  • The authors marked the last 60 spectra, for the second period of TV mode, as ‘failed’, and those of earlier transactions as ‘passed’.
  • In the first position was a logical thread related to teletext, whose activation is part of the problem, so in this case the authors can conclude that although the diagnosis is not perfect, the implied suggestion for investigating the problem is quite useful.
  • For the lock-up problem, the authors used a proper error detection mechanism.
  • On each key-press, when caching the current spectrum, a separate routine verifies the values of the two state variables, and marks the current spectrum as failed if they assume an invalid combination.

5 Discussion

  • The results for the lock-up problem in particular have convinced the authors that program spectra, and their application to fault diagnosis, are a viable technique and a useful tool in the area of embedded software in consumer electronics.
  • There are a number of issues with their implementation.
  • Because of its rigorous design, the TV is still functioning properly, but everything runs much slower with the block-level instrumentation (e.g., changing channels now takes seconds).
  • In their case the authors could store 25 spectra of 65,536 counters, which was already slowing down the scenarios with more than that number of transactions, but even with a more memory-efficient implementation, this inevitably becomes a problem with, for example, overnight testing.
  • If an error detection mechanism is available, like in their experiments with the lock-up problem, then the four counters a_pq(j) (with p, q ∈ {0, 1}) of each column can be calculated on the fly, and the memory requirements become linear in the number of columns in the matrix of Figure 2.

7 Conclusion

  • The authors evaluated the technique on a large-scale industrial test case in the area of embedded software in consumer electronics devices.
  • In addition to confirming established effectiveness results, their experiments indicate that the technique lends itself well for application in the resource-constrained environments that are typical for the development of embedded software.
  • While their current experiments focus on development-time debugging, they open corridors to further applications, such as run-time recovery by rebooting only those parts of a system whose activities correlate with detected errors.


Diagnosis of Embedded Software using Program Spectra
Peter Zoeteweij (1), Rui Abreu (1), Rob Golsteijn (2), Arjan J.C. van Gemund (1)
(1) Embedded Software Lab, Delft University of Technology, The Netherlands, {p.zoeteweij,r.f.abreu,a.j.c.vangemund}@tudelft.nl
(2) Innovation Center Eindhoven, NXP Semiconductors, The Netherlands, rob.golsteijn@nxp.com
Abstract
Automated diagnosis of errors detected during software testing can improve the efficiency of the debugging process, and can thus help to make software more reliable. In this paper we discuss the application of a specific automated debugging technique, namely software fault localization through the analysis of program spectra, in the area of embedded software in high-volume consumer electronics products. We discuss why the technique is particularly well suited for this application domain, and through experiments on an industrial test case we demonstrate that it can lead to highly accurate diagnoses of realistic errors.
Keywords: diagnosis, program spectra, automated debug-
ging, embedded systems, consumer electronics.
1 Introduction
Software reliability can generally be improved through extensive testing and debugging, but this is often in conflict with market conditions: software cannot be tested exhaustively, and of the bugs that are found, only those with the highest impact on the user-perceived reliability can be solved before the release. In this typical scenario, testing reveals more bugs than can be solved, and debugging is a bottleneck for improving reliability. Automated debugging techniques can help to reduce this bottleneck.
The subject of this paper is a particular automated debugging technique, namely software fault localization through the analysis of program spectra [11]. These can be seen as projections of execution traces that indicate which parts of a program were active during various runs of that program. The diagnosis consists in analyzing the extent to which the activity of specific parts correlates with errors detected in the different runs.
* This work has been carried out as part of the TRADER project under the responsibility of the Embedded Systems Institute. This project is partially supported by the Netherlands Ministry of Economic Affairs under the BSIK03021 program.
Locating a fault is an important step in actually solving
it, and program spectra have successfully been applied for
this purpose in several tools focusing on various application
domains, such as Pinpoint [4], which focuses on large, dy-
namic on-line transaction processing systems, AMPLE [5],
which focuses on object-oriented software, and Tarantula
[9], which focuses on C programs.
In this paper, we discuss the applicability of the technique to embedded software, and specifically to embedded software in high-volume consumer electronics products. Software has become an important factor in the development, marketing, and user-perception of these products, and the typical combination of limited computing resources, complex systems, and tight development deadlines makes the technique a particularly attractive means for improving product reliability.
To support our argument, we report the outcome of two
experiments, where we diagnosed two different errors oc-
curring in the control software of a particular product line
of television sets from a well-known international consumer
electronics manufacturer. In both experiments, the tech-
nique is able to locate the (known) faults that cause these
errors quite well, and in one case, this implies an accuracy
of a single statement in approximately 450K lines of code.
The remainder of this paper is organized as follows. In
Section 2 we explain the diagnosis technique in more detail,
and in Section 3 we discuss its applicability to embedded
software in consumer electronics products. In Section 4 we
describe our experiments, and in Section 5 we discuss how
our current implementation can be improved. In Section 6
we discuss related work. We conclude in Section 7.
2 Preliminaries
In this section we introduce program spectra, and de-
scribe how they are used for diagnosing software faults.

void RationalSort(int n, int *num, int *den)
{
    /* block 1 */
    int i,j,temp;
    for ( i=n-1; i>=0; i-- ) {
        /* block 2 */
        for ( j=0; j<i; j++ ) {
            /* block 3 */
            if (RationalGT(num[j], den[j],
                           num[j+1], den[j+1])) {
                /* block 4 */
                temp = num[j];
                num[j] = num[j+1];
                num[j+1] = temp; } } }
}
Figure 1. A faulty C function for sorting rational numbers
First we introduce the necessary terminology.
2.1 Failures, Errors, and Faults
As defined in [3], we use the following terminology.
  • A failure is an event that occurs when delivered service deviates from correct service.
  • An error is the part of the total state of the system that may cause a failure.
  • A fault is the cause of an error in the system.
To illustrate these concepts, consider the C function in
Figure 1. It is meant to sort, using the bubble sort algo-
rithm, a sequence of n rational numbers whose numerators
and denominators are passed via parameters num and den,
respectively. There is a fault (bug) in the swapping code of
block 4: only the numerators of the rational numbers are
swapped. The denominators are left in their original order.
A failure occurs when applying RationalSort yields anything other than a sorted version of its input. An error occurs after the code inside the conditional statement is executed, while den[j] ≠ den[j+1]. Such errors can be temporary: if we apply RationalSort to the sequence $\langle \frac{4}{1}, \frac{2}{2}, \frac{0}{1} \rangle$, an error occurs after the first two numerators are swapped. However, this error is “canceled” by later swapping actions, and the sequence ends up being sorted correctly. Faults do not automatically lead to errors either: no error will occur if the input is already sorted, or if all denominators are equal.
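The behavior described above can be reproduced with a small harness. This is a minimal sketch, not the authors' code: the paper does not show RationalGT, so the version below is an assumption (cross-multiplication, positive denominators only), and IsSorted is a hypothetical, rudimentary failure detector that checks ordering only.

```c
#include <assert.h>

/* Hypothetical helper (not shown in the paper): returns 1 if n1/d1 > n2/d2.
 * Assumes positive denominators, so the comparison can cross-multiply. */
int RationalGT(int n1, int d1, int n2, int d2)
{
    assert(d1 > 0 && d2 > 0);
    return (long long)n1 * d2 > (long long)n2 * d1;
}

/* The faulty function of Figure 1; the fault is kept on purpose. */
void RationalSort(int n, int *num, int *den)
{
    int i, j, temp;
    for (i = n - 1; i >= 0; i--) {
        for (j = 0; j < i; j++) {
            if (RationalGT(num[j], den[j], num[j+1], den[j+1])) {
                temp = num[j];        /* only the numerators are swapped */
                num[j] = num[j+1];
                num[j+1] = temp;
            }
        }
    }
}

/* Hypothetical failure detector: returns 1 if the sequence is in
 * non-decreasing order. It does not check that the output is a
 * permutation of the input. */
int IsSorted(int n, const int *num, const int *den)
{
    int j;
    for (j = 0; j + 1 < n; j++)
        if (RationalGT(num[j], den[j], num[j+1], den[j+1]))
            return 0;
    return 1;
}
```

Calling RationalSort on numerators {4, 2, 0} with denominators {1, 2, 1} leaves the numerators as {0, 2, 4}, which is the correctly sorted sequence despite the temporary error. An input with unequal denominators whose error is not canceled, such as numerators {3, 2, 4, 1} with denominators {1, 2, 3, 4}, ends up unsorted and is flagged by IsSorted.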
The purpose of diagnosis is to locate the faults that are
the root cause of detected errors. As such, error detection is
a prerequisite for diagnosis. As a rudimentary form of er-
ror detection, failure detection can be used, but in software
more powerful mechanisms are available, such as pointer
checking, array bounds checking, deadlock detection, etc.
In a software context, faults are often called bugs, and diagnosis is part of debugging. Computer-aided techniques such as the one we consider here are known as automated debugging.
2.2 Program Spectra
A program spectrum [11] is a collection of data that provides a specific view on the dynamic behavior of software. This data is collected at run-time, and typically consists of a number of counters or flags for the different parts of a program. As such, recording a program spectrum is a light-weight analysis compared to other run-time methods, such as dynamic slicing [10].
As an example, a block count spectrum tells how often each block of code is executed during a run of a program. In this paper, a block of code is a C language statement, where we do not distinguish between the individual statements of a compound statement, but where we do distinguish between the cases of a switch statement (see footnote 1). Suppose that the function RationalSort of Figure 1 is used to sort the sequence $\langle \frac{2}{1}, \frac{3}{1}, \frac{4}{1}, \frac{1}{1} \rangle$, which it happens to do correctly. This would result in the following block count spectrum, where block 5 refers to the body of the RationalGT function, which has not been shown in Figure 1.
block  1  2  3  4  5
count  1  4  6  3  6
Block 1, the body of the function RationalSort, is exe-
cuted once. Blocks 2 and 3, the bodies of the two loops, are
executed four and six times, respectively. To sort our exam-
ple array, three exchanges must be made, and block 4, the
body of the conditional statement, is executed three times.
Block 5, the RationalGT function body, is executed six
times: once for every iteration of the inner loop.
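The counts above can be checked by instrumenting the function by hand. The sketch below is illustrative only: the BLOCK macro, the block_count array, and the RationalGT body are assumptions introduced here, not the instrumentation used in the paper.

```c
#include <assert.h>
#include <string.h>

#define NUM_BLOCKS 5

/* Block count spectrum: block_count[b-1] counts executions of block b. */
unsigned block_count[NUM_BLOCKS];

/* Hypothetical instrumentation macro placed at the start of each block. */
#define BLOCK(b) (block_count[(b) - 1]++)

/* Hypothetical comparison (block 5); assumes positive denominators. */
int RationalGT(int n1, int d1, int n2, int d2)
{
    BLOCK(5);
    assert(d1 > 0 && d2 > 0);
    return (long long)n1 * d2 > (long long)n2 * d1;
}

/* Figure 1 with instrumentation added; the fault is left in place. */
void RationalSort(int n, int *num, int *den)
{
    int i, j, temp;
    BLOCK(1);
    for (i = n - 1; i >= 0; i--) {
        BLOCK(2);
        for (j = 0; j < i; j++) {
            BLOCK(3);
            if (RationalGT(num[j], den[j], num[j+1], den[j+1])) {
                BLOCK(4);
                temp = num[j];
                num[j] = num[j+1];
                num[j+1] = temp;
            }
        }
    }
}

/* Start a new run with all counters at zero. */
void ResetSpectrum(void)
{
    memset(block_count, 0, sizeof block_count);
}
```

Sorting numerators {2, 3, 4, 1} with denominators {1, 1, 1, 1} after ResetSpectrum() yields the counters 1, 4, 6, 3, 6 for blocks 1 to 5. Testing each counter against zero gives the corresponding block hit spectrum.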
If we are only interested in whether a block is executed or not, we can use binary flags instead of counters. In this case, the block count spectra revert to block hit spectra. Besides block count/hit spectra, many other forms of program spectra exist. See [7] for an overview. In this paper we will work with block hit spectra, and hit spectra for logical threads used in the software of our test case (see Section 4.1).
2.3 Fault Diagnosis
The hit spectra of M runs constitute a binary matrix, whose columns correspond to N different parts of the program (see Figure 2). In our case, these parts are blocks of C code. In some of the runs an error is detected. This information constitutes another column vector, the error vector. This vector corresponds to a hypothetical part of the program that is responsible for all observed errors. Fault localization essentially consists in identifying the part whose column vector resembles the error vector most.
Footnote 1: This is a slightly different notion than a basic block, which is a block of code that has no branch.

$$
\begin{array}{cccc|c}
x_{11} & x_{12} & \cdots & x_{1N} & e_1 \\
x_{21} & x_{22} & \cdots & x_{2N} & e_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
x_{M1} & x_{M2} & \cdots & x_{MN} & e_M \\
\hline
s_1 & s_2 & \cdots & s_N &
\end{array}
$$
Figure 2. The ingredients of fault diagnosis (rows: M spectra; columns: N parts, followed by the error vector; bottom row: the similarity coefficients)
In the field of data clustering, resemblances between vectors of binary, nominally scaled data, such as the columns in our matrix of program spectra, are quantified by means of similarity coefficients (see, e.g., [8]). As an example, the Jaccard similarity coefficient (see also [8]) expresses the similarity $s_j$ of column $j$ and the error vector as the number of positions in which these vectors share an entry 1 (i.e., the block was exercised and the run has failed), divided by this same number plus the number of positions in which the vectors have different entries:

$$
s_j = \frac{a_{11}(j)}{a_{11}(j) + a_{01}(j) + a_{10}(j)} \tag{1}
$$

where $a_{pq}(j) = |\{ i \mid x_{ij} = p \wedge e_i = q \}|$, and $p, q \in \{0, 1\}$.
Under the assumption that a high similarity to the error vector indicates a high probability that the corresponding parts of the software cause the detected errors, the calculated similarity coefficients rank the parts of the program with respect to their likelihood of containing the faults.
To illustrate the approach, suppose that we apply the RationalSort function to the input sequences $I_1 = \langle \rangle$, $I_2 = \langle \frac{1}{4} \rangle$, $I_3 = \langle \frac{2}{1}, \frac{1}{1} \rangle$, $I_4 = \langle \frac{4}{1}, \frac{2}{2}, \frac{0}{1} \rangle$, $I_5 = \langle \frac{3}{1}, \frac{2}{2}, \frac{4}{3}, \frac{1}{4} \rangle$, and $I_6 = \langle \frac{1}{4}, \frac{1}{3}, \frac{1}{2}, \frac{1}{1} \rangle$. $I_1$, $I_2$, and $I_6$ are already sorted, and lead to passed runs. $I_3$ is not sorted, but the denominators in this sequence happen to be equal, in which case no error occurs. $I_4$ is the example from Section 2.1: it is not sorted, and an error occurs during its execution, but this error goes undetected. Only for $I_5$ the program fails. The calculated result is $\langle \frac{1}{1}, \frac{2}{2}, \frac{4}{3}, \frac{3}{4} \rangle$ instead of $\langle \frac{1}{4}, \frac{2}{2}, \frac{4}{3}, \frac{3}{1} \rangle$, which is a clear indication that an error has occurred.
The block hit spectra for these runs are as follows ('1' denotes a hit), where block 5 corresponds to the body of the RationalGT function, which has not been shown in Figure 1.
         block
input    1  2  3  4  5    error
I1       1  0  0  0  0    0
I2       1  1  0  0  0    0
I3       1  1  1  1  1    0
I4       1  1  1  1  1    0
I5       1  1  1  1  1    1
I6       1  1  1  0  1    0
For this data, the calculated Jaccard coefficients are $s_1 = \frac{1}{6}$, $s_2 = \frac{1}{5}$, $s_3 = \frac{1}{4}$, $s_4 = \frac{1}{3}$, $s_5 = \frac{1}{4}$, which (correctly) identifies block 4 as the most likely location of the fault.
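The coefficients of this example can be recomputed directly from Equation (1). The following is a minimal sketch: the array names, the function names, and the convention of returning 0 for an empty denominator are choices made here for illustration, not part of the paper.

```c
#include <assert.h>
#include <stddef.h>

#define M 6  /* number of runs (spectra) */
#define N 5  /* number of blocks */

/* Block hit spectra of inputs I1..I6 and the error vector, as in the table. */
const unsigned char x[M][N] = {
    {1, 0, 0, 0, 0},
    {1, 1, 0, 0, 0},
    {1, 1, 1, 1, 1},
    {1, 1, 1, 1, 1},
    {1, 1, 1, 1, 1},
    {1, 1, 1, 0, 1}
};
const unsigned char e[M] = {0, 0, 0, 0, 1, 0};

/* Jaccard similarity between column j and the error vector (Equation 1).
 * Returns 0.0 if the denominator is zero (a convention of this sketch). */
double Jaccard(size_t j)
{
    unsigned a11 = 0, a10 = 0, a01 = 0;
    size_t i;
    assert(j < N);
    for (i = 0; i < M; i++) {
        if (x[i][j] && e[i])        a11++;  /* hit, failed run */
        else if (x[i][j] && !e[i])  a10++;  /* hit, passed run */
        else if (!x[i][j] && e[i])  a01++;  /* not hit, failed run */
    }
    if (a11 + a10 + a01 == 0)
        return 0.0;
    return (double)a11 / (double)(a11 + a10 + a01);
}

/* Zero-based index of the block with the highest coefficient. */
size_t MostSuspect(void)
{
    size_t j, best = 0;
    for (j = 1; j < N; j++)
        if (Jaccard(j) > Jaccard(best))
            best = j;
    return best;
}
```

With this data, Jaccard(3) evaluates to 1/3, the highest of the five coefficients, so MostSuspect() returns index 3, which corresponds to block 4.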
3 Relevance to Embedded Software
The effectiveness of the diagnosis technique described in the previous section has already been demonstrated in several articles (see, e.g., [1], [4], [9]). In this paper we present the benefits and discuss the issues specifically related to debugging embedded software in consumer electronics products. Especially because of constraints imposed by the market, the conditions under which this software is developed are somewhat different from those for other software products:
  • To reduce unit costs, and often to ensure portability of the devices, the software runs on non-commodity hardware, and computing resources are limited.
  • As a consequence, many facilities that developers of non-embedded software have come to rely on are absent, or are available only in rudimentary forms. Examples are profiling tools that give insight into the dynamic behavior of systems.
  • At the same time, the systems are highly concurrent, and operate at a low level of abstraction from the hardware. Therefore, their design and implementation are complicated by factors that can largely be abstracted away from in other software systems, such as deadlock prevention, and timing constraints involved in, e.g., writing to the graphics display only in those fractions of a second that the screen is not being refreshed.
  • On top of challenges that the entire software industry has to deal with, such as geographically distributed development organizations, the strong competition between manufacturers of consumer electronics makes it absolutely vital that release deadlines are met.
  • Although important safety mechanisms, such as short-circuit detection, are sometimes implemented in software, for a large part of the functionality there are no personal risks involved in transient failures.

Consequently, it is not uncommon that consumer elec-
tronics products are shipped with several known software
faults outstanding. To a certain extent, this also holds for
other software products, but the combination of the com-
plexity of the systems, the tight constraints imposed by the
market, and the relatively low impact of the majority of pos-
sible system failures creates a unique situation. Instead of
aiming for correctness, the goal is to create a product that is
of value to customers, despite its imperfections, and to bring
the reliability to a commercially acceptable level (also com-
pared to the competition) before a product must be released.
The technique of Section 2 can help to reach this goal faster, and may thus reduce the time-to-market, and lead to more reliable products. Specific benefits are the following.
  • As a black-box diagnosis technique, it can be applied without any additional modeling effort. This effort would be hard to justify under the market conditions described above. Moreover, concurrent systems are difficult to model.
  • The technique improves insight into the run-time behavior. For embedded software in consumer electronics, this is often lacking, because of the concurrency, but also because of the decentralized development.
  • We expect that the technique can easily be integrated with existing testing procedures, such as overnight playback of recorded usage scenarios. In addition to the information that errors have occurred in some scenarios, this gives a first indication of the parts of the software that are likely to be involved in these errors. In the large, geographically distributed development organizations that we are dealing with, it may also help to identify which teams of developers to contact.
  • Last but not least, the technique is light-weight, which is relevant because of the non-commodity hardware and limited computing resources. All that is needed is some memory for storing program spectra, or for calculating the similarity coefficients on the fly (which reduces the space complexity from $O(M \times N)$ to $O(N)$, see Section 5). Profiling tools such as gcov are convenient for obtaining program spectra, but they are typically not available in a development environment for embedded software. However, the same data can be obtained through source code instrumentation.
While none of these benefits are unique, their combination makes program spectrum analysis an attractive technique for diagnosing embedded software in consumer electronics.
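The on-the-fly calculation mentioned in the last benefit can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: all names are hypothetical, and it presumes that the pass/fail verdict of each run or transaction is known when it ends. Only per-part counters are kept, so memory use does not grow with the number of spectra.

```c
#include <assert.h>
#include <string.h>

#define NUM_PARTS 315  /* illustrative; e.g. one flag per logical thread */

static unsigned char hit[NUM_PARTS];  /* flags of the current transaction */
static unsigned a11[NUM_PARTS];       /* part hit, transaction failed */
static unsigned a10[NUM_PARTS];       /* part hit, transaction passed */
static unsigned failed_total;         /* number of failed transactions */

/* Instrumentation hook: part `j` was active in the current transaction. */
void MarkHit(unsigned j)
{
    assert(j < NUM_PARTS);
    hit[j] = 1;
}

/* Fold the finished transaction into the counters and discard its flags. */
void EndTransaction(int failed)
{
    unsigned j;
    if (failed)
        failed_total++;
    for (j = 0; j < NUM_PARTS; j++) {
        if (hit[j]) {
            if (failed) a11[j]++;
            else        a10[j]++;
        }
    }
    memset(hit, 0, sizeof hit);
}

/* Jaccard coefficient of part `j`. Since a01 = failed_total - a11,
 * the denominator a11 + a01 + a10 equals failed_total + a10. */
double Similarity(unsigned j)
{
    unsigned denom;
    assert(j < NUM_PARTS);
    denom = failed_total + a10[j];
    return denom ? (double)a11[j] / (double)denom : 0.0;
}
```

The Jaccard coefficient needs only three of the four counters, and one of them (the number of failed transactions in which a part was not hit) follows from the total number of failures, which is why two arrays and one scalar suffice here. Other similarity coefficients may need the fourth counter as well.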
4 Experiments
In this section we describe our experience with applying
the techniques of Section 2 to an industrial test case.
4.1 Platform
The subject of our experiments is the control software in a particular product line of analog television sets. All audio and video processing is implemented in hardware, but the software is responsible for tasks such as decoding remote control input, displaying the on-screen menu, and coordinating the hardware (e.g., optimizing parameters for audio and video processing based on an analysis of the signals). Most teletext (see footnote 2) functionality is also implemented in software.
The software itself consists of approximately 450K lines of C code, which is configured from a much larger (several MLOC) code base of Koala software components [12].
The control processor is a MIPS running a small multi-tasking operating system. Essentially, the run-time environment consists of several threads with increasing priorities, and for synchronization purposes, the work on these threads is organized in 315 logical threads inside the various components. Threads are preempted when work arrives for a higher-priority thread.
The total available RAM in consumer sets is two megabytes, but in the special developer version that we used for our experiments, another two megabytes were available. In addition, the developer sets have a serial connection, and a debugger interface for manual debugging on a PC.
4.2 Faults
We diagnosed two faults, one existing, and one that was
seeded to reproduce an error from a different product line.
Load Problem. A known problem with the specific version of the control software that we had access to, is that after teletext viewing, the CPU load when watching television (TV mode) is approximately 10% higher than before teletext viewing. This is illustrated in Figure 3, which shows the CPU load for the following scenario: one minute TV mode, 30 s teletext viewing, and one minute of TV mode. The CPU load clearly increases around the 60th sample, when the teletext viewing starts, but never returns to its initial level after sample 90, when we switch back to TV mode.
Teletext Lock-up Problem. Another product line of television sets provides a function for searching in teletext pages. An existing fault in this functionality entails that searching in a page without visible content locks up the teletext system. A likely cause for the lock-up is an inconsistency in the values of two state variables in different components, for which only specific combinations are allowed. We hard-coded a remote control key-sequence that injects this error on our test platform.
Footnote 2: A standard for broadcasting information (e.g., news, weather, TV guide) in text pages, very popular in Europe.
[Figure 3. CPU load measured per second. Horizontal axis: Sample (0 to 160); vertical axis: Load % (0 to 100).]
4.3 Implementation
We wrote a small Koala component for recording and
storing program spectra, and for transmitting them off the
television set via the serial connection. The transmission is
done on a low-priority thread while the CPU is otherwise
idle, in order to minimize the impact on the timing behav-
ior. Pending their transmission via the serial connection,
our component caches program spectra in the extra mem-
ory available in our developer version of the hardware.
For diagnosing the load problem we obtained hit spectra for the logical threads mentioned in Section 4.1, resulting in spectra of 315 binary flags. We approached the lock-up problem at a much finer granularity, and obtained block hit spectra for practically all blocks of code in the control software, resulting in spectra of over 60,000 flags.
The hit spectra for the logical threads are obtained by
manually instrumenting a centralized scheduling mecha-
nism. For the block hit spectra we automatically instru-
mented the entire source code using the Front [2] parser
generator.
In Section 2.3 we use program spectra for different runs
of the software, but for embedded software in consumer
electronics, and indeed for most interactive systems, the
concept of a run is not very useful. Therefore we record
the spectra per transaction, instead of per run, and we use
two different notions of a transaction for the two different
faults that we diagnosed:
  • for the load problem, we use a periodic notion of a transaction, and record the spectra per second.
  • for the lock-up problem, we define a transaction as the computation in between two key-presses on the remote control.
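Recording spectra per transaction, and caching them until they can be transmitted, might be organized as in the sketch below. It is a simplified, single-threaded illustration under assumptions: every name, the bit-packed layout, and the small sizes are hypothetical, and the paper's Koala component and serial transmission are not modeled. The same boundary function serves both notions of a transaction: it would be called once per second for the load problem, or on each key-press for the lock-up problem.

```c
#include <assert.h>
#include <string.h>

#define NUM_FLAGS   64  /* illustrative; the paper reports 315 or over 60,000 flags */
#define CACHE_SLOTS 4   /* illustrative cache capacity */

typedef struct {
    unsigned char bits[(NUM_FLAGS + 7) / 8];  /* one hit flag per part */
    int failed;                               /* verdict of the error detector */
} Spectrum;

static Spectrum current;             /* spectrum of the running transaction */
static Spectrum cache[CACHE_SLOTS];  /* spectra awaiting transmission */
static unsigned cached;              /* number of valid entries in cache */

/* Instrumentation hook: record that part `id` was active. */
void SpectrumHit(unsigned id)
{
    assert(id < NUM_FLAGS);
    current.bits[id / 8] |= (unsigned char)(1u << (id % 8));
}

/* Read one flag of a stored spectrum. */
int SpectrumTest(const Spectrum *s, unsigned id)
{
    assert(s != NULL && id < NUM_FLAGS);
    return (s->bits[id / 8] >> (id % 8)) & 1;
}

/* Transaction boundary (periodic tick or key-press): label the current
 * spectrum, cache it, and start a fresh one. Returns 1 if the spectrum
 * was cached, 0 if it was dropped because the cache is full. */
int TransactionBoundary(int failed)
{
    int stored = 0;
    current.failed = failed;
    if (cached < CACHE_SLOTS) {
        cache[cached++] = current;
        stored = 1;
    }
    memset(&current, 0, sizeof current);
    return stored;
}

unsigned CachedCount(void) { return cached; }

const Spectrum *CachedSpectrum(unsigned k)
{
    assert(k < cached);
    return &cache[k];
}
```

Packing the flags into bits keeps a spectrum of 60,000 flags under 8 KB, which matters when the cache lives in a few megabytes of RAM. What to do when the cache is full (drop, block, or overwrite) is a design decision the sketch leaves open by simply reporting the drop.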
4.4 Diagnosis
For the load problem we used the scenario of Figure 3. We marked the last 60 spectra, for the second period of TV mode, as ‘failed’, and those of earlier transactions as ‘passed’. In the ranking that follows from the analysis of Section 2.3, the logical thread that had been identified by the developers as the actual cause of the load problem was in the second position out of 315. In the first position was a logical thread related to teletext, whose activation is part of the problem, so in this case we can conclude that although the diagnosis is not perfect, the implied suggestion for investigating the problem is quite useful.
For the lock-up problem, we used a proper error detection mechanism. On each key-press, when caching the current spectrum, a separate routine verifies the values of the two state variables, and marks the current spectrum as failed if they assume an invalid combination. Although this is a special-purpose mechanism, including and regularly checking high-level assert-like statements about correct behavior is a valid means to increase the error-awareness of systems.
Using a very simple scenario of 23 key-presses that essentially (1) verifies that the TV and teletext subsystems function correctly, (2) triggers the error injection, and (3) checks that the teletext subsystem is no longer responding, we immediately got a good diagnosis of the detected error: the first two positions in the total ranking of over 60,000 blocks pointed directly to our error injection code. Adding another three key-presses to exonerate an uncovered branch in this code made the diagnosis perfect: the exact statement that introduced the state inconsistency was located out of approximately 450K lines of source code.
5 Discussion
Especially the results for the lock-up problem have con-
vinced us that program spectra, and their application to fault
diagnosis are a viable technique and useful tool in the area
of embedded software in consumer electronics. However,
there are a number of issues with our implementation.
First, we cannot claim that we have not altered the timing
behavior of the system. Because of its rigorous design, the
TV is still functioning properly, but everything runs much
slower with the block-level instrumentation (e.g., changing
channels now takes seconds). One reason is that currently,
we collect block count spectra at byte resolution, and
convert to block hit spectra off-line. Updating the counters in
a multi-threaded environment requires a critical section for
every executed block, which is hugely expensive. Fortunately,
the count information is not used, and we believe we can
implement a binary flag update without a critical section.
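
The difference between the two kinds of update can be sketched as below. Incrementing a counter is a read-modify-write, so concurrent increments can be lost unless they are serialized; storing the constant 1 gives the same result however the writers interleave, which is presumably why a flag update needs no critical section. Python is used only to show the logic (on the target this would be a single store in C), and `count_to_hit`, `HitSpectrum`, and `mark` are hypothetical names.

```python
def count_to_hit(counts):
    """Off-line conversion of a block count spectrum to a block hit spectrum."""
    return [1 if c > 0 else 0 for c in counts]


class HitSpectrum:
    """One binary flag per instrumented block (illustrative layout)."""

    def __init__(self, n_blocks):
        self.flags = bytearray(n_blocks)

    def mark(self, block_id):
        # Unconditional store of a constant: repeating it, or racing with
        # another thread that stores the same value, cannot change the result.
        self.flags[block_id] = 1


spectrum = HitSpectrum(4)
for block_id in (2, 0, 2, 2):   # block 2 executes three times
    spectrum.mark(block_id)
```

Marking a block repeatedly leaves the same flags as marking it once, so the on-line flags match what the off-line conversion of the counts would produce.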
Second, we cache the spectra of passed transactions, and
transmit them off the system during CPU idle time. Be-

Citations
Proceedings ArticleDOI
10 Sep 2007
TL;DR: This work investigates diagnostic accuracy as a function of several parameters (such as quality and quantity of the program spectra collected during the execution of the system), some of which directly relate to test design, and indicates that the superior performance of a particular similarity coefficient, used to analyze the program spectra, is largely independent of test design.
Abstract: Spectrum-based fault localization shortens the test-diagnose-repair cycle by reducing the debugging effort. As a light-weight automated diagnosis technique it can easily be integrated with existing testing schemes. However, as no model of the system is taken into account, its diagnostic accuracy is inherently limited. Using the Siemens Set benchmark, we investigate this diagnostic accuracy as a function of several parameters (such as quality and quantity of the program spectra collected during the execution of the system), some of which directly relate to test design. Our results indicate that the superior performance of a particular similarity coefficient, used to analyze the program spectra, is largely independent of test design. Furthermore, near-optimal diagnostic accuracy (exonerating about 80% of the blocks of code on average) is already obtained for low-quality error observations and limited numbers of test cases. The influence of the number of test cases is of primary importance for continuous (embedded) processing applications, where only limited observation horizons can be maintained.

686 citations


Cites background from "Diagnosis of Embedded Software Usin..."

  • ...It can easily be integrated with existing testing procedures, and because of the relatively small overhead with respect to CPU time and memory requirements, it lends itself well for application within resource-constrained environments [24]....

    [...]

  • ...In addition to our benchmark studies on the Siemens set, we have also evaluated spectrum-based fault localization on a large-scale industrial code (embedded software in consumer electronics, [24])....

    [...]

Journal ArticleDOI
TL;DR: This work investigates diagnostic accuracy as a function of several parameters (such as quality and quantity of the program spectra collected during the execution of the system) and shows that SFL can effectively be applied in the context of embedded software development in an industrial environment.

443 citations


Cites background from "Diagnosis of Embedded Software Usin..."

  • ...It can easily be integrated with existing testing procedures, and because of the relatively small overhead with respect to CPU time and memory requirements, it lends itself well for application within resource-constrained environments (Zoeteweij et al., 2007)....

    [...]


Proceedings ArticleDOI
16 Nov 2009
TL;DR: Experimental results show that BARINEL typically outperforms current SFL approaches at a cost complexity that is only marginally higher, and this superiority is established by formal proof.
Abstract: Fault diagnosis approaches can generally be categorized into spectrum-based fault localization (SFL, correlating failures with abstractions of program traces), and model-based diagnosis (MBD, logic reasoning over a behavioral model). Although MBD approaches are inherently more accurate than SFL, their high computational complexity prohibits application to large programs. We present a framework to combine the best of both worlds, coined BARINEL. The program is modeled using abstractions of program traces (as in SFL) while Bayesian reasoning is used to deduce multiple-fault candidates and their probabilities (as in MBD). A particular feature of BARINEL is the usage of a probabilistic component model that accounts for the fact that faulty components may fail intermittently. Experimental results on both synthetic and real software programs show that BARINEL typically outperforms current SFL approaches at a cost complexity that is only marginally higher. In the context of single faults this superiority is established by formal proof.

353 citations


Cites methods from "Diagnosis of Embedded Software Usin..."

  • ...As an illustration, near-zero wasted effort is measured in experiments with SFL on a 0.5 MLOC industrial software product, reported in [40], where the problem reports (tests) typically focus on a particular anomaly (small C)....

    [...]

Journal ArticleDOI
TL;DR: This work shows that CC is prevalent in both of its forms and demonstrates that it is a safety reducing factor for Coverage-Based Fault Localization (CBFL), and proposes two techniques for cleansing test suites from coincidental correctness to enhance CBFL.
Abstract: Researchers have argued that for failure to be observed the following three conditions must be met: CR = the defect was reached; CI = the program has transitioned into an infectious state; and CP = the infection has propagated to the output. Coincidental Correctness (CC) arises when the program produces the correct output while condition CR is met but not CP. We recognize two forms of coincidental correctness, weak and strong. In weak CC, CR is met, whereas CI might or might not be met, whereas in strong CC, both CR and CI are met. In this work we first show that CC is prevalent in both of its forms and demonstrate that it is a safety reducing factor for Coverage-Based Fault Localization (CBFL). We then propose two techniques for cleansing test suites from coincidental correctness to enhance CBFL, given that the test cases have already been classified as failing or passing. We evaluated the effectiveness of our techniques by empirically quantifying their accuracy in identifying weak CC tests. The results were promising, for example, the better performing technique, using 105 test suites and statement coverage, exhibited 9% false negatives, 30% false positives, and no false negatives nor false positives in 14.3% of the test suites. Also using 73 test suites and more complex coverage, the numbers were 12%, 19%, and 15%, respectively.

87 citations

Proceedings ArticleDOI
15 Sep 2008
TL;DR: An empirical comparison is presented that investigates the relative accuracy of different models on a set of test programs and fault assumptions, showing that the abstract interpretation based model provides high accuracy at significantly less computational effort than slightly more accurate techniques.
Abstract: Developing model-based automatic debugging strategies has been an active research area for several years, with the aim of locating defects in a program by utilising fully automated generation of a model of the program from its source code. We provide an overview of current techniques in model-based debugging and assess strengths and weaknesses of the individual approaches. An empirical comparison is presented that investigates the relative accuracy of different models on a set of test programs and fault assumptions, showing that our abstract interpretation based model provides high accuracy at significantly less computational effort than slightly more accurate techniques. We compare a range of model-based debugging techniques with other state-of-the-art automated debugging approaches and outline possible future developments in automatic debugging using model-based reasoning as the central unifying component in a comprehensive framework.

81 citations


Cites methods from "Diagnosis of Embedded Software Usin..."

  • ..., with higher accuracy, (ii) integrating MBSD with spectra-based approaches to focus the debugging process [27], [28], and (iii) providing simple user interaction for incremental specification of complex program behaviour....

    [...]

References
01 Jan 1988

9,439 citations


"Diagnosis of Embedded Software Usin..." refers background in this paper

  • ...As an example, the Jaccard similarity coefficient (see also [8]) expresses the similarity sj of column j and the error vector as the number of positions in which these vectors share an entry 1 (i....

    [...]

Book
01 Jan 1988

8,586 citations

Journal ArticleDOI
TL;DR: The aim is to explicate a set of general concepts, of relevance across a wide range of situations and, therefore, helping communication and cooperation among a number of scientific and technical communities, including ones that are concentrating on particular types of system, of system failures, or of causes of systems failures.
Abstract: This paper gives the main definitions relating to dependability, a generic concept including a special case of such attributes as reliability, availability, safety, integrity, maintainability, etc. Security brings in concerns for confidentiality, in addition to availability and integrity. Basic definitions are given first. They are then commented upon, and supplemented by additional definitions, which address the threats to dependability and security (faults, errors, failures), their attributes, and the means for their achievement (fault prevention, fault tolerance, fault removal, fault forecasting). The aim is to explicate a set of general concepts, of relevance across a wide range of situations and, therefore, helping communication and cooperation among a number of scientific and technical communities, including ones that are concentrating on particular types of system, of system failures, or of causes of system failures.

4,695 citations


"Diagnosis of Embedded Software Usin..." refers methods in this paper

  • ...As defined in [3], we use the following terminology....

    [...]

01 Jan 2007
TL;DR: In this paper, the main definitions relating to dependability, a generic concept including a special case of such attributes as reliability, availability, safety, integrity, maintainability, etc.
Abstract: This paper gives the main definitions relating to dependability, a generic concept including a special case of such attributes as reliability, availability, safety, integrity, maintainability, etc. Security brings in concerns for confidentiality, in addition to availability and integrity. Basic definitions are given first. They are then commented upon, and supplemented by additional definitions, which address the threats to dependability and security (faults, errors, failures), their attributes, and the means for their achievement (fault prevention, fault tolerance, fault removal, fault forecasting). The aim is to explicate a set of general concepts, of relevance across a wide range of situations and, therefore, helping communication and cooperation among a number of scientific and technical communities, including ones that are concentrating on particular types of system, of system failures, or of causes of system failures.

4,335 citations

Journal ArticleDOI
TL;DR: The diagnostic procedure presented in this paper is model-based, inferring the behavior of the composite device from knowledge of the structure and function of the individual components comprising the device.

2,199 citations


"Diagnosis of Embedded Software Usin..." refers background in this paper

  • ..., [6]), where a diagnosis is obtained by logical inference from a formal model of the system, combined with a set of run-time observations....

    [...]

Frequently Asked Questions (13)
Q1. What have the authors contributed in "Diagnosis of embedded software using program spectra"?

In this paper the authors discuss the application of a specific automated debugging technique, namely software fault localization through the analysis of program spectra, in the area of embedded software in high-volume consumer electronics products. The authors discuss why the technique is particularly well suited for this application domain, and through experiments on an industrial test case they demonstrate that it can lead to highly accurate diagnoses of realistic errors. 

All audio and video processing is implemented in hardware, but the software is responsible for tasks such as decoding remote control input, displaying the on-screen menu, and coordinating the hardware (e.g., optimizing parameters for audio and video processing based on an analysis of the signals). 

The transmission is done on a low-priority thread while the CPU is otherwise idle, in order to minimize the impact on the timing behavior. 

To sort their example array, three exchanges must be made, and block 4, the body of the conditional statement, is executed three times. 

For diagnosing the load problem the authors obtained hit spectra for the logical threads mentioned in Section 4.1, resulting in spectra of 315 binary flags. 

The total available RAM in consumer sets is two megabytes, but in the special developer version that the authors used for their experiments, another two megabytes were available.

Especially the results for the lock-up problem have convinced us that program spectra, and their application to fault diagnosis, are a viable technique and useful tool in the area of embedded software in consumer electronics.

If an error detection mechanism is available, as in their experiments with the lock-up problem, then these four counters can be calculated on the fly, and the memory requirements become linear in the number of columns in the matrix of Figure 2.
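
A minimal sketch of this on-the-fly bookkeeping follows. It keeps four counters per column (block) and discards each spectrum once it has been counted, so storage does not grow with the number of transactions. The counter names `a11`, `a10`, `a01`, `a00` (first digit: block hit or not, second digit: transaction failed or not) and the class name `OnlineCounters` are naming assumptions made for this example, and the Jaccard coefficient is used as the similarity measure.

```python
class OnlineCounters:
    """Four counters per block, updated as each labeled spectrum arrives."""

    def __init__(self, n_blocks):
        self.a11 = [0] * n_blocks  # block hit,     transaction failed
        self.a10 = [0] * n_blocks  # block hit,     transaction passed
        self.a01 = [0] * n_blocks  # block not hit, transaction failed
        self.a00 = [0] * n_blocks  # block not hit, transaction passed

    def add(self, hit_flags, failed):
        """Account for one transaction; the spectrum itself is not stored."""
        for j, hit in enumerate(hit_flags):
            if hit:
                (self.a11 if failed else self.a10)[j] += 1
            else:
                (self.a01 if failed else self.a00)[j] += 1

    def jaccard(self, j):
        """Jaccard similarity of block j with the error vector."""
        denom = self.a11[j] + self.a01[j] + self.a10[j]
        return self.a11[j] / denom if denom else 0.0


counters = OnlineCounters(3)
counters.add([1, 0, 1], failed=False)
counters.add([1, 1, 1], failed=True)
counters.add([0, 0, 1], failed=False)
scores = [counters.jaccard(j) for j in range(3)]
```

Because the pass/fail verdict is known when each spectrum is cached, every row can be folded into the counters immediately, which is what makes the error detection mechanism a precondition for this scheme.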

The software itself consists of approximately 450K lines of C code, which is configured from a much larger (several MLOC) code base of Koala software components [12]. 

Their design and implementation are complicated by factors that can largely be abstracted away from in other software systems, such as deadlock prevention, and timing constraints involved in, e.g., writing to the graphics display only in those fractions of a second that the screen is not being refreshed.

Profiling tools such as gcov are convenient for obtaining program spectra, but they are typically not available in a development environment for embedded software. 

A known problem with the specific version of the control software that the authors had access to is that after teletext viewing, the CPU load when watching television (TV mode) is approximately 10% higher than before teletext viewing.

The CPU load clearly increases around the 60th sample, when the teletext viewing starts, but never returns to its initial level after sample 90, when the authors switch back to TV mode.