# Breakpad Processor Library

## Objective

The Breakpad processor library is an open-source framework to access the
information contained within crash dumps for multiple platforms, and to use that
information to produce stack traces showing the call chain of each thread in a
process. After processing, this data is made available to users of the library.

## Background

The Breakpad processor is intended to sit at the core of a comprehensive
crash-reporting system that does not require debugging information to be
provided to those running applications being monitored. Some existing
crash-reporting systems, such as [GNOME](http://www.gnome.org/)’s Bug-Buddy and
[Apple](http://www.apple.com/)’s
[CrashReporter](http://developer.apple.com/technotes/tn2004/tn2123.html),
require symbolic information to be present on the end user’s computer; in the
case of CrashReporter, the reports are transmitted only to Apple, not to
third-party developers. Other systems, such as
[Microsoft](http://www.microsoft.com/)’s
[Windows Error Reporting](http://msdn.microsoft.com/isv/resources/wer/) and
SupportSoft’s Talkback, transmit only a snapshot of a crashed process’ state,
which can later be combined with symbolic debugging information without the need
for it to be present on end users’ computers. Because symbolic debugging
information consumes a large amount of space and is otherwise not needed during
the normal operation of software, and because some developers are reluctant to
release debugging symbols to their customers, Breakpad follows the latter
approach.

We know of no currently-maintained crash-reporting systems that meet our
requirements, which are to:

* allow for symbols to be separate from the application,
* handle crash reports from multiple platforms,
* allow developers to operate their own crash-reporting platform, and
* be open-source.

Windows Error Reporting only functions for Microsoft products, and requires the
involvement of Microsoft’s servers. Talkback, while cross-platform, has not been
maintained and at this point does not support Mac OS X on x86, which we consider
to be a significant platform. Talkback is also closed-source commercial
software, and has very specific requirements for its server platform.

We are aware of Windows-only crash-reporting systems that leverage Microsoft’s
debugging interfaces. Such systems, even if extended to support dumps from other
platforms, are tied to using Windows for at least a portion of the processor
platform.

## Overview

The Breakpad processor itself is written in standard C++ and will work on a
variety of platforms. The dumps it accepts may also have been created on a
variety of systems. The library is able to combine dumps with symbolic debugging
information to create stack traces that include function signatures. The
processor library includes simple command-line tools to examine dumps and
process them, producing stack traces. It also exposes several layers of APIs
enabling crash-reporting systems to be built around the Breakpad processor.

## Detailed Design

### Dump Files

In the processor, the dump data is of primary significance. Dumps typically
contain:

* CPU context (register data) as it was at the time the crash occurred, and an
  indication of which thread caused the crash. General-purpose registers are
  included, as are special-purpose registers such as the instruction pointer
  (program counter).
* Information about each thread of execution within a crashed process,
  including:
    * The memory region used for each thread’s stack.
    * CPU context for each thread, which for various reasons is not the same
      as the crash context in the case of the crashed thread.
* A list of loaded code segments (or modules), including:
    * The name of the file (`.so`, `.exe`, `.dll`, etc.) which provides the
      code.
    * The boundaries of the memory region in which the code segment is visible
      to the process.
    * A reference to the debugging information for the code module, when such
      information is available.
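As a rough illustration, the contents listed above could be modeled with a few
plain structures. This is only a sketch: every type, field, and function name
below (`DumpContents`, `ThreadInfo`, `ModuleInfo`, `MemoryRange`, `FindModule`)
is invented for this example and is not one of Breakpad's actual classes.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of what a dump carries; not Breakpad's real types.
struct MemoryRange {
  uint64_t base = 0;  // first address of the region
  uint64_t size = 0;  // length of the region in bytes

  bool Contains(uint64_t address) const {
    return address >= base && address - base < size;
  }
};

struct CpuContext {
  uint64_t instruction_pointer = 0;
  uint64_t stack_pointer = 0;
  std::vector<uint64_t> general_registers;
};

struct ThreadInfo {
  uint32_t thread_id = 0;
  MemoryRange stack;   // memory region used for this thread's stack
  CpuContext context;  // per-thread context, distinct from the crash context
};

struct ModuleInfo {
  std::string code_file;        // e.g. "app.exe" or "libfoo.so"
  MemoryRange range;            // where the code is visible to the process
  std::string debug_reference;  // empty when no debugging information exists
};

struct DumpContents {
  bool has_crash = false;  // false for dumps requested without a crash
  uint32_t crashing_thread_id = 0;
  CpuContext crash_context;
  std::vector<ThreadInfo> threads;
  std::vector<ModuleInfo> modules;
};

// Returns the module whose mapped range covers `address`, or nullptr.
const ModuleInfo* FindModule(const DumpContents& dump, uint64_t address) {
  for (const ModuleInfo& module : dump.modules) {
    if (module.range.Contains(address)) return &module;
  }
  return nullptr;
}
```

A lookup such as `FindModule(dump, thread.context.instruction_pointer)` is the
kind of query that the module list makes possible: it ties an address found in
a thread's context back to the file that supplied the code.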

Ordinarily, dumps are produced as a result of a crash, but other triggers may be
set to produce dumps at any time a developer deems appropriate. The Breakpad
processor can handle dumps in the minidump format, either generated by a
[Breakpad client “handler”](client_design.md) implementation, or by another
implementation that produces dumps in this format. The
[DbgHelp.dll!MiniDumpWriteDump](http://msdn2.microsoft.com/en-us/library/ms680360.aspx)
function on Windows produces dumps in this format, and is the basis for the
Breakpad handler implementation on that platform.

The [minidump format](http://msdn.microsoft.com/en-us/library/ms679293%28VS.85%29.aspx)
is essentially a simple container format, organized as a series of streams. Each
stream contains some type of data relevant to the crash. A typical “normal”
minidump contains streams for the thread list, the module list, the CPU context
at the time of the crash, and various bits of additional system information.
Other types of minidump can be generated, such as a full-memory minidump, which
in addition to stack memory contains snapshots of all of a process’ mapped
memory regions.
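The container idea can be sketched in a few lines of C++. The field layouts
below follow Microsoft's published `MINIDUMP_HEADER` and `MINIDUMP_DIRECTORY`
structures, but the type names, constants, and the `FindStream` helper are
illustrative only and are not taken from Breakpad's `minidump_format.h`. The
sketch also assumes a little-endian host, which a real reader would not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct LocationDescriptor {
  uint32_t data_size;  // size of the stream in bytes
  uint32_t rva;        // offset of the stream from the start of the file
};

struct Header {
  uint32_t signature;             // 'MDMP'
  uint32_t version;
  uint32_t stream_count;          // number of directory entries
  uint32_t stream_directory_rva;  // offset of the directory array
  uint32_t checksum;
  uint32_t time_date_stamp;
  uint64_t flags;
};

struct DirectoryEntry {
  uint32_t stream_type;
  LocationDescriptor location;
};

const uint32_t kSignature = 0x504d444d;  // "MDMP" read as a little-endian word
const uint32_t kThreadListStream = 3;
const uint32_t kModuleListStream = 4;

// Scans the stream directory for `stream_type`. Returns true and fills `out`
// when a matching entry is found; returns false on a bad or truncated file.
bool FindStream(const std::vector<uint8_t>& file, uint32_t stream_type,
                LocationDescriptor* out) {
  assert(out != nullptr);
  Header header;
  if (file.size() < sizeof(header)) return false;
  std::memcpy(&header, file.data(), sizeof(header));
  if (header.signature != kSignature) return false;

  for (uint32_t i = 0; i < header.stream_count; ++i) {
    size_t offset = header.stream_directory_rva +
                    static_cast<size_t>(i) * sizeof(DirectoryEntry);
    if (offset + sizeof(DirectoryEntry) > file.size()) return false;
    DirectoryEntry entry;
    std::memcpy(&entry, file.data() + offset, sizeof(entry));
    if (entry.stream_type == stream_type) {
      *out = entry.location;
      return true;
    }
  }
  return false;
}
```

Because every stream is reached through the directory, a reader can skip
stream types it does not understand, which is what makes the format easy to
extend.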

The minidump format was chosen as Breakpad’s dump format because it has an
established track record on Windows, and it can be adapted to meet the needs of
the other platforms that Breakpad supports. Most other operating systems use
“core” files as their native dump formats, but the capabilities of core files
vary across platforms, and because core files are usually presented in a
platform’s native executable format, there are complications involved in
accessing the data contained therein without the benefit of the header files
that define an executable format’s entire structure. Because minidumps are
leaner than a typical executable format, a redefinition of the format in a
cross-platform header file, `minidump_format.h`, was a straightforward task.
Similarly, the capabilities of the minidump format are understood, and because
it provides an extensible container, any of Breakpad’s needs that could not be
met directly by the standard minidump format could likely be met by extending it
as needed. Finally, using this format means that the dump file is compatible
with native debugging tools at least on Windows. A possible future avenue for
exploration is the conversion of minidumps to core files, to enable this same
benefit on other platforms.

We have already provided an extension to the minidump format that allows it to
carry dumps generated on systems with PowerPC processors. The format already
allows for different CPU types, so our work in this area was limited to defining
a context structure sufficient to represent the execution state of a PowerPC. We
have also defined an extension that allows minidumps to indicate which thread of
execution requested a dump be produced for non-crash dumps.

Often, the information contained within a dump alone is sufficient to produce a
full stack backtrace for each thread. Certain optimizations that compilers
employ in producing code frustrate this process, however. Specifically, the
“frame pointer omission” optimization of x86 compilers can make it impossible to
produce useful stack traces given only a stack snapshot and CPU context. In
these cases, however, compiler-emitted debugging information can aid in
producing useful stack traces. The Breakpad processor is able to take advantage
of this debugging information as supplied by Microsoft’s C/C++ compiler, the
only compiler to apply such optimizations by default. As a result, the Breakpad
processor can produce useful stack traces even from code with frame pointer
omission optimizations as produced by this compiler.
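To see why frame pointer omission matters, consider the conventional x86 walk,
sketched below under simplifying assumptions: the stack snapshot is modeled as
a map from address to 32-bit word, each frame stores the caller's saved `ebp`
at `[ebp]` and the return address at `[ebp + 4]`, and the names `StackMemory`
and `WalkFramePointers` are invented for this example. This is not Breakpad's
stackwalker.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Simulated stack snapshot: address -> 32-bit word stored at that address.
typedef std::map<uint32_t, uint32_t> StackMemory;

// Follows the chain of saved frame pointers. Returns the instruction
// addresses recovered, starting with the current instruction pointer.
std::vector<uint32_t> WalkFramePointers(const StackMemory& stack,
                                        uint32_t eip, uint32_t ebp) {
  std::vector<uint32_t> frames;
  frames.push_back(eip);
  while (true) {
    StackMemory::const_iterator saved_ebp = stack.find(ebp);
    StackMemory::const_iterator return_address = stack.find(ebp + 4);
    // Stop as soon as ebp does not point at readable stack memory.
    if (saved_ebp == stack.end() || return_address == stack.end()) break;
    frames.push_back(return_address->second);
    // The stack grows downward, so a caller's frame must sit at a higher
    // address; anything else ends the chain.
    if (saved_ebp->second <= ebp) break;
    ebp = saved_ebp->second;
  }
  return frames;
}
```

When a function is compiled with frame pointer omission, `ebp` holds an
arbitrary value rather than a link to the caller's frame, so a walk like this
one stops early or goes astray. That gap is what the compiler-emitted debugging
information fills.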

### Symbol Files

The [symbol files](symbol_files.md) that the Breakpad processor accepts allow
for frame pointer omission data, but this is only one of their capabilities.
Each symbol file also includes information about the functions, source files,
and source code line numbers for a single module of code. A module is an
individually-loadable chunk of code: these can be executables containing a main
program (`.exe` files on Windows) or shared libraries (`.so` files on Linux,
`.dylib` files, frameworks, and bundles on Mac OS X, and `.dll` files on
Windows). Dumps contain information about which of these modules were loaded at
the time the dump was produced, and given this information, the Breakpad
processor attempts to locate debugging symbols for the module through a
user-supplied function embodied in a “symbol supplier.” Breakpad includes a
sample symbol supplier, called `SimpleSymbolSupplier`, that is used by its
command-line tools; this supplier locates symbol files by pathname.
`SimpleSymbolSupplier` is also available to other users of the Breakpad
processor library. This allows for the use of a simple reference implementation,
but preserves flexibility for users who may have more demanding symbol file
storage needs.
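A pathname-based lookup of this kind might be sketched as follows. The
directory layout `<root>/<debug file>/<debug identifier>/<name>.sym` is an
assumption made for this example, and `SymbolFilePath` is a hypothetical
helper, not a function of `SimpleSymbolSupplier`; consult the Breakpad sources
for the layout the real supplier expects.

```cpp
#include <string>

// Hypothetical helper. Assumes symbol files are stored as
// <root>/<debug_file>/<debug_id>/<name>.sym, where <name> is the debug file
// name with any trailing ".pdb" removed.
std::string SymbolFilePath(const std::string& root,
                           const std::string& debug_file,
                           const std::string& debug_id) {
  std::string name = debug_file;
  const std::string pdb = ".pdb";
  if (name.size() > pdb.size() &&
      name.compare(name.size() - pdb.size(), pdb.size(), pdb) == 0) {
    name.erase(name.size() - pdb.size());
  }
  return root + "/" + debug_file + "/" + debug_id + "/" + name + ".sym";
}
```

A supplier with more demanding storage needs could replace this function with
a database query or a network fetch while presenting the same result to the
processor: a symbol file for a given module, or nothing.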

Breakpad’s symbol file format is text-based, and was defined to be fairly
human-readable and to encompass the needs of multiple platforms. The Breakpad
processor itself does not operate directly with native symbol formats
([DWARF](http://dwarf.freestandards.org/) and
[STABS](http://sourceware.org/gdb/current/onlinedocs/stabs.html) on most
Unix-like systems,
[.pdb files](http://msdn2.microsoft.com/en-us/library/yd4f8bd1(VS.80).aspx) on
Windows), because of the complications in accessing potentially complex symbol
formats with slight variations between platforms, stored within different types
of binary formats. In the case of `.pdb` files, the debugging format is not even
documented. Instead, Breakpad’s symbol files are produced on each platform,
using specific debugging APIs where available, to convert native symbols to
Breakpad’s cross-platform format.
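Because the format is line-oriented text, a record can be read with ordinary
stream parsing. The sketch below assumes a function record of the form
`FUNC address size parameter_size name`, with hexadecimal numbers and a name
that may contain spaces; see [symbol files](symbol_files.md) for the
authoritative definition. `FunctionRecord` and `ParseFuncRecord` are names
invented for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

struct FunctionRecord {
  uint64_t address;         // start of the function, relative to the module
  uint64_t size;            // length of the function in bytes
  uint64_t parameter_size;  // bytes of stack used by parameters
  std::string name;
};

// Parses one assumed "FUNC address size parameter_size name" line.
// Returns false if the line is some other record type or is malformed.
bool ParseFuncRecord(const std::string& line, FunctionRecord* out) {
  assert(out != nullptr);
  out->name.clear();
  std::istringstream in(line);
  std::string tag;
  if (!(in >> tag) || tag != "FUNC") return false;
  if (!(in >> std::hex >> out->address >> out->size >> out->parameter_size)) {
    return false;
  }
  // The rest of the line, minus leading whitespace, is the function name.
  std::getline(in >> std::ws, out->name);
  return !out->name.empty();
}
```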

### Processing

Most commonly, a developer will enable an application to use Breakpad by
building it with a platform-specific [client “handler”](client_design.md)
library. After building the application, the developer will create symbol files
for Breakpad’s use using the included `dump_syms` or `symupload` tools, or
another suitable tool, and place the symbol files where the processor’s symbol
supplier will be able to locate them.

When a dump file is given to the processor’s `MinidumpProcessor` class, it will
read it using its included minidump reader, contained in the `Minidump` family
of classes. It will collect information about the operating system and CPU that
produced the dump, and determine whether the dump was produced as a result of a
crash or at the direct request of the application itself. It then loops over all
of the threads in a process, attempting to walk the stack associated with each
thread. This process is achieved by the processor’s `Stackwalker` components, of
which there is a slightly different implementation for each CPU type whose dumps
the processor is able to handle. Beginning with a thread’s context, and possibly
using debugging data, the stackwalker produces a list of stack frames, each
containing the instruction being executed at that point in the call chain. These
instructions are matched up with the modules that contributed them to a process,
and the `SymbolSupplier` is invoked to locate a symbol file. The symbol file is
given to a `SourceLineResolver`, which matches the instruction up with a
specific function name, source file, and line number, resulting in a
representation of a stack frame that can easily be used to identify which code
was executing.
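The final matching step can be pictured as a range lookup. The sketch below
assumes the functions of one module have already been loaded from its symbol
file into a vector sorted by address, and that the instruction address has been
converted to an offset from the module's base. `Function`, `ByAddress`, and
`ResolveFunction` are names invented for this example; they do not describe the
internals of `SourceLineResolver`.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

struct Function {
  uint64_t address;  // start offset within the module
  uint64_t size;     // length in bytes
  std::string name;
};

// Orders a bare address against a function's start address.
struct ByAddress {
  bool operator()(uint64_t address, const Function& function) const {
    return address < function.address;
  }
};

// Returns the function covering `module_offset`, or nullptr if the offset
// falls outside every known function. `functions` must be sorted by address.
const Function* ResolveFunction(const std::vector<Function>& functions,
                                uint64_t module_offset) {
  // Find the first function that starts after the offset; the candidate is
  // the one immediately before it.
  std::vector<Function>::const_iterator next = std::upper_bound(
      functions.begin(), functions.end(), module_offset, ByAddress());
  if (next == functions.begin()) return nullptr;
  const Function& candidate = *(next - 1);
  if (module_offset - candidate.address >= candidate.size) return nullptr;
  return &candidate;
}
```

The same binary-search approach extends naturally to line records, which map
address ranges within a function to a source file and line number.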

The results of processing are made available in a `ProcessState` object, which
contains a vector of threads, each containing a vector of stack frames.
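A consumer of such a result typically iterates over threads and then over the
frames of each thread. The sketch below uses stand-in types (`Frame`, `Thread`,
`Result`) and a made-up output layout to show the shape of that traversal; it
does not reproduce the real `ProcessState` interface.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Stand-in types for illustration only.
struct Frame {
  std::string module;
  std::string function;
  std::string source_file;
  int line;
};
typedef std::vector<Frame> Thread;
struct Result {
  std::vector<Thread> threads;
};

// Renders every thread's frames, innermost frame first.
std::string FormatResult(const Result& result) {
  std::ostringstream out;
  for (size_t t = 0; t < result.threads.size(); ++t) {
    out << "Thread " << t << "\n";
    const Thread& frames = result.threads[t];
    for (size_t f = 0; f < frames.size(); ++f) {
      out << " " << f << "  " << frames[f].module << "!" << frames[f].function
          << " [" << frames[f].source_file << " : " << frames[f].line << "]\n";
    }
  }
  return out.str();
}
```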

For small-scale use of the Breakpad processor, and for testing and debugging,
the `minidump_stackwalk` tool is provided. It invokes the processor and displays
the full results of processing, optionally allowing symbols to be provided to
the processor by a pathname-based symbol supplier, `SimpleSymbolSupplier`.

For lower-level testing and debugging, the processor library also includes a
`minidump_dump` tool, which walks through an entire minidump file and displays
its contents in somewhat readable form.

### Platform Support

The Breakpad processor library is able to process dumps produced on Mac OS X
systems running on x86, x86-64, and PowerPC processors, on Windows and Linux
systems running on x86 or x86-64 processors, and on Android systems running on
ARM or x86 processors. The processor library itself is written in standard C++,
and should function properly in most Unix-like environments. It has been tested
on Linux and Mac OS X.

## Future Plans

There are currently no firm plans or timetables to implement any of the
following features, although they are possible avenues for future exploration.

The symbol file format can be extended to carry information about the locations
of parameters and local variables as stored in stack frames and registers, and
the processor can use this information to provide enhanced stack traces showing
function arguments and variable values.

On Mac OS X and Linux, we can provide tools to convert files from the minidump
format into the native core format. This will enable developers to open dump
files in a native debugger, just as they are presently able to do with minidumps
on Windows.