breakpad/docs/client_design.md

13 KiB
Raw Blame History

Breakpad Client Libraries

Objective

The Breakpad client libraries are responsible for monitoring an application for crashes (exceptions), handling them when they occur by generating a dump, and providing a means to upload dumps to a crash reporting server. These tasks are divided between the “handler” (short for “exception handler”) library linked in to an application being monitored for crashes, and the “sender” library, intended to be linked in to a separate external program.

Background

As one of the chief tasks of the client handler is to generate a dump, an understanding of dump files will aid in understanding the handler.

Overview

Breakpad provides client libraries for each of its target platforms. Currently, these exist for Windows on x86 and Mac OS X on both x86 and PowerPC. A Linux implementation has been written and is currently under review.

Because the mechanisms for catching exceptions and the methods for obtaining the information that a dump contains vary between operating systems, each target operating system requires a completely different handler implementation. Where multiple CPUs are supported for a single operating system, the handler implementation will likely also require separate code for each processor type to extract CPU-specific information. One of the goals of the Breakpad handler is to provide a prepackaged cross-platform system that masks many of these system-level differences and quirks from the application developer. Although the underlying implementations differ, the handler library for each system follows the same set of principles and exposes a similar interface.

Code that wishes to take advantage of Breakpad should be linked against the handler library, and should, at an appropriate time, install a Breakpad handler. For applications, it is generally desirable to install the handler as early in the start-up process as possible. Developers of library code using Breakpad to monitor itself may wish to install a Breakpad handler when the library is loaded, or may only want to install a handler when calls are made in to the library.

The handler can be triggered to generate a dump either by catching an exception or at the request of the application itself. The latter case may be useful in debugging assertions or other conditions where developers want to know how a program got in to a specific non-crash state. After generating a dump, the handler calls a user-specified callback function. The callback function may collect additional data about the programs state, quit the program, launch a crash reporter application, or perform other tasks. Allowing for this functionality to be dictated by a callback function preserves flexibility.

The sender library is also has a separate implementation for each supported platform, because of the varying interfaces for accessing network resources on different operating systems. The sender transmits a dump along with other application-defined information to a crash report server via HTTP. Because dumps may contain sensitive data, the sender allows for the use of HTTPS.

The canonical example of the entire client system would be for a monitored application to link against the handler library, install a Breakpad handler from its main function, and provide a callback to launch a small crash reporter program. The crash reporter program would be linked against the sender library, and would send the crash dump when launched. A separate process is recommended for this function because of the unreliability inherent in doing any significant amount of work from a crashed process.

Detailed Design

Exception Handler Installation

The mechanisms for installing an exception handler vary between operating systems. On Windows, its a relatively simple matter of making one call to register a top-level exception filter callback function. On most Unix-like systems such as Linux, processes are informed of exceptions by the delivery of a signal, so an exception handler takes the form of a signal handler. The native mechanism to catch exceptions on Mac OS X requires a large amount of code to set up a Mach port, identify it as the exception port, and assign a thread to listen for an exception on that port. Just as the preparation of exception handlers differ, the manner in which they are called differs as well. On Windows and most Unix-like systems, the handler is called on the thread that caused the exception. On Mac OS X, the thread listening to the exception port is notified that an exception has occurred. The different implementations of the Breakpad handler libraries perform these tasks in the appropriate ways on each platform, while exposing a similar interface on each.

A Breakpad handler is embodied in an ExceptionHandler object. Because its a C++ object, ExceptionHandlers may be created as local variables, allowing them to be installed and removed as functions are called and return. This provides one possible way for a developer to monitor only a portion of an application for crashes.

Exception Basics

Once an application encounters an exception, it is in an indeterminate and possibly hazardous state. Consequently, any code that runs after an exception occurs must take extreme care to avoid performing operations that might fail, hang, or cause additional exceptions. This task is not at all straightforward, and the Breakpad handler library seeks to do it properly, accounting for all of the minute details while allowing other application developers, even those with little systems programming experience, to reap the benefits. All of the Breakpad handler code that executes after an exception occurs has been written according to the following guidelines for safety at exception time:

  • Use of the application heap is forbidden. The heap may be corrupt or otherwise unusable, and allocators may not function.
  • Resource allocation must be severely limited. The handler may create a new file to contain the dump, and it may attempt to launch a process to continue handling the crash.
  • Execution on the thread that caused the exception is significantly limited. The only code permitted to execute on this thread is the code necessary to transition handling to a dedicated preallocated handler thread, and the code to return from the exception handler.
  • Handlers shouldnt handle crashes by attempting to walk stacks themselves, as stacks may be in inconsistent states. Dump generation should be performed by interfacing with the operating systems memory manager and code module manager.
  • Library code, including runtime library code, must be avoided unless it provably meets the above guidelines. For example, this means that the STL string class may not be used, because it performs operations that attempt to allocate and use heap memory. It also means that many C runtime functions must be avoided, particularly on Windows, because of heap operations that they may perform.

A dedicated handler thread is used to preserve the state of the exception thread when an exception occurs: during dump generation, it is difficult if not impossible for a thread to accurately capture its own state. Performing all exception-handling functions on a separate thread is also critical when handling stack-limit-exceeded exceptions. It would be hazardous to run out of stack space while attempting to handle an exception. Because of the rule against allocating resources at exception time, the Breakpad handler library creates its handler thread when it installs its exception handler. On Mac OS X, this handler thread is created during the normal setup of the exception handler, and the handler thread will be signaled directly in the event of an exception. On Windows and Linux, the handler thread is signaled by a small amount of code that executes on the exception thread. Because the code that executes on the exception thread in this case is small and safe, this does not pose a problem. Even when an exception is caused by exceeding stack size limits, this code is sufficiently compact to execute entirely within the stacks guard page without causing an exception.

The handler thread may also be triggered directly by a user call, even when no exception occurs, to allow dumps to be generated at any point deemed interesting.

Filter Callback

When the handler thread begins handling an exception, it calls an optional user-defined filter callback function, which is responsible for judging whether Breakpads handler should continue handling the exception or not. This mechanism is provided for the benefit of library or plug-in code, whose developers may not be interested in reports of crashes that occur outside of their modules but within processes hosting their code. If the filter callback indicates that it is not interested in the exception, the Breakpad handler arranges for it to be delivered to any previously-installed handler.

Dump Generation

Assuming that the filter callback approves (or does not exist), the handler writes a dump in a directory specified by the application developer when the handler was installed, using a previously generated unique identifier to avoid name collisions. The mechanics of dump generation also vary between platforms, but in general, the process involves enumerating each thread of execution, and capturing its state, including processor context and the active portion of its stack area. The dump also includes a list of the code modules loaded in to the application, and an indicator of which thread generated the exception or requested the dump. In order to avoid allocating memory during this process, the dump is written in place on disk.

Post-Dump Behavior

Upon completion of writing the dump, a second callback function is called. This callback may be used to launch a separate crash reporting program or to collect additional data from the application. The callback may also be used to influence whether Breakpad will treat the exception as handled or unhandled. Even after a dump is successfully generated, Breakpad can be made to behave as though it didnt actually handle an exception. This function may be useful for developers who want to test their applications with Breakpad enabled but still retain the ability to use traditional debugging techniques. It also allows a Breakpad-enabled application to coexist with a platforms native crash reporting system, such as Mac OS X CrashReporter and Windows Error Reporting.

Typically, when Breakpad handles an exception fully and no debuggers are involved, the crashed process will terminate.

Authors of both callback functions that execute within a Breakpad handler are cautioned that their code will be run at exception time, and that as a result, they should observe the same programming practices that the Breakpad handler itself adheres to. Notably, if a callback is to be used to collect additional data from an application, it should take care to read only “safe” data. This might involve accessing only static memory locations that are updated periodically during the course of normal program execution.

Sender Library

The Breakpad sender library provides a single function to send a crash report to a crash server. It accepts a crash servers URL, a map of key-value parameters that will accompany the dump, and the path to a dump file itself. Each of the key-value parameters and the dump file are sent as distinct parts of a multipart HTTP POST request to the specified URL using the platforms native HTTP facilities. On Linux, libcurl is used for this function, as it is the closest thing to a standard HTTP library available on that platform.

Future Plans

Although weve had great success with in-process dump generation by following our guidelines for safe code at exception time, we are exploring options for allowing dumps to be generated in a separate process, to further enhance the handler librarys robustness.

On Windows, we intend to offer tools to make it easier for Breakpads settings to be managed by the native group policy management system.

We also plan to offer tools that many developers would find desirable in the context of handling crashes, such as a mechanism to determine at launch if the program last terminated in a crash, and a way to calculate “crashiness” in terms of crashes over time or the number of application launches between crashes.

We are also investigating methods to capture crashes that occur early in an applications launch sequence, including crashes that occur before a programs main function begins executing.