mac dump_syms: Support .dSYMs > 4GB (partially)

Even 64-bit Mach-O (MH_MAGIC_64 = 0xfeedfacf) is not a fully 64-bit file
format. Section file offsets are stored in 32-bit fields, and Mach-O
writers typically truncate offsets that are too large to fit, keeping
only their low 32 bits. Before this change, when a section began at a
file offset >= 4GB, dump_syms would produce an error such as:

Google Chrome Framework.dSYM/Contents/Resources/DWARF/Google Chrome Framework: the section '__apple_names' in segment '__DWARF' claims its contents lie outside the segment's contents
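
To illustrate why the truncation triggers this error, here is a minimal
sketch. It is not dump_syms code: the helper names SectionWithinSegment
and TruncateOffset are hypothetical, and the check is only assumed to
resemble the one the reader performs.

```cpp
#include <cstdint>

// Hypothetical helper: true if [offset, offset + size) lies within the
// segment's file range [seg_fileoff, seg_fileoff + seg_filesize).
bool SectionWithinSegment(uint64_t offset, uint64_t size,
                          uint64_t seg_fileoff, uint64_t seg_filesize) {
  if (offset < seg_fileoff) return false;
  const uint64_t rel = offset - seg_fileoff;
  return rel <= seg_filesize && size <= seg_filesize - rel;
}

// What a Mach-O writer typically stores in the 32-bit section offset
// field: only the low 32 bits of the true file offset.
uint32_t TruncateOffset(uint64_t true_offset) {
  return static_cast<uint32_t>(true_offset);
}
```

With a segment that spans the 4GB boundary, a section whose true offset
is just past 4GB passes the check, while its truncated offset falls
before the start of the segment and fails it.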

As a workaround, this implements the strategy I first described in
https://crbug.com/940823#c22.

Segment file offsets are stored in 64-bit fields. Because segments
contain sections and must load contiguously, it’s possible to infer a
section’s actual offset by computing its load address relative to its
containing segment’s load address, and treating this as an offset into
the containing segment’s file offset. For safety, this is only done for
64-bit segments (LC_SEGMENT_64) where the 32-bit section offset stored
in the Mach-O file is equal to the low (truncated) 32 bits of the
section offset recomputed per the above strategy.
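
The strategy can be sketched as a standalone function. This mirrors the
logic added to Reader::WalkSegmentSections in this change, but the
SegmentInfo struct and the RecoverSectionOffset name are hypothetical
stand-ins used only for illustration.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the segment fields the strategy needs.
struct SegmentInfo {
  bool bits_64;       // true for LC_SEGMENT_64
  uint64_t vmaddr;    // segment load address
  uint64_t fileoff;   // segment file offset (64-bit field)
  uint64_t filesize;  // segment file size (64-bit field)
};

// Returns the section's file offset, widened to 64 bits when the stored
// 32-bit value matches the low 32 bits of the recomputed offset.
uint64_t RecoverSectionOffset(const SegmentInfo& segment,
                              uint64_t section_address,
                              uint32_t stored_offset) {
  uint64_t offset = stored_offset;
  // Only consider 64-bit segments that extend past what 32 bits can hold.
  if (segment.bits_64 &&
      segment.fileoff + segment.filesize >
          std::numeric_limits<uint32_t>::max()) {
    // Segments load contiguously, so the section's distance from the
    // segment's load address equals its distance into the segment's file
    // data.
    const uint64_t recomputed =
        segment.fileoff + section_address - segment.vmaddr;
    if (stored_offset == static_cast<uint32_t>(recomputed)) {
      offset = recomputed;
    }
  }
  return offset;
}
```

If the stored offset does not match the low 32 bits of the recomputed
value, the stored offset is returned unchanged, so files that are not
affected by truncation are interpreted exactly as before.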

Beware that this does not provide full “large file” support for 64-bit
Mach-O files. There are other file offsets within Mach-O files aside
from section file offsets that are stored in 32-bit fields even in the
64-bit format, including offsets to symbol table data (LC_SYMTAB and
LC_DYSYMTAB). No attempt is made to recover correct file offsets for
such data because, at present, such data is always stored by dsymutil
near the beginning of .dSYM files, within the first 4GB. If it becomes
necessary to address these other offsets, it should be possible to
recover these offsets by reference to the __LINKEDIT segment that
normally contains them, provided that __LINKEDIT doesn’t span more than
4GB, according to the strategy discussed at the bottom of
https://crbug.com/940823#c22.
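
As a rough illustration only, one possible reading of that idea is
sketched below. It is not taken from the linked comment and is not
implemented by this change; the RecoverLinkeditOffset name and its
parameters are hypothetical. It picks the unique 64-bit offset inside
__LINKEDIT whose low 32 bits match the stored value, and conservatively
gives up when __LINKEDIT is 4GB or larger, since the match would no
longer be unique.

```cpp
#include <cstdint>

// Hypothetical sketch: recover a 64-bit file offset from a truncated
// 32-bit one, given the file range of the __LINKEDIT segment that is
// assumed to contain the data. Returns false if no unambiguous offset
// within the segment matches.
bool RecoverLinkeditOffset(uint64_t linkedit_fileoff,
                           uint64_t linkedit_filesize,
                           uint32_t stored_offset,
                           uint64_t* recovered) {
  const uint64_t k4GB = uint64_t{1} << 32;
  if (linkedit_filesize >= k4GB) return false;  // match would be ambiguous

  // Start from the 4GB-aligned block containing the segment start, then
  // step forward one block if that lands before the segment.
  uint64_t candidate = (linkedit_fileoff & ~(k4GB - 1)) | stored_offset;
  if (candidate < linkedit_fileoff) candidate += k4GB;

  if (candidate - linkedit_fileoff > linkedit_filesize) return false;
  *recovered = candidate;
  return true;
}
```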

Although this is sufficient to allow dump_syms to interpret Chromium
.dSYM files that exceed 4GB, be warned that these Mach-O files are still
technically malformed, and most other tools that consume Mach-O files
will continue to have difficulties interpreting these large files.

As further warning, note that should any individual DWARF section exceed
4GB, internal section offsets will be truncated irrecoverably, unless
and until the toolchain implements support for DWARF64.
https://bugs.llvm.org/show_bug.cgi?id=14969

With this change, dump_syms is able to correctly recover file offsets
from and continue processing a .dSYM file with length 4530593528
(4321MB), whose largest section (__DWARF,__debug_info = .debug_info) has
size 0x8d64c0b8 (2262MB), and which contains four sections (starting
with __DWARF,__apple_names) beginning at file offsets >= 4GB.

Bug: chromium:940823, chromium:946404
Change-Id: I23f5f3b07773fa2f010204d5bb53b6fb1d4926f7
Reviewed-on: https://chromium-review.googlesource.com/c/breakpad/breakpad/+/1541830
Reviewed-by: Robert Sesek <rsesek@chromium.org>
Reviewed-by: Mike Frysinger <vapier@chromium.org>
Mark Mentovai 2019-03-28 16:07:39 -04:00
parent a86aedb515
commit b4a0eb2d06
2 changed files with 34 additions and 10 deletions


@@ -38,6 +38,8 @@
 #include <stdio.h>
 #include <stdlib.h>

+#include <limits>
+
 // Unfortunately, CPU_TYPE_ARM is not define for 10.4.
 #if !defined(CPU_TYPE_ARM)
 #define CPU_TYPE_ARM 12
@@ -344,8 +346,8 @@ bool Reader::WalkLoadCommands(Reader::LoadCommandHandler *handler) const {
         cursor
           .Read(word_size, false, &segment.vmaddr)
           .Read(word_size, false, &segment.vmsize)
-          .Read(word_size, false, &file_offset)
-          .Read(word_size, false, &file_size);
+          .Read(word_size, false, &segment.fileoff)
+          .Read(word_size, false, &segment.filesize);
         cursor >> segment.maxprot
                >> segment.initprot
                >> segment.nsects
@@ -354,8 +356,8 @@ bool Reader::WalkLoadCommands(Reader::LoadCommandHandler *handler) const {
           reporter_->LoadCommandTooShort(index, type);
           return false;
         }
-        if (file_offset > buffer_.Size() ||
-            file_size > buffer_.Size() - file_offset) {
+        if (segment.fileoff > buffer_.Size() ||
+            segment.filesize > buffer_.Size() - segment.fileoff) {
           reporter_->MisplacedSegmentData(segment.name);
           return false;
         }
@@ -363,11 +365,11 @@ bool Reader::WalkLoadCommands(Reader::LoadCommandHandler *handler) const {
         // segments removed, and their file offsets and file sizes zeroed
         // out. To help us handle this special case properly, give such
         // segments' contents NULL starting and ending pointers.
-        if (file_offset == 0 && file_size == 0) {
+        if (segment.fileoff == 0 && segment.filesize == 0) {
           segment.contents.start = segment.contents.end = NULL;
         } else {
-          segment.contents.start = buffer_.start + file_offset;
-          segment.contents.end = segment.contents.start + file_size;
+          segment.contents.start = buffer_.start + segment.fileoff;
+          segment.contents.end = segment.contents.start + segment.filesize;
         }
         // The section list occupies the remainder of this load command's space.
         segment.section_list.start = cursor.here();
@@ -461,14 +463,14 @@ bool Reader::WalkSegmentSections(const Segment &segment,
   for (size_t i = 0; i < segment.nsects; i++) {
     Section section;
     section.bits_64 = segment.bits_64;
-    uint64_t size;
-    uint32_t offset, dummy32;
+    uint64_t size, offset;
+    uint32_t dummy32;
     cursor
       .CString(&section.section_name, 16)
       .CString(&section.segment_name, 16)
       .Read(word_size, false, &section.address)
       .Read(word_size, false, &size)
-      >> offset
+      .Read(sizeof(uint32_t), false, &offset)  // clears high bits of |offset|
       >> section.align
       >> dummy32
       >> dummy32
@@ -481,6 +483,24 @@ bool Reader::WalkSegmentSections(const Segment &segment,
       reporter_->SectionsMissing(segment.name);
       return false;
     }
+
+    // Even 64-bit Mach-O isn’t a true 64-bit format in that it doesn’t handle
+    // 64-bit file offsets gracefully. Segment load commands do contain 64-bit
+    // file offsets, but sections within do not. Because segments load
+    // contiguously, recompute each section’s file offset on the basis of its
+    // containing segment’s file offset and the difference between the section’s
+    // and segment’s load addresses. If truncation is detected, honor the
+    // recomputed offset.
+    if (segment.bits_64 &&
+        segment.fileoff + segment.filesize >
+            std::numeric_limits<uint32_t>::max()) {
+      const uint64_t section_offset_recomputed =
+          segment.fileoff + section.address - segment.vmaddr;
+      if (offset == static_cast<uint32_t>(section_offset_recomputed)) {
+        offset = section_offset_recomputed;
+      }
+    }
+
     const uint32_t section_type = section.flags & SECTION_TYPE;
     if (section_type == S_ZEROFILL || section_type == S_THREAD_LOCAL_ZEROFILL ||
         section_type == S_GB_ZEROFILL) {


@@ -175,6 +175,10 @@ struct Segment {
   // of this value are valid.
   uint64_t vmsize;

+  // The file offset and size of the segment in the Mach-O image.
+  uint64_t fileoff;
+  uint64_t filesize;
+
   // The maximum and initial VM protection of this segment's contents.
   uint32_t maxprot;
   uint32_t initprot;