Bug hunting in Linux kernel land: an unpretentious primer
Lately I've been fixing some defects in the hidraw interface in the Linux kernel (commits d20d5ff and e42dee9), simple issues like these NULL pointer dereference are a good example to show how to analyze a Linux bug trace in order to spot the defective code. So let's take a look.
The steps to fix any defect are basically:
- reproduce the problem (a quick introduction to the concept of “Observability” of software faults is in: Dissecting The Bug)
- find the offending code/policy
- elaborate a solution
- implement and test the solution is working
Reproducing the problem
If the problem is in some userspace/kernelspace interface then writing a test program which triggers the bug may be the quickest way to reproduce it. The test program should be as simple as possible, doing the minimum required to expose the defect, as it will be used also as a proof that the bug has been solved in the right way; just like when proving a theorem we don't want to have more hypotheses than are necessary. The test program for one of the issues above looks like this:
#include <stdio.h> #include <stdlib.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <unistd.h> #include <linux/hidraw.h> int main(int argc, char *argv[]) { int fd = -1; unsigned char report[2] = {0x01, 0x00}; int ret; if (argc != 2) { fprintf(stderr, "usage: %s </dev/hidrawX>\n", argv[0]); exit(1); } fd = open(argv[1], O_RDWR); if (fd < 0) { perror("hidraw open"); exit(1); } while (1) { ret = write(fd, report, sizeof(report)); printf("ret: %d\n", ret); } close(fd); exit(0); }
That gives me the occasion for an observation about one aspect of testing procedures: as you can see the test program above uses deliberately some bad practice, namely it does not check the return value of the write()
call in order to exit the while loop; we know this is generally bad from a programming point of view, however it turns out to be good from another point of view: bad programs can spot cases we didn't think of in our code paths; it goes without saying that this is no excuse for “uncontrolled” bad practices, nor for the bug we had in the first place :).
Anyhow, the code above exposes the defect on kernels up to 2.6.35, just run the program and unplug the (USB) device (a Sony Sixaxis in my case):
BUG: unable to handle kernel NULL pointer dereference at 0000000000000028 IP: [<ffffffffa0f0a625>] hidraw_write+0x3b/0x116 [hid] PGD 37082067 PUD 3701f067 PMD 0 Oops: 0000 [#1] SMP last sysfs file: /sys/devices/pci0000:00/0000:00:04.0/usb4/idVendor CPU 0 Modules linked in: hid_sony usbhid hid kvm_amd kvm powernow_k8 mperf cpufreq_powersave cpufreq_conservative cpufreq_stats cpufreq_userspace ipt_MASQUERADE lirc_serial(C) lirc_dev bridge stp ppdev lp sco bnep rfcomm l2cap crc16 tun sit tunnel4 binfmt_misc uinput fuse ip6table_raw ip6table_mangle ip6t_REJECT ip6t_LOG nf_conntrack_ipv6 ip6table_filter ip6_tables xt_tcpudp ipt_REJECT ipt_ULOG xt_limit xt_state xt_multiport iptable_filter iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_mangle iptable_raw ip_tables x_tables nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc hwmon_vid loop snd_hda_codec_nvhdmi snd_hda_codec_via snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi snd_rawmidi nvidia(P) snd_seq_midi_event snd_seq snd_timer snd_seq_device btusb bluetooth joydev snd edac_core shpchp video tpm_tis tpm evdev rfkill parport_pc parport tpm_bios pcspkr asus_atk0110 processor output edac_mce_amd soundcore snd_page_alloc i2c_nforce2 pci_hotplug k8temp wmi i2c_core button ext3 jbd mbcache dm_mod sg sd_mod sr_mod cdrom crc_t10dif ata_generic usb_storage ahci libahci pata_amd ohci_hcd libata ehci_hcd scsi_mod usbcore forcedeth floppy nls_base thermal thermal_sys [last unloaded: hid] Pid: 4818, comm: test_hidraw_wri Tainted: P C 2.6.35-ao2 #2 M3N78-VM/System Product Name RIP: 0010:[<ffffffffa0f0a625>] [<ffffffffa0f0a625>] hidraw_write+0x3b/0x116 [hid] RSP: 0018:ffff880037081ee8 EFLAGS: 00010202 RAX: 0000000000000000 RBX: 0000000000000002 RCX: ffff880037081f58 RDX: 0000000100000000 RSI: 0000000000000008 RDI: ffffffffa0f128b0 RBP: 0000000000000001 R08: ffffffffa0f0a5ea R09: 000000000040087a R10: 0000000000000000 R11: ffffffff81150c83 R12: ffff880037081f58 R13: 00007fff1b34cab0 R14: 00000000ffffffed R15: 0000000000000000 FS: 00007fa946870700(0000) GS:ffff880001a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000028 CR3: 000000005ad24000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process test_hidraw_wri (pid: 4818, threadinfo ffff880037080000, task ffff88006003e630) Stack: ffff88005ad95000 00007fff1b34cab0 ffff880037081f58 00007fff1b34cba0 <0> 0000000000000000 ffffffff810ea46f 0000000000000003 0000000000000008 <0> ffff88005ad95000 00007fff1b34cab0 00000000004005c0 ffffffff810ea57e Call Trace: [<ffffffff810ea46f>] ? vfs_write+0xa4/0x100 [<ffffffff810ea57e>] ? sys_write+0x45/0x6b [<ffffffff81008a02>] ? system_call_fastpath+0x16/0x1b Code: 53 48 8b 47 18 48 c7 c7 b0 28 f1 a0 48 89 d3 48 8b 40 10 8b 68 58 e8 07 9a 3f e0 81 e5 ff ff 0f 00 89 ed 48 8b 04 ed c0 2b f1 a0 <4c> 8b 60 28 49 83 bc 24 50 1c 00 00 00 0f 84 af 00 00 00 48 81 RIP [<ffffffffa0f0a625>] hidraw_write+0x3b/0x116 [hid] RSP <ffff880037081ee8> CR2: 0000000000000028 ---[ end trace 06cb2d01ded47cc4 ]---
The most important info in this message is highlighted above in line two, that is:
IP: [<ffffffffa0f0a625>] hidraw_write+0x3b/0x116 [hid]
This tells us the last instruction executed, the one triggering the bug; the format is symbol_name+offset/symbol_size [module_name]
as can be seen in mm/slab.c.
Finding the offending code
To find where the offending instruction is, we can disassemble the object code with objdump, here's the interesting function (click on the link below to expand the text):
00000000000008e5 <hidraw_write>: 8e5: 41 56 push %r14 8e7: 41 be ed ff ff ff mov $0xffffffed,%r14d 8ed: 41 55 push %r13 8ef: 49 89 f5 mov %rsi,%r13 8f2: 41 54 push %r12 8f4: 55 push %rbp 8f5: 53 push %rbx 8f6: 48 8b 47 18 mov 0x18(%rdi),%rax 8fa: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 901: 48 89 d3 mov %rdx,%rbx 904: 48 8b 40 10 mov 0x10(%rax),%rax 908: 8b 68 58 mov 0x58(%rax),%ebp 90b: e8 00 00 00 00 callq 910 <hidraw_write+0x2b> 910: 81 e5 ff ff 0f 00 and $0xfffff,%ebp 916: 89 ed mov %ebp,%ebp 918: 48 8b 04 ed 00 00 00 mov 0x0(,%rbp,8),%rax 91f: 00 920: 4c 8b 60 28 mov 0x28(%rax),%r12 924: 49 83 bc 24 30 1c 00 cmpq $0x0,0x1c30(%r12) 92b: 00 00 92d: 0f 84 af 00 00 00 je 9e2 <hidraw_write+0xfd> 933: 48 81 fb 00 10 00 00 cmp $0x1000,%rbx 93a: 76 25 jbe 961 <hidraw_write+0x7c> 93c: 65 48 8b 04 25 00 00 mov %gs:0x0,%rax 943: 00 00 945: 8b b0 c8 01 00 00 mov 0x1c8(%rax),%esi 94b: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 952: 31 c0 xor %eax,%eax 954: 41 b6 ea mov $0xea,%r14b 957: e8 00 00 00 00 callq 95c <hidraw_write+0x77> 95c: e9 81 00 00 00 jmpq 9e2 <hidraw_write+0xfd> 961: 48 83 fb 01 cmp $0x1,%rbx 965: 77 25 ja 98c <hidraw_write+0xa7> 967: 65 48 8b 04 25 00 00 mov %gs:0x0,%rax 96e: 00 00 970: 8b b0 c8 01 00 00 mov 0x1c8(%rax),%esi 976: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 97d: 31 c0 xor %eax,%eax 97f: 41 be ea ff ff ff mov $0xffffffea,%r14d 985: e8 00 00 00 00 callq 98a <hidraw_write+0xa5> 98a: eb 56 jmp 9e2 <hidraw_write+0xfd> 98c: be d0 00 00 00 mov $0xd0,%esi 991: 48 89 df mov %rbx,%rdi 994: 41 be f4 ff ff ff mov $0xfffffff4,%r14d 99a: e8 00 00 00 00 callq 99f <hidraw_write+0xba> 99f: 48 85 c0 test %rax,%rax 9a2: 48 89 c5 mov %rax,%rbp 9a5: 74 3b je 9e2 <hidraw_write+0xfd> 9a7: e8 00 00 00 00 callq 9ac <hidraw_write+0xc7> 9ac: 89 da mov %ebx,%edx 9ae: 4c 89 ee mov %r13,%rsi 9b1: 48 89 ef mov %rbp,%rdi 9b4: e8 00 00 00 00 callq 9b9 <hidraw_write+0xd4> 9b9: 48 85 c0 test %rax,%rax 9bc: 41 b6 f2 mov $0xf2,%r14b 9bf: 75 19 jne 9da <hidraw_write+0xf5> 9c1: b9 01 00 00 00 mov $0x1,%ecx 9c6: 48 89 da mov %rbx,%rdx 9c9: 48 89 ee mov %rbp,%rsi 9cc: 4c 89 e7 mov %r12,%rdi 9cf: 41 ff 94 24 30 1c 00 callq *0x1c30(%r12) 9d6: 00 9d7: 41 89 c6 mov %eax,%r14d 9da: 48 89 ef mov %rbp,%rdi 9dd: e8 00 00 00 00 callq 9e2 <hidraw_write+0xfd> 9e2: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 9e9: e8 00 00 00 00 callq 9ee <hidraw_write+0x109> 9ee: 5b pop %rbx 9ef: 5d pop %rbp 9f0: 41 5c pop %r12 9f2: 41 5d pop %r13 9f4: 49 63 c6 movslq %r14d,%rax 9f7: 41 5e pop %r14 9f9: c3 retq
From the first line above (00000000000008e5 <hidraw_write>
) we can see that the address of the hidraw_write()
function is 0x8e5
.
The instruction we are interested in is at hidraw_write+0x3b
which is at address 0x8e5+0x3b
that is 0x920
:
918: 48 8b 04 ed 00 00 00 mov 0x0(,%rbp,8),%rax 91f: 00 920: 4c 8b 60 28 mov 0x28(%rax),%r12 924: 49 83 bc 24 30 1c 00 cmpq $0x0,0x1c30(%r12) 92b:
Now, how do we find the correspondent instruction in the source code? If we don't want to do that the hard way we can use the -S
option of objdump like mentioned also in Documentation/BUG-HUNTING in the kernel tree, which interleaves source code with the disassembled listing, here's the excerpt we need:
dev = hidraw_table[minor]->hid; 916: 89 ed mov %ebp,%ebp 918: 48 8b 04 ed 00 00 00 mov 0x0(,%rbp,8),%rax 91f: 00 920: 4c 8b 60 28 mov 0x28(%rax),%r12
So in this case the problem is that the code is trying to dereference hidraw_table[minor]
, but this entry could be NULL if we disconnect the device while performing the write()
in a loop without checking return values.
Finding out the exact line could be a little harder when macros and inlined functions are used, but once we know how to deal with a simple scenario like the one proposed the complicated cases become someway doable too.
Elaborating a solution
The fix is trivial is this case: check the pointer is not NULL before dereferencing it, bail out otherwise.
Implementing and testing the solution
This is the actual patch which solves the issue:
diff --git a/drivers/hid/hidraw.c b/drivers/hid/hidraw.c index 9eaf6ae..a3866b5 100644 --- a/drivers/hid/hidraw.c +++ b/drivers/hid/hidraw.c @@ -109,6 +109,12 @@ static ssize_t hidraw_write(struct file *file, const char __user *buffer, size_t int ret = 0; mutex_lock(&minors_lock); + + if (!hidraw_table[minor]) { + ret = -ENODEV; + goto out; + } + dev = hidraw_table[minor]->hid; if (!dev->hid_output_raw_report) {
Now, using the fixed hid module, if we run the test program again and disconnect the device we won't hurt the kernel anymore.
Comments
Wonderful. I never thought
Wonderful. I never thought about debugging the linux kernel, but knew it must be a pain in the ass. I just wonder where you took the '0x8e5' address to add '0x3b' (present in the error log) to have 0x920 as a final result. I really searched every piece of log and code you posted and ain't found out from where you took it off. I'm really interested in knowing how did you find out the exact point of the code who broke, and how do you convert it to the exact line present in the source code.
Thanks in advance!
-Fernando Cavendish
If you click on the line
If you click on the line which says:
The output of the command will be shown, and the first line here is:
That's where the
0x8e5
comes from, it's the address of thehidraw_write
function indrivers/hid/hidraw.o
.I fixed the article to mention that explicitly.
About finding the exact line, I used the
-S
option of objdump to interleave the source code with the assembly, it's a lot easier that way.Ciao, Antonio
Post new comment