Bug hunting in Linux kernel land: an unpretentious primer

Lately I've been fixing some defects in the hidraw interface in the Linux kernel (commits d20d5ff and e42dee9), simple issues like these NULL pointer dereference are a good example to show how to analyze a Linux bug trace in order to spot the defective code. So let's take a look.

The steps to fix any defect are basically:

  1. reproduce the problem (a quick introduction to the concept of “Observability” of software faults is in: Dissecting The Bug)
  2. find the offending code/policy
  3. elaborate a solution
  4. implement and test the solution is working

Reproducing the problem

If the problem is in some userspace/kernelspace interface then writing a test program which triggers the bug may be the quickest way to reproduce it. The test program should be as simple as possible, doing the minimum required to expose the defect, as it will be used also as a proof that the bug has been solved in the right way; just like when proving a theorem we don't want to have more hypotheses than are necessary. The test program for one of the issues above looks like this:

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

#include <linux/hidraw.h>

int main(int argc, char *argv[])
{
	int fd = -1;
	unsigned char report[2] = {0x01, 0x00};
	int ret;

	if (argc != 2) {
		fprintf(stderr, "usage: %s </dev/hidrawX>\n", argv[0]);
		exit(1);
	}

	fd = open(argv[1], O_RDWR);
	if (fd < 0) {
		perror("hidraw open");
		exit(1);
	}

	while (1) {
		ret = write(fd, report, sizeof(report));
		printf("ret: %d\n", ret);
	}

	close(fd);
	exit(0);
}

That gives me the occasion for an observation about one aspect of testing procedures: as you can see the test program above uses deliberately some bad practice, namely it does not check the return value of the write() call in order to exit the while loop; we know this is generally bad from a programming point of view, however it turns out to be good from another point of view: bad programs can spot cases we didn't think of in our code paths; it goes without saying that this is no excuse for “uncontrolled” bad practices, nor for the bug we had in the first place :).

Anyhow, the code above exposes the defect on kernels up to 2.6.35, just run the program and unplug the (USB) device (a Sony Sixaxis in my case):

BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
IP: [<ffffffffa0f0a625>] hidraw_write+0x3b/0x116 [hid]
PGD 37082067 PUD 3701f067 PMD 0 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/pci0000:00/0000:00:04.0/usb4/idVendor
CPU 0 
Modules linked in: hid_sony usbhid hid kvm_amd kvm powernow_k8 mperf cpufreq_powersave cpufreq_conservative cpufreq_stats cpufreq_userspace ipt_MASQUERADE lirc_serial(C) lirc_dev bridge stp ppdev lp sco bnep rfcomm l2cap crc16 tun sit tunnel4 binfmt_misc uinput fuse ip6table_raw ip6table_mangle ip6t_REJECT ip6t_LOG nf_conntrack_ipv6 ip6table_filter ip6_tables xt_tcpudp ipt_REJECT ipt_ULOG xt_limit xt_state xt_multiport iptable_filter iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_mangle iptable_raw ip_tables x_tables nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc hwmon_vid loop snd_hda_codec_nvhdmi snd_hda_codec_via snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi snd_rawmidi nvidia(P) snd_seq_midi_event snd_seq snd_timer snd_seq_device btusb bluetooth joydev snd edac_core shpchp video tpm_tis tpm evdev rfkill parport_pc parport tpm_bios pcspkr asus_atk0110 processor output edac_mce_amd soundcore snd_page_alloc i2c_nforce2 pci_hotplug k8temp wmi i2c_core button ext3 jbd mbcache dm_mod sg sd_mod sr_mod cdrom crc_t10dif ata_generic usb_storage ahci libahci pata_amd ohci_hcd libata ehci_hcd scsi_mod usbcore forcedeth floppy nls_base thermal thermal_sys [last unloaded: hid]

Pid: 4818, comm: test_hidraw_wri Tainted: P         C  2.6.35-ao2 #2 M3N78-VM/System Product Name
RIP: 0010:[<ffffffffa0f0a625>]  [<ffffffffa0f0a625>] hidraw_write+0x3b/0x116 [hid]
RSP: 0018:ffff880037081ee8  EFLAGS: 00010202
RAX: 0000000000000000 RBX: 0000000000000002 RCX: ffff880037081f58
RDX: 0000000100000000 RSI: 0000000000000008 RDI: ffffffffa0f128b0
RBP: 0000000000000001 R08: ffffffffa0f0a5ea R09: 000000000040087a
R10: 0000000000000000 R11: ffffffff81150c83 R12: ffff880037081f58
R13: 00007fff1b34cab0 R14: 00000000ffffffed R15: 0000000000000000
FS:  00007fa946870700(0000) GS:ffff880001a00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000028 CR3: 000000005ad24000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process test_hidraw_wri (pid: 4818, threadinfo ffff880037080000, task ffff88006003e630)
Stack:
 ffff88005ad95000 00007fff1b34cab0 ffff880037081f58 00007fff1b34cba0
<0> 0000000000000000 ffffffff810ea46f 0000000000000003 0000000000000008
<0> ffff88005ad95000 00007fff1b34cab0 00000000004005c0 ffffffff810ea57e
Call Trace:
 [<ffffffff810ea46f>] ? vfs_write+0xa4/0x100
 [<ffffffff810ea57e>] ? sys_write+0x45/0x6b
 [<ffffffff81008a02>] ? system_call_fastpath+0x16/0x1b
Code: 53 48 8b 47 18 48 c7 c7 b0 28 f1 a0 48 89 d3 48 8b 40 10 8b 68 58 e8 07 9a 3f e0 81 e5 ff ff 0f 00 89 ed 48 8b 04 ed c0 2b f1 a0 <4c> 8b 60 28 49 83 bc 24 50 1c 00 00 00 0f 84 af 00 00 00 48 81 
RIP  [<ffffffffa0f0a625>] hidraw_write+0x3b/0x116 [hid]
 RSP <ffff880037081ee8>
CR2: 0000000000000028
---[ end trace 06cb2d01ded47cc4 ]---

The most important info in this message is highlighted above in line two, that is:

IP: [<ffffffffa0f0a625>] hidraw_write+0x3b/0x116 [hid]

This tells us the last instruction executed, the one triggering the bug; the format is symbol_name+offset/symbol_size [module_name] as can be seen in mm/slab.c.

Finding the offending code

To find where the offending instruction is, we can disassemble the object code with objdump, here's the interesting function (click on the link below to expand the text):

00000000000008e5 <hidraw_write>:
 8e5:	41 56                	push   %r14
 8e7:	41 be ed ff ff ff    	mov    $0xffffffed,%r14d
 8ed:	41 55                	push   %r13
 8ef:	49 89 f5             	mov    %rsi,%r13
 8f2:	41 54                	push   %r12
 8f4:	55                   	push   %rbp
 8f5:	53                   	push   %rbx
 8f6:	48 8b 47 18          	mov    0x18(%rdi),%rax
 8fa:	48 c7 c7 00 00 00 00 	mov    $0x0,%rdi
 901:	48 89 d3             	mov    %rdx,%rbx
 904:	48 8b 40 10          	mov    0x10(%rax),%rax
 908:	8b 68 58             	mov    0x58(%rax),%ebp
 90b:	e8 00 00 00 00       	callq  910 <hidraw_write+0x2b>
 910:	81 e5 ff ff 0f 00    	and    $0xfffff,%ebp
 916:	89 ed                	mov    %ebp,%ebp
 918:	48 8b 04 ed 00 00 00 	mov    0x0(,%rbp,8),%rax
 91f:	00 
 920:	4c 8b 60 28          	mov    0x28(%rax),%r12
 924:	49 83 bc 24 30 1c 00 	cmpq   $0x0,0x1c30(%r12)
 92b:	00 00 
 92d:	0f 84 af 00 00 00    	je     9e2 <hidraw_write+0xfd>
 933:	48 81 fb 00 10 00 00 	cmp    $0x1000,%rbx
 93a:	76 25                	jbe    961 <hidraw_write+0x7c>
 93c:	65 48 8b 04 25 00 00 	mov    %gs:0x0,%rax
 943:	00 00 
 945:	8b b0 c8 01 00 00    	mov    0x1c8(%rax),%esi
 94b:	48 c7 c7 00 00 00 00 	mov    $0x0,%rdi
 952:	31 c0                	xor    %eax,%eax
 954:	41 b6 ea             	mov    $0xea,%r14b
 957:	e8 00 00 00 00       	callq  95c <hidraw_write+0x77>
 95c:	e9 81 00 00 00       	jmpq   9e2 <hidraw_write+0xfd>
 961:	48 83 fb 01          	cmp    $0x1,%rbx
 965:	77 25                	ja     98c <hidraw_write+0xa7>
 967:	65 48 8b 04 25 00 00 	mov    %gs:0x0,%rax
 96e:	00 00 
 970:	8b b0 c8 01 00 00    	mov    0x1c8(%rax),%esi
 976:	48 c7 c7 00 00 00 00 	mov    $0x0,%rdi
 97d:	31 c0                	xor    %eax,%eax
 97f:	41 be ea ff ff ff    	mov    $0xffffffea,%r14d
 985:	e8 00 00 00 00       	callq  98a <hidraw_write+0xa5>
 98a:	eb 56                	jmp    9e2 <hidraw_write+0xfd>
 98c:	be d0 00 00 00       	mov    $0xd0,%esi
 991:	48 89 df             	mov    %rbx,%rdi
 994:	41 be f4 ff ff ff    	mov    $0xfffffff4,%r14d
 99a:	e8 00 00 00 00       	callq  99f <hidraw_write+0xba>
 99f:	48 85 c0             	test   %rax,%rax
 9a2:	48 89 c5             	mov    %rax,%rbp
 9a5:	74 3b                	je     9e2 <hidraw_write+0xfd>
 9a7:	e8 00 00 00 00       	callq  9ac <hidraw_write+0xc7>
 9ac:	89 da                	mov    %ebx,%edx
 9ae:	4c 89 ee             	mov    %r13,%rsi
 9b1:	48 89 ef             	mov    %rbp,%rdi
 9b4:	e8 00 00 00 00       	callq  9b9 <hidraw_write+0xd4>
 9b9:	48 85 c0             	test   %rax,%rax
 9bc:	41 b6 f2             	mov    $0xf2,%r14b
 9bf:	75 19                	jne    9da <hidraw_write+0xf5>
 9c1:	b9 01 00 00 00       	mov    $0x1,%ecx
 9c6:	48 89 da             	mov    %rbx,%rdx
 9c9:	48 89 ee             	mov    %rbp,%rsi
 9cc:	4c 89 e7             	mov    %r12,%rdi
 9cf:	41 ff 94 24 30 1c 00 	callq  *0x1c30(%r12)
 9d6:	00 
 9d7:	41 89 c6             	mov    %eax,%r14d
 9da:	48 89 ef             	mov    %rbp,%rdi
 9dd:	e8 00 00 00 00       	callq  9e2 <hidraw_write+0xfd>
 9e2:	48 c7 c7 00 00 00 00 	mov    $0x0,%rdi
 9e9:	e8 00 00 00 00       	callq  9ee <hidraw_write+0x109>
 9ee:	5b                   	pop    %rbx
 9ef:	5d                   	pop    %rbp
 9f0:	41 5c                	pop    %r12
 9f2:	41 5d                	pop    %r13
 9f4:	49 63 c6             	movslq %r14d,%rax
 9f7:	41 5e                	pop    %r14
 9f9:	c3                   	retq

From the first line above (00000000000008e5 <hidraw_write>) we can see that the address of the hidraw_write() function is 0x8e5.

The instruction we are interested in is at hidraw_write+0x3b which is at address 0x8e5+0x3b that is 0x920:

 918:   48 8b 04 ed 00 00 00    mov    0x0(,%rbp,8),%rax
 91f:   00
 920:   4c 8b 60 28             mov    0x28(%rax),%r12
 924:   49 83 bc 24 30 1c 00    cmpq   $0x0,0x1c30(%r12)
 92b:   

Now, how do we find the correspondent instruction in the source code? If we don't want to do that the hard way we can use the -S option of objdump like mentioned also in Documentation/BUG-HUNTING in the kernel tree, which interleaves source code with the disassembled listing, here's the excerpt we need:

	dev = hidraw_table[minor]->hid;
 916:	89 ed                	mov    %ebp,%ebp
 918:	48 8b 04 ed 00 00 00 	mov    0x0(,%rbp,8),%rax
 91f:	00 
 920:	4c 8b 60 28          	mov    0x28(%rax),%r12

So in this case the problem is that the code is trying to dereference hidraw_table[minor], but this entry could be NULL if we disconnect the device while performing the write() in a loop without checking return values.

Finding out the exact line could be a little harder when macros and inlined functions are used, but once we know how to deal with a simple scenario like the one proposed the complicated cases become someway doable too.

Elaborating a solution

The fix is trivial is this case: check the pointer is not NULL before dereferencing it, bail out otherwise.

Implementing and testing the solution

This is the actual patch which solves the issue:

diff --git a/drivers/hid/hidraw.c b/drivers/hid/hidraw.c
index 9eaf6ae..a3866b5 100644
--- a/drivers/hid/hidraw.c
+++ b/drivers/hid/hidraw.c
@@ -109,6 +109,12 @@ static ssize_t hidraw_write(struct file *file, const char __user *buffer, size_t
 	int ret = 0;
 
 	mutex_lock(&minors_lock);
+
+	if (!hidraw_table[minor]) {
+		ret = -ENODEV;
+		goto out;
+	}
+
 	dev = hidraw_table[minor]->hid;
 
 	if (!dev->hid_output_raw_report) {

Now, using the fixed hid module, if we run the test program again and disconnect the device we won't hurt the kernel anymore.


CommentsSyndicate content

Wonderful. I never thought

Soares's picture

Wonderful. I never thought about debugging the linux kernel, but knew it must be a pain in the ass. I just wonder where you took the '0x8e5' address to add '0x3b' (present in the error log) to have 0x920 as a final result. I really searched every piece of log and code you posted and ain't found out from where you took it off. I'm really interested in knowing how did you find out the exact point of the code who broke, and how do you convert it to the exact line present in the source code.

Thanks in advance!
-Fernando Cavendish

If you click on the line

ao2's picture

If you click on the line which says:

objdump -d drivers/hid/hidraw.o

The output of the command will be shown, and the first line here is:

00000000000008e5 <hidraw_write>:

That's where the 0x8e5 comes from, it's the address of the hidraw_write function in drivers/hid/hidraw.o.

I fixed the article to mention that explicitly.

About finding the exact line, I used the -S option of objdump to interleave the source code with the assembly, it's a lot easier that way.

Ciao, Antonio

Post new comment

The content of this field is kept private and will not be shown publicly. If you have a Gravatar account associated with the e-mail address you provide, it will be used to display your avatar.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Allowed HTML tags: <a> <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd>
  • Lines and paragraphs break automatically.

More information about formatting options

CAPTCHA
This question is for testing whether you are a human visitor and to prevent automated spam submissions.
7
k
S
k
6
d
Enter the code without spaces.