Pointer arithmetic in C

Tudor Bosman
You know that piece of the C standard that says that subtracting two pointers is only valid if both point into the same array? (I don't have a copy of the C standard, but C++11 says, in section 5.7, paragraph 6: "Unless both pointers point to elements of the same array object, or one past the last element of the array object, the behavior is undefined.")

It's true. Here's an example:

#include <stdlib.h>
#include <stdio.h>
#include <sys/types.h>  /* ssize_t (POSIX) */

struct Foo {
  char x[9];
};

ssize_t foo_diff(const struct Foo* a, const struct Foo* b) {
  return b - a;
}

struct Bar {
  struct Foo a[2];
  char x;
  struct Foo b[2];
};

void print(const struct Foo* a, const struct Foo* b) {
  ssize_t d = foo_diff(a, b);
  printf("a<b:%d b-a:%zd\n", (a < b), d);
}

int main(int argc, char *argv[]) {
  struct Bar bar;
  print(&bar.a[0], &bar.a[1]);
  print(&bar.b[0], &bar.b[1]);
  print(&bar.a[0], &bar.b[0]);
  return 0;
}


On my machine, it prints:
a<b:1 b-a:1
a<b:1 b-a:1
a<b:1 b-a:-8198552921648689605


What's happening here? Clearly bar.b[0] is stored at a higher address than bar.a[0], so why is the difference a humongous negative number?

Let's look at the disassembly of foo_diff() (on Linux x86_64):

Dump of assembler code for function foo_diff:
   0x00000000004004a0 <+0>:     sub    %rdi,%rsi
   0x00000000004004a3 <+3>:     movabs $0x8e38e38e38e38e39,%rax
   0x00000000004004ad <+13>:    imul   %rax,%rsi
   0x00000000004004b1 <+17>:    mov    %rsi,%rax
   0x00000000004004b4 <+20>:    retq
End of assembler dump.


So we subtract a (%rdi) from b (%rsi), and then, rather than dividing the result by the size of struct Foo (9 bytes), we multiply by 10248191152060862009. What?

That's the multiplicative inverse of 9 modulo 2^64: multiply it by 9, keep only the low 64 bits, and you get 1. Multiplying by this inverse produces the same result as dividing by 9 (and multiplication is presumably faster), but only if the division is exact (the remainder is zero). The compiler is allowed to make this assumption (because a and b are supposed to point into the same array, remember?), and so it produces code that won't work if the assumption doesn't hold. Here bar.a[0] and bar.b[0] are 19 bytes apart (18 bytes for a plus 1 byte for x), which is not a multiple of 9, so the multiplication produces garbage.