Hacker Remix

Getting silly with C, part (void*)2

175 points by justmarc 1 week ago | 115 comments

sylware 1 week ago

C syntax is already way too rich and complex.

We need a C- or µC:

No implicit cast except for literals and void* (explicit compile time/runtime casts), one loop statement (loop{}), no switch/enum/generic/_thread/typeof/etc, no integer promotion, only sized primitive types (u64 s32 f32 etc...), no anonymous code block, real compiler hard/compile time constant declaration, many operators have to go (--,++, a?b:c, etc)... and everything I am forgetting right now (the dangerous struct pack attribute...). But we need inline keywords for memory barriers, atomics for modern hardware architecture programming.

wongarsu 1 week ago

There is C0, a stripped-down version of C popular in academia [1]. Great for teaching because it's conceptually simple and easy to write a compiler for. But with a couple of additions (like sized primitive types) it might match what you are imagining.

1: https://c0.cs.cmu.edu/docs/c0-reference.pdf

glouwbug 1 week ago

C really just needs if / else / while / and void functions. Function inputs should be in/out (const type* or type*).

bregma 1 week ago

So, FORTRAN IV except for the else.

butterisgood 1 week ago

Pre-scheme?

accelbred 1 week ago

Does Zig fit your bill?

sylware 1 week ago

Dunno, I should have a look though. But I have recollection of some garbage collector, wrong/right ?

nick__m 1 week ago

I doubt that, zig is allocator land. Even stdlib data structures require an allocator to be instantiated. Have a look at the selection of allocators: https://zig.guide/standard-library/allocators .

sylware 1 week ago

I had a look, zig seems to require a runtime, even small, for basic syntax support. So it seems it is not suitable.

You should be able to generate machine code without the need of any runtime, like you can with C.

mlugg 1 week ago

I'm unsure what you're referring to here -- Zig doesn't have any runtime, it doesn't even depend on libc.

The only thing I can think of that you might be referring to is compiler-rt: if so, this is a thing in C too! It's just a small collection of implementations for operations the code generator wants to call into (e.g. memset, arithmetic for integers larger than CPU word size). Clang uses compiler-rt when compiling C code, and GCC's equivalent is libgcc. Nonetheless, Zig lets you disable it using `-fno-compiler-rt`, in which case you'll need to provide the relevant symbols yourself somehow.

sylware 1 week ago

This is not what I understood, there is some kind of flags required for pointers or something, which requires a data section for basic primitive types.

mlugg 1 week ago

I'm afraid I'm not sure what you're referring to. For instance, I can build a simple Hello World in Zig using `zig build-exe`, and get a static executable, on which I can use `nm` to confirm that there aren't symbols from any kind of runtime. I can even trivially build the actual Zig compiler to a static binary.

(For context, by the way, I'm on the Zig "core team"; I'm a notable contributor to the project.)

sylware 1 week ago

mmmmh... basically, generates machine code which only requires a stack (which could be used by code paths not written in zig), contained in memory pages with execute and read permission only. Ofc this machine code would interact with the other machine code (written in other languages) via the architecture calling convention (the ABI/C one).

accelbred 5 days ago

That's what Zig does. It compiles down to the same stuff as C. It does not have a runtime.

butterisgood 5 days ago

It also has been very much trying to get rid of things like "undefined behavior".

mhandley 1 week ago

I expect many people know this one, but it's a useful teaching aid when understanding the relationship between arrays and pointers

  int array[10];
  *(array+1) = 56;
  array[2] = 4;
  3[array] = 27;

The first two are obvious, but the third is also legal. It works because array indexing is just sugar for pointer arithmetic, so array[2]=4 is identical in meaning to *(array+2)=4. Therefore 3[array]=27 is identical to *(3+array)=27 and so is legal. But just because you can doesn't mean you should.
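A minimal sketch of that equivalence in standard C. The function name `index_forms_agree` is made up for illustration; it writes through all three spellings and checks that each write landed where the pointer-arithmetic reading says it should.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 when every spelling of indexing
   touched the expected element. */
int index_forms_agree(void)
{
    int array[10] = {0};

    *(array + 1) = 56;  /* explicit pointer arithmetic */
    array[2] = 4;       /* usual subscript, same as *(array + 2) */
    3[array] = 27;      /* same as *(3 + array); legal but obscure */

    /* Both subscript orders name the same object. */
    assert(&array[3] == &3[array]);

    return array[1] == 56 && 2[array] == 4 && array[3] == 27;
}
```

Calling `index_forms_agree()` should return 1 on any conforming compiler, since addition of a pointer and an integer is commutative.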

macintux 1 week ago

The best, most entertaining book I've ever read on C covered that (unless I'm misremembering, but I doubt it): Expert C Programming.

https://www.goodreads.com/book/show/198207.Expert_C_Programm...

dualogy 1 week ago

I'm already liking that one! Page 5 quote:

> There is one other convention — sometimes we repeat a key point to emphasize it. In addition, we sometimes repeat a key point to emphasize it.

One more quote and I'll stop:

> ctime() converts its argument into local time, which will vary from GMT, depending on where you are. California, where this book was written, is eight hours behind London, and several years ahead

WalterBright 1 week ago

> The first two are obvious, but the third is also legal.

D doesn't have that bug!

In 44 years of C programming, I've never encountered a legitimate use for the 3rd. (Other than Obfuscated C, that is.)

WolfeReader 1 week ago

It's not a bug. You're seeing the difference between "this is how you're taught to access arrays" and "this is how array access actually works".

WalterBright 1 week ago

Since the Standard specifies what that does, pedantically it is not a bug. Ok.

But I call it a bug because it has no use and just pointlessly confuses people.

im3w1l 1 week ago

Well it could (and I agree with WalterBright that it should) have been disallowed. a[b] being implemented as an early stage rewrite rule expanding to *(a+b) is an uninteresting implementation detail. And I doubt it is even implemented that way in modern compilers anyway. It certainly can't be in C++ as a[b] and b[a] mean different things when [] is overloaded.

WolfeReader 6 days ago

That "uninteresting implementation detail" is actually of grave importance when it comes to understanding how buffer overflow attacks work. I hate to think anyone would put C code into production without understanding this.

kragen 1 week ago

You seem to be lecturing the author of one of the most prominent early C compilers on how array access actually works in C.

WolfeReader 6 days ago

Yep.

mhandley 1 week ago

Agreed - I've only been programming C for 38 years but I've also never found a legitimate use. However I have used it to illustrate a point when teaching C to beginners - it looks so odd they tend to remember it.

matheusmoreira 1 week ago

Note that this is GNU C, not standard C. GNU has extended the normal C language with features such as forward parameter declarations and numeric ranges in switch cases. Lots of people don't know about these things.
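For readers who have not seen the case-range extension, here is a small sketch. It assumes a compiler that accepts GNU C (GCC and Clang both do), and the function `classify` and its return codes are invented for the example.

```c
#include <assert.h>

/* Hypothetical classifier using the GNU case-range extension
   (`low ... high`). This is not ISO C. */
int classify(int c)
{
    switch (c) {
    case '0' ... '9':
        return 1;   /* digit */
    case 'a' ... 'z':
    case 'A' ... 'Z':
        return 2;   /* letter; assumes an ASCII-like contiguous alphabet */
    default:
        return 0;   /* anything else */
    }
}

/* Small self-check that doubles as a usage example. */
void classify_selfcheck(void)
{
    assert(classify('7') == 1);
    assert(classify('q') == 2);
    assert(classify('-') == 0);
}
```

Note the spaces around `...`: without them, a range between integer literals can be mis-lexed as a malformed number.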

dzaima 1 week ago

Note that switch case ranges might be coming in C2y though.

mananaysiempre 1 week ago

Also forward parameter declarations, or is that proposal dead?

wahern 1 week ago

Basically dead. The main motivation would be to make it easier to use variably modified types in function parameters, where the (length) identifier is declared after the variably modified type, as in

  > void foo(int a[m][m], int m)

Currently you can only do:

  > void foo(int m, int a[m][m])

The holy grail is being able to update the prototypes of functions like snprintf to something like:

  > int snprintf(char buf[bufsiz], size_t bufsiz, const char *, ...);

However, array pointer decay means that foo above is actually:

  > void foo(int (*a)[m], int m)

Likewise, the snprintf example above would be little different than the current definition.
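A short sketch of that decay, assuming a compiler with VLA support (optional since C11, but available in GCC and Clang). The name `trace` is only illustrative. The assignment to `rows` compiles without a cast precisely because the parameter already has the pointer type.

```c
#include <assert.h>

/* Hypothetical example: sums the main diagonal of an m x m matrix.
   The size parameter comes first so that `m` is in scope for `a`. */
long trace(int m, int a[m][m])
{
    /* `a` was adjusted to `int (*)[m]`, so this needs no conversion. */
    int (*rows)[m] = a;
    long sum = 0;

    for (int i = 0; i < m; i++)
        sum += rows[i][i];
    return sum;
}

/* Small self-check that doubles as a usage example. */
void trace_selfcheck(void)
{
    int grid[3][3] = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };
    assert(trace(3, grid) == 15);
}
```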

There's related syntax, like

  > foo (int m, int a[static m])

But a is still just a pointer, and while it can help some static analyzers to detect mismatched buffer size arguments at the call site, the extent of the analysis is very limited as decay semantics effectively prevent tracing the propagation of buffer sizes across call chains, even statically.
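As a rough illustration of the `static` form: the function below (a made-up `sum_first`) promises nothing more than that the caller supplies at least `m` valid elements. Inside the body `a` behaves like any other pointer.

```c
#include <assert.h>

/* Hypothetical example: `static m` tells the compiler the caller must
   pass a pointer to at least m elements; `a` is still just a pointer. */
long sum_first(int m, const int a[static m])
{
    long total = 0;

    for (int i = 0; i < m; i++)
        total += a[i];
    return total;
}

/* Small self-check that doubles as a usage example. */
void sum_first_selfcheck(void)
{
    int values[4] = {1, 2, 3, 4};

    assert(sum_first(4, values) == 10);
    assert(sum_first(2, values) == 3);   /* a shorter prefix is fine */
}
```

Passing fewer than `m` elements is undefined behaviour; whether a compiler diagnoses it depends on how much it can see at the call site.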

There's no active proposal at the moment to make it possible to pass VM arrays (or rather, array references) directly to functions--you can only pass pointers to VM array types. That actually works (sizeof *a == sizeof (int) * m when declaring int (*a)[m] in the prototype), but the code in the function body becomes very stilted with all the syntactical dereferencing--and it's just syntactical as the same code is generated for a function parameter of `int (*a)[m]` as for `int *a` (underneath it's the same pointer value rather than an extra level of memory indirection). There are older proposals but they all lost steam because there aren't any existing implementation examples in any major production C compilers. Without that ability, the value of forward declarations is greatly diminished. Because passing VM array types to functions already requires significant refactoring, most of the WG14 felt it wasn't worth the risk of adopting GCC's syntax when everybody could (and should?) just start declaring size parameters before their respective buffer parameters in new code.
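To make the pointer-to-VM-array point concrete, here is a sketch under the same VLA-support assumption. `fill_row` is an invented name; it shows both the run-time `sizeof *a` and the extra dereference needed on every access.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: takes a pointer to an array of m ints. */
size_t fill_row(int m, int (*a)[m], int value)
{
    for (int i = 0; i < m; i++)
        (*a)[i] = value;   /* explicit dereference on every access */

    return sizeof *a;      /* evaluated at run time: m * sizeof(int) */
}

/* Small self-check that doubles as a usage example. */
void fill_row_selfcheck(void)
{
    int row[5];

    /* The caller passes &row, a pointer to the whole array. */
    assert(fill_row(5, &row, 7) == 5 * sizeof(int));
    assert(row[0] == 7 && row[4] == 7);
}
```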

uecker 1 week ago

I hope it is not "basically" dead. I just resubmitted it at the request of several people.

And yes, for new APIs you could just change the order, but it does help also with legacy APIs. It does even when not using pointers to arrays: https://godbolt.org/z/TM5Mn95qK (I agree that new APIs should pass a pointer to a VLA).

(edited because I am agreeing with most of what you said)

mananaysiempre 1 week ago

> everybody could (and should?) just start declaring size parameters before their respective buffer parameters in new code

I know that was a common opinion pre-C23, but it feels like the committee trying to reshape the world to their desires (and their designs). It's a longstanding convention that C APIs accept (address, length) pairs in that order. So changing that will already get you a score of -4 on the Hard to Misuse List[1], for "Follow common convention and you'll get it wrong". (The sole old exception in the standard is the signature of main(), but that's somewhat vindicated by the fact that nobody really needs to call main(); there is a new exception in the standard in the form of Meneide's conversion APIs[2], which I seriously dislike for that reason.)

The reason I was asking is that 'uecker said it was requested at the committee draft stage for C23 by some of the national standards orgs. That's already ancient history of course, but I hoped the idea itself was still alive, specifically because I don't want to end up in the world where half of C APIs are (address, length) and half are (length, address), when the former is one of the few C conventions most everyone agrees on currently.

[1] https://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html

[2] https://thephd.dev/_vendor/future_cxx/papers/C%20-%20Restart...

dfawcus 1 week ago

Note that GCC does (sometimes) detect the misuse of the "int a[static 3]" case, but maybe that is only when the length is a compile time constant; and possibly only with char arrays.

  $ make texe
  cc -g -O2 -std=c11 -Wall -Wextra -Wpedantic -Werror   -c -o test.o test.c
  test.c: In function ‘do_test_formatSmallElem’:
  test.c:108:9: error: ‘matSmallElemFormat’ accessing 8 bytes in a region of size 2 [-Werror=stringop-overflow=]
    108 |         matSmallElemFormat(elem, buffer);
        |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  test.c:108:9: note: referencing argument 2 of type ‘char *’
  In file included from test.c:8:
  mat/display.h:17:6: note: in a call to function ‘matSmallElemFormat’
     17 | void matSmallElemFormat(mElem elem, char buffer[static matSmallElemLen]);
        |      ^~~~~~~~~~~~~~~~~~~~~
  cc1: all warnings being treated as errors
  make: *** [<builtin>: test.o] Error 1

Gibbon1 1 week ago

That's related to something I would like which is to be able to set the number of elements in an incomplete struct.

   struct foo
   {
     size_t elements;
     int data[];
   };

   foo foo123 = {.elements = array_size(data), .data = {1, 2, 3}};

or struct str { size_t sz; char str[]; };

   str s123 = {.sz = strlen(.str), .str = "123"};
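For comparison, this is roughly what has to be written by hand in current standard C, since a flexible array member cannot be sized from an initializer. The helper `foo_new` is an assumption made for the sketch, not an existing API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct foo {
    size_t elements;
    int data[];   /* flexible array member */
};

/* Hypothetical constructor: allocates header plus n elements and
   records the count by hand. The caller frees the result. */
struct foo *foo_new(const int *src, size_t n)
{
    struct foo *f = malloc(sizeof *f + n * sizeof f->data[0]);

    if (f == NULL)
        return NULL;
    f->elements = n;
    if (n > 0)
        memcpy(f->data, src, n * sizeof f->data[0]);
    return f;
}

/* Small self-check that doubles as a usage example. */
void foo_selfcheck(void)
{
    const int init[] = {1, 2, 3};
    struct foo *f = foo_new(init, sizeof init / sizeof init[0]);

    assert(f != NULL);
    assert(f->elements == 3 && f->data[2] == 3);
    free(f);
}
```

The count and the data are kept in sync only by convention here, which is the gap the wished-for syntax would close.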

uecker 1 week ago

Clang and GCC just got the [[counted_by()]] attribute to help protect such structs in the kernel. But yes, native syntax for this would be nice.

dfawcus 1 week ago

I'd have to argue the function typedefs are not useless, I've come across two uses.

The obvious one is to use it rather than a function pointer typedef, such that the subsequent use in a struct is obviously a pointer. That helps when others are initially reading unfamiliar structures.

  typedef int handler_ty(int a);

  struct foo {
    handler_ty *handler;
    /* ... */
  };

  struct foo table[] = { { /* init fields */ }, { /* init fields */ }, };

The other case can be somewhat related, namely as an assertion / check when writing such handler functions, and more importantly updating them.

  handler_ty some_handler;
  int some_handler(int a) { /* ... */ }

When updating code, it allowed for easier to decode compiler errors if the expected type of handler_ty was changed, and some specific handler was incorrectly updated, or not updated at all.

Basically the error would generally directly call out the inconsistency with the prior line, rather than with the distant use in the initialisation of 'table'.
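Both uses can be put together in one compilable sketch. The handler names, the `name` field, and the table contents are invented for the example; only `handler_ty`, `struct foo`, and `table` come from the comment above.

```c
#include <assert.h>

typedef int handler_ty(int a);

/* Declaring each handler through the typedef first means a later
   signature mismatch is reported right at the definition. */
handler_ty double_it;
handler_ty negate_it;

int double_it(int a) { return 2 * a; }
int negate_it(int a) { return -a; }

struct foo {
    const char *name;      /* hypothetical extra field */
    handler_ty *handler;   /* visibly a pointer at the point of use */
};

struct foo table[] = {
    { "double", double_it },
    { "negate", negate_it },
};

/* Small self-check that doubles as a usage example. */
void table_selfcheck(void)
{
    assert(table[0].handler(21) == 42);
    assert(table[1].handler(5) == -5);
}
```

If `handler_ty` were changed to, say, `int handler_ty(int a, int b)`, the definitions of `double_it` and `negate_it` would conflict with their own prior declarations, so the error lands next to the function that needs updating.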

As I recall this mechanism has been around since at least C89, I don't recall using it in K&R.