Subject: Beeb, this was your grandma!

Date   : Wed, 23 Sep 2009 00:43:14 +0100
From   : philpem@... (Philip Pemberton)
Subject: Beeb, this was your grandma!

Rick Murray wrote:
> All considered, I think it would be much better if somebody got off 
> their advocacy-ass and coded up a drool-proof C [#] compiler [!]. One 
> that checks bounds, one that laughs at you if you define an array and 
> then use negative offsets.

I actually like having a compiler that doesn't try and protect me from 
myself... It would, however, be nice to have array bounds checking in 
gcc, and then compile debug versions of my apps with it turned on...

I'm sure I had some macros for bounds checking *somewhere* but they've 
disappeared, and they didn't handle pointer maths anyway.
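
For the record, a minimal sketch of what such macros might look like. The names ARRAY_LEN, IN_BOUNDS and AT are made up for illustration, not taken from any library, and this only works on true arrays whose size the compiler can see. Hand it a pointer and ARRAY_LEN quietly gives the wrong answer, which is exactly the pointer-maths limitation mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements in a true array. NOT valid for a pointer. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Nonzero if idx is a valid index into array a. A negative idx
 * converts to a huge size_t value, so it is rejected as well. */
#define IN_BOUNDS(a, idx) ((size_t)(idx) < ARRAY_LEN(a))

/* Checked element access, usable on either side of an assignment.
 * In a debug build the assert fires on a bad index; building with
 * -DNDEBUG compiles the check away, leaving a plain a[idx].
 * Caveat: idx is evaluated twice in debug builds, so avoid
 * side effects such as AT(v, i++). */
#define AT(a, idx) (*(assert(IN_BOUNDS(a, idx)), &(a)[(idx)]))
```

So "AT(v, 2) = 9;" behaves like "v[2] = 9;" in a release build, while "AT(v, -1)" aborts in a debug build instead of scribbling over whatever sits in front of the array.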

> One that prints out "WTF?!?" every time you
> try to assign from *(**(void)*myvar[*ptr++]) or other such unreadable 
> nonsense that is probably a perfectly valid C statement.

]] *(* *(void) * myvar[*ptr++])

Right, let's tear this apart from the inside out...

myvar[*ptr++]

You're dereferencing a pointer and using the result as an index into an 
array, then postincrementing the pointer? OK, fair enough, as long as ptr 
points to an integer type. Perfectly valid, and sane (in some 
circumstances). Just watch out for operator precedence rules: postfix ++ 
binds tighter than unary *, so this parses as *(ptr++). It moves the 
pointer along one element; it does not increment what ptr points to.
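
A small sketch that makes the precedence point concrete. The function name lookup_and_advance and its parameters are invented for this example; it just wraps the table[*p++] expression so the pointer movement can be observed from outside.

```c
#include <stddef.h>

/* Reads the index that *cursor points at, returns table[index], and
 * leaves *cursor pointing at the next index.
 * *p++ parses as *(p++): the pointer moves, the pointed-to value
 * is left alone. */
int lookup_and_advance(const int *table, const int **cursor)
{
    const int *p = *cursor;
    int value = table[*p++];   /* use *p as the index, then step p */
    *cursor = p;
    return value;
}
```

With a table of {10, 20, 30} and an index list of {2, 0}, the first call returns 30 and the second returns 10; the index list itself is never modified. Writing (*p)++ instead would increment the stored index and leave the pointer where it was.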

(**(void) * myvar[*ptr++])

Not a chance this is valid. "(void) * myvar[..]" isn't a multiplication: 
it dereferences myvar[..] (which would have to be a pointer) and casts 
the result to void, i.e. throws it away. The two leading stars then try 
to dereference that void expression, and you can't dereference void 
(because the ANSI spec and K&R say so, that's why).

*( ... )

By this point you've already dereferenced the pointer back to an int or 
whatever, and if you try and dereference it again, the compiler will 
most likely barf (or it SHOULD barf). You can play silly games like this:

(int*)(((int)(&fred))+1)

Basically, this increments a pointer by 1 byte. It's guaranteed to break 
on any platform that can't deal with non-word-aligned data, and is also 
guaranteed to break on most 64-bit platforms where sizeof(int) != 
sizeof(void *).

((int)(&fred))   ==> get the address of 'fred' and turn it into an int
((int)(&fred))+1 ==> add 1 to it
(int*)...        ==> turn it back into a pointer

Anyone caught writing code like this should be shot on sight. There 
really is no excuse for it. gcc will complain about stuff like this when 
int and pointer sizes differ, and about plenty else if you run it with 
the switches "-Wall -pedantic" (the common warnings on, pedantic mode 
on; note that -Wall does not actually enable every warning, and -Wextra 
adds more). Pedantic Mode combined with -Wall is a good way to force 
yourself to write half-decent code. If you really want to be 
masochistic, add "-Werror" onto that list too (-Werror promotes warnings 
to error status, meaning they cause fatal compilation errors).
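
If stepping a pointer by one byte really is what's wanted, here is a sketch of two less fragile ways to go about it. The function names next_byte and address_of are made up for illustration, and uintptr_t is an optional C99 type from <stdint.h>, so this assumes a platform that provides it.

```c
#include <stdint.h>

/* Step a pointer forward by one byte without going through int.
 * Character pointers are allowed to address the bytes of any object,
 * so no pointer-sized-int assumption is needed. The result may still
 * be misaligned for anything wider than a char. */
unsigned char *next_byte(void *p)
{
    return (unsigned char *)p + 1;
}

/* If an integer copy of the address is genuinely required, uintptr_t
 * is defined to be wide enough to hold a converted void pointer,
 * unlike int on 64-bit platforms where int is 32 bits. */
uintptr_t address_of(const void *p)
{
    return (uintptr_t)p;
}
```

Neither version fixes the alignment problem; they only remove the dependence on sizeof(int) matching sizeof(void *).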

> # - Not C++. My uncle wrote a detailed book on C++ programming (several, 
> in fact) and I'm afraid I've not found anything in C++ that can't be 
> expressed in C. Okay, we'd be calling Modem_PutByte instead of 
> Modem.PutByte, but same difference.

*shrug*

Things like object inheritance are easier to express in C++, Java and 
other OO languages.

For instance: a Disc has 0..x Tracks, a Track has 0..y Sectors, a Sector 
has 0..z bytes of data. So you have classes for Disc, Track and Sector, 
and then implement code that allows you to do things like this:

data = mydisc->track(5)->sector(8)->data->clone();
printf("Data byte 5 is %02X\n", data[4]);
     // array indices are zero-based

I'm really not sure how you'd do this in C, but I suspect it would 
involve something like this:

DISC_CTX *disc;
TRACK_CTX *trk;
SECTOR_CTX *sector;

/* disc is assumed to have been opened/initialised elsewhere */
disc_GetTrack(disc, 5, &trk);
track_GetSector(trk, 8, &sector);
printf("Data byte 5 is %02X\n", sector_GetByte(sector, 4));

Not quite as clean and tidy as the C++ version...
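
That said, plain structs plus a couple of accessor functions get reasonably close. What follows is a rough sketch only: the type names, function names and the geometry (80 tracks, 10 sectors of 256 bytes, everything numbered from zero) are all assumptions picked for illustration, not anything from a real disc library.

```c
#include <stddef.h>

/* Hypothetical fixed geometry, chosen only for the example. */
enum { SECTOR_SIZE = 256, SECTORS_PER_TRACK = 10, TRACKS_PER_DISC = 80 };

typedef struct { unsigned char data[SECTOR_SIZE]; } Sector;
typedef struct { Sector sectors[SECTORS_PER_TRACK]; } Track;
typedef struct { Track tracks[TRACKS_PER_DISC]; } Disc;

/* Each accessor returns NULL for a NULL parent or an out-of-range
 * number, so the calls can be nested and checked once at the end. */
Track *disc_track(Disc *d, size_t n)
{
    return (d != NULL && n < (size_t)TRACKS_PER_DISC) ? &d->tracks[n] : NULL;
}

Sector *track_sector(Track *t, size_t n)
{
    return (t != NULL && n < (size_t)SECTORS_PER_TRACK) ? &t->sectors[n] : NULL;
}
```

The lookup then becomes "Sector *s = track_sector(disc_track(&disc, 5), 8);" followed by a NULL check and "s->data[4]". It reads inside-out rather than left-to-right, but it is one line, and inheritance is where C really starts to hurt, not simple containment like this.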

> ! - This isn't to say C is perfect. There's a lot of nonsense, like 
> whether to include <this> or "this",

<this> ===> system libraries / system headers
"this" ===> application specific headers

> or indirect like.this or
> like->this.

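Use like.this when "like" is the struct itself, like->this when "like" 
is a pointer to one (e.g. something you got back from malloc(), or from 
the "new" operator in C++). like->this is just shorthand for (*like).this.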
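
A short sketch of the two cases side by side. The Modem type and the function names are hypothetical, borrowed loosely from the Modem_PutByte example in the quoted text.

```c
#include <stdlib.h>

typedef struct { int baud; } Modem;   /* hypothetical example type */

/* Passed by value: m is the struct itself, so use '.' */
int baud_of_value(Modem m)
{
    return m.baud;
}

/* Passed by pointer: use '->', which is shorthand for (*m).baud */
int baud_of_pointer(const Modem *m)
{
    return m->baud;
}

/* Heap allocation hands back a pointer, hence '->' again.
 * Returns NULL if malloc fails; the caller must free() the result. */
Modem *modem_create(int baud)
{
    Modem *m = malloc(sizeof *m);
    if (m != NULL)
        m->baud = baud;
    return m;
}
```

Note that a local or static struct still needs -> if all you hold is its address: baud_of_pointer(&local) is perfectly fine. The operator depends on the type of the expression to its left, not on where the object lives.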

> There are reasons, but it's a pain, as is the TOTAL lack of
> bounds checking and such in 99% of compilers. Bounds checking might add 
> penalties,

HAHA!
Understatement of the millennium.

I did some tests with Turbo Pascal bounds checking. I seem to recall it 
slowing down my code by somewhere on the order of 4-6x. This was 
data-processing code that did a lot of array accesses, though.

> but I've seen some HORRIBLE things (probably got some in my
> own code) that the clumsiest Duplo-level intelligence bounds checking 
> would have picked up, but never got noticed as the corruption "wasn't 
> bad enough" to make the flaw obvious.

A lot of C bootstraps (crt0 code) allocate more RAM than necessary for 
the application's data store, then fill all of it with the value 
0xBAADF00D, 0xDEADBEEF or something similar. Equally, a lot of free() 
routines do this just before freeing memory. The idea is that a value 
like this is more likely to cause the program to crash than just leaving 
it with its old value, or leaving the memory set at zero.

I suppose with the UNIX type OSes it's also a security thing -- if 
another app could read the contents of memory after (say) su or sudo has 
executed, and ends up in the same memory block, it might be able to find 
out the password for the current user (or even root). Not exactly 
something you want to happen.

But basically, you see 0xBAADF00D appear in the Data Trace window in 
your debugger, and you KNOW what's going on -- you're accessing 
unallocated (or freed) memory. Then you backtrace from there and find 
out what freed the memory you were using...
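
A minimal sketch of the poison-on-free idea, for anyone who wants to bolt it onto their own code. The names poison and debug_free are invented here, and the caller has to pass the block size because standard free() offers no way to ask for it. Real debug allocators track the size themselves.

```c
#include <stdlib.h>
#include <string.h>

/* Overwrite n bytes at p with the repeating pattern BA AD F0 0D.
 * The bytes are written in that order so a hex dump reads BAADF00D;
 * viewed as a 32-bit word on a little-endian machine it will show
 * up as 0x0DF0ADBA instead. */
void poison(void *p, size_t n)
{
    static const unsigned char pattern[4] = { 0xBA, 0xAD, 0xF0, 0x0D };
    unsigned char *bytes = p;
    size_t i;

    for (i = 0; i < n; i++)
        bytes[i] = pattern[i % 4];
}

/* Debug replacement for free(): scribble over the block, then
 * release it. Passing NULL is a no-op, as with free(). */
void debug_free(void *p, size_t n)
{
    if (p != NULL) {
        poison(p, n);
        free(p);
    }
}
```

One caveat: an optimising compiler is allowed to drop stores to memory that is about to be freed, so this is best kept to unoptimised debug builds. The allocator may also overwrite part of the block with its own bookkeeping once it has been freed.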

> But on the other hand, pretty much the entirety of certain mainstream 
> operating systems and most ALL the software supplied with, is written in 
> C. You could argue this isn't necessarily a good thing. I would 
> counter-argue, show me the one written in Pascal...

UCSD Pascal. A complete operating system, Pascal compiler, and so forth. 
I seem to recall most of UCSD Pascal is written in Pascal -- you have 
a short p-code interpreter, written in assembler, that actually "runs" 
code and provides hardware abstraction (e.g. floppy disc, screen and 
printer access), but the actual OS is written in Pascal.

Truth be told, though, I do >80% of my programming in C (it's quicker 
than playing with C++), but the OO or GUI stuff is written in C++ out of 
necessity. If I'm just doing quick hacks, I'll probably use Python -- it 
has a lot of useful libraries that make e.g. XML and web interfacing a 
snap. Seriously, I would kill for an equivalent of Python's ElementTree 
XML library written in C++ (and using STL elements, e.g. vectors/lists)...

Cheers,
-- 
Phil.
philpem@...          
http://www.philpem.me.uk/