#DesmetC now has 10912 tests of its arithmetic and logical operations. This approach continues to turn up bugs, which I then fix.
On the plus side I seem to have corrected lots of #DesmetC mul/div/mod bugs in one fell swoop by rewriting the relevant part of its codegen, lmao. So many type hacks, gone.
On the minus side, I think now I might be finding some bugs in #OpenWatcom C, which was supposed to be my infallible test oracle, dammit ;D
Realized my now-thousands of #DesmetC binop expression evaluation test cases are still all just variable [op] variable cases, and don't exercise variable [op] literal or literal [op] variable at all yet. I know there will be more bugs there, because the compiler does constant folding as an entirely separate code path.
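A minimal sketch of the three operand shapes, using % as the example operator. The function names are made up for illustration and are not from the test suite.

```c
/* Three operand shapes for one operator. A compiler that folds constants
   on a separate code path may handle each shape with different code. */
int var_op_var(int a, int b) { return a % b; }   /* variable [op] variable */
int var_op_lit(int a)        { return a % 7; }   /* variable [op] literal  */
int lit_op_var(int b)        { return 100 % b; } /* literal  [op] variable */
```

Each shape needs its own test cases, since passing one says little about the others.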
Latest #DesmetC "explorations with machete and torch": the compiler source has numbered constants for each supported C datatype. Normally you'd use enum for this sort of thing, but this codebase used #defines. The constants were numbered in a strange order, and I wanted to re-sort them in the order of the "usual arithmetic conversions", to simplify some logic. This broke code-gen, emitting illegal instructions. Several hours later, I found that CCHAR=1 and CINT=2 were directly used in hex math determining which x86 opcode to emit. When I renumbered those constants, it caused absurd instructions to be generated. After correcting that problem, we are now back to self-hosting o.k.
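To illustrate how a type constant can leak into opcode selection: many x86 instructions come in byte/word pairs that differ only in the low opcode bit (0x88 is MOV r/m8, r8 and 0x89 is MOV r/m16, r16). The sketch below is hypothetical and not the actual DesmetC code. The function name and its parameters are assumptions. It only shows why math on CCHAR=1 and CINT=2 stops working once the constants are renumbered.

```c
/* Hypothetical illustration; not taken from the DesmetC source. */
#define CCHAR 1   /* values quoted in the post above */
#define CINT  2

/* Many x86 opcodes come in pairs: even = 8-bit form, odd = 16-bit form. */
unsigned char opcode_for_type(unsigned char byte_form_opcode, int type)
{
    /* Only correct while CCHAR == 1 and CINT == 2. */
    return (unsigned char)(byte_form_opcode + (type - 1));
}
```

If CINT were renumbered to 5, opcode_for_type(0x88, CINT) would yield 0x8C, which is a different instruction entirely (MOV r/m16, Sreg).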
I'm hoping this will make it possible to retire a bunch of one-off type promotion logic scattered around the compiler, in favor of a few central functions closely mapping to the C89 standard.
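A rough sketch of what such a central function could look like. The enum and function names are hypothetical, not DesmetC's, and it assumes 16-bit short and int with a 32-bit long, so long can represent every unsigned int value. Under those assumptions, ordering the type tags by conversion rank reduces the C89 rules to "promote both, take the higher".

```c
/* Hypothetical type tags ordered by conversion rank; not DesmetC's constants. */
enum ctype {
    T_SCHAR, T_UCHAR, T_SHORT, T_USHORT,
    T_INT, T_UINT, T_LONG, T_ULONG,
    T_FLOAT, T_DOUBLE, T_LDOUBLE
};

/* Integral promotions, assuming 16-bit short and int. */
enum ctype promote(enum ctype t)
{
    switch (t) {
    case T_SCHAR: case T_UCHAR: case T_SHORT:
        return T_INT;   /* int can represent every value of these types */
    case T_USHORT:
        return T_UINT;  /* a 16-bit int cannot hold 65535 */
    default:
        return t;
    }
}

/* Result type of a binary arithmetic operator, assuming 32-bit long. */
enum ctype usual_arith(enum ctype a, enum ctype b)
{
    a = promote(a);
    b = promote(b);
    return a > b ? a : b;
}
```

On a target where long is no wider than unsigned int, the long/unsigned int case would need special handling (the result becomes unsigned long), so the simple maximum would no longer be enough.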
After all my testing, how many bugs can #DesmetC math expressions still have? Well... the next problem is that the result type doesn't always match the behavior required in C89 §3.2.1.5 Usual arithmetic conversions. Like if you do int + unsigned, there are circumstances where the result might be int, not unsigned as the standard requires.
My tests weren't catching this because they were all structured like,
int i;
unsigned j, k;
k = i + j;
// now test the value of k against expectations
where the assignment coerced the result to a particular type. This meant tests weren't checking the result's "natural" type, and sometimes that was wrong.
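One way to observe the natural type without an assignment getting in the way is to use the expression directly in a comparison. This is a sketch with a made-up function name, not a test from the suite.

```c
/* Returns 1 if (int + unsigned) is evaluated as unsigned, 0 if as int. */
int sum_is_unsigned(void)
{
    int i = -1;
    unsigned j = 0;
    /* As unsigned, i + j is UINT_MAX, which is > 0.
       As int, i + j is -1, which is not. */
    return (i + j) > 0;
}
```

A conforming compiler returns 1 here. A compiler that leaves the sum as int returns 0.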
OK, finally I'm reasonably confident that the integer comparisons in #DesmetC are working correctly, after beating them into submission against a test-case-generator, with tcc on my Linux machine serving as the test oracle.
Huh weird, just noticed that in #DesmetC I can just keep redeclaring a local variable by the same name and it works. My test suite was doing this by accident, with seemingly no problems.
int n = f1();
printf("%d\n", n);
int n = f2();
printf("%d\n", n);
This isn't even a C99 compiler, so declaring a variable anywhere other than at the start of a block should be illegal besides.
It is satisfying to watch the test count creep steadily upward. I like to leave off just after writing a failing test, to give me a clear "next task" when I resume work.
#DesmetC mul/div/mod i8 bugs seem more or less vanquished, now moving on to i8 comparison, which I probably should have done first, as it's also turning out to be broken.... -1 > 1, don't you know?
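One common way to get this symptom is comparing the raw bytes as unsigned. That is an assumption for illustration, not a confirmed diagnosis of the DesmetC bug, and the function names are made up.

```c
/* Correct signed comparison. */
int gt_signed(signed char a, signed char b)
{
    return a > b;
}

/* What an unsigned 8-bit compare computes on the same bit patterns:
   -1 is 0xFF, which reads as 255 when treated as unsigned. */
int gt_as_unsigned(signed char a, signed char b)
{
    return (unsigned char)a > (unsigned char)b;
}
```

With a = -1 and b = 1, gt_signed gives 0 and gt_as_unsigned gives 1.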
The mul-div codegen path in #DesmetC is sorta a nightmare because so much is reused between multiplication, division, and mod, and because there are some dodgy special-cases in here from 1990 that demonstrably do the wrong thing. Proceeding slowly with machete and torch, laying down test cases as I go.
It's been a pretty productive night in the ol' #DesmetC codebase. Regression tests finally checked in, all the mixed-size integer addition/subtraction involving signed chars I could think of is exercised and passing, nice.
Then I try i8 * i8 -> int and it instantly breaks, not so nice 🫠.
Oh well, that gives me something to fix tomorrow.
Also, I haven't ventured into floating point conversion land yet, either. I'm sure that'll have plenty of dragons when used with signed char.
All this is making me appreciate the wisdom of BCPL and B who have just a word-sized type -- or #Forth which takes that and adds char, as a treat.
#DesmetC (signed char -> long) promotion is now working on my branch: got it on the first try, which hopefully means I'm internalizing the codebase. Now I will start on tests for mixed-sign arithmetic.
Okay, #DesmetC sign-extension in (signed char -> int) promotion now works in my branch. (signed char -> long) does not work yet; it neglects to sign-extend, acting more-or-less like (unsigned char -> long). That's next to fix.
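The difference between the two behaviors, written out in plain C. The function names are hypothetical; this models the symptom, not the compiler's codegen.

```c
/* Correct: the sign bit of the char is copied into the upper bits. */
long widen_sign_extend(signed char c)
{
    return (long)c;
}

/* The broken behavior: upper bits are zero-filled,
   as if the value had been an unsigned char. */
long widen_zero_extend(signed char c)
{
    return (long)(unsigned char)c;
}
```

For c = -1 the first returns -1 and the second returns 255. Non-negative values come out the same either way, which is why only tests with negative inputs catch this.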
Thankfully, the codebase is small enough that it's not too hard to find the logic responsible for any given codegen decision.
Also, I came up with the trick of having the assembler backend emit comments into the output asm file. This lets me do something like printf debugging to check which codegen cases are being hit and annotate the assembly they're generating.
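A minimal sketch of the idea. The helper name and signature are hypothetical (the real backend's emit functions will differ), and it assumes the assembler treats ';' as the start of a comment.

```c
#include <stdio.h>

/* Hypothetical helper: writes a tagged comment line into the asm output
   so the generated file records which codegen case produced what. */
void emit_comment(FILE *out, const char *where, const char *note)
{
    /* Assumption: ';' starts a comment in the target assembler. */
    fprintf(out, "\t; [%s] %s\n", where, note);
}
```

Calling emit_comment(out, "mul8", "signed char operands") just before emitting the instructions for that case leaves a searchable marker in the .asm file.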
Naturally this is all test-driven: I'm accumulating regression tests for the broken codegen I've been fixing, and usually the way I find codegen bugs is by writing new tests expecting the mathematically correct answer, and watching them immediately fail.
Hmm, I found yet more weirdness with signed chars in #DesmetC. If you do:
signed char i, j;
int k;
// ...
k = i + j;
k's upper bits become a sign-extended version of i, not a sign-extended version of the result.
Much of this pain seems to trace back to a quirk (deviation from the C standard) documented in the manual: math on char types produces a char result not an int result. Perhaps to save a few instructions? Anyway, however this was implemented seems to work fine for char but not signed char.
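The k = i + j symptom can be modeled in plain C. This sketch assumes a 16-bit int and that the low byte holds the 8-bit sum; the function name is made up, and it simulates the observed result rather than reproducing compiler code.

```c
/* Simulates the buggy widening of (signed char + signed char) to a
   16-bit int: low byte from the 8-bit sum, high byte from the sign of i. */
int buggy_add(signed char i, signed char j)
{
    unsigned low  = ((unsigned)i + (unsigned)j) & 0xFFu; /* 8-bit sum */
    unsigned high = (i < 0) ? 0xFF00u : 0x0000u;         /* sign of i, not of the sum */
    unsigned bits = high | low;                          /* 16-bit pattern */
    /* Reinterpret the 16-bit pattern as a signed value. */
    return (bits & 0x8000u) ? (int)((long)bits - 0x10000L) : (int)bits;
}
```

With i = -1 and j = 2 the right answer is 1, but the model gives 0xFF01, i.e. -255. When i and the sum happen to have the same sign and the sum fits in 8 bits, the answer is accidentally correct, which helps explain how this can slip past casual testing.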
Been writing regression tests for arithmetic, which caught another #DesmetC code-gen bug which I was able to fix. https://github.com/the-grue/OpenDC/issues/5
Previously, illegal asm instructions were being generated, as shown.
Ooh, the-grue, the current maintainer of OpenDC #DesmetC, took my code-gen patch, how awesome! Usually when I pick up an old codebase like this, the maintainer is long gone.
So @linear if you end up wanting to submit patches, that's the place: https://github.com/the-grue/OpenDC
My fork will remain just an unofficial fork.
The #DeSmetC compiler codebase is the hairiest code I've had the experience of hacking. K&R style, many global variables, short cryptic names, spooky action at a distance, the shotgun-surgery pattern for type handling splatted around everywhere, oh baby.
For all that, I managed to fix the codegen bug from the GitHub issues on the ~second day of working on the compiler... that's the beauty of a small codebase.
My fork is here: https://gitlab.cs.washington.edu/fidelp/open_desmet_c
1 bug down, 999 to go...