Friday 30 December 2011

Never overestimate the difficulty of concurrent code

Whenever anything goes wrong in concurrent code, people assume it's because of the concurrency. Especially if it's an apparently-random error, e.g., memory corruption.

But that doesn't actually mean the concurrency is the cause.

Turns out that I read the MSDN page wrong- insert returned false for "insertion did not occur", not false for "value did not already exist". Whoopsie.

Tuesday 27 December 2011

Concurrency fun

Concurrency is no fun. Let's review.

1. The implementation is buggy as fuck. I mean, stupid, obvious bugs.
2. There's nobody around to help. If I post i++ + ++i, then it gets answered by everyone, and if my concurrent code fails, then nobody even looks.

Friday 23 December 2011

Concurrency in the compiler

So right now, we've identified five stages in compilation:

Lex
Parse
Collate
Compile
Codegen

So which, if any, of these can we run in parallel?

Lex - yes. The lexer simply operates on a series of files, and each file could be opened and lexed in parallel.
Parse - yes. The parser simply operates on the tokens produced by the lexer, so each file can be parsed in parallel.
Collate - yes. Thanks to the PPL, a concurrent_unordered_map ought to do just fine here to allow us to collate concurrently.

Compile - Unknown. As this stage currently has no implementation, there's no way to know if I can parallelise that implementation. However, right now, I think the answer may yet be "yes". For example, given a simple binary expression of the form "expr + expr", then logically, each expression could be constructed concurrently. In addition, any statement that does not change the symbol table could be evaluated concurrently.

Codegen - Unlikely. LLVM does not appear to offer much concurrent compilation functionality, so this could get rather messy if concurrency is aimed for.
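The per-file parallelism for the first two stages can be sketched with plain std::thread rather than the PPL, so it stays portable. Ast and lex_and_parse are stand-ins for the real lexer/parser, which aren't shown here:

```cpp
#include <string>
#include <thread>
#include <vector>

// Stand-ins for the real front end: each file is independent, so the
// lex+parse pipeline for one file never touches another file's state.
struct Ast { std::string file; std::size_t tokens; };

static Ast lex_and_parse(const std::string& file) {
    // Real code: lex(file) -> tokens, parse(tokens) -> AST. Faked here.
    return Ast{file, file.size()};
}

static std::vector<Ast> compile_front_end(const std::vector<std::string>& files) {
    std::vector<Ast> results(files.size());
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < files.size(); ++i)
        workers.emplace_back([&, i] { results[i] = lex_and_parse(files[i]); });
    for (auto& w : workers) w.join();
    return results; // one AST per file, produced concurrently
}
```

Each worker writes only its own slot of the pre-sized results vector, so no locking is needed; the collation stage is where the shared (concurrent) map comes in.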

Locational information

The first task of the analyzer is collation. This basically means going through all the source files and gluing them together. It also means that, from the analyzer's perspective, the data structures are in an irrelevant order- that is, the analyzer doesn't care if you used a function before defining it. The analyzer doesn't look at the *contents* of functions, etc. It just says "Wide.X is a variable". Collation yields errors like "you said that Wide.X was a variable, but earlier you said it was a function".
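That conflict check can be sketched as a name-to-kind table; the names Collator, Kind, and declare are hypothetical, and the real analyzer tracks more than a kind per name:

```cpp
#include <stdexcept>
#include <string>
#include <unordered_map>

// What a fully-qualified name was declared as.
enum class Kind { Variable, Function, Module };

struct Collator {
    std::unordered_map<std::string, Kind> symbols;

    // Record a declaration; complain if the name was already collated
    // as a different kind of thing.
    void declare(const std::string& name, Kind kind) {
        auto result = symbols.insert({name, kind});
        if (!result.second && result.first->second != kind)
            throw std::runtime_error(
                "you said that " + name + " was one thing, "
                "but earlier you said it was another");
    }
};
```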

The second stage will be compilation- the analyzer starts by looking at all entry points. That means Main(), obviously, and any exported symbols. Then it recursively compiles all functions and types used. This will yield things like "You tried to assign an int to a string, wtf you smoking?" and construct the semantic types for expressions, statements, that kind of useful thing.

The third is code generation. This will mean conversion to LLVM IR, and then asking LLVM to kindly make it into an executable of a useful format, such as PE. Then compilation will be complete.

Right now I'm working on collation. Specifically, I have observed that I give shitty errors. Not quite "You fucked up", but "You made X mistake." and it really needs to be "You made X mistake, and you should really take a look at Y locations to see wtf I'm talking about." This means that Y needs to be passed from the lexer through the parser to the analyzer, which means interacting with my old fun friend Bison.

Moving location data through the parser today so the semantic analyser can give meaningful errors. Right now, I can tell you that you made a mistake, but not the location, which is obviously not very helpful. Currently, AST nodes carry the beginning and end tokens that produced them. Some "container" nodes, like modules, contain only their own data, as each element is a separate node that should be considered separately.
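A sketch of what that location data might look like; the Location and AstNode names are made up, but the idea matches the text: the lexer fills in positions, Bison threads them through, and a parent node spans from its first to its last token:

```cpp
#include <string>

// Hypothetical location record carried by every AST node so the
// analyzer can point at "Y locations" in its error messages.
struct Location {
    std::string file;
    unsigned begin_line, begin_col;
    unsigned end_line, end_col;
};

struct AstNode {
    Location where;
};

// A parent node's location spans its first child's beginning token
// through its last child's end token.
static Location merge(const Location& begin, const Location& end) {
    return Location{begin.file, begin.begin_line, begin.begin_col,
                    end.end_line, end.end_col};
}
```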

ATM, the only mistakes you can make that I can pick up on are that you gave a module member two different access levels, and that you said a module was dynamic once and then not again. In addition, in theory, I can mention that there was already something else where you tried to put a module, but unfortunately, since modules are the only things I actually collate right now, there's not much to conflict with.

Tuesday 20 December 2011

Move semantics

RARGH VISUAL STUDIO Y U SUCK :(

At least, soon, when I get an ICE, it'll be my own fault.

Monday 5 December 2011

Impossible implementation

I've come to the conclusion that it may be physically impossible to implement WideC as I had planned. The simple problem is interoperation.

The compiler interops heavily with the generated code- for example, they must use the same string classes and data structures, the same exception handling, the same inheritance implementation, etc. This is going to be a big problem, because in order to write the compiler's internal data structures to interact with the WideC libraries, I'd have to compile the WideC libraries first- which is obviously impossible, since I can't build the compiler without those data structures.

The only solution to this is going to be a C-style abstraction, I think.

Sunday 4 December 2011

ABI independence

The problem I'm currently focusing my non-trivial talents on is a specified ABI. I think that by simply delegating to the C ABI, this would be effectively achieved. Exceptions can already be converted relatively easily using the old error code mechanism, except that the language spec would enforce that it be checked and converted automatically, as it were.

Standard.String exported_func() throws(Standard.String) {
    throw "harhar";
}

becomes, equivalently,

char buff[sizeof(Standard.String)];
void* result = &buff[0];
auto enumvalue = exported_func(&result);
if (result == &buff[0]) {
    // buff contains a valid Standard.String
} else {
    if (enumvalue == 1) {
        // result points to a Standard.String exception value
        // do shit
        __dll_free_exception(result);
    } else {
        // fuck
    }
}

Not the cheapest conversion ever, but at least it should be fairly simple to automate.
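The callee side could be lowered the same way. This is a guess at what exported_func might compile down to- the name exported_func_lowered and the use of plain malloc for the exception object are assumptions, standing in for whatever allocator pairs with __dll_free_exception:

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical lowered form of exported_func. Return 0: the result slot the
// caller provided was filled with a valid value. Return 1: *result was
// repointed at a heap-allocated exception object the caller must free.
extern "C" int exported_func_lowered(void** result, bool should_throw) {
    if (should_throw) {
        char* exc = static_cast<char*>(std::malloc(7));
        std::memcpy(exc, "harhar", 7);  // the thrown Standard.String payload
        *result = exc;                  // caller sees result != &buff[0]
        return 1;                       // enum identifying the exception type
    }
    std::memcpy(*result, "ok", 3);      // construct the return value directly
    return 0;                           // in the caller-provided buffer
}
```

The caller-side snippet above then just compares the pointer it passed in against what came back, exactly as in the buff check.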

The problem comes in the specification of Standard types. Logically, a custom type can ultimately be built out of only two things:

Primitive C types, like fixed-width integers / doubles / pointers
Standard types

Obviously, every limitation placed on the implementation restricts the available implementations, which is not something I really want. On the other hand, totally unrestricted implementation means that no implementations can be compatible, which is unacceptable. This issue is partially solved because dynamic libraries are intended to be much rarer in WideC than in C++, but still needs further exploration.
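To make the trade-off concrete, here is a hypothetical layout for Standard.String built only from the two ingredients above- this exact struct is an invented example, not the spec, but it shows the kind of restriction being weighed: any two implementations agreeing on it could pass strings across a DLL boundary, at the cost of forbidding, say, small-string optimization in that representation.

```cpp
#include <cstdint>

// Invented ABI-level layout for Standard.String: primitive C types only
// (one pointer, two fixed-width integers), so it means the same thing to
// every implementation that compiles against it.
struct StandardString {
    char*         data;     // UTF-8 bytes, not necessarily null-terminated
    std::uint64_t size;     // bytes in use
    std::uint64_t capacity; // bytes allocated
};
```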