Friday 30 December 2011

Never overestimate the difficulty of concurrent code

Whenever anything goes wrong in concurrent code, people assume it's because of the concurrency. Especially if it's an apparently-random error, e.g., memory corruption.

But that doesn't actually mean the concurrency is the cause.

Turns out that I read the MSDN page wrong- it was false for "insertion did not occur", not false for "value did not already exist". Whoopsie.
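For the record, a sketch of the convention that bit me- std::unordered_map shown here, since (as far as I can tell) PPL's concurrent_unordered_map mirrors its insert result: the bool means "insertion occurred", nothing more.

```cpp
#include <string>
#include <unordered_map>
#include <utility>

// The bool half of insert's result means "insertion occurred",
// NOT "the value did not already exist" in any broader sense.
std::pair<bool, std::string> insert_then_reinsert() {
    std::unordered_map<int, std::string> m;
    m.insert({1, "one"});                // bool is true: insertion occurred
    auto result = m.insert({1, "uno"});  // bool is false: insertion did NOT occur
    // The existing value is left untouched when insertion fails.
    return {result.second, result.first->second};
}
```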

Tuesday 27 December 2011

Concurrency fun

Concurrency is no fun. Let's review.

1. The implementation is buggy as fuck. I mean, stupid, obvious bugs.
2. There's nobody around to help. If I post i++ + ++i, then it gets answered by everyone, and if my concurrent code fails, then nobody even looks.

Friday 23 December 2011

Concurrency in the compiler

So right now, we've identified five stages in compilation:

Lex
Parse
Collate
Compile
Codegen

So which, if any, of these can we run in parallel?

Lex- yes. The lexer simply operates on a series of files. These files could each be opened in parallel.
Parse- yes. The parser simply operates on the tokens produced by the lexer, so each file can be parsed in parallel.
Collate- yes. Thanks to the PPL, a concurrent_unordered_map ought to do just fine here to allow us to collate concurrently.

Compile - Unknown. As this stage currently has no implementation, there's no way to know if I can parallelise that implementation. However, right now, I think the answer may yet be "yes". For example, given a simple binary expression of the form "expr + expr", then logically, each expression could be constructed concurrently. In addition, any statement that does not change the symbol table could be evaluated concurrently.

Codegen - Unlikely. LLVM does not appear to offer much concurrent compilation functionality, so this could get rather messy if concurrency is aimed for.
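As a sketch of the per-file parallelism above- the token type and lexer here are hypothetical stand-ins, and std::async is doing the job the PPL would do in practice:

```cpp
#include <future>
#include <string>
#include <vector>

// Hypothetical token type and lexer- stand-ins for the real thing.
struct Token { std::string text; };

std::vector<Token> lex(const std::string& source) {
    // Trivial whitespace-splitting lexer, for illustration only.
    std::vector<Token> out;
    std::string cur;
    for (char c : source) {
        if (c == ' ') {
            if (!cur.empty()) { out.push_back({cur}); cur.clear(); }
        } else {
            cur += c;
        }
    }
    if (!cur.empty()) out.push_back({cur});
    return out;
}

// Lex each file on its own task- the files are independent, so there's
// no shared state to worry about at this stage.
std::vector<std::vector<Token>> lex_all(const std::vector<std::string>& files) {
    std::vector<std::future<std::vector<Token>>> futures;
    for (const auto& f : files)
        futures.push_back(std::async(std::launch::async, lex, f));
    std::vector<std::vector<Token>> results;
    for (auto& fut : futures)
        results.push_back(fut.get());
    return results;
}
```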

Locational information

The first task of the analyzer is collation. This basically means going through all the source files and gluing them together. It also means that, from the analyzer's perspective, the data structures are in an irrelevant order- that is, the analyzer doesn't care if you used a function before defining it. The analyzer doesn't look at the *contents* of functions, etc. It just says "Wide.X is a variable". Collation yields errors like "you said that Wide.X was a variable, but earlier you said it was a function".

The second stage will be compilation- the analyzer starts by looking at all entry points. That means Main(), obviously, and any exported symbols. Then it recursively compiles all functions and types used. This will yield things like "You tried to assign an int to a string, wtf you smoking?" and construct the semantic types for expressions, statements, that kind of useful thing.

The third is code generation. This will mean converting to LLVM IR and then asking LLVM to kindly make it into an executable of a useful format, such as PE. Then compilation will be complete.

Right now I'm working on collation. Specifically, I have observed that I give shitty errors. Not quite "You fucked up", but "You made X mistake." and it really needs to be "You made X mistake, and you should really take a look at Y locations to see wtf I'm talking about." This means that Y needs to be passed from the lexer through the parser to the analyzer, which means interacting with my old fun friend Bison.

Moving location data through the parser today so the semantic analyser can give meaningful errors. Right now, I can tell you that you made a mistake, but not the location, which is obviously not very helpful. Currently, AST nodes carry the beginning and end tokens that produced them. Some "container" nodes, like modules, contain only their own data, as each element is a separate node that should be considered separately.
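Roughly what that plumbing looks like- the names here are hypothetical illustrations, not the actual Wide sources:

```cpp
#include <string>

// Hypothetical: each token gets a source location from the lexer.
struct Location {
    std::string file;
    unsigned line;
    unsigned column;
};

// Every AST node keeps the span of tokens that produced it, so the
// analyzer can point at both conflicting locations in an error.
struct ASTNode {
    Location begin; // first token of this node
    Location end;   // last token of this node
};

// Render a location for a message like "you said Wide.X was a variable,
// but at main.wide:3:7 you said it was a function".
std::string describe(const Location& loc) {
    return loc.file + ":" + std::to_string(loc.line) + ":" + std::to_string(loc.column);
}
```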

ATM, the only mistakes you can make that I can pick up on are that you gave a module member two different access levels, and that you said a module was dynamic once and then not again. In addition, in theory, I can mention that there was already something else where you tried to put a module, but unfortunately, since modules are the only things I actually collate right now, there's not much to conflict with.

Tuesday 20 December 2011

Move semantics

RARGH VISUAL STUDIO Y U SUCK :(

At least, soon, when I get an ICE, it'll be my own fault.

Monday 5 December 2011

Impossible implementation

I've come to the conclusion that it may be physically impossible to implement WideC as I had planned. The simple problem is interoperation.

The compiler interops heavily with the generated code- for example, they must use the same string classes and data structures, the same exception handling, the same inheritance implementation, etc. This is going to be a big problem, because in order to write the compiler's internal data structures to interact with the WideC libraries, I'd have to compile the WideC libraries, which is obviously impossible, since I can't write the compiler's necessary data structures.

The only solution to this is going to be a C-style abstraction, I think.

Sunday 4 December 2011

ABI independence

The problem I'm currently focusing my non-trivial talents on is a specified ABI. I think that by simply delegating to the C ABI, this would be effectively achieved. Exceptions can already be converted relatively easily using the old error code mechanism, except that the language spec would enforce that it be checked and converted automatically, as it were.

Standard.String exported_func() throws(Standard.String) {
    throw "harhar";
}

becomes, equivalently,

char buff[sizeof(Standard.String)];
void* result = &buff[0];
auto enumvalue = exported_func(&result);
if (result == &buff[0]) {
    // buff contains a valid Standard.String
} else {
    if (enumvalue == 1) {
        // result points to a Standard.String exception value
        // do shit
        __dll_free_exception(result);
    } else {
        // fuck
    }
}

Not the cheapest conversion ever, but at least it should be fairly simple to automate.

The problem comes in the specification of Standard types. Logically, a custom type can ultimately be built from only two things:

Primitive C types, like fixed-width integers / doubles / pointers
Standard types

Obviously, every limitation placed on the implementation restricts the available implementations, which is not something I really want. On the other hand, totally unrestricted implementation means that no implementations can be compatible, which is unacceptable. This issue is partially solved because dynamic libraries are intended to be much rarer in WideC than in C++, but still needs further exploration.

Monday 28 November 2011

Unicode

So, I've been attempting to patch up my Unicode support. The lexer's support for it was, well, kinda laughable. It really didn't deal with collating characters and other things properly. As such, I have been forced to retreat to ICU. Oh holy shit, have these guys ever even seen a reasonable quality C++ library? ewwwwwwww. I hate you, now and forever, Unicode.

On the plus side, my custom parser will be done really quite soon. I've been working on the expression handling and I finally managed to express them in such a way that Visual Studio would compile them. This is a significant advantage.

Wednesday 16 November 2011

My Parser

My parser is so awesome, Visual C++ can't compile it. It gives back a nice Internal Compiler Error :(

Oh wait, that's true of nearly *every* awesome program.

Tuesday 8 November 2011

Exist

I do still exist! I am currently working on the first implementation. After the funcakes I had with Bison, you can fully expect that I've been working on the custom parser. My custom lexer is nice and high quality thanks to the fact that I ended up re-writing it about six billion times. At least when my parser throws an exception, I will have nice high quality error messages. That's my end goal, anyway.

Now, for a custom parser, I am obviously going to do the horrendously efficient thing and work without an explicit stack. I am going to instead simply increment/decrement the current element pointer through an existing vector. This should permit a ridiculously small performance improvement for a billion-times increase in development time.

Seriously though, I just find it more natural.
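A sketch of what I mean- a hypothetical token cursor instead of an explicit stack:

```cpp
#include <string>
#include <vector>

// Hypothetical token type- a stand-in for the real lexer's output.
struct Token { std::string text; };

// No explicit stack: just an index into the token vector, advanced as
// input is consumed and decremented to backtrack.
class Parser {
    std::vector<Token> tokens;
    std::size_t current = 0;
public:
    explicit Parser(std::vector<Token> t) : tokens(std::move(t)) {}
    bool done() const { return current >= tokens.size(); }
    const Token& peek() const { return tokens[current]; }
    const Token& advance() { return tokens[current++]; }
    void retreat() { --current; } // backtracking is just a decrement
};

// Walk the whole stream through the cursor, collecting token texts.
std::vector<std::string> texts_via_cursor(std::vector<Token> toks) {
    Parser p(std::move(toks));
    std::vector<std::string> out;
    while (!p.done())
        out.push_back(p.advance().text);
    return out;
}
```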

Tuesday 18 October 2011

Rargh banks annoying

Goddamn useless card readers. I don't need a card reader to send Amazon or Tesco money electronically, so why do I need one to send money to myself? :(:(:(

Sunday 16 October 2011

Rargh

Rargh, university. The bane of my life. So can't wait to get rid of this shit. :(

Tuesday 11 October 2011

Customize-Reach GC

I've decided on a new strategy for garbage collection- the CM GC, or Customize-Mark GC.

The basic idea behind a GC is to list all the objects you ever allocated off the GC. Then you look through the stack (and static references) and mark off every object referenced. Then you go through the newly marked objects and mark off every object *they* reference. Rinse and repeat until no new objects are referenced. Delete all remaining unmarked objects.
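That algorithm, sketched in a few lines (illustrative only- a real GC tracks roots and actual memory, not a toy vector):

```cpp
#include <vector>

// Minimal mark-and-sweep sketch. An object just records what it references.
struct Object {
    bool marked = false;
    std::vector<Object*> references;
};

// Mark this object and, recursively, everything it can reach-
// the "rinse and repeat" step.
void mark(Object* obj) {
    if (obj == nullptr || obj->marked) return;
    obj->marked = true;
    for (Object* ref : obj->references)
        mark(ref);
}

// Returns how many objects are unreachable- the ones a real GC would delete.
int sweep_count(std::vector<Object>& heap, const std::vector<Object*>& roots) {
    for (Object& o : heap) o.marked = false;
    for (Object* r : roots) mark(r);
    int dead = 0;
    for (const Object& o : heap)
        if (!o.marked) ++dead;
    return dead;
}

// Tiny self-check: three objects, the root reaches the first two,
// the third is garbage.
int gc_demo() {
    std::vector<Object> heap(3);
    heap[0].references.push_back(&heap[1]);
    return sweep_count(heap, {&heap[0]});
}
```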

However, I've had a genius idea for a GC. One where you can customize the "mark" phase. Consider, for example, a std::vector or boost::variant. These container objects would need a custom reach function so that they can "reach" their contents. This also means that it would be possible to "mark" objects which are only referred to externally.

For example, let's consider the WinAPI thread pump. You can associate a custom object with each HWND for use in callbacks. This basically consists of SetWindowLongPtr/GetWindowLongPtr. Obviously, with previous GCs it would be impossible for this to point to a GC object, because the GC has no way of reaching it- and even if it could, it couldn't update the object pointer, which rules out compaction. A custom-mark GC, however, can.

class Window {
    HWND hwnd;
public:
    void mark() { // special function, called by the GC during its mark phase
        struct marker {
            HWND hwnd;
            // The GC uses this to update the externally-held pointer
            // if it decides to move the object.
            void operator=(void* ptr) {
                SetWindowLongPtr(hwnd, GWLP_USERDATA, reinterpret_cast<LONG_PTR>(ptr));
            }
            // And this to read the current pointer back.
            void* get() {
                return reinterpret_cast<void*>(GetWindowLongPtr(hwnd, GWLP_USERDATA));
            }
        };
        GC::mark(marker { hwnd });
    }
};

This can tell the GC where to find a custom pointer to an object, and can update it if the GC decides to move the object. This means that the GC implementation can still be compacting, *and* maintain pointers to GC objects that are held externally.

Monday 10 October 2011

Extra Implementations

Well, I've been thinking further about implementations. I want to design a language which can be implemented on the CLR and JVM, as well as native code. Now, all the metaprogramming goodness can be achieved by icky name mangling and is platform independent.

Now, I've been thinking about implementing pointers as integers. Consider the following simple pseudo-Csharp:

class memory {
    static ArrayList<Object> heap = new ArrayList<Object>();
    static int Create<T>() where T : new() {
        lock(heap) {
            heap.Add(new T());
            return heap.Count - 1;
        }
    }
}

Obviously this will need to become more complex to actually fulfill the requirements. In any case, this handily converts from a GC reference to a pointer-style type. The integer is POD and can be memory-copied around, it needs to be manually removed, and the GC won't collect the object. In a theoretical implementation of a C++-like language on the CLR, then the return value would serve as a "pointer".
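The same trick can be sketched in C++ terms- a hypothetical handle table where the "pointer" is just an index into a table of strong references (names invented for illustration):

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// A "pointer" is just an index into a table of strong references.
// The table keeps the object alive until the handle is explicitly
// destroyed- mirroring the manual-removal semantics described above.
class HandleTable {
    std::vector<std::shared_ptr<void>> slots;
    std::mutex lock;
public:
    std::size_t create(std::shared_ptr<void> obj) {
        std::lock_guard<std::mutex> guard(lock);
        slots.push_back(std::move(obj));
        return slots.size() - 1;
    }
    std::shared_ptr<void> get(std::size_t handle) {
        std::lock_guard<std::mutex> guard(lock);
        return slots[handle];
    }
    void destroy(std::size_t handle) {
        std::lock_guard<std::mutex> guard(lock);
        slots[handle].reset(); // object becomes collectible
    }
};

// Usage: the handle is a plain integer- POD, freely copyable around.
int handle_demo() {
    HandleTable table;
    std::size_t h = table.create(std::make_shared<int>(42));
    int value = *std::static_pointer_cast<int>(table.get(h));
    table.destroy(h);
    return value;
}
```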

The problem is when you start wanting to insert values into memory yourself. For example, if I were to attempt to emulate a class like this:

class int_or_string {
    bool is_int;
    char buffer[max(sizeof(int), sizeof(string))];
public:
    int_or_string() {
        is_int = false;
        new (buffer) string();
    }
    // etc
};

For POD types, then there are BinaryWriter classes and such available. As such, I can probably achieve a reasonably close implementation. In addition, all classes that don't have GC references in them are, arguably, implementable by binary writing them into the memory, since emulated pointers are binary writable. As all arrays must be allocated off the GC, it should be relatively simple to implement the "address" returned from placement new as, again, an index into an array of GC references.

The problem comes when I want to put a GC reference there. The GC won't exactly follow my not-exactly-pointers to find the reference.

I've been thinking about just ignoring the "placement" part of the placement new. Just allocate the supposedly placed object off the GC. The problem with this comes when you want to start placing objects into other kinds of memory- for example, how would you place objects into memory provided by external libraries?

What I really need is to differentiate between an object placed into memory which will always have a strong pointer pointing to it, and an object which is binary copied into memory. For objects which are always strong pointed to, I can just ignore the placement part and allocate off the GC. For other objects, well, you can't put GC references in there anyway.

This mandates two ways of writing into memory. So far, I've been thinking about a relatively simple kind of buffer = object; for the second kind. I can guarantee at compile-time that object is not a type which contains GC references. For objects which contain GC references, I would require something like placement new, where you would have to maintain a strongly-typed pointer to the object (including through inheritance, or as part of an array) at all times or risk UB.

Saturday 8 October 2011

Why Joel Spolsky is a numpty

Numpty being a technical term, of course. Have a look at this article:

http://www.joelonsoftware.com/articles/Wrong.html

The correct response to the encoded-non-encoded string problem is to create a separate type which mirrors the platform native string, but that cannot convert to a regular string, or only by a conversion operator which safely encodes it.

class unsafe_string {
    std::string mah_string;
public:
    //.. blah blah
    operator std::string() {
        return encode(mah_string);
    }
};

As such, instead of relying on the eye to review the code and make it safe, you can simply make the compiler do the work for you, which is a vastly superior alternative.

As for operator overloads, well, I'd just have to provide equivalent methods anyway- operator overloads are just syntactic sugar. Also, I don't know wtf IDE you use, but mine tells me the type on demand and will let me see its declaration on demand as well.

Saturday 24 September 2011

C++ 101

C++ source programs are translated from text into an executable format the machine can understand by a tool known as a compiler. These come in all shapes and formats. For Windows, the most common compiler is Visual Studio- you can get C++ Express for free. For Linux, the most common compiler is G++. For Mac, the usual tool is Xcode, which bundles a compiler. You will have to refer to your compiler's documentation on how to use it- they mostly follow platform conventions. Visual Studio has a pretty GUI, while G++ is usually invoked through the command line, for example.

The first source program we will convert is known as "Hello, World". It is traditionally the first program written by a novice in a language. In C++, this program looks like this:

#include <iostream>
int main() {
    std::cout << "Hello, World!\n";
}

This will output the text "Hello, World!" to your computer screen. Some compilers like Visual Studio will close the console as soon as the program is done running. In this case, you may need to use a breakpoint. Breakpoints are used to pause a program's execution. You can place a breakpoint on the last line of the program.

This small example program actually includes many features of C++, both simple and advanced.

Whew, that's a lot- and it's a simplification, too. Fortunately, most of these topics we need to know next to nothing about to work with something a little easier and of more immediate interest. The only current line of interest is the indented one. The purpose of "std::cout << something;" is to print "something" to the screen. That "something" can be a string literal, as in this case, or it can be various other language-provided types, or it can even be custom user-defined types. For now, we will stick to using it with provided types. In addition, this effect can be chained- that is, "std::cout << something << something_else;" will print both "something" and "something_else" to the screen.

Computer programs work with objects. These objects act as boxes into which values can be placed. The actual definition of these values can be simple, but it can also be very complex. Objects in C++ always have a known, fixed type that cannot change. Variables are simple, named objects, created and destroyed at predictable, known times. The simplest type available is int. int is a simple integral type that can represent integers of a certain range. C++ offers numerous integral types with varying ranges. In the following program, we will perform a simple computation using int.

#include <iostream>
int main() {
    int i = 5; // This is called a declaration.
    i = i + 4;
    std::cout << "Five plus four is " << i;
}

As you can see, each individual chunk of code in the "main" area must end with a semicolon. These chunks do not necessarily correspond to a line, but they almost always do. Furthermore, the language has no rules about the indentation of code, but again, there's a lot of convention about indenting. All professional programmers indent their code to a greater or lesser degree, and many languages or tools enforce certain styles.

First we declared a variable, called i, of type int. This means that we told the compiler of i's existence so that it could, for example, allocate the necessary memory. You cannot access a variable if it has not been declared. Conceptually, the declaration brings the variable into existence- it simply did not exist before that line. We gave it the initial value of 5. 5 is an integer within int's range and therefore a perfectly good int. The compiler will give you an error if you attempt to initialize an int with a value of a different type that it cannot convert- for example, a string. This kind of variable is technically known as an "automatic-lifetime" variable, but is usually referred to as a "stack" variable or a variable on the stack. There are other kinds of variable that we will meet later.

Next, we added 4 to it. This was achieved by giving it a new value- its old value, plus four.

Finally, we output it to the screen. This example can be trivially modified to perform other, more complex operations. C++ can cope with arbitrarily complex mathematical expressions. In addition, you may create as many variables as you like.


#include <iostream>
int main() {
    int i = 5; // This is called a declaration.
    int j = 6;
    i = i * j; // Multiplication
    i = (i - j); // C++ also supports parentheses, with the usual meaning
    i = i / j; // Division
    std::cout << "i is " << i << " and j is " << j;
}

An interesting caveat of integer division is that the result is an integer- the fractional part is simply discarded. This means that the result of, for example, (3 / 5) is not 0.6, but simply 0. This will be addressed in the next lesson.

Friday 23 September 2011

Direct3D11 and Direct2D

Curse the genius at Microsoft who decided that they should not be allowed to play together. For Windows 8 this is fixed. But will Microsoft say whether this will be backported to Windows 7 and Vista? No. How helpful.

I guess I should just bite the bullet and get over the shared-surfaces rubbish?

Tuesday 20 September 2011

ABI

The ability to pass information between compilers- at compile-time- provides a unique opportunity to rectify the ABI issues present in Standard C++.

Name mangling: Export by ordinal, and associate regular names with their ordinal.
Exception handling: I believe this can be done by exporting the try/catch/throw functionality at run-time. Failing that, I was thinking that the process could be exported at compile-time- like, "This is how I implemented throw, and this is how to implement catch.". Bonus if you've got a platform-specific extant mechanism like SEH.
RTTI: I'm really planning on just having `type_info` as a virtual interface. Then you can pass the values at compile-time, like, typeid(int).name() returns "AMAGAD WTF IMPLEMENTATION IS THIS". As long as your virtual functions are compatible, that'll be good to go.

Calling conventions, structure layouts, and such- feel free to describe in detail how to pass parameters and that baloney.

As such, I believe that all I'd have to do is Standardise the means of this communication, and I'd be golden.
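For instance, the RTTI idea above might look something like this- entirely hypothetical names, a sketch rather than a real ABI proposal:

```cpp
#include <cstring>

// type_info as a virtual interface: two compilers only need compatible
// vtables, not identical mangled name strings.
struct TypeInfo {
    virtual ~TypeInfo() {}
    virtual const char* name() const = 0; // whatever string this implementation likes
    virtual bool equals(const TypeInfo& other) const {
        return this == &other || std::strcmp(name(), other.name()) == 0;
    }
};

// One compiler's implementation for int- the string is deliberately arbitrary.
struct IntTypeInfo : TypeInfo {
    const char* name() const override { return "AMAGAD WTF IMPLEMENTATION IS THIS"; }
};

// What typeid(int) would hand back in this scheme.
const TypeInfo& typeid_int() {
    static IntTypeInfo info;
    return info;
}
```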

Monday 19 September 2011

Microsoft Fail

How is it that Microsoft can endlessly make the same basic design failures every time? The new Metro APIs suck a tremendous amount of bollocks. I mean, I'm glad that we invented some of the more basic C++ features ... again ... but Microsoft obviously has no idea whatsoever how to use them. For example, look at the XAML/Direct3D interop class, Presenter. How the fuck do you use that? The interface doesn't at all support the necessary use cases. And there's the usual ApplicationModel rubbish. As if I need a pre-packaged class to tell me how to design my application. Also, handily, they introduced Object, GetType, and a bunch of other utterly useless run-time reflection style stuff. If I wanted run-time reflection, I'd go use WPF. Metro is going to be as slow as a dog, consume a bunch of memory, and it's got a crappy API to boot.

The only good thing is that sometimes, if you want an app that's pre-packaged, the designer might hide all the shit from you.

Friday 16 September 2011

Windows 8

Honestly, I'm just not sure what to make of it.

Yay, DirectX 11 and Direct2D now play together properly, and it therefore doesn't suck tremendously to attempt to render text with DirectX 11.
Boo, ugly language extensions. Metro isn't C++/CLI, despite the visual similarities, but I am more than suspicious of all those language extensions. What's wrong with ComPtr<T> instead of T^?
It is deterministic destruction, and it does have nice exceptions & such, but I am still feeling kind of iffy about it. What if I want to use GCC to produce Metro applications?

Man, the new C++0x features in VC11 are a poor lot, aren't they? Basically just some bug fixes. Not very impressive at all. I mean, AMP is pretty impressive. But come on.. some of the easier features to do would have been nice? Like, say, default/deleted functions?

I also am not very impressed with Win8's Metro. OK, if I had a tablet, it would be great. But I don't have a tablet and I'd much rather have the desktop. So please dear Lord, stop booting into tablet view first. Also, the new Start bar is good for fuck all.

Friday 9 September 2011

Science

Here's how science works. Scientist(s) A do a study. They publish it in a journal after being reviewed by Scientist(s) B, C, D, etc. If you wish to reject the conclusions of A's study, then get out your relevant qualifications, and kindly ask a peer-reviewed journal to post your detailed rejection. Or at least make an argument grounded in common sense. Until then, you accept what Scientist(s) A have to say, even if that involves ridiculous things like relativity and space/time, or quantum physics, or that young people have different neurochemistries that enforce different sleep cycles.

Sunday 4 September 2011

Namespaces and types

Firstly, a name change. So far, the language has only had "DeadMG++" as an informal name. However, I've currently decided to refer to it as WideC. Why this is, I've long since forgotten, but I figure it's a better interim name. I've been grappling for some time now with the idea of a relatively simple problem. How does one store constant expressions in a type? Consider a relatively simple C++ class:

class X {
    static const int i = 5;
};
int main() {
    int array[X::i];
}

How does this get replicated in WideC?

In fact, specifically, consider my previous application of a memory allocator. The relative complexity of specifying it was just wrong- because it had to exist in *two* times at once. This means that there's only one real solution- namespaces and types are the same thing. Types represent data that *will* exist. Namespaces represent data that *does* exist. However, conceptually, the two are the same. I already had the beginnings of this idea with accessibility in namespaces, and the follow-up with considering mutable namespaces. Now, I'm going to finalize that with a strict equality- the same idea as Standard.Functional.Function versus a FunctionReference. I also had to solve the infinitely recursive problem of the allocator depending on a hash map- which will depend on the allocator.

I'm thinking of using almost "instant-objects" for this job. That is, namespaces will be automatically default constructed - and they may have construction logic- and they may also have destruction logic. They'll be constructed once from source, and then can be mutated for the "next" phase, and can also be queried as to their contents.

That is, "meta-objects" like the allocator dealie that I posted previously would now be simpler to specify, and in addition, I could get rid of those annoying "compile-time" and "run-time" blocks and simply use deduction.



namespace Standard {
    namespace Memory {
        namespace Allocator {
            // implicitly "private" and "compile-time"
            HashMap(
                key := int,
                value := VariableReference,
                allocator := GCAllocator
            ) Pools;
            VariableReference Default := HeapAllocator; // basically "new"
            GetAllocator(Size)() {
                if (Size > 48)
                    return Default;
                if (Pools.Find(Size) == Pools.End())
                    Pools[Size] = Pool(Size);
                return Pools[Size];
            }
            public {
                Allocate(Size)() {
                    return GetAllocator(Size).Allocate();
                }
                Allocate(T)() {
                    return GetAllocator(T.Size()).Allocate(T);
                }
                Allocate()(Size) {
                    return Default.Allocate(Size);
                }
            }
        }
    }
}

On reflection, I'm not sure that this is actually *better*. Sure, it removes keywords and scoping from the language, but I'm not sure that it's actually better.

Perhaps the fundamental issue here at hand is the N-pass system to begin with. What you're really talking about is a three-pass deal- one to hold the type of the allocator's compile-time stuff, one to hold the type of the allocator's run-time stuff, and one for the run-time itself. Using the namespace might remove the problems of "How do I access the type's compile-time data and call it's compile-time functions through a TypeReference?". Which you can't.

Actually, that's not really true. The data doesn't actually exist at compile-time. It exists in some sort of mid-time. Code-generation time. The issue is that, well, partly, this stuff is bending my mind in ways it was never supposed to go and it's been much too long. And secondly, I never actually intended the language to be able to do that.

I might just opt for the namespace system for now. It's simpler.

Thursday 1 September 2011

Some politics

In the latest United Kingdom General Election, votes per seat breakdown per party:

Conservative: 34980
Labour: 33370
Liberal Democrat: 119944

That is, a Liberal Democrat vote is worth 27.8% of a Labour vote, and 29.2% of a Conservative vote.

If every Lib Dem seat cost as much as a Labour seat, then they would have had 1.9 million votes. Instead, it was 6.8 million votes. That means that 4.9 million people wasted their votes in the UK- a sixth of the turnout.

Friday 26 August 2011

Coursework

It's done. God help me, but it's done. That means that, in theory, 60% of my workload was just ended. Now I only have two examinations to revise for and then I am free as a bird- to sort out my accommodation, health, financial, romantic (lack thereof) etc, issues.

On the plus side, I've been thinking some more about attributes. Firstly, I've decided to eliminate the different filters- now a filter is a filter is a filter. Secondly, I've decided that no matter how hard I try, it's going to be damn difficult to implement attributes on the level of the ones that ship with the language. Volatile is just the easiest example- but there are other language optimization rules, such as "Any pure function may be replaced by its return value", or, "The implementation may automatically parallelise code marked as threadsafe", which would be difficult, if not impossible, to genuinely replicate as user-defined rules unless you wrote literally the entire optimizer and code-generator as a compile-time metaprogram- which would be impossible, because there'd be no way to code generate the code generator. Therefore, I've decided that whilst attributes offer neat semantics and relatively easy extension points to the language, the extensions that can be emplaced there are fundamentally limited. I'll have to come up with something much better if I want people to genuinely be able to write their own Const, Pure, etc.

In addition, I've been thinking some more about what I said earlier about compiling to target the JVM or CLR. Whilst I've decided again that that would be fundamentally impossible, I'm now not so sure about the reverse. As long as the Standard libraries and native interface can be dealt with, I see no reason that it should not be possible to generate "DeadMG++" code from MSIL or JVM bytecode. After all, I view them as more strict subsets of "DeadMG++" and don't see any semantics that I can't replicate. The only problem would be reflection and/or run-time code generation. Now, reflection I can work with, especially with compile-time reflection as a working base. However, I'm not sure about the ability to JIT code on the fly. I had planned to include a quite minimal JIT in the Standard library of "DeadMG++", but the JIT capabilities offered by those platforms are well in advance of what I would have put in otherwise. In addition, I would have to consider a performance loss due to a substantially less mature GC system- although compiling to native code would of course be a significant performance gain.

In addition, there's a significant library problem. Java and .NET ship with significant libraries by default- and users of .NET will expect some of the Windows-only functionality as well. This would effectively lead to me having to implement WPF, WinForms, LINQ, and everything else I can think of. Unless I could disassemble them and convert their existing implementations. That would surely be illegal... right? Certainly cool, though, if it could be done.

The final thing to consider would be interoperability- exactly how code generated from MSIL or JVM bytecode would interact with plain old "DeadMG++" code.

Saturday 20 August 2011

Attributes- again

I've been thinking more about attributes. Fundamentally, some of them, like const, are views on a type, and some are copies. This is easily demonstrated because, quite simply, some of them are binary compatible and some aren't. That is, if you do a const_cast, then that's OK. But, if I had a Debug attribute, then the contents of a Debug std::vector may well be very different to the contents of a non-Debug std::vector, and something like a debug_cast could never exist, because there are no rules at all about what you might choose to change in a Debug std::vector. They may, in fact, be completely different types with little in common- or nearly everything in common.

An excellent example of this would be arrays. I might decide that non-Debug arrays don't check their boundaries but Debug arrays do, in fact, check their boundaries. I might decide that instead of allocating Debug arrays on the native stack, I might allocate them in a separate memory arena so that they can never overflow the stack, and when an error happens, we can always get a stack trace. Hell, I might have "device" as an attribute and allocate a "device" array on the GPU or something similar- and who knows what that might involve?

That is, some attributes are like filters- you can cast them one way or another without breaking at the binary level, like const. However, some attributes are like template specialization- wholly unrelated to the source, at least, from a binary level.

This returns us to the core of the problem: what about primitive types? It's obviously illegal for user code to modify them- and most Standard types would be locked this way, too. (Users also have the option to lock user-defined types for the same effect.) But I'd still want the choice to create a Tested Standard type if I wanted to.

This leads us to the inescapable conclusion that filters which do not add functionality must be valid on locked types. Furthermore, it also leads us to the next conclusion, which is that we can generalize const further to a simple removal filter.

namespace Standard {
    namespace Attributes {
        type Const {
            Type := Standard.Attributes.NonAddingFilter;
            Remove(TypeReference T) {
                // etc
            }
        }
        type Threadsafe {
            Type := Standard.Attributes.AddingFilter;
            Add(TypeReference T) {
                // etc
             }
        }
    }
}

The rules for casting are simple- the language will always implicitly reduce the functions available to the user, and always require an explicit cast to increase them. That is, NonAddingFilters are trivial to add and explicit to remove, AddingFilters are trivial to remove and explicit to add, and Replacements cannot be cast at all. This means that AttributeCast will serve as a clean replacement for const_cast across any filter attributes.
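For comparison, C++'s const already follows exactly these direction rules- implicit in the reducing direction, explicit cast in the increasing direction:

```cpp
#include <cassert>

// const behaves like a NonAddingFilter: adding it happens implicitly at the
// call site, removing it requires spelling out a const_cast.
int get(const int& r) { return r; }   // int& -> const int& is implicit

int& strip(const int& r) {
    return const_cast<int&>(r);       // removal must be explicit
}
```

(Writing through the stripped reference is only defined because the underlying object isn't actually const- the same caveat an AttributeCast would presumably carry.)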

For example, consider writing a webpage. You might decide logically, that an Unvalidated string would contain user input. In this case, the difference between a string and an Unvalidated string is, effectively, nothing- only that a function which writes output to the page will only take a string. The problem is that you want to have an implicit conversion in which you validate the contents. However, this violates the contract of a NonAddingFilter, namely that it doesn't add anything. You don't want to inherit from the String class, because that would produce problems like no virtual destructor, and you really don't want the implicit conversion. This is the ideal place for a Copy attribute. Copy attributes receive a type which has one member variable- the type being copied- and all functions (including constructors) appropriately forwarded. This is achievable in "DeadMG++" and not C++ because in "DeadMG++" you can go back and remove them if you don't want them, or simply start afresh. Then, all you would have to do is add a conversion operator and you're done.

namespace Web {
    type Unvalidated {
        type := Standard.Attributes.Copy;
        Result(TypeReference T) {
            T.Conversions.Add([]() {
                return Validate(this.Internal);
            });
        }
    }
}

Whilst writing up this example, I noted that if the function name is always a string then you can't access conversion operators- since the type might not even have an accessible string. This would be akin to executing arbitrary strings at compile-time. Therefore, the only way to allow a conversion to say, a compile-time type argument, would be to simply decltype() the function. This *also* yields the issue of adding functions which take compile-time arguments. If the type has an existing compile-time component, it would be inaccessible to them- that is, the only way to get compile-time functionality out of a type is to write it as a literal. I'm not too happy about this. I think it's going to be a fundamental flaw of the N-pass system that I have designed.

Secondly, the lvalue/rvalue deal. I think that the logical option is to split rvalue references and perfect forwarding. I will have one type, "Reference", which is almost like a "base class" of references. Const will not receive special treatment. A Reference may be either an lvalue or rvalue reference.

Another topic that deserves consideration is the rules for template deduction. Consider:

template<typename T> void func(T& t);
int main() {
    func(1);
}

In C++ this is an error: T deduces to int, and an int& won't bind to the rvalue. In "DeadMG++", however, I am certainly considering allowing an equivalent snippet to compile. It's my understanding that C++ doesn't allow such things partly for legacy reasons- it would break old code. Since I don't have any old code, I feel no compunction to prevent such things from compiling. This is especially true as in "DeadMG++" I have arbitrary attributes- there would be no way to enumerate them all in overloads.
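For what it's worth, C++0x itself has a deduction mechanism that binds to anything without enumerating qualifiers- T&& with reference collapsing. A quick sketch (binds_lvalue is just an illustrative name):

```cpp
#include <cassert>
#include <type_traits>

// With T&&, T deduces to U& for lvalues (collapsing to U&) and to U for
// rvalues- so one overload accepts const, non-const, lvalue, and rvalue.
template<typename T>
bool binds_lvalue(T&&) {
    return std::is_lvalue_reference<T&&>::value;
}
```

Of course, this only enumerates two reference categories; it doesn't generalize to arbitrary user-defined attributes, which is the bit "DeadMG++" would have to solve itself.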

This still leaves me with the problem of volatile, though. The problem with volatile is not just that it behaves like a NonAddingFilter, which is fine, but it has significant implications for the code generating process- something not expressible within the attribute system.

Finally, I've realized that I've screwed myself just a little bit here- and that is, the order of functionality in a literal could matter, something I wanted to avoid. Inherently, the capability to mutate a type whilst it's still under construction is both dangerous and necessary. Consider a trivial snippet:

type T {
    SomeFunc(T) variable;
    SomeOtherFunc(T) othervariable;
}

Which of these two gains precedence? The function "SomeFunc(T)" could arbitrarily mutate T in any fashion it desires. It could even lock the type. Perhaps I should "lock" the type whilst it's constructing- that is, within the body of "T", "T" would be a const TypeReference, not a TypeReference.


I know that my blog posts are getting a bit infrequent and unreliable. I've got a lot going on right now and can't really afford the distraction of dreams about a better language. In a couple weeks, it'll all be over- probably for worse rather than better, but that's life.

Tuesday 16 August 2011

C++ metaprogramming

It sucks balls. Seriously.

You can't debug it. If the compiler picks the wrong overload, good luck asking it why. You can't break in the middle of a template instantiation, and the compiler is its usual unhelpful self. Half the time it doesn't even specify an error; it'll just say "specialization failed", or "it might be ambiguous, but it also might have no available overloads". Great, compiler. Maybe you could be more specific?

SFINAE prevents deduction. This I never knew, but apparently it actually does. How incredibly unhelpful. You might not want to compile for a reference... so I hope you didn't mind that I made your argument explicit nao.

I also discovered a need to place six bajillion typedefs in my iterator code for no apparent reason or benefit, which had me confused no end. And if you make a typo in your template code (I attempted to use a std::disable_if, which doesn't exist) the compiler gives a syntax error. I don't think so, buddy!

Oh, and then there's the declaration/definition pit of failure.

Not being able to decltype member variables is driving me nuts, since I frequently need to refer to them in decltypes for member function returns in generic code.

Want to buy: DeadMG++.

LINQ in C++

You know, LINQ really is nothing new. It's been supported in the Standard library since there was a Standard. And some trivial proxies/wrappers can prove that. The problem is just that it was butt ugly.


#include <algorithm>
#include <utility>

template<typename T, typename F> class proxy {
    T&& t;
    F&& f;
public:
    template<typename TRef, typename FRef> proxy(TRef&& tref, FRef&& fref)
        : t(std::forward<TRef>(tref)), f(std::forward<FRef>(fref)) {}
    template<typename ret> operator ret() {
        ret retval;
        std::for_each(t.begin(), t.end(), [&](decltype(*t.begin()) ref) {
            retval.emplace_back(f(ref));
        });
        return retval;
    }
};

template<typename T, typename F> auto Select(T&& t, F&& f) -> proxy<T, F> {
    return proxy<T, F>(std::forward<T>(t), std::forward<F>(f));
}

What I really need is, one, a member interface rather than a free function, and two, to alter the lazy evaluation to be per-value and on-demand rather than all at once, if possible. To do this I will implement some custom iterators. Then my domination of the galaxy shall be complete. By that, I mean I will feel at least somewhat better. And less devastatingly sick.
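As a rough sketch of what the per-value version could look like- select_iterator and select_range are hypothetical names, pared down to the bare minimum an explicit loop needs:

```cpp
#include <cassert>
#include <vector>

// Per-value lazy Select: the functor runs on each dereference, not up front.
template<typename It, typename F>
class select_iterator {
    It it;
    F f;
public:
    select_iterator(It it, F f) : it(it), f(f) {}
    auto operator*() const -> decltype(f(*it)) { return f(*it); }
    select_iterator& operator++() { ++it; return *this; }
    bool operator!=(const select_iterator& other) const { return it != other.it; }
};

template<typename Range, typename F>
class select_range {
    Range& r;
    F f;
public:
    select_range(Range& r, F f) : r(r), f(f) {}
    select_iterator<typename Range::iterator, F> begin() {
        return select_iterator<typename Range::iterator, F>(r.begin(), f);
    }
    select_iterator<typename Range::iterator, F> end() {
        return select_iterator<typename Range::iterator, F>(r.end(), f);
    }
};

template<typename Range, typename F>
select_range<Range, F> Select(Range& r, F f) {
    return select_range<Range, F>(r, f);
}
```

A real version would also need the member-interface wrapper and a fuller iterator category, but this is the lazy core.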

Sunday 14 August 2011

Tiobe language index

http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html

Interesting list. The problem is that when you click on the definitions, you see that it's measured by search engine hits. Have more material online- or older material, or wrong material, or just a more generic language name- and you're "more popular". Apparently, the first 100 pages are checked for relevance. Congratulations- out of the 20 billion results for C, I'm sure that the first 2,500 provide a relevant sample.

I've been thinking more about my university work. It's so depressing. Someone please shoot me. Did I memorize the formula for generating a rotation matrix? No, I damn well did not memorize it. I looked it up on Wikipedia or I Googled it or I used the mathematical support library that comes with my API that actually puts shit on my screen. Now, I appreciate that my time has little value and I'm some nobody student who wastes it on games and languages that will likely never take off, but it sure as hell has more value than the time of Google's servers. That's free. Literally, free. For anyone. Even me.

What a total waste of my time.

Saturday 13 August 2011

More grammatical headaches

I've got a very annoying problem with Bison, namely that it's horrendous. The tool needs functions so badly- the DRY violations are quite extreme. How many rules do I have that are

rule2
    : rule1
    | rule2 rule1

or

rule2
   : rule1
   | rule2 ',' rule1

Then we have even more fun, in the form of expression_not_identifier. That is, I need to match separate rules depending on whether the expression is an identifier or not.

It's horrendous. Now, I'm down to a single S/R conflict and it's proving a resistant one. I did successfully eliminate "::" and managed to go back to '.' for all uses.

The problem is, fundamentally, quite simple.

function: foo(x, y) {}
variable: foo(x, y) var;

Now, I *can* convince Bison to delay it by unifying the paths and some other nastiness, but that would violate DRY like politicians violate promises. As such, I'm thinking that a little script, written in Lua- the I-can't-be-bothered language of my choice- ought to handle the DRY problems. Don't you think it's ironic that I'm writing a script in Lua, to generate the input for Bison, to generate the input for the C++ compiler, which will generate a program that can generate code from another input? Meta-meta-meta-meta-meta-programming for the wincakes.

Alternatively, I could write said program in C++.
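In fact, said C++ program is nearly trivial. A sketch of a generator for the two list-rule patterns above- list_rule is a made-up helper, not part of any real tool:

```cpp
#include <sstream>
#include <string>

// Stamp out the standard Bison left-recursive list rule from an element
// rule name and an optional separator token, instead of hand-writing it
// for every list in the grammar.
std::string list_rule(const std::string& name, const std::string& elem,
                      const std::string& sep = "") {
    std::ostringstream out;
    out << name << "\n    : " << elem << "\n    | " << name;
    if (!sep.empty()) out << " " << sep;
    out << " " << elem << "\n    ;\n";
    return out.str();
}
```

Calling `list_rule("arguments", "expression", "','")` emits the comma-separated variant; omit the separator for the plain one.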

Tuesday 9 August 2011

String literals

C++0x introduces many features to do with string literals. Now, user-defined literals, I have yet to consider for inclusion. However, I definitely did reject the extended string literals where you can pick your own delimiter and stuff like that. Why is this?

Firstly, in "DeadMG++" you can gain the string by another route anyway, like loading it from a file, reading it from a console, or even from a GUI, at compile-time, which avoids the escaping problem of C++.

Secondly, I really, really think you're doing it wrong with things like regexes. String literals are not designed to have complex semantics expressed in them. Regular expressions could easily have a more functional API, which would be vastly clearer to read for everyone.

For example:

alphabetical := Regex.Range("a", "z") || Regex.Range("A", "Z");
alphanumeric := alphabetical || Regex.Range("0", "9");
identifier := ("_" || alphabetical) && Regex.ZeroOrMore("_" || alphanumeric);

match := identifier(true)(input); // throws if non-matching, else returns the match
if_matched := identifier(false)(input, match); // doesn't throw; returns bool, putting the match into "match"

They exist Under The Hood™ as FSMs anyway- this is just clearer. Moreover, this API doesn't violate DRY- whereas if you write [a-zA-Z_][a-zA-Z0-9_]*, you've specified the same alphabetical-or-underscore class twice.
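To show the combinator idea isn't vapourware, here's a cut-down C++ approximation- character classes as predicates, with helper functions standing in for || (Range, Either, Char, and MatchIdentifier are all illustrative names; a real implementation would compile to an FSM rather than test predicates per character):

```cpp
#include <functional>
#include <string>

// Character classes as composable predicates.
typedef std::function<bool(char)> CharClass;

CharClass Range(char lo, char hi) {
    return [=](char c) { return lo <= c && c <= hi; };
}
CharClass Either(CharClass a, CharClass b) {
    return [=](char c) { return a(c) || b(c); };
}
CharClass Char(char x) {
    return [=](char c) { return c == x; };
}

// first-character class, then zero-or-more of the rest class-
// i.e. the identifier rule from the post, without a single string literal.
bool MatchIdentifier(const std::string& s) {
    CharClass alpha = Either(Range('a', 'z'), Range('A', 'Z'));
    CharClass first = Either(Char('_'), alpha);
    CharClass rest  = Either(first, Range('0', '9'));
    if (s.empty() || !first(s[0])) return false;
    for (std::size_t i = 1; i < s.size(); ++i)
        if (!rest(s[i])) return false;
    return true;
}
```

Note how `first` is defined once and reused inside `rest`- the DRY property the string syntax lacks.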

Of course, the above aren't especially internationalizable, which is a definite flaw. I'd have to write in Letter, Number etc as Standard.

Another thing that I've been re-evaluating is the effectiveness of compile-time/runtime deduction. I've been indecisive about how effective this can really be, but I've decided that ultimately, I should give it a shot. Partly because I feel that most code won't need to operate at both times quite like the allocator does. I am thinking right now of a kind of "supported" induction.

// usings
type AllocatorType {
    // Default is private non-static non-virtual runtime
    compiletime {
        HashMap(int, VariableReference) allocators;
        Default := HeapAllocator; // "malloc", essentially
        GetAllocator(Size) {
            if (Size > 52) // arbitrary for now
                return Default;
            if (allocators.Find(Size) == allocators.End())
                allocators[Size] = PoolAllocator(Size);
            return allocators[Size];
        }
    }
    public { // runtime default
        Allocate(Type)() {
            VariableReference Allocator := GetAllocator(Type.Size()); // calls a compile-time function
            return Allocator.Allocate(Type); // calls a run-time function- and returns
        }
        Allocate(Size)() {
            VariableReference Allocator := GetAllocator(Size);
            return Allocator.Allocate(Size);
        }
        // stuff
    }
}

Having the grammar much closer to completion is really allowing me to get a better feel for coding like this. How about D's invariants? Now, the API for editing functions, I'm not too sure of right now, but worst comes to worst, I'll just directly replace the function. Right now, I think that adding, as the first statement, an unnamed local variable whose constructor and destructor both call the invariant function ought to do the trick.

// usings
Invariant(Type, Function) {
    invariant_type := type {
        Type.Pointer() This;
        type(this) {
            This = this;
            Function(This);
        }
        ~type() {
            Function(This);
        }
    };
    ForEach(Type.MemberFunction, [&](MemberFunc) {
         this := MemberFunc.Arguments["this"];
         MemberFunc.Statements.PushFront(Statement.ConstructedVariableDefinition("", invariant_type, this));
    });
    return Type;
}
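The same trick is expressible in current C++ with an RAII guard that runs the invariant on entry and exit of a member function- a sketch, with InvariantGuard and Account as illustrative names:

```cpp
#include <cassert>
#include <functional>

// The "unnamed local variable" from the post: its constructor checks the
// invariant on function entry, its destructor checks it again on exit.
struct InvariantGuard {
    std::function<void()> check;
    explicit InvariantGuard(std::function<void()> f) : check(f) { check(); }
    ~InvariantGuard() { check(); }
};

class Account {
    int balance;
    void invariant() const { assert(balance >= 0); }
public:
    Account() : balance(0) {}
    void deposit(int n) {
        InvariantGuard guard([this] { invariant(); }); // checked before and after
        balance += n;
    }
    int get() const { return balance; }
};
```

The difference, of course, is that here every member function must opt in by hand- the metaprogrammed Invariant() pushes the guard into every function for you.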


This brings me back to a slight grammatical change. Previously, type definitions and type literals were different things. Now, I've decided to "promote" type definitions to full expressions. This means that you can do something like

type MyType {
    // .... some stuff here
}.Invariant([](this) {
    assert(i != 0); // semantics of "assert" TBD
});


Now, all I might have to do is create a simple "action" function which will take expressions or functions and execute them immediately...

type MyType {
    int i = 0;
public {
    SetI(value) { i = value; }
    GetI() { return i; }
}
}.Actions([&] {
    MyType somevar;
    assert(somevar.GetI() == 0);
    assert(somevar.GetI() == somevar.i);
    somevar.SetI(5);
    assert(somevar.GetI() == 5);
});



Gosh, it's like, a unit test or something! Not that I have ever written a single unit test in my entire life. But here's to hoping that arbitrary compile-time per-type actions can be used to implement unit tests.

Poison

Holy shit, I feel like I have been poisoned. This sucks, tremendously.

Oh, no useful post for today, unless I think of something later.

Monday 8 August 2011

Blocks

One thing I'm looking at is "block" syntax. The existing modifiers like public, virtual, static, compile/runtime would be condensed into a "block definition", which would define attributes for all following definitions- and in addition, I might want to add an "attribute" keyword to that.

type t {
public static runtime attribute(Const) {
    ohai() {}
}
public virtual compiletime {
    some_func() {}
}
}

This would significantly reduce the syntactic overhead of static, compiletime definitions, as well as of any class where many functions/variables share the same basic attributes. Once I add this (and cut the old stuff), I need to add something to distinguish functions, and I need to knock out lambda expressions and unnamed type literals- these are already parsed, but have no associated AST, and the parser takes no action upon finding them. I'll probably also discover a few more corner cases where this is true. I also need to cut the ability to put access specifiers at global scope, because that's just wrong, and look again at saving the use of . as a compile-time access notation.

After that, run a few test harnesses on the parser and make sure that it works as advertised, then I'm on to semantic analysis.

The first step of semantic analysis is going to be to find all structures that *must* exist. By that, I mean all functions, all namespaces, and all types which are defined in-place and not metaprogrammed into existence, and add them to the symbol table. Then start from Main() and determine the compile-time program that will yield the program of the next phase- repeat until we get to Main()'s runtime, which will be the final phase.

Once that's done, generate the code, execute it, repeatedly, and then we're done diddly done bun.

I keep having something I intend to post, and I always forget what it is after I remember it. :( In addition, I'm very sick (again!) right now, so expect fewer postings whilst I scream in pain.

Sunday 7 August 2011

More types and attributes

I've been thinking some more about different attributes, namely, how they interact with primitive types. Previously, I had conceptually viewed attributes as a kind of "view" on to a type. Now, however, I'm re-considering that PoV. For example, if the Const attribute didn't ship with the language- then how could I define how Const interacts with primitives? If I define a user-defined Portable attribute or Tested attribute, then I might obviously say that every primitive and Standard function is Portable and Tested, even if not Tested by me, but I can't actually do that because they're immutable.

So, attributes are going to have to be copies- that is, when you have a type T, then a Const T is a completely separate type, and if T was locked, then Const T is not locked and can be modified arbitrarily. This would mean that when you have a Portable attribute, then when you talk about a Portable int, it's a copy and you can do what you wish with it- even though it's obviously illegal to talk about modifying int itself. This should also make life easier in terms of caching and parallelising attributes, as only one copy need exist at any one time.

I've also decided to go with the .NET/Java list of numerical sizes- byte is 8-bit, short/char are 16-bit, int is 32-bit, long is 64-bit, all signed by default. I'm not going to have the variable-size primitives mess, and definitely not the char-is-it-signed-or-not mess.

Specification: I will define all of the primitive types as user-defined types. The "DeadMG++" Standard will not distinguish between UDTs and primitives in any way. A primitive is just a class with hardware support. If I find any property or behaviour of primitives that cannot be specified as UDTs, then I have created my language wrong. Except maybe literals, I might forgive that.

Some more grammar

With the pending completion (finally!) of the parser, I've been playing with writing more and more DeadMG++, not just for fun (although it is!) but also to stress the parser and lexer, and make sure that the lexer produces all the tokens it should, and that sort of thing. Also, gaining some experience here is rapidly looking like it's going to be important. Have a look at the following code:


type StandardAllocatorType {
    HashMap(int, VariableReference) Pools;
    VariableReference Default := HeapAllocator;
    GetAllocator()(Size) {
        if (Size > 48)
            return Default;
        if (Pools.Find(Size) == Pools.End())
            Pools[Size] = Pool(Size);
        return Pools[Size];
    }
public:

    Allocate(Type, TypeReference... Types)(Types.RvalueReference()... arguments) {
        compiletime {
            Allocator := GetAllocator(Type.Size());
            runtime {
                return Allocator.Allocate(Type, forward(arguments)...);
            }
        }
    }
    Allocate(Size)() {
        compiletime {
            Allocator := GetPoolAllocator(Size);
            runtime {
                return Allocator.Allocate(Size);
            }
        }
    }
    Allocate()(Size) {
        runtime {
            return Default.Allocate(Size);
        }
    }
}



Firstly, I know that I haven't spoken about VariableReference. It performs a job similar to the imaginatively similarly named FunctionReference and TypeReference- it refers to a variable which exists in the next pass, i.e., a static variable, where I have created one of type "HeapAllocator", whatever that is (although in the logic at hand, that realistically means malloc, give or take).

The main problem here is that it's not explicit about when the functions, or variables, exist- namely, the variable Pools gives no indication of when it exists, or of what disaster you might create by attempting to refer to it at run-time. I will have to alter the compiletime/runtime blocks to include functions and variables in types, too. Something like

type StandardAllocatorType {
    compiletime {
        HashMap(int, VariableReference) Pools;
        VariableReference Default := HeapAllocator;
        GetAllocator()(Size) {
            if (Size > 48)
                return Default;
            if (Pools.Find(Size) == Pools.End())
                Pools[Size] = Pool(Size);
            return Pools[Size];
        }
    }
public:
    Allocate(Type, TypeReference... Types)(Types.RvalueReference()... arguments) {
        compiletime {
            Allocator := GetAllocator(Type.Size());
            runtime {
                return Allocator.Allocate(Type, forward(arguments)...);
            }
        }
    }
    Allocate(Size)() {
        compiletime {
            Allocator := GetPoolAllocator(Size);
            runtime {
                return Allocator.Allocate(Size);
            }
        }
    }
    Allocate()(Size) {
        return Default.Allocate(Size);
    }
}


It's a bit of a mess. This one object exists at both compile-time and run-time, and I imagined that most objects would not. Normally, it wouldn't be necessary, but here we have to have one object of one type. Unless I added the ability to execute on namespaces too. Then the object could be split into its two logical parts- run-time allocation, and compile-time type/size tracking- and I might be able to ditch this whole compiletime {} runtime {} thing. Might. That would be swell.

It's also worth noting that I had to sacrifice using the dot notation for namespaces and types as well, and had to revert to the old C++ double colon for that, but I did manage to scrap the var keyword, which I was not a fan of at all. If I do add the ability to manipulate namespaces- which I guess is just logical after functions, types, expressions, and variables- then perhaps I can get back the dot.

Thursday 4 August 2011

Axioms

Of course, in "DeadMG++" then since you can deal with Expressions as raw semantic types, you can optimize them however you like. That means that we can introduce what's known as Axioms in the C++ world- as usual, a library-only feature.

Axioms are a simple concept. Consider

a = a + b; // 1
a += b; // 2

For most types, you would argue that (assuming the expression a is pure) the two are identical, and an optimizing compiler might choose to change 1 into 2. Of course, for the remainder of types this would change the correctness of the program. Axioms are a system for telling "most" from "the remainder", allowing you to implement, for UDTs, very specific optimizations which in C++ would be reserved for primitive types alone. Pure helps with this at least a little, allowing us to apply CSE.

The main question is how would you design and specify Axioms?

At the most fundamental level, it would be a pair of expressions- transform one into the other for some benefit.

Axiom(
    Original := expr1 = expr1 + expr2,
    Preferred := expr1 += expr2
);

Axiom(
    Original := (expr1++).static_cast(void), // return value is unused
    Preferred := ++expr1
);

The question is, is that enough to implement all the Axioms that people might want, and is it implementable in such a relatively simple state?
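As a sanity check that the pair-of-expressions formulation is implementable, here's a toy C++ version of the first axiom as a tree rewrite- Expr, Var, Node, and ApplyAxiom are all made up for the example:

```cpp
#include <memory>
#include <string>

// A deliberately minimal AST: "var" leaves, and "=", "+", "+=" nodes.
struct Expr {
    std::string op;                // "var", "=", "+", "+="
    std::string name;              // only for "var"
    std::shared_ptr<Expr> lhs, rhs;
};

std::shared_ptr<Expr> Var(const std::string& n) {
    auto e = std::make_shared<Expr>(); e->op = "var"; e->name = n; return e;
}
std::shared_ptr<Expr> Node(const std::string& op,
                           std::shared_ptr<Expr> l, std::shared_ptr<Expr> r) {
    auto e = std::make_shared<Expr>(); e->op = op; e->lhs = l; e->rhs = r; return e;
}

// The axiom "a = a + b  =>  a += b": fires only when the assignment target
// and the addition's left operand are the same variable.
std::shared_ptr<Expr> ApplyAxiom(std::shared_ptr<Expr> e) {
    if (e->op == "=" && e->rhs && e->rhs->op == "+" &&
        e->lhs->op == "var" && e->rhs->lhs->op == "var" &&
        e->lhs->name == e->rhs->lhs->name)
        return Node("+=", e->lhs, e->rhs->rhs);
    return e;
}
```

A real engine would need purity checks on the target expression before firing, exactly as the post notes- this sketch only handles the structural match.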

References

I've got a slight referential problem.

In C++, you can write code that takes a const reference, and not care whether or not it's an lvalue or rvalue. But at the moment in DeadMG++, then there's no such functionality, because const references aren't given special leeway anymore and won't bind to rvalues. The thing is that, well, this is pretty much unique to constness and not helpful in the general attribute sense.

Secondly, I've got growing concerns about my design- namely, that I have no idea when it'll be finished. Or, to put it another way, I have no way to tell when it has no massive, gaping holes in it. It would be a bit shit to find out that for some reason I didn't foresee the whole thing just doesn't work.

Thirdly, I'm beginning to consider another problem, which is popularity. I mean, I wouldn't even know where to begin to promote a new language, I've never even finished an open-source project.

On the upside, I grabbed a new version of Bison that's massively easier to work with as it allows complex C++ types to be written in. I also added friend expressions and friend accessibility levels and fixed the semicolons problem where you had to do if (expression;) and if( variable = expression;). I also changed constructors/destructors to be more like D's syntax- using this rather than the type name. How else could you call the destructor of an anonymous type? Especially something where the type may be complex to express.

Wednesday 3 August 2011

Function overloading and more variables

Right. I can't ditch function literals, because overloading will be a total mess. What, do I introduce "variable overloading" or something to support it? I will simply have to make them self-referential, to avoid the "this" problem. That is,


using Standard;
using Attributes;
Factorial(i) {
    Factorial.Attributes(Pure, Threadsafe);
    if (i > 1)
        return i * Factorial(i - 1);
    else
        return i;
}


Now that yields an inconsistent interface to member functions. I could, of course, just remove "this". It's not like having to explicitly add "this" to the list will make life a lot harder for anyone, and it will make them much easier to deal with. And, of course, should anyone desire, they can always refer to the type name.


using Standard;
using Attributes;
type MahType {
    Func(MahType.Attributes(Const).Reference() this) {
        Func.Attributes(Pure, Threadsafe);
        return this.i;
    }
    int i = 0;
}


I've also seen a good idea in D, where instead of using the type's name as a destructor syntax, they just used "this". Now that's a good idea, because the type could be ... fun ... to express in metaprogrammed types. Now, I already wanted to have extension methods, so the additional uses of "this" won't be too bad to specify, and since it can't legally appear outside of a function definition, then it'd be just fine.

Now, let's talk briefly about extension methods. I fully plan on eliminating the free/member issue in C++, not least because you can add and remove member methods at any time and make member methods into free functions, and suchlike. However, in "DeadMG++" then "this" need not be the first argument. It's a named parameter, you see :) Of course, defining operator overloads in a type literal is not currently something the grammar supports, although I will have to remember to add that in.

type MahType {
    int i = 0;
    operator>>(Input, this) {
        Input >> i;
        return Input;
    }
}


Now, about those attributes...

using Standard;
using IO;
type MahType {
    int i = 0;
    // Hmmm.. without "const" given special treatment,
    // how could you write a function that doesn't care whether it binds to lvalue
    // or rvalue or the attributes, but as a reference to a MahType?
    operator+(auto.RvalueReference() this, auto.RvalueReference() Other) {
        // Acceptable?
        MahType.MemberFunctions["operator+"].Attributes(Pure, Threadsafe);
        return i + Other.i;
    }
}


I've decided that, well, I totally forgot that now regular objects are just fine to exist at compile-time and it's easy enough to define them as objects. This, along with restoring type and function literals, should solve the problem well enough.

using Standard;
using Concurrency;
using Containers;
type MahClass {
    Lock l;
    HashMap(TypeReference, TypeReference) Memoize;
    operator()(this, TypeReference t) {
        MahClass.MemberFunctions["operator()"].Attributes(Pure, Threadsafe);
        ScopedLock temp(l);
        if (Memoize.Find(t) != Memoize.End())
            return Memoize[t];
        return Memoize[t] = type {
        };
    }
}
namespace Standard {
    namespace Containers {
        MahClass.Attributes(Const) Vector;
    }
}



Finally, I've decided that there could be need for something I've thought about previously- explicit compile-time and run-time delineation. For example, it's easy for me to say that, since a FunctionReference exists as a bunch of data structures in this pass, and an executable function in the next pass, therefore if you call "operator()" then it's got to be at run-time. However, it becomes significantly harder when I consider how you might actually implement that. I mean, in the compiler, sure, but what would happen if you actually had a library that might operate that way?

Tuesday 2 August 2011

Memory allocators and metaprogramming

When making dynamic allocations, there are two main kinds- statically sized and dynamically sized. By that, what I mean is that if you do something like

using Standard;
using Containers;
using Memory;
String.Pointer() sptr = Allocate(String);


In this case, you are actually dynamically allocating a fixed size- sizeof(String). This is begging to be metaprogrammed into an object-pool allocation. In fact, the Allocate function knows every type, and every statically sized allocation, you ever passed to it. This means that common allocation sizes can be statically redirected to an object pool that fits them. More than that, this redirection can occur at compile-time, not run-time, eliminating any run-time cost for it.
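A C++ sketch of the end result- every statically sized allocation funnels through a per-size pool selected at compile time via a template argument (Pool, PoolForSize, and Allocate are illustrative names; a real pool would allocate in slabs rather than malloc per object):

```cpp
#include <cstdlib>
#include <vector>

// A trivial fixed-size free-list pool.
template<std::size_t Size>
class Pool {
    std::vector<void*> free_;
public:
    void* allocate() {
        if (free_.empty()) return std::malloc(Size);
        void* p = free_.back();
        free_.pop_back();
        return p;
    }
    void deallocate(void* p) { free_.push_back(p); }
};

// One pool per distinct allocation size, resolved at compile time-
// the "re-direction" happens in template instantiation, not at run-time.
template<std::size_t N>
Pool<N>& PoolForSize() {
    static Pool<N> pool;
    return pool;
}

template<typename T>
T* Allocate() { return static_cast<T*>(PoolForSize<sizeof(T)>().allocate()); }

template<typename T>
void Deallocate(T* p) { PoolForSize<sizeof(T)>().deallocate(p); }
```

Two types of the same size share a pool here, which is exactly the size-based bucketing the post describes.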

And, of course, a realloc-style function will be part of the "DeadMG++" Standard allocator interface. After all, the worst you can do is just "return allocate()" if you can't or don't want to implement it.

Another trick that I will employ is the extensive use of custom allocators. The C++ Standard does not really ship any custom allocators, to my knowledge. In "DeadMG++", I will ship with custom allocators. The most obvious to use would be an object pool. In addition, I'm considering a thread-local allocator, and one of my first blog posts was about various memory pool or arena schemes which I could also ship. In addition to the GC, of course. The PPL also provides a special memory allocator. The user's life with a custom allocator is also easier because we provide a slicker interface to them, thanks to named parameters.

Monday 1 August 2011

Garbage Collection

I reckon that it might actually be feasible, with a couple of nice tricks that should enable a more performant implementation.

Firstly, GC references will be strongly typed, as opposed to non-owning pointers. They will have the type Standard.GC.Reference(T). This will instantly remove the need to follow pointers which won't lead to garbage-collected references, which will hopefully be, say, most of them. If you point to a type which, directly or indirectly, holds no GC references, then we don't need to follow that pointer. In addition, this may well allow compacting GC implementations, because it's instantly obvious which references are GC and which aren't.

Secondly, the GC will follow raw pointers that may point to objects which may hold GC references. This leads to the direct conclusion that it is undefined behaviour, in the general case, to hold a non-null pointer that does not point to a valid object, because the GC might follow it if the link eventually leads to a GC reference. This may force changes in iteration idioms- for example, one-past-the-end is now undefined in the general case, unless (as is the case with node-based containers) the end explicitly exists as a node. However, making the end be the last valid object instead would remove the need for an explicit end node, which may yield a performance improvement anyway.

Thirdly, objects can override their GC behaviour. This means that for a type such as Vector, the default behaviour would have to be overridden so that the GC can reach any references to any objects contained within, because the compile-time reflection used to generate the default behaviour will not register the Vector's contained types. Node-based algorithms like lists and maps probably won't have that problem.

Now, I've been considering stack scanning. Native C++ already includes a mechanism for finding which objects are on the stack- exception handling. It's the same principle; we're just interested in (somewhat) different objects. That is, a stack scan would use the same algorithm as throwing an exception and then not catching it until main(). That necessarily makes it quite an expensive operation- but not impossibly so- and it may motivate a change in exception handling mechanisms: after all, throwing an exception should be a rare event, whereas GC collections happen frequently. Now, I'm not that familiar with the hardware stack- it's not normally the kind of place I program- but as far as I know, scanning it should be as simple as checking the return address to know where we are, and adjusting the stack pointer appropriately to find the variables we want. Oh, and don't forget the registers, too.

Next time, I will write about general-purpose memory allocation performance and how metaprogramming can make your program run faster.

Sunday 31 July 2011

FunctionReferences

In a recent post, I discussed plans for my Const attribute system. However, I'm not completely sure how this is going to work. Function attributes and type attributes will likely have to be different things, so let's pretend that I named the previous Attribute type something type-specific, like TypeAttribute, and get rolling. The first problem is that the syntax doesn't support attributes at all. I could use "this" as a keyword, but that would conflict with the object "this" for member functions. I might have to flat-out cut function definitions as they are in C++ and replace them entirely with lambda expressions.

FunctionReference f([&]() {
    f.Attributes(Pure, Threadsafe);
    return []() {
    };
});


Alternatively, I had plans for "this" to be just another argument... maybe I should cut "this" as a reference to the object, and set it to be a reference to the function instead? Another problem is member functions- again. I can't use this syntax for them, because that would create a run-time FunctionReference, which is not what I'm looking for. I would basically have to cut type and function literals and go straight to lambdas, with types as just another container. In that case, how does one create Types, if not by literal? Default-construct a TypeReference? Maybe call a function?

Another problem is going to be dealing with types that have attributes. For example, FunctionReference and TypeReference. You can add attributes to them- but what if I want the actual TypeReference itself to be Const instead of a TypeReference to a Const type? Maybe decltype() can solve that problem.

decltype(FunctionReference).Attributes(Const) f([&]() {
    f.Attributes(Pure, Threadsafe);
    return []() {
    };
});


decltype(TypeReference).Attributes(Const) MyClass([&]() {
    TypeReference T(TypeReference.New());
    T.PublicMemberVariables.Insert("ohai", Standard.Containers.String);
    return T;
});


Owch. You know, I think a little method chaining could alleviate that problem.

using Standard;
using Compiler;
using Containers;
using Attributes;

TypeReference MyClass([&]() {
    var x = TypeReference.New()
        .PublicMemberVariables.Insert("name", String)
        .PublicMemberVariables.Insert("type", TypeReference);
    x.OnChange(

        event = any,
        action = [](arg) {
            throw Standard.Exceptions.Error("Attempted to modify MyClass!");

        }
    );
    return x;
});



It's not that bad. I can work on it. Maybe I could stick with "type" as the type literal. Now on to the decidedly non-trivial problem: How to define a FunctionAttribute. I suspect that fundamentally, we will ask mostly the same questions as of a TypeAttribute- what happens when we add it, and what happens when we take it away.

decltype(FunctionAttribute).Attributes(Const) Pure([&] {
    Pure.OnRemove = [&](Function) {
        // remove trivially
        // remove the hook we added
    };
    Pure.OnAdd = [&](Function) {
        // Verify that the function is actually pure by checking that its
        // expression contents only call other pure functions
        // Make sure that nobody changes it behind our back

        Function.OnChange(
            Event := change, // maybe only one event is possible with Functions?
            Action := [&](Function) {
                // Verify pure, or throw
            }
        );
    };
});


ThreadSafe is going to be another kind of problem. I could simply propagate it like const or pure, but fundamentally, a single function cannot be proved threadsafe in isolation. I guess I could quite explicitly prove that it doesn't have some kinds of threading bugs, maybe.

Saturday 30 July 2011

Friend functions

So how are we going to improve "friend" for "DeadMG++"?

Well, firstly, we are going to add a "friend" accessibility level, and I'm thinking about making it so that you can make multiple friend levels. I think that this will make "friend" more controllable.

Secondly, I'm going to introduce friendship transferring.

Have you ever tried to friend a Standard function so that it could use your functionality? It probably gave an error anyway, because when you call a Standard C++ function, there's no guarantee that that function actually does the work. Maybe it delegates to some other function that does it. In this case, friending violates the abstraction: even though you friended the function that was supposed to do the work, it still fails, because you would have to friend the internal helper that actually needs the access- and you can't name it. I had this problem with private constructors and make_shared in Visual Studio.
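The make_shared case is easy to reproduce: friending std::make_shared doesn't reliably work, because the standard never promises that make_shared itself calls your constructor- it may delegate to an internal helper, as Visual Studio's does. A common C++ workaround (a sketch, not the proposal above) is a factory whose local type can reach the private constructor:

```cpp
#include <memory>

class Widget {
    explicit Widget(int v) : value(v) {}
public:
    int value;

    // A local class inside a member function inherits that function's
    // access rights, so Enabler may call the private constructor, and
    // make_shared only ever needs Enabler's public one.
    static std::shared_ptr<Widget> create(int v) {
        struct Enabler : Widget {
            explicit Enabler(int v) : Widget(v) {}
        };
        return std::make_shared<Enabler>(v);
    }
};
```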

This is what friendship transfers are for. First, you can specify individual expressions or individual objects to be friended, rather than necessarily whole classes or functions. Second, you can "pass on" your access level by specifying a function call or object as a friend- even if you're a free function. This means that if you use helper functions to do your real work, you can give them your access, hiding their existence from anyone who friended you to get work done.

This should allow more fine-grained and more abstract use of friend than exists in C++.

Variable initializers and global variables

See, I've been musing over global variables. The problem is that in "DeadMG++", it's very difficult to get by *without* global variables, given that things like types and functions used to be compile-time constants and now they're not. One of the most fundamental problems I've got when dealing with them is initialization. For example, my post on Const's definition the other day actually violates normal "DeadMG++" rules- that code has to go in a function. Now, we can define a lambda that executes itself inline to express arbitrarily complex initializations. For example,

using Standard;
using Containers;
using Attributes;
Map(String, String).Attribute(Const) x = [&] {
    Map(String, String) retval;
    retval["ohai"] = "nub";
    return retval;
}();


This is a trick I've used in C++- it's better than initializer lists, in my opinion. The problem is that initializing some complex types depends on referring to them- similar to using this in the initializer list. For example, my previous definition of Const depends on recursively applying Const to its member types. This is illegal in C++, but I've decided that it will have to be legalized in "DeadMG++". Referring to a variable in its initializer will be legal, and it will be legal to take a pointer to it; however, actually accessing any part of it will be Bad™ unless it has been explicitly initialized first.

using Standard;
using Containers;
using Attributes;
Map(String, String).Attribute(Const) x = [&] {
    x = Map(String, String)(); 
    x["ohai"] = "nub";
}();

Alternatively, a sort of lazy evaluation could become a common idiom. Perhaps most types- most Standard types, at least- will offer a construction option that takes a function object which initializes them after default-initialization.

using Standard;
using Containers;
using Attributes;
Map(String, String).Attribute(Const) x([&] {
    x["ohai"] = "nub";
});

Of course, this is pretty handy in C++ too. In fact, some of this lazy expression evaluation can be performed in C++, if you have a library that supports it, in combination with the new lambda functions. I think I'd rather have this than init lists.
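For reference, the C++11 spelling of the first trick- a const map initialized by an immediately invoked lambda- looks like this:

```cpp
#include <map>
#include <string>

// The lambda builds the map, runs once, and the result is const ever after.
const std::map<std::string, std::string> x = [] {
    std::map<std::string, std::string> retval;
    retval["ohai"] = "nub";
    return retval;
}();
```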

Thursday 28 July 2011

Const- a library feature

The question here is quite simple. How do you implement const as a library feature, allowing users to introduce arbitrary attributes?

The solution here is TypeReference and its parallel, FunctionReference. Const is really a type thing, but pure and threadsafe are function things (maybe threadsafe is an object thing too?). I already said that types were references, and now it becomes explicit with TypeReference. Type references will not just be pointers: they will also store attributes and, critically, function hooks. When you access data about a type through a type reference, the results will be influenced by the hooks. In the case of, for example, member variables, they will become const recursively. For this, we will need a structure to define an attribute.

We will need to know what it means to add const to a type reference. We will need to know what it means to take it away- both in reference versions, as well. Some attributes are trivial to add- e.g., const. Some attributes will be trivial to take away- e.g., threadsafe.

type Attribute = type {
   Standard.Functional.Function(void, TypeReference) OnAdd;
   Standard.Functional.Function(void, TypeReference) OnAddReference;

   Standard.Functional.Function(void, TypeReference) OnRemove;
   Standard.Functional.Function(void, TypeReference) OnRemoveReference;

};

For const, some of these definitions will be relatively trivial.

using Standard;
using Compiler;
using Attribute;
using Algorithms;
var Attribute Standard.Attributes.Const;
Const = [&] {
    var Attribute attribute;
    

    attribute.OnAdd = [&](type) {
        ForEach(type.member_variables, [&](member_variable) {
            member_variable.Type.AddAttribute(Const);
        });
        var mem_functions_copy := type->MemberFunctions;
        type.MemberFunctions.Clear();
        ForEach(mem_functions_copy, [&](member_function) {
            if (member_function.arguments["this"].HasAttribute(Const)) {
                type.MemberFunctions.Insert(member_function);
            }
        });
        type->OnChange(
            event = MemberFunctionAdd,
            action = [&](member_function) {
                if (!member_function.arguments["this"].HasAttribute(Const)) {
                    type.MemberFunctions.Remove(member_function);
                }
            }
        );

        type->OnChange(
            event = VariableAdd,
            action = [&](variable) {
                variable.AddAttribute(Const);
            }
        );
    };
    

    attribute.OnAddReference = [&](type) {
        attribute.OnAdd(type.Unref());
    };



    attribute.OnRemoveReference = [&](type) {
        throw Standard.Compiler.BuildError("Error: Cannot bind non-const reference to const object.");
    };
    attribute.OnRemove = [&](type) {
        type.MemberFunctions = type->MemberFunctions;
        type.MemberVariables = type->MemberVariables;
        // Also remove the OnChange functions
        // Not too sure how to make that happen :(
    };
    return attribute; 
};


So what does all this mean? It means that when you add const to a TypeReference, the type becomes const, and all the member variables of that type become const, and all member functions are removed except those where "this" is marked as const. It means that you can trivially add const to references, and that when you do so, the referred-to type becomes const. It means that if you attempt to instantiate a non-const reference to a const object, then the compiler will cry like a little girl. It means that if you add a member function or variable to a type, then the references will update themselves accordingly.

This is pretty much it for the definition of const and how const behaves. However, we still haven't defined what const *means*. Right now, we could use "const" to mean anything; it would be more aptly named TriviallyAddableNonReferenceRemovableDiscriminating. The meaning comes from which members we choose to mark as const. Obviously, the primitive types will follow the conventions. Other specifiers, like "pure", could come with different costs or accesses.

Wednesday 27 July 2011

Valarray

Have a look at this simple SO question:

http://stackoverflow.com/questions/6850807/why-is-valarray-so-slow/6851501#6851501

This raises an equally simple question about "DeadMG++": what the hell to do about expression templates? In this case, an expression template can more than double the performance of the code. Now, in "DeadMG++" I already described a feature where you can take parsed functions as ASTs and create your own semantic interpretation of them. This could yield code like


c = [&]() { return a * b; };

Where c has an operator= that will take a function, of type .. some type which I have yet to nail down .. which has access to the whole expression tree. In this case, the writer has the opportunity to perform whatever semantics he wants. But what I might do is extend this to be individual expressions too. That would mean that you could write


c = a * b;


and c would be equatable to an expression, and can have whatever fun it wants optimizing the hell out of that expression, including CSE and all the rest, or whatever it decides should be efficient. Of course, I'll need to actually play with both of these systems before deciding what to do about them. Valarray itself, of course, I'll likely not have.
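For context, here is a minimal C++ sketch of the expression-template technique the linked answer is about (Vec and Mul are illustrative types of my own): a * b builds a lazy node, and assignment evaluates element-wise in a single pass, with no temporary vector.

```cpp
#include <cstddef>
#include <vector>

struct Vec;

// Lazy node: holds references to its operands and computes on demand.
template <typename L, typename R>
struct Mul {
    const L& l;
    const R& r;
    double operator[](std::size_t i) const { return l[i] * r[i]; }
    std::size_t size() const { return l.size(); }
};

struct Vec {
    std::vector<double> data;
    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assigning from any expression evaluates it in one loop.
    template <typename E>
    Vec& operator=(const E& e) {
        data.resize(e.size());
        for (std::size_t i = 0; i < e.size(); ++i)
            data[i] = e[i];
        return *this;
    }
};

// a * b does no arithmetic yet; it just records the operands.
Mul<Vec, Vec> operator*(const Vec& l, const Vec& r) { return {l, r}; }
```

A full version would also overload operator* for nested Mul nodes, so that longer expressions like a * b * c stay lazy too.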

Now, I've already conceived of simplistic systems which could provide optimizations in more common cases. For example, take operator overloads. In C++, each operator overload can only deal with two operands at a time. However, I considered the possibility that an overload could actually consume more than one. For example, suppose that the addition operator of std::string could take any number of arguments. That would mean that where


std::string a;
a = a + a + a + a + a;


in C++ this would become

a = ((((a + a) + a) + a) + a;


or, more directly,

a.operator=(a.operator+(a).operator+(a).operator+(a).operator+(a));


However, in the above scenario in DeadMG++, this would actually become something more like

a.operator=(a.operator+(a, a, a, a));
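The performance benefit of the n-ary form can be approximated in C++17 today with a variadic helper (concat is a hypothetical name, not a std function) that reserves the final size once, instead of allocating at every +:

```cpp
#include <string>

// Concatenate any number of strings with a single reservation, which is
// roughly what an n-ary operator+ overload could generate automatically.
template <typename... Args>
std::string concat(const Args&... args) {
    std::string result;
    result.reserve((args.size() + ... + 0));  // total length, one allocation
    (result.append(args), ...);               // append each operand in turn
    return result;
}
```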


The problem is that tasks like CSE can be repetitive, and coding them in every type would be quite difficult, so I'd certainly have to try to Standardise some of it. I'm also considering mandating that overloaded operators must have the same semantics as they do for primitives- i.e., that for any expression like (a = a + b), the optimizer is legally allowed to reduce it to (a += b). In this case, CSE and other compiler optimizations would run before the expression is handed over, and the class would only have to deal with the result- smoother and cleaner, but less customizable.

Of course, the expression built-in type already cuts the need for expression templates as they exist now, so it's questionable whether they would have any use that doesn't fall into the acceptable category. To address this properly, I'd need a "pure" specifier, which means both "please eliminate me (if redundant)" and "please parallelize me". There's considerable overlap between "pure", "const" and "threadsafe", and I'd really need to explicitly define what each means, precisely.

Pure functions are necessarily valid const functions. Const functions are not necessarily valid pure functions, and neither is necessarily a valid threadsafe function- after all, you might const_cast away the constness and use something like memoization, which would not break the promises to the compiler, but would not be thread safe. In addition, the compiler cannot guarantee that any given access is thread-safe.

Now, the real question is if I can make threadsafe more than just a promise. Of course, I could propagate it, but the problem is that determining threadsafety will be more than just checking that all accesses are thread safe. I'm not sure that it's even theoretically possible to determine threadsafety of a function automatically, even given all the knowledge in the world. Unless I want to crack out a virtual machine and run it in every permutation of being accessed and changed, etc.