Friday 26 August 2011

Coursework

It's done. God help me, but it's done. That means that, in theory, 60% of my workload just ended. Now I only have two examinations to revise for, and then I am free as a bird- free to sort out my accommodation, health, financial, and romantic (lack thereof) issues.

On the plus side, I've been thinking some more about attributes. Firstly, I've decided to eliminate the different kinds of filter- now a filter is a filter is a filter. Secondly, I've decided that no matter how hard I try, it's going to be damn difficult to implement attributes on the level of the ones that ship with the language. Volatile is just the easiest example- there are other language optimization rules, such as "Any pure function may be replaced by its return value", or "The implementation may automatically parallelise code marked as threadsafe", which would be difficult, if not impossible, to genuinely replicate as user-defined rules unless you wrote literally the entire optimizer and code generator as a compile-time metaprogram- which would be impossible, because there'd be no way to code-generate the code generator. Therefore, I've decided that whilst attributes offer neat semantics and relatively easy extension points to the language, the extensions that can be placed there are fundamentally limited. I'll have to come up with something much better if I want people to genuinely be able to write their own Const, Pure, etc.

In addition, I've been thinking some more about what I said earlier about compiling to target the JVM or CLR. Whilst I've decided, again, that that would be fundamentally impossible, I'm now not so sure about the reverse. As long as the Standard libraries and native interface can be dealt with, I see no reason that it should not be possible to generate "DeadMG++" code from MSIL or JVM bytecode. After all, I view them as stricter subsets of "DeadMG++" and don't see any semantics that I can't replicate. The only problem would be reflection and/or run-time code generation. Now, reflection I can work with, especially with compile-time reflection as a working base. However, I'm not sure about the ability to JIT code on the fly. I had planned to include a quite minimal JIT in the Standard library of "DeadMG++", but the JIT capabilities offered by those platforms are well in advance of what I would have put in otherwise. In addition, I would have to consider a performance loss due to a substantially less mature GC system- although compiling to native code would, of course, be a significant performance gain.

In addition, there's a significant library problem. Java and .NET ship with significant libraries by default- and users of .NET will expect some of the Windows-only functionality as well. This would effectively lead to me having to implement WPF, WinForms, LINQ, and everything else I can think of. Unless I could disassemble them and convert their existing implementations. That would surely be illegal... right? Certainly cool, though, if it could be done.

The final thing to consider would be interoperability- exactly how code generated from MSIL or JVM bytecode would interact with plain old "DeadMG++" code.

Saturday 20 August 2011

Attributes- again

I've been thinking more about attributes. Fundamentally, some of them, like const, are views on a type, and some are copies. This is easily demonstrated because, quite simply, some of them are binary compatible and some aren't. That is, if you do a const_cast, then that's OK. But, if I had a Debug attribute, then the contents of a Debug std::vector may well be very different to the contents of a non-Debug std::vector, and something like a debug_cast could never exist, because there are no rules at all about what you might choose to change in a Debug std::vector. They may, in fact, be completely different types with little in common- or nearly everything in common.

An excellent example of this would be arrays. I might decide that non-Debug arrays don't check their boundaries but Debug arrays do, in fact, check their boundaries. I might decide that instead of allocating Debug arrays on the native stack, I allocate them in a separate memory arena so that they can never overflow the stack, and so that when an error happens, we can always get a stack trace. Hell, I might have "device" as an attribute and allocate a "device" array on the GPU or something similar- and who knows what that might involve?

That is, some attributes are like filters- you can cast them one way or another without breaking anything at the binary level, like const. However, some attributes are like template specializations- wholly unrelated to the source type, at least at the binary level.

This returns us to the core of the problem. What about primitive types? It's obviously illegal for user code to modify them- and users also have the option to lock user-defined types for the same effect, which most Standard types would do, too. But I'd still want the choice to create a Tested Standard type if I wanted to.

This leads us to the inescapable conclusion that filters which do not add functionality must be valid on locked types. Furthermore, it also leads us to the next conclusion, which is that we can generalize const further to a simple removal filter.

namespace Standard {
    namespace Attributes {
        type Const {
            Type := Standard.Attributes.NonAddingFilter;
            Remove(TypeReference T) {
                // etc
            }
        }
        type Threadsafe {
            Type := Standard.Attributes.AddingFilter;
            Add(TypeReference T) {
                // etc
            }
        }
    }
}

The rules for casting are simple- the language will always implicitly reduce the functions available to the user, and always require an explicit cast to increase them. That is, NonAddingFilters are trivial to add and explicit to remove, AddingFilters are trivial to remove and explicit to add, and Replacements cannot be cast at all. This means that AttributeCast will serve as a clean replacement for const_cast across any filter attributes.

For example, consider writing a webpage. You might decide, logically, that an Unvalidated string contains user input. In this case, the difference between a string and an Unvalidated string is, effectively, nothing- except that a function which writes output to the page will only take a string. The problem is that you want to have an implicit conversion in which you validate the contents. However, this violates the contract of a NonAddingFilter, namely that it doesn't add anything. You don't want to inherit from the String class either, because that would produce problems like no virtual destructor, and you really don't want the unchecked implicit conversion to String that inheritance would give you. This is the ideal place for a Copy attribute. Copy attributes receive a type which has one member variable- the type being copied- and all functions (including constructors) appropriately forwarded. This is achievable in "DeadMG++" and not C++ because in "DeadMG++" you can go back and remove them if you don't want them, or simply start afresh. Then, all you would have to do is add a conversion operator and you're done.

namespace Web {
    type Unvalidated {
        Type := Standard.Attributes.Copy;
        Result(TypeReference T) {
            T.Conversions.Add([]() {
                return Validate(this.Internal);
            });
        }
    }
}
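
For comparison, this is roughly how far a hand-written wrapper gets you in today's C++- a minimal sketch, where Validate() and the member name are invented for the example. Everything the Copy attribute would forward automatically has to be written out (or deliberately left out) by hand here:

#include <string>
#include <utility>

// Hypothetical validation routine: turns raw user input into a trusted string.
std::string Validate(const std::string& raw) {
    // ... real checks would go here ...
    return raw;
}

// Hand-rolled stand-in for the Copy attribute: wraps a string without inheriting
// from it, and exposes a validating conversion rather than an unchecked implicit one.
class Unvalidated {
    std::string internal;
public:
    explicit Unvalidated(std::string raw) : internal(std::move(raw)) {}
    explicit operator std::string() const { return Validate(internal); }
};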

Whilst writing up this example, I noted that if the function name is always a string then you can't access conversion operators- since the type might not even have an accessible string. This would be akin to executing arbitrary strings at compile-time. Therefore, the only way to allow a conversion to say, a compile-time type argument, would be to simply decltype() the function. This *also* yields the issue of adding functions which take compile-time arguments. If the type has an existing compile-time component, it would be inaccessible to them- that is, the only way to get compile-time functionality out of a type is to write it as a literal. I'm not too happy about this. I think it's going to be a fundamental flaw of the N-pass system that I have designed.

Secondly, the lvalue/rvalue deal. I think that the logical option is to split rvalue references and perfect forwarding. I will have one type, "Reference", which is almost like a "base class" of references. Const will not receive special treatment. A Reference may be either an lvalue or rvalue reference.

Another topic that deserves consideration is the rules for template deduction. Consider:

template<typename T> void func(T& t);
int main() {
    const int i = 1;
    func(i);
}

In C++ this will throw an error, as T cannot be deduced. In "DeadMG++", however, I am certainly considering allowing an equivalent snippet to compile. It's my understanding that C++ doesn't allow such things for legacy reasons- it would break old code. Since I don't have any old code, I feel no compunction to prevent such things from compiling. This is especially true as in "DeadMG++" I have arbitrary attributes- there would be no way to grab them all.

This still leaves me with the problem of volatile, though. The problem with volatile is not just that it behaves like a NonAddingFilter, which is fine, but it has significant implications for the code generating process- something not expressible within the attribute system.

Finally, I've realized that I've screwed myself just a little bit here- and that is, the order of functionality in a literal could matter, something I wanted to avoid. Inherently, the capability to mutate a type whilst it's still under construction is both dangerous and necessary. Consider a trivial snippet:

type T {
    SomeFunc(T) variable;
    SomeOtherFunc(T) othervariable;
}

Which of these two gains precedence? The function "SomeFunc(T)" could arbitrarily mutate T in any fashion it desires. It could even lock the type. I could "lock" the type whilst it's constructing- that is, within the body of "T", "T" would be a const TypeReference rather than a plain TypeReference.


I know that my blog posts are getting a bit infrequent and unreliable. I've got a lot going on right now and can't really afford the distraction of dreams about a better language. In a couple weeks, it'll all be over- probably for worse rather than better, but that's life.

Tuesday 16 August 2011

C++ metaprogramming

It sucks balls. Seriously.

You can't debug it. If the compiler picks the wrong overload, good luck asking it why. You can't break in the middle of a template instantiation, and the compiler is its usual unhelpful self. Half the time it doesn't even specify an error, it'll just say "specialization failed", or "It might be ambiguous, but it also might have no available overloads". Great, compiler. Maybe you could be more specific?

SFINAE prevents deduction. This I never knew, but apparently it actually does. How incredibly unhelpful. You might not want to compile for a reference... so I hope you didn't mind that I made your argument explicit nao.

I also discovered a need to place six bajillion typedefs in my iterator code for no apparent reason or benefit, which had me confused no end. And if you make a typo in your template code (I attempted to use a std::disable_if, which doesn't exist) the compiler gives a syntax error. I don't think so, buddy!

Oh, and then there's the declaration/definition pit of failure.

Not being able to decltype member variables is driving me nuts, since I frequently need to refer to them in decltypes for member function returns in generic code.

Want to buy: DeadMG++.

LINQ in C++

You know, LINQ really is nothing new. It's been supported in the Standard library since there was a Standard. And some trivial proxies/wrappers can prove that. The problem is just that it was butt ugly.


#include <algorithm>
#include <utility>

template<typename T, typename F> class proxy {
    T&& t;
    F&& f;
public:
    template<typename TRef, typename FRef> proxy(TRef&& tref, FRef&& fref)
        : t(std::forward<TRef>(tref)), f(std::forward<FRef>(fref)) {}
    // Converting to any container type runs the whole Select in one go.
    template<typename ret> operator ret() {
        ret retval;
        std::for_each(t.begin(), t.end(), [&](decltype(*t.begin()) ref) {
            retval.emplace_back(f(ref));
        });
        return retval;
    }
};

template<typename T, typename F> auto Select(T&& t, F&& f) -> proxy<T, F> {
    return proxy<T, F>(std::forward<T>(t), std::forward<F>(f));
}
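
A quick usage sketch against the snippet above- the container and lambda are just examples:

#include <vector>

int main() {
    std::vector<int> numbers = { 1, 2, 3 };
    std::vector<int> doubled = Select(numbers, [](int x) { return x * 2; });
    return doubled[2] == 6 ? 0 : 1;
}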

What I really need is, one: a member interface rather than a free function, and two: lazy evaluation that happens per-value, on-demand, not all at once, if possible. To do this I will implement some custom iterators- there's a sketch below. Then my domination of the galaxy shall be complete. By that, I mean I will feel at least somewhat better. And less devastatingly sick.
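
As a rough sketch of the per-value, on-demand evaluation I mean, here's an iterator that applies the function at dereference time instead of building the whole result up front. The names (select_iterator, make_select_iterator) are placeholders, not a final design:

// Applies f lazily, one element at a time, when dereferenced.
template<typename Iter, typename F> class select_iterator {
    Iter it;
    F f;
public:
    select_iterator(Iter it, F f) : it(it), f(f) {}
    auto operator*() -> decltype(f(*it)) { return f(*it); }
    select_iterator& operator++() { ++it; return *this; }
    bool operator!=(const select_iterator& other) const { return it != other.it; }
};

template<typename Iter, typename F>
select_iterator<Iter, F> make_select_iterator(Iter it, F f) {
    return select_iterator<Iter, F>(it, f);
}

// Usage: share one functor so that begin and end get the same iterator type.
// auto dbl = [](int x) { return x * 2; };
// for (auto it = make_select_iterator(v.begin(), dbl),
//          end = make_select_iterator(v.end(), dbl); it != end; ++it) { /* *it is computed here */ }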

Sunday 14 August 2011

Tiobe language index

http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html

Interesting list. The problem is that when you click on the definitions, you see that it's measured by search engine hits. Simply have more material- or older material, or wrong material- online, or a more generic language name, and you're "more popular". Apparently, the first 100 pages are checked for relevance. Congratulations- out of the 20 billion results for C, I'm sure the first 2,500 provide a relevant sample.

I've been thinking more about my university work. It's so depressing. Someone please shoot me. Did I memorize the formula for generating a rotation matrix? No, I damn well did not memorize it. I looked it up on Wikipedia or I Googled it or I used the mathematical support library that comes with my API that actually puts shit on my screen. Now, I appreciate that my time has little value and I'm some nobody student who wastes it on games and languages that will likely never take off, but it sure as hell has more value than the time of Google's servers. That's free. Literally, free. For anyone. Even me.

What a total waste of my time.

Saturday 13 August 2011

More grammatical headaches

I've got a very annoying problem in Bison, namely that it's horrendous. The tool needs functions so badly, the DRY violations are quite extreme. How many rules do I have that are

rule2
    : rule1
    | rule2 rule1

or

rule2
   : rule1
   | rule2 ',' rule1

Then we have even more fun, in the form of expression_not_identifier. That is, I need to match separate rules if the expression is an identifier or not.

It's horrendous. Now, I'm down to a single S/R conflict and it's proving a resistant one. I did successfully eliminate "::" and managed to go back to '.' for all uses.

The problem is, fundamentally, quite simple.

function: foo(x, y) {}
variable: foo(x, y) var;

Now, I *can* convince Bison to delay it by unifying the paths and some other nastiness, but that would violate DRY like politicians violate promises. As such, I'm thinking that a little script, written in Lua- the I-can't-be-bothered language of my choice- ought to handle the DRY problems. Don't you think it's ironic that I'm writing a script in Lua, to generate the input for Bison, to generate the input for the C++ compiler, which will generate a program that can generate code from another input? Meta-meta-meta-meta-meta-programming for the wincakes.

Alternatively, I could write said program in C++.
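
For what it's worth, the generator doesn't have to be Lua- a few lines of C++ cover the two rule shapes above. A throwaway sketch, with made-up rule names:

#include <iostream>
#include <string>

// Emits the standard Bison "one or more" list rule, optionally with a separator token.
void EmitListRule(const std::string& list, const std::string& element,
                  const std::string& separator = "") {
    std::cout << list << "\n"
              << "    : " << element << "\n"
              << "    | " << list << " "
              << (separator.empty() ? std::string() : separator + " ") << element << "\n"
              << "    ;\n\n";
}

int main() {
    EmitListRule("statements", "statement");
    EmitListRule("argument_list", "argument", "','");
}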

Tuesday 9 August 2011

String literals

C++0x introduces many features to do with string literals. Now, user-defined literals, I have yet to consider for inclusion. However, I definitely did reject the extended string literals where you can pick your own delimiter and stuff like that. Why is this?

Firstly, in "DeadMG++" you can gain the string by another route anyway, like loading it from a file, reading it from a console, or even from a GUI, at compile-time, which avoids the escaping problem of C++.

Secondly, I really, really think you're doing it wrong with things like regexes. String literals are not designed to have complex semantics expressed in them. Regular expressions could easily have a more functional API, which would be vastly clearer to read for everyone.

For example:

alphabetical := Regex.Range("a", "z") || Regex.Range("A", "Z");
alphanumeric := alphabetical || Regex.Range("0", "9");
identifier := ("_" || alphabetical) && Regex.ZeroOrMore("_" || alphanumeric);

match := identifier(true)(input); // throws if non-matching, returns the match otherwise.
if_matched := identifier(false)(input, match); // doesn't throw; returns bool and puts the match into "match".

They exist Under The Hood™ as FSMs anyway- this is just clearer. Better yet, this API doesn't violate DRY- whereas if you write [a-zA-Z_][a-zA-Z0-9_]*, you've specified the same alphabetical-or-underscore twice.

Of course, the above aren't especially internationalizable, which is a definite flaw. I'd have to write in Letter, Number etc as Standard.
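
To make the idea concrete in today's terms, here's a toy combinator layer in C++ that just composes the equivalent pattern text and hands it to std::regex- nothing like a real FSM builder, the ASCII-only ranges mirror the sketch above, and every name in it is invented for the example:

#include <regex>
#include <string>

// Toy combinators that compose pattern text; a real library would build the FSM directly.
struct Pattern {
    std::string text;
    Pattern operator||(const Pattern& other) const { return { "(?:" + text + "|" + other.text + ")" }; }
    Pattern operator&&(const Pattern& other) const { return { text + other.text }; }
};

Pattern Range(char from, char to) { return { std::string("[") + from + "-" + to + "]" }; }
Pattern Literal(const std::string& s) { return { s }; }
Pattern ZeroOrMore(const Pattern& p) { return { "(?:" + p.text + ")*" }; }

int main() {
    Pattern alphabetical = Range('a', 'z') || Range('A', 'Z');
    Pattern alphanumeric = alphabetical || Range('0', '9');
    Pattern identifier = (Literal("_") || alphabetical) && ZeroOrMore(Literal("_") || alphanumeric);
    std::regex re(identifier.text);
    return std::regex_match("some_name42", re) ? 0 : 1;
}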

Another thing that I've been re-evaluating is the effectiveness of compile-time/runtime deduction. I've been indecisive about how effective this can really be, but I've decided that, ultimately, I should give it a shot- partly because I feel that most code won't need to operate at both times quite like the allocator does. I am thinking right now of a kind of "supported" deduction.

// usings
type AllocatorType {
// Default is private non-static non-virtual runtime

compiletime {
    HashMap(int, VariableReference) allocators;
    Default := HeapAllocator; // "malloc", essentially
    GetAllocator(Size) {
        if (Size > 52) // arbitrary for now
            return Default;
        if (allocators.Find(Size) == allocators.End())
            allocators[Size] = PoolAllocator(Size);
        return allocators[Size];
    }
}
public { // runtime default
    Allocate(Type)() {
        VariableReference Allocator := GetAllocator(Type.Size()); // calls a compile-time function
        return Allocator.Allocate(Type); // calls a run-time function- and returns
    }
    Allocate(Size)() {
        VariableReference Allocator := GetAllocator(Size);
        return Allocator.Allocate(Size);
    }
    // stuff
}
}    

Having the grammar much closer to completion is really allowing me to get a better feel for coding like this. How about D's invariants? Now, the API for editing functions I'm not too sure of right now- worst comes to the worst, I'll just directly replace the function. Right now, I think that adding, as the first statement, an unnamed local variable whose constructor and destructor both call the invariant function ought to do the trick.

// usings
Invariant(Type, Function) {
    invariant_type := type {
        Type.Pointer() This;
        type(this) {
            This = this;
            Function(This);
        }
        ~type() {
            Function(This);
        }
    };
    ForEach(Type.MemberFunction, [&](MemberFunc) {
         this := MemberFunc.Arguments["this"];
         MemberFunc.Statements.PushFront(Statement.ConstructedVariableDefinition("", invariant_type, this));
    });
    return Type;
}
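
For reference, the run-time half of this is just a guard object, which present-day C++ can already express by hand- it's the "plant one at the front of every member function" step that needs the metaprogram. A minimal sketch with invented names (InvariantGuard, Account):

#include <cassert>
#include <functional>

// Checks the invariant on entry and again on exit of the enclosing scope.
// The metaprogram above is what would insert one of these into every member
// function automatically; here it's written out by hand.
template<typename T> class InvariantGuard {
    const T& object;
    std::function<void(const T&)> check;
public:
    InvariantGuard(const T& object, std::function<void(const T&)> check)
        : object(object), check(check) { this->check(this->object); }
    ~InvariantGuard() { check(object); }
};

class Account {
    int balance = 0;
    void CheckInvariant() const { assert(balance >= 0); }
public:
    void Deposit(int amount) {
        InvariantGuard<Account> guard(*this, [](const Account& a) { a.CheckInvariant(); });
        balance += amount;
    }
};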


This brings me back to a slight grammatical change. Previously, type definitions and type literals were different things. Now, I've decided to "promote" type definitions to full expressions. This means that you can do something like

type MyType {
    // .... some stuff here
}.Invariant([](this) {
    assert(i != 0); // semantics of "assert" TBD
});


Now, all I might have to do is create a simple "action" function which will take expressions or functions and execute them immediately...

type MyType {
    int i = 0;
public {
    SetI(value) { i = value; }
    GetI() { return i; }
}
}.Actions([&] {
    MyType somevar;
    assert(somevar.GetI() == 0);
    assert(somevar.GetI() == somevar.i);
    somevar.SetI(5);
    assert(somevar.GetI() == 5);
});



Gosh, it's like, a unit test or something! Not that I have ever written a single unit test in my entire life. But here's to hoping that arbitrary compile-time per-type actions can be used to implement unit tests.

Poison

Holy shit, I feel like I have been poisoned. This sucks, tremendously.

Oh, no useful post for today, unless I think of something later.

Monday 8 August 2011

Blocks

One thing I'm looking at is "block" syntax. The existing modifiers like public, virtual, static, compiletime/runtime would be condensed into a "block definition", which would define attributes for all the definitions inside it- and in addition, I might want to add an "attribute" keyword to that.

type t {
public static runtime attribute(Const) {
    ohai() {}
}
public virtual compiletime {
    some_func() {}
}
}

This would significantly reduce the syntactic overhead of static, compiletime definitions, as well as of any class where many functions/variables share the same basic attributes. Once I add this (and cut the old stuff), I need to add something to distinguish functions, and I need to crack on with Lambda expressions and unnamed type literals- these are already parsed but have no associated AST, and the parser takes no action upon finding them; I'll probably also discover a few more corner cases where that's true. I also need to cut the ability to put access specifiers at global scope, because that's just wrong, and look again at saving the use of . as a compile-time access notation.

After that, run a few test harnesses on the parser and make sure that it works as advertised, then I'm on to semantic analysis.

The first step of semantic analysis is going to be to find all structures that *must* exist. By that, I mean all functions, all namespaces, and all types which are defined in-place and not metaprogrammed into existence, and add them to the symbol table. Then start from Main() and determine the compile-time program that will yield the program of the next phase- repeat until we get to Main()'s runtime, which will be the final phase.

Once that's done, generate the code, execute it, repeatedly, and then we're done diddly done bun.

I keep having something I intend to post, and I always forget what it is after I remember it. :( In addition, I'm very sick (again!) right now, so expect fewer postings whilst I scream in pain.

Sunday 7 August 2011

More types and attributes

I've been thinking some more about different attributes, namely, how they interact with primitive types. Previously, I had conceptually viewed attributes as a kind of "view" onto a type. Now, however, I'm re-considering that PoV. For example, if the Const attribute didn't ship with the language- then how could I define how Const interacts with primitives? If I define a user-defined Portable attribute or Tested attribute, then I might obviously say that every primitive and Standard function is Portable and Tested, even if not Tested by me, but I can't actually do that because they're immutable.

So, attributes are going to have to be copies- that is, when you have a type T, then a Const T is a completely separate type, and if T was locked, then Const T is not locked and can be modified arbitrarily. This would mean that when you have a Portable attribute, then when you talk about a Portable int, it's a copy and you can do what you wish with it- even though it's obviously illegal to talk about modifying int itself. This should also make life easier in terms of caching and parallelising attributes, as only one copy need exist at any one time.

I've also decided to go with the .NET/Java list of numerical sizes- byte is 8 bits, short/char 16 bits, int 32 bits, long 64 bits, all signed by default. Not going to have the variable-size primitives mess, and definitely not the char-is-it-signed-or-not mess.

Specification: I will define all of the primitive types as user-defined types. The "DeadMG++" Standard will not distinguish between UDTs and primitives in any way. A primitive is just a class with hardware support. If I find any property or behaviour of primitives that cannot be specified as UDTs, then I have created my language wrong. Except maybe literals, I might forgive that.

Some more grammar

With the pending completion (finally!) of the parser, I've been playing with writing more and more DeadMG++, not just for fun (although it is!) but also to stress the parser and lexer, and make sure that the lexer produces all the tokens it should, and that sort of thing. Also, gaining some experience here is rapidly looking like it's going to be important. Have a look at the following code:


type StandardAllocatorType {
    HashMap(int, VariableReference) Pools;
    VariableReference Default := HeapAllocator;
    GetAllocator()(Size) {
        if (Size > 48)
            return Default;
        if (Pools.Find(Size) == Pools.End())
            Pools[Size] = Pool(Size);
        return Pools[Size];
    }
public:

    Allocate(Type, TypeReference... Types)(Types.RvalueReference()... arguments) {
        compiletime {
            Allocator := GetAllocator(Type.Size());
            runtime {
                return Allocator.Allocate(Type, forward(arguments)...);
            }
        }
    }
    Allocate(Size)() {
        compiletime {
            Allocator := GetPoolAllocator(Size);
            runtime {
                return Allocator.Allocate(Size);
            }
        }
    }
    Allocate()(Size) {
        runtime {
            return Default.Allocate(Size);
        }
    }
}



Firstly, I know that I haven't spoken about VariableReference. It performs a job similar to the imaginatively similarly named FunctionReference and TypeReference- it refers to a variable which exists in the next pass, i.e., a static variable, where I have created one of type "HeapAllocator", whatever that is (although in the logic at hand, that realistically means malloc, give or take).

The main problem here is that it's not explicit about when the functions, or variables, exist- namely, the variable Pools gives no indication of when it exists, or of what disaster you might create by attempting to refer to it at run-time. I will have to alter the compiletime/runtime blocks to include functions and variables in types, too. Something like

type StandardAllocatorType {
    compiletime {
        HashMap(int, VariableReference) Pools;
        VariableReference Default := HeapAllocator;
        GetAllocator()(Size) {
            if (Size > 48)
                return Default;
            if (Pools.Find(Size) == Pools.End())
                Pools[Size] = Pool(Size);
            return Pools[Size];
        }
    }
public:
    Allocate(Type, TypeReference... Types)(Types.RvalueReference()... arguments) {
        compiletime {
            Allocator := GetAllocator(Type.Size());
            runtime {
                return Allocator.Allocate(Type, forward(arguments)...);
            }
        }
    }
    Allocate(Size)() {
        compiletime {
            Allocator := GetPoolAllocator(Size);
            runtime {
                return Allocator.Allocate(Size);
            }
        }
    }
    Allocate()(Size) {
        return Default.Allocate(Size);
    }
}


It's a bit of a mess. This one object exists at both compile-time and run-time, and I had imagined that most objects would not. Normally it wouldn't be necessary, but here we have to have one object of one type. Unless I added the ability to execute on namespaces too- then the object could be split into its two logical parts, run-time allocation and compile-time type/size tracking, and I might be able to ditch this whole compiletime {} runtime {} thing. Might. That would be swell.

It's also worth noting that I had to sacrifice using the dot notation for namespaces and types as well, and had to revert to the old C++ double colon for that, but I did manage to scrap the var keyword, which I like- I was not a fan of that at all. If I do add the ability to manipulate namespaces, which I guess is just logical after functions, types, expressions, and variables, then perhaps I can get the dot back.

Thursday 4 August 2011

Axioms

Of course, in "DeadMG++" then since you can deal with Expressions as raw semantic types, you can optimize them however you like. That means that we can introduce what's known as Axioms in the C++ world- as usual, a library-only feature.

Axioms are a simple concept. Consider

a = a + b; // 1
a += b; // 2

For most types, you would argue that (assuming the expression a is pure) the two are identical, and an optimizing compiler might choose to change 1 into 2. Of course, for the remainder of types this would change the correctness of the program. Axioms are a system for telling "most" from "the remainder", allowing you to implement, for UDTs, very specific optimizations which in C++ would be reserved for primitive types only. Pure helps with this at least a little, allowing us to apply CSE.

The main question is how would you design and specify Axioms?

At the most fundamental level, it would be a pair of expressions- transform one into the other for some benefit.

Axiom(
    Original := expr1 = expr1 + expr2,
    Preferred := expr1 += expr2
);

Axiom(
    Original := (expr1++).static_cast(void), // return value is unused
    Preferred := ++expr1
);

The question is, is that enough to implement all the Axioms that people might want, and is it implementable in such a relatively simple state?
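
You can fake the first axiom in today's C++ with expression templates- operator= pattern-matches the unevaluated "a + b" and dispatches to += instead. A sketch for a toy numeric type (the axiom system would generalize this without the hand-written proxy):

#include <cstddef>
#include <vector>

struct Big;

// Proxy representing the unevaluated expression "lhs + rhs".
struct SumExpr {
    const Big& lhs;
    const Big& rhs;
};

struct Big {
    std::vector<double> data;

    Big& operator+=(const Big& other) {
        // Assumes matching sizes; a real type would check.
        for (std::size_t i = 0; i < data.size(); ++i)
            data[i] += other.data[i];
        return *this;
    }

    // The "axiom": a = a + b becomes a += b, skipping the temporary, whenever
    // the left operand of the + was this object. (The a = b + a aliasing case
    // is ignored for brevity.)
    Big& operator=(const SumExpr& e) {
        if (&e.lhs == this) return *this += e.rhs;
        data = e.lhs.data;
        return *this += e.rhs;
    }
};

SumExpr operator+(const Big& lhs, const Big& rhs) { return { lhs, rhs }; }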

References

I've got a slight referential problem.

In C++, you can write code that takes a const reference and not care whether it binds to an lvalue or an rvalue. But at the moment in DeadMG++ there's no such functionality, because const references aren't given special leeway anymore and won't bind to rvalues. The thing is that, well, this is pretty much unique to constness and not helpful in the general attribute sense.

Secondly, I've got growing concerns about my design- namely, that I have no idea when it'll be finished. Or, to put it another way, I have no way to tell when it has no massive, gaping holes in it. It would be a bit shit to find out that for some reason I didn't foresee the whole thing just doesn't work.

Thirdly, I'm beginning to consider another problem, which is popularity. I mean, I wouldn't even know where to begin to promote a new language, I've never even finished an open-source project.

On the upside, I grabbed a new version of Bison that's massively easier to work with as it allows complex C++ types to be written in. I also added friend expressions and friend accessibility levels and fixed the semicolons problem where you had to do if (expression;) and if( variable = expression;). I also changed constructors/destructors to be more like D's syntax- using this rather than the type name. How else could you call the destructor of an anonymous type? Especially something where the type may be complex to express.

Wednesday 3 August 2011

Function overloading and more variables

Right. I can't ditch function literals, because overloading will be a total mess. What, do I introduce "variable overloading" or something to support it? I will simply have to make them self-referential, to avoid the "this" problem. That is,


using Standard;
using Attributes;
Factorial(i) {
    Factorial.Attributes(Pure, Threadsafe);
    if (i > 1)
        return i * Factorial(i - 1);
    else
        return i;
}


Now that yields an inconsistent interface for member functions. I could, of course, just remove "this". It's not like having to explicitly add "this" to the argument list will make life a lot harder for anyone, and it will make member functions much easier to deal with. And, of course, should anyone desire, they can always refer to the type name.


using Standard;
using Attributes;
type MahType {
    Func(MahType.Attributes(Const).Reference() this) {
        Func.Attributes(Pure, Threadsafe);
        return this.i;
    }
    int i = 0;
}


I've also seen a good idea in D, where instead of using the type's name as the destructor syntax, they just use "this". Now that's a good idea, because the type's name could be ... fun ... to express for metaprogrammed types. Now, I already wanted to have extension methods, so the additional uses of "this" won't be too bad to specify, and since it can't legally appear outside of a function definition, it'd be just fine.

Now, let's talk briefly about extension methods. I fully plan on eliminating the free/member issue in C++, not least because you can add and remove member methods at any time, make member methods into free functions, and suchlike. However, in "DeadMG++", "this" need not be the first argument. It's a named parameter, you see :) Of course, defining operator overloads in a type literal is not currently something the grammar supports, although I will have to remember to add that in.

type MahType {
    int i = 0;
    operator>>(Input, this) {
        Input >> i;
        return Input;
    }
}


Now, about those attributes...

using Standard;
using IO;
type MahType {
    int i = 0;
    // Hmmm.. without "const" given special treatment,
    // how could you write a function that doesn't care whether it binds to lvalue
    // or rvalue or the attributes, but as a reference to a MahType?
    operator+(auto.RvalueReference() this, auto.RvalueReference() Other) {
        // Acceptable?
        MahType.MemberFunctions["operator+"].Attributes(Pure, Threadsafe);
        return i + Other.i;
    }
}


I've decided that, well, I totally forgot that regular objects are now just fine to exist at compile-time, and it's easy enough to define them as objects. This, along with restoring type and function literals, should solve the problem well enough.

using Standard;
using Concurrency;
using Containers;
type MahClass {
    Lock l;
    HashMap(TypeReference, TypeReference) Memoize;
    operator()(this, TypeReference t) {
        MahClass.MemberFunctions["operator()"].Attributes(Pure, Threadsafe);
        ScopedLock temp(l);
        if (Memoize.Find(t) != Memoize.End())
            return Memoize[t];
        return Memoize[t] = type {
        };
    }
}
namespace Standard {
    namespace Containers {
        MahClass.Attributes(Const) Vector;
    }
}



Finally, I've decided that there could be a need for something I've thought about previously- explicit compile-time and run-time delineation. For example, it's easy for me to say that, since a FunctionReference exists as a bunch of data structures in this pass and an executable function in the next pass, calling "operator()" on it has got to happen at run-time. However, it becomes significantly harder when I consider how you might actually implement that. I mean, in the compiler, sure, but what would happen if you actually had a library that might operate that way?

Tuesday 2 August 2011

Memory allocators and metaprogramming

When making dynamic allocations, there are two main kinds- statically sized and dynamically sized. By that, what I mean is that if you do something like

using Standard;
using Containers;
using Memory;
String.Pointer() sptr = Allocate(String);


In this case, you are actually dynamically allocating a fixed size- sizeof(String). This is begging to be metaprogrammed into an object pool allocation. In fact, the Allocate function knows every type, and every statically sized allocation, you ever passed to it. This means that common allocation sizes can be statically moved to an object pool that fits them. Better yet, this redirection can occur at compile-time, not run-time, avoiding any run-time cost for it.
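
Here's a rough C++ approximation of that compile-time redirection- each allocation size picks its pool during template instantiation, so no size-to-pool lookup is left at run-time. FixedPool is a stand-in for whatever pool actually ships; the whole thing is a sketch, not the planned interface:

#include <cstddef>
#include <cstdlib>
#include <new>

// Stand-in fixed-size pool: recycles freed blocks through a free list and
// falls back to malloc when empty. A real pool would grab whole pages at once.
template<std::size_t BlockSize> class FixedPool {
    struct Node { Node* next; };
    Node* free_list = nullptr;
    // Never hand out a block too small to hold the free-list node.
    static const std::size_t Size = BlockSize < sizeof(Node) ? sizeof(Node) : BlockSize;
public:
    void* Allocate() {
        if (!free_list) return std::malloc(Size);
        Node* n = free_list;
        free_list = n->next;
        return n;
    }
    void Deallocate(void* p) {
        Node* n = static_cast<Node*>(p);
        n->next = free_list;
        free_list = n;
    }
};

// The compile-time redirection: sizeof(T) selects the pool at instantiation time.
template<typename T> T* Allocate() {
    static FixedPool<sizeof(T)> pool;
    return new (pool.Allocate()) T();
}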

And, of course, a realloc-style function will be part of the "DeadMG++" Standard allocator interface. After all, the worst you can do is just "return allocate()" if you can't or don't want to implement it.

Another trick that I will employ is the extensive use of custom allocators. The C++ Standard does not really ship any custom allocators, to my knowledge. In "DeadMG++", I will ship with custom allocators. The most obvious to use would be an object pool. In addition, I'm considering a thread-local allocator, and one of my first blog posts was about various memory pool or arena schemes which I could also ship. In addition to the GC, of course. The PPL also provides a special memory allocator. The user's life with a custom allocator is also easier because we provide a slicker interface to them, thanks to named parameters.

Monday 1 August 2011

Garbage Collection

I reckon that it might actually be feasible, with a couple of nice tricks that should enable a more performant implementation.

Firstly, GC references will be strongly typed, as opposed to non-owning pointers. They will have the type Standard.GC.Reference(T). This will instantly remove the need to follow pointers which won't lead to garbage-collected references, which will hopefully be, say, most of them. If you point to a type which, directly or indirectly, holds no GC references, then we don't need to follow that pointer. In addition, this may well allow compacting GC implementations, because it's instantly obvious which references are GC and which aren't.

Secondly, the GC will follow raw pointers that may point to objects which may hold GC references. This leads to the direct conclusion that it is undefined behaviour, in the general case, to hold a non-null pointer that does not point to a valid object, because the GC might follow it if the link could eventually lead to a GC reference. This may force changes in iteration idioms- for example, one-past-the-end is now undefined in the general case, unless (as is the case with node-based containers) the end explicitly exists as a node. However, making the end be the last valid object instead would remove the need to explicitly have an end node, which may yield a performance improvement anyway.

Thirdly, objects can override their GC behaviour. This means that for a type such as Vector, the default behaviour would have to be overridden so that the GC can reach any references to any objects contained within, because the compile-time reflection used to generate the default behaviour will not register the Vector's contained types. Node-based algorithms like lists and maps probably won't have that problem.
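
To make the "don't follow pointers that can't reach GC references" rule and the container override concrete, here's roughly what the compile-time side could look like if it were spelled as a C++ trait- all of the names are invented for the sketch, and the real thing would derive the trait by reflection rather than by hand:

#include <vector>

// Can this type, directly or indirectly, reach a GC reference?
template<typename T> struct ContainsGCRefs { static const bool value = false; };

template<typename T> class GCReference { /* the strongly typed GC handle */ };
template<typename T> struct ContainsGCRefs<GCReference<T>> { static const bool value = true; };

// The Vector-style override: a container reaches GC references exactly when its
// element type does- which member-wise reflection over the container would miss.
template<typename T> struct ContainsGCRefs<std::vector<T>> {
    static const bool value = ContainsGCRefs<T>::value;
};

// The collector can skip an entire subgraph whenever the trait says false.
template<typename T> void Trace(const T& object) {
    if (!ContainsGCRefs<T>::value) return; // nothing GC-reachable behind this object
    // ... otherwise visit its members/elements and mark whatever is found ...
}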

Now, I've been considering stack-scanning. Of course, native C++ already includes a mechanism for telling which objects are on the stack- exception handling. It's the same principle, we're just interested in (somewhat) different objects. That is, a stack scan would be the same algorithm as throwing an exception and then not catching it until main(). That necessarily makes it quite an expensive operation- but not necessarily a huge one, and it may influence a change in exception handling mechanisms. After all, throwing an exception should be a rare event, whereas GC collections happen frequently. Now, I'm not that familiar with the hardware stack, it's not normally the kind of place I program, but as far as I know, scanning it should be as simple as checking the return address to know where we are and walking the stack pointer appropriately to find the variables that we want. Oh, and don't forget the registers, too.

Next time, I will write about general-purpose memory allocation performance and how metaprogramming can make your program run faster.