Sunday 31 July 2011

FunctionReferences

In a recent post, I discussed plans for my Const attribute system. However, I'm not completely sure about how this is going to work. Now, function attributes and type attributes will likely have to be different things, so let's pretend that I named the previous Attribute type something type-specific like TypeAttribute, and get rolling. The first problem is that the syntax completely doesn't support attributes. At all. I could use "this" as a keyword, but that would conflict with the object "this" for member functions. I might have to flat out cut function definitions as they are in C++ and completely replace them with lambda expressions.

FunctionReference f([&]() {
    f.Attributes(Pure, Threadsafe);
    return []() {
    };
});


Alternatively, I had plans for "this" to be just another argument... maybe I should just cut "this" as a reference to the object, and set it to be a reference to the function? Another problem is member functions... again. I can't use this syntax for them, because that would create a run-time FunctionReference, which is not what I'm looking for. I would basically have to cut type and function literals and just go straight to lambdas and types as just another container. In that case, how does one create Types if not by literal? Default-construct a TypeReference? Maybe call a function?

Another problem is going to be dealing with types that have attributes. For example, FunctionReference and TypeReference. You can add attributes to them- but what if I want the actual TypeReference itself to be Const instead of a TypeReference to a Const type? Maybe decltype() can solve that problem.

decltype(FunctionReference).Attributes(Const) f([&]() {
    f.Attributes(Pure, Threadsafe);
    return []() {
    };
});


decltype(TypeReference).Attributes(Const) MyClass([&]() {
    TypeReference T(TypeReference.New());
    T.PublicMemberVariables.Insert("ohai", Standard.Containers.String);
    return T;
});


Owch. You know, I think a little method chaining could alleviate that problem.

using Standard;
using Compiler;
using Containers;
using Attributes;

TypeReference MyClass([&]() {
    var x = TypeReference.New()
        .PublicMemberVariables.Insert("name", String)
        .PublicMemberVariables.Insert("type", TypeReference);
    x.OnChange(
        Event = any,
        Action = [](arg) {
            throw Standard.Exceptions.Error("Attempted to modify MyClass!");
        }
    );
    return x;
});



It's not that bad. I can work on it. Maybe I could stick with "type" as the type literal. Now on to the decidedly non-trivial problem: How to define a FunctionAttribute. I suspect that fundamentally, we will ask mostly the same questions as of a TypeAttribute- what happens when we add it, and what happens when we take it away.

decltype(FunctionAttribute).Attributes(Const) Pure([&] {
    Pure.OnRemove = [&](Function) {
        // remove trivially
        // remove the hook we added
    };
    Pure.OnAdd = [&](Function) {
        // Verify that the function is actually pure by checking that the expression
        // contents only call other pure functions
        // Make sure that nobody changes it behind our back

        Function.OnChange(
            Event := change, // maybe only one event is possible with Functions?
            Action := [&](Function) {
                // Verify pure, or throw
            }
        );
    };
});


ThreadSafe is going to be another kind of problem. I could simply propagate it like const or pure, but fundamentally, a single function cannot be proved to be threadsafe. I guess that I could quite explicitly prove that it doesn't have some kinds of threading bugs, maybe.

Saturday 30 July 2011

Friend functions

So how are we going to improve "friend" for "DeadMG++"?

Well, firstly, we are going to add a "friend" accessibility level, and I'm thinking about allowing multiple friend levels. I think that this will make "friend" more controllable.

Secondly, I'm going to introduce friendship transferring.

Have you ever tried to friend a Standard function so that it could use your functionality? It probably gave an error anyway, because when you call a Standard C++ function, there's no guarantee that that function actually does the work. Maybe it delegates to some other functions that do it. In that case, friending violates the abstraction: even though you friended the function that was supposed to do the work, it still fails, because the access is actually needed by some internal helper whose name you can't know. I had this problem with private constructors and make_shared in Visual Studio.

This is what friendship transfers are for. Effectively, you can specify individual expressions or individual objects to be friended, rather than necessarily whole classes or functions. And secondly, you can "pass on" your access level, by specifying a function call or object to be a friend- even if you're a free function. This means that if you use helper functions to do your real work, then you can give them your access, hiding their existence from anyone who tried to friend you to get work done.

This should allow more fine-grained and more abstract use of friend than exists in C++.

Variable initializers and global variables

See, I've been musing over global variables. The problem is that in "DeadMG++", it's very difficult to avoid global variables, given that things like types and functions used to be constants and now they're not. One of the most fundamental problems that I've got when dealing with them is initializing them. For example, my post on Const's definition the other day actually violates normal "DeadMG++" rules- that code goes in a function. Now, we can define a lambda that executes itself inline to define arbitrarily complex initializations. For example,

using Standard;
using Containers;
using Attributes;
Map(String, String).Attribute(Const) x = [&] {
    Map(String, String) retval;
    retval["ohai"] = "nub";
    return retval;
}();


This is a trick I've used in C++- it's better than initializer lists, in my opinion. The problem is that initializing some complex types depends on referring to them- similar to using this in the initializer list. For example, my previous definition of Const depends on recursively applying Const to its member types. This is illegal in C++, but I've decided that it will have to be legalized in "DeadMG++". Referring to a variable in its initializer will be legal, and it will be legal to take a pointer to it; however, actually accessing any part of it will be Bad™, unless explicitly initialized first.

using Standard;
using Containers;
using Attributes;
Map(String, String).Attribute(Const) x = [&] {
    x = Map(String, String)(); 
    x["ohai"] = "nub";
}();

Alternatively, a sort of lazy evaluation could become a common idiom. Perhaps most types- certainly most Standard types- will offer a construction option that takes a function object: the object is default-initialized first, and then the function is run to initialize it.

using Standard;
using Containers;
using Attributes;
Map(String, String).Attribute(Const) x([&] {
    x["ohai"] = "nub";
});

Of course, this is pretty handy in C++ too. In fact, some of the lazy expression evaluation can be performed in C++, if you have a lib that supports it in combination with the new lambda functions. I think I'd rather have this than init lists.

Thursday 28 July 2011

Const- a library feature

The question here is quite simple. How do you implement const as a library feature, allowing users to introduce arbitrary attributes?

The solution here is TypeReference, and its parallel, FunctionReference. Const is really a type thing, but pure and threadsafe are function things (maybe threadsafe is an object thing too?). I already said that types were references, and now it becomes explicit with TypeReference. Type references will not just be pointers; they will also store attributes and, critically, function hooks. When you access data about a type through a type reference, the results will be influenced by the hook. In the case of, for example, member variables, they will become const recursively. In this case, we will need a structure to define an attribute.

We will need to know what it means to add const to a type reference, and what it means to take it away- in both the direct and reference versions. Some attributes are trivial to add- e.g., const. Some attributes will be trivial to take away- e.g., threadsafe.

type Attribute = type {
    Standard.Functional.Function(void, TypeReference) OnAdd;
    Standard.Functional.Function(void, TypeReference) OnAddReference;

    Standard.Functional.Function(void, TypeReference) OnRemove;
    Standard.Functional.Function(void, TypeReference) OnRemoveReference;
};

For const, some of these definitions will be relatively trivial.

using Standard;
using Compiler;
using Attributes;
using Algorithms;
var Attribute Standard.Attributes.Const;
Const = [&] {
    var Attribute attribute;
    

    attribute.OnAdd = [&](type) {
        ForEach(type.member_variables, [&](member_variable) {
            member_variable.Type.AddAttribute(Const);
        });
        var mem_functions_copy := type->MemberFunctions;
        type.MemberFunctions.Clear();
        ForEach(mem_functions_copy, [&](member_function) {
            if (member_function.arguments["this"].HasAttribute(Const)) {
                type.MemberFunctions.Insert(member_function);
            }
        });
        type->OnChange(
            event = MemberFunctionAdd,
            action = [&](member_function) {
                if (!member_function.arguments["this"].HasAttribute(Const)) {
                    type.MemberFunctions.Remove(member_function);
                }
            }
        );

        type->OnChange(
            event = VariableAdd,
            action = [&](variable) {
                variable.AddAttribute(Const);
            }
        );
    };
    

    attribute.OnAddReference = [&](type) {
        attribute.OnAdd(type.Unref());
    };



    attribute.OnRemoveReference = [&](type) {
        throw Standard.Compiler.BuildError("Error: Cannot bind non-const reference to const object.");
    };
    attribute.OnRemove = [&](type) {
        type.MemberFunctions = type->MemberFunctions;
        type.MemberVariables = type->MemberVariables;
        // Also remove the OnChange functions
        // Not too sure how to make that happen :(
    };
    return attribute;
}();


So what does all this mean? It means that when you add const to a TypeReference, the type becomes const, and all the member variables of that type become const, and all member functions are removed except those where "this" is marked as const. It means that you can trivially add const to references, and that when you do so, the referred-to type becomes const. It means that if you attempt to instantiate a non-const reference to a const object, then the compiler will cry like a little girl. It means that if you add a member function or variable to a type, then the references will update themselves accordingly.

This is pretty much it for the definition of const and how const behaves. However, we still haven't defined what const means. Right now, we could use "const" to mean anything; it would be more aptly named TriviallyAddableNonReferenceRemovableDiscriminating. That definition comes from which members we choose to mark as const. Obviously primitive types will follow the conventions. Other specifiers, like "pure", could come with different costs or accesses.

Wednesday 27 July 2011

Valarray

Have a look at this simple SO question:

http://stackoverflow.com/questions/6850807/why-is-valarray-so-slow/6851501#6851501

This raises an equally simple question about "DeadMG++": what the hell to do about expression templates? In this case, an expression template can more than double the performance of the code. Now, in "DeadMG++" I already described a feature where you can take parsed functions as ASTs and create your own semantic interpretation of them. This could yield code like


c = [&]() { return a * b; };

Where c has an operator= that will take a function, of type .. some type which I have yet to nail down .. which has access to the whole expression tree. In this case, the writer has the opportunity to perform whatever semantics he wants. But what I might do is extend this to be individual expressions too. That would mean that you could write


c = a * b;


and c would be equatable to an expression, and can have whatever fun it wants optimizing the hell out of that expression, including CSE and all the rest, or whatever it decides should be efficient. Of course, I'll need to actually play with both of these systems before deciding what to do about them. Valarray itself, of course, I'll likely not have.

Now, I've already conceived of simplistic systems which could provide optimizations in more common cases. For example, consider operator overloads. In C++, each operator overload can only deal with its direct operands. However, I considered the possibility that an overload could actually take more than that. For example, let's say that in the addition operator of std::string, you could take any number of arguments. That would mean that where


std::string a;
a = a + a + a + a + a;


in C++ this would become

a = a + (a + (a + (a + a)));


or, more directly,

a.operator=(a.operator+(a.operator+(a.operator+(a.operator+(a)))));


However, in the above scenario in DeadMG++, this would actually become something more like

a.operator=(a.operator+(a, a, a, a));


The problem is that tasks like CSE can be repetitive and coding them in every type would be quite difficult, so I'd certainly have to try to Standardise some of it. I'm also considering mandating that operators and values must have the same semantics as primitives- i.e., that for any expression like (a = a + b), that the optimizer is legally allowed to reduce it to (a += b). In this case, CSE and other compiler optimizations would run before the expression is generated, and the class would only have to deal with the result, which is smoother and cleaner but less customizable. Of course, I just cut the need for expression templates as they are now with the expression in-built type, so it's questionable as to whether they would have any use that would not fall into the acceptable category. In order to address this properly, I'd need a "pure" specifier, which is "Please eliminate me", as well as "Please parallelize me". There's some considerable overlap between "pure", "const" and "threadsafe", and I'd really need to explicitly define further what means what, precisely.

Pure functions are necessarily valid const functions. Const functions are not necessarily valid pure functions, and neither of them are valid threadsafe functions- after all, you might const_cast away the constness and use something like memoizing, which would not break the promises to the compiler but it would not be thread safe, and in addition, the compiler cannot guarantee that any access is thread-safe.

Now, the real question is if I can make threadsafe more than just a promise. Of course, I could propagate it, but the problem is that determining threadsafety will be more than just checking that all accesses are thread safe. I'm not sure that it's even theoretically possible to determine threadsafety of a function automatically, even given all the knowledge in the world. Unless I want to crack out a virtual machine and run it in every permutation of being accessed and changed, etc.

Success := It comes to me

So, I realized that my previous musings on library and language separation are quite irrelevant. After all, I was already describing systems of compile-time function manipulation. So one simple idiom later, and those problems are, well, clearly resolved.

GenerateCode(function) {
    // Generate your own code from the contents of function here
    // And emit the result as output from the process
    // completely by-passing the main compiler
    // Alternatively, compile for JVM or CLR or whatever you feel like
}

main() {
    GenerateCode(real_main);
}

real_main() {
   // Really run the program here
}

Meta-programming is so powerful.

In other news, the parser construction is going a lot more smoothly. I grabbed Bison 2.5 using Cygwin, and it's considerably more impressive than its previous version. Adding a var keyword quickly resolved my ambiguity problems (turns out you can't function call and function define in the same scope, so all I had to do was disambiguate variables from everything else). All I'm doing now is wrapping up actually creating the AST, making sure the lexer has been updated with all the changes, and then it's time to semantically analyze, and by that, I mean get meta-programming. After that, all that's left is code generation, and probably updating for everything I notice I've forgotten. For example, right now, I've got no initialization lists in constructors, whoops.

I've actually been seriously considering implementing for JVM and CLR. They would hardly be the most maintainable Java/C# code you ever saw, but I think I can make it work. It would certainly increase the probability of the language being adopted. I even think that it can be implemented as a library, which would be so sick.

Oh lord, I might actually succeed.

Tuesday 26 July 2011

Language and Library separation, and some Library plans

Library and language separation. When the language is simple, it's easy to enforce a separation. However, as the language becomes more complex, it gets more difficult. For example, earlier, I posted about the need for a Standard.Map-like structure to define the type 'type' by. It would be much easier to recycle those definitions of how such a container should behave- not least for the implementers.

What I could do is simply leave the type as undefined, and merely mandate that 'type' can be used. In conjunction with the language providing an explicit cast in situations where it's needed, this should allow the type 'type' to be part of the Standard library, and even inherited from. What I don't like about this approach is that you could never write your own Standard library for a specific implementation, even with a copy of the operating system and processor architecture documentation. I guess that I'll have to live with extending that definition to "implementation documentation". Hell, it's not like you couldn't write- and use- your own compiler- at compile-time- and then run the resulting code- at compile-time- and use it to compile your own program-  if you really, really, really wanted to, I guess.

I should just include a note in the "Standard" saying, quite clearly, that the implementation shall document all APIs necessary to implement the whole Standard library.

Maybe I should begin constructing an actual Standard document. It would be much easier to have a full reference of all my ideas, it would be easier to communicate with people, and it would be easier to make sure that I haven't missed anything. Of course, a Standard is not an actual Standard without a long, drawn-out ISO committee waste of time. But a full specification would be an advantage.

Library changes. Firstly, as you may have gathered, there's now the use of sub-namespaces (the existing std namespace was way too cluttered). The containers will go in, imaginatively, Containers. I'm going to strip the unordered containers and call them hash instead, e.g. HashMap. That's just more descriptive. In addition, I will add an UnorderedVector. The main difference between Vector and UnorderedVector is that UnorderedVector is just a bucket full of stuff- for example, you can erase in O(1) by swapping to the end and then popping off the back.

Most things will remain mostly the same, but I'm definitely going to introduce a full range object and use it where appropriate to simplify the use of algorithms, and cut the number of overloads of many functions drastically and use named parameters instead- especially functions like constructors. Strings, I am so cutting the string/wstring debacle and just going to go UTF-16. Java and C# and Windows are all UTF-16, and I figure that it's just the most compatible way to go. I'm not sure about what the new IOStreams are going to look like, but they're certainly not going to look like the current ones. For example, I am leaving buffering to be an implementation detail and not going to define any of it in my interfaces, and I will probably go back to templates instead of virtual functions for polymorphism.

In addition, I'm going to add some more functional algorithms, like map and reduce, and I've talked about some Interop functionality. I also want to add some Standard GUI code. Honestly, the point of Standard GUI libraries is not to be the latest and greatest in GUI work, but something simple and functional that can be used to create simpler GUI applications.

I'm also definitely considering adding support for DSLs for some relatively ubiquitous languages like SQL (i.e., kind of like LINQ but it'll suck less in terms of error reporting) or XML as Standard.

Threading. I don't want to just wrap atomic operations or mutexes. I want something along the lines of TBB  or PPL. I want something smoother, more integrated. I also need to address the use of compile-time threading, which is something I want to hide from the user. I definitely don't want to force serialized compilation, but I also need to perform sometimes complex mutations at compile-time, and I'm going to need a powerful threading library to support that, and possibly a couple of language additions too.

I've also been thinking about mathematical support. Let's face it, there's really no need for everyone to define their own Point class or their own Vector3 class or their own Rectangle class. The Standard can provide that. Everybody's life would be much easier if the Standard provided a little more in terms of BLAS support. 

I also want to version the library separately from the language. If there's a problem, I want it resolved sooner rather than later. I don't want stuff that's bugged sitting around for a decade before it gets fixed. In addition, I want to be able to provide full source code for all functionality that isn't compiler, OS or processor specific- that is, all functionality built on functionality from the language. If I'm going to say "You must provide SharedPointer", then I want to have source code that is portable to show for it and say "Well, if you can't be bothered to write your own, here's mine so just copy and paste it and you're done".

What I will definitely not do is ever refer to the C or C++ Standards, or include the C or C++ Standard library. If I ever see anyone tag an SO question "DeadMG++/C++" without discussing interoperation, I'll flip my brains.

Another problem I've been considering is variadics. Ultimately, I feel like now that we can produce complex data structures at compile-time, they should be part of the library. However, I'm not totally sure how I could retain the relatively seamless integration of variadics, mostly as relates to deduction. Of course, regular iteration over them can now be done.

I guess that ultimately, "DeadMG++" really offers a new style, where functions and types are generated, and there's old-style "C++" where they are just literals. The new style is much more powerful and flexible, but the old style can be easier to use.

Monday 25 July 2011

Legacy types, type baggage, and CamelCase

I've been thinking about clearly separating some functionality out into a standard Legacy namespace. For example, function pointers. I already had plans to Standardise a small JIT functionality, and this will give me the opportunity that I need. I want to make it very clear that function pointers should not be used for regular code anymore.

func(float, float) -> int {
    return 1;
}
std.functional.function(int(float, float)) my_func(func);
std.legacy.function_jit("cdecl", int(float, float)) my_func_jit(my_func);
std.legacy.function_pointer("cdecl", int(float, float)) my_func_ptr = my_func_jit; // implicit cast


Other things I'm expecting to put in the legacy namespace will be tools for dealing with C-strings, for example, as well as (maybe?) things like error codes. Perhaps instead of legacy, it should be interop, and I'll add things for dealing with things like BSTRs, other string encodings, that kind of thing.

Another thing I've been thinking about is type baggage. For example, if you look at a C++ type, then they don't just have member functions and member variables (and static variables)- they also have typedefs and constants, which I am terming compile-time baggage. Now, in "DeadMG++", the simplest way to do this would be to just pack the baggage into a struct. However, of course, a struct is not a type and cannot be used to declare types, which would be somewhat problematic. Fortunately, casts are here to the rescue. Effectively, this would be given the same treatment as bool- language constructs which expect a type, such as variable definitions, would explicitly cast the result of any expression to 'type'. Of course, users might find that an implicit or even automatic cast would be the right thing to do, but that's none of my business.

More, I've been thinking about naming conventions. I always have a naming convention problem. I suck up the naming convention of code I'm using like a vacuum cleaner, which makes my code a bit rough when I'm dealing with std::vector<T>::push_back() and ReadFile in the same code. That's why I've decided that "DeadMG++" will use the Windows API conventions- which are also followed by C# and (close to) Java. That is, I will likely use things like Standard.Legacy.FunctionPointer, as opposed to std.legacy.function_ptr. Oh, and I also decided to use the dot notation for namespaces- they're objects like everything else, right?

The simple fact is that now that using definitions are not broken, you can effectively rename things however you like, and it also makes using verbose names much less of a big deal. You can see this in C#- who complains about System.Collections.Generic.Dictionary? Nobody, because everybody just writes using System.Collections.Generic;.

Sunday 24 July 2011

GPU parsing?

See, I figure that, in any specific program of a non-trivial size, there must be millions of statements and/or expressions that are only a few characters, and that they could all be lexed and parsed in parallel. This sounds like a job for a GPU- massive parallelisation on a task. The only problem is that lexing and parsing will be very branchy integer code, and GPUs prefer fewer branches and more floating-point. I wonder if it would be viable?

Apparently, Fermi (and likely AMD's new architecture) offer equal 32-bit int and 32-bit floating-point performance. Still very branchy code... but that's what mass parallelism is for, amirite?

Now, first you need to know the scope you're in - type, namespace, function. Consider the ANTLR grammar for a namespace:

namespace_definition: 'namespace' identifier '{' 
    (variable_definition | function_definition | namespace_definition)+ 
'}'

The "Parallelize Me" sign here is that big plus sign saying "One or more". So we start off in serial, and let's say that we find the namespace token, and we know we're in a namespace. What we do is we find the terminating '}', so we know everything up to that point is a sequence of (variable_definition | function_definition | namespace_definition). The trick to the parallelising here is that each of them is clearly delineated. Let's have a further look at the definition of these rules.

variable_definition : [stuff] ';'
function_definition : 'function' [stuff] '{' [stuff] '}'
namespace_definition : 'namespace' [stuff] '{' [stuff] '}'


Once we have the list of contents of the original namespace, we just search through it and partition off all the contents into their separate rule invocations. Another sweet trick here is that we don't need to know the difference between a function_definition and a namespace_definition to parallelise at this point- we only need to know where they lie. Once we've separated out the invocations, we just stick them in a list, and parse them in parallel.

Now, an important point is that this is only partially independent. For example, inside the curly brackets, you need to know whether you're a namespace or a function before you can parse the contents. Once you know, however, the above rule can be trivially re-applied, until you reach the "atom" of the grammar- expression-statements. I haven't come up with any way to parallelize the parsing of most individual expressions. Some, like indexing expressions, can be parallelized, and importantly, type literals can also be parallelised, should I choose to include them. However, I'm going to go out on a limb and say that in a non-trivial program, there could be millions of expressions.

Now, ideally, I'd split this workload up into two segments, CPU and GPU parsing. CPU parsing will do the initial separation work- basically, determine all the expressions. The GPU will then parse all the expressions. Of course, since C++ AMP doesn't exist yet, then I have no desire to actually write a single line of GPGPU code and will probably just stick to CPU parsing, but it's an interesting idea.

Saturday 23 July 2011

Literal syntax

I've been thinking again about literal syntax- that is, defining type literals.

type t = new type;
t.public_functions.insert("DoStuff", [&](this) {
        return 1;
});

type t {
    DoStuff() {
        return 1;
    }
}


Is the first form really that bad? I could just cut the second form entirely. It would be a bit ... strange, I guess. The literal is more natural. But I honestly can't think of any reason to actually keep it, and ditching it would save the grammar from having to deal with it. I could also avoid having to deal with "this" as a keyword- along with many other keywords I've managed to eliminate. Like, say, dynamic_cast, and new. I wonder if the other cast keywords can also be eliminated?

If I went with form 1 instead of form 2, I could have public/private/protected just as strings.

Could it also be used to remove declarations/definitions? I mean as an implementation detail. When a function is added to the type, then it doesn't have to make full semantic sense until the pass is complete and the type is instantiated. Adding the function would effectively "declare" it. If all functions were written in Form 1 instead of Form 2, then I would never need declarations.

The problem is that some devs may well find it unintuitive. It's one thing to attract people with "You can do whatever the hell you want" and another to say "You have to write in this new crazy syntax". And what about free functions? Whilst it would be great to transform Form 2 into Form 1, I'm not sure that dropping Form 1 would actually be a smart idea.

Lexing and parsing

So, I manually constructed a working lexer. It wasn't that tough. Now that I'm constructing a working parser, it's getting a little tougher, although on the plus side, the parallelisable design is going to be sweet. Consider the following:

vector(identifier) var;

versus

vector(identifier) { }

Owch. This gets even hairier if I start passing, say, a lambda to my metafunction. Now, I believe that this is resolvable, but, well, it would be difficult. As such, I've decided that pretty much the only unambiguous way to make this work is going to be to introduce a "function" keyword.

function vector(identifier) {} // I'm a function!
vector(identifier) var; // I'm a variable!


Earlier decision points will make it much easier to parallelise, as well. I also decided to, well, flat out ditch type literals, and for the same reason I will probably not bother with initializer lists either- or certainly not to begin with. I will now do

t = [] {
    type t;
    // some more stuff
    return t;
}();



The previous system will be, quite simply, not in the language. Hopefully once the parser is completed, I can begin working on the semantic analyzer and get back to posting about something that isn't either grammar or theory.

Ultimately, the grammar can be fine-tuned later. What I need to do is get cracking and finish a parser that can do more than just parse namespaces (which it currently can, but that's not terribly impressive), and then move on to semantic analysis.

Friday 22 July 2011

Mutable Types Part 2: dynamic

Another implication of mutable types and unified compilation is dynamic. Here I'm referring to the specifier used in .NET 4 for types whose operations are determined at run-time, and for which the compiler will accept anything. In "DeadMG++" we can implement it as a library using mutable types. dynamic will operate like boost::any does in C++ in terms of holding and storing any value. The new trick is how it can be operated on dynamically- that is, when you attempt to access a member or function, a new one will be generated and added on the fly, implemented either to do the access or to throw.

type dynamic = struct {
    type base = struct {}
    std::unique_ptr(base) ptr;
public:
    dynamic(object) {
        type derived = struct : base {
            decltype(object).remove_reference() value;
            // implement constructors & etc
        }
        ptr = std::dynamic_allocate(std::unique, derived, forward(object));
    }
    // etc
};
dynamic.on_change(
    event = member_variable_access,
    action = [&](name) {
        base.public_functions.insert(
            name := "__" + name, // reserved names
            function := [&](this) {
                if (decltype(this).has_public_variable(name))
                    return dynamic(this.name);
                else
                    throw std::no_operation(...);
            },
            virtual := true
        );


    }
    // Repeat for member function

);


This is, of course, somewhat simplified pseudo-code. I haven't gone anywhere near locking down the API of type. However, I hope that you can get the gist of it.
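The storage half of this- holding a value of any type behind a common base, as boost::any does- looks like the following in plain C++ (simplified: no copy support, and get() is my own accessor for demonstration):

```cpp
#include <memory>
#include <utility>

// Minimal type-erased holder: the pattern the pseudo-code above assumes,
// with a hidden "derived" generated per stored type.
class dyn {
    struct base {
        virtual ~base() = default;
    };
    template <typename T>
    struct derived : base {
        T value;
        explicit derived(T v) : value(std::move(v)) {}
    };
    std::unique_ptr<base> ptr;
public:
    template <typename T>
    explicit dyn(T object)
        : ptr(new derived<T>(std::move(object))) {}

    // Recover the stored value, or null if the stored type doesn't match.
    template <typename T>
    T* get() {
        auto* d = dynamic_cast<derived<T>*>(ptr.get());
        return d ? &d->value : nullptr;
    }
};
```

The dynamic member generation is the part C++ cannot do; the holder itself is entirely conventional.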

One language that can do dynamic types, static types, type inference, meta-programming? Now that's what I call generic.

Thursday 21 July 2011

Unified Compilation

In my previous post, I mentioned unified compilation, but in this post I'm going to go into a little more depth. So what is unified compilation? Quite simply, unified compilation is the idea that everything gets compiled at one time, unless you have a desperate need for a C interface and run-time loading. As a static-favouring language, that would mean at compile-time. Now, I've already expressed the idea that all code could be expressed as a meta-function, and this is just the next step in that line of reasoning. This would imply that basically all "DeadMG++" code would be shipped as meta-functions that add the code to the compiler structures.

The downsides, such as they are, are quite simple- you can't change your code without a re-compile. The other potential downside is longer compile-times. Now, the language offers some serious compile-time optimizations that I've already discussed, mostly involving compiling functions to assembly and then executing them to compile the program, but quite simply, it's doing a lot more work at compile-time, so I'm not that bothered about longer compile-times, and the users see a real ease-of-use benefit from it. Ultimately, programmer time is more valuable than compiler time. Compiler time, you can solve by throwing money at the problem. Programmers need training, experience, offices, that sort of thing.

The upsides are very numerous. The very best optimization, as the whole program is known at once. Shipping "templates" that aren't in source code form and don't take forever and a million compiler-specific changes to make work. Compile-time reflection (run-time reflection can likely be implemented on top of it) and other meta-programming. Type inference across library boundaries. No binary compatibility problems with using complex types, throwing exceptions, adding overloads, or anything like that. Code for different architectures and platforms could be shipped as a single library. It would be similar to how managed code works, except it would operate at compile-time. And be better, of course.

This doesn't exclude dynamic loading. I would expect that the very minimum would be expressed as a dynamic load, if any, and then the rest would be in pre-compiled static loading format.

Wednesday 20 July 2011

The value of mutable types

So, previously, I've been discussing some of the problems that come up with mutable types. Recently, I've been revisiting some of the advantages, and they may cause me to re-think my approach to mutability of types. For example, take this simple, easy dynamic_cast implementation.

type t = struct {};
t.public_functions.insert(
    name := "dynamic_cast",
    function := [&](type other_t)(this) {
        if (decltype(this).inherits_from_or_is(other_t))
            return forward(this);
        else
            throw std::bad_cast();
    },
    virtual := true
);


The logic is pretty simple. For each dynamic_cast attempted, insert a new function (the instantiation) into the vtable and then call it, and we return this or throw appropriately. This requires two things: first, unified compilation, which I will go into in another post, and secondly, that the type t is still mutable, and that all its derived classes are still mutable. This means that, in my previously proposed solution, this can't work for any type that you may wish to place somewhere non-mutability is required, or any type where any of its derived types are placed somewhere non-mutability is required.

The upsides: first, assuming that indexing into a giant vtable isn't a problem, this is one hell of a faster implementation than the linked-list scanning seen in C++ dynamic_cast implementations. Second, you can dynamic_cast types that don't currently have virtual functions- and indeed, if a type is never inherited from, then we can issue appropriate optimizations/warnings/etc at compile-time. Third, dynamic_cast would no longer have to be part of the language, which is a massive win. Of course, this technique generalizes to any virtual function which has compile-time arguments, and that could be a big win. I don't think even managed languages can do that.
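To make the vtable idea concrete, here is the same trick hand-written in today's C++, with the per-target function generation done manually (C++ can't add virtuals after the fact). One virtual call replaces the RTTI graph walk:

```cpp
#include <typeinfo>

struct Derived;  // forward declaration

struct Base {
    virtual ~Base() = default;
    // The "instantiation" of dynamic_cast<Derived> as a dedicated
    // vtable slot: one indirect call, no inheritance-graph scan.
    virtual Derived& cast_to_Derived();
};

struct Derived : Base {
    Derived& cast_to_Derived() override { return *this; }
};

// Default behaviour: this object is not a Derived, so the cast fails.
Derived& Base::cast_to_Derived() { throw std::bad_cast(); }
```

In the mutable-types scheme, the `cast_to_Derived` slot and its overrides would be generated by the compiler per attempted cast, rather than written by hand as here.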

However, an interesting thing hit me- mutability of types could solve itself. So I've decided to ditch the immutable_type thing I had going on before, and I'm thinking about something more fine-grained. I will have an event-based model that can fire functions when types are modified. This way, we can even choose what to do in reaction. If you want an immutable type, you can simply set the event to "any" and throw an error on any change.

type vector(type t, bool throw_on_change = false) {
    if (t.is_pod()) {
        type retval = ...;
        // generate an optimized std::vector
        t.on_change(
             
             event := is_not_pod,
             action := [&]() {  
                 if (throw_on_change)
                     throw build_error("Attempted to change type to non-POD.");
                 else
                     // change retval to be non-optimized now
             }
        );
        return retval; 
    }
    // Do non-optimized version

}

This is more effort on the part of type creators, but vastly more fine-grained and controllable, and it saves everyone from having to use immutable_type.
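The event model here is just the observer pattern applied to type objects. A minimal C++ sketch of the shape (all names are mine, purely illustrative):

```cpp
#include <functional>
#include <string>
#include <vector>

// A bare-bones version of the on_change model: a mutable type object
// that fires registered callbacks whenever a member is added. An
// observer that wants immutability can simply throw from its handler.
struct type_model {
    std::vector<std::string> members;
    std::vector<std::function<void(const std::string&)>> on_change;

    void insert_member(const std::string& name) {
        members.push_back(name);
        for (auto& handler : on_change)
            handler(name);  // let observers react to the modification
    }
};
```

A real implementation would also carry an event filter (is_not_pod, member_variable_access, and so on) rather than firing every handler on every change.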

Tuesday 19 July 2011

Type object interface

A better interface would be to use containers. I've always considered types to be reflectable, and I always planned to include static reflection in the language- it's one of the primary reasons I wanted to move to a types-are-objects system. In this case, it could be considerably easier to define the interface as containers. You could consider "type" as a type, like

class type {
public:
    std::map(std::string, type) private_variables;
    //... and more
};
type MyClass = new type;
MyClass.private_variables.insert("MyArray", int.array(10));
MyClass.public_functions.insert("operator[]", [](this, index)
{
    return forward(this).MyArray[index];
});
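To see why this buys reflection for free, here's the same idea in plain C++ with invented stand-in descriptor types: enumerating members is just walking a map.

```cpp
#include <functional>
#include <map>
#include <string>

// Types-as-containers: the member lists are ordinary containers, so
// "what members does this type have?" is answered by iteration.
// The descriptor types here are placeholders of my own invention.
struct type_desc {
    std::map<std::string, std::string> private_variables;            // name -> type name
    std::map<std::string, std::function<int(int)>> public_functions; // name -> body
};
```

Usage mirrors the pseudo-code: `MyClass.private_variables["MyArray"] = "int[10]";` inserts a member, and any tool can inspect the maps afterwards.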


This will trivially extend our API to allow for having a good look at what's in those containers right now. The thing is that I definitely do not want language primitive types to be defined in terms of library types. I want a strict separation between language and library- no std::type_info in "DeadMG++". Even if I supposedly encapsulated the type, in reality any Standard library would have to follow its conventions.

I guess the reality is that unless I move a whole bunch of library stuff into the language, it'll be impossible to separate the two. I should just do my best to specify as little as possible about the implementation of type.

Monday 18 July 2011

Forwarding on member functions

What I'm looking to do is eliminate member functions as a syntactic element, except where accessing them is concerned. The main reason I want to do this is const correctness- or even volatile correctness. Everyone is going to be familiar with a relatively basic situation:

class MyClass {
    int MyArray[10];
public:
    int& operator[](std::size_t index) {
        return MyArray[index];
    }
    const int& operator[](std::size_t index) const {
        return MyArray[index];
    }
};


Does anyone ever add the volatile overload? Does anyone ever add the const and volatile overload? This is as bad as const/non-const references in C++03. Defining these as extensions may be considerably easier.

type MyClass = new type;
MyClass.add_private_member_variable("MyArray", int.array(10));
MyClass.add_public_member_function("operator[]", 
[](this, index) -> auto.rvalue_reference()
{
    return forward(this).MyArray[index];
});


In this format, you can see that we can perfectly forward on this, something which is impossible in C++0x. Our returning of that reference will always respect const-correctness, volatile-correctness, lvalue-rvalue correctness, and any other kind of correctness I can think of that may be forwarded, which may include thread-safety correctness (thread-correctness?) in a single function.
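For comparison, the closest C++0x-era workaround is to route both overloads through a static function template that takes the object explicitly- the object parameter then forwards like any other argument. A sketch (member names borrowed from the example above):

```cpp
#include <cstddef>
#include <utility>

class MyClass {
    int MyArray[10] = {};

    // One implementation serves const and non-const objects alike:
    // Self deduces the qualifiers of the object and forwards them on,
    // so the returned reference keeps the right const-ness.
    template <typename Self>
    static auto index(Self&& self, std::size_t i)
        -> decltype((std::forward<Self>(self).MyArray[i])) {
        return std::forward<Self>(self).MyArray[i];
    }

public:
    int&       operator[](std::size_t i)       { return index(*this, i); }
    const int& operator[](std::size_t i) const { return index(*this, i); }
};
```

The thin public overloads still have to be written once per qualifier, which is exactly the boilerplate the language design above is trying to delete.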

Sunday 17 July 2011

Unparsable

So I have a horrifically context-sensitive grammar. Maybe this is why I had so many problems with my grammar in ANTLR and such.

Consider this simple example:

GetTypes()[10] identifier;

An array of type "GetTypes()", or an object whose type is the tenth element of GetTypes()?

This gets even worse when you start introducing pointer specifiers.

GetTypes()[10]* identifier;

Pointer to an array of ten elements, pointer to the tenth element, or even just a regular old multiply? Hence I've decided to change the grammar and remove all type modifiers. They will now be member functions on "type". For example,

GetTypes().array(10).pointer() identifier;

This should be differentiable from

GetTypes()[10]* identifier;

even if it could be fairly verbose. What I'm really looking to do in my custom parser is to find the semicolon first and work backwards. The trick is that no two expressions can sit adjacent without syntax between them, like an operator or comma. If I find such a case, then it must be a variable definition, because it's not a valid expression. Of course, determining the difference between any of the above cases and a statement (that isn't an expression) will be easy- it really doesn't look like a for loop or anything like that.

Saturday 16 July 2011

Flex and Yacc

Damn, these tools suck. Flex doesn't have a Windows version newer than 1997, and will do fun things like #include <iostream.h>, forward-declare Standard classes, and use Standard classes before (attempting) to include the correct header. Yacc will call functions without declaring them- implicit int and other glories of K&R C. I'm just going to write my own parser. Maybe it'll be slow, maybe it'll be buggy, but hell, it'll at least compile.

I know that I haven't been posting very frequently right now. However, you can feel glad that there are five new posts scheduled to go up over the next five days, and they're good ones, so stay tuned.

Damn, I never figured that defining the grammar would be so difficult. I guess that it's easy for me to think "the C grammar, but cut all the declarator crap, add some C++ features and types as expressions" and be done with it, but actually defining it so that I can parse it is a significantly greater challenge, even using compiler-compiler tools like yacc and lex. It turns out that you have to define grammar rules in a very context-sensitive fashion with respect to exactly what is in the grammar, and modifying an existing grammar or defining a new one is a decidedly non-trivial process.

For example, take the following example grammar:

expression : arithmetic-expression
           | access-expression;

access-expression : expression '[' expression ']';

arithmetic-expression : expression '+' expression;


Of course, whilst this is the intent of the code, it's actually thoroughly ambiguous, and a simple input will show the problem:

1 + 2[3]

Is it a:

(1 + 2)[3]

or b:

1 + (2[3])

This means that the disambiguating parentheses have to be demanded by the grammar itself, and the recursion has to be stratified explicitly. The expression syntax instead has to look something like this:

expression : '(' arithmetic_expression ')'
           | access_expression;

access_expression : arithmetic_expression '[' arithmetic_expression ']';
arithmetic_expression : bitwise_expression '+' bitwise_expression;


You get the picture. A pity that this process cannot be automated. The grammar for even a relatively simple language like C is actually hundreds of lines long, a lot more complex than I had imagined. However, should I successfully complete a grammar, then both the parser and the lexer will be generated for me, which is nice, and the first step towards realizing my meta-programming type-inferring dream.
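For what it's worth, the conventional fix is stratification by precedence rather than mandatory parentheses: each rule only refers to the next-tighter level, which is why 1 + 2[3] parses as 1 + (2[3]) without extra brackets. As hand-written recursive descent (a toy of my own, single-character tokens and no error handling):

```cpp
#include <cstddef>
#include <string>

// expression := postfix ('+' postfix)*
// postfix    := primary ('[' expression ']')*
// primary    := digit | '(' expression ')'
// Returns a fully parenthesised rendering of the parse.
struct parser {
    std::string src;
    std::size_t pos = 0;

    std::string expression() {
        std::string left = postfix();
        while (pos < src.size() && src[pos] == '+') {
            ++pos;  // consume '+'
            left = "(" + left + "+" + postfix() + ")";
        }
        return left;
    }
    std::string postfix() {
        std::string base = primary();
        while (pos < src.size() && src[pos] == '[') {
            ++pos;  // consume '['
            std::string idx = expression();
            ++pos;  // consume ']'
            base = "(" + base + "[" + idx + "])";
        }
        return base;
    }
    std::string primary() {
        if (src[pos] == '(') {
            ++pos;  // consume '('
            std::string e = expression();
            ++pos;  // consume ')'
            return e;
        }
        return std::string(1, src[pos++]);
    }
};
```

Yacc offers the same thing declaratively via %left/%prec precedence declarations, which is how real grammars avoid writing one rule per level by hand.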

Thursday 14 July 2011

Regular source files as functions

Well, I came to the obvious conclusion that compile-time functions could be functions. But what about regular code too?

main.wc:
namespace bar {
    foo() {}
}
main() {
     bar::foo();
}


void ExportedMainWC(Compiler* comp) {
    static FunctionAST fooast;
    static CallAST callfooast(fooast);
    static FunctionAST mainast(callfooast);
    Namespace* ptr;
    if (comp->global_namespace.find("bar") != comp->global_namespace.end())
        ptr = comp->global_namespace["bar"];
    else
        ptr = comp->global_namespace["bar"] = new Namespace;
    ptr->add_function("foo", fooast);
    comp->global_namespace.add_function("main", mainast);
}



The fastest code is machine code. It's kind of like pre-compiled AST. What I'd need to do is add one function for declarations and another for definitions, because I just realized that I completely don't support that in this quick example. But it's still genius.

Monday 11 July 2011

Grammar

Right. The time has come to, hopefully, determine a grammar. So far, I know what I want from the grammar, but whether it's genuinely feasible is another matter, and actually working through the EBNF for it is going to be tough, because I usually hate formally defining things. The plus side is that ditching templates will remove a chunk of ambiguity, and the resulting grammar shouldn't be too bad to deal with- it's mostly C's easy-mode grammar. The down side is that having arbitrary expressions as types is probably going to hurt, and I'm still going to have to end up with a templates-esque grammar.

So far, I'm thinking of just using regular brackets for function arguments, regardless of which pass they're at. That would turn static_cast<float>(0) into static_cast(float, 0). The problem is defining the grammar for functions which make extensive use of computed types and all of the implicit auto that I've been thinking about adding.

// implicit auto
main() {
    // implicit auto
    DatabaseTableTypes = GetTypesForDatabaseTables("my_db", "admin", "password");
    DatabaseTableTypes["CustomerTable"] NewRecord;
}


So grammatically, a "DeadMG++" program is comprised of definitions. Namespaces, functions, classes, variables, whatever.





program := definitions
definitions := definition [definitions]


definition := type-creation
           := variable-definition
           := namespace-definition
           := function-definition


using-definition := 'using' identifier '=' type-definition


type-creation := type-literal
              := using-definition


pointer-modifier := '*' ['const']
array-modifier := '[' expression ']' [array-modifier] [pointer-modifier]
reference-modifier := '&'
                   := '&&'


type-modifiers := pointer-modifier
               := array-modifier
               := reference-modifier


type-base := ['const'] ['volatile'] identifier
          := expression
          := type-literal
          := 'auto'


type-definition := type-base [type-modifiers]
                
variable-definition := ['static'] [type-definition] identifier
                    := ['static'] [type-definition] identifier function-call
                    := ['static'] [type-definition] identifier '=' expression


namespace-definition := 'namespace' identifier '{' [definitions] '}'


function-definition := [type-definition] identifier function-arguments-list compound-statement


function-arguments-list := '(' function-arguments ')' [function-arguments-list]


function-arguments := [type-definition] identifier


variable-modifiers := ['static'] ['auto'] [type-definition]


type-definition-list := type-definition [',' type-definition-list]


function-call := '(' [expression-list] ')' [function-call]


function-call-expression := expression function-call


identifier := [_\w][_\w\d]*


cast-expression := 'static_cast' function-call
                := 'dynamic_cast' function-call
                := 'reinterpret_cast' function-call
                := 'const_cast' function-call


arithmetic-expression := expression '*' expression
                      := expression '+' expression
                      := expression '/' expression
                      := expression '-' expression
                      := '+' expression
                      := '-' expression


boolean-expression := expression '>' expression
                   := expression '<' expression
                   := expression '==' expression
                   := expression '!=' expression
                   := expression '<=' expression
                   := expression '>=' expression


member-expression := expression '.' identifier
                  := expression '->' identifier


return-statement := 'return' expression


type-literal := 'type' [':' type-definition-list] '{' [definitions] '}'


integral-literal := \d+
                 := '0x' [\dA-F]+


floating-point-literal := \d+ '.' \d+


literal-expression := type-literal
                   := integral-literal
                   := floating-point-literal


throw-statement := 'throw' expression
try-statement := 'try' compound-statement catch-statement
catch-statement := 'catch' '(' type-definition identifier ')' compound-statement [catch-statement]
                := 'catch' '(' '...' ')' compound-statement


access-expression := expression '[' expression ']'


assignment-expression := expression '=' expression


this-expression := 'this'


expression-list := [expression [',' expression-list]]

expression := member-expression
           := boolean-expression
           := arithmetic-expression
           := cast-expression
           := function-call-expression
           := access-expression
           := assignment-expression
           := type-definition
           := identifier
           := this-expression
           := literal-expression


if-statement := 'if' '(' expression ')' statement [else-statement]
else-statement := 'else' statement


label-statement := identifier ':'
goto-statement := 'goto' identifier


for-statement := 'for' '(' statement ';' expression ';' expression ')' compound-statement
while-statement := 'while' '(' expression ')' compound-statement
do-while-statement := 'do' compound-statement 'while' '(' expression ')'
case-statement := 'case' expression ':' statement-list [case-statement]
               := 'default' ':' statement-list
switch-statement := 'switch' '(' expression ')' '{' case-statement '}'


break-statement := 'break'
continue-statement := 'continue'


statement := expression
          := return-statement
          := if-statement
          := throw-statement
          := catch-statement
          := try-statement
          := compound-statement
          := label-statement
          := goto-statement
          := for-statement
          := while-statement
          := do-while-statement
          := switch-statement
          := break-statement
          := continue-statement


statement-list := statement [statement-list]


compound-statement := '{' [statement-list] '}'



Still missing

variadics
exception specifiers
lambdas
default arguments
enums

Likely, the grammar is flawed. I'm pretty sure that at the moment you can grammatically do stupid things in many places- it's not tightly restricted enough. I need to work it again from the top down; some of the bottom stuff should be fine.

As correctly pointed out, string literals are missing, and catch statements aren't correctly limited to being after try statements.

Saturday 9 July 2011

Automatic type deduction

I think that DRY is a very important software engineering principle. It is, however, relatively amazing how often our basic programming systems violate it.

class f {
    std::string func();
};
std::string f::func() {
    return std::string("");
}


Count the violations. We said that func returns a std::string twice and implied it at least once, and we also stated that f has a function called func twice.

This is partially resolved by multiple-pass compilation.

class f {
    std::string func() {
        return std::string("");
    }
};


Now we have an improvement- we only told the compiler about f and func once, but we've still doubled up on the return type. As such, I think that automatic type deduction should be the default pretty much everywhere: for stack-based variables, for inferrable return types (for some definition of inferrable), for argument types. This would be similar to the polymorphic lambdas proposal for C++0x that didn't make it. This can make for some interesting, fluent code when dealing with value types.

template<typename lhs_type, typename rhs_type> auto add(lhs_type&& lhs, rhs_type&& rhs)
-> decltype(std::forward<lhs_type>(lhs) + std::forward<rhs_type>(rhs)) {
    return std::forward<lhs_type>(lhs) + std::forward<rhs_type>(rhs);
}


versus

add(lhs, rhs) {
    return lhs + rhs;
}


Damn, that'd be a breath of fresh air. Of course, the original add will do better when dealing with things that are already lvalues, as the second performs unnecessary copying. So I need to brush up on my lvalue and rvalue references. I want to change auto to be auto&& instead, and I also want forward to not need an explicit argument anymore. That would mean that I could have

add(lhs, rhs) {
    return forward(lhs) + forward(rhs);
}


But, apparently, I still don't understand rvalue references properly. Amazing that anyone can deal with them beyond calling the Standard library to handle it. I've also been thinking of a kind of last-reference rule, where the last reference to a variable in a function is implicitly an rvalue reference. Not only would that substantially shorten some shorter functions, but in some cases, notably involving lambdas which take by value, it could result in moving rather than copying. The trouble with this is when you start involving pointers... as usual. It would be so nice if I could just ban them outright. Unfortunately, re-bindable aliases are probably a necessary fact of life. Probably.

I might still be able to get away with

add(auto&& lhs, auto&& rhs) {
    return forward(decltype(lhs))(lhs) + forward(decltype(rhs))(rhs);
}

Edit: I could, of course, just make auto&& the default for function arguments, and auto for returns, local variables, member variables ... :) I'd still have to deal with the problem of forward, though, that really needs cleaning up. I could add a language feature, called forward, that does the same job that doesn't need an explicit type argument if I can't come up with a library feature.

add(lhs, rhs) {
    return forward(lhs) + forward(rhs);
}


It's not quite Python or Lua, but it's pretty damn close.
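Incidentally, that auto&& spelling is no longer hypothetical C++: since C++20, auto&& function parameters are legal and declare an abbreviated function template, and std::forward<decltype(arg)>(arg) is the stock way to forward them without naming a type parameter.

```cpp
#include <utility>

// C++20: each auto&& parameter makes this a template with a deduced
// forwarding reference, so lvalues stay lvalues and rvalues stay rvalues.
auto add(auto&& lhs, auto&& rhs) {
    return std::forward<decltype(lhs)>(lhs) + std::forward<decltype(rhs)>(rhs);
}
```

The forward still names its type argument via decltype, though; a forward that needs no argument at all remains a language-level feature.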

Lock

That's pretty much it. I've decided that there's little choice except to lock a regular unordered_map. The issue is that when you come to more than a couple of parameters, it's infeasible to implement the whole thing atomically. So the result would look something like this (ignoring immutability):

threadsafe type vector()(type t) {
    static unordered_map(type, type) cache;
    static lock l;
    scoped_lock sl(l);
    if (cache.find(t) != cache.end())
        return cache[t];
    type retval = struct {
        ...
    }
    return cache[t] = retval;
}



Not sure how the thread-safe specifiers are going to work quite yet. Static variables make me iffy, but they're definitely not part of the interface, and they're still thread-safe if I initialize them before making any calls, so I'm not feeling too bad about it. Of course, as this is all going to be transformed before anything happens, it shouldn't be too hard to transform it into something else, like

struct vector {
    unordered_map<type, type> cache;
    lock l;
    threadsafe type operator()(type t) {
        scoped_lock sl(l);
        if (cache.find(t) != cache.end())
            return cache[t];
        type retval = struct {
            ...
        }
        return cache[t] = retval;
    }
};



It's thread-safe and referentially transparent. Now we'll just add a dose of immutability (don't want anyone to play with my vectors or break their rules).

threadsafe immutable_type vector()(immutable_type t) {
    static unordered_map(immutable_type, immutable_type) cache;
    static lock l;
    scoped_lock sl(l);
    if (cache.find(t) != cache.end())
        return cache[t];
    type retval = struct {
        ...
    }
    return cache[t] = retval;
}

Problem solved?
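The locked memoization above translates to ordinary C++ almost mechanically. A sketch using strings as stand-ins for type identity (the key and value types are placeholders, not a design decision):

```cpp
#include <mutex>
#include <string>
#include <unordered_map>

// Thread-safe memoization: every caller asking for the same key gets
// the same cached value, mirroring the vector() metafunction above.
std::string make_vector_type(const std::string& element) {
    static std::unordered_map<std::string, std::string> cache;
    static std::mutex m;
    std::lock_guard<std::mutex> lock(m);
    auto it = cache.find(element);
    if (it != cache.end())
        return it->second;
    std::string result = "vector(" + element + ")";  // stand-in for building the type
    cache.emplace(element, result);
    return result;
}
```

Note that C++11 function-local statics are themselves initialized thread-safely, which takes some of the "static variables make me iffy" sting out of this shape.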

Friday 8 July 2011

Food- boredom vs productivity

I've got a rather annoying stomach malady lately (lately being for the last six months) and if you're a poor soul who has to deal with me in the C++ chat, you already know this. So my current diet is exclusively one tin of spaghetti and sausages, and two bowls of yoghurt and banana, in one day. That's about 1300 calories max. But I'm still not losing weight at a particularly good rate.

The simple fact is that after eating the same stuff for six months, I'm just bored of it. Any old junk that won't make me sick, I'll buy it, just for the variety. And I live above a food shop, so it's quite easy for me to obtain said junk. This is seriously impinging on my efforts to lose weight. But it has another, nasty side effect, which is that I'm just so damn lazy. I'd be better off walking to MacDonalds, having one, and walking back every day than not going at all.

Of course, I could go home, where my parents will handily prevent me from eating cookies and make me walk the dog, which should be plenty. But then I'd have to compete with my brother and sister for the internet and use some srsly substandard hardware. Save electricity and food bills though.

Implementation Details

Haven't posted in a while. I've got a lot to say this time around though - I've been working more on N-pass. I've been coming a cropper against a few implementation details.

Firstly, we come to referential transparency. In C++ it's impossible to make a non-referentially-transparent template, but in "DeadMG++" it would be trivial to do so- especially since the source of the type may not even be present. Thus, how could you implement type equality? So far, I've decided to have a hash mechanism, where I will hash the arguments and use a thread-safe cache to ensure memoization of "pure" functions. This would effectively mean that for all the calls to std::containers::vector()(int), there is only ever actually one call. Non-pure functions will always return different types.

The trouble with this is, firstly, that "pure" functions not only have to guarantee referential transparency, but must also take only hashable arguments- a limitation I'm not a particular fan of. Whilst most relatively trivial arguments will be hashable- types, strings, integers/floats, etc- it would be perfectly reasonable to have arguments which are not hashable, in which case I would be unable to guarantee type equality. I especially don't like it because it would bind the library (hashing) to the language. Secondly, finding a concurrent cache structure is not a trivial undertaking.

And then I have another issue. Consider C++ code such as

template<typename T> void f(std::vector<T> t);



In "DeadMG++" then this would roughly correspond to

void f(type T)(std::containers::vector()(T) t);


However, this would mean that, given a type, I would have to find its source function call, basically. In the general case, I'm guessing this would be akin to solving the Halting Problem- especially since the source code need not exist. I'm considering just not supporting it; I think it's hardly the generic code I want to write, but I'm not sure that I want easily demonstrable code that C++ supports and I don't.

Thirdly, I'm also combating the problem of mutable types. Mutable types are important, but they also offer possibilities for the invariants to break. Right now, I'm dealing with reference types that look more like .NET. For example,

type vector()(type t) {
    if (t.is_pod()) {
        t.add_member_function("is_pod", [&]()(t& ref) {
            return true;
        });
    }
    return t;
}

type fail() {
    type t = struct {};
    type vectort = vector(t);
    t.add_member_variable("hai", std::string);
    // OOPS no longer POD.

}

Now, const would be the usual solution to this, and as such, you can bet that immutability is the cure. I've been thinking of adding an immutable_type. This would, when constructed, copy its contents from a type, and then be immutable from then on, meaning that types would effectively have a construction phase and a use phase. Then, the above vector function would take and return an immutable type.
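The construction-phase/use-phase split is essentially a builder that freezes: mutable while being assembled, immutable once sealed. A small C++ sketch of that idea (names invented for illustration):

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Mutable during its construction phase; freeze() flips it into its
// use phase, after which any further mutation is an error.
class type_builder {
    std::vector<std::string> members;
    bool frozen = false;
public:
    void add_member(const std::string& name) {
        if (frozen)
            throw std::logic_error("type is immutable after construction");
        members.push_back(name);
    }
    void freeze() { frozen = true; }
    std::size_t member_count() const { return members.size(); }
};
```

Here the check happens at run-time of the metaprogram; immutable_type as described would make the freeze part of the type system instead.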

So I've got one major implementation problem left- referential transparency.

Monday 4 July 2011

Implementations

Of course, if I pre-processed into C++, then I could actually implement N-pass without too much effort, one hopes. Apparently, MSVC does offer command-line access without having to run their own command line, if you run a batch file first. And I can topsort to fix up the includes, especially with no macros, templates, etc. If the compiler can output the result as a DLL where I want it, then I'll just GetProcAddress() the exported function and execute it. Dangerous to load code into the compiler process... maybe. All I'd have to do is define some simple data structures to represent the code, parse it, transform it, compile it with MSVC, load it, execute it, repeat.

Languages and implementations

You know, I've been thinking. Some languages out there could be really innovative, and nobody would use them, because they have crappy implementations or bad marketing. Almost all of the language ideas I've had, I've seen in other languages- but those languages don't get wide-scale adoption, which seems strange to me. The main conclusion that I've come to is that their implementations suck.

For example, take LISP macros. I've heard legends of their power, but almost nobody codes in LISP. That's probably because LISP has a reputation for being slow, lacking mainstream OO, and so on. I figure that creating a great language isn't about being the smartest kid on the block, it's about picking elements that already work and combining them in new and awesome ways.

Saturday 2 July 2011

N-Pass

I've been thinking more and more about complex compile-time operations. The fundamental problem is, where do you draw the line? If I allow metaprogramming, do I allow meta-meta-programming? How do I allow complex types- which might be metaprogrammed themselves- to be used in metaprogramming? For example, it's a valid and well-known optimization for std::copy to dispatch to memcpy or memmove for certain POD types, something which I should always allow, but which the three-pass system doesn't. And then I'd have to introduce a new pass.

constexpr type array(type t, int i) {
    type arr;
    arr.add_public_member_array(t, i); // etc
    return arr;
}


constexpr void func(int i) {
     array(float, i) arr; // How do I deal with this?
}


So far, I've come up with two solutions, and the first is the three-pass system that I discussed previously. You would use SFINAE as you do now, etc. The downside of that is basically that I'd have to keep templates, which would be an incredible mess and a massive contrast to the smooth, easy constexpr system that I have now.

The other solution is something I've termed N-pass. This is where the compiler performs a variable number of compiling passes to compile the program. The trouble is, more passes means a longer compile time, and determining how many passes are appropriate could be ... "fun". However, the end result would be much better- there would be effectively no blur between compile-time and run-time.

The language would only have the notion of "this pass", which is "runtime" in C++, and "the previous pass", which is "constexpr" in C++0x. Then, the compiler would resolve which passes are which.

// Takes no "previous pass" arguments, takes two "this pass" arguments
type array()(type t, int MaxSize) {
    type arr;
    arr.add_public_array(t, MaxSize, "data");
    // A lambda function- takes no "prev pass" arguments,
    // two "this pass" arguments "this" and "index"
    arr.add_public_member_function("at", [&]()(arr& ptr, int index) {
        if (index >= MaxSize)
            throw std::runtime_error("Array access out of range.");
        return ptr.data[index];
    });
    return arr;
}

float GetSomeShiz(int x)(array(float, x) arr, int i) {
    // OK- x is a prev-pass argument
    return arr.at(i);
}
float GetSomeShiz()(int i, array(float, i) arr) {
    // BAD, i is a this-pass argument
}


A slightly more complex example:


void copy(type t)(t* dest, t* source, int size) {
    if (t.IsPod()) {
        memcpy(dest, source, size * sizeof(t));
    } else {
        for(int i = 0; i < size; i++) {
            dest[i] = source[i];
        }
    }
}


The trouble with this is that it's not immediately clear which statements are executed at which pass. Obviously anything depending on the "this-pass" arguments must be executed at this pass. However, some statements like the loop (if the size were a "prev-pass" argument) might be ambiguous. Is that good, or bad?

The main upside of this system is that functionality can be compiled at "this pass" and then kept that way, regardless of how many actual passes it ends up being used in. Tell me what you think.

Another advantage I've come up with (maybe) is some surprising overloading possibilities, in addition to what I already had with named variables.

// If the size is a "prev-pass" variable, execute some magic optimized version
void copy(type t)(t* dest, t* source)(int size) {
    if (t.IsPod()) {
        memcpy(dest, source, size * sizeof(t));
    } else {
        for(int i = 0; i < size; i++) {
            dest[i] = source[i];
        }
    }
}

Another thing I've been considering is a quick syntax for what happens when. Like,


// If the size is a "prev-pass" variable, execute some magic 
// optimized version
void copy(type t)(t* dest, t* source, int size) {
    constexpr {
        if (t.IsPod()) {
            runtime {
                memcpy(dest, source, size * sizeof(t));
            }
        } else {
            runtime {
                for(int i = 0; i < size; i++) {
                    dest[i] = source[i];
                }
            }
        }
    }
}


As you can see though, it could get quite cluttered- especially if you wanted more than a couple of levels of nesting.

Friday 1 July 2011

Semantic inline

Inline. In a multi-pass environment like you'd find in .NET or Java, there's no need and no purpose for inline as it exists in C++. However, I figured it can be used for another purpose- "semantic" inlining. In this case, we would make __FILE__ and __LINE__ into language calls like file() and line(). And, more importantly, alloca() would be inlined, meaning that you can allocate memory off the caller's stack in an inline function. Great for implementing, say, memory stacks :)

Of course, actual inlining would be best left to the compiler.