Wednesday, 27 July 2011

Valarray

Have a look at this simple SO question:

http://stackoverflow.com/questions/6850807/why-is-valarray-so-slow/6851501#6851501

This raises an equally simple question about "DeadMG++": what the hell to do about expression templates? In this case, an expression template can more than double the performance of the code. Now, in "DeadMG++" I already described a feature where you can take parsed functions as ASTs and create your own semantic interpretation of them. This could yield code like


c = [&]() { return a * b; };

Where c has an operator= that will take a function, of type .. some type which I have yet to nail down .. which has access to the whole expression tree. In this case, the writer has the opportunity to perform whatever semantics he wants. But what I might do is extend this to be individual expressions too. That would mean that you could write


c = a * b;


and c would be equatable to an expression, and can have whatever fun it wants optimizing the hell out of that expression, including CSE and all the rest, or whatever it decides should be efficient. Of course, I'll need to actually play with both of these systems before deciding what to do about them. Valarray itself, of course, I'll likely not have.

Now, I've already conceived of simplistic systems which could provide optimizations in more common cases. For example, let's say an operator overload. In C++ then each operator overload can only deal with a direct equivalent. However, I considered the possibility that actually, they could be worth more than one. For example, let's say that in the addition operator of std::string, you could take any number of arguments. That would mean that where


std::string a;
a = a + a + a + a + a;


in C++ this would become

a = a + (a + (a + (a + a)));


or, more directly,

a.operator=(a.operator+(a.operator+(a.operator+(a.operator+(a)))));


However, in the above scenario in DeadMG++, this would actually become something more like

a.operator=(a.operator+(a, a, a, a));


The problem is that tasks like CSE can be repetitive and coding them in every type would be quite difficult, so I'd certainly have to try to Standardise some of it. I'm also considering mandating that operators and values must have the same semantics as primitives- i.e., that for any expression like (a = a + b), that the optimizer is legally allowed to reduce it to (a += b). In this case, CSE and other compiler optimizations would run before the expression is generated, and the class would only have to deal with the result, which is smoother and cleaner but less customizable. Of course, I just cut the need for expression templates as they are now with the expression in-built type, so it's questionable as to whether they would have any use that would not fall into the acceptable category. In order to address this properly, I'd need a "pure" specifier, which is "Please eliminate me", as well as "Please parallelize me". There's some considerable overlap between "pure", "const" and "threadsafe", and I'd really need to explicitly define further what means what, precisely.

Pure functions are necessarily valid const functions. Const functions are not necessarily valid pure functions, and neither of them are valid threadsafe functions- after all, you might const_cast away the constness and use something like memoizing, which would not break the promises to the compiler but it would not be thread safe, and in addition, the compiler cannot guarantee that any access is thread-safe.

Now, the real question is if I can make threadsafe more than just a promise. Of course, I could propagate it, but the problem is that determining threadsafety will be more than just checking that all accesses are thread safe. I'm not sure that it's even theoretically possible to determine threadsafety of a function automatically, even given all the knowledge in the world. Unless I want to crack out a virtual machine and run it in every permutation of being accessed and changed, etc.

No comments:

Post a Comment