Wednesday, 15 June 2011

I decided to ask another question today: when is a language not a language? This is mostly a hypothetical question that I'd like to think through and discuss, rather than something I'm putting forward as a practical solution.

For example, let's talk about the hideous mess that is the current C++ grammar. "Hideous mess" isn't just my opinion: the grammar is not context-free, which makes C++ an official bitch to parse and compile. On top of that, there are other compilation-related problems, such as header files. The trouble with eliminating these problems is that all our old code depends on them. If we changed the C++ grammar, we would have to rewrite every line of existing code.
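To make "not context-free" concrete, here is a small illustrative sketch (all names such as `when_a_is_a_type`, `Widget` and `run_parse_demos` are hypothetical, invented for this example). The same token sequence `a * b;` is a declaration when `a` names a type and a discarded multiplication when it names a variable, and `Widget w();` (the "most vexing parse") declares a function rather than an object. A parser can't tell these apart from the grammar alone; it needs to know what each name refers to.

```cpp
#include <cassert>
#include <type_traits>

// Illustrative only: "a * b;" parses as two different statements
// depending on what the name `a` denotes.

namespace when_a_is_a_type {
struct a {};
inline bool demo() {
    a * b;        // a declaration: `b` is a pointer to `a`
    b = nullptr;  // valid only because `b` was declared on the line above
    return std::is_same<decltype(b), a*>::value;
}
}  // namespace when_a_is_a_type

namespace when_a_is_a_value {
inline int demo() {
    int a = 6, b = 7;
    a * b;        // an expression: multiply, then discard the result
    return a * b;
}
}  // namespace when_a_is_a_value

struct Widget {};

// The "most vexing parse": this looks like default-constructing an object,
// but the grammar says it declares a function `w` returning Widget.
inline bool most_vexing_parse() {
    Widget w();
    return std::is_function<decltype(w)>::value;
}

inline void run_parse_demos() {
    assert(when_a_is_a_type::demo());
    assert(when_a_is_a_value::demo() == 42);
    assert(most_vexing_parse());
}
```

Calling `run_parse_demos()` from `main` should pass all three checks; compilers will typically warn about the discarded multiplication and the vexing parse, which is rather the point.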

That's why I'm going to suggest that C++ define two grammars, and, further, that the new one drop the preprocessor and the header-based compilation model entirely.

How could such a thing possibly work?

Firstly, we need to consider that new languages will always supersede old languages eventually. A new language will come along and beat C++. It might not be D, or one of the JIT-compiled languages, but it'll happen. And compilers for that mythical language will have to be implemented. So when people object that implementers would have to implement two grammars, that's going to happen anyway, if it isn't happening already. Consider that Microsoft, for example, already maintains compilers for several major languages in parallel, such as C++ and C#. If all we did was define an alternative grammar, implementers could at least keep the same back-end code generators, optimizers, and that kind of thing.

Secondly, code in the new grammar could be dramatically easier to deal with than code in the existing paradigm, not least because the new grammar could be designed from the ground up to be extensible and to meet all of C++'s existing needs in a context-free way, making it substantially easier to parse than today's C++. That makes supporting both grammars a lot less than double the work for a compiler implementer.

Thirdly, a new grammar gives us an opportunity to genuinely rectify our mistakes, such as the preprocessor. Was it a mistake in 1995? Probably not. But right now it's a huge problem, and we need to eliminate it. Having an old and a new grammar is also an excellent way to separate old and new semantics: a construct can be permitted when you compile "old-grammar" code but forbidden in "new-grammar" code. That goes even for simple things, like the conversion from a string literal to char* and array-to-pointer conversion, the kind of thing that nobody wants to admit really exists in the C++ Standard but that always has to stay for compatibility. In "old-grammar" mode you get headers and all the rest; "new-grammar" mode would have no preprocessing at all.
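As a rough illustration of the kind of legacy semantics an "old-grammar" mode would keep, here is a minimal sketch of array-to-pointer conversion (the function names `size_seen_by_callee`, `length_of` and `run_decay_demos` are hypothetical). A string literal is really an array of `const char`, but it silently converts to a pointer and loses its length; an array parameter is likewise adjusted to a plain pointer. The older string-literal-to-`char*` conversion itself is left out because C++11 already made it ill-formed.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Illustrative only: array-to-pointer conversion ("decay"), kept for C compatibility.

// A string literal is an lvalue array of const char...
static_assert(std::is_same<decltype("hi"), const char (&)[3]>::value,
              "a string literal has type const char[3]");
// ...which silently converts to a pointer, dropping the length.
static_assert(std::is_convertible<decltype("hi"), const char*>::value,
              "array-to-pointer conversion");

// An array parameter is adjusted to a pointer parameter, so the
// callee only ever sees an int*.
inline std::size_t size_seen_by_callee(int arr[10]) {
    return sizeof(arr);  // sizeof(int*), not sizeof(int[10])
}

// A reference-to-array parameter keeps the length in the type instead.
template <std::size_t N>
constexpr std::size_t length_of(const int (&)[N]) { return N; }

inline void run_decay_demos() {
    int numbers[10] = {};
    assert(sizeof(numbers) == 10 * sizeof(int));
    assert(size_seen_by_callee(numbers) == sizeof(int*));
    assert(length_of(numbers) == 10);
}
```

In a hypothetical "new-grammar" mode, the pointer-adjusted parameter could simply be rejected in favour of the reference-to-array (or a proper array type), while "old-grammar" code keeps compiling as before.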

Ultimately, in my opinion, either C++ will make this transition or another language will come along. I think it would be better for C++ and its Standards committee to make this happen themselves instead of waiting for someone else to do it. I think that C++0x is beating a dead horse: it might twitch, but it'll never get up and plough.
