Had a bug where an expression's type inexplicably became a Function rather than a UDT. Suspected ownership bug.
Had to switch all expression ownership to shared_ptr. Fixing destructors to occur semantically, as is correct, instead of being collected at codegen time. Lots of messy ABI details like type complexity. And I'll have to change AggregateType's functions to be real functions instead of simply complex expressions.
Yay.
Sunday, 29 June 2014
Saturday, 28 June 2014
Deleted function semantics
I've been thinking about deleted functions, and I feel they have some undesirable results in C++. Right now I'm thinking that a deleted function will register as a general overload resolution (OR) failure, not a specific hard error. So if you explicitly delete a copy constructor, the type will simply register as uncopyable.
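For comparison, C++ itself already treats a deleted copy constructor as "not copyable" when the question is asked through a type trait, even though calling the deleted function directly is a hard error. A minimal sketch; `Copyable`, `NoCopy` and `is_copyable` are hypothetical names used only for illustration:

```cpp
#include <type_traits>

// Hypothetical types for illustration only.
struct Copyable {
    Copyable() = default;
};

struct NoCopy {
    NoCopy() = default;
    NoCopy(const NoCopy&) = delete;  // explicitly deleted copy constructor
};

// The trait reports the deleted function as a plain "no", not as a compile error.
static_assert(std::is_copy_constructible<Copyable>::value, "Copyable can be copied");
static_assert(!std::is_copy_constructible<NoCopy>::value, "NoCopy registers as uncopyable");

// Convenience wrapper so the result can also be inspected at run time.
template <typename T>
constexpr bool is_copyable() {
    return std::is_copy_constructible<T>::value;
}
```

The idea in the post is roughly to make this "soft" answer the only behaviour, rather than having a separate hard error when the deleted function is selected.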
Today I finished rebuilding my parser. It's now 800 LOC smaller, more extensible in ways that actually matter, and the error handling will be better once I make a pass over it.
Now I'm thinking about stuff like dynamic destructors/operators, abstract/final/override, and function behaviours. Here we mean stuff like throw, rethrow, return, terminate, what you can or cannot throw, etc. I'm not sure what the most efficient way of expressing this stuff is.
You can either promise not to throw, promise to throw, or say you might throw. You can promise to throw nothing, throw one of X types, or throw anything. You can guarantee to return, might return, or guarantee not to return. You can rethrow, not rethrow, or might rethrow. What I'm really seeing is that three of these are "Will, won't, or might", and the fourth is just what you will/won't/might do in more detail. The default here would be can throw, can rethrow, can return, and may throw anything. These attributes will aid in CFG (control flow graph) computation, which can mean generating more efficient code, and giving more accurate warnings and other stuff.
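One way the "will, won't, or might" observation could be written down is sketched below. This is only an illustration, not Wide's actual representation: `Tri`, `FunctionBehaviour`, `falls_through` and `needs_unwind_edge` are all made-up names.

```cpp
#include <string>
#include <vector>

// "Will, won't, or might": the three-valued answer shared by throw, rethrow and return.
enum class Tri { Will, Wont, Might };

// Hypothetical record of a function's behaviours.
struct FunctionBehaviour {
    Tri throws = Tri::Might;    // default: can throw
    Tri rethrows = Tri::Might;  // default: can rethrow
    Tri returns = Tri::Might;   // default: can return
    // The fourth attribute: which types may be thrown.
    bool throws_anything = true;           // default: may throw anything
    std::vector<std::string> throw_types;  // only meaningful when throws_anything is false
};

// In the CFG, code after a call is reachable only if the callee might return.
inline bool falls_through(const FunctionBehaviour& b) {
    return b.returns != Tri::Wont;
}

// A call needs an unwind edge only if the callee might throw or rethrow.
inline bool needs_unwind_edge(const FunctionBehaviour& b) {
    return b.throws != Tri::Wont || b.rethrows != Tri::Wont;
}
```

A default-constructed `FunctionBehaviour` corresponds to the defaults described above, so both queries answer "yes" for it.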
Wednesday, 25 June 2014
A good refactor
Reduced code size? Check.
Fixed bugs? Check.
More extensible? Check.
New functionality? Check.
Better error handling? Check. Well, partially. It's a smidgeon better and will get a lot better. I had to rip out a considerable quantity of it since it was broken.
Today is a good day to die refactor.
Sunday, 22 June 2014
Cleanup in progress
I've cleaned up a few things that were annoying me.
I fixed exporting to use a C# style attribute syntax, so now you use
[export := header.function]
f() {}
I fixed constructor member initialization syntax to support identifying bases by type.
I fixed resolving overloads based on lvalue/rvalue w.r.t. ref-qualifiers in C++.
I fixed auto-detecting Linux includes (boy, not having that was super annoying).
I implemented exception re-throwing with throw;.
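For reference, the C++ rule behind the ref-qualifier fix above looks like this. It's a small sketch with a hypothetical `Widget` type: the `&` overload is picked for lvalue objects and the `&&` overload for rvalues.

```cpp
#include <string>
#include <utility>

// Hypothetical type for illustration.
struct Widget {
    // Chosen when the object expression is an lvalue.
    std::string which() & { return "lvalue"; }
    // Chosen when the object expression is an rvalue.
    std::string which() && { return "rvalue"; }
};

inline std::string call_on_lvalue() {
    Widget w;
    return w.which();
}

inline std::string call_on_temporary() {
    return Widget().which();
}

inline std::string call_on_moved() {
    Widget w;
    return std::move(w).which();
}
```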
Next up on my list is parsing and lexing. My current parser and AST are fairly kludgy- their design is from way back when I parsed multiple files directly into the same AST concurrently, didn't support operator overloads, and the parser's lexer interface is from years ago also. The lexer itself isn't too bad but I just need to alter the token types a bit.
The main reason for this is that the parser doesn't support dynamic destructors, dynamic operators, defaulted/deleted functions, etc., and the design is not conducive to being modified at run-time. There's also some duplicated code where rules are written in terms of what other rules expect, and the error handling is heavily duplicated.
The new approach will be half table-driven, half recursive descent. And the AST/Builder will be changed to not stringly type destructors and constructors, not indicate operators by token type, etc. The main change for the lexer will be that token types will no longer be an enumeration. Instead, token types will be indicated by a constant pointer (probably to std::string). This allows new token types to be added. In addition, the lexing tables will be made members of the lexer instance to permit modification, instead of constant as they are now, as will the parsing tables.
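A rough sketch of the lexer side of that plan, under my own assumptions: `TokenType`, `Lexer`, `AddTokenType`, `Classify` and the `keywords` table are hypothetical names, and only the table handling is shown, not any actual lexing.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// A token type is identified by the address of a string, not by an enumerator.
using TokenType = const std::string*;

// Hypothetical lexer skeleton; only the table handling is sketched.
class Lexer {
    // Owns the names so the pointers handed out stay valid for the lexer's lifetime.
    std::vector<std::unique_ptr<std::string>> names;
public:
    // Per-instance table from spelling to token type, so it can be modified at run time.
    std::unordered_map<std::string, TokenType> keywords;

    // Registers a new token type and returns its identity.
    TokenType AddTokenType(const std::string& name) {
        names.push_back(std::unique_ptr<std::string>(new std::string(name)));
        return names.back().get();
    }

    // Returns the token type for a spelling, or nullptr if it is not a keyword.
    TokenType Classify(const std::string& spelling) const {
        auto it = keywords.find(spelling);
        return it == keywords.end() ? nullptr : it->second;
    }
};
```

Because identity is just a pointer comparison, adding a token type at run time needs no change to any enumeration, and the pointed-to string doubles as a printable name for error messages.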
Friday, 20 June 2014
Slump
I'm in a bit of a slump right now. Was hunting for a job and I thought I'd found something but seems like not. Now I'm kinda off my game. None of my music feels right and I can't seem to get into the flow. My Internet also keeps failing. I haven't really coded anything in the last couple of days, just watched The Matrix and the few good minutes from the sequels on repeat.
Found a bug where tuples (and presumably also lambdas) are incorrectly rvalues instead of values when created.
I need to create some sort of more serious project management. Right now I just have "todo.txt", and I checked it today and it's clear I haven't made use of it in months. There's stuff like "Add basic inheritance (no virtual functions)" and "Add exceptions".
It's time for a cleanup. My parser and lexer code is both bad and non-extensible. My AST is pretty poor in many regards, for example representing constructors and destructors with string names, operator overloads with token types, etc. This has got to go. I need to replace all those runtime_errors. I also need to fix all those places where I don't report an error but LLVM crashes the process with type errors, like when you export a function but define it with the wrong signature.
Thursday, 19 June 2014
Optional
I'm looking to move optional(t) over to Wide. To achieve this I'll need three new language features: library is-a (since null is-a optional(t)), aligned storage, and boolean testing. Aligned storage means supporting attributes. Right now, I'm thinking of something like
template(t)
[align := t.alignment]
type aligned_storage {
    storage := int8.array(t.size);
}
This steals the attribute syntax from C#, which is my current idea for replacing function prologs. The primary advantages are that attributes are less noisy and leave room for extensibility in the future. Perhaps I could also permit an attribute directly on a data member, so for example,
template(t) type Optional {
    [align := t.alignment]
    storage := int8.array(t.size);
    // other stuff
}
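For comparison, here is roughly what the same shape looks like in C++. It's a minimal sketch, not the real optional(t): `alignas` plays the role of the `[align := ...]` attribute, an explicit `operator bool` stands in for boolean testing, and copying is left out to keep it short.

```cpp
#include <new>

// Hypothetical C++ analogue of the Wide sketch above.
template <typename T>
class Optional {
    // Raw bytes sized and aligned for T: the "aligned storage" feature.
    alignas(T) unsigned char storage[sizeof(T)];
    T* object = nullptr;  // points into storage when a value is present
public:
    Optional() = default;  // the empty state
    Optional(const T& value) : object(new (storage) T(value)) {}
    Optional(const Optional&) = delete;  // copying omitted to keep the sketch short
    Optional& operator=(const Optional&) = delete;
    ~Optional() { if (object) object->~T(); }

    // The "boolean testing" feature.
    explicit operator bool() const { return object != nullptr; }

    // Precondition: a value is present.
    T& get() { return *object; }
};
```

The library is-a part (null converting to an empty optional) has no direct counterpart in this sketch.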
Wednesday, 18 June 2014
Arrays, code cleanup, and stdlib
I implemented some basic array stuff today. It's boring and easy, but also a fast new feature. I added a couple of simple tests for it. I also want to do a code cleanup pass and did the first part of that today. The problem I'm looking to address here is that when debugging Wide functions, there's a huge amount of noise, and the functional logic is lost. This is because Wide constructors can only operate in terms of a "this" pointer, even when in reality it's just going to be loaded to produce a value right away. There were other cases where I unnecessarily promoted from value to rvalue too.
Secondly, I've got my first language feature that should throw an exception- array indexing. I'm fairly confident that LLVM can handle optimizing the array index bounds check out. Annoyingly, LLVM cannot dynamically index into an array value, which totally throws the whole value thing out of whack. Right now, I just copy to the stack every time... LLVM can optimize out the repeated copies, I'm fairly sure. I'm also going to offer an unchecked access so you can use that if the optimizer's not good enough. The problem with this is that unless I want to define the exception type in the compiler, I need my Wide Standard library available during testing, which is going to make life ... fun.
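The checked/unchecked split described above might look like this in C++ terms. This is a sketch only: `Array`, `at`, `unchecked` and `third` are hypothetical, and `std::out_of_range` stands in for whatever exception type the Wide Standard library would define.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical fixed-size array wrapper.
template <typename T, std::size_t N>
struct Array {
    T elements[N];

    // Checked access: throws on an out-of-bounds index.
    T& at(std::size_t i) {
        if (i >= N) throw std::out_of_range("array index out of bounds");
        return elements[i];
    }

    // Unchecked access: the caller promises the index is valid.
    T& unchecked(std::size_t i) { return elements[i]; }
};

// With a constant, in-range index the optimizer can usually drop the check entirely.
inline int third(Array<int, 4>& a) { return a.at(2); }
```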
I've also been thinking about some slightly more complex transformations, like maybe yield return. Semantically, this transform is not too hard- just shift the locals that you need from allocas to member accesses, and add a member for the current "state". The trouble is that returning would implicitly mean returning an optional, which would again mean making the Wide stdlib available during testing. Another trouble is that pointers/references to the local variables can't really be trusted, but I guess this is already true of lambda captures.
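To make the yield return transform concrete, here is a hand-written C++17 version of what it might produce for a loop that yields the squares of 0..limit-1. The shape is my assumption, not Wide's actual lowering: the local `i` becomes a member, a `state` member records where to resume, and each call returns an optional.

```cpp
#include <optional>

// Hypothetical result of transforming
//     for (i := 0; i < limit; ++i) yield return i * i;
struct Squares {
    int limit;      // captured argument
    int i = 0;      // former local, hoisted from an alloca to a member
    int state = 0;  // 0 = not started, 1 = suspended inside the loop, 2 = finished

    explicit Squares(int limit) : limit(limit) {}

    // Each call resumes where the previous one stopped; an empty optional means "done".
    std::optional<int> next() {
        switch (state) {
        case 0:
            i = 0;  // loop initialisation
            state = 1;
            break;
        case 1:
            ++i;    // resume after the yield: run the loop increment
            break;
        default:
            return std::nullopt;
        }
        if (i < limit) return i * i;
        state = 2;
        return std::nullopt;
    }
};
```

The pointer/reference problem mentioned above shows up here too: a pointer to `i` is really a pointer into the `Squares` object, so it dangles as soon as that object is moved or destroyed.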
I've also got to clean up stuff like attributes, introduce library is-a, and such.
Sunday, 15 June 2014
Bughunting and feature drive
Today I fixed some bugs, but more importantly, I decided on my next core feature drive. And that drive will be for modules and ABI.
Right now, Wide offers a relatively ABI-independent interface, in theory. I want to tighten that up so that the Type interface is properly ABI-independent and remove Itanium helpers. I want to support laying out types according to more than one ABI. I want to be able to ship headers with modules. I want to be able to handle dynamic import/export. I have plans for how some of this stuff can be achieved.
Before that, I need to work on bugfixes, since presumably I just introduced a few hundred (thousand) of them. And a long time ago, I wanted to handle incremental analysis, re-implement error handling, ... the errors issued in the new features are all just std::runtime_error, and half the failure conditions are probably either ignored leading to a compiler crash, or asserted.
But incremental re-analysis has taken a more serious back seat, and it's because Clang can't handle it, and it also can't handle analysis -> codegen -> analysis. This puts more serious roadblocks in the way of supporting those features myself. I particularly dislike not being able to code generate more than once from a particular Analyzer instance. There are only a few cases where I would need to modify my code to support it, but I can't do it because Clang cannot handle it.
Ultimately, I'm just a one-man shop with other things on my plate. I need to hire more help.
Saturday, 14 June 2014
Caught an exception
Last night I caught my first exception. You can't rethrow, you can't catch anything but ..., and the compiler crashes if you try to insert code after a throw in a try. But it works.
EH intrinsics on the LLVM level are pretty broken. Fortunately, one of the ways in which they are broken is coming up puppy.
I also implemented but have not yet tested special semantics for destroying members in constructors that throw.
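The C++ behaviour those semantics mirror: when a constructor throws, the members that were already constructed are destroyed in reverse order, and the destructor of the object itself never runs. A small sketch that records the order; `Member`, `Owner`, `events` and `try_construct` are hypothetical names.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Records construction and destruction events so the order can be inspected.
inline std::vector<std::string>& events() {
    static std::vector<std::string> log;
    return log;
}

struct Member {
    std::string name;
    explicit Member(std::string n) : name(std::move(n)) { events().push_back("ctor " + name); }
    ~Member() { events().push_back("dtor " + name); }
};

struct Owner {
    Member first;
    Member second;
    Owner() : first("first"), second("second") {
        throw std::runtime_error("constructor body failed");
    }
    ~Owner() { events().push_back("dtor Owner"); }  // never runs for a failed construction
};

// Attempts the construction and reports whether it threw.
inline bool try_construct() {
    try {
        Owner o;
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```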
I think that catching non-... things is the most useful feature to add next. I think that in theory, it's a relatively simple deal now that I have the rest of the infrastructure done. After that, rethrowing. After that, test test test.
I actually kinda... don't know where to go now. I didn't expect exceptions to be so simple. I practically spent more time on RTTI or Itanium-compatible layout.
Wednesday, 11 June 2014
Payoff
All the investment I've put into refactoring my core systems is paying off.
I refactored UserDefinedType's GetClangType, which now accepts without error all UDTs in all test cases.
I threw an exception from Wide and caught it in C++.
The core remaining feature is implementing destructors in case of exception and catching/rethrowing in Wide. After that, it's test, test, test for the new ABI features.
Even with the new Codegen cleanup that removed a lot of code, I'm now ranking over 19k loc. Seems just a while ago that I was barely breaking 18k. I feel good.
I've discovered that there's quite a number of features that got silently cut. For example, it used to be that you could use !() to pass explicit template arguments to C++ functions. I've discovered that there are now literally no types that respond to !(). It would have to be OverloadSet that handles this, I feel. Another example of a silent feature cut is OverloadSet conversion to C++ type.
One thing I'm minorly concerned about is unused functions and C++ type conversions. Converting a UDT to C++ requires exporting the members, which counts as a use of those functions, even if it turns out C++ never calls them or exports them. This is particularly problematic since getting RTTI (which is done for all types with a vtable) first attempts to do it by converting to C++ type and asking Clang to work out the RTTI for us. Only if this fails do we compute our own RTTI.
So, first exceptions, then maybe a couple cut features, then test test test.
Destructors
I refactored destructors today. The new algorithm is substantially superior in every respect- it's simpler, it's smaller, it's faster. I also introduced the CodegenContext that can make refactoring code generation easier and simpler in the future.
But the core benefit was making it EH-ready. Well, not exactly EH ready, but not too far from. I also had a quick peek at clang and CodeGenEH is only 68kb or so, which makes me feel better about the probability of Itanium EH being relatively easy to implement.
I also fixed a couple of bugs and found an important and unfortunate new class of potentially compiler-crashing error. Itanium ABI says that vtable layout depends on function return type, which depends on function body, so any dependency from a member function on the vtable layout means assertion failure. I have removed the dependency for calling other virtual functions, but there are probably other ways in which a member function can request the contents of the vtable- constructing a new object of its own type being the simplest example.
Monday, 9 June 2014
Extensibility
One thing I've been thinking about with regards to Wide is how to enable its use as a library. My experience working with Clang was ... questionable in this regard. So far I've been thinking about how to handle extending Wide.
Currently, anybody can inherit from Type, and anybody can add a special member to a module. This is how C++ support is implemented. And in addition, anybody can inherit from Semantic::Expression- Wide is not picky. Although you can currently only generate code once, this is something that is not a core limitation- generating code multiple times from the same analyzer is something I will fix in the future and it hopefully won't be a big deal.
But when it comes to adding new AST expressions or statements, I've got no plan. Adding a new AST expression to Wide consists of adding a manual dynamic_cast in the analyzer implementation. I'm thinking of a new trick- use a type switch. Something like the following:
class Analyzer {
public:
    std::unordered_map<std::type_index, std::function<std::unique_ptr<Expression>(Analyzer&, const AST::Expression*)>> expression_handlers;
    std::unique_ptr<Expression> AnalyzeExpression(const AST::Expression* e) {
        // typeid(*e) gives the dynamic type of the node; typeid(e) would only give the pointer type.
        auto handler = expression_handlers.find(typeid(*e));
        if (handler != expression_handlers.end())
            return handler->second(*this, e);
        throw ...;
    }
    Analyzer() {
        // Handlers take the same arguments as the std::function signature above.
        expression_handlers[typeid(AST::String)] = [](Analyzer& a, const AST::Expression* e) { ... };
    }
};
I'm thinking of using a similar trick to handle extending the parser. This way you can add new expressions (and something similar for statements) at run-time, as well as new types and such.
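A self-contained version of the handler-table idea, small enough to compile on its own. The AST and semantic node types here are hypothetical stand-ins (the real `AST::String` and `Expression` are not shown in this post), and the `std::runtime_error` for the unhandled case is just a placeholder.

```cpp
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

// Hypothetical stand-ins for the real AST and semantic node types.
namespace AST {
    struct Expression { virtual ~Expression() {} };
    struct String : Expression { std::string value; };
    struct Integer : Expression { int value = 0; };
}
struct Expression {
    std::string description;
};

class Analyzer {
public:
    using Handler = std::function<std::unique_ptr<Expression>(Analyzer&, const AST::Expression*)>;
    std::unordered_map<std::type_index, Handler> expression_handlers;

    std::unique_ptr<Expression> AnalyzeExpression(const AST::Expression* e) {
        // typeid(*e) on a polymorphic type gives the dynamic type of the node.
        auto it = expression_handlers.find(std::type_index(typeid(*e)));
        if (it == expression_handlers.end())
            throw std::runtime_error("no handler registered for this AST node");
        return it->second(*this, e);
    }

    Analyzer() {
        expression_handlers[typeid(AST::String)] = [](Analyzer&, const AST::Expression* e) {
            // Safe: this handler is only ever reached for nodes whose dynamic type is AST::String.
            auto str = static_cast<const AST::String*>(e);
            return std::unique_ptr<Expression>(new Expression{"string: " + str->value});
        };
    }
};
```

Since `expression_handlers` is a public member, client code can register a handler for a new node type such as `AST::Integer` at run time without touching `Analyzer` itself. Note that the lookup is by exact dynamic type, so a handler for a base node type does not catch derived nodes.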
Today I've hunted down the last detected bugs from the Itanium ABI switchover. Once I finish up dynamic_cast, it's time to make preparations for Itanium ABI exceptions. Oh boy. Then test, test, cleanup, test test cleanup cleanup, etc.
Sunday, 8 June 2014
Itanium fun
So I've been working on implementing Itanium ABI layout. I've determined that many places in the Wide implementation assumed that every member had an associated LLVM field, which is not true in the presence of the EBCO (empty base class optimization) mandated by Itanium. In addition, I had to implement a few new members and move vptr handling to AggregateType. Previously, there was a bug: since the vptr was officially a member of the type, stuff like generated copy assignment operators would copy the vptr (very bad!). Now AggregateType should respect the fact that the vptr is special.
In addition, someone in #llvm pointed out that if I didn't follow Itanium's layout rules, I couldn't use their dynamic_cast implementation, which makes assumptions.
Also, the ClangType implementation of constructor field locations was just totally broken, as well as my handling of EBCO- it was totally non-compliant.
I also had bugs in derived-to-base conversions where a null derived did not lead to a null base. I don't believe I have a single test that actually performs derived-to-base conversions on pointers, although the conversion for references is implemented in those terms.
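The null case above is easy to get wrong because a derived-to-base pointer conversion is an offset adjustment, and null plus an offset is no longer null. A sketch of the check the generated code needs; `derived_to_base` is a hypothetical helper and `A`, `B`, `D` are made-up types.

```cpp
#include <cstddef>

// Hypothetical helper mirroring what the generated code must do: adjust a derived
// pointer to a base subobject at a known byte offset, but map null to null.
inline void* derived_to_base(void* derived, std::ptrdiff_t base_offset) {
    if (derived == nullptr)
        return nullptr;  // without this check, a null derived would yield a non-null base
    return static_cast<char*>(derived) + base_offset;
}

struct A { int a; };
struct B { int b; };
struct D : A, B {};

// The C++ compiler's own pointer conversion performs the same null check.
inline B* to_b(D* d) { return d; }
```

References don't need the check, since a reference can't legitimately be null, which is presumably why the reference tests never hit the bug.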
I also cut down on my overhead by moving some common base-class related functions to Type instead of UserDefinedType and ClangType. There's probably more work I can do in this area, but some functions I'm not comfortable with moving down because they make assumptions about the ABI involved. I know that I don't support any ABI other than Itanium right now, but I'd rather not hardcode that fact into my base-level interface. After all, back when Wide and Clang had diverging ABIs, the basic functionality held together exactly because the Type interface is ABI-independent- except vtable layouts, which currently have their Itanium helper interface coded in the Type interface, which is bad.
I have several Type functions that should probably be static or hell, just non-members. But I'm powering ahead now until exceptions. When I have Itanium ABI exceptions, I'll take a break from new features and clean up/test everything. At least, that's what I promise myself so I can sleep at night. As long as my existing test base passes, it's More Feature Time until I have exceptions.
Saturday, 7 June 2014
Vtable layout- thanks Itanium ABI
Came across a slight fun factor today- namely, that derived classes don't get their own vtable, but they often need one.
In Wide's current vtable layout model, each class has a vtable listing all the dynamic functions it has, regardless of source, and then we add offset to top and RTTI pointer to that. For calling dynamic functions found in the base class, we convert to the base class pointer and look them up through the base class vtable.
However, this leaves us with a problem with the offset-to-top and RTTI pointers, namely that we have a derived class which needs updated offset-to-top and RTTI but has no vtable of its own. So if I have something like
type base { dynamic f() {} } // offset, rtti, f
type base2 { dynamic g() {} } // offset, rtti, g
type base3 { dynamic h() {} } // offset, rtti, h
type derived : base2, base {} // no vtable
type more_derived : base3, derived {} // no vtable
It's pretty clear here that when derived and more_derived are constructed, they need to set new offset and RTTI pointers in their base classes, which currently they do. The problem comes when implementing RTTI and dynamic_cast for derived itself, as it doesn't have a vtable carrying the necessary data. For RTTI I can probably poll for any vtable, as they should all have the same RTTI entry. offset-to-top is more problematic because every base has a different value, and that value would need to be adjusted depending on where you got it from to account for the derived class's other bases.
In addition, I could consider adding vtable slots for inherited virtual functions. There is an argument that in some circumstances these could be more efficient.
But if I move to Itanium ABI then that whole primary-base thing will take care of this, so I think that today I will simply do that.
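For a feel of what offset-to-top and the RTTI entry are used for, here is the C++ equivalent of part of the hierarchy above. It's a sketch; `most_derived` and `dynamic_type` are hypothetical helpers. `dynamic_cast<void*>` returns the address of the most derived object, which on Itanium-style ABIs is computed from the offset-to-top stored in the vtable reached through the subobject's vptr.

```cpp
#include <typeinfo>

// C++ counterparts of the Wide types above; the names mirror the post.
struct base  { virtual void f() {} virtual ~base() {} };
struct base2 { virtual void g() {} virtual ~base2() {} };
struct derived : base2, base {};

// Recovers the start of the complete object from a pointer to one of its bases.
inline const void* most_derived(const base* b) { return dynamic_cast<const void*>(b); }

// The RTTI entry is the same whichever base's vtable it is read from.
inline const std::type_info& dynamic_type(const base* b) { return typeid(*b); }
inline const std::type_info& dynamic_type(const base2* b) { return typeid(*b); }
```

The two base subobjects live at different addresses, so each one needs its own offset-to-top value, while both report the same dynamic type.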
Friday, 6 June 2014
Standard library smoke tests
Had an interesting experience today. I wanted to show off my typeid() support, so I put a test up to Coliru. Instead of returning true I output it with cout. Imagine my surprise when this failed. printf() is one thing as it's variadic and I don't explicitly support that just yet. But there's no reason why a simple std.cout << true should fail.
This is down to lack of something that robot termed "smoke tests". With the new typeid() support comes a new testing constraint- the test environment must have a working copy of the stdlib, including headers. Previously I didn't need the headers, only the symbols, which on Linux I acquired from my own process (srsly) and on Windows I loaded MinGW's libstdc++. This is why I didn't have any tests interacting with the C++ or C Standard libraries.
But now that I require the headers to be available anyway, I may as well introduce tests that check that Wide can successfully interoperate with the C and C++ Standard libraries. For example I well recall having an unusual problem getting malloc() to function.
What I'm not sure about is how to construct this driver. For example, if I wanted to check that std.cout << true executes correctly, I'd .. what, redirect the process stdin/stdout and check their contents? Smoketesting other stdlib features seems simpler. Then there's tests for warnings which I still haven't constructed yet.
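One possible shape for the output check, sketched in C++ and assuming the test body runs in the same process as the driver: swap the stream buffer behind std::cout for a string buffer, run the body, then compare. `capture_stdout` is a hypothetical helper name. This only catches output that goes through the C++ stream, so printf() or a separate child process would still need descriptor-level redirection.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Runs `body` while std::cout is redirected into a string buffer and
// returns everything that was written during the call.
template <typename F>
std::string capture_stdout(F body) {
    std::ostringstream captured;
    std::streambuf* original = std::cout.rdbuf(captured.rdbuf());
    try {
        body();
    } catch (...) {
        std::cout.rdbuf(original);  // always restore the real buffer
        throw;
    }
    std::cout.rdbuf(original);
    return captured.str();
}
```

A smoke test for the failing case would then compare `capture_stdout([] { std::cout << true; })` against the expected text.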
Thursday, 5 June 2014
Had a funsie with vtables. In the previous implementation, vtables were only initialized if the more derived type had a virtual pointer. This was always the case when needed before because if you had any virtual functions you had a virtual pointer. Of course, with RTTI and offset-to-top implemented, you need to override the base vtable even if you don't change the functions that are called.
Now constructors always call the vtable initialization routine, and then if the type doesn't have any vptrs, nothing happens.
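The C++ equivalent of the case that broke looks roughly like this (the type names are made up for illustration): a derived type that adds no dynamic functions of its own must still report its own type through a base pointer, so its construction has to install fresh RTTI data.

```cpp
#include <typeinfo>

struct Base {
    virtual ~Base() = default;
    virtual int f() const { return 1; }
};

// Declares no dynamic functions of its own, yet still needs its own RTTI entry.
struct Derived : Base {};

// True only when the dynamic type of `b` is Derived.
bool reports_dynamic_type(const Base& b) {
    return typeid(b) == typeid(Derived);
}
```

If the vtable were left untouched for `Derived`, `typeid` through a `Base` reference would report `Base`.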
Furthermore, I now have a run-time dependency on the C++ Standard Library, even for pure Wide code. I have cracked up typeid() and a couple tests for it. dynamic_cast should not be hard- the routine is a library routine, all I need to do is implement a small Expression wrapper on top.
So the ABI checklist now looks like this:
- Dynamic_cast (easy)
- Exception handling (oh shit...)
- Change layout algorithm to be Itanium-compliant (shouldn't be too hard)
- Fix Wide types to be exposable even if they inherit (really depends on Clang)
- Fix some Wide types exposure like overload set, lambda (shouldn't be too hard)
- Look into MS ABI support (dunno)
- Implement abstract types (should be easy I hope)
- Fix deleting destructors. Right now they only destroy and that's bad.
RTTI- check
Just had a good go at implementing RTTI. I had a brainwave, which is that if I always create the Clang aggregate TU, I can query it for everything I need, so types that have a Clang type can just delegate to Clang for their RTTI implementation. Most of the rest are a simple case of "use the RTTI vtables and add a null-terminated string". This means that in principle, I can now implement dynamic_cast, typeid(), and begin work on EH.
I haven't exactly written many tests for it, though...
But when EH is done, then I feel like I will be on much more solid ground. Some MS ABI support would be nice too.
Wednesday, 4 June 2014
sizeof(), ABI, and code duplication
I'm looking at implementing more ABI stuff. Right now, I have the vtable layout stuff fixed, I hope- the vtable layout can now contain things other than virtual functions, like virtual destructors, deleting destructors, offsets (glory) and RTTI pointers.
What I'm really thinking about right now though is layout.
In order for Wide and C++ to communicate using a type, they have to agree on its layout. You could not have a type where Wide thinks one subobject is in a different place. The problem with this is that Clang cannot lay out arbitrary types in the way that Wide can- it can only lay out Clang types, and it can only do so in the context of a particular translation unit.
The real problem is sizeof(). Since sizeof() is a constant, I have to know when you request the size how to lay out the class. If I lay it out in one way, and then Clang lays it out in another, I can't simply drop my own layout. I have to know beforehand. This means either strictly laying out all classes in the same way as Clang (sucks), and duplicating their layout code, or, change sizeof().
I've been thinking about introducing a new class of value- a semi-constant, you could call it. The value would not be a constant (since it's only semi) but not vary at run-time. There would be some language features that could accept semi-constants instead of constants- say, array size.
Another advantage of this would be that strictly speaking, the code would be more platform-independent. One of the reasons that C++ is not platform independent right now is that when you use sizeof(), it has to tell you the size. You can't port the IR output of Clang from x86 to x64 because the sizeof()s will be incorrect.
But in principle, a hypothetical Wide VM could use the same LLVM IR across multiple platforms. There's already work in this direction with PNaCl.
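As a loose C++ analogy for the semi-constant idea (this is not Wide syntax, and `Widget`, `query_widget_size` and `make_storage` are hypothetical names): a true constant can size a built-in array but is baked into the output, while a value fixed at startup cannot size a built-in array and needs storage that is sized at run time.

```cpp
#include <cstddef>
#include <vector>

struct Widget { int id; double weight; };

// A true constant: usable as an array bound, but fixed into the emitted code.
constexpr std::size_t kWidgetSize = sizeof(Widget);
unsigned char fixed_storage[kWidgetSize];

// Stand-in for a semi-constant: the same for the whole run, but treated as
// known only at startup. Hypothetical: imagine a loader or VM supplying it.
std::size_t query_widget_size() { return sizeof(Widget); }
const std::size_t g_widget_size = query_widget_size();

// A feature that accepts a semi-constant has to size its storage at run time.
std::vector<unsigned char> make_storage() {
    return std::vector<unsigned char>(g_widget_size);
}
```

The trade-off is the one described above: code using only `g_widget_size` never embeds a platform-specific number, at the cost of losing compile-time array bounds.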
Quick edit: First, LLVM's array types take only integer values, not constantexprs, so that would be fun. Second, turns out that Itanium ABI specifies a bunch of secondary virtual tables which are complete duplicates of the primary ones for ... some reason. I didn't have these secondary tables. This just shows that I really, really need more tests. But onwards and upwards, as they say. I will stop implementing new features when I have Itanium-compatible EH. And my code count is now 18,500. Feel the growth.
Thirdly, my laparoscopy is scheduled for Monday. If the surgeon gives me the all clear then I'm done, done, done.
Tuesday, 3 June 2014
Implemented basic constructor exporting: check.
Implemented basic destructor exporting: check
Fixed Coliru: check
TODO tomorrow: Virtual Itanium ABI destructors and make a start on RTTI, video maybe, definitely a lot of slacking and eating, dog cuddling and walking, you know. I've found that somehow life's more satisfying when you do stuff instead of cry about stuff.
What I really ought to do is UNINSTALL GAMES.
ABI support
I need to seriously consider how dependent I am on a particular ABI. I've been looking into adding RTTI and several parts of my code could not possibly handle another ABI. I've certainly been thinking about supporting Microsoft ABI as well as Itanium. Some ABI details Clang very neatly abstracts away from us. Some it does not.
Mangled names are one example of an ABI detail that I need virtually never concern myself with. Clang has a simple function to mangle the name, I use it, I'm done. The mangled name does not concern me in the slightest. There are a few ABI details for which this does concern me but they're quite limited and easily handled.
Class layout is (will be) another. Soon I can unify AggregateType and ClangType, and allow Clang to perform all layout for Clang types. This will simply be a question of setting the appropriate ABI and letting Clang handle the rest.
Vtable layout is something I do myself, which will require adjustment. Currently, based on some Clang APIs, for Itanium ABI I can perform a compliant vtable layout. For Microsoft ABI I'd have to rework this code.
Calling convention. Part of calling convention is handled by LLVM but another part is handled by Clang. I'm not quite sure why non-complex types are not handled entirely at the LLVM level but that's another question. I will probably have to duplicate Clang's code here (it's quite short) to determine the correct calling convention for C++ functions. For Wide functions I can use whatever calling convention I like.
RTTI will be completely ABI-dependent, as will EH. Clang contains some support routines for RTTI for Itanium, I'm not sure how solid they are for Microsoft ABI as their support for that is still under construction.
Just for reference, Itanium ABI is the one followed by GCC and Clang on nearly all platforms, optionally including Windows. ARM ABI is used on ARM processors and is a close derivative. As far as I'm aware, Microsoft are pretty much the only ones who don't follow Itanium ABI, on any platform.
I've been wondering about how to architect support for various ABIs. FunctionType, my class that handles calling functions, will probably need re-working to handle calling functions of differing ABIs, and thunk-handling code will have to be able to generate thunks for more than one ABI. For stuff like vtables, a single class can only have one vtable layout, but I figure that the base classes can have vtables in any ABI.
Currently, Wide does not take advantage of ODR- every TU's copy of a given type representation is a distinct Wide type. This is something I'd like to change but for sure, every ABI's copy of a given type is distinct.
The next thing I need to do in terms of ABI support is exporting constructors and destructors, and support virtual destructors. When this is done, I can move to RTTI and then EH.
For search paths on Coliru, I have decided to simply hardcode them into the Wide shell script. That will solve the immediate problem of not being able to use it as a demo.
Monday, 2 June 2014
Clang- not designed as a library
It's becoming all too clear to me that Clang was not, in fact, designed as a library, except for some uses supporting Intellisense and such. Here's an unfortunate and simple example. Clang has acres of code (it's really quite a lot) to handle finding G++ include paths. But it's impossible to re-use this code in Wide because their structure talks in terms of Clang driver command-line arguments. So now I can't deploy Wide to Unix systems because I can't find the G++ include paths, which vary a lot more than you'd expect from system to system (why? who knows). Clang can find them, but good luck actually getting that to function when you're using Clang as a library.
This is a prime example of what I want to avoid with Wide.
Lines of code
On a more personal note, I love watching the lines. I run the command to check how many lines Wide is nearly every commit. It's not that I feel that this is some empirical evidence of quality. We all know that adding LoC means little. But when the lines of code grows a lot, I feel like I'm making progress. Just for reference, the entire LLVM Project (including clang, and some other subprojects) has 860,000 lines of code right now, and 18,000 tests. I have 18,000 lines and 140 tests. I guess this always leaves me feeling like the small fish in the pond (also that LLVM has nearly triple the number of tests that I do when accounting for codebase size).
Obviously I like to feel like my code is high quality. I don't mind making changes that reduce the LoC and I know there are plenty of good changes that decrease it and bad changes that increase it. Tests are included in my measurement so the more tests I have, the higher that value should be. But ultimately, as an entirely subjective feeling, I feel like I should be adding to the codebase's size.
I've been sitting pretty at about 17-18k for a while now. I guess it's a good thing that I've implemented many new features like inheritance without substantially increasing the size of the codebase, and since I've introduced automated testing (with many more tests to come, hopefully) the reliability is a lot higher. And now that I'm not horribly, horribly sick, I'm much more available.
What I really need to do is ensure that I spend less of my time chatting in the Lounge, shooting people in the face, flying spacecraft, or lynching people for being the Mafia, and more time working. Also job-hunting. That would be good too. Maybe I should ask Daisy to help me, she's always happy to make sure that my left hand isn't good for much.
Time to write tests for all those new features I implemented. And devise a test driver for warnings.
Fixing up some real-world issues.
Yesterday I took a big crack at fixing up the meagre stdlib and associated code. Turns out there were a few issues that I don't currently test for. Here's the commit. Notable is that I have no tests for any of these fixes. Also notable is my lack of tests for warnings (my driver won't support them, so more fun there), no tests for the lib itself, etc. Plus I need to look into exception handling (the joy!). I also need to fix things like overload set exposure to C++. And I also need to look into more ABI support- particularly for Microsoft, but also better for Itanium, including RTTI and EH.
Once I have RTTI and EH across all platforms, I can bootstrap Wide and that shall be a glorious day. Let's face it, right now the language definitely feels like C with Classes with a few nice extras on top like lambdas. Sometimes it's hard not to try and rush directly for these features. Rushing for features in the past has been a bad move for me, though. It's clear that I still don't understand some features I want, like incremental re-analysis, and the new semantic error handling model is a mile away.
On the upside the non-void falloff warning means computing the CFG, which can resolve a few issues. LLVM is amazingly finicky about when it will and will not accept code. For example, consider the following program:
f() {
return true;
return true;
}
It's pretty clear that the second return will never be executed but you would think that it should be legal. Perhaps I will consider explicitly rejecting such code in the future. However, naively generating both returns to LLVM IR will result in an assertion failure, because LLVM will not handle more than one control flow statement per basic block. Using a CFG can avoid such problems because we can eliminate statements without any predecessors. We can also issue a warning.
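A minimal C++ sketch of the pruning step, assuming a hypothetical CFG where each block simply lists its successor indices and block 0 is the entry. It marks everything reachable from the entry, which is slightly stronger than "no predecessors" because it also catches unreachable cycles. Blocks left unmarked can be skipped at codegen and reported in a warning.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal CFG node: each block lists the indices of its successors.
struct Block {
    std::vector<std::size_t> successors;
};

// Marks every block reachable from the entry block (index 0).
std::vector<bool> reachable_blocks(const std::vector<Block>& cfg) {
    std::vector<bool> seen(cfg.size(), false);
    std::vector<std::size_t> worklist;
    if (!cfg.empty()) {
        seen[0] = true;
        worklist.push_back(0);
    }
    while (!worklist.empty()) {
        std::size_t current = worklist.back();
        worklist.pop_back();
        for (std::size_t next : cfg[current].successors) {
            if (!seen[next]) {
                seen[next] = true;
                worklist.push_back(next);
            }
        }
    }
    return seen;
}
```

For the two-return program, the first return is the entry block with no successors and the second return is a block nothing jumps to, so only the first is marked.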
This leads us to the next question. Imagine something like
f(arg) {
if (decltype(arg).size > 5)
return true;
}
It seems obvious to all that this can fall off the end of a non-void function- for some instances. Other instances cannot. We could discriminate at compile-time, but should we? Right now, the compiler will warn for this function for all instances. When I implement some constant folding (low priority) it will stop warning for instances where it's statically provable.
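In C++ terms, the per-instance answer could be modelled like this. It is only an analogy: `sizeof(T)` stands in for `decltype(arg).size`, and `may_fall_off_end` is a hypothetical helper, not part of Wide. Since the only return sits behind the size test, the end of the function is reachable exactly when the folded condition is false.

```cpp
// Per-instantiation answer to "can control reach the end of f?".
// The only return is guarded by `sizeof(T) > 5`, so the function falls off
// the end exactly when that condition folds to false for the given T.
template <typename T>
constexpr bool may_fall_off_end() {
    return !(sizeof(T) > 5);
}

using Small = char;     // size 1: the guarded return is never taken
using Large = char[8];  // size 8: the guarded return is always taken

static_assert(may_fall_off_end<Small>(), "small instances fall off the end");
static_assert(!may_fall_off_end<Large>(), "large instances always return");
```

With constant folding in place, a warning would be issued for instances like `Small` and suppressed for instances like `Large`.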
In short, it's pretty obvious that I need more manpower. There's just so many tests to write, so many new features to implement, and I need people to bounce ideas off. Monologuing into a blog only serves this purpose to some degree, and the LLVM chat only really suffices for lower-level code-generation stuff (thanks for the help on that stuff, btw). Plus, I don't get Cool Internet Points for working in silence in a corner.
I've been considering putting together a YouTube video or two about Wide. I don't know shit about animating or anything, but when I have another Unix build and upload it, I have my online compiler back again, which should make life a lot easier w.r.t. advertising the language. Just go here and play with this sample, and you can see how easy it is. Fixing up my VS addin (I doubt it needs much work) would also help in this regard.
Sunday, 1 June 2014
Uses and analyzer design
I believe I've come to the next stage of analyzer design. It occurred to me that many of the problems I'm looking at have already been solved, by LLVM. Simply stealing their design would seem to be an appropriate solution here.
The problem I've been considering is threefold. One, exceptions. Currently, I can only determine which destructors need calling at the Statement level. However, I need to be able to determine which destructors need calling at the Expression level in order to implement appropriate EH. Secondly, I've been considering the uses problem. For example, given an ImplicitTemporaryExpression, which is stored to, and I'm trying to load from, is it safe to elide the temporary and just take the value that was stored to it? Only if I'm the only user. This suggests that I need to be able to track who uses what expressions. Thirdly, I've been considering the problem of incremental re-analysis and such further. I've come to the conclusion that there are two different types of Expression. The first cannot change- it is an implementation. The second can and it would be a function. The key insight here is that first, I can represent these as different types in my analyzer. In addition, if the second is viewed as a function, then the arguments are all "metaexpressions" that are arguments- including their types, which are meta-expressions.
First, we observe that all expression dependencies form a DAG, or should do. Second, we maintain a list of those uses. When an expression's use count drops to zero, we destroy it. This gives us several things. First, the ability to find and enumerate all uses of an expression. Second, we can eliminate all those annoying ExpressionReference things.
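A rough C++ sketch of the use-list idea, loosely modelled on how LLVM values track their users. The `Expr` struct and `can_elide_temporary` are hypothetical and far simpler than the real analyzer classes; destruction at zero uses is left out, with only the count exposed.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical expression node that records both directions of each edge.
struct Expr {
    std::vector<Expr*> operands;  // expressions this node reads
    std::vector<Expr*> users;     // expressions that read this node

    // Registers `operand` as an input and records this node as its user.
    void add_operand(Expr& operand) {
        operands.push_back(&operand);
        operand.users.push_back(this);
    }

    // Unregisters this node from every operand's use list.
    void drop_operands() {
        for (Expr* operand : operands) {
            std::vector<Expr*>& list = operand->users;
            auto it = std::find(list.begin(), list.end(), this);
            if (it != list.end()) list.erase(it);
        }
        operands.clear();
    }

    std::size_t use_count() const { return users.size(); }
};

// A stored-to temporary may be elided only when `reader` is its sole user.
bool can_elide_temporary(const Expr& temporary, const Expr& reader) {
    return temporary.use_count() == 1 && temporary.users.front() == &reader;
}
```

An expression whose `use_count()` reaches zero after `drop_operands()` calls is the point at which the design above would destroy it.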
Right now, I am looking at issuing warnings through control flow analysis- e.g. flow may reach end of non-void function. But after that, it's time for another analysis overhaul... great.
I've had one last-ditch thought about the syntax, and I may just introduce attribute syntax from C#- say something like
[export := "name"]
[export := cpp("main.cpp").print]
[return := blah]
f() { return "hello"; }
I also have failed to consider exporting functions or dynamic functions where their return types or arguments are is-a matches but not an exact match. I need to unify my thunk-generating code to handle these issues.
Finally, I also need to add one of the features I really needed from this analyzer design- MultiTypeDependency. This will essentially tell the analyzer which expressions/statements hold dependencies on arguments of variable type.