I'm moving from my old "todo.txt that never gets updated" to Trello. I can now confirm that I have fixed the Clang layout thing, so Clang now uses my layout when talking about a Wide type. I even succeeded in commanding it to use my size/alignment override attributes, and the interface suggests that I can make it respect almost any layout I want, which is good.
On a not-really-related note, I also fixed arrays to now use bounds checking, and I am going to add unsafe array/pointer indexing soon. I also added a super-quick test for array interop with Clang that seems to have worked OK. I still need to handle array-to-pointer decay, though.
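To make the checked/unsafe split concrete, here is a rough C++ sketch of the idea. It is not Wide code or Wide's implementation; CheckedArray, unsafe_at and data are names invented for this example.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical fixed-size array wrapper; the names are illustrative and are
// not Wide's actual API.
template <typename T, std::size_t N>
struct CheckedArray {
    T elements[N];

    // Default indexing: validate the index before touching memory.
    T& operator[](std::size_t index) {
        if (index >= N) throw std::out_of_range("array index out of bounds");
        return elements[index];
    }

    // Explicit opt-out for code that has already proven the index valid.
    T& unsafe_at(std::size_t index) { return elements[index]; }

    // Array-to-pointer decay, spelled out for interop with C-style APIs.
    T* data() { return elements; }
};
```

The point of the split is that the safe form is the short one, and the unchecked form has to be asked for by name.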
One thing I feel I need to make a big pass on soon is error handling. Many of the parser errors are plain incorrect, and although you can add a custom production in most places or override the default, you can't create new parser errors. The parser also cannot handle multi-token operators in general, and the lexer cannot handle variable-length tokens in general, which are things I may or may not want to address. One thing the new parser structure lets me do, in theory, is automatically create errors in many places that are of about the same quality as before, which would save a lot of effort.
As for the analyzer errors, many of them are problematic: they're just throw std::runtime_error, which is somewhat difficult to deal with. I need to introduce special error handling for creating values of abstract bases, and I've also been thinking about some kind of private_cast. Even the better errors are mostly just throw SemanticError. I need to factor out errors so they are a property of expressions, or maybe also types, that you can query and ignore if you want to.
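Roughly what "errors as a property of expressions" could look like, as a small C++ sketch. Diagnostic, Expression and analyze_construction are placeholder names made up for illustration; the real analyzer classes are not shown here.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is not Wide's real class hierarchy.
struct Diagnostic {
    std::string message;
};

struct Expression {
    std::vector<Diagnostic> errors;  // filled in during analysis, never thrown

    bool ok() const { return errors.empty(); }
    void report(std::string message) {
        errors.push_back(Diagnostic{std::move(message)});
    }
};

// An analysis step records the problem on the node and carries on, so the
// caller can inspect it, show it, or ignore it.
inline Expression analyze_construction(bool type_is_abstract) {
    Expression result;
    if (type_is_abstract) result.report("cannot create a value of an abstract type");
    return result;
}
```

A caller that only wants to know whether something would work can check ok() and discard the node, with no try/catch involved.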
For Clang types, I really managed to improve the description of the header a type came from. Now it should be almost always accurate as to the header the user actually requested to include from their Wide program that resulted in that type.
But, well, I've got many things I need to do. At least this one will mean a break in fighting Clang for a while.
Friday, 29 August 2014
Wednesday, 27 August 2014
Buffing up arrays
My original implementation of arrays was a bit haphazard. I tried to treat them like any other value type. But LLVM IR brings its own array fun. In particular, you cannot index an array value with a runtime index in LLVM IR (extractvalue only takes constant indices), which is a bit problematic. Furthermore, since C++ expects arrays to always be in memory, I've decided to reclassify arrays as an "Always keep in memory" type.
I also need to confirm that my array type interoperates correctly with C++ arrays and in addition, I need to implement that bounds-checking thing.
After that, I need to
- Fix the Clang layout thing
- Extend the modules thing
- Add better module imports. May want to use keyword "import".
- Update documentation.
- Contribute some stuff to Clang and issue some Clang patches.
The Clang people don't seem to be too interested in making the Codegen headers public. Alas. It seems that they'd be perfectly happy to accept a public interface that looks something like Wide's Clang->IR layer, but I'd have to build and contribute the implementation first, or at least the interface. I might have this done in time for 3.6 or a hypothetical 3.5.1, but the bottom line is that for the foreseeable future, building and working on Wide will remain a bitch.
Tuesday, 26 August 2014
Codegen time vs Analysis time
Recently, I've been fighting a lot with the analyzer, mostly not about what gets done, but when.
Analysis-time:
Getting the order of operations wrong leads to a lot of infinite recursion and cross-dependencies in the compiler. Also raises questions about how effective stuff like warnings can be.
Codegen-time:
Shit breaks with Clang.
Some situations like vtables don't fit well into either of these buckets. The problem is that computing the vtable lazily at analysis time, as is the usual strategy, means that the offset from base to derived must be known at that time. This plays badly with converting to Clang-type for layout purposes. Yay.
But doing it at codegen-time implies that I have to analyze which functions override which others at codegen-time. When it comes to, for example, implicit conversion from A to B, this means that when analysis is finished I don't know all the functions I'm going to call and they haven't been analyzed yet, which bears badly on my ability to implement code support features.
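For the infinite-recursion side of this, one generic trick is to make lazily computed properties notice when they are asked for themselves. The sketch below is a plain C++ illustration of that idea, not how Wide's analyzer is actually written; Lazy is a hypothetical helper, and it assumes T is default-constructible.

```cpp
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical helper, not part of Wide: a value computed on first use that
// notices when its own computation asks for it again.
template <typename T>
class Lazy {
    enum class State { NotStarted, InProgress, Done };
    State state = State::NotStarted;
    std::function<T()> compute;
    T value{};  // assumes T is default-constructible

public:
    explicit Lazy(std::function<T()> fn) : compute(std::move(fn)) {}

    const T& get() {
        if (state == State::InProgress)
            throw std::logic_error("cyclic dependency during analysis");
        if (state == State::NotStarted) {
            state = State::InProgress;
            try {
                value = compute();
            } catch (...) {
                state = State::NotStarted;  // leave the slot retryable
                throw;
            }
            state = State::Done;
        }
        return value;
    }
};
```

This turns a stack overflow into an error you can report, but it does nothing about the ordering problem itself: something still has to decide whether a base offset or a vtable slot is allowed to be requested yet.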
Right now, I've been experimenting with getting Clang to lay out all Clang-compatible types for me. The problem is that the old layout code, where I duplicate the Itanium layout code (sometimes incorrectly) only required knowledge of the members. Converting to Clang type requires knowledge of, well, everything.
Turns out that Clang's ExternalASTSource is sweet, sweet sex. I'm an idiot. Let's see how far this greatness can get me.
Tuesday, 19 August 2014
Access specifiers
The more I think about this, the more I dislike C++'s "private".
If the Clang guys had made their members private and friended the classes they needed, instead of just not shipping the headers in /include, it would be impossible for Wide to exist.
It seems that I simply have a talent for working with existing rules and bending them to do things they were never supposed to do, like exception vomiting, using unique_ptr for file descriptors (I keep being told this is impossible... useless suckers), and now this stuff with Clang.
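For the curious, the unique_ptr trick goes roughly like this: unique_ptr stores whatever type the deleter names as pointer, as long as that type behaves like a nullable pointer. This is a sketch rather than my production code. FdHandle, FdCloser, UniqueFd and close_fd are names invented for this example, and close_fd is a stand-in for the real ::close so the snippet runs anywhere.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Stand-in for ::close so the sketch runs anywhere; it just records the call.
std::vector<int> closed_fds;
inline void close_fd(int fd) { closed_fds.push_back(fd); }

// Minimal NullablePointer wrapper: -1 plays the role of "null", because 0 is
// a perfectly valid descriptor.
struct FdHandle {
    int fd = -1;
    FdHandle() = default;
    FdHandle(std::nullptr_t) {}
    FdHandle(int value) : fd(value) {}
    explicit operator bool() const { return fd != -1; }
    friend bool operator==(FdHandle a, FdHandle b) { return a.fd == b.fd; }
    friend bool operator!=(FdHandle a, FdHandle b) { return a.fd != b.fd; }
};

// unique_ptr stores Deleter::pointer when the deleter declares one.
struct FdCloser {
    using pointer = FdHandle;
    void operator()(FdHandle handle) const { close_fd(handle.fd); }
};

using UniqueFd = std::unique_ptr<int, FdCloser>;
```

The int in unique_ptr<int, FdCloser> is never used as a pointee; it is only there because the template wants an element type.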
For example, I know for a fact right now that the LLVM/Clang codebase contains a re-implementation of std::vector because the existing std::vector doesn't do small-allocation stuff. If only std::vector were more open, they would not have had to do that. They could have just futzed with the internals in a somewhat unsafe way.
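And protected is even worse. It frustrates me that protected methods and fields can effectively only be used on "this": a derived class can reach them through objects of its own type, but not through a reference to the base or to a sibling class. I have several functions where what I really wanted was for them to be non-public but accessible to derived class implementations.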
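A small plain-C++ illustration of the restriction, plus the usual member-pointer loophole people reach for. Shape and Square are made-up classes for this example.

```cpp
// Plain C++ illustration; the class names are made up for this example.
class Shape {
protected:
    int cached_area() const { return area; }
    int area = 0;

public:
    explicit Shape(int a) : area(a) {}
    virtual ~Shape() = default;
};

class Square : public Shape {
public:
    explicit Square(int a) : Shape(a) {}

    // Fine: the protected member is reached through a Square.
    bool same_as(const Square& other) const {
        return cached_area() == other.cached_area();
    }

    // Not allowed, although Square derives from Shape:
    //     int peek(const Shape& other) const { return other.cached_area(); }

    // The usual loophole: naming the member through Square passes the access
    // check, yet the resulting pointer has type int (Shape::*)() const.
    int peek(const Shape& other) const {
        return (other.*(&Square::cached_area))();
    }
};
```

That the loophole exists at all is a decent hint that the rule is not buying much safety.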
At the moment, "protected" in Wide is the revised definition, but I've been thinking about how I might want to nerf private. I've been thinking that I might introduce specifiers like module private/module protected/module public, where other types or functions defined in that module can get extra access rights.
I've also been thinking about adding a kind of access_cast to the language that ignores accessibility-specifiers. But that would entail Improved Error Handling™.
Just as a side note, it occurred to me that I'll need field attributes for fine control of binary module exporting.
Thursday, 14 August 2014
Working with the Clang guys
I've officially begun the process of co-ordinating with the Clang devs to fix the stuff that needs fixing and hopefully push the Wide supporting APIs to be part of Clang proper instead of being hacked on top of their private internals. This will hopefully result in Wide being more robust, smaller, much easier to build, and also, I'll be getting some OSS commits for respectable projects on my CV, which will definitely help matters out.
The downside of this is that it may hinder future development on Wide directly, since I'll be working on fixing Clang instead.
The next thing that's Full Steam Ahead™ for Wide directly will probably be the first revision of modules. I have a few tricks up my sleeve, but for now, I'll only be aiming to handle non-generic code. It seems that unfortunately I'll have to add yet another dependency, or more likely, two, for this. Glorious. In a not-at-all kind of way.
But before doing that, I've got a slew of jobs to apply for, I just finished updating my website, and I've got a bunch of other stuff to do.
Tuesday, 12 August 2014
More Clang limitations
Welp, just ran into a new Clang problem. Turns out that they can't handle multiple codegen units going into the same module.
The exacerbating problem is that I simply can't really patch Clang. My machine is too weak to feasibly rebuild and test the damn thing because it's so big, and even if I succeeded, requiring patches to be applied would only make worse an already problematic build process.
I'll probably just have to employ a hack that requires a bit more grunt work: manually copying the Clang declarations/types/etc. that I need from one module to another. This is gonna suck tremendously.
Monday, 11 August 2014
Purity
Today, I had an idea.
LLVM has this really annoying verification function where, when it fails, it dumps an error to stderr and then terminates the process. This is super irritating, partly because if Wide is, say, running as part of a VS extension, then there's nobody to see that error.
The key here is that the guy who wrote that part of the LLVM code made an assumption. He made an assumption about how the library would be used- as part of a command-line application. This assumption is obviously bad. This is why for Wide I'm choosing to restrict stdin/stdout/stderr.
But what I suddenly realized is that this is a vague match for that endless purity bullshit the Haskell guys like to spew. The libraries should not perform I/O for themselves, but only request data from higher up. Only the final driver author knows how the final product is used. He knows whether it's a command-line app or a VS extension or a web service or library for someone else to consume arbitrarily.
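As a sketch of what that means at the interface level, here is a C++ example where the library reports problems through a callback the driver supplies, and never prints or exits on its own. This is not LLVM's API or Wide's; ErrorSink and verify_names are hypothetical names, and the "verification" is a toy check.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical library entry point: it never prints and never exits.
// The caller supplies the sink and decides what a failure means.
using ErrorSink = std::function<void(const std::string&)>;

// Returns true if every function name is non-empty; each problem is reported
// through the sink instead of being written to a stream.
inline bool verify_names(const std::vector<std::string>& names, const ErrorSink& sink) {
    bool ok = true;
    for (std::size_t i = 0; i < names.size(); ++i) {
        if (names[i].empty()) {
            ok = false;
            if (sink) sink("function " + std::to_string(i) + " has no name");
        }
    }
    return ok;
}
```

A command-line driver can pass a sink that writes to stderr, a VS extension can pass one that fills an error list, and the library code is identical in both cases.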
The problem seems to be that they simply express why things should be this way for totally the wrong reasons. Instead of shitting around about currying or mathematically correct functionality or some other random thing nobody cares about, they should get to the real reason why restricting I/O and side effects is a good thing- because it makes programs more maintainable and helps enforce separation of concerns. If they just gave an example like that, it'd be way easier to buy in.
On the level of implementing any given function, I tend to use mutability a lot. But on the level of interface design, I'm increasingly trending towards immutability, purity, and a bunch of other things.
I'm definitely still not feeling positive towards Haskell, though. Too much wankery, too little choice. Purity and immutability are tools, nothing more, and they are to be applied to the right situations, and nowhere else. Being forced into them is no better than being forced into using inheritance for everything in Java. But hell, if you made a Haskell where you could mutate function-local variables, I might start buying into that shit.
Sunday, 10 August 2014
New build up on Coliru
Finally found the source of my Linux build failures. Turns out that LLVM's process API simply doesn't work for parallel invocations on Linux. Thanks, LLVM. I should look into submitting a patch for that. As a result, I finally have a new build up on Coliru. It's time for me to revisit the contents of my website again. And maybe stick a link up to this blog.
The general new features are the thunking improvements I discussed previously. I also first-passed delegating constructors. One thing I need to fix is that when I changed member initialization syntax, I changed it in such a fashion that you can no longer initialize a member with multiple arguments /whoops.
In general, I feel like I want to clean up handling members. Right now, there are too many indices going around which are converted all over the place. I'm fairly sure they're all correct but it's too easy to mix up one unsigned value with another.
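One way to stop indices getting mixed up is to give each kind its own type, so the compiler rejects the wrong one. The C++ sketch below shows the shape of it. FieldIndex, LayoutIndex and the "one hidden slot at the front" layout are assumptions invented for the example, not Wide's real index kinds.

```cpp
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical tag types; Wide's real index kinds may differ.
struct FieldTag {};
struct LayoutTag {};

template <typename Tag>
struct Index {
    unsigned value;
    explicit Index(unsigned v) : value(v) {}
};

using FieldIndex = Index<FieldTag>;    // position in declaration order
using LayoutIndex = Index<LayoutTag>;  // position in the emitted struct

// Neither a raw unsigned nor the other index kind converts silently.
static_assert(!std::is_convertible<unsigned, FieldIndex>::value, "no implicit wrap");
static_assert(!std::is_convertible<FieldIndex, LayoutIndex>::value, "no implicit mix-up");

// Conversions happen in one named place instead of ad hoc arithmetic.
// Assumed layout for this sketch: one hidden slot (say, a vptr) at the front.
inline LayoutIndex to_layout(FieldIndex field) {
    return LayoutIndex(field.value + 1);
}

inline const std::string& layout_slot(const std::vector<std::string>& slots,
                                      LayoutIndex index) {
    return slots.at(index.value);
}
```

Passing a FieldIndex where a LayoutIndex is expected then fails to compile instead of quietly reading the wrong member.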
So right now I'm thinking most about modules and ABI. The real trick is going to be what happens when there's an ABI mismatch. I have a few sprinkles of magic I'm planning to add.
Friday, 8 August 2014
C ABI- done
Right. I kicked Clang into generating C ABI function calls for me, as well as generating thunks for me. Now all the things work correctly and according to the specified ABI. But I'm kinda nervous about how I'm initializing my Clang parameters. There are some values that I think should be initialized for a given target, but aren't.
Also, since TeamCity is down, I can't test these changes on any platform except Win32 MinGW32, which is leaving me a little iffy. So there may yet be work to do in this area.
I also found a bunch of unrelated bugs and fixed them.
Not quite sure what to do next, but I think I might look at modules. I have some plans for Wide ABI and I'd like to look into implementing them.
Monday, 4 August 2014
C ABI, attributes, and attribution.
LLVM and Clang's function handling is a bit problematic. The function types don't express anywhere near everything you need to call the function properly. They have a bunch of attributes that you have to handle manually (including calling convention!).
But I've realized that the real problem is that I don't know the C ABI. I've been assuming that the LLVM layer handles it, when in reality, it does not. This could explain quite a bit of the deficient behaviour/plain WTFery I've observed from Clang (but far from all of it). However, once I've got this under control, I can finally and confidently erase whatever Clang does and just do my own shit without having to worry about ABIs and why the fuck does Clang generate that code there.
I also added llvm-credits and llvm-licence to my repo. I probably need to change the deploy script to include them in the build.
Saturday, 2 August 2014
Function arguments
Today I mostly cleaned up function arguments. I fixed exported members so that they can have an implicit this. I fixed member functions with an implicit this so that they don't generate a new member function body for value/rvalue/lvalue. I fixed overload resolution for exact match preference as well as is-a preference so you can now overload for value/rvalue/lvalue. I reduced a bunch of code duplication by making various methods available on the Analyzer. I fixed non-static functions to generally be much more reliable.
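For readers who think in C++, the closest analog to overloading on value/rvalue/lvalue is ref-qualified member functions plus overloads on reference kind. The snippet below is only that analog, in plain C++; it does not show Wide syntax, and Buffer, take and category are made-up names.

```cpp
#include <string>
#include <utility>

// C++ analog only; Wide's own syntax and semantics are not shown here.
struct Buffer {
    std::string text;

    // Chosen when the object is an lvalue: hand out a copy.
    std::string take() const& { return text; }

    // Chosen when the object is an rvalue: steal the storage.
    std::string take() && { return std::move(text); }
};

// Overloading free functions on the value category of the argument.
inline const char* category(const Buffer&) { return "lvalue"; }
inline const char* category(Buffer&&) { return "rvalue"; }
```

In C++ each qualifier gets its own function body, which is exactly the duplication the implicit-this fix above avoids on the Wide side.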
I still have mysterious test failures on Linux. I've had problems reproducing the issues; there's clearly some undefined behaviour in there somewhere. Run the tests directly: success every time. Run them from the driver and a random 100-150 of them will fail. TeamCity is down right now anyway, so I can't farm out Linux builds.
I've been looking into debug information and there's both good and bad news. The good news is that it looks like it could be fairly easy to implement basic debug info. The bad news is that LLVM's debug info intrinsics are somewhat broken, so there's a limit as to what I can do with them. The other bad news is that I'd have to pretend to be C++.
What I'm really thinking right now is that under certain circumstances, I'll forbid C++ conversion. There are just a few too many questions about how I'd implement various Wide features with no C++ analog.