Thursday 27 November 2014

Clang interop and VS extension

I've gotta get back cracking on Clang interop. This is one of the main things I needed to do before, but that got delayed due to circumstances. I need to present an interface they can use for IR interop behind the codegen layer, then ideally, provide something like an ExternalCodegenSource you can use.

From memory and according to Trello, I fixed Clang interop so I could lay out Wide types however I like. However, this still requires a direct pointer index to find all members, which means that ABI-indirected types can't be supported with a native interface in C++.

In addition, now that I'm finished refactoring my parser, it's time to take another swing at VS extensibility and the C API. That hasn't been maintained in a long time, but I have the power to offer many more capabilities now. For example, in the past, I had to call QuickInfo every time I wanted to provide QuickInfo. Now I can offer the reverse- as in, the user provides a source location, and I can tell them what's there. I think that not only is this more efficient, but it makes for a nicer interface, and I can also provide multiple results, for example if the location is in a template function with many arguments. Other functions could be supported too, like automated refactoring. If I can demo some R#-style refactoring, that would be nice.

This also implies directly that I can offer incremental re-lexing and incremental re-parsing and potentially opens a gateway into incremental re-analysis too.

Sunday 28 September 2014

Random-access into UTF8

I've heard many people state that you can't random-access (that is, read a random codepoint in O(1)) into UTF-8, or other variable-length encodings. This, however, is simply not true. It's possible to generalize deque to provide this functionality.

For the purpose of this article, we consider deque to be an array of arrays- where each subarray has a constant maximum size. For simplicity, we'll consider it as vector<unique_ptr<array<T, N>>>. Thus, for some index i, we simply use i / N to find the subarray, and i % N to find the index within that subarray. This gives us the final element location in our array of arrays.

The key insight here is that since each subarray has a constant maximum size, it actually doesn't matter whether we use an algorithm with a linear complexity to locate the final element inside this subarray, since it's linear in a constant factor.

So imagine a specialization of deque<codepoint>, which is a vector<unique_ptr<vector<codeunit>>>. Each subarray can hold N codepoints, just like before. So to find the subarray holding the codepoint corresponding to index i, we perform the same i / N step. Now we perform a linear scan decoding each UTF-8 codepoint in the array, but since it's linear in our constant factor of N, it still has constant complexity.

For somewhat less simplicity, we could simply go with unique_ptr<codeunit[]>, and then use the fact that all but the last subarray must be full of N codepoints to find the end. This gives us the exact same core structure as before and as deque<int>; it's just that the raw size of the subarrays can shrink to accommodate smaller sizes.
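A minimal C++ sketch of the chunked layout, under some loud assumptions: the name Utf8Chunks is made up for illustration, each chunk is a std::string purely for brevity (a real implementation would pick something leaner), and the input is assumed to be well-formed UTF-8 with no validation at all. Lookup is one division to pick the chunk, then a scan over at most N - 1 sequences.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container: UTF-8 text stored as chunks of at most N codepoints each.
template <std::size_t N>
class Utf8Chunks {
    static_assert(N > 0, "a chunk must hold at least one codepoint");

public:
    // Assumes `utf8` is well-formed UTF-8; no validation is attempted.
    explicit Utf8Chunks(const std::string& utf8) {
        std::size_t in_chunk = N;  // forces a new chunk for the first codepoint
        for (std::size_t pos = 0; pos < utf8.size();) {
            if (in_chunk == N) {
                chunks_.emplace_back();
                in_chunk = 0;
            }
            const std::size_t len = sequence_length(utf8[pos]);
            chunks_.back().append(utf8, pos, len);
            pos += len;
            ++in_chunk;
            ++size_;
        }
    }

    // Number of codepoints stored.
    std::size_t size() const { return size_; }

    // Decode codepoint i: i / N picks the chunk, then i % N sequences are skipped.
    char32_t operator[](std::size_t i) const {
        assert(i < size_);
        const std::string& chunk = chunks_[i / N];
        std::size_t pos = 0;
        for (std::size_t skip = i % N; skip > 0; --skip)
            pos += sequence_length(chunk[pos]);
        const std::size_t len = sequence_length(chunk[pos]);
        const unsigned char lead = static_cast<unsigned char>(chunk[pos]);
        // Keep only the payload bits of the lead byte, then fold in continuation bytes.
        char32_t cp = (len == 1) ? lead : (lead & (0xFF >> (len + 1)));
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(chunk[pos + k]) & 0x3F);
        return cp;
    }

private:
    // Length in code units of the sequence starting with this lead byte.
    static std::size_t sequence_length(char first) {
        const unsigned char lead = static_cast<unsigned char>(first);
        if (lead < 0x80) return 1;
        if ((lead >> 5) == 0x06) return 2;
        if ((lead >> 4) == 0x0E) return 3;
        return 4;
    }

    std::vector<std::string> chunks_;
    std::size_t size_ = 0;
};
```

The scan inside operator[] is bounded by N, which is a compile-time constant, so the per-lookup cost does not grow with the length of the text.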

Arguably, it's questionable as to whether this is superior to just using a deque of 32bit codepoints directly, since in theory it offers memory savings but I'm not sure how it plays out in reality, and it's definitely questionable as to how useful it would be to random-access codepoints anyway.

Monday 22 September 2014

Employment

Welp, I found myself work, so I won't be spending all day dicking around on Wide anymore. Life's tough. And hopefully financially independent. For once.


Saturday 13 September 2014

Constants, variables, and laziness.

Today I finally got rid of that dumb "Every string literal has a unique type" thing. It's a holdover from pre-constant-expression days. I nuked the code paths that used to handle it. And I discovered that this code path also handled my local variable as reference solution.

So I decided to just whack it and change that.

Now I will use "var : type = value" for variables whose type needs to be explicitly specified. And it will also be useful for members (contrast "var : type;" with "var := value;" with NSDMIs or function arguments). Speaking of which, I should look into function defaulted arguments and such again.

I've been thinking about using type & type as a tuple syntax- so you could do f() := int64 & int64 to denote a function returning a tuple. My current module export code REQUIRES that all types have a notation, and I'm not sure that's a bad thing, it's certainly motivational to fix such issues. As a pairing, I've been thinking about using | to denote a kind of language-implemented variant. One of the keys here is that the compiler can translate it into different run-time semantics. For example, if you said
 
    f(arg : int32 | int64)

then the compiler has no obligation to differentiate them at runtime. It could also simply generate two different overloads and branch on call if necessary.
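As a rough C++ analogy only (this is not Wide code, and the function names are invented for illustration), the two lowering strategies look something like the sketch below: one body that discriminates at runtime, or one overload per alternative with the choice made at the call site.

```cpp
#include <cassert>
#include <cstdint>
#include <variant>

// Strategy 1: a single function body; the alternative is discriminated at runtime.
std::int64_t twice_tagged(const std::variant<std::int32_t, std::int64_t>& arg) {
    return std::visit([](auto v) { return std::int64_t{v} * 2; }, arg);
}

// Strategy 2: one overload per alternative; the call site picks statically.
std::int64_t twice(std::int32_t v) { return std::int64_t{v} * 2; }
std::int64_t twice(std::int64_t v) { return v * 2; }
```

A caller that already holds a tagged value would still need a branch to reach the right overload in the second strategy.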

I'm also thinking about implementing something like tuple[i], as long as i is a constant expression.

Long story short, I'm a smidge burned out on modules. I've been faffing around with dependencies and implements too long.

Wednesday 10 September 2014

Dead Zone

I've been suffering a double whammy of 3RFTS and Planetary Annihilation. My brain is concrete. Using the same machine for both fun and work is problematic.

Friday 5 September 2014

Module dependency data

So today I successfully exported a user-defined type with opaque members (i.e, the data members were not exposed to the caller). There are three more key features I want to finish up for modules.

First is "virtual headers". How this will work is (somewhat) simple. All you do is nominate a directory as a "header" directory. I copy all the headers in that directory (recursively) and stick them in the module archive in a certain folder. When the consumer uses the interface, the headers in the archive can be accessed as their filepath relative to the original directory they came from- so effectively, import/headers is added as an include directory. So if you create, say, a "Boost" module, then you can add Boost headers right into the module and ship it, so the user doesn't need to crap around with getting the headers.

Second, I want to separate interface and implementation modules. Right now they are all the same thing, but I need to split them off so that you can create a module against an interface of another module, then link in various implementations later. As part of this feature, I also need to decide what information modules need to hold about their direct and indirect dependencies, and how to match implementations with interfaces.

I've been thinking about marking each interface and implementation with a UUID, and then referring to each of them that way. Then you could get implementations from a central database by just asking for an implementation of a given UUID. For direct and indirect dependencies, I could simply list them as a UUID. If I kept around the full interface then if you want to use a dependency directly, it would be simpler.

Thirdly, I need to implement one of the primary features - exporting an implementation against an existing interface. My existing hack for imports and exports (create valid Wide source with some hidden attributes for binary interfacing and then just include it directly) is probably not going to function here. I think I can still re-use the basic subcomponents of the lexer and parser but the analyzer will likely require special support.

I think that #1 will be easy to finish up, the first part of #2 shouldn't be too hard but the second part may be harder, and #3 will be moderately difficult. Fortunately the Wide compiler is already mostly extremely abstract in how it represents things, so implementing them in a funky-tastic way shouldn't be too bad. I think that it will make a clear case that Wide can do things better when I can add data members and change the size/alignment without breaking binary compatibility.

Thursday 4 September 2014

ABIs, complex types, and move elision

I've run into a slight problem with the ABI for certain types in a specific situation. Bear with me.

Imagine that we have a type T which has as a member a "complex" type. This is basically any type with a non-trivial copy ctor/dtor, so std::string, say. Due to the identity requirements of being able to see the address in the ctor/dtor, this type always has to be held in memory and never in a register. This means that when you have a "value" of that type, under the hood it's really a pointer.

Let's say that you have a value of this type T, and you access it as a member. Currently, Wide elides the move by simply addressing the subobject's memory.

But when exposing the access as a function, it's a lot harder because the compiler can't see "into" the function and avoid the double destruction of the return value. In addition, the existing ABI for returning complex types does not permit returning a reference into or to one of the arguments, even though this could be useful, I feel. You absolutely must return a new value (in this case, a move).

I think that for now I will simply implement the function to move. But I am uncertain as to whether it would be worth changing the ABI to support this case, and if it was, how I would do so. It seems to be roughly in line with a similar plan that I had where a function can request an arbitrary amount of scratch memory from the caller, then place multiple potential return values in there. When the function returns, it destroys the non-returned values, and simply returns a pointer to something in that space. This scheme would have to be extended slightly to include a value for whether or not to destroy the thing pointed to by the return value.

Not sure if just moving the damn object is slower, after that.

I've also been thinking of a feature where the destructor can take the object by value. The current constructor/destructor system only allows you to express that an object's identity must be destructed, not that its value must be destructed. To wit, this would state that a trivial move/copy is acceptable, as long as either src or dest is trivially destructed- i.e., every value of a user-defined type must only be destructed once, but the identity is unimportant. This is, again, useful for ABI-related details and various implementations. For example, you could vectorize copies or moves of something like std::string, or return/pass them in registers. As with the above, I'm honestly not sure if this theoretical optimization will turn out to be worth implementing. Not to mention that I have other ideas for passing parameters to destructors.

Monday 1 September 2014

New parser errors

As a result of the previous rebuild of the parser to be in good part table-driven, I now have automated parser error generation. This means that when the user alters the parser's data tables, the errors are updated automagically. I also cut 200 lines from the parser. Let's have a brief look at the Wide breakdown.

Premake Lua script: 394
Website source: 138 Python, 1133 HTML
VisualWide: 2172 of C#
Lexer: 590
Parser: 1900
ParserTest: 261
Semantic: 11835
SemanticTest: 2971
Util: 1073
CLI: 368
CAPI: 279
WideLibrary: 432
Total: 23549.

Perhaps the thing that needs the most love right now is VisualWide. I haven't worked on it in a long time and the parser/analyzer have changed a lot since then. I doubt the thing really works.

Friday 29 August 2014

Some fixes, also Trello

Moving from my old "todo.txt that never gets updated" to Trello. I can now confirm that I have fixed the Clang layout thing so now, Clang uses my layout when talking about a Wide type. I even succeeded in commanding it to use my size/alignment override attributes and the interface suggests that I can make it respect almost any layout I want, which is good.

In a not-really-related note, I also fixed arrays to now use bounds checking, and I am going to add unsafe array/pointer indexing soon. I also added a super-quick test for array interop with Clang that seems to have worked OK. I need to handle array-to-pointer decay, though.

One thing I feel that I need to make a big pass on soon is error handling. Many of the parser errors are plain incorrect, and while you can make a custom production in most places or override the default, you can't create new parser errors. Also the parser cannot handle multi-token operators in general, and the lexer cannot handle variable-length tokens in general, which are things I may or may not want to address. One of the things I can do with the new parser structure is, in theory, automatically create errors in many places which are of about the same quality as before, which would save a lot of effort.

And many of the analyzer errors are problematic: they're just throw std::runtime_error, which is somewhat difficult to deal with. I need to introduce special error handling for creating values of abstract bases, and I've also been thinking about some kind of private_cast. Even the better errors are mostly just throw SemanticError. I need to factor out errors so they are a property of expressions, or maybe also types, that you can query and ignore if you want to.

For Clang types, I really managed to improve the description of the header a type came from. Now it should be almost always accurate as to the header the user actually requested to include from their Wide program that resulted in that type.

But, well, I got many things I need to do. At least this one will mean a break in fighting Clang for a while.

Wednesday 27 August 2014

Buffing up arrays

My original implementation of arrays was a bit haphazard. I tried to treat them like any other value type. But LLVM IR brings its own array fun. Particularly, you cannot random-access an array value in LLVM IR, which is a bit problematic. Furthermore, since C++ expects arrays to always be in memory, I've decided to reclassify arrays as an "Always keep in memory" type.

I also need to confirm that my array type interoperates correctly with C++ arrays and in addition, I need to implement that bounds-checking thing.

After that, I need to
  1. Fix the Clang layout thing
  2. Extend the modules thing
  3. Add better module imports. May want to use keyword "import".
  4. Update documentation.
  5. Contribute some stuff to Clang and issue some Clang patches.
Peasy, right?

The Clang people don't seem to be too interested in making the Codegen headers public. Alas. It seems that they'd be perfectly happy to accept a public interface that looks something like Wide's Clang->IR layer, but I'd have to build and contribute the implementation first, or at least the interface. I might have this done in time for 3.6 or a hypothetical 3.5.1, but the bottom line is that for the foreseeable future, building and working on Wide will remain a bitch.

Tuesday 26 August 2014

Codegen time vs Analysis time

Recently, I've been fighting a lot with the analyzer, mostly not about what gets done, but when.

Analysis-time:
Getting the order of operations wrong leads to a lot of infinite recursion and cross-dependencies in the compiler. Also raises questions about how effective stuff like warnings can be.

Codegen-time:
Shit breaks with Clang.

Some situations like vtables don't fit well into either of these buckets. The problem is that computing the vtable lazily at analysis time, as is the usual strategy, means that the offset from base to derived must be known at that time. This plays badly with converting to Clang-type for layout purposes. Yay.

But doing it at codegen-time implies that I have to analyze which functions override which others at codegen-time. When it comes to, for example, implicit conversion from A to B, this implies that when analysis is finished I don't know all the functions I'm going to call and they haven't been analyzed yet, which bears badly on my ability to implement code support features.

Right now, I've been experimenting with getting Clang to lay out all Clang-compatible types for me. The problem is that the old layout code, where I duplicate the Itanium layout code (sometimes incorrectly) only required knowledge of the members. Converting to Clang type requires knowledge of, well, everything.

Turns out that Clang's ExternalASTSource is sweet, sweet sex. I'm an idiot. Let's see how far this greatness can get me.

Tuesday 19 August 2014

Access specifiers

The more I think about this, the more I dislike C++'s "private".

If the Clang guys had made their members private and friended the classes they needed, instead of just not shipping the headers in /include, it would be impossible for Wide to exist.

It seems that I simply have a talent for working with existing rules and bending them to do things they were never supposed to do, like exception vomiting, using unique_ptr for file descriptors (I keep being told this is impossible... useless suckers), and now this stuff with Clang.
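For the curious, one way the unique_ptr-for-file-descriptors trick can be done is sketched below. This is not necessarily how I do it in Wide; it leans on the standard rule that unique_ptr takes its pointer type from the deleter's nested `pointer` typedef, as long as that type behaves like a nullable pointer. FdHandle, FdCloser, UniqueFd and open_read_only are names invented for this sketch, and it assumes a POSIX system where -1 means "no descriptor".

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

#include <fcntl.h>   // POSIX open
#include <unistd.h>  // POSIX close

// Nullable-pointer-like wrapper around an int descriptor; -1 plays the role of nullptr.
struct FdHandle {
    int fd = -1;
    FdHandle() = default;
    FdHandle(std::nullptr_t) {}
    FdHandle(int value) : fd(value) {}
    explicit operator bool() const { return fd != -1; }
    friend bool operator==(FdHandle a, FdHandle b) { return a.fd == b.fd; }
    friend bool operator!=(FdHandle a, FdHandle b) { return a.fd != b.fd; }
};

// Deleter: the nested `pointer` typedef tells unique_ptr to store an FdHandle, not an int*.
struct FdCloser {
    using pointer = FdHandle;
    void operator()(FdHandle handle) const { ::close(handle.fd); }
};

using UniqueFd = std::unique_ptr<int, FdCloser>;

// Opens `path` read-only; the result is empty (tests false) if open fails.
UniqueFd open_read_only(const char* path) {
    return UniqueFd(FdHandle(::open(path, O_RDONLY)));
}
```

The descriptor is closed when the UniqueFd goes out of scope or is reset, and the deleter is never invoked for the empty (-1) state.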

For example, I know for a fact right now that the LLVM/Clang codebase contains a re-implementation of std::vector because the existing std::vector doesn't do small-allocation stuff. If only std::vector were more open, they would not have had to do that. They could have just futzed with the internals in a somewhat unsafe way.

And protected is even worse. It frustrates me that protected methods and fields can only be used on "this". I have several functions where what I really wanted was to be non-public but accessible to derived class implementations.

At the moment, "protected" in Wide is the revised definition, but I've been thinking about how I might want to nerf private. I've been thinking that I might introduce specifiers like module private/module protected/module public, where other types or functions defined in that module can get extra access rights.

I've also been thinking about adding a kind of access_cast to the language that ignores accessibility-specifiers. But that would entail Improved Error Handling™.

Just as a side note, it occurred to me that I'll need field attributes for fine control of binary module exporting.

Thursday 14 August 2014

Working with the Clang guys

I've officially begun the process of co-ordinating with the Clang devs to fix the stuff that needs fixing and hopefully push the Wide supporting APIs to be part of Clang proper instead of being hacked on top of their private internals. This will hopefully result in Wide being more robust, smaller, much easier to build, and also, I'll be getting some OSS commits for respectable projects on my CV, which will definitely help matters out.

The downside of this is that it may hinder future development on Wide directly, since I'll be working on fixing Clang instead.

The next thing that's Full Steam Ahead™ for Wide directly will probably be the first revision of modules. I have a few tricks up my sleeve, but for now, I'll be only aiming to handle non-generic code. It seems that unfortunately I'll have to add yet another dependency, or more likely, two, for this. Glorious. In a not-at-all kind of way.

But before doing that, I've got a slew of jobs to apply for, I just finished updating my website, and I've got a bunch of other stuff to do.

Tuesday 12 August 2014

More Clang limitations

Welp, just ran into a new Clang problem. Turns out that they can't handle multiple codegen units going into the same module.

The exacerbating problem is that I simply can't really patch Clang. My machine is too weak to feasibly rebuild and test the damn thing because it's so big, and even if I succeeded, requiring patches to be applied would only make worse an already problematic build process.

I'll probably just have to employ a hack that requires a bit more grunt work- manually copy from one module to another the Clang declarations/types/etc that I need. This is gonna suck tremendously.

Monday 11 August 2014

Purity

Today, I had an idea.

LLVM has this really annoying verification function where when it fails, it dumps an error to stdout and then terminates the process. This is super irritating, partly because if Wide is, say, running as part of a VS extension, then there's nobody to see that error.

The key here is that the guy who wrote that part of the LLVM code made an assumption. He made an assumption about how the library would be used- as part of a command-line application. This assumption is obviously bad. This is why for Wide I'm choosing to restrict stdin/stdout/stderr.

But what I suddenly realized is that this is a vague match for that endless purity bullshit the Haskell guys like to spew. The libraries should not perform I/O for themselves, but only request data from higher up. Only the final driver author knows how the final product is used. He knows whether it's a command-line app or a VS extension or a web service or library for someone else to consume arbitrarily.

The problem seems to be that they simply express why things should be this way for totally the wrong reasons. Instead of shitting around about currying or mathematically correct functionality or some other random thing nobody cares about, they should get to the real reason why restricting I/O and side effects is a good thing- because it makes programs more maintainable and helps enforce separation of concerns. If they just gave an example like that, it'd be way easier to buy in.
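A tiny C++ sketch of that separation, with everything in it hypothetical (Diagnostic and verify_function_names are invented here and have nothing to do with LLVM's actual verifier API): the library-level check hands its findings back as data, and whoever drives it decides whether that means printing, showing a squiggle in an editor, or returning an HTTP error.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// A problem found by the library; it carries data only and performs no I/O.
struct Diagnostic {
    std::string message;
};

// Hypothetical library check: reports empty and duplicate function names.
// It never prints and never terminates the process; the caller gets the list.
std::vector<Diagnostic> verify_function_names(const std::vector<std::string>& names) {
    std::vector<Diagnostic> problems;
    std::set<std::string> seen;
    for (const std::string& name : names) {
        if (name.empty())
            problems.push_back({"function with empty name"});
        else if (!seen.insert(name).second)
            problems.push_back({"duplicate function: " + name});
    }
    return problems;
}
```

A command-line driver might loop over the result and write to stderr; an extension host might turn each entry into an editor annotation instead.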


On the level of implementing any given function, I tend to use mutability a lot. But on the level of interface design, I'm increasingly trending towards immutability, purity, and a bunch of other things.

I'm definitely still not feeling positive towards Haskell, though. Too much wankery, too little choice. Purity and immutability are tools, nothing more, and they are to be applied to the right situations, and nowhere else. Being forced into them is no better than being forced into using inheritance for everything in Java. But hell, if you made a Haskell where you could mutate function-local variables, I might start buying into that shit.

Sunday 10 August 2014

New build up on Coliru

Finally found the source of my Linux build failures. Turns out that LLVM's process API simply doesn't work for parallel invocations on Linux. Thanks, LLVM. I should look into submitting a patch for that. As a result, I finally have a new build up on Coliru. It's time for me to revisit the contents of my website again. And maybe stick a link up to this blog.

The general new features are the thunking improvements I discussed previously. I also first-passed delegating constructors. One thing I need to fix is that when I changed member initialization syntax, I changed it in such a fashion that you can no longer initialize a member with multiple arguments /whoops.

In general, I feel like I want to clean up handling members. Right now, there are too many indices going around which are converted all over the place. I'm fairly sure they're all correct but it's too easy to mix up one unsigned value with another.
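One common way to stop unsigned values getting mixed up is to give each kind of index its own type. A minimal C++ sketch follows; the names (Index, MemberIndex, FieldIndex, Layout) are hypothetical and not taken from the Wide source, and the split into "declared member position" versus "layout slot" is just an assumed example of two index spaces.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Tag-parameterised wrapper: indices with different tags are unrelated types.
template <typename Tag>
struct Index {
    std::size_t value;
    explicit Index(std::size_t v) : value(v) {}
};

struct MemberTag {};  // position among a type's declared members (assumed meaning)
struct FieldTag {};   // position in the emitted struct layout (assumed meaning)
using MemberIndex = Index<MemberTag>;
using FieldIndex = Index<FieldTag>;

struct Layout {
    // field_of_member[m] is the layout slot of declared member m.
    std::vector<std::size_t> field_of_member;

    // The only sanctioned conversion between the two index spaces.
    FieldIndex field_for(MemberIndex member) const {
        assert(member.value < field_of_member.size());
        return FieldIndex(field_of_member[member.value]);
    }
};
```

Passing a FieldIndex where a MemberIndex is expected is now a compile error rather than a silent bug, and every conversion between the two goes through one function.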

So right now I'm thinking most about modules and ABI. The real trick is going to be what happens when there's an ABI mismatch. I have a few sprinkles of magic I'm planning to add.

Friday 8 August 2014

C ABI- done

Right. I kicked Clang into generating C ABI function calls for me, as well as generating thunks for me. Now all the things work correctly and according to the specified ABI. But I'm kinda nervous about how I'm initializing my Clang parameters. There are some values that I think should be initialized for a given target, but aren't.

Also since TeamCity is down, I can't test these changes on any plat except Win32 MinGW32, which is leaving me a little iffy. So there may yet be work to do in this area.

I also found a bunch of unrelated bugs and fixed them.

Not quite sure what to do next, but I think I might look at modules. I have some plans for Wide ABI and I'd like to look into implementing them.

Monday 4 August 2014

C ABI, attributes, and attribution.

LLVM and Clang's function handling is a bit problematic. The function types don't express anywhere near everything you need to call the function properly. They have a bunch of attributes that you have to handle manually (including calling convention!).

But I've realized that the real problem is that I don't know the C ABI. I've been assuming that the LLVM layer handles it, when in reality, it does not. This could explain quite a bit of the deficient behaviour/plain WTFery I've observed from Clang (but far from all of it). However, once I've got this under control, I can finally and confidently erase whatever Clang does and just do my own shit without having to worry about ABIs and why the fuck does Clang generate that code there.

I also added llvm-credits and llvm-licence to my repo. I probably need to change the deploy script to include them in the build.

Saturday 2 August 2014

Function arguments

Today I mostly cleaned up function arguments. I fixed exported members so that they can have an implicit this. I fixed member functions with an implicit this so that they don't generate a new member function body for value/rvalue/lvalue. I fixed overload resolution for exact match preference as well as is-a preference so you can now overload for value/rvalue/lvalue. I reduced a bunch of code duplication by making various methods available on the Analyzer. I fixed non-static functions to generally be much more reliable.

I still have mysterious test failures on Linux. I had problems reproducing the issues; there's clearly some undefined behaviour in there somewhere. Run the tests directly- success every time. Run them from the driver and randomly 100-150 of them will fail. TeamCity is down right now anyway so I can't farm out Linux builds.

I've been looking into debug information and there's both good and bad news. The good news is that it looks like it could be fairly easy to implement basic debug info. The bad news is that LLVM's debug info intrinsics are somewhat broken, so there's a limit as to what I can do with them. The other bad news is that I'd have to pretend to be C++.

What I'm really thinking right now is that under certain circumstances, I'll forbid C++ conversion. There are just a few too many questions about how I'd implement various Wide features with no C++ analog.

Thursday 31 July 2014

Operators

Before handling virtual functions, I fixed up operators. You can now export operators, you can access operator members and use them as expressions, you can access operators statically, etc. You can also have dynamic operators when before they were banned.

I need to switch to Ubuntu to try and figure out why 209 tests suddenly failed on TeamCity, though.

So up next I promise: virtual functions. I already added support for final classes.

I also removed "auto" as a feature. Previously when doing type-inferred arguments, you could use "auto" as the decayed argument, so you could do e.g. "f(arg := auto.lvalue)" that would only accept lvalues. I removed this feature because it's redundant in the face of concepts and complicated the implementation for little benefit.

I'm also hunting to reduce ABI dependence. The MinGW x64 ABI is quite similar to Itanium but the EH is different. This obviously isn't a big deal for me, pretty much all the work is done on the LLVM side and not the Wide side. But in general, I have a bunch of stuff which is full of Itanium-specific details, like layout, vtables, RTTI and such.

Also turns out that I totally mis-implemented the Itanium ABI for function calling, so parameters you pass by value to C++ functions will incorrectly not be destroyed, and there are other destructor bugs w.r.t. this misinterpretation that I wish to fix. The fix also makes my life easier w.r.t. elision in many ways. I'm a big fan of this fix because it only removed complexity from the compiler.

Wednesday 30 July 2014

Virtual Functions

Virtual functions are up next. I'm talking about final, override, abstract. Also clean up the internals to be less Itanium-ABI specific, and some general clean-uppery going on. After that, I will probably look at constants. I have an idea for how to handle them infinitely better than C++ but we will see what LLVM can support in this regard. And maybe include dynamic inheritance (virtual inheritance).

I've also been thinking about including & and | as type composers. | is more of a Concepts feature, but & I've been thinking can be useful for regular inheritance. This would effectively compose base classes to produce a new type, where any class that derives from both base classes is considered as deriving from this new type. The implementation I would think as being not too unlike dynamic inheritance, except that dynamic inheritance is intrusive and this feature would not be.

Monday 28 July 2014

Personal details

So far, I've been pretty allergic to having my personal details be Googleable. I get too much random crap already. Hey, puppy, come work for us in Amsterdam, £45,000 a year doing ASP.NET. Send CV if interested and don't forget to ask your friends. Yeah, right. So you don't know anything about me, who my friends are, you want me to do your job for you, and could I please come work for you? Kindly go shove it.

But I guess that I'm just going to have to learn to ignore them instead of shouting at my monitor in rage. For now I really want a job. Not in the sense of "I'd like a job, but I'm not sure how sick I am or am not, and I didn't commit enough to any particular project to have anything to show for it". I really want a job, I'm healthy enough to do it, and I've got something to hit people with- a project I'm doing that's complete enough to clearly show its potential.

So I guess that it's time to stop hiding in my corner and start really selling myself and learning to either ignore or slap those super annoying spammers.

Friday 25 July 2014

Windows Phone

My old iPhone died. I bought a new Nokia Lumia 630 with Windows Phone 8.1.

Boy, this shit is incredibly fucking annoying.

You can't even add a local contact without it having to be synced with some bullshit cloud account that totally doesn't need any of this crap. And Microsoft neglects to mention that you can't erase your bullshit cloud account from the phone later on. And every contact has to be a Person. It's not a Person. It's a fucking phone. A landline. Shared between FOUR people. So kindly do me a favour and fuck off with your endless cloud bullshit Microsoft. Just put the name and number on a text file on the local storage. How fucking hard can that be?

Thursday 24 July 2014

More work on defaulted members, more bugfixes

Reworked the Clang object support layer to be more reliable (caused some bugs which are now fixed and simplified the compiler). I'm looking into simplifying a bunch of the compiler-generated functions stuff. For example, I know that their EH is already totally bugged. I need to simplify some other stuff too, like there's many places where I repeatedly get the function argument type analyzed. And I need to implement AnalyzeExpression in terms of AnalyzeCachedExpression if I can. And fix up statements to be Function-independent.

And the operator handling still feels like a hack. I need to introduce t.operator+ and operator+ as valid expressions and, I feel, generally unify the handling for identifiers and operators in a more useful fashion than my current hackitude.

I've also been thinking about introducing explicit operator overload set unioning, and explicit ADL requests.

Most importantly, I feel like I've defined my first milestone, Phase One. The objective of Phase One, most simply, is to reach something like feature parity in the compiler. This will probably involve some variant of concepts later, but for now I'm mostly looking at smaller features like default, delete, override, etc. Emitting some DWARF debug data would be great too.

Finally, I've been thinking about creating some functionality as pure extensions. I want to show that my compiler design really is modular and extensible. I might start with a basic feature like properties.

Sunday 20 July 2014

More docs, a minor feature, a bugfix

I mostly tweaked my website and added more content today. I also fixed a couple bugs and hopefully introduced defaulted constructors.

I've been kinda plotting modules. I've been thinking about what form I need generic functions to take. LLVM IR is too low to preserve Wide's semantics. But I'd rather skip having to have an AST or something. Maybe for the immediate future I'll just ship the source.

I feel totally grump. I feel like I don't make much of a dent in my giant list of things to do and my source code is a terrible mess. I've been thinking and my list of things that I need sums up as "Everything".

At least a diagnosis isn't on the list anymore, I guess.

Thursday 17 July 2014

Examples and tests

Producing more examples is really helping with the testing. I found two more bugs today, one of which is fixed. I added a couple new far-from-final tutorials. But as expected I spent most of the day running around. Hopefully tomorrow will involve more code, less running.

Monday 14 July 2014

Modules - concepts

I realized two important things today.
First, the linker is my enemy more than the compiler.
Second, modules depend on concepts, but they can do more than I had ever imagined. I am a Cylon and I have a Plan.

I have ordered a wireless adapter so that I may have reliable internets. After that I am going to give mentoring a shot. I have a webcam because apparently seeing my face is so essential. It would be super nice to supplement my income.

Currently, I am fixing up CLI. Not building all in my project has made things a bit mediocre w.r.t. forgetting to build all the stuff. I also figure that I can solve my test times problem by moving to x64, where the x64 release version of Wide should be just fine w.r.t. testing times (also invoking the operations asynchronously should help).

Saturday 12 July 2014

First-class website

I've decided that my website efforts have failed in the past for two main reasons.

One is that I always considered them as a separate project. They were an optional extra, not a core part. This is something I'm looking to change right now. I want to integrate my website build into my primary Wide solution so that everything Wide will be in one repo except the C# addon. It's notable to see the decay in that project too. But to begin with, I've officially added the website code to my LOC-count command, which now stands at nearly 21k.

Second is mostly that I simply didn't have an implementation functional enough to discuss or tutor anyone with. My current implementation is a lot more useful in that regard. It still has a long way to go in every aspect. I've discovered that writing the examples really helps me see what's missing- like when I wanted to demo the return type inference and got stuck up against boost::optional. It also produces more tests, which is greatly useful for me.

Trying to build examples on the Wide CLI has shown me that my error handling desperately needs attention right now. I'm trying to build a sample and "Fuck" is the whole error message. I had to grep the source for "throw std::runtime_error("Fuck")" (got way too many hits) but I'm pretty sure I figured out the problem. Boost.None is a member function pointer and as we all know, the current limitations of the implementation are that it cannot handle implicit conversions from types that are not a (reference to a) ClangType to a ClangType.

This means futzing around with one of the most problematic areas of the compiler- the IsA function. Ho boy.

Bugfixing

I've been fixing a whole bunch of bugs. It's amazing how many you can find when you need to write examples in the language. For example, if you have a C++ type which is not a struct/class, then it crashes the compiler. That one's still in but I managed to convert member function/data pointers to Wide types, so stuff like boost::optional works now (boost::none is a member function pointer). I fixed a bug where issuing an error would crash the compiler. I fixed a bug with lookup of exported functions. I fixed a bug where some integral conversions would crash the compiler. I fixed a bug where attempting to access a nonexistent macro would crash the compiler. I fixed a bug where using member function templates or constructor templates would crash the compiler. (starting to see a theme here...). I fixed a bug where a const (const char*)& would cause spurious OR failures.

I fixed a bug in my test harness that caused spurious test successes (mostly failed on Linux). I reported (and had fixed) a couple bugs in coliru so that when I update Wide, it updates more readily.

But one of the things that concerns me the most is testing times. Now that I have way more tests involving the C++ Standard Library, it's becoming very problematic to test on Windows. Instead of taking like 10-20 seconds to complete a run, it's now like 3-4 minutes. I can reduce this problem by running the tests in parallel, but it's still going to be a bitch as I add more testing. This, aside from anything else, may drive my primary development platform to Linux. I dislike Linux but boy, the tests run a lot faster there because you can link release Clang even with debug symbols. I would test in Release on Windows but LLVM and Clang don't seem to play well with hosting both x64 and x86 objects simultaneously and the x86 release premake is broken...

As you may expect, I've also been writing more content for my website and fixing it up. The navigation is, I feel, a lot nicer now, some of the content is better, and I've created more examples.

Monday 7 July 2014

More website stuff

I rigged up my website to now have a live Wide compilation service backed by Coliru. I also wrote a few reference pages. I still suck at website design but it's a lot better than before. Now I need to write oh so many more reference pages. It helps to have more to actually reference.

I'm more thinking about what direction I want to take the tutorial in, though. Right now since Wide has only a tiny Standard library of its own and most library functions are imported from C++, I feel like it would be adequate to target it at people who already know C++. If you're not comfortable at least discussing C++ then you won't get far in Wide right now anyway. So I figure that the first order of the day is a quick tour of the C++ interoperation facilities.

Writing the reference makes me think more about what the semantics are, and having them written down is useful for me.

Friday 4 July 2014

Other stuff

I've implemented default destructors, other default/delete stuff will be more painful. Had my surgery on Tuesday, been slacking all week to let the holes in my gut heal. Apparently, I have to go back to sickness claim for these seven days, then go back to jobseeking claim afterwards. What a total waste of time and paperwork.

I've decided to take a brief break from Wide to work on my website. I've moved the hosting to Github pages but I need to create content for it. I need to spend more time properly advertising my work. In addition, the GitHub for Windows program seems pretty nice, I'm not quite as familiar with it as TortoiseHg, but it might be worth seeing what the Git integration is like for VS. Apparently it's easier to attract contributors on GitHub and Git seems far more popular than Hg, and having the VCS baked straight into the IDE seems good.

I've also purchased a webcam. I've heard of various video code mentoring/assistance/etc services that pay the helpers, so it's worth checking into whether or not there's money to be made there. Supplementing my meagre income would be nice, even if it's not enough to support me on its own.

Sunday 29 June 2014

Expression ownership (again) and destructors

Had a bug where an expression's type inexplicably became a Function rather than a UDT. Suspected ownership bug.

Had to switch all expression ownership to shared_ptr. Fixing destructors to occur semantically, as is correct, instead of being collected at codegen time. Lots of messy ABI details like type complexity. And I'll have to change AggregateType's functions to be real functions instead of simply complex expressions.

Yay.

Saturday 28 June 2014

Deleted function semantics

I've been thinking about deleted functions and they have some undesirable results in C++, I feel. Right now I'm thinking that a deleted function will register as a general OR failure, not a specific hard error. So if you explicitly delete a copy constructor, the type will register as uncopyable.
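For comparison, here's roughly how the same situation looks from the C++ side. This is only a sketch of C++'s behaviour, not of anything in Wide, and NoCopy and Copyable are made-up types for illustration. In C++ the deleted constructor still takes part in overload resolution and using it directly is a hard error, but the type traits already report the type as uncopyable, which is close to the "registers as uncopyable" behaviour I'm after.

```cpp
#include <type_traits>

// Hypothetical types, used only for illustration.
struct NoCopy {
    NoCopy() = default;
    NoCopy(const NoCopy&) = delete;  // explicitly deleted copy constructor
};

struct Copyable {};

// A direct copy of a NoCopy object would select the deleted constructor and
// fail to compile. Queried through a trait, the type simply reports "no".
static_assert(!std::is_copy_constructible<NoCopy>::value,
              "a deleted copy constructor makes the type report as uncopyable");
static_assert(std::is_copy_constructible<Copyable>::value,
              "the implicit copy constructor is still available here");
```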

Today I finished rebuilding my parser. Now it's 800loc saved, more extensible in ways that actually matter, and the error handling will be better when I pass over it.

Now I'm thinking about stuff like dynamic destructors/operators, abstract/final/override, and function behaviours. Here we mean stuff like throw, rethrow, return, terminate, what you can or cannot throw, etc. I'm not sure what the most efficient way of expressing this stuff is.

You can either promise not to throw, promise to throw, or say you might throw. You can promise to throw nothing, throw one of X types, or throw anything. You can guarantee to return, might return, or guarantee not to return. You can rethrow, not rethrow, or might rethrow. What I'm really seeing is that three of these are "Will, won't, or might", and the fourth is just what you will/won't/might do in more detail. The default here would be can throw, can rethrow, can return, and may throw anything. These attributes will aid in CFG (control flow graph) computation, which can mean generating more efficient code, and giving more accurate warnings and other stuff.
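One way to write that down, as a rough C++ sketch. Every name here (Likelihood, FunctionBehaviour, CanFallThrough, NeedsUnwindEdge) is invented for illustration and nothing like it exists in the compiler yet. The defaults match the ones above: might throw, might rethrow, might return, may throw anything.

```cpp
#include <set>
#include <string>

// The "will, won't, or might" tri-state shared by three of the attributes.
enum class Likelihood { Will, Wont, Might };

// Hypothetical per-function behaviour record.
struct FunctionBehaviour {
    Likelihood throws = Likelihood::Might;
    Likelihood rethrows = Likelihood::Might;
    Likelihood returns = Likelihood::Might;
    bool throws_anything = true;          // if false, thrown_types is the full list
    std::set<std::string> thrown_types;   // type names; only meaningful when !throws_anything
};

// CFG helper: control can reach the statement after a call only if the callee may return.
inline bool CanFallThrough(const FunctionBehaviour& b) {
    return b.returns != Likelihood::Wont;
}

// CFG helper: a call needs an unwind edge only if the callee may throw or rethrow.
inline bool NeedsUnwindEdge(const FunctionBehaviour& b) {
    return b.throws != Likelihood::Wont || b.rethrows != Likelihood::Wont;
}
```

With something like this, a function marked as never returning lets the analysis drop everything after the call, and a function marked as never throwing needs no landing pad.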

Wednesday 25 June 2014

A good refactor

Reduced code size? Check.
Fixed bugs? Check.
More extensible? Check.
New functionality? Check.
Better error handling? Check. Well, partially. It's a smidgeon better and will get a lot better. I had to rip out a considerable quantity of it since it was broken.

Today is a good day to die refactor.

Sunday 22 June 2014

Cleanup in progress

I've cleaned up a few things that were annoying me.

I fixed exporting to use a C# style attribute syntax, so now you use

    [export := header.function]
    f() {}

I fixed constructor member initialization syntax to support identifying bases by type.
I fixed resolving overloads based on lvalue/rvalue w.r.t. ref-qualifiers in C++.
I fixed auto-detecting Linux includes (boy, not having that was super annoying).
I implemented exception re-throwing with throw;.

Next up on my list is parsing and lexing. My current parser and AST are fairly kludgy- their design is from way back when I parsed multiple files directly into the same AST concurrently, didn't support operator overloads, and the parser's lexer interface is from years ago also. The lexer itself isn't too bad but I just need to alter the token types a bit.

The main reason for this is that the parser doesn't support dynamic destructors, or dynamic operators, or defaulted/deleted functions, etc, and the design is non-conducive to being modified at run-time. There's also some duplicate code in terms of rules operating in terms of what other rules expect and the error handling is quite duplicated.

The new approach will be half table-driven, half recursive descent. And the AST/Builder will be changed to not stringly type destructors, constructors, and not indicate operators by token type, etc. The main change for the lexer will be that it will no longer be a token type enumeration. Instead token types will be indicated by a constant pointer (probably to std::string). This allows new token types to be added. In addition the lexing tables will be made members of the lexer instance to permit modification instead of constant as they are now, as will the parsing tables.
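A minimal sketch of the pointer-as-token-type idea, assuming std::string is what the pointer ends up pointing at. The names (TokenType, TokenTypes, Lexer::ClassifyWord, the table members) are placeholders, not the real interface.

```cpp
#include <string>
#include <unordered_map>

// A token type is identified by the address of a string, not by an enum value.
using TokenType = const std::string*;

namespace TokenTypes {
    // Built-in token types; the text is only there for diagnostics.
    const std::string Identifier = "identifier";
    const std::string OpenBracket = "(";
}

class Lexer {
public:
    // Tables are per-instance members so that callers can modify them.
    std::unordered_map<std::string, TokenType> keywords;
    std::unordered_map<char, TokenType> punctuation;

    Lexer() { punctuation['('] = &TokenTypes::OpenBracket; }

    // Classify an already-scanned word: a keyword if registered, else an identifier.
    TokenType ClassifyWord(const std::string& word) const {
        auto it = keywords.find(word);
        return it != keywords.end() ? it->second : &TokenTypes::Identifier;
    }
};
```

An extension adds a token type by defining its own string and registering its address, for example `lexer.keywords["yield"] = &Yield;`. Only that lexer instance is affected, and comparing token types stays a pointer comparison.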

Friday 20 June 2014

Slump

I'm in a bit of a slump right now. Was hunting for a job and I thought I'd found something but seems like not. Now I'm kinda off my game. None of my music feels right and I can't seem to get into the flow. My Internet also keeps failing. I haven't really coded anything in the last couple of days, just watched The Matrix and the few good minutes from the sequels on repeat.

Found a bug where tuples (and presumably also lambdas) are incorrectly rvalues instead of values when created.

I need to create some sort of more serious project management. Right now I just have "todo.txt", and I checked it today and it's clear I haven't made use of it in months. There's stuff like "Add basic inheritance (no virtual functions)" and "Add exceptions".

It's time for a cleanup. My parser and lexer code is both bad and non-extensible. My AST is pretty poor in many regards like for example representing constructors and destructors with string names, operator overloads with token types, etc. This has got to go. I need to replace all those runtime_errors. I also need to fix all those places where I don't error but LLVM will crash the process with type errors, like if you export a function but define it with the wrong signature.

Thursday 19 June 2014

Optional

I'm looking to move optional(t) over to Wide. To achieve this I'll need three new language features- library is-a, since null is-a optional(t), aligned storage, and boolean testing. Aligned storage means supporting the attributes. Right now, I'm thinking of something like

    template(t)
    [align := t.alignment]
    type aligned_storage {
        storage := int8.array(t.size);
    }

This is stealing the attribute syntax from C#, which is my current idea to replace function prologs with the primary advantage that they are less noisy and I can consider extensibility in the future. Perhaps I could consider permitting an attribute directly on to a data member, so for example,

    template(t) type Optional {
        [align := t.alignment]
        storage := int8.array(t.size);
        // other stuff
    }


Wednesday 18 June 2014

Arrays, code cleanup, and stdlib

I implemented some basic array stuff today. It's boring and easy but also fast and a new feature. I added a couple simple tests for it. I also want to do a code cleanup pass and did the first part of that today. The problem I'm looking to address here is that when debugging Wide functions, there's a huge amount of noise, and the functional logic is lost. This is because Wide constructors can only operate in terms of a "this" pointer, even when in reality it's just going to be loaded to produce a value right away. There were other cases when I unnecessarily promoted from value to rvalue too.

Secondly, I've got my first language feature that should throw an exception- array indexing. I'm fairly confident that LLVM can handle optimizing the array index bounds check out. Annoyingly, LLVM cannot dynamically index into an array value, which totally throws the whole value thing out of whack. Right now, I just copy to the stack every time... LLVM can optimize out the repeated copies, I'm fairly sure. I'm also going to offer an unchecked access so you can use that if the optimizer's not good enough. The problem with this is that unless I want to define the exception type in the compiler, I need my Wide Standard library available during testing, which is going to make life ... fun.

I've also been thinking about some slightly more complex transformations, like maybe yield return. Semantically, this transform is not too hard- just shift the locals that you need from allocas to member accesses, and add a member for the current "state". The trouble is that returning would implicitly mean returning an optional, which would again mean making the Wide stdlib available during testing. Another trouble is that pointers/references to the local variables can't really be trusted, but I guess this is already true of lambda captures.
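To make the transform concrete, here's a hand-lowered example in C++17 (for std::optional). It's a sketch of the general technique, not of what Wide would actually emit; SquaresGenerator and Next are invented names, and the source loop in the comment is pseudo-syntax.

```cpp
#include <optional>

// Source being lowered (pseudo-syntax):
//     for (i = 0; i < limit; ++i) yield return i * i;
class SquaresGenerator {
    int state = 0;  // the added "current state" member
    int i = 0;      // was a local (an alloca), now a member
    int limit;      // was a parameter, now a member
public:
    explicit SquaresGenerator(int limit) : limit(limit) {}

    // Each call resumes where the previous yield left off.
    // An empty optional means the sequence has finished.
    std::optional<int> Next() {
        switch (state) {
        case 0:  // first entry: run the loop initializer
            i = 0;
            state = 1;
            break;
        case 1:  // resumed after a yield: run the loop increment
            ++i;
            break;
        default:  // already finished
            return std::nullopt;
        }
        if (i < limit)
            return i * i;  // the yield return
        state = 2;
        return std::nullopt;
    }
};
```

The pointer problem shows up directly here: anything that held the address of the old local `i` would now be pointing into the generator object, so it is only valid for as long as the generator is.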
 
I've also got to clean up stuff like attributes, introduce library is-a, and such.

Sunday 15 June 2014

Bughunting and feature drive

Today I fixed some bugs, but more importantly, I decided on my next core feature drive. And that drive will be for modules and ABI.

Right now, Wide offers a relatively ABI-independent interface, in theory. I want to tighten that up so that the Type interface is properly ABI-independent and remove Itanium helpers. I want to support laying out types according to more than one ABI. I want to be able to ship headers with modules. I want to be able to handle dynamic import/export. I have plans for how some of this stuff can be achieved.

Before that, I need to work on bugfixes, since presumably I just introduced a few hundred (thousand) of them. And a long time ago, I wanted to handle incremental analysis, re-implement error handling, ... the errors issued in the new features are all just std::runtime_error, and half the failure conditions are probably either ignored leading to a compiler crash, or asserted.

But incremental re-analysis has taken a more serious back seat, and it's because Clang can't handle it, and it also can't handle analysis -> codegen -> analysis. This puts more serious roadblocks in the way of supporting those features myself. I particularly dislike not being able to code generate more than once from a particular Analyzer instance. There are only a few cases where I would need to modify my code to support it, but I can't do it because Clang cannot handle it.

Ultimately, I'm just a one-man shop with other things on my plate. I need to hire more help.

Saturday 14 June 2014

Caught an exception

Last night I caught my first exception. You can't rethrow, you can't catch anything but ..., and the compiler crashes if you try to insert code after a throw in a try. But it works.

EH intrinsics on the LLVM level are pretty broken. Fortunately, one of the ways in which they are broken is coming up puppy.

I also implemented but have not yet tested special semantics for destroying members in constructors that throw.

I think that catching non-... things is the most useful feature to add next. I think that in theory, it's a relatively simple deal now that I have the rest of the infrastructure done. After that, rethrowing. After that, test test test.

I actually kinda... don't know where to go now. I didn't expect exceptions to be so simple. I practically spent more time on RTTI or Itanium-compatible layout. 

Wednesday 11 June 2014

Payoff

All the investment I've put into refactoring my core systems is paying off.

I refactored UserDefinedType's GetClangType, which now accepts without error all UDTs in all test cases.

I threw an exception from Wide and caught it in C++.

The core remaining feature is implementing destructors in case of exception and catching/rethrowing in Wide. After that, it's test, test, test for the new ABI features.

Even with the new Codegen cleanup that removed a lot of code, I'm now ranking over 19k loc. Seems just a while ago that I was barely breaking 18k. I feel good.

I've discovered that there's quite a number of features that got silently cut. For example, it used to be that you could use !() to pass explicit template arguments to C++ functions. I've discovered that there are now literally no types that respond to !(). It would have to be OverloadSet that handles this, I feel. Another example of a silent feature cut is OverloadSet conversion to C++ type.

One thing I'm minorly concerned about is unused functions and C++ type conversions. Converting a UDT to C++ requires exporting the members, which counts as a use of those functions, even if it turns out C++ never calls them or exports them. This is particularly problematic since getting RTTI (which is done for all types with a vtable) first attempts to do it by converting to C++ type and asking Clang to work out the RTTI for us. Only if this fails do we compute our own RTTI.

So, first exceptions, then maybe a couple cut features, then test test test.

Destructors

I refactored destructors today. The new algorithm is substantially superior in every respect- it's simpler, it's smaller, it's faster. I also introduced the CodegenContext that can make refactoring code generation easier and simpler in the future.

But the core benefit was making it EH-ready. Well, not exactly EH ready, but not too far from. I also had a quick peek at clang and CodeGenEH is only 68kb or so, which makes me feel better about the probability of Itanium EH being relatively easy to implement.

I also fixed a couple bugs and found an important and unfortunate new class of potentially compiler-crashing error. Itanium ABI says that vtable layout depends on function return type, which depends on function body, so any dependency from a member function on the vtable layout means assertion failure. I have removed dependency for calling other virtual functions but there are probably other ways in which a member function can request the contents of the vtable- constructing a new object of its own type being the simplest example.

Monday 9 June 2014

Extensibility

One thing I've been thinking about with regards to Wide is how to enable its use as a library. My experience working with Clang was ... questionable in this regard. So far I've been thinking about how to handle extending Wide.

Currently, anybody can inherit from Type, and anybody can add a special member to a module. This is how C++ support is implemented. And in addition, anybody can inherit from Semantic::Expression- Wide is not picky. Although you can currently only generate code once, this is something that is not a core limitation- generating code multiple times from the same analyzer is something I will fix in the future and it hopefully won't be a big deal.

But when it comes to adding new AST expressions or statements, I've got no plan. Adding a new AST expression to Wide consists of adding a manual dynamic_cast in the analyzer implementation. I'm thinking of a new trick- use a type switch. Something like the following:

    class Analyzer {
    public:
        std::unordered_map<std::type_index, std::function<std::unique_ptr<Expression>(Analyzer&, const AST::Expression*)>> expression_handlers;
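        std::unique_ptr<Expression> AnalyzeExpression(const AST::Expression* e) {
            // typeid(*e) names the dynamic type of the node; typeid(e) would only ever name the pointer type.
            auto handler = expression_handlers.find(typeid(*e));
            if (handler != expression_handlers.end())
                return handler->second(*this, e);
            throw ...;
        }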
        Analyzer() {
            expression_handlers[typeid(AST::String)] = [] { ... };
        }
    };



I'm thinking of using a similar trick to handle extending the parser. This way you can add new expressions (and something similar for statements) at run-time, as well as new types and such.
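Here's a self-contained version of the handler table that actually compiles, to check the idea holds up. The AST node types and the Expression struct are cut-down stand-ins, and the error message is a placeholder. The important detail is looking up typeid(*e), the dynamic type of the node, which only works because AST::Expression is polymorphic.

```cpp
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>
#include <utility>

namespace AST {
    // Stand-in AST nodes; the base must be polymorphic for typeid(*e) to see the dynamic type.
    struct Expression { virtual ~Expression() = default; };
    struct String : Expression {
        std::string value;
        explicit String(std::string v) : value(std::move(v)) {}
    };
    struct Integer : Expression {
        int value;
        explicit Integer(int v) : value(v) {}
    };
}

// Stand-in for the semantic expression type.
struct Expression { std::string description; };

class Analyzer {
public:
    using Handler = std::function<std::unique_ptr<Expression>(Analyzer&, const AST::Expression*)>;
    std::unordered_map<std::type_index, Handler> expression_handlers;

    Analyzer() {
        // Built-in handler; the cast is safe because the table key is the exact dynamic type.
        expression_handlers[typeid(AST::String)] = [](Analyzer&, const AST::Expression* e) {
            auto str = static_cast<const AST::String*>(e);
            return std::unique_ptr<Expression>(new Expression{"string: " + str->value});
        };
    }

    std::unique_ptr<Expression> AnalyzeExpression(const AST::Expression* e) {
        auto handler = expression_handlers.find(typeid(*e));
        if (handler == expression_handlers.end())
            throw std::runtime_error("no handler registered for this AST node type");
        return handler->second(*this, e);
    }
};
```

Anybody holding an Analyzer can then register a handler for a new node type at run-time by assigning into expression_handlers, with no change to the analyzer itself. One limitation of keying on the exact type is that a handler for a base node type is not found for nodes derived from it.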

Today I've hunted down the last detected bugs from the Itanium ABI switchover. Once I finish up dynamic_cast, it's time to make preparations for Itanium ABI exceptions. Oh boy. Then test, test, cleanup, test test cleanup cleanup, etc.

Sunday 8 June 2014

Itanium fun

So I've been working on implementing Itanium ABI layout. I've determined that many places in the Wide implementation assumed that every member had an associated LLVM field, which is not true in the presence of the EBCO mandated by Itanium. In addition, I had to implement a few new members and move vptr handling to AggregateType. Previously, there was a bug where since officially, the vptr was a member of the type, then stuff like generated copy assignment operators would copy the vptr (very bad!). Now AggregateType should respect the fact that the vptr is special.

In addition, someone in #llvm pointed out that if I didn't follow Itanium's layout rules, I couldn't use their dynamic_cast implementation, which makes assumptions.

Also, the ClangType implementation of constructor field locations was just totally broken, as well as my handling of EBCO- it was totally non-compliant.

I also had bugs in derived-to-base conversions where a null derived did not lead to a null base. I don't believe I have a single test that actually performs derived-to-base conversions on pointers, although the conversion for references is implemented in those terms.
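The bug is easy to picture if you write the conversion out by hand. This is a C++ sketch of the arithmetic only: First, Second, Derived, ToSecondBase and SecondOffset are made-up names, and the byte offset is measured at run-time rather than taken from a real layout algorithm.

```cpp
#include <cstddef>

struct First  { int a = 1; };
struct Second { int b = 2; };
struct Derived : First, Second {};

// Hand-written Derived* -> Second* conversion.
// byte_offset is where the Second subobject sits inside Derived (ABI-dependent).
inline Second* ToSecondBase(Derived* d, std::ptrdiff_t byte_offset) {
    // Without this check a null Derived* turns into a non-null pointer holding byte_offset.
    if (d == nullptr)
        return nullptr;
    return reinterpret_cast<Second*>(reinterpret_cast<char*>(d) + byte_offset);
}

// Measure the offset by asking the C++ compiler to do the conversion on a real object.
inline std::ptrdiff_t SecondOffset() {
    Derived probe;
    return reinterpret_cast<char*>(static_cast<Second*>(&probe)) -
           reinterpret_cast<char*>(&probe);
}
```

References don't need the check because they can never be null, which is why implementing the reference conversion in terms of the pointer one works but doesn't exercise the null path.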

I also cut down on my overhead by moving some common base-class related functions to Type instead of UserDefinedType and ClangType. There's probably more work I can do in this area, but some functions I'm not comfortable with moving down because they make assumptions about the ABI involved. I know that I don't support any ABI other than Itanium right now, but I'd rather not hardcode that fact into my base-level interface. After all, back when Wide and Clang had diverging ABIs, the basic functionality held together exactly because the Type interface is ABI-independent- except vtable layouts, which currently have their Itanium helper interface coded in the Type interface, which is bad.

I have several Type functions that should probably be static or hell, just non-members. But I'm powering ahead now until exceptions. When I have Itanium ABI exceptions, I'll take a break from new features and clean up/test everything. At least, that's what I promise myself so I can sleep at night. As long as my existing test base passes, it's More Feature Time until I have exceptions.

Saturday 7 June 2014

Vtable layout- thanks Itanium ABI

Came across a slight fun factor today- namely, that derived classes don't get their own vtable, but they often need one.

In Wide's current vtable layout model, each class has a vtable listing all the dynamic functions it has, regardless of source, and then we add offset to top and RTTI pointer to that. For calling dynamic functions found in the base class, we convert to the base class pointer and look them up through the base class vtable.

However, this leaves us a problem with the offset-to-top and RTTI pointers, namely that we have a derived class which needs updated offset-to-top and RTTI but has no vtable of its own. So if I have something like

     type base { dynamic f() {} } // offset, rtti, f
     type base2 { dynamic g() {} } // offset, rtti, g
     type base3 { dynamic h() {} } // offset, rtti, h
     type derived : base2, base {} // no vtable
     type more_derived : base3, derived {} // no vtable


It's pretty clear here that when derived and more_derived are constructed, they need to set new offset and RTTI pointers in their base classes, which currently they do. The problem comes when implementing RTTI and dynamic_cast for derived itself, as it doesn't have a vtable carrying the necessary data. For RTTI I can probably poll for any vtable, as they should all have the same RTTI entry. offset-to-top is more problematic because every base has a different value, and that value would need to be adjusted depending on where you got it from to account for the derived class's other bases.
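The per-base difference is visible from plain C++. The sketch below mirrors the first three types above as C++ classes (with virtual destructors added) and measures where each base subobject sits; SubobjectOffset and MostDerivedAddress are helper names invented for this example. The offset-to-top entry in each base's vtable is the negation of that subobject offset, and it is what lets dynamic_cast<void*> get back to the start of the complete object.

```cpp
#include <cstddef>

struct base  { virtual void f() {} virtual ~base() = default; };
struct base2 { virtual void g() {} virtual ~base2() = default; };
struct derived : base2, base {};

// Byte distance from the start of the complete object to the given base subobject.
template <typename Base, typename Derived>
std::ptrdiff_t SubobjectOffset(Derived& d) {
    return reinterpret_cast<char*>(static_cast<Base*>(&d)) -
           reinterpret_cast<char*>(&d);
}

// dynamic_cast<void*> on a polymorphic pointer yields the most derived object,
// which the implementation finds using the offset-to-top stored in the vtable.
inline void* MostDerivedAddress(base* b) {
    return dynamic_cast<void*>(b);
}
```

The first base, base2, sits at offset zero while base sits at a non-zero offset, so reading offset-to-top through the base vtable gives a different answer than reading it through the base2 vtable.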

In addition, I could consider adding vtable slots for inherited virtual functions. There is an argument that in some circumstances these could be more efficient.

But if I move to Itanium ABI then that whole primary-base thing will take care of this, so I think that today I will simply do that.

Friday 6 June 2014

Standard library smoke tests

Had an interesting experience today. I wanted to show off my typeid() support, so I put a test up to Coliru. Instead of returning true I output it with cout. Imagine my surprise when this failed. printf() is one thing as it's variadic and I don't explicitly support that just yet. But there's no reason why a simple std.cout << true should fail.

This is down to lack of something that robot termed "smoke tests". With the new typeid() support comes a new testing constraint- the test environment must have a working copy of the stdlib, including headers. Previously I didn't need the headers, only the symbols, which on Linux I acquired from my own process (srsly) and on Windows I loaded MinGW's libstdc++. This is why I didn't have any tests interacting with the C++ or C Standard libraries.

But now that I require that the headers are available anyway, then I may as well introduce tests that check that Wide can successfully interoperate with the C and C++ Standard libraries. For example I well recall having an unusual problem getting malloc() to function.

What I'm not sure about is how to construct this driver. For example, if I wanted to check that std.cout << true executes correctly, I'd .. what, redirect the process stdin/stdout and check their contents? Smoketesting other stdlib features seems simpler. Then there's tests for warnings which I still haven't constructed yet.
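One cheap option, assuming the test body runs in the same process as the harness and writes through the same std::cout object: swap cout's stream buffer for the duration of the test and compare what was written. CaptureStdout is a hypothetical helper, not part of the existing test driver, and it won't see output written with printf() or directly to the file descriptor. For that I'd need real process-level redirection.

```cpp
#include <functional>
#include <iostream>
#include <sstream>
#include <string>

// Runs body with std::cout redirected into a string and returns what it wrote.
inline std::string CaptureStdout(const std::function<void()>& body) {
    std::ostringstream captured;
    std::streambuf* original = std::cout.rdbuf(captured.rdbuf());
    try {
        body();
    } catch (...) {
        std::cout.rdbuf(original);  // always restore the real buffer
        throw;
    }
    std::cout.rdbuf(original);
    return captured.str();
}
```

A smoke test for the cout case would then run the compiled snippet inside CaptureStdout and compare the result against "1", since a bool streams as 1 or 0 unless boolalpha is set.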

Thursday 5 June 2014

Had a funsie with vtables. In the previous implementation, vtables were only initialized if the more derived type had a virtual pointer. This was always the case when needed before because if you had any virtual functions you had a virtual pointer. Of course, with RTTI and offset-to-top implemented, you need to override the base vtable even if you don't change the functions that are called.

Now constructors always call the vtable initialization routine, and then if the type doesn't have any vptrs, nothing happens.

Furthermore, I now have a run-time dependency on the C++ Standard Library, even for pure Wide code. I have cracked up typeid() and a couple tests for it. dynamic_cast should not be hard- the routine is a library routine, all I need to do is implement a small Expression wrapper on top.

So the ABI checklist now looks like this:

  • Dynamic_cast (easy)
  • Exception handling (oh shit...)
  • Change layout algorithm to be Itanium-compliant (shouldn't be too hard)
  • Fix Wide types to be exposable even if they inherit (really depends on Clang)
  • Fix some Wide types exposure like overload set, lambda (shouldn't be too hard)
  • Look into MS ABI support (dunno)
  • Implement abstract types (should be easy I hope)
  • Fix deleting destructors. Right now they only destroy and that's bad.

RTTI- check

Just had a good implement of RTTI. I had a brainwave which is that if I always create the Clang aggregate TU, then I can then query it for everything I need, so types that have a Clang type can just delegate to Clang for their RTTI implementation. Most of the rest are simple "Use the RTTI vtables and add a simple null-terminated string" thing. This means that in principle, I can now implement dynamic_cast, typeid(), and begin work on EH.

I haven't exactly written many tests for it, though...

But when EH is done, then I feel like I will be on much more solid ground. Some MS ABI support would be nice too.

Wednesday 4 June 2014

sizeof(), ABI, and code duplication

I'm looking at implementing more ABI stuff. Right now, I have the vtable layout stuff fixed, I hope- the vtable layout can now contain things other than virtual functions, like virtual destructors, deleting destructors, offsets (glory) and RTTI pointers.

What I'm really thinking about right now though is layout.

In order for Wide and C++ to communicate using a type, they have to agree on its layout. You could not have a type where Wide thinks one subobject is in a different place. The problem with this is that Clang cannot lay out arbitrary types in the way that Wide can- it can only lay out Clang types, and it can only do so in the context of a particular translation unit.

The real problem is sizeof(). Since sizeof() is a constant, I have to know when you request the size how to lay out the class. If I lay it out in one way, and then Clang lays it out in another, I can't simply drop my own layout. I have to know beforehand. This means either strictly laying out all classes in the same way as Clang (sucks), and duplicating their layout code, or, change sizeof().

I've been thinking about introducing a new class of value- a semi-constant, you could call it. The value would not be a constant (since it's only semi) but not vary at run-time. There would be some language features that could accept semi-constants instead of constants- say, array size.

Another advantage of this would be that strictly speaking, the code would be more platform-independent. One of the reasons that C++ is not platform independent right now is that when you use sizeof(), it has to tell you the size. You can't port the IR output of Clang from x86 to x64 because the sizeof()s will be incorrect.

But in principle, a hypothetical Wide VM could use the same LLVM IR across multiple platforms. There's already work in this direction with PNaCl.

Quick edit: First, LLVM's array types take only integer values, not constantexprs, so that would be fun. Second, turns out that Itanium ABI specifies a bunch of secondary virtual tables which are complete duplicates of the primary ones for ... some reason. I didn't have these secondary tables. This just shows that I really, really need more tests. But onwards and upwards, as they say. I will stop implementing new features when I have Itanium-compatible EH. And my code count is now 18,500. Feel the growth.

Thirdly, my laparoscopy is scheduled for Monday. If the surgeon gives me the all clear then I'm done, done, done.

Tuesday 3 June 2014

Implemented basic constructor exporting: check.
Implemented basic destructor exporting: check
Fixed Coliru: check

TODO tomorrow: Virtual Itanium ABI destructors and make a start on RTTI, video maybe, definitely a lot of slacking and eating, dog cuddling and walking, you know. I've found that somehow life's more satisfying when you do stuff instead of cry about stuff.

What I really ought to do is UNINSTALL GAMES.

ABI support

I need to seriously consider how dependent I am on a particular ABI. I've been looking into adding RTTI and several parts of my code could not possibly handle another ABI. I've certainly been thinking about supporting Microsoft ABI as well as Itanium. Some ABI details Clang very neatly abstracts away from us. Some it does not.

Mangled names are one example of an ABI detail that I need virtually never concern myself with. Clang has a simple function to mangle the name, I use it, I'm done. The mangled name does not concern me in the slightest. There are a few ABI details for which this does concern me but they're quite limited and easily handled.

Class layout is (will be) another. Soon I can unify AggregateType and ClangType, and allow Clang to perform all layout for Clang types. This will simply be a question of setting the appropriate ABI and letting Clang handle the rest.

Vtable layout is something I do myself, which will require adjustment. Currently, based on some Clang APIs, for Itanium ABI I can perform a compliant vtable layout. For Microsoft ABI I'd have to rework this code.

Calling convention. Part of calling convention is handled by LLVM but another part is handled by Clang. I'm not quite sure why non-complex types are not handled entirely at the LLVM level but that's another question. I will probably have to duplicate Clang's code here (it's quite short) to determine the correct calling convention for C++ functions. For Wide functions I can use whatever calling convention I like.

RTTI will be completely ABI-dependent, as will EH. Clang contains some support routines for RTTI for Itanium, I'm not sure how solid they are for Microsoft ABI as their support for that is still under construction.

Just for reference, Itanium ABI is the one followed by GCC and Clang on nearly all platforms, optionally including Windows. ARM ABI is used on ARM processors and is a close derivative. As far as I'm aware, Microsoft are pretty much the only ones who don't follow Itanium ABI, on any platform.

I've been wondering about how to architect support for various ABIs. FunctionType, my class that handles calling functions, will probably need re-working to handle calling functions of differing ABIs, and thunk-handling code will have to be able to generate thunks for more than one ABI. For stuff like vtables, a single class can only have one vtable layout, but I figure that the base classes can have vtables in any ABI.

Currently, Wide does not take advantage of ODR- every TU's copy of a given type representation is a distinct Wide type. This is something I'd like to change but for sure, every ABI's copy of a given type is distinct.

The next thing I need to do in terms of ABI support is exporting constructors and destructors, and support virtual destructors. When this is done, I can move to RTTI and then EH.

For search paths on Coliru, I have decided to simply hardcode them into the Wide shell script. That will solve the immediate problem of not being able to use it as a demo.

Monday 2 June 2014

Clang- not designed as a library

It's becoming all too clear to me that Clang was not, in fact, designed as a library, except for some uses supporting Intellisense and such. Here's an unfortunate and simple example. Clang has acres of code (it's really quite a lot) to handle finding G++ include paths. But it's impossible to re-use this code in Wide because their structure talks in terms of Clang driver command-line arguments. So now I can't deploy Wide to Unix systems because I can't find the G++ include paths, which vary a lot more than you'd expect from system to system (why? who knows). Clang can find them, but good luck actually getting that to function when you're using Clang as a library.

This is a prime example of what I want to avoid with Wide.

Lines of code


On a more personal note, I love watching the lines. I run the command to check how many lines Wide is nearly every commit. It's not that I feel that this is some empirical evidence of quality. We all know that adding LoC means little. But when the lines of code grows a lot, I feel like I'm making progress. Just for reference, the entire LLVM Project (including clang, and some other subprojects) has 860,000 lines of code right now, and 18,000 tests. I have 18,000 lines and 140 tests. I guess this always leaves me feeling like the small fish in the pond (also that LLVM has nearly triple the number of tests that I do when accounting for codebase size).

Obviously I like to feel like my code is high quality. I don't mind making changes that reduce the LoC and I know there are plenty of good changes that decrease it and bad changes that increase it. Tests are included in my measurement so the more tests I have, the higher that value should be. But ultimately, as an entirely subjective feeling, I feel like I should be adding to the codebase's size.

I've been sitting pretty at about 17-18k for a while now. I guess it's a good thing that I've implemented many new features like inheritance without substantially increasing the size of the codebase, and since I've introduced automated testing (with many more tests to come, hopefully) the reliability is a lot higher. And now that I'm not horribly, horribly sick, I'm much more available.

What I really need to do is ensure that I spend less of my time chatting in the Lounge, shooting people in the face, flying spacecraft, or lynching people for being the Mafia, and more time working. Also job-hunting. That would be good too. Maybe I should ask Daisy to help me, she's always happy to make sure that my left hand isn't good for much.

Time to write tests for all those new features I implemented. And devise a test driver for warnings.

Fixing up some real-world issues.

Yesterday I took a big crack at fixing up the meagre stdlib and associated code. Turns out there were a few issues that my tests don't currently cover. Here's the commit. Notable is that I have no tests for any of these fixes. Also notable is my lack of tests for warnings (my driver won't support them, so more fun there), no tests for the lib itself, etc. Plus I need to look into exception handling (the joy!). I also need to fix things like overload set exposure to C++. And I also need to look into more ABI support- particularly for Microsoft, but also better for Itanium, including RTTI and EH.

Once I have RTTI and EH across all platforms, I can bootstrap Wide and that shall be a glorious day. Let's face it, right now the language definitely feels like C with Classes with a few nice extras on top like lambdas. Sometimes it's hard not to try and rush directly for these features. Rushing for features in the past has been a bad move for me, though. It's clear that I still don't understand some features I want, like incremental re-analysis, and the new semantic error handling model is a mile away.

On the upside the non-void falloff warning means computing the CFG, which can resolve a few issues. LLVM is amazingly finicky about when it will and will not accept code. For example, consider the following program:

    f() {
        return true;
        return true;
    }



It's pretty clear that the second return will never be executed but you would think that it should be legal. Perhaps I will consider explicitly rejecting such code in the future. However, naively generating both returns to LLVM IR will result in an assertion failure, because LLVM does not allow more than one terminator instruction (such as a return or a branch) per basic block. Using a CFG can avoid such problems because we can eliminate statements without any predecessors. We can also issue a warning.
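Here is a minimal sketch of the "drop statements without predecessors" idea for the simplest case, a straight-line statement list. Stmt, PruneResult and PruneAfterTerminator are hypothetical names invented for this illustration, not Wide's actual analyzer types; a real CFG would do the same thing per basic block using predecessor lists.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified statement: a label plus whether it ends control
// flow (return, break, throw and so on).
struct Stmt {
    std::string text;
    bool is_terminator;
};

struct PruneResult {
    std::vector<Stmt> live;   // statements that can actually execute
    std::size_t dead_count;   // statements dropped because nothing reaches them
};

// Keeps everything up to and including the first terminator. Anything after
// it has no predecessor, so it is discarded rather than emitted.
PruneResult PruneAfterTerminator(const std::vector<Stmt>& block) {
    PruneResult result{{}, 0};
    for (std::size_t i = 0; i < block.size(); ++i) {
        result.live.push_back(block[i]);
        if (block[i].is_terminator) {
            result.dead_count = block.size() - (i + 1);
            break;
        }
    }
    return result;
}
```

A non-zero dead_count is the natural place to hang the warning, and only the live statements would be handed to code generation.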

This leads us to the next question. Imagine something like

    f(arg) {
        if (decltype(arg).size > 5)
            return true;
    }



It seems obvious to all that this can fall off the end of a non-void function- for some instances. Other instances cannot. We could discriminate at compile-time, but should we? Right now, the compiler will warn for this function for all instances. When I implement some constant folding (low priority) it will stop warning for instances where it's statically provable.
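To make the trade-off concrete, here is a small sketch of a falloff check over a toy AST, written in C++ rather than Wide. Everything in it (Cond, Stmt, Block, AlwaysReturns, MakeReturn, MakeIf) is a hypothetical name for illustration. The Cond field stands in for the result of constant folding: Unknown models today's behaviour of warning for every instance, while True or False models an instance where the condition is statically provable.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Result of (hypothetical) constant folding on an if condition.
enum class Cond { Unknown, True, False };

struct Stmt;
using Block = std::vector<std::shared_ptr<Stmt>>;

// Toy statement: either a return, or an if with two branches.
struct Stmt {
    bool is_return;
    Cond cond;          // only meaningful when is_return is false
    Block then_branch;
    Block else_branch;
};

bool AlwaysReturns(const Block& block);

bool AlwaysReturns(const Stmt& s) {
    if (s.is_return) return true;
    // A statically known condition means only the taken branch matters.
    if (s.cond == Cond::True) return AlwaysReturns(s.then_branch);
    if (s.cond == Cond::False) return AlwaysReturns(s.else_branch);
    // Otherwise both branches must return.
    return AlwaysReturns(s.then_branch) && AlwaysReturns(s.else_branch);
}

// A block always returns if any statement in it always returns.
bool AlwaysReturns(const Block& block) {
    for (const auto& s : block)
        if (AlwaysReturns(*s)) return true;
    return false;
}

std::shared_ptr<Stmt> MakeReturn() {
    return std::make_shared<Stmt>(Stmt{true, Cond::Unknown, Block(), Block()});
}

std::shared_ptr<Stmt> MakeIf(Cond cond, Block then_branch, Block else_branch = Block()) {
    return std::make_shared<Stmt>(
        Stmt{false, cond, std::move(then_branch), std::move(else_branch)});
}
```

A function body for which AlwaysReturns is false is one that may fall off the end, so it gets the warning.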

In short, it's pretty obvious that I need more manpower. There's just so many tests to write, so many new features to implement, and I need people to bounce ideas off. Monologuing into a blog only serves this purpose to some degree, and the LLVM chat only really suffices for lower-level code-generation stuff (thanks for the help on that stuff, btw). Plus, I don't get Cool Internet Points for working in silence in a corner.

I've been considering putting together a YouTube video or two about Wide. I don't know shit about animating or anything, but when I have another Unix build and upload it, I have my online compiler back again, which should make life a lot easier w.r.t. advertising the language. Just go here and play with this sample, and you can see how easy it is. Fixing up my VS addin (I doubt it needs much work) would also help in this regard.

Sunday 1 June 2014

Uses and analyzer design

I believe I've come to the next stage of analyzer design. It occurred to me that many of the problems I'm looking at have already been solved, by LLVM. Simply stealing their design would seem to be an appropriate solution here.

The problem I've been considering is threefold. First, exceptions. Currently, I can only determine which destructors need calling at the Statement level. However, I need to be able to determine which destructors need calling at the Expression level in order to implement appropriate EH. Secondly, I've been considering the uses problem. For example, given an ImplicitTemporaryExpression, which is stored to, and I'm trying to load from, is it safe to elide the temporary and just take the value that was stored to it? Only if I'm the only user. This suggests that I need to be able to track who uses what expressions. Thirdly, I've been considering the problem of incremental re-analysis and such further. I've come to the conclusion that there are two different types of Expression. The first cannot change- it is an implementation. The second can and it would be a function. The key insight here is that first, I can represent these as different types in my analyzer. In addition, if the second is viewed as a function, then the arguments are all "metaexpressions" that are arguments- including their types, which are meta-expressions.

First, we observe that all expression dependencies form a DAG, or should do. Second, we maintain a list of those uses. When an expression's use count drops to zero, we destroy it. This gives us several things. First, the ability to find and enumerate all uses of an expression. Second, we can eliminate all those annoying ExpressionReference things.
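A rough sketch of what such a use list could look like, loosely modelled on the LLVM idea. Node, AddOperand and HasSingleUser are hypothetical names, and using std::shared_ptr for the "destroy at zero uses" part is an assumption made for this sketch rather than a description of the real analyzer.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical expression node that records who uses it.
struct Node {
    std::vector<Node*> users;                     // every node currently using this one
    std::vector<std::shared_ptr<Node>> operands;  // nodes this one uses; edges form a DAG

    void AddOperand(const std::shared_ptr<Node>& op) {
        operands.push_back(op);
        op->users.push_back(this);
    }

    // True when exactly one node uses this one, which is the precondition for
    // eliding a temporary that is stored to and then loaded from.
    bool HasSingleUser() const { return users.size() == 1; }

    ~Node() {
        // Remove ourselves from each operand's use list. The operands vector is
        // destroyed after this body runs, so the operands are still alive here.
        // Once the last owner of an operand is gone, shared_ptr destroys it too.
        for (auto& op : operands) {
            auto& u = op->users;
            u.erase(std::remove(u.begin(), u.end(), this), u.end());
        }
    }
};
```

Enumerating all uses is then a walk over users, and the "am I the only user?" question from the temporary-elision case is a size check.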

Right now, I am looking at issuing warnings through control flow analysis- e.g. flow may reach end of non-void function. But after that, it's time for another analysis overhaul... great.

I've had one last-ditch thought about the syntax, and I may just introduce attribute syntax from C#- say something like

    [export := "name"]
    [export := cpp("main.cpp").print]
    [return := blah]
    f() { return "hello"; }

I also have failed to consider exporting functions or dynamic functions where their return types or arguments are is-a matches but not an exact match. I need to unify my thunk-generating code to handle these issues.

Finally, I also need to add one of the features I really needed from this analyzer design- MultiTypeDependency. This will essentially tell the analyzer which expressions/statements hold dependencies on arguments of variable type.

Saturday 31 May 2014

Fun with inheritance!

Along the design process of Wide, many people have suggested preprocessing to C++. I declined to go that route because I wanted to have the power to offer different semantics. A simple example is strict left-to-right evaluation order. Another is defined overflow/underflow on integers. But here's a more complex example: virtual functions. In C++, you can have

    class base {
        virtual base* clone();
    }
    class derived : base {
        virtual derived* clone();
    }

This is great, but not safe- we really need to use smart pointers. Unfortunately, this leads to an unfortunate surprise.

    class base {
        virtual std::shared_ptr<base> clone();
    }
    class derived : base {
        virtual std::shared_ptr<derived> clone(); // error
    }

This is obviously bad. So in Wide, I've decided for virtual function arguments and returns, we will use is-a- it's like implicitly convertible, but I've decided for it to have a deeper meaning. This means that the above sample, when converted to Wide, is legal.
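For comparison, this is the usual workaround on the C++ side, sketched under the assumption that every class in the hierarchy overrides the hook: a virtual clone_impl that always returns the base pointer type, plus a non-virtual clone in each class that narrows the result. The class names and the id member are made up for the example. Wide's is-a rule is meant to make this boilerplate unnecessary.

```cpp
#include <cassert>
#include <memory>

class Base {
public:
    virtual ~Base() = default;
    // Non-virtual front end; each derived class hides it with its own return type.
    std::shared_ptr<Base> clone() const { return clone_impl(); }
    virtual int id() const { return 0; }
private:
    // The virtual hook always returns shared_ptr<Base>, so overriding is legal.
    virtual std::shared_ptr<Base> clone_impl() const {
        return std::make_shared<Base>(*this);
    }
};

class Derived : public Base {
public:
    // Narrowing is safe only while every further-derived class overrides clone_impl.
    std::shared_ptr<Derived> clone() const {
        return std::static_pointer_cast<Derived>(clone_impl());
    }
    int id() const override { return 1; }
private:
    std::shared_ptr<Base> clone_impl() const override {
        return std::make_shared<Derived>(*this);
    }
};
```

Calling clone on a Derived gives a std::shared_ptr<Derived>, while calling it through a Base reference still produces a Derived object behind a std::shared_ptr<Base>.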

The problem is when you want to express this Wide hierarchy as C++- because the converted code is illegal. Strictly from a vtable point of view, if I export the constructor then I can set whatever vtable I want in it. But, for example, if the base class function is pure virtual, I need Clang to recognize that derived is not abstract. There is a method in Clang that suggests this but who knows if it can handle the idea of one method overriding another even when the Standard says it should not.

So, yes. Now you can inherit from a C++ interface, but what's going to happen if you try to convert that type to a C++ type, I don't know.

Thursday 29 May 2014

The early-move kraken, incremental analysis and error handling.

I've gotten into the habit of compulsively moving things. It doesn't help that my current design involves a bunch of unique_ptr. So I've just fixed a bunch of bugs in code that looks like this:

OverloadSet* Analyzer::GetOverloadSet(std::unordered_set<clang::NamedDecl*> decls, ClangTU* from, Type* context) {
    if (clang_overload_sets.find(decls) == clang_overload_sets.end()
     || clang_overload_sets[decls].find(context) == clang_overload_sets[decls].end()) {
        clang_overload_sets[decls][context] = Wide::Memory::MakeUnique<OverloadSet>(std::move(decls), from, context, *this);
    }
    return clang_overload_sets[decls][context].get();
}


First, here we're depending on the compiler's evaluation order to evaluate the left-hand side first, because the right-hand side moves from decls and leaves it empty. Even more hilariously, after that, we then use decls AGAIN to get the object we just inserted. Unsurprisingly, this piece of code did not work fantastically well. What's more surprising is how few tests failed.
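One way to restructure a lookup like this so that it no longer cares about evaluation order, as a sketch with stand-in types. Here std::set<int> replaces the set of clang::NamedDecl*, the context level of the map is left out, and the OverloadSet and Analyzer below are simplified hypotheticals rather than the real classes.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <set>
#include <utility>

// Hypothetical stand-in for the real OverloadSet.
struct OverloadSet {
    std::set<int> decls;
    explicit OverloadSet(std::set<int> d) : decls(std::move(d)) {}
};

struct Analyzer {
    std::map<std::set<int>, std::unique_ptr<OverloadSet>> cache;

    OverloadSet* GetOverloadSet(std::set<int> decls) {
        // Find or create the slot while decls is still intact; the map keeps its
        // own copy of the key. References to std::map elements stay valid when
        // other elements are inserted later.
        std::unique_ptr<OverloadSet>& slot = cache[decls];
        if (!slot) {
            // decls is never read after this line, so moving from it is safe.
            slot.reset(new OverloadSet(std::move(decls)));
        }
        return slot.get();
    }
};
```

The point is that the key is consumed exactly once, in its own statement, and the return value comes from the slot reference rather than from a second lookup with the moved-from key.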

    return GetSignature()->BuildCall(Wide::Memory::MakeUnique<Self>(this, !args.empty() ? args[0].get() : nullptr, std::move(val)), std::move(args), c);

 
Similar problem here. We've read args and moved from it in the same function call. Ignoring potential UB due to evaluation order, which I believe is safe because all operations are in terms of user-defined functions, the compiler can (and GCC did) move args first. There are likely many more such places in the Wide codebase (yay!).
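The fix for this shape of bug is to hoist the read into its own statement before the call. A minimal sketch, with hypothetical Call, BuildCall and MakeCall standing in for the real signature machinery and int standing in for the expression type:

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

using Args = std::vector<std::unique_ptr<int>>;

// Hypothetical result type: keeps a non-owning view of the first argument
// alongside the owned argument list.
struct Call {
    const int* first;  // may be null when there are no arguments
    Args args;
};

// Stand-in for BuildCall: takes the arguments by value, so the caller's
// vector is moved from when this parameter is initialized.
Call BuildCall(const int* first, Args args) {
    return Call{first, std::move(args)};
}

Call MakeCall(Args args) {
    // Read from args in a separate statement, before anything can move from it.
    const int* first = args.empty() ? nullptr : args[0].get();
    // The only remaining mention of args is the move itself.
    return BuildCall(first, std::move(args));
}
```

Moving the vector transfers ownership of the elements without relocating the pointed-to objects, so the pointer taken beforehand stays valid.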

Incremental analysis and error handling. So far I don't have any tests about incremental analysis or error recovery, we still use the exception error handling model. But incremental analysis is beginning to concern me because I don't have a solid model for which nodes are responsible for handling what. The core problem is that some nodes but certainly not all would have to be written to listen for changes to their input and adjust appropriately. I just don't know which ones and why.

Tuesday 27 May 2014

Totally not overkill

On today's issue of Totally Not Overkill: JITting a function in your test to tell you what failure to expect. Oh, stop looking at me like that. It was the easiest thing to do since I already have machinery to JIT Wide functions.

ExpectedFailure() { return { "OverloadResolutionFailure", 179, 183 }; }
using movable := cpp("CompileFail/CPPInterop/NonCopyable.h").test;
Main() {
    var := movable();
    copy := var;
    return false;
}



Still, I turned up a BUNCH of bugs in those tests and a lot of missing coverage. For example, I have 30 possible semantic errors right now, and only 15 rejection tests (and several of those are missing member or overload resolution failures). I found some tests that hadn't been updated for the new function argument syntax. I found some tests that hadn't been updated for new integer literal rules. I found some tests that were never valid Wide in totally irrelevant ways. I found an error that gave the using context's location instead of the true location.

Next up I will begin classifying the compilation failure tests by which exception site they test, so I can identify less-tested call sites and exception types. There should be like, 40 compilation failure tests at least.

This is totally orthogonal to the fact that I want to completely overhaul error handling from analysis-terminating exceptions to per-node properties, and add tests for warnings to ensure that they also behave correctly.

Sunday 25 May 2014

Syntactical flexibility

Previously, I kept C++'s member initializer syntax- that is,

    type t {
        var := bool;
        type() : var(true) {}
    }


But this isn't really flexible enough for my needs. For example, var can only be an identifier, but bases need expressions to identify them. In addition, consider the needs of a function that is, say, an exported constructor. This function needs initializers even though it is not, syntactically, a constructor. This causes me to feel the need to unify syntaxes for exporting, explicit return type, and member initializers into a familiar syntax.

    type t {
        var := bool;
        type() var := true; {}
    }

    f()
        export := function;
        member1 := init;
    {}

In addition, consider that now, it could be possible to express the initializer of more than one member. For example,

    type t {
        var := bool;

        var2 := bool;
        type() var, var2 := { true, true }; {}
    }


I also feel that you should be able to define out-of-class functions as members, including a "this" member, and that this should function like a normal "this" in terms of implicit lookup. If the function is not exported as a member then failure. I feel that having export and return as keywords will save the user from not being able to use members that way. You can specify multiple values for export- string, function, or true. True is coming later.

For function I am in the position where you have to use an overload set. This is problematic when it's a member overload set. I feel like I need to re-introduce something like type->member, where -> will denote a static access. This access should produce an overload set including non-static members, where there are non-static member functions.

Finally, I should perform some parameter validation. Code-generation will fail if the exported function is not LLVM-compatible with the one Clang declared, but there's no other sanity checks involved. The same is true of virtual thunks. The compiler should not fail at code generation time.

Friday 2 May 2014

A complete overhaul

This will be the most complete overhaul I've ever done of my analyzer. My lexer and parser I iterated on several times before their current, mostly final, designs. But my analyzer has always operated on the same core design.

This design I'm thinking of as the "Glorified Interpreter" approach. Essentially, we start with something akin to

    struct Expression {
        Type* t;
        llvm::Value* v;
        Expression Add(Expression other);
    };
    struct Type {
        // Default impl is unsupported- throws an exception.
        virtual Expression BuildAdd(Expression lhs, Expression rhs);
    }; 
    Expression Expression::Add(Expression other) {
        return t->BuildAdd(*this, other);
    }

Here we can generate the code for, say, adding, relatively simply.

    struct IntegralType : Type {
        Expression BuildAdd(Expression lhs, Expression rhs) {
            if (!dynamic_cast<IntegralType*>(lhs.t)) throw ...;
            if (!dynamic_cast<IntegralType*>(rhs.t)) throw ...;
            return Expression{ this, builder.CreateAdd(lhs.v, rhs.v) };
        }
    };

This is basically an interpreter, except instead of an actual value, we have an llvm::Value*. I identified a few improvements on the design, like expressing most operations in terms of overload resolution (including primitive operators). However, I've identified a few flaws here.
  1. Like C++, all information must be available up-front and cannot change. This prevents incremental re-analysis, type inference for recursive functions, and such things.
  2. No complex analysis of traits- the compiler has no information available except what type an expression is. We don't know where it came from or what it's doing here or what process created it. We also can't identify other traits like whether it's dependent on a duck-typed argument.
  3. Error-handling is fundamentally broken. As soon as any error occurs the entire system stops. This is fine for an essentially linear process like lexing or parsing, but for an analyzer, there's no reason why you shouldn't analyze the true and false branches of an if just because the condition is bad.
  4. Code generation is performed eagerly, even if the user doesn't need it because it's, say, performing analysis for support features like IDE integration.
  5. Supporting destructors is a bit of a kludge. And by a bit I mean a lot.
Instead, I now express each expression as a node in a graph. The node structure is permanent rather than transient and represents a function more than a value. Here's an example:

     struct Add : Expression {
       std::unique_ptr<Expression> lhs, rhs;
       Type* ty = nullptr;
       std::function<llvm::Value*(llvm::Value*, llvm::Value*)> add;
       std::unique_ptr<Error> err;
       // Called when lhs or rhs changes
       void OnNodeChanged(Node* n) {
           if (!lhs->GetType() || !rhs->GetType()) {               
               if (ty != nullptr) {
                   ty = nullptr;
                   OnChange();
               }
               return; 
           }
           auto unique_expr = lhs->GetType()->BuildAdd(rhs->GetType());
           if (unique_expr.error) {
               err = std::move(unique_expr.error);
               OnChange();
               return;
           }
           if (ty != unique_expr.t) {
               ty = unique_expr.t;
               add = unique_expr.func;
               OnChange();
           }
       }
       void DestroyLocals() {
       }
       Error* GetError() {
           return err.get();
       }
       Type* GetType() {
           return ty;
       }
       llvm::Value* GetValue() {
           return add(lhs->GetValue(), rhs->GetValue());
       }
   };

I can delay code generation until it's actually needed. Each node can individually error or not. I can re-compute the same nodes with different inputs if the inputs are changed, say, because the user was live-typing his source code. I can dynamic_cast the lhs or rhs to see what they are if I want to implement anything funky. I can implement DestroyLocals with references to the LLVM values if I need to.

The main problem I've got so far is that many nodes have the same kind of logic- propagate null-type if our source has it, propagate error if our source has it, etc. So far I've duplicated it, but this is starting to piss me off. I believe it's some kind of monadic bind, I'll have to take a look.
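To show what that bind might look like, here is a small sketch over a made-up result type. TypeResult, MakeOk, MakeNoType, MakeError, Bind and AddTypes are all hypothetical names, and types are represented as plain strings to keep it short. The point is only that the "propagate null-type, propagate error" logic is written once in Bind instead of in every node.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical per-node analysis result: a type, "no type yet", or an error.
struct TypeResult {
    enum State { Ok, NoType, Error } state;
    std::string value;  // type name when Ok, message when Error, empty otherwise
};

TypeResult MakeOk(std::string type) { return TypeResult{TypeResult::Ok, std::move(type)}; }
TypeResult MakeNoType() { return TypeResult{TypeResult::NoType, std::string()}; }
TypeResult MakeError(std::string message) { return TypeResult{TypeResult::Error, std::move(message)}; }

// Bind: run next only when input is Ok; otherwise pass NoType or Error through.
template <typename F>
TypeResult Bind(const TypeResult& input, F next) {
    if (input.state != TypeResult::Ok) return input;
    return next(input.value);
}

// Example node logic: addition is only defined here for identical types.
TypeResult AddTypes(const TypeResult& lhs, const TypeResult& rhs) {
    return Bind(lhs, [&](const std::string& l) -> TypeResult {
        return Bind(rhs, [&](const std::string& r) -> TypeResult {
            if (l != r) return MakeError("cannot add " + l + " and " + r);
            return MakeOk(l);
        });
    });
}
```

With this shape, a node like Add only spells out the interesting case, and a missing type or an error from either operand flows through unchanged.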

I've also just noticed that I've got a problem with temporaries- namely, that the return of BuildAdd has no facility for determining what should be done about them. If the *result* of Add is a temporary, we can destruct it easily enough, but if, say, passing the rhs causes the creation of a temporary argument, then we've got no facility for destroying it. Can't just return a std::function<void()> in the return because how would the Type know what the llvm::Value*s are? Gonna have to say that the Type has to ask for any conversions it wants. There are no core language types that need destructing, and any that do will forward to user-defined function calls. This would be, I guess, an implemented-in-terms-of-Expression node.

     struct Add : Expression {
       std::unique_ptr<Expression> lhs, rhs;
       std::unique_ptr<Expression> impl;
       // Called when lhs or rhs changes
       void OnNodeChanged(Node* n) {
           if (!lhs->GetType() || !rhs->GetType()) {               
               if (impl != nullptr) {
                   impl = nullptr;
                   OnChange();
               }
               return; 
           }
           impl = lhs->GetType()->BuildAdd(rhs->GetType());
           OnChange();
       }
       void DestroyLocals() {
           impl->DestroyLocals();
       }
       Error* GetError() {
           return impl->GetError();
       }
       Type* GetType() {
           return impl->GetType();
       }
       llvm::Value* GetValue() {
           return impl->GetValue();
       }
   };