Thursday, 16 June 2011

I wanted to talk today about a concurrent stack. No, not a stack usable from multiple threads. A stack that we use concurrently with the native hardware stack, the one that holds automatic variables.

Whilst, on most platforms and under most compilers, you can dynamically allocate off the stack, it's really not a good idea, because as soon as the function returns, that memory is gone. Depending upon the smarts of your compiler and the logic of your function, you may also have to pay for dynamic indexing of your normal, non-dynamically-allocated stack-based variables. And because you can't exactly call alloca() from within a constructor, you have to call it manually every time, which is poor encapsulation and ensures that you will never be able to use it with a Standard container.

So I wanted to propose a solution: a second stack, allocated off the heap. You'd need one for each thread- accomplished below with C++0x's thread_local. This would be a relatively large chunk of contiguous memory, functioning in essentially the same way as the current hardware stack does- a simple decrement to allocate memory, with everything deallocated at once when the scope ends. The difference is that you can define "scope" manually, pass scopes around, and take references to them.


thread_local memory_stack stack;

// Forward declaration, so the no-argument overload can call it
std::basic_string<char, std::char_traits<char>, stack_allocator> func(const scope& s);

std::string func() {
    scope my_scope = stack.make_scope();

    auto stack_string = func(my_scope);
    return std::string(stack_string.begin(), stack_string.end());
}
std::basic_string<char, std::char_traits<char>, stack_allocator> func(const scope& s) {
    decltype(func(s)) str(s);

    // Perform some expensive computation involving lots of memory allocation here,
    // but where none of the memory needs to live past the function call
    return str;
}
int main() {
    // We want the string to live for more than a little while-
    // say, we're going to move it into a container or something,
    // which would be a copy for a stack-based string
    auto heap_string = func();

    // The string lives only for this function
    scope s = stack.make_scope();
    auto stack_string = func(s);
}
int main() {
    // We want the string to live for more than a little while
    // let's say, we're going to move it into a container or something
    // which would be a copy for a stack-based string
    auto heap_string = func();

    // The string lives only for this function,
    scope s = stack.make_scope();
    auto stack_string = func(s);
}


This approach is quite efficient- but it has a few problems. For example, we can't pop off the working set of func(s), because the string is in that working set- we have to wait until we no longer need the string to get rid of any of it. This means the working set of func(s) could sit around for quite some time, and the problem compounds as you make more calls. So what I might consider is a transition to an instance-based approach. It also solves the thread-safety problem and the icky global. However, you now have to allocate a new stack for every function that wants to use this approach, which dramatically reduces the amount of memory we can reasonably pre-allocate, as we may now have many instances instead of just one per thread.


std::basic_string<char, std::char_traits<char>, stack_allocator> func(memory_stack& s);

std::string func() {
    memory_stack m;
    auto stack_string = func(m);
    return std::string(stack_string.begin(), stack_string.end());
}
std::basic_string<char, std::char_traits<char>, stack_allocator> func(memory_stack& s) {
    decltype(func(s)) str(s);
    memory_stack inner_stack; // owch
    // Perform some expensive computation involving lots of memory allocation here,
    // from inner_stack, but where none of the memory needs to live past the function call
    return str;
}

int main() {
    memory_stack stack;
    auto heap_string = func();

    auto stack_string = func(stack);
}


The other problem is working-set size. We can only reasonably pre-allocate so much memory, and the scheme is unsuitable for large individual allocations, although we can use an unrolled linked list to cope reasonably efficiently with unbounded working sets composed of many smaller allocations.

The other place where I can see this being of use is where an individual object needs dynamic memory that will always live for the lifetime of the object- where multiple dynamic allocations are required, of course.

Edit: What is the use case for such a class/such semantics? It's simple: repeated dynamic allocations and deallocations under a certain size, with a limited scope. If you're building temporary linked lists, maps, hash_maps, or even temporary vectors or strings where the size isn't known up front, then you can go from several allocations to one- or to one per chunk of the maximum allocation size. Be careful of that maximum allocation size, though.

Of course, if all the chunks have the same max allocation size, then they themselves can be allocated from a fixed-size object pool, such as that provided by Boost.

Apparently, I'm not the first one to have this idea, which would hardly be the first time my Epic Uber Groundbreaking™ idea was already thought of by someone else, and it's known as a memory pool. The Boost library only offers a fixed-size pool, so forgive me for not noticing that :P
