Optimization

Dummy variables to increase speed (optimization): 4 hours vs 5 minutes for age groups. May then fit poorly!
DIC slows down refresh considerably.

Data limits in WinBUGS?

My dissertation is going to have a lot of data. One of my datasets has about 1 million observations in it. I'm going to be splitting it up into "small" chunks around 100k observations for different models. I had no idea if WinBUGS could even handle that much data.

I did some very simplified runs today (my prelim is this week) to see if I could break WinBUGS. Shockingly, it seemed to run just fine.

I just did a very basic regression. The first set had 3 predictors. This is the amount of memory it took:
60k observations: 45k
100k observations: 70k

For 100 updates, it took this much time:
20k: 6s
40k: 13s
60k: 21s
100k: 35s

The second set had 6 predictors. Memory:
20k: 30k
40k: 48k
60k: 70k
100k: 102k

For 100 updates, it took:
20k: 12s
40k: 24s
60k: 36s
100k: 61s

With DIC running and all nodes set, the 100k observations with 6 predictors took 106k of memory and 66 seconds to update 100 times.
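As a rough check on how these timings scale, here is a small Python sketch that divides each timing above by the number of observations and then projects a longer run. The 10,000-update run length is a made-up figure for illustration, and the projection assumes the time per update stays constant, which I have not verified beyond the 100-update runs.

```python
# Timings reported above: seconds per 100 updates,
# keyed by number of predictors, then by thousands of observations.
timings = {
    3: {20: 6, 40: 13, 60: 21, 100: 35},
    6: {20: 12, 40: 24, 60: 36, 100: 61},
}


def seconds_per_1k_obs(times):
    """Seconds per 100 updates per 1,000 observations, for each run."""
    return {n: t / n for n, t in times.items()}


def projected_hours(seconds_per_100_updates, total_updates):
    """Scale a measured 100-update timing linearly to a longer run."""
    return seconds_per_100_updates * (total_updates / 100) / 3600


for predictors, times in timings.items():
    rates = seconds_per_1k_obs(times)
    print(predictors, "predictors:", {n: round(r, 3) for n, r in rates.items()})

# Hypothetical run length, not a number from my actual models.
TOTAL_UPDATES = 10_000
hours = projected_hours(timings[6][100], TOTAL_UPDATES)
print(f"100k observations, 6 predictors, {TOTAL_UPDATES} updates: {hours:.2f} hours")
```

The per-observation rate comes out at roughly 0.30 to 0.35 for 3 predictors and 0.60 to 0.61 for 6, so in these runs the time grew about linearly with both the number of observations and the number of predictors. Under that assumption, the hypothetical 10,000-update run would take about 1.7 hours.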

It appears the primary way to speed up the updates would be to have a faster processor, as WinBUGS always takes as much processor power as is available. As far as I know, there is no way to parallelize WinBUGS, so it can only use one core of a multicore processor. The advantage of a multicore processor, of course, is that you'd still have enough power to do other things while your model is running in the background.

Simple Optimizations

If you have a lot of models to run, or the models you're going to run are extremely large, it may be best to write your own code. My husband is a bleeding-edge video game programmer, so he finds my use of this scripting language (i.e., WinBUGS) ridiculous. He, however, has 15 years of programming experience that I do not have. We estimated, with the help of a statistician friend, that a series of coding optimizations could be done:
1. Use R or S. This probably results in a 3 to 4 times faster model run time.
2. Use a "real" programming language like C++. Another 3 to 4 times faster.
3. Algorithmically optimize your C++ code. Another 10 times faster.
4. Hardware optimize your C++ code. Another 4 times faster.
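Since each step is "another" factor on top of the previous one, the estimates compound. A quick Python sketch of the arithmetic, assuming the factors simply multiply (these are our rough guesses from the list above, not measurements):

```python
from math import prod

# (step, low estimate, high estimate) for each speedup factor listed above.
steps = [
    ("R or S", 3, 4),
    ("C++", 3, 4),
    ("algorithmic optimization", 10, 10),
    ("hardware optimization", 4, 4),
]

# Assumption: the factors are independent and multiply together.
low = prod(step[1] for step in steps)
high = prod(step[2] for step in steps)

print(f"combined speedup: {low}x to {high}x")
```

So going all the way down the list would, by these guesses, be somewhere in the hundreds of times faster than WinBUGS.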

The advantage of using WinBUGS, obviously, is that I don't have to do all of that. I also don't have to worry about errors in my code to the same overwhelming degree as I would in that sort of project. So, here are a few things that don't require quite so much work:
1. Pick smart priors. I know it is easier said than done, but the few minutes spent thinking about the impact of transformations and the like really pay off. If your model starts to fly instead of crawl, you may have chosen priors that make no sense and will not converge.
2. Don't allow too much freedom in your model. In particular, I have a bad habit of including too many unspecified scaling terms and constants. Constants can counterbalance each other in the model! This shows up as constants drifting in opposite directions. Either drop some of them or fix some of them.
3. Within reason, use binary variables instead of continuous variables.
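For the third item, one way to get binary variables is to recode a continuous predictor such as age into age-group indicators before handing the data to WinBUGS. Below is a minimal Python sketch of that recoding. The function name `age_group_dummies`, the example ages, and the group boundaries are all hypothetical choices for illustration, and dropping the first group as the reference category is my assumption about how the model would be set up.

```python
from bisect import bisect_right


def age_group_dummies(ages, edges):
    """Return one row of 0/1 indicators per age.

    Group k holds ages with edges[k] <= age < edges[k + 1]. The first
    group is the reference category, so it gets no column of its own.
    """
    n_groups = len(edges) - 1
    rows = []
    for age in ages:
        # Index of the group this age falls into (0 for the first group).
        group = bisect_right(edges, age) - 1
        if not 0 <= group < n_groups:
            raise ValueError(f"age {age} is outside the range covered by edges")
        rows.append([1 if group == k else 0 for k in range(1, n_groups)])
    return rows


# Hypothetical ages and group boundaries: 18-34, 35-49, 50-64, 65-99.
ages = [23, 37, 45, 61, 78]
edges = [18, 35, 50, 65, 100]

dummies = age_group_dummies(ages, edges)
for age, row in zip(ages, dummies):
    print(age, row)
```

Each row then enters the model as a set of 0/1 predictors in place of the single continuous age value. The trade-off is that grouping throws away information within each group, so the model may fit worse than it would with the continuous variable.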