Using DIC for Model Evaluation

There is now a very good page about DIC available at the BUGS project website. Using DIC requires model convergence. WinBUGS does not detect convergence, but it will not let you turn on the DIC monitor until the model has had some burn-in. I've found that WinBUGS will allow the DIC monitor to be set once more than 2000 iterations have been run, though a longer burn-in period may be needed. Running the DIC monitor slows down iteration speed considerably, so it is worth using it sparingly. However, it is also good to have it be accurate! When I set the DIC monitor I also monitor the node named "deviance", and I keep taking iterations until the MC error of the deviance node is 5% of its standard deviation (sd) or less.
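The 5% rule of thumb above is easy to check by hand, but here is a minimal Python sketch of it. The function name and the example numbers are made up for illustration; the MC error and sd would be read off the WinBUGS node statistics for "deviance".

```python
def mc_error_small_enough(mc_error, sd, threshold=0.05):
    """Return True when the MC error is at most `threshold` times the sd."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return mc_error / sd <= threshold


# Hypothetical values copied by hand from a node statistics table.
print(mc_error_small_enough(mc_error=0.12, sd=3.0))  # ratio 0.04
print(mc_error_small_enough(mc_error=0.30, sd=3.0))  # ratio 0.10
```

If the check fails, the approach described above is simply to run more iterations and look at the statistics again.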

Variable Names

WinBUGS will blow up on model checking if a variable name starts with a number.

WinBUGS is happy with variable names that have periods and underscores in them. It does not like variable names with hyphens.
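If the variable names come from another program, it can help to screen them before building the data file. The sketch below encodes only the observations above (no leading digit, no hyphens, periods and underscores are fine). WinBUGS may have further naming restrictions that it does not cover, and both function names and the "v" prefix are my own invention.

```python
import re

# Assumed rule: a letter first, then letters, digits, periods or underscores.
SAFE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9._]*")


def is_safe_bugs_name(name):
    """True if the name passes the (assumed) rule above."""
    return SAFE_NAME.fullmatch(name) is not None


def make_safe_bugs_name(name):
    """Swap hyphens for periods and prefix 'v' if the name starts with a digit."""
    cleaned = name.replace("-", ".")
    if cleaned and cleaned[0].isdigit():
        cleaned = "v" + cleaned
    return cleaned


for raw in ["age_yrs", "pre-test", "12mo-score"]:
    print(raw, is_safe_bugs_name(raw), make_safe_bugs_name(raw))
```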

Rectangular data structure

I prefer the rectangular structure for data, though it can be a bit of a hassle to get into WinBUGS. The main trick is that you must end the data entry with a line with the text "END" and then hit "enter" twice, so you have at least one blank line.

It is only possible to enter data that you will be using in your model. Unlike SAS or other programs, you can't load in data but not use it. Unfortunately, this means a lot of datafile management if you're running a lot of models with slight modifications.

The rectangular data has two parts. First you tell WinBUGS how many observations you have. For example:
list(N=n, M=m)

The next part is the data part. The top line has the (case sensitive!) variable names followed by brackets. Each line after that is an entry. "END" is put on a line by itself at the end of the data entry and there must be at least one blank line at the end of the file. For example:
A[] B[] C[]
a1 b1 c1
a2 b2 c2
....
an bn cn
END
[blank line]

Entering the data can be painful. It is fairly easy if you do not have any missing data or can replace all missing data with NA in whatever program you usually use for data management. If you don't have any missing data, create a dataset with only the variables that you need and export it or save it as a delimited file (no commas). I open these, add in the header and footer, and then save as a .txt file, which WinBUGS can read.

My process for datasets with missing data is much more painful. It goes something like this:
1. Export the data into Excel
2. Replace missing data with "NA" and arrange the data in Excel so the columns are the way I want them
3. Copy the columns of data I want to import
4. Paste as unformatted text into Word (this is a Paste Special)
5. Search and replace all tabs (^t) with a space
6. Copy the text
7. Paste Special into a WinBUGS file as plain text
8. On the next row, type END
9. Add at least two blank lines after END
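If you have a scripting language handy, the steps above can be sketched in a few lines instead. This is a rough Python version, not something I have run against every WinBUGS quirk: it assumes the data are already in memory as columns, that missing values are stored as None or NaN, and the function name and example values are hypothetical.

```python
import math


def format_rectangular(columns, blank_lines=2):
    """Build rectangular-format text from a dict of column name -> list of values."""
    names = list(columns)
    if len({len(values) for values in columns.values()}) != 1:
        raise ValueError("all columns must have the same number of rows")

    def cell(value):
        # Missing values are written as NA.
        if value is None or (isinstance(value, float) and math.isnan(value)):
            return "NA"
        return str(value)

    lines = [" ".join(name + "[]" for name in names)]  # header: A[] B[] C[]
    for row in zip(*(columns[name] for name in names)):
        lines.append(" ".join(cell(value) for value in row))  # space-delimited
    lines.append("END")
    # Newline after END, then the trailing blank lines.
    return "\n".join(lines) + "\n" * (1 + blank_lines)


text = format_rectangular({"MALE": [1, 0], "AGE": [34, None], "SCORE": [12.5, 9.0]})
print(text)
```

The resulting text can be saved as a .txt file, or pasted into a WinBUGS file as plain text.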

Batch scripts for WinBUGS

You can run scripts from within WinBUGS by going to Model -> Script. The example is actually quite good (User Manual -> Batch-mode: Scripts), but there are two things that took me a moment to figure out:

1. WinBUGS assumes that your file paths are relative to the WinBUGS program directory. That is, the directory it opens when you choose File -> Open. This is annoying to me because I don't store my work anywhere near there. Oh, well.

2. If you use rectangular data structures, you'll have to have (at least) two data files. The first is the number of observations (i.e., "list(N=n, M=m)"). The second is the actual rectangular data.
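To make the second point concrete, here is a small Python sketch that assembles a script with one data() line per data file. The script commands are written the way I remember them from the manual's batch-mode example, so check them against your copy of the User Manual. Every file name below is hypothetical and, as noted above, would be resolved relative to the WinBUGS program directory.

```python
def build_script(model_file, data_files, inits_file, burn_in, monitors, iterations):
    """Return script text that loads every data file before compiling."""
    lines = ["display('log')", "check('%s')" % model_file]
    # One data() call per file: the list(N=n, M=m) file and the rectangular file.
    lines += ["data('%s')" % path for path in data_files]
    lines += [
        "compile(1)",
        "inits(1, '%s')" % inits_file,
        "gen.inits()",
        "update(%d)" % burn_in,  # burn-in before setting monitors
    ]
    lines += ["set(%s)" % node for node in monitors]
    lines += ["update(%d)" % iterations, "stats(*)"]
    return "\n".join(lines) + "\n"


script = build_script(
    model_file="myproject/model.txt",  # hypothetical paths
    data_files=["myproject/counts.txt", "myproject/rect_data.txt"],
    inits_file="myproject/inits.txt",
    burn_in=2000,
    monitors=["deviance", "mse"],
    iterations=10000,
)
print(script)
```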

Out of Sample tests

Here is a trick I learned (thanks, Dave!) for performing an out of sample test on new data. This allows you to read in one data set where the first N rows are the sample used for estimation and rows N+1 to M are the out of sample data. The two samples do not need to be the same size. Below, I calculate the mean squared error for the out of sample test, which can be monitored like any other parameter.

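model;
{
   # Estimation sample: rows 1 to N
   for( i in 1 : N ) {
      mu[i] <- cons + male * MALE[i] + age * AGE[i]
      SCORE[i] ~ dnorm(mu[i], tau)
   }

   # Out of sample rows N+1 to M: predicted, but not part of the likelihood
   for( j in N+1 : M ) {
      predmu[j-N] <- cons + male * MALE[j] + age * AGE[j]
      error[j-N] <- predmu[j-N] - SCORE[j]
      se[j-N] <- error[j-N] * error[j-N]   # squared error
   }

   # Mean squared error over the out of sample rows
   mse <- mean(se[])

   # Vague priors
   cons ~ dnorm( 0.0, 1.0E-6)
   male ~ dnorm( 0.0, 1.0E-6)
   age ~ dnorm( 0.0, 1.0E-6)
   tau ~ dgamma(0.001, 0.001)
}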
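
To see what the mse node is doing, here is the same arithmetic as a Python sketch for one fixed set of parameter values. WinBUGS recomputes mse at every iteration, so what you monitor is a whole distribution rather than the single number below. The function name, coefficients, and data are all made up for illustration.

```python
def out_of_sample_mse(cons, male, age, MALE, AGE, SCORE, N):
    """Mean squared prediction error over rows N+1 to M (1-based, as in the model)."""
    M = len(SCORE)
    if not 0 < N < M:
        raise ValueError("need 0 < N < M so the out of sample set is not empty")
    squared_errors = []
    for j in range(N, M):  # 0-based rows N..M-1 are model rows N+1..M
        predmu = cons + male * MALE[j] + age * AGE[j]
        error = predmu - SCORE[j]
        squared_errors.append(error * error)
    return sum(squared_errors) / len(squared_errors)


# First N=2 rows are the estimation sample, last 2 are out of sample.
mse = out_of_sample_mse(
    cons=1.0, male=2.0, age=0.5,
    MALE=[1, 0, 1, 0], AGE=[10, 20, 30, 40], SCORE=[8, 11, 17, 22], N=2,
)
print(mse)  # 1.0 (predictions 18 and 21 against scores 17 and 22)
```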