Running SPEC2000 Benchmarks with SimpleScalar


Introduction

This guide was created to help others avoid the pitfalls I encountered trying to get SPEC2000 benchmarks to run under SimpleScalar. This is certainly not a comprehensive list of pitfalls and remedies, but it should be enough to get you started with SPEC2000 and SimpleScalar and also help troubleshoot or avoid a few problems that I've run into.

You'll have to get your own copy of SPEC2000, as it isn't free. SimpleScalar, however, is free, provided you are using it for academic research purposes.

SimpleScalar Notes

The versions of SimpleScalar that I've used are:

If you decide to use Sim-Mase Test-1, there are a few things you'll have to do to get that environment functional. The main issue is that this release doesn't come with a complete set of files, so you first have to download the SimpleScalar 3.0d release, extract it into a directory, and then extract the Sim-Mase Test-1 release on top of it. The order of operations is important, I think, as some files get overwritten. After you've extracted both, you will have to modify the alpha.def file, as the Sim-Mase Test-1 version defines the SQRTT instruction incorrectly. Here's the code snippet with the before and after:

DEFINST(SQRTT, 0x2b,
     "sqrtt",     "B,C",
     NA,     NA,
     /* MODIFIED by fredb */
     /* Original Code: */
     /* DNA, DNA,     DNA, DNA, DNA) */
     /* Modified Code: */
     /* Revert to what seems right from SS 3.0d distribution. */
     DFPR(RC), DNA,     DFPR(RB), DNA, DNA)
     /* END MODIFIED by fredb */
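
The extraction order described before the snippet can be scripted. What follows is only a minimal sketch in Python: the archive names in the usage comment are hypothetical placeholders, and it assumes that both archives unpack to the same relative paths, so that files from the second archive overwrite those from the first.

```python
import tarfile


def extract_in_order(archives, dest):
    """Extract each archive into dest, in the order given.

    Files from later archives overwrite files of the same name
    from earlier ones, which is why the order matters.
    """
    for path in archives:
        with tarfile.open(path) as tar:
            tar.extractall(dest)


# Hypothetical usage (archive names are placeholders, not real file names):
# extract_in_order(["simplescalar-3.0d.tgz", "sim-mase-test-1.tgz"], "build")
```

After extraction, the alpha.def edit shown above still has to be made by hand.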

Duke Architecture users should use the local resources, rather than going and getting these tools. If you are running in a batch job environment with a large pool of machines, I recommend compiling on the machine you run on, rather than running a pre-compiled binary, as subtle issues, like which version of glibc you have available, can make for strange behavior that is difficult to debug. Assuming that you have a shared filesystem across your machines, this is rather simple to script. Just beware of problems caused by multiple compilations running simultaneously in the same directory. I generally make a sandbox for each build and link to the actual sources to prevent object files from different builds getting mixed up.
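
The sandbox approach can be sketched roughly as follows. This is an illustration, not my actual script: the function name is made up, and it assumes that skipping .o and .a files is enough to keep stale objects out of the sandbox.

```python
import os


def make_sandbox(src_root, sandbox_root):
    """Mirror src_root's directory tree under sandbox_root.

    Directories are created for real; every source file becomes a
    symlink back to the original. Object files and archives are
    skipped so each sandbox builds its own.
    """
    src_root = os.path.abspath(src_root)
    linked = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(sandbox_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            if name.endswith((".o", ".a")):
                continue  # assumption: these are build products
            link = os.path.join(target_dir, name)
            if not os.path.lexists(link):
                os.symlink(os.path.join(dirpath, name), link)
            linked.append(link)
    return linked
```

Each batch job would then call make_sandbox with its own sandbox directory (a per-job scratch path, for example) and run make inside it, so that concurrent builds never write objects into the same place.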

Once you have SimpleScalar building and running toy benchmarks, you can run SPEC2000.

SimPoints

Beware that SPEC2000 running under a full sim-outorder or sim-mase simulation environment will run for weeks on even the fastest Intel processor available. To allow for faster, nearly-as-accurate simulation, I recommend using the SimPoint Toolkit of Brad Calder and his students at UCSD. They maintain SimPoints for SPEC on the SimPoint website. If you decide to go this route, there are a few recommendations I can make, as I have used SimPoints with SPEC2000 with satisfactory results.

  1. At the very least, read the SimPoint documentation at the website cited above. You don't have to read the paper that they ask you to cite if you use SimPoints, but it isn't a bad paper and will give you a better understanding of what you are doing.
  2. If you're a Duke user, go look at the aforementioned Duke SPEC Tools Page, as it will let you leverage my work to potentially skip the following steps and get right to your research. When you've browsed that material, skip ahead to the next section for more information on SPEC benchmarks.
  3. Make certain that the SPEC2000 binaries you are using are the ones from Chris Weaver's website at Michigan. This website has apparently disappeared. For the time being, I am making these binaries available here, as I am unaware of any problems with doing so.
  4. Run sim-fast to completion on the benchmarks with either the full or test inputs. Compare instruction counts with those from Chris Weaver (see site referenced above). Since this site is no longer available, I can't help you here, but I've done this for v3.0d and v4.0 (which uses the v3.0d core, so it is the same, to no surprise), so you can skip this step if you are using one of these versions. Earlier versions of SimpleScalar (and potentially some modified versions) tally instructions differently, resulting in drift in instruction counts. This breaks SimPoint, so it is important. If you aren't using one of the blessed versions of SimpleScalar, you have a couple of options. The first is to run one benchmark to completion on a known-good version as well as on your new version. If the instruction counts are identical, or within a few thousand instructions of each other, then you can probably accept the risk and assume that this will work for you. Another option is to count the basic blocks, as outlined in the SimPoint process, and trigger simulation based upon that, rather than on instruction count.
  5. Use only what you need to. For a course project, the single SimPoints should be sufficient and can be easily scripted for unmodified SimpleScalar. If you need to use multiple SimPoints (for a camera-ready paper, say), then I recommend modifying SimpleScalar's fast-forwarding feature to allow for multiple transitions from fast operation to detailed simulation. This isn't too hard to do. E-mail me if you need help making these modifications. I hesitate to post code here, as I don't really want to be in the business of maintaining code.
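
The instruction-count comparison from step 4 is easy to automate once you have saved the simulator output from both runs. The sketch below makes two assumptions: that the statistics dump contains a line beginning with sim_num_insn followed by the count, and that 5000 is an acceptable reading of "a few thousand" (that default is my illustrative choice, so adjust it to taste).

```python
import re

# Assumption: the statistics dump contains a line such as
#   sim_num_insn      123456 # total number of instructions executed
STAT_LINE = re.compile(r"^sim_num_insn\s+(\d+)", re.MULTILINE)


def instruction_count(sim_output):
    """Return the instruction count found in a saved simulator output."""
    match = STAT_LINE.search(sim_output)
    if match is None:
        raise ValueError("no sim_num_insn line found in simulator output")
    return int(match.group(1))


def counts_agree(reference_output, candidate_output, tolerance=5000):
    """Compare a known-good run against a run on the version under test.

    Returns (ok, drift), where drift is the absolute difference in
    instruction counts and ok is True when drift <= tolerance.
    """
    drift = abs(instruction_count(reference_output)
                - instruction_count(candidate_output))
    return drift <= tolerance, drift
```

Feed it the captured output of the same benchmark and input run on both versions. A large drift suggests that instruction-count-based SimPoints will not line up on the version under test.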

SPEC2000

You should know the following about SPEC2000 and running it with SimpleScalar (with or without the use of SimPoints).

One final thing worth noting is that I have had consistent problems getting certain SPEC benchmarks to run all SimPoints with both SimpleScalar 3.0d and the sim-mase extensions to it. This may be due to some issue with my environment, but I suspect the root cause is bugs in the simulator that arise from interactions with different libraries and such under different Linux distributions and kernels. I have not had the bandwidth to investigate this further and have worked around it by running SimPoints individually until I get a complete set for a problematic benchmark, or by forcing runs of certain benchmarks to happen on machines that seem to work better than others. For those only running a few of the SPEC benchmarks, the ones I would avoid are: galgel, lucas, bzip2, gcc, and ammp. If you elect to use multiple SimPoints and are able to selectively pick benchmarks, pick ones with earlier SimPoints. Fast-forwarding takes non-trivial time for things like parser and twolf.
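
To see why earlier SimPoints are cheaper, it helps to work out how many instructions have to be skipped before detailed simulation starts. The sketch below assumes 100-million-instruction intervals and intervals numbered from 0; check the SimPoint documentation for both, and set first_index=1 if your SimPoints are numbered from 1. It uses sim-outorder's -fastfwd and -max:inst options. The benchmark names and SimPoint numbers in the usage comment are placeholders, not real data.

```python
def fastforward_args(simpoint, interval_size=100_000_000, first_index=0):
    """Build simulator arguments for a single SimPoint.

    Skips every interval before the chosen one, then simulates one
    interval in detail. interval_size and first_index are assumptions
    that must match how the SimPoints were generated.
    """
    skipped = (simpoint - first_index) * interval_size
    if skipped < 0:
        raise ValueError("simpoint is smaller than first_index")
    return ["-fastfwd", str(skipped), "-max:inst", str(interval_size)]


def cheapest_first(simpoints):
    """Order benchmark names by SimPoint number, earliest first."""
    return sorted(simpoints, key=simpoints.get)


# Hypothetical usage with placeholder names and numbers:
# cheapest_first({"bench_a": 12, "bench_b": 3, "bench_c": 47})
# fastforward_args(3)
```

The fast-forward cost grows linearly with the SimPoint number, so a benchmark whose SimPoint sits hundreds of intervals in spends most of its wall-clock time just getting there.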

I welcome feedback on this and hope that it is helpful to you in your research. Good luck and happy simulating!


Page Created by Fred Bower. Last Updated: 21 November, 2005.