Running SPEC2000 Benchmarks with SimpleScalar
This guide was created to help others avoid the pitfalls
I encountered trying to get
SPEC2000 benchmarks to run under SimpleScalar. This is certainly not
a comprehensive list of pitfalls and remedies, but it should be enough to
get you started with SPEC2000 and SimpleScalar and also help troubleshoot or
avoid a few problems that I've run into.
You'll have to get your own copy of SPEC2000, as it isn't free. SimpleScalar,
however, is free, provided you are using it for academic research purposes.
The versions of SimpleScalar that I've used are SimpleScalar 3.0d and the Sim-Mase Test-1 release.
If you decide to use Sim-Mase Test-1, there are some things that you'll have
to do to get that environment functional. The main issue is that this release
doesn't come with a complete set of files, so you have to first download the
SimpleScalar 3.0d release, extract it into a directory, and then extract the
Sim-Mase Test-1 release on top of it. The order of operations is important, I
think, as some files get overwritten. After you've extracted both, you will have to
modify the alpha.def file, as the Sim-Mase Test-1 version has an incorrect
implementation of SQRTT. Here's the code snippet with the before and after
(a sketch of the full setup sequence follows the snippet):
/* MODIFIED by fredb */
/* Original Code: */
/* DNA, DNA, DNA, DNA, DNA) */
/* Modified Code: */
/* Revert to what seems right from SS 3.0d distribution. */
DFPR(RC), DNA, DFPR(RB), DNA, DNA)
/* END MODIFIED by fredb */
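If it helps, here is a minimal sketch of that setup sequence in script form. The tarball names are hypothetical, so adjust them (and the layout inside the archives) to match what you actually downloaded:

# Sketch: unpack SimpleScalar 3.0d first, then Sim-Mase Test-1 on top of it,
# so the Test-1 files overwrite the 3.0d ones where the two releases overlap.
import tarfile

TARGET = "simplesim"                      # working directory for the merged tree
SS3_TARBALL = "simplesim-3v0d.tgz"        # hypothetical file name for the 3.0d release
MASE_TARBALL = "sim-mase-test-1.tar.gz"   # hypothetical file name for Sim-Mase Test-1

for tarball in (SS3_TARBALL, MASE_TARBALL):   # the order matters
    with tarfile.open(tarball) as tf:
        tf.extractall(TARGET)

# After extraction, hand-edit alpha.def so the SQRTT operand list matches
# the DFPR(RC), DNA, DFPR(RB), DNA, DNA form shown above.
print("Remember to fix SQRTT in alpha.def before building.")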
Duke Architecture users should use the local resources (see the Duke SPEC
Tools Page mentioned below) rather than downloading these tools themselves.
If you are running in a batch job environment with a large pool of machines,
I recommend compiling on the machine you run on, not running a pre-compiled
binary, as subtle issues, like which version of glibc you have available, can
make for strange behavior that is difficult to debug. Assuming that you
have a shared filesystem across your machines, this is a rather simple item
to script. Just beware of issues with multiple compilations attempting to
run simultaneously. I generally make a sandbox and link to the actual
sources to prevent object files from different builds getting mixed up.
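A minimal sketch of that per-machine build sandbox, assuming a hypothetical shared source path and that your SimpleScalar tree builds with a plain make:

# Sketch: build SimpleScalar in a per-host sandbox of symlinks so that
# simultaneous builds on different machines don't clobber each other's objects.
import os
import socket
import subprocess

SRC = "/shared/simplesim-3.0"                           # hypothetical shared source tree
SANDBOX = f"/scratch/ss-build-{socket.gethostname()}"   # one sandbox per host

os.makedirs(SANDBOX, exist_ok=True)
for name in os.listdir(SRC):
    link = os.path.join(SANDBOX, name)
    if not os.path.exists(link):
        os.symlink(os.path.join(SRC, name), link)   # link the sources, keep objects local

subprocess.run(["make"], cwd=SANDBOX, check=True)   # compile on the machine you run on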
Once you have SimpleScalar building and running toy benchmarks,
you can run SPEC2000.
Beware that SPEC2000 running under a full
sim-outorder or sim-mase simulation environment will run for weeks on even the
fastest Intel processor available. To allow for faster, nearly-as-accurate
simulation, I recommend utilizing the SimPoint Toolkit of Brad Calder and his
students at UCSD. They maintain SimPoints for SPEC on the SimPoint website. If you
decide to go this route, there are a few recommendations I can make, as I
have used SimPoints with SPEC2000 to satisfaction.
At the very least, read the SimPoint documentation at the website cited
above. You don't have to read the
paper that they ask you to cite if you use
SimPoints, but it isn't a bad paper and will give you a better understanding
of what you are doing.
If you're a Duke user, go look at the
aforementioned Duke SPEC Tools Page, as it will let you leverage my work
to potentially skip the following steps and get right to your research. When
you've browsed that material, skip ahead to the next section for more
information on SPEC benchmarks.
Make certain that the SPEC2000 binaries you are using are the ones from
Chris Weaver's website at Michigan. This website has apparently disappeared.
For the time being, I am making these binaries available
here, as I am unaware of any problems
with doing so.
Run sim-fast to completion on the benchmarks with either the full or
test inputs. Compare instruction counts with those from Chris Weaver
(see site referenced above). Since this site is no longer available, I can't
help you here, but I've done this for v3.0d and v4.0 (which uses the v3.0d
core, so the counts are the same, unsurprisingly), so you can skip this step if you are
using one of these versions. Earlier versions of SimpleScalar (and potentially
some modified versions) tally instructions differently, resulting in drift in
instruction counts. This breaks SimPoint, so this check is important. If you aren't
using one of the blessed versions of SimpleScalar, you have a couple of options.
The first would be to run one benchmark to completion on a known-good version
as well as on your new version. If the instruction counts are identical, or
within a few thousand instructions of it, then you can probably assume
that this will work for you. Another option is to count the basic blocks, as
outlined in the SimPoint process, triggering simulation based upon that, rather
than instruction count.
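A minimal sketch of that check; the paths, the benchmark command line, and the known-good count are all placeholders you would fill in yourself. Note that SimpleScalar prints its statistics (including sim_num_insn) to stderr unless you redirect them:

# Sketch: run sim-fast to completion and compare its instruction count
# against a count obtained from a known-good SimpleScalar version.
import re
import subprocess

SIM_FAST = "./sim-fast"                                  # path to your build
BENCH_CMD = ["gzip00.peak.ev6", "input.source", "60"]    # hypothetical benchmark command line
KNOWN_GOOD = 123456789                                   # placeholder: count from a blessed version

result = subprocess.run([SIM_FAST] + BENCH_CMD, capture_output=True, text=True)
match = re.search(r"sim_num_insn\s+(\d+)", result.stderr)   # stats land on stderr by default
count = int(match.group(1))

drift = abs(count - KNOWN_GOOD)
print(f"sim_num_insn = {count}, drift = {drift}")
if drift > 10_000:   # "within a few thousand instructions" rule of thumb from above
    print("Counts differ too much; published SimPoints may not line up with this simulator.")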
Use only what you need to. For a course project, the single SimPoints
should be sufficient and can be easily scripted for unmodified SimpleScalar
(a sketch follows this paragraph). If you need to use multiple SimPoints
(for a camera-ready paper, say), then I recommend modifying SimpleScalar's
fast-forwarding feature to allow for multiple transitions from fast operation
to detailed simulation. This isn't too hard to do. E-mail me if you need help
making these modifications. I hesitate to post code here, as I don't really
want to be in the business of maintaining and supporting simulator patches.
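Picking up the single-SimPoint case mentioned above, here is a minimal sketch of the command such a script would produce. It assumes the standard 100M-instruction SimPoint intervals, that the published simulation point is an interval index, and a hypothetical benchmark command line; check how -fastfwd and -max:inst count instructions in your simulator version before trusting the numbers:

# Sketch: turn a single SimPoint (an interval index) into a sim-outorder run.
INTERVAL = 100_000_000   # standard SimPoint interval size, in instructions
simpoint = 368           # hypothetical interval index from the published SimPoint data

fastfwd = simpoint * INTERVAL   # instructions to skip functionally before detailed simulation
cmd = [
    "./sim-outorder",
    "-fastfwd", str(fastfwd),    # fast-forward to the start of the chosen interval
    "-max:inst", str(INTERVAL),  # then simulate one interval in detail
    "gzip00.peak.ev6", "input.source", "60",   # hypothetical benchmark command line
]
print(" ".join(cmd))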
You should know the following about SPEC2000 and running it with
SimpleScalar (with or without the use of SimPoints).
There are 26 SPEC2000 Benchmarks - 12 Integer and 14 Floating Point.
Each benchmark has test and reference inputs.
Test inputs are for validating that you have things configured properly to run
the benchmark. They can also be used to shorten the runtime of your benchmark
run, but they achieve this by being smaller all around, so it is hard to argue
that they are a representative workload: they may not exercise the caches very
hard, and they may skew other statistics because the startup and shutdown
phases carry disproportionate weight.
Reference inputs are for official benchmark runs of SPEC. These are the
ones you probably want to use, particularly if you are using SimPoint, as
SimPoint gives you the "best of both worlds" - full benchmark
workload behavior with shortened simulation time.
Some benchmarks have multiple inputs. For example, gcc (the GNU C
Compiler) has different inputs in the form of different source files to
compile. I believe that it is common practice to select one of the inputs for use
with SimpleScalar, although a true SPEC run on a machine requires all of the
inputs be run (I think). That is, results are reported for one input set for
gcc in a paper, not for every input set. Some folks indicate which input
set they use by suffixing the benchmark name with the input set name (for
example, gcc-166).
There are SimPoints for each of the different input sets. If you elect
to use multiple SimPoints, you can shorten your simulation time by picking
input sets that have fewer, earlier SimPoints than others for the same
benchmark.
If you need a quick reference for the command line parameters for each
of the input sets, I recommend Ken Barr's SPEC2000 command line pages:
SPEC2000 Integer Benchmarks and SPEC2000 Floating Point Benchmarks.
If you are at Duke, the SPEC2000 inputs with Chris Weaver's Alpha binaries
(from SimpleScalar's benchmarks page) are permanently installed (see the
Duke SPEC Tools Page for details). I've
modified the structure to have copies of the reference inputs, and other
data, if necessary, up in the root of each benchmark (see below), but the
binaries are in the run directory, as would be expected in a SPEC source tree.
Some benchmarks create temporary files in the directory where they are run. I
recommend setting up a sandbox to run from for each separate benchmark run
in order to prevent multiple instances of the benchmark from stepping on
each other's temp files (a sketch of this setup follows this list).
Within SimpleScalar, I've found that the directory structure of input
files and training files doesn't work unless all needed files are in the
directory you run from. This could be a path problem in my environment, but
the simple solution is to put the SPEC binary and all necessary input
files in the sandbox directory where you run SimpleScalar from to
make sure that everything works.
The two exceptions to the flat-directory rule are perlbmk and
parser, which require sub-directories within the sandbox (parser needs a
words sub-directory and perlbmk needs a lib subdirectory). All other
benchmarks seem to work with a flat directory structure containing all
necessary inputs and the binary.
Some benchmarks have an all directory in addition to ref, test, and train
directories under data. You need material from both your target input set and
the all directory.
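Putting the sandbox and flat-directory points together, here is a minimal sketch of a per-run sandbox. The benchmark and scratch paths are hypothetical, and the location of the words directory is whatever it is in your tree:

# Sketch: build a flat, per-run sandbox containing the binary and all inputs,
# so concurrent runs can't step on each other's temporary files.
import os
import shutil

BENCH_ROOT = "/shared/spec2000/197.parser"   # hypothetical path to the installed benchmark
SANDBOX = "/scratch/parser-run-001"          # hypothetical per-run sandbox directory

os.makedirs(SANDBOX, exist_ok=True)

# Flatten the binary (from run/) plus the ref and "all" inputs into the sandbox root.
for src_dir in ("run", "data/ref/input", "data/all/input"):
    full = os.path.join(BENCH_ROOT, src_dir)
    if os.path.isdir(full):
        for name in os.listdir(full):
            path = os.path.join(full, name)
            if os.path.isfile(path):
                shutil.copy(path, SANDBOX)

# Exception: parser needs a words/ subdirectory inside the sandbox (perlbmk
# similarly needs a lib/ subdirectory); adjust the source path to wherever
# that directory lives in your tree.
words_src = os.path.join(BENCH_ROOT, "data/all/input/words")
if os.path.isdir(words_src):
    shutil.copytree(words_src, os.path.join(SANDBOX, "words"), dirs_exist_ok=True)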
One final thing worth noting is that I have had consistent problems getting
certain SPEC benchmarks to run all SimPoints with both SimpleScalar 3.0d and
the sim-mase extensions to it. This may be due to some issue with my
environment, but I suspect the root cause is bugs in the simulator that arise
from interactions with different libraries and such under different Linux
distributions and kernels. I have
not had the bandwidth to investigate this further and have worked around
this by running SimPoints individually until I get a complete set for a
problematic benchmark or by forcing runs of certain benchmarks to happen on
machines that seem to work better than others. For those only running a few of the SPEC benchmarks,
the ones I would avoid are: galgel, lucas, bzip2, gcc, and ammp. If you
elect to use multiple SimPoints and are able to selectively pick benchmarks,
pick ones with earlier SimPoints. Fast-forwarding takes non-trivial time
for things like parser and twolf.
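If you end up using the run-until-complete workaround, here is a small sketch of the bookkeeping; the result-file naming is purely a hypothetical convention, so substitute your own:

# Sketch: find which SimPoints for a benchmark still lack a result file
# and report them so they can be resubmitted individually.
import os

simpoints = [14, 97, 368, 512]   # hypothetical interval indices for one benchmark
result_dir = "results/ammp"      # hypothetical output directory

missing = [sp for sp in simpoints
           if not os.path.exists(os.path.join(result_dir, f"simpoint_{sp}.out"))]
print("Still need to run:", missing or "nothing - the set is complete")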
I welcome feedback on this and hope that it is helpful to you in your
research. Good luck and happy simulating!
Page Created by Fred Bower. Last Updated: 21 November, 2005.