for combinatorial optimization applications: we present a parallel
version of a constraint-based local search algorithm that has been
implemented on a multiprocessor BladeCenter machine with twin
Cell/BE processors (total of 16 SPUs per blade). This algorithm was
chosen because it fits very well the Cell/BE architecture and
requires neither shared memory nor communication between processors,
while retaining a compact memory footprint. We study the
performance on several large optimization benchmarks and show that
this achieves mostly linear time speedups, even sometimes
super-linear. This is possible because the parallel implementation
might explore simultaneously different parts of the search space and
therefore converge faster towards the best sub-space and thus
towards a solution. Besides getting speedups, the resulting times
exhibit a much smaller variance, which benefits applications where a
timely reply is critical