Parallel stream vs sequential stream vs for-each loop processing in Java 8
I had a discussion with another Java programmer about processing collections with Java 8 streams, and she mentioned that she prefers classical loops because streams are slower. This is the perfect start for a heated debate about which one is better, and the best way to continue it is with facts. In a previous post I parsed a fairly large text file into a list, so I decided to extend the food-parsing code and process this list in three ways: the old-fashioned for-each loop, a sequential stream and a parallel stream. The list produced by the parser contains almost 9000 elements, and the processing consists of computing the number of calories in a common household measure, such as a tablespoon, from the number of calories in 100 g.
Each line from the original file is saved in an instance of:

/**
 * Holds the values from one line of the ABBREV.txt file.
 */
data class Abbrev(val ndbNo: String,
                  val shrtDesc: String,
                  val energKcal: Int) {
    var gmWt1: Double = 0.0
    var gmWtDesc1: String = ""
    var gmWt2: Double = 0.0
    var gmWtDesc2: String = ""
}

The result of the transformation is kept in an instance of:

/**
 * Holds the kilocalories for 100 g and for the two common household measures.
 */
data class AbbrevKcal(val ndbNo: String,
                      val energKcal: Int,
                      val gmWt1Kcal: Double,
                      val gmWt2Kcal: Double)

and the transformation is done by:

fun convert(abbrev: Abbrev): AbbrevKcal = AbbrevKcal(
        abbrev.ndbNo,
        abbrev.energKcal,
        abbrev.energKcal * abbrev.gmWt1 / 100,
        abbrev.energKcal * abbrev.gmWt2 / 100)

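To make the formula concrete, here is a minimal Java sketch of the same arithmetic. The class name `KcalPerMeasureExample` and the sample values (884 kcal per 100 g, a tablespoon weighing 13.6 g) are made up for illustration and are not taken from ABBREV.txt.

```java
class KcalPerMeasureExample {

    // Same arithmetic as convert():
    // kcal in a measure = kcal per 100 g * grams in the measure / 100.
    static double kcalPerMeasure(int energKcalPer100g, double gramsInMeasure) {
        return energKcalPer100g * gramsInMeasure / 100;
    }

    public static void main(String[] args) {
        // Hypothetical food: 884 kcal per 100 g, one tablespoon weighs 13.6 g.
        double kcal = kcalPerMeasure(884, 13.6);
        System.out.println(kcal + " kcal per tablespoon");
    }
}
```

With these sample numbers a tablespoon comes out at roughly 120 kcal, which is the kind of value stored in `gmWt1Kcal` and `gmWt2Kcal`.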
I used Kotlin to implement the data classes because they are very easy to define in Kotlin, and it is also very easy to mix Kotlin with Java. I used JMH to create the benchmark; the code is below:
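
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

// Abbrev, AbbrevKcal, ConverterKt and StreamAbbrevParser are assumed
// to live in the same package as this class.
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class KCalConverter {

    // Parsed once per benchmark thread, outside the measured methods.
    List<Abbrev> abbrevs = new StreamAbbrevParser().parseFile("/sr28abbr/ABBREV.txt");

    // Empty method: measures the overhead of the harness itself.
    @Benchmark
    public void baseline() {
    }

    @Benchmark
    public void forEachConvert(Blackhole blackhole) {
        List<AbbrevKcal> abbrevKcals = new ArrayList<>(abbrevs.size());
        for (Abbrev abbrev : abbrevs) {
            abbrevKcals.add(ConverterKt.convert(abbrev));
        }
        // Consume the result so the JIT cannot eliminate the work.
        blackhole.consume(abbrevKcals);
    }

    @Benchmark
    public void sequentialStreamConvert(Blackhole blackhole) {
        List<AbbrevKcal> abbrevKcals = abbrevs.stream()
                .map(ConverterKt::convert)
                .collect(Collectors.toList());
        blackhole.consume(abbrevKcals);
    }

    @Benchmark
    public void parallelStreamConvert(Blackhole blackhole) {
        List<AbbrevKcal> abbrevKcals = abbrevs.parallelStream()
                .map(ConverterKt::convert)
                .collect(Collectors.toList());
        blackhole.consume(abbrevKcals);
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(KCalConverter.class.getSimpleName())
                .warmupIterations(25)
                .measurementIterations(100)
                .forks(1)
                .build();
        new Runner(opt).run();
    }
}
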
I was ready to see impressive results for parallel processing, but I was disappointed: the parallel stream was only about 10% faster than the for-each loop. The results looked like this:

Benchmark                              Mode  Cnt       Score      Error  Units
KCalConverter.baseline                 avgt  100       0.324 ±    0.001  ns/op
KCalConverter.forEachConvert           avgt  100  184953.601 ±  988.417  ns/op
KCalConverter.parallelStreamConvert    avgt  100  167660.485 ± 1993.920  ns/op
KCalConverter.sequentialStreamConvert  avgt  100  189933.029 ± 2779.487  ns/op

After a few minutes of thought I realised that the actual processing is too simple and runs fast even sequentially. I artificially “improved” it to use BigDecimal:

import java.math.BigDecimal

fun complexConvert(abbrev: Abbrev): AbbrevKcal = AbbrevKcal(
        abbrev.ndbNo,
        abbrev.energKcal,
        BigDecimal(abbrev.energKcal).multiply(BigDecimal(abbrev.gmWt1)).divide(BigDecimal(100)).toDouble(),
        BigDecimal(abbrev.energKcal).multiply(BigDecimal(abbrev.gmWt2)).divide(BigDecimal(100)).toDouble())

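A side note on why this version is so much heavier: besides allocating several objects per element, the `BigDecimal(double)` constructor converts the exact binary value of the double, so the operands usually carry dozens of digits. That is probably a contributing factor rather than the whole explanation. The sketch below only illustrates the digit counts; the class name `BigDecimalCostExample` and the 13.6 g weight are hypothetical.

```java
import java.math.BigDecimal;

class BigDecimalCostExample {

    // Significant digits when the double is converted exactly,
    // which is what the BigDecimal(double) constructor does.
    static int exactDigits(double value) {
        return new BigDecimal(value).precision();
    }

    // Significant digits when going through Double.toString,
    // which is what BigDecimal.valueOf(double) does.
    static int shortDigits(double value) {
        return BigDecimal.valueOf(value).precision();
    }

    public static void main(String[] args) {
        double grams = 13.6; // hypothetical weight of one tablespoon
        System.out.println(new BigDecimal(grams));     // long exact binary expansion
        System.out.println(BigDecimal.valueOf(grams)); // prints 13.6
        System.out.println(exactDigits(grams) + " vs " + shortDigits(grams) + " digits");
    }
}
```

For a benchmark that only needs "more work per element" the long operands are harmless, but they are worth knowing about before copying the pattern into real code.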
The benchmark class gets a "complex" variant of each method:

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class KCalConverter {

    List<Abbrev> abbrevs = new StreamAbbrevParser().parseFile("/sr28abbr/ABBREV.txt");

    @Benchmark
    public void baseline() {
    }

    @Benchmark
    public void forEachConvert(Blackhole blackhole) {
        List<AbbrevKcal> abbrevKcals = new ArrayList<>(abbrevs.size());
        for (Abbrev abbrev : abbrevs) {
            abbrevKcals.add(ConverterKt.convert(abbrev));
        }
        blackhole.consume(abbrevKcals);
    }

    @Benchmark
    public void forEachComplexConvert(Blackhole blackhole) {
        List<AbbrevKcal> abbrevKcals = new ArrayList<>(abbrevs.size());
        for (Abbrev abbrev : abbrevs) {
            abbrevKcals.add(ConverterKt.complexConvert(abbrev));
        }
        blackhole.consume(abbrevKcals);
    }

    @Benchmark
    public void sequentialStreamConvert(Blackhole blackhole) {
        List<AbbrevKcal> abbrevKcals = abbrevs.stream()
                .map(ConverterKt::convert)
                .collect(Collectors.toList());
        blackhole.consume(abbrevKcals);
    }

    @Benchmark
    public void sequentialStreamComplexConvert(Blackhole blackhole) {
        List<AbbrevKcal> abbrevKcals = abbrevs.stream()
                .map(ConverterKt::complexConvert)
                .collect(Collectors.toList());
        blackhole.consume(abbrevKcals);
    }

    @Benchmark
    public void parallelStreamConvert(Blackhole blackhole) {
        List<AbbrevKcal> abbrevKcals = abbrevs.parallelStream()
                .map(ConverterKt::convert)
                .collect(Collectors.toList());
        blackhole.consume(abbrevKcals);
    }

    @Benchmark
    public void parallelStreamComplexConvert(Blackhole blackhole) {
        List<AbbrevKcal> abbrevKcals = abbrevs.parallelStream()
                .map(ConverterKt::complexConvert)
                .collect(Collectors.toList());
        blackhole.consume(abbrevKcals);
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(KCalConverter.class.getSimpleName())
                .warmupIterations(25)
                .measurementIterations(100)
                .forks(1)
                .build();
        new Runner(opt).run();
    }
}

and the results are:

Benchmark                                     Mode  Cnt         Score        Error  Units
KCalConverter.baseline                        avgt  100         0.324 ±      0.001  ns/op
KCalConverter.forEachComplexConvert           avgt  100  11306389.001 ±  61637.238  ns/op
KCalConverter.forEachConvert                  avgt  100    184953.601 ±    988.417  ns/op
KCalConverter.parallelStreamComplexConvert    avgt  100   3902042.888 ±  60230.745  ns/op
KCalConverter.parallelStreamConvert           avgt  100    167660.485 ±   1993.920  ns/op
KCalConverter.sequentialStreamComplexConvert  avgt  100  12033879.972 ± 129120.560  ns/op
KCalConverter.sequentialStreamConvert         avgt  100    189933.029 ±   2779.487  ns/op

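In this case the parallel stream implementation is about 3 times faster than the sequential implementations (roughly 3.9 ms per run versus 11.3 ms for the for-each loop and 12.0 ms for the sequential stream). The difference between the for-each loop and the sequential stream is small: the stream was about 3% to 6% slower in these runs.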
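Speed aside, the three variants are interchangeable in terms of output: for an ordered source such as a `List`, `collect(Collectors.toList())` keeps the encounter order even on a parallel stream, provided the mapping function has no side effects. The sketch below checks this with a stand-in function; the class name `SameResultExample` and the `square` helper are hypothetical and merely take the place of `convert`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class SameResultExample {

    // Stand-in for convert(): any side-effect-free function works here.
    static long square(int n) {
        return (long) n * n;
    }

    static List<Long> withLoop(List<Integer> input) {
        List<Long> result = new ArrayList<>(input.size());
        for (Integer n : input) {
            result.add(square(n));
        }
        return result;
    }

    static List<Long> withStream(List<Integer> input) {
        return input.stream()
                .map(SameResultExample::square)
                .collect(Collectors.toList());
    }

    static List<Long> withParallelStream(List<Integer> input) {
        return input.parallelStream()
                .map(SameResultExample::square)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // About as many elements as the parsed food list.
        List<Integer> input = IntStream.range(0, 9000).boxed().collect(Collectors.toList());
        List<Long> expected = withLoop(input);
        System.out.println(expected.equals(withStream(input)));         // prints true
        System.out.println(expected.equals(withParallelStream(input))); // prints true
    }
}
```

So choosing between the three is purely a question of readability and speed, not of correctness, as long as the function passed to `map` stays free of shared mutable state.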
My conclusions after this test are to prefer cleaner code that is easier to understand and to always measure when in doubt.