I had a discussion with another Java programmer about processing collections with Java 8 streams, and she mentioned that she prefers classical loops because streams are slower. This is the perfect start for a heated debate about which one is better, and the best way to settle it is with facts. In a previous post I parsed a pretty large text file into a list, so I decided to extend the foods parsing code and process this list in three ways: the old-fashioned for-each loop, a sequential stream, and a parallel stream. The list produced by the parser contains almost 9000 elements, and the processing consists of computing the number of calories in a common household measure (such as a tablespoon) starting from the number of calories in 100 g.
Each line from the original file is saved in an instance of
```kotlin
/**
 * It contains values from the ABBREV.txt file
 */
data class Abbrev(val ndbNo: String, val shrtDesc: String, val energKcal: Int) {
    var gmWt1: Double = 0.0
    var gmWtDesc1: String = ""
    var gmWt2: Double = 0.0
    var gmWtDesc2: String = ""
}
```
the result of the transformation is kept in an instance of
```kotlin
/**
 * It contains kilocalories per 100 g, and common household weights
 */
data class AbbrevKcal(val ndbNo: String, val energKcal: Int,
                      val gmWt1Kcal: Double, val gmWt2Kcal: Double)
```
and the transformation is done by
```kotlin
fun convert(abbrev: Abbrev): AbbrevKcal = AbbrevKcal(
        abbrev.ndbNo,
        abbrev.energKcal,
        abbrev.energKcal * abbrev.gmWt1 / 100,
        abbrev.energKcal * abbrev.gmWt2 / 100)
```
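The per-100 g arithmetic above can be sketched standalone. The numbers below (48 kcal per 100 g, a 14.2 g tablespoon) are made-up sample values, and `KcalExample`/`kcalForWeight` are hypothetical names used only for illustration:

```java
// Standalone sketch of the per-100g conversion; sample values are made up.
public class KcalExample {

    // kcal for a household measure weighing `grams`, given kcal per 100 g
    static double kcalForWeight(int kcalPer100g, double grams) {
        return kcalPer100g * grams / 100;
    }

    public static void main(String[] args) {
        // e.g. 48 kcal/100g and a 14.2 g tablespoon -> about 6.8 kcal
        System.out.println(kcalForWeight(48, 14.2));
    }
}
```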
I used Kotlin to implement the needed data classes because in Kotlin they are very easy to define, and it is also very easy to mix Kotlin with Java. I used JMH to create the benchmark; the code is below:
```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class KCalConverter {

    List<Abbrev> abbrevs = new StreamAbbrevParser().parseFile("/sr28abbr/ABBREV.txt");

    @Benchmark
    public void baseline() {
    }

    @Benchmark
    public void forEachConvert(Blackhole blackhole) {
        List<AbbrevKcal> abbrevKcals = new ArrayList<>(abbrevs.size());
        for (Abbrev abbrev : abbrevs) {
            abbrevKcals.add(ConverterKt.convert(abbrev));
        }
        blackhole.consume(abbrevKcals);
    }

    @Benchmark
    public void sequentialStreamConvert(Blackhole blackhole) {
        List<AbbrevKcal> abbrevKcals = abbrevs.stream()
                .map(ConverterKt::convert)
                .collect(Collectors.toList());
        blackhole.consume(abbrevKcals);
    }

    @Benchmark
    public void parallelStreamConvert(Blackhole blackhole) {
        List<AbbrevKcal> abbrevKcals = abbrevs.parallelStream()
                .map(ConverterKt::convert)
                .collect(Collectors.toList());
        blackhole.consume(abbrevKcals);
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(KCalConverter.class.getSimpleName())
                .warmupIterations(25)
                .measurementIterations(100)
                .forks(1)
                .build();
        new Runner(opt).run();
    }
}
```
I was ready to see impressive results for parallel processing, but I was disappointed; the results looked like this:
```
Benchmark                              Mode  Cnt       Score      Error  Units
KCalConverter.baseline                 avgt  100       0.324 ±    0.001  ns/op
KCalConverter.forEachConvert           avgt  100  184953.601 ±  988.417  ns/op
KCalConverter.parallelStreamConvert    avgt  100  167660.485 ± 1993.920  ns/op
KCalConverter.sequentialStreamConvert  avgt  100  189933.029 ± 2779.487  ns/op
```
After a few minutes of thought I realised that the actual processing is too simple, and it runs fast even sequentially. So I artificially “improved” it to use BigDecimal:
```kotlin
import java.math.BigDecimal

fun complexConvert(abbrev: Abbrev): AbbrevKcal = AbbrevKcal(
        abbrev.ndbNo,
        abbrev.energKcal,
        BigDecimal(abbrev.energKcal).multiply(BigDecimal(abbrev.gmWt1)).divide(BigDecimal(100)).toDouble(),
        BigDecimal(abbrev.energKcal).multiply(BigDecimal(abbrev.gmWt2)).divide(BigDecimal(100)).toDouble())
```
```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class KCalConverter {

    List<Abbrev> abbrevs = new StreamAbbrevParser().parseFile("/sr28abbr/ABBREV.txt");

    @Benchmark
    public void baseline() {
    }

    @Benchmark
    public void forEachConvert(Blackhole blackhole) {
        List<AbbrevKcal> abbrevKcals = new ArrayList<>(abbrevs.size());
        for (Abbrev abbrev : abbrevs) {
            abbrevKcals.add(ConverterKt.convert(abbrev));
        }
        blackhole.consume(abbrevKcals);
    }

    @Benchmark
    public void forEachComplexConvert(Blackhole blackhole) {
        List<AbbrevKcal> abbrevKcals = new ArrayList<>(abbrevs.size());
        for (Abbrev abbrev : abbrevs) {
            abbrevKcals.add(ConverterKt.complexConvert(abbrev));
        }
        blackhole.consume(abbrevKcals);
    }

    @Benchmark
    public void sequentialStreamConvert(Blackhole blackhole) {
        List<AbbrevKcal> abbrevKcals = abbrevs.stream()
                .map(ConverterKt::convert)
                .collect(Collectors.toList());
        blackhole.consume(abbrevKcals);
    }

    @Benchmark
    public void sequentialStreamComplexConvert(Blackhole blackhole) {
        List<AbbrevKcal> abbrevKcals = abbrevs.stream()
                .map(ConverterKt::complexConvert)
                .collect(Collectors.toList());
        blackhole.consume(abbrevKcals);
    }

    @Benchmark
    public void parallelStreamConvert(Blackhole blackhole) {
        List<AbbrevKcal> abbrevKcals = abbrevs.parallelStream()
                .map(ConverterKt::convert)
                .collect(Collectors.toList());
        blackhole.consume(abbrevKcals);
    }

    @Benchmark
    public void parallelStreamComplexConvert(Blackhole blackhole) {
        List<AbbrevKcal> abbrevKcals = abbrevs.parallelStream()
                .map(ConverterKt::complexConvert)
                .collect(Collectors.toList());
        blackhole.consume(abbrevKcals);
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(KCalConverter.class.getSimpleName())
                .warmupIterations(25)
                .measurementIterations(100)
                .forks(1)
                .build();
        new Runner(opt).run();
    }
}
```
and the results are:
```
Benchmark                                     Mode  Cnt         Score        Error  Units
KCalConverter.baseline                        avgt  100         0.324 ±      0.001  ns/op
KCalConverter.forEachComplexConvert           avgt  100  11306389.001 ±  61637.238  ns/op
KCalConverter.forEachConvert                  avgt  100    184953.601 ±    988.417  ns/op
KCalConverter.parallelStreamComplexConvert    avgt  100   3902042.888 ±  60230.745  ns/op
KCalConverter.parallelStreamConvert           avgt  100    167660.485 ±   1993.920  ns/op
KCalConverter.sequentialStreamComplexConvert  avgt  100  12033879.972 ± 129120.560  ns/op
KCalConverter.sequentialStreamConvert         avgt  100    189933.029 ±   2779.487  ns/op
```
In this case the implementation with a parallel stream is about 3 times faster than the sequential implementations. Also, there is no significant difference between the for-each loop and sequential stream processing.
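A roughly 3x speedup is plausible because parallel streams run on the common ForkJoinPool, whose default parallelism is the number of available processors minus one (the calling thread pitches in as well). A quick way to check what a given machine offers:

```java
import java.util.concurrent.ForkJoinPool;

// Prints the processor count and the common pool's parallelism, which
// together bound the best-case speedup of a parallel stream.
public class ParallelismCheck {
    public static void main(String[] args) {
        System.out.println("processors: " + Runtime.getRuntime().availableProcessors());
        System.out.println("common pool parallelism: " + ForkJoinPool.commonPool().getParallelism());
    }
}
```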
My conclusions after this test: prefer the cleaner code that is easier to understand, and always measure when in doubt.