Performance Of Mutable HashSet
In June 2013 Softage find out that mutable Hashset in Scala language has not satisfiable performance. We decided to start a research project to find out how it can be optimized.
First part of research was to analyze current implementation of Scala mutable HashSet and other possible implementations. During that part we were looking into Scala collection library implementation. We found that current solution is using strategy called open addressing. When Java is using separate chaining.
Second part was creation of benchmark to compare performance and memory. Was checked several frameworks like JMH, Caliper and ScalaMeter, and after analyzing the pros and cons for this task was selected JMH. On base of that framework was created several benchmarks: for most regular data (Strings and Integers) and for boundary data with a lot of collisions.
And last part was creating simple separate chaining implementation with List in buckets and implementation with more composite buckets to compare with current Scala HashSet.
As a main result we find out that in most of cases current solution is slowest one. For regular data simple separate chaining solution is fastest, but with composite buckets we can have better performance for some boundary data. So one of the best solution is in Java 8 when we have simple buckets before some threshold and composite bucket after some amount of collisions.
Tools & Technologies
Scala, Java, JMH, Caliper, ScalaMeter