Performance comparison between different kinds of string concatenation/formatting in Java/Scala
String concatenation is a basic building block in every modern programming language. Many different projects, especially in Web, produce a lot of strings. And it’s interesting, how this problem is solved in different languages. In our case, in Java and Scala (because it’s much easier to compare, and these I use day-to-day).
There are a lot of posts on Internet about this topic. But still, there are something interesting to say about it (I hope you won’t be bored). Yes, this is about performance and microbenchmarking, I know.
Basically, this post is about how to make robust string concatenation (with macros!) and performance comparison. You may skip this big part right to Graphs and Conclusion.
What is string concatenation?
In JVM String object is immutable. If you need to create a new String with a content of two other strings, you need to create a third one. In Java you may concatenate not only string but any objects — Java compiler will convert an object to its string representation automatically. Scala compiler also does it.
What will compiler do, when we write such code in Java:
This code will be converted to this code:
Nothing new, it’s well-known thing. But, it’s important to understand, what is happening there:
- new StringBuilder creates a class with a char array of 16 elements.
- append method converts an argument to chars and stores into an internal array. If an array is not big enough, it will create a bigger array.
- toString creates a String object, but char array is not shared between StringBuilder and this new String object (because it’s mutable in StringBuilder).
So, in order to concatenate 2 objects, we need to create an intermediate object, big enough to store the string representation of our 2 objects.
How to improve default concatenation
We have 2 heavy operations from the list above: allocation of a buffer in StringBuilder (and reallocations!) and a creation of String (making copy of StringBuilder’s buffer).
Let’s put some efforts to deal with it. Because I had a lot of experience in .NET, I know, how string concatenation is implemented there.
What’s about C#.NET?
In C# the same problem solved in a slightly different way. Actually, many problems in .NET solved slightly differently.
This concatenation will be converted by the compiler to call to static method Concat (yes, in .NET they use CamelCase instead of camelCase in Java, yet another little difference):
And inside Concat we may see, how it works:
What is interesting here is resulting string length precomputation. They create a char buffer of resulting size, fill it and use in String object without copying. Obviously, this should work much faster (and it’s).
Let’s try to do something similar in Scala.
Optimized string concatenation
So, what do we need is precompute a StringBuilder length to avoid any reallocations.
This is a version for 6 arguments (we may generate as many as we want). Precalculation of a StringBuilder buffer size should give us some performance boost (we will see it later).
Let’s take a look at StringBuilder.toString implementation:
And in String’s constructor we see:
Aha! Arrays.copyOfRange creates a new char and copy data there. We know for sure, that char array will be filled up and we can use it directly (we’re good ppl, we won’t modify it after string creation). In String class, there is a constructor, that don’t do copy, but… it’s package-private. Meaning, we can’t use it. Or can we?
We may try to use fast reflection (MethodHandle) to access to private constructor! Looks pretty easy:
Spoiler: it doesn’t work! Details below.
So, that’s it, no more optimizations? Of course not! What else to we do in concatenation? The toString method calls for each argument. This also could be optimized. How?
Imagine, you pass an integer there, which is primitive. In order to pass it, you need to box it and then we call Integer.toString method. What’s wrong with toString? It allocates new memory. StringBuilder has an optimization for appending primitive values, it calculates how many chars needed to print an integer (in order to reallocate internal buffer, if necessary) and then uses internal method Integer.getChars, passing internal char array there. So, there are no additional allocations for printing integer.
Having such generic method, we can’t utilize StringBuilder power to not do unnecessary allocations. But, we’re in Scala world, we may use macroses! The very good explanation is in this StackOverflow topic.
Scala String Interpolation
But before I move to macros implementation of string concatenation, let’s take a look at Scala’s way of string formatting which is called String Interpolation:
This is an equivalent of Java of Scala simple string concatenation, but looks much better:
Scala compiler transforms such interpolation to this code:
Oh no. Again, more allocations? Yes. At least 2: one for an array of string constants and another for an array of arguments. And also, inside method s, there is a processing of string constants (yes, on each call). Doesn’t look as highly optimal.
Macros to rule them all
We may create a macros, that will generate another code for string interpolation. We still can use this neat syntax, but do whatever we want to concatenate these string constants and arguments.
For example, we can precalculate length of StringBuilder, and then append arguments there and be sure, that there won’t be any other allocation.
The code of macros is relatively small (around 100 lines), but it’s hard to demonstrate it well. I will explain what it does. This code:
Will be transformed to this code:
Important parts of generated code:
- sfi stands for “super fast interpolation” :)
- Precalculated length constant — 12 — consists of 3 characters for strings “a”, “b” and “c” and 9 characters allocated for int (value1). It may be not enough (for bigger int), but this is a simplification for now.
- We don’t box integer and don’t call toString, but using append method overload for int. It should be faster.
Obviously, macros could be much richer — we can calculate exactly length of integer (and all primitive values) to not over/under allocate buffer. We may determine whether it’s safe to not create local variables (if it’s already local variable or field) to reduce bytecode size.
But, for testing purposes, I think it’s good enough.
Let’s microbenchmark it!
Basically, I will test this case:
With various input arguments, resulting string will be of 7, 17, 29, 75, 212, 1004 and 1006 characters long, which should cover a lot of use cases.
What to test
Besides various input arguments there will be benchmarks for different kinds of concatenation/formatting (all listed in this test):
- Java concatenation, as in code fragment above;
- String.format method;
- slf4j message formatter (just because it’s a popular logging framework);
- Scala concatenation;
- s string interpolator;
- f string interpolator;
- raw string interpolator;
- optimized concatenation 1 (with length precalculated);
- optimized concatenation 2 (without copying of char array);
- optimized concatenation via string interpolation + macros (soInterpolator);
- super fast interpolation via macros (sfiInterpolator).
Charts and tables
As you can see, optimized concatenation works better on strings longer than 16 characters. String interpolations, based on macros (so and sfi), are the best. Scala’s interpolations (s, f and raw) are slow, its performance lower than regular string concatenation and approximately the same as slf4j.
The more convenient chart is here.
Data table for a resulting string length 212:
Benchmark Score Error Units javaConcat 165.320 ± 0.578 ns/op stringFormat 1364.206 ± 19.899 ns/op messageFormat 1370.831 ± 11.360 ns/op scalaConcat 169.005 ± 1.481 ns/op concatOptimized1 94.336 ± 1.419 ns/op concatOptimized2 131.683 ± 4.288 ns/op concatOptimizedMacros 68.702 ± 3.256 ns/op slf4j 275.377 ± 10.165 ns/op sInterpolator 268.072 ± 12.385 ns/op fInterpolator 1375.487 ± 10.255 ns/op rawInterpolator 271.416 ± 1.797 ns/op sfiInterpolator 58.674 ± 0.562 ns/op
s interpolator is slow
For me it was unexpected. I thought, such generic thing should work as fast as possible, but…
Now, after my research, I understand, why it works so poorly. But I don’t understand, why it wasn’t implemented on compiler level, or at least with macros.
Additionally, I’ve performed couple more tests:
And the results:
Benchmark Score Error Units EmptyStringBenchmark.baseline 2.899 ± 0.011 ns/op EmptyStringBenchmark.sInterpolator 36.181 ± 0.083 ns/op <--- EmptyStringBenchmark.sfiInterpolator 2.895 ± 0.009 ns/op ConstStringBenchmark.baseline 2.897 ± 0.008 ns/op ConstStringBenchmark.sInterpolator 48.307 ± 0.112 ns/op <--- ConstStringBenchmark.sfiInterpolator 2.892 ± 0.006 ns/op
Besides allocations of arrays, sInterpolator also processes constant strings (non-arguments) to replace special characters. So, for constant string s“a\tb” it will be even worse.
Macros do that processing during compilation time.
Other Scala interpolators
According to documentation, fInterpolator is based on String.format. And its performance actually correlates.
rawInterpolator don’t do any string processing, that’s why it’s slightly better than sInterpolator.
MethodHandle trick is not working
At the beginning, I’ve trusted to StackOverflow response about the performance of MethodHandle. But the only thing I noticed is a performance degradation between concatOptimized1 and concatOptimized2 (StringBuilder.toString won over new String(shared_char_array)).
So, I’ve tested MethodHandle performance by myself, and I got “good” results:
Benchmark Score Error Units baseline 2.907 ± 0.021 ns/op fastSb 8.213 ± 0.088 ns/op fastString 5.129 ± 0.568 ns/op newString 246.314 ± 3.909 ns/op sbToString 253.588 ± 7.551 ns/op
fastString is an invocation of private constructor of String and newString is an invocation of public constructor of String, that copies input char array.
But when I tested it on code like:
I received performance degradation. Apparently, JVM doesn’t optimized it in this case. Sadly.
But there is a lesson learned: don’t trust in microbenchmarks — verify!
Scala’s StringBuilder is slower
I even wrote a small post about it. Despite the fact, that it’s a wrapper of java.lang.StringBuilder, it’s significantly slower.
Btw, inside sInterpolator Java’s StringBuilder is used. Apparently, they knew it! :)
“They” know that java.lang.StringBuilder is better
Scala concatenation (+ operator) works at the same performance as Java concatenation since 2.12.
And also sInterpolator uses java.lang.StringBuilder inside.
Better performance as I expected for sfiInterpolator
Inside sfiInterpolator implementation, I’ve decided to make it simple, and I am not calculating the precise length for integer and just preallocate 9 characters. And I’ve made a single test with Int.MAX_VALUE, which requires 10 characters. I expected some significant performance degradation since we need to reallocate buffer from 1004 to 2008 chars (yes, I actually verified, that this reallocation happened). But, according to benchmarks, there wasn’t any degradation. Which is really strange.
Performance optimization is always a fun journey (at least for me): you read code, reason about it, try to write an alternative solution, measure, analyze. But, it’s not fun to realize, that such basic things aren’t optimized as it could be. I don’t know the motivation (maybe just lack of time), but I’m disappointed a bit.
Anyway, Scala has a powerful tool (macros), and it could be fixed.
- Ubuntu 16.04
- Linux Kernel 4.4.0–51-generic
- JDK 1.8.0_91
- Scala Library 2.12
- Hardware is Intel® Core™ i7–5600U CPU @ 2.60GHz × 4 (2 core + 2 HT) with 16 GB RAM
As always, all code is on GitHub. Graphs and tables on GitHub.io. Comments, suggestions and blaming are welcome.
Originally posted on Medium.