Why we need automatic back pressure
At One2Team, we are developing a processing engine on Storm to deal with event related computations. We know that the events can be sparse or abundant depending on the load, so our system has to play nicely with bursts. If our processing units implemented in the bolts are too busy, the spout should slow down to avoid unnecessary fails and resend, this is the backpressure.
At first, we were running on a 0.10 Storm version and we though back pressure was already a built-in magical thing like TCP congestion control. We didn’t worry about it until we first ran tests with more data than the topology could handle at once. Unfortunately the magic didn’t show up, the spout went all out, and we ended up with a lot of failed tuples.
After a bit of research, we were very pleased to see that the fresh 1.0.0 stable release came with an automatic back pressure system. See the description on the release note webpage. Prior to that version, the only solution available to throttle the spouts was to set the topology.max.spout.pending property, but that solution seemed difficult to tune as it was dependent on the number of bolts, and would probably not work for concurrent spouts with shared bolts. The first thing we did was to happily upgrade our version of Storm, run the topology again, and wait for the magic to kick in. The results we got were pretty similar to the first ones, that’s when we knew we had to investigate on how to tune Storm’s back pressure system to fit it to our needs.
How the back pressure system works
The first step is to get a grasp of the mechanism that enables that back pressure feature to work. It relies on the message buffers to determine when the bolts are getting too much load. A bolt is run by an executor thread which has 2 message buffers to distribute and send messages to the application code:
- executor.receive.buffer.size (1024)
→ How many messages max we can have pending for a bolt
- executor.send.buffer.size (1024)
→ How many messages max can be buffered for emitting
To better understand Storm internal message buffers and how messages go around, I strongly recommend to read that blog article, it goes into details on the different kinds of buffers.
The back pressure system relies on how full the receive buffer size is for a bolt. This is why there is a notion of watermarks. The high and low watermarks define how full or how empty the buffer must be to throttle the spout or to restart it:
- disruptor.highwatermark (default 0.9)
→ For 0.9, send the « full » signal, and throttle the spout when the receive buffer of the bolt is 90% full
- disruptor.lowwatermark (default 0.4)
→ For 0.4, send the « not full » signal, and start the spout again when the receive buffer of the bolt drops below 40% of capacity
The following image was taken from the ticket STORM-886 and describes pretty well the mechanics of the back pressure.
How we tune it
The reason why the back pressure didn’t work out of the box with the default parameters is that we have a bolt that has a pretty long processing time, and usually takes 0.1 second to process a single message. During a peak of events, the spout was fast enough to fill the buffer of these slow bolts. When we kept the default size of 1024 messages, a bolt had to process more than 1000 messages in 0.1s each, which would add up to 100 seconds before the last message gets processed. That didn’t work well with the timeout of 30 seconds for a tuple in the topology, and it caused many messages to fail.
The main parameter we had to tune was the buffer size. Based on the assumptions that we don’t want to have too many fails, and that our topology is not latency sensitive, we agreed on the limit that a tuple shouldn’t wait more than 10 seconds in that executor receive buffer. This means we don’t want more than 100 messages in the buffer, then we can go for a buffer size of 64 or 128. As a rule of thumb, we align the value of the topology.executor.send.buffer.size with the one of the topology.executor.receive.buffer.size. For us, tuning the buffer to the adapted size was sufficient to get the backpressure to work. It throttles the spout when the bolts can’t keep up the pace.
We barely touched the watermark values, these seem to make sense only if you have specific considerations like:
- When the spout is throttling at 0.9, it’s too late, some of the tuples are still filling the buffers, lets reduce the high watermark to 0.8
- When the spout is throttled, and the messages are dropping under 0.4, the spout has some latency to fetch data and build new messages, that causes some bolts to be idle for a small moment, lets increase the low watermark to 0.5
The automatic back pressure in Apache Storm is a pretty nice feature that enables you to dynamically adapt the spout flow to your topology’s bottleneck. It requires a minimum of tuning of the buffer sizes if your topology doesn’t have default characteristics, and you might want to investigate further on the watermarks for finer grain tuning.