Any reactive system, by definition, wants to be responsive, resilient and elastic, and therefore can’t be anything but message-driven. Any action within such a system is an asynchronous message that needs to be processed. This takes the form of a Runnable: a chunk of data (the message) embedded into a snippet of code that needs to be executed to process the data.
To process the stream of Runnables, at the heart of any reactive system there are one or more execution contexts. An execution context, backed by some sort of worker pool, accepts Runnables into its inbound queue. When one of the context workers becomes available, the Runnable is executed.
In the Akka runtime, a Runnable could be an actor mailbox processing one message, an ExecutionContext handling Futures, or an IO monad materialisation.
It is important to keep the number of workers to a healthy minimum: too many idle workers incur overhead. Assigning them to CPU cores, and removing one worker to assign another, involves a context switch and is even more expensive. Hence, in an optimised low-latency reactive system, a limited number of workers occupy the available cores all the time, without switching, and just keep processing Runnables as they come. They do not compete for the cores, and do not switch cores either.
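A minimal sketch of such a fixed worker pool, using only the Scala and Java standard libraries (the pool size and the toy workload are illustrative assumptions, not a prescription):

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// One worker per CPU core: workers are never added or removed,
// so there is no pool resizing and few context switches.
val cores = Runtime.getRuntime.availableProcessors()
implicit val ec: ExecutionContext =
  ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(cores))

// Each Future below is a Runnable placed into the pool's inbound queue;
// whichever worker becomes free picks the next one up.
val results = Await.result(
  Future.sequence((1 to 100).map(i => Future(i * 2))),
  5.seconds)
println(results.sum) // 10100
```

The same fixed-size idea underlies Akka's dispatchers, although their executors are considerably more sophisticated.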
To illustrate this, the following ‘supermarket’ metaphor can be applied:
- worker threads are the cashiers
- CPU cores are the cash registers
If you are an efficient supermarket manager, you want your customers to spend as little time in a queue as possible, and your cashiers sitting at the cash registers, not swapping them.
You could achieve that by opening many cash registers and putting many cashiers to work, but these are expensive, and when they are idle, money is wasted. So instead, you open just several of them (one cashier at each) as a first guess, and watch the following parameters:
- how much time a customer waited in a queue before being served? (queue time)
- how much time a customer spent being served by cashier? (run time)
- how many workers are actually occupied serving customers? (active workers)
If too many customers are waiting, you may increase the number of open cash registers and cashiers, so that the stream of your Runnables is processed without any waiting. Or you can accept that during peak loads some queue time is tolerable, as long as the peaks pass quickly and the queues dissolve. And by tuning the number of workers down to the necessary minimum, you spare resources for other uses.
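Queue time and run time can both be captured by wrapping each Runnable with an enqueue timestamp. A stdlib-only sketch (the names and the plain adders are illustrative; a real system would feed Prometheus histograms instead):

```scala
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.LongAdder

// Accumulated nanoseconds across all tasks.
val queueTimeNanos = new LongAdder
val runTimeNanos   = new LongAdder

val pool = Executors.newFixedThreadPool(2)

// Wrap a task so that time-in-queue and time-running are both recorded.
def submitMeasured(task: Runnable): Unit = {
  val enqueued = System.nanoTime()
  pool.execute { () =>
    val started = System.nanoTime()
    queueTimeNanos.add(started - enqueued) // waited in the inbound queue
    try task.run()
    finally runTimeNanos.add(System.nanoTime() - started) // actual work
  }
}

(1 to 10).foreach(_ => submitMeasured(() => Thread.sleep(10)))
pool.shutdown()
pool.awaitTermination(5, TimeUnit.SECONDS)
println(s"queue: ${queueTimeNanos.sum / 1000000} ms, run: ${runTimeNanos.sum / 1000000} ms")
```

With 10 tasks and only 2 workers, later tasks accumulate visible queue time, which is exactly the signal a manager watches.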
There can also be situations when, in an otherwise balanced shop, a customer happens to have regular issues with her payment, and always needs to call her husband to top up her card balance. This blocks the cashier — he can’t do anything but wait — until she can pay (as you may have guessed, in a reactive system this is a Runnable that behaves non-reactively, blocking a thread on I/O or a mutex/lock). And if several such customers arrive at the same moment, a dreadful thing can happen: all cash registers are blocked, and the queues start piling up really quickly. We shouldn’t interrupt the execution, as that may cause yet more severe issues, but a note must certainly be made of that customer and her behaviour. By tracking such misbehaving non-reactive customers and correcting them, we can ensure that our cashiers are never blocked, and that relatively few of them serve vast numbers of happy reactive customers.
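A note-taking ‘manager’ of this kind can be sketched with the standard library alone: record when each task started, and have a watcher thread report the stack traces of tasks running longer than a threshold. (The thresholds and names here are illustrative assumptions, not the actual implementation of any library.)

```scala
import java.util.concurrent.{ConcurrentHashMap, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean

val pool = Executors.newFixedThreadPool(2)
// Which thread started which task, and when (System.nanoTime).
val inFlight = new ConcurrentHashMap[Thread, Long]()
val longRunDetected = new AtomicBoolean(false)

def submitWatched(task: Runnable): Unit =
  pool.execute { () =>
    inFlight.put(Thread.currentThread(), System.nanoTime())
    try task.run()
    finally inFlight.remove(Thread.currentThread())
  }

// Watcher: every 50 ms, report threads that have been busy for over 100 ms.
val watcher = Executors.newSingleThreadScheduledExecutor()
watcher.scheduleAtFixedRate(() => {
  val now = System.nanoTime()
  inFlight.forEach { (thread, started) =>
    if (now - started > 100000000L) { // 100 ms in nanoseconds
      longRunDetected.set(true)
      // The stack trace points at the blocking line of code.
      thread.getStackTrace.take(3).foreach(println)
    }
  }
}, 50, 50, TimeUnit.MILLISECONDS)

submitWatched(() => Thread.sleep(300)) // a 'non-reactive' blocking task
Thread.sleep(500)
watcher.shutdown(); pool.shutdown()
println(s"long run detected: ${longRunDetected.get}")
```

The reported stack traces are the ‘notes’ on misbehaving customers: they tell you which line of code blocked the worker.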
In a reactive actor-based system, actors are stateful entities, forming transaction boundaries and guarding state integrity.
To understand the behaviour of the system under load, the following characteristics of an actor (likely not each single actor, but a class of actors, or any other meaningful grouping sharing similar logic) must be monitored:
- how many instances of a given actor (class) are active in memory, and how long do they stay there?
- how often are actors passivated (receive timeout triggered), as opposed to terminated explicitly?
- how much traffic (messages per second) does an actor (class) get?
- how much time does an actor (class) spend on processing a message?
- how many errors and unprocessed messages are there?
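Most of these measurements boil down to counting and timing each message an actor handles. A library-agnostic sketch of such a wrapper around a message handler (the names and structures are illustrative; Akka Sensors wires equivalent measurements into actors for you):

```scala
import java.util.concurrent.atomic.LongAdder
import scala.collection.concurrent.TrieMap

// Per-message-class counters and cumulative processing time.
val messageCount = TrieMap.empty[String, LongAdder]
val processNanos = TrieMap.empty[String, LongAdder]

// Wrap any handler so every message is counted and timed by its class.
def measured[A](handler: A => Unit): A => Unit = { msg =>
  val key = msg.getClass.getSimpleName
  val t0  = System.nanoTime()
  try handler(msg)
  finally {
    messageCount.getOrElseUpdate(key, new LongAdder).increment()
    processNanos.getOrElseUpdate(key, new LongAdder).add(System.nanoTime() - t0)
  }
}

final case class Deposit(amount: Int)
var balance = 0
val receive = measured[Deposit] { d => balance += d.amount }

(1 to 3).foreach(i => receive(Deposit(i)))
println(s"balance=$balance, deposits=${messageCount("Deposit").sum}")
// balance=6, deposits=3
```

Grouping by message class (or actor class) keeps the metric cardinality low, which matters for Prometheus storage.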
In addition, for persistent actors that are recovered from durable storage:
- how many events were replayed to recover an actor (class), and how much time did it take?
- how much time does a persistent actor (class) spend persisting an event?
This is simple, but essential, information that lets engineers identify poor performance and excess resource use, and find the root causes of incidents.
Existing solutions for Akka
Akka itself is free open-source software at its core (Apache license), but to use the excellent Cinnamon telemetry library, a commercial Lightbend subscription is required, which may not be affordable for every team.
As an alternative, there is a metrics library called Kamon, which uses an attached agent and bytecode instrumentation to expose Akka metrics. Kamon is suitable for testing and moderate loads, but by the nature of its instrumentation it adds overhead that may not be acceptable in high-load production environments.
Teams working with Akka in production therefore usually choose to implement their own custom metrics.
Akka Sensors is a new free open-source (MIT) Scala library that instruments the JVM and Akka explicitly, using native Prometheus collectors, with very low overhead, for high-load production use.
The sources are published on Github.
It is a greenfield implementation, not based on either Cinnamon or Kamon.
While the library itself is new as a package, the approaches and techniques it applies are distilled from many years of production experience implementing ad-hoc custom Akka/Prometheus metrics, and from some other open-source projects.
Key measurements are performed on running actors, Akka dispatchers and the Cassandra client. The configuration is inlined into application.conf; for example, instrumenting a dispatcher:

akka.actor.default-dispatcher {
  type = "akka.sensors.dispatch.InstrumentedDispatcherConfigurator"
  executor = "akka.sensors.dispatch.InstrumentedExecutor"

  instrumented-executor {
    delegate = "fork-join-executor"
    measure-runs = true
    watch-long-runs = true
    watch-check-interval = 1s
    watch-too-long-run = 3s
  }
}
An optional runnable watcher, configurable per dispatcher, keeps an eye on runnables, reporting the stack traces of rogue ones that hang too long on non-reactive activities: e.g. waiting for locks, or doing blocking I/O. In the example above, the watcher checks every second for runnables that have been executing for more than 3 seconds, and extracts their stack traces — to let you identify the line of code that waits on I/O or a lock.
The collected metrics are not yet as extensive as Cinnamon’s, most notably lacking automatic instrumentation for cluster/sharding and for remote traffic between cluster nodes.
Akka appeared in 2009. At its core is an implementation of the actor model, as known from Erlang, rewritten in Scala. Actors are stateful entities that communicate with each other asynchronously by passing messages around. Each actor is guaranteed to process just one message at a time, allowing for lock-free mutable state updates.
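The essence of that guarantee can be shown with a toy, stdlib-only actor: a mailbox drained by a single-threaded executor, so the state needs no locks. (This is a deliberately simplified sketch, not Akka’s actual implementation.)

```scala
import java.util.concurrent.{Executors, TimeUnit}

// A toy actor: a single-threaded executor serves as its mailbox.
// Because messages are processed strictly one at a time, the mutable
// state below needs no synchronisation at all.
final class CounterActor {
  private val mailbox = Executors.newSingleThreadExecutor()
  private var count = 0 // lock-free mutable state

  def !(increment: Int): Unit = mailbox.execute(() => count += increment)

  def stopAndGet(): Int = {
    mailbox.shutdown()
    mailbox.awaitTermination(5, TimeUnit.SECONDS)
    count
  }
}

val actor = new CounterActor
// Messages may be sent from many threads; processing stays sequential.
val senders = Executors.newFixedThreadPool(4)
(1 to 1000).foreach(_ => senders.execute(() => actor ! 1))
senders.shutdown()
senders.awaitTermination(5, TimeUnit.SECONDS)

val total = actor.stopAndGet()
println(total) // 1000
```

With a plain `var` and four concurrent senders, an unsynchronised counter would lose updates; the mailbox serialisation is what makes it safe.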
On top of actors keeping their state in memory, there is Akka Persistence, adding robust event sourcing, and Akka Cluster with Sharding, to distribute persistent actors across available cluster nodes. Backed by a scalable database (such as Cassandra) and scalable streaming (such as Kafka), the result is a suitable platform for a nearly infinitely scalable system.
It took a few years to mature into industrial-quality software, and now Akka is successfully used in highly concurrent event-processing systems across a wide variety of industries: from gambling to banking, and from postal logistics to IoT — where each millisecond of latency matters, and data is extremely valuable. Scaling such a system to process 10x the current load is solved by adding hardware, generally without rewriting any code.
Prometheus is a free open-source (Apache) time-series database that is widely used to keep process metrics.
Prometheus collectors for the JVM can be enhanced with any kind of metrics, using very low-overhead concurrent JVM primitives.
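As an illustration of such a primitive, java.util.concurrent.atomic.LongAdder gives practically contention-free counting that a custom collector could read on each scrape. (The Prometheus wiring itself is omitted here; this sketch only shows the hot-path cost, and the metric name is an illustrative assumption.)

```scala
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.LongAdder

// LongAdder stripes the counter across internal cells, so concurrent
// increments on the hot path almost never contend; sum() is only
// computed when the metric is scraped.
val messagesProcessed = new LongAdder

val pool = Executors.newFixedThreadPool(8)
(1 to 8).foreach { _ =>
  pool.execute(() => (1 to 100000).foreach(_ => messagesProcessed.increment()))
}
pool.shutdown()
pool.awaitTermination(5, TimeUnit.SECONDS)

println(messagesProcessed.sum) // 800000
```

This is why explicit instrumentation with such primitives can stay cheap enough for high-load production use, unlike heavyweight bytecode instrumentation.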