LinkedIn recently described how it detects and remediates overload in its Java-based microservices. Its solution, Hodor, is adaptive and works out of the box with no configuration, as reported by InfoQ.
According to a LinkedIn blog post, the company has developed a standard framework for Java-based services called Holistic Overload Detection and Overload Remediation, or “Hodor.” It is designed to detect service overload arising from a variety of root causes and to remediate it automatically: it drops just enough traffic for the service to recover, then maintains an optimal traffic level so the service does not slip back into overload.
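The behavior described above (shed just enough traffic, then hold an optimal level) can be illustrated with a minimal Java sketch. It assumes an additive-increase/multiplicative-decrease (AIMD) concurrency limit, a common load-shedding technique swapped in here for illustration; it is not Hodor's documented algorithm. The class `AdaptiveLimiter`, its `onSignal` hook, and the halve/plus-one constants are hypothetical, and the overload signal is assumed to come from a separate detector.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class Main {
    // Hypothetical AIMD concurrency limiter; NOT Hodor's actual implementation.
    static final class AdaptiveLimiter {
        private final double minLimit;
        private final double maxLimit;
        private double limit; // guarded by "this"
        private final AtomicInteger inFlight = new AtomicInteger();

        AdaptiveLimiter(double initialLimit, double minLimit, double maxLimit) {
            this.limit = initialLimit;
            this.minLimit = minLimit;
            this.maxLimit = maxLimit;
        }

        synchronized int currentLimit() {
            return (int) limit;
        }

        // Admit a request only while in-flight work is below the limit;
        // returning false means the caller should shed (reject) the request.
        boolean tryAcquire() {
            while (true) {
                int current = inFlight.get();
                if (current >= currentLimit()) {
                    return false;
                }
                if (inFlight.compareAndSet(current, current + 1)) {
                    return true;
                }
            }
        }

        // Call when an admitted request finishes.
        void release() {
            inFlight.decrementAndGet();
        }

        // Feed periodic results from an overload detector (assumed to exist elsewhere):
        // halve the limit while overloaded, probe upward by one while healthy.
        synchronized void onSignal(boolean overloaded) {
            if (overloaded) {
                limit = Math.max(minLimit, limit * 0.5);
            } else {
                limit = Math.min(maxLimit, limit + 1.0);
            }
        }
    }

    public static void main(String[] args) {
        AdaptiveLimiter limiter = new AdaptiveLimiter(100, 10, 1000);
        limiter.onSignal(true);  // overload detected: 100 -> 50
        limiter.onSignal(true);  // still overloaded: 50 -> 25
        limiter.onSignal(false); // recovered: 25 -> 26
        System.out.println("limit=" + limiter.currentLimit()); // prints "limit=26"
    }
}
```

The multiplicative cut drops traffic quickly once overload is detected, while the slow additive increase lets the service creep back toward its capacity without immediately re-entering overload.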
Hodor is designed to handle a wide range of overload types. The most obvious involve physical resource limits, such as CPU and memory exhaustion or I/O limits on network and disk access. There are also virtual resource limits, such as execution threads, pooled database connections, or semaphore permits.
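To show how a virtual resource can run out independently of physical ones, here is a hypothetical Java sketch that models a pooled resource (for example, database connections) with `java.util.concurrent.Semaphore`. The `BoundedResource` class, the pool size of 2, and the 10 ms timeout are illustrative assumptions, not part of Hodor.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class Main {
    // Hypothetical wrapper: a fixed number of permits stands in for a
    // virtual resource such as pooled DB connections or worker threads.
    static final class BoundedResource {
        private final Semaphore permits;

        BoundedResource(int size) {
            this.permits = new Semaphore(size);
        }

        // Try to take a permit; failing within the timeout suggests the
        // resource is saturated, even if CPU and memory look healthy.
        boolean tryUse(long timeoutMillis) {
            try {
                return permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }

        void done() {
            permits.release();
        }

        int available() {
            return permits.availablePermits();
        }
    }

    public static void main(String[] args) {
        BoundedResource dbPool = new BoundedResource(2);
        System.out.println(dbPool.tryUse(10)); // true
        System.out.println(dbPool.tryUse(10)); // true
        System.out.println(dbPool.tryUse(10)); // false: pool exhausted
        dbPool.done();
        System.out.println(dbPool.available()); // 1
    }
}
```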
These limits can be exceeded when traffic to the service increases. As the LinkedIn post notes, however, they can also be reached when downstream latencies rise: slower downstream calls increase the number of requests the local service is handling concurrently, even though the incoming request rate has not changed.
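Little's law, a standard queueing result that the post is not cited here as using, makes this latency effect concrete: the average number of requests in flight, L, equals the arrival rate λ multiplied by the average time W each request spends in the service (including time spent waiting on downstream calls). The rate and latencies below are hypothetical.

```latex
\begin{aligned}
L &= \lambda W \\
\lambda = 1000~\text{req/s},\; W = 0.05~\text{s} &\;\Rightarrow\; L = 1000 \times 0.05 = 50 \\
\lambda = 1000~\text{req/s},\; W = 0.20~\text{s} &\;\Rightarrow\; L = 1000 \times 0.20 = 200
\end{aligned}
```

With the same incoming rate, quadrupling latency quadruples concurrency, so a hypothetical pool of 100 worker threads that comfortably covered the first case would be exhausted in the second.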
The post also covers an overview of the framework, how Hodor detects CPU overload, how it sheds requests when a service is overloaded, and how the framework was tested and rolled out, among other topics.