2. Kafka-Based Event Bus

Status

Accepted

Context

The draft OEP-52: Event Bus Architecture explains how the Open edX platform would benefit from an event bus and records some additional decisions about it. One of those decisions is to make the event bus technology pluggable through an abstraction layer.

This still requires selecting a specific technology for the first implementation.
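As a rough illustration of what "pluggable through an abstraction layer" could look like, the sketch below defines a small producer interface with a swappable backend. Every name here (`EventBusProducer`, `InMemoryProducer`, `BACKENDS`, `get_producer`) is hypothetical and not taken from OEP-52 or any Open edX library; the in-memory backend only stands in for a real one such as a Kafka-backed class.

```python
from abc import ABC, abstractmethod


class EventBusProducer(ABC):
    """Hypothetical interface that platform code would depend on."""

    @abstractmethod
    def send(self, topic, key, event):
        """Publish one event to the named topic."""


class InMemoryProducer(EventBusProducer):
    """Illustrative backend; a Kafka-backed class would implement the same method."""

    def __init__(self):
        self.sent = []

    def send(self, topic, key, event):
        # Record the event instead of sending it to a broker.
        self.sent.append((topic, key, event))


# Hypothetical registry; a deployment might instead resolve the backend from a setting.
BACKENDS = {"in_memory": InMemoryProducer}


def get_producer(backend_name):
    """Return a producer instance for the configured backend name."""
    try:
        backend_class = BACKENDS[backend_name]
    except KeyError:
        raise ValueError(f"Unknown event bus backend: {backend_name}") from None
    return backend_class()
```

Calling code would only use `get_producer(...)` and `send(...)`, so replacing the backend would not require changes at the call sites.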

Decision

The initial implementation of an event bus for the Open edX platform will use Kafka. This implementation will be used by edx.org and will be available to the Open edX community.

This decision does not preclude the introduction of alternative event bus implementations based on other technologies in the future.

Why Kafka?

Kafka is a distributed streaming platform. Its design maps nicely to the pub/sub pattern. However, some features that are native to a message broker are not built in.

Kafka has been around for a long time. Thoughtworks’s technology radar introduced Kafka as “Assess” in 2015 and moved it to “Trial” in 2016. It never moved up to “Adopt” and never moved down to “Hold”. Read Thoughtworks’s Kafka decoder page to learn more about its benefits and trade-offs, and how it is used.

More recently, Thoughtworks’s technology radar introduced Apache Pulsar as “Assess” in 2020 and introduced Kafka API without Kafka in 2021. This demonstrates both the status of the Kafka API as a de facto standard and Thoughtworks’s hope to find a less complex alternative.

We believe Apache Kafka is still the right option due to its maturity, documentation, support and community.

Kafka Highlights

Pros

  • Battle-tested, widely adopted, big community, lots of documentation and answers.

  • Enables event replay-ability.
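To make replay-ability concrete, here is a minimal in-memory sketch of the idea: events stay in an append-only log, and each consumer tracks its own offset, so rewinding the offset re-reads earlier events. `TopicLog` and `Consumer` are hypothetical toy classes that model the concept only; they are not the Kafka client API.

```python
class TopicLog:
    """Toy append-only log standing in for a single topic partition."""

    def __init__(self):
        self._events = []

    def append(self, event):
        """Store the event and return its offset in the log."""
        self._events.append(event)
        return len(self._events) - 1

    def read_from(self, offset):
        """Return all events at or after the given offset."""
        return list(self._events[offset:])


class Consumer:
    """Toy consumer that remembers how far it has read."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self):
        """Return unread events and advance the offset past them."""
        events = self.log.read_from(self.offset)
        self.offset += len(events)
        return events

    def seek(self, offset):
        """Move the read position; seeking backwards replays old events."""
        self.offset = offset


log = TopicLog()
for name in ["enrolled", "graded", "certified"]:
    log.append(name)

consumer = Consumer(log)
first_pass = consumer.poll()   # reads all three events
nothing_new = consumer.poll()  # empty: the consumer is caught up
consumer.seek(0)               # rewind to the start of the log
replayed = consumer.poll()     # the same three events again
```

Reading does not remove events from the log, which is what allows a consumer that had a bug, or a newly added consumer, to process past events again.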

Cons

  • Complex to manage, including likely manual scaling.

  • Simple consumers require additional code for some messaging features.
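The source does not list which messaging features need extra consumer code. As one assumed example, the sketch below shows retry with a dead-letter list written by hand in the consumer, the kind of behavior some brokers provide out of the box. `consume_with_retries` and its parameters are hypothetical names used only for illustration.

```python
def consume_with_retries(events, handler, max_attempts=3):
    """Process events in order and return those that failed every attempt.

    This is hand-written consumer logic (an assumed example), not a Kafka feature.
    """
    dead_letters = []
    for event in events:
        for attempt in range(1, max_attempts + 1):
            try:
                handler(event)
                break  # handled successfully; move on to the next event
            except Exception:
                if attempt == max_attempts:
                    # Give up on this event and set it aside for later inspection.
                    dead_letters.append(event)
    return dead_letters
```

A real consumer would also have to decide where dead-lettered events are stored and how offsets are committed around failures, which adds to the code that each consuming service or a shared library must carry.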

Consequences

  • Operators will need to deploy and manage the selected infrastructure, which will likely be complex. If Apache Kafka is selected, a set of auxiliary parts will likely be needed to provide all of the functionality required for our message bus. However, third-party hosting is also available (see separate decision).

  • Most of the consequences of an event bus should relate to OEP-52: Event Bus Architecture more generally, and hopefully will not be Kafka-specific.

Rejected Alternatives

Apache Pulsar

Although rejected for the initial edx.org implementation, Apache Pulsar remains an option for those looking for an alternative to Kafka.

Pros

  • Ease of scalability (built-in, according to docs).

  • Good data retention capabilities.

  • Additional built-in pub/sub features (according to docs).

Cons

  • Requires third-party hosting or a larger upfront investment if self-hosted (Kubernetes).

  • Less mature (but growing) community, little documentation, and few answers.

  • Python built-in schema management is buggy and hard to work with for complex use cases.

Note: There is an interesting, though Kafka/Confluent-biased, article exploring comparisons and myths of Kafka vs Pulsar.

Redis

Pros

  • Already part of the Open edX platform.

Cons

  • Can lose acknowledged data, even if RAM is backed up with an append-only file (AOF).

  • Requires homegrown schema management.

RabbitMQ

Pros

  • Built-in message broker capabilities like routing, filtering, and fault handling.

Cons

  • Not built for message retention or message ordering.

AWS SNS/SQS

Pros

  • Simpler hosting for those self-hosting in AWS.

Cons

  • Cannot be shared as an open source solution.

  • Events are not replayable.

Additional References