Analytics

5. Analytics#

The edX LMS and Studio are instrumented to enable tracking of metrics and events of interest. These data can be used for educational research, decision support, and operational monitoring.

The primary mechanism for tracking events is the Event Tracking API. It should be used for the vast majority of cases.

5.1. Event Tracking#

The event-tracking library aims to provide a simple API for tracking point- in-time events. The event-tracking documentation summarizes the features and primary use cases of the library as well as the current and future design intent.

5.1.1. Emitting Events#

Emitting from server-side python code:

from eventtracking import tracker
tracker.emit('some.event.name', {'foo': 'bar'})

Emitting from client-side JavaScript:

Logger.log 'some.event.name', 'foo': 'bar'

5.1.2. Request Context Middleware#

The platform includes a middleware class that enriches all events emitted during the processing of a given request with details about the request that greatly simplify downstream processing. This is called the TrackMiddleware and can be found in edx-platform/common/djangoapps/track/middleware.py.

5.1.2.1. Naming Events#

Event names are intended to be formatted as . separated strings and help processing of related events. The structure is intended to be namespace.object.action. The namespace is intended to be a . separated string that helps identify the source of events and prevent events with similar or identical objects and actions from being confused with one another. The object is intended to be a noun describing the object that was acted on. The action is intended to be a past tense verb describing what happened.

Examples:

edx.course.enrollment.activated

Namespace: edx

Object: course.enrollment

Action: activated

5.1.2.2. Choosing Events to Emit#

Consider emitting events to capture user intent. These will typically be emitted on the client side when a user interacts with the user interface in some way.

Consider also emitting events when models change. Most models are not append- only and it is frequently the case that an analyst would want to see the value of a particular field at a particular moment in time. Given that many fields are overwritten, that information is lost unless an event is emitted when the model is changed.

Consider the source of the event. Events sent from the client can be much more easily lost, manipulated, or blocked by browser extensions. Business critical events such as enrollments or grading should always be emitted from the server. User interface events, such as video interaction, have less inherent value to an adversary and are fine to trust to the client.

5.1.2.3. Sensitive Information#

By default, event information is written to an unencrypted file on disk. Therefore, do not include clear text passwords, credit card numbers, or other similarly sensitive information.

5.1.2.4. Size#

A cursory effort to regulate the size of the event is appreciated. If an event is too large, it may be omitted from the event stream. However, do not sacrifice the clarity of an event in an attempt to minimize size. It is much more important that the event is clear.

5.1.2.5. Debugging Events#

On devstack, emitted events are stored in the /edx/var/log/tracking.log log file. On Tutor dev or local the file is stored on the host computer at $(tutor config printroot)/data/lms/logs/tracking.log and $(tutor config printroot)/data/cms/logs/tracking.log

This file can be useful for validation and debugging.

5.1.2.6. Testing Event Emission#

It is important to test instrumentation code since regressions can have a significant negative impact on downstream consumers of the data.

Each test can make stronger or weaker assertions about the structure and content of various fields in the event. Tests can make assertions about particular fields in the nested hierarchical structures that are events. This allows them to continue to pass even if a new global field is added to all events (for example).

Failing tests are a form of communication with future developers. A test failure is a way for you to tell those future developers that they have changed something that you did not expect to change, and that there are implications that they should think about carefully before making the change. For this reason, limit the scope of your test to details you expect to remain constant. Specifically, for eventing, this means only asserting on the presence and correctness of fields your code is adding, not the precise set of fields that happen to be present in all events today.

In general, it is acceptable for events to contain “unexpected” fields. If you add a field, most JSON parsers will accept this new field and allow the downstream code to process the event. Since that downstream code does not know about the new field it will simply be ignored.

For this reason, most of our tests do not actually make assertions about unexpected fields appearing in the events, instead they focus on the fields that they do expect to be present and make assertions about the values of these fields. This enables us to add global context without having to update hundreds (or even thousands) of tests that were making assertions about the exact set of fields present in the event. Instead, we prefer to only have a small number of tests fail when making a change like this. Those tests might be making more strict assertions about the global context, for example. When a small number of targeted tests fail, they can be more effective at communicating the exact set of assumptions that were being made before that have now changed.

5.1.2.6.1. Assertions#

The openedx.core.lib.tests.assertions.events module contains helper functions that can be used to make assertions about events. The key function in this module is assert_event_matches which allows tests to make assertions about parts of the event. The signature looks like this:

def assert_event_matches(expected_event, actual_event, tolerate=None):

The expected_event parameter contains the assertion that is being made. The actual_event parameter contains the complete event that was emitted. The tolerate parameter allows the test to specify the types of discrepancies that it cares about. This allows you to be very strict in assertions about some parts of the event and more lenient in other areas.

Here are examples that highlight the default settings for tolerate.

# By default, decode string values for the "event" field as JSON and compare
# the contents with the actual event. This will not raise an error.
assert_event_matches(
    {'event': {'a': 'b'}},
    {'event': '{"a": "b"}'}
)

# Ignore "unexpected" root fields. This will not raise an error even though
# the field "foo" does not appear in the expected event.
assert_event_matches(
    {'event_type': 'test'},
    {'event_type': 'test', 'foo': 'bar'}
)

# Ignore "unexpected" fields in the context. This will not raise an error
# even though the field "foo" does not appear in the expected event context.
assert_event_matches(
    {'event_type': 'test'},
    {'event_type': 'test', 'context': {'foo': 'bar'}}
)

# Overriding "tolerate" allows more strict assertions to be made.
# This assertion will raise an error!
assert_event_matches(
    {'event_type': 'test'},
    {'event_type': 'test', 'context': {'foo': 'bar'}},
    tolerate=[]
)

5.1.2.6.2. Unit testing#

Test classes should inherit from common.djangoapps.track.tests.EventTrackingTestCase. Additionally, some helper assertion functions are available to help with making assertions about events.

Here is an example of a subclass.

from common.djangoapps.track.tests import EventTrackingTestCase
from openedx.core.lib.tests.assertions.events import assert_event_matches

class MyTestClass(EventTrackingTestCase):

    def setUp(self):
        # The setUp() of the superclass must be called
        super(MyTestClass, self).setUp()

    def test_event_emitted(self):
        my_function_that_emits_events()

        # If the above function only emits a single event, this can be used.
        actual_event = self.get_event()

        # This will assert that the "event_type" of the event is "foobar".
        # Note that it makes no assertions about any of the other fields
        # in the event.
        assert_event_matches({'event_type': 'foobar'}, actual_event)

    def test_no_event_emitted(self):

        my_function_that_does_not_emit()

        # This will fail if any events were emitted by the above function
        # call.
        self.assert_no_events_emitted()

5.1.3. Documenting Events#

When you add events to the platform, your PR should describe the purpose of the event and include an example event. In addition, consider including comments that identify the purpose of the event and its fields. Your descriptions and examples can help assure that researchers and other members of the open edX community understand your intent and use the events correctly.

You must also update the Event Reference documentation to include your changes. These documents are highly valuable for instructors, researchers, site operators, and others who use the tracking logs.

The Open edX Developer Guide provides guidance for contributing to the Open edX project.

5.1.4. Legacy Application Event Processor#

In order to support legacy analysis applications, the platform emits standard events using eventtracking.tracker.emit(). However, it uses a custom event processor which modifies the event before saving it to ensure that the event can be parsed by legacy systems. Specifically, it replicates some information so that it is accessible in exactly the same way as it was before. This state is intended to be temporary until all existing legacy systems can be altered to use the new field locations.

5.2. Other Tracking Systems#

The following tracking systems are currently used for specialized analytics. There is some redundancy with event-tracking that is undesirable. The event- tracking library could be extended to support some of these systems, allowing for a single API to be used while still transmitting data to each of these service providers. This would reduce discrepancies between the measurements made by the various systems and significantly clarify the instrumentation.

5.2.1. Segment#

A selection of events can be transmitted to Segment in order to take advantage of a wide variety of analytics-related third party services such as Mixpanel and Chartbeat. It is enabled in the LMS if the SEGMENT_KEY key is set to a valid Segment API key in the lms.yml file. Additionally, the setting EVENT_TRACKING_SEGMENTIO_EMIT_WHITELIST in the lms.yml file can be used to specify event names that should be emitted to Segment from normal tracker.emit() calls. Events specified in this whitelist will be sent to both the tracking logs and Segment. Similarly, it is enabled in Studio if the SEGMENT_KEY key is set to a valid Segment API key in the studio.yml file.

5.2.2. Google Analytics#

Google analytics tracks all LMS page views. It provides several useful metrics such as common referrers and search terms that users used to find the edX web site.

5.2.3. Deprecated APIs#

The track djangoapp contains a deprecated mechanism for emitting events. Direct usage of server_track is deprecated and should be avoided in new code. Old calls to server_track should be replaced with calls to tracker.emit(). The celery task-based event emission and client-side event handling do not currently have a suitable alternative approach, so they continue to be supported.

Feedback