Toggle life expectancy and use cases

Status

Accepted

Context

At this time, our toggle annotations have the following problems:

  • The toggle life expectancy isn’t as clear as it could be.

  • The justification for permanent toggles is not clear.

  • Our use of toggle use cases is also unclear.

The following provides some historical background.

As of the writing of this decision, OEP-17 on Feature Toggles:

The initial attempt to meet this requirement led to the toggle_use_cases annotation with the following options:

  • incremental_release, launch_date, monitored_rollout, graceful_degradation, beta_testing, vip, opt_out, opt_in, open_edx

In a PR discussion about toggle_use_cases, the following proposal was made:

  1. Add toggle_life_expectancy with values of “temporary” or “permanent”.

  2. Make toggle_use_cases required when toggle_life_expectancy was “permanent”.

The second point was proposed to force authors to be more intentional about “permanent” toggles, so that no one could mark a toggle as “permanent” simply to avoid the clean-up required for “temporary” toggles.

In the PR to implement this proposal, the solution was tweaked based on some lost decisions. The update adjusted toggle_use_cases to the following values:

  • temporary: A new use case that replaced all use case options that were considered temporary.

  • circuit_breaker: Replaced “graceful_degradation”, although the OEP has not yet been updated.

  • vip, opt_out, opt_in, open_edx: The other original permanent use case options.

The current implementation has the following flaws:

  • Defining “permanent” as the absence of “temporary” reduces clarity for “permanent” toggles.

  • The current choices for toggle_use_cases aren’t optimal.

    • The options “circuit_breaker” and “opt_out” (“opt_in”) seem like two different types of toggles. The former is ops related, and the latter is business related.

      • Note that “opt_out” and “opt_in” only differ in language based on the default of the toggle. For example, if the default is True, then “opt_out” linguistically makes sense as the purpose of the toggle, and if the default is False, “opt_in” makes more sense.

    • The options “vip” vs “open_edx” seem more about the scope at which one might apply the ability to opt-out/opt-in. However, this scope may vary depending on the Open edX instance, and it may not make sense to document the scope selected for edX.org. Additionally, the toggle_implementation annotation may duplicate this information, assuming the scoping features of the implementation match the intent.

    • In practice, it seems that “open_edx” is often chosen as a general option for permanent toggles, which negates the intent to use toggle_use_cases to force intentional design and consideration for permanent toggles.

If a stakeholder wanted to review our “permanent” toggles to determine which really must be permanent, and which could potentially be deprecated/removed:

  • Most permanent toggles have .. toggle_use_cases: open_edx, which provides no information about the confidence around it being a permanent toggle.

  • If confidence were boosted using additional research or a failed attempt to DEPR, there would be no consistent way to document this change in confidence, other than adding text to the toggle_description and hoping it would be seen.

Decision

To make both the toggle life expectancy and the justification for permanent toggles clearer, we will take the following actions:

  • As originally proposed, introduce toggle_life_expectancy with values of “temporary” or “permanent”. This makes the options very clear.

  • Replace toggle_use_cases with a free-form toggle_permanent_justification.

    • toggle_permanent_justification would be required for “permanent” toggles.

    • toggle_target_removal_date would be required for “temporary” toggles (as was the case with the earlier version of “temporary” toggles).

    • The value of toggle_permanent_justification would be a string description justifying why the toggle is marked as permanent.

    • The purpose of the annotation would be very clear, and would hopefully deter declaring a toggle as “permanent” that really could be “temporary”.

    • Authors could reference use cases as required to provide justification.

    • Legacy permanent toggles could use a value like: “Seems permanent, but toggle defined before justification was required.” This would make it easier to tell the difference between a toggle marked “permanent” with a strong or weak justification.

    • This annotation would not be included in the published documentation, but used for reporting purposes only.

    • “Permanent” toggles with a weak justification could still be candidates for a DEPR(ecation) process to either remove the toggle or strengthen the justification.

  • We can remind authors to include use cases in the toggle_description as well, if it aids documentation for operators or other audiences.
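The requirement rules above can be sketched as a small check. This is only an illustration, not the code-annotations linter: the helper names parse_annotations and check_toggle_annotations, the toggle name EXAMPLE_FEATURE, and the simplification that every annotation fits on one comment line are all assumptions made for the example.

```python
import re
from typing import Dict, List

# Matches single-line annotations such as "# .. toggle_life_expectancy: permanent".
# Multi-line annotation values are deliberately not handled in this sketch.
ANNOTATION_RE = re.compile(r"^\s*#\s*\.\.\s+(\w+):\s*(.*)$")


def parse_annotations(comment_block: str) -> Dict[str, str]:
    """Collect ".. name: value" annotations from a block of comment lines."""
    annotations = {}
    for line in comment_block.splitlines():
        match = ANNOTATION_RE.match(line)
        if match:
            annotations[match.group(1)] = match.group(2).strip()
    return annotations


def check_toggle_annotations(annotations: Dict[str, str]) -> List[str]:
    """Return a list of problems for one toggle (hypothetical helper)."""
    problems = []
    life = annotations.get("toggle_life_expectancy")
    if life not in ("temporary", "permanent"):
        problems.append("toggle_life_expectancy must be 'temporary' or 'permanent'")
    elif life == "permanent":
        # Permanent toggles must say why they are permanent.
        if not annotations.get("toggle_permanent_justification", "").strip():
            problems.append("permanent toggles require toggle_permanent_justification")
    else:
        # Temporary toggles must carry a target removal date.
        if not annotations.get("toggle_target_removal_date", "").strip():
            problems.append("temporary toggles require toggle_target_removal_date")
    return problems


# EXAMPLE_FEATURE is a made-up toggle used only for illustration.
SAMPLE = """
# .. toggle_name: EXAMPLE_FEATURE
# .. toggle_life_expectancy: permanent
# .. toggle_permanent_justification: See ADR detailing why this toggle must be permanent: ...
"""

if __name__ == "__main__":
    print(check_toggle_annotations(parse_annotations(SAMPLE)))
```

Note that a check like this only enforces that a justification is present. Judging whether it is strong enough remains a human review step.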

Consequences

  • Toggle life expectancy and justification should be clearer.

  • Toggle use cases can be noted in other free-text toggle annotations, rather than having their own annotation.

  • If a stakeholder wanted to review our “permanent” toggles to determine which really must be permanent, and which could potentially be deprecated/removed:

    • They might see toggle_permanent_justification with any of the following example values:

      • “See ADR detailing why this toggle must be permanent: …”

      • “See OSPR that added this feature and explains why it is needed for different Open edX instances: …”

      • “Tried DEPR-XXX, but got push back because …”

      • “None” (An example of how people work around required fields.)

      • “Seems permanent, but toggle defined before justification was required.”

    • For each of these, the stakeholder could decide whether the justification is strong enough, or whether to initiate DEPR to test the strength of the justification.

    • Stakeholders could also add a prefix like “[Director Approved]”, or something else that hopefully won’t get copy/pasted, so that the same toggles don’t need to be reviewed multiple times once determined to have a strong justification.

  • A rollout plan is required for annotation changes. We can use optional code-annotations to expand with new annotations before contracting (removing toggle_use_cases).

    • The longer we wait for the expand phase, the longer we lose useful justification data for new toggles. The updates also may affect more repos over time.

    • Linting should be updated as required as part of the contraction phase of the rollout plan.

  • The how-to documenting_new_feature_toggles.rst should be updated as necessary.

  • OEP-17 should be updated to reflect these choices. Note that the OEP is outdated in other ways as well.
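The stakeholder review described above could be supported by a simple report over toggle_permanent_justification values. The following is a rough sketch under stated assumptions: the function names, the category labels, and the toggle names are hypothetical, and the “[Director Approved]” prefix is just the example prefix mentioned above rather than an agreed convention.

```python
from typing import Dict, List

# Placeholder suggested for legacy permanent toggles in this decision.
LEGACY_PLACEHOLDER = (
    "Seems permanent, but toggle defined before justification was required."
)
# Example prefix from this decision; the real convention may differ.
APPROVED_PREFIX = "[Director Approved]"


def review_status(justification: str) -> str:
    """Classify one justification as 'approved', 'weak', or 'needs_review'."""
    text = justification.strip()
    if text.startswith(APPROVED_PREFIX):
        return "approved"
    # Empty values, "None", and the legacy placeholder carry no real justification.
    if text.lower() in ("", "none") or text == LEGACY_PLACEHOLDER:
        return "weak"
    return "needs_review"


def group_for_review(justifications: Dict[str, str]) -> Dict[str, List[str]]:
    """Group toggle names by review status, with names sorted for stable output."""
    groups: Dict[str, List[str]] = {"approved": [], "weak": [], "needs_review": []}
    for name, justification in sorted(justifications.items()):
        groups[review_status(justification)].append(name)
    return groups


if __name__ == "__main__":
    # Toggle names are made up for illustration.
    report = group_for_review(
        {
            "TOGGLE_A": "[Director Approved] See ADR detailing why this toggle must be permanent: ...",
            "TOGGLE_B": "None",
            "TOGGLE_C": LEGACY_PLACEHOLDER,
            "TOGGLE_D": "Tried DEPR-XXX, but got push back because ...",
        }
    )
    for status, names in report.items():
        print(status, names)
```

Toggles in the “weak” group would be the natural first candidates for a DEPR process, while “needs_review” entries still need a person to judge the justification.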