openedx.core.djangoapps.content.search package


Subpackages#

Submodules#

openedx.core.djangoapps.content.search.api module#

Content index and search API using Meilisearch

class openedx.core.djangoapps.content.search.api.IndexDrift(exists: bool, is_empty: bool | None = None, primary_key_correct: bool | None = None, distinct_attribute_match: bool | None = None, filterable_attributes_match: bool | None = None, searchable_attributes_match: bool | None = None, sortable_attributes_match: bool | None = None, ranking_rules_match: bool | None = None)#

Bases: object

Represents the drift state of a Meilisearch index compared to the expected configuration.

distinct_attribute_match: bool | None#
exists: bool#
filterable_attributes_match: bool | None#
is_empty: bool | None#
property is_settings_drifted: bool#

True if any of the 5 settings fields is False (not None, but explicitly False).

primary_key_correct: bool | None#
ranking_rules_match: bool | None#
searchable_attributes_match: bool | None#
sortable_attributes_match: bool | None#
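
The relationship between the fields and is_settings_drifted can be sketched with a plain dataclass. This is an illustrative stand-in, not the actual class: the name IndexDriftSketch is hypothetical, and the choice of which five fields count as "settings" (every field except exists, is_empty, and primary_key_correct) is an assumption inferred from the field names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class IndexDriftSketch:
    """Hypothetical stand-in mirroring the documented IndexDrift fields."""

    exists: bool
    is_empty: bool | None = None
    primary_key_correct: bool | None = None
    distinct_attribute_match: bool | None = None
    filterable_attributes_match: bool | None = None
    searchable_attributes_match: bool | None = None
    sortable_attributes_match: bool | None = None
    ranking_rules_match: bool | None = None

    @property
    def is_settings_drifted(self) -> bool:
        # Assumption: these are the five "settings" fields.
        settings = (
            self.distinct_attribute_match,
            self.filterable_attributes_match,
            self.searchable_attributes_match,
            self.sortable_attributes_match,
            self.ranking_rules_match,
        )
        # None means "not checked", so only an explicit False counts as drift.
        return any(value is False for value in settings)
```

Under this reading, a field left at None (for example because the index does not exist yet) never reports drift by itself.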
openedx.core.djangoapps.content.search.api.clear_meilisearch_client()#
openedx.core.djangoapps.content.search.api.delete_docs_with_context_key(key: OpaqueKey) None#

Delete all docs for given context key

openedx.core.djangoapps.content.search.api.delete_index_doc(key: OpaqueKey, *, delete_children: bool = False) None#

Deletes the document for the given XBlock from the search index

Parameters:

key (OpaqueKey) – The opaque key of the XBlock/Container to be removed from the index

openedx.core.djangoapps.content.search.api.fetch_block_types(extra_filter: str | list[str | list[str]] | None = None)#

Fetch the block types facet distribution for the search results.

This data may not always be 100% accurate / up to date because it’s based on the search index, so this should only be used for analysis/estimation purposes.

Params:

  • extra_filter: Filters the query. Example: ['context_key = "course-v1:SampleTaxonomyOrg1+CC22+CC22"']

Return example:

    {
        …
        'estimatedTotalHits': 5,
        'facetDistribution': {
            'block_type': {
                'html': 2,
                'problem': 1,
                'video': 2,
            }
        },
    }
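
As a rough illustration of consuming a result shaped like the return example, the sketch below ranks block types by count. The helper name block_types_by_count is hypothetical, and the sample dict only reproduces the keys shown in the example; it does not call Meilisearch.

```python
from __future__ import annotations


def block_types_by_count(result: dict) -> list[tuple[str, int]]:
    """Return (block_type, count) pairs, most frequent first, ties by name."""
    distribution = result.get("facetDistribution", {}).get("block_type", {})
    return sorted(distribution.items(), key=lambda item: (-item[1], item[0]))


# Sample data copied from the documented return example.
sample = {
    "estimatedTotalHits": 5,
    "facetDistribution": {
        "block_type": {"html": 2, "problem": 1, "video": 2},
    },
}

ranked = block_types_by_count(sample)
```

Because the counts come from the search index, they should be treated as estimates, as noted above.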

openedx.core.djangoapps.content.search.api.force_array(extra_filter: str | list[str | list[str]] | None = None) list[str]#

Convert a filter value into a list of strings.

Strings are wrapped in a list, lists are returned as-is (cast to list[str]), and None results in an empty list.
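
The three cases described above can be sketched as follows. The function name force_array_sketch is hypothetical; this mirrors the documented behaviour rather than reproducing the actual implementation.

```python
from __future__ import annotations


def force_array_sketch(extra_filter: str | list | None = None) -> list:
    """Normalize a filter value into a list, following the documented rules."""
    if extra_filter is None:
        return []              # None results in an empty list
    if isinstance(extra_filter, str):
        return [extra_filter]  # a single string is wrapped in a list
    return extra_filter        # a list is returned as-is
```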

Returns a Meilisearch API key that only allows the user to search content that they have permission to view

openedx.core.djangoapps.content.search.api.get_all_blocks_from_context(context_key: str, extra_attributes_to_retrieve: list[str] | None = None) Iterator[dict]#

Lazily yields all blocks for a given context key using Meilisearch pagination. Meilisearch works with limits of 1000 maximum; ensuring we obtain all blocks requires making several queries.

This data may not always be 100% accurate / up to date because it’s based on the search index, so this should only be used for analysis/estimation purposes.
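
The paging approach described above might look roughly like the generator below. It is a sketch under stated assumptions: search_page is a hypothetical callable taking (offset, limit) and returning a list of hits, and fake_search_page stands in for the real Meilisearch query so the example runs without a server.

```python
from typing import Callable, Iterator

PAGE_SIZE = 1000  # the per-query maximum mentioned above


def iter_all_hits(
    search_page: Callable[[int, int], list],
    page_size: int = PAGE_SIZE,
) -> Iterator[dict]:
    """Lazily yield every hit by requesting one page at a time."""
    offset = 0
    while True:
        hits = search_page(offset, page_size)
        yield from hits
        if len(hits) < page_size:
            return  # a short (or empty) page means there is nothing left
        offset += page_size


def fake_search_page(offset: int, limit: int) -> list:
    """Hypothetical backend holding 2500 documents."""
    docs = [{"id": i} for i in range(2500)]
    return docs[offset:offset + limit]


all_hits = list(iter_all_hits(fake_search_page))
```

Since the function is a generator, no query is issued until the caller starts iterating.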

openedx.core.djangoapps.content.search.api.index_course(course_key: CourseKey, index_name: str | None = None, status_cb: Callable[[str], None] | None = None) list[dict]#

Rebuilds the index for a given course.

openedx.core.djangoapps.content.search.api.init_index(status_cb: Callable[[str], None] | None = None, warn_cb: Callable[[str], None] | None = None) None#

This method is deprecated as of Verawood and will be removed in a future release.

Initialize the Meilisearch index, creating it and configuring it if it doesn’t exist.

This is a compatibility wrapper around reconcile_index().

openedx.core.djangoapps.content.search.api.is_meilisearch_enabled() bool#

Returns whether Meilisearch is enabled

openedx.core.djangoapps.content.search.api.only_if_meilisearch_enabled(f)#

Only call f if meilisearch is enabled
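
A guard decorator of this kind could be sketched as below. The SETTINGS dict is a hypothetical stand-in for whatever is_meilisearch_enabled() checks, and returning None when disabled is an assumption; the source does not state what the wrapped call returns in that case.

```python
import functools

# Hypothetical stand-in for the real "is Meilisearch enabled" check.
SETTINGS = {"meilisearch_enabled": False}


def only_if_enabled(f):
    """Call f only when the feature flag is on; otherwise do nothing."""

    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        if SETTINGS["meilisearch_enabled"]:
            return f(*args, **kwargs)
        return None  # assumption: silently skip when disabled

    return wrapper


@only_if_enabled
def index_something(name: str) -> str:
    """Hypothetical indexing function used to exercise the decorator."""
    return f"indexed {name}"
```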

openedx.core.djangoapps.content.search.api.rebuild_index(status_cb: Callable[[str], None] | None = None, incremental=False) None#

Rebuild the Meilisearch index from scratch

openedx.core.djangoapps.content.search.api.reconcile_index(status_cb: Callable[[str], None] | None = None, warn_cb: Callable[[str], None] | None = None) None#

Reconcile the Meilisearch index state.

Inspects the current Studio Meilisearch index and takes appropriate action based on its state:

  • Creates the index if missing.

  • Reconfigures if empty and drifted.

  • Applies updated settings if populated and drifted.

  • Recreates the index if primary key is mismatched (even if populated; data loss is unavoidable).

  • No-ops if everything is correctly configured.

This is the primary reconciliation entry point, called from post_migrate and init_index().
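
The decision logic listed above can be summarized as a small pure function. This is only a sketch: the function name, the action labels, and the order in which the conditions are checked are assumptions, not taken from the implementation.

```python
from __future__ import annotations


def choose_action(
    exists: bool,
    is_empty: bool | None,
    primary_key_correct: bool | None,
    settings_drifted: bool,
) -> str:
    """Map an observed index state to one of the documented actions."""
    if not exists:
        return "create"
    if primary_key_correct is False:
        # Recreated even when populated; existing documents are lost.
        return "recreate"
    if settings_drifted:
        return "reconfigure" if is_empty else "apply_settings"
    return "noop"
```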

openedx.core.djangoapps.content.search.api.reset_index(status_cb: Callable[[str], None] | None = None) None#

Reset the Meilisearch index, deleting all documents and reconfiguring it

openedx.core.djangoapps.content.search.api.update_library_components_collections(collection_key: LibraryCollectionLocator, batch_size: int = 1000) None#

Updates the “collections” field for all components associated with a given Library Collection.

Because there may be a lot of components, we send these updates to Meilisearch in batches.

openedx.core.djangoapps.content.search.api.update_library_containers_collections(collection_key: LibraryCollectionLocator, batch_size: int = 1000) None#

Updates the “collections” field for all containers associated with a given Library Collection.

Because there may be a lot of containers, we send these updates to Meilisearch in batches.
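
Batching of this kind can be sketched with a generic helper. The name batched and the use of itertools.islice are illustrative choices, not the actual implementation; only the default batch size of 1000 comes from the signatures above.

```python
from itertools import islice
from typing import Iterable, Iterator


def batched(items: Iterable, batch_size: int = 1000) -> Iterator[list]:
    """Yield successive lists of at most batch_size items."""
    iterator = iter(items)
    # islice pulls the next chunk; an empty chunk ends the loop.
    while batch := list(islice(iterator, batch_size)):
        yield batch


# 2500 hypothetical documents would be sent as three updates.
batch_sizes = [len(batch) for batch in batched(range(2500))]
```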

openedx.core.djangoapps.content.search.api.upsert_content_library_index_docs(library_key: LibraryLocatorV2, full_index: bool = False) None#

Creates or updates the documents for the given Content Library in the search index

openedx.core.djangoapps.content.search.api.upsert_content_object_tags_index_doc(key: OpaqueKey)#

Updates the tags data in document for the given Course/Library item

openedx.core.djangoapps.content.search.api.upsert_item_collections_index_docs(opaque_key: OpaqueKey)#

Updates the collections data in documents for the given Course/Library block, or Container

openedx.core.djangoapps.content.search.api.upsert_item_containers_index_docs(opaque_key: OpaqueKey, container_type: str)#

Updates the containers (units/subsections/sections) data in documents for the given Course/Library block

openedx.core.djangoapps.content.search.api.upsert_library_block_index_doc(usage_key: UsageKey) None#

Creates or updates the document for the given Library Block in the search index

openedx.core.djangoapps.content.search.api.upsert_library_collection_index_doc(collection_key: LibraryCollectionLocator) None#

Creates, updates, or deletes the document for the given Library Collection in the search index.

If the Collection is not found or disabled (i.e. soft-deleted), then delete it from the search index.

openedx.core.djangoapps.content.search.api.upsert_library_container_index_doc(container_key: LibraryContainerLocator) None#

Creates, updates, or deletes the document for the given Library Container in the search index.

TODO: add support for indexing a container’s components, like upsert_library_collection_index_doc does.

openedx.core.djangoapps.content.search.api.upsert_xblock_index_doc(usage_key: UsageKey, recursive: bool = True) None#

Creates or updates the document for the given XBlock in the search index

Parameters:
  • usage_key (UsageKey) – The usage key of the XBlock to index

  • recursive (bool) – If True, also index all children of the XBlock

openedx.core.djangoapps.content.search.apps module#

Define the content search Django App.

class openedx.core.djangoapps.content.search.apps.ContentSearchConfig(app_name, app_module)#

Bases: AppConfig

App config for the content search feature

default_auto_field = 'django.db.models.BigAutoField'#
label = 'search'#
name = 'openedx.core.djangoapps.content.search'#
ready()#

Override this method in subclasses to run code when Django starts.

openedx.core.djangoapps.content.search.documents module#

Utilities related to indexing content for search

class openedx.core.djangoapps.content.search.documents.DocType#

Bases: object

Values for the ‘type’ field on each doc in the search index

collection = 'collection'#
course_block = 'course_block'#
library_block = 'library_block'#
library_container = 'library_container'#
class openedx.core.djangoapps.content.search.documents.Fields#

Bases: object

Fields that exist on the documents in our search index

access_id = 'access_id'#
block_id = 'block_id'#
block_type = 'block_type'#
breadcrumbs = 'breadcrumbs'#
child_display_names = 'child_display_names'#
child_usage_keys = 'child_usage_keys'#
collections = 'collections'#
collections_display_name = 'display_name'#
collections_key = 'key'#
containers_display_name = 'display_name'#
containers_key = 'key'#
content = 'content'#
context_key = 'context_key'#
created = 'created'#
description = 'description'#
display_name = 'display_name'#
id = 'id'#
last_published = 'last_published'#
modified = 'modified'#
num_children = 'num_children'#
org = 'org'#
problem_types = 'problem_types'#
publish_status = 'publish_status'#
published = 'published'#
published_content = 'content'#
published_description = 'description'#
published_display_name = 'display_name'#
published_num_children = 'num_children'#
sections = 'sections'#
sections_display_name = 'display_name'#
sections_key = 'key'#
subsections = 'subsections'#
tags = 'tags'#
tags_level0 = 'level0'#
tags_level1 = 'level1'#
tags_level2 = 'level2'#
tags_level3 = 'level3'#
tags_taxonomy = 'taxonomy'#
type = 'type'#
units = 'units'#
usage_key = 'usage_key'#
class openedx.core.djangoapps.content.search.documents.PublishStatus#

Bases: object

Values for the ‘publish_status’ field on each doc in the search index

modified = 'modified'#
never = 'never'#
published = 'published'#
openedx.core.djangoapps.content.search.documents.meili_id_from_opaque_key(key: OpaqueKey) str#

Meilisearch requires each document to have a primary key that’s either an integer or a string composed of alphanumeric characters (a-z A-Z 0-9), hyphens (-) and underscores (_). Since our opaque keys don’t meet this requirement, we transform them to a similar slug ID string that does.

In the future, with openedx_content’s data models in place for courseware, we could use PublishableEntity’s primary key / UUID instead.
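
One way to derive an ID that satisfies the character constraint is sketched below. This is not the actual transformation: the function name, the regular expression, and the hash suffix (added here so that keys differing only in punctuation do not collapse to the same ID) are all assumptions for illustration.

```python
import hashlib
import re


def slug_id_sketch(key_str: str) -> str:
    """Turn an opaque key string into an ID of a-z, A-Z, 0-9, '-' and '_'."""
    # Replace every run of disallowed characters with a single hyphen.
    slug = re.sub(r"[^A-Za-z0-9_-]+", "-", key_str).strip("-")
    # Short hash of the original string to keep distinct keys distinct.
    suffix = hashlib.sha1(key_str.encode("utf-8")).hexdigest()[:8]
    return f"{slug}-{suffix}"


example_id = slug_id_sketch("lb:SampleOrg:SampleLib:html:intro")
```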

openedx.core.djangoapps.content.search.documents.searchable_doc_collections(object_id: OpaqueKey) dict#

Given an XBlock, course, library, etc., get the collections for its index doc.

e.g. for something in Collections "COL_A" and "COL_B", this would return:

    {
        "collections": {
            "display_name": ["Collection A", "Collection B"],
            "key": ["COL_A", "COL_B"],
        }
    }

If the object is in no collections, returns:

    {
        "collections": {
            "display_name": [],
            "key": [],
        },
    }
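
The shape of this field can be reproduced from a list of (key, display name) pairs, as in the sketch below. The helper name and its input format are hypothetical; the real function looks the collections up from the object's key.

```python
def collections_field(collections: list) -> dict:
    """Build the "collections" entry from (key, display_name) pairs."""
    return {
        "collections": {
            "display_name": [name for _key, name in collections],
            "key": [key for key, _name in collections],
        }
    }


populated = collections_field([("COL_A", "Collection A"), ("COL_B", "Collection B")])
empty = collections_field([])
```

Keeping the two lists parallel means the entry at a given position in "key" corresponds to the entry at the same position in "display_name".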

openedx.core.djangoapps.content.search.documents.searchable_doc_containers(object_id: OpaqueKey, container_type: str) dict#

Given an XBlock, course, library, etc., get the containers that it is part of for its index doc.

e.g. for something in Units "UNIT_A" and "UNIT_B", this would return:

    {
        "units": {
            "display_name": ["Unit A", "Unit B"],
            "key": ["UNIT_A", "UNIT_B"],
        }
    }

If the object is in no containers, returns:

    {
        "sections": {
            "display_name": [],
            "key": [],
        },
    }

openedx.core.djangoapps.content.search.documents.searchable_doc_for_collection(collection_key: LibraryCollectionLocator, *, collection: Collection | None = None) dict#

Generate a dictionary document suitable for ingestion into a search engine like Meilisearch or Elasticsearch, so that the given collection can be found using faceted search.

If no collection is found for the given library_key + collection_key, the returned document will contain only basic information derived from the collection usage key, and no Fields.type value will be included in the returned dict.

openedx.core.djangoapps.content.search.documents.searchable_doc_for_container(container_key: ContainerKey) dict#

Generate a dictionary document suitable for ingestion into a search engine like Meilisearch or Elasticsearch, so that the given container can be found using faceted search.

If no container is found for the given container key, the returned document will contain only basic information derived from the container key, and some fields like Fields.display_name will be missing from the returned dict.

openedx.core.djangoapps.content.search.documents.searchable_doc_for_course_block(block) dict#

Generate a dictionary document suitable for ingestion into a search engine like Meilisearch or Elasticsearch, so that the given course block can be found using faceted search.

openedx.core.djangoapps.content.search.documents.searchable_doc_for_key(key: OpaqueKey) dict#

Generates a base document identified by its opaque key.

openedx.core.djangoapps.content.search.documents.searchable_doc_for_library_block(xblock_metadata: LibraryXBlockMetadata) dict#

Generate a dictionary document suitable for ingestion into a search engine like Meilisearch or Elasticsearch, so that the given library block can be found using faceted search.

Datetime fields (created, modified, last_published) are serialized to POSIX timestamps so that they can be used to sort the search results.
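
The serialization mentioned above can be illustrated with the standard library alone. The variable names are hypothetical, and the sketch assumes timezone-aware datetimes; it only shows why numeric timestamps make the documents sortable.

```python
from datetime import datetime, timezone


def to_posix(value: datetime) -> float:
    """Serialize a datetime to a POSIX timestamp (seconds since the epoch)."""
    return value.timestamp()


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
modified = datetime(2024, 6, 1, tzinfo=timezone.utc)

doc = {"created": to_posix(created), "modified": to_posix(modified)}
```

Plain numbers compare in chronological order, which is what a sort on these fields relies on.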

openedx.core.djangoapps.content.search.documents.searchable_doc_tags(object_id: OpaqueKey) dict#

Given an XBlock, course, library, etc., get the tag data for its index doc.

See the comments above on "Fields.tags" for an explanation of the format.

e.g. for something tagged "Difficulty: Hard" and "Location: Vancouver" this would return:

    {
        "tags": {
            "taxonomy": ["Location", "Difficulty"],
            "level0": ["Location > North America", "Difficulty > Hard"],
            "level1": ["Location > North America > Canada"],
            "level2": ["Location > North America > Canada > Vancouver"],
        }
    }

Note: despite what you might expect, because this is only used for the filtering/refinement UI, it’s fine if this is a one-way transformation. It’s not necessary to be able to re-construct the exact tag IDs nor taxonomy IDs from this data that’s stored in the search index. It’s just a bunch of strings in a particular format that the frontend knows how to render to support hierarchical refinement by tag.
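
The one-way transformation can be sketched as follows. The input format (one list per tag, holding the taxonomy name followed by the tag's ancestors and the tag itself) and the function name are assumptions; the real function reads the tags from the database.

```python
def tags_field(lineages: list) -> dict:
    """Build the hierarchical "tags" entry from tag lineages.

    Each lineage is [taxonomy, level-0 tag, level-1 tag, ...].
    """
    result = {"taxonomy": []}
    for lineage in lineages:
        taxonomy = lineage[0]
        if taxonomy not in result["taxonomy"]:
            result["taxonomy"].append(taxonomy)
        # Emit one " > "-joined path per depth of the lineage.
        for depth in range(1, len(lineage)):
            path = " > ".join(lineage[: depth + 1])
            values = result.setdefault(f"level{depth - 1}", [])
            if path not in values:
                values.append(path)
    return {"tags": result}


example = tags_field([
    ["Location", "North America", "Canada", "Vancouver"],
    ["Difficulty", "Hard"],
])
```

Each level stores the full path from the taxonomy down, so a frontend can offer refinement by prefix without knowing tag IDs.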

openedx.core.djangoapps.content.search.handlers module#

Signal/event handlers for content search

openedx.core.djangoapps.content.search.handlers.content_library_created_handler(**kwargs) None#

Create the index and SearchAccess for the content library

openedx.core.djangoapps.content.search.handlers.content_library_updated_handler(**kwargs) None#

Update the index for the content library

openedx.core.djangoapps.content.search.handlers.content_object_associations_changed_handler(**kwargs) None#

Update the collections/tags data in the index for the Content Object

openedx.core.djangoapps.content.search.handlers.delete_course_search_access(sender, instance, **kwargs)#

Deletes the SearchAccess instance for deleted CourseOverview

openedx.core.djangoapps.content.search.handlers.delete_library_search_access(content_library: ContentLibraryData, **kwargs)#

Deletes the SearchAccess instance for deleted content libraries

openedx.core.djangoapps.content.search.handlers.handle_post_migrate(sender, **kwargs)#

Reconcile Meilisearch index state after Django migrations run.

Filters on sender.label to only execute for the search app’s post_migrate signal. Tolerant of Meilisearch unavailability — logs a warning and continues.
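
The filtering and fault tolerance described above might be structured like the sketch below. It avoids importing Django so that it runs standalone: sender is any object with a label attribute, and reconcile is a hypothetical callable standing in for reconcile_index(). The label value "search" is taken from ContentSearchConfig.label; the boolean return value exists only to make the sketch easy to check.

```python
import logging
from types import SimpleNamespace

log = logging.getLogger(__name__)


def handle_post_migrate_sketch(sender, reconcile, **kwargs) -> bool:
    """Run reconcile only for the search app, never raising on failure."""
    if sender.label != "search":
        return False  # signal fired for a different app; ignore it
    try:
        reconcile()
    except Exception as exc:  # broad on purpose: migrations must not fail here
        log.warning("Could not reconcile the search index: %s", exc)
    return True


def unavailable():
    """Hypothetical reconcile callable simulating an unreachable server."""
    raise ConnectionError("Meilisearch is not reachable")


handled = handle_post_migrate_sketch(SimpleNamespace(label="search"), unavailable)
```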

openedx.core.djangoapps.content.search.handlers.handle_reindex_on_signal(**kwargs)#

Automatically update Meilisearch index for course in database on new import or rerun.

openedx.core.djangoapps.content.search.handlers.library_block_deleted(**kwargs) None#

Delete the index for the content library block

openedx.core.djangoapps.content.search.handlers.library_block_published_handler(**kwargs) None#

Update the index for the content library block when its published version has changed.

openedx.core.djangoapps.content.search.handlers.library_block_updated_handler(**kwargs) None#

Create or update the index for the content library block

openedx.core.djangoapps.content.search.handlers.library_collection_updated_handler(**kwargs) None#

Create or update the index for the content library collection

openedx.core.djangoapps.content.search.handlers.library_container_deleted(**kwargs) None#

Delete the index for the content library container

openedx.core.djangoapps.content.search.handlers.library_container_published_handler(**kwargs) None#

Update the index for the content library container when its published version has changed.

openedx.core.djangoapps.content.search.handlers.library_container_updated_handler(**kwargs) None#

Create or update the index for the content library container

openedx.core.djangoapps.content.search.handlers.listen_for_course_delete(sender, course_key, **kwargs)#

Catches the signal that a course has been deleted and removes its entry from the Course About Search index.

openedx.core.djangoapps.content.search.handlers.xblock_created_handler(**kwargs) None#

Create the index for the XBlock

openedx.core.djangoapps.content.search.handlers.xblock_deleted_handler(**kwargs) None#

Delete the index for the XBlock

openedx.core.djangoapps.content.search.handlers.xblock_updated_handler(**kwargs) None#

Update the index for the XBlock and its children

openedx.core.djangoapps.content.search.index_config module#

Configuration for the search index.

openedx.core.djangoapps.content.search.models module#

Database models for content search

class openedx.core.djangoapps.content.search.models.IncrementalIndexCompleted(*args, **kwargs)#

Bases: Model

Stores the context keys of already indexed courses and libraries for incremental indexing.

exception DoesNotExist#

Bases: ObjectDoesNotExist

exception MultipleObjectsReturned#

Bases: MultipleObjectsReturned

context_key#

DO NOT REUSE THIS CLASS. Provided for backwards compatibility only!

A placeholder class that provides a way to set the attribute on the model.

id#

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

objects = <django.db.models.manager.Manager object>#
class openedx.core.djangoapps.content.search.models.SearchAccess(*args, **kwargs)#

Bases: Model

Stores a numeric ID for each ContextKey.

We use this shorter ID instead of the full ContextKey when determining a user’s access to search-indexed course and library content because:

  1. in some deployments, users may be granted access to more than 1_000 individual courses, and

  2. the search filter request is stored in the JWT, which is limited to 8 KiB.
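
A back-of-the-envelope comparison shows why short numeric IDs matter here. The key format and the JSON encoding below are assumptions chosen for illustration; the source does not show how the filter is actually encoded in the token.

```python
import json

LIMIT_BYTES = 8 * 1024  # the 8 KiB limit mentioned above

# 1000 hypothetical course keys versus 1000 numeric access IDs.
context_keys = [f"course-v1:SampleOrg+Course{i:04d}+Run" for i in range(1000)]
access_ids = list(range(1, 1001))

keys_size = len(json.dumps(context_keys).encode("utf-8"))
ids_size = len(json.dumps(access_ids).encode("utf-8"))

keys_fit = keys_size <= LIMIT_BYTES
ids_fit = ids_size <= LIMIT_BYTES
```

In this sketch the list of full keys is several times larger than the limit, while the list of integers stays under it.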

exception DoesNotExist#

Bases: ObjectDoesNotExist

exception MultipleObjectsReturned#

Bases: MultipleObjectsReturned

context_key#

DO NOT REUSE THIS CLASS. Provided for backwards compatibility only!

A placeholder class that provides a way to set the attribute on the model.

id#

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

objects = <django.db.models.manager.Manager object>#
openedx.core.djangoapps.content.search.models.get_access_ids_for_request(request: Request, omit_orgs: list[str] = None) list[int]#

Returns a list of SearchAccess.id values for courses and content libraries that the requesting user has been individually granted access to.

Omits any courses/libraries with orgs in the omit_orgs list.

openedx.core.djangoapps.content.search.plain_text_math module#

Helper class to convert mathjax equations to plain text.

exception openedx.core.djangoapps.content.search.plain_text_math.EqnPatternNotFound#

Bases: Exception

Raised when a pattern is not found in equation. This is used to skip a specific transformation.

exception openedx.core.djangoapps.content.search.plain_text_math.InvalidMathEquation#

Bases: Exception

Raised when mathjax equation is invalid. This is used to skip all transformations.

class openedx.core.djangoapps.content.search.plain_text_math.PlainTextMath#

Bases: object

Converts mathjax equations to plain text using unicodeit and some preprocessing.

eqn_replacements = (('\\sin', 'sin'), ('\\cos', 'cos'), ('\\tan', 'tan'), ('\\arcsin', 'arcsin'), ('\\arccos', 'arccos'), ('\\arctan', 'arctan'), ('\\cot', 'cot'), ('\\sec', 'sec'), ('\\csc', 'csc'), ('\\left', ''), ('\\right', ''))#
equation_pattern = re.compile('\\[mathjaxinline\\](.*?)\\[\\/mathjaxinline\\]|\\[mathjax\\](.*?)\\[\\/mathjax\\]|\\\\\\((.*?)\\\\\\)|\\\\\\[(.*?)\\\\\\]')#
extract_inner_texts = ('\\mathbf{', '\\bm{')#
frac_open_close_pattern = re.compile('}\\s*{')#
regex_replacements = ((re.compile('{\\\\bf (.*?)}'), '\\1'),)#
run(eqn_matches: Match) str#

Takes re.Match object and runs conversion process on each match group.

openedx.core.djangoapps.content.search.plain_text_math.process_mathjax(content: str) str#
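
To illustrate how equation_pattern and eqn_replacements could work together, the sketch below rewrites the pattern as raw strings and applies a subset of the literal replacements to each match. It is a simplified stand-in: it omits the unicodeit conversion and the other preprocessing steps, and the function names are hypothetical.

```python
import re

# The documented equation_pattern, written with raw strings for readability.
EQUATION_PATTERN = re.compile(
    r"\[mathjaxinline\](.*?)\[\/mathjaxinline\]"
    r"|\[mathjax\](.*?)\[\/mathjax\]"
    r"|\\\((.*?)\\\)"
    r"|\\\[(.*?)\\\]"
)

# A subset of the documented eqn_replacements.
REPLACEMENTS = (("\\sin", "sin"), ("\\cos", "cos"), ("\\left", ""), ("\\right", ""))


def to_plain(match: re.Match) -> str:
    """Take whichever group matched and apply the literal replacements."""
    inner = next(group for group in match.groups() if group is not None)
    for old, new in REPLACEMENTS:
        inner = inner.replace(old, new)
    return inner.strip()


def strip_math_delimiters(content: str) -> str:
    """Replace every delimited equation in content with its plain form."""
    return EQUATION_PATTERN.sub(to_plain, content)


plain = strip_math_delimiters("Solve \\(\\sin x\\) now")
```

Each alternative in the pattern has its own capture group, so exactly one group is non-None for any given match.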

openedx.core.djangoapps.content.search.tasks module#

Defines asynchronous celery task for content indexing

openedx.core.djangoapps.content.search.urls module#

URLs for content search

openedx.core.djangoapps.content.search.views module#

REST API for content search

class openedx.core.djangoapps.content.search.views.StudioSearchView(**kwargs)#

Bases: APIView

Give user details on how they can search studio content

authentication_classes = (<class 'edx_rest_framework_extensions.auth.jwt.authentication.JwtAuthentication'>, <class 'openedx.core.lib.api.authentication.BearerAuthenticationAllowInactiveUser'>, <class 'edx_rest_framework_extensions.auth.session.authentication.SessionAuthenticationAllowInactiveUser'>)#
get(request)#

Give user details on how they can search studio content

permission_classes = (<class 'rest_framework.permissions.IsAuthenticated'>,)#

Module contents#