openedx.core.djangoapps.content.search package
Subpackages
Submodules
openedx.core.djangoapps.content.search.api module
Content index and search API using Meilisearch
- class openedx.core.djangoapps.content.search.api.IndexDrift(exists: bool, is_empty: bool | None = None, primary_key_correct: bool | None = None, distinct_attribute_match: bool | None = None, filterable_attributes_match: bool | None = None, searchable_attributes_match: bool | None = None, sortable_attributes_match: bool | None = None, ranking_rules_match: bool | None = None)
Bases: object
Represents the drift state of a Meilisearch index compared to the expected configuration.
- distinct_attribute_match: bool | None
- exists: bool
- filterable_attributes_match: bool | None
- is_empty: bool | None
- property is_settings_drifted: bool
True if any of the five settings fields (distinct_attribute_match, filterable_attributes_match, searchable_attributes_match, sortable_attributes_match, ranking_rules_match) is explicitly False; a value of None does not count as drift.
- primary_key_correct: bool | None
- ranking_rules_match: bool | None
- searchable_attributes_match: bool | None
- sortable_attributes_match: bool | None
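The drift check can be sketched as a small dataclass. `IndexDriftSketch` is a hypothetical stand-in, not the module's class; it assumes the five settings fields are the five `*_match` fields and that `primary_key_correct` is handled separately, as the reconcile_index() description suggests:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexDriftSketch:
    """Hypothetical mirror of the IndexDrift fields documented above."""

    exists: bool
    is_empty: Optional[bool] = None
    primary_key_correct: Optional[bool] = None
    distinct_attribute_match: Optional[bool] = None
    filterable_attributes_match: Optional[bool] = None
    searchable_attributes_match: Optional[bool] = None
    sortable_attributes_match: Optional[bool] = None
    ranking_rules_match: Optional[bool] = None

    @property
    def is_settings_drifted(self) -> bool:
        # Assumption: the five settings fields are the *_match fields below.
        # None means "not checked", so only an explicit False counts as drift.
        settings = (
            self.distinct_attribute_match,
            self.filterable_attributes_match,
            self.searchable_attributes_match,
            self.sortable_attributes_match,
            self.ranking_rules_match,
        )
        return any(value is False for value in settings)
```

Using `is False` rather than `not value` is what keeps an unchecked (None) field from being reported as drift.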
- openedx.core.djangoapps.content.search.api.clear_meilisearch_client()
- openedx.core.djangoapps.content.search.api.delete_docs_with_context_key(key: OpaqueKey) → None
Delete all docs for the given context key
- openedx.core.djangoapps.content.search.api.delete_index_doc(key: OpaqueKey, *, delete_children: bool = False) → None
Deletes the document for the given XBlock from the search index
- Parameters:
key (OpaqueKey) – The opaque key of the XBlock/Container to be removed from the index
- openedx.core.djangoapps.content.search.api.fetch_block_types(extra_filter: str | list[str | list[str]] | None = None)
Fetch the block types facet distribution for the search results.
This data may not always be 100% accurate / up to date because it’s based on the search index, so this should only be used for analysis/estimation purposes.
Params:
- extra_filter: Filters the query. Example: ['context_key = "course-v1:SampleTaxonomyOrg1+CC22+CC22"']
Return example:
    {
        …
        'estimatedTotalHits': 5,
        'facetDistribution': {
            'block_type': {
                'html': 2,
                'problem': 1,
                'video': 2,
            }
        },
    }
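Because the result is only an estimate, a caller would typically just read the per-type counts from it. A minimal sketch, assuming the dict shape shown in the return example; `block_type_counts` is a hypothetical helper, not part of the module:

```python
def block_type_counts(result: dict) -> dict:
    """Return the block_type facet counts from a fetch_block_types()-style result."""
    # Missing keys (e.g. no matching blocks) yield an empty mapping instead of a KeyError.
    return dict(result.get("facetDistribution", {}).get("block_type", {}))


example = {
    "estimatedTotalHits": 5,
    "facetDistribution": {"block_type": {"html": 2, "problem": 1, "video": 2}},
}
counts = block_type_counts(example)
print(sorted(counts.items(), key=lambda item: -item[1]))
```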
- openedx.core.djangoapps.content.search.api.force_array(extra_filter: str | list[str | list[str]] | None = None) → list[str]
Convert a filter value into a list of strings.
Strings are wrapped in a list, lists are returned as-is (cast to list[str]), and None results in an empty list.
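A minimal sketch of that normalization; `force_array_sketch` is a hypothetical re-implementation written from the description above, not the module's code. `typing.cast` only informs the type checker, so a list argument comes back as the very same object:

```python
from typing import List, Union, cast

# Hypothetical alias for the accepted filter shapes.
FilterValue = Union[str, List[Union[str, List[str]]], None]


def force_array_sketch(extra_filter: FilterValue = None) -> List[str]:
    """Normalize a Meilisearch filter argument into a list."""
    if extra_filter is None:
        return []
    if isinstance(extra_filter, str):
        return [extra_filter]
    # cast() has no runtime effect; the same list object is returned.
    return cast(List[str], extra_filter)
```

Passing nested lists through unchanged matters because, in Meilisearch's array filter syntax, the items of the outer list are combined with AND and the items of each inner list with OR.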
- openedx.core.djangoapps.content.search.api.generate_user_token_for_studio_search(request)
Returns a Meilisearch API key that only allows the user to search content that they have permission to view
- openedx.core.djangoapps.content.search.api.get_all_blocks_from_context(context_key: str, extra_attributes_to_retrieve: list[str] | None = None) → Iterator[dict]
Lazily yields all blocks for a given context key using Meilisearch pagination. Meilisearch caps each query at 1000 results, so obtaining all blocks requires making several queries.
This data may not always be 100% accurate / up to date because it’s based on the search index, so this should only be used for analysis/estimation purposes.
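The lazy pagination pattern can be sketched as a generator over offset/limit pages. This is an illustration, not the module's implementation: `search_page(offset, limit)` is a hypothetical callable standing in for one Meilisearch search request, and the real function may paginate differently:

```python
from typing import Callable, Iterator

# Hypothetical page size, matching the 1000-result cap described above.
PAGE_SIZE = 1000


def iterate_all_hits(
    search_page: Callable[[int, int], list],
    page_size: int = PAGE_SIZE,
) -> Iterator[dict]:
    """Lazily yield every hit by requesting consecutive offset/limit pages."""
    offset = 0
    while True:
        hits = search_page(offset, page_size)
        yield from hits
        # A short (or empty) page means there is nothing left to fetch.
        if len(hits) < page_size:
            return
        offset += page_size
```

Because it is a generator, a caller that stops early (for example after finding one block) never triggers the remaining queries.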
- openedx.core.djangoapps.content.search.api.index_course(course_key: CourseKey, index_name: str | None = None, status_cb: Callable[[str], None] | None = None) → list[dict]
Rebuilds the index for a given course.
- openedx.core.djangoapps.content.search.api.init_index(status_cb: Callable[[str], None] | None = None, warn_cb: Callable[[str], None] | None = None) → None
This method is deprecated as of Verawood and will be removed in a future release.
Initialize the Meilisearch index, creating it and configuring it if it doesn’t exist.
This is a compatibility wrapper around reconcile_index().
- openedx.core.djangoapps.content.search.api.is_meilisearch_enabled() → bool
Returns whether Meilisearch is enabled
- openedx.core.djangoapps.content.search.api.only_if_meilisearch_enabled(f)
Only call f if Meilisearch is enabled
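A decorator of this kind wraps f so that the enabled check runs on every call. A minimal sketch, assuming a hypothetical `SETTINGS` dict in place of the real is_meilisearch_enabled() check; what the real decorator returns when search is disabled is not documented here, so returning None is an assumption:

```python
import functools
from typing import Any, Callable, Optional

# Hypothetical stand-in for the real is_meilisearch_enabled() check.
SETTINGS = {"meilisearch_enabled": False}


def is_meilisearch_enabled_sketch() -> bool:
    return bool(SETTINGS["meilisearch_enabled"])


def only_if_enabled_sketch(f: Callable[..., Any]) -> Callable[..., Optional[Any]]:
    """Wrap f so that it silently does nothing while search is disabled."""

    @functools.wraps(f)  # keep f's name and docstring on the wrapper
    def wrapper(*args: Any, **kwargs: Any) -> Optional[Any]:
        # Checked at call time, not once at decoration time.
        if is_meilisearch_enabled_sketch():
            return f(*args, **kwargs)
        return None

    return wrapper


@only_if_enabled_sketch
def index_item(key: str) -> str:
    """Hypothetical indexing function."""
    return f"indexed {key}"
```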
- openedx.core.djangoapps.content.search.api.rebuild_index(status_cb: Callable[[str], None] | None = None, incremental=False) → None
Rebuild the Meilisearch index from scratch
- openedx.core.djangoapps.content.search.api.reconcile_index(status_cb: Callable[[str], None] | None = None, warn_cb: Callable[[str], None] | None = None) → None
Reconcile the Meilisearch index state.
Inspects the current Studio Meilisearch index and takes appropriate action based on its state:
  - Creates the index if missing.
  - Reconfigures if empty and drifted.
  - Applies updated settings if populated and drifted.
  - Recreates the index if the primary key is mismatched (even if populated; data loss is unavoidable).
  - No-ops if everything is correctly configured.
This is the primary reconciliation entry point, called from post_migrate and init_index().
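The decision table above can be sketched as a pure function. The names `ReconcileAction` and `choose_reconcile_action` are hypothetical, and checking the primary key before settings drift is an assumed ordering; the real function also performs the actions rather than just naming them:

```python
from enum import Enum
from typing import Optional


class ReconcileAction(Enum):
    CREATE = "create"
    RECREATE = "recreate"
    RECONFIGURE = "reconfigure"
    APPLY_SETTINGS = "apply_settings"
    NOOP = "noop"


def choose_reconcile_action(
    exists: bool,
    is_empty: Optional[bool],
    primary_key_correct: Optional[bool],
    settings_drifted: bool,
) -> ReconcileAction:
    """Map an index's observed state to the action described above."""
    if not exists:
        return ReconcileAction.CREATE
    # Per the description, a mismatched primary key means recreating the
    # index, even when it already holds documents.
    if primary_key_correct is False:
        return ReconcileAction.RECREATE
    if settings_drifted:
        return ReconcileAction.RECONFIGURE if is_empty else ReconcileAction.APPLY_SETTINGS
    return ReconcileAction.NOOP
```

Keeping the decision separate from the side effects makes each branch easy to unit-test without a running Meilisearch.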
- openedx.core.djangoapps.content.search.api.reset_index(status_cb: Callable[[str], None] | None = None) → None
Reset the Meilisearch index, deleting all documents and reconfiguring it
- openedx.core.djangoapps.content.search.api.update_library_components_collections(collection_key: LibraryCollectionLocator, batch_size: int = 1000) → None
Updates the “collections” field for all components associated with a given Library Collection.
Because there may be a lot of components, we send these updates to Meilisearch in batches.
- openedx.core.djangoapps.content.search.api.update_library_containers_collections(collection_key: LibraryCollectionLocator, batch_size: int = 1000) → None
Updates the “collections” field for all containers associated with a given Library Collection.
Because there may be a lot of containers, we send these updates to Meilisearch in batches.
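Sending updates in fixed-size chunks can be sketched with `itertools.islice`. `batched_sketch` is a hypothetical helper (Python 3.12 ships a similar `itertools.batched` that yields tuples); each yielded list would correspond to one document-update request to Meilisearch:

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched_sketch(items: Iterable[T], batch_size: int = 1000) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    # islice pulls up to batch_size items; an empty list means the input is exhausted.
    while batch := list(islice(iterator, batch_size)):
        yield batch
```

Working from an iterator means the full set of component or container documents never has to be materialized at once.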
- openedx.core.djangoapps.content.search.api.upsert_content_library_index_docs(library_key: LibraryLocatorV2, full_index: bool = False) → None
Creates or updates the documents for the given Content Library in the search index
- openedx.core.djangoapps.content.search.api.upsert_content_object_tags_index_doc(key: OpaqueKey)
Updates the tags data in the document for the given Course/Library item
- openedx.core.djangoapps.content.search.api.upsert_item_collections_index_docs(opaque_key: OpaqueKey)
Updates the collections data in documents for the given Course/Library block, or Container
- openedx.core.djangoapps.content.search.api.upsert_item_containers_index_docs(opaque_key: OpaqueKey, container_type: str)
Updates the containers (units/subsections/sections) data in documents for the given Course/Library block
- openedx.core.djangoapps.content.search.api.upsert_library_block_index_doc(usage_key: UsageKey) → None
Creates or updates the document for the given Library Block in the search index
- openedx.core.djangoapps.content.search.api.upsert_library_collection_index_doc(collection_key: LibraryCollectionLocator) → None
Creates, updates, or deletes the document for the given Library Collection in the search index.
If the Collection is not found or disabled (i.e. soft-deleted), then delete it from the search index.
- openedx.core.djangoapps.content.search.api.upsert_library_container_index_doc(container_key: LibraryContainerLocator) → None
Creates, updates, or deletes the document for the given Library Container in the search index.
TODO: add support for indexing a container’s components, like upsert_library_collection_index_doc does.
- openedx.core.djangoapps.content.search.api.upsert_xblock_index_doc(usage_key: UsageKey, recursive: bool = True) → None
Creates or updates the document for the given XBlock in the search index
- Parameters:
usage_key (UsageKey) – The usage key of the XBlock to index
recursive (bool) – If True, also index all children of the XBlock
openedx.core.djangoapps.content.search.apps module
Define the content search Django App.
- class openedx.core.djangoapps.content.search.apps.ContentSearchConfig(app_name, app_module)
Bases: AppConfig
App config for the content search feature
- default_auto_field = 'django.db.models.BigAutoField'
- label = 'search'
- name = 'openedx.core.djangoapps.content.search'
- ready()
Override this method in subclasses to run code when Django starts.
openedx.core.djangoapps.content.search.documents module
Utilities related to indexing content for search
- class openedx.core.djangoapps.content.search.documents.DocType
Bases: object
Values for the ‘type’ field on each doc in the search index
- collection = 'collection'
- course_block = 'course_block'
- library_block = 'library_block'
- library_container = 'library_container'
- class openedx.core.djangoapps.content.search.documents.Fields
Bases: object
Fields that exist on the documents in our search index
- access_id = 'access_id'
- block_id = 'block_id'
- block_type = 'block_type'
- breadcrumbs = 'breadcrumbs'
- child_display_names = 'child_display_names'
- child_usage_keys = 'child_usage_keys'
- collections = 'collections'
- collections_display_name = 'display_name'
- collections_key = 'key'
- containers_display_name = 'display_name'
- containers_key = 'key'
- content = 'content'
- context_key = 'context_key'
- created = 'created'
- description = 'description'
- display_name = 'display_name'
- id = 'id'
- last_published = 'last_published'
- modified = 'modified'
- num_children = 'num_children'
- org = 'org'
- problem_types = 'problem_types'
- publish_status = 'publish_status'
- published = 'published'
- published_content = 'content'
- published_description = 'description'
- published_display_name = 'display_name'
- published_num_children = 'num_children'
- sections = 'sections'
- sections_display_name = 'display_name'
- sections_key = 'key'
- subsections = 'subsections'
- tags = 'tags'
- tags_level0 = 'level0'
- tags_level1 = 'level1'
- tags_level2 = 'level2'
- tags_level3 = 'level3'
- tags_taxonomy = 'taxonomy'
- type = 'type'
- units = 'units'
- usage_key = 'usage_key'
- class openedx.core.djangoapps.content.search.documents.PublishStatus
Bases: object
Values for the ‘publish_status’ field on each doc in the search index
- modified = 'modified'
- never = 'never'
- published = 'published'
- openedx.core.djangoapps.content.search.documents.meili_id_from_opaque_key(key: OpaqueKey) → str
Meilisearch requires each document to have a primary key that’s either an integer or a string composed of alphanumeric characters (a-z A-Z 0-9), hyphens (-) and underscores (_). Since our opaque keys don’t meet this requirement, we transform them to a similar slug ID string that does.
In the future, with openedx_content’s data models in place for courseware, we could use PublishableEntity’s primary key / UUID instead.
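One way to build such an ID is to replace every disallowed character and append a short hash, because different keys can collapse to the same slug. This is a hypothetical scheme (`meili_id_sketch` is not the module's function, and whether meili_id_from_opaque_key appends a hash is not stated here); it takes `str(key)` rather than an OpaqueKey so it runs without the opaque-keys package:

```python
import hashlib
import re

# Characters Meilisearch accepts in a string primary key (see the text above).
_DISALLOWED = re.compile(r"[^a-zA-Z0-9_-]")


def meili_id_sketch(key: str) -> str:
    """Turn str(opaque_key) into a Meilisearch-safe ID (hypothetical scheme)."""
    slug = _DISALLOWED.sub("-", key)
    # Distinct keys can collapse to the same slug (":" and "+" both become "-"),
    # so a short hash of the original key is appended to keep IDs distinct.
    suffix = hashlib.blake2b(key.encode("utf-8"), digest_size=4).hexdigest()
    return f"{slug}-{suffix}"


print(meili_id_sketch("lb:Axim:TEST:html:intro-1"))
```

The hash is deterministic, so re-indexing the same key always produces the same document ID and overwrites the earlier document.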
- openedx.core.djangoapps.content.search.documents.searchable_doc_collections(object_id: OpaqueKey) → dict
Given an XBlock, course, library, etc., get the collections for its index doc.
e.g. for something in Collections “COL_A” and “COL_B”, this would return:
    {
        "collections": {
            "display_name": ["Collection A", "Collection B"],
            "key": ["COL_A", "COL_B"],
        }
    }
If the object is in no collections, returns:
    {
        "collections": {
            "display_name": [],
            "key": [],
        },
    }
- openedx.core.djangoapps.content.search.documents.searchable_doc_containers(object_id: OpaqueKey, container_type: str) → dict
Given an XBlock, course, library, etc., get the containers that it is part of for its index doc.
e.g. for something in Units “UNIT_A” and “UNIT_B”, this would return:
    {
        "units": {
            "display_name": ["Unit A", "Unit B"],
            "key": ["UNIT_A", "UNIT_B"],
        }
    }
If the object is in no containers of the requested type (here, sections), returns:
    {
        "sections": {
            "display_name": [],
            "key": [],
        },
    }
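searchable_doc_collections() and searchable_doc_containers() return the same nested shape: one top-level field holding parallel "display_name" and "key" lists. A minimal sketch of building that shape, assuming the memberships are already available as (key, display_name) pairs; `membership_doc` is hypothetical, and using the container type string ("units", "subsections", "sections") as the top-level field is an assumption based on the Fields list:

```python
from typing import Iterable, Tuple


def membership_doc(field: str, items: Iterable[Tuple[str, str]]) -> dict:
    """Build {field: {"display_name": [...], "key": [...]}} from (key, display_name) pairs."""
    pairs = list(items)  # allow any iterable, including generators
    return {
        field: {
            "display_name": [name for _key, name in pairs],
            "key": [key for key, _name in pairs],
        }
    }
```

Keeping the two lists parallel means index i of "display_name" always describes index i of "key", which a frontend can zip back together.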
- openedx.core.djangoapps.content.search.documents.searchable_doc_for_collection(collection_key: LibraryCollectionLocator, *, collection: Collection | None = None) → dict
Generate a dictionary document suitable for ingestion into a search engine like Meilisearch or Elasticsearch, so that the given collection can be found using faceted search.
If no collection is found for the given library_key + collection_key, the returned document will contain only basic information derived from the collection usage key, and no Fields.type value will be included in the returned dict.
- openedx.core.djangoapps.content.search.documents.searchable_doc_for_container(container_key: ContainerKey) → dict
Generate a dictionary document suitable for ingestion into a search engine like Meilisearch or Elasticsearch, so that the given container can be found using faceted search.
If no container is found for the given container key, the returned document will contain only basic information derived from the container key, and some fields like Fields.display_name will be missing from the returned dict.
- openedx.core.djangoapps.content.search.documents.searchable_doc_for_course_block(block) → dict
Generate a dictionary document suitable for ingestion into a search engine like Meilisearch or Elasticsearch, so that the given course block can be found using faceted search.
- openedx.core.djangoapps.content.search.documents.searchable_doc_for_key(key: OpaqueKey) → dict
Generates a base document identified by its opaque key.
- openedx.core.djangoapps.content.search.documents.searchable_doc_for_library_block(xblock_metadata: LibraryXBlockMetadata) → dict
Generate a dictionary document suitable for ingestion into a search engine like Meilisearch or Elasticsearch, so that the given library block can be found using faceted search.
Datetime fields (created, modified, last_published) are serialized to POSIX timestamps so that they can be used to sort the search results.
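POSIX timestamps are plain numbers, so they compare and sort numerically. A minimal sketch of the conversion; `to_posix_timestamp` is hypothetical, and keeping None for a never-published block is an assumption (the real function might omit the field instead):

```python
from datetime import datetime, timezone
from typing import Optional


def to_posix_timestamp(value: Optional[datetime]) -> Optional[float]:
    """Convert a datetime to seconds since the Unix epoch (None stays None)."""
    # Naive datetimes are interpreted as local time by .timestamp(),
    # so timezone-aware values give deterministic results.
    return value.timestamp() if value is not None else None


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
modified = datetime(2024, 3, 15, 12, 30, tzinfo=timezone.utc)
doc = {
    "created": to_posix_timestamp(created),
    "modified": to_posix_timestamp(modified),
    "last_published": to_posix_timestamp(None),
}
```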
- openedx.core.djangoapps.content.search.documents.searchable_doc_tags(object_id: OpaqueKey) → dict
Given an XBlock, course, library, etc., get the tag data for its index doc.
See the comments on “Fields.tags” in the source code for an explanation of the format.
e.g. for something tagged “Difficulty: Hard” and “Location: Vancouver” this would return:
    {
        "tags": {
            "taxonomy": ["Location", "Difficulty"],
            "level0": ["Location > North America", "Difficulty > Hard"],
            "level1": ["Location > North America > Canada"],
            "level2": ["Location > North America > Canada > Vancouver"],
        }
    }
Note: despite what you might expect, because this is only used for the filtering/refinement UI, it’s fine if this is a one-way transformation. It’s not necessary to be able to re-construct the exact tag IDs nor taxonomy IDs from this data that’s stored in the search index. It’s just a bunch of strings in a particular format that the frontend knows how to render to support hierarchical refinement by tag.
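The level strings can be sketched as cumulative " > "-joined paths, one list per depth. This assumes each applied tag is given as a taxonomy name plus its lineage from root to leaf (a hypothetical input shape); `tags_doc` is not the module's function, and the real one may cap the depth, since Fields only defines level0 through level3:

```python
from typing import Dict, List, Sequence, Tuple


def tags_doc(tag_lineages: Sequence[Tuple[str, List[str]]]) -> dict:
    """Build the hierarchical tag strings from (taxonomy, [root, ..., leaf]) pairs."""
    tags: Dict[str, List[str]] = {"taxonomy": []}
    for taxonomy, lineage in tag_lineages:
        if taxonomy not in tags["taxonomy"]:
            tags["taxonomy"].append(taxonomy)
        path = taxonomy
        for depth, value in enumerate(lineage):
            # Each level extends the previous path: "Location > North America > ..."
            path = f"{path} > {value}"
            level = tags.setdefault(f"level{depth}", [])
            if path not in level:  # two tags can share an ancestor
                level.append(path)
    return {"tags": tags}


example = tags_doc([
    ("Location", ["North America", "Canada", "Vancouver"]),
    ("Difficulty", ["Hard"]),
])
```

Filtering on a shallow level (e.g. level0 = "Location > North America") then matches everything tagged anywhere beneath that node, which is what hierarchical refinement needs.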
openedx.core.djangoapps.content.search.handlers module
Signal/event handlers for content search
- openedx.core.djangoapps.content.search.handlers.content_library_created_handler(**kwargs) → None
Create the index and SearchAccess for the content library
- openedx.core.djangoapps.content.search.handlers.content_library_updated_handler(**kwargs) → None
Update the index for the content library
- openedx.core.djangoapps.content.search.handlers.content_object_associations_changed_handler(**kwargs) → None
Update the collections/tags data in the index for the Content Object
- openedx.core.djangoapps.content.search.handlers.delete_course_search_access(sender, instance, **kwargs)
Deletes the SearchAccess instance for deleted CourseOverview
- openedx.core.djangoapps.content.search.handlers.delete_library_search_access(content_library: ContentLibraryData, **kwargs)
Deletes the SearchAccess instance for deleted content libraries
- openedx.core.djangoapps.content.search.handlers.handle_post_migrate(sender, **kwargs)
Reconcile Meilisearch index state after Django migrations run.
Filters on sender.label to only execute for the search app’s post_migrate signal. Tolerant of Meilisearch unavailability — logs a warning and continues.
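The filtering and failure tolerance can be sketched without Django. `handle_post_migrate_sketch` is hypothetical: the `reconcile` callable is injected in place of a direct reconcile_index() call so the sketch runs standalone, and `SimpleNamespace` stands in for the AppConfig that Django passes as the sender:

```python
import logging
from types import SimpleNamespace
from typing import Any, Callable

log = logging.getLogger(__name__)

SEARCH_APP_LABEL = "search"  # ContentSearchConfig.label


def handle_post_migrate_sketch(sender: Any, reconcile: Callable[[], None], **kwargs: Any) -> bool:
    """Run `reconcile` only for the search app; return True if this call was for it."""
    # post_migrate is sent once per app config; ignore the other apps.
    if getattr(sender, "label", None) != SEARCH_APP_LABEL:
        return False
    try:
        reconcile()
    except Exception:  # e.g. Meilisearch is down; migrations must not fail
        log.warning("Could not reconcile the Meilisearch index after migrate", exc_info=True)
    return True


# Usage: a fake sender for an unrelated app is ignored.
handle_post_migrate_sketch(SimpleNamespace(label="auth"), reconcile=lambda: None)
```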
- openedx.core.djangoapps.content.search.handlers.handle_reindex_on_signal(**kwargs)
Automatically update the Meilisearch index for a course in the database on a new import or rerun.
- openedx.core.djangoapps.content.search.handlers.library_block_deleted(**kwargs) → None
Delete the index for the content library block
- openedx.core.djangoapps.content.search.handlers.library_block_published_handler(**kwargs) → None
Update the index for the content library block when its published version has changed.
- openedx.core.djangoapps.content.search.handlers.library_block_updated_handler(**kwargs) → None
Create or update the index for the content library block
- openedx.core.djangoapps.content.search.handlers.library_collection_updated_handler(**kwargs) → None
Create or update the index for the content library collection
- openedx.core.djangoapps.content.search.handlers.library_container_deleted(**kwargs) → None
Delete the index for the content library container
- openedx.core.djangoapps.content.search.handlers.library_container_published_handler(**kwargs) → None
Update the index for the content library container when its published version has changed.
- openedx.core.djangoapps.content.search.handlers.library_container_updated_handler(**kwargs) → None
Create or update the index for the content library container
- openedx.core.djangoapps.content.search.handlers.listen_for_course_delete(sender, course_key, **kwargs)
Catches the signal that a course has been deleted and removes its entry from the Course About Search index.
- openedx.core.djangoapps.content.search.handlers.xblock_created_handler(**kwargs) → None
Create the index for the XBlock
- openedx.core.djangoapps.content.search.handlers.xblock_deleted_handler(**kwargs) → None
Delete the index for the XBlock
- openedx.core.djangoapps.content.search.handlers.xblock_updated_handler(**kwargs) → None
Update the index for the XBlock and its children
openedx.core.djangoapps.content.search.index_config module
Configuration for the search index.
openedx.core.djangoapps.content.search.models module
Database models for content search
- class openedx.core.djangoapps.content.search.models.IncrementalIndexCompleted(*args, **kwargs)
Bases: Model
Stores the context keys of already-indexed courses and libraries for incremental indexing.
- exception DoesNotExist
Bases: ObjectDoesNotExist
- exception MultipleObjectsReturned
Bases: MultipleObjectsReturned
- context_key
DO NOT REUSE THIS CLASS. Provided for backwards compatibility only!
A placeholder class that provides a way to set the attribute on the model.
- id
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- objects = <django.db.models.manager.Manager object>
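Recording finished context keys is what lets an interrupted rebuild resume instead of starting over. A minimal sketch of such a loop, assuming a plain set stands in for the IncrementalIndexCompleted rows and a hypothetical `index_one` callable stands in for the per-course or per-library indexing step; how rebuild_index(incremental=True) actually uses the table is not documented here:

```python
from typing import Callable, Iterable, List, Set


def index_incrementally(
    context_keys: Iterable[str],
    completed: Set[str],
    index_one: Callable[[str], None],
) -> List[str]:
    """Index each context key once, skipping keys recorded by an earlier run."""
    newly_indexed = []
    for key in context_keys:
        if key in completed:
            continue  # already indexed before the previous run stopped
        index_one(key)
        # Record progress only after indexing succeeded, so a crash mid-way
        # leaves this key to be retried on the next run.
        completed.add(key)
        newly_indexed.append(key)
    return newly_indexed
```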
- class openedx.core.djangoapps.content.search.models.SearchAccess(*args, **kwargs)
Bases: Model
Stores a numeric ID for each ContextKey.
We use this shorter ID instead of the full ContextKey when determining a user’s access to search-indexed course and library content because:
  - in some deployments, users may be granted access to more than 1,000 individual courses, and
  - the search filter request is stored in the JWT, which is limited to 8 KiB.
- exception DoesNotExist
Bases: ObjectDoesNotExist
- exception MultipleObjectsReturned
Bases: MultipleObjectsReturned
- context_key
DO NOT REUSE THIS CLASS. Provided for backwards compatibility only!
A placeholder class that provides a way to set the attribute on the model.
- id
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- objects = <django.db.models.manager.Manager object>
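A back-of-the-envelope comparison shows why the numeric IDs matter for the 8 KiB limit. The course keys, IDs, and filter strings below are made up for illustration; only their lengths are compared:

```python
JWT_LIMIT_BYTES = 8 * 1024  # the 8 KiB limit described above

# Made-up keys and IDs for 1,000 courses, purely to compare sizes.
course_keys = [f"course-v1:SampleOrg+C{i:04d}+2024" for i in range(1000)]
access_ids = list(range(1, 1001))

# Hypothetical filter strings in Meilisearch's `attribute IN [...]` style.
keys_filter = "context_key IN [" + ", ".join(f'"{key}"' for key in course_keys) + "]"
ids_filter = "access_id IN [" + ", ".join(str(i) for i in access_ids) + "]"

print(f"full keys: {len(keys_filter)} chars, numeric IDs: {len(ids_filter)} chars")
```

With these values the key-based filter is roughly four times the limit while the ID-based one fits. JWT segments are also base64url-encoded, which adds about a third, so the real headroom is smaller than the raw lengths suggest.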
- openedx.core.djangoapps.content.search.models.get_access_ids_for_request(request: Request, omit_orgs: list[str] = None) → list[int]
Returns a list of SearchAccess.id values for courses and content libraries that the requesting user has been individually granted access to.
Omits any courses/libraries with orgs in the omit_orgs list.
openedx.core.djangoapps.content.search.plain_text_math module
Helper class to convert MathJax equations to plain text.
- exception openedx.core.djangoapps.content.search.plain_text_math.EqnPatternNotFound
Bases: Exception
Raised when a pattern is not found in an equation. This is used to skip a specific transformation.
- exception openedx.core.djangoapps.content.search.plain_text_math.InvalidMathEquation
Bases: Exception
Raised when a MathJax equation is invalid. This is used to skip all transformations.
- class openedx.core.djangoapps.content.search.plain_text_math.PlainTextMath
Bases: object
Converts MathJax equations to plain text using unicodeit and some preprocessing.
- eqn_replacements = (('\\sin', 'sin'), ('\\cos', 'cos'), ('\\tan', 'tan'), ('\\arcsin', 'arcsin'), ('\\arccos', 'arccos'), ('\\arctan', 'arctan'), ('\\cot', 'cot'), ('\\sec', 'sec'), ('\\csc', 'csc'), ('\\left', ''), ('\\right', ''))
- equation_pattern = re.compile('\\[mathjaxinline\\](.*?)\\[\\/mathjaxinline\\]|\\[mathjax\\](.*?)\\[\\/mathjax\\]|\\\\\\((.*?)\\\\\\)|\\\\\\[(.*?)\\\\\\]')
- extract_inner_texts = ('\\mathbf{', '\\bm{')
- frac_open_close_pattern = re.compile('}\\s*{')
- regex_replacements = ((re.compile('{\\\\bf (.*?)}'), '\\1'),)
- run(eqn_matches: Match) → str
Takes an re.Match object and runs the conversion process on each match group.
- openedx.core.djangoapps.content.search.plain_text_math.process_mathjax(content: str) → str
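The matching step can be sketched by reusing the equation_pattern and eqn_replacements values listed above. `process_mathjax_sketch` is hypothetical and deliberately partial: it omits the unicodeit conversion, the fraction handling (frac_open_close_pattern), extract_inner_texts, and regex_replacements, so it is not the module's implementation:

```python
import re

# Same regex as PlainTextMath.equation_pattern: four alternatives for
# [mathjaxinline]...[/mathjaxinline], [mathjax]...[/mathjax], \(...\) and \[...\].
EQUATION_PATTERN = re.compile(
    r"\[mathjaxinline\](.*?)\[\/mathjaxinline\]"
    r"|\[mathjax\](.*?)\[\/mathjax\]"
    r"|\\\((.*?)\\\)"
    r"|\\\[(.*?)\\\]"
)

# Same pairs as PlainTextMath.eqn_replacements.
EQN_REPLACEMENTS = (
    ("\\sin", "sin"), ("\\cos", "cos"), ("\\tan", "tan"),
    ("\\arcsin", "arcsin"), ("\\arccos", "arccos"), ("\\arctan", "arctan"),
    ("\\cot", "cot"), ("\\sec", "sec"), ("\\csc", "csc"),
    ("\\left", ""), ("\\right", ""),
)


def _simplify(eqn: str) -> str:
    for old, new in EQN_REPLACEMENTS:
        eqn = eqn.replace(old, new)
    return eqn


def process_mathjax_sketch(content: str) -> str:
    """Replace each MathJax equation in `content` with a simplified plain-text form."""

    def _replace(match: re.Match) -> str:
        # Exactly one alternative matched, so exactly one group is not None.
        eqn = next(group for group in match.groups() if group is not None)
        return _simplify(eqn)

    return EQUATION_PATTERN.sub(_replace, content)
```

Text outside the equation delimiters passes through untouched, so the indexed content keeps its surrounding prose.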
openedx.core.djangoapps.content.search.tasks module
Defines asynchronous Celery tasks for content indexing
openedx.core.djangoapps.content.search.urls module
URLs for content search
openedx.core.djangoapps.content.search.views module
REST API for content search
- class openedx.core.djangoapps.content.search.views.StudioSearchView(**kwargs)
Bases: APIView
Give the user details on how they can search Studio content
- authentication_classes = (<class 'edx_rest_framework_extensions.auth.jwt.authentication.JwtAuthentication'>, <class 'openedx.core.lib.api.authentication.BearerAuthenticationAllowInactiveUser'>, <class 'edx_rest_framework_extensions.auth.session.authentication.SessionAuthenticationAllowInactiveUser'>)
- get(request)
Give the user details on how they can search Studio content
- permission_classes = (<class 'rest_framework.permissions.IsAuthenticated'>,)