Extract
extract_ methods return the documents to index as a list of dicts. Each dict sets these keys:
- url
The remote URL of the document, which might include a fragment identifier
- title
The title of the document, which might combine the page title and the heading text
- text
The plain text content of the document
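
The shape of the returned list can be sketched as follows. This is only an illustration under assumptions: the function name extract_example, its parameters, the example.com URL, and the way the title is joined are all hypothetical and are not taken from the project. Only the three keys (url, title, text) come from the description above.

```python
from typing import Dict, List


def extract_example(
    base_url: str,
    page_title: str,
    sections: Dict[str, Dict[str, str]],
) -> List[Dict[str, str]]:
    """Hypothetical extract_ method: build one document per page section."""
    documents = []
    for anchor, section in sections.items():
        documents.append(
            {
                # Remote URL, here with a fragment identifier for the section
                "url": f"{base_url}#{anchor}",
                # Page title combined with the heading text (assumed format)
                "title": f"{page_title} - {section['heading']}",
                # Plain text content of the section
                "text": section["text"],
            }
        )
    return documents


docs = extract_example(
    "https://example.com/guide.html",
    "Guide",
    {"install": {"heading": "Installation", "text": "Run the installer."}},
)
```

Here each section of a page becomes its own document, so the fragment identifier in url points a search result at the matching heading rather than at the top of the page.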