Extracting data and drawing context-dependent conclusions from websites
The extract method is ideal for high-volume extraction from predictable page structures where the content remains static.
The ask method is ideal for dynamic content or context-sensitive data extraction where the page structure might be unpredictable.
The extract method is made for predictable extraction, where one or several subpages (such as blog articles) are mapped on a regular basis. It caches scripts that have run successfully, which makes it well suited to repeated use. See the documentation on caching for more.
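The caching idea can be pictured with a toy model. This is a minimal sketch under stated assumptions, not Dendrite's implementation: it assumes a generated "script" is just a Python function from page HTML to a value, keys the cache by (hostname, prompt) on the assumption that pages on one site share a structure, and uses a hypothetical FakeGenerator in place of whatever actually writes the script. Only scripts that succeed are stored, and a cached script that fails is discarded and regenerated.

```python
import re
from typing import Callable, Dict, Tuple
from urllib.parse import urlparse

# Assumption: an "extraction script" is any function from page HTML to data.
Script = Callable[[str], object]


class ScriptCache:
    """Toy cache that stores extraction scripts only after they succeed.

    Hypothetical illustration: keys are (hostname, prompt), assuming pages on
    the same site share a structure. Not Dendrite's actual cache.
    """

    def __init__(self, generate: Callable[[str, str], Script]) -> None:
        self._generate = generate  # the expensive step, e.g. a model writing code
        self._scripts: Dict[Tuple[str, str], Script] = {}

    def extract(self, url: str, html: str, prompt: str) -> object:
        key = (urlparse(url).hostname or "", prompt)
        cached = self._scripts.get(key)
        if cached is not None:
            try:
                return cached(html)  # cache hit: no new script is generated
            except Exception:
                del self._scripts[key]  # structure changed: drop and regenerate
        script = self._generate(html, prompt)
        result = script(html)  # if this raises, nothing is cached
        self._scripts[key] = script
        return result


class FakeGenerator:
    """Hypothetical stand-in for script generation that counts its calls."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, html: str, prompt: str) -> Script:
        self.calls += 1
        # The "script" reads the first <h1>; .group raises AttributeError if absent.
        return lambda page: re.search(r"<h1>(.*?)</h1>", page).group(1)


generator = FakeGenerator()
cache = ScriptCache(generator)
title_1 = cache.extract("https://blog.example.com/post-1", "<h1>Hello</h1>", "Get the title")
title_2 = cache.extract("https://blog.example.com/post-2", "<h1>World</h1>", "Get the title")
```

The second call reuses the script generated for the first article, so the generator runs only once for both pages.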
The extract method exists on the Page and Dendrite classes. If invoked from a Dendrite instance, it is performed on the active page.
The ask method lets you ask questions about a page and receive structured output. It does not write any scripts: it is an agent that always uses the full context of the page to decide the return value. It can access the page both through its source code and through computer vision, so you can ask any question that a human could answer after looking at the page, and more.
For tasks where the page is more dynamic, ask is more helpful for structured data extraction than the extract method. For those use cases, ask supports Pydantic models and JSON Schemas as type specifications.
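What a type specification constrains can be illustrated without a browser. The sketch below is hypothetical and uses only the standard library: it checks a decoded JSON answer against a small subset of JSON Schema (type, properties, required, items). It is not how Dendrite validates answers, which this page does not describe, and the message_schema and sample answers are invented for illustration. A Pydantic model can express the same shape; in Pydantic v2, Model.model_json_schema() returns the equivalent JSON Schema.

```python
import json
from typing import Any, Dict

# Map JSON Schema type names to Python types. Only a small subset of JSON
# Schema is handled here; full validators such as the jsonschema package do more.
_TYPES: Dict[str, Any] = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def conforms(value: Any, schema: Dict[str, Any]) -> bool:
    """Return True if value matches the subset of schema understood here."""
    kind = schema["type"]
    # bool is a subclass of int in Python, so reject it for numeric types.
    if isinstance(value, bool) and kind in ("integer", "number"):
        return False
    if not isinstance(value, _TYPES[kind]):
        return False
    if kind == "object":
        if any(key not in value for key in schema.get("required", [])):
            return False
        props = schema.get("properties", {})
        return all(conforms(value[k], s) for k, s in props.items() if k in value)
    if kind == "array" and "items" in schema:
        return all(conforms(item, schema["items"]) for item in value)
    return True


# Hypothetical type specification for "What is the latest chat message?"
message_schema = {
    "type": "object",
    "properties": {
        "author": {"type": "string"},
        "text": {"type": "string"},
        "unread": {"type": "boolean"},
    },
    "required": ["author", "text"],
}

# Hypothetical raw answers, as JSON text an agent might return.
good = json.loads('{"author": "Ada", "text": "See you at 5", "unread": true}')
bad = json.loads('{"author": "Ada"}')
```

The second answer is rejected because the required "text" field is missing, which is the kind of guarantee a type specification gives the caller.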
A conversation, for example, is dynamic content, so it is better read with ask.
The ask method exists on the Page and Dendrite classes. If invoked from a Dendrite instance, it is performed on the active page.
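The relationship between the Page and Dendrite methods can be pictured as simple delegation. The classes below, ToyPage and ToyDendrite, are hypothetical stand-ins written for illustration only, not the SDK's classes: they assume the client tracks one active page and that its extract and ask methods forward to that page. The method bodies return descriptive strings instead of doing real work.

```python
from typing import List, Optional


class ToyPage:
    """Hypothetical stand-in for a browser tab; it only tracks its URL."""

    def __init__(self, url: str) -> None:
        self.url = url

    def extract(self, prompt: str) -> str:
        # A real implementation would run (or generate) an extraction script.
        return f"extract({prompt!r}) on {self.url}"

    def ask(self, prompt: str) -> str:
        # A real implementation would query an agent with the page context.
        return f"ask({prompt!r}) on {self.url}"


class ToyDendrite:
    """Hypothetical client holding several pages, one of which is active."""

    def __init__(self) -> None:
        self.pages: List[ToyPage] = []
        self.active: Optional[ToyPage] = None

    def goto(self, url: str) -> ToyPage:
        # Opening a page makes it the active one.
        page = ToyPage(url)
        self.pages.append(page)
        self.active = page
        return page

    def _active_page(self) -> ToyPage:
        if self.active is None:
            raise RuntimeError("no active page; call goto() first")
        return self.active

    # Client-level methods simply forward to the active page.
    def extract(self, prompt: str) -> str:
        return self._active_page().extract(prompt)

    def ask(self, prompt: str) -> str:
        return self._active_page().ask(prompt)


client = ToyDendrite()
first = client.goto("https://example.com/a")
client.goto("https://example.com/b")
```

Here client.ask(...) runs against https://example.com/b, the most recently opened page, while first.extract(...) still targets https://example.com/a because it is called on that page directly.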