Data Extraction
Extracting data and drawing context-dependent conclusions from websites
Extraction basics
Dendrite supports structured data extraction from any website using natural language, making it ideal for LLMs and AI agents. Extraction is built on agents that write data-extraction scripts based on the prompt they are given. Two methods are specialized for different use cases:
- extract is ideal for high-volume extraction from predictable page structures where content remains static
- ask is ideal for dynamic content or context-sensitive data extraction where the page structure might be unpredictable
Extract method
The extract method is made for predictable extraction where one or several subpages (like blog articles) will be mapped on a regular basis. It caches successful scripts, making it ideal for repeat use. More about caching.
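To show why caching successful scripts pays off on repeat runs, here is a minimal sketch of the idea. It is not Dendrite's implementation: `ScriptCache`, `fake_agent`, and the choice of (prompt, site) as the cache key are hypothetical, and the "agent" is a plain function so the example runs offline.

```python
from typing import Callable, Dict, Tuple

# Hypothetical type: an extraction script takes raw page text and returns data.
Script = Callable[[str], list]


class ScriptCache:
    """Toy cache that reuses a generated script for the same prompt and site."""

    def __init__(self, generate_script: Callable[[str], Script]) -> None:
        self._generate = generate_script  # stand-in for the script-writing agent
        self._scripts: Dict[Tuple[str, str], Script] = {}
        self.generations = 0  # how many times the agent had to write a script

    def extract(self, prompt: str, site: str, html: str) -> list:
        key = (prompt, site)
        if key not in self._scripts:
            # Cache miss: pay the cost of generating a script once.
            self._scripts[key] = self._generate(prompt)
            self.generations += 1
        # Run the cached (or freshly generated) script on the page.
        return self._scripts[key](html)


def fake_agent(prompt: str) -> Script:
    # Assumption for the demo: the generated script collects every line
    # that starts with "title:".
    def script(html: str) -> list:
        return [line[len("title:"):].strip()
                for line in html.splitlines() if line.startswith("title:")]
    return script


cache = ScriptCache(fake_agent)
first = cache.extract("get article titles", "blog.example.com", "title: A\ntitle: B")
second = cache.extract("get article titles", "blog.example.com", "title: C")
```

The second call reuses the script written for the first, which is why this pattern suits pages whose structure stays the same between visits.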
Simple example
Example with typing
In this example, we extract all AI agent companies in the YC startup directory. We use a Pydantic model to get a typed response.
The extract method exists on the Page and Dendrite classes. If invoked from a Dendrite instance, it is performed on the active page.
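The relationship between the two classes can be pictured as simple delegation. The sketch below uses hypothetical stand-ins (`StubPage`, `StubClient`, and their `goto` and `active_page` members are illustrative names, not the SDK's classes) to show a client-level call being forwarded to whichever page is currently active.

```python
from typing import Optional


class StubPage:
    """Hypothetical stand-in for a page object."""

    def __init__(self, url: str) -> None:
        self.url = url

    def extract(self, prompt: str) -> str:
        # A real client would run the extraction here; the stub only
        # reports which page handled the call.
        return f"extract({prompt!r}) on {self.url}"


class StubClient:
    """Hypothetical stand-in for a client that tracks an active page."""

    def __init__(self) -> None:
        self.active_page: Optional[StubPage] = None

    def goto(self, url: str) -> StubPage:
        self.active_page = StubPage(url)
        return self.active_page

    def extract(self, prompt: str) -> str:
        if self.active_page is None:
            raise RuntimeError("no active page; call goto() first")
        # Client-level calls are forwarded to the active page.
        return self.active_page.extract(prompt)


client = StubClient()
older = client.goto("https://example.com/a")
client.goto("https://example.com/b")
via_client = client.extract("titles")  # handled by the active page (b)
via_page = older.extract("titles")     # handled by the page it was called on (a)
```

Calling the method on a specific page object is the way to target a page that is no longer the active one.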
Asking questions about the page
The ask method allows you to ask questions about a page and get a structured output. It doesn’t write any scripts; it’s an agent that always uses the full context of the page to decide the return value. It has access to the page both through its source code and through computer vision, so you can ask any question that a human would be able to answer after taking a look at the page, and more.
For certain tasks where the page is more dynamic, the ask method is more helpful for structured data extraction than the extract method. For those use cases, ask supports Pydantic models and JSON Schemas as type specifications.
Example with Page Validation
In this example, we’re asking the agent if there are any unread emails in the inbox and proceeding differently depending on the outcome.
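A rough sketch of that branching pattern follows. The call shape `ask(prompt, type_spec)` is an assumption inferred from the description on this page, and `StubInbox` is a hypothetical offline stand-in for a real client, so check the SDK reference for the actual signature before relying on it.

```python
from typing import Any, Protocol


class SupportsAsk(Protocol):
    # Assumed call shape: a natural-language prompt plus a type specification.
    def ask(self, prompt: str, type_spec: Any) -> Any: ...


class StubInbox:
    """Offline stand-in that answers from a fixed unread count."""

    def __init__(self, unread: int) -> None:
        self.unread = unread

    def ask(self, prompt: str, type_spec: Any) -> Any:
        # A real agent would inspect the page; the stub coerces a
        # precomputed answer to the requested type.
        return type_spec(self.unread > 0)


def next_step(client: SupportsAsk) -> str:
    """Decide what to do next based on a yes/no question about the page."""
    has_unread = client.ask("Are there any unread emails in the inbox?", bool)
    if has_unread:
        return "open_first_unread"
    return "nothing_to_do"
```

Requesting a `bool` keeps the answer directly usable in control flow, with no parsing of free text.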
Example with Data Extraction
In this example, we’re extracting the latest posts sent as DMs to the user through Instagram. This is a perfect use case for ask, since the conversation will be dynamic with:
- messages sent to and from the current user
- a mix of plain text messages and shared posts
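One way to describe such a mixed conversation is a JSON Schema that tags each message with its sender and kind. The schema, its field names, and the sample data below are all made up for illustration; the small `conforms` checker covers only the keywords used here and is not a general JSON Schema validator.

```python
import json

# Hypothetical JSON Schema for a DM thread; field names are illustrative.
MESSAGE_SCHEMA = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "sender": {"enum": ["me", "them"]},
            "kind": {"enum": ["text", "shared_post"]},
            "text": {"type": "string"},
            "post_url": {"type": "string"},
        },
        "required": ["sender", "kind"],
    },
}


def conforms(messages: list) -> bool:
    """Tiny hand-rolled check covering only the keywords used above."""
    item = MESSAGE_SCHEMA["items"]
    for message in messages:
        if not isinstance(message, dict):
            return False
        if any(key not in message for key in item["required"]):
            return False
        for key, rule in item["properties"].items():
            if key not in message:
                continue
            if "enum" in rule and message[key] not in rule["enum"]:
                return False
            if rule.get("type") == "string" and not isinstance(message[key], str):
                return False
    return True


def received_posts(messages: list) -> list:
    """Keep only posts shared by the other person, in thread order."""
    return [m["post_url"] for m in messages
            if m["sender"] == "them" and m["kind"] == "shared_post"
            and "post_url" in m]


# Made-up sample of what a structured answer could look like.
sample = json.loads("""[
  {"sender": "them", "kind": "text", "text": "look at this"},
  {"sender": "them", "kind": "shared_post", "post_url": "https://example.com/p/1"},
  {"sender": "me", "kind": "shared_post", "post_url": "https://example.com/p/2"}
]""")
posts = received_posts(sample)
```

Tagging each message with `sender` and `kind` lets the caller filter for posts received by the user without a second question to the agent.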
The ask method exists on the Page and Dendrite classes. If invoked from a Dendrite instance, it is performed on the active page.