Conceptual Search
Conceptual Search is an experimental feature. The API and functionality may change in future releases.
Conceptual Search is an AI-powered feature that allows you to generate a data product plan from a natural language query. It bridges the gap between a high-level business question and a concrete, executable data product definition.
Overview
At its core, Conceptual Search uses a sophisticated, two-stage process orchestrated by AI agents and knowledge graphs:
- Knowledge Graphs: The system builds knowledge graphs for both database tables and columns. Nodes represent tables/columns, and edges connect conceptually related items based on semantic similarity and shared concepts extracted by an LLM.
- Graph-Based Retrievers: When you search, the system uses a hybrid approach of vector search and graph traversal to find relevant tables and columns, even if they are not direct keyword matches.
- AI Agents: The process is managed by two LangChain/LangGraph agents: a DataProductPlannerAgent and a DataProductBuilderAgent.
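The hybrid retrieval idea can be pictured with a minimal, self-contained sketch. The tables, embeddings, and graph below are invented purely for illustration and this is not the library's implementation: candidate tables are first ranked by vector similarity to the query, and the top hits are then expanded with their graph neighbours so that conceptually related tables surface even without a direct match.

import numpy as np

# Toy embeddings and a toy concept graph -- purely illustrative data.
embeddings = {
    "customers": np.array([0.9, 0.1, 0.0]),
    "orders": np.array([0.7, 0.6, 0.1]),
    "web_logs": np.array([0.1, 0.2, 0.9]),
}
# Edges connect conceptually related tables (as an LLM might link them).
graph = {
    "customers": {"orders"},
    "orders": {"customers"},
    "web_logs": set(),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def hybrid_search(query_vec, top_k=2):
    # Vector search: rank tables by similarity to the query embedding.
    ranked = sorted(embeddings, key=lambda t: cosine(query_vec, embeddings[t]), reverse=True)
    seeds = ranked[:top_k]
    # Graph traversal: pull in neighbours of the top hits.
    expanded = set(seeds)
    for table in seeds:
        expanded |= graph[table]
    return expanded

print(hybrid_search(np.array([0.8, 0.5, 0.0])))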
The Two-Stage Workflow
Stage 1: Planning
The goal of this stage is to convert a vague user request (e.g., "customer churn metrics") into a structured DataProductPlan, which is a well-defined list of dimensions and measures.
- Input: A natural language query.
- Agent's Task: The DataProductPlannerAgent uses its tools to find relevant database tables and existing data products.
- Output: The agent produces a DataProductPlan object, which can be reviewed and modified by the user.
The DataProductPlan generated by the AI is a starting point. It is crucial to review and validate this plan to ensure it aligns with your business requirements before proceeding to the building stage.
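Conceptually, a plan is just a list of named attributes, each classified as a dimension or a measure and carrying a description. The dataclass below is a hypothetical stand-in used only to show that shape; the real DataProductPlan has its own fields and the modification methods described later on this page.

from dataclasses import dataclass

# Hypothetical stand-in for illustration -- not intugle's actual class.
@dataclass
class PlannedAttribute:
    name: str
    description: str
    classification: str  # "Dimension" or "Measure"
    active: bool = True

# What a plan for "customer churn metrics" might conceptually contain.
example_plan = [
    PlannedAttribute("customer id", "Unique identifier for each customer", "Dimension"),
    PlannedAttribute("churn rate", "Share of customers lost in a period", "Measure"),
    PlannedAttribute("tenure months", "Months since the customer signed up", "Measure"),
]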
Stage 2: Building
This stage takes the abstract DataProductPlan and maps each attribute to a specific, physical database column, defining its logic (e.g., aggregation for measures).
- Input: The DataProductPlan from Stage 1.
- Agent's Task: The DataProductBuilderAgent iterates through each attribute in the plan, using the graph-based column retriever to find the most relevant physical column.
- Output: The collected mappings are assembled into a final ETLModel, which is a complete, machine-readable definition of the data product, ready to be used to generate a SQL query.
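To picture what the builder produces, each planned attribute ends up paired with a physical column and, for measures, an aggregation. The mapping below uses invented column names and a plain-dict structure for illustration only; the actual output is the ETLModel object.

# Illustrative only -- the real ETLModel is assembled by the DataProductBuilderAgent.
attribute_mappings = [
    {"attribute": "customer id", "column": "crm.customers.customer_id", "aggregation": None},
    {"attribute": "Total Spend", "column": "sales.orders.order_amount", "aggregation": "SUM"},
]
# From mappings like these, a SQL query can be generated, e.g.
# SELECT customer_id, SUM(order_amount) AS total_spend ... GROUP BY customer_id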
Usage Example
from intugle import DataProduct
dp = DataProduct()
# 1. Generate a plan from a natural language query
plan = await dp.plan(query="top 10 customers by their total purchase amount")
# 2. Review and modify the plan
print("Original Plan:")
plan.display()
plan.rename_attribute("total purchase amount", "Total Spend")
plan.disable_attribute("customer address") # Assuming this was in the plan
print("\nModified Plan:")
plan.display()
# 3. Create the ETL model from the modified plan
etl_model = await dp.create_etl_model_from_plan(plan)
# 4. Build the data product
result_dataset = dp.build(etl=etl_model)
# 5. Access the results
print(result_dataset.to_df())
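The calls to plan() and create_etl_model_from_plan() are awaited, so the snippet above runs as-is in a notebook or any environment with an active event loop. In a plain Python script, wrap them in an async function, for example:

import asyncio

from intugle import DataProduct

async def main():
    dp = DataProduct()
    plan = await dp.plan(query="top 10 customers by their total purchase amount")
    etl_model = await dp.create_etl_model_from_plan(plan)
    return dp.build(etl=etl_model)

result_dataset = asyncio.run(main())
print(result_dataset.to_df())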
Modifying the Data Product Plan
The DataProductPlan is not a static output; it is an interactive object you can modify to refine the AI's suggestions, correcting any misunderstandings or adding your own domain knowledge to the plan.
Here are the available methods to modify the plan:
| Method | Description | Example |
|---|---|---|
| rename_attribute(old, new) | Renames an existing attribute. | plan.rename_attribute('Customer ID', 'Client Identifier') |
| set_attribute_description(name, desc) | Updates the description of an attribute. | plan.set_attribute_description('Client Identifier', 'The unique ID for each client') |
| set_attribute_classification(name, class) | Changes the classification to 'Dimension' or 'Measure'. | plan.set_attribute_classification('Total Sales', 'Measure') |
| disable_attribute(name) | Deactivates an attribute so it won't be included in the final data product. | plan.disable_attribute('Customer Address') |
| enable_attribute(name) | Reactivates a previously disabled attribute. | plan.enable_attribute('Customer Address') |
| to_df() | Returns the final plan as a pandas DataFrame with only active attributes. | final_plan_df = plan.to_df() |
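Continuing from the usage example above, a typical review loop inspects the active attributes and then applies corrections before building:

# Inspect the current plan as a pandas DataFrame (active attributes only).
print(plan.to_df())

# Apply corrections based on the review.
plan.rename_attribute("Customer ID", "Client Identifier")
plan.set_attribute_description("Client Identifier", "The unique ID for each client")
plan.set_attribute_classification("Total Sales", "Measure")
plan.disable_attribute("Customer Address")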
Qdrant Server Requirement
Conceptual Search utilizes Qdrant as its vector database for efficient retrieval of relevant tables and columns. Therefore, a running Qdrant instance is required.
You can easily set up a Qdrant server using Docker:
docker run -d -p 6333:6333 -p 6334:6334 \
-v qdrant_storage:/qdrant/storage:z \
--name qdrant qdrant/qdrant
After starting the Qdrant server, you need to configure its URL and API key (if authorization is used) in your environment variables:
export QDRANT_URL="http://localhost:6333"
export QDRANT_API_KEY="your-qdrant-api-key" # if authorization is used
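To confirm the instance is reachable before running a search, you can optionally use the official qdrant-client package; the quick check below is illustrative and not required by Conceptual Search itself.

import os

from qdrant_client import QdrantClient

# Reads the same environment variables shown above.
client = QdrantClient(
    url=os.environ["QDRANT_URL"],
    api_key=os.environ.get("QDRANT_API_KEY"),  # None if no authorization is configured
)
print(client.get_collections())  # Returns without raising if the server is up.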
Enhancing Performance with Tavily Web Search
For better performance and more contextually aware data product plans, it is recommended to enable the Tavily web search tool. This allows the planning agent to research industry best practices and common metrics related to your query.
To enable this feature, get an API key from Tavily and set it as an environment variable:
export TAVILY_API_KEY="your-tavily-api-key"
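If you prefer configuring this from Python (for example, in a notebook), setting the variable via os.environ before creating the DataProduct is equivalent to the shell export above:

import os

# Equivalent to the shell export above; set before creating DataProduct.
os.environ["TAVILY_API_KEY"] = "your-tavily-api-key"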