Code examples
HexTractor examples module for tabular data processing.
This module provides complete examples showing how to use HexTractor to transform tabular data into heterogeneous graphs. Two main cases are demonstrated:
- Single-table data processing where all data resides in one denormalized table
- Multi-table data processing where data is split across normalized tables
The examples show common patterns like:
- Creating node and edge type parameters
- Handling multi-value columns
- De-duplicating entities
- Joining data across tables
- Building graph specifications

Example Usage:
from hextractor.examples.single_table import create_single_table_graph
graph = create_single_table_graph()
from hextractor.examples.multi_table import create_multi_table_graph
graph = create_multi_table_graph()
Data sources
Example datasets for demonstrating HexTractor functionality.
This module provides sample datasets in both single-table and multi-table formats that demonstrate common patterns in heterogeneous graph extraction.
The data represents a simple company-employee-tag relationship graph where:
- Companies have employees and tags
- Companies have attributes (employee count, revenue)
- Employees have attributes (occupation, age) and a label (promotion)
- Tags are simple identifiers

The same data is provided in two formats:
1. Single denormalized table with all relationships
2. Multiple normalized tables (companies, employees, tags, relationships)

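To make the relationship between the two formats concrete, the sketch below joins a tiny normalized layout into one denormalized row per company-employee pair. It uses plain Python lists and dictionaries as a stand-in for DataFrames; the values and the `denormalize` helper are made up for illustration and are not part of HexTractor.

```python
# Hypothetical miniature of the normalized layout (all values are made up).
companies = [
    {"company_id": 1, "company_employees": 2, "company_revenue": 500},
    {"company_id": 2, "company_employees": 1, "company_revenue": 300},
]
employees = [
    {"employee_id": 10, "employee_age": 31},
    {"employee_id": 11, "employee_age": 45},
    {"employee_id": 12, "employee_age": 28},
]
company_employees = [(1, 10), (1, 11), (2, 12)]  # junction table
company_tags = [(1, 100), (1, 101), (2, 101)]    # junction table


def denormalize(companies, employees, company_employees, company_tags):
    """Join the normalized tables into one row per company-employee pair."""
    company_by_id = {c["company_id"]: c for c in companies}
    employee_by_id = {e["employee_id"]: e for e in employees}

    # Collect each company's tags into a list (the multi-value column).
    tags_by_company = {}
    for company_id, tag_id in company_tags:
        tags_by_company.setdefault(company_id, []).append(tag_id)

    rows = []
    for company_id, employee_id in company_employees:
        # Company attributes are repeated on every row of that company.
        row = {**company_by_id[company_id], **employee_by_id[employee_id]}
        row["tags"] = list(tags_by_company.get(company_id, []))
        rows.append(row)
    return rows


single_table = denormalize(companies, employees, company_employees, company_tags)
```

Company 1 appears on two rows with identical attributes and tags, which is the duplication the single-table example later has to undo.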
get_multi_table_data()
Generate example data split across multiple normalized tables.
Returns:
| Type | Description |
|---|---|
| dict of {str: pd.DataFrame} | Dictionary containing DataFrames: companies (company information: id, employees, revenue); employees (employee information: id, occupation, age, promotion); tags (tag IDs); company_employees (company-employee relationships); company_tags (company-tag relationships) |

Source code in hextractor/examples/data.py
get_single_table_data()
Generate example data in single denormalized table format.
The table contains company data duplicated across rows, one row per company-employee relationship. Companies can have multiple tags stored as lists in the tags column.
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with columns: company_id (int): Unique company identifier; company_employees (int): Number of employees; company_revenue (int): Company revenue; employee_id (int): Unique employee identifier; employee_occupation (int): Employee occupation code; employee_age (int): Employee age; employee_promotion (int): Binary promotion label; tags (List[int]): List of tag IDs for the company |

Source code in hextractor/examples/data.py
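Because the tags column holds a list per row and company rows repeat, turning it into company-tag pairs means flattening the lists and dropping repeats. The following is a minimal plain-Python sketch of that idea; the rows and the `explode_tags` helper are hypothetical, and this is not how HexTractor implements multi-value handling.

```python
# Made-up rows shaped like the single-table layout (only relevant columns).
rows = [
    {"company_id": 1, "employee_id": 10, "tags": [100, 101]},
    {"company_id": 1, "employee_id": 11, "tags": [100, 101]},
    {"company_id": 2, "employee_id": 12, "tags": [101]},
]


def explode_tags(rows):
    """Flatten the list-valued tags column into unique (company, tag) pairs."""
    seen = set()
    pairs = []
    for row in rows:
        for tag_id in row["tags"]:
            pair = (row["company_id"], tag_id)
            if pair not in seen:  # company rows repeat, so skip duplicates
                seen.add(pair)
                pairs.append(pair)
    return pairs


company_tag_pairs = explode_tags(rows)
```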
Commands and specs
Utility functions for creating graph specifications.
This module provides helper functions for creating node and edge parameters used in both single-table and multi-table examples. These utilities help reduce code duplication and standardize parameter creation.
create_company_employee_edge_params(company_id_col='company_id', employee_id_col='employee_id')
Create edge parameters for company-employee relationships.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| company_id_col | str | Column name for company IDs | 'company_id' |
| employee_id_col | str | Column name for employee IDs | 'employee_id' |

Returns:
| Type | Description |
|---|---|
| EdgeTypeParams | EdgeTypeParams configured for company-employee edges |

Source code in hextractor/examples/utils.py
create_company_node_params(id_col='company_id', employees_col='company_employees', revenue_col='company_revenue')
Create node parameters for company entities.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| id_col | str | Column name for company ID | 'company_id' |
| employees_col | str | Column name for employee count | 'company_employees' |
| revenue_col | str | Column name for company revenue | 'company_revenue' |

Returns:
| Type | Description |
|---|---|
| NodeTypeParams | NodeTypeParams configured for company nodes |

Source code in hextractor/examples/utils.py
create_company_tag_edge_params(company_id_col='company_id', tag_id_col='tags', multivalue=True)
Create edge parameters for company-tag relationships.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| company_id_col | str | Column name for company IDs | 'company_id' |
| tag_id_col | str | Column name for tag IDs | 'tags' |
| multivalue | bool | Whether tags are stored as lists of values | True |

Returns:
| Type | Description |
|---|---|
| EdgeTypeParams | EdgeTypeParams configured for company-tag edges |

Source code in hextractor/examples/utils.py
create_dataframe_specs(name, df, node_params=None, edge_params=None)
Create DataFrame specifications for a data source.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Name identifier for the data source | required |
| df | DataFrame | Source DataFrame | required |
| node_params | Optional[Tuple[NodeTypeParams, ...]] | Tuple of NodeTypeParams for entities in the DataFrame | None |
| edge_params | Optional[Tuple[EdgeTypeParams, ...]] | Tuple of EdgeTypeParams for relationships in the DataFrame | None |

Returns:
| Type | Description |
|---|---|
| DataFrameSpecs | DataFrameSpecs configured with the provided parameters |

Source code in hextractor/examples/utils.py
create_employee_node_params(id_col='employee_id', occupation_col='employee_occupation', age_col='employee_age', promotion_col='employee_promotion')
Create node parameters for employee entities.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| id_col | str | Column name for employee ID | 'employee_id' |
| occupation_col | str | Column name for occupation code | 'employee_occupation' |
| age_col | str | Column name for employee age | 'employee_age' |
| promotion_col | str | Column name for promotion label | 'employee_promotion' |

Returns:
| Type | Description |
|---|---|
| NodeTypeParams | NodeTypeParams configured for employee nodes |

Source code in hextractor/examples/utils.py
create_tag_node_params(id_col='tags', multivalue=True)
Create node parameters for tag entities.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| id_col | str | Column name containing tag IDs | 'tags' |
| multivalue | bool | Whether tags are stored as lists of values | True |

Returns:
| Type | Description |
|---|---|
| NodeTypeParams | NodeTypeParams configured for tag nodes |

Source code in hextractor/examples/utils.py
Single-table case
Single table data processing example.
This module demonstrates how to use HexTractor to extract a heterogeneous graph from a single denormalized table containing all entities and relationships. The example shows how to handle:
- Multiple entity types in one table
- Entity de-duplication
- Multi-value columns (tags)

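Entity de-duplication matters here because a denormalized table repeats each company's attributes on every employee row. As a rough illustration of the concept only (plain Python, made-up rows, and a hypothetical `deduplicate` helper rather than HexTractor's own logic), one record per distinct ID can be kept like this:

```python
# Made-up denormalized rows: company 1 appears twice with the same attributes.
rows = [
    {"company_id": 1, "company_revenue": 500, "employee_id": 10, "employee_age": 31},
    {"company_id": 1, "company_revenue": 500, "employee_id": 11, "employee_age": 45},
    {"company_id": 2, "company_revenue": 300, "employee_id": 12, "employee_age": 28},
]


def deduplicate(rows, id_col, attr_cols):
    """Keep one attribute record per distinct ID, in first-seen order."""
    unique = {}
    for row in rows:
        # setdefault stores the attributes only the first time an ID is seen.
        unique.setdefault(row[id_col], tuple(row[col] for col in attr_cols))
    return unique


companies = deduplicate(rows, "company_id", ["company_revenue"])
employees = deduplicate(rows, "employee_id", ["employee_age"])
```

The same table yields two company records and three employee records, one set per entity type.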
create_single_table_graph(df=None)
Extract a heterogeneous graph from a single denormalized table.
This function demonstrates the complete workflow of:
1. Creating node type parameters
2. Creating edge type parameters
3. Creating DataFrame specifications
4. Creating graph specifications
5. Extracting the final graph

Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | DataFrame containing all entities and relationships. If None, uses example data from get_single_table_data(). | None |

Returns:
| Type | Description |
|---|---|
| HeterogeneousGraph | Extracted heterogeneous graph |

Examples:
Basic usage:
from hextractor.examples.single_table import create_single_table_graph
graph = create_single_table_graph()
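With custom data, where df is a DataFrame laid out like the one returned by get_single_table_data():
graph = create_single_table_graph(df)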
Source code in hextractor/examples/single_table.py
create_single_table_specs(df=None)
Create graph specifications for single table processing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | DataFrame containing all entities and relationships. If None, uses example data from get_single_table_data(). | None |

Returns:
| Type | Description |
|---|---|
| GraphSpecs | GraphSpecs configured for single table processing |

Examples:
Basic usage:
from hextractor.examples.single_table import create_single_table_specs
specs = create_single_table_specs()
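With custom data, where df is a DataFrame laid out like the one returned by get_single_table_data():
specs = create_single_table_specs(df)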
Source code in hextractor/examples/single_table.py
Multi-table case
Multi table data processing example.
This module demonstrates how to use HexTractor to extract a heterogeneous graph from multiple normalized tables. This represents a typical relational database scenario where:
- Each entity type has its own table (companies, employees, tags)
- Relationships are stored in separate junction tables
- Data is normalized to avoid duplication

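A junction table stores relationships as pairs of entity IDs, while graph libraries commonly expect edges as pairs of node positions. The sketch below shows that translation in plain Python under made-up data; `build_edge_index` is a hypothetical helper, and the two-row output format is an assumed convention rather than a documented HexTractor structure.

```python
# Made-up entity tables (just the ID columns) and a junction table.
company_ids = [7, 3]
employee_ids = [10, 11, 12]
company_employees = [(7, 10), (7, 11), (3, 12)]  # (company_id, employee_id)


def build_edge_index(source_ids, target_ids, junction):
    """Translate ID pairs into positional indices of the entity tables."""
    source_pos = {entity_id: i for i, entity_id in enumerate(source_ids)}
    target_pos = {entity_id: i for i, entity_id in enumerate(target_ids)}
    sources = [source_pos[s] for s, _ in junction]
    targets = [target_pos[t] for _, t in junction]
    # Row 0 holds source node positions, row 1 the matching target positions.
    return [sources, targets]


edge_index = build_edge_index(company_ids, employee_ids, company_employees)
```

A junction row that references an ID missing from its entity table raises a KeyError here, which is a cheap way to surface broken foreign keys early.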
create_multi_table_graph(tables=None)
Extract a heterogeneous graph from multiple normalized tables.
This function demonstrates the complete workflow of:
1. Creating node type parameters for each entity table
2. Creating edge type parameters for each relationship table
3. Creating DataFrame specifications for each table
4. Creating graph specifications combining all tables
5. Extracting the final graph

Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tables | dict of {str: pd.DataFrame} | Dictionary of DataFrames containing entities and relationships. If None, uses example data from get_multi_table_data(). | None |

Returns:
| Type | Description |
|---|---|
| HeterogeneousGraph | Extracted heterogeneous graph |

Examples:
Basic usage:
from hextractor.examples.multi_table import create_multi_table_graph
graph = create_multi_table_graph()
With custom data:
tables = {
    'companies': companies_df,
    'employees': employees_df,
    'tags': tags_df,
    'company_employees': company_employees_df,
    'company_tags': company_tags_df,
}
graph = create_multi_table_graph(tables)
Source code in hextractor/examples/multi_table.py
create_multi_table_specs(tables=None)
Create graph specifications for multi-table processing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tables | dict of {str: pd.DataFrame} | Dictionary containing DataFrames: companies (company information); employees (employee information); tags (tag information); company_employees (company-employee relationships); company_tags (company-tag relationships). If None, uses example data from get_multi_table_data(). | None |

Returns:
| Type | Description |
|---|---|
| GraphSpecs | GraphSpecs configured for multi-table processing |

Examples:
Basic usage:
from hextractor.examples.multi_table import create_multi_table_specs
specs = create_multi_table_specs()
With custom data:
tables = {
    'companies': companies_df,
    'employees': employees_df,
    'tags': tags_df,
    'company_employees': company_employees_df,
    'company_tags': company_tags_df,
}
specs = create_multi_table_specs(tables)
Source code in hextractor/examples/multi_table.py
LangChain integration
LangChain GraphDocument integration example.
This module demonstrates, with a simple example, how to build a LangChain GraphDocument. It shows how to create nodes and relationships that form a heterogeneous knowledge graph from a given text. The example creates nodes for two persons, a library, and a graph, and establishes relationships between them.
Functions:
| Name | Description |
|---|---|
| get_text | Returns a sample text describing the developers of HeXtractor and its purpose. |
| get_example_langchain_graphdocument | Creates an example GraphDocument using LangChain, with nodes and relationships based on the sample text. |

get_example_langchain_graphdocument()
Create an example LangChain GraphDocument.
The function defines nodes for persons (Filip Wójcik and Marcin Malczewski), a library (HeXtractor), and a graph (Heterogeneous knowledge graph), and establishes relationships between these nodes to form a heterogeneous knowledge graph.
Returns:
| Type | Description |
|---|---|
| list of GraphDocument | A list containing a single GraphDocument with the defined nodes and relationships. |

Source code in hextractor/examples/langchain_integration.py
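The shape of such a graph document can be pictured with a few stand-in dataclasses. The classes below only mimic the idea of typed nodes and typed relationships; they are not LangChain's classes, and the relationship labels ("DEVELOPED", "CREATES") are hypothetical, since the description above names the nodes but not the edge types.

```python
from dataclasses import dataclass, field


# Stand-ins that mimic the shape of a graph document; NOT LangChain's classes.
@dataclass(frozen=True)
class Node:
    id: str
    type: str


@dataclass(frozen=True)
class Relationship:
    source: Node
    target: Node
    type: str


@dataclass
class GraphDocument:
    nodes: list = field(default_factory=list)
    relationships: list = field(default_factory=list)


filip = Node("Filip Wójcik", "Person")
marcin = Node("Marcin Malczewski", "Person")
library = Node("HeXtractor", "Library")
graph = Node("Heterogeneous knowledge graph", "Graph")

doc = GraphDocument(
    nodes=[filip, marcin, library, graph],
    relationships=[
        Relationship(filip, library, "DEVELOPED"),   # hypothetical label
        Relationship(marcin, library, "DEVELOPED"),  # hypothetical label
        Relationship(library, graph, "CREATES"),     # hypothetical label
    ],
)

# Distinct node types are what make the graph heterogeneous.
node_types = sorted({node.type for node in doc.nodes})
```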
get_text()
Get sample text.
This function returns a sample text that describes the developers of HeXtractor and its purpose. The text is used to create nodes and relationships in the graph.
Returns:
| Type | Description |
|---|---|
| str | Sample text describing the developers and purpose of HeXtractor. |