Interactions
This section describes how HeXtractor integrates with other tools and frameworks.
LangChain Integration - Graph Documents
This module contains functions that convert a LangChain GraphDocument into a PyTorch Geometric heterogeneous graph (HeteroData). This makes it easier to use graphs extracted by a LangChain LLM pipeline in graph-based learning tasks with PyTorch Geometric.
convert_graph_document_to_hetero_data(graph_doc)
Convert a GraphDocument to a PyTorch Geometric heterogeneous graph.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `graph_doc` | `GraphDocument` | The graph document containing nodes and relationships. | required |
Returns:
| Type | Description |
|---|---|
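| `tuple[HeteroData, dict[tuple[str, str], int]]` | A tuple containing the HeteroData graph and a dictionary mapping (node_type, original_id) tuples to type-specific indices. |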
Notes
This function performs the following steps:
- Groups nodes by their type
- Creates a mapping from string node IDs to numerical indices per node type
- Creates feature matrices for each node type
- Extracts all unique edge types
- Creates edge indices for each edge type
The resulting HeteroData object follows PyTorch Geometric's format for heterogeneous graphs, where node IDs within each type start from 0.
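Because node IDs are re-indexed from 0 within each type, the returned mapping is the only link between a row index in the graph and the original GraphDocument ID. The following is a minimal sketch of inverting such a mapping, for example to translate model outputs back to node IDs. The helper `invert_node_id_mapping` and the sample values are hypothetical and not part of HeXtractor; the only assumption taken from this page is the mapping's shape, `dict[tuple[str, str], int]`.

```python
def invert_node_id_mapping(node_id_mapping):
    """Hypothetical helper: turn (node_type, original_id) -> index
    into (node_type, index) -> original_id."""
    inverse = {}
    for (node_type, original_id), index in node_id_mapping.items():
        inverse[(node_type, index)] = original_id
    return inverse


# Made-up sample with the same shape as the second element of the returned tuple.
node_id_mapping = {
    ("Person", "Alice"): 0,
    ("Person", "Bob"): 1,
    ("Company", "Acme"): 0,
}

index_to_id = invert_node_id_mapping(node_id_mapping)
print(index_to_id[("Person", 1)])
```

Note that index 0 appears once per node type, so the node type must always be part of the lookup key.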
Source code in hextractor/integrations/langchain_graphdoc.py
create_edge_indices(data, graph_doc, edge_types, node_id_mapping)
Create edge indices for each edge type in the heterogeneous graph.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `HeteroData` | The PyTorch Geometric HeteroData object to populate. | required |
| `graph_doc` | `GraphDocument` | The graph document containing nodes and relationships. | required |
| `edge_types` | `Set[Tuple[str, str, str]]` | A set of (source_type, relation_type, target_type) tuples representing all unique edge types in the graph. | required |
| `node_id_mapping` | `Dict[Tuple[str, str], int]` | A dictionary mapping (node_type, original_id) tuples to type-specific indices. | required |
Returns:
| Type | Description |
|---|---|
| `None` | This function modifies the data object in-place. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If an edge references a node that doesn't exist in the graph. |
Notes
This function creates edge indices for each edge type in the format required by PyTorch Geometric: a tensor of shape [2, num_edges] where the first row contains source node indices and the second row contains target node indices.
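The [2, num_edges] layout and the ValueError case can be illustrated without PyTorch. The sketch below uses nested Python lists in place of tensors and plain tuples in place of LangChain relationships; `build_edge_indices_sketch` and the sample data are hypothetical and only approximate the behavior described above, not the library's actual code.

```python
def build_edge_indices_sketch(edges, node_id_mapping):
    """Hypothetical sketch: group edges by (source_type, relation, target_type)
    and store [source_row, target_row] for each group.

    Each edge is a tuple:
    (source_type, source_id, relation_type, target_type, target_id).
    """
    edge_indices = {}
    for src_type, src_id, rel_type, dst_type, dst_id in edges:
        # Reject edges whose endpoints are not known nodes.
        for key in ((src_type, src_id), (dst_type, dst_id)):
            if key not in node_id_mapping:
                raise ValueError(f"Edge references unknown node: {key}")
        rows = edge_indices.setdefault((src_type, rel_type, dst_type), [[], []])
        rows[0].append(node_id_mapping[(src_type, src_id)])  # source indices
        rows[1].append(node_id_mapping[(dst_type, dst_id)])  # target indices
    return edge_indices


node_id_mapping = {
    ("Person", "Alice"): 0,
    ("Person", "Bob"): 1,
    ("Company", "Acme"): 0,
}
edges = [
    ("Person", "Alice", "WORKS_AT", "Company", "Acme"),
    ("Person", "Bob", "WORKS_AT", "Company", "Acme"),
    ("Person", "Alice", "KNOWS", "Person", "Bob"),
]

edge_indices = build_edge_indices_sketch(edges, node_id_mapping)
print(edge_indices[("Person", "WORKS_AT", "Company")])
```

In PyTorch Geometric, each such pair of rows would typically be stored as an integer tensor under `data[edge_type].edge_index`.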
create_node_features(data, nodes_by_type)
Create feature matrices for each node type in the heterogeneous graph.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `HeteroData` | The PyTorch Geometric HeteroData object to populate. | required |
| `nodes_by_type` | `Dict[str, List[str]]` | A dictionary mapping node types to lists of node IDs. | required |
Returns:
| Type | Description |
|---|---|
| `None` | This function modifies the data object in-place. |
Notes
This implementation creates simple feature matrices where each node's feature is just its index. In a real application, you would use actual node features extracted from the graph_doc's properties.
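To make the placeholder features concrete, here is a minimal sketch that builds one feature row per node, where the single feature is the node's index within its type. It uses nested lists instead of tensors, and the one-column float layout is an assumption; this page does not state the shape or dtype the library actually uses. `index_features_sketch` is a hypothetical name.

```python
def index_features_sketch(nodes_by_type):
    """Hypothetical sketch: for each node type, return a
    [num_nodes, 1] matrix whose only column is the node index."""
    return {
        node_type: [[float(index)] for index in range(len(node_ids))]
        for node_type, node_ids in nodes_by_type.items()
    }


nodes_by_type = {"Person": ["Alice", "Bob"], "Company": ["Acme"]}
features = index_features_sketch(nodes_by_type)
print(features["Person"])
```

In PyTorch Geometric, a matrix like this would conventionally be stored as a tensor under `data[node_type].x`.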
create_node_id_mapping(nodes_by_type)
Create a mapping from string node IDs to numerical indices per node type.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `nodes_by_type` | `Dict[str, List[str]]` | A dictionary mapping node types to lists of node IDs. | required |
Returns:
| Type | Description |
|---|---|
| `Dict[Tuple[str, str], int]` | A dictionary mapping (node_type, original_id) tuples to type-specific indices. |
Notes
This function ensures that node IDs within each type start from 0, which is required for PyTorch Geometric's heterogeneous graph format.
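A mapping of this shape can be sketched by enumerating each type's node list independently. The function name `create_node_id_mapping_sketch` and the sample input are hypothetical; assigning indices in list order is an assumption, since the ordering used by the library is not stated here.

```python
def create_node_id_mapping_sketch(nodes_by_type):
    """Hypothetical sketch: number the nodes of each type from 0,
    keyed by (node_type, original_id)."""
    mapping = {}
    for node_type, node_ids in nodes_by_type.items():
        # enumerate() restarts at 0 for every node type.
        for index, node_id in enumerate(node_ids):
            mapping[(node_type, node_id)] = index
    return mapping


nodes_by_type = {"Person": ["Alice", "Bob"], "Company": ["Acme"]}
node_id_mapping = create_node_id_mapping_sketch(nodes_by_type)
print(node_id_mapping)
```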
extract_edge_types(graph_doc)
Extract all unique edge types from the graph document.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `graph_doc` | `GraphDocument` | The graph document containing nodes and relationships. | required |
Returns:
| Type | Description |
|---|---|
| `Set[Tuple[str, str, str]]` | A set of (source_type, relation_type, target_type) tuples representing all unique edge types in the graph. |
Notes
Edge types in PyTorch Geometric are defined as tuples of (source_node_type, edge_type, target_node_type).
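As an illustration, the triples can be collected with a set comprehension over the relationships. The `Node` and `Relationship` dataclasses below are simplified stand-ins that mirror the `id`, `type`, `source`, and `target` fields of LangChain's graph document classes; they and `extract_edge_types_sketch` are hypothetical and exist only to keep the sketch self-contained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Stand-in for a graph document node (illustration only)."""
    id: str
    type: str


@dataclass(frozen=True)
class Relationship:
    """Stand-in for a graph document relationship (illustration only)."""
    source: Node
    target: Node
    type: str


def extract_edge_types_sketch(relationships):
    """Hypothetical sketch: collect unique
    (source_type, relation_type, target_type) triples."""
    return {(rel.source.type, rel.type, rel.target.type) for rel in relationships}


alice, bob, acme = Node("Alice", "Person"), Node("Bob", "Person"), Node("Acme", "Company")
relationships = [
    Relationship(alice, acme, "WORKS_AT"),
    Relationship(bob, acme, "WORKS_AT"),  # same edge type as above, counted once
    Relationship(alice, bob, "KNOWS"),
]

edge_types = extract_edge_types_sketch(relationships)
print(sorted(edge_types))
```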
group_nodes_by_type(graph_doc)
Group nodes by their type.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `graph_doc` | `GraphDocument` | The graph document containing nodes and relationships. | required |
Returns:
| Type | Description |
|---|---|
| `Dict[str, List[str]]` | A dictionary mapping node types to lists of node IDs. |
Notes
This function creates a mapping from node types to lists of node IDs, which is useful for further processing of nodes by their type.
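The grouping step can be sketched with a `defaultdict`. The `Node` dataclass is a simplified stand-in for a graph document node with `id` and `type` fields, and `group_nodes_by_type_sketch` is a hypothetical name; preserving input order within each type is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Stand-in for a graph document node (illustration only)."""
    id: str
    type: str


def group_nodes_by_type_sketch(nodes):
    """Hypothetical sketch: map each node type to the IDs of its nodes,
    keeping the order in which the nodes were given."""
    grouped = defaultdict(list)
    for node in nodes:
        grouped[node.type].append(node.id)
    return dict(grouped)


nodes = [Node("Alice", "Person"), Node("Acme", "Company"), Node("Bob", "Person")]
nodes_by_type = group_nodes_by_type_sketch(nodes)
print(nodes_by_type)
```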