
Introduction
HeXtractor converts tabular data into heterogeneous graphs for use with PyTorch Geometric. Heterogeneous graphs are increasingly important in graph neural network (GNN) research, yet most data is still stored in tabular form, and converting it by hand is laborious and error-prone. HeXtractor automates this conversion, streamlining the workflow for researchers and practitioners.
This package has been reviewed and published in the Journal of Open Source Software (JOSS). The paper is available at the DOI in the citation below.
Wójcik et al., (2025). HeXtractor: Extracting Heterogeneous Graphs from Structured and Textual Data for Graph Neural Networks. Journal of Open Source Software, 10(110), 8057, https://doi.org/10.21105/joss.08057
Goals of the Project
The primary objective of HeXtractor is to provide a seamless and efficient method for converting tabular data into heterogeneous graphs. This automation aims to reduce the time and effort required for data preprocessing, allowing users to focus on the development and training of their GNN models. By integrating with PyTorch Geometric, HeXtractor ensures that the generated graphs are immediately usable within this framework, further enhancing the user experience.
Key Features
- Automatic Conversion: HeXtractor automates the transformation of tabular data into heterogeneous graphs, eliminating the need for manual intervention.
- Support for Multiple Formats: The tool is capable of handling various tabular data formats, ensuring versatility and adaptability.
- Integration with PyTorch Geometric: The generated graphs are compatible with PyTorch Geometric, facilitating seamless integration into existing workflows.
- Visualization: HeXtractor leverages NetworkX and PyVis for the visualization of graphs, providing users with intuitive and interactive representations of their data.
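To make concrete what the automatic conversion replaces, here is a minimal plain-Python sketch of turning one flat table into typed nodes and typed edges by hand. It is not HeXtractor's API: the function `table_to_hetero_graph`, its parameters, and the sample rows are hypothetical names chosen for illustration only.

```python
# Illustrative only: a plain-Python sketch, NOT HeXtractor's API.
from collections import defaultdict


def table_to_hetero_graph(rows, node_columns, edge_specs):
    """Split one flat table into typed nodes and typed edges.

    rows:         list of dicts, one per table row.
    node_columns: maps a node type to the column holding its identifier.
    edge_specs:   list of (source_type, relation, target_type) triples.
    """
    node_index = defaultdict(dict)  # node type -> {identifier: integer id}
    edges = defaultdict(list)       # (src, rel, dst) -> [(src_id, dst_id), ...]

    for row in rows:
        # Give each distinct identifier a contiguous integer id per node type.
        for node_type, column in node_columns.items():
            ids = node_index[node_type]
            ids.setdefault(row[column], len(ids))
        # Each row contributes one edge per declared relation (duplicates skipped).
        for src, rel, dst in edge_specs:
            pair = (
                node_index[src][row[node_columns[src]]],
                node_index[dst][row[node_columns[dst]]],
            )
            if pair not in edges[(src, rel, dst)]:
                edges[(src, rel, dst)].append(pair)

    return dict(node_index), dict(edges)


# Hypothetical sample data: each row mixes three entity types.
rows = [
    {"author": "Ada", "paper": "P1", "venue": "JOSS"},
    {"author": "Ada", "paper": "P2", "venue": "JOSS"},
    {"author": "Bob", "paper": "P2", "venue": "JOSS"},
]
node_columns = {"author": "author", "paper": "paper", "venue": "venue"}
edge_specs = [("author", "writes", "paper"), ("paper", "published_in", "venue")]

nodes, edges = table_to_hetero_graph(rows, node_columns, edge_specs)
print(nodes["author"])                        # {'Ada': 0, 'Bob': 1}
print(edges[("author", "writes", "paper")])   # [(0, 0), (0, 1), (1, 1)]
```

Even in this small case the bookkeeping (deduplicating entities, keeping ids aligned per node type, tracking one edge list per relation) is easy to get wrong by hand, which is the step HeXtractor is meant to automate.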
Why HeXtractor?
Heterogeneous graphs are pivotal in numerous applications of graph neural networks. The manual creation of these graphs from tabular data is often cumbersome and prone to errors. HeXtractor automates this process, enabling researchers to concentrate on model development and training rather than data preprocessing. This automation not only enhances efficiency but also improves the accuracy and reliability of the resulting graphs.
Key Applications:
- Transform single tabular datasets into heterogeneous graph structures.
- Transform multiple tables into a heterogeneous graph.
- Leverage Large Language Models (LLMs) to identify and extract semantic relationships from text, converting them into heterogeneous graph representations.
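For the multi-table case, the core idea can be sketched as resolving a foreign key in one table against the rows of another to obtain edges. The sketch below uses plain Python and is not HeXtractor's API: `link_tables`, its parameters, and the `orders`/`customers` tables are hypothetical. The final step transposes the edge list into the two-row source/target layout that PyTorch Geometric uses for `edge_index`, kept here as nested lists rather than a tensor.

```python
# Illustrative only: a plain-Python sketch, NOT HeXtractor's API.
def link_tables(source_rows, target_rows, target_key, foreign_key):
    """Build edges from a source table to a target table via a foreign key.

    Node ids are row positions within each table. Returns a list of
    (source_id, target_id) pairs; source rows whose foreign key has no
    match in the target table are skipped.
    """
    target_ids = {row[target_key]: i for i, row in enumerate(target_rows)}
    edges = []
    for i, row in enumerate(source_rows):
        j = target_ids.get(row[foreign_key])
        if j is not None:
            edges.append((i, j))
    return edges


# Hypothetical tables: orders reference customers through customer_id.
customers = [
    {"customer_id": "c1", "name": "Ada"},
    {"customer_id": "c2", "name": "Bob"},
]
orders = [
    {"order_id": "o1", "customer_id": "c2"},
    {"order_id": "o2", "customer_id": "c1"},
    {"order_id": "o3", "customer_id": "c9"},  # dangling reference, skipped
]

edges = link_tables(orders, customers, target_key="customer_id", foreign_key="customer_id")
print(edges)  # [(0, 1), (1, 0)]

# Transpose into a two-row layout: first row sources, second row targets.
edge_index = [list(column) for column in zip(*edges)]
hetero_edges = {("order", "placed_by", "customer"): edge_index}
print(hetero_edges)  # {('order', 'placed_by', 'customer'): [[0, 1], [1, 0]]}
```

Silently skipping dangling references is a design choice made for brevity here; a real pipeline might instead raise an error or report the unmatched rows.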

Technologies
HeXtractor is built using a robust stack of technologies, including:
- Python: The primary programming language used for HeXtractor.
- pandas: Utilized for data manipulation and handling tabular data.
- PyTorch Geometric: A framework for creating and working with graph neural networks.
- NetworkX: Used for creating and managing complex graph structures.
- PyVis: Enables interactive visualization of graphs.
Installation
HeXtractor can be installed either from PyPI (recommended for most users) or from source code (recommended for developers or if you need the latest features).
From PyPI
To install the latest version from PyPI, run:
From Source Code
To install HeXtractor from source, you'll first need to clone the repository:
You can then install it using either conda or any standard Python virtual environment. We use Poetry as our primary dependency manager because it provides robust dependency resolution, reproducible builds, and better package management.
Option 1: Using Conda
- If you prefer Conda for environment management:
Option 2: Using Standard Python Virtual Environment
- Create and activate a virtual environment using your preferred method:
- Install Poetry and the package:
Remember to activate your environment (conda or virtual environment) whenever you want to use HeXtractor.
Examples
You can find a dedicated package with examples in the examples directory. These examples demonstrate how to use HeXtractor with various datasets and scenarios. Additionally, the notebooks directory contains Jupyter notebooks that provide detailed walkthroughs of the tool's functionality.