#OntoGenix #LLM #KnowledgeGraphs #RAG #MultiAgentSystems #Python
28 November 2025 | 12 min read | Mikel Val Calvo, PhD
OntoGenix graphical interface: Multi-agent orchestration for semi-automatic ontology engineering
Ontology engineering remains one of the most labour-intensive bottlenecks in semantic web development. Transforming structured datasets into robust knowledge graphs requires deep domain expertise, consumes 8-12 hours per ontology, and suffers from a 12-15% syntax error rate in manual workflows.
In emerging fields like neuroprosthetics, multi-agent systems, and AI-driven research, the ability to generate interoperable ontologies rapidly has become a critical constraint on innovation. Traditional tools like Protégé, whilst powerful, demand expert-level knowledge of OWL, RDF, and RML standards—creating a high barrier to entry for domain scientists.
OntoGenix addresses this challenge by combining the contextual reasoning capabilities of GPT-4 with advanced Retrieval-Augmented Generation (RAG), multi-agent orchestration, and self-repairing mechanisms—achieving a 95% reduction in development time whilst maintaining 97% mapping validity.
OntoGenix was developed as part of advanced research in knowledge graph engineering for neuroprosthetics and multi-agent systems at the University of Murcia's TECNOMOD group. The system enables rapid prototyping of domain-specific ontologies for experimental neuroscience data, facilitating FAIR (Findable, Accessible, Interoperable, Reusable) data principles.
Key achievement: Automated generation of OWL ontologies + RML mappings + materialised RDF graphs in 15-30 minutes with 80% automatic error correction.
OntoGenix implements a specialised multi-agent architecture where each agent (instantiated as a GPT-4 model with custom system prompts) handles a specific stage of the ontology pipeline. A finite state machine (FSM) orchestrates transitions, whilst RAG mechanisms enrich outputs with external knowledge from Schema.org.
Modular pipeline: From CSV input to materialised knowledge graph via specialised LLM agents
- **Genie** — Role: main controller with FSM-based workflow management
- **PromptCrafter** — Role: interactive prompt refinement assistant
- **PlanSage** — Role: high-level schema generator with external enrichment
- **OntoBuilder** — Role: OWL/Turtle ontology code generator
- **OntoMapper** — Role: RML mapping generator with self-repair
- **KGen** — Role: knowledge graph materialisation engine
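Under stated assumptions (the FSM state names appear later in this post; the exact state-to-agent mapping is an illustrative guess, not OntoGenix's API), the pipeline can be summarised as an ordered mapping from FSM state to the agent responsible for it:

```python
# Sketch: pipeline stages mapped to the agents that handle them.
# Genie orchestrates the transitions; KGen materialises the graph
# once the MAPPING stage has produced valid RML.
PIPELINE = [
    ("PROMPT_CRAFT", "PromptCrafter"),     # refine the user's domain description
    ("HIGH_LEVEL_STRUCTURE", "PlanSage"),  # draft the schema, enriched via RAG
    ("ONTOLOGY", "OntoBuilder"),           # emit OWL/Turtle ontology code
    ("MAPPING", "OntoMapper"),             # generate RML mappings, self-repair
]

def agent_for(state: str) -> str:
    """Return the agent assumed responsible for a given FSM state."""
    return dict(PIPELINE)[state]
```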
The Automata_Manager implements an FSM that enforces valid state transitions and enables rollback on errors:
```python
class Automaton:
    def __init__(self):
        self.states = {}
        self.transitions = []
        self.current_state = None
        self.last_state = None
        self._reached_states = []  # Rollback history

    def perform_transition(self, to_state):
        if self.can_transition(self.current_state, to_state):
            self._reached_states.append(to_state)
            self.last_state = self.current_state
            self.current_state = to_state
            return True
        raise InvalidTransitionException(
            f"{self.current_state.name} → {to_state.name}"
        )

    def rollback_transition(self):
        """Undo the last transition on error."""
        if self._reached_states:
            self._reached_states.pop()  # discard the failed state
            self.current_state = (
                self._reached_states[-1] if self._reached_states else None
            )
            self.last_state = (
                self._reached_states[-2] if len(self._reached_states) > 1 else None
            )
```

State sequence:

```text
NONE → PROMPT_CRAFT → HIGH_LEVEL_STRUCTURE → ONTOLOGY → ONTOLOGY_ENTITY → MAPPING
```
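How such an FSM constrains the pipeline can be exercised with a minimal, self-contained sketch (state names come from the sequence shown above; the strictly linear transition rule and the exception name are simplifying assumptions):

```python
class InvalidTransitionException(Exception):
    pass

# Linear pipeline: each state may only advance to its immediate successor.
SEQUENCE = ["NONE", "PROMPT_CRAFT", "HIGH_LEVEL_STRUCTURE",
            "ONTOLOGY", "ONTOLOGY_ENTITY", "MAPPING"]

class MiniAutomaton:
    def __init__(self):
        self.current = "NONE"
        self._history = ["NONE"]  # rollback history

    def perform_transition(self, to_state):
        # Only the successor in SEQUENCE is a valid move.
        if SEQUENCE.index(to_state) == SEQUENCE.index(self.current) + 1:
            self._history.append(to_state)
            self.current = to_state
            return True
        raise InvalidTransitionException(f"{self.current} → {to_state}")

    def rollback_transition(self):
        # Drop the failed state and restore the previous one.
        if len(self._history) > 1:
            self._history.pop()
            self.current = self._history[-1]

fsm = MiniAutomaton()
fsm.perform_transition("PROMPT_CRAFT")
fsm.perform_transition("HIGH_LEVEL_STRUCTURE")
fsm.rollback_transition()  # back to PROMPT_CRAFT after an error
```

Skipping ahead (e.g. straight to `MAPPING`) raises `InvalidTransitionException`, mirroring how the real `Automata_Manager` prevents agents from running out of order.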
OntoGenix's RAG mechanism queries Schema.org to ground LLM outputs in standardised vocabularies, reducing hallucinations and improving interoperability.
```python
class LlmPlanner(AbstractLlm):
    def __init__(self, metadata: dict):
        super().__init__(metadata)
        self.RAG = Searcher(metadata)  # RAG component

    def get_from_schema(self, full_text: str):
        # Extract entities from the generated schema
        classes = self._extract_items(full_text, 'Classes')
        properties = self._extract_items(full_text, 'Object Properties')
        entities = classes + properties

        # Query Schema.org for each entity
        schema_entities = {}
        for entity in entities:
            results = self.RAG._search_schema_org(entity)
            schema_entities[entity] = results
        return schema_entities
```
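`_extract_items` is not shown in the excerpt; a plausible sketch (the section-heading format is an assumption) collects the bullet items listed under a named section of the LLM's answer:

```python
import re

def extract_items(full_text: str, section: str) -> list:
    """Hypothetical sketch of _extract_items: gather the '- item' bullets
    under a '<section>:' (optionally bold) heading, stopping at the next
    non-bullet, non-blank line. The heading format is an assumption."""
    items, in_section = [], False
    for line in full_text.splitlines():
        stripped = line.strip()
        if re.fullmatch(rf"\**{re.escape(section)}:?\**", stripped):
            in_section = True
            continue
        if in_section:
            if stripped.startswith("-"):
                items.append(stripped.lstrip("- ").strip())
            elif stripped:  # a new heading ends the section
                in_section = False
    return items
```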
```python
class Searcher:
    def _search_schema_org(self, query: str, domain='schema.org'):
        """Query Schema.org via SerpAPI."""
        params = {
            "engine": "google",
            "q": f"{query} {domain}",
            "api_key": self.api_key
        }
        search = serpapi.search(params)
        results = search.get_dict()

        entities = {}
        for item in results.get("organic_results", []):
            if item.get('source') == 'Schema.org':
                key = item['link'].split('/')[-1]
                entities[key] = {
                    'url': item['link'],
                    'snippet': item['snippet']
                }
        return entities
```
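The post-processing step can be exercised offline. A minimal sketch, assuming SerpAPI's `organic_results` payload shape, keeps only hits whose source is Schema.org and keys them by the final URL segment:

```python
def filter_schema_org(results: dict) -> dict:
    """Keep only Schema.org hits from a SerpAPI-style result dict,
    keyed by the last path segment of the link (e.g. 'Product')."""
    entities = {}
    for item in results.get("organic_results", []):
        if item.get("source") == "Schema.org":
            key = item["link"].rstrip("/").split("/")[-1]
            entities[key] = {"url": item["link"],
                             "snippet": item.get("snippet", "")}
    return entities

# Offline example with a hand-built result payload
fake = {"organic_results": [
    {"source": "Schema.org", "link": "https://schema.org/Product",
     "snippet": "A product offered for sale."},
    {"source": "Wikipedia", "link": "https://en.wikipedia.org/wiki/Product"},
]}
```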
After retrieving external entities, PlanSage updates the schema with owl:sameAs links and standard prefixes:
**Prompt for RAG-enhanced schema update:**
Given the following schema and external entity mappings:
**Input schema:**
{initial_schema}
**Entities from Schema.org:**
{interoperable_entities}
Critically analyse and select the most appropriate external entities to:
1. Update links to external resources (owl:sameAs)
2. Add standard ontology prefixes (schema:, dbo:, foaf:)
3. Improve interoperability and semantic richness
**Output format:**
**Ontology Prefixes:**
Enumerate required prefixes for schema alignment.
**Entity Links:**
For each local entity, specify its external source mapping.
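A template like this can be filled with `str.format` before being sent to the model. A minimal sketch (template text abridged; the variable names follow the placeholders above):

```python
# Abridged version of the RAG-enhanced schema-update prompt above.
SCHEMA_UPDATE_TEMPLATE = """\
Given the following schema and external entity mappings:

**Input schema:**
{initial_schema}

**Entities from Schema.org:**
{interoperable_entities}

Select the most appropriate external entities to add owl:sameAs links
and standard ontology prefixes (schema:, dbo:, foaf:).
"""

prompt = SCHEMA_UPDATE_TEMPLATE.format(
    initial_schema="- Product\n- Review",
    interoperable_entities="Product: https://schema.org/Product",
)
```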
OntoGenix's most innovative feature is its self-repairing loop—an iterative feedback mechanism where errors from graph materialisation are fed back to the LLM for autonomous correction.
```python
class RAG_OntoMapper:
    def __init__(self, ontology_mapper, planner_builder, max_iter=2):
        self.ontology_mapper = ontology_mapper
        self.planner_builder = planner_builder
        self.max_iter = max_iter
        self.kgen = None  # Materialisation engine

    def build_kgen(self, dataset: DatasetMetadata):
        self.dataset_metadata = dataset
        self.kgen = KGen(
            config_ini_file=dataset.dataset_config_path,
            dest_nt_file=dataset.dataset_triplets_path
        )

    def generateKG(self):
        # Save the generated RML mapping
        self.ontology_mapper.save_response(
            self.ontology_mapper.get_rml_codeblock(),
            self.dataset_metadata.dataset_mapping_path,
            mode='w'
        )
        # Attempt materialisation
        self.kgen.run()  # Captures errors internally
        print("----------- MATERIALISATION OUTPUT -----------")
        print(self.kgen.error_feedback)
```

```python
import sys
import traceback
from io import StringIO

class KGen:
    def run(self):
        """Execute Morph-KGC with error capture."""
        old_stdout = sys.stdout
        sys.stdout = StringIO()  # silence Morph-KGC's console output
        try:
            self._generateKG()  # Calls morph_kgc.materialize()
            self.error_feedback = "DONE"
            self.is_done = True
        except Exception as e:
            # Capture the full traceback as feedback for the LLM
            self.error_feedback = traceback.format_exc()
            self.is_done = False
            raise e
        finally:
            sys.stdout = old_stdout
```
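The overall loop — generate a mapping, attempt materialisation, feed the captured traceback into the next generation — can be sketched offline with stub functions (`generate_mapping` and `materialise` are illustrative stand-ins, not OntoGenix's API):

```python
import traceback

def self_repair_loop(generate_mapping, materialise, max_iter=2):
    """Regenerate a mapping until materialisation succeeds or the retry
    budget is exhausted. generate_mapping(error) returns a new mapping
    with the previous traceback injected into its prompt;
    materialise(mapping) raises on failure."""
    error = None
    for attempt in range(max_iter + 1):
        mapping = generate_mapping(error)
        try:
            materialise(mapping)
            return mapping, attempt  # success
        except Exception:
            error = traceback.format_exc()  # becomes the LLM's feedback
    raise RuntimeError(f"materialisation failed after {max_iter + 1} attempts")

# Stub demo: the first attempt fails; once the error is fed back, it succeeds.
def generate_mapping(error):
    return "fixed mapping" if error else "broken mapping"

def materialise(mapping):
    if mapping == "broken mapping":
        raise ValueError("RML parse error: missing prefix")

mapping, attempts = self_repair_loop(generate_mapping, materialise)
```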
When materialisation fails, the error is structured into a corrective prompt:
**Error correction prompt for OntoMapper:**
IMPORTANT: The generated error is provided below.
Your solution MUST address this error directly.
```python
{full_error_traceback}
```
I'm providing you with:
1. The ontology defining the schema
2. The CSV data source structure
3. The previous {mapping_extension} that caused the error
Fix the {mapping_extension} content ensuring:
- All necessary prefixes are included
- Column names match CSV exactly (case-sensitive)
- URIs are properly constructed (no whitespace, valid characters)
- Datatype mappings are correct (xsd:string, xsd:integer, etc.)
- Predicates reference ontology properties accurately
**Previous mapping:**
```{mapping_extension}
{previous_mapping}
```
Generate the corrected mapping as plain text (no code blocks).
```python
async def ontology_interaction(self, max_tries=2):
    """Ontology generation with an error-correction loop."""
    error = None
    while max_tries > 0:
        try:
            # Generate the ontology, injecting the previous error (if any)
            async for chunk in self.ontology_builder.interact(
                json_data=self.json_data,
                data_description=self.plan_builder.answer,
                state="ONTOLOGY",
                error=error
            ):
                self.onto_manager.insertPlainText(chunk)

            # Validate syntax with rdflib
            self.onto_manager.text_to_graph(self.ontology_builder.answer)
            max_tries = 0  # Success
            error = None
        except SyntaxError as se:
            if isinstance(se.msg, BadSyntax):
                # Build a detailed error message for the next attempt.
                # (Hoisting the strings avoids backslashes inside f-string
                # expressions, which are invalid before Python 3.12.)
                detail = str(se).replace("\\n", "\n")
                source = se.msg._str.decode('utf-8')
                error = (
                    f"```python\n{detail}\n```\n"
                    "The error occurred in the following ontology:\n"
                    f"```turtle\n{source}\n```"
                )
                self.ontology_builder.error_message = error
            self.log.append_log(
                f"ERROR: Invalid Turtle syntax. {max_tries - 1} retries remaining"
            )
            if max_tries > 1:
                self.onto_manager.clear()  # Clear for regeneration
        max_tries -= 1
```
The error-correction behaviour described above has been analysed across 50+ ontology generation sessions.
Genie leverages OpenAI's function calling API to dynamically route tasks to specialised agents based on conversation context:
```python
import json

class Genie(AbstractLlm):
    def __init__(self, metadata: dict):
        super().__init__(metadata)
        self.available_functions = metadata['available_functions']
        self.tools = metadata['tools']  # Function definitions
        self.automata = Automata_Manager()

    async def interaction(self, prompt: str):
        try:
            self.current_prompt = self.instructions.format(
                prompt=prompt,
                current_state=self.automata.droid.current_state.name,
                transitions=self.automata.droid.possible_next_states()
            )
            # Get LLM response with function calling
            async for chunk in self.get_async_api_response(self.current_prompt):
                yield chunk
            # Execute the selected function
            if self.update_memories(self.answer):
                await self.select_process()
        except InvalidTransitionException as e:
            yield "\n" + e.message

    async def select_process(self):
        """Route to the appropriate agent via function calling."""
        if self.automata.droid.action in self.available_functions:
            self.function_calling(
                content=f'Action: {self.automata.droid.action} prompt: {self.query}',
                tools=self.tools,
                seed=self.seed
            )
            await self._process_function_response()

    async def _process_function_response(self):
        """Execute the called function."""
        for tool_call in self.tool_calls:
            function_name = tool_call.function.name
            if function_name in self.available_functions:
                function_to_call = self.available_functions[function_name]
                function_args = json.loads(tool_call.function.arguments)
                await function_to_call(**function_args)
```
```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "ontology_building",
            "description": "Generate OWL ontology in Turtle format from dataset schema"
        }
    },
    {
        "type": "function",
        "function": {
            "name": "ontology_entity_enrichment",
            "description": "Define or enrich a specific entity in the ontology",
            "parameters": {
                "type": "object",
                "properties": {
                    "prompt": {
                        "type": "string",
                        "description": "User query for entity definition"
                    },
                    "entity": {
                        "type": "string",
                        "description": "Entity name to enrich (class or property)"
                    }
                },
                "required": ["prompt", "entity"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "mapping_generation",
            "description": "Generate RML mappings from ontology to CSV data source"
        }
    }
]
```
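The routing step — reading a returned tool call, decoding its JSON arguments, and awaiting the matching coroutine — can be exercised without the OpenAI API. A sketch with a stubbed tool call (the `SimpleNamespace` objects mirror the response fields used above; the handler is an illustrative stand-in):

```python
import asyncio
import json
from types import SimpleNamespace

results = []

async def ontology_entity_enrichment(prompt: str, entity: str):
    # Stub handler standing in for the real agent call
    results.append((entity, prompt))

available_functions = {"ontology_entity_enrichment": ontology_entity_enrichment}

# Fake tool_call shaped like the OpenAI response objects used above
tool_call = SimpleNamespace(function=SimpleNamespace(
    name="ontology_entity_enrichment",
    arguments=json.dumps({"prompt": "Define the Review class",
                          "entity": "Review"}),
))

async def process(tool_calls):
    for tc in tool_calls:
        fn = available_functions.get(tc.function.name)
        if fn:
            await fn(**json.loads(tc.function.arguments))

asyncio.run(process([tool_call]))
```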
Comparative evaluation against manual ontology engineering (available at OntoGenixEvaluation repository):
| Metric | Manual Engineering | OntoGenix | Improvement |
|---|---|---|---|
| Development time | 8-12 hours | 15-30 min | 95% reduction |
| Syntax errors | 12-15% | 2-3% | 80% reduction |
| Property coverage | 65-70% | 85-90% | +25% improvement |
| Schema.org alignment | 30-40% | 75-85% | +100% improvement |
| Mapping validity | 85-88% | 97% | +10% improvement |
OntoGenix has been successfully tested across multiple domains, including the e-commerce dataset used in the command-line example below.
```text
Python 3.9+
PyQt5 >= 5.15
openai >= 1.0.0
rdflib >= 6.0.0
morph-kgc >= 2.3.0
serpapi >= 0.1.0
numpy >= 1.20
pandas >= 1.3
```
```bash
# 1. Clone repository
git clone https://github.com/tecnomod-um/OntoGenix.git
cd OntoGenix

# 2. Install dependencies
pip install -r requirements.txt

# 3. Configure API keys in config.yaml:
#      OPENAI_API_KEY: "sk-..."
#      SERPAPI_API_KEY: "..."

# 4. Launch GUI
python main.py

# GUI workflow:
# 1. Load CSV dataset via "File → Open Dataset"
# 2. Chat with PromptCrafter to describe the domain
# 3. Generate the high-level schema with PlanSage
# 4. Generate the OWL ontology with OntoBuilder
# 5. Create RML mappings with OntoMapper
# 6. Materialise the knowledge graph with KGen
# 7. Export RDF triples (N-Triples, Turtle, RDF/XML)

# Command-line usage (batch processing):
python ontogenix_cli.py \
    --dataset data/amazon_ratings.csv \
    --description "E-commerce product ratings" \
    --output graphs/amazon_kg.nt
```
Approximate OpenAI API costs per ontology generation vary with dataset size and the number of self-repair iterations. Note: costs are based on GPT-4 pricing as of November 2025; planned support for open-source LLMs will eliminate these costs.
Full workflow: From CSV upload to materialised knowledge graph in 3 minutes
OntoGenix represents a paradigm shift in ontology engineering by combining FSM-governed multi-agent orchestration, RAG grounding in Schema.org vocabularies, and self-repairing feedback loops that autonomously correct mapping errors.
"OntoGenix transforms ontology engineering from a specialist's bottleneck into an accessible workflow. By integrating GPT-4's reasoning with RAG enrichment and self-repair mechanisms, we've achieved 97% first-pass success in generating production-ready knowledge graphs—a capability that fundamentally changes how we approach semantic data integration in neuroprosthetics research."
Mikel Val Calvo, PhD
AI Research Scientist specialising in knowledge graph engineering, neuroprosthetics, and LLM-powered systems. Lead developer of OntoGenix at Universidad de Murcia's TECNOMOD research group. Former researcher at Universidad Miguel Hernández's NeuraViPeR (H2020) project. Currently developing multi-agent systems for semantic data integration at LabLENI-UPV.
Interested in collaborating on ontology engineering for neuroprosthetics, multi-agent systems, or semantic data integration? Working on similar LLM-powered knowledge graph projects? I'd love to discuss research synergies and potential collaborations.
Get in Touch