Deep Dive into the world of ODPs - Knowledge Graphs - Tech and Tools
- Carolyn Klein
- Feb 6

Property Graph Model
Structure:
- Nodes - Entities with labels and properties
- Relationships - Directed edges with labels and properties
- Properties - Key-value pairs on both nodes and edges
Example:
(carolyn:Person {
name: "Carolyn Klein",
role: "Principal Enterprise Architect",
since: 2020
})
-[:WORKS_AT {
start_date: "2020-01-15",
employment_type: "full-time"
}]->
(acme:Company {
name: "Acme Corp",
industry: "Technology"
})
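To make the structure concrete, here is a minimal Python sketch of the same example as plain data structures. The `Node` and `Relationship` class names are my own for illustration; they are not any database's API.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    """An entity with one or more labels and key-value properties."""
    labels: set[str]
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed edge that carries its own properties."""
    start: Node
    type: str
    end: Node
    properties: dict[str, Any] = field(default_factory=dict)


carolyn = Node({"Person"}, {
    "name": "Carolyn Klein",
    "role": "Principal Enterprise Architect",
    "since": 2020,
})
acme = Node({"Company"}, {"name": "Acme Corp", "industry": "Technology"})
works_at = Relationship(carolyn, "WORKS_AT", acme, {
    "start_date": "2020-01-15",
    "employment_type": "full-time",
})

# Properties live on the edge as well as on the nodes.
print(works_at.start.properties["name"], "->", works_at.end.properties["name"])
print(works_at.properties["employment_type"])
```

The point to notice is that the relationship is a first-class object with its own properties, which is the main thing that separates this model from RDF triples.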
When to use:
- Real-time analytics and traversals
- Operational workloads (fraud detection, recommendations)
- Multi-hop reasoning (finding patterns across many relationships)
- AI/ML integration
- When performance at scale is critical
Popular databases: Neo4j, TigerGraph, AWS Neptune, Azure Cosmos DB
---
RDF Model (Resource Description Framework)
Structure:
- Everything is a triple: (Subject, Predicate, Object)
- Based on W3C standards
- Designed for semantic web and linked data
- Supports ontologies and formal reasoning
Example:
<Carolyn> <worksAt> <AcmeCorp> .
<Carolyn> <hasRole> "Principal Enterprise Architect" .
<AcmeCorp> <industry> "Technology" .
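As a rough illustration of "everything is a triple", the three statements above can be held in a set of tuples and queried by pattern. This is a toy sketch in Python, not an RDF library; the `match` helper is hypothetical and `None` stands in for a query variable.

```python
from typing import Optional

Triple = tuple[str, str, str]

triples: set[Triple] = {
    ("Carolyn", "worksAt", "AcmeCorp"),
    ("Carolyn", "hasRole", "Principal Enterprise Architect"),
    ("AcmeCorp", "industry", "Technology"),
}


def match(s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> list[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Everything known about Carolyn.
print(match(s="Carolyn"))
# Who works at AcmeCorp?
print([s for s, _, _ in match(p="worksAt", o="AcmeCorp")])
```

Note that there is nowhere to hang a start date on the `worksAt` statement itself; in RDF that takes extra modeling, whereas a property graph stores it on the edge.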
When to use:
- Semantic precision and ontology alignment
- Formal reasoning and inference
- Cross-system interoperability (linking data from multiple sources)
- Standards compliance (W3C, OWL, RDFS)
- Knowledge management and scholarly data
- When you need to integrate with existing RDF/linked data
Popular databases: Ontotext GraphDB, Stardog, AllegroGraph, AWS Neptune
---
The key difference:
- Property Graphs: Optimized for performance, analytics, and operations
- RDF: Optimized for semantics, reasoning, and interoperability
Most enterprises choose Property Graphs because they tend to be faster and more flexible for operational use cases. RDF shines in academia, research, and heavily regulated industries that need semantic precision.
---
Part 2: Major Graph Databases (2026 Landscape)
Neo4j (Property Graph)
The market leader. Think "MySQL of graph databases."
Strengths:
- Native graph storage and processing (fast traversals)
- Excellent tooling and visualization (Neo4j Bloom, Browser)
- Massive ecosystem and community
- Comprehensive documentation
- Easy to get started
- ACID transactions
Query Language: Cypher (declarative, SQL-like)
Example Query:
// Find all services Carolyn owns that depend on a PostgreSQL database
MATCH (carolyn:Person {name: "Carolyn"})-[:OWNS]->(service:Service)
-[:DEPENDS_ON]->(db:Database {type: "PostgreSQL"})
RETURN service.name, db.name
Best for:
- Startups and mid-size companies
- When you need great developer experience
- Pattern matching and fraud detection
- Recommendation engines
- Network analysis
Limitations:
- Can be expensive at enterprise scale
- Horizontal scaling requires enterprise edition
- Performance can degrade with massive graphs (billions of edges)
Who uses it: eBay, Walmart, UBS, NASA
---
TigerGraph (Property Graph)
The performance beast. Built for massive scale and real-time analytics.
Strengths:
- Native parallel processing (handles trillions of relationships)
- 2x to 8000x faster than competitors (per their benchmarks)
- Deep link analysis (can traverse 10+ hops efficiently)
- Real-time analytics at scale
- Distributed architecture
Query Language: GSQL (SQL-like, designed for complex analytics)
Example Query:
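// Flag accounts that received a transfer over 5,000 (one hop shown; chain the pattern or loop to go 5 hops deep)
CREATE QUERY findFraudRing() SYNTAX v2 {
// Accumulators must be declared before use; this one records the senders seen by each receiver
ListAccum<VERTEX> @path;
START = {Account.*};
accounts = SELECT t FROM START:s
-(TRANSFERRED_TO>:e)- Account:t
WHERE e.amount > 5000
ACCUM t.@path += s;
PRINT accounts;
}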
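To show the multi-hop ring idea outside GSQL, here is a minimal Python sketch that looks for cycles of large transfers within a hop limit. The account names, amounts, and the `find_rings` helper are all made up for illustration, and a real engine would do this in parallel across a distributed graph.

```python
from collections import defaultdict

# Hypothetical transfers: (from_account, to_account, amount).
transfers = [
    ("A", "B", 9000), ("B", "C", 7500), ("C", "A", 6200),  # a 3-hop ring
    ("C", "D", 120),                                       # below threshold
    ("D", "E", 8000),
]


def find_rings(transfers, min_amount=5000, max_hops=5):
    """Return cycles of large transfers, each listed from its smallest account."""
    graph = defaultdict(list)
    for src, dst, amount in transfers:
        if amount > min_amount:
            graph[src].append(dst)

    rings = []

    def walk(start, current, path):
        for nxt in graph[current]:
            if nxt == start:
                rings.append(path + [start])
            # Only visit accounts "larger" than the start so each ring is found once.
            elif nxt > start and nxt not in path and len(path) < max_hops:
                walk(start, nxt, path + [nxt])

    for account in sorted(graph):
        walk(account, account, [account])
    return rings


print(find_rings(transfers))  # [['A', 'B', 'C', 'A']]
```

The small transfer from C to D is filtered out before traversal, which mirrors the `WHERE e.amount > 5000` condition in the query.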
Best for:
- Large enterprises with massive connected datasets
- Fraud detection at scale (financial services)
- Supply chain optimization
- Cybersecurity (threat detection)
- When you need deep multi-hop traversals in real-time
Limitations:
- Steeper learning curve
- Smaller ecosystem than Neo4j
- GSQL is proprietary
Who uses it: Mastercard, Intuit, UnitedHealth Group
---
AWS Neptune (Property Graph + RDF)
The cloud-native choice. Fully managed, integrated with AWS ecosystem.
Strengths:
- Fully managed (no ops overhead)
- Supports both property graphs AND RDF
- Seamless AWS integration (Lambda, S3, IAM, CloudWatch)
- High availability and automated backups
- Pay-as-you-go pricing
Query Languages:
- Gremlin (for property graphs)
- SPARQL (for RDF)
- openCypher (for property graphs, generally available since 2022)
Example Query (Gremlin):
// Find Carolyn's services and their dependencies
g.V().has('person', 'name', 'Carolyn')
.out('owns')
.as('service')
.out('depends_on')
.as('dependency')
.select('service', 'dependency')
Best for:
- Organizations already on AWS
- When you want managed infrastructure
- Projects needing both RDF and property graph models
- Rapid prototyping without infrastructure setup
Limitations:
- Less performant than native solutions at massive scale
- Gremlin is harder to learn than Cypher
- Vendor lock-in
Who uses it: Companies in AWS ecosystem, government agencies
---
Other Notable Tools
PuppyGraph (2026 newcomer)
Query engine that sits on top of existing data lakes (S3, Delta Lake, Iceberg). You don't migrate data - it queries in place. Great for enterprises with data already in data lakes.
Ontotext GraphDB (RDF specialist)
The gold standard for semantic knowledge graphs. Used in pharma, publishing, and government. Best RDF performance.
Azure Cosmos DB for Apache Gremlin
Microsoft's cloud-native graph database. Good if you're in Azure ecosystem.
Memgraph (Property Graph)
High-performance in-memory graph database. Great for real-time streaming analytics.
---
Part 3: Query Languages Deep Dive
Cypher (Neo4j, AWS Neptune OpenCypher)
Philosophy: Declarative pattern matching (like SQL).
Syntax: Uses ASCII art to represent patterns.
// Create nodes and relationships
CREATE (c:Person {name: "Carolyn", role: "Principal Enterprise Architect"})
CREATE (kg:Skill {name: "Knowledge Graphs"})
CREATE (c)-[:LEARNING {status: "in-progress"}]->(kg)
// Pattern matching
MATCH (person:Person)-[:WORKS_AT]->(company:Company)
WHERE company.industry = "Technology"
RETURN person.name, company.name
// Multi-hop traversal
MATCH path = (app:Application)-[:DEPENDS_ON*1..5]->(db:Database)
WHERE db.type = "PostgreSQL"
RETURN path
// Aggregation
MATCH (p:Person)-[:OWNS]->(s:Service)
RETURN p.name, count(s) AS service_count
ORDER BY service_count DESC
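For readers who want to see what a variable-length pattern like `[:DEPENDS_ON*1..5]` asks the database to do, here is a small Python sketch that enumerates paths of 1 to 5 hops. The system names are hypothetical, and the sketch forbids repeated nodes, which is a simplification of how Cypher actually constrains paths.

```python
from collections import defaultdict

# Hypothetical DEPENDS_ON edges between systems.
depends_on = [
    ("Portal", "OrderService"),
    ("OrderService", "BillingService"),
    ("BillingService", "CustomerDB"),
    ("Portal", "CacheDB"),
]
databases = {"CustomerDB": "PostgreSQL", "CacheDB": "Redis"}


def paths_to_databases(start, db_type, min_hops=1, max_hops=5):
    """Enumerate paths of min_hops..max_hops edges that end at a matching database."""
    graph = defaultdict(list)
    for src, dst in depends_on:
        graph[src].append(dst)

    found = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        hops = len(path) - 1
        if hops >= min_hops and databases.get(path[-1]) == db_type:
            found.append(path)
        if hops < max_hops:
            for nxt in graph[path[-1]]:
                if nxt not in path:  # avoid walking in circles
                    stack.append(path + [nxt])
    return found


print(paths_to_databases("Portal", "PostgreSQL"))
# [['Portal', 'OrderService', 'BillingService', 'CustomerDB']]
```

Writing this by hand for every question gets old quickly, which is the argument for a declarative pattern language in the first place.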
Pros:
- Intuitive and readable
- Great for complex pattern matching
- Easy to learn if you know SQL
Cons:
- Originated at Neo4j; now open through openCypher and a major influence on the ISO GQL standard, but feature support still varies by vendor
- Limited procedural capabilities
---
GSQL (TigerGraph)
Philosophy: SQL-like with graph extensions and procedural capabilities.
Syntax: More verbose but powerful for analytics.
// Create a vertex
CREATE VERTEX Person (
PRIMARY_ID name STRING,
role STRING,
department STRING
)
// Create an edge
CREATE DIRECTED EDGE WorksAt (
FROM Person,
TO Company,
start_date DATETIME
)
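// Query with a two-hop pattern (multi-hop patterns need GSQL syntax v2)
CREATE QUERY findColleagues(VERTEX<Person> inputPerson) SYNTAX v2 {
// A FROM clause starts from a vertex set, so wrap the input vertex in one
Start = {inputPerson};
colleagues = SELECT colleague
FROM Start:p -(WorksAt>)- Company:c -(<WorksAt)- Person:colleague
WHERE colleague != p;
PRINT colleagues;
}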
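The colleague lookup is a two-hop pattern: out from a person to a company, then back in to everyone else who works there. A minimal Python sketch of that logic, using invented names and a hypothetical `find_colleagues` helper, might look like this:

```python
from collections import defaultdict

# Hypothetical WorksAt edges: (person, company).
works_at = [
    ("Carolyn", "Acme Corp"),
    ("Dev", "Acme Corp"),
    ("Priya", "Acme Corp"),
    ("Sam", "Globex"),
]


def find_colleagues(person):
    """Hop person -> company, then company -> other people (excluding the input)."""
    employer_of = defaultdict(set)
    staff_of = defaultdict(set)
    for who, company in works_at:
        employer_of[who].add(company)
        staff_of[company].add(who)

    colleagues = set()
    for company in employer_of[person]:
        colleagues |= staff_of[company]
    colleagues.discard(person)  # same role as WHERE colleague != p
    return sorted(colleagues)


print(find_colleagues("Carolyn"))  # ['Dev', 'Priya']
```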
Pros:
- Extremely powerful for complex analytics
- Handles procedural logic well
- Optimized for parallel processing
Cons:
- Steeper learning curve
- More verbose than Cypher
- TigerGraph-specific
---
Gremlin (AWS Neptune, Azure Cosmos DB, JanusGraph)
Philosophy: Imperative graph traversal (functional programming style).
Syntax: Method chaining (like Java streams or JavaScript promises).
// Create vertices and edges
g.addV('Person').property('name', 'Carolyn').property('role', 'Principal Enterprise Architect').as('c')
.addV('Skill').property('name', 'Knowledge Graphs').as('kg')
.addE('LEARNING').from('c').to('kg').property('status', 'in-progress')
// Traversal
g.V().hasLabel('Person')
.has('name', 'Carolyn')
.out('owns')
.hasLabel('Service')
.values('name')
// Multi-hop with filtering (1 to 5 hops; without emit(), only vertices exactly 5 hops away are returned)
g.V().has('type', 'Application')
.repeat(out('depends_on')).emit().times(5)
.has('type', 'Database')
.has('db_type', 'PostgreSQL')
.path()
// Aggregation
g.V().hasLabel('Person')
.project('name', 'service_count')
.by('name')
.by(out('owns').count())
.order().by('service_count', desc)
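If the method-chaining style feels opaque, it can help to see how little machinery is behind it. The sketch below is a toy Python imitation of a traversal, not TinkerPop or gremlinpython; the `Graph` and `Traversal` classes and the sample data are invented for illustration. Each step takes the current set of vertices and returns a new traversal.

```python
class Traversal:
    """A tiny, illustrative stand-in for a Gremlin traversal (not TinkerPop)."""

    def __init__(self, graph, current):
        self.graph = graph
        self.current = list(current)

    def has(self, key, value):
        # Keep only vertices whose property matches.
        keep = [v for v in self.current if self.graph.vertices[v].get(key) == value]
        return Traversal(self.graph, keep)

    def out(self, label):
        # Follow outgoing edges with the given label.
        nxt = [dst for v in self.current
               for (src, lbl, dst) in self.graph.edges
               if src == v and lbl == label]
        return Traversal(self.graph, nxt)

    def values(self, key):
        return [self.graph.vertices[v][key] for v in self.current]


class Graph:
    def __init__(self, vertices, edges):
        self.vertices = vertices  # id -> property dict
        self.edges = edges        # list of (src, label, dst)

    def V(self):
        return Traversal(self, self.vertices.keys())


g = Graph(
    vertices={
        1: {"label": "Person", "name": "Carolyn"},
        2: {"label": "Service", "name": "billing-api"},
        3: {"label": "Service", "name": "orders-api"},
    },
    edges=[(1, "owns", 2), (1, "owns", 3)],
)

names = g.V().has("label", "Person").has("name", "Carolyn").out("owns").values("name")
print(names)  # ['billing-api', 'orders-api']
```

Real Gremlin traversals are lazy and far richer, but the mental model of "a stream of vertices flowing through steps" carries over.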
Pros:
- Part of Apache TinkerPop (cross-database standard)
- Works with multiple graph databases
- Powerful for complex traversals
- Available in multiple languages (Java, Python, JavaScript)
Cons:
- Harder to read than Cypher
- Method chaining can get unwieldy
- Less intuitive for SQL developers
---
SPARQL (RDF databases)
Philosophy: Query RDF triples with semantic reasoning.
Syntax: SQL-like with triple patterns.
# Find all people who work at technology companies
PREFIX : <http://example.org/>
SELECT ?person ?company
WHERE {
?person :worksAt ?company .
?company :industry "Technology" .
}
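# Path queries (transitive relationships)
PREFIX : <http://example.org/>
SELECT ?person ?manager
WHERE {
?person :reportsTo+ ?manager .
}
# Inference example (indirect subclasses match only when RDFS inference is enabled in the store)
PREFIX : <http://example.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?person ?skill
WHERE {
?person :hasSkill ?skill .
?skill rdfs:subClassOf :TechnicalSkill .
}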
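The `+` in `:reportsTo+` means "one or more hops", so the query returns every manager up the chain, not only the direct one. Here is a minimal Python sketch of that transitive walk; the names and the `managers_of` helper are made up for illustration.

```python
# Hypothetical :reportsTo triples as (person, manager) pairs.
reports_to = {("Dev", "Priya"), ("Priya", "Carolyn"), ("Sam", "Carolyn")}


def managers_of(person):
    """All managers reachable over one or more reportsTo hops (the '+' operator)."""
    seen = set()
    frontier = [person]
    while frontier:
        current = frontier.pop()
        for subordinate, manager in reports_to:
            if subordinate == current and manager not in seen:
                seen.add(manager)
                frontier.append(manager)
    return sorted(seen)


print(managers_of("Dev"))  # ['Carolyn', 'Priya']
```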
Pros:
- W3C standard
- Supports reasoning and inference
- Federated queries (query across multiple endpoints)
- Ontology support
Cons:
- More complex syntax
- Not as intuitive as Cypher
- Slower for operational use cases
---
Part 4: How to Choose (Enterprise Decision Framework)
Decision Tree
Question 1: Do you need semantic reasoning, ontologies, or RDF standards?
- Yes → Use RDF (Ontotext GraphDB, Stardog, AWS Neptune with SPARQL)
- No → Use Property Graphs → Continue to Q2
Question 2: Are you already locked into a cloud provider?
- AWS → AWS Neptune (Gremlin/OpenCypher)
- Azure → Azure Cosmos DB (Gremlin)
- No preference/On-prem → Continue to Q3
Question 3: What's your scale and performance requirement?
- Massive scale (trillions of edges), real-time analytics → TigerGraph
- Medium scale, ease of use, great tooling → Neo4j
- Existing data in data lakes, no migration → PuppyGraph
Question 4: What's your team's skill level?
- New to graphs, want easy onboarding → Neo4j (best docs, community)
- Experienced with distributed systems → TigerGraph
- Already using AWS/cloud-native → Neptune
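The first three questions can be read as a short chain of checks. Here is a hedged Python sketch of that logic; the `recommend` function and its parameter names are my own, and checking data-lake location before scale within Question 3 is an assumption, since the framework lists those options side by side. Question 4 (team skills) is left out because it is more of a tiebreaker than a filter.

```python
def recommend(needs_rdf: bool, cloud: str = "none", scale: str = "medium",
              data_in_lake: bool = False) -> str:
    """Walk Questions 1 to 3 in order and return the first matching suggestion."""
    # Question 1: semantics and standards come first.
    if needs_rdf:
        return "RDF store (Ontotext GraphDB, Stardog, or AWS Neptune with SPARQL)"
    # Question 2: an existing cloud commitment narrows the choice.
    if cloud == "aws":
        return "AWS Neptune"
    if cloud == "azure":
        return "Azure Cosmos DB (Gremlin)"
    # Question 3: otherwise decide on data location and scale (order is an assumption).
    if data_in_lake:
        return "PuppyGraph"
    if scale == "massive":
        return "TigerGraph"
    return "Neo4j"


print(recommend(needs_rdf=False, cloud="none", scale="massive"))  # TigerGraph
```

Treat the output as a starting shortlist, not a verdict; real selections usually involve a proof-of-concept against your own data.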
---
Part 5: Enterprise Considerations
Licensing & Cost:
- Neo4j: Community (free), Enterprise ($$$$), Cloud ($$$)
- TigerGraph: Developer (free), Enterprise (contact sales)
- AWS Neptune: Pay-as-you-go ($$$, can get expensive at scale)
Operational Overhead:
- Managed: AWS Neptune, Neo4j Aura (cloud) → Low ops
- Self-hosted: Neo4j, TigerGraph, Memgraph → Higher ops
Integration:
- All major graph databases integrate with: Kafka, Spark, ETL tools, BI platforms
- AWS Neptune has best AWS integration
- Neo4j has largest connector ecosystem
Skills Availability:
- Cypher (Neo4j): Largest community, easiest to hire for
- GSQL (TigerGraph): Smaller talent pool
- Gremlin: Moderate availability
---
Part 6: Getting Started (Practical Next Steps)
For Enterprise Architects:
1. Start with Neo4j Desktop (free)
- Download: https://neo4j.com/download/
- Spin up a local instance
- Load sample data
- Play with Cypher queries
2. Model your enterprise architecture
- Systems → Nodes
- Dependencies → Edges
- Properties → Metadata (owner, criticality, tech stack)
3. Ask architectural questions:
// What breaks if this database goes down?
MATCH (db:Database {name: "CustomerDB"})<-[:DEPENDS_ON]-(service:Service)
RETURN service.name
// Who owns the most critical systems?
MATCH (person:Person)-[:OWNS]->(system:System)
WHERE system.criticality = "high"
RETURN person.name, count(system) AS critical_systems
ORDER BY critical_systems DESC
4. Explore real-world datasets:
- Movie database (comes with Neo4j)
- Panama Papers (public Neo4j dataset)
- Your own systems (export from CMDB/ServiceNow)
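Before installing anything, you can prototype the two architectural questions from step 3 against a CMDB export using plain Python. The sketch below uses invented system names and hypothetical helpers (`blast_radius`, `critical_owners`). Unlike the single-hop Cypher query, `blast_radius` follows dependencies transitively, which is usually what "what breaks?" really means.

```python
from collections import Counter, defaultdict

# Hypothetical inventory: (dependent, dependency) pairs and (owner, system) pairs.
depends_on = [
    ("orders-api", "CustomerDB"),
    ("billing-api", "CustomerDB"),
    ("portal", "orders-api"),
    ("reports", "WarehouseDB"),
]
owns = [("Carolyn", "orders-api"), ("Carolyn", "CustomerDB"), ("Dev", "reports")]
criticality = {"orders-api": "high", "CustomerDB": "high", "reports": "low"}


def blast_radius(system):
    """Everything that depends on `system`, directly or through other systems."""
    dependents = defaultdict(set)
    for dependent, dependency in depends_on:
        dependents[dependency].add(dependent)

    affected, frontier = set(), [system]
    while frontier:
        for dependent in dependents[frontier.pop()]:
            if dependent not in affected:
                affected.add(dependent)
                frontier.append(dependent)
    return sorted(affected)


def critical_owners():
    """Count high-criticality systems per owner, highest first."""
    counts = Counter(p for p, s in owns if criticality.get(s) == "high")
    return counts.most_common()


print(blast_radius("CustomerDB"))  # ['billing-api', 'orders-api', 'portal']
print(critical_owners())           # [('Carolyn', 2)]
```

Once the questions outgrow a script like this, that is a reasonable signal to move the data into a graph database.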
---
Next (and final) step: Task #5 - Build a simple Knowledge Graph yourself. You'll pick a tool and create a proof-of-concept with your own data.
The tools exist. The languages are learnable. The only thing left is to build something and watch it fail in exciting new ways. Welcome to engineering.