
Deep Dive into the world of ODPs - Knowledge Graphs - Tech and Tools

  • Writer: Carolyn Klein
  • Feb 6
  • 7 min read


Property Graph Model


Structure:

- Nodes - Entities with labels and properties

- Relationships - Directed edges with labels and properties

- Properties - Key-value pairs on both nodes and edges


Example:

(carolyn:Person {
  name: "Carolyn Klein",
  role: "Principal Enterprise Architect",
  since: 2020
})
-[:WORKS_AT {
  start_date: "2020-01-15",
  employment_type: "full-time"
}]->
(acme:Company {
  name: "Acme Corp",
  industry: "Technology"
})
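To make the three building blocks concrete, here is a minimal in-memory sketch using only the Python standard library. The class names `Node`, `Relationship`, and `PropertyGraph` are hypothetical and are not any database's API. They simply mirror the structure above: labeled nodes, typed directed relationships, and key-value properties on both.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    node_id: str
    labels: set[str]
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    start: str      # id of the source node (relationships are directed)
    rel_type: str   # relationship label, e.g. "WORKS_AT"
    end: str        # id of the target node
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class PropertyGraph:
    nodes: dict[str, Node] = field(default_factory=dict)
    relationships: list[Relationship] = field(default_factory=list)

    def add_node(self, node_id: str, *labels: str, **props: Any) -> Node:
        node = Node(node_id, set(labels), dict(props))
        self.nodes[node_id] = node
        return node

    def add_relationship(self, start: str, rel_type: str, end: str, **props: Any) -> Relationship:
        # Both endpoints must exist: a relationship always connects two nodes
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("both endpoints must exist before connecting them")
        rel = Relationship(start, rel_type, end, dict(props))
        self.relationships.append(rel)
        return rel

    def outgoing(self, node_id: str, rel_type: str) -> list[Relationship]:
        return [r for r in self.relationships if r.start == node_id and r.rel_type == rel_type]


# The example above, rebuilt with this sketch
g = PropertyGraph()
g.add_node("carolyn", "Person", name="Carolyn Klein",
           role="Principal Enterprise Architect", since=2020)
g.add_node("acme", "Company", name="Acme Corp", industry="Technology")
g.add_relationship("carolyn", "WORKS_AT", "acme",
                   start_date="2020-01-15", employment_type="full-time")
```

Because relationships are first-class objects here, `WORKS_AT` can record its own start date and employment type without an extra node in between.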


When to use:

- Real-time analytics and traversals

- Operational workloads (fraud detection, recommendations)

- Multi-hop reasoning (finding patterns across many relationships)

- AI/ML integration

- When performance at scale is critical


Popular databases: Neo4j, TigerGraph, AWS Neptune, Azure Cosmos DB


---

RDF Model (Resource Description Framework)


Structure:

- Everything is a triple: (Subject, Predicate, Object)

- Based on W3C standards

- Designed for semantic web and linked data

- Supports ontologies and formal reasoning


Example:

@prefix : <http://example.org/> .

:Carolyn :worksAt :AcmeCorp .
:Carolyn :hasRole "Principal Enterprise Architect" .
:AcmeCorp :industry "Technology" .
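A minimal Python sketch of the triple model follows. It assumes plain strings in place of full IRIs and literals. Every fact is a `(subject, predicate, object)` tuple, and a query is a list of triple patterns in which terms starting with `?` are variables. The function name `match_patterns` is hypothetical. It shows, in simplified form, how a SPARQL basic graph pattern is answered by joining variable bindings across patterns.

```python
Triple = tuple[str, str, str]

triples: set[Triple] = {
    ("Carolyn", "worksAt", "AcmeCorp"),
    ("Carolyn", "hasRole", "Principal Enterprise Architect"),
    ("AcmeCorp", "industry", "Technology"),
}


def is_var(term: str) -> bool:
    return term.startswith("?")


def match_patterns(patterns: list[Triple], data: set[Triple]) -> list[dict[str, str]]:
    """Return every variable binding that satisfies all patterns at once."""
    bindings: list[dict[str, str]] = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in data:
                candidate = dict(binding)
                ok = True
                for term, value in zip(pattern, triple):
                    if is_var(term):
                        # A variable must agree with any value it was already bound to
                        if candidate.setdefault(term, value) != value:
                            ok = False
                            break
                    elif term != value:
                        ok = False
                        break
                if ok:
                    next_bindings.append(candidate)
        bindings = next_bindings
    return bindings


# Roughly: SELECT ?person ?company WHERE { ?person :worksAt ?company . ?company :industry "Technology" }
results = match_patterns(
    [("?person", "worksAt", "?company"), ("?company", "industry", "Technology")],
    triples,
)
```

The shared variable `?company` is what joins the two patterns. Real RDF stores use indexes over subject, predicate, and object instead of scanning every triple.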


When to use:

- Semantic precision and ontology alignment

- Formal reasoning and inference

- Cross-system interoperability (linking data from multiple sources)

- Standards compliance (W3C, OWL, RDFS)

- Knowledge management and scholarly data

- When you need to integrate with existing RDF/linked data


Popular databases: Ontotext GraphDB, Stardog, AllegroGraph, AWS Neptune


---

The key difference:

- Property Graphs: Optimized for performance, analytics, and operations

- RDF: Optimized for semantics, reasoning, and interoperability


In practice, most enterprises favor Property Graphs because they tend to be faster and more flexible for operational use cases. RDF shines in academia, research, and heavily regulated industries that need semantic precision.


---

Part 2: Major Graph Databases (2026 Landscape)


Neo4j (Property Graph)


The market leader. Think "MySQL of graph databases."


Strengths:

- Native graph storage and processing (fast traversals)

- Excellent tooling and visualization (Neo4j Bloom, Browser)

- Massive ecosystem and community

- Comprehensive documentation

- Easy to get started

- ACID transactions


Query Language: Cypher (declarative, SQL-like)


Example Query:

// Find all services Carolyn owns that depend on a PostgreSQL database
MATCH (carolyn:Person {name: "Carolyn"})-[:OWNS]->(service:Service)
      -[:DEPENDS_ON]->(db:Database {type: "PostgreSQL"})
RETURN service.name, db.name
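Conceptually, the `MATCH` clause above finds every binding of `carolyn`, `service`, and `db` for which all edges in the pattern exist. The following Python sketch evaluates the same two-hop pattern over a small in-memory edge list. The node ids, names, and data are hypothetical. A real engine like Neo4j uses its native storage and indexes rather than scanning lists.

```python
nodes = {
    "carolyn": {"label": "Person", "name": "Carolyn"},
    "billing": {"label": "Service", "name": "billing-api"},
    "search": {"label": "Service", "name": "search-api"},
    "pg1": {"label": "Database", "name": "billing-db", "type": "PostgreSQL"},
    "es1": {"label": "Database", "name": "search-index", "type": "Elasticsearch"},
}
edges = [
    ("carolyn", "OWNS", "billing"),
    ("carolyn", "OWNS", "search"),
    ("billing", "DEPENDS_ON", "pg1"),
    ("search", "DEPENDS_ON", "es1"),
]


def neighbors(node_id: str, rel_type: str, label: str) -> list[str]:
    """Follow outgoing edges of one type, keeping targets with the given label."""
    return [dst for src, rel, dst in edges
            if src == node_id and rel == rel_type and nodes[dst]["label"] == label]


def services_on_postgres(person_name: str) -> list[tuple[str, str]]:
    rows = []
    # (carolyn:Person {name: ...})
    for pid, props in nodes.items():
        if props["label"] != "Person" or props.get("name") != person_name:
            continue
        # -[:OWNS]->(service:Service)
        for service in neighbors(pid, "OWNS", "Service"):
            # -[:DEPENDS_ON]->(db:Database {type: "PostgreSQL"})
            for db in neighbors(service, "DEPENDS_ON", "Database"):
                if nodes[db].get("type") == "PostgreSQL":
                    rows.append((nodes[service]["name"], nodes[db]["name"]))
    return rows
```

Each nested loop corresponds to one hop of the pattern, which is why Cypher's ASCII-art arrows map so directly onto traversal code.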


Best for:

- Startups and mid-size companies

- When you need great developer experience

- Pattern matching and fraud detection

- Recommendation engines

- Network analysis


Limitations:

- Can be expensive at enterprise scale

- Horizontal scaling requires enterprise edition

- Performance can degrade with massive graphs (billions of edges)


Who uses it: eBay, Walmart, UBS, NASA


---

TigerGraph (Property Graph)


The performance beast. Built for massive scale and real-time analytics.


Strengths:

- Native parallel processing (handles trillions of relationships)

- 2x to 8000x faster than competitors (per their benchmarks)

- Deep link analysis (can traverse 10+ hops efficiently)

- Real-time analytics at scale

- Distributed architecture


Query Language: GSQL (SQL-like, designed for complex analytics)


Example Query:

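// Find accounts that received a transfer over 5,000 (a single hop).
// A full fraud-ring search would repeat this step, e.g. in a WHILE loop.
CREATE QUERY findFraudRing() SYNTAX v2 {
  SetAccum<VERTEX> @path;   // per-vertex accumulator: accounts that sent money here
  Start = {Account.*};

  accounts = SELECT t
             FROM Start:s -(TRANSFERRED_TO>:e)- Account:t
             WHERE e.amount > 5000
             ACCUM t.@path += s;

  PRINT accounts;
}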
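GSQL's `ACCUM` clause runs once per matched edge, conceptually in parallel, and writes into accumulators such as the vertex-attached `@path` in the query above. The Python sketch below imitates that behavior sequentially. It then repeats the step to follow large transfers several hops out, which is the kind of loop a deeper fraud-ring query would add. The transfer data, the 5,000 threshold, and the function name `expand_large_transfers` are illustrative assumptions, not TigerGraph APIs.

```python
from collections import defaultdict

# Hypothetical TRANSFERRED_TO edges as (sender, receiver, amount)
transfers = [
    ("A", "B", 9000),
    ("B", "C", 7000),
    ("C", "D", 6000),
    ("D", "A", 8000),
    ("A", "E", 100),
]


def expand_large_transfers(start: set[str], hops: int,
                           threshold: int = 5000) -> tuple[set[str], dict[str, set[str]]]:
    """Follow transfers above `threshold` for up to `hops` steps.

    Returns every account reached, plus a per-vertex accumulator mapping
    each reached account to the accounts that sent money to it.
    """
    senders: dict[str, set[str]] = defaultdict(set)  # plays the role of @path
    frontier, reached = set(start), set()
    for _ in range(hops):
        next_frontier = set()
        for s, t, amount in transfers:
            if s in frontier and amount > threshold:  # the WHERE clause
                senders[t].add(s)                     # the ACCUM clause
                next_frontier.add(t)
        reached |= next_frontier
        frontier = next_frontier
        if not frontier:
            break
    return reached, dict(senders)


reached, senders = expand_large_transfers({"A"}, hops=5)
```

Here `A` appears in its own reachable set, so large transfers went around a loop and back to the starting account, which is the pattern a fraud-ring query looks for. The small transfer to `E` is filtered out by the threshold.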


Best for:

- Large enterprises with massive connected datasets

- Fraud detection at scale (financial services)

- Supply chain optimization

- Cybersecurity (threat detection)

- When you need deep multi-hop traversals in real-time


Limitations:

- Steeper learning curve

- Smaller ecosystem than Neo4j

- GSQL is proprietary


Who uses it: Mastercard, Intuit, UnitedHealth Group


---

AWS Neptune (Property Graph + RDF)


The cloud-native choice. Fully managed, integrated with AWS ecosystem.


Strengths:

- Fully managed (no ops overhead)

- Supports both property graphs AND RDF

- Seamless AWS integration (Lambda, S3, IAM, CloudWatch)

- High availability and automated backups

- Pay-as-you-go pricing


Query Languages:

- Gremlin (for property graphs)

- SPARQL (for RDF)

- openCypher (for property graphs)


Example Query (Gremlin):

// Find Carolyn's services and their dependencies
g.V().has('person', 'name', 'Carolyn')
  .out('owns').as('service')
  .out('depends_on').as('dependency')
  .select('service', 'dependency')


Best for:

- Organizations already on AWS

- When you want managed infrastructure

- Projects needing both RDF and property graph models

- Rapid prototyping without infrastructure setup


Limitations:

- Less performant than native solutions at massive scale

- Gremlin is harder to learn than Cypher

- Vendor lock-in


Who uses it: Companies in AWS ecosystem, government agencies


---

Other Notable Tools


PuppyGraph (relative newcomer)

A query engine that sits on top of existing data lakes (S3, Delta Lake, Iceberg). You don't migrate data; it queries the data in place. Great for enterprises whose data already lives in data lakes.


Ontotext GraphDB (RDF specialist)

The gold standard for semantic knowledge graphs. Used in pharma, publishing, and government. Widely regarded as one of the best-performing RDF stores.


Azure Cosmos DB for Apache Gremlin

Microsoft's cloud-native graph database. Good if you're in the Azure ecosystem.


Memgraph (Property Graph)

High-performance in-memory graph database. Great for real-time streaming analytics.


---

Part 3: Query Languages Deep Dive


Cypher (Neo4j, AWS Neptune openCypher)


Philosophy: Declarative pattern matching (like SQL).


Syntax: Uses ASCII art to represent patterns.


// Create nodes and relationships
CREATE (c:Person {name: "Carolyn", role: "Principal Enterprise Architect"})
CREATE (kg:Skill {name: "Knowledge Graphs"})
CREATE (c)-[:LEARNING {status: "in-progress"}]->(kg)

// Pattern matching
MATCH (person:Person)-[:WORKS_AT]->(company:Company)
WHERE company.industry = "Technology"
RETURN person.name, company.name

// Multi-hop traversal
MATCH path = (app:Application)-[:DEPENDS_ON*1..5]->(db:Database)
WHERE db.type = "PostgreSQL"
RETURN path

// Aggregation
MATCH (p:Person)-[:OWNS]->(s:Service)
RETURN p.name, count(s) AS service_count
ORDER BY service_count DESC
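The variable-length pattern `[:DEPENDS_ON*1..5]` asks for every path of one to five `DEPENDS_ON` hops. A rough Python equivalent is a depth-limited depth-first search. The sketch below assumes a hypothetical adjacency dict and a separate `db_type` lookup. Like Cypher's default matching rule, it never reuses the same relationship within one path.

```python
# Hypothetical dependency graph: node -> nodes it DEPENDS_ON
depends_on = {
    "checkout-app": ["payments-svc", "catalog-svc"],
    "payments-svc": ["payments-db"],
    "catalog-svc": ["cache", "catalog-db"],
    "cache": [],
    "payments-db": [],
    "catalog-db": [],
}
db_type = {"payments-db": "PostgreSQL", "catalog-db": "MySQL"}


def paths_to_postgres(start: str, min_hops: int = 1, max_hops: int = 5) -> list[list[str]]:
    found: list[list[str]] = []

    def dfs(path: list[str], used: set[tuple[str, str]]) -> None:
        hops = len(path) - 1
        node = path[-1]
        # Record the path if its end node satisfies WHERE db.type = "PostgreSQL"
        if hops >= min_hops and db_type.get(node) == "PostgreSQL":
            found.append(list(path))
        if hops == max_hops:
            return
        for nxt in depends_on.get(node, []):
            edge = (node, nxt)
            if edge in used:  # a relationship may appear only once per path
                continue
            used.add(edge)
            path.append(nxt)
            dfs(path, used)
            path.pop()
            used.remove(edge)

    dfs([start], set())
    return found
```

The `max_hops` bound is what keeps variable-length queries tractable. Unbounded patterns on densely connected graphs can enumerate a very large number of paths.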


Pros:

- Intuitive and readable

- Great for complex pattern matching

- Easy to learn if you know SQL


Cons:

- Historically tied to Neo4j (though openCypher and the ISO GQL standard are broadening support)

- Limited procedural capabilities


---

GSQL (TigerGraph)


Philosophy: SQL-like with graph extensions and procedural capabilities.


Syntax: More verbose but powerful for analytics.


// Define a vertex type (schema)
CREATE VERTEX Person (
  PRIMARY_ID name STRING,
  role STRING,
  department STRING
)

// Define a directed edge type (assumes a Company vertex type already exists)
CREATE DIRECTED EDGE WorksAt (
  FROM Person,
  TO Company,
  start_date DATETIME
)

// Two-hop pattern: people who work at the same company as inputPerson
CREATE QUERY findColleagues(VERTEX<Person> inputPerson) SYNTAX v2 {
  Start = {inputPerson};

  colleagues = SELECT colleague
               FROM Start:p -(WorksAt>)- Company:c -(<WorksAt)- Person:colleague
               WHERE colleague != p;

  PRINT colleagues;
}


Pros:

- Extremely powerful for complex analytics

- Handles procedural logic well

- Optimized for parallel processing


Cons:

- Steeper learning curve

- More verbose than Cypher

- TigerGraph-specific


---

Gremlin (AWS Neptune, Azure Cosmos DB, JanusGraph)


Philosophy: Imperative graph traversal (functional programming style).


Syntax: Method chaining (like Java streams or JavaScript promises).


// Create vertices and edges
g.addV('Person').property('name', 'Carolyn').property('role', 'Principal Enterprise Architect').as('c')
  .addV('Skill').property('name', 'Knowledge Graphs').as('kg')
  .addE('LEARNING').from('c').to('kg').property('status', 'in-progress')

// Traversal
g.V().hasLabel('Person')
  .has('name', 'Carolyn')
  .out('owns')
  .hasLabel('Service')
  .values('name')

// Multi-hop with filtering (1 to 5 hops, like Cypher's *1..5)
g.V().hasLabel('Application')
  .repeat(out('depends_on')).emit().times(5)
  .hasLabel('Database')
  .has('type', 'PostgreSQL')
  .path()

// Aggregation
g.V().hasLabel('Person')
  .project('name', 'service_count')
    .by('name')
    .by(out('owns').count())
  .order().by(select('service_count'), desc)
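Gremlin's method chaining works because each step consumes a stream of traversers and emits a new stream. The toy Python class below shows that idea with generators. Its name, `MiniTraversal`, and its methods are hypothetical and only loosely modeled on Gremlin's `V()`, `hasLabel()`, `has()`, `out()`, and `values()` steps. It is a teaching sketch, not the Apache TinkerPop gremlinpython client. The camelCase method names mirror the Gremlin step names rather than Python convention.

```python
from typing import Any, Callable, Iterable, Iterator

# Hypothetical graph data
VERTICES = {
    1: {"label": "Person", "name": "Carolyn"},
    2: {"label": "Service", "name": "billing-api"},
    3: {"label": "Service", "name": "search-api"},
    4: {"label": "Skill", "name": "Knowledge Graphs"},
}
EDGES = [(1, "owns", 2), (1, "owns", 3), (1, "LEARNING", 4)]


class MiniTraversal:
    """Each step wraps the previous (lazy) stream of vertex ids in a new generator."""

    def __init__(self, stream: Iterable[Any]):
        self._stream = stream

    def _then(self, step: Callable[[Iterable[Any]], Iterator[Any]]) -> "MiniTraversal":
        return MiniTraversal(step(self._stream))

    def hasLabel(self, label: str) -> "MiniTraversal":
        return self._then(lambda s: (v for v in s if VERTICES[v]["label"] == label))

    def has(self, key: str, value: Any) -> "MiniTraversal":
        return self._then(lambda s: (v for v in s if VERTICES[v].get(key) == value))

    def out(self, edge_label: str) -> "MiniTraversal":
        # Emit the target of every outgoing edge with the given label
        return self._then(lambda s: (dst for v in s
                                     for src, lbl, dst in EDGES
                                     if src == v and lbl == edge_label))

    def values(self, key: str) -> "MiniTraversal":
        return self._then(lambda s: (VERTICES[v][key] for v in s))

    def toList(self) -> list[Any]:
        # Nothing is evaluated until a terminal step pulls values through the chain
        return list(self._stream)


def V() -> MiniTraversal:
    return MiniTraversal(iter(VERTICES))


names = V().hasLabel("Person").has("name", "Carolyn").out("owns").hasLabel("Service").values("name").toList()
```

Because every step is lazy, a long chain costs nothing until `toList()` runs. That laziness is also why real Gremlin traversals end with a terminal step such as `toList()` or `next()`.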


Pros:

- Part of Apache TinkerPop (cross-database standard)

- Works with multiple graph databases

- Powerful for complex traversals

- Available in multiple languages (Java, Python, JavaScript)


Cons:

- Harder to read than Cypher

- Method chaining can get unwieldy

- Less intuitive for SQL developers


---

SPARQL (RDF databases)


Philosophy: Query RDF triples with semantic reasoning.


Syntax: SQL-like with triple patterns.


# Find all people who work at technology companies
PREFIX : <http://example.org/>

SELECT ?person ?company
WHERE {
  ?person :worksAt ?company .
  ?company :industry "Technology" .
}

# Path queries (transitive relationships)
PREFIX : <http://example.org/>

SELECT ?person ?manager
WHERE {
  ?person :reportsTo+ ?manager .
}

# Inference example (with RDFS reasoning enabled, indirect subclasses also match)
PREFIX : <http://example.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?person ?skill
WHERE {
  ?person :hasSkill ?skill .
  ?skill rdfs:subClassOf :TechnicalSkill .
}
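The inference behind the last query can be sketched as forward chaining: apply a rule repeatedly until no new triples appear. The Python below assumes a single RDFS-style rule, transitivity of `rdfs:subClassOf`. Following the query's modeling choice, it treats each skill as a class. The sample facts and function names are hypothetical. Production RDF stores implement many more rules, with far better performance.

```python
SUBCLASS = "rdfs:subClassOf"

# Hypothetical facts: Cypher is only an *indirect* subclass of :TechnicalSkill
facts = {
    (":carolyn", ":hasSkill", ":Cypher"),
    (":Cypher", SUBCLASS, ":GraphQueryLanguage"),
    (":GraphQueryLanguage", SUBCLASS, ":TechnicalSkill"),
    (":carolyn", ":hasSkill", ":Negotiation"),
}


def infer_subclass_closure(triples: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Add (a subClassOf c) whenever (a subClassOf b) and (b subClassOf c) hold, until fixpoint."""
    closed = set(triples)
    while True:
        sub = [(s, o) for s, p, o in closed if p == SUBCLASS]
        new = {(a, SUBCLASS, c) for a, b in sub for b2, c in sub if b == b2} - closed
        if not new:
            return closed
        closed |= new


def people_with_technical_skills(triples: set[tuple[str, str, str]]) -> list[tuple[str, str]]:
    kb = infer_subclass_closure(triples)
    return sorted(
        (person, skill)
        for person, p, skill in kb
        if p == ":hasSkill" and (skill, SUBCLASS, ":TechnicalSkill") in kb
    )
```

Without the inference step, the query would return nothing for `:Cypher`, because no stored triple links it directly to `:TechnicalSkill`. The reasoner derives that link from the two stored ones.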


Pros:

- W3C standard

- Supports reasoning and inference

- Federated queries (query across multiple endpoints)

- Ontology support


Cons:

- More complex syntax

- Not as intuitive as Cypher

- Slower for operational use cases


---

Part 4: How to Choose (Enterprise Decision Framework)


Decision Tree


Question 1: Do you need semantic reasoning, ontologies, or RDF standards?

- Yes → Use RDF (Ontotext GraphDB, Stardog, AWS Neptune with SPARQL)

- No → Use Property Graphs → Continue to Q2


Question 2: Are you already locked into a cloud provider?

- AWS → AWS Neptune (Gremlin/openCypher)

- Azure → Azure Cosmos DB (Gremlin)

- No preference/On-prem → Continue to Q3


Question 3: What's your scale and performance requirement?

- Massive scale (trillions of edges), real-time analytics → TigerGraph

- Medium scale, ease of use, great tooling → Neo4j

- Existing data in data lakes, no migration → PuppyGraph


Question 4: What's your team's skill level?

- New to graphs, want easy onboarding → Neo4j (best docs, community)

- Experienced with distributed systems → TigerGraph

- Already using AWS/cloud-native → Neptune
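The four questions can also be read as a small function, which makes the order of precedence explicit: semantic requirements first, then cloud alignment, then scale and data location. The parameter names and return strings are illustrative. Two further points are assumptions rather than part of the framework above: Question 4 is treated as a tie-breaker when scale is unclear, and Neo4j is the default when nothing else decides.

```python
from typing import Optional


def choose_graph_database(
    needs_semantics: bool,
    cloud: Optional[str] = None,  # "aws", "azure", or None for no preference / on-prem
    scale: Optional[str] = None,  # "massive", "medium", "data_lake", or None if unsure
    team: Optional[str] = None,   # "new_to_graphs", "distributed_systems", "cloud_native"
) -> str:
    # Q1: semantic reasoning, ontologies, RDF standards
    if needs_semantics:
        return "RDF store (Ontotext GraphDB, Stardog, or AWS Neptune with SPARQL)"
    # Q2: existing cloud commitment
    if cloud == "aws":
        return "AWS Neptune (Gremlin/openCypher)"
    if cloud == "azure":
        return "Azure Cosmos DB (Gremlin)"
    # Q3: scale and where the data already lives
    by_scale = {"massive": "TigerGraph", "medium": "Neo4j", "data_lake": "PuppyGraph"}
    if scale in by_scale:
        return by_scale[scale]
    # Q4 (assumed tie-breaker): team skill level
    by_team = {
        "new_to_graphs": "Neo4j",
        "distributed_systems": "TigerGraph",
        "cloud_native": "AWS Neptune",
    }
    return by_team.get(team, "Neo4j")  # assumed default: easiest onboarding
```

Encoding the framework this way makes disagreements easy to spot. For example, a team on AWS that needs RDF still lands on Question 1's answer, because semantics are checked first.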


---

Part 5: Enterprise Considerations


Licensing & Cost:

- Neo4j: Community (free), Enterprise ($$$$), Cloud ($$$)

- TigerGraph: Developer (free), Enterprise (contact sales)

- AWS Neptune: Pay-as-you-go ($$$, can get expensive at scale)


Operational Overhead:

- Managed: AWS Neptune, Neo4j Aura (cloud) → Low ops

- Self-hosted: Neo4j, TigerGraph, Memgraph → Higher ops


Integration:

- Most major graph databases offer connectors for Kafka, Spark, ETL tools, and BI platforms

- AWS Neptune has best AWS integration

- Neo4j has largest connector ecosystem


Skills Availability:

- Cypher (Neo4j): Largest community, easiest to hire for

- GSQL (TigerGraph): Smaller talent pool

- Gremlin: Moderate availability


---

Part 6: Getting Started (Practical Next Steps)


For Enterprise Architects:


1. Start with Neo4j Desktop (free)

- Spin up a local instance

- Load sample data

- Play with Cypher queries

2. Model your enterprise architecture

- Systems → Nodes

- Dependencies → Edges

- Properties → Metadata (owner, criticality, tech stack)

3. Ask architectural questions:

// What breaks (directly or indirectly) if this database goes down?
MATCH (db:Database {name: "CustomerDB"})<-[:DEPENDS_ON*1..]-(service:Service)
RETURN DISTINCT service.name

// Who owns the most critical systems?
MATCH (person:Person)-[:OWNS]->(system:System)
WHERE system.criticality = "high"
RETURN person.name, count(system) AS critical_systems
ORDER BY critical_systems DESC

4. Explore real-world datasets:

- Movie database (comes with Neo4j)

- Panama Papers (public Neo4j dataset)

- Your own systems (export from CMDB/ServiceNow)
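Steps 2 and 3 can be prototyped before any database is installed. The Python sketch below assumes a hypothetical CMDB export: a list of rows with `name`, `type`, `owner`, `criticality`, and `depends_on` fields. Real ServiceNow exports will look different. The sketch turns systems into nodes and dependencies into edges. It then answers "what breaks if CustomerDB goes down?" by walking dependency edges backwards, including indirect dependents, and counts high-criticality systems per owner.

```python
from collections import defaultdict, deque

# Hypothetical CMDB export; field names are assumptions, not a ServiceNow schema
cmdb_rows = [
    {"name": "CustomerDB", "type": "Database", "owner": "dba-team",
     "criticality": "high", "depends_on": []},
    {"name": "CustomerAPI", "type": "Service", "owner": "Carolyn",
     "criticality": "high", "depends_on": ["CustomerDB"]},
    {"name": "BillingSvc", "type": "Service", "owner": "Carolyn",
     "criticality": "high", "depends_on": ["CustomerAPI"]},
    {"name": "Reporting", "type": "Service", "owner": "analytics",
     "criticality": "low", "depends_on": ["BillingSvc"]},
    {"name": "Wiki", "type": "Service", "owner": "it-ops",
     "criticality": "low", "depends_on": []},
]

# Systems -> nodes (metadata becomes properties), dependencies -> edges
nodes = {row["name"]: {k: v for k, v in row.items() if k != "depends_on"} for row in cmdb_rows}
dependents: dict[str, list[str]] = defaultdict(list)  # reverse edges: target -> its dependents
for row in cmdb_rows:
    for target in row["depends_on"]:
        dependents[target].append(row["name"])


def impacted_by(system: str) -> list[str]:
    """Breadth-first walk over reverse DEPENDS_ON edges: everything that breaks."""
    seen: set[str] = set()
    queue = deque([system])
    while queue:
        for dependent in dependents[queue.popleft()]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(seen)


def critical_systems_by_owner() -> list[tuple[str, int]]:
    counts: dict[str, int] = defaultdict(int)
    for props in nodes.values():
        if props["criticality"] == "high":
            counts[props["owner"]] += 1
    # Most critical systems first; ties broken alphabetically
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

Once this shape works for your own data, the same rows can be loaded into Neo4j and the two Cypher questions in step 3 answered against the real graph.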


---


Next (and final) step: Task #5 - Build a simple Knowledge Graph yourself. You'll pick a tool and create a proof-of-concept with your own data.


The tools exist. The languages are learnable. The only thing left is to build something and watch it fail in exciting new ways. Welcome to engineering.

