这次的笔记介绍一下 关系型数据库的模型 到 图数据库模型 的转换过程, 内容整理自neo4j 的官网。Import Data Into Neo4j

将关系型数据转换成图数据,可以遵循的原则是:

  • 一条记录( record ) 是 一个节点( node )
  • 一个表( table ) 是一个 标签名 ( label )

我们使用的数据集是 NorthWind dataset, 其中包含了丰富的表连接。非常适合转化成图数据库。

假设这些表已经存在我们的PostgresSQL 里面,数据库名为northwind,执行psql -d northwind < export_csv.sql 把它们 导出为csv 文件。 export_csv.sql :

COPY (SELECT * FROM customers) TO '/tmp/customers.csv' WITH CSV header;

COPY (SELECT * FROM suppliers) TO '/tmp/suppliers.csv' WITH CSV header;

COPY (SELECT * FROM products)  TO '/tmp/products.csv' WITH CSV header;

COPY (SELECT * FROM employees) TO '/tmp/employees.csv' WITH CSV header;

COPY (SELECT * FROM categories) TO '/tmp/categories.csv' WITH CSV header;

COPY (SELECT * FROM orders

      LEFT OUTER JOIN order_details ON order_details.OrderID = orders.OrderID) TO '/tmp/orders.csv' WITH CSV header;

下面就需要一个Cypher 脚本,执行 LOAD CSV 命令,将csv 文件导入 neo4j, 建立节点和关系。

import_csv.cypher 首先创建节点:

// Create customers

USING PERIODIC COMMIT

LOAD CSV WITH HEADERS FROM "file:customers.csv" AS row

CREATE (:Customer {companyName: row.CompanyName, customerID: row.CustomerID, fax: row.Fax, phone: row.Phone});

// Create products

USING PERIODIC COMMIT

LOAD CSV WITH HEADERS FROM "file:products.csv" AS row

CREATE (:Product {productName: row.ProductName, productID: row.ProductID, unitPrice: toFloat(row.UnitPrice)});


// Create suppliers

USING PERIODIC COMMIT

LOAD CSV WITH HEADERS FROM "file:suppliers.csv" AS row

CREATE (:Supplier {companyName: row.CompanyName, supplierID: row.SupplierID});

// Create employees

USING PERIODIC COMMIT

LOAD CSV WITH HEADERS FROM "file:employees.csv" AS row

CREATE (:Employee {employeeID:row.EmployeeID,  firstName: row.FirstName, lastName: row.LastName, title: row.Title});


// Create categories

USING PERIODIC COMMIT

LOAD CSV WITH HEADERS FROM "file:categories.csv" AS row

CREATE (:Category {categoryID: row.CategoryID, categoryName: row.CategoryName, description: row.Description});

USING PERIODIC COMMIT

LOAD CSV WITH HEADERS FROM "file:orders.csv" AS row

MERGE (order:Order {orderID: row.OrderID}) ON CREATE SET order.shipName =  row.ShipName;

然后创建索引,这可以在创建 关系(relationship ) 时提高节点(node )查询速度。

CREATE INDEX ON :Product(productID);

CREATE INDEX ON :Product(productName);

CREATE INDEX ON :Category(categoryID);

CREATE INDEX ON :Employee(employeeID);

CREATE INDEX ON :Supplier(supplierID);

CREATE INDEX ON :Customer(customerID);

CREATE INDEX ON :Customer(customerName);

在创建了节点和 索引之后,就可以建立关系 ( relationship )。 订单 与产品及 雇员间的关系

USING PERIODIC COMMIT

LOAD CSV WITH HEADERS FROM "file:orders.csv" AS row

MATCH (order:Order {orderID: row.OrderID})

MATCH (product:Product {productID: row.ProductID})

MERGE (order)-[pu:PRODUCT]->(product)

ON CREATE SET pu.unitPrice = toFloat(row.UnitPrice), pu.quantity = toFloat(row.Quantity);


USING PERIODIC COMMIT

LOAD CSV WITH HEADERS FROM "file:orders.csv" AS row

MATCH (order:Order {orderID: row.OrderID})

MATCH (employee:Employee {employeeID: row.EmployeeID})

MERGE (employee)-[:SOLD]->(order);


USING PERIODIC COMMIT

LOAD CSV WITH HEADERS FROM "file:orders.csv" AS row

MATCH (order:Order {orderID: row.OrderID})

MATCH (customer:Customer {customerID: row.CustomerID})

MERGE (customer)-[:PURCHASED]->(order);

建立产品、供应商、类别 之间的关系。

USING PERIODIC COMMIT

LOAD CSV WITH HEADERS FROM "file:products.csv" AS row

MATCH (product:Product {productID: row.ProductID})

MATCH (supplier:Supplier {supplierID: row.SupplierID})

MERGE (supplier)-[:SUPPLIES]->(product);


USING PERIODIC COMMIT

LOAD CSV WITH HEADERS FROM "file:products.csv" AS row

MATCH (product:Product {productID: row.ProductID})

MATCH (category:Category {categoryID: row.CategoryID})

MERGE (product)-[:PART_OF]->(category);

最后创建 雇员之间 的层级关系

USING PERIODIC COMMIT

LOAD CSV WITH HEADERS FROM "file:employees.csv" AS row

MATCH (employee:Employee {employeeID: row.EmployeeID})

MATCH (manager:Employee {employeeID: row.ReportsTo})

MERGE (employee)-[:REPORTS_TO]->(manager);

为了数据的完整性和查询优化,最后在 Order 标签类上创建一个 unique constraint: CREATE CONSTRAINT ON (o:Order) ASSERT o.orderID IS UNIQUE;

执行以下命令来 运行Cypher脚本:neo4j-shell -path northwind.db -file import_csv.cypher.

关系型数据库模型 里面的 外键 在 图数据中变成了 一个 关系。

关系型数据库的每一条记录 既可以包含 各种类型的属性 ,也会包含 数据模型之间一对一 、 一对多、 多对多 的关系。同一个关系型数据库表 ,可以拆分成 图数据库中的节点(node )和关系(relationship )。甚至可以把 记录的一部分属性 赋值给 关系( relationship ). 建立图数据库的步骤是先节点再 关系。

创建 节点的 CREATE 从句紧跟 LOAD CSV 从句。 每一行数据被赋值给 row 变量。row的一部分属性成为了节点的属性。

创建节点之后,才可以 创建 关系。建立关系 其实就是遍历 那些包含了 一对一、 一对多、 多对多关系的 表,利用每一条记录 的外键 组合,用MATCH命令 匹配 到 相对应的两个节点,然后用MERGE 命令创建两节点之间的关系。