Replies: 6 comments 4 replies
-
Reading from HDFS should work out of the box via GraphAr Spark (there is a Hadoop dependency). P.S. HDFS is a file system, not a file format.
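To illustrate the P.S. above, here is a minimal sketch that separates where a file lives (the file system, taken from the URI scheme) from how it is encoded (the file format, taken from the suffix). The function name `describe_location` and the example paths are hypothetical and are not part of GraphAr or HugeGraph.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def describe_location(uri: str) -> dict:
    """Split a data location into file system (URI scheme) and file format (suffix)."""
    parsed = urlparse(uri)
    # A path without a scheme is treated as a local file (assumption for this sketch).
    file_system = parsed.scheme or "file"
    # The format is a property of the file itself, independent of where it is stored.
    file_format = PurePosixPath(parsed.path).suffix.lstrip(".") or "unknown"
    return {"file_system": file_system, "file_format": file_format}


# Hypothetical locations: same kind of data, different file systems and formats.
print(describe_location("hdfs://namenode:8020/graph/vertex/person/chunk0.parquet"))
print(describe_location("/tmp/graph/vertex/person/chunk0.csv"))
```

The same Parquet or CSV chunk could sit on local disk or on HDFS; only the scheme changes.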
-
According to the HugeGraph document, the
-
I think we can first compare the graph model specification of HugeGraph with GraphAr's specification, to figure out whether GraphAr can support HugeGraph's schema (and if not, whether there are new features we can bring into GraphAr to close the gap). Then, see how to construct HugeGraph's schema from GraphAr's infos for a graph dataset. Finally, see how to process GraphAr's files to construct the graph in HugeGraph.
-
Differences in schema and how to map them

Repeated PropertyLabel
In HugeGraph, a PropertyLabel belongs to the graph and can be reused by different vertices and edges, so two PropertyLabels with the same name are not allowed in one graph. In GraphAr, a property belongs only to a single vertex label or edge label, so properties with the same name (and possibly different data types) may exist in one graph. Possible mapping methods:

Nullable properties (keys)
In HugeGraph, each property in a VertexLabel or EdgeLabel must be set as nullable or not nullable; the default is not nullable. GraphAr does not record that information, which means that every non-primary property is nullable and every primary property is not nullable. Mapping solution: map primary properties to the default (not nullable) and non-primary properties to nullable.

PrimaryKey of VertexLabel
In HugeGraph, there are 4 id strategies for choosing the primary key of each VertexLabel:
In GraphAr, a specific vertex can be identified by its vertex label + internal id, or by its vertex label + primary key (if it has one). Mapping solution: use the PRIMARY_KEY id strategy, keyed by vertex label + internal id.

Direction of edge
HugeGraph has only directed edges. GraphAr has both directed and undirected edges. Mapping solution: split each undirected edge into two directed edges, one in and one out.

Frequency of edge
HugeGraph allows single or multiple frequency for the same edge with different properties, and a sortKey must be set for multiple-frequency edges. GraphAr allows an edge to occur many times but records nothing further about it, which is a problem to deal with. In addition, we should find out how HugeGraph handles a repeated edge when the frequency is set to single: does it overwrite the existing edge, or behave differently?

IndexLabel
IndexLabel in HugeGraph aims to speed up data access, but uses more space and costs more time to create and maintain. GraphAr has no equivalent, so no mapping is needed.

(Un)ordered by src/dst of edge
Not sure about this; maybe it is related to the data rather than the schema?

What we may learn from the schema of HugeGraph (needs discussion)

Nullable properties (keys)
Record the nullability of each property in every vertex and edge, for three types:

Index label
As in HugeGraph, it would help query data efficiently, but we need to consider the space cost of recording indexes and the time cost of maintaining them.

User defined information
For graph/edge/vertex info; property info in a vertex/edge or in a graph?
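Two of the mapping rules in this comment (nullability derived from the primary flag, and splitting undirected edges) can be sketched as follows. This is only an illustration under stated assumptions: `Property`, `nullable_for_hugegraph` and `to_directed` are hypothetical names, not GraphAr or HugeGraph APIs, and emitting an undirected self-loop only once is a choice made for this sketch rather than something decided in the discussion.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Property:
    """Hypothetical stand-in for a GraphAr property entry."""
    name: str
    data_type: str
    is_primary: bool


def nullable_for_hugegraph(prop: Property) -> bool:
    # GraphAr records no nullability, so apply the rule from the discussion:
    # primary properties are not nullable, all other properties are nullable.
    return not prop.is_primary


def to_directed(edges: List[Tuple[int, int]], directed: bool) -> List[Tuple[int, int]]:
    """Return a directed edge list; undirected edges become an out and an in edge."""
    if directed:
        return list(edges)
    result = []
    for src, dst in edges:
        result.append((src, dst))
        # Assumption: a self-loop is emitted once to avoid an exact duplicate.
        if src != dst:
            result.append((dst, src))
    return result


props = [Property("id", "int64", True), Property("nickname", "string", False)]
print({p.name: nullable_for_hugegraph(p) for p in props})
print(to_directed([(1, 2), (3, 3)], directed=False))
```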
-
Hi @Thespica, I think you can create some issues for these features based on the discussion here.
-
Hi @Thespica. I'd like to clarify that GraphAr supports storing multiple edges between the same source and destination nodes, which allows edge frequencies to be represented.
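As a rough sketch of how this could feed into the mapping, one could scan an edge list for repeated (source, destination) pairs to decide whether the corresponding HugeGraph edge would need multiple frequency, and therefore a sortKey. The function name `edge_frequency` and the returned strings `"SINGLE"` and `"MULTIPLE"` are illustrative labels assumed for this example, not values confirmed from either project.

```python
from collections import Counter
from typing import Iterable, Tuple


def edge_frequency(edges: Iterable[Tuple[int, int]]) -> str:
    """Classify an edge list by whether any (src, dst) pair occurs more than once."""
    counts = Counter(edges)
    # A repeated pair means the same edge occurs several times,
    # so a sortKey would be needed to tell the occurrences apart.
    return "MULTIPLE" if any(n > 1 for n in counts.values()) else "SINGLE"


print(edge_frequency([(1, 2), (2, 3)]))
print(edge_frequency([(1, 2), (1, 2), (2, 3)]))
```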
-
Information about how HugeGraph loads files.
The information below may help us understand and support HugeGraph loading, but its coverage may be limited. Supplements and suggestions are welcome!
Links
Doc
hugegraph(server)
hugegraph-toolchain
Related modules
Concepts
Three kinds of files
How to use hugegraph loader
HugeGraph loader supports files or directories from the local file system or HDFS.
To consider further: should the GraphAr Java SDK support HDFS?
comment(@acezen): GraphAr and HDFS are decoupled. I think the Java SDK supports HDFS, since C++ allows reading data from HDFS.
Vertex/Edge ID Policy
Vertex id policy
Edge id policy
The Edge Id of HugeGraph is composed of srcVertexId + edgeLabel + sortKey + tgtVertexId, and sortKey is an important concept of HugeGraph.
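A minimal sketch of why sortKey matters in this composition: two edges with the same source, label and target still get distinct ids when their sortKey values differ. `EdgeId` is a hypothetical tuple used only for illustration. It does not reproduce HugeGraph's actual id encoding, and the vertex ids, label and sortKey values are made up.

```python
from typing import NamedTuple


class EdgeId(NamedTuple):
    """Hypothetical composite key mirroring the four components listed above."""
    src_vertex_id: str
    edge_label: str
    sort_key: str
    tgt_vertex_id: str


# Two occurrences of the same edge, distinguished only by sortKey.
first = EdgeId("person:1", "transfer", "2024-01-01", "person:2")
second = EdgeId("person:1", "transfer", "2024-01-02", "person:2")

print(first == second)
print(len({first, second}))
```

With an empty or identical sortKey, the two tuples would compare equal, which matches the single-frequency case where repeated edges cannot be told apart.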