vaticle/typedb 1.5.9 on GitHub

Install & Run: http://dev.grakn.ai/docs/running-grakn/install-and-run

New Features

Use keyspace stats to fetch instance counts.
The goal of this PR is to optimise the compute count queries.
Currently we are computing the counts each time the query is executed. This is potentially heavy and time-consuming for large graphs. However, we are already storing instance counts on type vertices in the graph. These come in the form of KeyspaceStatistics. We could readily use this information to provide instant compute count responses.
In this PR we use the cached counts to provide an instant response to compute count queries.
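The idea of answering counts from stored per-type counters can be sketched roughly as follows. This is a minimal illustration, not Grakn's KeyspaceStatistics API: the class InstanceCountCache and its methods are hypothetical names, and handling of subtypes is left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for per-type instance counters; not the real KeyspaceStatistics class.
final class InstanceCountCache {
    private final Map<String, Long> countsByType = new HashMap<>();

    // Record the net number of instances added (positive) or removed (negative) for a type.
    void applyDelta(String typeLabel, long delta) {
        countsByType.merge(typeLabel, delta, Long::sum);
    }

    // Answer a count query from the stored counter, without traversing any instances.
    long count(String typeLabel) {
        return countsByType.getOrDefault(typeLabel, 0L);
    }
}
```

The lookup cost no longer depends on how many instances the type has, which is the property the PR relies on.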
Cache attributes within transaction.
When we insert multiple attributes within a transaction, especially when the attribute value is a super node (for example male/female gender), it is beneficial to cache the inserted attribute instead of doing a db lookup, which requires an index check. Otherwise we need to fetch the same attribute multiple times. This PR introduces an extra transaction-scoped cache for attributes.
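A transaction-scoped attribute cache of this kind might look like the sketch below. The names TxAttributeCache, getOrLoad and the index key format are assumptions made for illustration; the actual cache in Grakn Core may be structured differently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical transaction-scoped cache: attribute index -> attribute concept.
final class TxAttributeCache<A> {
    private final Map<String, A> cache = new HashMap<>();
    private int lookups = 0;

    // Return the cached attribute, falling back to the (expensive) db lookup only on a miss.
    A getOrLoad(String index, Function<String, A> dbLookup) {
        A cached = cache.get(index);
        if (cached != null) return cached;
        lookups++;
        A loaded = dbLookup.apply(index);
        cache.put(index, loaded);
        return loaded;
    }

    // Number of db lookups actually performed; exposed only to make the effect visible.
    int lookups() {
        return lookups;
    }

    // The cache lives no longer than the transaction, so it is cleared on close.
    void clear() {
        cache.clear();
    }
}
```

Inserting the same attribute value many times within one transaction then costs a single db lookup.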

Bugs Fixed

Fix eager execution due to flatMap.
We currently have a use of flatMap() on the stream of answers that is produced from Janus via our various transformations to produce Concepts and Answers. The flatMap() at the end of this sequence of stream manipulations is not necessarily lazy.
See:
https://bugs.openjdk.java.net/browse/JDK-8075939
To fix this, we introduced Vavr, which handles lazy streams and flatMap operations properly. The change is minimal: we convert to a Vavr stream at the point where the flatMap is added, and convert back afterwards.
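To make the desired laziness concrete, here is a plain-JDK sketch of a flatMap that only pulls from the upstream when the consumer asks for more. It does not use Vavr and is not the code from the PR; LazyFlatMap is a hypothetical helper that illustrates the behaviour we want from the answer stream.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Hypothetical helper: a flatMap over iterators that never reads ahead of the consumer.
final class LazyFlatMap {
    static <T, R> Iterator<R> flatMap(Iterator<T> outer, Function<T, Iterator<R>> mapper) {
        return new Iterator<R>() {
            private Iterator<R> inner = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                // Advance to the next non-empty inner iterator, one outer element at a time.
                while (!inner.hasNext()) {
                    if (!outer.hasNext()) return false;
                    inner = mapper.apply(outer.next());
                }
                return true;
            }

            @Override
            public R next() {
                if (!hasNext()) throw new NoSuchElementException();
                return inner.next();
            }
        };
    }
}
```

Taking the first answer maps only the first upstream element, whereas an eager flatMap may map and buffer an entire inner stream before yielding anything.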
Increase commit performance by optimising type sharding.
The recently introduced type-sharding (#5412) caused the commit time to increase.
This PR cuts down the commit time by optimising the checks that are performed before type sharding. Previously we checked every type for whether it needs to be sharded; with this PR, only types which contain new instances are checked.
Additionally, the PR contains a minor fix that simplifies the type shard behaviour: a type shard is always created right after the threshold has been reached. This change has been reflected in the test.
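The optimised check can be pictured as below: at commit, only the types that received new instances in the transaction are visited, and a shard is requested as soon as the threshold is reached. This is a sketch under assumptions; ShardingCheck, its threshold parameter and the per-type counter are hypothetical and do not mirror the actual Grakn Core classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical commit-time check deciding which types need a new shard.
final class ShardingCheck {
    private final long threshold;
    // Number of instances accumulated in the current shard of each type.
    private final Map<String, Long> instancesInCurrentShard = new HashMap<>();

    ShardingCheck(long threshold) {
        this.threshold = threshold;
    }

    // newInstances: type label -> number of instances created by the committing transaction.
    // Types without new instances are never visited.
    List<String> commit(Map<String, Long> newInstances) {
        List<String> typesToShard = new ArrayList<>();
        for (Map.Entry<String, Long> entry : newInstances.entrySet()) {
            long total = instancesInCurrentShard.merge(entry.getKey(), entry.getValue(), Long::sum);
            if (total >= threshold) {
                // Threshold reached: shard right away and start counting from zero again.
                typesToShard.add(entry.getKey());
                instancesInCurrentShard.put(entry.getKey(), 0L);
            }
        }
        return typesToShard;
    }
}
```

The work done at commit is proportional to the number of types touched by the transaction rather than to the size of the schema.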
Re-implement and adapt type sharding to Grakn Core 1.5.x.
Type sharding is a feature intended to maintain good query performance when working with large graphs.
The main aim of type sharding is to alleviate a well-known graph problem known as the "supernode" problem. A supernode is a vertex which has many incident edges. Queries that need to touch such a vertex will be slow. The bigger the graph is, the more likely it is for a supernode to exist in the graph.
In Grakn, a supernode can exist when there is a single type with millions of instances. In this scenario, the type vertex is the supernode, as it will have millions of edges, one to each and every instance.
With type sharding, the type vertex will be connected to several "shards", each of which is connected to a portion of the instances.
Type sharding is performed when a transaction is committed.
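The resulting structure can be modelled in a few lines. The sketch below is illustrative only: ShardedType, shardSize and the string instance ids are hypothetical, and the real implementation works on Janus vertices and edges rather than in-memory lists.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model: a type vertex connected to shards, each holding some instances.
final class ShardedType {
    private final int shardSize;
    private final List<List<String>> shards = new ArrayList<>();

    ShardedType(int shardSize) {
        this.shardSize = shardSize;
        shards.add(new ArrayList<>());
    }

    // New instances are always attached to the most recent shard.
    void addInstance(String instanceId) {
        shards.get(shards.size() - 1).add(instanceId);
    }

    // Sharding happens at commit: open a new shard once the current one is full.
    void onCommit() {
        if (shards.get(shards.size() - 1).size() >= shardSize) {
            shards.add(new ArrayList<>());
        }
    }

    // The type vertex has one edge per shard, not one per instance.
    int typeVertexDegree() {
        return shards.size();
    }
}
```

With a realistic shard size, a type with millions of instances ends up with a type vertex of small degree, while each shard carries a bounded number of instance edges.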
Accept relative paths in Grakn Console arguments.
Fix #5404
Previously, users had to supply absolute paths as arguments (e.g. -f /absolute/path/to/schema.gql). This PR allows paths relative to the current directory to be supplied as well (e.g. -f ../relative/path/schema.gql).
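Resolving such an argument against the working directory can be done with java.nio.file, as in the sketch below. ConsolePaths and its resolve method are hypothetical names, not the console's actual code, and the example paths assume a Unix-like file system.

```java
import java.nio.file.Path;

// Hypothetical helper: turn a console path argument into a normalised absolute path.
final class ConsolePaths {
    // Path.resolve returns the argument unchanged when it is already absolute,
    // so absolute and relative arguments are both handled.
    static Path resolve(Path workingDir, String argument) {
        return workingDir.resolve(argument).normalize();
    }
}
```

In a real console, the working directory would typically be obtained with Paths.get("").toAbsolutePath().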
Fix null pointer issue when retrieving meta implicit type hierarchy in RuleCache.
Querying for meta implicit relations, e.g. $x isa @has-attribute, results in a null pointer exception.
The reason is that, when determining which rules are applicable in this case, we look at the downstream type hierarchy of @has-attribute, including its explicit equivalents. However, the meta implicit relation doesn't have an explicit equivalent, so its db lookup returns null.
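The shape of the fix is a null-safe lookup of the explicit equivalent. The sketch below is an assumption about that shape rather than the RuleCache code itself; ImplicitTypeLookup, typesToCheck and the label map are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical lookup of the types to consider when matching rules for an implicit relation.
final class ImplicitTypeLookup {
    // explicitByImplicit: implicit relation label -> label of its explicit equivalent.
    static List<String> typesToCheck(String implicitLabel, Map<String, String> explicitByImplicit) {
        List<String> types = new ArrayList<>();
        types.add(implicitLabel);
        // Meta implicit relations have no explicit equivalent, so the lookup may return null.
        String explicit = explicitByImplicit.get(implicitLabel);
        if (explicit != null) {
            types.add(explicit);
        }
        return types;
    }
}
```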
Remove tx.stream cycles from reasoner.
Remove recursive round trips within the transaction stream method.
Fix issue with ids from negation block being propagated into the conjunction.
Fixing the following bug:
If we define the negation block to have ids, e.g.:
match $x isa someThing; not {$x id 'abc';} get
The id would get propagated to the non-negated part leading to incorrect answers being returned.
The bug was triggered by #5328 when we changed how inner states of CompositeState are built.
Exit with correct exit code on invalid args.
Fix #5389
Increase server and client storage frame size.
Increase frame size so we do not hit exceptions when doing reads.
Optimise instance check in ComputeExecutor.
Similarly to #5397, we want to remove recursive tx.stream calls. When streaming compute queries, some of them perform extra checks to verify whether any instances are present. These currently perform stream calls on a transaction leading to multiple tx.stream calls within a single transaction stream method. This leads to poor performance and potential cycling problems.
Skip searching concepts to persist if no inferred concepts exist.
We incur a substantial performance hit by searching the dependency tree of every concept that is written/queried in a match...insert. This can entirely be avoided if there are 0 inferred concepts live in this Transaction. The PR implements this slightly conservative check that should be a large help with data loading jobs using match...insert without any inference.
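The conservative shortcut amounts to an early return before the dependency search. The following sketch shows that shape under assumptions: InferredPersistence, the string concept ids and the dependency map are hypothetical stand-ins for the real transaction state.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical search for inferred concepts that written concepts depend on.
final class InferredPersistence {
    // dependencies: concept id -> ids of the concepts it depends on.
    // inferred: ids of the inferred concepts currently live in the transaction.
    static List<String> inferredToPersist(List<String> written,
                                          Map<String, List<String>> dependencies,
                                          Set<String> inferred) {
        List<String> result = new ArrayList<>();
        // Conservative shortcut: with no inferred concepts there is nothing to find.
        if (inferred.isEmpty()) return result;

        Deque<String> toVisit = new ArrayDeque<>(written);
        Set<String> seen = new HashSet<>();
        while (!toVisit.isEmpty()) {
            String id = toVisit.pop();
            if (!seen.add(id)) continue;
            if (inferred.contains(id)) result.add(id);
            toVisit.addAll(dependencies.getOrDefault(id, Collections.emptyList()));
        }
        return result;
    }
}
```

For data loading jobs that run match...insert without inference, the set of inferred concepts stays empty and the traversal is skipped on every write.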
Console clean abort and startup failure cleanup.
As noted in #5315 and #5314, the console does not provide a very nice error message when trying to connect to a non-existent Grakn server. Additionally, we always exit the console when using clean, even if no clean is actually performed.

Code Refactors

Extracted //console and //bin to separate repos.
We need to move the //console package into its own repository as it holds Grakn Console: an application that serves as an "interface" for users to interact with Grakn. Like Grakn Workbase and all Grakn Clients, Grakn Console should work for both Grakn Core and KGMS. Its source code and development should be independent of Grakn Core and KGMS codebase.
Given that both Grakn Core and Console depend on one binary (grakn.sh and grakn.bat) to start the JAR, we also need to move the //bin package to its own repository, on which Grakn Core and Console can both depend without creating a circular dependency.
Refactor definition and application of semantic difference.
Refactor the definition and application of semantic difference so that it is easier to understand and follows the processing flow more naturally.
Get rid of DisjunctionIterator.
We introduced the DisjunctionIterator to enforce sequentiality of processing of conjunctions to leverage query cache among conjunctions.
It creates an iterator for each component conjunction of the provided disjunction, and from these provides a top-level merging iterator for the whole query.
Since QueryCache now has a transaction lifetime, this extra handling is not needed. We can simply use sequential streams within the same transaction.
Additionally, the DisjunctionIterator was shielding us from problems with the flatMap stream operation (https://bugs.openjdk.java.net/browse/JDK-8075939). Since we introduced Vavr to handle that, this use case is also redundant.

Other Improvements

Test Delete With Limit Correctness.
As noted in #5388, we cannot limit delete queries, which leads to unexpected deletions occurring. For example, match...delete..; limit 5; is expected to only delete 5 concepts, not all of the things that were matched. This behavior was fixed in vaticle/typeql#78 and is tested here.
Add release repository to test-deployment-apt and test-deployment-rpm.
test-deployment-apt and test-deployment-rpm need to access the release repository. For example, when grakn depends on the common version 0.2.0, it is a released dependency and will exist in the release repository.

Fix flaky test related to type sharding in TransactionOLTPIT.
We've fixed a flaky test caused by attempting to get the oldest shard out of an unordered collection of shards. These were the two culprits in the test:
First culprit:

assertEquals(janusGraph.traversal().V()
    .has(Schema.VertexProperty.SCHEMA_LABEL.name(), "person").in().hasLabel("SHARD")
    .toSet().iterator().next(), // there were two shards at this point and we're attempting to get the oldest shard out of an unordered set. therefore, incorrect.
  typeShardForP1);

Second culprit:

assertEquals(
  Sets.difference(
    janusGraph.traversal().V().has(Schema.VertexProperty.SCHEMA_LABEL.name(), "person").in().hasLabel("SHARD").toSet(),
    Sets.newHashSet(typeShardForP1)
  ).iterator().next(),  // this is a set of two elements, and we're attempting to get the oldest shard out of an unordered set. therefore, incorrect.
  typeShardForP2
);
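One way for such a test to avoid depending on iteration order is to compare the shards as sets instead of picking the "first" element. The sketch below illustrates that approach and is not necessarily the fix applied in the PR; ShardAssertions and sameShards are hypothetical names.

```java
import java.util.Collection;
import java.util.HashSet;

// Hypothetical order-independent check for the shards attached to a type.
final class ShardAssertions {
    // True when both collections contain exactly the same shards, regardless of order.
    static <T> boolean sameShards(Collection<T> found, Collection<T> expected) {
        return new HashSet<>(found).equals(new HashSet<>(expected));
    }
}
```

A test written this way would assert that the shards found for "person" are exactly typeShardForP1 and typeShardForP2, without assuming which one the set iterator returns first.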
