Classes
class ZTB: A smart pointer representing the connection to a tuplebase.
class ZTBTxn: Combines references to a ZTxn and ZTB.
class ZTBIter: Allows one to walk over the set of tuples in a ZTB as described by a ZTBQuery.
class ZTBQuery: Describes the tuple selection operations to be applied to a tuplebase or tuplesoup.
class ZTBSpec: Represents criteria to be matched against tuples.
All accesses to a tuplebase, even simple reads, are made in the context of a transaction. Failure, if any, is reported when the transaction is committed. The power of this is that your code can treat the tuplebase as always being in a consistent state without any need to explicitly synchronize access. The same applies when your transaction needs to write to the tuplebase -- you can write code of arbitrary complexity, and when you commit either all of the work is made permanent and visible to the world or none of it is.
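The all-or-nothing behaviour can be pictured with a small stand-in. This is only a sketch and not ZooLib code: ToyStore and ToyTxn are hypothetical names, values are plain strings rather than tuples, and the conflict detection that would let a real commit report failure is left out entirely.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for tuple storage: ID -> value.
typedef std::map<std::uint64_t, std::string> ToyStore;

// Hypothetical stand-in for a transaction. Writes are buffered and only
// reach the store on Commit; conflict detection is deliberately left out.
class ToyTxn {
public:
    explicit ToyTxn(ToyStore& ioStore) : fStore(ioStore) {}

    // Reads see this transaction's own pending writes first.
    std::string Get(std::uint64_t iID) const {
        ToyStore::const_iterator i = fPending.find(iID);
        if (i != fPending.end())
            return i->second;
        const ToyStore& theStore = fStore;
        i = theStore.find(iID);
        return i == theStore.end() ? std::string() : i->second;
    }

    // Writes are invisible to the store until Commit is called.
    void Set(std::uint64_t iID, const std::string& iValue) {
        fPending[iID] = iValue;
    }

    // Publish every buffered write in one step.
    void Commit() {
        for (ToyStore::const_iterator i = fPending.begin(); i != fPending.end(); ++i)
            fStore[i->first] = i->second;
        fPending.clear();
    }

    // Discard every buffered write; the store is untouched.
    void Abort() {
        fPending.clear();
    }

private:
    ToyStore& fStore;
    ToyStore fPending;
};
```

Buffering writes until commit is just one simple way to get the "all or none" property in a toy; the text does not say how a real tuplebase achieves it.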
A tuplebase is represented in your code by a ZTB object. ZTB is a smart pointer with value semantics, so it can be passed by value or by reference and can be assigned from and to. Under the hood there is an instance of ZTBRep, of which there are several concrete subclasses that use a blockstore or RAM to store tuples, or use a network connection to a remote tuplebase. These differences do not affect application-level code.
The examples that follow assume that a ZTxn reference has been passed in a parameter called iTxn, and that a ZTB reference has been passed in a parameter called iTB.
The simplest and least interesting way to access tuples in a tuplebase is by using ZTB::Get and ZTB::Set, treating it as a very large array. We retrieve the tuple stored at ID 27 thus:
ZTuple theTuple = iTB.Get(iTxn, 27);
theTuple is just a regular tuple, an independent copy of what is stored in the tuplebase from which it came (although the copy-on-write representation sharing that ZTuple uses means there's often no actual cost to the copy). We can make changes to theTuple, assigning to it or from it. When the time comes to write our changes we do this:
iTB.Set(iTxn, 27, theTuple);
In the preceding example we presumed the existence of a tuple with ID 27. A problem? No. A tuplebase is always fully populated. You can always read an arbitrary ID, just as you can always write to an arbitrary ID. Consequently there is no concept of 'creating' or 'deleting' a tuple in a tuplestore. However we do need to arbitrate access to the ID space. We want to be able to write tuples into slots that are guaranteed never to have been used previously. That's why we use 64 bit IDs -- they're large enough to be considered inexhaustible. ZTB::AllocateID returns a 64 bit ID that has never been returned previously, nor ever will be again. For convenience ZTB::Add both allocates an ID and stores the passed-in tuple under that ID.
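The "always fully populated" idea is easy to model in miniature. The following is a sketch under stated assumptions and not the ZooLib implementation: ToyTuple and ToyTB are hypothetical names, a tuple is reduced to a map of string properties, and transactions are ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for ZTuple: a bag of string properties.
typedef std::map<std::string, std::string> ToyTuple;

// Hypothetical stand-in for a tuplebase, ignoring transactions entirely.
class ToyTB {
public:
    ToyTB() : fNextID(1) {}

    // Every ID can be read; an ID never written yields an empty tuple.
    ToyTuple Get(std::uint64_t iID) const {
        std::map<std::uint64_t, ToyTuple>::const_iterator i = fTuples.find(iID);
        return i == fTuples.end() ? ToyTuple() : i->second;
    }

    // Every ID can be written; an empty tuple simply occupies no storage.
    void Set(std::uint64_t iID, const ToyTuple& iTuple) {
        if (iTuple.empty())
            fTuples.erase(iID);
        else
            fTuples[iID] = iTuple;
    }

    // Hands out an ID that this object has never returned before.
    // (The toy does not guard against explicit Set calls on IDs
    // that have not been allocated yet.)
    std::uint64_t AllocateID() {
        return fNextID++;
    }

    // Convenience: allocate a fresh ID and store the tuple under it.
    std::uint64_t Add(const ToyTuple& iTuple) {
        const std::uint64_t theID = AllocateID();
        Set(theID, iTuple);
        return theID;
    }

private:
    std::uint64_t fNextID;
    std::map<std::uint64_t, ToyTuple> fTuples;
};
```

Reading an untouched slot and reading a slot that holds an empty tuple are indistinguishable here, which is why the toy needs no notion of creating or deleting.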
So we've got a humungous array of tuples. This in itself can be useful, perhaps as a way to store tuple-ized object trees. More interesting is treating it as a database, but to see how to do so we need some more building blocks.
ZTBSpec provides a way to specify tuples. An instance of ZTBSpec is initialized with criteria or by being the combination of other specifications. Then ZTBSpec::Matches can be used to determine if a particular tuple matches that specification.
For example, the spec that will match all tuples whose property named "Prop" has the string value "Value" can be constructed thus, using the static pseudo-constructor ZTBSpec::sEquals:
ZTBSpec equalsSpec = ZTBSpec::sEquals("Prop", "Value");
or equivalently with the regular constructor:
ZTBSpec equalsSpec("Prop", ZTBSpec::eRel_Equal, "Value");
Similarly, a spec that matches all tuples whose property "OtherProp" is an int32 less than 100:
ZTBSpec lessSpec = ZTBSpec::sLess("OtherProp", int32(100));
ZTBSpec instances can be combined with the & and | operators, so this expression would match all tuples satisfying either of the prior specifications:
ZTBSpec eitherSpec = equalsSpec | lessSpec;
and this one would match only tuples satisfying both:
ZTBSpec bothSpec = equalsSpec & lessSpec;
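To make the combination rules concrete, here is a minimal sketch of specifications as composable predicates. It is not how ZTBSpec is implemented: ToyValue, ToyTuple and ToySpec are hypothetical names, and values are limited to an int or a string.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-in for a property value: either an int or a string.
struct ToyValue {
    bool isInt;
    int number;
    std::string text;
    ToyValue() : isInt(false), number(0) {}
    ToyValue(int iNumber) : isInt(true), number(iNumber) {}
    ToyValue(const char* iText) : isInt(false), number(0), text(iText) {}
};

typedef std::map<std::string, ToyValue> ToyTuple;

// Hypothetical stand-in for ZTBSpec: wraps a predicate over tuples.
class ToySpec {
public:
    typedef std::function<bool(const ToyTuple&)> Pred;
    explicit ToySpec(const Pred& iPred) : fPred(iPred) {}

    // Matches tuples whose property iName has the same type and value as iValue.
    static ToySpec sEquals(const std::string& iName, const ToyValue& iValue) {
        return ToySpec([iName, iValue](const ToyTuple& iTuple) -> bool {
            ToyTuple::const_iterator i = iTuple.find(iName);
            return i != iTuple.end()
                && i->second.isInt == iValue.isInt
                && i->second.number == iValue.number
                && i->second.text == iValue.text;
        });
    }

    // Matches tuples whose property iName is an int smaller than iLimit.
    static ToySpec sLess(const std::string& iName, int iLimit) {
        return ToySpec([iName, iLimit](const ToyTuple& iTuple) -> bool {
            ToyTuple::const_iterator i = iTuple.find(iName);
            return i != iTuple.end() && i->second.isInt && i->second.number < iLimit;
        });
    }

    bool Matches(const ToyTuple& iTuple) const { return fPred(iTuple); }

private:
    Pred fPred;
};

// A tuple matches the combined spec only if it matches both operands.
ToySpec operator&(const ToySpec& iA, const ToySpec& iB) {
    return ToySpec([iA, iB](const ToyTuple& iTuple) -> bool {
        return iA.Matches(iTuple) && iB.Matches(iTuple);
    });
}

// A tuple matches the combined spec if it matches either operand.
ToySpec operator|(const ToySpec& iA, const ToySpec& iB) {
    return ToySpec([iA, iB](const ToyTuple& iTuple) -> bool {
        return iA.Matches(iTuple) || iB.Matches(iTuple);
    });
}
```

A tuple with "Prop" equal to "Value" but "OtherProp" equal to 250 would match the | combination and fail the & combination.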
ZTBSpec gives us what we need to describe tuples that we're interested in. ZTBIter encapsulates the notion of applying such a specification to a tuplebase and iterating through all the tuples that match. Of course we also need to have a ZTxn instance to specify the context in which the access will be performed.
The ZTBIter constructor thus takes a ZTxn, a ZTB and a ZTBQuery. The ZTBQuery can itself be constructed from a ZTBSpec, and its additional capabilities will be covered a little later.
To iterate through all the tuples matching bothSpec from above:
for (ZTBIter theIter(iTxn, iTB, bothSpec); theIter; theIter.Advance())
    {
    ZTuple aTuple = theIter.Get();
    // Do something with aTuple.
    }
ZTBIter has value semantics, so it can be assigned to or from another iterator, can be kept in instance variables, passed as a parameter to functions etc. Under the hood, copy-on-write makes it virtually zero cost to pass around instances of ZTBIter. However once the transaction with which it was initialized has been committed or aborted the iterator will become invalid, and currently it becomes unsafe to use (except to destroy).
Being able to access that subset of the tuples in a tuplebase that match a specification is very useful. However real-world use often requires that property values from found tuples be used as the criteria for another level of search, somewhat like an SQL join and select. It's straightforward to structure application code as a loop that walks an iterator, using properties from visited tuples as the criteria for an iterator walked by a nested loop. For the following examples we'll assume we have a tuplebase with the following tuples, each preceded by its ID:
1: { Kind = "Organization"; "Name" = "SomeCompany" }
2: { Kind = "Organization"; "Name" = "OtherCompany" }
10: { Kind = "Person"; "Name" = "Fred"; Organization = ID(1); }
11: { Kind = "Person"; "Name" = "Bill"; Organization = ID(1); }
12: { Kind = "Person"; "Name" = "Jack"; Organization = ID(2); }
13: { Kind = "Person"; "Name" = "Jill"; Organization = ID(2); }
14: { Kind = "Person"; "Name" = "John"; Organization = ID(3); }
15: { Kind = "Equipment"; "Model" = "Fujitsu"; Organization = ID(2); }
ZTBSpec orgSpec = ZTBSpec::sEquals("Kind", "Organization");
for (ZTBIter orgIter(iTxn, iTB, orgSpec); orgIter; orgIter.Advance())
    {
    uint64 orgID = orgIter.GetID();
    ZTBSpec entitySpec = ZTBSpec::sEquals("Organization", orgID);
    for (ZTBIter entityIter(iTxn, iTB, entitySpec); entityIter; entityIter.Advance())
        {
        ZTuple entityTuple = entityIter.Get();
        // Do something with the tuple.
        }
    }
By walking the outer iterator and using returned values to initialize the inner iterator's query we're effectively using C++ as our query language. The preceding example is a bit verbose, and could have been simplified somewhat as follows:
for (ZTBIter orgIter(iTxn, iTB, ZTBSpec::sEquals("Kind", "Organization")); orgIter; orgIter.Advance())
    {
    for (ZTBIter entityIter(iTxn, iTB, ZTBSpec::sEquals("Organization", orgIter.GetID())); entityIter; entityIter.Advance())
        {
        ZTuple entityTuple = entityIter.Get();
        // Do something with the tuple.
        }
    }
But the real problem is with the line "// Do something with the tuple." Given that C++ does not support closures (functors and function pointers notwithstanding), how could we parameterize that line? Well, we can't. But we can represent the nested searches thus:
ZTBQuery orgQuery = ZTBSpec::sEquals("Kind", "Organization");
ZTBQuery entityQuery("Organization", orgQuery);
for (ZTBIter iter(iTxn, iTB, entityQuery); iter; iter.Advance())
    {
    ZTuple entityTuple = iter.Get();
    // Do something with the tuple.
    }
Here orgQuery represents those tuples whose property "Kind" has the value "Organization". And entityQuery represents those tuples whose property "Organization" is of type ID and matches any of the IDs of tuples from orgQuery. The code is not really any shorter, but it does have two points at which we can parameterize things. We can take entityQuery and return it as the result of a function, store it in an instance variable or pass it to a function. It's an abstract representation of the nested loops from earlier, and can be applied against any ZTxn/ZTB pair. Or we can take the initialized ZTBIter object and pass it off, return it or store it. Let's turn our example into a factory function that returns a ZTBQuery:
ZTBQuery QueryFactory()
    {
    ZTBQuery orgQuery = ZTBSpec::sEquals("Kind", "Organization");
    return ZTBQuery("Organization", orgQuery);
    }
This returns the ZTBQuery that represents "entities that have an extant organization". To further restrict the results to include only people (i.e. tuples whose "Kind" property has the value "Person"):
ZTBQuery theQuery = QueryFactory();
theQuery &= ZTBSpec::sEquals("Kind", "Person");
Run against our example tuples, this drops ID 15, because its "Kind" property has the value "Equipment".
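The idea of a query as a value that can be built, narrowed and only later applied to a tuplebase can be sketched with ordinary function objects. None of this is ZooLib code: ToyValue, ToyTB, ToyQuery, sRefersTo, sBoth and MakeSampleTB are hypothetical names, and the toy evaluates everything by brute force, with no indices and no transactions. MakeSampleTB rebuilds the example tuples listed earlier.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-ins. A property is either a string or an ID.
struct ToyValue {
    bool isID;
    std::uint64_t id;
    std::string text;
    ToyValue() : isID(false), id(0) {}
    ToyValue(const char* iText) : isID(false), id(0), text(iText) {}
    static ToyValue sID(std::uint64_t iID) { ToyValue v; v.isID = true; v.id = iID; return v; }
};
typedef std::map<std::string, ToyValue> ToyTuple;
typedef std::map<std::uint64_t, ToyTuple> ToyTB;
typedef std::set<std::uint64_t> IDSet;

// A toy query is a value that, applied to a tuplebase, yields matching IDs.
typedef std::function<IDSet(const ToyTB&)> ToyQuery;

// Tuples whose string property iName equals iText.
ToyQuery sEquals(const std::string& iName, const std::string& iText) {
    return [iName, iText](const ToyTB& iTB) -> IDSet {
        IDSet result;
        for (ToyTB::const_iterator i = iTB.begin(); i != iTB.end(); ++i) {
            ToyTuple::const_iterator p = i->second.find(iName);
            if (p != i->second.end() && !p->second.isID && p->second.text == iText)
                result.insert(i->first);
        }
        return result;
    };
}

// Tuples whose ID-typed property iName refers to a tuple selected by iSub.
ToyQuery sRefersTo(const std::string& iName, const ToyQuery& iSub) {
    return [iName, iSub](const ToyTB& iTB) -> IDSet {
        const IDSet targets = iSub(iTB);
        IDSet result;
        for (ToyTB::const_iterator i = iTB.begin(); i != iTB.end(); ++i) {
            ToyTuple::const_iterator p = i->second.find(iName);
            if (p != i->second.end() && p->second.isID && targets.count(p->second.id))
                result.insert(i->first);
        }
        return result;
    };
}

// Intersection: tuples selected by both queries.
ToyQuery sBoth(const ToyQuery& iA, const ToyQuery& iB) {
    return [iA, iB](const ToyTB& iTB) -> IDSet {
        const IDSet a = iA(iTB), b = iB(iTB);
        IDSet result;
        for (IDSet::const_iterator i = a.begin(); i != a.end(); ++i)
            if (b.count(*i))
                result.insert(*i);
        return result;
    };
}

// The sample tuples from the text.
ToyTB MakeSampleTB() {
    ToyTB tb;
    tb[1]["Kind"] = "Organization"; tb[1]["Name"] = "SomeCompany";
    tb[2]["Kind"] = "Organization"; tb[2]["Name"] = "OtherCompany";
    const char* names[] = { "Fred", "Bill", "Jack", "Jill", "John" };
    const std::uint64_t orgs[] = { 1, 1, 2, 2, 3 };
    for (int i = 0; i < 5; ++i) {
        ToyTuple& person = tb[10 + i];
        person["Kind"] = "Person";
        person["Name"] = names[i];
        person["Organization"] = ToyValue::sID(orgs[i]);
    }
    tb[15]["Kind"] = "Equipment"; tb[15]["Model"] = "Fujitsu";
    tb[15]["Organization"] = ToyValue::sID(2);
    return tb;
}

// Builds the narrowed query first, then applies it to a tuplebase.
IDSet PeopleWithExtantOrganization(const ToyTB& iTB) {
    ToyQuery entityQuery = sRefersTo("Organization", sEquals("Kind", "Organization"));
    ToyQuery peopleQuery = sBoth(entityQuery, sEquals("Kind", "Person"));
    return peopleQuery(iTB);
}
```

In this toy the unrestricted entity query selects IDs 10, 11, 12, 13 and 15 (ID 14 refers to organization 3, which has no tuple), and the narrowed query selects IDs 10 through 13.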
The other advantage to the use of a complex ZTBQuery over manual iteration is that it is possible for the ZTBQuery to be shipped over the wire to a remote server for execution close to the tuplebase, thus removing the latency that would be incurred on each construction of a ZTBIter by nested loops. And because the whole of the query is available to the tuplebase it can be examined in its entirety and the work needed to generate results can be optimized. The disadvantage is that code using a complex query sees only the tuples that would be returned by the innermost loop of a manual iteration. If the higher level tuples are needed then a hybrid approach can be used.
The tuplebase may be configured to maintain indices of the values of tuples, in which case walking an iterator can be very efficient. If no suitable index exists then the iterator will still work, but it may require that every non-empty tuple be visited. Configuring indices on a tuplebase is a system administration job, and updating the suite of indices is something that should be informed by viewing log information associated with a tuplebase, or by knowledge of the actual usage patterns of a tuplebase by code known to be executing against it.
We've already seen that a ZTBQuery can be initialized from a ZTBSpec, and a ZTBIter initialized from such a ZTBQuery will return all the tuples that match the specification. The other simple instantiations of a ZTBQuery take an ID or a list of IDs in a vector, set or pointer and count. A ZTBIter initialized from one of the following queries would simply return the tuples with the specified IDs.
ZTBQuery queryFromID(27); // Single ID

vector<uint64> theVector;
ZTBQuery queryFromVector(theVector);

set<uint64> theSet;
ZTBQuery queryFromSet(theSet);

uint64 theIDs[] = { 1, 7, 11, 13, 27 };
ZTBQuery queryFromArray(theIDs, 5); // Pointer and count
More complex queries can be formed by intersecting a ZTBSpec with a ZTBQuery thus:
theQuery & theSpec
This represents those tuples that would be returned by theQuery which also satisfy the specification theSpec. It might be that theQuery is already simply a search for tuples matching certain criteria, in which case the criteria represented by theSpec are added in as a further constraint. Or it might be that theQuery is highly complex and cannot simply have a specification applied to it, in which case theSpec will be used to filter the results that are returned. In any case it's not the application's concern; the underlying mechanisms will take care of doing the most efficient job possible based on the details both of the query and of what data actually exists in the tuplebase and how any indices that may exist can be used.
Similarly one can union a pair of queries thus:
queryA | queryB
which represents those tuples that would be returned by queryA plus those that would be returned by queryB. The tuplebase implementation will take care of actually collapsing both queries into a single physical search if that is possible.
Our earlier examples also showed how one can use the result set of one subquery to provide values to be fed into another:
ZTBQuery theQuery("Organization", orgQuery);
This represents those tuples whose property "Organization" is of type ID and matches the ID of a tuple returned by orgQuery. Compare it to the similar ZTBSpec constructor syntax. The opposite order:
ZTBQuery theQuery(entityQuery, "Organization");
represents those tuples whose IDs are held in the property "Organization" of tuples returned by entityQuery. If in our sample tuplebase entityQuery represents the "Person" and "Equipment" tuples (IDs 10 through 15), then theQuery represents the organizations of those entities.
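The two argument orders can be contrasted on the sample data with a much-reduced model. This is a sketch, not ZooLib code: OrgRefs, EntitiesReferringTo, IDsReferencedBy and MakeSampleRefs are hypothetical names, and each entity is boiled down to the ID stored in its "Organization" property. The text does not say whether the empty tuple at ID 3 would be reported by a real tuplebase, so the sketch simply returns the referenced IDs.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

// Hypothetical reduced model: entity ID -> ID held in its "Organization" property.
typedef std::map<std::uint64_t, std::uint64_t> OrgRefs;
typedef std::set<std::uint64_t> IDSet;

// Same direction as ZTBQuery("Organization", orgQuery):
// the entities whose reference lands on one of iOrgIDs.
IDSet EntitiesReferringTo(const OrgRefs& iRefs, const IDSet& iOrgIDs) {
    IDSet result;
    for (OrgRefs::const_iterator i = iRefs.begin(); i != iRefs.end(); ++i)
        if (iOrgIDs.count(i->second))
            result.insert(i->first);
    return result;
}

// Same direction as ZTBQuery(entityQuery, "Organization"):
// the IDs that the entities in iEntityIDs refer to.
IDSet IDsReferencedBy(const OrgRefs& iRefs, const IDSet& iEntityIDs) {
    IDSet result;
    for (OrgRefs::const_iterator i = iRefs.begin(); i != iRefs.end(); ++i)
        if (iEntityIDs.count(i->first))
            result.insert(i->second);
    return result;
}

// The "Organization" references from the sample tuples (IDs 10 through 15).
OrgRefs MakeSampleRefs() {
    OrgRefs refs;
    refs[10] = 1;
    refs[11] = 1;
    refs[12] = 2;
    refs[13] = 2;
    refs[14] = 3;
    refs[15] = 2;
    return refs;
}
```

Starting from organizations 1 and 2, the first function yields entities 10, 11, 12, 13 and 15; starting from entities 10 through 15, the second yields the referenced IDs 1, 2 and 3.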