Tuplebase


Classes
class	ZTB
	A smart pointer representing the connection to a tuplebase.
class	ZTBTxn
	Combines references to a ZTxn and ZTB.
class	ZTBIter
	Allows one to walk over the set of tuples in a ZTB as described by a ZTBQuery.
class	ZTBQuery
	Describes the tuple selection operations to be applied to a tuplebase or tuplesoup.
class	ZTBSpec
	Represents criteria to be matched against tuples.

Detailed Description

See also: Tuplestore

A tuplebase is an array of 2^64 tuples. It is always fully populated, so every index of the array contains a tuple. When first created, a tuplebase contains 2^64 empty tuples (tuples with no properties), which is why you don't need 2^64 * x bytes of storage to create a tuplebase -- empty tuples don't need to actually be stored. The 64 bit index of each tuple is unique and permanent, and is more usually called its ID. Individual tuples are accessed by ID and, more usefully, sets of tuples can be accessed by using iterators that return tuples matching criteria.
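The "fully populated but mostly unstored" idea can be pictured with a small sketch. This is a toy model only, not ZooLib code: ToyTuple, ToyTupleArray and ExampleSparseArray are hypothetical names, and a std::map stands in for whatever storage a real tuplebase uses. Any ID with nothing stored simply reads back as an empty tuple.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a tuple: named string properties only.
using ToyTuple = std::map<std::string, std::string>;

// Toy model of an array of 2^64 tuples in which only the non-empty
// tuples occupy storage. Not ZooLib's implementation.
class ToyTupleArray {
public:
    // Reading never fails: an ID with nothing stored yields an empty tuple.
    ToyTuple Get(std::uint64_t iID) const {
        const auto i = fStored.find(iID);
        return i == fStored.end() ? ToyTuple() : i->second;
    }

    // Storing an empty tuple releases the slot's storage.
    void Set(std::uint64_t iID, const ToyTuple& iTuple) {
        if (iTuple.empty())
            fStored.erase(iID);
        else
            fStored[iID] = iTuple;
    }

    // Number of tuples that actually occupy storage.
    std::size_t StoredCount() const { return fStored.size(); }

private:
    std::map<std::uint64_t, ToyTuple> fStored;
};

// Usage: every ID is readable, but only written IDs cost anything.
void ExampleSparseArray() {
    ToyTupleArray theArray;
    assert(theArray.Get(27).empty());
    assert(theArray.StoredCount() == 0);

    theArray.Set(27, ToyTuple{{"Name", "Fred"}});
    assert(theArray.StoredCount() == 1);
}
```

In this model, writing an empty tuple back to an ID is the nearest thing to a delete: the slot still exists and can still be read, it just no longer needs storage.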

All accesses to a tuplebase, even simple reads, are made in the context of a transaction. Any failure is reported when the transaction is committed. The power of this is that your code can treat the tuplebase as always being in a consistent state without any need to explicitly synchronize access. The same applies when your transaction needs to write to the tuplebase -- you can write code of arbitrary complexity, and when you commit either all of the work is made permanent and visible to the world or none of it is.
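The all-or-nothing behavior can be illustrated with a deliberately crude sketch. Everything here is an assumption made for illustration: ToyStore and ToyTxn are hypothetical names, values are plain strings, and the conflict check (compare the whole store against a snapshot taken when the transaction began) is chosen for brevity. It says nothing about how ZTxn or ZTB actually detect conflicts.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical shared store: ID -> value.
using ToyStore = std::map<std::uint64_t, std::string>;

// Toy transaction: writes are staged privately and reach the shared
// store only if Commit succeeds. Not ZooLib's mechanism.
class ToyTxn {
public:
    explicit ToyTxn(ToyStore& ioStore)
    : fStore(ioStore), fSnapshot(ioStore) {}

    // Reads see this transaction's own staged writes, then the snapshot.
    std::string Get(std::uint64_t iID) const {
        const auto staged = fStaged.find(iID);
        if (staged != fStaged.end())
            return staged->second;
        const auto i = fSnapshot.find(iID);
        return i == fSnapshot.end() ? std::string() : i->second;
    }

    void Set(std::uint64_t iID, const std::string& iValue) {
        fStaged[iID] = iValue;
    }

    // Applies every staged write, or none. Returns false if the store
    // changed after this transaction began.
    bool Commit() {
        assert(!fFinished); // a transaction is committed at most once
        fFinished = true;
        if (fStore != fSnapshot)
            return false;
        for (const auto& entry : fStaged)
            fStore[entry.first] = entry.second;
        return true;
    }

private:
    ToyStore& fStore;
    const ToyStore fSnapshot;
    ToyStore fStaged;
    bool fFinished = false;
};
```

Code working inside a ToyTxn never sees another transaction's half-finished work, and it learns about a conflict only from the result of Commit, which mirrors the shape of the guarantee described above.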

A tuplebase is represented in your code by a ZTB object. ZTB is a smart pointer with value semantics, so it can be passed by value or by reference and can be assigned from and to. Under the hood there is an instance of ZTBRep, of which there are several concrete subclasses that use a blockstore or RAM to store tuples, or use a network connection to a remote tuplebase. These differences do not affect application-level code.

Note: In many of these examples I'll assume that a reference to a ZTxn instance has been passed in a parameter called iTxn, and that a ZTB reference has been passed in a parameter called iTB.

Access via ID

The simplest and least interesting way to access tuples in a tuplebase is by using ZTB::Get and ZTB::Set, treating it as a very large array. We retrieve the tuple stored at ID 27 thus:

ZTuple theTuple = iTB.Get(iTxn, 27);

theTuple is just a regular tuple; it is independent of the tuplebase from which it came (although the copy-on-write representation sharing that ZTuple uses means there's often no actual cost). We can make changes to theTuple, assigning to it or from it. When the time comes to write our changes we do this:

iTB.Set(iTxn, 27, theTuple);

No create or delete

In the preceding example we presumed the existence of a tuple with ID 27. A problem? No. A tuplebase is always fully populated. You can always read an arbitrary ID, just as you can always write to an arbitrary ID. Consequently there is no concept of 'creating' or 'deleting' a tuple in a tuplebase. However we do need to arbitrate access to the ID space. We want to be able to write tuples into slots that are guaranteed never to have been used previously. That's why we use 64 bit IDs -- they're large enough to be considered inexhaustible. ZTB::AllocateID returns a 64 bit ID that has never been returned previously, nor ever will be again. For convenience ZTB::Add both allocates an ID and stores the passed-in tuple under that ID.
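The relationship between allocating an ID and adding a tuple can be sketched as follows. This is a toy model under stated assumptions, not the ZTB interface: ToyTuple and ToyBase are hypothetical, the methods take no transaction argument, and the counter lives in memory, whereas a real tuplebase would have to persist it for "never returned previously" to hold across restarts.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <map>
#include <string>

// Hypothetical stand-in for a tuple.
using ToyTuple = std::map<std::string, std::string>;

class ToyBase {
public:
    // Hands out IDs from a counter that only moves forward, so no value
    // is ever returned twice during the life of this object.
    std::uint64_t AllocateID() {
        // 2^64 IDs are treated as inexhaustible; this check documents that.
        assert(fNextID < std::numeric_limits<std::uint64_t>::max());
        return fNextID++;
    }

    void Set(std::uint64_t iID, const ToyTuple& iTuple) {
        fStored[iID] = iTuple;
    }

    ToyTuple Get(std::uint64_t iID) const {
        const auto i = fStored.find(iID);
        return i == fStored.end() ? ToyTuple() : i->second;
    }

    // Convenience: allocate a fresh ID and store the tuple under it.
    std::uint64_t Add(const ToyTuple& iTuple) {
        const std::uint64_t theID = AllocateID();
        Set(theID, iTuple);
        return theID;
    }

private:
    std::uint64_t fNextID = 1;
    std::map<std::uint64_t, ToyTuple> fStored;
};
```

Add is nothing more than AllocateID followed by Set. Nothing stops a caller from writing to an ID it chose itself; allocation only guarantees that the slot it hands out has not been handed out before.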

So we've got a humungous array of tuples. This in itself can be useful, perhaps as a way to store tuple-ized object trees. More interesting is treating it as a database, but to see how to do so we need some more building blocks.

Specifications

ZTBSpec provides a way to specify tuples. An instance of ZTBSpec is initialized with criteria or by being the combination of other specifications. Then ZTBSpec::Matches can be used to determine if a particular tuple matches that specification.

For example, the spec that will match all tuples whose property named "Prop" has the string value "Value" can be constructed thus, using the static pseudo-constructor ZTBSpec::sEquals:

ZTBSpec equalsSpec = ZTBSpec::sEquals("Prop", "Value");

or by explicitly passing the relationship:

ZTBSpec equalsSpec("Prop", ZTBSpec::eRel_Equal, "Value");

The property name parameter is of course always a string, and should be considered to be UTF-8, although at this stage it's really just a bunch of bytes. The value parameter can be any ZTupleValue. So to specify tuples whose property "OtherProp" is an int32 less than 100:

ZTBSpec lessSpec = ZTBSpec::sLess("OtherProp", int32(100));

ZTBSpec instances can be combined with the & and | operators, so this expression would match all tuples satisfying either of the prior specifications:

ZTBSpec eitherSpec = equalsSpec | lessSpec;

and to specify tuples that match both:

ZTBSpec bothSpec = equalsSpec & lessSpec;
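To make the meaning of & and | concrete, here is a minimal sketch of specification combining. ToyTuple and ToySpec are hypothetical, only an equality criterion is modelled, and each spec is an opaque predicate. That is an assumption made to keep the sketch short; it shows what the combined specs match, not how ZTBSpec represents them.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a tuple: named string properties only.
using ToyTuple = std::map<std::string, std::string>;

class ToySpec {
public:
    using Pred = std::function<bool(const ToyTuple&)>;

    explicit ToySpec(Pred iPred) : fPred(std::move(iPred)) {
        assert(fPred); // a spec must carry a callable predicate
    }

    // Matches tuples whose property iName has exactly the value iValue.
    static ToySpec sEquals(const std::string& iName, const std::string& iValue) {
        return ToySpec([=](const ToyTuple& iTuple) {
            const auto i = iTuple.find(iName);
            return i != iTuple.end() && i->second == iValue;
        });
    }

    bool Matches(const ToyTuple& iTuple) const { return fPred(iTuple); }

    // Both operands must match.
    friend ToySpec operator&(const ToySpec& iL, const ToySpec& iR) {
        return ToySpec([iL, iR](const ToyTuple& iTuple) {
            return iL.Matches(iTuple) && iR.Matches(iTuple);
        });
    }

    // Either operand may match.
    friend ToySpec operator|(const ToySpec& iL, const ToySpec& iR) {
        return ToySpec([iL, iR](const ToyTuple& iTuple) {
            return iL.Matches(iTuple) || iR.Matches(iTuple);
        });
    }

private:
    Pred fPred;
};
```

A tuple { Prop = "Value" } with no other properties matches the | combination of sEquals("Prop", "Value") with any second spec, but matches the & combination only if it satisfies the second spec as well.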

Iterators

ZTBSpec gives us what we need to describe tuples that we're interested in. ZTBIter encapsulates the notion of applying such a specification to a tuplebase and iterating through all the tuples that match. Of course we also need to have a ZTxn instance to specify the context in which the access will be performed.

The ZTBIter constructor thus takes a ZTxn, a ZTB and a ZTBQuery. The ZTBQuery can itself be constructed from a ZTBSpec, and its additional capabilities will be covered a little later.

To iterate through all the tuples matching bothSpec from above:

for (ZTBIter theIter(iTxn, iTB, bothSpec); theIter; theIter.Advance())
        {
        ZTuple aTuple = theIter.Get();
        // Do something with aTuple.
        }

Here ZTBIter::operator_bool_type() is being checked before each iteration occurs. It continues to return true until we run off the end of the result set. ZTBIter::Advance updates the iterator to reference the next matching tuple, or changes its state so that ZTBIter::operator_bool_type() will return false, indicating that there are no more tuples to examine.

ZTBIter has value semantics, so it can be assigned to or from another iterator, can be kept in instance variables, passed as a parameter to functions etc. Under the hood, copy-on-write makes it virtually zero cost to pass around instances of ZTBIter. However once the transaction with which it was initialized has been committed or aborted the iterator will become invalid, and currently it becomes unsafe to use (except to destroy).
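The loop protocol used above (test the iterator, use it, call Advance) can be sketched with a toy iterator. ToyBase and ToyIter are hypothetical names, each tuple is reduced to a single string, and the iterator scans every stored entry. A real tuplebase may use indices and copy-on-write sharing that this sketch does not attempt to model.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical tuplebase: ID -> the tuple's "Kind" value.
using ToyBase = std::map<std::uint64_t, std::string>;

class ToyIter {
public:
    using Pred = std::function<bool(const std::string&)>;

    // iBase must outlive the iterator and any copies of it.
    ToyIter(const ToyBase& iBase, Pred iPred)
    : fCur(iBase.begin()), fEnd(iBase.end()), fPred(std::move(iPred)) {
        SkipNonMatching();
    }

    // True while the iterator references a matching tuple.
    explicit operator bool() const { return fCur != fEnd; }

    // Moves to the next matching tuple, or off the end of the results.
    void Advance() {
        assert(fCur != fEnd); // advancing an exhausted iterator is a bug
        ++fCur;
        SkipNonMatching();
    }

    std::uint64_t GetID() const { return fCur->first; }
    const std::string& Get() const { return fCur->second; }

private:
    void SkipNonMatching() {
        while (fCur != fEnd && !fPred(fCur->second))
            ++fCur;
    }

    ToyBase::const_iterator fCur;
    ToyBase::const_iterator fEnd;
    Pred fPred;
};
```

A loop written as for (ToyIter i(base, pred); i; i.Advance()) has the same shape as the ZTBIter loop above. Copies of a ToyIter advance independently, which is the observable effect of value semantics.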

More complex queries

Being able to access that subset of the tuples in a tuplebase that match a specification is very useful. However real-world use often requires that property values from found tuples be used as the criteria for another level of search, somewhat like an SQL join and select. It's straightforward to structure application code as a loop that walks an iterator, using properties from visited tuples as the criteria for an iterator walked by a nested loop. For the following examples we'll assume we have a tuplebase with the following tuples, each preceded by its ID:

 1: { Kind = "Organization"; "Name" = "SomeCompany" }
 2: { Kind = "Organization"; "Name" = "OtherCompany" }

10: { Kind = "Person"; "Name" = "Fred"; Organization = ID(1); }
11: { Kind = "Person"; "Name" = "Bill"; Organization = ID(1); }
12: { Kind = "Person"; "Name" = "Jack"; Organization = ID(2); }
13: { Kind = "Person"; "Name" = "Jill"; Organization = ID(2); }
14: { Kind = "Person"; "Name" = "John"; Organization = ID(3); }

15: { Kind = "Equipment"; "Model" = "Fujitsu"; Organization = ID(2); }

To examine each tuple that has an extant organization (IDs 10 through 15, excepting 14):

ZTBSpec orgSpec = ZTBSpec::sEquals("Kind", "Organization");
for (ZTBIter orgIter(iTxn, iTB, orgSpec); orgIter; orgIter.Advance())
        {
        uint64 orgID = orgIter.GetID();
        ZTBSpec entitySpec = ZTBSpec::sEquals("Organization", orgID);
        for (ZTBIter entityIter(iTxn, iTB, entitySpec); entityIter; entityIter.Advance())
                {
                ZTuple entityTuple = entityIter.Get();
                // Do something with the tuple.
                }
        }

By walking the outer iterator and using returned values to initialize the inner iterator's query we're effectively using C++ as our query language. The preceding example is a bit verbose, and could have been simplified somewhat as follows:

for (ZTBIter orgIter(iTxn, iTB, ZTBSpec::sEquals("Kind", "Organization"));
        orgIter; orgIter.Advance())
        {
        for (ZTBIter entityIter(iTxn, iTB, ZTBSpec::sEquals("Organization", orgIter.GetID()));
                entityIter; entityIter.Advance())
                {
                ZTuple entityTuple = entityIter.Get();
                // Do something with the tuple.
                }
        }

But the real problem is with the line // Do something with the tuple. Given that C++ does not support closures (functors and function pointers notwithstanding), how could we parameterize that line? Well, we can't. But we can represent the nested searches thus:

ZTBQuery orgQuery = ZTBSpec::sEquals("Kind", "Organization");
ZTBQuery entityQuery("Organization", orgQuery);
for (ZTBIter iter(iTxn, iTB, entityQuery); iter; iter.Advance())
        {
        ZTuple entityTuple = iter.Get();
        // Do something with the tuple.
        }

Here orgQuery represents those tuples whose property "Kind" has the value "Organization". And entityQuery represents those tuples whose property "Organization" is of type ID and matches any of the IDs of tuples from orgQuery. The code is not really any shorter, but it does have two points at which we can parameterize things. We can take entityQuery and return it as the result of a function, store it in an instance variable or pass it to a function. It's an abstract representation of the nested loops from earlier, and can be applied against any ZTxn/ZTB pair. Or we can take the initialized ZTBIter object and pass it off, return it or store it. Let's turn our example into a factory function, that returns a ZTBQuery:

ZTBQuery QueryFactory()
        {
        ZTBQuery orgQuery = ZTBSpec::sEquals("Kind", "Organization");
        return ZTBQuery("Organization", orgQuery);
        }

This returns the ZTBQuery that represents "entities that have an extant organization". To further restrict the results to include only people (i.e. tuples whose "Kind" property has the value "Person"):

ZTBQuery theQuery = QueryFactory();
theQuery &= ZTBSpec::sEquals("Kind", "Person");

Run against our example tuples, this drops ID 15, because its "Kind" property has the value "Equipment".
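The result sets described above can be checked by hand with a small model of the sample data. The sketch below is not ZooLib code: ToyTuple, ToyBase and the helper functions are hypothetical, each tuple keeps only its "Kind" and "Organization" properties, and queries are evaluated eagerly into sets of IDs purely to show which tuples each query denotes.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>

struct ToyTuple {
    std::string fKind;
    std::uint64_t fOrganization; // 0 stands for "no Organization property"
};

using ToyBase = std::map<std::uint64_t, ToyTuple>;
using IDSet = std::set<std::uint64_t>;

// The sample tuples from the text.
ToyBase SampleBase() {
    return {
        {1, {"Organization", 0}}, {2, {"Organization", 0}},
        {10, {"Person", 1}}, {11, {"Person", 1}},
        {12, {"Person", 2}}, {13, {"Person", 2}},
        {14, {"Person", 3}}, {15, {"Equipment", 2}},
    };
}

// IDs of tuples whose "Kind" equals iKind.
IDSet IDsOfKind(const ToyBase& iBase, const std::string& iKind) {
    IDSet result;
    for (const auto& entry : iBase)
        if (entry.second.fKind == iKind)
            result.insert(entry.first);
    return result;
}

// IDs of tuples whose "Organization" is one of iTargets.
IDSet IDsReferringTo(const ToyBase& iBase, const IDSet& iTargets) {
    IDSet result;
    for (const auto& entry : iBase)
        if (iTargets.count(entry.second.fOrganization))
            result.insert(entry.first);
    return result;
}

IDSet Intersect(const IDSet& iL, const IDSet& iR) {
    IDSet result;
    for (const std::uint64_t theID : iL)
        if (iR.count(theID))
            result.insert(theID);
    return result;
}

void ExampleJoin() {
    const ToyBase base = SampleBase();
    // Entities that have an extant organization.
    const IDSet entities = IDsReferringTo(base, IDsOfKind(base, "Organization"));
    assert((entities == IDSet{10, 11, 12, 13, 15}));
    // Further restricted to people.
    const IDSet people = Intersect(entities, IDsOfKind(base, "Person"));
    assert((people == IDSet{10, 11, 12, 13}));
}
```

ID 14 is absent from both results because it refers to ID 3, which is not an "Organization" tuple, and ID 15 drops out only once the "Person" restriction is applied.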

The other advantage to the use of a complex ZTBQuery over manual iteration is that it is possible for the ZTBQuery to be shipped over the wire to a remote server for execution close to the tuplebase, thus removing the latency that would be incurred on each construction of a ZTBIter by nested loops. And because the whole of the query is available to the tuplebase it can be examined in its entirety and the work needed to generate results can be optimized. The disadvantage is that code using a complex query sees only the tuples that would be returned by the innermost loop of a manual iteration. If the higher level tuples are needed then a hybrid approach can be used.

The tuplebase may be configured to maintain indices of the values of tuples, in which case walking an iterator can be very efficient. If no suitable index exists then the iterator will still work, but it may require that every non-empty tuple be visited. Configuring indices on a tuplebase is a system administration job, and updating the suite of indices is something that should be informed by viewing log information associated with a tuplebase, or by knowledge of the actual usage patterns of a tuplebase by code known to be executing against it.

Kinds of queries

We've already seen that a ZTBQuery can be initialized from a ZTBSpec, and a ZTBIter initialized from such a ZTBQuery will return all the tuples that match the specification. The other simple instantiations of a ZTBQuery take an ID or a list of IDs in a vector, set or pointer and count. A ZTBIter initialized from one of the following queries would simply return the tuples with the specified IDs.

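// Each of the following is an alternative way to initialize theQuery.

ZTBQuery theQuery(27); // Single ID

vector<uint64> theVector; // Filled with IDs elsewhere
ZTBQuery theQuery(theVector);

set<uint64> theSet; // Filled with IDs elsewhere
ZTBQuery theQuery(theSet);

uint64 theIDs[] = { 1, 7, 11, 13, 27 };
ZTBQuery theQuery(theIDs, 5); // Pointer and count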

More complex queries can be formed by intersecting a ZTBSpec with a ZTBQuery thus:

ZTBQuery theQuery;
ZTBSpec theSpec;
ZTBQuery intersectedQuery = theQuery & theSpec;

This represents those tuples that would be returned by theQuery which also satisfy the specification theSpec. It might be that theQuery is already simply a search for tuples matching certain criteria, in which case the criteria represented by theSpec are added in as a further constraint. Or it might be that theQuery is highly complex and cannot simply have a specification applied to it, in which case theSpec will be used to filter the results that are returned. In any case it's not the application's concern: the underlying mechanisms will take care of doing the most efficient job possible, based on the details of the query, on what data actually exists in the tuplebase, and on how any indices that may exist can be used.

Similarly one can union a pair of queries thus:

ZTBQuery queryA;
ZTBQuery queryB;
ZTBQuery unionedQuery = queryA | queryB;

which represents those tuples that would be returned by queryA plus those that would be returned by queryB. The tuplebase implementation will take care of actually collapsing both queries into a single physical search if that is possible.

Our earlier examples also showed how one can use the result set of one subquery to provide values to be fed into another:

ZTBQuery theQuery("Organization", orgQuery);

The syntax is intended to read as: tuples whose property named "Organization" is of type ID and matches the ID of a tuple returned by orgQuery. Compare it to the similar ZTBSpec constructor syntax:

ZTBSpec theSpec("Organization", uint64(27));

The opposite order:

ZTBQuery theQuery(entityQuery, "Organization");

reads as tuples whose ID matches the property named "Organization" from tuples returned by entityQuery. If in our sample tuplebase entityQuery represents the "Person" and "Equipment" tuples (IDs 10 through 15), then theQuery represents the organizations of those entities.

Generated on Thu Jul 26 11:22:00 2007 for ZooLib by Doxygen 1.4.7