|
|
|
ObjectDatabase++ Indexes Explained
The power of using a database to manage information comes from the use of
advance index structures that make efficient searches to retrieve only the
required information when it is actually needed. There are several different
index structures available to people designing databases and with this page I
hope to get people a guide on how to use these structure to there full
potential.
The Basics
The Linked List
 The very first dynamic index structure
that most computer programmers experience is the humble ordered linked list. It
has in its purest, a head pointer and a linear structure, which in every node
containing index key value and a pointer that points to the next node within
the list. This structure while been very simple has a very poor build
efficiency rating of O(n2) as each newly added key starts its search
from the first index and compared itself with each successive key until the
correct location is finally found.
|
| |
The B-Tree
Beyond the linked list comes the more complex B-Tree. It is called a B-Tree as
it is short for Binary Tree, and it is called a binary tree because apart from
the index key value, there are two pointers that point to the next node within
the index structure, where as the linked list only had one.
This structure has a better build efficiency rating of O(n log(n)) as each
newly added key starts from the top of the tree, comparing with each node and
if greater that the current key value it moves into the right sub-tree or if
less than moves into the left sub-tree. Comparing with only the nodes it comes
in contact with before finding a place at the bottom of the structure, it would
on average compare log(n) nodes to find its correct position.
|
| |
 The B-Tree does have a problem however,
when key values are inserted into the index in ever increasing value, for
example 1,2,3,4,5,6,7... the B-Tree is turned into a linked list with the
efficiency rating of O(n2).
The AVL Tree
AVL Trees are an improvement on the B-Tree - even though they both have two
nodes beneath each node, proposed by mathematicians Adel’son-Vel’skii and
Landis, and are a balanced binary tree. These structures are fluidic in nature
as when a newly added index key is inserted, the structure rearranges itself to
maintain a build efficiency rating of O(n log(n)) even when index values are
inserted in ever increasing value, rather much like water that finds its own
level.
|
| |
Database Indexes
When designing the indexes used within a database management systems (DBMS),
there are a couple of special considerations that influence the finial design.
Firstly, with the size of many databases, it would be too slow to create an
index each and every time a search is required and therefore the indexes are
maintained over the life of the data. Secondly, the nature of disk drives is
that they are capable of holding vast amounts of information, but accessing the
data can be slow and if the indexes requires many small reads from the disk, as
a B-Tree or AVL tree would require, performance can be severely hampered so the
indexes need to be page orientated. And lastly, when the indexes are used by
many people – only the best will do, the indexes must be as efficient as
possible by being self leveling, to maintain efficiency under all various
circumstances at all times.
|
| |
| The B+-Tree Indexes
The most powerful of all database indexes
is the B+-Tree index. It can use fixed or variable length keys and
orders them in such a way as to allow multiple methods of searching, such as
find first, last, greater than, less than as well as equal to. It is organized
into pages that have much better read/write efficiency and it is also
self-leveling. With its basic key design it can make use of most data formats,
from integers, floating point, dates, text strings and so on.
The B+-Tree is similar in design to be B-Tree but each node has more
than one key value and two pointers, in fact each page of the B+-Tree
normally contains about 50 or so key values that are sorted from smallest to
largest. Within each key value within the index nodes is a pointer that has the
reference to the next page below the node. The search first obtains the head
node, searches through the list of keys until it finds the first key greater
than the desired key than moves onto the next node, if no key is greater than
the search key, then the last page is referenced. Once the leaf page at the
bottom of the index is found, the corresponding ObjectID can be obtained and
the search in complete.
When a new key is inserted into the B+-Tree index, the search finds
if the desired position is free, then if there is not enough room to fit the
new key within the page is broken up into two, filling each new page with half
of the contents of the old and placing a new key into the new page. This
creates a data structure that expands sideways rather than vertically like the
humble B-Tree.
|
| |
| The Spatial Tree (S-Tree) Indexes
The S-Tree index is so called, as it is
short for spatial indexes. Originally developed by Informix as a R-Tree index
for grid searching in rectangular areas, it has now been further developed for
3 dimensional searches with cubes. S-Tree searches for any corresponding values
that have some cross over area or volume and then builds a temporary index in
memory for the standard read operations.
S-Tree has a very similar structural pattern as the B+-Tree indexes,
but the key value in the above node does not separate those small to the left
and larger to the right, S-Tree node values contain the bounding area for all
the areas directly below. In this way, searching for all areas or volumes that
have a cross region with the search key, a sub tree can be cancelled if none of
its bounding values crosses the search key, thus giving it a O(n log(n))
build/creation efficiency rating.
The tree grows in a very similar manner as the B+-Tree, as an index
key is placed into the best fitting leaf, if that leaf does not have enough
space, the leaf is divided into two and a new key value is placed into the tree
representing the new leaf.
One of the biggest differences between the B+Tree and the Spatial Index is that
searching the Spatial Index can return more than one matching result and
therefore a temporary index is created from the search query before any
matching results are read from the database. Spatial and Token List indexes are
special in the way that they are an index structure used to create an index,
rather that being the index itself.
|
| |
| The Temporal Tree (T-Tree) Indexes
Temporal Tree Indexes are very much similar to Spatial Indexes except they use
time quantities CODBPP::DATE and CODBPP:FILETIME, and are one dimensional which
goes from start to finish. T-Tree Indexes will search for all over lapping time
periods between the search quantities and the corresponding index fields.
The Structure of the T-Tree is identical to the spatial indexes and is searched
in identical fashion.
|
| |
| The Spatial Pattern Tree (Pattern) Indexes
Spatial Pattern indexes are designed to accommodate needs for biometrics pattern matching.
Falling into the same category as spatial indexes, pattern indexes allow for a series of two
or three dimensional control points to be matched against another.
For example in the picture to the left, where a finger print has been analysed to find a
relative centre and x y axis, we can find a series of control points that describe the unique
features of each print. Once the pattern has been entered into the database, this index structure
allows for a partial or complete matching of the same series of control points, within a specified
error margin.
This index regime as been implemented for both two and three dimensions, allow for two dimensional
patterns such as finger prints and also three dimensional patterns such as facial recognition.
The Pattern index works in a similar manner to the standard Spatial Tree index described above. The
nodes structure is identical in both instances and only differs in the leaf structure, where only the
control points are saved from each pattern series. Searching is also done in a similar way for both
indexes, where only paths that are bounded by the area described in the node are followed – this reduces
the amount pages read for each search. But unlike searching with Spatial Trees where overlapping regions
are matched, in Pattern indexes only the control points within the margin of error are returned.
|
| |
| The Token List Indexes
Token List Indexes are designed for the quick searching of full text queries.
There are two types of token list indexes, first is the ASCII (8 bit
characters) token list and second is the wide character (16 bits) token list.
The token list structure is like the Spatial Index as more than one matching
result can be found by searching the structure, so a temporary index is created
before any reads from the database can take places. The Token List in structure
is very similar to a variable B+-Tree index that has buckets to
store the many matching ObjectIDs per token.
This Token List index first breaks up the defined index fields into each
separate token that it contains, for example the string "the lazy brown fox"
would be broken up into four tokens:- the; lazy; brown; fox, each of these
token would then be indexed and the ObjectID added to the matching bucket. Then
the temporary index is created, the search only has to cross check each unique
token and not the full text of each field in each object.
The Token List searches provides several different mechanisms for the search,
firstly there are the Boolean logical operators: - and; or; xor; not; and ().
Secondly there are token operators: - if a token is to contain a string, "A" or
A; if a token is to start with a string " A"; if a token is to end with a
string "A "; and lastly, is a token is to match a given string " A ". Where
more advance search logic is required you are to implement this yourself, for
example if you wanted a search "A near B", you would have to search "A and B"
then read each result to cross check that A was in fact near B. This is because
the Token List index does not provide this sort of searching abilities and the
implementation would be the same if done within ObjectDatabase++ or not.
|
| |
| The Hash Indexes
Hash Indexes are some times used in a DBMS and fall into two groups – dynamic
and static hashing. Static hashing is very powerful, but are usable only within
a handful of applications, it also has the staggering search efficiency rating
of O(1) and is used by ObjectDatabase++ to reference the ObjectIDs. It
basically is an index array where the ObjectID points directly to a position in
the array where the address for the object is located. However, static hashing
is very limited in its application, holes can occur, only unique values can be
used and the hash algorithm needs to be simple.
Dynamic hashing on the other hand is more flexible, but when it is created into
a general algorithm for all situations the search efficiency rating drops to
O(log(n)), which is the same as the trusty old B+-Tree index, but
without the ability to search for the first, last, next, previous, greater than
and less than. This is why many DBMS and ObjectDatabase++ do not offer dynamic
hashing indexes.
|
Try today by freely downloading ObjectDatabase++
GUI Editor.
|