NAME
Algorithm::SpatialIndex::Strategy::MedianQuadTree - QuadTree splitting
on bucket medians
SYNOPSIS
use Algorithm::SpatialIndex;
my $idx = Algorithm::SpatialIndex->new(
strategy => 'MedianQuadTree',
);
DESCRIPTION
A modified quad tree implementation that I'll call Median Quad Tree
(MQT) in this document. (Not sure if this data structure has a different
name elsewhere.) See "ALGORITHM" below.
For a description of the public interface, see Algorithm::SpatialIndex.
This spatial index strategy uses the default
"Algorithm::SpatialIndex::Strategy::QuadTree" as a base class and gets
away with containing very little code because of that. This also means
that most of the interface and behaviour is inherited.
ALGORITHM
For a basic discussion of quad trees, take a look at the documentation
of the Algorithm::QuadTree module or look it up on Wikipedia. The
following describes how the MQT differs from a normal quad tree and some
implementation details.
Any x/y coordinate pair can be used to divide a rectangular area into
four sub-rectangles. When splitting up a node of an ordinary quad tree,
the center of the quad tree is chosen to split the node into four
sub-nodes. This can be done either when the tree is created (before it
is populated) with a static depth of the tree, or dynamically whenever
the number of items associated with a node becomes too large.
For the MQT, the point which splits a given node into four is chosen to
be the median of all contained item coordinates in each dimension.
This has several consequences:
* Due to the dynamic nature of the coordinate choice when splitting a
node in four, the MQT cannot be of a fixed depth and preallocated. It
needs to grow dynamically as it is filled.
* When the data in the tree is a reasonably general sample of the
underlying distribution of data, then this algorithm will create a
tree of evenly filled buckets, but not necessarily a well balanced
tree. To obtain a reasonable sample of the underlying data
distribution, it is prudent to insert items in random order.
* Due to this behaviour, the tree will create very small nodes/bins
where the most data is concentrated and bins of gradually increasing
size as one moves away from the highest concentrations. A normal quad
tree will have a similar behaviour, but due to the fixed size of the
bins, will "converge" much more slowly.
* If the data is inserted in order of the item coordinates, an ordinary
quad tree is a more efficient. The MQT will make bad choices as the
median of a contiguous subset of the ordered data will not reflect the
overall distribution and the property of having fairly evenly filled
buckets vanishes.
* It is likely that polling a spatial index using an MQT is faster than
an ordinary quad tree if the distribution of data is very different
from uniformity. If in doubt, benchmark.
* If the data is uniform but inserted in random order, the MQT will at
best be equal in performance to a quad tree.
* Filling a dynamically growing MQT has slightly more overhead than
filling a dynamically growing quad tree due to the median calculation,
but it is neither algorithmically slower nor necessarily slower in
practice. Algorithmically, splitting a quad tree node will be O(n) in
the bucket size. Splitting an MQT node will be O(n*log(n)) if a naive
median calculation is used or also O(n) with a linear-time median
calculation (like this implementation).
If the MQT is filled in random order and the data is not uniformly
distributed, it will, on average, have more evenly filled buckets and
thus less nodes than a quad tree, thus reducing the amount of tree
walking required while filling and later polling the index.
SEE ALSO
Algorithm::SpatialIndex
Algorithm::SpatialIndex::Strategy::QuadTree
Algorithm::SpatialIndex::Strategy
Algorithm::QuadTree
AUTHOR
Steffen Mueller,
COPYRIGHT AND LICENSE
Copyright (C) 2010 by Steffen Mueller
This library is free software; you can redistribute it and/or modify it
under the same terms as Perl itself, either Perl version 5.10.1 or, at
your option, any later version of Perl 5 you may have available.