NAME
    Set::Similarity::BV - similarity measures for sets using fast bit
    vectors (BV)

SYNOPSIS
      use Set::Similarity::BV::Dice;

      # object method
      my $dice = Set::Similarity::BV::Dice->new;
      my $similarity = $dice->similarity('af09ff','9c09cc');

      # class method
      my $dice = 'Set::Similarity::BV::Dice';
      my $similarity = $dice->similarity('af09ff','9c09cc');

DESCRIPTION
    This is the base class. It mainly provides helper and convenience
    methods. Use one of the child classes:

    Set::Similarity::BV::Cosine
    Set::Similarity::BV::Dice
    Set::Similarity::BV::Jaccard
    Set::Similarity::BV::Overlap

  Overlap coefficient
      ( A intersect B ) / min(A,B)

  Jaccard Index
    The Jaccard coefficient measures the similarity between sample sets.
    It is defined as the size of the intersection divided by the size of
    the union of the sample sets:

      ( A intersect B ) / (A union B)

    The Tanimoto coefficient is the ratio of the number of features common
    to both sets to the total number of features, i.e.

      ( A intersect B ) / ( A + B - ( A intersect B ) ) # the same as Jaccard

    The range is 0 to 1 inclusive.

  Dice coefficient
    The Dice coefficient is the number of features common to both sets
    relative to the average size of the two sets, i.e.

      ( A intersect B ) / ( 0.5 * ( A + B ) ) # the same as Sorensen

    The weighting factor comes from the 0.5 in the denominator. The range
    is 0 to 1.

METHODS
    All methods can be used as class or object methods.

  new
      $object = Set::Similarity::BV->new();

  similarity
      my $similarity = $object->similarity($hex1,$hex2);

    $hex1 and $hex2 are strings of hexadecimal characters.

  from_integers
      my $similarity = $object->from_integers($AoI1,$AoI2);

    Croaks if called directly. This method should be implemented in a
    child module.

  intersection
      my $intersection_size = $object->intersection($AoI1,$AoI2);

    $AoI is an array reference of integers. Returns the size of the
    intersection.

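    The idea behind an intersection size on bit vectors can be sketched
    in plain Perl. This is only an illustration under assumptions, not
    the module's actual implementation: the helpers popcount and
    intersection_size are hypothetical names, and representing each set
    as an array reference of integer words is assumed from the
    description of $AoI above.

```perl
use strict;
use warnings;

# Hypothetical helper: count the bits set in a non-negative integer.
sub popcount {
    my ($int) = @_;
    my $count = 0;
    while ($int) {
        $count += $int & 1;   # add the lowest bit
        $int >>= 1;           # shift the next bit into place
    }
    return $count;
}

# Hypothetical helper: size of the intersection of two bit vectors,
# each given as an array reference of integer words (assumed layout).
sub intersection_size {
    my ($aoi1, $aoi2) = @_;
    my $last = $#{$aoi1} < $#{$aoi2} ? $#{$aoi1} : $#{$aoi2};
    my $size = 0;
    for my $i (0 .. $last) {
        # Bits set in both words belong to both sets.
        $size += popcount($aoi1->[$i] & $aoi2->[$i]);
    }
    return $size;
}

# The hex strings from the SYNOPSIS, written as single integer words.
my $v1 = [ 0xaf09ff ];
my $v2 = [ 0x9c09cc ];

my $common = intersection_size($v1, $v2);
my $total  = popcount($v1->[0]) + popcount($v2->[0]);

# Dice: intersection size relative to the average set size.
my $dice = 2 * $common / $total;

printf "intersection=%d dice=%.4f\n", $common, $dice;
# prints "intersection=9 dice=0.6923"
```

    Here the two vectors have 16 and 10 bits set and share 9 of them, so
    the Dice value is 2 * 9 / 26. Whether the module itself splits hex
    strings into words this way is not stated in this document.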
  combined_length
      my $set_size_sum = $object->combined_length($AoI1,$AoI2);

    $AoI is an array reference of integers.

  min
      my $min = $object->min($int1,$int2);

  bits
      my $bits = $object->bits($int);

    Returns the number of bits set in the integer.

SEE ALSO
    Set::Similarity::BV::Cosine
    Set::Similarity::BV::Dice
    Set::Similarity::BV::Jaccard
    Set::Similarity::BV::Overlap

SOURCE REPOSITORY
    http://github.com/wollmers/Set-Similarity-BV

AUTHOR
    Helmut Wollmersdorfer

COPYRIGHT AND LICENSE
    Copyright (C) 2016 by Helmut Wollmersdorfer

    This library is free software; you can redistribute it and/or modify
    it under the same terms as Perl itself.