NAME AI::MaxEntropy - Perl extension for learning Maximum Entropy Models SYNOPSIS use AI::MaxEntropy; # create a maximum entropy learner my \$me = AI::MaxEntropy->new; # the learner see 2 red round smooth apples \$me->see(['round', 'smooth', 'red'] => 'apple' => 2); # the learner see 3 yellow long smooth bananas \$me->see(['long', 'smooth', 'yellow'] => 'banana' => 3); # and more # samples needn't have the same numbers of active features \$me->see(['rough', 'big'] => 'pomelo'); # the order of active features is not concerned, too \$me->see(['big', 'rough'] => 'pomelo'); # ... # okay, let it learn my \$model = \$me->learn; # then, we can make prediction on unseen data # ask what a red thing is most likely to be print \$model->predict(['red'])."\n"; # the answer is apple, because all red things the learner have ever seen # are apples # ask what a smooth thing is most likely to be print \$model->predict(['smooth'])."\n"; # the answer is banana, because the learner have seen more smooth bananas # (weighted 3) than smooth apples (weighted 2) # ask what a red, long thing is most likely to be print \$model->predict(['red', 'long'])."\n"; # the answer is banana, because the learner have seen more long bananas # (weighted 3) than red apples (weighted 2) # print out scores of all possible answers to the feature round and red for (\$model->all_labels) { my \$s = \$model->score(['round', 'red'] => \$_); print "\$_: \$s\n"; } # save the model \$model->save('model_file'); # load the model \$model->load('model_file'); DESCRIPTION Maximum Entropy (ME) model is a popular machine learning approach, which is used widely as a general classifier. A ME learner try to recover the real probability distribution based on limited number of observations, by applying the principle of maximum entropy. The principle of maximum entropy assumes nothing on unknown data, in another word, all unknown things are as even as possible, which makes the entropy of the distribution maxmized. Samples In this module, each observation is abstracted as a sample. A sample is denoted as "x => y => w", which consists of a set of active features (array ref x), a label (scalar y) and a weight (scalar w). The client program adds a new sample to the learner by "see". Features and Active Features The features describe which characteristics things can have. And, if a thing has a certain feature, we say that feature is active in that thing (an active feature). For example, an apple is red, round and smooth, then the active features of an apple can be denoted as an array ref "['red', 'round', 'smooth']". Each element here is an active feature (generally, denoted by a string), and the order of active features is not concerned. Label The label denotes the name of the thing we describe. For the example above, we are describing an apple, so the label can be 'apple'. Weight The weight can be simply taken as how many times a thing with certain characteristics occurs, or how persuasive it is. For example, we see 2 red round smooth apples, we denote it as "['red', 'round', 'smooth'] => 'apple' => 2". Model After seeing enough samples, a model can be learnt from them by calling "learn", which generates an AI::MaxEntropy::Model object. A model is generally considered as a classifier. When given a set of features, one can ask the model which label is most likely to come with these features by calling "predict" in AI::MaxEntropy::Model. FUNCTIONS NOTE: This is still an alpha version, the APIs may be changed in future versions. new Create a Maximum Entropy learner. Optionally, initial values of properties can be specified here. my \$me1 = AI::MaxEntropy->new; my \$me2 = AI::MaxEntropy->new( optimizer => { epsilon => 1e-6 }); my \$me3 = AI::MaxEntropy->new( optimizer => { m => 7, epsilon => 1e-4 }, smoother => { type => 'gaussian', sigma => 0.8 } ); The properties values specified in creation time can be changed later, like, \$me->{optimizer} = { epsilon => 1e-3 }; \$me->{smoother} = {}; see Let the Maximum Entropy learner see a new sample. The weight can be omitted, in which case, default weight 1.0 will be used. my \$me = AI::MaxEntropy->new; # see a sample with default weight 1.0 \$me->see(['a', 'b'] => 'p'); # see a sample with specified weight 0.5 \$me->see(['c', 'd'] => 'q' => 0.5); The sample can be also represented in the attribute-value form, which like \$me->see({color => 'yellow', shape => 'long'} => 'banana'); \$me->see({color => ['red', 'green'], shape => 'round'} => 'apple'); Actually, the two samples above are converted internally to, \$me->see(['color:yellow', 'shape:long'] => 'banana'); \$me->see(['color:red', 'color:green', 'shape:round'] => 'apple'); forget_all Forget all samples the learner have seen previously. learn Learn a model from all the samples, returns an AI::MaxEntropy::Model object, which can be used to make prediction on unseen data. ... my \$model = \$me->learn; print \$model->predict(['x1', 'x2', ...]); PROPERTIES optimizer The optimizer is the essential component of this module. This property enable the client program to customize the behavior of the optimizer. It is a hash ref, containing all parameters that the client program want to pass to the L-BFGS optimizer. Please refer to "List of Parameters" in Algorithm::LBFGS for details. smoother The smoother is a solution to the over-fitting problem. This property chooses the which type of smoother the client program want to apply and sets the smoothing parameters. Only one smoother have been implemented in this version, the Gaussian smoother. One can apply the Gaussian smoother as following, my \$me = AI::MaxEntropy->new( smoother => { type => 'gaussian', sigma => 0.6 } ); The Gaussian smoother has one parameter sigma, which indicates the strength of smoothing. Usually, sigma is a positive number no greater than 1.0. The strength of smoothing grows as sigma getting close to 0. progress_cb Usually, learning a model is a time consuming job. And the most time consuming part of this process is the optimization. This callback subroutine is for people who want to trace the progress of the optimization. By tracing it, they can do some useful output, which makes the learning process more user-friendly. This callback subroutine will be passed directly to "fmin" in Algorithm::LBFGS. You can also pass a string 'verbose' to this property, which simply print out the progress by a build-in callback subroutine. Please see "progress_cb" in Algorithm::LBFGS for details. SEE ALSO AI::MaxEntropy::Model, Algorithm::LBFGS Statistics::MaxEntropy, Algorithm::CRF, Algorithm::SVM, AI::DecisionTree AUTHOR Laye Suen, COPYRIGHT AND LICENSE The MIT License Copyright (C) 2008, Laye Suen Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. REFERENCE A. L. Berge, V. J. Della Pietra, S. A. Della Pietra. A Maximum Entropy Approach to Natural Language Processing, Computational Linguistics, 1996. S. F. Chen, R. Rosenfeld. A Gaussian Prior for Smoothing Maximum Entropy Models, February 1999 CMU-CS-99-108.