NAME

WeightRandomList.pm

SYNOPSIS

# Read weights from a file and then use them to randomize a list of songs

use WeightRandomList;

my @songs = split "\n", `find $ENV{HOME}/Music -name '*.mp3'`;
my $weights_file = $ENV{HOME} . "/.songweights";
my %weights = %{ weights_from_file( $weights_file ) };
my @songs_random = @{ make_weighted_list( \%weights, \@songs ) };

# Declare a weights hash and then use it to randomize @pictures

my %pic_weights = ( "Waterhouse" => 3, 
    "Jack Kirby" => 5, 
    "family_photos" => 2, 
    "thumbnail" => 0 );
my @pictures = split "\n", `find $ENV{HOME}/Pictures -name '*.jpg'`;
my @pics_random = @{ make_weighted_list( \%pic_weights, \@pictures ) };

# The same, but with debugging messages

WeightRandomList::set_debug(1);
@pics_random = @{ make_weighted_list( \%pic_weights, \@pictures ) };

# The same, but with the weights inverted (all nonzero weights changed 
# to their reciprocal).

WeightRandomList::set_inverse(1);
@pics_random = @{ make_weighted_list( \%pic_weights, \@pictures ) };

# Build a weighted list yourself by getting weights for each string as you
# decide whether to add them to a given list and how many copies to add.
# (The code below won't randomize the list, just build a list with multiple
# copies of the things with weights >= 2 and none of the items with weights < 1.)

my @list;
while ( my $str = <> ) {
    chomp $str;    # drop the trailing newline before matching and storing
    my $working_weight = calc_weight( $str, \%pic_weights );
    while ( $working_weight >= 1 ) {
        push @list, $str;
        --$working_weight;
    }
}

DESCRIPTION

WeightRandomList is a module that randomizes lists of strings while applying weights to them. Each string is checked against one or more regular expressions, and zero, one, or many copies of the string are inserted into the resulting randomized list, depending on the weights associated with the regular expressions it matches.

You can build a weights hash in your own code and pass it to make_weighted_list() or read weights from a file with weights_from_file().

I use it in a number of ways: to create slideshows of random images, to shuffle songs from my music library, to copy a random subset of songs to my mp3 player, in a web crawler that downloads random images, and in my textual slideshow (available from my website), which slowly scrolls a random series of paragraphs from text files on one's hard drive.

FUNCTIONS

weights_from_file( <filename> )

Read a configuration file and build a hash suitable for passing by reference to make_weighted_list(). The file should contain regex/weight pairs, one pair per line, with the regex and the weight separated by tabs. Blank lines are OK. '#' starts a comment unless it is preceded by '\', so you can put \# in a regex to match a literal #.
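As an illustration of the file format described above, here is a minimal sketch of a parser for it. The function name parse_weight_lines is hypothetical and is not part of WeightRandomList; the module's own weights_from_file() may handle edge cases differently (for example, this sketch silently skips lines whose weight is not a number).

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper (not part of WeightRandomList): parse lines in the
# "regex<TAB>weight" format into a hash ref of regex => weight.
sub parse_weight_lines {
    my @lines = @_;    # copy, so we can modify each line safely
    my %weights;
    for my $line (@lines) {
        chomp $line;
        # Strip a comment: a '#' that is not preceded by a backslash.
        $line =~ s/(?<!\\)#.*//;
        # Skip blank and comment-only lines.
        next if $line =~ /^\s*$/;
        # The regex comes first, then one or more tabs, then the weight.
        my ( $regex, $weight ) = split /\t+/, $line, 2;
        next unless defined $weight;
        $weight =~ s/\s+$//;    # trailing space left over from a comment
        next unless looks_like_number($weight);
        $weights{$regex} = $weight + 0;
    }
    return \%weights;
}

# Sample input; "\t" stands for the tab separating regex and weight.
my $weights = parse_weight_lines(
    "# weights for my picture slideshow\n",
    "\n",
    "Jack Kirby\t5\n",
    "thumbnail\t0 # never show thumbnails\n",
    "C\\#\t2\n",    # \# keeps the '#' as part of the regex
);
```

Note that the escaped \# is left in the regex unchanged, since \# already matches a literal # in a Perl regular expression.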

make_weighted_list( <weights hash ref>, <list array ref>, [<default weight>] )

Take a list of strings and produce a randomized list where the number of copies of each element of the original list is determined by weights. The weights are pairs of regular expressions and floating-point numbers. If a regex has weight 2, every string matching that regex in the original list will appear twice in the randomized list; if it has weight 0.5, roughly half the strings matching that regex will appear in the randomized list; if it has weight 1.5, roughly half the strings matching that regex will appear once while the rest appear twice. Weight 0 means don't include any strings matching that regex.

A string matching two or more regexes will appear a number of times corresponding to the product of those weights.

Normally, strings matching none of the weight regexes will appear once. If the optional default weight argument is given, strings matching none of the weight regexes will appear that many times instead (usually you would pass zero when you only want strings matching one or more regexes).
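The handling of fractional weights described above can be sketched as follows. This is only one way to get that behavior, not necessarily how make_weighted_list() is implemented: the integer part of the weight gives the guaranteed number of copies, and the fractional part is the probability of one extra copy. The names copies_for_weight and expand_and_shuffle are hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical helper: turn a (possibly fractional) weight into a number
# of copies. Weight 1.5 yields 1 copy about half the time and 2 otherwise.
sub copies_for_weight {
    my ($weight) = @_;
    my $copies   = int $weight;
    my $fraction = $weight - $copies;
    ++$copies if rand() < $fraction;    # rand() is in [0, 1)
    return $copies;
}

# Hypothetical helper: repeat each item according to its weight, then
# shuffle. $weight_for is a code ref returning the weight of an item.
sub expand_and_shuffle {
    my ( $weight_for, $items ) = @_;
    my @expanded;
    for my $item (@$items) {
        push @expanded, ($item) x copies_for_weight( $weight_for->($item) );
    }
    return [ shuffle @expanded ];
}

# Every item gets weight 2 here, so each appears exactly twice.
my $list = expand_and_shuffle( sub { 2 }, [ 'a.jpg', 'b.jpg' ] );
```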

calc_weight( <string>, <weights hash reference>, [<default weight>] )

Calculate the overall weight of a particular string: the product of the weights of all the regular expressions that match it, or the default weight (1 unless one is given) if none match.
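A minimal sketch of this calculation, based only on the rules stated in this document (matching weights are multiplied together, and strings that match nothing get the default weight, 1 if none is given). The function combined_weight is a hypothetical stand-in, not the module's calc_weight().

```perl
use strict;
use warnings;

# Hypothetical stand-in for calc_weight(): multiply the weights of all
# regexes that match $str; fall back to $default if none match.
sub combined_weight {
    my ( $str, $weights, $default ) = @_;
    $default = 1 unless defined $default;
    my $product = 1;
    my $matched = 0;
    for my $regex ( keys %$weights ) {
        next unless $str =~ /$regex/;
        $product *= $weights->{$regex};
        $matched = 1;
    }
    return $matched ? $product : $default;
}

my %pic_weights = ( 'Jack Kirby' => 5, 'family_photos' => 2, 'thumbnail' => 0 );

# Matches one regex, so the weight is that regex's weight.
my $one  = combined_weight( 'Pictures/Jack Kirby/cover.jpg', \%pic_weights );
# Matches a zero-weight regex, so the product is zero.
my $none = combined_weight( 'Pictures/Jack Kirby/thumbnail.jpg', \%pic_weights );
# Matches nothing; passing a default of 0 excludes it.
my $skip = combined_weight( 'Pictures/misc.jpg', \%pic_weights, 0 );
```

Because the weights are multiplied, a single matching regex with weight 0 excludes a string no matter what else it matches.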

set_inverse( <boolean> )

Turn on or off inverse weights. If this is true, weights_from_file() will invert all nonzero weights, replacing them with their reciprocal.
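To make the inversion concrete, here is a small sketch of what replacing nonzero weights with their reciprocals does to a weights hash. The function invert_weights is hypothetical and only illustrates the arithmetic; in the module the inversion is controlled by set_inverse().

```perl
use strict;
use warnings;

# Hypothetical helper: return a new hash ref in which every nonzero
# weight is replaced by its reciprocal; zero weights stay zero.
sub invert_weights {
    my ($weights) = @_;
    my %inverted = map {
        my $w = $weights->{$_};
        ( $_ => ( $w == 0 ? 0 : 1 / $w ) );
    } keys %$weights;
    return \%inverted;
}

# Weight 2 becomes 0.5, weight 0.5 becomes 2, and weight 0 is unchanged.
my $inverted = invert_weights( { 'Waterhouse' => 2, 'sketch' => 0.5, 'thumbnail' => 0 } );
```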

(I don't currently remember what use case I had in mind when I added this feature, but I don't see a reason to remove it.)

set_debug( <boolean> )

Turn on or off debugging messages.

AUTHOR

Jim Henry III, http://jimhenry.conlang.org/software/

LICENSE

This library is free software; you may redistribute it and/or modify it under the same terms as Perl itself.