Software by Jim Henry III

This contains my non-conlang-related Perl scripts and modules. For my conlang-related scripts, see my conlang page.

epubcssfix

Many epubs come with unprofessional CSS that will not display correctly on some ebook readers. For instance, the font size may be illegibly small on a mobile device, or the user may have dark mode turned on, but the CSS specifies element colors according to an assumed (but not specified) white background, so there is little or no contrast with the actual black background. This script tries to fix those problems.

2023-09-10 – fixes based on feedback from people at perlmonks.org

podcatcher.pl

This is a highly configurable podcatcher that runs on the command line (typically in a cron job for downloading episodes, and interactively for copying episodes onto the mp3 player).

2023-09-10 – fixes based on feedback from people at perlmonks.org

textual-slideshow.pl

This program takes a list of text files or directories containing text files, then randomly displays paragraphs from those files on the terminal one line at a time, with a short pause between lines. It’s interactive, with various keystrokes speeding up or slowing down the scrolling, getting help, exiting or doing other things.
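The core loop described above can be sketched roughly as follows. This is a Python illustration, not the script's own Perl code, and it leaves out the interactive keystrokes. The function names, the blank-line paragraph rule and the `delay` parameter are all assumptions made for the example.

```python
import random
import re
import sys
import time


def split_paragraphs(text):
    """Split text into paragraphs separated by one or more blank lines (assumed rule)."""
    return [p.strip("\n") for p in re.split(r"\n\s*\n", text) if p.strip()]


def show_random_paragraph(text, delay=0.5, rng=random, out=sys.stdout):
    """Pick one paragraph at random and print it line by line, pausing between lines."""
    paragraph = rng.choice(split_paragraphs(text))
    for line in paragraph.splitlines():
        out.write(line + "\n")
        out.flush()
        time.sleep(delay)  # hypothetical fixed pause; the real script lets you change speed
    return paragraph
```

Calling `show_random_paragraph(open(path).read())` in a loop over randomly chosen files would give a crude, non-interactive version of the same effect.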

You can run this on your home directory and see what output you get, or you can customize the output by giving it particular subdirectories, supplying a weights file to make files whose paths match certain patterns more or less likely to be chosen, setting the file extensions it looks for, etc.

This will probably produce more interesting output if you have a lot of HTML or plain text ebooks on your hard drive. A good source of them is Project Gutenberg. In a future version I plan to have this script grab paragraphs from .epub files as well. I also have vague plans to create a web crawler that feeds ebooks to this script by downloading random ebooks from Project Gutenberg and other sources of public domain books.

The zip below also includes WeightedRandomList.pm and its documentation, a module that transforms lists of strings by checking each against a list of regular expressions and applying the associated weight if the regular expression matches – omitting the string from the resulting list if the weight is zero, including one copy if it’s 1, two copies if it’s 2, etc., and randomly choosing the number of copies if it’s not an integer. (E.g., if a regex’s weight is 1.5, there is a 50% chance that matching strings will have one copy in the resulting list and a 50% chance they’ll have two copies.)
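To make the weighting rule concrete, here is a minimal Python sketch of the same idea. It is not the module's actual Perl interface. The function name `expand_weighted`, the rule that the first matching pattern wins and the default weight of 1 for strings that match nothing are assumptions made for illustration.

```python
import random
import re


def expand_weighted(strings, weighted_patterns, rng=random):
    """Return a list in which each string is repeated according to its weight.

    weighted_patterns is a list of (regex, weight) pairs.
    Assumptions: the first matching pattern wins, and a string that
    matches no pattern gets weight 1.
    """
    result = []
    for s in strings:
        weight = 1.0  # assumed default for unmatched strings
        for pattern, w in weighted_patterns:
            if re.search(pattern, s):
                weight = w
                break
        copies = int(weight)        # whole part: guaranteed copies
        fraction = weight - copies  # fractional part: chance of one extra copy
        if rng.random() < fraction:
            copies += 1
        result.extend([s] * copies)
    return result
```

With weights of 0 for `\.log$` and 2 for `\.html$`, a log file is dropped, an HTML file appears twice, and everything else appears once. A weight of 1.5 yields one copy about half the time and two copies the rest of the time.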

crawl-web-for-images.pl

Given a list of seed URLs, and optionally a file of weights (see above) to apply to the URLs it finds, this script crawls the web randomly and downloads random images matching certain criteria to a specified directory.

Includes WeightedRandomList.pm, described above, and ImageSites.pm, a library of functions for web crawling.

stegspace.pl

This script inserts or reads a steganographic message in a text file, encoded as a variable number of spaces at the ends of lines. Each line can have 0-3 spaces at the end, encoding two bits, so one byte of the message requires four lines of the haystack file. It keeps the access and modification times of the haystack file the same as before.
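The encoding can be sketched in Python as follows. This illustrates the scheme, not the script's Perl code. The bit order (most significant pair first) and the use of an all-zero byte to mark the end of the message are assumptions, and the function names are hypothetical. Preserving file timestamps is left out.

```python
def hide(cover_lines, message):
    """Return cover_lines with 0-3 trailing spaces each, two message bits per line."""
    if b"\x00" in message:
        raise ValueError("message must not contain null bytes")
    pairs = []
    for byte in message:
        for shift in (6, 4, 2, 0):  # assumed order: most significant bit pair first
            pairs.append((byte >> shift) & 0b11)
    if len(pairs) > len(cover_lines):
        raise ValueError("cover text has too few lines")
    out = []
    for i, line in enumerate(cover_lines):
        count = pairs[i] if i < len(pairs) else 0
        out.append(line.rstrip(" ") + " " * count)
    return out


def reveal(lines):
    """Rebuild the message from trailing-space counts, four lines per byte."""
    message = bytearray()
    for i in range(0, len(lines) - 3, 4):
        byte = 0
        for line in lines[i:i + 4]:
            spaces = len(line) - len(line.rstrip(" "))
            byte = (byte << 2) | (spaces & 0b11)
        if byte == 0:  # assumed terminator: four lines with no trailing spaces
            break
        message.append(byte)
    return bytes(message)
```

Both functions expect lines without their newline characters. Since each byte takes four lines, a 5,000-byte message needs 20,000 lines of cover text.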

For reference, a 5k text file hidden in the Bible (Project Gutenberg Douay-Rheims Version) adds spaces to the ends of lines as far as Deuteronomy 15:12, about 20k lines.

This is just for fun, not adequate security for actually concealing passwords or anything important. It can’t hide messages containing null bytes, so it can’t hide a binary file such as typical encryption produces.

Last updated 2023-09-12