10-15-2009, 07:46 AM   #1
Jeffr0
 
Join Date: Mar 2005
Location: Harrisonburg VA
Comprehensive e23 Index

Brett requested a comprehensive index for the Spaceships series on another thread... and realizing that the PDF indexes can be cut and pasted into text files, I decided to write a short Perl script to generate a comprehensive index. As this code would be useful for other PDF series as well, I'm posting it here.

Here is the code:

Code:
use strict;
use warnings;

my $previous = '';   # entry being assembled from wrapped lines
my @stuff;           # completed entries from all books

while (my $file = shift) {
    die "Need a proper file name" unless $file =~ /(\w+)/;
    my $f = $1;

    # three-argument open with a lexical handle; $! holds the OS error
    open my $fh, '<', $file
        or die "Cannot open file $file: $!";

    while (my $line = <$fh>) {
        chomp($line);
        #print ":: $line\n";

        #do not insert a space into the line if the last
        #one ended with a dash
        $previous .= ' ' if $previous && $previous =~ /[^-]$/;
        $previous .= $line;

        #if the line ends in a period, insert book
        #references and remember that complete line
        if ($line =~ /\.$/){
            $previous =~ s/( \d)/ $f$1/g;
            $previous =~ s/\222/'/g;
            push (@stuff, $previous);
            #print "|" . $previous . "|\n";
            $previous = '';
        }
    }
}

foreach my $line (sort @stuff) {
    print "$line\n";
}

To produce an index with it, install Perl on your box if necessary and create a folder for your text files. Put the above code into a text file called indexer.pl. Name each text file with the book designator that you want to use. (Example: I.txt, II.txt, III.txt, IV.txt....) From a command line type "perl indexer.pl I.txt II.txt III.txt IV.txt V.txt" and it will print a comprehensive index. [Note: this example spells out every file name because the Windows command prompt does not automatically expand (glob) wildcard filename arguments the way I think Unix shells do. I did not yet want to spend time figuring out the correct cross-platform, user-friendly approach to this issue, so my code simply goes through the list of file names it receives as arguments.]
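
One possible way around the globbing issue is to have the script expand wildcards itself with Perl's built-in glob function. This is only a sketch: the helper name expand_args is made up for illustration and is not part of the script above, and it assumes the file names contain no spaces (the built-in glob splits its pattern on whitespace).

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): expand any wildcard
# arguments ourselves, since the Windows command prompt hands "*.txt"
# to the script literally instead of expanding it.
sub expand_args {
    my @args = @_;
    my @files;
    for my $arg (@args) {
        if ($arg =~ /[*?]/) {
            # Wildcard present: let Perl's built-in glob find the matches.
            push @files, sort glob($arg);
        }
        else {
            # Plain file name: pass it through untouched.
            push @files, $arg;
        }
    }
    return @files;
}
```

In indexer.pl, a line such as "@ARGV = expand_args(@ARGV);" placed before the main loop should let you type "perl indexer.pl *.txt" on either platform. On Unix the shell has already expanded the names by then, so the helper just passes them through.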

It seems to work pretty well. The only issue that I see is for cases where the same entry appears in more than one book. You would want to change those to be a single line with the page references concatenated together. Also... some may prefer not to have the volume designator with every single page number.
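
A rough sketch of how the duplicate entries might be merged. The sub name merge_entries is hypothetical, and it assumes every entry has the shape "Heading, references." with the heading ending at the first comma. Real index entries (sub-entries, "see also" lines, headings that themselves contain commas) would need a smarter split.

```perl
use strict;
use warnings;

# Hypothetical helper: combine entries that share a heading into one line.
# ASSUMPTION: each entry looks like "Heading, refs." and the heading
# ends at the first comma.
sub merge_entries {
    my @entries = @_;
    my %refs;    # heading => list of reference strings, one per book
    for my $entry (@entries) {
        (my $text = $entry) =~ s/\.$//;    # drop the trailing period
        next unless $text =~ /\S/;         # skip empty entries
        my ($heading, $rest) = split /,\s*/, $text, 2;
        $refs{$heading} = [] unless exists $refs{$heading};
        push @{ $refs{$heading} }, $rest if defined $rest && length $rest;
    }

    my @merged;
    for my $heading (sort keys %refs) {
        my @r = @{ $refs{$heading} };
        # Rebuild the line, restoring the trailing period.
        push @merged, @r ? "$heading, " . join(', ', @r) . '.' : "$heading.";
    }
    return @merged;
}
```

With this in place, the final loop could iterate over merge_entries(@stuff) instead of sort @stuff. For example, "Armor, I 12." and "Armor, II 30." would come out as the single line "Armor, I 12, II 30."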

The next step would be for someone to take the text generated by this code and lay it out in a word processor... and maybe save it as a PDF for other people to use. But as that gets into copyright issues and so forth, I will leave that to others to worry about.

It took me an hour and a half to write this code and describe my solution here.

PS I am currently looking for work. I am an expert .NET platform database application developer with 8 years' experience... and I enjoy tinkering with Perl on the side. Contact me via private messages here if you know of any opportunities that might be a good fit for me. I'd like to try working from home, perhaps, but I'm also willing to move. Thanks!
__________________
Jeffro's Space Gaming Blog
Microgames, Monster Games, and Role Playing Games