Indexing Words by page, paragraph, line

ChrisGreaves
PlutoniumLounger
Posts: 15626
Joined: 24 Jan 2010, 23:23
Location: brings.slot.perky

Post by ChrisGreaves »

The attached ZIP file contains a Word document with some simple VBA code, plus two screen snapshots.
The goal was to produce a laid-out index of every word in a document, except for words on a list of prohibited words and words below a specified length.
The report is placed in a new document, with each word listed together with its count of occurrences and its occurrence given as Page Number (within the document), Paragraph Number (within the document) and Line Number (within the page).
The output is sorted in Order of First Appearance within the document.
The report is in tab-delimited format to allow a plain convert-text-to-table operation.
If I were counting this as a favour to The Lounge(s) the score would read something like: Lions 23, Christians 4,567.
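For readers who cannot open the attachment, here is a minimal sketch of the kind of loop described above. It is not the code in the ZIP. The procedure name, the MIN_LENGTH constant and the prohibited-word list are made up for illustration, and the sketch assumes that only the location of each word's first occurrence is reported.

```vba
Option Explicit

' Illustrative value only; the real setting in the ZIP is not shown in this thread.
Private Const MIN_LENGTH As Long = 4

Sub IndexWordsSketch()
    Dim docSource As Document, docReport As Document
    Dim dictCount As Object, dictFirst As Object, dictStop As Object
    Dim para As Paragraph, rngWord As Range, rngStart As Range
    Dim strWord As String, strOut As String
    Dim lngPara As Long
    Dim vItem As Variant

    Set docSource = ActiveDocument
    ' Late-bound Scripting.Dictionary (Windows only); keys come back in insertion order,
    ' which gives the order of first appearance for free.
    Set dictCount = CreateObject("Scripting.Dictionary")
    Set dictFirst = CreateObject("Scripting.Dictionary")
    Set dictStop = CreateObject("Scripting.Dictionary")
    For Each vItem In Array("this", "that", "with", "from")   ' hypothetical prohibited words
        dictStop.Add CStr(vItem), True
    Next vItem

    For Each para In docSource.Paragraphs
        lngPara = lngPara + 1
        For Each rngWord In para.Range.Words
            strWord = Trim$(rngWord.Text)
            ' Skip punctuation, paragraph marks, short words and prohibited words
            If Len(strWord) >= MIN_LENGTH And strWord Like "[A-Za-z]*" Then
                If Not dictStop.Exists(LCase$(strWord)) Then
                    If dictCount.Exists(strWord) Then
                        dictCount(strWord) = dictCount(strWord) + 1
                    Else
                        ' Measure position at the start of the word
                        Set rngStart = rngWord.Duplicate
                        rngStart.Collapse wdCollapseStart
                        dictCount.Add strWord, 1
                        dictFirst.Add strWord, _
                            rngStart.Information(wdActiveEndPageNumber) & vbTab & _
                            lngPara & vbTab & _
                            rngStart.Information(wdFirstCharacterLineNumber)
                    End If
                End If
            End If
        Next rngWord
    Next para

    ' Tab-delimited report, ready for Convert Text to Table
    strOut = "Word" & vbTab & "Count" & vbTab & "Page" & vbTab & _
             "Paragraph" & vbTab & "Line" & vbCr
    For Each vItem In dictCount.Keys
        strOut = strOut & vItem & vbTab & dictCount(vItem) & vbTab & _
                 dictFirst(vItem) & vbCr
    Next vItem
    Set docReport = Documents.Add
    docReport.Range.InsertAfter strOut
End Sub
```

This version works on Range objects rather than the Selection, so nothing moves on screen while it runs. The keys are stored exactly as typed, so the comparison is case-sensitive.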
There's nothing heavier than an empty water bottle

HansV
Administrator
Posts: 78493
Joined: 16 Jan 2010, 00:14
Status: Microsoft MVP
Location: Wageningen, The Netherlands

Re: Indexing Words by page, paragraph, line

Post by HansV »

Thanks - it works well.

Just a few remarks:
  • It's cute to see the code work its way through the source document, but for "production" purposes it might be better to set Application.ScreenUpdating to False before processing the document, and to True again afterwards.
  • Text comparison is case sensitive, so for example "mine", "Mine" and "MINE" end up as different entries. In my humble opinion, it would be better to convert everything to lower case, or to provide an extra parameter to specify case (in)sensitivity.
  • It would also be nice to have an option to sort the words alphabetically, but I realize that would be more work, and it's not crucial.
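To illustrate the first two remarks, here is a rough sketch. The names RunIndexQuietly and IndexKey are invented for this example and do not come from the attached document. The point of the error handler is that screen updating is switched back on even if the processing fails.

```vba
Option Explicit

Sub RunIndexQuietly()
    On Error GoTo CleanUp
    Application.ScreenUpdating = False

    ' ... the existing index-building code would go here ...

CleanUp:
    ' Always restore screen updating, whether or not an error occurred
    Application.ScreenUpdating = True
    If Err.Number <> 0 Then
        MsgBox "Indexing stopped: " & Err.Description, vbExclamation
    End If
End Sub

' Hypothetical helper: returns the key under which a word is stored.
' With blnIgnoreCase = True, "mine", "Mine" and "MINE" all map to "mine".
Function IndexKey(ByVal strWord As String, ByVal blnIgnoreCase As Boolean) As String
    If blnIgnoreCase Then
        IndexKey = LCase$(strWord)
    Else
        IndexKey = strWord
    End If
End Function
```

If the words are kept in a Scripting.Dictionary, an alternative is to set its CompareMode property to vbTextCompare before adding the first item; the stored key then keeps the capitalisation of the first occurrence.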
Best wishes,
Hans

Re: Indexing Words by page, paragraph, line

Post by ChrisGreaves »

HansV wrote:
  • set Application.ScreenUpdating to False
  • Text comparison is case sensitive,
  • sort the words alphabetically,
True, true and true.
And THANKS!

I ran this up early this morning for a friend - it's more a proof-of-concept than anything else.
It has none of my usual bells and whistles, and mirabile dictu it does NOT make use of my utility library UW.dot.

I certainly prefer Range over Selection, but this being a P-O-C I wanted the user to see das blinken lights. Also a developer can step through the code and see how it works, for now.

I usually have an INI file with an associated GUI form (supported by UW.dot) where the user can set options such as case-sensitivity, word-length, prohibited words and so on. Again, for this P-O-C not essential.

Sorting is not so hard. I use the original QSort algorithm I stole (make that borrowed) in 1997. I usually create an auxiliary string array holding the key (each word) and a string-formatted version of its index in the Type array, sort the auxiliary string array, and then rebuild the Type array in ascending or descending sequence (there's that INI/GUI again!). It is then a straightforward matter to sort by word, page or paragraph.
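A rough sketch of that auxiliary-array approach might look like the code below. The WordEntry type, its field names and the procedure names are assumptions made for illustration, and the quicksort is a generic textbook version, not the 1997 routine.

```vba
Option Explicit

' Hypothetical record layout for one index entry
Private Type WordEntry
    Word As String
    WordCount As Long
    PageNo As Long
    ParaNo As Long
    LineNo As Long
End Type

' Sort entries by word via an auxiliary array of "key <tab> zero-padded index" strings
Private Sub SortEntriesByWord(ByRef entries() As WordEntry, _
                              Optional ByVal blnDescending As Boolean = False)
    Dim keys() As String, sorted() As WordEntry
    Dim i As Long, idx As Long, lo As Long, hi As Long
    lo = LBound(entries): hi = UBound(entries)
    ReDim keys(lo To hi)
    For i = lo To hi
        keys(i) = LCase$(entries(i).Word) & vbTab & Format$(i, "0000000")
    Next i
    QuickSortStrings keys, lo, hi
    ' Rebuild the Type array in the requested sequence
    ReDim sorted(lo To hi)
    For i = lo To hi
        idx = CLng(Mid$(keys(i), InStrRev(keys(i), vbTab) + 1))
        If blnDescending Then
            sorted(hi - (i - lo)) = entries(idx)
        Else
            sorted(i) = entries(idx)
        End If
    Next i
    For i = lo To hi
        entries(i) = sorted(i)
    Next i
End Sub

' Generic recursive quicksort on a string array (binary comparison)
Private Sub QuickSortStrings(ByRef a() As String, ByVal first As Long, ByVal last As Long)
    Dim i As Long, j As Long
    Dim pivot As String, tmp As String
    If first >= last Then Exit Sub
    i = first: j = last
    pivot = a((first + last) \ 2)
    Do While i <= j
        Do While StrComp(a(i), pivot, vbBinaryCompare) < 0
            i = i + 1
        Loop
        Do While StrComp(a(j), pivot, vbBinaryCompare) > 0
            j = j - 1
        Loop
        If i <= j Then
            tmp = a(i): a(i) = a(j): a(j) = tmp
            i = i + 1: j = j - 1
        End If
    Loop
    If first < j Then QuickSortStrings a, first, j
    If i < last Then QuickSortStrings a, i, last
End Sub

Sub DemoSort()
    Dim e(0 To 2) As WordEntry, i As Long
    e(0).Word = "zebra": e(1).Word = "Apple": e(2).Word = "mango"
    SortEntriesByWord e
    For i = 0 To 2
        Debug.Print e(i).Word      ' Apple, mango, zebra
    Next i
End Sub
```

To sort by page or paragraph instead, the same idea works with a zero-padded number as the key, for example Format$(entries(i).PageNo, "0000000"), so that the string comparison orders the numbers correctly.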

Anyway, all perfectly valid points.
I'm not even sure if we have a line-of-page thread on Eileen's lounge.
Yet! (grin!)

P.S. I had to use a fine-toothed saw to trim the ZIP to 100K; first time I've been so close that a pixel made itself known ...

Re: Indexing Words by page, paragraph, line

Post by HansV »

ChrisGreaves wrote: P.S. I had to use a fine-toothed saw to trim the ZIP to 100K; first time I've been so close that a pixel made itself known ...
100 KB was the size limit for attachments in Woody's Lounge before it moved to different software. Here in Eileen's Lounge, the size limit is 256 KB.
Best wishes,
Hans

Re: Indexing Words by page, paragraph, line

Post by ChrisGreaves »

HansV wrote: Here in Eileen's Lounge, the size limit is 256 KB.
Correct. My old eyes ...
I just re-zipped the original and it comes to about 272 KB. I must have seen the "over-limit" messages and reverted to the mental state of a 100 KB limit.