"Sounds Like" - searching for text.

Post by ChrisGreaves »

My first thought was "Soundex", so I quickly scanned my database of VBA code for procedures, dragged up a couple, and found that they failed (see below).

My second thought was "Google", and this example looked promising, but failed, sort of. (see below)

There are numerous web pages which promise an on-line test (good for high school students who have a homework deadline), or dedicated sites ("Ancestry" or "Genealogy" dot anything) which focus on Names of People or Places and, I suspect, lean heavily on a variant of Soundex.

Below:
My acid test is for Modern English words that (from 38m0s to 39m23s) "Begin with C pronounced “ess”". Examples include "Civil", "Cease" and "Cymbal". (Yes, "pronounCement" too, although it does not begin with "C"; anyway, I have a better rule, not shown here, for classifying that sort of word.)

Code:

Sub test()
    ' Sounds-like search for the S-spelling of a word; a hit on "Cymbal"
    ' suggests that its leading "C" is pronounced "ess".
    With Selection.Find
        .ClearFormatting
        .Text = "Symbal" ' Leading C replaced with a leading S.
        .MatchFuzzy = False
        .MatchSoundsLike = True
        .Execute Format:=False, Forward:=True, Wrap:=wdFindContinue
    End With
End Sub
The code from the docs.microsoft.com web page worked pretty well.

I have a document of notes that includes various examples, such as "Civil", "Cease" and "Cymbal". I loaded the test VBA code with the word "civil" and changed the leading "c" to an "s".
My reasoning: if the word under test is "civil", then a successful sounds-like search for "sivil" shows that the document contains a word that sounds like "sivil" (it will turn out to be "civil"). Since I am testing only words that begin with "c", any such word that sounds like itself with the "c" replaced by "s" is a satisfactory target. (The pseudo-code says: look at all words and, for those that start with "c", search for that string after replacing the "c" with "s" in the FindWhat text.)
My Sub test above successfully determines that "cymbal" sounds like "symbal" (by locating "cymbal" in my document), and so "cymbal" satisfies my requirements.
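
The pseudo-code above could be sketched in VBA roughly as follows. This is only a sketch under assumptions: the names SoundsLikeLeadingS and ListLeadingCWords are hypothetical (they are not from the docs.microsoft.com example), the words are taken from ActiveDocument, and a word counts as a match only when one of the sounds-like hits is the original word itself.

```vba
Option Explicit

' Hypothetical helper: True when a sounds-like search for the S-spelling
' of strWord lands on strWord itself somewhere in the active document.
Function SoundsLikeLeadingS(ByVal strWord As String) As Boolean
    Dim rng As Range
    If LCase$(Left$(strWord, 1)) <> "c" Then Exit Function
    Set rng = ActiveDocument.Content
    With rng.Find
        .ClearFormatting
        .Text = "s" & Mid$(strWord, 2)   ' e.g. "Cymbal" -> "symbal"
        .MatchWildcards = False
        .MatchSoundsLike = True
        .Forward = True
        .Wrap = wdFindStop               ' stop at the end of the document
        Do While .Execute
            ' rng now covers the hit; accept it only if it is our word.
            If StrComp(Trim$(rng.Text), strWord, vbTextCompare) = 0 Then
                SoundsLikeLeadingS = True
                Exit Function
            End If
            rng.Collapse wdCollapseEnd   ' continue after this hit
        Loop
    End With
End Function

' Hypothetical driver: list every word starting with "c" and its result.
Sub ListLeadingCWords()
    Dim rngWord As Range
    Dim strWord As String
    For Each rngWord In ActiveDocument.Words
        strWord = Trim$(rngWord.Text)    ' Words items carry trailing spaces
        If Len(strWord) > 1 And LCase$(Left$(strWord, 1)) = "c" Then
            Debug.Print strWord, SoundsLikeLeadingS(strWord)
        End If
    Next rngWord
End Sub
```

Run ListLeadingCWords with the Immediate window open. The sketch inherits whatever the sounds-like search decides, so it will repeat any false positives of that search, and it rescans the document for every candidate word, which is slow on long documents.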

I also had the strings “Christopher” and “Church” in my document. Sub test correctly told me that "Christopher" failed the test, but it reported that "Church" passed: sadly, the search for "Shurch" matched "Church", which is a false positive.

Looking for a c-cedilla (ç) probably won't help much (as in comparing “cymbal” with “la cymbale”): most Modern English words do not use modified letters such as ç, and even if the French word for a cymbal did have a cedilla, how is my program supposed to know that?

My current status is that the code from the docs.microsoft.com web page is good but not perfect. It seems to work better than Soundex, which was designed for indexing names by sound as pronounced in English, as distinct from classifying Modern English words in general. Soundex is very good at equating Greaves with Greeves with Grieves, or Witt with Wit with Whytte, when you are asking about your flight reservation. It is not so good with "gravitational" or perhaps "generous".
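
For comparison, here is a minimal sketch of classic (American) Soundex in VBA. The function name SimpleSoundex is hypothetical and this is not one of the procedures mentioned above. It shows why plain Soundex cannot pass the C/S test: the first letter is kept literally, so a "C" word and its "S" respelling always get different codes.

```vba
Option Explicit

' Minimal sketch of classic Soundex: first letter plus three digits.
Function SimpleSoundex(ByVal strName As String) As String
    ' Digit for each letter A..Z; "0" marks letters that are not coded.
    Const CODES As String = "01230120022455012623010202"
    Dim i As Long, n As Long
    Dim ch As String, code As String, lastCode As String
    Dim result As String

    strName = UCase$(strName)
    For i = 1 To Len(strName)
        ch = Mid$(strName, i, 1)
        n = Asc(ch)
        If n >= 65 And n <= 90 Then              ' A..Z only
            code = Mid$(CODES, n - 64, 1)
            If Len(result) = 0 Then
                result = ch                      ' keep the first letter as is
                lastCode = code
            ElseIf ch <> "H" And ch <> "W" Then  ' H and W do not split a run
                If code <> "0" And code <> lastCode Then result = result & code
                lastCode = code                  ' vowels reset the run
            End If
        End If
        If Len(result) = 4 Then Exit For
    Next i
    If Len(result) > 0 Then SimpleSoundex = Left$(result & "000", 4)
End Function

Sub DemoSimpleSoundex()
    Debug.Print SimpleSoundex("Greaves"), SimpleSoundex("Greeves"), SimpleSoundex("Grieves") ' G612 x 3
    Debug.Print SimpleSoundex("Witt"), SimpleSoundex("Wit"), SimpleSoundex("Whytte")          ' W300 x 3
    Debug.Print SimpleSoundex("Cymbal"), SimpleSoundex("Symbal")                              ' C514 and S514
End Sub
```

The name variants collapse to a single code, as expected, while "Cymbal" and "Symbal" differ only in the first letter, which Soundex never encodes.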

There are more sounds-like conditions than the C/S one I have shown here.

I can run some more tests and measure the accuracy of the docs.microsoft.com code, and then compare that benchmark against any VBA-like method suggested.

(later) This web site suggests that "c" before "i" or "e" is a worthwhile rule in French; that might help me in English to focus on successful matches.
(later still) Getting there! This is with a rule that looks for leading ce/ci/cy (attached text file).
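
A rule that looks for leading ce/ci/cy might be sketched like this. The attached file is not visible here, so this is an assumption about the shape of such a rule rather than the code that was actually used, and the function name LeadingSoftC is hypothetical.

```vba
Option Explicit

' Hypothetical rule: True when the word starts with "c" followed by e, i or y.
Function LeadingSoftC(ByVal strWord As String) As Boolean
    LeadingSoftC = (LCase$(Trim$(strWord)) Like "c[eiy]*")
End Function

Sub DemoLeadingSoftC()
    Dim v As Variant
    For Each v In Array("Civil", "Cease", "Cymbal", "Church", "Christopher")
        Debug.Print v, LeadingSoftC(CStr(v))
    Next v
    ' Civil, Cease and Cymbal print True; Church and Christopher print False.
End Sub
```

A spelling rule like this rejects "Church" without any sound matching, but it will still accept exceptions such as "cello", so it is best treated as a filter to combine with the sounds-like test rather than as a replacement for it.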
Thanks
Chris