Friday, October 07, 2011

Processing common suffixes with AIML

A longstanding problem in AIML pattern matching is handling certain common suffixes in English. These are words or phrases added to the end of a sentence that contribute little in the way of meaning, such as “really”, “eh”, and “you know”. By far the most common example is a personal name at the end of a sentence: “Nice to meet you, Alice”, “How is the weather over there, Alice?” and “You are silly, Alice.”

When creating a bot in AIML we’d like to reduce these kinds of sentences to the same sentence without the suffix. “How is the weather over there Alice?” means the same as “How is the weather over there?” It’s quite tempting to write one AIML category that covers all these cases:

<category>
<pattern>_ ALICE</pattern>
<template><srai><star/></srai></template>
</category>

This category reduces any sentence ending with ALICE to the same sentence without the suffix, and uses <srai> recursion to find a response. Moreover, because the _ wildcard has the highest matching priority, the category catches every input ending with ALICE, even when a more specific pattern exists. This reduction category saves us from having to write separate patterns for every case like “How are you” and “How are you Alice?” (which would effectively double the number of categories needed for a bot).

This is all well and good, except that in other cases the suffix ALICE does add meaning to the sentence. “Who is Alice?”, “Why are you named Alice?” and “My name is Alice” would all be reduced to meaningless sentence fragments: “Who is”, “Why are you named”, and “My name is”.

In AIML, a pattern using the wildcard _ has higher priority than one using an exact word, which in turn has higher priority than one using the wildcard *. Simply changing the above category to use <pattern>* ALICE</pattern> doesn’t solve the problem. Because an exact match on the word WHY outranks *, an input like “Why are you named Alice” might match a default category with a
<pattern>WHY ARE YOU *</pattern>
before it matches
<pattern>* ALICE</pattern>.
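
To make the priority order concrete, here is a small illustrative set of categories. The HELLO ALICE pattern and all of the reply texts are hypothetical and are not taken from the ALICE set; the sketch assumes an AIML 1.0.1-style interpreter that tries _ first, then an exact word, then *.

<category>
<pattern>_ ALICE</pattern>
<template>Matched the underscore pattern.</template>
</category>

<category>
<pattern>HELLO ALICE</pattern>
<template>Matched the exact pattern.</template>
</category>

<category>
<pattern>* ALICE</pattern>
<template>Matched the star pattern.</template>
</category>

With all three categories loaded, the input “Hello Alice” should be caught by _ ALICE, even though HELLO ALICE is the more specific pattern. If the first category is removed, the same input should match HELLO ALICE, and * ALICE should only be reached by inputs such as “Goodbye Alice” that no word-initial pattern can match.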

The solution outlined here uses two tricks. First, we replace the suffix ALICE with the word BOTNAME. Then we use the AIML <topic> feature to collect all the categories ending with ALICE where the word ALICE plays a meaningful role. If one of those matches the modified input, the bot returns that result. Otherwise, we look for a match for the sentence with the suffix removed.

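<category><pattern>_ ALICE</pattern>
<template>
<think>
<!-- Remember the current topic, then switch to the suffix topic -->
<set name="savedtopic"><get name="topic"/></set>
<set name="topic">ENDS WITH ALICE</set>
<!-- Assume no meaningful-suffix category matches until one says otherwise -->
<set name="matched">false</set>
<!-- Try the input with the suffix rewritten as BOTNAME -->
<set name="response"><srai><star/> BOTNAME</srai></set>
</think>
<condition name="matched">
<!-- A category in the topic matched: return its response -->
<li value="true"><get name="response"/></li>
<!-- Otherwise respond to the sentence without the suffix -->
<li><srai><star/></srai></li>
</condition>
<think>
<!-- Restore the original topic -->
<set name="topic"><get name="savedtopic"/></set>
</think>
</template>
</category>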


The AIML above uses the “savedtopic” predicate to remember the current conversation topic, then sets “topic” to ENDS WITH ALICE. Another predicate, “matched”, is initialized to false. If the program finds a match where the suffix is meaningful, the value of “matched” is set to true and the conditional branch returns that match. Otherwise, the conditional returns a match for the sentence without the suffix.

The following AIML topic group collects the categories where we want to match the sentence including the suffix ALICE. Each of these includes the statement <set name="matched">true</set>, indicating a successful match.

<topic name="ENDS WITH ALICE">

<category><pattern>TEST BOTNAME</pattern>
<template>
<bot name="name"/> is functioning normally.
<think><set name="matched">true</set></think>
</template>
</category>

<category><pattern>CALL ME BOTNAME</pattern>
<template>
My name is 
<set name="name"><bot name="name"/></set> too!
<think><set name="matched">true</set></think>
</template>
</category>

<category><pattern>WHAT IS BOTNAME</pattern>
<template><srai>WHAT ARE YOU</srai>
<think><set name="matched">true</set></think>
</template>
</category>

<category><pattern>WHO IS BOTNAME</pattern>
<template>I am <bot name="name"/>.
<think><set name="matched">true</set></think>
</template>
</category>

</topic>
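
Further cases can be handled by adding categories to the same topic group. As a sketch, the “My name is Alice” input mentioned earlier could be covered as follows. This category is hypothetical and its reply wording is an assumption; it simply follows the shape of the CALL ME BOTNAME category above.

<topic name="ENDS WITH ALICE">

<category><pattern>MY NAME IS BOTNAME</pattern>
<template>
Then we share the name <set name="name"><bot name="name"/></set>.
<!-- Signal to the _ ALICE category that the suffix was meaningful -->
<think><set name="matched">true</set></think>
</template>
</category>

</topic>

Any category placed in this topic needs the <set name="matched">true</set> statement; without it, the _ ALICE category would discard the response and fall back to the sentence with the suffix removed.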

One slightly pathological case can, however, make the above code “forget” the original conversation topic. This happens when the suffix is repeated, so that the first category is called recursively more than once. For example, the input “Alice Alice Alice Alice Alice” results in four levels of recursion. Each nested call runs while the topic is ENDS WITH ALICE and so overwrites “savedtopic” with that value; when the procedure unwinds, the topic is still set to ENDS WITH ALICE. There does appear to be a solution covering this situation, however, and it is left as an exercise to the attentive reader.
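
One possible approach to that exercise, offered as an untested sketch and not necessarily the solution the author has in mind, is to restore the topic before the fallback <srai> runs. Any nested call to the category then starts from the original topic and saves the correct value in “savedtopic”. The predicate names are the same as in the category above; only the position of the restore step is changed.

<category><pattern>_ ALICE</pattern>
<template>
<think>
<set name="savedtopic"><get name="topic"/></set>
<set name="topic">ENDS WITH ALICE</set>
<set name="matched">false</set>
<set name="response"><srai><star/> BOTNAME</srai></set>
<!-- Restore the topic here, before any recursive call below can save it -->
<set name="topic"><get name="savedtopic"/></set>
</think>
<condition name="matched">
<li value="true"><get name="response"/></li>
<li><srai><star/></srai></li>
</condition>
</template>
</category>

A side effect of this ordering is that the sentence without the suffix is matched under the original topic, not under ENDS WITH ALICE, so topic-specific categories for the ongoing conversation remain reachable. It still assumes that no category reached through the BOTNAME <srai> changes “savedtopic” itself.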

The code in this example is built into the latest development snapshot of the ALICE AIML set available in our repository at http://code.google.com/p/aiml-en-us-foundation-alice/
