Saturday, March 28, 2009

Safe Reductions and Dangerous ones

In order to help people make their own chat bots, with their own unique personalities, we would like to leverage as much of the ALICE bot as possible. The ALICE bot contains a large number (around 20,000) of symbolic reduction categories that use <srai>. Generally speaking, these reductions should be applicable to all English speaking bots. They take care of such things as associating synonyms, transforming inputs into simpler forms, and splitting inputs into smaller chunks. Fortunately, it is fairly straightforward to extract these from the ALICE brain into a set of distinct reduction categories, and then try to use them with other bots.

A simple and useful application of the reduction categories by themselves is a bot called the Summarizer. The purpose of the Summarizer is to simplify paragraphs and reduce the number of words. The Summarizer works not by reducing the number of sentences, but by applying as many AIML reductions as possible to reduce the number of words in each sentence. The ultimate default category in the Summarizer with <pattern>*</pattern> simply echos back the input. When the Summarizer can find no more reductions, it prints out the result. The following are some examples of the types of simplifications found by the Summarizer:

I AM FEELING VERY TIRED RIGHT NOW --> I AM TIRED
IN FACT HE HAS GONE TO ANOTHER COUNTRY --> HE WENT ABROAD
YOU CAN REST ASSURED WHAT HE SAYS IS ABSOLUTELY CORRECT --> HE IS CORRECT

When we first created the Summarizer bot, the first step was to export the AIML reductions from the ALICE bot. Although the extraction was straighforward, we quickly noticed an unexpected problem. The ALICE bot contais reductions such as

<category>
<pattern>WHO IS * WALLACE</pattern>
<template><srai>WHO IS RICHARD WALLACE</srai></template>
</category>


The author's original intention for this category is clear. There conversation logs contain inputs such as "Who is DrRichard Wallace", "Who is Richrd Wallace" and "Who is Docter Wallace". The botmaster would like to capture all of these inputs to be classified the same as "Who is Richard Wallace". In the context of the ALICE bot, it is far less likely that the input will be "Who is Henry Wallace" or "Who is William Wallace". Given the topics ALICE talks about, an approrpiate startegy for the bot is to guess that the client intended to ask "Who is Richard Wallace?"

Once this category is extracted to the Summarizer bot however, there is a problem. The base category with the pattern WHO IS RICHARD WALLACE no longer exists. Suppose the input is now "Who is Richard Wallace?" AIML will match the input with <pattern>WHO IS * WALLACE</pattern> and then recursively try to match "WHO IS RICHARD WALLACE" again, leading to infinite recursion.

Therefore it became necessary to filter out these "dangerous" AIML reductions. How can we identify reductions that might lead to infinite loops? Technically it is mathematically impossible to prove that a reduction will lead to an inifinite loop (otherwise we would have solved the unsolvable Halting Problem). What we can do however is identify a set of reductions that are "safe", i.e. guaranteed to eventually produce an output by applying only a finite number of reductions.

A "safe" reduction is one that takes an input, and then calls <srai> with a shorter input, that is, it reduces the number of words. If every reduction reduces the number of words by at least one, then eventually the number of words has to reach one or zero, and the program will terminate. For example these reductions are safe:

<category>
<pattern>I AM REALLY *</pattern>
<template><srai>I AM <star/></srai></template>
</category>

<category>
<pattern>DO YOU KNOW WHAT * IS</pattern>
<template><srai>WHAT IS <star/></srai></template>
</category>

<category>
<pattern>YES *</pattern>
<template><srai>YES</srai> <srai><star/></srai></template>
</category>

The last example is considered safe because the divide and conqer splits the input into two parts, one of length one, and another with one fewer words than the original input.

Some reductions not considered safe are:

<category>
<pattern>WANT TO *</pattern>
<template><srai>DO YOU WANT TO <star/></srai></template>
</category>

<category>
<pattern>I LOVE *</pattern>
<template><srai>I LIKE <star/></srai></template>
<category>

<category>
<pattern>CAN YOU SPEAK *</pattern>
<template><srai>WHAT LANGUAGES DO YOU SPEAK</srai></template>
</category>

Again, it is not necessarily true that these categories will lead to infinite recursion. But the safe set is guaranteed to terminate, so we are better off using them in our general purpose AIML reduction library.

3 comments:

  1. Couldn't you just track the recursion and detect when it repeats a reduction? It shouldn't be possible to create an infinite loop, with something like AIML.

    ReplyDelete
  2. Repeating reductions is not a sufficient condition for detecting loops in AIML. Sometimes a bot wants use a reduction more than once, for example reducing "I really really really really really like you" to "I like you" uses the same reduction 5 times. There is no way to predict the variety or length of human inputs.

    ReplyDelete
  3. While repeating reduction is not a sufficient criteria for detecting loops, would both looking for repeated reductions along with the text not getting shorter (or at least changing) a good criteria.

    I suppose if the criteria were that the text changed, it might be necessary to look back at past text to see if it repeats?

    Not being familiar with Alice, I'm not sure whether this idea fits, although I am inclined to think it's better to avoid such repetition by design rather than by detecting it later. The former would probably make the system less extensible because behavior would be more difficult to understand. So, it's probably best not to implement the idea I suggest above!

    ReplyDelete

 

blogger templates | Make Money Online