Sunday, March 29, 2009

Safe AIML Reductions

The ALICE A.I. Foundation is pleased to make available a set of over 20,000 AIML reduction categories that can be used with any bot. These reductions simplify a huge range of natural language inputs. These categories:

Resolve Synonyms: Map inputs "Hello", "Hi there", "Howdy" etc. onto "Hi".

Simplify Inputs: Convert "I am feeling very happy right now" to "I am happy".

Divide and Conquer: Split "Yes my name is Jim" into "Yes" and "My name is Jim".

The Safe AIML Reductions contain only reduction responses and are stripped of all ALICE bot personality content. You can use the Safe AIML Reductions with any AIML chat bot. The Safe AIML Reductions will save a lot of time when you are developing a unique chat bot personality.

The Safe AIML Reductions avoid the dreaded "Too much recursion" error message in AIML by using <srai>'s that eventually terminate. The Safe AIML Reductions are delivered to you in the form of AIML files.

Saturday, March 28, 2009

Safe Reductions and Dangerous ones

In order to help people make their own chat bots, with their own unique personalities, we would like to leverage as much of the ALICE bot as possible. The ALICE bot contains a large number (around 20,000) of symbolic reduction categories that use <srai>. Generally speaking, these reductions should be applicable to all English speaking bots. They take care of such things as associating synonyms, transforming inputs into simpler forms, and splitting inputs into smaller chunks. Fortunately, it is fairly straightforward to extract these from the ALICE brain into a set of distinct reduction categories, and then try to use them with other bots.

A simple and useful application of the reduction categories by themselves is a bot called the Summarizer. The purpose of the Summarizer is to simplify paragraphs and reduce the number of words. The Summarizer works not by reducing the number of sentences, but by applying as many AIML reductions as possible to reduce the number of words in each sentence. The ultimate default category in the Summarizer with <pattern>*</pattern> simply echoes back the input. When the Summarizer can find no more reductions, it prints out the result. The following are some examples of the types of simplifications found by the Summarizer:

I AM FEELING VERY TIRED RIGHT NOW --> I AM TIRED
IN FACT HE HAS GONE TO ANOTHER COUNTRY --> HE WENT ABROAD
YOU CAN REST ASSURED WHAT HE SAYS IS ABSOLUTELY CORRECT --> HE IS CORRECT

When we first created the Summarizer bot, the first step was to export the AIML reductions from the ALICE bot. Although the extraction was straightforward, we quickly noticed an unexpected problem. The ALICE bot contains reductions such as

<category>
<pattern>WHO IS * WALLACE</pattern>
<template><srai>WHO IS RICHARD WALLACE</srai></template>
</category>


The author's original intention for this category is clear. The conversation logs contain inputs such as "Who is DrRichard Wallace", "Who is Richrd Wallace" and "Who is Docter Wallace". The botmaster would like all of these inputs to be classified the same as "Who is Richard Wallace". In the context of the ALICE bot, it is far less likely that the input will be "Who is Henry Wallace" or "Who is William Wallace". Given the topics ALICE talks about, an appropriate strategy for the bot is to guess that the client intended to ask "Who is Richard Wallace?"

Once this category is extracted to the Summarizer bot however, there is a problem. The base category with the pattern WHO IS RICHARD WALLACE no longer exists. Suppose the input is now "Who is Richard Wallace?" AIML will match the input with <pattern>WHO IS * WALLACE</pattern> and then recursively try to match "WHO IS RICHARD WALLACE" again, leading to infinite recursion.

Therefore it became necessary to filter out these "dangerous" AIML reductions. How can we identify reductions that might lead to infinite loops? Technically, there is no general procedure that can decide, for an arbitrary set of reductions, whether it will lead to an infinite loop (otherwise we would have solved the unsolvable Halting Problem). What we can do however is identify a set of reductions that are "safe", i.e. guaranteed to eventually produce an output by applying only a finite number of reductions.

A "safe" reduction is one that takes an input, and then calls <srai> with a shorter input, that is, it reduces the number of words. If every reduction reduces the number of words by at least one, then eventually the number of words has to reach one or zero, and the program will terminate. For example these reductions are safe:

<category>
<pattern>I AM REALLY *</pattern>
<template><srai>I AM <star/></srai></template>
</category>

<category>
<pattern>DO YOU KNOW WHAT * IS</pattern>
<template><srai>WHAT IS <star/></srai></template>
</category>

<category>
<pattern>YES *</pattern>
<template><srai>YES</srai> <srai><star/></srai></template>
</category>

The last example is considered safe because the divide and conquer splits the input into two parts, one of length one, and another with one fewer word than the original input.

Some reductions not considered safe are:

<category>
<pattern>WANT TO *</pattern>
<template><srai>DO YOU WANT TO <star/></srai></template>
</category>

<category>
<pattern>I LOVE *</pattern>
<template><srai>I LIKE <star/></srai></template>
</category>

<category>
<pattern>CAN YOU SPEAK *</pattern>
<template><srai>WHAT LANGUAGES DO YOU SPEAK</srai></template>
</category>

Again, it is not necessarily true that these categories will lead to infinite recursion. But the safe set is guaranteed to terminate, so we are better off using them in our general purpose AIML reduction library.
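The word-counting rule above can be sketched in a few lines of Python. This is only an illustration, not the filter actually used to build the Safe AIML Reductions: the function name is_safe is made up here, and the sketch assumes a pattern made of plain words and wildcards, and a template whose <srai> tags contain nothing but words and <star/> tags. Anything else is conservatively treated as not safe.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def is_safe(category_xml):
    """Return True if every <srai> in the category is strictly shorter
    than any input its pattern can match (hypothetical helper)."""
    category = ET.fromstring(category_xml)
    pattern_words = category.findtext("pattern").split()
    wildcards = [w for w in pattern_words if w in ("*", "_")]
    fixed_in_pattern = len(pattern_words) - len(wildcards)

    srais = list(category.find("template").iter("srai"))
    if not srais:
        return False  # no <srai> at all, so not a reduction

    for srai in srais:
        fixed = len((srai.text or "").split())
        used = Counter()
        for child in srai:
            if child.tag != "star":
                return False  # unknown tag: output length cannot be bounded
            used[int(child.get("index", "1"))] += 1
            fixed += len((child.tail or "").split())
        # A wildcard copied twice could make the output longer than the input.
        if any(i > len(wildcards) or n > 1 for i, n in used.items()):
            return False
        # Each wildcard binds at least one word, so every wildcard that is
        # dropped shortens the input by at least one word.
        unused = len(wildcards) - len(used)
        if fixed >= fixed_in_pattern + unused:
            return False
    return True

SAFE = ("<category><pattern>YES *</pattern>"
        "<template><srai>YES</srai> <srai><star/></srai></template></category>")
UNSAFE = ("<category><pattern>WHO IS * WALLACE</pattern>"
          "<template><srai>WHO IS RICHARD WALLACE</srai></template></category>")

print(is_safe(SAFE))    # True
print(is_safe(UNSAFE))  # False
```

The check is deliberately strict. The WHO IS * WALLACE category is rejected because an input of four words is rewritten to another input of four words, which is exactly the case that loops once the base category is missing.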

English Contractions and Expansions

I have searched for a list like this online, but have never been able to find one. So I thought I better publish my own list. These are all the common contractions of the English language, along with their expanded forms. AIML chat bots expand all contractions using a list like this, as a pre-processing step. Perhaps they may be of use to someone else besides us botmasters.

("AREN'T" "ARE NOT")
("CAN'T" "CAN NOT")
("CANNOT" "CAN NOT")
("COULD'VE" "COULD HAVE")
("COULDN'T" "COULD NOT")
("DIDN'T" "DID NOT")
("DOESN'T" "DOES NOT")
("DON'T" "DO NOT")
("EVERYTHING'S" "EVERYTHING IS")
("HADN'T" "HAD NOT")
("HASN'T" "HAS NOT")
("HAVEN'T" "HAVE NOT")
("HE S" "HE IS")
("HE'D" ("HE HAD" "HE WOULD"))
("HE'LL" "HE WILL")
("HE'S" "HE IS")
("HOW'D" ("HOW HAD" "HOW WOULD"))
("HOW'S" "HOW IS")
("I'D" ("I HAD" "I WOULD"))
("I'LL" "I WILL")
("I'M" "I AM")
("I'VE" "I HAVE")
("ISN'T" "IS NOT")
("IT S" "IT IS")
("IT'D" ("IT HAD" "IT WOULD"))
("IT'LL" "IT WILL")
("IT'S" "IT IS")
("LET S" "LET US")
("LET'S" "LET US")
("MIGHT'VE" "MIGHT HAVE")
("SHE'LL" "SHE WILL")
("SHE'S" "SHE IS")
("SHOULD'VE" "SHOULD HAVE")
("SHOULDN'T" "SHOULD NOT")
("THAT S" "THAT IS")
("THAT'D" ("THAT HAD" "THAT WOULD"))
("THAT'LL" "THAT WILL")
("THAT'S" "THAT IS")
("THERE S" "THERE IS")
("THERE'LL" "THERE WILL")
("THERE'S" "THERE IS")
("THEY'D" ("THEY HAD" "THEY WOULD"))
("THEY'LL" "THEY WILL")
("THEY'RE" "THEY ARE")
("THEY'VE" "THEY HAVE")
("THIS'LL" "THIS WILL")
("WASN'T" "WAS NOT")
("WE'D" ("WE HAD" "WE WOULD"))
("WE'LL" "WE WILL")
("WE'RE" "WE ARE")
("WE'VE" "WE HAVE")
("WEREN'T" "WERE NOT")
("WHAT'D" ("WHAT HAD" "WHAT DID"))
("WHAT'LL" "WHAT WILL")
("WHAT'S" "WHAT IS")
("WHERE S" "WHERE IS")
("WHERE'S" "WHERE IS")
("WHO'S" "WHO IS")
("WHY'S" "WHY IS")
("WON'T" "WILL NOT")
("WOULD'VE" "WOULD HAVE")
("WOULDN'T" "WOULD NOT")
("YOU'D" ("YOU HAD" "YOU WOULD"))
("YOU'LL" "YOU WILL")
("YOU'RE" "YOU ARE")
("YOU'VE" "YOU HAVE")
("'TIS" "IT IS")
("'EM" "THEM")
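For readers who want to try the list outside an AIML interpreter, here is a rough Python sketch of the pre-processing step. The names CONTRACTIONS and expand_contractions are invented for this example, the table holds only a handful of the entries above, and the ambiguous entries (such as "I'D", which could be "I HAD" or "I WOULD") are left out because choosing between them needs context that this sketch does not have.

```python
import re

# A small, unambiguous subset of the list above (illustrative only).
CONTRACTIONS = {
    "AREN'T": "ARE NOT",
    "CAN'T": "CAN NOT",
    "DON'T": "DO NOT",
    "I'M": "I AM",
    "IT'S": "IT IS",
    "WON'T": "WILL NOT",
    "YOU'RE": "YOU ARE",
    "'TIS": "IT IS",
}

def expand_contractions(text):
    """Uppercase the input, drop punctuation other than apostrophes,
    and replace each known contraction with its expanded form."""
    tokens = re.findall(r"[A-Z0-9']+", text.upper())
    return " ".join(CONTRACTIONS.get(token, token) for token in tokens)

print(expand_contractions("I'm sure it's fine, don't worry."))
# I AM SURE IT IS FINE DO NOT WORRY
```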


Bot Training I: Linguistic Targets

The construction of a chat robot has a video game metaphor. The botmaster writes some AIML, then reads the conversation logs with the resulting bot. Spotting places in the dialog where the bot breaks down, the botmaster writes new AIML to improve the broken dialog, and then repeats this cycle again and again. When we find an opportunity to improve the bot's response in the conversation log, we call that a “target”. Thus authoring the chat bot becomes a process of identifying targets and “filling them up” with new AIML content. The more content, the more AIML categories created, the higher you “score” in this video game metaphor.

Pandorabots has taken the game metaphor a step further and automated the process of finding targets. How do we know when the bot gives an incorrect, vague, or imprecise response? In AIML, the answer is simple: whenever the input pattern contains a wildcard (the * or _ character). If the input pattern contains no wildcards, then the match was exact, and in most cases, the bot can formulate an exact reply. If there are wildcards, then the bot by definition only recognized part of what the client said.

(The above describes the situation only to a first order approximation. Strictly speaking, we should consider the input <pattern>, <that> and <topic>. Only if all three contain no wildcards is there truly an exact match. If for example the <pattern> is YES and the <that> pattern is the wildcard *, it is a potential target, because we can make a more exact response by taking into account a <that> value. But as happens many times with AIML, it is simpler to explain a principle of the language by thinking about the input <pattern> in isolation and ignoring the details of <that> and <topic> until later. The extension of the principle to <that> and <topic> is a matter of bookkeeping.)

Strictly speaking, a Target consists of two things: an input, and an AIML category that it matches. For example, the input HE IS STRONG together with the category

<category>
<pattern>HE IS *</pattern>
<template>I did not know he is.</template>
</category>

form a Target. Let's call the input the “Target input” and the category the “Target category”. The Target input “He is strong” together with the Target category above give the botmaster the opportunity to create a new, more specific category:

<category>
<pattern>HE IS STRONG</pattern>
<template>Does he work out?</template>
</category>

The Pandorabots Targeting algorithm scans the conversation logs, re-classifying the inputs into the AIML Graphmaster, and finds matches. When the matched pattern contains a wildcard, the algorithm saves the input and the matched category on a list of matches. As we might expect, there is a Zipf distribution over the Target categories. Usually there is one category, typically the ultimate default category with <pattern>*</pattern>, associated with more Target inputs than any other category. Then there is a second most activated category, and a third, and so on, down to a long tail of Target categories with only one Target input each.

Using the Targeting algorithm, the botmaster can have quite an enjoyable afternoon building his bot by “filling up linguistic targets” and accumulating category-count points. To use the Targeting algorithm, the botmaster first selects the conversation logs for analysis. Then by choosing the “Find Targets” option, the botmaster generates a list of Target categories, each with a link to the associated Target inputs. The program provides a direct link from the Target inputs to the Pandorabots Training section, so that the botmaster can efficiently move from Targets to writing new categories.
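The bookkeeping behind a list of Targets can be sketched in Python. This is not the Pandorabots implementation; find_targets is a made-up name, and the sketch assumes that some matcher has already paired each logged input with the <pattern> it matched (it ignores <that> and <topic>, as in the first order approximation above).

```python
from collections import defaultdict

def find_targets(matches):
    """Group logged inputs by the wildcard pattern they matched, most
    frequently matched pattern first (hypothetical helper).

    matches is an iterable of (input_text, matched_pattern) pairs.
    """
    targets = defaultdict(list)
    for text, pattern in matches:
        # Only matches whose pattern contains a wildcard are Targets.
        if any(word in ("*", "_") for word in pattern.split()):
            targets[pattern].append(text)
    return sorted(targets.items(), key=lambda item: len(item[1]), reverse=True)

log = [
    ("HE IS STRONG", "HE IS *"),
    ("HELLO", "HELLO"),               # exact match, not a Target
    ("PURPLE MONKEY", "*"),
    ("HE IS LATE", "HE IS *"),
    ("WHERE WERE WE", "*"),
    ("SO IT GOES", "*"),
]

for pattern, inputs in find_targets(log):
    print(pattern, len(inputs), inputs)
```

With a real conversation log the first entry is usually the ultimate default category, followed by the long tail described above.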

Friday, March 27, 2009

<srai>: The basics: Table of Common <srai> forms

In writing AIML response templates, certain forms of <srai> responses occur over and over again. They are common enough to be worth identifying and naming. The following table summarizes the four basic types of common <srai> templates: the Synonym, Simple Wildcard, Multiple Wildcard and Divide and Conquer forms.


Synonym form:

<srai>PHRASE</srai>

Simple Wildcard forms:

<srai><star/></srai>

<srai>PHRASE <star/></srai>

<srai><star index="2"/></srai>

Multiple Wildcard forms:

<srai><star/> <star index="2"/></srai>

<srai><star/> PHRASE <star index="2"/></srai>

Divide and Conquer forms:

<srai><star/></srai>. <srai><star index="2"/></srai>

<srai>PHRASE1 <star/></srai>. <srai>PHRASE2 <star index="2"/></srai>

Thursday, March 26, 2009

<srai>: The basics II: Simple wildcard reductions

AIML uses a wildcard * (the star character) to stand for one or more words. An AIML pattern such as <pattern>I AM *</pattern>, taken by itself, matches a wide range of inputs such as "I am tired", "I am reading a book", "I am waiting for a reply" and so on. If the input is "I am tired", the wildcard is said to be bound to the word "tired" (1 word). If the input is "I am reading a book", the wildcard is bound to "reading a book" (3 words), and similarly, if the input is "I am waiting for a reply", the wildcard is bound to "waiting for a reply" (4 words).
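To make the idea of a binding concrete, here is a small Python sketch that mimics it with a regular expression. It is not how an AIML interpreter matches patterns (real interpreters walk the Graphmaster and apply their own normalization and priority rules); bind_wildcards is a hypothetical name and the normalization is a simplifying assumption.

```python
import re

def bind_wildcards(pattern, text):
    """Return the list of wildcard bindings if text matches pattern,
    or None if it does not match (hypothetical helper)."""
    # Each wildcard stands for one or more whole words.
    parts = [r"(\S+(?: \S+)*)" if word in ("*", "_") else re.escape(word)
             for word in pattern.split()]
    # Simplified normalization: uppercase and strip punctuation.
    words = " ".join(re.sub(r"[^A-Za-z0-9 ]", " ", text).upper().split())
    match = re.fullmatch(" ".join(parts), words)
    return list(match.groups()) if match else None

print(bind_wildcards("I AM *", "I am tired"))                # ['TIRED']
print(bind_wildcards("I AM *", "I am waiting for a reply"))  # ['WAITING FOR A REPLY']
print(bind_wildcards("I AM *", "I am"))                      # None
```

The last call returns None because the wildcard must bind at least one word.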

Inside the template, AIML uses the <star/> tag to access wildcard bindings.

Trivially, the category

<category>
<pattern>*</pattern>
<template><star/></template>
</category>

will just echo the client's input:

Human: Hello!
Robot: Hello!
Human: Who are you?
Robot: Who are you?

The simplest form of AIML reduction, using <srai> together with <star/>, involves reducing or simplifying the input by one or a few words:

<category>
<pattern>I AM ESPECIALLY *</pattern>
<template><srai>I AM <star/></srai></template>
</category>

If someone says, "I am especially tired", or "I am especially interested in this book", it is really no different logically from saying "I am tired" or "I am interested in this book". A philosopher might say, the word "especially" plays no logical role in the sentence. More practically, the bot may have a reply for "I am tired" and "I am interested in something", so reducing the input by removing the word "especially" will link these inputs to appropriate responses.

The categories:

<category>
<pattern>I AM TIRED</pattern>
<template>Maybe you should take a nap?</template>
</category>

<category>
<pattern>I AM ESPECIALLY *</pattern>
<template><srai>I AM <star/></srai></template>
</category>

produce the dialog:

Human: I am especially tired.
Robot: Maybe you should take a nap?

As a bonus, these types of reduction categories can reduce a sentence with "I am" followed by any number of occurrences of "especially":

Human: I am especially especially especially tired.
Robot: Maybe you should take a nap?
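The recursion behind this dialog can be imitated in a few lines of Python. This toy is not an AIML interpreter: REPLIES, REDUCTIONS and respond are names invented for the sketch, and it only handles reductions of the form "PREFIX *" rewritten to "SHORTER PREFIX <star/>".

```python
import re

# Atomic categories: exact input -> reply.
REPLIES = {"I AM TIRED": "Maybe you should take a nap?"}

# Reduction categories of the form "PREFIX *" -> <srai>REPLACEMENT <star/></srai>.
REDUCTIONS = {"I AM ESPECIALLY": "I AM"}

def normalize(text):
    """Uppercase and strip punctuation (a simplifying assumption)."""
    return " ".join(re.sub(r"[^A-Za-z0-9 ]", " ", text).upper().split())

def respond(text):
    words = normalize(text)
    if words in REPLIES:
        return REPLIES[words]
    for prefix, replacement in REDUCTIONS.items():
        if words.startswith(prefix + " "):
            rest = words[len(prefix) + 1:]
            # Plays the role of <srai>: match again on the shorter input.
            return respond(replacement + " " + rest)
    return words  # ultimate default: echo the input

print(respond("I am especially tired."))
print(respond("I am especially especially especially tired."))
```

Each recursive call removes one word, so the chain of calls has to stop after at most as many steps as there are words in the input.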

A slightly related, also very common form of reduction, eliminates sequences of words (clauses) that can be eliminated from the input without changing its meaning significantly. Such clauses are decoration, added by the human personality, perhaps as social conventions, but again, the philosopher might say they have no logical purpose. "I will state that I warned you about his condition" is really the same as "I warned you about his condition", at least as far as the robot is concerned. The robot may already have a response to "I warned you about something", so by reducing the input, the bot stands a better chance of making an intelligent sounding reply.

In these cases we use a <pattern> with a wildcard, but end up throwing away all the words in the pattern except for the wildcard:

<category>
<pattern>I WILL STATE THAT *</pattern>
<template><srai><star/></srai></template>
</category>

Another example is a category for inputs that start with "At any rate..."

<category>
<pattern>AT ANY RATE *</pattern>
<template><srai><star/></srai></template>
</category>

A last example handles sentences that begin "I assure you that..."

<category>
<pattern>I ASSURE YOU THAT *</pattern>
<template><srai><star/></srai></template>
</category>

<srai>: The basics I: Linking Synonyms

The simplest form of category in AIML is one with an atomic pattern, namely, a pattern with no wild cards.

For example

<category>
<pattern>NAME</pattern>
<template>My name is ALICE.</template>
</category>

produces the dialog:

Human: Name?
Robot: My name is ALICE.

But there are many different ways of asking for a name. "What is your name?", "Who are you?", "Tell me your name", "Do you have a name?", "Your name is?" are all synonyms for "Name?".

In AIML we link all of these synonyms together using the <srai> tag.

The category:

<category>
<pattern>WHAT IS YOUR NAME</pattern>
<template><srai>NAME</srai></template>
</category>

used in conjunction with the category above, produces the dialog:

Human: What is your name?
Robot: My name is ALICE.

Similarly, we can add categories such as:

<category>
<pattern>DO YOU HAVE A NAME</pattern>
<template><srai>NAME</srai></template>
</category>

and

<category>
<pattern>YOUR NAME IS</pattern>
<template><srai>NAME</srai></template>
</category>

and so on for all the variations of "Name?"

Creating a new chat bot involves writing an enormous amount of original AIML content. Because of its minimalism, AIML lets the botmaster approach content creation one small step at a time. Typically when creating a bot, we identify in the chat logs a new variation of an input we have already seen before. For example, we might see "I want to know your name", and realize there is already a response for this. The convenience of <srai> is that we don't have to go searching through files and files of code to find the response. We know the response is already in there somewhere, so we can just link to it with <srai>:

<category>
<pattern>I WANT TO KNOW YOUR NAME</pattern>
<template><srai>WHAT IS YOUR NAME</srai></template>
</category>

Notice that we linked to WHAT IS YOUR NAME rather than directly to NAME. This doesn't matter, because there is already another category linking WHAT IS YOUR NAME to NAME.

As a general rule of thumb, a good design principle of AIML is to follow the rule that each <srai> should "shorten" the input. In this case that means that reducing WHAT IS YOUR NAME (4 words) to NAME (1 word) is ok, and reducing I WANT TO KNOW YOUR NAME (6 words) to WHAT IS YOUR NAME (4 words) is ok, but we should avoid categories like:

<category>
<pattern>TELL ME YOUR NAME</pattern>
<template><srai>I WANT TO KNOW YOUR NAME</srai></template>
</category>

By avoiding such designs, the AIML program is guaranteed to terminate, and we will never see the "Too much recursion in AIML" error. The reason will be explained in more detail later. For now, keep in mind the design rule that in AIML, it is better to store actual responses along with the simplest and shortest possible patterns. That is why putting the response "My name is ALICE" inside the category with pattern NAME, is better than putting it with WHAT IS YOUR NAME.
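One way to keep an eye on this rule is to check the synonym links mechanically. The Python sketch below is only an illustration: SYNONYMS, lengthening_links and resolve are invented names, and the table holds just the atomic synonym categories from this post, each written as pattern -> <srai> target.

```python
# Synonym categories from this post, written as pattern -> <srai> target.
SYNONYMS = {
    "WHAT IS YOUR NAME": "NAME",
    "DO YOU HAVE A NAME": "NAME",
    "YOUR NAME IS": "NAME",
    "I WANT TO KNOW YOUR NAME": "WHAT IS YOUR NAME",
}

def lengthening_links(links):
    """Return the links that break the rule, i.e. whose target is not
    strictly shorter (in words) than the pattern."""
    return [(pattern, target) for pattern, target in links.items()
            if len(target.split()) >= len(pattern.split())]

def resolve(phrase, links):
    """Follow synonym links until no more apply and return every hop.
    Only guaranteed to stop if lengthening_links(links) is empty."""
    hops = [phrase]
    while phrase in links:
        phrase = links[phrase]
        hops.append(phrase)
    return hops

print(lengthening_links(SYNONYMS))
print(resolve("I WANT TO KNOW YOUR NAME", SYNONYMS))

# Adding the category we were told to avoid makes the check complain.
risky = dict(SYNONYMS, **{"TELL ME YOUR NAME": "I WANT TO KNOW YOUR NAME"})
print(lengthening_links(risky))
```

When every link shortens the input, the word count drops with each hop, so resolve can take at most as many hops as there are words in the starting phrase.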

Tuesday, March 10, 2009

New AIML Learning File

AIML Botmaster Steve Worswick (Square Bear) has written a file that will allow users to teach your bot new facts rather than just correcting its replies using badanswer.aiml. The user should say LEARN then the fact. For example:

Human: Learn the sun is hot
Bot: Ok I will learn that the sun is hot.

Human: Learn London is in England
Bot: Ok I will learn that London is in England.

Human: Learn a cat has four legs.
Bot: Ok I will learn that a cat has 4 legs.

Human: What has 4 legs?
Bot: a cat.

Human: What is the sun
Bot: hot.

Human: Learn the sun is hot and yellow
Bot: Ok I will learn that the sun is hot and yellow.

Human: What is hot and yellow
Bot: the sun.

Human: What is the sun
Bot: hot and yellow.

Human: Where is London
Bot: in England.

Human: What is in England
Bot: London.

Human: Learn Dr Wallace is an AI expert
Bot: Ok I will learn that Dr Wallace is an AI expert.

Human: What is an AI expert
Bot: Dr Wallace.

Human: Who is an AI expert
Bot: Dr Wallace.

Human: Who is Dr Wallace
Bot: an AI expert.

Human: What is hot
Bot: the sun.

Human: What has a cat?
Bot: 4 legs.

Square-Bear will update this file to cover more complicated knowledge such as "Learn the 2012 Olympic Games will be held in London", "Where will the 2012 Olympic Games be?". The current version of the file is available, along with all my other public AIML files at: http://square-bear.co.uk/aiml Square-Bear says, "Feel free to use it in your bots!"
 
