
Distribution inside a category

toncho11 edited this page Jul 9, 2019 · 8 revisions

Once a category has been selected from the main distribution, we need to select an item from it. Some items are tagged: for example, some jokes are labeled "Delayed" or "Mildly offensive". When we start a conversation with a new person, it usually takes some time before we get close, so such jokes should only be used later in the conversation.

Here are some possibilities to select an item:

Uniform

We can simply use the Random class in C#. Its Next method returns uniformly distributed values, so every item in the category is equally likely to be selected.
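As a minimal sketch of uniform selection, the helper below picks one element of a list with equal probability. The class name UniformPicker and its Pick method are hypothetical names used for illustration only; they are not part of KorraAI.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of KorraAI.
public static class UniformPicker
{
    // A single shared generator, so repeated calls continue one random sequence.
    private static readonly Random Rng = new Random();

    // Returns one element; every index in [0, items.Count) is equally likely.
    public static T Pick<T>(IList<T> items)
    {
        if (items == null || items.Count == 0)
            throw new ArgumentException("items must not be empty", nameof(items));

        // Random.Next(maxValue) returns a value in [0, maxValue), upper bound exclusive.
        return items[Rng.Next(items.Count)];
    }
}
```

A call such as UniformPicker.Pick(jokes), where jokes is any non-empty list, returns one of its elements.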

Combination with or without repetition

A problem arises when the list of items is small and only a few samples are taken. Selecting with the uniform distribution is selection with repetition: the same item can be selected more than once, which can leave another item unused (or not selected for a long time). We would also like to show every item at least once, so that the user is aware that it exists. One strategy is to first present the items in a random order without repetition and then, once the user has seen them all, switch to random selection with repetition. KorraAI provides helper methods for this strategy.
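The strategy above can be sketched as follows. This is only an illustration under stated assumptions, not KorraAI's actual helper methods: the class ShowAllFirstPicker and its members are hypothetical. It shuffles a copy of the list (Fisher-Yates), hands out the shuffled items one by one, and falls back to uniform selection with repetition once every item has been shown.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not KorraAI's API.
public class ShowAllFirstPicker<T>
{
    private readonly IList<T> items;   // full list, used once everything was shown
    private readonly Queue<T> unseen;  // shuffled items that were not shown yet
    private readonly Random rng;

    public ShowAllFirstPicker(IList<T> items, Random rng)
    {
        if (items == null || items.Count == 0)
            throw new ArgumentException("items must not be empty", nameof(items));
        if (rng == null)
            throw new ArgumentNullException(nameof(rng));

        this.items = items;
        this.rng = rng;

        // Fisher-Yates shuffle of a copy: a random order without repetition.
        var shuffled = new List<T>(items);
        for (int i = shuffled.Count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1); // j is in [0, i]
            T tmp = shuffled[i];
            shuffled[i] = shuffled[j];
            shuffled[j] = tmp;
        }
        unseen = new Queue<T>(shuffled);
    }

    public T Next()
    {
        // Phase 1: every item is shown exactly once.
        if (unseen.Count > 0)
            return unseen.Dequeue();

        // Phase 2: uniform selection with repetition.
        return items[rng.Next(items.Count)];
    }
}
```

With a list of four items, the first four calls to Next() return each item exactly once in random order; later calls may repeat items.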

Subgroups with different probabilities

A group of items can be divided into subgroups, for example "Delayed" vs. "Non-delayed" items. Questions such as "What is your name?" or "Where do you come from?" are usually asked at the beginning, when people meet for the first time, before questions such as "Did you watch a movie yesterday?"

    // Requires: using System.Linq; (for Contains on arrays) and using System.Collections.Generic;
    string[] group1 = { "UserName" }; // the conversation should start with these
    string[] group2 = { "UserAge", "UserSex" }; // the conversation should continue with these
    string[] group3 = { "UserLocation", "UserNationality", "UserIsMarried", "UserHasKids", "UserHasJob" };

    List<ItemProb<string>> itemProbs = new List<ItemProb<string>>();

    // Assign a probability to each fact according to its group
    foreach (PureFact fact in q)
    {
        if (group1.Contains(fact.Name)) // GROUP 1
        {
            itemProbs.Add(new ItemProb<string>(fact.Name, Prob(0.99)));
            break; // add only 1 item; this also ends the loop, so later facts in q are skipped
        }
        else if (group2.Contains(fact.Name)) // GROUP 2
            itemProbs.Add(new ItemProb<string>(fact.Name, Prob(0.8)));
        else if (group3.Contains(fact.Name)) // GROUP 3
            itemProbs.Add(new ItemProb<string>(fact.Name, Prob(0.20)));
        else // all the other pure facts, not in group 1, 2 or 3
            itemProbs.Add(new ItemProb<string>(fact.Name, Prob(0.06)));
    }

As you can see, the probabilities 0.99, 0.8 and 0.20 are assigned to groups 1, 2 and 3, and 0.06 to all remaining facts. The advantage is that a question from group 3 can still be asked early; it is just less probable. This makes the bot less predictable and more human-like.
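How KorraAI turns these values into an actual choice is not shown here. One common approach, sketched below as an assumption, is to treat the values as relative weights (they do not need to sum to 1) and pick an item with probability proportional to its weight. The class WeightedPicker is hypothetical and uses plain key/value pairs instead of KorraAI's ItemProb type.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; KorraAI's own sampling may work differently.
public static class WeightedPicker
{
    // Picks one key with probability proportional to its (non-negative) weight.
    public static T Pick<T>(IList<KeyValuePair<T, double>> weighted, Random rng)
    {
        if (weighted == null || weighted.Count == 0)
            throw new ArgumentException("weighted must not be empty", nameof(weighted));
        if (rng == null)
            throw new ArgumentNullException(nameof(rng));

        double total = 0;
        foreach (var pair in weighted)
        {
            if (pair.Value < 0)
                throw new ArgumentException("weights must be non-negative", nameof(weighted));
            total += pair.Value;
        }
        if (total <= 0)
            throw new ArgumentException("at least one weight must be positive", nameof(weighted));

        // r is uniform in [0, total); walk the list until the running weight exceeds it.
        double r = rng.NextDouble() * total;
        T lastPositive = default(T);
        foreach (var pair in weighted)
        {
            if (pair.Value <= 0) continue; // zero-weight items can never be picked
            lastPositive = pair.Key;
            r -= pair.Value;
            if (r < 0) return pair.Key;
        }
        return lastPositive; // only reached through floating-point rounding
    }
}
```

Under this assumption, an item with weight 0.8 is picked four times as often as an item with weight 0.20, yet the lower-weighted item still has a chance on every draw.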