CodeBlock - Building a Personalized Chatbot using Python and NLTK

Building a Personalized Chatbot using Python and NLTK

Author: Daniel Marsh | Published: June 25, 2023

Introduction:

In today's digital age, chatbots have become increasingly popular for their ability to simulate human-like conversations and provide automated assistance. In this tutorial, we will walk you through the process of building a personalized chatbot using Python and the Natural Language Toolkit (NLTK). This chatbot will be capable of understanding and responding to user input, as well as storing and utilizing personalized information provided by the user. 

Overview:

Our chatbot, named ARCHIE, combines NLTK's text preprocessing with regular-expression pattern matching to understand user input and reply with meaningful responses. ARCHIE is designed to interact with users in a personalized manner by recognizing and remembering key information such as the user's name, age, and interests. By integrating these personal details into the conversation, ARCHIE aims to create a more engaging and tailored user experience.

How it Works: 

  1. Preprocessing User Input: The chatbot preprocesses the user's input by removing punctuation, tokenizing the input into individual words, removing common English stopwords, and lemmatizing each word to its base form. This yields a normalized list of the input's content words.

  2. Matching Patterns: ARCHIE uses regular-expression pattern matching to identify specific user intents or queries. It compares the user's original input (not the preprocessed tokens) against a list of predefined patterns and uses the first one that matches. When a match is found, ARCHIE randomly selects a response from the corresponding list of possible responses, which lets the chatbot handle a variety of inputs with relevant replies.

  3. Handling Custom Data Patterns: ARCHIE goes beyond generic patterns and supports custom data patterns for capturing the user's name, age, and interests. These patterns are defined in a separate file and allow ARCHIE to store and recall the user's personal information. When a custom data pattern matches the user's input, ARCHIE extracts the relevant data and incorporates it into the response, creating a more personalized interaction.

  4. Generating Responses: ARCHIE checks the custom data patterns first and the general patterns second. If a custom data pattern matches, ARCHIE either saves the new information, answers using the stored data, or asks the user to provide the missing information. If no pattern matches, ARCHIE falls back to a default response that acknowledges it didn't understand and asks the user to rephrase.

Now, you're ready to dive into the step-by-step process of creating this chatbot! 

  

Step 1: Setting Up the Environment 

To get started, make sure Python 3 is installed on your system, then install NLTK (Natural Language Toolkit) by running the following command in your terminal:

pip install nltk 

 

Step 2: Importing the Required Libraries and Resources 

In your main Python file (we'll call it `main.py`), start by importing the necessary libraries and resources. Add the following code at the top of `main.py`:

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
import re
import random

# Import custom modules
from response import pairs, user_data_patterns 

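# Download the NLTK resources used below (nltk.download skips packages that are already up to date)
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('punkt')
nltk.download('punkt_tab')  # word_tokenize needs this in newer NLTK releases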

# Custom user input data
user_info = { 
    "name": "",
    "age": "", 
    "interests": "" 
} 

Here, we import NLTK together with its `stopwords` corpus and `WordNetLemmatizer`, plus Python's built-in `re` module for regular expressions and `random` module for picking random responses. We also import `pairs` and `user_data_patterns` from a separate file called `response.py`, which we will create in Step 6. The `nltk.download()` calls fetch the data these NLTK tools rely on, and the `user_info` dictionary stores the user's name, age, and interests.

 

Step 3: Preprocessing User Input 

To normalize user input, we preprocess it: we remove punctuation, tokenize the input, remove stopwords, and lemmatize the words. Add the following function to your `main.py` file:

def preprocess_input(user_input):
    """
    Preprocesses user input by removing punctuation, tokenizing, removing stopwords, and lemmatizing.
    """ 

    # Remove punctuation
    user_input = re.sub(r'[^\w\s]', '', user_input) 

    # Tokenize the input
    tokens = nltk.word_tokenize(user_input.lower()) 

    # Remove stopwords
    stop_words = set(stopwords.words('english'))
    tokens = [token for token in tokens if token not in stop_words] 

    # Lemmatization
    lemmatizer = WordNetLemmatizer()
    tokens = [lemmatizer.lemmatize(token) for token in tokens] 

    return tokens 

In this function, we use `re.sub()` with the pattern `[^\w\s]` to strip every character that is neither a word character nor whitespace. Next, we lowercase the text, tokenize it with `nltk.word_tokenize()`, and drop stopwords using NLTK's English `stopwords` list. Finally, we lemmatize the remaining tokens with `WordNetLemmatizer` (by default, `lemmatize()` treats each word as a noun).
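To get a feel for what this preprocessing produces without downloading NLTK data, the stdlib-only approximation below may help. It is a simplified stand-in, not the NLTK pipeline: `str.split()` replaces `nltk.word_tokenize()`, a small hand-picked stopword set (an assumption; NLTK's English list is much longer) replaces the `stopwords` corpus, and lemmatization is skipped. The names `approximate_preprocess` and `SAMPLE_STOPWORDS` are hypothetical.

```python
import re

# Hypothetical, hand-picked subset of English stopwords (NLTK's list is much longer)
SAMPLE_STOPWORDS = {"my", "is", "i", "what", "the", "a", "am", "do", "you"}


def approximate_preprocess(user_input: str) -> list[str]:
    """Hypothetical stdlib-only stand-in for preprocess_input (no lemmatization)."""
    # Same punctuation-stripping regex as preprocess_input
    cleaned = re.sub(r"[^\w\s]", "", user_input)
    # Lowercase and split on whitespace instead of calling nltk.word_tokenize
    tokens = cleaned.lower().split()
    # Drop the sample stopwords
    return [token for token in tokens if token not in SAMPLE_STOPWORDS]


if __name__ == "__main__":
    tokens = approximate_preprocess("My name is Dan!")
    print(tokens)  # ['name', 'dan']
    # Once "my" and "is" are gone, the text no longer fits a pattern like "my name is (.*)"
    print(re.match(r"my name is (.*)", " ".join(tokens)))  # None
```

The second print shows that the token list has lost the words the chatbot's regular expressions look for, which is why the patterns in the following steps are matched against the original input.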

 

Step 4: Matching Patterns and Generating Responses 

Now, let's define functions that will match user input with predefined patterns and generate appropriate responses. Add the following code to your `main.py` file:

def match_pattern(user_input, patterns):
    """
    Matches user input against (pattern, responses) pairs and returns a random
    response from the first matching pair, or None if nothing matches.
    """

    for pattern, responses in patterns:
        # re.match only matches at the start of user_input
        match = re.match(pattern, user_input, re.IGNORECASE)
        if match:
            response = random.choice(responses)

            # Replace the %1 placeholder with the text captured by the first group
            if "%1" in response:
                return response.replace("%1", match.group(1))
            return response

    # No pattern matched
    return None

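def generate_response(user_input):
    """
    Generates a response based on user input.
    """

    # Trim surrounding whitespace and trailing "." or "!" so that captured
    # values such as names don't end with punctuation
    user_input = user_input.strip().rstrip(".!")

    # Preprocess user input. The tokens are not used for the pattern matching below,
    # because stopword removal would delete words like "my" and "is" that the patterns need.
    tokens = preprocess_input(user_input)

    # Check if the user input requires custom data patterns
    response = custom_data_pattern(user_input)
    if response:
        return response

    # Check for other specific patterns
    response = match_pattern(user_input, pairs)
    if response:
        return response

    # Fallback response if no specific pattern matches
    return "I'm sorry, I don't understand. Can you please rephrase?"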

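In the `match_pattern` function, we iterate through the predefined patterns and use `re.match()` to check whether the user input matches each one. `re.match()` only succeeds if the pattern fits at the start of the input, and the `re.IGNORECASE` flag makes the comparison case-insensitive. For the first pattern that matches, we randomly select a response from its list of possible responses. If that response contains the placeholder `%1`, we replace it with the text captured by the pattern's first group. If no pattern matches, the function returns `None`.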
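Because `re.match()` is anchored at the beginning of the input, a short greeting in front of the key phrase is enough to make a pattern miss. The standard-library example below illustrates the difference. Switching to `re.search()`, which scans the whole string, is an optional change you could consider; it is not what the tutorial's code does.

```python
import re

pattern = r"my name is (.*)"

# re.match anchors at the start of the string
anchored = re.match(pattern, "My name is Dan", re.IGNORECASE)
print(anchored.group(1))  # Dan

# A leading greeting makes re.match fail...
print(re.match(pattern, "Hi, my name is Dan", re.IGNORECASE))  # None

# ...while re.search looks for the first match anywhere in the string
found = re.search(pattern, "Hi, my name is Dan", re.IGNORECASE)
print(found.group(1))  # Dan

# The same %1 substitution that match_pattern performs
print("Nice to meet you, %1!".replace("%1", found.group(1)))  # Nice to meet you, Dan!
```

The trade-off is precision: `re.search()` will also fire on phrases embedded in unrelated sentences, so anchored matching is the more conservative default.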

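The `generate_response` function first preprocesses the user input with `preprocess_input`. Note that the regular expressions are matched against the raw input rather than the resulting tokens, since stopword removal would strip words such as "my" and "is" that the patterns depend on. The function then calls `custom_data_pattern`, which handles the name, age, and interest patterns. This check has to come first: phrases like "my name is ..." also appear in `pairs`, so checking `pairs` first would produce a reply without ever saving the name. If no custom pattern applies, the function tries the general patterns in `pairs` via `match_pattern`, and if nothing matches, it returns a fallback response asking the user to rephrase.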
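Since `generate_response` tries its checks in a fixed order and returns the first non-empty answer, one option if you plan to add more kinds of handlers is to keep them in a list and try each in turn. The sketch below assumes that design; `name_handler`, `greeting_handler`, and `respond` are hypothetical stand-ins so that the snippet runs on its own. In the tutorial's code, the list would hold `custom_data_pattern` and a small wrapper around `match_pattern(user_input, pairs)`.

```python
import re
from typing import Callable, Optional

# A handler takes the user's input and returns a reply, or None if it doesn't apply
Handler = Callable[[str], Optional[str]]

FALLBACK = "I'm sorry, I don't understand. Can you please rephrase?"


def name_handler(user_input: str) -> Optional[str]:
    """Hypothetical stand-in for custom_data_pattern."""
    match = re.match(r"my name is (.*)", user_input, re.IGNORECASE)
    return f"Nice to meet you, {match.group(1)}!" if match else None


def greeting_handler(user_input: str) -> Optional[str]:
    """Hypothetical stand-in for match_pattern(user_input, pairs)."""
    if re.match(r"(hi|hello)\b", user_input, re.IGNORECASE):
        return "Hello there!"
    return None


def respond(user_input: str, handlers: list[Handler]) -> str:
    """Hypothetical dispatcher: return the first non-empty reply, else the fallback."""
    for handler in handlers:
        response = handler(user_input)
        if response:
            return response
    return FALLBACK


if __name__ == "__main__":
    handlers = [name_handler, greeting_handler]  # list order sets precedence
    print(respond("My name is Dan", handlers))  # Nice to meet you, Dan!
    print(respond("hello", handlers))  # Hello there!
    print(respond("tell me a joke", handlers))  # prints the fallback message
```

With this layout, the precedence rule lives in one place (the order of the list) instead of being spread across consecutive `if` blocks.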

 

Step 5: Handling Custom Data Patterns 

We will now implement functions to handle custom user data patterns. Add the following code to your `main.py` file:

def custom_data_pattern(user_input): 
    """ 
    Checks if the user input matches the custom user data patterns and provides a response accordingly. 
    """ 

    for pattern, data in user_data_patterns.items(): 
        match = re.match(pattern, user_input, re.IGNORECASE) 
        if match: 
            return custom_data_pattern_response(data["key"], pairs[data["index"]][1], match, data["save"], data["default_response"]) 

    # No match found 
    return None 

  
def custom_data_pattern_response(key, responses, match, save, default_response):
    """
    Generates a response based on the provided parameters and user input match.
    `responses` is the list of response templates from the matching entry in `pairs`.
    """

    if save:
        # Save the captured value (e.g. the user's name) in the user_info dictionary
        user_info[key] = match.group(1)
        response = random.choice(responses).replace("%1", user_info[key])

    elif key != "all" and user_info[key]:
        # Recall a single stored value (name, age, or interests)
        response = random.choice(responses).replace("%1", user_info[key])

    elif user_info.get("name") and user_info.get("age") and user_info.get("interests"):
        # Only reachable for the "all" key: fill %1, %2, %3 with name, age, and interests
        response = (random.choice(responses)
                    .replace("%1", user_info["name"])
                    .replace("%2", user_info["age"])
                    .replace("%3", user_info["interests"]))

    else:
        # Use the default response if the user data is not available
        response = default_response

    return response

The `custom_data_pattern` function iterates through the custom user data patterns defined in the `user_data_patterns` dictionary. It uses regular expressions to check if the user input matches any pattern. If a match is found, the function calls the `custom_data_pattern_response` function to generate an appropriate response. 

The `custom_data_pattern_response` function builds the reply for the matched entry. If the `save` flag is `True`, it stores the captured value in the `user_info` dictionary and echoes it back. If the `key` is not `"all"` and a value for that key is already stored, it inserts the stored value into the response. If the key is `"all"` and all three values (`name`, `age`, and `interests`) are known, it fills the `%1`, `%2`, and `%3` placeholders with them. Otherwise, it returns the `default_response`.
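To watch these three branches play out over several turns, here is a condensed, standard-library-only re-creation. It simplifies the tutorial's code under a few assumptions: each pattern maps directly to a single response template (instead of an index into `pairs`), so the output is deterministic, and the names `PATTERNS` and `reply` are hypothetical.

```python
import re

user_info = {"name": "", "age": "", "interests": ""}

# Hypothetical table: pattern -> (key, save flag, response template, default response)
PATTERNS = {
    r"my name is (.*)": ("name", True, "Nice to meet you, %1!", ""),
    r"what is my name\??": ("name", False, "Your name is %1.",
                            "I don't know your name yet. Can you please tell me?"),
    r"my age is (\d+)": ("age", True, "Cool, you're %1!", ""),
    r"i like (.*)": ("interests", True, "Wow, %1 is interesting!", ""),
    r"do you know me\??": ("all", False,
                           "Your name is %1, you're %2 years old, and you like %3.",
                           "I don't really know you yet."),
}


def reply(user_input):
    """Hypothetical condensed version of custom_data_pattern + custom_data_pattern_response."""
    for pattern, (key, save, template, default) in PATTERNS.items():
        match = re.match(pattern, user_input, re.IGNORECASE)
        if not match:
            continue
        if save:
            # Branch 1: store the captured value, then echo it back
            user_info[key] = match.group(1)
            return template.replace("%1", user_info[key])
        if key != "all" and user_info[key]:
            # Branch 2: recall a single stored value
            return template.replace("%1", user_info[key])
        if key == "all" and all(user_info.values()):
            # Branch 3: everything is known, so fill all three placeholders
            return (template.replace("%1", user_info["name"])
                    .replace("%2", user_info["age"])
                    .replace("%3", user_info["interests"]))
        # Branch 4: the requested data hasn't been provided yet
        return default
    return None  # no custom pattern matched


if __name__ == "__main__":
    for line in ["What is my name?", "My name is Dan", "What is my name?",
                 "My age is 30", "I like chess", "Do you know me?"]:
        print(f"You: {line}\nBot: {reply(line)}")
```

The first question gets the default reply, the same question after "My name is Dan" gets the stored name, and "Do you know me?" only produces the full summary once name, age, and interests have all been provided.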

 

Step 6: Defining Patterns and Responses 

Finally, create a separate file called `response.py` to hold the patterns and responses, and add the following code:

# Dictionary mapping custom user data patterns 
user_data_patterns = { 
    r"my name is (.*)": {
        "key": "name", 
        "index": 0, 
        "save": True, 
        "default_response": "" 
    }, 
    r"(What is my name|whats my name)\??": { 
        "key": "name", 
        "index": 1, 
        "save": False,
        "default_response": "I don't know your name yet. Can you please tell me?" 
    }, 
    r"my age is (\d+)": { 
        "key": "age", 
        "index": 2,
        "save": True,
        "default_response": "" 
    }, 
    r"(What is my age|whats my age|how old am i)\??": { 
        "key": "age", 
        "index": 3, 
        "save": False, 
        "default_response": "I don't know your age yet. Can you please tell me?" 
    }, 
    r"i like (.*)": { 
        "key": "interests", 
        "index": 4, 
        "save": True, 
        "default_response": "I don't know your interests yet. Can you please tell me?" 
    }, 
    r"(What do i like|whats my interests)\??": { 
        "key": "interests", 
        "index": 5, 
        "save": False, 
        "default_response": "I don't know your interests yet. Can you please tell me?" 
    }, 
    r"(do you know me|what do you know about me)\??": { 
        "key": "all", 
        "index": 6, 
        "save": False, 
        "default_response": "I don't really know you, but we can start with names. Hi, I'm ARCHIE!" 
    }, 
}

# Define pairs of patterns and responses 
pairs = [ 

    # Getting user data interests, age, and name 

    [ 
        r"my name is (.*)", 
        ["Hello %1, How are you today?", "Nice to meet you, %1! How can I assist you?"] 
    ], 
    [ 
        r"(What is my name|whats my name)\??", 
        ["Your name is %1.", "You told me your name is %1."] 
    ], 
    [ 
        r"my age is (\d+)",
        ["Cool, you're %1!", "Wow, %1 is old!", "Wow, %1 is young!"] 
    ], 
    [ 
        r"(What is my age|whats my age|how old am i)\??", 
        ["Your age is %1.", "You told me you are %1 years old.", "You are %1 years old."] 
    ], 
    [ 
        r"i like (.*)",
        ["Cool, I like %1 too!", "Wow, %1 is interesting!"]
    ], 
    [ 
        r"(What do i like|whats my interests)\??",
        ["You told me you like %1.", "Your interests include %1."] 
    ], 
    [
        r"(do you know me|what do you know about me)\??",
        ["Of course! Your name is %1, you are %2 years old, and you like %3."]
    ],

    # add more responses here...

] 

In the `user_data_patterns` dictionary, we define the patterns related to user data, such as capturing the user's name, age, and interests. Each entry specifies the `user_info` key it reads or writes, the `index` of the entry in `pairs` whose responses should be used, a `save` flag that says whether the captured value should be stored, and a `default_response` to fall back on when the requested data is not available yet.

The `pairs` list holds pattern/response pairs, with each pattern mapped to a list of possible responses. In a response, `%1` is replaced either with the text captured by the pattern's first group or with a stored value from `user_info`. `%2` and `%3` are filled in only for the `"all"` key, where `%1`, `%2`, and `%3` receive the stored name, age, and interests.
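Because each entry in `user_data_patterns` refers to `pairs` by position, inserting or reordering an entry in `pairs` can silently connect a pattern to the wrong responses. A small check like the following, which you might run once at startup, compares each `index` against the regex stored at that position. The function name `find_index_mismatches` is hypothetical, and requiring the two regex strings to be identical is an assumption; it also flags entries whose copies have drifted apart.

```python
def find_index_mismatches(user_data_patterns, pairs):
    """Hypothetical check: return (pattern, problem) tuples for entries that don't line up with pairs."""
    problems = []
    for pattern, data in user_data_patterns.items():
        index = data["index"]
        if not 0 <= index < len(pairs):
            problems.append((pattern, f"index {index} is out of range"))
        elif pairs[index][0] != pattern:
            # The entry points at a pairs row holding a different regex
            problems.append((pattern, f"pairs[{index}] holds {pairs[index][0]!r}"))
    return problems


if __name__ == "__main__":
    # Trimmed sample data with one deliberate mismatch in the age pattern
    sample_patterns = {
        r"my name is (.*)": {"index": 0},
        r"my age is (\d+)": {"index": 1},
    }
    sample_pairs = [
        [r"my name is (.*)", ["Hello %1!"]],
        [r"my age is (.*)", ["Cool, you're %1!"]],
    ]
    for pattern, problem in find_index_mismatches(sample_patterns, sample_pairs):
        print(f"{pattern}: {problem}")
```

In `main.py`, you could call `find_index_mismatches(user_data_patterns, pairs)` right after the imports and stop with an error if the returned list is not empty.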

 

Step 7: Testing the Chatbot 

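To test the chatbot, create a new Python file (for example, `chat.py`) in the same directory as `main.py` and `response.py`, and add the following code to it:


from main import generate_response


def chat():
    """Runs an interactive chat loop until the user types 'quit' or 'exit'."""
    while True:
        user_input = input("You: ")
        # Give the user a way to leave the loop
        if user_input.strip().lower() in ("quit", "exit"):
            print("Bot: Goodbye!")
            break
        response = generate_response(user_input)
        print("Bot:", response)


if __name__ == "__main__":
    chat()

This code imports the `generate_response` function from `main.py` and runs a loop that prompts for input, generates a response, and prints it to the console. Typing `quit` or `exit` ends the conversation. The `if __name__ == "__main__":` guard ensures the loop only starts when you run this file directly, not when another module imports it.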
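Typing every message by hand gets tedious once you start adding patterns. One option is to replay a fixed script of messages through a response function and collect the replies. In the sketch below, `replay` and `stub_response` are hypothetical names; the stub stands in for `generate_response` so that the snippet runs without NLTK, and in practice you would pass `generate_response` from `main.py` instead.

```python
from typing import Callable


def replay(messages: list[str], respond: Callable[[str], str]) -> list[tuple[str, str]]:
    """Hypothetical helper: send each scripted message to `respond` and collect (message, reply) pairs."""
    return [(message, respond(message)) for message in messages]


def stub_response(user_input: str) -> str:
    """Hypothetical stand-in for main.generate_response."""
    return f"You said: {user_input}"


if __name__ == "__main__":
    script = ["My name is Dan", "What is my name?"]
    for message, bot_reply in replay(script, stub_response):
        print("You:", message)
        print("Bot:", bot_reply)
```

Because `generate_response` keeps what it learns in the `user_info` dictionary, the order of the scripted messages matters: "What is my name?" only returns a name if an earlier message provided one.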

Well done! You have completed the tutorial for building a personalized chatbot using Python and the Natural Language Toolkit (NLTK). By following the step-by-step process, you have learned how to preprocess user input, match patterns, generate responses, and handle custom data patterns.

It's important to note that this is a basic implementation of a chatbot, and there is room for further improvement and customization. You can enhance the chatbot by expanding the patterns and responses, incorporating more advanced natural language processing techniques, and integrating it with external APIs or databases to provide more comprehensive and personalized assistance.

If anything in this article is unclear, refer to the official NLTK documentation at https://www.nltk.org/. It provides detailed information and examples that can help you dive deeper into the capabilities of the toolkit.

By continuing to explore and experiment with chatbot development, you can create more sophisticated and intelligent conversational agents. Good luck with your future chatbot projects!


