Author: gscfwid, an anesthetist at a large hospital in mainland China.
Hello everyone, I’m a doctor and also a technology enthusiast. Today I’d like to share my recent project: using Dify to build an intelligent agent on top of GPT-4 Turbo and turning it into an advanced WeChat chatbot.
Why Choose a WeChat Bot
First, let me explain why I wanted to build this bot. As a doctor, one of my important tasks is following up with patients to understand their postoperative recovery. Our hospital performs 70,000-80,000 surgeries annually, making manual phone follow-ups very impractical. Moreover, telecom carriers now have strict restrictions on phone calls. I found that for patients, WeChat might be a more acceptable follow-up method because it doesn’t feel as intrusive to their daily lives.
Why Choose Wechaty
However, most WeChat bot frameworks on the market are based on the web protocol, which is now largely unusable. Even though Wechaty can fall back on the web protocol preserved in UOS, that route cannot obtain permanent IDs, remarks, tags, and other contact data, which is a serious drawback for follow-up work. After trying Wechaty’s padlocal protocol instead, I found it fit my needs very well.
Why Choose Dify
Next, I want to talk about why I chose to use Dify to build this bot. First, I hoped this bot would not only complete follow-up tasks but also serve as a medical education bot for patients. With the explosion of large language models, this idea became very easy to implement. Using Wechaty and the large model API, I quickly conceived a preliminary framework.
Dify is a well-known large-model Agent platform. Its API is much friendlier to work with than OpenAI’s official API, especially for prompt construction and maintaining conversation threads. Although assistants can also be built directly on OpenAI, keeping conversations going there doesn’t seem as easy. In addition, the Dify platform ships with built-in plugins, such as Google Search, which can be integrated through the API with little effort. For these reasons, I ultimately chose Dify as my development platform.
The above is some background and thinking behind developing this medical follow-up WeChat bot. As a doctor and technical novice, I hope that by sharing my project experience, I can bring some inspiration and food for thought to everyone. In the following sections, I’ll discuss some technical details of this bot. I welcome everyone’s valuable opinions and suggestions.
Creating a GPT-4 Turbo-Based Model Through Dify
Dify provides a simple and easy-to-use interface that allows me to quickly create and test models.
First, I created a new application on the Dify platform and selected GPT-4 Turbo as the base model. In this initial phase, I temporarily didn’t use any custom prompts or plugins — I just wanted to do a simple test first to see how the model performs.
After creating the application, Dify automatically generates an API key, which we can use to call Dify’s API and interact with the conversation model we just created.
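To make sure the key works, a quick sanity check can look like the sketch below. The endpoint and request fields mirror the ones the bot uses later; using response_mode "blocking" and reading response.data.answer are my own assumptions about Dify’s chat-messages API for a one-off test, not code from the actual project.
// Minimal one-off test of the Dify app (assumed sketch, not the project code)
const axios = require("axios");

async function quickTest() {
  const response = await axios.post(
    "https://api.dify.ai/v1/chat-messages",
    {
      inputs: {},
      query: "Hello, can you hear me?",
      response_mode: "blocking", // wait for the complete answer instead of streaming
      user: "smoke-test",
    },
    {
      headers: {
        Authorization: `Bearer ${process.env.DIFY_API_KEY}`,
        "Content-Type": "application/json",
      },
    }
  );
  console.log(response.data.answer); // assumed field: the model's full reply
}

quickTest().catch(console.error);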
Implementing the WeChat Bot with Wechaty
With the intelligent conversation model in place, we next need a platform to implement the WeChat bot and integrate the model into WeChat. Here I chose Wechaty. Wechaty is an open-source conversational bot SDK that supports personal WeChat accounts, built with Node.js and TypeScript.
Below is my code implementation, mainly divided into the following parts:
First, I use wechaty-puppet-padlocal as Wechaty’s Puppet Provider. It connects to WeChat through the iPad protocol, which is more stable and reliable compared to the Web protocol. Then I use WechatyBuilder to build our bot instance.
// Initialize Wechaty
const { PuppetPadlocal } = require("wechaty-puppet-padlocal");
const { WechatyBuilder } = require("wechaty");
const puppet = new PuppetPadlocal({
token: process.env.PUPPET_PADLOCAL_TOKEN,
});
const bot = WechatyBuilder.build({ puppet, name: "test" });
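For completeness, the bot still needs to be started and logged in. The sketch below uses Wechaty’s standard scan and login events plus bot.start(); the exact QR-code link handling is my own illustration rather than code from the original project.
// Start the bot: print a login QR-code link on "scan", log the login, then start
bot.on("scan", (qrcode, status) => {
  console.log(`Scan to log in (status ${status}): https://wechaty.js.org/qrcode/${encodeURIComponent(qrcode)}`);
});
bot.on("login", (user) => {
  console.log(`Logged in as ${user.name()}`);
});
bot.start().catch((e) => console.error("Bot failed to start:", e));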
Next is the core function for calling the Dify API. I use the axios library to send a POST request to Dify’s API endpoint, passing in the user’s input message, conversation ID, and other parameters, and authenticating with the API Key. Dify returns the reply generated by the intelligent conversation model.
// Function to call Dify API
const axios = require("axios"); // HTTP client used for the Dify requests
const difyApiKey = process.env.DIFY_API_KEY;
const difyApiUrl = "https://api.dify.ai/v1/chat-messages";
async function sendMessage(message, userName) {
// ...
try {
const response = await axios.post(
difyApiUrl,
{
inputs: {},
query: message,
response_mode: "streaming",
conversation_id: conversationData.conversationId,
user: userName,
files: [],
},
{
headers: {
Authorization: `Bearer ${difyApiKey}`,
"Content-Type": "application/json",
},
}
);
// Process response...
} catch (error) {
if (error.response) {
console.error(
"Dify API responded with status code:",
error.response.status
);
} else if (error.request) {
console.error("No response received from Dify API:", error.request);
} else {
console.error("Error setting up request to Dify API:", error.message);
}
// Handle error appropriately...
}
// ...
}
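The response handling is elided above as “// Process response...”. As a rough illustration of what that step can look like, the sketch below assembles the reply text from the buffered SSE body; treating agent_message events as the answer chunks is an assumption based on Dify’s agent-type apps and may need to be adjusted for other app types.
// Hypothetical sketch of the elided response handling (not the original code):
// in streaming mode Dify returns Server-Sent Events, one "data:" payload per line
let reply = "";
for (const line of response.data.split("\n")) {
  if (!line.startsWith("data:")) continue;
  const payload = line.slice(5).trim();
  if (!payload) continue;
  const data = JSON.parse(payload);
  if (data.event === "agent_message") {
    reply += data.answer; // assumed field: incremental chunk of the answer text
  }
}
return reply;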
Finally, I listen to Wechaty’s message event. The excerpt below handles direct messages: when a message comes from a contact that already exists in my SQLite follow-up database, I extract the message text, call the sendMessage function to get the intelligent reply, and send it back with talker().say. Group chats where the bot is @mentioned work the same way, except the reply goes back to the group through room.say (a sketch of that branch follows the code).
// Listen to message events
bot.on("message", async (message) => {
// Get information about the message sender
const id = message.talker().id;
const room = message.room();
const userName = message.name();
const text = message.text();
// Check if it's in my already created SQLite database
if (!room) {
const query = "SELECT * FROM contacts WHERE id = ?";
try {
const row = await db.get(query, [id]);
if (row != undefined) {
const reply = await sendMessage(text, userName);
await message.talker().say(reply);
}
} catch (err) {
console.error(err.message);
}
}
});
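For the group-chat case mentioned above, a hedged sketch of the @mention branch could look like this, using Wechaty’s mentionSelf and mentionText helpers together with room.say; it is an illustration rather than the project’s original code.
// Hypothetical group-chat branch (assumed, not the original project code):
// respond only when the bot itself is @mentioned, then reply into the same room
bot.on("message", async (message) => {
  const room = message.room();
  if (!room) return; // direct messages are handled by the listener above
  if (message.self()) return; // ignore the bot's own messages
  if (!(await message.mentionSelf())) return; // only react to @mentions of the bot

  const text = await message.mentionText(); // message text with the @mention stripped
  const userName = message.talker().name();
  try {
    const reply = await sendMessage(text, userName);
    await room.say(reply);
  } catch (err) {
    console.error(err.message);
  }
});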
One important point to emphasize here is that I use the conversation_id in the Dify API to implement the conversation persistence feature. This part of the code is mainly in the sendMessage function:
const conversationMap = new Map(); // Map each user to their conversation_id and last-activity timestamp
const CONVERSATION_EXPIRATION = 5 * 60 * 1000; // Set conversation retention time to 5 minutes
async function sendMessage(message, userName) {
// ...
let conversationData = conversationMap.get(userName);
const timestamp = Date.now();
// If the conversation doesn't exist or has expired, create a new conversation
if (
!conversationData ||
timestamp - conversationData.timestamp > CONVERSATION_EXPIRATION
) {
conversationData = { conversationId: null, timestamp };
conversationMap.set(userName, conversationData);
}
const response = await axios.post(
difyApiUrl,
{
// ...
conversation_id: conversationData.conversationId,
user: userName,
// ...
}
// ...
);
// Update conversation ID and timestamp
conversationData.timestamp = timestamp;
// ...
// Because I'm using stream mode, Dify's response arrives as Server-Sent Events;
// split the buffered body into lines and scan them for the conversation_id
const lines = response.data.split("\n"); // assumes the SSE body was buffered as a string
for (const line of lines) {
if (line.startsWith("data:")) {
const data = JSON.parse(line.slice(5).trim());
if (data.event === "agent_thought") {
// ...
conversationData.conversationId = data.conversation_id;
}
}
}
// ...
}
Through this approach, we can maintain an independent conversation context for each user, implementing multi-turn dialogues. When users continue to send messages within a certain time period (set to 5 minutes here), the context remains coherent; if this time is exceeded, a new conversation begins.
The next steps all happen on the Dify platform, and I’m still working on them. My plan is to upload health education articles downloaded from the web as a knowledge base and restrict GPT-4 Turbo to answering only from that knowledge base. In any case, that goes beyond the scope of this technical discussion.
The above is the core code for implementing an intelligent WeChat conversation bot using Dify and Wechaty. Through Dify’s powerful conversation model and Wechaty’s convenient WeChat integration, we can quickly build a practical medical education bot. Of course, this is just a basic version. We can continue to add more features, such as custom prompts and knowledge base search, to further enhance the bot’s intelligence level.
This article is also available in Chinese.