CrisisCompanion: Creating a Mental Health Support Chatbot

Summary: Access to mental health support remains a critical global challenge, with millions unable to receive timely care due to cost, stigma, or availability barriers. While AI-powered chatbots show promise in providing 24/7 accessible support, developing systems that are both empathetic and safe requires careful integration of large language models. In this challenge, your task is to develop a mental health support chatbot that can engage in compassionate conversations.

Method areas: Generative AI, Text Analysis, Large Language Models (LLMs)

Prerequisites: Some familiarity with NLP concepts like tokenization and embeddings will be helpful. Prior experience with LLMs or deep learning isn’t required, but you should be ready to learn and collaborate—especially around prompt design or model selection. Tutorials and sample scripts will be provided.

Background

Mental health concerns affect people across all demographics, yet access to timely, compassionate support remains limited. Many individuals face barriers such as stigma, cost, or lack of local resources—factors that delay or prevent them from seeking help. In response, mental health professionals and technologists alike have explored digital tools to provide scalable, always-available support.

Recent advances in large language models (LLMs) have opened new possibilities for conversational agents that can engage in emotionally supportive dialogue. However, building systems that are both helpful and safe—especially in sensitive domains like mental health—requires more than just prompt engineering. Developers must balance empathy, clarity, and risk awareness, while drawing from real-world conversational data and best practices in digital mental health.

Goal

In this challenge, you’ll explore how generative AI can be responsibly applied to the development of mental health support chatbots. You’ll work with a range of publicly available datasets—from counseling transcripts to emotion classification corpora—to build and evaluate a conversational agent that can provide basic, non-clinical support.

There are a number of openly available mental health conversations, emotional reaction, and mental health state detection datasets. 

GitHub Datasets

Empathy-Mental-Health Dataset

  • Description: Reddit-based dataset focusing on empathetic responses to mental health discussions
  • Use Case: Training models to generate more empathetic and supportive responses
  • Format: CSV file with emotional reactions and responses

HOPE Dataset (Access Request Required)

  • Description: 202 dyadic counseling conversation transcripts with dialogue-act tags
  • Use Case: Professional counseling conversation patterns and dialogue flow
  • Special Note: Requires access request due to sensitive nature of professional counseling data
  • Key Feature: Each utterance tagged for dialogue classification

Hugging Face Datasets

Mental Health Conversations Collection by CalebE

Three related datasets providing different formats and perspectives:

  1. New Mental Health Conversations All1
    • Comprehensive mental health conversation dataset
    • Multiple conversation formats and contexts
  2. New Mental Health Conversations All
    • Alternative version with different preprocessing
    • Suitable for various NLP tasks
  3. Mental Health Counseling Conversations Formatted
    • Specifically formatted for counseling-style interactions
    • Clean structure for fine-tuning conversational models

Suicide Depression Detection

  • Description: Specialized dataset for detecting crisis indicators
  • Use Case: Training safety classifiers and risk assessment modules
  • Critical Feature: Labeled data for different risk levels

TREC Question Classification

  • Description: Question intent classification dataset
  • Use Case: Understanding user query types and improving response relevance
  • Application: Helps chatbots better interpret what users are really asking

Twitter Sentiment Analysis

  • Description: Social media language patterns with sentiment labels
  • Use Case: Understanding informal language, abbreviations, and emotional expressions
  • Key Features: Real-world social media text with varied expression styles

Kaggle Datasets

Mental Health Conversational Data

  • Description: Structured mental health conversations
  • Use Case: Direct application to chatbot training
  • Format: Question-answer pairs with mental health context

IMDB Movie Reviews

  • Description: Large-scale sentiment analysis benchmark (50,000 reviews)
  • Use Case: Pre-training sentiment analysis components
  • Advantages:
    • Well-established baseline for sentiment classification
    • Large size enables robust model training
    • Binary sentiment labels (positive/negative)

Emotion Dataset

  • Description: Multi-class emotion classification in short messages
  • Use Case: Fine-grained emotion detection beyond binary sentiment
  • Emotion Categories: Joy, sadness, anger, fear, love, surprise
  • Application: Understanding emotional nuances in user messages

Required

  • Version Control with GitHub Desktop: All participants must know how to work in Git/GitHub. 
  • Intro to Python: While it’s possible to build chatbots in other languages, Python is strongly recommended for LLM integration and NLP libraries. You can work in another language if you find team members who are also willing.
  • Intro to Natural Language Processing (NLP): Understanding text preprocessing, tokenization, and embeddings will accelerate your chatbot development, though tutorials will be provided.

Helpful but not strictly required

  • Basic Neural Networks: Some exposure to deep learning concepts is helpful for understanding how LLMs work, but beginners are welcome if paired with experienced teammates.
  • Large Language Models (LLMs): No prior experience with the application of LLMs, Hugging Face transformers, or fine-tuning is required, but you must be willing to learn! We’ll provide LLM tutorials, example scripts (e.g., the use of Qwen), and prompt engineering guides.
  • Hugging Face Transformers Quickstart: A beginner-friendly introduction to using pre-trained models for text generation and sentiment analysis. 
  • LangChain Chatbot Tutorial: Step-by-step guide to building a basic chatbot with conversation memory. Includes simple code for maintaining chat history and generating contextual responses – essential for any mental health application.
  • Google Colab: Fine-tuning GPT-2 for Beginners: Free, cloud-based notebook that walks through fine-tuning a smaller language model. No GPU required locally.
  • Streamlit Chatbot Interface Tutorial: Simple guide to creating a web-based chat interface in Python. Students can have a working demo in 30 minutes, perfect for showcasing their mental health chatbot.

This challenge will launch on Sept. 11, 2025 (MLM25 kickoff), and will be hosted outside of Kaggle (MLM25 only).

If you have any questions about participating, please contact the challenge organizer: Ross Jacobucci (jacobucci@wisc.edu).