Distilling Survey Insights: ChatGPT Meets Python

Decoding Volumes: Efficiently Extracting Insights from Qualitative Data with GPT-3.5 Turbo

Isabelle Bittar
3 min read · Aug 25, 2023
By KI Data Science

Analyzing qualitative survey data can be overwhelming, especially when there are many free-text responses. This guide shows how to extract key topics from survey results in Python using OpenAI’s GPT-3.5 Turbo. We used this technique to build one section of an impactful survey dashboard. Let’s unveil the magic behind that section:

Starting Point: We have 108 survey comments saved in a CSV file.

Screenshot of Survey Comments

Step 1: Obtain OpenAI API Key

  1. Create an account at OpenAI.
  2. Access your API key under “View API keys”.
  3. Note: New accounts might include free API credits. Once they are exhausted, add your billing details to keep using the API.

Screenshot of OpenAI Account

Step 2: Install the Python OpenAI Library

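pip install "openai<1.0"

Note: The code in this guide calls openai.ChatCompletion, which exists only in versions of the openai package before 1.0, so the install is pinned accordingly.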

Step 3: Import Necessary Libraries

#Load libraries
import openai
import pandas as pd
import re

Step 4: Authenticate with OpenAI

#Authenticate with OpenAI
openai.api_key = 'YOUR_API_KEY'
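
Hardcoding the key in a script makes it easy to leak through version control. As a minimal sketch of an alternative, the key could be read from an environment variable instead. The helper name load_api_key is hypothetical, and the variable name OPENAI_API_KEY is an assumption about how your environment is set up.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Read the API key from an environment variable (hypothetical helper)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


# Usage with the library imported in Step 3
# (left commented so this block runs on its own):
# openai.api_key = load_api_key()
```

With this in place, the script itself never contains the secret, so it can be shared or committed safely.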

Step 5: Create a Function to Extract Key Topics

This function communicates with ChatGPT and fetches summarized key topics.

#Use GPT-3.5 Turbo to ask for key topics in the text
def extract_key_topics(text):
    # Send the survey text to the chat model in a single user message
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": "Summarize the key topics from the following text in 7 to 10 points:\n" + text
        }],
        max_tokens=150  # upper bound on the length of the reply
    )

    # Parse and return the extracted topics
    return response.choices[0].message.content
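
Because max_tokens=150 caps the length of the reply, a summary of 7 to 10 points can be cut off mid-sentence. The sketch below is an illustration rather than part of the original workflow: it pulls the reply text out of a response while checking the finish_reason field that the Chat Completions API reports, where "length" means the reply hit the token cap. The helper names get_reply_text and fake_response are hypothetical, and the stand-in response is built with SimpleNamespace so the example runs without an API call.

```python
from types import SimpleNamespace


def get_reply_text(response):
    """Return the first choice's text; raise if the reply was truncated."""
    choice = response.choices[0]
    # "length" means the model stopped because it reached max_tokens
    if choice.finish_reason == "length":
        raise ValueError("Reply was cut off; consider raising max_tokens.")
    return choice.message.content.strip()


def fake_response(content, finish_reason):
    """Hypothetical stand-in with the same attribute layout as an API response."""
    message = SimpleNamespace(role="assistant", content=content)
    choice = SimpleNamespace(message=message, finish_reason=finish_reason)
    return SimpleNamespace(choices=[choice])


complete = fake_response("1. Topic A\n2. Topic B", "stop")
print(get_reply_text(complete))
```

If the check fails on real data, raising max_tokens or asking for fewer points are two possible adjustments.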

Step 6: Process the Data

Merge all data rows and send them to ChatGPT for topic extraction.

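# Summarize key topics
df = pd.read_csv("Survey Responses_Return to the Office.csv")
# Drop empty rows and join the comments, one per line, so they do not run together
combined_string = df['Comment'].dropna().astype(str).str.cat(sep="\n")
topics = extract_key_topics(combined_string)
print(topics)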
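
Sending every comment in one prompt works for a survey of this size, but a larger one could exceed the model's context window. A minimal sketch of one workaround, assuming a rough character budget per request, is to group the comments into batches first. The function name batch_comments and the 6000-character default are illustrative choices, not values from the original project.

```python
def batch_comments(comments, max_chars=6000):
    """Group comments into newline-joined batches of at most max_chars characters.

    A single comment longer than max_chars is kept whole in its own batch.
    """
    batches, current, current_len = [], [], 0
    for comment in comments:
        comment = str(comment).strip()
        if not comment:
            continue  # skip empty responses
        # +1 accounts for the newline that joins this comment to the batch
        added = len(comment) + (1 if current else 0)
        if current and current_len + added > max_chars:
            # Current batch is full: close it and start a new one
            batches.append("\n".join(current))
            current, current_len = [], 0
            added = len(comment)
        current.append(comment)
        current_len += added
    if current:
        batches.append("\n".join(current))
    return batches


sample = ["Too noisy in the office", "I miss my coworkers", "Prefer working from home"]
for batch in batch_comments(sample, max_chars=45):
    print(repr(batch))
```

Each batch could then be passed to extract_key_topics on its own, with the per-batch topic lists merged afterwards.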

This provides the following results:

1. Struggling with work-life separation while working from home
2. Excitement for access to resources and equipment in the office
3. Enjoyment of quietness while working from home and concerns about noise and distractions in the office
4. Hesitation to return due to lack of social distancing and health risks
5. Productivity preference for working from home
6. Appreciation for routine and structure in the office for mental health
7. Concerns about losing flexibility when returning to the office
8. Request for more time to make a decision about returning
9. Desire for social interaction and catching up with coworkers
10. Importance of having a choice in returning to the office or continuing remote work.

Step 7: Structure the Results into a DataFrame

For better visualization, reformat the data and structure it into a dataframe.

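# Organize topics in dataframe
# Split the reply into lines and strip only the leading "1. ", "2. ", ... markers,
# so digits inside a topic and sentences containing ". " are left intact
topics_list = [
    re.sub(r"^\s*\d+\.\s*", "", line).strip()
    for line in topics.splitlines()
    if line.strip()
]
topics_df = pd.DataFrame(topics_list, columns=["Topic"])
print(topics_df)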

This provides the following result:

Step 8: Export the Data to a CSV

Save your organized data for future usage or sharing.

#Save new dataframe to CSV
topics_df.to_csv("topics_output.csv", index=False)

Conclusion

Automating the summary of qualitative survey results can save a significant amount of time and help keep the analysis consistent. With OpenAI’s GPT-3.5 Turbo model, we can efficiently extract key topics from large volumes of survey data. This guide walked you through the process, from setting up the OpenAI environment to saving organized results in a CSV file. By integrating such automated techniques into your workflow, you can focus on making data-driven decisions based on clear insights rather than getting bogged down in manual data processing.

Isabelle Bittar

Isabelle is a Montreal-based business consultant specialized in data science.