Using the Topic Clustering app
What does this app do?
This app helps researchers make sense of large numbers of open-ended responses. It clusters the responses into groups and labels each cluster with the main theme or issue its responses cover. This can save a significant amount of time when reviewing hundreds of open-ended responses, because it gives you a starting point for understanding their content. It is not a replacement for an in-depth qualitative review, but it can accelerate that process and provide initial insights into the main themes and issues.
This app can be useful in a variety of research contexts, including when:
- You have included an open-ended question in a survey and want to understand the various types of responses being given.
- You are gathering feedback from customers and want to organize issues into clusters to better understand the main problems they are experiencing.
- You are conducting a survey and want to quickly identify any potential issues that may have arisen from comments collected during the survey.
- You have given respondents the option to specify a different (“other”) response to a multiple-choice question and want to group these responses into categories.
- You are conducting a survey to gather employee feedback on your organization’s culture and want to identify areas for improvement based on the responses you receive.
Please note that this app can take some time to run, as it often has to process hundreds of entries, group them, and then label them. Also note that the output can sometimes be nonsensical. Despite its shortcomings, we do think it offers
What do I need to know about the input and the output?
To use this app, provide an Excel sheet as input (in .xlsx format). The sheet should have two columns: a unique identifier in the first column and the corresponding open-ended response in the second. Make sure that each column has a header; we recommend labeling the second column with the question being asked. Each open-ended response should be no longer than 2,000 characters. Before uploading your data, it is a good idea to do some initial cleaning to remove irrelevant or duplicate entries, which helps the app cluster and label your responses more accurately. Once your input sheet is ready, upload it to the app to begin the clustering and labeling process.
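The cleaning step described above can be sketched in a few lines of Python. This is only an illustration and not part of the app: the function name `clean_rows`, the constant `MAX_CHARS`, and the choice to drop over-long responses (rather than shorten them) are assumptions made for the example. It takes (identifier, response) pairs and separates the rows that meet the input requirements from those that do not.

```python
MAX_CHARS = 2000  # per-response limit stated in this guide


def clean_rows(rows):
    """Split (identifier, response) pairs into kept rows and dropped rows.

    Hypothetical helper: drops empty responses, responses longer than
    MAX_CHARS, repeated identifiers, and repeated responses (ignoring
    case and extra whitespace). Each dropped row is returned with a reason.
    """
    kept, dropped = [], []
    seen_ids, seen_texts = set(), set()
    for identifier, response in rows:
        raw = "" if response is None else str(response)
        text = " ".join(raw.split())  # trim and collapse whitespace
        key = text.casefold()         # case-insensitive duplicate check
        if not text:
            dropped.append((identifier, "empty"))
        elif len(text) > MAX_CHARS:
            dropped.append((identifier, "too long"))
        elif identifier in seen_ids:
            dropped.append((identifier, "duplicate id"))
        elif key in seen_texts:
            dropped.append((identifier, "duplicate response"))
        else:
            seen_ids.add(identifier)
            seen_texts.add(key)
            kept.append((identifier, text))
    return kept, dropped
```

You would typically read the two columns of your sheet into such pairs with a spreadsheet library (for example openpyxl or pandas), review the dropped rows by hand, and then save the kept rows back to an .xlsx file with headers before uploading.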
Here is an example of what such an Excel sheet might look like:
The app will provide you with an output Excel file containing a summary of the clusters and the corresponding label for each cluster. The first sheet includes a summary of the various clusters, with five examples of text under each cluster. The second sheet contains the dataset with the corresponding cluster groupings and labels. Please note that the app will not consider observations that are empty, that are deemed not useful (e.g. contain nonsensical text), that are offensive, or that are outliers the app was not able to fit into a cluster.