Sentiment and topic modelling of clothing fashion brands on Reddit

Aim

What brands are the most popular across Reddit? How does popularity change over time? Is there a gender divide? What brands are talked about positively and negatively?

To answer these questions I will carry out a short project applying natural language processing (NLP) and machine learning to Reddit data.

Data collection

I will take data from a purposive sample of fashion-related subreddits, using Python to query Pushshift's API and download the relevant comments and posts to build a corpus.
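A minimal sketch of the query-building step, assuming Pushshift's historical search endpoints (`/reddit/search/comment/` and `/reddit/search/submission/`) with `subreddit`, `after`, `before` (Unix timestamps) and `size` parameters. The subreddit names and the default `size` are illustrative assumptions, not the project's actual sample, and public access to Pushshift was restricted in 2023, so the endpoint may need replacing.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Assumed historical Pushshift search endpoint; check current access before relying on it.
PUSHSHIFT_BASE = "https://api.pushshift.io/reddit/search/{kind}/"


def to_epoch(day: str) -> int:
    """Convert an ISO date such as '2021-01-01' (taken as UTC midnight) to a Unix timestamp."""
    return int(datetime.fromisoformat(day).replace(tzinfo=timezone.utc).timestamp())


def build_query_url(kind: str, subreddit: str, after: str, before: str, size: int = 100) -> str:
    """Build a search URL for comments or submissions in one subreddit and date window."""
    if kind not in ("comment", "submission"):
        raise ValueError("kind must be 'comment' or 'submission'")
    params = {
        "subreddit": subreddit,
        "after": to_epoch(after),    # start of the date window, as Unix time
        "before": to_epoch(before),  # end of the date window, as Unix time
        "size": size,                # assumed number of results per request
    }
    return PUSHSHIFT_BASE.format(kind=kind) + "?" + urlencode(params)


# Illustrative subreddits only, not the project's actual sample.
for sub in ["malefashionadvice", "femalefashionadvice"]:
    for kind in ("comment", "submission"):
        print(build_query_url(kind, sub, "2021-01-01", "2021-02-01"))
```

Each URL would then be fetched (for example with `urllib.request.urlopen`) and the JSON response parsed; paging through a long window by moving `after` forward is left out of this sketch.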

Data cleaning and pre-processing

Having a clean, pre-processed dataset is crucial for NLP analysis. In addition to the usual pre-processing steps, Reddit's custom flavour of markdown has to be removed, which I will do with redditcleaner.
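To show what the markdown-removal step has to deal with, here is a dependency-free sketch. It is a simplified regex stand-in, not redditcleaner's implementation: the patterns cover only common cases (quoted lines, links, bare URLs, bold/italic, strikethrough, superscript, inline code), and the stop-word list is a hypothetical placeholder for a fuller list such as NLTK's stopwords corpus.

```python
import html
import re
from typing import List

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOPWORDS = {"a", "an", "and", "the", "is", "it", "i", "to", "of", "for", "my", "in", "on", "but"}

QUOTE = re.compile(r"^\s*>.*$", re.MULTILINE)  # lines quoting another user
LINK = re.compile(r"\[([^\]]*)\]\([^)]*\)")      # [text](url) -> text
URL = re.compile(r"https?://\S+")               # bare URLs
FORMATTING = re.compile(r"\*{1,3}|~~|\^|`")     # bold/italic, strikethrough, superscript, code


def strip_reddit_markdown(text: str) -> str:
    """Simplified stand-in for removing common Reddit markdown from a comment body."""
    text = html.unescape(text)  # bodies often arrive with &, <, > escaped as &amp;, &lt;, &gt;
    text = QUOTE.sub(" ", text)
    text = LINK.sub(r"\1", text)
    text = URL.sub(" ", text)
    text = FORMATTING.sub("", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> List[str]:
    """Lower-case, split into word tokens (keeping '&' for names like h&m) and drop stop words."""
    tokens = re.findall(r"[a-z0-9'&]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


raw = "&gt; Anyone tried Uniqlo?\n**Love** my [Uniqlo](https://www.uniqlo.com) jeans, see https://example.com"
print(strip_reddit_markdown(raw))
print(tokenize(strip_reddit_markdown(raw)))
```

Unescaping HTML entities first matters: quoted lines only start with a literal `>` after `&gt;` has been decoded.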

Analysis

To analyse the dataset I will use the Natural Language Toolkit (NLTK) library for sentiment analysis of the identified brands. To identify the ways in which brands are discussed, I will apply topic modelling, specifically Latent Dirichlet Allocation (LDA).
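The popularity and sentiment questions can be answered from the same pass over the corpus: detect which brands a comment mentions, score the comment, and aggregate by brand and month. The sketch below assumes a hypothetical brand dictionary and takes the scorer as a parameter; the `toy_score` function is a stand-in for the demo only, and in practice NLTK's VADER compound score (shown in the comment) would be passed in.

```python
import re
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical brand dictionary: canonical brand name -> lower-case aliases to match.
BRANDS: Dict[str, List[str]] = {
    "Uniqlo": ["uniqlo"],
    "Zara": ["zara"],
    "H&M": ["h&m", "h and m"],
}


def find_brands(text: str) -> List[str]:
    """Return the brands mentioned in a comment, matching aliases as whole words."""
    lowered = text.lower()
    found = []
    for brand, aliases in BRANDS.items():
        # Lookarounds instead of \b so aliases containing '&' still match cleanly.
        if any(re.search(r"(?<!\w)" + re.escape(a) + r"(?!\w)", lowered) for a in aliases):
            found.append(brand)
    return found


def brand_sentiment_by_month(
    comments: Iterable[Tuple[int, str]],  # (created_utc, cleaned comment text)
    score: Callable[[str], float],        # any scorer returning one number per comment
) -> Dict[Tuple[str, str], Tuple[int, float]]:
    """Map (brand, 'YYYY-MM') to (mention count, mean sentiment of the mentioning comments)."""
    scores: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for created_utc, text in comments:
        month = datetime.fromtimestamp(created_utc, tz=timezone.utc).strftime("%Y-%m")
        s = score(text)
        for brand in find_brands(text):
            scores[(brand, month)].append(s)
    return {key: (len(vals), mean(vals)) for key, vals in sorted(scores.items())}


def toy_score(text: str) -> float:
    """Toy stand-in scorer for this demo only; swap in VADER for real analysis."""
    lowered = text.lower()
    if "love" in lowered:
        return 1.0
    if "hate" in lowered:
        return -1.0
    return 0.0


# With NLTK the scorer could be VADER's compound score (needs nltk.download("vader_lexicon")):
#   from nltk.sentiment import SentimentIntensityAnalyzer
#   sia = SentimentIntensityAnalyzer()
#   score = lambda text: sia.polarity_scores(text)["compound"]

comments = [
    (1609459200, "love my uniqlo jeans"),           # 2021-01-01
    (1609545600, "hate how zara sizes run small"),  # 2021-01-02
    (1612137600, "love zara and uniqlo basics"),    # 2021-02-01
]
for (brand, month), (count, avg) in brand_sentiment_by_month(comments, toy_score).items():
    print(brand, month, count, avg)
```

The mention count per month tracks popularity over time, and the mean score tracks sentiment. Note the simplification: a comment's score is attributed to every brand it mentions, so a comment praising one brand while criticising another counts as the same sentiment for both.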

Further scope

Once the popularity and sentiment of brands have been tracked, I'm keen to explore the data qualitatively for deeper insights. I expect strongly positive and negative sentiment to cluster around particular events, such as articles on slave labour, store closures or product announcements.
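One way to find candidate events for that qualitative reading is to flag days on which a brand's mean sentiment is unusually far from its norm. The sketch below uses a simple z-score rule on daily means; the rule itself, the threshold of 2 and the minimum of 5 comments per day are assumptions for illustration, not part of the project plan.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean, pstdev
from typing import Dict, Iterable, List, Tuple


def flag_sentiment_spikes(
    scored: Iterable[Tuple[int, float]],  # (created_utc, sentiment score) for one brand
    threshold: float = 2.0,               # assumed |z-score| cut-off
    min_comments: int = 5,                # assumed minimum comments for a day to count
) -> List[Tuple[str, float, float]]:
    """Return (date, daily mean, z-score) for days whose mean sentiment is unusually high or low."""
    by_day: Dict[str, List[float]] = defaultdict(list)
    for created_utc, s in scored:
        day = datetime.fromtimestamp(created_utc, tz=timezone.utc).date().isoformat()
        by_day[day].append(s)
    # Average per day, ignoring thinly discussed days whose means are noisy.
    daily = {d: mean(v) for d, v in by_day.items() if len(v) >= min_comments}
    if len(daily) < 2:
        return []
    means = list(daily.values())
    mu, sigma = mean(means), pstdev(means)
    if sigma == 0:
        return []
    return sorted(
        (d, m, (m - mu) / sigma) for d, m in daily.items() if abs(m - mu) / sigma >= threshold
    )
```

The flagged dates are starting points for reading the underlying threads, not evidence of an event in themselves. With only a handful of qualifying days the achievable z-scores are bounded (at most the square root of the number of days minus one), so the rule needs a reasonably long series.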

Naiyan Jones