Key points when using Reddit as a source of data
A list of academic journal articles and resources you should really be citing. Yes, this is for you, random Google searcher.
People use data from Reddit all the time. Sometimes they’re students, academics, or data scientists.
These three peer-reviewed articles are the most methodologically substantive in the academic literature on Reddit. They’re well worth citing in a research design or methodology section.
Adams, Artigiani and Wish (2019) compared Twitter and Reddit for conducting social media drug research. They focused on the use of keywords as the main way of finding out how people talked about drugs online.
The main highlights and findings were:
- Reddit, like many online forums, conveniently categorises relevant posts and comments into topical subforums, known as “subreddits”. Because topics can range from issues (see /r/politics) to locations (/r/unitedkingdom) or interests (/r/movies), subreddits can be seen as self-selecting sampling frames
- Because Reddit has a large population of young white males, it can be an avenue for research on phenomena which affect this group, such as the American opioid epidemic
- In comparison to Twitter, there was less slang when discussing drugs, but more synonyms for drugs and drug-related practices
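To make the keyword approach concrete, here is a minimal sketch of a keyword filter that maps synonyms back to a single drug name. It is not the filter list from Adams, Artigiani and Wish (2019): the `KEYWORDS` terms, the `match_drugs` helper, and the example posts are all hypothetical, chosen only to illustrate the idea.

```python
import re

# Hypothetical keyword list (an assumption for illustration, not taken from
# the paper): each canonical drug name maps to a few synonyms or slang terms.
KEYWORDS = {
    "heroin": ["heroin", "dope", "smack"],
    "fentanyl": ["fentanyl", "fent"],
}


def build_pattern(terms):
    """Compile one case-insensitive regex matching any term as a whole word."""
    # Longest terms first so longer alternatives are tried before shorter ones.
    escaped = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


PATTERNS = {drug: build_pattern(terms) for drug, terms in KEYWORDS.items()}


def match_drugs(text):
    """Return the sorted canonical drug names whose keywords occur in text."""
    return sorted(drug for drug, pattern in PATTERNS.items() if pattern.search(text))


# Made-up example posts, not real Reddit data.
posts = [
    "Switched from dope to fent last year",
    "The fentanyl crisis is all over the news",
    "He was doped up on cold medicine",
]

for post in posts:
    print(match_drugs(post), "-", post)
```

Whole-word matching keeps “doped” from counting as “dope”, but a filter like this still cannot tell a drug reference from an unrelated use of the same word, so any real keyword list would need manual checking against a sample of posts.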
Shatz (2016) directly outlined Reddit as a ‘fast, free and targeted’ platform for recruiting participants online.
The main demographic highlights were:
- Older studies noted a gender disparity in the userbase, which skewed male. Newer studies seem to disagree, with some finding a small difference or near-equal representation of both genders
- A study on the demographics of adult US users, who make up around half of active users, found that they are relatively representative of the US adult population when controlling for age
- Demographics probably vary from subreddit to subreddit, and there is no comprehensive database of subreddit demographics. However, users and volunteer moderators of the site sometimes conduct their own ‘subreddit surveys’ which ask for various demographic information
Shatz identified several advantages:
- Potentially large samples can be recruited in a short amount of time
- Targeting particular topic-based subreddits allowed recruitment from specific demographics, special interest groups, or those traditionally marginalised
- Studies can generally be conducted for free, which is important to students and early career researchers
And several disadvantages or limitations:
- The possibility of low recruitment or retention rates
- The recruitment process relied on users self-selecting based on the criteria in a post, which means people could lie in order to take part. However, he noted that the majority of studies did not offer monetary compensation, which may reduce the issue
- Reddit cannot be a catch-all method for recruitment. For some studies it may be inappropriate.
- General issues related to online recruitment are still relevant, such as ethical considerations, required technical expertise, and various errors in terms of sampling, measurement, and non-response.
Jamnik and Lane (2017) compared the use of Reddit to other sources of recruitment used in psychology.
They compared a sample drawn from Reddit with a sample from Amazon’s online crowdsourcing platform Mechanical Turk (MTurk) and a typical undergraduate university sample. These types of samples are commonly used in psychology due to ease of access and low cost (Pollet, 2019).
The main findings were:
- The Reddit sample contained more males than females, was less ethnically diverse than the undergraduate sample, contained more non-Americans, and contained more highly educated individuals, some of whom had already graduated from university
- The Reddit sample was significantly older and covered a larger age range than the undergraduate sample. However, overall Grade Point Averages (GPAs) were similar
- They concluded that using Reddit as a source of recruitment may create more diverse samples than traditional university student samples.
Main citations
Adams, N., Artigiani, E.E. and Wish, E.D. (2019) ‘Choosing Your Platform for Social Media Drug Research and Improving Your Keyword Filter List’, Journal of Drug Issues, 49(3), pp. 477–492. doi: 10.1177/0022042619833911.
Jamnik, M.R. and Lane, D.J. (2017) ‘The Use of Reddit as an Inexpensive Source for High-Quality Data’, Practical Assessment, Research & Evaluation, 22(4/5), pp. 1–10.
Shatz, I. (2016) ‘Fast, Free, and Targeted: Reddit as a Source for Recruiting Participants Online’, Social Science Computer Review, 35(4), pp. 537–549. doi: 10.1177/0894439316650163.
Note: This blogpost was originally posted on Medium in November 2019.