Assistant professor’s two NSF grants aim to better sort social media content, identify online trolls
Projects totaling more than $737,000 will fund Blackburn's continuing research into bad actors on the internet.
The discussions happening on social media, both healthy and unhealthy, drive a lot of the public discourse and news coverage in our 21st-century world. Some people use platforms such as Facebook, Twitter and Reddit to make positive connections, but others prefer to sow misinformation and hate.
Given the popularity of those platforms and similar ones, which see millions of posts each day, it can be difficult for researchers to wrap their heads around what is being shared and how it affects our opinions on political and social topics.
Assistant Professor Jeremy Blackburn, a faculty member in the Department of Computer Science at Binghamton University’s Thomas J. Watson College of Engineering and Applied Science since 2019, is devising ways to make online content easier to gather and sort, particularly from emerging social media platforms.
Blackburn recently received a five-year, $517,484 National Science Foundation CAREER Award for his project “Towards a Data-driven Understanding of Online Sentiment.” The CAREER Award supports early-career faculty who have the potential to serve as academic role models in research and education.
The project includes four objectives:
- Create a multiplatform social media dataset, leveraging prior experience in large-scale data collection to build tools that continuously identify and collect multimedia data from emerging social media platforms.
- Develop data-driven techniques to understand coded language used in social media, both text and images.
- Develop a new system for rating the sentiment of content by comparing pieces rather than looking at them individually.
- Explore user and community-level modeling of online sentiment.
“A big focus is on images,” Blackburn said. “Can we infer the sentiment or the underlying meaning of an image? Images are used almost as much as text on the internet, and it’s hard to figure out what people are talking about if you can’t understand the visual language they’re using.”
Current algorithms classify the sentiment of an image by assessing it and assigning it an independent score, he said. For instance, one tweet may get a 0.4 on a predetermined “happiness scale,” while another one may get a 0.5; but what does that incremental difference mean for humans?
Instead, by showing two pieces of content and asking which is more positive, Blackburn hopes to get a better gauge of the emotion behind it. Complicating that endeavor, however, is knowing how images become memes among certain subsets of online commenters.
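A standard way to turn such pairwise judgments into scores is the Bradley-Terry model. The sketch below is illustrative only, not Blackburn's system: it fits a relative "positivity strength" for each post from "which of these two is more positive?" comparisons, so scores emerge from comparisons rather than from independent ratings.

```python
from collections import defaultdict

def bradley_terry(comparisons, iters=200):
    """comparisons: list of (winner, loser) pairs; returns item -> relative strength."""
    items = {x for pair in comparisons for x in pair}
    wins = defaultdict(int)    # total wins per item
    games = defaultdict(int)   # games played per unordered pair
    for w, l in comparisons:
        wins[w] += 1
        games[frozenset((w, l))] += 1
    strength = {i: 1.0 for i in items}
    for _ in range(iters):     # minorization-maximization updates
        new = {}
        for i in items:
            denom = sum(games[frozenset((i, j))] / (strength[i] + strength[j])
                        for j in items if j != i and games[frozenset((i, j))])
            new[i] = wins[i] / denom if denom else strength[i]
        total = sum(new.values())             # normalize so strengths stay comparable
        strength = {i: len(items) * s / total for i, s in new.items()}
    return strength
```

Given comparisons where post A is consistently judged more positive than B, and B more positive than C, the fitted strengths recover that ordering even when no absolute "happiness" score was ever assigned.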
“We’re not interested in just saying what’s in the image; we’re interested in saying how it’s being used,” he said. “We’re taking the adage that ‘a picture is worth 1,000 words’ and treating the picture as a piece of vocabulary. We have ways that can capture the look of it, but we’re also going to treat it like a word, as we do in a language model, and place it where it was used.
“For instance, if you tweet a picture, you may also include some words, and if we have enough of those samples, we can now figure out that someone is upset or sad or whatever the underlying meaning is. We can translate it into regular words.”
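The idea of treating an image as a vocabulary item can be illustrated with a toy co-occurrence model. The sketch below is a simplified stand-in for the learned embeddings such a system would actually use: each image is a placeholder token (the token names and posts here are invented), and it is "translated" into the ordinary words whose usage contexts it most resembles.

```python
from collections import Counter, defaultdict
from math import sqrt

def cooccurrence_vectors(posts):
    """posts: lists of tokens, where an image is a token like '<img:123>'."""
    vecs = defaultdict(Counter)
    for tokens in posts:
        for t in tokens:
            for u in tokens:
                if t != u:
                    vecs[t][u] += 1   # count words appearing alongside t
    return vecs

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def translate_image(image_token, posts, topn=3):
    """Return the ordinary words whose contexts best match the image's contexts."""
    vecs = cooccurrence_vectors(posts)
    words = [t for t in vecs if not t.startswith("<img:")]
    return sorted(words, key=lambda w: cosine(vecs[image_token], vecs[w]),
                  reverse=True)[:topn]
```

On toy data where a meme image keeps appearing next to words like "upset" and "sad," its context vector ends up far closer to those words than to unrelated ones like "happy," which is the intuition behind translating an image into regular words.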
Although the development of this new technology to monitor online sentiment could have many uses, such as in the political and business realms, Blackburn has a specific goal that he hopes to achieve.
“We could better understand violent content or hate speech online that is very coded, or we could identify misinformation so that people can’t hide this type of behavior by using just images,” he said. “That’s my personal passion and the reason why I’m developing it.”
Another recently awarded NSF project takes aim at better detecting so-called “troll” accounts that disseminate false information as part of larger influence campaigns on social media.
The two-year, $220,000 grant, a collaborative project, will collect information about the troll accounts identified by Twitter and Reddit as belonging to disinformation campaigns spearheaded by countries that are U.S. adversaries.
These malicious users are different from “bot” accounts that automatically post the same message in multiple places. They are coordinated to interact with each other and take multiple sides of the same argument just to sow discord among anyone watching.
One example, Blackburn said, is two troll accounts “arguing” about “Black Lives Matter” versus “All Lives Matter” not as a matter of principle but merely to spark drama among other users.
“Over time, the same troll account may take different positions on the same issue, because ultimately they don’t have a particular opinion — they just want to cause trouble,” he said. “They have to convince people to become engaged.”
The data collected for this project will be used to train machine-learning algorithms to identify troll accounts by codifying patterns of interactions that are uncommon in real accounts. Social media platforms then would be able to shut down the trolling without needing someone to moderate every questionable post.
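One way such a detector might work, sketched here with invented feature names and toy data rather than Blackburn's actual method: turn each account's interaction history into features that are rare for organic users, such as stance flips on a single topic and the share of replies directed at other flagged accounts, then fit a simple classifier on labeled examples.

```python
import math

def extract_features(account):
    """account: {'posts': [(topic, stance)], 'replies_to_flagged': int, 'total_replies': int}.
    Stances are +1/-1; flipping stance on the same topic is treated as suspicious."""
    flips, last = 0, {}
    for topic, stance in account["posts"]:
        if topic in last and last[topic] != stance:
            flips += 1
        last[topic] = stance
    flip_rate = flips / max(len(account["posts"]), 1)
    reply_rate = account["replies_to_flagged"] / max(account["total_replies"], 1)
    return [1.0, flip_rate, reply_rate]  # leading 1.0 is the bias term

def train(samples, labels, lr=0.5, epochs=2000):
    """Logistic regression via stochastic gradient descent."""
    w = [0.0] * len(samples[0])
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = 1 / (1 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
            w = [wi + lr * (y - p) * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    """True if the account looks like a troll (probability above 0.5)."""
    return 1 / (1 + math.exp(-sum(wi * xi for wi, xi in zip(w, x)))) > 0.5
```

An account that reverses its stance on the same topic and mostly replies to other flagged accounts scores as troll-like, while an account with consistent positions and organic reply patterns does not; a real system would use far richer interaction features, but the pipeline shape is the same.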
Blackburn’s CAREER project is titled “Towards a Data-driven Understanding of Online Sentiment.” The troll-detection project is titled “Detecting Accounts Involved in Influence Campaigns on Social Media.”