New students join the lab!
- Vanja Vekić Chen, MA student
- Romina Hashemi, MA student
- Jodie Lee, Undergraduate student - Undergraduate Student Research Assistant (USRA)
- Amber Rynearson, MA student
A summary of our contributions on the Gender Gap Tracker:
1. Dashboards and code:
- Main dashboard with summary results:
- Research dashboard with text analyzer and topic modelling results:
- Code:
2. Op-eds and commentary:
- P. Rao, L. Chambers and M. Taboada. "". Blog post. November 30, 2022.
- P. Rao, M. Taboada and S. Graydon. "". Poynter. October 28, 2021.
- M. Taboada "". The Conversation. November 25, 2020.
- M. Taboada and F. Torabi Asr. “”. The Conversation. February 3, 2019. (Translated as "".)
3. Academic papers
- Rao, P. and M. Taboada (2021) . Frontiers in Artificial Intelligence – Language and Computation 4(82). doi: 10.3389/frai.2021.664737.
- Asr, F.T., M. Mazraeh, A. Lopes, V. Gautam, J. Gonzales, P. Rao and M. Taboada (2021) PLoS ONE 16(1): e0245533.
Op-ed on the Gender Gap Tracker and its third birthday:
- P. Rao, M. Taboada and S. Graydon. "". Poynter. October 28, 2021.
And a longer blog post with more details, publications, and statistics:
Op-ed on the language of fake news, in Items, the journal of the Social Science Research Council:
- M. Taboada. “”. Items – Insights from the Social Sciences. September 7, 2021.
An update on our project about online news comments. Three more papers (#8, #9, and #10 below) on news comments and a summary of our findings:
- Raw data
- Kolhatkar, V., H. Wu, L. Cavasso, E. Francis, K. Shukla and M. Taboada (2018) The SFU Opinion and Comments Corpus: A corpus for the analysis of online news comments. Simon Fraser University. DOI:
- GitHub page, with link to download the corpus:
- Paper describing the raw data (with small annotations)
- Kolhatkar, V., H. Wu, L. Cavasso, E. Francis, K. Shukla and M. Taboada (2020) . Corpus Pragmatics 4: 155–190.
- Annotated data (12,000 comments), in collaboration with
- Kolhatkar, V., Thain, N., Sorensen, J., Dixon, L., Taboada, M., 2020. C3: The Constructive Comments Corpus. Jigsaw and Simon Fraser University. Dataset. DOI: .
- Paper describing the large-scale annotation
- Kolhatkar, V., N. Thain, J. Sorensen, L. Dixon and M. Taboada (to appear) First Monday. Available on arXiv:
- Register analysis: Are news comments like conversations? (tl;dr: NO)
- Ehret, K. and M. Taboada (2020) . Register Studies 2(1): 1-36.
- Subjectivity analysis: How complex are news comments vs. opinion articles? (tl;dr: it's complex)
- Ehret, K. and M. Taboada (2021) . 23(2): 141-165.
- Constructiveness and toxicity across 3 newspapers:
- Op-ed. Gautam, V. and M. Taboada. 2019. “”. The Tyee.
- NEW!!! Register analysis, again. If not like conversation, what are comments like? (Answer: a hybrid register):
- Ehret, K. and M. Taboada (2021) . 4(79): 10.3389/frai.2021.643770.
- NEW!!! Appraisal analysis. Comments are very negative. They tend to express evaluation as Judgement or Appreciation (rather than Affect).
- Cavasso, L. and M. Taboada (2021) . 4: 1-38.
- NEW!!! Concessive relations in comments. Concessions have an interpersonal function and are used for evaluation and argumentation, especially in constructive comments.
- Gómez-González, MLA and M. Taboada (2021) 174: 96-116.
We have learned a lot about online news comments, mainly that they are very complex and more like essays than casual conversation.
We have been working for almost 3 years now on a project analyzing the gender gap in Canadian media. We have created a summary dashboard with overall statistics and a research dashboard analyzing topics and top-quoted sources. We can also now share the great news that a research paper on the Gender Gap Tracker has been published!
- Asr, F.T., M. Mazraeh, A. Lopes, V. Gautam, J. Gonzales, P. Rao and M. Taboada (2021) 16(1): e0245533.
Our findings:
- In 2 years of Canadian news media, the percentage of women quoted is regularly below 30%
- Women authors quote more women
- Politicians dominate in the news
- NLP can help us find these patterns in data
Op-ed on women quoted during COVID-19:
- . The Conversation. November 25, 2020.
Op-ed by Lucas Chambers and Maite Taboada on media coverage of elections:
- . Canadian Science Policy Centre. November 3, 2020.
The lab has been busy analyzing news comments. Here, all in one place, are the papers and the data that we have produced:
- Raw data
- Kolhatkar, V., H. Wu, L. Cavasso, E. Francis, K. Shukla and M. Taboada (2018) The SFU Opinion and Comments Corpus: A corpus for the analysis of online news comments. Simon Fraser University. DOI:
- GitHub page, with link to download the corpus:
- Paper describing the raw data (with small annotations)
- Kolhatkar, V., H. Wu, L. Cavasso, E. Francis, K. Shukla and M. Taboada (2020) . Corpus Pragmatics 4: 155–190.
- Annotated data
- Kolhatkar, V., Thain, N., Sorensen, J., Dixon, L., Taboada, M., 2020. C3: The Constructive Comments Corpus. Jigsaw and Simon Fraser University. Dataset. DOI: .
- Paper describing the large-scale annotation
- Kolhatkar, V., N. Thain, J. Sorensen, L. Dixon and M. Taboada (to appear) First Monday. Available on arXiv:
- Register analysis: Are news comments like conversations? (tl;dr: NO)
- Ehret, K. and M. Taboada (2020) . Register Studies 2(1): 1-36.
- Subjectivity analysis: How complex are news comments vs. opinion articles? (tl;dr: it's complex)
- Ehret, K. and M. Taboada (to appear) Discourse Studies.
We analyzed more than 1.5 million comments from 3 news organizations. We found more constructive comments than we expected, that toxicity occurs equally across topics, and that some news outlets have better commenters than others. The full story:
- Gautam, V. and M. Taboada (2019) . Report. Simon Fraser University. November 2019.
We also published a short version as an op-ed for The Tyee:
- Gautam, V. and M. Taboada. . The Tyee (online). November 6, 2019.
, has been available online for a while. Now the paper describing it is also online:
- Kolhatkar, V., H. Wu, L. Cavasso, E. Francis, K. Shukla and M. Taboada (to appear) . .
Maite is participating in a project that studies online abuse against candidates in the 2019 Canadian federal election, with Heidi Tworek from the University of British Columbia as principal investigator:
Preliminary results of the analysis will be made available early in 2020.
Discourse Processing Lab postdoc Fatemeh Torabi Asr publishes
- Discussed in this
- Reprinted by , by , by , by , by .
Maite discusses the in Spanish:
- Article and radio interview from
- Article in
Paper on data quality for misinformation detection now available, open access:
Asr, F.T. and M. Taboada (2019) . January-June 2019:1-14.
Paper too long to read? There's a !
The lab participated again in the , showcasing the .
See a summary of .
A short video about Maite's research, especially on sentiment analysis.
The Discourse Lab is hosting a talk by visiting researcher Maite Martín.
Title: Affective and Social Computing in Spanish using Human Language Technologies
Speaker: Maite Martín, Universidad de Jaén (Spain)
When: Friday, June 22, 1 pm
Where: RCB 7402
Abstract: In this talk I will present some past projects and work in progress in which my research group SINAI (Sistemas INteligentes de Acceso a la Información – Intelligent systems for information access) is involved. Our area of specialization focuses on the development of techniques and tools to solve problems related to Human Language Technologies (HLT). I will briefly discuss our research oriented to Information Retrieval Systems (IRS), mainly in the biomedical domain. We are integrating heterogeneous sources of medical and general information (UMLS, Google, SciELO, DBpedia…) in order to improve the final IRS. I will also highlight the work we have done in the field of affective computing, mainly focused on Spanish and on the social web. Although a lot of work has already been done in opinion mining, we think the real challenge is to recognize and analyse emotion expressed in textual documents. Finally, I will describe future projects related to early detection of mental health problems (depression, anxiety, cyberbullying…) by analysing the textual information written in social networks. I will show some demos implemented by SINAI.
Short Bio
Dr. Maite Martín is Associate Professor in the Computer Science department of the University of Jaén (Spain). She received her Master's degree in Computer Science at the University of Granada, and her PhD in Computer Science at the University of Málaga. She has been teaching different courses at the University since 1995. She has been a member of the research group SINAI (Sistemas INteligentes de Acceso a la Información – Intelligent systems for information access) since 2000. Her scientific interests include several areas related to Human Language Technologies such as Information Retrieval, Machine Learning, Text Mining and Sentiment Analysis. She has been a member of programme committees of several international and national conferences. In addition, she has participated in more than 30 national research projects serving as lead researcher in some of them. She has published more than a hundred conference papers, journal papers, books and book chapters. Martín is the current treasurer of the Spanish Society of Natural Language Processing (SEPLN – Sociedad Española para el Procesamiento del Lenguaje Natural). She is editor of a number of issues of the journal Procesamiento de Lenguaje Natural (Natural Language Processing). She has also been an invited speaker at several conferences.
A couple of interviews on trolls and social media:
- CKNW in Vancouver, .
- CHQR in Calgary, .
in The Conversation about our research on online news comments. Trolls, toxicity and constructive conversations.
Maite is part of a panel discussing the documentary , about content moderation in social media.
is a visiting PhD researcher from the University of the Basque Country in Spain. He will be in the lab between February and May, doing research on rhetorical relations and sentiment in Basque.
We have just released the SFU Opinion and Comments Corpus (SOCC), a corpus for the analysis of online news comments. Our corpus contains comments and the articles from which the comments originated. The articles are all opinion articles, not hard news articles. The corpus is larger than any other currently available comments corpora, and has been collected with attention to preserving reply structures and other metadata. In addition to the raw corpus, we also present annotations for four different phenomena: constructiveness, toxicity, negation and its scope, and appraisal.
Full details, and download link, are available from our GitHub project page:
For more information about this work, please see our papers.
- Kolhatkar, V., H. Wu, L. Cavasso, E. Francis, K. Shukla and M. Taboada (2018) . Journal paper under review.
- Kolhatkar, V. and M. Taboada (2017) . , Conference on Empirical Methods in Natural Language Processing. Copenhagen. September 2017.
- Kolhatkar, V. and M. Taboada (2017) . , 55th Annual Meeting of the Association for Computational Linguistics. Vancouver. August 2017, pp. 11-17.
Contact:
Varada Kolhatkar (vkolhatk@sfu.ca)
Maite Taboada (mtaboada@sfu.ca)
Our postdoctoral researcher, Katharina Ehret, has been featured in an article on the Faculty of Arts and Social Sciences webpage.
Visiting Researcher
Dr. Goddard, from Griffith University in Australia, is visiting the Discourse Lab between October 18 and November 10. Dr. Goddard is a long-time collaborator and is here thanks to an SFU-Griffith Collaborative Travel Grant.
The lab has grown! We have two new undergraduate students, a new master's student, and two new postdocs. It'll be a busy semester!
Speaker: Muhammad Abdul-Mageed, Assistant Professor of Information Science in the iSchool at UBC.
Abstract: Accurate detection of emotion from natural language has applications ranging from building emotional chatbots to better understanding individuals and their lives. However, progress on emotion detection has been hampered by the absence of large labeled datasets. In this work, we build a very large dataset for fine-grained emotions and develop deep learning models on it. We achieve a new state-of-the-art on 24 fine-grained types of emotions (with an average accuracy of 87.58%). We also extend the task beyond emotion types to model Robert Plutchik’s 8 primary emotion dimensions, acquiring a superior accuracy of 95.68%.
will be a visiting PhD researcher in the lab until August. She is conducting cross-linguistic research on socio-semiotic processes in privacy policies, using a systemic-functional linguistics approach.
Presentation on Spark:
-- MapReduce
-- Spark dataframe udf
-- search engine, Spark GraphFrame
-- Spark MLLIB, Scikit Learn
-- Spark pipeline with coreNLP
Installation instructions for WebAnno
Speaker: Enamul Hoque
Abstract: Analyzing and gaining insights from a large amount of online conversations can be quite challenging for a user, especially when the discussions become very long. During my doctoral research, I have focused on integrating Information Visualization (InfoVis) with Natural Language Processing (NLP) techniques to better support the user’s task of exploring and analyzing conversations. For this purpose, I have designed a visual text analytics system that supports user exploration, starting from a possibly large set of conversations, then narrowing down to a subset of conversations, and eventually drilling down to a set of comments of one conversation. Our evaluations through case studies with domain experts and a formal user study with regular blog readers illustrate the potential benefits of our approach, when compared to a traditional blog reading interface.