PMID- 36046150
OWN - NLM
STAT- PubMed-not-MEDLINE
LR  - 20220907
IS  - 2624-8212 (Electronic)
IS  - 2624-8212 (Linking)
VI  - 5
DP  - 2022
TI  - Asian hate speech detection on Twitter during COVID-19.
PG  - 932381
LID - 10.3389/frai.2022.932381 [doi]
LID - 932381
AB  - Coronavirus disease 2019 (COVID-19) emerged in Wuhan, China, in late 2019, and 
      after proving highly contagious in Asian countries, it rapidly spread to other 
      countries. This disease caused governments worldwide to declare a public health 
      crisis with severe measures taken to reduce the speed of the spread of the 
      disease. This pandemic affected the lives of millions of people. Many citizens 
      who lost loved ones and jobs experienced a wide range of emotions, such as 
      disbelief, shock, concerns about health, fear about food supplies, anxiety, and 
      panic. All of the aforementioned phenomena led to the spread of racism and hate 
      against Asians in western countries, especially in the United States. An analysis 
      of official preliminary police data by the Center for the Study of Hate & 
      Extremism at California State University shows that anti-Asian hate crime in 16 
      of America's largest cities increased by 149% in 2020. In this study, we first 
      choose a baseline of Americans' hate crimes against Asians on Twitter. Then we 
      present an approach to balance the biased dataset and consequently improve the 
      performance of tweet classification. We also downloaded 10 million tweets 
      through the Twitter API v2. In this study, we used a small portion of them, 
      and we will use the entire dataset in a future study. In this article, three 
      thousand tweets from our collected corpus were annotated by four annotators, 
      including three Asian and one Asian-American. Using this data, we built 
      predictive models of hate speech using various machine learning and deep learning 
      methods. Our machine learning methods include Random Forest, K-nearest neighbors 
      (KNN), Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), 
      Logistic Regression, Decision Tree, and Naive Bayes. Our deep learning models 
      include basic Long Short-Term Memory (LSTM), Bidirectional LSTM, 
      Bidirectional LSTM with Dropout, Convolution, and Bidirectional Encoder 
      Representations from Transformers (BERT). We also adjusted our dataset by 
      filtering out tweets that were ambiguous to the annotators, based on low Fleiss' 
      kappa agreement between annotators. Our final results showed that Logistic Regression 
      achieved the best statistical machine learning performance with an F1 score of 
      0.72, while BERT achieved the best performance of the deep learning models, with 
      an F1 score of 0.85.
CI  - Copyright (c) 2022 Toliyat, Levitan, Peng and Etemadpour.
FAU - Toliyat, Amir
AU  - Toliyat A
AD  - Computer Science Program, Graduate Center, City University of New York, New York, 
      NY, United States.
FAU - Levitan, Sarah Ita
AU  - Levitan SI
AD  - Computer Science Program, Hunter College, City University of New York, New York, 
      NY, United States.
FAU - Peng, Zheng
AU  - Peng Z
AD  - Computer Science Program, City College, City University of New York, New York, 
      NY, United States.
FAU - Etemadpour, Ronak
AU  - Etemadpour R
AD  - Computer Science Program, City College, City University of New York, New York, 
      NY, United States.
LA  - eng
PT  - Journal Article
DEP - 20220815
PL  - Switzerland
TA  - Front Artif Intell
JT  - Frontiers in artificial intelligence
JID - 101770551
PMC - PMC9421075
OTO - NOTNLM
OT  - Asian hate crime
OT  - COVID-19
OT  - Twitter
OT  - machine learning
OT  - natural language processing
COIS- The authors declare that the research was conducted in the absence of any 
      commercial or financial relationships that could be construed as a potential 
      conflict of interest.
EDAT- 2022/09/02 06:00
MHDA- 2022/09/02 06:01
PMCR- 2022/08/15
CRDT- 2022/09/01 02:23
PHST- 2022/04/29 00:00 [received]
PHST- 2022/06/27 00:00 [accepted]
PHST- 2022/09/01 02:23 [entrez]
PHST- 2022/09/02 06:00 [pubmed]
PHST- 2022/09/02 06:01 [medline]
PHST- 2022/08/15 00:00 [pmc-release]
AID - 10.3389/frai.2022.932381 [doi]
PST - epublish
SO  - Front Artif Intell. 2022 Aug 15;5:932381. doi: 10.3389/frai.2022.932381. 
      eCollection 2022.