Tuning Hypothesis Creation: Combining Discrete and Continuous Spaces for Zero-Shot Hate Speech Detection

Barbon Junior S.
2025-01-01

Abstract

Hate speech detection is crucial to keeping online spaces safe, but the effectiveness of supervised approaches depends largely on the availability of annotated data. Prior research has explored natural language inference (NLI) models for zero-shot hate speech detection (ZSHSD), leveraging the capacity of these models to learn semantic relationships and adapt to downstream tasks without large annotated datasets. NLI models assess whether a premise sentence logically entails a hypothesis sentence, so downstream task performance depends on precise hypothesis design. Existing frameworks that use NLI models for ZSHSD run multiple inferences with different hypotheses to extract the characteristics needed for good results. Given the challenges of ZSHSD and the limits of designing hypotheses from discrete tokens, we aim to optimize and identify ideal intermediate representations by applying p-tuning techniques. On HateCheck, a fully tuned hypothesis led to an 18.8 percentage point (pp) accuracy improvement over a hypothesis designed from discrete tokens. Our approach also improved accuracy by 5.6 pp over prior work that required multiple inferences. In addition, the optimized tokens reveal relations to broader aspects of hate speech, offering insights for hypothesis design.
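The entailment-based decision described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `EntailmentScorer` interface, the `threshold` value, the example hypothesis wording, and the `toy_scorer` stand-in (simple word overlap, used only so the example runs without a model) are all assumptions introduced for illustration. In practice the scorer would wrap a real NLI model, and the hypothesis would be the discrete or tuned one under study.

```python
from typing import Callable

# Hypothetical interface: maps (premise, hypothesis) to an entailment
# probability in [0, 1]. A real NLI model could be wrapped to fit it.
EntailmentScorer = Callable[[str, str], float]


def zero_shot_hate_label(
    text: str,
    hypothesis: str,
    scorer: EntailmentScorer,
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> bool:
    """Return True when the input text (premise) is judged to entail the hypothesis."""
    return scorer(text, hypothesis) >= threshold


def toy_scorer(premise: str, hypothesis: str) -> float:
    """Toy stand-in for an NLI model: fraction of hypothesis words found in the premise."""
    premise_words = set(premise.lower().split())
    hypothesis_words = hypothesis.lower().split()
    if not hypothesis_words:
        return 0.0
    hits = sum(word in premise_words for word in hypothesis_words)
    return hits / len(hypothesis_words)


if __name__ == "__main__":
    # The hypothesis wording here is illustrative only.
    hypothesis = "contains hate"
    for text in ["this text contains hate", "nice weather today"]:
        print(text, "->", zero_shot_hate_label(text, hypothesis, toy_scorer))
```

Because a single hypothesis drives the whole decision, its wording matters a great deal. This is the design choice the abstract targets: rather than hand-picking discrete tokens, the hypothesis representation is optimized with p-tuning.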
Year: 2025
ISBN: 9783031790379
ISBN: 9783031790386

Use this identifier to cite or link to this document: https://hdl.handle.net/11368/3115688