Deception in Language
Language is a medium for conveying information, but unfortunately it is also often used to deceive. We have explored the detection of deceptive language in online reviews and found that the stylometric aspects of language play an important role in exposing the deceptive intent of writers (Feng, Banerjee, and Choi; 2012a). We have also studied how such language is produced, investigating the differences in how people type when writing truthful as opposed to deceptive texts, and found interesting parallels between typing patterns and speech patterns when people lie (Banerjee et al. 2014). On a related note, we investigated stylometric aspects of language to identify the traits of individual writers (Feng, Banerjee, and Choi; 2012b).
Research Group
Ritwik Banerjee, Research Assistant Professor, Stony Brook University
Song Feng, Sr. Applied Scientist, Amazon Web Services
Jun S. Kang, Sr. Software Engineer, Blink Health
Yejin Choi, Professor of Computer Science, University of Washington
Research Products
- [Feng, Banerjee, and Choi; 2012a] Song Feng, Ritwik Banerjee, and Yejin Choi. Syntactic Stylometry for Deception Detection. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Vol. 2: Short Papers), pp. 171-175. Association for Computational Linguistics, 2012. [ PDF ]
- [Feng, Banerjee, and Choi; 2012b] Song Feng, Ritwik Banerjee, and Yejin Choi. Characterizing Stylistic Elements in Syntactic Structure. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 1522-1533. Association for Computational Linguistics, 2012. [ PDF ]
- [Banerjee et al. 2014] Ritwik Banerjee, Song Feng, Jun Seok Kang, and Yejin Choi. Keystroke Patterns as Prosody in Digital Writings: A Case Study with Deceptive Reviews and Essays. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1469-1473. Association for Computational Linguistics, 2014. [ PDF ]
- The dataset contains truthful and deceptive writings from two domains: business reviews, and essays on two topics of social interest (gun control and gay marriage). The data is available for download as compressed tar.bz2 files: