Research
Understanding human language plays a pivotal role in creating intelligent systems. With that in view, Banerjee's research spans multiple areas that bring together machine learning (ML) and natural language processing (NLP): biomedical knowledge discovery for better healthcare, misinformation analysis, and linguistics for security and privacy.
Language use varies considerably depending on the what (content), why (intent), who (speaker/writer and audience), and how (style). Natural language understanding can be improved based on these insights, and intelligent systems built using a deeper understanding of human language can be employed for immense social good. The language used in medical research, for instance, is highly specialized as it is meant for technical comprehension by other researchers in that field; but a system capable of understanding it, and extracting useful information from it, can help healthcare practitioners and patients. Quotidian language use, on the other hand, is dictated by various aspects of individual and collective human behavior. By interpreting language and drawing inferences from it, an intelligent system can be refined to provide better help to individuals and social groups.
Current Projects
Privacy Compliance of Live Medical Data
Mobile health apps offer great convenience, but they also raise serious concerns about the privacy of personal health data. This project is developing a framework to ensure that health apps not only comply with privacy regulations but also give users more control over, and transparency into, their information. By creating tools that help app developers design with privacy in mind, and models that make complex legal jargon clearer for everyone, this research helps users better understand how their data is collected, shared, and used. Ultimately, it aims to build trust in health apps, protect sensitive personal information, and foster a safer digital healthcare experience for all.
Information Extraction in Clinical Nephrology
Chronic kidney disease (CKD) is a serious health threat, and finding early warning signs is critical to preventing its progression. This study explores the possibility that mental health conditions like major depression and post-traumatic stress disorder (PTSD) could play a role in CKD development. Using supervised machine learning and natural language processing, the research analyzes clinical notes and radiology reports to uncover PTSD's potential link to CKD, with the aim of enabling earlier detection and more personalized care.
(Project page under development)
Fallacious Argumentation in Information Disorder
Banerjee and his team address the challenge of identifying subtle misinformation that spreads through manipulative language and deceptive arguments in today’s information landscape. By developing advanced computational models, this research goes beyond detecting outright falsehoods, focusing instead on the hidden tactics that distort knowledge and erode trust. The research aims to expose this epistemic corruption by recognizing the fallacious arguments that often slip under the radar, offering a more comprehensive approach to combating misinformation.
(Project page under development)
Past Projects
Tracking Semantic Change in Medical Information
In this research, Banerjee and his team tackle a major issue in today's digital world: how medical information can change as it moves from research papers to news articles and social media posts. By focusing on subtle shifts in meaning, such as oversimplification or selective reporting, the project reveals how even well-cited medical news can mislead readers. Using advanced information retrieval (IR) and deep neural network models, the team studies these shifts without relying on human judgment, marking the first AI-based analysis of health misinformation. One major finding showed that at least 1% of social media posts citing reputable news sources used those citations deceptively, creating false trust.
The research produced new tools for verifying medical claims across genres, provided valuable data sets, and trained the next generation of researchers. Through this project, Banerjee and his team highlight how AI can help protect readers from misleading information in health-related news and social media without relying on the opinions of external experts.
Extraction & Classification of Financial Information
A multi-year project focused on fine-grained document-type classification and information extraction from complex financial texts, with applications to the scalable automation of a broad range of tasks.
(Details of this research are proprietary)
Semantic Similarity of Clinical Texts
Hospitals amass crucial textual data for healthcare, often in disorganized forms within Electronic Health Record (EHR) systems. Measuring semantic textual similarity (STS) between clinical texts mitigates the associated problems by streamlining data and reducing redundancy while preserving valuable information and highlighting new information.
Literature-based Medical Knowledge Discovery
This research designed an AI-driven solution, and developed a prototype system, to automate updating medical databases with new findings. It uses pharmacodynamic similarities between drugs to identify potentially beneficial drugs and drug categories for specific diseases, symptoms, or syndromes, even when the model had no prior knowledge of those drugs during training.
Deception in Language
We think of language as a medium for conveying information, but unfortunately, it is also often used to deceive. This research explored the detection of such deception in online reviews using deep, interpretable linguistic properties.
Forensic Linguistics
Research into "idiolects" (the unique use of language by individuals) to identify authors, including the authorship of collaborative multi-author documents.
Personalized Healthcare
Healthcare faces a challenge in utilizing patient-specific data for personalized medicine. The AI developed in this research extracts relevant patient details from diverse data sources, facilitating tailored lab tests, detecting drug reactions, and linking symptoms to safe medications.
Network Outage Analysis
A collaboration with researchers in computer networks to extract network outage information and develop supervised machine learning approaches that categorize outages into multiple causal categories.