Illinois Institute of Technology Project Earns $1.6 Million


Image: Shlomo Argamon and Kai Shu

Credit: Illinois Institute of Technology

CHICAGO, May 18, 2023 - Illinois Institute of Technology researchers have been awarded a $1.6 million contract to develop a breakthrough system for authorship attribution and anonymization. The program, known as AUTHOR, promises to use natural language processing and machine learning to create a “stylistic fingerprint” for reliable author identification, while also offering a robust solution for anonymization. With a wide range of potential applications, including counterintelligence, fighting misinformation, and even investigating the origins of ancient religious texts, the project represents a major advance in computational analysis.

A joint project with Charles River Analytics, Rensselaer Polytechnic Institute, Aston University, and the Howard Brain Science Foundation, AUTHOR is funded from an $11.3 million pool allocated by the Human Interpretable Attribution of Text using Underlying Structure (HIATUS) program of the Intelligence Advanced Research Projects Activity (IARPA), a research organization within the Office of the Director of National Intelligence.

AUTHOR (Attribution, and Undermining the Attribution, of Text while providing Human-Oriented Rationales) aims to accurately capture an author’s unique writing style through a sophisticated blend of natural language processing and machine learning. The project is led by Shlomo Argamon, professor of computer science and dean of the Computer Science Department at Illinois Institute of Technology, and Kai Shu, Gladwin Development Chair Assistant Professor of Computer Science.

“There are many types of attribution tasks,” says Argamon, who has more than 25 years of research experience in the field. “One is when you have a particular author you want to identify in various texts. Another is when you have a particular text you want to attribute to one of many potential authors. Yet another is simply determining whether texts were written by the same person.”
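To make the attribution tasks Argamon describes concrete, the short Python sketch below shows a toy “stylistic fingerprint.” It is an illustration only and not the AUTHOR system: it assumes a fingerprint built from the relative frequencies of a handful of common function words, compared by cosine similarity. The word list, the 0.9 threshold, and the function names `fingerprint`, `attribute`, and `same_author` are all hypothetical choices made for this example.

```python
import math
import re
from collections import Counter

# Hypothetical, very small function-word list chosen for illustration;
# real stylometric systems use far richer features.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that",
                  "it", "is", "was", "for", "with", "but", "not"]


def fingerprint(text):
    """Return the relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def attribute(text, candidates):
    """Pick, from a dict of author name -> known writing, the most similar author."""
    target = fingerprint(text)
    return max(candidates,
               key=lambda name: cosine(target, fingerprint(candidates[name])))


def same_author(text_a, text_b, threshold=0.9):
    """Verification: do two texts look alike under this toy fingerprint?"""
    return cosine(fingerprint(text_a), fingerprint(text_b)) >= threshold
```

Here `attribute` corresponds to assigning a text to one of many potential authors, and `same_author` to deciding whether texts were written by the same person. A fingerprint this simple would be expected to struggle when the documents being compared are of different types.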

Argamon and Shu also aim to address the growing urgency caused by malicious online activity and machine-generated misinformation.

“With a large language model like GPT-3, these ‘bots’ may be able to generate human-like text,” Shu says. “Our research uses deep generative models and style transfer techniques to explore the boundaries between human-written and machine-generated text.”

One of the core challenges the team is trying to overcome is the limitations of current methods of authorship analysis and obfuscation. Part of the problem arises when the type of document in question differs from the known documents used to identify the author, since there are inherent stylistic differences between forms of writing such as personal letters, academic papers, and short stories.

“Current best practices are not very effective when the type of the test document is different from that of the training documents,” says Argamon. “We are developing author models that incorporate such stylistic-domain dependencies to enable more generally effective attribution.”

The project also tackles the challenge of authorship obfuscation: changing the style of a text while preserving its meaning. The team integrates deep learning and semantic knowledge representation to generate restyled text that retains the meaning of the original content. This dual capability (attribution and obfuscation) distinguishes AUTHOR from existing algorithms.

Unlike existing systems, AUTHOR provides a clear rationale for its authorship identifications, adding transparency and credibility to its results.


Disclaimer: AAAS and EurekAlert! are not responsible for the accuracy of news releases posted to EurekAlert! by contributing institutions or for the use of any information through the EurekAlert system.


