On the impossibility of separating intelligence and judgment: Computational difficulties in filtering for AI tuning

As large-scale language models (LLMs) are increasingly introduced, there is concern that they can be misused to generate harmful content. Our research studies coordination challenges, focusing on filters that prevent the generation of unsafe information. Two natural points of intervention are filtering input prompts before they reach the model, and filtering output after they are generated. Our main results demonstrate the computational challenges in filtering both prompts and output. First, we show that there are LLMs without efficient prompt filters. Adversarial prompts that induce harmful behavior are easy to construct and computationally indistinguishable from benign prompts by efficient filters. The second main result identifies natural settings for which output filtering is computationally intractable. All of our separation results are based on cryptographic strength assumptions. In addition to these core findings, we formalize and study a relaxed relaxation approach and demonstrate further computational barriers. We conclude that security cannot be achieved by designing filters outside the internal structure (architecture and weights) of the LLM. In particular, black-box access to LLM is not sufficient. Based on our technical results, we argue that the intelligence of a tailored AI system is inseparable from its judgment.

† Ludwig-Maximilians-University of Munich (MCML)
‡ University of California, Berkeley
§ JPSM University of Maryland
¶ Stanford University

Source link

創建binance帳戶 commented on MEGA sconto del 34% su Amazon: Your article helped me a lot, is there any more re
binance registrering commented on Global Industrial Automation Services Market Size to Reach: Your point of view caught my eye and was very inte
binance commented on WestMetric Defends Controversial On-Page SEO Services for the Era of AI: I don't think the title of your article matches th
创建个人账户 commented on AI in CMO Strategy: Transforming Marketing Leadership: Can you be more specific about the content of your
binance account creation commented on The rise of Artificial Intelligence in Film & TV: Thank you for your sharing. I am worried that I la

On the impossibility of separating intelligence and judgment: Computational difficulties in filtering for AI tuning

RECENT POSTS

Pharma 4.0: Digital Integration, LIMS, and AI in the Lab

YouTube now automatically detects and labels AI videos even if the creator hasn’t published them

Hewlett Packard Enterprise (HPE) stock valuation after record profits and accelerated AI infrastructure growth

Related Posts