Many people start working with AI by making the machine imagine itself to be an expert at the task they want it to perform, but boffins have found that this technique can backfire.
Persona-based prompts, which use instructions like “You are a machine learning expert” in model prompts, date back to 2023, when researchers began investigating how role-playing instructions affect the output of AI models.
It’s now common to find prompting guides online that include sentences like, “You are a professional full-stack developer tasked with building a complete, production-ready full-stack web application from scratch.”
However, scholars who have studied this approach report that it does not always yield superior results.
In a preprint paper titled “Expert personas improve LLM alignment but reduce accuracy: Bootstrapping intent-based persona routing using PRISM,” researchers at the University of Southern California (USC) say they found that persona-based prompts are task-dependent, which explains the mixed results.
For tasks that hinge on alignment, such as writing, role-playing, and safety, personas improve model performance. For tasks that hinge on knowledge learned during pre-training, such as math or coding, the technique produces worse results.
The reason appears to be that telling a model it is an expert in a field does not actually confer expertise or add facts to its training data. Worse, the persona prefix seems to impede the model’s ability to retrieve facts it already holds from pre-training.
The researchers tested persona-based prompts using the Measuring Massive Multitask Language Understanding (MMLU) benchmark, a standard means of evaluating LLM performance, and found that “when LLMs are asked to decide between multiple-choice answers, the expert persona consistently underperforms the base model across all four subject categories (overall accuracy: 68.0 percent vs. 71.6 percent for the base model). A possible explanation is that the persona prefix is activating the model’s ‘follow instructions’ mode that would otherwise be devoted to recalling facts.”
Persona-based prompts can, however, steer the model toward responses that LLM-based judges rate as better aligned. For example, the authors note that “a dedicated ‘safety monitor’ persona increases attack denial rates across all three safety benchmarks, with the largest increase seen in JailbreakBench (17.7 percentage points increase from 53.2 percent to 70.9 percent).”
Based on the research, asking AI to adopt the persona of an expert programmer does not improve the quality or usefulness of the code it produces, said Zizhao Hu, a doctoral student at USC and one of the study’s co-authors, in an email.
But pointing to the guide quoted above, Hu said, “Many other aspects, such as UI preferences, project architecture, and tooling preferences, lean toward alignment and can benefit from detailed personas.”
“In the example provided, we believe that a general expert persona such as ‘You are a full-stack professional developer’ is not necessary, but fine-grained and personalized project requirements could help the model generate code that meets the user’s requirements.”
Given the mixed effectiveness of expert persona prompts, the researchers (Hu and colleagues Mohammad Rostami and Jesse Thomason) proposed a technique called PRISM (Persona Routing with Intent-Based Self-Modeling) that attempts to capture the benefits of expert personas without the accuracy penalty.
“We use a gated LoRA [low-rank adaptation] mechanism in which the base model is fully preserved and is used for generations that rely on pre-trained knowledge,” he explained, adding, “This decision process is learned by the gates.”
The LoRA adapter is activated when persona-style behavior improves the output; otherwise the system falls back to the unchanged base model.
The researchers designed PRISM to avoid the tradeoffs of other approaches: prompt-based routing, which applies an expert persona during inference, and supervised fine-tuning, which incorporates behavior into model weights.
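Numerically, a gated LoRA layer adds a low-rank correction to a frozen weight matrix only when a gate fires. The toy sketch below (shapes, seed, and names are illustrative, and the scalar gate stands in for the learned router described in the paper) shows why the base model is exactly preserved when the gate is closed:

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 8, 2                      # model dimension, LoRA rank (r << d)
W = rng.normal(size=(d, d))      # frozen pre-trained weight matrix
A = rng.normal(size=(r, d))      # LoRA down-projection (trainable)
B = rng.normal(size=(d, r))      # LoRA up-projection (trainable)


def forward(x: np.ndarray, gate: float) -> np.ndarray:
    """Base output plus a gated low-rank correction.

    gate = 0.0 -> exact base model (pre-trained knowledge preserved);
    gate = 1.0 -> persona-adapted behavior via the LoRA path.
    """
    return W @ x + gate * (B @ (A @ x))


x = rng.normal(size=d)
base_out = forward(x, gate=0.0)      # identical to W @ x
adapted_out = forward(x, gate=1.0)   # base plus the LoRA delta
```

Because the correction is additive and the gate multiplies it, closing the gate recovers the base model bit-for-bit rather than approximately, which is the property PRISM relies on for fact-recall queries.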
When asked whether there was a way to generalize about effective prompting methods, Hu said: “While we can’t say for sure about general prompts, the potential takeaway from our findings with expert persona prompts is: ‘If you care about alignment (safety, rules, structure following, etc.), make your requirements specific. If you care more about accuracy and facts, don’t add anything, just submit the query.’” ®
