This section provides an overview of the research approach and the machine learning models and techniques employed in this study. Two traditional machine learning models, naive Bayes and logistic regression, are used as baselines. Their performance is compared with that of two state-of-the-art transformer-based models, BERT and DistilBERT, which capture contextual dependencies in text data. The code for these models is publicly available33. Additionally, all four models can be run on text input via the online tools described in the Interactive Detection Tools section.
Approach and experimental design
As explained in the introduction, the AI content detection problem is broken down into two binary classification tasks: distinguishing human-written text from AI-generated text, and distinguishing human-written text from AI-reformed text. There are two rationales for this separation. First, some universities may allow AI-reformed application materials but ban AI-generated ones. This distinction arises because AI-reformed documents can still be considered to be of human origin: the AI tool is used only to fix grammatical mistakes and improve the writing style. It is therefore necessary to distinguish between these two types of AI involvement. Second, although the problem could be approached with a multiclass classifier over the three document types, the similarity between AI-generated and AI-reformed text makes this approach more difficult and blurs the class boundaries. Furthermore, metrics such as precision, recall, and F1 score are most naturally defined for binary classification, so assessing the performance of multiclass classifiers is less straightforward.
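As an illustration of how the two binary tasks can be derived from a single labeled corpus, the sketch below keeps human-written documents as class 0 and one type of AI text as class 1, dropping the other AI type. The label strings, the toy sentences, and the function name `make_binary_task` are hypothetical; the study's actual data format is not described here.

```python
from typing import List, Tuple

# Hypothetical label names; the study's actual label encoding is not specified.
HUMAN, GENERATED, REFORMED = "human", "ai_generated", "ai_reformed"

def make_binary_task(docs: List[Tuple[str, str]], ai_label: str) -> List[Tuple[str, int]]:
    """Keep human texts (label 0) and texts of one AI type (label 1); drop the rest."""
    task = []
    for text, label in docs:
        if label == HUMAN:
            task.append((text, 0))
        elif label == ai_label:
            task.append((text, 1))
    return task

# Toy corpus with one document per type.
corpus = [
    ("I have always loved chemistry.", HUMAN),
    ("As an aspiring chemist, I am passionate about discovery.", GENERATED),
    ("I have always been deeply passionate about chemistry.", REFORMED),
]
generated_task = make_binary_task(corpus, GENERATED)  # human vs. AI-generated
reformed_task = make_binary_task(corpus, REFORMED)    # human vs. AI-reformed
print(generated_task)
print(reformed_task)
```

Each task shares the same human documents, so a binary classifier is trained per task rather than one three-way classifier.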
For each classification task, two experiments were conducted to assess the effectiveness of domain-specific models and their cross-domain generalizability. In the first experiment (results detailed in Table 2), models were trained only on educational data (i.e., LORs and SOIs). The training and test sets were created by a random 4:1 split. Model performance was assessed using five metrics: overall accuracy, precision, recall, specificity, and F1 score. In addition to evaluating the models on the combined test data (i.e., SOI+LOR), we analyzed performance on each document type (LOR and SOI) separately and on 12,000 balanced cross-domain examples from the GPT-Wiki-Intro dataset.
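The 4:1 split and the five evaluation metrics can be sketched as follows. This is a minimal illustration, assuming AI-written text is the positive class (label 1) and human text the negative class (label 0); the function names and the fixed seed are hypothetical and not taken from the study's code.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def split_4_to_1(items: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T]]:
    """Randomly shuffle items and split them 80% train / 20% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]

def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict:
    """Accuracy, precision, recall, specificity, and F1, with 1 = AI text (positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # a.k.a. sensitivity
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # recall of the human class
        "f1": 2 * precision * recall / (precision + recall) if precision + recall else 0.0,
    }

train, test = split_4_to_1(range(100))
print(len(train), len(test))  # 80 20
print(binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

Specificity is reported alongside recall because, in this setting, it measures how often genuinely human-written documents are correctly left unflagged.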
In the second experiment, we replicated the same procedure but augmented the training data with a disjoint set of 48,000 balanced instances from the GPT-Wiki-Intro dataset. The model trained on this mixed-domain dataset (i.e., LORs+SOIs+Wiki data) showed significant improvements on the Wiki dataset with minimal impact on performance on the educational data. This result reinforces the hypothesis that developing AI content detectors for a particular domain is feasible. The results of the second experiment are shown in Table 3.
Machine Learning Algorithms
This section provides a brief introduction to the machine learning models used in this study. We chose BERT and DistilBERT for their wide adoption and proven reliability in NLP tasks. Although newer models exist, these are sufficient for our tasks and offer both computational efficiency and accessibility. The same rationale applies to the selection of the traditional machine learning models: naive Bayes (NB) and logistic regression (LR) were used, both of which have proven effective in detecting AI-generated content.
Naive Bayes
Naive Bayes (NB)10 is a probabilistic classification algorithm built on Bayes' theorem, with the “naive” assumption that the features \(\{x_1, x_2, \dots, x_n\}\) are conditionally independent given the class label \(y\). Mathematically,
\(p(x_1, x_2, \dots, x_n | y)=\prod_{i=1}^{n} p(x_i | y)\)
Although this assumption may not hold in all real-world scenarios, NB often serves as a strong baseline for text classification tasks. NB uses Bayes' theorem to calculate the probability of each class given the observed features and predicts, for unlabeled data, the class \(\hat{y}\) with the highest probability, that is,
\(\hat{y} = \arg\max_{y} \; p(y) \prod_{i=1}^{n} p(x_i | y)\)
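The decision rule above can be sketched as a small multinomial NB classifier. This is a minimal illustration, not the study's implementation: for simplicity it uses raw token counts with add-one (Laplace) smoothing rather than TF-IDF features, and it works in log space, where the product \(p(y)\prod_i p(x_i | y)\) becomes a sum and underflow is avoided. The toy tokens and labels (1 = AI text, 0 = human text) are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

def train_nb(docs: List[Tuple[List[str], int]], alpha: float = 1.0):
    """Estimate log p(y) and Laplace-smoothed log p(x_i | y) from tokenized docs."""
    class_counts = Counter(label for _, label in docs)
    word_counts: Dict[int, Counter] = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    log_prior = {y: math.log(n / len(docs)) for y, n in class_counts.items()}
    log_lik = {}
    for y in class_counts:
        total = sum(word_counts[y].values()) + alpha * len(vocab)
        log_lik[y] = {w: math.log((word_counts[y][w] + alpha) / total) for w in vocab}
    return log_prior, log_lik

def predict_nb(tokens: List[str], log_prior, log_lik) -> int:
    """Return argmax_y [log p(y) + sum_i log p(x_i | y)], skipping out-of-vocabulary words."""
    scores = {
        y: log_prior[y] + sum(log_lik[y][w] for w in tokens if w in log_lik[y])
        for y in log_prior
    }
    return max(scores, key=scores.get)

# Hypothetical toy training data: 1 = AI text, 0 = human text.
train_docs = [
    (["delve", "tapestry", "moreover"], 1),
    (["delve", "furthermore"], 1),
    (["i", "loved", "lab"], 0),
    (["my", "lab", "mentor"], 0),
]
prior, lik = train_nb(train_docs)
print(predict_nb(["delve", "tapestry"], prior, lik))  # 1
print(predict_nb(["lab", "mentor"], prior, lik))  # 0
```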
In the data preprocessing phase of the NB model, term frequency-inverse document frequency (TF-IDF)34 was applied to vectorize the text input. TF-IDF converts raw textual data into numerical features by taking into account two factors: the frequency of a term in a document (term frequency) and its importance across the dataset (inverse document frequency). By capturing the relative importance of words while reducing the influence of common terms, this method allows the model to prioritize highly discriminative terms in the classification task.
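A minimal TF-IDF sketch is shown below, assuming the textbook variant \(\text{tf}(t,d) = \text{count}(t,d)/|d|\) and \(\text{idf}(t) = \log(N/\text{df}(t))\). The study does not state its exact TF-IDF settings; libraries such as scikit-learn use a smoothed idf and vector normalization by default, so their weights will differ. The toy documents are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return a {term: weight} map per document, with tf = count/len and idf = log(N/df)."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (c / len(doc)) * math.log(n_docs / df[term])
            for term, c in counts.items()
        })
    return vectors

docs = [["the", "essay", "delves"], ["the", "essay", "is", "mine"], ["the", "lab"]]
for vec in tf_idf(docs):
    print(vec)
```

Because "the" occurs in every document, its idf is \(\log 1 = 0\) and it receives zero weight, while rarer terms such as "delves" are weighted up; this is the down-weighting of common terms described above.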
Logistic Regression
Logistic regression (LR)11 is an algorithm widely used in machine learning and statistics. It uses the sigmoid function \(\sigma(z) = \frac{1}{1+e^{-z}}\) to model the relationship between the input features and the probability of belonging to the positive class (class 1). The input \(z\) to the sigmoid function is modeled as a linear combination of the independent variables \(\{x_1, x_2, \dots, x_n\}\), that is,
\(z = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n\)
where \(\{w_0, w_1, w_2, \dots, w_n\}\) are the model parameters.
The LR model generates predictions for new data by calculating the conditional probability of the positive class given the observed input features (i.e., \(p(y = 1 | x_1, x_2, \dots, x_n)\)). If this probability exceeds a given threshold (usually 0.5), the model assigns class 1; otherwise, it predicts class 0. Logistic regression is valued for its simplicity and interpretability, but its assumption of a linear relationship between the features and the log-odds of the target variable does not hold in all cases. The same TF-IDF representation used for naive Bayes was applied to prepare the training data for the LR model.
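The sigmoid, the linear combination \(z\), and the 0.5 threshold can be sketched together with a simple training loop. This is a minimal illustration, assuming plain batch gradient descent on the average log-loss without regularization; the learning rate, epoch count, and toy two-feature inputs are hypothetical, and library implementations typically use other solvers and add regularization.

```python
import math
from typing import List, Sequence

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """p(y=1 | x) = sigma(w_0 + w_1*x_1 + ... + w_n*x_n); w[0] is the bias w_0."""
    z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return sigmoid(z)

def predict(w: Sequence[float], x: Sequence[float], threshold: float = 0.5) -> int:
    """Class 1 if the probability exceeds the threshold, otherwise class 0."""
    return 1 if predict_proba(w, x) > threshold else 0

def fit(X: List[List[float]], y: List[int], lr: float = 1.0, epochs: int = 1000) -> List[float]:
    """Batch gradient descent on the average log-loss (no regularization)."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # derivative of the log-loss w.r.t. z
            grad[0] += err
            for j, xij in enumerate(xi):
                grad[j + 1] += err * xij
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

X = [[0.0, 0.1], [0.1, 0.0], [0.9, 0.8], [1.0, 0.9]]  # toy 2-feature inputs
y = [0, 0, 1, 1]
w = fit(X, y)
print([predict(w, x) for x in X])  # [0, 0, 1, 1]
```

In the study's setting, each \(x_i\) would be a TF-IDF weight, so \(n\) equals the vocabulary size and each learned \(w_i\) indicates how strongly a term pushes a document toward the AI class.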
BERT
Bidirectional Encoder Representations from Transformers (BERT)12 is one of the most notable pre-trained language models in the NLP domain. Its key innovation lies in its ability to capture the bidirectional context of words in a sentence, which allows it to model the complexity of language, such as nuance, word sense, and context. As a result, BERT outperforms unidirectional models such as RNNs and LSTMs in a variety of NLP tasks, including sentiment analysis, question answering, language translation, and text summarization.
BERT's architecture is built on the transformer model35, which introduced the concept of self-attention mechanisms. These mechanisms allow BERT to assign different levels of importance to different words in a sentence, facilitating the extraction of important information and context. BERT's pre-training involves two tasks: masked language modeling and next sentence prediction. In the former, BERT learns to predict masked words in sentences, which forces it to learn the relationships between words in context. In the latter, BERT learns to determine whether the second sentence of a pair actually follows the first, which helps it grasp document-level context.
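The core of self-attention is scaled dot-product attention, \(\text{softmax}(QK^\top/\sqrt{d_k})\,V\). The sketch below shows a single head on plain Python lists; it is an illustration only, since BERT uses multi-head attention with learned projections for \(Q\), \(K\), and \(V\), which are omitted here. The toy matrices are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]

def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # One score per key: how relevant each word is to the current query word.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)  # importance weights sum to 1
        out.append([sum(wt * v[j] for wt, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

Q = [[1.0, 0.0]]
K = [[1.0, 1.0], [1.0, 1.0]]  # identical keys receive equal weight
V = [[2.0, 0.0], [4.0, 2.0]]
print(attention(Q, K, V))  # [[3.0, 1.0]]
```

Because every word attends to every other word in both directions, the weights reflect left and right context at once, which is the bidirectionality described above.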
A notable feature of BERT is that it can be fine-tuned for specific NLP tasks with relatively small amounts of task-specific data. This adaptability makes BERT a popular option for researchers and developers across a variety of applications36,37,38,39. In this study, we fine-tuned a pre-trained BERT-base model with a dropout rate of 55% applied to the final layer. This dropout level was selected empirically.
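Dropout at rate \(p = 0.55\) can be sketched as inverted dropout: during training each unit is zeroed with probability 0.55 and the surviving units are scaled by \(1/(1-0.55)\), so the expected activation is unchanged and no rescaling is needed at inference. The classification head below (dropout on a pooled sentence vector, then a linear layer producing two logits) is a hypothetical illustration of "dropout in the final layer", not the study's code; in PyTorch the corresponding layer would be `torch.nn.Dropout(p=0.55)`.

```python
import random
from typing import List, Optional

def dropout(x: List[float], p: float = 0.55, training: bool = True,
            rng: Optional[random.Random] = None) -> List[float]:
    """Inverted dropout: zero each unit with probability p, scale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return list(x)  # dropout is disabled at inference time
    rng = rng or random.Random()
    keep = 1.0 - p
    return [xi / keep if rng.random() < keep else 0.0 for xi in x]

def classify_head(h: List[float], W: List[List[float]], b: List[float],
                  training: bool) -> List[float]:
    """Hypothetical head: dropout on the pooled vector h, then a linear layer (one logit per class)."""
    h = dropout(h, p=0.55, training=training)
    return [sum(wi * hi for wi, hi in zip(row, h)) + bj for row, bj in zip(W, b)]

h = [1.0] * 1000
dropped = dropout(h, rng=random.Random(0))
print(dropped.count(0.0) / len(dropped))  # roughly 0.55
print(classify_head([0.3, 0.7], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], training=False))
```

A rate as high as 55% is an unusually strong regularizer (BERT's default hidden dropout is 0.1), which is consistent with the statement that this value was chosen empirically.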
DistilBERT
DistilBERT13 is a variant of the BERT model35 designed to be more compact and computationally efficient while maintaining comparable performance. DistilBERT is built on the same transformer architecture as BERT and uses a stack of transformer encoder layers to process and encode input text. The output can then be used for various downstream NLP tasks, such as text classification, sentiment analysis, and named entity recognition.
The main innovation of DistilBERT is its use of knowledge distillation, in which a smaller "student" model is trained to mimic the behavior of a larger, pre-trained "teacher" model. DistilBERT is trained to mimic BERT's outputs and achieves compactness and efficiency by reducing the number of parameters. With about 40% fewer parameters, it is faster to train and to classify examples, while preserving much of BERT's performance. This makes DistilBERT a preferred option in situations with constrained computational resources, such as deploying NLP models in resource-limited environments. As with the BERT model, we trained the DistilBERT model with a dropout rate of 55% applied to the final hidden layer.
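The distillation idea can be sketched with the soft-target loss of Hinton et al.: teacher and student logits are both softened with a temperature \(T\), and the student is penalized by the cross-entropy between the two distributions. This is a minimal illustration of one component only; DistilBERT's full training objective also combines a masked language modeling loss and a cosine embedding loss. The temperature value and toy logits below are arbitrary assumptions.

```python
import math
from typing import List

def softmax_t(logits: List[float], temperature: float) -> List[float]:
    """Softmax of logits / T; a higher T yields a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits: List[float], teacher_logits: List[float],
                      temperature: float = 2.0) -> float:
    """Cross-entropy between the teacher's and the student's soft targets at temperature T."""
    p_teacher = softmax_t(teacher_logits, temperature)
    p_student = softmax_t(student_logits, temperature)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))

teacher = [2.0, 0.5, -1.0]
print(distillation_loss([2.0, 0.5, -1.0], teacher))  # student agrees with teacher: minimal loss
print(distillation_loss([-1.0, 0.5, 2.0], teacher))  # student disagrees: larger loss
```

The loss is minimized when the student reproduces the teacher's full output distribution, not just its top prediction, which is how the smaller model inherits much of the larger model's behavior.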
