ChatGPT feels like it is everything, everywhere, all at once (to reuse a great movie title, with punctuation inserted). It is unclear how generative artificial intelligence, AI that creates new text and images (as seen in ChatGPT, Bing, or DALL-E), will play out. Will an artificial superintelligence be created that replaces humans, or will we harness its power to improve learning processes and outcomes?
No one can predict that future with certainty, but one thing we do know is that for generative AI to have any value, it needs large amounts of high-quality, relevant data. The education sciences field also knows that such large-scale, high-quality data do not exist everywhere, and not all at once. However, the National Assessment of Educational Progress, better known as the Nation’s Report Card, provides carefully collected, valid, and reliable data and rich contextual information about learners while protecting student privacy. In short, NAEP can begin to meet the data needs of modern education and research. And the National Assessment Governing Board, which sets NAEP policy and meets this week, should prioritize the release of these data.
As is often the case, scientific progress is outpacing governments, and this is one area where we have everything we need to keep up. Given the potential of these taxpayer-funded data to improve support for educators and outcomes for students, there is a clear obligation to make the information available to researchers. As advocates of high-quality, high-impact research, we urge you to take that step.
Since 1969, NAEP has measured student performance in mathematics, reading, science, writing, the arts, history, and civics. NAEP uses a combination of traditional forced-choice items, student essays, short open-ended responses, and simulations. NAEP also uses a digitally based assessment platform to collect “process data” about how students interact with items. Additionally, NAEP collects detailed demographic and self-reported information. This includes basic information (race/ethnicity, gender, etc.) and more detailed information (English learner status, IEP status, disability accommodations, etc.). NAEP’s data trove holds hundreds of thousands of examples of student work combined with detailed contextual information about students, schools, and communities. These data should be used to improve AI algorithms and, as a result, improve student outcomes.
Automated scoring is one of the most widely researched and deployed uses of AI in education. But replicating human scoring is the floor, not the ceiling. Researchers can use NAEP data to explore complex constructs with effects far beyond scoring: classifying mathematics misconceptions, identifying ways to improve student writing, and understanding key themes in student writing about civic engagement.
With NAEP’s large samples and detailed contextual variables about test takers, their schools, and their families, researchers can also study how a range of factors affects student performance.
Protecting student privacy is of course important, but it is no reason to delay releasing the data, as some have argued. Many safeguards are already in place. Because NAEP results are reported at the group level, with every result a summary across many individuals, privacy is easier to protect than it is for assessments that report individual results. Moreover, NAEP’s long history and established procedures minimize the risks. For example, information that could identify a particular test taker is removed even before the data leave the school. There are known solutions for avoiding the disclosure of individual students’ identities when a subgroup contains only a small number of students. Open-ended responses require some caution. NAEP has no control over what students enter into these fields, and sometimes students go a little off topic and reveal personal information that needs to be removed (for example, “My uncle Frank Johnson, who lives in London, was once arrested for DUI”).
The Institute of Education Sciences, where we work, is very sensitive to privacy issues in NAEP data. A recently announced contest (with a $100,000 prize) challenges researchers to use AI to solve the difficult problem of replicating human-assigned scores for open-ended math items. Before the NAEP mathematics assessment data were released to participants, personally identifiable information and sensitive language were stripped out using both automated and human-based reviews. The reviews also confirmed that no other types of sensitive information, such as student identities or social media handles, were disclosed. The dataset then went through further internal controls to ensure it was safe for release.
Decisions about data privacy should weigh the relative risks and rewards. The value of using NAEP’s data treasure trove is high, and given its history and design, the risk to student privacy is low. In short, privacy concerns should not prevent the release of NAEP data to qualified researchers.
Research using NAEP data has the potential to improve NAEP itself, but more importantly, it can answer questions about how students learn. For NAEP as an assessment, researchers can use the latest methods to review and revise questions, identifying items that particular student groups find difficult because of language or other issues unrelated to the underlying construct. This goes beyond standard psychometric analysis by including rich contextual data.
NAEP data may have broader applicability, especially in the context of large language models, a fundamental approach used in generative AI. Most existing large language models are based on data collected from across the web. OpenAI, the company that created ChatGPT, has not disclosed the specific data sources used to train the model, but ChatGPT was reportedly trained on information from web texts, books, news articles, social media posts, code snippets, and more. There are many examples of ChatGPT providing questionable or harmful responses, depending on the prompts given. An equally serious (and related) problem is that large language models do not have access to enough student work, leaving them severely anemic just where they are most needed. NAEP data can help fine-tune these models, making them more accurate and useful.
We are only just beginning to see how generative AI will transform the future of education and research, but one thing is clear: NAEP data must be part of that future. Opening up NAEP’s data treasure trove is an easy call. Doing so would tap into the creativity of the research community to explore what insights useful to educators can be derived from NAEP data.
NAEP is approaching a $200 million-a-year operation. It provides valuable insight into student performance, but it has yet to fully live up to its promise.