In April, Meta released Llama 3, its generative large language model, which is known for being open source.
For decades, research in artificial intelligence (AI) has progressed in fits and starts, alternating between periods of euphoric optimism and slumps known as AI winters, when its promises failed to materialize.
But this season is shaping up to be the summer of AI. April saw the release of Meta's Llama 3, a generative large language model known for being open source. A few days ago, Google showed off Project Astra and its Gemini 1.5 generative AI (GenAI) models at the Google I/O developer conference. Expectations are now rising ahead of Apple's 2024 Worldwide Developers Conference in June, where Apple typically announces updates and new features for its mobile and desktop operating systems and other products. While technology companies big and small are releasing a flurry of AI products, Apple, known for being a late mover, has so far remained remarkably quiet.
But I think it is fair to say that the highlight of this season so far has been OpenAI's release of its GPT-4o GenAI model, just ahead of Google I/O. The “o” stands for omni (short for omnimodal), meaning it can understand and generate images and audio in addition to the text input and output of previous models. The demos showed it talking with multiple users and interacting with them through the camera's video feed and on-screen activity.
For me, the most fun thing to watch was the range of intonations in which GPT-4o could “speak” to the user, expressing excitement, patience, encouragement, mystery, drama, song, and more. This promises to make interactions more natural and, dare I say it, fun. In fact, some commentators have criticized GPT-4o for being too frivolous in its default mode.
OpenAI's preview of GPT-4o included several demos, including one in which it helps solve a simple linear equation in one variable. Along with this, a series of short demo videos has been released online. One of them features Sal(man) Khan of Khan Academy fame and his son, who is tutored on a question about right triangles through screen sharing on an iPad with a pen/stylus. GPT-4o was able to interact with and respond to the student's on-screen annotations and activity, guiding him step by step toward the answer like a good tutor.
Although these demos look very promising, they are still too few to support firm claims about the model's usefulness for students and the education sector in general. However, given that two of the use cases demonstrated were education-related, this is clearly an application area that is top of mind for developers. The pace of improving features and adding guardrails has been fast; even so, we did not expect this kind of multimodal GenAI model to arrive so soon, yet here we are!
From an educator's perspective, what stands out to me most about this generation of models is their ability to do more than just spit out answers. Instead, just like a human tutor, they can nudge and guide the learner, encouraging them to understand the problem step by step.
Duolingo is the world's largest free language-learning platform offered as an app. In an interview with a business channel after GPT-4o's release, Duolingo's CEO, Luis von Ahn, explained how the company plans to replace its person-to-person chat functionality with chat via GPT-4o. Learners are hesitant to use the existing one-on-one chat feature, likely due to social anxiety, fear of embarrassment, and similar factors. Knowing that they are talking to a (good) chatbot might help address this issue.
However, not all education service providers are winners. Chegg describes itself as a “24/7 homework helper.” In academia, it often turns up in investigations of plagiarism cases. Chegg's fortunes rose during the pandemic. Its stock hit an all-time high of over $113 in February 2021, but has been declining steadily since then. The release of multiple GenAI models and the general public's growing access to them likely contributed to that decline. Two days ago, the company's stock closed at $4.38, an 11-year low. Why (blindly) copy an answer when it is easier to show your problem to a personal tutor who is available 24 hours a day and will walk you through the solution without judging you?
The divide between winners and losers will not be limited to education service providers; it will extend to learners as well. From a resource perspective, using a GenAI model requires, at a minimum, a smartphone (or computer or tablet), internet access, and (depending on the specific model) a subscription, which in turn requires a credit card.
Furthermore, although some GenAI models currently support dozens of languages, the primary language in which they are developed is English, the language of the Internet. That's fine for those who know English or another well-supported language, but what about everyone else? In terms of numbers, various reports place Urdu as the 10th or 11th most widely spoken language in the world. Yet it is not among the well-supported languages and is not expected to be supported any time soon.
Such decisions come down to economics: prioritizing more lucrative markets and customers who are able to pay. In our situation, we can expect the gap between the haves, who have both the resources and the necessary English language skills, and the have-nots, who lack one or both, to widen further. Even if a model like GPT-4o were made available for Urdu tomorrow, it would still exclude millions of learners who are fluent in a local language but not in Urdu.
As we navigate this summer of AI, the transformative potential of multimodal GenAI models such as GPT-4o, Project Astra, and Gemini in education is undoubtedly exciting. The ability to provide personalized, interactive tutoring will revolutionize learning, making education more engaging and accessible for many people. However, disparities in access to these advanced tools highlight significant challenges. In Pakistan, where economic constraints and language barriers are prevalent, the gap between those who can benefit from these technologies and those who cannot is likely to widen further.
Addressing these gaps is an essential task for policy makers, educators, and technology developers. Several groups in the private sector are working on the necessary pieces, but meeting this challenge will require significant investment, some of which I discussed in a previous editorial (“The AI Frontier in WGS”, The News International, March 1, 2024). Investing in infrastructure to improve internet access and affordability, fostering digital literacy, and advocating for the development of AI tools in Urdu and local languages are essential steps. Additionally, public-private partnerships can play an important role in ensuring that the benefits of AI are equitably shared.
The potential for AI in education is enormous and is slowly coming into focus, but its promise must be inclusive. By actively working to close the digital divide, we can ensure that all students, regardless of their socio-economic background, have the opportunity to benefit from the educational advances that AI provides. Only then can we truly harness the power of this technological revolution to build a brighter and more equitable future for all learners in Pakistan.
The author holds a doctorate in education.