Create an AI-Driven Movie Quiz with Gemini LLM, Python, FastAPI, Pydantic, RAG and more | by Volker Janz

In the Gemini Movie Detectives project, the prompt is enhanced with external API data from The Movie Database. RAG in general typically goes further: it draws on much more complex documents and far larger amounts of data, and it uses vector indexes to streamline retrieval. These indexes act like signposts, guiding the system to relevant external sources quickly.

This project therefore implements only a mini version of RAG, but it shows the basic idea and demonstrates the power of external data to augment LLM capabilities.
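The basic idea of enhancing a prompt with external data can be sketched in a few lines. This is only an illustration, not the project's actual prompt generator: the function name build_augmented_prompt and the sample movie dictionary are assumptions, and only the field names mirror the TMDB fields used later in this article.

```python
def build_augmented_prompt(instruction: str, movie: dict) -> str:
    """Append externally fetched facts to the instruction sent to the LLM."""
    # Only include fields that actually carry a value, so that missing data
    # does not end up as an empty or "None" entry in the prompt.
    fields = ["title", "tagline", "overview", "release_date"]
    facts = [f"- {name}: {movie[name]}" for name in fields if movie.get(name)]
    return instruction + "\n\nUse only the following facts:\n" + "\n".join(facts)


# Hypothetical sample data; in the project this comes from the TMDB API.
movie = {
    "title": "Example Movie",
    "tagline": "An example tagline.",
    "overview": "",
    "release_date": "2000-01-01",
}
prompt = build_augmented_prompt("Create a quiz question about a movie.", movie)
print(prompt)
```

The LLM then answers based on the supplied facts instead of relying only on what it memorized during training.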

More generally, RAG is a very important concept, especially when crafting trivia quizzes or educational games using LLMs like Gemini. It can reduce the risk of false positives, wrong questions, or misinterpreted user answers.

Here are some open-source projects that might be helpful when approaching RAG in one of your projects:

txtai: All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows.
LangChain: LangChain is a framework for developing applications powered by large language models (LLMs).
Qdrant: Vector Search Engine for the next generation of AI applications.
Weaviate: Weaviate is a cloud-native, open source vector database that is robust, fast, and scalable.

Of course, given the potential value of this approach for LLM-based applications, there are many more open- and closed-source alternatives, but these should be enough to get your research on the topic started.
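To make the idea of vector-based retrieval more tangible, here is a toy sketch in plain Python. It is deliberately simplified: the embed function is a bag-of-words counter rather than a learned embedding, and the three documents are made up for illustration. Vector databases like the ones listed above address the same problem with learned embeddings and optimized indexes.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vec, embed(doc)),
        reverse=True,
    )
    return ranked[:top_k]


# Made-up documents standing in for an external knowledge base.
documents = [
    "a heist team enters dreams to steal secrets",
    "a young lion prince reclaims his kingdom",
    "a robot cleans up an abandoned earth",
]
best = retrieve("movie about stealing secrets from dreams", documents)
print(best)
```

The retrieved document would then be added to the prompt, which is the "augmented" part of retrieval-augmented generation.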

Now that the main concepts are clear, let’s have a closer look at how the project was created and how dependencies are managed in general.

The three main tasks Poetry can help you with are: Build, Publish and Track. The idea is to have a deterministic way to manage dependencies, to share your project and to track dependency states.

Poetry also handles the creation of virtual environments for you. By default, those are in a centralized folder within your system. However, if you prefer to have the virtual environment of a project in the project folder, like I do, it is a simple config change:

poetry config virtualenvs.in-project true

With poetry new you can then create a new Python project. It will create a virtual environment linked to your system’s default Python. If you combine this with pyenv, you get a flexible way to create projects using specific versions. Alternatively, you can also tell Poetry directly which Python version to use: poetry env use /full/path/to/python.

Once you have a new project, you can use poetry add to add dependencies to it.

With this, I created the project for Gemini Movie Detectives:

poetry config virtualenvs.in-project true
poetry new gemini-movie-detectives-api
cd gemini-movie-detectives-api
poetry add 'uvicorn[standard]'
poetry add fastapi
poetry add pydantic-settings
poetry add httpx
poetry add 'google-cloud-aiplatform>=1.38'
poetry add jinja2

The metadata about your project, including the dependencies with their respective versions, is stored in the pyproject.toml and poetry.lock files. I added more dependencies later, which resulted in the following pyproject.toml for the project:

[tool.poetry]
name = "gemini-movie-detectives-api"
version = "0.1.0"
description = "Use Gemini Pro LLM via VertexAI to create an engaging quiz game incorporating TMDB API data"
authors = ["Volker Janz <volker@janz.sh>"]
readme = "README.md"

[tool.poetry.dependencies]
python = "^3.12"
fastapi = "^0.110.1"
uvicorn = {extras = ["standard"], version = "^0.29.0"}
python-dotenv = "^1.0.1"
httpx = "^0.27.0"
pydantic-settings = "^2.2.1"
google-cloud-aiplatform = ">=1.38"
jinja2 = "^3.1.3"
ruff = "^0.3.5"
pre-commit = "^3.7.0"
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

FastAPI is a Python framework that allows for rapid API development. Built on open standards, it offers a seamless experience without new syntax to learn. With automatic documentation generation, robust validation, and integrated security, FastAPI streamlines development while ensuring great performance.

To implement the API for the Gemini Movie Detectives project, I simply started from a Hello World application and extended it from there. Here is how to get started:

from fastapi import FastAPI

app = FastAPI()


@app.get("/")
def read_root():
    return {"Hello": "World"}

Assuming you also keep the virtual environment within the project folder as .venv/ and use uvicorn, this is how to start the API with the reload feature enabled, in order to test code changes without the need of a restart:

source .venv/bin/activate
uvicorn gemini_movie_detectives_api.main:app --reload
curl -s localhost:8000 | jq .

If you have not yet installed jq, I highly recommend doing it now. I might cover this wonderful JSON Swiss Army knife in a future article. This is what the response looks like:

{
  "Hello": "World"
}

From here, you can develop your API endpoints as needed. This is what the API endpoint implementation to start a movie quiz in Gemini Movie Detectives looks like, for example:

@app.post('/quiz')
@rate_limit
@retry(max_retries=settings.quiz_max_retries)
def start_quiz(quiz_config: QuizConfig = QuizConfig()):
    movie = tmdb_client.get_random_movie(
        page_min=_get_page_min(quiz_config.popularity),
        page_max=_get_page_max(quiz_config.popularity),
        vote_avg_min=quiz_config.vote_avg_min,
        vote_count_min=quiz_config.vote_count_min
    )

    if not movie:
        logger.info('could not find movie with quiz config: %s', quiz_config.dict())
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail='No movie found with given criteria')

    try:
        genres = [genre['name'] for genre in movie['genres']]

        prompt = prompt_generator.generate_question_prompt(
            movie_title=movie['title'],
            language=get_language_by_name(quiz_config.language),
            personality=get_personality_by_name(quiz_config.personality),
            tagline=movie['tagline'],
            overview=movie['overview'],
            genres=', '.join(genres),
            budget=movie['budget'],
            revenue=movie['revenue'],
            average_rating=movie['vote_average'],
            rating_count=movie['vote_count'],
            release_date=movie['release_date'],
            runtime=movie['runtime']
        )

        chat = gemini_client.start_chat()

        logger.debug('starting quiz with generated prompt: %s', prompt)
        gemini_reply = gemini_client.get_chat_response(chat, prompt)
        gemini_question = gemini_client.parse_gemini_question(gemini_reply)

        quiz_id = str(uuid.uuid4())
        session_cache[quiz_id] = SessionData(
            quiz_id=quiz_id,
            chat=chat,
            question=gemini_question,
            movie=movie,
            started_at=datetime.now()
        )

        return StartQuizResponse(quiz_id=quiz_id, question=gemini_question, movie=movie)
    except GoogleAPIError as e:
        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, detail=f'Google API error: {e}')
    except Exception as e:
        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, detail=f'Internal server error: {e}')

Within this code, you can already see three of the main components of the backend:

tmdb_client: A client I implemented using httpx to fetch data from The Movie Database (TMDB).
prompt_generator: A class that helps to generate modular prompts based on Jinja templates.
gemini_client: A client to interact with the Gemini LLM via VertexAI in Google Cloud.

We will look at these components in detail later, but first some more helpful insights regarding the usage of FastAPI.

FastAPI makes it really easy to define the HTTP method and data to be transferred to the backend. For this particular function, I expect a POST request as this creates a new quiz. This can be done with the post decorator:

@app.post('/quiz')

Also, I am expecting some data within the request sent as JSON in the body. In this case, I am expecting an instance of QuizConfig as JSON. I simply defined QuizConfig as a subclass of BaseModel from Pydantic (will be covered later) and with that, I can pass it in the API function and FastAPI will do the rest:

class QuizConfig(BaseModel):
    vote_avg_min: float = Field(5.0, ge=0.0, le=9.0)
    vote_count_min: float = Field(1000.0, ge=0.0)
    popularity: int = Field(1, ge=1, le=3)
    personality: str = Personality.DEFAULT.name
    language: str = Language.DEFAULT.name

# ...

def start_quiz(quiz_config: QuizConfig = QuizConfig()):

Furthermore, you might notice two custom decorators:

@rate_limit
@retry(max_retries=settings.quiz_max_retries)

These I implemented to reduce duplicate code. They wrap the API function to retry the function in case of errors and to introduce a global rate limit of how many movie quizzes can be started per day.
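The retry decorator itself is not shown in this article. As a minimal sketch of how such a decorator could look, assuming it simply calls the function again on any exception (the project's actual implementation may differ, for example in which exceptions it retries), consider the following. The flaky function is a hypothetical stand-in for an unreliable call.

```python
import logging
from functools import wraps
from typing import Callable

logger = logging.getLogger(__name__)


def retry(max_retries: int) -> Callable:
    """Retry the wrapped function up to max_retries times after the first failure."""
    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    # Re-raise the last error once all retries are used up.
                    if attempt == max_retries:
                        raise
                    logger.warning('attempt %d failed: %s', attempt + 1, e)
        return wrapper
    return decorator


# Hypothetical function that fails twice before it succeeds.
calls = {'count': 0}


@retry(max_retries=3)
def flaky() -> str:
    calls['count'] += 1
    if calls['count'] < 3:
        raise RuntimeError('temporary failure')
    return 'ok'


result = flaky()
print(result, calls['count'])
```

Decorators are applied from the bottom up, so in the endpoint above rate_limit wraps the retrying function. A request therefore counts once against the daily limit, no matter how many retries happen inside.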

What I also personally like is the error handling with FastAPI. You can simply raise an HTTPException, give it the desired status code and the user will then receive a proper response, for example, if no movie could be found with a given configuration:

raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail='No movie found with given criteria')

With this, you should have an overview of creating an API like the one for Gemini Movie Detectives with FastAPI. Keep in mind: all code is open-source, so feel free to have a look at the API repository on GitHub.

One of the main challenges with today’s AI/ML projects is data quality. That applies not only to ETL/ELT pipelines, which prepare datasets to be used in model training or prediction, but also to the AI/ML application itself. Python, for example, usually enables data engineers and scientists to get a reasonable result with little code, but because it is (mostly) dynamically typed, it lacks data validation when used in a naive way.

That is why in this project, I combined FastAPI with Pydantic, a powerful data validation library for Python. The goal was to make the API lightweight but strict and strong when it comes to data quality and validation. Instead of plain dictionaries, for example, the Movie Detectives API strictly uses custom classes inherited from the BaseModel provided by Pydantic. This is the configuration for a quiz, for example:

class QuizConfig(BaseModel):
    vote_avg_min: float = Field(5.0, ge=0.0, le=9.0)
    vote_count_min: float = Field(1000.0, ge=0.0)
    popularity: int = Field(1, ge=1, le=3)
    personality: str = Personality.DEFAULT.name
    language: str = Language.DEFAULT.name

This example illustrates how not only the correct type is ensured, but further validation is also applied to the actual values.

Furthermore, up-to-date Python features, like StrEnum, are used to distinguish certain types, like personalities:

class Personality(StrEnum):
    DEFAULT = 'default.jinja'
    CHRISTMAS = 'christmas.jinja'
    SCIENTIST = 'scientist.jinja'
    DAD = 'dad.jinja'

Also, duplicate code is avoided by defining custom decorators. For example, the following decorator limits the number of quiz sessions per day, to keep control over GCP costs:

call_count = 0
last_reset_time = datetime.now()


def rate_limit(func: callable) -> callable:
    @wraps(func)
    def wrapper(*args, **kwargs) -> callable:
        global call_count
        global last_reset_time

        # reset call count if the day has changed
        if datetime.now().date() > last_reset_time.date():
            call_count = 0
            last_reset_time = datetime.now()

        if call_count >= settings.quiz_rate_limit:
            raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail='Daily limit reached')

        call_count += 1
        return func(*args, **kwargs)

    return wrapper

It is then simply applied to the related API function:

@app.post('/quiz')
@rate_limit
@retry(max_retries=settings.quiz_max_retries)
def start_quiz(quiz_config: QuizConfig = QuizConfig()):

The combination of up-to-date Python features and libraries, such as FastAPI, Pydantic or Ruff, makes the backend less verbose but still very stable, and enforces a level of data quality that helps the LLM output meet expectations.

The TMDB Client class is using httpx to perform requests against the TMDB API.

httpx is a rising star in the world of Python libraries. While requests has long been the go-to choice for making HTTP requests, httpx offers a valid alternative. One of its key strengths is asynchronous functionality. httpx allows you to write code that can handle multiple requests concurrently, potentially leading to significant performance improvements in applications that deal with a high volume of HTTP interactions. Additionally, httpx aims for broad compatibility with requests, making it easier for developers to pick it up.

In case of Gemini Movie Detectives, there are two main requests:

get_movies: Get a list of random movies based on specific settings, like the minimum average rating and number of votes
get_movie_details: Get details for a specific movie to be used in a quiz

In order to reduce the number of external requests, the latter uses the lru_cache decorator, which stands for “Least Recently Used cache”. It’s used to cache the results of function calls so that if the same inputs occur again, the function doesn’t have to recompute the result. Instead, it returns the cached result, which can significantly improve the performance of the program, especially for functions with expensive computations. In our case, we cache the details for 1024 movies, so if 2 players get the same movie, we do not need to make a request again:

@lru_cache(maxsize=1024)
def get_movie_details(self, movie_id: int):
    response = httpx.get(f'https://api.themoviedb.org/3/movie/{movie_id}', headers={
        'Authorization': f'Bearer {self.tmdb_api_key}'
    }, params={
        'language': 'en-US'
    })

    movie = response.json()
    movie['poster_url'] = self.get_poster_url(movie['poster_path'])

    return movie
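The effect of lru_cache can be observed without any network access. The sketch below replaces the TMDB request with a hypothetical stand-in function that counts how often its body runs, and uses cache_info() to inspect hits and misses.

```python
from functools import lru_cache

request_count = 0


@lru_cache(maxsize=1024)
def get_movie_details(movie_id: int) -> dict:
    """Stand-in for the real TMDB request; counts how often it actually runs."""
    global request_count
    request_count += 1
    return {'id': movie_id, 'title': f'Movie {movie_id}'}


get_movie_details(42)  # first call: executes the function body
get_movie_details(42)  # same argument: served from the cache
get_movie_details(7)   # different argument: executes the body again

info = get_movie_details.cache_info()
print(request_count, info.hits, info.misses)
```

Two details are worth keeping in mind. When lru_cache decorates a method, self is part of the cache key. And the cache returns the very same object on every hit, so a caller that mutates the returned dictionary changes what later callers receive.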

Accessing data from The Movie Database (TMDB) is free for non-commercial usage. You can simply generate an API key and start making requests.

Before Gemini via VertexAI can be used, you need a Google Cloud project with VertexAI enabled and a Service Account with sufficient access together with its JSON key file.

After creating a new project, navigate to APIs & Services –> Enable APIs and services –> search for VertexAI API –> Enable.

To create a Service Account, navigate to IAM & Admin –> Service Accounts –> Create service account. Choose a proper name and go to the next step.

Now make sure to assign the account the predefined role Vertex AI User.

Finally, you can generate and download the JSON key file by clicking on the new service account –> Keys –> Add Key –> Create new key –> JSON. With this file, you are good to go.

Using Gemini from Google with Python via VertexAI starts by adding the necessary dependency to the project:

poetry add 'google-cloud-aiplatform>=1.38'

With that, you can import and initialize vertexai with your JSON key file. You can also load a model, like Gemini 1.0 Pro as in the following snippet or the newly released Gemini 1.5 Pro, and start a chat session like this:

import vertexai
from google.oauth2.service_account import Credentials
from vertexai.generative_models import GenerativeModel

project_id = "my-project-id"
location = "us-central1"
credentials = Credentials.from_service_account_file("credentials.json")
model = "gemini-1.0-pro"

vertexai.init(project=project_id, location=location, credentials=credentials)
model = GenerativeModel(model)

chat_session = model.start_chat()

You can now use chat_session.send_message() to send a prompt to the model. However, since the response arrives in chunks when streaming is enabled, I recommend using a little helper function, so that you simply get the full response as one string:

def get_chat_response(chat: ChatSession, prompt: str) -> str:
    text_response = []
    responses = chat.send_message(prompt, stream=True)
    for chunk in responses:
        text_response.append(chunk.text)
    return ''.join(text_response)
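Since the helper only relies on send_message returning an iterable of objects with a text attribute, its behaviour can be tried out without Google Cloud credentials. The sketch below uses FakeChatSession and FakeChunk, two hypothetical stand-ins that are not part of the VertexAI SDK, to simulate a streamed reply.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class FakeChunk:
    text: str


class FakeChatSession:
    """Hypothetical stand-in that yields a reply in small chunks."""

    def send_message(self, prompt: str, stream: bool = True) -> Iterator[FakeChunk]:
        for part in ['Eres ', 'incre', 'íble']:
            yield FakeChunk(text=part)


def get_chat_response(chat, prompt: str) -> str:
    # Same logic as the helper above, without the ChatSession type hint.
    text_response = []
    responses = chat.send_message(prompt, stream=True)
    for chunk in responses:
        text_response.append(chunk.text)
    return ''.join(text_response)


reply = get_chat_response(FakeChatSession(), "How to say 'you are awesome' in Spanish?")
print(reply)
```

A stand-in like this is also handy in unit tests, where calling the real model would be slow and would cost money.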

A full example can then look like this:

import vertexai
from google.oauth2.service_account import Credentials
from vertexai.generative_models import GenerativeModel, ChatSession

project_id = "my-project-id"
location = "us-central1"
credentials = Credentials.from_service_account_file("credentials.json")
model = "gemini-1.0-pro"

vertexai.init(project=project_id, location=location, credentials=credentials)
model = GenerativeModel(model)

chat_session = model.start_chat()


def get_chat_response(chat: ChatSession, prompt: str) -> str:
    text_response = []
    responses = chat.send_message(prompt, stream=True)
    for chunk in responses:
        text_response.append(chunk.text)
    return ''.join(text_response)


response = get_chat_response(
    chat_session,
    "How to say 'you are awesome' in Spanish?"
)
print(response)

Running this, Gemini gave me the following response:

I agree with Gemini:

Eres increíble

Another hint when using this: you can also configure the model generation by passing a configuration to the generation_config parameter as part of the send_message function. For example:

generation_config = {
    'temperature': 0.5
}

responses = chat.send_message(
    prompt,
    generation_config=generation_config,
    stream=True
)

I am using this in Gemini Movie Detectives to set the temperature to 0.5, which gave me the best results. In this context, temperature controls how creative the responses generated by Gemini are. The value must be between 0.0 and 1.0, where values closer to 1.0 mean more creativity.

One of the main challenges, apart from sending a prompt and receiving the reply from Gemini, is parsing the reply in order to extract the relevant information.

One learning from the project is:

Specify a format for Gemini, which does not rely on exact words but uses key symbols to separate information elements

For example, the question prompt for Gemini contains this instruction:

Your reply must only consist of three lines! You must only reply strictly using the following template for the three lines:
Question: <Your question>
Hint 1: <The first hint to help the participants>
Hint 2: <The second hint to get the title more easily>

The naive approach would be to parse the answer by looking for a line that starts with Question:. However, if we use another language, like German, that line would start with Frage: instead.

Instead, focus on the structure and key symbols. Read the reply like this:

It has 3 lines
The first line is the question
Second line the first hint
Third line the second hint
Key and value are separated by :

With this approach, the reply can be parsed in a language-agnostic way. This is my implementation in the actual client:

@staticmethod
def parse_gemini_question(gemini_reply: str) -> GeminiQuestion:
    result = re.findall(r'[^:]+: ([^\n]+)', gemini_reply, re.MULTILINE)
    if len(result) != 3:
        msg = f'Gemini replied with an unexpected format. Gemini reply: {gemini_reply}'
        logger.warning(msg)
        raise ValueError(msg)

    question = result[0]
    hint1 = result[1]
    hint2 = result[2]

    return GeminiQuestion(question=question, hint1=hint1, hint2=hint2)
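To see that the pattern really ignores the label text, it can be run against an English and a German reply. The sketch below uses the same regular expression, but swaps the project's GeminiQuestion model for a plain dataclass named ParsedQuestion to stay self-contained. Both sample replies are made up for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class ParsedQuestion:
    question: str
    hint1: str
    hint2: str


def parse_reply(reply: str) -> ParsedQuestion:
    """Take the text after the first ': ' of each line, whatever the label says."""
    result = re.findall(r'[^:]+: ([^\n]+)', reply, re.MULTILINE)
    if len(result) != 3:
        raise ValueError(f'unexpected format: {reply}')
    return ParsedQuestion(*result)


english = 'Question: Which movie is this?\nHint 1: It is about dreams.\nHint 2: I_c_p_i_n'
german = 'Frage: Welcher Film ist das?\nHinweis 1: Es geht um Träume.\nHinweis 2: I_c_p_i_n'

print(parse_reply(english))
print(parse_reply(german))
```

Only the position of each line and the colon separator matter. A reply with more or fewer than three matching lines raises a ValueError instead of being silently misread.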

In the future, the parsing of responses will become even easier. During the Google Cloud Next ’24 conference, Google announced that Gemini 1.5 Pro is now publicly available and with that, they also announced some features including a JSON mode to have responses in JSON format. Check out this article for more details.

Apart from that, I wrapped the Gemini client into a configurable class. You can find the full implementation open-source on GitHub.

The Prompt Generator is a class which combines and renders Jinja2 template files to create a modular prompt.

There are two base templates: one for generating the question and one for evaluating the answer. Apart from that, there is a metadata template to enrich the prompt with up-to-date movie data. Furthermore, there are language and personality templates, organized in separate folders with a template file for each option.

Using Jinja2 makes advanced features like template inheritance available, which is used for the metadata.

This makes it easy to extend this component, not only with more options for personalities and languages, but also to extract it into its own open-source project to make it available for other Gemini projects.
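The modular idea can be sketched with the standard library alone. Note that this example uses string.Template as a simple stand-in for Jinja2, so it does not show template inheritance. All fragment texts and the dictionary names are made up for illustration, while the project keeps its fragments in separate Jinja2 files.

```python
from string import Template

# Hypothetical fragments, one per option.
BASE = Template('$personality\n$language\nCreate a quiz question for this movie:\n$metadata')
PERSONALITIES = {
    'default': 'You are a friendly quiz host.',
    'christmas': 'You are Santa Claus and love Christmas references.',
}
LANGUAGES = {
    'en': 'Reply in English.',
    'de': 'Reply in German.',
}
METADATA = Template('Title: $title\nRelease date: $release_date')


def generate_question_prompt(personality: str, language: str, movie: dict) -> str:
    """Combine independent fragments into one prompt."""
    return BASE.substitute(
        personality=PERSONALITIES[personality],
        language=LANGUAGES[language],
        metadata=METADATA.substitute(movie),
    )


prompt = generate_question_prompt(
    'christmas',
    'de',
    {'title': 'Example Movie', 'release_date': '2000-01-01'},
)
print(prompt)
```

Adding a new personality or language then only means adding one more fragment, without touching the base template.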

The Gemini Movie Detectives frontend is split into four main components and uses vue-router to navigate between them.

The Home component simply displays the welcome message.

The Quiz component displays the quiz itself and talks to the API via fetch. To create a quiz, it sends a POST request to api/quiz with the desired settings. The backend then selects a random movie based on the user settings, creates the prompt with the modular prompt generator, uses Gemini to generate the question and hints, and finally returns everything to the component so that the quiz can be rendered.

Additionally, each quiz gets a session ID assigned in the backend and is stored in a limited LRU cache.
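This article does not show which cache implementation the backend uses. As a minimal sketch of a size-limited LRU session store, assuming plain dictionaries as session data, an OrderedDict is enough. The class name LRUSessionCache is hypothetical, and a library such as cachetools offers a ready-made LRUCache as an alternative.

```python
import uuid
from collections import OrderedDict
from typing import Optional


class LRUSessionCache:
    """Keep at most max_size sessions, evicting the least recently used one."""

    def __init__(self, max_size: int):
        self.max_size = max_size
        self._data = OrderedDict()

    def __setitem__(self, quiz_id: str, session: dict) -> None:
        self._data[quiz_id] = session
        self._data.move_to_end(quiz_id)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # drop the least recently used entry

    def get(self, quiz_id: str) -> Optional[dict]:
        if quiz_id not in self._data:
            return None
        self._data.move_to_end(quiz_id)  # mark as recently used
        return self._data[quiz_id]

    def __len__(self) -> int:
        return len(self._data)


session_cache = LRUSessionCache(max_size=2)
ids = [str(uuid.uuid4()) for _ in range(3)]
session_cache[ids[0]] = {'question': 'first'}
session_cache[ids[1]] = {'question': 'second'}
session_cache.get(ids[0])                      # touch the first session
session_cache[ids[2]] = {'question': 'third'}  # evicts the second session
print(len(session_cache))
```

Limiting the size keeps memory usage bounded, at the price that very old quiz sessions can no longer be answered once they have been evicted.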

For debugging purposes, the third component fetches data from the api/sessions endpoint, which returns all active sessions from the cache.

The fourth component displays statistics about the service. However, so far there is only one category of data displayed, which is the quiz limit. To limit the costs for VertexAI and GCP usage in general, there is a daily limit of quiz sessions, which resets with the first quiz of the next day. Data is retrieved from the api/limit endpoint.

Of course using the frontend is a nice way to interact with the application, but it is also possible to just use the API.

The following example shows how to start a quiz via the API using the Santa Claus / Christmas personality:

curl -s -X POST https://movie-detectives.com/api/quiz \
-H 'Content-Type: application/json' \
-d '{"vote_avg_min": 5.0, "vote_count_min": 1000.0, "popularity": 3, "personality": "christmas"}' | jq .

{
  "quiz_id": "e1d298c3-fcb0-4ebe-8836-a22a51f87dc6",
  "question": {
    "question": "Ho ho ho, this movie takes place in a world of dreams, just like the dreams children have on Christmas Eve after seeing Santa Claus! It's about a team who enters people's dreams to steal their secrets. Can you guess the movie? Merry Christmas!",
    "hint1": "The main character is like a skilled elf, sneaking into people's minds instead of houses. ",
    "hint2": "I_c_p_i_n "
  },
  "movie": {...}
}
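The same request can also be prepared from Python. The sketch below uses urllib from the standard library instead of httpx to avoid extra dependencies. The helper name start_quiz is an assumption, and the actual network call is left commented out so that the snippet runs offline.

```python
import json
import urllib.request

payload = {
    'vote_avg_min': 5.0,
    'vote_count_min': 1000.0,
    'popularity': 3,
    'personality': 'christmas',
}

# Build the same POST request as the curl command above.
request = urllib.request.Request(
    url='https://movie-detectives.com/api/quiz',
    data=json.dumps(payload).encode('utf-8'),
    headers={'Content-Type': 'application/json'},
    method='POST',
)


def start_quiz() -> dict:
    """Send the request and decode the JSON response (performs a real network call)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode('utf-8'))


# Uncomment to call the live API:
# quiz = start_quiz()
# print(quiz['question']['question'])
```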
