Google Vertex AI AutoML tutorial: Data preparation



This post is the first in a two-part series introducing Vertex AI, Google's newly released integrated machine learning and deep learning platform. This post details data preparation. Check back on Monday for the second article on the training and inference process.

Google's Vertex AI is an integrated machine learning and deep learning platform that supports AutoML and custom models. In this tutorial, you use Vertex AI AutoML to train an image classification model to detect face masks. For an overview of Vertex AI, read this article published last week on The New Stack.

To complete this tutorial, you must have a valid Google Cloud subscription and the Google Cloud SDK installed on your workstation.

Training this model involves three steps: dataset creation, training, and inference.

Creating a dataset involves uploading and labeling images. Because it uses AutoML, training requires minimal intervention. You don't need to write any code or perform any steps such as hyperparameter tuning. Once trained, the model can be downloaded and deployed to edge devices or hosted to perform inference.

The first part of this tutorial focuses on creating the dataset. This tutorial uses a raw dataset of masked and unmasked faces created by Prajna Bhandary.

She used image augmentation techniques to generate more than 600 images for each class.

Although this is not the most comprehensive dataset, it is good for AutoML as it allows you to train models with fewer images.

Upload these images to a Google Cloud Storage bucket containing two folders: mask and no-mask. A CSV file listing each image's path and label is then uploaded to the same bucket and becomes the input to Vertex AI.

Let's create a Google Cloud Storage bucket.
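A minimal sketch of the bucket creation, assuming a hypothetical bucket name and the us-central1 region:

```shell
# Hypothetical bucket name and region -- substitute your own.
BUCKET="mask-detection-ds"
REGION="us-central1"

# Create the bucket in a region where Vertex AI AutoML is available.
gsutil mb -l $REGION gs://$BUCKET
```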

Feel free to change the values to reflect your bucket name and region. At the time of writing, Vertex AI AutoML is available only in the us-central1 (Iowa) and europe-west4 (Netherlands) regions.

Next, upload the images to the bucket you just created.

Clone the GitHub repository on your local machine.
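The source does not name the repository, so the URL below is a placeholder; substitute the actual repository that hosts the dataset:

```shell
# Placeholder URL -- replace with the repository hosting the dataset.
REPO_URL="https://github.com/<user>/face-mask-dataset.git"
git clone "$REPO_URL"
```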

Change to the data directory and run the following command:
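Assuming the repository's local folders are named with_mask and without_mask (the folder names are an assumption), the images can be copied into the bucket's mask and no-mask folders like this:

```shell
# Hypothetical bucket name -- use the bucket created earlier.
BUCKET="mask-detection-ds"

# -m parallelizes the copy; run each line in its own terminal window
# to upload both classes at the same time.
gsutil -m cp -r with_mask/* gs://$BUCKET/mask/
gsutil -m cp -r without_mask/* gs://$BUCKET/no-mask/
```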

To upload images from both directories at the same time, run the command in two different terminal windows.

In the Google Cloud Console, browse to the bucket's folders to confirm the uploads.

Once the images are uploaded, you need to generate a CSV file containing the path and label for each image.

You can use a simple Bash command for this task.
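One way to build the CSV, assuming the bucket layout above, is to list the uploaded mask images and append the label to each path:

```shell
# Hypothetical bucket name -- use the bucket created earlier.
BUCKET="mask-detection-ds"

# List every object under mask/ and append the ",mask" label to each path.
gsutil ls gs://$BUCKET/mask/ | sed 's/$/,mask/' > mask-ds.csv
```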

This creates a file named mask-ds.csv with one path,label entry for each masked image.

Let's repeat this for the second folder to generate the unmasked path and label.
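The same pattern, with the label changed and the output appended to the existing file, covers the no-mask folder:

```shell
# Hypothetical bucket name -- use the bucket created earlier.
BUCKET="mask-detection-ds"

# Append the unmasked image paths, labeled "no-mask", to the same CSV.
gsutil ls gs://$BUCKET/no-mask/ | sed 's/$/,no-mask/' >> mask-ds.csv
```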

This appends a line to the CSV file for each unmasked image's path.

Finally, you need to upload the CSV file to your bucket.
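A single gsutil cp pushes the finished CSV to the bucket root (bucket name assumed):

```shell
# Hypothetical bucket name -- use the bucket created earlier.
BUCKET="mask-detection-ds"

# Copy the combined CSV file to the root of the bucket.
gsutil cp mask-ds.csv gs://$BUCKET/
```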

The CSV file is the key input to Vertex AI AutoML to create the final dataset.

Run gsutil ls gs://$BUCKET to verify that the CSV file was successfully uploaded to your Google Cloud Storage bucket.

Once you've uploaded your data to cloud storage, it's time to convert it into a Vertex AI dataset.

Access the Vertex AI dashboard in the Google Cloud Console and enable the API. Select a region and click Create dataset.

Name your dataset, select Image Classification with Single Label, and click Create.

In the next section, select Import files from Cloud Storage.

Browse to your Cloud Storage bucket, select the CSV file you uploaded earlier, and click Continue.

The import process will take a few minutes. Once complete, you will be taken to the next page where you will see all the images (both labeled and unlabeled) identified from your dataset.

You may receive warnings or errors during the import process because Vertex AI detected duplicate images. You can safely ignore these.

Now you're ready to start training. The next part of this series walks through the training and inference process.
