Google Vertex AI AutoML tutorial: Data preparation



This post is the first in a two-part series introducing Vertex AI, Google's newly released integrated machine learning and deep learning platform. This post details data preparation. Check back on Monday for the second article on the training and inference process.

Google's Vertex AI is an integrated machine learning and deep learning platform that supports AutoML and custom models. In this tutorial, you use Vertex AI AutoML to train an image classification model to detect face masks. For an overview of Vertex AI, read this article published last week on The New Stack.

To complete this tutorial, you must have a valid Google Cloud subscription and the Google Cloud SDK installed on your workstation.

Training this model involves three steps: dataset creation, training, and inference.

Creating a dataset involves uploading and labeling images. Because it uses AutoML, training requires minimal intervention. You don't need to write any code or perform any steps such as hyperparameter tuning. Once trained, the model can be downloaded and deployed to edge devices or hosted to perform inference.

The first part of this tutorial focuses on creating the dataset. This tutorial uses a raw dataset of masked and unmasked faces created by Prajna Bhandary.

She used image augmentation techniques to generate more than 600 images for each class.

Although this is not the most comprehensive dataset, it is good for AutoML as it allows you to train models with fewer images.

Upload these images to a Google Cloud Storage bucket containing two folders: mask and no-mask. A CSV file listing each image's path and label is then uploaded to the same bucket and becomes the input to Vertex AI.

Let's create a Google Cloud Storage bucket.
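A minimal sketch of the bucket creation, assuming a hypothetical bucket name and the us-central1 region:

```shell
# Hypothetical bucket name and region -- substitute your own.
BUCKET="mask-detection-ds"
REGION="us-central1"

# Create the bucket in a region where Vertex AI AutoML is available.
gsutil mb -l $REGION gs://$BUCKET
```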

Feel free to change the values to reflect your bucket name and region. At the time of writing, Vertex AI AutoML is available only in the us-central1 (Iowa) and europe-west4 (Netherlands) regions.

Next, upload the images to the bucket you just created.

Clone the GitHub repository on your local machine.
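The source does not name the repository, so the URL below is a placeholder; substitute the actual repository that hosts the dataset:

```shell
# Placeholder URL -- replace with the repository hosting the dataset.
REPO_URL="https://github.com/<user>/face-mask-dataset.git"
git clone "$REPO_URL"
```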

Change to the data directory and run the following command:
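Assuming the repository's local folders are named with_mask and without_mask (the folder names are an assumption), the images can be copied into the bucket's mask and no-mask folders like this:

```shell
# Hypothetical bucket name -- use the bucket created earlier.
BUCKET="mask-detection-ds"

# -m parallelizes the copy; run each line in its own terminal window
# to upload both classes at the same time.
gsutil -m cp -r with_mask/* gs://$BUCKET/mask/
gsutil -m cp -r without_mask/* gs://$BUCKET/no-mask/
```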

To upload images from both directories at the same time, run the command in two different terminal windows.

In the Google Cloud Console, browse to the bucket's folders to confirm the uploads.

Once the images are uploaded, you need to generate a CSV file containing the path and label for each image.

You can use a simple Bash command for this task.
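One way to build the CSV, assuming the bucket layout above, is to list the uploaded mask images and append the label to each path:

```shell
# Hypothetical bucket name -- use the bucket created earlier.
BUCKET="mask-detection-ds"

# List every object under mask/ and append the ",mask" label to each path.
gsutil ls gs://$BUCKET/mask/ | sed 's/$/,mask/' > mask-ds.csv
```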

This creates a file named mask-ds.csv with one path,label entry for each masked image.

Let's repeat this for the second folder to generate the unmasked path and label.
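The same pattern, with the label changed and the output appended to the existing file, covers the no-mask folder:

```shell
# Hypothetical bucket name -- use the bucket created earlier.
BUCKET="mask-detection-ds"

# Append the unmasked image paths, labeled "no-mask", to the same CSV.
gsutil ls gs://$BUCKET/no-mask/ | sed 's/$/,no-mask/' >> mask-ds.csv
```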

This appends a line to the CSV file for each unmasked image's path.

Finally, you need to upload the CSV file to your bucket.
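A single gsutil cp pushes the finished CSV to the bucket root (bucket name assumed):

```shell
# Hypothetical bucket name -- use the bucket created earlier.
BUCKET="mask-detection-ds"

# Copy the combined CSV file to the root of the bucket.
gsutil cp mask-ds.csv gs://$BUCKET/
```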

The CSV file is the key input to Vertex AI AutoML to create the final dataset.

Run gsutil ls gs://$BUCKET to verify that the CSV file was successfully uploaded to your Google Cloud Storage bucket.

Once you've uploaded your data to cloud storage, it's time to convert it into a Vertex AI dataset.

Access the Vertex AI dashboard in the Google Cloud Console and enable the API. Select a region and click Create dataset.

Name your dataset, select Image Classification with Single Label, and click Create.

In the next section, select Import files from Cloud Storage.

Browse to your Cloud Storage bucket, select the CSV file you uploaded earlier, and click Continue.

The import process will take a few minutes. Once complete, you will be taken to the next page where you will see all the images (both labeled and unlabeled) identified from your dataset.

You may receive warnings or errors during the import process because Vertex AI detected duplicate images. You can safely ignore these.

Now you're ready to start training. The next part of this series walks through the training and inference process.
