Train self-supervised vision transformers on overhead imagery with Amazon SageMaker

This is a guest blog post co-written with Ben Veasey, Jeremy Anderson, Jordan Knight, and June Li from Travelers.

Satellite and aerial images provide insight into a wide range of problems, including precision agriculture, insurance risk assessment, urban development, and disaster response. Training machine learning (ML) models to interpret this data, however, is bottlenecked by costly and time-consuming human annotation efforts. One way to overcome this challenge is through self-supervised learning (SSL). By training on large quantities of unlabeled image data, self-supervised models learn image representations that can be transferred to downstream tasks, such as image classification or segmentation. This approach produces image representations that generalize well to unseen data and reduces the amount of labeled data required to build performant downstream models.

In this post, we demonstrate how to train self-supervised vision transformers on overhead imagery using Amazon SageMaker. Travelers collaborated with the Amazon Machine Learning Solutions Lab (now known as the Generative AI Innovation Center) to develop this framework to support and enhance aerial imagery model use cases. Our solution is based on the DINO algorithm and uses the SageMaker distributed data parallel library (SMDDP) to split the data over multiple GPU instances. When pre-training is complete, the DINO image representations can be transferred to a variety of downstream tasks. This initiative led to improved model performance across the Travelers Data & Analytics space.

Overview of solution

The two-step process for pre-training vision transformers and transferring them to supervised downstream tasks is shown in the following diagram.

In the following sections, we provide a walkthrough of the solution using satellite images from the BigEarthNet-S2 dataset. We build on the code provided in the DINO repository.

Prerequisites

Before getting started, you need access to a SageMaker notebook instance and an Amazon Simple Storage Service (Amazon S3) bucket.

Prepare the BigEarthNet-S2 dataset

BigEarthNet-S2 is a benchmark archive that contains 590,325 multispectral images collected by the Sentinel-2 satellite. The images document the land cover, or physical surface features, of ten European countries between June 2017 and May 2018. The types of land cover in each image, such as pastures or forests, are annotated according to 19 labels. The following are a few example RGB images and their labels.

The first step in our workflow is to prepare the BigEarthNet-S2 dataset for DINO training and evaluation. We start by downloading the dataset from the terminal of our SageMaker notebook instance:

wget https://bigearth.net/downloads/BigEarthNet-S2-v1.0.tar.gz
tar -xvf BigEarthNet-S2-v1.0.tar.gz

The dataset has a size of about 109 GB. Each image is stored in its own folder and contains 12 spectral channels. Two bands with 60m spatial resolution (60-meter pixel height/width) are designed to identify aerosols (B01) and water vapor (B09); the Sentinel-2 cirrus band (B10) is not included in BigEarthNet-S2. Six bands with 20m spatial resolution are used to identify vegetation (B05, B06, B07, B8A) and distinguish between snow, ice, and clouds (B11, B12). Four bands with 10m spatial resolution help capture visible and near-infrared light (B02, B03, B04, B08). Additionally, each folder contains a JSON file with the image metadata. A detailed description of the data is provided in the BigEarthNet Guide.
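Because the bands come at different resolutions, only bands that share a resolution can be stacked pixel-for-pixel without resampling. The following sketch records the resolution of a subset of the bands described above (B08 and B10 are left out for brevity) and checks that the three true color bands used later in this post share the 10m grid. The lookup table and helper names are illustrative, not part of the BigEarthNet tooling.

```python
from collections import defaultdict

# Spatial resolution (meters) for a subset of the Sentinel-2 bands described above.
# Illustrative lookup table; B08 and B10 are intentionally omitted.
BAND_RESOLUTION_M = {
    "B01": 60, "B09": 60,
    "B05": 20, "B06": 20, "B07": 20, "B8A": 20, "B11": 20, "B12": 20,
    "B02": 10, "B03": 10, "B04": 10,
}

RGB_BANDS = ("B04", "B03", "B02")  # red, green, blue


def bands_by_resolution(table):
    """Group band names by spatial resolution, finest resolution first."""
    groups = defaultdict(list)
    for band, resolution in table.items():
        groups[resolution].append(band)
    return {res: sorted(groups[res]) for res in sorted(groups)}


def can_stack_without_resampling(bands, table=BAND_RESOLUTION_M):
    """True if every requested band has the same resolution (same pixel grid)."""
    return len({table[band] for band in bands}) == 1


if __name__ == "__main__":
    print(bands_by_resolution(BAND_RESOLUTION_M))
    print(can_stack_without_resampling(RGB_BANDS))       # True
    print(can_stack_without_resampling(("B04", "B8A")))  # False
```

Mixing resolutions (for example, adding a 20m vegetation band to the RGB stack) would require upsampling or downsampling one of the bands first.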

To perform statistical analyses of the data and load images during DINO training, we process the individual metadata files into a single geopandas Parquet file. This can be done using the BigEarthNet Common and the BigEarthNet GDF Builder helper packages:

python -m bigearthnet_gdf_builder.builder build-recommended-s2-parquet BigEarthNet-v1.0/

The resulting metadata file contains the recommended image set, which excludes 71,042 images that are fully covered by seasonal snow, clouds, and cloud shadows. It also contains information on the acquisition date, location, land cover, and train, validation, and test split for each image.
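Before training, it can help to sanity-check the metadata file. The sketch below assumes the column names that the dataset class later in this post relies on (`name`, `original_split`, and `new_labels`, where `new_labels` holds a list of land cover names per image); the `summarize_metadata` helper is hypothetical. It counts images per split and how often each land cover label occurs, using a tiny in-memory table in place of the real Parquet file.

```python
import pandas as pd


def summarize_metadata(metadata: pd.DataFrame):
    """Return (images per split, label frequency) for a BigEarthNet-style table.

    Assumes columns `original_split` (train/validation/test) and
    `new_labels` (a list of land cover label names per image).
    """
    split_counts = metadata["original_split"].value_counts()
    # explode() turns each list of labels into one row per label
    label_counts = metadata["new_labels"].explode().value_counts()
    return split_counts, label_counts


if __name__ == "__main__":
    # In practice: metadata = pd.read_parquet("final_ben_s2.parquet")
    metadata = pd.DataFrame({
        "name": ["patch_a", "patch_b", "patch_c"],
        "original_split": ["train", "train", "test"],
        "new_labels": [["Arable land", "Pastures"], ["Pastures"], ["Inland waters"]],
    })
    splits, labels = summarize_metadata(metadata)
    print(splits.to_dict())  # {'train': 2, 'test': 1}
    print(labels.to_dict())
```

Label frequencies are typically imbalanced in land cover data, which is one reason a ranking metric such as average precision is a reasonable choice for the downstream evaluation.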

We store the BigEarthNet-S2 images and metadata file in an S3 bucket. Because we use true color images during DINO training, we only upload the red (B04), green (B03), and blue (B02) bands:

aws s3 cp final_ben_s2.parquet s3://bigearthnet-s2-dataset/metadata/
aws s3 cp BigEarthNet-v1.0/ s3://bigearthnet-s2-dataset/data_rgb/ \
    --recursive \
    --exclude "*" \
    --include "*_B02.tif" \
    --include "*_B03.tif" \
    --include "*_B04.tif"
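The `--exclude "*"` followed by `--include` filters works because the AWS CLI includes every file by default and applies filters in order, with later filters taking precedence over earlier ones. The following standard-library sketch approximates that rule with `fnmatch` so you can preview which local files the upload command would copy; it is a simplified emulation, not the CLI's actual implementation, and `would_upload` is a hypothetical helper.

```python
from fnmatch import fnmatch

# (action, pattern) pairs in the same order as on the command line
RGB_FILTERS = [
    ("exclude", "*"),
    ("include", "*_B02.tif"),
    ("include", "*_B03.tif"),
    ("include", "*_B04.tif"),
]


def would_upload(path: str, filters=RGB_FILTERS) -> bool:
    """Approximate AWS CLI filter semantics.

    Every file starts out included; each matching filter overrides the
    previous decision, so the last matching filter wins.
    """
    included = True
    for action, pattern in filters:
        if fnmatch(path, pattern):
            included = action == "include"
    return included


if __name__ == "__main__":
    patch = "S2A_MSIL2A_20170613T101031_0_45"
    for band in ("B02", "B03", "B04", "B08"):
        path = f"{patch}/{patch}_{band}.tif"
        print(path, would_upload(path))
    print(would_upload(f"{patch}/{patch}_labels_metadata.json"))  # False
```

Reversing the order (includes first, then `--exclude "*"`) would exclude everything, because the exclude would be the last matching filter for every file.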

The dataset is approximately 48 GB in size and has the following structure:

bigearthnet-s2-dataset/                                   Amazon S3 bucket
├── metadata/
│   └── final_ben_s2.parquet
└── data_rgb/
    ├── S2A_MSIL2A_20170613T101031_0_45/
    │   ├── S2A_MSIL2A_20170613T101031_0_45_B02.tif       Blue channel
    │   ├── S2A_MSIL2A_20170613T101031_0_45_B03.tif       Green channel
    │   └── S2A_MSIL2A_20170613T101031_0_45_B04.tif       Red channel
    └── ...

Train DINO models with SageMaker

Now that our dataset has been uploaded to Amazon S3, we move on to training DINO models on BigEarthNet-S2. As shown in the following figure, the DINO algorithm passes different global and local crops of an input image to student and teacher networks. The student network is trained to match the output of the teacher network by minimizing the cross-entropy loss. The student and teacher weights are connected by an exponential moving average (EMA): the teacher is not updated by gradient descent, and its weights instead track an EMA of the student's weights.
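To make the objective concrete, here is a framework-free NumPy sketch of the pieces described above: the teacher's outputs are centered and sharpened with a low temperature, the loss is the cross-entropy between teacher and student distributions over every pair of different views (the teacher sees only global crops), and the teacher weights follow an EMA of the student weights. The temperature and momentum defaults are assumed from the public DINO code, and the function names are ours; the repository's actual implementation operates on PyTorch tensors from the projection heads and synchronizes the center across GPUs.

```python
import numpy as np


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def log_softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))


def dino_loss(student_out, teacher_out, center, student_temp=0.1, teacher_temp=0.04):
    """Average cross-entropy between teacher and student over pairs of different views.

    student_out: list of (batch, K) logits, global crops first, then local crops.
    teacher_out: list of (batch, K) logits, one per global crop.
    """
    total, n_terms = 0.0, 0
    for t_idx, t in enumerate(teacher_out):
        # center, then sharpen with a low temperature (teacher probs are constants)
        p_t = softmax((t - center) / teacher_temp)
        for s_idx, s in enumerate(student_out):
            if s_idx == t_idx:
                continue  # skip the pair where both networks see the same crop
            ce = np.sum(-p_t * log_softmax(s / student_temp), axis=-1)
            total += ce.mean()
            n_terms += 1
    return total / n_terms


def update_center(center, teacher_out, momentum=0.9):
    """EMA of the mean teacher output, used to prevent collapse."""
    batch_mean = np.concatenate(teacher_out).mean(axis=0)
    return momentum * center + (1 - momentum) * batch_mean


def ema_update(teacher_params, student_params, momentum=0.996):
    """Teacher weights track an exponential moving average of the student weights."""
    return {k: momentum * teacher_params[k] + (1 - momentum) * student_params[k]
            for k in teacher_params}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, B = 8, 4
    student = [rng.normal(size=(B, K)) for _ in range(4)]  # 2 global + 2 local crops
    teacher = [rng.normal(size=(B, K)) for _ in range(2)]  # 2 global crops
    center = np.zeros(K)
    print(dino_loss(student, teacher, center))
    center = update_center(center, teacher)
```

Only the student receives gradients from this loss; the EMA step is what moves the teacher, which is why the loss curve reported later reflects the two networks' outputs becoming more similar.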

We make two modifications to the original DINO code. First, we create a custom PyTorch dataset class to load the BigEarthNet-S2 images. The code was originally written to process ImageNet data and expects images to be stored by class. BigEarthNet-S2, however, is a multi-label dataset where each image resides in its own subfolder. Our dataset class loads each image using the file path stored in the metadata:

import os

import numpy as np
import pandas as pd
import rasterio
import torch
from PIL import Image
from torch.utils.data import Dataset

# raw Sentinel-2 values at or above this are mapped to full brightness (255)
OPTICAL_MAX_VALUE = 2000

LAND_COVER_LABELS = [
    "Urban fabric",
    "Industrial or commercial units",
    "Arable land",
    "Permanent crops",
    "Pastures",
    "Complex cultivation patterns",
    "Land principally occupied by agriculture, with significant areas of natural vegetation",
    "Agro-forestry areas",
    "Broad-leaved forest",
    "Coniferous forest",
    "Mixed forest",
    "Natural grassland and sparsely vegetated areas",
    "Moors, heathland and sclerophyllous vegetation",
    "Transitional woodland, shrub",
    "Beaches, dunes, sands",
    "Inland wetlands",
    "Coastal wetlands",
    "Inland waters",
    "Marine waters",
]


class BigEarthNetDataset(Dataset):
    """
    PyTorch dataset class that loads the BigEarthNet-S2 images from a metadata file.

    Args:
        metadata_file: path to metadata file
        data_dir: directory where BigEarthNet-S2 data is located
        split: train, validation, or test split
        transform: transformations applied to the input image
    """
    def __init__(self, metadata_file, data_dir, split="train", transform=None):
        # image file paths from metadata
        metadata = pd.read_parquet(metadata_file)
        self.metadata_split = metadata[metadata["original_split"] == split]
        self.data_dir = data_dir
        self.patch_names = self.metadata_split["name"].tolist()

        # one-hot-encode land cover labels
        multiclass_labels = self.metadata_split.new_labels.tolist()
        self.labels = self.get_multi_onehot_labels(multiclass_labels)

        # transforms
        self.transform = transform

    def __len__(self):
        """Return length of dataset."""
        return len(self.metadata_split)

    def __getitem__(self, index):
        """Returns the image and label for a given index."""
        patch_name = self.patch_names[index]
        file_path = os.path.join(self.data_dir, patch_name)

        # generate RGB image from the red (B04), green (B03), and blue (B02) bands
        channels = []
        for band in ("B04", "B03", "B02"):
            with rasterio.open(os.path.join(file_path, f"{patch_name}_{band}.tif")) as src:
                channels.append(src.read(1))

        image = np.stack(channels, axis=2)
        image = image / OPTICAL_MAX_VALUE * 255
        image = np.clip(image, 0, 255).astype(np.uint8)

        # apply image transforms (an (H, W, 3) uint8 array is read as RGB)
        image = Image.fromarray(image)
        if self.transform is not None:
            image = self.transform(image)

        # load label
        label = self.labels[index]

        return image, label

    def get_multi_onehot_labels(self, multiclass_labels):
        """Convert BEN-19 labels to one-hot encoded vector."""
        targets = torch.zeros([len(multiclass_labels), len(LAND_COVER_LABELS)])
        for index, img_labels in enumerate(multiclass_labels):
            for label in img_labels:
                index_hot = LAND_COVER_LABELS.index(label)
                targets[index, index_hot] = 1.
        return targets

This dataset class is called in main_dino.py during training. Although the code includes a function to one-hot encode the land cover labels, these labels are not used by the DINO algorithm.
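The dataset class turns raw Sentinel-2 values into an 8-bit RGB image by dividing by `OPTICAL_MAX_VALUE` (2000), rescaling to the range 0 to 255, and clipping, so anything brighter than 2000 saturates. The following NumPy sketch isolates that conversion as a standalone function (`to_uint8_rgb` is a hypothetical name) so its behavior can be checked on a tiny synthetic band.

```python
import numpy as np

OPTICAL_MAX_VALUE = 2000  # same constant as in the dataset class


def to_uint8_rgb(red, green, blue, max_value=OPTICAL_MAX_VALUE):
    """Stack three single-band arrays into an (H, W, 3) uint8 image.

    Values are scaled linearly so that `max_value` maps to 255; brighter
    values are clipped to 255. astype truncates fractional values toward zero.
    """
    image = np.stack([red, green, blue], axis=2).astype(np.float32)
    image = image / max_value * 255
    return np.clip(image, 0, 255).astype(np.uint8)


if __name__ == "__main__":
    band = np.array([[0, 1000], [2000, 4000]], dtype=np.uint16)
    rgb = to_uint8_rgb(band, band, band)
    print(rgb.shape)              # (2, 2, 3)
    print(rgb[..., 0].tolist())   # [[0, 127], [255, 255]]
```

Clipping before the `uint8` cast matters: without it, a scaled value such as 510 would wrap around instead of saturating at 255.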

The second change we make to the DINO code is to add support for SMDDP. We add the following code to the init_distributed_mode function in the util.py file:

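import json
import os

import torch.distributed as dist


def init_distributed_mode(args):
    if json.loads(
        os.environ.get('SM_FRAMEWORK_PARAMS', '{}')
    ).get('sagemaker_distributed_dataparallel_enabled', False):
        # importing this module registers the "smddp" backend with torch.distributed
        import smdistributed.dataparallel.torch.torch_smddp  # noqa: F401

        # launch training with SMDDP
        dist.init_process_group(backend='smddp')
        args.world_size = dist.get_world_size()
        args.rank = dist.get_rank()
        args.gpu = int(os.environ['LOCAL_RANK'])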
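The check above reads the `SM_FRAMEWORK_PARAMS` environment variable, which holds the framework parameters of the SageMaker training container as a JSON string. The following sketch isolates that condition as a pure function so it can be exercised off SageMaker, for example in a unit test; `smddp_enabled` is a hypothetical name and simply mirrors the logic shown above.

```python
import json
import os
from typing import Mapping, Optional


def smddp_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if SM_FRAMEWORK_PARAMS marks SMDDP as enabled.

    Mirrors the condition in init_distributed_mode: a missing variable is
    treated as an empty JSON object, which means SMDDP is disabled.
    """
    env = os.environ if environ is None else environ
    params = json.loads(env.get("SM_FRAMEWORK_PARAMS", "{}"))
    return bool(params.get("sagemaker_distributed_dataparallel_enabled", False))


if __name__ == "__main__":
    fake_env = {"SM_FRAMEWORK_PARAMS": json.dumps(
        {"sagemaker_distributed_dataparallel_enabled": True})}
    print(smddp_enabled(fake_env))  # True
    print(smddp_enabled({}))        # False
```

Passing the environment in explicitly keeps the function testable without mutating `os.environ`.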

With these changes, we are ready to train DINO models on BigEarthNet-S2 using SageMaker. To train on multiple GPUs or instances, we create a SageMaker PyTorch Estimator that ingests the DINO training script, the image and metadata file paths, and the training hyperparameters:

import time

import sagemaker
from sagemaker.pytorch import PyTorch

sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()

# output bucket where final model artifacts are uploaded
DINO_OUTPUT_BUCKET = 'dino-models'

# paths on training instance
sm_metadata_path = "/opt/ml/input/data/metadata"
sm_data_path = "/opt/ml/input/data/train"
sm_output_path = "/opt/ml/output/data"
sm_checkpoint_path = "/opt/ml/checkpoints"

# training job name
dino_base_job_name = f'dino-model-{int(time.time())}'

# create SageMaker Estimator
estimator = PyTorch(
    base_job_name=dino_base_job_name,
    source_dir="path/to/aerial_featurizer",
    entry_point="main_dino.py",
    role=role,
    framework_version="1.12",
    py_version="py38",
    instance_count=1,
    instance_type="ml.p3.16xlarge",
    distribution={'smdistributed': {'dataparallel': {'enabled': True}}},
    volume_size=100,
    sagemaker_session=sagemaker_session,
    hyperparameters={
        # hyperparameters passed to entry point script
        'arch': 'vit_small',
        'patch_size': 16,
        'metadata_dir': sm_metadata_path,
        'data_dir': sm_data_path,
        'output_dir': sm_output_path,
        'checkpoint_dir': sm_checkpoint_path,
        'epochs': 100,
        'saveckp_freq': 20,
    },
    max_run=24*60*60,
    checkpoint_local_path=sm_checkpoint_path,
    checkpoint_s3_uri=f's3://{DINO_OUTPUT_BUCKET}/checkpoints/{dino_base_job_name}',
    debugger_hook_config=False,
)
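In SageMaker script mode, the `hyperparameters` dictionary reaches the entry point as command-line arguments (for example, `--arch vit_small --patch_size 16`), and each input channel passed to `fit` is made available under `/opt/ml/input/data/<channel_name>`. The following sketch shows how an entry point could parse the arguments used above; the parser is illustrative only and is not the argument parser from `main_dino.py`, which defines many more options.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical subset of the arguments passed by the estimator above."""
    parser = argparse.ArgumentParser(description="DINO pre-training (sketch)")
    parser.add_argument("--arch", default="vit_small")
    parser.add_argument("--patch_size", type=int, default=16)
    parser.add_argument("--epochs", type=int, default=100)
    parser.add_argument("--saveckp_freq", type=int, default=20)
    # defaults match the channel and output paths used in the estimator
    parser.add_argument("--metadata_dir", default="/opt/ml/input/data/metadata")
    parser.add_argument("--data_dir", default="/opt/ml/input/data/train")
    parser.add_argument("--output_dir", default="/opt/ml/output/data")
    parser.add_argument("--checkpoint_dir", default="/opt/ml/checkpoints")
    return parser


if __name__ == "__main__":
    # SageMaker would supply these from the estimator's hyperparameters
    argv = ["--arch", "vit_small", "--patch_size", "16", "--epochs", "100"]
    # parse_known_args tolerates extra arguments this sketch does not define
    args, unknown = build_parser().parse_known_args(argv)
    print(args.arch, args.patch_size, args.epochs)  # vit_small 16 100
```

Keeping the channel names (`metadata`, `train`) in sync with the `*_dir` defaults is what lets the same script run unchanged once `fit` mounts the data.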

This code specifies that we will train a small vision transformer model (21 million parameters) with a patch size of 16 for 100 epochs. It is best practice to create a new checkpoint_s3_uri for each training job: SageMaker copies any existing contents of that S3 location to the instance when the job starts, so a fresh location reduces the initial data download time. Because we are using SMDDP, we must train on an ml.p3.16xlarge, ml.p3dn.24xlarge, or ml.p4d.24xlarge instance, because SMDDP is only enabled for the largest multi-GPU instances. To train on smaller instance types without SMDDP, you will need to remove the distribution and debugger_hook_config arguments from the estimator.
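The instance constraint can be encoded in the notebook so that switching instance types does not silently break the job. The sketch below returns the SMDDP-specific estimator arguments only for the three instance types listed in this post and omits `distribution` and `debugger_hook_config` otherwise. The helper name is hypothetical, and the supported-instance set reflects this post and may change over time.

```python
# Instance types this post lists as supporting SMDDP
SMDDP_INSTANCE_TYPES = {"ml.p3.16xlarge", "ml.p3dn.24xlarge", "ml.p4d.24xlarge"}


def distribution_kwargs(instance_type: str) -> dict:
    """Extra PyTorch Estimator arguments for SMDDP, or {} if unsupported."""
    if instance_type in SMDDP_INSTANCE_TYPES:
        return {
            "distribution": {"smdistributed": {"dataparallel": {"enabled": True}}},
            "debugger_hook_config": False,
        }
    # smaller instances: train without SMDDP and keep the default debugger config
    return {}


# Usage in the notebook (other estimator arguments as shown earlier):
# estimator = PyTorch(..., instance_type=instance_type,
#                     **distribution_kwargs(instance_type))
```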

After we have created the SageMaker PyTorch Estimator, we launch the training job by calling the fit method. We specify the input training data using the Amazon S3 URIs for the BigEarthNet-S2 metadata and images:

# call fit to begin training
estimator.fit(
    inputs={
        'metadata': 's3://bigearthnet-s2-dataset/metadata/',
        'train': 's3://bigearthnet-s2-dataset/data_rgb/',
    },
    wait=False
)

SageMaker spins up the instance, copies the training script and dependencies, and begins DINO training. We can monitor the progress of the training job from our Jupyter notebook using the following commands:

# monitor training
training_job_name = estimator.latest_training_job.name
attached_estimator = PyTorch.attach(training_job_name)
attached_estimator.logs()

We can also monitor instance metrics and view log files on the SageMaker console under Training jobs. In the following figures, we plot the GPU utilization and loss function for a DINO model trained on an ml.p3.16xlarge instance with a batch size of 128.

During training, the GPU utilization is 83% of the ml.p3.16xlarge capacity (8 NVIDIA Tesla V100 GPUs) and the VRAM usage is 85%. The loss function steadily decreases with each epoch, indicating that the outputs of the student and teacher networks are becoming more similar. In total, training takes about 11 hours.

Transfer learning to downstream tasks

Our trained DINO model can be transferred to downstream tasks like image classification or segmentation. In this section, we use the pre-trained DINO features to predict the land cover classes for images in the BigEarthNet-S2 dataset. As depicted in the following diagram, we train a multi-label linear classifier on top of frozen DINO features. In this example, the input image is associated with arable land and pasture land covers.

Most of the code for the linear classifier is already in place in the original DINO repository. We make a few adjustments for our specific task. As before, we use the custom BigEarthNet dataset to load images during training and evaluation. The labels for each image are encoded as a 19-dimensional multi-hot binary vector (a 1 for every land cover class present). We use binary cross-entropy as the loss function and compute the average precision to evaluate the performance of the model.
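As a reference for the loss and metric described above, the following NumPy sketch computes the multi-label binary cross-entropy from logits and the per-class average precision (the precision at the rank of each positive image, averaged over positives, ignoring ties), then macro-averages across classes. The post does not state whether its reported numbers are macro- or micro-averaged, so the macro average here is an assumption; in practice, library implementations such as `torch.nn.BCEWithLogitsLoss` and scikit-learn's `average_precision_score` would typically be used.

```python
import numpy as np


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over all images and labels (numerically stable form).

    max(x, 0) - x*y + log(1 + exp(-|x|)) equals -[y log s(x) + (1 - y) log(1 - s(x))].
    """
    loss = np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))
    return float(loss.mean())


def average_precision(scores, labels):
    """AP for one class: mean precision at the rank of each positive (ties ignored)."""
    order = np.argsort(-scores, kind="stable")
    hits = labels[order]
    if hits.sum() == 0:
        return float("nan")  # undefined for a class with no positives
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float((precision_at_k * hits).sum() / hits.sum())


def macro_average_precision(scores, labels):
    """Average AP over label columns, skipping classes with no positives."""
    aps = [average_precision(scores[:, j], labels[:, j]) for j in range(labels.shape[1])]
    return float(np.nanmean(aps))


if __name__ == "__main__":
    # 4 images, 2 land cover classes
    scores = np.array([[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.4]])
    labels = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
    print(macro_average_precision(scores, labels))  # (5/6 + 1) / 2
    print(bce_with_logits(np.zeros((2, 3)), np.ones((2, 3))))  # log(2)
```

Average precision depends only on how images are ranked per class, so it does not require choosing a decision threshold for the sigmoid outputs.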

To train the classifier, we create a SageMaker PyTorch Estimator that runs the training script, eval_linear.py. The training hyperparameters include the details of the DINO model architecture and the file path for the model checkpoint:

# reuses role, sagemaker_session, and the sm_* paths defined for the DINO job

# output bucket where final model artifacts are uploaded
CLASSIFIER_OUTPUT_BUCKET = 'land-cover-classification'

# DINO checkpoint name
checkpoint = "checkpoint.pth"

# paths on training instance
sm_dino_path = '/opt/ml/input/data/dino_checkpoint'
sm_dino_checkpoint = f'{sm_dino_path}/{checkpoint}'

# training job name
classifier_base_job_name = f'linear-classifier-{int(time.time())}'

# create Estimator
estimator = PyTorch(
    base_job_name=classifier_base_job_name,
    source_dir="path/to/aerial_featurizer",
    entry_point="eval_linear.py",
    role=role,
    framework_version='1.12',
    py_version='py38',
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    sagemaker_session=sagemaker_session,
    hyperparameters={
        # hyperparameters passed to entry point script
        'arch': 'vit_small',
        'pretrained_weights': sm_dino_checkpoint,
        'epochs': 50,
        'data_dir': sm_data_path,
        'metadata_dir': sm_metadata_path,
        'output_dir': sm_checkpoint_path,
        'num_labels': 19,
    },
    max_run=1*60*60,
    checkpoint_local_path=sm_checkpoint_path,
    checkpoint_s3_uri=f's3://{CLASSIFIER_OUTPUT_BUCKET}/checkpoints/{classifier_base_job_name}',
)

We start the training job using the fit method, supplying the Amazon S3 locations of the BigEarthNet-S2 metadata and training images and the DINO model checkpoint:

# call fit to begin training
estimator.fit(
    inputs={
        'metadata': 's3://bigearthnet-s2-dataset/metadata/',
        # channel name "train" is mounted at sm_data_path (/opt/ml/input/data/train)
        'train': 's3://bigearthnet-s2-dataset/data_rgb/',
        # checkpoints written by the DINO training job
        'dino_checkpoint': f's3://{DINO_OUTPUT_BUCKET}/checkpoints/{dino_base_job_name}',
    },
    wait=False
)

When training is complete, we can perform inference on the BigEarthNet-S2 test set using SageMaker batch transform or SageMaker Processing. In the following table, we compare the average precision of the linear model on test set images using two different DINO image representations. The first model, ViT-S/16 (ImageNet), is the small vision transformer checkpoint included in the DINO repository that was pre-trained using front-facing images in the ImageNet dataset. The second model, ViT-S/16 (BigEarthNet-S2), is the model we produced by pre-training on overhead imagery.

Model                          Average precision
ViT-S/16 (ImageNet)            0.685
ViT-S/16 (BigEarthNet-S2)      0.732

We find that the DINO model pre-trained on BigEarthNet-S2 transfers better to the land cover classification task than the DINO model pre-trained on ImageNet, resulting in a 6.7% relative increase in average precision (from 0.685 to 0.732).
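For batch inference on the test set, the classifier's 19 outputs per image still have to be turned back into land cover names. A minimal sketch, assuming the outputs are logits, that a sigmoid is applied per label, and that a fixed decision threshold of 0.5 is used (neither the threshold nor the `decode_predictions` helper comes from this post). Passing `LAND_COVER_LABELS` as `label_names` keeps the same label order used to build the training targets.

```python
import numpy as np


def decode_predictions(logits, label_names, threshold=0.5):
    """Map (batch, num_labels) logits to a list of predicted label names per image.

    Applies a per-label sigmoid and keeps every label whose probability is at
    least `threshold` (an assumed, untuned default).
    """
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    return [
        [name for name, p in zip(label_names, row) if p >= threshold]
        for row in probs
    ]


if __name__ == "__main__":
    names = ["Urban fabric", "Arable land", "Pastures"]  # subset for brevity
    logits = np.array([[-2.0, 1.5, 0.3],
                       [0.8, -1.0, -0.5]])
    print(decode_predictions(logits, names))
    # [['Arable land', 'Pastures'], ['Urban fabric']]
```

Because each label gets its own sigmoid, an image can be assigned several land cover classes at once, which matches the multi-label setup; per-class thresholds tuned on the validation split would be a natural refinement.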

Clean up

After completing DINO training and transfer learning, we can clean up our resources to avoid incurring charges. We stop or delete our notebook instance and remove any unwanted data or model artifacts from Amazon S3.

Conclusion

This post demonstrated how to train DINO models on overhead imagery using SageMaker. We used SageMaker PyTorch Estimators and SMDDP to generate representations of BigEarthNet-S2 images without the need for explicit labels. We then transferred the DINO features to a downstream image classification task, which involved predicting the land cover class of BigEarthNet-S2 images. For this task, pre-training on satellite imagery yielded a 6.7% increase in average precision relative to pre-training on ImageNet.

You can use this solution as a template for training DINO models on large-scale, unlabeled aerial and satellite imagery datasets. To learn more about DINO and building models on SageMaker, check out the following resources:

About the Authors

Ben Veasey is a Senior Associate Data Scientist at Travelers, working within the AI & Automation Accelerator team. With a deep understanding of innovative AI technologies, including computer vision, natural language processing, and generative AI, Ben is dedicated to accelerating the adoption of these technologies to optimize business processes and drive efficiency at Travelers.

Jeremy Anderson is a Director & Data Scientist at Travelers on the AI & Automation Accelerator team. He is interested in solving business problems with the latest AI and deep learning techniques, including large language models, foundational imagery models, and generative AI. Prior to Travelers, Jeremy earned a PhD in Molecular Biophysics from Johns Hopkins University and also studied evolutionary biochemistry. Outside of work you can find him running, woodworking, or rewilding his yard.

Jordan Knight is a Senior Data Scientist working for Travelers in the Business Insurance Analytics & Research Department. His passion is solving challenging real-world computer vision problems and exploring new state-of-the-art methods to do so. He has a particular interest in the social impact of ML models and how we can continue to improve modeling processes to develop ML solutions that are equitable for all. Jordan graduated from MIT with a Master's in Business Analytics. In his free time you can find him climbing, hiking, or continuing to develop his somewhat rudimentary cooking skills.

June Li is a data scientist on the Travelers Business Insurance Artificial Intelligence team, where she leads and coordinates work in the AI imagery portfolio. She is passionate about implementing innovative AI solutions that bring substantial value to business partners and stakeholders. Her work has been integral in transforming complex business challenges into opportunities by leveraging cutting-edge AI technologies.

Sourav Bhabesh is a Senior Applied Scientist at AWS Titan Labs, where he builds Foundational Model (FM) capabilities and features. His specialty is Natural Language Processing (NLP), and he is passionate about deep learning. Outside of work he enjoys reading books and traveling.

Laura Kulowski is an Applied Scientist at Amazon's Generative AI Innovation Center, where she works closely with customers to build generative AI solutions. In her free time, Laura enjoys exploring new places by bike.

Andrew Ang is a Sr. Machine Learning Engineer at AWS. In addition to helping customers build AI/ML solutions, he enjoys water sports, squash, and watching travel & food vlogs.

Mehdi Noori is an Applied Science Manager at the Generative AI Innovation Center. With a passion for bridging technology and innovation, he assists AWS customers in unlocking the potential of generative AI, turning potential challenges into opportunities for rapid experimentation and innovation by focusing on scalable, measurable, and impactful uses of advanced AI technologies, and streamlining the path to production.
