The World of Conditional Generative Adversarial Networks (GANs): Enhancing the Bottle Challenge Dataset
Unleashing the Power of Conditional GANs: The Ultimate Bottle Challenge
In the dynamic realm of computer science, Generative Adversarial Networks (GANs) have captured the imagination of researchers and enthusiasts alike. Among the many variants of GANs, the Conditional GAN (CGAN) stands out for its ability to generate images conditioned on specific inputs. Today, we'll delve into the exciting world of CGANs by tackling a unique and engaging problem: generating bottle images with varying levels of liquid. This challenge not only showcases the capabilities of CGANs but also hints at a future where technology can create highly customized and realistic data for numerous applications.
The Challenge: Creating Realistic Bottle Images
Imagine having a dataset of bottle images with different amounts of liquid – some bottles are empty, some are half-full, and others are completely full. Now, what if we wanted to generate new bottle images with precise liquid levels to enhance our dataset? This is where CGANs come into play. By conditioning the generation process on specific labels, we can create images that meet our exact requirements.
For this tutorial, we'll use the Water Bottle Dataset available on Kaggle. This dataset contains images of bottles with varying levels of liquid, perfect for our challenge.
Step 1: Preparing the Dataset
Before diving into the CGAN architecture, let's prepare our dataset. Our goal is to create a diverse set of bottle images with five different liquid levels: empty, 25% full, 50% full, 75% full, and full. Here's how we start:
1. Download the dataset from Kaggle.
2. Extract the dataset and organize the images into folders based on the liquid levels; a sketch of the expected layout follows below.
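Because `image_dataset_from_directory` with `labels="inferred"` derives one integer label per sub-folder (sorted alphanumerically), the folder names determine the class indices. The layout below is an illustrative assumption; your copy of the dataset may use different folder names, but the one-folder-per-level structure is what matters:

```
./data/Water_Bottle_Dataset/
    0_empty/                # label 0
    1_quarter_full/         # label 1
    2_half_full/            # label 2
    3_three_quarter_full/   # label 3
    4_full/                 # label 4
```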
```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
import matplotlib.pyplot as plt
import os

# Load and prepare the dataset
dataset_path = './data/Water_Bottle_Dataset/'
class_dataset = keras.preprocessing.image_dataset_from_directory(
    dataset_path,
    labels="inferred",
    image_size=(64, 64),
    batch_size=32
)
# Scale pixels to [-1, 1] so they match the Generator's tanh output range
class_dataset = class_dataset.map(lambda x, y: ((x - 127.5) / 127.5, y))
```
Step 2: Building the Discriminator
The Discriminator's job is to distinguish between real and generated images. To condition this network on our labels, we'll use the Keras Functional API, allowing it to accept both images and labels as inputs.
```python
# Define the Discriminator model
img_shape = (64, 64, 3)
num_classes = 5

# Label input: embed the class index and reshape it into an extra "image"
# that can be stacked onto the input image's channels
in_label = layers.Input(shape=(1,))
label_emb = layers.Embedding(num_classes, 50)(in_label)
li = layers.Dense(img_shape[0] * img_shape[1] * img_shape[2])(label_emb)
li = layers.Reshape((img_shape[0], img_shape[1], img_shape[2]))(li)

# Image input
in_image = layers.Input(shape=img_shape, name='image')
merge = layers.Concatenate()([in_image, li])

# Convolutional layers
fe = layers.Conv2D(64, (3, 3), strides=(2, 2), padding='same')(merge)
fe = layers.LeakyReLU(alpha=0.2)(fe)
fe = layers.Conv2D(128, (3, 3), strides=(2, 2), padding='same')(fe)
fe = layers.LeakyReLU(alpha=0.2)(fe)
fe = layers.Conv2D(256, (3, 3), strides=(2, 2), padding='same')(fe)
fe = layers.LeakyReLU(alpha=0.2)(fe)
fe = layers.Flatten()(fe)
fe = layers.Dropout(0.4)(fe)
out_layer = layers.Dense(1, activation='sigmoid')(fe)

# Create the Discriminator model
discriminator = keras.models.Model([in_image, in_label], out_layer)
```
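As a quick sanity check (an addition to the original walkthrough), we can pass a dummy image and label through the model and confirm it returns a single real/fake probability per sample:

```python
# One random "image" plus one class index should yield a single probability.
dummy_image = tf.random.normal((1, 64, 64, 3))
dummy_label = tf.constant([[2.0]])  # class index 2, i.e. "50% full"
print(discriminator([dummy_image, dummy_label]).shape)  # (1, 1)
```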
Step 3: Building the Generator
The Generator creates new images from random noise and labels. Working in the opposite direction to the Discriminator, it upsamples a small feature map into a full image while conditioning on labels, so the generated images meet our specifications.
```python
# Define the Generator model
latent_dim = 128

# Label input
in_label = layers.Input(shape=(1,))
label_emb = layers.Embedding(num_classes, 50)(in_label)
num_nodes = 8 * 8 * 1
li = layers.Dense(num_nodes)(label_emb)
li = layers.Reshape((8, 8, 1))(li)

# Latent code input
in_lat = layers.Input(shape=(latent_dim,), name='latent_code')
num_img_nodes = 128 * 8 * 8
gen = layers.Dense(num_img_nodes)(in_lat)
gen = layers.LeakyReLU(alpha=0.2)(gen)
gen = layers.Reshape((8, 8, 128))(gen)
merge = layers.Concatenate()([gen, li])

# Transpose convolutional layers
gen = layers.Conv2DTranspose(128, (4, 4), strides=(2, 2), padding='same')(merge)
gen = layers.LeakyReLU(alpha=0.2)(gen)
gen = layers.Conv2DTranspose(128, (4, 4), strides=(2, 2), padding='same')(gen)
gen = layers.LeakyReLU(alpha=0.2)(gen)
gen = layers.Conv2DTranspose(128, (4, 4), strides=(2, 2), padding='same')(gen)
gen = layers.LeakyReLU(alpha=0.2)(gen)
out_layer = layers.Conv2D(3, (5, 5), activation='tanh', padding='same')(gen)

# Create the Generator model
generator = keras.models.Model([in_lat, in_label], out_layer)
```
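A similar sketch verifies the Generator's output shape; the noise and label values here are arbitrary:

```python
# One latent vector plus one class index should yield one 64x64 RGB image
# with tanh-scaled pixel values in [-1, 1].
noise = tf.random.normal((1, latent_dim))
label = tf.constant([[3.0]])  # class index 3, i.e. "75% full"
print(generator([noise, label]).shape)  # (1, 64, 64, 3)
```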
Step 4: Training the CGAN
Training a CGAN involves a dynamic interplay between the Generator and the Discriminator. Each network improves iteratively, with the Generator trying to produce realistic images and the Discriminator learning to distinguish real from fake.
```python
class CGAN(keras.Model):
    def __init__(self, discriminator, generator, latent_dim):
        super(CGAN, self).__init__()
        self.discriminator = discriminator
        self.generator = generator
        self.latent_dim = latent_dim

    def compile(self, d_optimizer, g_optimizer, loss_fn):
        super(CGAN, self).compile()
        self.d_optimizer = d_optimizer
        self.g_optimizer = g_optimizer
        self.loss_fn = loss_fn
        self.d_loss_metric = keras.metrics.Mean(name="d_loss")
        self.g_loss_metric = keras.metrics.Mean(name="g_loss")

    @property
    def metrics(self):
        return [self.d_loss_metric, self.g_loss_metric]

    def train_step(self, batch_data):
        real_images, real_labels = batch_data
        # Reshape labels to (batch_size, 1) to match the models' label input
        real_labels = tf.reshape(tf.cast(real_labels, tf.float32), (-1, 1))
        batch_size = tf.shape(real_images)[0]

        # Generate fake images conditioned on the real batch's labels
        random_latent_vectors = tf.random.normal(shape=(batch_size, self.latent_dim))
        generated_images = self.generator([random_latent_vectors, real_labels])
        combined_images = tf.concat([generated_images, real_images], axis=0)
        combined_labels = tf.concat([real_labels, real_labels], axis=0)

        # Discriminator targets: 1 for fake, 0 for real, plus smoothing noise
        labels = tf.concat([tf.ones((batch_size, 1)), tf.zeros((batch_size, 1))], axis=0)
        labels += 0.05 * tf.random.uniform(tf.shape(labels))

        # Train the Discriminator
        with tf.GradientTape() as tape:
            predictions = self.discriminator([combined_images, combined_labels])
            d_loss = self.loss_fn(labels, predictions)
        grads = tape.gradient(d_loss, self.discriminator.trainable_weights)
        self.d_optimizer.apply_gradients(zip(grads, self.discriminator.trainable_weights))

        # Train the Generator: claim the fakes are "real" (label 0) so the
        # Generator is pushed toward fooling the Discriminator
        random_latent_vectors = tf.random.normal(shape=(batch_size, self.latent_dim))
        misleading_labels = tf.zeros((batch_size, 1))
        with tf.GradientTape() as tape:
            predictions = self.discriminator(
                [self.generator([random_latent_vectors, real_labels]), real_labels]
            )
            g_loss = self.loss_fn(misleading_labels, predictions)
        grads = tape.gradient(g_loss, self.generator.trainable_weights)
        self.g_optimizer.apply_gradients(zip(grads, self.generator.trainable_weights))

        self.d_loss_metric.update_state(d_loss)
        self.g_loss_metric.update_state(g_loss)
        return {"d_loss": self.d_loss_metric.result(), "g_loss": self.g_loss_metric.result()}
```
Step 5: Monitoring Progress
Tracking the progress of our CGAN is crucial. We save generated images at the end of each epoch to visualize how the model improves over time.
```python
class CGANMonitor(keras.callbacks.Callback):
    def __init__(self, latent_dim=128):
        super(CGANMonitor, self).__init__()
        self.num_img = 5
        self.latent_dim = latent_dim

    def on_epoch_end(self, epoch, logs=None):
        random_latent_vectors = tf.random.normal(shape=(self.num_img, self.latent_dim))
        # One image per class: empty, 25%, 50%, 75%, full
        label = tf.reshape(tf.constant([0, 1, 2, 3, 4], dtype=tf.float32), (5, 1))
        generated_images = self.model.generator([random_latent_vectors, label])
        # Map the tanh output from [-1, 1] back to [0, 255]
        generated_images = (generated_images + 1.0) * 127.5
        generated_images = generated_images.numpy()
        os.makedirs('./temp_img', exist_ok=True)
        for i in range(self.num_img):
            img = keras.preprocessing.image.array_to_img(generated_images[i])
            img.save(f'./temp_img/generated_img_{epoch:03d}_{i}.png')
```
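The walkthrough defines all the pieces but stops short of wiring them together, so here is a minimal training sketch. The optimizer settings (Adam with a learning rate of 0.0002 and beta_1 of 0.5) and the epoch count are common GAN defaults assumed for illustration, not values prescribed above:

```python
# Assemble and train the CGAN (hyperparameters are illustrative assumptions)
cgan = CGAN(discriminator=discriminator, generator=generator, latent_dim=latent_dim)
cgan.compile(
    d_optimizer=keras.optimizers.Adam(learning_rate=0.0002, beta_1=0.5),
    g_optimizer=keras.optimizers.Adam(learning_rate=0.0002, beta_1=0.5),
    loss_fn=keras.losses.BinaryCrossentropy(),
)
cgan.fit(class_dataset, epochs=100, callbacks=[CGANMonitor(latent_dim=latent_dim)])

# After training, sample new bottles at a chosen liquid level, e.g. "50% full"
noise = tf.random.normal((8, latent_dim))
labels = tf.fill((8, 1), 2.0)  # class index 2
samples = generator([noise, labels], training=False)
```

Once training converges, that final sampling step is how you would actually grow the dataset: pick a class index, draw fresh noise, and decode.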
What This Means for the Future of Technology
Successfully implementing a CGAN to generate bottle images with varying liquid levels showcases the potential of conditional generative models in creating highly specific and realistic data. This capability extends beyond academic exercises and into real-world applications, such as synthetic data generation for training machine learning models, enhancing datasets where data is scarce, and even creating realistic virtual environments.
As technology continues to evolve, the ability to generate tailored data will become increasingly valuable. It could revolutionize fields like medical imaging, where generating realistic patient data can help in training diagnostic models, or in the entertainment industry, where creating lifelike characters and environments is essential. The future holds immense possibilities for CGANs and similar technologies, pushing the boundaries of what we can achieve with artificial intelligence.
Embark on this exciting challenge and dive into the world of CGANs. As you experiment and fine-tune your models, you'll witness the fascinating process of generating images that meet specific criteria. It's a blend of creativity, technical prowess, and a bit of patience. Ready to take on the Bottle Challenge with CGANs? Let's get started and see where this adventure leads!