Prompt Generator for Stable Diffusion

Note

Install the ekorpkit package first.

Set the logging level to WARNING if you don’t want to see verbose logging.

If you run this notebook in Colab, set Hardware accelerator to GPU.

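%pip install -U --pre ekorpkit[art]

# Restart the runtime so that the newly installed package is picked up
# (on Colab, the kernel restarts automatically after exit()).
exit()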

Preparing the environment

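from ekorpkit import eKonf

# On Colab, mount Google Drive so the workspace persists across sessions.
if eKonf.is_colab():
    eKonf.mount_google_drive()

# Point ekorpkit at the workspace, project, and task used in this notebook.
ws = eKonf.set_workspace(
    workspace="/content/drive/MyDrive/workspace",
    project="ekorpkit-book",
    task="aiart",
    log_level="INFO",
)
print("version:", ws.version)
print("project_dir:", ws.project_dir)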
INFO:ekorpkit.utils.notebook:Google Colab not detected.
version: 0.1.40.post0.dev56
project_dir: /content/drive/MyDrive/workspace/projects/ekorpkit-book
time: 361 ms (started: 2022-12-16 00:49:14 +00:00)

Load a Generator and Generate Prompts

To download certain datasets or model checkpoints, you may need to provide a HuggingFace API token. You can create one on the Access Tokens page of your HuggingFace account settings.

# Set HuggingFace API token
ws.secrets.HUGGING_FACE_HUB_TOKEN = "YOUR_TOKEN"

ws.secrets.HUGGING_FACE_HUB_TOKEN
SecretStr('**********')
time: 22.5 ms (started: 2022-12-15 05:45:26 +00:00)
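Hardcoding a token in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of one alternative, assuming the token was exported beforehand as the environment variable HUGGING_FACE_HUB_TOKEN. The helper names read_token and mask are hypothetical and not part of ekorpkit.

```python
import os


def read_token(name="HUGGING_FACE_HUB_TOKEN", environ=None):
    """Return the token stored under `name`, or None if it is unset or blank."""
    environ = os.environ if environ is None else environ
    value = environ.get(name, "").strip()
    return value or None


def mask(token):
    """Hide a token for display, similar in spirit to the SecretStr output above."""
    return "*" * 10 if token else "<not set>"


token = read_token()
print("token:", mask(token))
# If a token was found, it could then be assigned as in the cell above:
# ws.secrets.HUGGING_FACE_HUB_TOKEN = token
```

Passing environ explicitly, for example read_token(environ={"HUGGING_FACE_HUB_TOKEN": "abc"}), makes the helper easy to check without touching the real environment.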
# Set CUDA DEVICES for the model

ws.envs.CUDA_VISIBLE_DEVICES = "1,2"
INFO:ekorpkit.base:Set environment variable CUDA_VISIBLE_DEVICES=1,2
time: 20.7 ms (started: 2022-12-15 05:45:26 +00:00)
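CUDA_VISIBLE_DEVICES restricts which physical GPUs the process can see, and the visible ones are renumbered from zero inside the process. That is why the log of the next cell reports device cuda:0 and n_gpu: 2 even though physical GPUs 1 and 2 were selected. A small sketch of that renumbering follows. The helper visible_device_map is hypothetical: it only parses the string and does not query any GPU.

```python
def visible_device_map(cuda_visible_devices):
    """Map in-process CUDA device names to the physical GPU ids listed in the variable."""
    ids = [d.strip() for d in cuda_visible_devices.split(",") if d.strip()]
    # The first listed GPU becomes cuda:0, the second cuda:1, and so on.
    return {f"cuda:{i}": physical for i, physical in enumerate(ids)}


print(visible_device_map("1,2"))
# {'cuda:0': '1', 'cuda:1': '2'}
```

The variable only takes effect if it is set before CUDA is initialized in the process, so it belongs ahead of the cell that creates the generator.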
from ekorpkit.tasks.nlp import PromptGenerator

pgen = PromptGenerator()
2022-12-15 05:45:27.080526: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
INFO:absl:Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker: 
INFO:absl:Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: Interpreter CUDA Host
INFO:absl:Unable to initialize backend 'tpu': module 'jaxlib.xla_extension' has no attribute 'get_tpu_client'
WARNING:ekorpkit.models.transformer.trainers.base:Process rank: -1, device: cuda:0, n_gpu: 2, distributed training: False, 16-bits training: True
INFO:ekorpkit.base:No method defined to call
time: 7.54 s (started: 2022-12-15 05:45:26 +00:00)

Loading a model

A model must be trained before it can be loaded. To train one, refer to the Training a Generator section.

# pgen.load_model(model_name="ekorpkit/stable-prompts")

pgen.load_model(model_name="ekorpkit/prompt_parrot")
[INFO|tokenization_utils_base.py:1773] 2022-12-15 05:45:33,729 >> loading file vocab.json
[INFO|tokenization_utils_base.py:1773] 2022-12-15 05:45:33,730 >> loading file merges.txt
[INFO|tokenization_utils_base.py:1773] 2022-12-15 05:45:33,730 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:1773] 2022-12-15 05:45:33,731 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:1773] 2022-12-15 05:45:33,731 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:1773] 2022-12-15 05:45:33,731 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:898] 2022-12-15 05:45:33,784 >> Assigning <pad> to the pad_token key of the tokenizer
[INFO|tokenization_utils_base.py:898] 2022-12-15 05:45:33,785 >> Assigning <bop> to the bos_token key of the tokenizer
[INFO|tokenization_utils_base.py:898] 2022-12-15 05:45:33,785 >> Assigning <eop> to the eos_token key of the tokenizer
[INFO|configuration_utils.py:652] 2022-12-15 05:45:33,786 >> loading configuration file /content/drive/MyDrive/workspace/projects/ekorpkit-book/aiart/models/ekorpkit/prompt_parrot/config.json
[INFO|configuration_utils.py:706] 2022-12-15 05:45:33,787 >> Model config GPT2Config {
  "_name_or_path": "/content/drive/MyDrive/workspace/projects/ekorpkit-book/aiart/models/ekorpkit/prompt_parrot",
  "_num_labels": 1,
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "id2label": {
    "0": "LABEL_0"
  },
  "initializer_range": 0.02,
  "label2id": {
    "LABEL_0": 0
  },
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_inner": null,
  "n_layer": 6,
  "n_positions": 1024,
  "reorder_and_upcast_attn": false,
  "resid_pdrop": 0.1,
  "scale_attn_by_inverse_layer_idx": false,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "torch_dtype": "float32",
  "transformers_version": "4.24.0",
  "use_cache": true,
  "vocab_size": 50260
}

[INFO|modeling_utils.py:2155] 2022-12-15 05:45:33,797 >> loading weights file /content/drive/MyDrive/workspace/projects/ekorpkit-book/aiart/models/ekorpkit/prompt_parrot/pytorch_model.bin
[INFO|modeling_utils.py:2608] 2022-12-15 05:45:35,180 >> All model checkpoint weights were used when initializing GPT2LMHeadModel.

[INFO|modeling_utils.py:2616] 2022-12-15 05:45:35,186 >> All the weights of GPT2LMHeadModel were initialized from the model checkpoint at /content/drive/MyDrive/workspace/projects/ekorpkit-book/aiart/models/ekorpkit/prompt_parrot.
If your task is similar to the task the model of the checkpoint was trained on, you can already use GPT2LMHeadModel for predictions without further training.
time: 1.62 s (started: 2022-12-15 05:45:33 +00:00)

Generating prompts

You can generate prompts with the generate_prompts method. It takes the following arguments:

  • prompt: The seed text that every generated prompt starts with. If None, the prompt will be generated automatically.

  • num_prompts_to_generate: The number of prompts to generate.

  • generate_images: Whether to also generate images for each generated prompt.

  • num_samples: The number of images to generate for each prompt.

  • For the other arguments and their default values, refer to the following code.

pgen.method.generate.dict()
{'prompt': None,
 'num_prompts_to_generate': 5,
 'max_prompt_length': 50,
 'min_prompt_length': 30,
 'temperature': 1.2,
 'top_k': 70,
 'top_p': 0.9}
time: 2.21 ms (started: 2022-12-15 05:45:38 +00:00)
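The three sampling arguments in the defaults above (temperature, top_k, top_p) are the usual knobs for sampled text generation. How ekorpkit applies them internally is not shown here, so the following is only a generic, self-contained sketch of what each knob does to a toy next-token distribution. The function name filter_probs and the toy logits are made up for illustration.

```python
import math


def filter_probs(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return the renormalized distribution left after temperature, top-k and top-p."""
    # Temperature: divide logits before the softmax; >1 flattens, <1 sharpens.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank token indices from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-k: keep only the k most probable tokens (0 disables the filter).
    if top_k > 0:
        order = order[:top_k]
    mass = sum(probs[i] for i in order)

    # Top-p (nucleus): keep the smallest prefix whose cumulative share reaches top_p.
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i] / mass
        if cumulative >= top_p:
            break

    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


toy_logits = [2.0, 1.0, 0.5, 0.0, -1.0]
print(filter_probs(toy_logits, temperature=1.2, top_k=3, top_p=0.9))
```

Lowering temperature or top_p, as the next cell does with temperature=0.95 and top_p=0.8, concentrates sampling on the most likely tokens, which tends to give more conservative prompts.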
prompts = pgen.generate_prompts(
    batch_name="m6-forest",
    prompt="tamed lions in the forest in an henri rousseau style",
    num_prompts_to_generate=5,
    generate_images=True,
    num_samples=3,
    max_prompt_length=60,
    top_p=0.8,
    temperature=0.95,
)
prompts
INFO:ekorpkit.visualize.collage:Creating collage of 3 images with 3 columns from 3 images
INFO:ekorpkit.visualize.collage:Saved collage to /content/drive/MyDrive/workspace/projects/ekorpkit-book/aiart/outputs/m6-forest/m6-forest.text_prompts(30)_run_configs_text_prompts(0).png
Prompt[0]: tamed lions in the forest in an henri rousseau style, surrealism, gothic, magical realism, digital art, highly detailed illustration, surrealism, artstation, octane render, oil on canvas, cinematic lighting, vibrant, illustration, aesthetic, artstation, dark ret
../../../_images/d7e7d0c11452c8bf69131a7d70b8f3f04e773f3c637c5e5e09710dbfdfd7a186.png
INFO:ekorpkit.visualize.collage:Creating collage of 3 images with 3 columns from 3 images
INFO:ekorpkit.visualize.collage:Saved collage to /content/drive/MyDrive/workspace/projects/ekorpkit-book/aiart/outputs/m6-forest/m6-forest.text_prompts(30)_run_configs_text_prompts(1).png
Prompt[1]: tamed lions in the forest in an henri rousseau style, dramatic scenery, award winning photograph, artstation, concept art, cinematic lighting, vibrant, deviantart, by Jordan Grimmer and RHADS and Gilbert Williams, german romanticism, vibrant, cinematic, artstation,
../../../_images/77c910ca0528c3ad64ecce321574ac54ff76514b6e47fc2849a017512e081dbf.png
INFO:ekorpkit.visualize.collage:Creating collage of 3 images with 3 columns from 3 images
INFO:ekorpkit.visualize.collage:Saved collage to /content/drive/MyDrive/workspace/projects/ekorpkit-book/aiart/outputs/m6-forest/m6-forest.text_prompts(30)_run_configs_text_prompts(2).png
Prompt[2]: tamed lions in the forest in an henri rousseau style by Jordan Grimmer and RHADS and Gilbert Williams, artstation, deviantart, dramatic lighting, vibrant, oil on canvas, moody palette, digital painting, moody palette, moody palette, vibrant, character concept
../../../_images/6b5dffee926b505a428b385bd34801aa9bf6bd5e1d9b3ff5a2e7dc081c20236b.png
INFO:ekorpkit.visualize.collage:Creating collage of 3 images with 3 columns from 3 images
INFO:ekorpkit.visualize.collage:Saved collage to /content/drive/MyDrive/workspace/projects/ekorpkit-book/aiart/outputs/m6-forest/m6-forest.text_prompts(30)_run_configs_text_prompts(3).png
Prompt[3]: tamed lions in the forest in an henri rousseau style, digital art, beautiful painting, magical realism, octane render, masterpiece, award winning, hdr, soft render, dark aesthetic, 8k, oil on canvas, deviantart, aesthetic, watercolor, highly detailed
../../../_images/946a7cba6ccb93e5a4c07418e994ed51c9316bbc9f425c48e9f82229bedf17f8.png
INFO:ekorpkit.visualize.collage:Creating collage of 3 images with 3 columns from 3 images
INFO:ekorpkit.visualize.collage:Saved collage to /content/drive/MyDrive/workspace/projects/ekorpkit-book/aiart/outputs/m6-forest/m6-forest.text_prompts(30)_run_configs_text_prompts(4).png
Prompt[4]: tamed lions in the forest in an henri rousseau style, oil on canvas, award winning, artstation, magical realism, moody palette, oil on canvas, award winning, german romanticism, character concept, oil on canvas, aesthetic, moody palette, cinematic lighting,
../../../_images/408211ff1825fb48c8c450ea8faed65f8200b713a88607bd123a2575e6c58146.png
['tamed lions in the forest in an henri rousseau style, surrealism, gothic, magical realism, digital art, highly detailed illustration, surrealism, artstation, octane render, oil on canvas, cinematic lighting, vibrant, illustration, aesthetic, artstation, dark ret',
 'tamed lions in the forest in an henri rousseau style, dramatic scenery, award winning photograph, artstation, concept art, cinematic lighting, vibrant, deviantart, by Jordan Grimmer and RHADS and Gilbert Williams, german romanticism, vibrant, cinematic, artstation,',
 'tamed lions in the forest in an henri rousseau style by Jordan Grimmer and RHADS and Gilbert Williams, artstation, deviantart, dramatic lighting, vibrant, oil on canvas, moody palette, digital painting, moody palette, moody palette, vibrant, character concept',
 'tamed lions in the forest in an henri rousseau style, digital art, beautiful painting, magical realism, octane render, masterpiece, award winning, hdr, soft render, dark aesthetic, 8k, oil on canvas, deviantart, aesthetic, watercolor, highly detailed',
 'tamed lions in the forest in an henri rousseau style, oil on canvas, award winning, artstation, magical realism, moody palette, oil on canvas, award winning, german romanticism, character concept, oil on canvas, aesthetic, moody palette, cinematic lighting,']
time: 1min 40s (started: 2022-12-15 05:45:40 +00:00)

Generating images for prompts

results = pgen.generate_images(
    prompts=prompts,
    num_samples=3,
    num_inference_steps=50,
)
Prompt[0]: people looking out a window at night in ancient ruins, in a highly detailed epic CG render, dramatic light, epic shadows, dramatic lightinga 3D portrait of a dragon from a fantasy fantasy medium portrait in a beautiful fantasy, elegant, high detail,
../../../_images/c320b904c478aa0727e9da061e7648a8b9fa8cd76e7ef46c541f350fa4dcdda3.png
Prompt[1]: people looking out a window, d & d, fantasy, intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse
../../../_images/2aca1fe76b01e84253d979bb320f72dadfe16ca6ddb0658a903dda3bda8e47b5.png
Prompt[2]: people looking out a window at night, in the style of Daniel Craig and Steve Austin. trending on artstationin a desert, d & d, fantasy, intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth,
../../../_images/788e9c306a2abe125600ac883c0830852dd0b66ac0beb35c90b40d986f5276c0.png
Prompt[3]: people looking out a window at a beautiful view over the bridge on theland, the riverbeds of mountains, beautiful dramatic lighting, beautiful landscape, perfect face, intricate details, artstationhd, cgsocietya photo of an orange
../../../_images/68cee189e994e54c65bb36e7b754b4174443eeb5784fe1319b35f2573640b884.png
Prompt[4]: people looking out a window. The eyes, fine details, elegant, by greg rutkowski and alphonse muchacloseup shot of a cyberpunk robot in front of a dark dark cave, full of colors, vibrant, intricate,
../../../_images/a2a764363be99356da553ab2bc6e72a601ad3a407c85ab8c9431dff84201742c.png
time: 1min 28s (started: 2022-12-02 21:12:49 +00:00)

Generating images for one prompt

results = pgen.imagine(
    text_prompts=prompts[0],
    num_samples=6,
    num_inference_steps=50,
    guidance_scale=10,
)
Prompt: people looking out a window at night in ancient ruins, in a highly detailed epic CG render, dramatic light, epic shadows, dramatic lightinga 3D portrait of a dragon from a fantasy fantasy medium portrait in a beautiful fantasy, elegant, high detail,
../../../_images/93726033746b96aa1a510191321f0be1915cb9a77429b28043ff2d89f27617d2.png
time: 33.5 s (started: 2022-12-02 21:14:27 +00:00)

Training a Generator

Preparing a dataset

You can use any dataset you want, as long as it can be loaded with HuggingFace Datasets, either from the HuggingFace Hub or from a plain text file.

Using a dataset from HuggingFace Hub

To track runs with wandb, you may need to provide a Weights & Biases API key. You can find it in your Weights & Biases account settings.

# Set WANDB API KEY
eKonf.os.secrets.WANDB_API_KEY = "YOUR_KEY"

print(eKonf.os.secrets.WANDB_API_KEY)
**********
time: 21.3 ms (started: 2022-12-13 06:47:50 +00:00)
pgen.dataset.validation_split_percentage = 5
pgen.load_datasets("Gustavosta/Stable-Diffusion-Prompts")
pgen.raw_datasets
INFO:root:Loading dataset Gustavosta/Stable-Diffusion-Prompts
WARNING:datasets.builder:Using custom data configuration Gustavosta--Stable-Diffusion-Prompts-d22aeec0ba2a9fdb
INFO:datasets.builder:Overwrite dataset info from restored data version.
INFO:datasets.info:Loading Dataset info from /content/drive/MyDrive/workspace/.cache/Gustavosta___parquet/Gustavosta--Stable-Diffusion-Prompts-d22aeec0ba2a9fdb/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec
WARNING:datasets.builder:Reusing dataset parquet (/content/drive/MyDrive/workspace/.cache/Gustavosta___parquet/Gustavosta--Stable-Diffusion-Prompts-d22aeec0ba2a9fdb/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
INFO:datasets.info:Loading Dataset info from /content/drive/MyDrive/workspace/.cache/Gustavosta___parquet/Gustavosta--Stable-Diffusion-Prompts-d22aeec0ba2a9fdb/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec
WARNING:datasets.builder:Using custom data configuration Gustavosta--Stable-Diffusion-Prompts-d22aeec0ba2a9fdb
INFO:datasets.builder:Overwrite dataset info from restored data version.
INFO:datasets.info:Loading Dataset info from /content/drive/MyDrive/workspace/.cache/Gustavosta___parquet/Gustavosta--Stable-Diffusion-Prompts-d22aeec0ba2a9fdb/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec
WARNING:datasets.builder:Reusing dataset parquet (/content/drive/MyDrive/workspace/.cache/Gustavosta___parquet/Gustavosta--Stable-Diffusion-Prompts-d22aeec0ba2a9fdb/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
INFO:datasets.info:Loading Dataset info from /content/drive/MyDrive/workspace/.cache/Gustavosta___parquet/Gustavosta--Stable-Diffusion-Prompts-d22aeec0ba2a9fdb/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec
WARNING:datasets.builder:Using custom data configuration Gustavosta--Stable-Diffusion-Prompts-d22aeec0ba2a9fdb
INFO:datasets.builder:Overwrite dataset info from restored data version.
INFO:datasets.info:Loading Dataset info from /content/drive/MyDrive/workspace/.cache/Gustavosta___parquet/Gustavosta--Stable-Diffusion-Prompts-d22aeec0ba2a9fdb/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec
WARNING:datasets.builder:Reusing dataset parquet (/content/drive/MyDrive/workspace/.cache/Gustavosta___parquet/Gustavosta--Stable-Diffusion-Prompts-d22aeec0ba2a9fdb/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
INFO:datasets.info:Loading Dataset info from /content/drive/MyDrive/workspace/.cache/Gustavosta___parquet/Gustavosta--Stable-Diffusion-Prompts-d22aeec0ba2a9fdb/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec
DatasetDict({
    train: Dataset({
        features: ['Prompt'],
        num_rows: 70032
    })
    test: Dataset({
        features: ['Prompt'],
        num_rows: 8192
    })
    validation: Dataset({
        features: ['Prompt'],
        num_rows: 3686
    })
})
time: 14.8 s (started: 2022-12-14 09:53:25 +00:00)
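The validation_split_percentage = 5 setting appears to carve the validation set out of the dataset's own training split: the train and validation counts printed above sum to 73718, and 5% of that is about 3686. A minimal sketch of that arithmetic, assuming the validation size is rounded to the nearest whole row. The helper split_sizes is hypothetical and not part of ekorpkit.

```python
def split_sizes(total_rows, validation_percentage):
    """Split `total_rows` into (train, validation) counts by percentage."""
    # Assumption: the validation size is rounded to the nearest whole row.
    validation = round(total_rows * validation_percentage / 100)
    return total_rows - validation, validation


# 70032 + 3686 rows: the train and validation counts printed above.
total = 70032 + 3686
print(split_sizes(total, 5))
# (70032, 3686)
```

The 8192-row test split is unaffected, which matches the DatasetDict output.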
model_name = "ekorpkit/stable-prompts"

pgen.dataset.line_by_line = False
pgen.trainer.num_train_epochs = 1
pgen.trainer.logging_steps = 100
pgen.model.model_name = model_name
# pgen.model.ignore_model_path = True

pgen.train()
INFO:ekorpkit.config:Initalized batch: m6-forest(15) in /content/drive/MyDrive/workspace/projects/ekorpkit-book/aiart
INFO:ekorpkit.models.transformer.trainers.base:Moving model to device: cuda:0
INFO:datasets.arrow_dataset:Caching processed dataset at /content/drive/MyDrive/workspace/.cache/Gustavosta___parquet/Gustavosta--Stable-Diffusion-Prompts-d22aeec0ba2a9fdb/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-c18d25663c203982.arrow
INFO:datasets.arrow_dataset:Caching processed dataset at /content/drive/MyDrive/workspace/.cache/Gustavosta___parquet/Gustavosta--Stable-Diffusion-Prompts-d22aeec0ba2a9fdb/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-3aa9bea172d94ae9.arrow
INFO:datasets.arrow_dataset:Caching processed dataset at /content/drive/MyDrive/workspace/.cache/Gustavosta___parquet/Gustavosta--Stable-Diffusion-Prompts-d22aeec0ba2a9fdb/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-a46354bdc81f9d52.arrow
INFO:datasets.arrow_dataset:Caching processed dataset at /content/drive/MyDrive/workspace/.cache/Gustavosta___parquet/Gustavosta--Stable-Diffusion-Prompts-d22aeec0ba2a9fdb/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-3fc91025cb36262f.arrow
INFO:datasets.arrow_dataset:Caching processed dataset at /content/drive/MyDrive/workspace/.cache/Gustavosta___parquet/Gustavosta--Stable-Diffusion-Prompts-d22aeec0ba2a9fdb/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-956d8af8599c87e9.arrow
INFO:datasets.arrow_dataset:Caching processed dataset at /content/drive/MyDrive/workspace/.cache/Gustavosta___parquet/Gustavosta--Stable-Diffusion-Prompts-d22aeec0ba2a9fdb/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-cdf3af7b5477eadc.arrow
[INFO|trainer.py:557] 2022-12-13 07:48:06,993 >> Using cuda_amp half precision backend
INFO:ekorpkit.config:Saving config to /content/drive/MyDrive/workspace/projects/ekorpkit-book/aiart/outputs/m6-forest/configs/m6-forest(15)_config.yaml
[INFO|trainer.py:725] 2022-12-13 07:48:10,030 >> The following columns in the training set don't have a corresponding argument in `GPT2LMHeadModel.forward` and have been ignored: special_tokens_mask. If special_tokens_mask are not expected by `GPT2LMHeadModel.forward`,  you can safely ignore this message.
[INFO|trainer.py:1608] 2022-12-13 07:48:10,039 >> ***** Running training *****
[INFO|trainer.py:1609] 2022-12-13 07:48:10,039 >>   Num examples = 4242
[INFO|trainer.py:1610] 2022-12-13 07:48:10,039 >>   Num Epochs = 1
[INFO|trainer.py:1611] 2022-12-13 07:48:10,040 >>   Instantaneous batch size per device = 1
[INFO|trainer.py:1612] 2022-12-13 07:48:10,040 >>   Total train batch size (w. parallel, distributed & accumulation) = 16
[INFO|trainer.py:1613] 2022-12-13 07:48:10,040 >>   Gradient Accumulation steps = 8
[INFO|trainer.py:1614] 2022-12-13 07:48:10,040 >>   Total optimization steps = 265
[INFO|trainer.py:1615] 2022-12-13 07:48:10,041 >>   Number of trainable parameters = 81914880
[INFO|integrations.py:680] 2022-12-13 07:48:10,042 >> Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
wandb: Currently logged in as: entelecheia. Use `wandb login --relogin` to force relogin
wandb version 0.13.6 is available! To upgrade, please run: $ pip install wandb --upgrade
Tracking run with wandb version 0.13.5
Run data is saved locally in /content/drive/MyDrive/workspace/projects/ekorpkit-book/logs/wandb/run-20221213_074811-2skuvouv
[WARNING|logging.py:275] 2022-12-13 07:48:17,830 >> You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
[265/265 01:26, Epoch 0/1]
Step Training Loss Validation Loss

[INFO|trainer.py:1859] 2022-12-13 07:49:48,614 >> 

Training completed. Do not forget to share your model on huggingface.co/models =)


[INFO|trainer.py:2678] 2022-12-13 07:49:48,617 >> Saving model checkpoint to /content/drive/MyDrive/workspace/projects/ekorpkit-book/aiart/models/ekorpkit/stable-prompts
[INFO|configuration_utils.py:447] 2022-12-13 07:49:48,619 >> Configuration saved in /content/drive/MyDrive/workspace/projects/ekorpkit-book/aiart/models/ekorpkit/stable-prompts/config.json
[INFO|modeling_utils.py:1624] 2022-12-13 07:49:49,134 >> Model weights saved in /content/drive/MyDrive/workspace/projects/ekorpkit-book/aiart/models/ekorpkit/stable-prompts/pytorch_model.bin
[INFO|tokenization_utils_base.py:2125] 2022-12-13 07:49:49,135 >> tokenizer config file saved in /content/drive/MyDrive/workspace/projects/ekorpkit-book/aiart/models/ekorpkit/stable-prompts/tokenizer_config.json
[INFO|tokenization_utils_base.py:2132] 2022-12-13 07:49:49,136 >> Special tokens file saved in /content/drive/MyDrive/workspace/projects/ekorpkit-book/aiart/models/ekorpkit/stable-prompts/special_tokens_map.json
INFO:ekorpkit.models.transformer.trainers.base:*** Evaluate ***
[INFO|trainer.py:725] 2022-12-13 07:49:49,225 >> The following columns in the evaluation set don't have a corresponding argument in `GPT2LMHeadModel.forward` and have been ignored: special_tokens_mask. If special_tokens_mask are not expected by `GPT2LMHeadModel.forward`,  you can safely ignore this message.
[INFO|trainer.py:2929] 2022-12-13 07:49:49,228 >> ***** Running Evaluation *****
[INFO|trainer.py:2931] 2022-12-13 07:49:49,228 >>   Num examples = 222
[INFO|trainer.py:2934] 2022-12-13 07:49:49,228 >>   Batch size = 2
***** train metrics *****
  epoch                    =        1.0
  total_flos               =  1031810GF
  train_loss               =     1.8296
  train_runtime            = 0:01:38.57
  train_samples            =       4242
  train_samples_per_second =     43.034
  train_steps_per_second   =      2.688
[111/111 00:01]
***** eval metrics *****
  epoch                   =        1.0
  eval_loss               =     1.7144
  eval_runtime            = 0:00:01.98
  eval_samples            =        222
  eval_samples_per_second =    111.995
  eval_steps_per_second   =     55.997
  perplexity              =     5.5533
[INFO|modelcard.py:449] 2022-12-13 07:49:52,110 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}, 'dataset': {'name': 'Gustavosta/Stable-Diffusion-Prompts', 'type': 'Gustavosta/Stable-Diffusion-Prompts'}}
time: 2min 8s (started: 2022-12-13 07:47:44 +00:00)
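The step count in the training log can be reproduced from the other numbers it reports. The sketch below assumes one batch spans both GPUs (n_gpu: 2 in the earlier log), that one optimizer update happens per 8 accumulated batches, and that a trailing partial group is not counted. The function name optimization_steps is made up for illustration, and the rule is inferred from the log rather than taken from the trainer's source.

```python
import math


def optimization_steps(num_examples, per_device_batch, n_gpu, grad_accum, epochs):
    """Estimate the total number of optimizer updates for a training run."""
    # Batches per epoch: each batch holds per_device_batch examples on every GPU.
    batches_per_epoch = math.ceil(num_examples / (per_device_batch * n_gpu))
    # One update per `grad_accum` batches; a trailing partial group is not counted.
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return updates_per_epoch * epochs


# Values from the log above: 4242 examples, batch size 1 per device,
# 2 GPUs, 8 gradient accumulation steps, 1 epoch.
print(optimization_steps(4242, 1, 2, 8, 1))
# 265
```

The effective batch size is 1 × 2 × 8 = 16, which matches the "Total train batch size" line in the log.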
pgen.load_model(model_name=model_name)
prompts = pgen.generate_prompts(
    prompt="people looking out a lonely city street",
    num_prompts_to_generate=2,
    generate_images=True,
    num_samples=2,
)
INFO:ekorpkit.visualize.collage:Creating collage of 2 images with 2 columns from 2 images
INFO:ekorpkit.visualize.collage:Saved collage to /content/drive/MyDrive/workspace/projects/ekorpkit-book/aiart/outputs/m6-forest_batch/m6-forest_batch.text_prompts(0)_run_configs_text_prompts(0).png
Prompt[0]: people looking out a lonely city street looking out onto a country road, looking at the ocean, highly detailed, digital painting, artstation, concept art, soft light, sharp focus, illustration, art by artgerm and greg rutkowski
../../../_images/97610b58718f4fb436b35765bc2d6fb89e35d295e7fa585df573018c932d9b6a.png
INFO:ekorpkit.visualize.collage:Creating collage of 2 images with 2 columns from 2 images
INFO:ekorpkit.visualize.collage:Saved collage to /content/drive/MyDrive/workspace/projects/ekorpkit-book/aiart/outputs/m6-forest_batch/m6-forest_batch.text_prompts(0)_run_configs_text_prompts(1).png
Prompt[1]: people looking out a lonely city street in the distance, anime fantasy illustration, fantasy, highly detailed, digital painting, trending on artstation, smooth, sharp focus, illustration, dreamlike, Artstationbeautiful fantasy digital art portrait painting of an anime
../../../_images/5e8e770aba9435a96842036bd4029866595217f015e8b36a3ece9edf97851fca.png
time: 35.4 s (started: 2022-12-13 07:49:57 +00:00)

Using a dataset from a text file

prompt_uri = "https://raw.githubusercontent.com/entelecheia/ekorpkit-book/main/assets/data/prompt_parrot.txt"
pgen.load_datasets(train_file=prompt_uri)
pgen.raw_datasets
INFO:root:Loading validation dataset None
WARNING:datasets.builder:Using custom data configuration default-681f997af4470be8
INFO:datasets.builder:Overwrite dataset info from restored data version.
INFO:datasets.info:Loading Dataset info from /content/drive/MyDrive/workspace/.cache/text/default-681f997af4470be8/0.0.0/21a506d1b2b34316b1e82d0bd79066905d846e5d7e619823c0dd338d6f1fa6ad
WARNING:datasets.builder:Reusing dataset text (/content/drive/MyDrive/workspace/.cache/text/default-681f997af4470be8/0.0.0/21a506d1b2b34316b1e82d0bd79066905d846e5d7e619823c0dd338d6f1fa6ad)
INFO:datasets.info:Loading Dataset info from /content/drive/MyDrive/workspace/.cache/text/default-681f997af4470be8/0.0.0/21a506d1b2b34316b1e82d0bd79066905d846e5d7e619823c0dd338d6f1fa6ad
INFO:root:Loading training dataset https://raw.githubusercontent.com/entelecheia/ekorpkit-book/main/assets/data/prompt_parrot.txt
WARNING:datasets.builder:Using custom data configuration default-681f997af4470be8
INFO:datasets.builder:Overwrite dataset info from restored data version.
INFO:datasets.info:Loading Dataset info from /content/drive/MyDrive/workspace/.cache/text/default-681f997af4470be8/0.0.0/21a506d1b2b34316b1e82d0bd79066905d846e5d7e619823c0dd338d6f1fa6ad
WARNING:datasets.builder:Reusing dataset text (/content/drive/MyDrive/workspace/.cache/text/default-681f997af4470be8/0.0.0/21a506d1b2b34316b1e82d0bd79066905d846e5d7e619823c0dd338d6f1fa6ad)
INFO:datasets.info:Loading Dataset info from /content/drive/MyDrive/workspace/.cache/text/default-681f997af4470be8/0.0.0/21a506d1b2b34316b1e82d0bd79066905d846e5d7e619823c0dd338d6f1fa6ad
DatasetDict({
    validation: Dataset({
        features: ['text'],
        num_rows: 9
    })
    train: Dataset({
        features: ['text'],
        num_rows: 176
    })
})
time: 2.65 s (started: 2022-12-14 09:53:48 +00:00)
model_name = "ekorpkit/prompt_parrot"

pgen.dataset.line_by_line = True
pgen.trainer.num_train_epochs = 10
pgen.trainer.logging_steps = 100
pgen.model.model_name = model_name

pgen.train()
INFO:ekorpkit.config:Initalized batch: prompt-generator(2) in /content/drive/MyDrive/workspace/projects/ekorpkit-book/aiart
INFO:datasets.arrow_dataset:Caching processed dataset at /content/drive/MyDrive/workspace/.cache/text/default-681f997af4470be8/0.0.0/21a506d1b2b34316b1e82d0bd79066905d846e5d7e619823c0dd338d6f1fa6ad/cache-1a86cc9859717c6c.arrow
INFO:datasets.arrow_dataset:Caching processed dataset at /content/drive/MyDrive/workspace/.cache/text/default-681f997af4470be8/0.0.0/21a506d1b2b34316b1e82d0bd79066905d846e5d7e619823c0dd338d6f1fa6ad/cache-f4352de44b50776d.arrow
[INFO|trainer.py:557] 2022-12-13 06:53:56,462 >> Using cuda_amp half precision backend
INFO:ekorpkit.config:Saving config to /content/drive/MyDrive/workspace/projects/ekorpkit-book/aiart/outputs/prompt-generator/configs/prompt-generator(2)_config.yaml
[INFO|trainer.py:725] 2022-12-13 06:53:58,046 >> The following columns in the training set don't have a corresponding argument in `GPT2LMHeadModel.forward` and have been ignored: special_tokens_mask. If special_tokens_mask are not expected by `GPT2LMHeadModel.forward`,  you can safely ignore this message.
[INFO|trainer.py:1608] 2022-12-13 06:53:58,055 >> ***** Running training *****
[INFO|trainer.py:1609] 2022-12-13 06:53:58,055 >>   Num examples = 176
[INFO|trainer.py:1610] 2022-12-13 06:53:58,056 >>   Num Epochs = 10
[INFO|trainer.py:1611] 2022-12-13 06:53:58,056 >>   Instantaneous batch size per device = 1
[INFO|trainer.py:1612] 2022-12-13 06:53:58,057 >>   Total train batch size (w. parallel, distributed & accumulation) = 16
[INFO|trainer.py:1613] 2022-12-13 06:53:58,057 >>   Gradient Accumulation steps = 8
[INFO|trainer.py:1614] 2022-12-13 06:53:58,058 >>   Total optimization steps = 110
[INFO|trainer.py:1615] 2022-12-13 06:53:58,058 >>   Number of trainable parameters = 81914880
[INFO|integrations.py:680] 2022-12-13 06:53:58,060 >> Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
[110/110 00:35, Epoch 10/10]
Step Training Loss Validation Loss

[INFO|trainer.py:1859] 2022-12-13 06:54:34,698 >> 

Training completed. Do not forget to share your model on huggingface.co/models =)


[INFO|trainer.py:2678] 2022-12-13 06:54:34,701 >> Saving model checkpoint to /content/drive/MyDrive/workspace/projects/ekorpkit-book/aiart/models/ekorpkit/prompt_parrot
[INFO|configuration_utils.py:447] 2022-12-13 06:54:34,703 >> Configuration saved in /content/drive/MyDrive/workspace/projects/ekorpkit-book/aiart/models/ekorpkit/prompt_parrot/config.json
[INFO|modeling_utils.py:1624] 2022-12-13 06:54:35,166 >> Model weights saved in /content/drive/MyDrive/workspace/projects/ekorpkit-book/aiart/models/ekorpkit/prompt_parrot/pytorch_model.bin
[INFO|tokenization_utils_base.py:2125] 2022-12-13 06:54:35,168 >> tokenizer config file saved in /content/drive/MyDrive/workspace/projects/ekorpkit-book/aiart/models/ekorpkit/prompt_parrot/tokenizer_config.json
[INFO|tokenization_utils_base.py:2132] 2022-12-13 06:54:35,168 >> Special tokens file saved in /content/drive/MyDrive/workspace/projects/ekorpkit-book/aiart/models/ekorpkit/prompt_parrot/special_tokens_map.json
INFO:ekorpkit.models.transformer.trainers.base:*** Evaluate ***
[INFO|trainer.py:725] 2022-12-13 06:54:35,260 >> The following columns in the evaluation set don't have a corresponding argument in `GPT2LMHeadModel.forward` and have been ignored: special_tokens_mask. If special_tokens_mask are not expected by `GPT2LMHeadModel.forward`,  you can safely ignore this message.
[INFO|trainer.py:2929] 2022-12-13 06:54:35,263 >> ***** Running Evaluation *****
[INFO|trainer.py:2931] 2022-12-13 06:54:35,263 >>   Num examples = 9
[INFO|trainer.py:2934] 2022-12-13 06:54:35,264 >>   Batch size = 2
***** train metrics *****
  epoch                    =       10.0
  total_flos               =   428298GF
  train_loss               =      3.136
  train_runtime            = 0:00:36.64
  train_samples            =        176
  train_samples_per_second =     48.034
  train_steps_per_second   =      3.002
[5/5 00:00]
***** eval metrics *****
  epoch                   =       10.0
  eval_loss               =     3.5238
  eval_runtime            = 0:00:00.08
  eval_samples            =          9
  eval_samples_per_second =     102.68
  eval_steps_per_second   =     57.044
  perplexity              =    33.9116
[INFO|modelcard.py:449] 2022-12-13 06:54:36,253 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
time: 42.3 s (started: 2022-12-13 06:53:54 +00:00)
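The perplexity line in the eval metrics is the exponential of eval_loss, the mean cross-entropy in nats. A short check against the values printed above; the small differences come from eval_loss being rounded to four decimals in the log.

```python
import math


def perplexity(eval_loss):
    """Perplexity is the exponential of the mean cross-entropy loss (in nats)."""
    return math.exp(eval_loss)


# eval_loss values reported in the two training runs of this notebook.
for name, loss in [("stable-prompts", 1.7144), ("prompt_parrot", 3.5238)]:
    print(f"{name}: perplexity ~ {perplexity(loss):.2f}")
```

A lower value means the model is less surprised by the held-out prompts. With only 9 validation rows, the figure for the text-file run is a rough indicator at best.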
pgen.load_model(model_name=model_name)
prompts = pgen.generate_prompts(
    prompt="people looking out a lonely city street",
    num_prompts_to_generate=2,
    generate_images=True,
    num_samples=2,
)
INFO:ekorpkit.visualize.collage:Creating collage of 2 images with 2 columns from 2 images
INFO:ekorpkit.visualize.collage:Saved collage to /content/drive/MyDrive/workspace/projects/ekorpkit-book/aiart/outputs/prompt-generator_batch/prompt-generator_batch.text_prompts(0)_run_configs_text_prompts(0).png
Prompt[0]: people looking out a lonely city street, foggy foggy streets, trending on artstation, oil on canvas, award-winning, dramatic lighting, vibrant, vibrant, vibrant, surrealism, high coherence, foggy skies, low contrast,
../../../_images/34e88ae83bea833f39421c8778e120a68fb89959107158aec9d4de2bedf618ce.png
INFO:ekorpkit.visualize.collage:Creating collage of 2 images with 2 columns from 2 images
INFO:ekorpkit.visualize.collage:Saved collage to /content/drive/MyDrive/workspace/projects/ekorpkit-book/aiart/outputs/prompt-generator_batch/prompt-generator_batch.text_prompts(0)_run_configs_text_prompts(1).png
Prompt[1]: people looking out a lonely city street in an industrial field of flowers landscape painting by Thomas Kinkade and Lawrence Alma-Tadema, digital art, beautiful composition, vibrant palette, moody palette, concept art, award winning, octane soft
../../../_images/a45f220b1bdd1f5a03d6ca3dff9857caa625506bb80539606d6aa8e15f2d342b.png
time: 1min 36s (started: 2022-12-13 06:56:18 +00:00)
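The two training runs above differ in line_by_line. With line_by_line = False, the 70032 training rows of the Hub dataset became 4242 training examples in the log, while with line_by_line = True the 176 rows of the text file stayed 176 examples. That pattern is what one would expect if False means the tokenized rows are concatenated and cut into fixed-size blocks, as the HuggingFace language-modeling example scripts do. This is an inference from the logs, not something confirmed by the ekorpkit source. A toy sketch of such grouping follows. The helper name group_texts, the token ids, and the tiny block size are illustrative.

```python
from itertools import chain


def group_texts(tokenized_rows, block_size):
    """Concatenate token lists and cut them into equal blocks, dropping the remainder."""
    flat = list(chain.from_iterable(tokenized_rows))
    usable = (len(flat) // block_size) * block_size
    return [flat[i:i + block_size] for i in range(0, usable, block_size)]


# Three short "prompts" as toy token ids (9 tokens in total).
rows = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
print(group_texts(rows, block_size=4))
# [[1, 2, 3, 4], [5, 6, 7, 8]]
```

Grouping packs many short prompts into each training example, which suits a large corpus. Keeping one prompt per example is a natural fit for a small file where each line should stay intact.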

References#