Bloom Examples#


Transparency, openness, and inclusivity#

While most major LLMs have been trained predominantly on English text, BLOOM's training corpus includes 46 natural languages and 13 programming languages. This makes it useful in the many regions where English is not the main language.

BLOOM is also a break from the de facto reliance on big tech to train models. One of the main problems with LLMs is the prohibitive cost of training and tuning them. This hurdle has made 100-billion-parameter LLMs the exclusive domain of big tech companies with deep pockets. In recent years, AI labs have gravitated toward big tech to gain access to subsidized cloud compute resources and to fund their research.

The BLOOM research team has been completely transparent about the entire process of training the model. They have published the dataset, meeting notes, discussions, and code, as well as the training logs and technical details.

BLOOM Architecture#

BLOOM is a causal language model, which means that it was trained as a next-token predictor. This apparently simple strategy of predicting the next token in a sentence, based on a set of preceding tokens, has been shown to give large language models a certain degree of reasoning ability (arXiv:2205.11916). It enables BLOOM and similar models to connect multiple concepts in a sentence and solve non-trivial problems such as arithmetic, translation, and programming with fair accuracy. BLOOM uses a Transformer architecture composed of an input embeddings layer, 70 Transformer blocks, and an output language-modeling layer, as shown in the figure below. Each Transformer block has a self-attention layer and a multi-layer perceptron layer, with input and post-attention layer norms.
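You can check these dimensions yourself without downloading the weights by fetching just the model configuration. The sketch below assumes the transformers library and the bigscience/bloom checkpoint used later in this notebook.

from transformers import AutoConfig

# Fetch only the config (a small JSON file), not the ~350 GB of weights.
config = AutoConfig.from_pretrained("bigscience/bloom")
print(config.n_layer)      # number of Transformer blocks (70)
print(config.hidden_size)  # hidden dimension of each block
print(config.n_head)       # attention heads per self-attention layer
print(config.vocab_size)   # rows in the input embeddings layer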

To predict the next token in a sentence with BLOOM, we simply pass the input tokens (in the form of embeddings) sequentially through each of the 70 BLOOM blocks. Since this is a sequential operation, we can load only one block into RAM at a time to avoid memory overflow. Similarly, the word embeddings and the output language-modeling layer can be loaded on demand from disk.
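The loop below sketches that block-by-block pass. It is a conceptual sketch only: load_block is a hypothetical helper standing in for code that reads one block's weights from the checkpoint shards on disk, and a real BLOOM block's forward also takes an attention mask and ALiBi positional biases.

def forward_offloaded(hidden_states, load_block, n_blocks=70):
    # load_block(i) must return block i as a callable. It is a hypothetical
    # stand-in for code that deserializes one Transformer block's weights
    # from disk; it is not a real transformers API.
    for i in range(n_blocks):
        block = load_block(i)                 # only this block is in RAM
        hidden_states = block(hidden_states)  # apply the block
        del block                             # free it before loading the next
    return hidden_states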

Pre-trained BLOOM checkpoints#

In the BigScience repository (https://huggingface.co/bigscience), you can find various versions of the model.
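For example, you can enumerate the published variants programmatically. This is a quick sketch using huggingface_hub; the exact attribute names can vary across library versions.

from huggingface_hub import list_models

# List the BLOOM variants published under the bigscience organization.
for model in list_models(author="bigscience", search="bloom"):
    print(model.modelId)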

Download checkpoints#

Note that the original BLOOM model is very large: the published checkpoint is about 350 GB.

!pip install transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/workspace/data/tbts/archive/models/bloom/bloom"  # replace with your local folder path
model_uri = "bigscience/bloom"

# Download the model and tokenizer from the Hub and save a local copy.
model = AutoModelForCausalLM.from_pretrained(model_uri)
model.save_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_uri)
tokenizer.save_pretrained(model_path)

!ls $model_path

Check the file list and disk usage.

model_path = "/workspace/data/tbts/archive/models/bloom" # replace with your local folder path

!du -h $model_path -d 2
657G	/workspace/data/tbts/archive/models/bloom/bloom
6.5G	/workspace/data/tbts/archive/models/bloom/bloom-1b3
664G	/workspace/data/tbts/archive/models/bloom
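These sizes are consistent with a back-of-envelope estimate: BLOOM has about 176 billion parameters, and from_pretrained loads (and save_pretrained then writes) the weights in float32 unless a torch_dtype is specified, roughly doubling the published bfloat16 checkpoint.

n_params = 176e9  # BLOOM parameter count
print(f"bf16: {n_params * 2 / 1e9:.0f} GB")  # ~352 GB, the published checkpoint size
print(f"fp32: {n_params * 4 / 1e9:.0f} GB")  # ~704 GB, matching the 657 GiB reported by du above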

BLOOM local inference#

Load the BLOOM model and tokenizer.

from transformers import pipeline

model_uri = "bigscience/bloom-1b3"  # a smaller BLOOM variant for single-GPU inference

# device=7 selects GPU 7; adjust (or use device=-1 for CPU) to match your machine.
pipe = pipeline(model=model_uri, task="text-generation", device=7)

Inference function for BLOOM models#

def infer_local(
    prompt,
    temperature=0.7,
    top_k=None,
    top_p=None,
    max_new_tokens=50,
    repetition_penalty=None,
    do_sample=False,
    num_return_sequences=1,
    num_beams=None,
    no_repeat_ngram_size=None,
    early_stopping=False,
    return_full_text=True,
):
    response = pipe(
        prompt,
        temperature=temperature,  # 0 to 1
        top_k=top_k,
        top_p=top_p,  # None, 0-1
        max_new_tokens=max_new_tokens,  # up to 2047 theoretically
        return_full_text=return_full_text,  # include prompt or not.
        repetition_penalty=repetition_penalty,  # None, or >1.0 to penalize repeated tokens
        do_sample=do_sample,  # True: sampling; False: greedy decoding
        num_return_sequences=num_return_sequences,
        num_beams=num_beams,
        no_repeat_ngram_size=no_repeat_ngram_size,
        early_stopping=early_stopping,
    )
    return response
prompt = "Bloom is better than GPT-3"
result_length = 100
  • result_length: the number of new tokens we ask the model to generate on top of the prompt.

  • prompt: the input text; the pipeline tokenizes it and builds the PyTorch tensors internally, so we don't need to encode it ourselves (see the quick check below).
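To see the tokenization that the pipeline performs under the hood, you can call its tokenizer directly:

# Count the prompt's tokens; max_new_tokens generates on top of these.
inputs = pipe.tokenizer(prompt, return_tensors="pt")
print(inputs["input_ids"].shape)  # (1, number_of_prompt_tokens)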

Sampling Top-k + Top-p#

With do_sample=True, top_k=50 restricts each step to the 50 most likely tokens, and top_p=0.9 further truncates that set to the smallest subset whose cumulative probability exceeds 0.9; the next token is then sampled from what remains.

infer_local(
    prompt,
    temperature=None,
    max_new_tokens=result_length,
    do_sample=True,
    top_k=50,
    top_p=0.9,
)
[{'generated_text': 'Bloom is better than GPT-3, but it requires large amount of memory.\nC) How many threads are needed to achieve a certain throughput?\nThe average throughput of the code running on three machines is 1.0, 2.2 and 3.6 GB/s.\nThe code of the code of this program will run on a cluster of workstation with 32 processors. According to the discussion above, the cluster will need to handle 8,000 requests in 1 second. If each workstation handles 1 request, and the cluster consists of'}]

BLOOM API inference#

from huggingface_hub import notebook_login, HfFolder, InferenceApi

# Enter your API token; you can create one for free on the Hugging Face website.
notebook_login()

model_uri = "bigscience/bloom"

# Hosted inference endpoint for the full 176B BLOOM model.
inference = InferenceApi(model_uri, token=HfFolder.get_token())
def infer_api(
    prompt,
    temperature=0.7,
    top_k=None,
    top_p=None,
    max_new_tokens=50,
    repetition_penalty=None,
    do_sample=False,
    num_return_sequences=1,
    num_beams=None,
    no_repeat_ngram_size=None,
    early_stopping=False,
    return_full_text=True,
    seed=123,
):
    # Normalize mutually exclusive decoding options before calling the API.
    top_k = None if top_k == 0 else top_k
    top_p = None if num_beams else top_p  # beam search ignores top_p
    num_beams = None if num_beams == 0 else num_beams
    no_repeat_ngram_size = None if num_beams is None else no_repeat_ngram_size
    early_stopping = None if num_beams is None else early_stopping

    params = {
        "max_new_tokens": max_new_tokens,
        "top_k": top_k,
        "top_p": top_p,
        "temperature": temperature,
        "do_sample": do_sample,
        "early_stopping": early_stopping,
        "no_repeat_ngram_size": no_repeat_ngram_size,
        "num_beams": num_beams,
        "return_full_text": return_full_text,
        "repetition_penalty": repetition_penalty,
        "seed": seed,
    }
    
    response = inference(prompt, params=params)
    return response

Greedy Search#

With do_sample=False and no beams, the model simply picks the highest-probability token at each step; this is deterministic and, as the output below shows, prone to repetitive loops.

infer_api(prompt, temperature=None, max_new_tokens=result_length)
[{'generated_text': 'Bloom is better than GPT-3, but not by much. GPT-3 is better than GPT-2, but not by much. GPT-2 is better than GPT-1, but not by much. GPT-1 is better than GPT-0, but not by much. GPT-0 is better than GPT-1, but not by much. GPT-1 is better than GPT-0, but not by much. GPT-0 is better than GPT-1, but not by much. GPT-1 is'}]

Beam Search#

Beam search keeps the num_beams most probable partial sequences at each step instead of just one; no_repeat_ngram_size=2 forbids repeating any 2-gram, and early_stopping ends the search once all beams are finished.

infer_api(
    prompt,
    temperature=None,
    max_new_tokens=result_length,
    num_beams=5,
    no_repeat_ngram_size=2,
    early_stopping=True,
)
[{'generated_text': 'Bloom is better than GPT-3, but not by much. GPT-3 is better than GPT-2, but not by much. GPT-2 is better than GPT-1, but not by much. GPT-1 is better than GPT-0, but not by much. GPT-0 is better than GPT-1, but not by much. GPT-1 is better than GPT-0, but not by much. GPT-0 is better than GPT-1, but not by much. GPT-1 is'}]

Sampling Top-k + Top-p#

infer_api(
    prompt, 
    temperature=None, 
    max_new_tokens=result_length, 
    do_sample=True,
    top_k=50,
    top_p=0.9,
)
[{'generated_text': 'Bloom is better than GPT-3.\nWhat is the best language to write AI for chatbots? - Quora\n\nI don\'t think it\'s a matter of programming language. I think it\'s a matter of AI algorithm. The algorithms you should use will depend on the type of your chatbot. In general, neural networks are widely used and I think that your question is a duplicate to this one on StackOverflow:\n\nWhat neural network should I use for a chatbot?\n\nA:\n\nI am not sure that the term "chatbot'}]
prompt = "One of the hottest areas of investing in recent years has been ESG: "
prompt += "the use of environmental, social, and governance criteria to evaluate possible investments"

res = infer_api(
    prompt,
    temperature=None,
    max_new_tokens=result_length,
    do_sample=True,
    top_k=100,
    top_p=0.95,
)
print(res[0]["generated_text"])
One of the hottest areas of investing in recent years has been ESG: the use of environmental, social, and governance criteria to evaluate possible investments. ESG investing is part of the growing trend of sustainable investing, where financial performance is considered in the context of environmental and social impacts, as well as how a business operates.
It’s a trend that’s on the rise in other areas of banking as well. For example, there are now programs where consumers can opt to receive lower interest rates for using environmentally friendly products and services. Many banks are also changing the way they measure performance: in some cases, they’re no longer reporting on assets

Translate with BLOOM#

from ekorpkit import eKonf
from ekorpkit.models.bloom.demo import BloomDemo

hf_user_access_token = eKonf.osenv(
    "HF_USER_ACCESS_TOKEN"
)  # Set to your HF Access Token
demo = BloomDemo(
    model_uri="bigscience/bloom", device=6, hf_user_access_token=hf_user_access_token
)
demo.infer("Hi.")  # quick smoke test to confirm the model loads and responds

Create widgets#

options = ["English", "Spanish", "French"]
from_lang = eKonf.create_dropdown(options, "English", "From Language")
to_lang = eKonf.create_dropdown(options, "Spanish", "To Language")
input_prompt = eKonf.create_textarea(
    "I am a student",
    "Input",
    "Enter the sentence to translate",
    style={"description_width": "50px"},
    layout={"width": "95%", "height": "100px"},
)
generated_txt = eKonf.create_textarea(
    "",
    "Output",
    "Translated sentence",
    style={"description_width": "50px"},
    layout={"width": "95%", "height": "100px"},
)
translate_button = eKonf.create_button("Translate", layout={"width": "95%"})

Register the translate function on the translate button's on_click event.#

def on_btn_click(btn):
    generated_txt.value = "inferring..."
    input_text = input_prompt.value
    # Build an instruction-style prompt for zero-shot translation.
    prompt = f'Instruction: translate {from_lang.value} to {to_lang.value} \nInput: "{input_text}" \nOutput:'
    res = demo.infer(
        prompt,
        temperature=None,
        max_new_tokens=int(len(input_text) * 1.5),
        do_sample=True,
        top_k=100,
        top_p=0.95,
    )
    generated_txt.value = res


translate_button.on_click(on_btn_click)

Display widgets on a grid#

import ipywidgets as widgets

grid = widgets.GridspecLayout(4, 2, height="300px")
grid[0, 0] = from_lang
grid[0, 1] = to_lang
grid[1, :] = input_prompt
grid[2, :] = generated_txt
grid[3, :] = translate_button
grid

Zero-shot SQL with BLOOM#

Create widgets#

instruction = "Instruction: Given an input question, respond with syntactically correct PostgreSQL. Only use table called 'employees'.\n"
instruction += "Input: Select names of all the employees who are working under 'Peter'.\nPostgreSQL query: "

input_prompt = eKonf.create_textarea(
    instruction,
    "Input",
    "Enter the instruction to generate",
    style={"description_width": "50px"},
    layout={"width": "95%", "height": "100px"},
)

generated_txt = eKonf.create_textarea(
    "",
    "Output",
    "Generated SQL",
    style={"description_width": "50px"},
    layout={"width": "95%", "height": "150px"},
)
generate_button = eKonf.create_button("Generate SQL", layout={"width": "95%"})

Register the generate function on the generate button's on_click event.#

def on_btn_click(btn):
    generated_txt.value = "generating..."
    prompt = input_prompt.value
    response = demo.infer(
        prompt,
        temperature=None,
        do_sample=True,
        top_k=50,
        top_p=0.97,
    )
    # Trim the generation: keep only the text before the model starts a new
    # question or a new input/output pair.
    solution = response.split("\nQ:")[0]
    if "\nOutput:" in solution:
        final_solution = solution.split("\nOutput:")[0]
    elif "\n\n" in solution:
        final_solution = solution.split("\n\n")[0]
    else:
        final_solution = solution
    generated_txt.value = final_solution


generate_button.on_click(on_btn_click)

Display widgets on a grid#

import ipywidgets as widgets

grid = widgets.GridspecLayout(3, 1, height="300px", align_items="center")
grid[0, 0] = input_prompt
grid[1, 0] = generated_txt
grid[2, 0] = generate_button
grid

Distracted Boyfriend Meme 😄 using BLOOM 🌸#

Create widgets#

# A few-shot prompt: four "Distracted from: / by:" pairs, ending with an
# open "Distracted from:" for the model to complete.
prompt = """Distracted from: homework
by: side project
Distracted from: goals
by: new goals
Distracted from: working hard
by: hardly working
Distracted from: twitter
by: open in browser
Distracted from:"""
input_prompt = eKonf.create_textarea(
    prompt,
    "Input",
    "Enter the instruction to generate",
    style={"description_width": "50px"},
    layout={"width": "95%", "height": "200px"},
)
in_image_display = eKonf.create_image(
    filename="../figs/deep_nlp/bloom/distracted00.jpg",
    width=500,
)
out_image = eKonf.create_image(
    filename=None,
    width=500,
)
out_image_display = eKonf.create_image(
    filename=None,
    width=500,
)
in_slider_temp = eKonf.create_floatslider(
    min=0.0,
    max=1.0,
    step=0.1,
    value=0.7,
    description="Temperature",
    disabled=False,
    continuous_update=False,
    orientation="horizontal",
    readout=True,
    readout_format=".1f",
)
in_slider_top_p = eKonf.create_floatslider(
    min=0.5,
    max=0.99,
    step=0.01,
    value=0.95,
    description="Top-p",
    disabled=False,
    continuous_update=False,
    orientation="horizontal",
    readout=True,
    readout_format=".2f",
)
generate_button = eKonf.create_button("Generate Memes", layout={"width": "95%"})

Register the generate function on the generate button's on_click event.#

import io
from PIL import Image, ImageDraw


def write_on_image(final_solution):
    # Draw the generated text onto a blank "distracted boyfriend" template.
    image_path0 = "../figs/deep_nlp/bloom/distracted0.jpg"
    image0 = Image.open(image_path0)
    I1 = ImageDraw.Draw(image0)
    font = eKonf.get_imagefont(fontsize=40)

    # The prompt holds four few-shot pairs (lines 0-7); lines 8 and 9 are the
    # pair the model just generated ("Distracted from: ..." / "by: ...").
    prompt_list = final_solution.split("\n")
    girlfriend = prompt_list[8].split(":")[1].strip()
    girlfriend_list = girlfriend.split()
    if len(girlfriend_list) >= 2:
        girlfriend = "\n".join(girlfriend_list)  # wrap multi-word labels
    new_girl = prompt_list[9].split(":")[1].strip()
    new_girl_list = new_girl.split()
    if len(new_girl_list) > 2:
        new_girl = "\n".join(new_girl_list)
    # Roll the prompt window forward so the next click continues the pattern.
    prompt_list.pop(0)
    prompt_list.pop(0)
    prompt_list = prompt_list[:8]
    prompt_list.append("Distracted from:")
    new_prompt = "\n".join(prompt_list)

    # Place the labels on the three figures in the meme.
    I1.text((570, 89), girlfriend, font=font, fill=(255, 255, 255))
    I1.text((427, 233), "ME", font=font, fill=(255, 255, 255))
    I1.text((142, 306), new_girl, font=font, fill=(255, 255, 255))

    # Return the rendered image as PNG bytes for the ipywidgets Image widget.
    img_byte_arr = io.BytesIO()
    image0.save(img_byte_arr, format="PNG")
    img_byte_arr = img_byte_arr.getvalue()

    return img_byte_arr, new_prompt
def on_btn_click(btn):
    out_image_display.value = out_image.value  # clear the previous meme
    prompt = input_prompt.value
    top_p = in_slider_top_p.value
    temp = in_slider_temp.value
    response = demo.infer(
        prompt,
        temperature=temp,
        max_new_tokens=64,
        do_sample=True,
        top_k=50,
        top_p=top_p,
    )
    solution = response.split("\nQ:")[0]
    meme_image, new_prompt = write_on_image(solution)
    out_image_display.value = meme_image
    input_prompt.value = new_prompt  # seed the next click with the rolled-forward prompt


generate_button.on_click(on_btn_click)

Display widgets on a grid#

import ipywidgets as widgets

grid = widgets.GridspecLayout(4, 2, height="700px", align_items="center")
grid[0, 0] = in_image_display
grid[0, 1] = out_image_display
grid[1, 0] = in_slider_temp
grid[1, 1] = in_slider_top_p
grid[2, :] = generate_button
grid[3, :] = input_prompt
grid