Gemini API: Getting started guide

The Google Gen AI SDK offers a standardized interface for interacting with Gemini models via both the Gemini Developer API and the Gemini API available on Vertex AI. With limited exceptions, code developed for one platform is generally compatible with the other. This notebook demonstrates usage with the Developer API.

Setup

Install SDK

Install the SDK from PyPI.

!pip install -U -q 'google-genai>=1.19.0'
Note: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.

Setup your API key

To run the following cell, your API key must be stored as an environment variable named GOOGLE_API_KEY. If you don't already have an API key, see Get a Gemini API key.

Alternatively, you can create a .env file with the following contents:

# Google GENAI
GOOGLE_GENAI_USE_VERTEXAI=false
GOOGLE_API_KEY=<YOUR_GEMINI_AI_API_KEY>

Then install the python-dotenv package (pip install -q python-dotenv) and load the file:

# Load your .env file as environment variables
import dotenv
dotenv.load_dotenv()

Initialize SDK client

With the new SDK, you only need to initialize a client with your API key.

from google import genai
from google.genai import types
import os

client = genai.Client(api_key=os.getenv('GOOGLE_API_KEY'))

Select a model

Select the model you want to use in this guide. For a full overview of all Gemini models, check the documentation.

# Available models ["gemini-2.0-flash", "gemini-2.0-flash-lite", "gemini-1.5-flash"] 
MODEL = 'gemini-2.0-flash-lite'

Send text prompts

Use the generate_content method to generate responses to your prompts. You can pass text directly to generate_content and use the .text property to get the text content of the response. Note that .text only works when the response contains a single part.

from IPython.display import Markdown

response = client.models.generate_content(
    model=MODEL,
    contents="What's the largest star in the universe?"
)

Markdown(response.text)

Count tokens

Tokens are the basic inputs to the Gemini models. You can use the count_tokens method to calculate the number of input tokens before sending a request to the Gemini API.

response = client.models.count_tokens(
    model=MODEL,
    contents=["What's the highest mountain in the solar system?"],
)

print(response)

Send multimodal prompts

Use a Gemini 2.0 model (gemini-2.0-flash-exp) that supports multimodal prompts. You can include text, PDF documents, images, audio, and video in your prompt requests and get text or image responses.

In this first example, you’ll generate an image from a specified prompt, save it as a byte stream and then write those bytes to a local file named ai_generated_image.png.

import IPython

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents='Generate a 3D animation-like image featuring aliens and New York City-like skyscrapers; use vivid colors and a happy environment',
    config=types.GenerateContentConfig(
        response_modalities=['Text', 'Image']
    )
)

image_bytes = None
for part in response.candidates[0].content.parts:
    if part.text is not None:
        display(IPython.display.Markdown(part.text))
    elif part.inline_data is not None:
        print(part.inline_data.mime_type)
        image_bytes = part.inline_data.data
        display(IPython.display.Image(data=image_bytes))

import pathlib

# Save the generated image bytes to a local file
img_path = pathlib.Path('ai_generated_image.png')
img_path.write_bytes(image_bytes)

Now open the newly created image and generate a blog post based on the thumbnail.

from PIL import Image
image = Image.open(img_path)
image.thumbnail([512,512])

response = client.models.generate_content(
    model=MODEL,
    contents=[
        image,
        "Write a short and engaging blog post based on the following picture."
    ]
)

display(image)
Markdown(response.text)
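
As an aside, Image.thumbnail mutates the image in place and preserves aspect ratio, capping the longer side at 512 pixels. A standalone sketch with a synthetic image (no API call involved):

```python
from PIL import Image

# Synthetic 4:3 image standing in for the generated PNG
img = Image.new("RGB", (1024, 768), color="navy")
img.thumbnail([512, 512])   # in-place resize, aspect ratio preserved
print(img.size)             # (512, 384)
```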

Configure model parameters

You can include parameter values in each call that you send to a model to control how the model generates a response. Learn more about parameter values in the documentation.

response = client.models.generate_content(
    model=MODEL,
    contents="Explain how cryptocurrency works to children",
    config=types.GenerateContentConfig(
        temperature=0.4,
        top_p=0.95,
        top_k=20,
        candidate_count=1,
        seed=5,
        stop_sequences=["STOP!"],
        presence_penalty=0.0,
        frequency_penalty=0.0,
    )
)

Markdown(response.text)

Generate JSON

The controlled generation capability in the Gemini API allows you to constrain the model's output to a structured format. You can provide the schema as a Pydantic model or a JSON string.

from pydantic import BaseModel
import json

class Recipe(BaseModel):
    recipe_name: str
    recipe_description: str
    recipe_ingredients: list[str]

response = client.models.generate_content(
    model=MODEL,
    contents="Provide american style food recipe and its ingredients.",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=Recipe,
    ),
)

print(json.dumps(json.loads(response.text), indent=4))
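
Since the schema is a Pydantic model, the returned JSON can be validated back into a typed object. A sketch using a hard-coded sample payload in place of response.text (the recipe values here are made up):

```python
from pydantic import BaseModel

class Recipe(BaseModel):
    recipe_name: str
    recipe_description: str
    recipe_ingredients: list[str]

# Sample payload standing in for response.text from a live call
sample = (
    '{"recipe_name": "Apple Pie", '
    '"recipe_description": "A classic American dessert.", '
    '"recipe_ingredients": ["apples", "flour", "butter", "sugar"]}'
)

recipe = Recipe.model_validate_json(sample)
print(recipe.recipe_name)              # Apple Pie
print(len(recipe.recipe_ingredients))  # 4
```

When a schema is supplied, the SDK can also return an already-parsed object via response.parsed, which saves this round-trip.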

Upload a PDF file

This PDF page is an article titled Smoothly editing material properties of objects with text-to-image models and synthetic data available on the Google Research Blog.

First, you'll download the PDF file from a URL and save it locally as "article.pdf".

import requests
# Prepare the file to be uploaded
PDF = "https://storage.googleapis.com/generativeai-downloads/data/Smoothly%20editing%20material%20properties%20of%20objects%20with%20text-to-image%20models%20and%20synthetic%20data.pdf"  # @param {type: "string"}
pdf_bytes = requests.get(PDF).content

pdf_path = pathlib.Path('article.pdf')
pdf_path.write_bytes(pdf_bytes)

Second, upload the file and extract its sections into a structured format.

# Upload the file using the API
file_upload = client.files.upload(file=pdf_path)

prompt = """Extract the sections from the file provided without altering them and use the following format for the output:
    [
        {
            section_name: <NAME_OF_SECTION>,
            section_short_description: <SHORT_DESCRIPTION>,
            start: <page number where the section starts>,
            end: <page number where the section ends>,
        }
    ]

"""

response = client.models.generate_content(
    model=MODEL,
    contents=[
        file_upload,
        prompt
    ]
)

Markdown(response.text)