Elixir: Gemini Vision with Req

Using Req to get information about an image


ELIXIR
GEMINI

Recently, I wanted to classify some plants with the help of LLMs. Google's AI Studio gives free access to some Gemini models, so I decided to use Gemini's vision capabilities for the classification. I got it working with the help of the Req library.

Here's the code for it:

First, read the image. It has to be Base64-encoded before it goes into the request.

encoded_image =
  image_path
  |> File.read!()
  |> Base.encode64()
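
The snippets in this post hardcode "image/jpeg" as the MIME type. If you also have PNG or WebP files, something like the sketch below could pick the type from the file extension. ImagePart is a hypothetical helper of mine, not part of the code I actually ran, and the extension list is an assumption, so check it against the formats Gemini accepts.

```elixir
defmodule ImagePart do
  # Hypothetical helper, not part of the original code.
  # The extension-to-MIME mapping is an assumption based on common
  # image formats; verify it against the Gemini documentation.
  @mime_types %{
    ".jpg" => "image/jpeg",
    ".jpeg" => "image/jpeg",
    ".png" => "image/png",
    ".webp" => "image/webp"
  }

  # Returns {:ok, mime_type} for a known extension, :error otherwise.
  def mime_type(path) do
    ext = path |> Path.extname() |> String.downcase()
    Map.fetch(@mime_types, ext)
  end

  # Reads the file and builds the inline_data part of the request body.
  # Falls through with :error or {:error, reason} if a step fails.
  def build(path) do
    with {:ok, mime} <- mime_type(path),
         {:ok, binary} <- File.read(path) do
      {:ok, %{inline_data: %{data: Base.encode64(binary), mime_type: mime}}}
    end
  end
end
```

The map returned by ImagePart.build/1 has the same shape as the inline_data part in the request, so it can go straight into the parts list.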

And now for the real meat: making a request to the Gemini API with the prompt and the image.

Req.post(
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent",
  params: [key: get_api_key()],
  json: %{
    contents: [
      %{
        parts: [
          %{text: prompt()},
          %{
            inline_data: %{
              data: encoded_image,
              mime_type: "image/jpeg"
            }
          }
        ]
      }
    ]
  }
)

@doc """
Prompt to classify plants.
"""
def prompt(), do: "..."
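
Req decodes the JSON response for you, so the body arrives as a map with string keys. The generated text sits under candidates, then content, then parts. Here is a minimal sketch of pulling it out, assuming that shape. GeminiResponse is a hypothetical module name, and joining all text parts of the first candidate is my choice, not something the API requires.

```elixir
defmodule GeminiResponse do
  # Hypothetical helper: extracts the text from a decoded generateContent body.
  # It joins every text part of the first candidate instead of assuming
  # there is exactly one part.
  def text(%{"candidates" => [%{"content" => %{"parts" => parts}} | _]})
      when is_list(parts) do
    texts = for %{"text" => t} <- parts, is_binary(t), do: t

    case texts do
      [] -> {:error, :no_text}
      _ -> {:ok, Enum.join(texts)}
    end
  end

  # Error bodies carry a message under "error".
  def text(%{"error" => %{"message" => message}}), do: {:error, message}

  # Anything else (for example a non-JSON body) is reported, not raised.
  def text(_other), do: {:error, :unexpected_body}
end
```

Returning tagged tuples here means an unexpected body becomes an {:error, reason} value instead of a MatchError.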

That's pretty much it. Here's the whole code that I used to classify the plants.

defmodule Plants.AI do
  require Logger

  @llm_url "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash"

  def run(base64_image) do
    response = make_request(base64_image)

    case response do
      {:ok, %Req.Response{status: 200, body: body}} ->
        %{"candidates" => [%{"content" => %{"parts" => [%{"text" => text}]}} | _]} = body
        {:ok, clean_and_parse(text)}

      {:ok, %Req.Response{body: body}} ->
        %{"error" => %{"message" => message}} = body
        Logger.error(message)
        {:error, message}

      {:error, error} ->
        Logger.error(Exception.message(error))
        {:error, error}
    end
  end

  defp make_request(base64_image) do
    Req.post(
      "#{@llm_url}:generateContent",
      params: [key: get_api_key()],
      json: %{
        contents: [
          %{
            parts: [
              %{text: prompt()},
              %{
                inline_data: %{
                  data: base64_image,
                  mime_type: "image/jpeg"
                }
              }
            ]
          }
        ]
      }
    )
  end

  defp prompt() do
    """
    A description of the given plant.
    As a genius expert, your task is to understand the content and provide the parsed objects in json that match the following json_schema:\n
    {"name": string, "info": string, "care": string, "watering_frequency": one of ["weekly", "biweekly", "daily", "<num> days", "<num> weeks", "upper soil dry"]}

    ## Fields:
    - name: Common name of the plant
    - info: Some common information about the plant such as its features(ex. "air purifier")
    - care: Some care tips of the plant(ex. "indirect sunlight")
    - watering_frequency: At what frequency the plant should be watered(ex. "3 days" meaning every 3 days)

    Make sure to return an instance of the JSON, not the schema itself. Only return one JSON result even if image contains multiple plants.
    """
  end

  defp get_api_key() do
    # Raises with a clear error if the variable is unset, instead of sending a nil key.
    System.fetch_env!("GEMINI_API_KEY")
  end

  # The response usually wraps the JSON between ```json\n and \n```, so we strip that and parse it.
  # JSON is the built-in module that ships with Elixir 1.18 and later.
  defp clean_and_parse(response) do
    Regex.replace(~r/^```json\n|\n```$/, response, "")
    |> JSON.decode!()
  end
end
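
The cleanup regex in clean_and_parse/1 expects the response to start with the fence and the json tag exactly. A slightly more forgiving version might look like the sketch below. FenceStripper is a hypothetical helper, and the cases it handles (no language tag, extra whitespace, no fence at all) are my assumptions about what the model can return.

```elixir
defmodule FenceStripper do
  # Hypothetical helper: removes a surrounding Markdown code fence, if any.
  # `{3} in the regex stands for the three backticks of a fence.
  def strip(text) do
    fence = ~r/\A\s*`{3}(?:json)?[ \t]*\n(.*?)\s*`{3}\s*\z/s

    case Regex.run(fence, text, capture: :all_but_first) do
      # Fenced: keep only what is between the opening and closing fence.
      [inner] -> inner
      # Not fenced: just trim the surrounding whitespace.
      nil -> String.trim(text)
    end
  end
end
```

The result can then be piped into JSON.decode!/1 as before, or into JSON.decode/1 if you would rather get an error tuple than an exception on malformed output.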

Note: I get structured output here by describing the JSON schema in the prompt itself, so the text of the response is JSON (usually wrapped in a Markdown code fence).
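
As far as I know, the Gemini API can also be asked for JSON directly through a generationConfig field in the request body. I did not use this in the code above, so treat the following as a sketch: GeminiRequest is a hypothetical helper, and the responseMimeType field name is taken from my reading of the API docs, so verify it for your model and API version.

```elixir
defmodule GeminiRequest do
  # Hypothetical helper: builds the JSON body for generateContent.
  # generationConfig.responseMimeType is an assumption based on the
  # Gemini API docs; it was not part of the original request.
  def body(prompt, base64_image, mime_type \\ "image/jpeg") do
    %{
      contents: [
        %{
          parts: [
            %{text: prompt},
            %{inline_data: %{data: base64_image, mime_type: mime_type}}
          ]
        }
      ],
      # Asks the model to reply with JSON instead of free-form text.
      generationConfig: %{responseMimeType: "application/json"}
    }
  end
end
```

The returned map can be passed to Req.post/2 as the json: option in place of the inline map. If the model honors the setting, the reply should no longer need the code fence cleanup, though I would keep the cleanup as a fallback until that is confirmed.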

And here's the runner module that reads an image and hands it to the AI module:

defmodule Plants do
  @moduledoc """
  Module to handle plant classification using Gemini Vision.
  """

  def classify_plant(image_path) do
    image_path
    |> File.read!()
    |> Base.encode64()
    |> Plants.AI.run()
  end
end

Plants.classify_plant("path/to/image.jpg")

Output:
# {:ok,
#  %{
#    "name" => "Aloe Vera",
#    "info" => "Air purifier",
#    "care" => "Indirect sunlight",
#    "watering_frequency" => "3 days"
#  }}