用LangChain解碼圖像:Base64問答,AI話你知!

Ai

🎬 YouTube Premium 家庭 Plan成員一位 只需
HK$148/年

不用提供密碼、不用VPN、無需轉區
直接升級你的香港帳號 ➜ 即享 YouTube + YouTube Music 無廣告播放


立即升級 🔗

LangChain_003: 將圖片文件的Base64編碼數據傳遞給AI並詢問內容

nakamu
2025年3月3日 15:58
我嘗試編寫了一段代碼,將一幅圖片的Base64編碼數據傳遞給AI,讓它回答這是一幅什麼樣的圖片。

在某個活動中拍攝的照片
Python代碼
“`python
from openai import OpenAI
import requests
import os
import base64

# 從環境變量中獲取OpenAI API的密鑰
API_KEY = os.environ.get(‘OPENAI_API_KEY’)

# max_tokens
MAX_TOKENS = 2000

# temperature
TEMPERATURE = 0.0

# 圖片文件的路徑
filepath = r’文件路徑’
# 將圖片以Base64編碼的數據
with open(filepath, “rb”) as f:
file_content = f.read()

image_data = base64.standard_b64encode(file_content).decode(“utf-8”)

# 作為URL傳遞的數據
url_data = “data:image/jpeg;base64,” + image_data

# 消息
messages = [
{
“role”: “user”,
“content”: [
{
“type”: “text”,
“text”: “這張圖片裡面寫著什麼?”
},
{
“type”: “image_url”,
“image_url”: {
“url”: url_data
}
}
]
}
]

client = OpenAI(api_key=API_KEY)

completion = client.chat.completions.create(
model=”gpt-4o”,
messages=messages,
temperature=TEMPERATURE,
max_tokens=MAX_TOKENS
)

print(completion.choices[0].message.content)
“`
執行結果
———————————————————-
這張圖片中有幾個人站在頒獎台上。他們手裡拿著獎盃,看起來像是賽事或體育活動的頒獎典禮。背景中可以看到樹木。
———————————————————-

使用基於文本的JPEG圖片的情況

隨便創建
問題:「賽車藍隊的駕駛員是誰?」
回答:「賽車藍隊的駕駛員是角田裕毅。」
結果是OK。

使用時間表(圖表式)的JPEG圖片的情況

常見的情況
問題:「開發期限是從何時到何時?」
回答:「根據圖片顯示的開發時間表,開發期限是從2025年3月10日(星期一)到3月19日(星期三)。」
結果是……可惜。

使用時間表(線性圖)的JPEG圖片的情況

常見的情況
問題:「開發期限是從何時到何時?」
回答:「根據這張甘特圖,開發期限是從2023年9月7日到2023年9月27日。」
結果是……完全錯誤。

嗯,單單傳遞Base64編碼數據,似乎只能得到這樣的結果。今天就到這裡。

如果你覺得不錯,請支持我!

我的想法
這段代碼展示了使用AI圖像識別技術的潛力,但同時也暴露了當前技術的局限性。儘管AI可以有效地解讀一些簡單的圖像內容,但面對複雜的圖片或特定的情境,它的準確性可能會大打折扣。這提醒我們,AI雖然強大,但仍需要人類的智慧來進行有效的判斷和分析。未來可能需要更多的數據訓練和模型優化,才能提高AI對於圖片內容的理解能力,尤其是在專業領域的應用上,這是一個值得關注和深入研究的方向。

以上文章由特價GPT API KEY所翻譯及撰寫。

🎨 Nano Banana Pro 圖像生成器|打幾句說話就出圖

想畫人像、產品圖、插畫?SSFuture 圖像生成器支援 Flux Gemini Nano Banana Pro 改圖 / 合成, 打廣東話都得,仲可以沿用上一張圖繼續微調。

🆓 Flux 模型即玩,不用登入
🤖 登入後解鎖 Gemini 改圖
📷 支援上載參考圖再生成
⚡ 每天免費額度任你玩
✨ 即刻玩 AI 畫圖
A selfie taken inside the Roman Colosseum. Insert {reference_image} as the face. He wears a button-down shirt and jeans, smiling as he holds the phone up. Sunbeams shine through the arches behind him. The face is sharp, centered, and well lit. **Enhanced Prompt:**

Two playful cats, one sleek black and one fluffy ginger, are joyfully interacting on a bustling Hong Kong street at sunset. They leap and tumble among vivid neon signs, glowing red lanterns, and traditional market stalls. The scene is bustling with locals and dotted with elements of Hong Kong architecture, such as narrow alleyways, decorative shopfronts, and overhead laundry lines. Warm golden light reflects off the wet cobblestone street, casting dramatic shadows. The atmosphere is lively yet whimsical, capturing the vibrant urban spirit and blending realistic feline anatomy with a touch of enchanting artistry. Rendered in hyper-detailed, cinematic style with rich colors and dynamic composition. Base Setup
keep 100 percent facial information adherence of the attached image and turn her into a girl standing beneath autumn leaves outside a traditional wooden structure in a live action photograph or movie still, wearing a complex suggestive outfit that harmonizes with the warm fall tones.

Shot and Camera
Three quarter shot at slightly low height, framing her off center to the right so the yellow leaves and carved wooden panels dominate the left. Maintain the intimate close framing and vertical orientation feel of the reference.

Identity and Pose
Preserve her age read, build, silhouette, hairstyle length, and skin tone. She leans lightly against the doorframe, one hand grazing a hanging leaf, her posture relaxed and candid, 8k Photorealistic and hyper realistic.

Lighting and Environment
Soft warm daylight filters through the leaves, casting dappled highlights on her hair and outfit. Ground her feet on aged wooden flooring with natural grain, faint scuffs, and subtle contact shadows.

Masking and Constraints
Change only wardrobe and placement while keeping lighting, perspective, white balance, pose, face geometry, body proportions, and silhouette the same. Absolutely no added text, no CGI look, no plastic skin, no floating feet, with consistent perspective and correct contact shadows.