New AI coding challenge: a winning score of just 7.5% jolts the industry

First-round results of a new AI coding challenge are in, and they are disappointing

A brand-new AI coding challenge has published its first-round results, and they show AI software engineers falling far short of the mark.

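The nonprofit Laude Institute announced on Wednesday at 5 p.m. Pacific time the first winner of the K Prize, a multi-round AI coding challenge launched by Andy Konwinski, co-founder of Databricks and Perplexity. The winner, Brazilian prompt engineer Eduardo Rocha de Andrade, receives $50,000 in prize money. More striking is how he won: he answered just 7.5% of the test questions correctly.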

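"We're glad we built a benchmark that is genuinely hard, because only hard benchmarks are meaningful," Konwinski said. He added: "Scores would be different if the big labs had entered with their biggest models. But that is exactly the point. K Prize runs offline with limited compute, so it favors smaller and open-source models. I love that, because it levels the playing field."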

Konwinski has also pledged a $1 million prize for an open-source model that scores above 90% on the test.

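K Prize works much like the well-known SWE-Bench: both use issues flagged on GitHub to test how well models handle real-world programming problems. But SWE-Bench is built on a fixed set of problems that models can be trained against, whereas K Prize is designed as a "contamination-free version of SWE-Bench," using a timed entry system to prevent benchmark-specific training. For the first round, models had to be submitted by March 12, and the test was built only from issues that appeared on GitHub after that date.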
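The timed-entry idea can be sketched in a few lines of Python. This is only an illustration of the principle described above, not K Prize's actual pipeline: the `Issue` record, the `contamination_free` helper, the sample data, and the year of the cutoff are all assumptions, since the article gives only "March 12".

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Issue:
    """Hypothetical, minimal stand-in for a flagged GitHub issue."""
    repo: str
    number: int
    created: date


def contamination_free(issues, cutoff):
    """Keep only issues opened strictly after the model submission deadline."""
    return [issue for issue in issues if issue.created > cutoff]


# Illustrative data only. The year is an assumption; the article says just "March 12".
cutoff = date(2025, 3, 12)
issues = [
    Issue("example/repo", 101, date(2025, 2, 28)),  # before the deadline
    Issue("example/repo", 102, date(2025, 3, 12)),  # on the deadline itself
    Issue("example/repo", 103, date(2025, 4, 2)),   # after the deadline
]

test_set = contamination_free(issues, cutoff)
print([issue.number for issue in test_set])  # prints [103]
```

Because every test issue postdates the frozen submission, a model cannot have seen it during training, which is the property that a fixed problem set cannot guarantee.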

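The 7.5% top score contrasts sharply with SWE-Bench, where top scores currently stand at 75% on the easier version and 34% on the harder one. Konwinski is still unsure whether the gap comes from data contamination in SWE-Bench or from freshly collected GitHub issues simply being harder, but he expects K Prize to answer the question soon.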

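"As the competition runs for more rounds, we'll have a clearer picture," he told TechCrunch, "because participants will gradually adapt to a challenge that is held every few months."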

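Many commentators argue that, despite the range of AI coding tools already on the market, existing benchmarks are too easy to reflect real AI capability, and they see K Prize as an important step toward fixing AI evaluation.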

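Princeton University researcher Sayash Kapoor agrees: "Building new tests to challenge existing benchmarks matters. Otherwise we can't tell whether high SWE-Bench scores come from data contamination or from people gaming the leaderboard."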

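Konwinski stresses that K Prize is not just a stricter benchmark but an open challenge to the whole AI industry. "If you believe the hype, we should already have AI doctors, AI lawyers, and AI software engineers, but that is simply not the case," he said. "If we can't even score 10% on a contamination-free SWE-Bench, that is the reality check."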

Commentary and takeaways

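The K Prize result is a wake-up call for AI software engineering. AI has advanced considerably in recent years, with breakthroughs in natural language processing and generation in particular, yet models still fall well short on real, complex programming problems. This highlights a limitation of current AI: it leans heavily on "memorizing" and "imitating" its datasets rather than genuinely understanding a problem and devising a solution.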

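K Prize's design principles (limited compute, no data contamination, and an emphasis on fair competition) also bring a fresh approach to AI evaluation. Many earlier benchmarks lost their value as a reference because models could in effect "cheat" by training on the test data. A strict test environment like this one helps reveal what AI can really do.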

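For Hong Kong and the global tech community, this is a reminder to stay level-headed when promoting AI applications, to avoid overhyping them, and to pay more attention to robustness and practical usefulness. It also gives the open-source community and small and mid-sized research teams a chance to showcase and drive innovation on a level playing field, and to break big tech's monopoly on AI research and development.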

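Finally, the result encourages us to rethink how AI can work alongside human engineers rather than replace them outright. The way forward may be a hybrid "human-machine collaboration" model that combines human creativity with AI's efficiency. Only then can the quality and quantity of software development truly improve.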

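This article was translated and written with a GPT API service (特價GPT API KEY). The image was generated automatically by FLUX based on the content.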
