OpenAI Releases SWE-Lancer: A Real-World Test of AI Software Engineering Skill, Measured in Money

OpenAI Introduces SWE-Lancer: A Benchmark for Evaluating Model Performance on Real-World Freelance Software Engineering Work

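As the challenges of software engineering keep evolving, traditional benchmarks often fall short. Real-world freelance software engineering is complex and involves far more than isolated coding tasks. Freelance engineers have to work across entire codebases, integrate diverse systems, and manage complicated client requirements. Conventional evaluation methods usually emphasize unit tests and overlook full-stack performance and the real economic impact of a solution. This gap between synthetic tests and practical application has created demand for more realistic evaluation methods.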

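OpenAI has introduced SWE-Lancer, a benchmark for evaluating model performance on real-world freelance software engineering work. It is built from more than 1,400 freelance tasks drawn from Upwork and the Expensify repository, with total payouts of US$1 million. Tasks range from small bug fixes to major feature implementations. SWE-Lancer is designed to evaluate both individual code patches and managerial decisions, in which the model must choose the best proposal from several options. This approach better reflects the dual roles found in real engineering teams.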
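To make the link between tasks and payouts concrete, here is a minimal Python sketch of how results on the two task types might be rolled up into a pass rate and a dollar total. The `TaskResult` type, its field names, the sample payouts, and the scoring rule itself (a task's full payout counts as earned only when the task passes) are all assumptions for illustration; the article does not describe the benchmark's actual scoring code.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """One evaluated task (hypothetical structure, not from the benchmark)."""
    task_id: str       # made-up identifier
    kind: str          # "ic" for code-patch tasks, "manager" for proposal selection
    payout_usd: float  # what the freelance task paid out
    passed: bool       # did the model's patch or choice pass evaluation?


def summarize(results):
    """Return pass rate, dollars earned, and dollars available."""
    total = sum(r.payout_usd for r in results)
    # Assumed rule: payout is earned in full only when the task passes.
    earned = sum(r.payout_usd for r in results if r.passed)
    pass_rate = sum(r.passed for r in results) / len(results) if results else 0.0
    return {"pass_rate": pass_rate, "earned_usd": earned, "total_usd": total}


# Made-up sample data, purely illustrative.
results = [
    TaskResult("bugfix-1", "ic", 250.0, True),
    TaskResult("feature-2", "ic", 16000.0, False),
    TaskResult("proposal-3", "manager", 1000.0, True),
    TaskResult("proposal-4", "manager", 500.0, False),
]
summary = summarize(results)
print(summary)
```

In this made-up sample the model passes half of the tasks but earns only a small share of the available money, because the one expensive feature task fails. That is the kind of difference a payout-weighted view is meant to surface.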

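A major strength of SWE-Lancer is that it uses end-to-end tests rather than isolated unit tests. These tests are designed and validated by professional software engineers and simulate the entire user workflow, from identifying and debugging a problem to verifying the patch. Because every evaluation runs in a uniform Docker image, each model is tested under the same controlled conditions. This rigorous testing framework helps reveal whether a model's solution is robust enough for real deployment.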

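SWE-Lancer's technical details are designed to reflect the reality of freelance work. Tasks require changes across multiple files and integration with APIs, and they span both mobile and web platforms. Beyond generating code patches, models are also challenged to review competing proposals and choose among them. This dual focus on technical and managerial skills mirrors the real responsibilities of a software engineer. The benchmark also includes a user tool that simulates real user interaction, which strengthens the evaluation further by encouraging iterative debugging and adjustment.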

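The results from SWE-Lancer offer valuable insight into what current language models can do in software engineering. On individual contributor tasks, models such as GPT-4o and Claude 3.5 Sonnet achieved pass rates of 8.0% and 26.2%, respectively. On managerial tasks, the best model reached a pass rate of 44.9%. These figures show that even though state-of-the-art models can produce promising solutions, there is still considerable room for improvement. Additional experiments show that allowing more attempts or increasing test-time compute can improve performance significantly, especially on the more challenging tasks.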
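The article does not say how multiple attempts are scored. As a rough illustration of why extra attempts help, the sketch below uses the widely used pass@k estimator, which is swapped in here as an assumption rather than taken from the benchmark: given `n` sampled attempts of which `c` passed, it estimates the chance that at least one of `k` randomly chosen attempts passes. The function name and the sample numbers are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate P(at least one of k attempts passes), given c passes in n samples."""
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failing samples: every k-subset contains a passing attempt.
    if n - c < k:
        return 1.0
    # 1 minus the probability that all k chosen attempts are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Illustrative numbers only: 2 passing attempts out of 10 samples.
single_try = pass_at_k(10, 2, 1)
five_tries = pass_at_k(10, 2, 5)
print(f"pass@1 = {single_try:.3f}, pass@5 = {five_tries:.3f}")
```

With the same underlying success rate, the estimate rises as `k` grows, which matches the qualitative observation that allowing more attempts improves measured performance.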

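In short, SWE-Lancer offers a thoughtful and realistic way to evaluate how AI performs in software engineering. By tying model performance directly to real economic value and emphasizing full-stack challenges, the benchmark gives a more accurate picture of what models can actually do. The work encourages a move away from synthetic evaluation metrics and toward evaluations that reflect the economic and technical realities of freelance work. As the field continues to develop, SWE-Lancer gives researchers and practitioners a useful tool, with clear insight into current limitations and possible paths to improvement. Ultimately, the benchmark helps pave the way for safer and more effective integration of AI into the software engineering process.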

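The SWE-Lancer benchmark that OpenAI has put forward is not only a challenge to existing testing methods but also a close look at the potential of AI in real-world use. It weighs technical ability and also stresses the decision-making that freelancers exercise in complex environments, which carries important lessons for the future development of AI. As the industry evolves, this style of evaluation may become a new standard for understanding the capability and value of AI.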

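The article above was translated and written using a discounted GPT API key. The images were generated automatically by FLUX based on the content.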

🎬 YouTube Premium 家庭 Plan成員一位 只需 HK$148/年

不用提供密碼、不用VPN、無需轉區
直接升級你的香港帳號 ➜ 即享 YouTube + YouTube Music 無廣告播放

立即升級 🔗

🎨 Nano Banana Pro 圖像生成器|打幾句說話就出圖

想畫人像、產品圖、插畫?SSFuture 圖像生成器支援 Flux Gemini Nano Banana Pro 改圖 / 合成, 打廣東話都得,仲可以沿用上一張圖繼續微調。

🆓 Flux 模型即玩,不用登入
🤖 登入後解鎖 Gemini 改圖
📷 支援上載參考圖再生成
⚡ 每天免費額度任你玩
✨ 即刻玩 AI 畫圖
{
"intro": "Create an ultra realistic 8K UHD DSLR photo based on the attached image as a reference of facial features, maintaining 100% likeness.",

"subject": {
"identity": "A stylish beautiful woman portrayed as Cleopatra performing a sacred night-time royal dance, embodying divine seduction, power, and celestial authority.",
"angle": "Full-body high-fashion editorial portrait captured at a cinematic 3/4 angle, ultra-crisp clarity with no blur.",
"pose": {
"body_position": "She is captured mid-dance inside a temple courtyard at night, her body angled gracefully, hips grounded, torso lifted, creating a powerful and sensual silhouette.",
"movement": "Her arms are raised in slow ceremonial motion while one leg steps forward, fully revealing her feet and lower legs as part of the ritual stance.",
"expression": "A calm, hypnotic, commanding gaze directed toward the camera—quiet dominance wrapped in sensual mystery."
}
},

"appearance": {
"outfit": "An ultra-bongga Cleopatra couture gown with an all-crystal bodice that fits like jeweled armor. Thousands of hand-set crystals, diamonds, and glass stones in gold, champagne, clear, and iridescent tones form sacred Egyptian motifs—cobras, sun disks, lotus symbols—entirely through crystal placement. The neckline is daring yet elegant, supported by invisible illusion mesh. From the waist, the skirt opens into flowing sheer silk and organza panels in ivory and soft gold, designed with a high front opening that intentionally exposes her legs and feet during movement. Crystal chains and beaded fringe sway rhythmically with every step, enhancing sensual motion.",
"accessories": "A dramatic Egyptian ceremonial headpiece with gold and crystal detailing, an oversized layered gold collar necklace, sculpted crystal arm cuffs, finger armor rings, a crystal-and-gold waist belt, and delicate gold anklets accentuating her bare feet.",
"hair": "Her hair is sleek, glossy, and perfectly controlled, styled to balance the weight of the crystal headpiece while allowing elegant movement.",
"makeup": "Ultra-glam, colorful Egyptian night makeup—bold elongated black kohl eyeliner, intense eyeshadow layered in turquoise, emerald, sapphire blue, violet, and metallic gold. Beneath the eyes, intricate ceremonial ink drawings in fine black and gold lines inspired by ancient Egyptian markings. Skin is luminous with molten-gold highlights, brows strong and defined, lips finished in a sensual nude-bronze satin."
},

"props": {
"ritual_elements": [
"Burning incense bowls releasing thick aromatic smoke",
"Golden sistrum placed near the dance space",
"Tall ceremonial torches lining the temple floor"
]
},

"background": {
"macro_environment": "A grand ancient Egyptian temple courtyard at night, reserved for sacred royal rituals.",
"midground_details": "Towering sandstone columns, massive statues of Isis and Hathor, carved temple walls, flowing linen drapes, and ceremonial platforms illuminated by firelight.",
"micro_elements": "Torch flames reflecting off crystal surfaces, fine dust particles glowing in the night air, incense smoke wrapping around her legs and feet, detailed stone floor textures beneath her bare feet, crystal fringe in motion, and sharply defined shadows—everything rendered in extreme detail with zero blur."
},

"lighting": {
"type": "Cinematic night lighting using torches, oil lamps, and subtle moonlight.",
"effect": "Firelight ignites the crystal bodice with prismatic sparkle while moonlight softly outlines her legs and feet, creating a dramatic, sensual, and divine atmosphere."
},

"camera": {
"camera_type": "DSLR",
"resolution": "8K UHD",
"lens": "50mm prime lens",
"aperture": "f/8 for full-body and environment sharpness",
"iso": 100,
"shutter_speed": "1/200s to freeze elegant motion clearly",
"focus": "Extreme sharp focus on crystal bodice, facial features, legs, feet, and temple details, no bokeh, no blur"
},

"style": "Ultra high-fashion editorial, night temple ritual, all-crystal couture, sexy yet sacred, ancient Egyptian royalty reimagined, cinematic realism, extremely detailed, sharp, iconic"
} Edit the uploaded photo (face based on the reference photo). Ensure the face remains consistent with the person in the uploaded image, without changing facial structure, skin tone . Create a Create an 8K ultra-realistic image of a jpyful woman dancing and celebrating in the rain outdoors. She has a big smile on her face, eyes closed with happiness. She is wearing a floral sleeveless dress with a fitted waist and a short, flowy skirt. She has a simple gold necklace and her dark hair is tied back. Her arms are raised, one hand higher than the other, and she is barefoot, standing on wet ground with her toes touching the surface. The background is blurred with dark green trees, emphasizing her joyful expression and movement. The rain is falling steadily around her, creating a lively and vibrant atmosphere. Using a Canon EOS R camera with a 50mm f/1.8 lens, f/2.2 aperture, shutter speed 1/200s, ISO 100 and natural light, Full Body, Hyper Realistic Photography, Cinematic, Cinema, Hyper detail, Ultra hd, Color Correction, ultra hd, hdr , color grading, 8k. Generate an ultra-realistic, highly ultra-detailed, 8k resolution with 1080x1080 pixel, true-to-life portrait of me using the uploaded image for reference (preserved the likeness and the original face for reference). 
Create a portrait of a  fair-skinned woman with long, curly dark hair styled in a high ponytail, her head turned to the side, her expression neutral and serene; her makeup features defined eyebrows, subtle eyeliner, light blush, and glossy lips; she wears dangling earrings with a floral design; she has a piercing in her ear; the image is in ultra-high 8K resolution, showcasing detailed skin textures, crisp edges, and sharp focus on her eyes; a medium shot, taken from the side to emphasize her profile, with a shallow depth of field that softly blurs the background; soft, diffused lighting illuminates her face evenly, creating subtle shadows and highlights, with a warm color palette of soft browns and creams; she is wearing a dark strapless top; the background is a soft, neutral tone, ensuring the focus remains on the subject; no additional props are present; photorealistic style, akin to a raw camera capture, is achieved using an 85mm lens, ISO 100, and an aperture of f/2.0 for a shallow depth of field and soft background blur.

➖Additional details:
- Negative Prompt: whimsical , doll skin, plastic skin, cartoon, 3d render, cgi,a low poly, painting, drawing, sketch, anime, deformed, bad anatomy, mutated hands, extra limbs, low quality, blurry, artifacts, plastic skin, out of frame, out of focus, wrong spelling, rumble letters, missing letter, blurry letter, blurry face, lowres, pixelated, jpeg artifacts, repeated face and repeated word.