MIT Uses Generative AI to Build Realistic Simulated Scenes for Robot Training

MIT develops generative AI technique for diverse virtual robot training environments

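MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL), working with Toyota Research Institute, has introduced a new tool called "steerable scene generation." It creates realistic virtual kitchens, living rooms, and restaurants in which simulated robots can interact with models of real-world objects, greatly expanding the training data available for robot foundation models.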

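Over the past few years, chatbots such as ChatGPT and Claude have spread rapidly because they help users with a wide range of tasks, from writing poems and debugging code to answering obscure questions, drawing on billions of text data points from the internet. Teaching a robot to be a useful household or factory assistant, however, takes more than text. Robots need large numbers of demonstrations, much like how-to videos, that show how to handle, stack, and place objects. Collecting such demonstrations on real robots is time-consuming and hard to repeat exactly, so engineers usually turn to AI-generated simulations or handcrafted digital environments. The former often ignore real-world physics, while the latter are extremely laborious to build.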

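The MIT team's new approach uses a diffusion model to generate 3D scenes, then applies a Monte Carlo Tree Search (MCTS) strategy that repeatedly tries different scene configurations to optimize physical realism and diversity. The process can be seen as "steering" the AI from random noise toward a realistic everyday environment in which, for example, cutlery in a kitchen does not pass through bowls and plates, avoiding the "clipping" artifact common in 3D graphics.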
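
To make the "clipping" problem concrete, here is a minimal Python sketch of an interpenetration check between objects in a scene. It is not the researchers' method: the article does not say how physical feasibility is tested, so the axis-aligned bounding boxes, the `Box` type, and the function names below are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned bounding box: min corner plus size, in metres."""
    name: str
    x: float
    y: float
    z: float
    w: float  # extent along x
    d: float  # extent along y
    h: float  # extent along z


def overlaps(a: Box, b: Box) -> bool:
    """True if the two boxes share interior volume (touching faces do not count)."""
    return (a.x < b.x + b.w and b.x < a.x + a.w and
            a.y < b.y + b.d and b.y < a.y + a.d and
            a.z < b.z + b.h and b.z < a.z + a.h)


def clipping_pairs(scene: List[Box]) -> List[Tuple[str, str]]:
    """Return the names of every pair of objects that interpenetrate."""
    pairs = []
    for i in range(len(scene)):
        for j in range(i + 1, len(scene)):
            if overlaps(scene[i], scene[j]):
                pairs.append((scene[i].name, scene[j].name))
    return pairs


scene = [
    Box("bowl", 0.0, 0.0, 0.0, 0.2, 0.2, 0.1),
    Box("fork", 0.1, 0.1, 0.05, 0.02, 0.15, 0.01),  # placed inside the bowl's box
    Box("plate", 0.5, 0.0, 0.0, 0.25, 0.25, 0.02),
]
print(clipping_pairs(scene))
```

A scene generator could reject or repair any scene for which `clipping_pairs` is non-empty. A real physics check would use object meshes and a simulator rather than bounding boxes.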

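The MCTS strategy resembles the way AlphaGo anticipates the best move in Go: the model repeatedly builds and evaluates multiple candidate scenes, then picks the one that best meets the objective, such as adding more objects or improving physical plausibility. In one experiment, the system raised the number of objects on the table in a simple restaurant scene from an average of 17 to 34, including towering stacks of dim sum dishes.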
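
The search idea can be illustrated with a toy example. The sketch below runs a standard UCT-style Monte Carlo Tree Search to pack as many non-overlapping objects as possible onto a one-dimensional "table". Everything here is an assumption made for illustration (the `TABLE` and `WIDTHS` constants, the reward, the function names); the actual system searches over scenes proposed by a diffusion model, which this sketch does not attempt to reproduce.

```python
import math
import random

TABLE = 6        # hypothetical table length, in slots
WIDTHS = (1, 2)  # hypothetical object footprints, in slots


def legal_moves(state):
    """All (start, width) placements that fit on the table and overlap nothing."""
    used = {s + k for s, w in state for k in range(w)}
    return [(s, w) for w in WIDTHS for s in range(TABLE - w + 1)
            if all(s + k not in used for k in range(w))]


def rollout(state, rng):
    """Fill the rest of the table at random; the result is the object count."""
    state = list(state)
    moves = legal_moves(state)
    while moves:
        state.append(rng.choice(moves))
        moves = legal_moves(state)
    return len(state)


class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}
        self.untried = legal_moves(state)
        self.visits = 0
        self.total = 0.0

    def best_child(self, c=1.4):
        """UCT rule: mean reward plus an exploration bonus."""
        return max(self.children.values(),
                   key=lambda n: n.total / n.visits
                   + c * math.sqrt(math.log(self.visits) / n.visits))


def mcts_step(root_state, rng, iterations=200):
    """Choose the next placement by running MCTS from the current partial scene."""
    root = Node(tuple(root_state))
    for _ in range(iterations):
        node = root
        while not node.untried and node.children:      # selection
            node = node.best_child()
        if node.untried:                               # expansion
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.state + (move,), parent=node)
            node.children[move] = child
            node = child
        reward = rollout(node.state, rng) / TABLE      # simulation
        while node is not None:                        # backpropagation
            node.visits += 1
            node.total += reward
            node = node.parent
    return max(root.children, key=lambda m: root.children[m].visits)


def build_scene(seed=0):
    """Add objects one at a time until nothing else fits."""
    rng = random.Random(seed)
    state = []
    while legal_moves(state):
        state.append(mcts_step(state, rng))
    return state


scene = build_scene()
print(len(scene), "objects placed:", scene)
```

Because the reward is the final object count, the search tends to prefer narrow objects that leave room for more. Swapping in a different reward, for instance a physical plausibility score, would steer the same search toward a different objective.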

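The system also incorporates reinforcement learning, using a reward signal to learn to create diverse scenes that better match a given objective. It can generate scenes from a user's specific description (such as "a kitchen with four apples and a bowl on the table") with accuracy as high as 98%. Users can also instruct the system to rearrange the objects in an existing scene.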
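
One plausible way to read a prompt accuracy figure like this is as the fraction of generated scenes that contain exactly the requested objects. The sketch below computes that fraction for a small batch. The scene representation (a list of object labels) and the function names are assumptions, and the article does not describe how the researchers actually scored their results.

```python
from collections import Counter
from typing import Dict, List


def matches_prompt(scene: List[str], required: Dict[str, int]) -> bool:
    """True if the scene has exactly the requested count of each named object.

    Objects that the prompt does not mention are ignored.
    """
    counts = Counter(scene)
    return all(counts[name] == n for name, n in required.items())


def prompt_accuracy(scenes: List[List[str]], required: Dict[str, int]) -> float:
    """Fraction of scenes in the batch that satisfy the prompt."""
    if not scenes:
        return 0.0
    return sum(matches_prompt(s, required) for s in scenes) / len(scenes)


# "A kitchen with four apples and a bowl on the table"
required = {"apple": 4, "bowl": 1}
batch = [
    ["apple"] * 4 + ["bowl"],          # matches
    ["apple"] * 3 + ["bowl"],          # one apple short
    ["apple"] * 4 + ["bowl", "fork"],  # matches; the extra fork is ignored
    ["apple"] * 4,                     # bowl missing
]
print(prompt_accuracy(batch, required))
```

The same per-scene check could in principle serve as a reward signal during reinforcement learning, although the article does not say which rewards the team used.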

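The researchers stress that the system's core strength is its ability to move beyond the large set of scenes it was pre-trained on and create more realistic, more diverse environments for a specific training need, making robot training more efficient. In these virtual scenes, robots can be simulated carrying out everyday tasks such as setting out cutlery and arranging food, with smooth and physically plausible motion. This may eventually lead to more dexterous and adaptable robots.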

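The team notes, however, that the work is currently a proof of concept. They plan to have generative AI create entirely new objects and scenes rather than rely on a fixed asset library, and to add articulated objects that can be opened (such as cabinets and food jars) for richer interaction. They also intend to draw on internet images and their earlier "Scalable Real2Sim" work to build more realistic and varied test environments, and they hope to attract a community of users to build a large dataset that helps robots learn fine-grained skills.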

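Outside experts also praised the approach: compared with traditional procedural generation or manual modeling, it is efficient and guarantees physical plausibility, which makes the generated 3D scenes more useful in practice. Rick Cory, a roboticist at Toyota Research Institute, said that combining it with large-scale internet data in future applications would be an important milestone toward deploying robots in the real world.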

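The research was led by Nicholas Pfaff, a PhD student in MIT's Department of Electrical Engineering and Computer Science, in collaboration with researchers from Toyota Research Institute, Carnegie Mellon University, and elsewhere. It was presented at the Conference on Robot Learning (CoRL) in September 2025.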

Commentary and Takeaways

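The MIT team's steerable scene generation could reshape robot training. Until now, training environments have mostly relied on fixed or handcrafted scenes, which struggle to cover the endless variety of object arrangements and physical interactions found in daily life, and that has limited how flexibly robots can be applied. By combining generative AI with Monte Carlo Tree Search and reinforcement learning, this approach can produce diverse, realistic 3D scenes at scale and also tailor training environments to specific needs, greatly increasing both the richness and the relevance of training data.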

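For the robotics industry in Hong Kong and worldwide, this suggests that future robots could adapt better to complex environments not only on factory production lines but also in homes and commercial premises. In a city like Hong Kong, where space is tight and surroundings vary widely, a robot's ability to handle different objects and settings is especially important. The technique could speed the move of robots from the laboratory into practical use, supporting fields such as smart homes and logistics automation.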

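The work is still a proof of concept, however. How it will extend to more novel objects and interactive scenes, and how it will connect seamlessly with real physical environments, remain open questions. Incorporating data from physical sensors, real-time physical feedback, and more complex object manipulation will be the next challenges.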

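Overall, the research shows the considerable potential of generative AI in robot training and paves the way for robots that fit into daily life more intelligently and flexibly. For Hong Kong's research community and industry, following frontier techniques like this closely and cultivating interdisciplinary talent would help strengthen local competitiveness in robotics research and application.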

