Amazon Launches AI Voice Model Nova Sonic: Faster Than OpenAI and Far Cheaper!


Amazon launches new AI voice model Nova Sonic

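On Tuesday, Amazon launched Nova Sonic, a new generative AI model that processes speech natively and generates natural-sounding speech. Amazon claims that Nova Sonic is competitive with frontier voice models from OpenAI and Google on benchmarks measuring speed, speech recognition, and conversational quality.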

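Nova Sonic is Amazon's answer to newer AI voice models, such as the one powering ChatGPT's Voice Mode, which feel more natural in conversation than the rather stiff models behind the early Amazon Alexa. Recent technical breakthroughs have made those older models, and the digital assistants built on them (such as Alexa and Apple's Siri), seem clumsy by comparison.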

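Nova Sonic is available through Bedrock, Amazon's developer platform, via a new bidirectional streaming API. In a press release, Amazon called Nova Sonic the "most cost-efficient" AI voice model on the market, priced roughly 80% lower than OpenAI's GPT-4o.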

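According to Rohit Prasad, senior vice president and head scientist of Amazon's AGI division, components of Nova Sonic are already at work in Alexa+, the upgraded version of the company's digital voice assistant.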

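In an interview with TechCrunch, Prasad said that Nova Sonic builds on Amazon's expertise in "large orchestration systems," the technical scaffolding that makes up Alexa. Compared with rival AI voice models, Nova Sonic excels at routing user requests to different APIs. This capability lets Nova Sonic "know" when it needs to fetch real-time information from the internet, parse a proprietary data source, or take an action in an external application, and then use the appropriate tool to do so.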
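To make the routing idea concrete, here is a minimal sketch of the application side of such a setup, assuming the model has already chosen a tool by name. Every name here (`search_web`, `query_private_data`, `run_app_action`, `TOOLS`, `dispatch`) is hypothetical and is not taken from Amazon's API; the stub functions only return placeholder strings.

```python
from typing import Callable, Dict


# Hypothetical tools. Real ones would call a search service, a database,
# or an external application; these stubs only return placeholder text.
def search_web(query: str) -> str:
    return f"[web results for: {query}]"


def query_private_data(query: str) -> str:
    return f"[records matching: {query}]"


def run_app_action(query: str) -> str:
    return f"[action executed: {query}]"


# Registry mapping the tool name chosen by the model to a Python callable.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search_web": search_web,
    "query_private_data": query_private_data,
    "run_app_action": run_app_action,
}


def dispatch(tool_name: str, argument: str) -> str:
    """Run the tool the model selected, or fail loudly on an unknown name."""
    try:
        tool = TOOLS[tool_name]
    except KeyError:
        raise ValueError(f"unknown tool: {tool_name}") from None
    return tool(argument)


if __name__ == "__main__":
    print(dispatch("search_web", "weather in Seattle today"))
```

The sketch covers only the dispatch step. Deciding which tool fits a spoken request is the part the article attributes to the model itself.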

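During two-way dialogue, Nova Sonic speaks at "the appropriate time," taking the speaker's pauses and interruptions into account. It also generates a text transcript of the user's speech, which developers can use in a variety of applications.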

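According to Prasad, Nova Sonic has a lower speech recognition error rate than other AI voice models, meaning it understands a user's intent relatively accurately even when the user mumbles, misspeaks, or is in a noisy environment. On Multilingual LibriSpeech, a benchmark that measures speech recognition across multiple languages and dialects, Amazon says Nova Sonic achieved a word error rate (WER) of just 4.2% across English, French, Italian, German, and Spanish. In other words, about four out of every 100 words differed from a human transcript.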
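For readers unfamiliar with the metric, WER is conventionally computed as the word-level edit distance (substitutions, insertions, and deletions) between the system's transcript and a human reference, divided by the number of words in the reference. The following is a minimal sketch of that standard definition. It is not Amazon's evaluation code, and the function name `word_error_rate` and the sample sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One wrong word out of 25 gives a WER of 4%, close to the reported 4.2%.
    reference = " ".join(["word"] * 24 + ["hello"])
    hypothesis = " ".join(["word"] * 24 + ["hullo"])
    print(word_error_rate(reference, hypothesis))  # 0.04
```

Published benchmark scores usually also normalize casing and punctuation before scoring, a step this sketch leaves out.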

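On Augmented Multi Party Interaction, another benchmark that measures noisy interactions among multiple participants, Amazon says Nova Sonic was 46.7% more accurate in terms of WER than OpenAI's GPT-4o transcription model. Amazon's figures also put Nova Sonic's average perceived latency at 1.09 seconds, which the company calls industry-leading and which is faster than the 1.18 seconds it cites for the GPT-4o model powering OpenAI's Realtime API.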
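To put the latency claim in proportion, the short calculation below works out the gap between the two figures Amazon reports. Only the two latency values come from the article; the variable names are illustrative.

```python
# Average perceived latency figures as reported by Amazon (seconds).
nova_sonic_latency_s = 1.09
gpt4o_realtime_latency_s = 1.18

# Absolute gap in seconds, and the gap relative to the slower model.
absolute_gap_s = gpt4o_realtime_latency_s - nova_sonic_latency_s
relative_gap = absolute_gap_s / gpt4o_realtime_latency_s

print(f"absolute gap: {absolute_gap_s * 1000:.0f} ms")
print(f"relative gap: {relative_gap:.1%}")
```

By these numbers, the reported difference is about 90 milliseconds, or roughly 7.6% of the GPT-4o figure.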

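Prasad said Nova Sonic is part of Amazon's broader strategy for AGI (artificial general intelligence), which the company defines as "AI systems that can do anything a human can do on a computer." Going forward, Prasad said, Amazon plans to release more AI models that understand different modalities, including image, video, and voice, as well as "other sensory data that are relevant if you bring things into the physical world."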

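Amazon's AGI division, which Prasad oversees, appears to be playing an increasingly important role in the company's current product strategy. Just last week, Amazon released a preview of Nova Act, a browser-based AI model that appears to power Alexa+ and Amazon's "Buy for Me" feature. Prasad said that, starting with Nova Sonic, the company wants to make more of its internal AI models available for developers to use.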

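Against this backdrop, the launch of Nova Sonic is more than a technical advance: it is part of Amazon's ongoing effort to compete in AI. As the technology evolves, digital assistants are likely to become smarter and more human-like, which could improve the user experience and change how people interact with technology. The development is worth watching because it may affect a wide range of industries, from customer service to healthcare and education. How Amazon's efforts will shape the future digital assistant ecosystem remains to be seen.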

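The article above was translated and written using Discount GPT API KEY (特價GPT API KEY). The image was generated automatically by FLUX based on the content.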