YuE: An Open-Source AI Music Model That Creates Complete Songs, Rivaling Suno AI

YuE: an open-source family of music-generation AI models that can compose complete songs

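As artificial intelligence and music-generation technology have advanced, short instrumental pieces have seen significant progress. Generating complete songs with lyrics, vocals, and instrumental accompaniment, however, remains a challenge for existing models. Turning lyrics into a full song raises several problems. A song lasts several minutes, so the model must stay consistent and coherent over that whole span. Music also involves complex harmonic structures, instrumentation, and rhythmic patterns, not just speech or sound effects. AI-generated lyrics often become incoherent when combined with musical elements, and paired lyrics-audio datasets for effective training are relatively scarce.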

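Against this backdrop, the Multimodal Art Projection (M-A-P) team developed YuE, an open-source family of foundation models that competes with Suno AI in song generation. The models are designed to create complete songs several minutes long from lyrics, with the ability to vary the backing music, style, and lyrics. The YuE family comes in several variants with up to 7 billion parameters. The YuE models available on Hugging Face include: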

– YuE-s1-7B-anneal-en-cot
– YuE-s1-7B-anneal-en-icl
– YuE-s1-7B-anneal-jp-kr-cot
– YuE-s1-7B-anneal-jp-kr-icl
– YuE-s1-7B-anneal-zh-cot
– YuE-s1-7B-anneal-zh-icl
– YuE-s2-1B-general
– YuE-upsampler

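YuE uses several techniques to tackle the challenges of full-song generation, building on LLaMA-family language models for lyrics-to-song generation. One core advance is its dual-token technique, which lets vocals and instrumentals be modeled simultaneously without modifying LLaMA's base architecture; this keeps the vocal and instrumental parts of a generated song in harmony. YuE also uses a powerful audio tokenizer that lowers training cost and speeds up convergence, so the generated audio preserves its musical integrity while computation stays efficient.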
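The article does not spell out how the dual tokens are laid out. The sketch below assumes one plausible arrangement: each audio frame contributes one vocal token and one accompaniment token, and the two streams are interleaved into a single sequence so an unmodified decoder-only language model can handle both with ordinary next-token prediction. The function names and token IDs are hypothetical illustrations, not YuE's actual code.

```python
from typing import List, Tuple


def interleave_tracks(vocal: List[int], accomp: List[int]) -> List[int]:
    """Merge two frame-aligned token streams into one sequence: v0, a0, v1, a1, ...

    Assumption (not confirmed by the article): one vocal token and one
    accompaniment token per audio frame.
    """
    if len(vocal) != len(accomp):
        raise ValueError("vocal and accompaniment streams must have the same number of frames")
    merged: List[int] = []
    for v, a in zip(vocal, accomp):
        merged.extend((v, a))
    return merged


def deinterleave_tracks(merged: List[int]) -> Tuple[List[int], List[int]]:
    """Split an interleaved sequence back into (vocal, accompaniment) streams."""
    if len(merged) % 2 != 0:
        raise ValueError("interleaved sequence must have an even length")
    return merged[0::2], merged[1::2]


if __name__ == "__main__":
    vocal = [11, 12, 13]    # hypothetical vocal token IDs
    accomp = [21, 22, 23]   # hypothetical accompaniment token IDs
    seq = interleave_tracks(vocal, accomp)
    print(seq)
    print(deinterleave_tracks(seq))
```

Because the merged sequence is just a flat list of token IDs, a standard language model can be trained on it unchanged; the pairing of tracks lives entirely in the data layout rather than in the architecture.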

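Another distinctive technique in YuE is Lyrics Chain-of-Thought (Lyrics-CoT), which lets the model generate lyrics step by step in a structured way so that the lyrical content stays consistent and meaningful across the whole song. YuE also follows a structured three-stage training scheme that improves scalability, musicality, and lyrics control. This training lets the model generate songs of varying length and complexity, makes the generated music sound more natural, and tightens the alignment between the generated lyrics and the overall song structure.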
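As a rough illustration of structured, step-by-step generation, the sketch below assumes lyrics annotated with hypothetical section labels such as `[verse]` and `[chorus]`. It splits them into sections and processes the sections in order, passing the outputs of earlier sections along as context. `generate_segment` is a hypothetical placeholder for a model call, not part of any YuE API, and this is not YuE's actual Lyrics-CoT implementation.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical section-label format: a line containing only "[label]".
SECTION_RE = re.compile(r"^\[(\w+)\]\s*$")


def split_sections(lyrics: str) -> List[Tuple[str, str]]:
    """Split labelled lyrics into (label, text) pairs; text before the first label is ignored."""
    sections: List[Tuple[str, str]] = []
    label: Optional[str] = None
    lines: List[str] = []
    for line in lyrics.splitlines():
        m = SECTION_RE.match(line.strip())
        if m:
            if label is not None:
                sections.append((label, "\n".join(lines).strip()))
            label = m.group(1).lower()
            lines = []
        elif label is not None:
            lines.append(line)
    if label is not None:
        sections.append((label, "\n".join(lines).strip()))
    return sections


def generate_song(
    lyrics: str,
    generate_segment: Callable[[str, str, List[str]], str],  # hypothetical model call
) -> List[str]:
    """Generate one output per section, in order, conditioning each on earlier outputs."""
    outputs: List[str] = []
    for label, text in split_sections(lyrics):
        outputs.append(generate_segment(label, text, list(outputs)))
    return outputs


if __name__ == "__main__":
    demo = "[verse]\nline one\nline two\n\n[chorus]\nhook line\n"
    # Stub "model": report the section label and how many earlier sections it saw.
    print(generate_song(demo, lambda label, text, history: f"{label}:{len(history)}"))
```

Processing sections in order with the earlier outputs as context is one simple way to keep later parts of a song consistent with what came before; a real system would replace the stub with an actual model.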

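What sets YuE apart is its ability to generate complete songs with both vocal melodies and instrumental accompaniment. Whereas existing models struggle with long-form compositions, YuE maintains musical coherence throughout an entire song. The generated vocals follow natural singing patterns and pitch changes, making the music more engaging, and the instrumental parts align closely with the vocal contour, producing a naturally balanced song. The model family also supports multiple musical styles and languages.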

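In terms of usage, YuE models are designed to run on high-performance GPUs for full-song generation. At least 80 GB of GPU memory (for example, an NVIDIA A100) is recommended for best results. Depending on the GPU, generating a 30-second music clip typically takes 150 to 360 seconds. Users can generate music with Hugging Face's Transformers library, and the models also support music in-context learning (ICL), which lets users supply a reference song for the AI to base new music on.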
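The timing figures above allow a back-of-the-envelope estimate for a full song. The sketch assumes that generation time scales linearly with the number of 30-second clips, which the article does not state; `estimate_generation_seconds` is a hypothetical helper, not part of any YuE tooling.

```python
import math


def estimate_generation_seconds(
    song_seconds: float,
    secs_per_clip: float,
    clip_seconds: float = 30.0,
) -> float:
    """Rough wall-clock estimate, assuming (not confirmed) linear scaling per clip."""
    if song_seconds <= 0 or secs_per_clip <= 0 or clip_seconds <= 0:
        raise ValueError("all durations must be positive")
    clips = math.ceil(song_seconds / clip_seconds)  # a partial clip counts as a full one
    return clips * secs_per_clip


if __name__ == "__main__":
    # The article's range: 150 s (faster GPU) to 360 s (slower GPU) per 30-second clip.
    for per_clip in (150, 360):
        total = estimate_generation_seconds(180, per_clip)
        print(f"{per_clip} s/clip -> {total:.0f} s ({total / 60:.0f} min) for a 3-minute song")
```

For a 3-minute (180-second) song this gives 6 clips, i.e. roughly 900 seconds (15 minutes) at 150 s per clip up to 2,160 seconds (36 minutes) at 360 s per clip.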

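YuE is released under the Creative Commons Attribution Non-Commercial 4.0 License. Artists and content creators are encouraged to sample, modify, and incorporate its outputs, crediting the model in their work as "YuE by HKUST/M-A-P". YuE opens the door to many applications of AI-generated music: helping musicians and composers generate song ideas and complete works; creating soundtracks for films, video games, and virtual content; generating custom songs from user-supplied lyrics or themes; and supporting music education by showcasing AI-generated works in a variety of styles and languages.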

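In summary, YuE represents a breakthrough in AI music generation, addressing the long-standing challenge of turning lyrics into complete songs. With its advanced techniques, scalable architecture, and open-source approach, YuE could redefine the landscape of AI-driven music production. As further improvements and community contributions arrive, YuE has the potential to become the leading foundation model for full-song generation.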

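In this era of rapid technological change, YuE's release not only demonstrates AI's potential in music creation but also prompts reflection on how music will be made in the future. As AI technology develops further, the boundaries of music creation will be redrawn, and YuE may become a capable creative assistant for professional musicians and amateurs alike. This is not only technical progress but also a new avenue for creative expression.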

The article above was translated and written by 特價GPT API KEY, and the images were generated automatically by FLUX based on the content.
