Source audio, duration, and export checklist for AI Audio to Audio

The audio-to-audio panel on Z.Tools accepts a small set of file shapes and rejects the rest before the upload finishes. Most of the time this is silent and helpful. Occasionally it is silent and frustrating, because the rejection happens before the request reaches the model and the error message is not always specific enough to tell you what went wrong.

This is the missing pre-upload checklist: five rules that cover almost every "why didn't this work" question I have seen.

Source audio: which models need it

MiniMax Music Cover requires a source clip. There is no from-scratch path on this model; if you do not upload audio, the request is rejected at the panel level before generation starts.

ACE-Step v1.5 Base and v1.5 Turbo accept a source clip optionally. With a source, the model treats it as a remix seed and behaves like a cover model. Without a source, the model generates from a prompt alone.

The decision is upstream: what do you have, and what do you want?

A song you want to hear in a different style: source audio, MiniMax or ACE-Step
A from-scratch generation from a prompt: ACE-Step only, no source

Duration constraints

MiniMax Music Cover accepts source audio between 6 seconds and 6 minutes. Anything outside that range is rejected before upload finishes. The tool reads duration from file metadata, so a clip that fails to decode never reaches the provider; you see a rejection at the upload step.

ACE-Step has a looser envelope. Source audio between 6 seconds and roughly 5 minutes is the safe range, though the registry does not pin a hard maximum the way MiniMax does. If you upload something longer than 5 minutes, the model accepts it, but the credit hold and the output length are both sized from the source duration, which can make the generation more expensive than you expected.

The ACE-Step duration slider only applies when no source audio is uploaded. The slider's range is 6 to 300 seconds, with a default of 60. Once a source clip is set, the slider is hidden because the output length follows the source. If you want a 4-minute output and you have only a 90-second source, you cannot get there from one ACE-Step generation; you would need to extend the source first or run multiple generations and stitch them.
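The duration rules in this section can be sketched as a small pre-flight check. This is a minimal illustration rather than anything Z.Tools ships: the function name, the model keys, and the returned strings are all hypothetical, and the numbers are the ones quoted in this article, not an official spec.

```python
from typing import Optional

# Limits quoted in this article; treat them as a snapshot, not a spec.
MINIMAX_RANGE_S = (6, 360)     # 6 seconds to 6 minutes, enforced at upload
ACE_STEP_SAFE_S = (6, 300)     # "safe range" for a source clip, soft ceiling
ACE_STEP_SLIDER_S = (6, 300)   # slider range, used only without a source


def check_duration(model: str, source_s: Optional[float], wanted_s: float = 60) -> str:
    """Return 'ok', 'warn: ...' or 'reject: ...' for a planned generation.

    `model` is a hypothetical key: 'minimax' or 'ace-step'.
    `source_s` is the source clip length in seconds, or None if no clip.
    `wanted_s` is the ACE-Step slider value (default 60 per the article).
    """
    if model == "minimax":
        if source_s is None:
            return "reject: MiniMax Music Cover needs a source clip"
        lo, hi = MINIMAX_RANGE_S
        if not lo <= source_s <= hi:
            return "reject: source must be between 6 s and 6 min"
        return "ok"
    if model == "ace-step":
        if source_s is None:
            lo, hi = ACE_STEP_SLIDER_S
            if not lo <= wanted_s <= hi:
                return "reject: slider range is 6 to 300 s"
            return "ok"
        lo, hi = ACE_STEP_SAFE_S
        if source_s < lo:
            return "warn: source is below the 6 s safe minimum"
        if source_s > hi:
            return "warn: accepted, but credit hold and output length follow the source"
        return "ok"
    raise ValueError(f"unknown model: {model}")
```

For example, `check_duration("ace-step", 400)` returns a warning instead of a rejection, which mirrors the point above that ACE-Step accepts long sources but sizes the cost from them.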

File format

Both models accept MP3 and WAV at the upload step. M4A from voice memos has to be converted; FLAC and OGG are not currently accepted as source even though they are valid output formats. The simplest conversion path on macOS is Audacity or Quick Look's "Open with > GarageBand" trick; on Windows, Audacity or VLC's convert/save feature.

ACE-Step normalizes whatever you upload to stereo at 48 kHz internally. MiniMax has its own internal handling. There is no benefit to uploading at a higher sample rate than your source recording.

A practical rule: if you exported from a voice memo app, run the file through a converter to 192 kbps MP3 before uploading. Higher-bitrate files work too, but 192 kbps is plenty for what the model can read and removes any container-format edge case.
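If you prefer a script to a GUI converter, the 192 kbps rule can be sketched with ffmpeg. This swaps in ffmpeg for the tools named above and assumes it is installed, on your PATH, and built with the `libmp3lame` encoder. The helper names and the `-upload` filename suffix are illustrative choices, not part of Z.Tools.

```python
import subprocess
from pathlib import Path


def mp3_command(src: str, bitrate: str = "192k") -> list[str]:
    """Build an ffmpeg command that re-encodes `src` to an MP3 at `bitrate`."""
    source = Path(src)
    # Write next to the source under a new name so an .mp3 input is never overwritten.
    out = source.with_name(source.stem + "-upload.mp3")
    return [
        "ffmpeg",
        "-i", src,               # input file, e.g. an M4A voice memo
        "-vn",                   # drop any video or cover-art stream
        "-c:a", "libmp3lame",    # MP3 encoder (assumed present in the ffmpeg build)
        "-b:a", bitrate,         # audio bitrate, 192 kbps by default
        str(out),
    ]


def convert(src: str) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(mp3_command(src), check=True)
```

Calling `convert("memo.m4a")` would write `memo-upload.mp3` next to the original. The sample rate is left untouched on purpose, since there is no benefit to resampling above the source recording.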

Output format

The format selector at output time supports MP3, WAV, FLAC, and OGG. The selection is per-generation; the tool respects what you picked when the result downloads, so a WAV-selected generation comes back as a 16-bit 48 kHz WAV file rather than a transcoded MP3.

The history panel keeps each result with its original format. Re-downloading from history will not silently change the extension, which matters when you have committed a result to a project and want to fetch the same file later.

A rule for picking format: pick MP3 for everything that is going into video editing or social media; pick WAV when you are committing the result to a DAW project; pick FLAC when you want lossless and are storing the file for archive. OGG is rarely the right choice on the audio-to-audio output side.

When to convert before upload

Three cases where pre-upload conversion saves frustration:

Voice memo files in M4A or AAC: convert to MP3 or WAV (Audacity handles both)
Video files with embedded audio: extract the audio first; the tool does not pull audio from video containers
96 kHz files from professional DAW sessions: downsample to 48 kHz before uploading, since the model normalizes there anyway
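The conversion cases above, plus the FLAC and OGG restriction from the file format section, can be sketched as a triage function. Everything here is a hypothetical helper: the video extension list is illustrative and not exhaustive, and treating any rate above 48 kHz like the 96 kHz case is my generalization, not a documented rule.

```python
from pathlib import Path
from typing import Optional

UPLOADABLE = {".mp3", ".wav"}                    # accepted as source per this article
NEEDS_REENCODE = {".m4a", ".aac", ".flac", ".ogg"}
VIDEO = {".mp4", ".mov", ".mkv", ".webm"}        # illustrative list, not exhaustive


def pre_upload_action(filename: str, sample_rate_hz: Optional[int] = None) -> str:
    """Suggest what to do with a file before uploading it as source audio."""
    ext = Path(filename).suffix.lower()
    if ext in VIDEO:
        return "extract audio"
    if ext in NEEDS_REENCODE:
        return "convert to MP3 or WAV"
    if ext in UPLOADABLE:
        # Assumption: anything above 48 kHz is handled like the 96 kHz case.
        if sample_rate_hz is not None and sample_rate_hz > 48_000:
            return "downsample to 48 kHz"
        return "upload as-is"
    return "unknown format: convert to MP3 or WAV"
```

The check goes by file extension only, so a mislabeled file can still fail at the decode step.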

A 30-second checklist

Before clicking generate:

The source file is MP3 or WAV.
The duration is between 6 seconds and 6 minutes for MiniMax (or under 5 minutes on ACE-Step if you want a sane credit hold).
The output format selector is set to what you actually need; the default is MP3 unless you change it.
On ACE-Step with a source uploaded, the duration slider is hidden, which is correct.
On MiniMax with an empty lyrics field, the model uses the source vocal's words, which is what most cover users want.

AI Audio to Audio

Give existing audio a new style: generate covers, remixes, and musical reworkings. Powered by the MiniMax Music Cover and ACE-Step v1.5 models.

What rejection messages mean

Three error patterns and what they actually point at:

"Source audio is required." You are on MiniMax with no upload, or the upload did not complete. Re-upload and try again.

"Duration outside the supported range." Your source is shorter than 6 seconds or longer than 6 minutes. Trim or extend before uploading.

"Failed to decode source audio." The file is corrupted, in an unsupported codec, or has metadata the parser cannot read. Re-export from your source app to MP3 or WAV. The decoder is reasonably permissive about MP3 variants but strict about codec headers; a clean re-encode usually fixes it.

If you see a different error, the most reliable next step is to trim the file (take 10 seconds off either end) and try again. Most decode failures happen at the file boundaries.
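If you are scripting around the panel, the three error patterns can be sketched as a lookup from message text to next step. The message strings are the ones quoted in this section; the case-insensitive substring match and the helper name are my assumptions, since the exact wording the tool returns may differ.

```python
# (needle, hint) pairs; needles are lowercase fragments of the quoted messages.
HINTS = [
    ("source audio is required",
     "Upload a source clip (MiniMax needs one) or re-upload if it did not complete."),
    ("duration outside the supported range",
     "Trim or extend the source so it is between 6 seconds and 6 minutes."),
    ("failed to decode source audio",
     "Re-export the file from your source app to MP3 or WAV."),
]

FALLBACK = "Trim about 10 seconds off either end of the file and try again."


def hint_for(message: str) -> str:
    """Map a rejection message to a suggested next step."""
    text = message.lower()
    for needle, hint in HINTS:
        if needle in text:
            return hint
    return FALLBACK
```

Any message that matches none of the three patterns falls through to the trim-and-retry advice.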
