vllm.model_executor.models.qwen2_audio
Inference-only Qwen2-Audio model compatible with HuggingFace weights.
Qwen2AudioEmbeddingInputs
Bases: TensorSchema
Dimensions
- bn: Batch size
- naf: Number of audio features
- hs: Hidden size (must match the hidden size of the language model backbone)
Source code in vllm/model_executor/models/qwen2_audio.py
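The dimension names listed for Qwen2AudioEmbeddingInputs can be read as a shape contract. The sketch below is a minimal, standard-library-only illustration of that contract and is not vLLM's TensorSchema implementation: the function name `check_audio_embeds_shape`, the assumed dimension order `(bn, naf, hs)`, and the example sizes are all hypothetical.

```python
def check_audio_embeds_shape(shape, lm_hidden_size):
    """Validate a shape tuple against the documented dimensions.

    Assumption: the dimensions are ordered (bn, naf, hs). The real schema
    may order or group them differently.
    """
    if len(shape) != 3:
        raise ValueError(f"expected 3 dims (bn, naf, hs), got {len(shape)}")
    bn, naf, hs = shape
    # hs must match the hidden size of the language model backbone.
    if hs != lm_hidden_size:
        raise ValueError(
            f"hs={hs} does not match language model hidden size {lm_hidden_size}"
        )
    return {"bn": bn, "naf": naf, "hs": hs}


# Illustrative sizes only: a batch of 2 items, 750 audio features each,
# and a hypothetical backbone hidden size of 4096.
dims = check_audio_embeds_shape((2, 750, 4096), lm_hidden_size=4096)
```

A shape whose last dimension differs from the backbone hidden size raises `ValueError` in this sketch, which mirrors the constraint stated for `hs`.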
Qwen2AudioFeatureInputs
Bases: TensorSchema
Dimensions
- na: Number of audios
- nmb: Number of mel bins
Source code in vllm/model_executor/models/qwen2_audio.py
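Named dimensions such as na and nmb are useful because the same name must resolve to the same size wherever it appears. The sketch below shows that idea in plain Python under stated assumptions: `bind_dims` is a hypothetical helper (not part of vLLM), the extra `frames` dimension and the mask shape are invented for illustration, and the sizes are example values only.

```python
def bind_dims(names, shape, bound):
    """Bind symbolic dimension names to sizes, checking consistency.

    `bound` accumulates name -> size across calls, so a name reused by a
    second tensor must have the same size as in the first.
    """
    if len(names) != len(shape):
        raise ValueError(f"expected {len(names)} dims {names}, got shape {shape}")
    for name, size in zip(names, shape):
        # setdefault records the size on first use and returns the
        # previously bound size on later uses.
        if bound.setdefault(name, size) != size:
            raise ValueError(
                f"dimension '{name}' is {size}, but was bound to {bound[name]}"
            )
    return bound


bound = {}
# Hypothetical feature tensor: (na, nmb, frames).
bind_dims(("na", "nmb", "frames"), (2, 128, 3000), bound)
# Hypothetical mask over the same audios: (na, frames).
bind_dims(("na", "frames"), (2, 3000), bound)
```

After both calls, `bound` holds one size per name, and a later tensor that disagreed on `na` would be rejected.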
Qwen2AudioProcessingInfo
Bases: BaseProcessingInfo