This is a collection of Intel®'s IA32® Software Developer's Manuals (URL of the day) and AMD's AMD64 Architecture Programmer's Manual, together with the related specifications, application notes, white papers, and change logs. The collection aims to keep all available revisions. It was originally created by Michal Necasek; see OS/2 Museum.

If you have a public document that is related to the IA32® specifications and is missing from the collection, please mail it to me. The content of this URL and all sub-URLs is available for convenient bulk download by rsync x86docs, password "" (empty).

VMMLA -- AArch32

VMMLA

BFloat16 floating-point matrix multiply-accumulate. This instruction multiplies the 2x4 matrix of BF16 values in the first 128-bit source vector by the 4x2 BF16 matrix in the second 128-bit source vector. The resulting 2x2 single-precision matrix product is then added destructively to the 2x2 single-precision matrix in the 128-bit destination vector. This is equivalent to performing a 4-way dot product per destination element. The instruction does not update the FPSCR exception status.
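The arithmetic described above can be sketched in Python as follows. This is an illustrative model only, not the architectural pseudocode: the function name `vmmla_reference` is hypothetical, the element layout (first source as a row-major 2x4 matrix, second source as two groups of four elements, each group holding one column of the 4x2 matrix, destination as a row-major 2x2 matrix) is an assumption, and ordinary Python floats stand in for BF16 and single-precision values, so the instruction's rounding behaviour is not modelled.

```python
from typing import List, Sequence


def vmmla_reference(acc: Sequence[float],
                    a: Sequence[float],
                    b: Sequence[float]) -> List[float]:
    """Return acc + A * B for a 2x4 matrix A and a 4x2 matrix B.

    Assumed layout (not taken from the source text):
      acc: 4 elements, 2x2 row-major
      a:   8 elements, 2x4 row-major
      b:   8 elements, elements 4*j .. 4*j+3 form column j of the 4x2 matrix
    """
    if len(acc) != 4 or len(a) != 8 or len(b) != 8:
        raise ValueError("expected 4 accumulator and 8 + 8 source elements")

    result = list(acc)
    for i in range(2):          # row of the first source matrix
        for j in range(2):      # column of the second source matrix
            total = acc[2 * i + j]
            for k in range(4):  # 4-way dot product per destination element
                total += a[4 * i + k] * b[4 * j + k]
            result[2 * i + j] = total
    return result


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    b = [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
    print(vmmla_reference([10.0, 20.0, 30.0, 40.0], a, b))
```

Each of the four destination elements receives one 4-element dot product, which is the "4-way dot product per destination element" mentioned in the description.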


Note

Arm expects that the VMMLA instruction will deliver a peak BF16 multiply throughput that is at least as high as can be achieved using two VDOT instructions, with a goal that it should have significantly higher throughput.


It has encodings from the following instruction sets: A32 (A1) and T32 (T1).

A1
(FEAT_AA32BF16)

Bits    31-25    24-23  22  21-20  19-16  15-12  11-8      7  6  5  4  3-0
Value   1111110  00     D   00     Vn     Vd     1100      N  1  M  0  Vm
Field            op1        op2                  op3, op4     Q     U

Encoding

VMMLA{<q>}.BF16 <Qd>, <Qn>, <Qm>

Decode for this encoding

if !IsFeatureImplemented(FEAT_AA32BF16) then Undefined(); end;
if Vd[0] == '1' || Vn[0] == '1' || Vm[0] == '1' then Undefined(); end;
let d : integer = UInt(D::Vd);
let n : integer = UInt(N::Vn);
let m : integer = UInt(M::Vm);
let regs : integer = 2;
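As a rough illustration of the A1 decode step, the following Python sketch extracts the register fields from a 32-bit instruction word and applies the same checks. The names `decode_vmmla_a1` and `UndefinedInstruction` are hypothetical, the bit positions (D at bit 22, Vn at bits 19-16, Vd at bits 15-12, N at bit 7, M at bit 5, Vm at bits 3-0) are read from the encoding diagram, and the sketch assumes the fixed opcode bits have already been matched.

```python
from typing import Tuple


class UndefinedInstruction(Exception):
    """Hypothetical stand-in for the pseudocode's Undefined()."""


def decode_vmmla_a1(word: int, have_aa32bf16: bool = True) -> Tuple[int, int, int, int]:
    """Return (d, n, m, regs) for an A1-encoded VMMLA word.

    Assumes the fixed opcode bits were matched before this is called.
    """
    if not have_aa32bf16:
        raise UndefinedInstruction("FEAT_AA32BF16 is not implemented")

    # Field extraction, using bit positions read from the encoding diagram.
    D = (word >> 22) & 0x1
    Vn = (word >> 16) & 0xF
    Vd = (word >> 12) & 0xF
    N = (word >> 7) & 0x1
    M = (word >> 5) & 0x1
    Vm = word & 0xF

    # Bit 0 of each 4-bit field must be clear.
    if (Vd & 1) or (Vn & 1) or (Vm & 1):
        raise UndefinedInstruction("Vd[0], Vn[0] or Vm[0] is set")

    d = (D << 4) | Vd  # UInt(D::Vd)
    n = (N << 4) | Vn  # UInt(N::Vn)
    m = (M << 4) | Vm  # UInt(M::Vm)
    regs = 2
    return d, n, m, regs


if __name__ == "__main__":
    print(decode_vmmla_a1(0xFC420CE4))
```

The returned `d`, `n` and `m` are always even; halving them gives the Q register numbers, which matches the `<Qd>*2` style of encoding described under Assembler Symbols.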

T1
(FEAT_AA32BF16)

First halfword
Bits    15-9     8-7  6  5-4  3-0
Value   1111110  00   D  00   Vn
Field            op1     op2

Second halfword
Bits    15-12  11-8      7  6  5  4  3-0
Value   Vd     1100      N  1  M  0  Vm
Field          op3, op4     Q     U

Encoding

VMMLA{<q>}.BF16 <Qd>, <Qn>, <Qm>

Decode for this encoding

if InITBlock() then UnpredictableProcedure(); end;
if !IsFeatureImplemented(FEAT_AA32BF16) then Undefined(); end;
if Vd[0] == '1' || Vn[0] == '1' || Vm[0] == '1' then Undefined(); end;
let d : integer = UInt(D::Vd);
let n : integer = UInt(N::Vn);
let m : integer = UInt(M::Vm);
let regs : integer = 2;

Assembler Symbols

<q>

See Standard assembler syntax fields.

<Qd>

Is the 128-bit name of the SIMD&FP destination register, encoded in the "D:Vd" field as <Qd>*2.

<Qn>

Is the 128-bit name of the first SIMD&FP source register, encoded in the "N:Vn" field as <Qn>*2.

<Qm>

Is the 128-bit name of the second SIMD&FP source register, encoded in the "M:Vm" field as <Qm>*2.
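To make the `<Qd>*2` style of register encoding concrete, here is a minimal Python sketch that assembles an A1 instruction word from three Q register numbers. The function name `encode_vmmla_a1` is hypothetical, and the base value `0xFC000C40` and field positions are derived from the A1 encoding diagram rather than checked against any assembler, so treat the result as illustrative.

```python
def encode_vmmla_a1(qd: int, qn: int, qm: int) -> int:
    """Build an A1-encoded VMMLA.BF16 Qd, Qn, Qm word (illustrative sketch)."""
    for q in (qd, qn, qm):
        if not 0 <= q <= 15:
            raise ValueError("Q register number must be in the range 0..15")

    # Each 5-bit field holds the Q register number multiplied by 2.
    d, n, m = 2 * qd, 2 * qn, 2 * qm

    # Fixed bits from the diagram: 1111110 at 31-25, 1100 at 11-8, Q=1 at bit 6.
    word = 0xFC000C40
    word |= (d >> 4) << 22    # D: top bit of D:Vd
    word |= (d & 0xF) << 12   # Vd
    word |= (n >> 4) << 7     # N: top bit of N:Vn
    word |= (n & 0xF) << 16   # Vn
    word |= (m >> 4) << 5     # M: top bit of M:Vm
    word |= m & 0xF           # Vm
    return word


if __name__ == "__main__":
    print(hex(encode_vmmla_a1(1, 2, 3)))  # 0xfc042c46
```

Because every field holds an even value, bit 0 of Vd, Vn and Vm is always clear, which is the condition the decode pseudocode checks before accepting the instruction.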


2025-09_rel_asl1 2026-03-12 12:57:38

Copyright © 2010-2025 Arm Limited or its affiliates. All rights reserved. This document is Non-Confidential.