<?xml version="1.0" ?><?xml-stylesheet type="text/xsl" encoding="UTF-8" href="iform.xsl" version="1.0"?><!DOCTYPE instructionsection  PUBLIC '-//ARM//DTD instructionsection //EN'  'iform-p.dtd'><!-- Copyright (c) 2010-2026 Arm Limited or its affiliates. All rights reserved. --><!-- This document is Non-Confidential. This document may only be used and distributed in accordance with the terms of the agreement entered into by Arm and the party that Arm delivered this document to. --><instructionsection id="BFMMLA_advsimd" title="BFMMLA (widening) -- A64" type="instruction">
  <docvars>
    <docvar key="advsimd-type" value="simd"/>
    <docvar key="instr-class" value="advsimd"/>
    <docvar key="isa" value="A64"/>
    <docvar key="mnemonic" value="BFMMLA"/>
  </docvars>
  <heading>BFMMLA (widening)</heading>
  <desc>
    <brief>
      <para>BFloat16 matrix multiply-accumulate to single-precision</para>
    </brief>
    <authored>
      <para>This instruction multiplies the 2x4 matrix of BFloat16 values held in the first source
vector by the 4x2 matrix of BFloat16 values in the second source vector.</para>
      <para>If FEAT_EBF16 is not implemented or <register_link id="AArch64-fpcr.xml" state="AArch64">FPCR</register_link>.EBF is 0,
this instruction:</para>
      <list type="unordered">
        <listitem>
          <content>Calculates the intermediate single-precision results as a sum
  of two unfused sum-of-products using two pairs of adjacent BFloat16
  values from each of the source vectors, and then rounds the results before
  accumulation into the 2x2 matrix of single-precision results in the destination vector.
  This is equivalent to accumulating a 4-way dot product
  per destination element.</content>
        </listitem>
        <listitem>
          <content>Uses the non-IEEE 754 Round-to-Odd rounding mode, which forces bit 0
  of an inexact result to 1, and rounds an overflow to an
  appropriately signed Infinity.</content>
        </listitem>
        <listitem>
          <content>Flushes denormalized inputs and results to zero, as if
  <register_link id="AArch64-fpcr.xml" state="AArch64">FPCR</register_link>.{FZ, FIZ} is {1, 1}.</content>
        </listitem>
        <listitem>
          <content>Honors <register_link id="AArch64-fpcr.xml" state="AArch64">FPCR</register_link>.AH when generating a default NaN result,
  otherwise disables alternative floating point behaviors, as if
  <register_link id="AArch64-fpcr.xml" state="AArch64">FPCR</register_link>.AH is 0.</content>
        </listitem>
      </list>
      <para>If FEAT_EBF16 is implemented and <register_link id="AArch64-fpcr.xml" state="AArch64">FPCR</register_link>.EBF is 1,
this instruction:</para>
      <list type="unordered">
        <listitem>
          <content>Calculates the intermediate single-precision
  results as a sum of two fused sum-of-products
  using two pairs of adjacent BFloat16 values from each of the
  source vectors, and then rounds the results before accumulation into
  the 2x2 matrix of single-precision results in the destination vector.
  This is equivalent to accumulating a 4-way dot product
  per destination element.</content>
        </listitem>
        <listitem>
          <content>Follows all other floating-point behaviors that apply to
  single-precision arithmetic, as governed by <register_link id="AArch64-fpcr.xml" state="AArch64">FPCR</register_link>.RMode,
  <register_link id="AArch64-fpcr.xml" state="AArch64">FPCR</register_link>.FZ, <register_link id="AArch64-fpcr.xml" state="AArch64">FPCR</register_link>.AH, and
  <register_link id="AArch64-fpcr.xml" state="AArch64">FPCR</register_link>.FIZ.</content>
        </listitem>
      </list>
      <para>Irrespective of FEAT_EBF16 and <register_link id="AArch64-fpcr.xml" state="AArch64">FPCR</register_link>.EBF, this instruction:</para>
      <list type="unordered">
        <listitem>
          <content>Does not modify the cumulative <register_link id="AArch64-fpsr.xml" state="AArch64">FPSR</register_link>
  exception bits (IDC, IXC, UFC, OFC, DZC, and IOC).</content>
        </listitem>
        <listitem>
          <content>Disables trapped floating-point exceptions, as if the
  <register_link id="AArch64-fpcr.xml" state="AArch64">FPCR</register_link> trap enable bits
  (IDE, IXE, UFE, OFE, DZE, and IOE) are all zero.</content>
        </listitem>
        <listitem>
          <content>Generates only the default NaN, as if <register_link id="AArch64-fpcr.xml" state="AArch64">FPCR</register_link>.DN
  is 1.</content>
        </listitem>
      </list>
      <para><register_link id="AArch64-id_aa64isar1_el1.xml" state="AArch64">ID_AA64ISAR1_EL1</register_link>.BF16 indicates whether this instruction is supported.</para>
    </authored>
  </desc>
  <operationalnotes>
    <operationalnote>
      <operationalnote_content>
        <para>Arm expects that the BFMMLA instruction will deliver a peak BFloat16 multiply throughput that is at least as high as can be achieved using two BFDOT (vector) instructions, with a goal that it should have significantly higher throughput.</para>
      </operationalnote_content>
    </operationalnote>
  </operationalnotes>
  <alias_list howmany="0"/>
  <classes>
    <iclass name="Advanced SIMD" oneof="1" id="iclass_advanced_simd" no_encodings="1" isa="A64">
      <docvars>
        <docvar key="advsimd-type" value="simd"/>
        <docvar key="instr-class" value="advsimd"/>
        <docvar key="isa" value="A64"/>
        <docvar key="mnemonic" value="BFMMLA"/>
      </docvars>
      <iclassintro count="1"/>
      <arch_variants>
        <arch_variant feature="FEAT_BF16" name="v8Ap6"/>
      </arch_variants>
      <regdiagram form="32" psname="A64.simd_dp.asimdsame2.BFMMLA_asimdsame2_E" tworows="1">
        <box hibit="31" width="1" settings="1">
          <c>0</c>
        </box>
        <box hibit="30" name="Q" usename="1" settings="1" psbits="x">
          <c>1</c>
        </box>
        <box hibit="29" name="U" usename="1" settings="1" psbits="x">
          <c>1</c>
        </box>
        <box hibit="28" width="1" settings="1">
          <c>0</c>
        </box>
        <box hibit="27" width="3" settings="3">
          <c>1</c>
          <c>1</c>
          <c>1</c>
        </box>
        <box hibit="24" width="1" settings="1">
          <c>0</c>
        </box>
        <box hibit="23" width="2" name="size" usename="1" settings="2" psbits="xx">
          <c>0</c>
          <c>1</c>
        </box>
        <box hibit="21" width="1" settings="1">
          <c>0</c>
        </box>
        <box hibit="20" width="5" name="Rm" usename="1">
          <c colspan="5"/>
        </box>
        <box hibit="15" width="1" settings="1">
          <c>1</c>
        </box>
        <box hibit="14" width="4" name="opcode" usename="1" settings="4" psbits="xxxx">
          <c>1</c>
          <c>1</c>
          <c>0</c>
          <c>1</c>
        </box>
        <box hibit="10" width="1" settings="1">
          <c>1</c>
        </box>
        <box hibit="9" width="5" name="Rn" usename="1">
          <c colspan="5"/>
        </box>
        <box hibit="4" width="5" name="Rd" usename="1">
          <c colspan="5"/>
        </box>
      </regdiagram>
      <encoding name="BFMMLA_asimdsame2_E" oneofinclass="1" oneof="1" label="">
        <docvars>
          <docvar key="instr-class" value="advsimd"/>
          <docvar key="isa" value="A64"/>
          <docvar key="advsimd-type" value="simd"/>
          <docvar key="mnemonic" value="BFMMLA"/>
        </docvars>
        <asmtemplate><text>BFMMLA  </text><a hover="Is the name of the SIMD&amp;FP third source and destination register, encoded in the &quot;Rd&quot; field." link="Vd__3">&lt;Vd&gt;</a><text>.4S, </text><a hover="Is the name of the first SIMD&amp;FP source register, encoded in the &quot;Rn&quot; field." link="Vn__2">&lt;Vn&gt;</a><text>.8H, </text><a hover="Is the name of the second SIMD&amp;FP source register, encoded in the &quot;Rm&quot; field." link="Vm">&lt;Vm&gt;</a><text>.8H</text></asmtemplate>
      </encoding>
      <ps_section howmany="1">
        <ps name="A64.simd_dp.asimdsame2.BFMMLA_asimdsame2_E" sections="1" secttype="noheading">
          <pstext mayhavelinks="1" section="Decode" rep_section="decode">if !IsFeatureImplemented(FEAT_BF16) then <a link="func_EndOfDecode_1" file="shared_pseudocode.xml">EndOfDecode</a>(<a link="enum_Decode_UNDEF" file="shared_pseudocode.xml">Decode_UNDEF</a>); end;
let d : integer{} = UInt(Rd);
let n : integer{} = UInt(Rn);
let m : integer{} = UInt(Rm);</pstext></ps>
      </ps_section>
    </iclass>
  </classes>
  <explanations scope="all">
    <explanation enclist="BFMMLA_asimdsame2_E" symboldefcount="1">
      <symbol link="Vd__3">&lt;Vd&gt;</symbol>
      <account encodedin="Rd">
        <intro>
          <para>Is the name of the SIMD&amp;FP third source and destination register, encoded in the &quot;Rd&quot; field.</para>
        </intro>
      </account>
    </explanation>
    <explanation enclist="BFMMLA_asimdsame2_E" symboldefcount="1">
      <symbol link="Vn__2">&lt;Vn&gt;</symbol>
      <account encodedin="Rn">
        <intro>
          <para>Is the name of the first SIMD&amp;FP source register, encoded in the &quot;Rn&quot; field.</para>
        </intro>
      </account>
    </explanation>
    <explanation enclist="BFMMLA_asimdsame2_E" symboldefcount="1">
      <symbol link="Vm">&lt;Vm&gt;</symbol>
      <account encodedin="Rm">
        <intro>
          <para>Is the name of the second SIMD&amp;FP source register, encoded in the &quot;Rm&quot; field.</para>
        </intro>
      </account>
    </explanation>
  </explanations>
  <ps_section howmany="1">
    <ps name="A64.simd_dp.asimdsame2.BFMMLA_asimdsame2_E" sections="1" secttype="Operation">
      <pstext mayhavelinks="1" section="Execute" rep_section="execute"><a link="func_AArch64_CheckFPAdvSIMDEnabled_0" file="shared_pseudocode.xml">AArch64_CheckFPAdvSIMDEnabled</a>();
let op1 : bits(128) = V{}(n);
let op2 : bits(128) = V{}(m);
let acc : bits(128) = V{}(d);

<a link="accessor_V_2" file="shared_pseudocode.xml">V</a>{128}(d) = <a link="func_BFMatMulAddH_4" file="shared_pseudocode.xml">BFMatMulAddH</a>(acc, op1, op2, FPCR());</pstext></ps>
  </ps_section>
  <timestamp>2026-03-26 20:27:25</timestamp>
  <commit_id>2026-03_rel</commit_id>
</instructionsection>