

# Intel<sup>®</sup> 64 and IA-32 Architectures Software Developer's Manual

**Documentation Changes** 

March 2012

**Notice:** The Intel<sup>®</sup> 64 and IA-32 architectures may contain design defects or errors known as errata that may cause the product to deviate from published specifications. Current characterized errata are documented in the specification updates.

Document Number: 252046-035



INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

Intel, the Intel logo, Pentium, Xeon, Intel NetBurst, Intel Core, Intel Core Solo, Intel Core Duo, Intel Core 2 Duo, Intel Core 2 Extreme, Intel Pentium D, Itanium, Intel SpeedStep, MMX, Intel Atom, and VTune are trademarks of Intel Corporation in the U.S. and/or other countries.

\*Other names and brands may be claimed as the property of others.

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

Copyright © 1997-2012 Intel Corporation. All rights reserved.



# Contents

- Revision History
- Preface
- Summary Tables of Changes
- Documentation Changes

# Revision History



| Revision | Description                                                                                                                                                                                                                  | Date           |
|----------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------|
| -001     | Initial release                                                                                                                                                                                                              | November 2002  |
| -002     | <ul> <li>Added 1-10 Documentation Changes.</li> <li>Removed old Documentation Changes items that have already been incorporated in the published Software Developer's Manual.</li> </ul> | December 2002  |
| -003     | <ul> <li>Added 9-17 Documentation Changes.</li> <li>Removed Documentation Change #6 - References to bits Gen and Len Deleted.</li> <li>Removed Documentation Change #4 - VIF Information Added to CLI Discussion.</li> </ul> | February 2003  |
| -004     | <ul><li>Removed Documentation changes 1-17.</li><li>Added Documentation changes 1-24.</li></ul>                                                                                                                              | June 2003      |
| -005     | <ul><li>Removed Documentation Changes 1-24.</li><li>Added Documentation Changes 1-15.</li></ul>                                                                                                                              | September 2003 |
| -006     | Added Documentation Changes 16-34. | November 2003  |
| -007     | <ul><li>Updated Documentation changes 14, 16, 17, and 28.</li><li>Added Documentation Changes 35-45.</li></ul>                                                                                                               | January 2004   |
| -008     | <ul><li>Removed Documentation Changes 1-45.</li><li>Added Documentation Changes 1-5.</li></ul>                                                                                                                               | March 2004     |
| -009     | Added Documentation Changes 7-27.                                                                                                                                                                                            | May 2004       |
| -010     | <ul><li>Removed Documentation Changes 1-27.</li><li>Added Documentation Changes 1.</li></ul>                                                                                                                                 | August 2004    |
| -011     | Added Documentation Changes 2-28.                                                                                                                                                                                            | November 2004  |
| -012     | <ul><li>Removed Documentation Changes 1-28.</li><li>Added Documentation Changes 1-16.</li></ul>                                                                                                                              | March 2005     |
| -013     | <ul> <li>Updated title.</li> <li>There are no Documentation Changes for this revision of the document.</li> </ul>                                                                                                            | July 2005      |
| -014     | Added Documentation Changes 1-21.                                                                                                                                                                                            | September 2005 |
| -015     | <ul><li>Removed Documentation Changes 1-21.</li><li>Added Documentation Changes 1-20.</li></ul>                                                                                                                              | March 9, 2006  |
| -016     | Added Documentation changes 21-23.                                                                                                                                                                                           | March 27, 2006 |
| -017     | <ul><li>Removed Documentation Changes 1-23.</li><li>Added Documentation Changes 1-36.</li></ul>                                                                                                                              | September 2006 |
| -018     | Added Documentation Changes 37-42.                                                                                                                                                                                           | October 2006   |
| -019     | <ul><li>Removed Documentation Changes 1-42.</li><li>Added Documentation Changes 1-19.</li></ul>                                                                                                                              | March 2007     |
| -020     | Added Documentation Changes 20-27.                                                                                                                                                                                           | May 2007       |
| -021     | <ul><li>Removed Documentation Changes 1-27.</li><li>Added Documentation Changes 1-6</li></ul>                                                                                                                                | November 2007  |
| -022     | <ul><li>Removed Documentation Changes 1-6</li><li>Added Documentation Changes 1-6</li></ul>                                                                                                                                  | August 2008    |
| -023     | <ul><li>Removed Documentation Changes 1-6</li><li>Added Documentation Changes 1-21</li></ul> | March 2009     |
| -024     | <ul><li>Removed Documentation Changes 1-21</li><li>Added Documentation Changes 1-16</li></ul>    | June 2009      |
| -025     | <ul><li>Removed Documentation Changes 1-16</li><li>Added Documentation Changes 1-18</li></ul>    | September 2009 |
| -026     | <ul><li>Removed Documentation Changes 1-18</li><li>Added Documentation Changes 1-15</li></ul>    | December 2009  |
| -027     | <ul><li>Removed Documentation Changes 1-15</li><li>Added Documentation Changes 1-24</li></ul>    | March 2010     |
| -028     | <ul><li>Removed Documentation Changes 1-24</li><li>Added Documentation Changes 1-29</li></ul>    | June 2010      |
| -029     | <ul><li>Removed Documentation Changes 1-29</li><li>Added Documentation Changes 1-29</li></ul>    | September 2010 |
| -030     | <ul><li>Removed Documentation Changes 1-29</li><li>Added Documentation Changes 1-29</li></ul>    | January 2011   |
| -031     | <ul><li>Removed Documentation Changes 1-29</li><li>Added Documentation Changes 1-29</li></ul>    | April 2011     |
| -032     | <ul><li>Removed Documentation Changes 1-29</li><li>Added Documentation Changes 1-14</li></ul>    | May 2011       |
| -033     | <ul> <li>Removed Documentation Changes 1-14</li> <li>Added Documentation Changes 1-38</li> </ul> | October 2011   |
| -034     | <ul><li>Removed Documentation Changes 1-38</li><li>Added Documentation Changes 1-16</li></ul>    | December 2011  |
| -035     | <ul><li>Removed Documentation Changes 1-16</li><li>Added Documentation Changes 1-18</li></ul>    | March 2012     |




# Preface

This document is an update to the specifications contained in the Affected Documents table below. This document is a compilation of device and documentation errata, specification clarifications and changes. It is intended for hardware system manufacturers and software developers of applications, operating systems, or tools.

# **Affected Documents**

| Document Title                                                                                                               | Document<br>Number/Location |
|------------------------------------------------------------------------------------------------------------------------------|-----------------------------|
| <i>Intel<sup>®</sup> 64 and IA-32 Architectures Software Developer's Manual, Volume 1: Basic Architecture</i>                | 253665                      |
| <i>Intel<sup>®</sup> 64 and IA-32 Architectures Software Developer's Manual, Volume 2A: Instruction Set Reference, A-M</i>   | 253666                      |
| <i>Intel<sup>®</sup> 64 and IA-32 Architectures Software Developer's Manual, Volume 2B: Instruction Set Reference, N-Z</i>   | 253667                      |
| <i>Intel<sup>®</sup> 64 and IA-32 Architectures Software Developer's Manual, Volume 2C: Instruction Set Reference</i>        | 326018                      |
| <i>Intel<sup>®</sup> 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide, Part 1</i> | 253668                      |
| <i>Intel<sup>®</sup> 64 and IA-32 Architectures Software Developer's Manual, Volume 3B: System Programming Guide, Part 2</i> | 253669                      |
| <i>Intel<sup>®</sup> 64 and IA-32 Architectures Software Developer's Manual, Volume 3C: System Programming Guide, Part 3</i> | 326019                      |

# Nomenclature

**Documentation Changes** include typos, errors, or omissions from the current published specifications. These will be incorporated in any new release of the specification.



# Summary Tables of Changes

The following table indicates documentation changes which apply to the Intel<sup>®</sup> 64 and IA-32 architectures. This table uses the following notations:

# **Codes Used in Summary Tables**

Change bar to left of table row indicates this erratum is either new or modified from the previous version of the document.

# **Documentation Changes**

|     | No. | DOCUMENTATION CHANGES            |
|-----|-----|----------------------------------|
| \|  | 1   | Updates to Chapter 1, Volume 1   |
| \|  | 2   | Updates to Chapter 1, Volume 2A  |
|     | 3   | Updates to Chapter 3, Volume 2A  |
|     | 4   | Updates to Chapter 4, Volume 2B  |
|     | 5   | Updates to Appendix A, Volume 2C |
|     | 6   | Updates to Appendix B, Volume 2C |
|     | 7   | Updates to Chapter 1, Volume 3A  |
|     | 8   | Updates to Chapter 4, Volume 3A  |
|     | 9   | Updates to Chapter 10, Volume 3A |
|     | 10  | Updates to Chapter 14, Volume 3B |
|     | 11  | Updates to Chapter 17, Volume 3B |
| \|  | 12  | Updates to Chapter 18, Volume 3B |
| \|  | 13  | Updates to Chapter 19, Volume 3B |
| \|  | 14  | Updates to Chapter 25, Volume 3C |
| \|  | 15  | Updates to Chapter 26, Volume 3C |
| \|  | 16  | Updates to Chapter 27, Volume 3C |
| \|  | 17  | Update to Volume 3C              |
| \|  | 18  | Updates to Chapter 33, Volume 3C |



# **Documentation Changes**

#### 1. Updates to Chapter 1, Volume 1

Change bars show changes to Chapter 1 of the *Intel<sup>®</sup> 64 and IA-32 Architectures Software Developer's Manual, Volume 1: Basic Architecture*.

...

# 1.1 INTEL® 64 AND IA-32 PROCESSORS COVERED IN THIS MANUAL

This manual set includes information pertaining primarily to the most recent Intel 64 and IA-32 processors, which include:

- Pentium<sup>®</sup> processors
- P6 family processors
- Pentium<sup>®</sup> 4 processors
- Pentium<sup>®</sup> M processors
- Intel<sup>®</sup> Xeon<sup>®</sup> processors
- Pentium<sup>®</sup> D processors
- Pentium<sup>®</sup> processor Extreme Editions
- 64-bit Intel<sup>®</sup> Xeon<sup>®</sup> processors
- Intel<sup>®</sup> Core<sup>™</sup> Duo processor
- Intel<sup>®</sup> Core<sup>™</sup> Solo processor
- Dual-Core Intel<sup>®</sup> Xeon<sup>®</sup> processor LV
- Intel<sup>®</sup> Core<sup>™</sup>2 Duo processor
- Intel<sup>®</sup> Core<sup>™</sup>2 Quad processor Q6000 series
- Intel<sup>®</sup> Xeon<sup>®</sup> processor 3000, 3200 series
- Intel<sup>®</sup> Xeon<sup>®</sup> processor 5000 series
- Intel<sup>®</sup> Xeon<sup>®</sup> processor 5100, 5300 series
- Intel<sup>®</sup> Core<sup>™</sup>2 Extreme processor X7000 and X6800 series
- Intel<sup>®</sup> Core<sup>™</sup>2 Extreme processor QX6000 series
- Intel<sup>®</sup> Xeon<sup>®</sup> processor 7100 series
- Intel<sup>®</sup> Pentium<sup>®</sup> Dual-Core processor
- Intel<sup>®</sup> Xeon<sup>®</sup> processor 7200, 7300 series
- Intel<sup>®</sup> Xeon<sup>®</sup> processor 5200, 5400, 7400 series
- Intel<sup>®</sup> Core<sup>™</sup>2 Extreme processor QX9000 and X9000 series
- Intel<sup>®</sup> Core<sup>™</sup>2 Quad processor Q9000 series
- Intel<sup>®</sup> Core<sup>™</sup>2 Duo processor E8000, T9000 series
- Intel<sup>®</sup> Atom<sup>™</sup> processor family
- Intel<sup>®</sup> Core<sup>™</sup> i7 processor
- Intel<sup>®</sup> Core<sup>™</sup> i5 processor
- Intel<sup>®</sup> Xeon<sup>®</sup> processor E7-8800/4800/2800 product families
- Intel<sup>®</sup> Xeon<sup>®</sup> processor E5 family
- Intel<sup>®</sup> Xeon<sup>®</sup> processor E3 family
- Intel<sup>®</sup> Core<sup>™</sup> i7-3930K processor
- 2nd generation Intel<sup>®</sup> Core<sup>™</sup> i7-2xxx, Intel<sup>®</sup> Core<sup>™</sup> i5-2xxx, Intel<sup>®</sup> Core<sup>™</sup> i3-2xxx processor series

P6 family processors are IA-32 processors based on the P6 family microarchitecture. This includes the Pentium<sup>®</sup> Pro, Pentium<sup>®</sup> II, Pentium<sup>®</sup> III, and Pentium<sup>®</sup> III Xeon<sup>®</sup> processors.

The Pentium<sup>®</sup> 4, Pentium<sup>®</sup> D, and Pentium<sup>®</sup> processor Extreme Editions are based on the Intel NetBurst<sup>®</sup> microarchitecture. Most early Intel<sup>®</sup> Xeon<sup>®</sup> processors are based on the Intel NetBurst<sup>®</sup> microarchitecture. Intel Xeon processor 5000, 7100 series are based on the Intel NetBurst<sup>®</sup> microarchitecture.

The Intel<sup>®</sup> Core<sup>™</sup> Duo, Intel<sup>®</sup> Core<sup>™</sup> Solo and dual-core Intel<sup>®</sup> Xeon<sup>®</sup> processor LV are based on an improved Pentium<sup>®</sup> M processor microarchitecture.

The Intel<sup>®</sup> Xeon<sup>®</sup> processor 3000, 3200, 5100, 5300, 7200, and 7300 series, Intel<sup>®</sup> Pentium<sup>®</sup> dual-core, Intel<sup>®</sup> Core<sup>™</sup>2 Duo, Intel<sup>®</sup> Core<sup>™</sup>2 Quad, and Intel<sup>®</sup> Core<sup>™</sup>2 Extreme processors are based on Intel<sup>®</sup> Core<sup>™</sup> microarchitecture.

The Intel<sup>®</sup> Xeon<sup>®</sup> processor 5200, 5400, 7400 series, Intel<sup>®</sup> Core<sup>™</sup>2 Quad processor Q9000 series, Intel<sup>®</sup> Core<sup>™</sup>2 Extreme processor QX9000, X9000 series, and Intel<sup>®</sup> Core<sup>™</sup>2 processor E8000 series are based on Enhanced Intel<sup>®</sup> Core<sup>™</sup> microarchitecture.

The Intel<sup>®</sup> Atom<sup>™</sup> processor family is based on the Intel<sup>®</sup> Atom<sup>™</sup> microarchitecture and supports Intel 64 architecture.

The Intel<sup>®</sup> Core<sup>™</sup> i7 processor and the Intel<sup>®</sup> Core<sup>™</sup> i5 processor are based on the Intel<sup>®</sup> microarchitecture code name Nehalem and support Intel 64 architecture.

Processors based on Intel<sup>®</sup> microarchitecture code name Westmere support Intel 64 architecture.

P6 family, Pentium<sup>®</sup> M, Intel<sup>®</sup> Core<sup>™</sup> Solo, Intel<sup>®</sup> Core<sup>™</sup> Duo processors, dual-core Intel<sup>®</sup> Xeon<sup>®</sup> processor LV, and early generations of Pentium 4 and Intel Xeon processors support IA-32 architecture. The Intel<sup>®</sup> Atom<sup>™</sup> processor Z5xx series supports IA-32 architecture.

The Intel<sup>®</sup> Xeon<sup>®</sup> processor E5 family, Intel<sup>®</sup> Xeon<sup>®</sup> processor E3 family, Intel<sup>®</sup> Core<sup>™</sup> i7-3930K processor, 2nd generation Intel<sup>®</sup> Core<sup>™</sup> i7-2xxx, Intel<sup>®</sup> Core<sup>™</sup> i5-2xxx, Intel<sup>®</sup> Core<sup>™</sup> i3-2xxx processor series, Intel<sup>®</sup> Xeon<sup>®</sup> processor E7-8800/4800/2800 product families, Intel<sup>®</sup> Xeon<sup>®</sup> processor 3000, 3200, 5000, 5100, 5200, 5300, 5400, 7100, 7200, 7300, 7400 series, Intel<sup>®</sup> Core<sup>™</sup>2 Duo, Intel<sup>®</sup> Core<sup>™</sup>2 Extreme, and Intel<sup>®</sup> Core<sup>™</sup>2 Quad processors, Pentium<sup>®</sup> D processors, Pentium<sup>®</sup> Dual-Core processor, and newer generations of Pentium 4 and Intel Xeon processor family support Intel<sup>®</sup> 64 architecture.

IA-32 architecture is the instruction set architecture and programming environment for Intel's 32-bit microprocessors.

Intel<sup>®</sup> 64 architecture is the instruction set architecture and programming environment which is the superset of Intel's 32-bit and 64-bit architectures. It is compatible with the IA-32 architecture.



#### 2. Updates to Chapter 1, Volume 2A

Change bars show changes to Chapter 1 of the *Intel<sup>®</sup> 64 and IA-32 Architectures Software Developer's Manual, Volume 2A: Instruction Set Reference, A-M*.

...

# 1.1 INTEL® 64 AND IA-32 PROCESSORS COVERED IN THIS MANUAL

This manual set includes information pertaining primarily to the most recent Intel 64 and IA-32 processors, which include:

- Pentium<sup>®</sup> processors
- P6 family processors
- Pentium<sup>®</sup> 4 processors
- Pentium<sup>®</sup> M processors
- Intel<sup>®</sup> Xeon<sup>®</sup> processors
- Pentium<sup>®</sup> D processors
- Pentium<sup>®</sup> processor Extreme Editions
- 64-bit Intel<sup>®</sup> Xeon<sup>®</sup> processors
- Intel<sup>®</sup> Core<sup>™</sup> Duo processor
- Intel<sup>®</sup> Core<sup>™</sup> Solo processor
- Dual-Core Intel<sup>®</sup> Xeon<sup>®</sup> processor LV
- Intel<sup>®</sup> Core<sup>™</sup>2 Duo processor
- Intel<sup>®</sup> Core<sup>™</sup>2 Quad processor Q6000 series
- Intel<sup>®</sup> Xeon<sup>®</sup> processor 3000, 3200 series
- Intel<sup>®</sup> Xeon<sup>®</sup> processor 5000 series
- Intel<sup>®</sup> Xeon<sup>®</sup> processor 5100, 5300 series
- Intel<sup>®</sup> Core<sup>™</sup>2 Extreme processor X7000 and X6800 series
- Intel<sup>®</sup> Core<sup>™</sup>2 Extreme processor QX6000 series
- Intel<sup>®</sup> Xeon<sup>®</sup> processor 7100 series
- Intel<sup>®</sup> Pentium<sup>®</sup> Dual-Core processor
- Intel<sup>®</sup> Xeon<sup>®</sup> processor 7200, 7300 series
- Intel<sup>®</sup> Xeon<sup>®</sup> processor 5200, 5400, 7400 series
- Intel<sup>®</sup> Core<sup>™</sup>2 Extreme processor QX9000 and X9000 series
- Intel<sup>®</sup> Core<sup>™</sup>2 Quad processor Q9000 series
- Intel<sup>®</sup> Core<sup>™</sup>2 Duo processor E8000, T9000 series
- Intel<sup>®</sup> Atom<sup>™</sup> processor family
- Intel<sup>®</sup> Core<sup>™</sup> i7 processor
- Intel<sup>®</sup> Core<sup>™</sup> i5 processor
- Intel<sup>®</sup> Xeon<sup>®</sup> processor E7-8800/4800/2800 product families
- Intel<sup>®</sup> Xeon<sup>®</sup> processor E5 family
- Intel<sup>®</sup> Xeon<sup>®</sup> processor E3 family
- Intel<sup>®</sup> Core<sup>™</sup> i7-3930K processor
- 2nd generation Intel<sup>®</sup> Core<sup>™</sup> i7-2xxx, Intel<sup>®</sup> Core<sup>™</sup> i5-2xxx, Intel<sup>®</sup> Core<sup>™</sup> i3-2xxx processor series

P6 family processors are IA-32 processors based on the P6 family microarchitecture. This includes the Pentium<sup>®</sup> Pro, Pentium<sup>®</sup> II, Pentium<sup>®</sup> III, and Pentium<sup>®</sup> III Xeon<sup>®</sup> processors.

The Pentium<sup>®</sup> 4, Pentium<sup>®</sup> D, and Pentium<sup>®</sup> processor Extreme Editions are based on the Intel NetBurst<sup>®</sup> microarchitecture. Most early Intel<sup>®</sup> Xeon<sup>®</sup> processors are based on the Intel NetBurst<sup>®</sup> microarchitecture. Intel Xeon processor 5000, 7100 series are based on the Intel NetBurst<sup>®</sup> microarchitecture.

The Intel<sup>®</sup> Core<sup>™</sup> Duo, Intel<sup>®</sup> Core<sup>™</sup> Solo and dual-core Intel<sup>®</sup> Xeon<sup>®</sup> processor LV are based on an improved Pentium<sup>®</sup> M processor microarchitecture.

The Intel<sup>®</sup> Xeon<sup>®</sup> processor 3000, 3200, 5100, 5300, 7200, and 7300 series, Intel<sup>®</sup> Pentium<sup>®</sup> dual-core, Intel<sup>®</sup> Core<sup>™</sup>2 Duo, Intel<sup>®</sup> Core<sup>™</sup>2 Quad, and Intel<sup>®</sup> Core<sup>™</sup>2 Extreme processors are based on Intel<sup>®</sup> Core<sup>™</sup> microarchitecture.

The Intel<sup>®</sup> Xeon<sup>®</sup> processor 5200, 5400, 7400 series, Intel<sup>®</sup> Core<sup>™</sup>2 Quad processor Q9000 series, Intel<sup>®</sup> Core<sup>™</sup>2 Extreme processor QX9000, X9000 series, and Intel<sup>®</sup> Core<sup>™</sup>2 processor E8000 series are based on Enhanced Intel<sup>®</sup> Core<sup>™</sup> microarchitecture.

The Intel<sup>®</sup> Atom<sup>™</sup> processor family is based on the Intel<sup>®</sup> Atom<sup>™</sup> microarchitecture and supports Intel 64 architecture.

The Intel<sup>®</sup> Core<sup>™</sup> i7 processor and the Intel<sup>®</sup> Core<sup>™</sup> i5 processor are based on the Intel<sup>®</sup> microarchitecture code name Nehalem and support Intel 64 architecture.

Processors based on Intel<sup>®</sup> microarchitecture code name Westmere support Intel 64 architecture.

P6 family, Pentium<sup>®</sup> M, Intel<sup>®</sup> Core<sup>™</sup> Solo, Intel<sup>®</sup> Core<sup>™</sup> Duo processors, dual-core Intel<sup>®</sup> Xeon<sup>®</sup> processor LV, and early generations of Pentium 4 and Intel Xeon processors support IA-32 architecture. The Intel<sup>®</sup> Atom<sup>™</sup> processor Z5xx series supports IA-32 architecture.

The Intel<sup>®</sup> Xeon<sup>®</sup> processor E5 family, Intel<sup>®</sup> Xeon<sup>®</sup> processor E3 family, Intel<sup>®</sup> Core<sup>™</sup> i7-3930K processor, 2nd generation Intel<sup>®</sup> Core<sup>™</sup> i7-2xxx, Intel<sup>®</sup> Core<sup>™</sup> i5-2xxx, Intel<sup>®</sup> Core<sup>™</sup> i3-2xxx processor series, Intel<sup>®</sup> Xeon<sup>®</sup> processor E7-8800/4800/2800 product families, Intel<sup>®</sup> Xeon<sup>®</sup> processor 3000, 3200, 5000, 5100, 5200, 5300, 5400, 7100, 7200, 7300, 7400 series, Intel<sup>®</sup> Core<sup>™</sup>2 Duo, Intel<sup>®</sup> Core<sup>™</sup>2 Extreme, and Intel<sup>®</sup> Core<sup>™</sup>2 Quad processors, Pentium<sup>®</sup> D processors, Pentium<sup>®</sup> Dual-Core processor, and newer generations of Pentium 4 and Intel Xeon processor family support Intel<sup>®</sup> 64 architecture.

IA-32 architecture is the instruction set architecture and programming environment for Intel's 32-bit microprocessors.

Intel<sup>®</sup> 64 architecture is the instruction set architecture and programming environment which is the superset of Intel's 32-bit and 64-bit architectures. It is compatible with the IA-32 architecture.



...

#### 3. Updates to Chapter 3, Volume 2A

Change bars show changes to Chapter 3 of the *Intel<sup>®</sup> 64 and IA-32 Architectures Software Developer's Manual, Volume 2A: Instruction Set Reference, A-M*.

...

#### AAS—ASCII Adjust AL After Subtraction

| Opcode | Instruction | Op/<br>En | 64-bit<br>Mode | Compat/<br>Leg Mode | Description                        |
|--------|-------------|-----------|----------------|---------------------|------------------------------------|
| 3F     | AAS         | NP        | Invalid        | Valid               | ASCII adjust AL after subtraction. |

#### Instruction Operand Encoding

| Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
|-------|-----------|-----------|-----------|-----------|
| NP    | NA        | NA        | NA        | NA        |

#### Description

Adjusts the result of the subtraction of two unpacked BCD values to create an unpacked BCD result. The AL register is the implied source and destination operand for this instruction. The AAS instruction is only useful when it follows a SUB instruction that subtracts (binary subtraction) one unpacked BCD value from another and stores a byte result in the AL register. The AAS instruction then adjusts the contents of the AL register to contain the correct 1-digit unpacked BCD result.

If the subtraction produced a decimal carry, the AH register decrements by 1, and the CF and AF flags are set. If no decimal carry occurred, the CF and AF flags are cleared, and the AH register is unchanged. In either case, the AL register is left with its top four bits set to 0.

This instruction executes as described in compatibility mode and legacy mode. It is not valid in 64-bit mode.

#### Operation

```
IF 64-bit mode
    THEN
        #UD;
    ELSE
        IF ((AL AND 0FH) > 9) or (AF = 1)
            THEN
                AX ← AX - 6;
                AH ← AH - 1;
                AF ← 1;
                CF ← 1;
                AL ← AL AND 0FH;
            ELSE
                CF ← 0;
                AF ← 0;
                AL ← AL AND 0FH;
        FI;
FI;
```
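
As an illustration only, the AAS operation above can be mirrored in a few lines of Python. This is a minimal sketch, not part of the manual: the function name `aas`, the `mode64` flag, and the use of `RuntimeError` to stand in for the #UD exception are all assumptions made for the example.

```python
from typing import Tuple


def aas(al: int, ah: int, af: int, mode64: bool = False) -> Tuple[int, int, int, int]:
    """Hypothetical model of AAS; returns (AL, AH, AF, CF).

    Follows the Operation pseudocode step by step. `mode64` and the
    RuntimeError are illustrative stand-ins for 64-bit mode and #UD.
    """
    if mode64:
        raise RuntimeError("#UD: AAS is not valid in 64-bit mode")

    al &= 0xFF
    ah &= 0xFF
    if (al & 0x0F) > 9 or af == 1:
        ax = ((ah << 8) | al)
        ax = (ax - 6) & 0xFFFF            # AX <- AX - 6
        ah = ((ax >> 8) - 1) & 0xFF       # AH <- AH - 1
        al = ax & 0x0F                    # AL <- AL AND 0FH
        return al, ah, 1, 1               # AF <- 1, CF <- 1
    return al & 0x0F, ah, 0, 0            # AF <- 0, CF <- 0
```

For example, subtracting unpacked BCD 8 from 5 with SUB leaves AL = FDH and AF = 1; with AH = 01H, `aas(0xFD, 0x01, 1)` yields AL = 07H and AH = 00H, which corresponds to 15 - 8 = 7.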

...

#### CRC32—Accumulate CRC32 Value

| Opcode                      | Instruction             | Op/<br>En | 64-Bit<br>Mode | Compat/<br>Leg Mode | Description                        |
|-----------------------------|-------------------------|-----------|----------------|---------------------|------------------------------------|
| F2 0F 38 F0 <i>/r</i>       | CRC32 <i>r32, r/m8</i>  | RM        | Valid          | Valid               | Accumulate CRC32 on <i>r/m8</i>.   |
| F2 REX 0F 38 F0 <i>/r</i>   | CRC32 <i>r32, r/m8*</i> | RM        | Valid          | N.E.                | Accumulate CRC32 on <i>r/m8</i>.   |
| F2 0F 38 F1 <i>/r</i>       | CRC32 <i>r32, r/m16</i> | RM        | Valid          | Valid               | Accumulate CRC32 on <i>r/m16</i>.  |
| F2 0F 38 F1 <i>/r</i>       | CRC32 <i>r32, r/m32</i> | RM        | Valid          | Valid               | Accumulate CRC32 on <i>r/m32</i>.  |
| F2 REX.W 0F 38 F0 <i>/r</i> | CRC32 <i>r64, r/m8</i>  | RM        | Valid          | N.E.                | Accumulate CRC32 on <i>r/m8</i>.   |
| F2 REX.W 0F 38 F1 <i>/r</i> | CRC32 <i>r64, r/m64</i> | RM        | Valid          | N.E.                | Accumulate CRC32 on <i>r/m64</i>.  |

#### NOTES:

\*In 64-bit mode, r/m8 cannot be encoded to access the following byte registers if a REX prefix is used: AH, BH, CH, DH.

#### Instruction Operand Encoding

| Op/En | Operand 1        | Operand 2     | Operand 3 | Operand 4 |
|-------|------------------|---------------|-----------|-----------|
| RM    | ModRM:reg (r, w) | ModRM:r/m (r) | NA        | NA        |

#### Description

Starting with an initial value in the first operand (destination operand), accumulates a CRC32 (polynomial 0x11EDC6F41) value for the second operand (source operand) and stores the result in the destination operand. The source operand can be a register or a memory location. The destination operand must be an r32 or r64 register. If the destination is an r64 register, then the 32-bit result is stored in the least significant double word and 00000000H is stored in the most significant double word of the r64 register.

The initial value supplied in the destination operand is a double word integer stored in the r32 register or the least significant double word of the r64 register. To incrementally accumulate a CRC32 value, software retains the result of the previous CRC32 operation in the destination operand, then executes the CRC32 instruction again with new input data in the source operand. Data contained in the source operand is processed in reflected bit order. This means that the most significant bit of the source operand is treated as the least significant bit of the quotient, and so on, for all the bits of the source operand. Likewise, the result of the CRC operation is stored in the destination operand in reflected bit order. This means that the most significant bit of the resulting CRC (bit 31) is stored in the least significant bit of the destination operand (bit 0), and so on, for all the bits of the CRC.
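
The incremental accumulation described above can be sketched in software. The following is a minimal illustration, not part of the manual: it uses the common shift-and-XOR formulation with the bit-reversed polynomial constant 82F63B78H (the reflection of 1EDC6F41H), and the function names are hypothetical. The initial value FFFFFFFFH and the final inversion in `crc32c` are the usual CRC-32C software convention and are an assumption of this sketch; the text above does not specify them.

```python
CRC32C_POLY_REFLECTED = 0x82F63B78  # bit-reversed low 32 bits of 0x11EDC6F41


def crc32_accumulate_u8(crc: int, byte: int) -> int:
    """Hypothetical model of one CRC32 r32, r/m8 step on reflected data."""
    crc = (crc ^ (byte & 0xFF)) & 0xFFFFFFFF
    for _ in range(8):
        if crc & 1:
            crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED
        else:
            crc >>= 1
    return crc


def crc32c(data: bytes, crc: int = 0) -> int:
    """Accumulate over a buffer; pass a previous result as `crc` to continue."""
    crc ^= 0xFFFFFFFF                  # assumed software convention
    for b in data:
        crc = crc32_accumulate_u8(crc, b)   # result feeds the next step
    return crc ^ 0xFFFFFFFF            # assumed software convention
```

Feeding a buffer in pieces gives the same value as feeding it in one call, for example `crc32c(b"56789", crc32c(b"1234")) == crc32c(b"123456789")`, which is the retained-destination pattern the paragraph describes.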



#### Operation

#### Notes:

- BIT\_REFLECT64: DST[63-0] = SRC[0-63]
- BIT\_REFLECT32: DST[31-0] = SRC[0-31]
- BIT\_REFLECT16: DST[15-0] = SRC[0-15]
- BIT\_REFLECT8: DST[7-0] = SRC[0-7]
- MOD2: Remainder from polynomial division modulus 2

CRC32 instruction for 64-bit source operand and 64-bit destination operand:

$$\begin{split}
\mathsf{TEMP1[63-0]} &\leftarrow \mathsf{BIT\_REFLECT64(SRC[63-0])} \\
\mathsf{TEMP2[31-0]} &\leftarrow \mathsf{BIT\_REFLECT32(DEST[31-0])} \\
\mathsf{TEMP3[95-0]} &\leftarrow \mathsf{TEMP1[63-0]} \ll 32 \\
\mathsf{TEMP4[95-0]} &\leftarrow \mathsf{TEMP2[31-0]} \ll 64 \\
\mathsf{TEMP5[95-0]} &\leftarrow \mathsf{TEMP3[95-0]}\ \mathsf{XOR}\ \mathsf{TEMP4[95-0]} \\
\mathsf{TEMP6[31-0]} &\leftarrow \mathsf{TEMP5[95-0]}\ \mathsf{MOD2}\ \mathsf{11EDC6F41H} \\
\mathsf{DEST[31-0]} &\leftarrow \mathsf{BIT\_REFLECT(TEMP6[31-0])} \\
\mathsf{DEST[63-32]} &\leftarrow \mathsf{00000000H}
\end{split}$$

CRC32 instruction for 32-bit source operand and 32-bit destination operand:

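$$\begin{split}
\mathsf{TEMP1[31-0]} &\leftarrow \mathsf{BIT\_REFLECT32(SRC[31-0])} \\
\mathsf{TEMP2[31-0]} &\leftarrow \mathsf{BIT\_REFLECT32(DEST[31-0])} \\
\mathsf{TEMP3[63-0]} &\leftarrow \mathsf{TEMP1[31-0]} \ll 32 \\
\mathsf{TEMP4[63-0]} &\leftarrow \mathsf{TEMP2[31-0]} \ll 32 \\
\mathsf{TEMP5[63-0]} &\leftarrow \mathsf{TEMP3[63-0]}\ \mathsf{XOR}\ \mathsf{TEMP4[63-0]} \\
\mathsf{TEMP6[31-0]} &\leftarrow \mathsf{TEMP5[63-0]}\ \mathsf{MOD2}\ \mathsf{11EDC6F41H} \\
\mathsf{DEST[31-0]} &\leftarrow \mathsf{BIT\_REFLECT(TEMP6[31-0])}
\end{split}$$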

CRC32 instruction for 16-bit source operand and 32-bit destination operand:

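$$\begin{split}
\mathsf{TEMP1[15-0]} &\leftarrow \mathsf{BIT\_REFLECT16(SRC[15-0])} \\
\mathsf{TEMP2[31-0]} &\leftarrow \mathsf{BIT\_REFLECT32(DEST[31-0])} \\
\mathsf{TEMP3[47-0]} &\leftarrow \mathsf{TEMP1[15-0]} \ll 32 \\
\mathsf{TEMP4[47-0]} &\leftarrow \mathsf{TEMP2[31-0]} \ll 16 \\
\mathsf{TEMP5[47-0]} &\leftarrow \mathsf{TEMP3[47-0]}\ \mathsf{XOR}\ \mathsf{TEMP4[47-0]} \\
\mathsf{TEMP6[31-0]} &\leftarrow \mathsf{TEMP5[47-0]}\ \mathsf{MOD2}\ \mathsf{11EDC6F41H} \\
\mathsf{DEST[31-0]} &\leftarrow \mathsf{BIT\_REFLECT(TEMP6[31-0])}
\end{split}$$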

CRC32 instruction for 8-bit source operand and 64-bit destination operand:

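$$\begin{split}
\mathsf{TEMP1[7-0]} &\leftarrow \mathsf{BIT\_REFLECT8(SRC[7-0])} \\
\mathsf{TEMP2[31-0]} &\leftarrow \mathsf{BIT\_REFLECT32(DEST[31-0])} \\
\mathsf{TEMP3[39-0]} &\leftarrow \mathsf{TEMP1[7-0]} \ll 32 \\
\mathsf{TEMP4[39-0]} &\leftarrow \mathsf{TEMP2[31-0]} \ll 8 \\
\mathsf{TEMP5[39-0]} &\leftarrow \mathsf{TEMP3[39-0]}\ \mathsf{XOR}\ \mathsf{TEMP4[39-0]} \\
\mathsf{TEMP6[31-0]} &\leftarrow \mathsf{TEMP5[39-0]}\ \mathsf{MOD2}\ \mathsf{11EDC6F41H} \\
\mathsf{DEST[31-0]} &\leftarrow \mathsf{BIT\_REFLECT(TEMP6[31-0])} \\
\mathsf{DEST[63-32]} &\leftarrow \mathsf{00000000H}
\end{split}$$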

CRC32 instruction for 8-bit source operand and 32-bit destination operand:

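$$\begin{split}
\mathsf{TEMP1[7-0]} &\leftarrow \mathsf{BIT\_REFLECT8(SRC[7-0])} \\
\mathsf{TEMP2[31-0]} &\leftarrow \mathsf{BIT\_REFLECT32(DEST[31-0])} \\
\mathsf{TEMP3[39-0]} &\leftarrow \mathsf{TEMP1[7-0]} \ll 32 \\
\mathsf{TEMP4[39-0]} &\leftarrow \mathsf{TEMP2[31-0]} \ll 8 \\
\mathsf{TEMP5[39-0]} &\leftarrow \mathsf{TEMP3[39-0]}\ \mathsf{XOR}\ \mathsf{TEMP4[39-0]} \\
\mathsf{TEMP6[31-0]} &\leftarrow \mathsf{TEMP5[39-0]}\ \mathsf{MOD2}\ \mathsf{11EDC6F41H} \\
\mathsf{DEST[31-0]} &\leftarrow \mathsf{BIT\_REFLECT(TEMP6[31-0])}
\end{split}$$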
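The 8-bit source, 32-bit destination case can be followed step by step in C. This is an illustrative sketch that mirrors the TEMP1 through TEMP6 steps literally; the helper names (`bit_reflect`, `mod2_40`, `crc32_r32_rm8_model`) are hypothetical and do not come from this manual.

```c
#include <stdint.h>

/* BIT_REFLECTn: reverse the order of the low `width` bits of v. */
static uint64_t bit_reflect(uint64_t v, unsigned width)
{
    uint64_t r = 0;
    for (unsigned i = 0; i < width; i++)
        if ((v >> i) & 1u)
            r |= (uint64_t)1 << (width - 1 - i);
    return r;
}

/* MOD2: remainder of a 40-bit value divided by the 33-bit
   polynomial 11EDC6F41H, using carry-less (XOR) subtraction. */
static uint32_t mod2_40(uint64_t dividend)
{
    const uint64_t poly = 0x11EDC6F41ull;
    for (int bit = 39; bit >= 32; bit--)
        if ((dividend >> bit) & 1u)
            dividend ^= poly << (bit - 32);   /* clears the current top bit */
    return (uint32_t)dividend;
}

/* CRC32 with an 8-bit source and a 32-bit destination, one step per TEMP. */
static uint32_t crc32_r32_rm8_model(uint32_t dest, uint8_t src)
{
    uint64_t temp1 = bit_reflect(src, 8);
    uint64_t temp2 = bit_reflect(dest, 32);
    uint64_t temp3 = temp1 << 32;
    uint64_t temp4 = temp2 << 8;
    uint64_t temp5 = temp3 ^ temp4;
    uint32_t temp6 = mod2_40(temp5);
    return (uint32_t)bit_reflect(temp6, 32);
}
```

The wider source sizes follow the same pattern with a larger reflect width and more division steps. Production code would normally use the instruction itself or a table-driven routine rather than this bit-by-bit model.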

#### **Flags Affected**

None

#### Intel C/C++ Compiler Intrinsic Equivalent

unsigned int \_mm\_crc32\_u8( unsigned int crc, unsigned char data )

unsigned int \_mm\_crc32\_u16( unsigned int crc, unsigned short data )

unsigned int \_mm\_crc32\_u32( unsigned int crc, unsigned int data )

unsigned \_\_int64 \_mm\_crc32\_u64( unsigned \_\_int64 crc, unsigned \_\_int64 data )

#### SIMD Floating Point Exceptions

None

#### **Protected Mode Exceptions**

| #GP(0)                      | If a memory operand effective address is outside the CS, DS, ES, FS or GS segments.                                     |
|-----------------------------|-------------------------------------------------------------------------------------------------------------------------|
| #SS(0)                      | If a memory operand effective address is outside the SS segment limit.                                                  |
| #PF (fault-code)            | For a page fault.                                                                                                       |
| #AC(0)                      | If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.      |
| #UD                         | If CPUID.01H:ECX.SSE4_2 [Bit 20] = 0.                                                                                   |
|                             | If LOCK prefix is used.                                                                                                 |

#### **Real Mode Exceptions**

| #GP(0) | If any part of the operand lies outside of the effective address space from 0 to 0FFFFH. |
|--------|------------------------------------------------------------------------------------------|
| #SS(0) | If a memory operand effective address is outside the SS segment limit.                   |
| #UD    | If CPUID.01H:ECX.SSE4_2 [Bit 20] = 0.                                                    |
|        | If LOCK prefix is used.                                                                  |

#### Virtual 8086 Mode Exceptions

| #GP(0)           | If any part of the operand lies outside of the effective address space from 0 to 0FFFFH. |
|------------------|------------------------------------------------------------------------------------------|
| #SS(0)           | If a memory operand effective address is outside the SS segment limit.                   |
| #PF (fault-code) | For a page fault.                                                                        |
| #AC(0)           | If alignment checking is enabled and an unaligned memory reference is made.              |
| #UD              | If CPUID.01H:ECX.SSE4_2 [Bit 20] = 0.                                                    |
|                  | If LOCK prefix is used.                                                                  |

#### **Compatibility Mode Exceptions**

Same exceptions as in Protected Mode.

#### **64-Bit Mode Exceptions**

| #GP(0)                      | If the memory address is in a non-canonical form.                                                                       |
|-----------------------------|-------------------------------------------------------------------------------------------------------------------------|
| #SS(0)                      | If a memory address referencing the SS segment is in a non-canonical form.                                              |
| #PF (fault-code)            | For a page fault.                                                                                                       |
| #AC(0)                      | If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.      |
| #UD                         | If CPUID.01H:ECX.SSE4_2 [Bit 20] = 0.                                                                                   |
|                             | If LOCK prefix is used.                                                                                                 |

•••

# LAR—Load Access Rights Byte

| Opcode           | Instruction                       | Op/<br>En | 64-Bit<br>Mode | Compat/<br>Leg Mode | Description                                                |
|------------------|-----------------------------------|-----------|----------------|---------------------|------------------------------------------------------------|
| 0F 02 /r         | LAR <i>r16, r16/m16</i>           | RM        | Valid          | Valid               | <i>r16</i> ← access rights referenced by <i>r16/m16</i>    |
| 0F 02 /r         | LAR <i>reg, r32/m16</i><sup>1</sup> | RM      | Valid          | Valid               | <i>reg</i> ← access rights referenced by <i>r32/m16</i>    |

#### NOTES:

1. For all loads (regardless of source or destination sizing) only bits 16-0 are used. Other bits are ignored.

#### Instruction Operand Encoding

| Op/En | Operand 1     | Operand 2     | Operand 3 | Operand 4 |
|-------|---------------|---------------|-----------|-----------|
| RM    | ModRM:reg (w) | ModRM:r/m (r) | NA        | NA        |

#### Description

Loads the access rights from the segment descriptor specified by the second operand (source operand) into the first operand (destination operand) and sets the ZF flag in the flag register. The source operand (which can be a register or a memory location) contains the segment selector for the segment descriptor being accessed. If the source operand is a memory address, only 16 bits of data are accessed. The destination operand is a general-purpose register.

The processor performs access checks as part of the loading process. Once loaded in the destination register, software can perform additional checks on the access rights information.

The access rights for a segment descriptor include fields located in the second doubleword (bytes 4–7) of the segment descriptor. The following fields are loaded by the LAR instruction:

- Bits 7:0 are returned as 0.
- Bits 11:8 return the segment type.
- Bit 12 returns the S flag.
- Bits 14:13 return the DPL.
- Bit 15 returns the P flag.
- The following fields are returned only if the operand size is greater than 16 bits:
  - Bits 19:16 are undefined.
  - Bit 20 returns the software-available bit in the descriptor.
  - Bit 21 returns the L flag.
  - Bit 22 returns the D/B flag.
  - Bit 23 returns the G flag.
  - Bits 31:24 are returned as 0.
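The field layout listed above can be unpacked with a few shifts and masks. The sketch below assumes a 32-bit operand size and that the LAR result is already held in a variable; the struct `lar_rights` and the function `decode_lar` are hypothetical names introduced for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of a 32-bit LAR result. Names are illustrative only. */
struct lar_rights {
    unsigned type;  /* bits 11:8, segment type */
    bool     s;     /* bit 12, S flag */
    unsigned dpl;   /* bits 14:13, DPL */
    bool     p;     /* bit 15, P flag */
    bool     avl;   /* bit 20, software-available bit */
    bool     l;     /* bit 21, L flag */
    bool     db;    /* bit 22, D/B flag */
    bool     g;     /* bit 23, G flag */
};

/* Decode a value as returned by LAR; bits 19:16 are undefined and skipped. */
static struct lar_rights decode_lar(uint32_t value)
{
    struct lar_rights r;
    r.type = (value >> 8) & 0xFu;
    r.s    = (value >> 12) & 1u;
    r.dpl  = (value >> 13) & 3u;
    r.p    = (value >> 15) & 1u;
    r.avl  = (value >> 20) & 1u;
    r.l    = (value >> 21) & 1u;
    r.db   = (value >> 22) & 1u;
    r.g    = (value >> 23) & 1u;
    return r;
}
```

The decoded fields are meaningful only when the instruction reported success through the ZF flag.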

This instruction performs the following checks before it loads the access rights in the destination register:

- Checks that the segment selector is not NULL.
- Checks that the segment selector points to a descriptor that is within the limits of the GDT or LDT being accessed.
- Checks that the descriptor type is valid for this instruction. All code and data segment descriptors are valid for (can be accessed with) the LAR instruction. The valid system segment and gate descriptor types are given in Table 3-62.
- If the segment is not a conforming code segment, it checks that the specified segment descriptor is visible at the CPL (that is, if the CPL and the RPL of the segment selector are less than or equal to the DPL of the segment descriptor).

If the segment descriptor cannot be accessed or is an invalid type for the instruction, the ZF flag is cleared and no access rights are loaded in the destination operand.

The LAR instruction can only be executed in protected mode and IA-32e mode.

#### Table 3-62 Segment and Gate Types

| Type | Protected Mode Name     | Protected Mode Valid | IA-32e Mode Name      | IA-32e Mode Valid |
|------|-------------------------|----------------------|-----------------------|-------------------|
| 0    | Reserved                | No                   | Reserved              | No                |
| 1    | Available 16-bit TSS    | Yes                  | Reserved              | No                |
| 2    | LDT                     | Yes                  | LDT                   | No                |
| 3    | Busy 16-bit TSS         | Yes                  | Reserved              | No                |
| 4    | 16-bit call gate        | Yes                  | Reserved              | No                |
| 5    | 16-bit/32-bit task gate | Yes                  | Reserved              | No                |
| 6    | 16-bit interrupt gate   | No                   | Reserved              | No                |
| 7    | 16-bit trap gate        | No                   | Reserved              | No                |
| 8    | Reserved                | No                   | Reserved              | No                |
| 9    | Available 32-bit TSS    | Yes                  | Available 64-bit TSS  | Yes               |
| A    | Reserved                | No                   | Reserved              | No                |
| B    | Busy 32-bit TSS         | Yes                  | Busy 64-bit TSS       | Yes               |
| C    | 32-bit call gate        | Yes                  | 64-bit call gate      | Yes               |
| D    | Reserved                | No                   | Reserved              | No                |
| E    | 32-bit interrupt gate   | No                   | 64-bit interrupt gate | No                |
| F    | 32-bit trap gate        | No                   | 64-bit trap gate      | No                |

#### Operation

$$\begin{array}{l}
\text{IF Offset(SRC) > descriptor table limit} \\
\quad \text{THEN ZF} \leftarrow 0; \\
\quad \text{ELSE} \\
\qquad \text{SegmentDescriptor} \leftarrow \text{descriptor referenced by SRC;} \\
\qquad \text{IF SegmentDescriptor(Type)} \neq \text{conforming code segment} \\
\qquad \text{and (CPL > DPL) or (RPL > DPL)} \\
\qquad \text{or SegmentDescriptor(Type) is not valid for instruction} \\
\qquad\quad \text{THEN ZF} \leftarrow 0; \\
\qquad\quad \text{ELSE} \\
\qquad\qquad \text{DEST} \leftarrow \text{access rights from SegmentDescriptor as given in Description section;} \\
\qquad\qquad \text{ZF} \leftarrow 1; \\
\qquad \text{FI;} \\
\text{FI;}
\end{array}$$
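The decision logic of the operation can be modeled in C for experimentation. This is a sketch under several assumptions: it covers protected mode only, it takes the second doubleword of the descriptor as an input rather than reading a descriptor table, it folds the NULL-selector and table-limit checks into one flag, and it zeroes the undefined bits 19:16. The names `lar_model`, `lar_result`, and `LAR_VALID_SYSTEM_TYPES_PM` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Protected-mode system types marked valid in Table 3-62:
   1, 2, 3, 4, 5, 9, B and C (one bit per type number). */
#define LAR_VALID_SYSTEM_TYPES_PM 0x1A3Eu

struct lar_result {
    bool     zf;    /* models the ZF flag */
    uint32_t dest;  /* meaningful only when zf is true */
};

/* selector_ok: selector is not NULL and lies within the table limit.
   desc_hi:     second doubleword (bytes 4-7) of the descriptor. */
static struct lar_result lar_model(bool selector_ok, uint32_t desc_hi,
                                   unsigned cpl, unsigned rpl)
{
    struct lar_result res = { false, 0u };
    if (!selector_ok)
        return res;

    unsigned type = (desc_hi >> 8) & 0xFu;
    bool     s    = (desc_hi >> 12) & 1u;
    unsigned dpl  = (desc_hi >> 13) & 3u;

    /* Code and data descriptors (S = 1) are always valid for LAR. */
    bool type_valid = s || ((LAR_VALID_SYSTEM_TYPES_PM >> type) & 1u);
    /* Conforming code segment: S = 1 with the code and conforming type bits set. */
    bool conforming_code = s && (type & 0xCu) == 0xCu;

    if (!type_valid)
        return res;
    if (!conforming_code && (cpl > dpl || rpl > dpl))
        return res;

    res.zf = true;
    res.dest = desc_hi & 0x00F0FF00u;  /* modeling choice: undefined bits cleared */
    return res;
}
```

For example, a non-conforming ring-0 code segment is visible to a caller at CPL 0 but not at CPL 3, while a conforming code segment is visible at either level.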

#### **Flags Affected**

The ZF flag is set to 1 if the access rights are loaded successfully; otherwise, it is cleared to 0.

#### **Protected Mode Exceptions**

| #GP(0)                     | If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.                                      |
|----------------------------|--------------------------------------------------------------------------------------------------------------------------------|
|                            | If the DS, ES, FS, or GS register is used to access memory and it contains a NULL segment selector.                            |
| #SS(0)                     | If a memory operand effective address is outside the SS segment limit.                                                         |
| #PF(fault-code)            | If a page fault occurs.                                                                                                        |
| #AC(0)                     | If alignment checking is enabled and the memory operand effective address is unaligned while the current privilege level is 3. |
| #UD                        | If the LOCK prefix is used.                                                                                                    |

#### **Real-Address Mode Exceptions**

#UD The LAR instruction is not recognized in real-address mode.

#### Virtual-8086 Mode Exceptions

#UD The LAR instruction cannot be executed in virtual-8086 mode.



#### **Compatibility Mode Exceptions**

Same exceptions as in protected mode.

#### **64-Bit Mode Exceptions**

| #SS(0)                     | If the memory operand effective address referencing the SS segment is in a non-canonical form.                                 |
|----------------------------|--------------------------------------------------------------------------------------------------------------------------------|
| #GP(0)                     | If the memory operand effective address is in a non-canonical form.                                                            |
| #PF(fault-code)            | If a page fault occurs.                                                                                                        |
| #AC(0)                     | If alignment checking is enabled and the memory operand effective address is unaligned while the current privilege level is 3. |
| #UD                        | If the LOCK prefix is used.                                                                                                    |

...

#### LOCK—Assert LOCK# Signal Prefix

| Opcode | Instruction | Op/<br>En | 64-Bit<br>Mode | Compat/<br>Leg Mode | Description                                                              |
|--------|-------------|-----------|----------------|---------------------|--------------------------------------------------------------------------|
| F0     | LOCK        | NP        | Valid          | Valid               | Asserts LOCK# signal for duration of the accompanying instruction.      |

NOTES:

\* See IA-32 Architecture Compatibility section below.

#### Instruction Operand Encoding

| Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
|-------|-----------|-----------|-----------|-----------|
| NP    | NA        | NA        | NA        | NA        |

#### Description

Causes the processor's LOCK# signal to be asserted during execution of the accompanying instruction (turns the instruction into an atomic instruction). In a multiprocessor environment, the LOCK# signal ensures that the processor has exclusive use of any shared memory while the signal is asserted.

Note that, in later Intel 64 and IA-32 processors (including the Pentium 4, Intel Xeon, and P6 family processors), locking may occur without the LOCK# signal being asserted. See the "IA-32 Architecture Compatibility" section below.

The LOCK prefix can be prepended only to the following instructions and only to those forms of the instructions where the destination operand is a memory operand: ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, CMPXCHG8B, CMPXCHG16B, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, and XCHG. If the LOCK prefix is used with one of these instructions and the source operand is a memory operand, an undefined opcode exception (#UD) may be generated. An undefined opcode exception will also be generated if the LOCK prefix is used with any instruction not in the above list. The XCHG instruction always asserts the LOCK# signal regardless of the presence or absence of the LOCK prefix.

The LOCK prefix is typically used with the BTS instruction to perform a read-modify-write operation on a memory location in a shared memory environment.
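In C, this kind of locked read-modify-write is usually reached through the C11 atomics library rather than by writing the prefix directly. The sketch below shows an atomic test-and-set of one bit; the function name `test_and_set_bit` is hypothetical. On x86, compilers commonly implement `atomic_fetch_or` with a LOCK-prefixed instruction, but the exact instruction chosen is up to the compiler and is an assumption here.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Atomically set bit `bit` (0..31 assumed) in *word and
   report whether it was already set beforehand. */
static bool test_and_set_bit(atomic_uint *word, unsigned bit)
{
    unsigned mask = 1u << bit;
    /* Single atomic read-modify-write: no other thread can observe
       or modify *word between the read and the write. */
    unsigned old = atomic_fetch_or(word, mask);
    return (old & mask) != 0;
}
```

A caller that sees `false` returned was the one that set the bit, which is the basis of a simple lock or claim flag.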



The integrity of the LOCK prefix is not affected by the alignment of the memory field. Memory locking is observed for arbitrarily misaligned fields.

This instruction's operation is the same in non-64-bit modes and 64-bit mode.

•••

#### 4. Updates to Chapter 4, Volume 2B

Change bars show changes to Chapter 4 of the *Intel<sup>®</sup> 64 and IA-32 Architectures Software Developer's Manual, Volume 2B: Instruction Set Reference, M-Z*.


...

# MASKMOVDQU—Store Selected Bytes of Double Quadword

| Opcode/<br>Instruction                            | Op/<br>En | 64/32-bit<br>Mode | CPUID<br>Feature<br>Flag | Description                                                                                                                                                 |
|---------------------------------------------------|-----------|-------------------|--------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 66 0F F7 /r<br>MASKMOVDQU xmm1, xmm2              | RM        | V/V               | SSE2                     | Selectively write bytes from <i>xmm1</i> to memory location using the byte mask in <i>xmm2</i>. The default memory location is specified by DS:DI/EDI/RDI. |
| VEX.128.66.0F.WIG F7 /r<br>VMASKMOVDQU xmm1, xmm2 | RM        | V/V               | AVX                      | Selectively write bytes from <i>xmm1</i> to memory location using the byte mask in <i>xmm2</i>. The default memory location is specified by DS:DI/EDI/RDI. |

•••



## MFENCE—Memory Fence

| Opcode   | Instruction | Op/<br>En | 64-Bit<br>Mode | Compat/<br>Leg Mode | Description                           |
|----------|-------------|-----------|----------------|---------------------|---------------------------------------|
| 0F AE /6 | MFENCE      | NP        | Valid          | Valid               | Serializes load and store operations. |

#### Instruction Operand Encoding

| Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
|-------|-----------|-----------|-----------|-----------|
| NP    | NA        | NA        | NA        | NA        |

#### Description

Performs a serializing operation on all load-from-memory and store-to-memory instructions that were issued prior to the MFENCE instruction. This serializing operation guarantees that every load and store instruction that precedes the MFENCE instruction in program order becomes globally visible before any load or store instruction that follows the MFENCE instruction.<sup>1</sup> The MFENCE instruction is ordered with respect to all load and store instructions, other MFENCE instructions, any LFENCE and SFENCE instructions, and any serializing instructions (such as the CPUID instruction). MFENCE does not serialize the instruction stream.

Weakly ordered memory types can be used to achieve higher processor performance through such techniques as out-of-order issue, speculative reads, write-combining, and write-collapsing. The degree to which a consumer of data recognizes or knows that the data is weakly ordered varies among applications and may be unknown to the producer of this data. The MFENCE instruction provides a performance-efficient way of ensuring load and store ordering between routines that produce weakly-ordered results and routines that consume that data.
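The producer and consumer pattern mentioned above can be sketched with C11 fences. This is an illustration, not a statement of how any compiler lowers the code: a sequentially consistent `atomic_thread_fence` is the portable counterpart of a full memory fence, and on x86 a compiler may emit MFENCE or an equivalent locked instruction for it. The `mailbox` type and the `publish` and `try_consume` functions are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct mailbox {
    int         payload;  /* ordinary data written by the producer */
    atomic_bool ready;    /* flag that publishes the payload */
};

/* Producer: write the data, fence, then raise the flag. */
static void publish(struct mailbox *m, int value)
{
    m->payload = value;
    /* Full fence: the payload store is ordered before the flag store. */
    atomic_thread_fence(memory_order_seq_cst);
    atomic_store_explicit(&m->ready, true, memory_order_relaxed);
}

/* Consumer: check the flag, fence, then read the data. */
static bool try_consume(struct mailbox *m, int *out)
{
    if (!atomic_load_explicit(&m->ready, memory_order_relaxed))
        return false;
    /* Full fence: the flag load is ordered before the payload load. */
    atomic_thread_fence(memory_order_seq_cst);
    *out = m->payload;
    return true;
}
```

Code that needs the instruction itself, for example around weakly ordered stores, can request it with the `_mm_mfence()` intrinsic instead of relying on the compiler's choice.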

Processors are free to fetch and cache data speculatively from regions of system memory that use the WB, WC, and WT memory types. This speculative fetching can occur at any time and is not tied to instruction execution. Thus, it is not ordered with respect to executions of the MFENCE instruction; data can be brought into the caches speculatively just before, during, or after the execution of an MFENCE instruction.

This instruction's operation is the same in non-64-bit modes and 64-bit mode.

• • •

<sup>1.</sup> A load instruction is considered to become globally visible when the value to be loaded into its destination register is determined.




## MONITOR—Set Up Monitor Address


| Opcode          | Instruction | Op/<br>En | 64-Bit<br>Mode | Compat/<br>Leg Mode | Description                                                                                                                                                                                                             |
|-----------------|-------------|-----------|----------------|---------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 0F 01 <i>C8</i> | MONITOR     | NP        | Valid          | Valid               | Sets up a linear address range to be monitored by hardware and activates the monitor. The address range should be a write-back memory caching type. The address is DS:EAX (DS:RAX in 64-bit mode). |

#### Instruction Operand Encoding

| Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
|-------|-----------|-----------|-----------|-----------|
| NP    | NA        | NA        | NA        | NA        |

#### Description

The MONITOR instruction arms address monitoring hardware using an address specified in EAX (the address range that the monitoring hardware checks for store operations can be determined by using CPUID). A store to an address within the specified address range triggers the monitoring hardware. The state of monitor hardware is used by MWAIT.

The content of EAX is an effective address (in 64-bit mode, RAX is used). By default, the DS segment is used to create a linear address that is monitored. Segment overrides can be used.

ECX and EDX are also used. They communicate other information to MONITOR. ECX specifies optional extensions. EDX specifies optional hints; it does not change the architectural behavior of the instruction. For the Pentium 4 processor (family 15, model 3), no extensions or hints are defined. Undefined hints in EDX are ignored by the processor; undefined extensions in ECX raise a general protection fault.

The address range must use memory of the write-back type. Only write-back memory will correctly trigger the monitoring hardware. Additional information on determining what address range to use in order to prevent false wake-ups is described in Chapter 8, "Multiple-Processor Management" of the Intel<sup>®</sup> 64 and IA-32 Architectures Software Developer's Manual, Volume 3A.

The MONITOR instruction is ordered as a load operation with respect to other memory transactions. The instruction is subject to the permission checking and faults associated with a byte load. Like a load, MONITOR sets the A-bit but not the D-bit in page tables.

CPUID.01H:ECX.MONITOR[bit 3] indicates the availability of MONITOR and MWAIT in the processor. When set, MONITOR may be executed only at privilege level 0 (use at any other privilege level results in an invalid-opcode exception). The operating system or system BIOS may disable this instruction by using the IA32\_MISC\_ENABLE MSR; disabling MONITOR clears the CPUID feature flag and causes execution to generate an invalid-opcode exception.

The instruction's operation is the same in non-64-bit modes and 64-bit mode.

•••



## MOVHLPS—Move Packed Single-Precision Floating-Point Values High to Low

| Opcode/<br>Instruction                                | Op/<br>En | 64/32-bit<br>Mode | CPUID<br>Feature<br>Flag | Description                                                                                                                            |
|-------------------------------------------------------|-----------|-------------------|--------------------------|----------------------------------------------------------------------------------------------------------------------------------------|
| 0F 12 /r<br>MOVHLPS xmm1, xmm2                        | RM        | V/V               | SSE                      | Move two packed single-precision floating-point values from high quadword of <i>xmm2</i> to low quadword of <i>xmm1</i>. |
| VEX.NDS.128.0F.WIG 12 /r<br>VMOVHLPS xmm1, xmm2, xmm3 | RVM       | V/V               | AVX                      | Merge two packed single-precision floating-point values from high quadword of <i>xmm3</i> and low quadword of <i>xmm2</i>. |

•••

#### MWAIT—Monitor Wait

| Opcode          | Instruction | Op/<br>En | 64-Bit<br>Mode | Compat/<br>Leg Mode | Description                                                                                                                                                                   |
|-----------------|-------------|-----------|----------------|---------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 0F 01 <i>C9</i> | MWAIT       | NP        | Valid          | Valid               | A hint that allows the processor to stop instruction execution and enter an implementation-dependent optimized state until occurrence of a class of events. |

#### Instruction Operand Encoding

| Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
|-------|-----------|-----------|-----------|-----------|
| NP    | NA        | NA        | NA        | NA        |

#### Description

The MWAIT instruction provides hints to allow the processor to enter an implementation-dependent optimized state. There are two principal targeted usages: address-range monitor and advanced power management. Both usages of MWAIT require the use of the MONITOR instruction.

CPUID.01H:ECX.MONITOR[bit 3] indicates the availability of MONITOR and MWAIT in the processor. When set, MWAIT may be executed only at privilege level 0 (use at any other privilege level results in an invalid-opcode exception). The operating system or system BIOS may disable this instruction by using the IA32\_MISC\_ENABLE MSR; disabling MWAIT clears the CPUID feature flag and causes execution to generate an invalid-opcode exception.

This instruction's operation is the same in non-64-bit modes and 64-bit mode.

ECX specifies optional extensions for the MWAIT instruction. EAX may contain hints such as the preferred optimized state the processor should enter. The first processors to implement MWAIT supported only the zero value for EAX and ECX. Later processors allowed setting ECX[0] to enable masked interrupts as break events for MWAIT (see below). Software can use the CPUID instruction to determine the extensions and hints supported by the processor.

#### MWAIT for Address Range Monitoring

For address-range monitoring, the MWAIT instruction operates with the MONITOR instruction. The two instructions allow the definition of an address at which to wait (MONITOR) and an implementation-dependent-optimized operation to commence at the wait address (MWAIT). The execution of MWAIT is a hint to the processor that it can enter an implementation-dependent-optimized state while waiting for an event or a store operation to the address range armed by MONITOR.

The following cause the processor to exit the implementation-dependent-optimized state: a store to the address range armed by the MONITOR instruction, an NMI or SMI, a debug exception, a machine check exception, the BINIT# signal, the INIT# signal, and the RESET# signal. Other implementation-dependent events may also cause the processor to exit the implementation-dependent-optimized state.

In addition, an external interrupt causes the processor to exit the implementation-dependent-optimized state either (1) if the interrupt would be delivered to software (e.g., as it would be if HLT had been executed instead of MWAIT); or (2) if ECX[0] = 1. Software can execute MWAIT with ECX[0] = 1 only if CPUID.05H:ECX[bit 1] = 1. (Implementation-specific conditions may result in an interrupt causing the processor to exit the implementation-dependent-optimized state even if interrupts are masked and ECX[0] = 0.)

Following exit from the implementation-dependent-optimized state, control passes to the instruction following the MWAIT instruction. A pending interrupt that is not masked (including an NMI or an SMI) may be delivered before execution of that instruction. Unlike the HLT instruction, the MWAIT instruction does not support a restart at the MWAIT instruction following the handling of an SMI.

If the preceding MONITOR instruction did not successfully arm an address range or if the MONITOR instruction has not been executed prior to executing MWAIT, then the processor will not enter the implementation-dependent-optimized state. Execution will resume at the instruction following the MWAIT.

#### **MWAIT for Power Management**

MWAIT accepts a hint and optional extension to the processor that it can enter a specified target C state while waiting for an event or a store operation to the address range armed by MONITOR. Support for MWAIT extensions for power management is indicated by CPUID.05H:ECX[bit 0] reporting 1.
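The CPUID feature bits that gate MONITOR/MWAIT and its extensions can be tested with simple bit masks. The helpers below are a sketch that takes the raw ECX values as parameters so it stays independent of any particular CPUID wrapper; obtaining those values, for example through a compiler-provided CPUID helper, is left to the caller. All function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.01H:ECX.MONITOR[bit 3]: MONITOR and MWAIT are available. */
static bool has_monitor_mwait(uint32_t cpuid01_ecx)
{
    return (cpuid01_ecx >> 3) & 1u;
}

/* CPUID.05H:ECX[bit 0]: MWAIT extensions for power management are supported. */
static bool has_mwait_extensions(uint32_t cpuid05_ecx)
{
    return cpuid05_ecx & 1u;
}

/* CPUID.05H:ECX[bit 1]: MWAIT may be executed with ECX[0] = 1, so that
   interrupts act as break events even when masked. */
static bool has_mwait_interrupt_break(uint32_t cpuid05_ecx)
{
    return (cpuid05_ecx >> 1) & 1u;
}
```

Software would normally check `has_monitor_mwait` first, since leaf 05H is only meaningful when the MONITOR feature flag is set.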

EAX and ECX are used to communicate the additional information to the MWAIT instruction, such as the kind of optimized state the processor should enter. ECX specifies optional extensions for the MWAIT instruction. EAX may contain hints such as the preferred optimized state the processor should enter. Implementation-specific conditions may cause a processor to ignore the hint and enter a different optimized state. Future processor implementations may implement several optimized "waiting" states and will select among those states based on the hint argument.

Table 4-10 describes the meaning of ECX and EAX registers for MWAIT extensions.



#### Table 4-10 MWAIT Extension Register (ECX)

| Bits  | Description                                                                                                                  |
|-------|------------------------------------------------------------------------------------------------------------------------------|
| 0     | Treat interrupts as break events even if masked (e.g., even if EFLAGS.IF=0).<br>May be set only if CPUID.05H:ECX[bit 1] = 1. |
| 31:1  | Reserved                                                                                                                     |

#### Table 4-11 MWAIT Hints Register (EAX)

| Bits | Description |
|------|-------------|
| 3:0  | Sub C-state within a C-state, indicated by bits [7:4] |
| 7:4  | Target C-state*<br>Value of 0 means C1; 1 means C2 and so on<br>Value of 01111B means C0<br>Note: Target C states for MWAIT extensions are processor-specific C-states, not ACPI C-states |
| 31:8 | Reserved |

Note that if MWAIT is used to enter any of the C-states that are numerically higher than C1, a store to the address range armed by the MONITOR instruction will cause the processor to exit MWAIT only if the store was originated by other processor agents. A store from a non-processor agent might not cause the processor to exit MWAIT in such cases.
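The bit layout of Table 4-11 can be illustrated with a small C helper. This is only a sketch: the function name `mwait_hints` and its numbering convention (0 for C0, 1 for C1, and so on) are assumptions made for illustration, and the table's "01111B" is read here as the all-ones value of the 4-bit field. Reserved bits 31:8 are left at zero.

```c
#include <stdint.h>

/* Hypothetical helper (not part of any Intel API): builds the EAX hints
 * value for MWAIT from a processor-specific target C-state number
 * (0 = C0, 1 = C1, 2 = C2, ...) and a sub C-state.
 * Per Table 4-11, a field value of 0 means C1, 1 means C2, and so on;
 * the all-ones field value is assumed here to mean C0. */
static uint32_t mwait_hints(unsigned cstate, unsigned sub_cstate)
{
    uint32_t target = (cstate == 0) ? 0xFu : ((cstate - 1u) & 0xFu);

    /* Bits 7:4 = target C-state, bits 3:0 = sub C-state, bits 31:8 = 0. */
    return (target << 4) | (sub_cstate & 0xFu);
}
```

For example, under these assumptions `mwait_hints(2, 1)` yields 11H (target field 1 for C2, sub C-state 1).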

For additional details of MWAIT extensions, see Chapter 14, "Power and Thermal Management," of Intel<sup>®</sup> 64 and IA-32 Architectures Software Developer's Manual, Volume 3A.

#### Operation

(\* MWAIT takes the argument in EAX as a hint extension and is architected to take the argument in ECX as an instruction extension MWAIT EAX, ECX \*)

{

WHILE ( ("Monitor Hardware is in armed state")) {

implementation\_dependent\_optimized\_state(EAX, ECX); }

Set the state of Monitor Hardware as triggered; }

#### Intel C/C++ Compiler Intrinsic Equivalent

MWAIT: void \_mm\_mwait(unsigned extensions, unsigned hints)

#### Example

The MONITOR/MWAIT instruction pair must be coded in the same loop because execution of the MWAIT instruction will trigger the monitor hardware. It is not proper usage to execute MONITOR once and then execute MWAIT in a loop. Setting up MONITOR without executing MWAIT has no adverse effects.

Typically the MONITOR/MWAIT pair is used in a sequence, such as:



```
EAX = Logical Address(Trigger)
ECX = 0 (* Hints *)
EDX = 0 (* Hints *)
IF (!trigger_store_happened) {
    MONITOR EAX, ECX, EDX
    IF (!trigger_store_happened) {
        MWAIT EAX, ECX
    }
}
```

The above code sequence makes sure that a triggering store does not happen between the first check of the trigger and the execution of the MONITOR instruction. Without the second check, that triggering store would go unnoticed. Typical usage of MONITOR and MWAIT would have the above code sequence within a loop.
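The check, arm, re-check, wait pattern can also be sketched in C. Because MONITOR and MWAIT require privilege level 0, this sketch abstracts them behind two hypothetical callbacks (`arm_fn` and `wait_fn`); every name below is an assumption made for illustration, and the `demo_*` stubs only simulate a triggering store so the control flow can be exercised in user mode.

```c
#include <stdbool.h>

/* Hypothetical callbacks standing in for the privileged instructions.
 * In ring-0 code they would wrap MONITOR and MWAIT respectively. */
typedef void (*arm_fn)(const volatile void *addr);
typedef void (*wait_fn)(void);

/* Waits until *trigger becomes nonzero.
 * Returns the number of times the wait callback was invoked. */
static unsigned wait_for_trigger(const volatile int *trigger,
                                 arm_fn arm_monitor, wait_fn do_mwait)
{
    unsigned waits = 0;

    while (!*trigger) {          /* first check of the trigger */
        arm_monitor(trigger);    /* MONITOR: arm the address range */
        if (!*trigger) {         /* second check: catches a store that
                                    landed before the range was armed */
            do_mwait();          /* MWAIT: may also return on other events */
            waits++;
        }
    }
    return waits;
}

/* Demo stubs (assumptions): the "wait" simulates another agent's store. */
static volatile int demo_flag;
static void demo_arm(const volatile void *addr) { (void)addr; }
static void demo_wait(void) { demo_flag = 1; }
```

Re-arming inside the loop reflects the requirement that the MONITOR/MWAIT pair be coded in the same loop; the outer `while` handles wakeups that were not caused by the triggering store.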

#### Numeric Exceptions

None

#### Protected Mode Exceptions

| #GP(0) | If $ECX[31:1] \neq 0$.                          |
|--------|-------------------------------------------------|
|        | If $ECX[0] = 1$ and $CPUID.05H:ECX[bit 1] = 0$. |
| #UD    | If CPUID.01H:ECX.MONITOR[bit 3] = 0.            |
|        | If current privilege level is not 0.            |

#### **Real Address Mode Exceptions**

| #GP | If $ECX[31:1] \neq 0$.                          |
|-----|-------------------------------------------------|
|     | If $ECX[0] = 1$ and $CPUID.05H:ECX[bit 1] = 0$. |
| #UD | If CPUID.01H:ECX.MONITOR[bit 3] = 0.            |

#### Virtual 8086 Mode Exceptions

#UD The MWAIT instruction is not recognized in virtual-8086 mode (even if CPUID.01H:ECX.MONITOR[bit 3] = 1).

#### **Compatibility Mode Exceptions**

Same exceptions as in protected mode.

#### **64-Bit Mode Exceptions**

| #GP(0) | If $RCX[63:1] \neq 0$.                          |
|--------|-------------------------------------------------|
|        | If $RCX[0] = 1$ and $CPUID.05H:ECX[bit 1] = 0$. |
| #UD    | If the current privilege level is not 0.        |
|        | If CPUID.01H:ECX.MONITOR[bit 3] = 0.            |

•••



# PAUSE—Spin Loop Hint

| Opcode | Instruction | Op/<br>En | 64-Bit<br>Mode | Compat/<br>Leg Mode | Description                                                                 |
|--------|-------------|-----------|----------------|---------------------|-----------------------------------------------------------------------------|
| F3 90  | PAUSE       | NP        | Valid          | Valid               | Gives hint to processor that<br>improves performance of<br>spin-wait loops. |

#### Instruction Operand Encoding

| Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
|-------|-----------|-----------|-----------|-----------|
| NP    | NA        | NA        | NA        | NA        |

#### Description

Improves the performance of spin-wait loops. When executing a "spin-wait loop," a processor suffers a severe performance penalty when exiting the loop because it detects a possible memory order violation. The PAUSE instruction provides a hint to the processor that the code sequence is a spin-wait loop. The processor uses this hint to avoid the memory order violation in most situations, which greatly improves processor performance. For this reason, it is recommended that a PAUSE instruction be placed in all spin-wait loops.

An additional function of the PAUSE instruction is to reduce the power consumed by a processor while executing a spin loop. A processor can execute a spin-wait loop extremely quickly, causing the processor to consume a lot of power while it waits for the resource it is spinning on to become available. Inserting a PAUSE instruction in a spin-wait loop greatly reduces the processor's power consumption.
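A minimal sketch of a spin-wait loop that issues PAUSE on each iteration is shown below. It assumes a C11 compiler with `<stdatomic.h>` and, on x86 targets, the `_mm_pause` intrinsic from `<immintrin.h>`; the names `cpu_relax` and `spin_until_set` are hypothetical and chosen only for this illustration.

```c
#include <stdatomic.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define cpu_relax() _mm_pause()      /* emits PAUSE (F3 90) */
#else
#define cpu_relax() ((void)0)        /* no-op fallback on other targets */
#endif

/* Spins until *flag becomes nonzero, hinting the processor on every
 * iteration that this is a spin-wait loop.
 * Returns the number of iterations spent waiting. */
static unsigned spin_until_set(atomic_int *flag)
{
    unsigned spins = 0;

    while (atomic_load_explicit(flag, memory_order_acquire) == 0) {
        cpu_relax();
        spins++;
    }
    return spins;
}
```

The acquire load is a design choice of this sketch so that data published before the flag was set is visible after the loop exits; PAUSE itself does not change architectural state.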

This instruction was introduced in the Pentium 4 processors, but is backward compatible with all IA-32 processors. In earlier IA-32 processors, the PAUSE instruction operates like a NOP instruction. The Pentium 4 and Intel Xeon processors implement the PAUSE instruction as a delay. The delay is finite and can be zero for some processors. This instruction does not change the architectural state of the processor (that is, it performs essentially a delaying no-op operation).

This instruction's operation is the same in non-64-bit modes and 64-bit mode.

...

# POPCNT — Return the Count of Number of Bits Set to 1

| Opcode            | Instruction       | Op/<br>En | 64-Bit<br>Mode | Compat/<br>Leg Mode | Description     |
|-------------------|-------------------|-----------|----------------|---------------------|-----------------|
| F3 0F B8 /r       | POPCNT r16, r/m16 | RM        | Valid          | Valid               | POPCNT on r/m16 |
| F3 0F B8 /r       | POPCNT r32, r/m32 | RM        | Valid          | Valid               | POPCNT on r/m32 |
| F3 REX.W 0F B8 /r | POPCNT r64, r/m64 | RM        | Valid          | N.E.                | POPCNT on r/m64 |



#### Instruction Operand Encoding

| Op/En | Operand 1     | Operand 2     | Operand 3 | Operand 4 |
|-------|---------------|---------------|-----------|-----------|
| RM    | ModRM:reg (w) | ModRM:r/m (r) | NA        | NA        |

#### Description

This instruction calculates the number of bits set to 1 in the second operand (source) and returns the count in the first operand (a destination register).

#### Operation

Count = 0;

For (i=0; i < OperandSize; i++) {

IF (SRC[i] = 1) // i'th bit

THEN Count++; FI;

}

DEST ← Count;

#### **Flags Affected**

OF, SF, AF, CF, and PF are all cleared. ZF is set if SRC = 0; otherwise ZF is cleared.
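The Operation pseudocode and the ZF rule can be modeled in portable C as follows. This is a software sketch of the described behavior, not the instruction itself; the type `popcnt_result` and the function `popcnt_model` are hypothetical names, and the operand size is assumed to be 16, 32, or 64.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type: the count written to DEST plus the ZF value. */
typedef struct {
    uint64_t count;
    bool     zf;
} popcnt_result;

/* Software model of POPCNT. operand_size is assumed to be 16, 32, or 64;
 * bits of src at or above operand_size are ignored. */
static popcnt_result popcnt_model(uint64_t src, unsigned operand_size)
{
    popcnt_result r = { 0, false };

    for (unsigned i = 0; i < operand_size; i++) {
        if ((src >> i) & 1u) {   /* i'th bit of SRC */
            r.count++;
        }
    }
    /* ZF is set exactly when no bit of the source operand is set. */
    r.zf = (r.count == 0);
    return r;
}
```

In real code the intrinsics listed in the next section would normally be used instead of a bit-by-bit loop.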

#### Intel C/C++ Compiler Intrinsic Equivalent

| POPCNT: | int _mm_popcnt_u32(unsigned int a);      |
|---------|------------------------------------------|
| POPCNT: | int64_t _mm_popcnt_u64(unsigned __int64 a); |

#### **Protected Mode Exceptions**

| #GP(0)          | If a memory operand effective address is outside the CS, DS, ES, FS or GS segments.                                |
|-----------------|--------------------------------------------------------------------------------------------------------------------|
| #SS(0)          | If a memory operand effective address is outside the SS segment limit.                                             |
| #PF(fault-code) | For a page fault.                                                                                                  |
| #AC(0)          | If an unaligned memory reference is made while the current privilege level is 3 and alignment checking is enabled. |
| #UD             | If CPUID.01H:ECX.POPCNT [Bit 23] = 0.                                                                              |
|                 | If LOCK prefix is used.                                                                                            |
|                 | Either the prefix REP (F3h) or REPN (F2H) is used.                                                                 |

#### **Real Mode Exceptions**

| #GP(0) | If any part of the operand lies outside of the effective address space from 0 to 0FFFFH. |
|--------|------------------------------------------------------------------------------------------|
| #SS(0) | If a memory operand effective address is outside the SS segment limit.                   |
| #UD    | If CPUID.01H:ECX.POPCNT [Bit 23] = 0.                                                    |
|        | If LOCK prefix is used.                                                                  |
|        | Either the prefix REP (F3h) or REPN (F2H) is used.                                       |



#### Virtual 8086 Mode Exceptions

| #GP(0)          | If any part of the operand lies outside of the effective address space from 0 to 0FFFFH. |
|-----------------|------------------------------------------------------------------------------------------|
| #SS(0)          | If a memory operand effective address is outside the SS segment limit.                   |
| #PF(fault-code) | For a page fault.                                                                        |
| #AC(0)          | If an unaligned memory reference is made while alignment checking is enabled.            |
| #UD             | If CPUID.01H:ECX.POPCNT [Bit 23] = 0.                                                    |
|                 | If LOCK prefix is used.                                                                  |
|                 | Either the prefix REP (F3h) or REPN (F2H) is used.                                       |

#### **Compatibility Mode Exceptions**

Same exceptions as in Protected Mode.

#### 64-Bit Mode Exceptions

| #GP(0)          | If the memory address is in a non-canonical form.                                                                  |
|-----------------|--------------------------------------------------------------------------------------------------------------------|
| #SS(0)          | If a memory address referencing the SS segment is in a non-canonical form.                                         |
| #PF(fault-code) | For a page fault.                                                                                                  |
| #AC(0)          | If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3. |
| #UD             | If CPUID.01H:ECX.POPCNT [Bit 23] = 0.                                                                              |
|                 | If LOCK prefix is used.                                                                                            |
|                 | Either the prefix REP (F3h) or REPN (F2H) is used.                                                                 |

•••



#### WRMSR—Write to Model Specific Register

| Opcode | Instruction | Op/<br>En | 64-Bit<br>Mode | Compat/<br>Leg Mode | Description                                         |
|--------|-------------|-----------|----------------|---------------------|-----------------------------------------------------|
| 0F 30  | WRMSR       | NP        | Valid          | Valid               | Write the value in EDX:EAX to MSR specified by ECX. |

#### Instruction Operand Encoding

| Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
|-------|-----------|-----------|-----------|-----------|
| NP    | NA        | NA        | NA        | NA        |

#### Description

Writes the contents of registers EDX:EAX into the 64-bit model specific register (MSR) specified in the ECX register. (On processors that support the Intel 64 architecture, the high-order 32 bits of RCX are ignored.) The contents of the EDX register are copied to the high-order 32 bits of the selected MSR and the contents of the EAX register are copied to the low-order 32 bits of the MSR. (On processors that support the Intel 64 architecture, the high-order 32 bits of each of RAX and RDX are ignored.) Undefined or reserved bits in an MSR should be set to values previously read.
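The EDX:EAX split and the advice to write reserved bits back with previously read values can be sketched with plain integer arithmetic. The helpers below are hypothetical (`msr_split`, `msr_join`, `msr_update`) and only prepare register values; executing RDMSR/WRMSR themselves requires privilege level 0 and is outside this sketch.

```c
#include <stdint.h>

/* Hypothetical pair of register values as WRMSR would consume them. */
typedef struct {
    uint32_t eax;   /* low-order 32 bits of the MSR value  */
    uint32_t edx;   /* high-order 32 bits of the MSR value */
} msr_regs;

/* Splits a 64-bit MSR value into the EDX:EAX pair. */
static msr_regs msr_split(uint64_t value)
{
    msr_regs r = { (uint32_t)value, (uint32_t)(value >> 32) };
    return r;
}

/* Rebuilds the 64-bit value from EDX:EAX (for example after a read). */
static uint64_t msr_join(uint32_t eax, uint32_t edx)
{
    return ((uint64_t)edx << 32) | eax;
}

/* Read-modify-write: only bits selected by mask take new_bits; all other
 * bits (including undefined or reserved ones) keep the value read. */
static uint64_t msr_update(uint64_t old_value, uint64_t mask,
                           uint64_t new_bits)
{
    return (old_value & ~mask) | (new_bits & mask);
}
```

A caller would typically read the MSR, pass the result through `msr_update`, and then hand the `msr_split` output to the write.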

This instruction must be executed at privilege level 0 or in real-address mode; otherwise, a general protection exception #GP(0) is generated. Specifying a reserved or unimplemented MSR address in ECX will also cause a general protection exception. The processor will also generate a general protection exception if software attempts to write to bits in a reserved MSR.

When the WRMSR instruction is used to write to an MTRR, the TLBs are invalidated. This includes global entries (see "Translation Lookaside Buffers (TLBs)" in Chapter 3 of the Intel<sup>®</sup> 64 and IA-32 Architectures Software Developer's Manual, Volume 3A).

MSRs control functions for testability, execution tracing, performance-monitoring and machine check errors. Chapter 34, "Model-Specific Registers (MSRs)", in the Intel<sup>®</sup> 64 and IA-32 Architectures Software Developer's Manual, Volume 3C, lists all MSRs that can be written with this instruction and their addresses. Note that each processor family has its own set of MSRs.

The WRMSR instruction is a serializing instruction (see "Serializing Instructions" in Chapter 8 of the Intel<sup>®</sup> 64 and IA-32 Architectures Software Developer's Manual, Volume 3A). Note that a WRMSR to the IA32\_TSC\_DEADLINE MSR (MSR index 6E0H) or to the X2APIC MSRs (MSR indices 802H to 83FH) is not serializing.

The CPUID instruction should be used to determine whether MSRs are supported (CPUID.01H:EDX[5] = 1) before using this instruction.
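The feature test amounts to checking bit 5 of the EDX value returned by CPUID leaf 01H. A minimal sketch, assuming the caller has already obtained that EDX value by some compiler- or OS-specific means (the helper name `msr_supported` is hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if CPUID.01H:EDX[5] = 1, i.e. MSRs are supported.
 * cpuid_01h_edx is the EDX value returned by CPUID with EAX = 01H;
 * how it is obtained is left to the caller. */
static bool msr_supported(uint32_t cpuid_01h_edx)
{
    return ((cpuid_01h_edx >> 5) & 1u) != 0;
}
```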

...

#### 5. Updates to Appendix A, Volume 2C

Change bars show changes to Appendix A of the *Intel<sup>®</sup> 64 and IA-32 Architectures Software Developer's Manual, Volume 2C: Instruction Set Reference*.


•••



|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| 0 | ADD | ADD | ADD | ADD | ADD | ADD | PUSH | POP |
|   | Eb, Gb | Ev, Gv | Gb, Eb | Gv, Ev | AL, Ib | rAX, Iz | ES<sup>i64</sup> | ES<sup>i64</sup> |
| 1 | ADC | ADC | ADC | ADC | ADC | ADC | PUSH | POP |
|   | Eb, Gb | Ev, Gv | Gb, Eb | Gv, Ev | AL, Ib | rAX, Iz | SS<sup>i64</sup> | SS<sup>i64</sup> |
| 2 | AND | AND | AND | AND | AND | AND | SEG=ES | DAA<sup>i64</sup> |
|   | Eb, Gb | Ev, Gv | Gb, Eb | Gv, Ev | AL, Ib | rAX, Iz | (Prefix) |  |
| 3 | XOR | XOR | XOR | XOR | XOR | XOR | SEG=SS | AAA<sup>i64</sup> |
|   | Eb, Gb | Ev, Gv | Gb, Eb | Gv, Ev | AL, Ib | rAX, Iz | (Prefix) |  |
| 4 | INC<sup>i64</sup> general register / REX<sup>o64</sup> Prefixes |  |  |  |  |  |  |  |
|   | eAX<br>REX | eCX<br>REX.B | eDX<br>REX.X | eBX<br>REX.XB | eSP<br>REX.R | eBP<br>REX.RB | eSI<br>REX.RX | eDI<br>REX.RXB |
| 5 | PUSH<sup>d64</sup> general register |  |  |  |  |  |  |  |
|   | rAX/r8 | rCX/r9 | rDX/r10 | rBX/r11 | rSP/r12 | rBP/r13 | rSI/r14 | rDI/r15 |
| 6 | PUSHA<sup>i64</sup>/<br>PUSHAD<sup>i64</sup> | POPA<sup>i64</sup>/<br>POPAD<sup>i64</sup> | BOUND<sup>i64</sup> | ARPL<sup>i64</sup> | SEG=FS | SEG=GS | Operand<br>Size | Address<br>Size |
|   |  |  | Gv, Ma | Ew, Gw<br>MOVSXD<sup>o64</sup><br>Gv, Ev | (Prefix) | (Prefix) | (Prefix) | (Prefix) |
| 7 | Jcc<sup>f64</sup>, Jb - Short-displacement jump on condition |  |  |  |  |  |  |  |
|   | O | NO | B/NAE/C | NB/AE/NC | Z/E | NZ/NE | BE/NA | NBE/A |
| 8 | Immediate Grp 1<sup>1A</sup> | Immediate Grp 1<sup>1A</sup> | Immediate Grp 1<sup>1A</sup> | Immediate Grp 1<sup>1A</sup> | TEST | TEST | XCHG | XCHG |
|   | Eb, Ib | Ev, Iz | Eb, Ib<sup>i64</sup> | Ev, Ib | Eb, Gb | Ev, Gv | Eb, Gb | Ev, Gv |
| 9 | NOP<br>PAUSE(F3)<br>XCHG r8, rAX | XCHG word, double-word or quad-word register with rAX |  |  |  |  |  |  |
|   |  | rCX/r9 | rDX/r10 | rBX/r11 | rSP/r12 | rBP/r13 | rSI/r14 | rDI/r15 |
| A | MOV | MOV | MOV | MOV | MOVS/B | MOVS/W/D/Q | CMPS/B | CMPS/W/D |
|   | AL, Ob | rAX, Ov | Ob, AL | Ov, rAX | Yb, Xb | Yv, Xv | Xb, Yb | Xv, Yv |
| B | MOV immediate byte into byte register |  |  |  |  |  |  |  |
|   | AL/R8L, Ib | CL/R9L, Ib | DL/R10L, Ib | BL/R11L, Ib | AH/R12L, Ib | CH/R13L, Ib | DH/R14L, Ib | BH/R15L, Ib |
| C | Shift Grp 2<sup>1A</sup> | Shift Grp 2<sup>1A</sup> | RETN<sup>f64</sup> | RETN<sup>f64</sup> | LES<sup>i64</sup> | LDS<sup>i64</sup> | Grp 11<sup>1A</sup> - MOV | Grp 11<sup>1A</sup> - MOV |
|   | Eb, Ib | Ev, Ib | Iw |  | Gz, Mp<br>VEX+2byte | Gz, Mp<br>VEX+1byte | Eb, Ib | Ev, Iz |
| D | Shift Grp 2<sup>1A</sup> | Shift Grp 2<sup>1A</sup> | Shift Grp 2<sup>1A</sup> | Shift Grp 2<sup>1A</sup> | AAM<sup>i64</sup> | AAD<sup>i64</sup> |  | XLAT/<br>XLATB |
|   | Eb, 1 | Ev, 1 | Eb, CL | Ev, CL | Ib | Ib |  |  |
| E | LOOPNE<sup>f64</sup>/<br>LOOPNZ<sup>f64</sup> | LOOPE<sup>f64</sup>/<br>LOOPZ<sup>f64</sup> | LOOP<sup>f64</sup> | JrCXZ<sup>f64</sup>/ | IN | IN | OUT | OUT |
|   | Jb | Jb | Jb | Jb | AL, Ib | eAX, Ib | Ib, AL | Ib, eAX |
| F | LOCK |  | REPNE<br>XACQUIRE | REP/REPE<br>XRELEASE | HLT | CMC | Unary Grp 3<sup>1A</sup> | Unary Grp 3<sup>1A</sup> |
|   | (Prefix) |  | (Prefix) | (Prefix) |  |  | Eb | Ev |

# Table A-2 One-byte Opcode Map: (00H — F7H) \*

...



|   | pfx | 0                        | 1                        | 2                                               | 3                       | 4                       | 5                        | 6                                                               | 7                                                        |
|---|-----|--------------------------|--------------------------|-------------------------------------------------|-------------------------|-------------------------|--------------------------|-----------------------------------------------------------------|----------------------------------------------------------|
| 0 |     | Grp 6 <sup>1A</sup> | Grp 7 <sup>1A</sup> | LAR<br>Gv, Ew | LSL<br>Gv, Ew |  | SYSCALL <sup>o64</sup> | CLTS | SYSRET <sup>o64</sup> |
|   |     | vmovups<br>Vps, Wps | vmovups<br>Wps, Vps | vmovlps<br>Vq, Hq, Mq<br>vmovhlps<br>Vq, Hq, Uq | vmovlps<br>Mq, Vq | vunpcklps<br>Vx, Hx, Wx | vunpckhps<br>Vx, Hx, Wx | vmovhps <sup>v1</sup><br>Vdq, Hq, Mq<br>vmovlhps<br>Vdq, Hq, Uq | vmovhps <sup>v1</sup><br>Mq, Vq |
| 1 | 66  | vmovupd<br>Vpd, Wpd | vmovupd<br>Wpd, Vpd | vmovlpd<br>Vq, Hq, Mq | vmovlpd<br>Mq, Vq | vunpcklpd<br>Vx, Hx, Wx | vunpckhpd<br>Vx, Hx, Wx | vmovhpd <sup>v1</sup><br>Vdq, Hq, Mq | vmovhpd <sup>v1</sup><br>Mq, Vq |
|   | F3  | vmovss<br>Vx, Hx, Wss    | vmovss<br>Wss, Hx, Vss   | vmovsldup<br>Vx, Wx                             |                         |                         |                          | vmovshdup<br>Vx, Wx                                             |                                                          |
|   | F2  | vmovsd<br>Vx, Hx, Wsd    | vmovsd<br>Wsd, Hx, Vsd   | vmovddup<br>Vx, Wx                              |                         |                         |                          |                                                                 |                                                          |
| 2 |     | MOV<br>Rd, Cd            | MOV<br>Rd, Dd            | MOV<br>Cd, Rd                                   | MOV<br>Dd, Rd           |                         |                          |                                                                 |                                                          |
| 3 |     | WRMSR                    | RDTSC                    | RDMSR                                           | RDPMC                   | SYSENTER                | SYSEXIT                  |                                                                 | GETSEC                                                   |
|   |     | CMOVcc, (Gv, Ev) - Conditional Move |  |  |  |  |  |  |  |
| 4 |     | O | NO | B/C/NAE | AE/NB/NC | E/Z | NE/NZ | BE/NA | A/NBE |
|   |     | vmovmskps<br>Gy, Ups     | vsqrtps<br>Vps, Wps      | vrsqrtps<br>Vps, Wps                            | vrcpps<br>Vps, Wps      | vandps<br>Vps, Hps, Wps | vandnps<br>Vps, Hps, Wps | vorps<br>Vps, Hps, Wps                                          | vxorps<br>Vps, Hps, Wps                                  |
| 5 | 66  | vmovmskpd<br>Gy,Upd      | vsqrtpd<br>Vpd, Wpd      |                                                 |                         | vandpd<br>Vpd, Hpd, Wpd | vandnpd<br>Vpd, Hpd, Wpd | vorpd<br>Vpd, Hpd, Wpd                                          | vxorpd<br>Vpd, Hpd, Wpd                                  |
|   | F3  |                          | vsqrtss<br>Vss, Hss, Wss | vrsqrtss<br>Vss, Hss, Wss                       | vrcpss<br>Vss, Hss, Wss |                         |                          |                                                                 |                                                          |
|   | F2  |                          | vsqrtsd<br>Vsd, Hsd, Wsd |                                                 |                         |                         |                          |                                                                 |                                                          |
|   |     | punpcklbw<br>Pq, Qd      | punpcklwd<br>Pq, Qd      | punpckldq<br>Pq, Qd                             | packsswb<br>Pq, Qq      | pcmpgtb<br>Pq, Qq       | pcmpgtw<br>Pq, Qq        | pcmpgtd<br>Pq, Qq                                               | packuswb<br>Pq, Qq                                       |
| 6 | 66  | vpunpcklbw<br>Vx, Hx, Wx | vpunpcklwd<br>Vx, Hx, Wx | vpunpckldq<br>Vx, Hx, Wx                        | vpacksswb<br>Vx, Hx, Wx | vpcmpgtb<br>Vx, Hx, Wx  | vpcmpgtw<br>Vx, Hx, Wx   | vpcmpgtd<br>Vx, Hx, Wx                                          | vpackuswb<br>Vx, Hx, Wx                                  |
|   | F3  |                          |                          |                                                 |                         |                         |                          |                                                                 |                                                          |
|   |     | pshufw<br>Pq, Qq, Ib     | (Grp 12 <sup>1A</sup> )  | (Grp 13 <sup>1A</sup> )                         | (Grp 14 <sup>1A</sup> ) | pcmpeqb<br>Pq, Qq       | pcmpeqw<br>Pq, Qq        | pcmpeqd<br>Pq, Qq                                               | emms<br>vzeroupper <sup>v</sup><br>vzeroall <sup>v</sup> |
| 7 | 66  | vpshufd<br>Vx, Wx, Ib    |                          |                                                 |                         | vpcmpeqb<br>Vx, Hx, Wx  | vpcmpeqw<br>Vx, Hx, Wx   | vpcmpeqd<br>Vx, Hx, Wx                                          |                                                          |
|   | F3  | vpshufhw<br>Vx, Wx, Ib   |                          |                                                 |                         |                         |                          |                                                                 |                                                          |
|   | F2  | vpshuflw<br>Vx, Wx, Ib   |                          |                                                 |                         |                         |                          |                                                                 |                                                          |

# Table A-3 Two-byte Opcode Map: 00H — 77H (First Byte is 0FH) \*
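The row and column labels of the map correspond to the high and low nibble of the opcode byte that follows 0FH, and the map is printed in four parts split at row 8 and column 8. The following Python sketch illustrates that indexing; the function name `locate_two_byte_opcode` and the returned dictionary layout are illustrative assumptions, not part of the manual.

```python
def locate_two_byte_opcode(second_byte: int) -> dict:
    """Split the byte after 0FH into the map's row and column indices."""
    if not 0 <= second_byte <= 0xFF:
        raise ValueError("opcode byte must be in the range 00H..FFH")
    row = second_byte >> 4      # high nibble selects the table row
    col = second_byte & 0x0F    # low nibble selects the table column
    # Table A-3 is printed in four parts, split at row 8 and column 8.
    first = (0x80 if row >= 8 else 0x00) | (0x08 if col >= 8 else 0x00)
    last = first + 0x77
    return {
        "row": f"{row:X}",
        "col": f"{col:X}",
        "part": f"{first:02X}H-{last:02X}H",
    }


if __name__ == "__main__":
    # 0F 05 (SYSCALL) sits in row 0, column 5 of the first part.
    print(locate_two_byte_opcode(0x05))
    # 0F A2 (CPUID) sits in row A, column 2 of the third part.
    print(locate_two_byte_opcode(0xA2))
```

For example, the second byte A2H resolves to row A, column 2, which is the CPUID entry in the 80H to F7H part of the table.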



|   | pfx | 8 | 9 | A | B | C | D | E | F |
|---|-----|---------------------------------------------------|--------------------------|------------------------------|------------------------------------------------|---------------------------|---------------------------|-------------------------|--------------------------|
| 0 |     | INVD                                              | WBINVD                   |                              | 2-byte Illegal<br>Opcodes<br>UD2 <sup>1B</sup> |                           | NOP Ev                    |                         |                          |
| 1 |     | Prefetch <sup>1C</sup><br>(Grp 16 <sup>1A</sup> ) |                          |                              |                                                |                           |                           |                         | NOP Ev                   |
|   |     | vmovaps<br>Vps, Wps | vmovaps<br>Wps, Vps | cvtpi2ps<br>Vps, Qpi | vmovntps<br>Mps, Vps | cvttps2pi<br>Ppi, Wps | cvtps2pi<br>Ppi, Wps | vucomiss<br>Vss, Wss | vcomiss<br>Vss, Wss |
| 2 | 66  | vmovapd<br>Vpd, Wpd | vmovapd<br>Wpd, Vpd | cvtpi2pd<br>Vpd, Qpi | vmovntpd<br>Mpd, Vpd | cvttpd2pi<br>Ppi, Wpd | cvtpd2pi<br>Qpi, Wpd | vucomisd<br>Vsd, Wsd | vcomisd<br>Vsd, Wsd |
|   | F3  |  |  | vcvtsi2ss<br>Vss, Hss, Ey |  | vcvttss2si<br>Gy, Wss | vcvtss2si<br>Gy, Wss |  |  |
|   | F2  |                                                   |                          | vcvtsi2sd<br>Vsd, Hsd, Ey    |                                                | vcvttsd2si<br>Gy, Wsd     | vcvtsd2si<br>Gy, Wsd      |                         |                          |
| 3 |     | 3-byte escape<br>(Table A-4)                      |                          | 3-byte escape<br>(Table A-5) |                                                |                           |                           |                         |                          |
|   |     | CMOVcc(Gv, Ev) - Conditional Move |  |  |  |  |  |  |  |
| 4 |     | S | NS | P/PE | NP/PO | L/NGE | NL/GE | LE/NG | NLE/G |
|   |     | vaddps<br>Vps, Hps, Wps | vmulps<br>Vps, Hps, Wps | vcvtps2pd<br>Vpd, Wps | vcvtdq2ps<br>Vps, Wdq | vsubps<br>Vps, Hps, Wps | vminps<br>Vps, Hps, Wps | vdivps<br>Vps, Hps, Wps | vmaxps<br>Vps, Hps, Wps |
| 5 | 66  | vaddpd<br>Vpd, Hpd, Wpd | vmulpd<br>Vpd, Hpd, Wpd | vcvtpd2ps<br>Vps, Wpd | vcvtps2dq<br>Vdq, Wps | vsubpd<br>Vpd, Hpd, Wpd | vminpd<br>Vpd, Hpd, Wpd | vdivpd<br>Vpd, Hpd, Wpd | vmaxpd<br>Vpd, Hpd, Wpd |
|   | F3  | vaddss<br>Vss, Hss, Wss | vmulss<br>Vss, Hss, Wss | vcvtss2sd<br>Vsd, Hx, Wss | vcvttps2dq<br>Vdq, Wps | vsubss<br>Vss, Hss, Wss | vminss<br>Vss, Hss, Wss | vdivss<br>Vss, Hss, Wss | vmaxss<br>Vss, Hss, Wss |
|   | F2  | vaddsd<br>Vsd, Hsd, Wsd | vmulsd<br>Vsd, Hsd, Wsd | vcvtsd2ss<br>Vss, Hx, Wsd |  | vsubsd<br>Vsd, Hsd, Wsd | vminsd<br>Vsd, Hsd, Wsd | vdivsd<br>Vsd, Hsd, Wsd | vmaxsd<br>Vsd, Hsd, Wsd |
|   |     | punpckhbw<br>Pq, Qd                               | punpckhwd<br>Pq, Qd      | punpckhdq<br>Pq, Qd          | packssdw<br>Pq, Qd                             |                           |                           | movd/q<br>Pd, Ey        | movq<br>Pq, Qq           |
| 6 | 66  | vpunpckhbw<br>Vx, Hx, Wx | vpunpckhwd<br>Vx, Hx, Wx | vpunpckhdq<br>Vx, Hx, Wx | vpackssdw<br>Vx, Hx, Wx | vpunpcklqdq<br>Vx, Hx, Wx | vpunpckhqdq<br>Vx, Hx, Wx | vmovd/q<br>Vy, Ey | vmovdqa<br>Vx, Wx |
|   | F3  |  |  |  |  |  |  |  | vmovdqu<br>Vx, Wx |
|   |     | VMREAD<br>Ey, Gy | VMWRITE<br>Gy, Ey |  |  |  |  | movd/q<br>Ey, Pd | movq<br>Qq, Pq |
| 7 | 66  |  |  |  |  | vhaddpd<br>Vpd, Hpd, Wpd | vhsubpd<br>Vpd, Hpd, Wpd | vmovd/q<br>Ey, Vy | vmovdqa<br>Wx, Vx |
|   | F3  |  |  |  |  |  |  | vmovq<br>Vq, Wq | vmovdqu<br>Wx, Vx |
|   | F2  |  |  |  |  | vhaddps<br>Vps, Hps, Wps | vhsubps<br>Vps, Hps, Wps |  |  |

# Table A-3. Two-byte Opcode Map: 08H — 7FH (First Byte is 0FH) \*
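Several rows carry up to four entries per opcode, selected by the value in the "pfx" column (none, 66, F3, or F2). The sketch below illustrates that selection for two cells of row 5 as they appear in the table; the dictionary `ROW_5` and the function `lookup` are hypothetical names, and the sketch assumes exactly one of the listed prefixes (or none) is supplied.

```python
from typing import Optional

# Entries copied from row 5 of the 08H-7FH part of Table A-3.
# Key: (second opcode byte, value from the "pfx" column); None means no prefix.
ROW_5 = {
    (0x59, None): "vmulps",
    (0x59, 0x66): "vmulpd",
    (0x59, 0xF3): "vmulss",
    (0x59, 0xF2): "vmulsd",
    (0x5B, None): "vcvtdq2ps",
    (0x5B, 0x66): "vcvtps2dq",
    (0x5B, 0xF3): "vcvttps2dq",
    # (0x5B, 0xF2) is blank in the map, so it is deliberately absent here.
}


def lookup(second_byte: int, prefix: Optional[int] = None) -> Optional[str]:
    """Return the mnemonic for a cell, or None when the cell is blank."""
    return ROW_5.get((second_byte, prefix))


if __name__ == "__main__":
    print(lookup(0x59, 0xF2))  # F2 0F 59
    print(lookup(0x5B, 0xF2))  # blank cell, reported as None
```

Returning `None` for a blank cell mirrors the note that blanks in the opcode maps are reserved; a real decoder would treat such an encoding as undefined rather than guess.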



|   | pfx | 0                          | 1                        | 2                        | 3                    | 4                           | 5                      | 6                         | 7                     |
|---|-----|----------------------------|--------------------------|--------------------------|----------------------|-----------------------------|------------------------|---------------------------|-----------------------|
|   |     | Jcc <sup>f64</sup>, Jz - Long-displacement jump on condition |  |  |  |  |  |  |  |
| 8 |     | O | NO | B/C/NAE | AE/NB/NC | E/Z | NE/NZ | BE/NA | A/NBE |
|   |     | SETcc, Eb - Byte Set on condition |  |  |  |  |  |  |  |
| 9 |     | O | NO | B/C/NAE | AE/NB/NC | E/Z | NE/NZ | BE/NA | A/NBE |
| A |     | PUSH <sup>d64</sup><br>FS  | POP <sup>d64</sup><br>FS | CPUID                    | BT<br>Ev, Gv         | SHLD<br>Ev, Gv, Ib          | SHLD<br>Ev, Gv, CL     |                           |                       |
| B |     | CMPXCHG<br>Eb, Gb | CMPXCHG<br>Ev, Gv | LSS<br>Gv, Mp | BTR<br>Ev, Gv | LFS<br>Gv, Mp | LGS<br>Gv, Mp | MOVZX<br>Gv, Eb | MOVZX<br>Gv, Ew |
|   |     | XADD<br>Eb, Gb             | XADD<br>Ev, Gv           | vcmpps<br>Vps,Hps,Wps,Ib | movnti<br>My, Gy     | pinsrw<br>Pq,Ry/Mw,Ib       | pextrw<br>Gd, Nq, Ib   | vshufps<br>Vps,Hps,Wps,Ib | Grp 9 <sup>1A</sup>   |
| C | 66  |  |  | vcmppd<br>Vpd,Hpd,Wpd,Ib |  | vpinsrw<br>Vdq,Hdq,Ry/Mw,Ib | vpextrw<br>Gd, Udq, Ib | vshufpd<br>Vpd,Hpd,Wpd,Ib |  |
|   | F3  |  |  | vcmpss<br>Vss,Hss,Wss,Ib |  |  |  |  |  |
|   | F2  |                            |                          | vcmpsd<br>Vsd,Hsd,Wsd,Ib |                      |                             |                        |                           |                       |
|   |     |  | psrlw<br>Pq, Qq | psrld<br>Pq, Qq | psrlq<br>Pq, Qq | paddq<br>Pq, Qq | pmullw<br>Pq, Qq |  | pmovmskb<br>Gd, Nq |
| D | 66  | vaddsubpd<br>Vpd, Hpd, Wpd | vpsrlw<br>Vx, Hx, Wx | vpsrld<br>Vx, Hx, Wx | vpsrlq<br>Vx, Hx, Wx | vpaddq<br>Vx, Hx, Wx | vpmullw<br>Vx, Hx, Wx | vmovq<br>Wq, Vq | vpmovmskb<br>Gd, Ux |
|   | F3  |  |  |  |  |  |  | movq2dq<br>Vdq, Nq |  |
|   | F2  | vaddsubps<br>Vps, Hps, Wps |                          |                          |                      |                             |                        | movdq2q<br>Pq, Uq         |                       |
|   |     | pavgb<br>Pq, Qq            | psraw<br>Pq, Qq          | psrad<br>Pq, Qq          | pavgw<br>Pq, Qq      | pmulhuw<br>Pq, Qq           | pmulhw<br>Pq, Qq       |                           | movntq<br>Mq, Pq      |
| E | 66  | vpavgb<br>Vx, Hx, Wx | vpsraw<br>Vx, Hx, Wx | vpsrad<br>Vx, Hx, Wx | vpavgw<br>Vx, Hx, Wx | vpmulhuw<br>Vx, Hx, Wx | vpmulhw<br>Vx, Hx, Wx | vcvttpd2dq<br>Vx, Wpd | vmovntdq<br>Mx, Vx |
|   | F3  |  |  |  |  |  |  | vcvtdq2pd<br>Vx, Wpd |  |
|   | F2  |                            |                          |                          |                      |                             |                        | vcvtpd2dq<br>Vx, Wpd      |                       |
|   |     |  | psllw<br>Pq, Qq | pslld<br>Pq, Qq | psllq<br>Pq, Qq | pmuludq<br>Pq, Qq | pmaddwd<br>Pq, Qq | psadbw<br>Pq, Qq | maskmovq<br>Pq, Nq |
| F | 66  |  | vpsllw<br>Vx, Hx, Wx | vpslld<br>Vx, Hx, Wx | vpsllq<br>Vx, Hx, Wx | vpmuludq<br>Vx, Hx, Wx | vpmaddwd<br>Vx, Hx, Wx | vpsadbw<br>Vx, Hx, Wx | vmaskmovdqu<br>Vdq, Udq |
|   | F2  | vlddqu<br>Vx, Mx           |                          |                          |                      |                             |                        |                           |                       |

# Table A-3. Two-byte Opcode Map: 80H — F7H (First Byte is 0FH) \*
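Rows 4, 8, and 9 of Table A-3 (CMOVcc, Jcc, and SETcc) share the same sixteen condition labels, indexed by the low nibble of the opcode byte. A minimal Python sketch of that shared lookup follows; the names `CONDITIONS`, `FAMILIES`, and `describe_conditional` are illustrative assumptions, and the labels are copied from the table cells.

```python
from typing import Optional, Tuple

# Condition labels for columns 0..F, as listed in rows 4, 8 and 9 of Table A-3.
CONDITIONS = [
    "O", "NO", "B/C/NAE", "AE/NB/NC", "E/Z", "NE/NZ", "BE/NA", "A/NBE",
    "S", "NS", "P/PE", "NP/PO", "L/NGE", "NL/GE", "LE/NG", "NLE/G",
]

# Rows (high nibble of the byte after 0FH) that hold conditional instructions.
FAMILIES = {0x4: "CMOVcc", 0x8: "Jcc", 0x9: "SETcc"}


def describe_conditional(second_byte: int) -> Optional[Tuple[str, str]]:
    """Return (family, condition) for 0F 4x/8x/9x, or None for other rows."""
    family = FAMILIES.get(second_byte >> 4)
    if family is None:
        return None
    return family, CONDITIONS[second_byte & 0x0F]


if __name__ == "__main__":
    print(describe_conditional(0x84))  # Jcc with condition E/Z
    print(describe_conditional(0x9F))  # SETcc with condition NLE/G
```

Keeping one condition list for all three rows reflects the layout of the map, where the same column always carries the same condition.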



|   | pfx | 8 | 9 | A | B | C | D | E | F |
|---|-----|-------------------------------------------|---------------------------------------------------------|-------------------------------|-----------------------|-----------------------|-----------------------|---------------------------------------|--------------------|
|   |     | Jcc <sup>f64</sup>, Jz - Long-displacement jump on condition |  |  |  |  |  |  |  |
| 8 |     | S | NS | P/PE | NP/PO | L/NGE | NL/GE | LE/NG | NLE/G |
|   |     | SETcc, Eb - Byte Set on condition |  |  |  |  |  |  |  |
| 9 |     | S                                         | NS                                                      | P/PE                          | NP/PO                 | L/NGE                 | NL/GE                 | LE/NG                                 | NLE/G              |
| A |     | PUSH <sup>d64</sup><br>GS | POP <sup>d64</sup><br>GS | RSM | BTS<br>Ev, Gv | SHRD<br>Ev, Gv, Ib | SHRD<br>Ev, Gv, CL | (Grp 15 <sup>1A</sup> ) <sup>1C</sup> | IMUL<br>Gv, Ev |
| B |     | JMPE<br>(reserved for<br>emulator on IPF) | Grp 10 <sup>1A</sup><br>Invalid<br>Opcode <sup>1B</sup> | Grp 8 <sup>1A</sup><br>Ev, Ib | BTC<br>Ev, Gv | BSF<br>Gv, Ev | BSR<br>Gv, Ev | MOVSX<br>Gv, Eb | MOVSX<br>Gv, Ew |
|   | F3  | POPCNT<br>Gv, Ev |  |  |  | TZCNT<br>Gv, Ev | LZCNT<br>Gv, Ev |  |  |
|   |     | BSWAP |  |  |  |  |  |  |  |
| C |     | RAX/EAX/<br>R8/R8D | RCX/ECX/<br>R9/R9D | RDX/EDX/<br>R10/R10D | RBX/EBX/<br>R11/R11D | RSP/ESP/<br>R12/R12D | RBP/EBP/<br>R13/R13D | RSI/ESI/<br>R14/R14D | RDI/EDI/<br>R15/R15D |
|   |     | psubusb<br>Pq, Qq | psubusw<br>Pq, Qq | pminub<br>Pq, Qq | pand<br>Pq, Qq | paddusb<br>Pq, Qq | paddusw<br>Pq, Qq | pmaxub<br>Pq, Qq | pandn<br>Pq, Qq |
| D | 66  | vpsubusb<br>Vx, Hx, Wx | vpsubusw<br>Vx, Hx, Wx | vpminub<br>Vx, Hx, Wx | vpand<br>Vx, Hx, Wx | vpaddusb<br>Vx, Hx, Wx | vpaddusw<br>Vx, Hx, Wx | vpmaxub<br>Vx, Hx, Wx | vpandn<br>Vx, Hx, Wx |
|   | F3  |  |  |  |  |  |  |  |  |
|   | F2  |                                           |                                                         |                               |                       |                       |                       |                                       |                    |
|   |     | psubsb<br>Pq, Qq                          | psubsw<br>Pq, Qq                                        | pminsw<br>Pq, Qq              | por<br>Pq, Qq         | paddsb<br>Pq, Qq      | paddsw<br>Pq, Qq      | pmaxsw<br>Pq, Qq                      | pxor<br>Pq, Qq     |
| E | 66  | vpsubsb<br>Vx, Hx, Wx | vpsubsw<br>Vx, Hx, Wx | vpminsw<br>Vx, Hx, Wx | vpor<br>Vx, Hx, Wx | vpaddsb<br>Vx, Hx, Wx | vpaddsw<br>Vx, Hx, Wx | vpmaxsw<br>Vx, Hx, Wx | vpxor<br>Vx, Hx, Wx |
|   | F3  |  |  |  |  |  |  |  |  |
|   | F2  |                                           |                                                         |                               |                       |                       |                       |                                       |                    |
|   |     | psubb<br>Pq, Qq                           | psubw<br>Pq, Qq                                         | psubd<br>Pq, Qq               | psubq<br>Pq, Qq       | paddb<br>Pq, Qq       | paddw<br>Pq, Qq       | paddd<br>Pq, Qq                       |                    |
| F | 66  | vpsubb<br>Vx, Hx, Wx                      | vpsubw<br>Vx, Hx, Wx                                    | vpsubd<br>Vx, Hx, Wx          | vpsubq<br>Vx, Hx, Wx  | vpaddb<br>Vx, Hx, Wx  | vpaddw<br>Vx, Hx, Wx  | vpaddd<br>Vx, Hx, Wx                  |                    |
|   | F2  |                                           |                                                         |                               |                       |                       |                       |                                       |                    |

# Table A-3. Two-byte Opcode Map: 88H — FFH (First Byte is 0FH) \*

NOTES:

\* All blanks in all opcode maps are reserved and must not be used. Do not depend on the operation of undefined or reserved locations.
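Row C of the 88H to FFH part of Table A-3 lists four register names per BSWAP opcode (for example RAX/EAX/R8/R8D for 0F C8). The sketch below shows one way to pick among them. It assumes, as background not stated in this table, that REX.W selects the 64-bit name and REX.B selects the extended registers R8 through R15; the function name `bswap_register` is hypothetical.

```python
# Base register names for columns 8..F of row C (low three bits 0..7).
LEGACY_NAMES = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]


def bswap_register(second_byte: int, rex_b: bool = False, rex_w: bool = False) -> str:
    """Name the BSWAP operand for opcode 0F C8+r.

    Assumption: REX.B extends the register index to R8..R15 and
    REX.W selects the 64-bit operand size.
    """
    if not 0xC8 <= second_byte <= 0xCF:
        raise ValueError("BSWAP occupies 0F C8H through 0F CFH")
    index = second_byte & 0x07  # low three bits encode the register
    if rex_b:
        name = f"R{8 + index}"
        return name if rex_w else name + "D"
    return ("R" if rex_w else "E") + LEGACY_NAMES[index]


if __name__ == "__main__":
    print(bswap_register(0xC8))                          # 32-bit, no REX
    print(bswap_register(0xCF, rex_b=True, rex_w=True))  # 64-bit, extended
```

The four names in each table cell map onto the four combinations of the two flags, which is why the sketch needs no per-opcode table beyond the eight base names.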



|        | pfx        | 0                                    | 1                                    | 2                                     | 3                                     | 4                        | 5                                    | 6                                          | 7                                       |
|--------|------------|--------------------------------------|--------------------------------------|---------------------------------------|---------------------------------------|--------------------------|--------------------------------------|--------------------------------------------|-----------------------------------------|
| 0      |            | pshufb<br>Pq, Qq                     | phaddw<br>Pq, Qq                     | phaddd<br>Pq, Qq                      | phaddsw<br>Pq, Qq                     | pmaddubsw<br>Pq, Qq      | phsubw<br>Pq, Qq                     | phsubd<br>Pq, Qq                           | phsubsw<br>Pq, Qq                       |
|        | 66         | vpshufb<br>Vx, Hx, Wx | vphaddw<br>Vx, Hx, Wx | vphaddd<br>Vx, Hx, Wx | vphaddsw<br>Vx, Hx, Wx | vpmaddubsw<br>Vx, Hx, Wx | vphsubw<br>Vx, Hx, Wx | vphsubd<br>Vx, Hx, Wx | vphsubsw<br>Vx, Hx, Wx |
| 1      | 66         | pblendvb<br>Vdq, Wdq                 |                                      |                                       | vcvtph2ps <sup>v</sup><br>Vx, Wx, Ib  | blendvps<br>Vdq, Wdq     | blendvpd<br>Vdq, Wdq                 | vpermps <sup>v</sup><br>Vqq, Hqq, Wqq      | vptest<br>Vx, Wx                        |
| 2      | 66         | vpmovsxbw<br>Vx, Ux/Mq               | vpmovsxbd<br>Vx, Ux/Md               | vpmovsxbq<br>Vx, Ux/Mw                | vpmovsxwd<br>Vx, Ux/Mq                | vpmovsxwq<br>Vx, Ux/Md   | vpmovsxdq<br>Vx, Ux/Mq               |                                            |                                         |
| 3      | 66         | vpmovzxbw<br>Vx, Ux/Mq               | vpmovzxbd<br>Vx, Ux/Md               | vpmovzxbq<br>Vx, Ux/Mw                | vpmovzxwd<br>Vx, Ux/Mq                | vpmovzxwq<br>Vx, Ux/Md   | vpmovzxdq<br>Vx, Ux/Mq               | vpermd <sup>v</sup><br>Vqq, Hqq, Wqq       | vpcmpgtq<br>Vx, Hx, Wx                  |
| 4      | 66         | vpmulld<br>Vx, Hx, Wx                | vphminposuw<br>Vdq, Wdq              |                                       |                                       |                          | vpsrlvd/q <sup>v</sup><br>Vx, Hx, Wx | vpsravd <sup>v</sup><br>Vx, Hx, Wx         | vpsllvd/q <sup>v</sup><br>Vx, Hx, Wx    |
| 5      |            |                                      |                                      |                                       |                                       |                          |                                      |                                            |                                         |
| 6      |            |                                      |                                      |                                       |                                       |                          |                                      |                                            |                                         |
| 8      | 66         | INVEPT<br>Gy, Mdg                    | INVVPID<br>Gy, Mdg                   | INVPCID<br>Gy, Mdq                    |                                       |                          |                                      |                                            |                                         |
| 9      | 66         | vgatherdd/q <sup>v</sup><br>Vx,Hx,Wx | vgatherqd/q <sup>v</sup><br>Vx,Hx,Wx | vgatherdps/d <sup>v</sup><br>Vx,Hx,Wx | vgatherqps/d <sup>v</sup><br>Vx,Hx,Wx |                          |                                      | vfmaddsub132ps/d <sup>v</sup><br>Vx,Hx,Wx  | vfmsubadd132ps/d <sup>v</sup><br>Vx,Hx,Wx |
| A      | 66         |                                      |                                      |                                       |                                       |                          |                                      | vfmaddsub213ps/d <sup>v</sup><br>Vx,Hx,Wx  | vfmsubadd213ps/d <sup>v</sup><br>Vx,Hx,Wx |
| B      | 66         |                                      |                                      |                                       |                                       |                          |                                      | vfmaddsub231ps/d <sup>v</sup><br>Vx,Hx,Wx  | vfmsubadd231ps/d <sup>v</sup><br>Vx,Hx,Wx |
| C      |            |                                      |                                      |                                       |                                       |                          |                                      |                                            |                                         |
| D      |            |                                      |                                      |                                       |                                       |                          |                                      |                                            |                                         |
| E      |            |                                      |                                      |                                       |                                       |                          |                                      |                                            |                                         |
|        |            | MOVBE<br>Gy, My                      | MOVBE<br>My, Gy                      | ANDN <sup>v</sup><br>Gy, By, Ey       |                                       |                          | BZHI <sup>v</sup><br>Gy, Ey, By      |                                            | BEXTR <sup>v</sup><br>Gy, Ey, By        |
|        | 66         | MOVBE<br>Gw, Mw                      | MOVBE<br>Mw, Gw                      |                                       |                                       |                          |                                      |                                            | SHLX <sup>v</sup><br>Gy, Ey, By         |
| F      | F3         |                                      |                                      |                                       | Grp 17 <sup>1A</sup>                  |                          | PEXT <sup>v</sup><br>Gy, By, Ey      |                                            | SARX <sup>v</sup><br>Gy, Ey, By         |
|        | F2         | CRC32<br>Gd, Eb                      | CRC32<br>Gd, Ey                      |                                       |                                       |                          | PDEP <sup>v</sup><br>Gy, By, Ey      | MULX <sup>v</sup><br>By,Gy,rDX,Ey          | SHRX <sup>v</sup><br>Gy, Ey, By         |
|        | 66 &<br>F2 | CRC32<br>Gd, Eb                      | CRC32<br>Gd, Ew                      |                                       |                                       |                          |                                      |                                            |                                         |

## Table A-4 Three-byte Opcode Map: 00H — F7H (First Two Bytes are 0F 38H) \*



|   | pfx        | 8                                        | 9                                        | A                                        | B                                        | C                                         | D                                         | E                                         | F                              |
|---|------------|------------------------------------------|------------------------------------------|------------------------------------------|------------------------------------------|-------------------------------------------|-------------------------------------------|-------------------------------------------|--------------------------------|
| 0 |            | psignb<br>Pq, Qq                         | psignw<br>Pq, Qq                         | psignd<br>Pq, Qq                         | pmulhrsw<br>Pq, Qq                       |                                           |                                           |                                           |                                |
|   | 66         | vpsignb<br>Vx, Hx, Wx                    | vpsignw<br>Vx, Hx, Wx                    | vpsignd<br>Vx, Hx, Wx                    | vpmulhrsw<br>Vx, Hx, Wx                  | vpermilps <sup>v</sup><br>Vx,Hx,Wx        | vpermilpd <sup>v</sup><br>Vx,Hx,Wx        | vtestps <sup>v</sup><br>Vx, Wx            | vtestpd <sup>v</sup><br>Vx, Wx |
| 1 |            |                                          |                                          |                                          |                                          | pabsb<br>Pq, Qq                           | pabsw<br>Pq, Qq                           | pabsd<br>Pq, Qq                           |                                |
|   | 66         | vbroadcastss <sup>v</sup><br>Vx, Wd      | vbroadcastsd <sup>v</sup><br>Vqq, Wq     | vbroadcastf128 <sup>v</sup><br>Vqq, Mdq  |                                          | vpabsb<br>Vx, Wx                          | vpabsw<br>Vx, Wx                          | vpabsd<br>Vx, Wx                          |                                |
| 2 | 66         | vpmuldq<br>Vx, Hx, Wx                    | vpcmpeqq<br>Vx, Hx, Wx                   | vmovntdqa<br>Vx, Mx                      | vpackusdw<br>Vx, Hx, Wx                  | vmaskmovps <sup>v</sup><br>Vx,Hx,Mx       | vmaskmovpd <sup>v</sup><br>Vx,Hx,Mx       | vmaskmovps <sup>v</sup><br>Mx,Hx,Vx       | vmaskmovpd <sup>v</sup><br>Mx,Hx,Vx |
| 3 | 66         | vpminsb<br>Vx, Hx, Wx                    | vpminsd<br>Vx, Hx, Wx                    | vpminuw<br>Vx, Hx, Wx                    | vpminud<br>Vx, Hx, Wx                    | vpmaxsb<br>Vx, Hx, Wx                     | vpmaxsd<br>Vx, Hx, Wx                     | vpmaxuw<br>Vx, Hx, Wx                     | vpmaxud<br>Vx, Hx, Wx          |
| 4 |            |                                          |                                          |                                          |                                          |                                           |                                           |                                           |                                |
| 5 | 66         | vpbroadcastd <sup>v</sup><br>Vx, Wx      | vpbroadcastq <sup>v</sup><br>Vx, Wx      | vbroadcasti128 <sup>v</sup><br>Vqq, Mdq  |                                          |                                           |                                           |                                           |                                |
| 6 |            |                                          |                                          |                                          |                                          |                                           |                                           |                                           |                                |
| 7 | 66         | vpbroadcastb <sup>v</sup><br>Vx, Wx      | vpbroadcastw <sup>v</sup><br>Vx, Wx      |                                          |                                          |                                           |                                           |                                           |                                |
| 8 | 66         |                                          |                                          |                                          |                                          | vpmaskmovd/q <sup>v</sup><br>Vx,Hx,Mx     |                                           | vpmaskmovd/q <sup>v</sup><br>Mx,Vx,Hx     |                                |
| 9 | 66         | vfmadd132ps/d <sup>v</sup><br>Vx, Hx, Wx | vfmadd132ss/d <sup>v</sup><br>Vx, Hx, Wx | vfmsub132ps/d <sup>v</sup><br>Vx, Hx, Wx | vfmsub132ss/d <sup>v</sup><br>Vx, Hx, Wx | vfnmadd132ps/d <sup>v</sup><br>Vx, Hx, Wx | vfnmadd132ss/d <sup>v</sup><br>Vx, Hx, Wx | vfnmsub132ps/d <sup>v</sup><br>Vx, Hx, Wx | vfnmsub132ss/d <sup>v</sup><br>Vx, Hx, Wx |
| A | 66         | vfmadd213ps/d <sup>v</sup><br>Vx, Hx, Wx | vfmadd213ss/d <sup>v</sup><br>Vx, Hx, Wx | vfmsub213ps/d <sup>v</sup><br>Vx, Hx, Wx | vfmsub213ss/d <sup>v</sup><br>Vx, Hx, Wx | vfnmadd213ps/d <sup>v</sup><br>Vx, Hx, Wx | vfnmadd213ss/d <sup>v</sup><br>Vx, Hx, Wx | vfnmsub213ps/d <sup>v</sup><br>Vx, Hx, Wx | vfnmsub213ss/d <sup>v</sup><br>Vx, Hx, Wx |
| B | 66         | vfmadd231ps/d <sup>v</sup><br>Vx, Hx, Wx | vfmadd231ss/d <sup>v</sup><br>Vx, Hx, Wx | vfmsub231ps/d <sup>v</sup><br>Vx, Hx, Wx | vfmsub231ss/d <sup>v</sup><br>Vx, Hx, Wx | vfnmadd231ps/d <sup>v</sup><br>Vx, Hx, Wx | vfnmadd231ss/d <sup>v</sup><br>Vx, Hx, Wx | vfnmsub231ps/d <sup>v</sup><br>Vx, Hx, Wx | vfnmsub231ss/d <sup>v</sup><br>Vx, Hx, Wx |
| C |            |                                          |                                          |                                          |                                          |                                           |                                           |                                           |                                |
| D | 66         |                                          |                                          |                                          | VAESIMC<br>Vdq, Wdq                      | VAESENC<br>Vdq,Hdq,Wdq                    | VAESENCLAST<br>Vdq,Hdq,Wdq                | VAESDEC<br>Vdq,Hdq,Wdq                    | VAESDECLAST<br>Vdq,Hdq,Wdq     |
| E |            |                                          |                                          |                                          |                                          |                                           |                                           |                                           |                                |
|   |            |                                          |                                          |                                          |                                          |                                           |                                           |                                           |                                |
|   | 66         |                                          |                                          |                                          |                                          |                                           |                                           |                                           |                                |
| F | F3         |                                          |                                          |                                          |                                          |                                           |                                           |                                           |                                |
|   | F2         |                                          |                                          |                                          |                                          |                                           |                                           |                                           |                                |
|   | 66 &<br>F2 |                                          |                                          |                                          |                                          |                                           |                                           |                                           |                                |

### Table A-4. Three-byte Opcode Map: 08H — FFH (First Two Bytes are 0F 38H) \*

NOTES:

> \* All blanks in all opcode maps are reserved and must not be used. Do not depend on the operation of undefined or reserved locations.
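As an illustration of how the map is read, the third opcode byte splits into a row (high hex digit) and a column (low hex digit), and the pfx column selects among entries that share that byte. The Python sketch below encodes only the F0H and F1H cells transcribed from Table A-4; the dictionary layout and the names `MAP_0F38`, `row_col`, and `lookup_0f38` are assumptions made for this example and are not defined by the manual.

```python
# Hypothetical excerpt of the 0F 38H map, keyed by (pfx, third opcode byte).
# Only the F0H and F1H cells of Table A-4 are included; None means no prefix.
MAP_0F38 = {
    (None, 0xF0): "MOVBE Gy, My",
    (None, 0xF1): "MOVBE My, Gy",
    ("66", 0xF0): "MOVBE Gw, Mw",
    ("66", 0xF1): "MOVBE Mw, Gw",
    ("F2", 0xF0): "CRC32 Gd, Eb",
    ("F2", 0xF1): "CRC32 Gd, Ey",
    ("66&F2", 0xF0): "CRC32 Gd, Eb",
    ("66&F2", 0xF1): "CRC32 Gd, Ew",
}


def row_col(opcode_byte):
    """Split an opcode byte into the (row, column) hex digits used by the maps."""
    return opcode_byte >> 4, opcode_byte & 0x0F


def lookup_0f38(prefix, opcode_byte):
    """Return the entry, or None if the cell is blank or not in this excerpt."""
    return MAP_0F38.get((prefix, opcode_byte))


row, col = row_col(0xF1)
print(f"row {row:X}, column {col:X}:", lookup_0f38("F2", 0xF1))
```

A blank cell is reserved, so a decoder built along these lines would treat a missing entry as an invalid opcode rather than falling back to another prefix row.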



|   | pfx | 0                                   | 1                                    | 2                                    | 3                          | 4                                    | 5                                    | 6                                         | 7                         |
|---|-----|-------------------------------------|--------------------------------------|--------------------------------------|----------------------------|--------------------------------------|--------------------------------------|-------------------------------------------|---------------------------|
| 0 | 66  | vpermq <sup>v</sup><br>Vqq, Wqq, Ib | vpermpd <sup>v</sup><br>Vqq, Wqq, Ib | vpblendd <sup>v</sup><br>Vx,Hx,Wx,Ib |                            | vpermilps <sup>v</sup><br>Vx, Wx, Ib | vpermilpd <sup>v</sup><br>Vx, Wx, Ib | vperm2f128 <sup>v</sup><br>Vqq,Hqq,Wqq,Ib |                           |
| 1 | 66  |                                     |                                      |                                      |                            | vpextrb<br>Rd/Mb, Vdq, Ib            | vpextrw<br>Rd/Mw, Vdq, Ib            | vpextrd/q<br>Ey, Vdq, Ib                  | vextractps<br>Ed, Vdq, Ib |
| 2 | 66  | vpinsrb<br>Vdq,Hdq, Ry/<br>Mb,Ib    | vinsertps<br>Vdq,Hdq, Udq/<br>Md,Ib  | vpinsrd/q<br>Vdq,Hdq,Ey,Ib           |                            |                                      |                                      |                                           |                           |
| 3 |     |                                     |                                      |                                      |                            |                                      |                                      |                                           |                           |
| 4 | 66  | vdpps<br>Vx,Hx,Wx,Ib                | vdppd<br>Vdq,Hdq,Wdq,Ib              | vmpsadbw<br>Vx,Hx,Wx,Ib              |                            | vpclmulqdq<br>Vdq,Hdq,Wdq,Ib         |                                      | vperm2i128 <sup>v</sup><br>Vqq,Hqq,Wqq,Ib |                           |
| 5 |     |                                     |                                      |                                      |                            |                                      |                                      |                                           |                           |
| 6 | 66  | vpcmpestrm<br>Vdq, Wdq, Ib          | vpcmpestri<br>Vdq, Wdq, Ib           | vpcmpistrm<br>Vdq, Wdq, Ib           | vpcmpistri<br>Vdq, Wdq, Ib |                                      |                                      |                                           |                           |
| 7 |     |                                     |                                      |                                      |                            |                                      |                                      |                                           |                           |
| 8 |     |                                     |                                      |                                      |                            |                                      |                                      |                                           |                           |
| 9 |     |                                     |                                      |                                      |                            |                                      |                                      |                                           |                           |
| А |     |                                     |                                      |                                      |                            |                                      |                                      |                                           |                           |
| В |     |                                     |                                      |                                      |                            |                                      |                                      |                                           |                           |
| С |     |                                     |                                      |                                      |                            |                                      |                                      |                                           |                           |
| D |     |                                     |                                      |                                      |                            |                                      |                                      |                                           |                           |
| Е |     |                                     |                                      |                                      |                            |                                      |                                      |                                           |                           |
| F | F2  | RORX <sup>v</sup><br>Gy, Ey, Ib     |                                      |                                      |                            |                                      |                                      |                                           |                           |

## Table A-5 Three-byte Opcode Map: 00H — F7H (First two bytes are 0F 3AH) \*




|        | pfx | 8                                          | 9                                       | A                                     | B                                     | C                                     | D                                    | E                       | F                          |
|--------|-----|--------------------------------------------|-----------------------------------------|---------------------------------------|---------------------------------------|---------------------------------------|--------------------------------------|-------------------------|----------------------------|
| 0      |     |                                            |                                         |                                       |                                       |                                       |                                      |                         | palignr<br>Pq, Qq, Ib      |
|        | 66  | vroundps<br>Vx,Wx,Ib                       | vroundpd<br>Vx,Wx,Ib                    | vroundss<br>Vss,Wss,Ib                | vroundsd<br>Vsd,Wsd,Ib                | vblendps<br>Vx,Hx,Wx,Ib               | vblendpd<br>Vx,Hx,Wx,Ib              | vpblendw<br>Vx,Hx,Wx,Ib | vpalignr<br>Vx,Hx,Wx,Ib    |
| 1      | 66  | vinsertf128 <sup>v</sup><br>Vqq,Hqq,Wqq,Ib | vextractf128 <sup>v</sup><br>Wdq,Vqq,Ib |                                       |                                       |                                       | vcvtps2ph <sup>v</sup><br>Wx, Vx, Ib |                         |                            |
| 2      |     |                                            |                                         |                                       |                                       |                                       |                                      |                         |                            |
| 3      | 66  | vinserti128 <sup>v</sup><br>Vqq,Hqq,Wqq,Ib | vextracti128 <sup>v</sup><br>Wdq,Vqq,Ib |                                       |                                       |                                       |                                      |                         |                            |
| 4      | 66  |                                            |                                         | vblendvps <sup>v</sup><br>Vx,Hx,Wx,Lx | vblendvpd <sup>v</sup><br>Vx,Hx,Wx,Lx | vpblendvb <sup>v</sup><br>Vx,Hx,Wx,Lx |                                      |                         |                            |
| 5      |     |                                            |                                         |                                       |                                       |                                       |                                      |                         |                            |
| 6      |     |                                            |                                         |                                       |                                       |                                       |                                      |                         |                            |
| 7      |     |                                            |                                         |                                       |                                       |                                       |                                      |                         |                            |
| 8      |     |                                            |                                         |                                       |                                       |                                       |                                      |                         |                            |
| 9      |     |                                            |                                         |                                       |                                       |                                       |                                      |                         |                            |
| A      |     |                                            |                                         |                                       |                                       |                                       |                                      |                         |                            |
| B      |     |                                            |                                         |                                       |                                       |                                       |                                      |                         |                            |
| C      |     |                                            |                                         |                                       |                                       |                                       |                                      |                         |                            |
| D      | 66  |                                            |                                         |                                       |                                       |                                       |                                      |                         | VAESKEYGEN<br>Vdq, Wdq, Ib |
| E      |     |                                            |                                         |                                       |                                       |                                       |                                      |                         |                            |
| F      |     |                                            |                                         |                                       |                                       |                                       |                                      |                         |                            |

### Table A-5. Three-byte Opcode Map: 08H — FFH (First Two Bytes are 0F 3AH) \*

NOTES:

\* All blanks in all opcode maps are reserved and must not be used. Do not depend on the operation of undefined or reserved locations.

• • •



|                                                   |       |             |     | Encoding of Bits 5,4,3 of the ModR/M Byte (bits 2,1,0 in parenthesis)     |                                     |                                                                                    |             |                           |                |                           |                                  |  |
|---------------------------------------------------|-------|-------------|-----|---------------------------------------------------------------------------|-------------------------------------|------------------------------------------------------------------------------------|-------------|---------------------------|----------------|---------------------------|----------------------------------|--|
| Opcode                                            | Group | Mod 7,6     | pfx | 000                                                                       | 001                                 | 010                                                                                | 011         | 100                       | 101            | 110                       | 111                              |  |
| 80-83                                             | 1     | mem,<br>11B |     | ADD                                                                       | OR                                  | ADC                                                                                | SBB         | AND                       | SUB            | XOR                       | CMP                              |  |
| 8F                                                | 1A    | mem,<br>11B |     | POP                                                                       |                                     |                                                                                    |             |                           |                |                           |                                  |  |
| C0,C1 reg, imm<br>D0, D1 reg, 1<br>D2, D3 reg, CL | 2     | mem,<br>11B |     | ROL                                                                       | ROR                                 | RCL                                                                                | RCR         | SHL/SAL                   | SHR            |                           | SAR                              |  |
| F6, F7                                            | 3     | mem,<br>11B |     | TEST<br>Ib/Iz                                                             |                                     | NOT                                                                                | NEG         | MUL<br>AL/rAX             | IMUL<br>AL/rAX | DIV<br>AL/rAX             | IDIV<br>AL/rAX                   |  |
| FE                                                | 4     | mem,<br>11B |     | INC<br>Eb                                                                 | DEC<br>Eb                           |                                                                                    |             |                           |                |                           |                                  |  |
| FF                                                | 5     | mem,<br>11B |     | INC<br>Ev                                                                 | DEC<br>Ev                           | CALLN <sup>f64</sup><br>Ev                                                         | CALLF<br>Ep | JMPN <sup>f64</sup><br>Ev | JMPF<br>Mp     | PUSH <sup>d64</sup><br>Ev |                                  |  |
| 0F 00                                             | 6     | mem,<br>11B |     | SLDT<br>Rv/Mw                                                             | STR<br>Rv/Mw                        | LLDT<br>Ew                                                                         | LTR<br>Ew   | VERR<br>Ew                | VERW<br>Ew     |                           |                                  |  |
| 0F 01                                             | 7     | mem         |     | SGDT<br>Ms                                                                | SIDT<br>Ms                          | LGDT<br>Ms                                                                         | LIDT<br>Ms  | SMSW<br>Mw/Rv             |                | LMSW<br>Ew                | INVLPG<br>Mb                     |  |
|                                                   |       | 11B         |     | VMCALL (001)<br>VMLAUNCH (010)<br>VMRESUME (011)<br>VMXOFF (100)          | MONITOR (000)<br>MWAIT (001)        | XGETBV (000)<br>XSETBV (001)<br>VMFUNC (100)<br>XEND (101)<br>XTEST (110)          |             |                           |                |                           | SWAPGS<sup>o64</sup> (000)<br>RDTSCP (001) |  |
| 0F BA                                             | 8     | mem,<br>11B |     |                                                                           |                                     |                                                                                    |             | BT                        | BTS            | BTR                       | BTC                              |  |
| 0F C7                                             | 9     | mem         |     |                                                                           | CMPXCH8B<br>Mq<br>CMPXCHG16B<br>Mdq |                                                                                    |             |                           |                | VMPTRLD<br>Mq             | VMPTRST<br>Mq                    |  |
|                                                   |       |             | 66  |                                                                           |                                     |                                                                                    |             |                           |                | VMCLEAR<br>Mq             |                                  |  |
|                                                   |       |             | F3  |                                                                           |                                     |                                                                                    |             |                           |                | VMXON<br>Mq               | VMPTRST<br>Mq                    |  |
|                                                   |       | 11B         |     |                                                                           |                                     |                                                                                    |             |                           |                | RDRAND<br>Rv              |                                  |  |
| 0F B9                                             | 10    | mem         |     |                                                                           |                                     |                                                                                    |             |                           |                |                           |                                  |  |
|                                                   |       | 11B         |     |                                                                           |                                     |                                                                                    |             |                           |                |                           |                                  |  |
| C6                                                | 11    | mem         |     | MOV<br>Eb, Ib                                                             |                                     |                                                                                    |             |                           |                |                           |                                  |  |
|                                                   |       | 11B         |     |                                                                           |                                     |                                                                                    |             |                           |                |                           | XABORT<br>(000) Ib               |  |
| C7                                                | 11    | mem         |     | MOV<br>Ev, Iz                                                             |                                     |                                                                                    |             |                           |                |                           |                                  |  |
|                                                   |       | 11B         |     |                                                                           |                                     |                                                                                    |             |                           |                |                           | XBEGIN<br>(000) Jz               |  |

### Table A-6 Opcode Extensions for One- and Two-byte Opcodes by Group Number \*

Encoding of Bits 5,4,3 of the ModR/M Byte (bits 2,1,0 in parentheses)

| Opcode      | Group | Mod 7,6 | pfx | 000             | 001                         | 010                           | 011                         | 100                | 101    | 110                | 111                 |
|-------------|-------|---------|-----|-----------------|-----------------------------|-------------------------------|-----------------------------|--------------------|--------|--------------------|---------------------|
| 0F 71       | 12    | mem     |     |                 |                             |                               |                             |                    |        |                    |                     |
|             |       | 11B     |     |                 |                             | psrlw<br>Nq, Ib               |                             | psraw<br>Nq, Ib    |        | psllw<br>Nq, Ib    |                     |
|             |       | 11B     | 66  |                 |                             | vpsrlw<br>Hx,Ux,Ib            |                             | vpsraw<br>Hx,Ux,Ib |        | vpsllw<br>Hx,Ux,Ib |                     |
| 0F 72       | 13    | mem     |     |                 |                             |                               |                             |                    |        |                    |                     |
|             |       | 11B     |     |                 |                             | psrld<br>Nq, Ib               |                             | psrad<br>Nq, Ib    |        | pslld<br>Nq, Ib    |                     |
|             |       | 11B     | 66  |                 |                             | vpsrld<br>Hx,Ux,Ib            |                             | vpsrad<br>Hx,Ux,Ib |        | vpslld<br>Hx,Ux,Ib |                     |
| 0F 73       | 14    | mem     |     |                 |                             |                               |                             |                    |        |                    |                     |
|             |       | 11B     |     |                 |                             | psrlq<br>Nq, Ib               |                             |                    |        | psllq<br>Nq, Ib    |                     |
|             |       | 11B     | 66  |                 |                             | vpsrlq<br>Hx,Ux,Ib            | vpsrldq<br>Hx,Ux,Ib         |                    |        | vpsllq<br>Hx,Ux,Ib | vpslldq<br>Hx,Ux,Ib |
| 0F AE       | 15    | mem     |     | fxsave          | fxrstor                     | ldmxcsr                       | stmxcsr                     | XSAVE              | XRSTOR | XSAVEOPT           | clflush             |
|             |       | 11B     |     |                 |                             |                               |                             |                    | lfence | mfence             | sfence              |
|             |       | 11B     | F3  | RDFSBASE<br>Ry  | RDGSBASE<br>Ry              | WRFSBASE<br>Ry                | WRGSBASE<br>Ry              |                    |        |                    |                     |
| 0F 18       | 16    | mem     |     | prefetch<br>NTA | prefetch<br>T0              | prefetch<br>T1                | prefetch<br>T2              |                    |        |                    |                     |
|             |       | 11B     |     |                 |                             |                               |                             |                    |        |                    |                     |
| VEX.0F38 F3 | 17    | mem     |     |                 | BLSR<sup>V</sup><br>By, Ey  | BLSMSK<sup>V</sup><br>By, Ey  | BLSI<sup>V</sup><br>By, Ey  |                    |        |                    |                     |
|             |       | 11B     |     |                 | BLSR<sup>V</sup><br>By, Ey  | BLSMSK<sup>V</sup><br>By, Ey  | BLSI<sup>V</sup><br>By, Ey  |                    |        |                    |                     |


NOTES:

\* All blanks in all opcode maps are reserved and must not be used. Do not depend on the operation of undefined or reserved locations.
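As a reading aid for Table A-6, the following C sketch splits a ModR/M byte into its three fields and looks up the Group 15 (opcode 0F AE, no prefix) rows of the table. The type and function names (`modrm_fields`, `modrm_decode`, `group15_lookup`) are hypothetical helpers introduced here for illustration only; the field positions assumed are Mod in bits 7:6, the opcode extension in bits 5:3, and r/m in bits 2:0.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of a ModR/M byte: mod = bits 7:6, reg/opcode = bits 5:3, r/m = bits 2:0. */
typedef struct {
    uint8_t mod;
    uint8_t reg;
    uint8_t rm;
} modrm_fields;

static modrm_fields modrm_decode(uint8_t modrm)
{
    modrm_fields f;
    f.mod = (uint8_t)(modrm >> 6);
    f.reg = (uint8_t)((modrm >> 3) & 7u);
    f.rm  = (uint8_t)(modrm & 7u);
    return f;
}

/* Group 15 (0F AE), no prefix, copied from Table A-6.
 * NULL marks a blank (reserved) table entry. */
static const char *const group15_mem[8] = {
    "fxsave", "fxrstor", "ldmxcsr", "stmxcsr",
    "XSAVE", "XRSTOR", "XSAVEOPT", "clflush"
};
static const char *const group15_11b[8] = {
    NULL, NULL, NULL, NULL, NULL, "lfence", "mfence", "sfence"
};

/* Select the row by Mod (11B or memory form), then the column by bits 5:3. */
static const char *group15_lookup(uint8_t modrm)
{
    modrm_fields f = modrm_decode(modrm);
    return (f.mod == 3u) ? group15_11b[f.reg] : group15_mem[f.reg];
}
```

For example, the byte sequence 0F AE E8 has ModR/M = E8H (Mod = 11B, bits 5:3 = 101), which selects the lfence entry. The sketch does not model the F3-prefixed row of the group.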

#### 6. Updates to Appendix B, Volume 2C

Change bars show changes to Appendix B of the *Intel<sup>®</sup> 64 and IA-32 Architectures Software Developer's Manual, Volume 2C: Instruction Set Reference*.

\_\_\_\_\_

...

...

## B.2 GENERAL-PURPOSE INSTRUCTION FORMATS AND ENCODINGS FOR NON-64-BIT MODES

Table B-13 shows machine instruction formats and encodings for general purpose instructions in non-64-bit modes.

## Table B-13 General Purpose Instruction Formats and Encodings for Non-64-Bit Modes

| Instruction and Format                     | Encoding                                 |
|--------------------------------------------|------------------------------------------|
| MOV – Move Data                            |                                          |
| register1 to register2                     | 1000 100w : 11 reg1 reg2                 |
| register2 to register1                     | 1000 101w : 11 reg1 reg2                 |
| memory to reg                              | 1000 101w : mod reg r/m                  |
| reg to memory                              | 1000 100w : mod reg r/m                  |
| immediate to register                      | 1100 011w : 11 000 reg : immediate data  |
| immediate to register (alternate encoding) | 1011 w reg : immediate data              |
| immediate to memory                        | 1100 011w : mod 000 r/m : immediate data |
| memory to AL, AX, or EAX                   | 1010 000w : full displacement            |
| AL, AX, or EAX to memory                   | 1010 001w : full displacement            |
| MOV – Move to/from Control Registers       |                                          |
| CR0 from register                          | 0000 1111 : 0010 0010 : 000 reg          |
| CR2 from register                          | 0000 1111 : 0010 0010 : 010 reg          |
| CR3 from register                          | 0000 1111 : 0010 0010 : 011 reg          |
| CR4 from register                          | 0000 1111 : 0010 0010 : 100 reg          |
| register from CR0-CR4                      | 0000 1111 : 0010 0000 : eee reg          |
| MOV – Move to/from Debug Registers         |                                          |
| DR0-DR3 from register                      | 0000 1111 : 0010 0011 : eee reg          |
| DR4-DR5 from register                      | 0000 1111 : 0010 0011 : eee reg          |
| DR6-DR7 from register                      | 0000 1111 : 0010 0011 : eee reg          |
| register from DR6-DR7                      | 0000 1111 : 0010 0001 : eee reg          |
| register from DR4-DR5                      | 0000 1111 : 0010 0001 : eee reg          |
| register from DR0-DR3                      | 0000 1111 : 0010 0001 : eee reg          |
| MOV – Move to/from Segment Registers       |                                          |
| register to segment register               | 1000 1110 : 11 sreg3 reg                 |
| register to SS                             | 1000 1110 : 11 sreg3 reg                 |
| memory to segment reg                      | 1000 1110 : mod sreg3 r/m                |
| memory to SS                               | 1000 1110 : mod sreg3 r/m                |
| segment register to register               | 1000 1100 : 11 sreg3 reg                 |
| segment register to memory                 | 1000 1100 : mod sreg3 r/m                |
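To illustrate how the bit patterns in Table B-13 turn into instruction bytes, the following C sketch assembles two of the MOV forms listed above. The function names are hypothetical. The sketch assumes a 32-bit operand size when w = 1, the usual little-endian byte order for immediate data, and the register numbers EAX = 0 and EBX = 3 from the general-purpose register encoding tables, which are not reproduced in this excerpt.

```c
#include <stddef.h>
#include <stdint.h>

/* "register1 to register2": 1000 100w : 11 reg1 reg2
 * reg1 goes into bits 5:3 of the second byte, reg2 into bits 2:0. */
static size_t encode_mov_reg1_to_reg2(uint8_t out[2], unsigned w,
                                      unsigned reg1, unsigned reg2)
{
    out[0] = (uint8_t)(0x88u | (w & 1u));
    out[1] = (uint8_t)(0xC0u | ((reg1 & 7u) << 3) | (reg2 & 7u));
    return 2;
}

/* "immediate to register (alternate encoding)": 1011 w reg : immediate data
 * Assumption: w = 1 with a 32-bit operand size, so four immediate bytes follow. */
static size_t encode_mov_imm32_to_reg(uint8_t out[5], unsigned reg, uint32_t imm)
{
    out[0] = (uint8_t)(0xB0u | (1u << 3) | (reg & 7u));
    out[1] = (uint8_t)(imm & 0xFFu);          /* least significant byte first */
    out[2] = (uint8_t)((imm >> 8) & 0xFFu);
    out[3] = (uint8_t)((imm >> 16) & 0xFFu);
    out[4] = (uint8_t)((imm >> 24) & 0xFFu);
    return 5;
}
```

With reg1 = 0 (EAX) and reg2 = 3 (EBX), the first function produces the bytes 89 C3; with reg = 0 and an immediate of 12345678H, the second produces B8 78 56 34 12.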


...

### 7. Updates to Chapter 1, Volume 3A

Change bars show changes to Chapter 1 of the Intel<sup>®</sup> 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide, Part 1.



## 1.1 INTEL<sup>®</sup> 64 AND IA-32 PROCESSORS COVERED IN THIS MANUAL

This manual set includes information pertaining primarily to the most recent Intel 64 and IA-32 processors, which include:

- Pentium<sup>®</sup> processors
- P6 family processors
- Pentium<sup>®</sup> 4 processors
- Pentium<sup>®</sup> M processors
- Intel<sup>®</sup> Xeon<sup>®</sup> processors
- Pentium<sup>®</sup> D processors
- Pentium<sup>®</sup> processor Extreme Editions
- 64-bit Intel<sup>®</sup> Xeon<sup>®</sup> processors
- Intel<sup>®</sup> Core<sup>™</sup> Duo processor
- Intel<sup>®</sup> Core<sup>™</sup> Solo processor
- Dual-Core Intel<sup>®</sup> Xeon<sup>®</sup> processor LV
- Intel<sup>®</sup> Core<sup>™</sup>2 Duo processor
- Intel<sup>®</sup> Core<sup>™</sup>2 Quad processor Q6000 series
- Intel<sup>®</sup> Xeon<sup>®</sup> processor 3000, 3200 series
- Intel<sup>®</sup> Xeon<sup>®</sup> processor 5000 series
- Intel<sup>®</sup> Xeon<sup>®</sup> processor 5100, 5300 series
- Intel<sup>®</sup> Core<sup>™</sup>2 Extreme processor X7000 and X6800 series
- Intel<sup>®</sup> Core<sup>™</sup>2 Extreme QX6000 series
- Intel<sup>®</sup> Xeon<sup>®</sup> processor 7100 series
- Intel<sup>®</sup> Pentium<sup>®</sup> Dual-Core processor
- Intel<sup>®</sup> Xeon<sup>®</sup> processor 7200, 7300 series
- Intel<sup>®</sup> Core<sup>™</sup>2 Extreme QX9000 series
- Intel<sup>®</sup> Xeon<sup>®</sup> processor 5200, 5400, 7400 series
- Intel<sup>®</sup> Core<sup>™</sup>2 Extreme processor QX9000 and X9000 series
- Intel<sup>®</sup> Core<sup>™</sup>2 Quad processor Q9000 series
- Intel<sup>®</sup> Core<sup>™</sup>2 Duo processor E8000, T9000 series
- Intel<sup>®</sup> Atom<sup>™</sup> processor family
- Intel<sup>®</sup> Core<sup>™</sup> i7 processor
- Intel<sup>®</sup> Core<sup>™</sup> i5 processor
- Intel<sup>®</sup> Xeon<sup>®</sup> processor E7-8800/4800/2800 product families
- Intel<sup>®</sup> Xeon<sup>®</sup> processor E5 family
- Intel<sup>®</sup> Xeon<sup>®</sup> processor E3 family
- Intel<sup>®</sup> Core<sup>™</sup> i7-3930K processor
- 2nd generation Intel<sup>®</sup> Core<sup>™</sup> i7-2xxx, Intel<sup>®</sup> Core<sup>™</sup> i5-2xxx, Intel<sup>®</sup> Core<sup>™</sup> i3-2xxx processor series

P6 family processors are IA-32 processors based on the P6 family microarchitecture. This includes the Pentium<sup>®</sup> Pro, Pentium<sup>®</sup> II, Pentium<sup>®</sup> III, and Pentium<sup>®</sup> III Xeon<sup>®</sup> processors.

The Pentium<sup>®</sup> 4, Pentium<sup>®</sup> D, and Pentium<sup>®</sup> processor Extreme Editions are based on the Intel NetBurst<sup>®</sup> microarchitecture. Most early Intel<sup>®</sup> Xeon<sup>®</sup> processors are based on the Intel NetBurst<sup>®</sup> microarchitecture. Intel Xeon processor 5000, 7100 series are based on the Intel NetBurst<sup>®</sup> microarchitecture.

The Intel<sup>®</sup> Core<sup>™</sup> Duo, Intel<sup>®</sup> Core<sup>™</sup> Solo and dual-core Intel<sup>®</sup> Xeon<sup>®</sup> processor LV are based on an improved Pentium<sup>®</sup> M processor microarchitecture.

The Intel<sup>®</sup> Xeon<sup>®</sup> processor 3000, 3200, 5100, 5300, 7200, and 7300 series, Intel<sup>®</sup> Pentium<sup>®</sup> dual-core, Intel<sup>®</sup> Core<sup>™</sup>2 Duo, Intel<sup>®</sup> Core<sup>™</sup>2 Quad and Intel<sup>®</sup> Core<sup>™</sup>2 Extreme processors are based on Intel<sup>®</sup> Core<sup>™</sup> microarchitecture.

The Intel<sup>®</sup> Xeon<sup>®</sup> processor 5200, 5400, and 7400 series, Intel<sup>®</sup> Core<sup>™</sup>2 Quad processor Q9000 series, Intel<sup>®</sup> Core<sup>™</sup>2 Extreme processor QX9000 and X9000 series, and Intel<sup>®</sup> Core<sup>™</sup>2 processor E8000 series are based on Enhanced Intel<sup>®</sup> Core<sup>™</sup> microarchitecture.

The Intel<sup>®</sup> Atom<sup>™</sup> processor family is based on the Intel<sup>®</sup> Atom<sup>™</sup> microarchitecture and supports Intel 64 architecture.

The Intel<sup>®</sup> Core<sup>™</sup> i7 processor and the Intel<sup>®</sup> Core<sup>™</sup> i5 processor are based on the Intel<sup>®</sup> microarchitecture code name Nehalem and support Intel 64 architecture.

Processors based on Intel<sup>®</sup> microarchitecture code name Westmere support Intel 64 architecture.

P6 family, Pentium<sup>®</sup> M, Intel<sup>®</sup> Core<sup>™</sup> Solo, Intel<sup>®</sup> Core<sup>™</sup> Duo processors, dual-core Intel<sup>®</sup> Xeon<sup>®</sup> processor LV, and early generations of Pentium 4 and Intel Xeon processors support IA-32 architecture. The Intel<sup>®</sup> Atom<sup>™</sup> processor Z5xx series supports IA-32 architecture.

The Intel<sup>®</sup> Xeon<sup>®</sup> processor E5 family, Intel<sup>®</sup> Xeon<sup>®</sup> processor E3 family, Intel<sup>®</sup> Core<sup>™</sup> i7-3930K processor, 2nd generation Intel<sup>®</sup> Core<sup>™</sup> i7-2xxx, Intel<sup>®</sup> Core<sup>™</sup> i5-2xxx, Intel<sup>®</sup> Core<sup>™</sup> i3-2xxx processor series, Intel<sup>®</sup> Xeon<sup>®</sup> processor E7-8800/4800/2800 product families, Intel<sup>®</sup> Xeon<sup>®</sup> processor 3000, 3200, 5000, 5100, 5200, 5300, 5400, 7100, 7200, 7300, 7400 series, Intel<sup>®</sup> Core<sup>™</sup>2 Duo, Intel<sup>®</sup> Core<sup>™</sup>2 Extreme, and Intel<sup>®</sup> Core<sup>™</sup>2 Quad processors, and the Pentium<sup>®</sup> D processor family support Intel<sup>®</sup> 64 architecture.

IA-32 architecture is the instruction set architecture and programming environment for Intel's 32-bit microprocessors. Intel<sup>®</sup> 64 architecture is the instruction set architecture and programming environment which is a superset of and compatible with IA-32 architecture.

...

#### 8. Updates to Chapter 4, Volume 3A

Change bars show changes to Chapter 4 of the Intel<sup>®</sup> 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide, Part 1.

\_\_\_\_\_

...







#### 9. Updates to Chapter 10, Volume 3A

Change bars show changes to Chapter 10 of the Intel<sup>®</sup> 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide, Part 1.

\_\_\_\_\_

...

### 10.5.1 Local Vector Table

The local vector table (LVT) allows software to specify the manner in which the local interrupts are delivered to the processor core. It consists of the following 32-bit APIC registers (see Figure 10-8), one for each local interrupt:

- LVT CMCI Register (FEE0 02F0H) Specifies interrupt delivery when the count of corrected machine-check errors reaches its threshold value (an overflow condition) in a machine-check bank that supports CMCI (see Section 15.5.1, "CMCI Local APIC Interface").
- LVT Timer Register (FEE0 0320H) Specifies interrupt delivery when the APIC timer signals an interrupt (see Section 10.5.4, "APIC Timer").
- LVT Thermal Monitor Register (FEE0 0330H) Specifies interrupt delivery when the thermal sensor generates an interrupt (see Section 14.5.2, "Thermal Monitor"). This LVT entry is implementation specific, not architectural. If implemented, it will always be at base address FEE0 0330H.
- LVT Performance Counter Register (FEE0 0340H) Specifies interrupt delivery when a performance counter generates an interrupt on overflow (see Section 18.10.5.8, "Generating an Interrupt on Overflow"). This LVT entry is implementation specific, not architectural. If implemented, it is not guaranteed to be at base address FEE0 0340H.
- LVT LINT0 Register (FEE0 0350H) Specifies interrupt delivery when an interrupt is signaled at the LINT0 pin.
- LVT LINT1 Register (FEE0 0360H) Specifies interrupt delivery when an interrupt is signaled at the LINT1 pin.
- LVT Error Register (FEE0 0370H) Specifies interrupt delivery when the APIC detects an internal error (see Section 10.5.3, "Error Handling").
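The register addresses listed above can be collected into a small lookup table. The following C sketch is only an illustration: the enum and function names are hypothetical, and the base value FEE0 0000H is an assumption inferred from the addresses in the list (the sketch only derives offsets and does not access hardware).

```c
#include <stdint.h>

/* Assumption: the addresses in the list are relative to an APIC base of FEE0 0000H. */
#define APIC_BASE_ASSUMED 0xFEE00000u

enum lvt_register {
    LVT_CMCI,
    LVT_TIMER,
    LVT_THERMAL,
    LVT_PERF,
    LVT_LINT0,
    LVT_LINT1,
    LVT_ERROR,
    LVT_COUNT
};

/* Addresses as given in Section 10.5.1. */
static const uint32_t lvt_address[LVT_COUNT] = {
    [LVT_CMCI]    = 0xFEE002F0u,
    [LVT_TIMER]   = 0xFEE00320u,
    [LVT_THERMAL] = 0xFEE00330u,
    [LVT_PERF]    = 0xFEE00340u, /* implementation specific, address not guaranteed */
    [LVT_LINT0]   = 0xFEE00350u,
    [LVT_LINT1]   = 0xFEE00360u,
    [LVT_ERROR]   = 0xFEE00370u
};

/* Offset of an LVT register from the assumed APIC base. */
static uint32_t lvt_offset(enum lvt_register r)
{
    return lvt_address[r] - APIC_BASE_ASSUMED;
}
```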

The LVT performance counter register and its associated interrupt were introduced in the P6 processors and are also present in the Pentium 4 and Intel Xeon processors. The LVT thermal monitor register and its associated interrupt were introduced in the Pentium 4 and Intel Xeon processors. The LVT CMCI register and its associated interrupt were introduced in the Intel Xeon 5500 processors.

As shown in Figure 10-8, some of these fields and flags are not available (and reserved) for some entries.

The setup information that can be specified in the registers of the LVT table is as follows:

- **Delivery Mode** Specifies the type of interrupt to be sent to the processor. Some delivery modes will only operate as intended when used in conjunction with a specific trigger mode. The allowable delivery modes are as follows:
  - **000 (Fixed)** Delivers the interrupt specified in the vector field.
  - **010 (SMI)** Delivers an SMI interrupt to the processor core through the processor's local SMI signal path. When using this delivery mode, the vector field should be set to 00H for future compatibility.
  - **100 (NMI)** Delivers an NMI interrupt to the processor. The vector information is ignored.
  - **101 (INIT)** Delivers an INIT request to the processor core, which causes the processor to perform an INIT. When using this delivery mode, the vector field should be set to 00H for future compatibility. Not supported for the LVT CMCI register, the LVT thermal monitor register, or the LVT performance counter register.
  - **110** Reserved; not supported for any LVT register.
  - **111 (ExtINT)** Causes the processor to respond to the interrupt as if the interrupt originated in an externally connected (8259A-compatible) interrupt controller. A special INTA bus cycle corresponding to ExtINT is routed to the external controller. The external controller is expected to supply the vector information. The APIC architecture supports only one ExtINT source in a system, usually contained in the compatibility bridge. Only one processor in the system should have an LVT entry configured to use the ExtINT delivery mode. Not supported for the LVT CMCI register, the LVT thermal monitor register, or the LVT performance counter register.

Figure 10-8 Local Vector Table (LVT)

- **Delivery Status (Read Only)** Indicates the interrupt delivery status, as follows:
  - **0 (Idle)** There is currently no activity for this interrupt source, or the previous interrupt from this source was delivered to the processor core and accepted.
  - **1 (Send Pending)** Indicates that an interrupt from this source has been delivered to the processor core but has not yet been accepted (see Section 10.5.5, "Local Interrupt Acceptance").
- **Interrupt Input Pin Polarity** Specifies the polarity of the corresponding interrupt pin: (0) active high or (1) active low.
- **Remote IRR Flag (Read Only)** For fixed mode, level-triggered interrupts, this flag is set when the local APIC accepts the interrupt for servicing and is reset when an EOI command is received from the processor. The meaning of this flag is undefined for edge-triggered interrupts and other delivery modes.
- **Trigger Mode** Selects the trigger mode for the local LINT0 and LINT1 pins: (0) edge sensitive and (1) level sensitive. This flag is only used when the delivery mode is Fixed. When the delivery mode is NMI, SMI, or INIT, the trigger mode is always edge sensitive. When the delivery mode is ExtINT, the trigger mode is always level sensitive. The timer and error interrupts are always treated as edge sensitive.

  If the local APIC is not used in conjunction with an I/O APIC and fixed delivery mode is selected, the Pentium 4, Intel Xeon, and P6 family processors will always use level-sensitive triggering, regardless of whether edge-sensitive triggering is selected.
- **Mask** Interrupt mask: (0) enables reception of the interrupt and (1) inhibits reception of the interrupt. When the local APIC handles a performance-monitoring counters interrupt, it automatically sets the mask flag in the LVT performance counter register. This flag is set to 1 on reset. It can be cleared only by software.
- **Timer Mode** Bits 18:17 select the timer mode (see Section 10.5.4):
  - (00b) one-shot mode using a count-down value,
  - (01b) periodic mode reloading a count-down value,
  - (10b) TSC-Deadline mode using absolute target value in IA32\_TSC\_DEADLINE MSR (see Section 10.5.4.1),
  - (11b) is reserved.
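The fields described above can be combined into a 32-bit LVT entry value as in the following C sketch. It is a minimal illustration, not a definitive implementation: the names are hypothetical, and apart from the timer mode (bits 18:17), the bit positions are not stated in the text of this section. They are assumed here to follow Figure 10-8 (vector in bits 7:0, delivery mode in bits 10:8, pin polarity in bit 13, trigger mode in bit 15, mask in bit 16) and should be checked against the figure.

```c
#include <stdint.h>

/* Delivery mode encodings listed in Section 10.5.1. */
enum lvt_delivery_mode {
    LVT_DM_FIXED  = 0x0,
    LVT_DM_SMI    = 0x2,
    LVT_DM_NMI    = 0x4,
    LVT_DM_INIT   = 0x5,
    LVT_DM_EXTINT = 0x7
};

/* Timer mode encodings (bits 18:17); 11b is reserved. */
enum lvt_timer_mode {
    LVT_TM_ONE_SHOT     = 0x0,
    LVT_TM_PERIODIC     = 0x1,
    LVT_TM_TSC_DEADLINE = 0x2
};

/* Assumed bit positions (see Figure 10-8). */
#define LVT_DELIVERY_MODE_SHIFT 8
#define LVT_POLARITY_ACTIVE_LOW (1u << 13)
#define LVT_TRIGGER_LEVEL       (1u << 15)
#define LVT_MASKED              (1u << 16)
#define LVT_TIMER_MODE_SHIFT    17

/* Build a value for a LINT0/LINT1-style entry. */
static uint32_t lvt_make_entry(uint8_t vector, enum lvt_delivery_mode dm,
                               int level_triggered, int active_low, int masked)
{
    uint32_t v = vector;
    v |= (uint32_t)dm << LVT_DELIVERY_MODE_SHIFT;
    if (active_low)      v |= LVT_POLARITY_ACTIVE_LOW;
    if (level_triggered) v |= LVT_TRIGGER_LEVEL;
    if (masked)          v |= LVT_MASKED;
    return v;
}

/* Build a value for the LVT timer entry, which has no delivery mode field in this sketch. */
static uint32_t lvt_make_timer_entry(uint8_t vector, enum lvt_timer_mode tm, int masked)
{
    uint32_t v = vector;
    v |= (uint32_t)tm << LVT_TIMER_MODE_SHIFT;
    if (masked) v |= LVT_MASKED;
    return v;
}
```

Under these assumptions, a masked fixed-mode entry for vector 30H evaluates to 0001 0030H, and an unmasked periodic timer entry for vector 40H evaluates to 0002 0040H.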

•••



### 10.8.1 Interrupt Handling with the Pentium 4 and Intel Xeon Processors

With the Pentium 4 and Intel Xeon processors, the local APIC handles the local interrupts, interrupt messages, and IPIs it receives as follows:

1. It determines if it is the specified destination or not (see Figure 10-16). If it is the specified destination, it accepts the message; if it is not, it discards the message.



Figure 10-16 Interrupt Acceptance Flow Chart for the Local APIC (Pentium 4 and Intel Xeon Processors)

2. If the local APIC determines that it is the designated destination for the interrupt and if the interrupt request is an NMI, SMI, INIT, ExtINT, or SIPI, the interrupt is sent directly to the processor core for handling.
3. If the local APIC determines that it is the designated destination for the interrupt but the interrupt request is not one of the interrupts given in step 2, the local APIC sets the appropriate bit in the IRR.
4. When interrupts are pending in the IRR register, the local APIC dispatches them to the processor one at a time, based on their priority and the current processor priority in the PPR (see Section 10.8.3.1, "Task and Processor Priorities").
5. When a fixed interrupt has been dispatched to the processor core for handling, the completion of the handler routine is indicated with an instruction in the interrupt handler code that writes to the end-of-interrupt (EOI) register in the local APIC (see Section 10.8.5, "Signaling Interrupt Servicing Completion"). The act of writing to the EOI register causes the local APIC to delete the interrupt from its ISR queue and (for level-triggered interrupts) send a message on the bus indicating that the interrupt handling has been completed. (A write to the EOI register must not be included in the handler routine for an NMI, SMI, INIT, ExtINT, or SIPI.)

...

### 10.8.2 Interrupt Handling with the P6 Family and Pentium Processors

With the P6 family and Pentium processors, the local APIC handles the local interrupts, interrupt messages, and IPIs it receives as follows (see Figure 10-17).

1. (IPIs only) It examines the IPI message to determine if it is the specified destination for the IPI as described in Section 10.6.2, "Determining IPI Destination." If it is the specified destination, it continues its acceptance procedure; if it is not the destination, it discards the IPI message. When the message specifies lowest-priority delivery mode, the local APIC will arbitrate with the other processors that were designated as recipients of the IPI message (see Section 10.6.2.4, "Lowest Priority Delivery Mode").





Figure 10-17 Interrupt Acceptance Flow Chart for the Local APIC (P6 Family and Pentium Processors)

2. If the local APIC determines that it is the designated destination for the interrupt and if the interrupt request is an NMI, SMI, INIT, ExtINT, or INIT-deassert interrupt, or one of the MP protocol IPI messages (BIPI, FIPI, and SIPI), the interrupt is sent directly to the processor core for handling.
3. If the local APIC determines that it is the designated destination for the interrupt but the interrupt request is not one of the interrupts given in step 2, the local APIC looks for an open slot in one of its two pending interrupt queues contained in the IRR and ISR registers (see Figure 10-20). If a slot is available (see Section 10.8.4, "Interrupt Acceptance for Fixed Interrupts"), it places the interrupt in the slot. If a slot is not available, it rejects the interrupt request and sends it back to the sender with a retry message.
4. When interrupts are pending in the IRR register, the local APIC dispatches them to the processor one at a time, based on their priority and the current processor priority in the PPR (see Section 10.8.3.1, "Task and Processor Priorities").
5. When a fixed interrupt has been dispatched to the processor core for handling, the completion of the handler routine is indicated with an instruction in the interrupt handler code that writes to the end-of-interrupt (EOI) register in the local APIC (see Section 10.8.5, "Signaling Interrupt Servicing Completion"). The act of writing to the EOI register causes the local APIC to delete the interrupt from its queue and (for level-triggered interrupts) send a message on the bus indicating that the interrupt handling has been completed. (A write to the EOI register must not be included in the handler routine for an NMI, SMI, INIT, ExtINT, or SIPI.)

The following sections describe the acceptance of interrupts and their handling by the local APIC and processor in greater detail.

• • •

### 10.8.3 Interrupt, Task, and Processor Priority

Each interrupt delivered to the processor through the local APIC has a priority based on its vector number. The local APIC uses this priority to determine when to service the interrupt relative to the other activities of the processor, including the servicing of other interrupts.

Each interrupt vector is an 8-bit value. The **interrupt-priority class** is the value of bits 7:4 of the interrupt vector. The lowest interrupt-priority class is 1 and the highest is 15; interrupts with vectors in the range 0-15 (with interrupt-priority class 0) are illegal and are never delivered. Because vectors 0-31 are reserved for dedicated uses by the Intel 64 and IA-32 architectures, software should configure interrupt vectors to use interrupt-priority classes in the range 2-15.

Each interrupt-priority class encompasses 16 vectors. The relative priority of interrupts within an interrupt-priority class is determined by the value of bits 3:0 of the vector number. The higher the value of those bits, the higher the priority within that interrupt-priority class. Thus, each interrupt vector comprises two parts, with the high 4 bits indicating its interrupt-priority class and the low 4 bits indicating its ranking within the interrupt-priority class.
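The two-part structure of an interrupt vector can be expressed directly in code. The following C sketch uses hypothetical helper names; it only restates the rules above (class in bits 7:4, ranking within the class in bits 3:0, vectors 0-15 illegal, classes 2-15 recommended for software).

```c
#include <stdbool.h>
#include <stdint.h>

/* Interrupt-priority class: bits 7:4 of the vector. */
static unsigned priority_class(uint8_t vector)
{
    return vector >> 4;
}

/* Ranking within the class: bits 3:0 of the vector. */
static unsigned priority_rank(uint8_t vector)
{
    return vector & 0xFu;
}

/* Vectors 0-15 (class 0) are illegal and are never delivered. */
static bool vector_is_legal(uint8_t vector)
{
    return priority_class(vector) >= 1u;
}

/* Vectors 0-31 are reserved by the architecture, so software should use classes 2-15. */
static bool vector_is_recommended(uint8_t vector)
{
    return priority_class(vector) >= 2u;
}

/* True if vector a has higher priority than vector b:
 * compare the class first, then the ranking within the class. */
static bool vector_outranks(uint8_t a, uint8_t b)
{
    if (priority_class(a) != priority_class(b))
        return priority_class(a) > priority_class(b);
    return priority_rank(a) > priority_rank(b);
}
```

Because the class occupies the high 4 bits, `vector_outranks(a, b)` gives the same result as the plain comparison `a > b`; the two-step form is written out only to mirror the text.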

#### 10.8.3.1 Task and Processor Priorities

The local APIC also defines a **task priority** and a **processor priority** that determine the order in which interrupts are handled. The **task-priority class** is the value of bits 7:4 of the task-priority register (TPR), which can be written by software (TPR is a read/write register); see Figure 10-18.



Figure 10-18 Task-Priority Register (TPR)

#### NOTE

In this discussion, the term "task" refers to a software defined task, process, thread, program, or routine that is dispatched to run on the processor by the operating system. It does not refer to an IA-32 architecture defined task as described in Chapter 7, "Task Management."

The task priority allows software to set a priority threshold for interrupting the processor. This mechanism enables the operating system to temporarily block low priority interrupts from disturbing high-priority work that the processor is doing. The ability to block such interrupts using task priority results from the way that the TPR controls the value of the processor-priority register (PPR).<sup>1</sup>

The **processor-priority class** is a value in the range 0–15 that is maintained in bits 7:4 of the processor-priority register (PPR); see Figure 10-19. The PPR is a read-only register. The processor-priority class represents the current priority at which the processor is executing.

| Bits | Field                        |
|------|------------------------------|
| 31:8 | Reserved                     |
| 7:4  | Processor-Priority Class     |
| 3:0  | Processor-Priority Sub-Class |

Address: FEE0 00A0H. Value after reset: 0H.

Figure 10-19 Processor-Priority Register (PPR)

The value of the PPR is based on the value of TPR and the value ISRV; ISRV is the vector number of the highest priority bit that is set in the ISR or 00H if no bit is set in the ISR. (See Section 10.8.4 for more details on the ISR.) The value of PPR is determined as follows:

- PPR[7:4] (the processor-priority class) is the maximum of TPR[7:4] (the task-priority class) and ISRV[7:4] (the priority of the highest priority interrupt in service).
- PPR[3:0] (the processor-priority sub-class) is determined as follows:
  - If TPR[7:4] > ISRV[7:4], PPR[3:0] is TPR[3:0] (the task-priority sub-class).
  - If TPR[7:4] < ISRV[7:4], PPR[3:0] is 0.
  - If TPR[7:4] = ISRV[7:4], PPR[3:0] may be either TPR[3:0] or 0. The actual behavior is model-specific.

The processor-priority class determines the priority threshold for interrupting the processor. The processor will deliver only those interrupts that have an interrupt-priority class higher than the processor-priority class in the PPR. If the processor-priority class is 0, the PPR does not inhibit the delivery of any interrupt; if it is 15, the processor inhibits the delivery of all interrupts. (The processor-priority mechanism does not affect the delivery of interrupts with the NMI, SMI, INIT, ExtINT, INIT-deassert, and start-up delivery modes.)

The processor does not use the processor-priority sub-class to determine which interrupts to deliver and which to inhibit. (The processor uses the processor-priority sub-class only to satisfy reads of the PPR.)
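The PPR rules above can be sketched in C as follows. The function names are hypothetical, and because the equal-class case is model-specific, the sketch takes that choice as an explicit parameter instead of assuming one behavior.

```c
#include <stdbool.h>
#include <stdint.h>

/* Compute PPR from TPR and ISRV (the vector of the highest priority bit set in the ISR,
 * or 00H if none). equal_class_uses_tpr_subclass selects the model-specific result
 * when TPR[7:4] equals ISRV[7:4]. */
static uint8_t compute_ppr(uint8_t tpr, uint8_t isrv, bool equal_class_uses_tpr_subclass)
{
    unsigned tpr_class  = tpr >> 4;
    unsigned isrv_class = isrv >> 4;

    if (tpr_class > isrv_class)
        return tpr;                      /* class and sub-class both come from TPR */
    if (tpr_class < isrv_class)
        return (uint8_t)(isrv & 0xF0u);  /* class from ISRV, sub-class is 0 */
    return equal_class_uses_tpr_subclass ? tpr : (uint8_t)(tpr & 0xF0u);
}

/* A fixed interrupt is delivered only if its interrupt-priority class is higher
 * than the processor-priority class; the sub-class plays no part in this decision. */
static bool ppr_allows_delivery(uint8_t ppr, uint8_t vector)
{
    return (unsigned)(vector >> 4) > (unsigned)(ppr >> 4);
}
```

For example, with TPR = 35H and ISRV = 47H the sketch yields PPR = 40H, so vector 4FH (class 4) is held back while vector 50H (class 5) can be delivered.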

•••

<sup>1.</sup> The TPR also determines the arbitration priority of the local processor; see Section 10.6.2.4, "Lowest Priority Delivery Mode."



## 10.8.4 Interrupt Acceptance for Fixed Interrupts

The local APIC queues the fixed interrupts that it accepts in one of two interrupt pending registers: the interrupt request register (IRR) or in-service register (ISR). These two 256-bit read-only registers are shown in Figure 10-20. The 256 bits in these registers represent the 256 possible vectors; vectors 0 through 15 are reserved by the APIC (see also: Section 10.5.2, "Valid Interrupt Vectors").

#### NOTE

All interrupts with an NMI, SMI, INIT, ExtINT, start-up, or INIT-deassert delivery mode bypass the IRR and ISR registers and are sent directly to the processor core for servicing.



Figure 10-20 IRR, ISR and TMR Registers

The IRR contains the active interrupt requests that have been accepted, but not yet dispatched to the processor for servicing. When the local APIC accepts an interrupt, it sets the bit in the IRR that corresponds to the vector of the accepted interrupt. When the processor core is ready to handle the next interrupt, the local APIC clears the highest priority IRR bit that is set and sets the corresponding ISR bit. The vector for the highest priority bit set in the ISR is then dispatched to the processor core for servicing.

While the processor is servicing the highest priority interrupt, the local APIC can send additional fixed interrupts by setting bits in the IRR. When the interrupt service routine issues a write to the EOI register (see Section 10.8.5, "Signaling Interrupt Servicing Completion"), the local APIC responds by clearing the highest priority ISR bit that is set. It then repeats the process of clearing the highest priority bit in the IRR and setting the corresponding bit in the ISR. The processor core then begins executing the service routine for the highest priority bit set in the ISR.

If more than one interrupt is generated with the same vector number, the local APIC can set the bit for the vector both in the IRR and the ISR. This means that for the Pentium 4 and Intel Xeon processors, the IRR and ISR can queue two interrupts for each interrupt vector: one in the IRR and one in the ISR. Any additional interrupts issued for the same interrupt vector are collapsed into the single bit in the IRR.
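The accept, dispatch, and EOI sequence described above can be modeled with two 256-bit bitmaps. The following C sketch is a deliberately simplified software model: the type `lapic_model` and the functions `lapic_accept`, `lapic_dispatch`, and `lapic_eoi` are hypothetical names, and the model ignores the processor-priority check, the TMR, and any processor-family differences in how repeated vectors are handled.

```c
#include <stdint.h>

/* Simplified model: 256-bit IRR and ISR, each held as eight 32-bit words. */
typedef struct {
    uint32_t irr[8];
    uint32_t isr[8];
} lapic_model;

/* Return the highest vector whose bit is set, or -1 if no bit is set. */
static int highest_set(const uint32_t reg[8])
{
    for (int word = 7; word >= 0; word--)
        for (int bit = 31; bit >= 0; bit--)
            if (reg[word] & (UINT32_C(1) << bit))
                return word * 32 + bit;
    return -1;
}

/* Accepting an interrupt sets its IRR bit; repeats collapse into one bit. */
static void lapic_accept(lapic_model *m, uint8_t vector)
{
    m->irr[vector / 32] |= UINT32_C(1) << (vector % 32);
}

/* Move the highest pending vector from IRR to ISR and return the
 * highest vector now in service, or -1 if nothing was pending. */
static int lapic_dispatch(lapic_model *m)
{
    int v = highest_set(m->irr);
    if (v < 0)
        return -1;
    m->irr[v / 32] &= ~(UINT32_C(1) << (v % 32));
    m->isr[v / 32] |= UINT32_C(1) << (v % 32);
    return highest_set(m->isr);
}

/* A write to the EOI register clears the highest ISR bit that is set. */
static void lapic_eoi(lapic_model *m)
{
    int v = highest_set(m->isr);
    if (v >= 0)
        m->isr[v / 32] &= ~(UINT32_C(1) << (v % 32));
}
```

In this model, accepting vector 31H twice while 31H is already in service leaves one bit set in the IRR and one in the ISR, which matches the two-deep queueing per vector described above.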

For the P6 family and Pentium processors, the IRR and ISR registers can queue no more than two interrupts per interrupt vector and will reject other interrupts that are received within the same vector.

If the local APIC receives an interrupt with an interrupt-priority class higher than that of the interrupt currently in service, and interrupts are enabled in the processor core, the local APIC dispatches the higher priority interrupt to the processor immediately (without waiting for a write to the EOI register). The currently executing interrupt handler is then interrupted so the higher-priority interrupt can be handled. When the handling of the higher-priority interrupt has been completed, the servicing of the interrupted interrupt is resumed.

The trigger mode register (TMR) indicates the trigger mode of the interrupt (see Figure 10-20). Upon acceptance of an interrupt into the IRR, the corresponding TMR bit is cleared for edge-triggered interrupts and set for level-triggered interrupts. If a TMR bit is set when an EOI cycle for its corresponding interrupt vector is generated, an EOI message is sent to all I/O APICs.

...

## 10.8.6 Task Priority in IA-32e Mode

In IA-32e mode, operating systems can manage the 16 interrupt-priority classes (see Section 10.8.3, "Interrupt, Task, and Processor Priority") explicitly using the task priority register (TPR). Operating systems can use the TPR to temporarily block specific (low-priority) interrupts from interrupting a high-priority task. This is done by loading TPR with a value in which the task-priority class corresponds to the highest interrupt-priority class that is to be blocked. For example:

- Loading the TPR with a task-priority class of 8 (01000B) blocks all interrupts with an interrupt-priority class of 8 or less while allowing all interrupts with an interrupt-priority class of 9 or more to be recognized.
- Loading the TPR with a task-priority class of 0 enables all external interrupts.
- Loading the TPR with a task-priority class of 0FH (01111B) disables all external interrupts.

The TPR (shown in Figure 10-18) is cleared to 0 on reset. In 64-bit mode, software can read and write the TPR using an alternate interface, the MOV CR8 instruction. The new task-priority class is established when the MOV CR8 instruction completes execution. Software does not need to force serialization after loading the TPR using MOV CR8.
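The relationship between a CR8 value and the TPR can be sketched as below. The mapping used here (CR8[3:0] holds the task-priority class and corresponds to TPR[7:4], with TPR[3:0] cleared by a CR8 write) is taken from the manual's description of CR8 rather than from the text above, so treat it as an assumption of this sketch. All function names are hypothetical. The last helper restates the blocking rule from the examples above; executing MOV CR8 itself requires privilege level 0 and is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* TPR value that results from writing cr8: class in bits 7:4, sub-class 0. */
static uint8_t tpr_from_cr8(uint64_t cr8)
{
    return (uint8_t)((cr8 & 0xF) << 4);
}

/* CR8 value that a read returns for a given TPR: the task-priority class. */
static uint64_t cr8_from_tpr(uint8_t tpr)
{
    return (uint64_t)(tpr >> 4);
}

/* A vector is blocked when its interrupt-priority class (vector[7:4])
 * is less than or equal to the task-priority class. */
static bool tpr_class_blocks(uint8_t tpr_class, uint8_t vector)
{
    return (vector >> 4) <= tpr_class;
}
```

With a task-priority class of 8, vector 8FH is blocked and vector 90H is recognized; with a class of 0FH, every vector is blocked.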

Use of the MOV CRn instruction requires a privilege level of 0. Programs running at privilege level greater than 0 cannot read or write the TPR. An attempt to do so causes a general-protection exception. The TPR is abstracted from the interrupt controller (IC), which prioritizes and manages external interrupt delivery to the processor. The IC can be an external device, such as an APIC or 8259. Typically, the IC provides a priority mechanism similar or identical to the TPR. The IC, however, is considered implementation-dependent with the underlying priority mechanisms subject to change. CR8, by contrast, is part of the Intel 64 architecture. Software can depend on this definition remaining unchanged.

...

#### 10. Updates to Chapter 14, Volume 3B

Change bars show changes to Chapter 14 of the *Intel<sup>®</sup> 64 and IA-32 Architectures Software Developer's Manual, Volume 3B: System Programming Guide, Part 2*.

\_\_\_\_\_

...

## 14.7.3 Package RAPL Domain

The MSR interfaces defined for the package RAPL domain are:



- MSR\_PKG\_POWER\_LIMIT allows software to set power limits for the package and measurement attributes associated with each limit,
- MSR\_PKG\_ENERGY\_STATUS reports measured actual energy usage,
- MSR\_PKG\_POWER\_INFO reports the package power range information for RAPL usage.

MSR\_PKG\_RAPL\_PERF\_STATUS can report the performance impact of power limiting, but its availability may be model-specific.



Figure 14-17 MSR\_PKG\_POWER\_LIMIT Register

MSR\_PKG\_POWER\_LIMIT allows a software agent to define power limitation for the package domain. Power limitation is defined in terms of average power usage (Watts) over a time window specified in MSR\_PKG\_POWER\_LIMIT. Two power limits can be specified, corresponding to time windows of different sizes. Each power limit provides independent clamping control that would permit the processor cores to go below the OS-requested state to meet the power limits. A lock mechanism allows the software agent to enforce power limit settings. Once the lock bit is set, the power limit settings are static and unmodifiable until the next RESET.

The bit fields of MSR\_PKG\_POWER\_LIMIT (Figure 14-17) are:

- **Package Power Limit #1** (bits 14:0): Sets the average power usage limit of the package domain corresponding to time window #1. The unit of this field is specified by the "Power Units" field of MSR\_RAPL\_POWER\_UNIT.
- **Enable Power Limit #1** (bit 15): 0 = disabled; 1 = enabled.
- **Package Clamping Limitation #1** (bit 16): Allow going below OS-requested P/T state setting during time window specified by bits 23:17.
- **Time Window for Power Limit #1** (bits 23:17): Indicates the length of the time window over which power limit #1 applies. The numeric value encoded by bits 23:17 is represented by the product 2^Y \* F, where F is a single-digit decimal floating-point value between 1.0 and 1.3 with the fraction digit represented by bits 23:22, and Y is an unsigned integer represented by bits 21:17. The unit of this field is specified by the "Time Units" field of MSR\_RAPL\_POWER\_UNIT.
- **Package Power Limit #2** (bits 46:32): Sets the average power usage limit of the package domain corresponding to time window #2. The unit of this field is specified by the "Power Units" field of MSR\_RAPL\_POWER\_UNIT.
- **Enable Power Limit #2** (bit 47): 0 = disabled; 1 = enabled.
- **Package Clamping Limitation #2** (bit 48): Allow going below OS-requested P/T state setting during time window specified by bits 55:49.
- **Time Window for Power Limit #2** (bits 55:49): Indicates the length of the time window over which power limit #2 applies. The numeric value encoded by bits 55:49 is represented by the product 2^Y \* F, where F is a single-digit decimal floating-point value between 1.0 and 1.3 with the fraction digit represented by bits 55:54, and Y is an unsigned integer represented by bits 53:49. The unit of this field is specified by the "Time Units" field of MSR\_RAPL\_POWER\_UNIT. This field may have a hard-coded value in hardware and ignore values written by software.
- **Lock** (bit 63): If set, all write attempts to this MSR are ignored until next RESET.
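The field layout above lends itself to a small decoder. The C sketch below takes a raw 64-bit value that the caller has already read from MSR\_PKG\_POWER\_LIMIT (the RDMSR access itself requires privilege level 0 and is not shown). The type `rapl_limit` and the function names are hypothetical, and the time-window computation follows the wording above literally, reading F as 1.0 plus one tenth of the two-bit fraction digit; that reading is an assumption and should be checked against the manual revision for the target processor.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint16_t power_limit;  /* bits 14:0 of the half, in Power Units */
    bool     enabled;      /* bit 15 */
    bool     clamping;     /* bit 16 */
    uint8_t  window_y;     /* bits 21:17, exponent Y */
    uint8_t  window_f;     /* bits 23:22, fraction digit of F */
} rapl_limit;

/* Decode one power limit: shift = 0 for limit #1, 32 for limit #2. */
static rapl_limit rapl_decode_limit(uint64_t msr, unsigned shift)
{
    uint32_t half = (uint32_t)(msr >> shift);
    rapl_limit l;

    l.power_limit = (uint16_t)(half & 0x7FFF);
    l.enabled     = (half >> 15) & 1;
    l.clamping    = (half >> 16) & 1;
    l.window_y    = (uint8_t)((half >> 17) & 0x1F);
    l.window_f    = (uint8_t)((half >> 22) & 0x3);
    return l;
}

/* Time window in Time Units: 2^Y * F, with F = 1.0 + fraction digit / 10
 * (assumed reading of the field description). */
static double rapl_window_in_time_units(rapl_limit l)
{
    return (double)(UINT64_C(1) << l.window_y) * (1.0 + l.window_f / 10.0);
}

/* Lock (bit 63): once set, writes are ignored until the next RESET. */
static bool rapl_pkg_locked(uint64_t msr)
{
    return (msr >> 63) & 1;
}
```

With Y = 3 and a fraction digit of 2, the sketch reports a window of 9.6 Time Units; converting to seconds requires the "Time Units" field of MSR\_RAPL\_POWER\_UNIT.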

MSR\_PKG\_ENERGY\_STATUS is a read-only MSR. It reports the actual energy use for the package domain. This MSR is updated every ~1 msec. It has a wraparound time of around 60 secs when power consumption is high; the wraparound time may be longer otherwise.

| Bits  | Field                 |
|-------|-----------------------|
| 63:32 | Reserved              |
| 31:0  | Total Energy Consumed |

Figure 14-18 MSR\_PKG\_ENERGY\_STATUS MSR

- **Total Energy Consumed** (bits 31:0): The unsigned integer value represents the total amount of energy consumed since the last time this register was cleared. The unit of this field is specified by the "Energy Status Units" field of MSR\_RAPL\_POWER\_UNIT.
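Because the field is a free-running 32-bit counter that wraps around, software that measures energy over an interval typically works with the difference between two reads. A minimal C sketch follows; the function names are hypothetical, the raw values are assumed to have been read from the MSR by privileged code, and `joules_per_unit` is an assumed parameter that the caller derives from the "Energy Status Units" field of MSR\_RAPL\_POWER\_UNIT.

```c
#include <stdint.h>

/* Units consumed between two reads of the 32-bit field. Unsigned
 * subtraction modulo 2^32 gives the right answer across at most one
 * wraparound, so the two reads must be closer together than the
 * wraparound time. */
static uint32_t rapl_energy_delta(uint64_t prev_msr, uint64_t curr_msr)
{
    return (uint32_t)curr_msr - (uint32_t)prev_msr;
}

/* Convert a unit count to joules using a caller-supplied unit size. */
static double rapl_energy_joules(uint32_t delta_units, double joules_per_unit)
{
    return delta_units * joules_per_unit;
}
```

Sampling at an interval well below the wraparound time stated above keeps the single-wraparound assumption valid.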

MSR\_PKG\_POWER\_INFO is a read-only MSR. It reports the package power range information for RAPL usage. This MSR provides the maximum and minimum power values (derived from the electrical specification) and the thermal specification power of the package domain. It also provides the largest possible time window for software to program the RAPL interface.



Figure 14-19 MSR\_PKG\_POWER\_INFO Register

- **Thermal Spec Power** (bits 14:0): The unsigned integer value is the equivalent of thermal specification power of the package domain. The unit of this field is specified by the "Power Units" field of MSR\_RAPL\_POWER\_UNIT.
- **Minimum Power** (bits 30:16): The unsigned integer value is the equivalent of minimum power derived from electrical spec of the package domain. The unit of this field is specified by the "Power Units" field of MSR\_RAPL\_POWER\_UNIT.
- **Maximum Power** (bits 46:32): The unsigned integer value is the equivalent of maximum power derived from the electrical spec of the package domain. The unit of this field is specified by the "Power Units" field of MSR\_RAPL\_POWER\_UNIT.
- **Maximum Time Window** (bits 53:48): The unsigned integer value is the equivalent of largest acceptable value to program the time window of MSR\_PKG\_POWER\_LIMIT. The unit of this field is specified by the "Time Units" field of MSR\_RAPL\_POWER\_UNIT.
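The four fields listed above can be extracted with plain shifts and masks. In the C sketch below, the type `rapl_power_info` and both function names are hypothetical, the raw value is assumed to have been read from MSR\_PKG\_POWER\_INFO by privileged code, and `watts_per_unit` is an assumed parameter derived from the "Power Units" field of MSR\_RAPL\_POWER\_UNIT.

```c
#include <stdint.h>

typedef struct {
    uint16_t thermal_spec_power;  /* bits 14:0, in Power Units */
    uint16_t min_power;           /* bits 30:16, in Power Units */
    uint16_t max_power;           /* bits 46:32, in Power Units */
    uint8_t  max_time_window;     /* bits 53:48, in Time Units */
} rapl_power_info;

static rapl_power_info rapl_decode_power_info(uint64_t msr)
{
    rapl_power_info info;

    info.thermal_spec_power = (uint16_t)(msr & 0x7FFF);
    info.min_power          = (uint16_t)((msr >> 16) & 0x7FFF);
    info.max_power          = (uint16_t)((msr >> 32) & 0x7FFF);
    info.max_time_window    = (uint8_t)((msr >> 48) & 0x3F);
    return info;
}

/* Convert a raw power field to watts using a caller-supplied unit size. */
static double rapl_power_watts(uint16_t raw, double watts_per_unit)
{
    return raw * watts_per_unit;
}
```

A power limit that software intends to program can then be checked against `min_power` and `max_power` before it is written.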

...

## 14.7.4 PP0/PP1 RAPL Domains

The MSR interfaces defined for the PP0 and PP1 domains are identical in layout. Generally, PP0 refers to the processor cores. The availability of the PP1 RAPL domain interface is platform-specific. For a client platform, the PP1 domain refers to the power plane of a specific device in the uncore. For server platforms, the PP1 domain is not supported, but the PP0 domain supports the MSR\_PP0\_PERF\_STATUS interface.

- MSR\_PP0\_POWER\_LIMIT/MSR\_PP1\_POWER\_LIMIT allow software to set power limits for the respective power plane domain.
- MSR\_PP0\_ENERGY\_STATUS/MSR\_PP1\_ENERGY\_STATUS report actual energy usage on a power plane.
- MSR\_PP0\_POLICY/MSR\_PP1\_POLICY allow software to adjust balance for respective power plane.

MSR\_PP0\_PERF\_STATUS can report the performance impact of power limiting, but it is not available on client platforms.



Figure 14-21 MSR\_PP0\_POWER\_LIMIT/MSR\_PP1\_POWER\_LIMIT Register

MSR\_PP0\_POWER\_LIMIT/MSR\_PP1\_POWER\_LIMIT allow a software agent to define power limitation for the respective power plane domain. A lock mechanism in each power plane domain allows the software agent to enforce power limit settings independently. Once a lock bit is set, the power limit settings in that power plane are static and unmodifiable until the next RESET.

The bit fields of MSR\_PP0\_POWER\_LIMIT/MSR\_PP1\_POWER\_LIMIT (Figure 14-21) are:

- **Power Limit** (bits 14:0): Sets the average power usage limit of the respective power plane domain. The unit of this field is specified by the "Power Units" field of MSR\_RAPL\_POWER\_UNIT.
- **Enable Power Limit** (bit 15): 0 = disabled; 1 = enabled.
- **Clamping Limitation** (bit 16): Allow going below OS-requested P/T state setting during time window specified by bits 23:17.
- **Time Window for Power Limit** (bits 23:17): Indicates the length of the time window over which the power limit applies. The numeric value encoded by bits 23:17 is represented by the product 2^Y \* F, where F is a single-digit decimal floating-point value between 1.0 and 1.3 with the fraction digit represented by bits 23:22, and Y is an unsigned integer represented by bits 21:17. The unit of this field is specified by the "Time Units" field of MSR\_RAPL\_POWER\_UNIT.
- **Lock** (bit 31): If set, all write attempts to the MSR and corresponding policy MSR\_PP0\_POLICY/MSR\_PP1\_POLICY are ignored until next RESET.
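Going in the other direction, software that programs a power plane limit has to pack the fields into a register value and respect the lock bit. The C sketch below shows one way to build the value; the function name `pp_limit_encode` and its parameters are hypothetical, it only computes the value and does not perform the privileged WRMSR, and it preserves every bit outside bits 23:0 on the assumption that software should not alter bits it does not own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Build a new MSR_PP0_POWER_LIMIT/MSR_PP1_POWER_LIMIT value from the
 * current register value. Returns false, leaving *out untouched, when
 * the register is locked (bit 31) or a field is out of range. */
static bool pp_limit_encode(uint64_t current, uint16_t limit, bool enable,
                            bool clamp, uint8_t y, uint8_t f_digit,
                            uint64_t *out)
{
    if ((current >> 31) & 1)
        return false;                         /* writes are ignored until RESET */
    if (limit > 0x7FFF || y > 0x1F || f_digit > 3)
        return false;                         /* does not fit the field widths */

    uint64_t v = current & ~UINT64_C(0x00FFFFFF);  /* keep bits above 23 */
    v |= (uint64_t)limit;                     /* Power Limit, bits 14:0 */
    v |= (uint64_t)enable << 15;              /* Enable Power Limit */
    v |= (uint64_t)clamp << 16;               /* Clamping Limitation */
    v |= (uint64_t)y << 17;                   /* time window exponent Y */
    v |= (uint64_t)f_digit << 22;             /* time window fraction digit */
    *out = v;
    return true;
}
```

Checking the lock bit first avoids a write that the hardware would silently ignore.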

...

### 14.7.5 DRAM RAPL Domain

The MSR interfaces defined for the DRAM domain are supported only on server platforms. The MSR interfaces are:

- MSR\_DRAM\_POWER\_LIMIT allows software to set power limits for the DRAM domain and measurement attributes associated with each limit,
- MSR\_DRAM\_ENERGY\_STATUS reports measured actual energy usage,
- MSR\_DRAM\_POWER\_INFO reports the DRAM domain power range information for RAPL usage.
- MSR\_DRAM\_RAPL\_PERF\_STATUS can report the performance impact of power limiting.



Figure 14-25 MSR\_DRAM\_POWER\_LIMIT Register

MSR\_DRAM\_POWER\_LIMIT allows a software agent to define power limitation for the DRAM domain. Power limitation is defined in terms of average power usage (Watts) over a time window specified in MSR\_DRAM\_POWER\_LIMIT. A power limit can be specified along with a time window. A lock mechanism allows the software agent to enforce power limit settings. Once the lock bit is set, the power limit settings are static and unmodifiable until the next RESET.

The bit fields of MSR\_DRAM\_POWER\_LIMIT (Figure 14-25) are:



- **DRAM Power Limit #1** (bits 14:0): Sets the average power usage limit of the DRAM domain corresponding to time window #1. The unit of this field is specified by the "Power Units" field of MSR\_RAPL\_POWER\_UNIT.
- **Enable Power Limit #1** (bit 15): 0 = disabled; 1 = enabled.
- **Time Window for Power Limit** (bits 23:17): Indicates the length of the time window over which the power limit applies. The numeric value encoded by bits 23:17 is represented by the product 2^Y \* F, where F is a single-digit decimal floating-point value between 1.0 and 1.3 with the fraction digit represented by bits 23:22, and Y is an unsigned integer represented by bits 21:17. The unit of this field is specified by the "Time Units" field of MSR\_RAPL\_POWER\_UNIT.
- **Lock** (bit 31): If set, all write attempts to this MSR are ignored until next RESET.

MSR\_DRAM\_ENERGY\_STATUS is a read-only MSR. It reports the actual energy use for the DRAM domain. This MSR is updated every ~1 msec.

| Bits  | Field                 |
|-------|-----------------------|
| 63:32 | Reserved              |
| 31:0  | Total Energy Consumed |

Figure 14-26 MSR\_DRAM\_ENERGY\_STATUS MSR

- **Total Energy Consumed** (bits 31:0): The unsigned integer value represents the total amount of energy consumed since the last time this register was cleared. The unit of this field is specified by the "Energy Status Units" field of MSR\_RAPL\_POWER\_UNIT.

MSR\_DRAM\_POWER\_INFO is a read-only MSR. It reports the DRAM power range information for RAPL usage. This MSR provides the maximum and minimum power values (derived from the electrical specification) and the thermal specification power of the DRAM domain. It also provides the largest possible time window for software to program the RAPL interface.



Figure 14-27 MSR\_DRAM\_POWER\_INFO Register

- **Thermal Spec Power** (bits 14:0): The unsigned integer value is the equivalent of thermal specification power of the DRAM domain. The unit of this field is specified by the "Power Units" field of MSR\_RAPL\_POWER\_UNIT.
- **Minimum Power** (bits 30:16): The unsigned integer value is the equivalent of minimum power derived from electrical spec of the DRAM domain. The unit of this field is specified by the "Power Units" field of MSR\_RAPL\_POWER\_UNIT.
- **Maximum Power** (bits 46:32): The unsigned integer value is the equivalent of maximum power derived from the electrical spec of the DRAM domain. The unit of this field is specified by the "Power Units" field of MSR\_RAPL\_POWER\_UNIT.
- **Maximum Time Window** (bits 53:48): The unsigned integer value is the equivalent of largest acceptable value to program the time window of MSR\_DRAM\_POWER\_LIMIT. The unit of this field is specified by the "Time Units" field of MSR\_RAPL\_POWER\_UNIT.

...

#### 11. Updates to Chapter 17, Volume 3B

Change bars show changes to Chapter 17 of the *Intel<sup>®</sup> 64 and IA-32 Architectures Software Developer's Manual, Volume 3B: System Programming Guide, Part 2*.

\_\_\_\_\_

...

## 17.6.1 LBR Stack

Processors based on Intel microarchitecture code name Nehalem provide 16 pairs of MSRs to record last branch record information. The layout of each MSR pair is shown in Table 17-6 and Table 17-7.

| Bit Field | Bit Offset | Access | Description |
|-----------|------------|--------|-------------|
| Data      | 47:0       | R/O    | The linear address of the branch instruction itself; this is the "branch from" address. |
| SIGN_EXT  | 62:48      | R/O    | Signed extension of bit 47 of this register. |
| MISPRED   | 63         | R/O    | When set, indicates either the target of the branch was mispredicted and/or the direction (taken/non-taken) was mispredicted; otherwise, the target branch was predicted. |

#### Table 17-6 IA32\_LASTBRANCH\_x\_FROM\_IP
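Because bit 63 of this register carries MISPRED rather than address information, software that wants the full 64-bit "branch from" linear address has to rebuild it from the Data field. The C sketch below does this by sign-extending bit 47, which is what the SIGN_EXT field already encodes for bits 62:48; the function names are hypothetical, and the raw value is assumed to have been read from the MSR by privileged code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rebuild the 64-bit linear address from Data (bits 47:0) by
 * sign-extending bit 47 into bits 63:48. */
static uint64_t lbr_from_linear_address(uint64_t msr)
{
    uint64_t addr = msr & UINT64_C(0x0000FFFFFFFFFFFF);

    if (addr & (UINT64_C(1) << 47))
        addr |= UINT64_C(0xFFFF000000000000);
    return addr;
}

/* MISPRED (bit 63): set when the target or direction was mispredicted. */
static bool lbr_from_mispredicted(uint64_t msr)
{
    return (msr >> 63) & 1;
}
```

For a raw value of 80007FFF12345678H, the sketch returns the address 00007FFF12345678H and reports a misprediction.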

•••

#### 12. Updates to Chapter 18, Volume 3B

Change bars show changes to Chapter 18 of the Intel<sup>®</sup> 64 and IA-32 Architectures Software Developer's Manual, Volume 3B: System Programming Guide, Part 2.

\_\_\_\_\_

...

## 18.8.7 Intel<sup>®</sup> Xeon<sup>®</sup> Processor E5 Family Performance Monitoring Facility

The Intel<sup>®</sup> Xeon<sup>®</sup> processor E5 Family (and Intel<sup>®</sup> Core<sup>™</sup> i7-3930K Processor) are based on Intel microarchitecture code name Sandy Bridge. While the processor cores share the same microarchitecture as those of the Intel<sup>®</sup> Xeon<sup>®</sup> Processor E3 Family and second generation Intel Core i7-2xxx, Intel Core i5-2xxx, Intel Core i3-2xxx processor series, the uncore subsystems are different. An overview of the uncore performance monitoring facilities of the Intel Xeon processor E5 family (and Intel Core i7-3930K processor) is described in Section 18.8.8.

Thus, the performance monitoring facilities in the processor core generally are the same as those described in Section 18.8 through Section 18.8.5. However, the MSR\_OFFCORE\_RSP\_0/MSR\_OFFCORE\_RSP\_1 Response Supplier Info field shown in Table 18-26 applies to Intel Core Processors with CPUID signature of DisplayFamily\_DisplayModel encoding of 06\_2AH; next generation Intel Xeon processor with CPUID signature of DisplayFamily\_DisplayModel encoding of 06\_2DH supports an additional field for remote DRAM controller shown in Table 18-29. Additionally, there are some small differences in the non-architectural performance monitoring events (see Table 19-4).

| Subtype       | Bit Name | Offset | Description                                             |
|---------------|----------|--------|---------------------------------------------------------|
| Common        | Any      | 16     | (R/W). Catch all value for any response types.          |
| Supplier Info | NO_SUPP  | 17     | (R/W). No Supplier Information available                |
|               | LLC_HITM | 18     | (R/W). M-state initial lookup stat in L3.               |
|               | LLC_HITE | 19     | (R/W). E-state                                          |
|               | LLC_HITS | 20     | (R/W). S-state                                          |
|               | LLC_HITF | 21     | (R/W). F-state                                          |
|               | LOCAL    | 22     | (R/W). Local DRAM Controller                            |
|               | Remote   | 30:23  | (R/W). Remote DRAM Controller (either all 0s or all 1s) |

#### Table 18-29 MSR\_OFFCORE\_RSP\_x Supplier Info Field Definition for Next Generation Intel Xeon Processor
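The bit offsets in Table 18-29 can be written as masks when software composes the Supplier Info portion of an MSR\_OFFCORE\_RSP\_x value. The C sketch below is an illustration only: the macro and function names are hypothetical, and the validity check simply restates the table's note that the Remote field is either all 0s or all 1s.

```c
#include <stdbool.h>
#include <stdint.h>

/* Supplier Info bits from Table 18-29 (names are illustrative). */
#define OFFCORE_SUPP_NO_SUPP   (UINT64_C(1) << 17)
#define OFFCORE_SUPP_LLC_HITM  (UINT64_C(1) << 18)
#define OFFCORE_SUPP_LLC_HITE  (UINT64_C(1) << 19)
#define OFFCORE_SUPP_LLC_HITS  (UINT64_C(1) << 20)
#define OFFCORE_SUPP_LLC_HITF  (UINT64_C(1) << 21)
#define OFFCORE_SUPP_LOCAL     (UINT64_C(1) << 22)
#define OFFCORE_SUPP_REMOTE    (UINT64_C(0xFF) << 23)   /* bits 30:23 */

/* Mask selecting an L3 hit in any of the M, E, S, or F states. */
static uint64_t offcore_any_llc_hit(void)
{
    return OFFCORE_SUPP_LLC_HITM | OFFCORE_SUPP_LLC_HITE |
           OFFCORE_SUPP_LLC_HITS | OFFCORE_SUPP_LLC_HITF;
}

/* The Remote field must be programmed as all 0s or all 1s. */
static bool offcore_remote_field_valid(uint64_t rsp)
{
    uint64_t remote = (rsp >> 23) & 0xFF;
    return remote == 0 || remote == 0xFF;
}
```

A request for data supplied by either the local or a remote DRAM controller would combine `OFFCORE_SUPP_LOCAL` with `OFFCORE_SUPP_REMOTE`.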

## 18.8.8 Intel<sup>®</sup> Xeon<sup>®</sup> Processor E5 Family Uncore Performance Monitoring Facility

The uncore subsystem in the Intel Xeon processor E5 family based on Intel microarchitecture Sandy Bridge has some similarities with those of the Intel Xeon processor E7 family based on Intel microarchitecture Sandy Bridge. Within the uncore subsystem, localized performance counter sets are provided at logic control unit scope. For example, each C-Box caching agent has a set of local performance counters, and the power controller unit (PCU) has its own local performance counters. Up to 8 C-Box units are supported in the uncore subsystem.

Table 18-30 summarizes the uncore PMU facilities providing MSR interfaces.

| Box   | # of Boxes | Counters per Box | Counter Width | General Purpose | Global Enable | Sub-control MSRs |
|-------|------------|------------------|---------------|-----------------|---------------|------------------|
| C-Box | 8          | 4                | 44            | Yes             | per-box       | None             |
| PCU   | 1          | 4                | 48            | Yes             | per-box       | Match/Mask       |
| U-Box | 1          | 2                | 44            | Yes             | uncore        | None             |

#### Table 18-30 Uncore PMU MSR Summary for Intel® Xeon® Processor E5 Family



#### 13. Updates to Chapter 19, Volume 3B

Change bars show changes to Chapter 19 of the Intel<sup>®</sup> 64 and IA-32 Architectures Software Developer's Manual, Volume 3B: System Programming Guide, Part 2.

\_\_\_\_\_

...

## 19.3 PERFORMANCE MONITORING EVENTS FOR 2ND GENERATION INTEL<sup>®</sup> CORE<sup>™</sup> I7-2XXX, INTEL<sup>®</sup> CORE<sup>™</sup> I5-2XXX, INTEL<sup>®</sup> CORE<sup>™</sup> I3-2XXX PROCESSOR SERIES

Second generation Intel<sup>®</sup> Core<sup>™</sup> i7-2xxx, Intel<sup>®</sup> Core<sup>™</sup> i5-2xxx, Intel<sup>®</sup> Core<sup>™</sup> i3-2xxx processor series are based on the Intel microarchitecture code name Sandy Bridge. They support architectural performance-monitoring events listed in Table 19-1. Non-architectural performance-monitoring events in the processor core are listed in Table 19-3, Table 19-4, and Table 19-5. The events in Table 19-3 apply to processors with CPUID signature of DisplayFamily\_DisplayModel encoding with the following values: 06\_2AH and 06\_2DH. The events in Table 19-4 apply to processors with CPUID signature 06\_2AH. The events in Table 19-5 apply to processors with CPUID signature 06\_2DH.
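Selecting the applicable event tables starts with the DisplayFamily\_DisplayModel signature. The C sketch below derives it from the EAX value returned by CPUID leaf 01H, using the family and model combination rule from the CPUID instruction reference (that rule is not restated in this section, so it is an assumption of the sketch). The type `cpu_signature` and both function names are hypothetical, and the caller is assumed to supply EAX.

```c
#include <stdint.h>

typedef struct {
    unsigned family;   /* DisplayFamily */
    unsigned model;    /* DisplayModel */
} cpu_signature;

/* Derive DisplayFamily and DisplayModel from CPUID.01H:EAX. */
static cpu_signature decode_signature(uint32_t eax)
{
    unsigned family     = (eax >> 8) & 0xF;
    unsigned model      = (eax >> 4) & 0xF;
    unsigned ext_family = (eax >> 20) & 0xFF;
    unsigned ext_model  = (eax >> 16) & 0xF;
    cpu_signature s;

    s.family = (family == 0xF) ? family + ext_family : family;
    s.model  = (family == 0x6 || family == 0xF) ? (ext_model << 4) + model
                                                : model;
    return s;
}

/* Number of the table (4 for Table 19-4, 5 for Table 19-5) that applies
 * in addition to Table 19-3, or 0 if the signature is neither 06_2AH
 * nor 06_2DH. */
static int sandy_bridge_extra_event_table(cpu_signature s)
{
    if (s.family != 0x6)
        return 0;
    if (s.model == 0x2A)
        return 4;
    if (s.model == 0x2D)
        return 5;
    return 0;
}
```

For example, an EAX value of 000206A7H decodes to 06\_2AH, so the events of Table 19-3 and Table 19-4 apply.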

| Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment |
|------------|-------------|---------------------|-------------|---------|
| 03H | 01H | LD_BLOCKS.DATA_UNKNOWN | Blocked loads due to store buffer blocks with unknown data. | |
| 03H | 02H | LD_BLOCKS.STORE_FORWARD | Loads blocked by overlapping with store buffer that cannot be forwarded. | |
| 03H | 08H | LD_BLOCKS.NO_SR | # of Split loads blocked due to resource not available. | |
| 03H | 10H | LD_BLOCKS.ALL_BLOCK | Number of cases where any load is blocked but has no DCU miss. | |
| 05H | 01H | MISALIGN_MEM_REF.LOADS | Speculative cache-line split load uops dispatched to L1D. | |
| 05H | 02H | MISALIGN_MEM_REF.STORES | Speculative cache-line split Store-address uops dispatched to L1D. | |
| 07H | 01H | LD_BLOCKS_PARTIAL.ADDRESS_ALIAS | False dependencies in MOB due to partial compare on address. | |



| Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment |
|------------|-------------|---------------------|-------------|---------|
| 07H | 08H | LD_BLOCKS_PARTIAL.ALL_STA_BLOCK | The number of times that load operations are temporarily blocked because of older stores, with addresses that are not yet known. A load operation may incur more than one block of this type. | |
| 08H | 01H | DTLB_LOAD_MISSES.MISS_CAUSES_A_WALK | Misses in all TLB levels that cause a page walk of any page size. | |
| 08H | 02H | DTLB_LOAD_MISSES.WALK_COMPLETED | Misses in all TLB levels that caused page walk completed of any size. | |
| 08H | 04H | DTLB_LOAD_MISSES.WALK_DURATION | Cycle PMH is busy with a walk. | |
| 08H | 10H | DTLB_LOAD_MISSES.STLB_HIT | Number of cache load STLB hits. No page walk. | |
| 0DH | 03H | INT_MISC.RECOVERY_CYCLES | Cycles waiting to recover after Machine Clears or JEClear. Set Cmask = 1. | Set Edge to count occurrences |
| 0DH | 40H | INT_MISC.RAT_STALL_CYCLES | Cycles RAT external stall is sent to IDQ for this thread. | |
| 0EH | 01H | UOPS_ISSUED.ANY | Increments each cycle the # of Uops issued by the RAT to RS. Set Cmask = 1, Inv = 1, Any = 1 to count stalled cycles of this core. | Set Cmask = 1, Inv = 1 to count stalled cycles |
| 10H | 01H | FP_COMP_OPS_EXE.X87 | Counts number of X87 uops executed. | |
| 10H | 10H | FP_COMP_OPS_EXE.SSE_FP_PACKED_DOUBLE | Counts number of SSE* double precision FP packed uops executed. | |
| 10H | 20H | FP_COMP_OPS_EXE.SSE_FP_SCALAR_SINGLE | Counts number of SSE* single precision FP scalar uops executed. | |
| 10H | 40H | FP_COMP_OPS_EXE.SSE_PACKED_SINGLE | Counts number of SSE* single precision FP packed uops executed. | |
| 10H | 80H | FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE | Counts number of SSE* double precision FP scalar uops executed. | |
| 11H | 01H | SIMD_FP_256.PACKED_SINGLE | Counts 256-bit packed single-precision floating-point instructions | |
| 11H | 02H | SIMD_FP_256.PACKED_DOUBLE | Counts 256-bit packed double-precision floating-point instructions | |
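The Event Num., Umask Value, Cmask, Inv, and Edge settings named in these rows are programmed through an IA32\_PERFEVTSELx register. The C sketch below packs them into a register value using the architectural IA32\_PERFEVTSELx bit layout (event select in bits 7:0, unit mask in bits 15:8, USR 16, OS 17, Edge 18, AnyThread 21, EN 22, INV 23, Cmask in bits 31:24); that layout comes from the manual's description of architectural performance monitoring rather than from this table, and the type `perf_event` and function `perfevtsel_encode` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t event_num;   /* "Event Num." column */
    uint8_t umask;       /* "Umask Value" column */
    uint8_t cmask;       /* counter mask threshold */
    bool    usr;         /* count at privilege levels 1-3 */
    bool    os;          /* count at privilege level 0 */
    bool    edge;        /* count rising edges of the condition */
    bool    any_thread;  /* count across logical processors of the core */
    bool    inv;         /* invert the Cmask comparison */
} perf_event;

/* Pack the fields into an IA32_PERFEVTSELx value with EN (bit 22) set. */
static uint64_t perfevtsel_encode(perf_event e)
{
    uint64_t v = 0;

    v |= (uint64_t)e.event_num;
    v |= (uint64_t)e.umask << 8;
    v |= (uint64_t)e.usr << 16;
    v |= (uint64_t)e.os << 17;
    v |= (uint64_t)e.edge << 18;
    v |= (uint64_t)e.any_thread << 21;
    v |= UINT64_C(1) << 22;
    v |= (uint64_t)e.inv << 23;
    v |= (uint64_t)e.cmask << 24;
    return v;
}
```

For example, counting stalled cycles with UOPS_ISSUED.ANY (event 0EH, umask 01H, Cmask = 1, Inv = 1) in both user and OS mode encodes to 01C3010EH in this sketch.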



| Event<br>Num. | Umask<br>Value | Event Mask<br>Mnemonic          | Description                                                                                                             | Comment |
|---------------|----------------|---------------------------------|-------------------------------------------------------------------------------------------------------------------------|---------|
| 14H           | 01H            | ARITH.FPU_DIV_ACT<br>IVE        | Cycles that the divider is active,<br>includes INT and FP. Set 'edge =1,<br>cmask=1' to count the number of<br>divides. |         |
| 17H           | 01H            | INSTS_WRITTEN_TO<br>_IQ.INSTS   | Counts the number of instructions written into the IQ every cycle.                                                      |         |
| 24H           | 01H            | L2_RQSTS.DEMAND_<br>DATA_RD_HIT | Demand Data Read requests that<br>hit L2 cache                                                                          |         |
| 24H           | 03H            | L2_RQSTS.ALL_DEM<br>AND_DATA_RD | Counts any demand and L1 HW prefetch data load requests to L2.                                                          |         |
| 24H           | 04H            | L2_RQSTS.RFO_HITS               | Counts the number of store RFO requests that hit the L2 cache.                                                          |         |
| 24H           | 08H            | L2_RQSTS.RFO_MISS               | Counts the number of store RFO requests that miss the L2 cache.                                                         |         |
| 24H           | 0CH            | L2_RQSTS.ALL_RFO                | Counts all L2 store RFO requests.                                                                                       |         |
| 24H           | 10H            | L2_RQSTS.CODE_RD<br>_HIT        | Number of instruction fetches that hit the L2 cache.                                                                    |         |
| 24H           | 20H            | L2_RQSTS.CODE_RD<br>_MISS       | Number of instruction fetches that missed the L2 cache.                                                                 |         |
| 24H           | 30H            | L2_RQSTS.ALL_COD<br>E_RD        | Counts all L2 code requests.                                                                                            |         |
| 24H           | 40H            | L2_RQSTS.PF_HIT                 | Requests from L2 Hardware prefetcher that hit L2.                                                                       |         |
| 24H           | 80H            | L2_RQSTS.PF_MISS                | Requests from L2 Hardware<br>prefetcher that missed L2.                                                                 |         |
| 24H           | C0H            | L2_RQSTS.ALL_PF                 | Any requests from L2 Hardware<br>prefetchers                                                                            |         |
| 27H           | 01H            | L2_STORE_LOCK_RQ<br>STS.MISS    | RFOs that miss cache lines                                                                                              |         |
| 27H           | 04H            | L2_STORE_LOCK_RQ<br>STS.HIT_E   | RFOs that hit cache lines in E state                                                                                    |         |
| 27H           | 08H            | L2_STORE_LOCK_RQ<br>STS.HIT_M   | RFOs that hit cache lines in M state                                                                                    |         |
| 27H           | 0FH            | L2_STORE_LOCK_RQ<br>STS.ALL     | RFOs that access cache lines in any state                                                                               |         |
| 28H           | 04H            | L2_L1D_WB_RQSTS.<br>HIT_E       | Not rejected writebacks from L1D to L2 cache lines in E state.                                                          |         |
| 28H           | 08H            | L2_L1D_WB_RQSTS.<br>HIT_M       | Not rejected writebacks from L1D to L2 cache lines in M state.                                                          |         |
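The Event Num., Umask Value, Cmask, and Edge settings named in these rows are fields of an IA32_PERFEVTSELx register. The following minimal sketch packs them into a register value, assuming the architectural performance-monitoring bit layout (event select in bits 7:0, umask in 15:8, USR 16, OS 17, Edge 18, EN 22, INV 23, Cmask in 31:24), which is defined elsewhere in the manual and not in this table. The helper name `perfevtsel` and its defaults are hypothetical.

```python
def perfevtsel(event, umask, *, usr=True, os=True, edge=False,
               inv=False, cmask=0, enable=True):
    """Pack fields into a 32-bit IA32_PERFEVTSELx value (hypothetical helper)."""
    for name, field in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= field <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits")
    value = event | (umask << 8) | (cmask << 24)
    value |= int(usr) << 16      # count in user mode
    value |= int(os) << 17       # count in kernel mode
    value |= int(edge) << 18     # edge detect
    value |= int(enable) << 22   # enable the counter
    value |= int(inv) << 23      # invert the Cmask comparison
    return value


# ARITH.FPU_DIV_ACTIVE (14H/01H) with edge=1, cmask=1 counts divides
# instead of divider-active cycles, as the description suggests.
print(hex(perfevtsel(0x14, 0x01, edge=True, cmask=1)))  # 0x1470114

# The "ALL" umasks listed for event 24H are the OR of the hit and miss umasks.
RFO_HITS, RFO_MISS = 0x04, 0x08
print(hex(perfevtsel(0x24, RFO_HITS | RFO_MISS)))  # 0x430c24
```

Writing the resulting value to a model-specific register requires privileged access and is outside the scope of this sketch.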



| Event<br>Num. | Umask<br>Value | Event Mask<br>Mnemonic                       | Description                                                                                                                                                                                                                                                | Comment                                                                           |
|---------------|----------------|----------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------|
| 2EH           | 4FH            | LONGEST_LAT_CACH<br>E.REFERENCE              | This event counts requests<br>originating from the core that<br>reference a cache line in the last<br>level cache.                                                                                                                                         | see Table 19-1                                                                    |
| 2EH           | 41H            | LONGEST_LAT_CACH<br>E.MISS                   | This event counts each cache miss<br>condition for references to the last<br>level cache.                                                                                                                                                                  | see Table 19-1                                                                    |
| 3CH           | 00H            | CPU_CLK_UNHALTED<br>.THREAD_P                | Counts the number of thread cycles<br>while the thread is not in a halt<br>state. The thread enters the halt<br>state when it is running the HLT<br>instruction. The core frequency may<br>change from time to time due to<br>power or thermal throttling. | see Table 19-1                                                                    |
| 3CH           | 01H            | CPU_CLK_THREAD_<br>UNHALTED.REF_XCL<br>K     | Increments at the frequency of XCLK (100 MHz) when not halted.                                                                                                                                                                                             | see Table 19-1                                                                    |
| 48H           | 01H            | L1D_PEND_MISS.PE<br>NDING                    | Increments the number of<br>outstanding L1D misses every cycle.<br>Set Cmask = 1 and Edge = 1 to count<br>occurrences.                                                                                                                                     | Counter 2 only;<br>Set Cmask = 1 to<br>count cycles.                              |
| 49H           | 01H            | DTLB_STORE_MISSE<br>S.MISS_CAUSES_A_<br>WALK | Miss in all TLB levels causes a page<br>walk of any page size (4K/2M/4M/<br>1G).                                                                                                                                                                           |                                                                                   |
| 49H           | 02H            | DTLB_STORE_MISSE<br>S.WALK_COMPLETED         | Miss in all TLB levels causes a page<br>walk of any page size (4K/2M/4M/1G)<br>that completes.                                                                                                                                                             |                                                                                   |
| 49H           | 04H            | DTLB_STORE_MISSE<br>S.WALK_DURATION          | Cycles PMH is busy with this walk.                                                                                                                                                                                                                         |                                                                                   |
| 49H           | 10H            | DTLB_STORE_MISSE<br>S.STLB_HIT               | Store operations that miss the first<br>TLB level but hit the second and do<br>not cause page walks                                                                                                                                                        |                                                                                   |
| 4CH           | 01H            | LOAD_HIT_PRE.SW_<br>PF                       | Not SW-prefetch load dispatches<br>that hit fill buffer allocated for S/W<br>prefetch.                                                                                                                                                                     |                                                                                   |
| 4CH           | 02H            | LOAD_HIT_PRE.HW_<br>PF                       | Not SW-prefetch load dispatches that hit fill buffer allocated for H/W prefetch.                                                                                                                                                                           |                                                                                   |
| 4EH           | 02H            | HW_PRE_REQ.DL1_<br>MISS                      | Hardware Prefetch requests that<br>miss the L1D cache. A request is<br>counted each time it accesses<br>the cache and misses it, including if a<br>block is applicable or if it hits the Fill<br>Buffer, for example.                                      | This accounts for<br>both L1 streamer<br>and IP-based<br>(IPP) HW<br>prefetchers. |



| Event<br>Num. | Umask<br>Value | Event Mask<br>Mnemonic                              | Description                                                                                              | Comment                         |
|---------------|----------------|-----------------------------------------------------|----------------------------------------------------------------------------------------------------------|---------------------------------|
| 51H           | 01H            | L1D.REPLACEMENT                                     | Counts the number of lines brought into the L1 data cache.                                               |                                 |
| 51H           | 02H            | L1D.ALLOCATED_IN_<br>M                              | Counts the number of allocations of<br>modified L1D cache lines.                                         |                                 |
| 51H           | 04H            | L1D.EVICTION                                        | Counts the number of modified lines<br>evicted from the L1 data cache due<br>to replacement.             |                                 |
| 51H           | 08H            | L1D.ALL_M_REPLAC<br>EMENT                           | Cache lines in M state evicted out of<br>L1D due to Snoop HitM or dirty line<br>replacement              |                                 |
| 59H           | 20H            | PARTIAL_RAT_STALL<br>S.FLAGS_MERGE_UO<br>P          | Increments the number of flags-<br>merge uops in flight each cycle.<br>Set Cmask = 1 to count cycles.    |                                 |
| 59H           | 40H            | PARTIAL_RAT_STALL<br>S.SLOW_LEA_WINDO<br>W          | Cycles with at least one slow LEA<br>uop allocated.                                                      |                                 |
| 59H           | 80H            | PARTIAL_RAT_STALL<br>S.MUL_SINGLE_UOP               | Number of Multiply packed/scalar single precision uops allocated.                                        |                                 |
| 5BH           | 0CH            | RESOURCE_STALLS2.<br>ALL_FL_EMPTY                   | Cycles stalled due to free list empty                                                                    |                                 |
| 5BH           | 0FH            | RESOURCE_STALLS2.<br>ALL_PRF_CONTROL                | Cycles stalled due to control structures full for physical registers                                     |                                 |
| 5BH           | 40H            | RESOURCE_STALLS2.<br>BOB_FULL                       | Cycles Allocator is stalled due to<br>the Branch Order Buffer.                                           |                                 |
| 5BH           | 4FH            | RESOURCE_STALLS2.<br>OOO_RSRC                       | Cycles stalled due to out of order resources full                                                        |                                 |
| 5CH           | 01H            | CPL_CYCLES.RING0                                    | Unhalted core cycles when the thread is in ring 0                                                        | Use Edge to<br>count transitions |
| 5CH           | 02H            | CPL_CYCLES.RING12<br>3                              | Unhalted core cycles when the<br>thread is not in ring 0                                                 |                                 |
| 5EH           | 01H            | RS_EVENTS.EMPTY_<br>CYCLES                          | Cycles the RS is empty for the thread.                                                                   |                                 |
| 60H           | 01H            | OFFCORE_REQUEST<br>S_OUTSTANDING.DE<br>MAND_DATA_RD | Offcore outstanding Demand Data<br>Read transactions in SQ to uncore.<br>Set Cmask=1 to count cycles.    |                                 |
| 60H           | 04H            | OFFCORE_REQUEST<br>S_OUTSTANDING.DE<br>MAND_RFO     | Offcore outstanding RFO store<br>transactions in SQ to uncore. Set<br>Cmask=1 to count cycles.           |                                 |
| 60H           | 08H            | OFFCORE_REQUEST<br>S_OUTSTANDING.AL<br>L_DATA_RD    | Offcore outstanding cacheable data<br>read transactions in SQ to uncore.<br>Set Cmask=1 to count cycles. |                                 |
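Several descriptions above distinguish between accumulating a per-cycle quantity and counting cycles ("Set Cmask=1 to count cycles"), and some rows also use Edge or INV. The sketch below is a simplified software model of those counter qualifiers, not a description of the hardware: it assumes the event reports an increment each cycle, that a nonzero Cmask turns the counter into a count of cycles whose increment is at least Cmask, that INV flips that comparison, and that Edge counts only rising edges of the resulting condition. The function name `simulate_counter` and the sample data are hypothetical.

```python
def simulate_counter(per_cycle, cmask=0, inv=False, edge=False):
    """Model a counter fed with one increment value per cycle (simplified)."""
    if cmask == 0:
        # No threshold: accumulate the raw per-cycle increments.
        # (Edge and INV are ignored in this simplified model when cmask is 0.)
        return sum(per_cycle)
    total = 0
    previous = False
    for increment in per_cycle:
        condition = increment < cmask if inv else increment >= cmask
        if edge:
            total += int(condition and not previous)  # rising edges only
        else:
            total += int(condition)                   # one count per cycle
        previous = condition
    return total


# Hypothetical number of outstanding demand data reads in each of 8 cycles.
outstanding = [0, 2, 3, 1, 0, 0, 1, 0]
print(simulate_counter(outstanding))                       # 7 (occupancy sum)
print(simulate_counter(outstanding, cmask=1))              # 4 (busy cycles)
print(simulate_counter(outstanding, cmask=1, edge=True))   # 2 (busy periods)
print(simulate_counter(outstanding, cmask=1, inv=True))    # 4 (idle cycles)
```

Under this model, dividing the occupancy sum by the number of busy cycles gives the average number of outstanding requests while at least one request is pending.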



| Event<br>Num. | Umask<br>Value | Event Mask<br>Mnemonic                          | Description                                                                                                                                                                        | Comment                                  |
|---------------|----------------|-------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------|
| 63H           | 01H            | LOCK_CYCLES.SPLIT_<br>LOCK_UC_LOCK_DUR<br>ATION | Cycles in which the L1D and L2 are locked, due to a UC lock or split lock.                                                                                                         |                                          |
| 63H           | 02H            | LOCK_CYCLES.CACHE<br>_LOCK_DURATION             | Cycles in which the L1D is locked.                                                                                                                                                 |                                          |
| 79H           | 02H            | IDQ.EMPTY                                       | Counts cycles the IDQ is empty.                                                                                                                                                    |                                          |
| 79H           | 04H            | IDQ.MITE_UOPS                                   | Increments each cycle by the number of uops<br>delivered to IDQ from MITE path.<br>Set Cmask = 1 to count cycles.                                                                  | Can combine<br>Umask 04H and<br>20H      |
| 79H           | 08H            | IDQ.DSB_UOPS                                    | Increments each cycle by the number of uops<br>delivered to IDQ from DSB path.<br>Set Cmask = 1 to count cycles.                                                                   | Can combine<br>Umask 08H and<br>10H      |
| 79H           | 10H            | IDQ.MS_DSB_UOPS                                 | Increments each cycle by the number of uops<br>delivered to IDQ when MS busy by<br>DSB. Set Cmask = 1 to count cycles<br>MS is busy. Set Cmask = 1 and Edge<br>= 1 to count MS activations. | Can combine<br>Umask 08H and<br>10H      |
| 79H           | 20H            | IDQ.MS_MITE_UOPS                                | Increments each cycle by the number of uops<br>delivered to IDQ when MS is busy by<br>MITE. Set Cmask = 1 to count cycles.                                                         | Can combine<br>Umask 04H and<br>20H      |
| 79H           | 30H            | IDQ.MS_UOPS                                     | Increments each cycle by the number of uops<br>delivered to IDQ from MS by either<br>DSB or MITE. Set Cmask = 1 to count<br>cycles.                                                | Can combine<br>Umask 04H, 08H<br>and 30H |
| 80H           | 02H            | ICACHE.MISSES                                   | Number of Instruction Cache,<br>Streaming Buffer and Victim Cache<br>Misses. Includes UC accesses.                                                                                 |                                          |
| 85H           | 01H            | ITLB_MISSES.MISS_C<br>AUSES_A_WALK              | Misses in all ITLB levels that cause page walks                                                                                                                                    |                                          |
| 85H           | 02H            | ITLB_MISSES.WALK_<br>COMPLETED                  | Misses in all ITLB levels that cause completed page walks                                                                                                                          |                                          |
| 85H           | 04H            | ITLB_MISSES.WALK_<br>DURATION                   | Cycle PMH is busy with a walk.                                                                                                                                                     |                                          |
| 85H           | 10H            | ITLB_MISSES.STLB_H<br>IT                        | Number of cache load STLB hits. No page walk.                                                                                                                                      |                                          |
| 87H           | 01H            | ILD_STALL.LCP                                   | Stalls caused by changing prefix length of the instruction.                                                                                                                        |                                          |
| 87H           | 04H            | ILD_STALL.IQ_FULL                               | Stall cycles due to IQ is full.                                                                                                                                                    |                                          |
| 88H           | 01H            | BR_INST_EXEC.COND                               | Qualify conditional near branch<br>instructions executed, but not<br>necessarily retired.                                                                                          | Must combine<br>with umask 40H,<br>80H   |



| Event<br>Num. | Umask<br>Value | Event Mask<br>Mnemonic                         | Description                                                                                                  | Comment                                |
|---------------|----------------|------------------------------------------------|--------------------------------------------------------------------------------------------------------------|----------------------------------------|
| 88H           | 02H            | BR_INST_EXEC.DIRE<br>CT_JMP                    | Qualify all unconditional near branch<br>instructions excluding calls and<br>indirect branches.              | Must combine<br>with umask 80H         |
| 88H           | 04H            | BR_INST_EXEC.INDIR<br>ECT_JMP_NON_CALL<br>_RET | Qualify executed indirect near<br>branch instructions that are not<br>calls nor returns.                     | Must combine<br>with umask 80H         |
| 88H           | 08H            | BR_INST_EXEC.RETU<br>RN_NEAR                   | Qualify indirect near branches that have a return mnemonic.                                                  | Must combine<br>with umask 80H         |
| 88H           | 10H            | BR_INST_EXEC.DIRE<br>CT_NEAR_CALL              | Qualify unconditional near call<br>branch instructions, excluding non<br>call branch, executed.              | Must combine<br>with umask 80H         |
| 88H           | 20H            | BR_INST_EXEC.INDIR<br>ECT_NEAR_CALL            | Qualify indirect near calls, including<br>both register and memory indirect,<br>executed.                    | Must combine<br>with umask 80H         |
| 88H           | 40H            | BR_INST_EXEC.NON<br>TAKEN                      | Qualify non-taken near branches executed.                                                                    | Applicable to<br>umask 01H only        |
| 88H           | 80H            | BR_INST_EXEC.TAKE<br>N                         | Qualify taken near branches<br>executed. Must combine with<br>01H, 02H, 04H, 08H, 10H, 20H                   |                                        |
| 88H           | FFH            | BR_INST_EXEC.ALL_<br>BRANCHES                  | Counts all near executed branches (not necessarily retired).                                                 |                                        |
| 89H           | 01H            | BR_MISP_EXEC.CON<br>D                          | Qualify conditional near branch<br>instructions mispredicted.                                                | Must combine<br>with umask 40H,<br>80H |
| 89H           | 04H            | BR_MISP_EXEC.INDIR<br>ECT_JMP_NON_CALL<br>_RET | Qualify mispredicted indirect near<br>branch instructions that are not<br>calls nor returns.                 | Must combine<br>with umask 80H         |
| 89H           | 08H            | BR_MISP_EXEC.RETU<br>RN_NEAR                   | Qualify mispredicted indirect near<br>branches that have a return<br>mnemonic.                               | Must combine<br>with umask 80H         |
| 89H           | 10H            | BR_MISP_EXEC.DIRE<br>CT_NEAR_CALL              | Qualify mispredicted unconditional<br>near call branch instructions,<br>excluding non call branch, executed. | Must combine<br>with umask 80H         |
| 89H           | 20H            | BR_MISP_EXEC.INDIR<br>ECT_NEAR_CALL            | Qualify mispredicted indirect near<br>calls, including both register and<br>memory indirect, executed.       | Must combine<br>with umask 80H         |
| 89H           | 40H            | BR_MISP_EXEC.NON<br>TAKEN                      | Qualify mispredicted non-taken<br>near branches executed.                                                    | Applicable to<br>umask 01H only        |
| 89H           | 80H            | BR_MISP_EXEC.TAKE<br>N                         | Qualify mispredicted taken near<br>branches executed. Must combine<br>with 01H, 02H, 04H, 08H, 10H, 20H      |                                        |
| 89H           | FFH            | BR_MISP_EXEC.ALL_<br>BRANCHES                  | Counts all near executed branches (not necessarily retired).                                                 |                                        |
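The Comment column for events 88H and 89H describes how the branch-type umasks (01H through 20H) pair with the direction umasks (40H non-taken, 80H taken). The sketch below checks a combined umask against one reading of those comments: at least one type bit and one direction bit must be set, and 40H on its own is meaningful only with the conditional type 01H. The function name `check_branch_umask` is hypothetical, and the rules are an interpretation of the table, not an authoritative validity check.

```python
TYPE_MASK = 0x3F        # 01H..20H: branch-type qualifiers
COND = 0x01             # conditional near branches
NONTAKEN = 0x40
TAKEN = 0x80


def check_branch_umask(umask):
    """Return a list of problems with a BR_INST_EXEC/BR_MISP_EXEC umask."""
    types = umask & TYPE_MASK
    direction = umask & (NONTAKEN | TAKEN)
    problems = []
    if types == 0:
        problems.append("no branch-type bit (01H-20H) set")
    if direction == 0:
        problems.append("no direction bit (40H or 80H) set")
    if direction == NONTAKEN and types & ~COND:
        problems.append("40H applies to the conditional type (01H) only")
    return problems


print(check_branch_umask(0x81))  # [] : taken conditional branches
print(check_branch_umask(0x41))  # [] : non-taken conditional branches
print(check_branch_umask(0xFF))  # [] : ALL_BRANCHES
print(check_branch_umask(0x02))  # ['no direction bit (40H or 80H) set']
print(check_branch_umask(0x42))  # ['40H applies to the conditional type (01H) only']
```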



| Event<br>Num. | Umask<br>Value | Event Mask<br>Mnemonic              | Description                                                                                          | Comment                         |
|---------------|----------------|-------------------------------------|------------------------------------------------------------------------------------------------------|---------------------------------|
| 9CH           | 01H            | IDQ_UOPS_NOT_DEL<br>IVERED.CORE     | Count number of non-delivered<br>uops to RAT per thread.                                             | Use Cmask to<br>qualify uop b/w |
| A1H           | 01H            | UOPS_DISPATCHED_<br>PORT.PORT_0     | Cycles in which a uop is dispatched on port 0.                                                       |                                 |
| A1H           | 02H            | UOPS_DISPATCHED_<br>PORT.PORT_1     | Cycles in which a uop is dispatched on port 1.                                                       |                                 |
| A1H           | 04H            | UOPS_DISPATCHED_<br>PORT.PORT_2_LD  | Cycles in which a load uop is dispatched on port 2.                                                  |                                 |
| A1H           | 08H            | UOPS_DISPATCHED_<br>PORT.PORT_2_STA | Cycles in which a store address uop is dispatched on port 2.                                         |                                 |
| A1H           | 0CH            | UOPS_DISPATCHED_<br>PORT.PORT_2     | Cycles in which a uop is dispatched on port 2.                                                       |                                 |
| A1H           | 10H            | UOPS_DISPATCHED_<br>PORT.PORT_3_LD  | Cycles in which a load uop is dispatched on port 3.                                                  |                                 |
| A1H           | 20H            | UOPS_DISPATCHED_<br>PORT.PORT_3_STA | Cycles in which a store address uop is dispatched on port 3.                                         |                                 |
| A1H           | 30H            | UOPS_DISPATCHED_<br>PORT.PORT_3     | Cycles in which a uop is dispatched on port 3.                                                       |                                 |
| A1H           | 40H            | UOPS_DISPATCHED_<br>PORT.PORT_4     | Cycles in which a uop is dispatched on port 4.                                                       |                                 |
| A1H           | 80H            | UOPS_DISPATCHED_<br>PORT.PORT_5     | Cycles in which a uop is dispatched on port 5.                                                       |                                 |
| A2H           | 01H            | RESOURCE_STALLS.<br>ANY             | Cycles Allocation is stalled due to Resource Related reason.                                         |                                 |
| A2H           | 02H            | RESOURCE_STALLS.L<br>B              | Counts the cycles of stall due to lack of load buffers.                                              |                                 |
| A2H           | 04H            | RESOURCE_STALLS.R<br>S              | Cycles stalled due to no eligible RS entry available.                                                |                                 |
| A2H           | 08H            | RESOURCE_STALLS.S<br>B              | Cycles stalled due to no store<br>buffers available (not including<br>draining from sync).           |                                 |
| A2H           | 10H            | RESOURCE_STALLS.R<br>OB             | Cycles stalled due to re-order buffer full.                                                          |                                 |
| A2H           | 20H            | RESOURCE_STALLS.F<br>CSW            | Cycles stalled due to writing the FPU control word.                                                  |                                 |
| A2H           | 40H            | RESOURCE_STALLS.<br>MXCSR           | Cycles stalled due to the MXCSR<br>register rename occurring too close<br>to a previous MXCSR rename. |                                 |
| A2H           | 80H            | RESOURCE_STALLS.<br>OTHER           | Cycles stalled due to other resource issues.                                                         |                                 |
| ABH           | 01H            | DSB2MITE_SWITCHE<br>S.COUNT         | Number of DSB to MITE switches.                                                                      |                                 |
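For event A1H, the composite umasks in the table (0CH for port 2, 30H for port 3) are the bitwise OR of the single-bit load and store-address umasks. The sketch below decodes a umask into the mnemonic suffixes it covers. The function name `decode_port_umask` is hypothetical, and the table does not say whether combinations other than the listed ones are supported.

```python
# Single-bit umasks for UOPS_DISPATCHED_PORT (event A1H), as listed in the table.
PORT_BITS = {
    0x01: "PORT_0",
    0x02: "PORT_1",
    0x04: "PORT_2_LD",
    0x08: "PORT_2_STA",
    0x10: "PORT_3_LD",
    0x20: "PORT_3_STA",
    0x40: "PORT_4",
    0x80: "PORT_5",
}


def decode_port_umask(umask):
    """List the single-bit sub-events selected by a UOPS_DISPATCHED_PORT umask."""
    return [name for bit, name in PORT_BITS.items() if umask & bit]


print(decode_port_umask(0x0C))  # ['PORT_2_LD', 'PORT_2_STA']
print(decode_port_umask(0x30))  # ['PORT_3_LD', 'PORT_3_STA']
```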



| Event<br>Num. | Umask<br>Value | Event Mask<br>Mnemonic               | Description                                                                                                                                                                                                                                                                          | Comment                              |
|---------------|----------------|--------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------|
| ABH           | 02H            | DSB2MITE_SWITCHE<br>S.PENALTY_CYCLES | Cycles DSB to MITE switches caused delay.                                                                                                                                                                                                                                            |                                      |
| ACH           | 02H            | DSB_FILL.OTHER_CA<br>NCEL            | Cases of cancelling valid DSB fill not because of exceeding way limit                                                                                                                                                                                                                |                                      |
| ACH           | 08H            | DSB_FILL.EXCEED_D<br>SB_LINES        | DSB Fill encountered > 3 DSB lines.                                                                                                                                                                                                                                                  |                                      |
| ACH           | OAH            | DSB_FILL.ALL_CANC<br>EL              | Cases of cancelling valid Decode<br>Stream Buffer (DSB) fill not because<br>of exceeding way limit                                                                                                                                                                                   |                                      |
| AEH           | 01H            | ITLB.ITLB_FLUSH                      | Counts the number of ITLB flushes, includes 4k/2M/4M pages.                                                                                                                                                                                                                          |                                      |
| B0H           | 01H            | OFFCORE_REQUEST<br>S.DEMAND_DATA_RD  | Demand data read requests sent to uncore.                                                                                                                                                                                                                                            |                                      |
| B0H           | 04H            | OFFCORE_REQUEST<br>S.DEMAND_RFO      | Demand RFO read requests sent to<br>uncore, including regular RFOs,<br>locks, and ItoM.                                                                                                                                                                                              |                                      |
| B0H           | 08H            | OFFCORE_REQUEST<br>S.ALL_DATA_RD     | Data read requests sent to uncore (demand and prefetch).                                                                                                                                                                                                                             |                                      |
| B1H           | 01H            | UOPS_DISPATCHED.T<br>HREAD           | Counts total number of uops to be dispatched per-thread each cycle. Set Cmask = 1, INV =1 to count stall cycles.                                                                                                                                                                     |                                      |
| B1H           | 02H            | UOPS_DISPATCHED.C<br>ORE             | Counts total number of uops to be dispatched per-core each cycle.                                                                                                                                                                                                                    | Do not need to set ANY               |
| B2H           | 01H            | OFFCORE_REQUEST<br>S_BUFFER.SQ_FULL  | Offcore requests buffer cannot take more entries for this thread core.                                                                                                                                                                                                               |                                      |
| B6H           | 01H            | AGU_BYPASS_CANCE<br>L.COUNT          | Counts executed load operations<br>with all the following traits: 1.<br>addressing of the format [base +<br>offset], 2. the offset is between 1<br>and 2047, 3. the address specified<br>in the base register is in one page<br>and the address [base+offset] is in<br>another page. |                                      |
| B7H           | 01H            | OFF_CORE_RESPONS<br>E_0              | See Section 18.8.5, "Off-core<br>Response Performance Monitoring".<br>PMC0 only. | Requires<br>programming<br>MSR 01A6H |
| BBH           | 01H            | OFF_CORE_RESPONS<br>E_1              | See Section 18.8.5, "Off-core<br>Response Performance Monitoring".<br>PMC3 only.                                                                                                                                                                                                     | Requires<br>programming<br>MSR 01A7H |
| BDH           | 01H            | TLB_FLUSH.DTLB_T<br>HREAD            | DTLB flush attempts of the thread-<br>specific entries                                                                                                                                                                                                                               |                                      |



| Event<br>Num. | Umask<br>Value | Event Mask<br>Mnemonic              | Description                                                                                                                           | Comment                                   |
|---------------|----------------|-------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------|
| BDH           | 20H            | TLB_FLUSH.STLB_A<br>NY              | Count number of STLB flush<br>attempts                                                                                                |                                           |
| BFH           | 05H            | L1D_BLOCKS.BANK_<br>CONFLICT_CYCLES | Cycles when dispatched loads are<br>cancelled due to L1D bank conflicts<br>with other load ports                                      | cmask=1                                   |
| C0H           | 00H            | INST_RETIRED.ANY_<br>P              | Number of instructions at retirement                                                                                                  | See Table 19-1                            |
| C0H           | 01H            | INST_RETIRED.PREC<br>_DIST          | Precise instruction retired event<br>with HW to reduce effect of PEBS<br>shadow in IP distribution                                    | PMC1 only; Must<br>quiesce other<br>PMCs. |
| C1H           | 02H            | OTHER_ASSISTS.ITL<br>B_MISS_RETIRED | Instructions that experienced an ITLB miss.                                                                                           |                                           |
| C1H           | 08H            | OTHER_ASSISTS.AVX<br>_STORE         | Number of assists associated with 256-bit AVX store operations.                                                                       |                                           |
| C1H           | 10H            | OTHER_ASSISTS.AVX<br>_TO_SSE        | Number of transitions from AVX-<br>256 to legacy SSE when penalty<br>applicable.                                                      |                                           |
| C1H           | 20H            | OTHER_ASSISTS.SSE<br>_TO_AVX        | Number of transitions from SSE to AVX-256 when penalty applicable.                                                                    |                                           |
| C2H           | 01H            | UOPS_RETIRED.ALL                    | Counts the number of micro-ops<br>retired. Use cmask=1 and invert to<br>count active cycles or stalled cycles.                        | Supports PEBS                             |
| C2H           | 02H            | UOPS_RETIRED.RETI<br>RE_SLOTS       | Counts the number of retirement slots used each cycle.                                                                                |                                           |
| C3H           | 02H            | MACHINE_CLEARS.M<br>EMORY_ORDERING  | Counts the number of machine<br>clears due to memory order<br>conflicts.                                                              |                                           |
| C3H           | 04H            | MACHINE_CLEARS.S<br>MC              | Counts the number of times that a program writes to a code section.                                                                   |                                           |
| C3H           | 20H            | MACHINE_CLEARS.M<br>ASKMOV          | Counts the number of executed<br>AVX masked load operations that<br>refer to an illegal address range<br>with the mask bits set to 0. |                                           |
| C4H           | 00H            | BR_INST_RETIRED.A<br>LL_BRANCHES    | Branch instructions at retirement                                                                                                     | See Table 19-1                            |
| C4H           | 01H            | BR_INST_RETIRED.C<br>ONDITIONAL     | Counts the number of conditional branch instructions retired.                                                                         | Supports PEBS                             |
| C4H           | 02H            | BR_INST_RETIRED.N<br>EAR_CALL       | Direct and indirect near call<br>instructions retired.                                                                                |                                           |
| C4H           | 04H            | BR_INST_RETIRED.A<br>LL_BRANCHES    | Counts the number of branch instructions retired.                                                                                     |                                           |



#### Table 19-3 Non-Architectural Performance Events In the Processor Core common to Intel® Core™ i7-2xxx, Intel® Core™ i5-2xxx, Intel® Core™ i3-2xxx Processor Series and Intel® Xeon® Processor E5 Family

| Event<br>Num. | Umask<br>Value | Event Mask<br>Mnemonic              | Description                                                                              | Comment                           |
|---------------|----------------|-------------------------------------|------------------------------------------------------------------------------------------|-----------------------------------|
| C4H           | 08H            | BR_INST_RETIRED.N<br>EAR_RETURN     | Counts the number of near return instructions retired.                                   |                                   |
| C4H           | 10H            | BR_INST_RETIRED.N<br>OT_TAKEN       | Counts the number of not taken branch instructions retired.                              |                                   |
| C4H           | 20H            | BR_INST_RETIRED.N<br>EAR_TAKEN      | Number of near taken branches retired.                                                   |                                   |
| C4H           | 40H            | BR_INST_RETIRED.F<br>AR_BRANCH      | Number of far branches retired.                                                          |                                   |
| C5H           | 00H            | BR_MISP_RETIRED.A<br>LL_BRANCHES    | Mispredicted branch instructions at retirement                                           | See Table 19-1                    |
| C5H           | 01H            | BR_MISP_RETIRED.C<br>ONDITIONAL     | Mispredicted conditional branch<br>instructions retired.                                 | Supports PEBS                     |
| C5H           | 02H            | BR_MISP_RETIRED.N<br>EAR_CALL       | Direct and indirect mispredicted<br>near call instructions retired.                      |                                   |
| C5H           | 04H            | BR_MISP_RETIRED.A<br>LL_BRANCHES    | Mispredicted macro branch<br>instructions retired.                                       |                                   |
| C5H           | 10H            | BR_MISP_RETIRED.N<br>OT_TAKEN       | Mispredicted not taken branch<br>instructions retired.                                   |                                   |
| C5H           | 20H            | BR_MISP_RETIRED.T<br>AKEN           | Mispredicted taken branch<br>instructions retired.                                       |                                   |
| CAH           | 02H            | FP_ASSIST.X87_OUT<br>PUT            | Number of X87 assists due to<br>output value.                                            |                                   |
| CAH           | 04H            | FP_ASSIST.X87_INP<br>UT             | Number of X87 assists due to input value.                                                |                                   |
| CAH           | 08H            | FP_ASSIST.SIMD_OU<br>TPUT           | Number of SIMD FP assists due to<br>output values                                        |                                   |
| CAH           | 10H            | FP_ASSIST.SIMD_INP<br>UT            | Number of SIMD FP assists due to<br>input values                                         |                                   |
| CAH           | 1EH            | FP_ASSIST.ANY                       | Cycles with any input/output SSE*<br>or FP assists                                       |                                   |
| CCH           | 20H            | ROB_MISC_EVENTS.L<br>BR_INSERTS     | Count cases of saving new LBR records by hardware.                                       |                                   |
| CDH           | 01H            | MEM_TRANS_RETIR<br>ED.LOAD_LATENCY  | Sample loads with specified latency threshold. PMC3 only.                                | Specify threshold<br>in MSR 0x3F6 |
| CDH           | 02H            | MEM_TRANS_RETIR<br>ED.PRECISE_STORE | Sample stores and collect precise<br>store operation via PEBS record.<br>PMC3 only.      | See Section<br>18.8.4.3           |
| D0H           | 01H            | MEM_UOP_RETIRED.<br>LOADS           | Qualify retired memory uops that<br>are loads. Combine with umask 10H,<br>20H, 40H, 80H. | Supports PEBS                     |




| Event<br>Num. | Umask<br>Value | Event Mask<br>Mnemonic                          | Description                                                                                                                                            | Comment        |
|---------------|----------------|-------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|----------------|
| D0H           | 02H            | MEM_UOP_RETIRED.<br>STORES                      | Qualify retired memory uops that<br>are stores. Combine with umask<br>10H, 20H, 40H, 80H.                                                              |                |
| D0H           | 10H            | MEM_UOP_RETIRED.<br>STLB_MISS                   | Qualify retired memory uops with<br>STLB miss. Must combine with<br>umask 01H, 02H, to produce counts.                                                 |                |
| D0H           | 20H            | MEM_UOP_RETIRED.<br>LOCK                        | Qualify retired memory uops with<br>lock. Must combine with umask 01H,<br>02H, to produce counts.                                                      |                |
| D0H           | 40H            | MEM_UOP_RETIRED.<br>SPLIT                       | Qualify retired memory uops with<br>line split. Must combine with umask<br>01H, 02H, to produce counts.                                                |                |
| D0H           | 80H            | MEM_UOP_RETIRED.<br>ALL                         | Qualify any retired memory uops.<br>Must combine with umask 01H,<br>02H, to produce counts.                                                            |                |
| D1H           | 01H            | MEM_LOAD_UOPS_R<br>ETIRED.L1_HIT                | Retired load uops with L1 cache hits as data sources.                                                                                                  | Supports PEBS  |
| D1H           | 02H            | MEM_LOAD_UOPS_R<br>ETIRED.L2_HIT                | Retired load uops with L2 cache hits as data sources.                                                                                                  |                |
| D1H           | 40H            | MEM_LOAD_UOPS_R<br>ETIRED.HIT_LFB               | Retired load uops whose data<br>source was a load that missed L1<br>but hit the fill buffer (FB) due to a<br>preceding miss to the same cache<br>line with data not ready. |                |
| D2H           | 01H            | MEM_LOAD_UOPS_L<br>LC_HIT_RETIRED.XS<br>NP_MISS | Retired load uops which data<br>sources were LLC hit and cross-core<br>snoop missed in on-pkg core cache.                                              | Supports PEBS  |
| D2H           | 02H            | MEM_LOAD_UOPS_L<br>LC_HIT_RETIRED.XS<br>NP_HIT  | Retired load uops which data<br>sources were LLC and cross-core<br>snoop hits in on-pkg core cache.                                                    |                |
| D2H           | 04H            | MEM_LOAD_UOPS_L<br>LC_HIT_RETIRED.XS<br>NP_HITM | Retired load uops which data<br>sources were HitM responses from<br>shared LLC.                                                                        |                |
| D2H           | 08H            | MEM_LOAD_UOPS_L<br>LC_HIT_RETIRED.XS<br>NP_NONE | Retired load uops which data<br>sources were hits in LLC without<br>snoops required.                                                                   |                |
| D4H           | 02H            | MEM_LOAD_UOPS_M<br>ISC_RETIRED.LLC_MI<br>SS     | Retired load uops with unknown<br>information as data source in cache<br>serviced the load.                                                            | Supports PEBS. |
| F0H           | 01H            | L2_TRANS.DEMAND_<br>DATA_RD                     | Demand Data Read requests that<br>access L2 cache                                                                                                      |                |
| F0H           | 02H            | L2_TRANS.RFO                                    | RFO requests that access L2 cache                                                                                                                      |                |




| Event<br>Num. | Umask<br>Value | Event Mask<br>Mnemonic        | Description                                     | Comment                             |
|---------------|----------------|-------------------------------|-------------------------------------------------|-------------------------------------|
| F0H           | 04H            | L2_TRANS.CODE_RD              | L2 cache accesses when fetching instructions    |                                     |
| F0H           | 08H            | L2_TRANS.ALL_PF               | L2 or LLC HW prefetches that<br>access L2 cache | including rejects.                  |
| F0H           | 10H            | L2_TRANS.L1D_WB               | L1D writebacks that access L2 cache             |                                     |
| F0H           | 20H            | L2_TRANS.L2_FILL              | L2 fill requests that access L2 cache           |                                     |
| F0H           | 40H            | L2_TRANS.L2_WB                | L2 writebacks that access L2 cache              |                                     |
| F0H           | 80H            | L2_TRANS.ALL_REQ<br>UESTS     | Transactions accessing L2 pipe                  |                                     |
| F1H           | 01H            | L2_LINES_IN.I                 | L2 cache lines in I state filling L2            | Counting does<br>not cover rejects. |
| F1H           | 02H            | L2_LINES_IN.S                 | L2 cache lines in S state filling L2            | Counting does<br>not cover rejects. |
| F1H           | 04H            | L2_LINES_IN.E                 | L2 cache lines in E state filling L2            | Counting does<br>not cover rejects. |
| F1H           | 07H            | L2_LINES_IN.ALL               | L2 cache lines filling L2                       | Counting does<br>not cover rejects. |
| F2H           | 01H            | L2_LINES_OUT.DEMA<br>ND_CLEAN | Clean L2 cache lines evicted by<br>demand       |                                     |
| F2H           | 02H            | L2_LINES_OUT.DEMA<br>ND_DIRTY | Dirty L2 cache lines evicted by<br>demand       |                                     |
| F2H           | 04H            | L2_LINES_OUT.PF_C<br>LEAN     | Clean L2 cache lines evicted by L2 prefetch     |                                     |
| F2H           | 08H            | L2_LINES_OUT.PF_DI<br>RTY     | Dirty L2 cache lines evicted by L2 prefetch     |                                     |
| F2H           | 0AH            | L2_LINES_OUT.DIRT<br>Y_ALL    | Dirty L2 cache lines filling the L2             | Counting does<br>not cover rejects. |
| F4H           | 10H            | SQ_MISC.SPLIT_LOCK            | Split locks in SQ                               |                                     |
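
The Event Num. and Umask Value columns above, together with the cmask and invert hints in some descriptions, correspond to fields of an IA32_PERFEVTSELx register. The following is a minimal sketch of how such a raw value could be composed, assuming the architectural IA32_PERFEVTSELx bit layout (event select in bits 7:0, unit mask in bits 15:8, USR/OS/E/AnyThread/EN/INV flags in bits 16, 17, 18, 21, 22, 23, and CMASK in bits 31:24). The function name `perfevtsel` and its parameter names are hypothetical and are not part of the manual.

```python
def perfevtsel(event, umask, *, user=True, kernel=True, edge=False,
               any_thread=False, inv=False, cmask=0, enable=True):
    """Compose a raw IA32_PERFEVTSELx value (hypothetical helper)."""
    for name, field in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= field <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits")
    value = event                      # bits 7:0   event select ("Event Num.")
    value |= umask << 8                # bits 15:8  unit mask ("Umask Value")
    value |= int(user) << 16           # bit 16     USR: count at CPL > 0
    value |= int(kernel) << 17         # bit 17     OS: count at CPL 0
    value |= int(edge) << 18           # bit 18     E: edge detect
    value |= int(any_thread) << 21     # bit 21     AnyThread
    value |= int(enable) << 22         # bit 22     EN: enable the counter
    value |= int(inv) << 23            # bit 23     INV: invert the CMASK comparison
    value |= cmask << 24               # bits 31:24 CMASK
    return value


# UOPS_DISPATCHED.THREAD (B1H, 01H) with Cmask = 1 and INV = 1, which the
# table describes as the way to count stall cycles.
stall_cycles = perfevtsel(0xB1, 0x01, cmask=1, inv=True)

# INST_RETIRED.ANY_P (C0H, 00H) counted in both user and kernel mode.
inst_retired = perfevtsel(0xC0, 0x00)

print(hex(stall_cycles), hex(inst_retired))
```

Writing the resulting value to a specific IA32_PERFEVTSELx MSR is platform and OS dependent and is left out of the sketch. Events with counter restrictions in the Comment column (for example "PMC3 only") additionally constrain which register may be used.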

Non-architectural performance monitoring events in the processor core that are applicable only to the Intel Xeon processor E5 family (and Intel Core i7-3930 processor) based on Intel microarchitecture Sandy Bridge, with CPUID signature of DisplayFamily\_DisplayModel 06\_2DH, are listed in Table 19-5.

...



#### Table 19-5 Non-Architectural Performance Events Applicable only to the Processor Core of Intel<sup>®</sup> Xeon<sup>®</sup> Processor E5 Family

| Event<br>Num. | Umask<br>Value | Event Mask<br>Mnemonic          | Description                                                                                                                                | Comment          |
|---------------|----------------|---------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|------------------|
| B7H/BBH | 01H | OFF_CORE_RESPONSE_N | Sub-events of OFF_CORE_RESPONSE_N (suffix N = 0, 1) programmed using MSR 01A6H/01A7H with values shown in the comment column. | |
| | | OFFCORE_RESPONSE.DEMAND_CODE_RD.LLC_MISS.ANY_RESPONSE_N | | 0x3FFFC00004 |
| | | OFFCORE_RESPONSE.DEMAND_CODE_RD.LLC_MISS.LOCAL_DRAM_N | | 0x600400004 |
| | | OFFCORE_RESPONSE.DEMAND_CODE_RD.LLC_MISS.REMOTE_DRAM_N | | 0x67F800004 |
| | | OFFCORE_RESPONSE.DEMAND_CODE_RD.LLC_MISS.REMOTE_HIT_FWD_N | | 0x87F800004 |
| | | OFFCORE_RESPONSE.DEMAND_CODE_RD.LLC_MISS.REMOTE_HITM_N | | 0x107FC00004 |
| | | OFFCORE_RESPONSE.DEMAND_DATA_RD.LLC_MISS.ANY_DRAM_N | | 0x67FC00001 |
| | | OFFCORE_RESPONSE.DEMAND_DATA_RD.LLC_MISS.ANY_RESPONSE_N | | 0x3F803C0001 |
| | | OFFCORE_RESPONSE.DEMAND_DATA_RD.LLC_MISS.LOCAL_DRAM_N | | 0x600400001 |
| | | OFFCORE_RESPONSE.DEMAND_DATA_RD.LLC_MISS.REMOTE_DRAM_N | | 0x67F800001 |
| | | OFFCORE_RESPONSE.DEMAND_DATA_RD.LLC_MISS.REMOTE_HIT_FWD_N | | 0x87F800001 |
| | | OFFCORE_RESPONSE.DEMAND_DATA_RD.LLC_MISS.REMOTE_HITM_N | | 0x107FC00001 |
| | | OFFCORE_RESPONSE.PF_L2_CODE_RD.LLC_MISS.ANY_RESPONSE_N | | 0x3F803C0040 |
| | | OFFCORE_RESPONSE.PF_L2_DATA_RD.LLC_MISS.ANY_DRAM_N | | 0x67FC00010 |
| | | OFFCORE_RESPONSE.PF_L2_DATA_RD.LLC_MISS.ANY_RESPONSE_N | | 0x3F803C0010 |
| | | OFFCORE_RESPONSE.PF_L2_DATA_RD.LLC_MISS.LOCAL_DRAM_N | | 0x600400010 |
| | | OFFCORE_RESPONSE.PF_L2_DATA_RD.LLC_MISS.REMOTE_DRAM_N | | 0x67F800010 |
| | | OFFCORE_RESPONSE.PF_L2_DATA_RD.LLC_MISS.REMOTE_HIT_FWD_N | | 0x87F800010 |
| | | OFFCORE_RESPONSE.PF_L2_DATA_RD.LLC_MISS.REMOTE_HITM_N | | 0x107FC00010 |




| Event<br>Num. | Umask<br>Value | Event Mask<br>Mnemonic                                      | Description | Comment          |
|---------------|----------------|-------------------------------------------------------------|-------------|------------------|
|               |                | OFFCORE_RESPONSE.PF_LLC_CODE_RD.LLC_MISS.ANY_RESPONSE_N     |             | 0x3FFFC00200     |
|               |                | OFFCORE_RESPONSE.PF_LLC_DATA_RD.LLC_MISS.ANY_RESPONSE_N     |             | 0x3FFFC00080     |
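
The comment-column values in Table 19-5 share a visible pattern: rows with the same request name (for example DEMAND_DATA_RD) share the same low 16 bits, while the remaining upper bits vary with the response name. The sketch below illustrates that reading. It assumes that the low 16 bits of the MSR value select the request type, which is inferred here from the table values only, so Section 18.8.5 should be consulted for the authoritative field definitions. The names `REQUEST_TYPES`, `split_offcore_value`, and `offcore_programming` are hypothetical.

```python
# Low-16-bit patterns read off the values listed in Table 19-5
# (an inference from the table, not an official field definition).
REQUEST_TYPES = {
    0x0001: "DEMAND_DATA_RD",
    0x0004: "DEMAND_CODE_RD",
    0x0010: "PF_L2_DATA_RD",
    0x0040: "PF_L2_CODE_RD",
    0x0080: "PF_LLC_DATA_RD",
    0x0200: "PF_LLC_CODE_RD",
}

# Suffix N -> (event number, MSR address), taken from the table rows for
# OFF_CORE_RESPONSE_0 (B7H, MSR 01A6H) and OFF_CORE_RESPONSE_1 (BBH, MSR 01A7H).
OFFCORE_EVENTS = {0: (0xB7, 0x1A6), 1: (0xBB, 0x1A7)}


def split_offcore_value(value):
    """Split a table value into (request name, remaining upper bits)."""
    request_bits = value & 0xFFFF
    upper_bits = value >> 16
    return REQUEST_TYPES.get(request_bits, hex(request_bits)), upper_bits


def offcore_programming(n, value):
    """Describe what has to be programmed for OFF_CORE_RESPONSE_N."""
    event, msr = OFFCORE_EVENTS[n]
    return {"event": event, "umask": 0x01, "msr": msr, "msr_value": value}


# DEMAND_CODE_RD.LLC_MISS.LOCAL_DRAM_N with N = 0
print(split_offcore_value(0x600400004))
print(offcore_programming(0, 0x600400004))
```

The sketch only describes the values. Writing the MSR and the matching event-select register requires privileged, OS-specific code that is outside its scope.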

...

#### Table 19-7 Non-Architectural Performance Events In the Processor Core for Intel<sup>®</sup> Core<sup>™</sup> i7 Processor and Intel<sup>®</sup> Xeon<sup>®</sup> Processor 5500 Series

| Event<br>Num. | Umask<br>Value | Event Mask<br>Mnemonic | Description                                                                                                                                                                                                                                        | Comment                                                                                                                      |
|---------------|----------------|------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------|
|               |                |                        |                                                                                                                                                                                                                                                    |                                                                                                                              |
| C0H           | 00H            | INST_RETIRED.ANY_<br>P | See Table 19-1<br>Notes: INST_RETIRED.ANY is<br>counted by a designated fixed<br>counter. INST_RETIRED.ANY_P is<br>counted by a programmable counter<br>and is an architectural performance<br>event. Event is supported if<br>CPUID.A.EBX[1] = 0. | Counting:<br>Faulting<br>executions of<br>GETSEC/VM<br>entry/VM Exit/<br>MWait will not<br>count as retired<br>instructions. |
|               |                |                        |                                                                                                                                                                                                                                                    |                                                                                                                              |

...

#### Table 19-9 Non-Architectural Performance Events In the Processor Core for Processors Based on Intel<sup>®</sup> Microarchitecture Code Name Westmere

| Event<br>Num. | Umask<br>Value | Event Mask<br>Mnemonic     | Description | Comment |
|---------------|----------------|----------------------------|-------------|---------|
|               |                |                            |             |         |
| 2EH           | 41H            | L3_LAT_CACHE.MISS          | Counts uncore Last Level Cache<br>misses. Because of differences in<br>cache hierarchy, cache sizes, and<br>other implementation-specific<br>characteristics, value comparison to<br>estimate performance differences is<br>not recommended. | See Table 19-1 |
| 2EH           | 4FH            | L3_LAT_CACHE.REFE<br>RENCE | Counts uncore Last Level Cache<br>references. Because of differences in<br>cache hierarchy, cache sizes, and<br>other implementation-specific<br>characteristics, value comparison to<br>estimate performance differences is<br>not recommended. | See Table 19-1 |
|               |                |                            |             |         |
| C0H           | 00H            | INST_RETIRED.ANY_<br>P     | See Table 19-1<br>Notes: INST_RETIRED.ANY is<br>counted by a designated fixed<br>counter. INST_RETIRED.ANY_P is<br>counted by a programmable counter<br>and is an architectural performance<br>event. Event is supported if<br>CPUID.A.EBX[1] = 0. | Counting:<br>Faulting<br>executions of<br>GETSEC/VM<br>entry/VM Exit/<br>MWait will not<br>count as retired<br>instructions. |
|               |                |                            |             |         |

...

#### 14. Updates to Chapter 25, Volume 3C

Change bars show changes to Chapter 25 of the Intel<sup>®</sup> 64 and IA-32 Architectures Software Developer's Manual, Volume 3C: System Programming Guide, Part 3.

------

...

### 25.3 OTHER CAUSES OF VM EXITS

In addition to VM exits caused by instruction execution, the following events can cause VM exits:

- **Exceptions.** Exceptions (faults, traps, and aborts) cause VM exits based on the exception bitmap (see Section 24.6.3). If an exception occurs, its vector (in the range 0–31) is used to select a bit in the exception bitmap. If the bit is 1, a VM exit occurs; if the bit is 0, the exception is delivered normally through the guest IDT. This use of the exception bitmap applies also to exceptions generated by the instructions INT3, INTO, BOUND, and UD2.



Page faults (exceptions with vector 14) are specially treated. When a page fault occurs, a logical processor consults (1) bit 14 of the exception bitmap; (2) the error code produced with the page fault [PFEC]; (3) the page-fault error-code mask field [PFEC\_MASK]; and (4) the page-fault error-code match field [PFEC\_MATCH]. It checks if PFEC & PFEC\_MASK = PFEC\_MATCH. If there is equality, the specification of bit 14 in the exception bitmap is followed (for example, a VM exit occurs if that bit is set). If there is inequality, the meaning of that bit is reversed (for example, a VM exit occurs if that bit is clear).

Thus, if software desires VM exits on all page faults, it can set bit 14 in the exception bitmap to 1 and set the page-fault error-code mask and match fields each to 00000000H. If software desires VM exits on no page faults, it can set bit 14 in the exception bitmap to 1, the page-fault error-code mask field to 00000000H, and the page-fault error-code match field to FFFFFFFFH.
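As an illustration only, the exception-bitmap lookup and the page-fault special case described above can be sketched as follows. The function and parameter names are hypothetical (they are not VMCS field encodings), and the sketch models only the exit decision, not event delivery.

```python
PAGE_FAULT_VECTOR = 14


def exception_causes_vm_exit(vector, exception_bitmap,
                             pfec=0, pfec_mask=0, pfec_match=0):
    """Return True if an exception with this vector would cause a VM exit.

    All names are illustrative. pfec, pfec_mask and pfec_match are only
    consulted for page faults (vector 14).
    """
    if not 0 <= vector <= 31:
        raise ValueError("exception vectors are in the range 0-31")

    # The vector selects one bit of the 32-bit exception bitmap.
    bit_set = bool((exception_bitmap >> vector) & 1)
    if vector != PAGE_FAULT_VECTOR:
        return bit_set

    # Page fault: follow bit 14 when PFEC & PFEC_MASK == PFEC_MATCH,
    # and reverse its meaning otherwise.
    matches = (pfec & pfec_mask) == pfec_match
    return bit_set if matches else not bit_set
```

With bit 14 set and both the mask and match fields zero, every error code matches, so every page fault exits. With bit 14 set, a zero mask, and an all-ones match field, no error code matches, so the meaning of bit 14 is reversed and no page fault exits.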

- **Triple fault.** A VM exit occurs if the logical processor encounters an exception while attempting to call the double-fault handler and that exception itself does not cause a VM exit due to the exception bitmap. This applies to the case in which the double-fault exception was generated within VMX non-root operation, the case in which the double-fault exception was generated during event injection by VM entry, and to the case in which VM entry is injecting a double-fault exception.
- **External interrupts.** An external interrupt causes a VM exit if the "external-interrupt exiting" VM-execution control is 1. Otherwise, the interrupt is delivered normally through the IDT. (If a logical processor is in the shutdown state or the wait-for-SIPI state, external interrupts are blocked. The interrupt is not delivered through the IDT and no VM exit occurs.)
- **Non-maskable interrupts (NMIs).** An NMI causes a VM exit if the "NMI exiting" VM-execution control is 1. Otherwise, it is delivered using descriptor 2 of the IDT. (If a logical processor is in the wait-for-SIPI state, NMIs are blocked. The NMI is not delivered through the IDT and no VM exit occurs.)
- **INIT signals.** INIT signals cause VM exits. A logical processor performs none of the operations normally associated with these events. Such exits do not modify register state or clear pending events as they would outside of VMX operation. (If a logical processor is in the wait-for-SIPI state, INIT signals are blocked. They do not cause VM exits in this case.)
- **Start-up IPIs (SIPIs).** SIPIs cause VM exits. If a logical processor is not in the wait-for-SIPI activity state when a SIPI arrives, no VM exit occurs and the SIPI is discarded. VM exits due to SIPIs do not perform any of the normal operations associated with those events: they do not modify register state as they would outside of VMX operation. (If a logical processor is not in the wait-for-SIPI state, SIPIs are blocked. They do not cause VM exits in this case.)
- **Task switches.** Task switches are not allowed in VMX non-root operation. Any attempt to effect a task switch in VMX non-root operation causes a VM exit. See Section 25.6.2.
- **System-management interrupts (SMIs).** If the logical processor is using the dual-monitor treatment of SMIs and system-management mode (SMM), SMIs cause SMM VM exits. See Section 33.15.2.<sup>1</sup>
- **VMX-preemption timer.** A VM exit occurs when the timer counts down to zero. See Section 25.7.1 for details of operation of the VMX-preemption timer.

<sup>1.</sup> Under the dual-monitor treatment of SMIs and SMM, SMIs also cause SMM VM exits if they occur in VMX root operation outside SMM. If the processor is using the default treatment of SMIs and SMM, SMIs are delivered as described in Section 33.14.1.



Debug-trap exceptions and higher priority events take priority over VM exits caused by the VMX-preemption timer. VM exits caused by the VMX-preemption timer take priority over VM exits caused by the "NMI-window exiting" VM-execution control and lower priority events.

These VM exits wake a logical processor from the same inactive states as would a non-maskable interrupt. Specifically, they wake a logical processor from the shutdown state and from the states entered using the HLT and MWAIT instructions. These VM exits do not occur if the logical processor is in the wait-for-SIPI state.

...

#### 25.7.1 VMX-Preemption Timer

If the last VM entry was performed with the 1-setting of "activate VMX-preemption timer" VM-execution control, the **VMX-preemption timer** counts down (from the value loaded by VM entry; see Section 26.6.4) in VMX non-root operation. When the timer counts down to zero, it stops counting down and a VM exit occurs (see Section 25.3).

The VMX-preemption timer counts down at a rate proportional to that of the timestamp counter (TSC). Specifically, the timer counts down by 1 every time bit X in the TSC changes due to a TSC increment. The value of X is in the range 0-31 and can be determined by consulting the VMX capability MSR IA32\_VMX\_MISC (see Appendix A.6).
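The relationship between TSC increments and timer decrements can be sketched as below. This is a minimal model, not a definitive implementation: it assumes that X is reported in bits 4:0 of IA32\_VMX\_MISC (consistent with the 0-31 range stated above; see Appendix A.6), it takes the MSR value as a plain integer supplied by the caller, and it ignores TSC wraparound. The function names are hypothetical.

```python
def preemption_timer_shift(ia32_vmx_misc):
    """Extract X, assuming it occupies bits 4:0 of IA32_VMX_MISC."""
    return ia32_vmx_misc & 0x1F


def timer_decrements(tsc_start, tsc_end, x):
    """Count how often bit X of the TSC changes while the TSC
    increments from tsc_start to tsc_end (no wraparound handling).

    Bit X changes once per 2**X increments, namely whenever the
    value of (TSC >> X) advances by one.
    """
    if tsc_end < tsc_start:
        raise ValueError("tsc_end must not be less than tsc_start")
    return (tsc_end >> x) - (tsc_start >> x)


def tsc_ticks_for_timer_value(timer_value, x):
    """Approximate number of TSC increments covered by a timer value."""
    return timer_value << x
```

For example, with X = 5 the timer decrements once for every 32 TSC increments, so a loaded timer value of 1000 corresponds to roughly 32000 TSC ticks.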

The VMX-preemption timer operates in the C-states C0, C1, and C2; it also operates in the shutdown and wait-for-SIPI states. If the timer counts down to zero in any state other than the wait-for-SIPI state, the logical processor transitions to the C0 C-state and causes a VM exit; the timer does not cause a VM exit if it counts down to zero in the wait-for-SIPI state. The timer is not decremented in C-states deeper than C2.

...

#### 15. Updates to Chapter 26, Volume 3C

Change bars show changes to Chapter 26 of the Intel<sup>®</sup> 64 and IA-32 Architectures Software Developer's Manual, Volume 3C: System Programming Guide, Part 3.

\_\_\_\_\_

...

#### 26.2.1.1 VM-Execution Control Fields

VM entries perform the following checks on the VM-execution control fields:<sup>1</sup>

- Reserved bits in the pin-based VM-execution controls must be set properly. Software may consult the VMX capability MSRs to determine the proper settings (see Appendix A.3.1).
- Reserved bits in the primary processor-based VM-execution controls must be set properly. Software may consult the VMX capability MSRs to determine the proper settings (see Appendix A.3.2).
- If the "activate secondary controls" primary processor-based VM-execution control is 1, reserved bits in the secondary processor-based VM-execution controls must be cleared. Software may consult the VMX capability MSRs to determine which bits are reserved (see Appendix A.3.3).

<sup>1.</sup> If the "activate secondary controls" primary processor-based VM-execution control is 0, VM entry operates as if each secondary processor-based VM-execution control were 0.

If the "activate secondary controls" primary processor-based VM-execution control is 0 (or if the processor does not support the 1-setting of that control), no checks are performed on the secondary processor-based VM-execution controls. The logical processor operates as if all the secondary processor-based VM-execution controls were 0.
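The reserved-bit checks above can be sketched as follows. The sketch assumes the capability-MSR layout described in Appendix A.3: bits 31:0 report the allowed 0-settings (a 1 there means the control must be 1) and bits 63:32 report the allowed 1-settings (a 0 there means the control must be 0). The function names are hypothetical, and the MSR value is passed in as a plain integer.

```python
MASK32 = 0xFFFFFFFF


def split_capability_msr(capability_msr):
    """Return (must_be_one, may_be_one) under the assumed layout."""
    must_be_one = capability_msr & MASK32
    may_be_one = (capability_msr >> 32) & MASK32
    return must_be_one, may_be_one


def controls_valid(controls, capability_msr):
    """Check a 32-bit control field against its capability MSR."""
    must_be_one, may_be_one = split_capability_msr(capability_msr)
    controls &= MASK32
    all_required_set = (controls & must_be_one) == must_be_one
    none_forbidden_set = (controls & ~may_be_one) == 0
    return all_required_set and none_forbidden_set


def adjust_controls(desired, capability_msr):
    """Force required bits to 1 and unsupported bits to 0."""
    must_be_one, may_be_one = split_capability_msr(capability_msr)
    return ((desired & MASK32) | must_be_one) & may_be_one
```

A monitor would typically run its desired settings through something like `adjust_controls` before writing a control field, and then confirm that the features it actually depends on survived the adjustment.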

- The CR3-target count must not be greater than 4. Future processors may support a different number of CR3-target values. Software should read the VMX capability MSR IA32\_VMX\_MISC to determine the number of values supported (see Appendix A.6).
- If the "use I/O bitmaps" VM-execution control is 1, bits 11:0 of each I/O-bitmap address must be 0. Neither address should set any bits beyond the processor's physical-address width.<sup>1,2</sup>
- If the "use MSR bitmaps" VM-execution control is 1, bits 11:0 of the MSR-bitmap address must be 0. The address should not set any bits beyond the processor's physical-address width.<sup>3</sup>
- If the "use TPR shadow" VM-execution control is 1, the virtual-APIC address must satisfy the following checks:
  - Bits 11:0 of the address must be 0.
  - The address should not set any bits beyond the processor's physical-address width.<sup>4</sup>

If all of the above checks are satisfied and the "use TPR shadow" VM-execution control is 1, bytes 81H-83H on the virtual-APIC page (see Section 24.6.8) may be cleared (behavior may be implementation-specific).

The clearing of these bytes may occur even if the VM entry fails. This is true either if the failure causes control to pass to the instruction following the VM-entry instruction or if it causes processor state to be loaded from the host-state area of the VMCS.

- If the "use TPR shadow" VM-execution control is 1, bits 31:4 of the TPR threshold VM-execution control field must be 0.
- The following check is performed if the "use TPR shadow" VM-execution control is 1 and the "virtualize APIC accesses" VM-execution control is 0: the value of bits 3:0 of the TPR threshold VM-execution control field should not be greater than the value of bits 7:4 in byte 80H on the virtual-APIC page (see Section 24.6.8).
- If the "NMI exiting" VM-execution control is 0, the "virtual NMIs" VM-execution control must be 0.
- If the "virtual NMIs" VM-execution control is 0, the "NMI-window exiting" VM-execution control must be 0.
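The TPR-threshold and NMI-related checks in the preceding items can be sketched as simple predicates. The names below are hypothetical; `vtpr_byte` stands for byte 80H on the virtual-APIC page, and each control is modeled as a Boolean rather than as a bit in a VMCS field.

```python
def tpr_threshold_ok(use_tpr_shadow, virtualize_apic_accesses,
                     tpr_threshold, vtpr_byte):
    """Model the two TPR-threshold checks (illustrative only)."""
    if not use_tpr_shadow:
        return True  # neither check applies
    # Bits 31:4 of the TPR threshold must be 0.
    if tpr_threshold & 0xFFFFFFF0:
        return False
    # Without APIC-access virtualization, bits 3:0 of the threshold
    # must not exceed bits 7:4 of byte 80H on the virtual-APIC page.
    if not virtualize_apic_accesses:
        if (tpr_threshold & 0xF) > ((vtpr_byte >> 4) & 0xF):
            return False
    return True


def nmi_controls_ok(nmi_exiting, virtual_nmis, nmi_window_exiting):
    """Model the dependency chain between the three NMI controls."""
    if not nmi_exiting and virtual_nmis:
        return False
    if not virtual_nmis and nmi_window_exiting:
        return False
    return True
```

The NMI predicate shows that the controls form a chain: "NMI-window exiting" requires "virtual NMIs", which in turn requires "NMI exiting".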

<sup>1.</sup> Software can determine a processor's physical-address width by executing CPUID with 80000008H in EAX. The physical-address width is returned in bits 7:0 of EAX.

<sup>2.</sup> If IA32\_VMX\_BASIC[48] is read as 1, these addresses must not set any bits in the range 63:32; see Appendix A.1.

<sup>3.</sup> If IA32\_VMX\_BASIC[48] is read as 1, this address must not set any bits in the range 63:32; see Appendix A.1.

<sup>4.</sup> If IA32\_VMX\_BASIC[48] is read as 1, this address must not set any bits in the range 63:32; see Appendix A.1.



- If the "virtualize APIC accesses" VM-execution control is 1, the APIC-access address must satisfy the following checks:
  - Bits 11:0 of the address must be 0.
  - The address should not set any bits beyond the processor's physical-address width.<sup>1</sup>
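The I/O-bitmap, MSR-bitmap, virtual-APIC, and APIC-access addresses are all subject to the same two checks, plus the 32-bit restriction from the footnotes. A minimal sketch follows, with hypothetical function names; the CPUID result and the state of IA32\_VMX\_BASIC[48] are supplied by the caller.

```python
def physical_address_width(cpuid_80000008_eax):
    """Physical-address width from bits 7:0 of EAX (CPUID leaf 80000008H)."""
    return cpuid_80000008_eax & 0xFF


def page_address_ok(address, phys_addr_width, vmx_basic_bit48=False):
    """Check a 4-KByte-aligned physical address used by a VMCS field."""
    # Bits 11:0 must be 0 (4-KByte alignment).
    if address & 0xFFF:
        return False
    # No bits beyond the processor's physical-address width.
    if address >> phys_addr_width:
        return False
    # If IA32_VMX_BASIC[48] reads as 1, bits 63:32 must also be 0.
    if vmx_basic_bit48 and (address >> 32):
        return False
    return True
```

Sharing one predicate across all four address fields keeps the alignment and width rules in one place.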

•••

#### 16. Updates to Chapter 27, Volume 3C

Change bars show changes to Chapter 27 of the Intel<sup>®</sup> 64 and IA-32 Architectures Software Developer's Manual, Volume 3C: System Programming Guide, Part 3.

\_\_\_\_\_

•••

#### 27.2.4 Information for VM Exits Due to Instruction Execution

Section 24.9.4 defined fields containing information for VM exits that occur due to instruction execution. (The VM-exit instruction length is also used for VM exits that occur during the delivery of a software interrupt or software exception.) The following items detail their use.

- VM-exit instruction length. This field is used in the following cases:
  - For fault-like VM exits due to attempts to execute one of the following instructions that cause VM exits unconditionally (see Section 25.1.2) or based on the settings of VM-execution controls (see Section 25.1.3): CLTS, CPUID, GETSEC, HLT, IN, INS, INVD, INVEPT, INVLPG, INVPCID, INVVPID, LGDT, LIDT, LLDT, LMSW, LTR, MONITOR, MOV CR, MOV DR, MWAIT, OUT, OUTS, PAUSE, RDMSR, RDPMC, RDRAND, RDTSC, RDTSCP, RSM, SGDT, SIDT, SLDT, STR, VMCALL, VMCLEAR, VMLAUNCH, VMPTRLD, VMPTRST, VMREAD, VMRESUME, VMWRITE, VMXOFF, VMXON, WBINVD, WRMSR, and XSETBV.<sup>2</sup>
  - For VM exits due to software exceptions (those generated by executions of INT3 or INTO).
  - For VM exits due to faults encountered during delivery of a software interrupt, privileged software exception, or software exception.
  - For VM exits due to attempts to effect a task switch via instruction execution. These are VM exits that produce an exit reason indicating task switch and either of the following:
    - An exit qualification indicating execution of CALL, IRET, or JMP instruction.
    - An exit qualification indicating a task gate in the IDT and an IDT-vectoring information field indicating that the task gate was encountered during delivery of a software interrupt, privileged software exception, or software exception.
  - For APIC-access VM exits resulting from linear accesses (see Section 25.2.1) and encountered during delivery of a software interrupt, privileged software exception, or software exception.<sup>1</sup>
  - For VM exits due to executions of VMFUNC that fail because one of the following is true:
    - EAX indicates a VM function that is not enabled (the bit at position EAX is 0 in the VM-function controls; see Section 25.7.4.2).
    - EAX = 0 and either ECX ≥ 512 or the value of ECX selects an invalid tentative EPTP value (see Section 25.7.4.3).

<sup>1.</sup> If IA32\_VMX\_BASIC[48] is read as 1, this address must not set any bits in the range 63:32; see Appendix A.1.

<sup>2.</sup> This item applies only to fault-like VM exits. It does not apply to trap-like VM exits following executions of the MOV to CR8 instruction when the "use TPR shadow" VM-execution control is 1 or to those following executions of the WRMSR instruction when the "virtualize x2APIC mode" VM-execution control is 1.

In all the above cases, this field receives the length in bytes (1-15) of the instruction (including any instruction prefixes) whose execution led to the VM exit (see the next paragraph for one exception).

The cases of VM exits encountered during delivery of a software interrupt, privileged software exception, or software exception include those encountered during delivery of events injected as part of VM entry (see Section 26.5.1.2). If the original event was injected as part of VM entry, this field receives the value of the VM-entry instruction length.

All VM exits other than those listed in the above items leave this field undefined.
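Two of the rules in this section lend themselves to a short sketch: the VMFUNC failure conditions listed above, and the choice of value stored in the VM-exit instruction-length field. All names are hypothetical. The validity of a tentative EPTP value is delegated to a caller-supplied predicate because those checks are specified in Section 25.7.4.3, and the sketch assumes EAX is small enough to index the VM-function controls.

```python
def vmfunc_fails_with_vm_exit(eax, ecx, vm_function_controls,
                              eptp_entry_is_valid):
    """Return True if VMFUNC would fail with a VM exit.

    eptp_entry_is_valid is a hypothetical callback taking the ECX
    index and returning whether the tentative EPTP value is valid.
    """
    # The VM function selected by EAX is not enabled.
    if not (vm_function_controls >> eax) & 1:
        return True
    # VM function 0: index out of range, or the selected EPTP is invalid.
    if eax == 0 and (ecx >= 512 or not eptp_entry_is_valid(ecx)):
        return True
    return False


def reported_instruction_length(instruction_length,
                                injected_by_vm_entry=False,
                                vm_entry_instruction_length=None):
    """Value the VM-exit instruction-length field would receive."""
    if injected_by_vm_entry:
        # The original event was injected as part of VM entry.
        return vm_entry_instruction_length
    if not 1 <= instruction_length <= 15:
        raise ValueError("instruction lengths are 1-15 bytes")
    return instruction_length
```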

•••

#### 17. Update to Volume 3C

Chapter 29 and Chapter 33 were swapped in the Intel<sup>®</sup> 64 and IA-32 Architectures Software Developer's Manual, Volume 3C: System Programming Guide, Part 3.

#### 18. Updates to Chapter 33, Volume 3C

Change bars show changes to Chapter 33 of the Intel<sup>®</sup> 64 and IA-32 Architectures Software Developer's Manual, Volume 3C: System Programming Guide, Part 3.

\_\_\_\_\_

•••

#### 33.4.2 SMRAM Caching

An IA-32 processor does not automatically write back and invalidate its caches before entering SMM or before exiting SMM. Because of this behavior, care must be taken in the placement of the SMRAM in system memory and in the caching of the SMRAM to prevent cache incoherence when switching back and forth between SMM and protected mode operation. Any of the following three methods of locating the SMRAM in system memory will guarantee cache coherency:

- Place the SMRAM in a dedicated section of system memory that the operating system and applications are prevented from accessing. Here, the SMRAM can be designated as cacheable (WB, WT, or WC) for optimum processor performance, without risking cache incoherence when entering or exiting SMM.
- Place the SMRAM in a section of memory that overlaps an area used by the operating system (such as the video memory), but designate the SMRAM as uncacheable (UC). This method prevents cache access when in SMM to maintain cache coherency, but the use of uncacheable memory reduces the performance of SMM code.
- Place the SMRAM in a section of system memory that overlaps an area used by the operating system and/or application code, but explicitly flush (write back and invalidate) the caches upon entering and exiting SMM. This method maintains cache coherency, but incurs the overhead of two complete cache flushes.

<sup>1.</sup> The VM-exit instruction-length field is not defined following APIC-access VM exits resulting from physical accesses (see Section 25.2.3) even if encountered during delivery of a software interrupt, privileged software exception, or software exception.

...

### 33.5 SMI HANDLER EXECUTION ENVIRONMENT

After saving the current context of the processor, the processor initializes its core registers to the values shown in Table 33-4. Upon entering SMM, the PE and PG flags in control register CR0 are cleared, which places the processor in an environment similar to real-address mode. The differences between the SMM execution environment and the real-address mode execution environment are as follows:

- The addressable SMRAM address space ranges from 0 to FFFFFFFFH (4 GBytes). (The physical address extension, enabled with the PAE flag in control register CR4, is not supported in SMM.)
- The normal 64-KByte segment limit for real-address mode is increased to 4 GBytes.
- The default operand and address sizes are set to 16 bits, which restricts the addressable SMRAM address space to the 1-MByte real-address mode limit for native real-address-mode code. However, operand-size and address-size override prefixes can be used to access the address space beyond the 1-MByte limit.
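The effect of the 16-bit default address size can be illustrated with a small address calculation. This sketch assumes real-address-mode-style segmentation, in which a segment base equals the selector value shifted left by 4 bits; it does not model the CS base established at SMM entry or wraparound of the result. The function name and parameters are hypothetical.

```python
def smm_linear_address(segment_selector, offset, address_size_override=False):
    """Linear address under the assumed real-mode-style base calculation.

    Without an address-size override prefix, offsets are limited to
    16 bits; with the prefix, 32-bit offsets are usable because the
    segment limit in SMM is 4 GBytes.
    """
    max_offset = 0xFFFFFFFF if address_size_override else 0xFFFF
    if not 0 <= offset <= max_offset:
        raise ValueError("offset exceeds the effective address size")
    if not 0 <= segment_selector <= 0xFFFF:
        raise ValueError("segment selectors are 16 bits wide")
    return (segment_selector << 4) + offset
```

With 16-bit offsets the highest reachable address is FFFF0H + FFFFH = 10FFEFH, just above 1 MByte. With an address-size override prefix, the same segment can reach offsets far beyond that.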

•••