---
title: xvisor
categories: embedded, arm, armv8, arm64, xvisor, hypervisor, virtualization
...

Contributors
=====================================
* Spring 2015 - 沈宗穎, 李育丞, 蘇誌航, 張仁傑, 鄧維岱

Hackpad
=====================================
* `xvisor`_
* `design doc整理`_
* `報告整理區`_

Virtualization
=====================================
This section gives a short introduction to virtualization and uses Xvisor on `ARMv8`_ as a case study of how it is implemented.

Hypervisor (virtual machine monitor)
--------------------------------------------
- A hypervisor (virtual machine monitor, VMM) manages virtual machines (VMs).
- Host machine: the physical machine that runs the hypervisor; in some designs the host system itself runs on top of the hypervisor.
- Guest machine: a virtual machine that runs on top of the hypervisor.

.. image:: http://electronicdesign.com/site-files/electronicdesign.com/files/archive/electronicdesign.com/files/29/20211/fig_01.jpg

- Use cases:

  - Workload consolidation
  - Legacy software support
  - Multicore enablement
  - Improved reliability
  - Secure monitoring

Hypervisor types
--------------------------------------------
- Type-1: native (bare-metal) hypervisors

  - The hypervisor runs directly on the host hardware, controls that hardware, and manages the guest operating systems. The host treats each guest operating system like a process.
  - Examples: `XenServer`_, `Hyper-V`_, Xvisor

- Type-2: hosted hypervisors

  - The hypervisor runs on top of the host operating system and provides virtualization services from there.
  - Examples: `VMware`_, `VirtualBox`_

- Comparison:

  - Type-1: better security and reliability, but because it runs directly on the hardware it is often single purpose.
  - Type-2: supports more I/O devices and services and is easier to install and use, but performs worse than Type-1. It is commonly used on client machines where efficiency matters less.

Virtualization instructions (ARM)
--------------------------------------------

Virtualization theorems
................................................................................
- `Popek and Goldberg virtualization requirements`_
- Virtualization must satisfy three properties:

  - Equivalence: a program running under the hypervisor must behave the same as when it runs directly on the machine.
  - Resource control: the hypervisor must be in complete control of the virtualized resources.
  - Efficiency: the statistically dominant machine instructions must execute without hypervisor intervention.

- Theorem: for any conventional third-generation computer, a VMM can be constructed if the set of sensitive instructions is a subset of the set of privileged instructions (from Wikipedia).

  - Sensitive instructions that the OS used to execute in kernel mode no longer work once the OS is moved to user mode, so they must trap to the hypervisor, which executes them on the guest's behalf.
  - Privileged instructions: trap when executed in user mode.
  - Control sensitive instructions: change the processor configuration or mode.
  - Behavior sensitive instructions: behave differently depending on the processor state.

Problematic Instructions
....................................
- Type I: raise an undefined instruction exception when executed in user mode

  - MCR, MRC: depend on a coprocessor

- Type II: have no effect when executed in user mode

  - MSR, MRS: operate on system registers

- Type III: behave unpredictably when executed in user mode

  - MOVS PC, LR: an exception return instruction that changes the PC and switches back to user mode; executing it in user mode gives unpredictable results

- Sensitive instructions on ARM:

  - Coprocessor access: MRC / MCR / CDP / LDC / STC
  - SIMD/VFP system register access: VMRS / VMSR
  - Entering the TrustZone secure state: SMC
  - Memory-mapped I/O access: load/store instructions from/into memory-mapped I/O locations
  - Direct CPSR access: MRS / MSR / CPS / SRS / RFE / LDM (conditional execution) / DPSPC
  - Indirect CPSR access: LDRT / STRT, Load/Store Unprivileged ("As User")
  - Banked register access: LDM / STM

Solutions
....................................
- Software techniques: trap and emulate

  - Dynamic Binary Translation

    - Replace each problematic instruction with a hypercall so that it traps and gets emulated

  - Hypercall

    - Encode the type and the original instruction bits
    - Trap to the hypervisor, which decodes and emulates the instruction

- Hardware techniques:

  - Privileged instruction translation (to be added)
  - MMU-enforced traps
  - Virtualization extensions

Memory virtualization (without hardware support)
.............................................
- Shadow page tables:

  - Map guest virtual addresses to host physical addresses

    - The guest OS maintains its own page tables, which point to guest physical frames
    - The hypervisor maps every guest physical frame to a host physical frame

.. image:: /embedded/shadow_page_table.png

(from System Virtualization Memory Virtualization - 國立清華大學)

- A shadow page table is built for every guest page table
- The hypervisor has to protect the host frames that hold the guest page tables

.. image:: /embedded/write_protect.png

(from System Virtualization Memory Virtualization - 國立清華大學)

ARM Virtualization Extensions
--------------------------------------------
See `ARMv8#虛擬化`_ (register details to be added).

CPU virtualization
...................................
- ARM adds a Hypervisor mode that runs at Non-secure privilege level 2.
- CPU virtualization extensions

  - The guest OS kernel runs at EL1 and its userspace runs at EL0.
  - Most sensitive instructions run natively at EL1, with no trap and emulation.
  - Sensitive instructions that still need trapping are trapped to EL2 (hypervisor mode, HYP):

    - Guest OS load/store accesses
    - Instructions that would affect other guest OSes

  - The Hypervisor Syndrome Register (HSR, named ESR_EL2 in AArch64) holds information about the trapped instruction, so the hypervisor can emulate it.

* `Xvisor- cpu_vcpu_helper.c`_:

.. code-block:: c

    /* Initialize Hypervisor Configuration */
    INIT_SPIN_LOCK(&arm_priv(vcpu)->hcr_lock);
    arm_priv(vcpu)->hcr = (HCR_TSW_MASK |
                           HCR_TACR_MASK |
                           HCR_TIDCP_MASK |
                           HCR_TSC_MASK |
                           HCR_TWE_MASK |
                           HCR_TWI_MASK |
                           HCR_AMO_MASK |
                           HCR_IMO_MASK |
                           HCR_FMO_MASK |
                           HCR_SWIO_MASK |
                           HCR_VM_MASK);

This configuration traps the sensitive EL1 instructions (MCR, MRC, SMC, WFE, WFI) and the interrupts (IRQ, FIQ) to EL2, and enables stage 2 address translation.

Xvisor instruction emulation
............................................................
cpu_entry.S sets up the Hyp vector base:

.. code-block:: c

    vectors:
        ventry  hyp_sync_invalid    /* Synchronous EL1t */
        ventry  hyp_irq_invalid     /* IRQ EL1t */
        ventry  hyp_fiq_invalid     /* FIQ EL1t */
        ventry  hyp_error_invalid   /* Error EL1t */

        ventry  hyp_sync            /* Synchronous EL1h */
        ventry  hyp_irq             /* IRQ EL1h */
        ventry  hyp_fiq_invalid     /* FIQ EL1h */
        ventry  hyp_error_invalid   /* Error EL1h */
        ........

    EXCEPTION_HANDLER hyp_sync
        PUSH_REGS
        mov x1, EXC_HYP_SYNC_SPx
        CALL_EXCEPTION_CFUNC do_sync
        PULL_REGS

    /*
     * .macro PUSH_REGS
     *   sub sp, sp, #0x20
     *   push x28, x29
     *   push x26, x27
     *   push x24, x25
     *   push x22, x23
     *   ......
     *   push x0, x1
     *   add x21, sp, #0x110
     *   mrs x22, elr_el2
     *   mrs x23, spsr_el2
     *   stp x30, x21, [sp, #0xF0]
     *   stp x22, x23, [sp, #0x100]
     *
     *
     * .macro CALL_EXCEPTION_CFUNC cfunc
     *   mov x0, sp     x0 carries the arch_regs_t *regs argument of the C handler below
     *   bl \cfunc
     * .endm
     */

- ESR_EL2 (Exception Syndrome Register): holds the syndrome information for exceptions taken to EL2

cpu_interrupt.c (see p. 1905 of the manual for the ESR encoding):

.. code-block:: c

    void do_sync(arch_regs_t *regs, unsigned long mode)
    {
        .......
        /* Read the syndrome, fault address and return address at EL2 */
        esr = mrs(esr_el2);
        far = mrs(far_el2);
        elr = mrs(elr_el2);

        /* Split the syndrome into exception class, instruction length and ISS */
        ec = (esr & ESR_EC_MASK) >> ESR_EC_SHIFT;
        il = (esr & ESR_IL_MASK) >> ESR_IL_SHIFT;
        iss = (esr & ESR_ISS_MASK) >> ESR_ISS_SHIFT;
        .......
        switch (ec) {
        case EC_UNKNOWN:
            /* We dont expect to get this trap so error */
            rc = VMM_EFAIL;
            break;
        case EC_TRAP_WFI_WFE:
            /* WFI emulation */
            rc = cpu_vcpu_emulate_wfi_wfe(vcpu, regs, il, iss);
            break;
        case EC_TRAP_MCR_MRC_CP15_A32:
            /* MCR/MRC CP15 emulation */
            rc = cpu_vcpu_emulate_mcr_mrc_cp15(vcpu, regs, il, iss);
            break;
        .........
            break;
        case EC_TRAP_HVC_A64:
            /* HVC emulation for A64 guest */
            rc = cpu_vcpu_emulate_hvc64(vcpu, regs, il, iss);
            break;
        case EC_TRAP_MSR_MRS_SYSTEM:
            /* MSR/MRS/SystemRegs emulation */
            rc = cpu_vcpu_emulate_msr_mrs_system(vcpu, regs, il, iss);
            break;
        case EC_TRAP_LWREL_INST_ABORT:
            /* Stage2 instruction abort */
            fipa = (mrs(hpfar_el2) & HPFAR_FIPA_MASK) >> HPFAR_FIPA_SHIFT;
            fipa = fipa << HPFAR_FIPA_PAGE_SHIFT;
            fipa = fipa | (mrs(far_el2) & HPFAR_FIPA_PAGE_MASK);
            rc = cpu_vcpu_inst_abort(vcpu, regs, il, iss, fipa);
            break;
        case EC_TRAP_LWREL_DATA_ABORT:
            /* Stage2 data abort */
            fipa = (mrs(hpfar_el2) & HPFAR_FIPA_MASK) >> HPFAR_FIPA_SHIFT;
            fipa = fipa << HPFAR_FIPA_PAGE_SHIFT;
            fipa = fipa | (mrs(far_el2) & HPFAR_FIPA_PAGE_MASK);
            rc = cpu_vcpu_data_abort(vcpu, regs, il, iss, fipa);
            break;

Finally, the matching function in emulate_arm.c or cpu_vcpu_emulate.c is called to emulate the instruction.

Memory virtualization
...................................
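The stage 2 abort cases in ``do_sync`` above rebuild the faulting intermediate physical address (IPA) from two registers: ``HPFAR_EL2`` supplies the page number and ``FAR_EL2`` supplies the offset within the page. The following is a minimal standalone sketch of that arithmetic and of the ``ESR_EL2`` field split. It assumes the ARMv8-A field layout (EC in bits [31:26], IL in bit 25, ISS in bits [24:0], and FIPA in ``HPFAR_EL2`` bits [39:4] holding IPA bits [47:12]), 4 KB pages, and a fault for which ``HPFAR_EL2`` holds a valid value. All ``ex_`` / ``EX_`` names are hypothetical and are not Xvisor's own macros.

```c
#include <stdint.h>

/* Assumed ESR_EL2 layout: EC = bits [31:26], IL = bit 25, ISS = bits [24:0]. */
#define EX_ESR_EC_SHIFT       26
#define EX_ESR_EC_MASK        (0x3FULL << EX_ESR_EC_SHIFT)
#define EX_ESR_IL_SHIFT       25
#define EX_ESR_IL_MASK        (0x1ULL << EX_ESR_IL_SHIFT)
#define EX_ESR_ISS_MASK       0x01FFFFFFULL

/* Assumed HPFAR_EL2 layout: FIPA = bits [39:4], holding IPA bits [47:12]. */
#define EX_HPFAR_FIPA_SHIFT   4
#define EX_HPFAR_FIPA_MASK    (0xFFFFFFFFFULL << EX_HPFAR_FIPA_SHIFT)
#define EX_PAGE_SHIFT         12
#define EX_PAGE_OFFSET_MASK   0xFFFULL

/* Hypothetical container for the decoded syndrome fields. */
struct ex_syndrome {
    uint32_t ec;   /* exception class */
    uint32_t il;   /* instruction length bit */
    uint32_t iss;  /* instruction specific syndrome */
};

/* Split a raw ESR_EL2 value into its three fields. */
static inline struct ex_syndrome ex_decode_esr(uint64_t esr)
{
    struct ex_syndrome s;

    s.ec = (uint32_t)((esr & EX_ESR_EC_MASK) >> EX_ESR_EC_SHIFT);
    s.il = (uint32_t)((esr & EX_ESR_IL_MASK) >> EX_ESR_IL_SHIFT);
    s.iss = (uint32_t)(esr & EX_ESR_ISS_MASK);
    return s;
}

/* Combine the IPA page number from HPFAR_EL2 with the page offset
 * taken from the low 12 bits of FAR_EL2. */
static inline uint64_t ex_fault_ipa(uint64_t hpfar, uint64_t far)
{
    uint64_t ipa = (hpfar & EX_HPFAR_FIPA_MASK) >> EX_HPFAR_FIPA_SHIFT;

    ipa <<= EX_PAGE_SHIFT;
    return ipa | (far & EX_PAGE_OFFSET_MASK);
}
```

With the made-up inputs ``hpfar = 0x800010`` and a ``far`` whose low 12 bits are ``0x234``, ``ex_fault_ipa`` yields ``0x80001234``: the page number ``0x80001`` comes from ``HPFAR_EL2`` and the offset comes from ``FAR_EL2``. The offset can be reused because translation never changes the bits below the page size.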
See: `armv8 virtual-memory-system-architecture `_

- ARM adds the Intermediate Physical Address (IPA), so a guest OS cannot access physical addresses directly.
- Two-stage address translation: virtual address => intermediate physical address => physical address

  - Stage 1: virtual address => intermediate physical address

    - Controlled by the guest OS, which believes the IPA is the PA

  - Stage 2: intermediate physical address => physical address

    - Controlled by the hypervisor

I/O device virtualization
--------------------------------------------
- ARM adds a Virtual Generic Interrupt Controller interface for handling interrupts.

virtio
.............................

Xvisor
=========================================

Boot flow
------------------------
See `AJ NOTE`_.

- Xvisor, which executes in a virtual address space, has to begin booting before the MMU is enabled.
- At assembly time the image is added to .text with the .incbin directive.

.. image:: /embedded/xvisor_memory.png

* `cpu_entry.S`_

.. code-block:: c

    _start_mmu_init:
        /* Setup SP as-per load address */
        ldr x0, __hvc_stack_end
        mov sp, x0
        sub sp, sp, x6
        add sp, sp, x4
        .........
        bl  _setup_initial_ttbl

``_setup_initial_ttbl`` initializes the translation table.

* `mmu_lpae_entry_ttbl.c`_

.. code-block:: c

    void __attribute__ ((section(".entry")))
        _setup_initial_ttbl(virtual_addr_t load_start, virtual_addr_t load_end,
                            virtual_addr_t exec_start, virtual_addr_t exec_end)
    {
        ..........
        lpae_entry.ttbl_base = to_load_pa((virtual_addr_t)&def_ttbl);
        /* def_ttbl is later loaded into ttbr0_el2 (Translation Table Base Register) */
        lpae_entry.next_ttbl = (u64 *)lpae_entry.ttbl_base;
        ..........

        /* Map physical = logical
         * Note: This mapping is using at boot time only
         */
        __setup_initial_ttbl(&lpae_entry, load_start, load_end, load_start,
                             AINDEX_NORMAL_WB, TRUE);

        /* Map to logical addresses which are
         * covered by read-only linker sections
         * Note: This mapping is used at runtime
         */
        SETUP_RO_SECTION(lpae_entry, text);
        SETUP_RO_SECTION(lpae_entry, init);
        SETUP_RO_SECTION(lpae_entry, cpuinit);
        SETUP_RO_SECTION(lpae_entry, spinlock);
        SETUP_RO_SECTION(lpae_entry, rodata);

        /* Map rest of logical addresses which are
         * not covered by read-only linker sections
         * Note: This mapping is used at runtime
         */
        __setup_initial_ttbl(&lpae_entry, exec_start, exec_end, load_start,
                             AINDEX_NORMAL_WB, TRUE);
    }

    void __attribute__ ((section(".entry")))
        __setup_initial_ttbl(struct mmu_lpae_entry_ctrl *lpae_entry,
                             virtual_addr_t map_start, virtual_addr_t map_end,
                             virtual_addr_t pa_start, u32 aindex, bool writeable)
    {
        ........
        u64 *ttbl;

        /* align start addresses */
        /* TTBL_L3_MAP_MASK is 0xFFFFFFFFFFFFF000ULL: the low 12 bits
         * map straight through to the output address */
        map_start &= TTBL_L3_MAP_MASK;
        pa_start &= TTBL_L3_MAP_MASK;

        page_addr = map_start;

        while (page_addr < map_end) {

            /* Setup level1 table */
            ttbl = (u64 *) lpae_entry->ttbl_base;
            index = (page_addr & TTBL_L1_INDEX_MASK) >> TTBL_L1_INDEX_SHIFT;
            if (ttbl[index] & TTBL_VALID_MASK) {
                /* Find level2 table */
                ttbl = (u64 *) (unsigned long)(ttbl[index] & TTBL_OUTADDR_MASK);
            } else {
                /* Allocate new level2 table */
                if (lpae_entry->ttbl_count == TTBL_INITIAL_TABLE_COUNT) {
                    while (1) ;     /* No initial table available */
                }
                for (i = 0; i < TTBL_TABLE_ENTCNT; i++) {
                    lpae_entry->next_ttbl[i] = 0x0ULL;
                }
                lpae_entry->ttbl_tree[lpae_entry->ttbl_count] =
                    ((virtual_addr_t) ttbl - lpae_entry->ttbl_base) >>
                    TTBL_TABLE_SIZE_SHIFT;
                lpae_entry->ttbl_count++;
                ttbl[index] |= (((virtual_addr_t) lpae_entry->next_ttbl) &
                                TTBL_OUTADDR_MASK);
                ttbl[index] |= (TTBL_TABLE_MASK | TTBL_VALID_MASK);
                ttbl = lpae_entry->next_ttbl;
                lpae_entry->next_ttbl += TTBL_TABLE_ENTCNT;
            }

            /* Setup level2 table */
            index = (page_addr & TTBL_L2_INDEX_MASK) >> TTBL_L2_INDEX_SHIFT;
            if (ttbl[index] & TTBL_VALID_MASK) {
                /* Find level3 table */
                ttbl = (u64 *) (unsigned long)(ttbl[index] & TTBL_OUTADDR_MASK);
            } else {
                /* Allocate new level3 table */
                ......
            }

            /* Setup level3 table */
            index = (page_addr & TTBL_L3_INDEX_MASK) >> TTBL_L3_INDEX_SHIFT;
            if (!(ttbl[index] & TTBL_VALID_MASK)) {
                /* Update level3 table */
                .......
            }

            /* Point to next page */
            page_addr += TTBL_L3_BLOCK_SIZE;
        }
    }

Design document notes
=====================================
Source: https://github.com/xvisor/xvisor/blob/master/docs/DesignDoc

Modeling Virtual Machine
--------------------------------------------
* Virtual machines (VMs) generally fall into two kinds:

  - system virtual machine: supports the execution of a complete OS
  - process virtual machine: supports a single process

* Xvisor is virtualization software for hardware systems. It runs directly on the host machine, which makes it a Native/Type-1 Hypervisor/VMM.

A system virtual machine is usually called a guest, and a CPU inside a guest is called a VCPU (Virtual CPU). VCPUs come in two kinds:

  - Normal VCPU: belongs to a guest
  - Orphan VCPU: belongs to no guest (Orphan VCPUs are created for the various background processes and the running management daemons)

* Modern CPUs have at least two privilege modes:

  - User mode: the least privileged mode, runs Normal VCPUs
  - Supervisor mode: the most privileged mode, runs Orphan VCPUs

* Xvisor uses Orphan VCPUs to run its various background processes and management daemons.

The figure below shows Xvisor's System Virtual Machine Model.

.. image:: /embedded/xvisor_model.png

reference
=====================================
* http://www.slideshare.net/jserv/xvisor
* https://github.com/xvisor/xvisor/blob/master/docs/DesignDoc
* https://samlin.hackpad.com/Xvisor--chtGSqPYWG8
* http://www.slideshare.net/badaindonesia/linux-on-arm-64bit-architecture?related=1
* `ARMv8-A_Architecture_Reference_Manual_(Issue_A.a) (需登入)`_
* `A Virtualization Infrastructure that Supports Pervasive Computing.`_
* `A Choices Hypervisor on the ARM architecture`_
* `Extensions to the ARMv7-A architecture`_
* `前瞻 資訊科技 - 虛擬化 (1) - Virtualization( V12N ). 薛智文 教授`_
* `前瞻 資訊科技 - 虛擬化 (2) - Virtualization( V12N ). 薛智文 教授`_
* `An Overview of Microkernel, Hypervisor and Microvisor Virtualization Approaches for Embedded Systems, Asif Iqbal, Nayeema Sadeque and Rafika Ida Mutia, Lund University, Sweden`_
* `Hardware accelerated Virtualization in the ARM Cortex™ Processors`_
* `Popek and Goldberg virtualization requirements`_
* `Syllabus for CS5410 Virtualization Techniques (國立清華大學)`_ (includes the course slides)