Version a156472d26ebefc8422575b0d2cc7d06a55a9b78

uClinux

Team members

竹內宏輝
陳勁龍
賀祐農

`Hackpad <https://hackpad.com/UClinux-EEaoU4qOP12>`_

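uClinux architecture & features

uClinux has the same architecture as Linux; the main difference is that it targets processors without an MMU.

No memory management unit

uClinux has no MMU, so virtual memory cannot be implemented, and neither swap nor paging is supported.

As a result, all the memory a process needs must be allocated up front.

Before a process runs, the system has to allocate a contiguous address range large enough for it and then load the whole process into that contiguous region of memory.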

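Memory allocation

Allocating memory in uClinux with power-of-2 allocation wastes a large amount of memory.

(If a process needs 33 KB, power-of-2 allocation gives it 64 KB.)

page_alloc2 or kmalloc2 is therefore used to save memory.

page_alloc2 allocates in units of 4 KB: a process that needs 33 KB gets 36 KB.

kmalloc2 also allocates in units of 4 KB, but uses 8 KB as a threshold: requests larger than 8 KB are taken from the bottom of free memory, and requests smaller than 8 KB are taken from the start of free memory.

This reduces the unusable memory fragments left behind by many short-lived allocations.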
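
The 33 KB example can be checked with a few lines of C. This is only an arithmetic sketch of the two rounding rules; `round_pow2_kb` and `round_pages_kb` are hypothetical helper names, not kernel functions, and the 4 KB unit is taken from the text above.

```c
#define PAGE_SIZE_KB 4u   /* allocation unit assumed from the text */

/* Round a request (in KB) up to the next power of two. */
unsigned round_pow2_kb(unsigned kb)
{
    unsigned size = 1;

    while (size < kb)
        size <<= 1;
    return size;
}

/* Round a request (in KB) up to a whole number of 4 KB units. */
unsigned round_pages_kb(unsigned kb)
{
    return (kb + PAGE_SIZE_KB - 1) / PAGE_SIZE_KB * PAGE_SIZE_KB;
}
```

For a 33 KB request the first rule wastes 31 KB and the second wastes 3 KB.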

kmalloc(include/linux/slob.h)

.. code-block:: c

   static __always_inline void *kmalloc(size_t size, gfp_t flags)
   {
       return __kmalloc_node(size, flags, -1);
   }

kmalloc_node(include/linux/slab_def.h line: 183)

.. code-block:: c

     static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
     {
             struct kmem_cache *cachep;
             void *ret;

             if (__builtin_constant_p(size)) {
                     int i = 0;
                     if (!size)
                             return ZERO_SIZE_PTR;

     #define CACHE(x) \
           if (size <= x) \
              goto found; \
           else \
              i++;
     #include <linux/kmalloc_sizes.h>
     #undef CACHE
          return NULL;
     found:
     #ifdef CONFIG_ZONE_DMA
          if (flags & GFP_DMA)
              cachep = malloc_sizes[i].cs_dmacachep;
          else
     #endif
          cachep = malloc_sizes[i].cs_cachep;

          ret = kmem_cache_alloc_node_notrace(cachep, flags, node);

          trace_kmalloc_node(_THIS_IP_, ret,
                             size, slab_buffer_size(cachep),
                             flags, node);
          return ret;
             }
            return __kmalloc_node(size, flags, node);
     }

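No brk()/sbrk()

Without virtual memory (VM), brk()/sbrk() cannot be implemented efficiently: uClinux allocates memory contiguously and has no VM mechanism,

so there is rarely a chance to extend the end of a memory segment in place, and the call could only be implemented on top of the mmap system call.

It is simpler to use mmap directly than to implement a brk that calls mmap.

Memory is therefore allocated directly from the global memory pool (the kernel free memory pool).

This saves memory, because the system allocates memory only when a process needs it,

and the memory goes back to the global pool once the process is done with it (in contrast to a pre-allocated heap system).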
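
As a rough illustration of allocating on demand and handing the memory back, the sketch below wraps anonymous `mmap`/`munmap`. `pool_alloc` and `pool_free` are hypothetical names chosen for this example, and the code was written against a regular Linux host; it is not how the uClinux C library is implemented.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>

/* Take len bytes straight from the kernel; returns NULL on failure. */
void *pool_alloc(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}

/* Hand the region back to the kernel; returns 0 on success. */
int pool_free(void *p, size_t len)
{
    return munmap(p, len);
}
```

The caller has to remember the length of each region, since `munmap` needs it to release the memory.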

vfork

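Because uClinux has no virtual memory (VM), fork-like behavior can only be provided by vfork(), which means the parent process and the child process share the same memory.

After the private data has been initialized and a new task control block has been created, the parent process is suspended.

The child process then replaces the current program (exec) or exits, and at that point the parent process is woken up.

- Description

   fork and vfork have the same effect, but the behavior of vfork is undefined if the child does any of the following:

   1. Modifies any data other than the pid_t variable used to store the return value of vfork.

   2. Calls any other function before successfully calling _exit(2) or one of the exec(3) family of functions.

   3. Returns from the function in which vfork was called.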
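
A minimal sketch of a vfork() call that stays within these three rules is shown below. `spawn_and_wait` is a hypothetical helper written for this page; the child only calls `execv` and, if that fails, `_exit`.

```c
#define _DEFAULT_SOURCE   /* exposes vfork() on glibc */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run path with argv in a child created by vfork(). Returns the child's
 * exit status, or -1 if it could not be created or did not exit normally. */
int spawn_and_wait(const char *path, char *const argv[])
{
    int status;
    pid_t pid = vfork();       /* pid is the only data the child may store to */

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(path, argv);     /* replaces the child's program image */
        _exit(127);            /* exec failed: leave without returning */
    }
    /* The parent resumes here once the child has called exec or _exit. */
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling it with `/bin/sh` and the arguments `sh -c "exit 7"` should return 7 on a system that has `/bin/sh`.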

File system terminology

Superblock

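The superblock sits at the start of every file system. It stores information such as the size of the file system, the free and used blocks with their respective counts, and other data of that kind.

Every access to a file in a file system goes through that file system's superblock.

If the superblock is damaged, it may be impossible to retrieve data from the disk.

Inode

Every file or directory is represented by an inode, which stores its size, permissions, ownership, and the location of the file or directory on disk.

uClinux file system: romfs

Because virtual memory cannot be implemented, uClinux uses romfs, which guarantees that files are stored contiguously.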

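Features

- Read-only file system
- No copy-on-write

  With copy-on-write, when several callers request the same resource at the same time, they all receive a pointer to the same resource, until one caller tries to modify it.

  Only then does the system make a private copy for that caller, so that the other callers do not see the modification.

  Because uClinux has no fork, copy-on-write cannot be used.

- XIP (Execute in place)

  Programs run directly from flash without being copied to RAM. This reduces memory usage, but execution is slower.

The figure below illustrates XIP (`source <http://netlab.cse.yzu.edu.tw/~bpmania/%C5%F8%C0Y/%AD%D7%BD%D2/951%20%B4O%A4J%A6%A1%A7Y%AE%C9%A7@%B7~%A8t%B2%CE/%C1%BF%B8q/03%20uCLinux%20%C2%B2%A4%B6.pdf>`_)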

.. image:: /embedded/XIP.jpg

xip_file_read (defined in uclinux/mm/filemap_xip.c)

.. code-block:: c

            ssize_t xip_file_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos)
            {
                    if (!access_ok(VERIFY_WRITE, buf, len))   // the user buffer is not writable: fail with -EFAULT
                            return -EFAULT;

                    return do_xip_mapping_read(filp->f_mapping, &filp->f_ra, filp,  // copy from the XIP mapping into buf
                        buf, len, ppos);
            }

do_xip_mapping_read

.. code-block:: c

    static ssize_t
    do_xip_mapping_read(struct address_space *mapping,
                struct file_ra_state *_ra,
                struct file *filp,
                char __user *buf,
                size_t len,
                loff_t *ppos)
    {
            struct inode *inode = mapping->host;
            pgoff_t index, end_index;
            unsigned long offset;
            loff_t isize, pos;
            size_t copied = 0, error = 0;

            BUG_ON(!mapping->a_ops->get_xip_mem); // a_ops holds the methods defined for this address_space

            pos = *ppos;
            index = pos >> PAGE_CACHE_SHIFT; // PAGE_CACHE_SHIFT depends on the hardware page size
            offset = pos & ~PAGE_CACHE_MASK;

            isize = i_size_read(inode);
            if (!isize)
                    goto out;

            end_index = (isize - 1) >> PAGE_CACHE_SHIFT; // index of the last page of the file
            // e.g. with PAGE_CACHE_SHIFT = 12, a 12 KB file gives end_index = 2 (pages 0 to 2)
            do {
                    unsigned long nr, left;
                    void *xip_mem;
                    unsigned long xip_pfn;
                    int zero = 0;

                    /* nr is the maximum number of bytes to copy from this page */
                    nr = PAGE_CACHE_SIZE;
                    if (index >= end_index) {
                            if (index > end_index)
                                    goto out;
                            nr = ((isize - 1) & ~PAGE_CACHE_MASK) + 1;
                            if (nr <= offset) {
                                    goto out;
                            }
                    }
                    nr = nr - offset;
                    if (nr > len - copied)
                            nr = len - copied;

                    error = mapping->a_ops->get_xip_mem(mapping, index, 0,
                                                    &xip_mem, &xip_pfn);
                    if (unlikely(error)) {
                            if (error == -ENODATA) {
                                    /* sparse */
                                    zero = 1;
                            } else
                                    goto out;
                    }
                    if (mapping_writably_mapped(mapping)) // true when the file has a shared writable mapping; the flush below is a no-op
                            /* address based flush */ ;
                    if (!zero)
                            left = __copy_to_user(buf+copied, xip_mem+offset, nr);
                    else
                            left = __clear_user(buf + copied, nr);

                    if (left) {
                            error = -EFAULT;
                            goto out;
                    }
                    // left is 0 here, so all nr bytes reached the user buffer
                    copied += (nr - left);
                    offset += (nr - left);
                    index += offset >> PAGE_CACHE_SHIFT;
                    offset &= ~PAGE_CACHE_MASK;
            } while (copied < len);  // keep going until len bytes have been copied

    out:
            *ppos = pos + copied;
            if (filp)
                    file_accessed(filp);

            return (copied ? copied : error);
    }
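
The position arithmetic at the top of the function can be tried in isolation. The sketch below assumes 4 KB pages (a shift of 12); `XIP_PAGE_SHIFT` and the three helper names are made up for this example and stand in for the kernel's `PAGE_CACHE_SHIFT` and `PAGE_CACHE_MASK` expressions.

```c
#define XIP_PAGE_SHIFT 12                       /* assumption: 4 KB pages */
#define XIP_PAGE_SIZE  (1UL << XIP_PAGE_SHIFT)

/* Index of the page that contains byte position pos. */
unsigned long page_index(unsigned long long pos)
{
    return (unsigned long)(pos >> XIP_PAGE_SHIFT);
}

/* Offset of pos inside its page (same as pos & ~PAGE_CACHE_MASK). */
unsigned long page_offset(unsigned long long pos)
{
    return (unsigned long)(pos & (XIP_PAGE_SIZE - 1));
}

/* Index of the last page of a file of isize bytes (isize must be > 0). */
unsigned long last_page_index(unsigned long long isize)
{
    return (unsigned long)((isize - 1) >> XIP_PAGE_SHIFT);
}
```

A read starting at byte 5000 therefore begins in page 1 at offset 904, and a 12 KB file ends in page 2.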

i_size_read

.. code-block:: c

     static inline loff_t i_size_read(const struct inode *inode)
     {
     #if BITS_PER_LONG==32 && defined(CONFIG_SMP) // 32-bit SMP: the seqcount retry loop gives a consistent read even if preemption is allowed
             loff_t i_size;
             unsigned int seq;

             do {
                     seq = read_seqcount_begin(&inode->i_size_seqcount);
                     i_size = inode->i_size;
             } while (read_seqcount_retry(&inode->i_size_seqcount, seq));
             return i_size;
     #elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPT) // 32-bit preemptible uniprocessor
     // i_size is 64 bits wide, so a 32-bit CPU reads it as a low word and a high word.
     // Disabling preemption keeps another task from updating i_size between the two reads.
             loff_t i_size;

             preempt_disable();
             i_size = inode->i_size; // two 32-bit loads here; a 64-bit CPU reads i_size in a single access
             preempt_enable();
             return i_size;
     #else
             return inode->i_size;
     #endif
     }

romfs structure (`source <http://www.360doc.com/content/10/1130/21/1378815_73855062.shtml>`_)

.. image:: /embedded/romfs_struct.jpg

The romfs_super_block structure is defined in include/linux/romfs_fs.h

.. code-block:: c

  struct romfs_super_block {
     __be32   word0;      
     __be32   word1;
     __be32   size;
     __be32   checksum;
     char name[0]; // volume name: a name the user gives to this file system
  };

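word0 and word1 hold fixed values, "-rom" and "1fs-" respectively, which tell the system that this is a romfs file system.

size is the number of bytes that can be validly accessed in the romfs image (the size of the whole file system), that is, the end position of the last file.

checksum is used to verify that the data is intact.

romfs_checksum is defined in fs/romfs/super.c, line 499.

romfs_inode structure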
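
The magic and checksum tests can be sketched in user space as follows. The helper names are hypothetical. The sketch relies on two properties of the format as I understand it: the words are big-endian, and the header checksum covers at most the first 512 bytes and makes their sum come out as 0. Treat it as an illustration rather than a copy of the kernel code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read one big-endian 32-bit word. */
uint32_t be32_at(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Sum len bytes as big-endian words; a valid header sums to 0. */
uint32_t romfs_sum(const unsigned char *img, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 4 <= len; i += 4)
        sum += be32_at(img + i);
    return sum;
}

/* Return 1 if img starts with a plausible romfs superblock, else 0. */
int romfs_looks_valid(const unsigned char *img, size_t len)
{
    if (len < 16 || memcmp(img, "-rom1fs-", 8) != 0)
        return 0;

    uint32_t size = be32_at(img + 8);
    size_t covered = size < 512 ? size : 512;   /* bytes under the checksum */

    if (covered > len)
        return 0;
    return romfs_sum(img, covered) == 0;
}
```

Flipping any bit inside the covered range makes the sum non-zero, which is how a damaged header is detected.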

.. code-block:: c

   struct romfs_inode { 
      __be32 next; /* low 4 bits see ROMFH_ */ 
      __be32 spec; 
      __be32 size; 
      __be32 checksum; 
      char name[0]; 
    };

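The file type is stored in the low 4 bits of next, using the values below from include/linux/romfs_fs.h; spec holds type-specific information.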

.. code-block:: c

  #define ROMFH_HRD 0    // hard link
  #define ROMFH_DIR 1    // directory
  #define ROMFH_REG 2    // regular file
  #define ROMFH_LNK 3    // symbolic link
  #define ROMFH_BLK 4    // block device
  #define ROMFH_CHR 5    // character device
  #define ROMFH_SCK 6    // socket
  #define ROMFH_FIF 7    // fifo
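
Since the comment on `next` says its low 4 bits carry the `ROMFH_` information, a header word can be split as sketched below. The three mask values match my reading of romfs_fs.h (type in bits 0 to 2, executable flag in bit 3, the rest is the offset of the next header); the function names are hypothetical, and `next` is assumed to be already converted to host byte order.

```c
#include <stdint.h>

#define ROMFH_TYPE 7u        /* bits 0-2: file type (ROMFH_HRD .. ROMFH_FIF) */
#define ROMFH_EXEC 8u        /* bit 3: executable flag */
#define ROMFH_MASK (~15u)    /* remaining bits: offset of the next file header */

/* File type stored in a host-order next word. */
unsigned romfs_type(uint32_t next)
{
    return next & ROMFH_TYPE;
}

/* 1 if the executable bit is set, 0 otherwise. */
int romfs_is_exec(uint32_t next)
{
    return (next & ROMFH_EXEC) != 0;
}

/* Offset of the next file header within the image. */
uint32_t romfs_next_off(uint32_t next)
{
    return next & ROMFH_MASK;
}
```

For example, a `next` value of 0x4A describes an executable regular file whose successor header is at offset 0x40.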

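hard link and symbolic link

A hard link is a directory entry that refers directly to the file's data on the storage device.

A symbolic link instead points at another name (a hard link), so if the hard link it points at is deleted, the file can no longer be reached through the symbolic link.

block device and character device

Both are I/O devices.

A block device transfers data in fixed-size blocks and supports random access, for example a hard disk or an optical drive.

A character device transfers data as a variable-length stream of characters and supports only sequential access, for example a terminal or a printer.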

EXT2

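uClinux still provides ext2 so that mount points can be read-write.

In an EXT2 file system, directories are special files that are used to create and hold the access paths to the files in the file system.

The figure below shows the directory structure (`source <http://www.science.unitn.it/~fiorella/guidelinux/tlk/node99.html>`_)

.. image:: /embedded/ext2_menu.jpg

ext2_inode diagram (`source <http://en.wikipedia.org/wiki/Ext2>`_)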

.. image:: /embedded/ext2_inode.jpg

Process State

Process states in uClinux are similar to those in Linux; they are defined in include/linux/sched.h, line 183.

.. code-block:: c

   #define TASK_RUNNING                0
   #define TASK_INTERRUPTIBLE          1
   #define TASK_UNINTERRUPTIBLE        2
   #define __TASK_STOPPED              4
   #define __TASK_TRACED               8
   /* in tsk->exit_state */
   #define EXIT_ZOMBIE                16
   #define EXIT_DEAD                  32
   /* in tsk->state again */
   #define TASK_DEAD                  64
   #define TASK_WAKEKILL             128
   #define TASK_WAKING               256
   #define TASK_STATE_MAX            512   // OR-ing all the state bits together gives at most 511
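
Because every state is a separate bit, states can be combined and tested with bitwise operators. The sketch below copies the values from the listing into an enum with hypothetical `ST_` names so that it compiles outside the kernel.

```c
/* Bit values copied from the listing above, under made-up names. */
enum {
    ST_INTERRUPTIBLE = 1, ST_UNINTERRUPTIBLE = 2, ST_STOPPED = 4,
    ST_TRACED = 8, ST_EXIT_ZOMBIE = 16, ST_EXIT_DEAD = 32, ST_DEAD = 64,
    ST_WAKEKILL = 128, ST_WAKING = 256, ST_MAX = 512
};

/* OR of every state bit: the largest value a state mask can take. */
unsigned all_states(void)
{
    return ST_INTERRUPTIBLE | ST_UNINTERRUPTIBLE | ST_STOPPED | ST_TRACED |
           ST_EXIT_ZOMBIE | ST_EXIT_DEAD | ST_DEAD | ST_WAKEKILL | ST_WAKING;
}

/* Nonzero if state has at least one of the bits in mask set. */
int state_matches(unsigned state, unsigned mask)
{
    return (state & mask) != 0;
}
```

The kernel builds compound states the same way, for instance `TASK_KILLABLE` is `TASK_WAKEKILL | TASK_UNINTERRUPTIBLE`.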

The figure below shows the state diagram (source: <>, compiled by 任哲樊生文)

.. image:: /embedded/status.jpg

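Conditions that trigger a process state change

- The time slice given to a process runs out, and it must hand the resource over to another process.
- The process waits for a response from a device or another process and goes to sleep.

Scheduler

Changing the system's scheduling policy

Defined in kernel/sched.c, line 6293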

.. code-block:: c

  static int __sched_setscheduler(struct task_struct *p, int policy,
                          struct sched_param *param, bool user)
  {
  int retval, oldprio, oldpolicy = -1, on_rq, running;
  unsigned long flags;
  const struct sched_class *prev_class = p->sched_class;
  struct rq *rq;
  int reset_on_fork;

  /* may grab non-irq protected spin_locks */
  BUG_ON(in_interrupt());
  recheck:  
  /* double check policy once rq lock held */
  if (policy < 0) {
          reset_on_fork = p->sched_reset_on_fork;
          policy = oldpolicy = p->policy;
  } else {
          reset_on_fork = !!(policy & SCHED_RESET_ON_FORK);
          policy &= ~SCHED_RESET_ON_FORK;

          if (policy != SCHED_FIFO && policy != SCHED_RR &&   // only these five policies are valid
                          policy != SCHED_NORMAL && policy != SCHED_BATCH &&
                          policy != SCHED_IDLE)
                  return -EINVAL;
  }

  /*
   * Valid priorities for SCHED_FIFO and SCHED_RR are
   * 1..MAX_USER_RT_PRIO-1, valid priority for SCHED_NORMAL,
   * SCHED_BATCH and SCHED_IDLE is 0.
   */
  if (param->sched_priority < 0 ||
      (p->mm && param->sched_priority > MAX_USER_RT_PRIO-1) ||
      (!p->mm && param->sched_priority > MAX_RT_PRIO-1))
          return -EINVAL;
  if (rt_policy(policy) != (param->sched_priority != 0))
          return -EINVAL;

  /*
   * Allow unprivileged RT tasks to decrease priority:
   */
  if (user && !capable(CAP_SYS_NICE)) {
          if (rt_policy(policy)) {
                  unsigned long rlim_rtprio;

                  if (!lock_task_sighand(p, &flags))
                          return -ESRCH;
                  rlim_rtprio = p->signal->rlim[RLIMIT_RTPRIO].rlim_cur;
                  unlock_task_sighand(p, &flags);

                  /* can't set/change the rt policy */
                  if (policy != p->policy && !rlim_rtprio)
                          return -EPERM;

                  /* can't increase priority */
                  if (param->sched_priority > p->rt_priority &&
                      param->sched_priority > rlim_rtprio)
                          return -EPERM;
          }
          /*
           * Like positive nice levels, dont allow tasks to
           * move out of SCHED_IDLE either:
           */
          if (p->policy == SCHED_IDLE && policy != SCHED_IDLE)
                  return -EPERM;

          /* can't change other user's priorities */
          if (!check_same_owner(p))
                  return -EPERM;

          /* Normal users shall not reset the sched_reset_on_fork flag */
          if (p->sched_reset_on_fork && !reset_on_fork)
                  return -EPERM;
  }

  if (user) {
  #ifdef CONFIG_RT_GROUP_SCHED
          /*
           * Do not allow realtime tasks into groups that have no runtime
           * assigned.
           */
          if (rt_bandwidth_enabled() && rt_policy(policy) &&
                          task_group(p)->rt_bandwidth.rt_runtime == 0)
                  return -EPERM;
  #endif

          retval = security_task_setscheduler(p, policy, param);
          if (retval)
                  return retval;
  }

  /*
   * make sure no PI-waiters arrive (or leave) while we are
   * changing the priority of the task:
   */
  raw_spin_lock_irqsave(&p->pi_lock, flags);
  /*
   * To be able to change p->policy safely, the apropriate
   * runqueue lock must be held.
   */
  rq = __task_rq_lock(p);
  /* recheck policy now with rq lock held */
  if (unlikely(oldpolicy != -1 && oldpolicy != p->policy)) {
          policy = oldpolicy = -1;
          __task_rq_unlock(rq);
          raw_spin_unlock_irqrestore(&p->pi_lock, flags);
          goto recheck;
  }
  update_rq_clock(rq);
  on_rq = p->se.on_rq;
  running = task_current(rq, p);
  if (on_rq)
          deactivate_task(rq, p, 0);
  if (running)
          p->sched_class->put_prev_task(rq, p);

  p->sched_reset_on_fork = reset_on_fork;

  oldprio = p->prio;
  __setscheduler(rq, p, policy, param->sched_priority);

  if (running)
          p->sched_class->set_curr_task(rq);
  if (on_rq) {
          activate_task(rq, p, 0);

          check_class_changed(rq, p, prev_class, oldprio, running);
  }
  __task_rq_unlock(rq);
  raw_spin_unlock_irqrestore(&p->pi_lock, flags);

  rt_mutex_adjust_pi(p);

  return 0;
  }
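
The policy and priority range checks near the top of the function can be mirrored in a small user-space function. This is a simplified sketch, not the kernel logic in full: it only knows the three POSIX policies from `<sched.h>` (`SCHED_OTHER` is the user-space name for `SCHED_NORMAL`), it leaves out `SCHED_BATCH`, `SCHED_IDLE` and all permission checks, and `DEMO_MAX_RT_PRIO` is an assumed stand-in for `MAX_USER_RT_PRIO`.

```c
#include <errno.h>
#include <sched.h>

#define DEMO_MAX_RT_PRIO 100   /* assumption: stands in for MAX_USER_RT_PRIO */

/* 1 for the real-time policies, 0 otherwise. */
int is_rt_policy(int policy)
{
    return policy == SCHED_FIFO || policy == SCHED_RR;
}

/* Returns 0 if the policy/priority pair is acceptable, -EINVAL if not. */
int check_policy_prio(int policy, int prio)
{
    if (policy != SCHED_FIFO && policy != SCHED_RR && policy != SCHED_OTHER)
        return -EINVAL;
    if (prio < 0 || prio > DEMO_MAX_RT_PRIO - 1)
        return -EINVAL;
    /* RT policies need a non-zero priority, the others need exactly 0. */
    if (is_rt_policy(policy) != (prio != 0))
        return -EINVAL;
    return 0;
}
```

The last test is the compact form used in the kernel: comparing two truth values rejects both "real-time with priority 0" and "non-real-time with a non-zero priority" in one expression.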

About BUG_ON(condition) and BUG()

Defined in include/asm-generic/bug.h

.. code-block:: c

    #define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)

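BUG_ON is used to detect that the kernel has hit a problem: if the argument passed to BUG_ON(condition) is true, a bug is confirmed.

The system prints the BUG message and then calls panic.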

- BUG()

.. code-block:: c

    #define BUG() do{\
        printk("BUG: failure at %s:%d/%s()!\n", __FILE__, __LINE__, __func__);\
        panic("BUG!");\
    }while(0)

- unlikely()

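   Used to make better use of the pipeline: it tells the compiler that the guarded code is unlikely to run.

   At compile time the compiler can then reorder the instructions to improve execution efficiency. It is implemented with the GCC builtin __builtin_expect().

   likely() works the same way, but marks a condition that is expected to be true.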
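
A user-space stand-in shows how the pieces fit together. `DEMO_BUG_ON` and `checked_div` are hypothetical names; the sketch prints to stderr and calls `abort()` where the kernel would use `printk` and `panic`, and it assumes a compiler that provides `__builtin_expect` (GCC or Clang).

```c
#include <stdio.h>
#include <stdlib.h>

#define unlikely(x) __builtin_expect(!!(x), 0)   /* GCC/Clang builtin */

/* Report where the check failed, then abort the program. */
#define DEMO_BUG_ON(cond) do {                                   \
        if (unlikely(cond)) {                                    \
            fprintf(stderr, "BUG: failure at %s:%d/%s()!\n",     \
                    __FILE__, __LINE__, __func__);               \
            abort();                                             \
        }                                                        \
    } while (0)

/* Integer division that treats a zero divisor as a programming error. */
int checked_div(int a, int b)
{
    DEMO_BUG_ON(b == 0);
    return a / b;
}
```

The `do { ... } while (0)` wrapper without a trailing semicolon lets the macro be used like an ordinary statement, including inside an `if`/`else`.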

Interrupt Handler

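When an external interrupt occurs, the hardware automatically saves the address of the next instruction of the interrupted program on the stack and disables interrupts, then loads into PC the address of the interrupt service routine found in the interrupt vector table, which completes the jump.

Taking the Cortex-M4 as an example: after an external interrupt occurs, the NVIC updates IPSR, and PC is loaded with the word stored in the vector table at table base + offset (offset = exception number * 4; the table base is held in VTOR and is 0x0000 0000 after reset). PC then points at the first address of the interrupt service routine.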

.. image:: /embedded/memory_access_behavior.jpg

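Besides a small amount of urgent work, an interrupt request usually includes work that is not urgent, such as processing or analysing data.

uClinux therefore keeps the Linux approach to interrupt handling and splits the ISR into a top half and a bottom half.

Work that needs an immediate response goes in the top half; work that can be handled later goes in the bottom half. To manage these two kinds of work, the kernel provides a hard-interrupt system and a soft-interrupt system.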
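
The split can be imitated with a small single-threaded simulation: the "top half" only stores the sample in a queue, and the "bottom half" does the slower processing later. All names here are hypothetical and no kernel API is used; a real driver would defer the work with the kernel's own softirq-based mechanisms.

```c
#include <stddef.h>

#define QUEUE_LEN 8                  /* hypothetical queue depth */

static int queue[QUEUE_LEN];
static size_t head, tail;            /* head: next to process, tail: next free */
static long processed_sum;

/* Top half: store the sample and return quickly. Returns 0, or -1 if full. */
int top_half(int sample)
{
    if (tail - head == QUEUE_LEN)
        return -1;
    queue[tail++ % QUEUE_LEN] = sample;
    return 0;
}

/* Bottom half: do the slow work later; returns how many samples it handled. */
size_t bottom_half(void)
{
    size_t n = 0;

    while (head != tail) {
        processed_sum += queue[head++ % QUEUE_LEN];
        n++;
    }
    return n;
}

/* Running total of everything the bottom half has processed. */
long total_processed(void)
{
    return processed_sum;
}
```

Keeping the top half this short is the point of the design: the time spent with further interrupts held off stays small, and the expensive part runs when the system can afford it.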

To respond to IRQs from external devices, the kernel provides a handler function for each IRQ.

The number of IRQs is defined in arch/arm/mach-stm32/include/mach/irqs.h.

.. code-block:: c

    #define NR_IRQS     90  /* STM32F2 */

Interrupt descriptor (include/linux/irq.h):

.. code-block:: c

    struct irq_desc {
        unsigned int        irq;
        struct timer_rand_state *timer_rand_state;
        unsigned int            *kstat_irqs;
    #ifdef CONFIG_INTR_REMAP
        struct irq_2_iommu      *irq_2_iommu;
    #endif
        irq_flow_handler_t  handle_irq;
        struct irq_chip     *chip;
        struct msi_desc     *msi_desc;
        void            *handler_data;
        void            *chip_data;
        struct irqaction    *action;    /* IRQ action list */
        unsigned int        status;     /* IRQ status */

        unsigned int        depth;      /* nested irq disables */
        unsigned int        wake_depth; /* nested wake enables */
        unsigned int        irq_count;  /* For detecting broken IRQs */
        unsigned long       last_unhandled; /* Aging timer for unhandled count */
        unsigned int        irqs_unhandled;
        raw_spinlock_t      lock;
    #ifdef CONFIG_SMP
        cpumask_var_t       affinity;
        unsigned int        node;
    #ifdef CONFIG_GENERIC_PENDING_IRQ
        cpumask_var_t       pending_mask;
    #endif
    #endif
        atomic_t        threads_active;
        wait_queue_head_t       wait_for_threads;
    #ifdef CONFIG_PROC_FS
        struct proc_dir_entry   *dir;
    #endif
        const char      *name;
    } ____cacheline_internodealigned_in_smp;

Interrupt structure (`source <http://blog.csdn.net/luyesy/article/details/5947310>`_)

.. image:: /embedded/irqdesc.png

GPIO

GPIO matters because it lets developers build for their own needs.

It allows them to extend the hardware. The input signal and output signal of each GPIO map to specific pins.

On the STM32F429 development board, each GPIO has a number (0 to 127):

.. code-block:: console

 ~ # cd /sys/class/gpio
 /sys/class/gpio # ls
 export     gpiochip0  unexport
 /sys/class/gpio # cat gpiochip0/ngpio
 128
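
A program can drive a pin through the same sysfs directory. The sketch below assumes the usual sysfs GPIO layout (`export`, then `gpioN/direction` and `gpioN/value`); the helper names are hypothetical, and the code needs a board with this interface enabled and enough permissions to do anything useful.

```c
#include <stdio.h>

/* Build "/sys/class/gpio/gpio<N>/<attr>" in buf.
 * Returns the path length, or -1 if buf is too small. */
int gpio_path(char *buf, size_t len, unsigned gpio, const char *attr)
{
    int n = snprintf(buf, len, "/sys/class/gpio/gpio%u/%s", gpio, attr);

    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Write a short string to a sysfs file; returns 0 on success, -1 on error. */
int sysfs_write(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");

    if (!f)
        return -1;
    int ok = fputs(value, f) >= 0;
    return (fclose(f) == 0 && ok) ? 0 : -1;
}

/* Export a GPIO, make it an output and drive it high; 0 on success. */
int gpio_set_high(unsigned gpio)
{
    char path[64], num[16];

    snprintf(num, sizeof num, "%u", gpio);
    if (sysfs_write("/sys/class/gpio/export", num) < 0)
        return -1;
    if (gpio_path(path, sizeof path, gpio, "direction") < 0 ||
        sysfs_write(path, "out") < 0)
        return -1;
    if (gpio_path(path, sizeof path, gpio, "value") < 0)
        return -1;
    return sysfs_write(path, "1");
}
```

On the board above, valid arguments to `gpio_set_high` would be 0 to 127, matching the `ngpio` value of 128.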

Hardware driver principles

GPIO

- `Linux GPIO interface <https://www.kernel.org/doc/Documentation/gpio/gpio.txt>`_

EXTI, NVIC

- interrupt handling

Performance

Q&A

Q1. Relationship between fork() and MMU.

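If the program contains pointers, the pointer values stored in the child are the same as those in the parent process.

Suppose the parent process has a pointer:

.. code-block:: c

   int a = 0;
   int *b = &a;

If the child is created by a plain copy,

changing \*b from the child process does not change a in the child process; the value in the parent process is changed instead.

fork therefore needs an MMU, which in effect adds a fixed value (or shift) to these pointers so that they remain correct.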
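
On a host that does have an MMU, the effect can be observed directly. The sketch below is meant for a regular Linux system, not for uClinux (where fork() is not available); `parent_value_after_child_write` is a hypothetical name.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Let a forked child write through a pointer, then report what the
 * parent sees in its own variable. Returns -1 on error. */
int parent_value_after_child_write(void)
{
    int a = 0;
    int *b = &a;
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        *b = 42;                     /* same address, but the child's own copy */
        _exit(*b == 42 ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return a;                        /* unchanged in the parent */
}
```

The pointer holds the same numeric address in both processes; the MMU maps that address to different physical memory in each of them, so the parent still reads 0.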

Q2. What is TLB?

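Translation Lookaside Buffer: a cache that speeds up address mapping.

On a TLB hit, the CPU goes straight to the corresponding memory location to fetch the instruction; on a TLB miss, it looks up the corresponding memory location in the page table and fetches the instruction from there;

if a page fault occurs, the program has to be loaded from storage into memory.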

Q3. What is slab & slob allocator?

Q4. What is buddy?

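Q5. How does the program loader load the OS? (using uClinux as the example)

1. Load the OS data segment.

2. Initialize the OS kernel; the kernel executes in place (XIP) while its data segment is kept in RAM.

3. Load the applications.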

Q6. Real example about task states change.

References

- `Introduction to uClinux <http://free-electrons.com/docs/uclinux/>`_
- `Getting Familiar with uClinux/ARM 2.6 <http://opensrc.sec.samsung.com/Getting_Familiar_with_uClinuxARM2_6.html>`_
- `Practical Advice on Running uClinux on Cortex-M3/M4 <http://electronicdesign.com/embedded/practical-advice-running-uclinux-cortex-m3m4>`_
- `µClinux for ARM Cortex-M3 <http://www.eetimes.com/document.asp?doc_id=1316982>`_