@ShawnHuang
Last active October 19, 2015 08:03
linux project

Project 1

Total Score: 110 Points

# I. Introduction

The purpose of this project is to let you configure and modify the kernel by yourself. In this project you will learn:

  • Some data structures that Linux kernel uses to maintain virtual addresses and physical addresses of a process.
  • Some functions and macros related to virtual addresses, physical addresses, and processes.

In the following project description, you can use Google to find the meanings of words or phrases prefixed with the tag.

### II. Project Description

#### 1. Add a new system call in Linux kernel. (10 points in this part)

New system call prototype: int sys_project(long pid);

: Add a new system call in Linux

The following tasks need to be done inside your new system call which is in the kernel address space.

#### 2. Find the process according to the pid parameter. Print the image name of the process. (10 points)

For example, if you execute an executable file called “project.out,” the result that you print out should be “project.out”.

: struct task_struct

#### 3. Dump the process virtual address space areas.

The output may be similar to the following figure.

Print the vm_start and vm_end of all virtual address areas of the process. (15 points). If a file is associated with a virtual address area, print the name of the file. (15 points)

: struct mm_struct, struct vm_area_struct

: you can use /proc/$pid/maps to verify your result.

#### 4. Dump the physical frame addresses that the process is using (15 points).

: struct page and related functions and macros.

#### 5. Optional Bonus Point: (10 points)

Add a new system call int nonwritable(unsigned long begin, unsigned long end) which specifies a virtual address range between begin and end as non-writable. You can use program project_user_2.c in appendix to verify your code.

: Change a page table entry from read-write to read-only.

: struct vm_area_struct, vm_flag, page fault

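#include <stdio.h>
#include <stdlib.h>

/* nonwritable() is the new system call from part 5. This program assumes a
 * user-space wrapper with this prototype is available at link time. */
int nonwritable(unsigned long begin, unsigned long end);

int main(void)
{
    char *ptr = (char *)malloc(0x1000);
    unsigned int i;

    if (ptr == NULL)
        return 1;

    for (i = 0; i < 0x1000; i++)
        ptr[i] = 'a';

    for (i = 0; i < 10; i++)
        printf("%c ", ptr[i]);

    nonwritable((unsigned long)ptr, (unsigned long)ptr + 0x1000 - 1);

    /* verify read permission */
    for (i = 0; i < 10; i++)
        printf("%c ", ptr[i]);

    /* write permission: this store should fault if the range is read-only */
    ptr[0] = 'w';
    printf("if u see this message, you missed 15 points\n");
    return 0;
}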

### III. TA Q & A (20 points)

#### 1. Describe how you finished the project and the problems you encountered.

#### 2. Describe how you verify the result in step 4.

### IV. Report Content (15 points):

ghost commented Dec 6, 2014

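About the bonus question, I tried thinking in a different direction. The teacher said that when they did it, they had in principle changed the PTE, but it still did not core dump. So I looked through mm/memory.c and found an interesting function: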

(kernel 3.14.25)

/**
 * follow_page_mask - look up a page descriptor from a user-virtual address
 * @vma: vm_area_struct mapping @address
 * @address: virtual address to look up
 * @flags: flags modifying lookup behaviour
 * @page_mask: on output, *page_mask is set according to the size of the page
 *
 * @flags can have FOLL_ flags set, defined in <linux/mm.h>
 *
 * Returns the mapped (struct page *), %NULL if no mapping exists, or
 * an error pointer if there is a mapping to something not represented
 * by a page descriptor (see also vm_normal_page()).
 */
struct page *follow_page_mask(struct vm_area_struct *vma,
                  unsigned long address, unsigned int flags,
                  unsigned int *page_mask)

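This function specifically checks how user mode uses memory. If the access is out of bounds it simply skips it, and does not fetch the page the way we expect. So, going by what the teacher said, could the teacher's approach be related to this function in memory.c? If it is, then once that works we could change this spot to add an exception so that it does not skip our request, and maybe that would give us a chance?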

ghost commented Dec 6, 2014

Kernel 3.14.25: adding a syscall step by step, tested on Ubuntu 14.04

1. Modify the system call table

    arch/x86/syscalls/syscall_*.tbl

    a. Add to syscall_32.tbl

        353    i386    project         sys_project        compat_sys_project

    b. Add to syscall_64.tbl

        316    64        project          sys_project


2. Add the function declarations

    arch/x86/include/asm/syscalls.h

    a. After asmlinkage long sys_get_thread_area(struct user_desc __user *); add

        asmlinkage long sys_project(long);


    include/linux/compat.h

    (Also add #include <linux/syscalls.h> to compat.h, to work around a bug.)

    b. Inside #ifdef CONFIG_COMPAT, after asmlinkage long compat_sys_fanotify_mark(int, unsigned int, __u32, __u32, int, const char __user *); add

        asmlinkage long compat_sys_project(long);


3. Add the implementation files

    mkdir arch/x86/kernel/project_01

    a. Edit arch/x86/kernel/Makefile so that it descends into the Makefile under project_01, by adding

        obj-y        += project_01/


    b. Create arch/x86/kernel/project_01/Makefile

        obj-y                          := project.o
        obj-$(CONFIG_IA32_EMULATION)   += compat_project.o

    c. Create arch/x86/kernel/project_01/project.c

        #include <linux/printk.h>
        #include <linux/syscalls.h>

        #include <asm/syscalls.h>

        SYSCALL_DEFINE1(project, long, pid)
        {
            // Add the body yourself, following what was posted earlier; remember to include the headers
            return 1;
        }



    d. Create arch/x86/kernel/project_01/compat_project.c

        #include <linux/printk.h>
        #include <linux/compat.h>

        COMPAT_SYSCALL_DEFINE1(project, long, pid)
        {
            // Add the body yourself, following what was posted earlier; remember to include the headers
            return 1;
        }

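4. Test

    #include <unistd.h>
    #include <sys/syscall.h>

    int main(void)
    {
    #ifdef __x86_64__
        /* x86-64: number added to syscall_64.tbl */
        syscall(316, (long)getpid());
    #else
        /* i386: number added to syscall_32.tbl */
        syscall(353, (long)getpid());
    #endif
        return 0;
    }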

ghost commented Dec 7, 2014

Modifying the PTE succeeded, judging from dmesg:

[  528.469847] BUG: Bad page map in process a.out  pte:780001a48003 pmd:765ca067
[  528.469852] addr:0000000001a58000 vm_flags:08100073 anon_vma:ffff880235532f80 mapping:          (null) index:1a58

But memory.c seems to have a function (a mechanism) that handles this problem, like "try & catch", so I think that if we want to make it core dump, we must find a way to bypass this checking mechanism.

(Sorry, I haven't installed a Zhuyin input method on this test system, so I have to use my poor English.)

ghost commented Dec 8, 2014

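My current thinking on why it does not core dump:

We replace the PTE while in kernel mode, and the Linux kernel does not let this kind of illegal operation in kernel mode bring things down (perhaps as a protection?). Instead it calls die(Oops), recovers, and prints the current error message. If it happened in user mode, it would simply core dump.

If we just comment out the call to die(Oops), I wonder whether it would core dump. That is dangerous, though: if every illegal memory access brought things down, I am not sure the system could still run...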

ghost commented Dec 8, 2014

#include <linux/init.h>  
#include <linux/module.h>  
#include <linux/kernel.h>  
#include <linux/slab.h>  
#include <linux/gfp.h>  
#include <asm/pgtable.h>  
#include <asm/page.h>  
#include <asm/pgalloc.h>
#include <linux/sched.h>  
#include <linux/mm.h>  
#include <linux/highmem.h>  


#include <linux/printk.h>
#include <linux/syscalls.h>

#include <asm/syscalls.h>

typedef unsigned long ULONG;
typedef unsigned long* ULONG_PTR;
#define PtrToUlong( p ) ((ULONG)(ULONG_PTR) (p) )

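/* Experiment: point the PTE that maps ptr at the frame behind ptr_mod.
 * The results of the *_alloc calls are not checked for NULL. */
SYSCALL_DEFINE2(nonwritable, char *, ptr, char *, ptr_mod)
{ 
    pgd_t *pgd;
    pud_t *pud;
    pmd_t *pmd;
    pte_t *pte;

    /* Walk down to the PTE for ptr, allocating any missing level. */
    pgd = pgd_offset(current->mm, PtrToUlong((void *)ptr));
    pud = pud_alloc(current->mm, pgd, PtrToUlong((void *)ptr));

    pmd = pmd_alloc(current->mm, pud, PtrToUlong((void *)ptr));

    pte = pte_alloc_map(current->mm, NULL, pmd, PtrToUlong((void *)ptr));

    /* virt_to_page() expects a kernel direct-mapped address; ptr_mod is a
     * user pointer, so the struct page found here is not the user's page. */
    *pte = mk_pte(virt_to_page(PtrToUlong((void *)ptr_mod)), __pgprot(_PAGE_PRESENT));
    /* pte_mkwrite() sets the write bit, the opposite of write-protecting. */
    *pte = pte_mkwrite(*pte);

    return 1;
}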

@ShawnHuang
Author

pte = pte_alloc_map(mm, pmd, address);

ghost commented Dec 10, 2014

@ShawnHuang
Author

#include <linux/init.h>  
#include <linux/module.h>  
#include <linux/kernel.h>  
#include <linux/slab.h>  
#include <linux/gfp.h>  
#include <asm/pgtable.h>  
#include <asm/page.h>  
#include <asm/pgalloc.h>
#include <asm/tlbflush.h>
#include <linux/sched.h>  
#include <linux/mm.h>  
#include <linux/highmem.h>  


#include <linux/printk.h>
#include <linux/syscalls.h>

#include <asm/syscalls.h>

typedef unsigned long ULONG;
typedef unsigned long* ULONG_PTR;
#define PtrToUlong( p ) ((ULONG)(ULONG_PTR) (p) )

asmlinkage long nonwritable(char* ptr, char* ptr_mod)
{ 
    pgd_t *pgd;
    pud_t *pud;
    pmd_t *pmd;
    pte_t *pte;
    pte_t *pte_m;
    //struct page *page;

    //pgd = pgd_offset(current->mm, PtrToUlong((void *)ptr));
    //pud = pud_alloc(current->mm, pgd, PtrToUlong((void *)ptr));

    //pmd = pmd_alloc(current->mm, pud, PtrToUlong((void *)ptr));

    //pte = pte_alloc_map(current->mm, NULL, pmd, PtrToUlong((void *)ptr));

    //*pte = mk_pte(virt_to_page(PtrToUlong((void *)ptr_mod)), __pgprot(_PAGE_PRESENT));
    //*pte = pte_mkwrite(*pte);
    //pte_t *pte;
    //unsigned int level;
    //pte = lookup_address(PtrToUlong((void *)ptr), &level);
    //set_pte_atomic(pte, pte_wrprotect(*pte));
    struct mm_struct *mm = current->mm;
    //unsigned long vm_address;   /* the virtual address */
    //for (vm_address = mm->mmap->vm_start;
    //    vm_address < mm->mmap->vm_end;
    //    vm_address += 0x1000)
    //{
    //  pgd = pgd_offset(mm, vm_address);
    //  pud = pud_offset(pgd, vm_address);
    //  pmd = pmd_offset(pud, vm_address);
    //  pte = pte_offset_map(pmd, vm_address);
    //  printk(KERN_INFO "before bit: %lx\n",pte_val(pte));
    //  flush_cache_mm(mm);
    //  flush_tlb_mm(mm);
    //  set_pte_atomic(pte, __pte(pte_val(pte_wrprotect(*pte))));
    //  printk(KERN_INFO "after bit: %lx\n",pte_val(pte));
    //  pgd = pgd_offset(mm, vm_address);
    //  pud = pud_offset(pgd, vm_address);
    //  pmd = pmd_offset(pud, vm_address);
    //  pte = pte_offset_map(pmd, vm_address);
    //  printk(KERN_INFO "after bit: %lx\n",pte_val(pte));
    //}

    printk("from %s\n",ptr);
    printk("from %c\n",*ptr);
    printk("from 0x%lx~0x%lx\n",mm->mmap->vm_start,mm->mmap->vm_end);
    printk("from %lx\n",PtrToUlong((void *)ptr));
    pgd = pgd_offset(mm, PtrToUlong((void *)ptr));
    pud = pud_offset(pgd, PtrToUlong((void *)ptr));
    pmd = pmd_offset(pud, PtrToUlong((void *)ptr));
    pte = pte_offset_map(pmd, PtrToUlong((void *)ptr));
    printk(KERN_INFO "before bit: %lx\n",pte_val(*pte));
    set_pte_atomic(pte, __pte(pte_val(pte_wrprotect(*pte))));
    printk(KERN_INFO "after bit: %lx\n",pte_val(*pte));
    flush_cache_mm(mm);
    flush_tlb_mm(mm);
    set_pte_atomic(pte, __pte(pte_val(pte_wrprotect(*pte))));
    pgd = pgd_offset(mm, PtrToUlong((void *)ptr));
    pud = pud_offset(pgd, PtrToUlong((void *)ptr));
    pmd = pmd_offset(pud, PtrToUlong((void *)ptr));
    pte = pte_offset_map(pmd, PtrToUlong((void *)ptr));
    printk(KERN_INFO "flush bit: %lx\n",pte_val(*pte));

    //set_pte(&pte, __pte(pte_val(pte)));
    //page = pte_page(pte);
    //set_pte_atomic(pte, pte_mkwrite(*pte));
    //set_pte_atomic(pte, pte_wrprotect(*pte));
    return 1;
}

@ShawnHuang
Author

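If the fault was caused by a write access, the function checks whether the memory region is writable. If not, it jumps to the bad_area code; if so, it sets the write local variable to 1.
If the fault was caused by a read or execute access, the function checks whether the page is already present in physical memory. If it is, the fault occurred because the process tried to access a privileged page in user mode (the page's User/Supervisor flag is clear), so the function jumps to the bad_area code (in practice this never happens, because the kernel never assigns privileged pages to user processes). If the page is not present in physical memory, the function also checks whether the memory region is readable or executable.
If the region's access rights match the type of access that caused the fault, the handle_mm_fault( ) function is invoked:
if (!handle_mm_fault(tsk, vma, address, write)) {
    tsk->tss.cr2 = address;
    tsk->tss.error_code = error_code;
    tsk->tss.trap_no = 14;
    force_sig(SIGBUS, tsk);
    if (!(error_code & 4)) /* kernel mode */
        goto no_context;
}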

@ShawnHuang
Author

Therefore the page table entry is set to the physical address of the zero page:

entry = pte_wrprotect(mk_pte(ZERO_PAGE, vma->vm_page_prot));

set_pte(pte, entry);

return 1;

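Since the page is marked as non-writable, the copy-on-write mechanism is activated if the process tries to write to it. Only then does the process get a page of its own and write to it. This mechanism is described in the next section.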

ghost commented Dec 10, 2014

SYSCALL_DEFINE2(nonwritable, char *, ptr, char *, ptr_mod)
{ 
    pgd_t *pgd;
    pud_t *pud;
    pmd_t *pmd;
    pte_t *pte;

    struct mm_struct *mm = current->mm;
    printk("from %s\n",ptr);
    printk("from %c\n",*ptr);
    printk("from 0x%lx~0x%lx\n",mm->mmap->vm_start,mm->mmap->vm_end);
    printk("from %lx\n",PtrToUlong((void *)ptr));

    pgd = pgd_offset(mm, PtrToUlong((void *)ptr));
    pud = pud_offset(pgd, PtrToUlong((void *)ptr));
    pmd = pmd_offset(pud, PtrToUlong((void *)ptr));
    pte = pte_offset_map(pmd, PtrToUlong((void *)ptr));

    if(pte_write(*pte))
        printk(KERN_INFO "pte still writable!!\n");

    printk(KERN_INFO "before bit: %lx\n",pte_val(*pte));

    set_pte_atomic(pte, __pte(pte_val(pte_wrprotect(*pte))));

    printk(KERN_INFO "after bit: %lx\n",pte_val(*pte));

    //flush_cache_mm(mm);
    //flush_tlb_mm(mm);

    //set_pte_atomic(pte, pte_wrprotect(*pte));
    set_pte_atomic(pte, __pte(pte_val(pte_wrprotect(*pte))));

    pgd = pgd_offset(mm, PtrToUlong((void *)ptr));
    pud = pud_offset(pgd, PtrToUlong((void *)ptr));
    pmd = pmd_offset(pud, PtrToUlong((void *)ptr));
    pte = pte_offset_map(pmd, PtrToUlong((void *)ptr));

    printk(KERN_INFO "flush bit: %lx\n",pte_val(*pte));



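    /* Note: pte is already a pte_t *, so &pte is a pte_t ** and this call
     * probably does not operate on the real page table entry. */
    ptep_set_wrprotect(mm, PtrToUlong((void *)ptr), &pte);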


    return 1;
}

ghost commented Dec 12, 2014

We first tried modifying the PTE, and it does change after "pte_wrprotect(*pte)", but it reverts after any assignment in the C code, even an assignment to something other than "ptr".

Then we added "ptep_set_wrprotect(mm, PtrToUlong((void *)ptr), &pte);". The PTE no longer reverts, but "ptr[0]='w';" is still not affected.

So we guessed that the Linux kernel has a mechanism to avoid PTE faults, and after tracing the code in handle_mm_fault, handle_pte_fault, and do_page_fault we found an approach called "copy on write". We thought that changing the page protection and rewriting the vm_flags as above would keep COW from taking effect, but we still do not get a "segmentation fault" on 3.14.25. Maybe we missed something; we will keep looking.

My English is really bad, please help me improve it...

@ShawnHuang
Author

First, we set the PTE to read-only by calling "pte_wrprotect(*pte)", and we are sure that the value changed. However, if we then assign a value in the C code, whether to the ptr that was made read-only or to any other variable, the PTE's permission reverts.

So we added "ptep_set_wrprotect(mm, PtrToUlong((void *)ptr), &pte)" to our code. The PTE no longer reverts. However, when we execute "ptr[0]='w'", the result is still the same.

Therefore we infer that the Linux kernel has a mechanism to avoid PTE faults, and after tracing the functions "handle_mm_fault, handle_pte_fault, do_page_fault" we found an approach named "copy on write". We thought that changing the page's protection and rewriting the vm_flags would prevent COW from happening. After doing that, we still do not get the "segmentation fault" error message on kernel 3.14.25. Maybe we missed something else, so we will continue looking for the solution.

@ShawnHuang
Author

[Sat Dec 13 13:23:01 2014] vm_ops ffffffff81814440
[Sat Dec 13 13:23:01 2014] vm_page_prot 25
[Sat Dec 13 13:23:01 2014] vm_page_prot 8000000000000027
[Sat Dec 13 13:23:01 2014] vm_flags 8001875
[Sat Dec 13 13:23:01 2014] vm_flags 8001875
[Sat Dec 13 13:23:01 2014] pte is writable!!
[Sat Dec 13 13:23:01 2014] before bit: 80000001f561e067
[Sat Dec 13 13:23:01 2014] before bit address: ffff880229a37f50
[Sat Dec 13 13:23:01 2014] after bit: 80000001f561e065
[Sat Dec 13 13:23:01 2014] after bit address: ffff880229a37f50
[Sat Dec 13 13:23:01 2014] flush bit: 80000001f561e065
[Sat Dec 13 13:23:01 2014] flush bit address: ffff880229a37f50
[Sat Dec 13 13:23:01 2014] page_mkwrite 0
[Sat Dec 13 13:23:01 2014] vm_ops ffffffff81814440
[Sat Dec 13 13:23:01 2014] vm_page_prot 8000000000000027
[Sat Dec 13 13:23:01 2014] vm_page_prot 8000000000000027
[Sat Dec 13 13:23:01 2014] vm_flags 8001875
[Sat Dec 13 13:23:01 2014] vm_flags 8001875
[Sat Dec 13 13:23:01 2014] before bit: 80000001f561e065
[Sat Dec 13 13:23:01 2014] before bit address: ffff880229a37f50
[Sat Dec 13 13:23:01 2014] after bit: 80000001f561e065
[Sat Dec 13 13:23:01 2014] after bit address: ffff880229a37f50
[Sat Dec 13 13:23:01 2014] flush bit: 80000001f561e065
[Sat Dec 13 13:23:01 2014] flush bit address: ffff880229a37f50

ghost commented Jul 2, 2015

ghost commented Jul 2, 2015
