

@ShawnHuang
Last active October 19, 2015 08:03
linux project

Project 1

Total Score: 110 Points

# I. Introduction

The purpose of this project is to let you configure and modify the kernel by yourself. In this project you will learn:

- Some data structures that the Linux kernel uses to maintain the virtual addresses and physical addresses of a process.
- Some functions and macros related to virtual addresses, physical addresses, and processes.

In the following project description, you can use Google to find the meanings of words or phrases prefixed with the tag.

### II. Project Description

#### 1. Add a new system call to the Linux kernel (10 points in this part)

New system call prototype: int sys_project(long pid);

: Add a new system call in Linux
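Once the call is registered, a user program reaches it by number through syscall(2). A minimal sketch of that calling convention, assuming glibc; SYS_getpid stands in for the new call here, because the number of sys_project (and any __NR_project macro you define for it) depends on your own system call table:

```c
#define _GNU_SOURCE
#include <sys/syscall.h>  /* SYS_getpid */
#include <unistd.h>       /* syscall(), getpid() */

/* Enter the kernel by system-call number, the same way a test program
 * would reach a newly added call such as sys_project. SYS_getpid is only
 * a stand-in whose number is known to the C library. */
long call_by_number(void)
{
    return syscall(SYS_getpid);
}
```

With the patched kernel, the analogous call would be syscall(__NR_project, (long)getpid()), where __NR_project is a hypothetical name for the number you assigned.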

The following tasks need to be done inside your new system call which is in the kernel address space.

#### 2. Find the process according to the pid parameter and print the image name of the process (10 points)

For example, if you execute an executable file called “project.out”, the result that you print out should be “project.out”.

: struct task_struct
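The image name is stored in task_struct's comm field, which holds at most 15 characters (TASK_COMM_LEN is 16 including the terminating NUL), so longer executable names are truncated. The kernel also exposes this field as /proc/<pid>/comm, which gives a user-space value to compare your output against. A sketch; read_comm is a hypothetical helper name:

```c
#include <stdio.h>
#include <string.h>

/* Read /proc/<pid>/comm (task_struct->comm) into buf.
 * pid == 0 means the calling process. Returns 0 on success, -1 on error. */
int read_comm(long pid, char *buf, size_t len)
{
    char path[64];
    FILE *f;

    if (pid == 0)
        snprintf(path, sizeof(path), "/proc/self/comm");
    else
        snprintf(path, sizeof(path), "/proc/%ld/comm", pid);

    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';   /* drop the trailing newline */
    return 0;
}
```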

#### 3. Dump the virtual address areas of the process

The output may be similar to the following figure.

Print the vm_start and vm_end of all virtual address areas of the process (15 points). If a file is associated with a virtual address area, print the name of the file (15 points).

: struct mm_struct, struct vm_area_struct

: You can use /proc/$pid/maps to verify your result.
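Each line of /proc/<pid>/maps describes one vm_area_struct: start and end address (vm_start, vm_end), permissions, file offset, device, inode and, for file-backed areas, the path of the mapped file. A sketch of a user-space dump in the same "start-end plus file name" form asked for in step 3, handy for diffing against the system call's output (dump_maps is a hypothetical name):

```c
#include <stdio.h>

/* Print every VMA of the calling process as "start-end [file]".
 * Returns the number of areas printed, or -1 if maps cannot be opened. */
int dump_maps(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    char line[512];
    int count = 0;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof(line), f) != NULL) {
        unsigned long start, end;
        char path[256] = "";
        /* Line format: start-end perms offset dev inode [pathname] */
        if (sscanf(line, "%lx-%lx %*s %*s %*s %*s %255[^\n]",
                   &start, &end, path) >= 2) {
            printf("0x%08lx-0x%08lx %s\n", start, end, path);
            count++;
        }
    }
    fclose(f);
    return count;
}
```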

#### 4. Dump the physical frame addresses that the process is using (15 points)

: struct page and related functions and macros.
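One way to check step 4 from user space is /proc/<pid>/pagemap, which holds one 64-bit entry per virtual page: bits 0-54 are the page frame number (PFN) when bit 63 (present) is set, and bit 62 marks a swapped page. The physical address is then PFN times the page size plus the offset within the page. A sketch under these assumptions (helper names are hypothetical). Since Linux 4.0, reading PFNs requires CAP_SYS_ADMIN, and from 4.2 on unprivileged readers see the PFN field as zero; the 3.14 kernel used in this project predates that restriction.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define PM_PFN_MASK ((UINT64_C(1) << 55) - 1)  /* bits 0-54: PFN */
#define PM_SWAPPED  (UINT64_C(1) << 62)        /* page is in swap */
#define PM_PRESENT  (UINT64_C(1) << 63)        /* page is in RAM */

/* PFN stored in a pagemap entry, or 0 if the page is not present. */
uint64_t pagemap_pfn(uint64_t entry)
{
    return (entry & PM_PRESENT) ? (entry & PM_PFN_MASK) : 0;
}

/* Read the pagemap entry for vaddr in the calling process.
 * Returns 0 on success, -1 on failure. */
int read_pagemap_entry(uintptr_t vaddr, uint64_t *entry)
{
    long page_size = sysconf(_SC_PAGESIZE);
    off_t offset = (off_t)(vaddr / (uintptr_t)page_size) * (off_t)sizeof(uint64_t);
    int fd = open("/proc/self/pagemap", O_RDONLY);
    ssize_t n;

    if (fd < 0)
        return -1;
    n = pread(fd, entry, sizeof(*entry), offset);   /* one entry per page */
    close(fd);
    return n == (ssize_t)sizeof(*entry) ? 0 : -1;
}

/* Physical address = PFN * page size + offset of vaddr within its page. */
uint64_t phys_addr(uint64_t pfn, uintptr_t vaddr)
{
    uint64_t page_size = (uint64_t)sysconf(_SC_PAGESIZE);
    return pfn * page_size + (vaddr % page_size);
}
```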

#### 5. Optional Bonus Points (10 points)

Add a new system call int nonwritable(unsigned long begin, unsigned long end), which marks the virtual address range between begin and end as non-writable. You can use the program project_user_2.c in the appendix to verify your code.

: Change a page table entry from read-write to read-only.

: struct vm_area_struct, vm_flags, page fault

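#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Assumption: __NR_nonwritable is the number you assigned to the new
 * system call in your syscall table; the C library does not define it. */
#ifndef __NR_nonwritable
#error "define __NR_nonwritable as the number of your nonwritable system call"
#endif

/* User-space wrapper: the new kernel entry point is reached via syscall(). */
static long nonwritable(unsigned long begin, unsigned long end)
{
    return syscall(__NR_nonwritable, begin, end);
}

int main(void)
{
    char *ptr = (char *)malloc(0x1000);
    unsigned int i;

    if (ptr == NULL)
        return 1;

    for (i = 0; i < 0x1000; i++)
        ptr[i] = 'a';

    for (i = 0; i < 10; i++)
        printf("%c ", ptr[i]);

    /* malloc() memory need not be page-aligned, so this range may cover two pages. */
    nonwritable((unsigned long)ptr, (unsigned long)(ptr + 0x1000 - 1));

    /* verify read permission */
    for (i = 0; i < 10; i++)
        printf("%c ", ptr[i]);
    printf("\n");
    fflush(stdout);   /* make the output visible before the expected crash */

    /* verify write permission is gone: this store should raise SIGSEGV */
    ptr[0] = 'w';
    printf("if you see this message, you missed 15 points\n");
    return 0;
}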

### III. TA Q & A (20 points)

#### 1. Describe how you finished the project and the problems you encountered.

#### 2. Describe how you verified the result in step 4.

### IV. Report Content (15 points)


ghost commented Dec 10, 2014

@ShawnHuang (Author)

#include <linux/init.h>  
#include <linux/module.h>  
#include <linux/kernel.h>  
#include <linux/slab.h>  
#include <linux/gfp.h>  
#include <asm/pgtable.h>  
#include <asm/page.h>  
#include <asm/pgalloc.h>
#include <asm/tlbflush.h>
#include <linux/sched.h>  
#include <linux/mm.h>  
#include <linux/highmem.h>  


#include <linux/printk.h>
#include <linux/syscalls.h>

#include <asm/syscalls.h>

typedef unsigned long ULONG;
typedef unsigned long* ULONG_PTR;
#define PtrToUlong( p ) ((ULONG)(ULONG_PTR) (p) )

asmlinkage long nonwritable(char __user *ptr, char __user *ptr_mod)
{ 
    pgd_t *pgd;
    pud_t *pud;
    pmd_t *pmd;
    pte_t *pte;
    pte_t *pte_m;
    //struct page *page;

    //pgd = pgd_offset(current->mm, PtrToUlong((void *)ptr));
    //pud = pud_alloc(current->mm, pgd, PtrToUlong((void *)ptr));

    //pmd = pmd_alloc(current->mm, pud, PtrToUlong((void *)ptr));

    //pte = pte_alloc_map(current->mm, NULL, pmd, PtrToUlong((void *)ptr));

    //*pte = mk_pte(virt_to_page(PtrToUlong((void *)ptr_mod)), __pgprot(_PAGE_PRESENT));
    //*pte = pte_mkwrite(*pte);
    //pte_t *pte;
    //unsigned int level;
    //pte = lookup_address(PtrToUlong((void *)ptr), &level);
    //set_pte_atomic(pte, pte_wrprotect(*pte));
    struct mm_struct *mm = current->mm;
    //unsigned long vm_address;   /* the virtual address */
    //for (vm_address = mm->mmap->vm_start;
    //    vm_address < mm->mmap->vm_end;
    //    vm_address += 0x1000)
    //{
    //  pgd = pgd_offset(mm, vm_address);
    //  pud = pud_offset(pgd, vm_address);
    //  pmd = pmd_offset(pud, vm_address);
    //  pte = pte_offset_map(pmd, vm_address);
    //  printk(KERN_INFO "before bit: %lx\n",pte_val(pte));
    //  flush_cache_mm(mm);
    //  flush_tlb_mm(mm);
    //  set_pte_atomic(pte, __pte(pte_val(pte_wrprotect(*pte))));
    //  printk(KERN_INFO "after bit: %lx\n",pte_val(pte));
    //  pgd = pgd_offset(mm, vm_address);
    //  pud = pud_offset(pgd, vm_address);
    //  pmd = pmd_offset(pud, vm_address);
    //  pte = pte_offset_map(pmd, vm_address);
    //  printk(KERN_INFO "after bit: %lx\n",pte_val(pte));
    //}

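    /* ptr is a user-space pointer: printing it with "%s" or *ptr would
     * dereference user memory from the kernel, so print the address only.
     * mm->mmap is the first VMA of the process, not necessarily the one
     * containing ptr (find_vma(mm, addr) returns that one). */
    printk(KERN_INFO "from 0x%lx~0x%lx\n", mm->mmap->vm_start, mm->mmap->vm_end);
    printk(KERN_INFO "from %lx\n", PtrToUlong((void *)ptr));

    /* Walk pgd -> pud -> pmd -> pte for the page that contains ptr. */
    pgd = pgd_offset(mm, PtrToUlong((void *)ptr));
    pud = pud_offset(pgd, PtrToUlong((void *)ptr));
    pmd = pmd_offset(pud, PtrToUlong((void *)ptr));
    pte = pte_offset_map(pmd, PtrToUlong((void *)ptr));
    printk(KERN_INFO "before bit: %lx\n", pte_val(*pte));

    /* Flush caches before changing the PTE, clear its R/W bit, then flush
     * the TLB so that no stale writable translation remains. */
    flush_cache_mm(mm);
    set_pte_atomic(pte, pte_wrprotect(*pte));
    flush_tlb_mm(mm);
    printk(KERN_INFO "after bit: %lx\n", pte_val(*pte));
    pte_unmap(pte);

    /* Walk again to read the PTE back from the page table. */
    pgd = pgd_offset(mm, PtrToUlong((void *)ptr));
    pud = pud_offset(pgd, PtrToUlong((void *)ptr));
    pmd = pmd_offset(pud, PtrToUlong((void *)ptr));
    pte = pte_offset_map(pmd, PtrToUlong((void *)ptr));
    printk(KERN_INFO "flush bit: %lx\n", pte_val(*pte));
    pte_unmap(pte);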

    //set_pte(&pte, __pte(pte_val(pte)));
    //page = pte_page(pte);
    //set_pte_atomic(pte, pte_mkwrite(*pte));
    //set_pte_atomic(pte, pte_wrprotect(*pte));
    return 1;
}
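Each of the pgd_offset, pud_offset, pmd_offset and pte_offset_map calls above indexes one table level. On x86-64 with 4-level paging and 4 KiB pages (an assumption; other architectures and 5-level paging differ), a 48-bit virtual address splits into four 9-bit indices and a 12-bit page offset. A sketch of that split (split_va is a hypothetical helper):

```c
#include <stdint.h>

/* Table indices and page offset of an x86-64 virtual address,
 * assuming 4-level paging and 4 KiB pages. */
struct va_parts {
    unsigned pgd, pud, pmd, pte, offset;
};

struct va_parts split_va(uint64_t va)
{
    struct va_parts p;

    p.pgd    = (unsigned)((va >> 39) & 0x1ff);  /* bits 39-47 */
    p.pud    = (unsigned)((va >> 30) & 0x1ff);  /* bits 30-38 */
    p.pmd    = (unsigned)((va >> 21) & 0x1ff);  /* bits 21-29 */
    p.pte    = (unsigned)((va >> 12) & 0x1ff);  /* bits 12-20 */
    p.offset = (unsigned)(va & 0xfff);          /* bits 0-11 */
    return p;
}
```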

@ShawnHuang (Author)

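If the fault was caused by a write access, the function checks whether the memory region is writable. If it is not, it jumps to the bad_area code; if it is, it sets the local variable write to 1.
If the fault was caused by a read or execute access, the function checks whether the page is already present in physical memory. If it is, the fault occurred because the process tried to access a privileged page in User Mode (the page's User/Supervisor flag is cleared), so the function jumps to the bad_area code (in practice this never happens, because the kernel never assigns privileged pages to user processes). If the page is not present in physical memory, the function also checks whether the memory region is readable or executable.
If the region's access rights match the type of access that caused the fault, the function calls handle_mm_fault():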
if (!handle_mm_fault(tsk, vma, address, write)) {
    tsk->tss.cr2 = address;
    tsk->tss.error_code = error_code;
    tsk->tss.trap_no = 14;
    force_sig(SIGBUS, tsk);
    if (!(error_code & 4)) /* kernel mode */
        goto no_context;
}
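The check described above can be modelled as a small decision function. This is a simplified, hypothetical model of the logic in the quoted text, not kernel code; the M_VM_* values mirror the kernel's VM_READ/VM_WRITE/VM_EXEC, and the error-code bits follow x86 (bit 0: protection fault on a present page, bit 1: write access, bit 2: user mode):

```c
/* Simplified model of the access check before handle_mm_fault(). */
#define M_VM_READ  0x1u
#define M_VM_WRITE 0x2u
#define M_VM_EXEC  0x4u

#define ERR_PROT   0x1u  /* page was present: protection violation */
#define ERR_WRITE  0x2u  /* fault caused by a write */
#define ERR_USER   0x4u  /* fault taken in user mode */

enum fault_action { BAD_AREA, HANDLE_MM_FAULT };

enum fault_action classify_fault(unsigned vm_flags, unsigned error_code)
{
    if (error_code & ERR_WRITE) {
        /* Write access: the memory region itself must be writable. */
        if (!(vm_flags & M_VM_WRITE))
            return BAD_AREA;
    } else {
        /* Read/exec of a present page: privileged-page violation. */
        if (error_code & ERR_PROT)
            return BAD_AREA;
        /* Page not present: region must be readable or executable. */
        if (!(vm_flags & (M_VM_READ | M_VM_EXEC)))
            return BAD_AREA;
    }
    return HANDLE_MM_FAULT;
}
```

In this model, a write fault on a page whose region still has VM_WRITE is not rejected; it is passed on to handle_mm_fault. Clearing only the R/W bit in a PTE leaves vm_flags untouched.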

@ShawnHuang (Author)

The page table entry is therefore set to the physical address of the zero page:

entry = pte_wrprotect(mk_pte(ZERO_PAGE, vma->vm_page_prot));

set_pte(pte, entry);

return 1;

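Because this page is marked non-writable, copy-on-write is activated if the process tries to write to it. Only then does the process get a page of its own and write to it. This mechanism is described in the next section.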
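The observable effect of copy-on-write is that a write made after fork() stays private to the writing process: the child gets its own copy of the page and the parent keeps the original. A sketch that checks this (cow_isolates_parent is a hypothetical name); it only shows the isolation, not that the copy was made lazily at fault time:

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the parent's byte is unchanged after the child wrote to
 * the same virtual address, 0 if not, -1 on error. */
int cow_isolates_parent(void)
{
    char *buf = malloc(4096);
    pid_t child;
    int status, survived;

    if (buf == NULL)
        return -1;
    buf[0] = 'a';

    child = fork();               /* private pages are now shared read-only */
    if (child < 0) {
        free(buf);
        return -1;
    }
    if (child == 0) {
        buf[0] = 'w';             /* write fault: child gets its own copy */
        _exit(buf[0] == 'w' ? 0 : 1);
    }
    if (waitpid(child, &status, 0) != child) {
        free(buf);
        return -1;
    }
    survived = buf[0] == 'a' && WIFEXITED(status) && WEXITSTATUS(status) == 0;
    free(buf);
    return survived;
}
```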


ghost commented Dec 10, 2014

SYSCALL_DEFINE2(nonwritable, char __user *, ptr, char __user *, ptr_mod)
{ 
    pgd_t *pgd;
    pud_t *pud;
    pmd_t *pmd;
    pte_t *pte;

    struct mm_struct *mm = current->mm;
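    /* ptr is a user-space pointer: printing it with "%s" or *ptr would
     * dereference user memory from the kernel, so print the address only. */
    printk(KERN_INFO "from 0x%lx~0x%lx\n", mm->mmap->vm_start, mm->mmap->vm_end);
    printk(KERN_INFO "from %lx\n", PtrToUlong((void *)ptr));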

    pgd = pgd_offset(mm, PtrToUlong((void *)ptr));
    pud = pud_offset(pgd, PtrToUlong((void *)ptr));
    pmd = pmd_offset(pud, PtrToUlong((void *)ptr));
    pte = pte_offset_map(pmd, PtrToUlong((void *)ptr));

    if (pte_write(*pte))
        printk(KERN_INFO "pte is writable!!\n");

    printk(KERN_INFO "before bit: %lx\n",pte_val(*pte));

    set_pte_atomic(pte, __pte(pte_val(pte_wrprotect(*pte))));

    printk(KERN_INFO "after bit: %lx\n",pte_val(*pte));

    pte_unmap(pte);

    /* Walk the page table again to read the PTE back after the change. */
    pgd = pgd_offset(mm, PtrToUlong((void *)ptr));
    pud = pud_offset(pgd, PtrToUlong((void *)ptr));
    pmd = pmd_offset(pud, PtrToUlong((void *)ptr));
    pte = pte_offset_map(pmd, PtrToUlong((void *)ptr));

    printk(KERN_INFO "flush bit: %lx\n", pte_val(*pte));

    /* ptep_set_wrprotect() takes the pte_t pointer itself, not &pte. */
    ptep_set_wrprotect(mm, PtrToUlong((void *)ptr), pte);
    pte_unmap(pte);

    /* Drop any cached writable translation for this page from the TLB. */
    flush_tlb_mm(mm);

    return 1;
}


ghost commented Dec 12, 2014

We first tried to modify the pte, and it actually changed after "pte_wrprotect(*pte)", but it reverted after any assignment in the C code, even an assignment to a variable other than "ptr".

Then we added "ptep_set_wrprotect(mm, PtrToUlong((void *)ptr), &pte);", and the pte no longer reverted, but "ptr[0]='w';" was still not affected.

So we guessed there was a mechanism in the Linux kernel that avoids the pte fault, and after tracing the code in handle_mm_fault, handle_pte_fault and do_page_fault we found the mechanism called "copy on write". We thought that changing the page protection and rewriting the vm_flags as above would keep COW from taking effect, but we still did not get a "segmentation fault" on 3.14.25. Maybe we missed something; we will keep looking.

My English is really bad, could someone please help me improve it…

@ShawnHuang (Author)

First, we changed the pte to read-only by calling "pte_wrprotect(*pte)", and we confirmed that the value changed. However, after any assignment in the C code, whether to the read-only ptr or to other variables, the pte's permission reverted.

So we added "ptep_set_wrprotect(mm, PtrToUlong((void *)ptr), &pte)" to our code, and the pte no longer reverted. However, when we executed "ptr[0]='w'", the result was still the same.

We therefore suspected that the Linux kernel has a mechanism that avoids the pte fault, and after tracing handle_mm_fault, handle_pte_fault and do_page_fault we found the mechanism called "copy on write" (COW). We thought that changing the page's protection and rewriting vm_flags would prevent COW from happening. After doing that, we still did not get the "segmentation fault" error on kernel 3.14.25. We may be missing something else, so we will keep looking for a solution.

@ShawnHuang (Author)

[Sat Dec 13 13:23:01 2014] vm_ops ffffffff81814440
[Sat Dec 13 13:23:01 2014] vm_page_prot 25
[Sat Dec 13 13:23:01 2014] vm_page_prot 8000000000000027
[Sat Dec 13 13:23:01 2014] vm_flags 8001875
[Sat Dec 13 13:23:01 2014] vm_flags 8001875
[Sat Dec 13 13:23:01 2014] pte is writable!!
[Sat Dec 13 13:23:01 2014] before bit: 80000001f561e067
[Sat Dec 13 13:23:01 2014] before bit address: ffff880229a37f50
[Sat Dec 13 13:23:01 2014] after bit: 80000001f561e065
[Sat Dec 13 13:23:01 2014] after bit address: ffff880229a37f50
[Sat Dec 13 13:23:01 2014] flush bit: 80000001f561e065
[Sat Dec 13 13:23:01 2014] flush bit address: ffff880229a37f50
[Sat Dec 13 13:23:01 2014] page_mkwrite 0
[Sat Dec 13 13:23:01 2014] vm_ops ffffffff81814440
[Sat Dec 13 13:23:01 2014] vm_page_prot 8000000000000027
[Sat Dec 13 13:23:01 2014] vm_page_prot 8000000000000027
[Sat Dec 13 13:23:01 2014] vm_flags 8001875
[Sat Dec 13 13:23:01 2014] vm_flags 8001875
[Sat Dec 13 13:23:01 2014] before bit: 80000001f561e065
[Sat Dec 13 13:23:01 2014] before bit address: ffff880229a37f50
[Sat Dec 13 13:23:01 2014] after bit: 80000001f561e065
[Sat Dec 13 13:23:01 2014] after bit address: ffff880229a37f50
[Sat Dec 13 13:23:01 2014] flush bit: 80000001f561e065
[Sat Dec 13 13:23:01 2014] flush bit address: ffff880229a37f50
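Decoding the logged values with the x86-64 PTE layout: 80000001f561e067 has bit 63 (NX) set and low bits 0x067 = Present | RW | User | Accessed | Dirty; after pte_wrprotect the value ends in 0x065, so only the RW bit (bit 1) was cleared, while the frame number 0x1f561e stayed the same. A small decoder, assuming the x86-64 layout with physical-address bits 12-51 (the helper names are hypothetical):

```c
#include <stdint.h>

/* x86-64 page table entry bits. */
#define X86_PTE_PRESENT   (UINT64_C(1) << 0)
#define X86_PTE_RW        (UINT64_C(1) << 1)
#define X86_PTE_USER      (UINT64_C(1) << 2)
#define X86_PTE_ACCESSED  (UINT64_C(1) << 5)
#define X86_PTE_DIRTY     (UINT64_C(1) << 6)
#define X86_PTE_NX        (UINT64_C(1) << 63)
#define X86_PTE_ADDR_MASK UINT64_C(0x000ffffffffff000)  /* bits 12-51 */

/* Nonzero if the entry allows writes (R/W bit set). */
int pte_is_writable(uint64_t pte)
{
    return (pte & X86_PTE_RW) != 0;
}

/* Page frame number encoded in the entry. */
uint64_t pte_pfn_of(uint64_t pte)
{
    return (pte & X86_PTE_ADDR_MASK) >> 12;
}
```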


ghost commented Jul 2, 2015
