@ShawnHuang
Last active October 19, 2015 08:03
linux project

Project 1

Total Score: 110 Points

# I. Introduction

The purpose of this project is to let you configure and modify the kernel yourself. In this project you will learn:

- Some data structures that the Linux kernel uses to maintain the virtual addresses and physical addresses of a process.
- Some functions and macros related to virtual addresses, physical addresses, and processes.

In the following project description, you can use Google to find the meanings of words or phrases prefixed with the tag.

### II. Project Description

#### 1. Add a new system call to the Linux kernel (10 points)

New system call prototype: int sys_project(long pid);

: Add a new system call in Linux

The following tasks need to be done inside your new system call which is in the kernel address space.

#### 2. Find the process according to the pid parameter and print its image name (10 points)

For example, if you execute an executable file called “project.out”, the result that you print out should be “project.out”.

: struct task_struct

#### 3. Dump the virtual address space areas of the process

The output may be similar to the following figure.

- Print the vm_start and vm_end of all virtual address areas of the process (15 points).
- If a file is associated with a virtual address area, print the name of the file (15 points).

: struct mm_struct, struct vm_area_struct : you can use the /proc/$pid/maps to verify your result.
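To compare the kernel output with /proc/$pid/maps, it helps to pull the start address, end address, and file name out of each line. The following is a minimal user-space sketch, assuming the usual maps line layout (start-end, permissions, offset, device, inode, optional path). The helper name parse_maps_line is hypothetical and not part of the assignment.

```c
#include <stdio.h>

/* Hypothetical helper (not part of the assignment): parse one line of
 * /proc/<pid>/maps. Returns 1 on success, 0 if the line is malformed.
 * path receives the backing file name, or "" for anonymous areas. */
int parse_maps_line(const char *line, unsigned long *start,
                    unsigned long *end, char perms[5], char path[256])
{
    unsigned long offset, inode;
    unsigned int dev_major, dev_minor;
    int fields;

    path[0] = '\0';
    fields = sscanf(line, "%lx-%lx %4s %lx %x:%x %lu %255[^\n]",
                    start, end, perms, &offset, &dev_major, &dev_minor,
                    &inode, path);
    /* The first seven fields are always present; the path is optional. */
    return fields >= 7;
}
```

The start and end values parsed this way should match the vm_start and vm_end printed by the system call for the same area.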

#### 4. Dump the physical frame addresses that the process is using (15 points)

: struct page and related functions and macro.
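One possible way to cross-check the frames from user space is /proc/$pid/pagemap, which is not mentioned in the assignment and is offered here only as an assumption about how verification could be done. That file holds one 64-bit entry per virtual page: bit 63 says whether the page is present in RAM, and bits 0-54 hold the page frame number when it is. Newer kernels report the frame number as zero to unprivileged readers, so the check may need root. The sketch below only shows the arithmetic; the helper names are hypothetical.

```c
#include <stdint.h>

#define PAGEMAP_ENTRY_BYTES 8ULL
#define PAGEMAP_PRESENT     (1ULL << 63)        /* page is in RAM */
#define PAGEMAP_PFN_MASK    ((1ULL << 55) - 1)  /* bits 0-54 */

/* Byte offset of the entry for vaddr inside /proc/<pid>/pagemap. */
uint64_t pagemap_offset(uint64_t vaddr, uint64_t page_size)
{
    return (vaddr / page_size) * PAGEMAP_ENTRY_BYTES;
}

/* Returns 1 and stores the frame number if the page is present, else 0. */
int pagemap_entry_pfn(uint64_t entry, uint64_t *pfn)
{
    if (!(entry & PAGEMAP_PRESENT))
        return 0;
    *pfn = entry & PAGEMAP_PFN_MASK;
    return 1;
}
```

A checker would seek to pagemap_offset(vaddr, page_size), read eight bytes, and compare the decoded frame number with the one printed by the system call.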

#### 5. Optional Bonus Point (10 points)

Add a new system call int nonwritable(unsigned long begin, unsigned long end) which marks the virtual address range between begin and end as non-writable. You can use the program project_user_2.c in the appendix to verify your code.

: Change a page table entry from read-write to read-only.

: struct vm_area_struct, vm_flag, page fault

#include <stdio.h>
#include <stdlib.h>

int main()
{
    char *ptr = (char *)malloc(0x1000);
    unsigned int i;

    for(i = 0; i < 0x1000; i++) 
        ptr[i] = 'a';

    for( i = 0; i < 10; i++) 
        printf("%c ", ptr[i]);

    nonwritable(ptr, ptr+0x1000-1);

    /* verify read permission */
    for( i = 0; i < 10; i++) 
        printf("%c ", ptr[i]);

    /* write permission */
    ptr[0] = 'w';
    printf("if u see this message, you missed 15 points\n");
}
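For comparison, the behaviour the program above expects (the final write never completes) can be reproduced in user space with the standard POSIX mprotect call. This is not the assignment's system call, only a sketch of the expected outcome. It assumes a Linux system with anonymous mmap; the function name write_faults_after_readonly is hypothetical. mprotect needs a page-aligned address, which is why the sketch uses mmap instead of malloc.

```c
#define _DEFAULT_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_env;

/* SIGSEGV handler: jump back to the point saved by sigsetjmp(). */
static void on_segv(int sig)
{
    (void)sig;
    siglongjmp(fault_env, 1);
}

/* Returns 1 if writing to a read-only page raised SIGSEGV, 0 if the
 * write went through, and -1 if the setup failed. */
int write_faults_after_readonly(void)
{
    long page = sysconf(_SC_PAGESIZE);
    struct sigaction sa, old;
    volatile int faulted = 0;
    volatile char *p;

    if (page <= 0)
        return -1;
    p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    p[0] = 'a';                          /* still writable here */

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0 ||
        mprotect((void *)p, (size_t)page, PROT_READ) != 0) {
        munmap((void *)p, (size_t)page);
        return -1;
    }

    if (sigsetjmp(fault_env, 1) == 0)
        p[0] = 'w';                      /* expected to fault */
    else
        faulted = 1;                     /* reached via the handler */

    sigaction(SIGSEGV, &old, NULL);
    munmap((void *)p, (size_t)page);
    return faulted;
}
```

Without the handler, the same write would terminate the process with a segmentation fault, which is the outcome the appendix program treats as success.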

### III. TA Q & A (20 points)

#### 1. Describe how you finished the project and the problems you encountered.

#### 2. Describe how you verified the result in step 4.

### IV. Report Content (15 points):

ghost commented Dec 4, 2014

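#include <linux/linkage.h>
#include <linux/kernel.h>
#include <linux/errno.h>
#include <linux/sched.h>
#include <linux/pid.h>
#include <linux/mm.h>
#include <linux/fs.h>

asmlinkage long sys_project(long pid) {
        struct task_struct *task;
        struct vm_area_struct *vma;

        /* NOTE: pid_task() should normally be called under rcu_read_lock(). */
        /* It returns NULL when no process has this pid. */
        task = pid_task(find_vpid(pid), PIDTYPE_PID);
        if (task == NULL)
                return -ESRCH;
        printk(KERN_INFO "process name: %s\n",task->comm);

        /* Kernel threads have no user address space, so mm is NULL for them. */
        if (task->mm == NULL || task->mm->mmap == NULL)
                return -EINVAL;

        /* Only the first virtual memory area is printed in this version. */
        vma = task->mm->mmap;
        printk(KERN_INFO "vm_start: %08lx\n",vma->vm_start);
        printk(KERN_INFO "vm_end: %08lx\n",vma->vm_end);
        /* Anonymous areas have no backing file. f_path.dentry is what the older f_dentry macro expanded to. */
        if (vma->vm_file != NULL)
                printk(KERN_INFO "file name: %s\n",vma->vm_file->f_path.dentry->d_name.name);
        return 1;
}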

ghost commented Dec 4, 2014

Compile script (bash):

#!/bin/sh

mkdir kernel
cd kernel

wget http://www.kernel.org/pub/linux/kernel/v3.0/linux-3.2.17.tar.bz2

tar xvf linux-3.2.17.tar.bz2

cd linux-3.2.17

cp -vi /boot/config-`uname -r` .config

# prepare toolchain
sudo apt-get install git-core libncurses5 libncurses5-dev libelf-dev asciidoc binutils-dev linux-source qt3-dev-tools libqt3-mt-dev fakeroot build-essential crash kexec-tools makedumpfile kernel-wedge kernel-package

# config
#make menuconfig

# compile
make -j8 KDEB_PKGVERSION=1.samou deb-pkg

# install kernel
sudo dpkg -i ../linux*.deb

# reboot
sudo reboot


# undo
#sudo apt-get purge linux-image-3.2.17

ghost commented Dec 4, 2014

Dump the virtual addresses and their physical frame numbers in the Linux kernel:

https://gist.github.com/mike820324/6f92b55bc5666cb266c9

ghost commented Dec 4, 2014

#include <linux/linkage.h>
#include <linux/kernel.h>
#include <linux/errno.h>
#include <linux/sched.h>
#include <linux/pid.h>
#include <linux/mm.h>
#include <linux/fs.h>
#include <linux/err.h>

asmlinkage long sys_project(long pid) {
        struct vm_area_struct *vm;  /* cursor used to walk every VMA of the process */
        struct page *page;
        unsigned long vm_address;   /* the virtual address */
        unsigned long pfn;
        struct task_struct *task;   /* the process descriptor */

        task = pid_task(find_vpid(pid), PIDTYPE_PID);
        if (task == NULL)
                return -ESRCH;
        printk(KERN_INFO "process name: %s\n",task->comm);

        /* kernel threads have no user address space */
        if (task->mm == NULL || task->mm->mmap == NULL)
                return -EINVAL;

        /* hold mmap_sem for reading while the VMA list is walked */
        down_read(&task->mm->mmap_sem);

        printk(KERN_INFO "vm_start: %08lx\n",task->mm->mmap->vm_start);
        printk(KERN_INFO "vm_end: %08lx\n",task->mm->mmap->vm_end);
        /* anonymous areas have no backing file */
        if (task->mm->mmap->vm_file != NULL)
                printk(KERN_INFO "file name: %s\n",task->mm->mmap->vm_file->f_path.dentry->d_name.name);

        // physical frame address
        for ( vm = task->mm->mmap; vm != NULL; vm = vm->vm_next){

            printk(KERN_INFO "from 0x%08lx~0x%08lx\n",vm->vm_start,vm->vm_end);
            printk(KERN_INFO "page frame of the interval : \n");
            for (vm_address = vm->vm_start;
                vm_address < vm->vm_end;
                vm_address += PAGE_SIZE)
            {
                /* NULL or an error pointer means no normal page is mapped at this address */
                page = follow_page(vm, vm_address, 0);
                if (IS_ERR_OR_NULL(page)) continue;
                pfn = page_to_pfn(page);
                /* pfn is an unsigned long, so it needs %lx */
                printk(KERN_CONT "0x%lx ",pfn);
            }
            printk(KERN_CONT "\n");
        }

        up_read(&task->mm->mmap_sem);
        return 1;
}

ghost commented Dec 4, 2014

#include <stdio.h>
#include <stdlib.h>

int main()
{
    char *ptr = (char *)malloc(0x1000);
    unsigned int i;

    for(i = 0; i < 0x1000; i++) 
        ptr[i] = 'a';

    for( i = 0; i < 10; i++) 
        printf("%c ", ptr[i]);

    nonwritable(ptr, ptr+0x1000-1);

    /* verify read permission */
    for( i = 0; i < 10; i++) 
        printf("%c ", ptr[i]);

    /* write permission */
    ptr[0] = 'w';
    printf("if u see this message, you missed 15 points\n");
}

ghost commented Dec 4, 2014

https://www.kernel.org/doc/gorman/html/understand/understand006.html

This set of functions and macros deal with the mapping of addresses and pages to PTEs and the setting of the individual entries.

The macro mk_pte() takes a struct page and protection bits and combines them together to form the pte_t that needs to be inserted into the page table. A similar macro mk_pte_phys() exists which takes a physical page address as a parameter.

The macro pte_page() returns the struct page which corresponds to the PTE entry. pmd_page() returns the struct page containing the set of PTEs.

The macro set_pte() takes a pte_t such as that returned by mk_pte() and places it within the processes page tables. pte_clear() is the reverse operation. An additional function is provided called ptep_get_and_clear() which clears an entry from the process page table and returns the pte_t. This is important when some modification needs to be made to either the PTE protection or the struct page itself.
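As an illustration of what combining a page with protection bits means, here is a small user-space model. It assumes the x86-64 layout with 4 KiB pages, where bits 12-51 of an entry hold the frame number. The real mk_pte() takes a struct page; this model takes a frame number directly, and all names prefixed with model_ are hypothetical.

```c
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12
/* Bits 12-51 of an x86-64 PTE hold the frame number. */
#define MODEL_PFN_MASK 0x000ffffffffff000ULL

typedef uint64_t model_pte_t;

/* Model of mk_pte(): combine a frame number with protection bits. */
model_pte_t model_mk_pte(uint64_t pfn, uint64_t prot)
{
    return ((pfn << MODEL_PAGE_SHIFT) & MODEL_PFN_MASK) | prot;
}

/* Model of pte_pfn(): recover the frame number from an entry. */
uint64_t model_pte_pfn(model_pte_t pte)
{
    return (pte & MODEL_PFN_MASK) >> MODEL_PAGE_SHIFT;
}
```

For example, frame 0x1f561e combined with protection bits 0x8000000000000067 gives 0x80000001f561e067, the same shape as the PTE values logged later in this thread.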

@ShawnHuang
Author

Fresh results:

cat /proc/2708/maps
00400000-00630000 r-xp 00000000 08:11 2245261                            /usr/bin/vim.gnome
[  196.823759] process name: vim
[  196.823768] vm_start: 00400000
[  196.823771] vm_end: 00630000
[  196.823775] file name: vim.gnome
[  196.823778] from 0x00400000~0x00630000
[  196.823782] page frame of the interval : 
[  196.823787] 0x20580a 0x1f3bf2 0x1f73bf 0x1f3bee 0x1f5135 0x1f5136 0x1f5142 0x1f5143 0x1f5144 0x1f5145 0x1f5146 0x1f5147 0x1f6fa5 0x1f5a68 0x1f6f8d 0x1f6fd6 0x1f7367 0x1f5055 0x1f73e2 0x1f7000 0x1f504f 0x1f7336 0x1f731e 0x1f6f53 0x1f507e 0x1f7e1b 0x1f5cce 0x1f680d 0x1f40ca 0x1f636a 0x1f686f 0x1f3bef 0x1f5057 0x1f5ae2 0x1f6830 0x1f3bf3 0x1f6fb7 0x1f684b 0x1f6f8e 0x1f3bf0 0x218fb7 0x1f5f86 0x1f5123 0x1f5124 0x1f5125 0x1f5126 0x1f5127 0x1f5128 0x1f5129 0x1f512a 0x1f512b 0x1f512c 0x1f512d 0x1f512e 0x1f512f 0x1f5130 0x1f5131 0x1f5132 0x1f5148 0x1f5149 0x1f514a 0x1f514b 0x1f514c 0x1f514d 0x1f514e 0x1f27ac 0x1f27ad 0x1f27ae 0x1f27b0 0x1f27b2 0x1f27b4 0x1f27b5 0x1f27bf 0x1f27c0 0x1f27c1 0x1f27c2 0x1f27c3 0x1f27c4 0x1f27c5 0x1f27c6 0x1f27c7 0x1f27c8 0x1f27c9 0x1f27ca 0x1f27cb 0x1f0cf5 0x1f0cf6 0x1f0cf7 0x1f0cf8 0x1f0cfb 0x1f0cfc 0x1f0cfd 0x1f0cfe 0x1f0cff 0x1f0d00 0x1f0d01 0x1f0d02 0x1f0d03 0x1f0d04 0x1f0d05 0x1f0d06 0x1f0d07 0x1f0d08 0x1f0d09 0x1f0e4c 0x1f0e4a 0x1f0e48 0x1f0e47 0x1f0dd5 0x1f0dd6 0x1f0dd9 0x1f0dda 0x1f0ddb 0x1f0ddc 0x1f0ddd 0x1f0dde 0x1f0ddf 0x1f0de0 0x1f0de1 0x1f0de2 0x1f0de3 0x1f0de4 0x1f0de5 0x1f0de6 0x1f0de7 0x1f0de8 0x1f0de9 0x1f0dea 0x1f0deb 0x1f0d8a 0x1f0d8b 0x1f0d8c 0x1f0d8d 0x1f0d8e 0x1f0d8f 0x1f0d90 0x1f0d91 0x1f0d98 0x1f0d99 0x1f0d9a 0x1f0d9b 0x1f0d9c 0x1f0d9d 0x1f0da2 0x1f0d0a 0x1f0d0b 0x1f0d0c 0x1f0d0d 0x1f0d0e 0x1f0d0f 0x1f0d10 0x1f0d11 0x1f0d12 0x1f0d13 0x1f0d14 0x1f0d15 0x1f0d16 0x1f0d1a 0x1f0d20 0x1f0d21 0x1f0d23 0x1f0d24 0x1f0d25 0x1f0d26 0x1f0d27 0x1f0d4b 0x1f0d4c 0x1f0d4d 0x1f0d4e 0x1f0d4f 0x1f0d50 0x1f0d51 0x1f0c0c 0x1f0c0d 0x1f0c0e 0x1f0c10 0x1f0c11 0x1f0c17 0x1f0c18 0x1f0c19 0x1f0c1a 0x1f0c1b 0x1f0c1c 0x1f0c1f 0x1f0c20 0x1f0c21 0x1f0c22 0x1f0c23 0x1f0c24 0x1f0c25 0x1f0c26 0x1f0c27 0x1f0c28 0x1f0c2b 0x1f0dc2 0x1f0dc3 0x1f0dc4 0x1f0dc6 0x1f0dc7 0x1f0dcc 0x1f0dce 0x1f0dd1 0x1f0dd2 0x1f27ec 0x1f27ed 0x1f27ee 0x1f27ef 0x1f27f0 0x1f27f1 0x1f27f2 0x1f27f3 0x1f27f4 0x1f27f5 0x1f27f6 0x1f27f7 0x1f27f8 0x1f27f9 0x1f27fa 0x1f27fb 0x1f27fc 
0x1f27fd 0x1f27fe 0x1f27ff 0x1f0c01 0x1f0c05 0x1f0c07 0x1f0c08 0x1f0c09 0x1f0c0a 0x1f0c0b 0x1f0da6 0x1f0da8 0x1f0da9 0x1f0daa 0x1f0dab 0x1f0dac 0x1f0dad 0x1f0dae 0x1f0daf 0x1f0db5 0x1f0db6 0x1f0db7 0x1f0db8 0x1f0db9 0x1f0dba 0x1f0dbb 0x1f0dbc 0x1f0dbd 0x1f0dbe 0x1f0dbf 0x1f0dc0 0x1f0dc1 0x1f0e14 0x1f0e15 0x1f0e1c 0x1f0e20 0x1f0e01 0x1f0dfb 0x1f0dfa 0x1f0df9 0x1f0df8 0x1f0df7 0x1f0df6 0x1f0df5 0x1f0df4 0x1f0e12 0x1f0e13 0x1f0d79 0x1f0d7a 0x1f0d7b 0x1f0d7c 0x1f0d81 0x1f0d82 0x1f0d83 0x1f0d84 0x1f0d86 0x1f0d87 0x1f0d52 0x1f0d53 0x1f0d5a 0x1f0d5d 0x1f0d5e 0x1f0d5f 0x1f0d60 0x1f0d61 0x1f0d62 0x1f0d63 0x1f0d66 0x1f0d69 0x1f0d6a 0x1f0d6b 0x1f0d6c 0x1f0d72 0x1f0d75 0x1f08a2 0x1f08a3 0x1f08a4 0x1f27cc 0x1f27ce 0x1f27d3 0x1f27d4 0x1f27d5 0x1f27d6 0x1f27d7 0x1f27d8 0x1f27db 0x1f27dc 0x1f27dd 0x1f27de 0x1f27df 0x1f27e0 0x1f27e1 0x1f27e2 0x1f27e3 0x1f27e4 0x1f27e5 0x1f27e6 0x1f27e7 0x1f27e8 0x1f27e9 0x1f27ea 0x1f27eb 0x1f0d41 0x1f0d42 0x1f0d43 0x1f0d44 0x1f0d45 0x1f0d2a 0x1f0d2b 0x1f0d2c 0x1f0d2d 0x1f0d2e 0x1f0d2f 0x1f0d30 0x1f0d31 0x1f0d32 0x1f0d33 0x1f0d34 0x1f0d35 0x1f0d36 0x1f0d39 0x1f0d3a 0x1f0d3b 0x1f0d3c 0x1f0d3d 0x1f0d3e 0x1f0cdc 0x1f0cdd 0x1f0cde 0x1f0cdf 0x1f0ce0 0x1f0ce4 0x1f0ce5 
[  196.824265] from 0x0082f000~0x00830000
[  196.824268] page frame of the interval : 
[  196.824271] 0x1efd60 
[  196.824275] from 0x00830000~0x00848000
[  196.824278] page frame of the interval : 
[  196.824281] 0x1eff23 0x1efedf 0x1f5048 0x1efeb1 0x1efeb2 0x1efebe 0x1f7390 0x1f73ad 0x1ef962 0x1ef960 0x1ef963 0x1ef961 0x1efebf 0x1ef965 0x1ef966 0x1f0788 0x1ef967 0x1f686b 0x1efdd2 0x1efb34 0x1efd92 0x1efc3b 
...
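The numbers printed above are page frame numbers, not byte addresses. Assuming 4 KiB pages (the 0x1000 step used in the loop), a frame number converts to a physical address by shifting left by 12, and the size of an interval gives an upper bound on how many frames can be printed for it. The helper names below are hypothetical.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12   /* 4 KiB pages, matching the 0x1000 step */

/* Physical byte address of the first byte of a page frame. */
uint64_t pfn_to_phys(uint64_t pfn)
{
    return pfn << SKETCH_PAGE_SHIFT;
}

/* Number of pages covered by the interval [vm_start, vm_end). */
uint64_t vma_page_count(uint64_t vm_start, uint64_t vm_end)
{
    return (vm_end - vm_start) >> SKETCH_PAGE_SHIFT;
}
```

For the first interval, 0x00400000~0x00630000 covers 560 pages, and frame 0x20580a starts at physical address 0x20580a000. Pages that are not currently mapped are skipped by the loop, so fewer than 560 frames may appear.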

ghost commented Dec 6, 2014

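Regarding the bonus question, I have been thinking in a different direction. The teacher said that when they did it, the PTE was modified as expected but there was still no core dump. So I read through mm/memory.c and found an interesting function: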

(kernel 3.14.25)

/**
 * follow_page_mask - look up a page descriptor from a user-virtual address
 * @vma: vm_area_struct mapping @address
 * @address: virtual address to look up
 * @flags: flags modifying lookup behaviour
 * @page_mask: on output, *page_mask is set according to the size of the page
 *
 * @flags can have FOLL_ flags set, defined in <linux/mm.h>
 *
 * Returns the mapped (struct page *), %NULL if no mapping exists, or
 * an error pointer if there is a mapping to something not represented
 * by a page descriptor (see also vm_normal_page()).
 */
struct page *follow_page_mask(struct vm_area_struct *vma,
                  unsigned long address, unsigned int flags,
                  unsigned int *page_mask)

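This function specifically checks how user mode uses memory. If the access is out of bounds, it simply skips it and does not fetch the page the way we expect. So, going by what the teacher said, could the teacher's approach be related to this function in memory.c? If it is, then once our part is done we could modify this function and add an exception so that it does not skip our request, and maybe that would give us a chance?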

ghost commented Dec 6, 2014

Adding a syscall to kernel 3.14.25, step by step (tested on Ubuntu 14.04)

1. Modify the system call table

    arch/x86/syscalls/syscall_*.tbl

    a. Add to syscall_32.tbl

        353    i386    project         sys_project        compat_sys_project

    b. Add to syscall_64.tbl

        316    64        project          sys_project


2. Add the function declarations

    arch/x86/include/asm/syscalls.h

    a. After asmlinkage long sys_get_thread_area(struct user_desc __user *); add

        asmlinkage long sys_project(long);


    include/linux/compat.h

    (compat.h add #include <linux/syscalls.h>    --bug )

    b. Inside #ifdef CONFIG_COMPAT, after asmlinkage long compat_sys_fanotify_mark(int, unsigned int, __u32, __u32, int, const char __user *); add

        asmlinkage long compat_sys_project(long);


3. Add the implementation files

    mkdir arch/x86/kernel/project_01

    a. Edit arch/x86/kernel/Makefile so that it links in the Makefile under project_01, by adding

        obj-y        += project_01/


    b. Create arch/x86/kernel/project_01/Makefile

        obj-y                          := project.o
        obj-$(CONFIG_IA32_EMULATION)   += compat_project.o

    c. Create arch/x86/kernel/project_01/project.c

        #include <linux/printk.h>
        #include <linux/syscalls.h>

        #include <asm/syscalls.h>

        SYSCALL_DEFINE1(project, long, pid)
        {
            // fill in the body yourself from the code posted earlier; remember to include the headers
            return 1;
        }



    d. Create arch/x86/kernel/project_01/compat_project.c

        #include <linux/printk.h>
        #include <linux/compat.h>

        COMPAT_SYSCALL_DEFINE1(project, long, pid)
        {
            // fill in the body yourself from the code posted earlier; remember to include the headers
            return 1;
        }

4. Test

    #include <unistd.h>
    #include <sys/syscall.h>

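    int main(){
    #ifdef __x86_64__
        // x86-64: entry 316 from syscall_64.tbl
        syscall(316, getpid());
    #else
        // 32-bit x86: entry 353 from syscall_32.tbl
        syscall(353, getpid());
    #endif
        return 0;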
    }

ghost commented Dec 7, 2014

Modifying the PTE succeeded, as confirmed by checking dmesg:

[  528.469847] BUG: Bad page map in process a.out  pte:780001a48003 pmd:765ca067
[  528.469852] addr:0000000001a58000 vm_flags:08100073 anon_vma:ffff880235532f80 mapping:          (null) index:1a58

However, memory.c seems to have a function (a mechanism) that guards against this problem, something like "try & catch". So I think that if we want to make it core dump, we must find a way to bypass this checking mechanism.

(Sorry, I haven't installed the Zhuyin input method on this test system, so I have to use my poor English.)

ghost commented Dec 8, 2014

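My current thinking on why there is no core dump:

We replace the PTE while in kernel mode, and for this kind of illegal operation in kernel mode the Linux kernel does not let things crash (perhaps as a protection). Instead it calls die (Oops), reloads, and prints the current error message. If the same thing happened in user mode, it would core dump directly.

If we simply comment out the call to die (Oops), I wonder whether it would core dump. That is very dangerous, though: if every illegal memory access brought things down, I am not sure the system could still run...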

ghost commented Dec 8, 2014

#include <linux/init.h>  
#include <linux/module.h>  
#include <linux/kernel.h>  
#include <linux/slab.h>  
#include <linux/gfp.h>  
#include <asm/pgtable.h>  
#include <asm/page.h>  
#include <asm/pgalloc.h>
#include <linux/sched.h>  
#include <linux/mm.h>  
#include <linux/highmem.h>  


#include <linux/printk.h>
#include <linux/syscalls.h>

#include <asm/syscalls.h>

typedef unsigned long ULONG;
typedef unsigned long* ULONG_PTR;
#define PtrToUlong( p ) ((ULONG)(ULONG_PTR) (p) )

SYSCALL_DEFINE2(nonwritable, char *, ptr, char *, ptr_mod)
{ 
    pgd_t *pgd;
    pud_t *pud;
    pmd_t *pmd;
    pte_t *pte;

    pgd = pgd_offset(current->mm, PtrToUlong((void *)ptr));
    pud = pud_alloc(current->mm, pgd, PtrToUlong((void *)ptr));

    pmd = pmd_alloc(current->mm, pud, PtrToUlong((void *)ptr));

    pte = pte_alloc_map(current->mm, NULL, pmd, PtrToUlong((void *)ptr));

    *pte = mk_pte(virt_to_page(PtrToUlong((void *)ptr_mod)), __pgprot(_PAGE_PRESENT));
    *pte = pte_mkwrite(*pte);

    return 1;
}

@ShawnHuang
Author

pte = pte_alloc_map(mm, pmd, address);

ghost commented Dec 10, 2014

@ShawnHuang
Author

#include <linux/init.h>  
#include <linux/module.h>  
#include <linux/kernel.h>  
#include <linux/slab.h>  
#include <linux/gfp.h>  
#include <asm/pgtable.h>  
#include <asm/page.h>  
#include <asm/pgalloc.h>
#include <asm/tlbflush.h>
#include <linux/sched.h>  
#include <linux/mm.h>  
#include <linux/highmem.h>  


#include <linux/printk.h>
#include <linux/syscalls.h>

#include <asm/syscalls.h>

typedef unsigned long ULONG;
typedef unsigned long* ULONG_PTR;
#define PtrToUlong( p ) ((ULONG)(ULONG_PTR) (p) )

asmlinkage long nonwritable(char* ptr, char* ptr_mod)
{ 
    pgd_t *pgd;
    pud_t *pud;
    pmd_t *pmd;
    pte_t *pte;
    struct mm_struct *mm = current->mm;
    unsigned long addr = PtrToUlong((void *)ptr);

    /* ptr points into user space, so only its value is printed; it is not dereferenced here */
    printk("from 0x%lx~0x%lx\n",mm->mmap->vm_start,mm->mmap->vm_end);
    printk("from %lx\n",addr);

    /* walk the page tables down to the PTE that maps addr */
    pgd = pgd_offset(mm, addr);
    if (pgd_none(*pgd) || pgd_bad(*pgd)) return -EFAULT;
    pud = pud_offset(pgd, addr);
    if (pud_none(*pud) || pud_bad(*pud)) return -EFAULT;
    pmd = pmd_offset(pud, addr);
    if (pmd_none(*pmd) || pmd_bad(*pmd)) return -EFAULT;
    pte = pte_offset_map(pmd, addr);

    printk(KERN_INFO "before bit: %lx\n",pte_val(*pte));
    /* clear the write bit in the PTE */
    set_pte_atomic(pte, __pte(pte_val(pte_wrprotect(*pte))));
    printk(KERN_INFO "after bit: %lx\n",pte_val(*pte));

    /* drop stale cached translations, then print the entry again */
    flush_cache_mm(mm);
    flush_tlb_mm(mm);
    printk(KERN_INFO "flush bit: %lx\n",pte_val(*pte));

    pte_unmap(pte);
    return 1;
}

@ShawnHuang
Author

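If the fault was caused by a write access, the function checks whether this virtual memory region is writable. If it is not, it jumps to the bad_area code; if it is, it sets the local variable write to 1.
If the fault was caused by a read or execute access, the function checks whether the page is already present in physical memory. If it is, the fault occurred because the process tried to access a privileged page from user mode (the page's User/Supervisor flag is cleared), so the function jumps to the bad_area code (in practice this never happens, because the kernel never assigns privileged pages to user processes). If the page is not present in physical memory, the function also checks whether the region is readable or executable.
If the region's access permissions match the type of access that caused the fault, the handle_mm_fault( ) function is called: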
if (!handle_mm_fault(tsk, vma, address, write)) {
tsk->tss.cr2 = address;
tsk->tss.error_code = error_code;
tsk->tss.trap_no = 14;
force_sig(SIGBUS, tsk);
if (!(error_code & 4)) /* kernel mode */
goto no_context;
}
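The decision described above can be modelled in a few lines of user-space C. This is a simplified sketch of the quoted description, not the kernel's actual fault handler. The three flag values match VM_READ, VM_WRITE and VM_EXEC in <linux/mm.h>; the enum and function names are hypothetical.

```c
/* Values of VM_READ, VM_WRITE and VM_EXEC in <linux/mm.h>. */
#define MODEL_VM_READ  0x1UL
#define MODEL_VM_WRITE 0x2UL
#define MODEL_VM_EXEC  0x4UL

enum fault_outcome { FAULT_BAD_AREA, FAULT_HANDLE_MM };

/* Simplified model of the check: is_write is nonzero for a write access,
 * page_present is nonzero if the page is already in physical memory. */
enum fault_outcome classify_fault(unsigned long vm_flags, int is_write,
                                  int page_present)
{
    if (is_write)
        return (vm_flags & MODEL_VM_WRITE) ? FAULT_HANDLE_MM : FAULT_BAD_AREA;

    /* read or execute access to a page that is present: privileged page */
    if (page_present)
        return FAULT_BAD_AREA;

    return (vm_flags & (MODEL_VM_READ | MODEL_VM_EXEC))
               ? FAULT_HANDLE_MM : FAULT_BAD_AREA;
}
```

Under this model, a write fault on a region whose vm_flags still include the write flag goes to handle_mm_fault rather than bad_area. That would be consistent with the observation in this thread that clearing only the PTE write bit does not produce a segmentation fault.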

@ShawnHuang
Author

Therefore the page table entry is set to the physical address of the zero page:

entry = pte_wrprotect(mk_pte(ZERO_PAGE, vma->vm_page_prot));

set_pte(pte, entry);

return 1;

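Because this page is marked as non-writable, the copy-on-write mechanism is activated if the process attempts to write to it. Only at that moment does the process get a page of its own and write to it. This mechanism is described in the next section.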

ghost commented Dec 10, 2014

SYSCALL_DEFINE2(nonwritable, char *, ptr, char *, ptr_mod)
{ 
    pgd_t *pgd;
    pud_t *pud;
    pmd_t *pmd;
    pte_t *pte;
    struct mm_struct *mm = current->mm;
    unsigned long addr = PtrToUlong((void *)ptr);

    /* ptr points into user space, so only its value is printed; it is not dereferenced here */
    printk("from 0x%lx~0x%lx\n",mm->mmap->vm_start,mm->mmap->vm_end);
    printk("from %lx\n",addr);

    /* walk the page tables down to the PTE that maps addr */
    pgd = pgd_offset(mm, addr);
    pud = pud_offset(pgd, addr);
    pmd = pmd_offset(pud, addr);
    pte = pte_offset_map(pmd, addr);

    if(pte_write(*pte))
        printk(KERN_INFO "pte still writable!!\n");

    printk(KERN_INFO "before bit: %lx\n",pte_val(*pte));

    set_pte_atomic(pte, __pte(pte_val(pte_wrprotect(*pte))));

    printk(KERN_INFO "after bit: %lx\n",pte_val(*pte));

    //flush_cache_mm(mm);
    //flush_tlb_mm(mm);

    printk(KERN_INFO "flush bit: %lx\n",pte_val(*pte));

    /* ptep_set_wrprotect() expects the pte_t * itself; the originally posted &pte is a pte_t ** */
    ptep_set_wrprotect(mm, addr, pte);

    pte_unmap(pte);
    return 1;
}

ghost commented Dec 12, 2014

We first tried to modify the PTE. It did change after "pte_wrprotect(*pte)", but it reverted after any assignment in the C code, even an assignment to something other than "ptr".

Then we added "ptep_set_wrprotect(mm, PtrToUlong((void *)ptr), &pte);". The PTE no longer reverted, but "ptr[0]='w';" was still not affected.

So we guess there is a mechanism in the Linux kernel that handles this kind of PTE fault. After tracing the functions handle_mm_fault, handle_pte_fault, and do_page_fault, we found an approach called "copy on write". We thought that changing the page protection and rewriting the vm_flags as above would keep COW from taking effect, but we still do not get a "segmentation fault" on 3.14.25. Maybe we are missing something; we will keep looking.

My English is really bad, please help me improve it...

@ShawnHuang
Author

First, we set the PTE to read-only by calling "pte_wrprotect(*pte)", and we confirmed that the value changed. However, as soon as the C code assigns a value, whether to the ptr that was made read-only or to any other variable, the PTE's permission reverts.

So we added "ptep_set_wrprotect(mm, PtrToUlong((void *)ptr), &pte)" to our code. The PTE no longer reverts. However, when we execute "ptr[0]='w'", the result is still the same.

We therefore infer that the Linux kernel has a mechanism that handles such PTE faults, and after tracing the functions "handle_mm_fault, handle_pte_fault, do_page_fault" we found one named "copy on write". We thought that changing the page's protection and rewriting the vm_flags would prevent COW from happening. After doing that, we still do not get the "segmentation fault" error on kernel 3.14.25. Maybe we are missing something else, so we will keep looking for the solution.

@ShawnHuang
Author

[Sat Dec 13 13:23:01 2014] vm_ops ffffffff81814440
[Sat Dec 13 13:23:01 2014] vm_page_prot 25
[Sat Dec 13 13:23:01 2014] vm_page_prot 8000000000000027
[Sat Dec 13 13:23:01 2014] vm_flags 8001875
[Sat Dec 13 13:23:01 2014] vm_flags 8001875
[Sat Dec 13 13:23:01 2014] pte is writable!!
[Sat Dec 13 13:23:01 2014] before bit: 80000001f561e067
[Sat Dec 13 13:23:01 2014] before bit address: ffff880229a37f50
[Sat Dec 13 13:23:01 2014] after bit: 80000001f561e065
[Sat Dec 13 13:23:01 2014] after bit address: ffff880229a37f50
[Sat Dec 13 13:23:01 2014] flush bit: 80000001f561e065
[Sat Dec 13 13:23:01 2014] flush bit address: ffff880229a37f50
[Sat Dec 13 13:23:01 2014] page_mkwrite 0
[Sat Dec 13 13:23:01 2014] vm_ops ffffffff81814440
[Sat Dec 13 13:23:01 2014] vm_page_prot 8000000000000027
[Sat Dec 13 13:23:01 2014] vm_page_prot 8000000000000027
[Sat Dec 13 13:23:01 2014] vm_flags 8001875
[Sat Dec 13 13:23:01 2014] vm_flags 8001875
[Sat Dec 13 13:23:01 2014] before bit: 80000001f561e065
[Sat Dec 13 13:23:01 2014] before bit address: ffff880229a37f50
[Sat Dec 13 13:23:01 2014] after bit: 80000001f561e065
[Sat Dec 13 13:23:01 2014] after bit address: ffff880229a37f50
[Sat Dec 13 13:23:01 2014] flush bit: 80000001f561e065
[Sat Dec 13 13:23:01 2014] flush bit address: ffff880229a37f50
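The "before bit" and "after bit" values in this log can be read with a small user-space decoder. The sketch below assumes the standard x86-64 page table entry flag positions (present, read/write, user, accessed, dirty, no-execute). The macro and function names are hypothetical models, not the kernel's own definitions.

```c
#include <stdint.h>

#define X86_PTE_PRESENT  (1ULL << 0)
#define X86_PTE_RW       (1ULL << 1)
#define X86_PTE_USER     (1ULL << 2)
#define X86_PTE_ACCESSED (1ULL << 5)
#define X86_PTE_DIRTY    (1ULL << 6)
#define X86_PTE_NX       (1ULL << 63)

/* Model of pte_wrprotect(): clear only the read/write bit. */
uint64_t model_wrprotect(uint64_t pte)
{
    return pte & ~X86_PTE_RW;
}

/* Model of pte_write(): nonzero if the entry allows writes. */
int model_pte_write(uint64_t pte)
{
    return (pte & X86_PTE_RW) != 0;
}
```

Applied to the logged values, 80000001f561e067 and 80000001f561e065 differ only in bit 1, the read/write bit, so the write-protect step itself did take effect on the entry.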

ghost commented Jul 2, 2015

ghost commented Jul 2, 2015
