Linux Memory Management

Important Data Structures -

Page Frames - Linux keeps track of every page frame in the system. On i386 each page frame is 4 KB. Each page frame is described by a descriptor of type 'struct page', and the descriptors are kept in the mem_map array. The format of a descriptor is

    typedef struct page {
        struct list_head list;          /* ->mapping has some page lists. */
        struct address_space *mapping;  /* The inode (or ...) we belong to. */
        unsigned long index;            /* Our offset within mapping. */
        struct page *next_hash;         /* Next page sharing our hash bucket in
                                           the pagecache hash table. */
        atomic_t count;                 /* Usage count, see below. */
        unsigned long flags;            /* atomic flags, some possibly
                                           updated asynchronously */
        struct list_head lru;           /* Pageout list, eg. active_list;
                                           protected by pagemap_lru_lock !! */
        struct page **pprev_hash;       /* Complement to *next_hash. */
        struct buffer_head * buffers;   /* Buffer maps us to a disk block. */
        . . .
    } mem_map_t;

Process Address Space - All information about a process's address space is kept in a structure of type 'mm_struct', referenced by the 'mm' field of the process descriptor (struct task_struct).

    struct mm_struct {
        struct vm_area_struct * mmap;           /* list of VMAs */
        rb_root_t mm_rb;
        struct vm_area_struct * mmap_cache;     /* last find_vma result */
        pgd_t * pgd;
        atomic_t mm_users;                      /* How many users with user space? */
        atomic_t mm_count;                      /* How many references to "struct mm_struct" (users count as 1) */
        int map_count;                          /* number of VMAs */
        struct rw_semaphore mmap_sem;
        spinlock_t page_table_lock;             /* Protects task page tables and mm->rss */
        struct list_head mmlist;                /* List of all active mm's. These are globally strung
                                                 * together off init_mm.mmlist, and are protected
                                                 * by mmlist_lock
                                                 */
        unsigned long start_code, end_code, start_data, end_data;
        unsigned long start_brk, brk, start_stack;
        unsigned long arg_start, arg_end, env_start, env_end;
        unsigned long rss, total_vm, locked_vm;
        unsigned long def_flags;
        unsigned long cpu_vm_mask;
        unsigned long swap_address;
        . . .
    };

Important fields -

rss - the number of page frames currently allocated to the process.
total_vm - the size of the process address space, in pages.
mmap - points to the head of the list of memory regions, sorted by address.

Memory regions are intervals of linear addresses. Linux implements a memory region with 'struct vm_area_struct', which is as follows -

    struct vm_area_struct {
        struct mm_struct * vm_mm;       /* The address space we belong to. */
        unsigned long vm_start;         /* Our start address within vm_mm. */
        unsigned long vm_end;           /* The first byte after our end address
                                           within vm_mm. */

        /* linked list of VM areas per task, sorted by address */
        struct vm_area_struct *vm_next;

        pgprot_t vm_page_prot;          /* Access permissions of this VMA. */
        unsigned long vm_flags;         /* Flags, listed below. */

        rb_node_t vm_rb;

        /*
         * For areas with an address space and backing store,
         * one of the address_space->i_mmap{,shared} lists,
         * for shm areas, the list of attaches, otherwise unused.
         */
        struct vm_area_struct *vm_next_share;
        struct vm_area_struct **vm_pprev_share;

        /* Function pointers to deal with this struct. */
        struct vm_operations_struct * vm_ops;

        /* Information about our backing store: */
        unsigned long vm_pgoff;         /* Offset (within vm_file) in PAGE_SIZE
                                           units, *not* PAGE_CACHE_SIZE */
        struct file * vm_file;          /* File we map to (can be NULL). */
        unsigned long vm_raend;         /* XXX: put full readahead info here. */
        void * vm_private_data;         /* was vm_pte (shared mem) */
    };

Page Fault Handler - The Linux page fault interrupt service routine is the do_page_fault() function. It is defined separately for each architecture; for i386 based systems it is defined in arch/i386/mm/fault.c.

    asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long error_code)

After performing error checking, do_page_fault() looks up the memory region that contains the faulting address. If the address belongs to the process address space and the access rights of the memory region match the type of access that caused the exception, it calls the 'handle_mm_fault()' function.
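The region lookup and access check described above can be modelled in user space. The following is only a minimal sketch, not the kernel code: 'struct vm_area', 'struct mm', 'find_region()', 'check_fault()' and the flag values are simplified, hypothetical stand-ins that keep just the fields needed for the check, and the sketch ignores special cases such as stack growth.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative permission bits for this sketch only. */
#define VM_READ  0x1UL
#define VM_WRITE 0x2UL

/* Cut-down stand-in for struct vm_area_struct (assumption, not kernel code). */
struct vm_area {
    unsigned long vm_start;   /* first address inside the region */
    unsigned long vm_end;     /* first address after the region */
    unsigned long vm_flags;   /* VM_READ / VM_WRITE permissions */
    struct vm_area *vm_next;  /* next region, sorted by address */
};

/* Cut-down stand-in for struct mm_struct. */
struct mm {
    struct vm_area *mmap;     /* head of the sorted region list */
};

enum fault_result { FAULT_OK, FAULT_BAD_ADDRESS, FAULT_BAD_ACCESS };

/* Return the first region whose vm_end lies above addr, or NULL.
 * The region may start above addr, so callers must also test vm_start. */
static struct vm_area *find_region(const struct mm *mm, unsigned long addr)
{
    struct vm_area *vma;

    assert(mm != NULL);
    for (vma = mm->mmap; vma != NULL; vma = vma->vm_next)
        if (addr < vma->vm_end)
            return vma;
    return NULL;
}

/* Decide whether a fault at addr may be passed on to the next stage. */
static enum fault_result check_fault(const struct mm *mm, unsigned long addr,
                                     int write_access)
{
    struct vm_area *vma = find_region(mm, addr);

    /* No region at all, or addr falls in a hole before the region. */
    if (vma == NULL || addr < vma->vm_start)
        return FAULT_BAD_ADDRESS;

    /* The region must allow the kind of access that faulted. */
    if (write_access ? !(vma->vm_flags & VM_WRITE)
                     : !(vma->vm_flags & VM_READ))
        return FAULT_BAD_ACCESS;

    return FAULT_OK;
}
```

In this model, a write to an address inside a read-only region yields FAULT_BAD_ACCESS, and an address in the gap between two regions yields FAULT_BAD_ADDRESS. Only FAULT_OK corresponds to the case where the handler goes on to call handle_mm_fault().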
The handle_mm_fault() function is defined in mm/memory.c.

    int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct * vma, unsigned long address, int write_access)

This function checks whether the Page Middle Directory and the Page Table exist for the address, allocates them if they do not, and then calls the handle_pte_fault() function.

Since Linux uses demand paging, the particular page may not be in memory. The handle_pte_fault() function determines whether the page was never accessed or whether it was swapped out to disk, and calls either

1) do_swap_page() - if the page was accessed before and was temporarily saved to disk, or
2) do_no_page() - if the page was never accessed.

The do_no_page() function inspects the vm_ops field of the vm_area_struct, which contains pointers to various virtual memory related functions. One such operation is nopage(), which do_no_page() calls. If vm_ops is not defined or nopage is not specified, it calls do_anonymous_page() instead, which is the usual path for dynamic memory allocations.

The generic Linux nopage operation is used for memory mapped executable images, and it uses the page cache to bring the required image page into physical memory. It is implemented by the filemap_nopage() function, defined in mm/filemap.c. Using the vm_file file pointer of the memory region, filemap_nopage() searches the page cache for the page. If the page is there, the process's page table entries are updated to point to it. If the page is not in the cache, a new page frame is allocated, the contents are read from disk into it (with some read-ahead to boost performance), and the process's page tables are updated.

References -

1) Understanding the Linux Kernel, Daniel P. Bovet & Marco Cesati, O'Reilly
2) The Linux Kernel, David Rusling, http://www.tldp.org/LDP/tlk/tlk.html