diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 30a8ad0d..06d87ee 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3428,6 +3428,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
HIGHMEM regardless of setting
of CONFIG_HIGHPTE.
+ uuid_debug= (Boolean) whether to enable debugging of TuxOnIce's
+ uuid support.
+
vdso= [X86,SH]
On X86_32, this is an alias for vdso32=. Otherwise:
diff --git a/Documentation/power/tuxonice-internals.txt b/Documentation/power/tuxonice-internals.txt
new file mode 100644
index 0000000..7a96186
--- /dev/null
+++ b/Documentation/power/tuxonice-internals.txt
@@ -0,0 +1,477 @@
+ TuxOnIce 3.0 Internal Documentation.
+ Updated to 26 March 2009
+
+1. Introduction.
+
+ TuxOnIce 3.0 is an addition to the Linux Kernel, designed to
+ allow the user to quickly shutdown and quickly boot a computer, without
+ needing to close documents or programs. It is equivalent to the
+ hibernate facility in some laptops. This implementation, however,
+ requires no special BIOS or hardware support.
+
+ The code in these files is based upon the original implementation
+ prepared by Gabor Kuti and additional work by Pavel Machek and a
+ host of others. This code has been substantially reworked by Nigel
+ Cunningham, again with the help and testing of many others, not the
+ least of whom is Michael Frank. At its heart, however, the operation is
+ essentially the same as Gabor's version.
+
+2. Overview of operation.
+
+ The basic sequence of operations is as follows:
+
+ a. Quiesce all other activity.
+ b. Ensure enough memory and storage space are available, and attempt
+ to free memory/storage if necessary.
+ c. Allocate the required memory and storage space.
+ d. Write the image.
+ e. Power down.
+
+ There are a number of complicating factors which mean that things are
+ not as simple as the above would imply, however...
+
+ o The activity of each process must be stopped at a point where it will
+ not be holding locks necessary for saving the image, or unexpectedly
+ restart operations due to something like a timeout and thereby make
+ our image inconsistent.
+
+ o It is desirable that we sync outstanding I/O to disk before calculating
+ image statistics. This reduces corruption if one should suspend but
+ then not resume, and also makes later parts of the operation safer (see
+ below).
+
+ o We need to get as close as we can to an atomic copy of the data.
+ Inconsistencies in the image will result in inconsistent memory contents at
+ resume time, and thus in instability of the system and/or file system
+ corruption. This would appear to imply a maximum image size of one half of
+ the amount of RAM, but we have a solution... (again, below).
+
+ o In 2.6, we choose to play nicely with the other suspend-to-disk
+ implementations.
+
+3. Detailed description of internals.
+
+ a. Quiescing activity.
+
+ Safely quiescing the system is achieved using three separate but related
+ aspects.
+
+ First, we note that the vast majority of processes don't need to run during
+ suspend. They can be 'frozen'. We therefore implement a refrigerator
+ routine, which processes enter and in which they remain until the cycle is
+ complete. Processes enter the refrigerator via try_to_freeze() invocations
+ at appropriate places. A process cannot be frozen in any old place. It
+ must not be holding locks that will be needed for writing the image or
+ freezing other processes. For this reason, userspace processes generally
+ enter the refrigerator via the signal handling code, and kernel threads at
+ the place in their event loops where they drop locks and yield to other
+ processes or sleep.
+
+ The task of freezing processes is complicated by the fact that there can be
+ interdependencies between processes. Freezing process A before process B may
+ mean that process B cannot be frozen, because it stops waiting for
+ process A rather than stopping in the refrigerator. This issue is seen where
+ userspace waits on freezeable kernel threads or fuse filesystem threads. To
+ address this issue, we implement the following algorithm for quiescing
+ activity:
+
+ - Freeze filesystems (including fuse - userspace programs starting
+ new requests are immediately frozen; programs already running
+ requests complete their work before being frozen in the next
+ step)
+ - Freeze userspace
+ - Thaw filesystems (this is safe now that userspace is frozen and no
+ fuse requests are outstanding).
+ - Invoke sys_sync (noop on fuse).
+ - Freeze filesystems
+ - Freeze kernel threads
+
+ If we need to free memory, we thaw kernel threads and filesystems, but not
+ userspace. We can then free caches without worrying about deadlocks due to
+ swap files being on frozen filesystems or such like.
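+
+ A minimal sketch of the ordering above, written as a stand-alone C program
+ with placeholder helpers (the names below are illustrative, not the actual
+ kernel symbols):
+
+    /* Simplified userspace model of the quiescing order described above. */
+    #include <stdio.h>
+
+    static void freeze_filesystems(void)    { puts("freeze filesystems"); }
+    static void freeze_userspace(void)      { puts("freeze userspace"); }
+    static void thaw_filesystems(void)      { puts("thaw filesystems"); }
+    static void sync_filesystems(void)      { puts("sys_sync (noop on fuse)"); }
+    static void freeze_kernel_threads(void) { puts("freeze kernel threads"); }
+
+    static void quiesce(void)
+    {
+        freeze_filesystems();      /* new fuse requests block immediately */
+        freeze_userspace();        /* processes stop in the refrigerator  */
+        thaw_filesystems();        /* safe: no fuse requests outstanding  */
+        sync_filesystems();        /* flush dirty data before imaging     */
+        freeze_filesystems();
+        freeze_kernel_threads();
+    }
+
+    int main(void) { quiesce(); return 0; }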
+
+ b. Ensure enough memory & storage are available.
+
+ We have a number of constraints to meet in order to be able to successfully
+ suspend and resume.
+
+ First, the image will be written in two parts, described below. One of these
+ parts needs to have an atomic copy made, which of course implies a maximum
+ size of one half of the amount of system memory. The other part ('pageset')
+ is not atomically copied, and can therefore be as large or small as desired.
+
+ Second, we have constraints on the amount of storage available. In these
+ calculations, we may also consider any compression that will be done. The
+ cryptoapi module allows the user to configure an expected compression ratio.
+
+ Third, the user can specify an arbitrary limit on the image size, in
+ megabytes. This limit is treated as a soft limit, so that we don't fail the
+ attempt to suspend if we cannot meet this constraint.
+
+ c. Allocate the required memory and storage space.
+
+ Having done the initial freeze, we determine whether the above constraints
+ are met, and seek to allocate the metadata for the image. If the constraints
+ are not met, or we fail to allocate the required space for the metadata, we
+ seek to free the amount of memory that we calculate is needed and try again.
+ We allow up to four iterations of this loop before aborting the cycle. If we
+ do fail, it should only be because of a bug in TuxOnIce's calculations.
+
+ These steps are merged together in the prepare_image function, found in
+ prepare_image.c. The functions are merged because of the cyclical nature
+ of the problem of calculating how much memory and storage is needed. Since
+ the data structures containing the information about the image must
+ themselves take memory and use storage, the amount of memory and storage
+ required changes as we prepare the image. Since the changes are not large,
+ only one or two iterations will be required to achieve a solution.
+
+ The recursive nature of the algorithm is minimised by keeping user space
+ frozen while preparing the image, and by the fact that our records of which
+ pages are to be saved and which pageset they are saved in use bitmaps (so
+ that changes in number or fragmentation of the pages to be saved don't
+ feedback via changes in the amount of memory needed for metadata). The
+ recursiveness is thus limited to any extra slab pages allocated to store the
+ extents that record storage used, and the effects of seeking to free memory.
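+
+ A rough model of that bounded loop follows; the function names and structure
+ are illustrative only, not the actual prepare_image code:
+
+    /* Illustrative model of the prepare/free-memory loop described above. */
+    #include <stdbool.h>
+
+    #define MAX_PREPARE_ATTEMPTS 4
+
+    static bool constraints_met(void)   { return true; } /* storage, RAM, size limit */
+    static bool allocate_metadata(void) { return true; }
+    static void free_some_memory(void)  { /* thaw kthreads/fs, shrink caches */ }
+
+    static bool prepare_image(void)
+    {
+        for (int i = 0; i < MAX_PREPARE_ATTEMPTS; i++) {
+            if (constraints_met() && allocate_metadata())
+                return true;
+            free_some_memory();    /* userspace remains frozen throughout */
+        }
+        return false;              /* should only happen due to a calculation bug */
+    }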
+
+ d. Write the image.
+
+ We previously mentioned the need to create an atomic copy of the data, and
+ the half-of-memory limitation that is implied in this. This limitation is
+ circumvented by dividing the memory to be saved into two parts, called
+ pagesets.
+
+ Pageset2 contains most of the page cache - the pages on the active and
+ inactive LRU lists that aren't needed or modified while TuxOnIce is
+ running, so they can be safely written without an atomic copy. They are
+ therefore saved first and reloaded last. While saving these pages,
+ TuxOnIce carefully ensures that the work of writing the pages doesn't make
+ the image inconsistent. With the support for Kernel (Video) Mode Setting
+ going into the kernel at the time of writing, we need to check for pages
+ on the LRU that are used by KMS, and exclude them from pageset2. They are
+ atomically copied as part of pageset 1.
+
+ Once pageset2 has been saved, we prepare to do the atomic copy of remaining
+ memory. As part of the preparation, we power down drivers, thereby providing
+ them with the opportunity to have their state recorded in the image. The
+ amount of memory allocated by drivers for this is usually negligible, but if
+ DRI is in use, video drivers may require significant amounts. Ideally we
+ would be able to query drivers while preparing the image as to the amount of
+ memory they will need. Unfortunately no such mechanism exists at the time of
+ writing. For this reason, TuxOnIce allows the user to set an
+ 'extra_pages_allowance', which is used to seek to ensure sufficient memory
+ is available for drivers at this point. TuxOnIce also lets the user set this
+ value to 0. In this case, a test driver suspend is done while preparing the
+ image, and the difference (plus a margin) used instead. TuxOnIce will also
+ automatically restart the hibernation process (twice at most) if it finds
+ that the extra pages allowance is not sufficient. It will then use what was
+ actually needed (plus a margin, again). Failure to hibernate should thus
+ be an extremely rare occurrence.
+
+ Having suspended the drivers, we save the CPU context before making an
+ atomic copy of pageset1, resuming the drivers and saving the atomic copy.
+ After saving the two pagesets, we just need to save our metadata before
+ powering down.
+
+ As we mentioned earlier, the contents of pageset2 pages aren't needed once
+ they've been saved. We therefore use them as the destination of our atomic
+ copy. In the unlikely event that pageset1 is larger, extra pages are
+ allocated while the image is being prepared. This is normally only a real
+ possibility when the system has just been booted and the page cache is
+ small.
+
+ This is where we need to be careful about syncing, however. Pageset2 will
+ probably contain filesystem meta data. If this is overwritten with pageset1
+ and then a sync occurs, the filesystem will be corrupted - at least until
+ resume time and another sync of the restored data. Since there is a
+ possibility that the user might not resume or (may it never be!) that
+ TuxOnIce might oops, we do our utmost to avoid syncing filesystems after
+ copying pageset1.
+
+ e. Power down.
+
+ Powering down uses standard kernel routines. TuxOnIce supports powering down
+ using the ACPI S3, S4 and S5 methods or the kernel's non-ACPI power-off.
+ Supporting suspend to ram (S3) as a power off option might sound strange,
+ but it allows the user to quickly get their system up and running again if
+ the battery doesn't run out (we just need to re-read the overwritten pages)
+ and if the battery does run out (or the user removes power), they can still
+ resume.
+
+4. Data Structures.
+
+ TuxOnIce uses three main structures to store its metadata and configuration
+ information:
+
+ a) Pageflags bitmaps.
+
+ TuxOnIce records which pages will be in pageset1, pageset2, the destination
+ of the atomic copy and the source of the atomically restored image using
+ bitmaps. The code used is that written for swsusp, with small improvements
+ to match TuxOnIce's requirements.
+
+ The pageset1 bitmap is thus easily stored in the image header for use at
+ resume time.
+
+ As mentioned above, using bitmaps also means that the amount of memory and
+ storage required for recording the above information is constant. This
+ greatly simplifies the work of preparing the image. In earlier versions of
+ TuxOnIce, extents were used to record which pages would be stored. In that
+ case, however, eating memory could result in greater fragmentation of the
+ lists of pages, which in turn required more memory to store the extents and
+ more storage in the image header. These could in turn require further
+ freeing of memory, and another iteration. All of this complexity is removed
+ by having bitmaps.
+
+ Bitmaps also make a lot of sense because TuxOnIce only ever iterates
+ through the lists. There is therefore no cost to not being able to find the
+ nth page in constant time. We only need to worry about the cost of finding
+ the n+1th page, given the location of the nth page. Bitwise optimisations
+ help here.
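+
+ As a hedged illustration (the real code uses the kernel's bitmap helpers
+ such as find_next_bit rather than this open-coded walk), finding the page
+ that follows a given one is just a scan for the next set bit:
+
+    #include <limits.h>
+    #include <stddef.h>
+
+    /* Return the next set bit after 'pfn', or -1 if there are no more
+     * pages in this pageset. */
+    static long next_saved_page(const unsigned long *bitmap, size_t bits,
+                                size_t pfn)
+    {
+        for (size_t i = pfn + 1; i < bits; i++) {
+            size_t word = i / (sizeof(unsigned long) * CHAR_BIT);
+            size_t bit  = i % (sizeof(unsigned long) * CHAR_BIT);
+            if (bitmap[word] & (1UL << bit))
+                return (long)i;
+        }
+        return -1;
+    }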
+
+ b) Extents for block data.
+
+ TuxOnIce supports writing the image to multiple block devices. In the case
+ of swap, multiple partitions and/or files may be in use, and we happily use
+ them all (with the exception of compcache pages, which we allocate but do
+ not use). This use of multiple block devices is accomplished as follows:
+
+ Whatever the actual source of the allocated storage, the destination of the
+ image can be viewed in terms of one or more block devices, and on each
+ device, a list of sectors. To simplify matters, we only use contiguous,
+ PAGE_SIZE aligned sectors, like the swap code does.
+
+ Since sector numbers on each bdev may well not start at 0, it makes much
+ more sense to use extents here. Contiguous ranges of pages can thus be
+ represented in the extents by contiguous values.
+
+ Variations in block size are taken account of in transforming this data
+ into the parameters for bio submission.
+
+ We can thus implement a layer of abstraction wherein the core of TuxOnIce
+ doesn't have to worry about which device we're currently writing to or
+ where in the device we are. It simply requests that the next page in the
+ pageset or header be written, leaving the details to this lower layer.
+ The lower layer remembers where in the sequence of devices and blocks each
+ pageset starts. The header always starts at the beginning of the allocated
+ storage.
+
+ So extents are:
+
+ struct extent {
+       unsigned long minimum, maximum;
+       struct extent *next;
+ };
+
+ These are combined into chains of extents for a device:
+
+ struct extent_chain {
+       int size; /* size of the extent ie sum (max-min+1) */
+       int allocs, frees;
+       char *name;
+       struct extent *first, *last_touched;
+ };
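+
+ As a sketch only (not the actual TuxOnIce code), adding a block number to a
+ chain using the structures above might look like this, extending the last
+ extent whenever the new value is contiguous with it (malloc stands in for
+ the kernel allocator):
+
+    #include <stdlib.h>
+
+    /* Append 'value' to the chain, growing the last extent if contiguous. */
+    static int extent_chain_add(struct extent_chain *chain, unsigned long value)
+    {
+        struct extent *last = chain->last_touched;
+
+        if (last && value == last->maximum + 1) {
+            last->maximum = value;             /* contiguous: just extend */
+        } else {
+            struct extent *new = malloc(sizeof(*new));
+            if (!new)
+                return -1;
+            new->minimum = new->maximum = value;
+            new->next = NULL;
+            if (last)
+                last->next = new;
+            else
+                chain->first = new;
+            chain->last_touched = new;
+            chain->allocs++;
+        }
+        chain->size++;
+        return 0;
+    }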
+
+ For each bdev, we need to store a little more info:
+
+ struct suspend_bdev_info {
+       struct block_device *bdev;
+       dev_t dev_t;
+       int bmap_shift;
+       int blocks_per_page;
+ };
+
+ The dev_t is used to identify the device in the stored image. As a result,
+ we expect devices at resume time to have the same major and minor numbers
+ as they had while suspending. This is primarily a concern where the user
+ utilises LVM for storage, as they will need to dmsetup their partitions in
+ such a way as to maintain this consistency at resume time.
+
+ bmap_shift and blocks_per_page apply the effects of variations in blocks
+ per page settings for the filesystem and underlying bdev. For most
+ filesystems, these are the same, but for xfs, they can have independent
+ values.
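+
+ As a hedged guess at the arithmetic involved (this is not the actual code),
+ a filesystem block number might be turned into the starting sector of a
+ PAGE_SIZE write as follows, assuming bmap_shift converts filesystem blocks
+ into 512-byte sectors:
+
+    /* Hypothetical helper: starting sector for a page whose first
+     * filesystem block is 'fs_block' on this bdev. */
+    static unsigned long long first_sector_of_page(
+            const struct suspend_bdev_info *info, unsigned long fs_block)
+    {
+        return (unsigned long long)fs_block << info->bmap_shift;
+    }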
+
+ Combining these two structures together, we have everything we need to
+ record what devices and what blocks on each device are being used to
+ store the image, and to submit i/o using bio_submit.
+
+ The last elements in the picture are a means of recording how the storage
+ is being used.
+
+ We do this first and foremost by implementing a layer of abstraction on
+ top of the devices and extent chains which allows us to view however many
+ devices there might be as one long storage tape, with a single 'head' that
+ tracks a 'current position' on the tape:
+
+ struct extent_iterate_state {
+       struct extent_chain *chains;
+       int num_chains;
+       int current_chain;
+       struct extent *current_extent;
+       unsigned long current_offset;
+ };
+
+ That is, *chains points to an array of size num_chains of extent chains.
+ For the filewriter, this is always a single chain. For the swapwriter, the
+ array is of size MAX_SWAPFILES.
+
+ current_chain, current_extent and current_offset thus point to the current
+ index in the chains array (and into a matching array of struct
+ suspend_bdev_info), the current extent in that chain (to optimise access),
+ and the current value in the offset.
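+
+ A sketch of how the 'head' might step forward by one page, moving through
+ extents and chains as each is exhausted (simplified and illustrative; the
+ real iterator also deals with per-bdev differences):
+
+    /* Advance the tape head by one page. Returns 0 on success, -1 when all
+     * allocated storage has been consumed. */
+    static int extent_state_next(struct extent_iterate_state *state)
+    {
+        if (state->current_offset < state->current_extent->maximum) {
+            state->current_offset++;
+            return 0;
+        }
+        if (state->current_extent->next) {
+            state->current_extent = state->current_extent->next;
+            state->current_offset = state->current_extent->minimum;
+            return 0;
+        }
+        while (++state->current_chain < state->num_chains) {
+            struct extent *first = state->chains[state->current_chain].first;
+            if (first) {
+                state->current_extent = first;
+                state->current_offset = first->minimum;
+                return 0;
+            }
+        }
+        return -1;
+    }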
+
+ The image is divided into three parts:
+ - The header
+ - Pageset 1
+ - Pageset 2
+
+ The header always starts at the first device and first block. We know its
+ size before we begin to save the image because we carefully account for
+ everything that will be stored in it.
+
+ The second pageset (LRU) is stored first. It begins on the next page after
+ the end of the header.
+
+ The first pageset is stored second. Its start location is only known once
+ pageset2 has been saved, since pageset2 may be compressed as it is written.
+ This location is thus recorded at the end of saving pageset2. It is also
+ page aligned.
+
+ Since this information is needed at resume time, and the location of extents
+ in memory will differ at resume time, this needs to be stored in a portable
+ way:
+
+ struct extent_iterate_saved_state {
+       int chain_num;
+       int extent_num;
+       unsigned long offset;
+ };
+
+ We can thus implement a layer of abstraction wherein the core of TuxOnIce
+ doesn't have to worry about which device we're currently writing to or
+ where in the device we are. It simply requests that the next page in the
+ pageset or header be written, leaving the details to this layer, and
+ invokes the routines to remember and restore the position, without having
+ to worry about the details of how the data is arranged on disk or such like.
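+
+ A sketch of how the current position might be captured in that portable
+ form (extent_num counted from the start of the chain; illustrative only):
+
+    /* Record the current position in a form that survives resume, where
+     * the in-memory extent addresses will differ. */
+    static void extent_state_save(const struct extent_iterate_state *state,
+                                  struct extent_iterate_saved_state *saved)
+    {
+        const struct extent *e = state->chains[state->current_chain].first;
+        int num = 0;
+
+        while (e && e != state->current_extent) {
+            e = e->next;
+            num++;
+        }
+
+        saved->chain_num = state->current_chain;
+        saved->extent_num = num;
+        saved->offset = state->current_offset;
+    }
+
+ Restoring the position at resume time walks forward in the same way: select
+ chain chain_num, step extent_num extents along it, and set the offset.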
+
+ c) Modules
+
+ One aim in designing TuxOnIce was to make it flexible. We wanted to allow
+ for the implementation of different methods of transforming a page to be
+ written to disk and different methods of getting the pages stored.
+
+ In early versions (the betas and perhaps Suspend1), compression support was
+ inlined in the image writing code, and the data structures and code for
+ managing swap were intertwined with the rest of the code. A number of people
+ had expressed interest in implementing image encryption, and alternative
+ methods of storing the image.
+
+ In order to achieve this, TuxOnIce was given a modular design.
+
+ A module is a single file which encapsulates the functionality needed
+ to transform a pageset of data (encryption or compression, for example),
+ or to write the pageset to a device. The former type of module is called
+ a 'page-transformer', the latter a 'writer'.
+
+ Modules are linked together in pipeline fashion. There may be zero or more
+ page transformers in a pipeline, and there is always exactly one writer.
+ The pipeline follows this pattern:
+
+     ---------------------------------
+     |         TuxOnIce Core         |
+     ---------------------------------
+                     |
+                     |
+     ---------------------------------
+     |      Page transformer 1       |
+     ---------------------------------
+                     |
+                     |
+     ---------------------------------
+     |      Page transformer 2       |
+     ---------------------------------
+                     |
+                     |
+     ---------------------------------
+     |            Writer             |
+     ---------------------------------
+
+ During the writing of an image, the core code feeds pages one at a time
+ to the first module. This module performs whatever transformations it
+ implements on the incoming data, completely consuming the incoming data and
+ feeding output in a similar manner to the next module.
+
+ All routines are SMP safe, and the final result of the transformations is
+ written with an index (provided by the core) and size of the output by the
+ writer. As a result, we can have multithreaded I/O without needing to
+ worry about the sequence in which pages are written (or read).
+
+ During reading, the pipeline works in the reverse direction. The core code
+ calls the first module with the address of a buffer which should be filled.
+ (Note that the buffer size is always PAGE_SIZE at this time). This module
+ will in turn request data from the next module and so on down until the
+ writer is made to read from the stored image.
+
+ Part of the definition of the structure of a module thus looks like this:
+
+ int (*rw_init) (int rw, int stream_number);
+ int (*rw_cleanup) (int rw);
+ int (*write_chunk) (struct page *buffer_page);
+ int (*read_chunk) (struct page *buffer_page, int sync);
+
+ It should be noted that the _cleanup routine may be called before the
+ full stream of data has been read or written. While writing the image,
+ the user may (depending upon settings) choose to abort suspending, and
+ if we are in the midst of writing the last portion of the image, a portion
+ of the second pageset may be reread. This may also happen if an error
+ occurs and we seek to abort the process of writing the image.
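+
+ To make the hook usage concrete, here is a sketch of a do-nothing page
+ transformer that simply forwards each page to the next module in the
+ pipeline. The struct name, the next_module pointer and the null_* function
+ names are invented for illustration; only the four hook signatures come
+ from the text above:
+
+    struct page;                       /* opaque here; a real kernel type */
+
+    struct toi_module_ops {
+        int (*rw_init) (int rw, int stream_number);
+        int (*rw_cleanup) (int rw);
+        int (*write_chunk) (struct page *buffer_page);
+        int (*read_chunk) (struct page *buffer_page, int sync);
+    };
+
+    static struct toi_module_ops *next_module; /* assumed: set up by the core */
+
+    static int null_rw_init(int rw, int stream_number)
+    {
+        return next_module->rw_init(rw, stream_number);
+    }
+
+    static int null_rw_cleanup(int rw)
+    {
+        return next_module->rw_cleanup(rw);
+    }
+
+    static int null_write_chunk(struct page *buffer_page)
+    {
+        return next_module->write_chunk(buffer_page);  /* no transformation */
+    }
+
+    static int null_read_chunk(struct page *buffer_page, int sync)
+    {
+        return next_module->read_chunk(buffer_page, sync);
+    }
+
+ A real transformer (compression or encryption) would transform the page's
+ contents before passing data on, but the plumbing is the same.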
+
+ The modular design is also useful in a number of other ways. It provides
+ a means whereby we can add support for:
+
+ - providing overall initialisation and cleanup routines;
+ - serialising configuration information in the image header;
+ - providing debugging information to the user;
+ - determining memory and image storage requirements;
+ - dis/enabling components at run-time;
+ - configuring the module (see below);
+
+ ...and routines for writers specific to their work:
+ - Parsing a resume= location;
+ - Determining whether an image exists;
+ - Marking a resume as having been attempted;
+ - Invalidating an image;
+
+ Since some parts of the core - the user interface and storage manager
+ support - have use for some of these functions, they are registered as
+ 'miscellaneous' modules as well.
+
+ d) Sysfs data structures.
+
+ This brings us naturally to support for configuring TuxOnIce. We desired to
+ provide a way to make TuxOnIce as flexible and configurable as possible.
+ The user shouldn't have to reboot just because they want to now hibernate to
+ a file instead of a partition, for example.
+
+ To accomplish this, TuxOnIce implements a very generic means whereby the
+ core and modules can register new sysfs entries. All TuxOnIce entries use
+ a single _store and _show routine, both of which are found in
+ tuxonice_sysfs.c in the kernel/power directory. These routines handle the
+ most common operations - getting and setting the values of bits, integers,
+ longs, unsigned longs and strings in one place, and allow overrides for
+ customised get and set options as well as side-effect routines for all
+ reads and writes.
+
+ When combined with some simple macros, a new sysfs entry can then be defined
+ in just a couple of lines:
+
+ SYSFS_INT("progress_granularity", SYSFS_RW, &progress_granularity, 1,
+           2048, 0, NULL),
+
+ This defines a sysfs entry named "progress_granularity" which is rw and
+ allows the user to access an integer stored at &progress_granularity, giving
+ it a value between 1 and 2048 inclusive.
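+
+ The single generic _show routine can be imagined as a dispatch on the
+ entry's declared type. The following is a rough userspace model, not the
+ actual tuxonice_sysfs.c code; the type names are invented for illustration:
+
+    #include <stdio.h>
+    #include <stddef.h>
+
+    enum toi_sysfs_type { TOI_SYSFS_INT, TOI_SYSFS_BIT, TOI_SYSFS_STRING };
+
+    struct toi_sysfs_data {
+        enum toi_sysfs_type type;
+        void *data;                    /* e.g. &progress_granularity */
+        unsigned long bit;             /* only used for TOI_SYSFS_BIT */
+    };
+
+    /* One show routine serves every entry by switching on its type. */
+    static int generic_show(const struct toi_sysfs_data *sd, char *buf,
+                            size_t len)
+    {
+        switch (sd->type) {
+        case TOI_SYSFS_INT:
+            return snprintf(buf, len, "%d\n", *(int *)sd->data);
+        case TOI_SYSFS_BIT:
+            return snprintf(buf, len, "%d\n",
+                            !!(*(unsigned long *)sd->data & sd->bit));
+        case TOI_SYSFS_STRING:
+            return snprintf(buf, len, "%s\n", (char *)sd->data);
+        }
+        return 0;
+    }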
+
+ Sysfs entries are registered under /sys/power/tuxonice, and entries for
+ modules are located in a subdirectory named after the module.
+
diff --git a/Documentation/power/tuxonice.txt b/Documentation/power/tuxonice.txt
new file mode 100644
index 0000000..3bf0575
--- /dev/null
+++ b/Documentation/power/tuxonice.txt
@@ -0,0 +1,948 @@
+ --- TuxOnIce, version 3.0 ---
+
+1. What is it?
+2. Why would you want it?
+3. What do you need to use it?
+4. Why not just use the version already in the kernel?
+5. How do you use it?
+6. What do all those entries in /sys/power/tuxonice do?
+7. How do you get support?
+8. I think I've found a bug. What should I do?
+9. When will XXX be supported?
+10. How does it work?
+11. Who wrote TuxOnIce?
+
+1. What is it?
+
+ Imagine you're sitting at your computer, working away. For some reason, you
+ need to turn off your computer for a while - perhaps it's time to go home
+ for the day. When you come back to your computer next, you're going to want
+ to carry on where you left off. Now imagine that you could push a button and
+ have your computer store the contents of its memory to disk and power down.
+ Then, when you next start up your computer, it loads that image back into
+ memory and you can carry on from where you were, just as if you'd never
+ turned the computer off. You have far less time to start up, no reopening of
+ applications or finding what directory you put that file in yesterday.
+ That's what TuxOnIce does.
+
+ TuxOnIce has a long heritage. It began life as work by Gabor Kuti, who,
+ with some help from Pavel Machek, got an early version going in 1999. The
+ project was then taken over by Florent Chabaud while still in alpha version
+ numbers. Nigel Cunningham came on the scene when Florent was unable to
+ continue, moving the project into betas, then 1.0, 2.0 and so on up to
+ the present series. During the 2.0 series, the name was contracted to
+ Suspend2 and the website suspend2.net created. Beginning around July 2007,
+ a transition to calling the software TuxOnIce was made, to seek to help
+ make it clear that TuxOnIce is more concerned with hibernation than suspend
+ to ram.
+
+ Pavel Machek's swsusp code, which was merged around 2.5.17, retains the
+ original name, and was essentially a fork of the beta code until Rafael
+ Wysocki came on the scene in 2005 and began to improve it further.
+
+2. Why would you want it?
+
+ Why wouldn't you want it?
+
+ Being able to save the state of your system and quickly restore it improves
+ your productivity - you get a useful system in far less time than through
+ the normal boot process. You also get to be completely 'green', using zero
+ power, or as close to that as possible (the computer may still provide
+ minimal power to some devices, so they can initiate a power on, but that
+ will be the same amount of power as would be used if you told the computer
+ to shut down).
+
+3. What do you need to use it?
+
+ a. Kernel Support.
+
+ i) The TuxOnIce patch.
+
+ TuxOnIce is part of the Linux Kernel. This version is not part of Linus's
+ 2.6 tree at the moment, so you will need to download the kernel source and
+ apply the latest patch. Having done that, enable the appropriate options in
+ make [menu|x]config (under Power Management Options - look for "Enhanced
+ Hibernation"), compile and install your kernel. TuxOnIce works with SMP,
+ Highmem, preemption, fuse filesystems, x86-32, PPC and x86_64.
+
+ TuxOnIce patches are available from http://tuxonice.net.
+
+ ii) Compression support.
+
+ Compression support is implemented via the cryptoapi. You will therefore want
+ to select any Cryptoapi transforms that you want to use on your image from
+ the Cryptoapi menu while configuring your kernel. We recommend the use of the
+ LZO compression method - it is very fast and still achieves good compression.
+
+ You can also tell TuxOnIce to write its image to an encrypted and/or
+ compressed filesystem/swap partition. In that case, you don't need to do
+ anything special for TuxOnIce when it comes to kernel configuration.
+
+ iii) Configuring other options.
+
+ While you're configuring your kernel, try to configure as much as possible
+ to build as modules. We recommend this because there are a number of drivers
+ that are still in the process of implementing proper power management
+ support. In those cases, the best way to work around their current lack is
+ to build them as modules and remove the modules while hibernating. You might
+ also bug the driver authors to get their support up to speed, or even help!
+
+ b. Storage.
+
+ i) Swap.
+
+ TuxOnIce can store the hibernation image in your swap partition, a swap file or
+ a combination thereof. Whichever combination you choose, you will probably
+ want to create enough swap space to store the largest image you could have,
+ plus the space you'd normally use for swap. A good rule of thumb would be
+ to calculate the amount of swap you'd want without using TuxOnIce, and then
+ add the amount of memory you have. This swapspace can be arranged in any way
+ you'd like. It can be in one partition or file, or spread over a number. The
+ only requirement is that they be active when you start a hibernation cycle.
+
+ There is one exception to this requirement. TuxOnIce has the ability to turn
+ on one swap file or partition at the start of hibernating and turn it back off
+ at the end. If you want to ensure you have enough memory to store an image
+ when your memory is fully used, you might want to make one swap partition or
+ file for 'normal' use, and another for TuxOnIce to activate & deactivate
+ automatically. (Further details below).
+
+ ii) Normal files.
+
+ TuxOnIce includes a 'file allocator'. The file allocator can store your
+ image in a simple file. Since Linux has the concept of everything being a
+ file, this is more powerful than it initially sounds. If, for example, you
+ were to set up a network block device file, you could hibernate to a network
+ server. This has been tested and works to a point, but nbd itself isn't
+ stateless enough for our purposes.
+
+ Take extra care when setting up the file allocator. If you just type
+ commands without thinking and then try to hibernate, you could cause
+ irreversible corruption on your filesystems! Make sure you have backups.
+
+ Most people will only want to hibernate to a local file. To achieve that, do
+ something along the lines of:
+
+ echo "TuxOnIce" > /hibernation-file
+ dd if=/dev/zero bs=1M count=512 >> /hibernation-file
+
+ This will create a 512MB file called /hibernation-file. To get TuxOnIce to use
+ it:
+
+ echo /hibernation-file > /sys/power/tuxonice/file/target
+
+ Then
+
+ cat /sys/power/tuxonice/resume
+
+ Put the results of this into your bootloader's configuration (see also step
+ C, below):
+
+ ---EXAMPLE-ONLY-DON'T-COPY-AND-PASTE---
+ # cat /sys/power/tuxonice/resume
+ file:/dev/hda2:0x1e001
+
+ In this example, we would edit the append= line of our lilo.conf|menu.lst
+ so that it included:
+
+ resume=file:/dev/hda2:0x1e001
+ ---EXAMPLE-ONLY-DON'T-COPY-AND-PASTE---
+
+ For those who are thinking 'Could I make the file sparse?', the answer is
+ 'No!'. At the moment, there is no way for TuxOnIce to fill in the holes in
+ a sparse file while hibernating. In the longer term (post merge!), I'd like
+ to change things so that the file could be dynamically resized and have
+ holes filled as needed. Right now, however, that's not possible and not a
+ priority.
+
+ c. Bootloader configuration.
+
+ Using TuxOnIce also requires that you add an extra parameter to
+ your lilo.conf or equivalent. Here's an example for a swap partition:
+
+ append="resume=swap:/dev/hda1"
+
+ This would tell TuxOnIce that /dev/hda1 is a swap partition you
+ have. TuxOnIce will use the swap signature of this partition as a
+ pointer to your data when you hibernate. This means that (in this example)
+ /dev/hda1 doesn't need to be _the_ swap partition where all of your data
+ is actually stored. It just needs to be a swap partition that has a
+ valid signature.
+
+ You don't need to have a swap partition for this purpose. TuxOnIce
+ can also use a swap file, but usage is a little more complex. Having made
+ your swap file, turn it on and do
+
+ cat /sys/power/tuxonice/swap/headerlocations
+
+ (this assumes you've already compiled your kernel with TuxOnIce
+ support and booted it). The results of the cat command will tell you
+ what you need to put in lilo.conf:
+
+ For swap partitions like /dev/hda1, simply use resume=/dev/hda1.
+ For swapfile `swapfile`, use resume=swap:/dev/hda2:0x242d.
+
+ If the swapfile changes for any reason (it is moved to a different
+ location, it is deleted and recreated, or the filesystem is
+ defragmented) then you will have to check
+ /sys/power/tuxonice/swap/headerlocations for a new resume_block value.
+
+ Once you've compiled and installed the kernel and adjusted your bootloader
+ configuration, you should only need to reboot for the most basic part
+ of TuxOnIce to be ready.
+
+ If you only compile in the swap allocator, or only compile in the file
+ allocator, you don't need to add the "swap:" part of the resume=
+ parameters above. resume=/dev/hda2:0x242d will work just as well. If you
+ have compiled both and your storage is on swap, you can also use this
+ format (the swap allocator is the default allocator).
+
+ When compiling your kernel, one of the options in the 'Power Management
+ Support' menu, just above the 'Enhanced Hibernation (TuxOnIce)' entry is
+ called 'Default resume partition'. This can be used to set a default value
+ for the resume= parameter.
+
+ d. The hibernate script.
+
+ Since the driver model in 2.6 kernels is still being developed, you may need
+ to do more than just configure TuxOnIce. Users of TuxOnIce usually start the
+ process via a script which prepares for the hibernation cycle, tells the
+ kernel to do its stuff and then restore things afterwards. This script might
+ involve:
+
+ - Switching to a text console and back if X doesn't like the video card
+ status on resume.
+ - Un/reloading drivers that don't play well with hibernation.
+
+ Note that you might not be able to unload some drivers if there are
+ processes using them. You might have to kill off processes that hold
+ devices open. Hint: if your X server accesses a USB mouse, doing a
+ 'chvt' to a text console releases the device and you can unload the
+ module.
+
+ Check out the latest script (available on tuxonice.net).
+
+ e. The userspace user interface.
+
+ TuxOnIce has very limited support for displaying status if you only apply
+ the kernel patch - it can printk messages, but that is all. In addition,
+ some of the functions mentioned in this document (such as cancelling a cycle
+ or performing interactive debugging) are unavailable. To utilise these
+ functions, or simply get a nice display, you need the 'userui' component.
+ Userui comes in three flavours, usplash, fbsplash and text. Text should
+ work on any console. Usplash and fbsplash require the appropriate
+ (distro specific?) support.
+
+ To utilise a userui, TuxOnIce just needs to be told where to find the
+ userspace binary:
+
+ echo "/usr/local/sbin/tuxoniceui_fbsplash" > /sys/power/tuxonice/user_interface/program
+
+ The hibernate script can do this for you, and a default value for this
+ setting can be configured when compiling the kernel. This path is also
+ stored in the image header, so if you have an initrd or initramfs, you can
+ use the userui during the first part of resuming (prior to the atomic
+ restore) by putting the binary in the same path in your initrd/ramfs.
+ Alternatively, you can put it in a different location and do an echo
+ similar to the above prior to the echo > do_resume. The value saved in the
+ image header will then be ignored.
+
+4. Why not just use the version already in the kernel?
+
+ The version in the vanilla kernel has a number of drawbacks. The most
+ serious of these are:
+ - it has a maximum image size of 1/2 total memory;
+ - it doesn't allocate storage until after it has snapshotted memory.
+ This means that you can't be sure hibernating will work until you
+ see it start to write the image;
+ - it does not allow you to press escape to cancel a cycle;
+ - it does not allow you to press escape to cancel resuming;
+ - it does not allow you to automatically swapon a file when
+ starting a cycle;
+ - it does not allow you to use multiple swap partitions or files;
+ - it does not allow you to use ordinary files;
+ - it just invalidates an image and continues to boot if you
+ accidentally boot the wrong kernel after hibernating;
+ - it doesn't support any sort of nice display while hibernating;
+ - it is moving toward requiring that you have an initrd/initramfs
+ to ever have a hope of resuming (uswsusp). While uswsusp will
+ address some of the concerns above, it won't address all of them,
+ and will be more complicated to get set up;
+ - it doesn't have support for suspend-to-both (write a hibernation
+ image, then suspend to ram; I think this is known as ReadySafe
+ under M$).
+
+5. How do you use it?
+
+ A hibernation cycle can be started directly by doing:
+
+ echo > /sys/power/tuxonice/do_hibernate
+
+ In practice, though, you'll probably want to use the hibernate script
+ to unload modules, configure the kernel the way you like it and so on.
+ In that case, you'd do (as root):
+
+ hibernate
+
+ See the hibernate script's man page for more details on the options it
+ takes.
+
+ If you're using the text or splash user interface modules, one feature of
+ TuxOnIce that you might find useful is that you can press Escape at any time
+ during hibernating, and the process will be aborted.
+
+ Due to the way hibernation works, this means you'll have your system back and
+ perfectly usable almost instantly. The only exception is when it's at the
+ very end of writing the image. Then it will need to reload a small (usually
+ 4-50MBs, depending upon the image characteristics) portion first.
+
+ Likewise, when resuming, you can press escape and resuming will be aborted.
+ The computer will then power down again or reboot, according to the
+ powerdown and reboot settings in effect at that time.
+
+ You can change the settings for powering down while the image is being
+ written by pressing 'R' to toggle rebooting and 'O' to toggle between
+ suspending to ram and powering down completely.
+
+ If you run into problems with resuming, adding the "noresume" option to
+ the kernel command line will let you skip the resume step and recover your
+ system. This option shouldn't normally be needed, because TuxOnIce modifies
+ the image header prior to the atomic restore, and will thus prompt you
+ if it detects that you've tried to resume an image before (this flag is
+ removed if you press Escape to cancel a resume, so you won't be prompted
+ then).
+
+ Recent kernels (2.6.24 onwards) add support for resuming from a different
+ kernel to the one that was hibernated (thanks to Rafael for his work on
+ this - I've just embraced and enhanced the support for TuxOnIce). This
+ should further reduce the need for you to use the noresume option.
+
+6. What do all those entries in /sys/power/tuxonice do?
+
+ /sys/power/tuxonice is the directory which contains files you can use to
+ tune and configure TuxOnIce to your liking. The exact contents of
+ the directory will depend upon the version of TuxOnIce you're
+ running and the options you selected at compile time. In the following
+ descriptions, names in brackets refer to compile time options.
+ (Note that they're all dependent upon you having selected CONFIG_TUXONICE
+ in the first place!).
+
+ Since the values of these settings can open potential security risks, the
+ writeable ones are accessible only to the root user. You may want to
+ configure sudo to allow you to invoke your hibernate script as an ordinary
+ user.
+
+ - alloc/failure_test
+
+ This debugging option provides a way of testing TuxOnIce's handling of
+ memory allocation failures. Each allocation type that TuxOnIce makes has
+ been given a unique number (see the source code). Echo the appropriate
+ number into this entry, and when TuxOnIce attempts to do that allocation,
+ it will pretend there was a failure and act accordingly.
+
+ - alloc/find_max_mem_allocated
+
+ This debugging option will cause TuxOnIce to find the maximum amount of
+ memory it used during a cycle, and report that information in debugging
+ information at the end of the cycle.
+
+ - alt_resume_param
+
+ Instead of powering down after writing a hibernation image, TuxOnIce
+ supports resuming from a different image. This entry lets you set the
+ location of the signature for that image (the resume= value you'd use
+ for it). Using an alternate image and keep_image mode, you can do things
+ like using an alternate image to power down an uninterruptible power
+ supply.
+
+ - block_io/target_outstanding_io
+
+ This value controls the amount of memory that the block I/O code says it
+ needs when the core code is calculating how much memory is needed for
+ hibernating and for resuming. It doesn't directly control the amount of
+ I/O that is submitted at any one time - that depends on the amount of
+ available memory (we may have more available than we asked for), the
+ throughput that is being achieved and the ability of the CPU to keep up
+ with disk throughput (particularly where we're compressing pages).
+
+ - checksum/enabled
+
+ Use cryptoapi hashing routines to verify that Pageset2 pages don't change
+ while we're saving the first part of the image, and to get any pages that
+ do change resaved in the atomic copy. This should normally not be needed,
+ but if you're seeing issues, please enable this. If your issues stop you
+ being able to resume, enable this option, hibernate and cancel the cycle
+ after the atomic copy is done. If the debugging info shows a non-zero
+ number of pages resaved, please report this to Nigel.
+
+ - compression/algorithm
+
+ Set the cryptoapi algorithm used for compressing the image.
+
+ - compression/expected_compression
+
+ This value allows you to set an expected compression ratio, which TuxOnIce
+ will use in calculating whether it meets constraints on the image size. If
+ this expected compression ratio is not attained, the hibernation cycle will
+ abort, so it is wise to allow some spare. You can see what compression
+ ratio is achieved in the logs after hibernating.
+
+ - debug_info:
+
+ This file returns information about your configuration that may be helpful
+ in diagnosing problems with hibernating.
+
+ - did_suspend_to_both:
+
+ This file can be used when you hibernate with powerdown method 3 (ie suspend
+ to ram after writing the image). There can be two outcomes in this case. We
+ can resume from the suspend-to-ram before the battery runs out, or we can run
+ out of juice and end up resuming as normal. This entry lets you find out,
+ post resume, which way we went. If the value is 1, we resumed from suspend
+ to ram. This can be useful when actions need to be run post suspend-to-ram
+ that don't need to be run if we did the normal resume from power off.
+
+ - do_hibernate:
+
+ When anything is written to this file, the kernel side of TuxOnIce will
+ begin to attempt to write an image to disk and power down. You'll normally
+ want to run the hibernate script instead, to get modules unloaded first.
+
+ - do_resume:
+
+ When anything is written to this file TuxOnIce will attempt to read and
+ restore an image. If there is no image, it will return almost immediately.
+ If an image exists, the echo > will never return. Instead, the original
+ kernel context will be restored and the original echo > do_hibernate will
+ return.
+
+ - */enabled
+
+ These options can be used to temporarily disable various parts of TuxOnIce.
+
+ - extra_pages_allowance
+
+ When TuxOnIce does its atomic copy, it calls the driver model suspend
+ and resume methods. If you have DRI enabled with a driver such as fglrx,
+ this can result in the driver allocating a substantial amount of memory
+ for storing its state. Extra_pages_allowance tells TuxOnIce how much
+ extra memory it should ensure is available for those allocations. If
+ your attempts at hibernating end with a message in dmesg indicating that
+ insufficient extra pages were allowed, you need to increase this value.
+
+ - file/target:
+
+ Read this value to get the current setting. Write to it to point TuxOnIce
+ at a new storage location for the file allocator. See section 3.b.ii above
+ for details of how to set up the file allocator.
+
+ - freezer_test
+
+ This entry can be used to get TuxOnIce to just test the freezer and prepare
+ an image without actually doing a hibernation cycle. It is useful for
+ diagnosing freezing and image preparation issues.
+
+ - full_pageset2
+
+ TuxOnIce divides the pages that are stored in an image into two sets. The
+ difference between the two sets is that pages in pageset 1 are atomically
+ copied, and pages in pageset 2 are written to disk without being copied
+ first. A page CAN be written to disk without being copied first if and only
+ if its contents will not be modified or used at any time after userspace
+ processes are frozen. A page MUST be in pageset 1 if its contents are
+ modified or used at any time after userspace processes have been frozen.
+
+ Normally (ie if this option is enabled), TuxOnIce will put all pages on the
+ per-zone LRUs in pageset2, then remove those pages used by any userspace
+ user interface helper and TuxOnIce storage manager that are running,
+ together with pages used by the GEM memory manager introduced around 2.6.28
+ kernels.
+
+ If this option is disabled, a much more conservative approach will be taken.
+ The only pages in pageset2 will be those belonging to userspace processes,
+ with the exclusion of those belonging to the TuxOnIce userspace helpers
+ mentioned above. This will result in a much smaller pageset2, and will
+ therefore result in smaller images than are possible with this option
+ enabled.
+
+ - ignore_rootfs
+
+ TuxOnIce records which device is mounted as the root filesystem when
+ writing the hibernation image. It will normally check at resume time that
+ this device isn't already mounted - that would be a cause of filesystem
+ corruption. In some particular cases (RAM based root filesystems), you
+ might want to disable this check. This option allows you to do that.
+
+ - image_exists:
+
+ Can be used in a script to determine whether a valid image exists at the
+ location currently pointed to by resume=. Returns up to three lines.
+ The first is whether an image exists (-1 for unsure, otherwise 0 or 1).
+ If an image exists, additional lines will return the machine and version.
+ Echoing anything to this entry removes any current image.
+
+ - image_size_limit:
+
+ The maximum size of hibernation image written to disk, measured in megabytes
+ (1024*1024).
+
+ - last_result:
+
+ The result of the last hibernation cycle, as defined in
+ include/linux/suspend-debug.h with the values SUSPEND_ABORTED to
+ SUSPEND_KEPT_IMAGE. This is a bitmask.
+
+ - late_cpu_hotplug:
+
+ This sysfs entry controls whether cpu hotplugging is done - as normal - just
+ before (unplug) and after (replug) the atomic copy/restore (so that all
+ CPUs/cores are available for multithreaded I/O). The alternative is to
+ unplug all secondary CPUs/cores at the start of hibernating/resuming, and
+ replug them at the end of resuming. No multithreaded I/O will be possible in
+ this configuration, but the odd machine has been reported to require it.
+
+ - lid_file:
+
+ This determines which ACPI button file we look in to determine whether the
+ lid is open or closed after resuming from suspend to disk or power off.
+ If the entry is set to "lid/LID", we'll open /proc/acpi/button/lid/LID/state
+ and check its contents at the appropriate moment. See post_wake_state below
+ for more details on how this entry is used.
+
+ - log_everything (CONFIG_PM_DEBUG):
+
+ Setting this option results in all messages printed being logged. Normally,
+ only a subset are logged, so as to not slow the process and not clutter the
+ logs. Useful for debugging. It can be toggled during a cycle by pressing
+ 'L'.
+
+ - no_load_direct:
+
+ This is a debugging option. If, when loading the atomically copied pages of
+ an image, TuxOnIce finds that the destination address for a page is free,
+ it will normally allocate the image, load the data directly into that
+ address and skip it in the atomic restore. If this option is disabled, the
+ page will be loaded somewhere else and atomically restored like other pages.
+
+ - no_flusher_thread:
+
+ When doing multithreaded I/O (see below), the first online CPU can be used
+ to _just_ submit compressed pages when writing the image, rather than
+ compressing and submitting data. This option is normally disabled, but has
+ been included because Nigel would like to see whether it will be more useful
+ as the number of cores/cpus in computers increases.
+
+ - no_multithreaded_io:
+
+ TuxOnIce will normally create one thread per cpu/core on your computer,
+ each of which will then perform I/O. This will generally result in
+ throughput that's the maximum the storage medium can handle. There
+ shouldn't be any reason to disable multithreaded I/O now, but this option
+ has been retained for debugging purposes.
+
+ - no_pageset2
+
+ See the entry for full_pageset2 above for an explanation of pagesets.
+ Enabling this option causes TuxOnIce to do an atomic copy of all pages,
+ thereby limiting the maximum image size to 1/2 of memory, as swsusp does.
+
+ - no_pageset2_if_unneeded
+
+ See the entry for full_pageset2 above for an explanation of pagesets.
+ Enabling this option causes TuxOnIce to act like no_pageset2 was enabled
+ if and only if it isn't needed anyway. This option may still make TuxOnIce
+ less reliable because pageset2 pages are normally used to store the
+ atomic copy - drivers that want to do allocations of larger amounts of
+ memory in one shot will be more likely to find that those amounts aren't
+ available if this option is enabled.
+
+ - pause_between_steps (CONFIG_PM_DEBUG):
+
+ This option is used during debugging, to make TuxOnIce pause between
+ each step of the process. It is ignored when the nice display is on.
+
+ - post_wake_state:
+
+ TuxOnIce provides support for automatically waking after a user-selected
+ delay, and using a different powerdown method if the lid is still closed.
+ (Yes, we're assuming a laptop). This entry lets you choose what state
+ should be entered next. The values are those described under
+ powerdown_method, below. It can be used to suspend to RAM after hibernating,
+ then power down properly after (say) 20 minutes. It can also be used to power down
+ properly, then wake at (say) 6.30am and suspend to RAM until you're ready
+ to use the machine.
+
+ - powerdown_method:
+
+ Used to select a method by which TuxOnIce should powerdown after writing the
+ image. Currently:
+
+ 0: Don't use ACPI to power off.
+ 3: Attempt to enter Suspend-to-ram.
+ 4: Attempt to enter ACPI S4 mode.
+ 5: Attempt to power down via ACPI S5 mode.
+
+ Note that these options are highly dependent upon your hardware & software:
+
+ 3: When successful, your machine suspends to ram instead of powering off.
+ The advantage of using this mode is that it doesn't matter whether your
+ battery has enough charge to make it through to your next resume. If it
+ lasts, you will simply resume from suspend to ram (and the image on disk
+ will be discarded). If the battery runs out, you will resume from disk
+ instead. The disadvantage is that it takes longer than a normal
+ suspend-to-ram to enter the state, since the suspend-to-disk image needs
+ to be written first.
+ 4/5: When successful, your machine will be off and consume (almost) no power.
+ But it might still react to some external events like opening the lid or
+ traffic on a network or usb device. For the bios, resume is then the same
+ as warm boot, similar to a situation where you used the command `reboot'
+ to reboot your machine. If your machine has problems on warm boot or if
+ you want to protect your machine with the bios password, this is probably
+ not the right choice. Mode 4 may be necessary on some machines where ACPI
+ wake up methods need to be run to properly reinitialise hardware after a
+ hibernation cycle.
+ 0: Switch the machine completely off. The only possible wakeup is the power
+ button. For the bios, resume is then the same as a cold boot, in
+ particular you would have to provide your bios boot password if your
+ machine uses that feature for booting.
+
+ - progressbar_granularity_limit:
+
+ This option can be used to limit the granularity of the progress bar
+ displayed with a bootsplash screen. The value is the maximum number of
+ steps. That is, 10 will make the progress bar jump in 10% increments.
+
+ - reboot:
+
+ This option causes TuxOnIce to reboot rather than powering down
+ at the end of saving an image. It can be toggled during a cycle by pressing
+ 'R'.
+
+ - resume:
+
+ This sysfs entry can be used to read and set the location in which TuxOnIce
+ will look for the signature of an image - the value set using resume= at
+ boot time or CONFIG_PM_STD_PARTITION ("Default resume partition"). By
+ writing to this file as well as modifying your bootloader's configuration
+ file (eg menu.lst), you can set or reset the location of your image or the
+ method of storing the image without rebooting.
+
+ - replace_swsusp (CONFIG_TOI_REPLACE_SWSUSP):
+
+ This option makes
+
+ echo disk > /sys/power/state
+
+ activate TuxOnIce instead of swsusp. Regardless of whether this option is
+ enabled, any invocation of swsusp's resume time trigger will cause TuxOnIce
+ to check for an image too. This is due to the fact that at resume time, we
+ can't know whether this option was enabled until we see if an image is there
+ for us to resume from. (And when an image exists, we don't care whether we
+ did replace swsusp anyway - we just want to resume).
+
+ - resume_commandline:
+
+ This entry can be read after resuming to see the commandline that was used
+ when resuming began. You might use this to set up two bootloader entries
+ that are the same apart from the fact that one includes an extra append=
+ argument "at_work=1". You could then grep resume_commandline in your
+ post-resume scripts and configure networking (for example) differently
+ depending upon whether you're at home or work. resume_commandline can be
+ set to arbitrary text if you wish to remove sensitive contents.
+
+ - swap/swapfilename:
+
+ This entry is used to specify the swapfile or partition that
+ TuxOnIce will attempt to swapon/swapoff automatically. Thus, if
+ I normally use /dev/hda1 for swap, and want to use /dev/hda2 specifically
+ for my hibernation image, I would
+
+ echo /dev/hda2 > /sys/power/tuxonice/swap/swapfile
+
+ /dev/hda2 would then be automatically swapon'd and swapoff'd. Note that the
+ swapon and swapoff occur while other processes are frozen (including kswapd)
+ so this swap file will not be used up when attempting to free memory. The
+ partition/file is also given the highest priority, so other swapfiles/partitions
+ will only be used to save the image when this one is filled.
+
+ The value of this file is used by headerlocations along with any currently
+ activated swapfiles/partitions.
+
+ - swap/headerlocations:
+
+ This option tells you the resume= options to use for swap devices you
+ currently have activated. It is particularly useful when you only want to
+ use a swap file to store your image. See above for further details.
+
+ - test_bio
+
+ This is a debugging option. When enabled, TuxOnIce will not hibernate.
+ Instead, when asked to write an image, it will skip the atomic copy,
+ just doing the writing of the image and then returning control to the
+ user at the point where it would have powered off. This is useful for
+ testing throughput in different configurations.
+
+ - test_filter_speed
+
+ This is a debugging option. When enabled, TuxOnIce will not hibernate.
+ Instead, when asked to write an image, it will not write anything or do
+ an atomic copy, but will only run any enabled compression algorithm on the
+ data that would have been written (the source pages of the atomic copy in
+ the case of pageset 1). This is useful for comparing the performance of
+ compression algorithms and for determining the extent to which an upgrade
+ to your storage method would improve hibernation speed.
+
+ - user_interface/debug_sections (CONFIG_PM_DEBUG):
+
+ This value, together with the console log level, controls what debugging
+ information is displayed. The console log level determines the level of
+ detail, and this value determines what detail is displayed. This value is
+ a bit vector, and the meaning of the bits can be found in the kernel tree
+ in include/linux/tuxonice.h. It can be overridden using the kernel's
+ command line option suspend_dbg.
+
+ - user_interface/default_console_level (CONFIG_PM_DEBUG):
+
+ This determines the value of the console log level at the start of a
+ hibernation cycle. If debugging is compiled in, the console log level can be
+ changed during a cycle by pressing the digit keys. Meanings are:
+
+ 0: Nice display.
+ 1: Nice display plus numerical progress.
+ 2: Errors only.
+ 3: Low level debugging info.
+ 4: Medium level debugging info.
+ 5: High level debugging info.
+ 6: Verbose debugging info.
+
+ - user_interface/enable_escape:
+
+ Setting this to "1" will enable you to abort a hibernation cycle or resuming by
+ pressing escape, "0" (default) disables this feature. Note that enabling
+ this option means that you cannot initiate a hibernation cycle and then walk
+ away from your computer, expecting it to be secure. With feature disabled,
+ you can validly have this expectation once TuxOnice begins to write the
+ image to disk. (Prior to this point, it is possible that TuxOnice might
+ about because of failure to freeze all processes or because constraints
+ on its ability to save the image are not met).
+
+ - user_interface/program
+
+ This entry is used to tell TuxOnice what userspace program to use for
+ providing a user interface while hibernating. The program uses a netlink
+ socket to pass messages back and forth to the kernel, allowing it to provide
+ all of the functions formerly implemented in the kernel user interface
+ components.
+
+ - version:
+
+ The version of TuxOnIce you have compiled into the currently running kernel.
+
+ - wake_alarm_dir:
+
+ As mentioned above (post_wake_state), TuxOnIce supports automatically waking
+ after some delay. This entry allows you to select which wake alarm to use.
+ It should contain the value "rtc0" if you want to use
+ /sys/class/rtc/rtc0.
+
+ - wake_delay:
+
+ This value determines the delay from the end of writing the image until the
+ wake alarm is triggered. You can set an absolute time by writing the desired
+ time into /sys/class/rtc/<wake_alarm_dir>/wakealarm and leaving these values
+ empty.
+
+ Note that for the wakeup to actually occur, you may need to modify entries
+ in /proc/acpi/wakeup. This is done by echoing the name of the button in the
+ first column (eg PBTN) into the file.
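+
+ Putting the pieces together, a setup that wakes the machine a few minutes
+ after the image has been written might look like the following (the rtc
+ device, the delay units - assumed here to be seconds - and the PBTN entry
+ all depend on your hardware):
+
+ echo rtc0 > /sys/power/tuxonice/wake_alarm_dir
+ echo 300 > /sys/power/tuxonice/wake_delay
+ echo PBTN > /proc/acpi/wakeup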
+
+7. How do you get support?
+
+ Glad you asked. TuxOnIce is being actively maintained and supported
+ by Nigel (the guy doing most of the kernel coding at the moment), Bernard
+ (who maintains the hibernate script and userspace user interface components)
+ and its users.
+
+ Resources available include HowTos, FAQs and a Wiki, all available via
+ tuxonice.net. You can find the mailing lists there.
+
+8. I think I've found a bug. What should I do?
+
+ By far and away, the most common problems people have with TuxOnIce
+ relate to drivers not having adequate power management support. In this
+ case, it is not a bug in TuxOnIce, but we can still help you. As we
+ mentioned above, such issues can usually be worked around by building the
+ functionality as modules and unloading them while hibernating. Please visit
+ the Wiki for up-to-date lists of known issues and workarounds.
+
+ If this information doesn't help, try running:
+
+ hibernate --bug-report
+
+ ..and sending the output to the users mailing list.
+
+ Good information on how to provide us with useful information from an
+ oops is found in the file REPORTING-BUGS, in the top level directory
+ of the kernel tree. If you get an oops, please especially note the
+ information about running what is printed on the screen through ksymoops.
+ The raw information is useless.
+
+9. When will XXX be supported?
+
+ If there's a feature missing from TuxOnIce that you'd like, feel free to
+ ask. We try to be obliging, within reason.
+
+ Patches are welcome. Please send to the list.
+
+10. How does it work?
+
+ TuxOnIce does its work in a number of steps.
+
+ a. Freezing system activity.
+
+ The first main stage in hibernating is to stop all other activity. This is
+ achieved in stages. Processes are considered in four groups, which we will
+ describe in reverse order for clarity's sake: Threads with the PF_NOFREEZE
+ flag, kernel threads without this flag, userspace processes with the
+ PF_SYNCTHREAD flag and all other processes. The first set (PF_NOFREEZE) are
+ untouched by the refrigerator code. They are allowed to run during hibernating
+ and resuming, and are used to support user interaction, storage access or the
+ like. Other kernel threads (those unneeded while hibernating) are frozen last.
+ This leaves us with userspace processes that need to be frozen. When a
+ process enters one of the *_sync system calls, we set a PF_SYNCTHREAD flag on
+ that process for the duration of that call. Processes that have this flag are
+ frozen after processes without it, so that we can seek to ensure that dirty
+ data is synced to disk as quickly as possible in a situation where other
+ processes may be submitting writes at the same time. Freezing the processes
+ that are submitting data stops new I/O from being submitted. Syncthreads can
+ then cleanly finish their work. So the order is:
+
+ - Userspace processes without PF_SYNCTHREAD or PF_NOFREEZE;
+ - Userspace processes with PF_SYNCTHREAD (they won't have NOFREEZE);
+ - Kernel processes without PF_NOFREEZE.
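+
+ The following is a minimal sketch (not taken from this patch) of how a kernel
+ thread opts in to being frozen with everything else: it clears PF_NOFREEZE via
+ set_freezable() and then polls try_to_freeze() at a safe point in its loop,
+ the same pattern this patch applies to the ps3 and mpc83xx helper threads:
+
+ #include <linux/freezer.h>
+ #include <linux/kthread.h>
+
+ static int example_thread_fn(void *data)
+ {
+         set_freezable();        /* allow the freezer to stop this thread */
+
+         while (!kthread_should_stop()) {
+                 try_to_freeze();   /* blocks here while the system hibernates */
+                 /* ... the thread's normal work goes here ... */
+         }
+         return 0;
+ }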
+
+ b. Eating memory.
+
+ For a successful hibernation cycle, you need to have enough disk space to store the
+ image and enough memory for the various limitations of TuxOnIce's
+ algorithm. You can also specify a maximum image size. In order to meet
+ those constraints, TuxOnIce may 'eat' memory. If, after freezing
+ processes, the constraints aren't met, TuxOnIce will thaw all the
+ other processes and begin to eat memory until its calculations indicate
+ the constraints are met. It will then freeze processes again and recheck
+ its calculations.
+
+ c. Allocation of storage.
+
+ Next, TuxOnIce allocates the storage that will be used to save
+ the image.
+
+ The core of TuxOnIce knows nothing about how or where pages are stored. We
+ therefore request the active allocator (remember you might have compiled in
+ more than one!) to allocate enough storage for our expected image size. If
+ this request cannot be fulfilled, we eat more memory and try again. If it
+ is fulfilled, we seek to allocate additional storage, just in case our
+ expected compression ratio (if any) isn't achieved. This time, however, we
+ just continue if we can't allocate enough storage.
+
+ If these calls to our allocator change the characteristics of the image
+ such that we haven't allocated enough memory, we also loop. (The allocator
+ may well need to allocate space for its storage information).
+
+ d. Write the first part of the image.
+
+ TuxOnIce stores the image in two sets of pages called 'pagesets'.
+ Pageset 2 contains pages on the active and inactive lists; essentially
+ the page cache. Pageset 1 contains all other pages, including the kernel.
+ We use two pagesets for one important reason: We need to make an atomic copy
+ of the kernel to ensure consistency of the image. Without a second pageset,
+ that would limit us to an image that was at most half the amount of memory
+ available. Using two pagesets allows us to store a full image. Since pageset
+ 2 pages won't be needed in saving pageset 1, we first save pageset 2 pages.
+ We can then make our atomic copy of the remaining pages using both pageset 2
+ pages and any other pages that are free. While saving both pagesets, we are
+ careful not to corrupt the image. Among other things, we use lowlevel block
+ I/O routines that don't change the pagecache contents.
+
+ The next step, then, is writing pageset 2.
+
+ e. Suspending drivers and storing processor context.
+
+ Having written pageset2, TuxOnIce calls the power management functions to
+ notify drivers of the hibernation, and saves the processor state in preparation
+ for the atomic copy of memory we are about to make.
+
+ f. Atomic copy.
+
+ At this stage, everything else but the TuxOnIce code is halted. Processes
+ are frozen or idling, drivers are quiesced and have stored (ideally and where
+ necessary) their configuration in memory we are about to atomically copy.
+ In our lowlevel architecture specific code, we have saved the CPU state.
+ We can therefore now do our atomic copy before resuming drivers etc.
+
+ g. Save the atomic copy (pageset 1).
+
+ TuxOnIce can then write the atomic copy of the remaining pages. Since we
+ have copied the pages into other locations, we can continue to use the
+ normal block I/O routines without fear of corrupting our image.
+
+ h. Save the image header.
+
+ Nearly there! We save our settings and other parameters needed for
+ reloading pageset 1 in an 'image header'. We also tell our allocator to
+ serialise its data at this stage, so that it can reread the image at resume
+ time.
+
+ i. Set the image header.
+
+ Finally, we edit the header at our resume= location. The signature is
+ changed by the allocator to reflect the fact that an image exists, and to
+ point to the start of that data if necessary (swap allocator).
+
+ j. Power down.
+
+ Or reboot if we're debugging and the appropriate option is selected.
+
+ Whew!
+
+ Reloading the image.
+ --------------------
+
+ Reloading the image is essentially the reverse of all the above. We load
+ our copy of pageset 1, being careful to choose locations that aren't going
+ to be overwritten as we copy it back (We start very early in the boot
+ process, so there are no other processes to quiesce here). We then copy
+ pageset 1 back to its original location in memory and restore the process
+ context. We are now running with the original kernel. Next, we reload the
+ pageset 2 pages, free the memory and swap used by TuxOnIce, restore
+ the pageset header and restart processes. Sounds easy in comparison to
+ hibernating, doesn't it!
+
+ There is of course more to TuxOnIce than this, but this explanation
+ should be a good start. If there's interest, I'll write further
+ documentation on range pages and the low level I/O.
+
+11. Who wrote TuxOnIce?
+
+ (Answer based on the writings of Florent Chabaud, credits in files and
+ Nigel's limited knowledge; apologies to anyone missed out!)
+
+ The main developers of TuxOnIce have been...
+
+ Gabor Kuti
+ Pavel Machek
+ Florent Chabaud
+ Bernard Blackham
+ Nigel Cunningham
+
+ Significant portions of swsusp, the code in the vanilla kernel which
+ TuxOnIce enhances, have been worked on by Rafael Wysocki. Thanks should
+ also be expressed to him.
+
+ The above mentioned developers have been aided in their efforts by a host
+ of hundreds, if not thousands of testers and people who have submitted bug
+ fixes & suggestions. Of special note are the efforts of Michael Frank, who
+ had his computers repetitively hibernate and resume for literally tens of
+ thousands of cycles and developed scripts to stress the system and test
+ TuxOnIce far beyond the point most of us (Nigel included!) would consider
+ testing. His efforts have contributed as much to TuxOnIce as any of the
+ names above.
diff --git a/Documentation/vm/00-INDEX b/Documentation/vm/00-INDEX
index 081c497..75bde3d 100644
--- a/Documentation/vm/00-INDEX
+++ b/Documentation/vm/00-INDEX
@@ -16,6 +16,8 @@ hwpoison.txt
- explains what hwpoison is
ksm.txt
- how to use the Kernel Samepage Merging feature.
+uksm.txt
+ - Introduction to Ultra KSM
numa
- information about NUMA specific code in the Linux vm.
numa_memory_policy.txt
diff --git a/Documentation/vm/uksm.txt b/Documentation/vm/uksm.txt
new file mode 100644
index 0000000..9b2cb51
--- /dev/null
+++ b/Documentation/vm/uksm.txt
@@ -0,0 +1,57 @@
+The Ultra Kernel Samepage Merging feature
+----------------------------------------------
+/*
+ * Ultra KSM. Copyright (C) 2011-2012 Nai Xia
+ *
+ * This is an improvement upon KSM. Some basic data structures and routines
+ * are borrowed from ksm.c .
+ *
+ * Its new features:
+ * 1. Full system scan:
+ * It automatically scans all user processes' anonymous VMAs. Kernel-user
+ * interaction to submit a memory area to KSM is no longer needed.
+ *
+ * 2. Rich area detection:
+ * It automatically detects rich areas containing abundant duplicated
+ * pages. Rich areas are given a full scan speed. Poor areas are
+ * sampled at a reasonable speed with very low CPU consumption.
+ *
+ * 3. Ultra Per-page scan speed improvement:
+ * A new hash algorithm is proposed. As a result, on a machine with
+ * Core(TM)2 Quad Q9300 CPU in 32-bit mode and 800MHZ DDR2 main memory, it
+ * can scan memory areas that do not contain duplicated pages at a speed of
+ * 627MB/sec ~ 2445MB/sec and can merge duplicated areas at a speed of
+ * 477MB/sec ~ 923MB/sec.
+ *
+ * 4. Thrashing area avoidance:
+ * A thrashing area (a VMA that has frequent KSM page break-outs) can be
+ * filtered out. My benchmark shows it's more efficient than KSM's per-page
+ * hash value based volatile page detection.
+ *
+ *
+ * 5. Misc changes upon KSM:
+ * * It has a fully x86-optimized memcmp dedicated for 4-byte-aligned page
+ * comparison. It's much faster than the default C version on x86.
+ * * rmap_item now has a struct page * member to loosely cache an
+ * address-->page mapping, which reduces the number of time-costly
+ * follow_page() calls.
+ * * The VMA creation/exit procedures are hooked to let the Ultra KSM know.
+ * * try_to_merge_two_pages() now can revert a pte if it fails. No break_
+ * ksm is needed for this case.
+ *
+ * 6. Full Zero Page consideration (contributed by Figo Zhang)
+ * Now uksmd considers full zero pages as special pages and merges them
+ * into a special unswappable uksm zero page.
+ */
+
+ChangeLog:
+
+2012-05-05 The creation of this Doc
+2012-05-08 UKSM 0.1.1.1 libc crash bug fix, api clean up, doc clean up.
+2012-05-28 UKSM 0.1.1.2 bug fix release
+2012-06-26 UKSM 0.1.2-beta1 first beta release for 0.1.2
+2012-07-02 UKSM 0.1.2-beta2
+2012-07-10 UKSM 0.1.2-beta3
+2012-07-26 UKSM 0.1.2 Fine grained speed control, more scan optimization.
+2012-10-13 UKSM 0.1.2.1 Bug fixes.
+2012-12-31 UKSM 0.1.2.2 Minor bug fixes
diff --git a/MAINTAINERS b/MAINTAINERS
index 6c484ac..cafc523 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9124,6 +9124,13 @@ S: Maintained
F: drivers/tc/
F: include/linux/tc.h
+TUXONICE (ENHANCED HIBERNATION)
+P: Nigel Cunningham
+M: nigel@tuxonice.net
+L: tuxonice-devel@tuxonice.net
+W: http://tuxonice.net
+S: Maintained
+
U14-34F SCSI DRIVER
M: Dario Ballabio <ballabio_dario@emc.com>
L: linux-scsi@vger.kernel.org
diff --git a/Makefile b/Makefile
index 6d1e304..a49e765 100644
--- a/Makefile
+++ b/Makefile
@@ -1,8 +1,8 @@
VERSION = 3
PATCHLEVEL = 15
SUBLEVEL = 0
-EXTRAVERSION =
-NAME = Shuffling Zombie Juror
+EXTRAVERSION = -pf1
+NAME = United Ukraine
# *DOCUMENTATION*
# To see a list of typical targets execute "make help"
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 343a87f..2df8093 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -437,6 +437,7 @@ void kernel_map_pages(struct page *page, int numpages, int enable)
change_page_attr(page, numpages, enable ? PAGE_KERNEL : __pgprot(0));
}
+EXPORT_SYMBOL_GPL(kernel_map_pages);
#endif /* CONFIG_DEBUG_PAGEALLOC */
static int fixmaps;
diff --git a/arch/powerpc/platforms/83xx/suspend.c b/arch/powerpc/platforms/83xx/suspend.c
index 4b4c081..5667da2 100644
--- a/arch/powerpc/platforms/83xx/suspend.c
+++ b/arch/powerpc/platforms/83xx/suspend.c
@@ -264,6 +264,8 @@ static int mpc83xx_suspend_begin(suspend_state_t state)
static int agent_thread_fn(void *data)
{
+ set_freezable();
+
while (1) {
wait_event_interruptible(agent_wq, pci_pm_state >= 2);
try_to_freeze();
diff --git a/arch/powerpc/platforms/ps3/device-init.c b/arch/powerpc/platforms/ps3/device-init.c
index 3f175e8..b5d59c6 100644
--- a/arch/powerpc/platforms/ps3/device-init.c
+++ b/arch/powerpc/platforms/ps3/device-init.c
@@ -841,6 +841,8 @@ static int ps3_probe_thread(void *data)
if (res)
goto fail_free_irq;
+ set_freezable();
+
/* Loop here processing the requested notification events. */
do {
try_to_freeze();
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index ae242a7..1f6f9b7 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1829,6 +1829,8 @@ void kernel_map_pages(struct page *page, int numpages, int enable)
arch_flush_lazy_mmu_mode();
}
+EXPORT_SYMBOL_GPL(kernel_map_pages);
+
#ifdef CONFIG_HIBERNATION
bool kernel_page_present(struct page *page)
@@ -1842,7 +1844,7 @@ bool kernel_page_present(struct page *page)
pte = lookup_address((unsigned long)page_address(page), &level);
return (pte_val(*pte) & _PAGE_PRESENT);
}
-
+EXPORT_SYMBOL_GPL(kernel_page_present);
#endif /* CONFIG_HIBERNATION */
#endif /* CONFIG_DEBUG_PAGEALLOC */
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 424f4c9..41f9004 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -122,9 +122,7 @@ void save_processor_state(void)
__save_processor_state(&saved_context);
x86_platform.save_sched_clock_state();
}
-#ifdef CONFIG_X86_32
EXPORT_SYMBOL(save_processor_state);
-#endif
static void do_fpu_end(void)
{
diff --git a/arch/x86/power/hibernate_32.c b/arch/x86/power/hibernate_32.c
index 7d28c88..4f1dd95 100644
--- a/arch/x86/power/hibernate_32.c
+++ b/arch/x86/power/hibernate_32.c
@@ -9,6 +9,7 @@
#include <linux/gfp.h>
#include <linux/suspend.h>
#include <linux/bootmem.h>
+#include <linux/export.h>
#include <asm/page.h>
#include <asm/pgtable.h>
@@ -161,6 +162,7 @@ int swsusp_arch_resume(void)
restore_image();
return 0;
}
+EXPORT_SYMBOL_GPL(swsusp_arch_resume);
/*
* pfn_is_nosave - check if given pfn is in the 'nosave' section
diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c
index 35e2bb6..9f2545e 100644
--- a/arch/x86/power/hibernate_64.c
+++ b/arch/x86/power/hibernate_64.c
@@ -11,8 +11,7 @@
#include <linux/gfp.h>
#include <linux/smp.h>
#include <linux/suspend.h>
-
-#include <asm/init.h>
+#include <linux/export.h>
#include <asm/proto.h>
#include <asm/page.h>
#include <asm/pgtable.h>
@@ -41,21 +40,41 @@ pgd_t *temp_level4_pgt __visible;
void *relocated_restore_code __visible;
-static void *alloc_pgt_page(void *context)
+static int res_phys_pud_init(pud_t *pud, unsigned long address, unsigned long end)
{
- return (void *)get_safe_page(GFP_ATOMIC);
+ long i, j;
+
+ i = pud_index(address);
+ pud = pud + i;
+ for (; i < PTRS_PER_PUD; pud++, i++) {
+ unsigned long paddr;
+ pmd_t *pmd;
+
+ paddr = address + i*PUD_SIZE;
+ if (paddr >= end)
+ break;
+
+ pmd = (pmd_t *)get_safe_page(GFP_ATOMIC);
+ if (!pmd)
+ return -ENOMEM;
+ set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
+ for (j = 0; j < PTRS_PER_PMD; pmd++, j++, paddr += PMD_SIZE) {
+ unsigned long pe;
+
+ if (paddr >= end)
+ break;
+ pe = __PAGE_KERNEL_LARGE_EXEC | paddr;
+ pe &= __supported_pte_mask;
+ set_pmd(pmd, __pmd(pe));
+ }
+ }
+ return 0;
}
static int set_up_temporary_mappings(void)
{
- struct x86_mapping_info info = {
- .alloc_pgt_page = alloc_pgt_page,
- .pmd_flag = __PAGE_KERNEL_LARGE_EXEC,
- .kernel_mapping = true,
- };
- unsigned long mstart, mend;
- int result;
- int i;
+ unsigned long start, end, next;
+ int error;
temp_level4_pgt = (pgd_t *)get_safe_page(GFP_ATOMIC);
if (!temp_level4_pgt)
@@ -66,17 +85,21 @@ static int set_up_temporary_mappings(void)
init_level4_pgt[pgd_index(__START_KERNEL_map)]);
/* Set up the direct mapping from scratch */
- for (i = 0; i < nr_pfn_mapped; i++) {
- mstart = pfn_mapped[i].start << PAGE_SHIFT;
- mend = pfn_mapped[i].end << PAGE_SHIFT;
-
- result = kernel_ident_mapping_init(&info, temp_level4_pgt,
- mstart, mend);
-
- if (result)
- return result;
+ start = (unsigned long)pfn_to_kaddr(0);
+ end = (unsigned long)pfn_to_kaddr(max_pfn);
+
+ for (; start < end; start = next) {
+ pud_t *pud = (pud_t *)get_safe_page(GFP_ATOMIC);
+ if (!pud)
+ return -ENOMEM;
+ next = start + PGDIR_SIZE;
+ if (next > end)
+ next = end;
+ if ((error = res_phys_pud_init(pud, __pa(start), __pa(next))))
+ return error;
+ set_pgd(temp_level4_pgt + pgd_index(start),
+ mk_kernel_pgd(__pa(pud)));
}
-
return 0;
}
@@ -97,6 +120,7 @@ int swsusp_arch_resume(void)
restore_image();
return 0;
}
+EXPORT_SYMBOL_GPL(swsusp_arch_resume);
/*
* pfn_is_nosave - check if given pfn is in the 'nosave' section
@@ -147,3 +171,4 @@ int arch_hibernation_header_restore(void *addr)
restore_cr3 = rdr->cr3;
return (rdr->magic == RESTORE_MAGIC) ? 0 : -EINVAL;
}
+EXPORT_SYMBOL_GPL(arch_hibernation_header_restore);
diff --git a/block/Kconfig.iosched b/block/Kconfig.iosched
index 421bef9..0ee5f0f 100644
--- a/block/Kconfig.iosched
+++ b/block/Kconfig.iosched
@@ -39,6 +39,27 @@ config CFQ_GROUP_IOSCHED
---help---
Enable group IO scheduling in CFQ.
+config IOSCHED_BFQ
+ tristate "BFQ I/O scheduler"
+ default n
+ ---help---
+ The BFQ I/O scheduler tries to distribute bandwidth among
+ all processes according to their weights.
+ It aims at distributing the bandwidth as desired, independently of
+ the disk parameters and with any workload. It also tries to
+ guarantee low latency to interactive and soft real-time
+ applications. If compiled built-in (saying Y here), BFQ can
+ be configured to support hierarchical scheduling.
+
+config CGROUP_BFQIO
+ bool "BFQ hierarchical scheduling support"
+ depends on CGROUPS && IOSCHED_BFQ=y
+ default n
+ ---help---
+ Enable hierarchical scheduling in BFQ, using the cgroups
+ filesystem interface. The name of the subsystem will be
+ bfqio.
+
choice
prompt "Default I/O scheduler"
default DEFAULT_CFQ
@@ -52,6 +73,16 @@ choice
config DEFAULT_CFQ
bool "CFQ" if IOSCHED_CFQ=y
+ config DEFAULT_BFQ
+ bool "BFQ" if IOSCHED_BFQ=y
+ help
+ Selects BFQ as the default I/O scheduler which will be
+ used by default for all block devices.
+ The BFQ I/O scheduler aims at distributing the bandwidth
+ as desired, independently of the disk parameters and with
+ any workload. It also tries to guarantee low latency to
+ interactive and soft real-time applications.
+
config DEFAULT_NOOP
bool "No-op"
@@ -61,6 +92,7 @@ config DEFAULT_IOSCHED
string
default "deadline" if DEFAULT_DEADLINE
default "cfq" if DEFAULT_CFQ
+ default "bfq" if DEFAULT_BFQ
default "noop" if DEFAULT_NOOP
endmenu
diff --git a/block/Makefile b/block/Makefile
index 20645e8..4357141 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -7,7 +7,7 @@ obj-$(CONFIG_BLOCK) := elevator.o blk-core.o blk-tag.o blk-sysfs.o \
blk-exec.o blk-merge.o blk-softirq.o blk-timeout.o \
blk-iopoll.o blk-lib.o blk-mq.o blk-mq-tag.o \
blk-mq-sysfs.o blk-mq-cpu.o blk-mq-cpumap.o ioctl.o \
- genhd.o scsi_ioctl.o partition-generic.o partitions/
+ uuid.o genhd.o scsi_ioctl.o partition-generic.o partitions/
obj-$(CONFIG_BLK_DEV_BSG) += bsg.o
obj-$(CONFIG_BLK_DEV_BSGLIB) += bsg-lib.o
@@ -16,6 +16,7 @@ obj-$(CONFIG_BLK_DEV_THROTTLING) += blk-throttle.o
obj-$(CONFIG_IOSCHED_NOOP) += noop-iosched.o
obj-$(CONFIG_IOSCHED_DEADLINE) += deadline-iosched.o
obj-$(CONFIG_IOSCHED_CFQ) += cfq-iosched.o
+obj-$(CONFIG_IOSCHED_BFQ) += bfq-iosched.o
obj-$(CONFIG_BLOCK_COMPAT) += compat_ioctl.o
obj-$(CONFIG_BLK_DEV_INTEGRITY) += blk-integrity.o
diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
new file mode 100644
index 0000000..b7edce0
--- /dev/null
+++ b/block/bfq-cgroup.c
@@ -0,0 +1,925 @@
+/*
+ * BFQ: CGROUPS support.
+ *
+ * Based on ideas and code from CFQ:
+ * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
+ *
+ * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it>
+ * Paolo Valente <paolo.valente@unimore.it>
+ *
+ * Copyright (C) 2010 Paolo Valente <paolo.valente@unimore.it>
+ *
+ * Licensed under the GPL-2 as detailed in the accompanying COPYING.BFQ file.
+ */
+
+#ifdef CONFIG_CGROUP_BFQIO
+
+static DEFINE_MUTEX(bfqio_mutex);
+
+static bool bfqio_is_removed(struct bfqio_cgroup *bgrp)
+{
+ return bgrp ? !bgrp->online : false;
+}
+
+static struct bfqio_cgroup bfqio_root_cgroup = {
+ .weight = BFQ_DEFAULT_GRP_WEIGHT,
+ .ioprio = BFQ_DEFAULT_GRP_IOPRIO,
+ .ioprio_class = BFQ_DEFAULT_GRP_CLASS,
+};
+
+static inline void bfq_init_entity(struct bfq_entity *entity,
+ struct bfq_group *bfqg)
+{
+ entity->weight = entity->new_weight;
+ entity->orig_weight = entity->new_weight;
+ entity->ioprio = entity->new_ioprio;
+ entity->ioprio_class = entity->new_ioprio_class;
+ entity->parent = bfqg->my_entity;
+ entity->sched_data = &bfqg->sched_data;
+}
+
+static struct bfqio_cgroup *css_to_bfqio(struct cgroup_subsys_state *css)
+{
+ return css ? container_of(css, struct bfqio_cgroup, css) : NULL;
+}
+
+/*
+ * Search the bfq_group for bfqd into the hash table (by now only a list)
+ * of bgrp. Must be called under rcu_read_lock().
+ */
+static struct bfq_group *bfqio_lookup_group(struct bfqio_cgroup *bgrp,
+ struct bfq_data *bfqd)
+{
+ struct bfq_group *bfqg;
+ void *key;
+
+ hlist_for_each_entry_rcu(bfqg, &bgrp->group_data, group_node) {
+ key = rcu_dereference(bfqg->bfqd);
+ if (key == bfqd)
+ return bfqg;
+ }
+
+ return NULL;
+}
+
+static inline void bfq_group_init_entity(struct bfqio_cgroup *bgrp,
+ struct bfq_group *bfqg)
+{
+ struct bfq_entity *entity = &bfqg->entity;
+
+ /*
+ * If the weight of the entity has never been set via the sysfs
+ * interface, then bgrp->weight == 0. In this case we initialize
+ * the weight from the current ioprio value. Otherwise, the group
+ * weight, if set, has priority over the ioprio value.
+ */
+ if (bgrp->weight == 0) {
+ entity->new_weight = bfq_ioprio_to_weight(bgrp->ioprio);
+ entity->new_ioprio = bgrp->ioprio;
+ } else {
+ entity->new_weight = bgrp->weight;
+ entity->new_ioprio = bfq_weight_to_ioprio(bgrp->weight);
+ }
+ entity->orig_weight = entity->weight = entity->new_weight;
+ entity->ioprio = entity->new_ioprio;
+ entity->ioprio_class = entity->new_ioprio_class = bgrp->ioprio_class;
+ entity->my_sched_data = &bfqg->sched_data;
+ bfqg->active_entities = 0;
+}
+
+static inline void bfq_group_set_parent(struct bfq_group *bfqg,
+ struct bfq_group *parent)
+{
+ struct bfq_entity *entity;
+
+ BUG_ON(parent == NULL);
+ BUG_ON(bfqg == NULL);
+
+ entity = &bfqg->entity;
+ entity->parent = parent->my_entity;
+ entity->sched_data = &parent->sched_data;
+}
+
+/**
+ * bfq_group_chain_alloc - allocate a chain of groups.
+ * @bfqd: queue descriptor.
+ * @css: the leaf cgroup_subsys_state this chain starts from.
+ *
+ * Allocate a chain of groups starting from the one belonging to
+ * @cgroup up to the root cgroup. Stop if a cgroup on the chain
+ * to the root has already an allocated group on @bfqd.
+ */
+static struct bfq_group *bfq_group_chain_alloc(struct bfq_data *bfqd,
+ struct cgroup_subsys_state *css)
+{
+ struct bfqio_cgroup *bgrp;
+ struct bfq_group *bfqg, *prev = NULL, *leaf = NULL;
+
+ for (; css != NULL; css = css->parent) {
+ bgrp = css_to_bfqio(css);
+
+ bfqg = bfqio_lookup_group(bgrp, bfqd);
+ if (bfqg != NULL) {
+ /*
+ * All the cgroups in the path from there to the
+ * root must have a bfq_group for bfqd, so we don't
+ * need any more allocations.
+ */
+ break;
+ }
+
+ bfqg = kzalloc(sizeof(*bfqg), GFP_ATOMIC);
+ if (bfqg == NULL)
+ goto cleanup;
+
+ bfq_group_init_entity(bgrp, bfqg);
+ bfqg->my_entity = &bfqg->entity;
+
+ if (leaf == NULL) {
+ leaf = bfqg;
+ prev = leaf;
+ } else {
+ bfq_group_set_parent(prev, bfqg);
+ /*
+ * Build a list of allocated nodes using the bfqd
+ * filed, that is still unused and will be initialized
+ * only after the node will be connected.
+ */
+ prev->bfqd = bfqg;
+ prev = bfqg;
+ }
+ }
+
+ return leaf;
+
+cleanup:
+ while (leaf != NULL) {
+ prev = leaf;
+ leaf = leaf->bfqd;
+ kfree(prev);
+ }
+
+ return NULL;
+}
+
+/**
+ * bfq_group_chain_link - link an allocated group chain to a cgroup hierarchy.
+ * @bfqd: the queue descriptor.
+ * @css: the leaf cgroup_subsys_state to start from.
+ * @leaf: the leaf group (to be associated to @cgroup).
+ *
+ * Try to link a chain of groups to a cgroup hierarchy, connecting the
+ * nodes bottom-up, so we can be sure that when we find a cgroup in the
+ * hierarchy that already has a group associated to @bfqd, all the nodes
+ * in the path to the root cgroup have one too.
+ *
+ * On locking: the queue lock protects the hierarchy (there is a hierarchy
+ * per device) while the bfqio_cgroup lock protects the list of groups
+ * belonging to the same cgroup.
+ */
+static void bfq_group_chain_link(struct bfq_data *bfqd,
+ struct cgroup_subsys_state *css,
+ struct bfq_group *leaf)
+{
+ struct bfqio_cgroup *bgrp;
+ struct bfq_group *bfqg, *next, *prev = NULL;
+ unsigned long flags;
+
+ assert_spin_locked(bfqd->queue->queue_lock);
+
+ for (; css != NULL && leaf != NULL; css = css->parent) {
+ bgrp = css_to_bfqio(css);
+ next = leaf->bfqd;
+
+ bfqg = bfqio_lookup_group(bgrp, bfqd);
+ BUG_ON(bfqg != NULL);
+
+ spin_lock_irqsave(&bgrp->lock, flags);
+
+ rcu_assign_pointer(leaf->bfqd, bfqd);
+ hlist_add_head_rcu(&leaf->group_node, &bgrp->group_data);
+ hlist_add_head(&leaf->bfqd_node, &bfqd->group_list);
+
+ spin_unlock_irqrestore(&bgrp->lock, flags);
+
+ prev = leaf;
+ leaf = next;
+ }
+
+ BUG_ON(css == NULL && leaf != NULL);
+ if (css != NULL && prev != NULL) {
+ bgrp = css_to_bfqio(css);
+ bfqg = bfqio_lookup_group(bgrp, bfqd);
+ bfq_group_set_parent(prev, bfqg);
+ }
+}
+
+/**
+ * bfq_find_alloc_group - return the group associated to @bfqd in @cgroup.
+ * @bfqd: queue descriptor.
+ * @cgroup: cgroup being searched for.
+ *
+ * Return a group associated to @bfqd in @cgroup, allocating one if
+ * necessary. When a group is returned all the cgroups in the path
+ * to the root have a group associated to @bfqd.
+ *
+ * If the allocation fails, return the root group: this breaks guarantees
+ * but is a safe fallback. If this loss becomes a problem it can be
+ * mitigated using the equivalent weight (given by the product of the
+ * weights of the groups in the path from @group to the root) in the
+ * root scheduler.
+ *
+ * We allocate all the missing nodes in the path from the leaf cgroup
+ * to the root and we connect the nodes only after all the allocations
+ * have been successful.
+ */
+static struct bfq_group *bfq_find_alloc_group(struct bfq_data *bfqd,
+ struct cgroup_subsys_state *css)
+{
+ struct bfqio_cgroup *bgrp = css_to_bfqio(css);
+ struct bfq_group *bfqg;
+
+ bfqg = bfqio_lookup_group(bgrp, bfqd);
+ if (bfqg != NULL)
+ return bfqg;
+
+ bfqg = bfq_group_chain_alloc(bfqd, css);
+ if (bfqg != NULL)
+ bfq_group_chain_link(bfqd, css, bfqg);
+ else
+ bfqg = bfqd->root_group;
+
+ return bfqg;
+}
+
+/**
+ * bfq_bfqq_move - migrate @bfqq to @bfqg.
+ * @bfqd: queue descriptor.
+ * @bfqq: the queue to move.
+ * @entity: @bfqq's entity.
+ * @bfqg: the group to move to.
+ *
+ * Move @bfqq to @bfqg, deactivating it from its old group and reactivating
+ * it on the new one. Avoid putting the entity on the old group idle tree.
+ *
+ * Must be called under the queue lock; the cgroup owning @bfqg must
+ * not disappear (by now this just means that we are called under
+ * rcu_read_lock()).
+ */
+static void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
+ struct bfq_entity *entity, struct bfq_group *bfqg)
+{
+ int busy, resume;
+
+ busy = bfq_bfqq_busy(bfqq);
+ resume = !RB_EMPTY_ROOT(&bfqq->sort_list);
+
+ BUG_ON(resume && !entity->on_st);
+ BUG_ON(busy && !resume && entity->on_st &&
+ bfqq != bfqd->in_service_queue);
+
+ if (busy) {
+ BUG_ON(atomic_read(&bfqq->ref) < 2);
+
+ if (!resume)
+ bfq_del_bfqq_busy(bfqd, bfqq, 0);
+ else
+ bfq_deactivate_bfqq(bfqd, bfqq, 0);
+ } else if (entity->on_st)
+ bfq_put_idle_entity(bfq_entity_service_tree(entity), entity);
+
+ /*
+ * Here we use a reference to bfqg. We don't need a refcounter
+ * as the cgroup reference will not be dropped, so that its
+ * destroy() callback will not be invoked.
+ */
+ entity->parent = bfqg->my_entity;
+ entity->sched_data = &bfqg->sched_data;
+
+ if (busy && resume)
+ bfq_activate_bfqq(bfqd, bfqq);
+
+ if (bfqd->in_service_queue == NULL && !bfqd->rq_in_driver)
+ bfq_schedule_dispatch(bfqd);
+}
+
+/**
+ * __bfq_bic_change_cgroup - move @bic to @cgroup.
+ * @bfqd: the queue descriptor.
+ * @bic: the bic to move.
+ * @cgroup: the cgroup to move to.
+ *
+ * Move bic to cgroup, assuming that bfqd->queue is locked; the caller
+ * has to make sure that the reference to cgroup is valid across the call.
+ *
+ * NOTE: an alternative approach might have been to store the current
+ * cgroup in bfqq and getting a reference to it, reducing the lookup
+ * time here, at the price of slightly more complex code.
+ */
+static struct bfq_group *__bfq_bic_change_cgroup(struct bfq_data *bfqd,
+ struct bfq_io_cq *bic,
+ struct cgroup_subsys_state *css)
+{
+ struct bfq_queue *async_bfqq = bic_to_bfqq(bic, 0);
+ struct bfq_queue *sync_bfqq = bic_to_bfqq(bic, 1);
+ struct bfq_entity *entity;
+ struct bfq_group *bfqg;
+ struct bfqio_cgroup *bgrp;
+
+ bgrp = css_to_bfqio(css);
+
+ bfqg = bfq_find_alloc_group(bfqd, css);
+ if (async_bfqq != NULL) {
+ entity = &async_bfqq->entity;
+
+ if (entity->sched_data != &bfqg->sched_data) {
+ bic_set_bfqq(bic, NULL, 0);
+ bfq_log_bfqq(bfqd, async_bfqq,
+ "bic_change_group: %p %d",
+ async_bfqq, atomic_read(&async_bfqq->ref));
+ bfq_put_queue(async_bfqq);
+ }
+ }
+
+ if (sync_bfqq != NULL) {
+ entity = &sync_bfqq->entity;
+ if (entity->sched_data != &bfqg->sched_data)
+ bfq_bfqq_move(bfqd, sync_bfqq, entity, bfqg);
+ }
+
+ return bfqg;
+}
+
+/**
+ * bfq_bic_change_cgroup - move @bic to @cgroup.
+ * @bic: the bic being migrated.
+ * @cgroup: the destination cgroup.
+ *
+ * When the task owning @bic is moved to @cgroup, @bic is immediately
+ * moved into its new parent group.
+ */
+static void bfq_bic_change_cgroup(struct bfq_io_cq *bic,
+ struct cgroup_subsys_state *css)
+{
+ struct bfq_data *bfqd;
+ unsigned long uninitialized_var(flags);
+
+ bfqd = bfq_get_bfqd_locked(&(bic->icq.q->elevator->elevator_data),
+ &flags);
+ if (bfqd != NULL) {
+ __bfq_bic_change_cgroup(bfqd, bic, css);
+ bfq_put_bfqd_unlock(bfqd, &flags);
+ }
+}
+
+/**
+ * bfq_bic_update_cgroup - update the cgroup of @bic.
+ * @bic: the @bic to update.
+ *
+ * Make sure that @bic is enqueued in the cgroup of the current task.
+ * We need this in addition to moving bics during the cgroup attach
+ * phase because the task owning @bic could be at its first disk
+ * access or we may end up in the root cgroup as the result of a
+ * memory allocation failure and here we try to move to the right
+ * group.
+ *
+ * Must be called under the queue lock. It is safe to use the returned
+ * value even after the rcu_read_unlock() as the migration/destruction
+ * paths act under the queue lock too. IOW it is impossible to race with
+ * group migration/destruction and end up with an invalid group as:
+ * a) here cgroup has not yet been destroyed, nor its destroy callback
+ * has started execution, as current holds a reference to it,
+ * b) if it is destroyed after rcu_read_unlock() [after current is
+ * migrated to a different cgroup] its attach() callback will have
+ * taken care of removing all the references to the old cgroup data.
+ */
+static struct bfq_group *bfq_bic_update_cgroup(struct bfq_io_cq *bic)
+{
+ struct bfq_data *bfqd = bic_to_bfqd(bic);
+ struct bfq_group *bfqg;
+ struct cgroup_subsys_state *css;
+
+ BUG_ON(bfqd == NULL);
+
+ rcu_read_lock();
+ css = task_css(current, bfqio_cgrp_id);
+ bfqg = __bfq_bic_change_cgroup(bfqd, bic, css);
+ rcu_read_unlock();
+
+ return bfqg;
+}
+
+/**
+ * bfq_flush_idle_tree - deactivate any entity on the idle tree of @st.
+ * @st: the service tree being flushed.
+ */
+static inline void bfq_flush_idle_tree(struct bfq_service_tree *st)
+{
+ struct bfq_entity *entity = st->first_idle;
+
+ for (; entity != NULL; entity = st->first_idle)
+ __bfq_deactivate_entity(entity, 0);
+}
+
+/**
+ * bfq_reparent_leaf_entity - move leaf entity to the root_group.
+ * @bfqd: the device data structure with the root group.
+ * @entity: the entity to move.
+ */
+static inline void bfq_reparent_leaf_entity(struct bfq_data *bfqd,
+ struct bfq_entity *entity)
+{
+ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
+
+ BUG_ON(bfqq == NULL);
+ bfq_bfqq_move(bfqd, bfqq, entity, bfqd->root_group);
+ return;
+}
+
+/**
+ * bfq_reparent_active_entities - move to the root group all active entities.
+ * @bfqd: the device data structure with the root group.
+ * @bfqg: the group to move from.
+ * @st: the service tree with the entities.
+ *
+ * Needs queue_lock to be taken and reference to be valid over the call.
+ */
+static inline void bfq_reparent_active_entities(struct bfq_data *bfqd,
+ struct bfq_group *bfqg,
+ struct bfq_service_tree *st)
+{
+ struct rb_root *active = &st->active;
+ struct bfq_entity *entity = NULL;
+
+ if (!RB_EMPTY_ROOT(&st->active))
+ entity = bfq_entity_of(rb_first(active));
+
+ for (; entity != NULL; entity = bfq_entity_of(rb_first(active)))
+ bfq_reparent_leaf_entity(bfqd, entity);
+
+ if (bfqg->sched_data.in_service_entity != NULL)
+ bfq_reparent_leaf_entity(bfqd,
+ bfqg->sched_data.in_service_entity);
+
+ return;
+}
+
+/**
+ * bfq_destroy_group - destroy @bfqg.
+ * @bgrp: the bfqio_cgroup containing @bfqg.
+ * @bfqg: the group being destroyed.
+ *
+ * Destroy @bfqg, making sure that it is not referenced from its parent.
+ */
+static void bfq_destroy_group(struct bfqio_cgroup *bgrp, struct bfq_group *bfqg)
+{
+ struct bfq_data *bfqd;
+ struct bfq_service_tree *st;
+ struct bfq_entity *entity = bfqg->my_entity;
+ unsigned long uninitialized_var(flags);
+ int i;
+
+ hlist_del(&bfqg->group_node);
+
+ /*
+ * Empty all service_trees belonging to this group before deactivating
+ * the group itself.
+ */
+ for (i = 0; i < BFQ_IOPRIO_CLASSES; i++) {
+ st = bfqg->sched_data.service_tree + i;
+
+ /*
+ * The idle tree may still contain bfq_queues belonging
+ * to exited task because they never migrated to a different
+ * cgroup from the one being destroyed now. No one else
+ * can access them so it's safe to act without any lock.
+ */
+ bfq_flush_idle_tree(st);
+
+ /*
+ * It may happen that some queues are still active
+ * (busy) upon group destruction (if the corresponding
+ * processes have been forced to terminate). We move
+ * all the leaf entities corresponding to these queues
+ * to the root_group.
+ * Also, it may happen that the group has an entity
+ * under service, which is disconnected from the active
+ * tree: it must be moved, too.
+ * There is no need to put the sync queues, as the
+ * scheduler has taken no reference.
+ */
+ bfqd = bfq_get_bfqd_locked(&bfqg->bfqd, &flags);
+ if (bfqd != NULL) {
+ bfq_reparent_active_entities(bfqd, bfqg, st);
+ bfq_put_bfqd_unlock(bfqd, &flags);
+ }
+ BUG_ON(!RB_EMPTY_ROOT(&st->active));
+ BUG_ON(!RB_EMPTY_ROOT(&st->idle));
+ }
+ BUG_ON(bfqg->sched_data.next_in_service != NULL);
+ BUG_ON(bfqg->sched_data.in_service_entity != NULL);
+
+ /*
+ * We may race with device destruction, take extra care when
+ * dereferencing bfqg->bfqd.
+ */
+ bfqd = bfq_get_bfqd_locked(&bfqg->bfqd, &flags);
+ if (bfqd != NULL) {
+ hlist_del(&bfqg->bfqd_node);
+ __bfq_deactivate_entity(entity, 0);
+ bfq_put_async_queues(bfqd, bfqg);
+ bfq_put_bfqd_unlock(bfqd, &flags);
+ }
+ BUG_ON(entity->tree != NULL);
+
+ /*
+ * No need to defer the kfree() to the end of the RCU grace
+ * period: we are called from the destroy() callback of our
+ * cgroup, so we can be sure that no one is a) still using
+ * this cgroup or b) doing lookups in it.
+ */
+ kfree(bfqg);
+}
+
+static void bfq_end_wr_async(struct bfq_data *bfqd)
+{
+ struct hlist_node *tmp;
+ struct bfq_group *bfqg;
+
+ hlist_for_each_entry_safe(bfqg, tmp, &bfqd->group_list, bfqd_node)
+ bfq_end_wr_async_queues(bfqd, bfqg);
+ bfq_end_wr_async_queues(bfqd, bfqd->root_group);
+}
+
+/**
+ * bfq_disconnect_groups - disconnect @bfqd from all its groups.
+ * @bfqd: the device descriptor being exited.
+ *
+ * When the device exits we just make sure that no lookup can return
+ * the now unused group structures. They will be deallocated on cgroup
+ * destruction.
+ */
+static void bfq_disconnect_groups(struct bfq_data *bfqd)
+{
+ struct hlist_node *tmp;
+ struct bfq_group *bfqg;
+
+ bfq_log(bfqd, "disconnect_groups beginning");
+ hlist_for_each_entry_safe(bfqg, tmp, &bfqd->group_list, bfqd_node) {
+ hlist_del(&bfqg->bfqd_node);
+
+ __bfq_deactivate_entity(bfqg->my_entity, 0);
+
+ /*
+ * Don't remove from the group hash, just set an
+ * invalid key. No lookups can race with the
+ * assignment as bfqd is being destroyed; this
+ * implies also that new elements cannot be added
+ * to the list.
+ */
+ rcu_assign_pointer(bfqg->bfqd, NULL);
+
+ bfq_log(bfqd, "disconnect_groups: put async for group %p",
+ bfqg);
+ bfq_put_async_queues(bfqd, bfqg);
+ }
+}
+
+static inline void bfq_free_root_group(struct bfq_data *bfqd)
+{
+ struct bfqio_cgroup *bgrp = &bfqio_root_cgroup;
+ struct bfq_group *bfqg = bfqd->root_group;
+
+ bfq_put_async_queues(bfqd, bfqg);
+
+ spin_lock_irq(&bgrp->lock);
+ hlist_del_rcu(&bfqg->group_node);
+ spin_unlock_irq(&bgrp->lock);
+
+ /*
+ * No need to synchronize_rcu() here: since the device is gone
+ * there cannot be any read-side access to its root_group.
+ */
+ kfree(bfqg);
+}
+
+static struct bfq_group *bfq_alloc_root_group(struct bfq_data *bfqd, int node)
+{
+ struct bfq_group *bfqg;
+ struct bfqio_cgroup *bgrp;
+ int i;
+
+ bfqg = kzalloc_node(sizeof(*bfqg), GFP_KERNEL, node);
+ if (bfqg == NULL)
+ return NULL;
+
+ bfqg->entity.parent = NULL;
+ for (i = 0; i < BFQ_IOPRIO_CLASSES; i++)
+ bfqg->sched_data.service_tree[i] = BFQ_SERVICE_TREE_INIT;
+
+ bgrp = &bfqio_root_cgroup;
+ spin_lock_irq(&bgrp->lock);
+ rcu_assign_pointer(bfqg->bfqd, bfqd);
+ hlist_add_head_rcu(&bfqg->group_node, &bgrp->group_data);
+ spin_unlock_irq(&bgrp->lock);
+
+ return bfqg;
+}
+
+#define SHOW_FUNCTION(__VAR) \
+static u64 bfqio_cgroup_##__VAR##_read(struct cgroup_subsys_state *css, \
+ struct cftype *cftype) \
+{ \
+ struct bfqio_cgroup *bgrp = css_to_bfqio(css); \
+ u64 ret = -ENODEV; \
+ \
+ mutex_lock(&bfqio_mutex); \
+ if (bfqio_is_removed(bgrp)) \
+ goto out_unlock; \
+ \
+ spin_lock_irq(&bgrp->lock); \
+ ret = bgrp->__VAR; \
+ spin_unlock_irq(&bgrp->lock); \
+ \
+out_unlock: \
+ mutex_unlock(&bfqio_mutex); \
+ return ret; \
+}
+
+SHOW_FUNCTION(weight);
+SHOW_FUNCTION(ioprio);
+SHOW_FUNCTION(ioprio_class);
+#undef SHOW_FUNCTION
+
+#define STORE_FUNCTION(__VAR, __MIN, __MAX) \
+static int bfqio_cgroup_##__VAR##_write(struct cgroup_subsys_state *css,\
+ struct cftype *cftype, \
+ u64 val) \
+{ \
+ struct bfqio_cgroup *bgrp = css_to_bfqio(css); \
+ struct bfq_group *bfqg; \
+ int ret = -EINVAL; \
+ \
+ if (val < (__MIN) || val > (__MAX)) \
+ return ret; \
+ \
+ ret = -ENODEV; \
+ mutex_lock(&bfqio_mutex); \
+ if (bfqio_is_removed(bgrp)) \
+ goto out_unlock; \
+ ret = 0; \
+ \
+ spin_lock_irq(&bgrp->lock); \
+ bgrp->__VAR = (unsigned short)val; \
+ hlist_for_each_entry(bfqg, &bgrp->group_data, group_node) { \
+ /* \
+ * Setting the ioprio_changed flag of the entity \
+ * to 1 with new_##__VAR == ##__VAR would re-set \
+ * the value of the weight to its ioprio mapping. \
+ * Set the flag only if necessary. \
+ */ \
+ if ((unsigned short)val != bfqg->entity.new_##__VAR) { \
+ bfqg->entity.new_##__VAR = (unsigned short)val; \
+ /* \
+ * Make sure that the above new value has been \
+ * stored in bfqg->entity.new_##__VAR before \
+ * setting the ioprio_changed flag. In fact, \
+ * this flag may be read asynchronously (in \
+ * critical sections protected by a different \
+ * lock than that held here), and finding this \
+ * flag set may cause the execution of the code \
+ * for updating parameters whose value may \
+ * depend also on bfqg->entity.new_##__VAR (in \
+ * __bfq_entity_update_weight_prio). \
+ * This barrier makes sure that the new value \
+ * of bfqg->entity.new_##__VAR is correctly \
+ * seen in that code. \
+ */ \
+ smp_wmb(); \
+ bfqg->entity.ioprio_changed = 1; \
+ } \
+ } \
+ spin_unlock_irq(&bgrp->lock); \
+ \
+out_unlock: \
+ mutex_unlock(&bfqio_mutex); \
+ return ret; \
+}
+
+STORE_FUNCTION(weight, BFQ_MIN_WEIGHT, BFQ_MAX_WEIGHT);
+STORE_FUNCTION(ioprio, 0, IOPRIO_BE_NR - 1);
+STORE_FUNCTION(ioprio_class, IOPRIO_CLASS_RT, IOPRIO_CLASS_IDLE);
+#undef STORE_FUNCTION
+
+static struct cftype bfqio_files[] = {
+ {
+ .name = "weight",
+ .read_u64 = bfqio_cgroup_weight_read,
+ .write_u64 = bfqio_cgroup_weight_write,
+ },
+ {
+ .name = "ioprio",
+ .read_u64 = bfqio_cgroup_ioprio_read,
+ .write_u64 = bfqio_cgroup_ioprio_write,
+ },
+ {
+ .name = "ioprio_class",
+ .read_u64 = bfqio_cgroup_ioprio_class_read,
+ .write_u64 = bfqio_cgroup_ioprio_class_write,
+ },
+ { }, /* terminate */
+};
+
+static struct cgroup_subsys_state *bfqio_create(struct cgroup_subsys_state
+ *parent_css)
+{
+ struct bfqio_cgroup *bgrp;
+
+ if (parent_css != NULL) {
+ bgrp = kzalloc(sizeof(*bgrp), GFP_KERNEL);
+ if (bgrp == NULL)
+ return ERR_PTR(-ENOMEM);
+ } else
+ bgrp = &bfqio_root_cgroup;
+
+ spin_lock_init(&bgrp->lock);
+ INIT_HLIST_HEAD(&bgrp->group_data);
+ bgrp->ioprio = BFQ_DEFAULT_GRP_IOPRIO;
+ bgrp->ioprio_class = BFQ_DEFAULT_GRP_CLASS;
+
+ return &bgrp->css;
+}
+
+/*
+ * We cannot support shared io contexts, as we have no means to support
+ * two tasks with the same ioc in two different groups without major rework
+ * of the main bic/bfqq data structures. By now we allow a task to change
+ * its cgroup only if it's the only owner of its ioc; the drawback of this
+ * behavior is that a group containing a task that forked using CLONE_IO
+ * will not be destroyed until the tasks sharing the ioc die.
+ */
+static int bfqio_can_attach(struct cgroup_subsys_state *css,
+ struct cgroup_taskset *tset)
+{
+ struct task_struct *task;
+ struct io_context *ioc;
+ int ret = 0;
+
+ cgroup_taskset_for_each(task, tset) {
+ /*
+ * task_lock() is needed to avoid races with
+ * exit_io_context()
+ */
+ task_lock(task);
+ ioc = task->io_context;
+ if (ioc != NULL && atomic_read(&ioc->nr_tasks) > 1)
+ /*
+ * ioc == NULL means that the task is either too young
+ * or exiting: if it has still no ioc the ioc can't be
+ * shared, if the task is exiting the attach will fail
+ * anyway, no matter what we return here.
+ */
+ ret = -EINVAL;
+ task_unlock(task);
+ if (ret)
+ break;
+ }
+
+ return ret;
+}
+
+static void bfqio_attach(struct cgroup_subsys_state *css,
+ struct cgroup_taskset *tset)
+{
+ struct task_struct *task;
+ struct io_context *ioc;
+ struct io_cq *icq;
+
+ /*
+ * IMPORTANT NOTE: The move of more than one process at a time to a
+ * new group has not yet been tested.
+ */
+ cgroup_taskset_for_each(task, tset) {
+ ioc = get_task_io_context(task, GFP_ATOMIC, NUMA_NO_NODE);
+ if (ioc) {
+ /*
+ * Handle cgroup change here.
+ */
+ rcu_read_lock();
+ hlist_for_each_entry_rcu(icq, &ioc->icq_list, ioc_node)
+ if (!strncmp(
+ icq->q->elevator->type->elevator_name,
+ "bfq", ELV_NAME_MAX))
+ bfq_bic_change_cgroup(icq_to_bic(icq),
+ css);
+ rcu_read_unlock();
+ put_io_context(ioc);
+ }
+ }
+}
+
+static void bfqio_destroy(struct cgroup_subsys_state *css)
+{
+ struct bfqio_cgroup *bgrp = css_to_bfqio(css);
+ struct hlist_node *tmp;
+ struct bfq_group *bfqg;
+
+ /*
+ * Since we are destroying the cgroup, there are no more tasks
+ * referencing it, and all the RCU grace periods that may have
+ * referenced it are ended (as the destruction of the parent
+ * cgroup is RCU-safe); bgrp->group_data will not be accessed by
+ * anything else and we don't need any synchronization.
+ */
+ hlist_for_each_entry_safe(bfqg, tmp, &bgrp->group_data, group_node)
+ bfq_destroy_group(bgrp, bfqg);
+
+ BUG_ON(!hlist_empty(&bgrp->group_data));
+
+ kfree(bgrp);
+}
+
+static int bfqio_css_online(struct cgroup_subsys_state *css)
+{
+ struct bfqio_cgroup *bgrp = css_to_bfqio(css);
+
+ mutex_lock(&bfqio_mutex);
+ bgrp->online = true;
+ mutex_unlock(&bfqio_mutex);
+
+ return 0;
+}
+
+static void bfqio_css_offline(struct cgroup_subsys_state *css)
+{
+ struct bfqio_cgroup *bgrp = css_to_bfqio(css);
+
+ mutex_lock(&bfqio_mutex);
+ bgrp->online = false;
+ mutex_unlock(&bfqio_mutex);
+}
+
+struct cgroup_subsys bfqio_cgrp_subsys = {
+ .css_alloc = bfqio_create,
+ .css_online = bfqio_css_online,
+ .css_offline = bfqio_css_offline,
+ .can_attach = bfqio_can_attach,
+ .attach = bfqio_attach,
+ .css_free = bfqio_destroy,
+ .base_cftypes = bfqio_files,
+};
+#else
+static inline void bfq_init_entity(struct bfq_entity *entity,
+ struct bfq_group *bfqg)
+{
+ entity->weight = entity->new_weight;
+ entity->orig_weight = entity->new_weight;
+ entity->ioprio = entity->new_ioprio;
+ entity->ioprio_class = entity->new_ioprio_class;
+ entity->sched_data = &bfqg->sched_data;
+}
+
+static inline struct bfq_group *
+bfq_bic_update_cgroup(struct bfq_io_cq *bic)
+{
+ struct bfq_data *bfqd = bic_to_bfqd(bic);
+ return bfqd->root_group;
+}
+
+static inline void bfq_bfqq_move(struct bfq_data *bfqd,
+ struct bfq_queue *bfqq,
+ struct bfq_entity *entity,
+ struct bfq_group *bfqg)
+{
+}
+
+static void bfq_end_wr_async(struct bfq_data *bfqd)
+{
+ bfq_end_wr_async_queues(bfqd, bfqd->root_group);
+}
+
+static inline void bfq_disconnect_groups(struct bfq_data *bfqd)
+{
+ bfq_put_async_queues(bfqd, bfqd->root_group);
+}
+
+static inline void bfq_free_root_group(struct bfq_data *bfqd)
+{
+ kfree(bfqd->root_group);
+}
+
+static struct bfq_group *bfq_alloc_root_group(struct bfq_data *bfqd, int node)
+{
+ struct bfq_group *bfqg;
+ int i;
+
+ bfqg = kmalloc_node(sizeof(*bfqg), GFP_KERNEL | __GFP_ZERO, node);
+ if (bfqg == NULL)
+ return NULL;
+
+ for (i = 0; i < BFQ_IOPRIO_CLASSES; i++)
+ bfqg->sched_data.service_tree[i] = BFQ_SERVICE_TREE_INIT;
+
+ return bfqg;
+}
+#endif
diff --git a/block/bfq-ioc.c b/block/bfq-ioc.c
new file mode 100644
index 0000000..7f6b000
--- /dev/null
+++ b/block/bfq-ioc.c
@@ -0,0 +1,36 @@
+/*
+ * BFQ: I/O context handling.
+ *
+ * Based on ideas and code from CFQ:
+ * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
+ *
+ * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it>
+ * Paolo Valente <paolo.valente@unimore.it>
+ *
+ * Copyright (C) 2010 Paolo Valente <paolo.valente@unimore.it>
+ */
+
+/**
+ * icq_to_bic - convert iocontext queue structure to bfq_io_cq.
+ * @icq: the iocontext queue.
+ */
+static inline struct bfq_io_cq *icq_to_bic(struct io_cq *icq)
+{
+ /* bic->icq is the first member, %NULL will convert to %NULL */
+ return container_of(icq, struct bfq_io_cq, icq);
+}
+
+/**
+ * bfq_bic_lookup - search into @ioc a bic associated to @bfqd.
+ * @bfqd: the lookup key.
+ * @ioc: the io_context of the process doing I/O.
+ *
+ * Queue lock must be held.
+ */
+static inline struct bfq_io_cq *bfq_bic_lookup(struct bfq_data *bfqd,
+ struct io_context *ioc)
+{
+ if (ioc)
+ return icq_to_bic(ioc_lookup_icq(ioc, bfqd->queue));
+ return NULL;
+}
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
new file mode 100644
index 0000000..77162be
--- /dev/null
+++ b/block/bfq-iosched.c
@@ -0,0 +1,3843 @@
+/*
+ * Budget Fair Queueing (BFQ) disk scheduler.
+ *
+ * Based on ideas and code from CFQ:
+ * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
+ *
+ * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it>
+ * Paolo Valente <paolo.valente@unimore.it>
+ *
+ * Copyright (C) 2010 Paolo Valente <paolo.valente@unimore.it>
+ *
+ * Licensed under the GPL-2 as detailed in the accompanying COPYING.BFQ file.
+ *
+ * BFQ is a proportional share disk scheduling algorithm based on the
+ * slice-by-slice service scheme of CFQ. But BFQ assigns budgets, measured in
+ * number of sectors, to tasks instead of time slices. The disk is not granted
+ * to the in-service task for a given time slice, but until it has exhausted
+ * its assigned budget. This change from the time to the service domain allows
+ * BFQ to distribute the disk bandwidth among tasks as desired, without any
+ * distortion due to ZBR, workload fluctuations or other factors. BFQ uses an
+ * ad hoc internal scheduler, called B-WF2Q+, to schedule tasks according to
+ * their budgets (more precisely BFQ schedules queues associated to tasks).
+ * Thanks to this accurate scheduler, BFQ can afford to assign high budgets to
+ * disk-bound non-seeky tasks (to boost the throughput), and yet guarantee low
+ * latencies to interactive and soft real-time applications.
+ *
+ * BFQ is described in [1], where also a reference to the initial, more
+ * theoretical paper on BFQ can be found. The interested reader can find in
+ * the latter paper full details on the main algorithm as well as formulas of
+ * the guarantees, plus formal proofs of all the properties. With respect to
+ * the version of BFQ presented in these papers, this implementation adds a
+ * few more heuristics, such as the one that guarantees a low latency to soft
+ * real-time applications, and a hierarchical extension based on H-WF2Q+.
+ *
+ * B-WF2Q+ is based on WF2Q+, that is described in [2], together with
+ * H-WF2Q+, while the augmented tree used to implement B-WF2Q+ with O(log N)
+ * complexity derives from the one introduced with EEVDF in [3].
+ *
+ * [1] P. Valente and M. Andreolini, ``Improving Application Responsiveness
+ * with the BFQ Disk I/O Scheduler'',
+ * Proceedings of the 5th Annual International Systems and Storage
+ * Conference (SYSTOR '12), June 2012.
+ *
+ * http://algogroup.unimo.it/people/paolo/disk_sched/bf1-v1-suite-results.pdf
+ *
+ * [2] Jon C.R. Bennett and H. Zhang, ``Hierarchical Packet Fair Queueing
+ * Algorithms,'' IEEE/ACM Transactions on Networking, 5(5):675-689,
+ * Oct 1997.
+ *
+ * http://www.cs.cmu.edu/~hzhang/papers/TON-97-Oct.ps.gz
+ *
+ * [3] I. Stoica and H. Abdel-Wahab, ``Earliest Eligible Virtual Deadline
+ * First: A Flexible and Accurate Mechanism for Proportional Share
+ * Resource Allocation,'' technical report.
+ *
+ * http://www.cs.berkeley.edu/~istoica/papers/eevdf-tr-95.pdf
+ */
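+/*
+ * Usage note: once built, the scheduler registers under the elevator name
+ * "bfq", so it can typically be selected per device with something like
+ * "echo bfq > /sys/block/<dev>/queue/scheduler", made the system default
+ * through CONFIG_DEFAULT_BFQ, or requested at boot with "elevator=bfq".
+ */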
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/blkdev.h>
+#include <linux/cgroup.h>
+#include <linux/elevator.h>
+#include <linux/jiffies.h>
+#include <linux/rbtree.h>
+#include <linux/ioprio.h>
+#include "bfq.h"
+#include "blk.h"
+
+/* Max number of dispatches in one round of service. */
+static const int bfq_quantum = 4;
+
+/* Expiration time of sync (0) and async (1) requests, in jiffies. */
+static const int bfq_fifo_expire[2] = { HZ / 4, HZ / 8 };
+
+/* Maximum backwards seek, in KiB. */
+static const int bfq_back_max = 16 * 1024;
+
+/* Penalty of a backwards seek, in number of sectors. */
+static const int bfq_back_penalty = 2;
+
+/* Idling period duration, in jiffies. */
+static int bfq_slice_idle = HZ / 125;
+
+/* Default maximum budget values, in sectors and number of requests. */
+static const int bfq_default_max_budget = 16 * 1024;
+static const int bfq_max_budget_async_rq = 4;
+
+/*
+ * Async to sync throughput distribution is controlled as follows:
+ * when an async request is served, the entity is charged the number
+ * of sectors of the request, multiplied by the factor below
+ */
+static const int bfq_async_charge_factor = 10;
+
+/* Default timeout values, in jiffies, approximating CFQ defaults. */
+static const int bfq_timeout_sync = HZ / 8;
+static int bfq_timeout_async = HZ / 25;
+
+struct kmem_cache *bfq_pool;
+
+/* Below this threshold (in ms), we consider thinktime immediate. */
+#define BFQ_MIN_TT 2
+
+/* hw_tag detection: parallel requests threshold and min samples needed. */
+#define BFQ_HW_QUEUE_THRESHOLD 4
+#define BFQ_HW_QUEUE_SAMPLES 32
+
+#define BFQQ_SEEK_THR (sector_t)(8 * 1024)
+#define BFQQ_SEEKY(bfqq) ((bfqq)->seek_mean > BFQQ_SEEK_THR)
+
+/* Min samples used for peak rate estimation (for autotuning). */
+#define BFQ_PEAK_RATE_SAMPLES 32
+
+/* Shift used for peak rate fixed precision calculations. */
+#define BFQ_RATE_SHIFT 16
+
+/*
+ * By default, BFQ computes the duration of the weight raising for interactive
+ * applications automatically, using the following formula:
+ * duration = (R / r) * T, where r is the peak rate of the device, and R and T
+ * are two reference parameters.
+ * In particular, R is the peak rate of the reference device (see below), and T
+ * is a reference time: given the systems that are likely to be installed on
+ * the reference device according to its speed class, T is about the maximum
+ * time needed, under BFQ and while reading two files in parallel, to load
+ * typical large applications on these systems.
+ * In practice, the slower/faster the device at hand is, the more/less it takes
+ * to load applications with respect to the reference device. Accordingly, the
+ * longer/shorter BFQ grants weight raising to interactive applications.
+ *
+ * BFQ uses four different reference pairs (R, T), depending on:
+ * . whether the device is rotational or non-rotational;
+ * . whether the device is slow, such as old or portable HDDs, as well as
+ * SD cards, or fast, such as newer HDDs and SSDs.
+ *
+ * The device's speed class is dynamically (re)detected in
+ * bfq_update_peak_rate() every time the estimated peak rate is updated.
+ *
+ * In the following definitions, R_slow[0]/R_fast[0] and T_slow[0]/T_fast[0]
+ * are the reference values for a slow/fast rotational device, whereas
+ * R_slow[1]/R_fast[1] and T_slow[1]/T_fast[1] are the reference values for
+ * a slow/fast non-rotational device. Finally, device_speed_thresh are the
+ * thresholds used to switch between speed classes.
+ * Both the reference peak rates and the thresholds are measured in
+ * sectors/usec, left-shifted by BFQ_RATE_SHIFT.
+ */
+static int R_slow[2] = {1536, 10752};
+static int R_fast[2] = {17415, 34791};
+/*
+ * To improve readability, a conversion function is used to initialize the
+ * following arrays, which entails that the latter can be initialized only
+ * in a function.
+ */
+static int T_slow[2];
+static int T_fast[2];
+static int device_speed_thresh[2];
+
+#define BFQ_SERVICE_TREE_INIT ((struct bfq_service_tree) \
+ { RB_ROOT, RB_ROOT, NULL, NULL, 0, 0 })
+
+#define RQ_BIC(rq) ((struct bfq_io_cq *) (rq)->elv.priv[0])
+#define RQ_BFQQ(rq) ((rq)->elv.priv[1])
+
+static inline void bfq_schedule_dispatch(struct bfq_data *bfqd);
+
+#include "bfq-ioc.c"
+#include "bfq-sched.c"
+#include "bfq-cgroup.c"
+
+#define bfq_class_idle(bfqq) ((bfqq)->entity.ioprio_class ==\
+ IOPRIO_CLASS_IDLE)
+#define bfq_class_rt(bfqq) ((bfqq)->entity.ioprio_class ==\
+ IOPRIO_CLASS_RT)
+
+#define bfq_sample_valid(samples) ((samples) > 80)
+
+/*
+ * We regard a request as SYNC if it is either a read or has the SYNC bit
+ * set (in which case it could also be a direct WRITE).
+ */
+static inline int bfq_bio_sync(struct bio *bio)
+{
+ if (bio_data_dir(bio) == READ || (bio->bi_rw & REQ_SYNC))
+ return 1;
+
+ return 0;
+}
+
+/*
+ * Scheduler run of queue, if there are requests pending and no one in the
+ * driver that will restart queueing.
+ */
+static inline void bfq_schedule_dispatch(struct bfq_data *bfqd)
+{
+ if (bfqd->queued != 0) {
+ bfq_log(bfqd, "schedule dispatch");
+ kblockd_schedule_work(bfqd->queue, &bfqd->unplug_work);
+ }
+}
+
+/*
+ * Lifted from AS - choose which of rq1 and rq2 is best served now.
+ * We choose the request that is closest to the head right now. Distance
+ * behind the head is penalized and only allowed to a certain extent.
+ */
+static struct request *bfq_choose_req(struct bfq_data *bfqd,
+ struct request *rq1,
+ struct request *rq2,
+ sector_t last)
+{
+ sector_t s1, s2, d1 = 0, d2 = 0;
+ unsigned long back_max;
+#define BFQ_RQ1_WRAP 0x01 /* request 1 wraps */
+#define BFQ_RQ2_WRAP 0x02 /* request 2 wraps */
+ unsigned wrap = 0; /* bit mask: requests behind the disk head? */
+
+ if (rq1 == NULL || rq1 == rq2)
+ return rq2;
+ if (rq2 == NULL)
+ return rq1;
+
+ if (rq_is_sync(rq1) && !rq_is_sync(rq2))
+ return rq1;
+ else if (rq_is_sync(rq2) && !rq_is_sync(rq1))
+ return rq2;
+ if ((rq1->cmd_flags & REQ_META) && !(rq2->cmd_flags & REQ_META))
+ return rq1;
+ else if ((rq2->cmd_flags & REQ_META) && !(rq1->cmd_flags & REQ_META))
+ return rq2;
+
+ s1 = blk_rq_pos(rq1);
+ s2 = blk_rq_pos(rq2);
+
+ /*
+ * By definition, 1KiB is 2 sectors.
+ */
+ back_max = bfqd->bfq_back_max * 2;
+
+ /*
+ * Strict one way elevator _except_ in the case where we allow
+ * short backward seeks which are biased as twice the cost of a
+ * similar forward seek.
+ */
+ if (s1 >= last)
+ d1 = s1 - last;
+ else if (s1 + back_max >= last)
+ d1 = (last - s1) * bfqd->bfq_back_penalty;
+ else
+ wrap |= BFQ_RQ1_WRAP;
+
+ if (s2 >= last)
+ d2 = s2 - last;
+ else if (s2 + back_max >= last)
+ d2 = (last - s2) * bfqd->bfq_back_penalty;
+ else
+ wrap |= BFQ_RQ2_WRAP;
+
+ /* Found required data */
+
+ /*
+ * By doing switch() on the bit mask "wrap" we avoid having to
+ * check two variables for all permutations: --> faster!
+ */
+ switch (wrap) {
+ case 0: /* common case for CFQ: rq1 and rq2 not wrapped */
+ if (d1 < d2)
+ return rq1;
+ else if (d2 < d1)
+ return rq2;
+ else {
+ if (s1 >= s2)
+ return rq1;
+ else
+ return rq2;
+ }
+
+ case BFQ_RQ2_WRAP:
+ return rq1;
+ case BFQ_RQ1_WRAP:
+ return rq2;
+ case (BFQ_RQ1_WRAP|BFQ_RQ2_WRAP): /* both rqs wrapped */
+ default:
+ /*
+ * Since both rqs are wrapped,
+ * start with the one that's further behind head
+ * (--> only *one* back seek required),
+ * since back seek takes more time than forward.
+ */
+ if (s1 <= s2)
+ return rq1;
+ else
+ return rq2;
+ }
+}
+
+static struct bfq_queue *
+bfq_rq_pos_tree_lookup(struct bfq_data *bfqd, struct rb_root *root,
+ sector_t sector, struct rb_node **ret_parent,
+ struct rb_node ***rb_link)
+{
+ struct rb_node **p, *parent;
+ struct bfq_queue *bfqq = NULL;
+
+ parent = NULL;
+ p = &root->rb_node;
+ while (*p) {
+ struct rb_node **n;
+
+ parent = *p;
+ bfqq = rb_entry(parent, struct bfq_queue, pos_node);
+
+ /*
+ * Sort strictly based on sector. Smallest to the left,
+ * largest to the right.
+ */
+ if (sector > blk_rq_pos(bfqq->next_rq))
+ n = &(*p)->rb_right;
+ else if (sector < blk_rq_pos(bfqq->next_rq))
+ n = &(*p)->rb_left;
+ else
+ break;
+ p = n;
+ bfqq = NULL;
+ }
+
+ *ret_parent = parent;
+ if (rb_link)
+ *rb_link = p;
+
+ bfq_log(bfqd, "rq_pos_tree_lookup %llu: returning %d",
+ (long long unsigned)sector,
+ bfqq != NULL ? bfqq->pid : 0);
+
+ return bfqq;
+}
+
+static void bfq_rq_pos_tree_add(struct bfq_data *bfqd, struct bfq_queue *bfqq)
+{
+ struct rb_node **p, *parent;
+ struct bfq_queue *__bfqq;
+
+ if (bfqq->pos_root != NULL) {
+ rb_erase(&bfqq->pos_node, bfqq->pos_root);
+ bfqq->pos_root = NULL;
+ }
+
+ if (bfq_class_idle(bfqq))
+ return;
+ if (!bfqq->next_rq)
+ return;
+
+ bfqq->pos_root = &bfqd->rq_pos_tree;
+ __bfqq = bfq_rq_pos_tree_lookup(bfqd, bfqq->pos_root,
+ blk_rq_pos(bfqq->next_rq), &parent, &p);
+ if (__bfqq == NULL) {
+ rb_link_node(&bfqq->pos_node, parent, p);
+ rb_insert_color(&bfqq->pos_node, bfqq->pos_root);
+ } else
+ bfqq->pos_root = NULL;
+}
+
+/*
+ * Tell whether there are active queues or groups with differentiated weights.
+ */
+static inline bool bfq_differentiated_weights(struct bfq_data *bfqd)
+{
+ BUG_ON(!bfqd->hw_tag);
+ /*
+ * For weights to differ, at least one of the trees must contain
+ * at least two nodes.
+ */
+ return (!RB_EMPTY_ROOT(&bfqd->queue_weights_tree) &&
+ (bfqd->queue_weights_tree.rb_node->rb_left ||
+ bfqd->queue_weights_tree.rb_node->rb_right)
+#ifdef CONFIG_CGROUP_BFQIO
+ ) ||
+ (!RB_EMPTY_ROOT(&bfqd->group_weights_tree) &&
+ (bfqd->group_weights_tree.rb_node->rb_left ||
+ bfqd->group_weights_tree.rb_node->rb_right)
+#endif
+ );
+}
+
+/*
+ * If the weight-counter tree passed as input contains no counter for
+ * the weight of the input entity, then add that counter; otherwise just
+ * increment the existing counter.
+ *
+ * Note that weight-counter trees contain few nodes in mostly symmetric
+ * scenarios. For example, if all queues have the same weight, then the
+ * weight-counter tree for the queues may contain at most one node.
+ * This holds even if low_latency is on, because weight-raised queues
+ * are not inserted in the tree.
+ * In most scenarios, the rate at which nodes are created/destroyed
+ * should also be low.
+ */
+static void bfq_weights_tree_add(struct bfq_data *bfqd,
+ struct bfq_entity *entity,
+ struct rb_root *root)
+{
+ struct rb_node **new = &(root->rb_node), *parent = NULL;
+
+ /*
+ * Do not insert if:
+ * - the device does not support queueing;
+ * - the entity is already associated with a counter, which happens if:
+ * 1) the entity is associated with a queue, 2) a request arrival
+ * has caused the queue to become both non-weight-raised, and hence
+ * change its weight, and backlogged; in this respect, each
+ * of the two events causes an invocation of this function,
+ * 3) this is the invocation of this function caused by the second
+ * event. This second invocation is actually useless, and we handle
+ * this fact by exiting immediately. More efficient or clearer
+ * solutions might possibly be adopted.
+ */
+ if (!bfqd->hw_tag || entity->weight_counter)
+ return;
+
+ while (*new) {
+ struct bfq_weight_counter *__counter = container_of(*new,
+ struct bfq_weight_counter,
+ weights_node);
+ parent = *new;
+
+ if (entity->weight == __counter->weight) {
+ entity->weight_counter = __counter;
+ goto inc_counter;
+ }
+ if (entity->weight < __counter->weight)
+ new = &((*new)->rb_left);
+ else
+ new = &((*new)->rb_right);
+ }
+
+	entity->weight_counter = kzalloc(sizeof(struct bfq_weight_counter),
+					 GFP_ATOMIC);
+	/* In case of allocation failure, simply do not track this weight. */
+	if (entity->weight_counter == NULL)
+		return;
+	entity->weight_counter->weight = entity->weight;
+	rb_link_node(&entity->weight_counter->weights_node, parent, new);
+	rb_insert_color(&entity->weight_counter->weights_node, root);
+
+inc_counter:
+ entity->weight_counter->num_active++;
+}
+
+/*
+ * Decrement the weight counter associated with the entity, and, if the
+ * counter reaches 0, remove the counter from the tree.
+ * See the comments to the function bfq_weights_tree_add() for considerations
+ * about overhead.
+ */
+static void bfq_weights_tree_remove(struct bfq_data *bfqd,
+ struct bfq_entity *entity,
+ struct rb_root *root)
+{
+ /*
+ * Check whether the entity is actually associated with a counter.
+	 * In fact, the device may not be considered NCQ-capable for a while,
+ * which implies that no insertion in the weight trees is performed,
+ * after which the device may start to be deemed NCQ-capable, and hence
+ * this function may start to be invoked. This may cause the function
+ * to be invoked for entities that are not associated with any counter.
+ */
+ if (!entity->weight_counter)
+ return;
+
+ BUG_ON(RB_EMPTY_ROOT(root));
+ BUG_ON(entity->weight_counter->weight != entity->weight);
+
+ BUG_ON(!entity->weight_counter->num_active);
+ entity->weight_counter->num_active--;
+ if (entity->weight_counter->num_active > 0)
+ goto reset_entity_pointer;
+
+ rb_erase(&entity->weight_counter->weights_node, root);
+ kfree(entity->weight_counter);
+
+reset_entity_pointer:
+ entity->weight_counter = NULL;
+}
+
+static struct request *bfq_find_next_rq(struct bfq_data *bfqd,
+ struct bfq_queue *bfqq,
+ struct request *last)
+{
+ struct rb_node *rbnext = rb_next(&last->rb_node);
+ struct rb_node *rbprev = rb_prev(&last->rb_node);
+ struct request *next = NULL, *prev = NULL;
+
+ BUG_ON(RB_EMPTY_NODE(&last->rb_node));
+
+ if (rbprev != NULL)
+ prev = rb_entry_rq(rbprev);
+
+ if (rbnext != NULL)
+ next = rb_entry_rq(rbnext);
+ else {
+ rbnext = rb_first(&bfqq->sort_list);
+ if (rbnext && rbnext != &last->rb_node)
+ next = rb_entry_rq(rbnext);
+ }
+
+ return bfq_choose_req(bfqd, next, prev, blk_rq_pos(last));
+}
+
+/* see the definition of bfq_async_charge_factor for details */
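+/*
+ * For example, a 1024-sector request from a non-weight-raised async queue
+ * is charged 1024 * (1 + bfq_async_charge_factor) = 11264 sectors below,
+ * whereas the same request from a sync or weight-raised queue is charged
+ * its actual 1024 sectors.
+ */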
+static inline unsigned long bfq_serv_to_charge(struct request *rq,
+ struct bfq_queue *bfqq)
+{
+ return blk_rq_sectors(rq) *
+ (1 + ((!bfq_bfqq_sync(bfqq)) * (bfqq->wr_coeff == 1) *
+ bfq_async_charge_factor));
+}
+
+/**
+ * bfq_updated_next_req - update the queue after a new next_rq selection.
+ * @bfqd: the device data the queue belongs to.
+ * @bfqq: the queue to update.
+ *
+ * If the first request of a queue changes we make sure that the queue
+ * has enough budget to serve at least its first request (if the
+ * request has grown). We do this because if the queue has not enough
+ * budget for its first request, it has to go through two dispatch
+ * rounds to actually get it dispatched.
+ */
+static void bfq_updated_next_req(struct bfq_data *bfqd,
+ struct bfq_queue *bfqq)
+{
+ struct bfq_entity *entity = &bfqq->entity;
+ struct bfq_service_tree *st = bfq_entity_service_tree(entity);
+ struct request *next_rq = bfqq->next_rq;
+ unsigned long new_budget;
+
+ if (next_rq == NULL)
+ return;
+
+ if (bfqq == bfqd->in_service_queue)
+ /*
+ * In order not to break guarantees, budgets cannot be
+ * changed after an entity has been selected.
+ */
+ return;
+
+ BUG_ON(entity->tree != &st->active);
+ BUG_ON(entity == entity->sched_data->in_service_entity);
+
+ new_budget = max_t(unsigned long, bfqq->max_budget,
+ bfq_serv_to_charge(next_rq, bfqq));
+ if (entity->budget != new_budget) {
+ entity->budget = new_budget;
+ bfq_log_bfqq(bfqd, bfqq, "updated next rq: new budget %lu",
+ new_budget);
+ bfq_activate_bfqq(bfqd, bfqq);
+ }
+}
+
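+/*
+ * bfqd->RT_prod below holds the product R * T for the current device
+ * speed class (see the comments to R_slow[] and R_fast[] above), so the
+ * value returned is (R / peak_rate) * T, i.e., the automatically computed
+ * weight-raising duration.
+ */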
+static inline unsigned int bfq_wr_duration(struct bfq_data *bfqd)
+{
+ u64 dur;
+
+ if (bfqd->bfq_wr_max_time > 0)
+ return bfqd->bfq_wr_max_time;
+
+ dur = bfqd->RT_prod;
+ do_div(dur, bfqd->peak_rate);
+
+ return dur;
+}
+
+static inline void
+bfq_bfqq_resume_state(struct bfq_queue *bfqq, struct bfq_io_cq *bic)
+{
+ if (bic->saved_idle_window)
+ bfq_mark_bfqq_idle_window(bfqq);
+ else
+ bfq_clear_bfqq_idle_window(bfqq);
+ if (bic->wr_time_left && bfqq->bfqd->low_latency) {
+ /*
+ * Start a weight raising period with the duration given by
+ * the raising_time_left snapshot.
+ */
+ if (bfq_bfqq_busy(bfqq))
+ bfqq->bfqd->raised_busy_queues++;
+ bfqq->wr_coeff = bfqq->bfqd->bfq_wr_coeff;
+ bfqq->wr_cur_max_time = bic->wr_time_left;
+ bfqq->last_wr_start_finish = jiffies;
+ bfqq->entity.ioprio_changed = 1;
+ }
+ /*
+ * Clear wr_time_left to prevent bfq_bfqq_save_state() from
+ * getting confused about the queue's need of a weight-raising
+ * period.
+ */
+ bic->wr_time_left = 0;
+}
+
+/*
+ * Must be called with the queue_lock held.
+ */
+static int bfqq_process_refs(struct bfq_queue *bfqq)
+{
+ int process_refs, io_refs;
+
+ io_refs = bfqq->allocated[READ] + bfqq->allocated[WRITE];
+ process_refs = atomic_read(&bfqq->ref) - io_refs - bfqq->entity.on_st;
+ BUG_ON(process_refs < 0);
+ return process_refs;
+}
+
+static void bfq_add_request(struct request *rq)
+{
+ struct bfq_queue *bfqq = RQ_BFQQ(rq);
+ struct bfq_entity *entity = &bfqq->entity;
+ struct bfq_data *bfqd = bfqq->bfqd;
+ struct request *next_rq, *prev;
+ unsigned long old_wr_coeff = bfqq->wr_coeff;
+ int idle_for_long_time = 0;
+
+ bfq_log_bfqq(bfqd, bfqq, "add_request %d", rq_is_sync(rq));
+ bfqq->queued[rq_is_sync(rq)]++;
+ bfqd->queued++;
+
+ elv_rb_add(&bfqq->sort_list, rq);
+
+ /*
+ * Check if this request is a better next-serve candidate.
+ */
+ prev = bfqq->next_rq;
+ next_rq = bfq_choose_req(bfqd, bfqq->next_rq, rq, bfqd->last_position);
+ BUG_ON(next_rq == NULL);
+ bfqq->next_rq = next_rq;
+
+ /*
+ * Adjust priority tree position, if next_rq changes.
+ */
+ if (prev != bfqq->next_rq)
+ bfq_rq_pos_tree_add(bfqd, bfqq);
+
+ if (!bfq_bfqq_busy(bfqq)) {
+ int soft_rt = bfqd->bfq_wr_max_softrt_rate > 0 &&
+ time_is_before_jiffies(bfqq->soft_rt_next_start);
+ idle_for_long_time = time_is_before_jiffies(
+ bfqq->budget_timeout +
+ bfqd->bfq_wr_min_idle_time);
+ entity->budget = max_t(unsigned long, bfqq->max_budget,
+ bfq_serv_to_charge(next_rq, bfqq));
+
+ if (!bfqd->low_latency)
+ goto add_bfqq_busy;
+
+ if (bfq_bfqq_just_split(bfqq))
+ goto set_ioprio_changed;
+
+ /*
+ * If the queue:
+ * - is not being boosted,
+ * - has been idle for enough time,
+ * - is not a sync queue or is linked to a bfq_io_cq (it is
+		 *     shared "by nature" or it is not shared and its
+ * requests have not been redirected to a shared queue)
+ * start a weight-raising period.
+ */
+ if (old_wr_coeff == 1 && (idle_for_long_time || soft_rt) &&
+ (!bfq_bfqq_sync(bfqq) || bfqq->bic != NULL)) {
+ bfqq->wr_coeff = bfqd->bfq_wr_coeff;
+ if (idle_for_long_time)
+ bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
+ else
+ bfqq->wr_cur_max_time =
+ bfqd->bfq_wr_rt_max_time;
+ bfq_log_bfqq(bfqd, bfqq,
+ "wrais starting at %lu, rais_max_time %u",
+ jiffies,
+ jiffies_to_msecs(bfqq->wr_cur_max_time));
+ } else if (old_wr_coeff > 1) {
+ if (idle_for_long_time)
+ bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
+ else if (bfqq->wr_cur_max_time ==
+ bfqd->bfq_wr_rt_max_time &&
+ !soft_rt) {
+ bfqq->wr_coeff = 1;
+ bfq_log_bfqq(bfqd, bfqq,
+ "wrais ending at %lu, rais_max_time %u",
+ jiffies,
+ jiffies_to_msecs(bfqq->
+ wr_cur_max_time));
+ } else if (time_before(
+ bfqq->last_wr_start_finish +
+ bfqq->wr_cur_max_time,
+ jiffies +
+ bfqd->bfq_wr_rt_max_time) &&
+ soft_rt) {
+ /*
+ *
+ * The remaining weight-raising time is lower
+ * than bfqd->bfq_wr_rt_max_time, which means
+ * that the application is enjoying weight
+ * raising either because deemed soft-rt in
+ * the near past, or because deemed interactive
+			 * long ago.
+ * In both cases, resetting now the current
+ * remaining weight-raising time for the
+ * application to the weight-raising duration
+ * for soft rt applications would not cause any
+ * latency increase for the application (as the
+ * new duration would be higher than the
+ * remaining time).
+ *
+ * In addition, the application is now meeting
+ * the requirements for being deemed soft rt.
+ * In the end we can correctly and safely
+ * (re)charge the weight-raising duration for
+ * the application with the weight-raising
+ * duration for soft rt applications.
+ *
+ * In particular, doing this recharge now, i.e.,
+ * before the weight-raising period for the
+ * application finishes, reduces the probability
+ * of the following negative scenario:
+ * 1) the weight of a soft rt application is
+ * raised at startup (as for any newly
+ * created application),
+ * 2) since the application is not interactive,
+ * at a certain time weight-raising is
+ * stopped for the application,
+ * 3) at that time the application happens to
+ * still have pending requests, and hence
+ * is destined to not have a chance to be
+ * deemed soft rt before these requests are
+ * completed (see the comments to the
+ * function bfq_bfqq_softrt_next_start()
+ * for details on soft rt detection),
+ * 4) these pending requests experience a high
+ * latency because the application is not
+ * weight-raised while they are pending.
+ */
+ bfqq->last_wr_start_finish = jiffies;
+ bfqq->wr_cur_max_time =
+ bfqd->bfq_wr_rt_max_time;
+ }
+ }
+set_ioprio_changed:
+ if (old_wr_coeff != bfqq->wr_coeff)
+ entity->ioprio_changed = 1;
+add_bfqq_busy:
+ bfqq->last_idle_bklogged = jiffies;
+ bfqq->service_from_backlogged = 0;
+ bfq_clear_bfqq_softrt_update(bfqq);
+ bfq_add_bfqq_busy(bfqd, bfqq);
+ } else {
+ if (bfqd->low_latency && old_wr_coeff == 1 && !rq_is_sync(rq) &&
+ time_is_before_jiffies(
+ bfqq->last_wr_start_finish +
+ bfqd->bfq_wr_min_inter_arr_async)) {
+ bfqq->wr_coeff = bfqd->bfq_wr_coeff;
+ bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
+
+ bfqd->raised_busy_queues++;
+ entity->ioprio_changed = 1;
+ bfq_log_bfqq(bfqd, bfqq,
+ "non-idle wrais starting at %lu, rais_max_time %u",
+ jiffies,
+ jiffies_to_msecs(bfqq->wr_cur_max_time));
+ }
+ if (prev != bfqq->next_rq)
+ bfq_updated_next_req(bfqd, bfqq);
+ }
+
+ if (bfqd->low_latency &&
+ (old_wr_coeff == 1 || bfqq->wr_coeff == 1 ||
+ idle_for_long_time))
+ bfqq->last_wr_start_finish = jiffies;
+}
+
+static struct request *bfq_find_rq_fmerge(struct bfq_data *bfqd,
+ struct bio *bio)
+{
+ struct task_struct *tsk = current;
+ struct bfq_io_cq *bic;
+ struct bfq_queue *bfqq;
+
+ bic = bfq_bic_lookup(bfqd, tsk->io_context);
+ if (bic == NULL)
+ return NULL;
+
+ bfqq = bic_to_bfqq(bic, bfq_bio_sync(bio));
+ if (bfqq != NULL)
+ return elv_rb_find(&bfqq->sort_list, bio_end_sector(bio));
+
+ return NULL;
+}
+
+static void bfq_activate_request(struct request_queue *q, struct request *rq)
+{
+ struct bfq_data *bfqd = q->elevator->elevator_data;
+
+ bfqd->rq_in_driver++;
+ bfqd->last_position = blk_rq_pos(rq) + blk_rq_sectors(rq);
+ bfq_log(bfqd, "activate_request: new bfqd->last_position %llu",
+ (long long unsigned)bfqd->last_position);
+}
+
+static void bfq_deactivate_request(struct request_queue *q, struct request *rq)
+{
+ struct bfq_data *bfqd = q->elevator->elevator_data;
+
+ WARN_ON(bfqd->rq_in_driver == 0);
+ bfqd->rq_in_driver--;
+}
+
+static void bfq_remove_request(struct request *rq)
+{
+ struct bfq_queue *bfqq = RQ_BFQQ(rq);
+ struct bfq_data *bfqd = bfqq->bfqd;
+ const int sync = rq_is_sync(rq);
+
+ if (bfqq->next_rq == rq) {
+ bfqq->next_rq = bfq_find_next_rq(bfqd, bfqq, rq);
+ bfq_updated_next_req(bfqd, bfqq);
+ }
+
+ list_del_init(&rq->queuelist);
+ BUG_ON(bfqq->queued[sync] == 0);
+ bfqq->queued[sync]--;
+ bfqd->queued--;
+ elv_rb_del(&bfqq->sort_list, rq);
+
+ if (RB_EMPTY_ROOT(&bfqq->sort_list)) {
+ if (bfq_bfqq_busy(bfqq) && bfqq != bfqd->in_service_queue)
+ bfq_del_bfqq_busy(bfqd, bfqq, 1);
+ /*
+ * Remove queue from request-position tree as it is empty.
+ */
+ if (bfqq->pos_root != NULL) {
+ rb_erase(&bfqq->pos_node, bfqq->pos_root);
+ bfqq->pos_root = NULL;
+ }
+ }
+
+ if (rq->cmd_flags & REQ_META) {
+ WARN_ON(bfqq->meta_pending == 0);
+ bfqq->meta_pending--;
+ }
+}
+
+static int bfq_merge(struct request_queue *q, struct request **req,
+ struct bio *bio)
+{
+ struct bfq_data *bfqd = q->elevator->elevator_data;
+ struct request *__rq;
+
+ __rq = bfq_find_rq_fmerge(bfqd, bio);
+ if (__rq != NULL && elv_rq_merge_ok(__rq, bio)) {
+ *req = __rq;
+ return ELEVATOR_FRONT_MERGE;
+ }
+
+ return ELEVATOR_NO_MERGE;
+}
+
+static void bfq_merged_request(struct request_queue *q, struct request *req,
+ int type)
+{
+ if (type == ELEVATOR_FRONT_MERGE &&
+ rb_prev(&req->rb_node) &&
+ blk_rq_pos(req) <
+ blk_rq_pos(container_of(rb_prev(&req->rb_node),
+ struct request, rb_node))) {
+ struct bfq_queue *bfqq = RQ_BFQQ(req);
+ struct bfq_data *bfqd = bfqq->bfqd;
+ struct request *prev, *next_rq;
+
+ /* Reposition request in its sort_list */
+ elv_rb_del(&bfqq->sort_list, req);
+ elv_rb_add(&bfqq->sort_list, req);
+ /* Choose next request to be served for bfqq */
+ prev = bfqq->next_rq;
+ next_rq = bfq_choose_req(bfqd, bfqq->next_rq, req,
+ bfqd->last_position);
+ BUG_ON(next_rq == NULL);
+ bfqq->next_rq = next_rq;
+ /*
+ * If next_rq changes, update both the queue's budget to fit
+ * the new request and the queue's position in its rq_pos_tree.
+ */
+ if (prev != bfqq->next_rq) {
+ bfq_updated_next_req(bfqd, bfqq);
+ bfq_rq_pos_tree_add(bfqd, bfqq);
+ }
+ }
+}
+
+static void bfq_merged_requests(struct request_queue *q, struct request *rq,
+ struct request *next)
+{
+ struct bfq_queue *bfqq = RQ_BFQQ(rq);
+
+ /*
+ * Reposition in fifo if next is older than rq.
+ */
+ if (!list_empty(&rq->queuelist) && !list_empty(&next->queuelist) &&
+ time_before(next->fifo_time, rq->fifo_time)) {
+ list_move(&rq->queuelist, &next->queuelist);
+ rq->fifo_time = next->fifo_time;
+ }
+
+ if (bfqq->next_rq == next)
+ bfqq->next_rq = rq;
+
+ bfq_remove_request(next);
+}
+
+/* Must be called with bfqq != NULL */
+static inline void bfq_bfqq_end_wr(struct bfq_queue *bfqq)
+{
+ BUG_ON(bfqq == NULL);
+ if (bfq_bfqq_busy(bfqq))
+ bfqq->bfqd->raised_busy_queues--;
+ bfqq->wr_coeff = 1;
+ bfqq->wr_cur_max_time = 0;
+ /* Trigger a weight change on the next activation of the queue */
+ bfqq->entity.ioprio_changed = 1;
+}
+
+static void bfq_end_wr_async_queues(struct bfq_data *bfqd,
+ struct bfq_group *bfqg)
+{
+ int i, j;
+
+ for (i = 0; i < 2; i++)
+ for (j = 0; j < IOPRIO_BE_NR; j++)
+ if (bfqg->async_bfqq[i][j] != NULL)
+ bfq_bfqq_end_wr(bfqg->async_bfqq[i][j]);
+ if (bfqg->async_idle_bfqq != NULL)
+ bfq_bfqq_end_wr(bfqg->async_idle_bfqq);
+}
+
+static void bfq_end_wr(struct bfq_data *bfqd)
+{
+ struct bfq_queue *bfqq;
+
+ spin_lock_irq(bfqd->queue->queue_lock);
+
+ list_for_each_entry(bfqq, &bfqd->active_list, bfqq_list)
+ bfq_bfqq_end_wr(bfqq);
+ list_for_each_entry(bfqq, &bfqd->idle_list, bfqq_list)
+ bfq_bfqq_end_wr(bfqq);
+ bfq_end_wr_async(bfqd);
+
+ spin_unlock_irq(bfqd->queue->queue_lock);
+}
+
+static inline sector_t bfq_io_struct_pos(void *io_struct, bool request)
+{
+ if (request)
+ return blk_rq_pos(io_struct);
+ else
+ return ((struct bio *)io_struct)->bi_iter.bi_sector;
+}
+
+static inline sector_t bfq_dist_from(sector_t pos1,
+ sector_t pos2)
+{
+ if (pos1 >= pos2)
+ return pos1 - pos2;
+ else
+ return pos2 - pos1;
+}
+
+static inline int bfq_rq_close_to_sector(void *io_struct, bool request,
+ sector_t sector)
+{
+ return bfq_dist_from(bfq_io_struct_pos(io_struct, request), sector) <=
+ BFQQ_SEEK_THR;
+}
+
+static struct bfq_queue *bfqq_close(struct bfq_data *bfqd, sector_t sector)
+{
+ struct rb_root *root = &bfqd->rq_pos_tree;
+ struct rb_node *parent, *node;
+ struct bfq_queue *__bfqq;
+
+ if (RB_EMPTY_ROOT(root))
+ return NULL;
+
+ /*
+ * First, if we find a request starting at the end of the last
+ * request, choose it.
+ */
+ __bfqq = bfq_rq_pos_tree_lookup(bfqd, root, sector, &parent, NULL);
+ if (__bfqq != NULL)
+ return __bfqq;
+
+ /*
+ * If the exact sector wasn't found, the parent of the NULL leaf
+ * will contain the closest sector (rq_pos_tree sorted by next_request
+ * position).
+ */
+ __bfqq = rb_entry(parent, struct bfq_queue, pos_node);
+ if (bfq_rq_close_to_sector(__bfqq->next_rq, true, sector))
+ return __bfqq;
+
+ if (blk_rq_pos(__bfqq->next_rq) < sector)
+ node = rb_next(&__bfqq->pos_node);
+ else
+ node = rb_prev(&__bfqq->pos_node);
+ if (node == NULL)
+ return NULL;
+
+ __bfqq = rb_entry(node, struct bfq_queue, pos_node);
+ if (bfq_rq_close_to_sector(__bfqq->next_rq, true, sector))
+ return __bfqq;
+
+ return NULL;
+}
+
+/*
+ * bfqd - obvious
+ * cur_bfqq - passed in so that we don't decide that the current queue
+ * is closely cooperating with itself
+ * sector - used as a reference point to search for a close queue
+ */
+static struct bfq_queue *bfq_close_cooperator(struct bfq_data *bfqd,
+ struct bfq_queue *cur_bfqq,
+ sector_t sector)
+{
+ struct bfq_queue *bfqq;
+
+ if (bfq_class_idle(cur_bfqq))
+ return NULL;
+ if (!bfq_bfqq_sync(cur_bfqq))
+ return NULL;
+ if (BFQQ_SEEKY(cur_bfqq))
+ return NULL;
+
+ /* If device has only one backlogged bfq_queue, don't search. */
+ if (bfqd->busy_queues == 1)
+ return NULL;
+
+ /*
+ * We should notice if some of the queues are cooperating, e.g.
+ * working closely on the same area of the disk. In that case,
+	 * we can group them together and not waste time idling.
+ */
+ bfqq = bfqq_close(bfqd, sector);
+ if (bfqq == NULL || bfqq == cur_bfqq)
+ return NULL;
+
+ /*
+ * Do not merge queues from different bfq_groups.
+ */
+ if (bfqq->entity.parent != cur_bfqq->entity.parent)
+ return NULL;
+
+ /*
+ * It only makes sense to merge sync queues.
+ */
+ if (!bfq_bfqq_sync(bfqq))
+ return NULL;
+ if (BFQQ_SEEKY(bfqq))
+ return NULL;
+
+ /*
+ * Do not merge queues of different priority classes.
+ */
+ if (bfq_class_rt(bfqq) != bfq_class_rt(cur_bfqq))
+ return NULL;
+
+ return bfqq;
+}
+
+static struct bfq_queue *
+bfq_setup_merge(struct bfq_queue *bfqq, struct bfq_queue *new_bfqq)
+{
+ int process_refs, new_process_refs;
+ struct bfq_queue *__bfqq;
+
+ /*
+ * If there are no process references on the new_bfqq, then it is
+ * unsafe to follow the ->new_bfqq chain as other bfqq's in the chain
+ * may have dropped their last reference (not just their last process
+ * reference).
+ */
+ if (!bfqq_process_refs(new_bfqq))
+ return NULL;
+
+ /* Avoid a circular list and skip interim queue merges. */
+ while ((__bfqq = new_bfqq->new_bfqq)) {
+ if (__bfqq == bfqq)
+ return NULL;
+ new_bfqq = __bfqq;
+ }
+
+ process_refs = bfqq_process_refs(bfqq);
+ new_process_refs = bfqq_process_refs(new_bfqq);
+ /*
+ * If the process for the bfqq has gone away, there is no
+ * sense in merging the queues.
+ */
+ if (process_refs == 0 || new_process_refs == 0)
+ return NULL;
+
+ bfq_log_bfqq(bfqq->bfqd, bfqq, "scheduling merge with queue %d",
+ new_bfqq->pid);
+
+ /*
+ * Merging is just a redirection: the requests of the process owning
+ * one of the two queues are redirected to the other queue. The latter
+ * queue, in its turn, is set as shared if this is the first time that
+ * the requests of some process are redirected to it.
+ *
+ * We redirect bfqq to new_bfqq and not the opposite, because we
+ * are in the context of the process owning bfqq, hence we have the
+ * io_cq of this process. So we can immediately configure this io_cq
+ * to redirect the requests of the process to new_bfqq.
+ *
+ * NOTE, even if new_bfqq coincides with the in-service queue, the
+ * io_cq of new_bfqq is not available, because, if the in-service queue
+ * is shared, bfqd->in_service_bic may not point to the io_cq of the
+ * in-service queue.
+ * Redirecting the requests of the process owning bfqq to the currently
+ * in-service queue is in any case the best option, as we feed the
+ * in-service queue with new requests close to the last request served
+ * and, by doing so, hopefully increase the throughput.
+ */
+ bfqq->new_bfqq = new_bfqq;
+ atomic_add(process_refs, &new_bfqq->ref);
+ return new_bfqq;
+}
+
+/*
+ * Attempt to schedule a merge of bfqq with the currently in-service queue or
+ * with a close queue among the scheduled queues.
+ * Return NULL if no merge was scheduled, a pointer to the shared bfq_queue
+ * structure otherwise.
+ */
+static struct bfq_queue *
+bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
+ void *io_struct, bool request)
+{
+ struct bfq_queue *in_service_bfqq, *new_bfqq;
+
+ if (bfqq->new_bfqq)
+ return bfqq->new_bfqq;
+
+ if (!io_struct)
+ return NULL;
+
+ in_service_bfqq = bfqd->in_service_queue;
+
+ if (in_service_bfqq == NULL || in_service_bfqq == bfqq ||
+ !bfqd->in_service_bic)
+ goto check_scheduled;
+
+ if (bfq_class_idle(in_service_bfqq) || bfq_class_idle(bfqq))
+ goto check_scheduled;
+
+ if (bfq_class_rt(in_service_bfqq) != bfq_class_rt(bfqq))
+ goto check_scheduled;
+
+ if (in_service_bfqq->entity.parent != bfqq->entity.parent)
+ goto check_scheduled;
+
+ if (bfq_rq_close_to_sector(io_struct, request, bfqd->last_position) &&
+ bfq_bfqq_sync(in_service_bfqq) && bfq_bfqq_sync(bfqq)) {
+ new_bfqq = bfq_setup_merge(bfqq, in_service_bfqq);
+ if (new_bfqq != NULL)
+ return new_bfqq; /* Merge with the in-service queue */
+ }
+
+ /*
+ * Check whether there is a cooperator among currently scheduled
+ * queues. The only thing we need is that the bio/request is not
+ * NULL, as we need it to establish whether a cooperator exists.
+ */
+check_scheduled:
+ new_bfqq = bfq_close_cooperator(bfqd, bfqq,
+ bfq_io_struct_pos(io_struct, request));
+ if (new_bfqq)
+ return bfq_setup_merge(bfqq, new_bfqq);
+
+ return NULL;
+}
+
+static inline void
+bfq_bfqq_save_state(struct bfq_queue *bfqq)
+{
+ /*
+ * If bfqq->bic == NULL, the queue is already shared or its requests
+ * have already been redirected to a shared queue; both idle window
+ * and weight raising state have already been saved. Do nothing.
+ */
+ if (bfqq->bic == NULL)
+ return;
+ if (bfqq->bic->wr_time_left)
+ /*
+ * This is the queue of a just-started process, and would
+ * deserve weight raising: we set wr_time_left to the full
+ * weight-raising duration to trigger weight-raising when and
+ * if the queue is split and the first request of the queue
+ * is enqueued.
+ */
+ bfqq->bic->wr_time_left = bfq_wr_duration(bfqq->bfqd);
+ else if (bfqq->wr_coeff > 1) {
+ unsigned long wr_duration =
+ jiffies - bfqq->last_wr_start_finish;
+ /*
+ * It may happen that a queue's weight raising period lasts
+ * longer than its wr_cur_max_time, as weight raising is
+ * handled only when a request is enqueued or dispatched (it
+ * does not use any timer). If the weight raising period is
+ * about to end, don't save it.
+ */
+ if (bfqq->wr_cur_max_time <= wr_duration)
+ bfqq->bic->wr_time_left = 0;
+ else
+ bfqq->bic->wr_time_left =
+ bfqq->wr_cur_max_time - wr_duration;
+ /*
+ * The bfq_queue is becoming shared or the requests of the
+ * process owning the queue are being redirected to a shared
+ * queue. Stop the weight raising period of the queue, as in
+ * both cases it should not be owned by an interactive or soft
+ * real-time application.
+ */
+ bfq_bfqq_end_wr(bfqq);
+ } else
+ bfqq->bic->wr_time_left = 0;
+ bfqq->bic->saved_idle_window = bfq_bfqq_idle_window(bfqq);
+}
+
+static inline void
+bfq_get_bic_reference(struct bfq_queue *bfqq)
+{
+ /*
+ * If bfqq->bic has a non-NULL value, the bic to which it belongs
+ * is about to begin using a shared bfq_queue.
+ */
+ if (bfqq->bic)
+ atomic_long_inc(&bfqq->bic->icq.ioc->refcount);
+}
+
+static void
+bfq_merge_bfqqs(struct bfq_data *bfqd, struct bfq_io_cq *bic,
+ struct bfq_queue *bfqq, struct bfq_queue *new_bfqq)
+{
+ bfq_log_bfqq(bfqd, bfqq, "merging with queue %lu",
+ (long unsigned)new_bfqq->pid);
+ /* Save weight raising and idle window of the merged queues */
+ bfq_bfqq_save_state(bfqq);
+ bfq_bfqq_save_state(new_bfqq);
+ /*
+ * Grab a reference to the bic, to prevent it from being destroyed
+ * before being possibly touched by a bfq_split_bfqq().
+ */
+ bfq_get_bic_reference(bfqq);
+ bfq_get_bic_reference(new_bfqq);
+ /* Merge queues (that is, let bic redirect its requests to new_bfqq) */
+ bic_set_bfqq(bic, new_bfqq, 1);
+ bfq_mark_bfqq_coop(new_bfqq);
+ /*
+ * new_bfqq now belongs to at least two bics (it is a shared queue): set
+ * new_bfqq->bic to NULL. bfqq either:
+ * - does not belong to any bic any more, and hence bfqq->bic must
+ * be set to NULL, or
+ * - is a queue whose owning bics have already been redirected to a
+ * different queue, hence the queue is destined to not belong to any
+ * bic soon and bfqq->bic is already NULL (therefore the next
+ * assignment causes no harm).
+ */
+ new_bfqq->bic = NULL;
+ bfqq->bic = NULL;
+ bfq_put_queue(bfqq);
+}
+
+static int bfq_allow_merge(struct request_queue *q, struct request *rq,
+ struct bio *bio)
+{
+ struct bfq_data *bfqd = q->elevator->elevator_data;
+ struct bfq_io_cq *bic;
+ struct bfq_queue *bfqq, *new_bfqq;
+
+ /*
+ * Disallow merge of a sync bio into an async request.
+ */
+ if (bfq_bio_sync(bio) && !rq_is_sync(rq))
+ return 0;
+
+ /*
+ * Lookup the bfqq that this bio will be queued with. Allow
+ * merge only if rq is queued there.
+ * Queue lock is held here.
+ */
+ bic = bfq_bic_lookup(bfqd, current->io_context);
+ if (bic == NULL)
+ return 0;
+
+ bfqq = bic_to_bfqq(bic, bfq_bio_sync(bio));
+ /*
+ * We take advantage of this function to perform an early merge
+ * of the queues of possible cooperating processes.
+ */
+ if (bfqq != NULL) {
+ new_bfqq = bfq_setup_cooperator(bfqd, bfqq, bio, false);
+ if (new_bfqq != NULL) {
+ bfq_merge_bfqqs(bfqd, bic, bfqq, new_bfqq);
+ /*
+ * If we get here, the bio will be queued in the shared
+ * queue, i.e., new_bfqq, so use new_bfqq to decide
+ * whether bio and rq can be merged.
+ */
+ bfqq = new_bfqq;
+ }
+ }
+
+ return bfqq == RQ_BFQQ(rq);
+}
+
+static void __bfq_set_in_service_queue(struct bfq_data *bfqd,
+ struct bfq_queue *bfqq)
+{
+ if (bfqq != NULL) {
+ bfq_mark_bfqq_must_alloc(bfqq);
+ bfq_mark_bfqq_budget_new(bfqq);
+ bfq_clear_bfqq_fifo_expire(bfqq);
+
+ bfqd->budgets_assigned = (bfqd->budgets_assigned*7 + 256) / 8;
+
+ bfq_log_bfqq(bfqd, bfqq,
+ "set_in_service_queue, cur-budget = %lu",
+ bfqq->entity.budget);
+ }
+
+ bfqd->in_service_queue = bfqq;
+}
+
+/*
+ * Get and set a new queue for service.
+ */
+static struct bfq_queue *bfq_set_in_service_queue(struct bfq_data *bfqd)
+{
+ struct bfq_queue *bfqq = bfq_get_next_queue(bfqd);
+
+ __bfq_set_in_service_queue(bfqd, bfqq);
+ return bfqq;
+}
+
+/*
+ * If enough samples have been computed, return the current max budget
+ * stored in bfqd, which is dynamically updated according to the
+ * estimated disk peak rate; otherwise return the default max budget
+ */
+static inline unsigned long bfq_max_budget(struct bfq_data *bfqd)
+{
+ if (bfqd->budgets_assigned < 194)
+ return bfq_default_max_budget;
+ else
+ return bfqd->bfq_max_budget;
+}
+
+/*
+ * Return min budget, which is a fraction of the current or default
+ * max budget (trying with 1/32)
+ */
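+/*
+ * With the default max budget of 16 * 1024 sectors, for instance, this
+ * amounts to 512 sectors, i.e., 256 KiB with 512-byte sectors.
+ */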
+static inline unsigned long bfq_min_budget(struct bfq_data *bfqd)
+{
+ if (bfqd->budgets_assigned < 194)
+ return bfq_default_max_budget / 32;
+ else
+ return bfqd->bfq_max_budget / 32;
+}
+
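+/*
+ * With the default parameters and HZ=1000, the idling window armed below
+ * is about 8 ms (bfq_slice_idle), cut down to the 2 ms BFQ_MIN_TT for
+ * queues detected as seeky, and stretched to roughly 24 ms for
+ * weight-raised queues.
+ */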
+static void bfq_arm_slice_timer(struct bfq_data *bfqd)
+{
+ struct bfq_queue *bfqq = bfqd->in_service_queue;
+ struct bfq_io_cq *bic;
+ unsigned long sl;
+
+ WARN_ON(!RB_EMPTY_ROOT(&bfqq->sort_list));
+
+ /* Tasks have exited, don't wait. */
+ bic = bfqd->in_service_bic;
+ if (bic == NULL || atomic_read(&bic->icq.ioc->active_ref) == 0)
+ return;
+
+ bfq_mark_bfqq_wait_request(bfqq);
+
+ /*
+ * We don't want to idle for seeks, but we do want to allow
+ * fair distribution of slice time for a process doing back-to-back
+	 * seeks. So allow a little bit of time for it to submit a new rq.
+ *
+ * To prevent processes with (partly) seeky workloads from
+ * being too ill-treated, grant them a small fraction of the
+ * assigned budget before reducing the waiting time to
+ * BFQ_MIN_TT. This happened to help reduce latency.
+ */
+ sl = bfqd->bfq_slice_idle;
+ /*
+ * Unless the queue is being weight-raised, grant only minimum idle
+ * time if the queue either has been seeky for long enough or has
+ * already proved to be constantly seeky.
+ */
+ if (bfq_sample_valid(bfqq->seek_samples) &&
+ ((BFQQ_SEEKY(bfqq) && bfqq->entity.service >
+ bfq_max_budget(bfqq->bfqd) / 8) ||
+ bfq_bfqq_constantly_seeky(bfqq)) && bfqq->wr_coeff == 1)
+ sl = min(sl, msecs_to_jiffies(BFQ_MIN_TT));
+ else if (bfqq->wr_coeff > 1)
+ sl = sl * 3;
+ bfqd->last_idling_start = ktime_get();
+ mod_timer(&bfqd->idle_slice_timer, jiffies + sl);
+ bfq_log(bfqd, "arm idle: %u/%u ms",
+ jiffies_to_msecs(sl), jiffies_to_msecs(bfqd->bfq_slice_idle));
+}
+
+/*
+ * Set the maximum time for the in-service queue to consume its
+ * budget. This prevents seeky processes from lowering the disk
+ * throughput (always guaranteed with a time slice scheme as in CFQ).
+ */
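+/*
+ * For example, a queue whose weight is currently three times its original
+ * weight is granted a budget timeout three times as long, whereas a queue
+ * in its soft real-time weight-raising period keeps the base timeout.
+ */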
+static void bfq_set_budget_timeout(struct bfq_data *bfqd)
+{
+ struct bfq_queue *bfqq = bfqd->in_service_queue;
+ unsigned int timeout_coeff;
+ if (bfqq->wr_cur_max_time == bfqd->bfq_wr_rt_max_time)
+ timeout_coeff = 1;
+ else
+ timeout_coeff = bfqq->entity.weight / bfqq->entity.orig_weight;
+
+ bfqd->last_budget_start = ktime_get();
+
+ bfq_clear_bfqq_budget_new(bfqq);
+ bfqq->budget_timeout = jiffies +
+ bfqd->bfq_timeout[bfq_bfqq_sync(bfqq)] * timeout_coeff;
+
+ bfq_log_bfqq(bfqd, bfqq, "set budget_timeout %u",
+ jiffies_to_msecs(bfqd->bfq_timeout[bfq_bfqq_sync(bfqq)] *
+ timeout_coeff));
+}
+
+/*
+ * Move request from internal lists to the request queue dispatch list.
+ */
+static void bfq_dispatch_insert(struct request_queue *q, struct request *rq)
+{
+ struct bfq_data *bfqd = q->elevator->elevator_data;
+ struct bfq_queue *bfqq = RQ_BFQQ(rq);
+
+ /*
+ * For consistency, the next instruction should have been executed
+ * after removing the request from the queue and dispatching it.
+	 * We instead execute this instruction before bfq_remove_request()
+	 * (and hence introduce a temporary inconsistency), for efficiency.
+	 * In fact, in a forced_dispatch, this prevents two counters related
+	 * to bfqq->dispatched from being uselessly decremented if bfqq is
+	 * not in service, and then incremented again after incrementing
+	 * bfqq->dispatched.
+ */
+ bfqq->dispatched++;
+ bfq_remove_request(rq);
+ elv_dispatch_sort(q, rq);
+
+ if (bfq_bfqq_sync(bfqq))
+ bfqd->sync_flight++;
+}
+
+/*
+ * Return expired entry, or NULL to just start from scratch in rbtree.
+ */
+static struct request *bfq_check_fifo(struct bfq_queue *bfqq)
+{
+ struct request *rq = NULL;
+
+ if (bfq_bfqq_fifo_expire(bfqq))
+ return NULL;
+
+ bfq_mark_bfqq_fifo_expire(bfqq);
+
+ if (list_empty(&bfqq->fifo))
+ return NULL;
+
+ rq = rq_entry_fifo(bfqq->fifo.next);
+
+ if (time_before(jiffies, rq->fifo_time))
+ return NULL;
+
+ return rq;
+}
+
+static inline unsigned long bfq_bfqq_budget_left(struct bfq_queue *bfqq)
+{
+ struct bfq_entity *entity = &bfqq->entity;
+ return entity->budget - entity->service;
+}
+
+static void __bfq_bfqq_expire(struct bfq_data *bfqd, struct bfq_queue *bfqq)
+{
+ BUG_ON(bfqq != bfqd->in_service_queue);
+
+ __bfq_bfqd_reset_in_service(bfqd);
+
+ /*
+ * If this bfqq is shared between multiple processes, check
+ * to make sure that those processes are still issuing I/Os
+ * within the mean seek distance. If not, it may be time to
+ * break the queues apart again.
+ */
+ if (bfq_bfqq_coop(bfqq) && BFQQ_SEEKY(bfqq))
+ bfq_mark_bfqq_split_coop(bfqq);
+
+ if (RB_EMPTY_ROOT(&bfqq->sort_list)) {
+		 * overloading the budget_timeout field to store the time
+		 * at which the queue was last left with no backlog; used
+		 * by the weight-raising mechanism
+ * the weight-raising mechanism
+ */
+ bfqq->budget_timeout = jiffies;
+ bfq_del_bfqq_busy(bfqd, bfqq, 1);
+ } else {
+ bfq_activate_bfqq(bfqd, bfqq);
+ /*
+ * Resort priority tree of potential close cooperators.
+ */
+ bfq_rq_pos_tree_add(bfqd, bfqq);
+ }
+}
+
+/**
+ * __bfq_bfqq_recalc_budget - try to adapt the budget to the @bfqq behavior.
+ * @bfqd: device data.
+ * @bfqq: queue to update.
+ * @reason: reason for expiration.
+ *
+ * Handle the feedback on @bfqq budget. See the body for detailed
+ * comments.
+ */
+static void __bfq_bfqq_recalc_budget(struct bfq_data *bfqd,
+ struct bfq_queue *bfqq,
+ enum bfqq_expiration reason)
+{
+ struct request *next_rq;
+ unsigned long budget, min_budget;
+
+ budget = bfqq->max_budget;
+ min_budget = bfq_min_budget(bfqd);
+
+ BUG_ON(bfqq != bfqd->in_service_queue);
+
+ bfq_log_bfqq(bfqd, bfqq, "recalc_budg: last budg %lu, budg left %lu",
+ bfqq->entity.budget, bfq_bfqq_budget_left(bfqq));
+ bfq_log_bfqq(bfqd, bfqq, "recalc_budg: last max_budg %lu, min budg %lu",
+ budget, bfq_min_budget(bfqd));
+ bfq_log_bfqq(bfqd, bfqq, "recalc_budg: sync %d, seeky %d",
+ bfq_bfqq_sync(bfqq), BFQQ_SEEKY(bfqd->in_service_queue));
+
+ if (bfq_bfqq_sync(bfqq)) {
+ switch (reason) {
+ /*
+ * Caveat: in all the following cases we trade latency
+ * for throughput.
+ */
+ case BFQ_BFQQ_TOO_IDLE:
+ /*
+ * This is the only case where we may reduce
+ * the budget: if there is no request of the
+ * process still waiting for completion, then
+ * we assume (tentatively) that the timer has
+ * expired because the batch of requests of
+ * the process could have been served with a
+			 * smaller budget. Hence, betting that the
+ * process will behave in the same way when it
+ * becomes backlogged again, we reduce its
+ * next budget. As long as we guess right,
+ * this budget cut reduces the latency
+ * experienced by the process.
+ *
+ * However, if there are still outstanding
+ * requests, then the process may have not yet
+ * issued its next request just because it is
+ * still waiting for the completion of some of
+ * the still outstanding ones. So in this
+ * subcase we do not reduce its budget, on the
+ * contrary we increase it to possibly boost
+ * the throughput, as discussed in the
+ * comments to the BUDGET_TIMEOUT case.
+ */
+ if (bfqq->dispatched > 0) /* still outstanding reqs */
+ budget = min(budget * 2, bfqd->bfq_max_budget);
+ else {
+ if (budget > 5 * min_budget)
+ budget -= 4 * min_budget;
+ else
+ budget = min_budget;
+ }
+ break;
+ case BFQ_BFQQ_BUDGET_TIMEOUT:
+ /*
+ * We double the budget here because: 1) it
+ * gives the chance to boost the throughput if
+ * this is not a seeky process (which may have
+ * bumped into this timeout because of, e.g.,
+ * ZBR), 2) together with charge_full_budget
+ * it helps give seeky processes higher
+ * timestamps, and hence be served less
+ * frequently.
+ */
+ budget = min(budget * 2, bfqd->bfq_max_budget);
+ break;
+ case BFQ_BFQQ_BUDGET_EXHAUSTED:
+ /*
+ * The process still has backlog, and did not
+ * let either the budget timeout or the disk
+ * idling timeout expire. Hence it is not
+ * seeky, has a short thinktime and may be
+ * happy with a higher budget too. So
+ * definitely increase the budget of this good
+ * candidate to boost the disk throughput.
+ */
+ budget = min(budget * 4, bfqd->bfq_max_budget);
+ break;
+ case BFQ_BFQQ_NO_MORE_REQUESTS:
+ /*
+ * Leave the budget unchanged.
+ */
+ default:
+ return;
+ }
+ } else /* async queue */
+		/* async queues always get the maximum possible budget
+ * (their ability to dispatch is limited by
+ * @bfqd->bfq_max_budget_async_rq).
+ */
+ budget = bfqd->bfq_max_budget;
+
+ bfqq->max_budget = budget;
+
+ if (bfqd->budgets_assigned >= 194 && bfqd->bfq_user_max_budget == 0 &&
+ bfqq->max_budget > bfqd->bfq_max_budget)
+ bfqq->max_budget = bfqd->bfq_max_budget;
+
+ /*
+ * Make sure that we have enough budget for the next request.
+ * Since the finish time of the bfqq must be kept in sync with
+ * the budget, be sure to call __bfq_bfqq_expire() after the
+ * update.
+ */
+ next_rq = bfqq->next_rq;
+ if (next_rq != NULL)
+ bfqq->entity.budget = max_t(unsigned long, bfqq->max_budget,
+ bfq_serv_to_charge(next_rq, bfqq));
+ else
+ bfqq->entity.budget = bfqq->max_budget;
+
+ bfq_log_bfqq(bfqd, bfqq, "head sect: %u, new budget %lu",
+ next_rq != NULL ? blk_rq_sectors(next_rq) : 0,
+ bfqq->entity.budget);
+}
+
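+/*
+ * As a rough example, a drive sustaining about 100 MiB/s (roughly 0.2
+ * sectors/usec with 512-byte sectors) with the default 125 ms sync
+ * timeout yields an autotuned max budget of roughly 25000 sectors.
+ */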
+static unsigned long bfq_calc_max_budget(u64 peak_rate, u64 timeout)
+{
+ unsigned long max_budget;
+
+ /*
+ * The max_budget calculated when autotuning is equal to the
+	 * number of sectors transferred in timeout_sync at the
+ * estimated peak rate.
+ */
+ max_budget = (unsigned long)(peak_rate * 1000 *
+ timeout >> BFQ_RATE_SHIFT);
+
+ return max_budget;
+}
+
+/*
+ * In addition to updating the peak rate, checks whether the process
+ * is "slow", and returns 1 if so. This slow flag is used, in addition
+ * to the budget timeout, to reduce the amount of service provided to
+ * seeky processes, and hence reduce their chances to lower the
+ * throughput. See the code for more details.
+ */
+static int bfq_update_peak_rate(struct bfq_data *bfqd, struct bfq_queue *bfqq,
+ int compensate, enum bfqq_expiration reason)
+{
+ u64 bw, usecs, expected, timeout;
+ ktime_t delta;
+ int update = 0;
+
+ if (!bfq_bfqq_sync(bfqq) || bfq_bfqq_budget_new(bfqq))
+ return 0;
+
+ if (compensate)
+ delta = bfqd->last_idling_start;
+ else
+ delta = ktime_get();
+ delta = ktime_sub(delta, bfqd->last_budget_start);
+ usecs = ktime_to_us(delta);
+
+ /* Don't trust short/unrealistic values. */
+ if (usecs < 100 || usecs >= LONG_MAX)
+ return 0;
+
+ /*
+ * Calculate the bandwidth for the last slice. We use a 64 bit
+ * value to store the peak rate, in sectors per usec in fixed
+ * point math. We do so to have enough precision in the estimate
+ * and to avoid overflows.
+ */
+ bw = (u64)bfqq->entity.service << BFQ_RATE_SHIFT;
+ do_div(bw, (unsigned long)usecs);
+
+ timeout = jiffies_to_msecs(bfqd->bfq_timeout[BLK_RW_SYNC]);
+
+ /*
+ * Use only long (> 20ms) intervals to filter out spikes for
+ * the peak rate estimation.
+ */
+ if (usecs > 20000) {
+ if (bw > bfqd->peak_rate ||
+ (!BFQQ_SEEKY(bfqq) &&
+ reason == BFQ_BFQQ_BUDGET_TIMEOUT)) {
+ bfq_log(bfqd, "measured bw =%llu", bw);
+ /*
+ * To smooth oscillations use a low-pass filter with
+ * alpha=7/8, i.e.,
+ * new_rate = (7/8) * old_rate + (1/8) * bw
+ */
+ do_div(bw, 8);
+ if (bw == 0)
+ return 0;
+ bfqd->peak_rate *= 7;
+ do_div(bfqd->peak_rate, 8);
+ bfqd->peak_rate += bw;
+ update = 1;
+ bfq_log(bfqd, "new peak_rate=%llu", bfqd->peak_rate);
+ }
+
+ update |= bfqd->peak_rate_samples == BFQ_PEAK_RATE_SAMPLES - 1;
+
+ if (bfqd->peak_rate_samples < BFQ_PEAK_RATE_SAMPLES)
+ bfqd->peak_rate_samples++;
+
+ if (bfqd->peak_rate_samples == BFQ_PEAK_RATE_SAMPLES &&
+ update) {
+ int dev_type = blk_queue_nonrot(bfqd->queue);
+ if (bfqd->bfq_user_max_budget == 0) {
+ bfqd->bfq_max_budget =
+ bfq_calc_max_budget(bfqd->peak_rate,
+ timeout);
+ bfq_log(bfqd, "new max_budget=%lu",
+ bfqd->bfq_max_budget);
+ }
+ if (bfqd->device_speed == BFQ_BFQD_FAST &&
+ bfqd->peak_rate < device_speed_thresh[dev_type]) {
+ bfqd->device_speed = BFQ_BFQD_SLOW;
+ bfqd->RT_prod = R_slow[dev_type] *
+ T_slow[dev_type];
+ } else if (bfqd->device_speed == BFQ_BFQD_SLOW &&
+ bfqd->peak_rate > device_speed_thresh[dev_type]) {
+ bfqd->device_speed = BFQ_BFQD_FAST;
+ bfqd->RT_prod = R_fast[dev_type] *
+ T_fast[dev_type];
+ }
+ }
+ }
+
+ /*
+	 * If the process has been served for too short a time
+	 * interval to let its possible sequential accesses prevail over
+	 * the initial seek time needed to move the disk head to the
+	 * first sector it requested, then give the process a chance
+	 * and, for the moment, return false.
+ */
+ if (bfqq->entity.budget <= bfq_max_budget(bfqd) / 8)
+ return 0;
+
+ /*
+ * A process is considered ``slow'' (i.e., seeky, so that we
+ * cannot treat it fairly in the service domain, as it would
+	 * slow down the other processes too much) if, when a slice
+ * ends for whatever reason, it has received service at a
+ * rate that would not be high enough to complete the budget
+ * before the budget timeout expiration.
+ */
+ expected = bw * 1000 * timeout >> BFQ_RATE_SHIFT;
+
+ /*
+ * Caveat: processes doing IO in the slower disk zones will
+ * tend to be slow(er) even if not seeky. And the estimated
+ * peak rate will actually be an average over the disk
+ * surface. Hence, to not be too harsh with unlucky processes,
+ * we keep a budget/3 margin of safety before declaring a
+ * process slow.
+ */
+ return expected > (4 * bfqq->entity.budget) / 3;
+}
+
+/*
+ * To be deemed as soft real-time, an application must meet two requirements.
+ * First, the application must not require an average bandwidth higher than
+ * the approximate bandwidth required to playback or record a compressed high-
+ * definition video.
+ * The next function is invoked on the completion of the last request of a
+ * batch, to compute the next-start time instant, soft_rt_next_start, such
+ * that, if the next request of the application does not arrive before
+ * soft_rt_next_start, then the above requirement on the bandwidth is met.
+ *
+ * The second requirement is that the request pattern of the application is
+ * isochronous, i.e., that, after issuing a request or a batch of requests,
+ * the application stops issuing new requests until all its pending requests
+ * have been completed. After that, the application may issue a new batch,
+ * and so on.
+ * For this reason the next function is invoked to compute soft_rt_next_start
+ * only for applications that meet this requirement, whereas soft_rt_next_start
+ * is set to infinity for applications that do not.
+ *
+ * Unfortunately, even a greedy application may happen to behave in an
+ * isochronous way if the CPU load is high. In fact, the application may stop
+ * issuing requests while the CPUs are busy serving other processes, then
+ * restart, then stop again for a while, and so on. In addition, if the disk
+ * achieves a low enough throughput with the request pattern issued by the
+ * application (e.g., because the request pattern is random and/or the device
+ * is slow), then the application may meet the above bandwidth requirement too.
+ * To prevent such a greedy application from being deemed soft real-time, a
+ * further rule is used in the computation of soft_rt_next_start:
+ * soft_rt_next_start must be higher than the current time plus the maximum
+ * time for which the arrival of a request is waited for when a sync queue
+ * becomes idle, namely bfqd->bfq_slice_idle.
+ * This filters out greedy applications, as the latter issue instead their next
+ * request as soon as possible after the last one has been completed (in
+ * contrast, when a batch of requests is completed, a soft real-time application
+ * spends some time processing data).
+ *
+ * Unfortunately, the last filter may easily generate false positives if only
+ * bfqd->bfq_slice_idle is used as a reference time interval and one or both
+ * the following cases occur:
+ * 1) HZ is so low that the duration of a jiffy is comparable to or higher
+ * than bfqd->bfq_slice_idle. This happens, e.g., on slow devices with
+ * HZ=100.
+ * 2) jiffies, instead of increasing at a constant rate, may stop increasing
+ * for a while, then suddenly 'jump' by several units to recover the lost
+ * increments. This seems to happen, e.g., inside virtual machines.
+ * To address this issue, we do not use as a reference time interval just
+ * bfqd->bfq_slice_idle, but bfqd->bfq_slice_idle plus a few jiffies. In
+ * particular we add the minimum number of jiffies for which the filter seems
+ * to be quite precise also in embedded systems and KVM/QEMU virtual machines.
+ */
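+/*
+ * As a hypothetical example of the first term of the max() below: if
+ * bfq_wr_max_softrt_rate were 7000 sectors/sec and the queue had received
+ * 14000 sectors of service since it last became idle while backlogged,
+ * that term would evaluate to two seconds after last_idle_bklogged.
+ */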
+static inline unsigned long bfq_bfqq_softrt_next_start(struct bfq_data *bfqd,
+ struct bfq_queue *bfqq)
+{
+ return max(bfqq->last_idle_bklogged +
+ HZ * bfqq->service_from_backlogged /
+ bfqd->bfq_wr_max_softrt_rate,
+ jiffies + bfqq->bfqd->bfq_slice_idle + 4);
+}
+
+/*
+ * Return the largest-possible time instant such that, for as long as possible,
+ * the current time will be lower than this time instant according to the macro
+ * time_is_before_jiffies().
+ */
+static inline unsigned long bfq_infinity_from_now(unsigned long now)
+{
+ return now + ULONG_MAX / 2;
+}
+
+/**
+ * bfq_bfqq_expire - expire a queue.
+ * @bfqd: device owning the queue.
+ * @bfqq: the queue to expire.
+ * @compensate: if true, compensate for the time spent idling.
+ * @reason: the reason causing the expiration.
+ *
+ * If the process associated to the queue is slow (i.e., seeky), or in
+ * case of budget timeout, or, finally, if it is async, we
+ * artificially charge it an entire budget (independently of the
+ * actual service it received). As a consequence, the queue will get
+ * higher timestamps than the correct ones upon reactivation, and
+ * hence it will be rescheduled as if it had received more service
+ * than what it actually received. In the end, this class of processes
+ * will receive less service in proportion to how slowly they consume
+ * their budgets (and hence how seriously they tend to lower the
+ * throughput).
+ *
+ * In contrast, when a queue expires because it has been idling for
+ * too much or because it exhausted its budget, we do not touch the
+ * amount of service it has received. Hence when the queue will be
+ * reactivated and its timestamps updated, the latter will be in sync
+ * with the actual service received by the queue until expiration.
+ *
+ * Charging a full budget to the first type of queues and the exact
+ * service to the others has the effect of using the WF2Q+ policy to
+ * schedule the former on a timeslice basis, without violating the
+ * service domain guarantees of the latter.
+ */
+static void bfq_bfqq_expire(struct bfq_data *bfqd,
+ struct bfq_queue *bfqq,
+ int compensate,
+ enum bfqq_expiration reason)
+{
+ int slow;
+ BUG_ON(bfqq != bfqd->in_service_queue);
+
+ /* Update disk peak rate for autotuning and check whether the
+ * process is slow (see bfq_update_peak_rate).
+ */
+ slow = bfq_update_peak_rate(bfqd, bfqq, compensate, reason);
+
+ /*
+ * As above explained, 'punish' slow (i.e., seeky), timed-out
+ * and async queues, to favor sequential sync workloads.
+ *
+ * Processes doing IO in the slower disk zones will tend to be
+ * slow(er) even if not seeky. Hence, since the estimated peak
+ * rate is actually an average over the disk surface, these
+ * processes may timeout just for bad luck. To avoid punishing
+ * them we do not charge a full budget to a process that
+ * succeeded in consuming at least 2/3 of its budget.
+ */
+ if (slow || (reason == BFQ_BFQQ_BUDGET_TIMEOUT &&
+ bfq_bfqq_budget_left(bfqq) >= bfqq->entity.budget / 3))
+ bfq_bfqq_charge_full_budget(bfqq);
+
+ bfqq->service_from_backlogged += bfqq->entity.service;
+
+ if (BFQQ_SEEKY(bfqq) && reason == BFQ_BFQQ_BUDGET_TIMEOUT &&
+ !bfq_bfqq_constantly_seeky(bfqq)) {
+ bfq_mark_bfqq_constantly_seeky(bfqq);
+ if (!blk_queue_nonrot(bfqd->queue))
+ bfqd->const_seeky_busy_in_flight_queues++;
+ }
+
+ if (bfqd->low_latency && bfqq->wr_coeff == 1)
+ bfqq->last_wr_start_finish = jiffies;
+
+ if (bfqd->low_latency && bfqd->bfq_wr_max_softrt_rate > 0 &&
+ RB_EMPTY_ROOT(&bfqq->sort_list)) {
+ /*
+ * If we get here, and there are no outstanding requests,
+ * then the request pattern is isochronous (see the comments
+ * to the function bfq_bfqq_softrt_next_start()). Hence we can
+ * compute soft_rt_next_start. If, instead, the queue still
+ * has outstanding requests, then we have to wait for the
+ * completion of all the outstanding requests to discover
+ * whether the request pattern is actually isochronous.
+ */
+ if (bfqq->dispatched == 0)
+ bfqq->soft_rt_next_start =
+ bfq_bfqq_softrt_next_start(bfqd, bfqq);
+ else {
+ /*
+ * The application is still waiting for the
+ * completion of one or more requests:
+ * prevent it from possibly being incorrectly
+ * deemed as soft real-time by setting its
+ * soft_rt_next_start to infinity. In fact,
+ * without this assignment, the application
+ * would be incorrectly deemed as soft
+ * real-time if:
+ * 1) it issued a new request before the
+ * completion of all its in-flight
+ * requests, and
+ * 2) at that time, its soft_rt_next_start
+ * happened to be in the past.
+ */
+ bfqq->soft_rt_next_start =
+ bfq_infinity_from_now(jiffies);
+ /*
+ * Schedule an update of soft_rt_next_start to when
+ * the task may be discovered to be isochronous.
+ */
+ bfq_mark_bfqq_softrt_update(bfqq);
+ }
+ }
+
+ bfq_log_bfqq(bfqd, bfqq,
+ "expire (%d, slow %d, num_disp %d, idle_win %d)", reason, slow,
+ bfqq->dispatched, bfq_bfqq_idle_window(bfqq));
+
+ /* Increase, decrease or leave budget unchanged according to reason */
+ __bfq_bfqq_recalc_budget(bfqd, bfqq, reason);
+ __bfq_bfqq_expire(bfqd, bfqq);
+}
+
+/*
+ * Budget timeout is not implemented through a dedicated timer, but
+ * just checked on request arrivals and completions, as well as on
+ * idle timer expirations.
+ */
+static int bfq_bfqq_budget_timeout(struct bfq_queue *bfqq)
+{
+ if (bfq_bfqq_budget_new(bfqq) ||
+ time_before(jiffies, bfqq->budget_timeout))
+ return 0;
+ return 1;
+}
+
+/*
+ * If we expire a queue that is waiting for the arrival of a new
+ * request, we may prevent the fictitious timestamp back-shifting that
+ * allows the guarantees of the queue to be preserved (see [1] for
+ * this tricky aspect). Hence we return true only if this condition
+ * does not hold, or if the queue is slow enough to deserve only to be
+ * kicked off for preserving a high throughput.
+ */
+static inline int bfq_may_expire_for_budg_timeout(struct bfq_queue *bfqq)
+{
+ bfq_log_bfqq(bfqq->bfqd, bfqq,
+ "may_budget_timeout: wait_request %d left %d timeout %d",
+ bfq_bfqq_wait_request(bfqq),
+ bfq_bfqq_budget_left(bfqq) >= bfqq->entity.budget / 3,
+ bfq_bfqq_budget_timeout(bfqq));
+
+ return (!bfq_bfqq_wait_request(bfqq) ||
+ bfq_bfqq_budget_left(bfqq) >= bfqq->entity.budget / 3)
+ &&
+ bfq_bfqq_budget_timeout(bfqq);
+}
+
+/*
+ * Device idling is allowed only for the queues for which this function returns
+ * true. For this reason, the return value of this function plays a critical
+ * role for both throughput boosting and service guarantees. This return value
+ * is computed through a logical expression. In this rather long comment, we
+ * try to briefly describe all the details and motivations behind the
+ * components of this logical expression.
+ *
+ * First, the expression may be true only for sync queues. Besides, if bfqq is
+ * also being weight-raised, then the expression always evaluates to true, as
+ * device idling is instrumental for preserving low-latency guarantees
+ * (see [1]). Otherwise, the expression evaluates to true only if bfqq has
+ * a non-null idle window and either the device is not performing NCQ
+ * (because, when both of the last two conditions hold, idling most certainly
+ * boosts the throughput), or the following compound condition is true.
+ *
+ * The compound condition contains a first component that lets the whole
+ * compound condition evaluate to false if there is at least one
+ * weight-raised busy queue. This guarantees that, in this case, the device
+ * is not idled for a sync non-weight-raised queue. The latter is then expired
+ * immediately if empty. Combined with the timestamping rules of BFQ (see [1]
+ * for details), this causes sync non-weight-raised queues to get a lower
+ * number of requests served, and hence to ask for a lower number of requests
+ * from the request pool, before the busy weight-raised queues get served
+ * again.
+ *
+ * This is beneficial for the processes associated with weight-raised queues,
+ * when the system operates in request-pool saturation conditions (e.g., in
+ * the presence of write hogs). In fact, if the processes associated with the
+ * other queues ask for requests at a lower rate, then weight-raised processes
+ * have a higher probability to get a request from the pool immediately (or at
+ * least soon) when they need one. Hence they have a higher probability to
+ * actually get a fraction of the disk throughput proportional to their high
+ * weight. This is especially true with NCQ-enabled drives, which enqueue
+ * several requests in advance and further reorder internally-queued requests.
+ *
+ * In the end, mistreating non-weight-raised queues when there are busy weight-
+ * raised queues seems to mitigate starvation problems in the presence of heavy
+ * write workloads and NCQ, and hence to guarantee a higher application and
+ * system responsiveness in these hostile scenarios.
+ *
+ * If the first component of the compound condition is instead true (i.e.,
+ * there is no weight-raised busy queue), then the rest of the compound
+ * condition takes into account service-guarantee and throughput issues.
+ *
+ * As for service guarantees, allowing the drive to enqueue more than one
+ * request at a time, and hence delegating de facto final scheduling decisions
+ * to the drive's internal scheduler, causes loss of control on the actual
+ * request service order. In this respect, when the drive is allowed to
+ * enqueue more than one request at a time, the service distribution enforced
+ * by the drive's internal scheduler is likely to coincide with the desired
+ * device-throughput distribution only in the following, perfectly symmetric,
+ * scenario:
+ * 1) all active queues have the same weight,
+ * 2) all active groups at the same level in the groups tree have the same
+ * weight,
+ * 3) all active groups at the same level in the groups tree have the same
+ * number of children.
+ *
+ * Even in such a scenario, sequential I/O may still receive a preferential
+ * treatment, but this is not likely to be a big issue with flash-based
+ * devices, because of their non-dramatic loss of throughput with random I/O.
+ * Things do differ with HDDs, for which additional care is taken, as
+ * explained after completing the discussion for flash-based devices.
+ *
+ * Unfortunately, keeping the necessary state for evaluating exactly the above
+ * symmetry conditions would be quite complex and time consuming. Therefore BFQ
+ * evaluates instead the following stronger sub-conditions, for which it is
+ * much easier to maintain the needed state:
+ * 1) all active queues have the same weight,
+ * 2) all active groups have the same weight,
+ * 3) all active groups have at most one active child each.
+ * In particular, the last two conditions are always true if hierarchical
+ * support and the cgroups interface are not enabled, hence no state needs
+ * to be maintained.
+ *
+ * According to the above considerations, the compound condition evaluates
+ * to true and hence idling is performed if any of the above symmetry
+ * sub-conditions does not hold. These are the only sub-conditions considered
+ * if the device is flash-based, as, for such a device, it is sensible to
+ * force idling only for service-guarantee issues. In fact, as for throughput,
+ * idling NCQ-capable flash-based devices would not boost the throughput even
+ * with sequential I/O; rather it would lower the throughput in proportion to
+ * how fast the device is. In the end, (only) if all three sub-conditions
+ * hold and the device is flash-based, then the compound condition evaluates
+ * to false and hence no idling is performed.
+ *
+ * As already said, things change with a rotational device, where idling boosts
+ * the throughput with sequential I/O (even with NCQ). Hence, for such a device
+ * the compound condition evaluates to true and idling is performed also if the
+ * following additional sub-condition does not hold: the queue is (constantly)
+ * seeky. Unfortunately, this different behavior with respect to flash-based
+ * devices causes an additional asymmetry: if some sync queues enjoy idling and
+ * some other sync queues do not, then the latter get a low share of the device
+ * bandwidth, simply because the former get many requests served after being
+ * set as in service, whereas the latter do not. As a consequence, to
+ * guarantee the desired bandwidth distribution, on HDDs the compound
+ * expression evaluates to true (and hence device idling is performed) also
+ * if the following last symmetry condition does not hold: no other queue is
+ * benefiting from idling.
+ * Also this last condition is actually replaced with a simpler-to-maintain
+ * and stronger condition: there is no busy queue which is not seeky (and
+ * hence may also benefit from idling).
+ *
+ * To sum up, when all the required symmetry and throughput-boosting
+ * sub-conditions hold, the compound condition evaluates to false, and hence
+ * no idling is performed. This helps to keep the drives' internal queues full
+ * on NCQ-capable devices, and hence to boost the throughput, without causing
+ * 'almost' any loss of service guarantees. The 'almost' follows from the fact
+ * that, if the internal queue of one such device is filled while all the
+ * sub-conditions hold, but at some point in time some sub-condition ceases to
+ * hold, then it may become impossible to let requests be served in the new
+ * desired order until all the requests already queued in the device have been
+ * served.
+ */
+static inline bool bfq_bfqq_must_not_expire(struct bfq_queue *bfqq)
+{
+ struct bfq_data *bfqd = bfqq->bfqd;
+#ifdef CONFIG_CGROUP_BFQIO
+#define symmetric_scenario (!bfqd->active_numerous_groups && \
+ !bfq_differentiated_weights(bfqd))
+#else
+#define symmetric_scenario (!bfq_differentiated_weights(bfqd))
+#endif
+#define cond_for_seeky_on_ncq_hdd (bfq_bfqq_constantly_seeky(bfqq) && \
+ bfqd->busy_in_flight_queues == \
+ bfqd->const_seeky_busy_in_flight_queues)
+/*
+ * Condition for expiring a non-weight-raised queue (and hence not idling
+ * the device).
+ */
+#define cond_for_expiring_non_wr (bfqd->hw_tag && \
+ (bfqd->raised_busy_queues > 0 || \
+ (symmetric_scenario && \
+ (blk_queue_nonrot(bfqd->queue) || \
+ cond_for_seeky_on_ncq_hdd))))
+
+ return bfq_bfqq_sync(bfqq) && (
+ bfqq->wr_coeff > 1 ||
+ (bfq_bfqq_idle_window(bfqq) &&
+ !cond_for_expiring_non_wr)
+ );
+}
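+
+/*
+ * For readability, the value returned by bfq_bfqq_must_not_expire() above
+ * roughly expands to the following (informal notation, mirroring the
+ * macros defined in the function):
+ *
+ *	sync(bfqq) &&
+ *	(bfqq is weight-raised ||
+ *	 (idle_window(bfqq) &&
+ *	  !(hw_tag &&
+ *	    (some busy queue is weight-raised ||
+ *	     (symmetric_scenario &&
+ *	      (device is non-rotational ||
+ *	       bfqq and all busy in-flight queues are constantly seeky))))))
+ */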
+
+/*
+ * If the in-service queue is empty, but it is sync and either of the following
+ * conditions holds, then: 1) the queue must remain in service and cannot be
+ * expired, and 2) the disk must be idled to wait for the possible arrival
+ * of a new request for the queue. The conditions are:
+ * - the device is rotational and not performing NCQ, and the queue has its
+ * idle window set (in this case, waiting for a new request for the queue
+ * is likely to boost the disk throughput);
+ * - the queue is weight-raised (waiting for the request is necessary to
+ * provide the queue with fairness and latency guarantees, see [1] for
+ * details).
+ */
+static inline bool bfq_bfqq_must_idle(struct bfq_queue *bfqq)
+{
+ struct bfq_data *bfqd = bfqq->bfqd;
+
+ return RB_EMPTY_ROOT(&bfqq->sort_list) && bfqd->bfq_slice_idle != 0 &&
+ bfq_bfqq_must_not_expire(bfqq);
+}
+
+/*
+ * Select a queue for service. If we have a current queue in service,
+ * check whether to continue servicing it, or retrieve and set a new one.
+ */
+static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
+{
+ struct bfq_queue *bfqq;
+ struct request *next_rq;
+ enum bfqq_expiration reason = BFQ_BFQQ_BUDGET_TIMEOUT;
+
+ bfqq = bfqd->in_service_queue;
+ if (bfqq == NULL)
+ goto new_queue;
+
+ bfq_log_bfqq(bfqd, bfqq, "select_queue: already in-service queue");
+
+ if (bfq_may_expire_for_budg_timeout(bfqq) &&
+ !timer_pending(&bfqd->idle_slice_timer) &&
+ !bfq_bfqq_must_idle(bfqq))
+ goto expire;
+
+ next_rq = bfqq->next_rq;
+ /*
+ * If bfqq has requests queued and it has enough budget left to
+ * serve them, keep the queue, otherwise expire it.
+ */
+ if (next_rq != NULL) {
+ if (bfq_serv_to_charge(next_rq, bfqq) >
+ bfq_bfqq_budget_left(bfqq)) {
+ reason = BFQ_BFQQ_BUDGET_EXHAUSTED;
+ goto expire;
+ } else {
+ /*
+ * The idle timer may be pending because we may not
+ * disable disk idling even when a new request arrives
+ */
+ if (timer_pending(&bfqd->idle_slice_timer)) {
+ /*
+ * If we get here: 1) at least a new request
+ * has arrived but we have not disabled the
+ * timer because the request was too small,
+ * 2) then the block layer has unplugged the
+ * device, causing the dispatch to be invoked.
+ *
+ * Since the device is unplugged, now the
+ * requests are probably large enough to
+ * provide a reasonable throughput.
+ * So we disable idling.
+ */
+ bfq_clear_bfqq_wait_request(bfqq);
+ del_timer(&bfqd->idle_slice_timer);
+ }
+ goto keep_queue;
+ }
+ }
+
+ /*
+ * No requests pending. If the in-service queue still has requests in
+ * flight (possibly waiting for a completion) or is idling for a new
+ * request, then keep it.
+ */
+ if (timer_pending(&bfqd->idle_slice_timer) ||
+ (bfqq->dispatched != 0 && bfq_bfqq_must_not_expire(bfqq))) {
+ bfqq = NULL;
+ goto keep_queue;
+ }
+
+ reason = BFQ_BFQQ_NO_MORE_REQUESTS;
+expire:
+ bfq_bfqq_expire(bfqd, bfqq, 0, reason);
+new_queue:
+ bfqq = bfq_set_in_service_queue(bfqd);
+ bfq_log(bfqd, "select_queue: new queue %d returned",
+ bfqq != NULL ? bfqq->pid : 0);
+keep_queue:
+ return bfqq;
+}
+
+static void bfq_update_raising_data(struct bfq_data *bfqd,
+ struct bfq_queue *bfqq)
+{
+ struct bfq_entity *entity = &bfqq->entity;
+ if (bfqq->wr_coeff > 1) { /* queue is being weight-raised */
+ bfq_log_bfqq(bfqd, bfqq,
+ "raising period dur %u/%u msec, old raising coeff %u, w %d(%d)",
+ jiffies_to_msecs(jiffies -
+ bfqq->last_wr_start_finish),
+ jiffies_to_msecs(bfqq->wr_cur_max_time),
+ bfqq->wr_coeff,
+ bfqq->entity.weight, bfqq->entity.orig_weight);
+
+ BUG_ON(bfqq != bfqd->in_service_queue && entity->weight !=
+ entity->orig_weight * bfqq->wr_coeff);
+ if (entity->ioprio_changed)
+ bfq_log_bfqq(bfqd, bfqq,
+ "WARN: pending prio change");
+ /*
+ * If too much time has elapsed from the beginning
+ * of this weight-raising period, stop it.
+ */
+ if (time_is_before_jiffies(bfqq->last_wr_start_finish +
+ bfqq->wr_cur_max_time)) {
+ bfqq->last_wr_start_finish = jiffies;
+ bfq_log_bfqq(bfqd, bfqq,
+ "wrais ending at %lu, rais_max_time %u",
+ bfqq->last_wr_start_finish,
+				     jiffies_to_msecs(bfqq->wr_cur_max_time));
+ bfq_bfqq_end_wr(bfqq);
+ }
+ }
+ /* Update weight both if it must be raised and if it must be lowered */
+ if ((entity->weight > entity->orig_weight) != (bfqq->wr_coeff > 1))
+ __bfq_entity_update_weight_prio(
+ bfq_entity_service_tree(entity),
+ entity);
+}
+
+/*
+ * Dispatch one request from bfqq, moving it to the request queue
+ * dispatch list.
+ */
+static int bfq_dispatch_request(struct bfq_data *bfqd,
+ struct bfq_queue *bfqq)
+{
+ int dispatched = 0;
+ struct request *rq;
+ unsigned long service_to_charge;
+
+ BUG_ON(RB_EMPTY_ROOT(&bfqq->sort_list));
+
+ /* Follow expired path, else get first next available. */
+ rq = bfq_check_fifo(bfqq);
+ if (rq == NULL)
+ rq = bfqq->next_rq;
+ service_to_charge = bfq_serv_to_charge(rq, bfqq);
+
+ if (service_to_charge > bfq_bfqq_budget_left(bfqq)) {
+ /*
+ * This may happen if the next rq is chosen
+ * in fifo order instead of sector order.
+ * The budget is properly dimensioned
+ * to be always sufficient to serve the next request
+ * only if it is chosen in sector order. The reason is
+		 * that it would be quite inefficient and of little use
+ * to always make sure that the budget is large enough
+ * to serve even the possible next rq in fifo order.
+ * In fact, requests are seldom served in fifo order.
+ *
+ * Expire the queue for budget exhaustion, and
+ * make sure that the next act_budget is enough
+ * to serve the next request, even if it comes
+ * from the fifo expired path.
+ */
+ bfqq->next_rq = rq;
+ /*
+		 * Since this dispatch failed, make sure that
+ * a new one will be performed
+ */
+ if (!bfqd->rq_in_driver)
+ bfq_schedule_dispatch(bfqd);
+ goto expire;
+ }
+
+ /* Finally, insert request into driver dispatch list. */
+ bfq_bfqq_served(bfqq, service_to_charge);
+ bfq_dispatch_insert(bfqd->queue, rq);
+
+ bfq_update_raising_data(bfqd, bfqq);
+
+ bfq_log_bfqq(bfqd, bfqq,
+ "dispatched %u sec req (%llu), budg left %lu",
+ blk_rq_sectors(rq),
+ (long long unsigned)blk_rq_pos(rq),
+ bfq_bfqq_budget_left(bfqq));
+
+ dispatched++;
+
+ if (bfqd->in_service_bic == NULL) {
+ atomic_long_inc(&RQ_BIC(rq)->icq.ioc->refcount);
+ bfqd->in_service_bic = RQ_BIC(rq);
+ }
+
+ if (bfqd->busy_queues > 1 && ((!bfq_bfqq_sync(bfqq) &&
+ dispatched >= bfqd->bfq_max_budget_async_rq) ||
+ bfq_class_idle(bfqq)))
+ goto expire;
+
+ return dispatched;
+
+expire:
+ bfq_bfqq_expire(bfqd, bfqq, 0, BFQ_BFQQ_BUDGET_EXHAUSTED);
+ return dispatched;
+}
+
+static int __bfq_forced_dispatch_bfqq(struct bfq_queue *bfqq)
+{
+ int dispatched = 0;
+
+ while (bfqq->next_rq != NULL) {
+ bfq_dispatch_insert(bfqq->bfqd->queue, bfqq->next_rq);
+ dispatched++;
+ }
+
+ BUG_ON(!list_empty(&bfqq->fifo));
+ return dispatched;
+}
+
+/*
+ * Drain our current requests. Used for barriers and when switching
+ * io schedulers on-the-fly.
+ */
+static int bfq_forced_dispatch(struct bfq_data *bfqd)
+{
+ struct bfq_queue *bfqq, *n;
+ struct bfq_service_tree *st;
+ int dispatched = 0;
+
+ bfqq = bfqd->in_service_queue;
+ if (bfqq != NULL)
+ __bfq_bfqq_expire(bfqd, bfqq);
+
+ /*
+ * Loop through classes, and be careful to leave the scheduler
+ * in a consistent state, as feedback mechanisms and vtime
+ * updates cannot be disabled during the process.
+ */
+ list_for_each_entry_safe(bfqq, n, &bfqd->active_list, bfqq_list) {
+ st = bfq_entity_service_tree(&bfqq->entity);
+
+ dispatched += __bfq_forced_dispatch_bfqq(bfqq);
+ bfqq->max_budget = bfq_max_budget(bfqd);
+
+ bfq_forget_idle(st);
+ }
+
+ BUG_ON(bfqd->busy_queues != 0);
+
+ return dispatched;
+}
+
+static int bfq_dispatch_requests(struct request_queue *q, int force)
+{
+ struct bfq_data *bfqd = q->elevator->elevator_data;
+ struct bfq_queue *bfqq;
+ int max_dispatch;
+
+ bfq_log(bfqd, "dispatch requests: %d busy queues", bfqd->busy_queues);
+ if (bfqd->busy_queues == 0)
+ return 0;
+
+ if (unlikely(force))
+ return bfq_forced_dispatch(bfqd);
+
+ bfqq = bfq_select_queue(bfqd);
+ if (bfqq == NULL)
+ return 0;
+
+ max_dispatch = bfqd->bfq_quantum;
+ if (bfq_class_idle(bfqq))
+ max_dispatch = 1;
+
+ if (!bfq_bfqq_sync(bfqq))
+ max_dispatch = bfqd->bfq_max_budget_async_rq;
+
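+	/*
+	 * With more than one busy queue, never let bfqq exceed
+	 * max_dispatch; a lone busy queue may instead keep dispatching,
+	 * up to four times max_dispatch, to keep the device utilized.
+	 */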
+ if (bfqq->dispatched >= max_dispatch) {
+ if (bfqd->busy_queues > 1)
+ return 0;
+ if (bfqq->dispatched >= 4 * max_dispatch)
+ return 0;
+ }
+
+ if (bfqd->sync_flight != 0 && !bfq_bfqq_sync(bfqq))
+ return 0;
+
+ bfq_clear_bfqq_wait_request(bfqq);
+ BUG_ON(timer_pending(&bfqd->idle_slice_timer));
+
+ if (!bfq_dispatch_request(bfqd, bfqq))
+ return 0;
+
+ bfq_log_bfqq(bfqd, bfqq, "dispatched one request of %d (max_disp %d)",
+ bfqq->pid, max_dispatch);
+
+ return 1;
+}
+
+/*
+ * Task holds one reference to the queue, dropped when task exits. Each rq
+ * in-flight on this queue also holds a reference, dropped when rq is freed.
+ *
+ * Queue lock must be held here.
+ */
+static void bfq_put_queue(struct bfq_queue *bfqq)
+{
+ struct bfq_data *bfqd = bfqq->bfqd;
+
+ BUG_ON(atomic_read(&bfqq->ref) <= 0);
+
+ bfq_log_bfqq(bfqd, bfqq, "put_queue: %p %d", bfqq,
+ atomic_read(&bfqq->ref));
+ if (!atomic_dec_and_test(&bfqq->ref))
+ return;
+
+ BUG_ON(rb_first(&bfqq->sort_list) != NULL);
+ BUG_ON(bfqq->allocated[READ] + bfqq->allocated[WRITE] != 0);
+ BUG_ON(bfqq->entity.tree != NULL);
+ BUG_ON(bfq_bfqq_busy(bfqq));
+ BUG_ON(bfqd->in_service_queue == bfqq);
+
+ bfq_log_bfqq(bfqd, bfqq, "put_queue: %p freed", bfqq);
+
+ kmem_cache_free(bfq_pool, bfqq);
+}
+
+static void bfq_put_cooperator(struct bfq_queue *bfqq)
+{
+ struct bfq_queue *__bfqq, *next;
+
+ /*
+ * If this queue was scheduled to merge with another queue, be
+ * sure to drop the reference taken on that queue (and others in
+ * the merge chain). See bfq_setup_merge and bfq_merge_bfqqs.
+ */
+ __bfqq = bfqq->new_bfqq;
+ while (__bfqq) {
+ if (__bfqq == bfqq) {
+ WARN(1, "bfqq->new_bfqq loop detected.\n");
+ break;
+ }
+ next = __bfqq->new_bfqq;
+ bfq_put_queue(__bfqq);
+ __bfqq = next;
+ }
+}
+
+static void bfq_exit_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq)
+{
+ if (bfqq == bfqd->in_service_queue) {
+ __bfq_bfqq_expire(bfqd, bfqq);
+ bfq_schedule_dispatch(bfqd);
+ }
+
+ bfq_log_bfqq(bfqd, bfqq, "exit_bfqq: %p, %d", bfqq,
+ atomic_read(&bfqq->ref));
+
+ bfq_put_cooperator(bfqq);
+
+ bfq_put_queue(bfqq);
+}
+
+static void bfq_init_icq(struct io_cq *icq)
+{
+ struct bfq_io_cq *bic = icq_to_bic(icq);
+
+ bic->ttime.last_end_request = jiffies;
+ /*
+ * A newly created bic indicates that the process has just
+ * started doing I/O, and is probably mapping into memory its
+ * executable and libraries: it definitely needs weight raising.
+ * There is however the possibility that the process performs,
+ * for a while, I/O close to some other process. EQM intercepts
+ * this behavior and may merge the queue corresponding to the
+ * process with some other queue, BEFORE the weight of the queue
+ * is raised. Merged queues are not weight-raised (they are assumed
+ * to belong to processes that benefit only from high throughput).
+ * If the merge is basically the consequence of an accident, then
+ * the queue will be split soon and will get back its old weight.
+ * It is then important to write down somewhere that this queue
+ * does need weight raising, even if it did not make it to get its
+ * weight raised before being merged. To this purpose, we overload
+ * the field raising_time_left and assign 1 to it, to mark the queue
+	 * the field wr_time_left and assign 1 to it, to mark the queue
+ */
+ bic->wr_time_left = 1;
+}
+
+static void bfq_exit_icq(struct io_cq *icq)
+{
+ struct bfq_io_cq *bic = icq_to_bic(icq);
+ struct bfq_data *bfqd = bic_to_bfqd(bic);
+
+ if (bic->bfqq[BLK_RW_ASYNC]) {
+ bfq_exit_bfqq(bfqd, bic->bfqq[BLK_RW_ASYNC]);
+ bic->bfqq[BLK_RW_ASYNC] = NULL;
+ }
+
+ if (bic->bfqq[BLK_RW_SYNC]) {
+ /*
+ * If the bic is using a shared queue, put the reference
+ * taken on the io_context when the bic started using a
+ * shared bfq_queue.
+ */
+ if (bfq_bfqq_coop(bic->bfqq[BLK_RW_SYNC]))
+ put_io_context(icq->ioc);
+ bfq_exit_bfqq(bfqd, bic->bfqq[BLK_RW_SYNC]);
+ bic->bfqq[BLK_RW_SYNC] = NULL;
+ }
+}
+
+/*
+ * Update the entity prio values; note that the new values will not
+ * be used until the next (re)activation.
+ */
+static void bfq_init_prio_data(struct bfq_queue *bfqq, struct bfq_io_cq *bic)
+{
+ struct task_struct *tsk = current;
+ int ioprio_class;
+
+ if (!bfq_bfqq_prio_changed(bfqq))
+ return;
+
+ ioprio_class = IOPRIO_PRIO_CLASS(bic->ioprio);
+ switch (ioprio_class) {
+ default:
+ dev_err(bfqq->bfqd->queue->backing_dev_info.dev,
+ "bfq: bad prio %x\n", ioprio_class);
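+		/* fall through */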
+ case IOPRIO_CLASS_NONE:
+ /*
+ * No prio set, inherit CPU scheduling settings.
+ */
+ bfqq->entity.new_ioprio = task_nice_ioprio(tsk);
+ bfqq->entity.new_ioprio_class = task_nice_ioclass(tsk);
+ break;
+ case IOPRIO_CLASS_RT:
+ bfqq->entity.new_ioprio = IOPRIO_PRIO_DATA(bic->ioprio);
+ bfqq->entity.new_ioprio_class = IOPRIO_CLASS_RT;
+ break;
+ case IOPRIO_CLASS_BE:
+ bfqq->entity.new_ioprio = IOPRIO_PRIO_DATA(bic->ioprio);
+ bfqq->entity.new_ioprio_class = IOPRIO_CLASS_BE;
+ break;
+ case IOPRIO_CLASS_IDLE:
+ bfqq->entity.new_ioprio_class = IOPRIO_CLASS_IDLE;
+ bfqq->entity.new_ioprio = 7;
+ bfq_clear_bfqq_idle_window(bfqq);
+ break;
+ }
+
+ bfqq->entity.ioprio_changed = 1;
+
+ /*
+ * Keep track of original prio settings in case we have to temporarily
+ * elevate the priority of this queue.
+ */
+ bfqq->org_ioprio = bfqq->entity.new_ioprio;
+ bfq_clear_bfqq_prio_changed(bfqq);
+}
+
+static void bfq_changed_ioprio(struct bfq_io_cq *bic)
+{
+ struct bfq_data *bfqd;
+ struct bfq_queue *bfqq, *new_bfqq;
+ struct bfq_group *bfqg;
+ unsigned long uninitialized_var(flags);
+ int ioprio = bic->icq.ioc->ioprio;
+
+ bfqd = bfq_get_bfqd_locked(&(bic->icq.q->elevator->elevator_data),
+ &flags);
+ /*
+ * This condition may trigger on a newly created bic, be sure to drop
+ * the lock before returning.
+ */
+ if (unlikely(bfqd == NULL) || likely(bic->ioprio == ioprio))
+ goto out;
+
+ bfqq = bic->bfqq[BLK_RW_ASYNC];
+ if (bfqq != NULL) {
+ bfqg = container_of(bfqq->entity.sched_data, struct bfq_group,
+ sched_data);
+ new_bfqq = bfq_get_queue(bfqd, bfqg, BLK_RW_ASYNC, bic,
+ GFP_ATOMIC);
+ if (new_bfqq != NULL) {
+ bic->bfqq[BLK_RW_ASYNC] = new_bfqq;
+ bfq_log_bfqq(bfqd, bfqq,
+ "changed_ioprio: bfqq %p %d",
+ bfqq, atomic_read(&bfqq->ref));
+ bfq_put_queue(bfqq);
+ }
+ }
+
+ bfqq = bic->bfqq[BLK_RW_SYNC];
+ if (bfqq != NULL)
+ bfq_mark_bfqq_prio_changed(bfqq);
+
+ bic->ioprio = ioprio;
+
+out:
+ bfq_put_bfqd_unlock(bfqd, &flags);
+}
+
+static void bfq_init_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq,
+ pid_t pid, int is_sync)
+{
+ RB_CLEAR_NODE(&bfqq->entity.rb_node);
+ INIT_LIST_HEAD(&bfqq->fifo);
+
+ atomic_set(&bfqq->ref, 0);
+ bfqq->bfqd = bfqd;
+
+ bfq_mark_bfqq_prio_changed(bfqq);
+
+ if (is_sync) {
+ if (!bfq_class_idle(bfqq))
+ bfq_mark_bfqq_idle_window(bfqq);
+ bfq_mark_bfqq_sync(bfqq);
+ }
+
+ /* Tentative initial value to trade off between thr and lat */
+ bfqq->max_budget = (2 * bfq_max_budget(bfqd)) / 3;
+ bfqq->pid = pid;
+
+ bfqq->wr_coeff = 1;
+ bfqq->last_wr_start_finish = 0;
+ /*
+ * Set to the value for which bfqq will not be deemed as
+ * soft rt when it becomes backlogged.
+ */
+ bfqq->soft_rt_next_start = bfq_infinity_from_now(jiffies);
+}
+
+static struct bfq_queue *bfq_find_alloc_queue(struct bfq_data *bfqd,
+ struct bfq_group *bfqg,
+ int is_sync,
+ struct bfq_io_cq *bic,
+ gfp_t gfp_mask)
+{
+ struct bfq_queue *bfqq, *new_bfqq = NULL;
+
+retry:
+ /* bic always exists here */
+ bfqq = bic_to_bfqq(bic, is_sync);
+
+ /*
+ * Always try a new alloc if we fall back to the OOM bfqq
+ * originally, since it should just be a temporary situation.
+ */
+ if (bfqq == NULL || bfqq == &bfqd->oom_bfqq) {
+ bfqq = NULL;
+ if (new_bfqq != NULL) {
+ bfqq = new_bfqq;
+ new_bfqq = NULL;
+ } else if (gfp_mask & __GFP_WAIT) {
+ spin_unlock_irq(bfqd->queue->queue_lock);
+ new_bfqq = kmem_cache_alloc_node(bfq_pool,
+ gfp_mask | __GFP_ZERO,
+ bfqd->queue->node);
+ spin_lock_irq(bfqd->queue->queue_lock);
+ if (new_bfqq != NULL)
+ goto retry;
+ } else {
+ bfqq = kmem_cache_alloc_node(bfq_pool,
+ gfp_mask | __GFP_ZERO,
+ bfqd->queue->node);
+ }
+
+ if (bfqq != NULL) {
+ bfq_init_bfqq(bfqd, bfqq, current->pid, is_sync);
+ bfq_log_bfqq(bfqd, bfqq, "allocated");
+ } else {
+ bfqq = &bfqd->oom_bfqq;
+ bfq_log_bfqq(bfqd, bfqq, "using oom bfqq");
+ }
+
+ bfq_init_prio_data(bfqq, bic);
+ bfq_init_entity(&bfqq->entity, bfqg);
+ }
+
+ if (new_bfqq != NULL)
+ kmem_cache_free(bfq_pool, new_bfqq);
+
+ return bfqq;
+}
+
+static struct bfq_queue **bfq_async_queue_prio(struct bfq_data *bfqd,
+ struct bfq_group *bfqg,
+ int ioprio_class, int ioprio)
+{
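+	/*
+	 * Row 0 of async_bfqq holds the per-priority RT queues, row 1
+	 * the BE ones; the idle class shares a single async queue.
+	 */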
+ switch (ioprio_class) {
+ case IOPRIO_CLASS_RT:
+ return &bfqg->async_bfqq[0][ioprio];
+ case IOPRIO_CLASS_NONE:
+ ioprio = IOPRIO_NORM;
+ /* fall through */
+ case IOPRIO_CLASS_BE:
+ return &bfqg->async_bfqq[1][ioprio];
+ case IOPRIO_CLASS_IDLE:
+ return &bfqg->async_idle_bfqq;
+ default:
+ BUG();
+ }
+}
+
+static struct bfq_queue *bfq_get_queue(struct bfq_data *bfqd,
+ struct bfq_group *bfqg, int is_sync,
+ struct bfq_io_cq *bic, gfp_t gfp_mask)
+{
+ const int ioprio = IOPRIO_PRIO_DATA(bic->ioprio);
+ const int ioprio_class = IOPRIO_PRIO_CLASS(bic->ioprio);
+ struct bfq_queue **async_bfqq = NULL;
+ struct bfq_queue *bfqq = NULL;
+
+ if (!is_sync) {
+ async_bfqq = bfq_async_queue_prio(bfqd, bfqg, ioprio_class,
+ ioprio);
+ bfqq = *async_bfqq;
+ }
+
+ if (bfqq == NULL)
+ bfqq = bfq_find_alloc_queue(bfqd, bfqg, is_sync, bic, gfp_mask);
+
+ /*
+ * Pin the queue now that it's allocated, scheduler exit will prune it.
+ */
+ if (!is_sync && *async_bfqq == NULL) {
+ atomic_inc(&bfqq->ref);
+ bfq_log_bfqq(bfqd, bfqq, "get_queue, bfqq not in async: %p, %d",
+ bfqq, atomic_read(&bfqq->ref));
+ *async_bfqq = bfqq;
+ }
+
+ atomic_inc(&bfqq->ref);
+ bfq_log_bfqq(bfqd, bfqq, "get_queue, at end: %p, %d", bfqq,
+ atomic_read(&bfqq->ref));
+ return bfqq;
+}
+
+static void bfq_update_io_thinktime(struct bfq_data *bfqd,
+ struct bfq_io_cq *bic)
+{
+ unsigned long elapsed = jiffies - bic->ttime.last_end_request;
+ unsigned long ttime = min(elapsed, 2UL * bfqd->bfq_slice_idle);
+
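+	/*
+	 * Exponentially decaying averages in fixed point: on each update
+	 * both the sample count and the total are aged by 7/8, with 256
+	 * acting as the fixed-point weight of the new sample; the mean is
+	 * then total/samples, with a small rounding correction.
+	 */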
+ bic->ttime.ttime_samples = (7*bic->ttime.ttime_samples + 256) / 8;
+ bic->ttime.ttime_total = (7*bic->ttime.ttime_total + 256*ttime) / 8;
+ bic->ttime.ttime_mean = (bic->ttime.ttime_total + 128) /
+ bic->ttime.ttime_samples;
+}
+
+static void bfq_update_io_seektime(struct bfq_data *bfqd,
+ struct bfq_queue *bfqq,
+ struct request *rq)
+{
+ sector_t sdist;
+ u64 total;
+
+ if (bfqq->last_request_pos < blk_rq_pos(rq))
+ sdist = blk_rq_pos(rq) - bfqq->last_request_pos;
+ else
+ sdist = bfqq->last_request_pos - blk_rq_pos(rq);
+
+ /*
+ * Don't allow the seek distance to get too large from the
+ * odd fragment, pagein, etc.
+ */
+ if (bfqq->seek_samples == 0) /* first request, not really a seek */
+ sdist = 0;
+ else if (bfqq->seek_samples <= 60) /* second & third seek */
+ sdist = min(sdist, (bfqq->seek_mean * 4) + 2*1024*1024);
+ else
+ sdist = min(sdist, (bfqq->seek_mean * 4) + 2*1024*64);
+
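+	/*
+	 * Same decaying-average scheme as for the thinktime statistics:
+	 * 7/8 aging with a 256 fixed-point sample weight, applied here to
+	 * the seek distance.
+	 */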
+ bfqq->seek_samples = (7*bfqq->seek_samples + 256) / 8;
+ bfqq->seek_total = (7*bfqq->seek_total + (u64)256*sdist) / 8;
+ total = bfqq->seek_total + (bfqq->seek_samples/2);
+ do_div(total, bfqq->seek_samples);
+ bfqq->seek_mean = (sector_t)total;
+
+ bfq_log_bfqq(bfqd, bfqq, "dist=%llu mean=%llu", (u64)sdist,
+ (u64)bfqq->seek_mean);
+}
+
+/*
+ * Disable idle window if the process thinks too long or seeks so much that
+ * it doesn't matter.
+ */
+static void bfq_update_idle_window(struct bfq_data *bfqd,
+ struct bfq_queue *bfqq,
+ struct bfq_io_cq *bic)
+{
+ int enable_idle;
+
+ /* Don't idle for async or idle io prio class. */
+ if (!bfq_bfqq_sync(bfqq) || bfq_class_idle(bfqq))
+ return;
+
+ /* Idle window just restored, statistics are meaningless. */
+ if (bfq_bfqq_just_split(bfqq))
+ return;
+
+ enable_idle = bfq_bfqq_idle_window(bfqq);
+
+ if (atomic_read(&bic->icq.ioc->active_ref) == 0 ||
+ bfqd->bfq_slice_idle == 0 ||
+ (bfqd->hw_tag && BFQQ_SEEKY(bfqq) &&
+ bfqq->wr_coeff == 1))
+ enable_idle = 0;
+ else if (bfq_sample_valid(bic->ttime.ttime_samples)) {
+ if (bic->ttime.ttime_mean > bfqd->bfq_slice_idle &&
+ bfqq->wr_coeff == 1)
+ enable_idle = 0;
+ else
+ enable_idle = 1;
+ }
+ bfq_log_bfqq(bfqd, bfqq, "update_idle_window: enable_idle %d",
+ enable_idle);
+
+ if (enable_idle)
+ bfq_mark_bfqq_idle_window(bfqq);
+ else
+ bfq_clear_bfqq_idle_window(bfqq);
+}
+
+/*
+ * Called when a new fs request (rq) is added to bfqq. Check if there's
+ * something we should do about it.
+ */
+static void bfq_rq_enqueued(struct bfq_data *bfqd, struct bfq_queue *bfqq,
+ struct request *rq)
+{
+ struct bfq_io_cq *bic = RQ_BIC(rq);
+
+ if (rq->cmd_flags & REQ_META)
+ bfqq->meta_pending++;
+
+ bfq_update_io_thinktime(bfqd, bic);
+ bfq_update_io_seektime(bfqd, bfqq, rq);
+ if (!BFQQ_SEEKY(bfqq) && bfq_bfqq_constantly_seeky(bfqq)) {
+ bfq_clear_bfqq_constantly_seeky(bfqq);
+ if (!blk_queue_nonrot(bfqd->queue)) {
+ BUG_ON(!bfqd->const_seeky_busy_in_flight_queues);
+ bfqd->const_seeky_busy_in_flight_queues--;
+ }
+ }
+ if (bfqq->entity.service > bfq_max_budget(bfqd) / 8 ||
+ !BFQQ_SEEKY(bfqq))
+ bfq_update_idle_window(bfqd, bfqq, bic);
+ bfq_clear_bfqq_just_split(bfqq);
+
+ bfq_log_bfqq(bfqd, bfqq,
+ "rq_enqueued: idle_window=%d (seeky %d, mean %llu)",
+ bfq_bfqq_idle_window(bfqq), BFQQ_SEEKY(bfqq),
+ (long long unsigned)bfqq->seek_mean);
+
+ bfqq->last_request_pos = blk_rq_pos(rq) + blk_rq_sectors(rq);
+
+ if (bfqq == bfqd->in_service_queue && bfq_bfqq_wait_request(bfqq)) {
+ int small_req = bfqq->queued[rq_is_sync(rq)] == 1 &&
+ blk_rq_sectors(rq) < 32;
+ int budget_timeout = bfq_bfqq_budget_timeout(bfqq);
+
+ /*
+ * There is just this request queued: if the request
+ * is small and the queue is not to be expired, then
+ * just exit.
+ *
+ * In this way, if the disk is being idled to wait for
+ * a new request from the in-service queue, we avoid
+ * unplugging the device and committing the disk to serve
+ * just a small request. On the contrary, we wait for
+ * the block layer to decide when to unplug the device:
+ * hopefully, new requests will be merged to this one
+ * quickly, then the device will be unplugged and
+ * larger requests will be dispatched.
+ */
+ if (small_req && !budget_timeout)
+ return;
+
+ /*
+ * A large enough request arrived, or the queue is to
+ * be expired: in both cases disk idling is to be
+ * stopped, so clear wait_request flag and reset
+ * timer.
+ */
+ bfq_clear_bfqq_wait_request(bfqq);
+ del_timer(&bfqd->idle_slice_timer);
+
+ /*
+ * The queue is not empty, because a new request just
+ * arrived. Hence we can safely expire the queue, in
+ * case of budget timeout, without risking that the
+ * timestamps of the queue are not updated correctly.
+ * See [1] for more details.
+ */
+ if (budget_timeout)
+ bfq_bfqq_expire(bfqd, bfqq, 0, BFQ_BFQQ_BUDGET_TIMEOUT);
+
+ /*
+ * Let the request rip immediately, or let a new queue be
+ * selected if bfqq has just been expired.
+ */
+ __blk_run_queue(bfqd->queue);
+ }
+}
+
+static void bfq_insert_request(struct request_queue *q, struct request *rq)
+{
+ struct bfq_data *bfqd = q->elevator->elevator_data;
+ struct bfq_queue *bfqq = RQ_BFQQ(rq), *new_bfqq;
+
+ assert_spin_locked(bfqd->queue->queue_lock);
+
+ /*
+ * An unplug may trigger a requeue of a request from the device
+ * driver: make sure we are in process context while trying to
+ * merge two bfq_queues.
+ */
+ if (!in_interrupt()) {
+ new_bfqq = bfq_setup_cooperator(bfqd, bfqq, rq, true);
+ if (new_bfqq != NULL) {
+ if (bic_to_bfqq(RQ_BIC(rq), 1) != bfqq)
+ new_bfqq = bic_to_bfqq(RQ_BIC(rq), 1);
+ /*
+ * Release the request's reference to the old bfqq
+ * and make sure one is taken to the shared queue.
+ */
+ new_bfqq->allocated[rq_data_dir(rq)]++;
+ bfqq->allocated[rq_data_dir(rq)]--;
+ atomic_inc(&new_bfqq->ref);
+ bfq_put_queue(bfqq);
+ if (bic_to_bfqq(RQ_BIC(rq), 1) == bfqq)
+ bfq_merge_bfqqs(bfqd, RQ_BIC(rq),
+ bfqq, new_bfqq);
+ rq->elv.priv[1] = new_bfqq;
+ bfqq = new_bfqq;
+ }
+ }
+
+ bfq_init_prio_data(bfqq, RQ_BIC(rq));
+
+ bfq_add_request(rq);
+
+ /*
+ * Here a newly-created bfq_queue has already started a weight-raising
+	 * period: clear wr_time_left to prevent bfq_bfqq_save_state()
+ * from assigning it a full weight-raising period. See the detailed
+ * comments about this field in bfq_init_icq().
+ */
+ if (bfqq->bic != NULL)
+ bfqq->bic->wr_time_left = 0;
+ rq->fifo_time = jiffies + bfqd->bfq_fifo_expire[rq_is_sync(rq)];
+ list_add_tail(&rq->queuelist, &bfqq->fifo);
+
+ bfq_rq_enqueued(bfqd, bfqq, rq);
+}
+
+static void bfq_update_hw_tag(struct bfq_data *bfqd)
+{
+ bfqd->max_rq_in_driver = max(bfqd->max_rq_in_driver,
+ bfqd->rq_in_driver);
+
+ if (bfqd->hw_tag == 1)
+ return;
+
+ /*
+ * This sample is valid if the number of outstanding requests
+	 * is large enough to allow queueing behavior. Note that the
+ * sum is not exact, as it's not taking into account deactivated
+ * requests.
+ */
+ if (bfqd->rq_in_driver + bfqd->queued < BFQ_HW_QUEUE_THRESHOLD)
+ return;
+
+ if (bfqd->hw_tag_samples++ < BFQ_HW_QUEUE_SAMPLES)
+ return;
+
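+	/*
+	 * Enough samples collected: the device is deemed to support
+	 * internal queueing if, over the sampling window, it ever had
+	 * more than BFQ_HW_QUEUE_THRESHOLD requests in flight.
+	 */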
+ bfqd->hw_tag = bfqd->max_rq_in_driver > BFQ_HW_QUEUE_THRESHOLD;
+ bfqd->max_rq_in_driver = 0;
+ bfqd->hw_tag_samples = 0;
+}
+
+static void bfq_completed_request(struct request_queue *q, struct request *rq)
+{
+ struct bfq_queue *bfqq = RQ_BFQQ(rq);
+ struct bfq_data *bfqd = bfqq->bfqd;
+ bool sync = bfq_bfqq_sync(bfqq);
+
+ bfq_log_bfqq(bfqd, bfqq, "completed one req with %u sects left (%d)",
+ blk_rq_sectors(rq), sync);
+
+ bfq_update_hw_tag(bfqd);
+
+ WARN_ON(!bfqd->rq_in_driver);
+ WARN_ON(!bfqq->dispatched);
+ bfqd->rq_in_driver--;
+ bfqq->dispatched--;
+
+ if (!bfqq->dispatched && !bfq_bfqq_busy(bfqq)) {
+ bfq_weights_tree_remove(bfqd, &bfqq->entity,
+ &bfqd->queue_weights_tree);
+ if (!blk_queue_nonrot(bfqd->queue)) {
+ BUG_ON(!bfqd->busy_in_flight_queues);
+ bfqd->busy_in_flight_queues--;
+ if (bfq_bfqq_constantly_seeky(bfqq)) {
+ BUG_ON(
+ !bfqd->const_seeky_busy_in_flight_queues);
+ bfqd->const_seeky_busy_in_flight_queues--;
+ }
+ }
+ }
+
+ if (sync) {
+ bfqd->sync_flight--;
+ RQ_BIC(rq)->ttime.last_end_request = jiffies;
+ }
+
+ /*
+ * If we are waiting to discover whether the request pattern of the
+ * task associated with the queue is actually isochronous, and
+ * both requisites for this condition to hold are satisfied, then
+ * compute soft_rt_next_start (see the comments to the function
+ * bfq_bfqq_softrt_next_start()).
+ */
+ if (bfq_bfqq_softrt_update(bfqq) && bfqq->dispatched == 0 &&
+ RB_EMPTY_ROOT(&bfqq->sort_list))
+ bfqq->soft_rt_next_start =
+ bfq_bfqq_softrt_next_start(bfqd, bfqq);
+
+ /*
+ * If this is the in-service queue, check if it needs to be expired,
+ * or if we want to idle in case it has no pending requests.
+ */
+ if (bfqd->in_service_queue == bfqq) {
+ if (bfq_bfqq_budget_new(bfqq))
+ bfq_set_budget_timeout(bfqd);
+
+ if (bfq_bfqq_must_idle(bfqq)) {
+ bfq_arm_slice_timer(bfqd);
+ goto out;
+ } else if (bfq_may_expire_for_budg_timeout(bfqq))
+ bfq_bfqq_expire(bfqd, bfqq, 0, BFQ_BFQQ_BUDGET_TIMEOUT);
+ else if (RB_EMPTY_ROOT(&bfqq->sort_list) &&
+ (bfqq->dispatched == 0 ||
+ !bfq_bfqq_must_not_expire(bfqq)))
+ bfq_bfqq_expire(bfqd, bfqq, 0,
+ BFQ_BFQQ_NO_MORE_REQUESTS);
+ }
+
+ if (!bfqd->rq_in_driver)
+ bfq_schedule_dispatch(bfqd);
+
+out:
+ return;
+}
+
+static inline int __bfq_may_queue(struct bfq_queue *bfqq)
+{
+ if (bfq_bfqq_wait_request(bfqq) && bfq_bfqq_must_alloc(bfqq)) {
+ bfq_clear_bfqq_must_alloc(bfqq);
+ return ELV_MQUEUE_MUST;
+ }
+
+ return ELV_MQUEUE_MAY;
+}
+
+static int bfq_may_queue(struct request_queue *q, int rw)
+{
+ struct bfq_data *bfqd = q->elevator->elevator_data;
+ struct task_struct *tsk = current;
+ struct bfq_io_cq *bic;
+ struct bfq_queue *bfqq;
+
+ /*
+ * Don't force setup of a queue from here, as a call to may_queue
+ * does not necessarily imply that a request actually will be queued.
+ * So just lookup a possibly existing queue, or return 'may queue'
+ * if that fails.
+ */
+ bic = bfq_bic_lookup(bfqd, tsk->io_context);
+ if (bic == NULL)
+ return ELV_MQUEUE_MAY;
+
+ bfqq = bic_to_bfqq(bic, rw_is_sync(rw));
+ if (bfqq != NULL) {
+ bfq_init_prio_data(bfqq, bic);
+
+ return __bfq_may_queue(bfqq);
+ }
+
+ return ELV_MQUEUE_MAY;
+}
+
+/*
+ * Queue lock held here.
+ */
+static void bfq_put_request(struct request *rq)
+{
+ struct bfq_queue *bfqq = RQ_BFQQ(rq);
+
+ if (bfqq != NULL) {
+ const int rw = rq_data_dir(rq);
+
+ BUG_ON(!bfqq->allocated[rw]);
+ bfqq->allocated[rw]--;
+
+ rq->elv.priv[0] = NULL;
+ rq->elv.priv[1] = NULL;
+
+ bfq_log_bfqq(bfqq->bfqd, bfqq, "put_request %p, %d",
+ bfqq, atomic_read(&bfqq->ref));
+ bfq_put_queue(bfqq);
+ }
+}
+
+/*
+ * Returns NULL if a new bfqq should be allocated, or the old bfqq if this
+ * was the last process referring to said bfqq.
+ */
+static struct bfq_queue *
+bfq_split_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq)
+{
+ bfq_log_bfqq(bfqq->bfqd, bfqq, "splitting queue");
+
+ put_io_context(bic->icq.ioc);
+
+ if (bfqq_process_refs(bfqq) == 1) {
+ bfqq->pid = current->pid;
+ bfq_clear_bfqq_coop(bfqq);
+ bfq_clear_bfqq_split_coop(bfqq);
+ return bfqq;
+ }
+
+ bic_set_bfqq(bic, NULL, 1);
+
+ bfq_put_cooperator(bfqq);
+
+ bfq_put_queue(bfqq);
+ return NULL;
+}
+
+/*
+ * Allocate bfq data structures associated with this request.
+ */
+static int bfq_set_request(struct request_queue *q, struct request *rq,
+ struct bio *bio, gfp_t gfp_mask)
+{
+ struct bfq_data *bfqd = q->elevator->elevator_data;
+ struct bfq_io_cq *bic = icq_to_bic(rq->elv.icq);
+ const int rw = rq_data_dir(rq);
+ const int is_sync = rq_is_sync(rq);
+ struct bfq_queue *bfqq;
+ struct bfq_group *bfqg;
+ unsigned long flags;
+ bool split = false;
+
+ might_sleep_if(gfp_mask & __GFP_WAIT);
+
+ bfq_changed_ioprio(bic);
+
+ spin_lock_irqsave(q->queue_lock, flags);
+
+ if (bic == NULL)
+ goto queue_fail;
+
+ bfqg = bfq_bic_update_cgroup(bic);
+
+new_queue:
+ bfqq = bic_to_bfqq(bic, is_sync);
+ if (bfqq == NULL || bfqq == &bfqd->oom_bfqq) {
+ bfqq = bfq_get_queue(bfqd, bfqg, is_sync, bic, gfp_mask);
+ bic_set_bfqq(bic, bfqq, is_sync);
+ } else {
+ /* If the queue was seeky for too long, break it apart. */
+ if (bfq_bfqq_coop(bfqq) && bfq_bfqq_split_coop(bfqq)) {
+ bfq_log_bfqq(bfqd, bfqq, "breaking apart bfqq");
+ bfqq = bfq_split_bfqq(bic, bfqq);
+ split = true;
+ if (!bfqq)
+ goto new_queue;
+ }
+ }
+
+ bfqq->allocated[rw]++;
+ atomic_inc(&bfqq->ref);
+ bfq_log_bfqq(bfqd, bfqq, "set_request: bfqq %p, %d", bfqq,
+ atomic_read(&bfqq->ref));
+
+ rq->elv.priv[0] = bic;
+ rq->elv.priv[1] = bfqq;
+
+ /*
+ * If a bfq_queue has only one process reference, it is owned
+ * by only one bfq_io_cq: we can set the bic field of the
+ * bfq_queue to the address of that structure. Also, if the
+ * queue has just been split, mark a flag so that the
+ * information is available to the other scheduler hooks.
+ */
+ if (bfqq_process_refs(bfqq) == 1) {
+ bfqq->bic = bic;
+ if (split) {
+ bfq_mark_bfqq_just_split(bfqq);
+ /*
+ * If the queue has just been split from a shared queue,
+ * restore the idle window and the possible weight
+ * raising period.
+ */
+ bfq_bfqq_resume_state(bfqq, bic);
+ }
+ }
+
+ spin_unlock_irqrestore(q->queue_lock, flags);
+
+ return 0;
+
+queue_fail:
+ bfq_schedule_dispatch(bfqd);
+ spin_unlock_irqrestore(q->queue_lock, flags);
+
+ return 1;
+}
+
+static void bfq_kick_queue(struct work_struct *work)
+{
+ struct bfq_data *bfqd =
+ container_of(work, struct bfq_data, unplug_work);
+ struct request_queue *q = bfqd->queue;
+
+ spin_lock_irq(q->queue_lock);
+ __blk_run_queue(q);
+ spin_unlock_irq(q->queue_lock);
+}
+
+/*
+ * Handler of the expiration of the timer running if the in-service queue
+ * is idling inside its time slice.
+ */
+static void bfq_idle_slice_timer(unsigned long data)
+{
+ struct bfq_data *bfqd = (struct bfq_data *)data;
+ struct bfq_queue *bfqq;
+ unsigned long flags;
+ enum bfqq_expiration reason;
+
+ spin_lock_irqsave(bfqd->queue->queue_lock, flags);
+
+ bfqq = bfqd->in_service_queue;
+ /*
+ * Theoretical race here: the in-service queue can be NULL or different
+ * from the queue that was idling if the timer handler spins on
+ * the queue_lock and a new request arrives for the current
+ * queue and there is a full dispatch cycle that changes the
+ * in-service queue. This can hardly happen, but in the worst case
+ * we just expire a queue too early.
+ */
+ if (bfqq != NULL) {
+ bfq_log_bfqq(bfqd, bfqq, "slice_timer expired");
+ if (bfq_bfqq_budget_timeout(bfqq))
+ /*
+ * Also here the queue can be safely expired
+ * for budget timeout without wasting
+ * guarantees
+ */
+ reason = BFQ_BFQQ_BUDGET_TIMEOUT;
+ else if (bfqq->queued[0] == 0 && bfqq->queued[1] == 0)
+ /*
+ * The queue may not be empty upon timer expiration,
+ * because we may not disable the timer when the first
+ * request of the in-service queue arrives during
+ * disk idling
+ */
+ reason = BFQ_BFQQ_TOO_IDLE;
+ else
+ goto schedule_dispatch;
+
+ bfq_bfqq_expire(bfqd, bfqq, 1, reason);
+ }
+
+schedule_dispatch:
+ bfq_schedule_dispatch(bfqd);
+
+ spin_unlock_irqrestore(bfqd->queue->queue_lock, flags);
+}
+
+static void bfq_shutdown_timer_wq(struct bfq_data *bfqd)
+{
+ del_timer_sync(&bfqd->idle_slice_timer);
+ cancel_work_sync(&bfqd->unplug_work);
+}
+
+static inline void __bfq_put_async_bfqq(struct bfq_data *bfqd,
+ struct bfq_queue **bfqq_ptr)
+{
+ struct bfq_group *root_group = bfqd->root_group;
+ struct bfq_queue *bfqq = *bfqq_ptr;
+
+ bfq_log(bfqd, "put_async_bfqq: %p", bfqq);
+ if (bfqq != NULL) {
+ bfq_bfqq_move(bfqd, bfqq, &bfqq->entity, root_group);
+ bfq_log_bfqq(bfqd, bfqq, "put_async_bfqq: putting %p, %d",
+ bfqq, atomic_read(&bfqq->ref));
+ bfq_put_queue(bfqq);
+ *bfqq_ptr = NULL;
+ }
+}
+
+/*
+ * Release all the bfqg references to its async queues. If we are
+ * deallocating the group these queues may still contain requests, so
+ * we reparent them to the root cgroup (i.e., the only one that will
+ * exist for sure until all the requests on a device are gone).
+ */
+static void bfq_put_async_queues(struct bfq_data *bfqd, struct bfq_group *bfqg)
+{
+ int i, j;
+
+ for (i = 0; i < 2; i++)
+ for (j = 0; j < IOPRIO_BE_NR; j++)
+ __bfq_put_async_bfqq(bfqd, &bfqg->async_bfqq[i][j]);
+
+ __bfq_put_async_bfqq(bfqd, &bfqg->async_idle_bfqq);
+}
+
+static void bfq_exit_queue(struct elevator_queue *e)
+{
+ struct bfq_data *bfqd = e->elevator_data;
+ struct request_queue *q = bfqd->queue;
+ struct bfq_queue *bfqq, *n;
+
+ bfq_shutdown_timer_wq(bfqd);
+
+ spin_lock_irq(q->queue_lock);
+
+ BUG_ON(bfqd->in_service_queue != NULL);
+ list_for_each_entry_safe(bfqq, n, &bfqd->idle_list, bfqq_list)
+ bfq_deactivate_bfqq(bfqd, bfqq, 0);
+
+ bfq_disconnect_groups(bfqd);
+ spin_unlock_irq(q->queue_lock);
+
+ bfq_shutdown_timer_wq(bfqd);
+
+ synchronize_rcu();
+
+ BUG_ON(timer_pending(&bfqd->idle_slice_timer));
+
+ bfq_free_root_group(bfqd);
+ kfree(bfqd);
+}
+
+static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
+{
+ struct bfq_group *bfqg;
+ struct bfq_data *bfqd;
+ struct elevator_queue *eq;
+
+ eq = elevator_alloc(q, e);
+ if (eq == NULL)
+ return -ENOMEM;
+
+ bfqd = kzalloc_node(sizeof(*bfqd), GFP_KERNEL, q->node);
+ if (bfqd == NULL) {
+ kobject_put(&eq->kobj);
+ return -ENOMEM;
+ }
+ eq->elevator_data = bfqd;
+
+ /*
+ * Our fallback bfqq if bfq_find_alloc_queue() runs into OOM issues.
+ * Grab a permanent reference to it, so that the normal code flow
+ * will not attempt to free it.
+ */
+ bfq_init_bfqq(bfqd, &bfqd->oom_bfqq, 1, 0);
+ atomic_inc(&bfqd->oom_bfqq.ref);
+
+ bfqd->queue = q;
+
+ spin_lock_irq(q->queue_lock);
+ q->elevator = eq;
+ spin_unlock_irq(q->queue_lock);
+
+ bfqg = bfq_alloc_root_group(bfqd, q->node);
+ if (bfqg == NULL) {
+ kfree(bfqd);
+ kobject_put(&eq->kobj);
+ return -ENOMEM;
+ }
+
+ bfqd->root_group = bfqg;
+#ifdef CONFIG_CGROUP_BFQIO
+ bfqd->active_numerous_groups = 0;
+#endif
+
+ init_timer(&bfqd->idle_slice_timer);
+ bfqd->idle_slice_timer.function = bfq_idle_slice_timer;
+ bfqd->idle_slice_timer.data = (unsigned long)bfqd;
+
+ bfqd->rq_pos_tree = RB_ROOT;
+ bfqd->queue_weights_tree = RB_ROOT;
+ bfqd->group_weights_tree = RB_ROOT;
+
+ INIT_WORK(&bfqd->unplug_work, bfq_kick_queue);
+
+ INIT_LIST_HEAD(&bfqd->active_list);
+ INIT_LIST_HEAD(&bfqd->idle_list);
+
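+	/* -1 means "queueing support of the device not sampled yet". */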
+ bfqd->hw_tag = -1;
+
+ bfqd->bfq_max_budget = bfq_default_max_budget;
+
+ bfqd->bfq_quantum = bfq_quantum;
+ bfqd->bfq_fifo_expire[0] = bfq_fifo_expire[0];
+ bfqd->bfq_fifo_expire[1] = bfq_fifo_expire[1];
+ bfqd->bfq_back_max = bfq_back_max;
+ bfqd->bfq_back_penalty = bfq_back_penalty;
+ bfqd->bfq_slice_idle = bfq_slice_idle;
+ bfqd->bfq_class_idle_last_service = 0;
+ bfqd->bfq_max_budget_async_rq = bfq_max_budget_async_rq;
+ bfqd->bfq_timeout[BLK_RW_ASYNC] = bfq_timeout_async;
+ bfqd->bfq_timeout[BLK_RW_SYNC] = bfq_timeout_sync;
+
+ bfqd->low_latency = true;
+
+ bfqd->bfq_wr_coeff = 20;
+ bfqd->bfq_wr_rt_max_time = msecs_to_jiffies(300);
+ bfqd->bfq_wr_max_time = 0;
+ bfqd->bfq_wr_min_idle_time = msecs_to_jiffies(2000);
+ bfqd->bfq_wr_min_inter_arr_async = msecs_to_jiffies(500);
+ bfqd->bfq_wr_max_softrt_rate = 7000; /*
+ * Approximate rate required
+ * to playback or record a
+ * high-definition compressed
+ * video.
+ */
+ bfqd->raised_busy_queues = 0;
+ bfqd->busy_in_flight_queues = 0;
+ bfqd->const_seeky_busy_in_flight_queues = 0;
+
+ /*
+ * Begin by assuming, optimistically, that the device peak rate is equal
+ * to the highest reference rate.
+ */
+ bfqd->RT_prod = R_fast[blk_queue_nonrot(bfqd->queue)] *
+ T_fast[blk_queue_nonrot(bfqd->queue)];
+ bfqd->peak_rate = R_fast[blk_queue_nonrot(bfqd->queue)];
+ bfqd->device_speed = BFQ_BFQD_FAST;
+
+ return 0;
+}
+
+static void bfq_slab_kill(void)
+{
+ if (bfq_pool != NULL)
+ kmem_cache_destroy(bfq_pool);
+}
+
+static int __init bfq_slab_setup(void)
+{
+ bfq_pool = KMEM_CACHE(bfq_queue, 0);
+ if (bfq_pool == NULL)
+ return -ENOMEM;
+ return 0;
+}
+
+static ssize_t bfq_var_show(unsigned int var, char *page)
+{
+ return sprintf(page, "%d\n", var);
+}
+
+static ssize_t bfq_var_store(unsigned long *var, const char *page, size_t count)
+{
+ unsigned long new_val;
+ int ret = kstrtoul(page, 10, &new_val);
+
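+	/*
+	 * On a parse error the target is left untouched; the full count
+	 * is returned in either case, so the write appears to succeed
+	 * from user space.
+	 */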
+ if (ret == 0)
+ *var = new_val;
+
+ return count;
+}
+
+static ssize_t bfq_wr_max_time_show(struct elevator_queue *e, char *page)
+{
+ struct bfq_data *bfqd = e->elevator_data;
+ return sprintf(page, "%d\n", bfqd->bfq_wr_max_time > 0 ?
+ jiffies_to_msecs(bfqd->bfq_wr_max_time) :
+ jiffies_to_msecs(bfq_wr_duration(bfqd)));
+}
+
+static ssize_t bfq_weights_show(struct elevator_queue *e, char *page)
+{
+ struct bfq_queue *bfqq;
+ struct bfq_data *bfqd = e->elevator_data;
+ ssize_t num_char = 0;
+
+ num_char += sprintf(page + num_char, "Tot reqs queued %d\n\n",
+ bfqd->queued);
+
+ spin_lock_irq(bfqd->queue->queue_lock);
+
+ num_char += sprintf(page + num_char, "Active:\n");
+ list_for_each_entry(bfqq, &bfqd->active_list, bfqq_list) {
+ num_char += sprintf(page + num_char,
+ "pid%d: weight %hu, nr_queued %d %d, dur %d/%u\n",
+ bfqq->pid,
+ bfqq->entity.weight,
+ bfqq->queued[0],
+ bfqq->queued[1],
+ jiffies_to_msecs(jiffies -
+ bfqq->last_wr_start_finish),
+ jiffies_to_msecs(bfqq->wr_cur_max_time));
+ }
+
+ num_char += sprintf(page + num_char, "Idle:\n");
+ list_for_each_entry(bfqq, &bfqd->idle_list, bfqq_list) {
+ num_char += sprintf(page + num_char,
+ "pid%d: weight %hu, dur %d/%u\n",
+ bfqq->pid,
+ bfqq->entity.weight,
+ jiffies_to_msecs(jiffies -
+ bfqq->last_wr_start_finish),
+ jiffies_to_msecs(bfqq->wr_cur_max_time));
+ }
+
+ spin_unlock_irq(bfqd->queue->queue_lock);
+
+ return num_char;
+}
+
+#define SHOW_FUNCTION(__FUNC, __VAR, __CONV) \
+static ssize_t __FUNC(struct elevator_queue *e, char *page) \
+{ \
+ struct bfq_data *bfqd = e->elevator_data; \
+ unsigned int __data = __VAR; \
+ if (__CONV) \
+ __data = jiffies_to_msecs(__data); \
+ return bfq_var_show(__data, (page)); \
+}
+SHOW_FUNCTION(bfq_quantum_show, bfqd->bfq_quantum, 0);
+SHOW_FUNCTION(bfq_fifo_expire_sync_show, bfqd->bfq_fifo_expire[1], 1);
+SHOW_FUNCTION(bfq_fifo_expire_async_show, bfqd->bfq_fifo_expire[0], 1);
+SHOW_FUNCTION(bfq_back_seek_max_show, bfqd->bfq_back_max, 0);
+SHOW_FUNCTION(bfq_back_seek_penalty_show, bfqd->bfq_back_penalty, 0);
+SHOW_FUNCTION(bfq_slice_idle_show, bfqd->bfq_slice_idle, 1);
+SHOW_FUNCTION(bfq_max_budget_show, bfqd->bfq_user_max_budget, 0);
+SHOW_FUNCTION(bfq_max_budget_async_rq_show, bfqd->bfq_max_budget_async_rq, 0);
+SHOW_FUNCTION(bfq_timeout_sync_show, bfqd->bfq_timeout[BLK_RW_SYNC], 1);
+SHOW_FUNCTION(bfq_timeout_async_show, bfqd->bfq_timeout[BLK_RW_ASYNC], 1);
+SHOW_FUNCTION(bfq_low_latency_show, bfqd->low_latency, 0);
+SHOW_FUNCTION(bfq_wr_coeff_show, bfqd->bfq_wr_coeff, 0);
+SHOW_FUNCTION(bfq_wr_rt_max_time_show, bfqd->bfq_wr_rt_max_time, 1);
+SHOW_FUNCTION(bfq_wr_min_idle_time_show, bfqd->bfq_wr_min_idle_time, 1);
+SHOW_FUNCTION(bfq_wr_min_inter_arr_async_show, bfqd->bfq_wr_min_inter_arr_async,
+ 1);
+SHOW_FUNCTION(bfq_wr_max_softrt_rate_show, bfqd->bfq_wr_max_softrt_rate, 0);
+#undef SHOW_FUNCTION
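+
+/*
+ * For illustration, SHOW_FUNCTION(bfq_quantum_show, bfqd->bfq_quantum, 0)
+ * above expands, roughly (the dead __CONV branch compiles away), to:
+ *
+ *	static ssize_t bfq_quantum_show(struct elevator_queue *e, char *page)
+ *	{
+ *		struct bfq_data *bfqd = e->elevator_data;
+ *		unsigned int __data = bfqd->bfq_quantum;
+ *		return bfq_var_show(__data, (page));
+ *	}
+ */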
+
+#define STORE_FUNCTION(__FUNC, __PTR, MIN, MAX, __CONV) \
+static ssize_t \
+__FUNC(struct elevator_queue *e, const char *page, size_t count) \
+{ \
+ struct bfq_data *bfqd = e->elevator_data; \
+ unsigned long uninitialized_var(__data); \
+ int ret = bfq_var_store(&__data, (page), count); \
+ if (__data < (MIN)) \
+ __data = (MIN); \
+ else if (__data > (MAX)) \
+ __data = (MAX); \
+ if (__CONV) \
+ *(__PTR) = msecs_to_jiffies(__data); \
+ else \
+ *(__PTR) = __data; \
+ return ret; \
+}
+STORE_FUNCTION(bfq_quantum_store, &bfqd->bfq_quantum, 1, INT_MAX, 0);
+STORE_FUNCTION(bfq_fifo_expire_sync_store, &bfqd->bfq_fifo_expire[1], 1,
+ INT_MAX, 1);
+STORE_FUNCTION(bfq_fifo_expire_async_store, &bfqd->bfq_fifo_expire[0], 1,
+ INT_MAX, 1);
+STORE_FUNCTION(bfq_back_seek_max_store, &bfqd->bfq_back_max, 0, INT_MAX, 0);
+STORE_FUNCTION(bfq_back_seek_penalty_store, &bfqd->bfq_back_penalty, 1,
+ INT_MAX, 0);
+STORE_FUNCTION(bfq_slice_idle_store, &bfqd->bfq_slice_idle, 0, INT_MAX, 1);
+STORE_FUNCTION(bfq_max_budget_async_rq_store, &bfqd->bfq_max_budget_async_rq,
+ 1, INT_MAX, 0);
+STORE_FUNCTION(bfq_timeout_async_store, &bfqd->bfq_timeout[BLK_RW_ASYNC], 0,
+ INT_MAX, 1);
+STORE_FUNCTION(bfq_wr_coeff_store, &bfqd->bfq_wr_coeff, 1, INT_MAX, 0);
+STORE_FUNCTION(bfq_wr_max_time_store, &bfqd->bfq_wr_max_time, 0, INT_MAX, 1);
+STORE_FUNCTION(bfq_wr_rt_max_time_store, &bfqd->bfq_wr_rt_max_time, 0, INT_MAX,
+ 1);
+STORE_FUNCTION(bfq_wr_min_idle_time_store, &bfqd->bfq_wr_min_idle_time, 0,
+ INT_MAX, 1);
+STORE_FUNCTION(bfq_wr_min_inter_arr_async_store,
+ &bfqd->bfq_wr_min_inter_arr_async, 0, INT_MAX, 1);
+STORE_FUNCTION(bfq_wr_max_softrt_rate_store, &bfqd->bfq_wr_max_softrt_rate, 0,
+ INT_MAX, 0);
+#undef STORE_FUNCTION
+
+/* do nothing for the moment */
+static ssize_t bfq_weights_store(struct elevator_queue *e,
+ const char *page, size_t count)
+{
+ return count;
+}
+
+static inline unsigned long bfq_estimated_max_budget(struct bfq_data *bfqd)
+{
+ u64 timeout = jiffies_to_msecs(bfqd->bfq_timeout[BLK_RW_SYNC]);
+
+ if (bfqd->peak_rate_samples >= BFQ_PEAK_RATE_SAMPLES)
+ return bfq_calc_max_budget(bfqd->peak_rate, timeout);
+ else
+ return bfq_default_max_budget;
+}
+
+static ssize_t bfq_max_budget_store(struct elevator_queue *e,
+ const char *page, size_t count)
+{
+ struct bfq_data *bfqd = e->elevator_data;
+ unsigned long uninitialized_var(__data);
+ int ret = bfq_var_store(&__data, (page), count);
+
+ if (__data == 0)
+ bfqd->bfq_max_budget = bfq_estimated_max_budget(bfqd);
+ else {
+ if (__data > INT_MAX)
+ __data = INT_MAX;
+ bfqd->bfq_max_budget = __data;
+ }
+
+ bfqd->bfq_user_max_budget = __data;
+
+ return ret;
+}
+
+static ssize_t bfq_timeout_sync_store(struct elevator_queue *e,
+ const char *page, size_t count)
+{
+ struct bfq_data *bfqd = e->elevator_data;
+ unsigned long uninitialized_var(__data);
+ int ret = bfq_var_store(&__data, (page), count);
+
+ if (__data < 1)
+ __data = 1;
+ else if (__data > INT_MAX)
+ __data = INT_MAX;
+
+ bfqd->bfq_timeout[BLK_RW_SYNC] = msecs_to_jiffies(__data);
+ if (bfqd->bfq_user_max_budget == 0)
+ bfqd->bfq_max_budget = bfq_estimated_max_budget(bfqd);
+
+ return ret;
+}
+
+static ssize_t bfq_low_latency_store(struct elevator_queue *e,
+ const char *page, size_t count)
+{
+ struct bfq_data *bfqd = e->elevator_data;
+ unsigned long uninitialized_var(__data);
+ int ret = bfq_var_store(&__data, (page), count);
+
+ if (__data > 1)
+ __data = 1;
+ if (__data == 0 && bfqd->low_latency != 0)
+ bfq_end_wr(bfqd);
+ bfqd->low_latency = __data;
+
+ return ret;
+}
+
+#define BFQ_ATTR(name) \
+ __ATTR(name, S_IRUGO|S_IWUSR, bfq_##name##_show, bfq_##name##_store)
+
+static struct elv_fs_entry bfq_attrs[] = {
+ BFQ_ATTR(quantum),
+ BFQ_ATTR(fifo_expire_sync),
+ BFQ_ATTR(fifo_expire_async),
+ BFQ_ATTR(back_seek_max),
+ BFQ_ATTR(back_seek_penalty),
+ BFQ_ATTR(slice_idle),
+ BFQ_ATTR(max_budget),
+ BFQ_ATTR(max_budget_async_rq),
+ BFQ_ATTR(timeout_sync),
+ BFQ_ATTR(timeout_async),
+ BFQ_ATTR(low_latency),
+ BFQ_ATTR(wr_coeff),
+ BFQ_ATTR(wr_max_time),
+ BFQ_ATTR(wr_rt_max_time),
+ BFQ_ATTR(wr_min_idle_time),
+ BFQ_ATTR(wr_min_inter_arr_async),
+ BFQ_ATTR(wr_max_softrt_rate),
+ BFQ_ATTR(weights),
+ __ATTR_NULL
+};
+
+static struct elevator_type iosched_bfq = {
+ .ops = {
+ .elevator_merge_fn = bfq_merge,
+ .elevator_merged_fn = bfq_merged_request,
+ .elevator_merge_req_fn = bfq_merged_requests,
+ .elevator_allow_merge_fn = bfq_allow_merge,
+ .elevator_dispatch_fn = bfq_dispatch_requests,
+ .elevator_add_req_fn = bfq_insert_request,
+ .elevator_activate_req_fn = bfq_activate_request,
+ .elevator_deactivate_req_fn = bfq_deactivate_request,
+ .elevator_completed_req_fn = bfq_completed_request,
+ .elevator_former_req_fn = elv_rb_former_request,
+ .elevator_latter_req_fn = elv_rb_latter_request,
+ .elevator_init_icq_fn = bfq_init_icq,
+ .elevator_exit_icq_fn = bfq_exit_icq,
+ .elevator_set_req_fn = bfq_set_request,
+ .elevator_put_req_fn = bfq_put_request,
+ .elevator_may_queue_fn = bfq_may_queue,
+ .elevator_init_fn = bfq_init_queue,
+ .elevator_exit_fn = bfq_exit_queue,
+ },
+ .icq_size = sizeof(struct bfq_io_cq),
+ .icq_align = __alignof__(struct bfq_io_cq),
+ .elevator_attrs = bfq_attrs,
+ .elevator_name = "bfq",
+ .elevator_owner = THIS_MODULE,
+};
+
+static int __init bfq_init(void)
+{
+ /*
+ * Can be 0 on HZ < 1000 setups.
+ */
+ if (bfq_slice_idle == 0)
+ bfq_slice_idle = 1;
+
+ if (bfq_timeout_async == 0)
+ bfq_timeout_async = 1;
+
+ if (bfq_slab_setup())
+ return -ENOMEM;
+
+ /*
+ * Times to load large popular applications for the typical systems
+ * installed on the reference devices (see the comments before the
+ * definitions of the two arrays).
+ */
+ T_slow[0] = msecs_to_jiffies(2600);
+ T_slow[1] = msecs_to_jiffies(1000);
+ T_fast[0] = msecs_to_jiffies(5500);
+ T_fast[1] = msecs_to_jiffies(2000);
+
+ /*
+ * Thresholds that determine the switch between speed classes (see the
+ * comments before the definition of the array).
+ */
+ device_speed_thresh[0] = (R_fast[0] + R_slow[0]) / 2;
+ device_speed_thresh[1] = (R_fast[1] + R_slow[1]) / 2;
+
+ elv_register(&iosched_bfq);
+	pr_info("BFQ I/O-scheduler version: v7r4\n");
+
+ return 0;
+}
+
+static void __exit bfq_exit(void)
+{
+ elv_unregister(&iosched_bfq);
+ bfq_slab_kill();
+}
+
+module_init(bfq_init);
+module_exit(bfq_exit);
+
+MODULE_AUTHOR("Fabio Checconi, Paolo Valente");
+MODULE_LICENSE("GPL");
diff --git a/block/bfq-sched.c b/block/bfq-sched.c
new file mode 100644
index 0000000..0fd077c
--- /dev/null
+++ b/block/bfq-sched.c
@@ -0,0 +1,1176 @@
+/*
+ * BFQ: Hierarchical B-WF2Q+ scheduler.
+ *
+ * Based on ideas and code from CFQ:
+ * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
+ *
+ * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it>
+ * Paolo Valente <paolo.valente@unimore.it>
+ *
+ * Copyright (C) 2010 Paolo Valente <paolo.valente@unimore.it>
+ */
+
+#ifdef CONFIG_CGROUP_BFQIO
+#define for_each_entity(entity) \
+ for (; entity != NULL; entity = entity->parent)
+
+#define for_each_entity_safe(entity, parent) \
+ for (; entity && ({ parent = entity->parent; 1; }); entity = parent)
+
+static struct bfq_entity *bfq_lookup_next_entity(struct bfq_sched_data *sd,
+ int extract,
+ struct bfq_data *bfqd);
+
+static inline void bfq_update_budget(struct bfq_entity *next_in_service)
+{
+ struct bfq_entity *bfqg_entity;
+ struct bfq_group *bfqg;
+ struct bfq_sched_data *group_sd;
+
+ BUG_ON(next_in_service == NULL);
+
+ group_sd = next_in_service->sched_data;
+
+ bfqg = container_of(group_sd, struct bfq_group, sched_data);
+ /*
+ * bfq_group's my_entity field is not NULL only if the group
+ * is not the root group. We must not touch the root entity
+ * as it must never become an in-service entity.
+ */
+ bfqg_entity = bfqg->my_entity;
+ if (bfqg_entity != NULL)
+ bfqg_entity->budget = next_in_service->budget;
+}
+
+static int bfq_update_next_in_service(struct bfq_sched_data *sd)
+{
+ struct bfq_entity *next_in_service;
+
+ if (sd->in_service_entity != NULL)
+ /* will update/requeue at the end of service */
+ return 0;
+
+ /*
+ * NOTE: this can be improved in many ways, such as returning
+ * 1 (and thus propagating upwards the update) only when the
+ * budget changes, or caching the bfqq that will be scheduled
+	 * next from this subtree. For now we worry more about
+ * correctness than about performance...
+ */
+ next_in_service = bfq_lookup_next_entity(sd, 0, NULL);
+ sd->next_in_service = next_in_service;
+
+ if (next_in_service != NULL)
+ bfq_update_budget(next_in_service);
+
+ return 1;
+}
+
+static inline void bfq_check_next_in_service(struct bfq_sched_data *sd,
+ struct bfq_entity *entity)
+{
+ BUG_ON(sd->next_in_service != entity);
+}
+#else
+#define for_each_entity(entity) \
+ for (; entity != NULL; entity = NULL)
+
+#define for_each_entity_safe(entity, parent) \
+ for (parent = NULL; entity != NULL; entity = parent)
+
+static inline int bfq_update_next_in_service(struct bfq_sched_data *sd)
+{
+ return 0;
+}
+
+static inline void bfq_check_next_in_service(struct bfq_sched_data *sd,
+ struct bfq_entity *entity)
+{
+}
+
+static inline void bfq_update_budget(struct bfq_entity *next_in_service)
+{
+}
+#endif
+
+/*
+ * Shift for timestamp calculations. This actually limits the maximum
+ * service allowed in one timestamp delta (small shift values increase it),
+ * the maximum total weight that can be used for the queues in the system
+ * (big shift values increase it), and the period of virtual time wraparounds.
+ */
+#define WFQ_SERVICE_SHIFT 22
+
+/**
+ * bfq_gt - compare two timestamps.
+ * @a: first ts.
+ * @b: second ts.
+ *
+ * Return @a > @b, dealing with wrapping correctly.
+ */
+static inline int bfq_gt(u64 a, u64 b)
+{
+ return (s64)(a - b) > 0;
+}
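
The wrapping-safe comparison above is the usual modular-arithmetic trick: subtract as unsigned, then interpret the difference as signed. A small user-space sketch (hypothetical values, not part of the patch) shows why the ordering survives a wraparound:

    #include <assert.h>
    #include <stdint.h>

    /* Same logic as bfq_gt(): a is "after" b iff the signed difference is > 0. */
    static int ts_after(uint64_t a, uint64_t b)
    {
            return (int64_t)(a - b) > 0;
    }

    int main(void)
    {
            assert(ts_after(10, 5));                 /* ordinary case           */
            assert(!ts_after(5, 10));
            assert(ts_after(3, UINT64_MAX - 4));     /* 3 lies "after" the wrap */
            return 0;
    }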
+
+static inline struct bfq_queue *bfq_entity_to_bfqq(struct bfq_entity *entity)
+{
+ struct bfq_queue *bfqq = NULL;
+
+ BUG_ON(entity == NULL);
+
+ if (entity->my_sched_data == NULL)
+ bfqq = container_of(entity, struct bfq_queue, entity);
+
+ return bfqq;
+}
+
+
+/**
+ * bfq_delta - map service into the virtual time domain.
+ * @service: amount of service.
+ * @weight: scale factor (weight of an entity or weight sum).
+ */
+static inline u64 bfq_delta(unsigned long service,
+ unsigned long weight)
+{
+ u64 d = (u64)service << WFQ_SERVICE_SHIFT;
+
+ do_div(d, weight);
+ return d;
+}
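
In other words, the virtual-time charge for a given amount of service is inversely proportional to the weight; the shift only adds fixed-point resolution. A stand-alone restatement of the same arithmetic (illustrative only, with made-up numbers):

    #include <stdint.h>
    #include <stdio.h>

    #define WFQ_SERVICE_SHIFT 22

    /* User-space restatement of bfq_delta(): service scaled into the virtual
     * time domain and divided by the weight (or weight sum). */
    static uint64_t vtime_delta(unsigned long service, unsigned long weight)
    {
            return ((uint64_t)service << WFQ_SERVICE_SHIFT) / weight;
    }

    int main(void)
    {
            /* The same 8 units of service advance the timestamps of a
             * weight-100 entity four times as much as those of a weight-400
             * entity, which is why the heavier entity is selected more often. */
            printf("weight 100: %llu\n", (unsigned long long)vtime_delta(8, 100));
            printf("weight 400: %llu\n", (unsigned long long)vtime_delta(8, 400));
            return 0;
    }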
+
+/**
+ * bfq_calc_finish - assign the finish time to an entity.
+ * @entity: the entity to act upon.
+ * @service: the service to be charged to the entity.
+ */
+static inline void bfq_calc_finish(struct bfq_entity *entity,
+ unsigned long service)
+{
+ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
+
+ BUG_ON(entity->weight == 0);
+
+ entity->finish = entity->start +
+ bfq_delta(service, entity->weight);
+
+ if (bfqq != NULL) {
+ bfq_log_bfqq(bfqq->bfqd, bfqq,
+ "calc_finish: serv %lu, w %d",
+ service, entity->weight);
+ bfq_log_bfqq(bfqq->bfqd, bfqq,
+ "calc_finish: start %llu, finish %llu, delta %llu",
+ entity->start, entity->finish,
+ bfq_delta(service, entity->weight));
+ }
+}
+
+/**
+ * bfq_entity_of - get an entity from a node.
+ * @node: the node field of the entity.
+ *
+ * Convert a node pointer to the relative entity. This is used only
+ * to simplify the logic of some functions and not as the generic
+ * conversion mechanism because, e.g., in the tree walking functions,
+ * the check for a %NULL value would be redundant.
+ */
+static inline struct bfq_entity *bfq_entity_of(struct rb_node *node)
+{
+ struct bfq_entity *entity = NULL;
+
+ if (node != NULL)
+ entity = rb_entry(node, struct bfq_entity, rb_node);
+
+ return entity;
+}
+
+/**
+ * bfq_extract - remove an entity from a tree.
+ * @root: the tree root.
+ * @entity: the entity to remove.
+ */
+static inline void bfq_extract(struct rb_root *root,
+ struct bfq_entity *entity)
+{
+ BUG_ON(entity->tree != root);
+
+ entity->tree = NULL;
+ rb_erase(&entity->rb_node, root);
+}
+
+/**
+ * bfq_idle_extract - extract an entity from the idle tree.
+ * @st: the service tree of the owning @entity.
+ * @entity: the entity being removed.
+ */
+static void bfq_idle_extract(struct bfq_service_tree *st,
+ struct bfq_entity *entity)
+{
+ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
+ struct rb_node *next;
+
+ BUG_ON(entity->tree != &st->idle);
+
+ if (entity == st->first_idle) {
+ next = rb_next(&entity->rb_node);
+ st->first_idle = bfq_entity_of(next);
+ }
+
+ if (entity == st->last_idle) {
+ next = rb_prev(&entity->rb_node);
+ st->last_idle = bfq_entity_of(next);
+ }
+
+ bfq_extract(&st->idle, entity);
+
+ if (bfqq != NULL)
+ list_del(&bfqq->bfqq_list);
+}
+
+/**
+ * bfq_insert - generic tree insertion.
+ * @root: tree root.
+ * @entity: entity to insert.
+ *
+ * This is used for the idle and the active tree, since they are both
+ * ordered by finish time.
+ */
+static void bfq_insert(struct rb_root *root, struct bfq_entity *entity)
+{
+ struct bfq_entity *entry;
+ struct rb_node **node = &root->rb_node;
+ struct rb_node *parent = NULL;
+
+ BUG_ON(entity->tree != NULL);
+
+ while (*node != NULL) {
+ parent = *node;
+ entry = rb_entry(parent, struct bfq_entity, rb_node);
+
+ if (bfq_gt(entry->finish, entity->finish))
+ node = &parent->rb_left;
+ else
+ node = &parent->rb_right;
+ }
+
+ rb_link_node(&entity->rb_node, parent, node);
+ rb_insert_color(&entity->rb_node, root);
+
+ entity->tree = root;
+}
+
+/**
+ * bfq_update_min - update the min_start field of an entity.
+ * @entity: the entity to update.
+ * @node: one of its children.
+ *
+ * This function is called when @entity may store an invalid value for
+ * min_start due to updates to the active tree. The function assumes
+ * that the subtree rooted at @node (which may be its left or its right
+ * child) has a valid min_start value.
+ */
+static inline void bfq_update_min(struct bfq_entity *entity,
+ struct rb_node *node)
+{
+ struct bfq_entity *child;
+
+ if (node != NULL) {
+ child = rb_entry(node, struct bfq_entity, rb_node);
+ if (bfq_gt(entity->min_start, child->min_start))
+ entity->min_start = child->min_start;
+ }
+}
+
+/**
+ * bfq_update_active_node - recalculate min_start.
+ * @node: the node to update.
+ *
+ * @node may have changed position or one of its children may have moved,
+ * this function updates its min_start value. The left and right subtrees
+ * are assumed to hold a correct min_start value.
+ */
+static inline void bfq_update_active_node(struct rb_node *node)
+{
+ struct bfq_entity *entity = rb_entry(node, struct bfq_entity, rb_node);
+
+ entity->min_start = entity->start;
+ bfq_update_min(entity, node->rb_right);
+ bfq_update_min(entity, node->rb_left);
+}
+
+/**
+ * bfq_update_active_tree - update min_start for the whole active tree.
+ * @node: the starting node.
+ *
+ * @node must be the deepest modified node after an update. This function
+ * updates its min_start using the values held by its children, assuming
+ * that they did not change, and then updates all the nodes that may have
+ * changed in the path to the root. The only nodes that may have changed
+ * are the ones in the path or their siblings.
+ */
+static void bfq_update_active_tree(struct rb_node *node)
+{
+ struct rb_node *parent;
+
+up:
+ bfq_update_active_node(node);
+
+ parent = rb_parent(node);
+ if (parent == NULL)
+ return;
+
+ if (node == parent->rb_left && parent->rb_right != NULL)
+ bfq_update_active_node(parent->rb_right);
+ else if (parent->rb_left != NULL)
+ bfq_update_active_node(parent->rb_left);
+
+ node = parent;
+ goto up;
+}
+
+static void bfq_weights_tree_add(struct bfq_data *bfqd,
+ struct bfq_entity *entity,
+ struct rb_root *root);
+
+static void bfq_weights_tree_remove(struct bfq_data *bfqd,
+ struct bfq_entity *entity,
+ struct rb_root *root);
+
+
+/**
+ * bfq_active_insert - insert an entity in the active tree of its group/device.
+ * @st: the service tree of the entity.
+ * @entity: the entity being inserted.
+ *
+ * The active tree is ordered by finish time, but an extra key is kept
+ * per each node, containing the minimum value for the start times of
+ * its children (and the node itself), so it's possible to search for
+ * the eligible node with the lowest finish time in logarithmic time.
+ */
+static void bfq_active_insert(struct bfq_service_tree *st,
+ struct bfq_entity *entity)
+{
+ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
+ struct rb_node *node = &entity->rb_node;
+#ifdef CONFIG_CGROUP_BFQIO
+ struct bfq_sched_data *sd = NULL;
+ struct bfq_group *bfqg = NULL;
+ struct bfq_data *bfqd = NULL;
+#endif
+
+ bfq_insert(&st->active, entity);
+
+ if (node->rb_left != NULL)
+ node = node->rb_left;
+ else if (node->rb_right != NULL)
+ node = node->rb_right;
+
+ bfq_update_active_tree(node);
+
+#ifdef CONFIG_CGROUP_BFQIO
+ sd = entity->sched_data;
+ bfqg = container_of(sd, struct bfq_group, sched_data);
+ BUG_ON(!bfqg);
+ bfqd = (struct bfq_data *)bfqg->bfqd;
+#endif
+ if (bfqq != NULL)
+ list_add(&bfqq->bfqq_list, &bfqq->bfqd->active_list);
+#ifdef CONFIG_CGROUP_BFQIO
+ else { /* bfq_group */
+ BUG_ON(!bfqd);
+ bfq_weights_tree_add(bfqd, entity, &bfqd->group_weights_tree);
+ }
+ if (bfqg != bfqd->root_group) {
+ BUG_ON(!bfqg);
+ BUG_ON(!bfqd);
+ bfqg->active_entities++;
+ if (bfqg->active_entities == 2)
+ bfqd->active_numerous_groups++;
+ }
+#endif
+}
+
+/**
+ * bfq_ioprio_to_weight - calc a weight from an ioprio.
+ * @ioprio: the ioprio value to convert.
+ */
+static unsigned short bfq_ioprio_to_weight(int ioprio)
+{
+ WARN_ON(ioprio < 0 || ioprio >= IOPRIO_BE_NR);
+ return IOPRIO_BE_NR - ioprio;
+}
+
+/**
+ * bfq_weight_to_ioprio - calc an ioprio from a weight.
+ * @weight: the weight value to convert.
+ *
+ * To preserve as much as possible the old only-ioprio user interface,
+ * 0 is used as an escape ioprio value for weights (numerically) equal to
+ * or larger than IOPRIO_BE_NR.
+ */
+static unsigned short bfq_weight_to_ioprio(int weight)
+{
+ WARN_ON(weight < BFQ_MIN_WEIGHT || weight > BFQ_MAX_WEIGHT);
+ return IOPRIO_BE_NR - weight < 0 ? 0 : IOPRIO_BE_NR - weight;
+}
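
With IOPRIO_BE_NR equal to 8 in the mainline kernel (assumed below), these two conversions map best-effort priorities 0..7 onto weights 8..1, and weights of 8 or more fall back to the escape ioprio 0. A quick check of that mapping outside the kernel:

    #include <assert.h>

    #define IOPRIO_BE_NR 8  /* mainline value, assumed here for illustration */

    static unsigned short ioprio_to_weight(int ioprio)
    {
            return IOPRIO_BE_NR - ioprio;
    }

    static unsigned short weight_to_ioprio(int weight)
    {
            return IOPRIO_BE_NR - weight < 0 ? 0 : IOPRIO_BE_NR - weight;
    }

    int main(void)
    {
            assert(ioprio_to_weight(0) == 8);      /* highest BE class -> largest weight  */
            assert(ioprio_to_weight(7) == 1);      /* lowest BE class  -> smallest weight */
            assert(weight_to_ioprio(3) == 5);      /* round trip inside the BE range      */
            assert(weight_to_ioprio(1000) == 0);   /* large weights use the escape value  */
            return 0;
    }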
+
+static inline void bfq_get_entity(struct bfq_entity *entity)
+{
+ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
+
+ if (bfqq != NULL) {
+ atomic_inc(&bfqq->ref);
+ bfq_log_bfqq(bfqq->bfqd, bfqq, "get_entity: %p %d",
+ bfqq, atomic_read(&bfqq->ref));
+ }
+}
+
+/**
+ * bfq_find_deepest - find the deepest node that an extraction can modify.
+ * @node: the node being removed.
+ *
+ * Do the first step of an extraction in an rb tree, looking for the
+ * node that will replace @node, and returning the deepest node that
+ * the following modifications to the tree can touch. If @node is the
+ * last node in the tree return %NULL.
+ */
+static struct rb_node *bfq_find_deepest(struct rb_node *node)
+{
+ struct rb_node *deepest;
+
+ if (node->rb_right == NULL && node->rb_left == NULL)
+ deepest = rb_parent(node);
+ else if (node->rb_right == NULL)
+ deepest = node->rb_left;
+ else if (node->rb_left == NULL)
+ deepest = node->rb_right;
+ else {
+ deepest = rb_next(node);
+ if (deepest->rb_right != NULL)
+ deepest = deepest->rb_right;
+ else if (rb_parent(deepest) != node)
+ deepest = rb_parent(deepest);
+ }
+
+ return deepest;
+}
+
+/**
+ * bfq_active_extract - remove an entity from the active tree.
+ * @st: the service_tree containing the tree.
+ * @entity: the entity being removed.
+ */
+static void bfq_active_extract(struct bfq_service_tree *st,
+ struct bfq_entity *entity)
+{
+ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
+ struct rb_node *node;
+#ifdef CONFIG_CGROUP_BFQIO
+ struct bfq_sched_data *sd = NULL;
+ struct bfq_group *bfqg = NULL;
+ struct bfq_data *bfqd = NULL;
+#endif
+
+ node = bfq_find_deepest(&entity->rb_node);
+ bfq_extract(&st->active, entity);
+
+ if (node != NULL)
+ bfq_update_active_tree(node);
+
+#ifdef CONFIG_CGROUP_BFQIO
+ sd = entity->sched_data;
+ bfqg = container_of(sd, struct bfq_group, sched_data);
+ BUG_ON(!bfqg);
+ bfqd = (struct bfq_data *)bfqg->bfqd;
+#endif
+ if (bfqq != NULL)
+ list_del(&bfqq->bfqq_list);
+#ifdef CONFIG_CGROUP_BFQIO
+ else { /* bfq_group */
+ BUG_ON(!bfqd);
+ bfq_weights_tree_remove(bfqd, entity,
+ &bfqd->group_weights_tree);
+ }
+ if (bfqg != bfqd->root_group) {
+ BUG_ON(!bfqg);
+ BUG_ON(!bfqd);
+ BUG_ON(!bfqg->active_entities);
+ bfqg->active_entities--;
+ if (bfqg->active_entities == 1) {
+ BUG_ON(!bfqd->active_numerous_groups);
+ bfqd->active_numerous_groups--;
+ }
+ }
+#endif
+}
+
+/**
+ * bfq_idle_insert - insert an entity into the idle tree.
+ * @st: the service tree containing the tree.
+ * @entity: the entity to insert.
+ */
+static void bfq_idle_insert(struct bfq_service_tree *st,
+ struct bfq_entity *entity)
+{
+ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
+ struct bfq_entity *first_idle = st->first_idle;
+ struct bfq_entity *last_idle = st->last_idle;
+
+ if (first_idle == NULL || bfq_gt(first_idle->finish, entity->finish))
+ st->first_idle = entity;
+ if (last_idle == NULL || bfq_gt(entity->finish, last_idle->finish))
+ st->last_idle = entity;
+
+ bfq_insert(&st->idle, entity);
+
+ if (bfqq != NULL)
+ list_add(&bfqq->bfqq_list, &bfqq->bfqd->idle_list);
+}
+
+/**
+ * bfq_forget_entity - remove an entity from the wfq trees.
+ * @st: the service tree.
+ * @entity: the entity being removed.
+ *
+ * Update the device status and forget everything about @entity, putting
+ * the device reference to it, if it is a queue. Entities belonging to
+ * groups are not refcounted.
+ */
+static void bfq_forget_entity(struct bfq_service_tree *st,
+ struct bfq_entity *entity)
+{
+ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
+ struct bfq_sched_data *sd;
+
+ BUG_ON(!entity->on_st);
+
+ entity->on_st = 0;
+ st->wsum -= entity->weight;
+ if (bfqq != NULL) {
+ sd = entity->sched_data;
+ bfq_log_bfqq(bfqq->bfqd, bfqq, "forget_entity: %p %d",
+ bfqq, atomic_read(&bfqq->ref));
+ bfq_put_queue(bfqq);
+ }
+}
+
+/**
+ * bfq_put_idle_entity - release the idle tree ref of an entity.
+ * @st: service tree for the entity.
+ * @entity: the entity being released.
+ */
+static void bfq_put_idle_entity(struct bfq_service_tree *st,
+ struct bfq_entity *entity)
+{
+ bfq_idle_extract(st, entity);
+ bfq_forget_entity(st, entity);
+}
+
+/**
+ * bfq_forget_idle - update the idle tree if necessary.
+ * @st: the service tree to act upon.
+ *
+ * To preserve the global O(log N) complexity we only remove one entry here;
+ * as the idle tree will not grow indefinitely this can be done safely.
+ */
+static void bfq_forget_idle(struct bfq_service_tree *st)
+{
+ struct bfq_entity *first_idle = st->first_idle;
+ struct bfq_entity *last_idle = st->last_idle;
+
+ if (RB_EMPTY_ROOT(&st->active) && last_idle != NULL &&
+ !bfq_gt(last_idle->finish, st->vtime)) {
+ /*
+ * Forget the whole idle tree, increasing the vtime past
+ * the last finish time of idle entities.
+ */
+ st->vtime = last_idle->finish;
+ }
+
+ if (first_idle != NULL && !bfq_gt(first_idle->finish, st->vtime))
+ bfq_put_idle_entity(st, first_idle);
+}
+
+static struct bfq_service_tree *
+__bfq_entity_update_weight_prio(struct bfq_service_tree *old_st,
+ struct bfq_entity *entity)
+{
+ struct bfq_service_tree *new_st = old_st;
+
+ if (entity->ioprio_changed) {
+ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
+ unsigned short prev_weight, new_weight;
+ struct bfq_data *bfqd = NULL;
+ struct rb_root *root;
+#ifdef CONFIG_CGROUP_BFQIO
+ struct bfq_sched_data *sd;
+ struct bfq_group *bfqg;
+#endif
+
+ if (bfqq != NULL)
+ bfqd = bfqq->bfqd;
+#ifdef CONFIG_CGROUP_BFQIO
+ else {
+ sd = entity->my_sched_data;
+ bfqg = container_of(sd, struct bfq_group, sched_data);
+ BUG_ON(!bfqg);
+ bfqd = (struct bfq_data *)bfqg->bfqd;
+ BUG_ON(!bfqd);
+ }
+#endif
+
+ BUG_ON(old_st->wsum < entity->weight);
+ old_st->wsum -= entity->weight;
+
+ if (entity->new_weight != entity->orig_weight) {
+ entity->orig_weight = entity->new_weight;
+ entity->ioprio =
+ bfq_weight_to_ioprio(entity->orig_weight);
+ } else if (entity->new_ioprio != entity->ioprio) {
+ entity->ioprio = entity->new_ioprio;
+ entity->orig_weight =
+ bfq_ioprio_to_weight(entity->ioprio);
+ } else
+ entity->new_weight = entity->orig_weight =
+ bfq_ioprio_to_weight(entity->ioprio);
+
+ entity->ioprio_class = entity->new_ioprio_class;
+ entity->ioprio_changed = 0;
+
+ /*
+ * NOTE: here we may be changing the weight too early,
+ * this will cause unfairness. The correct approach
+ * would have required additional complexity to defer
+ * weight changes to the proper time instants (i.e.,
+ * when entity->finish <= old_st->vtime).
+ */
+ new_st = bfq_entity_service_tree(entity);
+
+ prev_weight = entity->weight;
+ new_weight = entity->orig_weight *
+ (bfqq != NULL ? bfqq->wr_coeff : 1);
+ /*
+ * If the weight of the entity changes, remove the entity
+ * from its old weight counter (if there is a counter
+ * associated with the entity), and add it to the counter
+ * associated with its new weight.
+ */
+ if (prev_weight != new_weight) {
+ root = bfqq ? &bfqd->queue_weights_tree :
+ &bfqd->group_weights_tree;
+ bfq_weights_tree_remove(bfqd, entity, root);
+ }
+ entity->weight = new_weight;
+ /*
+ * Add the entity to its weights tree only if it is
+ * not associated with a weight-raised queue.
+ */
+ if (prev_weight != new_weight &&
+ (bfqq ? bfqq->wr_coeff == 1 : 1))
+ /* If we get here, root has been initialized. */
+ bfq_weights_tree_add(bfqd, entity, root);
+
+ new_st->wsum += entity->weight;
+
+ if (new_st != old_st)
+ entity->start = new_st->vtime;
+ }
+
+ return new_st;
+}
+
+/**
+ * bfq_bfqq_served - update the scheduler status after selection for service.
+ * @bfqq: the queue being served.
+ * @served: bytes to transfer.
+ *
+ * NOTE: this can be optimized, as the timestamps of upper level entities
+ * are synchronized every time a new bfqq is selected for service. By now,
+ * we keep it to better check consistency.
+ */
+static void bfq_bfqq_served(struct bfq_queue *bfqq, unsigned long served)
+{
+ struct bfq_entity *entity = &bfqq->entity;
+ struct bfq_service_tree *st;
+
+ for_each_entity(entity) {
+ st = bfq_entity_service_tree(entity);
+
+ entity->service += served;
+ BUG_ON(entity->service > entity->budget);
+ BUG_ON(st->wsum == 0);
+
+ st->vtime += bfq_delta(served, st->wsum);
+ bfq_forget_idle(st);
+ }
+ bfq_log_bfqq(bfqq->bfqd, bfqq, "bfqq_served %lu secs", served);
+}
+
+/**
+ * bfq_bfqq_charge_full_budget - set the service to the entity budget.
+ * @bfqq: the queue that needs a service update.
+ *
+ * When it's not possible to be fair in the service domain, because
+ * a queue is not consuming its budget fast enough (the meaning of
+ * fast depends on the timeout parameter), we charge it a full
+ * budget. In this way we should obtain a sort of time-domain
+ * fairness among all the seeky/slow queues.
+ */
+static inline void bfq_bfqq_charge_full_budget(struct bfq_queue *bfqq)
+{
+ struct bfq_entity *entity = &bfqq->entity;
+
+ bfq_log_bfqq(bfqq->bfqd, bfqq, "charge_full_budget");
+
+ bfq_bfqq_served(bfqq, entity->budget - entity->service);
+}
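
The effect of charging the full budget shows up directly in the timestamps: the unused part of the budget still inflates the queue's virtual finish time, so a slow or seeky queue is pushed further back in the schedule. A rough numeric sketch, with hypothetical figures and the same arithmetic as bfq_delta():

    #include <stdint.h>
    #include <stdio.h>

    #define WFQ_SERVICE_SHIFT 22

    static uint64_t vtime_delta(unsigned long service, unsigned long weight)
    {
            return ((uint64_t)service << WFQ_SERVICE_SHIFT) / weight;
    }

    int main(void)
    {
            /* Hypothetical queue of weight 100 with a 16384-sector budget, of
             * which only 2048 sectors were served before the budget timeout. */
            unsigned long budget = 16384, served = 2048, weight = 100;

            printf("charge only the service received: +%llu vtime\n",
                   (unsigned long long)vtime_delta(served, weight));
            printf("charge the whole budget:          +%llu vtime\n",
                   (unsigned long long)vtime_delta(budget, weight));
            /* The second, larger charge delays the queue's next selection and
             * approximates time-domain fairness for queues that use the disk
             * inefficiently. */
            return 0;
    }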
+
+/**
+ * __bfq_activate_entity - activate an entity.
+ * @entity: the entity being activated.
+ *
+ * Called whenever an entity is activated, i.e., it is not active and one
+ * of its children receives a new request, or has to be reactivated due to
+ * budget exhaustion. It uses the current budget of the entity (and the
+ * service received if @entity is active) of the queue to calculate its
+ * timestamps.
+ */
+static void __bfq_activate_entity(struct bfq_entity *entity)
+{
+ struct bfq_sched_data *sd = entity->sched_data;
+ struct bfq_service_tree *st = bfq_entity_service_tree(entity);
+
+ if (entity == sd->in_service_entity) {
+ BUG_ON(entity->tree != NULL);
+ /*
+ * If we are requeueing the current entity we have
+ * to take care of not charging to it service it has
+ * not received.
+ */
+ bfq_calc_finish(entity, entity->service);
+ entity->start = entity->finish;
+ sd->in_service_entity = NULL;
+ } else if (entity->tree == &st->active) {
+ /*
+ * Requeueing an entity due to a change of some
+ * next_in_service entity below it. We reuse the
+ * old start time.
+ */
+ bfq_active_extract(st, entity);
+ } else if (entity->tree == &st->idle) {
+ /*
+ * Must be on the idle tree, bfq_idle_extract() will
+ * check for that.
+ */
+ bfq_idle_extract(st, entity);
+ entity->start = bfq_gt(st->vtime, entity->finish) ?
+ st->vtime : entity->finish;
+ } else {
+ /*
+ * The finish time of the entity may be invalid, and
+ * it is in the past for sure, otherwise the queue
+ * would have been on the idle tree.
+ */
+ entity->start = st->vtime;
+ st->wsum += entity->weight;
+ bfq_get_entity(entity);
+
+ BUG_ON(entity->on_st);
+ entity->on_st = 1;
+ }
+
+ st = __bfq_entity_update_weight_prio(st, entity);
+ bfq_calc_finish(entity, entity->budget);
+ bfq_active_insert(st, entity);
+}
+
+/**
+ * bfq_activate_entity - activate an entity and its ancestors if necessary.
+ * @entity: the entity to activate.
+ *
+ * Activate @entity and all the entities on the path from it to the root.
+ */
+static void bfq_activate_entity(struct bfq_entity *entity)
+{
+ struct bfq_sched_data *sd;
+
+ for_each_entity(entity) {
+ __bfq_activate_entity(entity);
+
+ sd = entity->sched_data;
+ if (!bfq_update_next_in_service(sd))
+ /*
+ * No need to propagate the activation to the
+ * upper entities, as they will be updated when
+ * the in-service entity is rescheduled.
+ */
+ break;
+ }
+}
+
+/**
+ * __bfq_deactivate_entity - deactivate an entity from its service tree.
+ * @entity: the entity to deactivate.
+ * @requeue: if false, the entity will not be put into the idle tree.
+ *
+ * Deactivate an entity, independently from its previous state. If the
+ * entity was not on a service tree just return, otherwise if it is on
+ * any scheduler tree, extract it from that tree, and if necessary
+ * and if the caller did not specify @requeue, put it on the idle tree.
+ *
+ * Return %1 if the caller should update the entity hierarchy, i.e.,
+ * if the entity was under service or if it was the next_in_service for
+ * its sched_data; return %0 otherwise.
+ */
+static int __bfq_deactivate_entity(struct bfq_entity *entity, int requeue)
+{
+ struct bfq_sched_data *sd = entity->sched_data;
+ struct bfq_service_tree *st = bfq_entity_service_tree(entity);
+ int was_in_service = entity == sd->in_service_entity;
+ int ret = 0;
+
+ if (!entity->on_st)
+ return 0;
+
+ BUG_ON(was_in_service && entity->tree != NULL);
+
+ if (was_in_service) {
+ bfq_calc_finish(entity, entity->service);
+ sd->in_service_entity = NULL;
+ } else if (entity->tree == &st->active)
+ bfq_active_extract(st, entity);
+ else if (entity->tree == &st->idle)
+ bfq_idle_extract(st, entity);
+ else if (entity->tree != NULL)
+ BUG();
+
+ if (was_in_service || sd->next_in_service == entity)
+ ret = bfq_update_next_in_service(sd);
+
+ if (!requeue || !bfq_gt(entity->finish, st->vtime))
+ bfq_forget_entity(st, entity);
+ else
+ bfq_idle_insert(st, entity);
+
+ BUG_ON(sd->in_service_entity == entity);
+ BUG_ON(sd->next_in_service == entity);
+
+ return ret;
+}
+
+/**
+ * bfq_deactivate_entity - deactivate an entity.
+ * @entity: the entity to deactivate.
+ * @requeue: true if the entity can be put on the idle tree
+ */
+static void bfq_deactivate_entity(struct bfq_entity *entity, int requeue)
+{
+ struct bfq_sched_data *sd;
+ struct bfq_entity *parent;
+
+ for_each_entity_safe(entity, parent) {
+ sd = entity->sched_data;
+
+ if (!__bfq_deactivate_entity(entity, requeue))
+ /*
+ * The parent entity is still backlogged, and
+ * we don't need to update it as it is still
+ * under service.
+ */
+ break;
+
+ if (sd->next_in_service != NULL)
+ /*
+ * The parent entity is still backlogged and
+ * the budgets on the path towards the root
+ * need to be updated.
+ */
+ goto update;
+
+ /*
+	 * If we reach here, the parent is no longer backlogged and
+ * we want to propagate the dequeue upwards.
+ */
+ requeue = 1;
+ }
+
+ return;
+
+update:
+ entity = parent;
+ for_each_entity(entity) {
+ __bfq_activate_entity(entity);
+
+ sd = entity->sched_data;
+ if (!bfq_update_next_in_service(sd))
+ break;
+ }
+}
+
+/**
+ * bfq_update_vtime - update vtime if necessary.
+ * @st: the service tree to act upon.
+ *
+ * If necessary update the service tree vtime to have at least one
+ * eligible entity, skipping to its start time. Assumes that the
+ * active tree of the device is not empty.
+ *
+ * NOTE: this hierarchical implementation updates vtimes quite often,
+ * we may end up with reactivated tasks getting timestamps after a
+ * vtime skip done because we needed a ->first_active entity on some
+ * intermediate node.
+ */
+static void bfq_update_vtime(struct bfq_service_tree *st)
+{
+ struct bfq_entity *entry;
+ struct rb_node *node = st->active.rb_node;
+
+ entry = rb_entry(node, struct bfq_entity, rb_node);
+ if (bfq_gt(entry->min_start, st->vtime)) {
+ st->vtime = entry->min_start;
+ bfq_forget_idle(st);
+ }
+}
+
+/**
+ * bfq_first_active_entity - find the eligible entity with
+ * the smallest finish time
+ * @st: the service tree to select from.
+ *
+ * This function searches the first schedulable entity, starting from the
+ * root of the tree and going on the left every time on this side there is
+ * a subtree with at least one eligible (start >= vtime) entity. The path
+ * on the right is followed only if a) the left subtree contains no eligible
+ * entities and b) no eligible entity has been found yet.
+ */
+static struct bfq_entity *bfq_first_active_entity(struct bfq_service_tree *st)
+{
+ struct bfq_entity *entry, *first = NULL;
+ struct rb_node *node = st->active.rb_node;
+
+ while (node != NULL) {
+ entry = rb_entry(node, struct bfq_entity, rb_node);
+left:
+ if (!bfq_gt(entry->start, st->vtime))
+ first = entry;
+
+ BUG_ON(bfq_gt(entry->min_start, st->vtime));
+
+ if (node->rb_left != NULL) {
+ entry = rb_entry(node->rb_left,
+ struct bfq_entity, rb_node);
+ if (!bfq_gt(entry->min_start, st->vtime)) {
+ node = node->rb_left;
+ goto left;
+ }
+ }
+ if (first != NULL)
+ break;
+ node = node->rb_right;
+ }
+
+ BUG_ON(first == NULL && !RB_EMPTY_ROOT(&st->active));
+ return first;
+}
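
The augmented min_start field is what lets the walk above discard whole subtrees: a subtree whose min_start is past the current vtime cannot contain an eligible entity. Ignoring the tree and the O(log N) bound, the selection rule itself boils down to the following toy sketch (illustrative only, not part of the patch):

    #include <stdint.h>
    #include <stdio.h>

    struct toy_entity { uint64_t start, finish; };

    /* Linear-time restatement of the B-WF2Q+ selection rule: among the
     * entities that are eligible (start <= vtime), pick the one with the
     * smallest finish time. The rbtree plus the min_start augmentation
     * gives the same answer in O(log N). */
    static const struct toy_entity *pick_next(const struct toy_entity *e,
                                              int n, uint64_t vtime)
    {
            const struct toy_entity *best = NULL;
            int i;

            for (i = 0; i < n; i++) {
                    if (e[i].start > vtime)
                            continue;       /* not yet eligible */
                    if (best == NULL || e[i].finish < best->finish)
                            best = &e[i];
            }
            return best;
    }

    int main(void)
    {
            struct toy_entity e[] = {
                    { .start = 10, .finish = 40 },
                    { .start =  0, .finish = 50 },
                    { .start = 90, .finish = 95 },  /* not eligible at vtime 20 */
            };
            const struct toy_entity *next = pick_next(e, 3, 20);

            printf("selected entity finishes at %llu\n",
                   (unsigned long long)next->finish);  /* prints 40 */
            return 0;
    }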
+
+/**
+ * __bfq_lookup_next_entity - return the first eligible entity in @st.
+ * @st: the service tree.
+ *
+ * Update the virtual time in @st and return the first eligible entity
+ * it contains.
+ */
+static struct bfq_entity *__bfq_lookup_next_entity(struct bfq_service_tree *st,
+ bool force)
+{
+ struct bfq_entity *entity, *new_next_in_service = NULL;
+
+ if (RB_EMPTY_ROOT(&st->active))
+ return NULL;
+
+ bfq_update_vtime(st);
+ entity = bfq_first_active_entity(st);
+ BUG_ON(bfq_gt(entity->start, st->vtime));
+
+ /*
+ * If the chosen entity does not match with the sched_data's
+ * next_in_service and we are forcedly serving the IDLE priority
+ * class tree, bubble up budget update.
+ */
+ if (unlikely(force && entity != entity->sched_data->next_in_service)) {
+ new_next_in_service = entity;
+ for_each_entity(new_next_in_service)
+ bfq_update_budget(new_next_in_service);
+ }
+
+ return entity;
+}
+
+/**
+ * bfq_lookup_next_entity - return the first eligible entity in @sd.
+ * @sd: the sched_data.
+ * @extract: if true the returned entity will be also extracted from @sd.
+ *
+ * NOTE: since we cache the next_in_service entity at each level of the
+ * hierarchy, the complexity of the lookup could be decreased with
+ * no effort at all by just returning the cached next_in_service value;
+ * we prefer to do full lookups to test the consistency of the data
+ * structures.
+ */
+static struct bfq_entity *bfq_lookup_next_entity(struct bfq_sched_data *sd,
+ int extract,
+ struct bfq_data *bfqd)
+{
+ struct bfq_service_tree *st = sd->service_tree;
+ struct bfq_entity *entity;
+ int i = 0;
+
+ BUG_ON(sd->in_service_entity != NULL);
+
+ if (bfqd != NULL &&
+ jiffies - bfqd->bfq_class_idle_last_service > BFQ_CL_IDLE_TIMEOUT) {
+ entity = __bfq_lookup_next_entity(st + BFQ_IOPRIO_CLASSES - 1,
+ true);
+ if (entity != NULL) {
+ i = BFQ_IOPRIO_CLASSES - 1;
+ bfqd->bfq_class_idle_last_service = jiffies;
+ sd->next_in_service = entity;
+ }
+ }
+ for (; i < BFQ_IOPRIO_CLASSES; i++) {
+ entity = __bfq_lookup_next_entity(st + i, false);
+ if (entity != NULL) {
+ if (extract) {
+ bfq_check_next_in_service(sd, entity);
+ bfq_active_extract(st + i, entity);
+ sd->in_service_entity = entity;
+ sd->next_in_service = NULL;
+ }
+ break;
+ }
+ }
+
+ return entity;
+}
+
+/*
+ * Get next queue for service.
+ */
+static struct bfq_queue *bfq_get_next_queue(struct bfq_data *bfqd)
+{
+ struct bfq_entity *entity = NULL;
+ struct bfq_sched_data *sd;
+ struct bfq_queue *bfqq;
+
+ BUG_ON(bfqd->in_service_queue != NULL);
+
+ if (bfqd->busy_queues == 0)
+ return NULL;
+
+ sd = &bfqd->root_group->sched_data;
+ for (; sd != NULL; sd = entity->my_sched_data) {
+ entity = bfq_lookup_next_entity(sd, 1, bfqd);
+ BUG_ON(entity == NULL);
+ entity->service = 0;
+ }
+
+ bfqq = bfq_entity_to_bfqq(entity);
+ BUG_ON(bfqq == NULL);
+
+ return bfqq;
+}
+
+static void __bfq_bfqd_reset_in_service(struct bfq_data *bfqd)
+{
+ if (bfqd->in_service_bic != NULL) {
+ put_io_context(bfqd->in_service_bic->icq.ioc);
+ bfqd->in_service_bic = NULL;
+ }
+
+ bfqd->in_service_queue = NULL;
+ del_timer(&bfqd->idle_slice_timer);
+}
+
+static void bfq_deactivate_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq,
+ int requeue)
+{
+ struct bfq_entity *entity = &bfqq->entity;
+
+ if (bfqq == bfqd->in_service_queue)
+ __bfq_bfqd_reset_in_service(bfqd);
+
+ bfq_deactivate_entity(entity, requeue);
+}
+
+static void bfq_activate_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq)
+{
+ struct bfq_entity *entity = &bfqq->entity;
+
+ bfq_activate_entity(entity);
+}
+
+/*
+ * Called when the bfqq no longer has requests pending, remove it from
+ * the service tree.
+ */
+static void bfq_del_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq,
+ int requeue)
+{
+ BUG_ON(!bfq_bfqq_busy(bfqq));
+ BUG_ON(!RB_EMPTY_ROOT(&bfqq->sort_list));
+
+ bfq_log_bfqq(bfqd, bfqq, "del from busy");
+
+ bfq_clear_bfqq_busy(bfqq);
+
+ BUG_ON(bfqd->busy_queues == 0);
+ bfqd->busy_queues--;
+
+ if (!bfqq->dispatched) {
+ bfq_weights_tree_remove(bfqd, &bfqq->entity,
+ &bfqd->queue_weights_tree);
+ if (!blk_queue_nonrot(bfqd->queue)) {
+ BUG_ON(!bfqd->busy_in_flight_queues);
+ bfqd->busy_in_flight_queues--;
+ if (bfq_bfqq_constantly_seeky(bfqq)) {
+ BUG_ON(
+ !bfqd->const_seeky_busy_in_flight_queues);
+ bfqd->const_seeky_busy_in_flight_queues--;
+ }
+ }
+ }
+ if (bfqq->wr_coeff > 1)
+ bfqd->raised_busy_queues--;
+
+ bfq_deactivate_bfqq(bfqd, bfqq, requeue);
+}
+
+/*
+ * Called when an inactive queue receives a new request.
+ */
+static void bfq_add_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq)
+{
+ BUG_ON(bfq_bfqq_busy(bfqq));
+ BUG_ON(bfqq == bfqd->in_service_queue);
+
+ bfq_log_bfqq(bfqd, bfqq, "add to busy");
+
+ bfq_activate_bfqq(bfqd, bfqq);
+
+ bfq_mark_bfqq_busy(bfqq);
+ bfqd->busy_queues++;
+
+ if (!bfqq->dispatched) {
+ if (bfqq->wr_coeff == 1)
+ bfq_weights_tree_add(bfqd, &bfqq->entity,
+ &bfqd->queue_weights_tree);
+ if (!blk_queue_nonrot(bfqd->queue)) {
+ bfqd->busy_in_flight_queues++;
+ if (bfq_bfqq_constantly_seeky(bfqq))
+ bfqd->const_seeky_busy_in_flight_queues++;
+ }
+ }
+ if (bfqq->wr_coeff > 1)
+ bfqd->raised_busy_queues++;
+}
diff --git a/block/bfq.h b/block/bfq.h
new file mode 100644
index 0000000..c754735
--- /dev/null
+++ b/block/bfq.h
@@ -0,0 +1,719 @@
+/*
+ * BFQ-v7r4 for 3.15.0: data structures and common functions prototypes.
+ *
+ * Based on ideas and code from CFQ:
+ * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
+ *
+ * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it>
+ * Paolo Valente <paolo.valente@unimore.it>
+ *
+ * Copyright (C) 2010 Paolo Valente <paolo.valente@unimore.it>
+ */
+
+#ifndef _BFQ_H
+#define _BFQ_H
+
+#include <linux/blktrace_api.h>
+#include <linux/hrtimer.h>
+#include <linux/ioprio.h>
+#include <linux/rbtree.h>
+
+#define BFQ_IOPRIO_CLASSES 3
+#define BFQ_CL_IDLE_TIMEOUT (HZ/5)
+
+#define BFQ_MIN_WEIGHT 1
+#define BFQ_MAX_WEIGHT 1000
+
+#define BFQ_DEFAULT_GRP_WEIGHT 10
+#define BFQ_DEFAULT_GRP_IOPRIO 0
+#define BFQ_DEFAULT_GRP_CLASS IOPRIO_CLASS_BE
+
+struct bfq_entity;
+
+/**
+ * struct bfq_service_tree - per ioprio_class service tree.
+ * @active: tree for active entities (i.e., those backlogged).
+ * @idle: tree for idle entities (i.e., those not backlogged, with V <= F_i).
+ * @first_idle: idle entity with minimum F_i.
+ * @last_idle: idle entity with maximum F_i.
+ * @vtime: scheduler virtual time.
+ * @wsum: scheduler weight sum; active and idle entities contribute to it.
+ *
+ * Each service tree represents a B-WF2Q+ scheduler on its own. Each
+ * ioprio_class has its own independent scheduler, and so its own
+ * bfq_service_tree. All the fields are protected by the queue lock
+ * of the containing bfqd.
+ */
+struct bfq_service_tree {
+ struct rb_root active;
+ struct rb_root idle;
+
+ struct bfq_entity *first_idle;
+ struct bfq_entity *last_idle;
+
+ u64 vtime;
+ unsigned long wsum;
+};
+
+/**
+ * struct bfq_sched_data - multi-class scheduler.
+ * @in_service_entity: entity under service.
+ * @next_in_service: head-of-the-line entity in the scheduler.
+ * @service_tree: array of service trees, one per ioprio_class.
+ *
+ * bfq_sched_data is the basic scheduler queue. It supports three
+ * ioprio_classes, and can be used either as a toplevel queue or as
+ * an intermediate queue on a hierarchical setup.
+ * @next_in_service points to the active entity of the sched_data
+ * service trees that will be scheduled next.
+ *
+ * The supported ioprio_classes are the same as in CFQ, in descending
+ * priority order, IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, IOPRIO_CLASS_IDLE.
+ * Requests from higher priority queues are served before all the
+ * requests from lower priority queues; among requests of the same
+ * queue requests are served according to B-WF2Q+.
+ * All the fields are protected by the queue lock of the containing bfqd.
+ */
+struct bfq_sched_data {
+ struct bfq_entity *in_service_entity;
+ struct bfq_entity *next_in_service;
+ struct bfq_service_tree service_tree[BFQ_IOPRIO_CLASSES];
+};
+
+/**
+ * struct bfq_weight_counter - counter of the number of all active entities
+ * with a given weight.
+ * @weight: weight of the entities that this counter refers to.
+ * @num_active: number of active entities with this weight.
+ * @weights_node: weights tree member (see bfq_data's @queue_weights_tree
+ * and @group_weights_tree).
+ */
+struct bfq_weight_counter {
+ short int weight;
+ unsigned int num_active;
+ struct rb_node weights_node;
+};
+
+/**
+ * struct bfq_entity - schedulable entity.
+ * @rb_node: service_tree member.
+ * @weights_counter: pointer to the weight counter associated with this entity.
+ * @on_st: flag, true if the entity is on a tree (either the active or
+ * the idle one of its service_tree).
+ * @finish: B-WF2Q+ finish timestamp (aka F_i).
+ * @start: B-WF2Q+ start timestamp (aka S_i).
+ * @tree: tree the entity is enqueued into; %NULL if not on a tree.
+ * @min_start: minimum start time of the (active) subtree rooted at
+ * this entity; used for O(log N) lookups into active trees.
+ * @service: service received during the last round of service.
+ * @budget: budget used to calculate F_i; F_i = S_i + @budget / @weight.
+ * @weight: weight of the queue
+ * @parent: parent entity, for hierarchical scheduling.
+ * @my_sched_data: for non-leaf nodes in the cgroup hierarchy, the
+ * associated scheduler queue, %NULL on leaf nodes.
+ * @sched_data: the scheduler queue this entity belongs to.
+ * @ioprio: the ioprio in use.
+ * @new_weight: when a weight change is requested, the new weight value.
+ * @orig_weight: original weight, used to implement weight boosting
+ * @new_ioprio: when an ioprio change is requested, the new ioprio value.
+ * @ioprio_class: the ioprio_class in use.
+ * @new_ioprio_class: when an ioprio_class change is requested, the new
+ * ioprio_class value.
+ * @ioprio_changed: flag, true when the user requested a weight, ioprio or
+ * ioprio_class change.
+ *
+ * A bfq_entity is used to represent either a bfq_queue (leaf node in the
+ * cgroup hierarchy) or a bfq_group into the upper level scheduler. Each
+ * entity belongs to the sched_data of the parent group in the cgroup
+ * hierarchy. Non-leaf entities have also their own sched_data, stored
+ * in @my_sched_data.
+ *
+ * Each entity stores independently its priority values; this would
+ * allow different weights on different devices, but this
+ * functionality is not exported to userspace by now. Priorities and
+ * weights are updated lazily, first storing the new values into the
+ * new_* fields, then setting the @ioprio_changed flag. As soon as
+ * there is a transition in the entity state that allows the priority
+ * update to take place the effective and the requested priority
+ * values are synchronized.
+ *
+ * Unless cgroups are used, the weight value is calculated from the
+ * ioprio to export the same interface as CFQ. When dealing with
+ * ``well-behaved'' queues (i.e., queues that do not spend too much
+ * time to consume their budget and have true sequential behavior, and
+ * when there are no external factors breaking anticipation) the
+ * relative weights at each level of the cgroups hierarchy should be
+ * guaranteed. All the fields are protected by the queue lock of the
+ * containing bfqd.
+ */
+struct bfq_entity {
+ struct rb_node rb_node;
+ struct bfq_weight_counter *weight_counter;
+
+ int on_st;
+
+ u64 finish;
+ u64 start;
+
+ struct rb_root *tree;
+
+ u64 min_start;
+
+ unsigned long service, budget;
+ unsigned short weight, new_weight;
+ unsigned short orig_weight;
+
+ struct bfq_entity *parent;
+
+ struct bfq_sched_data *my_sched_data;
+ struct bfq_sched_data *sched_data;
+
+ unsigned short ioprio, new_ioprio;
+ unsigned short ioprio_class, new_ioprio_class;
+
+ int ioprio_changed;
+};
+
+struct bfq_group;
+
+/**
+ * struct bfq_queue - leaf schedulable entity.
+ * @ref: reference counter.
+ * @bfqd: parent bfq_data.
+ * @new_bfqq: shared bfq_queue if queue is cooperating with
+ * one or more other queues.
+ * @pos_node: request-position tree member (see bfq_data's @rq_pos_tree).
+ * @pos_root: request-position tree root (see bfq_data's @rq_pos_tree).
+ * @sort_list: sorted list of pending requests.
+ * @next_rq: if fifo isn't expired, next request to serve.
+ * @queued: nr of requests queued in @sort_list.
+ * @allocated: currently allocated requests.
+ * @meta_pending: pending metadata requests.
+ * @fifo: fifo list of requests in sort_list.
+ * @entity: entity representing this queue in the scheduler.
+ * @max_budget: maximum budget allowed from the feedback mechanism.
+ * @budget_timeout: budget expiration (in jiffies).
+ * @dispatched: number of requests on the dispatch list or inside driver.
+ * @org_ioprio: saved ioprio during boosted periods.
+ * @flags: status flags.
+ * @bfqq_list: node for active/idle bfqq list inside our bfqd.
+ * @seek_samples: number of seeks sampled
+ * @seek_total: sum of the distances of the seeks sampled
+ * @seek_mean: mean seek distance
+ * @last_request_pos: position of the last request enqueued
+ * @pid: pid of the process owning the queue, used for logging purposes.
+ * @last_wr_start_finish: start time of the current weight-raising period if
+ * the @bfq-queue is being weight-raised, otherwise
+ * finish time of the last weight-raising period
+ * @wr_cur_max_time: current max raising time for this queue
+ * @soft_rt_next_start: minimum time instant such that, only if a new request
+ * is enqueued after this time instant in an idle
+ * @bfq_queue with no outstanding requests, then the
+ *                      task associated with the queue is deemed soft
+ * real-time (see the comments to the function
+ * bfq_bfqq_softrt_next_start())
+ * @last_idle_bklogged: time of the last transition of the @bfq_queue from
+ * idle to backlogged
+ * @service_from_backlogged: cumulative service received from the @bfq_queue
+ * since the last transition from idle to backlogged
+ * @bic: pointer to the bfq_io_cq owning the bfq_queue, set to %NULL if the
+ * queue is shared
+ *
+ * A bfq_queue is a leaf request queue; it can be associated with one or more
+ * io_contexts, if it is async or shared between cooperating processes. @cgroup
+ * holds a reference to the cgroup, to be sure that it does not disappear while
+ * a bfqq still references it (mostly to avoid races between request issuing and
+ * task migration followed by cgroup destruction).
+ * All the fields are protected by the queue lock of the containing bfqd.
+ */
+struct bfq_queue {
+ atomic_t ref;
+ struct bfq_data *bfqd;
+
+ /* fields for cooperating queues handling */
+ struct bfq_queue *new_bfqq;
+ struct rb_node pos_node;
+ struct rb_root *pos_root;
+
+ struct rb_root sort_list;
+ struct request *next_rq;
+ int queued[2];
+ int allocated[2];
+ int meta_pending;
+ struct list_head fifo;
+
+ struct bfq_entity entity;
+
+ unsigned long max_budget;
+ unsigned long budget_timeout;
+
+ int dispatched;
+
+ unsigned short org_ioprio;
+
+ unsigned int flags;
+
+ struct list_head bfqq_list;
+
+ unsigned int seek_samples;
+ u64 seek_total;
+ sector_t seek_mean;
+ sector_t last_request_pos;
+
+ pid_t pid;
+ struct bfq_io_cq *bic;
+
+ /* weight-raising fields */
+ unsigned long wr_cur_max_time;
+ unsigned long soft_rt_next_start;
+ unsigned long last_wr_start_finish;
+ unsigned int wr_coeff;
+ unsigned long last_idle_bklogged;
+ unsigned long service_from_backlogged;
+};
+
+/**
+ * struct bfq_ttime - per process thinktime stats.
+ * @ttime_total: total process thinktime
+ * @ttime_samples: number of thinktime samples
+ * @ttime_mean: average process thinktime
+ */
+struct bfq_ttime {
+ unsigned long last_end_request;
+
+ unsigned long ttime_total;
+ unsigned long ttime_samples;
+ unsigned long ttime_mean;
+};
+
+/**
+ * struct bfq_io_cq - per (request_queue, io_context) structure.
+ * @icq: associated io_cq structure
+ * @bfqq: array of two process queues, the sync and the async
+ * @ttime: associated @bfq_ttime struct
+ * @wr_time_left: snapshot of the time left before weight raising ends
+ * for the sync queue associated to this process; this
+ * snapshot is taken to remember this value while the weight
+ * raising is suspended because the queue is merged with a
+ *                 shared queue, and is used to set @wr_cur_max_time
+ * when the queue is split from the shared queue and its
+ * weight is raised again
+ * @saved_idle_window: same purpose as the previous field for the idle window
+ */
+struct bfq_io_cq {
+ struct io_cq icq; /* must be the first member */
+ struct bfq_queue *bfqq[2];
+ struct bfq_ttime ttime;
+ int ioprio;
+
+ unsigned int wr_time_left;
+ unsigned int saved_idle_window;
+};
+
+enum bfq_device_speed {
+ BFQ_BFQD_FAST,
+ BFQ_BFQD_SLOW,
+};
+
+/**
+ * struct bfq_data - per device data structure.
+ * @queue: request queue for the managed device.
+ * @root_group: root bfq_group for the device.
+ * @active_numerous_groups: number of bfq_groups containing more than one
+ * active @bfq_entity.
+ * @rq_pos_tree: rbtree sorted by next_request position,
+ * used when determining if two or more queues
+ * have interleaving requests (see bfq_close_cooperator).
+ * @queue_weights_tree: rbtree of weight counters of @bfq_queues, sorted by
+ * weight. Used to keep track of whether all @bfq_queues
+ * have the same weight. The tree contains one counter
+ * for each distinct weight associated to some active
+ * and not weight-raised @bfq_queue (see the comments to
+ * the functions bfq_weights_tree_[add|remove] for
+ * further details).
+ * @group_weights_tree: rbtree of non-queue @bfq_entity weight counters, sorted
+ * by weight. Used to keep track of whether all
+ * @bfq_groups have the same weight. The tree contains
+ * one counter for each distinct weight associated to
+ * some active @bfq_group (see the comments to the
+ * functions bfq_weights_tree_[add|remove] for further
+ * details).
+ * @busy_queues: number of bfq_queues containing requests (including the
+ * queue under service, even if it is idling).
+ * @busy_in_flight_queues: number of @bfq_queues containing pending or
+ * in-flight requests, plus the @bfq_queue in service,
+ * even if idle but waiting for the possible arrival
+ * of its next sync request. This field is updated only
+ * if the device is rotational, but used only if the
+ * device is also NCQ-capable. The reason why the field
+ * is updated also for non-NCQ-capable rotational
+ * devices is related to the fact that the value of
+ * hw_tag may be set also later than when this field may
+ * need to be incremented for the first time(s).
+ * Taking also this possibility into account, to avoid
+ * unbalanced increments/decrements, would imply more
+ * overhead than just updating this field regardless of
+ * the value of hw_tag.
+ * @const_seeky_busy_in_flight_queues: number of constantly-seeky @bfq_queues
+ * (that is, seeky queues that expired
+ * for budget timeout at least once)
+ * containing pending or in-flight
+ * requests, including the in-service
+ * @bfq_queue if constantly seeky. This
+ * field is updated only if the device
+ * is rotational, but used only if the
+ * device is also NCQ-capable (see the
+ * comments to @busy_in_flight_queues).
+ * @raised_busy_queues: number of weight-raised busy bfq_queues.
+ * @queued: number of queued requests.
+ * @rq_in_driver: number of requests dispatched and waiting for completion.
+ * @sync_flight: number of sync requests in the driver.
+ * @max_rq_in_driver: max number of reqs in driver in the last @hw_tag_samples
+ * completed requests.
+ * @hw_tag_samples: nr of samples used to calculate hw_tag.
+ * @hw_tag: flag set to one if the driver is showing a queueing behavior.
+ * @budgets_assigned: number of budgets assigned.
+ * @idle_slice_timer: timer set when idling for the next sequential request
+ * from the queue under service.
+ * @unplug_work: delayed work to restart dispatching on the request queue.
+ * @in_service_queue: bfq_queue under service.
+ * @in_service_bic: bfq_io_cq (bic) associated with the @in_service_queue.
+ * @last_position: on-disk position of the last served request.
+ * @last_budget_start: beginning of the last budget.
+ * @last_idling_start: beginning of the last idle slice.
+ * @peak_rate: peak transfer rate observed for a budget.
+ * @peak_rate_samples: number of samples used to calculate @peak_rate.
+ * @bfq_max_budget: maximum budget allotted to a bfq_queue before rescheduling.
+ * @group_list: list of all the bfq_groups active on the device.
+ * @active_list: list of all the bfq_queues active on the device.
+ * @idle_list: list of all the bfq_queues idle on the device.
+ * @bfq_quantum: max number of requests dispatched per dispatch round.
+ * @bfq_fifo_expire: timeout for async/sync requests; when it expires
+ * requests are served in fifo order.
+ * @bfq_back_penalty: weight of backward seeks wrt forward ones.
+ * @bfq_back_max: maximum allowed backward seek.
+ * @bfq_slice_idle: maximum idling time.
+ * @bfq_user_max_budget: user-configured max budget value (0 for auto-tuning).
+ * @bfq_max_budget_async_rq: maximum budget (in nr of requests) allotted to
+ * async queues.
+ * @bfq_timeout: timeout for bfq_queues to consume their budget; used to
+ *               prevent seeky queues from imposing long latencies on
+ *               well-behaved ones (this also implies that seeky queues cannot
+ * receive guarantees in the service domain; after a timeout
+ * they are charged for the whole allocated budget, to try
+ * to preserve a behavior reasonably fair among them, but
+ * without service-domain guarantees).
+ * @bfq_wr_coeff: Maximum factor by which the weight of a weight-raised
+ * queue is multiplied
+ * @bfq_wr_max_time: maximum duration of a weight-raising period (jiffies)
+ * @bfq_wr_rt_max_time: maximum duration for soft real-time processes
+ * @bfq_wr_min_idle_time: minimum idle period after which weight-raising
+ * may be reactivated for a queue (in jiffies)
+ * @bfq_wr_min_inter_arr_async: minimum period between request arrivals
+ * after which weight-raising may be
+ * reactivated for an already busy queue
+ * (in jiffies)
+ * @bfq_wr_max_softrt_rate: max service-rate for a soft real-time queue,
+ *                          in sectors per second
+ * @RT_prod: cached value of the product R*T used for computing the maximum
+ * duration of the weight raising automatically
+ * @device_speed: device speed class for the low-latency heuristic
+ * @oom_bfqq: fallback dummy bfqq for extreme OOM conditions
+ *
+ * All the fields are protected by the @queue lock.
+ */
+struct bfq_data {
+ struct request_queue *queue;
+
+ struct bfq_group *root_group;
+#ifdef CONFIG_CGROUP_BFQIO
+ int active_numerous_groups;
+#endif
+
+ struct rb_root rq_pos_tree;
+ struct rb_root queue_weights_tree;
+ struct rb_root group_weights_tree;
+
+ int busy_queues;
+ int busy_in_flight_queues;
+ int const_seeky_busy_in_flight_queues;
+ int raised_busy_queues;
+ int queued;
+ int rq_in_driver;
+ int sync_flight;
+
+ int max_rq_in_driver;
+ int hw_tag_samples;
+ int hw_tag;
+
+ int budgets_assigned;
+
+ struct timer_list idle_slice_timer;
+ struct work_struct unplug_work;
+
+ struct bfq_queue *in_service_queue;
+ struct bfq_io_cq *in_service_bic;
+
+ sector_t last_position;
+
+ ktime_t last_budget_start;
+ ktime_t last_idling_start;
+ int peak_rate_samples;
+ u64 peak_rate;
+ unsigned long bfq_max_budget;
+
+ struct hlist_head group_list;
+ struct list_head active_list;
+ struct list_head idle_list;
+
+ unsigned int bfq_quantum;
+ unsigned int bfq_fifo_expire[2];
+ unsigned int bfq_back_penalty;
+ unsigned int bfq_back_max;
+ unsigned int bfq_slice_idle;
+ u64 bfq_class_idle_last_service;
+
+ unsigned int bfq_user_max_budget;
+ unsigned int bfq_max_budget_async_rq;
+ unsigned int bfq_timeout[2];
+
+ bool low_latency;
+
+ /* parameters of the low_latency heuristics */
+ unsigned int bfq_wr_coeff;
+ unsigned int bfq_wr_max_time;
+ unsigned int bfq_wr_rt_max_time;
+ unsigned int bfq_wr_min_idle_time;
+ unsigned long bfq_wr_min_inter_arr_async;
+ unsigned int bfq_wr_max_softrt_rate;
+ u64 RT_prod;
+ enum bfq_device_speed device_speed;
+
+ struct bfq_queue oom_bfqq;
+};
+
+enum bfqq_state_flags {
+ BFQ_BFQQ_FLAG_busy = 0, /* has requests or is under service */
+ BFQ_BFQQ_FLAG_wait_request, /* waiting for a request */
+ BFQ_BFQQ_FLAG_must_alloc, /* must be allowed rq alloc */
+ BFQ_BFQQ_FLAG_fifo_expire, /* FIFO checked in this slice */
+ BFQ_BFQQ_FLAG_idle_window, /* slice idling enabled */
+ BFQ_BFQQ_FLAG_prio_changed, /* task priority has changed */
+ BFQ_BFQQ_FLAG_sync, /* synchronous queue */
+ BFQ_BFQQ_FLAG_budget_new, /* no completion with this budget */
+ BFQ_BFQQ_FLAG_constantly_seeky, /*
+ * bfqq has proved to be slow and seeky
+ * until budget timeout
+ */
+ BFQ_BFQQ_FLAG_coop, /* bfqq is shared */
+ BFQ_BFQQ_FLAG_split_coop, /* shared bfqq will be split */
+ BFQ_BFQQ_FLAG_just_split, /* queue has just been split */
+ BFQ_BFQQ_FLAG_softrt_update, /* may need softrt-next-start update */
+};
+
+#define BFQ_BFQQ_FNS(name) \
+static inline void bfq_mark_bfqq_##name(struct bfq_queue *bfqq) \
+{ \
+ (bfqq)->flags |= (1 << BFQ_BFQQ_FLAG_##name); \
+} \
+static inline void bfq_clear_bfqq_##name(struct bfq_queue *bfqq) \
+{ \
+ (bfqq)->flags &= ~(1 << BFQ_BFQQ_FLAG_##name); \
+} \
+static inline int bfq_bfqq_##name(const struct bfq_queue *bfqq) \
+{ \
+ return ((bfqq)->flags & (1 << BFQ_BFQQ_FLAG_##name)) != 0; \
+}
+
+BFQ_BFQQ_FNS(busy);
+BFQ_BFQQ_FNS(wait_request);
+BFQ_BFQQ_FNS(must_alloc);
+BFQ_BFQQ_FNS(fifo_expire);
+BFQ_BFQQ_FNS(idle_window);
+BFQ_BFQQ_FNS(prio_changed);
+BFQ_BFQQ_FNS(sync);
+BFQ_BFQQ_FNS(budget_new);
+BFQ_BFQQ_FNS(constantly_seeky);
+BFQ_BFQQ_FNS(coop);
+BFQ_BFQQ_FNS(split_coop);
+BFQ_BFQQ_FNS(just_split);
+BFQ_BFQQ_FNS(softrt_update);
+#undef BFQ_BFQQ_FNS
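
Each BFQ_BFQQ_FNS(name) line above therefore generates three inline helpers, bfq_mark_bfqq_name(), bfq_clear_bfqq_name() and bfq_bfqq_name(), all operating on one bit of bfq_queue->flags. The same pattern in a self-contained form (toy names, for illustration only):

    #include <assert.h>

    struct toy_queue { unsigned int flags; };

    enum { TOY_FLAG_busy = 0, TOY_FLAG_sync };

    #define TOY_FNS(name)                                             \
    static inline void toy_mark_##name(struct toy_queue *q)          \
    { q->flags |= (1 << TOY_FLAG_##name); }                           \
    static inline void toy_clear_##name(struct toy_queue *q)         \
    { q->flags &= ~(1 << TOY_FLAG_##name); }                          \
    static inline int toy_##name(const struct toy_queue *q)          \
    { return (q->flags & (1 << TOY_FLAG_##name)) != 0; }

    TOY_FNS(busy)
    TOY_FNS(sync)
    #undef TOY_FNS

    int main(void)
    {
            struct toy_queue q = { 0 };

            toy_mark_busy(&q);
            assert(toy_busy(&q) && !toy_sync(&q));
            toy_clear_busy(&q);
            assert(!toy_busy(&q));
            return 0;
    }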
+
+/* Logging facilities. */
+#define bfq_log_bfqq(bfqd, bfqq, fmt, args...) \
+ blk_add_trace_msg((bfqd)->queue, "bfq%d " fmt, (bfqq)->pid, ##args)
+
+#define bfq_log(bfqd, fmt, args...) \
+ blk_add_trace_msg((bfqd)->queue, "bfq " fmt, ##args)
+
+/* Expiration reasons. */
+enum bfqq_expiration {
+ BFQ_BFQQ_TOO_IDLE = 0, /* queue has been idling for too long */
+ BFQ_BFQQ_BUDGET_TIMEOUT, /* budget took too long to be used */
+ BFQ_BFQQ_BUDGET_EXHAUSTED, /* budget consumed */
+ BFQ_BFQQ_NO_MORE_REQUESTS, /* the queue has no more requests */
+};
+
+#ifdef CONFIG_CGROUP_BFQIO
+/**
+ * struct bfq_group - per (device, cgroup) data structure.
+ * @entity: schedulable entity to insert into the parent group sched_data.
+ * @sched_data: own sched_data, to contain child entities (they may be
+ * both bfq_queues and bfq_groups).
+ * @group_node: node to be inserted into the bfqio_cgroup->group_data
+ * list of the containing cgroup's bfqio_cgroup.
+ * @bfqd_node: node to be inserted into the @bfqd->group_list list
+ * of the groups active on the same device; used for cleanup.
+ * @bfqd: the bfq_data for the device this group acts upon.
+ * @async_bfqq: array of async queues for all the tasks belonging to
+ * the group, one queue per ioprio value per ioprio_class,
+ * except for the idle class that has only one queue.
+ * @async_idle_bfqq: async queue for the idle class (ioprio is ignored).
+ * @my_entity: pointer to @entity, %NULL for the toplevel group; used
+ * to avoid too many special cases during group creation/migration.
+ * @active_entities: number of active entities belonging to the group; unused
+ * for the root group. Used to know whether there are groups
+ * with more than one active @bfq_entity (see the comments
+ * to the function bfq_bfqq_must_not_expire()).
+ *
+ * Each (device, cgroup) pair has its own bfq_group, i.e., for each cgroup
+ * there is a set of bfq_groups, each one collecting the lower-level
+ * entities belonging to the group that are acting on the same device.
+ *
+ * Locking works as follows:
+ * o @group_node is protected by the bfqio_cgroup lock, and is accessed
+ * via RCU from its readers.
+ * o @bfqd is protected by the queue lock, RCU is used to access it
+ * from the readers.
+ * o All the other fields are protected by the @bfqd queue lock.
+ */
+struct bfq_group {
+ struct bfq_entity entity;
+ struct bfq_sched_data sched_data;
+
+ struct hlist_node group_node;
+ struct hlist_node bfqd_node;
+
+ void *bfqd;
+
+ struct bfq_queue *async_bfqq[2][IOPRIO_BE_NR];
+ struct bfq_queue *async_idle_bfqq;
+
+ struct bfq_entity *my_entity;
+
+ int active_entities;
+};
+
+/**
+ * struct bfqio_cgroup - bfq cgroup data structure.
+ * @css: subsystem state for bfq in the containing cgroup.
+ * @online: flag marked when the subsystem is inserted.
+ * @weight: cgroup weight.
+ * @ioprio: cgroup ioprio.
+ * @ioprio_class: cgroup ioprio_class.
+ * @lock: spinlock that protects @ioprio, @ioprio_class and @group_data.
+ * @group_data: list containing the bfq_group belonging to this cgroup.
+ *
+ * @group_data is accessed using RCU, with @lock protecting the updates,
+ * @ioprio and @ioprio_class are protected by @lock.
+ */
+struct bfqio_cgroup {
+ struct cgroup_subsys_state css;
+ bool online;
+
+ unsigned short weight, ioprio, ioprio_class;
+
+ spinlock_t lock;
+ struct hlist_head group_data;
+};
+#else
+struct bfq_group {
+ struct bfq_sched_data sched_data;
+
+ struct bfq_queue *async_bfqq[2][IOPRIO_BE_NR];
+ struct bfq_queue *async_idle_bfqq;
+};
+#endif
+
+static inline struct bfq_service_tree *
+bfq_entity_service_tree(struct bfq_entity *entity)
+{
+ struct bfq_sched_data *sched_data = entity->sched_data;
+ unsigned int idx = entity->ioprio_class - 1;
+
+ BUG_ON(idx >= BFQ_IOPRIO_CLASSES);
+ BUG_ON(sched_data == NULL);
+
+ return sched_data->service_tree + idx;
+}
+
+static inline struct bfq_queue *bic_to_bfqq(struct bfq_io_cq *bic,
+ int is_sync)
+{
+ return bic->bfqq[!!is_sync];
+}
+
+static inline void bic_set_bfqq(struct bfq_io_cq *bic,
+ struct bfq_queue *bfqq, int is_sync)
+{
+ bic->bfqq[!!is_sync] = bfqq;
+}
+
+static inline struct bfq_data *bic_to_bfqd(struct bfq_io_cq *bic)
+{
+ return bic->icq.q->elevator->elevator_data;
+}
+
+/**
+ * bfq_get_bfqd_locked - get a lock on a bfqd using an RCU-protected pointer.
+ * @ptr: a pointer to a bfqd.
+ * @flags: storage for the flags to be saved.
+ *
+ * This function allows bfqg->bfqd to be protected by the
+ * queue lock of the bfqd it references; the pointer is dereferenced
+ * under RCU, so the storage for bfqd is assured to be safe as long
+ * as the RCU read side critical section does not end. After the
+ * bfqd->queue->queue_lock is taken the pointer is rechecked, to be
+ * sure that no other writer accessed it. If we raced with a writer,
+ * the function returns NULL, with the queue unlocked, otherwise it
+ * returns the dereferenced pointer, with the queue locked.
+ */
+static inline struct bfq_data *bfq_get_bfqd_locked(void **ptr,
+ unsigned long *flags)
+{
+ struct bfq_data *bfqd;
+
+ rcu_read_lock();
+ bfqd = rcu_dereference(*(struct bfq_data **)ptr);
+
+ if (bfqd != NULL) {
+ spin_lock_irqsave(bfqd->queue->queue_lock, *flags);
+ if (*ptr == bfqd)
+ goto out;
+ spin_unlock_irqrestore(bfqd->queue->queue_lock, *flags);
+ }
+
+ bfqd = NULL;
+out:
+ rcu_read_unlock();
+ return bfqd;
+}
+
+static inline void bfq_put_bfqd_unlock(struct bfq_data *bfqd,
+ unsigned long *flags)
+{
+ spin_unlock_irqrestore(bfqd->queue->queue_lock, *flags);
+}
+
+static void bfq_changed_ioprio(struct bfq_io_cq *bic);
+static void bfq_put_queue(struct bfq_queue *bfqq);
+static void bfq_dispatch_insert(struct request_queue *q, struct request *rq);
+static struct bfq_queue *bfq_get_queue(struct bfq_data *bfqd,
+ struct bfq_group *bfqg, int is_sync,
+ struct bfq_io_cq *bic, gfp_t gfp_mask);
+static void bfq_end_wr_async_queues(struct bfq_data *bfqd,
+ struct bfq_group *bfqg);
+static void bfq_put_async_queues(struct bfq_data *bfqd, struct bfq_group *bfqg);
+static void bfq_exit_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq);
+#endif
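
A minimal sketch of the calling convention that bfq_get_bfqd_locked()/bfq_put_bfqd_unlock() above are designed for, assuming CONFIG_CGROUP_BFQIO; example_poke_group() is illustrative only and is not part of the patch:

/* Illustrative only: resolve a group's RCU-published bfqd back-pointer. */
static void example_poke_group(struct bfq_group *bfqg)
{
	unsigned long flags;
	struct bfq_data *bfqd;

	/* Re-validate the pointer under RCU and take the queue lock. */
	bfqd = bfq_get_bfqd_locked(&bfqg->bfqd, &flags);
	if (!bfqd)
		return; /* raced with a writer; the queue lock is not held */

	/* ... work on bfqd under bfqd->queue->queue_lock ... */

	bfq_put_bfqd_unlock(bfqd, &flags);
}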
diff --git a/block/blk-core.c b/block/blk-core.c
index a0e3096..f29de0f 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -47,6 +47,9 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(block_unplug);
DEFINE_IDA(blk_queue_ida);
+int trap_non_toi_io;
+EXPORT_SYMBOL_GPL(trap_non_toi_io);
+
/*
* For the allocated request tables
*/
@@ -1878,6 +1881,9 @@ void submit_bio(int rw, struct bio *bio)
{
bio->bi_rw |= rw;
+ if (unlikely(trap_non_toi_io))
+ BUG_ON(!(bio->bi_flags & BIO_TOI));
+
/*
* If it's a regular read/write or a barrier with data attached,
* go through the normal accounting stuff before submission.
diff --git a/block/genhd.c b/block/genhd.c
index 791f419..97985a4 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -17,6 +17,8 @@
#include <linux/kobj_map.h>
#include <linux/mutex.h>
#include <linux/idr.h>
+#include <linux/ctype.h>
+#include <linux/fs_uuid.h>
#include <linux/log2.h>
#include <linux/pm_runtime.h>
@@ -1375,6 +1377,87 @@ int invalidate_partition(struct gendisk *disk, int partno)
EXPORT_SYMBOL(invalidate_partition);
+dev_t blk_lookup_fs_info(struct fs_info *seek)
+{
+ dev_t devt = MKDEV(0, 0);
+ struct class_dev_iter iter;
+ struct device *dev;
+ int best_score = 0;
+
+ class_dev_iter_init(&iter, &block_class, NULL, &disk_type);
+ while (best_score < 3 && (dev = class_dev_iter_next(&iter))) {
+ struct gendisk *disk = dev_to_disk(dev);
+ struct disk_part_iter piter;
+ struct hd_struct *part;
+
+ disk_part_iter_init(&piter, disk, DISK_PITER_INCL_PART0);
+
+ while (best_score < 3 && (part = disk_part_iter_next(&piter))) {
+ int score = part_matches_fs_info(part, seek);
+ if (score > best_score) {
+ devt = part_devt(part);
+ best_score = score;
+ }
+ }
+ disk_part_iter_exit(&piter);
+ }
+ class_dev_iter_exit(&iter);
+ return devt;
+}
+EXPORT_SYMBOL_GPL(blk_lookup_fs_info);
+
+/* The caller passes NULL as 'last', plus a key, to start. For each match
+ * found, we return a bdev on which we have done blkdev_get; block devices
+ * that are passed back to us get their blkdev_put done here. When no more
+ * matches are found, we return NULL.
+ */
+struct block_device *next_bdev_of_type(struct block_device *last,
+ const char *key)
+{
+ dev_t devt = MKDEV(0, 0);
+ struct class_dev_iter iter;
+ struct device *dev;
+ struct block_device *next = NULL, *bdev;
+ int got_last = 0;
+
+ if (!key)
+ goto out;
+
+ class_dev_iter_init(&iter, &block_class, NULL, &disk_type);
+ while (!devt && (dev = class_dev_iter_next(&iter))) {
+ struct gendisk *disk = dev_to_disk(dev);
+ struct disk_part_iter piter;
+ struct hd_struct *part;
+
+ disk_part_iter_init(&piter, disk, DISK_PITER_INCL_PART0);
+
+ while ((part = disk_part_iter_next(&piter))) {
+ bdev = bdget(part_devt(part));
+ if (last && !got_last) {
+ if (last == bdev)
+ got_last = 1;
+ continue;
+ }
+
+ if (blkdev_get(bdev, FMODE_READ, 0))
+ continue;
+
+ if (bdev_matches_key(bdev, key)) {
+ next = bdev;
+ break;
+ }
+
+ blkdev_put(bdev, FMODE_READ);
+ }
+ disk_part_iter_exit(&piter);
+ }
+ class_dev_iter_exit(&iter);
+out:
+ if (last)
+ blkdev_put(last, FMODE_READ);
+ return next;
+}
+EXPORT_SYMBOL_GPL(next_bdev_of_type);
+
/*
* Disk events - monitor disk events like media change and eject request.
*/
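
The get/put contract described above next_bdev_of_type() is easiest to see from a caller's loop. A hedged sketch, assuming the "swap" key from the uuid_list in block/uuid.c below; probe_swap_devices() is illustrative only and is not part of the patch:

/*
 * Illustrative only. Start with last == NULL; each bdev returned has already
 * been opened with blkdev_get(), and passing it back in releases it, so an
 * explicit blkdev_put() is only needed when breaking out of the loop early.
 */
static void probe_swap_devices(void)
{
	struct block_device *bdev = NULL;

	while ((bdev = next_bdev_of_type(bdev, "swap"))) {
		/* ... inspect the opened device here ... */
		if (0 /* e.g. found what we were looking for */) {
			blkdev_put(bdev, FMODE_READ);
			break;
		}
	}
	/* On normal loop exit the final bdev has already been released. */
}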
diff --git a/block/uuid.c b/block/uuid.c
new file mode 100644
index 0000000..6ab3f05
--- /dev/null
+++ b/block/uuid.c
@@ -0,0 +1,511 @@
+#include <linux/blkdev.h>
+#include <linux/ctype.h>
+#include <linux/fs_uuid.h>
+#include <linux/slab.h>
+#include <linux/export.h>
+
+static int debug_enabled;
+
+#define PRINTK(fmt, args...) do { \
+ if (debug_enabled) \
+ printk(KERN_DEBUG fmt, ## args); \
+ } while(0)
+
+#define PRINT_HEX_DUMP(v1, v2, v3, v4, v5, v6, v7, v8) \
+ do { \
+ if (debug_enabled) \
+ print_hex_dump(v1, v2, v3, v4, v5, v6, v7, v8); \
+ } while(0)
+
+/*
+ * Simple UUID translation
+ */
+
+struct uuid_info {
+ const char *key;
+ const char *name;
+ long bkoff;
+ unsigned sboff;
+ unsigned sig_len;
+ const char *magic;
+ int uuid_offset;
+ int last_mount_offset;
+ int last_mount_size;
+};
+
+/*
+ * Based on libuuid's blkid_magic array. Note that I don't
+ * have uuid offsets for all of these yet - missing ones are 0x0.
+ * Further information welcome.
+ *
+ * Rearranged by page of fs signature for optimisation.
+ */
+static struct uuid_info uuid_list[] = {
+ { NULL, "oracleasm", 0, 32, 8, "ORCLDISK", 0x0, 0, 0 },
+ { "ntfs", "ntfs", 0, 3, 8, "NTFS ", 0x0, 0, 0 },
+ { "vfat", "vfat", 0, 0x52, 5, "MSWIN", 0x0, 0, 0 },
+ { "vfat", "vfat", 0, 0x52, 8, "FAT32 ", 0x0, 0, 0 },
+ { "vfat", "vfat", 0, 0x36, 5, "MSDOS", 0x0, 0, 0 },
+ { "vfat", "vfat", 0, 0x36, 8, "FAT16 ", 0x0, 0, 0 },
+ { "vfat", "vfat", 0, 0x36, 8, "FAT12 ", 0x0, 0, 0 },
+ { "vfat", "vfat", 0, 0, 1, "\353", 0x0, 0, 0 },
+ { "vfat", "vfat", 0, 0, 1, "\351", 0x0, 0, 0 },
+ { "vfat", "vfat", 0, 0x1fe, 2, "\125\252", 0x0, 0, 0 },
+ { "xfs", "xfs", 0, 0, 4, "XFSB", 0x20, 0, 0 },
+ { "romfs", "romfs", 0, 0, 8, "-rom1fs-", 0x0, 0, 0 },
+ { "bfs", "bfs", 0, 0, 4, "\316\372\173\033", 0, 0, 0 },
+ { "cramfs", "cramfs", 0, 0, 4, "E=\315\050", 0x0, 0, 0 },
+ { "qnx4", "qnx4", 0, 4, 6, "QNX4FS", 0, 0, 0 },
+ { NULL, "crypt_LUKS", 0, 0, 6, "LUKS\xba\xbe", 0x0, 0, 0 },
+ { "squashfs", "squashfs", 0, 0, 4, "sqsh", 0, 0, 0 },
+ { "squashfs", "squashfs", 0, 0, 4, "hsqs", 0, 0, 0 },
+ { "ocfs", "ocfs", 0, 8, 9, "OracleCFS", 0x0, 0, 0 },
+ { "lvm2pv", "lvm2pv", 0, 0x018, 8, "LVM2 001", 0x0, 0, 0 },
+ { "sysv", "sysv", 0, 0x3f8, 4, "\020~\030\375", 0, 0, 0 },
+ { "ext", "ext", 1, 0x38, 2, "\123\357", 0x468, 0x42c, 4 },
+ { "minix", "minix", 1, 0x10, 2, "\177\023", 0, 0, 0 },
+ { "minix", "minix", 1, 0x10, 2, "\217\023", 0, 0, 0 },
+ { "minix", "minix", 1, 0x10, 2, "\150\044", 0, 0, 0 },
+ { "minix", "minix", 1, 0x10, 2, "\170\044", 0, 0, 0 },
+ { "lvm2pv", "lvm2pv", 1, 0x018, 8, "LVM2 001", 0x0, 0, 0 },
+ { "vxfs", "vxfs", 1, 0, 4, "\365\374\001\245", 0, 0, 0 },
+ { "hfsplus", "hfsplus", 1, 0, 2, "BD", 0x0, 0, 0 },
+ { "hfsplus", "hfsplus", 1, 0, 2, "H+", 0x0, 0, 0 },
+ { "hfsplus", "hfsplus", 1, 0, 2, "HX", 0x0, 0, 0 },
+ { "hfs", "hfs", 1, 0, 2, "BD", 0x0, 0, 0 },
+ { "ocfs2", "ocfs2", 1, 0, 6, "OCFSV2", 0x0, 0, 0 },
+ { "lvm2pv", "lvm2pv", 0, 0x218, 8, "LVM2 001", 0x0, 0, 0 },
+ { "lvm2pv", "lvm2pv", 1, 0x218, 8, "LVM2 001", 0x0, 0, 0 },
+ { "ocfs2", "ocfs2", 2, 0, 6, "OCFSV2", 0x0, 0, 0 },
+ { "swap", "swap", 0, 0xff6, 10, "SWAP-SPACE", 0x40c, 0, 0 },
+ { "swap", "swap", 0, 0xff6, 10, "SWAPSPACE2", 0x40c, 0, 0 },
+ { "swap", "swsuspend", 0, 0xff6, 9, "S1SUSPEND", 0x40c, 0, 0 },
+ { "swap", "swsuspend", 0, 0xff6, 9, "S2SUSPEND", 0x40c, 0, 0 },
+ { "swap", "swsuspend", 0, 0xff6, 9, "ULSUSPEND", 0x40c, 0, 0 },
+ { "ocfs2", "ocfs2", 4, 0, 6, "OCFSV2", 0x0, 0, 0 },
+ { "ocfs2", "ocfs2", 8, 0, 6, "OCFSV2", 0x0, 0, 0 },
+ { "hpfs", "hpfs", 8, 0, 4, "I\350\225\371", 0, 0, 0 },
+ { "reiserfs", "reiserfs", 8, 0x34, 8, "ReIsErFs", 0x10054, 0, 0 },
+ { "reiserfs", "reiserfs", 8, 20, 8, "ReIsErFs", 0x10054, 0, 0 },
+ { "zfs", "zfs", 8, 0, 8, "\0\0\x02\xf5\xb0\x07\xb1\x0c", 0x0, 0, 0 },
+ { "zfs", "zfs", 8, 0, 8, "\x0c\xb1\x07\xb0\xf5\x02\0\0", 0x0, 0, 0 },
+ { "ufs", "ufs", 8, 0x55c, 4, "T\031\001\000", 0, 0, 0 },
+ { "swap", "swap", 0, 0x1ff6, 10, "SWAP-SPACE", 0x40c, 0, 0 },
+ { "swap", "swap", 0, 0x1ff6, 10, "SWAPSPACE2", 0x40c, 0, 0 },
+ { "swap", "swsuspend", 0, 0x1ff6, 9, "S1SUSPEND", 0x40c, 0, 0 },
+ { "swap", "swsuspend", 0, 0x1ff6, 9, "S2SUSPEND", 0x40c, 0, 0 },
+ { "swap", "swsuspend", 0, 0x1ff6, 9, "ULSUSPEND", 0x40c, 0, 0 },
+ { "reiserfs", "reiserfs", 64, 0x34, 9, "ReIsEr2Fs", 0x10054, 0, 0 },
+ { "reiserfs", "reiserfs", 64, 0x34, 9, "ReIsEr3Fs", 0x10054, 0, 0 },
+ { "reiserfs", "reiserfs", 64, 0x34, 8, "ReIsErFs", 0x10054, 0, 0 },
+ { "reiser4", "reiser4", 64, 0, 7, "ReIsEr4", 0x100544, 0, 0 },
+ { "gfs2", "gfs2", 64, 0, 4, "\x01\x16\x19\x70", 0x0, 0, 0 },
+ { "gfs", "gfs", 64, 0, 4, "\x01\x16\x19\x70", 0x0, 0, 0 },
+ { "btrfs", "btrfs", 64, 0x40, 8, "_BHRfS_M", 0x0, 0, 0 },
+ { "swap", "swap", 0, 0x3ff6, 10, "SWAP-SPACE", 0x40c, 0, 0 },
+ { "swap", "swap", 0, 0x3ff6, 10, "SWAPSPACE2", 0x40c, 0, 0 },
+ { "swap", "swsuspend", 0, 0x3ff6, 9, "S1SUSPEND", 0x40c, 0, 0 },
+ { "swap", "swsuspend", 0, 0x3ff6, 9, "S2SUSPEND", 0x40c, 0, 0 },
+ { "swap", "swsuspend", 0, 0x3ff6, 9, "ULSUSPEND", 0x40c, 0, 0 },
+ { "udf", "udf", 32, 1, 5, "BEA01", 0x0, 0, 0 },
+ { "udf", "udf", 32, 1, 5, "BOOT2", 0x0, 0, 0 },
+ { "udf", "udf", 32, 1, 5, "CD001", 0x0, 0, 0 },
+ { "udf", "udf", 32, 1, 5, "CDW02", 0x0, 0, 0 },
+ { "udf", "udf", 32, 1, 5, "NSR02", 0x0, 0, 0 },
+ { "udf", "udf", 32, 1, 5, "NSR03", 0x0, 0, 0 },
+ { "udf", "udf", 32, 1, 5, "TEA01", 0x0, 0, 0 },
+ { "iso9660", "iso9660", 32, 1, 5, "CD001", 0x0, 0, 0 },
+ { "iso9660", "iso9660", 32, 9, 5, "CDROM", 0x0, 0, 0 },
+ { "jfs", "jfs", 32, 0, 4, "JFS1", 0x88, 0, 0 },
+ { "swap", "swap", 0, 0x7ff6, 10, "SWAP-SPACE", 0x40c, 0, 0 },
+ { "swap", "swap", 0, 0x7ff6, 10, "SWAPSPACE2", 0x40c, 0, 0 },
+ { "swap", "swsuspend", 0, 0x7ff6, 9, "S1SUSPEND", 0x40c, 0, 0 },
+ { "swap", "swsuspend", 0, 0x7ff6, 9, "S2SUSPEND", 0x40c, 0, 0 },
+ { "swap", "swsuspend", 0, 0x7ff6, 9, "ULSUSPEND", 0x40c, 0, 0 },
+ { "swap", "swap", 0, 0xfff6, 10, "SWAP-SPACE", 0x40c, 0, 0 },
+ { "swap", "swap", 0, 0xfff6, 10, "SWAPSPACE2", 0x40c, 0, 0 },
+ { "swap", "swsuspend", 0, 0xfff6, 9, "S1SUSPEND", 0x40c, 0, 0 },
+ { "swap", "swsuspend", 0, 0xfff6, 9, "S2SUSPEND", 0x40c, 0, 0 },
+ { "swap", "swsuspend", 0, 0xfff6, 9, "ULSUSPEND", 0x40c, 0, 0 },
+ { "zfs", "zfs", 264, 0, 8, "\0\0\x02\xf5\xb0\x07\xb1\x0c", 0x0, 0, 0 },
+ { "zfs", "zfs", 264, 0, 8, "\x0c\xb1\x07\xb0\xf5\x02\0\0", 0x0, 0, 0 },
+ { NULL, NULL, 0, 0, 0, NULL, 0x0, 0, 0 }
+};
+
+static int null_uuid(const char *uuid)
+{
+ int i;
+
+ for (i = 0; i < 16 && !uuid[i]; i++);
+
+ return (i == 16);
+}
+
+
+static void uuid_end_bio(struct bio *bio, int err)
+{
+ struct page *page = bio->bi_io_vec[0].bv_page;
+
+ if (!test_bit(BIO_UPTODATE, &bio->bi_flags))
+ SetPageError(page);
+
+ unlock_page(page);
+ bio_put(bio);
+}
+
+
+/**
+ * read_bdev_page - read one page of data from a block device.
+ * @dev: The block device we're using.
+ * @page_num: The page we're reading.
+ *
+ * Based on Patrick Mochel's pmdisk code from long ago: "Straight from the
+ * textbook - allocate and initialize the bio. If we're writing, make sure
+ * the page is marked as dirty. Then submit it and carry on."
+ **/
+static struct page *read_bdev_page(struct block_device *dev, int page_num)
+{
+ struct bio *bio = NULL;
+ struct page *page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
+
+ if (!page) {
+ printk(KERN_ERR "Failed to allocate a page for reading data "
+ "in UUID checks.");
+ return NULL;
+ }
+
+ bio = bio_alloc(GFP_NOFS, 1);
+ bio->bi_bdev = dev;
+ bio->bi_iter.bi_sector = page_num << 3;
+ bio->bi_end_io = uuid_end_bio;
+ bio->bi_flags |= (1 << BIO_TOI);
+
+ PRINTK("Submitting bio on device %lx, page %d using bio %p and page %p.\n",
+ (unsigned long) dev->bd_dev, page_num, bio, page);
+
+ if (bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) {
+ printk(KERN_DEBUG "ERROR: adding page to bio at %d\n",
+ page_num);
+ bio_put(bio);
+ __free_page(page);
+ printk(KERN_DEBUG "read_bdev_page freed page %p (in error "
+ "path).\n", page);
+ return NULL;
+ }
+
+ lock_page(page);
+ submit_bio(READ | REQ_SYNC, bio);
+
+ wait_on_page_locked(page);
+ if (PageError(page)) {
+ __free_page(page);
+ page = NULL;
+ }
+ return page;
+}
+
+int bdev_matches_key(struct block_device *bdev, const char *key)
+{
+ unsigned char *data = NULL;
+ struct page *data_page = NULL;
+
+ int dev_offset, pg_num, pg_off, i;
+ int last_pg_num = -1;
+ int result = 0;
+ char buf[50];
+
+ if (null_uuid(key)) {
+ PRINTK("Refusing to find a NULL key.\n");
+ return 0;
+ }
+
+ if (!bdev->bd_disk) {
+ bdevname(bdev, buf);
+ PRINTK("bdev %s has no bd_disk.\n", buf);
+ return 0;
+ }
+
+ if (!bdev->bd_disk->queue) {
+ bdevname(bdev, buf);
+ PRINTK("bdev %s has no queue.\n", buf);
+ return 0;
+ }
+
+ for (i = 0; uuid_list[i].name; i++) {
+ struct uuid_info *dat = &uuid_list[i];
+
+ if (!dat->key || strcmp(dat->key, key))
+ continue;
+
+ dev_offset = (dat->bkoff << 10) + dat->sboff;
+ pg_num = dev_offset >> 12;
+ pg_off = dev_offset & 0xfff;
+
+ if ((((pg_num + 1) << 3) - 1) > bdev->bd_part->nr_sects >> 1)
+ continue;
+
+ if (pg_num != last_pg_num) {
+ if (data_page) {
+ kunmap(data_page);
+ __free_page(data_page);
+ }
+ data_page = read_bdev_page(bdev, pg_num);
+ if (!data_page)
+ continue;
+ data = kmap(data_page);
+ }
+
+ last_pg_num = pg_num;
+
+ if (strncmp(&data[pg_off], dat->magic, dat->sig_len))
+ continue;
+
+ result = 1;
+ break;
+ }
+
+ if (data_page) {
+ kunmap(data_page);
+ __free_page(data_page);
+ }
+
+ return result;
+}
+
+/*
+ * part_matches_fs_info - Does the given partition match the details given?
+ *
+ * Returns a score saying how good the match is.
+ * 0 = no UUID match.
+ * 1 = UUID matches but last mount time differs.
+ * 2 = UUID and last mount time match, but not dev_t.
+ * 3 = perfect match (UUID, last mount time and dev_t).
+ *
+ * This lets us cope elegantly with probing resulting in dev_ts changing
+ * from boot to boot, and with the case where a user copies a partition
+ * (UUIDs are not guaranteed to be unique), in which case we need to check
+ * the last mount time of the correct partition.
+ */
+int part_matches_fs_info(struct hd_struct *part, struct fs_info *seek)
+{
+ struct block_device *bdev;
+ struct fs_info *got;
+ int result = 0;
+ char buf[50];
+
+ if (null_uuid((char *) &seek->uuid)) {
+ PRINTK("Refusing to find a NULL uuid.\n");
+ return 0;
+ }
+
+ bdev = bdget(part_devt(part));
+
+ PRINTK("part_matches fs info considering %x.\n", part_devt(part));
+
+ if (blkdev_get(bdev, FMODE_READ, 0)) {
+ PRINTK("blkdev_get failed.\n");
+ return 0;
+ }
+
+ if (!bdev->bd_disk) {
+ bdevname(bdev, buf);
+ PRINTK("bdev %s has no bd_disk.\n", buf);
+ goto out;
+ }
+
+ if (!bdev->bd_disk->queue) {
+ bdevname(bdev, buf);
+ PRINTK("bdev %s has no queue.\n", buf);
+ goto out;
+ }
+
+ got = fs_info_from_block_dev(bdev);
+
+ if (got && !memcmp(got->uuid, seek->uuid, 16)) {
+ PRINTK(" Have matching UUID.\n");
+ PRINTK(" Got: LMS %d, LM %p.\n", got->last_mount_size, got->last_mount);
+ PRINTK(" Seek: LMS %d, LM %p.\n", seek->last_mount_size, seek->last_mount);
+ result = 1;
+
+ if (got->last_mount_size == seek->last_mount_size &&
+ got->last_mount && seek->last_mount &&
+ !memcmp(got->last_mount, seek->last_mount,
+ got->last_mount_size)) {
+ result = 2;
+
+ PRINTK(" Matching last mount time.\n");
+
+ if (part_devt(part) == seek->dev_t) {
+ result = 3;
+ PRINTK(" Matching dev_t.\n");
+ } else
+ PRINTK("Dev_ts differ (%x vs %x).\n", part_devt(part), seek->dev_t);
+ }
+ }
+
+ PRINTK(" Score for %x is %d.\n", part_devt(part), result);
+ free_fs_info(got);
+out:
+ blkdev_put(bdev, FMODE_READ);
+ return result;
+}
+
+void free_fs_info(struct fs_info *fs_info)
+{
+ if (!fs_info || IS_ERR(fs_info))
+ return;
+
+ if (fs_info->last_mount)
+ kfree(fs_info->last_mount);
+
+ kfree(fs_info);
+}
+EXPORT_SYMBOL_GPL(free_fs_info);
+
+struct fs_info *fs_info_from_block_dev(struct block_device *bdev)
+{
+ unsigned char *data = NULL;
+ struct page *data_page = NULL;
+
+ int dev_offset, pg_num, pg_off;
+ int uuid_pg_num, uuid_pg_off, i;
+ unsigned char *uuid_data = NULL;
+ struct page *uuid_data_page = NULL;
+
+ int last_pg_num = -1, last_uuid_pg_num = 0;
+ char buf[50];
+ struct fs_info *fs_info = NULL;
+
+ bdevname(bdev, buf);
+
+ PRINTK("uuid_from_block_dev looking for partition type of %s.\n", buf);
+
+ for (i = 0; uuid_list[i].name; i++) {
+ struct uuid_info *dat = &uuid_list[i];
+ dev_offset = (dat->bkoff << 10) + dat->sboff;
+ pg_num = dev_offset >> 12;
+ pg_off = dev_offset & 0xfff;
+ uuid_pg_num = dat->uuid_offset >> 12;
+ uuid_pg_off = dat->uuid_offset & 0xfff;
+
+ if ((((pg_num + 1) << 3) - 1) > bdev->bd_part->nr_sects >> 1)
+ continue;
+
+ /* Ignore partition types with no UUID offset */
+ if (!dat->uuid_offset)
+ continue;
+
+ if (pg_num != last_pg_num) {
+ if (data_page) {
+ kunmap(data_page);
+ __free_page(data_page);
+ }
+ data_page = read_bdev_page(bdev, pg_num);
+ if (!data_page)
+ continue;
+ data = kmap(data_page);
+ }
+
+ last_pg_num = pg_num;
+
+ if (strncmp(&data[pg_off], dat->magic, dat->sig_len))
+ continue;
+
+ PRINTK("This partition looks like %s.\n", dat->name);
+
+ fs_info = kzalloc(sizeof(struct fs_info), GFP_KERNEL);
+
+ if (!fs_info) {
+ PRINTK("Failed to allocate fs_info struct.");
+ fs_info = ERR_PTR(-ENOMEM);
+ break;
+ }
+
+ /* UUID can't be off the end of the disk */
+ if ((uuid_pg_num > bdev->bd_part->nr_sects >> 3) ||
+ !dat->uuid_offset)
+ goto no_uuid;
+
+ if (!uuid_data || uuid_pg_num != last_uuid_pg_num) {
+ /* No need to reread the page from above */
+ if (uuid_pg_num == pg_num && uuid_data)
+ memcpy(uuid_data, data, PAGE_SIZE);
+ else {
+ if (uuid_data_page) {
+ kunmap(uuid_data_page);
+ __free_page(uuid_data_page);
+ }
+ uuid_data_page = read_bdev_page(bdev, uuid_pg_num);
+ if (!uuid_data_page)
+ continue;
+ uuid_data = kmap(uuid_data_page);
+ }
+ }
+
+ last_uuid_pg_num = uuid_pg_num;
+ memcpy(&fs_info->uuid, &uuid_data[uuid_pg_off], 16);
+ fs_info->dev_t = bdev->bd_dev;
+
+no_uuid:
+ PRINT_HEX_DUMP(KERN_EMERG, "fs_info_from_block_dev "
+ "returning uuid ", DUMP_PREFIX_NONE, 16, 1,
+ fs_info->uuid, 16, 0);
+
+ if (dat->last_mount_size) {
+ int pg = dat->last_mount_offset >> 12, sz;
+ int off = dat->last_mount_offset & 0xfff;
+ struct page *last_mount = read_bdev_page(bdev, pg);
+ unsigned char *last_mount_data;
+ char *ptr;
+
+ if (!last_mount) {
+ fs_info = ERR_PTR(-ENOMEM);
+ break;
+ }
+ last_mount_data = kmap(last_mount);
+ sz = dat->last_mount_size;
+ ptr = kmalloc(sz, GFP_KERNEL);
+
+ if (!ptr) {
+ printk(KERN_EMERG "fs_info_from_block_dev "
+ "failed to get memory for last mount "
+ "timestamp.");
+ free_fs_info(fs_info);
+ fs_info = ERR_PTR(-ENOMEM);
+ } else {
+ fs_info->last_mount = ptr;
+ fs_info->last_mount_size = sz;
+ memcpy(ptr, &last_mount_data[off], sz);
+ }
+
+ kunmap(last_mount);
+ __free_page(last_mount);
+ }
+ break;
+ }
+
+ if (data_page) {
+ kunmap(data_page);
+ __free_page(data_page);
+ }
+
+ if (uuid_data_page) {
+ kunmap(uuid_data_page);
+ __free_page(uuid_data_page);
+ }
+
+ return fs_info;
+}
+EXPORT_SYMBOL_GPL(fs_info_from_block_dev);
+
+static int __init uuid_debug_setup(char *str)
+{
+ int value;
+
+ if (sscanf(str, "=%d", &value))
+ debug_enabled = value;
+
+ return 1;
+}
+
+__setup("uuid_debug", uuid_debug_setup);
diff --git a/configs/dell-vostro-3360.config b/configs/dell-vostro-3360.config
new file mode 100644
index 0000000..8841c7d
--- /dev/null
+++ b/configs/dell-vostro-3360.config
@@ -0,0 +1,4880 @@
+#
+# Automatically generated file; DO NOT EDIT.
+# Linux/x86 3.15.0-pf1 Kernel Configuration
+#
+CONFIG_64BIT=y
+CONFIG_X86_64=y
+CONFIG_X86=y
+CONFIG_INSTRUCTION_DECODER=y
+CONFIG_OUTPUT_FORMAT="elf64-x86-64"
+CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
+CONFIG_LOCKDEP_SUPPORT=y
+CONFIG_STACKTRACE_SUPPORT=y
+CONFIG_HAVE_LATENCYTOP_SUPPORT=y
+CONFIG_MMU=y
+CONFIG_NEED_DMA_MAP_STATE=y
+CONFIG_NEED_SG_DMA_LENGTH=y
+CONFIG_GENERIC_ISA_DMA=y
+CONFIG_GENERIC_BUG=y
+CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
+CONFIG_GENERIC_HWEIGHT=y
+CONFIG_ARCH_MAY_HAVE_PC_FDC=y
+CONFIG_RWSEM_XCHGADD_ALGORITHM=y
+CONFIG_GENERIC_CALIBRATE_DELAY=y
+CONFIG_ARCH_HAS_CPU_RELAX=y
+CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
+CONFIG_HAVE_SETUP_PER_CPU_AREA=y
+CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
+CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
+CONFIG_ARCH_HIBERNATION_POSSIBLE=y
+CONFIG_ARCH_SUSPEND_POSSIBLE=y
+CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
+CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
+CONFIG_ZONE_DMA32=y
+CONFIG_AUDIT_ARCH=y
+CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
+CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
+CONFIG_HAVE_INTEL_TXT=y
+CONFIG_X86_64_SMP=y
+CONFIG_X86_HT=y
+CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11"
+CONFIG_ARCH_SUPPORTS_UPROBES=y
+CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
+CONFIG_IRQ_WORK=y
+CONFIG_BUILDTIME_EXTABLE_SORT=y
+
+#
+# General setup
+#
+CONFIG_INIT_ENV_ARG_LIMIT=32
+CONFIG_CROSS_COMPILE=""
+# CONFIG_COMPILE_TEST is not set
+CONFIG_LOCALVERSION=""
+CONFIG_LOCALVERSION_AUTO=y
+CONFIG_HAVE_KERNEL_GZIP=y
+CONFIG_HAVE_KERNEL_BZIP2=y
+CONFIG_HAVE_KERNEL_LZMA=y
+CONFIG_HAVE_KERNEL_XZ=y
+CONFIG_HAVE_KERNEL_LZO=y
+CONFIG_HAVE_KERNEL_LZ4=y
+CONFIG_KERNEL_GZIP=y
+# CONFIG_KERNEL_BZIP2 is not set
+# CONFIG_KERNEL_LZMA is not set
+# CONFIG_KERNEL_XZ is not set
+# CONFIG_KERNEL_LZO is not set
+# CONFIG_KERNEL_LZ4 is not set
+CONFIG_DEFAULT_HOSTNAME="spock"
+CONFIG_SWAP=y
+CONFIG_SYSVIPC=y
+CONFIG_SYSVIPC_SYSCTL=y
+CONFIG_POSIX_MQUEUE=y
+CONFIG_POSIX_MQUEUE_SYSCTL=y
+CONFIG_FHANDLE=y
+CONFIG_USELIB=y
+CONFIG_AUDIT=y
+CONFIG_HAVE_ARCH_AUDITSYSCALL=y
+CONFIG_AUDITSYSCALL=y
+CONFIG_AUDIT_WATCH=y
+CONFIG_AUDIT_TREE=y
+
+#
+# IRQ subsystem
+#
+CONFIG_GENERIC_IRQ_PROBE=y
+CONFIG_GENERIC_IRQ_SHOW=y
+CONFIG_GENERIC_PENDING_IRQ=y
+CONFIG_IRQ_DOMAIN=y
+# CONFIG_IRQ_DOMAIN_DEBUG is not set
+CONFIG_IRQ_FORCED_THREADING=y
+CONFIG_SPARSE_IRQ=y
+CONFIG_CLOCKSOURCE_WATCHDOG=y
+CONFIG_ARCH_CLOCKSOURCE_DATA=y
+CONFIG_GENERIC_TIME_VSYSCALL=y
+CONFIG_GENERIC_CLOCKEVENTS=y
+CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
+CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
+CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
+CONFIG_GENERIC_CMOS_UPDATE=y
+
+#
+# Timers subsystem
+#
+CONFIG_TICK_ONESHOT=y
+CONFIG_NO_HZ_COMMON=y
+# CONFIG_HZ_PERIODIC is not set
+CONFIG_NO_HZ_IDLE=y
+# CONFIG_NO_HZ_FULL is not set
+CONFIG_NO_HZ=y
+CONFIG_HIGH_RES_TIMERS=y
+
+#
+# CPU/Task time and stats accounting
+#
+# CONFIG_TICK_CPU_ACCOUNTING is not set
+# CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
+CONFIG_IRQ_TIME_ACCOUNTING=y
+CONFIG_BSD_PROCESS_ACCT=y
+CONFIG_BSD_PROCESS_ACCT_V3=y
+CONFIG_TASKSTATS=y
+CONFIG_TASK_DELAY_ACCT=y
+CONFIG_TASK_XACCT=y
+CONFIG_TASK_IO_ACCOUNTING=y
+
+#
+# RCU Subsystem
+#
+CONFIG_TREE_PREEMPT_RCU=y
+CONFIG_PREEMPT_RCU=y
+CONFIG_RCU_STALL_COMMON=y
+# CONFIG_RCU_USER_QS is not set
+CONFIG_RCU_FANOUT=64
+CONFIG_RCU_FANOUT_LEAF=16
+# CONFIG_RCU_FANOUT_EXACT is not set
+CONFIG_RCU_FAST_NO_HZ=y
+# CONFIG_TREE_RCU_TRACE is not set
+CONFIG_RCU_BOOST=y
+CONFIG_RCU_BOOST_PRIO=1
+CONFIG_RCU_BOOST_DELAY=500
+# CONFIG_RCU_NOCB_CPU is not set
+CONFIG_IKCONFIG=y
+CONFIG_IKCONFIG_PROC=y
+CONFIG_LOG_BUF_SHIFT=19
+CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
+CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
+CONFIG_ARCH_SUPPORTS_INT128=y
+CONFIG_ARCH_WANTS_PROT_NUMA_PROT_NONE=y
+CONFIG_ARCH_USES_NUMA_PROT_NONE=y
+CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y
+CONFIG_NUMA_BALANCING=y
+CONFIG_CGROUPS=y
+# CONFIG_CGROUP_DEBUG is not set
+CONFIG_CGROUP_FREEZER=y
+CONFIG_CGROUP_DEVICE=y
+CONFIG_CPUSETS=y
+CONFIG_PROC_PID_CPUSET=y
+CONFIG_CGROUP_CPUACCT=y
+CONFIG_RESOURCE_COUNTERS=y
+CONFIG_MEMCG=y
+CONFIG_MEMCG_SWAP=y
+CONFIG_MEMCG_SWAP_ENABLED=y
+CONFIG_MEMCG_KMEM=y
+CONFIG_CGROUP_HUGETLB=y
+CONFIG_CGROUP_PERF=y
+CONFIG_CGROUP_SCHED=y
+CONFIG_FAIR_GROUP_SCHED=y
+CONFIG_CFS_BANDWIDTH=y
+CONFIG_RT_GROUP_SCHED=y
+CONFIG_BLK_CGROUP=y
+# CONFIG_DEBUG_BLK_CGROUP is not set
+CONFIG_CHECKPOINT_RESTORE=y
+CONFIG_NAMESPACES=y
+CONFIG_UTS_NS=y
+CONFIG_IPC_NS=y
+CONFIG_USER_NS=y
+CONFIG_PID_NS=y
+CONFIG_NET_NS=y
+CONFIG_SCHED_AUTOGROUP=y
+CONFIG_MM_OWNER=y
+# CONFIG_SYSFS_DEPRECATED is not set
+CONFIG_RELAY=y
+CONFIG_BLK_DEV_INITRD=y
+CONFIG_INITRAMFS_SOURCE=""
+CONFIG_RD_GZIP=y
+CONFIG_RD_BZIP2=y
+CONFIG_RD_LZMA=y
+CONFIG_RD_XZ=y
+CONFIG_RD_LZO=y
+CONFIG_RD_LZ4=y
+# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
+CONFIG_SYSCTL=y
+CONFIG_ANON_INODES=y
+CONFIG_HAVE_UID16=y
+CONFIG_SYSCTL_EXCEPTION_TRACE=y
+CONFIG_HAVE_PCSPKR_PLATFORM=y
+CONFIG_EXPERT=y
+CONFIG_UID16=y
+CONFIG_SYSFS_SYSCALL=y
+# CONFIG_SYSCTL_SYSCALL is not set
+CONFIG_KALLSYMS=y
+CONFIG_KALLSYMS_ALL=y
+CONFIG_PRINTK=y
+CONFIG_BUG=y
+CONFIG_ELF_CORE=y
+# CONFIG_PCSPKR_PLATFORM is not set
+CONFIG_BASE_FULL=y
+CONFIG_FUTEX=y
+CONFIG_EPOLL=y
+CONFIG_SIGNALFD=y
+CONFIG_TIMERFD=y
+CONFIG_EVENTFD=y
+CONFIG_SHMEM=y
+CONFIG_AIO=y
+CONFIG_PCI_QUIRKS=y
+# CONFIG_EMBEDDED is not set
+CONFIG_HAVE_PERF_EVENTS=y
+
+#
+# Kernel Performance Events And Counters
+#
+CONFIG_PERF_EVENTS=y
+# CONFIG_DEBUG_PERF_USE_VMALLOC is not set
+CONFIG_VM_EVENT_COUNTERS=y
+# CONFIG_SLUB_DEBUG is not set
+# CONFIG_COMPAT_BRK is not set
+# CONFIG_SLAB is not set
+CONFIG_SLUB=y
+# CONFIG_SLOB is not set
+CONFIG_SLUB_CPU_PARTIAL=y
+# CONFIG_SYSTEM_TRUSTED_KEYRING is not set
+# CONFIG_PROFILING is not set
+CONFIG_TRACEPOINTS=y
+CONFIG_HAVE_OPROFILE=y
+CONFIG_OPROFILE_NMI_TIMER=y
+# CONFIG_KPROBES is not set
+CONFIG_JUMP_LABEL=y
+CONFIG_UPROBES=y
+# CONFIG_HAVE_64BIT_ALIGNED_ACCESS is not set
+CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
+CONFIG_ARCH_USE_BUILTIN_BSWAP=y
+CONFIG_USER_RETURN_NOTIFIER=y
+CONFIG_HAVE_IOREMAP_PROT=y
+CONFIG_HAVE_KPROBES=y
+CONFIG_HAVE_KRETPROBES=y
+CONFIG_HAVE_OPTPROBES=y
+CONFIG_HAVE_KPROBES_ON_FTRACE=y
+CONFIG_HAVE_ARCH_TRACEHOOK=y
+CONFIG_HAVE_DMA_ATTRS=y
+CONFIG_GENERIC_SMP_IDLE_THREAD=y
+CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
+CONFIG_HAVE_CLK=y
+CONFIG_HAVE_DMA_API_DEBUG=y
+CONFIG_HAVE_HW_BREAKPOINT=y
+CONFIG_HAVE_MIXED_BREAKPOINTS_REGS=y
+CONFIG_HAVE_USER_RETURN_NOTIFIER=y
+CONFIG_HAVE_PERF_EVENTS_NMI=y
+CONFIG_HAVE_PERF_REGS=y
+CONFIG_HAVE_PERF_USER_STACK_DUMP=y
+CONFIG_HAVE_ARCH_JUMP_LABEL=y
+CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y
+CONFIG_HAVE_ALIGNED_STRUCT_PAGE=y
+CONFIG_HAVE_CMPXCHG_LOCAL=y
+CONFIG_HAVE_CMPXCHG_DOUBLE=y
+CONFIG_ARCH_WANT_COMPAT_IPC_PARSE_VERSION=y
+CONFIG_ARCH_WANT_OLD_COMPAT_IPC=y
+CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
+CONFIG_SECCOMP_FILTER=y
+CONFIG_HAVE_CC_STACKPROTECTOR=y
+# CONFIG_CC_STACKPROTECTOR is not set
+CONFIG_CC_STACKPROTECTOR_NONE=y
+# CONFIG_CC_STACKPROTECTOR_REGULAR is not set
+# CONFIG_CC_STACKPROTECTOR_STRONG is not set
+CONFIG_HAVE_CONTEXT_TRACKING=y
+CONFIG_HAVE_VIRT_CPU_ACCOUNTING_GEN=y
+CONFIG_HAVE_IRQ_TIME_ACCOUNTING=y
+CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
+CONFIG_HAVE_ARCH_SOFT_DIRTY=y
+CONFIG_MODULES_USE_ELF_RELA=y
+CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK=y
+CONFIG_OLD_SIGSUSPEND3=y
+CONFIG_COMPAT_OLD_SIGACTION=y
+
+#
+# GCOV-based kernel profiling
+#
+# CONFIG_GCOV_KERNEL is not set
+# CONFIG_HAVE_GENERIC_DMA_COHERENT is not set
+CONFIG_RT_MUTEXES=y
+CONFIG_BASE_SMALL=0
+CONFIG_MODULES=y
+CONFIG_MODULE_FORCE_LOAD=y
+CONFIG_MODULE_UNLOAD=y
+CONFIG_MODULE_FORCE_UNLOAD=y
+# CONFIG_MODVERSIONS is not set
+# CONFIG_MODULE_SRCVERSION_ALL is not set
+# CONFIG_MODULE_SIG is not set
+CONFIG_STOP_MACHINE=y
+CONFIG_BLOCK=y
+CONFIG_BLK_DEV_BSG=y
+CONFIG_BLK_DEV_BSGLIB=y
+CONFIG_BLK_DEV_INTEGRITY=y
+CONFIG_BLK_DEV_THROTTLING=y
+CONFIG_BLK_CMDLINE_PARSER=y
+
+#
+# Partition Types
+#
+CONFIG_PARTITION_ADVANCED=y
+# CONFIG_ACORN_PARTITION is not set
+# CONFIG_AIX_PARTITION is not set
+# CONFIG_OSF_PARTITION is not set
+# CONFIG_AMIGA_PARTITION is not set
+# CONFIG_ATARI_PARTITION is not set
+CONFIG_MAC_PARTITION=y
+CONFIG_MSDOS_PARTITION=y
+CONFIG_BSD_DISKLABEL=y
+# CONFIG_MINIX_SUBPARTITION is not set
+# CONFIG_SOLARIS_X86_PARTITION is not set
+# CONFIG_UNIXWARE_DISKLABEL is not set
+CONFIG_LDM_PARTITION=y
+# CONFIG_LDM_DEBUG is not set
+# CONFIG_SGI_PARTITION is not set
+# CONFIG_ULTRIX_PARTITION is not set
+# CONFIG_SUN_PARTITION is not set
+# CONFIG_KARMA_PARTITION is not set
+CONFIG_EFI_PARTITION=y
+# CONFIG_SYSV68_PARTITION is not set
+CONFIG_CMDLINE_PARTITION=y
+CONFIG_BLOCK_COMPAT=y
+
+#
+# IO Schedulers
+#
+CONFIG_IOSCHED_NOOP=y
+# CONFIG_IOSCHED_DEADLINE is not set
+# CONFIG_IOSCHED_CFQ is not set
+CONFIG_IOSCHED_BFQ=y
+CONFIG_CGROUP_BFQIO=y
+CONFIG_DEFAULT_BFQ=y
+# CONFIG_DEFAULT_NOOP is not set
+CONFIG_DEFAULT_IOSCHED="bfq"
+CONFIG_PREEMPT_NOTIFIERS=y
+CONFIG_PADATA=y
+CONFIG_ASN1=m
+CONFIG_UNINLINE_SPIN_UNLOCK=y
+CONFIG_MUTEX_SPIN_ON_OWNER=y
+CONFIG_FREEZER=y
+
+#
+# Processor type and features
+#
+CONFIG_ZONE_DMA=y
+CONFIG_SMP=y
+CONFIG_X86_X2APIC=y
+# CONFIG_X86_MPPARSE is not set
+# CONFIG_X86_EXTENDED_PLATFORM is not set
+CONFIG_X86_INTEL_LPSS=y
+CONFIG_X86_SUPPORTS_MEMORY_FAILURE=y
+CONFIG_SCHED_OMIT_FRAME_POINTER=y
+CONFIG_HYPERVISOR_GUEST=y
+CONFIG_PARAVIRT=y
+# CONFIG_PARAVIRT_DEBUG is not set
+CONFIG_PARAVIRT_SPINLOCKS=y
+CONFIG_XEN=y
+CONFIG_XEN_DOM0=y
+CONFIG_XEN_PVHVM=y
+CONFIG_XEN_MAX_DOMAIN_MEMORY=500
+CONFIG_XEN_SAVE_RESTORE=y
+# CONFIG_XEN_DEBUG_FS is not set
+CONFIG_XEN_PVH=y
+CONFIG_KVM_GUEST=y
+# CONFIG_KVM_DEBUG_FS is not set
+CONFIG_PARAVIRT_TIME_ACCOUNTING=y
+CONFIG_PARAVIRT_CLOCK=y
+CONFIG_NO_BOOTMEM=y
+CONFIG_MEMTEST=y
+# CONFIG_MK8 is not set
+# CONFIG_MPSC is not set
+# CONFIG_MCORE2 is not set
+# CONFIG_MATOM is not set
+CONFIG_GENERIC_CPU=y
+CONFIG_X86_INTERNODE_CACHE_SHIFT=6
+CONFIG_X86_L1_CACHE_SHIFT=6
+CONFIG_X86_TSC=y
+CONFIG_X86_CMPXCHG64=y
+CONFIG_X86_CMOV=y
+CONFIG_X86_MINIMUM_CPU_FAMILY=64
+CONFIG_X86_DEBUGCTLMSR=y
+CONFIG_PROCESSOR_SELECT=y
+CONFIG_CPU_SUP_INTEL=y
+# CONFIG_CPU_SUP_AMD is not set
+# CONFIG_CPU_SUP_CENTAUR is not set
+CONFIG_HPET_TIMER=y
+CONFIG_HPET_EMULATE_RTC=y
+CONFIG_DMI=y
+# CONFIG_CALGARY_IOMMU is not set
+CONFIG_SWIOTLB=y
+CONFIG_IOMMU_HELPER=y
+# CONFIG_MAXSMP is not set
+CONFIG_NR_CPUS=64
+CONFIG_SCHED_SMT=y
+CONFIG_SCHED_MC=y
+# CONFIG_PREEMPT_NONE is not set
+# CONFIG_PREEMPT_VOLUNTARY is not set
+CONFIG_PREEMPT=y
+CONFIG_PREEMPT_COUNT=y
+CONFIG_X86_LOCAL_APIC=y
+CONFIG_X86_IO_APIC=y
+CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS=y
+CONFIG_X86_MCE=y
+CONFIG_X86_MCE_INTEL=y
+# CONFIG_X86_MCE_AMD is not set
+CONFIG_X86_MCE_THRESHOLD=y
+# CONFIG_X86_MCE_INJECT is not set
+CONFIG_X86_THERMAL_VECTOR=y
+CONFIG_I8K=y
+CONFIG_MICROCODE=m
+CONFIG_MICROCODE_INTEL=y
+# CONFIG_MICROCODE_AMD is not set
+CONFIG_MICROCODE_OLD_INTERFACE=y
+# CONFIG_MICROCODE_INTEL_EARLY is not set
+# CONFIG_MICROCODE_AMD_EARLY is not set
+CONFIG_X86_MSR=m
+CONFIG_X86_CPUID=m
+CONFIG_ARCH_PHYS_ADDR_T_64BIT=y
+CONFIG_ARCH_DMA_ADDR_T_64BIT=y
+CONFIG_DIRECT_GBPAGES=y
+CONFIG_NUMA=y
+# CONFIG_AMD_NUMA is not set
+CONFIG_X86_64_ACPI_NUMA=y
+CONFIG_NODES_SPAN_OTHER_NODES=y
+# CONFIG_NUMA_EMU is not set
+CONFIG_NODES_SHIFT=6
+CONFIG_ARCH_SPARSEMEM_ENABLE=y
+CONFIG_ARCH_SPARSEMEM_DEFAULT=y
+CONFIG_ARCH_SELECT_MEMORY_MODEL=y
+CONFIG_ARCH_PROC_KCORE_TEXT=y
+CONFIG_ILLEGAL_POINTER_VALUE=0xdead000000000000
+CONFIG_SELECT_MEMORY_MODEL=y
+CONFIG_SPARSEMEM_MANUAL=y
+CONFIG_SPARSEMEM=y
+CONFIG_NEED_MULTIPLE_NODES=y
+CONFIG_HAVE_MEMORY_PRESENT=y
+CONFIG_SPARSEMEM_EXTREME=y
+CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
+CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER=y
+CONFIG_SPARSEMEM_VMEMMAP=y
+CONFIG_HAVE_MEMBLOCK=y
+CONFIG_HAVE_MEMBLOCK_NODE_MAP=y
+CONFIG_ARCH_DISCARD_MEMBLOCK=y
+CONFIG_MEMORY_ISOLATION=y
+# CONFIG_MOVABLE_NODE is not set
+# CONFIG_HAVE_BOOTMEM_INFO_NODE is not set
+# CONFIG_MEMORY_HOTPLUG is not set
+CONFIG_PAGEFLAGS_EXTENDED=y
+CONFIG_SPLIT_PTLOCK_CPUS=4
+CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK=y
+CONFIG_BALLOON_COMPACTION=y
+CONFIG_COMPACTION=y
+CONFIG_MIGRATION=y
+CONFIG_PHYS_ADDR_T_64BIT=y
+CONFIG_ZONE_DMA_FLAG=1
+CONFIG_BOUNCE=y
+CONFIG_VIRT_TO_BUS=y
+CONFIG_MMU_NOTIFIER=y
+CONFIG_KSM=y
+CONFIG_UKSM=y
+# CONFIG_KSM_LEGACY is not set
+CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
+CONFIG_ARCH_SUPPORTS_MEMORY_FAILURE=y
+# CONFIG_MEMORY_FAILURE is not set
+CONFIG_TRANSPARENT_HUGEPAGE=y
+# CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS is not set
+CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y
+CONFIG_CROSS_MEMORY_ATTACH=y
+CONFIG_CLEANCACHE=y
+CONFIG_FRONTSWAP=y
+CONFIG_CMA=y
+# CONFIG_CMA_DEBUG is not set
+CONFIG_ZBUD=y
+CONFIG_ZSWAP=y
+CONFIG_MEM_SOFT_DIRTY=y
+CONFIG_ZSMALLOC=y
+CONFIG_PGTABLE_MAPPING=y
+CONFIG_GENERIC_EARLY_IOREMAP=y
+CONFIG_X86_CHECK_BIOS_CORRUPTION=y
+CONFIG_X86_BOOTPARAM_MEMORY_CORRUPTION_CHECK=y
+CONFIG_X86_RESERVE_LOW=64
+CONFIG_MTRR=y
+CONFIG_MTRR_SANITIZER=y
+CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT=0
+CONFIG_MTRR_SANITIZER_SPARE_REG_NR_DEFAULT=1
+CONFIG_X86_PAT=y
+CONFIG_ARCH_USES_PG_UNCACHED=y
+CONFIG_ARCH_RANDOM=y
+CONFIG_X86_SMAP=y
+CONFIG_EFI=y
+CONFIG_EFI_STUB=y
+# CONFIG_EFI_MIXED is not set
+CONFIG_SECCOMP=y
+CONFIG_HZ_100=y
+# CONFIG_HZ_250 is not set
+# CONFIG_HZ_300 is not set
+# CONFIG_HZ_1000 is not set
+CONFIG_HZ=100
+CONFIG_SCHED_HRTICK=y
+CONFIG_KEXEC=y
+# CONFIG_CRASH_DUMP is not set
+CONFIG_KEXEC_JUMP=y
+CONFIG_PHYSICAL_START=0x1000000
+CONFIG_RELOCATABLE=y
+CONFIG_PHYSICAL_ALIGN=0x1000000
+CONFIG_HOTPLUG_CPU=y
+# CONFIG_BOOTPARAM_HOTPLUG_CPU0 is not set
+# CONFIG_DEBUG_HOTPLUG_CPU0 is not set
+# CONFIG_COMPAT_VDSO is not set
+# CONFIG_CMDLINE_BOOL is not set
+CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
+CONFIG_USE_PERCPU_NUMA_NODE_ID=y
+
+#
+# Power management and ACPI options
+#
+CONFIG_ARCH_HIBERNATION_HEADER=y
+CONFIG_SUSPEND=y
+CONFIG_SUSPEND_FREEZER=y
+CONFIG_HIBERNATE_CALLBACKS=y
+CONFIG_HIBERNATION=y
+CONFIG_PM_STD_PARTITION="/dev/sdb1"
+CONFIG_TOI_CORE=y
+
+#
+# Image Storage (you need at least one allocator)
+#
+CONFIG_TOI_FILE=y
+CONFIG_TOI_SWAP=y
+
+#
+# General Options
+#
+CONFIG_TOI_CRYPTO=y
+CONFIG_TOI_USERUI=y
+CONFIG_TOI_USERUI_DEFAULT_PATH="/usr/sbin/tuxoniceui"
+CONFIG_TOI_DEFAULT_IMAGE_SIZE_LIMIT=-2
+# CONFIG_TOI_KEEP_IMAGE is not set
+# CONFIG_TOI_INCREMENTAL is not set
+
+#
+# No incremental image support available without Keep Image support.
+#
+# CONFIG_TOI_REPLACE_SWSUSP is not set
+# CONFIG_TOI_IGNORE_LATE_INITCALL is not set
+CONFIG_TOI_DEFAULT_WAIT=25
+CONFIG_TOI_DEFAULT_EXTRA_PAGES_ALLOWANCE=2000
+CONFIG_TOI_CHECKSUM=y
+CONFIG_TOI=y
+CONFIG_TOI_ZRAM_SUPPORT=y
+CONFIG_PM_SLEEP=y
+CONFIG_PM_SLEEP_SMP=y
+# CONFIG_PM_AUTOSLEEP is not set
+# CONFIG_PM_WAKELOCKS is not set
+CONFIG_PM_RUNTIME=y
+CONFIG_PM=y
+# CONFIG_PM_DEBUG is not set
+CONFIG_PM_CLK=y
+CONFIG_WQ_POWER_EFFICIENT_DEFAULT=y
+CONFIG_ACPI=y
+CONFIG_ACPI_SLEEP=y
+# CONFIG_ACPI_PROCFS_POWER is not set
+CONFIG_ACPI_EC_DEBUGFS=m
+CONFIG_ACPI_AC=y
+CONFIG_ACPI_BATTERY=y
+CONFIG_ACPI_BUTTON=y
+CONFIG_ACPI_VIDEO=y
+CONFIG_ACPI_FAN=y
+CONFIG_ACPI_DOCK=y
+CONFIG_ACPI_PROCESSOR=y
+CONFIG_ACPI_HOTPLUG_CPU=y
+CONFIG_ACPI_PROCESSOR_AGGREGATOR=m
+CONFIG_ACPI_THERMAL=y
+CONFIG_ACPI_NUMA=y
+# CONFIG_ACPI_CUSTOM_DSDT is not set
+CONFIG_ACPI_INITRD_TABLE_OVERRIDE=y
+# CONFIG_ACPI_DEBUG is not set
+CONFIG_ACPI_PCI_SLOT=y
+CONFIG_X86_PM_TIMER=y
+CONFIG_ACPI_CONTAINER=y
+CONFIG_ACPI_SBS=m
+CONFIG_ACPI_HED=y
+CONFIG_ACPI_CUSTOM_METHOD=m
+# CONFIG_ACPI_BGRT is not set
+# CONFIG_ACPI_REDUCED_HARDWARE_ONLY is not set
+CONFIG_ACPI_APEI=y
+CONFIG_ACPI_APEI_GHES=y
+CONFIG_ACPI_APEI_PCIEAER=y
+# CONFIG_ACPI_APEI_EINJ is not set
+# CONFIG_ACPI_APEI_ERST_DEBUG is not set
+CONFIG_ACPI_EXTLOG=m
+# CONFIG_SFI is not set
+
+#
+# CPU Frequency scaling
+#
+CONFIG_CPU_FREQ=y
+CONFIG_CPU_FREQ_GOV_COMMON=y
+CONFIG_CPU_FREQ_STAT=y
+CONFIG_CPU_FREQ_STAT_DETAILS=y
+# CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
+# CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
+# CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
+# CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND is not set
+CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE=y
+CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
+CONFIG_CPU_FREQ_GOV_POWERSAVE=m
+CONFIG_CPU_FREQ_GOV_USERSPACE=m
+CONFIG_CPU_FREQ_GOV_ONDEMAND=m
+CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
+
+#
+# x86 CPU frequency scaling drivers
+#
+CONFIG_X86_INTEL_PSTATE=y
+CONFIG_X86_PCC_CPUFREQ=m
+CONFIG_X86_ACPI_CPUFREQ=y
+# CONFIG_X86_POWERNOW_K8 is not set
+# CONFIG_X86_SPEEDSTEP_CENTRINO is not set
+# CONFIG_X86_P4_CLOCKMOD is not set
+
+#
+# shared options
+#
+# CONFIG_X86_SPEEDSTEP_LIB is not set
+
+#
+# CPU Idle
+#
+CONFIG_CPU_IDLE=y
+# CONFIG_CPU_IDLE_MULTIPLE_DRIVERS is not set
+CONFIG_CPU_IDLE_GOV_LADDER=y
+CONFIG_CPU_IDLE_GOV_MENU=y
+# CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED is not set
+CONFIG_INTEL_IDLE=y
+
+#
+# Memory power savings
+#
+CONFIG_I7300_IDLE_IOAT_CHANNEL=y
+CONFIG_I7300_IDLE=m
+
+#
+# Bus options (PCI etc.)
+#
+CONFIG_PCI=y
+CONFIG_PCI_DIRECT=y
+CONFIG_PCI_MMCONFIG=y
+CONFIG_PCI_XEN=y
+CONFIG_PCI_DOMAINS=y
+# CONFIG_PCI_CNB20LE_QUIRK is not set
+CONFIG_PCIEPORTBUS=y
+CONFIG_HOTPLUG_PCI_PCIE=y
+CONFIG_PCIEAER=y
+# CONFIG_PCIE_ECRC is not set
+# CONFIG_PCIEAER_INJECT is not set
+CONFIG_PCIEASPM=y
+# CONFIG_PCIEASPM_DEBUG is not set
+CONFIG_PCIEASPM_DEFAULT=y
+# CONFIG_PCIEASPM_POWERSAVE is not set
+# CONFIG_PCIEASPM_PERFORMANCE is not set
+CONFIG_PCIE_PME=y
+CONFIG_PCI_MSI=y
+# CONFIG_PCI_DEBUG is not set
+CONFIG_PCI_REALLOC_ENABLE_AUTO=y
+CONFIG_PCI_STUB=m
+CONFIG_XEN_PCIDEV_FRONTEND=m
+CONFIG_HT_IRQ=y
+CONFIG_PCI_ATS=y
+CONFIG_PCI_IOV=y
+CONFIG_PCI_PRI=y
+CONFIG_PCI_PASID=y
+CONFIG_PCI_IOAPIC=y
+CONFIG_PCI_LABEL=y
+
+#
+# PCI host controller drivers
+#
+CONFIG_ISA_DMA_API=y
+# CONFIG_PCCARD is not set
+CONFIG_HOTPLUG_PCI=y
+CONFIG_HOTPLUG_PCI_ACPI=y
+CONFIG_HOTPLUG_PCI_ACPI_IBM=m
+# CONFIG_HOTPLUG_PCI_CPCI is not set
+# CONFIG_HOTPLUG_PCI_SHPC is not set
+CONFIG_RAPIDIO=y
+# CONFIG_RAPIDIO_TSI721 is not set
+CONFIG_RAPIDIO_DISC_TIMEOUT=30
+CONFIG_RAPIDIO_ENABLE_RX_TX_PORTS=y
+CONFIG_RAPIDIO_DMA_ENGINE=y
+# CONFIG_RAPIDIO_DEBUG is not set
+CONFIG_RAPIDIO_ENUM_BASIC=m
+
+#
+# RapidIO Switch drivers
+#
+# CONFIG_RAPIDIO_TSI57X is not set
+# CONFIG_RAPIDIO_CPS_XX is not set
+# CONFIG_RAPIDIO_TSI568 is not set
+# CONFIG_RAPIDIO_CPS_GEN2 is not set
+# CONFIG_X86_SYSFB is not set
+
+#
+# Executable file formats / Emulations
+#
+CONFIG_BINFMT_ELF=y
+CONFIG_COMPAT_BINFMT_ELF=y
+CONFIG_ARCH_BINFMT_ELF_RANDOMIZE_PIE=y
+# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
+CONFIG_BINFMT_SCRIPT=y
+# CONFIG_HAVE_AOUT is not set
+CONFIG_BINFMT_MISC=y
+CONFIG_COREDUMP=y
+CONFIG_IA32_EMULATION=y
+CONFIG_IA32_AOUT=m
+# CONFIG_X86_X32 is not set
+CONFIG_COMPAT=y
+CONFIG_COMPAT_FOR_U64_ALIGNMENT=y
+CONFIG_SYSVIPC_COMPAT=y
+CONFIG_KEYS_COMPAT=y
+CONFIG_X86_DEV_DMA_OPS=y
+CONFIG_NET=y
+CONFIG_COMPAT_NETLINK_MESSAGES=y
+
+#
+# Networking options
+#
+CONFIG_PACKET=y
+CONFIG_PACKET_DIAG=m
+CONFIG_UNIX=y
+CONFIG_UNIX_DIAG=m
+CONFIG_XFRM=y
+CONFIG_XFRM_ALGO=m
+CONFIG_XFRM_USER=m
+# CONFIG_XFRM_SUB_POLICY is not set
+# CONFIG_XFRM_MIGRATE is not set
+# CONFIG_XFRM_STATISTICS is not set
+CONFIG_XFRM_IPCOMP=m
+CONFIG_NET_KEY=m
+# CONFIG_NET_KEY_MIGRATE is not set
+CONFIG_INET=y
+CONFIG_IP_MULTICAST=y
+CONFIG_IP_ADVANCED_ROUTER=y
+CONFIG_IP_FIB_TRIE_STATS=y
+CONFIG_IP_MULTIPLE_TABLES=y
+CONFIG_IP_ROUTE_MULTIPATH=y
+CONFIG_IP_ROUTE_VERBOSE=y
+CONFIG_IP_ROUTE_CLASSID=y
+# CONFIG_IP_PNP is not set
+CONFIG_NET_IPIP=m
+CONFIG_NET_IPGRE_DEMUX=m
+CONFIG_NET_IP_TUNNEL=m
+CONFIG_NET_IPGRE=m
+CONFIG_NET_IPGRE_BROADCAST=y
+CONFIG_IP_MROUTE=y
+CONFIG_IP_MROUTE_MULTIPLE_TABLES=y
+CONFIG_IP_PIMSM_V1=y
+CONFIG_IP_PIMSM_V2=y
+CONFIG_SYN_COOKIES=y
+CONFIG_NET_IPVTI=m
+CONFIG_INET_AH=m
+CONFIG_INET_ESP=m
+CONFIG_INET_IPCOMP=m
+CONFIG_INET_XFRM_TUNNEL=m
+CONFIG_INET_TUNNEL=m
+CONFIG_INET_XFRM_MODE_TRANSPORT=m
+CONFIG_INET_XFRM_MODE_TUNNEL=m
+CONFIG_INET_XFRM_MODE_BEET=m
+CONFIG_INET_LRO=y
+CONFIG_INET_DIAG=y
+CONFIG_INET_TCP_DIAG=y
+CONFIG_INET_UDP_DIAG=m
+CONFIG_TCP_CONG_ADVANCED=y
+CONFIG_TCP_CONG_BIC=m
+CONFIG_TCP_CONG_CUBIC=y
+CONFIG_TCP_CONG_WESTWOOD=m
+CONFIG_TCP_CONG_HTCP=m
+CONFIG_TCP_CONG_HSTCP=m
+CONFIG_TCP_CONG_HYBLA=m
+CONFIG_TCP_CONG_VEGAS=m
+CONFIG_TCP_CONG_SCALABLE=m
+CONFIG_TCP_CONG_LP=m
+CONFIG_TCP_CONG_VENO=m
+CONFIG_TCP_CONG_YEAH=m
+CONFIG_TCP_CONG_ILLINOIS=m
+CONFIG_DEFAULT_CUBIC=y
+# CONFIG_DEFAULT_RENO is not set
+CONFIG_DEFAULT_TCP_CONG="cubic"
+# CONFIG_TCP_MD5SIG is not set
+CONFIG_IPV6=y
+CONFIG_IPV6_ROUTER_PREF=y
+CONFIG_IPV6_ROUTE_INFO=y
+CONFIG_IPV6_OPTIMISTIC_DAD=y
+CONFIG_INET6_AH=m
+CONFIG_INET6_ESP=m
+CONFIG_INET6_IPCOMP=m
+CONFIG_IPV6_MIP6=m
+CONFIG_INET6_XFRM_TUNNEL=m
+CONFIG_INET6_TUNNEL=m
+CONFIG_INET6_XFRM_MODE_TRANSPORT=m
+CONFIG_INET6_XFRM_MODE_TUNNEL=m
+CONFIG_INET6_XFRM_MODE_BEET=m
+CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION=m
+CONFIG_IPV6_VTI=m
+CONFIG_IPV6_SIT=m
+CONFIG_IPV6_SIT_6RD=y
+CONFIG_IPV6_NDISC_NODETYPE=y
+CONFIG_IPV6_TUNNEL=m
+CONFIG_IPV6_GRE=m
+CONFIG_IPV6_MULTIPLE_TABLES=y
+CONFIG_IPV6_SUBTREES=y
+CONFIG_IPV6_MROUTE=y
+CONFIG_IPV6_MROUTE_MULTIPLE_TABLES=y
+CONFIG_IPV6_PIMSM_V2=y
+# CONFIG_NETLABEL is not set
+CONFIG_NETWORK_SECMARK=y
+CONFIG_NET_PTP_CLASSIFY=y
+CONFIG_NETWORK_PHY_TIMESTAMPING=y
+CONFIG_NETFILTER=y
+# CONFIG_NETFILTER_DEBUG is not set
+CONFIG_NETFILTER_ADVANCED=y
+CONFIG_BRIDGE_NETFILTER=y
+
+#
+# Core Netfilter Configuration
+#
+CONFIG_NETFILTER_NETLINK=m
+CONFIG_NETFILTER_NETLINK_ACCT=m
+CONFIG_NETFILTER_NETLINK_QUEUE=m
+CONFIG_NETFILTER_NETLINK_LOG=m
+CONFIG_NF_CONNTRACK=m
+CONFIG_NF_CONNTRACK_MARK=y
+CONFIG_NF_CONNTRACK_SECMARK=y
+CONFIG_NF_CONNTRACK_ZONES=y
+CONFIG_NF_CONNTRACK_PROCFS=y
+CONFIG_NF_CONNTRACK_EVENTS=y
+CONFIG_NF_CONNTRACK_TIMEOUT=y
+CONFIG_NF_CONNTRACK_TIMESTAMP=y
+CONFIG_NF_CONNTRACK_LABELS=y
+CONFIG_NF_CT_PROTO_DCCP=m
+CONFIG_NF_CT_PROTO_GRE=m
+CONFIG_NF_CT_PROTO_SCTP=m
+CONFIG_NF_CT_PROTO_UDPLITE=m
+CONFIG_NF_CONNTRACK_AMANDA=m
+CONFIG_NF_CONNTRACK_FTP=m
+CONFIG_NF_CONNTRACK_H323=m
+CONFIG_NF_CONNTRACK_IRC=m
+CONFIG_NF_CONNTRACK_BROADCAST=m
+CONFIG_NF_CONNTRACK_NETBIOS_NS=m
+CONFIG_NF_CONNTRACK_SNMP=m
+CONFIG_NF_CONNTRACK_PPTP=m
+CONFIG_NF_CONNTRACK_SANE=m
+CONFIG_NF_CONNTRACK_SIP=m
+CONFIG_NF_CONNTRACK_TFTP=m
+CONFIG_NF_CT_NETLINK=m
+CONFIG_NF_CT_NETLINK_TIMEOUT=m
+CONFIG_NF_CT_NETLINK_HELPER=m
+CONFIG_NETFILTER_NETLINK_QUEUE_CT=y
+CONFIG_NF_NAT=m
+CONFIG_NF_NAT_NEEDED=y
+CONFIG_NF_NAT_PROTO_DCCP=m
+CONFIG_NF_NAT_PROTO_UDPLITE=m
+CONFIG_NF_NAT_PROTO_SCTP=m
+CONFIG_NF_NAT_AMANDA=m
+CONFIG_NF_NAT_FTP=m
+CONFIG_NF_NAT_IRC=m
+CONFIG_NF_NAT_SIP=m
+CONFIG_NF_NAT_TFTP=m
+CONFIG_NETFILTER_SYNPROXY=m
+CONFIG_NF_TABLES=m
+CONFIG_NF_TABLES_INET=m
+CONFIG_NFT_EXTHDR=m
+CONFIG_NFT_META=m
+CONFIG_NFT_CT=m
+CONFIG_NFT_RBTREE=m
+CONFIG_NFT_HASH=m
+CONFIG_NFT_COUNTER=m
+CONFIG_NFT_LOG=m
+CONFIG_NFT_LIMIT=m
+CONFIG_NFT_NAT=m
+CONFIG_NFT_QUEUE=m
+CONFIG_NFT_REJECT=m
+CONFIG_NFT_REJECT_INET=m
+CONFIG_NFT_COMPAT=m
+CONFIG_NETFILTER_XTABLES=m
+
+#
+# Xtables combined modules
+#
+CONFIG_NETFILTER_XT_MARK=m
+CONFIG_NETFILTER_XT_CONNMARK=m
+CONFIG_NETFILTER_XT_SET=m
+
+#
+# Xtables targets
+#
+CONFIG_NETFILTER_XT_TARGET_AUDIT=m
+CONFIG_NETFILTER_XT_TARGET_CHECKSUM=m
+CONFIG_NETFILTER_XT_TARGET_CLASSIFY=m
+CONFIG_NETFILTER_XT_TARGET_CONNMARK=m
+CONFIG_NETFILTER_XT_TARGET_CONNSECMARK=m
+CONFIG_NETFILTER_XT_TARGET_CT=m
+CONFIG_NETFILTER_XT_TARGET_DSCP=m
+CONFIG_NETFILTER_XT_TARGET_HL=m
+CONFIG_NETFILTER_XT_TARGET_HMARK=m
+CONFIG_NETFILTER_XT_TARGET_IDLETIMER=m
+CONFIG_NETFILTER_XT_TARGET_LED=m
+CONFIG_NETFILTER_XT_TARGET_LOG=m
+CONFIG_NETFILTER_XT_TARGET_MARK=m
+CONFIG_NETFILTER_XT_TARGET_NETMAP=m
+CONFIG_NETFILTER_XT_TARGET_NFLOG=m
+CONFIG_NETFILTER_XT_TARGET_NFQUEUE=m
+CONFIG_NETFILTER_XT_TARGET_NOTRACK=m
+CONFIG_NETFILTER_XT_TARGET_RATEEST=m
+CONFIG_NETFILTER_XT_TARGET_REDIRECT=m
+CONFIG_NETFILTER_XT_TARGET_TEE=m
+CONFIG_NETFILTER_XT_TARGET_TPROXY=m
+CONFIG_NETFILTER_XT_TARGET_TRACE=m
+CONFIG_NETFILTER_XT_TARGET_SECMARK=m
+CONFIG_NETFILTER_XT_TARGET_TCPMSS=m
+CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP=m
+
+#
+# Xtables matches
+#
+CONFIG_NETFILTER_XT_MATCH_ADDRTYPE=m
+CONFIG_NETFILTER_XT_MATCH_BPF=m
+CONFIG_NETFILTER_XT_MATCH_CGROUP=m
+CONFIG_NETFILTER_XT_MATCH_CLUSTER=m
+CONFIG_NETFILTER_XT_MATCH_COMMENT=m
+CONFIG_NETFILTER_XT_MATCH_CONNBYTES=m
+CONFIG_NETFILTER_XT_MATCH_CONNLABEL=m
+CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=m
+CONFIG_NETFILTER_XT_MATCH_CONNMARK=m
+CONFIG_NETFILTER_XT_MATCH_CONNTRACK=m
+CONFIG_NETFILTER_XT_MATCH_CPU=m
+CONFIG_NETFILTER_XT_MATCH_DCCP=m
+CONFIG_NETFILTER_XT_MATCH_DEVGROUP=m
+CONFIG_NETFILTER_XT_MATCH_DSCP=m
+CONFIG_NETFILTER_XT_MATCH_ECN=m
+CONFIG_NETFILTER_XT_MATCH_ESP=m
+CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=m
+CONFIG_NETFILTER_XT_MATCH_HELPER=m
+CONFIG_NETFILTER_XT_MATCH_HL=m
+CONFIG_NETFILTER_XT_MATCH_IPCOMP=m
+CONFIG_NETFILTER_XT_MATCH_IPRANGE=m
+CONFIG_NETFILTER_XT_MATCH_IPVS=m
+CONFIG_NETFILTER_XT_MATCH_L2TP=m
+CONFIG_NETFILTER_XT_MATCH_LENGTH=m
+CONFIG_NETFILTER_XT_MATCH_LIMIT=m
+CONFIG_NETFILTER_XT_MATCH_MAC=m
+CONFIG_NETFILTER_XT_MATCH_MARK=m
+CONFIG_NETFILTER_XT_MATCH_MULTIPORT=m
+CONFIG_NETFILTER_XT_MATCH_NFACCT=m
+CONFIG_NETFILTER_XT_MATCH_OSF=m
+CONFIG_NETFILTER_XT_MATCH_OWNER=m
+CONFIG_NETFILTER_XT_MATCH_POLICY=m
+CONFIG_NETFILTER_XT_MATCH_PHYSDEV=m
+CONFIG_NETFILTER_XT_MATCH_PKTTYPE=m
+CONFIG_NETFILTER_XT_MATCH_QUOTA=m
+CONFIG_NETFILTER_XT_MATCH_RATEEST=m
+CONFIG_NETFILTER_XT_MATCH_REALM=m
+CONFIG_NETFILTER_XT_MATCH_RECENT=m
+CONFIG_NETFILTER_XT_MATCH_SCTP=m
+CONFIG_NETFILTER_XT_MATCH_SOCKET=m
+CONFIG_NETFILTER_XT_MATCH_STATE=m
+CONFIG_NETFILTER_XT_MATCH_STATISTIC=m
+CONFIG_NETFILTER_XT_MATCH_STRING=m
+CONFIG_NETFILTER_XT_MATCH_TCPMSS=m
+CONFIG_NETFILTER_XT_MATCH_TIME=m
+CONFIG_NETFILTER_XT_MATCH_U32=m
+CONFIG_IP_SET=m
+CONFIG_IP_SET_MAX=256
+CONFIG_IP_SET_BITMAP_IP=m
+CONFIG_IP_SET_BITMAP_IPMAC=m
+CONFIG_IP_SET_BITMAP_PORT=m
+CONFIG_IP_SET_HASH_IP=m
+CONFIG_IP_SET_HASH_IPMARK=m
+CONFIG_IP_SET_HASH_IPPORT=m
+CONFIG_IP_SET_HASH_IPPORTIP=m
+CONFIG_IP_SET_HASH_IPPORTNET=m
+CONFIG_IP_SET_HASH_NETPORTNET=m
+CONFIG_IP_SET_HASH_NET=m
+CONFIG_IP_SET_HASH_NETNET=m
+CONFIG_IP_SET_HASH_NETPORT=m
+CONFIG_IP_SET_HASH_NETIFACE=m
+CONFIG_IP_SET_LIST_SET=m
+CONFIG_IP_VS=m
+# CONFIG_IP_VS_IPV6 is not set
+# CONFIG_IP_VS_DEBUG is not set
+CONFIG_IP_VS_TAB_BITS=12
+
+#
+# IPVS transport protocol load balancing support
+#
+CONFIG_IP_VS_PROTO_TCP=y
+CONFIG_IP_VS_PROTO_UDP=y
+CONFIG_IP_VS_PROTO_AH_ESP=y
+CONFIG_IP_VS_PROTO_ESP=y
+CONFIG_IP_VS_PROTO_AH=y
+CONFIG_IP_VS_PROTO_SCTP=y
+
+#
+# IPVS scheduler
+#
+CONFIG_IP_VS_RR=m
+CONFIG_IP_VS_WRR=m
+CONFIG_IP_VS_LC=m
+CONFIG_IP_VS_WLC=m
+CONFIG_IP_VS_LBLC=m
+CONFIG_IP_VS_LBLCR=m
+CONFIG_IP_VS_DH=m
+CONFIG_IP_VS_SH=m
+CONFIG_IP_VS_SED=m
+CONFIG_IP_VS_NQ=m
+
+#
+# IPVS SH scheduler
+#
+CONFIG_IP_VS_SH_TAB_BITS=8
+
+#
+# IPVS application helper
+#
+CONFIG_IP_VS_FTP=m
+CONFIG_IP_VS_NFCT=y
+CONFIG_IP_VS_PE_SIP=m
+
+#
+# IP: Netfilter Configuration
+#
+CONFIG_NF_DEFRAG_IPV4=m
+CONFIG_NF_CONNTRACK_IPV4=m
+# CONFIG_NF_CONNTRACK_PROC_COMPAT is not set
+CONFIG_NF_TABLES_IPV4=m
+CONFIG_NFT_CHAIN_ROUTE_IPV4=m
+CONFIG_NFT_CHAIN_NAT_IPV4=m
+CONFIG_NFT_REJECT_IPV4=m
+CONFIG_NF_TABLES_ARP=m
+CONFIG_IP_NF_IPTABLES=m
+CONFIG_IP_NF_MATCH_AH=m
+CONFIG_IP_NF_MATCH_ECN=m
+CONFIG_IP_NF_MATCH_RPFILTER=m
+CONFIG_IP_NF_MATCH_TTL=m
+CONFIG_IP_NF_FILTER=m
+CONFIG_IP_NF_TARGET_REJECT=m
+CONFIG_IP_NF_TARGET_SYNPROXY=m
+CONFIG_IP_NF_TARGET_ULOG=m
+CONFIG_NF_NAT_IPV4=m
+CONFIG_IP_NF_TARGET_MASQUERADE=m
+CONFIG_IP_NF_TARGET_NETMAP=m
+CONFIG_IP_NF_TARGET_REDIRECT=m
+CONFIG_NF_NAT_SNMP_BASIC=m
+CONFIG_NF_NAT_PROTO_GRE=m
+CONFIG_NF_NAT_PPTP=m
+CONFIG_NF_NAT_H323=m
+CONFIG_IP_NF_MANGLE=m
+CONFIG_IP_NF_TARGET_CLUSTERIP=m
+CONFIG_IP_NF_TARGET_ECN=m
+CONFIG_IP_NF_TARGET_TTL=m
+CONFIG_IP_NF_RAW=m
+CONFIG_IP_NF_SECURITY=m
+CONFIG_IP_NF_ARPTABLES=m
+CONFIG_IP_NF_ARPFILTER=m
+CONFIG_IP_NF_ARP_MANGLE=m
+
+#
+# IPv6: Netfilter Configuration
+#
+CONFIG_NF_DEFRAG_IPV6=m
+CONFIG_NF_CONNTRACK_IPV6=m
+CONFIG_NF_TABLES_IPV6=m
+CONFIG_NFT_CHAIN_ROUTE_IPV6=m
+CONFIG_NFT_CHAIN_NAT_IPV6=m
+CONFIG_NFT_REJECT_IPV6=m
+CONFIG_IP6_NF_IPTABLES=m
+CONFIG_IP6_NF_MATCH_AH=m
+CONFIG_IP6_NF_MATCH_EUI64=m
+CONFIG_IP6_NF_MATCH_FRAG=m
+CONFIG_IP6_NF_MATCH_OPTS=m
+CONFIG_IP6_NF_MATCH_HL=m
+CONFIG_IP6_NF_MATCH_IPV6HEADER=m
+CONFIG_IP6_NF_MATCH_MH=m
+CONFIG_IP6_NF_MATCH_RPFILTER=m
+CONFIG_IP6_NF_MATCH_RT=m
+CONFIG_IP6_NF_TARGET_HL=m
+CONFIG_IP6_NF_FILTER=m
+CONFIG_IP6_NF_TARGET_REJECT=m
+CONFIG_IP6_NF_TARGET_SYNPROXY=m
+CONFIG_IP6_NF_MANGLE=m
+CONFIG_IP6_NF_RAW=m
+CONFIG_IP6_NF_SECURITY=m
+CONFIG_NF_NAT_IPV6=m
+CONFIG_IP6_NF_TARGET_MASQUERADE=m
+CONFIG_IP6_NF_TARGET_NPT=m
+CONFIG_NF_TABLES_BRIDGE=m
+CONFIG_BRIDGE_NF_EBTABLES=m
+CONFIG_BRIDGE_EBT_BROUTE=m
+CONFIG_BRIDGE_EBT_T_FILTER=m
+CONFIG_BRIDGE_EBT_T_NAT=m
+CONFIG_BRIDGE_EBT_802_3=m
+CONFIG_BRIDGE_EBT_AMONG=m
+CONFIG_BRIDGE_EBT_ARP=m
+CONFIG_BRIDGE_EBT_IP=m
+CONFIG_BRIDGE_EBT_IP6=m
+CONFIG_BRIDGE_EBT_LIMIT=m
+CONFIG_BRIDGE_EBT_MARK=m
+CONFIG_BRIDGE_EBT_PKTTYPE=m
+CONFIG_BRIDGE_EBT_STP=m
+CONFIG_BRIDGE_EBT_VLAN=m
+CONFIG_BRIDGE_EBT_ARPREPLY=m
+CONFIG_BRIDGE_EBT_DNAT=m
+CONFIG_BRIDGE_EBT_MARK_T=m
+CONFIG_BRIDGE_EBT_REDIRECT=m
+CONFIG_BRIDGE_EBT_SNAT=m
+CONFIG_BRIDGE_EBT_LOG=m
+CONFIG_BRIDGE_EBT_ULOG=m
+CONFIG_BRIDGE_EBT_NFLOG=m
+# CONFIG_IP_DCCP is not set
+CONFIG_IP_SCTP=m
+# CONFIG_SCTP_DBG_OBJCNT is not set
+CONFIG_SCTP_DEFAULT_COOKIE_HMAC_MD5=y
+# CONFIG_SCTP_DEFAULT_COOKIE_HMAC_SHA1 is not set
+# CONFIG_SCTP_DEFAULT_COOKIE_HMAC_NONE is not set
+CONFIG_SCTP_COOKIE_HMAC_MD5=y
+CONFIG_SCTP_COOKIE_HMAC_SHA1=y
+# CONFIG_RDS is not set
+# CONFIG_TIPC is not set
+# CONFIG_ATM is not set
+CONFIG_L2TP=m
+# CONFIG_L2TP_DEBUGFS is not set
+CONFIG_L2TP_V3=y
+CONFIG_L2TP_IP=m
+CONFIG_L2TP_ETH=m
+CONFIG_STP=m
+CONFIG_GARP=m
+CONFIG_MRP=m
+CONFIG_BRIDGE=m
+CONFIG_BRIDGE_IGMP_SNOOPING=y
+CONFIG_BRIDGE_VLAN_FILTERING=y
+CONFIG_HAVE_NET_DSA=y
+CONFIG_VLAN_8021Q=m
+CONFIG_VLAN_8021Q_GVRP=y
+CONFIG_VLAN_8021Q_MVRP=y
+# CONFIG_DECNET is not set
+CONFIG_LLC=m
+CONFIG_LLC2=m
+# CONFIG_IPX is not set
+# CONFIG_ATALK is not set
+# CONFIG_X25 is not set
+# CONFIG_LAPB is not set
+# CONFIG_PHONET is not set
+# CONFIG_IEEE802154 is not set
+CONFIG_6LOWPAN_IPHC=m
+CONFIG_NET_SCHED=y
+
+#
+# Queueing/Scheduling
+#
+CONFIG_NET_SCH_CBQ=m
+CONFIG_NET_SCH_HTB=m
+CONFIG_NET_SCH_HFSC=m
+CONFIG_NET_SCH_PRIO=m
+CONFIG_NET_SCH_MULTIQ=m
+CONFIG_NET_SCH_RED=m
+CONFIG_NET_SCH_SFB=m
+CONFIG_NET_SCH_SFQ=m
+CONFIG_NET_SCH_TEQL=m
+CONFIG_NET_SCH_TBF=m
+CONFIG_NET_SCH_GRED=m
+CONFIG_NET_SCH_DSMARK=m
+CONFIG_NET_SCH_NETEM=m
+CONFIG_NET_SCH_DRR=m
+CONFIG_NET_SCH_MQPRIO=m
+CONFIG_NET_SCH_CHOKE=m
+CONFIG_NET_SCH_QFQ=m
+CONFIG_NET_SCH_CODEL=m
+CONFIG_NET_SCH_FQ_CODEL=m
+CONFIG_NET_SCH_FQ=m
+CONFIG_NET_SCH_HHF=m
+CONFIG_NET_SCH_PIE=m
+CONFIG_NET_SCH_INGRESS=m
+CONFIG_NET_SCH_PLUG=m
+
+#
+# Classification
+#
+CONFIG_NET_CLS=y
+CONFIG_NET_CLS_BASIC=m
+CONFIG_NET_CLS_TCINDEX=m
+CONFIG_NET_CLS_ROUTE4=m
+CONFIG_NET_CLS_FW=m
+CONFIG_NET_CLS_U32=m
+CONFIG_CLS_U32_PERF=y
+CONFIG_CLS_U32_MARK=y
+CONFIG_NET_CLS_RSVP=m
+CONFIG_NET_CLS_RSVP6=m
+CONFIG_NET_CLS_FLOW=m
+CONFIG_NET_CLS_CGROUP=y
+CONFIG_NET_CLS_BPF=m
+CONFIG_NET_EMATCH=y
+CONFIG_NET_EMATCH_STACK=32
+CONFIG_NET_EMATCH_CMP=m
+CONFIG_NET_EMATCH_NBYTE=m
+CONFIG_NET_EMATCH_U32=m
+CONFIG_NET_EMATCH_META=m
+CONFIG_NET_EMATCH_TEXT=m
+CONFIG_NET_EMATCH_IPSET=m
+CONFIG_NET_CLS_ACT=y
+CONFIG_NET_ACT_POLICE=m
+CONFIG_NET_ACT_GACT=m
+CONFIG_GACT_PROB=y
+CONFIG_NET_ACT_MIRRED=m
+CONFIG_NET_ACT_IPT=m
+CONFIG_NET_ACT_NAT=m
+CONFIG_NET_ACT_PEDIT=m
+CONFIG_NET_ACT_SIMP=m
+CONFIG_NET_ACT_SKBEDIT=m
+CONFIG_NET_ACT_CSUM=m
+CONFIG_NET_CLS_IND=y
+CONFIG_NET_SCH_FIFO=y
+# CONFIG_DCB is not set
+CONFIG_DNS_RESOLVER=y
+# CONFIG_BATMAN_ADV is not set
+CONFIG_OPENVSWITCH=m
+CONFIG_OPENVSWITCH_GRE=y
+CONFIG_OPENVSWITCH_VXLAN=y
+CONFIG_VSOCKETS=m
+CONFIG_VMWARE_VMCI_VSOCKETS=m
+CONFIG_NETLINK_MMAP=y
+CONFIG_NETLINK_DIAG=m
+CONFIG_NET_MPLS_GSO=m
+# CONFIG_HSR is not set
+CONFIG_RPS=y
+CONFIG_RFS_ACCEL=y
+CONFIG_XPS=y
+CONFIG_CGROUP_NET_PRIO=y
+CONFIG_CGROUP_NET_CLASSID=y
+CONFIG_NET_RX_BUSY_POLL=y
+CONFIG_BQL=y
+CONFIG_BPF_JIT=y
+CONFIG_NET_FLOW_LIMIT=y
+
+#
+# Network testing
+#
+CONFIG_NET_PKTGEN=m
+CONFIG_NET_DROP_MONITOR=y
+# CONFIG_HAMRADIO is not set
+# CONFIG_CAN is not set
+# CONFIG_IRDA is not set
+CONFIG_BT=m
+CONFIG_BT_6LOWPAN=y
+CONFIG_BT_RFCOMM=m
+CONFIG_BT_RFCOMM_TTY=y
+CONFIG_BT_BNEP=m
+CONFIG_BT_BNEP_MC_FILTER=y
+CONFIG_BT_BNEP_PROTO_FILTER=y
+CONFIG_BT_HIDP=m
+
+#
+# Bluetooth device drivers
+#
+CONFIG_BT_HCIBTUSB=m
+CONFIG_BT_HCIUART=m
+CONFIG_BT_HCIUART_H4=y
+CONFIG_BT_HCIUART_BCSP=y
+CONFIG_BT_HCIUART_ATH3K=y
+CONFIG_BT_HCIUART_LL=y
+# CONFIG_BT_HCIUART_3WIRE is not set
+# CONFIG_BT_HCIBCM203X is not set
+# CONFIG_BT_HCIBPA10X is not set
+# CONFIG_BT_HCIBFUSB is not set
+CONFIG_BT_HCIVHCI=m
+# CONFIG_BT_MRVL is not set
+CONFIG_BT_ATH3K=m
+# CONFIG_AF_RXRPC is not set
+CONFIG_FIB_RULES=y
+CONFIG_WIRELESS=y
+CONFIG_WEXT_CORE=y
+CONFIG_WEXT_PROC=y
+CONFIG_CFG80211=m
+CONFIG_NL80211_TESTMODE=y
+# CONFIG_CFG80211_DEVELOPER_WARNINGS is not set
+# CONFIG_CFG80211_REG_DEBUG is not set
+# CONFIG_CFG80211_CERTIFICATION_ONUS is not set
+CONFIG_CFG80211_DEFAULT_PS=y
+# CONFIG_CFG80211_DEBUGFS is not set
+# CONFIG_CFG80211_INTERNAL_REGDB is not set
+CONFIG_CFG80211_WEXT=y
+# CONFIG_LIB80211 is not set
+CONFIG_MAC80211=m
+CONFIG_MAC80211_HAS_RC=y
+CONFIG_MAC80211_RC_PID=y
+CONFIG_MAC80211_RC_MINSTREL=y
+CONFIG_MAC80211_RC_MINSTREL_HT=y
+# CONFIG_MAC80211_RC_DEFAULT_PID is not set
+CONFIG_MAC80211_RC_DEFAULT_MINSTREL=y
+CONFIG_MAC80211_RC_DEFAULT="minstrel_ht"
+# CONFIG_MAC80211_MESH is not set
+CONFIG_MAC80211_LEDS=y
+# CONFIG_MAC80211_DEBUGFS is not set
+# CONFIG_MAC80211_MESSAGE_TRACING is not set
+# CONFIG_MAC80211_DEBUG_MENU is not set
+# CONFIG_WIMAX is not set
+CONFIG_RFKILL=m
+CONFIG_RFKILL_LEDS=y
+CONFIG_RFKILL_INPUT=y
+CONFIG_RFKILL_REGULATOR=m
+CONFIG_RFKILL_GPIO=m
+# CONFIG_NET_9P is not set
+# CONFIG_CAIF is not set
+# CONFIG_CEPH_LIB is not set
+# CONFIG_NFC is not set
+CONFIG_HAVE_BPF_JIT=y
+
+#
+# Device Drivers
+#
+
+#
+# Generic Driver Options
+#
+CONFIG_UEVENT_HELPER_PATH=""
+CONFIG_DEVTMPFS=y
+# CONFIG_DEVTMPFS_MOUNT is not set
+CONFIG_STANDALONE=y
+CONFIG_PREVENT_FIRMWARE_BUILD=y
+CONFIG_FW_LOADER=y
+# CONFIG_FIRMWARE_IN_KERNEL is not set
+CONFIG_EXTRA_FIRMWARE=""
+CONFIG_FW_LOADER_USER_HELPER=y
+# CONFIG_DEBUG_DRIVER is not set
+# CONFIG_DEBUG_DEVRES is not set
+CONFIG_SYS_HYPERVISOR=y
+# CONFIG_GENERIC_CPU_DEVICES is not set
+CONFIG_GENERIC_CPU_AUTOPROBE=y
+CONFIG_REGMAP=y
+CONFIG_REGMAP_MMIO=y
+CONFIG_DMA_SHARED_BUFFER=y
+
+#
+# Bus devices
+#
+CONFIG_CONNECTOR=y
+CONFIG_PROC_EVENTS=y
+# CONFIG_MTD is not set
+# CONFIG_PARPORT is not set
+CONFIG_ARCH_MIGHT_HAVE_PC_PARPORT=y
+CONFIG_PNP=y
+# CONFIG_PNP_DEBUG_MESSAGES is not set
+
+#
+# Protocols
+#
+CONFIG_PNPACPI=y
+CONFIG_BLK_DEV=y
+# CONFIG_BLK_DEV_NULL_BLK is not set
+# CONFIG_BLK_DEV_FD is not set
+# CONFIG_BLK_DEV_PCIESSD_MTIP32XX is not set
+CONFIG_ZRAM=m
+CONFIG_ZRAM_LZ4_COMPRESS=y
+# CONFIG_ZRAM_DEBUG is not set
+# CONFIG_BLK_CPQ_CISS_DA is not set
+# CONFIG_BLK_DEV_DAC960 is not set
+# CONFIG_BLK_DEV_UMEM is not set
+# CONFIG_BLK_DEV_COW_COMMON is not set
+CONFIG_BLK_DEV_LOOP=m
+CONFIG_BLK_DEV_LOOP_MIN_COUNT=8
+CONFIG_BLK_DEV_CRYPTOLOOP=m
+# CONFIG_BLK_DEV_DRBD is not set
+CONFIG_BLK_DEV_NBD=m
+# CONFIG_BLK_DEV_NVME is not set
+# CONFIG_BLK_DEV_SKD is not set
+# CONFIG_BLK_DEV_SX8 is not set
+CONFIG_BLK_DEV_RAM=m
+CONFIG_BLK_DEV_RAM_COUNT=16
+CONFIG_BLK_DEV_RAM_SIZE=16384
+CONFIG_BLK_DEV_XIP=y
+CONFIG_CDROM_PKTCDVD=m
+CONFIG_CDROM_PKTCDVD_BUFFERS=8
+CONFIG_CDROM_PKTCDVD_WCACHE=y
+# CONFIG_ATA_OVER_ETH is not set
+CONFIG_XEN_BLKDEV_FRONTEND=m
+CONFIG_XEN_BLKDEV_BACKEND=m
+CONFIG_VIRTIO_BLK=m
+# CONFIG_BLK_DEV_HD is not set
+# CONFIG_BLK_DEV_RBD is not set
+# CONFIG_BLK_DEV_RSXX is not set
+
+#
+# Misc devices
+#
+# CONFIG_SENSORS_LIS3LV02D is not set
+# CONFIG_AD525X_DPOT is not set
+# CONFIG_ATMEL_PWM is not set
+CONFIG_DUMMY_IRQ=m
+# CONFIG_IBM_ASM is not set
+# CONFIG_PHANTOM is not set
+# CONFIG_SGI_IOC4 is not set
+# CONFIG_TIFM_CORE is not set
+# CONFIG_ICS932S401 is not set
+# CONFIG_ATMEL_SSC is not set
+# CONFIG_ENCLOSURE_SERVICES is not set
+# CONFIG_HP_ILO is not set
+# CONFIG_APDS9802ALS is not set
+# CONFIG_ISL29003 is not set
+# CONFIG_ISL29020 is not set
+# CONFIG_SENSORS_TSL2550 is not set
+# CONFIG_SENSORS_BH1780 is not set
+# CONFIG_SENSORS_BH1770 is not set
+# CONFIG_SENSORS_APDS990X is not set
+# CONFIG_HMC6352 is not set
+# CONFIG_DS1682 is not set
+# CONFIG_TI_DAC7512 is not set
+CONFIG_VMWARE_BALLOON=m
+# CONFIG_BMP085_I2C is not set
+# CONFIG_BMP085_SPI is not set
+# CONFIG_PCH_PHUB is not set
+# CONFIG_USB_SWITCH_FSA9480 is not set
+# CONFIG_LATTICE_ECP3_CONFIG is not set
+# CONFIG_SRAM is not set
+# CONFIG_C2PORT is not set
+
+#
+# EEPROM support
+#
+CONFIG_EEPROM_AT24=m
+CONFIG_EEPROM_AT25=m
+CONFIG_EEPROM_LEGACY=m
+# CONFIG_EEPROM_MAX6875 is not set
+# CONFIG_EEPROM_93CX6 is not set
+# CONFIG_EEPROM_93XX46 is not set
+# CONFIG_CB710_CORE is not set
+
+#
+# Texas Instruments shared transport line discipline
+#
+# CONFIG_TI_ST is not set
+# CONFIG_SENSORS_LIS3_I2C is not set
+
+#
+# Altera FPGA firmware download module
+#
+# CONFIG_ALTERA_STAPL is not set
+CONFIG_INTEL_MEI=y
+CONFIG_INTEL_MEI_ME=y
+# CONFIG_INTEL_MEI_TXE is not set
+CONFIG_VMWARE_VMCI=m
+
+#
+# Intel MIC Host Driver
+#
+# CONFIG_INTEL_MIC_HOST is not set
+
+#
+# Intel MIC Card Driver
+#
+# CONFIG_INTEL_MIC_CARD is not set
+# CONFIG_GENWQE is not set
+# CONFIG_ECHO is not set
+CONFIG_HAVE_IDE=y
+# CONFIG_IDE is not set
+
+#
+# SCSI device support
+#
+CONFIG_SCSI_MOD=m
+CONFIG_RAID_ATTRS=m
+CONFIG_SCSI=m
+CONFIG_SCSI_DMA=y
+CONFIG_SCSI_TGT=m
+# CONFIG_SCSI_NETLINK is not set
+CONFIG_SCSI_PROC_FS=y
+
+#
+# SCSI support type (disk, tape, CD-ROM)
+#
+CONFIG_BLK_DEV_SD=m
+# CONFIG_CHR_DEV_ST is not set
+# CONFIG_CHR_DEV_OSST is not set
+CONFIG_BLK_DEV_SR=m
+# CONFIG_BLK_DEV_SR_VENDOR is not set
+CONFIG_CHR_DEV_SG=m
+CONFIG_CHR_DEV_SCH=m
+CONFIG_SCSI_MULTI_LUN=y
+# CONFIG_SCSI_CONSTANTS is not set
+# CONFIG_SCSI_LOGGING is not set
+# CONFIG_SCSI_SCAN_ASYNC is not set
+
+#
+# SCSI Transports
+#
+CONFIG_SCSI_SPI_ATTRS=m
+# CONFIG_SCSI_FC_ATTRS is not set
+CONFIG_SCSI_ISCSI_ATTRS=m
+# CONFIG_SCSI_SAS_ATTRS is not set
+# CONFIG_SCSI_SAS_LIBSAS is not set
+# CONFIG_SCSI_SRP_ATTRS is not set
+CONFIG_SCSI_LOWLEVEL=y
+CONFIG_ISCSI_TCP=m
+CONFIG_ISCSI_BOOT_SYSFS=m
+# CONFIG_SCSI_CXGB3_ISCSI is not set
+# CONFIG_SCSI_CXGB4_ISCSI is not set
+# CONFIG_SCSI_BNX2_ISCSI is not set
+# CONFIG_SCSI_BNX2X_FCOE is not set
+# CONFIG_BE2ISCSI is not set
+# CONFIG_BLK_DEV_3W_XXXX_RAID is not set
+# CONFIG_SCSI_HPSA is not set
+# CONFIG_SCSI_3W_9XXX is not set
+# CONFIG_SCSI_3W_SAS is not set
+# CONFIG_SCSI_ACARD is not set
+# CONFIG_SCSI_AACRAID is not set
+# CONFIG_SCSI_AIC7XXX is not set
+# CONFIG_SCSI_AIC79XX is not set
+# CONFIG_SCSI_AIC94XX is not set
+# CONFIG_SCSI_MVSAS is not set
+# CONFIG_SCSI_MVUMI is not set
+# CONFIG_SCSI_DPT_I2O is not set
+# CONFIG_SCSI_ADVANSYS is not set
+# CONFIG_SCSI_ARCMSR is not set
+# CONFIG_SCSI_ESAS2R is not set
+# CONFIG_MEGARAID_NEWGEN is not set
+# CONFIG_MEGARAID_LEGACY is not set
+# CONFIG_MEGARAID_SAS is not set
+# CONFIG_SCSI_MPT2SAS is not set
+# CONFIG_SCSI_MPT3SAS is not set
+# CONFIG_SCSI_UFSHCD is not set
+# CONFIG_SCSI_HPTIOP is not set
+# CONFIG_SCSI_BUSLOGIC is not set
+CONFIG_VMWARE_PVSCSI=m
+# CONFIG_LIBFC is not set
+# CONFIG_LIBFCOE is not set
+# CONFIG_FCOE is not set
+# CONFIG_FCOE_FNIC is not set
+# CONFIG_SCSI_DMX3191D is not set
+# CONFIG_SCSI_EATA is not set
+# CONFIG_SCSI_FUTURE_DOMAIN is not set
+# CONFIG_SCSI_GDTH is not set
+# CONFIG_SCSI_ISCI is not set
+# CONFIG_SCSI_IPS is not set
+# CONFIG_SCSI_INITIO is not set
+# CONFIG_SCSI_INIA100 is not set
+# CONFIG_SCSI_STEX is not set
+# CONFIG_SCSI_SYM53C8XX_2 is not set
+# CONFIG_SCSI_IPR is not set
+# CONFIG_SCSI_QLOGIC_1280 is not set
+# CONFIG_SCSI_QLA_FC is not set
+# CONFIG_SCSI_QLA_ISCSI is not set
+# CONFIG_SCSI_LPFC is not set
+# CONFIG_SCSI_DC395x is not set
+# CONFIG_SCSI_DC390T is not set
+# CONFIG_SCSI_DEBUG is not set
+# CONFIG_SCSI_PMCRAID is not set
+# CONFIG_SCSI_PM8001 is not set
+# CONFIG_SCSI_SRP is not set
+# CONFIG_SCSI_BFA_FC is not set
+CONFIG_SCSI_VIRTIO=m
+# CONFIG_SCSI_CHELSIO_FCOE is not set
+# CONFIG_SCSI_DH is not set
+# CONFIG_SCSI_OSD_INITIATOR is not set
+CONFIG_ATA=m
+# CONFIG_ATA_NONSTANDARD is not set
+CONFIG_ATA_VERBOSE_ERROR=y
+CONFIG_ATA_ACPI=y
+CONFIG_SATA_ZPODD=y
+CONFIG_SATA_PMP=y
+
+#
+# Controllers with non-SFF native interface
+#
+CONFIG_SATA_AHCI=m
+CONFIG_SATA_AHCI_PLATFORM=m
+# CONFIG_SATA_INIC162X is not set
+# CONFIG_SATA_ACARD_AHCI is not set
+# CONFIG_SATA_SIL24 is not set
+CONFIG_ATA_SFF=y
+
+#
+# SFF controllers with custom DMA interface
+#
+# CONFIG_PDC_ADMA is not set
+# CONFIG_SATA_QSTOR is not set
+# CONFIG_SATA_SX4 is not set
+CONFIG_ATA_BMDMA=y
+
+#
+# SATA SFF controllers with BMDMA
+#
+CONFIG_ATA_PIIX=m
+# CONFIG_SATA_MV is not set
+# CONFIG_SATA_NV is not set
+# CONFIG_SATA_PROMISE is not set
+# CONFIG_SATA_SIL is not set
+# CONFIG_SATA_SIS is not set
+# CONFIG_SATA_SVW is not set
+# CONFIG_SATA_ULI is not set
+# CONFIG_SATA_VIA is not set
+# CONFIG_SATA_VITESSE is not set
+
+#
+# PATA SFF controllers with BMDMA
+#
+# CONFIG_PATA_ALI is not set
+# CONFIG_PATA_AMD is not set
+# CONFIG_PATA_ARTOP is not set
+# CONFIG_PATA_ATIIXP is not set
+# CONFIG_PATA_ATP867X is not set
+# CONFIG_PATA_CMD64X is not set
+# CONFIG_PATA_CYPRESS is not set
+# CONFIG_PATA_EFAR is not set
+# CONFIG_PATA_HPT366 is not set
+# CONFIG_PATA_HPT37X is not set
+# CONFIG_PATA_HPT3X2N is not set
+# CONFIG_PATA_HPT3X3 is not set
+# CONFIG_PATA_IT8213 is not set
+# CONFIG_PATA_IT821X is not set
+# CONFIG_PATA_JMICRON is not set
+# CONFIG_PATA_MARVELL is not set
+# CONFIG_PATA_NETCELL is not set
+# CONFIG_PATA_NINJA32 is not set
+# CONFIG_PATA_NS87415 is not set
+# CONFIG_PATA_OLDPIIX is not set
+# CONFIG_PATA_OPTIDMA is not set
+# CONFIG_PATA_PDC2027X is not set
+# CONFIG_PATA_PDC_OLD is not set
+# CONFIG_PATA_RADISYS is not set
+# CONFIG_PATA_RDC is not set
+# CONFIG_PATA_SCH is not set
+# CONFIG_PATA_SERVERWORKS is not set
+# CONFIG_PATA_SIL680 is not set
+# CONFIG_PATA_SIS is not set
+# CONFIG_PATA_TOSHIBA is not set
+# CONFIG_PATA_TRIFLEX is not set
+# CONFIG_PATA_VIA is not set
+# CONFIG_PATA_WINBOND is not set
+
+#
+# PIO-only SFF controllers
+#
+# CONFIG_PATA_CMD640_PCI is not set
+# CONFIG_PATA_MPIIX is not set
+# CONFIG_PATA_NS87410 is not set
+# CONFIG_PATA_OPTI is not set
+# CONFIG_PATA_PLATFORM is not set
+# CONFIG_PATA_RZ1000 is not set
+
+#
+# Generic fallback / legacy drivers
+#
+CONFIG_PATA_ACPI=m
+CONFIG_ATA_GENERIC=m
+# CONFIG_PATA_LEGACY is not set
+CONFIG_MD=y
+CONFIG_BLK_DEV_MD=m
+CONFIG_MD_LINEAR=m
+CONFIG_MD_RAID0=m
+CONFIG_MD_RAID1=m
+CONFIG_MD_RAID10=m
+CONFIG_MD_RAID456=m
+CONFIG_MD_MULTIPATH=m
+CONFIG_MD_FAULTY=m
+# CONFIG_BCACHE is not set
+CONFIG_BLK_DEV_DM_BUILTIN=y
+CONFIG_BLK_DEV_DM=m
+# CONFIG_DM_DEBUG is not set
+CONFIG_DM_BUFIO=m
+CONFIG_DM_BIO_PRISON=m
+CONFIG_DM_PERSISTENT_DATA=m
+# CONFIG_DM_DEBUG_BLOCK_STACK_TRACING is not set
+CONFIG_DM_CRYPT=m
+CONFIG_DM_SNAPSHOT=m
+CONFIG_DM_THIN_PROVISIONING=m
+# CONFIG_DM_CACHE is not set
+CONFIG_DM_ERA=m
+CONFIG_DM_MIRROR=m
+CONFIG_DM_LOG_USERSPACE=m
+CONFIG_DM_RAID=m
+CONFIG_DM_ZERO=m
+CONFIG_DM_MULTIPATH=m
+CONFIG_DM_MULTIPATH_QL=m
+CONFIG_DM_MULTIPATH_ST=m
+CONFIG_DM_DELAY=m
+CONFIG_DM_UEVENT=y
+# CONFIG_DM_FLAKEY is not set
+CONFIG_DM_VERITY=m
+CONFIG_DM_SWITCH=m
+CONFIG_TARGET_CORE=m
+CONFIG_TCM_IBLOCK=m
+CONFIG_TCM_FILEIO=m
+CONFIG_TCM_PSCSI=m
+CONFIG_LOOPBACK_TARGET=m
+CONFIG_ISCSI_TARGET=m
+# CONFIG_FUSION is not set
+
+#
+# IEEE 1394 (FireWire) support
+#
+# CONFIG_FIREWIRE is not set
+# CONFIG_FIREWIRE_NOSY is not set
+CONFIG_I2O=m
+CONFIG_I2O_LCT_NOTIFY_ON_CHANGES=y
+CONFIG_I2O_EXT_ADAPTEC=y
+CONFIG_I2O_EXT_ADAPTEC_DMA64=y
+CONFIG_I2O_CONFIG=m
+CONFIG_I2O_CONFIG_OLD_IOCTL=y
+CONFIG_I2O_BUS=m
+CONFIG_I2O_BLOCK=m
+CONFIG_I2O_SCSI=m
+CONFIG_I2O_PROC=m
+# CONFIG_MACINTOSH_DRIVERS is not set
+CONFIG_NETDEVICES=y
+CONFIG_MII=m
+CONFIG_NET_CORE=y
+CONFIG_BONDING=m
+CONFIG_DUMMY=m
+CONFIG_EQUALIZER=m
+# CONFIG_NET_FC is not set
+CONFIG_IFB=m
+CONFIG_NET_TEAM=m
+CONFIG_NET_TEAM_MODE_BROADCAST=m
+CONFIG_NET_TEAM_MODE_ROUNDROBIN=m
+CONFIG_NET_TEAM_MODE_RANDOM=m
+CONFIG_NET_TEAM_MODE_ACTIVEBACKUP=m
+CONFIG_NET_TEAM_MODE_LOADBALANCE=m
+CONFIG_MACVLAN=m
+CONFIG_MACVTAP=m
+CONFIG_VXLAN=m
+CONFIG_NETCONSOLE=m
+CONFIG_NETCONSOLE_DYNAMIC=y
+CONFIG_NETPOLL=y
+CONFIG_NET_POLL_CONTROLLER=y
+CONFIG_NTB_NETDEV=m
+CONFIG_RIONET=m
+CONFIG_RIONET_TX_SIZE=128
+CONFIG_RIONET_RX_SIZE=128
+CONFIG_TUN=m
+CONFIG_VETH=m
+CONFIG_VIRTIO_NET=m
+CONFIG_NLMON=m
+# CONFIG_ARCNET is not set
+
+#
+# CAIF transport drivers
+#
+CONFIG_VHOST_NET=m
+CONFIG_VHOST_SCSI=m
+CONFIG_VHOST_RING=m
+CONFIG_VHOST=m
+
+#
+# Distributed Switch Architecture drivers
+#
+# CONFIG_NET_DSA_MV88E6XXX is not set
+# CONFIG_NET_DSA_MV88E6060 is not set
+# CONFIG_NET_DSA_MV88E6XXX_NEED_PPU is not set
+# CONFIG_NET_DSA_MV88E6131 is not set
+# CONFIG_NET_DSA_MV88E6123_61_65 is not set
+CONFIG_ETHERNET=y
+CONFIG_MDIO=m
+# CONFIG_NET_VENDOR_3COM is not set
+# CONFIG_NET_VENDOR_ADAPTEC is not set
+# CONFIG_NET_VENDOR_ALTEON is not set
+# CONFIG_ALTERA_TSE is not set
+# CONFIG_NET_VENDOR_AMD is not set
+# CONFIG_NET_VENDOR_ARC is not set
+CONFIG_NET_VENDOR_ATHEROS=y
+# CONFIG_ATL2 is not set
+# CONFIG_ATL1 is not set
+# CONFIG_ATL1E is not set
+# CONFIG_ATL1C is not set
+CONFIG_ALX=m
+# CONFIG_NET_VENDOR_BROADCOM is not set
+# CONFIG_NET_VENDOR_BROCADE is not set
+# CONFIG_NET_CALXEDA_XGMAC is not set
+# CONFIG_NET_VENDOR_CHELSIO is not set
+# CONFIG_NET_VENDOR_CISCO is not set
+# CONFIG_CX_ECAT is not set
+# CONFIG_DNET is not set
+# CONFIG_NET_VENDOR_DEC is not set
+# CONFIG_NET_VENDOR_DLINK is not set
+# CONFIG_NET_VENDOR_EMULEX is not set
+# CONFIG_NET_VENDOR_EXAR is not set
+# CONFIG_NET_VENDOR_HP is not set
+# CONFIG_NET_VENDOR_INTEL is not set
+# CONFIG_IP1000 is not set
+# CONFIG_JME is not set
+# CONFIG_NET_VENDOR_MARVELL is not set
+# CONFIG_NET_VENDOR_MELLANOX is not set
+# CONFIG_NET_VENDOR_MICREL is not set
+# CONFIG_NET_VENDOR_MICROCHIP is not set
+# CONFIG_NET_VENDOR_MYRI is not set
+# CONFIG_FEALNX is not set
+# CONFIG_NET_VENDOR_NATSEMI is not set
+# CONFIG_NET_VENDOR_NVIDIA is not set
+# CONFIG_NET_VENDOR_OKI is not set
+# CONFIG_ETHOC is not set
+# CONFIG_NET_PACKET_ENGINE is not set
+# CONFIG_NET_VENDOR_QLOGIC is not set
+# CONFIG_NET_VENDOR_REALTEK is not set
+# CONFIG_SH_ETH is not set
+# CONFIG_NET_VENDOR_RDC is not set
+# CONFIG_NET_VENDOR_SAMSUNG is not set
+# CONFIG_NET_VENDOR_SEEQ is not set
+# CONFIG_NET_VENDOR_SILAN is not set
+# CONFIG_NET_VENDOR_SIS is not set
+# CONFIG_SFC is not set
+# CONFIG_NET_VENDOR_SMSC is not set
+# CONFIG_NET_VENDOR_STMICRO is not set
+# CONFIG_NET_VENDOR_SUN is not set
+# CONFIG_NET_VENDOR_TEHUTI is not set
+# CONFIG_NET_VENDOR_TI is not set
+# CONFIG_NET_VENDOR_VIA is not set
+# CONFIG_NET_VENDOR_WIZNET is not set
+# CONFIG_FDDI is not set
+# CONFIG_HIPPI is not set
+# CONFIG_NET_SB1000 is not set
+CONFIG_PHYLIB=m
+
+#
+# MII PHY device drivers
+#
+# CONFIG_AT803X_PHY is not set
+# CONFIG_AMD_PHY is not set
+# CONFIG_MARVELL_PHY is not set
+# CONFIG_DAVICOM_PHY is not set
+# CONFIG_QSEMI_PHY is not set
+# CONFIG_LXT_PHY is not set
+# CONFIG_CICADA_PHY is not set
+# CONFIG_VITESSE_PHY is not set
+# CONFIG_SMSC_PHY is not set
+# CONFIG_BROADCOM_PHY is not set
+# CONFIG_BCM7XXX_PHY is not set
+# CONFIG_BCM87XX_PHY is not set
+# CONFIG_ICPLUS_PHY is not set
+# CONFIG_REALTEK_PHY is not set
+# CONFIG_NATIONAL_PHY is not set
+# CONFIG_STE10XP is not set
+# CONFIG_LSI_ET1011C_PHY is not set
+# CONFIG_MICREL_PHY is not set
+# CONFIG_MDIO_BITBANG is not set
+# CONFIG_MICREL_KS8995MA is not set
+CONFIG_PPP=m
+CONFIG_PPP_BSDCOMP=m
+CONFIG_PPP_DEFLATE=m
+CONFIG_PPP_FILTER=y
+CONFIG_PPP_MPPE=m
+CONFIG_PPP_MULTILINK=y
+CONFIG_PPPOE=m
+CONFIG_PPTP=m
+CONFIG_PPPOL2TP=m
+CONFIG_PPP_ASYNC=m
+CONFIG_PPP_SYNC_TTY=m
+# CONFIG_SLIP is not set
+CONFIG_SLHC=m
+
+#
+# USB Network Adapters
+#
+# CONFIG_USB_CATC is not set
+# CONFIG_USB_KAWETH is not set
+# CONFIG_USB_PEGASUS is not set
+# CONFIG_USB_RTL8150 is not set
+# CONFIG_USB_RTL8152 is not set
+CONFIG_USB_USBNET=m
+# CONFIG_USB_NET_AX8817X is not set
+# CONFIG_USB_NET_AX88179_178A is not set
+CONFIG_USB_NET_CDCETHER=m
+CONFIG_USB_NET_CDC_EEM=m
+CONFIG_USB_NET_CDC_NCM=m
+CONFIG_USB_NET_HUAWEI_CDC_NCM=m
+CONFIG_USB_NET_CDC_MBIM=m
+# CONFIG_USB_NET_DM9601 is not set
+# CONFIG_USB_NET_SR9700 is not set
+# CONFIG_USB_NET_SR9800 is not set
+# CONFIG_USB_NET_SMSC75XX is not set
+# CONFIG_USB_NET_SMSC95XX is not set
+# CONFIG_USB_NET_GL620A is not set
+# CONFIG_USB_NET_NET1080 is not set
+CONFIG_USB_NET_PLUSB=m
+# CONFIG_USB_NET_MCS7830 is not set
+CONFIG_USB_NET_RNDIS_HOST=m
+CONFIG_USB_NET_CDC_SUBSET=m
+# CONFIG_USB_ALI_M5632 is not set
+# CONFIG_USB_AN2720 is not set
+# CONFIG_USB_BELKIN is not set
+# CONFIG_USB_ARMLINUX is not set
+# CONFIG_USB_EPSON2888 is not set
+# CONFIG_USB_KC2190 is not set
+# CONFIG_USB_NET_ZAURUS is not set
+# CONFIG_USB_NET_CX82310_ETH is not set
+# CONFIG_USB_NET_KALMIA is not set
+# CONFIG_USB_NET_QMI_WWAN is not set
+# CONFIG_USB_HSO is not set
+# CONFIG_USB_NET_INT51X1 is not set
+CONFIG_USB_IPHETH=m
+# CONFIG_USB_SIERRA_NET is not set
+# CONFIG_USB_VL600 is not set
+CONFIG_WLAN=y
+# CONFIG_LIBERTAS_THINFIRM is not set
+# CONFIG_AIRO is not set
+# CONFIG_ATMEL is not set
+# CONFIG_AT76C50X_USB is not set
+# CONFIG_PRISM54 is not set
+# CONFIG_USB_ZD1201 is not set
+CONFIG_USB_NET_RNDIS_WLAN=m
+# CONFIG_RTL8180 is not set
+# CONFIG_RTL8187 is not set
+# CONFIG_ADM8211 is not set
+# CONFIG_MAC80211_HWSIM is not set
+# CONFIG_MWL8K is not set
+CONFIG_ATH_COMMON=m
+CONFIG_ATH_CARDS=m
+# CONFIG_ATH_DEBUG is not set
+# CONFIG_ATH5K is not set
+# CONFIG_ATH5K_PCI is not set
+CONFIG_ATH9K_HW=m
+CONFIG_ATH9K_COMMON=m
+CONFIG_ATH9K_BTCOEX_SUPPORT=y
+CONFIG_ATH9K=m
+CONFIG_ATH9K_PCI=y
+CONFIG_ATH9K_AHB=y
+# CONFIG_ATH9K_DEBUGFS is not set
+# CONFIG_ATH9K_WOW is not set
+CONFIG_ATH9K_RFKILL=y
+# CONFIG_ATH9K_HTC is not set
+# CONFIG_CARL9170 is not set
+# CONFIG_ATH6KL is not set
+# CONFIG_AR5523 is not set
+# CONFIG_WIL6210 is not set
+# CONFIG_ATH10K is not set
+# CONFIG_WCN36XX is not set
+# CONFIG_B43 is not set
+# CONFIG_B43LEGACY is not set
+# CONFIG_BRCMSMAC is not set
+# CONFIG_BRCMFMAC is not set
+# CONFIG_HOSTAP is not set
+# CONFIG_IPW2100 is not set
+# CONFIG_IPW2200 is not set
+# CONFIG_IWLWIFI is not set
+# CONFIG_IWL4965 is not set
+# CONFIG_IWL3945 is not set
+# CONFIG_LIBERTAS is not set
+# CONFIG_HERMES is not set
+# CONFIG_P54_COMMON is not set
+# CONFIG_RT2X00 is not set
+# CONFIG_RTL_CARDS is not set
+# CONFIG_WL_TI is not set
+# CONFIG_ZD1211RW is not set
+# CONFIG_MWIFIEX is not set
+# CONFIG_CW1200 is not set
+# CONFIG_RSI_91X is not set
+
+#
+# Enable WiMAX (Networking options) to see the WiMAX drivers
+#
+# CONFIG_WAN is not set
+CONFIG_XEN_NETDEV_FRONTEND=m
+CONFIG_XEN_NETDEV_BACKEND=m
+CONFIG_VMXNET3=m
+# CONFIG_ISDN is not set
+
+#
+# Input device support
+#
+CONFIG_INPUT=y
+CONFIG_INPUT_FF_MEMLESS=m
+CONFIG_INPUT_POLLDEV=m
+CONFIG_INPUT_SPARSEKMAP=m
+CONFIG_INPUT_MATRIXKMAP=m
+
+#
+# Userland interfaces
+#
+CONFIG_INPUT_MOUSEDEV=y
+CONFIG_INPUT_MOUSEDEV_PSAUX=y
+CONFIG_INPUT_MOUSEDEV_SCREEN_X=1366
+CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
+# CONFIG_INPUT_JOYDEV is not set
+CONFIG_INPUT_EVDEV=m
+# CONFIG_INPUT_EVBUG is not set
+
+#
+# Input Device Drivers
+#
+CONFIG_INPUT_KEYBOARD=y
+# CONFIG_KEYBOARD_ADP5588 is not set
+# CONFIG_KEYBOARD_ADP5589 is not set
+CONFIG_KEYBOARD_ATKBD=y
+# CONFIG_KEYBOARD_QT1070 is not set
+# CONFIG_KEYBOARD_QT2160 is not set
+# CONFIG_KEYBOARD_LKKBD is not set
+# CONFIG_KEYBOARD_GPIO is not set
+# CONFIG_KEYBOARD_GPIO_POLLED is not set
+# CONFIG_KEYBOARD_TCA6416 is not set
+# CONFIG_KEYBOARD_TCA8418 is not set
+# CONFIG_KEYBOARD_MATRIX is not set
+# CONFIG_KEYBOARD_LM8323 is not set
+# CONFIG_KEYBOARD_LM8333 is not set
+# CONFIG_KEYBOARD_MAX7359 is not set
+# CONFIG_KEYBOARD_MCS is not set
+# CONFIG_KEYBOARD_MPR121 is not set
+# CONFIG_KEYBOARD_NEWTON is not set
+# CONFIG_KEYBOARD_OPENCORES is not set
+# CONFIG_KEYBOARD_SAMSUNG is not set
+# CONFIG_KEYBOARD_STOWAWAY is not set
+# CONFIG_KEYBOARD_SUNKBD is not set
+# CONFIG_KEYBOARD_XTKBD is not set
+CONFIG_INPUT_MOUSE=y
+CONFIG_MOUSE_PS2=m
+CONFIG_MOUSE_PS2_ALPS=y
+CONFIG_MOUSE_PS2_LOGIPS2PP=y
+CONFIG_MOUSE_PS2_SYNAPTICS=y
+CONFIG_MOUSE_PS2_CYPRESS=y
+# CONFIG_MOUSE_PS2_LIFEBOOK is not set
+# CONFIG_MOUSE_PS2_TRACKPOINT is not set
+CONFIG_MOUSE_PS2_ELANTECH=y
+# CONFIG_MOUSE_PS2_SENTELIC is not set
+# CONFIG_MOUSE_PS2_TOUCHKIT is not set
+CONFIG_MOUSE_SERIAL=y
+# CONFIG_MOUSE_APPLETOUCH is not set
+# CONFIG_MOUSE_BCM5974 is not set
+CONFIG_MOUSE_CYAPA=m
+# CONFIG_MOUSE_VSXXXAA is not set
+# CONFIG_MOUSE_GPIO is not set
+CONFIG_MOUSE_SYNAPTICS_I2C=m
+CONFIG_MOUSE_SYNAPTICS_USB=m
+# CONFIG_INPUT_JOYSTICK is not set
+# CONFIG_INPUT_TABLET is not set
+# CONFIG_INPUT_TOUCHSCREEN is not set
+CONFIG_INPUT_MISC=y
+# CONFIG_INPUT_AD714X is not set
+# CONFIG_INPUT_BMA150 is not set
+# CONFIG_INPUT_MMA8450 is not set
+# CONFIG_INPUT_MPU3050 is not set
+# CONFIG_INPUT_APANEL is not set
+# CONFIG_INPUT_GP2A is not set
+# CONFIG_INPUT_GPIO_TILT_POLLED is not set
+# CONFIG_INPUT_ATLAS_BTNS is not set
+# CONFIG_INPUT_ATI_REMOTE2 is not set
+# CONFIG_INPUT_KEYSPAN_REMOTE is not set
+# CONFIG_INPUT_KXTJ9 is not set
+# CONFIG_INPUT_POWERMATE is not set
+# CONFIG_INPUT_YEALINK is not set
+# CONFIG_INPUT_CM109 is not set
+CONFIG_INPUT_UINPUT=m
+# CONFIG_INPUT_PCF8574 is not set
+# CONFIG_INPUT_PWM_BEEPER is not set
+# CONFIG_INPUT_GPIO_ROTARY_ENCODER is not set
+# CONFIG_INPUT_ADXL34X is not set
+# CONFIG_INPUT_IMS_PCU is not set
+# CONFIG_INPUT_CMA3000 is not set
+CONFIG_INPUT_XEN_KBDDEV_FRONTEND=m
+# CONFIG_INPUT_IDEAPAD_SLIDEBAR is not set
+
+#
+# Hardware I/O ports
+#
+CONFIG_SERIO=y
+CONFIG_ARCH_MIGHT_HAVE_PC_SERIO=y
+CONFIG_SERIO_I8042=y
+CONFIG_SERIO_SERPORT=m
+CONFIG_SERIO_CT82C710=m
+CONFIG_SERIO_PCIPS2=m
+CONFIG_SERIO_LIBPS2=y
+CONFIG_SERIO_RAW=m
+# CONFIG_SERIO_ALTERA_PS2 is not set
+# CONFIG_SERIO_PS2MULT is not set
+# CONFIG_SERIO_ARC_PS2 is not set
+# CONFIG_GAMEPORT is not set
+
+#
+# Character devices
+#
+CONFIG_TTY=y
+CONFIG_VT=y
+CONFIG_CONSOLE_TRANSLATIONS=y
+CONFIG_VT_CONSOLE=y
+CONFIG_VT_CONSOLE_SLEEP=y
+CONFIG_HW_CONSOLE=y
+CONFIG_VT_HW_CONSOLE_BINDING=y
+CONFIG_UNIX98_PTYS=y
+CONFIG_DEVPTS_MULTIPLE_INSTANCES=y
+CONFIG_LEGACY_PTYS=y
+CONFIG_LEGACY_PTY_COUNT=256
+# CONFIG_SERIAL_NONSTANDARD is not set
+# CONFIG_NOZOMI is not set
+# CONFIG_N_GSM is not set
+# CONFIG_TRACE_SINK is not set
+CONFIG_DEVKMEM=y
+
+#
+# Serial drivers
+#
+CONFIG_SERIAL_8250=y
+# CONFIG_SERIAL_8250_DEPRECATED_OPTIONS is not set
+CONFIG_SERIAL_8250_PNP=y
+CONFIG_SERIAL_8250_CONSOLE=y
+CONFIG_FIX_EARLYCON_MEM=y
+CONFIG_SERIAL_8250_DMA=y
+CONFIG_SERIAL_8250_PCI=y
+CONFIG_SERIAL_8250_NR_UARTS=32
+CONFIG_SERIAL_8250_RUNTIME_UARTS=4
+CONFIG_SERIAL_8250_EXTENDED=y
+CONFIG_SERIAL_8250_MANY_PORTS=y
+CONFIG_SERIAL_8250_SHARE_IRQ=y
+CONFIG_SERIAL_8250_DETECT_IRQ=y
+CONFIG_SERIAL_8250_RSA=y
+# CONFIG_SERIAL_8250_DW is not set
+
+#
+# Non-8250 serial port support
+#
+# CONFIG_SERIAL_MAX3100 is not set
+# CONFIG_SERIAL_MAX310X is not set
+CONFIG_SERIAL_MFD_HSU=m
+CONFIG_SERIAL_CORE=y
+CONFIG_SERIAL_CORE_CONSOLE=y
+# CONFIG_SERIAL_JSM is not set
+# CONFIG_SERIAL_SCCNXP is not set
+# CONFIG_SERIAL_ALTERA_JTAGUART is not set
+# CONFIG_SERIAL_ALTERA_UART is not set
+# CONFIG_SERIAL_IFX6X60 is not set
+CONFIG_SERIAL_PCH_UART=m
+# CONFIG_SERIAL_ARC is not set
+# CONFIG_SERIAL_RP2 is not set
+# CONFIG_SERIAL_FSL_LPUART is not set
+# CONFIG_TTY_PRINTK is not set
+CONFIG_HVC_DRIVER=y
+CONFIG_HVC_IRQ=y
+CONFIG_HVC_XEN=y
+CONFIG_HVC_XEN_FRONTEND=y
+CONFIG_VIRTIO_CONSOLE=m
+# CONFIG_IPMI_HANDLER is not set
+CONFIG_HW_RANDOM=y
+CONFIG_HW_RANDOM_TIMERIOMEM=m
+CONFIG_HW_RANDOM_INTEL=m
+# CONFIG_HW_RANDOM_AMD is not set
+# CONFIG_HW_RANDOM_ATMEL is not set
+# CONFIG_HW_RANDOM_VIA is not set
+CONFIG_HW_RANDOM_VIRTIO=m
+# CONFIG_HW_RANDOM_EXYNOS is not set
+CONFIG_NVRAM=m
+# CONFIG_R3964 is not set
+# CONFIG_APPLICOM is not set
+# CONFIG_MWAVE is not set
+# CONFIG_RAW_DRIVER is not set
+CONFIG_HPET=y
+CONFIG_HPET_MMAP=y
+CONFIG_HPET_MMAP_DEFAULT=y
+CONFIG_HANGCHECK_TIMER=m
+# CONFIG_TCG_TPM is not set
+# CONFIG_TELCLOCK is not set
+CONFIG_DEVPORT=y
+CONFIG_I2C=y
+CONFIG_I2C_BOARDINFO=y
+CONFIG_I2C_COMPAT=y
+CONFIG_I2C_CHARDEV=m
+CONFIG_I2C_MUX=m
+
+#
+# Multiplexer I2C Chip support
+#
+# CONFIG_I2C_MUX_GPIO is not set
+# CONFIG_I2C_MUX_PCA9541 is not set
+# CONFIG_I2C_MUX_PCA954x is not set
+# CONFIG_I2C_MUX_PINCTRL is not set
+CONFIG_I2C_HELPER_AUTO=y
+CONFIG_I2C_ALGOBIT=y
+
+#
+# I2C Hardware Bus support
+#
+
+#
+# PC SMBus host controller drivers
+#
+# CONFIG_I2C_ALI1535 is not set
+# CONFIG_I2C_ALI1563 is not set
+# CONFIG_I2C_ALI15X3 is not set
+# CONFIG_I2C_AMD756 is not set
+# CONFIG_I2C_AMD8111 is not set
+CONFIG_I2C_I801=m
+CONFIG_I2C_ISCH=m
+CONFIG_I2C_ISMT=m
+CONFIG_I2C_PIIX4=m
+# CONFIG_I2C_NFORCE2 is not set
+# CONFIG_I2C_SIS5595 is not set
+# CONFIG_I2C_SIS630 is not set
+# CONFIG_I2C_SIS96X is not set
+# CONFIG_I2C_VIA is not set
+# CONFIG_I2C_VIAPRO is not set
+
+#
+# ACPI drivers
+#
+CONFIG_I2C_SCMI=m
+
+#
+# I2C system bus drivers (mostly embedded / system-on-chip)
+#
+CONFIG_I2C_CBUS_GPIO=m
+# CONFIG_I2C_DESIGNWARE_PLATFORM is not set
+# CONFIG_I2C_DESIGNWARE_PCI is not set
+CONFIG_I2C_EG20T=m
+CONFIG_I2C_GPIO=m
+# CONFIG_I2C_OCORES is not set
+# CONFIG_I2C_PCA_PLATFORM is not set
+# CONFIG_I2C_PXA_PCI is not set
+# CONFIG_I2C_SIMTEC is not set
+# CONFIG_I2C_XILINX is not set
+
+#
+# External I2C/SMBus adapter drivers
+#
+# CONFIG_I2C_DIOLAN_U2C is not set
+# CONFIG_I2C_PARPORT_LIGHT is not set
+# CONFIG_I2C_ROBOTFUZZ_OSIF is not set
+# CONFIG_I2C_TAOS_EVM is not set
+# CONFIG_I2C_TINY_USB is not set
+
+#
+# Other I2C/SMBus bus drivers
+#
+# CONFIG_I2C_STUB is not set
+# CONFIG_I2C_DEBUG_CORE is not set
+# CONFIG_I2C_DEBUG_ALGO is not set
+# CONFIG_I2C_DEBUG_BUS is not set
+CONFIG_SPI=y
+# CONFIG_SPI_DEBUG is not set
+CONFIG_SPI_MASTER=y
+
+#
+# SPI Master Controller Drivers
+#
+# CONFIG_SPI_ALTERA is not set
+CONFIG_SPI_BITBANG=m
+CONFIG_SPI_GPIO=m
+# CONFIG_SPI_OC_TINY is not set
+# CONFIG_SPI_PXA2XX is not set
+# CONFIG_SPI_PXA2XX_PCI is not set
+# CONFIG_SPI_SC18IS602 is not set
+# CONFIG_SPI_TOPCLIFF_PCH is not set
+# CONFIG_SPI_XCOMM is not set
+# CONFIG_SPI_XILINX is not set
+# CONFIG_SPI_DESIGNWARE is not set
+
+#
+# SPI Protocol Masters
+#
+CONFIG_SPI_SPIDEV=m
+# CONFIG_SPI_TLE62X0 is not set
+# CONFIG_SPMI is not set
+# CONFIG_HSI is not set
+
+#
+# PPS support
+#
+# CONFIG_PPS is not set
+
+#
+# PPS generators support
+#
+
+#
+# PTP clock support
+#
+# CONFIG_PTP_1588_CLOCK is not set
+# CONFIG_DP83640_PHY is not set
+# CONFIG_PTP_1588_CLOCK_PCH is not set
+CONFIG_PINCTRL=y
+
+#
+# Pin controllers
+#
+# CONFIG_PINMUX is not set
+# CONFIG_PINCONF is not set
+# CONFIG_DEBUG_PINCTRL is not set
+# CONFIG_PINCTRL_BAYTRAIL is not set
+CONFIG_ARCH_WANT_OPTIONAL_GPIOLIB=y
+CONFIG_GPIOLIB=y
+CONFIG_GPIO_DEVRES=y
+CONFIG_GPIO_ACPI=y
+# CONFIG_DEBUG_GPIO is not set
+CONFIG_GPIO_SYSFS=y
+CONFIG_GPIO_GENERIC=m
+
+#
+# Memory mapped GPIO drivers:
+#
+CONFIG_GPIO_GENERIC_PLATFORM=m
+# CONFIG_GPIO_IT8761E is not set
+# CONFIG_GPIO_F7188X is not set
+# CONFIG_GPIO_SCH311X is not set
+CONFIG_GPIO_SCH=m
+CONFIG_GPIO_ICH=m
+# CONFIG_GPIO_VX855 is not set
+CONFIG_GPIO_LYNXPOINT=y
+
+#
+# I2C GPIO expanders:
+#
+# CONFIG_GPIO_MAX7300 is not set
+# CONFIG_GPIO_MAX732X is not set
+# CONFIG_GPIO_PCA953X is not set
+# CONFIG_GPIO_PCF857X is not set
+# CONFIG_GPIO_SX150X is not set
+# CONFIG_GPIO_ADP5588 is not set
+
+#
+# PCI GPIO expanders:
+#
+# CONFIG_GPIO_BT8XX is not set
+# CONFIG_GPIO_AMD8111 is not set
+# CONFIG_GPIO_INTEL_MID is not set
+# CONFIG_GPIO_PCH is not set
+# CONFIG_GPIO_ML_IOH is not set
+# CONFIG_GPIO_RDC321X is not set
+
+#
+# SPI GPIO expanders:
+#
+# CONFIG_GPIO_MAX7301 is not set
+# CONFIG_GPIO_MC33880 is not set
+
+#
+# AC97 GPIO expanders:
+#
+
+#
+# LPC GPIO expanders:
+#
+
+#
+# MODULbus GPIO expanders:
+#
+
+#
+# USB GPIO expanders:
+#
+# CONFIG_W1 is not set
+CONFIG_POWER_SUPPLY=y
+# CONFIG_POWER_SUPPLY_DEBUG is not set
+CONFIG_PDA_POWER=m
+CONFIG_GENERIC_ADC_BATTERY=m
+CONFIG_TEST_POWER=m
+# CONFIG_BATTERY_DS2780 is not set
+# CONFIG_BATTERY_DS2781 is not set
+# CONFIG_BATTERY_DS2782 is not set
+# CONFIG_BATTERY_SBS is not set
+# CONFIG_BATTERY_BQ27x00 is not set
+# CONFIG_BATTERY_MAX17040 is not set
+# CONFIG_BATTERY_MAX17042 is not set
+# CONFIG_CHARGER_ISP1704 is not set
+# CONFIG_CHARGER_MAX8903 is not set
+# CONFIG_CHARGER_LP8727 is not set
+CONFIG_CHARGER_GPIO=m
+# CONFIG_CHARGER_MANAGER is not set
+# CONFIG_CHARGER_BQ2415X is not set
+# CONFIG_CHARGER_BQ24190 is not set
+# CONFIG_CHARGER_BQ24735 is not set
+# CONFIG_CHARGER_SMB347 is not set
+CONFIG_POWER_RESET=y
+CONFIG_POWER_AVS=y
+CONFIG_HWMON=y
+# CONFIG_HWMON_VID is not set
+# CONFIG_HWMON_DEBUG_CHIP is not set
+
+#
+# Native drivers
+#
+# CONFIG_SENSORS_ABITUGURU is not set
+# CONFIG_SENSORS_ABITUGURU3 is not set
+# CONFIG_SENSORS_AD7314 is not set
+# CONFIG_SENSORS_AD7414 is not set
+# CONFIG_SENSORS_AD7418 is not set
+# CONFIG_SENSORS_ADM1021 is not set
+# CONFIG_SENSORS_ADM1025 is not set
+# CONFIG_SENSORS_ADM1026 is not set
+# CONFIG_SENSORS_ADM1029 is not set
+# CONFIG_SENSORS_ADM1031 is not set
+# CONFIG_SENSORS_ADM9240 is not set
+# CONFIG_SENSORS_ADT7310 is not set
+# CONFIG_SENSORS_ADT7410 is not set
+# CONFIG_SENSORS_ADT7411 is not set
+# CONFIG_SENSORS_ADT7462 is not set
+# CONFIG_SENSORS_ADT7470 is not set
+# CONFIG_SENSORS_ADT7475 is not set
+# CONFIG_SENSORS_ASC7621 is not set
+# CONFIG_SENSORS_K8TEMP is not set
+# CONFIG_SENSORS_K10TEMP is not set
+# CONFIG_SENSORS_FAM15H_POWER is not set
+# CONFIG_SENSORS_APPLESMC is not set
+# CONFIG_SENSORS_ASB100 is not set
+# CONFIG_SENSORS_ATXP1 is not set
+# CONFIG_SENSORS_DS620 is not set
+# CONFIG_SENSORS_DS1621 is not set
+# CONFIG_SENSORS_I5K_AMB is not set
+# CONFIG_SENSORS_F71805F is not set
+# CONFIG_SENSORS_F71882FG is not set
+# CONFIG_SENSORS_F75375S is not set
+# CONFIG_SENSORS_FSCHMD is not set
+# CONFIG_SENSORS_GL518SM is not set
+# CONFIG_SENSORS_GL520SM is not set
+# CONFIG_SENSORS_G760A is not set
+# CONFIG_SENSORS_G762 is not set
+CONFIG_SENSORS_GPIO_FAN=m
+# CONFIG_SENSORS_HIH6130 is not set
+CONFIG_SENSORS_IIO_HWMON=m
+CONFIG_SENSORS_CORETEMP=y
+# CONFIG_SENSORS_IT87 is not set
+# CONFIG_SENSORS_JC42 is not set
+# CONFIG_SENSORS_LINEAGE is not set
+# CONFIG_SENSORS_LTC2945 is not set
+# CONFIG_SENSORS_LTC4151 is not set
+# CONFIG_SENSORS_LTC4215 is not set
+# CONFIG_SENSORS_LTC4222 is not set
+# CONFIG_SENSORS_LTC4245 is not set
+# CONFIG_SENSORS_LTC4260 is not set
+# CONFIG_SENSORS_LTC4261 is not set
+# CONFIG_SENSORS_MAX1111 is not set
+# CONFIG_SENSORS_MAX16065 is not set
+# CONFIG_SENSORS_MAX1619 is not set
+# CONFIG_SENSORS_MAX1668 is not set
+# CONFIG_SENSORS_MAX197 is not set
+# CONFIG_SENSORS_MAX6639 is not set
+# CONFIG_SENSORS_MAX6642 is not set
+# CONFIG_SENSORS_MAX6650 is not set
+# CONFIG_SENSORS_MAX6697 is not set
+# CONFIG_SENSORS_HTU21 is not set
+# CONFIG_SENSORS_MCP3021 is not set
+# CONFIG_SENSORS_ADCXX is not set
+# CONFIG_SENSORS_LM63 is not set
+# CONFIG_SENSORS_LM70 is not set
+# CONFIG_SENSORS_LM73 is not set
+# CONFIG_SENSORS_LM75 is not set
+# CONFIG_SENSORS_LM77 is not set
+# CONFIG_SENSORS_LM78 is not set
+# CONFIG_SENSORS_LM80 is not set
+# CONFIG_SENSORS_LM83 is not set
+# CONFIG_SENSORS_LM85 is not set
+# CONFIG_SENSORS_LM87 is not set
+# CONFIG_SENSORS_LM90 is not set
+# CONFIG_SENSORS_LM92 is not set
+# CONFIG_SENSORS_LM93 is not set
+# CONFIG_SENSORS_LM95234 is not set
+# CONFIG_SENSORS_LM95241 is not set
+# CONFIG_SENSORS_LM95245 is not set
+# CONFIG_SENSORS_PC87360 is not set
+# CONFIG_SENSORS_PC87427 is not set
+# CONFIG_SENSORS_NTC_THERMISTOR is not set
+# CONFIG_SENSORS_NCT6775 is not set
+# CONFIG_SENSORS_PCF8591 is not set
+CONFIG_PMBUS=m
+CONFIG_SENSORS_PMBUS=m
+# CONFIG_SENSORS_ADM1275 is not set
+# CONFIG_SENSORS_LM25066 is not set
+# CONFIG_SENSORS_LTC2978 is not set
+# CONFIG_SENSORS_MAX16064 is not set
+# CONFIG_SENSORS_MAX34440 is not set
+# CONFIG_SENSORS_MAX8688 is not set
+# CONFIG_SENSORS_UCD9000 is not set
+# CONFIG_SENSORS_UCD9200 is not set
+# CONFIG_SENSORS_ZL6100 is not set
+# CONFIG_SENSORS_SHT15 is not set
+# CONFIG_SENSORS_SHT21 is not set
+# CONFIG_SENSORS_SIS5595 is not set
+# CONFIG_SENSORS_DME1737 is not set
+# CONFIG_SENSORS_EMC1403 is not set
+# CONFIG_SENSORS_EMC2103 is not set
+# CONFIG_SENSORS_EMC6W201 is not set
+# CONFIG_SENSORS_SMSC47M1 is not set
+# CONFIG_SENSORS_SMSC47M192 is not set
+# CONFIG_SENSORS_SMSC47B397 is not set
+# CONFIG_SENSORS_SCH56XX_COMMON is not set
+# CONFIG_SENSORS_SCH5627 is not set
+# CONFIG_SENSORS_SCH5636 is not set
+# CONFIG_SENSORS_SMM665 is not set
+# CONFIG_SENSORS_ADC128D818 is not set
+# CONFIG_SENSORS_ADS1015 is not set
+# CONFIG_SENSORS_ADS7828 is not set
+# CONFIG_SENSORS_ADS7871 is not set
+# CONFIG_SENSORS_AMC6821 is not set
+# CONFIG_SENSORS_INA209 is not set
+# CONFIG_SENSORS_INA2XX is not set
+# CONFIG_SENSORS_THMC50 is not set
+# CONFIG_SENSORS_TMP102 is not set
+# CONFIG_SENSORS_TMP401 is not set
+# CONFIG_SENSORS_TMP421 is not set
+# CONFIG_SENSORS_VIA_CPUTEMP is not set
+# CONFIG_SENSORS_VIA686A is not set
+# CONFIG_SENSORS_VT1211 is not set
+# CONFIG_SENSORS_VT8231 is not set
+# CONFIG_SENSORS_W83781D is not set
+# CONFIG_SENSORS_W83791D is not set
+# CONFIG_SENSORS_W83792D is not set
+# CONFIG_SENSORS_W83793 is not set
+# CONFIG_SENSORS_W83795 is not set
+# CONFIG_SENSORS_W83L785TS is not set
+# CONFIG_SENSORS_W83L786NG is not set
+# CONFIG_SENSORS_W83627HF is not set
+# CONFIG_SENSORS_W83627EHF is not set
+
+#
+# ACPI drivers
+#
+CONFIG_SENSORS_ACPI_POWER=m
+# CONFIG_SENSORS_ATK0110 is not set
+CONFIG_THERMAL=y
+CONFIG_THERMAL_HWMON=y
+CONFIG_THERMAL_DEFAULT_GOV_STEP_WISE=y
+# CONFIG_THERMAL_DEFAULT_GOV_FAIR_SHARE is not set
+# CONFIG_THERMAL_DEFAULT_GOV_USER_SPACE is not set
+CONFIG_THERMAL_GOV_FAIR_SHARE=y
+CONFIG_THERMAL_GOV_STEP_WISE=y
+CONFIG_THERMAL_GOV_USER_SPACE=y
+# CONFIG_THERMAL_EMULATION is not set
+CONFIG_INTEL_POWERCLAMP=m
+CONFIG_X86_PKG_TEMP_THERMAL=m
+CONFIG_ACPI_INT3403_THERMAL=m
+
+#
+# Texas Instruments thermal drivers
+#
+CONFIG_WATCHDOG=y
+CONFIG_WATCHDOG_CORE=y
+# CONFIG_WATCHDOG_NOWAYOUT is not set
+
+#
+# Watchdog Device Drivers
+#
+CONFIG_SOFT_WATCHDOG=m
+# CONFIG_XILINX_WATCHDOG is not set
+# CONFIG_DW_WATCHDOG is not set
+# CONFIG_ACQUIRE_WDT is not set
+# CONFIG_ADVANTECH_WDT is not set
+# CONFIG_ALIM1535_WDT is not set
+# CONFIG_ALIM7101_WDT is not set
+# CONFIG_F71808E_WDT is not set
+# CONFIG_SP5100_TCO is not set
+# CONFIG_SBC_FITPC2_WATCHDOG is not set
+# CONFIG_EUROTECH_WDT is not set
+# CONFIG_IB700_WDT is not set
+# CONFIG_IBMASR is not set
+# CONFIG_WAFER_WDT is not set
+# CONFIG_I6300ESB_WDT is not set
+# CONFIG_IE6XX_WDT is not set
+CONFIG_ITCO_WDT=m
+CONFIG_ITCO_VENDOR_SUPPORT=y
+# CONFIG_IT8712F_WDT is not set
+# CONFIG_IT87_WDT is not set
+# CONFIG_HP_WATCHDOG is not set
+# CONFIG_SC1200_WDT is not set
+# CONFIG_PC87413_WDT is not set
+# CONFIG_NV_TCO is not set
+# CONFIG_60XX_WDT is not set
+# CONFIG_SBC8360_WDT is not set
+# CONFIG_CPU5_WDT is not set
+# CONFIG_SMSC_SCH311X_WDT is not set
+# CONFIG_SMSC37B787_WDT is not set
+# CONFIG_VIA_WDT is not set
+# CONFIG_W83627HF_WDT is not set
+# CONFIG_W83697HF_WDT is not set
+# CONFIG_W83697UG_WDT is not set
+# CONFIG_W83877F_WDT is not set
+# CONFIG_W83977F_WDT is not set
+# CONFIG_MACHZ_WDT is not set
+# CONFIG_SBC_EPX_C3_WATCHDOG is not set
+# CONFIG_MEN_A21_WDT is not set
+CONFIG_XEN_WDT=m
+
+#
+# PCI-based Watchdog Cards
+#
+# CONFIG_PCIPCWATCHDOG is not set
+# CONFIG_WDTPCI is not set
+
+#
+# USB-based Watchdog Cards
+#
+# CONFIG_USBPCWATCHDOG is not set
+CONFIG_SSB_POSSIBLE=y
+
+#
+# Sonics Silicon Backplane
+#
+CONFIG_SSB=m
+CONFIG_SSB_SPROM=y
+CONFIG_SSB_PCIHOST_POSSIBLE=y
+CONFIG_SSB_PCIHOST=y
+# CONFIG_SSB_B43_PCI_BRIDGE is not set
+# CONFIG_SSB_SILENT is not set
+# CONFIG_SSB_DEBUG is not set
+CONFIG_SSB_DRIVER_PCICORE_POSSIBLE=y
+CONFIG_SSB_DRIVER_PCICORE=y
+CONFIG_SSB_DRIVER_GPIO=y
+CONFIG_BCMA_POSSIBLE=y
+
+#
+# Broadcom specific AMBA
+#
+# CONFIG_BCMA is not set
+
+#
+# Multifunction device drivers
+#
+CONFIG_MFD_CORE=m
+# CONFIG_MFD_CS5535 is not set
+# CONFIG_MFD_AS3711 is not set
+# CONFIG_PMIC_ADP5520 is not set
+# CONFIG_MFD_AAT2870_CORE is not set
+# CONFIG_MFD_BCM590XX is not set
+# CONFIG_MFD_CROS_EC is not set
+# CONFIG_PMIC_DA903X is not set
+# CONFIG_MFD_DA9052_SPI is not set
+# CONFIG_MFD_DA9052_I2C is not set
+# CONFIG_MFD_DA9055 is not set
+# CONFIG_MFD_DA9063 is not set
+# CONFIG_MFD_MC13XXX_SPI is not set
+# CONFIG_MFD_MC13XXX_I2C is not set
+# CONFIG_HTC_PASIC3 is not set
+# CONFIG_HTC_I2CPLD is not set
+CONFIG_LPC_ICH=m
+CONFIG_LPC_SCH=m
+# CONFIG_MFD_JANZ_CMODIO is not set
+# CONFIG_MFD_KEMPLD is not set
+# CONFIG_MFD_88PM800 is not set
+# CONFIG_MFD_88PM805 is not set
+# CONFIG_MFD_88PM860X is not set
+# CONFIG_MFD_MAX14577 is not set
+# CONFIG_MFD_MAX77686 is not set
+# CONFIG_MFD_MAX77693 is not set
+# CONFIG_MFD_MAX8907 is not set
+# CONFIG_MFD_MAX8925 is not set
+# CONFIG_MFD_MAX8997 is not set
+# CONFIG_MFD_MAX8998 is not set
+# CONFIG_EZX_PCAP is not set
+# CONFIG_MFD_VIPERBOARD is not set
+# CONFIG_MFD_RETU is not set
+# CONFIG_MFD_PCF50633 is not set
+# CONFIG_MFD_RDC321X is not set
+# CONFIG_MFD_RTSX_PCI is not set
+CONFIG_MFD_RTSX_USB=m
+# CONFIG_MFD_RC5T583 is not set
+# CONFIG_MFD_SEC_CORE is not set
+# CONFIG_MFD_SI476X_CORE is not set
+# CONFIG_MFD_SM501 is not set
+# CONFIG_MFD_SMSC is not set
+# CONFIG_ABX500_CORE is not set
+# CONFIG_MFD_STMPE is not set
+CONFIG_MFD_SYSCON=y
+# CONFIG_MFD_TI_AM335X_TSCADC is not set
+# CONFIG_MFD_LP3943 is not set
+# CONFIG_MFD_LP8788 is not set
+# CONFIG_MFD_PALMAS is not set
+# CONFIG_TPS6105X is not set
+# CONFIG_TPS65010 is not set
+# CONFIG_TPS6507X is not set
+# CONFIG_MFD_TPS65090 is not set
+# CONFIG_MFD_TPS65217 is not set
+# CONFIG_MFD_TPS65218 is not set
+# CONFIG_MFD_TPS6586X is not set
+# CONFIG_MFD_TPS65910 is not set
+# CONFIG_MFD_TPS65912 is not set
+# CONFIG_MFD_TPS65912_I2C is not set
+# CONFIG_MFD_TPS65912_SPI is not set
+# CONFIG_MFD_TPS80031 is not set
+# CONFIG_TWL4030_CORE is not set
+# CONFIG_TWL6040_CORE is not set
+# CONFIG_MFD_WL1273_CORE is not set
+# CONFIG_MFD_LM3533 is not set
+# CONFIG_MFD_TIMBERDALE is not set
+# CONFIG_MFD_TC3589X is not set
+# CONFIG_MFD_TMIO is not set
+# CONFIG_MFD_VX855 is not set
+# CONFIG_MFD_ARIZONA_I2C is not set
+# CONFIG_MFD_ARIZONA_SPI is not set
+# CONFIG_MFD_WM8400 is not set
+# CONFIG_MFD_WM831X_I2C is not set
+# CONFIG_MFD_WM831X_SPI is not set
+# CONFIG_MFD_WM8350_I2C is not set
+# CONFIG_MFD_WM8994 is not set
+CONFIG_REGULATOR=y
+# CONFIG_REGULATOR_DEBUG is not set
+CONFIG_REGULATOR_FIXED_VOLTAGE=m
+CONFIG_REGULATOR_VIRTUAL_CONSUMER=m
+CONFIG_REGULATOR_USERSPACE_CONSUMER=m
+# CONFIG_REGULATOR_ACT8865 is not set
+# CONFIG_REGULATOR_AD5398 is not set
+# CONFIG_REGULATOR_ANATOP is not set
+# CONFIG_REGULATOR_DA9210 is not set
+# CONFIG_REGULATOR_FAN53555 is not set
+CONFIG_REGULATOR_GPIO=m
+# CONFIG_REGULATOR_ISL6271A is not set
+# CONFIG_REGULATOR_LP3971 is not set
+# CONFIG_REGULATOR_LP3972 is not set
+# CONFIG_REGULATOR_LP872X is not set
+# CONFIG_REGULATOR_LP8755 is not set
+# CONFIG_REGULATOR_MAX1586 is not set
+# CONFIG_REGULATOR_MAX8649 is not set
+# CONFIG_REGULATOR_MAX8660 is not set
+# CONFIG_REGULATOR_MAX8952 is not set
+# CONFIG_REGULATOR_MAX8973 is not set
+# CONFIG_REGULATOR_PFUZE100 is not set
+# CONFIG_REGULATOR_TPS51632 is not set
+# CONFIG_REGULATOR_TPS62360 is not set
+# CONFIG_REGULATOR_TPS65023 is not set
+# CONFIG_REGULATOR_TPS6507X is not set
+# CONFIG_REGULATOR_TPS6524X is not set
+CONFIG_MEDIA_SUPPORT=m
+
+#
+# Multimedia core support
+#
+CONFIG_MEDIA_CAMERA_SUPPORT=y
+# CONFIG_MEDIA_ANALOG_TV_SUPPORT is not set
+# CONFIG_MEDIA_DIGITAL_TV_SUPPORT is not set
+# CONFIG_MEDIA_RADIO_SUPPORT is not set
+# CONFIG_MEDIA_RC_SUPPORT is not set
+CONFIG_MEDIA_CONTROLLER=y
+CONFIG_VIDEO_DEV=m
+CONFIG_VIDEO_V4L2_SUBDEV_API=y
+CONFIG_VIDEO_V4L2=m
+# CONFIG_VIDEO_ADV_DEBUG is not set
+# CONFIG_VIDEO_FIXED_MINOR_RANGES is not set
+CONFIG_V4L2_MEM2MEM_DEV=m
+CONFIG_VIDEOBUF2_CORE=m
+CONFIG_VIDEOBUF2_MEMOPS=m
+CONFIG_VIDEOBUF2_DMA_CONTIG=m
+CONFIG_VIDEOBUF2_VMALLOC=m
+# CONFIG_TTPCI_EEPROM is not set
+
+#
+# Media drivers
+#
+CONFIG_MEDIA_USB_SUPPORT=y
+
+#
+# Webcam devices
+#
+CONFIG_USB_VIDEO_CLASS=m
+CONFIG_USB_VIDEO_CLASS_INPUT_EVDEV=y
+# CONFIG_USB_GSPCA is not set
+# CONFIG_USB_PWC is not set
+# CONFIG_VIDEO_CPIA2 is not set
+# CONFIG_USB_ZR364XX is not set
+# CONFIG_USB_STKWEBCAM is not set
+# CONFIG_USB_S2255 is not set
+# CONFIG_VIDEO_USBTV is not set
+
+#
+# Webcam, TV (analog/digital) USB devices
+#
+# CONFIG_VIDEO_EM28XX is not set
+CONFIG_MEDIA_PCI_SUPPORT=y
+
+#
+# Media capture support
+#
+# CONFIG_V4L_PLATFORM_DRIVERS is not set
+CONFIG_V4L_MEM2MEM_DRIVERS=y
+CONFIG_VIDEO_MEM2MEM_DEINTERLACE=m
+# CONFIG_VIDEO_SH_VEU is not set
+# CONFIG_VIDEO_RENESAS_VSP1 is not set
+# CONFIG_V4L_TEST_DRIVERS is not set
+
+#
+# Supported MMC/SDIO adapters
+#
+# CONFIG_CYPRESS_FIRMWARE is not set
+
+#
+# Media ancillary drivers (tuners, sensors, i2c, frontends)
+#
+CONFIG_MEDIA_SUBDRV_AUTOSELECT=y
+
+#
+# Audio decoders, processors and mixers
+#
+
+#
+# RDS decoders
+#
+
+#
+# Video decoders
+#
+
+#
+# Video and audio decoders
+#
+
+#
+# Video encoders
+#
+
+#
+# Camera sensor devices
+#
+
+#
+# Flash devices
+#
+
+#
+# Video improvement chips
+#
+
+#
+# Audio/Video compression chips
+#
+
+#
+# Miscellaneous helper chips
+#
+
+#
+# Sensors used on soc_camera driver
+#
+
+#
+# Tools to develop new frontends
+#
+# CONFIG_DVB_DUMMY_FE is not set
+
+#
+# Graphics support
+#
+CONFIG_AGP=y
+CONFIG_AGP_INTEL=y
+# CONFIG_AGP_SIS is not set
+# CONFIG_AGP_VIA is not set
+CONFIG_INTEL_GTT=y
+CONFIG_VGA_ARB=y
+CONFIG_VGA_ARB_MAX_GPUS=16
+# CONFIG_VGA_SWITCHEROO is not set
+
+#
+# Direct Rendering Manager
+#
+CONFIG_DRM=y
+CONFIG_DRM_KMS_HELPER=y
+CONFIG_DRM_KMS_FB_HELPER=y
+CONFIG_DRM_LOAD_EDID_FIRMWARE=y
+CONFIG_DRM_TTM=m
+
+#
+# I2C encoder or helper chips
+#
+# CONFIG_DRM_I2C_CH7006 is not set
+# CONFIG_DRM_I2C_SIL164 is not set
+# CONFIG_DRM_I2C_NXP_TDA998X is not set
+# CONFIG_DRM_TDFX is not set
+# CONFIG_DRM_R128 is not set
+# CONFIG_DRM_RADEON is not set
+# CONFIG_DRM_NOUVEAU is not set
+CONFIG_DRM_I915=y
+CONFIG_DRM_I915_KMS=y
+CONFIG_DRM_I915_FBDEV=y
+# CONFIG_DRM_I915_PRELIMINARY_HW_SUPPORT is not set
+# CONFIG_DRM_I915_UMS is not set
+# CONFIG_DRM_MGA is not set
+# CONFIG_DRM_SIS is not set
+# CONFIG_DRM_VIA is not set
+# CONFIG_DRM_SAVAGE is not set
+CONFIG_DRM_VMWGFX=m
+# CONFIG_DRM_VMWGFX_FBCON is not set
+# CONFIG_DRM_GMA500 is not set
+# CONFIG_DRM_UDL is not set
+# CONFIG_DRM_AST is not set
+# CONFIG_DRM_MGAG200 is not set
+CONFIG_DRM_CIRRUS_QEMU=m
+CONFIG_DRM_QXL=m
+CONFIG_DRM_BOCHS=m
+# CONFIG_DRM_PTN3460 is not set
+
+#
+# Frame buffer Devices
+#
+CONFIG_FB=y
+CONFIG_FIRMWARE_EDID=y
+# CONFIG_FB_DDC is not set
+# CONFIG_FB_BOOT_VESA_SUPPORT is not set
+CONFIG_FB_CFB_FILLRECT=y
+CONFIG_FB_CFB_COPYAREA=y
+CONFIG_FB_CFB_IMAGEBLIT=y
+# CONFIG_FB_CFB_REV_PIXELS_IN_BYTE is not set
+CONFIG_FB_SYS_FILLRECT=m
+CONFIG_FB_SYS_COPYAREA=m
+CONFIG_FB_SYS_IMAGEBLIT=m
+# CONFIG_FB_FOREIGN_ENDIAN is not set
+CONFIG_FB_SYS_FOPS=m
+CONFIG_FB_DEFERRED_IO=y
+# CONFIG_FB_SVGALIB is not set
+# CONFIG_FB_MACMODES is not set
+# CONFIG_FB_BACKLIGHT is not set
+CONFIG_FB_MODE_HELPERS=y
+CONFIG_FB_TILEBLITTING=y
+
+#
+# Frame buffer hardware drivers
+#
+# CONFIG_FB_CIRRUS is not set
+# CONFIG_FB_PM2 is not set
+# CONFIG_FB_CYBER2000 is not set
+# CONFIG_FB_ARC is not set
+# CONFIG_FB_ASILIANT is not set
+# CONFIG_FB_IMSTT is not set
+# CONFIG_FB_VGA16 is not set
+# CONFIG_FB_UVESA is not set
+# CONFIG_FB_VESA is not set
+CONFIG_FB_EFI=y
+# CONFIG_FB_N411 is not set
+# CONFIG_FB_HGA is not set
+# CONFIG_FB_OPENCORES is not set
+# CONFIG_FB_S1D13XXX is not set
+# CONFIG_FB_NVIDIA is not set
+# CONFIG_FB_RIVA is not set
+# CONFIG_FB_I740 is not set
+# CONFIG_FB_LE80578 is not set
+# CONFIG_FB_MATROX is not set
+# CONFIG_FB_RADEON is not set
+# CONFIG_FB_ATY128 is not set
+# CONFIG_FB_ATY is not set
+# CONFIG_FB_S3 is not set
+# CONFIG_FB_SAVAGE is not set
+# CONFIG_FB_SIS is not set
+# CONFIG_FB_VIA is not set
+# CONFIG_FB_NEOMAGIC is not set
+# CONFIG_FB_KYRO is not set
+# CONFIG_FB_3DFX is not set
+# CONFIG_FB_VOODOO1 is not set
+# CONFIG_FB_VT8623 is not set
+# CONFIG_FB_TRIDENT is not set
+# CONFIG_FB_ARK is not set
+# CONFIG_FB_PM3 is not set
+# CONFIG_FB_CARMINE is not set
+# CONFIG_FB_TMIO is not set
+# CONFIG_FB_SMSCUFX is not set
+# CONFIG_FB_UDL is not set
+# CONFIG_FB_GOLDFISH is not set
+CONFIG_FB_VIRTUAL=m
+CONFIG_XEN_FBDEV_FRONTEND=m
+# CONFIG_FB_METRONOME is not set
+# CONFIG_FB_MB862XX is not set
+# CONFIG_FB_BROADSHEET is not set
+# CONFIG_FB_AUO_K190X is not set
+# CONFIG_FB_SIMPLE is not set
+# CONFIG_EXYNOS_VIDEO is not set
+CONFIG_BACKLIGHT_LCD_SUPPORT=y
+CONFIG_LCD_CLASS_DEVICE=m
+# CONFIG_LCD_L4F00242T03 is not set
+# CONFIG_LCD_LMS283GF05 is not set
+# CONFIG_LCD_LTV350QV is not set
+# CONFIG_LCD_ILI922X is not set
+# CONFIG_LCD_ILI9320 is not set
+# CONFIG_LCD_TDO24M is not set
+# CONFIG_LCD_VGG2432A4 is not set
+CONFIG_LCD_PLATFORM=m
+# CONFIG_LCD_S6E63M0 is not set
+# CONFIG_LCD_LD9040 is not set
+# CONFIG_LCD_AMS369FG06 is not set
+# CONFIG_LCD_LMS501KF03 is not set
+# CONFIG_LCD_HX8357 is not set
+CONFIG_BACKLIGHT_CLASS_DEVICE=y
+CONFIG_BACKLIGHT_GENERIC=m
+CONFIG_BACKLIGHT_PWM=m
+# CONFIG_BACKLIGHT_APPLE is not set
+# CONFIG_BACKLIGHT_SAHARA is not set
+# CONFIG_BACKLIGHT_ADP8860 is not set
+# CONFIG_BACKLIGHT_ADP8870 is not set
+# CONFIG_BACKLIGHT_LM3630A is not set
+# CONFIG_BACKLIGHT_LM3639 is not set
+# CONFIG_BACKLIGHT_LP855X is not set
+CONFIG_BACKLIGHT_GPIO=m
+# CONFIG_BACKLIGHT_LV5207LP is not set
+# CONFIG_BACKLIGHT_BD6107 is not set
+# CONFIG_VGASTATE is not set
+CONFIG_HDMI=y
+
+#
+# Console display driver support
+#
+CONFIG_VGA_CONSOLE=y
+CONFIG_VGACON_SOFT_SCROLLBACK=y
+CONFIG_VGACON_SOFT_SCROLLBACK_SIZE=128
+CONFIG_DUMMY_CONSOLE=y
+CONFIG_FRAMEBUFFER_CONSOLE=y
+CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y
+CONFIG_FRAMEBUFFER_CONSOLE_ROTATION=y
+CONFIG_LOGO=y
+# CONFIG_LOGO_LINUX_MONO is not set
+# CONFIG_LOGO_LINUX_VGA16 is not set
+CONFIG_LOGO_LINUX_CLUT224=y
+CONFIG_SOUND=m
+CONFIG_SOUND_OSS_CORE=y
+# CONFIG_SOUND_OSS_CORE_PRECLAIM is not set
+CONFIG_SND=m
+CONFIG_SND_TIMER=m
+CONFIG_SND_PCM=m
+CONFIG_SND_HWDEP=m
+CONFIG_SND_RAWMIDI=m
+CONFIG_SND_JACK=y
+CONFIG_SND_SEQUENCER=m
+CONFIG_SND_SEQ_DUMMY=m
+CONFIG_SND_OSSEMUL=y
+CONFIG_SND_MIXER_OSS=m
+CONFIG_SND_PCM_OSS=m
+CONFIG_SND_PCM_OSS_PLUGINS=y
+CONFIG_SND_SEQUENCER_OSS=y
+CONFIG_SND_HRTIMER=m
+CONFIG_SND_SEQ_HRTIMER_DEFAULT=y
+CONFIG_SND_DYNAMIC_MINORS=y
+CONFIG_SND_MAX_CARDS=32
+CONFIG_SND_SUPPORT_OLD_API=y
+CONFIG_SND_VERBOSE_PROCFS=y
+# CONFIG_SND_VERBOSE_PRINTK is not set
+# CONFIG_SND_DEBUG is not set
+CONFIG_SND_VMASTER=y
+CONFIG_SND_KCTL_JACK=y
+CONFIG_SND_DMA_SGBUF=y
+CONFIG_SND_RAWMIDI_SEQ=m
+# CONFIG_SND_OPL3_LIB_SEQ is not set
+# CONFIG_SND_OPL4_LIB_SEQ is not set
+# CONFIG_SND_SBAWE_SEQ is not set
+# CONFIG_SND_EMU10K1_SEQ is not set
+CONFIG_SND_MPU401_UART=m
+CONFIG_SND_DRIVERS=y
+CONFIG_SND_DUMMY=m
+CONFIG_SND_ALOOP=m
+CONFIG_SND_VIRMIDI=m
+CONFIG_SND_MTPAV=m
+CONFIG_SND_SERIAL_U16550=m
+CONFIG_SND_MPU401=m
+CONFIG_SND_PCI=y
+# CONFIG_SND_AD1889 is not set
+# CONFIG_SND_ALS300 is not set
+# CONFIG_SND_ALS4000 is not set
+# CONFIG_SND_ALI5451 is not set
+# CONFIG_SND_ASIHPI is not set
+# CONFIG_SND_ATIIXP is not set
+# CONFIG_SND_ATIIXP_MODEM is not set
+# CONFIG_SND_AU8810 is not set
+# CONFIG_SND_AU8820 is not set
+# CONFIG_SND_AU8830 is not set
+# CONFIG_SND_AW2 is not set
+# CONFIG_SND_AZT3328 is not set
+# CONFIG_SND_BT87X is not set
+# CONFIG_SND_CA0106 is not set
+# CONFIG_SND_CMIPCI is not set
+# CONFIG_SND_OXYGEN is not set
+# CONFIG_SND_CS4281 is not set
+# CONFIG_SND_CS46XX is not set
+# CONFIG_SND_CTXFI is not set
+# CONFIG_SND_DARLA20 is not set
+# CONFIG_SND_GINA20 is not set
+# CONFIG_SND_LAYLA20 is not set
+# CONFIG_SND_DARLA24 is not set
+# CONFIG_SND_GINA24 is not set
+# CONFIG_SND_LAYLA24 is not set
+# CONFIG_SND_MONA is not set
+# CONFIG_SND_MIA is not set
+# CONFIG_SND_ECHO3G is not set
+# CONFIG_SND_INDIGO is not set
+# CONFIG_SND_INDIGOIO is not set
+# CONFIG_SND_INDIGODJ is not set
+# CONFIG_SND_INDIGOIOX is not set
+# CONFIG_SND_INDIGODJX is not set
+# CONFIG_SND_EMU10K1 is not set
+# CONFIG_SND_EMU10K1X is not set
+# CONFIG_SND_ENS1370 is not set
+# CONFIG_SND_ENS1371 is not set
+# CONFIG_SND_ES1938 is not set
+# CONFIG_SND_ES1968 is not set
+# CONFIG_SND_FM801 is not set
+# CONFIG_SND_HDSP is not set
+# CONFIG_SND_HDSPM is not set
+# CONFIG_SND_ICE1712 is not set
+# CONFIG_SND_ICE1724 is not set
+# CONFIG_SND_INTEL8X0 is not set
+# CONFIG_SND_INTEL8X0M is not set
+# CONFIG_SND_KORG1212 is not set
+# CONFIG_SND_LOLA is not set
+# CONFIG_SND_LX6464ES is not set
+# CONFIG_SND_MAESTRO3 is not set
+# CONFIG_SND_MIXART is not set
+# CONFIG_SND_NM256 is not set
+# CONFIG_SND_PCXHR is not set
+# CONFIG_SND_RIPTIDE is not set
+# CONFIG_SND_RME32 is not set
+# CONFIG_SND_RME96 is not set
+# CONFIG_SND_RME9652 is not set
+# CONFIG_SND_SONICVIBES is not set
+# CONFIG_SND_TRIDENT is not set
+# CONFIG_SND_VIA82XX is not set
+# CONFIG_SND_VIA82XX_MODEM is not set
+# CONFIG_SND_VIRTUOSO is not set
+# CONFIG_SND_VX222 is not set
+# CONFIG_SND_YMFPCI is not set
+
+#
+# HD-Audio
+#
+CONFIG_SND_HDA=m
+CONFIG_SND_HDA_INTEL=m
+CONFIG_SND_HDA_PREALLOC_SIZE=4096
+CONFIG_SND_HDA_HWDEP=y
+CONFIG_SND_HDA_RECONFIG=y
+CONFIG_SND_HDA_INPUT_BEEP=y
+CONFIG_SND_HDA_INPUT_BEEP_MODE=1
+CONFIG_SND_HDA_INPUT_JACK=y
+CONFIG_SND_HDA_PATCH_LOADER=y
+# CONFIG_SND_HDA_CODEC_REALTEK is not set
+# CONFIG_SND_HDA_CODEC_ANALOG is not set
+# CONFIG_SND_HDA_CODEC_SIGMATEL is not set
+# CONFIG_SND_HDA_CODEC_VIA is not set
+CONFIG_SND_HDA_CODEC_HDMI=m
+CONFIG_SND_HDA_I915=y
+CONFIG_SND_HDA_CODEC_CIRRUS=m
+# CONFIG_SND_HDA_CODEC_CONEXANT is not set
+# CONFIG_SND_HDA_CODEC_CA0110 is not set
+# CONFIG_SND_HDA_CODEC_CA0132 is not set
+# CONFIG_SND_HDA_CODEC_CMEDIA is not set
+# CONFIG_SND_HDA_CODEC_SI3054 is not set
+CONFIG_SND_HDA_GENERIC=m
+CONFIG_SND_HDA_POWER_SAVE_DEFAULT=0
+# CONFIG_SND_SPI is not set
+# CONFIG_SND_USB is not set
+# CONFIG_SND_SOC is not set
+# CONFIG_SOUND_PRIME is not set
+
+#
+# HID support
+#
+CONFIG_HID=m
+CONFIG_HIDRAW=y
+CONFIG_UHID=m
+CONFIG_HID_GENERIC=m
+
+#
+# Special HID drivers
+#
+# CONFIG_HID_A4TECH is not set
+# CONFIG_HID_ACRUX is not set
+# CONFIG_HID_APPLE is not set
+# CONFIG_HID_APPLEIR is not set
+# CONFIG_HID_AUREAL is not set
+# CONFIG_HID_BELKIN is not set
+# CONFIG_HID_CHERRY is not set
+# CONFIG_HID_CHICONY is not set
+# CONFIG_HID_PRODIKEYS is not set
+# CONFIG_HID_CP2112 is not set
+# CONFIG_HID_CYPRESS is not set
+# CONFIG_HID_DRAGONRISE is not set
+# CONFIG_HID_EMS_FF is not set
+# CONFIG_HID_ELECOM is not set
+# CONFIG_HID_ELO is not set
+# CONFIG_HID_EZKEY is not set
+# CONFIG_HID_HOLTEK is not set
+# CONFIG_HID_HUION is not set
+# CONFIG_HID_KEYTOUCH is not set
+# CONFIG_HID_KYE is not set
+# CONFIG_HID_UCLOGIC is not set
+# CONFIG_HID_WALTOP is not set
+# CONFIG_HID_GYRATION is not set
+# CONFIG_HID_ICADE is not set
+# CONFIG_HID_TWINHAN is not set
+# CONFIG_HID_KENSINGTON is not set
+# CONFIG_HID_LCPOWER is not set
+# CONFIG_HID_LENOVO_TPKBD is not set
+CONFIG_HID_LOGITECH=m
+CONFIG_HID_LOGITECH_DJ=m
+# CONFIG_LOGITECH_FF is not set
+# CONFIG_LOGIRUMBLEPAD2_FF is not set
+# CONFIG_LOGIG940_FF is not set
+# CONFIG_LOGIWHEELS_FF is not set
+# CONFIG_HID_MAGICMOUSE is not set
+# CONFIG_HID_MICROSOFT is not set
+# CONFIG_HID_MONTEREY is not set
+# CONFIG_HID_MULTITOUCH is not set
+# CONFIG_HID_NTRIG is not set
+# CONFIG_HID_ORTEK is not set
+# CONFIG_HID_PANTHERLORD is not set
+# CONFIG_HID_PETALYNX is not set
+# CONFIG_HID_PICOLCD is not set
+# CONFIG_HID_PRIMAX is not set
+# CONFIG_HID_ROCCAT is not set
+# CONFIG_HID_SAITEK is not set
+# CONFIG_HID_SAMSUNG is not set
+# CONFIG_HID_SONY is not set
+# CONFIG_HID_SPEEDLINK is not set
+# CONFIG_HID_STEELSERIES is not set
+# CONFIG_HID_SUNPLUS is not set
+# CONFIG_HID_GREENASIA is not set
+# CONFIG_HID_SMARTJOYPLUS is not set
+# CONFIG_HID_TIVO is not set
+# CONFIG_HID_TOPSEED is not set
+# CONFIG_HID_THINGM is not set
+# CONFIG_HID_THRUSTMASTER is not set
+# CONFIG_HID_WACOM is not set
+# CONFIG_HID_WIIMOTE is not set
+# CONFIG_HID_XINMO is not set
+# CONFIG_HID_ZEROPLUS is not set
+# CONFIG_HID_ZYDACRON is not set
+CONFIG_HID_SENSOR_HUB=m
+
+#
+# USB HID support
+#
+CONFIG_USB_HID=m
+CONFIG_HID_PID=y
+CONFIG_USB_HIDDEV=y
+
+#
+# USB HID Boot Protocol drivers
+#
+# CONFIG_USB_KBD is not set
+# CONFIG_USB_MOUSE is not set
+
+#
+# I2C HID support
+#
+CONFIG_I2C_HID=m
+CONFIG_USB_OHCI_LITTLE_ENDIAN=y
+CONFIG_USB_SUPPORT=y
+CONFIG_USB_COMMON=m
+CONFIG_USB_ARCH_HAS_HCD=y
+CONFIG_USB=m
+# CONFIG_USB_DEBUG is not set
+CONFIG_USB_ANNOUNCE_NEW_DEVICES=y
+
+#
+# Miscellaneous USB options
+#
+CONFIG_USB_DEFAULT_PERSIST=y
+CONFIG_USB_DYNAMIC_MINORS=y
+CONFIG_USB_OTG=y
+# CONFIG_USB_OTG_WHITELIST is not set
+# CONFIG_USB_OTG_BLACKLIST_HUB is not set
+CONFIG_USB_MON=m
+# CONFIG_USB_WUSB_CBAF is not set
+
+#
+# USB Host Controller Drivers
+#
+# CONFIG_USB_C67X00_HCD is not set
+CONFIG_USB_XHCI_HCD=m
+CONFIG_USB_EHCI_HCD=m
+CONFIG_USB_EHCI_ROOT_HUB_TT=y
+CONFIG_USB_EHCI_TT_NEWSCHED=y
+CONFIG_USB_EHCI_PCI=m
+# CONFIG_USB_EHCI_HCD_PLATFORM is not set
+# CONFIG_USB_OXU210HP_HCD is not set
+# CONFIG_USB_ISP116X_HCD is not set
+# CONFIG_USB_ISP1760_HCD is not set
+# CONFIG_USB_ISP1362_HCD is not set
+# CONFIG_USB_FUSBH200_HCD is not set
+# CONFIG_USB_FOTG210_HCD is not set
+CONFIG_USB_OHCI_HCD=m
+CONFIG_USB_OHCI_HCD_PCI=m
+# CONFIG_USB_OHCI_HCD_SSB is not set
+# CONFIG_USB_OHCI_HCD_PLATFORM is not set
+CONFIG_USB_UHCI_HCD=m
+# CONFIG_USB_SL811_HCD is not set
+# CONFIG_USB_R8A66597_HCD is not set
+# CONFIG_USB_HCD_SSB is not set
+# CONFIG_USB_HCD_TEST_MODE is not set
+# CONFIG_USB_RENESAS_USBHS is not set
+
+#
+# USB Device Class drivers
+#
+CONFIG_USB_ACM=m
+CONFIG_USB_PRINTER=m
+CONFIG_USB_WDM=m
+CONFIG_USB_TMC=m
+
+#
+# NOTE: USB_STORAGE depends on SCSI but BLK_DEV_SD may
+#
+
+#
+# also be needed; see USB_STORAGE Help for more info
+#
+CONFIG_USB_STORAGE=m
+# CONFIG_USB_STORAGE_DEBUG is not set
+# CONFIG_USB_STORAGE_REALTEK is not set
+# CONFIG_USB_STORAGE_DATAFAB is not set
+# CONFIG_USB_STORAGE_FREECOM is not set
+# CONFIG_USB_STORAGE_ISD200 is not set
+# CONFIG_USB_STORAGE_USBAT is not set
+# CONFIG_USB_STORAGE_SDDR09 is not set
+# CONFIG_USB_STORAGE_SDDR55 is not set
+# CONFIG_USB_STORAGE_JUMPSHOT is not set
+# CONFIG_USB_STORAGE_ALAUDA is not set
+# CONFIG_USB_STORAGE_ONETOUCH is not set
+# CONFIG_USB_STORAGE_KARMA is not set
+# CONFIG_USB_STORAGE_CYPRESS_ATACB is not set
+# CONFIG_USB_STORAGE_ENE_UB6250 is not set
+CONFIG_USB_UAS=m
+
+#
+# USB Imaging devices
+#
+# CONFIG_USB_MDC800 is not set
+# CONFIG_USB_MICROTEK is not set
+# CONFIG_USB_MUSB_HDRC is not set
+# CONFIG_USB_DWC3 is not set
+# CONFIG_USB_DWC2 is not set
+# CONFIG_USB_CHIPIDEA is not set
+
+#
+# USB port drivers
+#
+CONFIG_USB_SERIAL=m
+CONFIG_USB_SERIAL_GENERIC=y
+CONFIG_USB_SERIAL_SIMPLE=m
+# CONFIG_USB_SERIAL_AIRCABLE is not set
+# CONFIG_USB_SERIAL_ARK3116 is not set
+# CONFIG_USB_SERIAL_BELKIN is not set
+# CONFIG_USB_SERIAL_CH341 is not set
+# CONFIG_USB_SERIAL_WHITEHEAT is not set
+# CONFIG_USB_SERIAL_DIGI_ACCELEPORT is not set
+# CONFIG_USB_SERIAL_CP210X is not set
+# CONFIG_USB_SERIAL_CYPRESS_M8 is not set
+# CONFIG_USB_SERIAL_EMPEG is not set
+CONFIG_USB_SERIAL_FTDI_SIO=m
+# CONFIG_USB_SERIAL_VISOR is not set
+# CONFIG_USB_SERIAL_IPAQ is not set
+# CONFIG_USB_SERIAL_IR is not set
+# CONFIG_USB_SERIAL_EDGEPORT is not set
+# CONFIG_USB_SERIAL_EDGEPORT_TI is not set
+# CONFIG_USB_SERIAL_F81232 is not set
+# CONFIG_USB_SERIAL_GARMIN is not set
+# CONFIG_USB_SERIAL_IPW is not set
+# CONFIG_USB_SERIAL_IUU is not set
+# CONFIG_USB_SERIAL_KEYSPAN_PDA is not set
+# CONFIG_USB_SERIAL_KEYSPAN is not set
+# CONFIG_USB_SERIAL_KLSI is not set
+# CONFIG_USB_SERIAL_KOBIL_SCT is not set
+# CONFIG_USB_SERIAL_MCT_U232 is not set
+# CONFIG_USB_SERIAL_METRO is not set
+# CONFIG_USB_SERIAL_MOS7720 is not set
+# CONFIG_USB_SERIAL_MOS7840 is not set
+# CONFIG_USB_SERIAL_MXUPORT is not set
+# CONFIG_USB_SERIAL_NAVMAN is not set
+CONFIG_USB_SERIAL_PL2303=m
+# CONFIG_USB_SERIAL_OTI6858 is not set
+# CONFIG_USB_SERIAL_QCAUX is not set
+# CONFIG_USB_SERIAL_QUALCOMM is not set
+# CONFIG_USB_SERIAL_SPCP8X5 is not set
+CONFIG_USB_SERIAL_SAFE=m
+# CONFIG_USB_SERIAL_SAFE_PADDED is not set
+# CONFIG_USB_SERIAL_SIERRAWIRELESS is not set
+# CONFIG_USB_SERIAL_SYMBOL is not set
+# CONFIG_USB_SERIAL_TI is not set
+# CONFIG_USB_SERIAL_CYBERJACK is not set
+# CONFIG_USB_SERIAL_XIRCOM is not set
+CONFIG_USB_SERIAL_WWAN=m
+CONFIG_USB_SERIAL_OPTION=m
+# CONFIG_USB_SERIAL_OMNINET is not set
+# CONFIG_USB_SERIAL_OPTICON is not set
+# CONFIG_USB_SERIAL_XSENS_MT is not set
+# CONFIG_USB_SERIAL_WISHBONE is not set
+CONFIG_USB_SERIAL_ZTE=m
+# CONFIG_USB_SERIAL_SSU100 is not set
+# CONFIG_USB_SERIAL_QT2 is not set
+# CONFIG_USB_SERIAL_DEBUG is not set
+
+#
+# USB Miscellaneous drivers
+#
+# CONFIG_USB_EMI62 is not set
+# CONFIG_USB_EMI26 is not set
+# CONFIG_USB_ADUTUX is not set
+# CONFIG_USB_SEVSEG is not set
+# CONFIG_USB_RIO500 is not set
+# CONFIG_USB_LEGOTOWER is not set
+# CONFIG_USB_LCD is not set
+# CONFIG_USB_LED is not set
+# CONFIG_USB_CYPRESS_CY7C63 is not set
+# CONFIG_USB_CYTHERM is not set
+# CONFIG_USB_IDMOUSE is not set
+# CONFIG_USB_FTDI_ELAN is not set
+# CONFIG_USB_APPLEDISPLAY is not set
+# CONFIG_USB_SISUSBVGA is not set
+# CONFIG_USB_LD is not set
+# CONFIG_USB_TRANCEVIBRATOR is not set
+# CONFIG_USB_IOWARRIOR is not set
+# CONFIG_USB_TEST is not set
+# CONFIG_USB_EHSET_TEST_FIXTURE is not set
+# CONFIG_USB_ISIGHTFW is not set
+# CONFIG_USB_YUREX is not set
+# CONFIG_USB_EZUSB_FX2 is not set
+# CONFIG_USB_HSIC_USB3503 is not set
+
+#
+# USB Physical Layer drivers
+#
+CONFIG_USB_PHY=y
+CONFIG_USB_OTG_FSM=m
+# CONFIG_NOP_USB_XCEIV is not set
+# CONFIG_SAMSUNG_USB2PHY is not set
+# CONFIG_SAMSUNG_USB3PHY is not set
+# CONFIG_USB_GPIO_VBUS is not set
+# CONFIG_USB_ISP1301 is not set
+# CONFIG_USB_RCAR_PHY is not set
+CONFIG_USB_GADGET=m
+# CONFIG_USB_GADGET_DEBUG is not set
+# CONFIG_USB_GADGET_DEBUG_FILES is not set
+# CONFIG_USB_GADGET_DEBUG_FS is not set
+CONFIG_USB_GADGET_VBUS_DRAW=2
+CONFIG_USB_GADGET_STORAGE_NUM_BUFFERS=2
+
+#
+# USB Peripheral Controller
+#
+# CONFIG_USB_FOTG210_UDC is not set
+# CONFIG_USB_GR_UDC is not set
+# CONFIG_USB_R8A66597 is not set
+# CONFIG_USB_PXA27X is not set
+# CONFIG_USB_S3C_HSOTG is not set
+# CONFIG_USB_MV_UDC is not set
+# CONFIG_USB_MV_U3D is not set
+# CONFIG_USB_M66592 is not set
+# CONFIG_USB_AMD5536UDC is not set
+# CONFIG_USB_NET2272 is not set
+# CONFIG_USB_NET2280 is not set
+# CONFIG_USB_GOKU is not set
+CONFIG_USB_EG20T=m
+CONFIG_USB_DUMMY_HCD=m
+CONFIG_USB_LIBCOMPOSITE=m
+CONFIG_USB_F_ACM=m
+CONFIG_USB_U_SERIAL=m
+CONFIG_USB_U_ETHER=m
+CONFIG_USB_F_SERIAL=m
+CONFIG_USB_F_OBEX=m
+CONFIG_USB_F_NCM=m
+CONFIG_USB_F_ECM=m
+CONFIG_USB_F_EEM=m
+CONFIG_USB_F_SUBSET=m
+CONFIG_USB_F_RNDIS=m
+CONFIG_USB_F_MASS_STORAGE=m
+CONFIG_USB_F_FS=m
+CONFIG_USB_CONFIGFS=m
+CONFIG_USB_CONFIGFS_SERIAL=y
+CONFIG_USB_CONFIGFS_ACM=y
+CONFIG_USB_CONFIGFS_OBEX=y
+CONFIG_USB_CONFIGFS_NCM=y
+CONFIG_USB_CONFIGFS_ECM=y
+CONFIG_USB_CONFIGFS_ECM_SUBSET=y
+CONFIG_USB_CONFIGFS_RNDIS=y
+CONFIG_USB_CONFIGFS_EEM=y
+CONFIG_USB_CONFIGFS_MASS_STORAGE=y
+# CONFIG_USB_CONFIGFS_F_LB_SS is not set
+CONFIG_USB_CONFIGFS_F_FS=y
+# CONFIG_USB_ZERO is not set
+CONFIG_USB_AUDIO=m
+# CONFIG_GADGET_UAC1 is not set
+CONFIG_USB_ETH=m
+CONFIG_USB_ETH_RNDIS=y
+# CONFIG_USB_ETH_EEM is not set
+# CONFIG_USB_G_NCM is not set
+# CONFIG_USB_GADGETFS is not set
+# CONFIG_USB_FUNCTIONFS is not set
+# CONFIG_USB_MASS_STORAGE is not set
+# CONFIG_USB_GADGET_TARGET is not set
+CONFIG_USB_G_SERIAL=m
+# CONFIG_USB_MIDI_GADGET is not set
+CONFIG_USB_G_PRINTER=m
+CONFIG_USB_CDC_COMPOSITE=m
+CONFIG_USB_G_ACM_MS=m
+CONFIG_USB_G_MULTI=m
+CONFIG_USB_G_MULTI_RNDIS=y
+CONFIG_USB_G_MULTI_CDC=y
+CONFIG_USB_G_HID=m
+# CONFIG_USB_G_DBGP is not set
+CONFIG_USB_G_WEBCAM=m
+# CONFIG_UWB is not set
+# CONFIG_MMC is not set
+# CONFIG_MEMSTICK is not set
+CONFIG_NEW_LEDS=y
+CONFIG_LEDS_CLASS=y
+
+#
+# LED drivers
+#
+# CONFIG_LEDS_LM3530 is not set
+# CONFIG_LEDS_LM3642 is not set
+# CONFIG_LEDS_PCA9532 is not set
+CONFIG_LEDS_GPIO=m
+# CONFIG_LEDS_LP3944 is not set
+# CONFIG_LEDS_LP5521 is not set
+# CONFIG_LEDS_LP5523 is not set
+# CONFIG_LEDS_LP5562 is not set
+# CONFIG_LEDS_LP8501 is not set
+# CONFIG_LEDS_CLEVO_MAIL is not set
+# CONFIG_LEDS_PCA955X is not set
+# CONFIG_LEDS_PCA963X is not set
+# CONFIG_LEDS_PCA9685 is not set
+# CONFIG_LEDS_DAC124S085 is not set
+CONFIG_LEDS_PWM=m
+CONFIG_LEDS_REGULATOR=m
+# CONFIG_LEDS_BD2802 is not set
+# CONFIG_LEDS_INTEL_SS4200 is not set
+# CONFIG_LEDS_LT3593 is not set
+CONFIG_LEDS_DELL_NETBOOKS=m
+# CONFIG_LEDS_TCA6507 is not set
+# CONFIG_LEDS_LM355x is not set
+# CONFIG_LEDS_BLINKM is not set
+
+#
+# LED Triggers
+#
+CONFIG_LEDS_TRIGGERS=y
+CONFIG_LEDS_TRIGGER_TIMER=m
+CONFIG_LEDS_TRIGGER_ONESHOT=m
+CONFIG_LEDS_TRIGGER_HEARTBEAT=m
+CONFIG_LEDS_TRIGGER_BACKLIGHT=m
+CONFIG_LEDS_TRIGGER_CPU=y
+CONFIG_LEDS_TRIGGER_GPIO=m
+CONFIG_LEDS_TRIGGER_DEFAULT_ON=m
+
+#
+# iptables trigger is under Netfilter config (LED target)
+#
+CONFIG_LEDS_TRIGGER_TRANSIENT=m
+CONFIG_LEDS_TRIGGER_CAMERA=m
+# CONFIG_ACCESSIBILITY is not set
+# CONFIG_INFINIBAND is not set
+CONFIG_EDAC=y
+# CONFIG_EDAC_LEGACY_SYSFS is not set
+# CONFIG_EDAC_DEBUG is not set
+CONFIG_EDAC_MM_EDAC=m
+# CONFIG_EDAC_E752X is not set
+# CONFIG_EDAC_I82975X is not set
+# CONFIG_EDAC_I3000 is not set
+# CONFIG_EDAC_I3200 is not set
+# CONFIG_EDAC_X38 is not set
+# CONFIG_EDAC_I5400 is not set
+CONFIG_EDAC_I7CORE=m
+# CONFIG_EDAC_I5000 is not set
+# CONFIG_EDAC_I5100 is not set
+# CONFIG_EDAC_I7300 is not set
+# CONFIG_EDAC_SBRIDGE is not set
+CONFIG_RTC_LIB=y
+CONFIG_RTC_CLASS=y
+CONFIG_RTC_HCTOSYS=y
+CONFIG_RTC_SYSTOHC=y
+CONFIG_RTC_HCTOSYS_DEVICE="rtc0"
+# CONFIG_RTC_DEBUG is not set
+
+#
+# RTC interfaces
+#
+CONFIG_RTC_INTF_SYSFS=y
+CONFIG_RTC_INTF_PROC=y
+CONFIG_RTC_INTF_DEV=y
+CONFIG_RTC_INTF_DEV_UIE_EMUL=y
+CONFIG_RTC_DRV_TEST=m
+
+#
+# I2C RTC drivers
+#
+# CONFIG_RTC_DRV_DS1307 is not set
+# CONFIG_RTC_DRV_DS1374 is not set
+# CONFIG_RTC_DRV_DS1672 is not set
+# CONFIG_RTC_DRV_DS3232 is not set
+# CONFIG_RTC_DRV_MAX6900 is not set
+# CONFIG_RTC_DRV_RS5C372 is not set
+# CONFIG_RTC_DRV_ISL1208 is not set
+# CONFIG_RTC_DRV_ISL12022 is not set
+# CONFIG_RTC_DRV_ISL12057 is not set
+# CONFIG_RTC_DRV_X1205 is not set
+# CONFIG_RTC_DRV_PCF2127 is not set
+# CONFIG_RTC_DRV_PCF8523 is not set
+# CONFIG_RTC_DRV_PCF8563 is not set
+# CONFIG_RTC_DRV_PCF8583 is not set
+# CONFIG_RTC_DRV_M41T80 is not set
+# CONFIG_RTC_DRV_BQ32K is not set
+# CONFIG_RTC_DRV_S35390A is not set
+# CONFIG_RTC_DRV_FM3130 is not set
+# CONFIG_RTC_DRV_RX8581 is not set
+# CONFIG_RTC_DRV_RX8025 is not set
+# CONFIG_RTC_DRV_EM3027 is not set
+# CONFIG_RTC_DRV_RV3029C2 is not set
+
+#
+# SPI RTC drivers
+#
+# CONFIG_RTC_DRV_M41T93 is not set
+# CONFIG_RTC_DRV_M41T94 is not set
+# CONFIG_RTC_DRV_DS1305 is not set
+# CONFIG_RTC_DRV_DS1347 is not set
+# CONFIG_RTC_DRV_DS1390 is not set
+# CONFIG_RTC_DRV_MAX6902 is not set
+# CONFIG_RTC_DRV_R9701 is not set
+# CONFIG_RTC_DRV_RS5C348 is not set
+# CONFIG_RTC_DRV_DS3234 is not set
+# CONFIG_RTC_DRV_PCF2123 is not set
+# CONFIG_RTC_DRV_RX4581 is not set
+
+#
+# Platform RTC drivers
+#
+CONFIG_RTC_DRV_CMOS=y
+# CONFIG_RTC_DRV_DS1286 is not set
+# CONFIG_RTC_DRV_DS1511 is not set
+# CONFIG_RTC_DRV_DS1553 is not set
+# CONFIG_RTC_DRV_DS1742 is not set
+# CONFIG_RTC_DRV_STK17TA8 is not set
+# CONFIG_RTC_DRV_M48T86 is not set
+# CONFIG_RTC_DRV_M48T35 is not set
+# CONFIG_RTC_DRV_M48T59 is not set
+# CONFIG_RTC_DRV_MSM6242 is not set
+# CONFIG_RTC_DRV_BQ4802 is not set
+# CONFIG_RTC_DRV_RP5C01 is not set
+# CONFIG_RTC_DRV_V3020 is not set
+# CONFIG_RTC_DRV_DS2404 is not set
+
+#
+# on-CPU RTC drivers
+#
+# CONFIG_RTC_DRV_MOXART is not set
+
+#
+# HID Sensor RTC drivers
+#
+CONFIG_RTC_DRV_HID_SENSOR_TIME=m
+CONFIG_DMADEVICES=y
+# CONFIG_DMADEVICES_DEBUG is not set
+
+#
+# DMA Devices
+#
+CONFIG_INTEL_MID_DMAC=m
+CONFIG_INTEL_IOATDMA=m
+# CONFIG_DW_DMAC_CORE is not set
+# CONFIG_DW_DMAC is not set
+# CONFIG_DW_DMAC_PCI is not set
+CONFIG_PCH_DMA=m
+CONFIG_DMA_ENGINE=y
+CONFIG_DMA_ACPI=y
+
+#
+# DMA Clients
+#
+CONFIG_ASYNC_TX_DMA=y
+# CONFIG_DMATEST is not set
+CONFIG_DMA_ENGINE_RAID=y
+CONFIG_DCA=m
+# CONFIG_AUXDISPLAY is not set
+CONFIG_UIO=m
+# CONFIG_UIO_CIF is not set
+CONFIG_UIO_PDRV_GENIRQ=m
+CONFIG_UIO_DMEM_GENIRQ=m
+# CONFIG_UIO_AEC is not set
+# CONFIG_UIO_SERCOS3 is not set
+CONFIG_UIO_PCI_GENERIC=m
+# CONFIG_UIO_NETX is not set
+# CONFIG_UIO_MF624 is not set
+CONFIG_VFIO_IOMMU_TYPE1=m
+CONFIG_VFIO=m
+CONFIG_VFIO_PCI=m
+CONFIG_VFIO_PCI_VGA=y
+CONFIG_VIRT_DRIVERS=y
+CONFIG_VIRTIO=m
+
+#
+# Virtio drivers
+#
+CONFIG_VIRTIO_PCI=m
+CONFIG_VIRTIO_BALLOON=m
+CONFIG_VIRTIO_MMIO=m
+CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES=y
+
+#
+# Microsoft Hyper-V guest support
+#
+# CONFIG_HYPERV is not set
+
+#
+# Xen driver support
+#
+CONFIG_XEN_BALLOON=y
+# CONFIG_XEN_SELFBALLOONING is not set
+CONFIG_XEN_SCRUB_PAGES=y
+CONFIG_XEN_DEV_EVTCHN=m
+CONFIG_XEN_BACKEND=y
+CONFIG_XENFS=m
+CONFIG_XEN_COMPAT_XENFS=y
+CONFIG_XEN_SYS_HYPERVISOR=y
+CONFIG_XEN_XENBUS_FRONTEND=y
+CONFIG_XEN_GNTDEV=m
+CONFIG_XEN_GRANT_DEV_ALLOC=m
+CONFIG_SWIOTLB_XEN=y
+CONFIG_XEN_TMEM=m
+CONFIG_XEN_PCIDEV_BACKEND=m
+CONFIG_XEN_PRIVCMD=m
+CONFIG_XEN_ACPI_PROCESSOR=m
+# CONFIG_XEN_MCE_LOG is not set
+CONFIG_XEN_HAVE_PVMMU=y
+CONFIG_STAGING=y
+# CONFIG_ET131X is not set
+# CONFIG_SLICOSS is not set
+CONFIG_USBIP_CORE=m
+CONFIG_USBIP_VHCI_HCD=m
+CONFIG_USBIP_HOST=m
+# CONFIG_USBIP_DEBUG is not set
+# CONFIG_W35UND is not set
+# CONFIG_PRISM2_USB is not set
+# CONFIG_COMEDI is not set
+# CONFIG_RTL8192U is not set
+# CONFIG_RTLLIB is not set
+# CONFIG_R8712U is not set
+# CONFIG_R8188EU is not set
+# CONFIG_R8723AU is not set
+# CONFIG_R8821AE is not set
+CONFIG_RTS5139=m
+# CONFIG_RTS5139_DEBUG is not set
+# CONFIG_RTS5208 is not set
+# CONFIG_TRANZPORT is not set
+# CONFIG_IDE_PHISON is not set
+# CONFIG_LINE6_USB is not set
+# CONFIG_USB_SERIAL_QUATECH2 is not set
+# CONFIG_VT6655 is not set
+# CONFIG_VT6656 is not set
+# CONFIG_DX_SEP is not set
+
+#
+# IIO staging drivers
+#
+
+#
+# Accelerometers
+#
+# CONFIG_ADIS16201 is not set
+# CONFIG_ADIS16203 is not set
+# CONFIG_ADIS16204 is not set
+# CONFIG_ADIS16209 is not set
+# CONFIG_ADIS16220 is not set
+# CONFIG_ADIS16240 is not set
+# CONFIG_LIS3L02DQ is not set
+# CONFIG_SCA3000 is not set
+
+#
+# Analog to digital converters
+#
+# CONFIG_AD7291 is not set
+# CONFIG_AD7606 is not set
+# CONFIG_AD799X is not set
+# CONFIG_AD7780 is not set
+# CONFIG_AD7816 is not set
+# CONFIG_AD7192 is not set
+# CONFIG_AD7280 is not set
+
+#
+# Analog digital bi-direction converters
+#
+# CONFIG_ADT7316 is not set
+
+#
+# Capacitance to digital converters
+#
+# CONFIG_AD7150 is not set
+# CONFIG_AD7152 is not set
+# CONFIG_AD7746 is not set
+
+#
+# Direct Digital Synthesis
+#
+# CONFIG_AD5930 is not set
+# CONFIG_AD9832 is not set
+# CONFIG_AD9834 is not set
+# CONFIG_AD9850 is not set
+# CONFIG_AD9852 is not set
+# CONFIG_AD9910 is not set
+# CONFIG_AD9951 is not set
+
+#
+# Digital gyroscope sensors
+#
+# CONFIG_ADIS16060 is not set
+
+#
+# Network Analyzer, Impedance Converters
+#
+# CONFIG_AD5933 is not set
+
+#
+# Light sensors
+#
+# CONFIG_SENSORS_ISL29018 is not set
+# CONFIG_SENSORS_ISL29028 is not set
+# CONFIG_TSL2583 is not set
+# CONFIG_TSL2x7x is not set
+
+#
+# Magnetometer sensors
+#
+# CONFIG_SENSORS_HMC5843 is not set
+
+#
+# Active energy metering IC
+#
+# CONFIG_ADE7753 is not set
+# CONFIG_ADE7754 is not set
+# CONFIG_ADE7758 is not set
+# CONFIG_ADE7759 is not set
+# CONFIG_ADE7854 is not set
+
+#
+# Resolver to digital converters
+#
+# CONFIG_AD2S90 is not set
+# CONFIG_AD2S1200 is not set
+# CONFIG_AD2S1210 is not set
+
+#
+# Triggers - standalone
+#
+CONFIG_IIO_PERIODIC_RTC_TRIGGER=m
+# CONFIG_IIO_SIMPLE_DUMMY is not set
+# CONFIG_CRYSTALHD is not set
+# CONFIG_FB_XGI is not set
+# CONFIG_ACPI_QUICKSTART is not set
+# CONFIG_USB_ENESTORAGE is not set
+# CONFIG_BCM_WIMAX is not set
+# CONFIG_FT1000 is not set
+
+#
+# Speakup console speech
+#
+# CONFIG_SPEAKUP is not set
+# CONFIG_TOUCHSCREEN_CLEARPAD_TM1217 is not set
+# CONFIG_TOUCHSCREEN_SYNAPTICS_I2C_RMI4 is not set
+# CONFIG_STAGING_MEDIA is not set
+
+#
+# Android
+#
+# CONFIG_ANDROID is not set
+CONFIG_USB_WPAN_HCD=m
+# CONFIG_WIMAX_GDM72XX is not set
+# CONFIG_LTE_GDM724X is not set
+# CONFIG_NET_VENDOR_SILICOM is not set
+# CONFIG_CED1401 is not set
+# CONFIG_DGRP is not set
+# CONFIG_LUSTRE_FS is not set
+# CONFIG_XILLYBUS is not set
+# CONFIG_DGNC is not set
+# CONFIG_DGAP is not set
+# CONFIG_GS_FPGABOOT is not set
+CONFIG_X86_PLATFORM_DEVICES=y
+# CONFIG_ACER_WMI is not set
+# CONFIG_ACERHDF is not set
+# CONFIG_ALIENWARE_WMI is not set
+# CONFIG_ASUS_LAPTOP is not set
+CONFIG_DELL_LAPTOP=m
+CONFIG_DELL_WMI=m
+CONFIG_DELL_WMI_AIO=m
+# CONFIG_FUJITSU_LAPTOP is not set
+# CONFIG_FUJITSU_TABLET is not set
+# CONFIG_AMILO_RFKILL is not set
+# CONFIG_HP_ACCEL is not set
+# CONFIG_HP_WIRELESS is not set
+# CONFIG_HP_WMI is not set
+# CONFIG_MSI_LAPTOP is not set
+# CONFIG_PANASONIC_LAPTOP is not set
+# CONFIG_COMPAL_LAPTOP is not set
+# CONFIG_SONY_LAPTOP is not set
+# CONFIG_IDEAPAD_LAPTOP is not set
+# CONFIG_THINKPAD_ACPI is not set
+# CONFIG_SENSORS_HDAPS is not set
+CONFIG_INTEL_MENLOW=m
+# CONFIG_EEEPC_LAPTOP is not set
+# CONFIG_ASUS_WMI is not set
+CONFIG_ACPI_WMI=m
+# CONFIG_MSI_WMI is not set
+# CONFIG_TOPSTAR_LAPTOP is not set
+# CONFIG_ACPI_TOSHIBA is not set
+# CONFIG_TOSHIBA_BT_RFKILL is not set
+# CONFIG_ACPI_CMPC is not set
+CONFIG_INTEL_IPS=m
+CONFIG_IBM_RTL=m
+# CONFIG_XO15_EBOOK is not set
+# CONFIG_SAMSUNG_LAPTOP is not set
+# CONFIG_MXM_WMI is not set
+CONFIG_INTEL_OAKTRAIL=m
+# CONFIG_SAMSUNG_Q10 is not set
+# CONFIG_APPLE_GMUX is not set
+# CONFIG_INTEL_RST is not set
+# CONFIG_INTEL_SMARTCONNECT is not set
+CONFIG_PVPANIC=m
+# CONFIG_CHROME_PLATFORMS is not set
+CONFIG_CLKDEV_LOOKUP=y
+CONFIG_HAVE_CLK_PREPARE=y
+CONFIG_COMMON_CLK=y
+
+#
+# Common Clock Framework
+#
+# CONFIG_COMMON_CLK_SI5351 is not set
+
+#
+# Hardware Spinlock drivers
+#
+CONFIG_CLKEVT_I8253=y
+CONFIG_CLKBLD_I8253=y
+# CONFIG_SH_TIMER_CMT is not set
+# CONFIG_SH_TIMER_MTU2 is not set
+# CONFIG_SH_TIMER_TMU is not set
+# CONFIG_EM_TIMER_STI is not set
+CONFIG_MAILBOX=y
+CONFIG_IOMMU_API=y
+CONFIG_IOMMU_SUPPORT=y
+# CONFIG_AMD_IOMMU is not set
+CONFIG_DMAR_TABLE=y
+CONFIG_INTEL_IOMMU=y
+CONFIG_INTEL_IOMMU_DEFAULT_ON=y
+CONFIG_INTEL_IOMMU_FLOPPY_WA=y
+CONFIG_IRQ_REMAP=y
+
+#
+# Remoteproc drivers
+#
+CONFIG_REMOTEPROC=m
+CONFIG_STE_MODEM_RPROC=m
+
+#
+# Rpmsg drivers
+#
+CONFIG_PM_DEVFREQ=y
+
+#
+# DEVFREQ Governors
+#
+CONFIG_DEVFREQ_GOV_SIMPLE_ONDEMAND=y
+CONFIG_DEVFREQ_GOV_PERFORMANCE=y
+CONFIG_DEVFREQ_GOV_POWERSAVE=y
+CONFIG_DEVFREQ_GOV_USERSPACE=y
+
+#
+# DEVFREQ Drivers
+#
+CONFIG_EXTCON=m
+
+#
+# Extcon Device Drivers
+#
+CONFIG_EXTCON_GPIO=m
+# CONFIG_EXTCON_ADC_JACK is not set
+CONFIG_MEMORY=y
+CONFIG_IIO=m
+CONFIG_IIO_BUFFER=y
+CONFIG_IIO_BUFFER_CB=y
+CONFIG_IIO_KFIFO_BUF=m
+CONFIG_IIO_TRIGGER=y
+CONFIG_IIO_CONSUMERS_PER_TRIGGER=2
+
+#
+# Accelerometers
+#
+# CONFIG_BMA180 is not set
+# CONFIG_HID_SENSOR_ACCEL_3D is not set
+# CONFIG_IIO_ST_ACCEL_3AXIS is not set
+# CONFIG_KXSD9 is not set
+
+#
+# Analog to digital converters
+#
+# CONFIG_AD7266 is not set
+# CONFIG_AD7298 is not set
+# CONFIG_AD7476 is not set
+# CONFIG_AD7791 is not set
+# CONFIG_AD7793 is not set
+# CONFIG_AD7887 is not set
+# CONFIG_AD7923 is not set
+# CONFIG_MAX1363 is not set
+# CONFIG_MCP320X is not set
+# CONFIG_MCP3422 is not set
+# CONFIG_NAU7802 is not set
+# CONFIG_TI_ADC081C is not set
+
+#
+# Amplifiers
+#
+# CONFIG_AD8366 is not set
+
+#
+# Hid Sensor IIO Common
+#
+CONFIG_HID_SENSOR_IIO_COMMON=m
+CONFIG_HID_SENSOR_IIO_TRIGGER=m
+
+#
+# Digital to analog converters
+#
+# CONFIG_AD5064 is not set
+# CONFIG_AD5360 is not set
+# CONFIG_AD5380 is not set
+# CONFIG_AD5421 is not set
+# CONFIG_AD5446 is not set
+# CONFIG_AD5449 is not set
+# CONFIG_AD5504 is not set
+# CONFIG_AD5624R_SPI is not set
+# CONFIG_AD5686 is not set
+# CONFIG_AD5755 is not set
+# CONFIG_AD5764 is not set
+# CONFIG_AD5791 is not set
+# CONFIG_AD7303 is not set
+# CONFIG_MAX517 is not set
+# CONFIG_MCP4725 is not set
+
+#
+# Frequency Synthesizers DDS/PLL
+#
+
+#
+# Clock Generator/Distribution
+#
+# CONFIG_AD9523 is not set
+
+#
+# Phase-Locked Loop (PLL) frequency synthesizers
+#
+# CONFIG_ADF4350 is not set
+
+#
+# Digital gyroscope sensors
+#
+# CONFIG_ADIS16080 is not set
+# CONFIG_ADIS16130 is not set
+# CONFIG_ADIS16136 is not set
+# CONFIG_ADIS16260 is not set
+# CONFIG_ADXRS450 is not set
+# CONFIG_HID_SENSOR_GYRO_3D is not set
+# CONFIG_IIO_ST_GYRO_3AXIS is not set
+# CONFIG_ITG3200 is not set
+
+#
+# Humidity sensors
+#
+# CONFIG_DHT11 is not set
+# CONFIG_SI7005 is not set
+
+#
+# Inertial measurement units
+#
+# CONFIG_ADIS16400 is not set
+# CONFIG_ADIS16480 is not set
+# CONFIG_INV_MPU6050_IIO is not set
+
+#
+# Light sensors
+#
+# CONFIG_ADJD_S311 is not set
+# CONFIG_APDS9300 is not set
+# CONFIG_CM32181 is not set
+# CONFIG_CM36651 is not set
+# CONFIG_GP2AP020A00F is not set
+# CONFIG_HID_SENSOR_ALS is not set
+# CONFIG_HID_SENSOR_PROX is not set
+# CONFIG_LTR501 is not set
+# CONFIG_TCS3472 is not set
+# CONFIG_SENSORS_TSL2563 is not set
+# CONFIG_TSL4531 is not set
+# CONFIG_VCNL4000 is not set
+
+#
+# Magnetometer sensors
+#
+# CONFIG_AK8975 is not set
+# CONFIG_MAG3110 is not set
+# CONFIG_HID_SENSOR_MAGNETOMETER_3D is not set
+# CONFIG_IIO_ST_MAGN_3AXIS is not set
+
+#
+# Inclinometer sensors
+#
+# CONFIG_HID_SENSOR_INCLINOMETER_3D is not set
+
+#
+# Triggers - standalone
+#
+CONFIG_IIO_INTERRUPT_TRIGGER=m
+CONFIG_IIO_SYSFS_TRIGGER=m
+
+#
+# Pressure sensors
+#
+# CONFIG_HID_SENSOR_PRESS is not set
+# CONFIG_MPL3115 is not set
+# CONFIG_IIO_ST_PRESS is not set
+
+#
+# Temperature sensors
+#
+# CONFIG_TMP006 is not set
+CONFIG_NTB=m
+# CONFIG_VME_BUS is not set
+CONFIG_PWM=y
+CONFIG_PWM_SYSFS=y
+CONFIG_PWM_LPSS=m
+# CONFIG_IPACK_BUS is not set
+CONFIG_RESET_CONTROLLER=y
+# CONFIG_FMC is not set
+
+#
+# PHY Subsystem
+#
+CONFIG_GENERIC_PHY=y
+# CONFIG_BCM_KONA_USB2_PHY is not set
+# CONFIG_PHY_SAMSUNG_USB2 is not set
+CONFIG_POWERCAP=y
+CONFIG_INTEL_RAPL=y
+# CONFIG_MCB is not set
+
+#
+# Firmware Drivers
+#
+CONFIG_EDD=m
+# CONFIG_EDD_OFF is not set
+CONFIG_FIRMWARE_MEMMAP=y
+CONFIG_DELL_RBU=m
+CONFIG_DCDBAS=m
+CONFIG_DMIID=y
+CONFIG_DMI_SYSFS=m
+CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK=y
+CONFIG_ISCSI_IBFT_FIND=y
+CONFIG_ISCSI_IBFT=m
+# CONFIG_GOOGLE_FIRMWARE is not set
+
+#
+# EFI (Extensible Firmware Interface) Support
+#
+# CONFIG_EFI_VARS is not set
+CONFIG_EFI_RUNTIME_MAP=y
+CONFIG_UEFI_CPER=y
+
+#
+# File systems
+#
+CONFIG_DCACHE_WORD_ACCESS=y
+# CONFIG_EXT2_FS is not set
+# CONFIG_EXT3_FS is not set
+CONFIG_EXT4_FS=y
+CONFIG_EXT4_USE_FOR_EXT23=y
+CONFIG_EXT4_FS_POSIX_ACL=y
+CONFIG_EXT4_FS_SECURITY=y
+# CONFIG_EXT4_DEBUG is not set
+CONFIG_JBD2=y
+# CONFIG_JBD2_DEBUG is not set
+CONFIG_FS_MBCACHE=y
+# CONFIG_REISERFS_FS is not set
+# CONFIG_JFS_FS is not set
+CONFIG_XFS_FS=m
+CONFIG_XFS_QUOTA=y
+CONFIG_XFS_POSIX_ACL=y
+CONFIG_XFS_RT=y
+# CONFIG_XFS_WARN is not set
+# CONFIG_XFS_DEBUG is not set
+# CONFIG_GFS2_FS is not set
+# CONFIG_OCFS2_FS is not set
+# CONFIG_BTRFS_FS is not set
+# CONFIG_NILFS2_FS is not set
+CONFIG_FS_POSIX_ACL=y
+CONFIG_EXPORTFS=y
+CONFIG_FILE_LOCKING=y
+CONFIG_FSNOTIFY=y
+CONFIG_DNOTIFY=y
+CONFIG_INOTIFY_USER=y
+CONFIG_FANOTIFY=y
+CONFIG_FANOTIFY_ACCESS_PERMISSIONS=y
+CONFIG_QUOTA=y
+CONFIG_QUOTA_NETLINK_INTERFACE=y
+# CONFIG_PRINT_QUOTA_WARNING is not set
+# CONFIG_QUOTA_DEBUG is not set
+CONFIG_QUOTA_TREE=m
+CONFIG_QFMT_V1=m
+CONFIG_QFMT_V2=m
+CONFIG_QUOTACTL=y
+CONFIG_QUOTACTL_COMPAT=y
+CONFIG_AUTOFS4_FS=m
+CONFIG_FUSE_FS=m
+CONFIG_CUSE=m
+
+#
+# Caches
+#
+CONFIG_FSCACHE=m
+CONFIG_FSCACHE_STATS=y
+CONFIG_FSCACHE_HISTOGRAM=y
+# CONFIG_FSCACHE_DEBUG is not set
+# CONFIG_FSCACHE_OBJECT_LIST is not set
+CONFIG_CACHEFILES=m
+# CONFIG_CACHEFILES_DEBUG is not set
+# CONFIG_CACHEFILES_HISTOGRAM is not set
+
+#
+# CD-ROM/DVD Filesystems
+#
+CONFIG_ISO9660_FS=m
+CONFIG_JOLIET=y
+CONFIG_ZISOFS=y
+CONFIG_UDF_FS=m
+CONFIG_UDF_NLS=y
+
+#
+# DOS/FAT/NT Filesystems
+#
+CONFIG_FAT_FS=m
+CONFIG_MSDOS_FS=m
+CONFIG_VFAT_FS=m
+CONFIG_FAT_DEFAULT_CODEPAGE=437
+CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1"
+CONFIG_NTFS_FS=m
+# CONFIG_NTFS_DEBUG is not set
+CONFIG_NTFS_RW=y
+
+#
+# Pseudo filesystems
+#
+CONFIG_PROC_FS=y
+CONFIG_PROC_KCORE=y
+CONFIG_PROC_SYSCTL=y
+CONFIG_PROC_PAGE_MONITOR=y
+CONFIG_KERNFS=y
+CONFIG_SYSFS=y
+CONFIG_TMPFS=y
+CONFIG_TMPFS_POSIX_ACL=y
+CONFIG_TMPFS_XATTR=y
+CONFIG_HUGETLBFS=y
+CONFIG_HUGETLB_PAGE=y
+CONFIG_CONFIGFS_FS=m
+CONFIG_MISC_FILESYSTEMS=y
+# CONFIG_ADFS_FS is not set
+# CONFIG_AFFS_FS is not set
+CONFIG_ECRYPT_FS=m
+CONFIG_ECRYPT_FS_MESSAGING=y
+CONFIG_HFS_FS=m
+CONFIG_HFSPLUS_FS=m
+CONFIG_HFSPLUS_FS_POSIX_ACL=y
+# CONFIG_BEFS_FS is not set
+# CONFIG_BFS_FS is not set
+# CONFIG_EFS_FS is not set
+# CONFIG_LOGFS is not set
+CONFIG_CRAMFS=m
+CONFIG_SQUASHFS=m
+# CONFIG_SQUASHFS_FILE_CACHE is not set
+CONFIG_SQUASHFS_FILE_DIRECT=y
+# CONFIG_SQUASHFS_DECOMP_SINGLE is not set
+# CONFIG_SQUASHFS_DECOMP_MULTI is not set
+CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU=y
+CONFIG_SQUASHFS_XATTR=y
+CONFIG_SQUASHFS_ZLIB=y
+CONFIG_SQUASHFS_LZO=y
+CONFIG_SQUASHFS_XZ=y
+# CONFIG_SQUASHFS_4K_DEVBLK_SIZE is not set
+# CONFIG_SQUASHFS_EMBEDDED is not set
+CONFIG_SQUASHFS_FRAGMENT_CACHE_SIZE=3
+# CONFIG_VXFS_FS is not set
+# CONFIG_MINIX_FS is not set
+# CONFIG_OMFS_FS is not set
+# CONFIG_HPFS_FS is not set
+# CONFIG_QNX4FS_FS is not set
+# CONFIG_QNX6FS_FS is not set
+CONFIG_ROMFS_FS=m
+CONFIG_ROMFS_BACKED_BY_BLOCK=y
+CONFIG_ROMFS_ON_BLOCK=y
+CONFIG_PSTORE=y
+# CONFIG_PSTORE_CONSOLE is not set
+# CONFIG_PSTORE_FTRACE is not set
+CONFIG_PSTORE_RAM=m
+# CONFIG_SYSV_FS is not set
+CONFIG_UFS_FS=m
+# CONFIG_UFS_FS_WRITE is not set
+# CONFIG_UFS_DEBUG is not set
+# CONFIG_F2FS_FS is not set
+CONFIG_EFIVAR_FS=m
+CONFIG_NETWORK_FILESYSTEMS=y
+CONFIG_NFS_FS=m
+CONFIG_NFS_V2=m
+CONFIG_NFS_V3=m
+CONFIG_NFS_V3_ACL=y
+CONFIG_NFS_V4=m
+CONFIG_NFS_SWAP=y
+CONFIG_NFS_V4_1=y
+CONFIG_NFS_V4_2=y
+CONFIG_PNFS_FILE_LAYOUT=m
+CONFIG_PNFS_BLOCK=m
+CONFIG_NFS_V4_1_IMPLEMENTATION_ID_DOMAIN="pf.natalenko.name"
+CONFIG_NFS_V4_1_MIGRATION=y
+CONFIG_NFS_V4_SECURITY_LABEL=y
+CONFIG_NFS_FSCACHE=y
+# CONFIG_NFS_USE_LEGACY_DNS is not set
+CONFIG_NFS_USE_KERNEL_DNS=y
+CONFIG_NFSD=m
+CONFIG_NFSD_V2_ACL=y
+CONFIG_NFSD_V3=y
+CONFIG_NFSD_V3_ACL=y
+CONFIG_NFSD_V4=y
+CONFIG_NFSD_V4_SECURITY_LABEL=y
+# CONFIG_NFSD_FAULT_INJECTION is not set
+CONFIG_LOCKD=m
+CONFIG_LOCKD_V4=y
+CONFIG_NFS_ACL_SUPPORT=m
+CONFIG_NFS_COMMON=y
+CONFIG_SUNRPC=m
+CONFIG_SUNRPC_GSS=m
+CONFIG_SUNRPC_BACKCHANNEL=y
+CONFIG_SUNRPC_SWAP=y
+CONFIG_RPCSEC_GSS_KRB5=m
+# CONFIG_SUNRPC_DEBUG is not set
+# CONFIG_CEPH_FS is not set
+CONFIG_CIFS=m
+# CONFIG_CIFS_STATS is not set
+CONFIG_CIFS_WEAK_PW_HASH=y
+CONFIG_CIFS_UPCALL=y
+CONFIG_CIFS_XATTR=y
+CONFIG_CIFS_POSIX=y
+CONFIG_CIFS_ACL=y
+# CONFIG_CIFS_DEBUG is not set
+CONFIG_CIFS_DFS_UPCALL=y
+CONFIG_CIFS_SMB2=y
+CONFIG_CIFS_FSCACHE=y
+# CONFIG_NCP_FS is not set
+# CONFIG_CODA_FS is not set
+# CONFIG_AFS_FS is not set
+CONFIG_NLS=y
+CONFIG_NLS_DEFAULT="utf8"
+CONFIG_NLS_CODEPAGE_437=m
+# CONFIG_NLS_CODEPAGE_737 is not set
+# CONFIG_NLS_CODEPAGE_775 is not set
+CONFIG_NLS_CODEPAGE_850=m
+CONFIG_NLS_CODEPAGE_852=m
+CONFIG_NLS_CODEPAGE_855=m
+# CONFIG_NLS_CODEPAGE_857 is not set
+# CONFIG_NLS_CODEPAGE_860 is not set
+# CONFIG_NLS_CODEPAGE_861 is not set
+# CONFIG_NLS_CODEPAGE_862 is not set
+# CONFIG_NLS_CODEPAGE_863 is not set
+# CONFIG_NLS_CODEPAGE_864 is not set
+# CONFIG_NLS_CODEPAGE_865 is not set
+CONFIG_NLS_CODEPAGE_866=m
+# CONFIG_NLS_CODEPAGE_869 is not set
+# CONFIG_NLS_CODEPAGE_936 is not set
+# CONFIG_NLS_CODEPAGE_950 is not set
+# CONFIG_NLS_CODEPAGE_932 is not set
+# CONFIG_NLS_CODEPAGE_949 is not set
+# CONFIG_NLS_CODEPAGE_874 is not set
+# CONFIG_NLS_ISO8859_8 is not set
+CONFIG_NLS_CODEPAGE_1250=m
+CONFIG_NLS_CODEPAGE_1251=m
+CONFIG_NLS_ASCII=m
+CONFIG_NLS_ISO8859_1=y
+CONFIG_NLS_ISO8859_2=m
+CONFIG_NLS_ISO8859_3=m
+# CONFIG_NLS_ISO8859_4 is not set
+CONFIG_NLS_ISO8859_5=m
+# CONFIG_NLS_ISO8859_6 is not set
+CONFIG_NLS_ISO8859_7=m
+# CONFIG_NLS_ISO8859_9 is not set
+# CONFIG_NLS_ISO8859_13 is not set
+# CONFIG_NLS_ISO8859_14 is not set
+CONFIG_NLS_ISO8859_15=m
+CONFIG_NLS_KOI8_R=m
+CONFIG_NLS_KOI8_U=m
+CONFIG_NLS_MAC_ROMAN=m
+# CONFIG_NLS_MAC_CELTIC is not set
+# CONFIG_NLS_MAC_CENTEURO is not set
+# CONFIG_NLS_MAC_CROATIAN is not set
+CONFIG_NLS_MAC_CYRILLIC=m
+# CONFIG_NLS_MAC_GAELIC is not set
+CONFIG_NLS_MAC_GREEK=m
+# CONFIG_NLS_MAC_ICELAND is not set
+# CONFIG_NLS_MAC_INUIT is not set
+# CONFIG_NLS_MAC_ROMANIAN is not set
+# CONFIG_NLS_MAC_TURKISH is not set
+CONFIG_NLS_UTF8=m
+CONFIG_DLM=m
+# CONFIG_DLM_DEBUG is not set
+
+#
+# Kernel hacking
+#
+CONFIG_TRACE_IRQFLAGS_SUPPORT=y
+
+#
+# printk and dmesg options
+#
+CONFIG_PRINTK_TIME=y
+CONFIG_DEFAULT_MESSAGE_LOGLEVEL=4
+# CONFIG_BOOT_PRINTK_DELAY is not set
+# CONFIG_DYNAMIC_DEBUG is not set
+
+#
+# Compile-time checks and compiler options
+#
+# CONFIG_DEBUG_INFO is not set
+# CONFIG_ENABLE_WARN_DEPRECATED is not set
+# CONFIG_ENABLE_MUST_CHECK is not set
+CONFIG_FRAME_WARN=2048
+CONFIG_STRIP_ASM_SYMS=y
+# CONFIG_READABLE_ASM is not set
+CONFIG_UNUSED_SYMBOLS=y
+CONFIG_DEBUG_FS=y
+# CONFIG_HEADERS_CHECK is not set
+# CONFIG_DEBUG_SECTION_MISMATCH is not set
+CONFIG_ARCH_WANT_FRAME_POINTERS=y
+CONFIG_FRAME_POINTER=y
+# CONFIG_DEBUG_FORCE_WEAK_PER_CPU is not set
+CONFIG_MAGIC_SYSRQ=y
+CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE=0x1
+CONFIG_DEBUG_KERNEL=y
+
+#
+# Memory Debugging
+#
+# CONFIG_DEBUG_PAGEALLOC is not set
+# CONFIG_DEBUG_OBJECTS is not set
+# CONFIG_SLUB_STATS is not set
+CONFIG_HAVE_DEBUG_KMEMLEAK=y
+# CONFIG_DEBUG_KMEMLEAK is not set
+# CONFIG_DEBUG_STACK_USAGE is not set
+# CONFIG_DEBUG_VM is not set
+# CONFIG_DEBUG_VIRTUAL is not set
+# CONFIG_DEBUG_MEMORY_INIT is not set
+# CONFIG_DEBUG_PER_CPU_MAPS is not set
+CONFIG_HAVE_DEBUG_STACKOVERFLOW=y
+# CONFIG_DEBUG_STACKOVERFLOW is not set
+CONFIG_HAVE_ARCH_KMEMCHECK=y
+# CONFIG_DEBUG_SHIRQ is not set
+
+#
+# Debug Lockups and Hangs
+#
+CONFIG_LOCKUP_DETECTOR=y
+CONFIG_HARDLOCKUP_DETECTOR=y
+# CONFIG_BOOTPARAM_HARDLOCKUP_PANIC is not set
+CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE=0
+# CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC is not set
+CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=0
+CONFIG_DETECT_HUNG_TASK=y
+CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=120
+# CONFIG_BOOTPARAM_HUNG_TASK_PANIC is not set
+CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE=0
+# CONFIG_PANIC_ON_OOPS is not set
+CONFIG_PANIC_ON_OOPS_VALUE=0
+CONFIG_PANIC_TIMEOUT=0
+CONFIG_SCHED_DEBUG=y
+CONFIG_SCHEDSTATS=y
+CONFIG_TIMER_STATS=y
+# CONFIG_DEBUG_PREEMPT is not set
+
+#
+# Lock Debugging (spinlocks, mutexes, etc...)
+#
+# CONFIG_DEBUG_RT_MUTEXES is not set
+# CONFIG_RT_MUTEX_TESTER is not set
+# CONFIG_DEBUG_SPINLOCK is not set
+# CONFIG_DEBUG_MUTEXES is not set
+# CONFIG_DEBUG_WW_MUTEX_SLOWPATH is not set
+# CONFIG_DEBUG_LOCK_ALLOC is not set
+# CONFIG_PROVE_LOCKING is not set
+# CONFIG_LOCK_STAT is not set
+# CONFIG_DEBUG_ATOMIC_SLEEP is not set
+# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
+# CONFIG_LOCK_TORTURE_TEST is not set
+CONFIG_STACKTRACE=y
+# CONFIG_DEBUG_KOBJECT is not set
+CONFIG_DEBUG_BUGVERBOSE=y
+# CONFIG_DEBUG_LIST is not set
+# CONFIG_DEBUG_SG is not set
+# CONFIG_DEBUG_NOTIFIERS is not set
+# CONFIG_DEBUG_CREDENTIALS is not set
+
+#
+# RCU Debugging
+#
+# CONFIG_PROVE_RCU_DELAY is not set
+# CONFIG_SPARSE_RCU_POINTER is not set
+# CONFIG_TORTURE_TEST is not set
+# CONFIG_RCU_TORTURE_TEST is not set
+CONFIG_RCU_CPU_STALL_TIMEOUT=60
+# CONFIG_RCU_CPU_STALL_VERBOSE is not set
+# CONFIG_RCU_CPU_STALL_INFO is not set
+# CONFIG_RCU_TRACE is not set
+# CONFIG_DEBUG_BLOCK_EXT_DEVT is not set
+# CONFIG_NOTIFIER_ERROR_INJECTION is not set
+# CONFIG_FAULT_INJECTION is not set
+CONFIG_LATENCYTOP=y
+CONFIG_ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS=y
+# CONFIG_DEBUG_STRICT_USER_COPY_CHECKS is not set
+CONFIG_USER_STACKTRACE_SUPPORT=y
+CONFIG_NOP_TRACER=y
+CONFIG_HAVE_FUNCTION_TRACER=y
+CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
+CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST=y
+CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y
+CONFIG_HAVE_DYNAMIC_FTRACE=y
+CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
+CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
+CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
+CONFIG_HAVE_FENTRY=y
+CONFIG_HAVE_C_RECORDMCOUNT=y
+CONFIG_TRACER_MAX_TRACE=y
+CONFIG_TRACE_CLOCK=y
+CONFIG_RING_BUFFER=y
+CONFIG_EVENT_TRACING=y
+CONFIG_CONTEXT_SWITCH_TRACER=y
+CONFIG_RING_BUFFER_ALLOW_SWAP=y
+CONFIG_TRACING=y
+CONFIG_GENERIC_TRACER=y
+CONFIG_TRACING_SUPPORT=y
+CONFIG_FTRACE=y
+CONFIG_FUNCTION_TRACER=y
+CONFIG_FUNCTION_GRAPH_TRACER=y
+# CONFIG_IRQSOFF_TRACER is not set
+# CONFIG_PREEMPT_TRACER is not set
+# CONFIG_SCHED_TRACER is not set
+CONFIG_FTRACE_SYSCALLS=y
+CONFIG_TRACER_SNAPSHOT=y
+CONFIG_TRACER_SNAPSHOT_PER_CPU_SWAP=y
+CONFIG_BRANCH_PROFILE_NONE=y
+# CONFIG_PROFILE_ANNOTATED_BRANCHES is not set
+# CONFIG_PROFILE_ALL_BRANCHES is not set
+# CONFIG_STACK_TRACER is not set
+# CONFIG_BLK_DEV_IO_TRACE is not set
+CONFIG_UPROBE_EVENT=y
+CONFIG_PROBE_EVENTS=y
+CONFIG_DYNAMIC_FTRACE=y
+CONFIG_DYNAMIC_FTRACE_WITH_REGS=y
+# CONFIG_FUNCTION_PROFILER is not set
+CONFIG_FTRACE_MCOUNT_RECORD=y
+# CONFIG_FTRACE_STARTUP_TEST is not set
+CONFIG_MMIOTRACE=y
+# CONFIG_MMIOTRACE_TEST is not set
+# CONFIG_RING_BUFFER_BENCHMARK is not set
+# CONFIG_RING_BUFFER_STARTUP_TEST is not set
+
+#
+# Runtime Testing
+#
+# CONFIG_LKDTM is not set
+# CONFIG_TEST_LIST_SORT is not set
+# CONFIG_BACKTRACE_SELF_TEST is not set
+# CONFIG_RBTREE_TEST is not set
+# CONFIG_INTERVAL_TREE_TEST is not set
+# CONFIG_PERCPU_TEST is not set
+# CONFIG_ATOMIC64_SELFTEST is not set
+# CONFIG_ASYNC_RAID6_TEST is not set
+# CONFIG_TEST_STRING_HELPERS is not set
+# CONFIG_TEST_KSTRTOX is not set
+# CONFIG_PROVIDE_OHCI1394_DMA_INIT is not set
+# CONFIG_DMA_API_DEBUG is not set
+# CONFIG_TEST_MODULE is not set
+# CONFIG_TEST_USER_COPY is not set
+# CONFIG_SAMPLES is not set
+CONFIG_HAVE_ARCH_KGDB=y
+# CONFIG_KGDB is not set
+CONFIG_STRICT_DEVMEM=y
+CONFIG_X86_VERBOSE_BOOTUP=y
+# CONFIG_EARLY_PRINTK is not set
+# CONFIG_X86_PTDUMP is not set
+CONFIG_DEBUG_RODATA=y
+# CONFIG_DEBUG_RODATA_TEST is not set
+# CONFIG_DEBUG_SET_MODULE_RONX is not set
+# CONFIG_DEBUG_NX_TEST is not set
+# CONFIG_DOUBLEFAULT is not set
+# CONFIG_DEBUG_TLBFLUSH is not set
+# CONFIG_IOMMU_STRESS is not set
+CONFIG_HAVE_MMIOTRACE_SUPPORT=y
+CONFIG_IO_DELAY_TYPE_0X80=0
+CONFIG_IO_DELAY_TYPE_0XED=1
+CONFIG_IO_DELAY_TYPE_UDELAY=2
+CONFIG_IO_DELAY_TYPE_NONE=3
+# CONFIG_IO_DELAY_0X80 is not set
+# CONFIG_IO_DELAY_0XED is not set
+# CONFIG_IO_DELAY_UDELAY is not set
+CONFIG_IO_DELAY_NONE=y
+CONFIG_DEFAULT_IO_DELAY_TYPE=3
+# CONFIG_DEBUG_BOOT_PARAMS is not set
+# CONFIG_CPA_DEBUG is not set
+# CONFIG_OPTIMIZE_INLINING is not set
+# CONFIG_DEBUG_NMI_SELFTEST is not set
+# CONFIG_X86_DEBUG_STATIC_CPU_HAS is not set
+
+#
+# Security options
+#
+CONFIG_KEYS=y
+CONFIG_PERSISTENT_KEYRINGS=y
+CONFIG_BIG_KEYS=y
+CONFIG_ENCRYPTED_KEYS=m
+CONFIG_KEYS_DEBUG_PROC_KEYS=y
+# CONFIG_SECURITY_DMESG_RESTRICT is not set
+CONFIG_SECURITY=y
+CONFIG_SECURITYFS=y
+CONFIG_SECURITY_NETWORK=y
+# CONFIG_SECURITY_NETWORK_XFRM is not set
+CONFIG_SECURITY_PATH=y
+# CONFIG_INTEL_TXT is not set
+# CONFIG_SECURITY_SELINUX is not set
+# CONFIG_SECURITY_SMACK is not set
+CONFIG_SECURITY_TOMOYO=y
+CONFIG_SECURITY_TOMOYO_MAX_ACCEPT_ENTRY=2048
+CONFIG_SECURITY_TOMOYO_MAX_AUDIT_LOG=1024
+# CONFIG_SECURITY_TOMOYO_OMIT_USERSPACE_LOADER is not set
+CONFIG_SECURITY_TOMOYO_POLICY_LOADER="/sbin/tomoyo-init"
+CONFIG_SECURITY_TOMOYO_ACTIVATION_TRIGGER="/sbin/init"
+# CONFIG_SECURITY_APPARMOR is not set
+# CONFIG_SECURITY_YAMA is not set
+# CONFIG_IMA is not set
+# CONFIG_EVM is not set
+CONFIG_DEFAULT_SECURITY_TOMOYO=y
+# CONFIG_DEFAULT_SECURITY_DAC is not set
+CONFIG_DEFAULT_SECURITY="tomoyo"
+CONFIG_XOR_BLOCKS=m
+CONFIG_ASYNC_CORE=m
+CONFIG_ASYNC_MEMCPY=m
+CONFIG_ASYNC_XOR=m
+CONFIG_ASYNC_PQ=m
+CONFIG_ASYNC_RAID6_RECOV=m
+CONFIG_CRYPTO=y
+
+#
+# Crypto core or helper
+#
+CONFIG_CRYPTO_ALGAPI=y
+CONFIG_CRYPTO_ALGAPI2=y
+CONFIG_CRYPTO_AEAD=m
+CONFIG_CRYPTO_AEAD2=y
+CONFIG_CRYPTO_BLKCIPHER=m
+CONFIG_CRYPTO_BLKCIPHER2=y
+CONFIG_CRYPTO_HASH=y
+CONFIG_CRYPTO_HASH2=y
+CONFIG_CRYPTO_RNG=m
+CONFIG_CRYPTO_RNG2=y
+CONFIG_CRYPTO_PCOMP=m
+CONFIG_CRYPTO_PCOMP2=y
+CONFIG_CRYPTO_MANAGER=y
+CONFIG_CRYPTO_MANAGER2=y
+CONFIG_CRYPTO_USER=m
+CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=y
+CONFIG_CRYPTO_GF128MUL=m
+CONFIG_CRYPTO_NULL=m
+CONFIG_CRYPTO_PCRYPT=m
+CONFIG_CRYPTO_WORKQUEUE=y
+CONFIG_CRYPTO_CRYPTD=m
+CONFIG_CRYPTO_AUTHENC=m
+CONFIG_CRYPTO_TEST=m
+CONFIG_CRYPTO_ABLK_HELPER=m
+CONFIG_CRYPTO_GLUE_HELPER_X86=m
+
+#
+# Authenticated Encryption with Associated Data
+#
+CONFIG_CRYPTO_CCM=m
+CONFIG_CRYPTO_GCM=m
+CONFIG_CRYPTO_SEQIV=m
+
+#
+# Block modes
+#
+CONFIG_CRYPTO_CBC=m
+CONFIG_CRYPTO_CTR=m
+CONFIG_CRYPTO_CTS=m
+CONFIG_CRYPTO_ECB=m
+CONFIG_CRYPTO_LRW=m
+CONFIG_CRYPTO_PCBC=m
+CONFIG_CRYPTO_XTS=m
+
+#
+# Hash modes
+#
+CONFIG_CRYPTO_CMAC=m
+CONFIG_CRYPTO_HMAC=m
+CONFIG_CRYPTO_XCBC=m
+CONFIG_CRYPTO_VMAC=m
+
+#
+# Digest
+#
+CONFIG_CRYPTO_CRC32C=y
+CONFIG_CRYPTO_CRC32C_INTEL=m
+CONFIG_CRYPTO_CRC32=m
+CONFIG_CRYPTO_CRC32_PCLMUL=m
+CONFIG_CRYPTO_CRCT10DIF=m
+CONFIG_CRYPTO_CRCT10DIF_PCLMUL=m
+CONFIG_CRYPTO_GHASH=m
+CONFIG_CRYPTO_MD4=y
+CONFIG_CRYPTO_MD5=m
+CONFIG_CRYPTO_MICHAEL_MIC=m
+CONFIG_CRYPTO_RMD128=m
+CONFIG_CRYPTO_RMD160=m
+CONFIG_CRYPTO_RMD256=m
+CONFIG_CRYPTO_RMD320=m
+CONFIG_CRYPTO_SHA1=m
+CONFIG_CRYPTO_SHA1_SSSE3=m
+CONFIG_CRYPTO_SHA256_SSSE3=m
+CONFIG_CRYPTO_SHA512_SSSE3=m
+CONFIG_CRYPTO_SHA256=m
+CONFIG_CRYPTO_SHA512=m
+CONFIG_CRYPTO_TGR192=m
+CONFIG_CRYPTO_WP512=m
+CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL=m
+
+#
+# Ciphers
+#
+CONFIG_CRYPTO_AES=y
+CONFIG_CRYPTO_AES_X86_64=m
+CONFIG_CRYPTO_AES_NI_INTEL=m
+CONFIG_CRYPTO_ANUBIS=m
+CONFIG_CRYPTO_ARC4=m
+CONFIG_CRYPTO_BLOWFISH=m
+CONFIG_CRYPTO_BLOWFISH_COMMON=m
+CONFIG_CRYPTO_BLOWFISH_X86_64=m
+CONFIG_CRYPTO_CAMELLIA=m
+CONFIG_CRYPTO_CAMELLIA_X86_64=m
+CONFIG_CRYPTO_CAMELLIA_AESNI_AVX_X86_64=m
+CONFIG_CRYPTO_CAMELLIA_AESNI_AVX2_X86_64=m
+CONFIG_CRYPTO_CAST_COMMON=m
+CONFIG_CRYPTO_CAST5=m
+CONFIG_CRYPTO_CAST5_AVX_X86_64=m
+CONFIG_CRYPTO_CAST6=m
+CONFIG_CRYPTO_CAST6_AVX_X86_64=m
+CONFIG_CRYPTO_DES=m
+CONFIG_CRYPTO_FCRYPT=m
+CONFIG_CRYPTO_KHAZAD=m
+CONFIG_CRYPTO_SALSA20=m
+CONFIG_CRYPTO_SALSA20_X86_64=m
+CONFIG_CRYPTO_SEED=m
+CONFIG_CRYPTO_SERPENT=m
+CONFIG_CRYPTO_SERPENT_SSE2_X86_64=m
+CONFIG_CRYPTO_SERPENT_AVX_X86_64=m
+CONFIG_CRYPTO_SERPENT_AVX2_X86_64=m
+CONFIG_CRYPTO_TEA=m
+CONFIG_CRYPTO_TWOFISH=m
+CONFIG_CRYPTO_TWOFISH_COMMON=m
+CONFIG_CRYPTO_TWOFISH_X86_64=m
+CONFIG_CRYPTO_TWOFISH_X86_64_3WAY=m
+CONFIG_CRYPTO_TWOFISH_AVX_X86_64=m
+
+#
+# Compression
+#
+CONFIG_CRYPTO_DEFLATE=m
+CONFIG_CRYPTO_ZLIB=m
+CONFIG_CRYPTO_LZO=y
+CONFIG_CRYPTO_LZ4=m
+CONFIG_CRYPTO_LZ4HC=m
+
+#
+# Random Number Generation
+#
+CONFIG_CRYPTO_ANSI_CPRNG=m
+CONFIG_CRYPTO_USER_API=m
+CONFIG_CRYPTO_USER_API_HASH=m
+CONFIG_CRYPTO_USER_API_SKCIPHER=m
+CONFIG_CRYPTO_HASH_INFO=y
+CONFIG_CRYPTO_HW=y
+# CONFIG_CRYPTO_DEV_PADLOCK is not set
+# CONFIG_CRYPTO_DEV_CCP is not set
+CONFIG_ASYMMETRIC_KEY_TYPE=m
+CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE=m
+CONFIG_PUBLIC_KEY_ALGO_RSA=m
+CONFIG_X509_CERTIFICATE_PARSER=m
+CONFIG_HAVE_KVM=y
+CONFIG_HAVE_KVM_IRQCHIP=y
+CONFIG_HAVE_KVM_IRQ_ROUTING=y
+CONFIG_HAVE_KVM_EVENTFD=y
+CONFIG_KVM_APIC_ARCHITECTURE=y
+CONFIG_KVM_MMIO=y
+CONFIG_KVM_ASYNC_PF=y
+CONFIG_HAVE_KVM_MSI=y
+CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT=y
+CONFIG_KVM_VFIO=y
+CONFIG_VIRTUALIZATION=y
+CONFIG_KVM=m
+CONFIG_KVM_INTEL=m
+# CONFIG_KVM_AMD is not set
+CONFIG_KVM_MMU_AUDIT=y
+CONFIG_KVM_DEVICE_ASSIGNMENT=y
+CONFIG_BINARY_PRINTF=y
+
+#
+# Library routines
+#
+CONFIG_RAID6_PQ=m
+CONFIG_BITREVERSE=y
+CONFIG_GENERIC_STRNCPY_FROM_USER=y
+CONFIG_GENERIC_STRNLEN_USER=y
+CONFIG_GENERIC_NET_UTILS=y
+CONFIG_GENERIC_FIND_FIRST_BIT=y
+CONFIG_GENERIC_PCI_IOMAP=y
+CONFIG_GENERIC_IOMAP=y
+CONFIG_GENERIC_IO=y
+CONFIG_PERCPU_RWSEM=y
+CONFIG_ARCH_USE_CMPXCHG_LOCKREF=y
+CONFIG_CRC_CCITT=m
+CONFIG_CRC16=y
+CONFIG_CRC_T10DIF=m
+CONFIG_CRC_ITU_T=m
+CONFIG_CRC32=y
+# CONFIG_CRC32_SELFTEST is not set
+CONFIG_CRC32_SLICEBY8=y
+# CONFIG_CRC32_SLICEBY4 is not set
+# CONFIG_CRC32_SARWATE is not set
+# CONFIG_CRC32_BIT is not set
+CONFIG_CRC7=m
+CONFIG_LIBCRC32C=m
+CONFIG_CRC8=m
+# CONFIG_AUDIT_ARCH_COMPAT_GENERIC is not set
+# CONFIG_RANDOM32_SELFTEST is not set
+CONFIG_ZLIB_INFLATE=y
+CONFIG_ZLIB_DEFLATE=y
+CONFIG_LZO_COMPRESS=y
+CONFIG_LZO_DECOMPRESS=y
+CONFIG_LZ4_COMPRESS=m
+CONFIG_LZ4HC_COMPRESS=m
+CONFIG_LZ4_DECOMPRESS=y
+CONFIG_XZ_DEC=y
+CONFIG_XZ_DEC_X86=y
+CONFIG_XZ_DEC_POWERPC=y
+CONFIG_XZ_DEC_IA64=y
+CONFIG_XZ_DEC_ARM=y
+CONFIG_XZ_DEC_ARMTHUMB=y
+CONFIG_XZ_DEC_SPARC=y
+CONFIG_XZ_DEC_BCJ=y
+# CONFIG_XZ_DEC_TEST is not set
+CONFIG_DECOMPRESS_GZIP=y
+CONFIG_DECOMPRESS_BZIP2=y
+CONFIG_DECOMPRESS_LZMA=y
+CONFIG_DECOMPRESS_XZ=y
+CONFIG_DECOMPRESS_LZO=y
+CONFIG_DECOMPRESS_LZ4=y
+CONFIG_GENERIC_ALLOCATOR=y
+CONFIG_REED_SOLOMON=m
+CONFIG_REED_SOLOMON_ENC8=y
+CONFIG_REED_SOLOMON_DEC8=y
+CONFIG_TEXTSEARCH=y
+CONFIG_TEXTSEARCH_KMP=m
+CONFIG_TEXTSEARCH_BM=m
+CONFIG_TEXTSEARCH_FSM=m
+CONFIG_ASSOCIATIVE_ARRAY=y
+CONFIG_HAS_IOMEM=y
+CONFIG_HAS_IOPORT_MAP=y
+CONFIG_HAS_DMA=y
+CONFIG_CHECK_SIGNATURE=y
+CONFIG_CPU_RMAP=y
+CONFIG_DQL=y
+CONFIG_NLATTR=y
+CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE=y
+CONFIG_AVERAGE=y
+CONFIG_CLZ_TAB=y
+CONFIG_CORDIC=m
+CONFIG_DDR=y
+CONFIG_MPILIB=m
+CONFIG_OID_REGISTRY=m
+CONFIG_UCS2_STRING=y
+CONFIG_FONT_SUPPORT=y
+CONFIG_FONTS=y
+# CONFIG_FONT_8x8 is not set
+CONFIG_FONT_8x16=y
+# CONFIG_FONT_6x11 is not set
+# CONFIG_FONT_7x14 is not set
+# CONFIG_FONT_PEARL_8x8 is not set
+# CONFIG_FONT_ACORN_8x8 is not set
+# CONFIG_FONT_MINI_4x6 is not set
+# CONFIG_FONT_SUN8x16 is not set
+# CONFIG_FONT_SUN12x22 is not set
+# CONFIG_FONT_10x18 is not set
+CONFIG_FONT_AUTOSELECT=y
diff --git a/distro/archlinux/PKGBUILD b/distro/archlinux/PKGBUILD
new file mode 100644
index 0000000..138018a
--- /dev/null
+++ b/distro/archlinux/PKGBUILD
@@ -0,0 +1,70 @@
+pkgname=('linux-pf' 'linux-pf-headers')
+pkgver=3.15.0
+pkgrel=1
+_pkgsuffix="-pf$pkgrel"
+pkgdesc="pf-kernel with modules"
+arch=('i686' 'x86_64')
+makedepends=('xz' 'rsync' 'bc')
+options=('!strip')
+license=('GPL')
+url="http://pf.natalenko.name/"
+
+build() {
+ # Go to kernel's tree root
+ cd $startdir
+
+ # Disable depmod in the kernel's scripts/depmod.sh (a trick borrowed from Arch)
+ sed -i '2iexit 0' scripts/depmod.sh
+
+ # Detect the number of CPUs automatically
+ CPUS_COUNT=`cat /proc/cpuinfo | grep processor | wc -l`
+ echo "Compiling using $CPUS_COUNT thread(s)"
+ LOCALVERSION="" make -j$CPUS_COUNT bzImage modules || return 1
+}
+
+package_linux-pf() {
+ depends=('coreutils' 'linux-firmware' 'kmod' 'mkinitcpio')
+ provides=('linux-pf')
+ install='linux-pf.install'
+
+ cd $startdir
+
+ # Note that modules are in /usr/lib/modules now
+ mkdir -p $pkgdir/{usr/lib/modules,boot}
+ make INSTALL_MOD_PATH=$pkgdir/usr modules_install || return 1
+
+ # Running depmod for installed modules
+ depmod -b "$pkgdir/usr" -F System.map "$pkgver$_pkgsuffix"
+
+ # Firmware is not split out per kernel version - comment out this line if you
+ # intend to use the built kernel exclusively; otherwise the packaged firmware
+ # would conflict with files shipped by the existing kernel package
+ rm -rf $pkgdir/usr/lib/firmware
+
+ rm -f $pkgdir/usr/lib/modules/$pkgver$_pkgsuffix/{source,build}
+
+ install -Dm644 "System.map" "$pkgdir/boot/System.map-linux-pf"
+ install -Dm644 "arch/x86/boot/bzImage" "$pkgdir/boot/vmlinuz-linux-pf"
+ install -Dm644 "distro/archlinux/linux-pf.preset" "$pkgdir/etc/mkinitcpio.d/linux-pf.preset"
+}
+
+package_linux-pf-headers() {
+ provides=('linux-pf-headers')
+
+ cd $startdir
+
+ mkdir -p $pkgdir/usr/lib/modules/$pkgver$_pkgsuffix/
+ cd $pkgdir/usr/lib/modules/$pkgver$_pkgsuffix/
+ ln -s ../../../src/linux-$pkgver$_pkgsuffix build
+
+ cd $startdir
+
+ mkdir -p $pkgdir/usr/src/linux-$pkgver$_pkgsuffix
+ make INSTALL_HDR_PATH=$pkgdir/usr/src/linux-$pkgver$_pkgsuffix headers_install
+ install -Dm644 .config $pkgdir/usr/src/linux-$pkgver$_pkgsuffix
+ install -Dm644 Module.symvers $pkgdir/usr/src/linux-$pkgver$_pkgsuffix
+ rsync -a --include='*/' --include='Kbuild*' --include='Kconfig*' --include='*Makefile*' --include='auto.conf' --include='autoconf.h' --include='kconfig.h' --include='asm-offsets.s' --exclude='*' --prune-empty-dirs . $pkgdir/usr/src/linux-$pkgver$_pkgsuffix
+ rsync -a scripts $pkgdir/usr/src/linux-$pkgver$_pkgsuffix
+ rsync -a include $pkgdir/usr/src/linux-$pkgver$_pkgsuffix
+ rsync -a arch/x86/include $pkgdir/usr/src/linux-$pkgver$_pkgsuffix/arch/x86
+}
diff --git a/distro/archlinux/linux-pf.install b/distro/archlinux/linux-pf.install
new file mode 100644
index 0000000..1da6ff3
--- /dev/null
+++ b/distro/archlinux/linux-pf.install
@@ -0,0 +1,17 @@
+KERNEL_VERSION="3.15.0"
+LOCAL_VERSION="-pf1"
+
+post_install () {
+ echo ">>> Updating module dependencies..."
+ /sbin/depmod -A -v ${KERNEL_VERSION}${LOCAL_VERSION}
+ echo ">>> Creating initial ramdisk..."
+ mkinitcpio -p linux-pf
+}
+
+post_upgrade() {
+ echo ">>> Updating module dependencies..."
+ /sbin/depmod -A -v ${KERNEL_VERSION}${LOCAL_VERSION}
+ echo ">>> Creating initial ramdisk..."
+ mkinitcpio -p linux-pf
+}
+
diff --git a/distro/archlinux/linux-pf.preset b/distro/archlinux/linux-pf.preset
new file mode 100644
index 0000000..e77e3f3
--- /dev/null
+++ b/distro/archlinux/linux-pf.preset
@@ -0,0 +1,6 @@
+ALL_config="/etc/mkinitcpio.conf"
+ALL_kver="/boot/vmlinuz-linux-pf"
+PRESETS=('default' 'fallback')
+default_image="/boot/initramfs-linux-pf.img"
+fallback_image="/boot/initramfs-linux-pf-fallback.img"
+fallback_options="-S autodetect"
diff --git a/drivers/acpi/acpi_pad.c b/drivers/acpi/acpi_pad.c
index 37d7302..49ba4a3 100644
--- a/drivers/acpi/acpi_pad.c
+++ b/drivers/acpi/acpi_pad.c
@@ -153,6 +153,7 @@ static int power_saving_thread(void *data)
u64 last_jiffies = 0;
sched_setscheduler(current, SCHED_RR, &param);
+ set_freezable();
while (!kthread_should_stop()) {
int cpu;
diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 86d5e4f..af5d673 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -870,6 +870,7 @@ void dpm_resume(pm_message_t state)
cpufreq_resume();
}
+EXPORT_SYMBOL_GPL(dpm_resume);
/**
* device_complete - Complete a PM transition for given device.
@@ -946,6 +947,7 @@ void dpm_complete(pm_message_t state)
list_splice(&list, &dpm_list);
mutex_unlock(&dpm_list_mtx);
}
+EXPORT_SYMBOL_GPL(dpm_complete);
/**
* dpm_resume_end - Execute "resume" callbacks and complete system transition.
@@ -1474,6 +1476,7 @@ int dpm_suspend(pm_message_t state)
dpm_show_time(starttime, state, NULL);
return error;
}
+EXPORT_SYMBOL_GPL(dpm_suspend);
/**
* device_prepare - Prepare a device for system power transition.
@@ -1578,6 +1581,7 @@ int dpm_prepare(pm_message_t state)
mutex_unlock(&dpm_list_mtx);
return error;
}
+EXPORT_SYMBOL_GPL(dpm_prepare);
/**
* dpm_suspend_start - Prepare devices for PM transition and suspend them.
diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
index 2d56f41..2f530c8 100644
--- a/drivers/base/power/wakeup.c
+++ b/drivers/base/power/wakeup.c
@@ -23,6 +23,7 @@
* if wakeup events are registered during or immediately before the transition.
*/
bool events_check_enabled __read_mostly;
+EXPORT_SYMBOL_GPL(events_check_enabled);
/*
* Combined counters of registered wakeup events and wakeup events in progress.
@@ -715,6 +716,7 @@ bool pm_wakeup_pending(void)
return ret;
}
+EXPORT_SYMBOL_GPL(pm_wakeup_pending);
/**
* pm_get_wakeup_count - Read the number of registered wakeup events.
diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 64c60ed..3ca69f3 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -577,6 +577,7 @@ int xen_blkif_schedule(void *arg)
int ret;
xen_blkif_get(blkif);
+ set_freezable();
while (!kthread_should_stop()) {
if (try_to_freeze())
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 9909bef..b37783f 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -135,7 +135,7 @@ int drm_gem_object_init(struct drm_device *dev,
drm_gem_private_object_init(dev, obj, size);
- filp = shmem_file_setup("drm mm object", size, VM_NORESERVE);
+ filp = shmem_file_setup("drm mm object", size, VM_NORESERVE, 1);
if (IS_ERR(filp))
return PTR_ERR(filp);
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 75f3190..8f9b4c1 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -336,7 +336,7 @@ int ttm_tt_swapout(struct ttm_tt *ttm, struct file *persistent_swap_storage)
if (!persistent_swap_storage) {
swap_storage = shmem_file_setup("ttm swap",
ttm->num_pages << PAGE_SHIFT,
- 0);
+ 0, 0);
if (unlikely(IS_ERR(swap_storage))) {
pr_err("Failed allocating swap storage\n");
return PTR_ERR(swap_storage);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 2382cfc..85c2a98 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -33,6 +33,7 @@
*/
#include <linux/kthread.h>
+#include <linux/freezer.h>
#include <linux/blkdev.h>
#include <linux/sysctl.h>
#include <linux/seq_file.h>
@@ -7418,6 +7419,8 @@ void md_do_sync(struct md_thread *thread)
*
*/
+ set_freezable();
+
do {
mddev->curr_resync = 2;
@@ -7441,6 +7444,9 @@ void md_do_sync(struct md_thread *thread)
* time 'round when curr_resync == 2
*/
continue;
+
+ try_to_freeze();
+
/* We need to wait 'interruptible' so as not to
* contribute to the load average, and not to
* be caught by 'softlockup'
@@ -7453,6 +7459,7 @@ void md_do_sync(struct md_thread *thread)
" share one or more physical units)\n",
desc, mdname(mddev), mdname(mddev2));
mddev_put(mddev2);
+ try_to_freeze();
if (signal_pending(current))
flush_signals(current);
schedule();
@@ -7784,8 +7791,10 @@ no_add:
*/
void md_check_recovery(struct mddev *mddev)
{
- if (mddev->suspended)
+#ifdef CONFIG_FREEZER
+ if (mddev->suspended || unlikely(atomic_read(&system_freezing_cnt)))
return;
+#endif
if (mddev->bitmap)
bitmap_daemon_work(mddev);
diff --git a/drivers/net/irda/stir4200.c b/drivers/net/irda/stir4200.c
index dd1bd10..9eb8719 100644
--- a/drivers/net/irda/stir4200.c
+++ b/drivers/net/irda/stir4200.c
@@ -738,7 +738,9 @@ static int stir_transmit_thread(void *arg)
struct net_device *dev = stir->netdev;
struct sk_buff *skb;
- while (!kthread_should_stop()) {
+ set_freezable();
+
+ while (!kthread_freezable_should_stop(NULL)) {
#ifdef CONFIG_PM
/* if suspending, then power off and wait */
if (unlikely(freezing(current))) {
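
The hunks above and the filesystem changes below all apply the same freezer idiom: the kernel thread declares itself freezable and then polls the freezer at its wait points so the hibernation core can park it. A minimal sketch of that idiom, assuming nothing beyond the standard kthread/freezer API (the thread body is illustrative):

    #include <linux/kthread.h>
    #include <linux/freezer.h>

    static int example_worker(void *unused)
    {
            set_freezable();        /* opt in to the freezer */

            while (!kthread_freezable_should_stop(NULL)) {
                    /* ... perform one unit of work ... */

                    set_current_state(TASK_INTERRUPTIBLE);
                    if (!kthread_should_stop())
                            schedule_timeout(HZ);
                    __set_current_state(TASK_RUNNING);

                    try_to_freeze();        /* park here while an image is written */
            }
            return 0;
    }

kthread_freezable_should_stop() already calls try_to_freeze() internally; the explicit call after the sleep mirrors the style of the md and xfs hunks, where the thread may block for long periods between checks.
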
diff --git a/drivers/staging/android/ashmem.c b/drivers/staging/android/ashmem.c
index 713a972..f877e48 100644
--- a/drivers/staging/android/ashmem.c
+++ b/drivers/staging/android/ashmem.c
@@ -387,7 +387,7 @@ static int ashmem_mmap(struct file *file, struct vm_area_struct *vma)
name = asma->name;
/* ... and allocate the backing shmem file */
- vmfile = shmem_file_setup(name, asma->size, vma->vm_flags);
+ vmfile = shmem_file_setup(name, asma->size, vma->vm_flags, 0);
if (unlikely(IS_ERR(vmfile))) {
ret = PTR_ERR(vmfile);
goto out;
diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index 3ad0b61..ca51d69 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -2437,6 +2437,7 @@ int vt_kmsg_redirect(int new)
else
return kmsg_con;
}
+EXPORT_SYMBOL_GPL(vt_kmsg_redirect);
/*
* Console on virtual terminal
diff --git a/drivers/uwb/uwbd.c b/drivers/uwb/uwbd.c
index bdcb13c..ce8fc9c 100644
--- a/drivers/uwb/uwbd.c
+++ b/drivers/uwb/uwbd.c
@@ -271,6 +271,7 @@ static int uwbd(void *param)
struct uwb_event *evt;
int should_stop = 0;
+ set_freezable();
while (1) {
wait_event_interruptible_timeout(
rc->uwbd.wq,
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9833149..c64eba5 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1743,6 +1743,8 @@ static int cleaner_kthread(void *arg)
struct btrfs_root *root = arg;
int again;
+ set_freezable();
+
do {
again = 0;
@@ -1774,11 +1776,11 @@ static int cleaner_kthread(void *arg)
sleep:
if (!try_to_freeze() && !again) {
set_current_state(TASK_INTERRUPTIBLE);
- if (!kthread_should_stop())
+ if (!kthread_freezable_should_stop(NULL))
schedule();
__set_current_state(TASK_RUNNING);
}
- } while (!kthread_should_stop());
+ } while (!kthread_freezable_should_stop(NULL));
return 0;
}
@@ -1792,6 +1794,8 @@ static int transaction_kthread(void *arg)
unsigned long delay;
bool cannot_commit;
+ set_freezable();
+
do {
cannot_commit = false;
delay = HZ * root->fs_info->commit_interval;
@@ -1836,13 +1840,13 @@ sleep:
btrfs_cleanup_transaction(root);
if (!try_to_freeze()) {
set_current_state(TASK_INTERRUPTIBLE);
- if (!kthread_should_stop() &&
+ if (!kthread_freezable_should_stop(NULL) &&
(!btrfs_transaction_blocked(root->fs_info) ||
cannot_commit))
schedule_timeout(delay);
__set_current_state(TASK_RUNNING);
}
- } while (!kthread_should_stop());
+ } while (!kthread_freezable_should_stop(NULL));
return 0;
}
diff --git a/fs/drop_caches.c b/fs/drop_caches.c
index 9280202..ae20186 100644
--- a/fs/drop_caches.c
+++ b/fs/drop_caches.c
@@ -8,6 +8,7 @@
#include <linux/writeback.h>
#include <linux/sysctl.h>
#include <linux/gfp.h>
+#include <linux/export.h>
#include "internal.h"
/* A global variable is a bit ugly, but it keeps the code simple */
@@ -50,6 +51,13 @@ static void drop_slab(void)
} while (nr_objects > 10);
}
+/* For TuxOnIce */
+void drop_pagecache(void)
+{
+ iterate_supers(drop_pagecache_sb, NULL);
+}
+EXPORT_SYMBOL_GPL(drop_pagecache);
+
int drop_caches_sysctl_handler(ctl_table *table, int write,
void __user *buffer, size_t *length, loff_t *ppos)
{
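
The export above gives the hibernation code a way to shed clean page cache before it sizes and writes the image. A hedged sketch of a caller, assuming only the drop_pagecache() declaration this patch adds to linux/mm.h (the helper name is illustrative):

    #include <linux/mm.h>

    /* Illustrative: release easily reclaimable memory so the data that
     * must be atomically copied stays as small as possible. */
    static void toi_example_trim_pagecache(void)
    {
            /* Drops clean page cache from all superblocks; dirty data is
             * left alone, so outstanding writeback is synced beforehand. */
            drop_pagecache();
    }
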
diff --git a/fs/exec.c b/fs/exec.c
index 238b7aa..39e9ee0 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -19,7 +19,7 @@
* current->executable is only used by the procfs. This allows a dispatch
* table to check for several different types of binary formats. We keep
* trying until we recognize the file or we run out of supported binary
- * formats.
+ * formats.
*/
#include <linux/slab.h>
@@ -56,6 +56,7 @@
#include <linux/pipe_fs_i.h>
#include <linux/oom.h>
#include <linux/compat.h>
+#include <linux/ksm.h>
#include <asm/uaccess.h>
#include <asm/mmu_context.h>
@@ -1131,6 +1132,7 @@ void setup_new_exec(struct linux_binprm * bprm)
/* An exec changes our domain. We are no longer part of the thread
group */
current->self_exec_id++;
+
flush_signal_handlers(current, 0);
do_close_on_exec(current->files);
}
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 6f9e6fa..8e528d0 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2926,6 +2926,7 @@ static int ext4_lazyinit_thread(void *arg)
unsigned long next_wakeup, cur;
BUG_ON(NULL == eli);
+ set_freezable();
cont_thread:
while (true) {
@@ -2965,7 +2966,7 @@ cont_thread:
schedule_timeout_interruptible(next_wakeup - cur);
- if (kthread_should_stop()) {
+ if (kthread_freezable_should_stop(NULL)) {
ext4_clear_request_list();
goto exit_thread;
}
diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index 4a14d50..7da4bda 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -878,7 +878,9 @@ int gfs2_logd(void *data)
unsigned long t = 1;
DEFINE_WAIT(wait);
- while (!kthread_should_stop()) {
+ set_freezable();
+
+ while (!kthread_freezable_should_stop(NULL)) {
if (gfs2_jrnl_flush_reqd(sdp) || t == 0) {
gfs2_ail1_empty(sdp);
@@ -904,11 +906,11 @@ int gfs2_logd(void *data)
TASK_INTERRUPTIBLE);
if (!gfs2_ail_flush_reqd(sdp) &&
!gfs2_jrnl_flush_reqd(sdp) &&
- !kthread_should_stop())
+ !kthread_freezable_should_stop(NULL))
t = schedule_timeout(t);
} while(t && !gfs2_ail_flush_reqd(sdp) &&
!gfs2_jrnl_flush_reqd(sdp) &&
- !kthread_should_stop());
+ !kthread_freezable_should_stop(NULL));
finish_wait(&sdp->sd_logd_waitq, &wait);
}
diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index c4effff..c66cad1 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -1433,7 +1433,9 @@ int gfs2_quotad(void *data)
DEFINE_WAIT(wait);
int empty;
- while (!kthread_should_stop()) {
+ set_freezable();
+
+ while (!kthread_freezable_should_stop(NULL)) {
/* Update the master statfs file */
if (sdp->sd_statfs_force_sync) {
diff --git a/fs/jfs/jfs_logmgr.c b/fs/jfs/jfs_logmgr.c
index 8d811e0..12510c4 100644
--- a/fs/jfs/jfs_logmgr.c
+++ b/fs/jfs/jfs_logmgr.c
@@ -2342,6 +2342,8 @@ int jfsIOWait(void *arg)
{
struct lbuf *bp;
+ set_freezable();
+
do {
spin_lock_irq(&log_redrive_lock);
while ((bp = log_redrive_list)) {
@@ -2361,7 +2363,7 @@ int jfsIOWait(void *arg)
schedule();
__set_current_state(TASK_RUNNING);
}
- } while (!kthread_should_stop());
+ } while (!kthread_freezable_should_stop(NULL));
jfs_info("jfsIOWait being killed!");
return 0;
diff --git a/fs/jfs/jfs_txnmgr.c b/fs/jfs/jfs_txnmgr.c
index 564c4f2..0a622bc 100644
--- a/fs/jfs/jfs_txnmgr.c
+++ b/fs/jfs/jfs_txnmgr.c
@@ -2752,6 +2752,8 @@ int jfs_lazycommit(void *arg)
unsigned long flags;
struct jfs_sb_info *sbi;
+ set_freezable();
+
do {
LAZY_LOCK(flags);
jfs_commit_thread_waking = 0; /* OK to wake another thread */
@@ -2811,7 +2813,7 @@ int jfs_lazycommit(void *arg)
__set_current_state(TASK_RUNNING);
remove_wait_queue(&jfs_commit_thread_wait, &wq);
}
- } while (!kthread_should_stop());
+ } while (!kthread_freezable_should_stop(NULL));
if (!list_empty(&TxAnchor.unlock_queue))
jfs_err("jfs_lazycommit being killed w/pending transactions!");
@@ -2936,6 +2938,8 @@ int jfs_sync(void *arg)
struct jfs_inode_info *jfs_ip;
tid_t tid;
+ set_freezable();
+
do {
/*
* write each inode on the anonymous inode list
@@ -2998,7 +3002,7 @@ int jfs_sync(void *arg)
schedule();
__set_current_state(TASK_RUNNING);
}
- } while (!kthread_should_stop());
+ } while (!kthread_freezable_should_stop(NULL));
jfs_info("jfs_sync being killed");
return 0;
diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index a1a1916..7bb1322 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -2449,6 +2449,8 @@ static int nilfs_segctor_thread(void *arg)
struct the_nilfs *nilfs = sci->sc_super->s_fs_info;
int timeout = 0;
+ set_freezable();
+
sci->sc_timer.data = (unsigned long)current;
sci->sc_timer.function = nilfs_construction_timeout;
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 7445af0..2e6593f 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -121,6 +121,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
"SUnreclaim: %8lu kB\n"
"KernelStack: %8lu kB\n"
"PageTables: %8lu kB\n"
+#ifdef CONFIG_UKSM
+ "KsmZeroPages: %8lu kB\n"
+#endif
#ifdef CONFIG_QUICKLIST
"Quicklists: %8lu kB\n"
#endif
@@ -175,6 +178,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
K(global_page_state(NR_SLAB_UNRECLAIMABLE)),
global_page_state(NR_KERNEL_STACK) * THREAD_SIZE / 1024,
K(global_page_state(NR_PAGETABLE)),
+#ifdef CONFIG_UKSM
+ K(global_page_state(NR_UKSM_ZERO_PAGES)),
+#endif
#ifdef CONFIG_QUICKLIST
K(quicklist_total_size()),
#endif
diff --git a/fs/super.c b/fs/super.c
index 48377f7..8cdbfa3 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -38,6 +38,8 @@
LIST_HEAD(super_blocks);
+EXPORT_SYMBOL_GPL(super_blocks);
+
DEFINE_SPINLOCK(sb_lock);
static char *sb_writers_name[SB_FREEZE_LEVELS] = {
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index a728735..a5d9536 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -498,9 +498,10 @@ xfsaild(
struct xfs_ail *ailp = data;
long tout = 0; /* milliseconds */
+ set_freezable();
current->flags |= PF_MEMALLOC;
- while (!kthread_should_stop()) {
+ while (!kthread_freezable_should_stop(NULL)) {
if (tout && tout <= 20)
__set_current_state(TASK_KILLABLE);
else
@@ -522,6 +523,7 @@ xfsaild(
ailp->xa_target == ailp->xa_target_prev) {
spin_unlock(&ailp->xa_lock);
schedule();
+ try_to_freeze();
tout = 0;
continue;
}
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index a8015a7..e6a878d 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -515,12 +515,25 @@ extern void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
unsigned long size);
#endif
+#ifdef CONFIG_UKSM
+static inline int is_uksm_zero_pfn(unsigned long pfn)
+{
+ extern unsigned long uksm_zero_pfn;
+ return pfn == uksm_zero_pfn;
+}
+#else
+static inline int is_uksm_zero_pfn(unsigned long pfn)
+{
+ return 0;
+}
+#endif
+
#ifdef __HAVE_COLOR_ZERO_PAGE
static inline int is_zero_pfn(unsigned long pfn)
{
extern unsigned long zero_pfn;
unsigned long offset_from_zero_pfn = pfn - zero_pfn;
- return offset_from_zero_pfn <= (zero_page_mask >> PAGE_SHIFT);
+ return offset_from_zero_pfn <= (zero_page_mask >> PAGE_SHIFT) || is_uksm_zero_pfn(pfn);
}
#define my_zero_pfn(addr) page_to_pfn(ZERO_PAGE(addr))
@@ -529,7 +542,7 @@ static inline int is_zero_pfn(unsigned long pfn)
static inline int is_zero_pfn(unsigned long pfn)
{
extern unsigned long zero_pfn;
- return pfn == zero_pfn;
+ return (pfn == zero_pfn) || (is_uksm_zero_pfn(pfn));
}
static inline unsigned long my_zero_pfn(unsigned long addr)
diff --git a/include/linux/bio.h b/include/linux/bio.h
index bba5508..d341bdf 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -32,6 +32,8 @@
/* struct bio, bio_vec and BIO_* flags are defined in blk_types.h */
#include <linux/blk_types.h>
+extern int trap_non_toi_io;
+
#define BIO_DEBUG
#ifdef BIO_DEBUG
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index aa0eaa2..a15689c 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -122,13 +122,14 @@ struct bio {
#define BIO_QUIET 10 /* Make BIO Quiet */
#define BIO_MAPPED_INTEGRITY 11/* integrity metadata has been remapped */
#define BIO_SNAP_STABLE 12 /* bio data must be snapshotted during write */
+#define BIO_TOI 13 /* bio is TuxOnIce submitted */
/*
* Flags starting here get preserved by bio_reset() - this includes
* BIO_POOL_IDX()
*/
-#define BIO_RESET_BITS 13
-#define BIO_OWNS_VEC 13 /* bio_free() should free bvec */
+#define BIO_RESET_BITS 14
+#define BIO_OWNS_VEC 14 /* bio_free() should free bvec */
#define bio_flagged(bio, flag) ((bio)->bi_flags & (1 << (flag)))
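
With the extra flag bit, I/O submitted by the hibernation code can be told apart from normal block traffic (and, together with the trap_non_toi_io variable declared in the bio.h hunk above, foreign I/O can be spotted while the image is written). A hedged sketch of tagging and testing a bio, written against this kernel's two-argument submit_bio(); the helper names are illustrative:

    #include <linux/bio.h>

    /* Mark a bio as hibernation-owned before submitting it. */
    static void toi_example_submit(int rw, struct bio *bio)
    {
            bio->bi_flags |= 1 << BIO_TOI;
            submit_bio(rw, bio);
    }

    /* Later, e.g. in the make_request path, recognise our own I/O. */
    static bool toi_example_owns_bio(struct bio *bio)
    {
            return bio_flagged(bio, BIO_TOI) != 0;
    }
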
diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index 768fe44..cdd2528 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -39,6 +39,10 @@ SUBSYS(net_cls)
SUBSYS(blkio)
#endif
+#if IS_ENABLED(CONFIG_CGROUP_BFQIO)
+SUBSYS(bfqio)
+#endif
+
#if IS_ENABLED(CONFIG_CGROUP_PERF)
SUBSYS(perf_event)
#endif
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8780312..94610ae 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1572,6 +1572,8 @@ struct super_operations {
#define S_IMA 1024 /* Inode has an associated IMA struct */
#define S_AUTOMOUNT 2048 /* Automount/referral quasi-directory */
#define S_NOSEC 4096 /* no suid or xattr security attributes */
+#define S_ATOMIC_COPY 8192 /* Pages mapped with this inode need to be
+ atomically copied (gem) */
/*
* Note that nosuid etc flags are inode-specific: setting some file-system
@@ -2069,6 +2071,13 @@ extern struct super_block *freeze_bdev(struct block_device *);
extern void emergency_thaw_all(void);
extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
extern int fsync_bdev(struct block_device *);
+extern int fsync_super(struct super_block *);
+extern int fsync_no_super(struct block_device *);
+#define FS_FREEZER_FUSE 1
+#define FS_FREEZER_NORMAL 2
+#define FS_FREEZER_ALL (FS_FREEZER_FUSE | FS_FREEZER_NORMAL)
+void freeze_filesystems(int which);
+void thaw_filesystems(int which);
extern int sb_is_blkdev_sb(struct super_block *sb);
#else
static inline void bd_forget(struct inode *inode) {}
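
freeze_filesystems() and thaw_filesystems() let the hibernation core sync and freeze filesystems in separate passes, with FUSE handled apart from ordinary filesystems. A hedged sketch of the calling pattern the flag values suggest; the surrounding helpers are illustrative and error handling is omitted:

    #include <linux/fs.h>

    static void toi_example_quiesce_filesystems(void)
    {
            /* FUSE filesystems are backed by userspace daemons, so they
             * are presumably dealt with in their own pass... */
            freeze_filesystems(FS_FREEZER_FUSE);

            /* ...with ordinary filesystems frozen separately. */
            freeze_filesystems(FS_FREEZER_NORMAL);
    }

    static void toi_example_thaw_filesystems(void)
    {
            /* Everything can be thawed in one call once resume is under way. */
            thaw_filesystems(FS_FREEZER_ALL);
    }
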
diff --git a/include/linux/fs_uuid.h b/include/linux/fs_uuid.h
new file mode 100644
index 0000000..3234135
--- /dev/null
+++ b/include/linux/fs_uuid.h
@@ -0,0 +1,19 @@
+#include <linux/device.h>
+
+struct hd_struct;
+struct block_device;
+
+struct fs_info {
+ char uuid[16];
+ dev_t dev_t;
+ char *last_mount;
+ int last_mount_size;
+};
+
+int part_matches_fs_info(struct hd_struct *part, struct fs_info *seek);
+dev_t blk_lookup_fs_info(struct fs_info *seek);
+struct fs_info *fs_info_from_block_dev(struct block_device *bdev);
+void free_fs_info(struct fs_info *fs_info);
+int bdev_matches_key(struct block_device *bdev, const char *key);
+struct block_device *next_bdev_of_type(struct block_device *last,
+ const char *key);
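
The new header gives the image writer a UUID-based way to find its storage again at resume time, when device numbering may have changed. A hedged sketch of resolving a device from a previously recorded fs_info, using only the declarations above (the function and variable names are illustrative):

    #include <linux/printk.h>
    #include <linux/fs_uuid.h>

    /* Illustrative: turn the fs_info saved in the image header back
     * into a dev_t on the resumed system. */
    static dev_t toi_example_find_resume_dev(struct fs_info *saved)
    {
            dev_t dev = blk_lookup_fs_info(saved);

            if (!dev)
                    pr_warn("resume device matching saved UUID not found\n");
            return dev;
    }

next_bdev_of_type()/bdev_matches_key() suggest the same information can also be reached by walking block devices that match a key string, though the key values themselves are not shown in this hunk.
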
diff --git a/include/linux/ksm.h b/include/linux/ksm.h
index 3be6bb1..51557d1 100644
--- a/include/linux/ksm.h
+++ b/include/linux/ksm.h
@@ -19,21 +19,6 @@ struct mem_cgroup;
#ifdef CONFIG_KSM
int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
unsigned long end, int advice, unsigned long *vm_flags);
-int __ksm_enter(struct mm_struct *mm);
-void __ksm_exit(struct mm_struct *mm);
-
-static inline int ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
-{
- if (test_bit(MMF_VM_MERGEABLE, &oldmm->flags))
- return __ksm_enter(mm);
- return 0;
-}
-
-static inline void ksm_exit(struct mm_struct *mm)
-{
- if (test_bit(MMF_VM_MERGEABLE, &mm->flags))
- __ksm_exit(mm);
-}
/*
* A KSM page is one of those write-protected "shared pages" or "merged pages"
@@ -76,6 +61,33 @@ struct page *ksm_might_need_to_copy(struct page *page,
int rmap_walk_ksm(struct page *page, struct rmap_walk_control *rwc);
void ksm_migrate_page(struct page *newpage, struct page *oldpage);
+#ifdef CONFIG_KSM_LEGACY
+int __ksm_enter(struct mm_struct *mm);
+void __ksm_exit(struct mm_struct *mm);
+static inline int ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
+{
+ if (test_bit(MMF_VM_MERGEABLE, &oldmm->flags))
+ return __ksm_enter(mm);
+ return 0;
+}
+
+static inline void ksm_exit(struct mm_struct *mm)
+{
+ if (test_bit(MMF_VM_MERGEABLE, &mm->flags))
+ __ksm_exit(mm);
+}
+
+#elif defined(CONFIG_UKSM)
+static inline int ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
+{
+ return 0;
+}
+
+static inline void ksm_exit(struct mm_struct *mm)
+{
+}
+#endif /* !CONFIG_UKSM */
+
#else /* !CONFIG_KSM */
static inline int ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
@@ -123,4 +135,6 @@ static inline void ksm_migrate_page(struct page *newpage, struct page *oldpage)
#endif /* CONFIG_MMU */
#endif /* !CONFIG_KSM */
+#include <linux/uksm.h>
+
#endif /* __LINUX_KSM_H */
diff --git a/include/linux/mm.h b/include/linux/mm.h
index d677706..03f2b4f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2019,6 +2019,7 @@ int drop_caches_sysctl_handler(struct ctl_table *, int,
unsigned long shrink_slab(struct shrink_control *shrink,
unsigned long nr_pages_scanned,
unsigned long lru_pages);
+void drop_pagecache(void);
#ifndef CONFIG_MMU
#define randomize_va_space 0
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 8967e20..c0b9906 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -308,6 +308,9 @@ struct vm_area_struct {
#ifdef CONFIG_NUMA
struct mempolicy *vm_policy; /* NUMA policy for the VMA */
#endif
+#ifdef CONFIG_UKSM
+ struct vma_slot *uksm_vma_slot;
+#endif
};
struct core_thread {
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index fac5509..75911e8 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -147,6 +147,9 @@ enum zone_stat_item {
WORKINGSET_NODERECLAIM,
NR_ANON_TRANSPARENT_HUGEPAGES,
NR_FREE_CMA_PAGES,
+#ifdef CONFIG_UKSM
+ NR_UKSM_ZERO_PAGES,
+#endif
NR_VM_ZONE_STAT_ITEMS };
/*
@@ -879,7 +882,7 @@ static inline int is_highmem_idx(enum zone_type idx)
}
/**
- * is_highmem - helper function to quickly check if a struct zone is a
+ * is_highmem - helper function to quickly check if a struct zone is a
* highmem zone or not. This is an attempt to keep references
* to ZONE_{DMA/NORMAL/HIGHMEM/etc} in general code to a minimum.
* @zone - pointer to struct zone variable
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index d1fe1a7..7cef674 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -109,6 +109,12 @@ enum pageflags {
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
PG_compound_lock,
#endif
+#ifdef CONFIG_TOI_INCREMENTAL
+ PG_toi_ignore, /* Ignore this page */
+ PG_toi_ro, /* Page was made RO by TOI */
+ PG_toi_cbw, /* Copy the page before it is written to */
+ PG_toi_dirty, /* Page has been modified */
+#endif
__NR_PAGEFLAGS,
/* Filesystems */
@@ -274,6 +280,12 @@ TESTSCFLAG(HWPoison, hwpoison)
PAGEFLAG_FALSE(HWPoison)
#define __PG_HWPOISON 0
#endif
+#ifdef CONFIG_TOI_INCREMENTAL
+PAGEFLAG(TOI_RO, toi_ro)
+PAGEFLAG(TOI_Dirty, toi_dirty)
+PAGEFLAG(TOI_Ignore, toi_ignore)
+PAGEFLAG(TOI_CBW, toi_cbw)
+#endif
u64 stable_page_flags(struct page *page);
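Each PAGEFLAG() line above generates the usual Page/SetPage/ClearPage accessors for the new TuxOnIce incremental-image flags. A hedged sketch of how a write-tracking hook might consult them follows; the hook itself is hypothetical, not code from the patch.

/* Hedged sketch, not patch code: PAGEFLAG(TOI_CBW, toi_cbw) and friends above
 * expand into PageTOI_CBW(), SetPageTOI_CBW(), ClearPageTOI_CBW(), etc.
 * example_note_page_write() is a hypothetical write-fault hook. */
#include <linux/types.h>
#include <linux/page-flags.h>
#include <linux/mm_types.h>

#ifdef CONFIG_TOI_INCREMENTAL
static bool example_note_page_write(struct page *page)
{
        if (PageTOI_Ignore(page))
                return false;                   /* page excluded from tracking */

        if (PageTOI_RO(page) && !PageTOI_Dirty(page)) {
                SetPageTOI_Dirty(page);         /* first write since the image was kept */
                return PageTOI_CBW(page);       /* caller should copy before writing */
        }

        return false;
}
#endif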
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 4d1771c..20ab982 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -46,9 +46,10 @@ static inline struct shmem_inode_info *SHMEM_I(struct inode *inode)
extern int shmem_init(void);
extern int shmem_fill_super(struct super_block *sb, void *data, int silent);
extern struct file *shmem_file_setup(const char *name,
- loff_t size, unsigned long flags);
+ loff_t size, unsigned long flags,
+ int atomic_copy);
extern struct file *shmem_kernel_file_setup(const char *name, loff_t size,
- unsigned long flags);
+ unsigned long flags, int atomic_copy);
extern int shmem_zero_setup(struct vm_area_struct *);
extern int shmem_lock(struct file *file, int lock, struct user_struct *user);
extern bool shmem_mapping(struct address_space *mapping);
diff --git a/include/linux/sradix-tree.h b/include/linux/sradix-tree.h
new file mode 100644
index 0000000..6780fdb
--- /dev/null
+++ b/include/linux/sradix-tree.h
@@ -0,0 +1,77 @@
+#ifndef _LINUX_SRADIX_TREE_H
+#define _LINUX_SRADIX_TREE_H
+
+
+#define INIT_SRADIX_TREE(root, mask) \
+do { \
+ (root)->height = 0; \
+ (root)->gfp_mask = (mask); \
+ (root)->rnode = NULL; \
+} while (0)
+
+#define ULONG_BITS (sizeof(unsigned long) * 8)
+#define SRADIX_TREE_INDEX_BITS (8 /* CHAR_BIT */ * sizeof(unsigned long))
+//#define SRADIX_TREE_MAP_SHIFT 6
+//#define SRADIX_TREE_MAP_SIZE (1UL << SRADIX_TREE_MAP_SHIFT)
+//#define SRADIX_TREE_MAP_MASK (SRADIX_TREE_MAP_SIZE-1)
+
+struct sradix_tree_node {
+ unsigned int height; /* Height from the bottom */
+ unsigned int count;
+ unsigned int fulls; /* Number of full sublevel trees */
+ struct sradix_tree_node *parent;
+ void *stores[0];
+};
+
+/* A simple radix tree implementation */
+struct sradix_tree_root {
+ unsigned int height;
+ struct sradix_tree_node *rnode;
+
+ /* Where found to have available empty stores in its sublevels */
+ struct sradix_tree_node *enter_node;
+ unsigned int shift;
+ unsigned int stores_size;
+ unsigned int mask;
+ unsigned long min; /* The first hole index */
+ unsigned long num;
+ //unsigned long *height_to_maxindex;
+
+ /* How the node is allocated and freed. */
+ struct sradix_tree_node *(*alloc)(void);
+ void (*free)(struct sradix_tree_node *node);
+
+ /* When a new node is added and removed */
+ void (*extend)(struct sradix_tree_node *parent, struct sradix_tree_node *child);
+ void (*assign)(struct sradix_tree_node *node, unsigned index, void *item);
+ void (*rm)(struct sradix_tree_node *node, unsigned offset);
+};
+
+struct sradix_tree_path {
+ struct sradix_tree_node *node;
+ int offset;
+};
+
+static inline
+void init_sradix_tree_root(struct sradix_tree_root *root, unsigned long shift)
+{
+ root->height = 0;
+ root->rnode = NULL;
+ root->shift = shift;
+ root->stores_size = 1UL << shift;
+ root->mask = root->stores_size - 1;
+}
+
+
+extern void *sradix_tree_next(struct sradix_tree_root *root,
+ struct sradix_tree_node *node, unsigned long index,
+ int (*iter)(void *, unsigned long));
+
+extern int sradix_tree_enter(struct sradix_tree_root *root, void **item, int num);
+
+extern void sradix_tree_delete_from_leaf(struct sradix_tree_root *root,
+ struct sradix_tree_node *node, unsigned long index);
+
+extern void *sradix_tree_lookup(struct sradix_tree_root *root, unsigned long index);
+
+#endif /* _LINUX_SRADIX_TREE_H */
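A short sketch of how a client might wire up this simplified radix tree follows; the node allocator, the shift value, and the empty extend/assign/rm hooks are assumptions for illustration only (real users such as the UKSM code install meaningful callbacks).

/* Hedged sketch, not patch code: a minimal client of the sradix tree API
 * declared above. */
#include <linux/sradix-tree.h>
#include <linux/slab.h>

#define EXAMPLE_SHIFT 4         /* 16 stores per node, purely an example */

static struct sradix_tree_node *example_alloc_node(void)
{
        return kzalloc(sizeof(struct sradix_tree_node) +
                       (1UL << EXAMPLE_SHIFT) * sizeof(void *), GFP_KERNEL);
}

static void example_free_node(struct sradix_tree_node *node)
{
        kfree(node);
}

static void example_assign(struct sradix_tree_node *node, unsigned index,
                           void *item) { }
static void example_rm(struct sradix_tree_node *node, unsigned offset) { }
static void example_extend(struct sradix_tree_node *parent,
                           struct sradix_tree_node *child) { }

static int example_store_item(void *item)
{
        struct sradix_tree_root root = {};

        init_sradix_tree_root(&root, EXAMPLE_SHIFT);
        root.alloc = example_alloc_node;
        root.free = example_free_node;
        root.extend = example_extend;
        root.assign = example_assign;
        root.rm = example_rm;

        /* Place one item in the first free slot; its index ends up in root.min. */
        return sradix_tree_enter(&root, &item, 1);
}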
diff --git a/include/linux/suspend.h b/include/linux/suspend.h
index f73cabf..5fde316 100644
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -419,6 +419,74 @@ extern bool pm_print_times_enabled;
#define pm_print_times_enabled (false)
#endif
+enum {
+ TOI_CAN_HIBERNATE,
+ TOI_CAN_RESUME,
+ TOI_RESUME_DEVICE_OK,
+ TOI_NORESUME_SPECIFIED,
+ TOI_SANITY_CHECK_PROMPT,
+ TOI_CONTINUE_REQ,
+ TOI_RESUMED_BEFORE,
+ TOI_BOOT_TIME,
+ TOI_NOW_RESUMING,
+ TOI_IGNORE_LOGLEVEL,
+ TOI_TRYING_TO_RESUME,
+ TOI_LOADING_ALT_IMAGE,
+ TOI_STOP_RESUME,
+ TOI_IO_STOPPED,
+ TOI_NOTIFIERS_PREPARE,
+ TOI_CLUSTER_MODE,
+ TOI_BOOT_KERNEL,
+ TOI_DEVICE_HOTPLUG_LOCKED,
+};
+
+#ifdef CONFIG_TOI
+
+/* Used in init dir files */
+extern unsigned long toi_state;
+#define set_toi_state(bit) (set_bit(bit, &toi_state))
+#define clear_toi_state(bit) (clear_bit(bit, &toi_state))
+#define test_toi_state(bit) (test_bit(bit, &toi_state))
+extern int toi_running;
+
+#define test_action_state(bit) (test_bit(bit, &toi_bkd.toi_action))
+extern int try_tuxonice_hibernate(void);
+
+#else /* !CONFIG_TOI */
+
+#define toi_state (0)
+#define set_toi_state(bit) do { } while (0)
+#define clear_toi_state(bit) do { } while (0)
+#define test_toi_state(bit) (0)
+#define toi_running (0)
+
+static inline int try_tuxonice_hibernate(void) { return 0; }
+#define test_action_state(bit) (0)
+
+#endif /* CONFIG_TOI */
+
+#ifdef CONFIG_HIBERNATION
+#ifdef CONFIG_TOI
+extern void try_tuxonice_resume(void);
+#else
+#define try_tuxonice_resume() do { } while (0)
+#endif
+
+extern int resume_attempted;
+extern int software_resume(void);
+
+static inline void check_resume_attempted(void)
+{
+ if (resume_attempted)
+ return;
+
+ software_resume();
+}
+#else
+#define check_resume_attempted() do { } while (0)
+#define resume_attempted (0)
+#endif
+
#ifdef CONFIG_PM_AUTOSLEEP
/* kernel/power/autosleep.c */
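The following sketch shows how early-boot code might use the toi_state helpers and check_resume_attempted() declared above; the calling function is hypothetical and only the macros come from the header.

/* Hedged sketch, not patch code: using the toi_state helpers from
 * <linux/suspend.h>. */
#include <linux/suspend.h>
#include <linux/printk.h>

static void example_early_resume_check(void)
{
        if (test_toi_state(TOI_NORESUME_SPECIFIED)) {
                pr_info("example: noresume was given on the command line\n");
                return;
        }

        set_toi_state(TOI_BOOT_TIME);

        /* Falls through to software_resume() if no resume attempt has been
         * made yet; compiles away when CONFIG_HIBERNATION is not set. */
        check_resume_attempted();

        clear_toi_state(TOI_BOOT_TIME);
}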
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 3507115..c133781 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -301,6 +301,7 @@ extern unsigned long totalram_pages;
extern unsigned long totalreserve_pages;
extern unsigned long dirty_balance_reserve;
extern unsigned long nr_free_buffer_pages(void);
+extern unsigned long nr_unallocated_buffer_pages(void);
extern unsigned long nr_free_pagecache_pages(void);
/* Definition of global_page_state not available yet */
@@ -350,6 +351,8 @@ extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
struct zone *zone,
unsigned long *nr_scanned);
extern unsigned long shrink_all_memory(unsigned long nr_pages);
+extern unsigned long shrink_memory_mask(unsigned long nr_to_reclaim,
+ gfp_t mask);
extern int vm_swappiness;
extern int remove_mapping(struct address_space *mapping, struct page *page);
extern unsigned long vm_total_pages;
@@ -463,13 +466,17 @@ extern void swapcache_free(swp_entry_t, struct page *page);
extern int free_swap_and_cache(swp_entry_t);
extern int swap_type_of(dev_t, sector_t, struct block_device **);
extern unsigned int count_swap_pages(int, int);
+extern sector_t map_swap_entry(swp_entry_t entry, struct block_device **);
extern sector_t map_swap_page(struct page *, struct block_device **);
extern sector_t swapdev_block(int, pgoff_t);
+extern struct swap_info_struct *get_swap_info_struct(unsigned);
extern int page_swapcount(struct page *);
extern struct swap_info_struct *page_swap_info(struct page *);
extern int reuse_swap_page(struct page *);
extern int try_to_free_swap(struct page *);
struct backing_dev_info;
+extern void get_swap_range_of_type(int type, swp_entry_t *start,
+ swp_entry_t *end, unsigned int limit);
#ifdef CONFIG_MEMCG
extern void
diff --git a/include/linux/uksm.h b/include/linux/uksm.h
new file mode 100644
index 0000000..a644bca
--- /dev/null
+++ b/include/linux/uksm.h
@@ -0,0 +1,146 @@
+#ifndef __LINUX_UKSM_H
+#define __LINUX_UKSM_H
+/*
+ * Memory merging support.
+ *
+ * This code enables dynamic sharing of identical pages found in different
+ * memory areas, even if they are not shared by fork().
+ */
+
+/* if !CONFIG_UKSM this file should not be compiled at all. */
+#ifdef CONFIG_UKSM
+
+#include <linux/bitops.h>
+#include <linux/mm.h>
+#include <linux/pagemap.h>
+#include <linux/rmap.h>
+#include <linux/sched.h>
+
+extern unsigned long zero_pfn __read_mostly;
+extern unsigned long uksm_zero_pfn __read_mostly;
+extern struct page *empty_uksm_zero_page;
+
+/* must be done before linked to mm */
+extern void uksm_vma_add_new(struct vm_area_struct *vma);
+extern void uksm_remove_vma(struct vm_area_struct *vma);
+
+#define UKSM_SLOT_NEED_SORT (1 << 0)
+#define UKSM_SLOT_NEED_RERAND (1 << 1)
+#define UKSM_SLOT_SCANNED (1 << 2) /* It's scanned in this round */
+#define UKSM_SLOT_FUL_SCANNED (1 << 3)
+#define UKSM_SLOT_IN_UKSM (1 << 4)
+
+struct vma_slot {
+ struct sradix_tree_node *snode;
+ unsigned long sindex;
+
+ struct list_head slot_list;
+ unsigned long fully_scanned_round;
+ unsigned long dedup_num;
+ unsigned long pages_scanned;
+ unsigned long last_scanned;
+ unsigned long pages_to_scan;
+ struct scan_rung *rung;
+ struct page **rmap_list_pool;
+ unsigned int *pool_counts;
+ unsigned long pool_size;
+ struct vm_area_struct *vma;
+ struct mm_struct *mm;
+ unsigned long ctime_j;
+ unsigned long pages;
+ unsigned long flags;
+ unsigned long pages_cowed; /* pages cowed this round */
+ unsigned long pages_merged; /* pages merged this round */
+ unsigned long pages_bemerged;
+
+ /* when it has page merged in this eval round */
+ struct list_head dedup_list;
+};
+
+static inline void uksm_unmap_zero_page(pte_t pte)
+{
+ if (pte_pfn(pte) == uksm_zero_pfn)
+ __dec_zone_page_state(empty_uksm_zero_page, NR_UKSM_ZERO_PAGES);
+}
+
+static inline void uksm_map_zero_page(pte_t pte)
+{
+ if (pte_pfn(pte) == uksm_zero_pfn)
+ __inc_zone_page_state(empty_uksm_zero_page, NR_UKSM_ZERO_PAGES);
+}
+
+static inline void uksm_cow_page(struct vm_area_struct *vma, struct page *page)
+{
+ if (vma->uksm_vma_slot && PageKsm(page))
+ vma->uksm_vma_slot->pages_cowed++;
+}
+
+static inline void uksm_cow_pte(struct vm_area_struct *vma, pte_t pte)
+{
+ if (vma->uksm_vma_slot && pte_pfn(pte) == uksm_zero_pfn)
+ vma->uksm_vma_slot->pages_cowed++;
+}
+
+static inline int uksm_flags_can_scan(unsigned long vm_flags)
+{
+#ifndef VM_SAO
+#define VM_SAO 0
+#endif
+ return !(vm_flags & (VM_PFNMAP | VM_IO | VM_DONTEXPAND |
+ VM_HUGETLB | VM_NONLINEAR | VM_MIXEDMAP |
+ VM_SHARED | VM_MAYSHARE | VM_GROWSUP | VM_GROWSDOWN | VM_SAO));
+}
+
+static inline void uksm_vm_flags_mod(unsigned long *vm_flags_p)
+{
+ if (uksm_flags_can_scan(*vm_flags_p))
+ *vm_flags_p |= VM_MERGEABLE;
+}
+
+/*
+ * Just a wrapper for BUG_ON for where ksm_zeropage must not be. TODO: it will
+ * be removed when uksm zero page patch is stable enough.
+ */
+static inline void uksm_bugon_zeropage(pte_t pte)
+{
+ BUG_ON(pte_pfn(pte) == uksm_zero_pfn);
+}
+#else
+static inline void uksm_vma_add_new(struct vm_area_struct *vma)
+{
+}
+
+static inline void uksm_remove_vma(struct vm_area_struct *vma)
+{
+}
+
+static inline void uksm_unmap_zero_page(pte_t pte)
+{
+}
+
+static inline void uksm_map_zero_page(pte_t pte)
+{
+}
+
+static inline void uksm_cow_page(struct vm_area_struct *vma, struct page *page)
+{
+}
+
+static inline void uksm_cow_pte(struct vm_area_struct *vma, pte_t pte)
+{
+}
+
+static inline int uksm_flags_can_scan(unsigned long vm_flags)
+{
+ return 0;
+}
+
+static inline void uksm_vm_flags_mod(unsigned long *vm_flags_p)
+{
+}
+
+static inline void uksm_bugon_zeropage(pte_t pte)
+{
+}
+#endif /* !CONFIG_UKSM */
+#endif /* __LINUX_UKSM_H */
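A brief sketch of where the zero-page accounting helpers above are meant to be called from follows; the PTE-clearing function is illustrative only, the real call sites presumably live in the mm/ portions of this patch.

/* Hedged sketch, not patch code: keeping the UKSM zero-page statistics and
 * the per-vma_slot COW count in step when a PTE is torn down. */
#include <linux/mm.h>
#include <linux/uksm.h>

static void example_clear_pte(struct vm_area_struct *vma, unsigned long addr,
                              pte_t *ptep)
{
        pte_t old = *ptep;

        /* Account for a disappearing zero-page mapping and charge the
         * copy-on-write to the owning vma_slot. */
        uksm_unmap_zero_page(old);
        uksm_cow_pte(vma, old);

        pte_clear(vma->vm_mm, addr, ptep);
}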
diff --git a/include/uapi/linux/netlink.h b/include/uapi/linux/netlink.h
index 1a85940..3025606 100644
--- a/include/uapi/linux/netlink.h
+++ b/include/uapi/linux/netlink.h
@@ -27,6 +27,8 @@
#define NETLINK_ECRYPTFS 19
#define NETLINK_RDMA 20
#define NETLINK_CRYPTO 21 /* Crypto layer */
+#define NETLINK_TOI_USERUI 22 /* TuxOnIce's userui */
+#define NETLINK_TOI_USM 23 /* Userspace storage manager */
#define NETLINK_INET_DIAG NETLINK_SOCK_DIAG
diff --git a/init/do_mounts.c b/init/do_mounts.c
index 82f2288..e35fb52 100644
--- a/init/do_mounts.c
+++ b/init/do_mounts.c
@@ -285,6 +285,7 @@ fail:
done:
return res;
}
+EXPORT_SYMBOL_GPL(name_to_dev_t);
static int __init root_dev_setup(char *line)
{
@@ -586,6 +587,8 @@ void __init prepare_namespace(void)
if (is_floppy && rd_doload && rd_load_disk(0))
ROOT_DEV = Root_RAM0;
+ check_resume_attempted();
+
mount_root();
out:
devtmpfs_mount("dev");
diff --git a/init/do_mounts_initrd.c b/init/do_mounts_initrd.c
index 3e0878e..a49c596 100644
--- a/init/do_mounts_initrd.c
+++ b/init/do_mounts_initrd.c
@@ -15,6 +15,7 @@
#include <linux/romfs_fs.h>
#include <linux/initrd.h>
#include <linux/sched.h>
+#include <linux/suspend.h>
#include <linux/freezer.h>
#include <linux/kmod.h>
@@ -79,6 +80,11 @@ static void __init handle_initrd(void)
current->flags &= ~PF_FREEZER_SKIP;
+ if (!resume_attempted)
+ printk(KERN_ERR "TuxOnIce: No attempt was made to resume from "
+ "any image that might exist.\n");
+ clear_toi_state(TOI_BOOT_TIME);
+
/* move initrd to rootfs' /old */
sys_mount("..", ".", NULL, MS_MOVE, NULL);
/* switch root and cwd back to / of rootfs */
diff --git a/init/main.c b/init/main.c
index 48655ce..d08511c 100644
--- a/init/main.c
+++ b/init/main.c
@@ -123,6 +123,7 @@ void (*__initdata late_time_init)(void);
char __initdata boot_command_line[COMMAND_LINE_SIZE];
/* Untouched saved command line (eg. for /proc) */
char *saved_command_line;
+EXPORT_SYMBOL_GPL(saved_command_line);
/* Command line for parameter parsing */
static char *static_command_line;
/* Command line for per-initcall parameter parsing */
diff --git a/ipc/shm.c b/ipc/shm.c
index 7645961..c1b7257 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -537,7 +537,7 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
if ((shmflg & SHM_NORESERVE) &&
sysctl_overcommit_memory != OVERCOMMIT_NEVER)
acctflag = VM_NORESERVE;
- file = shmem_file_setup(name, size, acctflag);
+ file = shmem_file_setup(name, size, acctflag, 0);
}
error = PTR_ERR(file);
if (IS_ERR(file))
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 247979a..ac131de 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -542,6 +542,7 @@ int disable_nonboot_cpus(void)
cpu_maps_update_done();
return error;
}
+EXPORT_SYMBOL_GPL(disable_nonboot_cpus);
void __weak arch_enable_nonboot_cpus_begin(void)
{
@@ -580,6 +581,7 @@ void __ref enable_nonboot_cpus(void)
out:
cpu_maps_update_done();
}
+EXPORT_SYMBOL_GPL(enable_nonboot_cpus);
static int __init alloc_frozen_cpus(void)
{
diff --git a/kernel/fork.c b/kernel/fork.c
index 54a8d26..16be4e1 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -398,7 +398,7 @@ static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
goto fail_nomem;
charge = len;
}
- tmp = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
+ tmp = kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL);
if (!tmp)
goto fail_nomem;
*tmp = *mpnt;
@@ -453,7 +453,7 @@ static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
__vma_link_rb(mm, tmp, rb_link, rb_parent);
rb_link = &tmp->vm_rb.rb_right;
rb_parent = &tmp->vm_rb;
-
+ uksm_vma_add_new(tmp);
mm->map_count++;
retval = copy_page_range(mm, oldmm, mpnt);
diff --git a/kernel/kmod.c b/kernel/kmod.c
index 6b375af..e426523 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -461,6 +461,7 @@ void __usermodehelper_set_disable_depth(enum umh_disable_depth depth)
wake_up(&usermodehelper_disabled_waitq);
up_write(&umhelper_sem);
}
+EXPORT_SYMBOL_GPL(__usermodehelper_set_disable_depth);
/**
* __usermodehelper_disable - Prevent new helpers from being started.
@@ -494,6 +495,7 @@ int __usermodehelper_disable(enum umh_disable_depth depth)
__usermodehelper_set_disable_depth(UMH_ENABLED);
return -EAGAIN;
}
+EXPORT_SYMBOL_GPL(__usermodehelper_disable);
static void helper_lock(void)
{
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 9a130ec..47605e1 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -550,6 +550,8 @@ int kthread_worker_fn(void *worker_ptr)
WARN_ON(worker->task);
worker->task = current;
+ set_freezable();
+
repeat:
set_current_state(TASK_INTERRUPTIBLE); /* mb paired w/ kthread_stop */
diff --git a/kernel/pid.c b/kernel/pid.c
index 9b9a266..ad91ea4 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -450,6 +450,7 @@ struct task_struct *find_task_by_pid_ns(pid_t nr, struct pid_namespace *ns)
" protection");
return pid_task(find_pid_ns(nr, ns), PIDTYPE_PID);
}
+EXPORT_SYMBOL_GPL(find_task_by_pid_ns);
struct task_struct *find_task_by_vpid(pid_t vnr)
{
diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig
index 2fac9cc..bd51a10 100644
--- a/kernel/power/Kconfig
+++ b/kernel/power/Kconfig
@@ -91,6 +91,291 @@ config PM_STD_PARTITION
suspended image to. It will simply pick the first available swap
device.
+menuconfig TOI_CORE
+ tristate "Enhanced Hibernation (TuxOnIce)"
+ depends on HIBERNATION
+ default y
+ ---help---
+ TuxOnIce is the 'new and improved' suspend support.
+
+ See the TuxOnIce home page (tuxonice.net)
+ for FAQs, HOWTOs and other documentation.
+
+ comment "Image Storage (you need at least one allocator)"
+ depends on TOI_CORE
+
+ config TOI_FILE
+ tristate "File Allocator"
+ depends on TOI_CORE
+ default y
+ ---help---
+ This option enables support for storing an image in a
+ simple file. You might want this if your swap is
+ sometimes full enough that you don't have enough spare
+ space to store an image.
+
+ config TOI_SWAP
+ tristate "Swap Allocator"
+ depends on TOI_CORE && SWAP
+ default y
+ ---help---
+ This option enables support for storing an image in your
+ swap space.
+
+ comment "General Options"
+ depends on TOI_CORE
+
+ config TOI_PRUNE
+ tristate "Image pruning support"
+ depends on TOI_CORE && CRYPTO && BROKEN
+ default y
+ ---help---
+ This option adds support for using cryptoapi hashing
+ algorithms to identify pages with the same content. We
+ then write a much smaller pointer to the first copy of
+ the data instead of a complete (perhaps compressed)
+ additional copy.
+
+ You probably want this, so say Y here.
+
+ comment "No image pruning support available without Cryptoapi support."
+ depends on TOI_CORE && !CRYPTO
+
+ config TOI_CRYPTO
+ tristate "Compression support"
+ depends on TOI_CORE && CRYPTO
+ default y
+ ---help---
+ This option adds support for using cryptoapi compression
+ algorithms. Compression is particularly useful as it can
+ more than double your suspend and resume speed (depending
+ upon how well your image compresses).
+
+ You probably want this, so say Y here.
+
+ comment "No compression support available without Cryptoapi support."
+ depends on TOI_CORE && !CRYPTO
+
+ config TOI_USERUI
+ tristate "Userspace User Interface support"
+ depends on TOI_CORE && NET && (VT || SERIAL_CONSOLE)
+ default y
+ ---help---
+	  This option enables support for a userspace-based user interface
+ to TuxOnIce, which allows you to have a nice display while suspending
+ and resuming, and also enables features such as pressing escape to
+ cancel a cycle or interactive debugging.
+
+ config TOI_USERUI_DEFAULT_PATH
+ string "Default userui program location"
+ default "/usr/local/sbin/tuxoniceui_text"
+ depends on TOI_USERUI
+ ---help---
+ This entry allows you to specify a default path to the userui binary.
+
+ config TOI_DEFAULT_IMAGE_SIZE_LIMIT
+ int "Default image size limit"
+ range -2 65536
+ default "-2"
+ depends on TOI_CORE
+ ---help---
+ This entry allows you to specify a default image size limit. It can
+ be overridden at run-time using /sys/power/tuxonice/image_size_limit.
+
+ config TOI_KEEP_IMAGE
+ bool "Allow Keep Image Mode"
+ depends on TOI_CORE
+ ---help---
+	  This option allows you to keep an image and reuse it. It is intended
+ __ONLY__ for use with systems where all filesystems are mounted read-
+ only (kiosks, for example). To use it, compile this option in and boot
+ normally. Set the KEEP_IMAGE flag in /sys/power/tuxonice and suspend.
+ When you resume, the image will not be removed. You will be unable to turn
+ off swap partitions (assuming you are using the swap allocator), but future
+ suspends simply do a power-down. The image can be updated using the
+ kernel command line parameter suspend_act= to turn off the keep image
+ bit. Keep image mode is a little less user friendly on purpose - it
+ should not be used without thought!
+
+ config TOI_INCREMENTAL
+ bool "Incremental Image Support"
+ depends on TOI_CORE && 64BIT
+ default n
+ ---help---
+ This option enables the work in progress toward using the dirty page
+ tracking to record changes to pages. It is hoped that
+ this will be an initial step toward implementing storing just
+ the differences between consecutive images, which will
+ increase the amount of storage needed for the image, but also
+ increase the speed at which writing an image occurs and
+ reduce the wear and tear on drives.
+
+ At the moment, all that is implemented is the first step of keeping
+ an existing image and then comparing it to the contents in memory
+ (by setting /sys/power/tuxonice/verify_image to 1 and triggering a
+ (fake) resume) to see what the page change tracking should find to be
+ different. If you have verify_image set to 1, TuxOnIce will automatically
+ invalidate the old image when you next try to hibernate, so there's no
+ greater chance of disk corruption than normal.
+
+ comment "No incremental image support available without Keep Image support."
+ depends on TOI_CORE && !TOI_KEEP_IMAGE
+
+ config TOI_REPLACE_SWSUSP
+ bool "Replace swsusp by default"
+ default y
+ depends on TOI_CORE
+ ---help---
+ TuxOnIce can replace swsusp. This option makes that the default state,
+ requiring you to echo 0 > /sys/power/tuxonice/replace_swsusp if you want
+ to use the vanilla kernel functionality. Note that your initrd/ramfs will
+ need to do this before trying to resume, too.
+ With overriding swsusp enabled, echoing disk to /sys/power/state will
+ start a TuxOnIce cycle. If resume= doesn't specify an allocator and both
+ the swap and file allocators are compiled in, the swap allocator will be
+ used by default.
+
+ config TOI_IGNORE_LATE_INITCALL
+ bool "Wait for initrd/ramfs to run, by default"
+ default n
+ depends on TOI_CORE
+ ---help---
+ When booting, TuxOnIce can check for an image and start to resume prior
+ to any initrd/ramfs running (via a late initcall).
+
+ If you don't have an initrd/ramfs, this is what you want to happen -
+ otherwise you won't be able to safely resume. You should set this option
+ to 'No'.
+
+ If, however, you want your initrd/ramfs to run anyway before resuming,
+ you need to tell TuxOnIce to ignore that earlier opportunity to resume.
+ This can be done either by using this compile time option, or by
+ overriding this option with the boot-time parameter toi_initramfs_resume_only=1.
+
+ Note that if TuxOnIce can't resume at the earlier opportunity, the
+ value of this option won't matter - the initramfs/initrd (if any) will
+ run anyway.
+
+ menuconfig TOI_CLUSTER
+ tristate "Cluster support"
+ default n
+ depends on TOI_CORE && NET && BROKEN
+ ---help---
+ Support for linking multiple machines in a cluster so that they suspend
+ and resume together.
+
+ config TOI_DEFAULT_CLUSTER_INTERFACE
+ string "Default cluster interface"
+ depends on TOI_CLUSTER
+ ---help---
+ The default interface on which to communicate with other nodes in
+ the cluster.
+
+ If no value is set here, cluster support will be disabled by default.
+
+ config TOI_DEFAULT_CLUSTER_KEY
+ string "Default cluster key"
+ default "Default"
+ depends on TOI_CLUSTER
+ ---help---
+ The default key used by this node. All nodes in the same cluster
+	  have the same key. Multiple clusters may coexist on the same LAN
+ by using different values for this key.
+
+ config TOI_CLUSTER_IMAGE_TIMEOUT
+ int "Timeout when checking for image"
+ default 15
+ depends on TOI_CLUSTER
+ ---help---
+ Timeout (seconds) before continuing to boot when waiting to see
+ whether other nodes might have an image. Set to -1 to wait
+	  indefinitely. If WAIT_UNTIL_NODES is non-zero, we might continue
+ booting sooner than this timeout.
+
+ config TOI_CLUSTER_WAIT_UNTIL_NODES
+ int "Nodes without image before continuing"
+ default 0
+ depends on TOI_CLUSTER
+ ---help---
+ When booting and no image is found, we wait to see if other nodes
+ have an image before continuing to boot. This value lets us
+ continue after seeing a certain number of nodes without an image,
+ instead of continuing to wait for the timeout. Set to 0 to only
+ use the timeout.
+
+ config TOI_DEFAULT_CLUSTER_PRE_HIBERNATE
+ string "Default pre-hibernate script"
+ depends on TOI_CLUSTER
+ ---help---
+ The default script to be called when starting to hibernate.
+
+ config TOI_DEFAULT_CLUSTER_POST_HIBERNATE
+ string "Default post-hibernate script"
+ depends on TOI_CLUSTER
+ ---help---
+ The default script to be called after resuming from hibernation.
+
+ config TOI_DEFAULT_WAIT
+ int "Default waiting time for emergency boot messages"
+ default "25"
+ range -1 32768
+ depends on TOI_CORE
+ help
+ TuxOnIce can display warnings very early in the process of resuming,
+ if (for example) it appears that you have booted a kernel that doesn't
+ match an image on disk. It can then give you the opportunity to either
+ continue booting that kernel, or reboot the machine. This option can be
+ used to control how long to wait in such circumstances. -1 means wait
+ forever. 0 means don't wait at all (do the default action, which will
+ generally be to continue booting and remove the image). Values of 1 or
+ more indicate a number of seconds (up to 255) to wait before doing the
+ default.
+
+ config TOI_DEFAULT_EXTRA_PAGES_ALLOWANCE
+ int "Default extra pages allowance"
+ default "2000"
+ range 500 32768
+ depends on TOI_CORE
+ help
+ This value controls the default for the allowance TuxOnIce makes for
+ drivers to allocate extra memory during the atomic copy. The default
+ value of 2000 will be okay in most cases. If you are using
+ DRI, the easiest way to find what value to use is to try to hibernate
+ and look at how many pages were actually needed in the sysfs entry
+ /sys/power/tuxonice/debug_info (first number on the last line), adding
+ a little extra because the value is not always the same.
+
+ config TOI_CHECKSUM
+ bool "Checksum pageset2"
+ default n
+ depends on TOI_CORE
+ select CRYPTO
+ select CRYPTO_ALGAPI
+ select CRYPTO_MD4
+ ---help---
+ Adds support for checksumming pageset2 pages, to ensure you really get an
+ atomic copy. Since some filesystems (XFS especially) change metadata even
+ when there's no other activity, we need this to check for pages that have
+ been changed while we were saving the page cache. If your debugging output
+ always says no pages were resaved, you may be able to safely disable this
+ option.
+
+config TOI
+ bool
+ depends on TOI_CORE!=n
+ default y
+
+config TOI_EXPORTS
+ bool
+ depends on TOI_SWAP=m || TOI_FILE=m || \
+ TOI_CRYPTO=m || TOI_CLUSTER=m || \
+ TOI_USERUI=m || TOI_CORE=m
+ default y
+
+config TOI_ZRAM_SUPPORT
+ def_bool y
+ depends on TOI && ZRAM!=n
+
config PM_SLEEP
def_bool y
depends on SUSPEND || HIBERNATE_CALLBACKS
diff --git a/kernel/power/Makefile b/kernel/power/Makefile
index 29472bf..dd5d4f2 100644
--- a/kernel/power/Makefile
+++ b/kernel/power/Makefile
@@ -1,6 +1,37 @@
ccflags-$(CONFIG_PM_DEBUG) := -DDEBUG
+tuxonice_core-y := tuxonice_modules.o
+
+obj-$(CONFIG_TOI) += tuxonice_builtin.o
+
+tuxonice_core-$(CONFIG_PM_DEBUG) += tuxonice_alloc.o
+
+# Compile these in after allocation debugging, if used.
+
+tuxonice_core-y += tuxonice_sysfs.o tuxonice_highlevel.o \
+ tuxonice_io.o tuxonice_pagedir.o tuxonice_prepare_image.o \
+ tuxonice_extent.o tuxonice_pageflags.o tuxonice_ui.o \
+ tuxonice_power_off.o tuxonice_atomic_copy.o
+
+tuxonice_core-$(CONFIG_TOI_CHECKSUM) += tuxonice_checksum.o
+
+tuxonice_core-$(CONFIG_NET) += tuxonice_storage.o tuxonice_netlink.o
+
+obj-$(CONFIG_TOI_CORE) += tuxonice_core.o
+obj-$(CONFIG_TOI_PRUNE) += tuxonice_prune.o
+obj-$(CONFIG_TOI_INCREMENTAL) += tuxonice_incremental.o
+obj-$(CONFIG_TOI_CRYPTO) += tuxonice_compress.o
+
+tuxonice_bio-y := tuxonice_bio_core.o tuxonice_bio_chains.o \
+ tuxonice_bio_signature.o
+
+obj-$(CONFIG_TOI_SWAP) += tuxonice_bio.o tuxonice_swap.o
+obj-$(CONFIG_TOI_FILE) += tuxonice_bio.o tuxonice_file.o
+obj-$(CONFIG_TOI_CLUSTER) += tuxonice_cluster.o
+
+obj-$(CONFIG_TOI_USERUI) += tuxonice_userui.o
+
obj-y += qos.o
obj-$(CONFIG_PM) += main.o
obj-$(CONFIG_VT_CONSOLE_SLEEP) += console.o
diff --git a/kernel/power/console.c b/kernel/power/console.c
index aba9c54..856fe7f 100644
--- a/kernel/power/console.c
+++ b/kernel/power/console.c
@@ -138,6 +138,7 @@ int pm_prepare_console(void)
orig_kmsg = vt_kmsg_redirect(SUSPEND_CONSOLE);
return 0;
}
+EXPORT_SYMBOL_GPL(pm_prepare_console);
void pm_restore_console(void)
{
@@ -149,3 +150,4 @@ void pm_restore_console(void)
vt_kmsg_redirect(orig_kmsg);
}
}
+EXPORT_SYMBOL_GPL(pm_restore_console);
diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index f4f2073..72760fa 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -29,14 +29,15 @@
#include <linux/ctype.h>
#include <linux/genhd.h>
-#include "power.h"
+#include "tuxonice.h"
static int nocompress;
static int noresume;
static int resume_wait;
static int resume_delay;
-static char resume_file[256] = CONFIG_PM_STD_PARTITION;
+char resume_file[256] = CONFIG_PM_STD_PARTITION;
+EXPORT_SYMBOL_GPL(resume_file);
dev_t swsusp_resume_device;
sector_t swsusp_resume_block;
__visible int in_suspend __nosavedata;
@@ -115,21 +116,23 @@ static int hibernation_test(int level) { return 0; }
* platform_begin - Call platform to start hibernation.
* @platform_mode: Whether or not to use the platform driver.
*/
-static int platform_begin(int platform_mode)
+int platform_begin(int platform_mode)
{
return (platform_mode && hibernation_ops) ?
hibernation_ops->begin() : 0;
}
+EXPORT_SYMBOL_GPL(platform_begin);
/**
* platform_end - Call platform to finish transition to the working state.
* @platform_mode: Whether or not to use the platform driver.
*/
-static void platform_end(int platform_mode)
+void platform_end(int platform_mode)
{
if (platform_mode && hibernation_ops)
hibernation_ops->end();
}
+EXPORT_SYMBOL_GPL(platform_end);
/**
* platform_pre_snapshot - Call platform to prepare the machine for hibernation.
@@ -139,11 +142,12 @@ static void platform_end(int platform_mode)
* if so configured, and return an error code if that fails.
*/
-static int platform_pre_snapshot(int platform_mode)
+int platform_pre_snapshot(int platform_mode)
{
return (platform_mode && hibernation_ops) ?
hibernation_ops->pre_snapshot() : 0;
}
+EXPORT_SYMBOL_GPL(platform_pre_snapshot);
/**
* platform_leave - Call platform to prepare a transition to the working state.
@@ -154,11 +158,12 @@ static int platform_pre_snapshot(int platform_mode)
*
* This routine is called on one CPU with interrupts disabled.
*/
-static void platform_leave(int platform_mode)
+void platform_leave(int platform_mode)
{
if (platform_mode && hibernation_ops)
hibernation_ops->leave();
}
+EXPORT_SYMBOL_GPL(platform_leave);
/**
* platform_finish - Call platform to switch the system to the working state.
@@ -169,11 +174,12 @@ static void platform_leave(int platform_mode)
*
* This routine must be called after platform_prepare().
*/
-static void platform_finish(int platform_mode)
+void platform_finish(int platform_mode)
{
if (platform_mode && hibernation_ops)
hibernation_ops->finish();
}
+EXPORT_SYMBOL_GPL(platform_finish);
/**
* platform_pre_restore - Prepare for hibernate image restoration.
@@ -185,11 +191,12 @@ static void platform_finish(int platform_mode)
* If the restore fails after this function has been called,
* platform_restore_cleanup() must be called.
*/
-static int platform_pre_restore(int platform_mode)
+int platform_pre_restore(int platform_mode)
{
return (platform_mode && hibernation_ops) ?
hibernation_ops->pre_restore() : 0;
}
+EXPORT_SYMBOL_GPL(platform_pre_restore);
/**
* platform_restore_cleanup - Switch to the working state after failing restore.
@@ -202,21 +209,23 @@ static int platform_pre_restore(int platform_mode)
* function must be called too, regardless of the result of
* platform_pre_restore().
*/
-static void platform_restore_cleanup(int platform_mode)
+void platform_restore_cleanup(int platform_mode)
{
if (platform_mode && hibernation_ops)
hibernation_ops->restore_cleanup();
}
+EXPORT_SYMBOL_GPL(platform_restore_cleanup);
/**
* platform_recover - Recover from a failure to suspend devices.
* @platform_mode: Whether or not to use the platform driver.
*/
-static void platform_recover(int platform_mode)
+void platform_recover(int platform_mode)
{
if (platform_mode && hibernation_ops && hibernation_ops->recover)
hibernation_ops->recover();
}
+EXPORT_SYMBOL_GPL(platform_recover);
/**
* swsusp_show_speed - Print time elapsed between two events during hibernation.
@@ -574,6 +583,7 @@ int hibernation_platform_enter(void)
return error;
}
+EXPORT_SYMBOL_GPL(hibernation_platform_enter);
/**
* power_down - Shut the machine down for hibernation.
@@ -633,6 +643,9 @@ int hibernate(void)
{
int error;
+ if (test_action_state(TOI_REPLACE_SWSUSP))
+ return try_tuxonice_hibernate();
+
lock_system_sleep();
/* The snapshot device should not be opened while we're running */
if (!atomic_add_unless(&snapshot_device_available, -1, 0)) {
@@ -717,11 +730,19 @@ int hibernate(void)
* attempts to recover gracefully and make the kernel return to the normal mode
* of operation.
*/
-static int software_resume(void)
+int software_resume(void)
{
int error;
unsigned int flags;
+ resume_attempted = 1;
+
+ /*
+ * We can't know (until an image header - if any - is loaded), whether
+ * we did override swsusp. We therefore ensure that both are tried.
+ */
+ try_tuxonice_resume();
+
/*
* If the user said "noresume".. bail out early.
*/
@@ -1098,6 +1119,7 @@ static int __init hibernate_setup(char *str)
static int __init noresume_setup(char *str)
{
noresume = 1;
+ set_toi_state(TOI_NORESUME_SPECIFIED);
return 1;
}
diff --git a/kernel/power/main.c b/kernel/power/main.c
index 6271bc4..bcf87ed 100644
--- a/kernel/power/main.c
+++ b/kernel/power/main.c
@@ -19,12 +19,14 @@
#include "power.h"
DEFINE_MUTEX(pm_mutex);
+EXPORT_SYMBOL_GPL(pm_mutex);
#ifdef CONFIG_PM_SLEEP
/* Routines for PM-transition notifications */
-static BLOCKING_NOTIFIER_HEAD(pm_chain_head);
+BLOCKING_NOTIFIER_HEAD(pm_chain_head);
+EXPORT_SYMBOL_GPL(pm_chain_head);
int register_pm_notifier(struct notifier_block *nb)
{
@@ -44,6 +46,7 @@ int pm_notifier_call_chain(unsigned long val)
return notifier_to_errno(ret);
}
+EXPORT_SYMBOL_GPL(pm_notifier_call_chain);
/* If set, devices may be suspended and resumed asynchronously. */
int pm_async_enabled = 1;
@@ -277,6 +280,7 @@ static inline void pm_print_times_init(void) {}
#endif /* CONFIG_PM_SLEEP_DEBUG */
struct kobject *power_kobj;
+EXPORT_SYMBOL_GPL(power_kobj);
/**
* state - control system power state.
diff --git a/kernel/power/power.h b/kernel/power/power.h
index 15f37ea..906ea21 100644
--- a/kernel/power/power.h
+++ b/kernel/power/power.h
@@ -36,8 +36,12 @@ static inline char *check_image_kernel(struct swsusp_info *info)
return arch_hibernation_header_restore(info) ?
"architecture specific data" : NULL;
}
+#else
+extern char *check_image_kernel(struct swsusp_info *info);
#endif /* CONFIG_ARCH_HIBERNATION_HEADER */
+extern int init_header(struct swsusp_info *info);
+extern char resume_file[256];
/*
* Keep some memory free so that I/O operations can succeed without paging
* [Might this be more than 4 MB?]
@@ -58,6 +62,7 @@ extern bool freezer_test_done;
extern int hibernation_snapshot(int platform_mode);
extern int hibernation_restore(int platform_mode);
extern int hibernation_platform_enter(void);
+extern void platform_recover(int platform_mode);
#else /* !CONFIG_HIBERNATION */
@@ -77,6 +82,8 @@ static struct kobj_attribute _name##_attr = { \
.store = _name##_store, \
}
+extern struct pbe *restore_pblist;
+
/* Preferred image size in bytes (default 500 MB) */
extern unsigned long image_size;
/* Size of memory reserved for drivers (default SPARE_PAGES x PAGE_SIZE) */
@@ -271,6 +278,90 @@ static inline void suspend_thaw_processes(void)
}
#endif
+extern struct page *saveable_page(struct zone *z, unsigned long p);
+#ifdef CONFIG_HIGHMEM
+extern struct page *saveable_highmem_page(struct zone *z, unsigned long p);
+#else
+static
+inline struct page *saveable_highmem_page(struct zone *z, unsigned long p)
+{
+ return NULL;
+}
+#endif
+
+#define PBES_PER_PAGE (PAGE_SIZE / sizeof(struct pbe))
+extern struct list_head nosave_regions;
+
+/**
+ * This structure represents a range of page frames the contents of which
+ * should not be saved during the suspend.
+ */
+
+struct nosave_region {
+ struct list_head list;
+ unsigned long start_pfn;
+ unsigned long end_pfn;
+};
+
+#define BM_END_OF_MAP (~0UL)
+
+#define BM_BITS_PER_BLOCK (PAGE_SIZE * BITS_PER_BYTE)
+
+struct bm_block {
+ struct list_head hook; /* hook into a list of bitmap blocks */
+ unsigned long start_pfn; /* pfn represented by the first bit */
+ unsigned long end_pfn; /* pfn represented by the last bit plus 1 */
+ unsigned long *data; /* bitmap representing pages */
+};
+
+/* struct bm_position is used for browsing memory bitmaps */
+
+struct bm_position {
+ struct bm_block *block;
+ int bit;
+};
+
+struct memory_bitmap {
+ struct list_head blocks; /* list of bitmap blocks */
+ struct linked_page *p_list; /* list of pages used to store zone
+ * bitmap objects and bitmap block
+ * objects
+ */
+ struct bm_position *states; /* most recently used bit position */
+	int num_states;    /* number of independent iterator states
+	                    * supported when walking the bitmap
+	                    */
+};
+
+extern int memory_bm_create(struct memory_bitmap *bm, gfp_t gfp_mask,
+ int safe_needed);
+extern int memory_bm_create_index(struct memory_bitmap *bm, gfp_t gfp_mask,
+ int safe_needed, int index);
+extern void memory_bm_free(struct memory_bitmap *bm, int clear_nosave_free);
+extern void memory_bm_set_bit(struct memory_bitmap *bm, unsigned long pfn);
+extern void memory_bm_clear_bit(struct memory_bitmap *bm, unsigned long pfn);
+extern void memory_bm_clear_bit_index(struct memory_bitmap *bm, unsigned long pfn, int index);
+extern int memory_bm_test_bit(struct memory_bitmap *bm, unsigned long pfn);
+extern int memory_bm_test_bit_index(struct memory_bitmap *bm, unsigned long pfn, int index);
+extern unsigned long memory_bm_next_pfn(struct memory_bitmap *bm);
+extern unsigned long memory_bm_next_pfn_index(struct memory_bitmap *bm,
+ int index);
+extern void memory_bm_position_reset(struct memory_bitmap *bm);
+extern void memory_bm_clear(struct memory_bitmap *bm);
+extern void memory_bm_copy(struct memory_bitmap *source,
+ struct memory_bitmap *dest);
+extern void memory_bm_dup(struct memory_bitmap *source,
+ struct memory_bitmap *dest);
+extern int memory_bm_set_iterators(struct memory_bitmap *bm, int number);
+
+#ifdef CONFIG_TOI
+struct toi_module_ops;
+extern int memory_bm_read(struct memory_bitmap *bm, int (*rw_chunk)
+ (int rw, struct toi_module_ops *owner, char *buffer, int buffer_size));
+extern int memory_bm_write(struct memory_bitmap *bm, int (*rw_chunk)
+ (int rw, struct toi_module_ops *owner, char *buffer, int buffer_size));
+#endif
+
#ifdef CONFIG_PM_AUTOSLEEP
/* kernel/power/autosleep.c */
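To make the extended iterator API above concrete, here is a hedged sketch of walking a single bitmap with two independent iterator states; it assumes compilation within kernel/power/ (so that "power.h" is available) and is not part of the patch.

/* Hedged sketch, not patch code: two iterator states over one bitmap, as
 * permitted by memory_bm_set_iterators() above. */
#include <linux/errno.h>
#include "power.h"

static int example_walk_two_ways(struct memory_bitmap *bm)
{
        unsigned long pfn;
        int ret;

        ret = memory_bm_set_iterators(bm, 2);
        if (ret)
                return ret;

        memory_bm_position_reset(bm);

        /* Iterator 0 visits every set bit... */
        for (pfn = memory_bm_next_pfn_index(bm, 0);
             pfn != BM_END_OF_MAP;
             pfn = memory_bm_next_pfn_index(bm, 0))
                ;       /* e.g. copy or count the page frame at pfn */

        /* ...while iterator 1 is untouched by that walk and starts afresh
         * from the first set bit; report -ENODATA if the bitmap is empty. */
        return memory_bm_next_pfn_index(bm, 1) == BM_END_OF_MAP ? -ENODATA : 0;
}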
diff --git a/kernel/power/process.c b/kernel/power/process.c
index 06ec886..4004a83 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -143,6 +143,7 @@ int freeze_processes(void)
thaw_processes();
return error;
}
+EXPORT_SYMBOL_GPL(freeze_processes);
/**
* freeze_kernel_threads - Make freezable kernel threads go to the refrigerator.
@@ -169,6 +170,7 @@ int freeze_kernel_threads(void)
thaw_kernel_threads();
return error;
}
+EXPORT_SYMBOL_GPL(freeze_kernel_threads);
void thaw_processes(void)
{
@@ -202,6 +204,7 @@ void thaw_processes(void)
schedule();
printk("done.\n");
}
+EXPORT_SYMBOL_GPL(thaw_processes);
void thaw_kernel_threads(void)
{
@@ -222,3 +225,4 @@ void thaw_kernel_threads(void)
schedule();
printk("done.\n");
}
+EXPORT_SYMBOL_GPL(thaw_kernel_threads);
diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index 1ea328a..7314819 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -36,6 +36,8 @@
#include <asm/io.h>
#include "power.h"
+#include "tuxonice_builtin.h"
+#include "tuxonice_pagedir.h"
static int swsusp_page_is_free(struct page *);
static void swsusp_set_page_forbidden(struct page *);
@@ -72,6 +74,10 @@ void __init hibernate_image_size_init(void)
* directly to their "original" page frames.
*/
struct pbe *restore_pblist;
+EXPORT_SYMBOL_GPL(restore_pblist);
+
+int resume_attempted;
+EXPORT_SYMBOL_GPL(resume_attempted);
/* Pointer to an auxiliary buffer (1 page) */
static void *buffer;
@@ -114,6 +120,9 @@ static void *get_image_page(gfp_t gfp_mask, int safe_needed)
unsigned long get_safe_page(gfp_t gfp_mask)
{
+ if (toi_running)
+ return toi_get_nonconflicting_page();
+
return (unsigned long)get_image_page(gfp_mask, PG_SAFE);
}
@@ -250,47 +259,53 @@ static void *chain_alloc(struct chain_allocator *ca, unsigned int size)
* the represented memory area.
*/
-#define BM_END_OF_MAP (~0UL)
-
-#define BM_BITS_PER_BLOCK (PAGE_SIZE * BITS_PER_BYTE)
-
-struct bm_block {
- struct list_head hook; /* hook into a list of bitmap blocks */
- unsigned long start_pfn; /* pfn represented by the first bit */
- unsigned long end_pfn; /* pfn represented by the last bit plus 1 */
- unsigned long *data; /* bitmap representing pages */
-};
-
static inline unsigned long bm_block_bits(struct bm_block *bb)
{
return bb->end_pfn - bb->start_pfn;
}
-/* strcut bm_position is used for browsing memory bitmaps */
+/* Functions that operate on memory bitmaps */
-struct bm_position {
- struct bm_block *block;
- int bit;
-};
+void memory_bm_position_reset_index(struct memory_bitmap *bm, int index)
+{
+ bm->states[index].block = list_entry(bm->blocks.next,
+ struct bm_block, hook);
+ bm->states[index].bit = 0;
+}
+EXPORT_SYMBOL_GPL(memory_bm_position_reset_index);
-struct memory_bitmap {
- struct list_head blocks; /* list of bitmap blocks */
- struct linked_page *p_list; /* list of pages used to store zone
- * bitmap objects and bitmap block
- * objects
- */
- struct bm_position cur; /* most recently used bit position */
-};
+void memory_bm_position_reset(struct memory_bitmap *bm)
+{
+ int i;
-/* Functions that operate on memory bitmaps */
+ for (i = 0; i < bm->num_states; i++) {
+ bm->states[i].block = list_entry(bm->blocks.next,
+ struct bm_block, hook);
+ bm->states[i].bit = 0;
+ }
+}
+EXPORT_SYMBOL_GPL(memory_bm_position_reset);
-static void memory_bm_position_reset(struct memory_bitmap *bm)
+int memory_bm_set_iterators(struct memory_bitmap *bm, int number)
{
- bm->cur.block = list_entry(bm->blocks.next, struct bm_block, hook);
- bm->cur.bit = 0;
-}
+ int bytes = number * sizeof(struct bm_position);
+ struct bm_position *new_states;
+
+ if (number < bm->num_states)
+ return 0;
-static void memory_bm_free(struct memory_bitmap *bm, int clear_nosave_free);
+ new_states = kmalloc(bytes, GFP_KERNEL);
+ if (!new_states)
+ return -ENOMEM;
+
+ if (bm->states)
+ kfree(bm->states);
+
+ bm->states = new_states;
+ bm->num_states = number;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(memory_bm_set_iterators);
/**
* create_bm_block_list - create a list of block bitmap objects
@@ -398,8 +413,8 @@ static int create_mem_extents(struct list_head *list, gfp_t gfp_mask)
/**
* memory_bm_create - allocate memory for a memory bitmap
*/
-static int
-memory_bm_create(struct memory_bitmap *bm, gfp_t gfp_mask, int safe_needed)
+int memory_bm_create_index(struct memory_bitmap *bm, gfp_t gfp_mask,
+ int safe_needed, int states)
{
struct chain_allocator ca;
struct list_head mem_extents;
@@ -443,6 +458,9 @@ memory_bm_create(struct memory_bitmap *bm, gfp_t gfp_mask, int safe_needed)
}
}
+ if (!error)
+ error = memory_bm_set_iterators(bm, states);
+
bm->p_list = ca.chain;
memory_bm_position_reset(bm);
Exit:
@@ -454,11 +472,18 @@ memory_bm_create(struct memory_bitmap *bm, gfp_t gfp_mask, int safe_needed)
memory_bm_free(bm, PG_UNSAFE_CLEAR);
goto Exit;
}
+EXPORT_SYMBOL_GPL(memory_bm_create_index);
+
+int memory_bm_create(struct memory_bitmap *bm, gfp_t gfp_mask, int safe_needed)
+{
+ return memory_bm_create_index(bm, gfp_mask, safe_needed, 1);
+}
+EXPORT_SYMBOL_GPL(memory_bm_create);
/**
* memory_bm_free - free memory occupied by the memory bitmap @bm
*/
-static void memory_bm_free(struct memory_bitmap *bm, int clear_nosave_free)
+void memory_bm_free(struct memory_bitmap *bm, int clear_nosave_free)
{
struct bm_block *bb;
@@ -469,15 +494,22 @@ static void memory_bm_free(struct memory_bitmap *bm, int clear_nosave_free)
free_list_of_pages(bm->p_list, clear_nosave_free);
INIT_LIST_HEAD(&bm->blocks);
+
+ if (bm->states) {
+ kfree(bm->states);
+ bm->states = NULL;
+ bm->num_states = 0;
+ }
}
+EXPORT_SYMBOL_GPL(memory_bm_free);
/**
* memory_bm_find_bit - find the bit in the bitmap @bm that corresponds
* to given pfn. The cur_zone_bm member of @bm and the cur_block member
- * of @bm->cur_zone_bm are updated.
+ * of @bm->states[i]_zone_bm are updated.
*/
-static int memory_bm_find_bit(struct memory_bitmap *bm, unsigned long pfn,
- void **addr, unsigned int *bit_nr)
+static int memory_bm_find_bit_index(struct memory_bitmap *bm, unsigned long pfn,
+ void **addr, unsigned int *bit_nr, int state)
{
struct bm_block *bb;
@@ -485,7 +517,7 @@ static int memory_bm_find_bit(struct memory_bitmap *bm, unsigned long pfn,
* Check if the pfn corresponds to the current bitmap block and find
* the block where it fits if this is not the case.
*/
- bb = bm->cur.block;
+ bb = bm->states[state].block;
if (pfn < bb->start_pfn)
list_for_each_entry_continue_reverse(bb, &bm->blocks, hook)
if (pfn >= bb->start_pfn)
@@ -500,15 +532,21 @@ static int memory_bm_find_bit(struct memory_bitmap *bm, unsigned long pfn,
return -EFAULT;
/* The block has been found */
- bm->cur.block = bb;
+ bm->states[state].block = bb;
pfn -= bb->start_pfn;
- bm->cur.bit = pfn + 1;
+ bm->states[state].bit = pfn + 1;
*bit_nr = pfn;
*addr = bb->data;
return 0;
}
-static void memory_bm_set_bit(struct memory_bitmap *bm, unsigned long pfn)
+static int memory_bm_find_bit(struct memory_bitmap *bm, unsigned long pfn,
+ void **addr, unsigned int *bit_nr)
+{
+ return memory_bm_find_bit_index(bm, pfn, addr, bit_nr, 0);
+}
+
+void memory_bm_set_bit(struct memory_bitmap *bm, unsigned long pfn)
{
void *addr;
unsigned int bit;
@@ -518,6 +556,7 @@ static void memory_bm_set_bit(struct memory_bitmap *bm, unsigned long pfn)
BUG_ON(error);
set_bit(bit, addr);
}
+EXPORT_SYMBOL_GPL(memory_bm_set_bit);
static int mem_bm_set_bit_check(struct memory_bitmap *bm, unsigned long pfn)
{
@@ -531,27 +570,43 @@ static int mem_bm_set_bit_check(struct memory_bitmap *bm, unsigned long pfn)
return error;
}
-static void memory_bm_clear_bit(struct memory_bitmap *bm, unsigned long pfn)
+void memory_bm_clear_bit_index(struct memory_bitmap *bm, unsigned long pfn,
+ int index)
{
void *addr;
unsigned int bit;
int error;
- error = memory_bm_find_bit(bm, pfn, &addr, &bit);
+ error = memory_bm_find_bit_index(bm, pfn, &addr, &bit, index);
BUG_ON(error);
clear_bit(bit, addr);
}
+EXPORT_SYMBOL_GPL(memory_bm_clear_bit_index);
+
+void memory_bm_clear_bit(struct memory_bitmap *bm, unsigned long pfn)
+{
+ memory_bm_clear_bit_index(bm, pfn, 0);
+}
+EXPORT_SYMBOL_GPL(memory_bm_clear_bit);
-static int memory_bm_test_bit(struct memory_bitmap *bm, unsigned long pfn)
+int memory_bm_test_bit_index(struct memory_bitmap *bm, unsigned long pfn,
+ int index)
{
void *addr;
unsigned int bit;
int error;
- error = memory_bm_find_bit(bm, pfn, &addr, &bit);
+ error = memory_bm_find_bit_index(bm, pfn, &addr, &bit, index);
BUG_ON(error);
return test_bit(bit, addr);
}
+EXPORT_SYMBOL_GPL(memory_bm_test_bit_index);
+
+int memory_bm_test_bit(struct memory_bitmap *bm, unsigned long pfn)
+{
+ return memory_bm_test_bit_index(bm, pfn, 0);
+}
+EXPORT_SYMBOL_GPL(memory_bm_test_bit);
static bool memory_bm_pfn_present(struct memory_bitmap *bm, unsigned long pfn)
{
@@ -570,43 +625,185 @@ static bool memory_bm_pfn_present(struct memory_bitmap *bm, unsigned long pfn)
* this function.
*/
-static unsigned long memory_bm_next_pfn(struct memory_bitmap *bm)
+unsigned long memory_bm_next_pfn_index(struct memory_bitmap *bm, int index)
{
struct bm_block *bb;
int bit;
- bb = bm->cur.block;
+ bb = bm->states[index].block;
do {
- bit = bm->cur.bit;
+ bit = bm->states[index].bit;
bit = find_next_bit(bb->data, bm_block_bits(bb), bit);
if (bit < bm_block_bits(bb))
goto Return_pfn;
bb = list_entry(bb->hook.next, struct bm_block, hook);
- bm->cur.block = bb;
- bm->cur.bit = 0;
+ bm->states[index].block = bb;
+ bm->states[index].bit = 0;
} while (&bb->hook != &bm->blocks);
- memory_bm_position_reset(bm);
+ memory_bm_position_reset_index(bm, index);
return BM_END_OF_MAP;
Return_pfn:
- bm->cur.bit = bit + 1;
+ bm->states[index].bit = bit + 1;
return bb->start_pfn + bit;
}
+EXPORT_SYMBOL_GPL(memory_bm_next_pfn_index);
-/**
- * This structure represents a range of page frames the contents of which
- * should not be saved during the suspend.
- */
+unsigned long memory_bm_next_pfn(struct memory_bitmap *bm)
+{
+ return memory_bm_next_pfn_index(bm, 0);
+}
+EXPORT_SYMBOL_GPL(memory_bm_next_pfn);
-struct nosave_region {
- struct list_head list;
- unsigned long start_pfn;
- unsigned long end_pfn;
-};
+void memory_bm_clear(struct memory_bitmap *bm)
+{
+ unsigned long pfn;
-static LIST_HEAD(nosave_regions);
+ memory_bm_position_reset(bm);
+ pfn = memory_bm_next_pfn(bm);
+ while (pfn != BM_END_OF_MAP) {
+ memory_bm_clear_bit(bm, pfn);
+ pfn = memory_bm_next_pfn(bm);
+ }
+}
+EXPORT_SYMBOL_GPL(memory_bm_clear);
+
+void memory_bm_copy(struct memory_bitmap *source, struct memory_bitmap *dest)
+{
+ unsigned long pfn;
+
+ memory_bm_position_reset(source);
+ pfn = memory_bm_next_pfn(source);
+ while (pfn != BM_END_OF_MAP) {
+ memory_bm_set_bit(dest, pfn);
+ pfn = memory_bm_next_pfn(source);
+ }
+}
+EXPORT_SYMBOL_GPL(memory_bm_copy);
+
+void memory_bm_dup(struct memory_bitmap *source, struct memory_bitmap *dest)
+{
+ memory_bm_clear(dest);
+ memory_bm_copy(source, dest);
+}
+EXPORT_SYMBOL_GPL(memory_bm_dup);
+
+#ifdef CONFIG_TOI
+#define DEFINE_MEMORY_BITMAP(name) \
+struct memory_bitmap *name; \
+EXPORT_SYMBOL_GPL(name)
+
+DEFINE_MEMORY_BITMAP(pageset1_map);
+DEFINE_MEMORY_BITMAP(pageset1_copy_map);
+DEFINE_MEMORY_BITMAP(pageset2_map);
+DEFINE_MEMORY_BITMAP(page_resave_map);
+DEFINE_MEMORY_BITMAP(io_map);
+DEFINE_MEMORY_BITMAP(nosave_map);
+DEFINE_MEMORY_BITMAP(free_map);
+DEFINE_MEMORY_BITMAP(compare_map);
+
+int memory_bm_write(struct memory_bitmap *bm, int (*rw_chunk)
+ (int rw, struct toi_module_ops *owner, char *buffer, int buffer_size))
+{
+ int result = 0;
+ unsigned int nr = 0;
+ struct bm_block *bb;
+
+ if (!bm)
+ return result;
+
+ list_for_each_entry(bb, &bm->blocks, hook)
+ nr++;
+
+ result = (*rw_chunk)(WRITE, NULL, (char *) &nr, sizeof(unsigned int));
+ if (result)
+ return result;
+
+ list_for_each_entry(bb, &bm->blocks, hook) {
+ result = (*rw_chunk)(WRITE, NULL, (char *) &bb->start_pfn,
+ 2 * sizeof(unsigned long));
+ if (result)
+ return result;
+
+ result = (*rw_chunk)(WRITE, NULL, (char *) bb->data, PAGE_SIZE);
+ if (result)
+ return result;
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(memory_bm_write);
+
+int memory_bm_read(struct memory_bitmap *bm, int (*rw_chunk)
+ (int rw, struct toi_module_ops *owner, char *buffer, int buffer_size))
+{
+ int result = 0;
+ unsigned int nr, i;
+ struct bm_block *bb;
+
+ if (!bm)
+ return result;
+
+ result = memory_bm_create(bm, GFP_KERNEL, 0);
+
+ if (result)
+ return result;
+
+ result = (*rw_chunk)(READ, NULL, (char *) &nr, sizeof(unsigned int));
+ if (result)
+ goto Free;
+
+ for (i = 0; i < nr; i++) {
+ unsigned long pfn;
+
+ result = (*rw_chunk)(READ, NULL, (char *) &pfn,
+ sizeof(unsigned long));
+ if (result)
+ goto Free;
+
+ list_for_each_entry(bb, &bm->blocks, hook)
+ if (bb->start_pfn == pfn)
+ break;
+
+ if (&bb->hook == &bm->blocks) {
+ printk(KERN_ERR
+ "TuxOnIce: Failed to load memory bitmap.\n");
+ result = -EINVAL;
+ goto Free;
+ }
+
+ result = (*rw_chunk)(READ, NULL, (char *) &pfn,
+ sizeof(unsigned long));
+ if (result)
+ goto Free;
+
+ if (pfn != bb->end_pfn) {
+ printk(KERN_ERR
+ "TuxOnIce: Failed to load memory bitmap. "
+ "End PFN doesn't match what was saved.\n");
+ result = -EINVAL;
+ goto Free;
+ }
+
+ result = (*rw_chunk)(READ, NULL, (char *) bb->data, PAGE_SIZE);
+
+ if (result)
+ goto Free;
+ }
+
+ return 0;
+
+Free:
+ memory_bm_free(bm, PG_ANY);
+ return result;
+}
+EXPORT_SYMBOL_GPL(memory_bm_read);
+#endif
+
+LIST_HEAD(nosave_regions);
+EXPORT_SYMBOL_GPL(nosave_regions);
/**
* register_nosave_region - register a range of page frames the contents
@@ -849,7 +1046,7 @@ static unsigned int count_free_highmem_pages(void)
* We should save the page if it isn't Nosave or NosaveFree, or Reserved,
* and it isn't a part of a free chunk of pages.
*/
-static struct page *saveable_highmem_page(struct zone *zone, unsigned long pfn)
+struct page *saveable_highmem_page(struct zone *zone, unsigned long pfn)
{
struct page *page;
@@ -871,6 +1068,7 @@ static struct page *saveable_highmem_page(struct zone *zone, unsigned long pfn)
return page;
}
+EXPORT_SYMBOL_GPL(saveable_highmem_page);
/**
* count_highmem_pages - compute the total number of saveable highmem
@@ -896,11 +1094,6 @@ static unsigned int count_highmem_pages(void)
}
return n;
}
-#else
-static inline void *saveable_highmem_page(struct zone *z, unsigned long p)
-{
- return NULL;
-}
#endif /* CONFIG_HIGHMEM */
/**
@@ -911,7 +1104,7 @@ static inline void *saveable_highmem_page(struct zone *z, unsigned long p)
* of pages statically defined as 'unsaveable', and it isn't a part of
* a free chunk of pages.
*/
-static struct page *saveable_page(struct zone *zone, unsigned long pfn)
+struct page *saveable_page(struct zone *zone, unsigned long pfn)
{
struct page *page;
@@ -936,6 +1129,7 @@ static struct page *saveable_page(struct zone *zone, unsigned long pfn)
return page;
}
+EXPORT_SYMBOL_GPL(saveable_page);
/**
* count_data_pages - compute the total number of saveable non-highmem
@@ -1590,6 +1784,9 @@ asmlinkage __visible int swsusp_save(void)
{
unsigned int nr_pages, nr_highmem;
+ if (toi_running)
+ return toi_post_context_save();
+
printk(KERN_INFO "PM: Creating hibernation image:\n");
drain_local_pages(NULL);
@@ -1630,14 +1827,14 @@ asmlinkage __visible int swsusp_save(void)
}
#ifndef CONFIG_ARCH_HIBERNATION_HEADER
-static int init_header_complete(struct swsusp_info *info)
+int init_header_complete(struct swsusp_info *info)
{
memcpy(&info->uts, init_utsname(), sizeof(struct new_utsname));
info->version_code = LINUX_VERSION_CODE;
return 0;
}
-static char *check_image_kernel(struct swsusp_info *info)
+char *check_image_kernel(struct swsusp_info *info)
{
if (info->version_code != LINUX_VERSION_CODE)
return "kernel version";
@@ -1651,6 +1848,7 @@ static char *check_image_kernel(struct swsusp_info *info)
return "machine";
return NULL;
}
+EXPORT_SYMBOL_GPL(check_image_kernel);
#endif /* CONFIG_ARCH_HIBERNATION_HEADER */
unsigned long snapshot_get_image_size(void)
@@ -1658,7 +1856,7 @@ unsigned long snapshot_get_image_size(void)
return nr_copy_pages + nr_meta_pages + 1;
}
-static int init_header(struct swsusp_info *info)
+int init_header(struct swsusp_info *info)
{
memset(info, 0, sizeof(struct swsusp_info));
info->num_physpages = get_num_physpages();
@@ -1668,6 +1866,7 @@ static int init_header(struct swsusp_info *info)
info->size <<= PAGE_SHIFT;
return init_header_complete(info);
}
+EXPORT_SYMBOL_GPL(init_header);
/**
* pack_pfns - pfns corresponding to the set bits found in the bitmap @bm
diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index 8233cd4..2a37f76 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -302,6 +302,7 @@ int suspend_devices_and_enter(suspend_state_t state)
suspend_ops->recover();
goto Resume_devices;
}
+EXPORT_SYMBOL_GPL(suspend_devices_and_enter);
/**
* suspend_finish - Clean up before finishing the suspend sequence.
diff --git a/kernel/power/tuxonice.h b/kernel/power/tuxonice.h
new file mode 100644
index 0000000..0a511cb
--- /dev/null
+++ b/kernel/power/tuxonice.h
@@ -0,0 +1,227 @@
+/*
+ * kernel/power/tuxonice.h
+ *
+ * Copyright (C) 2004-2014 Nigel Cunningham (nigel at tuxonice net)
+ *
+ * This file is released under the GPLv2.
+ *
+ * It contains declarations used throughout swsusp.
+ *
+ */
+
+#ifndef KERNEL_POWER_TOI_H
+#define KERNEL_POWER_TOI_H
+
+#include <linux/delay.h>
+#include <linux/bootmem.h>
+#include <linux/suspend.h>
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <asm/setup.h>
+#include "tuxonice_pageflags.h"
+#include "power.h"
+
+#define TOI_CORE_VERSION "3.3"
+#define TOI_HEADER_VERSION 3
+#define MY_BOOT_KERNEL_DATA_VERSION 4
+
+struct toi_boot_kernel_data {
+ int version;
+ int size;
+ unsigned long toi_action;
+ unsigned long toi_debug_state;
+ u32 toi_default_console_level;
+ int toi_io_time[2][2];
+ char toi_nosave_commandline[COMMAND_LINE_SIZE];
+ unsigned long pages_used[33];
+ unsigned long incremental_bytes_in;
+ unsigned long incremental_bytes_out;
+ unsigned long compress_bytes_in;
+ unsigned long compress_bytes_out;
+ unsigned long pruned_pages;
+};
+
+extern struct toi_boot_kernel_data toi_bkd;
+
+/* Location of boot kernel data struct in kernel being resumed */
+extern unsigned long boot_kernel_data_buffer;
+
+/* == Action states == */
+
+enum {
+ TOI_REBOOT,
+ TOI_PAUSE,
+ TOI_LOGALL,
+ TOI_CAN_CANCEL,
+ TOI_KEEP_IMAGE,
+ TOI_FREEZER_TEST,
+ TOI_SINGLESTEP,
+ TOI_PAUSE_NEAR_PAGESET_END,
+ TOI_TEST_FILTER_SPEED,
+ TOI_TEST_BIO,
+ TOI_NO_PAGESET2,
+ TOI_IGNORE_ROOTFS,
+ TOI_REPLACE_SWSUSP,
+ TOI_PAGESET2_FULL,
+ TOI_ABORT_ON_RESAVE_NEEDED,
+ TOI_NO_MULTITHREADED_IO,
+ TOI_NO_DIRECT_LOAD, /* Obsolete */
+ TOI_LATE_CPU_HOTPLUG,
+ TOI_GET_MAX_MEM_ALLOCD,
+ TOI_NO_FLUSHER_THREAD,
+ TOI_NO_PS2_IF_UNNEEDED,
+ TOI_POST_RESUME_BREAKPOINT,
+ TOI_NO_READAHEAD,
+};
+
+extern unsigned long toi_bootflags_mask;
+
+#define clear_action_state(bit) (test_and_clear_bit(bit, &toi_bkd.toi_action))
+
+/* == Result states == */
+
+enum {
+ TOI_ABORTED,
+ TOI_ABORT_REQUESTED,
+ TOI_NOSTORAGE_AVAILABLE,
+ TOI_INSUFFICIENT_STORAGE,
+ TOI_FREEZING_FAILED,
+ TOI_KEPT_IMAGE,
+ TOI_WOULD_EAT_MEMORY,
+ TOI_UNABLE_TO_FREE_ENOUGH_MEMORY,
+ TOI_PM_SEM,
+ TOI_DEVICE_REFUSED,
+ TOI_SYSDEV_REFUSED,
+ TOI_EXTRA_PAGES_ALLOW_TOO_SMALL,
+ TOI_UNABLE_TO_PREPARE_IMAGE,
+ TOI_FAILED_MODULE_INIT,
+ TOI_FAILED_MODULE_CLEANUP,
+ TOI_FAILED_IO,
+ TOI_OUT_OF_MEMORY,
+ TOI_IMAGE_ERROR,
+ TOI_PLATFORM_PREP_FAILED,
+ TOI_CPU_HOTPLUG_FAILED,
+ TOI_ARCH_PREPARE_FAILED, /* Removed Linux-3.0 */
+ TOI_RESAVE_NEEDED,
+ TOI_CANT_SUSPEND,
+ TOI_NOTIFIERS_PREPARE_FAILED,
+ TOI_PRE_SNAPSHOT_FAILED,
+ TOI_PRE_RESTORE_FAILED,
+ TOI_USERMODE_HELPERS_ERR,
+ TOI_CANT_USE_ALT_RESUME,
+ TOI_HEADER_TOO_BIG,
+ TOI_WAKEUP_EVENT,
+ TOI_SYSCORE_REFUSED,
+ TOI_DPM_PREPARE_FAILED,
+ TOI_DPM_SUSPEND_FAILED,
+ TOI_NUM_RESULT_STATES /* Used in printing debug info only */
+};
+
+extern unsigned long toi_result;
+
+#define set_result_state(bit) (test_and_set_bit(bit, &toi_result))
+#define set_abort_result(bit) (test_and_set_bit(TOI_ABORTED, &toi_result), \
+ test_and_set_bit(bit, &toi_result))
+#define clear_result_state(bit) (test_and_clear_bit(bit, &toi_result))
+#define test_result_state(bit) (test_bit(bit, &toi_result))
+
+/* == Debug sections and levels == */
+
+/* debugging levels. */
+enum {
+ TOI_STATUS = 0,
+ TOI_ERROR = 2,
+ TOI_LOW,
+ TOI_MEDIUM,
+ TOI_HIGH,
+ TOI_VERBOSE,
+};
+
+enum {
+ TOI_ANY_SECTION,
+ TOI_EAT_MEMORY,
+ TOI_IO,
+ TOI_HEADER,
+ TOI_WRITER,
+ TOI_MEMORY,
+ TOI_PAGEDIR,
+ TOI_COMPRESS,
+ TOI_BIO,
+};
+
+#define set_debug_state(bit) (test_and_set_bit(bit, &toi_bkd.toi_debug_state))
+#define clear_debug_state(bit) \
+ (test_and_clear_bit(bit, &toi_bkd.toi_debug_state))
+#define test_debug_state(bit) (test_bit(bit, &toi_bkd.toi_debug_state))
+
+/* == Steps in hibernating == */
+
+enum {
+ STEP_HIBERNATE_PREPARE_IMAGE,
+ STEP_HIBERNATE_SAVE_IMAGE,
+ STEP_HIBERNATE_POWERDOWN,
+ STEP_RESUME_CAN_RESUME,
+ STEP_RESUME_LOAD_PS1,
+ STEP_RESUME_DO_RESTORE,
+ STEP_RESUME_READ_PS2,
+ STEP_RESUME_GO,
+ STEP_RESUME_ALT_IMAGE,
+ STEP_CLEANUP,
+ STEP_QUIET_CLEANUP
+};
+
+/* == TuxOnIce states ==
+ (see also include/linux/suspend.h) */
+
+#define get_toi_state() (toi_state)
+#define restore_toi_state(saved_state) \
+ do { toi_state = saved_state; } while (0)
+
+/* == Module support == */
+
+struct toi_core_fns {
+ int (*post_context_save)(void);
+ unsigned long (*get_nonconflicting_page)(void);
+ int (*try_hibernate)(void);
+ void (*try_resume)(void);
+};
+
+extern struct toi_core_fns *toi_core_fns;
+
+/* == All else == */
+#define KB(x) ((x) << (PAGE_SHIFT - 10))
+#define MB(x) ((x) >> (20 - PAGE_SHIFT))
+
+extern int toi_start_anything(int toi_or_resume);
+extern void toi_finish_anything(int toi_or_resume);
+
+extern int save_image_part1(void);
+extern int toi_atomic_restore(void);
+
+extern int toi_try_hibernate(void);
+extern void toi_try_resume(void);
+
+extern int __toi_post_context_save(void);
+
+extern unsigned int nr_hibernates;
+extern char alt_resume_param[256];
+
+extern void copyback_post(void);
+extern int toi_hibernate(void);
+extern unsigned long extra_pd1_pages_used;
+
+#define SECTOR_SIZE 512
+
+extern void toi_early_boot_message(int can_erase_image, int default_answer,
+ char *warning_reason, ...);
+
+extern int do_check_can_resume(void);
+extern int do_toi_step(int step);
+extern int toi_launch_userspace_program(char *command, int channel_no,
+ int wait, int debug);
+
+extern char tuxonice_signature[9];
+
+extern int toi_start_other_threads(void);
+extern void toi_stop_other_threads(void);
+#endif
diff --git a/kernel/power/tuxonice_alloc.c b/kernel/power/tuxonice_alloc.c
new file mode 100644
index 0000000..fa44532
--- /dev/null
+++ b/kernel/power/tuxonice_alloc.c
@@ -0,0 +1,314 @@
+/*
+ * kernel/power/tuxonice_alloc.c
+ *
+ * Copyright (C) 2008-2014 Nigel Cunningham (nigel at tuxonice net)
+ *
+ * This file is released under the GPLv2.
+ *
+ */
+
+#ifdef CONFIG_PM_DEBUG
+#include <linux/export.h>
+#include <linux/slab.h>
+#include "tuxonice_modules.h"
+#include "tuxonice_alloc.h"
+#include "tuxonice_sysfs.h"
+#include "tuxonice.h"
+
+#define TOI_ALLOC_PATHS 40
+
+static DEFINE_MUTEX(toi_alloc_mutex);
+
+static struct toi_module_ops toi_alloc_ops;
+
+static int toi_fail_num;
+
+static atomic_t toi_alloc_count[TOI_ALLOC_PATHS],
+ toi_free_count[TOI_ALLOC_PATHS],
+ toi_test_count[TOI_ALLOC_PATHS],
+ toi_fail_count[TOI_ALLOC_PATHS];
+static int toi_cur_allocd[TOI_ALLOC_PATHS], toi_max_allocd[TOI_ALLOC_PATHS];
+static int cur_allocd, max_allocd;
+
+static char *toi_alloc_desc[TOI_ALLOC_PATHS] = {
+ "", /* 0 */
+ "get_io_info_struct",
+ "extent",
+ "extent (loading chain)",
+ "userui channel",
+ "userui arg", /* 5 */
+ "attention list metadata",
+ "extra pagedir memory metadata",
+ "bdev metadata",
+ "extra pagedir memory",
+ "header_locations_read", /* 10 */
+ "bio queue",
+ "prepare_readahead",
+ "i/o buffer",
+ "writer buffer in bio_init",
+ "checksum buffer", /* 15 */
+ "compression buffer",
+ "filewriter signature op",
+ "set resume param alloc1",
+ "set resume param alloc2",
+ "debugging info buffer", /* 20 */
+ "check can resume buffer",
+ "write module config buffer",
+ "read module config buffer",
+ "write image header buffer",
+ "read pageset1 buffer", /* 25 */
+ "get_have_image_data buffer",
+ "checksum page",
+ "worker rw loop",
+ "get nonconflicting page",
+ "ps1 load addresses", /* 30 */
+ "remove swap image",
+ "swap image exists",
+ "swap parse sig location",
+ "sysfs kobj",
+ "swap mark resume attempted buffer", /* 35 */
+ "cluster member",
+ "boot kernel data buffer",
+ "setting swap signature",
+ "block i/o bdev struct"
+};
+
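+/*
+ * Allocation failure injection: when the allocation site number matches
+ * the value set via the "failure_test" sysfs entry (toi_fail_num), fail
+ * that allocation once, count it in toi_test_count and reset the trigger.
+ */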
+#define MIGHT_FAIL(FAIL_NUM, FAIL_VAL) \
+ do { \
+ BUG_ON(FAIL_NUM >= TOI_ALLOC_PATHS); \
+ \
+ if (FAIL_NUM == toi_fail_num) { \
+ atomic_inc(&toi_test_count[FAIL_NUM]); \
+ toi_fail_num = 0; \
+ return FAIL_VAL; \
+ } \
+ } while (0)
+
+static void alloc_update_stats(int fail_num, void *result, int size)
+{
+ if (!result) {
+ atomic_inc(&toi_fail_count[fail_num]);
+ return;
+ }
+
+ atomic_inc(&toi_alloc_count[fail_num]);
+ if (unlikely(test_action_state(TOI_GET_MAX_MEM_ALLOCD))) {
+ mutex_lock(&toi_alloc_mutex);
+ toi_cur_allocd[fail_num]++;
+ cur_allocd += size;
+ if (unlikely(cur_allocd > max_allocd)) {
+ int i;
+
+ for (i = 0; i < TOI_ALLOC_PATHS; i++)
+ toi_max_allocd[i] = toi_cur_allocd[i];
+ max_allocd = cur_allocd;
+ }
+ mutex_unlock(&toi_alloc_mutex);
+ }
+}
+
+static void free_update_stats(int fail_num, int size)
+{
+ BUG_ON(fail_num >= TOI_ALLOC_PATHS);
+ atomic_inc(&toi_free_count[fail_num]);
+ if (unlikely(atomic_read(&toi_free_count[fail_num]) >
+ atomic_read(&toi_alloc_count[fail_num])))
+ dump_stack();
+ if (unlikely(test_action_state(TOI_GET_MAX_MEM_ALLOCD))) {
+ mutex_lock(&toi_alloc_mutex);
+ cur_allocd -= size;
+ toi_cur_allocd[fail_num]--;
+ mutex_unlock(&toi_alloc_mutex);
+ }
+}
+
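+/*
+ * The wrappers below forward to the normal kernel allocators, optionally
+ * injecting a failure for the given allocation site and keeping per-site
+ * allocation/free statistics while allocation debugging is enabled.
+ */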
+void *toi_kzalloc(int fail_num, size_t size, gfp_t flags)
+{
+ void *result;
+
+ if (toi_alloc_ops.enabled)
+ MIGHT_FAIL(fail_num, NULL);
+ result = kzalloc(size, flags);
+ if (toi_alloc_ops.enabled)
+ alloc_update_stats(fail_num, result, size);
+ if (fail_num == toi_trace_allocs)
+ dump_stack();
+ return result;
+}
+EXPORT_SYMBOL_GPL(toi_kzalloc);
+
+unsigned long toi_get_free_pages(int fail_num, gfp_t mask,
+ unsigned int order)
+{
+ unsigned long result;
+
+ if (toi_alloc_ops.enabled)
+ MIGHT_FAIL(fail_num, 0);
+ result = __get_free_pages(mask, order);
+ if (toi_alloc_ops.enabled)
+ alloc_update_stats(fail_num, (void *) result,
+ PAGE_SIZE << order);
+ if (fail_num == toi_trace_allocs)
+ dump_stack();
+ return result;
+}
+EXPORT_SYMBOL_GPL(toi_get_free_pages);
+
+struct page *toi_alloc_page(int fail_num, gfp_t mask)
+{
+ struct page *result;
+
+ if (toi_alloc_ops.enabled)
+ MIGHT_FAIL(fail_num, NULL);
+ result = alloc_page(mask);
+ if (toi_alloc_ops.enabled)
+ alloc_update_stats(fail_num, (void *) result, PAGE_SIZE);
+ if (fail_num == toi_trace_allocs)
+ dump_stack();
+ return result;
+}
+EXPORT_SYMBOL_GPL(toi_alloc_page);
+
+unsigned long toi_get_zeroed_page(int fail_num, gfp_t mask)
+{
+ unsigned long result;
+
+ if (toi_alloc_ops.enabled)
+ MIGHT_FAIL(fail_num, 0);
+ result = get_zeroed_page(mask);
+ if (toi_alloc_ops.enabled)
+ alloc_update_stats(fail_num, (void *) result, PAGE_SIZE);
+ if (fail_num == toi_trace_allocs)
+ dump_stack();
+ return result;
+}
+EXPORT_SYMBOL_GPL(toi_get_zeroed_page);
+
+void toi_kfree(int fail_num, const void *arg, int size)
+{
+ if (arg && toi_alloc_ops.enabled)
+ free_update_stats(fail_num, size);
+
+ if (fail_num == toi_trace_allocs)
+ dump_stack();
+ kfree(arg);
+}
+EXPORT_SYMBOL_GPL(toi_kfree);
+
+void toi_free_page(int fail_num, unsigned long virt)
+{
+ if (virt && toi_alloc_ops.enabled)
+ free_update_stats(fail_num, PAGE_SIZE);
+
+ if (fail_num == toi_trace_allocs)
+ dump_stack();
+ free_page(virt);
+}
+EXPORT_SYMBOL_GPL(toi_free_page);
+
+void toi__free_page(int fail_num, struct page *page)
+{
+ if (page && toi_alloc_ops.enabled)
+ free_update_stats(fail_num, PAGE_SIZE);
+
+ if (fail_num == toi_trace_allocs)
+ dump_stack();
+ __free_page(page);
+}
+EXPORT_SYMBOL_GPL(toi__free_page);
+
+void toi_free_pages(int fail_num, struct page *page, int order)
+{
+ if (page && toi_alloc_ops.enabled)
+ free_update_stats(fail_num, PAGE_SIZE << order);
+
+ if (fail_num == toi_trace_allocs)
+ dump_stack();
+ __free_pages(page, order);
+}
+
+void toi_alloc_print_debug_stats(void)
+{
+ int i, header_done = 0;
+
+ if (!toi_alloc_ops.enabled)
+ return;
+
+ for (i = 0; i < TOI_ALLOC_PATHS; i++)
+ if (atomic_read(&toi_alloc_count[i]) !=
+ atomic_read(&toi_free_count[i])) {
+ if (!header_done) {
+ printk(KERN_INFO "Idx Allocs Frees Tests "
+ " Fails Max Description\n");
+ header_done = 1;
+ }
+
+ printk(KERN_INFO "%3d %7d %7d %7d %7d %7d %s\n", i,
+ atomic_read(&toi_alloc_count[i]),
+ atomic_read(&toi_free_count[i]),
+ atomic_read(&toi_test_count[i]),
+ atomic_read(&toi_fail_count[i]),
+ toi_max_allocd[i],
+ toi_alloc_desc[i]);
+ }
+}
+EXPORT_SYMBOL_GPL(toi_alloc_print_debug_stats);
+
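+/**
+ * toi_alloc_initialise - reset the allocation statistics
+ * @starting_cycle: Whether a new hibernation cycle is starting.
+ *
+ * Zero the per-site counters at the beginning of each cycle.
+ **/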
+static int toi_alloc_initialise(int starting_cycle)
+{
+ int i;
+
+ if (!starting_cycle)
+ return 0;
+
+ if (toi_trace_allocs)
+ dump_stack();
+
+ for (i = 0; i < TOI_ALLOC_PATHS; i++) {
+ atomic_set(&toi_alloc_count[i], 0);
+ atomic_set(&toi_free_count[i], 0);
+ atomic_set(&toi_test_count[i], 0);
+ atomic_set(&toi_fail_count[i], 0);
+ toi_cur_allocd[i] = 0;
+ toi_max_allocd[i] = 0;
+ };
+
+ max_allocd = 0;
+ cur_allocd = 0;
+ return 0;
+}
+
+static struct toi_sysfs_data sysfs_params[] = {
+ SYSFS_INT("failure_test", SYSFS_RW, &toi_fail_num, 0, 99, 0, NULL),
+ SYSFS_INT("trace", SYSFS_RW, &toi_trace_allocs, 0, TOI_ALLOC_PATHS, 0,
+ NULL),
+ SYSFS_BIT("find_max_mem_allocated", SYSFS_RW, &toi_bkd.toi_action,
+ TOI_GET_MAX_MEM_ALLOCD, 0),
+ SYSFS_INT("enabled", SYSFS_RW, &toi_alloc_ops.enabled, 0, 1, 0,
+ NULL)
+};
+
+static struct toi_module_ops toi_alloc_ops = {
+ .type = MISC_HIDDEN_MODULE,
+ .name = "allocation debugging",
+ .directory = "alloc",
+ .module = THIS_MODULE,
+ .early = 1,
+ .initialise = toi_alloc_initialise,
+
+ .sysfs_data = sysfs_params,
+ .num_sysfs_entries = sizeof(sysfs_params) /
+ sizeof(struct toi_sysfs_data),
+};
+
+int toi_alloc_init(void)
+{
+ int result = toi_register_module(&toi_alloc_ops);
+ return result;
+}
+
+void toi_alloc_exit(void)
+{
+ toi_unregister_module(&toi_alloc_ops);
+}
+#endif
diff --git a/kernel/power/tuxonice_alloc.h b/kernel/power/tuxonice_alloc.h
new file mode 100644
index 0000000..6e8167e
--- /dev/null
+++ b/kernel/power/tuxonice_alloc.h
@@ -0,0 +1,54 @@
+/*
+ * kernel/power/tuxonice_alloc.h
+ *
+ * Copyright (C) 2008-2014 Nigel Cunningham (nigel at tuxonice net)
+ *
+ * This file is released under the GPLv2.
+ *
+ */
+
+#include <linux/slab.h>
+#define TOI_WAIT_GFP (GFP_NOFS | __GFP_NOWARN)
+#define TOI_ATOMIC_GFP (GFP_ATOMIC | __GFP_NOWARN)
+
+#ifdef CONFIG_PM_DEBUG
+extern void *toi_kzalloc(int fail_num, size_t size, gfp_t flags);
+extern void toi_kfree(int fail_num, const void *arg, int size);
+
+extern unsigned long toi_get_free_pages(int fail_num, gfp_t mask,
+ unsigned int order);
+#define toi_get_free_page(FAIL_NUM, MASK) toi_get_free_pages(FAIL_NUM, MASK, 0)
+extern unsigned long toi_get_zeroed_page(int fail_num, gfp_t mask);
+extern void toi_free_page(int fail_num, unsigned long buf);
+extern void toi__free_page(int fail_num, struct page *page);
+extern void toi_free_pages(int fail_num, struct page *page, int order);
+extern struct page *toi_alloc_page(int fail_num, gfp_t mask);
+extern int toi_alloc_init(void);
+extern void toi_alloc_exit(void);
+
+extern void toi_alloc_print_debug_stats(void);
+
+#else /* CONFIG_PM_DEBUG */
+
+#define toi_kzalloc(FAIL, SIZE, FLAGS) (kzalloc(SIZE, FLAGS))
+#define toi_kfree(FAIL, ALLOCN, SIZE) (kfree(ALLOCN))
+
+#define toi_get_free_pages(FAIL, FLAGS, ORDER) __get_free_pages(FLAGS, ORDER)
+#define toi_get_free_page(FAIL, FLAGS) __get_free_page(FLAGS)
+#define toi_get_zeroed_page(FAIL, FLAGS) get_zeroed_page(FLAGS)
+#define toi_free_page(FAIL, ALLOCN) do { free_page(ALLOCN); } while (0)
+#define toi__free_page(FAIL, PAGE) __free_page(PAGE)
+#define toi_free_pages(FAIL, PAGE, ORDER) __free_pages(PAGE, ORDER)
+#define toi_alloc_page(FAIL, MASK) alloc_page(MASK)
+static inline int toi_alloc_init(void)
+{
+ return 0;
+}
+
+static inline void toi_alloc_exit(void) { }
+
+static inline void toi_alloc_print_debug_stats(void) { }
+
+#endif
+
+extern int toi_trace_allocs;
diff --git a/kernel/power/tuxonice_atomic_copy.c b/kernel/power/tuxonice_atomic_copy.c
new file mode 100644
index 0000000..7b7b1cd
--- /dev/null
+++ b/kernel/power/tuxonice_atomic_copy.c
@@ -0,0 +1,473 @@
+/*
+ * kernel/power/tuxonice_atomic_copy.c
+ *
+ * Copyright 2004-2014 Nigel Cunningham (nigel at tuxonice net)
+ *
+ * Distributed under GPLv2.
+ *
+ * Routines for doing the atomic save/restore.
+ */
+
+#include <linux/suspend.h>
+#include <linux/highmem.h>
+#include <linux/cpu.h>
+#include <linux/freezer.h>
+#include <linux/console.h>
+#include <linux/syscore_ops.h>
+#include <linux/ftrace.h>
+#include <asm/suspend.h>
+#include "tuxonice.h"
+#include "tuxonice_storage.h"
+#include "tuxonice_power_off.h"
+#include "tuxonice_ui.h"
+#include "tuxonice_io.h"
+#include "tuxonice_prepare_image.h"
+#include "tuxonice_pageflags.h"
+#include "tuxonice_checksum.h"
+#include "tuxonice_builtin.h"
+#include "tuxonice_atomic_copy.h"
+#include "tuxonice_alloc.h"
+#include "tuxonice_modules.h"
+
+unsigned long extra_pd1_pages_used;
+
+/**
+ * free_pbe_list - free page backup entries used by the atomic copy code.
+ * @list: List to free.
+ * @highmem: Whether the list is in highmem.
+ *
+ * Normally, this function isn't used. If, however, we need to abort before
+ * doing the atomic copy, we use this to free the pbes previously allocated.
+ **/
+static void free_pbe_list(struct pbe **list, int highmem)
+{
+ while (*list) {
+ int i;
+ struct pbe *free_pbe, *next_page = NULL;
+ struct page *page;
+
+ if (highmem) {
+ page = (struct page *) *list;
+ free_pbe = (struct pbe *) kmap(page);
+ } else {
+ page = virt_to_page(*list);
+ free_pbe = *list;
+ }
+
+ for (i = 0; i < PBES_PER_PAGE; i++) {
+ if (!free_pbe)
+ break;
+ if (highmem)
+ toi__free_page(29, free_pbe->address);
+ else
+ toi_free_page(29,
+ (unsigned long) free_pbe->address);
+ free_pbe = free_pbe->next;
+ }
+
+ if (highmem) {
+ if (free_pbe)
+ next_page = free_pbe;
+ kunmap(page);
+ } else {
+ if (free_pbe)
+ next_page = free_pbe;
+ }
+
+ toi__free_page(29, page);
+ *list = (struct pbe *) next_page;
+ };
+}
+
+/**
+ * copyback_post - post atomic-restore actions
+ *
+ * After doing the atomic restore, we have a few more things to do:
+ * 1) We want to retain some values across the restore, so we now copy
+ * these from the nosave variables to the normal ones.
+ * 2) Set the status flags.
+ * 3) Resume devices.
+ * 4) Tell userui so it can redraw & restore settings.
+ * 5) Reread the page cache.
+ **/
+void copyback_post(void)
+{
+ struct toi_boot_kernel_data *bkd =
+ (struct toi_boot_kernel_data *) boot_kernel_data_buffer;
+
+ if (toi_activate_storage(1))
+ panic("Failed to reactivate our storage.");
+
+ toi_post_atomic_restore_modules(bkd);
+
+ toi_cond_pause(1, "About to reload secondary pagedir.");
+
+ if (read_pageset2(0))
+ panic("Unable to successfully reread the page cache.");
+
+ /*
+ * If the user wants to sleep again after resuming from full-off,
+ * it's most likely to be in order to suspend to ram, so we'll
+ * do this check after loading pageset2, to give them the fastest
+ * wakeup when they are ready to use the computer again.
+ */
+ toi_check_resleep();
+}
+
+/**
+ * toi_copy_pageset1 - do the atomic copy of pageset1
+ *
+ * Make the atomic copy of pageset1. We can't use copy_page (as we once did)
+ * because we can't be sure what side effects it has. On my old Duron, with
+ * 3DNOW, kernel_fpu_begin increments preempt count, making our preempt
+ * count at resume time 4 instead of 3.
+ *
+ * We don't want to call kmap_atomic unconditionally because it has the side
+ * effect of incrementing the preempt count, which will leave it one too high
+ * post resume (the page containing the preempt count will be copied after
+ * it's incremented). This is essentially the same problem.
+ **/
+void toi_copy_pageset1(void)
+{
+ int i;
+ unsigned long source_index, dest_index;
+
+ memory_bm_position_reset(pageset1_map);
+ memory_bm_position_reset(pageset1_copy_map);
+
+ source_index = memory_bm_next_pfn(pageset1_map);
+ dest_index = memory_bm_next_pfn(pageset1_copy_map);
+
+ for (i = 0; i < pagedir1.size; i++) {
+ unsigned long *origvirt, *copyvirt;
+ struct page *origpage, *copypage;
+ int loop = (PAGE_SIZE / sizeof(unsigned long)) - 1,
+ was_present1, was_present2;
+
+ origpage = pfn_to_page(source_index);
+ copypage = pfn_to_page(dest_index);
+
+ origvirt = PageHighMem(origpage) ?
+ kmap_atomic(origpage) :
+ page_address(origpage);
+
+ copyvirt = PageHighMem(copypage) ?
+ kmap_atomic(copypage) :
+ page_address(copypage);
+
+ was_present1 = kernel_page_present(origpage);
+ if (!was_present1)
+ kernel_map_pages(origpage, 1, 1);
+
+ was_present2 = kernel_page_present(copypage);
+ if (!was_present2)
+ kernel_map_pages(copypage, 1, 1);
+
+ while (loop >= 0) {
+ *(copyvirt + loop) = *(origvirt + loop);
+ loop--;
+ }
+
+ if (!was_present1)
+ kernel_map_pages(origpage, 1, 0);
+
+ if (!was_present2)
+ kernel_map_pages(copypage, 1, 0);
+
+ if (PageHighMem(origpage))
+ kunmap_atomic(origvirt);
+
+ if (PageHighMem(copypage))
+ kunmap_atomic(copyvirt);
+
+ source_index = memory_bm_next_pfn(pageset1_map);
+ dest_index = memory_bm_next_pfn(pageset1_copy_map);
+ }
+}
+
+/**
+ * __toi_post_context_save - steps after saving the cpu context
+ *
+ * Steps taken after saving the CPU state to make the actual
+ * atomic copy.
+ *
+ * Called from swsusp_save in snapshot.c via toi_post_context_save.
+ **/
+int __toi_post_context_save(void)
+{
+ unsigned long old_ps1_size = pagedir1.size;
+
+ check_checksums();
+
+ free_checksum_pages();
+
+ toi_recalculate_image_contents(1);
+
+ extra_pd1_pages_used = pagedir1.size > old_ps1_size ?
+ pagedir1.size - old_ps1_size : 0;
+
+ if (extra_pd1_pages_used > extra_pd1_pages_allowance) {
+ printk(KERN_INFO "Pageset1 has grown by %lu pages. "
+ "extra_pages_allowance is currently only %lu.\n",
+ pagedir1.size - old_ps1_size,
+ extra_pd1_pages_allowance);
+
+ /*
+ * Highlevel code will see this, clear the state and
+ * retry if we haven't already done so twice.
+ */
+ if (any_to_free(1)) {
+ set_abort_result(TOI_EXTRA_PAGES_ALLOW_TOO_SMALL);
+ return 1;
+ }
+ if (try_allocate_extra_memory()) {
+ printk(KERN_INFO "Failed to allocate the extra memory"
+ " needed. Restarting the process.");
+ set_abort_result(TOI_EXTRA_PAGES_ALLOW_TOO_SMALL);
+ return 1;
+ }
+ printk(KERN_INFO "However it looks like there's enough"
+ " free ram and storage to handle this, so "
+ " continuing anyway.");
+ /*
+ * What if try_allocate_extra_memory above calls
+ * toi_allocate_extra_pagedir_memory and it allocs a new
+ * slab page via toi_kzalloc which should be in ps1? So...
+ */
+ toi_recalculate_image_contents(1);
+ }
+
+ if (!test_action_state(TOI_TEST_FILTER_SPEED) &&
+ !test_action_state(TOI_TEST_BIO))
+ toi_copy_pageset1();
+
+ return 0;
+}
+
+/**
+ * toi_hibernate - high level code for doing the atomic copy
+ *
+ * High-level code which prepares to do the atomic copy. Loosely based
+ * on the swsusp version, but with the following twists:
+ * - We set toi_running so the swsusp code uses our code paths.
+ * - We give better feedback regarding what goes wrong if there is a
+ * problem.
+ * - We use an extra function to call the assembly, just in case this code
+ * is in a module (return address).
+ **/
+int toi_hibernate(void)
+{
+ int error;
+
+ toi_running = 1; /* For the swsusp code we use :< */
+
+ error = toi_lowlevel_builtin();
+
+ if (!error) {
+ struct toi_boot_kernel_data *bkd =
+ (struct toi_boot_kernel_data *) boot_kernel_data_buffer;
+
+ /*
+ * The boot kernel's data may be larger (newer version) or
+ * smaller (older version) than ours. Copy the minimum
+ * of the two sizes, so that we don't overwrite valid values
+ * from pre-atomic copy.
+ */
+
+ memcpy(&toi_bkd, (char *) boot_kernel_data_buffer,
+ min_t(int, sizeof(struct toi_boot_kernel_data),
+ bkd->size));
+ }
+
+ toi_running = 0;
+ return error;
+}
+
+/**
+ * toi_atomic_restore - prepare to do the atomic restore
+ *
+ * Get ready to do the atomic restore. This part gets us into the same
+ * state we are in prior to calling do_toi_lowlevel while
+ * hibernating: hot-unplugging secondary CPUs and freezing processes,
+ * before starting the thread that will do the restore.
+ **/
+int toi_atomic_restore(void)
+{
+ int error;
+
+ toi_running = 1;
+
+ toi_prepare_status(DONT_CLEAR_BAR, "Atomic restore.");
+
+ memcpy(&toi_bkd.toi_nosave_commandline, saved_command_line,
+ strlen(saved_command_line));
+
+ toi_pre_atomic_restore_modules(&toi_bkd);
+
+ if (add_boot_kernel_data_pbe())
+ goto Failed;
+
+ toi_prepare_status(DONT_CLEAR_BAR, "Doing atomic copy/restore.");
+
+ if (toi_go_atomic(PMSG_QUIESCE, 0))
+ goto Failed;
+
+ /* We'll ignore saved state, but this gets preempt count (etc) right */
+ save_processor_state();
+
+ error = swsusp_arch_resume();
+ /*
+ * Code below is only ever reached in case of failure. Otherwise
+ * execution continues at place where swsusp_arch_suspend was called.
+ *
+ * We don't know whether it's safe to continue (this shouldn't happen),
+ * so let's err on the side of caution.
+ */
+ BUG();
+
+Failed:
+ free_pbe_list(&restore_pblist, 0);
+#ifdef CONFIG_HIGHMEM
+ free_pbe_list(&restore_highmem_pblist, 1);
+#endif
+ toi_running = 0;
+ return 1;
+}
+
+/**
+ * toi_go_atomic - do the actual atomic copy/restore
+ * @state: The state to use for dpm_suspend_start & power_down calls.
+ * @suspend_time: Whether we're suspending or resuming.
+ **/
+int toi_go_atomic(pm_message_t state, int suspend_time)
+{
+ if (suspend_time) {
+ if (platform_begin(1)) {
+ set_abort_result(TOI_PLATFORM_PREP_FAILED);
+ toi_end_atomic(ATOMIC_STEP_PLATFORM_END, suspend_time, 3);
+ return 1;
+ }
+
+ if (dpm_prepare(PMSG_FREEZE)) {
+ set_abort_result(TOI_DPM_PREPARE_FAILED);
+ dpm_complete(PMSG_RECOVER);
+ toi_end_atomic(ATOMIC_STEP_PLATFORM_END, suspend_time, 3);
+ return 1;
+ }
+ }
+
+ suspend_console();
+ ftrace_stop();
+ pm_restrict_gfp_mask();
+
+ if (suspend_time) {
+ if (dpm_suspend(state)) {
+ set_abort_result(TOI_DPM_SUSPEND_FAILED);
+ toi_end_atomic(ATOMIC_STEP_DEVICE_RESUME, suspend_time, 3);
+ return 1;
+ }
+ } else {
+ if (dpm_suspend_start(state)) {
+ set_abort_result(TOI_DPM_SUSPEND_FAILED);
+ toi_end_atomic(ATOMIC_STEP_DEVICE_RESUME, suspend_time, 3);
+ return 1;
+ }
+ }
+
+ /* At this point, dpm_suspend_start() has been called, but *not*
+ * dpm_suspend_noirq(). We *must* dpm_suspend_noirq() now.
+ * Otherwise, drivers for some devices (e.g. interrupt controllers)
+ * become desynchronized with the actual state of the hardware
+ * at resume time, and evil weirdness ensues.
+ */
+
+ if (dpm_suspend_end(state)) {
+ set_abort_result(TOI_DEVICE_REFUSED);
+ toi_end_atomic(ATOMIC_STEP_DEVICE_RESUME, suspend_time, 1);
+ return 1;
+ }
+
+ if (suspend_time) {
+ if (platform_pre_snapshot(1))
+ set_abort_result(TOI_PRE_SNAPSHOT_FAILED);
+ } else {
+ if (platform_pre_restore(1))
+ set_abort_result(TOI_PRE_RESTORE_FAILED);
+ }
+
+ if (test_result_state(TOI_ABORTED)) {
+ toi_end_atomic(ATOMIC_STEP_PLATFORM_FINISH, suspend_time, 1);
+ return 1;
+ }
+
+ if (test_action_state(TOI_LATE_CPU_HOTPLUG)) {
+ if (disable_nonboot_cpus()) {
+ set_abort_result(TOI_CPU_HOTPLUG_FAILED);
+ toi_end_atomic(ATOMIC_STEP_CPU_HOTPLUG,
+ suspend_time, 1);
+ return 1;
+ }
+ }
+
+ local_irq_disable();
+
+ if (syscore_suspend()) {
+ set_abort_result(TOI_SYSCORE_REFUSED);
+ toi_end_atomic(ATOMIC_STEP_IRQS, suspend_time, 1);
+ return 1;
+ }
+
+ if (suspend_time && pm_wakeup_pending()) {
+ set_abort_result(TOI_WAKEUP_EVENT);
+ toi_end_atomic(ATOMIC_STEP_SYSCORE_RESUME, suspend_time, 1);
+ return 1;
+ }
+ return 0;
+}
+
+/**
+ * toi_end_atomic - post atomic copy/restore routines
+ * @stage: What step to start at.
+ * @suspend_time: Whether we're suspending or resuming.
+ * @error: Whether we're recovering from an error.
+ **/
+void toi_end_atomic(int stage, int suspend_time, int error)
+{
+ pm_message_t msg = suspend_time ? (error ? PMSG_RECOVER : PMSG_THAW) :
+ PMSG_RESTORE;
+
+ switch (stage) {
+ case ATOMIC_ALL_STEPS:
+ if (!suspend_time) {
+ events_check_enabled = false;
+ }
+ platform_leave(1);
+ case ATOMIC_STEP_SYSCORE_RESUME:
+ syscore_resume();
+ case ATOMIC_STEP_IRQS:
+ local_irq_enable();
+ case ATOMIC_STEP_CPU_HOTPLUG:
+ if (test_action_state(TOI_LATE_CPU_HOTPLUG))
+ enable_nonboot_cpus();
+ case ATOMIC_STEP_PLATFORM_FINISH:
+ if (!suspend_time && error & 2)
+ platform_restore_cleanup(1);
+ else
+ platform_finish(1);
+ dpm_resume_start(msg);
+ case ATOMIC_STEP_DEVICE_RESUME:
+ if (suspend_time && (error & 2))
+ platform_recover(1);
+ dpm_resume(msg);
+ if (error || !toi_in_suspend())
+ pm_restore_gfp_mask();
+ ftrace_start();
+ resume_console();
+ case ATOMIC_STEP_DPM_COMPLETE:
+ dpm_complete(msg);
+ case ATOMIC_STEP_PLATFORM_END:
+ platform_end(1);
+
+ toi_prepare_status(DONT_CLEAR_BAR, "Post atomic.");
+ }
+}
diff --git a/kernel/power/tuxonice_atomic_copy.h b/kernel/power/tuxonice_atomic_copy.h
new file mode 100644
index 0000000..86aaae3
--- /dev/null
+++ b/kernel/power/tuxonice_atomic_copy.h
@@ -0,0 +1,23 @@
+/*
+ * kernel/power/tuxonice_atomic_copy.h
+ *
+ * Copyright 2008-2014 Nigel Cunningham (nigel at tuxonice net)
+ *
+ * Distributed under GPLv2.
+ *
+ * Routines for doing the atomic save/restore.
+ */
+
+enum {
+ ATOMIC_ALL_STEPS,
+ ATOMIC_STEP_SYSCORE_RESUME,
+ ATOMIC_STEP_IRQS,
+ ATOMIC_STEP_CPU_HOTPLUG,
+ ATOMIC_STEP_PLATFORM_FINISH,
+ ATOMIC_STEP_DEVICE_RESUME,
+ ATOMIC_STEP_DPM_COMPLETE,
+ ATOMIC_STEP_PLATFORM_END,
+};
+
+int toi_go_atomic(pm_message_t state, int toi_time);
+void toi_end_atomic(int stage, int toi_time, int error);
diff --git a/kernel/power/tuxonice_bio.h b/kernel/power/tuxonice_bio.h
new file mode 100644
index 0000000..65130c8
--- /dev/null
+++ b/kernel/power/tuxonice_bio.h
@@ -0,0 +1,77 @@
+/*
+ * kernel/power/tuxonice_bio.h
+ *
+ * Copyright (C) 2004-2014 Nigel Cunningham (nigel at tuxonice net)
+ *
+ * Distributed under GPLv2.
+ *
+ * This file contains declarations for functions exported from
+ * tuxonice_bio.c, which contains low level io functions.
+ */
+
+#include <linux/buffer_head.h>
+#include "tuxonice_extent.h"
+
+void toi_put_extent_chain(struct hibernate_extent_chain *chain);
+int toi_add_to_extent_chain(struct hibernate_extent_chain *chain,
+ unsigned long start, unsigned long end);
+
+struct hibernate_extent_saved_state {
+ int extent_num;
+ struct hibernate_extent *extent_ptr;
+ unsigned long offset;
+};
+
+struct toi_bdev_info {
+ struct toi_bdev_info *next;
+ struct hibernate_extent_chain blocks;
+ struct block_device *bdev;
+ struct toi_module_ops *allocator;
+ int allocator_index;
+ struct hibernate_extent_chain allocations;
+ char name[266]; /* "swap on " or "file " + up to 256 chars */
+
+ /* Saved in header */
+ char uuid[17];
+ dev_t dev_t;
+ int prio;
+ int bmap_shift;
+ int blocks_per_page;
+ unsigned long pages_used;
+ struct hibernate_extent_saved_state saved_state[4];
+};
+
+struct toi_extent_iterate_state {
+ struct toi_bdev_info *current_chain;
+ int num_chains;
+ int saved_chain_number[4];
+ struct toi_bdev_info *saved_chain_ptr[4];
+};
+
+/*
+ * Our exported interface so the swapwriter and filewriter don't
+ * need these functions duplicated.
+ */
+struct toi_bio_ops {
+ int (*bdev_page_io) (int rw, struct block_device *bdev, long pos,
+ struct page *page);
+ int (*register_storage)(struct toi_bdev_info *new);
+ void (*free_storage)(void);
+};
+
+struct toi_allocator_ops {
+ unsigned long (*toi_swap_storage_available) (void);
+};
+
+extern struct toi_bio_ops toi_bio_ops;
+
+extern char *toi_writer_buffer;
+extern int toi_writer_buffer_posn;
+
+struct toi_bio_allocator_ops {
+ int (*register_storage) (void);
+ unsigned long (*storage_available)(void);
+ int (*allocate_storage) (struct toi_bdev_info *, unsigned long);
+ int (*bmap) (struct toi_bdev_info *);
+ void (*free_storage) (struct toi_bdev_info *);
+};
diff --git a/kernel/power/tuxonice_bio_chains.c b/kernel/power/tuxonice_bio_chains.c
new file mode 100644
index 0000000..73dbcf2
--- /dev/null
+++ b/kernel/power/tuxonice_bio_chains.c
@@ -0,0 +1,1048 @@
+/*
+ * kernel/power/tuxonice_bio_chains.c
+ *
+ * Copyright (C) 2009-2014 Nigel Cunningham (nigel at tuxonice net)
+ *
+ * Distributed under GPLv2.
+ *
+ */
+
+#include <linux/mm_types.h>
+#include "tuxonice_bio.h"
+#include "tuxonice_bio_internal.h"
+#include "tuxonice_alloc.h"
+#include "tuxonice_ui.h"
+#include "tuxonice.h"
+#include "tuxonice_io.h"
+
+static struct toi_bdev_info *prio_chain_head;
+static int num_chains;
+
+/* Pointer to current entry being loaded/saved. */
+struct toi_extent_iterate_state toi_writer_posn;
+
+#define metadata_size (sizeof(struct toi_bdev_info) - \
+ offsetof(struct toi_bdev_info, uuid))
+
+/*
+ * After section 0 (header) comes 2 => next_section[0] = 2
+ */
+static int next_section[3] = { 2, 3, 1 };
+
+/**
+ * dump_block_chains - print the contents of the bdev info array.
+ **/
+void dump_block_chains(void)
+{
+ int i = 0;
+ int j;
+ struct toi_bdev_info *cur_chain = prio_chain_head;
+
+ while (cur_chain) {
+ struct hibernate_extent *this = cur_chain->blocks.first;
+
+ printk(KERN_DEBUG "Chain %d (prio %d):", i, cur_chain->prio);
+
+ while (this) {
+ printk(KERN_CONT " [%lu-%lu]%s", this->start,
+ this->end, this->next ? "," : "");
+ this = this->next;
+ }
+
+ printk("\n");
+ cur_chain = cur_chain->next;
+ i++;
+ }
+
+ printk(KERN_DEBUG "Saved states:\n");
+ for (i = 0; i < 4; i++) {
+ printk(KERN_DEBUG "Slot %d: Chain %d.\n",
+ i, toi_writer_posn.saved_chain_number[i]);
+
+ cur_chain = prio_chain_head;
+ j = 0;
+ while (cur_chain) {
+ printk(KERN_DEBUG " Chain %d: Extent %d. Offset %lu.\n",
+ j, cur_chain->saved_state[i].extent_num,
+ cur_chain->saved_state[i].offset);
+ cur_chain = cur_chain->next;
+ j++;
+ }
+ printk(KERN_CONT "\n");
+ }
+}
+
+/**
+ * toi_extent_chain_next - advance the current chain's position by one block
+ *
+ * Move to the next offset in the current extent, or to the start of the
+ * next extent if the current one is exhausted.
+ **/
+static void toi_extent_chain_next(void)
+{
+ struct toi_bdev_info *this = toi_writer_posn.current_chain;
+
+ if (!this->blocks.current_extent)
+ return;
+
+ if (this->blocks.current_offset == this->blocks.current_extent->end) {
+ if (this->blocks.current_extent->next) {
+ this->blocks.current_extent =
+ this->blocks.current_extent->next;
+ this->blocks.current_offset =
+ this->blocks.current_extent->start;
+ } else {
+ this->blocks.current_extent = NULL;
+ this->blocks.current_offset = 0;
+ }
+ } else
+ this->blocks.current_offset++;
+}
+
+/**
+ * __find_next_chain_same_prio - get the next usable chain of the same priority
+ *
+ * Walk the list circularly, returning the next chain that has the same
+ * priority and still has extents left. If we get back to the original
+ * chain, reuse it.
+ **/
+
+static struct toi_bdev_info *__find_next_chain_same_prio(void)
+{
+ struct toi_bdev_info *start_chain = toi_writer_posn.current_chain;
+ struct toi_bdev_info *this = start_chain;
+ int orig_prio = this->prio;
+
+ do {
+ this = this->next;
+
+ if (!this)
+ this = prio_chain_head;
+
+ /* Back on original chain? Use it again. */
+ if (this == start_chain)
+ return start_chain;
+
+ } while (!this->blocks.current_extent || this->prio != orig_prio);
+
+ return this;
+}
+
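+/**
+ * find_next_chain - select the next chain with storage remaining
+ *
+ * Prefer another chain of the same priority; if none has extents left,
+ * fall through to lower priorities. current_chain becomes NULL when no
+ * chain has anything left.
+ **/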
+static void find_next_chain(void)
+{
+ struct toi_bdev_info *this;
+
+ this = __find_next_chain_same_prio();
+
+ /*
+ * If we didn't get another chain of the same priority that we
+ * can use, look for the next priority.
+ */
+ while (this && !this->blocks.current_extent)
+ this = this->next;
+
+ toi_writer_posn.current_chain = this;
+}
+
+/**
+ * toi_extent_state_next - go to the next extent
+ * @blocks: The number of values to progress.
+ * @stripe_mode: Whether to spread usage across all chains.
+ *
+ * Given a state, progress to the next valid entry. We may begin in an
+ * invalid state, as we do when invoked after extent_state_goto_start below.
+ *
+ * When using compression and expected_compression > 0, we let the image size
+ * be larger than storage, so we can validly run out of data to return.
+ **/
+static unsigned long toi_extent_state_next(int blocks, int current_stream)
+{
+ int i;
+
+ if (!toi_writer_posn.current_chain)
+ return -ENOSPC;
+
+ /* Assume chains always have lengths that are multiples of @blocks */
+ for (i = 0; i < blocks; i++)
+ toi_extent_chain_next();
+
+ /* The header stream is not striped */
+ if (current_stream ||
+ !toi_writer_posn.current_chain->blocks.current_extent)
+ find_next_chain();
+
+ return toi_writer_posn.current_chain ? 0 : -ENOSPC;
+}
+
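+/**
+ * toi_insert_chain_in_prio_list - add a chain to the priority ordered list
+ * @this: Chain to insert.
+ *
+ * The list is kept sorted by descending priority; a new chain goes after
+ * any existing chains of the same priority.
+ **/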
+static void toi_insert_chain_in_prio_list(struct toi_bdev_info *this)
+{
+ struct toi_bdev_info **prev_ptr;
+ struct toi_bdev_info *cur;
+
+ /* Loop through the existing chain, finding where to insert it */
+ prev_ptr = &prio_chain_head;
+ cur = prio_chain_head;
+
+ while (cur && cur->prio >= this->prio) {
+ prev_ptr = &cur->next;
+ cur = cur->next;
+ }
+
+ this->next = *prev_ptr;
+ *prev_ptr = this;
+
+ this = prio_chain_head;
+ while (this)
+ this = this->next;
+ num_chains++;
+}
+
+/**
+ * toi_extent_state_goto_start - reinitialize an extent chain iterator
+ * @state: Iterator to reinitialize
+ **/
+void toi_extent_state_goto_start(void)
+{
+ struct toi_bdev_info *this = prio_chain_head;
+
+ while (this) {
+ toi_message(TOI_BIO, TOI_VERBOSE, 0,
+ "Setting current extent to %p.", this->blocks.first);
+ this->blocks.current_extent = this->blocks.first;
+ if (this->blocks.current_extent) {
+ toi_message(TOI_BIO, TOI_VERBOSE, 0,
+ "Setting current offset to %lu.",
+ this->blocks.current_extent->start);
+ this->blocks.current_offset =
+ this->blocks.current_extent->start;
+ }
+
+ this = this->next;
+ }
+
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "Setting current chain to %p.",
+ prio_chain_head);
+ toi_writer_posn.current_chain = prio_chain_head;
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "Leaving extent state goto start.");
+}
+
+/**
+ * toi_extent_state_save - save state of the iterator
+ * @state: Current state of the chain
+ * @saved_state: Iterator to populate
+ *
+ * Given a state and a struct hibernate_extent_state_store, save the current
+ * position in a format that can be used with relocated chains (at
+ * resume time).
+ **/
+void toi_extent_state_save(int slot)
+{
+ struct toi_bdev_info *cur_chain = prio_chain_head;
+ struct hibernate_extent *extent;
+ struct hibernate_extent_saved_state *chain_state;
+ int i = 0;
+
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "toi_extent_state_save, slot %d.",
+ slot);
+
+ if (!toi_writer_posn.current_chain) {
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "No current chain => "
+ "chain_num = -1.");
+ toi_writer_posn.saved_chain_number[slot] = -1;
+ return;
+ }
+
+ while (cur_chain) {
+ i++;
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "Saving chain %d (%p) "
+ "state, slot %d.", i, cur_chain, slot);
+
+ chain_state = &cur_chain->saved_state[slot];
+
+ chain_state->offset = cur_chain->blocks.current_offset;
+
+ if (toi_writer_posn.current_chain == cur_chain) {
+ toi_writer_posn.saved_chain_number[slot] = i;
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "This is the chain "
+ "we were on => chain_num is %d.", i);
+ }
+
+ if (!cur_chain->blocks.current_extent) {
+ chain_state->extent_num = 0;
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "No current extent "
+ "for this chain => extent_num %d is 0.",
+ i);
+ cur_chain = cur_chain->next;
+ continue;
+ }
+
+ extent = cur_chain->blocks.first;
+ chain_state->extent_num = 1;
+
+ while (extent != cur_chain->blocks.current_extent) {
+ chain_state->extent_num++;
+ extent = extent->next;
+ }
+
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "extent num %d is %d.", i,
+ chain_state->extent_num);
+
+ cur_chain = cur_chain->next;
+ }
+ toi_message(TOI_BIO, TOI_VERBOSE, 0,
+ "Completed saving extent state slot %d.", slot);
+}
+
+/**
+ * toi_extent_state_restore - restore the position saved by extent_state_save
+ * @state: State to populate
+ * @saved_state: Iterator saved to restore
+ **/
+void toi_extent_state_restore(int slot)
+{
+ int i = 0;
+ struct toi_bdev_info *cur_chain = prio_chain_head;
+ struct hibernate_extent_saved_state *chain_state;
+
+ toi_message(TOI_BIO, TOI_VERBOSE, 0,
+ "toi_extent_state_restore - slot %d.", slot);
+
+ if (toi_writer_posn.saved_chain_number[slot] == -1) {
+ toi_writer_posn.current_chain = NULL;
+ return;
+ }
+
+ while (cur_chain) {
+ int posn;
+ int j;
+ i++;
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "Restoring chain %d (%p) "
+ "state, slot %d.", i, cur_chain, slot);
+
+ chain_state = &cur_chain->saved_state[slot];
+
+ posn = chain_state->extent_num;
+
+ cur_chain->blocks.current_extent = cur_chain->blocks.first;
+ cur_chain->blocks.current_offset = chain_state->offset;
+
+ if (i == toi_writer_posn.saved_chain_number[slot]) {
+ toi_writer_posn.current_chain = cur_chain;
+ toi_message(TOI_BIO, TOI_VERBOSE, 0,
+ "Found current chain.");
+ }
+
+ for (j = 0; j < 4; j++)
+ if (i == toi_writer_posn.saved_chain_number[j]) {
+ toi_writer_posn.saved_chain_ptr[j] = cur_chain;
+ toi_message(TOI_BIO, TOI_VERBOSE, 0,
+ "Found saved chain ptr %d (%p) (offset"
+ " %d).", j, cur_chain,
+ cur_chain->saved_state[j].offset);
+ }
+
+ if (posn) {
+ while (--posn)
+ cur_chain->blocks.current_extent =
+ cur_chain->blocks.current_extent->next;
+ } else
+ cur_chain->blocks.current_extent = NULL;
+
+ cur_chain = cur_chain->next;
+ }
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "Done.");
+ if (test_action_state(TOI_LOGALL))
+ dump_block_chains();
+}
+
+/*
+ * Storage needed
+ *
+ * Returns amount of space in the image header required
+ * for the chain data. This ignores the links between
+ * pages, which we factor in when allocating the space.
+ */
+int toi_bio_devinfo_storage_needed(void)
+{
+ int result = sizeof(num_chains);
+ struct toi_bdev_info *chain = prio_chain_head;
+
+ while (chain) {
+ result += metadata_size;
+
+ /* Chain size */
+ result += sizeof(int);
+
+ /* Extents */
+ result += (2 * sizeof(unsigned long) *
+ chain->blocks.num_extents);
+
+ chain = chain->next;
+ }
+
+ result += 4 * sizeof(int);
+ return result;
+}
+
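+/*
+ * Count the pages of this chain used up to the position recorded in
+ * saved state slot 3. If no position was saved there, the whole chain
+ * is counted (zero for an empty chain).
+ */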
+static unsigned long chain_pages_used(struct toi_bdev_info *chain)
+{
+ struct hibernate_extent *this = chain->blocks.first;
+ struct hibernate_extent_saved_state *state = &chain->saved_state[3];
+ unsigned long size = 0;
+ int extent_idx = 1;
+
+ if (!state->extent_num) {
+ if (!this)
+ return 0;
+ else
+ return chain->blocks.size;
+ }
+
+ while (extent_idx < state->extent_num) {
+ size += (this->end - this->start + 1);
+ this = this->next;
+ extent_idx++;
+ }
+
+ /* We didn't use the one we're sitting on, so don't count it */
+ return size + state->offset - this->start;
+}
+
+/**
+ * toi_serialise_extent_chain - write a chain in the image
+ * @chain: Chain to write.
+ **/
+static int toi_serialise_extent_chain(struct toi_bdev_info *chain)
+{
+ struct hibernate_extent *this;
+ int ret;
+ int i = 1;
+
+ chain->pages_used = chain_pages_used(chain);
+
+ if (test_action_state(TOI_LOGALL))
+ dump_block_chains();
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "Serialising chain (dev_t %lx).",
+ chain->dev_t);
+ /* Device info - dev_t, prio, bmap_shift, blocks per page, positions */
+ ret = toiActiveAllocator->rw_header_chunk(WRITE, &toi_blockwriter_ops,
+ (char *) &chain->uuid, metadata_size);
+ if (ret)
+ return ret;
+
+ /* Num extents */
+ ret = toiActiveAllocator->rw_header_chunk(WRITE, &toi_blockwriter_ops,
+ (char *) &chain->blocks.num_extents, sizeof(int));
+ if (ret)
+ return ret;
+
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "%d extents.",
+ chain->blocks.num_extents);
+
+ this = chain->blocks.first;
+ while (this) {
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "Extent %d.", i);
+ ret = toiActiveAllocator->rw_header_chunk(WRITE,
+ &toi_blockwriter_ops,
+ (char *) this, 2 * sizeof(this->start));
+ if (ret)
+ return ret;
+ this = this->next;
+ i++;
+ }
+
+ return ret;
+}
+
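+/**
+ * toi_serialise_extent_chains - write all chains to the image header
+ *
+ * Write the number of chains, then each chain's metadata and extents,
+ * and finally the chain numbers saved for the start of each section.
+ **/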
+int toi_serialise_extent_chains(void)
+{
+ struct toi_bdev_info *this = prio_chain_head;
+ int result;
+
+ /* Write the number of chains */
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "Write number of chains (%d)",
+ num_chains);
+ result = toiActiveAllocator->rw_header_chunk(WRITE,
+ &toi_blockwriter_ops, (char *) &num_chains,
+ sizeof(int));
+ if (result)
+ return result;
+
+ /* Then the chains themselves */
+ while (this) {
+ result = toi_serialise_extent_chain(this);
+ if (result)
+ return result;
+ this = this->next;
+ }
+
+ /*
+ * Finally, the chain we should be on at the start of each
+ * section.
+ */
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "Saved chain numbers.");
+ result = toiActiveAllocator->rw_header_chunk(WRITE,
+ &toi_blockwriter_ops,
+ (char *) &toi_writer_posn.saved_chain_number[0],
+ 4 * sizeof(int));
+
+ return result;
+}
+
+int toi_register_storage_chain(struct toi_bdev_info *new)
+{
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "Inserting chain %p into list.",
+ new);
+ toi_insert_chain_in_prio_list(new);
+ return 0;
+}
+
+static void free_bdev_info(struct toi_bdev_info *chain)
+{
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "Free chain %p.", chain);
+
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, " - Block extents.");
+ toi_put_extent_chain(&chain->blocks);
+
+ /*
+ * The allocator may need to do more than just free the chains
+ * (swap_free, for example). Don't call from boot kernel.
+ */
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, " - Allocator extents.");
+ if (chain->allocator)
+ chain->allocator->bio_allocator_ops->free_storage(chain);
+
+ /*
+ * Dropping out of reading atomic copy? Need to undo
+ * toi_open_by_devnum.
+ */
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, " - Bdev.");
+ if (chain->bdev && !IS_ERR(chain->bdev) &&
+ chain->bdev != resume_block_device &&
+ chain->bdev != header_block_device &&
+ test_toi_state(TOI_TRYING_TO_RESUME))
+ toi_close_bdev(chain->bdev);
+
+ /* Poison */
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, " - Struct.");
+ toi_kfree(39, chain, sizeof(*chain));
+
+ if (prio_chain_head == chain)
+ prio_chain_head = NULL;
+
+ num_chains--;
+}
+
+void free_all_bdev_info(void)
+{
+ struct toi_bdev_info *this = prio_chain_head;
+
+ while (this) {
+ struct toi_bdev_info *next = this->next;
+ free_bdev_info(this);
+ this = next;
+ }
+
+ memset((char *) &toi_writer_posn, 0, sizeof(toi_writer_posn));
+ prio_chain_head = NULL;
+}
+
+static void set_up_start_position(void)
+{
+ toi_writer_posn.current_chain = prio_chain_head;
+ go_next_page(0, 0);
+}
+
+/**
+ * toi_load_extent_chain - read back a chain saved in the image
+ * @chain: Chain to load
+ *
+ * The linked list of extents is reconstructed from the disk. chain will point
+ * to the first entry.
+ **/
+int toi_load_extent_chain(int index, int *num_loaded)
+{
+ struct toi_bdev_info *chain = toi_kzalloc(39,
+ sizeof(struct toi_bdev_info), GFP_ATOMIC);
+ struct hibernate_extent *this, *last = NULL;
+ int i, ret;
+
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "Loading extent chain %d.", index);
+ /* Get dev_t, prio, bmap_shift, blocks per page, positions */
+ ret = toiActiveAllocator->rw_header_chunk_noreadahead(READ, NULL,
+ (char *) &chain->uuid, metadata_size);
+
+ if (ret) {
+ printk(KERN_ERR "Failed to read the size of extent chain.\n");
+ toi_kfree(39, chain, sizeof(*chain));
+ return 1;
+ }
+
+ toi_bkd.pages_used[index] = chain->pages_used;
+
+ ret = toiActiveAllocator->rw_header_chunk_noreadahead(READ, NULL,
+ (char *) &chain->blocks.num_extents, sizeof(int));
+ if (ret) {
+ printk(KERN_ERR "Failed to read the size of extent chain.\n");
+ toi_kfree(39, chain, sizeof(*chain));
+ return 1;
+ }
+
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "%d extents.",
+ chain->blocks.num_extents);
+
+ for (i = 0; i < chain->blocks.num_extents; i++) {
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "Extent %d.", i + 1);
+
+ this = toi_kzalloc(2, sizeof(struct hibernate_extent),
+ TOI_ATOMIC_GFP);
+ if (!this) {
+ printk(KERN_INFO "Failed to allocate a new extent.\n");
+ free_bdev_info(chain);
+ return -ENOMEM;
+ }
+ this->next = NULL;
+ /* Get the next page */
+ ret = toiActiveAllocator->rw_header_chunk_noreadahead(READ,
+ NULL, (char *) this, 2 * sizeof(this->start));
+ if (ret) {
+ printk(KERN_INFO "Failed to read an extent.\n");
+ toi_kfree(2, this, sizeof(struct hibernate_extent));
+ free_bdev_info(chain);
+ return 1;
+ }
+
+ if (last)
+ last->next = this;
+ else {
+ char b1[32], b2[32], b3[32];
+ /*
+ * Open the bdev
+ */
+ toi_message(TOI_BIO, TOI_VERBOSE, 0,
+ "Chain dev_t is %s. Resume dev t is %s. Header"
+ " bdev_t is %s.\n",
+ format_dev_t(b1, chain->dev_t),
+ format_dev_t(b2, resume_dev_t),
+ format_dev_t(b3, toi_sig_data->header_dev_t));
+
+ if (chain->dev_t == resume_dev_t)
+ chain->bdev = resume_block_device;
+ else if (chain->dev_t == toi_sig_data->header_dev_t)
+ chain->bdev = header_block_device;
+ else {
+ chain->bdev = toi_open_bdev(chain->uuid,
+ chain->dev_t, 1);
+ if (IS_ERR(chain->bdev)) {
+ free_bdev_info(chain);
+ return -ENODEV;
+ }
+ }
+
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "Chain bmap shift "
+ "is %d and blocks per page is %d.",
+ chain->bmap_shift,
+ chain->blocks_per_page);
+
+ chain->blocks.first = this;
+
+ /*
+ * Couldn't do this earlier, but can't do
+ * goto_start now - we may have already used blocks
+ * in the first chain.
+ */
+ chain->blocks.current_extent = this;
+ chain->blocks.current_offset = this->start;
+
+ /*
+ * Can't wait until we've read the whole chain
+ * before we insert it in the list. We might need
+ * this chain to read the next page in the header
+ */
+ toi_insert_chain_in_prio_list(chain);
+ }
+
+ /*
+ * We have to wait until 2 extents are loaded before setting up
+ * properly because if the first extent has only one page, we
+ * will need to put the position on the second extent. Sounds
+ * obvious, but it wasn't!
+ */
+ (*num_loaded)++;
+ if ((*num_loaded) == 2)
+ set_up_start_position();
+ last = this;
+ }
+
+ /*
+ * Shouldn't get empty chains, but it's not impossible. Link them in so
+ * they get freed properly later.
+ */
+ if (!chain->blocks.num_extents)
+ toi_insert_chain_in_prio_list(chain);
+
+ if (!chain->blocks.current_extent) {
+ chain->blocks.current_extent = chain->blocks.first;
+ if (chain->blocks.current_extent)
+ chain->blocks.current_offset =
+ chain->blocks.current_extent->start;
+ }
+ return 0;
+}
+
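+/**
+ * toi_load_extent_chains - read back the chains saved in the image header
+ *
+ * Counterpart to toi_serialise_extent_chains: read the number of chains,
+ * then each chain, then the saved chain numbers for each section.
+ **/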
+int toi_load_extent_chains(void)
+{
+ int result;
+ int to_load;
+ int i;
+ int extents_loaded = 0;
+
+ result = toiActiveAllocator->rw_header_chunk_noreadahead(READ, NULL,
+ (char *) &to_load,
+ sizeof(int));
+ if (result)
+ return result;
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "%d chains to read.", to_load);
+
+ for (i = 0; i < to_load; i++) {
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, " >> Loading chain %d/%d.",
+ i, to_load);
+ result = toi_load_extent_chain(i, &extents_loaded);
+ if (result)
+ return result;
+ }
+
+ /* If we never got to a second extent, we still need to do this. */
+ if (extents_loaded == 1)
+ set_up_start_position();
+
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "Save chain numbers.");
+ result = toiActiveAllocator->rw_header_chunk_noreadahead(READ,
+ &toi_blockwriter_ops,
+ (char *) &toi_writer_posn.saved_chain_number[0],
+ 4 * sizeof(int));
+
+ return result;
+}
+
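+/*
+ * Check whether we have reached the saved position that marks the start
+ * of the next section. Hitting the barrier while writing the header
+ * stream indicates a broken header; when reading, it just ends
+ * readahead for the current stream.
+ */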
+static int toi_end_of_stream(int writing, int section_barrier)
+{
+ struct toi_bdev_info *cur_chain = toi_writer_posn.current_chain;
+ int compare_to = next_section[current_stream];
+ struct toi_bdev_info *compare_chain =
+ toi_writer_posn.saved_chain_ptr[compare_to];
+ int compare_offset = compare_chain ?
+ compare_chain->saved_state[compare_to].offset : 0;
+
+ if (!section_barrier)
+ return 0;
+
+ if (!cur_chain)
+ return 1;
+
+ if (cur_chain == compare_chain &&
+ cur_chain->blocks.current_offset == compare_offset) {
+ if (writing) {
+ if (!current_stream) {
+ debug_broken_header();
+ return 1;
+ }
+ } else {
+ more_readahead = 0;
+ toi_message(TOI_BIO, TOI_VERBOSE, 0,
+ "Reached the end of stream %d "
+ "(not an error).", current_stream);
+ return 1;
+ }
+ }
+
+ return 0;
+}
+
+/**
+ * go_next_page - skip blocks to the start of the next page
+ * @writing: Whether we're reading or writing the image.
+ *
+ * Go forward one page.
+ **/
+int go_next_page(int writing, int section_barrier)
+{
+ struct toi_bdev_info *cur_chain = toi_writer_posn.current_chain;
+ int max = cur_chain ? cur_chain->blocks_per_page : 1;
+
+ /* Go forward a page - or maybe two. Don't stripe the header,
+ * so that bad fragmentation doesn't put the extent data containing
+ * the location of the second page out of the first header page.
+ */
+ if (toi_extent_state_next(max, current_stream)) {
+ /* Don't complain if readahead falls off the end */
+ if (writing && section_barrier) {
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "Extent state eof. "
+ "Expected compression ratio too optimistic?");
+ if (test_action_state(TOI_LOGALL))
+ dump_block_chains();
+ }
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "Ran out of extents to "
+ "read/write. (Not necessarily a fatal error.");
+ return -ENOSPC;
+ }
+
+ return 0;
+}
+
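+/*
+ * Count how many registered chains have the same priority as @this.
+ */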
+int devices_of_same_priority(struct toi_bdev_info *this)
+{
+ struct toi_bdev_info *check = prio_chain_head;
+ int i = 0;
+
+ while (check) {
+ if (check->prio == this->prio)
+ i++;
+ check = check->next;
+ }
+
+ return i;
+}
+
+/**
+ * toi_bio_rw_page - do i/o on the next disk page in the image
+ * @writing: Whether reading or writing.
+ * @page: Page to do i/o on.
+ * @is_readahead: Whether we're doing readahead
+ * @free_group: The group used in allocating the page
+ *
+ * Submit a page for reading or writing, possibly readahead.
+ * Pass the group used in allocating the page as well, as it should
+ * be freed on completion of the bio if we're writing the page.
+ **/
+int toi_bio_rw_page(int writing, struct page *page,
+ int is_readahead, int free_group)
+{
+ int result = toi_end_of_stream(writing, 1);
+ struct toi_bdev_info *dev_info = toi_writer_posn.current_chain;
+
+ if (result) {
+ if (writing)
+ abort_hibernate(TOI_INSUFFICIENT_STORAGE,
+ "Insufficient storage for your image.");
+ else
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "Seeking to "
+ "read/write another page when stream has "
+ "ended.");
+ return -ENOSPC;
+ }
+
+ toi_message(TOI_BIO, TOI_VERBOSE, 0,
+ "%s %lx:%ld",
+ writing ? "Write" : "Read",
+ dev_info->dev_t, dev_info->blocks.current_offset);
+
+ result = toi_do_io(writing, dev_info->bdev,
+ dev_info->blocks.current_offset << dev_info->bmap_shift,
+ page, is_readahead, 0, free_group);
+
+ /* Ignore the result here - will check end of stream if come in again */
+ go_next_page(writing, 1);
+
+ if (result)
+ printk(KERN_ERR "toi_do_io returned %d.\n", result);
+ return result;
+}
+
+dev_t get_header_dev_t(void)
+{
+ return prio_chain_head->dev_t;
+}
+
+struct block_device *get_header_bdev(void)
+{
+ return prio_chain_head->bdev;
+}
+
+unsigned long get_headerblock(void)
+{
+ return prio_chain_head->blocks.first->start <<
+ prio_chain_head->bmap_shift;
+}
+
+int get_main_pool_phys_params(void)
+{
+ struct toi_bdev_info *this = prio_chain_head;
+ int result;
+
+ while (this) {
+ result = this->allocator->bio_allocator_ops->bmap(this);
+ if (result)
+ return result;
+ this = this->next;
+ }
+
+ return 0;
+}
+
+static int apply_header_reservation(void)
+{
+ int i;
+
+ if (!header_pages_reserved) {
+ toi_message(TOI_BIO, TOI_VERBOSE, 0,
+ "No header pages reserved at the moment.");
+ return 0;
+ }
+
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "Applying header reservation.");
+
+ /* Apply header space reservation */
+ toi_extent_state_goto_start();
+
+ for (i = 0; i < header_pages_reserved; i++)
+ if (go_next_page(1, 0))
+ return -ENOSPC;
+
+ /* The end of header pages will be the start of pageset 2 */
+ toi_extent_state_save(2);
+
+ toi_message(TOI_BIO, TOI_VERBOSE, 0,
+ "Finished applying header reservation.");
+ return 0;
+}
+
+static int toi_bio_register_storage(void)
+{
+ int result = 0;
+ struct toi_module_ops *this_module;
+
+ list_for_each_entry(this_module, &toi_modules, module_list) {
+ if (!this_module->enabled ||
+ this_module->type != BIO_ALLOCATOR_MODULE)
+ continue;
+ toi_message(TOI_BIO, TOI_VERBOSE, 0,
+ "Registering storage from %s.",
+ this_module->name);
+ result = this_module->bio_allocator_ops->register_storage();
+ if (result)
+ break;
+ }
+
+ return result;
+}
+
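+/**
+ * toi_bio_allocate_storage - try to allocate storage for the image
+ * @request: Number of pages of image data to be stored.
+ *
+ * On top of @request, extra pages are needed for extent metadata and the
+ * reserved header pages. Allocation is spread as evenly as possible over
+ * chains of equal priority before moving to lower priorities; the blocks
+ * are then bmapped and the header reservation applied.
+ **/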
+int toi_bio_allocate_storage(unsigned long request)
+{
+ struct toi_bdev_info *chain = prio_chain_head;
+ unsigned long to_get = request;
+ unsigned long extra_pages, needed;
+ int no_free = 0;
+
+ if (!chain) {
+ int result = toi_bio_register_storage();
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "toi_bio_allocate_storage: "
+ "Registering storage.");
+ if (result)
+ return 0;
+ chain = prio_chain_head;
+ if (!chain) {
+ printk("TuxOnIce: No storage was registered.\n");
+ return 0;
+ }
+ }
+
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "toi_bio_allocate_storage: "
+ "Request is %lu pages.", request);
+ extra_pages = DIV_ROUND_UP(request * (sizeof(unsigned long)
+ + sizeof(int)), PAGE_SIZE);
+ needed = request + extra_pages + header_pages_reserved;
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "Adding %lu extra pages and %lu "
+ "for header => %lu.",
+ extra_pages, header_pages_reserved, needed);
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "Already allocated %lu pages.",
+ raw_pages_allocd);
+
+ to_get = needed > raw_pages_allocd ? needed - raw_pages_allocd : 0;
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "Need to get %lu pages.", to_get);
+
+ if (!to_get)
+ return apply_header_reservation();
+
+ while (to_get && chain) {
+ int num_group = devices_of_same_priority(chain);
+ int divisor = num_group - no_free;
+ int i;
+ unsigned long portion = DIV_ROUND_UP(to_get, divisor);
+ unsigned long got = 0;
+ unsigned long got_this_round = 0;
+ struct toi_bdev_info *top = chain;
+
+ toi_message(TOI_BIO, TOI_VERBOSE, 0,
+ " Start of loop. To get is %lu. Divisor is %d.",
+ to_get, divisor);
+ no_free = 0;
+
+ /*
+ * We're aiming to spread the allocated storage as evenly
+ * as possible, but we also want to get all the storage we
+ * can off this priority.
+ */
+ for (i = 0; i < num_group; i++) {
+ struct toi_bio_allocator_ops *ops =
+ chain->allocator->bio_allocator_ops;
+ toi_message(TOI_BIO, TOI_VERBOSE, 0,
+ " Asking for %lu pages from chain %p.",
+ portion, chain);
+ got = ops->allocate_storage(chain, portion);
+ toi_message(TOI_BIO, TOI_VERBOSE, 0,
+ " Got %lu pages from allocator %p.",
+ got, chain);
+ if (!got)
+ no_free++;
+ got_this_round += got;
+ chain = chain->next;
+ }
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, " Loop finished. Got a "
+ "total of %lu pages from %d allocators.",
+ got_this_round, divisor - no_free);
+
+ raw_pages_allocd += got_this_round;
+ to_get = needed > raw_pages_allocd ? needed - raw_pages_allocd :
+ 0;
+
+ /*
+ * If we got anything from chains of this priority and we
+ * still have storage to allocate, go over this priority
+ * again.
+ */
+ if (got_this_round && to_get)
+ chain = top;
+ else
+ no_free = 0;
+ }
+
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "Finished allocating. Calling "
+ "get_main_pool_phys_params");
+ /* Now let swap allocator bmap the pages */
+ get_main_pool_phys_params();
+
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "Done. Reserving header.");
+ return apply_header_reservation();
+}
+
+void toi_bio_chains_post_atomic(struct toi_boot_kernel_data *bkd)
+{
+ int i = 0;
+ struct toi_bdev_info *cur_chain = prio_chain_head;
+
+ while (cur_chain) {
+ cur_chain->pages_used = bkd->pages_used[i];
+ cur_chain = cur_chain->next;
+ i++;
+ }
+}
+
+int toi_bio_chains_debug_info(char *buffer, int size)
+{
+ /* Show what we actually used */
+ struct toi_bdev_info *cur_chain = prio_chain_head;
+ int len = 0;
+
+ while (cur_chain) {
+ len += scnprintf(buffer + len, size - len, " Used %lu pages "
+ "from %s.\n", cur_chain->pages_used,
+ cur_chain->name);
+ cur_chain = cur_chain->next;
+ }
+
+ return len;
+}
diff --git a/kernel/power/tuxonice_bio_core.c b/kernel/power/tuxonice_bio_core.c
new file mode 100644
index 0000000..41d3505
--- /dev/null
+++ b/kernel/power/tuxonice_bio_core.c
@@ -0,0 +1,1839 @@
+/*
+ * kernel/power/tuxonice_bio_core.c
+ *
+ * Copyright (C) 2004-2014 Nigel Cunningham (nigel at tuxonice net)
+ *
+ * Distributed under GPLv2.
+ *
+ * This file contains block io functions for TuxOnIce. These are
+ * used by the swapwriter and it is planned that they will also
+ * be used by the NFSwriter.
+ *
+ */
+
+#include <linux/blkdev.h>
+#include <linux/syscalls.h>
+#include <linux/suspend.h>
+#include <linux/ctype.h>
+#include <linux/fs_uuid.h>
+
+#include "tuxonice.h"
+#include "tuxonice_sysfs.h"
+#include "tuxonice_modules.h"
+#include "tuxonice_prepare_image.h"
+#include "tuxonice_bio.h"
+#include "tuxonice_ui.h"
+#include "tuxonice_alloc.h"
+#include "tuxonice_io.h"
+#include "tuxonice_builtin.h"
+#include "tuxonice_bio_internal.h"
+
+#define MEMORY_ONLY 1
+#define THROTTLE_WAIT 2
+
+/* #define MEASURE_MUTEX_CONTENTION */
+#ifndef MEASURE_MUTEX_CONTENTION
+#define my_mutex_lock(index, the_lock) mutex_lock(the_lock)
+#define my_mutex_unlock(index, the_lock) mutex_unlock(the_lock)
+#else
+unsigned long mutex_times[2][2][NR_CPUS];
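+/*
+ * Note: when measuring contention, my_mutex_lock opens a do { } block that
+ * my_mutex_unlock closes with "} while (0)", so the two must always be used
+ * as a matched pair within the same scope.
+ */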
+#define my_mutex_lock(index, the_lock) do { \
+ int have_mutex; \
+ have_mutex = mutex_trylock(the_lock); \
+ if (!have_mutex) { \
+ mutex_lock(the_lock); \
+ mutex_times[index][0][smp_processor_id()]++; \
+ } else { \
+ mutex_times[index][1][smp_processor_id()]++; \
+ }
+
+#define my_mutex_unlock(index, the_lock) \
+ mutex_unlock(the_lock); \
+} while (0)
+#endif
+
+static int page_idx, reset_idx;
+
+static int target_outstanding_io = 1024;
+static int max_outstanding_writes, max_outstanding_reads;
+
+static struct page *bio_queue_head, *bio_queue_tail;
+static atomic_t toi_bio_queue_size;
+static DEFINE_SPINLOCK(bio_queue_lock);
+
+static int free_mem_throttle, throughput_throttle;
+int more_readahead = 1;
+static struct page *readahead_list_head, *readahead_list_tail;
+
+static struct page *waiting_on;
+
+static atomic_t toi_io_in_progress, toi_io_done;
+static DECLARE_WAIT_QUEUE_HEAD(num_in_progress_wait);
+
+int current_stream;
+/* Not static, so that the allocators can set up and complete
+ * writing the header */
+char *toi_writer_buffer;
+int toi_writer_buffer_posn;
+
+static DEFINE_MUTEX(toi_bio_mutex);
+static DEFINE_MUTEX(toi_bio_readahead_mutex);
+
+static struct task_struct *toi_queue_flusher;
+static int toi_bio_queue_flush_pages(int dedicated_thread);
+
+struct toi_module_ops toi_blockwriter_ops;
+
+#define TOTAL_OUTSTANDING_IO (atomic_read(&toi_io_in_progress) + \
+ atomic_read(&toi_bio_queue_size))
+
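+/*
+ * Raw pages allocated from the block allocators so far, and the number of
+ * pages set aside for writing the image header.
+ */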
+unsigned long raw_pages_allocd, header_pages_reserved;
+
+/**
+ * set_free_mem_throttle - set the point where we pause to avoid oom.
+ *
+ * Initially, this value is zero, but when we first fail to allocate memory,
+ * we set it (plus a buffer) and thereafter throttle i/o once that limit is
+ * reached.
+ **/
+static void set_free_mem_throttle(void)
+{
+ int new_throttle = nr_free_buffer_pages() + 256;
+
+ if (new_throttle > free_mem_throttle)
+ free_mem_throttle = new_throttle;
+}
+
+#define NUM_REASONS 7
+static atomic_t reasons[NUM_REASONS];
+static char *reason_name[NUM_REASONS] = {
+ "readahead not ready",
+ "bio allocation",
+ "synchronous I/O",
+ "toi_bio_get_new_page",
+ "memory low",
+ "readahead buffer allocation",
+ "throughput_throttle",
+};
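+/* The index passed to do_bio_wait() selects which of these counters to bump. */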
+
+/* User Specified Parameters. */
+unsigned long resume_firstblock;
+dev_t resume_dev_t;
+struct block_device *resume_block_device;
+static atomic_t resume_bdev_open_count;
+
+struct block_device *header_block_device;
+
+/**
+ * toi_open_bdev: Open a bdev at resume time.
+ *
+ * uuid: The uuid of the filesystem on the device to be opened, if known
+ * (may be NULL). It is used to locate the device even if its dev_t has
+ * changed since the image was written.
+ * default_device: The dev_t to open if no uuid is given or the uuid cannot
+ * be resolved. The user can have resume= pointing at a swap partition/file
+ * that isn't swapon'd when they hibernate, so this may not match any
+ * currently enabled swap device.
+ * display_errs: Whether to display an error message if the open fails.
+ *
+ * We stored a dev_t in the image header. Open the matching device without
+ * requiring /dev/<whatever> in most cases and record the details needed
+ * to close it later and avoid duplicating work.
+ */
+struct block_device *toi_open_bdev(char *uuid, dev_t default_device,
+ int display_errs)
+{
+ struct block_device *bdev;
+ dev_t device = default_device;
+ char buf[32];
+ int retried = 0;
+
+retry:
+ if (uuid) {
+ struct fs_info seek;
+ strncpy((char *) &seek.uuid, uuid, 16);
+ seek.dev_t = 0;
+ seek.last_mount_size = 0;
+ device = blk_lookup_fs_info(&seek);
+ if (!device) {
+ device = default_device;
+ printk(KERN_DEBUG "Unable to resolve uuid. Falling back"
+ " to dev_t.\n");
+ } else
+ printk(KERN_DEBUG "Resolved uuid to device %s.\n",
+ format_dev_t(buf, device));
+ }
+
+ if (!device) {
+ printk(KERN_ERR "TuxOnIce attempting to open a "
+ "blank dev_t!\n");
+ dump_stack();
+ return NULL;
+ }
+ bdev = toi_open_by_devnum(device);
+
+ if (IS_ERR(bdev) || !bdev) {
+ if (!retried) {
+ retried = 1;
+ wait_for_device_probe();
+ goto retry;
+ }
+ if (display_errs)
+ toi_early_boot_message(1, TOI_CONTINUE_REQ,
+ "Failed to get access to block device "
+ "\"%x\" (error %d).\n Maybe you need "
+ "to run mknod and/or lvmsetup in an "
+ "initrd/ramfs?", device, bdev);
+ return ERR_PTR(-EINVAL);
+ }
+ toi_message(TOI_BIO, TOI_VERBOSE, 0,
+ "TuxOnIce got bdev %p for dev_t %x.",
+ bdev, device);
+
+ return bdev;
+}
+
+static void toi_bio_reserve_header_space(unsigned long request)
+{
+ header_pages_reserved = request;
+}
+
+/**
+ * do_bio_wait - wait for some TuxOnIce I/O to complete
+ * @reason: The array index of the reason we're waiting.
+ *
+ * Wait for a particular page of I/O if we're after a particular page.
+ * If we're not after a particular page, wait instead for all in flight
+ * I/O to be completed or for us to have enough free memory to be able
+ * to submit more I/O.
+ *
+ * If we wait, we also update our statistics regarding why we waited.
+ **/
+static void do_bio_wait(int reason)
+{
+ struct page *was_waiting_on = waiting_on;
+
+ /* On SMP, waiting_on can be reset, so we make a copy */
+ if (was_waiting_on) {
+ wait_on_page_locked(was_waiting_on);
+ atomic_inc(&reasons[reason]);
+ } else {
+ atomic_inc(&reasons[reason]);
+
+ wait_event(num_in_progress_wait,
+ !atomic_read(&toi_io_in_progress) ||
+ nr_free_buffer_pages() > free_mem_throttle);
+ }
+}
+
+/**
+ * throttle_if_needed - wait for I/O completion if throttle points are reached
+ * @flags: What to check and how to act.
+ *
+ * Check whether we need to wait for some I/O to complete. We always check
+ * whether we have enough memory available, but may also (depending upon
+ * @reason) check if the throughput throttle limit has been reached.
+ **/
+static int throttle_if_needed(int flags)
+{
+ int free_pages = nr_free_buffer_pages();
+
+ /* Getting low on memory and I/O is in progress? */
+ while (unlikely(free_pages < free_mem_throttle) &&
+ atomic_read(&toi_io_in_progress) &&
+ !test_result_state(TOI_ABORTED)) {
+ if (!(flags & THROTTLE_WAIT))
+ return -ENOMEM;
+ do_bio_wait(4);
+ free_pages = nr_free_buffer_pages();
+ }
+
+ while (!(flags & MEMORY_ONLY) && throughput_throttle &&
+ TOTAL_OUTSTANDING_IO >= throughput_throttle &&
+ !test_result_state(TOI_ABORTED)) {
+ int result = toi_bio_queue_flush_pages(0);
+ if (result)
+ return result;
+ atomic_inc(&reasons[6]);
+ wait_event(num_in_progress_wait,
+ !atomic_read(&toi_io_in_progress) ||
+ TOTAL_OUTSTANDING_IO < throughput_throttle);
+ }
+
+ return 0;
+}
+
+/**
+ * update_throughput_throttle - update the raw throughput throttle
+ * @jif_index: The number of times this function has been called.
+ *
+ * This function is called four times per second by the core, and used to limit
+ * the amount of I/O we submit at once, spreading out our waiting through the
+ * whole job and letting userui get an opportunity to do its work.
+ *
+ * We don't start limiting I/O until 1/4s has passed, so that we get a
+ * decent sample for our initial limit, and we keep updating it because
+ * throughput may vary (on rotating media, for example) with our block number.
+ *
+ * We throttle to 1/10s worth of I/O.
+ **/
+static void update_throughput_throttle(int jif_index)
+{
+ int done = atomic_read(&toi_io_done);
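+ /*
+ * done / jif_index is the average number of pages completed per 1/4s;
+ * multiplying by 4 gives pages per second and dividing by 10 gives a
+ * tenth of a second's worth, hence the factor of 2/5.
+ */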
+ throughput_throttle = done * 2 / 5 / jif_index;
+}
+
+/**
+ * toi_finish_all_io - wait for all outstanding i/o to complete
+ *
+ * Flush any queued but unsubmitted I/O and wait for it all to complete.
+ **/
+static int toi_finish_all_io(void)
+{
+ int result = toi_bio_queue_flush_pages(0);
+ toi_bio_queue_flusher_should_finish = 1;
+ wake_up(&toi_io_queue_flusher);
+ wait_event(num_in_progress_wait, !TOTAL_OUTSTANDING_IO);
+ return result;
+}
+
+/**
+ * toi_end_bio - bio completion function.
+ * @bio: bio that has completed.
+ * @err: Error value. Yes, like end_swap_bio_read, we ignore it.
+ *
+ * Function called by the block driver from interrupt context when I/O is
+ * completed. If we were writing the page, we want to free it and will have
+ * set bio->bi_private to the parameter we should use in telling the page
+ * allocation accounting code what the page was allocated for. If we're
+ * reading the page, it will be in the singly linked list made from
+ * page->private pointers.
+ **/
+static void toi_end_bio(struct bio *bio, int err)
+{
+ struct page *page = bio->bi_io_vec[0].bv_page;
+
+ BUG_ON(!test_bit(BIO_UPTODATE, &bio->bi_flags));
+
+ unlock_page(page);
+ bio_put(bio);
+
+ if (waiting_on == page)
+ waiting_on = NULL;
+
+ put_page(page);
+
+ if (bio->bi_private)
+ toi__free_page((int) ((unsigned long) bio->bi_private) , page);
+
+ bio_put(bio);
+
+ atomic_dec(&toi_io_in_progress);
+ atomic_inc(&toi_io_done);
+
+ wake_up(&num_in_progress_wait);
+}
+
+/**
+ * submit - submit BIO request
+ * @writing: READ or WRITE.
+ * @dev: The block device we're using.
+ * @first_block: The first sector we're using.
+ * @page: The page being used for I/O.
+ * @free_group: If writing, the group that was used in allocating the page
+ * and which will be used in freeing the page from the completion
+ * routine.
+ *
+ * Based on Patrick Mochell's pmdisk code from long ago: "Straight from the
+ * textbook - allocate and initialize the bio. If we're writing, make sure
+ * the page is marked as dirty. Then submit it and carry on."
+ *
+ * If we're just testing the speed of our own code, we fake having done all
+ * the hard work and call toi_end_bio immediately.
+ **/
+static int submit(int writing, struct block_device *dev, sector_t first_block,
+ struct page *page, int free_group)
+{
+ struct bio *bio = NULL;
+ int cur_outstanding_io, result;
+
+ /*
+ * Shouldn't throttle if reading - can deadlock in the single
+ * threaded case as pages are only freed when we use the
+ * readahead.
+ */
+ if (writing) {
+ result = throttle_if_needed(MEMORY_ONLY | THROTTLE_WAIT);
+ if (result)
+ return result;
+ }
+
+ while (!bio) {
+ bio = bio_alloc(TOI_ATOMIC_GFP, 1);
+ if (!bio) {
+ set_free_mem_throttle();
+ do_bio_wait(1);
+ }
+ }
+
+ bio->bi_bdev = dev;
+ bio->bi_iter.bi_sector = first_block;
+ bio->bi_private = (void *) ((unsigned long) free_group);
+ bio->bi_end_io = toi_end_bio;
+ bio->bi_flags |= (1 << BIO_TOI);
+
+ if (bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) {
+ printk(KERN_DEBUG "ERROR: adding page to bio at %lld\n",
+ (unsigned long long) first_block);
+ bio_put(bio);
+ return -EFAULT;
+ }
+
+ bio_get(bio);
+
+ cur_outstanding_io = atomic_add_return(1, &toi_io_in_progress);
+ if (writing) {
+ if (cur_outstanding_io > max_outstanding_writes)
+ max_outstanding_writes = cur_outstanding_io;
+ } else {
+ if (cur_outstanding_io > max_outstanding_reads)
+ max_outstanding_reads = cur_outstanding_io;
+ }
+
+
+ /* Still read the header! */
+ if (unlikely(test_action_state(TOI_TEST_BIO) && writing)) {
+ /* Fake having done the hard work */
+ set_bit(BIO_UPTODATE, &bio->bi_flags);
+ toi_end_bio(bio, 0);
+ } else
+ submit_bio(writing | REQ_SYNC, bio);
+
+ return 0;
+}
+
+/**
+ * toi_do_io: Prepare to do some i/o on a page and submit or batch it.
+ *
+ * @writing: Whether reading or writing.
+ * @bdev: The block device which we're using.
+ * @block0: The first sector we're reading or writing.
+ * @page: The page on which I/O is being done.
+ * @is_readahead: Whether this page is being added to the readahead list.
+ * @syncio: Whether the i/o is being done synchronously.
+ * @free_group: If writing, the group to free the page to when I/O completes.
+ *
+ * Prepare and start a read or write operation.
+ *
+ * Note that we always work with our own page. If writing, we might be given a
+ * compression buffer that will immediately be used to start compressing the
+ * next page. For reading, we do readahead and therefore don't know the final
+ * address where the data needs to go.
+ **/
+int toi_do_io(int writing, struct block_device *bdev, long block0,
+ struct page *page, int is_readahead, int syncio, int free_group)
+{
+ page->private = 0;
+
+ /* Do here so we don't race against toi_bio_get_next_page_read */
+ lock_page(page);
+
+ if (is_readahead) {
+ if (readahead_list_head)
+ readahead_list_tail->private = (unsigned long) page;
+ else
+ readahead_list_head = page;
+
+ readahead_list_tail = page;
+ }
+
+ /* Done before submitting to avoid races. */
+ if (syncio)
+ waiting_on = page;
+
+ /* Submit the page */
+ get_page(page);
+
+ if (submit(writing, bdev, block0, page, free_group))
+ return -EFAULT;
+
+ if (syncio)
+ do_bio_wait(2);
+
+ return 0;
+}
+
+/**
+ * toi_bdev_page_io - simpler interface to do directly i/o on a single page
+ * @writing: Whether reading or writing.
+ * @bdev: Block device on which we're operating.
+ * @pos: Sector at which page to read or write starts.
+ * @page: Page to be read/written.
+ *
+ * A simple interface to submit a page of I/O and wait for its completion.
+ * The caller must free the page used.
+ **/
+static int toi_bdev_page_io(int writing, struct block_device *bdev,
+ long pos, struct page *page)
+{
+ return toi_do_io(writing, bdev, pos, page, 0, 1, 0);
+}
+
+/**
+ * toi_bio_memory_needed - report the amount of memory needed for block i/o
+ *
+ * We want to have at least enough memory so as to have target_outstanding_io
+ * or more transactions on the fly at once. If we can do more, fine.
+ **/
+static int toi_bio_memory_needed(void)
+{
+ return target_outstanding_io * (PAGE_SIZE + sizeof(struct request) +
+ sizeof(struct bio));
+}
+
+/**
+ * toi_bio_print_debug_stats - put out debugging info in the buffer provided
+ * @buffer: A buffer of size @size into which text should be placed.
+ * @size: The size of @buffer.
+ *
+ * Fill a buffer with debugging info. This is used for both our debug_info sysfs
+ * entry and for recording the same info in dmesg.
+ **/
+static int toi_bio_print_debug_stats(char *buffer, int size)
+{
+ int len = 0;
+
+ if (toiActiveAllocator != &toi_blockwriter_ops) {
+ len = scnprintf(buffer, size,
+ "- Block I/O inactive.\n");
+ return len;
+ }
+
+ len = scnprintf(buffer, size, "- Block I/O active.\n");
+
+ len += toi_bio_chains_debug_info(buffer + len, size - len);
+
+ len += scnprintf(buffer + len, size - len,
+ "- Max outstanding reads %d. Max writes %d.\n",
+ max_outstanding_reads, max_outstanding_writes);
+
+ len += scnprintf(buffer + len, size - len,
+ " Memory_needed: %d x (%lu + %u + %u) = %d bytes.\n",
+ target_outstanding_io,
+ PAGE_SIZE, (unsigned int) sizeof(struct request),
+ (unsigned int) sizeof(struct bio), toi_bio_memory_needed());
+
+#ifdef MEASURE_MUTEX_CONTENTION
+ {
+ int i;
+
+ len += scnprintf(buffer + len, size - len,
+ " Mutex contention while reading:\n Contended Free\n");
+
+ for_each_online_cpu(i)
+ len += scnprintf(buffer + len, size - len,
+ " %9lu %9lu\n",
+ mutex_times[0][0][i], mutex_times[0][1][i]);
+
+ len += scnprintf(buffer + len, size - len,
+ " Mutex contention while writing:\n Contended Free\n");
+
+ for_each_online_cpu(i)
+ len += scnprintf(buffer + len, size - len,
+ " %9lu %9lu\n",
+ mutex_times[1][0][i], mutex_times[1][1][i]);
+
+ }
+#endif
+
+ return len + scnprintf(buffer + len, size - len,
+ " Free mem throttle point reached %d.\n", free_mem_throttle);
+}
+
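+/*
+ * Running totals of header bytes written and of bytes not attributed to any
+ * module, reported by debug_broken_header() if the header overflows.
+ */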
+static int total_header_bytes;
+static int unowned;
+
+void debug_broken_header(void)
+{
+ printk(KERN_DEBUG "Image header too big for size allocated!\n");
+ print_toi_header_storage_for_modules();
+ printk(KERN_DEBUG "Page flags : %d.\n", toi_pageflags_space_needed());
+ printk(KERN_DEBUG "toi_header : %zu.\n", sizeof(struct toi_header));
+ printk(KERN_DEBUG "Total unowned : %d.\n", unowned);
+ printk(KERN_DEBUG "Total used : %d (%ld pages).\n", total_header_bytes,
+ DIV_ROUND_UP(total_header_bytes, PAGE_SIZE));
+ printk(KERN_DEBUG "Space needed now : %ld.\n",
+ get_header_storage_needed());
+ dump_block_chains();
+ abort_hibernate(TOI_HEADER_TOO_BIG, "Header reservation too small.");
+}
+
+/**
+ * toi_rw_init - prepare to read or write a stream in the image
+ * @writing: Whether reading or writing.
+ * @stream_number: Section of the image being processed.
+ *
+ * Prepare to read or write a section ('stream') in the image.
+ **/
+static int toi_rw_init(int writing, int stream_number)
+{
+ if (stream_number)
+ toi_extent_state_restore(stream_number);
+ else
+ toi_extent_state_goto_start();
+
+ if (writing) {
+ reset_idx = 0;
+ if (!current_stream)
+ page_idx = 0;
+ } else {
+ reset_idx = 1;
+ }
+
+ atomic_set(&toi_io_done, 0);
+ if (!toi_writer_buffer)
+ toi_writer_buffer = (char *) toi_get_zeroed_page(11,
+ TOI_ATOMIC_GFP);
+ toi_writer_buffer_posn = writing ? 0 : PAGE_SIZE;
+
+ current_stream = stream_number;
+
+ more_readahead = 1;
+
+ return toi_writer_buffer ? 0 : -ENOMEM;
+}
+
+/**
+ * toi_bio_queue_write - queue a page for writing
+ * @full_buffer: Pointer to a page to be queued
+ *
+ * Add a page to the queue to be submitted. If we're the queue flusher,
+ * we'll do this once we've dropped toi_bio_mutex, so other threads can
+ * continue to submit I/O while we're on the slow path doing the actual
+ * submission.
+ **/
+static void toi_bio_queue_write(char **full_buffer)
+{
+ struct page *page = virt_to_page(*full_buffer);
+ unsigned long flags;
+
+ *full_buffer = NULL;
+ page->private = 0;
+
+ spin_lock_irqsave(&bio_queue_lock, flags);
+ if (!bio_queue_head)
+ bio_queue_head = page;
+ else
+ bio_queue_tail->private = (unsigned long) page;
+
+ bio_queue_tail = page;
+ atomic_inc(&toi_bio_queue_size);
+
+ spin_unlock_irqrestore(&bio_queue_lock, flags);
+ wake_up(&toi_io_queue_flusher);
+}
+
+/**
+ * toi_rw_cleanup - Cleanup after i/o.
+ * @writing: Whether we were reading or writing.
+ *
+ * Flush all I/O and clean everything up after reading or writing a
+ * section of the image.
+ **/
+static int toi_rw_cleanup(int writing)
+{
+ int i, result = 0;
+
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "toi_rw_cleanup.");
+ if (writing) {
+ if (toi_writer_buffer_posn && !test_result_state(TOI_ABORTED))
+ toi_bio_queue_write(&toi_writer_buffer);
+
+ while (bio_queue_head && !result)
+ result = toi_bio_queue_flush_pages(0);
+
+ if (result)
+ return result;
+
+ if (current_stream == 2)
+ toi_extent_state_save(1);
+ else if (current_stream == 1)
+ toi_extent_state_save(3);
+ }
+
+ result = toi_finish_all_io();
+
+ while (readahead_list_head) {
+ void *next = (void *) readahead_list_head->private;
+ toi__free_page(12, readahead_list_head);
+ readahead_list_head = next;
+ }
+
+ readahead_list_tail = NULL;
+
+ if (!current_stream)
+ return result;
+
+ for (i = 0; i < NUM_REASONS; i++) {
+ if (!atomic_read(&reasons[i]))
+ continue;
+ printk(KERN_DEBUG "Waited for i/o due to %s %d times.\n",
+ reason_name[i], atomic_read(&reasons[i]));
+ atomic_set(&reasons[i], 0);
+ }
+
+ current_stream = 0;
+ return result;
+}
+
+/**
+ * toi_start_one_readahead - start one page of readahead
+ * @dedicated_thread: Is this a thread dedicated to doing readahead?
+ *
+ * Start one new page of readahead. If this is being called by a thread
+ * whose only job is to submit readahead, don't quit because we failed
+ * to allocate a page.
+ **/
+static int toi_start_one_readahead(int dedicated_thread)
+{
+ char *buffer = NULL;
+ int oom = 0, result;
+
+ result = throttle_if_needed(dedicated_thread ? THROTTLE_WAIT : 0);
+ if (result)
+ return result;
+
+ mutex_lock(&toi_bio_readahead_mutex);
+
+ while (!buffer) {
+ buffer = (char *) toi_get_zeroed_page(12,
+ TOI_ATOMIC_GFP);
+ if (!buffer) {
+ if (oom && !dedicated_thread) {
+ mutex_unlock(&toi_bio_readahead_mutex);
+ return -ENOMEM;
+ }
+
+ oom = 1;
+ set_free_mem_throttle();
+ do_bio_wait(5);
+ }
+ }
+
+ result = toi_bio_rw_page(READ, virt_to_page(buffer), 1, 0);
+ if (result == -ENOSPC)
+ toi__free_page(12, virt_to_page(buffer));
+ mutex_unlock(&toi_bio_readahead_mutex);
+ if (result) {
+ if (result == -ENOSPC)
+ toi_message(TOI_BIO, TOI_VERBOSE, 0,
+ "Last readahead page submitted.");
+ else
+ printk(KERN_DEBUG "toi_bio_rw_page returned %d.\n",
+ result);
+ }
+ return result;
+}
+
+/**
+ * toi_start_new_readahead - start new readahead
+ * @dedicated_thread: Are we dedicated to this task?
+ *
+ * Start readahead of image pages.
+ *
+ * We can be called as a thread dedicated to this task (may be helpful on
+ * systems with lots of CPUs), in which case we don't exit until there's no
+ * more readahead.
+ *
+ * If this is not called by a dedicated thread, we top up our queue until
+ * there's no more readahead to submit, we've submitted the number given
+ * in target_outstanding_io, or the number in progress exceeds the target
+ * outstanding I/O value.
+ *
+ * No mutex needed because this is only ever called by the first cpu.
+ **/
+static int toi_start_new_readahead(int dedicated_thread)
+{
+ int last_result, num_submitted = 0;
+
+ /* Start a new readahead? */
+ if (!more_readahead)
+ return 0;
+
+ do {
+ last_result = toi_start_one_readahead(dedicated_thread);
+
+ if (last_result) {
+ if (last_result == -ENOMEM || last_result == -ENOSPC)
+ return 0;
+
+ printk(KERN_DEBUG
+ "Begin read chunk returned %d.\n",
+ last_result);
+ } else
+ num_submitted++;
+
+ } while (more_readahead && !last_result &&
+ (dedicated_thread ||
+ (num_submitted < target_outstanding_io &&
+ atomic_read(&toi_io_in_progress) < target_outstanding_io)));
+
+ return last_result;
+}
+
+/**
+ * bio_io_flusher - start the dedicated I/O flushing routine
+ * @writing: Whether we're writing the image.
+ **/
+static int bio_io_flusher(int writing)
+{
+
+ if (writing)
+ return toi_bio_queue_flush_pages(1);
+ else
+ return toi_start_new_readahead(1);
+}
+
+/**
+ * toi_bio_get_next_page_read - read a disk page, perhaps with readahead
+ * @no_readahead: Whether we can use readahead
+ *
+ * Read a page from disk, submitting readahead and cleaning up finished i/o
+ * while we wait for the page we're after.
+ **/
+static int toi_bio_get_next_page_read(int no_readahead)
+{
+ char *virt;
+ struct page *old_readahead_list_head;
+
+ /*
+ * When reading the second page of the header, we have to
+ * delay submitting the read until after we've gotten the
+ * extents out of the first page.
+ */
+ if (unlikely(no_readahead && toi_start_one_readahead(0))) {
+ printk(KERN_EMERG "No readahead and toi_start_one_readahead "
+ "returned non-zero.\n");
+ return -EIO;
+ }
+
+ if (unlikely(!readahead_list_head)) {
+ /*
+ * If the last page finishes exactly on the page
+ * boundary, we will be called one extra time and
+ * have no data to return. In this case, we should
+ * not BUG(), like we used to!
+ */
+ if (!more_readahead) {
+ printk(KERN_EMERG "No more readahead.\n");
+ return -ENOSPC;
+ }
+ if (unlikely(toi_start_one_readahead(0))) {
+ printk(KERN_EMERG "No readahead and "
+ "toi_start_one_readahead returned non-zero.\n");
+ return -EIO;
+ }
+ }
+
+ if (PageLocked(readahead_list_head)) {
+ waiting_on = readahead_list_head;
+ do_bio_wait(0);
+ }
+
+ virt = page_address(readahead_list_head);
+ memcpy(toi_writer_buffer, virt, PAGE_SIZE);
+
+ mutex_lock(&toi_bio_readahead_mutex);
+ old_readahead_list_head = readahead_list_head;
+ readahead_list_head = (struct page *) readahead_list_head->private;
+ mutex_unlock(&toi_bio_readahead_mutex);
+ toi__free_page(12, old_readahead_list_head);
+ return 0;
+}
+
+/**
+ * toi_bio_queue_flush_pages - flush the queue of pages queued for writing
+ * @dedicated_thread: Whether we're a dedicated thread
+ *
+ * Flush the queue of pages ready to be written to disk.
+ *
+ * If we're a dedicated thread, stay in here until told to leave,
+ * sleeping in wait_event.
+ *
+ * The first thread is normally the only one to come in here. Another
+ * thread can enter this routine too, though, via throttle_if_needed.
+ * Since that's the case, we must be careful to only have one thread
+ * doing this work at a time. Otherwise we have a race and could save
+ * pages out of order.
+ *
+ * If an error occurs, free all remaining pages without submitting them
+ * for I/O.
+ **/
+
+int toi_bio_queue_flush_pages(int dedicated_thread)
+{
+ unsigned long flags;
+ int result = 0;
+ static DEFINE_MUTEX(busy);
+
+ if (!mutex_trylock(&busy))
+ return 0;
+
+top:
+ spin_lock_irqsave(&bio_queue_lock, flags);
+ while (bio_queue_head) {
+ struct page *page = bio_queue_head;
+ bio_queue_head = (struct page *) page->private;
+ if (bio_queue_tail == page)
+ bio_queue_tail = NULL;
+ atomic_dec(&toi_bio_queue_size);
+ spin_unlock_irqrestore(&bio_queue_lock, flags);
+
+ /* Don't generate more error messages if already had one */
+ if (!result)
+ result = toi_bio_rw_page(WRITE, page, 0, 11);
+ /*
+ * If writing the page failed, don't drop out.
+ * Flush the rest of the queue too.
+ */
+ if (result)
+ toi__free_page(11 , page);
+ spin_lock_irqsave(&bio_queue_lock, flags);
+ }
+ spin_unlock_irqrestore(&bio_queue_lock, flags);
+
+ if (dedicated_thread) {
+ wait_event(toi_io_queue_flusher, bio_queue_head ||
+ toi_bio_queue_flusher_should_finish);
+ if (likely(!toi_bio_queue_flusher_should_finish))
+ goto top;
+ toi_bio_queue_flusher_should_finish = 0;
+ }
+
+ mutex_unlock(&busy);
+ return result;
+}
+
+/**
+ * toi_bio_get_new_page - get a new page for I/O
+ * @full_buffer: Pointer to a page to allocate.
+ **/
+static int toi_bio_get_new_page(char **full_buffer)
+{
+ int result = throttle_if_needed(THROTTLE_WAIT);
+ if (result)
+ return result;
+
+ while (!*full_buffer) {
+ *full_buffer = (char *) toi_get_zeroed_page(11, TOI_ATOMIC_GFP);
+ if (!*full_buffer) {
+ set_free_mem_throttle();
+ do_bio_wait(3);
+ }
+ }
+
+ return 0;
+}
+
+/**
+ * toi_rw_buffer - combine smaller buffers into PAGE_SIZE I/O
+ * @writing: Bool - whether writing (or reading).
+ * @buffer: The start of the buffer to write or fill.
+ * @buffer_size: The size of the buffer to write or fill.
+ * @no_readahead: Don't try to start readahead (when getting extents).
+ **/
+static int toi_rw_buffer(int writing, char *buffer, int buffer_size,
+ int no_readahead)
+{
+ int bytes_left = buffer_size, result = 0;
+
+ while (bytes_left) {
+ char *source_start = buffer + buffer_size - bytes_left;
+ char *dest_start = toi_writer_buffer + toi_writer_buffer_posn;
+ int capacity = PAGE_SIZE - toi_writer_buffer_posn;
+ char *to = writing ? dest_start : source_start;
+ char *from = writing ? source_start : dest_start;
+
+ if (bytes_left <= capacity) {
+ memcpy(to, from, bytes_left);
+ toi_writer_buffer_posn += bytes_left;
+ return 0;
+ }
+
+ /* Complete this page and start a new one */
+ memcpy(to, from, capacity);
+ bytes_left -= capacity;
+
+ if (!writing) {
+ /*
+ * Perform actual I/O:
+ * read readahead_list_head into toi_writer_buffer
+ */
+ int result = toi_bio_get_next_page_read(no_readahead);
+ if (result) {
+ printk("toi_bio_get_next_page_read "
+ "returned %d.\n", result);
+ return result;
+ }
+ } else {
+ toi_bio_queue_write(&toi_writer_buffer);
+ result = toi_bio_get_new_page(&toi_writer_buffer);
+ if (result) {
+ printk(KERN_ERR "toi_bio_get_new_page returned "
+ "%d.\n", result);
+ return result;
+ }
+ }
+
+ toi_writer_buffer_posn = 0;
+ toi_cond_pause(0, NULL);
+ }
+
+ return 0;
+}
+
+/**
+ * toi_bio_read_page - read a page of the image
+ * @pfn: The pfn where the data belongs.
+ * @buffer_page: The page containing the (possibly compressed) data.
+ * @buf_size: The number of bytes on @buffer_page used (PAGE_SIZE).
+ *
+ * Read a (possibly compressed) page from the image, into buffer_page,
+ * returning its pfn and the buffer size.
+ **/
+static int toi_bio_read_page(unsigned long *pfn, int buf_type,
+ void *buffer_page, unsigned int *buf_size)
+{
+ int result = 0;
+ int this_idx;
+ char *buffer_virt = TOI_MAP(buf_type, buffer_page);
+
+ /*
+ * Only call start_new_readahead if we don't have a dedicated thread
+ * and we're the queue flusher.
+ */
+ if (current == toi_queue_flusher && more_readahead &&
+ !test_action_state(TOI_NO_READAHEAD)) {
+ int result2 = toi_start_new_readahead(0);
+ if (result2) {
+ printk(KERN_DEBUG "Queue flusher and "
+ "toi_start_one_readahead returned non-zero.\n");
+ result = -EIO;
+ goto out;
+ }
+ }
+
+ my_mutex_lock(0, &toi_bio_mutex);
+
+ /*
+ * Structure in the image:
+ * [destination pfn|page size|page data]
+ * buf_size is PAGE_SIZE
+ * We can validly find there's nothing to read in a multithreaded
+ * situation.
+ */
+ if (toi_rw_buffer(READ, (char *) &this_idx, sizeof(int), 0) ||
+ toi_rw_buffer(READ, (char *) pfn, sizeof(unsigned long), 0) ||
+ toi_rw_buffer(READ, (char *) buf_size, sizeof(int), 0) ||
+ toi_rw_buffer(READ, buffer_virt, *buf_size, 0)) {
+ result = -ENODATA;
+ goto out_unlock;
+ }
+
+ if (reset_idx) {
+ page_idx = this_idx;
+ reset_idx = 0;
+ } else {
+ page_idx++;
+ if (!this_idx)
+ result = -ENODATA;
+ else if (page_idx != this_idx)
+ printk(KERN_ERR "Got page index %d, expected %d.\n",
+ this_idx, page_idx);
+ }
+
+out_unlock:
+ my_mutex_unlock(0, &toi_bio_mutex);
+out:
+ TOI_UNMAP(buf_type, buffer_page);
+ return result;
+}
+
+/**
+ * toi_bio_write_page - write a page of the image
+ * @pfn: The pfn where the data belongs.
+ * @buffer_page: The page containing the (possibly compressed) data.
+ * @buf_size: The number of bytes on @buffer_page used.
+ *
+ * Write a (possibly compressed) page to the image from the buffer, together
+ * with its index and buffer size.
+ **/
+static int toi_bio_write_page(unsigned long pfn, int buf_type,
+ void *buffer_page, unsigned int buf_size)
+{
+ char *buffer_virt;
+ int result = 0, result2 = 0;
+
+ if (unlikely(test_action_state(TOI_TEST_FILTER_SPEED)))
+ return 0;
+
+ my_mutex_lock(1, &toi_bio_mutex);
+
+ if (test_result_state(TOI_ABORTED)) {
+ my_mutex_unlock(1, &toi_bio_mutex);
+ return 0;
+ }
+
+ buffer_virt = TOI_MAP(buf_type, buffer_page);
+ page_idx++;
+
+ /*
+ * Structure in the image:
+ * [destination pfn|page size|page data]
+ * buf_size is PAGE_SIZE
+ */
+ if (toi_rw_buffer(WRITE, (char *) &page_idx, sizeof(int), 0) ||
+ toi_rw_buffer(WRITE, (char *) &pfn, sizeof(unsigned long), 0) ||
+ toi_rw_buffer(WRITE, (char *) &buf_size, sizeof(int), 0) ||
+ toi_rw_buffer(WRITE, buffer_virt, buf_size, 0)) {
+ printk(KERN_DEBUG "toi_rw_buffer returned non-zero to "
+ "toi_bio_write_page.\n");
+ result = -EIO;
+ }
+
+ TOI_UNMAP(buf_type, buffer_page);
+ my_mutex_unlock(1, &toi_bio_mutex);
+
+ if (current == toi_queue_flusher)
+ result2 = toi_bio_queue_flush_pages(0);
+
+ return result ? result : result2;
+}
+
+/**
+ * _toi_rw_header_chunk - read or write a portion of the image header
+ * @writing: Whether reading or writing.
+ * @owner: The module for which we're writing.
+ * Used for confirming that modules
+ * don't use more header space than they asked for.
+ * @buffer: Address of the data to write.
+ * @buffer_size: Size of the data buffer.
+ * @no_readahead: Don't try to start readahead (when getting extents).
+ *
+ * Perform PAGE_SIZE I/O. Start readahead if needed.
+ **/
+static int _toi_rw_header_chunk(int writing, struct toi_module_ops *owner,
+ char *buffer, int buffer_size, int no_readahead)
+{
+ int result = 0;
+
+ if (owner) {
+ owner->header_used += buffer_size;
+ toi_message(TOI_HEADER, TOI_LOW, 1,
+ "Header: %s : %d bytes (%d/%d) from offset %d.",
+ owner->name,
+ buffer_size, owner->header_used,
+ owner->header_requested,
+ toi_writer_buffer_posn);
+ if (owner->header_used > owner->header_requested && writing) {
+ printk(KERN_EMERG "TuxOnIce module %s is using more "
+ "header space (%u) than it requested (%u).\n",
+ owner->name,
+ owner->header_used,
+ owner->header_requested);
+ return buffer_size;
+ }
+ } else {
+ unowned += buffer_size;
+ toi_message(TOI_HEADER, TOI_LOW, 1,
+ "Header: (No owner): %d bytes (%d total so far) from "
+ "offset %d.", buffer_size, unowned,
+ toi_writer_buffer_posn);
+ }
+
+ if (!writing && !no_readahead && more_readahead) {
+ result = toi_start_new_readahead(0);
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "Start new readahead "
+ "returned %d.", result);
+ }
+
+ if (!result) {
+ result = toi_rw_buffer(writing, buffer, buffer_size,
+ no_readahead);
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "rw_buffer returned "
+ "%d.", result);
+ }
+
+ total_header_bytes += buffer_size;
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "_toi_rw_header_chunk returning "
+ "%d.", result);
+ return result;
+}
+
+static int toi_rw_header_chunk(int writing, struct toi_module_ops *owner,
+ char *buffer, int size)
+{
+ return _toi_rw_header_chunk(writing, owner, buffer, size, 1);
+}
+
+static int toi_rw_header_chunk_noreadahead(int writing,
+ struct toi_module_ops *owner, char *buffer, int size)
+{
+ return _toi_rw_header_chunk(writing, owner, buffer, size, 1);
+}
+
+/**
+ * toi_bio_storage_needed - get the amount of storage needed for my fns
+ **/
+static int toi_bio_storage_needed(void)
+{
+ return sizeof(int) + PAGE_SIZE + toi_bio_devinfo_storage_needed();
+}
+
+/**
+ * toi_bio_save_config_info - save block I/O config to image header
+ * @buf: PAGE_SIZE'd buffer into which data should be saved.
+ **/
+static int toi_bio_save_config_info(char *buf)
+{
+ int *ints = (int *) buf;
+ ints[0] = target_outstanding_io;
+ return sizeof(int);
+}
+
+/**
+ * toi_bio_load_config_info - restore block I/O config
+ * @buf: Data to be reloaded.
+ * @size: Size of the buffer saved.
+ **/
+static void toi_bio_load_config_info(char *buf, int size)
+{
+ int *ints = (int *) buf;
+ target_outstanding_io = ints[0];
+}
+
+void close_resume_dev_t(int force)
+{
+ if (!resume_block_device)
+ return;
+
+ if (force)
+ atomic_set(&resume_bdev_open_count, 0);
+ else
+ atomic_dec(&resume_bdev_open_count);
+
+ if (!atomic_read(&resume_bdev_open_count)) {
+ toi_close_bdev(resume_block_device);
+ resume_block_device = NULL;
+ }
+}
+
+int open_resume_dev_t(int force, int quiet)
+{
+ if (force) {
+ close_resume_dev_t(1);
+ atomic_set(&resume_bdev_open_count, 1);
+ } else
+ atomic_inc(&resume_bdev_open_count);
+
+ if (resume_block_device)
+ return 0;
+
+ resume_block_device = toi_open_bdev(NULL, resume_dev_t, 0);
+ if (IS_ERR(resume_block_device)) {
+ if (!quiet)
+ toi_early_boot_message(1, TOI_CONTINUE_REQ,
+ "Failed to open device %x, where"
+ " the header should be found.",
+ resume_dev_t);
+ resume_block_device = NULL;
+ atomic_set(&resume_bdev_open_count, 0);
+ return 1;
+ }
+
+ return 0;
+}
+
+/**
+ * toi_bio_initialise - initialise bio code at start of some action
+ * @starting_cycle: Whether starting a hibernation cycle, or just reading or
+ * writing a sysfs value.
+ **/
+static int toi_bio_initialise(int starting_cycle)
+{
+ int result;
+
+ if (!starting_cycle || !resume_dev_t)
+ return 0;
+
+ max_outstanding_writes = 0;
+ max_outstanding_reads = 0;
+ current_stream = 0;
+ toi_queue_flusher = current;
+#ifdef MEASURE_MUTEX_CONTENTION
+ {
+ int i, j, k;
+
+ for (i = 0; i < 2; i++)
+ for (j = 0; j < 2; j++)
+ for_each_online_cpu(k)
+ mutex_times[i][j][k] = 0;
+ }
+#endif
+ result = open_resume_dev_t(0, 1);
+
+ if (result)
+ return result;
+
+ return get_signature_page();
+}
+
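+/*
+ * Each data page written also has its index and pfn metadata stored in the
+ * stream, so part of the raw storage goes to that metadata. raw_to_real()
+ * returns how many data pages fit in 'raw' raw pages; it is roughly the
+ * inverse of the extra_pages calculation in toi_bio_allocate_storage().
+ */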
+static unsigned long raw_to_real(unsigned long raw)
+{
+ unsigned long extra;
+
+ extra = (raw * (sizeof(unsigned long) + sizeof(int)) +
+ (PAGE_SIZE + sizeof(unsigned long) + sizeof(int) + 1)) /
+ (PAGE_SIZE + sizeof(unsigned long) + sizeof(int));
+
+ return raw > extra ? raw - extra : 0;
+}
+
+static unsigned long toi_bio_storage_available(void)
+{
+ unsigned long sum = 0;
+ struct toi_module_ops *this_module;
+
+ list_for_each_entry(this_module, &toi_modules, module_list) {
+ if (!this_module->enabled ||
+ this_module->type != BIO_ALLOCATOR_MODULE)
+ continue;
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "Seeking storage "
+ "available from %s.", this_module->name);
+ sum += this_module->bio_allocator_ops->storage_available();
+ }
+
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "Total storage available is %lu "
+ "pages (%d header pages).", sum, header_pages_reserved);
+
+ return sum > header_pages_reserved ?
+ raw_to_real(sum - header_pages_reserved) : 0;
+
+}
+
+static unsigned long toi_bio_storage_allocated(void)
+{
+ return raw_pages_allocd > header_pages_reserved ?
+ raw_to_real(raw_pages_allocd - header_pages_reserved) : 0;
+}
+
+/*
+ * If we have read part of the image, we might have filled memory with
+ * data that should be zeroed out.
+ */
+static void toi_bio_noresume_reset(void)
+{
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "toi_bio_noresume_reset.");
+ toi_rw_cleanup(READ);
+ free_all_bdev_info();
+}
+
+/**
+ * toi_bio_cleanup - cleanup after some action
+ * @finishing_cycle: Whether completing a cycle.
+ **/
+static void toi_bio_cleanup(int finishing_cycle)
+{
+ if (!finishing_cycle)
+ return;
+
+ if (toi_writer_buffer) {
+ toi_free_page(11, (unsigned long) toi_writer_buffer);
+ toi_writer_buffer = NULL;
+ }
+
+ forget_signature_page();
+
+ if (header_block_device && toi_sig_data &&
+ toi_sig_data->header_dev_t != resume_dev_t)
+ toi_close_bdev(header_block_device);
+
+ header_block_device = NULL;
+
+ close_resume_dev_t(0);
+}
+
+static int toi_bio_write_header_init(void)
+{
+ int result;
+
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "toi_bio_write_header_init");
+ toi_rw_init(WRITE, 0);
+ toi_writer_buffer_posn = 0;
+
+ /* Info needed to bootstrap goes at the start of the header.
+ * First we save the positions and devinfo, including the number
+ * of header pages. Then we save the structs containing data needed
+ * for reading the header pages back.
+ * Note that even if header pages take more than one page, when we
+ * read back the info, we will have restored the location of the
+ * next header page by the time we go to use it.
+ */
+
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "serialise extent chains.");
+ result = toi_serialise_extent_chains();
+
+ if (result)
+ return result;
+
+ /*
+ * Signature page hasn't been modified at this point. Write it in
+ * the header so we can restore it later.
+ */
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "serialise signature page.");
+ return toi_rw_header_chunk_noreadahead(WRITE, &toi_blockwriter_ops,
+ (char *) toi_cur_sig_page,
+ PAGE_SIZE);
+}
+
+static int toi_bio_write_header_cleanup(void)
+{
+ int result = 0;
+
+ if (toi_writer_buffer_posn)
+ toi_bio_queue_write(&toi_writer_buffer);
+
+ result = toi_finish_all_io();
+
+ unowned = 0;
+ total_header_bytes = 0;
+
+ /* Set the signature to say we have an image */
+ if (!result)
+ result = toi_bio_mark_have_image();
+
+ return result;
+}
+
+/*
+ * toi_bio_read_header_init()
+ *
+ * Description:
+ * 1. Attempt to read the device specified with resume=.
+ * 2. Check the contents of the swap header for our signature.
+ * 3. Warn, ignore, reset and/or continue as appropriate.
+ * 4. If continuing, read the toi_swap configuration section
+ * of the header and set up block device info so we can read
+ * the rest of the header & image.
+ *
+ * Returns:
+ * May not return if user choose to reboot at a warning.
+ * -EINVAL if cannot resume at this time. Booting should continue
+ * normally.
+ */
+
+static int toi_bio_read_header_init(void)
+{
+ int result = 0;
+ char buf[32];
+
+ toi_writer_buffer_posn = 0;
+
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "toi_bio_read_header_init");
+
+ if (!toi_sig_data) {
+ printk(KERN_INFO "toi_bio_read_header_init called when we "
+ "haven't verified there is an image!\n");
+ return -EINVAL;
+ }
+
+ /*
+ * If the header is not on the resume_swap_dev_t, get the resume device
+ * first.
+ */
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "Header dev_t is %lx.",
+ toi_sig_data->header_dev_t);
+ if (toi_sig_data->have_uuid) {
+ struct fs_info seek;
+ dev_t device;
+
+ strncpy((char *) seek.uuid, toi_sig_data->header_uuid, 16);
+ seek.dev_t = toi_sig_data->header_dev_t;
+ seek.last_mount_size = 0;
+ device = blk_lookup_fs_info(&seek);
+ if (device) {
+ printk("Using dev_t %s, returned by blk_lookup_fs_info.\n",
+ format_dev_t(buf, device));
+ toi_sig_data->header_dev_t = device;
+ }
+ }
+ if (toi_sig_data->header_dev_t != resume_dev_t) {
+ header_block_device = toi_open_bdev(NULL,
+ toi_sig_data->header_dev_t, 1);
+
+ if (IS_ERR(header_block_device))
+ return PTR_ERR(header_block_device);
+ } else
+ header_block_device = resume_block_device;
+
+ if (!toi_writer_buffer)
+ toi_writer_buffer = (char *) toi_get_zeroed_page(11,
+ TOI_ATOMIC_GFP);
+ more_readahead = 1;
+
+ /*
+ * Read toi_swap configuration.
+ * Headerblock size taken into account already.
+ */
+ result = toi_bio_ops.bdev_page_io(READ, header_block_device,
+ toi_sig_data->first_header_block,
+ virt_to_page((unsigned long) toi_writer_buffer));
+ if (result)
+ return result;
+
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "load extent chains.");
+ result = toi_load_extent_chains();
+
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "load original signature page.");
+ toi_orig_sig_page = (char *) toi_get_zeroed_page(38, TOI_ATOMIC_GFP);
+ if (!toi_orig_sig_page) {
+ printk(KERN_ERR "Failed to allocate memory for the current"
+ " image signature.\n");
+ return -ENOMEM;
+ }
+
+ return toi_rw_header_chunk_noreadahead(READ, &toi_blockwriter_ops,
+ (char *) toi_orig_sig_page,
+ PAGE_SIZE);
+}
+
+static int toi_bio_read_header_cleanup(void)
+{
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "toi_bio_read_header_cleanup.");
+ return toi_rw_cleanup(READ);
+}
+
+/* Works only for digits and letters, but small and fast */
+#define TOLOWER(x) ((x) | 0x20)
+
+/*
+ * The UUID must be 32 hex digits long. It may contain dashes, but nothing
+ * else.
+ */
+char *uuid_from_commandline(char *commandline)
+{
+ int low = 0;
+ char *result = NULL, *output, *ptr;
+
+ if (strncmp(commandline, "UUID=", 5))
+ return NULL;
+
+ result = kzalloc(17, GFP_KERNEL);
+ if (!result) {
+ printk("Failed to kzalloc UUID text memory.\n");
+ return NULL;
+ }
+
+ ptr = commandline + 5;
+ output = result;
+
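+ /* Pack two hex digits into each output byte: high nibble first, then low. */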
+ while (*ptr && (output - result) < 16) {
+ if (isxdigit(*ptr)) {
+ int value = isdigit(*ptr) ? *ptr - '0' :
+ TOLOWER(*ptr) - 'a' + 10;
+ if (low) {
+ *output += value;
+ output++;
+ } else {
+ *output = value << 4;
+ }
+ low = !low;
+ } else if (*ptr != '-')
+ break;
+ ptr++;
+ }
+
+ if ((output - result) < 16 || *ptr) {
+ printk(KERN_DEBUG "Found resume=UUID=, but the value looks "
+ "invalid.\n");
+ kfree(result);
+ result = NULL;
+ }
+
+ return result;
+}
+
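+/*
+ * Run 'command'; if it did not yield a resume_dev_t and we have not yet
+ * waited for device probing to finish, wait and then try it once more.
+ */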
+#define retry_if_fails(command) \
+do { \
+ command; \
+ if (!resume_dev_t && !waited_for_device_probe) { \
+ wait_for_device_probe(); \
+ command; \
+ waited_for_device_probe = 1; \
+ } \
+} while(0)
+
+/**
+ * try_to_open_resume_device: Try to parse and open resume=
+ *
+ * Any "swap:" has been stripped away and we just have the path to deal with.
+ * We attempt to do name_to_dev_t, open and stat the file. Having opened the
+ * file, get the struct block_device * to match.
+ */
+static int try_to_open_resume_device(char *commandline, int quiet)
+{
+ struct kstat stat;
+ int error = 0;
+ char *uuid = uuid_from_commandline(commandline);
+ int waited_for_device_probe = 0;
+
+ resume_dev_t = MKDEV(0, 0);
+
+ if (!strlen(commandline))
+ retry_if_fails(toi_bio_scan_for_image(quiet));
+
+ if (uuid) {
+ struct fs_info seek;
+ strncpy((char *) &seek.uuid, uuid, 16);
+ seek.dev_t = resume_dev_t;
+ seek.last_mount_size = 0;
+ retry_if_fails(resume_dev_t = blk_lookup_fs_info(&seek));
+ kfree(uuid);
+ }
+
+ if (!resume_dev_t)
+ retry_if_fails(resume_dev_t = name_to_dev_t(commandline));
+
+ if (!resume_dev_t) {
+ struct file *file = filp_open(commandline,
+ O_RDONLY|O_LARGEFILE, 0);
+
+ if (!IS_ERR(file) && file) {
+ vfs_getattr(&file->f_path, &stat);
+ filp_close(file, NULL);
+ } else
+ error = vfs_stat(commandline, &stat);
+ if (!error)
+ resume_dev_t = stat.rdev;
+ }
+
+ if (!resume_dev_t) {
+ if (quiet)
+ return 1;
+
+ if (test_toi_state(TOI_TRYING_TO_RESUME))
+ toi_early_boot_message(1, toi_translate_err_default,
+ "Failed to translate \"%s\" into a device id.\n",
+ commandline);
+ else
+ printk("TuxOnIce: Can't translate \"%s\" into a device "
+ "id yet.\n", commandline);
+ return 1;
+ }
+
+ return open_resume_dev_t(1, quiet);
+}
+
+/*
+ * Parse Image Location
+ *
+ * Attempt to parse a resume= parameter.
+ * Swap Writer accepts:
+ * resume=[swap:|file:]DEVNAME[:FIRSTBLOCK][@BLOCKSIZE]
+ *
+ * Where:
+ * DEVNAME is convertible to a dev_t by name_to_dev_t
+ * FIRSTBLOCK is the location of the first block in the swap file
+ * (specifying for a swap partition is nonsensical but not prohibited).
+ * Data is validated by attempting to read a swap header from the
+ * location given. Failure will result in toi_swap refusing to
+ * save an image, and a reboot with correct parameters will be
+ * necessary.
+ */
+static int toi_bio_parse_sig_location(char *commandline,
+ int only_allocator, int quiet)
+{
+ char *thischar, *devstart, *colon = NULL;
+ int signature_found, result = -EINVAL, temp_result = 0;
+
+ if (strncmp(commandline, "swap:", 5) &&
+ strncmp(commandline, "file:", 5)) {
+ /*
+ * Failing swap:, we'll take a simple resume=/dev/hda2, or a
+ * blank value (scan) but fall through to other allocators
+ * if /dev/ or UUID= isn't matched.
+ */
+ if (strncmp(commandline, "/dev/", 5) &&
+ strncmp(commandline, "UUID=", 5) &&
+ strlen(commandline))
+ return 1;
+ } else
+ commandline += 5;
+
+ devstart = commandline;
+ thischar = commandline;
+ while ((*thischar != ':') && (*thischar != '@') &&
+ ((thischar - commandline) < 250) && (*thischar))
+ thischar++;
+
+ if (*thischar == ':') {
+ colon = thischar;
+ *colon = 0;
+ thischar++;
+ }
+
+ while ((thischar - commandline) < 250 && *thischar)
+ thischar++;
+
+ if (colon) {
+ unsigned long block;
+ temp_result = strict_strtoul(colon + 1, 0, &block);
+ if (!temp_result)
+ resume_firstblock = (int) block;
+ } else
+ resume_firstblock = 0;
+
+ clear_toi_state(TOI_CAN_HIBERNATE);
+ clear_toi_state(TOI_CAN_RESUME);
+
+ if (!temp_result)
+ temp_result = try_to_open_resume_device(devstart, quiet);
+
+ if (colon)
+ *colon = ':';
+
+ /* No error if we only scanned */
+ if (temp_result)
+ return strlen(commandline) ? -EINVAL : 1;
+
+ signature_found = toi_bio_image_exists(quiet);
+
+ if (signature_found != -1) {
+ result = 0;
+ /*
+ * TODO: If only file storage, CAN_HIBERNATE should only be
+ * set if file allocator's target is valid.
+ */
+ set_toi_state(TOI_CAN_HIBERNATE);
+ set_toi_state(TOI_CAN_RESUME);
+ } else
+ if (!quiet)
+ printk(KERN_ERR "TuxOnIce: Block I/O: No "
+ "signature found at %s.\n", devstart);
+
+ return result;
+}
+
+static void toi_bio_release_storage(void)
+{
+ header_pages_reserved = 0;
+ raw_pages_allocd = 0;
+
+ free_all_bdev_info();
+}
+
+/* toi_bio_remove_image
+ *
+ * Restore the original signature and release our storage, invalidating the
+ * existing image.
+ */
+static int toi_bio_remove_image(void)
+{
+ int result;
+
+ toi_message(TOI_BIO, TOI_VERBOSE, 0, "toi_bio_remove_image.");
+
+ result = toi_bio_restore_original_signature();
+
+ /*
+ * We don't do a sanity check here: we want to restore the swap
+ * whatever version of the kernel made the hibernate image.
+ *
+ * We need to write swap, but swap may not be enabled, so
+ * we write the device directly.
+ *
+ * If we don't have a current_signature_page, we didn't
+ * read an image header, so don't change anything.
+ */
+
+ toi_bio_release_storage();
+
+ return result;
+}
+
+struct toi_bio_ops toi_bio_ops = {
+ .bdev_page_io = toi_bdev_page_io,
+ .register_storage = toi_register_storage_chain,
+ .free_storage = toi_bio_release_storage,
+};
+EXPORT_SYMBOL_GPL(toi_bio_ops);
+
+static struct toi_sysfs_data sysfs_params[] = {
+ SYSFS_INT("target_outstanding_io", SYSFS_RW, &target_outstanding_io,
+ 0, 16384, 0, NULL),
+};
+
+struct toi_module_ops toi_blockwriter_ops = {
+ .type = WRITER_MODULE,
+ .name = "block i/o",
+ .directory = "block_io",
+ .module = THIS_MODULE,
+ .memory_needed = toi_bio_memory_needed,
+ .print_debug_info = toi_bio_print_debug_stats,
+ .storage_needed = toi_bio_storage_needed,
+ .save_config_info = toi_bio_save_config_info,
+ .load_config_info = toi_bio_load_config_info,
+ .initialise = toi_bio_initialise,
+ .cleanup = toi_bio_cleanup,
+ .post_atomic_restore = toi_bio_chains_post_atomic,
+
+ .rw_init = toi_rw_init,
+ .rw_cleanup = toi_rw_cleanup,
+ .read_page = toi_bio_read_page,
+ .write_page = toi_bio_write_page,
+ .rw_header_chunk = toi_rw_header_chunk,
+ .rw_header_chunk_noreadahead = toi_rw_header_chunk_noreadahead,
+ .io_flusher = bio_io_flusher,
+ .update_throughput_throttle = update_throughput_throttle,
+ .finish_all_io = toi_finish_all_io,
+
+ .noresume_reset = toi_bio_noresume_reset,
+ .storage_available = toi_bio_storage_available,
+ .storage_allocated = toi_bio_storage_allocated,
+ .reserve_header_space = toi_bio_reserve_header_space,
+ .allocate_storage = toi_bio_allocate_storage,
+ .image_exists = toi_bio_image_exists,
+ .mark_resume_attempted = toi_bio_mark_resume_attempted,
+ .write_header_init = toi_bio_write_header_init,
+ .write_header_cleanup = toi_bio_write_header_cleanup,
+ .read_header_init = toi_bio_read_header_init,
+ .read_header_cleanup = toi_bio_read_header_cleanup,
+ .get_header_version = toi_bio_get_header_version,
+ .remove_image = toi_bio_remove_image,
+ .parse_sig_location = toi_bio_parse_sig_location,
+
+ .sysfs_data = sysfs_params,
+ .num_sysfs_entries = sizeof(sysfs_params) /
+ sizeof(struct toi_sysfs_data),
+};
+
+/**
+ * toi_block_io_load - load time routine for block I/O module
+ *
+ * Register block i/o ops and sysfs entries.
+ **/
+static __init int toi_block_io_load(void)
+{
+ return toi_register_module(&toi_blockwriter_ops);
+}
+
+#ifdef MODULE
+static __exit void toi_block_io_unload(void)
+{
+ toi_unregister_module(&toi_blockwriter_ops);
+}
+
+module_init(toi_block_io_load);
+module_exit(toi_block_io_unload);
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Nigel Cunningham");
+MODULE_DESCRIPTION("TuxOnIce block io functions");
+#else
+late_initcall(toi_block_io_load);
+#endif
diff --git a/kernel/power/tuxonice_bio_internal.h b/kernel/power/tuxonice_bio_internal.h
new file mode 100644
index 0000000..b09e176
--- /dev/null
+++ b/kernel/power/tuxonice_bio_internal.h
@@ -0,0 +1,86 @@
+/*
+ * kernel/power/tuxonice_bio_internal.h
+ *
+ * Copyright (C) 2009-2014 Nigel Cunningham (nigel at tuxonice net)
+ *
+ * Distributed under GPLv2.
+ *
+ * This file contains declarations for functions exported from
+ * tuxonice_bio.c, which contains low level io functions.
+ */
+
+/* Extent chains */
+void toi_extent_state_goto_start(void);
+void toi_extent_state_save(int slot);
+int go_next_page(int writing, int section_barrier);
+void toi_extent_state_restore(int slot);
+void free_all_bdev_info(void);
+int devices_of_same_priority(struct toi_bdev_info *this);
+int toi_register_storage_chain(struct toi_bdev_info *new);
+int toi_serialise_extent_chains(void);
+int toi_load_extent_chains(void);
+int toi_bio_rw_page(int writing, struct page *page, int is_readahead,
+ int free_group);
+int toi_bio_restore_original_signature(void);
+int toi_bio_devinfo_storage_needed(void);
+unsigned long get_headerblock(void);
+dev_t get_header_dev_t(void);
+struct block_device *get_header_bdev(void);
+int toi_bio_allocate_storage(unsigned long request);
+
+/* Signature functions */
+#define HaveImage "HaveImage"
+#define NoImage "TuxOnIce"
+#define sig_size (sizeof(HaveImage))
+
+struct sig_data {
+ char sig[sig_size];
+ int have_image;
+ int resumed_before;
+
+ char have_uuid;
+ char header_uuid[17];
+ dev_t header_dev_t;
+ unsigned long first_header_block;
+
+ /* Repeat the signature to be sure we have a header version */
+ char sig2[sig_size];
+ int header_version;
+};
+
+void forget_signature_page(void);
+int toi_check_for_signature(void);
+int toi_bio_image_exists(int quiet);
+int get_signature_page(void);
+int toi_bio_mark_resume_attempted(int);
+extern char *toi_cur_sig_page;
+extern char *toi_orig_sig_page;
+int toi_bio_mark_have_image(void);
+extern struct sig_data *toi_sig_data;
+extern dev_t resume_dev_t;
+extern struct block_device *resume_block_device;
+extern struct block_device *header_block_device;
+extern unsigned long resume_firstblock;
+
+struct block_device *open_bdev(dev_t device, int display_errs);
+extern int current_stream;
+extern int more_readahead;
+int toi_do_io(int writing, struct block_device *bdev, long block0,
+ struct page *page, int is_readahead, int syncio, int free_group);
+int get_main_pool_phys_params(void);
+
+void toi_close_bdev(struct block_device *bdev);
+struct block_device *toi_open_bdev(char *uuid, dev_t default_device,
+ int display_errs);
+
+extern struct toi_module_ops toi_blockwriter_ops;
+void dump_block_chains(void);
+void debug_broken_header(void);
+extern unsigned long raw_pages_allocd, header_pages_reserved;
+int toi_bio_chains_debug_info(char *buffer, int size);
+void toi_bio_chains_post_atomic(struct toi_boot_kernel_data *bkd);
+int toi_bio_scan_for_image(int quiet);
+int toi_bio_get_header_version(void);
+
+void close_resume_dev_t(int force);
+int open_resume_dev_t(int force, int quiet);
diff --git a/kernel/power/tuxonice_bio_signature.c b/kernel/power/tuxonice_bio_signature.c
new file mode 100644
index 0000000..9985385
--- /dev/null
+++ b/kernel/power/tuxonice_bio_signature.c
@@ -0,0 +1,403 @@
+/*
+ * kernel/power/tuxonice_bio_signature.c
+ *
+ * Copyright (C) 2004-2014 Nigel Cunningham (nigel at tuxonice net)
+ *
+ * Distributed under GPLv2.
+ *
+ */
+
+#include <linux/fs_uuid.h>
+
+#include "tuxonice.h"
+#include "tuxonice_sysfs.h"
+#include "tuxonice_modules.h"
+#include "tuxonice_prepare_image.h"
+#include "tuxonice_bio.h"
+#include "tuxonice_ui.h"
+#include "tuxonice_alloc.h"
+#include "tuxonice_io.h"
+#include "tuxonice_builtin.h"
+#include "tuxonice_bio_internal.h"
+
+struct sig_data *toi_sig_data;
+
+/* Structures describing the swap header page */
+
+struct old_sig_data {
+ dev_t device;
+ unsigned long sector;
+ int resume_attempted;
+ int orig_sig_type;
+};
+
+union diskpage {
+ union swap_header swh; /* swh.magic is the only member used */
+ struct sig_data sig_data;
+ struct old_sig_data old_sig_data;
+};
+
+union p_diskpage {
+ union diskpage *pointer;
+ char *ptr;
+ unsigned long address;
+};
+
+char *toi_cur_sig_page;
+char *toi_orig_sig_page;
+int have_image;
+int have_old_image;
+
+int get_signature_page(void)
+{
+ if (!toi_cur_sig_page) {
+ toi_message(TOI_IO, TOI_VERBOSE, 0,
+ "Allocating current signature page.");
+ toi_cur_sig_page = (char *) toi_get_zeroed_page(38,
+ TOI_ATOMIC_GFP);
+ if (!toi_cur_sig_page) {
+ printk(KERN_ERR "Failed to allocate memory for the "
+ "current image signature.\n");
+ return -ENOMEM;
+ }
+
+ toi_sig_data = (struct sig_data *) toi_cur_sig_page;
+ }
+
+ toi_message(TOI_IO, TOI_VERBOSE, 0, "Reading signature from dev %lx,"
+ " sector %d.",
+ resume_block_device->bd_dev, resume_firstblock);
+
+ return toi_bio_ops.bdev_page_io(READ, resume_block_device,
+ resume_firstblock, virt_to_page(toi_cur_sig_page));
+}
+
+void forget_signature_page(void)
+{
+ if (toi_cur_sig_page) {
+ toi_sig_data = NULL;
+ toi_message(TOI_IO, TOI_VERBOSE, 0, "Freeing toi_cur_sig_page"
+ " (%p).", toi_cur_sig_page);
+ toi_free_page(38, (unsigned long) toi_cur_sig_page);
+ toi_cur_sig_page = NULL;
+ }
+
+ if (toi_orig_sig_page) {
+ toi_message(TOI_IO, TOI_VERBOSE, 0, "Freeing toi_orig_sig_page"
+ " (%p).", toi_orig_sig_page);
+ toi_free_page(38, (unsigned long) toi_orig_sig_page);
+ toi_orig_sig_page = NULL;
+ }
+}
+
+/*
+ * We need to ensure we use the signature page that's currently on disk,
+ * so as to not remove the image header. Post-atomic-restore, the orig sig
+ * page will be empty, so we can use that as our method of knowing that we
+ * need to load the on-disk signature and not use the non-image sig in
+ * memory. (We're going to power down after writing the change, so it's safe.)
+ */
+int toi_bio_mark_resume_attempted(int flag)
+{
+ toi_message(TOI_IO, TOI_VERBOSE, 0, "Make resume attempted = %d.",
+ flag);
+ if (!toi_orig_sig_page) {
+ forget_signature_page();
+ get_signature_page();
+ }
+ toi_sig_data->resumed_before = flag;
+ return toi_bio_ops.bdev_page_io(WRITE, resume_block_device,
+ resume_firstblock, virt_to_page(toi_cur_sig_page));
+}
+
+int toi_bio_mark_have_image(void)
+{
+ int result = 0;
+ char buf[32];
+ struct fs_info *fs_info;
+
+ toi_message(TOI_IO, TOI_VERBOSE, 0, "Recording that an image exists.");
+ memcpy(toi_sig_data->sig, tuxonice_signature,
+ sizeof(tuxonice_signature));
+ toi_sig_data->have_image = 1;
+ toi_sig_data->resumed_before = 0;
+ toi_sig_data->header_dev_t = get_header_dev_t();
+ toi_sig_data->have_uuid = 0;
+
+ fs_info = fs_info_from_block_dev(get_header_bdev());
+ if (fs_info && !IS_ERR(fs_info)) {
+ memcpy(toi_sig_data->header_uuid, &fs_info->uuid, 16);
+ free_fs_info(fs_info);
+ } else
+ result = (int) PTR_ERR(fs_info);
+
+ if (!result) {
+ toi_message(TOI_IO, TOI_VERBOSE, 0, "Got uuid for dev_t %s.",
+ format_dev_t(buf, get_header_dev_t()));
+ toi_sig_data->have_uuid = 1;
+ } else
+ toi_message(TOI_IO, TOI_VERBOSE, 0, "Could not get uuid for "
+ "dev_t %s.",
+ format_dev_t(buf, get_header_dev_t()));
+
+ toi_sig_data->first_header_block = get_headerblock();
+ have_image = 1;
+ toi_message(TOI_IO, TOI_VERBOSE, 0, "header dev_t is %x. First block "
+ "is %d.", toi_sig_data->header_dev_t,
+ toi_sig_data->first_header_block);
+
+ memcpy(toi_sig_data->sig2, tuxonice_signature,
+ sizeof(tuxonice_signature));
+ toi_sig_data->header_version = TOI_HEADER_VERSION;
+
+ return toi_bio_ops.bdev_page_io(WRITE, resume_block_device,
+ resume_firstblock, virt_to_page(toi_cur_sig_page));
+}
+
+int remove_old_signature(void)
+{
+ union p_diskpage swap_header_page = (union p_diskpage) toi_cur_sig_page;
+ char *orig_sig;
+ char *header_start = (char *) toi_get_zeroed_page(38, TOI_ATOMIC_GFP);
+ int result;
+ struct block_device *header_bdev;
+ struct old_sig_data *old_sig_data =
+ &swap_header_page.pointer->old_sig_data;
+
+ header_bdev = toi_open_bdev(NULL, old_sig_data->device, 1);
+ result = toi_bio_ops.bdev_page_io(READ, header_bdev,
+ old_sig_data->sector, virt_to_page(header_start));
+
+ if (result)
+ goto out;
+
+ /*
+ * TODO: Get the original contents of the first bytes of the swap
+ * header page.
+ */
+ if (!old_sig_data->orig_sig_type)
+ orig_sig = "SWAP-SPACE";
+ else
+ orig_sig = "SWAPSPACE2";
+
+ memcpy(swap_header_page.pointer->swh.magic.magic, orig_sig, 10);
+ memcpy(swap_header_page.ptr, header_start, 10);
+
+ result = toi_bio_ops.bdev_page_io(WRITE, resume_block_device,
+ resume_firstblock, virt_to_page(swap_header_page.ptr));
+
+out:
+ toi_close_bdev(header_bdev);
+ have_old_image = 0;
+ toi_free_page(38, (unsigned long) header_start);
+ return result;
+}
+
+/*
+ * toi_bio_restore_original_signature - restore the original signature
+ *
+ * At boot time (aborting pre atomic-restore), toi_orig_sig_page gets used.
+ * It will have the original signature page contents, stored in the image
+ * header. Post atomic-restore, we use toi_cur_sig_page, which will contain
+ * the contents that were loaded when we started the cycle.
+ */
+int toi_bio_restore_original_signature(void)
+{
+ char *use = toi_orig_sig_page ? toi_orig_sig_page : toi_cur_sig_page;
+
+ if (have_old_image)
+ return remove_old_signature();
+
+ if (!use) {
+ printk("toi_bio_restore_original_signature: No signature "
+ "page loaded.\n");
+ return 0;
+ }
+
+ toi_message(TOI_IO, TOI_VERBOSE, 0, "Recording that no image exists.");
+ have_image = 0;
+ toi_sig_data->have_image = 0;
+ return toi_bio_ops.bdev_page_io(WRITE, resume_block_device,
+ resume_firstblock, virt_to_page(use));
+}
+
+/*
+ * toi_check_for_signature - See whether we have an image.
+ *
+ * Returns 0 if there is no image, 1 if there is a TuxOnIce image, 2 for a
+ * swsusp/uswsusp image, 3 for an old TuxOnIce signature and -1 if
+ * indeterminate.
+ */
+int toi_check_for_signature(void)
+{
+ union p_diskpage swap_header_page;
+ int type;
+ const char *normal_sigs[] = {"SWAP-SPACE", "SWAPSPACE2" };
+ const char *swsusp_sigs[] = {"S1SUSP", "S2SUSP", "S1SUSPEND" };
+ char *swap_header;
+
+ if (!toi_cur_sig_page) {
+ int result = get_signature_page();
+
+ if (result)
+ return result;
+ }
+
+ /*
+ * Start by looking for the binary header.
+ */
+ if (!memcmp(tuxonice_signature, toi_cur_sig_page,
+ sizeof(tuxonice_signature))) {
+ have_image = toi_sig_data->have_image;
+ toi_message(TOI_IO, TOI_VERBOSE, 0, "Have binary signature. "
+ "Have image is %d.", have_image);
+ if (have_image)
+ toi_message(TOI_IO, TOI_VERBOSE, 0, "header dev_t is "
+ "%x. First block is %d.",
+ toi_sig_data->header_dev_t,
+ toi_sig_data->first_header_block);
+ return toi_sig_data->have_image;
+ }
+
+ /*
+ * Failing that, try old file allocator headers.
+ */
+
+ if (!memcmp(HaveImage, toi_cur_sig_page, strlen(HaveImage))) {
+ have_image = 1;
+ return 1;
+ }
+
+ have_image = 0;
+
+ if (!memcmp(NoImage, toi_cur_sig_page, strlen(NoImage)))
+ return 0;
+
+ /*
+ * Nope? How about swap?
+ */
+ swap_header_page = (union p_diskpage) toi_cur_sig_page;
+ swap_header = swap_header_page.pointer->swh.magic.magic;
+
+ /* Normal swapspace? */
+ for (type = 0; type < 2; type++)
+ if (!memcmp(normal_sigs[type], swap_header,
+ strlen(normal_sigs[type])))
+ return 0;
+
+ /* Swsusp or uswsusp? */
+ for (type = 0; type < 3; type++)
+ if (!memcmp(swsusp_sigs[type], swap_header,
+ strlen(swsusp_sigs[type])))
+ return 2;
+
+ /* Old TuxOnIce version? */
+ if (!memcmp(tuxonice_signature, swap_header,
+ sizeof(tuxonice_signature) - 1)) {
+ toi_message(TOI_IO, TOI_VERBOSE, 0, "Found old TuxOnIce "
+ "signature.");
+ have_old_image = 1;
+ return 3;
+ }
+
+ return -1;
+}
+
+/*
+ * toi_bio_image_exists
+ *
+ * Returns -1 if we don't know, 0 if there is no image, 1 if a TuxOnIce image
+ * was found, and other values for foreign or old images or errors (see
+ * toi_check_for_signature).
+ */
+int toi_bio_image_exists(int quiet)
+{
+ int result;
+ char *msg = NULL;
+
+ toi_message(TOI_IO, TOI_VERBOSE, 0, "toi_bio_image_exists.");
+
+ if (!resume_dev_t) {
+ if (!quiet)
+ printk(KERN_INFO "Not even trying to read header "
+ "because resume_dev_t is not set.\n");
+ return -1;
+ }
+
+ if (open_resume_dev_t(0, quiet))
+ return -1;
+
+ result = toi_check_for_signature();
+
+ clear_toi_state(TOI_RESUMED_BEFORE);
+ if (toi_sig_data->resumed_before)
+ set_toi_state(TOI_RESUMED_BEFORE);
+
+ if (quiet || result == -ENOMEM)
+ return result;
+
+ if (result == -1)
+ msg = "TuxOnIce: Unable to find a signature."
+ " Could you have moved a swap file?\n";
+ else if (!result)
+ msg = "TuxOnIce: No image found.\n";
+ else if (result == 1)
+ msg = "TuxOnIce: Image found.\n";
+ else if (result == 2)
+ msg = "TuxOnIce: uswsusp or swsusp image found.\n";
+ else if (result == 3)
+ msg = "TuxOnIce: Old implementation's signature found.\n";
+
+ printk(KERN_INFO "%s", msg);
+
+ return result;
+}
+
+int toi_bio_scan_for_image(int quiet)
+{
+ struct block_device *bdev;
+ char default_name[255] = "";
+
+ if (!quiet)
+ printk(KERN_DEBUG "Scanning swap devices for TuxOnIce "
+ "signature...\n");
+ for (bdev = next_bdev_of_type(NULL, "swap"); bdev;
+ bdev = next_bdev_of_type(bdev, "swap")) {
+ int result;
+ char name[255] = "";
+ sprintf(name, "%u:%u", MAJOR(bdev->bd_dev),
+ MINOR(bdev->bd_dev));
+ if (!quiet)
+ printk(KERN_DEBUG "- Trying %s.\n", name);
+ resume_block_device = bdev;
+ resume_dev_t = bdev->bd_dev;
+
+ result = toi_check_for_signature();
+
+ resume_block_device = NULL;
+ resume_dev_t = MKDEV(0, 0);
+
+ if (!default_name[0])
+ strcpy(default_name, name);
+
+ if (result == 1) {
+ /* Got one! */
+ strcpy(resume_file, name);
+ next_bdev_of_type(bdev, NULL);
+ if (!quiet)
+ printk(KERN_DEBUG " ==> Image found on %s.\n",
+ resume_file);
+ return 1;
+ }
+ forget_signature_page();
+ }
+
+ if (!quiet)
+ printk(KERN_DEBUG "TuxOnIce scan: No image found.\n");
+ strcpy(resume_file, default_name);
+ return 0;
+}
+
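+/*
+ * If the second copy of the signature isn't present, header_version can't be
+ * trusted, so report version 0.
+ */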
+int toi_bio_get_header_version(void)
+{
+ return (memcmp(toi_sig_data->sig2, tuxonice_signature,
+ sizeof(tuxonice_signature))) ?
+ 0 : toi_sig_data->header_version;
+}
diff --git a/kernel/power/tuxonice_builtin.c b/kernel/power/tuxonice_builtin.c
new file mode 100644
index 0000000..a565bf6
--- /dev/null
+++ b/kernel/power/tuxonice_builtin.c
@@ -0,0 +1,445 @@
+/*
+ * Copyright (C) 2004-2014 Nigel Cunningham (nigel at tuxonice net)
+ *
+ * This file is released under the GPLv2.
+ */
+#include <linux/resume-trace.h>
+#include <linux/kernel.h>
+#include <linux/swap.h>
+#include <linux/syscalls.h>
+#include <linux/bio.h>
+#include <linux/root_dev.h>
+#include <linux/freezer.h>
+#include <linux/reboot.h>
+#include <linux/writeback.h>
+#include <linux/tty.h>
+#include <linux/crypto.h>
+#include <linux/cpu.h>
+#include <linux/ctype.h>
+#include "tuxonice_io.h"
+#include "tuxonice.h"
+#include "tuxonice_extent.h"
+#include "tuxonice_netlink.h"
+#include "tuxonice_prepare_image.h"
+#include "tuxonice_ui.h"
+#include "tuxonice_sysfs.h"
+#include "tuxonice_pagedir.h"
+#include "tuxonice_modules.h"
+#include "tuxonice_builtin.h"
+#include "tuxonice_power_off.h"
+#include "tuxonice_alloc.h"
+
+unsigned long toi_bootflags_mask;
+EXPORT_SYMBOL_GPL(toi_bootflags_mask);
+
+/*
+ * Highmem related functions (x86 only).
+ */
+
+#ifdef CONFIG_HIGHMEM
+
+/**
+ * copyback_high: Restore highmem pages.
+ *
+ * Highmem data and pbe lists are/can be stored in highmem.
+ * The format is slightly different to the lowmem pbe lists
+ * used for the assembly code: the last pbe in each page is
+ * a struct page * instead of struct pbe *, pointing to the
+ * next page where pbes are stored (or NULL if happens to be
+ * the end of the list). Since we don't want to generate
+ * unnecessary deltas against swsusp code, we use a cast
+ * instead of a union.
+ **/
+
+static void copyback_high(void)
+{
+ struct page *pbe_page = (struct page *) restore_highmem_pblist;
+ struct pbe *this_pbe, *first_pbe;
+ unsigned long *origpage, *copypage;
+ int pbe_index = 1;
+
+ if (!pbe_page)
+ return;
+
+ this_pbe = (struct pbe *) kmap_atomic(pbe_page);
+ first_pbe = this_pbe;
+
+ while (this_pbe) {
+ int loop = (PAGE_SIZE / sizeof(unsigned long)) - 1;
+
+ origpage = kmap_atomic(pfn_to_page((unsigned long) this_pbe->orig_address));
+ copypage = kmap_atomic((struct page *) this_pbe->address);
+
+ while (loop >= 0) {
+ *(origpage + loop) = *(copypage + loop);
+ loop--;
+ }
+
+ kunmap_atomic(origpage);
+ kunmap_atomic(copypage);
+
+ if (!this_pbe->next)
+ break;
+
+ if (pbe_index < PBES_PER_PAGE) {
+ this_pbe++;
+ pbe_index++;
+ } else {
+ pbe_page = (struct page *) this_pbe->next;
+ kunmap_atomic(first_pbe);
+ if (!pbe_page)
+ return;
+ this_pbe = (struct pbe *) kmap_atomic(pbe_page);
+ first_pbe = this_pbe;
+ pbe_index = 1;
+ }
+ }
+ kunmap_atomic(first_pbe);
+}
+
+#else /* CONFIG_HIGHMEM */
+static void copyback_high(void) { }
+#endif
+
+char toi_wait_for_keypress_dev_console(int timeout)
+{
+ int fd, this_timeout = 255;
+ char key = '\0';
+ struct termios t, t_backup;
+
+ /* We should be guaranteed /dev/console exists after populate_rootfs()
+ * in init/main.c.
+ */
+ fd = sys_open("/dev/console", O_RDONLY, 0);
+ if (fd < 0) {
+ printk(KERN_INFO "Couldn't open /dev/console.\n");
+ return key;
+ }
+
+ if (sys_ioctl(fd, TCGETS, (long)&t) < 0)
+ goto out_close;
+
+ memcpy(&t_backup, &t, sizeof(t));
+
+ t.c_lflag &= ~(ISIG|ICANON|ECHO);
+ t.c_cc[VMIN] = 0;
+
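+/*
+ * VTIME is measured in tenths of a second and capped at 255, so wait in
+ * chunks of at most 25 seconds and loop until the requested timeout has
+ * elapsed.
+ */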
+new_timeout:
+ if (timeout > 0) {
+ this_timeout = timeout < 26 ? timeout : 25;
+ timeout -= this_timeout;
+ this_timeout *= 10;
+ }
+
+ t.c_cc[VTIME] = this_timeout;
+
+ if (sys_ioctl(fd, TCSETS, (long)&t) < 0)
+ goto out_restore;
+
+ while (1) {
+ if (sys_read(fd, &key, 1) <= 0) {
+ if (timeout)
+ goto new_timeout;
+ key = '\0';
+ break;
+ }
+ key = tolower(key);
+ if (test_toi_state(TOI_SANITY_CHECK_PROMPT)) {
+ if (key == 'c') {
+ set_toi_state(TOI_CONTINUE_REQ);
+ break;
+ } else if (key == ' ')
+ break;
+ } else
+ break;
+ }
+
+out_restore:
+ sys_ioctl(fd, TCSETS, (long)&t_backup);
+out_close:
+ sys_close(fd);
+
+ return key;
+}
+EXPORT_SYMBOL_GPL(toi_wait_for_keypress_dev_console);
+
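+/*
+ * Boot-time flags and settings. The structure is page aligned and marked
+ * __nosavedata so that the boot kernel's copy is not overwritten by the
+ * atomic restore.
+ */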
+struct toi_boot_kernel_data toi_bkd __nosavedata
+ __attribute__((aligned(PAGE_SIZE))) = {
+ MY_BOOT_KERNEL_DATA_VERSION,
+ 0,
+#ifdef CONFIG_TOI_REPLACE_SWSUSP
+ (1 << TOI_REPLACE_SWSUSP) |
+#endif
+ (1 << TOI_NO_FLUSHER_THREAD) |
+ (1 << TOI_PAGESET2_FULL) | (1 << TOI_LATE_CPU_HOTPLUG),
+};
+EXPORT_SYMBOL_GPL(toi_bkd);
+
+struct block_device *toi_open_by_devnum(dev_t dev)
+{
+ struct block_device *bdev = bdget(dev);
+ int err = -ENOMEM;
+ if (bdev)
+ err = blkdev_get(bdev, FMODE_READ | FMODE_NDELAY, NULL);
+ return err ? ERR_PTR(err) : bdev;
+}
+EXPORT_SYMBOL_GPL(toi_open_by_devnum);
+
+/**
+ * toi_close_bdev: Close a swap bdev.
+ *
+ * int: The swap entry number to close.
+ */
+void toi_close_bdev(struct block_device *bdev)
+{
+ blkdev_put(bdev, FMODE_READ | FMODE_NDELAY);
+}
+EXPORT_SYMBOL_GPL(toi_close_bdev);
+
+int toi_wait = CONFIG_TOI_DEFAULT_WAIT;
+EXPORT_SYMBOL_GPL(toi_wait);
+
+struct toi_core_fns *toi_core_fns;
+EXPORT_SYMBOL_GPL(toi_core_fns);
+
+unsigned long toi_result;
+EXPORT_SYMBOL_GPL(toi_result);
+
+struct pagedir pagedir1 = {1};
+EXPORT_SYMBOL_GPL(pagedir1);
+
+unsigned long toi_get_nonconflicting_page(void)
+{
+ return toi_core_fns->get_nonconflicting_page();
+}
+
+int toi_post_context_save(void)
+{
+ return toi_core_fns->post_context_save();
+}
+
+int try_tuxonice_hibernate(void)
+{
+ if (!toi_core_fns)
+ return -ENODEV;
+
+ return toi_core_fns->try_hibernate();
+}
+
+static int num_resume_calls;
+#ifdef CONFIG_TOI_IGNORE_LATE_INITCALL
+static int ignore_late_initcall = 1;
+#else
+static int ignore_late_initcall;
+#endif
+
+int toi_translate_err_default = TOI_CONTINUE_REQ;
+EXPORT_SYMBOL_GPL(toi_translate_err_default);
+
+void try_tuxonice_resume(void)
+{
+ /* Don't let it wrap around eventually */
+ if (num_resume_calls < 2)
+ num_resume_calls++;
+
+ if (num_resume_calls == 1 && ignore_late_initcall) {
+ printk(KERN_INFO "TuxOnIce: Ignoring late initcall, as requested.\n");
+ return;
+ }
+
+ if (toi_core_fns)
+ toi_core_fns->try_resume();
+ else
+ printk(KERN_INFO "TuxOnIce core not loaded yet.\n");
+}
+
+int toi_lowlevel_builtin(void)
+{
+ int error = 0;
+
+ save_processor_state();
+ error = swsusp_arch_suspend();
+ if (error)
+ printk(KERN_ERR "Error %d hibernating\n", error);
+
+ /* Restore control flow appears here */
+ if (!toi_in_hibernate) {
+ copyback_high();
+ set_toi_state(TOI_NOW_RESUMING);
+ }
+
+ restore_processor_state();
+ return error;
+}
+EXPORT_SYMBOL_GPL(toi_lowlevel_builtin);
+
+unsigned long toi_compress_bytes_in;
+EXPORT_SYMBOL_GPL(toi_compress_bytes_in);
+
+unsigned long toi_compress_bytes_out;
+EXPORT_SYMBOL_GPL(toi_compress_bytes_out);
+
+int toi_in_suspend(void)
+{
+ return in_suspend;
+}
+EXPORT_SYMBOL_GPL(toi_in_suspend);
+
+unsigned long toi_state = ((1 << TOI_BOOT_TIME) |
+ (1 << TOI_IGNORE_LOGLEVEL) |
+ (1 << TOI_IO_STOPPED));
+EXPORT_SYMBOL_GPL(toi_state);
+
+/* The number of hibernates we have started (some may have been cancelled) */
+unsigned int nr_hibernates;
+EXPORT_SYMBOL_GPL(nr_hibernates);
+
+int toi_running;
+EXPORT_SYMBOL_GPL(toi_running);
+
+__nosavedata int toi_in_hibernate;
+EXPORT_SYMBOL_GPL(toi_in_hibernate);
+
+__nosavedata struct pbe *restore_highmem_pblist;
+EXPORT_SYMBOL_GPL(restore_highmem_pblist);
+
+int toi_trace_allocs;
+EXPORT_SYMBOL_GPL(toi_trace_allocs);
+
+void toi_read_lock_tasklist(void)
+{
+ read_lock(&tasklist_lock);
+}
+EXPORT_SYMBOL_GPL(toi_read_lock_tasklist);
+
+void toi_read_unlock_tasklist(void)
+{
+ read_unlock(&tasklist_lock);
+}
+EXPORT_SYMBOL_GPL(toi_read_unlock_tasklist);
+
+#ifdef CONFIG_TOI_ZRAM_SUPPORT
+int (*toi_flag_zram_disks) (void);
+EXPORT_SYMBOL_GPL(toi_flag_zram_disks);
+
+int toi_do_flag_zram_disks(void)
+{
+ return toi_flag_zram_disks ? (*toi_flag_zram_disks)() : 0;
+}
+EXPORT_SYMBOL_GPL(toi_do_flag_zram_disks);
+#endif
+
+static int __init toi_wait_setup(char *str)
+{
+ int value;
+
+ if (sscanf(str, "=%d", &value)) {
+ if (value < -1 || value > 255)
+ printk(KERN_INFO "TuxOnIce_wait outside range -1 to "
+ "255.\n");
+ else
+ toi_wait = value;
+ }
+
+ return 1;
+}
+
+__setup("toi_wait", toi_wait_setup);
+
+static int __init toi_translate_retry_setup(char *str)
+{
+ toi_translate_err_default = 0;
+ return 1;
+}
+
+__setup("toi_translate_retry", toi_translate_retry_setup);
+
+static int __init toi_debug_setup(char *str)
+{
+ toi_bkd.toi_action |= (1 << TOI_LOGALL);
+ toi_bootflags_mask |= (1 << TOI_LOGALL);
+ toi_bkd.toi_debug_state = 255;
+ toi_bkd.toi_default_console_level = 7;
+ return 1;
+}
+
+__setup("toi_debug_setup", toi_debug_setup);
+
+static int __init toi_pause_setup(char *str)
+{
+ toi_bkd.toi_action |= (1 << TOI_PAUSE);
+ toi_bootflags_mask |= (1 << TOI_PAUSE);
+ return 1;
+}
+
+__setup("toi_pause", toi_pause_setup);
+
+#ifdef CONFIG_PM_DEBUG
+static int __init toi_trace_allocs_setup(char *str)
+{
+ int value;
+
+ if (sscanf(str, "=%d", &value))
+ toi_trace_allocs = value;
+
+ return 1;
+}
+__setup("toi_trace_allocs", toi_trace_allocs_setup);
+#endif
+
+static int __init toi_ignore_late_initcall_setup(char *str)
+{
+ int value;
+
+ if (sscanf(str, "=%d", &value))
+ ignore_late_initcall = value;
+
+ return 1;
+}
+
+__setup("toi_initramfs_resume_only", toi_ignore_late_initcall_setup);
+
+static int __init toi_force_no_multithreaded_setup(char *str)
+{
+ int value;
+
+ toi_bkd.toi_action &= ~(1 << TOI_NO_MULTITHREADED_IO);
+ toi_bootflags_mask |= (1 << TOI_NO_MULTITHREADED_IO);
+
+ if (sscanf(str, "=%d", &value) && value)
+ toi_bkd.toi_action |= (1 << TOI_NO_MULTITHREADED_IO);
+
+ return 1;
+}
+
+__setup("toi_no_multithreaded", toi_force_no_multithreaded_setup);
+
+#ifdef CONFIG_KGDB
+static int __init toi_post_resume_breakpoint_setup(char *str)
+{
+ int value;
+
+ toi_bkd.toi_action &= ~(1 << TOI_POST_RESUME_BREAKPOINT);
+ toi_bootflags_mask |= (1 << TOI_POST_RESUME_BREAKPOINT);
+ if (sscanf(str, "=%d", &value) && value)
+ toi_bkd.toi_action |= (1 << TOI_POST_RESUME_BREAKPOINT);
+
+ return 1;
+}
+
+__setup("toi_post_resume_break", toi_post_resume_breakpoint_setup);
+#endif
+
+static int __init toi_disable_readahead_setup(char *str)
+{
+ int value;
+
+ toi_bkd.toi_action &= ~(1 << TOI_NO_READAHEAD);
+ toi_bootflags_mask |= (1 << TOI_NO_READAHEAD);
+ if (sscanf(str, "=%d", &value) && value)
+ toi_bkd.toi_action |= (1 << TOI_NO_READAHEAD);
+
+ return 1;
+}
+
+__setup("toi_no_readahead", toi_disable_readahead_setup);
diff --git a/kernel/power/tuxonice_builtin.h b/kernel/power/tuxonice_builtin.h
new file mode 100644
index 0000000..6a1fd41
--- /dev/null
+++ b/kernel/power/tuxonice_builtin.h
@@ -0,0 +1,39 @@
+/*
+ * Copyright (C) 2004-2014 Nigel Cunningham (nigel at tuxonice net)
+ *
+ * This file is released under the GPLv2.
+ */
+#include <asm/setup.h>
+
+extern struct toi_core_fns *toi_core_fns;
+extern unsigned long toi_compress_bytes_in, toi_compress_bytes_out;
+extern unsigned int nr_hibernates;
+extern int toi_in_hibernate;
+
+extern __nosavedata struct pbe *restore_highmem_pblist;
+
+int toi_lowlevel_builtin(void);
+
+#ifdef CONFIG_HIGHMEM
+extern __nosavedata struct zone_data *toi_nosave_zone_list;
+extern __nosavedata unsigned long toi_nosave_max_pfn;
+#endif
+
+extern unsigned long toi_get_nonconflicting_page(void);
+extern int toi_post_context_save(void);
+
+extern char toi_wait_for_keypress_dev_console(int timeout);
+extern struct block_device *toi_open_by_devnum(dev_t dev);
+extern void toi_close_bdev(struct block_device *bdev);
+extern int toi_wait;
+extern int toi_translate_err_default;
+extern int toi_force_no_multithreaded;
+extern void toi_read_lock_tasklist(void);
+extern void toi_read_unlock_tasklist(void);
+extern int toi_in_suspend(void);
+
+#ifdef CONFIG_TOI_ZRAM_SUPPORT
+extern int toi_do_flag_zram_disks(void);
+#else
+#define toi_do_flag_zram_disks() (0)
+#endif
diff --git a/kernel/power/tuxonice_checksum.c b/kernel/power/tuxonice_checksum.c
new file mode 100644
index 0000000..305475c
--- /dev/null
+++ b/kernel/power/tuxonice_checksum.c
@@ -0,0 +1,384 @@
+/*
+ * kernel/power/tuxonice_checksum.c
+ *
+ * Copyright (C) 2006-2014 Nigel Cunningham (nigel at tuxonice net)
+ *
+ * This file is released under the GPLv2.
+ *
+ * This file contains data checksum routines for TuxOnIce,
+ * using cryptoapi. They are used to locate any modifications
+ * made to pageset 2 while we're saving it.
+ */
+
+#include <linux/suspend.h>
+#include <linux/highmem.h>
+#include <linux/vmalloc.h>
+#include <linux/crypto.h>
+#include <linux/scatterlist.h>
+
+#include "tuxonice.h"
+#include "tuxonice_modules.h"
+#include "tuxonice_sysfs.h"
+#include "tuxonice_io.h"
+#include "tuxonice_pageflags.h"
+#include "tuxonice_checksum.h"
+#include "tuxonice_pagedir.h"
+#include "tuxonice_alloc.h"
+#include "tuxonice_ui.h"
+
+static struct toi_module_ops toi_checksum_ops;
+
+/* Constant at the mo, but I might allow tuning later */
+static char toi_checksum_name[32] = "md4";
+/* Bytes per checksum */
+#define CHECKSUM_SIZE (16)
+
+#define CHECKSUMS_PER_PAGE ((PAGE_SIZE - sizeof(void *)) / CHECKSUM_SIZE)
+
+struct cpu_context {
+ struct crypto_hash *transform;
+ struct hash_desc desc;
+ struct scatterlist sg[2];
+ char *buf;
+};
+
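+/*
+ * One context per CPU: each has its own hash transform, descriptor and bounce
+ * buffer, so checksumming can proceed on several CPUs without shared state.
+ */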
+static DEFINE_PER_CPU(struct cpu_context, contexts);
+static int pages_allocated;
+static unsigned long page_list;
+
+static int toi_num_resaved;
+
+static unsigned long this_checksum, next_page;
+static int checksum_index;
+
+static inline int checksum_pages_needed(void)
+{
+ return DIV_ROUND_UP(pagedir2.size, CHECKSUMS_PER_PAGE);
+}
+
+/* ---- Local buffer management ---- */
+
+/*
+ * toi_checksum_cleanup
+ *
+ * Frees memory allocated for our labours.
+ */
+static void toi_checksum_cleanup(int ending_cycle)
+{
+ int cpu;
+
+ if (ending_cycle) {
+ for_each_online_cpu(cpu) {
+ struct cpu_context *this = &per_cpu(contexts, cpu);
+ if (this->transform) {
+ crypto_free_hash(this->transform);
+ this->transform = NULL;
+ this->desc.tfm = NULL;
+ }
+
+ if (this->buf) {
+ toi_free_page(27, (unsigned long) this->buf);
+ this->buf = NULL;
+ }
+ }
+ }
+}
+
+/*
+ * toi_checksum_initialise
+ *
+ * Prepare to do some work by allocating buffers and transforms.
+ * Returns: Int. Zero on success, non-zero if the transform or buffers could
+ * not be set up. Even if we can't set up checksumming, we still
+ * seek to hibernate.
+ */
+static int toi_checksum_initialise(int starting_cycle)
+{
+ int cpu;
+
+ if (!(starting_cycle & SYSFS_HIBERNATE) || !toi_checksum_ops.enabled)
+ return 0;
+
+ if (!*toi_checksum_name) {
+ printk(KERN_INFO "TuxOnIce: No checksum algorithm name set.\n");
+ return 1;
+ }
+
+ for_each_online_cpu(cpu) {
+ struct cpu_context *this = &per_cpu(contexts, cpu);
+ struct page *page;
+
+ this->transform = crypto_alloc_hash(toi_checksum_name, 0, 0);
+ if (IS_ERR(this->transform)) {
+ printk(KERN_INFO "TuxOnIce: Failed to initialise the "
+ "%s checksum algorithm: %ld.\n",
+ toi_checksum_name, (long) this->transform);
+ this->transform = NULL;
+ return 1;
+ }
+
+ this->desc.tfm = this->transform;
+ this->desc.flags = 0;
+
+ page = toi_alloc_page(27, GFP_KERNEL);
+ if (!page)
+ return 1;
+ this->buf = page_address(page);
+ sg_init_one(&this->sg[0], this->buf, PAGE_SIZE);
+ }
+ return 0;
+}
+
+/*
+ * toi_checksum_print_debug_stats
+ * @buffer: Pointer to a buffer into which the debug info will be printed.
+ * @size: Size of the buffer.
+ *
+ * Print information to be recorded for debugging purposes into a buffer.
+ * Returns: Number of characters written to the buffer.
+ */
+
+static int toi_checksum_print_debug_stats(char *buffer, int size)
+{
+ int len;
+
+ if (!toi_checksum_ops.enabled)
+ return scnprintf(buffer, size,
+ "- Checksumming disabled.\n");
+
+ len = scnprintf(buffer, size, "- Checksum method is '%s'.\n",
+ toi_checksum_name);
+ len += scnprintf(buffer + len, size - len,
+ " %d pages resaved in atomic copy.\n", toi_num_resaved);
+ return len;
+}
+
+static int toi_checksum_memory_needed(void)
+{
+ return toi_checksum_ops.enabled ?
+ checksum_pages_needed() << PAGE_SHIFT : 0;
+}
+
+static int toi_checksum_storage_needed(void)
+{
+ if (toi_checksum_ops.enabled)
+ return strlen(toi_checksum_name) + sizeof(int) + 1;
+ else
+ return 0;
+}
+
+/*
+ * toi_checksum_save_config_info
+ * @buffer: Pointer to a buffer of size PAGE_SIZE.
+ *
+ * Save information needed when reloading the image at resume time.
+ * Returns: Number of bytes used for saving our data.
+ */
+static int toi_checksum_save_config_info(char *buffer)
+{
+ int namelen = strlen(toi_checksum_name) + 1;
+ int total_len;
+
+ *((unsigned int *) buffer) = namelen;
+ strncpy(buffer + sizeof(unsigned int), toi_checksum_name, namelen);
+ total_len = sizeof(unsigned int) + namelen;
+ return total_len;
+}
+
+/* toi_checksum_load_config_info
+ * @buffer: Pointer to the start of the data.
+ * @size: Number of bytes that were saved.
+ *
+ * Description: Reload the checksum configuration (algorithm name) saved in
+ * the image header, for use at resume time.
+ */
+static void toi_checksum_load_config_info(char *buffer, int size)
+{
+ int namelen;
+
+ namelen = *((unsigned int *) (buffer));
+ strncpy(toi_checksum_name, buffer + sizeof(unsigned int),
+ namelen);
+ return;
+}
+
+/*
+ * Free Checksum Memory
+ */
+
+void free_checksum_pages(void)
+{
+ while (pages_allocated) {
+ unsigned long next = *((unsigned long *) page_list);
+ ClearPageNosave(virt_to_page(page_list));
+ toi_free_page(15, (unsigned long) page_list);
+ page_list = next;
+ pages_allocated--;
+ }
+}
+
+/*
+ * Allocate Checksum Memory
+ */
+
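+/*
+ * Checksum pages are chained together: the first sizeof(void *) bytes of each
+ * page link to the next page in the chain, and the rest of the page holds
+ * CHECKSUMS_PER_PAGE digests of CHECKSUM_SIZE bytes each, handed out by
+ * tuxonice_get_next_checksum().
+ */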
+int allocate_checksum_pages(void)
+{
+ int pages_needed = checksum_pages_needed();
+
+ if (!toi_checksum_ops.enabled)
+ return 0;
+
+ while (pages_allocated < pages_needed) {
+ unsigned long *new_page =
+ (unsigned long *) toi_get_zeroed_page(15, TOI_ATOMIC_GFP);
+ if (!new_page) {
+ printk(KERN_ERR "Unable to allocate checksum pages.\n");
+ return -ENOMEM;
+ }
+ SetPageNosave(virt_to_page(new_page));
+ (*new_page) = page_list;
+ page_list = (unsigned long) new_page;
+ pages_allocated++;
+ }
+
+ next_page = (unsigned long) page_list;
+ checksum_index = 0;
+
+ return 0;
+}
+
+char *tuxonice_get_next_checksum(void)
+{
+ if (!toi_checksum_ops.enabled)
+ return NULL;
+
+ if (checksum_index % CHECKSUMS_PER_PAGE)
+ this_checksum += CHECKSUM_SIZE;
+ else {
+ this_checksum = next_page + sizeof(void *);
+ next_page = *((unsigned long *) next_page);
+ }
+
+ checksum_index++;
+ return (char *) this_checksum;
+}
+
+int tuxonice_calc_checksum(struct page *page, char *checksum_locn)
+{
+ char *pa;
+ int result, cpu = smp_processor_id();
+ struct cpu_context *ctx = &per_cpu(contexts, cpu);
+
+ if (!toi_checksum_ops.enabled)
+ return 0;
+
+ pa = kmap(page);
+ memcpy(ctx->buf, pa, PAGE_SIZE);
+ kunmap(page);
+ result = crypto_hash_digest(&ctx->desc, ctx->sg, PAGE_SIZE,
+ checksum_locn);
+ if (result)
+ printk(KERN_ERR "TuxOnIce checksumming: crypto_hash_digest "
+ "returned %d.\n", result);
+ return result;
+}
+/*
+ * Verify checksums
+ */
+
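+/*
+ * Recompute the digest of each pageset2 page and compare it with the value
+ * recorded earlier; pages whose contents have changed are flagged for
+ * resaving during the atomic copy.
+ */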
+void check_checksums(void)
+{
+ int pfn, index = 0, cpu = smp_processor_id();
+ char current_checksum[CHECKSUM_SIZE];
+ struct cpu_context *ctx = &per_cpu(contexts, cpu);
+
+ if (!toi_checksum_ops.enabled) {
+ toi_message(TOI_IO, TOI_VERBOSE, 0, "Checksumming disabled.");
+ return;
+ }
+
+ next_page = (unsigned long) page_list;
+
+ toi_num_resaved = 0;
+ this_checksum = 0;
+
+ toi_message(TOI_IO, TOI_VERBOSE, 0, "Verifying checksums.");
+ memory_bm_position_reset(pageset2_map);
+ for (pfn = memory_bm_next_pfn(pageset2_map); pfn != BM_END_OF_MAP;
+ pfn = memory_bm_next_pfn(pageset2_map)) {
+ int ret;
+ char *pa;
+ struct page *page = pfn_to_page(pfn);
+
+ if (index % CHECKSUMS_PER_PAGE) {
+ this_checksum += CHECKSUM_SIZE;
+ } else {
+ this_checksum = next_page + sizeof(void *);
+ next_page = *((unsigned long *) next_page);
+ }
+
+ /* Done when IRQs disabled so must be atomic */
+ pa = kmap_atomic(page);
+ memcpy(ctx->buf, pa, PAGE_SIZE);
+ kunmap_atomic(pa);
+ ret = crypto_hash_digest(&ctx->desc, ctx->sg, PAGE_SIZE,
+ current_checksum);
+
+ if (ret) {
+ printk(KERN_INFO "Digest failed. Returned %d.\n", ret);
+ return;
+ }
+
+ if (memcmp(current_checksum, (char *) this_checksum,
+ CHECKSUM_SIZE)) {
+ toi_message(TOI_IO, TOI_VERBOSE, 0, "Resaving %ld.",
+ pfn);
+ SetPageResave(pfn_to_page(pfn));
+ toi_num_resaved++;
+ if (test_action_state(TOI_ABORT_ON_RESAVE_NEEDED))
+ set_abort_result(TOI_RESAVE_NEEDED);
+ }
+
+ index++;
+ }
+ toi_message(TOI_IO, TOI_VERBOSE, 0, "Checksum verification complete.");
+}
+
+static struct toi_sysfs_data sysfs_params[] = {
+ SYSFS_INT("enabled", SYSFS_RW, &toi_checksum_ops.enabled, 0, 1, 0,
+ NULL),
+ SYSFS_BIT("abort_if_resave_needed", SYSFS_RW, &toi_bkd.toi_action,
+ TOI_ABORT_ON_RESAVE_NEEDED, 0)
+};
+
+/*
+ * Ops structure.
+ */
+static struct toi_module_ops toi_checksum_ops = {
+ .type = MISC_MODULE,
+ .name = "checksumming",
+ .directory = "checksum",
+ .module = THIS_MODULE,
+ .initialise = toi_checksum_initialise,
+ .cleanup = toi_checksum_cleanup,
+ .print_debug_info = toi_checksum_print_debug_stats,
+ .save_config_info = toi_checksum_save_config_info,
+ .load_config_info = toi_checksum_load_config_info,
+ .memory_needed = toi_checksum_memory_needed,
+ .storage_needed = toi_checksum_storage_needed,
+
+ .sysfs_data = sysfs_params,
+ .num_sysfs_entries = sizeof(sysfs_params) /
+ sizeof(struct toi_sysfs_data),
+};
+
+/* ---- Registration ---- */
+int toi_checksum_init(void)
+{
+ int result = toi_register_module(&toi_checksum_ops);
+ return result;
+}
+
+void toi_checksum_exit(void)
+{
+ toi_unregister_module(&toi_checksum_ops);
+}
diff --git a/kernel/power/tuxonice_checksum.h b/kernel/power/tuxonice_checksum.h
new file mode 100644
index 0000000..08d3e7a
--- /dev/null
+++ b/kernel/power/tuxonice_checksum.h
@@ -0,0 +1,31 @@
+/*
+ * kernel/power/tuxonice_checksum.h
+ *
+ * Copyright (C) 2006-2014 Nigel Cunningham (nigel at tuxonice net)
+ *
+ * This file is released under the GPLv2.
+ *
+ * This file contains data checksum routines for TuxOnIce,
+ * using cryptoapi. They are used to locate any modifications
+ * made to pageset 2 while we're saving it.
+ */
+
+#if defined(CONFIG_TOI_CHECKSUM)
+extern int toi_checksum_init(void);
+extern void toi_checksum_exit(void);
+void check_checksums(void);
+int allocate_checksum_pages(void);
+void free_checksum_pages(void);
+char *tuxonice_get_next_checksum(void);
+int tuxonice_calc_checksum(struct page *page, char *checksum_locn);
+#else
+static inline int toi_checksum_init(void) { return 0; }
+static inline void toi_checksum_exit(void) { }
+static inline void check_checksums(void) { }
+static inline int allocate_checksum_pages(void) { return 0; }
+static inline void free_checksum_pages(void) { }
+static inline char *tuxonice_get_next_checksum(void) { return NULL; }
+static inline int tuxonice_calc_checksum(struct page *page, char *checksum_locn)
+ { return 0; }
+#endif
+
diff --git a/kernel/power/tuxonice_cluster.c b/kernel/power/tuxonice_cluster.c
new file mode 100644
index 0000000..c504227
--- /dev/null
+++ b/kernel/power/tuxonice_cluster.c
@@ -0,0 +1,1069 @@
+/*
+ * kernel/power/tuxonice_cluster.c
+ *
+ * Copyright (C) 2006-2014 Nigel Cunningham (nigel at tuxonice net)
+ *
+ * This file is released under the GPLv2.
+ *
+ * This file contains routines for cluster hibernation support.
+ *
+ * Based on ip autoconfiguration code in net/ipv4/ipconfig.c.
+ *
+ * How does it work?
+ *
+ * There is no 'master' node that tells everyone else what to do. All nodes
+ * send messages to the broadcast address/port, maintain a list of peers
+ * and figure out when to progress to the next step in hibernating or resuming.
+ * This makes us more fault tolerant when it comes to nodes coming and going
+ * (which may be more of an issue if we're hibernating when power supplies
+ * are being unreliable).
+ *
+ * At boot time, we start a ktuxonice thread that handles communication with
+ * other nodes. This node maintains a state machine that controls our progress
+ * through hibernating and resuming, keeping us in step with other nodes. Nodes
+ * are identified by their hw address.
+ *
+ * On startup, the node sends CLUSTER_PING on the configured interface's
+ * broadcast address, port $toi_cluster_port (see below) and begins to listen
+ * for other broadcast messages. CLUSTER_PING messages are repeated at
+ * intervals of 5 minutes, with a random offset to spread traffic out.
+ *
+ * A hibernation cycle is initiated from any node via
+ *
+ * echo > /sys/power/tuxonice/do_hibernate
+ *
+ * and (possibly) the hibernate script. At each step of the process, the node
+ * completes its work, and waits for all other nodes to signal completion of
+ * their work (or timeout) before progressing to the next step.
+ *
+ * Request/state   Action before reply    Possible reply    Next state
+ * HIBERNATE       capable, pre-script    HIBERNATE|ACK     NODE_PREP
+ *                                        HIBERNATE|NACK    INIT_0
+ *
+ * PREP            prepare_image          PREP|ACK          IMAGE_WRITE
+ *                                        PREP|NACK         INIT_0
+ *                                        ABORT             RUNNING
+ *
+ * IO              write image            IO|ACK            power off
+ *                                        ABORT             POST_RESUME
+ *
+ * (Boot time)     check for image        IMAGE|ACK         RESUME_PREP
+ *                                                          (Note 1)
+ *                                        IMAGE|NACK        (Note 2)
+ *
+ * PREP            prepare read image     PREP|ACK          IMAGE_READ
+ *                                        PREP|NACK         (As NACK_IMAGE)
+ *
+ * IO              read image             IO|ACK            POST_RESUME
+ *
+ * POST_RESUME     thaw, post-script                        RUNNING
+ *
+ * INIT_0          init 0
+ *
+ * Other messages:
+ *
+ * - PING: Request for all other live nodes to send a PONG. Used at startup to
+ * announce presence, when a node is suspected dead and periodically, in case
+ * segments of the network are [un]plugged.
+ *
+ * - PONG: Response to a PING.
+ *
+ * - ABORT: Request to cancel writing an image.
+ *
+ * - BYE: Notification that this node is shutting down.
+ *
+ * Note 1: Repeated at 3s intervals until we continue to boot/resume, so that
+ * nodes which are slower to start up can get state synchronised. If a node
+ * starting up sees other nodes sending RESUME_PREP or IMAGE_READ, it may send
+ * ACK_IMAGE and they will wait for it to catch up. If it sees ACK_READ, it
+ * must invalidate its image (if any) and boot normally.
+ *
+ * Note 2: May occur when one node lost power or powered off while others
+ * hibernated. This node waits for others to complete resuming (ACK_READ)
+ * before completing its boot, so that it appears as a fail node restarting.
+ *
+ * If any node has an image, then it also has a list of nodes that hibernated
+ * in synchronisation with it. The node will wait for other nodes to appear
+ * or timeout before beginning its restoration.
+ *
+ * If a node has no image, it needs to wait, in case other nodes which do have
+ * an image are going to resume, but are taking longer to announce their
+ * presence. For this reason, the user can specify a timeout value and a number
+ * of nodes detected before we just continue. (We might want to assume in a
+ * cluster of, say, 15 nodes, if 8 others have booted without finding an image,
+ * the remaining nodes will too. This might help in situations where some nodes
+ * are much slower to boot, or more subject to hardware failures or such like).
+ */
+
+#include <linux/suspend.h>
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+#include <linux/if.h>
+#include <linux/rtnetlink.h>
+#include <linux/ip.h>
+#include <linux/udp.h>
+#include <linux/in.h>
+#include <linux/if_arp.h>
+#include <linux/kthread.h>
+#include <linux/wait.h>
+#include <linux/netdevice.h>
+#include <net/ip.h>
+
+#include "tuxonice.h"
+#include "tuxonice_modules.h"
+#include "tuxonice_sysfs.h"
+#include "tuxonice_alloc.h"
+#include "tuxonice_io.h"
+
+#if 1
+#define PRINTK(a, b...) do { printk(a, ##b); } while (0)
+#else
+#define PRINTK(a, b...) do { } while (0)
+#endif
+
+static int loopback_mode;
+static int num_local_nodes = 1;
+#define MAX_LOCAL_NODES 8
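+/*
+ * In loopback testing mode, up to MAX_LOCAL_NODES node state machines run on
+ * this host and the packet's sid field stands in for the source IP address.
+ */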
+#define SADDR (loopback_mode ? b->sid : h->saddr)
+
+#define MYNAME "TuxOnIce Clustering"
+
+enum cluster_message {
+ MSG_ACK = 1,
+ MSG_NACK = 2,
+ MSG_PING = 4,
+ MSG_ABORT = 8,
+ MSG_BYE = 16,
+ MSG_HIBERNATE = 32,
+ MSG_IMAGE = 64,
+ MSG_IO = 128,
+ MSG_RUNNING = 256
+};
+
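+/*
+ * A message on the wire is one of the states above, optionally ORed with
+ * MSG_ACK or MSG_NACK; for example MSG_HIBERNATE | MSG_ACK (33) is decoded by
+ * str_message() below as "Hibernate ack".
+ */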
+static char *str_message(int message)
+{
+ switch (message) {
+ case 4:
+ return "Ping";
+ case 8:
+ return "Abort";
+ case 9:
+ return "Abort acked";
+ case 10:
+ return "Abort nacked";
+ case 16:
+ return "Bye";
+ case 17:
+ return "Bye acked";
+ case 18:
+ return "Bye nacked";
+ case 32:
+ return "Hibernate request";
+ case 33:
+ return "Hibernate ack";
+ case 34:
+ return "Hibernate nack";
+ case 64:
+ return "Image exists?";
+ case 65:
+ return "Image does exist";
+ case 66:
+ return "No image here";
+ case 128:
+ return "I/O";
+ case 129:
+ return "I/O okay";
+ case 130:
+ return "I/O failed";
+ case 256:
+ return "Running";
+ default:
+ printk(KERN_ERR "Unrecognised message %d.\n", message);
+ return "Unrecognised message (see dmesg)";
+ }
+}
+
+#define MSG_ACK_MASK (MSG_ACK | MSG_NACK)
+#define MSG_STATE_MASK (~MSG_ACK_MASK)
+
+struct node_info {
+ struct list_head member_list;
+ wait_queue_head_t member_events;
+ spinlock_t member_list_lock;
+ spinlock_t receive_lock;
+ int peer_count, ignored_peer_count;
+ struct toi_sysfs_data sysfs_data;
+ enum cluster_message current_message;
+};
+
+struct node_info node_array[MAX_LOCAL_NODES];
+
+struct cluster_member {
+ __be32 addr;
+ enum cluster_message message;
+ struct list_head list;
+ int ignore;
+};
+
+#define toi_cluster_port_send 3501
+#define toi_cluster_port_recv 3502
+
+static struct net_device *net_dev;
+static struct toi_module_ops toi_cluster_ops;
+
+static int toi_recv(struct sk_buff *skb, struct net_device *dev,
+ struct packet_type *pt, struct net_device *orig_dev);
+
+static struct packet_type toi_cluster_packet_type = {
+ .type = __constant_htons(ETH_P_IP),
+ .func = toi_recv,
+};
+
+struct toi_pkt { /* BOOTP packet format */
+ struct iphdr iph; /* IP header */
+ struct udphdr udph; /* UDP header */
+ u8 htype; /* HW address type */
+ u8 hlen; /* HW address length */
+ __be32 xid; /* Transaction ID */
+ __be16 secs; /* Seconds since we started */
+ __be16 flags; /* Just what it says */
+ u8 hw_addr[16]; /* Sender's HW address */
+ u16 message; /* Message */
+ unsigned long sid; /* Source ID for loopback testing */
+};
+
+static char toi_cluster_iface[IFNAMSIZ] = CONFIG_TOI_DEFAULT_CLUSTER_INTERFACE;
+
+static int added_pack;
+
+static int others_have_image;
+
+/* Key used to allow multiple clusters on the same lan */
+static char toi_cluster_key[32] = CONFIG_TOI_DEFAULT_CLUSTER_KEY;
+static char pre_hibernate_script[255] =
+ CONFIG_TOI_DEFAULT_CLUSTER_PRE_HIBERNATE;
+static char post_hibernate_script[255] =
+ CONFIG_TOI_DEFAULT_CLUSTER_POST_HIBERNATE;
+
+/* Timeouts for cluster messaging */
+static unsigned long continue_delay = 5 * HZ;
+static unsigned long cluster_message_timeout = 3 * HZ;
+
+/* === Membership list === */
+
+static void print_member_info(int index)
+{
+ struct cluster_member *this;
+
+ printk(KERN_INFO "==> Dumping node %d.\n", index);
+
+ list_for_each_entry(this, &node_array[index].member_list, list)
+ printk(KERN_INFO "%d.%d.%d.%d last message %s. %s\n",
+ NIPQUAD(this->addr),
+ str_message(this->message),
+ this->ignore ? "(Ignored)" : "");
+ printk(KERN_INFO "== Done ==\n");
+}
+
+static struct cluster_member *__find_member(int index, __be32 addr)
+{
+ struct cluster_member *this;
+
+ list_for_each_entry(this, &node_array[index].member_list, list) {
+ if (this->addr != addr)
+ continue;
+
+ return this;
+ }
+
+ return NULL;
+}
+
+static void set_ignore(int index, __be32 addr, struct cluster_member *this)
+{
+ if (this->ignore) {
+ PRINTK("Node %d already ignoring %d.%d.%d.%d.\n",
+ index, NIPQUAD(addr));
+ return;
+ }
+
+ PRINTK("Node %d sees node %d.%d.%d.%d now being ignored.\n",
+ index, NIPQUAD(addr));
+ this->ignore = 1;
+ node_array[index].ignored_peer_count++;
+}
+
+static int __add_update_member(int index, __be32 addr, int message)
+{
+ struct cluster_member *this;
+
+ this = __find_member(index, addr);
+ if (this) {
+ if (this->message != message) {
+ this->message = message;
+ if ((message & MSG_NACK) &&
+ (message & (MSG_HIBERNATE | MSG_IMAGE | MSG_IO)))
+ set_ignore(index, addr, this);
+ PRINTK("Node %d sees node %d.%d.%d.%d now sending "
+ "%s.\n", index, NIPQUAD(addr),
+ str_message(message));
+ wake_up(&node_array[index].member_events);
+ }
+ return 0;
+ }
+
+ this = (struct cluster_member *) toi_kzalloc(36,
+ sizeof(struct cluster_member), GFP_KERNEL);
+
+ if (!this)
+ return -1;
+
+ this->addr = addr;
+ this->message = message;
+ this->ignore = 0;
+ INIT_LIST_HEAD(&this->list);
+
+ node_array[index].peer_count++;
+
+ PRINTK("Node %d sees node %d.%d.%d.%d sending %s.\n", index,
+ NIPQUAD(addr), str_message(message));
+
+ if ((message & MSG_NACK) &&
+ (message & (MSG_HIBERNATE | MSG_IMAGE | MSG_IO)))
+ set_ignore(index, addr, this);
+ list_add_tail(&this->list, &node_array[index].member_list);
+ return 1;
+}
+
+static int add_update_member(int index, __be32 addr, int message)
+{
+ int result;
+ unsigned long flags;
+ spin_lock_irqsave(&node_array[index].member_list_lock, flags);
+ result = __add_update_member(index, addr, message);
+ spin_unlock_irqrestore(&node_array[index].member_list_lock, flags);
+
+ print_member_info(index);
+
+ wake_up(&node_array[index].member_events);
+
+ return result;
+}
+
+static void del_member(int index, __be32 addr)
+{
+ struct cluster_member *this;
+ unsigned long flags;
+
+ spin_lock_irqsave(&node_array[index].member_list_lock, flags);
+ this = __find_member(index, addr);
+
+ if (this) {
+ list_del_init(&this->list);
+ toi_kfree(36, this, sizeof(*this));
+ node_array[index].peer_count--;
+ }
+
+ spin_unlock_irqrestore(&node_array[index].member_list_lock, flags);
+}
+
+/* === Message transmission === */
+
+static void toi_send_if(int message, unsigned long my_id);
+
+/*
+ * Process received TOI packet.
+ */
+static int toi_recv(struct sk_buff *skb, struct net_device *dev,
+ struct packet_type *pt, struct net_device *orig_dev)
+{
+ struct toi_pkt *b;
+ struct iphdr *h;
+ int len, result, index;
+ unsigned long addr, message, ack;
+
+ /* Perform verifications before taking the lock. */
+ if (skb->pkt_type == PACKET_OTHERHOST)
+ goto drop;
+
+ if (dev != net_dev)
+ goto drop;
+
+ skb = skb_share_check(skb, GFP_ATOMIC);
+ if (!skb)
+ return NET_RX_DROP;
+
+ if (!pskb_may_pull(skb,
+ sizeof(struct iphdr) +
+ sizeof(struct udphdr)))
+ goto drop;
+
+ b = (struct toi_pkt *)skb_network_header(skb);
+ h = &b->iph;
+
+ if (h->ihl != 5 || h->version != 4 || h->protocol != IPPROTO_UDP)
+ goto drop;
+
+ /* Fragments are not supported */
+ if (h->frag_off & htons(IP_OFFSET | IP_MF)) {
+ if (net_ratelimit())
+ printk(KERN_ERR "TuxOnIce: Ignoring fragmented "
+ "cluster message.\n");
+ goto drop;
+ }
+
+ if (skb->len < ntohs(h->tot_len))
+ goto drop;
+
+ if (ip_fast_csum((char *) h, h->ihl))
+ goto drop;
+
+ if (b->udph.source != htons(toi_cluster_port_send) ||
+ b->udph.dest != htons(toi_cluster_port_recv))
+ goto drop;
+
+ if (ntohs(h->tot_len) < ntohs(b->udph.len) + sizeof(struct iphdr))
+ goto drop;
+
+ len = ntohs(b->udph.len) - sizeof(struct udphdr);
+
+ /* Ok the front looks good, make sure we can get at the rest. */
+ if (!pskb_may_pull(skb, skb->len))
+ goto drop;
+
+ b = (struct toi_pkt *)skb_network_header(skb);
+ h = &b->iph;
+
+ addr = SADDR;
+ PRINTK(">>> Message %s received from " NIPQUAD_FMT ".\n",
+ str_message(b->message), NIPQUAD(addr));
+
+ message = b->message & MSG_STATE_MASK;
+ ack = b->message & MSG_ACK_MASK;
+
+ for (index = 0; index < num_local_nodes; index++) {
+ int new_message = node_array[index].current_message,
+ old_message = new_message;
+
+ if (index == SADDR || !old_message) {
+ PRINTK("Ignoring node %d (offline or self).\n", index);
+ continue;
+ }
+
+ /* One message at a time, please. */
+ spin_lock(&node_array[index].receive_lock);
+
+ result = add_update_member(index, SADDR, b->message);
+ if (result == -1) {
+ printk(KERN_INFO "Failed to add new cluster member "
+ NIPQUAD_FMT ".\n",
+ NIPQUAD(addr));
+ goto drop_unlock;
+ }
+
+ switch (b->message & MSG_STATE_MASK) {
+ case MSG_PING:
+ break;
+ case MSG_ABORT:
+ break;
+ case MSG_BYE:
+ break;
+ case MSG_HIBERNATE:
+ /* Can I hibernate? */
+ new_message = MSG_HIBERNATE |
+ ((index & 1) ? MSG_NACK : MSG_ACK);
+ break;
+ case MSG_IMAGE:
+ /* Can I resume? */
+ new_message = MSG_IMAGE |
+ ((index & 1) ? MSG_NACK : MSG_ACK);
+ if (new_message != old_message)
+ printk(KERN_ERR "Setting whether I can resume "
+ "to %d.\n", new_message);
+ break;
+ case MSG_IO:
+ new_message = MSG_IO | MSG_ACK;
+ break;
+ case MSG_RUNNING:
+ break;
+ default:
+ if (net_ratelimit())
+ printk(KERN_ERR "Unrecognised TuxOnIce cluster"
+ " message %d from " NIPQUAD_FMT ".\n",
+ b->message, NIPQUAD(addr));
+ };
+
+ if (old_message != new_message) {
+ node_array[index].current_message = new_message;
+ printk(KERN_INFO ">>> Sending new message for node "
+ "%d.\n", index);
+ toi_send_if(new_message, index);
+ } else if (!ack) {
+ printk(KERN_INFO ">>> Resending message for node %d.\n",
+ index);
+ toi_send_if(new_message, index);
+ }
+drop_unlock:
+ spin_unlock(&node_array[index].receive_lock);
+ };
+
+drop:
+ /* Throw the packet out. */
+ kfree_skb(skb);
+
+ return 0;
+}
+
+/*
+ * Send cluster message to single interface.
+ */
+static void toi_send_if(int message, unsigned long my_id)
+{
+ struct sk_buff *skb;
+ struct toi_pkt *b;
+ int hh_len = LL_RESERVED_SPACE(net_dev);
+ struct iphdr *h;
+
+ /* Allocate packet */
+ skb = alloc_skb(sizeof(struct toi_pkt) + hh_len + 15, GFP_KERNEL);
+ if (!skb)
+ return;
+ skb_reserve(skb, hh_len);
+ b = (struct toi_pkt *) skb_put(skb, sizeof(struct toi_pkt));
+ memset(b, 0, sizeof(struct toi_pkt));
+
+ /* Construct IP header */
+ skb_reset_network_header(skb);
+ h = ip_hdr(skb);
+ h->version = 4;
+ h->ihl = 5;
+ h->tot_len = htons(sizeof(struct toi_pkt));
+ h->frag_off = htons(IP_DF);
+ h->ttl = 64;
+ h->protocol = IPPROTO_UDP;
+ h->daddr = htonl(INADDR_BROADCAST);
+ h->check = ip_fast_csum((unsigned char *) h, h->ihl);
+
+ /* Construct UDP header */
+ b->udph.source = htons(toi_cluster_port_send);
+ b->udph.dest = htons(toi_cluster_port_recv);
+ b->udph.len = htons(sizeof(struct toi_pkt) - sizeof(struct iphdr));
+ /* UDP checksum not calculated -- explicitly allowed in BOOTP RFC */
+
+ /* Construct message */
+ b->message = message;
+ b->sid = my_id;
+ b->htype = net_dev->type; /* can cause undefined behavior */
+ b->hlen = net_dev->addr_len;
+ memcpy(b->hw_addr, net_dev->dev_addr, net_dev->addr_len);
+ b->secs = htons(3); /* 3 seconds */
+
+ /* Chain packet down the line... */
+ skb->dev = net_dev;
+ skb->protocol = htons(ETH_P_IP);
+ if ((dev_hard_header(skb, net_dev, ntohs(skb->protocol),
+ net_dev->broadcast, net_dev->dev_addr, skb->len) < 0) ||
+ dev_queue_xmit(skb) < 0)
+ printk(KERN_INFO "E");
+}
+
+/* ========================================= */
+
+/* kTOICluster */
+
+static atomic_t num_cluster_threads;
+static DECLARE_WAIT_QUEUE_HEAD(clusterd_events);
+
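+/*
+ * One of these daemons runs per local node: it rebroadcasts the node's
+ * current message every cluster_message_timeout jiffies until the message is
+ * cleared, then sends MSG_BYE and exits.
+ */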
+static int kTOICluster(void *data)
+{
+ unsigned long my_id;
+
+ my_id = atomic_add_return(1, &num_cluster_threads) - 1;
+ node_array[my_id].current_message = (unsigned long) data;
+
+ PRINTK("kTOICluster daemon %lu starting.\n", my_id);
+
+ current->flags |= PF_NOFREEZE;
+
+ while (node_array[my_id].current_message) {
+ toi_send_if(node_array[my_id].current_message, my_id);
+ sleep_on_timeout(&clusterd_events,
+ cluster_message_timeout);
+ PRINTK("Link state %lu is %d.\n", my_id,
+ node_array[my_id].current_message);
+ }
+
+ toi_send_if(MSG_BYE, my_id);
+ atomic_dec(&num_cluster_threads);
+ wake_up(&clusterd_events);
+
+ PRINTK("kTOICluster daemon %lu exiting.\n", my_id);
+ __set_current_state(TASK_RUNNING);
+ return 0;
+}
+
+static void kill_clusterd(void)
+{
+ int i;
+
+ for (i = 0; i < num_local_nodes; i++) {
+ if (node_array[i].current_message) {
+ PRINTK("Seeking to kill clusterd %d.\n", i);
+ node_array[i].current_message = 0;
+ }
+ }
+ wait_event(clusterd_events,
+ !atomic_read(&num_cluster_threads));
+ PRINTK("All cluster daemons have exited.\n");
+}
+
+static int peers_not_in_message(int index, int message, int precise)
+{
+ struct cluster_member *this;
+ unsigned long flags;
+ int result = 0;
+
+ spin_lock_irqsave(&node_array[index].member_list_lock, flags);
+ list_for_each_entry(this, &node_array[index].member_list, list) {
+ if (this->ignore)
+ continue;
+
+ PRINTK("Peer %d.%d.%d.%d sending %s. "
+ "Seeking %s.\n",
+ NIPQUAD(this->addr),
+ str_message(this->message), str_message(message));
+ if ((precise ? this->message :
+ this->message & MSG_STATE_MASK) !=
+ message)
+ result++;
+ }
+ spin_unlock_irqrestore(&node_array[index].member_list_lock, flags);
+ PRINTK("%d peers in sought message.\n", result);
+ return result;
+}
+
+static void reset_ignored(int index)
+{
+ struct cluster_member *this;
+ unsigned long flags;
+
+ spin_lock_irqsave(&node_array[index].member_list_lock, flags);
+ list_for_each_entry(this, &node_array[index].member_list, list)
+ this->ignore = 0;
+ node_array[index].ignored_peer_count = 0;
+ spin_unlock_irqrestore(&node_array[index].member_list_lock, flags);
+}
+
+static int peers_in_message(int index, int message, int precise)
+{
+ return node_array[index].peer_count -
+ node_array[index].ignored_peer_count -
+ peers_not_in_message(index, message, precise);
+}
+
+static int time_to_continue(int index, unsigned long start, int message)
+{
+ int first = peers_not_in_message(index, message, 0);
+ int second = peers_in_message(index, message, 1);
+
+ PRINTK("First part returns %d, second returns %d.\n", first, second);
+
+ if (!first && !second) {
+ PRINTK("All peers answered message %d.\n",
+ message);
+ return 1;
+ }
+
+ if (time_after(jiffies, start + continue_delay)) {
+ PRINTK("Timeout reached.\n");
+ return 1;
+ }
+
+ PRINTK("Not time to continue yet (%lu < %lu).\n", jiffies,
+ start + continue_delay);
+ return 0;
+}
+
+void toi_initiate_cluster_hibernate(void)
+{
+ int result;
+ unsigned long start;
+
+ result = do_toi_step(STEP_HIBERNATE_PREPARE_IMAGE);
+ if (result)
+ return;
+
+ toi_send_if(MSG_HIBERNATE, 0);
+
+ start = jiffies;
+ wait_event(node_array[0].member_events,
+ time_to_continue(0, start, MSG_HIBERNATE));
+
+ if (test_action_state(TOI_FREEZER_TEST)) {
+ toi_send_if(MSG_ABORT, 0);
+
+ start = jiffies;
+ wait_event(node_array[0].member_events,
+ time_to_continue(0, start, MSG_RUNNING));
+
+ do_toi_step(STEP_QUIET_CLEANUP);
+ return;
+ }
+
+ toi_send_if(MSG_IO, 0);
+
+ result = do_toi_step(STEP_HIBERNATE_SAVE_IMAGE);
+ if (result)
+ return;
+
+ /* This code runs at resume time too! */
+ if (toi_in_hibernate)
+ result = do_toi_step(STEP_HIBERNATE_POWERDOWN);
+}
+EXPORT_SYMBOL_GPL(toi_initiate_cluster_hibernate);
+
+/* toi_cluster_print_debug_stats
+ *
+ * Description: Print information to be recorded for debugging purposes into a
+ * buffer.
+ * Arguments: buffer: Pointer to a buffer into which the debug info will be
+ * printed.
+ * size: Size of the buffer.
+ * Returns: Number of characters written to the buffer.
+ */
+static int toi_cluster_print_debug_stats(char *buffer, int size)
+{
+ int len;
+
+ if (strlen(toi_cluster_iface))
+ len = scnprintf(buffer, size,
+ "- Cluster interface is '%s'.\n",
+ toi_cluster_iface);
+ else
+ len = scnprintf(buffer, size,
+ "- Cluster support is disabled.\n");
+ return len;
+}
+
+/* cluster_memory_needed
+ *
+ * Description: Tell the caller how much memory we need to operate during
+ * hibernate/resume.
+ * Returns: Int. Maximum number of bytes of memory required for
+ * operation.
+ */
+static int toi_cluster_memory_needed(void)
+{
+ return 0;
+}
+
+static int toi_cluster_storage_needed(void)
+{
+ return 1 + strlen(toi_cluster_iface);
+}
+
+/* toi_cluster_save_config_info
+ *
+ * Description: Save information needed when reloading the image at resume
+ * time.
+ * Arguments: Buffer: Pointer to a buffer of size PAGE_SIZE.
+ * Returns: Number of bytes used for saving our data.
+ */
+static int toi_cluster_save_config_info(char *buffer)
+{
+ strcpy(buffer, toi_cluster_iface);
+ return strlen(toi_cluster_iface) + 1;
+}
+
+/* toi_cluster_load_config_info
+ *
+ * Description: Reload the cluster configuration (interface name) saved in
+ * the image header, for use at resume time.
+ * Arguments: Buffer: Pointer to the start of the data.
+ * Size: Number of bytes that were saved.
+ */
+static void toi_cluster_load_config_info(char *buffer, int size)
+{
+ strncpy(toi_cluster_iface, buffer, size);
+ return;
+}
+
+static void cluster_startup(void)
+{
+ int have_image = do_check_can_resume(), i;
+ unsigned long start = jiffies, initial_message;
+ struct task_struct *p;
+
+ initial_message = MSG_IMAGE;
+
+ have_image = 1;
+
+ for (i = 0; i < num_local_nodes; i++) {
+ PRINTK("Starting ktoiclusterd %d.\n", i);
+ p = kthread_create(kTOICluster, (void *) initial_message,
+ "ktoiclusterd/%d", i);
+ if (IS_ERR(p)) {
+ printk(KERN_ERR "Failed to start ktoiclusterd.\n");
+ return;
+ }
+
+ wake_up_process(p);
+ }
+
+ /* Wait for delay or someone else sending first message */
+ wait_event(node_array[0].member_events, time_to_continue(0, start,
+ MSG_IMAGE));
+
+ others_have_image = peers_in_message(0, MSG_IMAGE | MSG_ACK, 1);
+
+ printk(KERN_INFO "Continuing. I %shave an image. Peers with image:"
+ " %d.\n", have_image ? "" : "don't ", others_have_image);
+
+ if (have_image) {
+ int result;
+
+ /* Start to resume */
+ printk(KERN_INFO " === Starting to resume === \n");
+ node_array[0].current_message = MSG_IO;
+ toi_send_if(MSG_IO, 0);
+
+ /* result = do_toi_step(STEP_RESUME_LOAD_PS1); */
+ result = 0;
+
+ if (!result) {
+ /*
+ * Atomic restore - we'll come back in the hibernation
+ * path.
+ */
+
+ /* result = do_toi_step(STEP_RESUME_DO_RESTORE); */
+ result = 0;
+
+ /* do_toi_step(STEP_QUIET_CLEANUP); */
+ }
+
+ node_array[0].current_message |= MSG_NACK;
+
+ /* For debugging - disable for real life? */
+ wait_event(node_array[0].member_events,
+ time_to_continue(0, start, MSG_IO));
+ }
+
+ if (others_have_image) {
+ /* Wait for them to resume */
+ printk(KERN_INFO "Waiting for other nodes to resume.\n");
+ start = jiffies;
+ wait_event(node_array[0].member_events,
+ time_to_continue(0, start, MSG_RUNNING));
+ if (peers_not_in_message(0, MSG_RUNNING, 0))
+ printk(KERN_INFO "Timed out while waiting for other "
+ "nodes to resume.\n");
+ }
+
+ /* Find out whether an image exists here. Send ACK_IMAGE or NACK_IMAGE
+ * as appropriate.
+ *
+ * If we don't have an image:
+ * - Wait until someone else says they have one, or conditions are met
+ * for continuing to boot (n machines or t seconds).
+ * - If anyone has an image, wait for them to resume before continuing
+ * to boot.
+ *
+ * If we have an image:
+ * - Wait until conditions are met before continuing to resume (n
+ * machines or t seconds). Send RESUME_PREP and freeze processes.
+ * NACK_PREP if freezing fails (shouldn't) and follow logic for
+ * us having no image above. On success, wait for [N]ACK_PREP from
+ * other machines. Read image (including atomic restore) until done.
+ * Wait for ACK_READ from others (should never fail). Thaw processes
+ * and do post-resume. (The section after the atomic restore is done
+ * via the code for hibernating).
+ */
+
+ node_array[0].current_message = MSG_RUNNING;
+}
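+
+/*
+ * Reading aid for cluster_startup() above (purely explanatory, no extra
+ * logic; the message names are the MSG_* constants used in this file):
+ *
+ *	1. one ktoiclusterd thread is started per local node (8 in loopback
+ *	   mode, otherwise 1), seeded with MSG_IMAGE;
+ *	2. we wait for continue_delay to elapse or for peers to reply;
+ *	3. with a local image, MSG_IO is sent and the resume steps (stubbed
+ *	   out above) would load and atomically restore it;
+ *	4. if peers reported images, we wait for their MSG_RUNNING;
+ *	5. finally node 0 is marked MSG_RUNNING and booting continues.
+ */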
+
+/* toi_cluster_open_iface
+ *
+ * Description: Prepare to use an interface.
+ */
+
+static int toi_cluster_open_iface(void)
+{
+ struct net_device *dev;
+
+ rtnl_lock();
+
+ for_each_netdev(&init_net, dev) {
+ if (/* dev == &init_net.loopback_dev || */
+ strcmp(dev->name, toi_cluster_iface))
+ continue;
+
+ net_dev = dev;
+ break;
+ }
+
+ rtnl_unlock();
+
+ if (!net_dev) {
+ printk(KERN_ERR MYNAME ": Device %s not found.\n",
+ toi_cluster_iface);
+ return -ENODEV;
+ }
+
+ dev_add_pack(&toi_cluster_packet_type);
+ added_pack = 1;
+
+ loopback_mode = (net_dev == init_net.loopback_dev);
+ num_local_nodes = loopback_mode ? 8 : 1;
+
+ PRINTK("Loopback mode is %s. Number of local nodes is %d.\n",
+ loopback_mode ? "on" : "off", num_local_nodes);
+
+ cluster_startup();
+ return 0;
+}
+
+/* toi_cluster_close_iface
+ *
+ * Description: Stop using an interface.
+ */
+
+static int toi_cluster_close_iface(void)
+{
+ kill_clusterd();
+ if (added_pack) {
+ dev_remove_pack(&toi_cluster_packet_type);
+ added_pack = 0;
+ }
+ return 0;
+}
+
+static void write_side_effect(void)
+{
+ if (toi_cluster_ops.enabled) {
+ toi_cluster_open_iface();
+ set_toi_state(TOI_CLUSTER_MODE);
+ } else {
+ toi_cluster_close_iface();
+ clear_toi_state(TOI_CLUSTER_MODE);
+ }
+}
+
+static void node_write_side_effect(void)
+{
+}
+
+/*
+ * data for our sysfs entries.
+ */
+static struct toi_sysfs_data sysfs_params[] = {
+ SYSFS_STRING("interface", SYSFS_RW, toi_cluster_iface, IFNAMSIZ, 0,
+ NULL),
+ SYSFS_INT("enabled", SYSFS_RW, &toi_cluster_ops.enabled, 0, 1, 0,
+ write_side_effect),
+ SYSFS_STRING("cluster_name", SYSFS_RW, toi_cluster_key, 32, 0, NULL),
+ SYSFS_STRING("pre-hibernate-script", SYSFS_RW, pre_hibernate_script,
+ 256, 0, NULL),
+ SYSFS_STRING("post-hibernate-script", SYSFS_RW, post_hibernate_script,
+		256, 0, NULL),
+ SYSFS_UL("continue_delay", SYSFS_RW, &continue_delay, HZ / 2, 60 * HZ,
+ 0)
+};
+
+/*
+ * Ops structure.
+ */
+
+static struct toi_module_ops toi_cluster_ops = {
+ .type = FILTER_MODULE,
+ .name = "Cluster",
+ .directory = "cluster",
+ .module = THIS_MODULE,
+ .memory_needed = toi_cluster_memory_needed,
+ .print_debug_info = toi_cluster_print_debug_stats,
+ .save_config_info = toi_cluster_save_config_info,
+ .load_config_info = toi_cluster_load_config_info,
+ .storage_needed = toi_cluster_storage_needed,
+
+ .sysfs_data = sysfs_params,
+ .num_sysfs_entries = sizeof(sysfs_params) /
+ sizeof(struct toi_sysfs_data),
+};
+
+/* ---- Registration ---- */
+
+#ifdef MODULE
+#define INIT static __init
+#define EXIT static __exit
+#else
+#define INIT
+#define EXIT
+#endif
+
+INIT int toi_cluster_init(void)
+{
+ int temp = toi_register_module(&toi_cluster_ops), i;
+ struct kobject *kobj = toi_cluster_ops.dir_kobj;
+
+ for (i = 0; i < MAX_LOCAL_NODES; i++) {
+ node_array[i].current_message = 0;
+ INIT_LIST_HEAD(&node_array[i].member_list);
+ init_waitqueue_head(&node_array[i].member_events);
+ spin_lock_init(&node_array[i].member_list_lock);
+ spin_lock_init(&node_array[i].receive_lock);
+
+ /* Set up sysfs entry */
+		/* Room for "node_" plus any int index and a trailing NUL */
+		node_array[i].sysfs_data.attr.name = toi_kzalloc(8, 16,
+				GFP_KERNEL);
+ sprintf((char *) node_array[i].sysfs_data.attr.name, "node_%d",
+ i);
+ node_array[i].sysfs_data.attr.mode = SYSFS_RW;
+ node_array[i].sysfs_data.type = TOI_SYSFS_DATA_INTEGER;
+ node_array[i].sysfs_data.flags = 0;
+ node_array[i].sysfs_data.data.integer.variable =
+ (int *) &node_array[i].current_message;
+ node_array[i].sysfs_data.data.integer.minimum = 0;
+ node_array[i].sysfs_data.data.integer.maximum = INT_MAX;
+ node_array[i].sysfs_data.write_side_effect =
+ node_write_side_effect;
+ toi_register_sysfs_file(kobj, &node_array[i].sysfs_data);
+ }
+
+ toi_cluster_ops.enabled = (strlen(toi_cluster_iface) > 0);
+
+ if (toi_cluster_ops.enabled)
+ toi_cluster_open_iface();
+
+ return temp;
+}
+
+EXIT void toi_cluster_exit(void)
+{
+ int i;
+ toi_cluster_close_iface();
+
+ for (i = 0; i < MAX_LOCAL_NODES; i++)
+ toi_unregister_sysfs_file(toi_cluster_ops.dir_kobj,
+ &node_array[i].sysfs_data);
+ toi_unregister_module(&toi_cluster_ops);
+}
+
+static int __init toi_cluster_iface_setup(char *iface)
+{
+ toi_cluster_ops.enabled = (*iface &&
+ strcmp(iface, "off"));
+
+ if (toi_cluster_ops.enabled)
+ strncpy(toi_cluster_iface, iface, strlen(iface));
+
+	return 1;
+}
+
+__setup("toi_cluster=", toi_cluster_iface_setup);
+
+#ifdef MODULE
+MODULE_LICENSE("GPL");
+module_init(toi_cluster_init);
+module_exit(toi_cluster_exit);
+MODULE_AUTHOR("Nigel Cunningham");
+MODULE_DESCRIPTION("Cluster Support for TuxOnIce");
+#endif
diff --git a/kernel/power/tuxonice_cluster.h b/kernel/power/tuxonice_cluster.h
new file mode 100644
index 0000000..5c46acc
--- /dev/null
+++ b/kernel/power/tuxonice_cluster.h
@@ -0,0 +1,18 @@
+/*
+ * kernel/power/tuxonice_cluster.h
+ *
+ * Copyright (C) 2006-2014 Nigel Cunningham (nigel at tuxonice net)
+ *
+ * This file is released under the GPLv2.
+ */
+
+#ifdef CONFIG_TOI_CLUSTER
+extern int toi_cluster_init(void);
+extern void toi_cluster_exit(void);
+extern void toi_initiate_cluster_hibernate(void);
+#else
+static inline int toi_cluster_init(void) { return 0; }
+static inline void toi_cluster_exit(void) { }
+static inline void toi_initiate_cluster_hibernate(void) { }
+#endif
+
diff --git a/kernel/power/tuxonice_compress.c b/kernel/power/tuxonice_compress.c
new file mode 100644
index 0000000..362f8fb
--- /dev/null
+++ b/kernel/power/tuxonice_compress.c
@@ -0,0 +1,465 @@
+/*
+ * kernel/power/tuxonice_compress.c
+ *
+ * Copyright (C) 2003-2014 Nigel Cunningham (nigel at tuxonice net)
+ *
+ * This file is released under the GPLv2.
+ *
+ * This file contains data compression routines for TuxOnIce,
+ * using cryptoapi.
+ */
+
+#include <linux/suspend.h>
+#include <linux/highmem.h>
+#include <linux/vmalloc.h>
+#include <linux/crypto.h>
+
+#include "tuxonice_builtin.h"
+#include "tuxonice.h"
+#include "tuxonice_modules.h"
+#include "tuxonice_sysfs.h"
+#include "tuxonice_io.h"
+#include "tuxonice_ui.h"
+#include "tuxonice_alloc.h"
+
+static int toi_expected_compression;
+
+static struct toi_module_ops toi_compression_ops;
+static struct toi_module_ops *next_driver;
+
+static char toi_compressor_name[32] = "lzo";
+
+static DEFINE_MUTEX(stats_lock);
+
+struct cpu_context {
+ u8 *page_buffer;
+ struct crypto_comp *transform;
+ unsigned int len;
+ u8 *buffer_start;
+ u8 *output_buffer;
+};
+
+#define OUT_BUF_SIZE (2 * PAGE_SIZE)
+
+static DEFINE_PER_CPU(struct cpu_context, contexts);
+
+/*
+ * toi_crypto_prepare
+ *
+ * Prepare to do some work by allocating buffers and transforms.
+ */
+static int toi_compress_crypto_prepare(void)
+{
+ int cpu;
+
+ if (!*toi_compressor_name) {
+ printk(KERN_INFO "TuxOnIce: Compression enabled but no "
+ "compressor name set.\n");
+ return 1;
+ }
+
+ for_each_online_cpu(cpu) {
+ struct cpu_context *this = &per_cpu(contexts, cpu);
+ this->transform = crypto_alloc_comp(toi_compressor_name, 0, 0);
+ if (IS_ERR(this->transform)) {
+ printk(KERN_INFO "TuxOnIce: Failed to initialise the "
+ "%s compression transform.\n",
+ toi_compressor_name);
+ this->transform = NULL;
+ return 1;
+ }
+
+ this->page_buffer =
+ (char *) toi_get_zeroed_page(16, TOI_ATOMIC_GFP);
+
+ if (!this->page_buffer) {
+ printk(KERN_ERR
+ "Failed to allocate a page buffer for TuxOnIce "
+ "compression driver.\n");
+ return -ENOMEM;
+ }
+
+ this->output_buffer =
+ (char *) vmalloc_32(OUT_BUF_SIZE);
+
+ if (!this->output_buffer) {
+ printk(KERN_ERR
+			    "Failed to allocate an output buffer for TuxOnIce "
+ "compression driver.\n");
+ return -ENOMEM;
+ }
+ }
+
+ return 0;
+}
+
+static int toi_compress_rw_cleanup(int writing)
+{
+ int cpu;
+
+ for_each_online_cpu(cpu) {
+ struct cpu_context *this = &per_cpu(contexts, cpu);
+ if (this->transform) {
+ crypto_free_comp(this->transform);
+ this->transform = NULL;
+ }
+
+ if (this->page_buffer)
+ toi_free_page(16, (unsigned long) this->page_buffer);
+
+ this->page_buffer = NULL;
+
+ if (this->output_buffer)
+ vfree(this->output_buffer);
+
+ this->output_buffer = NULL;
+ }
+
+ return 0;
+}
+
+/*
+ * toi_compress_init
+ */
+
+static int toi_compress_init(int toi_or_resume)
+{
+ if (!toi_or_resume)
+ return 0;
+
+ toi_compress_bytes_in = 0;
+ toi_compress_bytes_out = 0;
+
+ next_driver = toi_get_next_filter(&toi_compression_ops);
+
+ return next_driver ? 0 : -ECHILD;
+}
+
+/*
+ * toi_compress_rw_init()
+ */
+
+static int toi_compress_rw_init(int rw, int stream_number)
+{
+ if (toi_compress_crypto_prepare()) {
+ printk(KERN_ERR "Failed to initialise compression "
+ "algorithm.\n");
+ if (rw == READ) {
+ printk(KERN_INFO "Unable to read the image.\n");
+ return -ENODEV;
+ } else {
+ printk(KERN_INFO "Continuing without "
+ "compressing the image.\n");
+ toi_compression_ops.enabled = 0;
+ }
+ }
+
+ return 0;
+}
+
+/*
+ * toi_compress_write_page()
+ *
+ * Compress a page of data, buffering output and passing on filled
+ * pages to the next module in the pipeline.
+ *
+ * Buffer_page: Pointer to a buffer of size PAGE_SIZE, containing
+ * data to be compressed.
+ *
+ * Returns: 0 on success. Otherwise the error is that returned by later
+ * modules, -ECHILD if we have a broken pipeline or -EIO if
+ * zlib errs.
+ */
+static int toi_compress_write_page(unsigned long index, int buf_type,
+ void *buffer_page, unsigned int buf_size)
+{
+ int ret = 0, cpu = smp_processor_id();
+ struct cpu_context *ctx = &per_cpu(contexts, cpu);
+	u8 *output_buffer = buffer_page;
+ int output_len = buf_size;
+ int out_buf_type = buf_type;
+
+ if (ctx->transform) {
+
+ ctx->buffer_start = TOI_MAP(buf_type, buffer_page);
+ ctx->len = OUT_BUF_SIZE;
+
+ ret = crypto_comp_compress(ctx->transform,
+ ctx->buffer_start, buf_size,
+ ctx->output_buffer, &ctx->len);
+
+ TOI_UNMAP(buf_type, buffer_page);
+
+ toi_message(TOI_COMPRESS, TOI_VERBOSE, 0,
+ "CPU %d, index %lu: %d bytes",
+ cpu, index, ctx->len);
+
+ if (!ret && ctx->len < buf_size) { /* some compression */
+ output_buffer = ctx->output_buffer;
+ output_len = ctx->len;
+ out_buf_type = TOI_VIRT;
+ }
+
+ }
+
+ mutex_lock(&stats_lock);
+
+ toi_compress_bytes_in += buf_size;
+ toi_compress_bytes_out += output_len;
+
+ mutex_unlock(&stats_lock);
+
+ if (!ret)
+ ret = next_driver->write_page(index, out_buf_type,
+ output_buffer, output_len);
+
+ return ret;
+}
+
+/*
+ * toi_compress_read_page()
+ * @buffer_page: struct page *. Pointer to a buffer of size PAGE_SIZE.
+ *
+ * Retrieve data from later modules and decompress it until the input buffer
+ * is filled.
+ * Zero if successful. Error condition from me or from downstream on failure.
+ */
+static int toi_compress_read_page(unsigned long *index, int buf_type,
+ void *buffer_page, unsigned int *buf_size)
+{
+ int ret, cpu = smp_processor_id();
+ unsigned int len;
+ unsigned int outlen = PAGE_SIZE;
+ char *buffer_start;
+ struct cpu_context *ctx = &per_cpu(contexts, cpu);
+
+ if (!ctx->transform)
+ return next_driver->read_page(index, TOI_PAGE, buffer_page,
+ buf_size);
+
+ /*
+ * All our reads must be synchronous - we can't decompress
+ * data that hasn't been read yet.
+ */
+
+ ret = next_driver->read_page(index, TOI_VIRT, ctx->page_buffer, &len);
+
+	buffer_start = TOI_MAP(buf_type, buffer_page);
+
+ /* Error or uncompressed data */
+ if (ret || len == PAGE_SIZE) {
+ memcpy(buffer_start, ctx->page_buffer, len);
+ goto out;
+ }
+
+ ret = crypto_comp_decompress(
+ ctx->transform,
+ ctx->page_buffer,
+ len, buffer_start, &outlen);
+
+ toi_message(TOI_COMPRESS, TOI_VERBOSE, 0,
+ "CPU %d, index %lu: %d=>%d (%d).",
+ cpu, *index, len, outlen, ret);
+
+ if (ret)
+ abort_hibernate(TOI_FAILED_IO,
+ "Compress_read returned %d.\n", ret);
+ else if (outlen != PAGE_SIZE) {
+ abort_hibernate(TOI_FAILED_IO,
+ "Decompression yielded %d bytes instead of %ld.\n",
+ outlen, PAGE_SIZE);
+ printk(KERN_ERR "Decompression yielded %d bytes instead of "
+ "%ld.\n", outlen, PAGE_SIZE);
+ ret = -EIO;
+ *buf_size = outlen;
+ }
+out:
+ TOI_UNMAP(buf_type, buffer_page);
+ return ret;
+}
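+
+/*
+ * Note on the read path above (explanatory only): the write path stores a
+ * page in compressed form only when the compressor actually shrank it, so
+ * a record of exactly PAGE_SIZE bytes is raw data and is copied straight
+ * through, while anything shorter is handed to crypto_comp_decompress().
+ */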
+
+/*
+ * toi_compress_print_debug_stats
+ * @buffer: Pointer to a buffer into which the debug info will be printed.
+ * @size: Size of the buffer.
+ *
+ * Print information to be recorded for debugging purposes into a buffer.
+ * Returns: Number of characters written to the buffer.
+ */
+
+static int toi_compress_print_debug_stats(char *buffer, int size)
+{
+ unsigned long pages_in = toi_compress_bytes_in >> PAGE_SHIFT,
+ pages_out = toi_compress_bytes_out >> PAGE_SHIFT;
+ int len;
+
+ /* Output the compression ratio achieved. */
+ if (*toi_compressor_name)
+ len = scnprintf(buffer, size, "- Compressor is '%s'.\n",
+ toi_compressor_name);
+ else
+ len = scnprintf(buffer, size, "- Compressor is not set.\n");
+
+ if (pages_in)
+ len += scnprintf(buffer+len, size - len, " Compressed "
+ "%lu bytes into %lu (%ld percent compression).\n",
+ toi_compress_bytes_in,
+ toi_compress_bytes_out,
+ (pages_in - pages_out) * 100 / pages_in);
+ return len;
+}
+
+/*
+ * toi_compress_memory_needed
+ *
+ * Tell the caller how much memory we need to operate during hibernate/resume.
+ * Returns:	Int. Maximum number of bytes of memory required for
+ * operation.
+ */
+static int toi_compress_memory_needed(void)
+{
+ return 2 * PAGE_SIZE;
+}
+
+static int toi_compress_storage_needed(void)
+{
+ return 2 * sizeof(unsigned long) + 2 * sizeof(int) +
+ strlen(toi_compressor_name) + 1;
+}
+
+/*
+ * toi_compress_save_config_info
+ * @buffer: Pointer to a buffer of size PAGE_SIZE.
+ *
+ * Save information needed when reloading the image at resume time.
+ * Returns: Number of bytes used for saving our data.
+ */
+static int toi_compress_save_config_info(char *buffer)
+{
+ int len = strlen(toi_compressor_name) + 1, offset = 0;
+
+ *((unsigned long *) buffer) = toi_compress_bytes_in;
+ offset += sizeof(unsigned long);
+ *((unsigned long *) (buffer + offset)) = toi_compress_bytes_out;
+ offset += sizeof(unsigned long);
+ *((int *) (buffer + offset)) = toi_expected_compression;
+ offset += sizeof(int);
+ *((int *) (buffer + offset)) = len;
+ offset += sizeof(int);
+ strncpy(buffer + offset, toi_compressor_name, len);
+ return offset + len;
+}
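+
+/*
+ * For reference, the record laid out by toi_compress_save_config_info()
+ * above (field widths are those of the running kernel's types):
+ *
+ *	[unsigned long]	toi_compress_bytes_in
+ *	[unsigned long]	toi_compress_bytes_out
+ *	[int]		toi_expected_compression
+ *	[int]		length of the compressor name, including the NUL
+ *	[len bytes]	toi_compressor_name ("lzo" by default)
+ *
+ * toi_compress_load_config_info() below unpacks the same layout.
+ */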
+
+/* toi_compress_load_config_info
+ * @buffer: Pointer to the start of the data.
+ * @size: Number of bytes that were saved.
+ *
+ * Description: Reload information needed for decompressing the image at
+ * resume time.
+ */
+static void toi_compress_load_config_info(char *buffer, int size)
+{
+ int len, offset = 0;
+
+ toi_compress_bytes_in = *((unsigned long *) buffer);
+ offset += sizeof(unsigned long);
+ toi_compress_bytes_out = *((unsigned long *) (buffer + offset));
+ offset += sizeof(unsigned long);
+ toi_expected_compression = *((int *) (buffer + offset));
+ offset += sizeof(int);
+ len = *((int *) (buffer + offset));
+ offset += sizeof(int);
+ strncpy(toi_compressor_name, buffer + offset, len);
+}
+
+static void toi_compress_pre_atomic_restore(struct toi_boot_kernel_data *bkd)
+{
+ bkd->compress_bytes_in = toi_compress_bytes_in;
+ bkd->compress_bytes_out = toi_compress_bytes_out;
+}
+
+static void toi_compress_post_atomic_restore(struct toi_boot_kernel_data *bkd)
+{
+ toi_compress_bytes_in = bkd->compress_bytes_in;
+ toi_compress_bytes_out = bkd->compress_bytes_out;
+}
+
+/*
+ * toi_expected_compression_ratio
+ *
+ * Description: Returns the expected ratio between data passed into this module
+ * and the amount of data output when writing.
+ * Returns: 100 if the module is disabled. Otherwise the value set by the
+ * user via our sysfs entry.
+ */
+
+static int toi_compress_expected_ratio(void)
+{
+ if (!toi_compression_ops.enabled)
+ return 100;
+ else
+ return 100 - toi_expected_compression;
+}
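+
+/*
+ * Worked example (illustrative): with the default expected_compression of
+ * 0 this returns 100, i.e. storage estimates assume no space saving; if
+ * the user writes 60 to
+ * /sys/power/tuxonice/compression/expected_compression, the estimate
+ * drops to 40% of the uncompressed size. A disabled module always
+ * reports 100.
+ */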
+
+/*
+ * data for our sysfs entries.
+ */
+static struct toi_sysfs_data sysfs_params[] = {
+ SYSFS_INT("expected_compression", SYSFS_RW, &toi_expected_compression,
+ 0, 99, 0, NULL),
+ SYSFS_INT("enabled", SYSFS_RW, &toi_compression_ops.enabled, 0, 1, 0,
+ NULL),
+ SYSFS_STRING("algorithm", SYSFS_RW, toi_compressor_name, 31, 0, NULL),
+};
+
+/*
+ * Ops structure.
+ */
+static struct toi_module_ops toi_compression_ops = {
+ .type = FILTER_MODULE,
+ .name = "compression",
+ .directory = "compression",
+ .module = THIS_MODULE,
+ .initialise = toi_compress_init,
+ .memory_needed = toi_compress_memory_needed,
+ .print_debug_info = toi_compress_print_debug_stats,
+ .save_config_info = toi_compress_save_config_info,
+ .load_config_info = toi_compress_load_config_info,
+ .storage_needed = toi_compress_storage_needed,
+ .expected_compression = toi_compress_expected_ratio,
+
+ .pre_atomic_restore = toi_compress_pre_atomic_restore,
+ .post_atomic_restore = toi_compress_post_atomic_restore,
+
+ .rw_init = toi_compress_rw_init,
+ .rw_cleanup = toi_compress_rw_cleanup,
+
+ .write_page = toi_compress_write_page,
+ .read_page = toi_compress_read_page,
+
+ .sysfs_data = sysfs_params,
+ .num_sysfs_entries = sizeof(sysfs_params) /
+ sizeof(struct toi_sysfs_data),
+};
+
+/* ---- Registration ---- */
+
+static __init int toi_compress_load(void)
+{
+ return toi_register_module(&toi_compression_ops);
+}
+
+#ifdef MODULE
+static __exit void toi_compress_unload(void)
+{
+ toi_unregister_module(&toi_compression_ops);
+}
+
+module_init(toi_compress_load);
+module_exit(toi_compress_unload);
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Nigel Cunningham");
+MODULE_DESCRIPTION("Compression Support for TuxOnIce");
+#else
+late_initcall(toi_compress_load);
+#endif
diff --git a/kernel/power/tuxonice_extent.c b/kernel/power/tuxonice_extent.c
new file mode 100644
index 0000000..cf111c1
--- /dev/null
+++ b/kernel/power/tuxonice_extent.c
@@ -0,0 +1,123 @@
+/*
+ * kernel/power/tuxonice_extent.c
+ *
+ * Copyright (C) 2003-2014 Nigel Cunningham (nigel at tuxonice net)
+ *
+ * Distributed under GPLv2.
+ *
+ * These functions encapsulate the manipulation of storage metadata.
+ */
+
+#include <linux/suspend.h>
+#include "tuxonice_modules.h"
+#include "tuxonice_extent.h"
+#include "tuxonice_alloc.h"
+#include "tuxonice_ui.h"
+#include "tuxonice.h"
+
+/**
+ * toi_get_extent - return a free extent
+ *
+ * May fail, returning NULL instead.
+ **/
+static struct hibernate_extent *toi_get_extent(void)
+{
+ return (struct hibernate_extent *) toi_kzalloc(2,
+ sizeof(struct hibernate_extent), TOI_ATOMIC_GFP);
+}
+
+/**
+ * toi_put_extent_chain - free a whole chain of extents
+ * @chain: Chain to free.
+ **/
+void toi_put_extent_chain(struct hibernate_extent_chain *chain)
+{
+ struct hibernate_extent *this;
+
+ this = chain->first;
+
+ while (this) {
+ struct hibernate_extent *next = this->next;
+ toi_kfree(2, this, sizeof(*this));
+ chain->num_extents--;
+ this = next;
+ }
+
+ chain->first = NULL;
+ chain->last_touched = NULL;
+ chain->current_extent = NULL;
+ chain->size = 0;
+}
+EXPORT_SYMBOL_GPL(toi_put_extent_chain);
+
+/**
+ * toi_add_to_extent_chain - add an extent to an existing chain
+ * @chain: Chain to which the extend should be added
+ * @start: Start of the extent (first physical block)
+ * @end: End of the extent (last physical block)
+ *
+ * The chain information is updated if the insertion is successful.
+ **/
+int toi_add_to_extent_chain(struct hibernate_extent_chain *chain,
+ unsigned long start, unsigned long end)
+{
+ struct hibernate_extent *new_ext = NULL, *cur_ext = NULL;
+
+ toi_message(TOI_IO, TOI_VERBOSE, 0,
+ "Adding extent %lu-%lu to chain %p.\n", start, end, chain);
+
+ /* Find the right place in the chain */
+ if (chain->last_touched && chain->last_touched->start < start)
+ cur_ext = chain->last_touched;
+ else if (chain->first && chain->first->start < start)
+ cur_ext = chain->first;
+
+ if (cur_ext) {
+ while (cur_ext->next && cur_ext->next->start < start)
+ cur_ext = cur_ext->next;
+
+ if (cur_ext->end == (start - 1)) {
+ struct hibernate_extent *next_ext = cur_ext->next;
+ cur_ext->end = end;
+
+ /* Merge with the following one? */
+ if (next_ext && cur_ext->end + 1 == next_ext->start) {
+ cur_ext->end = next_ext->end;
+ cur_ext->next = next_ext->next;
+ toi_kfree(2, next_ext, sizeof(*next_ext));
+ chain->num_extents--;
+ }
+
+ chain->last_touched = cur_ext;
+ chain->size += (end - start + 1);
+
+ return 0;
+ }
+ }
+
+ new_ext = toi_get_extent();
+ if (!new_ext) {
+		printk(KERN_INFO "Error: unable to append a new extent to the "
+ "chain.\n");
+ return -ENOMEM;
+ }
+
+ chain->num_extents++;
+ chain->size += (end - start + 1);
+ new_ext->start = start;
+ new_ext->end = end;
+
+ chain->last_touched = new_ext;
+
+ if (cur_ext) {
+ new_ext->next = cur_ext->next;
+ cur_ext->next = new_ext;
+ } else {
+ if (chain->first)
+ new_ext->next = chain->first;
+ chain->first = new_ext;
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(toi_add_to_extent_chain);
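+
+/*
+ * Worked example of the merging behaviour above (illustrative only):
+ * adding 10-19 to an empty chain creates one extent; adding 20-29 next
+ * extends it in place to 10-29, since cur_ext->end == start - 1; adding
+ * 40-49 then appends a second extent, leaving num_extents == 2 and
+ * size == 30.
+ */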
diff --git a/kernel/power/tuxonice_extent.h b/kernel/power/tuxonice_extent.h
new file mode 100644
index 0000000..3c9a737
--- /dev/null
+++ b/kernel/power/tuxonice_extent.h
@@ -0,0 +1,44 @@
+/*
+ * kernel/power/tuxonice_extent.h
+ *
+ * Copyright (C) 2003-2014 Nigel Cunningham (nigel at tuxonice net)
+ *
+ * This file is released under the GPLv2.
+ *
+ * It contains declarations related to extents. Extents are
+ * TuxOnIce's method of storing some of the metadata for the image.
+ * See tuxonice_extent.c for more info.
+ *
+ */
+
+#include "tuxonice_modules.h"
+
+#ifndef EXTENT_H
+#define EXTENT_H
+
+struct hibernate_extent {
+ unsigned long start, end;
+ struct hibernate_extent *next;
+};
+
+struct hibernate_extent_chain {
+	unsigned long size; /* size of the chain, i.e. sum of (max - min + 1) */
+ int num_extents;
+ struct hibernate_extent *first, *last_touched;
+ struct hibernate_extent *current_extent;
+ unsigned long current_offset;
+};
+
+/* Simplify iterating through all the values in an extent chain */
+#define toi_extent_for_each(extent_chain, extentpointer, value) \
+if ((extent_chain)->first) \
+ for ((extentpointer) = (extent_chain)->first, (value) = \
+ (extentpointer)->start; \
+ ((extentpointer) && ((extentpointer)->next || (value) <= \
+ (extentpointer)->end)); \
+ (((value) == (extentpointer)->end) ? \
+ ((extentpointer) = (extentpointer)->next, (value) = \
+ ((extentpointer) ? (extentpointer)->start : 0)) : \
+ (value)++))
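+
+/*
+ * Illustrative use of the iterator above. 'chain' is a hypothetical,
+ * already-populated struct hibernate_extent_chain and handle_block()
+ * stands in for whatever the caller does with each value:
+ *
+ *	struct hibernate_extent *ext;
+ *	unsigned long block;
+ *
+ *	toi_extent_for_each(&chain, ext, block)
+ *		handle_block(block);
+ */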
+
+#endif
diff --git a/kernel/power/tuxonice_file.c b/kernel/power/tuxonice_file.c
new file mode 100644
index 0000000..b425767
--- /dev/null
+++ b/kernel/power/tuxonice_file.c
@@ -0,0 +1,497 @@
+/*
+ * kernel/power/tuxonice_file.c
+ *
+ * Copyright (C) 2005-2014 Nigel Cunningham (nigel at tuxonice net)
+ *
+ * Distributed under GPLv2.
+ *
+ * This file encapsulates functions for usage of a simple file as a
+ * backing store. It is based upon the swapallocator, and shares the
+ * same basic working. Here, though, we have nothing to do with
+ * swapspace, and only one device to worry about.
+ *
+ * The user can just
+ *
+ * echo TuxOnIce > /path/to/my_file
+ *
+ * dd if=/dev/zero bs=1M count=<file_size_desired> >> /path/to/my_file
+ *
+ * and
+ *
+ * echo /path/to/my_file > /sys/power/tuxonice/file/target
+ *
+ * then put what they find in /sys/power/tuxonice/resume
+ * as their resume= parameter in lilo.conf (and rerun lilo if using it).
+ *
+ * Having done this, they're ready to hibernate and resume.
+ *
+ * TODO:
+ * - File resizing.
+ */
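+
+/*
+ * A concrete run of the steps above, purely for illustration (the path
+ * and the 2 GB size are hypothetical):
+ *
+ *	echo TuxOnIce > /hibernation/image
+ *	dd if=/dev/zero bs=1M count=2048 >> /hibernation/image
+ *	echo /hibernation/image > /sys/power/tuxonice/file/target
+ *	cat /sys/power/tuxonice/resume	(use this as the resume= parameter)
+ */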
+
+#include <linux/blkdev.h>
+#include <linux/mount.h>
+#include <linux/fs.h>
+#include <linux/fs_uuid.h>
+
+#include "tuxonice.h"
+#include "tuxonice_modules.h"
+#include "tuxonice_bio.h"
+#include "tuxonice_alloc.h"
+#include "tuxonice_builtin.h"
+#include "tuxonice_sysfs.h"
+#include "tuxonice_ui.h"
+#include "tuxonice_io.h"
+
+#define target_is_normal_file() (S_ISREG(target_inode->i_mode))
+
+static struct toi_module_ops toi_fileops;
+
+static struct file *target_file;
+static struct block_device *toi_file_target_bdev;
+static unsigned long pages_available, pages_allocated;
+static char toi_file_target[256];
+static struct inode *target_inode;
+static int file_target_priority;
+static int used_devt;
+static int target_claim;
+static dev_t toi_file_dev_t;
+static int sig_page_index;
+
+/* For test_toi_file_target */
+static struct toi_bdev_info *file_chain;
+
+static int has_contiguous_blocks(struct toi_bdev_info *dev_info, int page_num)
+{
+ int j;
+ sector_t last = 0;
+
+ for (j = 0; j < dev_info->blocks_per_page; j++) {
+ sector_t this = bmap(target_inode,
+ page_num * dev_info->blocks_per_page + j);
+
+ if (!this || (last && (last + 1) != this))
+ break;
+
+ last = this;
+ }
+
+ return j == dev_info->blocks_per_page;
+}
+
+static unsigned long get_usable_pages(struct toi_bdev_info *dev_info)
+{
+ unsigned long result = 0;
+ struct block_device *bdev = dev_info->bdev;
+ int i;
+
+ switch (target_inode->i_mode & S_IFMT) {
+ case S_IFSOCK:
+ case S_IFCHR:
+ case S_IFIFO: /* Socket, Char, Fifo */
+ return -1;
+ case S_IFREG: /* Regular file: current size - holes + free
+ space on part */
+ for (i = 0; i < (target_inode->i_size >> PAGE_SHIFT) ; i++) {
+ if (has_contiguous_blocks(dev_info, i))
+ result++;
+ }
+ break;
+ case S_IFBLK: /* Block device */
+ if (!bdev->bd_disk) {
+ toi_message(TOI_IO, TOI_VERBOSE, 0,
+ "bdev->bd_disk null.");
+ return 0;
+ }
+
+ result = (bdev->bd_part ?
+ bdev->bd_part->nr_sects :
+ get_capacity(bdev->bd_disk)) >> (PAGE_SHIFT - 9);
+ }
+
+
+ return result;
+}
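+
+/*
+ * Worked example for the block-device branch above (assuming 4 KiB pages,
+ * i.e. PAGE_SHIFT == 12): nr_sects counts 512-byte sectors, so shifting
+ * right by PAGE_SHIFT - 9 == 3 divides by 8; a 1 GiB partition of
+ * 2097152 sectors therefore reports 262144 usable pages.
+ */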
+
+static int toi_file_register_storage(void)
+{
+ struct toi_bdev_info *devinfo;
+ int result = 0;
+ struct fs_info *fs_info;
+
+ toi_message(TOI_IO, TOI_VERBOSE, 0, "toi_file_register_storage.");
+ if (!strlen(toi_file_target)) {
+ toi_message(TOI_IO, TOI_VERBOSE, 0, "Register file storage: "
+ "No target filename set.");
+ return 0;
+ }
+
+ target_file = filp_open(toi_file_target, O_RDONLY|O_LARGEFILE, 0);
+ toi_message(TOI_IO, TOI_VERBOSE, 0, "filp_open %s returned %p.",
+ toi_file_target, target_file);
+
+ if (IS_ERR(target_file) || !target_file) {
+ target_file = NULL;
+ toi_file_dev_t = name_to_dev_t(toi_file_target);
+ if (!toi_file_dev_t) {
+ struct kstat stat;
+ int error = vfs_stat(toi_file_target, &stat);
+ printk(KERN_INFO "Open file %s returned %p and "
+ "name_to_devt failed.\n",
+ toi_file_target, target_file);
+ if (error) {
+ printk(KERN_INFO "Stating the file also failed."
+ " Nothing more we can do.\n");
+ return 0;
+ } else
+ toi_file_dev_t = stat.rdev;
+ }
+
+ toi_file_target_bdev = toi_open_by_devnum(toi_file_dev_t);
+ if (IS_ERR(toi_file_target_bdev)) {
+ printk(KERN_INFO "Got a dev_num (%lx) but failed to "
+ "open it.\n",
+ (unsigned long) toi_file_dev_t);
+ toi_file_target_bdev = NULL;
+ return 0;
+ }
+ used_devt = 1;
+ target_inode = toi_file_target_bdev->bd_inode;
+ } else
+ target_inode = target_file->f_mapping->host;
+
+ toi_message(TOI_IO, TOI_VERBOSE, 0, "Succeeded in opening the target.");
+ if (S_ISLNK(target_inode->i_mode) || S_ISDIR(target_inode->i_mode) ||
+ S_ISSOCK(target_inode->i_mode) || S_ISFIFO(target_inode->i_mode)) {
+ printk(KERN_INFO "File support works with regular files,"
+ " character files and block devices.\n");
+ /* Cleanup routine will undo the above */
+ return 0;
+ }
+
+ if (!used_devt) {
+ if (S_ISBLK(target_inode->i_mode)) {
+ toi_file_target_bdev = I_BDEV(target_inode);
+ if (!blkdev_get(toi_file_target_bdev, FMODE_WRITE |
+ FMODE_READ, NULL))
+ target_claim = 1;
+ } else
+ toi_file_target_bdev = target_inode->i_sb->s_bdev;
+ if (!toi_file_target_bdev) {
+ printk(KERN_INFO "%s is not a valid file allocator "
+ "target.\n", toi_file_target);
+ return 0;
+ }
+ toi_file_dev_t = toi_file_target_bdev->bd_dev;
+ }
+
+ devinfo = toi_kzalloc(39, sizeof(struct toi_bdev_info), GFP_ATOMIC);
+ if (!devinfo) {
+		printk(KERN_ERR "Failed to allocate a toi_bdev_info struct for the file allocator.\n");
+ return -ENOMEM;
+ }
+
+ devinfo->bdev = toi_file_target_bdev;
+ devinfo->allocator = &toi_fileops;
+ devinfo->allocator_index = 0;
+
+ fs_info = fs_info_from_block_dev(toi_file_target_bdev);
+ if (fs_info && !IS_ERR(fs_info)) {
+ memcpy(devinfo->uuid, &fs_info->uuid, 16);
+ free_fs_info(fs_info);
+ } else
+ result = (int) PTR_ERR(fs_info);
+
+ /* Unlike swap code, only complain if fs_info_from_block_dev returned
+ * -ENOMEM. The 'file' might be a full partition, so might validly not
+ * have an identifiable type, UUID etc.
+ */
+ if (result)
+ printk(KERN_DEBUG "Failed to get fs_info for file device (%d).\n",
+ result);
+ devinfo->dev_t = toi_file_dev_t;
+ devinfo->prio = file_target_priority;
+ devinfo->bmap_shift = target_inode->i_blkbits - 9;
+ devinfo->blocks_per_page =
+ (1 << (PAGE_SHIFT - target_inode->i_blkbits));
+ sprintf(devinfo->name, "file %s", toi_file_target);
+ file_chain = devinfo;
+ toi_message(TOI_IO, TOI_VERBOSE, 0, "Dev_t is %lx. Prio is %d. Bmap "
+ "shift is %d. Blocks per page %d.",
+ devinfo->dev_t, devinfo->prio, devinfo->bmap_shift,
+ devinfo->blocks_per_page);
+
+ /* Keep one aside for the signature */
+ pages_available = get_usable_pages(devinfo) - 1;
+
+ toi_message(TOI_IO, TOI_VERBOSE, 0, "Registering file storage, %lu "
+ "pages.", pages_available);
+
+ toi_bio_ops.register_storage(devinfo);
+ return 0;
+}
+
+static unsigned long toi_file_storage_available(void)
+{
+ return pages_available;
+}
+
+static int toi_file_allocate_storage(struct toi_bdev_info *chain,
+ unsigned long request)
+{
+ unsigned long available = pages_available - pages_allocated;
+ unsigned long to_add = min(available, request);
+
+ toi_message(TOI_IO, TOI_VERBOSE, 0, "Pages available is %lu. Allocated "
+ "is %lu. Allocating %lu pages from file.",
+ pages_available, pages_allocated, to_add);
+ pages_allocated += to_add;
+
+ return to_add;
+}
+
+/**
+ * __populate_block_list - add an extent to the chain
+ * @min: Start of the extent (first physical block = sector)
+ * @max: End of the extent (last physical block = sector)
+ *
+ * If TOI_TEST_BIO is set, print a debug message, outputting the min and max
+ * fs block numbers.
+ **/
+static int __populate_block_list(struct toi_bdev_info *chain, int min, int max)
+{
+ if (test_action_state(TOI_TEST_BIO))
+ toi_message(TOI_IO, TOI_VERBOSE, 0, "Adding extent %d-%d.",
+ min << chain->bmap_shift,
+ ((max + 1) << chain->bmap_shift) - 1);
+
+ return toi_add_to_extent_chain(&chain->blocks, min, max);
+}
+
+static int get_main_pool_phys_params(struct toi_bdev_info *chain)
+{
+ int i, extent_min = -1, extent_max = -1, result = 0, have_sig_page = 0;
+ unsigned long pages_mapped = 0;
+
+ toi_message(TOI_IO, TOI_VERBOSE, 0, "Getting file allocator blocks.");
+
+ if (chain->blocks.first)
+ toi_put_extent_chain(&chain->blocks);
+
+ if (!target_is_normal_file()) {
+ result = (pages_available > 0) ?
+ __populate_block_list(chain, chain->blocks_per_page,
+ (pages_allocated + 1) *
+ chain->blocks_per_page - 1) : 0;
+ return result;
+ }
+
+ /*
+ * FIXME: We are assuming the first page is contiguous. Is that
+ * assumption always right?
+ */
+
+ for (i = 0; i < (target_inode->i_size >> PAGE_SHIFT); i++) {
+ sector_t new_sector;
+
+ if (!has_contiguous_blocks(chain, i))
+ continue;
+
+ if (!have_sig_page) {
+ have_sig_page = 1;
+ sig_page_index = i;
+ continue;
+ }
+
+ pages_mapped++;
+
+ /* Ignore first page - it has the header */
+ if (pages_mapped == 1)
+ continue;
+
+ new_sector = bmap(target_inode, (i * chain->blocks_per_page));
+
+ /*
+ * I'd love to be able to fill in holes and resize
+ * files, but not yet...
+ */
+
+ if (new_sector == extent_max + 1)
+ extent_max += chain->blocks_per_page;
+ else {
+ if (extent_min > -1) {
+ result = __populate_block_list(chain,
+ extent_min, extent_max);
+ if (result)
+ return result;
+ }
+
+ extent_min = new_sector;
+ extent_max = extent_min +
+ chain->blocks_per_page - 1;
+ }
+
+ if (pages_mapped == pages_allocated)
+ break;
+ }
+
+ if (extent_min > -1) {
+ result = __populate_block_list(chain, extent_min, extent_max);
+ if (result)
+ return result;
+ }
+
+ return 0;
+}
+
+static void toi_file_free_storage(struct toi_bdev_info *chain)
+{
+ pages_allocated = 0;
+ file_chain = NULL;
+}
+
+/**
+ * toi_file_print_debug_stats - print debug info
+ * @buffer: Buffer to data to populate
+ * @size: Size of the buffer
+ **/
+static int toi_file_print_debug_stats(char *buffer, int size)
+{
+ int len = scnprintf(buffer, size, "- File Allocator active.\n");
+
+ len += scnprintf(buffer+len, size-len, " Storage available for "
+ "image: %lu pages.\n", pages_available);
+
+ return len;
+}
+
+static void toi_file_cleanup(int finishing_cycle)
+{
+ if (toi_file_target_bdev) {
+ if (target_claim) {
+ blkdev_put(toi_file_target_bdev, FMODE_WRITE | FMODE_READ);
+ target_claim = 0;
+ }
+
+ if (used_devt) {
+ blkdev_put(toi_file_target_bdev,
+ FMODE_READ | FMODE_NDELAY);
+ used_devt = 0;
+ }
+ toi_file_target_bdev = NULL;
+ target_inode = NULL;
+ }
+
+ if (target_file) {
+ filp_close(target_file, NULL);
+ target_file = NULL;
+ }
+
+ pages_available = 0;
+}
+
+/**
+ * test_toi_file_target - sysfs callback for /sys/power/tuxonice/file/target
+ *
+ * Test whether the target file is valid for hibernating.
+ **/
+static void test_toi_file_target(void)
+{
+ int result = toi_file_register_storage();
+ sector_t sector;
+ char buf[50];
+ struct fs_info *fs_info;
+
+ if (result || !file_chain)
+ return;
+
+ /* This doesn't mean we're in business. Is any storage available? */
+ if (!pages_available)
+ goto out;
+
+ toi_file_allocate_storage(file_chain, 1);
+ result = get_main_pool_phys_params(file_chain);
+ if (result)
+ goto out;
+
+
+ sector = bmap(target_inode, sig_page_index *
+ file_chain->blocks_per_page) << file_chain->bmap_shift;
+
+ /* Use the uuid, or the dev_t if that fails */
+ fs_info = fs_info_from_block_dev(toi_file_target_bdev);
+ if (!fs_info || IS_ERR(fs_info)) {
+ bdevname(toi_file_target_bdev, buf);
+ sprintf(resume_file, "/dev/%s:%llu", buf,
+ (unsigned long long) sector);
+ } else {
+ int i;
+ hex_dump_to_buffer(fs_info->uuid, 16, 32, 1, buf, 50, 0);
+
+		/* hex_dump_to_buffer emits "xx xx xx ..."; strip the spaces */
+ for (i = 1; i < 16; i++) {
+ buf[2 * i] = buf[3 * i];
+ buf[2 * i + 1] = buf[3 * i + 1];
+ }
+ buf[32] = 0;
+ sprintf(resume_file, "UUID=%s:0x%llx", buf,
+ (unsigned long long) sector);
+ free_fs_info(fs_info);
+ }
+
+ toi_attempt_to_parse_resume_device(0);
+out:
+ toi_file_free_storage(file_chain);
+ toi_bio_ops.free_storage();
+}
+
+static struct toi_sysfs_data sysfs_params[] = {
+ SYSFS_STRING("target", SYSFS_RW, toi_file_target, 256,
+ SYSFS_NEEDS_SM_FOR_WRITE, test_toi_file_target),
+ SYSFS_INT("enabled", SYSFS_RW, &toi_fileops.enabled, 0, 1, 0, NULL),
+ SYSFS_INT("priority", SYSFS_RW, &file_target_priority, -4095,
+ 4096, 0, NULL),
+};
+
+static struct toi_bio_allocator_ops toi_bio_fileops = {
+ .register_storage = toi_file_register_storage,
+ .storage_available = toi_file_storage_available,
+ .allocate_storage = toi_file_allocate_storage,
+ .bmap = get_main_pool_phys_params,
+ .free_storage = toi_file_free_storage,
+};
+
+static struct toi_module_ops toi_fileops = {
+ .type = BIO_ALLOCATOR_MODULE,
+ .name = "file storage",
+ .directory = "file",
+ .module = THIS_MODULE,
+ .print_debug_info = toi_file_print_debug_stats,
+ .cleanup = toi_file_cleanup,
+ .bio_allocator_ops = &toi_bio_fileops,
+
+ .sysfs_data = sysfs_params,
+ .num_sysfs_entries = sizeof(sysfs_params) /
+ sizeof(struct toi_sysfs_data),
+};
+
+/* ---- Registration ---- */
+static __init int toi_file_load(void)
+{
+ return toi_register_module(&toi_fileops);
+}
+
+#ifdef MODULE
+static __exit void toi_file_unload(void)
+{
+ toi_unregister_module(&toi_fileops);
+}
+
+module_init(toi_file_load);
+module_exit(toi_file_unload);
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Nigel Cunningham");
+MODULE_DESCRIPTION("TuxOnIce FileAllocator");
+#else
+late_initcall(toi_file_load);
+#endif
diff --git a/kernel/power/tuxonice_highlevel.c b/kernel/power/tuxonice_highlevel.c
new file mode 100644
index 0000000..4f49e22
--- /dev/null
+++ b/kernel/power/tuxonice_highlevel.c
@@ -0,0 +1,1351 @@
+/*
+ * kernel/power/tuxonice_highlevel.c
+ */
+/** \mainpage TuxOnIce.
+ *
+ * TuxOnIce provides support for saving and restoring an image of
+ * system memory to an arbitrary storage device, either on the local computer,
+ * or across some network. The support is entirely OS based, so TuxOnIce
+ * works without requiring BIOS, APM or ACPI support. The vast majority of the
+ * code is also architecture independent, so it should be very easy to port
+ * the code to new architectures. TuxOnIce includes support for SMP, 4G HighMem
+ * and preemption. Initramfses and initrds are also supported.
+ *
+ * TuxOnIce uses a modular design, in which the method of storing the image is
+ * completely abstracted from the core code, as are transformations on the data
+ * such as compression and/or encryption (multiple 'modules' can be used to
+ * provide arbitrary combinations of functionality). The user interface is also
+ * modular, so that arbitrarily simple or complex interfaces can be used to
+ * provide anything from debugging information through to eye candy.
+ *
+ * \section Copyright
+ *
+ * TuxOnIce is released under the GPLv2.
+ *
+ * Copyright (C) 1998-2001 Gabor Kuti <seasons@fornax.hu><BR>
+ * Copyright (C) 1998,2001,2002 Pavel Machek <pavel@suse.cz><BR>
+ * Copyright (C) 2002-2003 Florent Chabaud <fchabaud@free.fr><BR>
+ * Copyright (C) 2002-2014 Nigel Cunningham (nigel at tuxonice net)<BR>
+ *
+ * \section Credits
+ *
+ * Nigel would like to thank the following people for their work:
+ *
+ * Bernard Blackham <bernard@blackham.com.au><BR>
+ * Web page & Wiki administration, some coding. A person without whom
+ * TuxOnIce would not be where it is.
+ *
+ * Michael Frank <mhf@linuxmail.org><BR>
+ * Extensive testing and help with improving stability. I was constantly
+ * amazed by the quality and quantity of Michael's help.
+ *
+ * Pavel Machek <pavel@ucw.cz><BR>
+ * Modifications, defectiveness pointing, being with Gabor at the very
+ * beginning, suspend to swap space, stop all tasks. Port to 2.4.18-ac and
+ * 2.5.17. Even though Pavel and I disagree on the direction suspend to
+ * disk should take, I appreciate the valuable work he did in helping Gabor
+ * get the concept working.
+ *
+ * ..and of course the myriads of TuxOnIce users who have helped diagnose
+ * and fix bugs, made suggestions on how to improve the code, proofread
+ * documentation, and donated time and money.
+ *
+ * Thanks also to corporate sponsors:
+ *
+ * <B>Redhat.</B> Sometime employer from May 2006 (my fault, not Redhat's!).
+ *
+ * <B>Cyclades.com.</B> Nigel's employers from Dec 2004 until May 2006, who
+ * allowed him to work on TuxOnIce and PM related issues on company time.
+ *
+ * <B>LinuxFund.org.</B> Sponsored Nigel's work on TuxOnIce for four months Oct
+ * 2003 to Jan 2004.
+ *
+ * <B>LAC Linux.</B> Donated P4 hardware that enabled development and ongoing
+ * maintenance of SMP and Highmem support.
+ *
+ * <B>OSDL.</B> Provided access to various hardware configurations, and made
+ * occasional small donations to the project.
+ */
+
+#include <linux/suspend.h>
+#include <linux/freezer.h>
+#include <generated/utsrelease.h>
+#include <linux/cpu.h>
+#include <linux/console.h>
+#include <linux/writeback.h>
+#include <linux/uaccess.h> /* for get/set_fs & KERNEL_DS on i386 */
+#include <linux/bio.h>
+#include <linux/kgdb.h>
+
+#include "tuxonice.h"
+#include "tuxonice_modules.h"
+#include "tuxonice_sysfs.h"
+#include "tuxonice_prepare_image.h"
+#include "tuxonice_io.h"
+#include "tuxonice_ui.h"
+#include "tuxonice_power_off.h"
+#include "tuxonice_storage.h"
+#include "tuxonice_checksum.h"
+#include "tuxonice_builtin.h"
+#include "tuxonice_atomic_copy.h"
+#include "tuxonice_alloc.h"
+#include "tuxonice_cluster.h"
+
+/*! Pageset metadata. */
+struct pagedir pagedir2 = {2};
+EXPORT_SYMBOL_GPL(pagedir2);
+
+static mm_segment_t oldfs;
+static DEFINE_MUTEX(tuxonice_in_use);
+static int block_dump_save;
+
+/* Binary signature if an image is present */
+char tuxonice_signature[9] = "\xed\xc3\x02\xe9\x98\x56\xe5\x0c";
+EXPORT_SYMBOL_GPL(tuxonice_signature);
+
+unsigned long boot_kernel_data_buffer;
+
+static char *result_strings[] = {
+ "Hibernation was aborted",
+ "The user requested that we cancel the hibernation",
+ "No storage was available",
+ "Insufficient storage was available",
+ "Freezing filesystems and/or tasks failed",
+ "A pre-existing image was used",
+ "We would free memory, but image size limit doesn't allow this",
+ "Unable to free enough memory to hibernate",
+ "Unable to obtain the Power Management Semaphore",
+ "A device suspend/resume returned an error",
+ "A system device suspend/resume returned an error",
+ "The extra pages allowance is too small",
+ "We were unable to successfully prepare an image",
+ "TuxOnIce module initialisation failed",
+ "TuxOnIce module cleanup failed",
+ "I/O errors were encountered",
+ "Ran out of memory",
+ "An error was encountered while reading the image",
+ "Platform preparation failed",
+ "CPU Hotplugging failed",
+ "Architecture specific preparation failed",
+ "Pages needed resaving, but we were told to abort if this happens",
+ "We can't hibernate at the moment (invalid resume= or filewriter "
+ "target?)",
+ "A hibernation preparation notifier chain member cancelled the "
+ "hibernation",
+ "Pre-snapshot preparation failed",
+ "Pre-restore preparation failed",
+ "Failed to disable usermode helpers",
+ "Can't resume from alternate image",
+ "Header reservation too small",
+ "Device Power Management Preparation failed",
+};
+
+/**
+ * toi_finish_anything - cleanup after doing anything
+ * @hibernate_or_resume: Whether finishing a cycle or attempt at
+ * resuming.
+ *
+ * This is our basic clean-up routine, matching start_anything below. We
+ * call cleanup routines, drop module references and restore process fs and
+ * cpus allowed masks, together with the global block_dump variable's value.
+ **/
+void toi_finish_anything(int hibernate_or_resume)
+{
+ toi_cleanup_modules(hibernate_or_resume);
+ toi_put_modules();
+ if (hibernate_or_resume) {
+ block_dump = block_dump_save;
+ set_cpus_allowed_ptr(current, cpu_all_mask);
+ toi_alloc_print_debug_stats();
+ atomic_inc(&snapshot_device_available);
+ unlock_system_sleep();
+ }
+
+ set_fs(oldfs);
+ mutex_unlock(&tuxonice_in_use);
+}
+
+/**
+ * toi_start_anything - basic initialisation for TuxOnIce
+ * @hibernate_or_resume:	Whether starting a cycle or attempt at resuming.
+ *
+ * Our basic initialisation routine. Take references on modules, use the
+ * kernel segment, recheck resume= if no active allocator is set, initialise
+ * modules, save and reset block_dump and ensure we're running on CPU0.
+ **/
+int toi_start_anything(int hibernate_or_resume)
+{
+ mutex_lock(&tuxonice_in_use);
+
+ oldfs = get_fs();
+ set_fs(KERNEL_DS);
+
+ if (hibernate_or_resume) {
+ lock_system_sleep();
+
+ if (!atomic_add_unless(&snapshot_device_available, -1, 0))
+ goto snapshotdevice_unavailable;
+ }
+
+ if (hibernate_or_resume == SYSFS_HIBERNATE)
+ toi_print_modules();
+
+ if (toi_get_modules()) {
+ printk(KERN_INFO "TuxOnIce: Get modules failed!\n");
+ goto prehibernate_err;
+ }
+
+ if (hibernate_or_resume) {
+ block_dump_save = block_dump;
+ block_dump = 0;
+ set_cpus_allowed_ptr(current,
+ cpumask_of(cpumask_first(cpu_online_mask)));
+ }
+
+ if (toi_initialise_modules_early(hibernate_or_resume))
+ goto early_init_err;
+
+ if (!toiActiveAllocator)
+ toi_attempt_to_parse_resume_device(!hibernate_or_resume);
+
+ if (!toi_initialise_modules_late(hibernate_or_resume))
+ return 0;
+
+ toi_cleanup_modules(hibernate_or_resume);
+early_init_err:
+ if (hibernate_or_resume) {
+ block_dump_save = block_dump;
+ set_cpus_allowed_ptr(current, cpu_all_mask);
+ }
+ toi_put_modules();
+prehibernate_err:
+ if (hibernate_or_resume)
+ atomic_inc(&snapshot_device_available);
+snapshotdevice_unavailable:
+ if (hibernate_or_resume)
+		unlock_system_sleep();
+ set_fs(oldfs);
+ mutex_unlock(&tuxonice_in_use);
+ return -EBUSY;
+}
+
+/*
+ * Nosave page tracking.
+ *
+ * Here rather than in prepare_image because we want to do it once only at the
+ * start of a cycle.
+ */
+
+/**
+ * mark_nosave_pages - set up our Nosave bitmap
+ *
+ * Build a bitmap of Nosave pages from the list. The bitmap allows faster
+ * use when preparing the image.
+ **/
+static void mark_nosave_pages(void)
+{
+ struct nosave_region *region;
+
+ list_for_each_entry(region, &nosave_regions, list) {
+ unsigned long pfn;
+
+ for (pfn = region->start_pfn; pfn < region->end_pfn; pfn++)
+ if (pfn_valid(pfn))
+ SetPageNosave(pfn_to_page(pfn));
+ }
+}
+
+static int toi_alloc_bitmap(struct memory_bitmap **bm)
+{
+ int result = 0;
+
+ *bm = kzalloc(sizeof(struct memory_bitmap), GFP_KERNEL);
+ if (!*bm) {
+ printk(KERN_ERR "Failed to kzalloc memory for a bitmap.\n");
+ return -ENOMEM;
+ }
+
+ result = memory_bm_create(*bm, GFP_KERNEL, 0);
+
+ if (result) {
+ printk(KERN_ERR "Failed to create a bitmap.\n");
+ kfree(*bm);
+ *bm = NULL;
+ }
+
+ return result;
+}
+
+/**
+ * allocate_bitmaps - allocate bitmaps used to record page states
+ *
+ * Allocate the bitmaps we use to record the various TuxOnIce related
+ * page states.
+ **/
+static int allocate_bitmaps(void)
+{
+ if (toi_alloc_bitmap(&pageset1_map) ||
+ toi_alloc_bitmap(&pageset1_copy_map) ||
+ toi_alloc_bitmap(&pageset2_map) ||
+ toi_alloc_bitmap(&io_map) ||
+ toi_alloc_bitmap(&nosave_map) ||
+ toi_alloc_bitmap(&free_map) ||
+ toi_alloc_bitmap(&compare_map) ||
+ toi_alloc_bitmap(&page_resave_map))
+ return 1;
+
+ return 0;
+}
+
+static void toi_free_bitmap(struct memory_bitmap **bm)
+{
+ if (!*bm)
+ return;
+
+ memory_bm_free(*bm, 0);
+ kfree(*bm);
+ *bm = NULL;
+}
+
+/**
+ * free_bitmaps - free the bitmaps used to record page states
+ *
+ * Free the bitmaps allocated above. It is not an error to call
+ * memory_bm_free on a bitmap that isn't currently allocated.
+ **/
+static void free_bitmaps(void)
+{
+ toi_free_bitmap(&pageset1_map);
+ toi_free_bitmap(&pageset1_copy_map);
+ toi_free_bitmap(&pageset2_map);
+ toi_free_bitmap(&io_map);
+ toi_free_bitmap(&nosave_map);
+ toi_free_bitmap(&free_map);
+ toi_free_bitmap(&compare_map);
+ toi_free_bitmap(&page_resave_map);
+}
+
+/**
+ * io_MB_per_second - return the number of MB/s read or written
+ * @write: Whether to return the speed at which we wrote.
+ *
+ * Calculate the number of megabytes per second that were read or written.
+ **/
+static int io_MB_per_second(int write)
+{
+ return (toi_bkd.toi_io_time[write][1]) ?
+ MB((unsigned long) toi_bkd.toi_io_time[write][0]) * HZ /
+ toi_bkd.toi_io_time[write][1] : 0;
+}
+
+#define SNPRINTF(a...) do { len += scnprintf(((char *) buffer) + len, \
+ count - len - 1, ## a); } while (0)
+
+/**
+ * get_debug_info - fill a buffer with debugging information
+ * @buffer: The buffer to be filled.
+ * @count: The size of the buffer, in bytes.
+ *
+ * Fill a (usually PAGE_SIZEd) buffer with the debugging info that we will
+ * either printk or return via sysfs.
+ **/
+static int get_toi_debug_info(const char *buffer, int count)
+{
+ int len = 0, i, first_result = 1;
+
+ SNPRINTF("TuxOnIce debugging info:\n");
+ SNPRINTF("- TuxOnIce core : " TOI_CORE_VERSION "\n");
+ SNPRINTF("- Kernel Version : " UTS_RELEASE "\n");
+ SNPRINTF("- Compiler vers. : %d.%d\n", __GNUC__, __GNUC_MINOR__);
+ SNPRINTF("- Attempt number : %d\n", nr_hibernates);
+ SNPRINTF("- Parameters : %ld %ld %ld %d %ld %ld\n",
+ toi_result,
+ toi_bkd.toi_action,
+ toi_bkd.toi_debug_state,
+ toi_bkd.toi_default_console_level,
+ image_size_limit,
+ toi_poweroff_method);
+ SNPRINTF("- Overall expected compression percentage: %d.\n",
+ 100 - toi_expected_compression_ratio());
+ len += toi_print_module_debug_info(((char *) buffer) + len,
+ count - len - 1);
+ if (toi_bkd.toi_io_time[0][1]) {
+ if ((io_MB_per_second(0) < 5) || (io_MB_per_second(1) < 5)) {
+ SNPRINTF("- I/O speed: Write %ld KB/s",
+ (KB((unsigned long) toi_bkd.toi_io_time[0][0]) * HZ /
+ toi_bkd.toi_io_time[0][1]));
+ if (toi_bkd.toi_io_time[1][1])
+ SNPRINTF(", Read %ld KB/s",
+ (KB((unsigned long)
+ toi_bkd.toi_io_time[1][0]) * HZ /
+ toi_bkd.toi_io_time[1][1]));
+ } else {
+ SNPRINTF("- I/O speed: Write %ld MB/s",
+ (MB((unsigned long) toi_bkd.toi_io_time[0][0]) * HZ /
+ toi_bkd.toi_io_time[0][1]));
+ if (toi_bkd.toi_io_time[1][1])
+ SNPRINTF(", Read %ld MB/s",
+ (MB((unsigned long)
+ toi_bkd.toi_io_time[1][0]) * HZ /
+ toi_bkd.toi_io_time[1][1]));
+ }
+ SNPRINTF(".\n");
+ } else
+ SNPRINTF("- No I/O speed stats available.\n");
+ SNPRINTF("- Extra pages : %lu used/%lu.\n",
+ extra_pd1_pages_used, extra_pd1_pages_allowance);
+
+ for (i = 0; i < TOI_NUM_RESULT_STATES; i++)
+ if (test_result_state(i)) {
+ SNPRINTF("%s: %s.\n", first_result ?
+ "- Result " :
+ " ",
+ result_strings[i]);
+ first_result = 0;
+ }
+ if (first_result)
+ SNPRINTF("- Result : %s.\n", nr_hibernates ?
+ "Succeeded" :
+ "No hibernation attempts so far");
+ return len;
+}
+
+/**
+ * do_cleanup - cleanup after attempting to hibernate or resume
+ * @get_debug_info: Whether to allocate and return debugging info.
+ *
+ * Cleanup after attempting to hibernate or resume, possibly getting
+ * debugging info as we do so.
+ **/
+static void do_cleanup(int get_debug_info, int restarting)
+{
+ int i = 0;
+ char *buffer = NULL;
+
+ trap_non_toi_io = 0;
+
+ if (get_debug_info)
+ toi_prepare_status(DONT_CLEAR_BAR, "Cleaning up...");
+
+ free_checksum_pages();
+
+ if (get_debug_info)
+ buffer = (char *) toi_get_zeroed_page(20, TOI_ATOMIC_GFP);
+
+ if (buffer)
+ i = get_toi_debug_info(buffer, PAGE_SIZE);
+
+ toi_free_extra_pagedir_memory();
+
+ pagedir1.size = 0;
+ pagedir2.size = 0;
+ set_highmem_size(pagedir1, 0);
+ set_highmem_size(pagedir2, 0);
+
+ if (boot_kernel_data_buffer) {
+ if (!test_toi_state(TOI_BOOT_KERNEL))
+ toi_free_page(37, boot_kernel_data_buffer);
+ boot_kernel_data_buffer = 0;
+ }
+
+ if (test_toi_state(TOI_DEVICE_HOTPLUG_LOCKED)) {
+ unlock_device_hotplug();
+ clear_toi_state(TOI_DEVICE_HOTPLUG_LOCKED);
+ }
+
+ clear_toi_state(TOI_BOOT_KERNEL);
+ if (current->flags & PF_SUSPEND_TASK)
+ thaw_processes();
+
+ if (!restarting)
+ toi_stop_other_threads();
+
+ if (test_action_state(TOI_KEEP_IMAGE) &&
+ !test_result_state(TOI_ABORTED)) {
+ toi_message(TOI_ANY_SECTION, TOI_LOW, 1,
+ "TuxOnIce: Not invalidating the image due "
+ "to Keep Image being enabled.");
+ set_result_state(TOI_KEPT_IMAGE);
+ } else
+ if (toiActiveAllocator)
+ toiActiveAllocator->remove_image();
+
+ free_bitmaps();
+ usermodehelper_enable();
+
+ if (test_toi_state(TOI_NOTIFIERS_PREPARE)) {
+ pm_notifier_call_chain(PM_POST_HIBERNATION);
+ clear_toi_state(TOI_NOTIFIERS_PREPARE);
+ }
+
+ if (buffer && i) {
+ /* Printk can only handle 1023 bytes, including
+ * its level mangling. */
+ for (i = 0; i < 3; i++)
+ printk(KERN_ERR "%s", buffer + (1023 * i));
+ toi_free_page(20, (unsigned long) buffer);
+ }
+
+ if (!test_action_state(TOI_LATE_CPU_HOTPLUG))
+ enable_nonboot_cpus();
+
+ if (!restarting)
+ toi_cleanup_console();
+
+ free_attention_list();
+
+ if (!restarting)
+ toi_deactivate_storage(0);
+
+ clear_toi_state(TOI_IGNORE_LOGLEVEL);
+ clear_toi_state(TOI_TRYING_TO_RESUME);
+ clear_toi_state(TOI_NOW_RESUMING);
+}
+
+/**
+ * check_still_keeping_image - we kept an image; check whether to reuse it.
+ *
+ * We enter this routine when we have kept an image. If the user has said they
+ * want to still keep it, all we need to do is powerdown. If powering down
+ * means hibernating to ram and the power doesn't run out, we'll return 1.
+ * If we do power off properly or the battery runs out, we'll resume via the
+ * normal paths.
+ *
+ * If the user has said they want to remove the previously kept image, we
+ * remove it, and return 0. We'll then store a new image.
+ **/
+static int check_still_keeping_image(void)
+{
+ if (test_action_state(TOI_KEEP_IMAGE)) {
+ printk(KERN_INFO "Image already stored: powering down "
+ "immediately.");
+ do_toi_step(STEP_HIBERNATE_POWERDOWN);
+ return 1; /* Just in case we're using S3 */
+ }
+
+ printk(KERN_INFO "Invalidating previous image.\n");
+ toiActiveAllocator->remove_image();
+
+ return 0;
+}
+
+/**
+ * toi_init - prepare to hibernate to disk
+ *
+ * Initialise variables & data structures, in preparation for
+ * hibernating to disk.
+ **/
+static int toi_init(int restarting)
+{
+ int result, i, j;
+
+ toi_result = 0;
+
+ printk(KERN_INFO "Initiating a hibernation cycle.\n");
+
+ nr_hibernates++;
+
+ for (i = 0; i < 2; i++)
+ for (j = 0; j < 2; j++)
+ toi_bkd.toi_io_time[i][j] = 0;
+
+ if (!test_toi_state(TOI_CAN_HIBERNATE) ||
+ allocate_bitmaps())
+ return 1;
+
+ mark_nosave_pages();
+
+ if (!restarting)
+ toi_prepare_console();
+
+ result = pm_notifier_call_chain(PM_HIBERNATION_PREPARE);
+ if (result) {
+ set_result_state(TOI_NOTIFIERS_PREPARE_FAILED);
+ return 1;
+ }
+ set_toi_state(TOI_NOTIFIERS_PREPARE);
+
+ if (!restarting) {
+		printk(KERN_INFO "Starting other threads.\n");
+ toi_start_other_threads();
+ }
+
+ result = usermodehelper_disable();
+ if (result) {
+ printk(KERN_ERR "TuxOnIce: Failed to disable usermode "
+ "helpers\n");
+ set_result_state(TOI_USERMODE_HELPERS_ERR);
+ return 1;
+ }
+
+ boot_kernel_data_buffer = toi_get_zeroed_page(37, TOI_ATOMIC_GFP);
+ if (!boot_kernel_data_buffer) {
+ printk(KERN_ERR "TuxOnIce: Failed to allocate "
+ "boot_kernel_data_buffer.\n");
+ set_result_state(TOI_OUT_OF_MEMORY);
+ return 1;
+ }
+
+ if (!test_action_state(TOI_LATE_CPU_HOTPLUG) &&
+ disable_nonboot_cpus()) {
+ set_abort_result(TOI_CPU_HOTPLUG_FAILED);
+ return 1;
+ }
+
+ return 0;
+}
+
+/**
+ * can_hibernate - perform basic 'Can we hibernate?' tests
+ *
+ * Perform basic tests that must pass if we're going to be able to hibernate:
+ * Can we get the pm_mutex? Is resume= valid (we need to know where to write
+ * the image header).
+ **/
+static int can_hibernate(void)
+{
+ if (!test_toi_state(TOI_CAN_HIBERNATE))
+ toi_attempt_to_parse_resume_device(0);
+
+ if (!test_toi_state(TOI_CAN_HIBERNATE)) {
+ printk(KERN_INFO "TuxOnIce: Hibernation is disabled.\n"
+ "This may be because you haven't put something along "
+ "the lines of\n\nresume=swap:/dev/hda1\n\n"
+ "in lilo.conf or equivalent. (Where /dev/hda1 is your "
+ "swap partition).\n");
+ set_abort_result(TOI_CANT_SUSPEND);
+ return 0;
+ }
+
+ if (strlen(alt_resume_param)) {
+ attempt_to_parse_alt_resume_param();
+
+ if (!strlen(alt_resume_param)) {
+ printk(KERN_INFO "Alternate resume parameter now "
+ "invalid. Aborting.\n");
+ set_abort_result(TOI_CANT_USE_ALT_RESUME);
+ return 0;
+ }
+ }
+
+ return 1;
+}
+
+/**
+ * do_post_image_write - having written an image, figure out what to do next
+ *
+ * After writing an image, we might load an alternate image or power down.
+ * Powering down might involve hibernating to ram, in which case we also
+ * need to handle reloading pageset2.
+ **/
+static int do_post_image_write(void)
+{
+ /* If switching images fails, do normal powerdown */
+ if (alt_resume_param[0])
+ do_toi_step(STEP_RESUME_ALT_IMAGE);
+
+ toi_power_down();
+
+ barrier();
+ mb();
+ return 0;
+}
+
+/**
+ * __save_image - do the hard work of saving the image
+ *
+ * High level routine for getting the image saved. The key assumptions made
+ * are that processes have been frozen and sufficient memory is available.
+ *
+ * We also exit through here at resume time, coming back from toi_hibernate
+ * after the atomic restore. This is the reason for the toi_in_hibernate
+ * test.
+ **/
+static int __save_image(void)
+{
+ int temp_result, did_copy = 0;
+
+ toi_prepare_status(DONT_CLEAR_BAR, "Starting to save the image..");
+
+ toi_message(TOI_ANY_SECTION, TOI_LOW, 1,
+ " - Final values: %d and %d.",
+ pagedir1.size, pagedir2.size);
+
+ toi_cond_pause(1, "About to write pagedir2.");
+
+ temp_result = write_pageset(&pagedir2);
+
+ if (temp_result == -1 || test_result_state(TOI_ABORTED))
+ return 1;
+
+ toi_cond_pause(1, "About to copy pageset 1.");
+
+ if (test_result_state(TOI_ABORTED))
+ return 1;
+
+ toi_deactivate_storage(1);
+
+ toi_prepare_status(DONT_CLEAR_BAR, "Doing atomic copy/restore.");
+
+ toi_in_hibernate = 1;
+
+ if (toi_go_atomic(PMSG_FREEZE, 1))
+ goto Failed;
+
+ temp_result = toi_hibernate();
+
+#ifdef CONFIG_KGDB
+ if (test_action_state(TOI_POST_RESUME_BREAKPOINT))
+ kgdb_breakpoint();
+#endif
+
+ if (!temp_result)
+ did_copy = 1;
+
+ /* We return here at resume time too! */
+ toi_end_atomic(ATOMIC_ALL_STEPS, toi_in_hibernate, temp_result);
+
+Failed:
+ if (toi_activate_storage(1))
+ panic("Failed to reactivate our storage.");
+
+ /* Resume time? */
+ if (!toi_in_hibernate) {
+ copyback_post();
+ return 0;
+ }
+
+ /* Nope. Hibernating. So, see if we can save the image... */
+
+ if (temp_result || test_result_state(TOI_ABORTED)) {
+ if (did_copy)
+ goto abort_reloading_pagedir_two;
+ else
+ return 1;
+ }
+
+ toi_update_status(pagedir2.size, pagedir1.size + pagedir2.size,
+ NULL);
+
+ if (test_result_state(TOI_ABORTED))
+ goto abort_reloading_pagedir_two;
+
+ toi_cond_pause(1, "About to write pageset1.");
+
+ toi_message(TOI_ANY_SECTION, TOI_LOW, 1, "-- Writing pageset1");
+
+ temp_result = write_pageset(&pagedir1);
+
+ /* We didn't overwrite any memory, so no reread needs to be done. */
+ if (test_action_state(TOI_TEST_FILTER_SPEED) ||
+ test_action_state(TOI_TEST_BIO))
+ return 1;
+
+ if (temp_result == 1 || test_result_state(TOI_ABORTED))
+ goto abort_reloading_pagedir_two;
+
+ toi_cond_pause(1, "About to write header.");
+
+ if (test_result_state(TOI_ABORTED))
+ goto abort_reloading_pagedir_two;
+
+ temp_result = write_image_header();
+
+ if (!temp_result && !test_result_state(TOI_ABORTED))
+ return 0;
+
+abort_reloading_pagedir_two:
+ temp_result = read_pageset2(1);
+
+ /* If that failed, we're sunk. Panic! */
+ if (temp_result)
+ panic("Attempt to reload pagedir 2 while aborting "
+ "a hibernate failed.");
+
+ return 1;
+}
+
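+/**
+ * map_ps2_pages - toggle the kernel mapping of pageset2 pages
+ * @enable: Whether to map (non-zero) or unmap (zero) the pages.
+ *
+ * Walk pageset2_map, calling kernel_map_pages() for each page. Pageset2 is
+ * unmapped around __save_image() which, with CONFIG_DEBUG_PAGEALLOC, helps
+ * catch stray accesses to those pages while the image is being written.
+ **/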
+static void map_ps2_pages(int enable)
+{
+ unsigned long pfn = 0;
+
+ pfn = memory_bm_next_pfn(pageset2_map);
+
+ while (pfn != BM_END_OF_MAP) {
+ struct page *page = pfn_to_page(pfn);
+ kernel_map_pages(page, 1, enable);
+ pfn = memory_bm_next_pfn(pageset2_map);
+ }
+}
+
+/**
+ * do_save_image - save the image and handle the result
+ *
+ * Save the prepared image. If we fail or we're in the path returning
+ * from the atomic restore, cleanup.
+ **/
+static int do_save_image(void)
+{
+ int result;
+ map_ps2_pages(0);
+ result = __save_image();
+ map_ps2_pages(1);
+ return result;
+}
+
+/**
+ * do_prepare_image - try to prepare an image
+ *
+ * Seek to initialise and prepare an image to be saved. On failure,
+ * cleanup.
+ **/
+static int do_prepare_image(void)
+{
+ int restarting = test_result_state(TOI_EXTRA_PAGES_ALLOW_TOO_SMALL);
+
+ if (!restarting && toi_activate_storage(0))
+ return 1;
+
+ /*
+	 * If we kept the image, are still keeping it and are hibernating to
+	 * RAM, we will return 1 here after hibernating and resuming (provided
+	 * the power doesn't run out). In that case, we skip directly to
+	 * cleaning up and exiting.
+ */
+
+ if (!can_hibernate() ||
+ (test_result_state(TOI_KEPT_IMAGE) &&
+ check_still_keeping_image()))
+ return 1;
+
+ if (toi_init(restarting) || toi_prepare_image() ||
+ test_result_state(TOI_ABORTED))
+ return 1;
+
+ trap_non_toi_io = 1;
+
+ return 0;
+}
+
+/**
+ * do_check_can_resume - find out whether an image has been stored
+ *
+ * Read whether an image exists. We use the same routine as the
+ * image_exists sysfs entry, and just look to see whether the
+ * first character in the resulting buffer is a '1'.
+ **/
+int do_check_can_resume(void)
+{
+ int result = -1;
+
+ if (toi_activate_storage(0))
+ return -1;
+
+ if (!test_toi_state(TOI_RESUME_DEVICE_OK))
+ toi_attempt_to_parse_resume_device(1);
+
+ if (toiActiveAllocator)
+ result = toiActiveAllocator->image_exists(1);
+
+ toi_deactivate_storage(0);
+ return result;
+}
+EXPORT_SYMBOL_GPL(do_check_can_resume);
+
+/**
+ * do_load_atomic_copy - load the first part of an image, if it exists
+ *
+ * Check whether we have an image. If one exists, do sanity checking
+ * (possibly invalidating the image or even rebooting if the user
+ * requests that) before loading it into memory in preparation for the
+ * atomic restore.
+ *
+ * If and only if we have an image loaded and ready to restore, we return 1.
+ **/
+static int do_load_atomic_copy(void)
+{
+ int read_image_result = 0;
+
+ if (sizeof(swp_entry_t) != sizeof(long)) {
+ printk(KERN_WARNING "TuxOnIce: The size of swp_entry_t != size"
+ " of long. Please report this!\n");
+ return 1;
+ }
+
+ if (!resume_file[0])
+ printk(KERN_WARNING "TuxOnIce: "
+ "You need to use a resume= command line parameter to "
+ "tell TuxOnIce where to look for an image.\n");
+
+ toi_activate_storage(0);
+
+ if (!(test_toi_state(TOI_RESUME_DEVICE_OK)) &&
+ !toi_attempt_to_parse_resume_device(0)) {
+ /*
+ * Without a usable storage device we can do nothing -
+ * even if noresume is given
+ */
+
+ if (!toiNumAllocators)
+ printk(KERN_ALERT "TuxOnIce: "
+ "No storage allocators have been registered.\n");
+ else
+ printk(KERN_ALERT "TuxOnIce: "
+ "Missing or invalid storage location "
+ "(resume= parameter). Please correct and "
+ "rerun lilo (or equivalent) before "
+ "hibernating.\n");
+ toi_deactivate_storage(0);
+ return 1;
+ }
+
+ if (allocate_bitmaps())
+ return 1;
+
+ read_image_result = read_pageset1(); /* non fatal error ignored */
+
+ if (test_toi_state(TOI_NORESUME_SPECIFIED))
+ clear_toi_state(TOI_NORESUME_SPECIFIED);
+
+ toi_deactivate_storage(0);
+
+ if (read_image_result)
+ return 1;
+
+ return 0;
+}
+
+/**
+ * prepare_restore_load_alt_image - save & restore alt image variables
+ *
+ * Save and restore the pageset1 maps, when loading an alternate image.
+ **/
+static void prepare_restore_load_alt_image(int prepare)
+{
+ static struct memory_bitmap *pageset1_map_save, *pageset1_copy_map_save;
+
+ if (prepare) {
+ pageset1_map_save = pageset1_map;
+ pageset1_map = NULL;
+ pageset1_copy_map_save = pageset1_copy_map;
+ pageset1_copy_map = NULL;
+ set_toi_state(TOI_LOADING_ALT_IMAGE);
+ toi_reset_alt_image_pageset2_pfn();
+ } else {
+ memory_bm_free(pageset1_map, 0);
+ pageset1_map = pageset1_map_save;
+ memory_bm_free(pageset1_copy_map, 0);
+ pageset1_copy_map = pageset1_copy_map_save;
+ clear_toi_state(TOI_NOW_RESUMING);
+ clear_toi_state(TOI_LOADING_ALT_IMAGE);
+ }
+}
+
+/**
+ * do_toi_step - perform a step in hibernating or resuming
+ *
+ * Perform a step in hibernating or resuming an image. This abstraction
+ * is in preparation for implementing cluster support, and perhaps replacing
+ * uswsusp too (haven't looked whether that's possible yet).
+ **/
+int do_toi_step(int step)
+{
+ switch (step) {
+ case STEP_HIBERNATE_PREPARE_IMAGE:
+ return do_prepare_image();
+ case STEP_HIBERNATE_SAVE_IMAGE:
+ return do_save_image();
+ case STEP_HIBERNATE_POWERDOWN:
+ return do_post_image_write();
+ case STEP_RESUME_CAN_RESUME:
+ return do_check_can_resume();
+ case STEP_RESUME_LOAD_PS1:
+ return do_load_atomic_copy();
+ case STEP_RESUME_DO_RESTORE:
+ /*
+ * If we succeed, this doesn't return.
+ * Instead, we return from do_save_image() in the
+ * hibernated kernel.
+ */
+ return toi_atomic_restore();
+ case STEP_RESUME_ALT_IMAGE:
+ printk(KERN_INFO "Trying to resume alternate image.\n");
+ toi_in_hibernate = 0;
+ save_restore_alt_param(SAVE, NOQUIET);
+ prepare_restore_load_alt_image(1);
+ if (!do_check_can_resume()) {
+ printk(KERN_INFO "Nothing to resume from.\n");
+ goto out;
+ }
+ if (!do_load_atomic_copy())
+ toi_atomic_restore();
+
+ printk(KERN_INFO "Failed to load image.\n");
+out:
+ prepare_restore_load_alt_image(0);
+ save_restore_alt_param(RESTORE, NOQUIET);
+ break;
+ case STEP_CLEANUP:
+ do_cleanup(1, 0);
+ break;
+ case STEP_QUIET_CLEANUP:
+ do_cleanup(0, 0);
+ break;
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(do_toi_step);
+
+/* -- Functions for kickstarting a hibernate or resume --- */
+
+/**
+ * toi_try_resume - try to do the steps in resuming
+ *
+ * Check if we have an image and if so try to resume. Clear the status
+ * flags too.
+ **/
+void toi_try_resume(void)
+{
+ set_toi_state(TOI_TRYING_TO_RESUME);
+ resume_attempted = 1;
+
+ current->flags |= PF_MEMALLOC;
+ toi_start_other_threads();
+
+ if (do_toi_step(STEP_RESUME_CAN_RESUME) &&
+ !do_toi_step(STEP_RESUME_LOAD_PS1))
+ do_toi_step(STEP_RESUME_DO_RESTORE);
+
+ toi_stop_other_threads();
+ do_cleanup(0, 0);
+
+ current->flags &= ~PF_MEMALLOC;
+
+ clear_toi_state(TOI_IGNORE_LOGLEVEL);
+ clear_toi_state(TOI_TRYING_TO_RESUME);
+ clear_toi_state(TOI_NOW_RESUMING);
+}
+
+/**
+ * toi_sys_power_disk_try_resume - wrapper calling toi_try_resume
+ *
+ * Wrapper for when toi_try_resume is called from the swsusp resume path,
+ * rather than from echo > /sys/power/tuxonice/do_resume.
+ **/
+static void toi_sys_power_disk_try_resume(void)
+{
+ resume_attempted = 1;
+
+ /*
+ * There's a comment in kernel/power/disk.c that indicates
+ * we should be able to use mutex_lock_nested below. That
+ * doesn't seem to cut it, though, so let's just turn lockdep
+ * off for now.
+ */
+ lockdep_off();
+
+ if (toi_start_anything(SYSFS_RESUMING))
+ goto out;
+
+ toi_try_resume();
+
+ /*
+ * For initramfs, we have to clear the boot time
+ * flag after trying to resume
+ */
+ clear_toi_state(TOI_BOOT_TIME);
+
+ toi_finish_anything(SYSFS_RESUMING);
+out:
+ lockdep_on();
+}
+
+/**
+ * toi_try_hibernate - try to start a hibernation cycle
+ *
+ * Start a hibernation cycle, coming in from either
+ * echo > /sys/power/tuxonice/do_suspend
+ *
+ * or
+ *
+ * echo disk > /sys/power/state
+ *
+ * In the latter case, we come in without pm_sem taken; in the former, it
+ * has been taken.
+ **/
+int toi_try_hibernate(void)
+{
+ int result = 0, sys_power_disk = 0, retries = 0;
+
+ if (!mutex_is_locked(&tuxonice_in_use)) {
+ /* Came in via /sys/power/disk */
+ if (toi_start_anything(SYSFS_HIBERNATING))
+ return -EBUSY;
+ sys_power_disk = 1;
+ }
+
+ current->flags |= PF_MEMALLOC;
+
+ if (test_toi_state(TOI_CLUSTER_MODE)) {
+ toi_initiate_cluster_hibernate();
+ goto out;
+ }
+
+prepare:
+ result = do_toi_step(STEP_HIBERNATE_PREPARE_IMAGE);
+
+ if (result)
+ goto out;
+
+ if (test_action_state(TOI_FREEZER_TEST))
+ goto out_restore_gfp_mask;
+
+ result = do_toi_step(STEP_HIBERNATE_SAVE_IMAGE);
+
+ if (test_result_state(TOI_EXTRA_PAGES_ALLOW_TOO_SMALL)) {
+ if (retries < 2) {
+ do_cleanup(0, 1);
+ retries++;
+ clear_result_state(TOI_ABORTED);
+ extra_pd1_pages_allowance = extra_pd1_pages_used + 500;
+ printk(KERN_INFO "Automatically adjusting the extra"
+ " pages allowance to %ld and restarting.\n",
+ extra_pd1_pages_allowance);
+ pm_restore_gfp_mask();
+ goto prepare;
+ }
+
+ printk(KERN_INFO "Adjusted extra pages allowance twice and "
+			"still couldn't hibernate successfully. Giving up.\n");
+ }
+
+ /* This code runs at resume time too! */
+ if (!result && toi_in_hibernate)
+ result = do_toi_step(STEP_HIBERNATE_POWERDOWN);
+
+out_restore_gfp_mask:
+ pm_restore_gfp_mask();
+out:
+ do_cleanup(1, 0);
+ current->flags &= ~PF_MEMALLOC;
+
+ if (sys_power_disk)
+ toi_finish_anything(SYSFS_HIBERNATING);
+
+ return result;
+}
+
+/*
+ * toi_launch_userspace_program - run a userspace helper
+ *
+ * Split @command into up to six whitespace-separated arguments, append
+ * -c<channel_no> if channel_no is non-zero (used by userui) and --debug if
+ * requested, then run the result via call_usermodehelper().
+ */
+int toi_launch_userspace_program(char *command, int channel_no,
+ int wait, int debug)
+{
+ int retval;
+ static char *envp[] = {
+ "HOME=/",
+ "TERM=linux",
+ "PATH=/sbin:/usr/sbin:/bin:/usr/bin",
+ NULL };
+ static char *argv[] = { NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL
+ };
+ char *channel = NULL;
+ int arg = 0, size;
+ char test_read[255];
+ char *orig_posn = command;
+
+ if (!strlen(orig_posn))
+ return 1;
+
+ if (channel_no) {
+ channel = toi_kzalloc(4, 6, GFP_KERNEL);
+ if (!channel) {
+ printk(KERN_INFO "Failed to allocate memory in "
+ "preparing to launch userspace program.\n");
+ return 1;
+ }
+ }
+
+ /* Up to 6 args supported */
+ while (arg < 6) {
+ sscanf(orig_posn, "%s", test_read);
+ size = strlen(test_read);
+ if (!(size))
+ break;
+ argv[arg] = toi_kzalloc(5, size + 1, TOI_ATOMIC_GFP);
+ strcpy(argv[arg], test_read);
+ orig_posn += size + 1;
+ *test_read = 0;
+ arg++;
+ }
+
+ if (channel_no) {
+ sprintf(channel, "-c%d", channel_no);
+ argv[arg] = channel;
+ } else
+ arg--;
+
+ if (debug) {
+ argv[++arg] = toi_kzalloc(5, 8, TOI_ATOMIC_GFP);
+ strcpy(argv[arg], "--debug");
+ }
+
+ retval = call_usermodehelper(argv[0], argv, envp, wait);
+
+ /*
+ * If the program reports an error, retval = 256. Don't complain
+ * about that here.
+ */
+ if (retval && retval != 256)
+ printk(KERN_ERR "Failed to launch userspace program '%s': "
+ "Error %d\n", command, retval);
+
+ {
+ int i;
+ for (i = 0; i < arg; i++)
+ if (argv[i] && argv[i] != channel)
+ toi_kfree(5, argv[i], sizeof(*argv[i]));
+ }
+
+ toi_kfree(4, channel, sizeof(*channel));
+
+ return retval;
+}
+
+/*
+ * This array contains entries that are automatically registered at
+ * boot. Modules and the console code register their own entries separately.
+ */
+static struct toi_sysfs_data sysfs_params[] = {
+ SYSFS_LONG("extra_pages_allowance", SYSFS_RW,
+ &extra_pd1_pages_allowance, 0, LONG_MAX, 0),
+ SYSFS_CUSTOM("image_exists", SYSFS_RW, image_exists_read,
+ image_exists_write, SYSFS_NEEDS_SM_FOR_BOTH, NULL),
+ SYSFS_STRING("resume", SYSFS_RW, resume_file, 255,
+ SYSFS_NEEDS_SM_FOR_WRITE,
+ attempt_to_parse_resume_device2),
+ SYSFS_STRING("alt_resume_param", SYSFS_RW, alt_resume_param, 255,
+ SYSFS_NEEDS_SM_FOR_WRITE,
+ attempt_to_parse_alt_resume_param),
+ SYSFS_CUSTOM("debug_info", SYSFS_READONLY, get_toi_debug_info, NULL, 0,
+ NULL),
+ SYSFS_BIT("ignore_rootfs", SYSFS_RW, &toi_bkd.toi_action,
+ TOI_IGNORE_ROOTFS, 0),
+ SYSFS_LONG("image_size_limit", SYSFS_RW, &image_size_limit, -2,
+ INT_MAX, 0),
+ SYSFS_UL("last_result", SYSFS_RW, &toi_result, 0, 0, 0),
+ SYSFS_BIT("no_multithreaded_io", SYSFS_RW, &toi_bkd.toi_action,
+ TOI_NO_MULTITHREADED_IO, 0),
+ SYSFS_BIT("no_flusher_thread", SYSFS_RW, &toi_bkd.toi_action,
+ TOI_NO_FLUSHER_THREAD, 0),
+ SYSFS_BIT("full_pageset2", SYSFS_RW, &toi_bkd.toi_action,
+ TOI_PAGESET2_FULL, 0),
+ SYSFS_BIT("reboot", SYSFS_RW, &toi_bkd.toi_action, TOI_REBOOT, 0),
+ SYSFS_BIT("replace_swsusp", SYSFS_RW, &toi_bkd.toi_action,
+ TOI_REPLACE_SWSUSP, 0),
+ SYSFS_STRING("resume_commandline", SYSFS_RW,
+ toi_bkd.toi_nosave_commandline, COMMAND_LINE_SIZE, 0,
+ NULL),
+ SYSFS_STRING("version", SYSFS_READONLY, TOI_CORE_VERSION, 0, 0, NULL),
+ SYSFS_BIT("freezer_test", SYSFS_RW, &toi_bkd.toi_action,
+ TOI_FREEZER_TEST, 0),
+ SYSFS_BIT("test_bio", SYSFS_RW, &toi_bkd.toi_action, TOI_TEST_BIO, 0),
+ SYSFS_BIT("test_filter_speed", SYSFS_RW, &toi_bkd.toi_action,
+ TOI_TEST_FILTER_SPEED, 0),
+ SYSFS_BIT("no_pageset2", SYSFS_RW, &toi_bkd.toi_action,
+ TOI_NO_PAGESET2, 0),
+ SYSFS_BIT("no_pageset2_if_unneeded", SYSFS_RW, &toi_bkd.toi_action,
+ TOI_NO_PS2_IF_UNNEEDED, 0),
+ SYSFS_BIT("late_cpu_hotplug", SYSFS_RW, &toi_bkd.toi_action,
+ TOI_LATE_CPU_HOTPLUG, 0),
+ SYSFS_STRING("binary_signature", SYSFS_READONLY,
+ tuxonice_signature, 9, 0, NULL),
+ SYSFS_INT("max_workers", SYSFS_RW, &toi_max_workers, 0, NR_CPUS, 0,
+ NULL),
+#ifdef CONFIG_KGDB
+ SYSFS_BIT("post_resume_breakpoint", SYSFS_RW, &toi_bkd.toi_action,
+ TOI_POST_RESUME_BREAKPOINT, 0),
+#endif
+ SYSFS_BIT("no_readahead", SYSFS_RW, &toi_bkd.toi_action,
+ TOI_NO_READAHEAD, 0),
+#ifdef CONFIG_TOI_KEEP_IMAGE
+ SYSFS_BIT("keep_image", SYSFS_RW , &toi_bkd.toi_action, TOI_KEEP_IMAGE,
+ 0),
+#endif
+};
+
+static struct toi_core_fns my_fns = {
+ .get_nonconflicting_page = __toi_get_nonconflicting_page,
+ .post_context_save = __toi_post_context_save,
+ .try_hibernate = toi_try_hibernate,
+ .try_resume = toi_sys_power_disk_try_resume,
+};
+
+/**
+ * core_load - initialisation of TuxOnIce core
+ *
+ * Initialise the core, beginning with sysfs. Checksum and so on are part of
+ * the core, but have their own initialisation routines because they either
+ * aren't compiled in all the time or have their own subdirectories.
+ **/
+static __init int core_load(void)
+{
+ int i,
+ numfiles = sizeof(sysfs_params) / sizeof(struct toi_sysfs_data);
+
+ printk(KERN_INFO "TuxOnIce " TOI_CORE_VERSION
+ " (http://tuxonice.net)\n");
+
+ if (toi_sysfs_init())
+ return 1;
+
+ for (i = 0; i < numfiles; i++)
+ toi_register_sysfs_file(tuxonice_kobj, &sysfs_params[i]);
+
+ toi_core_fns = &my_fns;
+
+ if (toi_alloc_init())
+ return 1;
+ if (toi_checksum_init())
+ return 1;
+ if (toi_usm_init())
+ return 1;
+ if (toi_ui_init())
+ return 1;
+ if (toi_poweroff_init())
+ return 1;
+ if (toi_cluster_init())
+ return 1;
+
+ return 0;
+}
+
+#ifdef MODULE
+/**
+ * core_unload: Prepare to unload the core code.
+ **/
+static __exit void core_unload(void)
+{
+ int i,
+ numfiles = sizeof(sysfs_params) / sizeof(struct toi_sysfs_data);
+
+ toi_alloc_exit();
+ toi_checksum_exit();
+ toi_poweroff_exit();
+ toi_ui_exit();
+ toi_usm_exit();
+ toi_cluster_exit();
+
+ for (i = 0; i < numfiles; i++)
+ toi_unregister_sysfs_file(tuxonice_kobj, &sysfs_params[i]);
+
+ toi_core_fns = NULL;
+
+ toi_sysfs_exit();
+}
+MODULE_LICENSE("GPL");
+module_init(core_load);
+module_exit(core_unload);
+#else
+late_initcall(core_load);
+#endif
diff --git a/kernel/power/tuxonice_incremental.c b/kernel/power/tuxonice_incremental.c
new file mode 100644
index 0000000..5870fdd
--- /dev/null
+++ b/kernel/power/tuxonice_incremental.c
@@ -0,0 +1,12 @@
+/*
+ * kernel/power/tuxonice_incremental.c
+ *
+ * Copyright (C) 2014 Nigel Cunningham (nigel at tuxonice net)
+ *
+ * This file is released under the GPLv2.
+ *
+ * This file contains routines related to storing incremental images - that
+ * is, retaining an image after an initial cycle and then storing incremental
+ * changes on subsequent hibernations.
+ */
+
diff --git a/kernel/power/tuxonice_io.c b/kernel/power/tuxonice_io.c
new file mode 100644
index 0000000..00577e1
--- /dev/null
+++ b/kernel/power/tuxonice_io.c
@@ -0,0 +1,1936 @@
+/*
+ * kernel/power/tuxonice_io.c
+ *
+ * Copyright (C) 1998-2001 Gabor Kuti <seasons@fornax.hu>
+ * Copyright (C) 1998,2001,2002 Pavel Machek <pavel@suse.cz>
+ * Copyright (C) 2002-2003 Florent Chabaud <fchabaud@free.fr>
+ * Copyright (C) 2002-2014 Nigel Cunningham (nigel at tuxonice net)
+ *
+ * This file is released under the GPLv2.
+ *
+ * It contains high level IO routines for hibernating.
+ *
+ */
+
+#include <linux/suspend.h>
+#include <linux/version.h>
+#include <linux/utsname.h>
+#include <linux/mount.h>
+#include <linux/highmem.h>
+#include <linux/kthread.h>
+#include <linux/cpu.h>
+#include <linux/fs_struct.h>
+#include <linux/bio.h>
+#include <linux/fs_uuid.h>
+#include <asm/tlbflush.h>
+
+#include "tuxonice.h"
+#include "tuxonice_modules.h"
+#include "tuxonice_pageflags.h"
+#include "tuxonice_io.h"
+#include "tuxonice_ui.h"
+#include "tuxonice_storage.h"
+#include "tuxonice_prepare_image.h"
+#include "tuxonice_extent.h"
+#include "tuxonice_sysfs.h"
+#include "tuxonice_builtin.h"
+#include "tuxonice_checksum.h"
+#include "tuxonice_alloc.h"
+char alt_resume_param[256];
+
+/* Version read from image header at resume */
+static int toi_image_header_version;
+
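+/*
+ * Read VAR from the image header only if the header was written with
+ * version VERS or later; older images simply leave VAR untouched. On I/O
+ * failure, call abort_hibernate() and run ERR_ACT.
+ */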
+#define read_if_version(VERS, VAR, DESC, ERR_ACT) do { \
+ if (likely(toi_image_header_version >= VERS)) \
+ if (toiActiveAllocator->rw_header_chunk(READ, NULL, \
+ (char *) &VAR, sizeof(VAR))) { \
+			abort_hibernate(TOI_FAILED_IO, "Failed to read " DESC "."); \
+ ERR_ACT; \
+ } \
+} while (0)
+
+/* Variables shared between threads and updated under the mutex */
+static int io_write, io_finish_at, io_base, io_barmax, io_pageset, io_result;
+static int io_index, io_nextupdate, io_pc, io_pc_step;
+static DEFINE_MUTEX(io_mutex);
+static DEFINE_PER_CPU(struct page *, last_sought);
+static DEFINE_PER_CPU(struct page *, last_high_page);
+static DEFINE_PER_CPU(char *, checksum_locn);
+static DEFINE_PER_CPU(struct pbe *, last_low_page);
+static atomic_t io_count;
+atomic_t toi_io_workers;
+EXPORT_SYMBOL_GPL(toi_io_workers);
+
+static int using_flusher;
+
+DECLARE_WAIT_QUEUE_HEAD(toi_io_queue_flusher);
+EXPORT_SYMBOL_GPL(toi_io_queue_flusher);
+
+int toi_bio_queue_flusher_should_finish;
+EXPORT_SYMBOL_GPL(toi_bio_queue_flusher_should_finish);
+
+int toi_max_workers;
+
+static char *image_version_error = "The image header version is newer than " \
+ "this kernel supports.";
+
+struct toi_module_ops *first_filter;
+
+static atomic_t toi_num_other_threads;
+static DECLARE_WAIT_QUEUE_HEAD(toi_worker_wait_queue);
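+/*
+ * Extra I/O threads park on toi_worker_wait_queue and obey
+ * toi_worker_command: TOI_IO_WORKER_STOP parks them between pagesets,
+ * TOI_IO_WORKER_RUN lets them transfer pages and TOI_IO_WORKER_EXIT tells
+ * them to clean up and return.
+ */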
+enum toi_worker_commands {
+ TOI_IO_WORKER_STOP,
+ TOI_IO_WORKER_RUN,
+ TOI_IO_WORKER_EXIT
+};
+static enum toi_worker_commands toi_worker_command;
+
+/**
+ * toi_attempt_to_parse_resume_device - determine if we can hibernate
+ *
+ * Can we hibernate, using the current resume= parameter?
+ **/
+int toi_attempt_to_parse_resume_device(int quiet)
+{
+ struct list_head *Allocator;
+ struct toi_module_ops *thisAllocator;
+ int result, returning = 0;
+
+ if (toi_activate_storage(0))
+ return 0;
+
+ toiActiveAllocator = NULL;
+ clear_toi_state(TOI_RESUME_DEVICE_OK);
+ clear_toi_state(TOI_CAN_RESUME);
+ clear_result_state(TOI_ABORTED);
+
+ if (!toiNumAllocators) {
+ if (!quiet)
+ printk(KERN_INFO "TuxOnIce: No storage allocators have "
+ "been registered. Hibernating will be "
+ "disabled.\n");
+ goto cleanup;
+ }
+
+ list_for_each(Allocator, &toiAllocators) {
+ thisAllocator = list_entry(Allocator, struct toi_module_ops,
+ type_list);
+
+ /*
+ * Not sure why you'd want to disable an allocator, but
+ * we should honour the flag if we're providing it
+ */
+ if (!thisAllocator->enabled)
+ continue;
+
+ result = thisAllocator->parse_sig_location(
+ resume_file, (toiNumAllocators == 1),
+ quiet);
+
+ switch (result) {
+ case -EINVAL:
+ /* For this allocator, but not a valid
+ * configuration. Error already printed. */
+ goto cleanup;
+
+ case 0:
+ /* For this allocator and valid. */
+ toiActiveAllocator = thisAllocator;
+
+ set_toi_state(TOI_RESUME_DEVICE_OK);
+ set_toi_state(TOI_CAN_RESUME);
+ returning = 1;
+ goto cleanup;
+ }
+ }
+ if (!quiet)
+ printk(KERN_INFO "TuxOnIce: No matching enabled allocator "
+ "found. Resuming disabled.\n");
+cleanup:
+ toi_deactivate_storage(0);
+ return returning;
+}
+EXPORT_SYMBOL_GPL(toi_attempt_to_parse_resume_device);
+
+void attempt_to_parse_resume_device2(void)
+{
+ toi_prepare_usm();
+ toi_attempt_to_parse_resume_device(0);
+ toi_cleanup_usm();
+}
+EXPORT_SYMBOL_GPL(attempt_to_parse_resume_device2);
+
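+/**
+ * save_restore_alt_param - switch between resume= and the alternate value
+ * @replace: SAVE to substitute alt_resume_param, RESTORE to put the
+ * original resume parameter back.
+ * @quiet: Whether to parse the resume device quietly.
+ *
+ * Temporarily swap resume_file (and the associated toi_state) for the
+ * alternate resume parameter, or restore the saved values, re-parsing the
+ * resume device either way.
+ **/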
+void save_restore_alt_param(int replace, int quiet)
+{
+ static char resume_param_save[255];
+ static unsigned long toi_state_save;
+
+ if (replace) {
+ toi_state_save = toi_state;
+ strcpy(resume_param_save, resume_file);
+ strcpy(resume_file, alt_resume_param);
+ } else {
+ strcpy(resume_file, resume_param_save);
+ toi_state = toi_state_save;
+ }
+ toi_attempt_to_parse_resume_device(quiet);
+}
+
+void attempt_to_parse_alt_resume_param(void)
+{
+ int ok = 0;
+
+ /* Temporarily set resume_param to the poweroff value */
+ if (!strlen(alt_resume_param))
+ return;
+
+ printk(KERN_INFO "=== Trying Poweroff Resume2 ===\n");
+ save_restore_alt_param(SAVE, NOQUIET);
+ if (test_toi_state(TOI_CAN_RESUME))
+ ok = 1;
+
+ printk(KERN_INFO "=== Done ===\n");
+ save_restore_alt_param(RESTORE, QUIET);
+
+ /* If not ok, clear the string */
+ if (ok)
+ return;
+
+ printk(KERN_INFO "Can't resume from that location; clearing "
+ "alt_resume_param.\n");
+ alt_resume_param[0] = '\0';
+}
+
+/**
+ * noresume_reset_modules - reset data structures in case of non resuming
+ *
+ * When we read the start of an image, modules (and especially the
+ * active allocator) might need to reset data structures if we
+ * decide to remove the image rather than resuming from it.
+ **/
+static void noresume_reset_modules(void)
+{
+ struct toi_module_ops *this_filter;
+
+ list_for_each_entry(this_filter, &toi_filters, type_list)
+ if (this_filter->noresume_reset)
+ this_filter->noresume_reset();
+
+ if (toiActiveAllocator && toiActiveAllocator->noresume_reset)
+ toiActiveAllocator->noresume_reset();
+}
+
+/**
+ * fill_toi_header - fill the hibernate header structure
+ * @sh: Header data structure to be filled.
+ **/
+static int fill_toi_header(struct toi_header *sh)
+{
+ int i, error;
+
+ error = init_header((struct swsusp_info *) sh);
+ if (error)
+ return error;
+
+ sh->pagedir = pagedir1;
+ sh->pageset_2_size = pagedir2.size;
+ sh->param0 = toi_result;
+ sh->param1 = toi_bkd.toi_action;
+ sh->param2 = toi_bkd.toi_debug_state;
+ sh->param3 = toi_bkd.toi_default_console_level;
+ sh->root_fs = current->fs->root.mnt->mnt_sb->s_dev;
+ for (i = 0; i < 4; i++)
+ sh->io_time[i/2][i%2] = toi_bkd.toi_io_time[i/2][i%2];
+ sh->bkd = boot_kernel_data_buffer;
+ return 0;
+}
+
+/**
+ * rw_init_modules - initialize modules
+ * @rw: Whether we are reading or writing an image.
+ * @which: Section of the image being processed.
+ *
+ * Iterate over modules, preparing the ones that will be used to read or write
+ * data.
+ **/
+static int rw_init_modules(int rw, int which)
+{
+ struct toi_module_ops *this_module;
+ /* Initialise page transformers */
+ list_for_each_entry(this_module, &toi_filters, type_list) {
+ if (!this_module->enabled)
+ continue;
+ if (this_module->rw_init && this_module->rw_init(rw, which)) {
+ abort_hibernate(TOI_FAILED_MODULE_INIT,
+ "Failed to initialize the %s filter.",
+ this_module->name);
+ return 1;
+ }
+ }
+
+ /* Initialise allocator */
+ if (toiActiveAllocator->rw_init(rw, which)) {
+ abort_hibernate(TOI_FAILED_MODULE_INIT,
+ "Failed to initialise the allocator.");
+ return 1;
+ }
+
+ /* Initialise other modules */
+ list_for_each_entry(this_module, &toi_modules, module_list) {
+ if (!this_module->enabled ||
+ this_module->type == FILTER_MODULE ||
+ this_module->type == WRITER_MODULE)
+ continue;
+ if (this_module->rw_init && this_module->rw_init(rw, which)) {
+ set_abort_result(TOI_FAILED_MODULE_INIT);
+ printk(KERN_INFO "Setting aborted flag due to module "
+ "init failure.\n");
+ return 1;
+ }
+ }
+
+ return 0;
+}
+
+/**
+ * rw_cleanup_modules - cleanup modules
+ * @rw: Whether we are reading or writing an image.
+ *
+ * Cleanup components after reading or writing a set of pages.
+ * Only the allocator may fail.
+ **/
+static int rw_cleanup_modules(int rw)
+{
+ struct toi_module_ops *this_module;
+ int result = 0;
+
+ /* Cleanup other modules */
+ list_for_each_entry(this_module, &toi_modules, module_list) {
+ if (!this_module->enabled ||
+ this_module->type == FILTER_MODULE ||
+ this_module->type == WRITER_MODULE)
+ continue;
+ if (this_module->rw_cleanup)
+ result |= this_module->rw_cleanup(rw);
+ }
+
+ /* Flush data and cleanup */
+ list_for_each_entry(this_module, &toi_filters, type_list) {
+ if (!this_module->enabled)
+ continue;
+ if (this_module->rw_cleanup)
+ result |= this_module->rw_cleanup(rw);
+ }
+
+ result |= toiActiveAllocator->rw_cleanup(rw);
+
+ return result;
+}
+
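+/**
+ * copy_page_from_orig_page - find the page holding a pageset1 page's data
+ * @orig_page: The final destination page (or its pfn cast to a pointer for
+ * highmem).
+ * @is_high: Whether the page is a highmem page.
+ *
+ * Walk the restore pblist (restore_highmem_pblist for highmem) to find the
+ * pbe whose orig_address matches @orig_page, and return the page that holds
+ * its data until the atomic restore. Per-CPU records of the last entry
+ * found keep the search roughly sequential.
+ **/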
+static struct page *copy_page_from_orig_page(struct page *orig_page, int is_high)
+{
+ int index, min, max;
+ struct page *high_page = NULL,
+ **my_last_high_page = &__get_cpu_var(last_high_page),
+ **my_last_sought = &__get_cpu_var(last_sought);
+ struct pbe *this, **my_last_low_page = &__get_cpu_var(last_low_page);
+ void *compare;
+
+ if (is_high) {
+ if (*my_last_sought && *my_last_high_page &&
+ *my_last_sought < orig_page)
+ high_page = *my_last_high_page;
+ else
+ high_page = (struct page *) restore_highmem_pblist;
+ this = (struct pbe *) kmap(high_page);
+ compare = orig_page;
+ } else {
+ if (*my_last_sought && *my_last_low_page &&
+ *my_last_sought < orig_page)
+ this = *my_last_low_page;
+ else
+ this = restore_pblist;
+ compare = page_address(orig_page);
+ }
+
+ *my_last_sought = orig_page;
+
+ /* Locate page containing pbe */
+ while (this[PBES_PER_PAGE - 1].next &&
+ this[PBES_PER_PAGE - 1].orig_address < compare) {
+ if (is_high) {
+ struct page *next_high_page = (struct page *)
+ this[PBES_PER_PAGE - 1].next;
+ kunmap(high_page);
+ this = kmap(next_high_page);
+ high_page = next_high_page;
+ } else
+ this = this[PBES_PER_PAGE - 1].next;
+ }
+
+ /* Do a binary search within the page */
+ min = 0;
+ max = PBES_PER_PAGE;
+ index = PBES_PER_PAGE / 2;
+ while (max - min) {
+ if (!this[index].orig_address ||
+ this[index].orig_address > compare)
+ max = index;
+ else if (this[index].orig_address == compare) {
+ if (is_high) {
+ struct page *page = this[index].address;
+ *my_last_high_page = high_page;
+ kunmap(high_page);
+ return page;
+ }
+ *my_last_low_page = this;
+ return virt_to_page(this[index].address);
+ } else
+ min = index;
+ index = ((max + min) / 2);
+ };
+
+ if (is_high)
+ kunmap(high_page);
+
+ abort_hibernate(TOI_FAILED_IO, "Failed to get destination page for"
+ " orig page %p. This[min].orig_address=%p.\n", orig_page,
+ this[index].orig_address);
+ return NULL;
+}
+
+/**
+ * write_next_page - write the next page in a pageset
+ * @data_pfn: The pfn where the next data to write is located.
+ * @my_io_index: The index of the page in the pageset.
+ * @write_pfn: The pfn number to write in the image (where the data belongs).
+ *
+ * Get the pfn of the next page to write, map the page if necessary and do the
+ * write.
+ **/
+static int write_next_page(unsigned long *data_pfn, int *my_io_index,
+ unsigned long *write_pfn)
+{
+ struct page *page;
+ char **my_checksum_locn = &__get_cpu_var(checksum_locn);
+ int result = 0, was_present;
+
+ *data_pfn = memory_bm_next_pfn(io_map);
+
+ /* Another thread could have beaten us to it. */
+ if (*data_pfn == BM_END_OF_MAP) {
+ if (atomic_read(&io_count)) {
+ printk(KERN_INFO "Ran out of pfns but io_count is "
+ "still %d.\n", atomic_read(&io_count));
+ BUG();
+ }
+ mutex_unlock(&io_mutex);
+ return -ENODATA;
+ }
+
+ *my_io_index = io_finish_at - atomic_sub_return(1, &io_count);
+
+ memory_bm_clear_bit(io_map, *data_pfn);
+ page = pfn_to_page(*data_pfn);
+
+ was_present = kernel_page_present(page);
+ if (!was_present)
+ kernel_map_pages(page, 1, 1);
+
+ if (io_pageset == 1)
+ *write_pfn = memory_bm_next_pfn(pageset1_map);
+ else {
+ *write_pfn = *data_pfn;
+ *my_checksum_locn = tuxonice_get_next_checksum();
+ }
+
+ toi_message(TOI_IO, TOI_VERBOSE, 0, "Write %d:%ld.", *my_io_index, *write_pfn);
+
+ mutex_unlock(&io_mutex);
+
+ if (io_pageset == 2 && tuxonice_calc_checksum(page, *my_checksum_locn))
+ return 1;
+
+ result = first_filter->write_page(*write_pfn, TOI_PAGE, page,
+ PAGE_SIZE);
+
+ if (!was_present)
+ kernel_map_pages(page, 1, 0);
+
+ return result;
+}
+
+/**
+ * read_next_page - read the next page in a pageset
+ * @my_io_index: The index of the page in the pageset.
+ * @write_pfn: The pfn in which the data belongs.
+ *
+ * Read a page of the image into our buffer. It can happen (here and in the
+ * write routine) that threads don't get run until after other CPUs have done
+ * all the work. This was the cause of the long-standing issue with
+ * occasionally getting -ENODATA errors at the end of reading the image. We
+ * therefore need to check there's actually a page to read before trying to
+ * retrieve one.
+ **/
+
+static int read_next_page(int *my_io_index, unsigned long *write_pfn,
+ struct page *buffer)
+{
+ unsigned int buf_size = PAGE_SIZE;
+ unsigned long left = atomic_read(&io_count);
+
+ if (!left)
+ return -ENODATA;
+
+ /* Start off assuming the page we read isn't resaved */
+ *my_io_index = io_finish_at - atomic_sub_return(1, &io_count);
+
+ mutex_unlock(&io_mutex);
+
+ /*
+ * Are we aborting? If so, don't submit any more I/O as
+ * resetting the resume_attempted flag (from ui.c) will
+ * clear the bdev flags, making this thread oops.
+ */
+ if (unlikely(test_toi_state(TOI_STOP_RESUME))) {
+ atomic_dec(&toi_io_workers);
+ if (!atomic_read(&toi_io_workers)) {
+ /*
+ * So we can be sure we'll have memory for
+ * marking that we haven't resumed.
+ */
+ rw_cleanup_modules(READ);
+ set_toi_state(TOI_IO_STOPPED);
+ }
+ while (1)
+ schedule();
+ }
+
+ /*
+ * See toi_bio_read_page in tuxonice_bio.c:
+ * read the next page in the image.
+ */
+ return first_filter->read_page(write_pfn, TOI_PAGE, buffer, &buf_size);
+}
+
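+/**
+ * use_read_page - put a freshly read page where it belongs
+ * @write_pfn: The pfn the data belongs to.
+ * @buffer: The page containing the data just read.
+ *
+ * Copy the buffer either straight into the page at @write_pfn or, for
+ * pageset1 pages not flagged in pageset1_copy_map, into the intermediate
+ * page found by copy_page_from_orig_page(). Pages whose io_map bit has
+ * already been cleared are no longer wanted and are simply discarded.
+ **/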
+static void use_read_page(unsigned long write_pfn, struct page *buffer)
+{
+ struct page *final_page = pfn_to_page(write_pfn),
+ *copy_page = final_page;
+ char *virt, *buffer_virt;
+ int was_present, cpu = smp_processor_id();
+ unsigned long idx = 0;
+
+ if (io_pageset == 1 && (!pageset1_copy_map ||
+ !memory_bm_test_bit_index(pageset1_copy_map, write_pfn, cpu))) {
+ int is_high = PageHighMem(final_page);
+ copy_page = copy_page_from_orig_page(is_high ? (void *) write_pfn : final_page, is_high);
+ }
+
+ if (!memory_bm_test_bit_index(io_map, write_pfn, cpu)) {
+ toi_message(TOI_IO, TOI_VERBOSE, 0, "Discard %ld.", write_pfn);
+ mutex_lock(&io_mutex);
+ idx = atomic_add_return(1, &io_count);
+ mutex_unlock(&io_mutex);
+ return;
+ }
+
+ virt = kmap(copy_page);
+ buffer_virt = kmap(buffer);
+ was_present = kernel_page_present(copy_page);
+ if (!was_present)
+ kernel_map_pages(copy_page, 1, 1);
+ memcpy(virt, buffer_virt, PAGE_SIZE);
+ if (!was_present)
+ kernel_map_pages(copy_page, 1, 0);
+ kunmap(copy_page);
+ kunmap(buffer);
+ memory_bm_clear_bit_index(io_map, write_pfn, cpu);
+ toi_message(TOI_IO, TOI_VERBOSE, 0, "Read %d:%ld", idx, write_pfn);
+}
+
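+/**
+ * status_update - refresh the progress display, adding a rough time estimate
+ * @writing: Non-zero if we are writing the image, zero if reading.
+ * @done: Number of pages transferred so far.
+ * @ticks: Jiffies spent in the I/O loop so far.
+ *
+ * Combine the time spent so far with the statistics from previous cycles to
+ * estimate the pages-per-second rate and the seconds remaining; the
+ * estimate is only displayed once at least half a second has elapsed.
+ **/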
+static unsigned long status_update(int writing, unsigned long done,
+ unsigned long ticks)
+{
+ int cs_index = writing ? 0 : 1;
+ unsigned long ticks_so_far = toi_bkd.toi_io_time[cs_index][1] + ticks;
+ unsigned long msec = jiffies_to_msecs(abs(ticks_so_far));
+ unsigned long pgs_per_s, estimate = 0, pages_left;
+
+ if (msec) {
+ pages_left = io_barmax - done;
+ pgs_per_s = 1000 * done / msec;
+ if (pgs_per_s)
+ estimate = DIV_ROUND_UP(pages_left, pgs_per_s);
+ }
+
+ if (estimate && ticks > HZ / 2)
+ return toi_update_status(done, io_barmax,
+ " %d/%d MB (%lu sec left)",
+ MB(done+1), MB(io_barmax), estimate);
+
+ return toi_update_status(done, io_barmax, " %d/%d MB",
+ MB(done+1), MB(io_barmax));
+}
+
+/**
+ * worker_rw_loop - main loop to read/write pages
+ *
+ * The main I/O loop for reading or writing pages. The io_map bitmap is used to
+ * track the pages to read/write.
+ * If we are reading, the pages are loaded to their final (mapped) pfn.
+ * @data is non-zero iff this is a thread started via toi_start_other_threads.
+ * In that case, we stay in here until told to quit.
+ **/
+static int worker_rw_loop(void *data)
+{
+ unsigned long data_pfn, write_pfn, next_jiffies = jiffies + HZ / 4,
+ jif_index = 1, start_time = jiffies, thread_num;
+ int result = 0, my_io_index = 0, last_worker;
+ struct page *buffer = toi_alloc_page(28, TOI_ATOMIC_GFP);
+ cpumask_var_t orig_mask;
+
+ if (!alloc_cpumask_var(&orig_mask, GFP_KERNEL)) {
+ printk(KERN_EMERG "Failed to allocate cpumask for TuxOnIce I/O thread %ld.\n", (unsigned long) data);
+ return -ENOMEM;
+ }
+
+ cpumask_copy(orig_mask, tsk_cpus_allowed(current));
+
+ current->flags |= PF_NOFREEZE;
+
+top:
+ mutex_lock(&io_mutex);
+ thread_num = atomic_read(&toi_io_workers);
+
+ cpumask_copy(tsk_cpus_allowed(current), orig_mask);
+ schedule();
+
+ atomic_inc(&toi_io_workers);
+
+ while (atomic_read(&io_count) >= atomic_read(&toi_io_workers) &&
+ !(io_write && test_result_state(TOI_ABORTED)) &&
+ toi_worker_command == TOI_IO_WORKER_RUN) {
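+		/*
+		 * The first worker to register (thread_num == 0) also does
+		 * the housekeeping: updating the throughput throttle and the
+		 * status / percentage output below.
+		 */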
+ if (!thread_num && jiffies > next_jiffies) {
+ next_jiffies += HZ / 4;
+ if (toiActiveAllocator->update_throughput_throttle)
+ toiActiveAllocator->update_throughput_throttle(
+ jif_index);
+ jif_index++;
+ }
+
+ /*
+ * What page to use? If reading, don't know yet which page's
+ * data will be read, so always use the buffer. If writing,
+ * use the copy (Pageset1) or original page (Pageset2), but
+ * always write the pfn of the original page.
+ */
+ if (io_write)
+ result = write_next_page(&data_pfn, &my_io_index,
+ &write_pfn);
+ else /* Reading */
+ result = read_next_page(&my_io_index, &write_pfn,
+ buffer);
+
+ if (result) {
+ mutex_lock(&io_mutex);
+ /* Nothing to do? */
+ if (result == -ENODATA) {
+ toi_message(TOI_IO, TOI_VERBOSE, 0,
+ "Thread %d has no more work.",
+ smp_processor_id());
+ break;
+ }
+
+ io_result = result;
+
+ if (io_write) {
+ printk(KERN_INFO "Write chunk returned %d.\n",
+ result);
+ abort_hibernate(TOI_FAILED_IO,
+ "Failed to write a chunk of the "
+ "image.");
+ break;
+ }
+
+ if (io_pageset == 1) {
+ printk(KERN_ERR "\nBreaking out of I/O loop "
+ "because of result code %d.\n", result);
+ break;
+ }
+ panic("Read chunk returned (%d)", result);
+ }
+
+ /*
+ * Discard reads of resaved pages while reading ps2
+ * and unwanted pages while rereading ps2 when aborting.
+ */
+ if (!io_write) {
+ if (!PageResave(pfn_to_page(write_pfn)))
+ use_read_page(write_pfn, buffer);
+ else {
+ mutex_lock(&io_mutex);
+ toi_message(TOI_IO, TOI_VERBOSE, 0,
+ "Resaved %ld.", write_pfn);
+ atomic_inc(&io_count);
+ mutex_unlock(&io_mutex);
+ }
+ }
+
+ if (!thread_num) {
+			if (my_io_index + io_base > io_nextupdate)
+ io_nextupdate = status_update(io_write,
+ my_io_index + io_base,
+ jiffies - start_time);
+
+ if (my_io_index > io_pc) {
+ printk(KERN_CONT "...%d%%", 20 * io_pc_step);
+ io_pc_step++;
+ io_pc = io_finish_at * io_pc_step / 5;
+ }
+ }
+
+ toi_cond_pause(0, NULL);
+
+ /*
+		 * Subtle: If there are fewer pages left to transfer than
+		 * threads running, quit. This stops us doing I/O beyond the
+		 * end of the image when reading.
+ *
+ * Possible race condition. Two threads could do the test at
+ * the same time; one should exit and one should continue.
+ * Therefore we take the mutex before comparing and exiting.
+ */
+
+ mutex_lock(&io_mutex);
+ }
+
+ last_worker = atomic_dec_and_test(&toi_io_workers);
+ toi_message(TOI_IO, TOI_VERBOSE, 0, "%d workers left.", atomic_read(&toi_io_workers));
+ mutex_unlock(&io_mutex);
+
+ if ((unsigned long) data && toi_worker_command != TOI_IO_WORKER_EXIT) {
+ /* Were we the last thread and we're using a flusher thread? */
+ if (last_worker && using_flusher) {
+ toiActiveAllocator->finish_all_io();
+ }
+ /* First, if we're doing I/O, wait for it to finish */
+ wait_event(toi_worker_wait_queue, toi_worker_command != TOI_IO_WORKER_RUN);
+ /* Then wait to be told what to do next */
+ wait_event(toi_worker_wait_queue, toi_worker_command != TOI_IO_WORKER_STOP);
+ if (toi_worker_command == TOI_IO_WORKER_RUN)
+ goto top;
+ }
+
+ if (thread_num)
+ atomic_dec(&toi_num_other_threads);
+
+ toi_message(TOI_IO, TOI_LOW, 0, "Thread %d exiting.", thread_num);
+ toi__free_page(28, buffer);
+ free_cpumask_var(orig_mask);
+
+ return result;
+}
+
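+/**
+ * toi_start_other_threads - start extra I/O worker threads
+ *
+ * Unless multithreaded I/O is disabled, start additional workers, up to
+ * toi_max_workers - 1 (or one per other online CPU when no limit is set).
+ * Each is bound to its own CPU and parks in worker_rw_loop() until it sees
+ * TOI_IO_WORKER_RUN. Returns the number of threads started.
+ **/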
+int toi_start_other_threads(void)
+{
+ int cpu;
+ struct task_struct *p;
+ int to_start = (toi_max_workers ? toi_max_workers : num_online_cpus()) - 1;
+ unsigned long num_started = 0;
+
+ if (test_action_state(TOI_NO_MULTITHREADED_IO))
+ return 0;
+
+ toi_worker_command = TOI_IO_WORKER_STOP;
+
+ for_each_online_cpu(cpu) {
+ if (num_started == to_start)
+ break;
+
+ if (cpu == smp_processor_id())
+ continue;
+
+ p = kthread_create_on_node(worker_rw_loop, (void *) num_started + 1,
+ cpu_to_node(cpu), "ktoi_io/%d", cpu);
+ if (IS_ERR(p)) {
+ printk(KERN_ERR "ktoi_io for %i failed\n", cpu);
+ continue;
+ }
+ kthread_bind(p, cpu);
+ p->flags |= PF_MEMALLOC;
+ wake_up_process(p);
+ num_started++;
+ atomic_inc(&toi_num_other_threads);
+ }
+
+ toi_message(TOI_IO, TOI_LOW, 0, "Started %d threads.", num_started);
+ return num_started;
+}
+
+void toi_stop_other_threads(void)
+{
+ toi_message(TOI_IO, TOI_LOW, 0, "Stopping other threads.");
+ toi_worker_command = TOI_IO_WORKER_EXIT;
+ wake_up(&toi_worker_wait_queue);
+}
+
+/**
+ * do_rw_loop - main highlevel function for reading or writing pages
+ *
+ * Populate the io_map bitmap with the pages to transfer, wake the worker
+ * threads and either run the flusher or call worker_rw_loop() ourselves.
+ **/
+static int do_rw_loop(int write, int finish_at, struct memory_bitmap *pageflags,
+ int base, int barmax, int pageset)
+{
+ int index = 0, cpu, result = 0, workers_started;
+ unsigned long pfn;
+
+ first_filter = toi_get_next_filter(NULL);
+
+ if (!finish_at)
+ return 0;
+
+ io_write = write;
+ io_finish_at = finish_at;
+ io_base = base;
+ io_barmax = barmax;
+ io_pageset = pageset;
+ io_index = 0;
+ io_pc = io_finish_at / 5;
+ io_pc_step = 1;
+ io_result = 0;
+ io_nextupdate = base + 1;
+ toi_bio_queue_flusher_should_finish = 0;
+
+ for_each_online_cpu(cpu) {
+ per_cpu(last_sought, cpu) = NULL;
+ per_cpu(last_low_page, cpu) = NULL;
+ per_cpu(last_high_page, cpu) = NULL;
+ }
+
+ /* Ensure all bits clear */
+ memory_bm_clear(io_map);
+
+ /* Set the bits for the pages to write */
+ memory_bm_position_reset(pageflags);
+
+ pfn = memory_bm_next_pfn(pageflags);
+
+ while (pfn != BM_END_OF_MAP && index < finish_at) {
+ memory_bm_set_bit(io_map, pfn);
+ pfn = memory_bm_next_pfn(pageflags);
+ index++;
+ }
+
+ BUG_ON(index < finish_at);
+
+ atomic_set(&io_count, finish_at);
+
+ memory_bm_position_reset(pageset1_map);
+
+ mutex_lock(&io_mutex);
+
+ clear_toi_state(TOI_IO_STOPPED);
+
+ using_flusher = (atomic_read(&toi_num_other_threads) &&
+ toiActiveAllocator->io_flusher &&
+ !test_action_state(TOI_NO_FLUSHER_THREAD));
+
+ workers_started = atomic_read(&toi_num_other_threads);
+
+ memory_bm_set_iterators(io_map, atomic_read(&toi_num_other_threads) + 1);
+ memory_bm_position_reset(io_map);
+
+ memory_bm_set_iterators(pageset1_copy_map, atomic_read(&toi_num_other_threads) + 1);
+ memory_bm_position_reset(pageset1_copy_map);
+
+ toi_worker_command = TOI_IO_WORKER_RUN;
+ wake_up(&toi_worker_wait_queue);
+
+ mutex_unlock(&io_mutex);
+
+ if (using_flusher)
+ result = toiActiveAllocator->io_flusher(write);
+ else
+ worker_rw_loop(NULL);
+
+ while (atomic_read(&toi_io_workers))
+ schedule();
+
+ printk(KERN_CONT "\n");
+
+ toi_worker_command = TOI_IO_WORKER_STOP;
+ wake_up(&toi_worker_wait_queue);
+
+ if (unlikely(test_toi_state(TOI_STOP_RESUME))) {
+ if (!atomic_read(&toi_io_workers)) {
+ rw_cleanup_modules(READ);
+ set_toi_state(TOI_IO_STOPPED);
+ }
+ while (1)
+ schedule();
+ }
+ set_toi_state(TOI_IO_STOPPED);
+
+ if (!io_result && !result && !test_result_state(TOI_ABORTED)) {
+ unsigned long next;
+
+ toi_update_status(io_base + io_finish_at, io_barmax,
+ " %d/%d MB ",
+ MB(io_base + io_finish_at), MB(io_barmax));
+
+ memory_bm_position_reset(io_map);
+ next = memory_bm_next_pfn(io_map);
+ if (next != BM_END_OF_MAP) {
+ printk(KERN_INFO "Finished I/O loop but still work to "
+ "do?\nFinish at = %d. io_count = %d.\n",
+ finish_at, atomic_read(&io_count));
+ printk(KERN_INFO "I/O bitmap still records work to do."
+ "%ld.\n", next);
+ BUG();
+ do {
+ cpu_relax();
+ } while (0);
+ }
+ }
+
+ return io_result ? io_result : result;
+}
+
+/**
+ * write_pageset - write a pageset to disk.
+ * @pagedir: Which pagedir to write.
+ *
+ * Returns:
+ * Zero on success or -1 on failure.
+ **/
+int write_pageset(struct pagedir *pagedir)
+{
+ int finish_at, base = 0;
+ int barmax = pagedir1.size + pagedir2.size;
+ long error = 0;
+ struct memory_bitmap *pageflags;
+ unsigned long start_time, end_time;
+
+ /*
+ * Even if there is nothing to read or write, the allocator
+	 * may need the init/cleanup for its housekeeping (e.g.
+	 * pageset1 may start where pageset2 ends when writing).
+ */
+ finish_at = pagedir->size;
+
+ if (pagedir->id == 1) {
+ toi_prepare_status(DONT_CLEAR_BAR,
+ "Writing kernel & process data...");
+ base = pagedir2.size;
+ if (test_action_state(TOI_TEST_FILTER_SPEED) ||
+ test_action_state(TOI_TEST_BIO))
+ pageflags = pageset1_map;
+ else
+ pageflags = pageset1_copy_map;
+ } else {
+ toi_prepare_status(DONT_CLEAR_BAR, "Writing caches...");
+ pageflags = pageset2_map;
+ }
+
+ start_time = jiffies;
+
+ if (rw_init_modules(WRITE, pagedir->id)) {
+ abort_hibernate(TOI_FAILED_MODULE_INIT,
+ "Failed to initialise modules for writing.");
+ error = 1;
+ }
+
+ if (!error)
+ error = do_rw_loop(WRITE, finish_at, pageflags, base, barmax,
+ pagedir->id);
+
+ if (rw_cleanup_modules(WRITE) && !error) {
+ abort_hibernate(TOI_FAILED_MODULE_CLEANUP,
+ "Failed to cleanup after writing.");
+ error = 1;
+ }
+
+ end_time = jiffies;
+
+ if ((end_time - start_time) && (!test_result_state(TOI_ABORTED))) {
+		toi_bkd.toi_io_time[0][0] += finish_at;
+ toi_bkd.toi_io_time[0][1] += (end_time - start_time);
+ }
+
+ return error;
+}
+
+/**
+ * read_pageset - highlevel function to read a pageset from disk
+ * @pagedir: pageset to read
+ * @overwrittenpagesonly: Whether to read the whole pageset or
+ * only part of it.
+ *
+ * Returns:
+ * Zero on success or -1 on failure.
+ **/
+static int read_pageset(struct pagedir *pagedir, int overwrittenpagesonly)
+{
+ int result = 0, base = 0;
+ int finish_at = pagedir->size;
+ int barmax = pagedir1.size + pagedir2.size;
+ struct memory_bitmap *pageflags;
+ unsigned long start_time, end_time;
+
+ if (pagedir->id == 1) {
+ toi_prepare_status(DONT_CLEAR_BAR,
+ "Reading kernel & process data...");
+ pageflags = pageset1_map;
+ } else {
+ toi_prepare_status(DONT_CLEAR_BAR, "Reading caches...");
+ if (overwrittenpagesonly) {
+ barmax = min(pagedir1.size, pagedir2.size);
+ finish_at = min(pagedir1.size, pagedir2.size);
+ } else
+ base = pagedir1.size;
+ pageflags = pageset2_map;
+ }
+
+ start_time = jiffies;
+
+ if (rw_init_modules(READ, pagedir->id)) {
+ toiActiveAllocator->remove_image();
+ result = 1;
+ } else
+ result = do_rw_loop(READ, finish_at, pageflags, base, barmax,
+ pagedir->id);
+
+ if (rw_cleanup_modules(READ) && !result) {
+ abort_hibernate(TOI_FAILED_MODULE_CLEANUP,
+ "Failed to cleanup after reading.");
+ result = 1;
+ }
+
+ /* Statistics */
+ end_time = jiffies;
+
+ if ((end_time - start_time) && (!test_result_state(TOI_ABORTED))) {
+		toi_bkd.toi_io_time[1][0] += finish_at;
+ toi_bkd.toi_io_time[1][1] += (end_time - start_time);
+ }
+
+ return result;
+}
+
+/**
+ * write_module_configs - store the modules configuration
+ *
+ * The configuration for each module is stored in the image header.
+ * Returns: Int
+ * Zero on success, Error value otherwise.
+ **/
+static int write_module_configs(void)
+{
+ struct toi_module_ops *this_module;
+ char *buffer = (char *) toi_get_zeroed_page(22, TOI_ATOMIC_GFP);
+ int len, index = 1;
+ struct toi_module_header toi_module_header;
+
+ if (!buffer) {
+ printk(KERN_INFO "Failed to allocate a buffer for saving "
+ "module configuration info.\n");
+ return -ENOMEM;
+ }
+
+ /*
+	 * We have to know which data goes with which module, so we write
+	 * at least a length of zero for every module. Note that we are
+ * also assuming every module's config data takes <= PAGE_SIZE.
+ */
+
+ /* For each module (in registration order) */
+ list_for_each_entry(this_module, &toi_modules, module_list) {
+ if (!this_module->enabled || !this_module->storage_needed ||
+ (this_module->type == WRITER_MODULE &&
+ toiActiveAllocator != this_module))
+ continue;
+
+ /* Get the data from the module */
+ len = 0;
+ if (this_module->save_config_info)
+ len = this_module->save_config_info(buffer);
+
+ /* Save the details of the module */
+ toi_module_header.enabled = this_module->enabled;
+ toi_module_header.type = this_module->type;
+ toi_module_header.index = index++;
+ strncpy(toi_module_header.name, this_module->name,
+ sizeof(toi_module_header.name));
+ toiActiveAllocator->rw_header_chunk(WRITE,
+ this_module,
+ (char *) &toi_module_header,
+ sizeof(toi_module_header));
+
+ /* Save the size of the data and any data returned */
+ toiActiveAllocator->rw_header_chunk(WRITE,
+ this_module,
+ (char *) &len, sizeof(int));
+ if (len)
+ toiActiveAllocator->rw_header_chunk(
+ WRITE, this_module, buffer, len);
+ }
+
+ /* Write a blank header to terminate the list */
+ toi_module_header.name[0] = '\0';
+ toiActiveAllocator->rw_header_chunk(WRITE, NULL,
+ (char *) &toi_module_header, sizeof(toi_module_header));
+
+ toi_free_page(22, (unsigned long) buffer);
+ return 0;
+}
+
+/**
+ * read_one_module_config - read and configure one module
+ *
+ * Read the configuration for one module, and configure the module
+ * to match if it is loaded.
+ *
+ * Returns: Int
+ * Zero on success, Error value otherwise.
+ **/
+static int read_one_module_config(struct toi_module_header *header)
+{
+ struct toi_module_ops *this_module;
+ int result, len;
+ char *buffer;
+
+ /* Find the module */
+ this_module = toi_find_module_given_name(header->name);
+
+ if (!this_module) {
+ if (header->enabled) {
+ toi_early_boot_message(1, TOI_CONTINUE_REQ,
+ "It looks like we need module %s for reading "
+ "the image but it hasn't been registered.\n",
+ header->name);
+ if (!(test_toi_state(TOI_CONTINUE_REQ)))
+ return -EINVAL;
+ } else
+ printk(KERN_INFO "Module %s configuration data found, "
+ "but the module hasn't registered. Looks like "
+				"it was disabled, so we're ignoring its data.\n",
+ header->name);
+ }
+
+ /* Get the length of the data (if any) */
+ result = toiActiveAllocator->rw_header_chunk(READ, NULL, (char *) &len,
+ sizeof(int));
+ if (result) {
+ printk(KERN_ERR "Failed to read the length of the module %s's"
+ " configuration data.\n",
+ header->name);
+ return -EINVAL;
+ }
+
+ /* Read any data and pass to the module (if we found one) */
+ if (!len)
+ return 0;
+
+ buffer = (char *) toi_get_zeroed_page(23, TOI_ATOMIC_GFP);
+
+ if (!buffer) {
+ printk(KERN_ERR "Failed to allocate a buffer for reloading "
+ "module configuration info.\n");
+ return -ENOMEM;
+ }
+
+ toiActiveAllocator->rw_header_chunk(READ, NULL, buffer, len);
+
+ if (!this_module)
+ goto out;
+
+	if (!this_module->load_config_info)
+ printk(KERN_ERR "Huh? Module %s appears to have a "
+ "save_config_info, but not a load_config_info "
+ "function!\n", this_module->name);
+ else
+ this_module->load_config_info(buffer, len);
+
+ /*
+ * Now move this module to the tail of its lists. This will put it in
+ * order. Any new modules will end up at the top of the lists. They
+ * should have been set to disabled when loaded (people will
+ * normally not edit an initrd to load a new module and then hibernate
+ * without using it!).
+ */
+
+ toi_move_module_tail(this_module);
+
+ this_module->enabled = header->enabled;
+
+out:
+ toi_free_page(23, (unsigned long) buffer);
+ return 0;
+}
+
+/**
+ * read_module_configs - reload module configurations from the image header.
+ *
+ * Returns: Int
+ * Zero on success or an error code.
+ **/
+static int read_module_configs(void)
+{
+ int result = 0;
+ struct toi_module_header toi_module_header;
+ struct toi_module_ops *this_module;
+
+ /* All modules are initially disabled. That way, if we have a module
+ * loaded now that wasn't loaded when we hibernated, it won't be used
+ * in trying to read the data.
+ */
+ list_for_each_entry(this_module, &toi_modules, module_list)
+ this_module->enabled = 0;
+
+ /* Get the first module header */
+ result = toiActiveAllocator->rw_header_chunk(READ, NULL,
+ (char *) &toi_module_header,
+ sizeof(toi_module_header));
+ if (result) {
+ printk(KERN_ERR "Failed to read the next module header.\n");
+ return -EINVAL;
+ }
+
+ /* For each module (in registration order) */
+ while (toi_module_header.name[0]) {
+ result = read_one_module_config(&toi_module_header);
+
+ if (result)
+ return -EINVAL;
+
+ /* Get the next module header */
+ result = toiActiveAllocator->rw_header_chunk(READ, NULL,
+ (char *) &toi_module_header,
+ sizeof(toi_module_header));
+
+ if (result) {
+ printk(KERN_ERR "Failed to read the next module "
+ "header.\n");
+ return -EINVAL;
+ }
+ }
+
+ return 0;
+}
+
+static inline int save_fs_info(struct fs_info *fs, struct block_device *bdev)
+{
+ return (!fs || IS_ERR(fs) || !fs->last_mount_size) ? 0 : 1;
+}
+
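+/*
+ * Filesystem information is stored in the image header as a count of
+ * records followed by, for each mounted device-backed filesystem: a 16
+ * byte uuid, the dev_t, the length of the last-mount data and the
+ * last-mount data itself. fs_info_space_needed() returns the number of
+ * bytes this will occupy.
+ */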
+int fs_info_space_needed(void)
+{
+ const struct super_block *sb;
+ int result = sizeof(int);
+
+ list_for_each_entry(sb, &super_blocks, s_list) {
+ struct fs_info *fs;
+
+ if (!sb->s_bdev)
+ continue;
+
+ fs = fs_info_from_block_dev(sb->s_bdev);
+ if (save_fs_info(fs, sb->s_bdev))
+ result += 16 + sizeof(dev_t) + sizeof(int) +
+ fs->last_mount_size;
+ free_fs_info(fs);
+ }
+ return result;
+}
+
+static int fs_info_num_to_save(void)
+{
+ const struct super_block *sb;
+ int to_save = 0;
+
+ list_for_each_entry(sb, &super_blocks, s_list) {
+ struct fs_info *fs;
+
+ if (!sb->s_bdev)
+ continue;
+
+ fs = fs_info_from_block_dev(sb->s_bdev);
+ if (save_fs_info(fs, sb->s_bdev))
+ to_save++;
+ free_fs_info(fs);
+ }
+
+ return to_save;
+}
+
+static int fs_info_save(void)
+{
+ const struct super_block *sb;
+ int to_save = fs_info_num_to_save();
+
+ if (toiActiveAllocator->rw_header_chunk(WRITE, NULL, (char *) &to_save,
+ sizeof(int))) {
+ abort_hibernate(TOI_FAILED_IO, "Failed to write num fs_info"
+ " to save.");
+ return -EIO;
+ }
+
+ list_for_each_entry(sb, &super_blocks, s_list) {
+ struct fs_info *fs;
+
+ if (!sb->s_bdev)
+ continue;
+
+ fs = fs_info_from_block_dev(sb->s_bdev);
+ if (save_fs_info(fs, sb->s_bdev)) {
+ if (toiActiveAllocator->rw_header_chunk(WRITE, NULL,
+ &fs->uuid[0], 16)) {
+ abort_hibernate(TOI_FAILED_IO, "Failed to "
+ "write uuid.");
+ return -EIO;
+ }
+ if (toiActiveAllocator->rw_header_chunk(WRITE, NULL,
+ (char *) &fs->dev_t, sizeof(dev_t))) {
+ abort_hibernate(TOI_FAILED_IO, "Failed to "
+ "write dev_t.");
+ return -EIO;
+ }
+ if (toiActiveAllocator->rw_header_chunk(WRITE, NULL,
+ (char *) &fs->last_mount_size, sizeof(int))) {
+ abort_hibernate(TOI_FAILED_IO, "Failed to "
+ "write last mount length.");
+ return -EIO;
+ }
+ if (toiActiveAllocator->rw_header_chunk(WRITE, NULL,
+ fs->last_mount, fs->last_mount_size)) {
+ abort_hibernate(TOI_FAILED_IO, "Failed to "
+					"write last mount timestamp.");
+ return -EIO;
+ }
+ }
+ free_fs_info(fs);
+ }
+ return 0;
+}
+
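+/**
+ * fs_info_load_and_check_one - check one filesystem record from the header
+ *
+ * Read a uuid, a dev_t (header version 3 onwards) and the last-mount data
+ * from the image header, look up the matching device and compare its
+ * current last-mount data with what was saved. A non-zero return means the
+ * comparison failed or could not be made.
+ **/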
+static int fs_info_load_and_check_one(void)
+{
+ char uuid[16], *last_mount;
+ int result = 0, ln;
+ dev_t dev_t;
+ struct block_device *dev;
+ struct fs_info *fs_info, seek;
+
+ if (toiActiveAllocator->rw_header_chunk(READ, NULL, uuid, 16)) {
+ abort_hibernate(TOI_FAILED_IO, "Failed to read uuid.");
+ return -EIO;
+ }
+
+ read_if_version(3, dev_t, "uuid dev_t field", return -EIO);
+
+ if (toiActiveAllocator->rw_header_chunk(READ, NULL, (char *) &ln,
+ sizeof(int))) {
+ abort_hibernate(TOI_FAILED_IO,
+ "Failed to read last mount size.");
+ return -EIO;
+ }
+
+ last_mount = kzalloc(ln, GFP_KERNEL);
+
+ if (!last_mount)
+ return -ENOMEM;
+
+ if (toiActiveAllocator->rw_header_chunk(READ, NULL, last_mount, ln)) {
+ abort_hibernate(TOI_FAILED_IO,
+ "Failed to read last mount timestamp.");
+ result = -EIO;
+ goto out_lmt;
+ }
+
+ strncpy((char *) &seek.uuid, uuid, 16);
+ seek.dev_t = dev_t;
+ seek.last_mount_size = ln;
+ seek.last_mount = last_mount;
+ dev_t = blk_lookup_fs_info(&seek);
+ if (!dev_t)
+ goto out_lmt;
+
+ dev = toi_open_by_devnum(dev_t);
+
+ fs_info = fs_info_from_block_dev(dev);
+ if (fs_info && !IS_ERR(fs_info)) {
+ if (ln != fs_info->last_mount_size) {
+ printk(KERN_EMERG "Found matching uuid but last mount "
+ "time lengths differ?! "
+ "(%d vs %d).\n", ln,
+ fs_info->last_mount_size);
+ result = -EINVAL;
+ } else {
+ char buf[BDEVNAME_SIZE];
+ result = !!memcmp(fs_info->last_mount, last_mount, ln);
+ if (result)
+ printk(KERN_EMERG "Last mount time for %s has "
+ "changed!\n", bdevname(dev, buf));
+ }
+ }
+ toi_close_bdev(dev);
+ free_fs_info(fs_info);
+out_lmt:
+ kfree(last_mount);
+ return result;
+}
+
+static int fs_info_load_and_check(void)
+{
+ int to_do, result = 0;
+
+ if (toiActiveAllocator->rw_header_chunk(READ, NULL, (char *) &to_do,
+ sizeof(int))) {
+ abort_hibernate(TOI_FAILED_IO, "Failed to read num fs_info "
+ "to load.");
+ return -EIO;
+ }
+
+	while (to_do--)
+ result |= fs_info_load_and_check_one();
+
+ return result;
+}
+
+/**
+ * write_image_header - write the image header after writing the image proper
+ *
+ * Returns: Int
+ * Zero on success, error value otherwise.
+ **/
+int write_image_header(void)
+{
+ int ret;
+ int total = pagedir1.size + pagedir2.size+2;
+ char *header_buffer = NULL;
+
+ /* Now prepare to write the header */
+ ret = toiActiveAllocator->write_header_init();
+ if (ret) {
+ abort_hibernate(TOI_FAILED_MODULE_INIT,
+ "Active allocator's write_header_init"
+ " function failed.");
+ goto write_image_header_abort;
+ }
+
+ /* Get a buffer */
+ header_buffer = (char *) toi_get_zeroed_page(24, TOI_ATOMIC_GFP);
+ if (!header_buffer) {
+ abort_hibernate(TOI_OUT_OF_MEMORY,
+ "Out of memory when trying to get page for header!");
+ goto write_image_header_abort;
+ }
+
+ /* Write hibernate header */
+ if (fill_toi_header((struct toi_header *) header_buffer)) {
+ abort_hibernate(TOI_OUT_OF_MEMORY,
+ "Failure to fill header information!");
+ goto write_image_header_abort;
+ }
+
+ if (toiActiveAllocator->rw_header_chunk(WRITE, NULL,
+ header_buffer, sizeof(struct toi_header))) {
+ abort_hibernate(TOI_OUT_OF_MEMORY,
+ "Failure to write header info.");
+ goto write_image_header_abort;
+ }
+
+ if (toiActiveAllocator->rw_header_chunk(WRITE, NULL,
+ (char *) &toi_max_workers, sizeof(toi_max_workers))) {
+ abort_hibernate(TOI_OUT_OF_MEMORY,
+			"Failure to write the number of workers to use.");
+ goto write_image_header_abort;
+ }
+
+ /* Write filesystem info */
+ if (fs_info_save())
+ goto write_image_header_abort;
+
+ /* Write module configurations */
+ ret = write_module_configs();
+ if (ret) {
+ abort_hibernate(TOI_FAILED_IO,
+ "Failed to write module configs.");
+ goto write_image_header_abort;
+ }
+
+ if (memory_bm_write(pageset1_map,
+ toiActiveAllocator->rw_header_chunk)) {
+ abort_hibernate(TOI_FAILED_IO,
+ "Failed to write bitmaps.");
+ goto write_image_header_abort;
+ }
+
+ /* Flush data and let allocator cleanup */
+ if (toiActiveAllocator->write_header_cleanup()) {
+ abort_hibernate(TOI_FAILED_IO,
+ "Failed to cleanup writing header.");
+ goto write_image_header_abort_no_cleanup;
+ }
+
+ if (test_result_state(TOI_ABORTED))
+ goto write_image_header_abort_no_cleanup;
+
+ toi_update_status(total, total, NULL);
+
+out:
+ if (header_buffer)
+ toi_free_page(24, (unsigned long) header_buffer);
+ return ret;
+
+write_image_header_abort:
+ toiActiveAllocator->write_header_cleanup();
+write_image_header_abort_no_cleanup:
+ ret = -1;
+ goto out;
+}
+
+/**
+ * sanity_check - check the header
+ * @sh: the header which was saved at hibernate time.
+ *
+ * Perform a few checks, seeking to ensure that the kernel being
+ * booted matches the one hibernated. They need to match so we can
+ * be _sure_ things will work. It is not absolutely impossible for
+ * resuming from a different kernel to work, just not assured.
+ **/
+static char *sanity_check(struct toi_header *sh)
+{
+ char *reason = check_image_kernel((struct swsusp_info *) sh);
+
+ if (reason)
+ return reason;
+
+ if (!test_action_state(TOI_IGNORE_ROOTFS)) {
+ const struct super_block *sb;
+ list_for_each_entry(sb, &super_blocks, s_list) {
+ if ((!(sb->s_flags & MS_RDONLY)) &&
+ (sb->s_type->fs_flags & FS_REQUIRES_DEV))
+ return "Device backed fs has been mounted "
+ "rw prior to resume or initrd/ramfs "
+ "is mounted rw.";
+ }
+ }
+
+ return NULL;
+}
+
+static DECLARE_WAIT_QUEUE_HEAD(freeze_wait);
+
+#define FREEZE_IN_PROGRESS (~0)
+
+static int freeze_result;
+
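+/*
+ * Run from a workqueue so that freezing can proceed while we read the
+ * image: freeze processes, record the result, wake anyone waiting on
+ * freeze_wait and start trapping non-TuxOnIce I/O.
+ */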
+static void do_freeze(struct work_struct *dummy)
+{
+ freeze_result = freeze_processes();
+ wake_up(&freeze_wait);
+ trap_non_toi_io = 1;
+}
+
+static DECLARE_WORK(freeze_work, do_freeze);
+
+/**
+ * __read_pageset1 - test for the existence of an image and attempt to load it
+ *
+ * Returns: Int
+ * Zero if image found and pageset1 successfully loaded.
+ * Error if no image found or loaded.
+ **/
+static int __read_pageset1(void)
+{
+ int i, result = 0;
+ char *header_buffer = (char *) toi_get_zeroed_page(25, TOI_ATOMIC_GFP),
+ *sanity_error = NULL;
+ struct toi_header *toi_header;
+
+ if (!header_buffer) {
+ printk(KERN_INFO "Unable to allocate a page for reading the "
+ "signature.\n");
+ return -ENOMEM;
+ }
+
+ /* Check for an image */
+ result = toiActiveAllocator->image_exists(1);
+ if (result == 3) {
+ result = -ENODATA;
+ toi_early_boot_message(1, 0, "The signature from an older "
+ "version of TuxOnIce has been detected.");
+ goto out_remove_image;
+ }
+
+ if (result != 1) {
+ result = -ENODATA;
+ noresume_reset_modules();
+ printk(KERN_INFO "TuxOnIce: No image found.\n");
+ goto out;
+ }
+
+ /*
+	 * Prepare the active allocator for reading the image header. The
+	 * active allocator might read its own configuration.
+	 *
+	 * NB: This call may never return: the signature might belong to a
+	 * different image (for example, device ids that look wrong, 2.4 vs
+	 * 2.6), or the image might be unavailable because it was stored
+	 * over a network connection. In those cases we warn the user and
+	 * they may choose to reboot rather than continue.
+ */
+
+ result = toiActiveAllocator->read_header_init();
+ if (result) {
+		printk(KERN_INFO "TuxOnIce: Failed to initialise reading the "
+ "image header.\n");
+ goto out_remove_image;
+ }
+
+ /* Check for noresume command line option */
+ if (test_toi_state(TOI_NORESUME_SPECIFIED)) {
+ printk(KERN_INFO "TuxOnIce: Noresume on command line. Removed "
+ "image.\n");
+ goto out_remove_image;
+ }
+
+ /* Check whether we've resumed before */
+ if (test_toi_state(TOI_RESUMED_BEFORE)) {
+ toi_early_boot_message(1, 0, NULL);
+ if (!(test_toi_state(TOI_CONTINUE_REQ))) {
+ printk(KERN_INFO "TuxOnIce: Tried to resume before: "
+ "Invalidated image.\n");
+ goto out_remove_image;
+ }
+ }
+
+ clear_toi_state(TOI_CONTINUE_REQ);
+
+ toi_image_header_version = toiActiveAllocator->get_header_version();
+
+ if (unlikely(toi_image_header_version > TOI_HEADER_VERSION)) {
+ toi_early_boot_message(1, 0, image_version_error);
+ if (!(test_toi_state(TOI_CONTINUE_REQ))) {
+ printk(KERN_INFO "TuxOnIce: Header version too new: "
+ "Invalidated image.\n");
+ goto out_remove_image;
+ }
+ }
+
+ /* Read hibernate header */
+ result = toiActiveAllocator->rw_header_chunk(READ, NULL,
+ header_buffer, sizeof(struct toi_header));
+ if (result < 0) {
+ printk(KERN_ERR "TuxOnIce: Failed to read the image "
+ "signature.\n");
+ goto out_remove_image;
+ }
+
+ toi_header = (struct toi_header *) header_buffer;
+
+ /*
+ * NB: This call may also result in a reboot rather than returning.
+ */
+
+ sanity_error = sanity_check(toi_header);
+ if (sanity_error) {
+ toi_early_boot_message(1, TOI_CONTINUE_REQ,
+ sanity_error);
+ printk(KERN_INFO "TuxOnIce: Sanity check failed.\n");
+ goto out_remove_image;
+ }
+
+ /*
+ * We have an image and it looks like it will load okay.
+ *
+ * Get metadata from header. Don't override commandline parameters.
+ *
+ * We don't need to save the image size limit because it's not used
+ * during resume and will be restored with the image anyway.
+ */
+
+ memcpy((char *) &pagedir1,
+ (char *) &toi_header->pagedir, sizeof(pagedir1));
+ toi_result = toi_header->param0;
+ if (!toi_bkd.toi_debug_state) {
+ toi_bkd.toi_action =
+ (toi_header->param1 & ~toi_bootflags_mask) |
+ (toi_bkd.toi_action & toi_bootflags_mask);
+ toi_bkd.toi_debug_state = toi_header->param2;
+ toi_bkd.toi_default_console_level = toi_header->param3;
+ }
+ clear_toi_state(TOI_IGNORE_LOGLEVEL);
+ pagedir2.size = toi_header->pageset_2_size;
+ for (i = 0; i < 4; i++)
+ toi_bkd.toi_io_time[i/2][i%2] =
+ toi_header->io_time[i/2][i%2];
+
+ set_toi_state(TOI_BOOT_KERNEL);
+ boot_kernel_data_buffer = toi_header->bkd;
+
+ read_if_version(1, toi_max_workers, "TuxOnIce max workers",
+ goto out_remove_image);
+
+ /* Read filesystem info */
+ if (fs_info_load_and_check()) {
+ printk(KERN_EMERG "TuxOnIce: File system mount time checks "
+ "failed. Refusing to corrupt your filesystems!\n");
+ goto out_remove_image;
+ }
+
+ /* Read module configurations */
+ result = read_module_configs();
+ if (result) {
+ pagedir1.size = 0;
+ pagedir2.size = 0;
+ printk(KERN_INFO "TuxOnIce: Failed to read TuxOnIce module "
+ "configurations.\n");
+ clear_action_state(TOI_KEEP_IMAGE);
+ goto out_remove_image;
+ }
+
+ toi_prepare_console();
+
+ set_toi_state(TOI_NOW_RESUMING);
+
+ if (!test_action_state(TOI_LATE_CPU_HOTPLUG)) {
+ toi_prepare_status(DONT_CLEAR_BAR, "Disable nonboot cpus.");
+ if (disable_nonboot_cpus()) {
+ set_abort_result(TOI_CPU_HOTPLUG_FAILED);
+ goto out_reset_console;
+ }
+ }
+
+ result = pm_notifier_call_chain(PM_RESTORE_PREPARE);
+ if (result)
+		goto out_notifier_call_chain;
+
+ if (usermodehelper_disable())
+ goto out_enable_nonboot_cpus;
+
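+	/*
+	 * This task must keep running while everything else is frozen, so
+	 * mark it PF_NOFREEZE and hand the freeze off to a work item on the
+	 * first online CPU. We wait on freeze_wait before returning.
+	 */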
+ current->flags |= PF_NOFREEZE;
+ freeze_result = FREEZE_IN_PROGRESS;
+
+ schedule_work_on(cpumask_first(cpu_online_mask), &freeze_work);
+
+ toi_cond_pause(1, "About to read original pageset1 locations.");
+
+ /*
+ * See _toi_rw_header_chunk in tuxonice_bio.c:
+ * Initialize pageset1_map by reading the map from the image.
+ */
+ if (memory_bm_read(pageset1_map, toiActiveAllocator->rw_header_chunk))
+ goto out_thaw;
+
+ /*
+ * See toi_rw_cleanup in tuxonice_bio.c:
+ * Clean up after reading the header.
+ */
+ result = toiActiveAllocator->read_header_cleanup();
+ if (result) {
+ printk(KERN_ERR "TuxOnIce: Failed to cleanup after reading the "
+ "image header.\n");
+ goto out_thaw;
+ }
+
+ toi_cond_pause(1, "About to read pagedir.");
+
+ /*
+ * Get the addresses of pages into which we will load the kernel to
+ * be copied back and check if they conflict with the ones we are using.
+ */
+ if (toi_get_pageset1_load_addresses()) {
+ printk(KERN_INFO "TuxOnIce: Failed to get load addresses for "
+ "pageset1.\n");
+ goto out_thaw;
+ }
+
+ /* Read the original kernel back */
+ toi_cond_pause(1, "About to read pageset 1.");
+
+ /* Given the pagemap, read back the data from disk */
+ if (read_pageset(&pagedir1, 0)) {
+ toi_prepare_status(DONT_CLEAR_BAR, "Failed to read pageset 1.");
+ result = -EIO;
+ goto out_thaw;
+ }
+
+ toi_cond_pause(1, "About to restore original kernel.");
+ result = 0;
+
+ if (!test_action_state(TOI_KEEP_IMAGE) &&
+ toiActiveAllocator->mark_resume_attempted)
+ toiActiveAllocator->mark_resume_attempted(1);
+
+ wait_event(freeze_wait, freeze_result != FREEZE_IN_PROGRESS);
+out:
+ current->flags &= ~PF_NOFREEZE;
+ toi_free_page(25, (unsigned long) header_buffer);
+ return result;
+
+out_thaw:
+ wait_event(freeze_wait, freeze_result != FREEZE_IN_PROGRESS);
+ trap_non_toi_io = 0;
+ thaw_processes();
+ usermodehelper_enable();
+out_enable_nonboot_cpus:
+ enable_nonboot_cpus();
+out_notifier_call_chain:
+ pm_notifier_call_chain(PM_POST_RESTORE);
+out_reset_console:
+ toi_cleanup_console();
+out_remove_image:
+ result = -EINVAL;
+ if (!test_action_state(TOI_KEEP_IMAGE))
+ toiActiveAllocator->remove_image();
+ toiActiveAllocator->read_header_cleanup();
+ noresume_reset_modules();
+ goto out;
+}
+
+/**
+ * read_pageset1 - highlevel function to read the saved pages
+ *
+ * Attempt to read the header and pageset1 of a hibernate image.
+ * Handle the outcome, complaining where appropriate.
+ **/
+int read_pageset1(void)
+{
+ int error;
+
+ error = __read_pageset1();
+
+ if (error && error != -ENODATA && error != -EINVAL &&
+ !test_result_state(TOI_ABORTED))
+ abort_hibernate(TOI_IMAGE_ERROR,
+ "TuxOnIce: Error %d resuming\n", error);
+
+ return error;
+}
+
+/**
+ * get_have_image_data - check whether an image exists and describe it
+ *
+ * Returns a page-sized buffer: "0\n" if no usable image is found, or "1\n"
+ * followed by the machine and kernel version recorded in the image header
+ * (plus a note if we have already tried to resume).
+ **/
+static char *get_have_image_data(void)
+{
+ char *output_buffer = (char *) toi_get_zeroed_page(26, TOI_ATOMIC_GFP);
+ struct toi_header *toi_header;
+
+ if (!output_buffer) {
+ printk(KERN_INFO "Output buffer null.\n");
+ return NULL;
+ }
+
+ /* Check for an image */
+ if (!toiActiveAllocator->image_exists(1) ||
+ toiActiveAllocator->read_header_init() ||
+ toiActiveAllocator->rw_header_chunk(READ, NULL,
+ output_buffer, sizeof(struct toi_header))) {
+ sprintf(output_buffer, "0\n");
+ /*
+ * From an initrd/ramfs, catting have_image and
+ * getting a result of 0 is sufficient.
+ */
+ clear_toi_state(TOI_BOOT_TIME);
+ goto out;
+ }
+
+ toi_header = (struct toi_header *) output_buffer;
+
+ sprintf(output_buffer, "1\n%s\n%s\n",
+ toi_header->uts.machine,
+ toi_header->uts.version);
+
+ /* Check whether we've resumed before */
+ if (test_toi_state(TOI_RESUMED_BEFORE))
+ strcat(output_buffer, "Resumed before.\n");
+
+out:
+ noresume_reset_modules();
+ return output_buffer;
+}
+
+/**
+ * read_pageset2 - read second part of the image
+ * @overwrittenpagesonly: Read only pages which would have been
+ *			overwritten by pageset1?
+ *
+ * Read in part or all of pageset2 of an image, depending upon
+ * whether we are hibernating and have only overwritten a portion
+ * with pageset1 pages, or are resuming and need to read them
+ * all.
+ *
+ * Returns: Int
+ * Zero if no error, otherwise the error value.
+ **/
+int read_pageset2(int overwrittenpagesonly)
+{
+ int result = 0;
+
+ if (!pagedir2.size)
+ return 0;
+
+ result = read_pageset(&pagedir2, overwrittenpagesonly);
+
+ toi_cond_pause(1, "Pagedir 2 read.");
+
+ return result;
+}
+
+/**
+ * image_exists_read - has an image been found?
+ * @page: Output buffer
+ *
+ * Store -1, 0 or 1 in page: -1 if there is no active allocator,
+ * otherwise 0 or 1 depending on whether an image is found.
+ * Incoming buffer is PAGE_SIZE and result is guaranteed
+ * to be far less than that, so we don't worry about
+ * overflow.
+ **/
+int image_exists_read(const char *page, int count)
+{
+ int len = 0;
+ char *result;
+
+ if (toi_activate_storage(0))
+ return count;
+
+ if (!test_toi_state(TOI_RESUME_DEVICE_OK))
+ toi_attempt_to_parse_resume_device(0);
+
+ if (!toiActiveAllocator) {
+ len = sprintf((char *) page, "-1\n");
+ } else {
+ result = get_have_image_data();
+ if (result) {
+ len = sprintf((char *) page, "%s", result);
+ toi_free_page(26, (unsigned long) result);
+ }
+ }
+
+ toi_deactivate_storage(0);
+
+ return len;
+}
+
+/**
+ * image_exists_write - invalidate an image if one exists
+ **/
+int image_exists_write(const char *buffer, int count)
+{
+ if (toi_activate_storage(0))
+ return count;
+
+ if (toiActiveAllocator && toiActiveAllocator->image_exists(1))
+ toiActiveAllocator->remove_image();
+
+ toi_deactivate_storage(0);
+
+ clear_result_state(TOI_KEPT_IMAGE);
+
+ return count;
+}
diff --git a/kernel/power/tuxonice_io.h b/kernel/power/tuxonice_io.h
new file mode 100644
index 0000000..6f740ca
--- /dev/null
+++ b/kernel/power/tuxonice_io.h
@@ -0,0 +1,74 @@
+/*
+ * kernel/power/tuxonice_io.h
+ *
+ * Copyright (C) 2005-2014 Nigel Cunningham (nigel at tuxonice net)
+ *
+ * This file is released under the GPLv2.
+ *
+ * It contains high level IO routines for hibernating.
+ *
+ */
+
+#include <linux/utsname.h>
+#include "tuxonice_pagedir.h"
+
+/* Non-module data saved in our image header */
+struct toi_header {
+ /*
+ * Mirror struct swsusp_info, but without
+ * the page aligned attribute
+ */
+ struct new_utsname uts;
+ u32 version_code;
+ unsigned long num_physpages;
+ int cpus;
+ unsigned long image_pages;
+ unsigned long pages;
+ unsigned long size;
+
+ /* Our own data */
+ unsigned long orig_mem_free;
+ int page_size;
+ int pageset_2_size;
+ int param0;
+ int param1;
+ int param2;
+ int param3;
+ int progress0;
+ int progress1;
+ int progress2;
+ int progress3;
+ int io_time[2][2];
+ struct pagedir pagedir;
+ dev_t root_fs;
+ unsigned long bkd; /* Boot kernel data locn */
+};
+
+extern int write_pageset(struct pagedir *pagedir);
+extern int write_image_header(void);
+extern int read_pageset1(void);
+extern int read_pageset2(int overwrittenpagesonly);
+
+extern int toi_attempt_to_parse_resume_device(int quiet);
+extern void attempt_to_parse_resume_device2(void);
+extern void attempt_to_parse_alt_resume_param(void);
+int image_exists_read(const char *page, int count);
+int image_exists_write(const char *buffer, int count);
+extern void save_restore_alt_param(int replace, int quiet);
+extern atomic_t toi_io_workers;
+
+/* Args to save_restore_alt_param */
+#define RESTORE 0
+#define SAVE 1
+
+#define NOQUIET 0
+#define QUIET 1
+
+extern dev_t name_to_dev_t(char *line);
+
+extern wait_queue_head_t toi_io_queue_flusher;
+extern int toi_bio_queue_flusher_should_finish;
+
+int fs_info_space_needed(void);
+
+extern int toi_max_workers;
diff --git a/kernel/power/tuxonice_modules.c b/kernel/power/tuxonice_modules.c
new file mode 100644
index 0000000..9e794cb
--- /dev/null
+++ b/kernel/power/tuxonice_modules.c
@@ -0,0 +1,522 @@
+/*
+ * kernel/power/tuxonice_modules.c
+ *
+ * Copyright (C) 2004-2014 Nigel Cunningham (nigel at tuxonice net)
+ *
+ */
+
+#include <linux/suspend.h>
+#include "tuxonice.h"
+#include "tuxonice_modules.h"
+#include "tuxonice_sysfs.h"
+#include "tuxonice_ui.h"
+
+LIST_HEAD(toi_filters);
+LIST_HEAD(toiAllocators);
+
+LIST_HEAD(toi_modules);
+EXPORT_SYMBOL_GPL(toi_modules);
+
+struct toi_module_ops *toiActiveAllocator;
+EXPORT_SYMBOL_GPL(toiActiveAllocator);
+
+static int toi_num_filters;
+int toiNumAllocators, toi_num_modules;
+
+/*
+ * toi_header_storage_for_modules
+ *
+ * Returns the amount of space needed to store configuration
+ * data needed by the modules prior to copying back the original
+ * kernel. We can exclude data for pageset2 because it will be
+ * available anyway once the kernel is copied back.
+ */
+long toi_header_storage_for_modules(void)
+{
+ struct toi_module_ops *this_module;
+ int bytes = 0;
+
+ list_for_each_entry(this_module, &toi_modules, module_list) {
+ if (!this_module->enabled ||
+ (this_module->type == WRITER_MODULE &&
+ toiActiveAllocator != this_module))
+ continue;
+ if (this_module->storage_needed) {
+ int this = this_module->storage_needed() +
+ sizeof(struct toi_module_header) +
+ sizeof(int);
+ this_module->header_requested = this;
+ bytes += this;
+ }
+ }
+
+ /* One more for the empty terminator */
+ return bytes + sizeof(struct toi_module_header);
+}
+
+void print_toi_header_storage_for_modules(void)
+{
+ struct toi_module_ops *this_module;
+ int bytes = 0;
+
+ printk(KERN_DEBUG "Header storage:\n");
+ list_for_each_entry(this_module, &toi_modules, module_list) {
+ if (!this_module->enabled ||
+ (this_module->type == WRITER_MODULE &&
+ toiActiveAllocator != this_module))
+ continue;
+ if (this_module->storage_needed) {
+ int this = this_module->storage_needed() +
+ sizeof(struct toi_module_header) +
+ sizeof(int);
+ this_module->header_requested = this;
+ bytes += this;
+ printk(KERN_DEBUG "+ %16s : %-4d/%d.\n",
+ this_module->name,
+ this_module->header_used, this);
+ }
+ }
+
+ printk(KERN_DEBUG "+ empty terminator : %zu.\n",
+ sizeof(struct toi_module_header));
+ printk(KERN_DEBUG " ====\n");
+ printk(KERN_DEBUG " %zu\n",
+ bytes + sizeof(struct toi_module_header));
+}
+EXPORT_SYMBOL_GPL(print_toi_header_storage_for_modules);
+
+/*
+ * toi_memory_for_modules
+ *
+ * Returns the amount of memory requested by modules for
+ * doing their work during the cycle.
+ */
+
+long toi_memory_for_modules(int print_parts)
+{
+ long bytes = 0, result;
+ struct toi_module_ops *this_module;
+
+ if (print_parts)
+ printk(KERN_INFO "Memory for modules:\n===================\n");
+ list_for_each_entry(this_module, &toi_modules, module_list) {
+ int this;
+ if (!this_module->enabled)
+ continue;
+ if (this_module->memory_needed) {
+ this = this_module->memory_needed();
+ if (print_parts)
+ printk(KERN_INFO "%10d bytes (%5ld pages) for "
+ "module '%s'.\n", this,
+ DIV_ROUND_UP(this, PAGE_SIZE),
+ this_module->name);
+ bytes += this;
+ }
+ }
+
+ result = DIV_ROUND_UP(bytes, PAGE_SIZE);
+ if (print_parts)
+ printk(KERN_INFO " => %ld bytes, %ld pages.\n", bytes, result);
+
+ return result;
+}
+
+/*
+ * toi_expected_compression_ratio
+ *
+ * Returns the compression ratio expected when saving the image.
+ */
+
+int toi_expected_compression_ratio(void)
+{
+ int ratio = 100;
+ struct toi_module_ops *this_module;
+
+ list_for_each_entry(this_module, &toi_modules, module_list) {
+ if (!this_module->enabled)
+ continue;
+ if (this_module->expected_compression)
+ ratio = ratio * this_module->expected_compression()
+ / 100;
+ }
+
+ return ratio;
+}
+
+/* toi_find_module_given_dir
+ * Functionality : Return a module (if found), given a pointer
+ * to its directory name
+ */
+
+static struct toi_module_ops *toi_find_module_given_dir(char *name)
+{
+ struct toi_module_ops *this_module, *found_module = NULL;
+
+ list_for_each_entry(this_module, &toi_modules, module_list) {
+ if (!strcmp(name, this_module->directory)) {
+ found_module = this_module;
+ break;
+ }
+ }
+
+ return found_module;
+}
+
+/* toi_find_module_given_name
+ * Functionality : Return a module (if found), given a pointer
+ * to its name
+ */
+
+struct toi_module_ops *toi_find_module_given_name(char *name)
+{
+ struct toi_module_ops *this_module, *found_module = NULL;
+
+ list_for_each_entry(this_module, &toi_modules, module_list) {
+ if (!strcmp(name, this_module->name)) {
+ found_module = this_module;
+ break;
+ }
+ }
+
+ return found_module;
+}
+
+/*
+ * toi_print_module_debug_info
+ * Functionality : Get debugging info from modules into a buffer.
+ */
+int toi_print_module_debug_info(char *buffer, int buffer_size)
+{
+ struct toi_module_ops *this_module;
+ int len = 0;
+
+ list_for_each_entry(this_module, &toi_modules, module_list) {
+ if (!this_module->enabled)
+ continue;
+ if (this_module->print_debug_info) {
+ int result;
+ result = this_module->print_debug_info(buffer + len,
+ buffer_size - len);
+ len += result;
+ }
+ }
+
+ /* Ensure null terminated */
+	buffer[buffer_size - 1] = '\0';
+
+ return len;
+}
+
+/*
+ * toi_register_module
+ *
+ * Register a module.
+ */
+int toi_register_module(struct toi_module_ops *module)
+{
+ int i;
+ struct kobject *kobj;
+
+ module->enabled = 1;
+
+ if (toi_find_module_given_name(module->name)) {
+ printk(KERN_INFO "TuxOnIce: Trying to load module %s,"
+ " which is already registered.\n",
+ module->name);
+ return -EBUSY;
+ }
+
+ switch (module->type) {
+ case FILTER_MODULE:
+ list_add_tail(&module->type_list, &toi_filters);
+ toi_num_filters++;
+ break;
+ case WRITER_MODULE:
+ list_add_tail(&module->type_list, &toiAllocators);
+ toiNumAllocators++;
+ break;
+ case MISC_MODULE:
+ case MISC_HIDDEN_MODULE:
+ case BIO_ALLOCATOR_MODULE:
+ break;
+ default:
+ printk(KERN_ERR "Hmmm. Module '%s' has an invalid type."
+ " It has been ignored.\n", module->name);
+ return -EINVAL;
+ }
+ list_add_tail(&module->module_list, &toi_modules);
+ toi_num_modules++;
+
+ if ((!module->directory && !module->shared_directory) ||
+ !module->sysfs_data || !module->num_sysfs_entries)
+ return 0;
+
+ /*
+ * Modules may share a directory, but those with shared_dir
+ * set must be loaded (via symbol dependencies) after parents
+ * and unloaded beforehand.
+ */
+ if (module->shared_directory) {
+ struct toi_module_ops *shared =
+ toi_find_module_given_dir(module->shared_directory);
+ if (!shared) {
+ printk(KERN_ERR "TuxOnIce: Module %s wants to share "
+ "%s's directory but %s isn't loaded.\n",
+ module->name, module->shared_directory,
+ module->shared_directory);
+ toi_unregister_module(module);
+ return -ENODEV;
+ }
+ kobj = shared->dir_kobj;
+ } else {
+ if (!strncmp(module->directory, "[ROOT]", 6))
+ kobj = tuxonice_kobj;
+ else
+ kobj = make_toi_sysdir(module->directory);
+ }
+ module->dir_kobj = kobj;
+ for (i = 0; i < module->num_sysfs_entries; i++) {
+ int result = toi_register_sysfs_file(kobj,
+ &module->sysfs_data[i]);
+ if (result)
+ return result;
+ }
+ return 0;
+}
+EXPORT_SYMBOL_GPL(toi_register_module);
+
+/*
+ * toi_unregister_module
+ *
+ * Remove a module.
+ */
+void toi_unregister_module(struct toi_module_ops *module)
+{
+ int i;
+
+ if (module->dir_kobj)
+ for (i = 0; i < module->num_sysfs_entries; i++)
+ toi_unregister_sysfs_file(module->dir_kobj,
+ &module->sysfs_data[i]);
+
+ if (!module->shared_directory && module->directory &&
+ strncmp(module->directory, "[ROOT]", 6))
+ remove_toi_sysdir(module->dir_kobj);
+
+ switch (module->type) {
+ case FILTER_MODULE:
+ list_del(&module->type_list);
+ toi_num_filters--;
+ break;
+ case WRITER_MODULE:
+ list_del(&module->type_list);
+ toiNumAllocators--;
+ if (toiActiveAllocator == module) {
+ toiActiveAllocator = NULL;
+ clear_toi_state(TOI_CAN_RESUME);
+ clear_toi_state(TOI_CAN_HIBERNATE);
+ }
+ break;
+ case MISC_MODULE:
+ case MISC_HIDDEN_MODULE:
+ case BIO_ALLOCATOR_MODULE:
+ break;
+ default:
+ printk(KERN_ERR "Module '%s' has an invalid type."
+ " It has been ignored.\n", module->name);
+ return;
+ }
+ list_del(&module->module_list);
+ toi_num_modules--;
+}
+EXPORT_SYMBOL_GPL(toi_unregister_module);
+
+/*
+ * toi_move_module_tail
+ *
+ * Rearrange modules when reloading the config.
+ */
+void toi_move_module_tail(struct toi_module_ops *module)
+{
+ switch (module->type) {
+ case FILTER_MODULE:
+ if (toi_num_filters > 1)
+ list_move_tail(&module->type_list, &toi_filters);
+ break;
+ case WRITER_MODULE:
+ if (toiNumAllocators > 1)
+ list_move_tail(&module->type_list, &toiAllocators);
+ break;
+ case MISC_MODULE:
+ case MISC_HIDDEN_MODULE:
+ case BIO_ALLOCATOR_MODULE:
+ break;
+ default:
+ printk(KERN_ERR "Module '%s' has an invalid type."
+ " It has been ignored.\n", module->name);
+ return;
+ }
+ if ((toi_num_filters + toiNumAllocators) > 1)
+ list_move_tail(&module->module_list, &toi_modules);
+}
+
+/*
+ * toi_initialise_modules
+ *
+ * Get ready to do some work!
+ */
+int toi_initialise_modules(int starting_cycle, int early)
+{
+ struct toi_module_ops *this_module;
+ int result;
+
+ list_for_each_entry(this_module, &toi_modules, module_list) {
+ this_module->header_requested = 0;
+ this_module->header_used = 0;
+ if (!this_module->enabled)
+ continue;
+ if (this_module->early != early)
+ continue;
+ if (this_module->initialise) {
+ result = this_module->initialise(starting_cycle);
+ if (result) {
+ toi_cleanup_modules(starting_cycle);
+ return result;
+ }
+ this_module->initialised = 1;
+ }
+ }
+
+ return 0;
+}
+
+/*
+ * toi_cleanup_modules
+ *
+ * Tell modules the work is done.
+ */
+void toi_cleanup_modules(int finishing_cycle)
+{
+ struct toi_module_ops *this_module;
+
+ list_for_each_entry(this_module, &toi_modules, module_list) {
+ if (!this_module->enabled || !this_module->initialised)
+ continue;
+ if (this_module->cleanup)
+ this_module->cleanup(finishing_cycle);
+ this_module->initialised = 0;
+ }
+}
+
+/*
+ * toi_pre_atomic_restore_modules
+ *
+ * Call the pre_atomic_restore hook of each enabled module.
+ */
+void toi_pre_atomic_restore_modules(struct toi_boot_kernel_data *bkd)
+{
+ struct toi_module_ops *this_module;
+
+ list_for_each_entry(this_module, &toi_modules, module_list) {
+ if (this_module->enabled && this_module->pre_atomic_restore)
+ this_module->pre_atomic_restore(bkd);
+ }
+}
+
+/*
+ * toi_post_atomic_restore_modules
+ *
+ * Call the post_atomic_restore hook of each enabled module.
+ */
+void toi_post_atomic_restore_modules(struct toi_boot_kernel_data *bkd)
+{
+ struct toi_module_ops *this_module;
+
+ list_for_each_entry(this_module, &toi_modules, module_list) {
+ if (this_module->enabled && this_module->post_atomic_restore)
+ this_module->post_atomic_restore(bkd);
+ }
+}
+
+/*
+ * toi_get_next_filter
+ *
+ * Get the next filter in the pipeline.
+ */
+struct toi_module_ops *toi_get_next_filter(struct toi_module_ops *filter_sought)
+{
+ struct toi_module_ops *last_filter = NULL, *this_filter = NULL;
+
+ list_for_each_entry(this_filter, &toi_filters, type_list) {
+ if (!this_filter->enabled)
+ continue;
+ if ((last_filter == filter_sought) || (!filter_sought))
+ return this_filter;
+ last_filter = this_filter;
+ }
+
+ return toiActiveAllocator;
+}
+EXPORT_SYMBOL_GPL(toi_get_next_filter);
+
+/**
+ * toi_print_modules - print what support is loaded.
+ */
+void toi_print_modules(void)
+{
+ struct toi_module_ops *this_module;
+ int prev = 0;
+
+ printk(KERN_INFO "TuxOnIce " TOI_CORE_VERSION ", with support for");
+
+ list_for_each_entry(this_module, &toi_modules, module_list) {
+ if (this_module->type == MISC_HIDDEN_MODULE)
+ continue;
+ printk("%s %s%s%s", prev ? "," : "",
+ this_module->enabled ? "" : "[",
+ this_module->name,
+ this_module->enabled ? "" : "]");
+ prev = 1;
+ }
+
+ printk(".\n");
+}
+
+/* toi_get_modules
+ *
+ * Take a reference to modules so they can't go away under us.
+ */
+
+int toi_get_modules(void)
+{
+ struct toi_module_ops *this_module;
+
+ list_for_each_entry(this_module, &toi_modules, module_list) {
+ struct toi_module_ops *this_module2;
+
+ if (try_module_get(this_module->module))
+ continue;
+
+ /* Failed! Reverse gets and return error */
+ list_for_each_entry(this_module2, &toi_modules,
+ module_list) {
+ if (this_module == this_module2)
+ return -EINVAL;
+ module_put(this_module2->module);
+ }
+ }
+ return 0;
+}
+
+/* toi_put_modules
+ *
+ * Release our references to modules we used.
+ */
+
+void toi_put_modules(void)
+{
+ struct toi_module_ops *this_module;
+
+ list_for_each_entry(this_module, &toi_modules, module_list)
+ module_put(this_module->module);
+}
diff --git a/kernel/power/tuxonice_modules.h b/kernel/power/tuxonice_modules.h
new file mode 100644
index 0000000..d488572
--- /dev/null
+++ b/kernel/power/tuxonice_modules.h
@@ -0,0 +1,211 @@
+/*
+ * kernel/power/tuxonice_modules.h
+ *
+ * Copyright (C) 2004-2014 Nigel Cunningham (nigel at tuxonice net)
+ *
+ * This file is released under the GPLv2.
+ *
+ * It contains declarations for modules. Modules are additions to
+ * TuxOnIce that provide facilities such as image compression or
+ * encryption, backends for storage of the image and user interfaces.
+ *
+ */
+
+#ifndef TOI_MODULES_H
+#define TOI_MODULES_H
+
+/* This is the maximum size we store in the image header for a module name */
+#define TOI_MAX_MODULE_NAME_LENGTH 30
+
+struct toi_boot_kernel_data;
+
+/* Per-module metadata */
+struct toi_module_header {
+ char name[TOI_MAX_MODULE_NAME_LENGTH];
+ int enabled;
+ int type;
+ int index;
+ int data_length;
+ unsigned long signature;
+};
+
+enum {
+ FILTER_MODULE,
+ WRITER_MODULE,
+ BIO_ALLOCATOR_MODULE,
+ MISC_MODULE,
+ MISC_HIDDEN_MODULE,
+};
+
+enum {
+ TOI_ASYNC,
+ TOI_SYNC
+};
+
+enum {
+ TOI_VIRT,
+ TOI_PAGE,
+};
+
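+/*
+ * I/O buffers are passed around either as ordinary virtual addresses
+ * (TOI_VIRT) or as struct page pointers (TOI_PAGE). TOI_MAP and
+ * TOI_UNMAP hide the kmap/kunmap needed in the TOI_PAGE case.
+ */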
+#define TOI_MAP(type, addr) \
+ (type == TOI_PAGE ? kmap(addr) : addr)
+
+#define TOI_UNMAP(type, addr) \
+ do { \
+ if (type == TOI_PAGE) \
+ kunmap(addr); \
+ } while(0)
+
+struct toi_module_ops {
+ /* Functions common to all modules */
+ int type;
+ char *name;
+ char *directory;
+ char *shared_directory;
+ struct kobject *dir_kobj;
+ struct module *module;
+ int enabled, early, initialised;
+ struct list_head module_list;
+
+ /* List of filters or allocators */
+ struct list_head list, type_list;
+
+ /*
+ * Requirements for memory and storage in
+	 * the image header.
+ */
+ int (*memory_needed) (void);
+ int (*storage_needed) (void);
+
+ int header_requested, header_used;
+
+ int (*expected_compression) (void);
+
+ /*
+ * Debug info
+ */
+ int (*print_debug_info) (char *buffer, int size);
+ int (*save_config_info) (char *buffer);
+ void (*load_config_info) (char *buffer, int len);
+
+ /*
+ * Initialise & cleanup - general routines called
+ * at the start and end of a cycle.
+ */
+ int (*initialise) (int starting_cycle);
+ void (*cleanup) (int finishing_cycle);
+
+ void (*pre_atomic_restore) (struct toi_boot_kernel_data *bkd);
+ void (*post_atomic_restore) (struct toi_boot_kernel_data *bkd);
+
+ /*
+ * Calls for allocating storage (allocators only).
+ *
+ * Header space is requested separately and cannot fail, but the
+ * reservation is only applied when main storage is allocated.
+ * The header space reservation is thus always set prior to
+ * requesting the allocation of storage - and prior to querying
+ * how much storage is available.
+ */
+
+ unsigned long (*storage_available) (void);
+ void (*reserve_header_space) (unsigned long space_requested);
+ int (*register_storage) (void);
+ int (*allocate_storage) (unsigned long space_requested);
+ unsigned long (*storage_allocated) (void);
+
+ /*
+ * Routines used in image I/O.
+ */
+ int (*rw_init) (int rw, int stream_number);
+ int (*rw_cleanup) (int rw);
+ int (*write_page) (unsigned long index, int buf_type, void *buf,
+ unsigned int buf_size);
+ int (*read_page) (unsigned long *index, int buf_type, void *buf,
+ unsigned int *buf_size);
+ int (*io_flusher) (int rw);
+
+ /* Reset module if image exists but reading aborted */
+ void (*noresume_reset) (void);
+
+ /* Read and write the metadata */
+ int (*write_header_init) (void);
+ int (*write_header_cleanup) (void);
+
+ int (*read_header_init) (void);
+ int (*read_header_cleanup) (void);
+
+ /* To be called after read_header_init */
+ int (*get_header_version) (void);
+
+ int (*rw_header_chunk) (int rw, struct toi_module_ops *owner,
+ char *buffer_start, int buffer_size);
+
+ int (*rw_header_chunk_noreadahead) (int rw,
+ struct toi_module_ops *owner, char *buffer_start,
+ int buffer_size);
+
+ /* Attempt to parse an image location */
+ int (*parse_sig_location) (char *buffer, int only_writer, int quiet);
+
+ /* Throttle I/O according to throughput */
+ void (*update_throughput_throttle) (int jif_index);
+
+ /* Flush outstanding I/O */
+ int (*finish_all_io) (void);
+
+ /* Determine whether image exists that we can restore */
+ int (*image_exists) (int quiet);
+
+ /* Mark the image as having tried to resume */
+ int (*mark_resume_attempted) (int);
+
+ /* Destroy image if one exists */
+ int (*remove_image) (void);
+
+ /* Sysfs Data */
+ struct toi_sysfs_data *sysfs_data;
+ int num_sysfs_entries;
+
+ /* Block I/O allocator */
+ struct toi_bio_allocator_ops *bio_allocator_ops;
+};
+
+extern int toi_num_modules, toiNumAllocators;
+
+extern struct toi_module_ops *toiActiveAllocator;
+extern struct list_head toi_filters, toiAllocators, toi_modules;
+
+extern void toi_prepare_console_modules(void);
+extern void toi_cleanup_console_modules(void);
+
+extern struct toi_module_ops *toi_find_module_given_name(char *name);
+extern struct toi_module_ops *toi_get_next_filter(struct toi_module_ops *);
+
+extern int toi_register_module(struct toi_module_ops *module);
+extern void toi_move_module_tail(struct toi_module_ops *module);
+
+extern long toi_header_storage_for_modules(void);
+extern long toi_memory_for_modules(int print_parts);
+extern void print_toi_header_storage_for_modules(void);
+extern int toi_expected_compression_ratio(void);
+
+extern int toi_print_module_debug_info(char *buffer, int buffer_size);
+extern int toi_register_module(struct toi_module_ops *module);
+extern void toi_unregister_module(struct toi_module_ops *module);
+
+extern int toi_initialise_modules(int starting_cycle, int early);
+#define toi_initialise_modules_early(starting) \
+ toi_initialise_modules(starting, 1)
+#define toi_initialise_modules_late(starting) \
+ toi_initialise_modules(starting, 0)
+extern void toi_cleanup_modules(int finishing_cycle);
+
+extern void toi_post_atomic_restore_modules(struct toi_boot_kernel_data *bkd);
+extern void toi_pre_atomic_restore_modules(struct toi_boot_kernel_data *bkd);
+
+extern void toi_print_modules(void);
+
+int toi_get_modules(void);
+void toi_put_modules(void);
+#endif
diff --git a/kernel/power/tuxonice_netlink.c b/kernel/power/tuxonice_netlink.c
new file mode 100644
index 0000000..0a40aa8
--- /dev/null
+++ b/kernel/power/tuxonice_netlink.c
@@ -0,0 +1,329 @@
+/*
+ * kernel/power/tuxonice_netlink.c
+ *
+ * Copyright (C) 2004-2014 Nigel Cunningham (nigel at tuxonice net)
+ *
+ * This file is released under the GPLv2.
+ *
+ * Functions for communicating with a userspace helper via netlink.
+ */
+
+
+#include <linux/suspend.h>
+#include <linux/sched.h>
+#include "tuxonice_netlink.h"
+#include "tuxonice.h"
+#include "tuxonice_modules.h"
+#include "tuxonice_alloc.h"
+#include "tuxonice_builtin.h"
+
+static struct user_helper_data *uhd_list;
+
+/*
+ * Refill our pool of SKBs for use in emergencies (eg, when we're eating
+ * memory and a new skb can't be allocated).
+ */
+static void toi_fill_skb_pool(struct user_helper_data *uhd)
+{
+ while (uhd->pool_level < uhd->pool_limit) {
+ struct sk_buff *new_skb =
+ alloc_skb(NLMSG_SPACE(uhd->skb_size), TOI_ATOMIC_GFP);
+
+ if (!new_skb)
+ break;
+
+ new_skb->next = uhd->emerg_skbs;
+ uhd->emerg_skbs = new_skb;
+ uhd->pool_level++;
+ }
+}
+
+/*
+ * Try to allocate a single skb. If we can't get one, try to use one from
+ * our pool.
+ */
+static struct sk_buff *toi_get_skb(struct user_helper_data *uhd)
+{
+ struct sk_buff *skb =
+ alloc_skb(NLMSG_SPACE(uhd->skb_size), TOI_ATOMIC_GFP);
+
+ if (skb)
+ return skb;
+
+ skb = uhd->emerg_skbs;
+ if (skb) {
+ uhd->pool_level--;
+ uhd->emerg_skbs = skb->next;
+ skb->next = NULL;
+ }
+
+ return skb;
+}
+
+void toi_send_netlink_message(struct user_helper_data *uhd,
+ int type, void *params, size_t len)
+{
+ struct sk_buff *skb;
+ struct nlmsghdr *nlh;
+ void *dest;
+ struct task_struct *t;
+
+ if (uhd->pid == -1)
+ return;
+
+ if (uhd->debug)
+ printk(KERN_ERR "toi_send_netlink_message: Send "
+ "message type %d.\n", type);
+
+ skb = toi_get_skb(uhd);
+ if (!skb) {
+ printk(KERN_INFO "toi_netlink: Can't allocate skb!\n");
+ return;
+ }
+
+ nlh = nlmsg_put(skb, 0, uhd->sock_seq, type, len, 0);
+ uhd->sock_seq++;
+
+ dest = NLMSG_DATA(nlh);
+ if (params && len > 0)
+ memcpy(dest, params, len);
+
+ netlink_unicast(uhd->nl, skb, uhd->pid, 0);
+
+ toi_read_lock_tasklist();
+ t = find_task_by_pid_ns(uhd->pid, &init_pid_ns);
+ if (!t) {
+ toi_read_unlock_tasklist();
+ if (uhd->pid > -1)
+ printk(KERN_INFO "Hmm. Can't find the userspace task"
+ " %d.\n", uhd->pid);
+ return;
+ }
+ wake_up_process(t);
+ toi_read_unlock_tasklist();
+
+ yield();
+}
+EXPORT_SYMBOL_GPL(toi_send_netlink_message);
+
+static void send_whether_debugging(struct user_helper_data *uhd)
+{
+ static u8 is_debugging = 1;
+
+ toi_send_netlink_message(uhd, NETLINK_MSG_IS_DEBUGGING,
+ &is_debugging, sizeof(u8));
+}
+
+/*
+ * Set the PF_NOFREEZE flag on the given process to ensure it can run whilst we
+ * are hibernating.
+ */
+static int nl_set_nofreeze(struct user_helper_data *uhd, __u32 pid)
+{
+ struct task_struct *t;
+
+ if (uhd->debug)
+ printk(KERN_ERR "nl_set_nofreeze for pid %d.\n", pid);
+
+ toi_read_lock_tasklist();
+ t = find_task_by_pid_ns(pid, &init_pid_ns);
+ if (!t) {
+ toi_read_unlock_tasklist();
+ printk(KERN_INFO "Strange. Can't find the userspace task %d.\n",
+ pid);
+ return -EINVAL;
+ }
+
+ t->flags |= PF_NOFREEZE;
+
+ toi_read_unlock_tasklist();
+ uhd->pid = pid;
+
+ toi_send_netlink_message(uhd, NETLINK_MSG_NOFREEZE_ACK, NULL, 0);
+
+ return 0;
+}
+
+/*
+ * Called when the userspace process has informed us that it's ready to roll.
+ */
+static int nl_ready(struct user_helper_data *uhd, u32 version)
+{
+ if (version != uhd->interface_version) {
+ printk(KERN_INFO "%s userspace process using invalid interface"
+ " version (%d - kernel wants %d). Trying to "
+ "continue without it.\n",
+ uhd->name, version, uhd->interface_version);
+ if (uhd->not_ready)
+ uhd->not_ready();
+ return -EINVAL;
+ }
+
+ complete(&uhd->wait_for_process);
+
+ return 0;
+}
+
+void toi_netlink_close_complete(struct user_helper_data *uhd)
+{
+ if (uhd->nl) {
+ netlink_kernel_release(uhd->nl);
+ uhd->nl = NULL;
+ }
+
+ while (uhd->emerg_skbs) {
+ struct sk_buff *next = uhd->emerg_skbs->next;
+ kfree_skb(uhd->emerg_skbs);
+ uhd->emerg_skbs = next;
+ }
+
+ uhd->pid = -1;
+}
+EXPORT_SYMBOL_GPL(toi_netlink_close_complete);
+
+static int toi_nl_gen_rcv_msg(struct user_helper_data *uhd,
+ struct sk_buff *skb, struct nlmsghdr *nlh)
+{
+ int type = nlh->nlmsg_type;
+ int *data;
+ int err;
+
+ if (uhd->debug)
+ printk(KERN_ERR "toi_user_rcv_skb: Received message %d.\n",
+ type);
+
+ /* Let the more specific handler go first. It returns
+ * 1 for valid messages that it doesn't know. */
+ err = uhd->rcv_msg(skb, nlh);
+ if (err != 1)
+ return err;
+
+ /* Only allow one task to receive NOFREEZE privileges */
+ if (type == NETLINK_MSG_NOFREEZE_ME && uhd->pid != -1) {
+ printk(KERN_INFO "Received extra nofreeze me requests.\n");
+ return -EBUSY;
+ }
+
+ data = NLMSG_DATA(nlh);
+
+ switch (type) {
+ case NETLINK_MSG_NOFREEZE_ME:
+ return nl_set_nofreeze(uhd, nlh->nlmsg_pid);
+ case NETLINK_MSG_GET_DEBUGGING:
+ send_whether_debugging(uhd);
+ return 0;
+ case NETLINK_MSG_READY:
+ if (nlh->nlmsg_len != NLMSG_LENGTH(sizeof(u32))) {
+			printk(KERN_INFO "Invalid ready message.\n");
+ if (uhd->not_ready)
+ uhd->not_ready();
+ return -EINVAL;
+ }
+ return nl_ready(uhd, (u32) *data);
+ case NETLINK_MSG_CLEANUP:
+ toi_netlink_close_complete(uhd);
+ return 0;
+ }
+
+ return -EINVAL;
+}
+
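+/*
+ * Netlink receive callback. Find the user_helper_data whose netlink_id
+ * matches the socket's protocol, then walk the messages in the skb,
+ * dispatching each one and acking where required.
+ */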
+static void toi_user_rcv_skb(struct sk_buff *skb)
+{
+ int err;
+ struct nlmsghdr *nlh;
+ struct user_helper_data *uhd = uhd_list;
+
+ while (uhd && uhd->netlink_id != skb->sk->sk_protocol)
+ uhd = uhd->next;
+
+ if (!uhd)
+ return;
+
+ while (skb->len >= NLMSG_SPACE(0)) {
+ u32 rlen;
+
+ nlh = (struct nlmsghdr *) skb->data;
+ if (nlh->nlmsg_len < sizeof(*nlh) || skb->len < nlh->nlmsg_len)
+ return;
+
+ rlen = NLMSG_ALIGN(nlh->nlmsg_len);
+ if (rlen > skb->len)
+ rlen = skb->len;
+
+ err = toi_nl_gen_rcv_msg(uhd, skb, nlh);
+ if (err)
+ netlink_ack(skb, nlh, err);
+ else if (nlh->nlmsg_flags & NLM_F_ACK)
+ netlink_ack(skb, nlh, 0);
+ skb_pull(skb, rlen);
+ }
+}
+
+static int netlink_prepare(struct user_helper_data *uhd)
+{
+ struct netlink_kernel_cfg cfg = {
+ .groups = 0,
+ .input = toi_user_rcv_skb,
+ };
+
+ uhd->next = uhd_list;
+ uhd_list = uhd;
+
+ uhd->sock_seq = 0x42c0ffee;
+ uhd->nl = netlink_kernel_create(&init_net, uhd->netlink_id, &cfg);
+ if (!uhd->nl) {
+ printk(KERN_INFO "Failed to allocate netlink socket for %s.\n",
+ uhd->name);
+ return -ENOMEM;
+ }
+
+ toi_fill_skb_pool(uhd);
+
+ return 0;
+}
+
+void toi_netlink_close(struct user_helper_data *uhd)
+{
+ struct task_struct *t;
+
+ toi_read_lock_tasklist();
+ t = find_task_by_pid_ns(uhd->pid, &init_pid_ns);
+ if (t)
+ t->flags &= ~PF_NOFREEZE;
+ toi_read_unlock_tasklist();
+
+ toi_send_netlink_message(uhd, NETLINK_MSG_CLEANUP, NULL, 0);
+}
+EXPORT_SYMBOL_GPL(toi_netlink_close);
+
+int toi_netlink_setup(struct user_helper_data *uhd)
+{
+ /* In case userui didn't cleanup properly on us */
+ toi_netlink_close_complete(uhd);
+
+ if (netlink_prepare(uhd) < 0) {
+ printk(KERN_INFO "Netlink prepare failed.\n");
+ return 1;
+ }
+
+ if (toi_launch_userspace_program(uhd->program, uhd->netlink_id,
+ UMH_WAIT_EXEC, uhd->debug) < 0) {
+ printk(KERN_INFO "Launch userspace program failed.\n");
+ toi_netlink_close_complete(uhd);
+ return 1;
+ }
+
+ /* Wait 2 seconds for the userspace process to make contact */
+ wait_for_completion_timeout(&uhd->wait_for_process, 2*HZ);
+
+ if (uhd->pid == -1) {
+ printk(KERN_INFO "%s: Failed to contact userspace process.\n",
+ uhd->name);
+ toi_netlink_close_complete(uhd);
+ return 1;
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(toi_netlink_setup);
diff --git a/kernel/power/tuxonice_netlink.h b/kernel/power/tuxonice_netlink.h
new file mode 100644
index 0000000..952f67b
--- /dev/null
+++ b/kernel/power/tuxonice_netlink.h
@@ -0,0 +1,62 @@
+/*
+ * kernel/power/tuxonice_netlink.h
+ *
+ * Copyright (C) 2004-2014 Nigel Cunningham (nigel at tuxonice net)
+ *
+ * This file is released under the GPLv2.
+ *
+ * Declarations for functions for communicating with a userspace helper
+ * via netlink.
+ */
+
+#include <linux/netlink.h>
+#include <net/sock.h>
+
+#define NETLINK_MSG_BASE 0x10
+
+#define NETLINK_MSG_READY 0x10
+#define NETLINK_MSG_NOFREEZE_ME 0x16
+#define NETLINK_MSG_GET_DEBUGGING 0x19
+#define NETLINK_MSG_CLEANUP 0x24
+#define NETLINK_MSG_NOFREEZE_ACK 0x27
+#define NETLINK_MSG_IS_DEBUGGING 0x28
+
+struct user_helper_data {
+ int (*rcv_msg) (struct sk_buff *skb, struct nlmsghdr *nlh);
+ void (*not_ready) (void);
+ struct sock *nl;
+ u32 sock_seq;
+ pid_t pid;
+ char *comm;
+ char program[256];
+ int pool_level;
+ int pool_limit;
+ struct sk_buff *emerg_skbs;
+ int skb_size;
+ int netlink_id;
+ char *name;
+ struct user_helper_data *next;
+ struct completion wait_for_process;
+ u32 interface_version;
+ int must_init;
+ int debug;
+};
+
+#ifdef CONFIG_NET
+int toi_netlink_setup(struct user_helper_data *uhd);
+void toi_netlink_close(struct user_helper_data *uhd);
+void toi_send_netlink_message(struct user_helper_data *uhd,
+ int type, void *params, size_t len);
+void toi_netlink_close_complete(struct user_helper_data *uhd);
+#else
+static inline int toi_netlink_setup(struct user_helper_data *uhd)
+{
+ return 0;
+}
+
+static inline void toi_netlink_close(struct user_helper_data *uhd) { };
+static inline void toi_send_netlink_message(struct user_helper_data *uhd,
+ int type, void *params, size_t len) { };
+static inline void toi_netlink_close_complete(struct user_helper_data *uhd)
+ { };
+#endif
diff --git a/kernel/power/tuxonice_pagedir.c b/kernel/power/tuxonice_pagedir.c
new file mode 100644
index 0000000..6934114
--- /dev/null
+++ b/kernel/power/tuxonice_pagedir.c
@@ -0,0 +1,346 @@
+/*
+ * kernel/power/tuxonice_pagedir.c
+ *
+ * Copyright (C) 1998-2001 Gabor Kuti <seasons@fornax.hu>
+ * Copyright (C) 1998,2001,2002 Pavel Machek <pavel@suse.cz>
+ * Copyright (C) 2002-2003 Florent Chabaud <fchabaud@free.fr>
+ * Copyright (C) 2006-2014 Nigel Cunningham (nigel at tuxonice net)
+ *
+ * This file is released under the GPLv2.
+ *
+ * Routines for handling pagesets.
+ * Note that pbes aren't actually stored as such. They're stored as
+ * bitmaps and extents.
+ */
+
+#include <linux/suspend.h>
+#include <linux/highmem.h>
+#include <linux/bootmem.h>
+#include <linux/hardirq.h>
+#include <linux/sched.h>
+#include <linux/cpu.h>
+#include <asm/tlbflush.h>
+
+#include "tuxonice_pageflags.h"
+#include "tuxonice_ui.h"
+#include "tuxonice_pagedir.h"
+#include "tuxonice_prepare_image.h"
+#include "tuxonice.h"
+#include "tuxonice_builtin.h"
+#include "tuxonice_alloc.h"
+
+static int ptoi_pfn;
+static struct pbe *this_low_pbe;
+static struct pbe **last_low_pbe_ptr;
+
+void toi_reset_alt_image_pageset2_pfn(void)
+{
+ memory_bm_position_reset(pageset2_map);
+}
+
+static struct page *first_conflicting_page;
+
+/*
+ * free_conflicting_pages
+ */
+
+static void free_conflicting_pages(void)
+{
+ while (first_conflicting_page) {
+ struct page *next =
+ *((struct page **) kmap(first_conflicting_page));
+ kunmap(first_conflicting_page);
+ toi__free_page(29, first_conflicting_page);
+ first_conflicting_page = next;
+ }
+}
+
+/* __toi_get_nonconflicting_page
+ *
+ * Description: Gets order zero pages that won't be overwritten
+ * while copying the original pages.
+ */
+
+struct page *___toi_get_nonconflicting_page(int can_be_highmem)
+{
+ struct page *page;
+ gfp_t flags = TOI_ATOMIC_GFP;
+ if (can_be_highmem)
+ flags |= __GFP_HIGHMEM;
+
+
+ if (test_toi_state(TOI_LOADING_ALT_IMAGE) &&
+ pageset2_map &&
+ (ptoi_pfn != BM_END_OF_MAP)) {
+ do {
+ ptoi_pfn = memory_bm_next_pfn(pageset2_map);
+ if (ptoi_pfn != BM_END_OF_MAP) {
+ page = pfn_to_page(ptoi_pfn);
+ if (!PagePageset1(page) &&
+ (can_be_highmem || !PageHighMem(page)))
+ return page;
+ }
+ } while (ptoi_pfn != BM_END_OF_MAP);
+ }
+
+ do {
+ page = toi_alloc_page(29, flags);
+ if (!page) {
+ printk(KERN_INFO "Failed to get nonconflicting "
+ "page.\n");
+ return NULL;
+ }
+ if (PagePageset1(page)) {
+ struct page **next = (struct page **) kmap(page);
+ *next = first_conflicting_page;
+ first_conflicting_page = page;
+ kunmap(page);
+ }
+ } while (PagePageset1(page));
+
+ return page;
+}
+
+unsigned long __toi_get_nonconflicting_page(void)
+{
+ struct page *page = ___toi_get_nonconflicting_page(0);
+ return page ? (unsigned long) page_address(page) : 0;
+}
+
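+/*
+ * pbes are packed into nonconflicting pages. When there isn't room for
+ * another pbe in the current page, allocate a fresh page (kmapped and
+ * zeroed) and continue there; otherwise just advance within the page.
+ */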
+static struct pbe *get_next_pbe(struct page **page_ptr, struct pbe *this_pbe,
+ int highmem)
+{
+ if (((((unsigned long) this_pbe) & (PAGE_SIZE - 1))
+ + 2 * sizeof(struct pbe)) > PAGE_SIZE) {
+ struct page *new_page =
+ ___toi_get_nonconflicting_page(highmem);
+ if (!new_page)
+ return ERR_PTR(-ENOMEM);
+ this_pbe = (struct pbe *) kmap(new_page);
+ memset(this_pbe, 0, PAGE_SIZE);
+ *page_ptr = new_page;
+ } else
+ this_pbe++;
+
+ return this_pbe;
+}
+
+/**
+ * get_pageset1_load_addresses - generate pbes for conflicting pages
+ *
+ * We check here that pagedir & pages it points to won't collide
+ * with pages where we're going to restore from the loaded pages
+ * later.
+ *
+ * Returns:
+ *	Zero on success, -ENOMEM if we couldn't find enough pages
+ *	(shouldn't happen).
+ **/
+int toi_get_pageset1_load_addresses(void)
+{
+ int pfn, highallocd = 0, lowallocd = 0;
+ int low_needed = pagedir1.size - get_highmem_size(pagedir1);
+ int high_needed = get_highmem_size(pagedir1);
+ int low_pages_for_highmem = 0;
+ gfp_t flags = GFP_ATOMIC | __GFP_NOWARN | __GFP_HIGHMEM;
+ struct page *page, *high_pbe_page = NULL, *last_high_pbe_page = NULL,
+ *low_pbe_page, *last_low_pbe_page = NULL;
+ struct pbe **last_high_pbe_ptr = &restore_highmem_pblist,
+ *this_high_pbe = NULL;
+ unsigned long orig_low_pfn, orig_high_pfn;
+ int high_pbes_done = 0, low_pbes_done = 0;
+ int low_direct = 0, high_direct = 0, result = 0, i;
+ int high_page = 1, high_offset = 0, low_page = 1, low_offset = 0;
+
+ memory_bm_set_iterators(pageset1_map, 3);
+ memory_bm_position_reset(pageset1_map);
+
+ memory_bm_set_iterators(pageset1_copy_map, 2);
+ memory_bm_position_reset(pageset1_copy_map);
+
+ last_low_pbe_ptr = &restore_pblist;
+
+ /* First, allocate pages for the start of our pbe lists. */
+ if (high_needed) {
+ high_pbe_page = ___toi_get_nonconflicting_page(1);
+ if (!high_pbe_page) {
+ result = -ENOMEM;
+ goto out;
+ }
+ this_high_pbe = (struct pbe *) kmap(high_pbe_page);
+ memset(this_high_pbe, 0, PAGE_SIZE);
+ }
+
+ low_pbe_page = ___toi_get_nonconflicting_page(0);
+ if (!low_pbe_page) {
+ result = -ENOMEM;
+ goto out;
+ }
+ this_low_pbe = (struct pbe *) page_address(low_pbe_page);
+
+ /*
+ * Next, allocate the number of pages we need.
+ */
+
+ i = low_needed + high_needed;
+
+ do {
+ int is_high;
+
+ if (i == low_needed)
+ flags &= ~__GFP_HIGHMEM;
+
+ page = toi_alloc_page(30, flags);
+ BUG_ON(!page);
+
+ SetPagePageset1Copy(page);
+ is_high = PageHighMem(page);
+
+ if (PagePageset1(page)) {
+ if (is_high)
+ high_direct++;
+ else
+ low_direct++;
+ } else {
+ if (is_high)
+ highallocd++;
+ else
+ lowallocd++;
+ }
+ } while (--i);
+
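+	/*
+	 * Allocated pages that are themselves pageset1 pages will have
+	 * their saved contents loaded directly into place, so they need no
+	 * pbes; subtract them from the number of copy pages still required.
+	 */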
+ high_needed -= high_direct;
+ low_needed -= low_direct;
+
+ /*
+ * Do we need to use some lowmem pages for the copies of highmem
+ * pages?
+ */
+ if (high_needed > highallocd) {
+ low_pages_for_highmem = high_needed - highallocd;
+ high_needed -= low_pages_for_highmem;
+ low_needed += low_pages_for_highmem;
+ }
+
+ /*
+ * Now generate our pbes (which will be used for the atomic restore),
+ * and free unneeded pages.
+ */
+ memory_bm_position_reset(pageset1_copy_map);
+ for (pfn = memory_bm_next_pfn_index(pageset1_copy_map, 1); pfn != BM_END_OF_MAP;
+ pfn = memory_bm_next_pfn_index(pageset1_copy_map, 1)) {
+ int is_high;
+ page = pfn_to_page(pfn);
+ is_high = PageHighMem(page);
+
+ if (PagePageset1(page))
+ continue;
+
+ /* Nope. We're going to use this page. Add a pbe. */
+ if (is_high || low_pages_for_highmem) {
+ struct page *orig_page;
+ high_pbes_done++;
+ if (!is_high)
+ low_pages_for_highmem--;
+ do {
+ orig_high_pfn = memory_bm_next_pfn_index(pageset1_map, 1);
+ BUG_ON(orig_high_pfn == BM_END_OF_MAP);
+ orig_page = pfn_to_page(orig_high_pfn);
+ } while (!PageHighMem(orig_page) ||
+ PagePageset1Copy(orig_page));
+
+ this_high_pbe->orig_address = (void *) orig_high_pfn;
+ this_high_pbe->address = page;
+ this_high_pbe->next = NULL;
+ toi_message(TOI_PAGEDIR, TOI_VERBOSE, 0, "High pbe %d/%d: %p(%d)=>%p",
+ high_page, high_offset, page, orig_high_pfn, orig_page);
+ if (last_high_pbe_page != high_pbe_page) {
+ *last_high_pbe_ptr =
+ (struct pbe *) high_pbe_page;
+ if (last_high_pbe_page) {
+ kunmap(last_high_pbe_page);
+ high_page++;
+ high_offset = 0;
+ } else
+ high_offset++;
+ last_high_pbe_page = high_pbe_page;
+ } else {
+ *last_high_pbe_ptr = this_high_pbe;
+ high_offset++;
+ }
+ last_high_pbe_ptr = &this_high_pbe->next;
+ this_high_pbe = get_next_pbe(&high_pbe_page,
+ this_high_pbe, 1);
+ if (IS_ERR(this_high_pbe)) {
+				printk(KERN_INFO "Failed to allocate a page "
+				       "for the next high pbe.\n");
+ return -ENOMEM;
+ }
+ } else {
+ struct page *orig_page;
+ low_pbes_done++;
+ do {
+ orig_low_pfn = memory_bm_next_pfn_index(pageset1_map, 2);
+ BUG_ON(orig_low_pfn == BM_END_OF_MAP);
+ orig_page = pfn_to_page(orig_low_pfn);
+ } while (PageHighMem(orig_page) ||
+ PagePageset1Copy(orig_page));
+
+ this_low_pbe->orig_address = page_address(orig_page);
+ this_low_pbe->address = page_address(page);
+ this_low_pbe->next = NULL;
+ toi_message(TOI_PAGEDIR, TOI_VERBOSE, 0, "Low pbe %d/%d: %p(%d)=>%p",
+ low_page, low_offset, this_low_pbe->orig_address,
+ orig_low_pfn, this_low_pbe->address);
+ *last_low_pbe_ptr = this_low_pbe;
+ last_low_pbe_ptr = &this_low_pbe->next;
+ this_low_pbe = get_next_pbe(&low_pbe_page,
+ this_low_pbe, 0);
+ if (low_pbe_page != last_low_pbe_page) {
+ if (last_low_pbe_page) {
+ low_page++;
+ low_offset = 0;
+ }
+ last_low_pbe_page = low_pbe_page;
+ } else
+ low_offset++;
+ if (IS_ERR(this_low_pbe)) {
+				printk(KERN_INFO "Failed to allocate a page "
+				       "for the next low pbe.\n");
+ return -ENOMEM;
+ }
+ }
+ }
+
+ if (high_pbe_page)
+ kunmap(high_pbe_page);
+
+ if (last_high_pbe_page != high_pbe_page) {
+ if (last_high_pbe_page)
+ kunmap(last_high_pbe_page);
+ toi__free_page(29, high_pbe_page);
+ }
+
+ free_conflicting_pages();
+
+out:
+ memory_bm_set_iterators(pageset1_map, 1);
+ memory_bm_set_iterators(pageset1_copy_map, 1);
+ return result;
+}
+
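+/*
+ * Append one extra pbe so that the saved boot kernel data (toi_bkd) is
+ * copied into boot_kernel_data_buffer as part of the atomic restore.
+ */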
+int add_boot_kernel_data_pbe(void)
+{
+ this_low_pbe->address = (char *) __toi_get_nonconflicting_page();
+ if (!this_low_pbe->address) {
+ printk(KERN_INFO "Failed to get bkd atomic restore buffer.");
+ return -ENOMEM;
+ }
+
+ toi_bkd.size = sizeof(toi_bkd);
+ memcpy(this_low_pbe->address, &toi_bkd, sizeof(toi_bkd));
+
+ *last_low_pbe_ptr = this_low_pbe;
+ this_low_pbe->orig_address = (char *) boot_kernel_data_buffer;
+ this_low_pbe->next = NULL;
+ return 0;
+}
diff --git a/kernel/power/tuxonice_pagedir.h b/kernel/power/tuxonice_pagedir.h
new file mode 100644
index 0000000..0c7321e
--- /dev/null
+++ b/kernel/power/tuxonice_pagedir.h
@@ -0,0 +1,50 @@
+/*
+ * kernel/power/tuxonice_pagedir.h
+ *
+ * Copyright (C) 2006-2014 Nigel Cunningham (nigel at tuxonice net)
+ *
+ * This file is released under the GPLv2.
+ *
+ * Declarations for routines for handling pagesets.
+ */
+
+#ifndef KERNEL_POWER_PAGEDIR_H
+#define KERNEL_POWER_PAGEDIR_H
+
+/* Pagedir
+ *
+ * Contains the metadata for a set of pages saved in the image.
+ */
+
+struct pagedir {
+ int id;
+ unsigned long size;
+#ifdef CONFIG_HIGHMEM
+ unsigned long size_high;
+#endif
+};
+
+#ifdef CONFIG_HIGHMEM
+#define get_highmem_size(pagedir) (pagedir.size_high)
+#define set_highmem_size(pagedir, sz) do { pagedir.size_high = sz; } while (0)
+#define inc_highmem_size(pagedir) do { pagedir.size_high++; } while (0)
+#define get_lowmem_size(pagedir) (pagedir.size - pagedir.size_high)
+#else
+#define get_highmem_size(pagedir) (0)
+#define set_highmem_size(pagedir, sz) do { } while (0)
+#define inc_highmem_size(pagedir) do { } while (0)
+#define get_lowmem_size(pagedir) (pagedir.size)
+#endif
+
+extern struct pagedir pagedir1, pagedir2;
+
+extern void toi_copy_pageset1(void);
+
+extern int toi_get_pageset1_load_addresses(void);
+
+extern unsigned long __toi_get_nonconflicting_page(void);
+struct page *___toi_get_nonconflicting_page(int can_be_highmem);
+
+extern void toi_reset_alt_image_pageset2_pfn(void);
+extern int add_boot_kernel_data_pbe(void);
+#endif
diff --git a/kernel/power/tuxonice_pageflags.c b/kernel/power/tuxonice_pageflags.c
new file mode 100644
index 0000000..a3780e5
--- /dev/null
+++ b/kernel/power/tuxonice_pageflags.c
@@ -0,0 +1,29 @@
+/*
+ * kernel/power/tuxonice_pageflags.c
+ *
+ * Copyright (C) 2004-2014 Nigel Cunningham (nigel at tuxonice net)
+ *
+ * This file is released under the GPLv2.
+ *
+ * Routines for serialising and relocating pageflags in which we
+ * store our image metadata.
+ */
+
+#include <linux/list.h>
+#include <linux/module.h>
+#include "tuxonice_pageflags.h"
+#include "power.h"
+
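+/*
+ * Header space needed to serialise a pageflag bitmap: a leading unsigned
+ * int plus, for each block of pageset1_map, two unsigned longs of
+ * metadata and a page of bitmap data.
+ */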
+int toi_pageflags_space_needed(void)
+{
+ int total = 0;
+ struct bm_block *bb;
+
+ total = sizeof(unsigned int);
+
+ list_for_each_entry(bb, &pageset1_map->blocks, hook)
+ total += 2 * sizeof(unsigned long) + PAGE_SIZE;
+
+ return total;
+}
+EXPORT_SYMBOL_GPL(toi_pageflags_space_needed);
diff --git a/kernel/power/tuxonice_pageflags.h b/kernel/power/tuxonice_pageflags.h
new file mode 100644
index 0000000..3d6d471
--- /dev/null
+++ b/kernel/power/tuxonice_pageflags.h
@@ -0,0 +1,80 @@
+/*
+ * kernel/power/tuxonice_pageflags.h
+ *
+ * Copyright (C) 2004-2014 Nigel Cunningham (nigel at tuxonice net)
+ *
+ * This file is released under the GPLv2.
+ */
+
+#ifndef KERNEL_POWER_TUXONICE_PAGEFLAGS_H
+#define KERNEL_POWER_TUXONICE_PAGEFLAGS_H
+
+extern struct memory_bitmap *pageset1_map;
+extern struct memory_bitmap *pageset1_copy_map;
+extern struct memory_bitmap *pageset2_map;
+extern struct memory_bitmap *page_resave_map;
+extern struct memory_bitmap *io_map;
+extern struct memory_bitmap *nosave_map;
+extern struct memory_bitmap *free_map;
+extern struct memory_bitmap *compare_map;
+
+#define PagePageset1(page) \
+ (memory_bm_test_bit(pageset1_map, page_to_pfn(page)))
+#define SetPagePageset1(page) \
+ (memory_bm_set_bit(pageset1_map, page_to_pfn(page)))
+#define ClearPagePageset1(page) \
+ (memory_bm_clear_bit(pageset1_map, page_to_pfn(page)))
+
+#define PagePageset1Copy(page) \
+ (memory_bm_test_bit(pageset1_copy_map, page_to_pfn(page)))
+#define SetPagePageset1Copy(page) \
+ (memory_bm_set_bit(pageset1_copy_map, page_to_pfn(page)))
+#define ClearPagePageset1Copy(page) \
+ (memory_bm_clear_bit(pageset1_copy_map, page_to_pfn(page)))
+
+#define PagePageset2(page) \
+ (memory_bm_test_bit(pageset2_map, page_to_pfn(page)))
+#define SetPagePageset2(page) \
+ (memory_bm_set_bit(pageset2_map, page_to_pfn(page)))
+#define ClearPagePageset2(page) \
+ (memory_bm_clear_bit(pageset2_map, page_to_pfn(page)))
+
+#define PageWasRW(page) \
+ (memory_bm_test_bit(pageset2_map, page_to_pfn(page)))
+#define SetPageWasRW(page) \
+ (memory_bm_set_bit(pageset2_map, page_to_pfn(page)))
+#define ClearPageWasRW(page) \
+ (memory_bm_clear_bit(pageset2_map, page_to_pfn(page)))
+
+#define PageResave(page) (page_resave_map ? \
+ memory_bm_test_bit(page_resave_map, page_to_pfn(page)) : 0)
+#define SetPageResave(page) \
+ (memory_bm_set_bit(page_resave_map, page_to_pfn(page)))
+#define ClearPageResave(page) \
+ (memory_bm_clear_bit(page_resave_map, page_to_pfn(page)))
+
+#define PageNosave(page) (nosave_map ? \
+ memory_bm_test_bit(nosave_map, page_to_pfn(page)) : 0)
+#define SetPageNosave(page) \
+ (memory_bm_set_bit(nosave_map, page_to_pfn(page)))
+#define ClearPageNosave(page) \
+ (memory_bm_clear_bit(nosave_map, page_to_pfn(page)))
+
+#define PageNosaveFree(page) (free_map ? \
+ memory_bm_test_bit(free_map, page_to_pfn(page)) : 0)
+#define SetPageNosaveFree(page) \
+ (memory_bm_set_bit(free_map, page_to_pfn(page)))
+#define ClearPageNosaveFree(page) \
+ (memory_bm_clear_bit(free_map, page_to_pfn(page)))
+
+#define PageCompareChanged(page) (compare_map ? \
+ memory_bm_test_bit(compare_map, page_to_pfn(page)) : 0)
+#define SetPageCompareChanged(page) \
+ (memory_bm_set_bit(compare_map, page_to_pfn(page)))
+#define ClearPageCompareChanged(page) \
+ (memory_bm_clear_bit(compare_map, page_to_pfn(page)))
+
+extern void save_pageflags(struct memory_bitmap *pagemap);
+extern int load_pageflags(struct memory_bitmap *pagemap);
+extern int toi_pageflags_space_needed(void);
+#endif
diff --git a/kernel/power/tuxonice_power_off.c b/kernel/power/tuxonice_power_off.c
new file mode 100644
index 0000000..6ac5ea71
--- /dev/null
+++ b/kernel/power/tuxonice_power_off.c
@@ -0,0 +1,287 @@
+/*
+ * kernel/power/tuxonice_power_off.c
+ *
+ * Copyright (C) 2006-2014 Nigel Cunningham (nigel at tuxonice net)
+ *
+ * This file is released under the GPLv2.
+ *
+ * Support for powering down.
+ */
+
+#include <linux/device.h>
+#include <linux/suspend.h>
+#include <linux/mm.h>
+#include <linux/pm.h>
+#include <linux/reboot.h>
+#include <linux/cpu.h>
+#include <linux/console.h>
+#include <linux/fs.h>
+#include "tuxonice.h"
+#include "tuxonice_ui.h"
+#include "tuxonice_power_off.h"
+#include "tuxonice_sysfs.h"
+#include "tuxonice_modules.h"
+#include "tuxonice_io.h"
+
+unsigned long toi_poweroff_method; /* 0 - Kernel power off */
+EXPORT_SYMBOL_GPL(toi_poweroff_method);
+
+static int wake_delay;
+static char lid_state_file[256], wake_alarm_dir[256];
+static struct file *lid_file, *alarm_file, *epoch_file;
+static int post_wake_state = -1;
+
+static int did_suspend_to_both;
+
+/*
+ * __toi_power_down
+ * Functionality : Powers down or reboots the computer once the image
+ * has been written to disk.
+ * Key Assumptions : We can reboot/power down via the code called, or
+ * the warning emitted if those calls fail will be
+ * visible to the user (i.e. printk resumes devices).
+ */
+
+static void __toi_power_down(int method)
+{
+ int error;
+
+ toi_cond_pause(1, test_action_state(TOI_REBOOT) ? "Ready to reboot." :
+ "Powering down.");
+
+ if (test_result_state(TOI_ABORTED))
+ goto out;
+
+ if (test_action_state(TOI_REBOOT))
+ kernel_restart(NULL);
+
+ switch (method) {
+ case 0:
+ break;
+ case 3:
+ /*
+ * Re-read the overwritten part of pageset2 to make post-resume
+ * faster.
+ */
+ if (read_pageset2(1))
+ panic("Attempt to reload pagedir 2 failed. "
+ "Try rebooting.");
+
+ pm_prepare_console();
+
+ error = pm_notifier_call_chain(PM_SUSPEND_PREPARE);
+ if (!error) {
+ pm_restore_gfp_mask();
+ error = suspend_devices_and_enter(PM_SUSPEND_MEM);
+ pm_restrict_gfp_mask();
+ if (!error)
+ did_suspend_to_both = 1;
+ }
+ pm_notifier_call_chain(PM_POST_SUSPEND);
+ pm_restore_console();
+
+ /* Success - we're now post-resume-from-ram */
+ if (did_suspend_to_both)
+ return;
+
+ /* Failed to suspend to ram - do normal power off */
+ break;
+ case 4:
+ /*
+ * If succeeds, doesn't return. If fails, do a simple
+ * powerdown.
+ */
+ hibernation_platform_enter();
+ break;
+ case 5:
+ /* Historic entry only now */
+ break;
+ }
+
+ if (method && method != 5)
+ toi_cond_pause(1,
+ "Falling back to alternate power off method.");
+
+ if (test_result_state(TOI_ABORTED))
+ goto out;
+
+ kernel_power_off();
+ kernel_halt();
+ toi_cond_pause(1, "Powerdown failed.");
+ while (1)
+ cpu_relax();
+
+out:
+ if (read_pageset2(1))
+ panic("Attempt to reload pagedir 2 failed. Try rebooting.");
+ return;
+}
+
+#define CLOSE_FILE(file) \
+ do { \
+ if (file) { \
+ filp_close(file, NULL); file = NULL; \
+ } \
+ } while (0)
+
+static void powerdown_cleanup(int toi_or_resume)
+{
+ if (!toi_or_resume)
+ return;
+
+ CLOSE_FILE(lid_file);
+ CLOSE_FILE(alarm_file);
+ CLOSE_FILE(epoch_file);
+}
+
+static void open_file(char *format, char *arg, struct file **var, int mode,
+ char *desc)
+{
+ char buf[256];
+
+ if (strlen(arg)) {
+ snprintf(buf, sizeof(buf), format, arg);
+ *var = filp_open(buf, mode, 0);
+ if (IS_ERR(*var) || !*var) {
+ printk(KERN_INFO "Failed to open %s file '%s' (%p).\n",
+ desc, buf, *var);
+ *var = NULL;
+ }
+ }
+}
+
+static int powerdown_init(int toi_or_resume)
+{
+ if (!toi_or_resume)
+ return 0;
+
+ did_suspend_to_both = 0;
+
+ open_file("/proc/acpi/button/%s/state", lid_state_file, &lid_file,
+ O_RDONLY, "lid");
+
+ if (strlen(wake_alarm_dir)) {
+ open_file("/sys/class/rtc/%s/wakealarm", wake_alarm_dir,
+ &alarm_file, O_WRONLY, "alarm");
+
+ open_file("/sys/class/rtc/%s/since_epoch", wake_alarm_dir,
+ &epoch_file, O_RDONLY, "epoch");
+ }
+
+ return 0;
+}
+
+static int lid_closed(void)
+{
+ char array[25];
+ ssize_t size;
+ loff_t pos = 0;
+
+ if (!lid_file)
+ return 0;
+
+ size = vfs_read(lid_file, (char __user *) array, 25, &pos);
+ if ((int) size < 1) {
+ printk(KERN_INFO "Failed to read lid state file (%d).\n",
+ (int) size);
+ return 0;
+ }
+
+ if (!strcmp(array, "state: closed\n"))
+ return 1;
+
+ return 0;
+}
+
+static void write_alarm_file(int value)
+{
+ ssize_t size;
+ char buf[40];
+ loff_t pos = 0;
+
+ if (!alarm_file)
+ return;
+
+ sprintf(buf, "%d\n", value);
+
+ size = vfs_write(alarm_file, (char __user *)buf, strlen(buf), &pos);
+
+ if (size < 0)
+ printk(KERN_INFO "Error %d writing alarm value %s.\n",
+ (int) size, buf);
+}
+
+/**
+ * toi_check_resleep: See whether to power down again after waking.
+ *
+ * After waking, check whether we should power down again in a (usually
+ * different) way. We only do this if the lid switch is still closed.
+ */
+void toi_check_resleep(void)
+{
+ /* We only return if we suspended to ram and woke. */
+ if (lid_closed() && post_wake_state >= 0)
+ __toi_power_down(post_wake_state);
+}
+
+void toi_power_down(void)
+{
+ if (alarm_file && epoch_file && wake_delay) {
+ char array[25];
+ loff_t pos = 0;
+ ssize_t size = vfs_read(epoch_file, (char __user *) array, 25,
+ &pos);
+
+ if (((int) size) < 1)
+ printk(KERN_INFO "Failed to read epoch file (%d).\n",
+ (int) size);
+ else {
+ unsigned long since_epoch;
+ if (!strict_strtoul(array, 0, &since_epoch)) {
+ /* Clear any wakeup time. */
+ write_alarm_file(0);
+
+ /* Set new wakeup time. */
+ write_alarm_file(since_epoch + wake_delay);
+ }
+ }
+ }
+
+ __toi_power_down(toi_poweroff_method);
+
+ toi_check_resleep();
+}
+EXPORT_SYMBOL_GPL(toi_power_down);
+
+static struct toi_sysfs_data sysfs_params[] = {
+#if defined(CONFIG_ACPI)
+ SYSFS_STRING("lid_file", SYSFS_RW, lid_state_file, 256, 0, NULL),
+ SYSFS_INT("wake_delay", SYSFS_RW, &wake_delay, 0, INT_MAX, 0, NULL),
+ SYSFS_STRING("wake_alarm_dir", SYSFS_RW, wake_alarm_dir, 256, 0, NULL),
+ SYSFS_INT("post_wake_state", SYSFS_RW, &post_wake_state, -1, 5, 0,
+ NULL),
+ SYSFS_UL("powerdown_method", SYSFS_RW, &toi_poweroff_method, 0, 5, 0),
+ SYSFS_INT("did_suspend_to_both", SYSFS_READONLY, &did_suspend_to_both,
+ 0, 0, 0, NULL)
+#endif
+};
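+
+/*
+ * Illustrative note: these entries are registered in the root of the
+ * TuxOnIce sysfs directory (normally /sys/power/tuxonice), so, assuming
+ * that path, "echo 3 > /sys/power/tuxonice/powerdown_method" would select
+ * the suspend-to-RAM-after-writing-the-image behaviour handled by case 3
+ * of __toi_power_down above.
+ */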
+
+static struct toi_module_ops powerdown_ops = {
+ .type = MISC_HIDDEN_MODULE,
+ .name = "poweroff",
+ .initialise = powerdown_init,
+ .cleanup = powerdown_cleanup,
+ .directory = "[ROOT]",
+ .module = THIS_MODULE,
+ .sysfs_data = sysfs_params,
+ .num_sysfs_entries = sizeof(sysfs_params) /
+ sizeof(struct toi_sysfs_data),
+};
+
+int toi_poweroff_init(void)
+{
+ return toi_register_module(&powerdown_ops);
+}
+
+void toi_poweroff_exit(void)
+{
+ toi_unregister_module(&powerdown_ops);
+}
diff --git a/kernel/power/tuxonice_power_off.h b/kernel/power/tuxonice_power_off.h
new file mode 100644
index 0000000..804293d
--- /dev/null
+++ b/kernel/power/tuxonice_power_off.h
@@ -0,0 +1,24 @@
+/*
+ * kernel/power/tuxonice_power_off.h
+ *
+ * Copyright (C) 2006-2014 Nigel Cunningham (nigel at tuxonice net)
+ *
+ * This file is released under the GPLv2.
+ *
+ * Support for powering down.
+ */
+
+int toi_pm_state_finish(void);
+void toi_power_down(void);
+extern unsigned long toi_poweroff_method;
+int toi_poweroff_init(void);
+void toi_poweroff_exit(void);
+void toi_check_resleep(void);
+
+extern int platform_begin(int platform_mode);
+extern int platform_pre_snapshot(int platform_mode);
+extern void platform_leave(int platform_mode);
+extern void platform_end(int platform_mode);
+extern void platform_finish(int platform_mode);
+extern int platform_pre_restore(int platform_mode);
+extern void platform_restore_cleanup(int platform_mode);
diff --git a/kernel/power/tuxonice_prepare_image.c b/kernel/power/tuxonice_prepare_image.c
new file mode 100644
index 0000000..64c71c0
--- /dev/null
+++ b/kernel/power/tuxonice_prepare_image.c
@@ -0,0 +1,1118 @@
+/*
+ * kernel/power/tuxonice_prepare_image.c
+ *
+ * Copyright (C) 2003-2014 Nigel Cunningham (nigel at tuxonice net)
+ *
+ * This file is released under the GPLv2.
+ *
+ * We need to eat memory until we can:
+ * 1. Perform the save without changing anything (RAM_NEEDED < #pages)
+ * 2. Fit it all in available space (toiActiveAllocator->available_space() >=
+ * main_storage_needed())
+ * 3. Reload the pagedir and pageset1 to places that don't collide with their
+ * final destinations, not knowing to what extent the resumed kernel will
+ * overlap with the one loaded at boot time. I think the resumed kernel
+ * should overlap completely, but I don't want to rely on this as it is
+ * an unproven assumption. We therefore assume there will be no overlap at
+ * all (worst case).
+ * 4. Meet the user's requested limit (if any) on the size of the image.
+ * The limit is in MB, so pages/256 (assuming 4K pages).
+ *
+ */
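+
+/*
+ * Worked example for point 4 (illustrative only): with 4K pages, 1MB is
+ * 256 pages, so an image_size_limit of 512 (MB) corresponds to
+ * 512 * 256 = 131072 pages; any_to_free() below performs the same
+ * conversion with image_size_limit << 8.
+ */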
+
+#include <linux/highmem.h>
+#include <linux/freezer.h>
+#include <linux/hardirq.h>
+#include <linux/mmzone.h>
+#include <linux/console.h>
+
+#include "tuxonice_pageflags.h"
+#include "tuxonice_modules.h"
+#include "tuxonice_io.h"
+#include "tuxonice_ui.h"
+#include "tuxonice_prepare_image.h"
+#include "tuxonice.h"
+#include "tuxonice_extent.h"
+#include "tuxonice_checksum.h"
+#include "tuxonice_sysfs.h"
+#include "tuxonice_alloc.h"
+#include "tuxonice_atomic_copy.h"
+#include "tuxonice_builtin.h"
+
+static unsigned long num_nosave, main_storage_allocated, storage_limit,
+ header_storage_needed;
+unsigned long extra_pd1_pages_allowance =
+ CONFIG_TOI_DEFAULT_EXTRA_PAGES_ALLOWANCE;
+long image_size_limit = CONFIG_TOI_DEFAULT_IMAGE_SIZE_LIMIT;
+static int no_ps2_needed;
+
+struct attention_list {
+ struct task_struct *task;
+ struct attention_list *next;
+};
+
+static struct attention_list *attention_list;
+
+#define PAGESET1 0
+#define PAGESET2 1
+
+void free_attention_list(void)
+{
+ struct attention_list *last = NULL;
+
+ while (attention_list) {
+ last = attention_list;
+ attention_list = attention_list->next;
+ toi_kfree(6, last, sizeof(*last));
+ }
+}
+
+static int build_attention_list(void)
+{
+ int i, task_count = 0;
+ struct task_struct *p;
+ struct attention_list *next;
+
+ /*
+ * Count all processes marked PF_NOFREEZE, plus the current task.
+ */
+ toi_read_lock_tasklist();
+ for_each_process(p)
+ if ((p->flags & PF_NOFREEZE) || p == current)
+ task_count++;
+ toi_read_unlock_tasklist();
+
+ /*
+ * Allocate attention list structs.
+ */
+ for (i = 0; i < task_count; i++) {
+ struct attention_list *this =
+ toi_kzalloc(6, sizeof(struct attention_list),
+ TOI_WAIT_GFP);
+ if (!this) {
+ printk(KERN_INFO "Failed to allocate slab for "
+ "attention list.\n");
+ free_attention_list();
+ return 1;
+ }
+ this->next = NULL;
+ if (attention_list)
+ this->next = attention_list;
+ attention_list = this;
+ }
+
+ next = attention_list;
+ toi_read_lock_tasklist();
+ for_each_process(p)
+ if ((p->flags & PF_NOFREEZE) || p == current) {
+ next->task = p;
+ next = next->next;
+ }
+ toi_read_unlock_tasklist();
+ return 0;
+}
+
+static void pageset2_full(void)
+{
+ struct zone *zone;
+ struct page *page;
+ unsigned long flags;
+ int i;
+
+ for_each_populated_zone(zone) {
+ spin_lock_irqsave(&zone->lru_lock, flags);
+ for_each_lru(i) {
+ if (!zone_page_state(zone, NR_LRU_BASE + i))
+ continue;
+
+ list_for_each_entry(page, &zone->lruvec.lists[i], lru) {
+ struct address_space *mapping;
+
+ mapping = page_mapping(page);
+ if (!mapping || !mapping->host ||
+ !(mapping->host->i_flags & S_ATOMIC_COPY))
+ SetPagePageset2(page);
+ }
+ }
+ spin_unlock_irqrestore(&zone->lru_lock, flags);
+ }
+}
+
+/*
+ * toi_mark_task_as_pageset
+ * Functionality : Marks all the saveable pages belonging to a given process
+ * as belonging to a particular pageset.
+ */
+
+static void toi_mark_task_as_pageset(struct task_struct *t, int pageset2)
+{
+ struct vm_area_struct *vma;
+ struct mm_struct *mm;
+
+ mm = t->active_mm;
+
+ if (!mm || !mm->mmap)
+ return;
+
+ if (!irqs_disabled())
+ down_read(&mm->mmap_sem);
+
+ for (vma = mm->mmap; vma; vma = vma->vm_next) {
+ unsigned long posn;
+
+ if (!vma->vm_start ||
+ vma->vm_flags & (VM_IO | VM_DONTDUMP | VM_PFNMAP))
+ continue;
+
+ for (posn = vma->vm_start; posn < vma->vm_end;
+ posn += PAGE_SIZE) {
+ struct page *page = follow_page(vma, posn, 0);
+ struct address_space *mapping;
+
+ if (!page || !pfn_valid(page_to_pfn(page)))
+ continue;
+
+ mapping = page_mapping(page);
+ if (mapping && mapping->host &&
+ mapping->host->i_flags & S_ATOMIC_COPY)
+ continue;
+
+ if (pageset2)
+ SetPagePageset2(page);
+ else {
+ ClearPagePageset2(page);
+ SetPagePageset1(page);
+ }
+ }
+ }
+
+ if (!irqs_disabled())
+ up_read(&mm->mmap_sem);
+}
+
+static void mark_tasks(int pageset)
+{
+ struct task_struct *p;
+
+ toi_read_lock_tasklist();
+ for_each_process(p) {
+ if (!p->mm)
+ continue;
+
+ if (p->flags & PF_KTHREAD)
+ continue;
+
+ toi_mark_task_as_pageset(p, pageset);
+ }
+ toi_read_unlock_tasklist();
+
+}
+
+/* toi_mark_pages_for_pageset2
+ *
+ * Description: Mark unshared pages in processes not needed for hibernate as
+ * being able to be written out in a separate pagedir.
+ * HighMem pages are simply marked as pageset2. They won't be
+ * needed during hibernate.
+ */
+
+static void toi_mark_pages_for_pageset2(void)
+{
+ struct attention_list *this = attention_list;
+
+ memory_bm_clear(pageset2_map);
+
+ if (test_action_state(TOI_NO_PAGESET2) || no_ps2_needed)
+ return;
+
+ if (test_action_state(TOI_PAGESET2_FULL))
+ pageset2_full();
+ else
+ mark_tasks(PAGESET2);
+
+ /*
+ * Because the tasks in attention_list are ones related to hibernating,
+ * we know that they won't go away under us.
+ */
+
+ while (this) {
+ if (!test_result_state(TOI_ABORTED))
+ toi_mark_task_as_pageset(this->task, PAGESET1);
+ this = this->next;
+ }
+}
+
+/*
+ * The atomic copy of pageset1 is stored in pageset2 pages.
+ * But if pageset1 is larger (normally only just after boot),
+ * we need to allocate extra pages to store the atomic copy.
+ * The following data struct and functions are used to handle
+ * the allocation and freeing of that memory.
+ */
+
+static unsigned long extra_pages_allocated;
+
+struct extras {
+ struct page *page;
+ int order;
+ struct extras *next;
+};
+
+static struct extras *extras_list;
+
+/* toi_free_extra_pagedir_memory
+ *
+ * Description: Free previously allocated extra pagedir memory.
+ */
+void toi_free_extra_pagedir_memory(void)
+{
+ /* Free allocated pages */
+ while (extras_list) {
+ struct extras *this = extras_list;
+ int i;
+
+ extras_list = this->next;
+
+ for (i = 0; i < (1 << this->order); i++)
+ ClearPageNosave(this->page + i);
+
+ toi_free_pages(9, this->page, this->order);
+ toi_kfree(7, this, sizeof(*this));
+ }
+
+ extra_pages_allocated = 0;
+}
+
+/* toi_allocate_extra_pagedir_memory
+ *
+ * Description: Allocate memory for making the atomic copy of pagedir1 in the
+ * case where it is bigger than pagedir2.
+ * Arguments: int extra_pages_needed: Total number of extra pages needed.
+ * Result: int. Number of extra pages we now have allocated.
+ */
+static int toi_allocate_extra_pagedir_memory(int extra_pages_needed)
+{
+ int j, order, num_to_alloc = extra_pages_needed - extra_pages_allocated;
+ gfp_t flags = TOI_ATOMIC_GFP;
+
+ if (num_to_alloc < 1)
+ return 0;
+
+ order = fls(num_to_alloc);
+ if (order >= MAX_ORDER)
+ order = MAX_ORDER - 1;
+
+ while (num_to_alloc) {
+ struct page *newpage;
+ unsigned long virt;
+ struct extras *extras_entry;
+
+ while ((1 << order) > num_to_alloc)
+ order--;
+
+ extras_entry = (struct extras *) toi_kzalloc(7,
+ sizeof(struct extras), TOI_ATOMIC_GFP);
+
+ if (!extras_entry)
+ return extra_pages_allocated;
+
+ virt = toi_get_free_pages(9, flags, order);
+ while (!virt && order) {
+ order--;
+ virt = toi_get_free_pages(9, flags, order);
+ }
+
+ if (!virt) {
+ toi_kfree(7, extras_entry, sizeof(*extras_entry));
+ return extra_pages_allocated;
+ }
+
+ newpage = virt_to_page(virt);
+
+ extras_entry->page = newpage;
+ extras_entry->order = order;
+ extras_entry->next = extras_list;
+
+ extras_list = extras_entry;
+
+ for (j = 0; j < (1 << order); j++) {
+ SetPageNosave(newpage + j);
+ SetPagePageset1Copy(newpage + j);
+ }
+
+ extra_pages_allocated += (1 << order);
+ num_to_alloc -= (1 << order);
+ }
+
+ return extra_pages_allocated;
+}
+
+/*
+ * real_nr_free_pages: Count free pages, including those on the per-cpu
+ * lists, for the zones selected by zone_idx_mask (all_zones_mask for all).
+ */
+unsigned long real_nr_free_pages(unsigned long zone_idx_mask)
+{
+ struct zone *zone;
+ int result = 0, cpu;
+
+ /* PCP lists */
+ for_each_populated_zone(zone) {
+ if (!(zone_idx_mask & (1 << zone_idx(zone))))
+ continue;
+
+ for_each_online_cpu(cpu) {
+ struct per_cpu_pageset *pset =
+ per_cpu_ptr(zone->pageset, cpu);
+ struct per_cpu_pages *pcp = &pset->pcp;
+ result += pcp->count;
+ }
+
+ result += zone_page_state(zone, NR_FREE_PAGES);
+ }
+ return result;
+}
+EXPORT_SYMBOL_GPL(real_nr_free_pages);
+
+/*
+ * Discover how much extra memory will be required by the drivers
+ * when they're asked to hibernate. We can then ensure that amount
+ * of memory is available when we really want it.
+ */
+static void get_extra_pd1_allowance(void)
+{
+ unsigned long orig_num_free = real_nr_free_pages(all_zones_mask), final;
+
+ toi_prepare_status(CLEAR_BAR, "Finding allowance for drivers.");
+
+ if (toi_go_atomic(PMSG_FREEZE, 1))
+ return;
+
+ final = real_nr_free_pages(all_zones_mask);
+ toi_end_atomic(ATOMIC_ALL_STEPS, 1, 0);
+
+ extra_pd1_pages_allowance = (orig_num_free > final) ?
+ orig_num_free - final + MIN_EXTRA_PAGES_ALLOWANCE :
+ MIN_EXTRA_PAGES_ALLOWANCE;
+}
+
+/*
+ * Amount of storage needed, possibly taking into account the
+ * expected compression ratio and possibly also ignoring our
+ * allowance for extra pages.
+ */
+static unsigned long main_storage_needed(int use_ecr,
+ int ignore_extra_pd1_allow)
+{
+ return (pagedir1.size + pagedir2.size +
+ (ignore_extra_pd1_allow ? 0 : extra_pd1_pages_allowance)) *
+ (use_ecr ? toi_expected_compression_ratio() : 100) / 100;
+}
+
+/*
+ * Storage needed for the image header. Calculated in bytes, then
+ * converted to the number of pages required before returning.
+ */
+unsigned long get_header_storage_needed(void)
+{
+ unsigned long bytes = sizeof(struct toi_header) +
+ toi_header_storage_for_modules() +
+ toi_pageflags_space_needed() +
+ fs_info_space_needed();
+
+ return DIV_ROUND_UP(bytes, PAGE_SIZE);
+}
+EXPORT_SYMBOL_GPL(get_header_storage_needed);
+
+/*
+ * When freeing memory, pages from either pageset might be freed.
+ *
+ * When seeking to free memory to be able to hibernate, for every ps1 page
+ * freed, we need 2 less pages for the atomic copy because there is one less
+ * page to copy and one more page into which data can be copied.
+ *
+ * Freeing ps2 pages saves us nothing directly. No more memory is available
+ * for the atomic copy. Indirectly, a ps1 page might be freed (slab?), but
+ * that's too much work to figure out.
+ *
+ * => ps1_to_free functions
+ *
+ * Of course, if we just want to reduce the image size because of storage
+ * limitations or an image size limit, either pageset will do.
+ *
+ * => any_to_free function
+ */
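+
+/*
+ * Illustrative example of the halving above: if pageset1 needs 10 more
+ * pages than are available for its atomic copy, freeing 5 pageset1 pages
+ * is enough, since each freed page both removes a page to be copied and
+ * frees a destination page; hence the DIV_ROUND_UP(need - available, 2)
+ * in the ps1_to_free helpers below.
+ */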
+
+static unsigned long lowpages_usable_for_highmem_copy(void)
+{
+ unsigned long needed = get_lowmem_size(pagedir1) +
+ extra_pd1_pages_allowance + MIN_FREE_RAM +
+ toi_memory_for_modules(0),
+ available = get_lowmem_size(pagedir2) +
+ real_nr_free_low_pages() + extra_pages_allocated;
+
+ return available > needed ? available - needed : 0;
+}
+
+static unsigned long highpages_ps1_to_free(void)
+{
+ unsigned long need = get_highmem_size(pagedir1),
+ available = get_highmem_size(pagedir2) +
+ real_nr_free_high_pages() +
+ lowpages_usable_for_highmem_copy();
+
+ return need > available ? DIV_ROUND_UP(need - available, 2) : 0;
+}
+
+static unsigned long lowpages_ps1_to_free(void)
+{
+ unsigned long needed = get_lowmem_size(pagedir1) +
+ extra_pd1_pages_allowance + MIN_FREE_RAM +
+ toi_memory_for_modules(0),
+ available = get_lowmem_size(pagedir2) +
+ real_nr_free_low_pages() + extra_pages_allocated;
+
+ return needed > available ? DIV_ROUND_UP(needed - available, 2) : 0;
+}
+
+static unsigned long current_image_size(void)
+{
+ return pagedir1.size + pagedir2.size + header_storage_needed;
+}
+
+static unsigned long storage_still_required(void)
+{
+ unsigned long needed = main_storage_needed(1, 1);
+ return needed > storage_limit ? needed - storage_limit : 0;
+}
+
+static unsigned long ram_still_required(void)
+{
+ unsigned long needed = MIN_FREE_RAM + toi_memory_for_modules(0) +
+ 2 * extra_pd1_pages_allowance,
+ available = real_nr_free_low_pages() + extra_pages_allocated;
+ return needed > available ? needed - available : 0;
+}
+
+unsigned long any_to_free(int use_image_size_limit)
+{
+ int use_soft_limit = use_image_size_limit && image_size_limit > 0;
+ unsigned long current_size = current_image_size(),
+ soft_limit = use_soft_limit ? (image_size_limit << 8) : 0,
+ to_free = use_soft_limit ? (current_size > soft_limit ?
+ current_size - soft_limit : 0) : 0,
+ storage_limit = storage_still_required(),
+ ram_limit = ram_still_required(),
+ first_max = max(to_free, storage_limit);
+
+ return max(first_max, ram_limit);
+}
+
+static int need_pageset2(void)
+{
+ return (real_nr_free_low_pages() + extra_pages_allocated -
+ 2 * extra_pd1_pages_allowance - MIN_FREE_RAM -
+ toi_memory_for_modules(0) - pagedir1.size) < pagedir2.size;
+}
+
+/* amount_needed
+ *
+ * Calculates the amount by which the image size needs to be reduced to meet
+ * our constraints.
+ */
+static unsigned long amount_needed(int use_image_size_limit)
+{
+ return max(highpages_ps1_to_free() + lowpages_ps1_to_free(),
+ any_to_free(use_image_size_limit));
+}
+
+static int image_not_ready(int use_image_size_limit)
+{
+ toi_message(TOI_EAT_MEMORY, TOI_LOW, 1,
+ "Amount still needed (%lu) > 0:%u,"
+ " Storage allocd: %lu < %lu: %u.\n",
+ amount_needed(use_image_size_limit),
+ (amount_needed(use_image_size_limit) > 0),
+ main_storage_allocated,
+ main_storage_needed(1, 1),
+ main_storage_allocated < main_storage_needed(1, 1));
+
+ toi_cond_pause(0, NULL);
+
+ return (amount_needed(use_image_size_limit) > 0) ||
+ main_storage_allocated < main_storage_needed(1, 1);
+}
+
+static void display_failure_reason(int tries_exceeded)
+{
+ unsigned long storage_required = storage_still_required(),
+ ram_required = ram_still_required(),
+ high_ps1 = highpages_ps1_to_free(),
+ low_ps1 = lowpages_ps1_to_free();
+
+ printk(KERN_INFO "Failed to prepare the image because...\n");
+
+ if (!storage_limit) {
+ printk(KERN_INFO "- You need some storage available to be "
+ "able to hibernate.\n");
+ return;
+ }
+
+ if (tries_exceeded)
+ printk(KERN_INFO "- The maximum number of iterations was "
+ "reached without successfully preparing the "
+ "image.\n");
+
+ if (storage_required) {
+ printk(KERN_INFO " - We need at least %lu pages of storage "
+ "(ignoring the header), but only have %lu.\n",
+ main_storage_needed(1, 1),
+ main_storage_allocated);
+ set_abort_result(TOI_INSUFFICIENT_STORAGE);
+ }
+
+ if (ram_required) {
+ printk(KERN_INFO " - We need %lu more free pages of low "
+ "memory.\n", ram_required);
+ printk(KERN_INFO " Minimum free : %8d\n", MIN_FREE_RAM);
+ printk(KERN_INFO " + Reqd. by modules : %8lu\n",
+ toi_memory_for_modules(0));
+ printk(KERN_INFO " + 2 * extra allow : %8lu\n",
+ 2 * extra_pd1_pages_allowance);
+ printk(KERN_INFO " - Currently free : %8lu\n",
+ real_nr_free_low_pages());
+ printk(KERN_INFO " - Pages allocd : %8lu\n",
+ extra_pages_allocated);
+ printk(KERN_INFO " : ========\n");
+ printk(KERN_INFO " Still needed : %8lu\n",
+ ram_required);
+
+ /* Print breakdown of memory needed for modules */
+ toi_memory_for_modules(1);
+ set_abort_result(TOI_UNABLE_TO_FREE_ENOUGH_MEMORY);
+ }
+
+ if (high_ps1) {
+ printk(KERN_INFO "- We need to free %lu highmem pageset 1 "
+ "pages.\n", high_ps1);
+ set_abort_result(TOI_UNABLE_TO_FREE_ENOUGH_MEMORY);
+ }
+
+ if (low_ps1) {
+ printk(KERN_INFO " - We need to free %lu lowmem pageset 1 "
+ "pages.\n", low_ps1);
+ set_abort_result(TOI_UNABLE_TO_FREE_ENOUGH_MEMORY);
+ }
+}
+
+static void display_stats(int always, int sub_extra_pd1_allow)
+{
+ char buffer[255];
+ snprintf(buffer, 254,
+ "Free:%lu(%lu). Sets:%lu(%lu),%lu(%lu). "
+ "Nosave:%lu-%lu=%lu. Storage:%lu/%lu(%lu=>%lu). "
+ "Needed:%lu,%lu,%lu(%u,%lu,%lu,%ld) (PS2:%s)\n",
+
+ /* Free */
+ real_nr_free_pages(all_zones_mask),
+ real_nr_free_low_pages(),
+
+ /* Sets */
+ pagedir1.size, pagedir1.size - get_highmem_size(pagedir1),
+ pagedir2.size, pagedir2.size - get_highmem_size(pagedir2),
+
+ /* Nosave */
+ num_nosave, extra_pages_allocated,
+ num_nosave - extra_pages_allocated,
+
+ /* Storage */
+ main_storage_allocated,
+ storage_limit,
+ main_storage_needed(1, sub_extra_pd1_allow),
+ main_storage_needed(1, 1),
+
+ /* Needed */
+ lowpages_ps1_to_free(), highpages_ps1_to_free(),
+ any_to_free(1),
+ MIN_FREE_RAM, toi_memory_for_modules(0),
+ extra_pd1_pages_allowance,
+ image_size_limit,
+
+ need_pageset2() ? "yes" : "no");
+
+ if (always)
+ printk(KERN_INFO "%s", buffer);
+ else
+ toi_message(TOI_EAT_MEMORY, TOI_MEDIUM, 1, buffer);
+}
+
+/* generate_free_page_map
+ *
+ * Description: This routine generates a bitmap of free pages from the
+ * lists used by the memory manager. We then use the bitmap
+ * to quickly calculate which pages to save and in which
+ * pagesets.
+ */
+static void generate_free_page_map(void)
+{
+ int order, cpu, t;
+ unsigned long flags, i;
+ struct zone *zone;
+ struct list_head *curr;
+ unsigned long pfn;
+ struct page *page;
+
+ for_each_populated_zone(zone) {
+
+ if (!zone->spanned_pages)
+ continue;
+
+ spin_lock_irqsave(&zone->lock, flags);
+
+ for (i = 0; i < zone->spanned_pages; i++) {
+ pfn = zone->zone_start_pfn + i;
+
+ if (!pfn_valid(pfn))
+ continue;
+
+ page = pfn_to_page(pfn);
+
+ ClearPageNosaveFree(page);
+ }
+
+ for_each_migratetype_order(order, t) {
+ list_for_each(curr,
+ &zone->free_area[order].free_list[t]) {
+ unsigned long j;
+
+ pfn = page_to_pfn(list_entry(curr, struct page,
+ lru));
+ for (j = 0; j < (1UL << order); j++)
+ SetPageNosaveFree(pfn_to_page(pfn + j));
+ }
+ }
+
+ for_each_online_cpu(cpu) {
+ struct per_cpu_pageset *pset =
+ per_cpu_ptr(zone->pageset, cpu);
+ struct per_cpu_pages *pcp = &pset->pcp;
+ struct page *page;
+ int t;
+
+ for (t = 0; t < MIGRATE_PCPTYPES; t++)
+ list_for_each_entry(page, &pcp->lists[t], lru)
+ SetPageNosaveFree(page);
+ }
+
+ spin_unlock_irqrestore(&zone->lock, flags);
+ }
+}
+
+/* size_of_free_region
+ *
+ * Description: Return the number of pages that are free, beginning with and
+ * including this one.
+ */
+static int size_of_free_region(struct zone *zone, unsigned long start_pfn)
+{
+ unsigned long this_pfn = start_pfn,
+ end_pfn = zone_end_pfn(zone);
+
+ while (pfn_valid(this_pfn) && this_pfn < end_pfn && PageNosaveFree(pfn_to_page(this_pfn)))
+ this_pfn++;
+
+ return this_pfn - start_pfn;
+}
+
+/* flag_image_pages
+ *
+ * This routine generates our lists of pages to be stored in each
+ * pageset. Since we store the data using extents, and adding new
+ * extents might allocate a new extent page, this routine may well
+ * be called more than once.
+ */
+static void flag_image_pages(int atomic_copy)
+{
+ int num_free = 0;
+ unsigned long loop;
+ struct zone *zone;
+
+ pagedir1.size = 0;
+ pagedir2.size = 0;
+
+ set_highmem_size(pagedir1, 0);
+ set_highmem_size(pagedir2, 0);
+
+ num_nosave = 0;
+
+ memory_bm_clear(pageset1_map);
+
+ generate_free_page_map();
+
+ /*
+ * Pages not to be saved are marked Nosave irrespective of being
+ * reserved.
+ */
+ for_each_populated_zone(zone) {
+ int highmem = is_highmem(zone);
+
+ for (loop = 0; loop < zone->spanned_pages; loop++) {
+ unsigned long pfn = zone->zone_start_pfn + loop;
+ struct page *page;
+ int chunk_size;
+
+ if (!pfn_valid(pfn))
+ continue;
+
+ chunk_size = size_of_free_region(zone, pfn);
+ if (chunk_size) {
+ num_free += chunk_size;
+ loop += chunk_size - 1;
+ continue;
+ }
+
+ page = pfn_to_page(pfn);
+
+ if (PageNosave(page)) {
+ num_nosave++;
+ continue;
+ }
+
+ page = highmem ? saveable_highmem_page(zone, pfn) :
+ saveable_page(zone, pfn);
+
+ if (!page) {
+ num_nosave++;
+ continue;
+ }
+
+ if (PagePageset2(page)) {
+ pagedir2.size++;
+ if (PageHighMem(page))
+ inc_highmem_size(pagedir2);
+ else
+ SetPagePageset1Copy(page);
+ if (PageResave(page)) {
+ SetPagePageset1(page);
+ ClearPagePageset1Copy(page);
+ pagedir1.size++;
+ if (PageHighMem(page))
+ inc_highmem_size(pagedir1);
+ }
+ } else {
+ pagedir1.size++;
+ SetPagePageset1(page);
+ if (PageHighMem(page))
+ inc_highmem_size(pagedir1);
+ }
+ }
+ }
+
+ if (!atomic_copy)
+ toi_message(TOI_EAT_MEMORY, TOI_MEDIUM, 0,
+ "Count data pages: Set1 (%lu) + Set2 (%lu) + Nosave (%lu)"
+ " + NumFree (%d) = %lu.\n",
+ pagedir1.size, pagedir2.size, num_nosave, num_free,
+ pagedir1.size + pagedir2.size + num_nosave + num_free);
+}
+
+void toi_recalculate_image_contents(int atomic_copy)
+{
+ memory_bm_clear(pageset1_map);
+ if (!atomic_copy) {
+ unsigned long pfn;
+ memory_bm_position_reset(pageset2_map);
+ for (pfn = memory_bm_next_pfn(pageset2_map);
+ pfn != BM_END_OF_MAP;
+ pfn = memory_bm_next_pfn(pageset2_map))
+ ClearPagePageset1Copy(pfn_to_page(pfn));
+ /* Need to call this before getting pageset1_size! */
+ toi_mark_pages_for_pageset2();
+ }
+ flag_image_pages(atomic_copy);
+
+ if (!atomic_copy) {
+ storage_limit = toiActiveAllocator->storage_available();
+ display_stats(0, 0);
+ }
+}
+
+int try_allocate_extra_memory(void)
+{
+ unsigned long wanted = pagedir1.size + extra_pd1_pages_allowance -
+ get_lowmem_size(pagedir2);
+ if (wanted > extra_pages_allocated) {
+ unsigned long got = toi_allocate_extra_pagedir_memory(wanted);
+ if (got < wanted) {
+ toi_message(TOI_EAT_MEMORY, TOI_LOW, 1,
+ "Want %lu extra pages for pageset1, got %lu.\n",
+ wanted, got);
+ return 1;
+ }
+ }
+ return 0;
+}
+
+
+/* update_image
+ *
+ * Allocate [more] memory and storage for the image.
+ */
+static void update_image(int ps2_recalc)
+{
+ int old_header_req;
+ unsigned long seek;
+
+ if (try_allocate_extra_memory())
+ return;
+
+ if (ps2_recalc)
+ goto recalc;
+
+ thaw_kernel_threads();
+
+ /*
+ * Allocate remaining storage space, if possible, up to the
+ * maximum we know we'll need. It's okay to allocate the
+ * maximum if the writer is the swapwriter, but
+ * we don't want to grab all available space on an NFS share.
+ * We therefore ignore the expected compression ratio here,
+ * thereby trying to allocate the maximum image size we could
+ * need (assuming compression doesn't expand the image), but
+ * don't complain if we can't get the full amount we're after.
+ */
+
+ do {
+ int result;
+
+ old_header_req = header_storage_needed;
+ toiActiveAllocator->reserve_header_space(header_storage_needed);
+
+ /* How much storage is free with the reservation applied? */
+ storage_limit = toiActiveAllocator->storage_available();
+ seek = min(storage_limit, main_storage_needed(0, 0));
+
+ result = toiActiveAllocator->allocate_storage(seek);
+ if (result)
+ printk(KERN_INFO "Failed to allocate storage (%d).\n", result);
+
+ main_storage_allocated =
+ toiActiveAllocator->storage_allocated();
+
+ /* Need more header because more storage allocated? */
+ header_storage_needed = get_header_storage_needed();
+
+ } while (header_storage_needed > old_header_req);
+
+ if (freeze_kernel_threads())
+ set_abort_result(TOI_FREEZING_FAILED);
+
+recalc:
+ toi_recalculate_image_contents(0);
+}
+
+/* attempt_to_freeze
+ *
+ * Try to freeze processes.
+ */
+
+static int attempt_to_freeze(void)
+{
+ int result;
+
+ /* Stop processes before checking again */
+ toi_prepare_status(CLEAR_BAR, "Freezing processes & syncing "
+ "filesystems.");
+ result = freeze_processes();
+
+ if (result)
+ set_abort_result(TOI_FREEZING_FAILED);
+
+ result = freeze_kernel_threads();
+
+ if (result)
+ set_abort_result(TOI_FREEZING_FAILED);
+
+ return result;
+}
+
+/* eat_memory
+ *
+ * Try to free some memory, to meet either hard or soft constraints on the image
+ * characteristics.
+ *
+ * Hard constraints:
+ * - Pageset1 must be < half of memory;
+ * - We must have enough memory free at resume time to have pageset1
+ * be able to be loaded in pages that don't conflict with where it has to
+ * be restored.
+ * Soft constraints
+ * - User-specified image size limit.
+ */
+static void eat_memory(void)
+{
+ unsigned long amount_wanted = 0;
+ int did_eat_memory = 0;
+
+ /*
+ * Note that if we have enough storage space and enough free memory, we
+ * may exit without eating anything. We give up when the last 10
+ * iterations ate no extra pages because we're not going to get much
+ * more anyway, but the few pages we get will take a lot of time.
+ *
+ * We freeze processes before beginning, and then unfreeze them if we
+ * need to eat memory until we think we have enough. If our attempts
+ * to freeze fail, we give up and abort.
+ */
+
+ amount_wanted = amount_needed(1);
+
+ switch (image_size_limit) {
+ case -1: /* Don't eat any memory */
+ if (amount_wanted > 0) {
+ set_abort_result(TOI_WOULD_EAT_MEMORY);
+ return;
+ }
+ break;
+ case -2: /* Free caches only */
+ drop_pagecache();
+ toi_recalculate_image_contents(0);
+ amount_wanted = amount_needed(1);
+ break;
+ default:
+ break;
+ }
+
+ if (amount_wanted > 0 && !test_result_state(TOI_ABORTED) &&
+ image_size_limit != -1) {
+ unsigned long request = amount_wanted;
+ unsigned long high_req = max(highpages_ps1_to_free(),
+ any_to_free(1));
+ unsigned long low_req = lowpages_ps1_to_free();
+ unsigned long got = 0;
+
+ toi_prepare_status(CLEAR_BAR,
+ "Seeking to free %ldMB of memory.",
+ MB(amount_wanted));
+
+ thaw_kernel_threads();
+
+ /*
+ * Ask for too many because shrink_memory_mask doesn't
+ * currently return enough most of the time.
+ */
+
+ if (low_req)
+ got = shrink_memory_mask(low_req, GFP_KERNEL);
+ if (high_req)
+ shrink_memory_mask(high_req - got, GFP_HIGHUSER);
+
+ did_eat_memory = 1;
+
+ toi_recalculate_image_contents(0);
+
+ amount_wanted = amount_needed(1);
+
+ printk(KERN_DEBUG "Asked shrink_memory_mask for %lu low pages &"
+ " %lu pages from anywhere, got %lu.\n",
+ low_req, high_req,
+ request - amount_wanted);
+
+ toi_cond_pause(0, NULL);
+
+ if (freeze_kernel_threads())
+ set_abort_result(TOI_FREEZING_FAILED);
+ }
+
+ if (did_eat_memory)
+ toi_recalculate_image_contents(0);
+}
+
+/* toi_prepare_image
+ *
+ * Entry point to the whole image preparation section.
+ *
+ * We do four things:
+ * - Freeze processes;
+ * - Ensure image size constraints are met;
+ * - Complete all the preparation for saving the image,
+ * including allocation of storage. The only memory
+ * that should be needed when we're finished is that
+ * for actually storing the image (and we know how
+ * much is needed for that because the modules tell
+ * us).
+ * - Make sure that all dirty buffers are written out.
+ */
+#define MAX_TRIES 2
+int toi_prepare_image(void)
+{
+ int result = 1, tries = 1;
+
+ main_storage_allocated = 0;
+ no_ps2_needed = 0;
+
+ if (attempt_to_freeze())
+ return 1;
+
+ lock_device_hotplug();
+ set_toi_state(TOI_DEVICE_HOTPLUG_LOCKED);
+
+ if (!extra_pd1_pages_allowance)
+ get_extra_pd1_allowance();
+
+ storage_limit = toiActiveAllocator->storage_available();
+
+ if (!storage_limit) {
+ printk(KERN_INFO "No storage available. Didn't try to prepare "
+ "an image.\n");
+ display_failure_reason(0);
+ set_abort_result(TOI_NOSTORAGE_AVAILABLE);
+ return 1;
+ }
+
+ if (build_attention_list()) {
+ abort_hibernate(TOI_UNABLE_TO_PREPARE_IMAGE,
+ "Unable to successfully prepare the image.\n");
+ return 1;
+ }
+
+ toi_recalculate_image_contents(0);
+
+ do {
+ toi_prepare_status(CLEAR_BAR,
+ "Preparing Image. Try %d.", tries);
+
+ eat_memory();
+
+ if (test_result_state(TOI_ABORTED))
+ break;
+
+ update_image(0);
+
+ tries++;
+
+ } while (image_not_ready(1) && tries <= MAX_TRIES &&
+ !test_result_state(TOI_ABORTED));
+
+ result = image_not_ready(0);
+
+ if (!test_result_state(TOI_ABORTED)) {
+ if (result) {
+ display_stats(1, 0);
+ display_failure_reason(tries > MAX_TRIES);
+ abort_hibernate(TOI_UNABLE_TO_PREPARE_IMAGE,
+ "Unable to successfully prepare the image.\n");
+ } else {
+ /* Pageset 2 needed? */
+ if (!need_pageset2() &&
+ test_action_state(TOI_NO_PS2_IF_UNNEEDED)) {
+ no_ps2_needed = 1;
+ toi_recalculate_image_contents(0);
+ update_image(1);
+ }
+
+ toi_cond_pause(1, "Image preparation complete.");
+ }
+ }
+
+ return result ? result : allocate_checksum_pages();
+}
diff --git a/kernel/power/tuxonice_prepare_image.h b/kernel/power/tuxonice_prepare_image.h
new file mode 100644
index 0000000..73e8bf2
--- /dev/null
+++ b/kernel/power/tuxonice_prepare_image.h
@@ -0,0 +1,38 @@
+/*
+ * kernel/power/tuxonice_prepare_image.h
+ *
+ * Copyright (C) 2003-2014 Nigel Cunningham (nigel at tuxonice net)
+ *
+ * This file is released under the GPLv2.
+ *
+ */
+
+#include <asm/sections.h>
+
+extern int toi_prepare_image(void);
+extern void toi_recalculate_image_contents(int atomic_copy);
+extern unsigned long real_nr_free_pages(unsigned long zone_idx_mask);
+extern long image_size_limit;
+extern void toi_free_extra_pagedir_memory(void);
+extern unsigned long extra_pd1_pages_allowance;
+extern void free_attention_list(void);
+
+#define MIN_FREE_RAM 100
+#define MIN_EXTRA_PAGES_ALLOWANCE 500
+
+#define all_zones_mask ((unsigned long) ((1 << MAX_NR_ZONES) - 1))
+#ifdef CONFIG_HIGHMEM
+#define real_nr_free_high_pages() (real_nr_free_pages(1 << ZONE_HIGHMEM))
+#define real_nr_free_low_pages() (real_nr_free_pages(all_zones_mask - \
+ (1 << ZONE_HIGHMEM)))
+#else
+#define real_nr_free_high_pages() (0)
+#define real_nr_free_low_pages() (real_nr_free_pages(all_zones_mask))
+
+/* For eat_memory function */
+#define ZONE_HIGHMEM (MAX_NR_ZONES + 1)
+#endif
+
+unsigned long get_header_storage_needed(void);
+unsigned long any_to_free(int use_image_size_limit);
+int try_allocate_extra_memory(void);
diff --git a/kernel/power/tuxonice_prune.c b/kernel/power/tuxonice_prune.c
new file mode 100644
index 0000000..9a9444d
--- /dev/null
+++ b/kernel/power/tuxonice_prune.c
@@ -0,0 +1,419 @@
+/*
+ * kernel/power/tuxonice_prune.c
+ *
+ * Copyright (C) 2012 Nigel Cunningham (nigel at tuxonice net)
+ *
+ * This file is released under the GPLv2.
+ *
+ * This file implements a TuxOnIce module that seeks to prune the
+ * amount of data written to disk. It builds a table of hashes
+ * of the uncompressed data, and writes the pfn of the previous page
+ * with the same contents instead of repeating the data when a match
+ * is found.
+ */
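+
+/*
+ * Sketch of the intended flow (inferred from this file, not a
+ * specification): on write, each page is hashed and, when an identical
+ * page was seen earlier, only that page's pfn is emitted, flagged with a
+ * length of PRUNE_DATA_IS_PFN; on read, seeing that length means the
+ * earlier page's contents should be reused instead of reading page data.
+ */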
+
+#include <linux/suspend.h>
+#include <linux/highmem.h>
+#include <linux/vmalloc.h>
+#include <linux/crypto.h>
+#include <linux/scatterlist.h>
+#include <crypto/hash.h>
+
+#include "tuxonice_builtin.h"
+#include "tuxonice.h"
+#include "tuxonice_modules.h"
+#include "tuxonice_sysfs.h"
+#include "tuxonice_io.h"
+#include "tuxonice_ui.h"
+#include "tuxonice_alloc.h"
+
+/*
+ * We never write a page bigger than PAGE_SIZE, so use a large number
+ * to indicate that data is a PFN.
+ */
+#define PRUNE_DATA_IS_PFN (PAGE_SIZE + 100)
+
+static unsigned long toi_pruned_pages;
+
+static struct toi_module_ops toi_prune_ops;
+static struct toi_module_ops *next_driver;
+
+static char toi_prune_hash_algo_name[32] = "sha1";
+
+static DEFINE_MUTEX(stats_lock);
+
+struct cpu_context {
+ struct shash_desc desc;
+ char *digest;
+};
+
+#define OUT_BUF_SIZE (2 * PAGE_SIZE)
+
+static DEFINE_PER_CPU(struct cpu_context, contexts);
+
+/*
+ * toi_prune_crypto_prepare
+ *
+ * Prepare to do some work by allocating buffers and transforms.
+ */
+static int toi_prune_crypto_prepare(void)
+{
+ int cpu, ret, digestsize = 0;
+
+ if (!*toi_prune_hash_algo_name) {
+ printk(KERN_INFO "TuxOnIce: Pruning enabled but no "
+ "hash algorithm set.\n");
+ return 1;
+ }
+
+ for_each_online_cpu(cpu) {
+ struct cpu_context *this = &per_cpu(contexts, cpu);
+ this->desc.tfm = crypto_alloc_shash(toi_prune_hash_algo_name, 0, 0);
+ if (IS_ERR(this->desc.tfm)) {
+ printk(KERN_INFO "TuxOnIce: Failed to allocate the "
+ "%s prune hash algorithm.\n",
+ toi_prune_hash_algo_name);
+ this->desc.tfm = NULL;
+ return 1;
+ }
+
+ if (!digestsize)
+ digestsize = crypto_shash_digestsize(this->desc.tfm);
+
+ this->digest = kmalloc(digestsize, GFP_KERNEL);
+ if (!this->digest) {
+ printk(KERN_INFO "TuxOnIce: Failed to allocate space "
+ "for digest output.\n");
+ crypto_free_shash(this->desc.tfm);
+ this->desc.tfm = NULL;
+ return 1;
+ }
+
+ this->desc.flags = 0;
+
+ ret = crypto_shash_init(&this->desc);
+ if (ret < 0) {
+ printk(KERN_INFO "TuxOnIce: Failed to initialise the "
+ "%s prune hash algorithm.\n",
+ toi_prune_hash_algo_name);
+ kfree(this->digest);
+ this->digest = NULL;
+ crypto_free_shash(this->desc.tfm);
+ this->desc.tfm = NULL;
+ return 1;
+ }
+ }
+
+ return 0;
+}
+
+static int toi_prune_rw_cleanup(int writing)
+{
+ int cpu;
+
+ for_each_online_cpu(cpu) {
+ struct cpu_context *this = &per_cpu(contexts, cpu);
+ if (this->desc.tfm) {
+ crypto_free_shash(this->desc.tfm);
+ this->desc.tfm = NULL;
+ }
+
+ if (this->digest) {
+ kfree(this->digest);
+ this->digest = NULL;
+ }
+ }
+
+ return 0;
+}
+
+/*
+ * toi_prune_init
+ */
+
+static int toi_prune_init(int toi_or_resume)
+{
+ if (!toi_or_resume)
+ return 0;
+
+ toi_pruned_pages = 0;
+
+ next_driver = toi_get_next_filter(&toi_prune_ops);
+
+ return next_driver ? 0 : -ECHILD;
+}
+
+/*
+ * toi_prune_rw_init()
+ */
+
+static int toi_prune_rw_init(int rw, int stream_number)
+{
+ if (toi_prune_crypto_prepare()) {
+ printk(KERN_ERR "Failed to initialise prune "
+ "algorithm.\n");
+ if (rw == READ) {
+ printk(KERN_INFO "Unable to read the image.\n");
+ return -ENODEV;
+ } else {
+ printk(KERN_INFO "Continuing without "
+ "pruning the image.\n");
+ toi_prune_ops.enabled = 0;
+ }
+ }
+
+ return 0;
+}
+
+/*
+ * toi_prune_write_page()
+ *
+ * Hash a page of data and pass the page on to the next module in the
+ * pipeline.
+ *
+ * Buffer_page: Pointer to a buffer of size PAGE_SIZE, containing
+ * data to be checked.
+ *
+ * Returns: 0 on success. Otherwise the error returned by later
+ * modules, or -ECHILD if we have a broken pipeline.
+ */
+static int toi_prune_write_page(unsigned long index, int buf_type,
+ void *buffer_page, unsigned int buf_size)
+{
+ int ret = 0, cpu = smp_processor_id(), write_data = 1;
+ struct cpu_context *ctx = &per_cpu(contexts, cpu);
+ u8* output_buffer = buffer_page;
+ int output_len = buf_size;
+ int out_buf_type = buf_type;
+ void *buffer_start;
+ u32 buf[4];
+
+ if (ctx->desc.tfm) {
+
+ buffer_start = TOI_MAP(buf_type, buffer_page);
+
+ ret = crypto_shash_digest(&ctx->desc, buffer_start, buf_size,
+ ctx->digest);
+ if (ret) {
+ printk(KERN_INFO "TuxOnIce: Failed to calculate digest (%d).\n", ret);
+ } else {
+ mutex_lock(&stats_lock);
+
+ toi_pruned_pages++;
+
+ mutex_unlock(&stats_lock);
+
+ }
+
+ TOI_UNMAP(buf_type, buffer_page);
+ }
+
+ if (write_data)
+ ret = next_driver->write_page(index, out_buf_type,
+ output_buffer, output_len);
+ else
+ ret = next_driver->write_page(index, out_buf_type,
+ output_buffer, output_len);
+
+ return ret;
+}
+
+/*
+ * toi_prune_read_page()
+ * @buffer_page: struct page *. Pointer to a buffer of size PAGE_SIZE.
+ *
+ * Retrieve data from later modules or from a previously loaded page and
+ * fill the input buffer.
+ * Zero if successful. Error condition from me or from downstream on failure.
+ */
+static int toi_prune_read_page(unsigned long *index, int buf_type,
+ void *buffer_page, unsigned int *buf_size)
+{
+ int ret, cpu = smp_processor_id();
+ unsigned int len;
+ char *buffer_start;
+ struct cpu_context *ctx = &per_cpu(contexts, cpu);
+
+ if (!ctx->desc.tfm)
+ return next_driver->read_page(index, TOI_PAGE, buffer_page,
+ buf_size);
+
+ /*
+ * All our reads must be synchronous - we can't handle
+ * data that hasn't been read yet.
+ */
+
+ ret = next_driver->read_page(index, buf_type, buffer_page, &len);
+
+ if (len == PRUNE_DATA_IS_PFN) {
+ buffer_start = kmap(buffer_page);
+ }
+
+ *buf_size = len;
+ return ret;
+}
+
+/*
+ * toi_prune_print_debug_stats
+ * @buffer: Pointer to a buffer into which the debug info will be printed.
+ * @size: Size of the buffer.
+ *
+ * Print information to be recorded for debugging purposes into a buffer.
+ * Returns: Number of characters written to the buffer.
+ */
+
+static int toi_prune_print_debug_stats(char *buffer, int size)
+{
+ int len;
+
+ /* Output the number of pages pruned. */
+ if (*toi_prune_hash_algo_name)
+ len = scnprintf(buffer, size, "- Prune hash algorithm is '%s'.\n",
+ toi_prune_hash_algo_name);
+ else
+ len = scnprintf(buffer, size, "- Prune hash algorithm is not set.\n");
+
+ if (toi_pruned_pages)
+ len += scnprintf(buffer+len, size - len, " Pruned "
+ "%lu pages.\n",
+ toi_pruned_pages);
+ return len;
+}
+
+/*
+ * toi_prune_memory_needed
+ *
+ * Tell the caller how much memory we need to operate during hibernate/resume.
+ * Returns: Int. Maximum number of bytes of memory required for
+ * operation.
+ */
+static int toi_prune_memory_needed(void)
+{
+ return 2 * PAGE_SIZE;
+}
+
+static int toi_prune_storage_needed(void)
+{
+ return 2 * sizeof(unsigned long) + 2 * sizeof(int) +
+ strlen(toi_prune_hash_algo_name) + 1;
+}
+
+/*
+ * toi_prune_save_config_info
+ * @buffer: Pointer to a buffer of size PAGE_SIZE.
+ *
+ * Save information needed when reloading the image at resume time.
+ * Returns: Number of bytes used for saving our data.
+ */
+static int toi_prune_save_config_info(char *buffer)
+{
+ int len = strlen(toi_prune_hash_algo_name) + 1, offset = 0;
+
+ *((unsigned long *) buffer) = toi_pruned_pages;
+ offset += sizeof(unsigned long);
+ *((int *) (buffer + offset)) = len;
+ offset += sizeof(int);
+ strncpy(buffer + offset, toi_prune_hash_algo_name, len);
+ return offset + len;
+}
+
+/* toi_prune_load_config_info
+ * @buffer: Pointer to the start of the data.
+ * @size: Number of bytes that were saved.
+ *
+ * Description: Reload information needed for passing back to the
+ * resumed kernel.
+ */
+static void toi_prune_load_config_info(char *buffer, int size)
+{
+ int len, offset = 0;
+
+ toi_pruned_pages = *((unsigned long *) buffer);
+ offset += sizeof(unsigned long);
+ len = *((int *) (buffer + offset));
+ offset += sizeof(int);
+ strncpy(toi_prune_hash_algo_name, buffer + offset, len);
+}
+
+static void toi_prune_pre_atomic_restore(struct toi_boot_kernel_data *bkd)
+{
+ bkd->pruned_pages = toi_pruned_pages;
+}
+
+static void toi_prune_post_atomic_restore(struct toi_boot_kernel_data *bkd)
+{
+ toi_pruned_pages = bkd->pruned_pages;
+}
+
+/*
+ * toi_prune_expected_ratio
+ *
+ * Description: Returns the expected ratio between data passed into this module
+ * and the amount of data output when writing.
+ * Returns: 100 - we have no idea how many pages will be pruned.
+ */
+
+static int toi_prune_expected_ratio(void)
+{
+ return 100;
+}
+
+/*
+ * data for our sysfs entries.
+ */
+static struct toi_sysfs_data sysfs_params[] = {
+ SYSFS_INT("enabled", SYSFS_RW, &toi_prune_ops.enabled, 0, 1, 0,
+ NULL),
+ SYSFS_STRING("algorithm", SYSFS_RW, toi_prune_hash_algo_name, 31, 0, NULL),
+};
+
+/*
+ * Ops structure.
+ */
+static struct toi_module_ops toi_prune_ops = {
+ .