Saturday, April 16, 2016

'Cached' memory in Linux Kernel

My understanding used to be that the free memory in a Linux operating system can be seen on the second line of the output of "free -m":
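On older versions of free(1), that second line is labelled "-/+ buffers/cache". The same figure can be reconstructed by hand from /proc/meminfo; this is a minimal sketch (the sum is an approximation of what old free(1) printed, not an exact reproduction):

```shell
# Approximate the "free" column of the "-/+ buffers/cache" line by
# summing MemFree + Buffers + Cached (all values in /proc/meminfo are kB).
awk '/^MemFree:|^Buffers:|^Cached:/ {sum += $2}
     END {printf "free incl. buffers/cache: %d MB\n", sum/1024}' /proc/meminfo
```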

The first line shows memory that is truly free. The second line shows free memory combined with buffers and cache. The reasoning, I was told, is that buffer and cache memory can be reclaimed as free memory whenever there is a need, because the cache memory is filled by the Linux filesystem cache.

The problem is, I was wrong. In several cases I have found that cached memory is not reduced when an application needs more memory. Instead, part of the application's memory is sent to swap, increasing swap usage and causing pauses in the system (while the memory pages are written to disk). In one case an Oracle database instance restarted, and the team thinks it was because the memory demand was too high (I think this is a bug).

The cached memory is supposed to shrink when we issue this command (ref: how-do-you-empty-the-buffers-and-cache-on-a-linux-system):
# echo 1 > /proc/sys/vm/drop_caches
But on our Oracle database instances, running the command only produced a small reduction in the cached column. I also tried echoing 2 and 3 as the value, with the same results.
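For reference, the value written to drop_caches selects what gets dropped; a sketch of the full sequence (the write requires root, so it is guarded here):

```shell
# Values for /proc/sys/vm/drop_caches:
#   1 = page cache only
#   2 = dentries and inodes (reclaimable slab)
#   3 = both
# sync first so dirty pages are written back and become droppable.
sync
if [ "$(id -u)" -eq 0 ]; then
    echo 3 > /proc/sys/vm/drop_caches
else
    echo "need root to write /proc/sys/vm/drop_caches"
fi
```

Note that drop_caches only frees clean, unreferenced cache pages; it cannot touch memory that is actively mapped, which is exactly the situation described below.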

The truth

First, the 'cached' figure does not contain only the file cache. It also contains memory-mapped files and anonymous mappings, and shared memory falls under 'anonymous mappings'. In Oracle systems without hugepages enabled, the SGA (System Global Area) is created as shared memory (see pythian-goodies-free-memory-swap-oracle-and-everything). Of course, the SGA cannot be freed from memory, otherwise the database would go offline!
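A quick way to see how much of 'cached' is actually shared memory (which includes the SGA when hugepages are not in use) is to compare the Cached and Shmem lines of /proc/meminfo, and to list the System V shared memory segments:

```shell
# Shmem (tmpfs plus SysV/POSIX shared memory) is counted inside the
# cached figure, yet it can never be dropped the way file cache can.
grep -E '^(Cached|Shmem):' /proc/meminfo

# System V shared memory segments; an Oracle SGA without hugepages
# typically shows up here as one or more large segments.
command -v ipcs >/dev/null && ipcs -m
```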

Another way to get a better understanding of memory usage is to look at /proc/meminfo:

You might want to check the Shmem usage there. On this example server, the zero values in the HugePages section show that hugepages are not being used. For an Oracle database with a large amount of RAM (say, 64 GB) and a large number of processes (500 or so), this can become a problem, mainly because PageTables grows exceedingly large (that's another story). The memory in PageTables cannot be used for anything else, so it is counted as 'used' in the 'free -m' output.
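The PageTables growth can be sketched with rough arithmetic: without hugepages, each 4 KB page needs an 8-byte page table entry, and every process that maps the SGA gets its own copy of those entries. The figures below (32 GB SGA, 500 server processes) are hypothetical, chosen only to illustrate the scale:

```shell
# One 8-byte PTE per 4 KB page => 256 PTEs * 8 bytes = 2 KB of page
# table per MB mapped, duplicated in every attached process.
sga_mb=$((32 * 1024))   # hypothetical 32 GB SGA
procs=500               # hypothetical number of dedicated server processes
kb_per_mb=2
total_kb=$((sga_mb * kb_per_mb * procs))
echo "approx PageTables: $((total_kb / 1024 / 1024)) GB"
# → approx PageTables: 31 GB
```

In other words, the page tables alone can approach the size of the SGA itself, which is the main reason hugepages (with their much larger 2 MB page size) are recommended for big Oracle instances.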

Second, some bloggers have written that the page cache (file cache) competes with application memory for portions of the real memory. The SUSE documentation confirms this:
The kernel swaps out rarely accessed memory pages to use freed memory pages as cache to speed up file system operations, for example during backup operations.
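The standard knob that biases this trade-off on any distribution is vm.swappiness: lower values make the kernel prefer dropping page cache over swapping out anonymous (application) memory. A minimal sketch:

```shell
# Read the current swappiness; the usual default is 60.
cat /proc/sys/vm/swappiness

# Lowering it (requires root) makes the kernel favor reclaiming page
# cache over swapping application memory:
# echo 10 > /proc/sys/vm/swappiness
```

This tilts the balance but does not hard-limit the page cache; for that, the distribution-specific parameters below are needed.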

Limiting page cache usage

SUSE developed additional kernel parameters to control this behavior: vm.pagecache_limit_mb and vm.pagecache_limit_ignore_dirty. These two parameters can be used to limit the size of the page cache ( = file cache) that competes with ordinary memory. While the page cache is below the limit, it competes directly with application memory, and the kernel may swap out whichever blocks of memory (file cache or application memory) have not been accessed recently. Page cache above the limit is deemed less important than application memory, so application memory will not be swapped out while there is a large amount of page cache that could be freed.
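These sysctls exist only in SUSE kernels, so any script that sets them should check for their presence first. A sketch, using a hypothetical 4 GB limit (the value is in MB):

```shell
# SUSE-only page cache limit; the files are absent on other kernels.
if [ -e /proc/sys/vm/pagecache_limit_mb ]; then
    sysctl -w vm.pagecache_limit_mb=4096         # cap page cache at 4 GB
    sysctl -w vm.pagecache_limit_ignore_dirty=1  # don't count dirty pages
else
    echo "vm.pagecache_limit_mb not available on this kernel"
fi
```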
In Red Hat Enterprise Linux 5, there is a kernel parameter, vm.pagecache, that is similar to SUSE's parameter but takes a percentage value instead. The default value is 100, meaning the whole of memory is available for use as page cache (see Memory_Usage_and_Page_Cache-Tuning_the_Page_Cache). I believe this is also the case with CentOS.
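As with the SUSE parameter, this one only exists on the kernels that ship it, so it is worth probing before setting. A sketch with a hypothetical 40% cap:

```shell
# RHEL 5 / CentOS 5 only: vm.pagecache takes a percentage of RAM.
if sysctl vm.pagecache >/dev/null 2>&1; then
    sysctl -w vm.pagecache=40   # hypothetical: cap page cache at 40% of RAM
else
    echo "vm.pagecache not available on this kernel"
fi
```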


You might want to check your Linux distribution's documentation about the page cache. Each distribution has some non-standard parameters that let us contain the page cache usage of the Linux operating system.