How to manage Linux resources using Cgroups

What is Cgroups

Cgroups, also known as control groups, is a Linux kernel feature that allocates resources such as CPU time, system memory, network bandwidth, or combinations of these resources to hierarchically ordered groups of processes running on a system.

Cgroups give system administrators fine-grained control over allocating, prioritizing, denying, managing, and monitoring system resources. They provide a way to hierarchically group and label processes, and to apply resource limits to them.

We can monitor cgroups, deny cgroups access to certain resources, and reconfigure cgroups on a running system. Hardware resources can be divided up among tasks and users, which can increase overall efficiency.

By binding the cgroup hierarchy to the systemd unit tree, systemd moves resource management from the process level to the application level. We can now manage system resources with systemctl commands, or by modifying systemd unit files.

What are Cgroup Subsystems

A subsystem represents a single resource, such as CPU time or memory. Cgroup subsystems are also called resource controllers. To identify all the subsystems provided by the Linux kernel, we can read the following file.

File: /proc/cgroups

#subsys_name	hierarchy	num_cgroups	enabled
cpuset	0	98	1
cpu	0	98	1
cpuacct	0	98	1
blkio	0	98	1
memory	0	98	1
devices	0	98	1
freezer	0	98	1
net_cls	0	98	1
perf_event	0	98	1
net_prio	0	98	1
hugetlb	0	98	1
pids	0	98	1
rdma	0	98	1
misc	0	98	1
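As a small sketch, the table above can be filtered down to just the controllers whose "enabled" column is 1. The function name enabled_controllers is made up for this example, and the optional file argument exists only so the function can be tried on a saved copy of the table.

```shell
#!/bin/sh
# List the names of enabled controllers from a /proc/cgroups-style table.
# $1 is optional and defaults to /proc/cgroups (hypothetical helper name).
enabled_controllers() {
    # Skip the '#' header line; print column 1 where column 4 (enabled) is 1.
    awk '!/^#/ && $4 == 1 { print $1 }' "${1:-/proc/cgroups}"
}

# Query the live system only when the file is readable.
if [ -r /proc/cgroups ]; then
    enabled_controllers
fi
```

On the system shown above this would print every controller name, since all of them have enabled set to 1.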

Here are the details of the above subsystems.

  • blkio : This subsystem sets limits on input/output access to and from block devices such as physical drives (disk, solid state, USB, etc.)
  • cpu : This subsystem uses the scheduler to provide cgroup tasks access to the CPU
  • cpuacct : This subsystem generates automatic reports on CPU resources used by tasks in a cgroup
  • cpuset : This subsystem assigns individual CPUs (on a multicore system) and memory nodes to tasks in a cgroup
  • devices : This subsystem allows or denies access to devices by tasks in a cgroup
  • freezer : This subsystem suspends or resumes tasks in a cgroup
  • memory : This subsystem sets limits on memory use by tasks in a cgroup, and generates automatic reports on memory resources used by those tasks
  • net_cls : This subsystem tags network packets with a class identifier (classid) that allows the Linux traffic controller (tc) to identify packets originating from a particular cgroup task
  • net_prio : This subsystem provides a way to dynamically set the priority of network traffic per network interface
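  • ns : The namespace subsystem (found on older kernels; it is not listed in the output above)
  • perf_event : This subsystem enables monitoring cgroups with the perf tool
  • hugetlb : This subsystem allows the use of large virtual memory pages and enforces resource limits on these pages
  • pids : This subsystem limits the number of processes that can be created in a cgroup
  • rdma : This subsystem limits the use of RDMA resources by tasks in a cgroup
  • misc : This subsystem limits miscellaneous scalar resources that are not covered by the other controllers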

The Linux kernel exposes a wide range of tunable parameters for resource controllers that can be configured with systemd.

Default Cgroup Hierarchies

By default, systemd automatically creates a hierarchy of slice, scope and service units to provide a unified structure for the cgroup tree. It also automatically mounts hierarchies for important kernel resource controllers in the /sys/fs/cgroup/ directory.
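To see what is actually mounted at /sys/fs/cgroup/, one common check is the filesystem type reported by stat. The sketch below assumes GNU stat and the usual convention that cgroup2fs means the unified cgroup v2 hierarchy while tmpfs means per-controller v1 mounts underneath; the function name cgroup_version is made up for this example.

```shell
#!/bin/sh
# Map the filesystem type found at /sys/fs/cgroup to a cgroup version.
# (cgroup_version is a hypothetical helper name.)
cgroup_version() {
    case "$1" in
        cgroup2fs) echo "v2 (unified hierarchy)" ;;
        tmpfs)     echo "v1 (legacy or hybrid hierarchy)" ;;
        *)         echo "unknown" ;;
    esac
}

# stat -f reports on the filesystem; %T prints its type in readable form.
if [ -d /sys/fs/cgroup ]; then
    cgroup_version "$(stat -fc %T /sys/fs/cgroup)"
fi
```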

All processes running on the system are child processes of the systemd init process. Systemd provides three unit types that are used for the purpose of resource control.

  • Service : A process or a group of processes, which systemd started based on a unit configuration file. Services encapsulate the specified processes so that they can be started and stopped as one set.
  • Scope : A group of externally created processes. Scopes encapsulate processes that are started and stopped by arbitrary processes through the fork() function and then registered by systemd at runtime. For instance, user sessions, containers, and virtual machines are treated as scopes.
  • Slice : A group of hierarchically organized units. Slices do not contain processes; they organize a hierarchy in which scopes and services are placed.

Service, scope, and slice units directly map to objects in the cgroup tree. Services, scopes, and slices are created manually by the system administrator or dynamically by programs. By default, the operating system defines a number of built-in services that are necessary to run the system.
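This mapping can be observed from the process side: every process lists its cgroup in /proc/<pid>/cgroup. The sketch below assumes a cgroup v2 system, where that file contains a single line of the form "0::<path>"; the helper names cgroup_path and unit_name are made up for this example.

```shell
#!/bin/sh
# Print the cgroup path recorded in a /proc/<pid>/cgroup-style file.
# Assumes the cgroup v2 line format "0::<path>" (hypothetical helper names).
cgroup_path() {
    sed -n 's/^0:://p' "${1:-/proc/self/cgroup}"
}

# Print the innermost unit, i.e. the last component of the cgroup path.
unit_name() {
    path=$(cgroup_path "$1")
    echo "${path##*/}"
}

# Show the cgroup of the current shell when the file is available.
if [ -r /proc/self/cgroup ]; then
    cgroup_path /proc/self/cgroup
fi
```

For a shell inside an SSH session, the path would typically be made up of slice directories ending in a session scope, which mirrors the slice and scope nesting described above.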

Also, there are four slices created by default.

  • -.slice : The root slice
  • system.slice : The default place for all system services
  • user.slice : The default place for all user sessions
  • machine.slice : The default place for all virtual machines and Linux containers

Cgroup Tree

Let’s look at the default cgroup hierarchy created by the system, as shown below.

sudo systemd-cgls

Output:

CGroup /:
-.slice
├─user.slice
│ └─user-1000.slice
│   ├─user@1000.service …
│   │ └─init.scope
│   │   ├─1095 /usr/lib/systemd/systemd --user
│   │   └─1097 (sd-pam)
│   └─session-1.scope
│     ├─1090 sshd-session: admin [priv]
│     ├─1106 sshd-session: admin@pts/0
│     ├─1107 -bash
│     ├─1140 sudo systemd-cgls
│     ├─1142 sudo systemd-cgls
│     ├─1143 systemd-cgls
│     └─1144 less
├─init.scope
│ └─1 /usr/lib/systemd/systemd --switched-root --system --deserialize=49 rhgb
└─system.slice
  ├─irqbalance.service
  │ └─898 /usr/sbin/irqbalance
  ├─abrt-journal-core.service
  │ └─940 /usr/bin/abrt-dump-journal-core -D -T -f -e
  ├─systemd-udevd.service …
  │ └─udev
  │   └─751 /usr/lib/systemd/systemd-udevd
  ├─dbus-broker.service
  │ ├─887 /usr/bin/dbus-broker-launch --scope system --audit
  │ └─894 dbus-broker --log 4 --controller 9 --machine-id b94e7cb863f54473aac77325a3858a49 --max-bytes 536870912 --max-fds 4096 --max-m>
  ├─systemd-homed.service
  │ └─902 /usr/lib/systemd/systemd-homed
...

Creating Control Groups

Service and slice units can be configured with persistent unit files or created dynamically at runtime by API calls to PID 1. Scope units can be created only dynamically.

Units created dynamically with API calls are transient and exist only during runtime. Transient units are released automatically as soon as they finish, get deactivated, or the system is rebooted.

There are two types of cgroups that can be created: transient and persistent. To create a transient cgroup for a service, start the service with the systemd-run command. To assign a persistent cgroup to a service, edit its unit configuration file.

From systemd’s perspective, a cgroup is bound to a system unit that is configurable with a unit file and manageable with systemd’s command-line utilities.

Transient Control Group

Let’s run the command “sleep 1000” in a service unit named “sleeptest” under a new slice named “myslice”.

sudo systemd-run --unit=sleeptest --slice=myslice sleep 1000

Output:

Running as unit: sleeptest.service; invocation ID: 925eae0915de432981cf628debbb3b48

Now, the name sleeptest.service can be used to monitor or to modify the cgroup with systemctl commands.
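Resource limits can also be attached at launch time with the -p (--property) option of systemd-run. The following is a minimal sketch: the wrapper name run_limited, the DRY_RUN switch, and the 200M value are made up for illustration, and MemoryMax= is the cgroup v2 memory limit property. With DRY_RUN set, the command is only printed rather than executed.

```shell
#!/bin/sh
# Launch a command as a transient service in myslice with a memory cap.
# $1 = unit name, $2 = MemoryMax value, remaining arguments = command.
# (run_limited and DRY_RUN are hypothetical names for this sketch.)
run_limited() {
    unit="$1"
    mem="$2"
    shift 2
    # When DRY_RUN is set, "echo" is prepended so nothing is started.
    ${DRY_RUN:+echo} systemd-run --unit="$unit" --slice=myslice \
        -p MemoryMax="$mem" "$@"
}

# Print the command that would be run.
DRY_RUN=1 run_limited sleeptest 200M sleep 1000

# A real launch needs root privileges, for example from a root shell:
#   run_limited sleeptest 200M sleep 1000
```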

ps -ef | grep sleep

Output:

root        2223       1  0 19:08 ?        00:00:00 /usr/bin/sleep 1000

Let’s check the cgroup tree to find our new slice and process in the hierarchy by running “sudo systemd-cgls”.

Output:

...
├─myslice.slice
│ └─sleeptest.service
│   └─2223 /usr/bin/sleep 1000
...

The systemd service that we launched maps to the cgroup hierarchy at the following default location.

ls -ltr /sys/fs/cgroup/myslice.slice/
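Inside that directory, each unit has its own subdirectory, and the kernel lists the member PIDs in a file named cgroup.procs. The sketch below assumes the cgroup v2 layout, with the service directory sitting directly under its slice; the helper name unit_pids and its optional root argument are made up for this example.

```shell
#!/bin/sh
# Print the PIDs that belong to a unit by reading its cgroup.procs file.
# $1 = slice, $2 = unit, $3 = cgroup root (optional, default /sys/fs/cgroup).
# (unit_pids is a hypothetical helper name.)
unit_pids() {
    root="${3:-/sys/fs/cgroup}"
    cat "$root/$1/$2/cgroup.procs"
}

# Query the transient unit from this article only if it currently exists.
if [ -r /sys/fs/cgroup/myslice.slice/sleeptest.service/cgroup.procs ]; then
    unit_pids myslice.slice sleeptest.service
fi
```

While the sleeptest unit is running, this should list the PID of the sleep process seen in the ps output earlier.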

Persistent Control Group

Each persistent unit supervised by systemd has a unit configuration file in the /usr/lib/systemd/system/ directory. Let’s install the httpd service on our system as shown below; the package ships a systemd unit file to manage the process.

sudo dnf install httpd

Here is the service unit file that is created to manage the process.

File: /usr/lib/systemd/system/httpd.service

The systemctl set-property command can then be used to add or modify cgroup-related parameters for this service.

sudo systemctl set-property httpd.service CPUShares=600 MemoryLimit=500M
sudo systemctl daemon-reload

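Now our httpd service is allocated the CPU and memory resources defined in the above command. systemctl set-property applies the change immediately and also saves it so that it persists across reboots; add the --runtime option to make the change temporary. Note that CPUShares= and MemoryLimit= are the older cgroup v1 property names; the systemd documentation lists CPUWeight= and MemoryMax= as their replacements on the unified cgroup v2 hierarchy.

For more detailed information, refer to “Systemd resource control”.

Hope you enjoyed reading this article. Thank you.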
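The other route mentioned earlier, editing the unit configuration, is usually done with a drop-in file rather than by changing the packaged unit file. The following is a minimal sketch: the helper name write_limits_dropin and the file name 50-limits.conf are made up for this example, and the values are illustrative rather than a conversion of the CPUShares= figure above.

```shell
#!/bin/sh
# Write a drop-in file that sets resource limits for a unit.
# $1 = unit name, $2 = base directory (optional, default /etc/systemd/system).
# (write_limits_dropin and 50-limits.conf are hypothetical names.)
write_limits_dropin() {
    dropin_dir="${2:-/etc/systemd/system}/$1.d"
    mkdir -p "$dropin_dir"
    # CPUWeight= and MemoryMax= are the cgroup v2 property names.
    cat > "$dropin_dir/50-limits.conf" <<'EOF'
[Service]
CPUWeight=600
MemoryMax=500M
EOF
}

# Applying it for real needs root, so the calls are left commented out:
#   write_limits_dropin httpd.service
#   systemctl daemon-reload
#   systemctl restart httpd.service
```

Unlike set-property, a hand-written drop-in is only picked up after systemctl daemon-reload.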