Inter-Process Communication in Linux:
1. Goals:
The Linux IPC mechanisms are provided so that concurrently executing processes have a means to share resources, synchronize, and exchange data with one another. Linux implements all forms of IPC between processes executing on the same system through shared resources, kernel data structures, and wait queues.
Linux provides the following forms of IPC:
Signals: perhaps the oldest form of Unix IPC, signals are asynchronous messages sent to a process.
Wait queues: provides a mechanism to put processes to sleep while they are waiting for an operation to complete.
File locks: provides a mechanism to allow processes to declare either regions of a file, or the entire file itself, as read-only to all processes except the one which holds the file lock.
Pipes and Named Pipes: allows connection-oriented, uni-directional data transfer between two processes, either by explicitly setting up the pipe connection or by communicating through a named pipe residing in the file-system.
System V IPC:
Semaphores: an implementation of a classical semaphore model. The model also allows for the creation of arrays of semaphores.
Message queues: a connectionless data-transfer model. A message is a sequence of bytes, with an associated type. Messages are written to message queues, and messages can be obtained by reading from the message queue, possibly restricting which messages are read in by type.
Shared memory: a mechanism by which several processes have access to the same region of physical memory.
Unix Domain sockets: another connection-oriented data-transfer mechanism that provides the same communication model as the INET sockets, discussed in the next section.
2. External Interface:
A signal is a notification sent to a process by the kernel or by another process. Signals are sent with the send_sig() function; the signal number and the destination process are provided as parameters. Processes may register to handle signals by using the signal() function.
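As a minimal sketch of the user-space side of this interface, the example below registers a handler with signal() and then delivers SIGUSR1 to the calling process with kill(); send_sig() itself is internal to the kernel and is not called directly from user programs. The handler name on_usr1 is chosen only for illustration.

```c
#include <signal.h>
#include <unistd.h>

static void on_usr1(int signo)
{
    /* Only async-signal-safe calls belong here; write() qualifies. */
    const char msg[] = "caught SIGUSR1\n";
    (void)signo;
    write(STDOUT_FILENO, msg, sizeof(msg) - 1);
}

int main(void)
{
    signal(SIGUSR1, on_usr1);     /* register a handler for SIGUSR1 */
    kill(getpid(), SIGUSR1);      /* send SIGUSR1 to this process */
    return 0;                     /* the pending signal is delivered before
                                     the process resumes in user mode */
}
```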
File locks are supported directly by the Linux file system. To lock an entire file, either the open() system call or the sys_fcntl() system call can be used; locking regions within a file is done through the sys_fcntl() system call.
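The sketch below shows the region-locking path as seen from user space: an fcntl() call with F_SETLKW blocks until an exclusive lock on the first 100 bytes is granted (sys_fcntl() is the kernel entry point behind this call). The file name data.txt and the region size are illustrative only.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("data.txt", O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct flock fl = {
        .l_type   = F_WRLCK,    /* exclusive (write) lock */
        .l_whence = SEEK_SET,
        .l_start  = 0,          /* lock the first 100 bytes ... */
        .l_len    = 100,        /* ... of the file */
    };

    if (fcntl(fd, F_SETLKW, &fl) < 0) {   /* block until the lock is granted */
        perror("fcntl");
        return 1;
    }

    /* ... exclusive access to the locked region ... */

    fl.l_type = F_UNLCK;                  /* release the lock */
    fcntl(fd, F_SETLK, &fl);
    close(fd);
    return 0;
}
```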
Pipes are created by using the pipe() system call. The file system's read() and write() calls are then used to transfer data on the pipe, and named pipes are opened using the open() system call. The System V IPC mechanisms have a common interface, the ipc() system call; the various IPC operations are specified using parameters to the system call.
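Returning to pipes, the following minimal sketch shows the pipe()/read()/write() flow described above: pipe() creates the channel, a forked child writes into it, and the parent reads from it.

```c
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int fd[2];
    if (pipe(fd) < 0) {              /* fd[0]: read end, fd[1]: write end */
        perror("pipe");
        return 1;
    }

    if (fork() == 0) {               /* child: write into the pipe */
        close(fd[0]);
        const char msg[] = "hello through the pipe";
        write(fd[1], msg, sizeof(msg));
        close(fd[1]);
        _exit(0);
    }

    close(fd[1]);                    /* parent: read from the pipe */
    char buf[64];
    ssize_t n = read(fd[0], buf, sizeof(buf) - 1);
    if (n > 0) {
        buf[n] = '\0';
        printf("received: %s\n", buf);
    }
    close(fd[0]);
    wait(NULL);
    return 0;
}
```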
The Unix domain socket functionality is also encapsulated by a single system call, socketcall(). Each of the system calls mentioned above is well documented, and the reader is encouraged to consult the corresponding man pages.
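The sketch below illustrates the Unix domain socket model using socketpair(), which returns two connected AF_UNIX endpoints without needing a name in the file system; on Linux, these library calls are multiplexed through socketcall(). The message text is only an example.

```c
#include <stdio.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        perror("socketpair");
        return 1;
    }

    if (fork() == 0) {                     /* child uses sv[1] */
        close(sv[0]);
        const char msg[] = "hello over AF_UNIX";
        write(sv[1], msg, sizeof(msg));
        close(sv[1]);
        _exit(0);
    }

    close(sv[1]);                          /* parent uses sv[0] */
    char buf[64];
    ssize_t n = read(sv[0], buf, sizeof(buf) - 1);
    if (n > 0) {
        buf[n] = '\0';
        printf("received: %s\n", buf);
    }
    close(sv[0]);
    wait(NULL);
    return 0;
}
```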
The IPC subsystem exposes wait-queue operations to other kernel subsystems. Since wait queues are not used by user processes, they have no system-call interface. Wait queues are used in implementing semaphores, pipes, and bottom-half handlers. The procedure add_wait_queue() inserts a task into a wait queue, and remove_wait_queue() removes a task from the wait queue.
3. Subsystem Description:
Signals are used to notify a process of an event. A signal has the effect of altering the state of the recipient process, depending on the semantics of the particular signal. The kernel can send signals to any executing process; a user process may only send a signal to a process or process group if it possesses the associated UID or GID. Signals are not handled immediately for dormant processes: before the scheduler sets a process running in user mode again, it checks whether a signal has been sent to the process. If so, the scheduler calls the do_signal() function, which handles the signal appropriately.
Wait queues are simply linked lists of pointers to task structures that correspond to processes that are waiting for a kernel event, such as the conclusion of a DMA transfer. A process can enter itself on a wait queue by calling either the sleep_on() or interruptible_sleep_on() function. The functions wake_up() and wake_up_interruptible() remove the process from the wait queue. Interrupt routines also use wait queues to avoid race conditions.
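The in-kernel fragment below is a hedged sketch of this pattern for a hypothetical DMA completion. The identifiers dma_wait, wait_for_dma_completion(), and dma_interrupt_handler() are invented for illustration, and the header locations, wait-queue types, and function signatures vary between kernel versions; only the sleep/wake calls named above are taken from the text.

```c
/* In-kernel sketch only; not a standalone user program. Header names and
 * the struct wait_queue type follow the classic (pre-2.2) interface and
 * differ in later kernels. */
#include <linux/sched.h>
#include <linux/wait.h>

static struct wait_queue *dma_wait = NULL;   /* head of the wait queue */

void wait_for_dma_completion(void)
{
    /* Put the current task to sleep until the interrupt handler wakes it;
     * interruptible_sleep_on() lets a signal break the wait. */
    interruptible_sleep_on(&dma_wait);
}

void dma_interrupt_handler(void)
{
    /* Wake every task sleeping on the queue once the transfer is done. */
    wake_up_interruptible(&dma_wait);
}
```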
Linux allows a user process to prevent other processes from accessing a file. This exclusion can cover either a whole file or a region of a file, and is implemented using file locks. The file-system implementation contains the data fields needed for the kernel to determine whether a lock has been placed on a file or on a region inside a file. An attempt to lock an already locked file, or an already locked region, fails; in either case the requesting process is not permitted to access the file, since the lock has not been granted by the kernel.
Pipes and named pipes have similar implementations, as their functionality is almost the same; only the way they are created differs. In either case a file descriptor is returned which refers to the pipe. Upon creation, one page of memory is associated with the opened pipe. This memory is treated as a circular buffer, to which write operations are done atomically. When the buffer is full, writing processes block; if a read request asks for more data than is available, reading processes block. Each pipe has a wait queue associated with it, and processes are added to and removed from the queue during reads and writes.
Semaphores are implemented with wait queues and follow the classical semaphore model. Each semaphore has an associated value. Two operations, up() and down(), are implemented on the semaphore. When the value of the semaphore is zero, a process performing a decrement on the semaphore is blocked on the wait queue. Semaphore arrays are simply contiguous sets of semaphores. Each process also maintains a list of semaphore operations it has performed, so that these operations can be undone if the process exits prematurely.
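From user space, this model is reached through the System V semaphore calls, which Linux routes through the ipc() system call. The hedged sketch below creates a single private semaphore and performs an up followed by a down; the SEM_UNDO flag records each operation in the per-process undo list described above. IPC_PRIVATE keeps the example self-contained, whereas a real application would normally use a shared key.

```c
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/sem.h>

int main(void)
{
    /* Create a set containing a single semaphore, initial value 0. */
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid < 0) {
        perror("semget");
        return 1;
    }

    struct sembuf up   = { .sem_num = 0, .sem_op = +1, .sem_flg = SEM_UNDO };
    struct sembuf down = { .sem_num = 0, .sem_op = -1, .sem_flg = SEM_UNDO };

    semop(semid, &up, 1);      /* V: increment, never blocks */
    semop(semid, &down, 1);    /* P: decrement, would block at zero */

    semctl(semid, 0, IPC_RMID);    /* remove the semaphore set */
    return 0;
}
```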
The message queue is a linear linked list to which processes write, and from which processes read, sequences of bytes. Messages are received in the same order that they are written. Two wait queues are associated with each message queue: one for processes that are writing to a full message queue, and another for serializing the message writes. The maximum size of the message queue is set when the queue is created.
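A minimal user-space sketch of this model follows: messages carry a type field, and msgrcv() can restrict receipt to a given type, as described above. The structure name textmsg, the message text, and the permissions are illustrative only.

```c
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>

struct textmsg {
    long mtype;         /* message type, must be > 0 */
    char mtext[64];     /* message body */
};

int main(void)
{
    int qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (qid < 0) {
        perror("msgget");
        return 1;
    }

    struct textmsg out = { .mtype = 1 };
    strcpy(out.mtext, "hello via message queue");
    msgsnd(qid, &out, sizeof(out.mtext), 0);

    struct textmsg in;
    if (msgrcv(qid, &in, sizeof(in.mtext), 1, 0) >= 0)   /* only type 1 */
        printf("received: %s\n", in.mtext);

    msgctl(qid, IPC_RMID, NULL);    /* remove the queue */
    return 0;
}
```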
Shared memory is the fastest form of IPC. This mechanism allows processes to share a region of their memory. Creation of shared memory areas is handled by the memory management system. Shared pages are attached to a user process's virtual memory space by the sys_shmat() system call, and a shared page can be removed from the user segment of a process by calling sys_shmdt().
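The sketch below shows the user-space calls behind sys_shmat() and sys_shmdt(): a private one-page segment is created, attached, written, detached, and removed. The segment size and permissions are illustrative.

```c
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main(void)
{
    /* Create a 4096-byte segment (one page on most architectures). */
    int shmid = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (shmid < 0) {
        perror("shmget");
        return 1;
    }

    char *mem = shmat(shmid, NULL, 0);    /* attach to our address space */
    if (mem == (char *)-1) {
        perror("shmat");
        return 1;
    }

    strcpy(mem, "visible to every process attached to this segment");
    printf("%s\n", mem);

    shmdt(mem);                           /* detach from our address space */
    shmctl(shmid, IPC_RMID, NULL);        /* mark the segment for removal */
    return 0;
}
```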
The Unix domain sockets are implemented in a similar fashion to pipes, in the sense that both are based on a circular buffer allocated from a page of memory. However, sockets provide a separate buffer for each communication direction.
4. Data Structures:
In this section, the important data structures needed to implement the above IPC mechanisms are described.
Signals are implemented through the signal field in the task_struct structure. Each signal is represented by a bit in this field. Thus, the number of signals a version of Linux can support is limited to the number of bits in a word. The field blocked holds the signals that are being blocked by a process.
There is only one data structure associated with wait queues, the wait_queue structure. These structures contain a pointer to the associated task_struct, and are linked into a list.
File locks have an associated file_lock structure. This structure contains a pointer to a task_struct for the owning process, the file descriptor of the locked file, a wait queue for processes which are waiting for the cancellation of the file lock, and which region of the file is locked. The file_lock structures are linked into a list for each open file.
Pipes, both nameless and named, are represented by a file system inode. This inode stores extra pipe-specific information in the pipe_inode_info structure. This structure contains a wait queue for processes which are blocking on a read or write, a pointer to the page of memory used as the circular buffer for the pipe, the amount of data in the pipe, and the number of processes which are currently reading and writing from/to the pipe.
All System V IPC objects are created in the kernel, and each has associated access permissions. These access permissions are held in the ipc_perm structure. Semaphores are represented with the sem structure, which holds the value of the semaphore and the pid of the process that performed the last operation on the semaphore. Semaphore arrays are represented by the semid_ds structure, which holds the access permissions, the time of the last semaphore operation, a pointer to the first semaphore in the array, and queues on which processes block when performing semaphore operations. The structure sem_undo is used to create a list of semaphore operations performed by a process, so that they can all be undone when the process is killed.
Message queues are based on the msqid_ds structure, which holds management and control information. This structure stores the following fields:
- Access permissions
- Link fields to implement the message queue (i.e. pointers to msqid_ds)
- Times for the last send, receipt and change
- Queues on which processes block, as described in the previous section
- The current number of bytes in the queue
- The number of messages
- The size of the queue (in bytes)
- The process number of the last sender
- The process number of the last receiver.
A message itself is stored in the kernel with a msg structure. This structure holds a link field, to implement a linked list of messages, the type of the message, the address of the message data, and the length of the message.
The shared memory implementation is based on the shmid_ds structure, which, like the msqid_ds structure, holds management and control information. The structure contains access control permissions, the last attach, detach and change times, the pids of the creator and of the last process to perform an operation on the shared segment, the number of processes to which the shared memory region is attached, the number of pages which make up the shared memory region, and a field for page table entries.
The Unix domain sockets are based on the socket data structure, described in the Network Interface section.
5. Subsystem Structure:
Control flows from the system call layer down into each module. The System V IPC facilities are implemented in the ipc directory of the kernel source. The kernel IPC module refers to IPC facilities implemented within the kernel directory. Similar conventions hold for the File and Net IPC facilities.
The System V IPC module is dependent on the kernel IPC mechanisms; in particular, semaphores are implemented with wait queues. All other IPC facilities are implemented independently of each other.
6. Subsystem Dependencies:
The IPC subsystem depends on the file system for sockets: sockets use file descriptors, and once they are opened, they are assigned to an inode. Memory management depends on IPC, as the page-swapping routine calls the IPC subsystem to swap out shared memory. IPC, in turn, depends on memory management primarily for the allocation of buffers and the implementation of shared memory.
Some IPC mechanisms use timers, which are implemented in the process scheduler subsystem, and process scheduling relies on signals. For these two reasons, the IPC and Process Scheduler modules depend on each other.