A system call is just what its name implies -- a request for the
operating system to do something on behalf of the user's program. The
system calls are functions used in the kernel itself. To the
programmer, the system call appears as a normal C function call.
However, since a system call executes code in the kernel, there must be a
mechanism to change the mode of a process from user mode to kernel mode.
The C compiler uses a predefined library of functions (the C library)
that have the names of the system calls. The library functions
typically invoke an instruction that changes the process execution mode
to kernel mode and causes the kernel to start executing code for system
calls. The instruction that causes the mode change is often referred to
as an "operating system trap" which is a software generated interrupt.
The library routines execute in user mode, but the system call interface
is a special case of an interrupt handler. The library functions pass
the kernel a unique number per system call in a machine dependent way --
either as a parameter to the operating system trap, in a particular
register, or on the stack -- and the kernel thus determines the specific
system call the user is invoking. In handling the operating system
trap, the kernel looks up the system call number in a table to find the
address of the appropriate kernel routine that is the entry point for
the system call and to find the number of parameters the system call
expects. The kernel calculates the (user) address of the first
parameter to the system call by adding (or subtracting, depending on the
direction of stack growth) an offset to the user stack pointer,
corresponding to the number of the parameters to the system call.
Finally, it copies the user parameters to the "u area" and calls the
appropriate system call routine. After executing the code for the
system call, the kernel determines whether there was an error. If so,
it adjusts register locations in the saved user register context,
typically setting the "carry" bit for the PS (processor status) register
and copying the error number into the register 0 location. If there were no
errors in the execution of the system call, the kernel clears the
"carry" bit in the PS register and copies the appropriate return values
from the system call into the locations for registers 0 and 1 in the
saved user register context. When the kernel returns from the operating
system trap to user mode, it returns to the library instruction after
the trap instruction. The library interprets the return values from the
kernel and returns a value to the user program.
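The table lookup and the error/return convention described above can be modeled in ordinary user-space C. The following is only a toy sketch, not kernel code: the names trap_frame, sysent, sysent_table, dispatch, sys_add, sys_fail, and DEMO_ERROR are all hypothetical, and the "user stack" is simply an array of longs.

```c
#include <stddef.h>

#define MAX_ARGS   4
#define DEMO_ERROR 22   /* arbitrary error number used only by this sketch */

/* Stand-in for the saved user register context. */
struct trap_frame {
    long r0;      /* error number on failure, first return value on success */
    long r1;      /* second return value on success */
    int  carry;   /* 1 = the call failed, 0 = it succeeded */
};

/* One table entry per system call: parameter count and entry point. */
struct sysent {
    int nargs;
    int (*handler)(const long *args, long *ret);   /* 0 or an error number */
};

static int sys_add(const long *args, long *ret)    /* toy call: adds two values */
{
    ret[0] = args[0] + args[1];
    return 0;
}

static int sys_fail(const long *args, long *ret)   /* toy call: always fails */
{
    (void)args;
    (void)ret;
    return DEMO_ERROR;
}

static const struct sysent sysent_table[] = {
    { 2, sys_add  },   /* call number 0 */
    { 0, sys_fail },   /* call number 1 */
};

/* Model of trap handling: look up the number, copy the parameters,
   call the routine, then record the result in the register context. */
void dispatch(int number, const long *user_stack, struct trap_frame *tf)
{
    size_t count = sizeof sysent_table / sizeof sysent_table[0];
    long args[MAX_ARGS] = { 0 };   /* plays the role of the "u area" copy */
    long ret[2] = { 0, 0 };
    int i, err;

    if (number < 0 || (size_t)number >= count) {
        tf->carry = 1;             /* unknown call number */
        tf->r0 = DEMO_ERROR;
        return;
    }
    for (i = 0; i < sysent_table[number].nargs; ++i)
        args[i] = user_stack[i];
    err = sysent_table[number].handler(args, ret);
    if (err != 0) {
        tf->carry = 1;             /* error: r0 carries the error number */
        tf->r0 = err;
    } else {
        tf->carry = 0;             /* success: r0 and r1 carry return values */
        tf->r0 = ret[0];
        tf->r1 = ret[1];
    }
}
```

A library routine in this model would call dispatch() with its call number, then inspect carry to decide whether to return r0 to the caller or to store r0 in "errno" and return -1.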
System calls are used to manage the file system, to control processes,
and to provide interprocess communication. The UNIX system interface
consists of about 80 system calls (as UNIX evolves this number will
increase). The following table lists about 40 of the more important
system calls:
General Class       Specific Class                 System Call
----------------------------------------------------------------------
File Structure      Creating a Channel             creat()
Related Calls                                      open()
                                                   close()
                    Input/Output                   read()
                                                   write()
                    Random Access                  lseek()
                    Channel Duplication            dup()
                    Aliasing and Removing Files    link()
                                                   unlink()
                    File Status                    stat()
                                                   fstat()
                    Access Control                 access()
                                                   chmod()
                                                   chown()
                                                   umask()
                    Device Control                 ioctl()
----------------------------------------------------------------------
Process Related     Process Creation and           exec()
Calls               Termination                    fork()
                                                   wait()
                                                   exit()
                    Process Owner and Group        getuid()
                                                   geteuid()
                                                   getgid()
                                                   getegid()
                    Process Identity               getpid()
                                                   getppid()
                    Process Control                signal()
                                                   kill()
                                                   alarm()
                    Change Working Directory       chdir()
----------------------------------------------------------------------
Interprocess        Pipelines                      pipe()
Communication       Messages                       msgget()
                                                   msgsnd()
                                                   msgrcv()
                                                   msgctl()
                    Semaphores                     semget()
                                                   semop()
                    Shared Memory                  shmget()
                                                   shmat()
                                                   shmdt()
----------------------------------------------------------------------
[NOTE: The system call interface is that aspect of UNIX that has
changed the most since the inception of the UNIX system. Therefore,
when you write a software tool, you should protect that tool by putting
system calls in other subroutines within your program and then calling
only those subroutines. Should the next version of the UNIX system
change the syntax and semantics of the system calls you've used, you
need only change your interface routines.]
When a system call discovers an error, it returns -1 and stores the
reason the call failed in an external variable named "errno". The
"/usr/include/errno.h" file maps these error numbers to manifest
constants, and it is these constants that you should use in your programs.
When a system call returns successfully, it returns something other than
-1, but it does not clear "errno". "errno" only has meaning directly
after a system call that returns an error.
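Because "errno" is only meaningful right after a failing call, a common habit is to test the return value first and save "errno" at once. The sketch below illustrates this; the function name open_readonly and its err parameter are hypothetical, chosen only for the example.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Open path for reading.  Returns the file descriptor, or -1 with *err
   set to the errno value captured immediately after the failing call. */
int open_readonly(const char *path, int *err)
{
    int fd = open(path, O_RDONLY);

    if (fd == -1) {
        *err = errno;   /* save at once: a later call may overwrite errno */
        return -1;
    }
    *err = 0;           /* this function's own convention; a successful
                           open() does not clear errno for us */
    return fd;
}
```

The caller compares the saved value against the manifest constants (ENOENT, EACCES, and so on) rather than against raw numbers.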
When you use system calls in your programs, you should check the value
returned by those system calls. Furthermore, when a system call
discovers an error, you should use the "perror()" subroutine to print a
diagnostic message on the standard error file that describes why the
system call failed. The syntax for "perror()" is:
void perror(string)
char *string;
"perror()" displays the argument string, a colon, and then the error
message, as directed by "errno", followed by a newline. The output of
"perror()" is displayed on "standard error". Typically, the argument
given to "perror()" is the name of the program that incurred the error,
argv[0]. However, when using subroutines and system calls on files, the
related file name might be passed to "perror()".
There are occasions where you the programmer might wish to maintain more
control over the printing of error messages than "perror()" provides --
such as with a formatted screen where the newline printed by "perror()"
would destroy the formatting. In this case, you can directly access the
same system external (global) variables that "perror()" uses. They are:
extern int errno;
extern char *sys_errlist[];
extern int sys_nerr;
"errno" has been described above. "sys_errlist" is an array (table) of
pointers to the error message strings. Each message string is null
terminated and does not contain a newline. "sys_nerr" is the number of
messages in the error message table, so the largest value of "errno"
that has a message is sys_nerr - 1. "errno" is used as the index into
the table of error messages.
Following are two sample programs that display all of the system error
messages on standard error.
/* errmsg1.c
   print all system error messages using "perror()"
*/
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>      /* declares errno */

int main()
{
    int i;
    extern int sys_nerr;    /* may be missing from newer C libraries */

    for (i = 0; i < sys_nerr; ++i)
    {
        fprintf(stderr, "%3d", i);
        errno = i;
        perror(" ");
    }
    exit (0);
}
/* errmsg2.c
   print all system error messages using the global error message table.
*/
#include <stdio.h>
#include <stdlib.h>

int main()
{
    int i;
    extern int sys_nerr;          /* may be missing from newer C libraries */
    extern char *sys_errlist[];

    fprintf(stderr, "Here are the current %d error messages:\n\n", sys_nerr);
    for (i = 0; i < sys_nerr; ++i)
        fprintf(stderr, "%3d: %s\n", i, sys_errlist[i]);
    exit (0);
}
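Some newer C libraries no longer export "sys_errlist" and "sys_nerr". The standard strerror() function returns the same kind of message string and can be used when you need to control the formatting yourself. The following is a minimal sketch; the helper name format_error is hypothetical.

```c
#include <errno.h>    /* errno and the error constants, for callers */
#include <stdio.h>
#include <string.h>

/* Build "prefix: message" in buf with no trailing newline, so the caller
   decides where and how the text is displayed.  Returns the number of
   characters snprintf() would have written had buf been large enough. */
int format_error(char *buf, size_t size, const char *prefix, int errnum)
{
    return snprintf(buf, size, "%s: %s", prefix, strerror(errnum));
}
```

A typical call would be format_error(line, sizeof line, argv[0], errno) immediately after a failed system call.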
File Structure Related System Calls
The file structure related system calls available in the UNIX system let
you create, open, and close files, read and write files, randomly access
files, alias and remove files, get information about files, check the
accessibility of files, change protections, owner, and group of files,
and control devices. These operations either use a character string
that defines the absolute or relative path name of a file, or a small
integer called a file descriptor that identifies the I/O channel. A
channel is a connection between a process and a file that appears to the
process as an unformatted stream of bytes. The kernel presents and
accepts data from the channel as a process reads and writes that
channel. To a process then, all input and output operations are
synchronous and unbuffered.
When doing I/O, a process specifies the file descriptor for an I/O
channel, a buffer to be filled or emptied, and the maximum size of data
to be transferred. An I/O channel may allow input, output, or both.
Furthermore, each channel has a read/write pointer. Each I/O operation
starts where the last operation finished and advances the pointer by the
number of bytes transferred. A process can access a channel's data
randomly by changing the read/write pointer.
All input and output operations start by opening a file using either the
"creat()" or "open()" system calls. These calls return a file
descriptor that identifies the I/O channel. Recall that file
descriptors 0, 1, and 2 refer to standard input, standard output, and
standard error files respectively, and that file descriptor 0 is a
channel to your terminal's keyboard and file descriptors 1 and 2 are
channels to your terminal's display screen.
creat()
The prototype for the creat() system call is:
int creat(file_name, mode)
char *file_name;
int mode;
where file_name is a pointer to a null terminated character string that
names the file and mode defines the file's access permissions. The mode
is usually specified as an octal number such as 0666, which means
read/write permission for owner, group, and others. The mode may also
be given using the manifest constants defined in the
"/usr/include/sys/stat.h" file. If the file named by file_name does not
exist, the UNIX system creates it with the specified mode permissions.
However, if the file does exist, its contents are discarded, the mode
value is ignored, and the permissions of the existing file are retained.
Following is an example of how to use creat():
/* creat.c */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>       /* declares creat() */
#include <sys/types.h>   /* defines types used by sys/stat.h */
#include <sys/stat.h>    /* defines S_IREAD & S_IWRITE */

int main()
{
    int fd;

    fd = creat("datafile.dat", S_IREAD | S_IWRITE);
    if (fd == -1)
    {
        perror("datafile.dat");
        exit (1);            /* nothing was opened, so nothing to close */
    }
    printf("datafile.dat opened for read/write access\n");
    printf("datafile.dat is currently empty\n");
    close(fd);
    exit (0);
}
The following is a sample of the manifest constants for the mode
argument as defined in /usr/include/sys/stat.h:
#define S_IRWXU 0000700 /* -rwx------ */
#define S_IREAD 0000400 /* read permission, owner */
#define S_IRUSR S_IREAD
#define S_IWRITE 0000200 /* write permission, owner */
#define S_IWUSR S_IWRITE
#define S_IEXEC 0000100 /* execute/search permission, owner */
#define S_IXUSR S_IEXEC
#define S_IRWXG 0000070 /* ----rwx--- */
#define S_IRGRP 0000040 /* read permission, group */
#define S_IWGRP 0000020 /* write " " */
#define S_IXGRP 0000010 /* execute/search " " */
#define S_IRWXO 0000007 /* -------rwx */
#define S_IROTH 0000004 /* read permission, other */
#define S_IWOTH 0000002 /* write " " */
#define S_IXOTH 0000001 /* execute/search " " */
Multiple mode values may be combined by or'ing (using the | operator)
the values together as demonstrated in the above sample program.
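The creat() behavior described above (create the file if it is missing, otherwise discard its contents) is what POSIX defines as open() with the flags O_WRONLY | O_CREAT | O_TRUNC. The sketch below shows that equivalence and a way to observe the truncation; the helper names create_file and size_of are hypothetical.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create (or truncate) path for writing, as creat(path, mode) would. */
int create_file(const char *path, mode_t mode)
{
    return open(path, O_WRONLY | O_CREAT | O_TRUNC, mode);
}

/* Size of the named file in bytes, or -1 if stat() fails. */
long size_of(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) == -1)
        return -1L;
    return (long) sb.st_size;
}
```

Calling create_file() a second time on a file that already holds data leaves it with a size of zero, which is why creat() should not be used on a file whose contents you want to keep.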
open()
Next is the open() system call. open() lets you open a file for
reading, writing, or reading and writing.
The prototype for the open() system call is:
#include <fcntl.h>
int open(file_name, option_flags [, mode])
char *file_name;
int option_flags, mode;
where file_name is a pointer to the character string that names the
file, option_flags represent the type of channel, and mode defines the
file's access permissions if the file is being created.
The allowable option_flags as defined in "/usr/include/fcntl.h" are:
#define O_RDONLY 0      /* Open the file for reading only */
#define O_WRONLY 1      /* Open the file for writing only */
#define O_RDWR   2      /* Open the file for both reading and writing */
#define O_NDELAY 04     /* Non-blocking I/O */
#define O_APPEND 010    /* append (writes guaranteed at the end) */
#define O_CREAT  00400  /* open with file create (uses third open arg) */
#define O_TRUNC  01000  /* open with truncation */
#define O_EXCL   02000  /* exclusive open */
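Multiple values are combined using the | operator (i.e. bitwise OR).
Note: O_RDONLY, O_WRONLY, and O_RDWR are mutually exclusive, and exactly
one of them should be given. They are values of a single access-mode
field rather than independent bits, so O_RDONLY | O_WRONLY does not mean
"read and write" (with the values above it is simply O_WRONLY); use
O_RDWR instead. If the O_CREAT flag is used, then a mode argument is
required. The mode argument may be specified in the same manner as in
the creat() system call.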
/* open.c */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>       /* defines options flags */
#include <sys/types.h>   /* defines types used by sys/stat.h */
#include <sys/stat.h>    /* defines S_IREAD & S_IWRITE */

static char message[] = "Hello, world";

int main()
{
    int fd;
    char buffer[80];

    /* open datafile.dat for read/write access   (O_RDWR)
       create datafile.dat if it does not exist  (O_CREAT)
       return error if datafile already exists   (O_EXCL)
       permit read/write access to file  (S_IWRITE | S_IREAD)
    */
    fd = open("datafile.dat", O_RDWR | O_CREAT | O_EXCL, S_IREAD | S_IWRITE);
    if (fd != -1)
    {
        printf("datafile.dat opened for read/write access\n");
        write(fd, message, sizeof(message));
        lseek(fd, 0L, 0);    /* go back to the beginning of the file */
        if (read(fd, buffer, sizeof(message)) == sizeof(message))
            printf("\"%s\" was written to datafile.dat\n", buffer);
        else
            printf("*** error reading datafile.dat ***\n");
        close (fd);
    }
    else
        perror("datafile.dat");    /* most often: the file already exists */
    exit (0);
}
close()
To close a channel, use the close() system call. The prototype for the
close() system call is:
int close(file_descriptor)
int file_descriptor;
where file_descriptor identifies a currently open channel. close()
fails if file_descriptor does not identify a currently open channel.
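Since close() can fail, its return value deserves the same check as any other system call. The following sketch wraps the check; the name close_checked and its label parameter are hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Close fd.  On failure, print a diagnostic prefixed with label and
   return -1 with errno preserved; on success return 0. */
int close_checked(int fd, const char *label)
{
    if (close(fd) == -1) {
        int saved = errno;   /* keep the reason across the perror() call */

        perror(label);
        errno = saved;
        return -1;
    }
    return 0;
}
```

Closing the same descriptor twice is the easiest way to see the failure: the second call returns -1 with "errno" set to EBADF.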
read() write()
The read() system call does all input and the write() system call does
all output. When used together, they provide all the tools necessary to
do input and output sequentially. When used with the lseek() system
call, they provide all the tools necessary to do input and output
randomly.
Both read() and write() take three arguments. Their prototypes are:
int read(file_descriptor, buffer_pointer, transfer_size)
int file_descriptor;
char *buffer_pointer;
unsigned transfer_size;
int write(file_descriptor, buffer_pointer, transfer_size)
int file_descriptor;
char *buffer_pointer;
unsigned transfer_size;
where file_descriptor identifies the I/O channel, buffer_pointer points
to the area in memory where the data is stored for a read() or where
the data is taken for a write(), and transfer_size defines the maximum
number of characters transferred between the file and the buffer.
read() and write() return the number of bytes transferred.
There is no limit on transfer_size, but you must make sure it's safe to
copy transfer_size bytes to or from the memory pointed to by
buffer_pointer. A transfer_size of 1 is used to transfer a byte at a
time for so-called "unbuffered" input/output. The most efficient value
for transfer_size is the size of the largest physical record the I/O
channel is likely to have to handle. Therefore, 1K bytes -- the disk
block size -- is the most efficient general-purpose buffer size for a
standard file. However, if you are writing to a terminal, the transfer
is best handled in lines ending with a newline.
For an example using read() and write(), see the above example of
open().
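A frequent use of read() and write() together is a copy loop. The sketch below assumes the 1K buffer size suggested above and allows for write() transferring fewer bytes than requested; the names copy_fd, write_all, and BUF_SIZE are hypothetical, and retrying after an interrupted call is left out for brevity.

```c
#include <sys/types.h>
#include <unistd.h>

#define BUF_SIZE 1024   /* the general-purpose size suggested in the text */

/* Write all n bytes of buf, repeating write() after a short transfer.
   Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t n)
{
    while (n > 0) {
        ssize_t w = write(fd, buf, n);

        if (w == -1)
            return -1;
        buf += w;
        n -= (size_t) w;
    }
    return 0;
}

/* Copy everything readable from in_fd to out_fd.
   Returns the number of bytes copied, or -1 on error. */
long copy_fd(int in_fd, int out_fd)
{
    char buf[BUF_SIZE];
    long total = 0;
    ssize_t n;

    while ((n = read(in_fd, buf, sizeof buf)) > 0) {   /* 0 means end of file */
        if (write_all(out_fd, buf, (size_t) n) == -1)
            return -1;
        total += n;
    }
    return n == -1 ? -1L : total;
}
```

For example, copy_fd(0, 1) copies standard input to standard output until end of file.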
lseek()
The UNIX system file system treats an ordinary file as a sequence of
bytes. No internal structure is imposed on a file by the operating
system. Generally, a file is read or written sequentially -- that is,
from beginning to the end of the file. Sometimes sequential reading and
writing is not appropriate. It may be inefficient, for instance, to
read an entire file just to move to the end of the file to add
characters. Fortunately, the UNIX system lets you read and write
anywhere in the file. Known as "random access", this capability is made
possible with the lseek() system call. During file I/O, the UNIX system
uses a long integer, also called a File Pointer, to keep track of the
next byte to read or write. This long integer represents the number of
bytes from the beginning of the file to that next character. Random
access I/O is achieved by changing the value of this file pointer using
the lseek() system call.
The prototype for lseek() is:
long lseek(file_descriptor, offset, whence)
int file_descriptor;
long offset;
int whence;
where file_descriptor identifies the I/O channel and offset and whence
work together to describe how to change the file pointer according to
the following table:
whence new position
------------------------------
0 offset bytes into the file
1 current position in the file plus offset
2 current end-of-file position plus offset
If successful, lseek() returns a long integer that defines the new file
pointer value measured in bytes from the beginning of the file. If
unsuccessful, lseek() returns -1 and the file position does not change.
Certain devices are incapable of seeking, namely terminals and the
character interface to a tape drive. lseek() does not change the file
pointer to these devices.
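A program can ask whether a channel supports seeking by attempting a seek that moves nowhere. The sketch below uses a pipe as the non-seekable case, since POSIX requires lseek() on a pipe to fail with ESPIPE; how a terminal responds varies between systems. The function name is_seekable is hypothetical.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if fd supports seeking, 0 if it does not (for example a
   pipe), and -1 if the test failed for another reason. */
int is_seekable(int fd)
{
    /* whence 1 (SEEK_CUR) with offset 0 leaves the file pointer alone */
    if (lseek(fd, (off_t) 0, SEEK_CUR) != (off_t) -1)
        return 1;
    return errno == ESPIPE ? 0 : -1;
}
```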
Following is an example using lseek():
/* lseek.c */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

int main()
{
    int fd;
    long position;

    fd = open("datafile.dat", O_RDONLY);
    if (fd != -1)
    {
        position = lseek(fd, 0L, 2);    /* seek 0 bytes from end-of-file */
        if (position != -1)
            printf("The length of datafile.dat is %ld bytes.\n", position);
        else
            perror("lseek error");
        close(fd);                      /* close only what was opened */
    }
    else
        printf("can't open datafile.dat\n");
    exit (0);
}
Many UNIX systems have defined manifest constants for use as the
"whence" argument of lseek(). The definitions can be found in the
"file.h" and/or "unistd.h" include files. For example, the University
of Maryland's HP-9000 UNIX system has the following definitions:
from file.h we have:
#define L_SET 0 /* absolute offset */
#define L_INCR 1 /* relative to current offset */
#define L_XTND 2 /* relative to end of file */
and from unistd.h we have:
#define SEEK_SET 0 /* Set file pointer to "offset" */
#define SEEK_CUR 1 /* Set file pointer to current plus "offset" */
#define SEEK_END 2 /* Set file pointer to EOF plus "offset" */
The definitions from unistd.h are the most "portable" across UNIX and
MS-DOS C compilers.
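With the unistd.h constants, the three "whence" values can be combined to find a file's size without disturbing the current read/write pointer. This is a sketch under the assumption that the descriptor refers to an ordinary file; the names fd_size and path_size are hypothetical.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Size in bytes of the file open on fd, leaving the file pointer where
   it was.  Returns -1 on error. */
off_t fd_size(int fd)
{
    off_t here = lseek(fd, (off_t) 0, SEEK_CUR);   /* remember position */
    off_t end;

    if (here == (off_t) -1)
        return (off_t) -1;
    end = lseek(fd, (off_t) 0, SEEK_END);          /* jump to end-of-file */
    if (end == (off_t) -1)
        return (off_t) -1;
    if (lseek(fd, here, SEEK_SET) == (off_t) -1)   /* restore position */
        return (off_t) -1;
    return end;
}

/* Convenience wrapper: size of the named file, or -1 on error. */
off_t path_size(const char *path)
{
    off_t size;
    int fd = open(path, O_RDONLY);

    if (fd == -1)
        return (off_t) -1;
    size = fd_size(fd);
    close(fd);
    return size;
}
```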
link()
The UNIX system file structure allows more than one named reference to a
given file, a feature called "aliasing". Making an alias to a file
means that the file has more than one name, but all names of the file
refer to the same data. Since all names refer to the same data,
changing the contents of one file changes the contents of all aliases to
that file. Aliasing a file in the UNIX system amounts to the system
creating a new directory entry that contains the alias file name and
then copying the i-number of an existing file to the i-number position of
this new directory entry. This action is accomplished by the link()
system call. The link() system call links an existing file to a new
file.
The prototype for link() is:
int link(original_name, alias_name)
char *original_name, *alias_name;
where both original_name and alias_name are character strings that name
the existing and new files respectively. link() will fail and no link
will be created if any of the following conditions holds:
a path name component is not a directory.
a path name component does not exist.
a path name component is off-limits.
original_name does not exist.
alias_name does exist.
original_name is a directory and you are not the superuser.
a link is attempted across file systems.
the destination directory for alias_name is not writable.
the destination directory is on a mounted read-only file system.
Following is a short example:
/* link.c
*/
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main()
{
    if ((link("foo.old", "foo.new")) == -1)
    {
        perror(" ");
        exit (1);    /* return a non-zero exit code on error */
    }
    exit(0);
}
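That an alias shares the original's i-number can be checked directly: two names refer to the same file when stat() reports the same device and i-number for both. The sketch below makes that comparison; the names same_file and make_alias are hypothetical, and both paths are assumed to lie on one file system.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if both names refer to the same file (same device and same
   i-number), 0 if they do not, and -1 if either name cannot be examined. */
int same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) == -1 || stat(b, &sb) == -1)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Create an empty file named original and give it a second name.
   Returns 0 on success, -1 on error. */
int make_alias(const char *original, const char *alias)
{
    int fd = open(original, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);

    if (fd == -1)
        return -1;
    close(fd);
    return link(original, alias);
}
```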
unlink()
The opposite of the link() system call is the unlink() system call.
unlink() removes a file by zeroing the i-number part of the file's
directory entry, reducing the link count field in the file's inode by 1,
and releasing the data blocks and the inode if the link count field
becomes zero. unlink() is the only system call for removing a file in
the UNIX system.
The prototype for unlink() is:
int unlink(file_name)
char *file_name;
where file_name names the file to be unlinked. unlink() fails if any of
the following conditions holds:
a path name component is not a directory.
a path name component does not exist.
a path name component is off-limits.
file_name does not exist.
file_name is a directory and you are not the superuser.
the directory for the file named by file_name is not writable.
the directory is contained in a file system mounted read-only.
It is important to understand that a file's contents and its inode are
not discarded until all processes close the unlinked file.
Following is a short example:
/* unlink.c
*/
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main()
{
    if ((unlink("foo.bar")) == -1)
    {
        perror(" ");
        exit (1);    /* return a non-zero exit code on error */
    }
    exit (0);
}
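The point that an unlinked file survives until it is closed is often used to make scratch files that clean up after themselves. A minimal sketch follows; the function name open_scratch is hypothetical.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create path, remove its name at once, and return a descriptor to the
   now nameless file.  The data blocks and inode are released when the
   descriptor is closed.  Returns -1 on error. */
int open_scratch(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);

    if (fd == -1)
        return -1;
    if (unlink(path) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

After open_scratch() returns, the name no longer exists in the directory, yet the descriptor can still be written, repositioned with lseek(), and read.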
File Status
stat() - fstat()
The i-node data structure holds all the information about a file except the
file's name and its contents. Sometimes your programs need to use the
information in the i-node structure to do some job. You can access this
information with the stat() and fstat() system calls. stat() and fstat()
return the information in the i-node for the file named by a string and by a
file descriptor, respectively. The format for the i-node struct returned by
these system calls is defined in /usr/include/sys/stat.h. stat.h uses types
built with the C language typedef construct and defined in the file
/usr/include/sys/types.h, so it too must be included and must be included
before the inclusion of the stat.h file.
The prototypes for stat() and fstat() are:
#include <sys/types.h>
#include <sys/stat.h>
int stat(file_name, stat_buf)
char *file_name;
struct stat *stat_buf;
int fstat(file_descriptor, stat_buf)
int file_descriptor;
struct stat *stat_buf;
where file_name names the file as an ASCII string and file_descriptor names
the I/O channel and therefore the file. Both calls return the file's
specifics in stat_buf. stat() and fstat() fail if any of the following
conditions hold:
a path name component is not a directory (stat() only).
file_name does not exist (stat() only).
a path name component is off-limits (stat() only).
file_descriptor does not identify an open I/O channel (fstat() only).
stat_buf points to an invalid address.
Following is an extract of the stat.h file from the University's HP-9000. It
shows the definition of the stat structure and some manifest constants used
to access the st_mode field of the structure.
/* stat.h */
struct stat
{
dev_t st_dev; /* The device number containing the i-node */
ino_t st_ino; /* The i-number */
unsigned short st_mode; /* The 16 bit mode */
short st_nlink; /* The link count; 0 for pipes */
ushort st_uid; /* The owner user-ID */
ushort st_gid; /* The group-ID */
dev_t st_rdev; /* For a special file, the device number */
off_t st_size; /* The size of the file; 0 for special files */
time_t st_atime; /* The access time. */
int st_spare1;
time_t st_mtime; /* The modification time. */
int st_spare2;
time_t st_ctime; /* The status-change time. */
int st_spare3;
long st_blksize;
long st_blocks;
uint st_remote:1; /* Set if file is remote */
dev_t st_netdev; /* ID of device containing */
/* network special file */
ino_t st_netino; /* Inode number of network special file */
long st_spare4[9];
};
#define S_IFMT 0170000 /* type of file */
#define S_IFDIR 0040000 /* directory */
#define S_IFCHR 0020000 /* character special */
#define S_IFBLK 0060000 /* block special */
#define S_IFREG 0100000 /* regular (ordinary) */
#define S_IFIFO 0010000 /* fifo */
#define S_IFNWK 0110000 /* network special */
#define S_IFLNK 0120000 /* symbolic link */
#define S_IFSOCK 0140000 /* socket */
#define S_ISUID 0004000 /* set user id on execution */
#define S_ISGID 0002000 /* set group id on execution */
#define S_ENFMT 0002000 /* enforced file locking (shared with S_ISGID)*/
#define S_ISVTX 0001000 /* save swapped text even after use */
Following is an example program demonstrating the use of the stat() system
call to determine the status of a file:
/* status.c */
/* demonstrates the use of the stat() system call to determine the
   status of a file.
*/
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>

#define ERR (-1)
#define TRUE 1
#define FALSE 0

int main(int argc, char *argv[])
{
    int isdevice = FALSE;
    struct stat stat_buf;

    if (argc != 2)
    {
        printf("Usage: %s filename\n", argv[0]);
        exit (1);
    }
    if (stat(argv[1], &stat_buf) == ERR)
    {
        perror("stat");
        exit (1);
    }
    printf("\nFile: %s status:\n\n", argv[1]);
    if ((stat_buf.st_mode & S_IFMT) == S_IFDIR)
        printf("Directory\n");
    else if ((stat_buf.st_mode & S_IFMT) == S_IFBLK)
    {
        printf("Block special file\n");
        isdevice = TRUE;
    }
    else if ((stat_buf.st_mode & S_IFMT) == S_IFCHR)
    {
        printf("Character special file\n");
        isdevice = TRUE;
    }
    else if ((stat_buf.st_mode & S_IFMT) == S_IFREG)
        printf("Ordinary file\n");
    else if ((stat_buf.st_mode & S_IFMT) == S_IFIFO)
        printf("FIFO\n");

    /* the high byte of a device number is the major number, the low
       byte the minor number (layout as on the system shown above) */
    if (isdevice)
        printf("Device number:%d, %d\n",
               (int) ((stat_buf.st_rdev >> 8) & 0377),
               (int) (stat_buf.st_rdev & 0377));
    printf("Resides on device:%d, %d\n",
           (int) ((stat_buf.st_dev >> 8) & 0377),
           (int) (stat_buf.st_dev & 0377));
    printf("I-node: %ld; Links: %ld; Size: %ld\n", (long) stat_buf.st_ino,
           (long) stat_buf.st_nlink, (long) stat_buf.st_size);
    if ((stat_buf.st_mode & S_ISUID) == S_ISUID)
        printf("Set-user-ID\n");
    if ((stat_buf.st_mode & S_ISGID) == S_ISGID)
        printf("Set-group-ID\n");
    if ((stat_buf.st_mode & S_ISVTX) == S_ISVTX)
        printf("Sticky-bit set -- save swapped text after use\n");
    printf("Permissions: %o\n", (unsigned) (stat_buf.st_mode & 0777));
    exit (0);
}
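POSIX systems also supply test macros (S_ISDIR(), S_ISREG(), and so on) that perform the same masking of st_mode against S_IFMT as the program above. The sketch below uses them to classify a file; the function name file_kind and the strings it returns are hypothetical.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Return a short description of the type of the named file, or
   "unknown" if stat() fails. */
const char *file_kind(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) == -1)
        return "unknown";
    if (S_ISDIR(sb.st_mode))
        return "directory";
    if (S_ISREG(sb.st_mode))
        return "ordinary file";
    if (S_ISCHR(sb.st_mode))
        return "character special";
    if (S_ISBLK(sb.st_mode))
        return "block special";
    if (S_ISFIFO(sb.st_mode))
        return "fifo";
    return "other";
}
```

The macros avoid repeating the mask expression, at the cost of not naming the system-specific types such as S_IFNWK shown in the extract.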
access()
To determine if a file is accessible to a program, the access() system call
may be used. Unlike any other system call that deals with permissions,
access() checks the real user-ID or group-ID, not the effective ones.
The prototype for the access() system call is:
int access(file_name, access_mode)
char *file_name;
int access_mode;
where file_name is the name of the file to which access permissions given in
access_mode are to be applied. Access modes are often defined as manifest
constants in /usr/include/sys/file.h. The available modes are:
Value    Meaning      file.h constant
-----    -------      ---------------
 00      existence    F_OK
 01      execute      X_OK
 02      write        W_OK
 04      read         R_OK
These values may be ORed together to check for more than one access
permission. The call to access() returns 0 if the program has the given
access permissions, otherwise -1 is returned and errno is set to the reason
for failure. This call is somewhat useful in that it makes checking for a
specific permission easy. However, it only answers the question "do I have
this permission?" It cannot answer the question "what permissions do I
have?"
The following example program demonstrates the use of the access() system
call to remove a file. Before removing the file, a check is made to make
sure that the file exists and that it is writable (it will not remove a
read-only file).
/* remove.c */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>      /* access(), unlink(), F_OK, W_OK */

#define ERR (-1)

int main(int argc, char *argv[])
{
    if (argc != 2)
    {
        printf("Usage: %s filename\n", argv[0]);
        exit (1);
    }
    if (access (argv[1], F_OK) == ERR)    /* check that file exists */
    {
        perror(argv[1]);
        exit (1);
    }
    if (access (argv[1], W_OK) == ERR)    /* check for write permission */
    {
        fprintf(stderr, "File: %s is write protected!\n", argv[1]);
        exit (1);
    }
    if (unlink (argv[1]) == ERR)
    {
        perror(argv[1]);
        exit (1);
    }
    exit (0);
}
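Although access() cannot report all of a file's permissions in one call, a program can probe each permission in turn and collect the answers. The following is a minimal sketch of that idea; the function name access_mask is hypothetical.

```c
#include <unistd.h>

/* Probe each permission separately.  Returns the OR of those of R_OK,
   W_OK, and X_OK that are granted to the real user-ID and group-ID, or
   -1 if the file does not exist or cannot be reached. */
int access_mask(const char *path)
{
    int mask = 0;

    if (access(path, F_OK) == -1)
        return -1;
    if (access(path, R_OK) == 0)
        mask |= R_OK;
    if (access(path, W_OK) == 0)
        mask |= W_OK;
    if (access(path, X_OK) == 0)
        mask |= X_OK;
    return mask;
}
```

The result describes the moment of the call only; permissions can change between the check and a later open() or unlink(), so the later call's own return value still has to be checked.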