LINUX TROUBLESHOOTING FOR SYSTEM ADMINISTRATORS AND POWER USERS PDF

Author: SHAE HOLDEN
Language: English, Japanese, Arabic
Country: San Marino
Genre: Fiction & Literature
Pages: 611
Published (Last): 05.01.2016
ISBN: 278-8-41476-317-9
ePub File Size: 27.41 MB
PDF File Size: 8.55 MB

This book is aimed at novice Linux system administrators (and might be interesting for home users that want to know a bit more about their Linux system). Linux Troubleshooting for System Administrators and Power Users By James Kirkland, David.

The sysadmin is on call when a computer system goes down or malfunctions, and must be able to quickly and correctly diagnose what is wrong and how best to fix it. They may also need teamwork and communication skills, as well as the ability to install and configure hardware and software. Sysadmins must understand the behavior of software in order to deploy it and to troubleshoot problems, and generally know several programming languages used for scripting or automation of routine tasks.

A typical sysadmin's role is not to design or write new application software, but when they are responsible for automating system or application configuration with various configuration management tools, the lines blur somewhat.

That said, system administrators are not software engineers or developers in the job-title sense. Particularly when dealing with Internet-facing or business-critical systems, a sysadmin must have a strong grasp of computer security.

This includes not merely deploying software patches, but also preventing break-ins and other security problems with preventive measures. In some organizations, computer security administration is a separate role responsible for overall security and the upkeep of firewalls and intrusion detection systems, but all sysadmins are generally responsible for the security of computer systems.

A system administrator's responsibilities might include:

- Analyzing system logs and identifying potential issues with computer systems.
- Applying operating system updates, patches, and configuration changes.
- Installing and configuring new hardware and software.
- Adding, removing, or updating user account information, resetting passwords, etc.
- Answering technical queries and assisting users.
- Documenting the configuration of the system.

System Hangs and Panics

Anyone with any system administration experience has been there. You are in the middle of some production cycle or are just working on the desktop when the computer, for some mysterious reason, hangs or displays some elaborate screen message with a lot of HEX addresses and perhaps a stack of an offending NULL dereference.

What to do? In this chapter, we hope to provide an answer as we discuss kernel panics, oops, hangs, and hardware faults. We examine what the system does in these situations and discuss the tools required for initial analysis. We begin by discussing OS hangs. We then discuss kernel panics and oops panics. Finally, we conclude with hardware machine checks. It is important to identify whether you are encountering a panic, a hang, or a hardware fault to know how to remedy the problem.

Panics are easy to detect because they consist of the kernel voluntarily shutting down. Hangs can be more difficult to detect because the kernel has gone into some unknown state and the driver has ceased to respond for some reason, preventing the processes from being scheduled.
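Because a panic or oops announces itself on the console, a first pass over a saved console or kernel log can be automated. The following is a minimal sketch, not the book's method: the marker strings are assumptions based on commonly seen kernel messages (exact wording varies by kernel version and architecture), and the sample log lines are invented for illustration. A hang usually leaves no such message, so this only helps with panics and oopses.

```python
import re

# Marker strings are assumptions based on common kernel console output;
# the exact wording varies by kernel version and architecture.
PATTERNS = {
    "panic": re.compile(r"Kernel panic"),
    "oops": re.compile(r"\bOops\b"),
}


def classify_lines(lines):
    """Return (line_number, kind, text) for each line that looks like a panic or oops."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for kind, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, kind, line.strip()))
                break  # report each line at most once
    return hits


if __name__ == "__main__":
    # Hypothetical console log excerpt, invented for illustration.
    sample = [
        "eth0: link up",
        "Oops: 0000 [#1] SMP",
        "Kernel panic - not syncing: Fatal exception",
    ]
    for number, kind, text in classify_lines(sample):
        print(f"line {number}: {kind}: {text}")
```

In practice the input would be whatever log survived the event, for example a serial console capture, read with `open(path).readlines()`.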

Hardware faults occur at a lower level, independent of and beneath the OS, and are observed through firmware logs.

When you encounter a hang, panic, or hardware fault, determine whether it is easily reproducible. This information helps to identify whether the underlying problem is a hardware or software problem. If it is easily reproducible on different machines, chances are that the problem is software-related. If it is reproducible on only one machine, focus on ruling out a problem with supported hardware.

One final important point before we begin discussing hangs: Whether you are dealing with an OS hang or panic, you must confirm that the hardware involved is supported by the Linux distribution before proceeding. Make sure the manufacturer supports the Linux kernel and hardware configuration used. Contact the manufacturer or consult its documentation or official Web site. This step is so important because when the hardware is supported, the manufacturer has already contributed vast resources to ensure compatibility and operability with the Linux kernel. Conversely, if it is not supported, you will not have the benefit of this expertise, even if you can find the bug, and either the manufacturer would have to implement your fix, or you would have to modify the open source driver yourself.

However, even if the hardware is not supported, you may find this chapter to be a helpful learning tool because we highlight why the driver, kernel module, application, and hardware are behaving as they are.

Chapter 3. Performance Tools

This chapter explains how to use the wealth of performance tools available for Linux. We also explain what the information from each tool means.

Even if you are already using top or sar, you can probably learn some things from this chapter. You should make a habit of using these tools if you are not already doing so.

You need to know how to troubleshoot a performance problem, of course, but you should also regularly look for changes in the key metrics that can indicate a problem. You can use these tools to measure the performance impact of a new application. Just like looking at the temperature gauge in a car, you need to keep an eye on the performance metrics of your Linux systems.

These performance tools are delivered with a few rpms. The procps rpm supplies top, free, and vmstat. The sysstat rpm provides sar and iostat. The top command is a great interactive utility for monitoring performance. It provides a few summary lines of overall Linux performance, but reporting process information is where top shines.

The process display can be customized extensively. You can add fields, sort the list of processes by different metrics, and even kill processes from top.
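The idea of sorting the process list by a chosen metric can be illustrated with a small sketch. This is not how top is implemented; the `Proc` record, its field names, and the sample values are all hypothetical, standing in for data that a real tool would read from the system.

```python
from typing import NamedTuple


class Proc(NamedTuple):
    """Hypothetical per-process record; the field names are illustrative."""
    pid: int
    cpu_percent: float
    mem_percent: float
    command: str


def top_n(procs, metric="cpu_percent", n=3):
    """Return the n processes with the highest value of the chosen metric."""
    return sorted(procs, key=lambda p: getattr(p, metric), reverse=True)[:n]


if __name__ == "__main__":
    # Invented snapshot; real values would come from the running system.
    snapshot = [
        Proc(101, 2.0, 10.5, "httpd"),
        Proc(202, 55.3, 1.2, "gzip"),
        Proc(303, 7.1, 30.0, "mysqld"),
    ]
    for p in top_n(snapshot, metric="mem_percent", n=2):
        print(p.pid, p.command, p.mem_percent)
```

Switching the `metric` argument changes the ordering without touching the data, which is the same idea as changing the sort field in an interactive monitor.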

The sar utility offers the capability to monitor just about everything. It has over 15 separate reporting categories including CPU, disk, networking, process, swap, and more. The vmstat command reports extensive information about memory and swap usage. These commands cover a lot of the same ground. We discuss how to use the commands, and we explain the reports that each command generates. We don't discuss all 15 sar syntaxes, but we cover the most common ones.

Chapter 4. Performance

As a general discussion, performance is much too broad for a single book, let alone a single chapter. SANs are growing in popularity because they assist with storage consolidation and simplification. The main discussion point within the computing industry with regard to storage consolidation is, as it has always been, performance. We include examples of each topic throughout the chapter.

Chapter 5.

It is a well-known fact that no system remains stagnant forever, so for those who try to achieve life eternal for their systems, ultimate failure awaits. Take racing as an apt analogy. If a racer never upgrades to a newer engine (CPU) or chassis (model), then the racer will have a hard time staying competitive. Thus, in this chapter, we discuss how to add more to our "racer." The capability to consolidate all storage in a data center into large frames containing many drives is indeed the direction companies will, and need to, take.

Therefore, Linux must "Lead, follow, or get out of the way." We begin by defining the configuration used to demonstrate our examples and by discussing some highlights. We then discuss the addition of a PCI device to connect additional storage. Next, we move to a discussion of adding storage to a defined PCI device.

Chapter 6. Disk Partitions and Filesystems

Cylinders, sectors, tracks, and heads are the building blocks of spindle storage. Understanding the millions of bytes confined to a space that is half an inch thick, two inches wide, and three inches in length is critical to data recovery.
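How cylinders, heads, and sectors map onto a flat run of blocks can be shown with the conventional CHS-to-LBA formula. The sketch below is illustrative only; the function names are made up for this example, and the geometry in the demo (80 cylinders, 2 heads, 18 sectors per track, 512-byte sectors) is an assumption chosen because it matches a common high-density floppy layout.

```python
def chs_to_lba(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    """Convert a cylinder/head/sector address to a linear block address.

    Cylinders and heads count from 0; sectors count from 1.
    """
    if cylinder < 0:
        raise ValueError("cylinder out of range")
    if not 0 <= head < heads_per_cylinder:
        raise ValueError("head out of range")
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector out of range")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


def capacity_bytes(cylinders, heads, sectors_per_track, bytes_per_sector=512):
    """Total capacity implied by a geometry."""
    return cylinders * heads * sectors_per_track * bytes_per_sector


if __name__ == "__main__":
    # Assumed geometry for illustration: 80 cylinders, 2 heads, 18 sectors/track.
    print(chs_to_lba(0, 0, 1, 2, 18))    # first sector  -> 0
    print(chs_to_lba(79, 1, 18, 2, 18))  # last sector   -> 2879
    print(capacity_bytes(80, 2, 18))     # total bytes   -> 1474560
```

The same arithmetic run in reverse is what lets a recovery tool translate a block number back into a physical position on the disk.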

Consider the smallest form of storage that every person, at one time or another, has held in the palm of his or her hand. Most of us over the age of 25 recollect ravaging our desk, digging for the all-important, "critical-to-life" 1. This critical piece of plastic never fails to be found under the heaviest object on the desk. It is amazing that any data survives on the floppy after removing the seven-pound differential equations bible that was covering it.

However, today's storage needs require much larger devices and more advanced methods to protect and recover the data they hold.

Chapter 7. Device Failure and Replacement

Whether the red LED is flashing or the syslog is filling up with cryptic messages, a hardware failure is never a day at the beach. The goal of this chapter is to provide a guide for identifying and remedying device failures. We begin with a discussion of supported devices before proceeding with a discussion of how to look for errors. We then discuss how to identify a failed device. Finally, we consider replacements and alternative options to remedy the problem.

Chapter 8.

In both systems, for process hangs, we identify the system resources being used by the process and attempt to identify the cause for the process to stop responding. With application core dumps, we must identify the signal that terminated the process and proceed with acquiring a stack trace to identify system calls made by the process at the time it died.
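Identifying the terminating signal can be sketched from a parent process's point of view. The example below is an illustration under stated assumptions, not the book's procedure: it relies on the POSIX behavior of Python's `subprocess` module, where a negative return code -N means the child was terminated by signal N, and the helper name `describe_exit` is made up for this example. Whether a core file is actually written depends on system limits that this sketch does not touch.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Describe a subprocess return code (POSIX: negative means killed by a signal)."""
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    # Child that sends SIGSEGV to itself, standing in for a crashing application.
    child = [
        sys.executable,
        "-c",
        "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)",
    ]
    result = subprocess.run(child)
    print(describe_exit(result.returncode))
```

Once the signal is known, a debugger run against the core file is the usual next step for obtaining the stack trace.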

There exists neither a "golden" troubleshooting path nor a set of instructions that can be applied for all cases. Some conditions are much easier to solve than others, but with a good understanding of the fundamentals, a solution is not far from reach.

This chapter explains various facets of Linux processes. We begin by examining the structure of a process and its life cycle from creation to termination. This is followed by a discussion of Linux threads. The aforementioned establish a basis for proceeding with a discussion of process hangs and core dumps.

Chapter

If a report should be run every Friday at 8 p. If a user wants to run a sweep of the system to find a misplaced file, the job can be scheduled to run that evening using the at command. This chapter explains how cron and at work. You might not be familiar with the anacron and kcron packages, but they extend the features of cron. We also explain these tools in this chapter. We show how to use cron and the other utilities, but we also show how they work and provide examples of what can go wrong.

The crontab command is used to submit and edit jobs. The reader will see the various crontab syntaxes and the format of the cron configuration file. The other files cron uses are explained as well. The cron daemon runs the jobs submitted with crontab. This topic details how the daemon gets started, where it logs, and the differences between cron packages.

We also discuss a graphical front end to crontab called kcron. Learn how it works in this section. We show examples of submitting, removing, and monitoring jobs with at.

We conclude the chapter with a section on four troubleshooting scenarios that demonstrate good methodologies for fixing problems with cron.

Printing and Printers

Printing is easy to overlook. Many people dismiss it as a minor subsystem in Linux. I learned that was not the case one Friday. I received an urgent call for assistance from the payroll department. It was payday, and the payroll system was unable to print. Hundreds of people in that company were praying for the printing subsystem to be fixed. In this chapter, we discuss the major types of printer hardware, the major spooler software available, and ways to troubleshoot both.

System Security

System security is about as important an IT topic as there is these days. A key responsibility of a system administrator is keeping data secure and safe. In the Internet age, this requires more diligence and preparation than ever before. Even on systems inside a firewall, it is urgent to prepare and monitor for intrusions. This chapter begins by defining system security. It then tackles the issue of prevention, focusing on troubleshooting SSH and system hardening issues.

Network Problems

It goes without saying that a networking problem can really put a kink in your day. That is why we devote an entire chapter to Linux network troubleshooting.

Although this chapter is not intended to teach the fundamentals of networking, a brief overview is justified. After we cover this subject, we move on to discussing identification of the perceived network problem and isolation of the subsystem involved. We then discuss options for resolution.

Login Problems

User login attempts can fail for many reasons. The account could have been removed or the password changed. Linux provides password aging to force users to change their passwords regularly.

A password can have a maximum age after which the account is locked. If a user notifies you that his login attempts fail, the first thing to check is whether he is permitted to log in. Linux does not provide a meaningful explanation for why logins fail. This is part of good security because few hints are given to would-be intruders.

It does make troubleshooting more complex, however. This chapter explains the commands needed to troubleshoot login failures and explains the authentication components. If you follow the steps explained in this chapter, you should be able to understand and correct login failures.
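The password aging check mentioned above can be sketched against the shadow file format, in which the third field is the day of the last password change (counted in days since January 1, 1970) and the fifth is the maximum password age in days. This is a minimal illustration under those assumptions, not the chapter's own program: the function name, the status strings, and the user names in the demo are hypothetical, and the inactivity and account expiration fields are ignored.

```python
from datetime import date

EPOCH = date(1970, 1, 1)


def password_status(shadow_line, today):
    """Classify one shadow-format line by its password aging fields."""
    fields = shadow_line.strip().split(":")
    name, last_change, max_age = fields[0], fields[2], fields[4]
    if last_change == "0":
        # A last-change value of 0 forces a password change at next login.
        return name, "must change"
    if not last_change or not max_age:
        return name, "no aging"
    days_today = (today - EPOCH).days
    expires_on = int(last_change) + int(max_age)
    return name, ("expired" if days_today > expires_on else "ok")


if __name__ == "__main__":
    today = date(2024, 1, 1)
    changed_100_days_ago = (today - EPOCH).days - 100
    # Hypothetical entries; "x" stands in for the password hash.
    lines = [
        f"alice:x:{changed_100_days_ago}:0:90:7:::",
        f"bob:x:{changed_100_days_ago}:0:120:7:::",
        "carol:x::0:::::",
    ]
    for line in lines:
        print(password_status(line, today))
```

Reading the real shadow file requires root privileges, so a check like this would normally be run by the administrator rather than the affected user.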

We demonstrate how to look at and modify the password aging information in accounts.

- Login failures due to Linux configuration: Some examples include when the login is disabled because system maintenance is being performed and root login is refused because it is attempted from somewhere other than the console.
- Shell problems: If a user logs in but does not get the shell prompt or the application doesn't start, there may be a problem with the shell configuration. We discuss some common shell issues.
- Password problems: Finally, we provide a short program to validate user passwords.

X Windows Problems

With today's servers and personal computers, it is hard to imagine not having a graphical desktop. The ability to "point and click" your way around the desktop and configuration menus has made it possible for non-computer geeks to manipulate these machines. In addition, it has made the computer administrator more powerful than ever. This server process could be thought of as being just like any other application that uses drivers to access and control hardware.

With today's desktop environments, a single computer can use multiple monitors along with multiple virtual desktops. Something that makes X stand above the rest is its innate network design, which enables remote machines to display their programs on a local desktop. In this chapter, we cover Linux's implementation of X along with some troubleshooting techniques, and we illustrate key concepts using a few scenarios.

The init process is always running. Few environment variables are set when a process is started by init. The id must be unique. The runlevels field contains one or more characters, usually numbers, identifying the runlevels.
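The init configuration fields mentioned above can be made concrete with a small parser. This is a sketch assuming the classic SysV `/etc/inittab` layout of four colon-separated fields, `id:runlevels:action:process`; the `Entry` type and `parse_inittab` function are hypothetical names, and the sample lines are illustrative rather than taken from any particular system.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """One inittab-style line split into its four fields."""
    id: str
    runlevels: str
    action: str
    process: str


def parse_inittab(text):
    """Parse inittab-style text, skipping comments and rejecting duplicate ids."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(":", 3)  # the process field may itself contain colons
        if len(parts) != 4:
            raise ValueError(f"malformed line: {line!r}")
        entries.append(Entry(*parts))
    ids = [entry.id for entry in entries]
    duplicates = sorted({i for i in ids if ids.count(i) > 1})
    if duplicates:
        raise ValueError(f"duplicate ids: {duplicates}")
    return entries


if __name__ == "__main__":
    # Illustrative sample, not copied from a real system.
    sample = """\
# default runlevel
id:3:initdefault:
l3:3:wait:/etc/rc.d/rc 3
"""
    for entry in parse_inittab(sample):
        print(entry)
```

A check like this catches the duplicate-id mistake before init ever reads the file.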

Table lists the runlevel meanings. The more common keywords are shown in Table

In that case, download version 0.

LILO is a two-stage bootloader. LILO does not understand filesystems. The physical location of the files is stored in the map file. Because this information is encoded in the map file, LILO doesn't provide a shell-like environment as GRUB does to manually enter kernel location information at boot time.

Table provides a description of the global entries used in this file.

Table

- Timeout value specified in tenths of a second.
- If not specified, the current root partition is used.
- Location of map file.
- The stage1 and stage2 bootloader.
- Addresses will be linear sector addresses instead of sector, head, cylinder addresses.
- The first line of a group of lines defining a boot entry.
- File to be used as a ram disk.
- A string to be appended to the parameter line passed to the kernel.
- Name of the boot entry to be displayed. If no label entry exists, the boot entry name is the filename from the image parameter.

Our goal is to show how LILO works and how to fix problems. LILO is well documented in the lilo. Press Tab to interrupt autoboot and see the list of boot entries. Figure shows the display after Tab is pressed. Just type the name of the entry and press Enter. Figure demonstrates how to boot to single user mode (init runlevel 1).

Just add emergency to the command line. As we stated earlier, emergency mode is a minimalist environment. Red Hat provides the command mkbootdisk to create a bootable floppy. Thus, the root filesystem must be in good condition. This is not a rescue utilities disk. See the mkbootdisk 8 man page for full details. Any information on the disk will be lost. Configuring bootloader The meaning of each is described in Chapter 6.

When only LI is displayed, the first stage bootloader could not execute the second stage loader. Maybe the file was moved or deleted. What now? We can use the mkbootdisk floppy. A mkbootdisk floppy is a good recovery tool. We discuss recovery CDs later in this chapter.

Chapter 9.

It is also one of the more vexing areas. Nothing gets an administrator in trouble faster than lost data. In this chapter, we discuss the key categories of backup and recovery, and we look at some important areas of concern.

The first distinction between backup types is remote versus local backups. Local backups to media are typically faster, but the incremental cost of adding media storage to every system becomes expensive quickly.

The second option is to use a remote system as a backup server. This approach slows the backups somewhat and increases network bandwidth usage, but the backups typically happen in the middle of the night when most systems are quiet. This distinction is not a major focus of the chapter, but it is a fact of backup and recovery life and must be mentioned. The main issues addressed in the chapter include backup media and the types of backup devices available, backup strategies, the benefits and limitations of different utilities, and ways to troubleshoot failing tape backups.

Section 3. It was later added through a patch. There are numerous examples. Chapter 1. The mtx command moves tapes around within a tape library.

Figure is a screenshot of a Red Hat single user mode boot.