Compaq AlphaServer ES40 User Manual, PDF download (page 12)

Reliability and Availability Features

The AlphaServer ES40 system achieves a high level of reliability and availability through the careful application of technologies that balance redundancy, error correction, and fault management. Reliability and availability features are built into the CPU, memory, and I/O, and implemented at the system level.

Processor Features

• CPU data cache provides error correction code (ECC) protection.
• Parity protection on CPU cache tag store.
• Multi-tiered power-up diagnostics to verify the functionality of the hardware.

When you power up or reset the system, all CPUs run a set of diagnostic tests in parallel. If a CPU fails any test, it is configured out of the system. Responsibility for initializing memory and booting the console firmware is transferred to another CPU, and the boot process continues. This feature ensures that a system can still power up and boot the operating system if a CPU fails. Messages on the operator control panel power-up/diagnostic display indicate the test status and component failure information.
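
The failover behavior described above can be sketched roughly as follows. This is an illustration only, not the console firmware's logic: the names `BootPlan`, `plan_boot`, and `run_tests` are hypothetical, the tests run sequentially here rather than in parallel, and picking the first passing CPU as the one that boots the console is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class BootPlan:
    primary: Optional[int]      # CPU that initializes memory and boots the console
    enabled: List[int]          # CPUs that passed every diagnostic test
    configured_out: List[int]   # CPUs that failed at least one test


def plan_boot(cpu_ids: List[int], run_tests: Callable[[int], bool]) -> BootPlan:
    """Decide which CPUs take part in the boot (hypothetical sketch).

    run_tests(cpu) returns True when that CPU passes all of its tests.
    """
    enabled = [cpu for cpu in cpu_ids if run_tests(cpu)]
    configured_out = [cpu for cpu in cpu_ids if cpu not in enabled]
    # Assumption: the first CPU that passed becomes responsible for booting.
    primary = enabled[0] if enabled else None
    return BootPlan(primary, enabled, configured_out)


# Example: CPU 0 fails its tests, so another CPU takes over the boot.
plan = plan_boot([0, 1, 2, 3], lambda cpu: cpu != 0)
```

In this example `plan.primary` is CPU 1 and CPU 0 appears in `plan.configured_out`, so the boot can continue without the failed processor.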

Memory Features

• The memory ECC scheme is designed to provide maximum protection for user data. The memory scheme corrects single-bit errors and detects double-bit errors and total DRAM failure.
• Memory failover. The power-up diagnostics are designed to provide the largest amount of usable memory, configuring around errors.
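
To illustrate what "corrects single-bit errors and detects double-bit errors" means, here is a toy single-error-correcting, double-error-detecting (SECDED) code: an extended Hamming code over 4 data bits. The code width and layout are assumptions chosen for brevity; this page does not specify the ES40's actual ECC word format, which will differ.

```python
from typing import List, Optional, Tuple


def encode(nibble: int) -> List[int]:
    """Encode 4 data bits as an 8-bit extended Hamming (SECDED) codeword."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8  # code[0]: overall parity; code[1..7]: Hamming(7,4)
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]  # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]  # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]  # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2            # makes total parity even
    return code


def decode(word: List[int]) -> Tuple[str, Optional[int]]:
    """Return (status, nibble); status is 'ok', 'corrected', or 'uncorrectable'."""
    code = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos          # XOR of the positions of all set bits
    odd_parity = sum(code) % 2       # 1 when an odd number of bits flipped
    if syndrome == 0 and odd_parity == 0:
        status = "ok"
    elif odd_parity == 1:
        # Single-bit error: the syndrome is the failing position
        # (0 means the overall parity bit itself was hit).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even parity with a nonzero syndrome: two bits flipped.
        return "uncorrectable", None
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble
```

Flipping any one bit of `encode(n)` decodes back to `n` with status `corrected`, while flipping any two bits is reported as `uncorrectable` instead of being silently miscorrected.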

I/O Features

• ECC protection on the switch interconnect and parity protection on the PCI and SCSI buses.
• Extensive error correction built into disk drives.
• Optional internal RAID improves reliability and data security.
• Disk hot swap.

System Features

Auto reboot. On systems running Tru64 UNIX or OpenVMS, a firmware environment variable lets you set the default action the system takes on power-up, reset, or after an operating system crash. For maximum system availability, you can set the variable so that the system automatically reboots the operating system after most system failures. Windows NT reboots automatically by default, but lets you specify a countdown value so that you can stop the system from booting if you need to carry out other tasks from the console firmware.

Software installation. The operating systems are factory installed. Factory-installed software (FIS) lets you boot and use your system sooner than if you installed the software from a distribution kit.

Diagnostics. During the power-up process, diagnostics are run to achieve several goals:

• Provide a robust hardware platform for the operating system by ensuring that any faulty hardware does not participate in the operating system session. This maximizes system uptime by reducing the risk of system failure.
• Enable efficient, timely repair.

Audible beep codes report the status of diagnostic testing.

The system has a firmware update utility (LFU) that provides update capability for console and PCI I/O adapter firmware. A fail-safe loader provides a means of reloading the console in the event of corrupted firmware.

Thermal management. The air temperature and fan operation are monitored to protect against overheating and possible hardware destruction. Six fans provide front-to-back cooling, and the power supplies, in the rear, have their own fans. If the temperature rises, the system fans speed up; if necessary to prevent damage, the system shuts down. If the main fan, which cools the system card cage, fails, a redundant fan takes over.
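
The escalation described above (speed up the fans first, shut down only if needed) can be sketched as a simple threshold policy. The threshold values and the names `thermal_action` and `card_cage_fan` are hypothetical; this page does not give the temperatures at which the system actually reacts.

```python
from enum import Enum


class Action(Enum):
    NORMAL = "normal"
    SPEED_UP_FANS = "speed up fans"
    SHUT_DOWN = "shut down"


def thermal_action(temp_c: float,
                   warn_c: float = 40.0,       # hypothetical threshold
                   shutdown_c: float = 50.0    # hypothetical threshold
                   ) -> Action:
    """Pick the response to the measured air temperature."""
    if temp_c >= shutdown_c:
        return Action.SHUT_DOWN       # last resort, to prevent damage
    if temp_c >= warn_c:
        return Action.SPEED_UP_FANS   # first response to rising temperature
    return Action.NORMAL


def card_cage_fan(main_fan_ok: bool) -> str:
    """The redundant fan takes over only when the main fan fails."""
    return "main" if main_fan_ok else "redundant"
```

Keeping the two decisions separate mirrors the text: fan failover does not depend on temperature, and the temperature response does not depend on which card cage fan is running.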

Error handling. Parity and other error conditions are detected on the PCI buses. The memory checking scheme corrects single-bit errors and detects double-bit errors. Multiple ECC corrections to single-bit errors detected by the operating systems help in determining where in the system the error originated. Errors are logged for analysis.
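
As a rough illustration of how repeated corrected errors can point to their origin, the sketch below tallies corrected single-bit errors per reporting component. The event format, the field names `kind` and `source`, and the threshold are all hypothetical; the real error log format is not described on this page.

```python
from collections import Counter
from typing import Dict, List


def suspect_components(events: List[Dict[str, str]], threshold: int = 3) -> List[str]:
    """List components with at least `threshold` corrected single-bit errors.

    Each event is a dict with hypothetical keys 'kind' and 'source'.
    """
    counts = Counter(
        event["source"]
        for event in events
        if event["kind"] == "corrected_single_bit"
    )
    # most_common() orders components from most to fewest corrections.
    return [source for source, n in counts.most_common() if n >= threshold]


# Example log: one DIMM keeps needing correction, another erred once.
log = (
    [{"kind": "corrected_single_bit", "source": "dimm0"}] * 3
    + [{"kind": "corrected_single_bit", "source": "dimm1"}]
    + [{"kind": "pci_parity", "source": "pci0"}]
)
suspects = suspect_components(log)
```

Here `suspects` contains only `dimm0`, the component whose corrections recur.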

Disk hot swap. The hardware is designed to enable hot swap of disks. Hot swap is the removal of a disk or disks from any of the storage compartments while the rest of the system remains powered on and continues to operate. This feature contributes significantly to system availability. Because many disk problems can be fixed without shutting down the entire system, users lose access only to the disks that are removed.

N+1 power redundancy. A second or third power supply can be added to provide redundant power to the chassis. A second power supply is required for more than two CPUs or if a second disk cage is installed; in that case, the third supply provides redundancy. Each power supply is rated at 735 watts (DC) and has two LEDs that indicate the state of power to the system.
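
The sizing rule above reduces to a small calculation. The sketch below encodes only what this paragraph states (one supply as the base, a second for more than two CPUs or a second disk cage, and at most three supplies); the function names are hypothetical.

```python
def supplies_required(cpus: int, second_disk_cage: bool) -> int:
    """Number of power supplies needed just to run the configuration (N)."""
    if cpus < 1:
        raise ValueError("a system needs at least one CPU")
    return 2 if cpus > 2 or second_disk_cage else 1


def is_redundant(installed: int, cpus: int, second_disk_cage: bool) -> bool:
    """True when at least one supply beyond the required number is installed (N+1)."""
    if not 1 <= installed <= 3:
        raise ValueError("the chassis holds one to three power supplies")
    return installed >= supplies_required(cpus, second_disk_cage) + 1
```

For example, `is_redundant(2, 2, False)` is true because a two-CPU system with one disk cage needs only one supply, while `is_redundant(2, 4, False)` is false because both supplies are needed to run four CPUs.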

An external UPS can be purchased to support critical customer configurations. Because power is maintained for the entire system (CPU, memory, and I/O), power interruptions are completely transparent to users.
