Linux SCSI HOWTO
Drew Eckhardt, drew@cs.colorado.edu
v2.15, 20 March 1995

This HOWTO covers the Linux SCSI subsystem, as implemented in Linux kernel revision 1.1.74 and newer alpha code. Earlier revisions of the SCSI code are unsupported, and may differ significantly in terms of the drivers implemented, performance, and options available.

1. Introduction

1.1. License

Noncommercial redistributions of a verbatim copy in any medium, physical or electronic, are permitted without express permission of the author. Translations are similarly permitted without express permission if they include a notice on who translated them. Commercial redistribution is allowed and encouraged, provided that the author is notified of any such distributions and given the opportunity to provide a more up-to-date version.

Short quotes may be used without prior consent by the author. Derivative work and partial distributions of the SCSI-HOWTO must include either a verbatim copy of this file or make a verbatim copy of this file available. If the latter is the case, a pointer to the verbatim copy must be stated at a clearly visible place.

In short, we wish to promote dissemination of this information through as many channels as possible. However, we do wish to retain copyright on the HOWTO documents, to be notified of any plans to redistribute the HOWTOs to ensure that outdated versions don't spread too far, and for ALL the information provided in the HOWTOs to be disseminated. If you have questions on the Linux documentation project, please contact Matt Welsh, the Linux HOWTO coordinator, at mdw@sunsite.unc.edu. Questions regarding this document itself should be addressed to Drew Eckhardt, drew@Colorado.EDU.

1.2. Important Note

IMPORTANT: BUG REPORTS WHICH FAIL TO FOLLOW THE PROCEDURE OUTLINED IN SECTION 3 WILL BE IGNORED.

For additional information, you may wish to join the SCSI channel of the Linux activists list - mail to linux-activists-request@joker.cs.hut.fi
with the line

    X-Mn-Admin: join SCSI

in the header, as well as the Linux SCSI list by mailing majordomo@vger.rutgers.edu with the line

    subscribe linux-scsi

in the text.

I'm aware that this document isn't the most user-friendly, and that there may be inaccuracies and oversights. If you have constructive comments on how to rectify the situation, you're free to mail me about it.

2. Common Problems

This section lists some of the common problems that people have. If there is not anything here that answers your questions, you should also consult the sections for your host adapter and the devices that are giving you problems.

2.1. General Flakiness

If you experience random errors, the most likely causes are cabling and termination problems.

Some products, such as those built around the newer NCR chips, feature digital filtering and active signal negation, and aren't very sensitive to cabling problems.

Others, such as the Adaptec 154xC, 154xCF, and 274x, are extremely sensitive and may fail with cables that work with other systems.

I reiterate: some host adapters are extremely sensitive to cabling and termination problems, and therefore cabling and termination should be the first things checked when there are problems.

To minimize your problems, you should use cables which

1. Claim SCSI-II compliance

2. Have a characteristic impedance of 132 ohms

3. All come from the same source to avoid impedance mismatches

4. Come from a reputable vendor such as Amphenol

Termination power should be provided by all devices on the SCSI bus, through a diode to prevent current backflow, so that sufficient power is available at the ends of the cable where it is needed. To prevent damage if the bus is shorted, TERMPWR should be driven through a fuse or other current limiting device.

If multiple devices, external cables, or FAST SCSI 2 are used, active or forced perfect termination should be used on both ends of the SCSI bus.
See the comp.periphs.scsi FAQ for more information about active termination.

2.2. The kernel command line

Other parts of the documentation refer to a ``kernel command line''.

The kernel command line is a set of options you may specify from either the LILO: prompt after an image name, or in the append field in your LILO configuration file (LILO .14 and newer use /etc/lilo.conf, older versions use /etc/lilo/config).

Boot your system with LILO, and hit one of the alt, control, or shift keys when it first comes up to get a prompt. LILO should respond with

    boot:

At this prompt, you can select a kernel image to boot, or list them with ``?''. Ie

    boot: ?
    ramdisk floppy harddisk

To boot that kernel with the command line options you have selected, simply enter the name followed by a white space delimited list of options, terminating with a return.

Options take the form of

    variable=valuelist

where valuelist may be a single value or a comma delimited list of values with no whitespace. With the exception of root device, individual values are numbers, and may be specified in either decimal or hexadecimal.

Ie, to boot Linux with an Adaptec 1520 clone not recognized at bootup, you might type

    boot: floppy aha152x=0x340,11,7,1

If you don't care to type all of this at boot time, it is also possible to use the LILO configuration file ``append'' option with LILO .13 and newer. Ie,

    append="aha152x=0x340,11,7,1"

2.3. A SCSI device shows up at all possible IDs

If this is the case, you have strapped the device at the same address as the controller (typically 7, although some boards use other addresses, with 6 being used by some Future Domain boards).

Please change the jumper settings.

2.4. A SCSI device shows up at all possible LUNs

The device has buggy firmware.

As an interim solution, you should try using the kernel command line option

    max_scsi_luns=1

If that works, there is a list of buggy devices in the kernel sources in drivers/scsi/scsi.c in the variable blacklist.
Add your device to this list and mail the patch to Linus.

2.5. You get sense errors when you know the devices are error free

Sometimes this is caused by bad cables or improper termination. See Section 2.1: General Flakiness.

2.6. A kernel configured with networking does not work.

The auto-probe routines for many of the network drivers are not passive, and will interfere with the operation of some of the SCSI drivers.

2.7. Device detected, but unable to access.

A SCSI device is detected by the kernel, but you are unable to access it - ie mkfs /dev/sdc, tar xvf /dev/rst2, etc fails.

You don't have a special file in /dev for the device.

Unix devices are identified by a type, either block or character (block devices go through the buffer cache, character devices do not), a major number (ie which driver is used - block major 8 corresponds to SCSI disks) and a minor number (ie which unit is being accessed through a given driver - ie character major 4, minor 0 is the first virtual console, minor 1 the next, etc). However, accessing devices through this separate namespace would break the unix/Linux metaphor of ``everything is a file,'' so character and block device special files are created under /dev. This lets you access the raw third SCSI disk device as /dev/sdc, the first serial port as /dev/ttyS0, etc.

The preferred method for creating a file is using the MAKEDEV script:

    cd /dev

and run MAKEDEV (as root) for the devices you want to create - ie wildcards ``should'' work - ie ``should'' create entries for all SCSI disk devices (doing this should create /dev/sda through /dev/sdp, with fifteen partition entries for each) ``should'' create entries for /dev/sdc and all fifteen permissible partitions on /dev/sdc, etc.

I say ``should'' because this is the standard unix behavior - the MAKEDEV script in your installation may not conform to this behavior, or may have restricted the number of devices it will create.
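The type/major/minor scheme described above can be sketched in a few lines of Python. This is only an illustration, not part of any MAKEDEV script: the function name scsi_disk_node is hypothetical, and it assumes the conventional layout in which each SCSI disk owns 16 consecutive minor numbers (the whole disk plus the fifteen partitions mentioned above) under block major 8.

```python
SCSI_DISK_MAJOR = 8    # block major for SCSI disks, as stated in the text
MINORS_PER_DISK = 16   # assumption: whole disk + fifteen partitions


def scsi_disk_node(disk_index, partition=0):
    """Return (name, type, major, minor) for a SCSI disk special file.

    disk_index 0 is sda, 1 is sdb, and so on up to 15 (sdp).
    partition 0 means the whole disk; 1..15 select a partition.
    """
    if not 0 <= disk_index < 16:
        raise ValueError("disk index out of range (sda..sdp)")
    if not 0 <= partition < MINORS_PER_DISK:
        raise ValueError("partition out of range (0..15)")
    name = "/dev/sd" + chr(ord("a") + disk_index)
    if partition:
        name += str(partition)
    minor = disk_index * MINORS_PER_DISK + partition
    return name, "b", SCSI_DISK_MAJOR, minor


if __name__ == "__main__":
    # Print the numbers one would hand to mknod for a few devices.
    for disk, part in [(0, 0), (2, 0), (2, 1)]:
        name, kind, major, minor = scsi_disk_node(disk, part)
        print("mknod", name, kind, major, minor)
```

Under these assumptions the third disk, /dev/sdc, comes out as block major 8, minor 32. Check the Device Files subsection for your devices before relying on numbers computed this way.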
If MAKEDEV won't do the right magic for you, you'll have to create the device entries by hand with the mknod command.

The block/character type, major, and minor numbers are specified for the various SCSI devices in Subsection 3: Device Files in the appropriate section. Take those numbers, and use (as root)

    mknod /dev/device b|c major minor

ie -

    mknod /dev/sdc b 8 32
    mknod /dev/rst0 c 9 0

2.8. SCSI System Lockups

This could be one of a number of things. Also see the section for your specific host adapter for possible further solutions.

There are cases where the lockups seem to occur when multiple devices are in use at the same time. In this case, you can try contacting the manufacturer of the devices and see if firmware upgrades are available which would correct the problem. If possible, try a different SCSI cable, or try on another system. This can also be caused by bad blocks on disks, or by bad handling of DMA by the motherboard (for host adapters that do DMA). There are probably many other possible conditions that could lead to this type of event.

Sometimes these problems occur when there are multiple devices in use on the bus at the same time. In this case, if your host adapter driver supports more than one outstanding command on the bus at one time, try reducing this to 1 and see if this helps. If you have tape drives or slow cdrom drives on the bus, this might not be a practical solution.

2.9. Configuring and building the kernel

Unused SCSI drivers eat up valuable memory, aggravating memory shortage problems on small systems because kernel memory is unpageable.

So, you will want to build a kernel tuned for your system, with only the drivers you need installed.

cd to /usr/src/linux

If you are using a root device other than the current one, or something other than 80x25 VGA, and you are writing a boot floppy, you should edit the makefile, and make sure the ROOT_DEV = and SVGA_MODE = lines are the way you want them.
If you've installed any patches, you may wish to guarantee that all files are rebuilt. If this is the case, you should type

    make mrproper

Regardless of whether you ran make mrproper, type

    make config

and answer the configuration questions. Then run

    make depend

and finally

    make

Once the build completes, you may wish to update the lilo configuration, or write a boot floppy. A boot floppy may be made by running

    make zdisk

2.10. LUNS other than 0 don't work

This is often a problem with SCSI-> MFM, RLL, ESDI, SMD, and similar bridge boards.

At some point, we came to the conclusion that many SCSI-I devices were extremely broken, and added the following code

______________________________________________________________________
/* Some scsi-1 peripherals do not handle lun != 0.
   I am assuming that scsi-2 peripherals do better */
if((scsi_result[2] & 0x07) == 1 &&
   (scsi_result[3] & 0x0f) == 0) break;
______________________________________________________________________

to scan_scsis() in drivers/scsi/scsi.c. If you delete this code, your old devices should be detected correctly, provided you have not used the max_scsi_luns kernel command line option or the NO_MULTI_LUN compile time define.

3. Reporting Bugs

The Linux SCSI developers don't necessarily maintain old revisions of the code due to space constraints. So, if you are not running the latest publicly released Linux kernel (note that many of the Linux distributions, such as MCC, SLS, Yggdrasil, etc. often lag one or even twenty patches behind this) chances are we will be unable to solve your problem. So, before reporting a bug, please check to see if it exists with the latest publicly available kernel.

If after upgrading, and reading this document thoroughly, you still believe that you have a bug, please mail a bug report to the SCSI channel of the mailing list where it will be seen by many of the people who've contributed to the Linux SCSI drivers.
In your bug report, please provide as much information as possible regarding your hardware configuration, the exact text of all of the messages that Linux prints when it boots, when the error condition occurs, and where in the source code the error is. Use the procedures outlined in Section 3.1: Capturing messages and Section 3.2: Locating the source of a panic().

Failure to provide the maximum possible amount of information may result in misdiagnosis of your problem, or developers deciding that there are other more interesting problems to fix.

The bottom line is that if we can't reproduce your bug, and you can't point out to us what's broken, it won't get fixed.

3.1. Capturing messages

If you are not running a kernel message logging system :

Ensure that the /proc filesystem is mounted.

    grep proc /etc/mtab

If the /proc filesystem is not mounted, mount it

    mkdir /proc
    chmod 755 /proc
    mount -t proc /proc /proc

Copy the kernel revision and messages into a log file

    cat /proc/version >/tmp/log
    cat /proc/kmsg >>/tmp/log

Type Ctrl-C after a second or two.

If you are running some logger, you'll have to poke through the appropriate log files (/etc/syslog.conf should be of some use in locating them), or use dmesg.

If Linux is not yet bootstrapped, format a floppy diskette under DOS. Note that if you have a distribution which mounts the root diskette off of floppy rather than RAM drive, you'll have to format a diskette readable in the drive not being used to mount root or use their ramdisk boot option.

Boot Linux off your distribution boot floppy, preferably in single user mode using a RAM disk as root.

    mkdir /tmp/dos

Insert the diskette in a drive not being used to mount root, and mount it. Ie

    mount -t msdos /dev/fd0 /tmp/dos

or

    mount -t msdos /dev/fd1 /tmp/dos

Copy your log to it

    cp /tmp/log /tmp/dos/log

Unmount the DOS floppy

    umount /tmp/dos

And shut down Linux

    shutdown

Reboot into DOS, and using your favorite communications software include the log file in your trouble mail.
3.2. Locating the source of a panic()

Like other unices, when a fatal error is encountered, Linux calls the kernel panic() function. Unlike other unices, Linux doesn't dump core to the swap or dump device and reboot automatically. Instead, a useful summary of state information is printed for the user to manually copy down. Ie :

Unable to handle kernel NULL pointer dereference at virtual address c0000004
current->tss,cr3 = 00101000, %cr3 = 00101000
*pde = 00102027
*pte = 00000027
Oops: 0000
EIP:    0010:0019c905
EFLAGS: 00010002
eax: 0000000a   ebx: 001cd0e8   ecx: 00000006   edx: 000003d5
esi: 001cd0a8   edi: 00000000   ebp: 00000000   esp: 001a18c0
ds: 0018   es: 0018   fs: 002b   gs: 002b   ss: 0018
Process swapper (pid: 0, process nr: 0, stackpage=001a09c8)
Stack: 0019c5c6 00000000 0019c5b2 00000000 0019c5a5 001cd0a8 00000002 00000000
       001cd0e8 001cd0a8 00000000 001cdb38 001cdb00 00000000 001ce284 0019d001
       001cd004 0000e800 fbfff000 0019d051 001cd0a8 00000000 001a29f4 00800000
Call Trace: 0019c5c6 0019c5b2 0018c5a5 0019d001 0019d051 00111508 00111502
            0011e800 0011154d 00110f63 0010e2b3 0010ef55 0010ddb7
Code: 8b 57 04 52 68 d2 c5 19 00 e8 cd a0 f7 ff 83 c4 20 8b 4f 04
Aiee, killing interrupt handler
kfree of non-kmalloced memory: 001a29c0, next= 00000000, order=0
task[0] (swapper) killed: unable to recover
Kernel panic: Trying to free up swapper memory space
In swapper task - not syncing

Take the hexadecimal number on the EIP: line, in this case 19c905, and search through /usr/src/linux/zSystem.map for the highest number not larger than that address. Ie,

0019a000 T _fix_pointers
0019c700 t _intr_scsi
0019d000 t _NCR53c7x0_intr

That tells you what function it's in. Recompile the source file which defines that function with debugging enabled, or the whole kernel if you prefer, by editing /usr/src/linux/Makefile and adding a ``-g'' to the CFLAGS definition.
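The zSystem.map search described above (the highest address not larger than the EIP) can be sketched as follows. This is an illustrative helper rather than a tool shipped with the kernel: the name find_symbol is hypothetical, and the code assumes each map line has the three-field ``address type name'' layout shown in the excerpt above.

```python
def find_symbol(eip, map_lines):
    """Return (address, type, name) for the symbol with the highest
    address not larger than eip, or None if no symbol qualifies."""
    best = None
    for line in map_lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip lines that don't look like "address type name"
        address = int(fields[0], 16)
        if address <= eip and (best is None or address > best[0]):
            best = (address, fields[1], fields[2])
    return best


# The three map entries quoted in the text.
SAMPLE_MAP = [
    "0019a000 T _fix_pointers",
    "0019c700 t _intr_scsi",
    "0019d000 t _NCR53c7x0_intr",
]

if __name__ == "__main__":
    hit = find_symbol(0x19c905, SAMPLE_MAP)
    print("%08x %s %s" % hit)
```

With the sample entries and the EIP from the dump above, the search settles on _intr_scsi. For a real map, read the lines from the file instead of SAMPLE_MAP.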
    #
    # standard CFLAGS
    #

Ie,

    CFLAGS = -Wall -Wstrict-prototypes -O2 -fomit-frame-pointer -pipe

becomes

    CFLAGS = -g -Wall -Wstrict-prototypes -O2 -fomit-frame-pointer -pipe

Rebuild the kernel, incrementally or by doing a

    make clean
    make

Make the kernel bootable by creating an entry in your /etc/lilo.conf for it

    image = /usr/src/linux/zImage
    label = experimental

and re-running LILO as root, or by creating a boot floppy

    make zImage

Reboot and record the new EIP for the error.

If you have script installed, you may want to start it, as it will log your debugging session to the typescript file.

Now, run

    gdb /usr/src/linux/tools/zSystem

and enter

    info line *

Ie,

    info line *0x19c905

To which GDB will respond something like

    (gdb) info line *0x19c905
    Line 2855 of ``53c7,8xx.c'' starts at address 0x19c905 and ends at 0x19c913.

Record this information. Then, enter

    list < line number>

Ie,

______________________________________________________________________
(gdb) list 2855
2850    /*      printk("scsi%d : target %d lun %d unexpected disconnect\n",
2851                host->host_no, cmd->cmd->target, cmd->cmd->lun); */
2852            printk("host : 0x%x\n", (unsigned) host);
2853            printk("host->host_no : %d\n", host->host_no);
2854            printk("cmd : 0x%x\n", (unsigned) cmd);
2855            printk("cmd->cmd : 0x%x\n", (unsigned) cmd->cmd);
2856            printk("cmd->cmd->target : %d\n", cmd->cmd->target);
2857            if (cmd) {
2858                abnormal_finished(cmd, DID_ERROR << 16);
2859            }
2860            hostdata->dsp = hostdata->script + hostdata->E_schedule /
2861                sizeof(long);
2862            hostdata->dsp_changed = 1;
2863        /* SCSI PARITY error */
2864        }
2865
2866        if (sstat0_sist0 & SSTAT0_PAR) {
2867            fatal = 1;
2868            if (cmd && cmd->cmd) {
2869                printk("scsi%d : target %d lun %d parity error.\n",
______________________________________________________________________

Obviously, quit will take you out of GDB.

Record this information too, as it will provide a context in case the developers' kernels differ from yours.

4. Hosts

This section gives specific information about the various host adapters that are supported in some way or another under Linux.

4.1. Supported and Unsupported Hardware

Drivers in the distribution kernel :

Adaptec 152x, Adaptec 154x (including clones from Bustek and DTC 329x boards), Adaptec 174x, Adaptec 274x/284x/2940, EATA-DMA protocol compliant boards (all DPT PMXXXXX/XX and SKXXXXX/XX except the PM2001, some boards from NEC and ATT), Future Domain 850, 885, 950, and other boards in that series (but not the 840, 841, 880, and 881 boards unless you make the appropriate patch), Future Domain 16x0 with TMC-1800, TMC-18C30, or TMC-18C50 chips, NCR53c8xx, PAS16 SCSI ports, Seagate ST0x, Trantor T128/T130/T228 boards, Ultrastor 14F, 24F, and 34F, and Western Digital 7000.

Alpha drivers:

Ricoh GSI-8

Many of the ALPHA drivers are available via anonymous FTP from

Drivers that are being developed, but aren't publicly available yet, and modifications needed to make existing drivers compatible with other boards:

DPT PM2001

Announcements WILL be made when drivers are available for public alpha testing. Until then, please don't use up the developers' valuable time with mail asking for release dates, etc.

o NCR53c8x0/7x0

  A NCR53c8xx driver has been developed, and with modifications ranging from minor to severe should support these chips

  o NCR53c720 - detection changes, initialization changes, modification of the assembler to use the 720's register mapping

  o NCR53c710 - detection changes, initialization changes, modification of assembler, modification of the NCR code to use fatal interrupts or GPIO generated non fatal interrupts for command completion.

  o NCR53c700, NCR53c700-66 - detection changes, initialization changes, modification of NCR code to not use DSA, modification of Linux code to handle context switches.

o NCR53c9x family

o Qlogic

SCSI hosts that will not work :

o All parallel-> SCSI adapters

o Rancho SCSI boards

o and Grass Roots SCSI boards.
SCSI hosts that will NEVER work:

o Non Adaptec compatible DTC boards (including the 3270 and 3280).

Acquiring programming information requires a non-disclosure agreement with DTC. This means that it would be impossible to distribute a Linux driver if one were written, since complying with the NDA would mean distributing no source, in violation of the GPL, and complying with the GPL would mean distributing source, in violation of the NDA.

If you want to run Linux on an unsupported piece of hardware, your options are to either write a driver yourself (Eric Youngdale and I are usually willing to answer technical questions concerning the Linux SCSI drivers) or to commission a driver.

4.1.1. Multiple host adapters

With some host adapters (see Buyers' Guide : Feature Comparison), you can use multiple host adapters of the same type in the same system. With multiple adapters of the same type in the same system, generally the one at the lowest address will be scsi0, the one at the next address scsi1, etc.

In all cases, it is possible to use multiple host adapters of different types, provided that none of their addresses conflict. SCSI controllers are scanned in the order specified in the builtin_scsi_hosts[] array in drivers/scsi/hosts.c, with the order currently being

1. Ultrastor

2. Adaptec 151x/152x

3. Buslogic

4. Adaptec 154x

5. Adaptec 174x

6. Future Domain 16x0

7. Always IN2000

8. Generic NCR5380

9. PAS16

10. Seagate

11. Trantor T128/T130

12. NCR53c8xx

13. EATA-DMA

14. WD7000

15. debugging driver.

In most cases (ie, you aren't trying to use both Buslogic and Adaptec drivers), this can be changed to suit your needs (ie, keeping the same devices when new SCSI devices are added to the system on a new controller) by moving the individual entries.

4.2. Common Problems

4.2.1. SCSI timeouts

Make sure interrupts are enabled correctly, and there are no IRQ, DMA, or address conflicts with other boards.

4.2.2. Failure of autoprobe routines on boards that rely on BIOS for autoprobe.

If your SCSI adapter is one of the following :

o Adaptec 152x

o Adaptec 151x

o Adaptec AIC-6260

o Adaptec AIC-6360

o Future Domain 1680

o Future Domain TMC-950

o Future Domain TMC-8xx

o Trantor T128

o Trantor T128F

o Trantor T228F

o Seagate ST01

o Seagate ST02

o Western Digital 7000

and it is not detected on bootup, ie you get a

    scsi : 0 hosts

message or a

    scsi%d : type

message is not printed for each supported SCSI adapter installed in the system, you may have a problem with the autoprobe routine not knowing about your board.

Autodetection will fail for drivers using the BIOS for autodetection if the BIOS is disabled. Double check that your BIOS is enabled, and not conflicting with any other peripheral BIOSes.

Autodetection will also fail if the board's ``signature'' and/or BIOS address don't match known ones.

If the BIOS is installed, please use DOS and DEBUG to find a signature that will detect your board - Ie, if your board lives at 0xc8000, under DOS do

    debug
    d c800:0
    q

and send a message to the SCSI channel of the mailing list with the ASCII message, with the length and offset from the base address (ie, 0xc8000). Note that the exact text is required, and you should provide both the hex and ASCII portions of the text.

If no BIOS is installed, and you are using an Adaptec 152x, Trantor T128, or Seagate driver, you can use command line or compile time overrides to force detection.

Please consult the appropriate subsection for your SCSI board.

4.2.3. Failure of boards using memory mapped I/O

(This includes the Trantor T128 and Seagate boards, but not the Adaptec, Generic NCR5380, PAS16, and Ultrastor drivers)

This is often caused when the memory mapped I/O ports are incorrectly cached. You should have the board's address space marked as uncacheable in the XCMOS settings.

If this is not possible, you will have to disable cache entirely.
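The DEBUG example above addresses a board at 0xc8000 as c800:0. The relationship between the two notations can be sketched as below. The function names are hypothetical; the only fact relied on is that a real-mode segment counts 16-byte units, so the linear address is the segment times 16 plus the offset.

```python
def segment_to_linear(segment, offset=0):
    """Convert a real-mode segment:offset pair to a linear address.

    A segment value counts 16-byte units, hence the shift by 4 bits.
    """
    return (segment << 4) + offset


def linear_to_segment(address):
    """Convert a 16-byte aligned linear address to a segment value."""
    if address % 16:
        raise ValueError("address is not on a 16-byte boundary")
    return address >> 4


if __name__ == "__main__":
    # c800:0 as typed into DEBUG versus the address Linux expects.
    print(hex(segment_to_linear(0xC800)))
    print(hex(linear_to_segment(0xC8000)))
```

When a board's documentation gives a value such as c800, it is worth checking which of the two forms is meant before passing an address to a driver.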
If you have manually specified the address of the board, remember that Linux needs the actual address of the board, and not the 16 byte segment the documentation may refer to. Ie, 0xc8000 would be correct, 0xc800 would not work and could cause memory corruption.

4.2.4. ``kernel panic : cannot mount root device'' when booting an ALPHA driver boot floppy

You'll need to edit the binary image of the kernel (before or after writing it out to disk), and modify a few two byte fields (little endian) to guarantee that it will work on your system.

1. default swap device at offset 502, this should be set to 0x00 0x00

2. ram disk size at offset 504, this should be set to the size of the boot floppy in K - ie, 5.25" = 1200, 3.5" = 1440.

   This means the bytes are

   3.5" : 0xA0 0x05
   5.25" : 0xB0 0x04

3. root device at offset 508, this should be 0x00 0x00, ie the boot device.

dd or rawrite the file to a disk. Insert the disk in the first floppy drive, wait until it prompts you to insert the root disk, and insert the root floppy from your distribution.

4.2.5. Installing a device driver not included with the distribution kernel

You need to start with the version of the kernel used by the driver author. This revision may be alluded to in the documentation included with the driver.

Various recent kernel revisions can be found at as linux-version.tar.gz

They are also mirrored at tsx-11.mit.edu and various other sites.

cd to /usr/src.

Remove your old Linux sources, or, if you want to keep a backup copy of them

    mv linux linux-old

Untar the archive

gunzip ,< IRQ> ,< SCSI-ID> , < RECONNECT>

Usually, SCSI-ID will be 7 and RECONNECT non-zero. To force detection at 0x340, IRQ 11, at SCSI-ID 7, allowing disconnect/reconnect, you would use the following command line option :

    aha152x=0x340,11,7,1

Antiquity Problems, fix by upgrading:

The driver fails with VLB boards. There was a timing problem in kernels older than revision 1.0.5.
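The aha152x override above follows the variable=valuelist form used on the kernel command line, with comma delimited numbers in either decimal or hexadecimal. A minimal sketch of how such an option breaks down is given below. It is not the kernel's parser: the function names are hypothetical, and the field name ``portbase'' for the first value is an assumption based on the PORTBASE define and the example above.

```python
def parse_kernel_option(option):
    """Split 'variable=valuelist' into (variable, [int, ...]).

    Only numeric values are handled; the root device option, which
    is not numeric, is outside the scope of this sketch.
    """
    variable, sep, valuelist = option.partition("=")
    if not sep or not variable or not valuelist:
        raise ValueError("expected variable=valuelist")
    # int(text, 0) accepts decimal ("11") and 0x-prefixed hex ("0x340").
    values = [int(text, 0) for text in valuelist.split(",")]
    return variable, values


def parse_aha152x(option):
    """Name the four aha152x values in the order used in the text."""
    variable, values = parse_kernel_option(option)
    if variable != "aha152x" or len(values) != 4:
        raise ValueError("expected aha152x= followed by four values")
    names = ("portbase", "irq", "scsi_id", "reconnect")
    return dict(zip(names, values))


if __name__ == "__main__":
    settings = parse_aha152x("aha152x=0x340,11,7,1")
    for name, value in settings.items():
        print(name, "=", value)
```

Note that whitespace inside the value list is not allowed, which is why the sketch splits on commas only and lets a stray space raise an error.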
Defines:

AUTOCONF: use configuration the controller reports (only 152x)
IRQ: override interrupt channel (9, 10, 11 or 12) (default 11)
SCSI_ID: override scsiid of AIC-6260 (0-7) (default 7)
RECONNECT: override target dis-/reconnection/multiple outstanding command: set to non-zero to enable, zero to disable.
DONT_SNARF: Don't register ports (pl12 and below)
SKIP_BIOSTEST: Don't test for BIOS signature (AHA-1510 or disabled BIOS)
PORTBASE: Force port base. Don't try to probe

4.4. Adaptec 154x, AMI FastDisk VLB, Buslogic, DTC 329x (Standard)

Supported Configurations:

Ports: 0x330 and 0x334
IRQs: 9, 10, 11, 12, 14, 15
DMA channels: 5, 6, 7
IO: port mapped, bus master

Autoprobe: works with all supported configurations, does not require an installed BIOS.

Autoprobe override: none

Note: No suffix boards, and early 'A' suffix boards do not support scatter/gather, and thus don't work. However, they can be made to work for some definition of the word works if AHA1542_SCATTER is changed to 0 in drivers/scsi/aha1542.h.

Note: Buslogic makes a series of boards that are software compatible with the Adaptec 1542, and these come in ISA, VLB and EISA flavors.

Antiquity Problems, fix by upgrading:

1. Linux kernel revisions prior to .99.10 don't support the 'C' revision.

2. Linux kernel revisions prior to .99.14k don't support the 'C' revision options for

   o BIOS support for the extended mapping for disks > 1G

   o BIOS support for > 2 drives

   o BIOS support for autoscanning the SCSI bus

3. Linux kernel revisions prior to .99.15e don't support the 'C' with the BIOS support for > 2 drives turned on and the BIOS support for the extended mapping for disks > 1G turned off.

4. Linux kernel revisions prior to .99.14u don't support the 'CF' revisions of the board.

5. Linux kernel revisions prior to 1.0.5 have a race condition when multiple devices are accessed at the same time.

Common problems:

1. There are unexpected errors with a 154xC or 154xCF board.

   Early examples of the 154xC boards have a high slew rate on one of the SCSI signals, which results in signal reflections when cables with the wrong impedance are used.

   Newer boards aren't much better, and also suffer from extreme cabling and termination sensitivity.

   See also Common Problems #2 and #3, Section 4.2: Common Problems, and Section 2.1: General Flakiness.

2. There are unexpected errors with a 154xC or 154x with both internal and external devices connected.

   This is probably a termination problem. In order to use the software option to disable host adapter termination, you must turn switch 1 off.

   See also Common Problems #2 and #3, Section 4.2: Common Problems, and Section 2.1: General Flakiness.

3. The SCSI subsystem locks up completely.

   There are cases where the lockups seem to occur when multiple devices are in use at the same time. In this case, you can try contacting the manufacturer of the devices and see if firmware upgrades are available which would correct the problem. As a last resort, you can go into aha1542.h and change AHA1542_MAILBOX to 1. This will effectively limit you to one outstanding command on the SCSI bus at one time, and may help the situation. If you have tape drives or slow cdrom drives on the bus, this might not be a practical solution.

   See also Common Problems #2 and #3, Section 4.2: Common Problems, Section 2.1: General Flakiness, and Section 2.8: SCSI System Lockups.

4. An ``Interrupt received, but no mail'' message is printed on bootup and your SCSI devices are not detected.

   Disable the BIOS options to support the extended mapping for disks > 1G, support for > 2 drives, and for autoscanning the bus. Or, upgrade to Linux .99.14k or newer.

4.5. Adaptec 174x

Supported Configurations:

Slots: 1-8
Ports: EISA board, not applicable
IRQs: 9, 10, 11, 12, 14, 15
DMA Channels: EISA board, not applicable
IO: port mapped, bus master

Autoprobe: works with all supported configurations

Autoprobe override: none

Note: This board has been discontinued by Adaptec.

Common Problems:

1. If the Adaptec 1740 driver prints the message ``aha1740: Board detected, but EBCNTRL = %x, so disabled it.'' your board was disabled because it was not running in enhanced mode. Boards running in standard 1542 mode are not supported.

4.6. Adaptec 274x, 284x, 294x (Standard)

Newer revisions may be available at

Supported Configurations:

274x:
    EISA Slots: 1-12
    IRQs: ALL
    IO: port mapped, bus master

284x:
    Ports: All
    IRQs: All
    DMA Channels: All

294x:
    PCI

Note: BIOS MUST be enabled

Note: The B channel on 2742AT boards is ignored.

4.7. Always IN2000 (ALPHA)

ALPHA driver available at . The driver is in2000.tar.z, bootable kernel zImage

Port: 0x100, 0x110, 0x200, 0x220
IRQs: 10, 11, 14, 15
DMA: not used
IO: port mapped

Autoprobe: BIOS not required

Autoprobe override: none

Common Problems:

1. There are known problems in systems with IDE drives and with swapping.

4.8. EATA: DPT Smartcache, Smartcache Plus, Smartcache III (Standard)

Supported boards: all that support the EATA-DMA protocol (no PM2001).

DPT Smartcache: PM2011 PM2012A PM2012B
Smartcache III: PM2021 PM2022 PM2024 PM2122 PM2124 PM2322
SmartRAID: PM3021 PM3222 PM3224

Many of those boards are also available as SKXXXX versions, which are supported as well.

Supported Configurations:

Slots: ALL
Ports: ALL
IRQs: ALL level & edge triggered
DMA Channels: ISA ALL, EISA/PCI not applicable
IO: port mapped, bus master
SCSI Channels: ALL

Autoprobe: works with all supported configurations

Compile time: diskgeometry in eata_dma.h for unusual disk geometries which came from the usage of the old DPTFMT utility.
The latest version of the EATA-DMA driver and a Slackware bootdisk should be available on: Common Problems: 1. The IDE driver detects the ST-506 interface of the EATA board. a. This will look like similar to one of the following 2 examples: hd.c: ST-506 interface disk with more than 16 heads detected, probably due to non-standard sector translation. Giving up. (disk % d: cyl=% d, sect=63, head=64) hdc: probing with STATUS instead of ALTSTATUS hdc: MP0242 A, 0MB w/128KB Cache, CHS=0/0/0 hdc: cannot handle disk with 0 physical heads hdd: probing with STATUS instead of ALTSTATUS hdd: MP0242 A, 0MB w/128KB Cache, CHS=0/0/0 hdd: cannot handle disk with 0 physical heads If the IDE driver gets into trouble because of this, ie. you can't access your (real) IDE hardware, change the IO Port and/or the IRQ of the EATA board. b. If the IDE driver finds hardware it can handle ie. harddisks with a capacity < =504MB, it will allocate the IO Port and IRQ, so that the eata driver can't utilize them. In this case also change IO Port and IRQ (!= 14,15). 2. Some old SK2011 boards have a broken firmware. Please contact DPT's customer support for an update. 4.9. p Future Domain 16x0 with TMC-1800, TMC-18C30, TMC-18C50, or TMC-36C70 chi Supported Configurations: BIOSs: 2.0, 3.0, 3.2, 3.4, 3.5 BIOS Addresses: 0xc8000, 0xca000, 0xce000, 0xde000 Ports: 0x140, 0x150, 0x160, 0x170 IRQs: 3, 5, 10, 11, 12, 14, 15 DMA: not used IO: port mapped Autoprobe: works with all supported configurations, requires installed BIOS Autoprobe Override: none Antiquity Problems, fix by upgrading: 1. Old versions do not support the TMC-18C50 chip, and will fail with newer boards. 2. Old versions will not have the most current BIOS signatures for autodetection. 3. Versions prior to the one included in Linux 1.0.9 and 1.1.6 don't support the new SCSI chip or 3.4 BIOS. 4.10. 
Generic NCR5380 / T130B

Supported and Unsupported Configurations:

Ports: all
IRQs: all
DMA: not used
IO: port mapped
Autoprobe: none
Autoprobe Override:

Compile time: Define GENERIC_NCR5380_OVERRIDE to be an array of tuples with port, irq, dma, board type - ie

_____________________________________________________________
#define GENERIC_NCR5380_OVERRIDE {{0x330, 5, DMA_NONE, BOARD_NCR5380}}
_____________________________________________________________

for an NCR5380 board at port 0x330, IRQ 5.

________________________________________________________________
#define GENERIC_NCR5380_OVERRIDE {{0x350, 5, DMA_NONE, BOARD_NCR53C400}}
________________________________________________________________

for a T130B at port 0x350.

Older versions of the code eliminate the BOARD_* entry.

The symbolic IRQs IRQ_NONE and IRQ_AUTO may be used.

kernel command line:

o ncr5380=port,irq
o ncr5380=port,irq,dma
o ncr53c400=port,irq

255 may be used for no irq, 254 for irq autoprobe.

Common Problems:

1. Using the T130B board with the old (pre public release 6) generic NCR5380 driver which doesn't support the ncr53c400 command line option. The NCR5380 compatible registers are offset eight from the base address. So, if your address is 0x350, use ncr5380=0x358,254 on the kernel command line.

Antiquity problems, fix by upgrading:

1. The kernel locks up during disk access with T130B or other NCR53c400 boards. Pre-public release 6 versions of the Generic NCR5380 driver didn't support interrupts on these boards. Upgrade.

Notes: the generic driver doesn't support DMA yet, and pseudo-DMA isn't supported in the generic driver.

4.11. NCR53c8xx (Standard)

Supported and Unsupported Configurations:

Base addresses: ALL
IRQs: ALL
DMA channels: PCI, not applicable
IO: port mapped, busmastering
Autoprobe: requires PCI BIOS, uses PCI BIOS routines to search for devices and read configuration space

The driver uses the pre-programmed values in some registers for initialization, so a BIOS must be installed.
Antiquity Problems, fix by upgrading:

1. Older versions of Linux had a problem with swapping. See the section on System Hangs When Swapping.

2. Older versions of Linux didn't recognize '815 and '825 boards.

Common Problems:

1. Many people have encountered problems where the chip worked fine under DOS, but failed under Linux with a timeout on test 1 due to a lost interrupt.

This is often due to a mismatch between the IRQ hardware jumper for a slot or mainboard device and the value set in the CMOS setup.

It may also be due to PCI INTB, INTC, or INTD being selected on a PCI board in a system which only supports PCI INTA.

Finally, PCI should be using level-sensitive rather than edge-triggered interrupts. Check that your board is jumpered for level-sensitive, and if that fails try edge-triggered because your system may be broken.

This problem is especially common with some Viglen motherboards, where the mainboard IRQ jumper settings are NOT as documented in the manual. I've been told that what claims to be IRQ5 is really IRQ9; your mileage will vary.

2. Lockups occur when using an S3928P, X11, and the NCR chip at the same time. There are hardware bugs in at least some S3928P chips. Don't do this.

3. You get a message on boot up indicating that the I/O mapping was disabled because base address 0 bits 0..1 indicated a non I/O mapping. This is due to a BIOS bug in some machines which results in dword reads of configuration registers returning the high and low 16 bit words swapped.

4. Some systems have problems if PCI write posting, or CPU->PCI buffering are enabled. If you have problems, disable these options.

5. Some systems with the NCR SDMS software in an onboard BIOS ROM and in the system BIOS are unable to boot DOS. Disabling the image in one place should rectify this problem.

6. Some systems have hideous, broken, BIOS chips. Don't make any bug reports until you've made sure you have the newest ROM from your vendor.
o Intel P90 boards require revision 1.00.04.AX1

4.12. Seagate ST0x/Future Domain TMC-8xx/TMC-9xx

Supported and Unsupported Configurations:

Base addresses: 0xc8000, 0xca000, 0xcc000, 0xce000, 0xdc000, 0xde000
IRQs: 3, 5
DMA: not used
IO: memory mapped
Autoprobe: probes for address only, IRQ is assumed to be 5, requires installed BIOS.
Autoprobe Override:

Compile time: Define OVERRIDE to be the base address, CONTROLLER to FD or SEAGATE as appropriate, and IRQ to the IRQ.

kernel command line: st0x=address,irq or tmc8xx=address,irq (only works for .99.13b and newer)

Antiquity Problems, fix by upgrading:

1. Versions prior to the one in the Linux .99.12 kernel had a problem handshaking with some slow devices. This is what happens when you write data out to the bus:

a. Write byte to data register, data register is asserted to bus
b. time_remaining = 12us
c. wait while time_remaining > 0 and REQ is not asserted
d. if time_remaining > 0, assert ACK
e. wait while time_remaining > 0 and REQ is asserted
f. deassert ACK

The problem was encountered in slow devices that do the command processing as they read the command, where the REQ/ACK handshake takes over 12us - REQ didn't go false when the driver expected it to, so the driver ended up sending multiple bytes of data for each REQ pulse.

2. With Linux .99.12, a bug was introduced when I fixed the arbitration code, resulting in failed selections on some systems. This was fixed in .99.13.

4.12.1. Common Problems

4.12.1.1. Command Timeouts

There are command timeouts when Linux attempts to read the partition table or do other disk access.

The board ships with the defaults set up for MSDOS, ie interrupts are disabled. To jumper the board for interrupts, on the Seagate use jumper W3 (ST01) or JP3 (ST02) and short pins F-G to select IRQ 5.

4.12.1.2. Some Devices Don't Work

The driver can't handle some devices, particularly cheap SCSI tapes and CDROMs.
The Seagate ties the SCSI bus REQ/ACK handshaking into the PC bus IO CHANNEL READY and (optionally) 0WS signals. Unfortunately, it doesn't tell you when the watchdog timer runs out, and you have no way of knowing for certain that REQ went low, and may end up seeing one REQ pulse as multiple REQ pulses.

Dealing with this means using a tight loop to look for REQ to go low, with a timeout in case you don't catch the transition due to an interrupt, etc. This results in a performance decrease, so it would be undesirable to apply this to all SCSI devices. Instead, it is selected on a per-device basis with the ``borken'' field for the given SCSI device in the scsi_devices array. If you run into problems, you should try adding your device to the list of devices for which borken is not reset to zero (currently, only the TENEX CDROM drives).

4.12.1.3. Future Domain does not work

A Future Domain board (specific examples include the 840, 841, 880, and 881) doesn't work.

A few of the Future Domain boards use the Seagate register mapping, and have the MSG and CD bits of the status register flipped.

You should edit seagate.h, swapping the definitions for STAT_MSG and STAT_CD, and recompile the kernel with CONTROLLER defined to SEAGATE and an appropriate IRQ and OVERRIDE specified.

4.12.1.4. HDIO_REQ or HDIO_GETGEO failed

When attempting to fdisk your drive, you get error messages indicating that the HDIO_REQ or HDIO_GETGEO ioctl failed, or

You must set heads sectors and cylinders.
You can do this from the extra functions menu.

See the section on Partitioning.

4.12.1.5. Fdisk fails

After manually specifying the drive geometry, subsequent attempts to read the partition table result in partition boundary not on a cylinder boundary, physical and logical boundaries don't match, etc. error messages.

See the section on Partitioning.

4.12.1.6. Used to work but now it doesn't

Some systems which worked prior to .99.13 fail with newer versions of Linux.
Older versions of Linux assigned the CONTROL and DATA registers in an order different than that outlined in the Seagate documentation, which broke on some systems. Newer versions make the assignment in the correct way, but this breaks other systems. The code in seagate.c looks like this now : ______________________________________________________________________ cli(); DATA = (unsigned char) ((1 < 1024 message.

When partitioning, you get a warning message about ``cylinder > 1024'' or you are unable to boot from a partition including a logical cylinder past logical cylinder 1024.

This is a BIOS limitation. See the sections on Disk Geometry and Partitioning for an explanation.

5.2.2. You are unable to partition ``/dev/hd*''

/dev/hd* aren't SCSI devices, /dev/sd* are.

See the sections on Device Files and on Disk Geometry and Partitioning for the correct device names and partitioning procedure.

5.2.3. Unable to eject media from a removable media drive.

Linux attempts to lock the drive door when a piece of media is mounted to prevent filesystem corruption due to an inadvertent media change.

Please unmount your disks before ejecting them.

5.2.4. Unable to boot using LILO from a SCSI disk

In some cases, the SCSI driver and BIOS will disagree over the correct BIOS mapping to use, resulting in LILO hanging after 'LI' at boot time and/or other problems.

To work around this, you'll have to determine your BIOS geometry mapping used under DOS, and make an entry for your disk in /etc/lilo/disktab.

Alternatively, you may be able to use the ``linear'' configuration file option.

5.2.5. Fdisk responds with

You must set heads sectors and cylinders.
You can do this from the extra functions menu.

and disk geometry is not 'remembered' when fdisk is rerun.

See the section on Partitioning.

5.2.6. Only one drive is detected on a bridge board with multiple drives connected.

Linux won't search LUNs past zero on SCSI devices which predate ANSI SCSI revision 1.
If you wish devices on alternate LUNs to be recognized, you will have to modify drivers/scsi/scsi.c:scan_scsis().

5.2.7. System Hangs When Swapping

We think this has been fixed, try upgrading to 1.1.38.

5.2.8. Conner CFP1060S disks get corrupted

This is due to a microcode bug in the read-ahead and caching code.

From Soenke Behrens of Conner tech. support:

During the past few weeks, we got several calls from customers stating that they had severe problems with Conner CFP1060x 1GB SCSI drives using the Linux operating system. Symptoms were corrupt filesystems (damaged inodes) reported by e2fsck on each system boot and similar errors.

There is now a fix available for customers with a CFP1060x (microcode revisions 9WA1.62/1.66/1.68) and Linux. To apply the upgrade, you will need a DOS boot disk and ASPI drivers that can access the hard drive. The upgrade downloads new queuing and lookahead code into the non-volatile SCSI RAM of the drive.

If you are experiencing problems with a disk that has microcode revision 9WA1.60, you will have to contact your nearest Conner service centre to get the disk upgraded. The microcode revision can be found on the label of the drive and on the underside of the drive on a label on one of the ICs.

If you are confident that you can perform the upgrade yourself, please contact Conner Technical Support and have your microcode revision ready. Conner Technical Support Europe can be reached on +44-1294-315333, Conner Technical Support in the USA can be reached on 1-800-4CONNER.

Regards
Soenke Behrens
European Technical Support

5.3. Device Files

SCSI disks use block device major 8, and there are no ``raw'' devices ala BSD.

16 minor numbers are allocated to each SCSI disk, with minor % 16 == 0 being the whole disk, minors 1 <= (minor % 16) <= 4 the four primary partitions, minors 5 <= (minor % 16) <= 15 any extended partitions.
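The minor number layout just described can be sketched in C. This is only an illustration of the arithmetic, not code from the kernel: the function names sd_disk_index, sd_partition and sd_name are hypothetical, and the device names follow the /dev/sd{letter}{partition} convention given in this section.

```c
#include <assert.h>
#include <stdio.h>

#define SD_MINORS_PER_DISK 16   /* 16 minors are allocated to each SCSI disk */

/* Hypothetical helper: index of the disk a minor belongs to
 * (0 for /dev/sda, 1 for /dev/sdb, ...). */
static int sd_disk_index(int minor)
{
    assert(minor >= 0 && minor <= 255);   /* eight bit minor number */
    return minor / SD_MINORS_PER_DISK;
}

/* Hypothetical helper: partition number within the disk.
 * 0 is the whole disk, 1-4 the primary partitions,
 * 5-15 any extended partitions. */
static int sd_partition(int minor)
{
    assert(minor >= 0 && minor <= 255);
    return minor % SD_MINORS_PER_DISK;
}

/* Hypothetical helper: write the conventional device name into buf. */
static void sd_name(int minor, char *buf, size_t len)
{
    int part = sd_partition(minor);
    char letter = (char)('a' + sd_disk_index(minor));

    if (part == 0)
        snprintf(buf, len, "/dev/sd%c", letter);
    else
        snprintf(buf, len, "/dev/sd%c%d", letter, part);
}
```

With these helpers, minor 17 on major 8 decodes to disk index 1, partition 1, which sd_name() renders as /dev/sdb1. Because the minor number is only eight bits wide, at most 16 disks can be addressed this way.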
Due to constraints imposed by Linux's use of a sixteen bit dev_t with only eight bits allocated to the minor number, the SCSI disk minor numbers are assigned dynamically starting with the lowest SCSI HOST/ID/LUN.

Ie, a configuration may work out like this (with one host adapter)

Device                     Target  Lun  SCSI disk
84M Seagate                0       0    /dev/sda
SCSI->SMD bridge disk 0    3       0    /dev/sdb
SCSI->SMD bridge disk 1    3       1    /dev/sdc
Wangtek tape               4       0    none
213M Maxtor                6       0    /dev/sdd

etc.

The standard naming convention is

/dev/sd{letter} for the entire disk device ((minor % 16) == 0)
/dev/sd{letter}{partition} for the partitions on that device (1 <= (minor % 16) <= 15)

Ie

Device       type    major  minor
/dev/sda     block   8      0
/dev/sda1    block   8      1
/dev/sda2    block   8      2
/dev/sda3    block   8      3

etc.

5.4. Partitioning

You can partition your SCSI disks using the partitioning program of your choice, under DOS, OS/2, Linux or any other operating system supporting the standard partitioning scheme.

The correct way to run the Linux fdisk program is by specifying the device on the command line. Ie, to partition the first SCSI disk,

fdisk /dev/sda

If you don't explicitly specify the device, the partitioning program may default to /dev/hda, which isn't a SCSI disk.

In some cases, fdisk will respond with

You must set heads sectors and cylinders.
You can do this from the extra functions menu.

Command (m for help):

and/or give a message to the effect that the HDIO_REQ or HDIO_GETGEO ioctl failed. In these cases, you must manually specify the disk geometry as outlined in the subsection on Disk Geometry when running fdisk, and also in /etc/disktab if you wish to boot kernels off that disk with LILO.

If you have manually specified the disk geometry, subsequent attempts to run fdisk will give the same error message. This is normal, since PCs don't store the disk geometry information in the partition table.
In and of itself, this will cause _NO PROBLEMS_, and you will have no problems accessing partitions you created on the drive with Linux. Some vendors' poor installation code will choke on this, in which case you should contact your vendor and insist that they fix the code.

In some cases, you will get a warning message about a partition ending past cylinder 1024. If you create one of these partitions, you will be unable to boot Linux kernels off of that partition using LILO. Note, however, that this restriction does not preclude the creation of a root partition partially or entirely above the 1024 cylinder mark, since it is possible to create a small /boot partition below the 1024 cylinder mark or to boot kernels off existing partitions.

5.5. Disk Geometry

Under Linux, each disk is viewed as the SCSI host adapter sees it: N blocks, numbered from 0 to N-1, all error free, whereas DOS/BIOS predate intelligent disks and apply an arbitrary head / cylinder / sector mapping to this linear addressing.

This can pose a problem when you partition the drives under Linux, since there is no portable way to get DOS/BIOS's idea of the mapped geometry. In most cases, a HDIO_GETGEO ioctl() can be implemented to return this mapping. Unfortunately, when the vendor (ie Seagate) has chosen a perverse, non-standard, and undocumented mapping, this is not possible and geometry must be manually specified.

If manual specification of the geometry is required, you have one of several options:

1. If you don't care about using DOS, or booting kernels from the drive with LILO, create a translation such that heads * cylinders * sectors * 512 < size of your drive in bytes (a megabyte is defined as 2^20 bytes).

1 <= heads <= 256
1 <= cylinders <= 1024
1 <= sectors <= 63

2. Use the BIOS mapping. In some cases, this will mean reconfiguring the disk so that it is at SCSI ID 0, and disabling the second IDE drive (if you have one).
You can either use a program like NU, or you can use the following program: ___________________________________________________________________ begin 664 dparam.com MBAZ``##_B+^!`+N!`(H'0SP@=/D\,'5:@#]X=`6`/UAU4(!_`3AU2H!_`P!U M1(I7`H#J,(#Z`7L6N]T!,=*Y M"@#W\8#",$N(%PG`=>^)VK0)S2'#=7-A9V4Z(&1P87)A;2`P>#@P#0H@("!O L These programs may work with some of the non-SCSI cdrom drives if the driver implements the same ioctls as the scsi drivers.

6.3. Device Files

SCSI CDROMs use major 11.

Minors are allocated dynamically (See Section 5: Disks, Subsection 5.3: Device Files for an example) with the first CDROM found being minor 0, the second minor 1, etc.

The standard naming convention is /dev/sr{digit}, ie

/dev/sr0
/dev/sr1

etc.

7. Tapes

This section gives information that is specific to scsi tape drives.

7.1. Supported and Unsupported Hardware

Drives using both fixed and variable length blocks smaller than the driver buffer length (set to 32K in the distribution sources) are supported.

Parameters (block size, buffering, density) are set with ioctls (usually with the mt program), and remain in effect after the device is closed and reopened.

Virtually all drives should work, including:

o Archive Viper QIC drives, including the 150M and 525M models
o Exabyte 8mm drives
o Wangtek 5150S drives
o Wangdat DAT drives

7.2. Common Problems

7.2.1. Tape drive not recognized at boot time.

Try booting with a tape in the drive.

7.2.2. Tapes with multiple files cannot be read properly.

When reading a tape with multiple files, the first tar is successful, a second tar fails silently, and retrying the second tar is successful.

User level programs, such as tar, don't understand file marks. The first tar reads up until the end of the file. The second tar attempts to read at the file mark, gets nothing, but the tape spaces over the file mark. The third tar is successful since the tape is at the start of the next file.

Use mt on the no-rewind device to space forward to the next file.

7.2.3.
Decompression fails.

Decompressing programs cannot handle the zeros padding the last block of the file.

To prevent warnings and errors, wrap your compressed files in a .tar file - ie, rather than doing

tar cfvz /dev/nrst0 file.1 file.2 ...

do

tar cfvz tmp.tar.z file.1 file.2 ...
tar cf /dev/nrst0 tmp.tar.z

7.2.4. Problems taking tapes to/from other systems.

You can't read a tape made with another operating system or another operating system can't read a tape written in Linux.

Different systems often use different block sizes. On a tape device using a fixed blocksize, you will get errors when reading blocks written using a different block size.

To read these tapes, you must set the blocksize of the tape driver to match the blocksize used when the tape was written, or to variable.

Note: this is the hardware block size, not the blocking factor used with tar, dump, etc.

You can do this with the mt command - mt setblk or mt setblk 0 to get variable block length support.

Note that these mt flags are NOT supported under the GNU version of mt which is included with some Linux distributions. Instead, you must use the BSD derived Linux SCSI mt command. Source should be available from

7.2.5. ``No such device'' error message.

All attempts to access the tape result in a ``No such device'' or similar error message. Check the type of your tape device - it must be a character device with major and minor numbers matching those specified in Subsection 7.3, Device Files.

7.2.6. Tape reads at a given density work, writes fail

Many tape drives support reading at lower densities for compatibility with older hardware, but will not write at those same densities.

This is especially the case with QIC drives, which will read old 60M tapes but only write new 120, 150, 250, and 525M formats.

7.3. Device Files

SCSI tapes use character device major 9.
Due to constraints imposed by Linux's use of a sixteen bit dev_t with only eight bits allocated to the minor number, the SCSI tape minor numbers are assigned dynamically starting with the lowest SCSI HOST/ID/LUN.

Rewinding devices are numbered from 0 - with the first SCSI tape, /dev/rst0 being c 9 0, the second /dev/rst1 c 9 1, etc. Non-rewinding devices have the high bit set in the minor number, ie /dev/nrst0 is c 9 128.

The standard naming convention is

/dev/nrst{digit} for non-rewinding devices
/dev/rst{digit} for rewinding devices

8. Generic

This section gives information that is specific to the generic scsi driver.

8.1. Supported Hardware

The Generic SCSI device driver provides an interface for sending SCSI commands to all SCSI devices - disks, tapes, CDROMs, media changer robots, etc.

Everything electrically compatible with your SCSI board should work.

8.2. Common Problems

None :-).

8.3. Device Files

SCSI generic devices use character major 21. Due to constraints imposed by Linux's use of a 16 bit dev_t, minor numbers are dynamically assigned from 0, one per device, with /dev/sg0 corresponding to the lowest numerical target/lun on the first SCSI board.

9. Buyers' Guide

A frequent question is: ``Linux supports quite a number of different boards, so which SCSI host adapter should I get?''

The answer depends upon how much performance you expect or need, your motherboard, and the scsi peripherals that you plan on attaching to your machine.

9.1. Transfer types

The biggest factor affecting performance (in terms of throughput and interactive response time during SCSI I/O) is going to be the transfer type used.

9.1.1. Pure Polled
A pure polled I/O board will use the CPU to handle all of the SCSI processing, including the REQ/ACK handshaking.

Even a fast CPU will be slower handling the REQ/ACK handshake sequence than a simple finite state machine, resulting in peak transfer rates of about 150K/sec on a fast machine, perhaps 60K/sec on a slow machine (through the filesystem).

The driver also must sit in a tight loop as long as the SCSI bus is busy, resulting in near 100% CPU utilization and extremely poor responsiveness during SCSI I/O. Slow CDROMs which don't disconnect/reconnect will kill interactive performance with these boards.

Not recommended.

9.1.2. Interlocked Polled

Boards using interlocked polled I/O are essentially the same as pure polled I/O boards, only the SCSI REQ/ACK handshaking is interlocked with the PC bus handshaking signals. All SCSI processing beyond the handshaking is handled by the CPU.

Peak transfer rates of 500-600K/sec through the filesystem are possible on these boards.

As with pure polled I/O boards, the driver must sit in a tight loop as long as the SCSI bus is busy, resulting in CPU utilization dependent on the transfer rates of the devices, and when they disconnect/reconnect. CPU utilization may vary between 25% for single speed CDs which handle disconnect/reconnect properly to 100% for faster drives or broken CDROMs which fail to disconnect/reconnect.

On my 486-66, with a T128, I use 90% of my CPU time to sustain a throughput of 547K/sec on a drive with a head rate of 1080K/sec.

Sometimes acceptable for slow tapes and CDROMs when low cost is essential.

9.1.3. FIFO Polled

Boards using FIFO polled I/O put a small (typically 8K) buffer between the CPU and the SCSI bus, and often implement some amount of intelligence. The net effect is that the CPU is only tied up when it is transferring data at top speed to the FIFO and when it's handling the rest of the interrupt processing for FIFO empty conditions, disconnect/reconnect, etc.
Peak transfer rates should be sufficient to handle most SCSI devices, and have been measured at up to 4M/sec using raw SCSI commands to read 64K blocks on a fast Seagate Barracuda with an Adaptec 1520.

CPU utilization is dependent on the transfer rates of the devices, with faster devices generating more interrupts per unit time which require more CPU processing time. Although CPU usage may be high (perhaps 75%) with fast devices, the system usually remains usable. These boards will provide excellent interactive performance with broken devices which don't disconnect/reconnect (typically cheap CDROM drives).

Recommended for persons on a budget.

9.1.4. Slave DMA

Drivers for boards using slave DMA program the PC's DMA controller for a channel when they do a data transfer, and return control to the CPU.

Peak transfer rates are usually handicapped by the poor DMA controller used on PCs, with one such 8-bit board having problems going faster than 140-150K/sec with one mainboard.

CPU utilization is very reasonable, slightly less than what is seen with FIFO polled I/O boards. These boards are very tolerant of broken devices which don't disconnect/reconnect (typically cheap CDROM drives).

Acceptable for slow CDROM drives, tapes, etc.

9.1.5. Busmastering DMA

These boards are intelligent. Drivers for these boards throw a SCSI command, the destination target and lun, and where the data should end up in a structure, and tell the board ``Hey, I have a command for you.'' The driver returns control to various running programs, and eventually the SCSI board gets back and says that it's done.

Since the intelligence is in the host adapter firmware and not the driver, drivers for these boards typically support more features - synchronous transfers, tagged queuing, etc.

With the clustered read/write patches, peak transfer rates through the file system approach 100% of head rate writing, 75% reading.
CPU utilization is minimal, regardless of I/O load, with a measured 5% CPU usage while accessing a double speed CDROM on an Adaptec 1540 and 20% while sustaining a 1.2M/sec transfer rate on a SCSI disk.

Recommended in all cases where money is not extremely tight, the main board is not broken (some broken main boards do not work with bus masters), and applications where time to data is more important than throughput are not being run (bus master overhead may hit 3-4ms per command).

9.2. Scatter/gather

The second most important driver/hardware feature with respect to performance is support for scatter/gather I/O. The overhead of executing a SCSI command is significant - on the order of milliseconds. Intelligent bus masters like the Adaptec 1540 may take 3-4ms to process a SCSI command before the target even sees it. On unbuffered devices, this overhead is always enough to slip a revolution, resulting in a transfer rate of about 60K/sec (assuming a 3600RPM drive) per block transferred at a time. So, to maximize performance, it is necessary to minimize the number of SCSI commands needed to transfer a given amount of data by transferring more data per command. Due to the design of the Linux buffer cache, contiguous disk blocks are not contiguous in memory. With the clustered read/write patches, 4K worth of buffers are contiguous. So, the maximum amount of data which can be transferred per SCSI command is going to be 1K * # of scatter/gather regions without the clustered read/write patches, 4K * # of regions with. Experimentally, we've determined that 64K is a reasonable amount to transfer with a single SCSI command - meaning 64 scatter/gather buffers without the clustered read/write patches, 16 with. With the change from 16K to 64K transfers, we saw an improvement from 50% of head rate, through the filesystem, reading and writing, to 75% and 100% respectively using an Adaptec 1540 series board.

9.3. Mailbox vs.
non-mailbox

A number of intelligent host adapters, such as the Ultrastor, WD7000, Adaptec 1540, 1740, and Buslogic boards have used a mailbox-metaphor interface, where SCSI commands are executed by putting a SCSI command structure in a fixed memory location (mailbox), signaling the board (ie, raising the outgoing mail flag), and waiting for a return (incoming mail). With this high level programming interface, users can often upgrade to a newer board revision to take advantage of new features, such as FAST + WIDE SCSI, without software changes. Drivers tend to be simpler to implement, may implement a larger feature set, and may be more stable.

Other intelligent host adapters, such as the NCR53c7/8xx family, and Adaptec AIC-7770/7870 chips (including the 274x, 284x, and 2940 boards) use a lower level programming interface. This may prove faster since processing can be shifted between the board's processor and faster host CPU, allow better flexibility in implementing certain features (ie, target mode for arbitrary devices), and these boards can be built for less money (in some cases, this is passed on to the consumer (ie, most NCR boards)). On the down side, drivers tend to be more complex, and must be modified to take advantage of the features present on newer chips.

9.4. Bus types

Bus type is the next thing to consider, with choices including ISA, EISA, VESA, and PCI. Marketing types often spout off absurd bandwidth numbers based on burst transfer rates and fiction, which isn't very useful. Instead, I've chosen to state ``real-world'' numbers based on measured performance with various peripherals.

9.4.1. ISA

Bandwidth is slightly better than 5M/sec for busmastering devices. With an ISA bus, arbitration for busmasters is performed by the venerable 8237 third party DMA controller, resulting in relatively high bus acquisition times. Interrupt drivers are tri-state and edge triggered, meaning interrupts cannot be shared.
Generally, ISA is unbuffered, meaning the host/memory bus is tied up whenever a transfer is occurring. No mechanism is provided to prevent bus-hogging.

9.4.2. VESA

Bandwidth is about 30M/sec. Some VESA systems run the bus out of spec, rendering them incompatible with some boards, so this should be taken into consideration before purchasing hardware without a return guarantee. Generally, VESA is unbuffered, meaning the host/memory bus is tied up whenever a transfer is occurring.

9.4.3. EISA

Bandwidth is about 30M/sec, with busmastering operations generally being faster than VESA. Some EISA systems buffer the bus, allowing burst transfers to the faster host/memory bus and minimizing impact on CPU performance. EISA interrupt drivers may be either tri-state edge-triggered or open collector level-active, allowing interrupt sharing with drivers that support it. Since EISA allocates a separate address space for each board, it is usually less prone to resource conflicts than ISA or VESA.

9.4.4. PCI

Bandwidth is about 60M/sec. Most PCI systems implement write posting buffers on the host bridge, allowing speed mismatches on either side to have a minimum impact on bus/CPU performance. PCI interrupt drivers are open collector level-active, allowing interrupt sharing with drivers that support it. Mechanisms are provided to prevent bus hogging, and for both master and slave to suspend a bus-mastering operation.

Since PCI provides a plug-n-play mechanism with writeable configuration registers on every board, in a separate address space, a properly implemented PCI system is plug-and-play.

PCI is extremely strict as to trace length, loading, mechanical specifications, etc. and ultimately should be more reliable than VESA or ISA.

In summary, PCI is the best PC bus, although it does have its dark side. PCI is still in its infancy, and although most manufacturers have ironed out the problems, there is still stock of older, buggy PCI hardware and broken main BIOSes.
For this reason, I _strongly_ recommend a return guarantee on the hardware. While the latest PCI mainboards are truly plug-and-play, older PCI boards may require the user to set options with both jumpers and in software (ie, interrupt assignments). Although many users have resolved their PCI problems, it has taken time and for this reason I cannot recommend a PCI purchase if having the system operational is extremely time critical.

For many slower SCSI devices, such as disks with head rates around 2M/sec or less, CDROMs, and tapes, there will be little difference in throughputs with the different PC bus interfaces. For faster contemporary SCSI drives (typical high end multi-gigabyte drives have a head rate of 4-5M/sec, and at least one company is currently ALPHA testing a parallel head unit with a 14M/sec head rate), throughput will often be significantly better with controllers on faster busses, with one user noting a 2.5 fold performance improvement when going from an Adaptec 1542 ISA board to a NCR53c810 PCI board.

With the exception of situations where PCI write-posting or a similar write-buffering mechanism is being used, when one of the busses in your system is busy, all of the busses will be inaccessible. So, although bus saturation may not be interfering with SCSI performance, it may have a negative effect on interactive performance. Ie, if you have a 4M/sec SCSI disk under ISA, you'll have lost 80% of your bandwidth, and in an ISA/VESA system would only be able to bitblt at 6M/sec. In most cases, a similar impact on processing jobs in the background would also be felt.

Note that having over 16M of memory does not preclude using an ISA busmastering SCSI board. Unlike various broken operating systems, Linux will double buffer when using a DMA with an ISA controller and a transfer is ultimately destined for an area above 16M. Performance on these transfers only suffers by about 1.5%, ie not noticeably.
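The bus saturation figures in the ISA/VESA example earlier in this section follow from simple proportions. The C sketch below only reproduces that arithmetic, using the round bandwidth numbers quoted in this guide (about 5M/sec for ISA, about 30M/sec for VESA); the function names bus_share_percent and remaining_rate are hypothetical, and real systems will not behave this linearly.

```c
#include <assert.h>

/* Hypothetical helper: percentage of a bus's usable bandwidth taken by
 * a device transferring at device_rate.  Both rates must use the same
 * unit (eg M/sec).  The result is capped at 100. */
static double bus_share_percent(double device_rate, double bus_bandwidth)
{
    assert(bus_bandwidth > 0.0 && device_rate >= 0.0);
    double share = 100.0 * device_rate / bus_bandwidth;
    return share > 100.0 ? 100.0 : share;
}

/* Hypothetical helper: bandwidth left on a second bus that can only be
 * used while the first bus is idle (no write posting or other
 * buffering between the busses). */
static double remaining_rate(double other_bus_bandwidth, double busy_percent)
{
    assert(busy_percent >= 0.0 && busy_percent <= 100.0);
    return other_bus_bandwidth * (100.0 - busy_percent) / 100.0;
}
```

For a 4M/sec disk on a 5M/sec ISA bus, bus_share_percent(4.0, 5.0) gives 80, and remaining_rate(30.0, 80.0) gives the 6M/sec left for VESA transfers such as a bitblt.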
Finally, the price difference between bus masters offered with the different bus interfaces is often minimal.

With all that in mind, based on your priorities you will have certain bus preferences (listed most preferred first):

Stability, time critical installations and poor return policies
     EISA ISA VESA PCI

Performance, and typical hobbyist installations
     PCI EISA VESA ISA

As I pointed out earlier, bus mastering versus other transfer modes is going to have a bigger impact on total system performance, and should be considered more important than bus type when purchasing a SCSI controller.

9.5. Multiple devices

If you will have multiple devices on your SCSI bus, you may want to see whether the host adapter/driver that you are considering supports more than one outstanding command at one time. This is very important if you are mixing devices of different speeds, like a tape drive and a disk drive. If the Linux driver only supports one outstanding command, you may be locked out of your disk drive while a tape in the tape drive is rewinding, for example. With two disk drives, the problem will not be as noticeable, although throughput would approach the average of the two transfer rates rather than the sum of the two transfer rates.

9.6. SCSI-I, SCSI-II, SCSI-III FAST and WIDE options, etc.

Over the years, SCSI has evolved, with new revisions of the standard introducing higher transfer rates, methods to increase throughput, standardized commands for new devices, and new commands for previously supported devices.

In and of themselves, the revision levels don't really mean anything. Excepting minor things like SCSI-II not allowing the single initiator option of SCSI-I, SCSI is backwards compatible, with new features being introduced as options and not mandatory. So, the decision to call a SCSI adapter SCSI, SCSI-II, or SCSI-III is almost entirely a marketing one.

9.7.
Driver feature comparison

Driver feature comparison (supported chips are listed in parentheses)

Driver      Transfer mode         Simultaneous   SG           >1
                                  Commands       Limit
                                  total/LUN

aha152x     FIFO(8k) Polled       1s/1s          255s
  (AIC6260, AIC6360)
aha1542     Busmastering DMA      8s/1s          16           Y
aha1740     Busmastering DMA      32s            16
aha274x     Busmastering DMA      4s/1s          255s         Y
buslogic    Busmastering DMA      Y              64s, 8196h
eata_dma    Busmastering DMA      64s/16s        64s          Y
fdomain     FIFO(8k) Polled       1s             64s
  (TMC1800, TMC18c30, TMC18c50, TMC36c70;
   FIFO is 8k except on the TMC18c30, which has a 2k FIFO)
in2000*     FIFO(2k) Polled       1s             255s
g_NCR5380   Pure Polled           16s/2s         255s         Y
  (NCR5380, NCR53c80, NCR5381, NCR53c400)
gsi8*       Slave DMA             16s/2s         255s
  (NCR5380)
PAS16       Pure Polled or        16s/2s         255s         Y
            Interlocked Polled
            (fails on some systems!)
  (NCR5380)
seagate     Interlocked Polled    1s             255s         N
wd7000      Busmastering DMA      8s             1
t128        Interlocked Polled    16s            255s         Y
  (NCR5380)
ultrastor   Busmastering DMA      Y
53c7,8xx    Busmastering DMA      1s/1s          255s         Y
  (NCR53c810)

Notes:

1. Drivers flagged with an '*' are not included with the distribution kernel, and binary boot images may be unavailable.

2. Numbers suffixed with an 's' are arbitrary limits set in software which may be changed with a compile time define.

3. Hardware limits are indicated by an 'h' suffix, and may differ from the software limits currently imposed by the Linux drivers.

4. Unsuffixed numbers may indicate either hard or soft limits.

9.8. Board comparison

Board                   Driver     Bus   Price   Notes

Adaptec AIC-6260        aha152x    ISA           chip, not board
Adaptec AIC-6360        aha152x    VLB           chip, not board
  (Used in most VESA/ISA multi-IO boards with SCSI, Zenon mainboards)
Adaptec 1520            aha152x    ISA
Adaptec 1522            aha152x    ISA   $80     1520 w/FDC
Adaptec 1510            aha152x    ISA           1520 w/out boot ROM,
                                                 won't autoprobe.
Adaptec 1540C           aha1542    ISA
Adaptec 1542C           aha1542    ISA           1540C w/FDC
Adaptec 1540CF          aha1542    ISA           FAST SCSI-II
Adaptec 1542CF          aha1542    ISA   $200    1540CF w/FDC
Adaptec 1740            aha1740    EISA          discontinued
Adaptec 1742            aha1740    EISA          discontinued, 1740 w/FDC
Adaptec 2740            aha274x    EISA
Adaptec 2742            aha274x    EISA          w/FDC
Adaptec 2840            aha274x    VLB
Adaptec 2842            aha274x    VLB           w/FDC
Always IN2000           in2000     ISA
Buslogic 445S           aha1542,   VLB   $250    FAST SCSI-II, active
                        buslogic                 termination, w/FDC
Buslogic 747S           aha1542    EISA          FAST SCSI-II, active
                        buslogic                 termination, w/FDC
Buslogic 946S           buslogic   PCI           FAST SCSI-II, active termination.
DPT PM2011              eata_dma   ISA           FAST SCSI-II
DPT PM2012A             eata_dma   EISA          FAST SCSI-II
DPT PM2012B             eata_dma   EISA          FAST SCSI-II
DPT PM2021              eata_dma   ISA   $245    FAST SCSI-II
DPT PM2022              eata_dma   EISA  $449    FAST SCSI-II active termination
DPT PM2024              eata_dma   PCI   $395    FAST SCSI-II active termination
DPT PM2122              eata_dma   EISA  $595    FAST SCSI-II active termination
DPT PM2124              eata_dma   PCI   $595    FAST SCSI-II active termination
DPT PM2322              eata_dma   EISA          FAST SCSI-II active termination
DPT PM3021              eata_dma   ISA   $1595   FAST SCSI-II multichannel
                                                 raid/simm sockets active
                                                 termination
DPT PM3122              eata_dma   EISA  $1795   FAST SCSI-II multichannel/raid
                                                 active termination
DPT PM3222              eata_dma   EISA  $1795   FAST SCSI-II multichannel
                                                 raid/simm sockets active
                                                 termination
DPT PM3224              eata_dma   PCI   $1995   FAST SCSI-II multichannel
                                                 raid/simm sockets active
                                                 termination
DPT DTC 3               aha1542    EISA          Although it should work, due to
                                                 documentation release policies,
                                                 DTC hardware is unsupported
DTC 3292                aha1542    EISA          3290 w/FDC
Future Domain 1680      fdomain    ISA           FDC
Future Domain 3260      fdomain    PCI
NCR53c810               53c7,8xx   PCI   $70     +chip, not board. Boards
                                         (board) don't include BIOS, although
                                                 most non-NCR equipped main
                                                 boards have the SDMS BIOS
  (boards sold by FIC, Chaintech, Nextor, Gigabyte, etc. Mainboards with
   chip by AMI, ASUS, J-Bond, etc. Common in DEC PCI systems)
NCR53c815               53c7,8xx   PCI   $115    NCR53c810 plus bios
  (Intel PCISCSIKIT, NCR8150S, etc)
NCR53c825               53c7,8xx   PCI           Wide variant of NCR53c815.
                                                 Note that the current Linux
                                                 driver does not negotiate for
                                                 wide transfers.
Pro Audio Spectrum 16   pas16      ISA           Sound board w/SCSI
Seagate ST01            seagate    ISA   $20     BIOS only works with some
                                                 drives
Seagate ST02            seagate    ISA   $40     ST01 w/FDC
Sound Blaster 16 SCSI   aha152x    ISA           Sound board w/SCSI
Western Digital 7000    wd7000     ISA           w/FDC
Trantor T128            t128       ISA
Trantor T128F           t128       ISA           T128 w/FDC and support for
                                                 high IRQs
Trantor T130B           g_NCR5380  ISA
Ultrastor 14F           ultrastor  ISA           w/FDC
Ultrastor 24F           ultrastor  EISA          w/FDC
Ultrastor 34F           ultrastor  VLB

Notes:

1. Trantor was recently purchased by Adaptec, and some products are being sold under the Adaptec name.

2. Ultrastor recently filed for Chapter 11 Bankruptcy, so technical support is non-existent at this time.

3. Various Buslogic boards other than the 545S, 445S, 747S, and 946S _should_ work, although to my knowledge they have not been tried.

4. The $70 price for the busmastering NCR53c810 boards is not a typo. It includes the standard ASPI/CAM driver package for DOS, OS/2 and Windows (32 bit access), and other drivers are available for free download.

If you can't find one at that price, try Technoland at 1-800-292-4500 or 1-408-992-0888 if you live in California, InteliSys at (703)385-0347, Superpower 1 (800) 736-0007, or SW (swt@netcom.com) 214-907-0871, fax 214-907-9339.

Insight Electronics at 1-609-985-5556 stocks NCR8150S '815 boards for $115 if you don't have an NCR SDMS BIOS in your main ROM.

5. Adaptec's recent SCSI chips show an unusual sensitivity to cabling and termination problems. For this reason, I cannot recommend the Adaptec 154x C and CF revisions or the 2xxx series. Note that the reliability problems do not apply to the older 154x B revision boards, 174x A revision boards, or to my knowledge AIC-6360/AIC-6260 based boards.
Also, the quality of their technical support has slipped markedly, with long delays becoming more common, and their employees being ignorant (suggesting there were non-disclosure policies affecting certain literature when there were none), and hostile (e.g., refusing to pass questions on to someone else when they couldn't answer them).

If users desire handholding, or wish to make a political statement, they should take this point into consideration. Otherwise, the Adaptec 152x/1510 are nicer than the other ISA boards in the same price range, and there are some excellent deals on used and surplus 154x B revision boards and 1742 boards which IMHO outweigh the support problems.

6. All given prices for the DPT controllers are official list prices. Street prices should be considerably lower. All boards can be upgraded with cache and RAID modules, and most of the boards are also available in Wide and/or Differential versions.

9.9. Summary

Most ISA, EISA, and VESA users will probably be served best by a Buslogic board, due to its performance, features such as active termination, and Adaptec 1540 compatibility. There are a number of models available with EISA, ISA, PCI, and VESA local bus interfaces, in single ended and differential, and 8/16 bit SCSI bus widths.

People with PCI systems should consider NCR53c810 based boards. These are bus mastering SCSI controllers, available in single quantities (Q1) for about $70 (i.e., cheaper than the Adaptec 1520) with larger quantities being cheaper (I've seen $62 in Q20). In addition to being the cheapest PCI SCSI boards, the NCR boards were also benchmarked as faster than the Adaptec 2940 and Buslogic BT-946, and demonstrate excellent performance under Linux (up to 4M/sec through the file system) in spite of the performance optimizations being disabled in the current driver.
The disadvantages of these boards versus the Buslogics are that they aren't Adaptec 1540 compatible, don't come with active termination, and to my knowledge are only supported under DOS+Windows, OS/2, Windows NT, SCO, NeXTstep, and FreeBSD.

Currently, the Linux driver appears quite stable on most systems (we've moved several gigabytes of data to NCR based devices with no problems), is surprisingly fast (I've seen 4M/sec through the filesystem), and will eventually become more featureful. On the downside, the current Linux driver implementation doesn't support disconnect/reconnect, so you will be unable to access your SCSI disks while SCSI tapes are rewinding, retensioning, etc.

People wanting non-PCI SCSI on a limited budget will probably be happiest finding a surplus or used Adaptec 154x B revision or 174x A revision, or an Adaptec 1520 clone of some sort (about $80) if they want new hardware. These boards offer reasonable throughput and interactive performance at a modest price.