From: A.X. Ivasyuk (axi0349@ultb.isc.rit.edu)
Subject: The Unofficial Unix Administration Horror Stories Summary (part 2)
Newsgroups: comp.unix.admin
Date: 1992-12-03 09:53:35 PST

Part 2 of the Unix Administration Horror Stories Summary

-----------------------------------------------------------------------------
From: kelley@epg.nist.gov (Mike Kelley)
Organization: NIST

Sometimes you just can't win . . .

We have a cluster of HP workstations and, once upon a time, were using 1/4-inch tape as the backup medium. This was very slow and cumbersome, as we were forever increasing the amount of disk space on our system, and we decided to purchase HP's optical jukebox to use both as large removable media and as the primary backup device. We had been experiencing occasional problems with the 1/4-inch tape backups, but HP's hardware service engineer convinced us that the problems were resolved. A complete backup was performed prior to installation (by the HP engineer) of the jukebox.

Two unfortunate things happened. First, the problems on our backup tapes were due to intermittent hardware problems on the tape drive which were not discovered by the extensive diagnostics performed on the tape drive. Second, the engineer installed the jukebox with the same hardware SCSI address as our root file system. As you may have anticipated, the attempt to mediainit the first optical cartridge resulted in a rather ungraceful failure of the root file system. This was compounded by the fact that much of the data on the backup tapes was not recoverable.
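The unreadable tapes in this story suggest checking a backup before it is needed. What follows is only a minimal sketch, assuming the backup is a tar archive in an ordinary file rather than on a 1/4-inch tape; the function name verify_backup and the path in the usage note are made up for illustration.

```shell
#!/bin/sh
# Sketch only: verify_backup is a hypothetical helper, not part of any tool.

# verify_backup ARCHIVE
# Succeeds only if ARCHIVE is readable and tar can list every member of it.
verify_backup() {
    archive=$1
    if [ ! -r "$archive" ]; then
        echo "verify_backup: cannot read $archive" >&2
        return 1
    fi
    # Listing makes tar walk every member header in the archive;
    # the listing is discarded, only the exit status matters.
    tar -tf "$archive" >/dev/null 2>&1
}
```

It could be called as `verify_backup /backups/home.tar || echo "backup is not readable"`. A clean listing does not prove the file contents are intact; restoring into a scratch directory and comparing is a stronger test.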
-----------------------------------------------------------------------------
From: ericw@hobbes.amd.com (Eric Wedaa)
Organization: Advanced Micro Devices, Inc.

The moral(s) of the story here:
-NEVER use 'rm ', use rm -i ' instead.
-Do backups more often than you go to church.
-Read the backup media at least as often as you go to church.
-Set up your prompt to do a `pwd` every time you cd.
-Always do a `cd .` before doing anything.
-DOCUMENT all your changes to the system (We use a text file called /Changes)
-Don't nuke stuff you are not sure about.
-Do major changes to the system on Saturday morning so you will have all weekend to fix it.
-Have a shadow watching you when you do anything major.
-Don't do systems work on a Friday afternoon. (or any other time when you are tired and not paying attention.)

>>>Ericw (Paranoia is a "Good Thing" when you can really muck things up!)

-----------------------------------------------------------------------------
From: rob@wzv.win.tue.nl (Rob J. Nauta)
Organization: None

mfraioli@grebyn.com (Marc Fraioli) writes:
>Well, here's a good one for you:
> I was happily churning along developing something on a Sun workstation,
>and was getting a number of annoying permission denieds from trying to
>write into a directory hierarchy that I didn't own. Getting tired of
>that, I decided to set the permissions on that subtree to 777 while I
>was working, so I wouldn't have to worry about it.

At my previous employer, the sysadmin would create new user accounts by hand by editing the passwd file, create a home dir, put some files in it, and chown '*' and '.*' to that new user. Thus, /home/machine was also chowned ('.*' also matches '..'). It was quite handy to see who was added last, but after a while I slipped him the hint to chown '.[a-z]*' which works much better of course.

But the stories told now are more folklore than real horror.
Having read 2 Stephen Kings this weekend I beg everyone to tell more interesting stories, about demons, the system clock running backwards, old files reappearing etc ! Rob ----------------------------------------------------------------------------- From: alan@spuddy.uucp (Alan Saunders) Organization: Spuddy's Public Usenet Domain About inexperienced sysadmins .. One such had been on a Sun syasadmin course, and learned all about security. One of the topics was on file and group access. On his return, he decided to put what he had learned into practice, and changed the ownership of all files in /bin, /usr/bin to bin.bin! I was called in when no one could log in to the system (of course /bin/login needs to be setuid root!) Regards .. Alan ----------------------------------------------------------------------------- From: robjohn@ocdis01.UUCP (Contractor Bob Johnson) Organization: Tinker Air Force Base, Oklahoma >Arne Asplem (aras@multix.no) wrote: > I'm the program chair for a one day conference on Unix system > administration in Oslo in 3 weeks, including topics like network > management, system admininistration tools, integration, print/file-servers, > securitym, etc. > > I'm looking for actual horror stories of what have gone wrong because > of bad system administration, as an early morning wakeup. Management told us to email a security notice to every user on the our system (at that time, around 3000 users). A certain novice administrator on our system wanted to do it, so I instructed them to extract a list of users from /etc/passwd, write a simple shell loop to do the job, and throw it in the background. Here's what they wrote (bourne shell)... for USER in `cat user.list`; do mail $USER But the stories told now are more folklore than real horror. Having read > 2 Stephen Kings this weekend I beg everyone to tell more interesting > stories, about demons, the system clock running backwards, old files > reappearing etc ! 
I once had problems with files that mysteriously refused to stayed changed for very long. It was a PDP-11 Unix system that had crashed, and I brought it up single-user. I would change some file and it would stay changed for a minute or so but then revert to its earlier state (contents, protection mode, etc). What happened was that the write-protect switch on the disk drive had gotten bumped into the "on" position but the device driver failed to report any write errors. As long as the data stayed in kernel buffers the changes "took", but they would disappear once the buffers were reused and the system had to reread the disk. -greep ----------------------------------------------------------------------------- From: sam@bsu-cs.bsu.edu (B. Samuel Blanchard) Organization: Dept. of CS Ball State University Muncie IN #1 I never actually verified it but I think I deleted some of my bosses files as a very novice sysadmin. He found some things missing after I had a minor tangle with rm. When he ask I said I had run into a problem and he smiled and let it go. Sorry Raul! #2 I had a boss continue to reboot a dying system in an attempt to print out material for his conference presentation. He was not interested in waiting until I worked on the system; if he couldn't get it working, he assumed I couldn't. I quit :-( Then he quit :-) Then I spent a weeking fixing the system. :-0 <--words edited Some thing have improved there they tell me. Disclaimer: This is purely my interpretation and not intended to offend. It was my pre-assumption that you didn't read this group. #3 Recently had someone recover an old full backup over a running system. A manager 2 levels up noticed that our automatic backup, written by his staff, was failing far too often. Even worse, it did not always report errors. Since I was gone, he felt free to assign a manual backup to another group. 
The guy doing the "backup" called a member of his group at 8pm, that person finally called me at some ungodly hour in the morning (I was glad he called!). The best part was the end result. We now do backups in our group. Don't you love how progress slaps you awake sometimes.

-----------------------------------------------------------------------------
From: cjc@ulysses.att.com (Chris Calabrese)
Organization: AT&T Bell Labs, Murray Hill, NJ, USA

In article <7515@blue.cis.pitt.edu.UUCP> broadley@neurocog.lrdc.pitt.edu writes:
>On an old decstation 3100 I was deleting last semester's users to try to
>dig up some disk space, I also deleted some test users at the same time.
>
>One user took longer than usual, so I hit control-c and tried ls.
>"ls: command not found"
>
>Turns out that the test user had / as the home directory and the remove
>user script in ultrix just happily blew away the whole disk.
>[...]

Reminds me of a bit of local folk-lore (this happened before I was in the admin group)...

We have a home-grown admin system that controls accounts on all of our machines. It has a remove user operation that removes the user from all machines at the same time in the middle of the night.

Well, one night, the thing goes off and tries to remove a user with the home directory '/'. All the machines went down, with varying amounts of stuff missing (depending on how soon the script, rm, find, and other important things were clobbered).

Nobody knew what was going on! The systems were restored from backup, and things seemed to be going OK, until the next night when the remove-user script was fired off by cron again.

This time, Corporate Security was called in, and the admin group's supervisor was called back from his vacation (I think there's something in there about a helicopter picking the guy up from a rafting trip in the Grand Canyon).

By chance, somebody checked the cron scripts, and all was well for the next night...
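Both remove-user stories come down to a script that trusted the home directory field. As a rough sketch of the kind of guard such a script might carry, the function below refuses anything that is not an absolute path with at least two components. The name safe_to_remove is hypothetical, and neither the Ultrix script nor the home-grown system is known to have looked like this.

```shell
#!/bin/sh
# Sketch only: safe_to_remove is a made-up name for illustration.

# safe_to_remove DIR
# Succeeds only if DIR is an absolute path with at least two components,
# such as /home/fred. Rejects "", "/", "/usr" and relative paths.
safe_to_remove() {
    case $1 in
        /*/?*) return 0 ;;   # e.g. /home/fred
        *)     return 1 ;;   # everything else, including /
    esac
}
```

A removal script could then run `safe_to_remove "$home" || exit 1` before calling rm. The pattern is not exhaustive (it would still accept /usr/bin, for instance), so a real script would probably also compare the path against a list of directories that must never be removed.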
----------------------------------------------------------------------------- From: sam@bsu-cs.bsu.edu (B. Samuel Blanchard) Organization: Dept. of CS Ball State University Muncie IN Oh yea, I recalled 2 more kill -1 1 on an Altos SV box is not good. I pulled this one trying to show off. No more gettys appeared when uses logged off. When I went to the console, I calmly typed 0 to the Run Level request prompt. 2 would have been nice? It was my first SystemV like box, and it seemed to have such nice berkley commands. a control-s on a Sequent S27 console can cause processes to hang waiting to write to the console. Unfortunatly, su is one such process. No real problem since I don't blindly reboot on request ;-) ----------------------------------------------------------------------------- From: pete@tecc.co.uk (Pete Bentley) Organization: T.E.C.C. Ltd, London, England David J Dawkins (djd@csg.cs.reading.ac.uk) wrote: : About a year back, I was looking through /etc and found that a few : system files had world write permission. Gasping with horror, I went : to put it right with something like : : dipshit# chmod -r 664 /etc/* : A similar thing happened at a place a used to work 3 or 4 years back. The guys next door had just got a Sun 3/360 (or some such) to host a VME-bus image processing system - none of them knew much (or cared much) about Un*x and so early on a student on loan to them got a space in the wrong place and did pillock# chmod -r -x ~ /* with the same results (system in single user, refusing to run any commands or go multi-user). As it happened a) This was a government establishment, and so the order for the QIC tapes for backups had not yet been approved, hence no backups... b) The install script for the kernel drivers for the image processing stuff had not worked 'out of the box', and so the company had sent an engineer down to install it. I hadn't been around when he came and built their drivers, and they hadn't a clue what he had done. 
So, there was no way to rebuild the drivers without another engineer call and because of (a) there were no backups of the driver... Anyway, a complete reload was therefore out of the question.

These were the days before SunOS on CD-ROM. In the end I managed to get the thing up by booting from tape, installing the miniroot into the swap partition and booting from that. This gave me a working tar and a working mount, but no chmod. Also no mt command. Also at this time very little of my Un*x experience was on Suns, so I had no idea of the layout of the distribution tape. Various experiments with dd and the non-rewinding tape device eventually found the file on the tape with a chmod I could extract.

    chmod +x /etc/* /bin/* /usr/bin/*

on the system's existing disk was enough to make it bootable. After that I sat the student down with a SunOS manual and let him figure out the mess and correct the permissions that had been todged all over the system...

Pete.

-----------------------------------------------------------------------------
From: rca@Ingres.COM (Bob Arnold)
Organization: Ask Computer Systems Inc., Ingres Division, Alameda CA 94501

In article <1992Oct12.233524.13463@pony.Ingres.COM> I wrote:
>I was brave and bold, not to mention boneheaded, and formatted the user disk.
>
> [ rest of story deleted ... Bob ]
>
>Morals:
> 1) The "man" pages don't tell you everything you need to know.
> 2) Don't do backups to floppies.
> 3) Test your backups to make sure they are readable.
> 4) Handle the format program (and anything else that writes directly
>    to disk devices) like nitroglycerine.
> 5) Strenuously avoid systems with inadequate backup and restore
>    programs wherever possible (thank goodness for "restore" with
>    an "e"!).
> 6) If you've never done sysadmin work before, take a formal
>    training class.

Just thought of a few more related morals (managers pay attention now):
 7) You get what you pay for.
 8) There's no substitute for experience.
9) It's a lot less painful to learn from someone else's experience than your own (that's what this thread is about, I guess :-) ) Part of the story I should tell here. My employer had been looking for a way to cut costs. I was 15% cheaper than their previous sysadmin so they let him go and hired me. It wasn't as nasty as it sounds, since they kept him on as a consultant at 4 hours a week and he ended up with a better job too (so did I). Everyone benefited in the end. I leaned heavily on his consulting, which was great. He was older and wiser, and probably had his own horror stories to tell. After this one, so did I! Bob ----------------------------------------------------------------------------- From: rca@Ingres.COM (Bob Arnold) Organization: Ask Computer Systems Inc., Ingres Division, Alameda CA 94501 Many moons ago, in my first sysadmin job, learning via "on-the-job training", I was in charge of a UNIX box who's user disk developed a bad block. (Maybe you can see it already ...) The "format" man page seemed to indicate that it could repair bad blocks. (Can you see it now?) I read the man page very carefully. Nowhere did it indicate any kind of destructive behavior. I was brave and bold, not to mention boneheaded, and formatted the user disk. Heh. The good news: 1) The bad block was gone. 2) I was about to learn a lot real fast :-) The bad news: 1) The user data was gone too. 2) The users weren't happy, to say the least. Having recently made a full backup of the disk, I knew I was in for a miserable all day restore. Why all day? It took 8 hours to dump that disk to 40 floppies. And I had incrementals (levels 1, 2, 3, 4, and 5, which were another sign of my novice state) to layer on top of the full. Only it got worse. The floppy drive had intermittent problems reading some of the floppies. So I had to go back and retry to get the files which were missed on the first attempt. This was also a port of Version 7 UNIX (like I said, this was many moons ago). 
It had a program called "restor", primordial ancestor of BSD's "restore". If you used the "x" option to extract selected files (the ones missed on earlier attempts), "restor" would use the *inode number* as the name of the extracted files. You had to move the extracted files to their correct locations yourself (the man page said to write a shellscript to do this :-(). I didn't know much about shell scripts at the time, but I learned a lot more that week. Yes, it took me a full week, including the weekend, maybe 120 hours or more, to get what I could (probably 95% of the data) off the backups. And there were a few ownership and permissions problems to be cleaned up after that. Once burned twice shy. This is the only truly catastrophic mistake I've ever made as a sysadmin, I'm glad to be able to say. I kept a copy of my memo to the users after I had done what I could. Reading it over now is sobering indeed! I also kept my extensive notes on the restore process - thank goodness I've never had to use them since. Morals: 1) The "man" pages don't tell you everything you need to know. 2) Don't do backups to floppies. 3) Test your backups to make sure they are readable. 4) Handle the format program (and anything else that writes directly to disk devices) like nitroglycerine. 5) Strenuously avoid systems with inadequate backup and restore programs wherever possible (thank goodness for "restore" with an "e"!). 6) If you've never done sysadmin work before, take a formal training class. Well, I haven't thought about that one in a while! I can laugh about it now .... Bob ----------------------------------------------------------------------------- From: jimh@pacdata.uucp (Jim Harkins) Organization: Pacific Data Products A friend of mine admins an RS6000 for a state college. The weekend before the fall semester started the Powers That Be decided to physically move the system to a different room. She stayed late friday night, moved the machine, and then it wouldn't boot. 
I was in Sunday afternoon looking at it, wouldn't boot for nothing. Monday morning, first day of classes, an IBM rep comes in and reformats the hard disk without telling her. Turns out this was the machine all the professors were doing their class plans on. So not only couldn't they have them printed out, but when school started monday morning the teachers discovered they had lost all the work they'd done in the week before school started. Seems she never did backups because the teachers always bitched about how slow the system was when she did, and she hadn't learned about cron yet (I told her about that one). In her defense, she'd only been using the RS6000 for less than a month before this happened. She didn't know UNIX. She hadn't had any training. She still had her regular job to do. To make things worse, when she called me monday night she was in tears as she told me how she had to personally visit all the professors and tell them their work was gone. I blurted out "Stupid of you not to make backups". Here she is looking for a shoulder to cry on and I go and tell her the same thing everybody from the department chair on down to the janitor had been saying. Oops. The moral? If you appoint someone to admin your machine you better be willing to train them. If they've never had a hard disk crash on them you might want to ensure they understand hardware does stuff like that. I also found out she was unplugging and plugging cables all over the place without powering down the system. Her hardware knowledge was essentially "this thing goes into the wall, then the lights blink". jim ----------------------------------------------------------------------------- From: russells@ccu1.aukuni.ac.nz (Russell Street) Organization: University of Auckland, New Zealand. Not quite a reall _horror_ story but ... I once had "gnu-emacs" aliased to 'em' (and 'emacs' etc) One day I wanted to edit the start up file and mistyped # rm /etc/rc.local instead of the obvious. 
*Fortunately* I had just finished a backup and was now finding out the joys of tar and its love of path names. [(./etc/rc.local and /etc/rc.local and etc/rc.local) are *not* the same for tar, and TK-50s take a *long* time to search for non-existent files :(]

Of course the BREAK (Ctrl-P) key on a VAX and an Ultrix manual and a certain /etc/ttys line are just a horror story waiting to happen! Especially when the VAX and manuals are in an unsupervised place :)

-----------------------------------------------------------------------------
From: obi@gumby.ocs.com (Obi Thomas)
Organization: Online Computer Systems, Inc.

This isn't nearly as bad as some of the stories in this thread, but...

I once mistakenly partitioned my Sun's boot disk so that the swap partition overlapped the usr partition. The machine ran fine for a long time (many months), presumably because the swap space was always nearly empty. Then, one day there was a memory parity error and the system crash dumped at the *end* of the swap partition. What should have been a simple reboot after the crash dump turned into a long and painful re-install of the entire system (Suns cannot boot without a /usr partition).

Now when I partition a disk I sit there with a calculator and make sure all the numbers add up correctly (offsets, number of cylinders, number of blocks, and so on).

-----------------------------------------------------------------------------
From: dp@world.std.com (Jeff DelPapa)
Organization: The World Public Access UNIX, Brookline, MA

In article obi@gumby.ocs.com writes:
>This isn't nearly as bad as some of the stories in this thread, but...
>
>I once mistakenly partitioned my Sun's boot disk so that the swap
>partition overlapped the usr partition. The machine ran fine for a long
>time (many months), presumably because the swap space was always nearly
>empty.
I remember a similar thing once - on a symbolics machine, a customer declared a file in the FEP filesystem as a paging file, and as part of the file system (it was one way to solve their disk space crunch) It was caught before damage was done - we weren't sure if it was because they hadn't done anything real yet, or simply the machine knew not to mess with the IRS (the customer). ----------------------------------------------------------------------------- From: rik@nella15.cc.monash.edu.au (Rik Harris) Organization: Monash University, Melb., Australia. Sometimes it takes a few tries to get it through the tired brain... Most of our disks reside on a single, high-powered server. We decided this probably wasn't too good an idea, and put a new disk on one of the workstations (particularly since the w/s has a faster transfer rate than the server does!). It's still really useful to be able to use all disks from the one machine, so I mounted the w/s disk on the server. I said to myself (being a Friday afternoon...see previous post) "it's only temporary.../mnt is already being used...I'll mount it in /tmp". So, I mounted on /tmp/a (or something). This was fine for a few hours, but then the auto-cleanup script kicked in, and blew away half of my source (the stuff over 2 weeks old). I didn't notice this for a few days, though. After I figured out what had happened, and restored the files (we _do_ have a good backup strategy), everything was OK. Until a few months later. We were trying to convince a sysadmin from another site that he shouldn't NFS export his disks rw,root to everyone, so I mounted the disk to put a few suid root programs in his home directory to convince him. Well, it's only a temporary mount, so.... You guessed it, another Friday afternoon. I did a umount /tmp/b, and forgot about it. I noticed this one about halfway through the next day. (NFS over a couple of 64k links is pretty slow). 
The disk had not unmounted because it was busy...busy with two find scripts, happily checking for suid programs, and deleting anything over a week old. A df on the filesystem later showed about 12% full :-( Sorry Craig. Now, I create /mnt1, /mnt2, /mnt3.... :-) Remember....Friday afternoons are BAD news. rik. ----------------------------------------------------------------------------- From: ranck@joesbar.cc.vt.edu (Wm. L. Ranck) Hello folks, Well, after reading some of the stories in this thread I guess I can tell mine. I got an RS/6000 mod. 220 for my office about 6 months ago. The OS was preloaded so I had little chance to learn that process. Being used to a full-screen editor I was not happy with vi so I read in the manual that INED (IBM's editor for AIX) was full-screen and I logged in as root and installed it. I immediately started to play with the new editor and somehow found a series of keys that told the editor to delete the current directory. To this day I don't know what that sequence of keys was, but I was unfortunately in the /etc directory when I found it, and I got a prompt that said "do you want to remove this?" and I thought i was just removing the file I had been playing with but instead I removed /etc! I got the chance to learn how to install AIX from scratch. I did reinstall INED even though I was a little gun-shy but I made sure that whenever I used it from then on I was *not* root. I have since decided that EMACS may be a better choice. ----------------------------------------------------------------------------- From: stehman%citron.cs.clemson.edu@hubcap.clemson.edu (Jeff Stehman) Organization: Clemson University From article <3965@wzv.win.tue.nl>, by rob@wzv.win.tue.nl (Rob J. Nauta): > > But the stories told now are more folklore than real horror. Having read > 2 Stephen Kings this weekend I beg everyone to tell more interesting > stories, about demons, the system clock running backwards, old files > reappearing etc ! Hmmm. 
Maybe this is a little closer to what you're looking for... Many years ago a tiny little college in the middle of nowhere purchased an NCR tower, then a newfangled contraption. A half-dozen of us were using it for an assembly class. The prof should have made his warnings about TRAP a little more clear. One student runs his program and it suddenly begans spawning processes, rapidly filling the machine. The prof came in, amused, logged on as superuser, and killed a process. Another process was immediately spawned. The prof tried again. He was ignored. He was also no longer amused. After several minutes he gave up and turned off the box. The tower didn't even flinch. He pulled the plug. Nothing. He ripped the back off the box and dug around. Finally he found the fuse and pulled it, killing the machine. Some of us later claimed we heard laughter as it went down. (Many times since then I have wished other computers came with a backup battery as standard issue.) ----------------------------------------------------------------------------- From: grover@ccai.clv.oh.us (grover davidson) Organization: CCAI Several months ago here, we were reoganizing our disk space on an RS/6000 with AIX 3.1. I have done this many time before, but for some reason, I was rushing through expanding a file system. Instead of entering the new file system size where it belongs, I entered it into the mount point. It also turns out that I was attached 2 levels down in the file system. Since the size was entered as a number ('234567') and was INTERPRETED as a mount point directory, the result was a circular hard link that basicly left the file system unusable. IBM was not able to help, and we had done quite a bit of work that day, we had to somehow recover some of the stuff. We ended up doing a dd of the raw volume, and the read it back in a couple MB at a time and extracted the pieces that we needed for the mess. 
The other day while reading Stevens' new book, "Advanced Programming in the UNIX Environment", I saw that he had done the exact same thing during the preparation of his book. At least I am not alone.....

-----------------------------------------------------------------------------
From: dvsc-a@minster.york.ac.uk
Organization: Department of Computer Science, University of York, England

I remember my first (and only, so far) major mistake in unix admin: I was changing the UIDs of a few users on one of our major servers, due to a clash with some machines newly connected to the net. Fine, edit /etc/passwd then chown all their files to the new UID. So, rather than just assume that all files owned by "fred" live in /home/machine/fred I did this:

    machine# find / -user old_uid -exec chown username {} \;

This was fine... except it was late at night and I was tired, and in a hurry to get home. I had six of these commands to type, and as they would take a long time I'd just let them run in the background overnight.....

So, you come in the next morning and a user complains... I can't login to the 4/490 - it says "/bin/login: setgid: not owner". Okay.... naive user problem no?

    rlogin machine -l root
    /bin/login: setgid: not owner

    machine console login: root
    /bin/login: setgid: not owner

Okay - I REALLY can't get in... let's reboot single user and see what's on... this worked. /bin/login is owned (and setuid to) one of the users whose UID I changed the previous day... in fact ALL FILES in the ENTIRE filesystem are owned by this user.. problem!

We `only' lost about 200 man hours through my little typing mistake: the moral of the story.. beware anything recursive when logged in as root!

    find / -exec chown user {} \;

Oh dear...
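A dry run that only prints what a find expression matches might catch a dropped -user test before any chown happens. The sketch below is an illustration under that assumption, not what was actually run here; preview_owned is a made-up name, and it relies only on the standard -user and -print primaries of find.

```shell
#!/bin/sh
# Sketch only: preview_owned is a hypothetical helper for illustration.

# preview_owned ROOT OWNER
# Prints every file under ROOT that is owned by OWNER and changes nothing.
# OWNER may be a login name or a numeric UID.
preview_owned() {
    find "$1" -user "$2" -print
}
```

Something like `preview_owned / old_uid | wc -l` gives a count to sanity-check first. If the number looks like the whole filesystem rather than one user's files, the expression is wrong, and the same expression can be fixed before -exec chown is added to it.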
Dave

-----------------------------------------------------------------------------
From: mba@controls.ccd.harris.com (Belinda Asbell)
Organization: Harris Controls

In article , JRowe@cen.ex.ac.uk (J.Rowe) writes:
|> One thing I would like all vendors to do (I know one or two do) is
|> to give root the option of logging in using another shell. Am I the
|> only one to have mangled a root shell?
|>
|> John Rowe
|> Dept. Physics
|> Exeter University
|> UK.

Probably not. I learned the hard way to be careful if messing with /etc/passwd. One day, for some reason, I couldn't login as root (pretty scary, since I knew the root passwd and hadn't changed it). Turned out that somehow I'd blitzed the first letter of /etc/passwd (vi does bizarre things sometimes). So I logged in as 'oot' and fixed it.

NEVER do a "chmod -R u-s .", especially not in /usr....

I think that "mount -o" or something similar will mount a filesystem read-write if it's come up in singleuser mode and is mounted read-only.....

Just my tuppence....

-----------------------------------------------------------------------------
From: joslin_paul@ae.ge.com
Organization: GE Aircraft Engines

cjc@ulysses.att.com (Chris Calabrese) writes:
>We have a home-grown admin system that controls accounts on all of our
>machines. It has a remove user operation that removes the user from
>all machines at the same time in the middle of the night.
>Well, one night, the thing goes off and tries to remove a user with
>the home directory '/'. All the machines went down, with varying
>amounts of stuff missing (depending on how soon the script, rm, find,
>and other important things were clobbered).
>Nobody knew what was going on! The systems were restored from
>backup, and things seemed to be going OK, until the next night when
>the remove-user script was fired off by cron again.

True confession time: Cron is a great way to hide your flubs.
I installed the COPS security package on a system, then set up cron to recheck the system once a month. No problem, right? Except that I had configured COPS to put the reports in /. As a security measure, COPS chmods its directory to u-rwx,w-rwx so that only the COPS owner can read the reports. The chronology was

1) Run cops. Add cops entry to root's crontab. Later that day, notice that / was 600; change it back.

2) 30 days later: get calls from users - can't log in, "No shell" error messages. Find / is 600; change it. Vaguely remember that this happened once before. The machine was a sandbox, so almost anything could have changed /.

3) 30 days later: get calls from users - can't log in, "No shell" error messages. Find / is 600; change it. Vaguely remember that this happened once before. Happen to think "cron"; notice that the only cron activity for root last night was COPS. Read COPS source and discover problem.

Moral: RTFM. Keep logs, so that you can notice patterns in your data. Don't do anything as root that you can do as a mortal.

-----------------------------------------------------------------------------
From: root@rulcvx.LeidenUniv.nl (root)
Organization: CRI, institute for telecommunication and computerservices.

In article <64@ocdis01.UUCP> robjohn@ocdis01.UUCP (Contractor Bob Johnson) writes:
>Another horror story (mine this time)...
>Cleaning out an old directory, I did 'rm *', then noticed several files
>that began with dot (.profile, etc) still there. So, in a fit of obtuse
>brilliance, I typed...
> rm -rf .* &

Well, waddya know... Some half hour ago, coming back from root (I was installing m4 on our system) [Shit, all my neato emacs tricks won't work. Damn, damn, damn kill, kill, KILL] to my own userid, I got this little message: "Can't find home directory /mnt0/crissl." and another: "Can't lstat .". [Grrrrr, ^S and ^Q haven't been remapped...]

Guess what happened, not an hour ago...
A colleague of mine was emptying some directories of computer-course accounts. As I did a "ps -t" on his tty, what did I see? "rm -rf .*"

Well, I'm not alone, he got sixteen other home directories as well. And guess what filesystems we don't make incremental backups of... And why not? Beats me...

I haven't killed him yet, he first has to restore the lot. And for those "touch \-i" fans out there: you wouldn't have been protected...

Boy, am I MAD. :-) (Bitten by the bug I, too, once released.)

Stefan "where can I find a well-equipped torture chamber" Linnemann

-----------------------------------------------------------------------------
From: hillig@U.Chem.LSA.UMich.EDU (Kurt Hillig)
Organization: Department of Chemistry, University of Michigan, Ann Arbor

Just so nobody gets the impression that you can only screw up U**X systems....

Several years ago I was sysadmin for the department's VAX/VMS system. One day, trying to free up some space on the system disk, I noticed there were a bunch of files like COBRTL.EXE, BASRTL.EXE etc. - i.e. the Cobol, Basic, etc. run-time libraries. Since the only language used was Fortran, I nuked them.

Three weeks later, a visiting professor came over from Greece for a few weeks, mostly to do some calculations on the VAX. He got in on a Friday morning, and started work that afternoon. About 7 PM I got a call at home - he'd accidentally bumped the reset switch (on the VAX 3200, it was just at knee height!) and it wouldn't reboot.

I went back in and took a look, and the reason it wouldn't come up was that the run-time libraries were missing. I ended up booting stand-alone backup from tape, dumping another data disk to tape, restoring an old system from tape, copying the RTL's, then restoring the data disk from tape again - all with TK50's. Took me until 3 AM.
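
A footnote on the "rm -rf .*" stories above: the damage comes from the pattern itself, because on the shells of that era `.*` also matched the directory entries `.` and `..`, so a recursive command climbed into the parent directory. The following is only a rough sketch in Python, not the shell itself. It uses the standard `fnmatch` module, which follows shell-style pattern rules; the `entries` list and the `expand` helper are made up for illustration, and some newer shells may skip `.` and `..` during expansion.

```python
from fnmatch import fnmatchcase

# Hypothetical directory listing. "." and ".." are real directory entries,
# so a traditional shell treats them as candidates when expanding a pattern.
entries = [".", "..", ".profile", ".login", "notes.txt"]

def expand(pattern, names):
    """Return the names a shell-style pattern would match (illustration only)."""
    return [name for name in names if fnmatchcase(name, pattern)]

# ".*" means: a dot followed by anything, including nothing or another dot.
print(expand(".*", entries))

# ".[!.]*" means: a dot, then one character that is not a dot, then anything.
# This leaves out "." and "..", but it also misses names such as "..foo".
print(expand(".[!.]*", entries))
```

The second pattern is in the same spirit as the '.[a-z]*' hint mentioned earlier in this summary: it insists on a second character that cannot be a dot, which keeps `..` out of the expansion.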
-----------------------------------------------------------------------------
From: kevin@sherman.pas.rochester.edu (kevin mcfadden)
Organization: University of Rochester

Me and my co-system admin were in the process of repartitioning a drive so that we could allocate more space for incoming mail. We had just finished backing up our Data directory, from which we were going to take 10MB. Next step was to actually repartition it, which includes formatting. Anyway, it comes time to give a device name and we do a df to see which one. To make a short story long, there was a /dev/sd2g and a /dev/sd3g, one of which was 300MB of stuff we could delete and the other was 600MB of applications. We confused the two and accidentally formatted the 600MB of applications, which of course had been backed up......a month ago. It could have been worse.

BUT WAIT!!! It did. Turns out it took 3 or 4 tries to get the partition size correct (what the hell is it with telling it how long it is in hex or whatever?). It was at this point that I started to cover my eyes and wander around the building, because we only found out the partition didn't work after spending 3 hours restoring the applications. 4 * 3 = 12 hours to repartition!

--
Anatoly Ivasyuk @ Computer Science House @ Rochester Institute of Technology
anatoly@nick.csh.rit.edu || axi0349@ultb.rit.edu || axi0349@ritvax.rit.edu
You say you haven't heard of CSH? You will...