From: A.X. Ivasyuk (axi0349@ultb.isc.rit.edu)
Subject: The Unofficial Unix Administration Horror Stories Summary (part 1)
Newsgroups: comp.unix.admin
Date: 1992-12-03 09:57:13 PST

Unix Admin. Horror Story Summary, version 1.0
-----------------------------------------
compiled by: Anatoly Ivasyuk (anatoly@nick.csh.rit.edu)

This is version 1.0 of the unofficial "Unix Administration Horror Story Summary". This is a summary of the "Unix Administration Horror Stories" thread which was seen in comp.unix.admin in October '92. I put this together for two reasons:

1) Some of these stories are damn amusing.
2) Many people can learn many things about what *not* to do when they're in charge of a system.

This summary contains quite a few different types of stories. There are success stories, and... well... other stories. But the most important thing that can be learned from this is not that you have to make backups (we all know that, right? ;-) ). More important than making backups is to make sure your backups are complete and verified. For more on this, see the story about trying to backup 300MB drives onto 150MB tapes.

If there are additional stories that anyone wants to submit, I'll be glad to add them to this FAQ. Send them to me at: anatoly@nick.csh.rit.edu. Please send any general comments my way, also.

Please consider this a "beta test" release. I have not had the time to go over this as many times as I wanted to, so there may be mistakes in my editing. I have not edited the content of the stories except where noted, and may have excluded stories or bits where I felt it was appropriate.
-Anatoly

-----------------------------------------------------------------------------
The posting that started it all:
--------------------------------
On 7 Oct 92 12:02:46 GMT, aras@multix.no (Arne Asplem) said:

> I'm the program chair for a one day conference on Unix system
> administration in Oslo in 3 weeks, including topics like network
> management, system administration tools, integration, print/file-servers,
> security, etc.
> I'm looking for actual horror stories of what has gone wrong because
> of bad system administration, as an early morning wakeup.
> I'll summarise to the net if there is any interest.
> -- Arne

-----------------------------------------------------------------------------
From: jdell@maggie.mit.edu (John Ellithorpe)
Organization: Massachusetts Institute of Technology

Here's a pretty bad story. I wanted to have root use tcsh instead of the Bourne shell. So I decided to copy tcsh to /usr/local/bin. I created the file, /etc/shells, and put in /usr/local/bin/tcsh, along with /bin/sh and /bin/csh. All seemed fine, so I used the chsh command and changed root's shell to /usr/local/bin/tcsh. So I logged out and tried to log back in, only to find out that I couldn't get back in. Every time I tried to log in, I only got the statement: /usr/local/bin/tcsh: permission denied!

I instantly realized what I had done. I forgot to check that tcsh had execute privileges and I couldn't get in as root! After about 30 minutes of getting mad at myself, I finally figured out to just bring the system down to single-user mode, which ONLY uses /bin/sh, thankfully, and edited the password file back to /bin/sh. I'll never do that again. This wasn't that much of a horror story, but good enough if you aren't that familiar with the system.

John

-----------------------------------------------------------------------------
From: dbrillha@dave.mis.semi.harris.com (Dave Brillhart)
Organization: Harris Semiconductor

We can laugh (almost) about it now, but...
Our operations group, a VMS group but trying to learn UNIX, was assigned account administration. They were cleaning up a few unused accounts like they do on VMS - backup and purge. When they came across the account "sccs", which had never been accessed, away it went. The "deleteuser" utility from DEC asks if you would like to delete all the files in the account. Seems reasonable, huh? Well, the home directory for "sccs" is "/". Enough said :-(

-----------------------------------------------------------------------------
From: tzs@stein.u.washington.edu (Tim Smith)
Organization: University of Washington, Seattle

I was working on a line printer spooler, which lived in /etc. I wanted to remove it, and so issued the command "rm /etc/lpspl". There was only one problem. Out of habit, I typed "passwd" after "/etc/" and removed the password file. Oops.

I called up the person who handled backups, and he restored the password file. A couple of days later, I did it again! This time, after he restored it, he made a link, /etc/safe_from_tim.

About a week later, I overwrote /etc/passwd, rather than removing it. After he restored it again, he installed a daemon that kept a copy of /etc/passwd, on another file system, and automatically restored it if it appeared to have been damaged.

Fortunately, I finished my work on /etc/lpspl around this time, so we didn't have to see if I could find a way to wipe out a couple of filesystems...

--Tim Smith

-----------------------------------------------------------------------------
From: nickp@BNR.CA ("Nick Pitfield", N.T.)

Greetings,

The following horror story occurred only last week....

One of my colleagues had been itching to get into sys admin for some time, so last week he was finally sent on a 5-day sys admin course run by HP in Bracknell. On the following Sunday, he decided to try out his newfound knowledge by trying to connect and configure a DAT drive on one of our critical test systems.
He connected the cables up okay, and then created the device file using 'mknod'. Unfortunately, he gave the device file the same minor & major device numbers as the root disk; so as soon as he tried to write to this newly installed 'DAT drive', the machine went tits up with a corrupt root disk....ho hum.

Regards.
Nick Pitfield.

-----------------------------------------------------------------------------
From: philip@haas.berkeley.edu (Philip Enteles)
Organization: Haas School of Business, Berkeley

As a new system administrator of a Unix machine with limited space I thought I was doing myself a favor by keeping things neat and clean. One day as I was 'cleaning up' I removed a file called 'bzero'. Strange things started to happen: vi didn't work, and then the complaints started coming in. Mail didn't work. The compilers didn't work. About this time the REAL system administrator poked his head in and asked what I had done.

Further examination showed that bzero is the zeroed memory without which the OS had no operating space so anything using temporary memory was non-functional.

The repair? Well things are tough to do when most of the utilities don't work. Eventually the REAL system administrator took the system to single user and rebuilt the system including full restores from a tape system.

The Moral is don't be too anal about things you don't understand. Take the time to learn what those strange files are before removing them and screwing yourself.

Philip Enteles

-----------------------------------------------------------------------------
From: broberts@waggen.twuug.com (Bill Roberts)
Organization: Brite Systems

My most interesting experience in that regard was when I deleted "/dev/null". Of course it was soon recreated as a "regular file", then permission problems started to show up. I was new at the game at the time and couldn't figure out what happened! It looked good to me. I didn't know about "special files" and "mknod" and major and minor device codes.
A friend finally helped out and started laughing and put me on the right track. That one episode taught me a lot about my system.

-----------------------------------------------------------------------------
From: Frank T Lofaro
Organization: Sophomore, Math/Computer Science, Carnegie Mellon, Pittsburgh, PA

Well one time I was installing a minimal base system of Linux on a friend's PC, so that we would have all the necessary utilities to bring over the rest of the stuff. His 3 1/2 inch disk was dead, so we had to get the 5 1/4 inch version of the boot/root disk. Too bad that version, having to fit in 1.2M instead of 1.44, didn't have tar. We could get a version of tar, but it was in a tar file (nice chicken and egg scenario).

I said, okay, since we don't have tar, we can't use that to copy the files from floppy to the hard disk, I'll use cp instead (bad move). It actually seemed to work for a while, then the machine rebooted! I did it again, the same thing happened. Then I realized cp wouldn't work on device files! (this is what happens when you try to install un*x at 3 AM). It just read the contents of the device and made a file containing such, which is undesirable in any event. (when it read /dev/port, the device file that references I/O ports, it must have done something to reboot the machine, that was the file that was causing the reboots).

I finally got it working by having him get the tar archive of the linux binaries (including the tar we needed), and untarring it on one of the public decstations here, so we could ftp tar to his PC using his dos tcp/ip stuff. A funny aside was that it untarred into ~/bin, and superseded all his normal commands. We were wondering why everything wouldn't run. Luckily it wasn't too hard to fix after we realized what happened.
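
[Editor's sketch] The cp problem above comes down to telling device files apart from regular files before reading their contents. A minimal illustration in Python follows; it is not part of the original post, and the function names classify and safe_to_copy_contents are hypothetical, chosen only for this example. It assumes a POSIX-style system where os.lstat reports the file type in st_mode.

```python
import os
import stat


def classify(path):
    """Return a short label for the file type at `path`, without following symlinks."""
    mode = os.lstat(path).st_mode
    if stat.S_ISCHR(mode):
        return "char-device"
    if stat.S_ISBLK(mode):
        return "block-device"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISLNK(mode):
        return "symlink"
    if stat.S_ISREG(mode):
        return "regular"
    return "other"


def safe_to_copy_contents(path):
    """Only regular files should have their contents read and copied byte for byte."""
    return classify(path) == "regular"
```

A copy loop could call safe_to_copy_contents(path) on each entry and recreate device nodes by other means (or skip them) rather than reading from them.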
----------------------------------------------------------------------------- From: mfraioli@grebyn.com (Marc Fraioli) Organization: Grebyn Timesharing Well, here's a good one for you: I was happily churning along developing something on a Sun workstation, and was getting a number of annoying permission denieds from trying to write into a directory heirarchy that I didn't own. Getting tired of that, I decided to set the permissions on that subtree to 777 while I was working, so I wouldn't have to worry about it. Someone had recently told me that rather than using plain "su", it was good to use "su -", but the implications had not yet sunk in. (You can probably see where this is going already, but I'll go to the bitter end.) Anyway, I cd'd to where I wanted to be, the top of my subtree, and did su -. Then I did chmod -R 777. I then started to wonder why it was taking so damn long when there were only about 45 files in 20 directories under where I (thought) I was. Well, needless to say, su - simulates a real login, and had put me into root's home directory, /, so I was proceeding to set file permissions for the whole system to wide open. I aborted it before it finished, realizing that something was wrong, but this took quite a while to straighten out. Marc Fraioli ----------------------------------------------------------------------------- From: rheiger@renext.open.ch (Richard H. E. Eiger) Organization: Olivetti (Schweiz) AG, Branch Office Berne In article <1992Oct9.100444.27928@u.washington.edu> tzs@stein.u.washington.edu (Tim Smith) writes: > I was working on a line printer spooler, which lived in /etc. I wanted > to remove it, and so issued the command "rm /etc/lpspl." There was only > one problem. Out of habit, I typed "passwd" after "/etc/" and removed > the password file. Oops. > [deleted to save space[ > > --Tim Smith Here's another story. Just imagine having the sendmail.cf file in /etc. 
Now, I was working on the sendmail stuff and had come up with lots of sendmail.cf.xxx which I wanted to get rid of so I typed "rm -f sendmail.cf. *". At first I was surprised about how much time it took to remove some 10 files or so. Hitting the interrupt key, when I finally saw what had happened was way to late, though. Fortune has it that I'm a very lazy person. That's why I never bothered to just back up directories with data that changes often. Therefore I managed to restore /etc successfully before rebooting... :-) Happy end, after all. Of course I had lost the only well working version of my sendmail.cf... Richard ----------------------------------------------------------------------------- From: mitch@cirrus.com (Mitch Wright) Organization: Cirrus Logic Inc. I guess I should add a story (or maybe not). Anyway, a fellow sysadmin was looking to free up some much needed disk space. Since it was purely a production machine I suggested that he go through and "strip" his binaries. Unfortunately I made the assumption that he knew what strip does and would use it wisely -- flashes of the Bad News Bears come to mind now. To make it short, he stripped /vmunix which didn't destroy the system, but certainly caused some interesting problems. ~mitch ----------------------------------------------------------------------------- From: hirai@cc.swarthmore.edu (Eiji Hirai) Organization: Information Services, Swarthmore College, Swarthmore, PA, USA Some of these stories of pure stupidity rather than of interesting horror but they did happen. [ BTW, these happened at a different place at a different time than where I am now. Don't bother my current employer about it. ] (1) A consultant we had hired (and not a very good one) was installing Unix on one our workstations. He was mucking with creating and deleting /dev/tty* files and made /dev/tty a regular file. Weird things started to happen. Commands would only print their output if you pressed return twice, etc. 
Fortunately, we solved the problem by re-mknod-ing /dev/tty. However, it took a while to realize what was causing this problem. (2) I wanted to create a second swap partition on another disk and made the partition start at sector 0 of the disk! (which sounded ok at the time since all other regular 'a' partitions started on sector 0) Every time I rebooted, fsck would complain about missing partition tables - I initially suspected that the disk was bad but I later realized that swapping was overwriting the partition table. I had lost an unknown percentage of the financial data for the institution that I was working for at the time, right when they were being audited! Yikes! Anyway, we were able to recover the data and life returned to normal but I did wonder at the time whether I could still keep my job there. (3) At the same institution, we were running a system software that had a serious bug where if anyone had logged out ungracefully, the system wouldn't let any more users onto the system and users who were logged on couldn't execute any new commands. (The newest release of the software later on did fix this bug.) I had to reboot the machine to restore the system to a sane state. I did a wall < hirai@cc.swarthmore.edu (Eiji Hirai) writes: >...[some deleted] >(4) I heard this from a fellow sysadmin friend. My friend was forced to >work with some sysadmins who didn't have their act together. One day, one >of them was "cleaning" the filesytem and saw a file called "vmunix" in /. >"Hmm, this is taking up a lot of space - let's delete it". "rm /vmunix". >My friend had to reinstall the entire OS on that machine after his coworker >did this "cleanup". Ahh, the hazards of working with sysadmins who really >shouldn't be sysadmins in the first place. When this happened to a colleague (when I worked somewhere else) he restored vmunix by copying from another machine. Unfortunately, a 68000 kernel does not run very well on a Sparc... 
----------------------------------------------------------------------------- From: smckinty@sunicnc.France.Sun.COM (Steve McKinty - Sun ICNC) Organization: SunConnect In article , hirai@cc.swarthmore.edu (Eiji Hirai) writes: > (4) I heard this from a fellow sysadmin friend. My friend was forced to > work with some sysadmins who didn't have their act together. One day, one > of them was "cleaning" the filesytem and saw a file called "vmunix" in /. > "Hmm, this is taking up a lot of space - let's delete it". "rm /vmunix". > > My friend had to reinstall the entire OS on that machine after his coworker > did this "cleanup". Ahh, the hazards of working with sysadmins who really > shouldn't be sysadmins in the first place. Hmm. A colleague of mine did much the same by accident on one of our test machines. After discovering it, fortunately while the machine was still up & running, he FTPed a copy of /vmunix from the other lab system (both running exactly the same kernel). After rebooting his machine everything (to his relief) worked fine. ----------------------------------------------------------------------------- From: lingnau@math.uni-frankfurt.de (Anselm Lingnau) Organization: University of Frankfurt/Main, Dept. of Mathematics In article <1992Oct10.010412.3448@waggen.twuug.com>, broberts@waggen.twuug.com (Bill Roberts) writes: > My most interesting in the reguard was when I deleted "/dev/null". Of > course it was soon recreated as a "regular file", then permission problems > started to show up. Years ago when I was working in the Graphics Workshop at Edinburgh University, we used to have a small UNIX machine for testing. The machine wasn't used too much, so nobody bothered to set up user accounts, and so everybody was running as root all the time. Now one of the chaps who used to come in was fond of reading fortunes (/usr/games/fortune having been removed from the University's real machines along with all the other games). 
Guess what happened when the machine said

    # fortune
    fortune: write error on /dev/null --- please empty the bit bucket

Quite a lot of stuff wouldn't work after the chap was done with the machine for the day. You bet we put up proper accounts after that!

Anselm

-----------------------------------------------------------------------------
From: peter@NeoSoft.com (Peter da Silva)
Organization: NeoSoft Communications Services -- (713) 684-5900

Well, we had one system on which you couldn't log in on the console for a while after rebooting, but it'd start working sometimes.

What was happening was that the manufacturer had, for some idiot reason, hardcoded the names of the terminals they wanted to support into getty (this manufacturer's own terminals, that I can understand, but also a handful of common types like adm3a) so getty could clear the screen properly (I guess hacking that into gettydefs was too obvious or something). If getty couldn't recognise the terminal type on the command line, it'd display a message on the console reading "Unknown terminal type pc100".

We ignored this flamage, which was a pity. Cos that was the problem.

It did this *before* opening the terminal, so if it happened to run between the time rc completed and the getty on the console started the console got attached to some random terminal somewhere, so when login attempted to open /dev/tty to prompt for a password it failed.

Moral: always deal with error messages even when you *know* they're bogus.
Moral: never cry wolf.

-----------------------------------------------------------------------------
From: rickf@pmafire.inel.gov (Rick Furniss)
Organization: WINCO

Horror stories: Did this myself many years ago, and have come close to it since.

Murphy's law #?? , preventive maintenance doesn't.

try this one:

    /etc/dump /dev/rmt/0m /dev/dsk/0s1

Or:

    tar cvf /dev/root /dev/rmt0

Backup commands on Unix can be among the most dangerous commands used, even though they are meant to prevent a problem rather than cause one.
If any Unix utility were a candidate for a warning message, or error checking, this would be it.

Just in case you didn't catch the HORROR above, the parameters are backwards, causing a TOTAL wipe out of the root file systems.

More systems have been wiped out by admins than any hacker could do in a lifetime.

-----------------------------------------------------------------------------
From: gfowler@javelin.sim.es.com (Gary Fowler)
Organization: Evans & Sutherland Computer Corporation

Once I was going to make a new file system using mkfs. The device I wanted to make it on was /dev/c0d1s8. The device name that I used, however, was /dev/c0d0s8 which held a very important application. I had always been a little annoyed by the 10 second wait that mkfs has before it actually makes the file system. I'm sure glad it waited that time though. I probably waited 9.9 seconds before I realized my mistake and hit that DEL key just in time. That was a near disaster avoided.

Another time I wasn't so lucky. I was a very new SA, and I was trying to clean some junk out of a system. I was in /usr/bin when I noticed a sub directory that didn't belong there. A former SA had put it there. I did an ls on it and determined that it could be zapped. Forgetting that I was still in /usr/bin, I did an rm *. No 10 second idiot proofing with rm. Now if someone would only create an OS with a "Do what I mean, not what I say" feature.

Gary "Experience is what allows you to recognize a mistake the second time you make it." Fowler

-----------------------------------------------------------------------------
From: broadley@neurocog.lrdc.pitt.edu (Bill Broadley)
Organization: University of Pittsburgh

On an old decstation 3100 I was deleting last semester's users to try to dig up some disk space, I also deleted some test users at the same time.

One user took longer than usual, so I hit control-c and tried ls.
"ls: command not found" Turns out that the test user had / as the home directory and the remove user script in ultrix just happily blew away the whole disk. ftp, telnet, rcp, rsh, etc were all gone. Had to go to tapes, and had one LONG rebuild of X11R5. Fortunately it wasn't our primary system, and I'm only a student.... ----------------------------------------------------------------------------- From: hirai@cc.swarthmore.edu (Eiji Hirai) Message-ID: Sender: news@cc.swarthmore.edu (USENET News System) Nntp-Posting-Host: gingko Organization: Information Services, Swarthmore College, Swarthmore, PA, USA References: <2840@bsu-cs.bsu.edu> Date: Tue, 13 Oct 1992 16:00:28 GMT rik.harris@fcit.monash.edu.au writes: > I'll mount it in /tmp Though this may strike most sane sysadmins as bad practice, SunOS (3.4 or so - my memory is vague) shipped a command called "on". If you were logged on machine A and wanted to execute a command on machine B, you said "on B command", sort of like rsh. However, A would mount B's disks under some invokations of "on" and it would mount it in /tmp! Of course, lots of folks got bitten by this stupid command and it was taken out after a long delay by Sun. Anyone remember the details? I've blocked out my memory of pre-4.0 SunOS. Am I just hallucinating? ----------------------------------------------------------------------------- From: matthews@oberon.umd.edu (Mike Matthews) Organization: /etc/organization In article obi@gumby.ocs.com writes: >Now when I partition a disk I sit there with a calculator and make sure >all the numbers add up correctly (offsets, number of cylinders, number of >blocks, and so on). Heh heh, now that you mention that... We had just gotten a 1.2G disk drive for our Sun (which direly needed it) so we felt we'd repartition everything. All went well, except... on reboot, one of the partitions that was newly restored from backup got a fsck error. Fixed it, it rebooted, then another one got an error. 
fscked that one, rebooted it, and doggone it, the first error was back!

We had a one cylinder overlap. Sheesh. At least Ultrix WARNS you of that.

Mike Matthews, matthews@oberon.umd.edu (NeXTmail accepted)

-----------------------------------------------------------------------------
From: mt00@eurotherm.co.uk (Martin Tomes)
Organization: Eurotherm Limited

We had something really weird happen one day. I copied a file to /usr/local on someone else's machine and all seemed to be OK. A bit later the user of the machine noticed that the files and directories they were using on another disk partition were corrupted. There were 2 gigabyte files on a 650Mb disk - and lots of them with weird names and permissions. At first I did not connect the two events.

This disk had given trouble when the power failed a week before, so I fsck'ed it. Now I have run fsck more times than I can begin to imagine and seen plenty of errors, some needing 'manual intervention' but I had never seen anything like this before! It was spectacular. And what was more, when I ran it a second time things got worse. Then I tried to back up the /usr/local partition before restoring this corrupt data and lo, that was corrupt too.

It turned out that our sysadmin had created the /usr/local disk partition in the wrong place on the disk and put it over the top of the alternate sectors partition. By writing to the /usr/local disk I had written all over the alts which were mapped into the user's partition. Oh dear, what a mess.

Solution: rebuild all the partitions so they don't overlap and restore, also buy the sysadmin a calculator.

Moral: always do your sums on the /etc/partitions file very carefully before using mkpart.

----- UNIX-ADM USENET appended at 20:22:10 on 92/10/13 GMT (by USENET at ALMADEN)
From: caa@Unify.Com (Chris A. Anderson)
Organization: Unify Corporation, Sacramento, California

Ok, here's one...

At a company that I used to work for, the CEO's brother was the "system operator".
It was his job to do backups, maintenance, etc. Problem was, he didn't have a clue about Unix. We were required to go through him to do anything, though.

Well, I was setting up a Plexus P-95 to be a news/mail/communications machine and needed to wipe the disks and install a new OS. El CEO requested that his brother do the installation and disk partitioning. He had done this before, so I gave him the partition maps and let him at it. When he was done, everything seemed to be ok. Great, on with the install and setup.

Things went fine until I started compiling the news and mail software. All of a sudden, the machine panicked. I brought it back up and the root file system was amazingly corrupt. After rebuilding things, it all seemed to be fine -- diagnostics all ran fine, etc. So I started again -- this time keeping an eye on things. Sure enough, the root file system became corrupted again when the system started to load.

This time I brought it down and checked everything. The problem? Swap space started at block zero and so did the root file system. ARRRGGGHHHHH!!

Oh yes, the brother still works there.

Chris

-----------------------------------------------------------------------------
From: miles@Chaos.mcs.kent.edu (Roger Miles)
Organization: Kent State University

A year ago we moved to a brand spanking new building. All the equipment was moved by professional movers. The last piece of equipment I wanted moved was the computer (a Zilog s8000, 6ft. tall, with 3 disk drives, cartridge drive and reel tape drive all mounted in one cabinet. It must have weighed 250 to 300 lbs) because I wanted to keep an eye on the movers. Actually, I was hoping they'd drop it so I could get a new computer. Anyway, much to my surprise the movers said they would not move the computer because of the liability. One of my co-workers owned a Ford pickup so we hoisted it up and drove off with me riding in the back hanging on to the Zilog.
It was the longest 15 minute drive I was ever on in my life.

Roger Miles
KSU

-----------------------------------------------------------------------------
From: tjm@hrt213.brooks.af.mil (Tim Miller)
Organization: AL/HRTI, Brooks AFB

This one qualified for Stupid Act of the Month:

All this happened on my sparcII...

I was making room on / because I needed to test run something (which was using a tmp file in, of all places, /var/tmp. I could have recompiled the application to use more memory and/or /tmp, but I'm too lazy for that), so I figured "I'll just compress this, and this, and this..." One of those "this'" was vmunix.

Well, of course the application crashes the machine, and stupid me had forgotten that I'd compressed vmunix, so the damn thing won't boot. checksum: Bad value or some such error. Took me most of the day to figure out just what I'd done to the dang thing. 8)

Moral(s):
1) Never, ever, EVER play with vmunix.
2) Always keep a log of what you do to the root file system.

-----------------------------------------------------------------------------
From: jarocki@dvorak.amd.com (John Jarocki)
Organization: Advanced Micro Devices, Inc.; Austin, Texas

In article ericw@hobbes.amd.com (Eric Wedaa) writes:
>
>The moral(s) of the story here:
[Eric's "Guidebook to Being a Good Paranoid UNIX Sysadmin" Deleted]
>
>>>>Ericw
>(Paranoia is a "Good Thing" when you can really muck things up!)
>--
>Eric Wedaa - eric.wedaa@amd.com | Two more kinds of lies...
>{ames apple uunet}!amd!ericw | Release Dates, and Benchmarks
>Advanced Micro Devices, M/S 167 PO Box 3453 Sunnyvale, CA 94088-3453
>=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Eric,

You left out an important one:

- Never hand out directions on "how to" do some sysadmin task until the directions have been tested thoroughly.
- Corollary: Just because it works on one flavor of *nix says nothing about the others.
'-} - Corollary: This goes for changes to rc.local (and other such "vital" scripties. ----------------------------------------------------------------------------- From: bill@chaos.cs.umn.edu ( Hari Seldon ... psychohistorian ) Organization: University of Minnesota In <1992Oct13.014245.24930@ccu1.aukuni.ac.nz> russells@ccu1.aukuni.ac.nz (Russell Street) writes: >rca@Ingres.COM (Bob Arnold) writes: >> 9) It's a lot less painful to learn from someone else's experience >> than your own (that's what this thread is about, I guess :-) ) >With out trying to wander off the thread tooooo much ... In my >experience the best experiences to learn off are your own :) >I wonder how many stories we have got so far about "I will never >type rm -r /" as root. (And no I have not done that _yet_, but >the day will come :() after a real bad crash (tm) and having been an admin (on an rs/6000) for less than a month (honest it wasn't my fault, yea right stupid) we got to test our backup by doing: # cd / # rm -rf * ohhhhhhhh sh*t i hope those tapes are good ya know it's kinda funny (in a perverse way) to watch the system just slowly go away. bill pociengel ----------------------------------------------------------------------------- From: barrie@calvin.demon.co.uk (Barrie Spence) Organization: DataCAD Ltd, Hamilton, Scotland In article <1992Oct13.014245.24930@ccu1.aukuni.ac.nz> russells@ccu1.aukuni.ac.nz (Russell Street) writes: >rca@Ingres.COM (Bob Arnold) writes: >> 9) It's a lot less painful to learn from someone else's experience >> than your own (that's what this thread is about, I guess :-) ) > >With out trying to wander off the thread tooooo much ... In my >experience the best experiences to learn off are your own :) >I wonder how many stories we have got so far about "I will never >type rm -r /" as root. (And no I have not done that _yet_, but >the day will come :() > My mistake on SunOS (with OpenWindows) was to try and clean up all the '.*' directories in /tmp. 
Obviously "rm -rf /tmp/*" missed these, so I was very careful and made sure I was in /tmp and then executed "rm -rf ./.*". I will never do this again. If I am in any doubt as to how a wildcard will expand I will echo it first. Barrie ----------------------------------------------------------------------------- From: root@trebor.uucp (Bob Stockler) Organization: Bob Stockler rca@Ingres.COM (Bob Arnold) writes: >Morals: > 2) Don't do backups to floppies. Once, Tandy Xenix had the largest installed base of *NIX systems extant. My friend, mentor and guru Bob Snapp and I undertook to write a systematic backup set of shell scripts do what the *NIX programs then available would not do: make a reliable compressed Master Backup, and reliable compressed incremental backups (so 'cron' could do it) to available 8" floppy drives. We've never found that our programs failed. Now, on SCO *NIX systems we prefer CTAR. We've never found it to fail either. ----------------------------------------------------------------------------- From: JRowe@cen.ex.ac.uk (J.Rowe) Organization: Computer Unit. - University of Exeter. UK In article rik@nella15.cc.monash.edu.au (Rik Harris) writes: > I said to myself (being a Friday afternoon...see previous > post) "it's only temporary.../mnt is already being used...I'll mount > it in /tmp". So, I mounted on /tmp/a (or something). This was fine > for a few hours, but then the auto-cleanup script kicked in, and blew > away half of my source (the stuff over 2 weeks old). I didn't notice > this for a few days, though. After I figured out what had happened, > and restored the files (we _do_ have a good backup strategy), > everything was OK. If you're doing this using find always put -xdev in: find /tmp/ -xdev -fstype 4.2 -type f -atime +5 -exec rm {} \; This stops find from working its way down filesystems mounted under /tmp/. If you're using, say, perl you have to stat . and .. and see if they are mounted on the same device. 
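
[Editor's sketch] The "stat and compare devices" check mentioned above can be illustrated in Python rather than perl. This is not from the original post: the function name files_on_same_device is hypothetical, and the sketch only lists candidate files instead of deleting anything. It assumes that a directory on a different filesystem reports a different st_dev than the starting directory, which will not catch every case (a bind mount of the same filesystem, for instance).

```python
import os
import stat


def files_on_same_device(top):
    """Yield regular files under `top`, never descending into other filesystems."""
    top_dev = os.lstat(top).st_dev
    for dirpath, dirnames, filenames in os.walk(top, topdown=True):
        # Prune subdirectories whose device number differs from the starting
        # directory: those are mount points for other filesystems.
        dirnames[:] = [
            d for d in dirnames
            if os.lstat(os.path.join(dirpath, d)).st_dev == top_dev
        ]
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if st.st_dev == top_dev and stat.S_ISREG(st.st_mode):
                yield path
```

Printing the result of list(files_on_same_device("/tmp")) before wiring it to any removal step is the same precaution as echoing a wildcard before running rm on it.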
The fstype 4.2 is pure paranoia.

Needless to say, I once forgot to do this. All was well for some weeks
until Convex's version of NQS decided to temporarily mount /mnt under
/tmp... Interestingly, only two people noticed. Yes, the chief op. keeps
good backups!

Other triumphs: I created a list of a user's files that hadn't been
accessed for three months and a perl script for him to delete them. Of
course, it had to be tested, I mislaid a quote from a print statement...
This did turn into a triumph, he only wanted a small fraction of them back
so we saved 20 MB.

I once deleted the only line from within an if.. then statement in
rc.local, the sun refused to come up, and it was surprisingly difficult to
come up single user with a writeable file system.

AIX is a whole system of nightmares strung together. If you stray outside
of the sort of setup IBM implicitly assume you have (all IBM kit, no non
IBM hosts on the network, etc.) you're liable to end up in deep doodoo.

One thing I would like all vendors to do (I know one or two do) is to give
root the option of logging in using another shell. Am I the only one to
have mangled a root shell?

John Rowe
-----------------------------------------------------------------------------
From: kochmar@sei.cmu.edu (John Kochmar)
Organization: The Software Engineering Institute

A long time ago, back when the Apollo 460 was around and I had just
graduated from college, I had the good fortune of being one of two
administrators in charge of making a cluster of 460's a part of our
environment. One of the things I was tasked with was getting them onto our
network.

Well, I was young, I had the manuals, and a guy from Apollo tech support
was there to help. How hard could it be, right? Well, we got out the
manuals, configured the system (relying heavily on the defaults), and
within 2 hours, we had that puppy on the network. Life was good.
About 3 hours later, I get a phone call from a systems programmer /
developer from CMU campus (the SEI is a part of CMU, and we are on their
network.) He told me that if I didn't take the &%@*ing Apollo off the
network, he was going to do hurtful things to me physically. Life was not
so good.

As it turned out, in default mode, the Apollo answered every address
request it saw, even if it was not the machine the request was for. Kind
of a "hey, I'm not who you are looking for, but I'm out here in case you
decide you'd rather talk to me." Apollo considered this a feature, and
they took advantage of it in their OS environment.

However, one of the earlier versions of a heavily network dependent OS
developed at CMU considered this a bug. The OS would issue a request, and
expect only the machine it was looking for to answer it. Of course, it
would assume that if it got an answer to its request, it must be the
machine it expected to talk to. It didn't look at the address of the
answer it got, so if it wasn't the correct machine, most of the time the
OS would hang or panic.

The outcome? Over about 3 hours' time, more and more of campus was talking
to our little 460, which had just enough muscle to keep up with the
requests. By the time campus figured out what was going on, we had an
Apollo merrily answering the network requests for hundreds of machines
(the ones that were still up, that is.) The result was the part of campus
that used the new OS going to hell in a bucket, one very busy Apollo 460,
and one very warm ethernet.

Well, we turned off the Apollo, configured it not to chat to all of campus
before putting it back on the ethernet (this time, we did it while talking
with campus, making sure we didn't cause the same problems we did the last
time -- we didn't have a packet monitor at the time), and campus changed
their OS to look at the request response before assuming it was the
correct one. I also learned to think very carefully about default values
before using them.
John
-----------------------------------------------------------------------------
From: djd@csg.cs.reading.ac.uk (David J Dawkins)
Organization: University of Reading

weave@bach.udel.edu (Ken Weaverling) writes:
>A friend of mine called me up saying he no longer could log into his
>system. I asked him what he had done recently, and found out that he
>thought that all executable programs in /bin /usr/bin /etc and so on
>should be owned by bin, since they were all binaries! So he had
>chown'ed them all.

Oh you bastards. I was hoping that a thread like this would never appear,
because if it did, I knew I would have to confess. Oh well...

About a year back, I was looking through /etc and found that a few system
files had world write permission. Gasping with horror, I went to put it
right with something like

    dipshit# chmod -r 664 /etc/*

(I know, I know, goddamnit!.. now)

Everything was OK for about two to three weeks, then the machine went down
for some reason (other than the obvious). Well, I expect that you can
imagine the result. The booting procedure was unable to run fsck, so
barfed and mounted the file systems read-only, and bunged me into
single-user mode. Dumb expression..gradual realisation..cold sweat. Of
course, now I can't do a frigging chmod +x on anything because it's all
read-only. In fact I can't run anything that isn't part of sh. Wedgerama.
Hysteria time. Consider reformatting disks. All sorts of crap ideas.
Headless chicken scene. Confession. "You did WHAT??!!" Much forehead
slapping, solemn oaths and floor pacing.

Luckily, we have a local MegaUnixGenius who, having sat puzzled for an
hour or more, decided to boot from a cdrom and take things from there. He
fixed it.

My boss, totally amazed at the fix I'd got the system into, luckily saw
the funny side of it. I didn't. Even though at that stage, I didn't know
much about unix/suns/booting/admin, I did actually know enough to NOT use
a command like the one above. Don't ask. Must be the drugs.
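The chmod story above turns on one detail: mode 664 sets every permission
bit at once, so it strips execute bits along with world write. A narrower
approach, sketched here with made-up helper names (list_world_writable and
clear_world_write are illustrations, not anything from the original post),
is to list the offending files first and then clear only the one bit that
caused the alarm.

```shell
# Hypothetical helpers, not part of the original post.

# Step 1: list regular files that anyone may write to. Changes nothing.
list_world_writable() {
    find "$1" -xdev -type f -perm -0002 -print
}

# Step 2: clear the world-write bit and nothing else, so owner, group
# and execute bits are left exactly as they were.
clear_world_write() {
    find "$1" -xdev -type f -perm -0002 -exec chmod o-w {} +
}
```

Reviewing the output of "list_world_writable /etc" before running
"clear_world_write /etc" keeps the change limited to files that actually
had the problem.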
BTW, if my future employer _is_ reading this (like they say he/she might),
then I have certainly learned tonnes of stuff in the last year, especially
having had to set up a complete Sun system, fix local problems, etc :-)

Anyone else got a tale of SGS (Spontaneous Gross Stupidity) ?

-dave

"I'm much better now, honest.. no, really.. hey what's this button
doooooooooOOOOOO..."

--
Anatoly Ivasyuk @ Computer Science House @ Rochester Institute of Technology
anatoly@nick.csh.rit.edu || axi0349@ultb.rit.edu || axi0349@ritvax.rit.edu
You say you haven't heard of CSH? You will...