The following paper was originally presented at the Seventh System Administration Conference (LISA '93), Monterey, California, November 1-5, 1993.

It was published by USENIX Association in the Conference Proceedings of the Seventh System Administration Conference.

For more information about USENIX Association contact:
1. Phone: 510 528-8649
2. FAX: 510 548-5738
3. Email: office@usenix.org
4. WWW URL: https://www.usenix.org

A Case Study on Moves and Mergers

This is a case study on the issues surrounding the move or merger of a large administration group. The actual case is that of the Silicon Graphics Inc. merger with Mips Computer Systems Inc., but the lessons learned apply generally to such events. The paper goes through the merger process, and the subsequent move of the new Mips engineering organization, pointing out many of the key decisions that were made and the problems that were overcome.

Introduction

In July of 1992 Mips Computer Systems, a small high-tech computer company, was purchased by one of its best customers, Silicon Graphics Inc. At the time Mips consisted of approximately 700 people designing RISC computer processors and systems. The company used over 1400 computer systems spread over approximately 50 networks. By the end of September the six buildings in the Mips facility were empty. All of the remaining employees had been merged into the main Silicon Graphics campus along with their computer systems and labs. The administrators who carried out the move had to reinvent many solutions that other groups had learned before, and all systems administrators can learn from their mistakes.
This paper documents many of the problems and solutions that came out of this project in the hope that it will aid the next such attempt.

The Project

The upper management of Silicon Graphics quickly decided that centralizing all of their projects onto the main company campus was a key to the success of the merger. Mips was in the middle of several important projects, and SGI did not want to lose key Mips employees to a feeling of neglect or mistreatment. At the same time, SGI did not want to seriously affect the engineering schedules of projects continuing within Silicon Graphics itself.

In addition to the merger, Silicon Graphics was in the process of a major restructuring which included a massive reorganization of its campus. This greatly complicated the task of finding locations for the Mips employees moving into the Silicon Graphics facility.

One large part of the project was the creation of a new company which contained all of the processor and compiler development for the Mips architecture. This included opening a new building on the SGI campus containing a new network design, offices, labs, and a computer room. Beyond the physical plant for this new company, Mips Technologies, Inc., there was the creation of a new sub-domain under Silicon Graphics, mti.sgi.com, with its own mail, news, and name services. This new sub-domain needed to work within the SGI domain in order to facilitate joint engineering projects, but also had to remain entirely separable logically.

A major part of the project was the incorporation of nearly 700 new Unix accounts into the SGI name-space. Over 200 of the users from Mips had to change their login names, and nearly all of them had to change their user IDs to fit into the flat name-space on the SGI campus. This required changing the ownership of millions of files on the Mips computer systems that were moving to the new site.
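
The ownership change described above can be illustrated with a short sketch. This is not the program the move team wrote; it is a minimal Python example, assuming a hypothetical table that maps each old user ID to its new one, and it separates planning from applying so that the list of changes can be reviewed before any file is touched. The function names and the dry-run flag are illustrative assumptions.

```python
import os


def plan_ownership_changes(root, uid_map, gid_map=None):
    """Return (path, new_uid, new_gid) for every entry under root to remap.

    uid_map and gid_map are hypothetical {old_id: new_id} tables. Each
    entry is examined exactly once, using its ORIGINAL owner, so a chain
    such as 100 -> 200 and 200 -> 300 cannot remap the same file twice.
    """
    gid_map = gid_map or {}
    paths = [root]
    # os.walk does not descend into symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        paths.extend(os.path.join(dirpath, name) for name in dirnames + filenames)

    changes = []
    for path in paths:
        info = os.lstat(path)  # lstat: inspect a symlink itself, not its target
        new_uid = uid_map.get(info.st_uid, info.st_uid)
        new_gid = gid_map.get(info.st_gid, info.st_gid)
        if (new_uid, new_gid) != (info.st_uid, info.st_gid):
            changes.append((path, new_uid, new_gid))
    return changes


def apply_ownership_changes(changes, dry_run=True):
    """Apply planned changes; with dry_run=True only print what would happen."""
    for path, uid, gid in changes:
        if dry_run:
            print(f"would chown {uid}:{gid} {path}")
        else:
            os.lchown(path, uid, gid)  # requires sufficient privileges
```

Restricting the walk to user data directories, rather than starting at the root of the filesystem, and reviewing the dry-run output first are two simple guards against changing the ownership of files that should not be touched.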
Dozens of other projects were simultaneously underway to merge the database systems from the new entities, to deal with separate bug and information tracking systems, to separate the compute resources for the Mips engineering projects, to centralize source and release trees, and much more.

Early Concerns

The immediate concerns after the announcement of the merger centered on the structure of Silicon Graphics, and how the Mips contingent was going to fit into it. At the same time as the merger with Mips Computer Systems, Silicon Graphics decided to carry out a major reorganization of its existing divisions, which delayed the question of Mips until after that had succeeded.

Once the decision had been made to create a separate company containing all of the VLSI and compiler design, many personnel decisions needed to be worked out about which support people would move over as part of the MTI organization, and which would be merged into other portions of the Silicon Graphics engineering department. The members of the Mips support group were all expected to apply and interview for positions in Silicon Graphics at the same time that they were trying to support projects at Mips, design the domain for MTI, and learn the new operating system and hardware platforms in order to train the Mips engineers.

In designing the new domain the systems administrators from Mips needed to know how the hardware from Mips was going to fit into the compute philosophy at SGI. Were they to move over all of the workstations, or was Silicon Graphics going to attempt to provide engineers in certain departments with an SGI platform to work with? Were they going to bring over the hardware labs from Mips, or were those projects going to be canceled? If they were to be in a new building, how much of the network hardware was going to be brought over from the Mips facility, and how much was going to be set up ahead of time?
To each question there was an answer, but rarely did the answer come before the implementation had begun.

There were also many concerns in the beginning about merging the projects themselves. Both Mips and Silicon Graphics had a compiler project, and they each had VLSI groups. They had two different operating systems groups, and various systems projects. Some managers wanted an immediate merger of the two sets of projects, while others wanted this delayed as long as possible so that current projects would be less affected. For instance, the compiler organization wanted a single group on day one, while the VLSI groups wanted to delay this until the follow-up projects were organized. Each of the merged projects had computer needs that had to be considered separately.

Pre-merger Decisions

Many of the decisions made on how to implement the move had lasting consequences for how things were approached down the road. This is a discussion of some of the more important of these.

Probably the most important decision was that the actual merge of the environments would not happen until the day an engineer moved from one site to the other. This was judged to be the approach that would least affect the projects at either site. The number of employees sitting in Mips buildings and working on projects at SGI was relatively small, as was the reverse, and these employees could maintain separate accounts in the opposite domain.

The consequences of this decision were vast. It meant that the user ID for an account would not change until the day the user moved. Real work over NFS was disallowed across the boundary because it would inevitably cause confusion in the UID/GID space. It also meant that the domain name would suddenly switch on move day from mips.com to mti.sgi.com or some other domain, and all of a user's mail had to be forwarded.
It also meant that a small number of machines would have to appear to be in both domains for a short time in order not to break dependencies within certain projects.

Another relatively important decision was that the network layout would be totally redesigned for the new site. This meant that every machine would have to change addresses when it moved, that licenses would all have to change along with the machine, and that network boards would have to be shuffled between machines during the physical move to get reasonable network gateways configured.

A third long-lasting decision was that the mti.sgi.com domain was going to be treated differently from the other domains on campus in order to delay the pain of relearning the environment for as long as possible. This included such things as trying to support both BSD and System V printer spooling to all the printers in the domain, supporting NIS and a centralized rdist distribution for machines, supporting the Mips mail configuration and the mips.com mail aliases indefinitely, and supporting the Mips bugs database. In other domains the old Mips workstations would be reconfigured to run NIS, their mail configuration would change dramatically, their users would have to learn how to use the System V printer mechanism immediately, and many other things would change as well. The Mips engineering managers decided that they would like to delay these changes until after their current projects were completed.

Physical Plant

The new company that grew out of the merged organization, Mips Technologies Inc., was moved into a new building on the Silicon Graphics campus. Planning the physical plant represented a large part of the work in creating the new domain. This is a discussion of some of the decisions made during the creation of MTI, and their results.

One of the early battles was over the existence of a raised-floor computer room.
There were a lot of questions from upper management about the need for special conditions in which to place the MTI computer systems. There was only one other computer room in all of SGI before the Mips arrival, and this was for the MIS mainframe systems. All the rest of engineering just had machines sitting in labs or in people's offices, and managers were under the impression that MTI could follow the SGI engineering example to save space and cost. The problem was that MTI had several hundred server machines of various sizes, and all of the systems administrators on the project knew that there was no alternative to a real computer room to house these machines. It took a great deal of effort to have a computer room installed in the MTI building, and to prove that the Mips servers could not be dealt with in any other way.

Another piece of the puzzle was the physical wiring of the computer network. MTI used Ethernet over twisted pair to two central closets. This included more than 600 lines of twisted pair home-run into the computer room. All of the cabling was standard level 5 twisted pair. The cost of the wiring plant was brought up several times, but most of the managers had seen wiring plants done inadequately and did not put up too much of a fight.

The only weakness in the wiring plant as implemented was that the MTI building was not tied heavily enough into the surrounding buildings. The MTI organization quickly spread into the building next door, but there was not enough fiber between the buildings to have more than a couple of nets spanning both buildings and tied in to the MTI computer room.

Very few other networking options were considered for this project. The use of higher-speed networks was deferred until a reasonable alternative was available, but the wiring plant was designed to support the more promising networking possibilities.
Other physical plant questions that were raised included the amount of lab space required for each of the engineering and support groups, the locations for printers, copiers, and other centralized resources, and, of course, the layout of offices. These were primarily political questions that were simply decided by committee in a heavily attended meeting, but they required a systems administration presence, and cut into the time for other preparations.

Move Day

The actual physical move was relatively easy, but an enormous number of problems were found at that point, and some pretty creative answers were found for them. The move was done in three phases. In some ways this was nice, but for the most part it made it even more difficult to avoid interrupting the current projects.

Phase one was moving the Silicon Graphics processor group into the new building. This was the simplest of the move phases since this group was already on the Silicon Graphics campus, and was already used to doing things the SGI way. The tough parts of this move were getting the domain mti.sgi.com set up in advance, testing all of the network connections, and bringing up the machines in their new network configuration. Little else had to change for these people to do work.

However, many problems were encountered that had to be addressed. The first was that there were several license servers that did not understand domain names, and did not work if they did not have aliases in the local NIS database. Another was that the SunOS machines expected to receive a non-qualified name from gethostbyname() when they booted in order to set the right hostname for sendmail, while all of the other machines wanted a fully qualified name. A third was that there were some extreme bugs in the routers purchased for the new network implementation: if a machine set up for the wrong subnet was plugged into a network, the routers stopped forwarding packets.
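
The router failure described above suggests checking planned addresses before any machine is plugged in. The following is a minimal sketch in Python, assuming a hypothetical inventory that records each host's planned address and the network segment it will be attached to. The host names, segment names, and addresses (taken from a documentation range) are illustrative and are not from the MTI network.

```python
import ipaddress


def find_misplaced_hosts(hosts, segments):
    """Return (host, address, segment) for hosts whose address is off-subnet.

    hosts:    hypothetical {hostname: (planned address, segment name)}
    segments: hypothetical {segment name: subnet in CIDR notation}
    """
    problems = []
    for name, (address, segment) in sorted(hosts.items()):
        network = ipaddress.ip_network(segments[segment])
        if ipaddress.ip_address(address) not in network:
            problems.append((name, address, segment))
    return problems


# Illustrative inventory: cad2 is configured for the subnet used by lab-b
# but is scheduled to be plugged into lab-a.
segments = {"lab-a": "192.0.2.0/25", "lab-b": "192.0.2.128/25"}
hosts = {
    "cad1": ("192.0.2.10", "lab-a"),
    "cad2": ("192.0.2.200", "lab-a"),
    "sim1": ("192.0.2.130", "lab-b"),
}

for name, address, segment in find_misplaced_hosts(hosts, segments):
    print(f"{name}: {address} does not belong on {segment}")
```

A check of this kind only validates the plan; it cannot catch a machine that is physically plugged into the wrong outlet, which is one more argument for carefully labeled cables and printed network maps.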
In addition to the technical difficulties there was a lack of experience with the Silicon Graphics systems to overcome. Many of the problems in the initial move arose because the people on the move team just didn't know where to find things in the Irix operating system, or how to jumper boards that had to be reconfigured. Many of the default configuration decisions were different in Irix, so much of the automation did not work as planned.

Phase two was moving the compiler group into the new building. The compiler group was made up of people from Mips and from SGI, and created considerable difficulty. The biggest issue was that the Mips compiler group and the Mips operating systems group were sharing source trees and working together very closely, and wanted to be able to work as usual without too much effort until the tapeout for RISC/os. This meant that the user IDs for these compiler people were not changed at this point, and they continued to use the mips.com domain for their machines. The named configuration was heavily massaged so that every machine at Mips appeared to be in the mti.sgi.com domain when named was queried from the Silicon Graphics side of the pipe, and every machine in MTI appeared to be part of the mips.com domain when viewed from the Mips side. This made it possible to hide the fact that source machines were moving away from Mips without changing hundreds of links and breaking dependencies.

Another interesting part of this move was that all of the customization was automated by a script that was executed on bootup at the new site. This script set up the new host address, and copied over a large list of configuration files. The lesson was quickly learned that what these types of scripts should do is copy over a new script, and then just execute it. The customization script changed about ten times during the afternoon as machines were brought up and problems were found.
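
The lesson about boot-time scripts can be sketched as a small, stable stub whose only job is to fetch the current customization script from a central location and run it, so that fixes made during the day reach every machine brought up afterwards. The following Python sketch is an illustration only, not the script used at MTI. The fetch step is passed in as a hypothetical callable because the transport (a remote copy, a shared filesystem, and so on) is site specific, and the fetched script is assumed here to be a Python script.

```python
import os
import subprocess
import sys
import tempfile


def run_latest_customization(fetch_script, args=()):
    """Fetch the current customization script and execute it.

    fetch_script is a hypothetical callable returning the script source as
    bytes. Returns the script's exit status and captured standard output.
    """
    source = fetch_script()
    fd, path = tempfile.mkstemp(suffix=".py")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(source)
        # Run the freshly fetched script in a separate interpreter process.
        result = subprocess.run(
            [sys.executable, path, *args],
            capture_output=True,
            text=True,
        )
    finally:
        os.unlink(path)
    return result.returncode, result.stdout


if __name__ == "__main__":
    # Stand-in for a real fetch from the central host.
    def demo_fetch():
        return b"import sys\nprint('configured', sys.argv[1])\n"

    status, output = run_latest_customization(demo_fetch, ["host42"])
    print(status, output.strip())
```

Because the stub never changes, it can be installed on every machine well before move day, while the script it fetches can be corrected as often as needed.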
The third phase of the move was the largest and most complicated. In this phase they moved all of the Mips VLSI groups, and revisited all of the machines from the Mips compiler group to change the domain to mti.sgi.com. Nearly five hundred machines were moved in this one phase, which made it a logistical nightmare. But, thanks to the earlier efforts in writing automation scripts to do most of the work, this was not too overwhelming. For most machines the customization was done almost entirely by the script, and all that the people on the move team had to do was run around and turn machines on, and make sure that they came up and ran the script without errors.

There were still some problems in this move that made things difficult. The biggest problem came about because of some bugs, on some machine architectures, in the program that was written to remap the UIDs, and more importantly because of a few typos in the fix, which resulted in large numbers of OS files getting their permissions and ownerships changed.

Another problem that came up at this point was that they had chosen this time to switch the mips.com MX record over to a machine at SGI, and the new mail gateway had not been correctly configured. Two days of bounced mips.com mail resulted, which annoyed many of the Mips engineers when they came back to work.

Another issue that came up at about this time was that many tools internal to SGI did not deal correctly with subnetting, which made it very difficult for the programmers who had moved into MTI to do any reasonable work. Of course, much of this had to do with the fact that the software people in MTI were upgrading to new alpha versions of the Irix operating system at the same time as the systems administrators were trying to deal with move problems.

A final interesting issue came about primarily due to network design by committee.
The systems administrators in MTI did not fully understand the network traffic flow for VLSI work, so they called meetings with CAD engineers in Mips to try to lay out a reasonable network for the new site. In the meetings it was decided that the way the engineers were designing, and their data location model, was going to have to change anyway, so they might as well plan for the new model instead of changing things six months after getting into the building. The problem with such choices is that changing the network layout is easier than getting people to change how they work, and the transfer to a new compute model took months, with all of the heat from network problems during that transition landing on the systems administration group.

Politics

The biggest problem in dealing with a large merger proved to be the merger of culture and organizational structure. A couple of good examples of this came out during this merger, and in the months following it.

The most difficult issue for the systems administrators to handle was the change in the systems administration model itself. Mips had a centralized systems administration group that handled all aspects of administration. Silicon Graphics was driven by project: whatever projects felt they needed help hired a person to do support for their group, and the rest just had engineers supporting their own resources. Mips had central servers and dataless workstations that had all of their files updated centrally by rdist. Silicon Graphics followed a peer model where people had their home directories, and all of their important data, on their workstations, and all of the machines on the network were highly customized by the people who used them. Mips followed relatively strict rules for internal security of the machines, while SGI had the iron door philosophy where the gateways are very secure, but all the rest of the machines are basically open.
Mips systems administrators maintained the engineering hardware, while the customer service department maintained the hardware platforms on the SGI campus. Plus, of course, there were about two hundred other major differences in administration philosophy.

The structure changes were also hard to get used to. Mips had a very simple management hierarchy. People worked together well, but each person still had one boss, and had to please a single individual. SGI has more of a matrix management structure, where each person has a boss, but also reports to any other manager whom he directly influences. This might be reasonable for an engineer, but for a support person this means that an individual may have a dozen indirect managers.

Mips had a single domain, and a flat name-space throughout the company maintained by a single group within the company. Silicon Graphics, however, is highly subdomained, and the interaction between these domains can be very difficult to understand. For instance, the MTI systems administrators can plan their networks, but they have almost no say in the implementation of the plan. The network hardware and wiring are all handled by another group in a different domain. Also, an engineer can have accounts in multiple domains, and they are each treated as separate accounts with little automatic updating of configuration. If a user wishes to change his password he must change it separately in each of the domains where he has accounts.

One other issue that made this merger more difficult than it should have been was that employees who should have been concerned with administering the move had to spend large portions of their time trying to locate a permanent position within the new organization.
Many of the people on the move team were temporary employees who had no assurance of having a job after the move was completed, and nearly all of them had to work on finding a comfortable new position in Silicon Graphics, and on proving their capabilities to a new set of managers.

Conclusions

The merger of Silicon Graphics Inc. and Mips Computer Systems Inc., and the subsequent move, was a beast unto itself, but there are many lessons that can be drawn from the trials involved that will help in other such instances. This is a summary of some of the clear examples.

The first hint is to automate as much as possible, but assume that the script and the configuration files will change during the move. During the MTI move almost none of the dozen or so configuration files that were copied around went through the entire process without change.

The second is to never assume that the people who set up the machines will know anything about systems administration. Due to the enormous scale of most moves, the systems administrators typically end up doing only a small part of the work on the day of the move. Everything should be set up so that the machines just need to be plugged in and turned on for any changes to take place.

A third would be to make sure that all machines trust at least one host at a central location during the move. All of the machines within MTI were set up to trust the domain master mti.mti.sgi.com during the move so that changes in the configuration could be easily pushed around from a central host. This is especially important if the move is multi-phased, since configuration files are almost sure to change as each step is achieved.

Another hint would be to carefully lay out the network maps well in advance of the move. Every team member should have a clear map, containing every host, so that they can easily see how things fit together. Many of the initial problems on move day will be connectivity related.
Problems such as a misconfigured gated on a server that ceases sending out route changes an hour into setup are common.

A fifth point is that order is important in bringing the machines down, and in bringing them back up. This is a relatively obvious point to a systems administrator, but not to a mover or a facilities person. If the script that runs when a machine comes up cannot reach a host that it wants to copy files from, then bad things happen.

Another obvious point, but one that is often neglected in the panic of a move, is that everything should be documented, and hard copies should be available. The host file, a network map, a copy of each of the configuration files, directions for setting things up, etc. should be printed out and given to each person on the move team. The basic point is "kill a tree; save a systems administrator". And, as part of the same point, all cables, outlets, etc., should be carefully labeled as things are set up.

As difficult as moves and mergers are for systems administrators, it is important to remember that every person in the company, or building, is being affected, and that calm heads prevail. Stress does interesting things to people, and it should be considered in the overall plan. Planning to have meals taken care of in advance, or staggering schedules, can help more than all the technical planning in the world.

And, in the same vein, everyone should be prepared for the possibility of catastrophic failure. If the supported engineers have unrealistic expectations of being able to come into work the day after the move and have everything work perfectly, then conflict is inevitable.

Another idea is to warn all of the maintenance contractors that you are planning to move hardware so that they can prepare a stockpile of parts that commonly fail in a move. Some maintenance agreements have special clauses with respect to moving hardware that necessitate the contractor's presence, and possibly additional payment.
Set up a report mechanism in advance. Leave a form for reporting move problems on each chair, and then deal with the problems as people turn the forms in, checking each one off as it is done. Giving a clear impression that problems are being addressed is very important.

An important hint is that there is a priority to the services that must be up quickly, and without fail. The hottest topics in MTI at the time of the move were news and mail. Users were relatively happy if they could come into work, sit down at their workstation, and read news, even if they were unable to work on their projects.

Don't neglect the little things. A common reaction to the stress of a large move is to ignore the small problems in order to concentrate on the major issues. To an engineer there are no minor problems, and the fact that his workstation is facing the wrong direction is just as important as the fact that another system can't talk on the net. It is well worth having a person dedicated to these types of problems.

The other people in the building want to help, and often feel powerless sitting around in their offices all day watching systems administrators jogging back and forth down the hallway taking care of problems. It is a common impression that engineers, or managers, would only confuse the issue and make things more difficult in the long run, but there are things that an engineering manager can do to be helpful. Putting on a barbecue on move day, or running to the store to get power strips, are good examples.

A good thing to remember is that the administrators should be single-minded about making the move successful. None of these people should be allowed to worry about anything outside of the task at hand. The added pressure of having to locate their next job, or their next meal, should be removed.

There should be a central point of communication and status for the move team.
At Mips this was a giant whiteboard with a constantly maintained list of the move issues that needed to be dealt with, and their current status.

Any third-party orders that need to be sent out should be taken care of very early. This includes primarily hardware orders for wiring, network equipment, new machines and printers, etc. But it also includes things like licenses that may have to be migrated. Most third-party vendors have trouble accurately estimating delivery times.

Always remember, this is only a job. Any major change in a company is a carefully measured risk, and falls back on higher management. Moves and mergers are among the most difficult and stressful of all systems administration projects, but they can also be among the most fun.

John Schimmel was educated at the University of California, San Diego, where he was introduced to the world of systems administration. He has been doing systems administration work for six years, and is currently a Senior Programmer/Analyst for the Engineering Computer Services group of Mips Technologies Inc. Reach him via U.S. Mail at Silicon Graphics Inc. M.S. 10U-178; 2011 N. Shoreline Blvd.; Mountain View, CA 94039. Reach him electronically at jes@sgi.com.