2008/08/16

Virtually Speaking

Any teaching activity that involves experimental work faces the problem of replicating the appropriate experimental setting for all the students involved. In a biology course, this might mean a complex environment in which several workstations must be set up with a complete set of tools and measurement instruments. In a programming course, the requirement is usually a properly equipped computer (or several of them) with the appropriate programs preinstalled.

The common difficulty in these environments is that all the replicas should start from an identical state. The reason for this strict requirement is to make support easier for the teaching staff: when every student starts from the same well-known point, the situations students end up in are much easier for the staff to analyze.

In the context of experimental environments that rely on a computer, there have been multiple articles on how to provide a fixed, common, well-known environment in which to work. The measures taken to achieve this can go as far as giving every student (or workstation) a freshly installed copy of all the tools. This means students work every day on a computer whose tools were installed just minutes earlier. The reason behind this (apparently) extreme decision is the trade-off between the amount of administration power given to regular users and how easy it is to keep the tools on the computer properly installed. If a laboratory requires experimenting with the installation of new tools, users must be given these privileges. But then, those same privileges can be used to modify or even completely reinstall all the applications. Thus, the solution is to do a fresh install, say, every 24 hours. There is a market for tools that automatically replicate the configuration of a given operating system on a large number of computers over a network. They greatly simplify this administration task, although in most cases this solution is a bit drastic, in the sense that it might take a long time to synchronize a large number of machines (think of a university with 20 labs of 20 computers each, 400 machines in total).

Virtualization technology (a concept dating back to the mainframes of the late 1960s and early 1970s) makes it possible to run one complete computer as a single application inside a second computer. The second computer is usually called the physical machine, whereas the first one is the virtual machine. The trick relies on two mechanisms. First, the screen of the virtual machine is shown as a regular window on the physical machine. Second, every request the virtual machine makes for a resource is passed along to the physical machine.

For example, when the virtual machine needs disk space, it uses a chunk of the physical machine's disk. When it needs a network connection, it uses the physical machine's connection. The same applies to USB devices, the keyboard, the mouse, etc. The keys you type are read by the virtual machine when the window showing the virtual screen is selected.

The crux of this scheme is what is usually called the virtualization layer, which absorbs the complexity of translating every request made inside the virtual machine into the proper request for the physical one. Of course, this translation takes some time, and as a consequence, performance problems may arise.
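To make the idea of translation concrete, here is a toy sketch in Python; it is not how VMware or Xen actually implement it. The guest sees a disk made of numbered blocks, and a tiny "virtualization layer" turns each block number into a byte offset inside an ordinary file on the physical machine. The class name `FileBackedDisk`, the 512-byte block size and the file name `guest.img` are illustrative assumptions.

```python
import os
import tempfile

BLOCK_SIZE = 512  # bytes per virtual disk block (illustrative choice)


class FileBackedDisk:
    """Toy virtual disk: the guest sees numbered blocks, while every
    access is translated into a read or write at a byte offset inside
    an ordinary file on the host (physical) machine."""

    def __init__(self, image_path, num_blocks):
        self.image_path = image_path
        self.num_blocks = num_blocks
        # Reserve the whole image up front; the new bytes are zero-filled.
        with open(image_path, "wb") as f:
            f.truncate(num_blocks * BLOCK_SIZE)

    def _offset(self, block):
        # The "translation" step: guest block number -> host file offset.
        if not 0 <= block < self.num_blocks:
            raise IndexError(f"block {block} out of range")
        return block * BLOCK_SIZE

    def write_block(self, block, data):
        if len(data) > BLOCK_SIZE:
            raise ValueError("data larger than one block")
        with open(self.image_path, "r+b") as f:
            f.seek(self._offset(block))
            f.write(data.ljust(BLOCK_SIZE, b"\0"))  # pad to a full block

    def read_block(self, block):
        with open(self.image_path, "rb") as f:
            f.seek(self._offset(block))
            return f.read(BLOCK_SIZE)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as host_dir:
        disk = FileBackedDisk(os.path.join(host_dir, "guest.img"), num_blocks=8)
        disk.write_block(3, b"hello from the guest")
        print(disk.read_block(3).rstrip(b"\0"))  # b'hello from the guest'
        print(os.path.getsize(disk.image_path))  # 4096
```

Every guest access pays for an extra translation step plus a host file operation, which is a small-scale version of the overhead mentioned above.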

But with current processor speeds, disk sizes, bandwidth, etc., virtualization has emerged as a perfectly valid way to run any system on any hardware platform (at least in theory). One company that has been in this business for quite some time is VMware. Its business model focuses on companies that need to equip a large number of programmers with several virtual machines installed simultaneously. Another area in which virtual machines are very useful is ensuring that a product is compatible across different platforms. Any team member developing a product may try it almost instantly on a variety of operating systems and hardware platforms by using one virtual machine for each.

As a consequence, VMware offers, free of charge, both the tool to create a virtual machine from scratch (which means installing the operating system) and the program to run it on a physical machine (commonly known simply as a player). This means that anybody may create a virtual machine, save it (it is simply a folder with several files), and distribute it under certain conditions (you may distribute your virtual machine, but not the player).

One of the courses I teach requires programming in a Linux environment. Not all students have Linux installed on their personal computers (although the percentage is increasing significantly). The solution I've adopted consists of creating a virtual machine at the beginning of the semester, making sure it has all the required software installed (I even throw a couple of course-specific files onto the desktop), burning it to a DVD, and distributing it among the students (I simply give the DVD to one of them, and they then use the extremely effective distribution channel they are used to).

With this approach, all installation problems are fixed at design time, inside the virtual machine; I have the guarantee that everyone is using the same versions of all the tools (which rules out some occasional annoying bugs); and if anything goes terribly wrong, the machine can always be redeployed by simply replacing the folder containing the virtual machine with its initial copy.
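Because the virtual machine is just a folder, resetting it can be scripted. The following Python sketch assumes a hypothetical setup in which the untouched folder copied from the DVD is kept in one place and the student works on a copy elsewhere. The function name `redeploy` and the file name `course.vmx` are made up for illustration, and the virtual machine should be powered off before running it.

```python
import shutil
import tempfile
from pathlib import Path


def redeploy(pristine_dir, working_dir):
    """Replace the working VM folder with a fresh copy of the pristine one.

    Assumes the virtual machine is powered off, so no file in the
    working folder is in use.
    """
    pristine = Path(pristine_dir)
    working = Path(working_dir)
    if not pristine.is_dir():
        raise FileNotFoundError(f"pristine copy not found: {pristine}")
    if working.exists():
        shutil.rmtree(working)  # discard every change made since the last reset
    shutil.copytree(pristine, working)  # restore the initial state
    return working


if __name__ == "__main__":
    # Demo in a temporary directory; "course.vmx" is a hypothetical file name.
    with tempfile.TemporaryDirectory() as root:
        pristine = Path(root) / "pristine"
        pristine.mkdir()
        (pristine / "course.vmx").write_text("initial configuration\n")

        working = redeploy(pristine, Path(root) / "working")
        (working / "course.vmx").write_text("something went terribly wrong\n")

        redeploy(pristine, working)
        print((working / "course.vmx").read_text(), end="")  # initial configuration
```

Deleting and copying the whole folder is crude but hard to get wrong: whatever the student changed, the result is byte-for-byte the state that was distributed.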

VMware is not the only virtualization engine. For instance, there is a powerful open source project called Xen that offers similar functionality and very competitive performance, although with reduced compatibility.

The use of such technology in a learning environment is very promising. More and more disciplines rely on special tools that need specific operating systems or packages. By confining these requirements within a virtual machine, the special environment can be made immediately available to all students at low cost.

Of course, as you may already imagine, when virtualization meets licensing schemes, things get complicated. In principle, you might purchase a single license for a tool, install it on a virtual machine, and distribute the virtual machine. Vendors have long been aware of this possibility, and licensing schemes have evolved to account for it. If all the tools you use are open source, there should be no problem. As soon as you use a tool with a proprietary license, distributing a virtual machine with that program installed might not work as expected or might get you into trouble.
