Category Archives: General

Tips on Using Ganeti to Manage a KVM-based Virtual Machine Cluster on Ubuntu Jaunty Jackalope 9.04

Update:  I apologize for not updating this post.  I struggled with this for quite a while before making real progress, which I’ll try to detail.  A few key points:

  1. debootstrap doesn’t install a bootloader, so even if you are using KVM, you need to specify a kernel on the parent/host and a root disk device (as it appears inside the VM) as part of the config.   Make sure that the kernel matches the modules installed by debootstrap, or you’ll have lots of other problems.
  2. The default use of virtio for the disk interface causes problems with the KVM version that ships with Ubuntu.  The virtual machine’s BIOS may not detect the disk.  Specify IDE for less hassle.
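
One way to sanity-check the first point before booting is to confirm that the instance’s root filesystem has a modules directory for the kernel you plan to boot.  This is only a sketch: the function name kernel_matches_modules is made up for illustration, and it assumes the instance’s root filesystem is already mounted somewhere on the host.

```shell
#!/bin/sh
# Hypothetical helper, not part of Ganeti or debootstrap.
# Succeeds only if the instance root contains modules for the given kernel.
#   $1 = directory where the instance's root filesystem is mounted
#   $2 = kernel version to check (defaults to the host's running kernel)
kernel_matches_modules() {
    image_root=$1
    kernel_version=${2:-$(uname -r)}
    [ -d "$image_root/lib/modules/$kernel_version" ]
}
```

For example, running kernel_matches_modules /mnt/kvm-image || echo "kernel/modules mismatch" would print a warning when the image has no modules for the host’s running kernel.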

I’ve hacked up the ganeti-os-debootstrap scripts to use Ubuntu’s vmbuilder script to create Ubuntu VMs that do have a boot loader.  I need to do a little cleanup and then I’ll share my work.

————————————

We are using a number of virtual machines to support our efforts at work.  We’ve been running these on VMware Server on some Linux servers for the past year, but I’m looking at moving on from there to something that is based on more open software.   I wanted to share some of the reasons behind the choices I made, and how I got over some of the obstacles I encountered with my choice of Ubuntu Jaunty Jackalope (9.04) for my OS, KVM for virtualization and Ganeti to manage the virtual machines.  This won’t be exhaustive, but hopefully it will help other people.

I’d been eyeing Ganeti, a package for managing multiple Xen or KVM virtual machines running on a cluster of hosts.  I was particularly intrigued because Ganeti went so far as managing redundant storage via DRBD.  Still, I took a look at Eucalyptus because it implements significant portions of the Amazon Web Services API for provisioning system instances and persistent storage.  I was even more intrigued when I discovered that they supported both S3 (a key-value store) and EBS (a block-based storage layer).  I ended up choosing Ganeti though.  Eucalyptus required me to configure a shared, highly available storage layer, something that Ganeti largely handled for me.  More importantly, a limitation in some of the software they integrated to provide EBS meant that I couldn’t run instances that used EBS volumes on the same machine that was providing the EBS storage service, which wasn’t acceptable for the small 2-4 host cluster I planned on building.

I also had the choice of the Xen or KVM hypervisors.  I chose KVM because it is supposed to be better supported by Ubuntu and, in the long run, looks like it will become the favored choice of Red Hat as well.

Installing Ganeti:

There is a version of Ganeti packaged for Ubuntu, but it is an older version that lacks the features that most interest me, which are only available in v2.0, so I worked from the Ganeti 2.0 installation document.  I ran into a few problems because it is skewed towards using the Xen hypervisor and Debian, while I wanted to use the KVM hypervisor on Ubuntu.

The first issue I hit was in trying to install the DRBD prerequisite.  DRBD mirrors block devices over the network, providing an important piece of the fault tolerance and high availability puzzle that Ganeti builds on.  Ganeti requires a more recent version of DRBD.  Earlier versions of Ubuntu and Debian package this version, but Jaunty only has a package for an earlier version of DRBD.  Stranger still, it has utilities for managing the more recent version.  With a little digging, I found that the modules for DRBD8 are actually packaged with the server kernel.  So, my first problem was no problem at all.

Initializing and Running Ganeti:

The next issue I hit is with changes Jaunty made to the default python path, and the fact that the implications of those changes hadn’t propagated everywhere they needed to go.  The result is that once I installed Ganeti, I got a python import error when trying to run ‘gnt-cluster init.’ My solution was to move ‘ganeti’ from  ‘site-packages’ to ‘dist-packages.’
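
As a rough sketch of that move, here is a small helper that relocates a package directory and refuses to clobber an existing copy.  The function name relocate_package is invented for this example, and the directories you pass in are assumptions to check against your own system, since the exact paths depend on the Python version and the prefix used at install time.

```shell
#!/bin/sh
# Hypothetical helper: move a Python package directory from one
# packages directory to another, refusing to overwrite an existing copy.
#   $1 = package name (e.g. ganeti)
#   $2 = directory the package was installed into
#   $3 = directory Python actually searches
relocate_package() {
    pkg=$1
    from_dir=$2
    to_dir=$3
    if [ ! -d "$from_dir/$pkg" ]; then
        echo "relocate_package: $from_dir/$pkg not found" >&2
        return 1
    fi
    if [ -e "$to_dir/$pkg" ]; then
        echo "relocate_package: $to_dir/$pkg already exists" >&2
        return 1
    fi
    mkdir -p "$to_dir" && mv "$from_dir/$pkg" "$to_dir/$pkg"
}
```

A call would look something like relocate_package ganeti /path/to/site-packages /path/to/dist-packages, run as root, with both paths replaced by wherever the install actually put things.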

The next problem I ran into is that I wasn’t using Xen.  I knew enough the first time through not to bother creating symlinks for a Xen instance kernel, but I didn’t really know what to do instead.  In trying to figure that out, I realized that I should have specified that the default hypervisor be kvm when I initialized the cluster.  Even though Xen wasn’t installed, it defaulted to Xen.  So, I had to destroy the cluster and initialize a new one:

gnt-cluster init --default-hypervisor=kvm myclustername

Default Kernel for New Instances:

When I first tried to create a new instance, I got this:

# gnt-instance add -t plain -s1G -n vmhost3 -o debootstrap vm1.ournet.net
Failure: command execution error:
Hypervisor parameter validation failed on node vmhost3: Instance kernel '/boot/vmlinuz-2.6-kvmU' not found or not a file

It looks like the solution to this problem is to adapt the instructions for creating symlinks for a Xen instance kernel, and link /boot/vmlinuz-2.6-kvmU to my current server kernel.  I have a feeling that I’ll be using a more stripped down kernel once I figure out how this all fits together.
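
A minimal sketch of that symlink step follows.  The helper name link_instance_kernel is hypothetical; the link path is the one named in the error message above, and the kernel you point it at is your choice.

```shell
#!/bin/sh
# Hypothetical helper: point the kernel path Ganeti expects at a real kernel.
#   $1 = kernel image to boot instances with
#   $2 = path Ganeti looks for (the one named in the error message)
link_instance_kernel() {
    kernel=$1
    link=$2
    if [ ! -f "$kernel" ]; then
        echo "link_instance_kernel: $kernel is not a file" >&2
        return 1
    fi
    # -f replaces a stale link; -n avoids following an existing link to a directory
    ln -sfn "$kernel" "$link"
}
```

Run as root, link_instance_kernel "/boot/vmlinuz-$(uname -r)" /boot/vmlinuz-2.6-kvmU would link the expected name to the host’s running kernel, assuming the kernel image is installed under /boot with that name.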

OS Support Files for Creating New Instances:

The Ganeti install document covers installing the OS support files, but it seems the default configuration option puts the files in ‘/usr/local/share/ganeti/os,’ rather than ‘/srv/ganeti/os.’ The README that comes with the support files suggests more appropriate configuration options:

./configure --prefix=/usr --localstatedir=/var \
    --sysconfdir=/etc \
    --with-os-dir=/srv/ganeti/os
make && make install

That seems to do the trick:  gnt-os list includes debootstrap, creating a new instance seems to work as expected, and as I type this, it seems to be starting up!

Connecting a Console to a Running Instance:

When I first ran ‘gnt-instance console instancename’ I got an error that /usr/bin/socat was missing.  Installing it with ‘aptitude install socat’ fixed that error, but the console doesn’t seem responsive, and a kvm process has been using 100% of one core for about 5 minutes now.

Accessing an Instance’s Disks:

As part of my debugging, I wanted to try to access the disk image of the instance to see if the log files showed anything.  This was a challenge in and of itself.  From the Ganeti documentation, I thought that running ‘gnt-instance activate-disks instancename’ would give me the name of a device I could mount, but trying to mount that device generated an error:

mount: wrong fs type, bad option, bad superblock on /dev/mapper/xenvg-a505f631--72fe--4100--a7e5--b3efae6d8082.disk0,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

A little digging and I learned that the virtual disk actually had partitions, which needed to be mapped before I could mount the partition.

# gnt-instance activate-disks vm1
vmhost3.office.alki.com:disk/0:/dev/xenvg/a505f631-72fe-4100-a7e5-b3efae6d8082.disk0
# kpartx -av /dev/xenvg/a505f631-72fe-4100-a7e5-b3efae6d8082.disk0
add map xenvg-a505f631--72fe--4100--a7e5--b3efae6d8082.disk0p1 (252:7): 0 2088449 linear /dev/xenvg/a505f631-72fe-4100-a7e5-b3efae6d8082.disk0 1
# mount -t ext3 /dev/mapper/xenvg-a505f631--72fe--4100--a7e5--b3efae6d8082.disk0p1 /mnt/kvm-image
# ls /mnt/kvm-image/
bin  boot  dev  etc  home  lib  lost+found  media  mnt  opt  proc  root  sbin  selinux  srv  sys  tmp  usr  var
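
The device name under /dev/mapper is not the same as the one activate-disks prints, because device-mapper doubles every hyphen in the volume group and logical volume names.  Here is a small sketch that derives the partition path from the logical volume path.  The function name mapper_partition_path is made up, and it assumes the volume name ends in a digit, which is the case where kpartx puts a "p" before the partition number, as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: derive the /dev/mapper path that kpartx creates
# for a partition of an LVM logical volume.
#   $1 = logical volume path, e.g. /dev/xenvg/<uuid>.disk0
#   $2 = partition number
# Assumes the LV name ends in a digit, so kpartx inserts a "p" separator.
mapper_partition_path() {
    lv_path=$1
    part_num=$2
    lv_name=$(basename "$lv_path")
    vg_name=$(basename "$(dirname "$lv_path")")
    # device-mapper doubles every hyphen inside the VG and LV names
    vg_escaped=$(printf '%s' "$vg_name" | sed 's/-/--/g')
    lv_escaped=$(printf '%s' "$lv_name" | sed 's/-/--/g')
    printf '/dev/mapper/%s-%sp%s\n' "$vg_escaped" "$lv_escaped" "$part_num"
}
```

When you are done poking around, the reverse steps would be to umount the partition, remove the mappings with kpartx -d on the same logical volume, and then run gnt-instance deactivate-disks for the instance.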

From checking the log directory, it is clear that whatever is going on, it’s never getting to the point where it can write to a log file.

Hmmmm, maybe it has something to do with the fact that there is no kernel or initrd?  Could that be it, maybe? Hmmm.

UPDATE: As of this writing, I still don’t have an instance running successfully.  I’m going to spend a little more time trying to get it to work and then probably cut bait and use the basic VM management tools Ubuntu provides.

The Ganeti community seems pretty thin. The Google group has had undealt-with spam for the last few days, and an appeal for help I posted hasn’t drawn any response.  I found an IRC group on Freenode, but there are only two other people in it, and they may well be dead.  It’s too bad, because it seems like cool software.  I guess the other option is to try using Xen instead of KVM, and/or try using the packaged version in the universe repository.

Google’s ChromeOS Doesn’t Have to be Popular to Matter

This week Google confirmed a long-running rumor that they were working on their own operating system when they announced their ChromeOS.  Most of the resulting commentary I’ve seen has missed the mark.  A lot of tech journalists and bloggers focused on the Google / Microsoft rivalry.  Dave Winer found that predictable narrative to be a boring one, and dismissed it for the same reason the journalists seemed to find it interesting: because it was yet another fight between two big tech companies. Ultimately ChromeOS didn’t interest him because the Chrome browser didn’t support his favorite browser extension, a bookmark synchronization tool, and because, being Linux based, it wouldn’t run Frontier, the desktop software he wrote that he uses to develop and run most of his websites. On Slate, Farhad Manjoo criticized the move with an article titled “Five Reasons Google’s new Chrome OS is a Bad Idea.”

Here is the thing, and it is really simple: Chrome and ChromeOS don’t need to become popular for them to do well by Google; they just have to have influence.

It works like this.  Google benefits when more people use the web more often for more activities. They benefit primarily from increased opportunities for advertising revenue, but also they are getting paid for Google Apps.

More people will use the web more often for more activities as:

  • Web applications offer more and more utility and usability
  • Devices that can access the web become more affordable
  • Internet connectivity becomes cheaper and more widespread

I don’t think ChromeOS helps with internet connectivity, unless it includes easy-to-use mesh networking, and even then, it’s not going to make that big a difference, but the effort helps with the other two.

Chrome the browser helps make web applications more useful and easier to use. It has already helped make both performance and robustness a bigger issue in the browser world. Since Chrome published their first performance numbers, both Safari and Firefox have made strong strides of their own on JavaScript performance. I’m not saying that WebKit and Firefox weren’t already working on the problem; the speed with which they responded shows they were. But I think the entry of Chrome has helped accelerate the pace of improvement.  Just this week, the Firefox developers let out some news about their work on a multiprocess architecture like Chrome’s to help with stability.

Chrome the OS also helps make web applications more useful. It has the potential to create an environment where web applications work better with each other, and also with local applications and files. By doing so, Chrome OS also puts pressure on other OS vendors (i.e., Apple and Microsoft) to do a better job of supporting web applications as well.

It also gives them a way to influence the cost of client operating systems, and, by extension, desktop, notebook and netbook computers. Linux may ultimately be an unpopular choice on netbooks, but its presence helped put pressure on Microsoft to keep selling XP and make it available for netbooks at a lower cost.

It would be a mistake to look at the cost issue through the lens of the US or Western Europe. This is really an issue in the developing markets, where computer penetration among “consumers” and small businesses is still quite low.  In those circumstances fewer people think they need to run Office or Photoshop, etc., so compatibility with desktop applications isn’t as important as it is to tech journalists and bloggers. These markets represent a huge opportunity for Google’s advertising and also Google Apps. When computer penetration is low, even pushing the price down $20 could lead to a big bump in the number of people using computers, and that will help drive economies of scale that make the hardware even cheaper, and network effects that increase the relative value of having a computer.

That all this might hurt Microsoft by putting pressure on their prices and revenues is kind of a bonus.

Ditching Webfaction for a Linode VPS

A few months back I moved our websites off of Pair.com to Webfaction in search of better performance.

I still haven’t cancelled my account with Pair, but I’m already planning on leaving Webfaction. The performance is there, but the reliability is less than I’d like.

Our sites were down for something around 12 hours a few months back. I couldn’t even get to their support website to file a trouble-ticket and I never saw any explanation or even acknowledgement of the problem. More often, I’ll get a “bad gateway” error because the backend apache server running my sites failed and hadn’t restarted yet. This has hit my wife particularly hard because it keeps happening in the middle of writing posts for her blog, and she’s lost work.

So, I’m going to get a virtual private server from Linode.com. I avoided this in the first place because I didn’t want to deal with system administration, but truth is, it isn’t going to be that much work, and it’s going to be more expensive. The upside is that I’ll have complete control to tweak things.

Update: Shortly after posting this, someone at Webfaction saw my post and emailed me, offering to help with the bad-gateway problem.  I’ve been giving it a try.  It seems potentially better, but it eats more aggressively into my memory allocation.  We also went back and forth about the downtime I had at the end of May, but I still don’t have an explanation I’m satisfied with.  I am satisfied though that they are going to make their support offering more robust.  I’ll see how it goes.  I’d already paid for a month of service at Linode, so I’ve been tinkering with getting a VPS set up to see how it performs.

Tools For Tracking Breaking News on Social Media

It was exciting and exhausting to try and keep up with the flood of news coming out of Iran in the wake of its disputed election on Flickr, YouTube, and Twitter, among others. My general approach is to search by keyword, sort the results by recency, and then start refining the search by adding or excluding keywords. For example, when I was searching Flickr, I ended up excluding a long list of non-Iranian city names to filter out protests of sympathy from outside Iran.

It ended up being overwhelming though, and I’ve since started relying on other people like Andrew Sullivan (who is back to blogging on a wider range of subjects) and Nico Pitney to filter signal from the noise.  I thought though that I’d take a minute to blog some of the tools I found helpful, and also some thoughts about the challenges I faced, and how to better deal with them.

In the middle of using Flickr to find photos of the first day’s protests, Flickr switched on (for a subset of users, at least) a new search results interface.  Flickr’s search was already pretty nice because it allowed you to filter out specific keywords, sort by date, and even filter by a range of dates.  The new results interface goes further.  First off, it allows you to choose from three different image sizes for the results, which makes it easier to screen photos.  It also includes a sidebar that highlights groups and photographers that might have photos relevant to your search.  Flickr’s search is pretty good; the only incremental improvement I’d like is if they made it easier to narrow, expand, or pan the date range on an existing search result set.  That, and giving me more visibility into the tags in the results, would make it easier to refine my search terms.

I started using Twitter’s search for the #iranelection hashtag, but quickly got overwhelmed by retweets.  I ended up turning to Tweetmeme.  Tweetmeme aggregates links posted to Twitter and then lets you sort by relevance, # of tweets or age.  Even better, it lets you slice the results by the same criteria.  So, you can sort by relevance, but then limit the results to only show links tweeted in the past day and retweeted at least 100 times.

I should say that while I didn’t make use of it, Twitter’s advanced search allows limiting of results by time, and various other criteria that could be useful.  It would be nice if these were surfaced as suggested refinements on the search results page.

I also wanted an easy way to look for photos being posted to Twitter.  There are a few options, but I wasn’t too happy with any of them.  I ended up using Twicsy, which had the advantage of an interface optimized for scanning photos.  The downside was that it only showed photos from the past hour or so, and didn’t seem to include any ranking based on retweets.  Tweetmeme lets you filter your search to images, but it presents the results using their generic UI.  They show thumbnails for some of the results, but the thumbnails are tiny.

This posting is taking longer than I wanted, so I’m going to finish with a laundry list of the issues that I’m still having, accompanied in some cases by ideas of how to fix them.

  • Images, photos, text, and links get repeated.  Often I’ll want to find the early/original expressions.
  • The sites that deal with redundancy at all (like Tweetmeme) just seem to be counting links.  This helps, but really, I think more sophisticated content analysis is needed.  Analyzing links doesn’t help when different people repost the same image to Flickr, or the same video to YouTube, iReport, etc.
  • The first appearance of a given piece of content is important.  It helps establish credibility.
  • Reputation and history of the poster is important.  For example, Flickr photos from people who’d been posting photos from Iran for the past month or more tended to be more credible, and more likely to be original, than those from people who have been in San Francisco for the past year.
  • I don’t always want the most recent information.  I’m not watching this stuff minute by minute.  Sometimes I want to check in after a day or two, so, just letting me sort by recency isn’t good enough.  I need to be able to filter by day, or even hour intervals.

Flickr Photos from Iran

Flickr can be a great place to find photos of breaking news, but it can take some work to filter through things.  I thought I’d share some of the images my wife and I found there, with links back to the originals.

Iran election

Post election_009

20090613.Protest

20090613.Protest

20090613.Protest
Interesting comment by the photographer of the above 3 images. He had to use a proxy to upload them, but later he was able to get through again.

090613151436_clash1

Tehran is on fire

Iran election

Iran elections

I think the following may be republished wire photos, but I’m not sure:
Tehran Protests

Iran elections

Iran election

Tehran Protests

more photo's of todays chaos #iranelection

more photo's of todays chaos #iranelection

more photo's of todays chaos #iranelection

more photo's of todays chaos #iranelection

Some more photo's from yesterday in tehran #iranelection

more photo's of todays chaos #iranelection

more photo's of todays chaos #iranelection

more photo's of todays chaos #iranelection