>> I am just looking around to find what is the commonly used VM
>> Hypervisor in GridPP group (i.e Xen, KVM or etc)
> KVM without a shadow of a doubt. It works, it's easy, and it's
> in SL5 as standard issue. You'd need a good positive reason to
> go for anything else these days.
I agree with this. I would however add that while KVM is good,
in other contexts I have had good experiences with Xen with a
paravirtualized kernel, as paravirtualization can have reduced
overheads compared to full virtualization (even with AMD or
Intel hardware virtualization assist).
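As a quick check, whether a host CPU offers the hardware assist
at all can be seen from the standard Linux cpuinfo flags (a
trivial sketch):

  # vmx = Intel VT-x, svm = AMD-V; no match means no HW assist,
  # which makes Xen paravirtualization even more attractive
  egrep 'vmx|svm' /proc/cpuinfo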
Many WLCG host types are network and disk intensive, while VMs
work better for memory- and CPU-oriented workloads that do not
involve device virtualization. Xen also allows moving running
VM images between hosts, which may be useful (it was designed
to implement the XenoServers "cloud").
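For example, with the classic Xen toolstack moving a running
guest is a one-liner (a sketch; the guest and target host names
are made up, and the target must have xend relocation enabled):

  # live-migrate the running guest 'cream01' to 'xenhost02'
  xm migrate --live cream01 xenhost02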
As to other full VM systems, a pet peeve is that having had to
deal with a system based on VMware Server 2.0/GSX I found that
I really dislike it (buggy, big limitations, high overheads, no
longer maintained much). There was little else available when
it was installed, though. I had better previous experiences
with VMware ESX, but I find the VMware system-maintenance
infrastructure annoying to use (I preferred editing ".vmx"
files directly), but then I don't particularly like GUIs.
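To illustrate, editing a ".vmx" by hand amounts to maintaining
a handful of key = value lines like these (an illustrative
fragment, not a complete config; names and sizes are made up):

  displayName = "ce02"
  memsize = "2048"                      # guest RAM in MiB
  guestOS = "rhel5"
  scsi0:0.fileName = "ce02-disk0.vmdk"  # backing disk image
  ethernet0.present = "TRUE"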
As to VM overheads with VMware GSX, here for example is a year
of CPU overhead *inside* a GSX VM for an LCG CE (the host has
additional overheads on top of that):
http://ganglia.dur.scotgrid.ac.uk/ganglia/graph.php?g=cpu_report&z=large&c=Grid%20Servers&h=ce02.dur.scotgrid.ac.uk&m=&r=year&s=descending&hc=4&st=1316641054
However, given the relatively small number of hosts and host
types in a T2, I would strongly prefer for a new setup to just
buy a number of smaller, low power-draw real machines. Much,
much simpler to deal with, less buggy, and with far lower
overheads; and nearly all WLCG host types are far from CPU
intensive (and most are not even RAM bound). But then I am very
skeptical as to the usefulness of VM setups in general (while
they can be very useful in special cases), so perhaps this is
just my prejudice.
While for a WLCG site the middleware people heavily discourage
running multiple host types on the same physical host, the
number of host types is not really that huge, and there are
hardware products that put 2 or even 4 real machines in a 1U
pizzabox (a form factor driven by webhosting).
An alternative that I considered was switching to something
else like http://linux-vserver.org/ which virtualizes or
partitions user space into contexts/containers/zones (for those
unfamiliar: a kind of generalized 'chroot'). It has nugatory or
even negative overheads, far fewer opportunities for bugs, and
seems to map very well onto WLCG host type issues, and I have
had very good experiences with it in the past (it even supports
running many different distributions on the same host, as long
as they can share the running kernel).
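For the curious, with the util-vserver tools creating and
running a guest context is roughly this cheap (a sketch; the
guest name, distribution and addresses are made up):

  # build a new guest context from packages, then start it
  vserver cream01 build -m yum --hostname cream01.example.org \
      --interface eth0:192.0.2.10/24 -- -d centos5
  vserver cream01 start
  vserver cream01 enter   # get a shell inside the context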
>> Does any use any tool to manage these VM [ ... ]
> We use the libvirt/virsh/Virtual Machine Manager tools. [ ... ]
> definitely don't want to be starting kvm itself on the
> commandline manually - the libvirt stuff is the right level of
> indirection.
I think that VMs are really simple things, and one needs
management tools only when there are very many of them. When
dealing with half a dozen VMs it seemed easier to just edit the
VM configuration files by hand and start and stop the VM
processes manually too.
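Either way the mechanics are small; with libvirt the whole
lifecycle is roughly this (a sketch with a made-up guest name):

  # register a guest from its XML description, start it, check
  virsh define cream01.xml
  virsh start cream01
  virsh list --all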
>> For all this stuff, planning to use a machine with spec
>> Intel(R) Xeon(R) E5345 @ 2.33GHz, 8 cores, 16GB RAM. Just
>> wondering how many VMs can be set up on this machine
>> considering the instances we may run on it:
>> - two EMI CREAM
>> - sbdii
>> - apel
>> - argus
Sounds reasonable, and I had similar hosts running pretty
comfortably 4 VMs each. Some host types like the SBDII and APEL
are really small (the SBDII is one LDAP daemon serving a few
KiB of data in total, APEL is one Java log summarizer running
once a day). So I ended up running the SBDII on the UI, as that
seems one of the combinations for which I can't see any problem
in running two host types on the same host, and I was tempted
to do the same for APEL. I would guess that running Torque on
the same physical host as the CE might work well too, and I was
very tempted to do that, as in general the host types that do
not depend much on middleware libraries should be able to share
a host. To be confirmed :-).
> [ ... ] CREAM CEs, in our experience, will want about 6Gb of
> RAM each, so two of those, plus say 1GB for each of the others,
> totals 15GB, and leaves you a little over for the host OS.
That seems reasonable to me, but I found to my surprise that the
CREAM CE was lighter than the LCG one, and 2GiB seemed adequate:
http://ganglia.dur.scotgrid.ac.uk/ganglia/graph.php?g=mem_report&z=large&c=Grid%20Servers&h=cream02.dur.scotgrid.ac.uk&m=&r=year&s=descending&hc=4&st=1316641054
But like the LCG CE it seems to need restarting periodically
because its memory usage climbs with time (the same goes for a
few other daemon-based services, and I would restart servers
every 1-3 months "just in case" :->).
> [ ... ] pay some attention to the speed of the disks and the
> amount of random IO they can handle. A locally attached array
> of 15k SAS disks (i.e. a Dell R510 disk server) is one approach,
That's a very good point, and R510s are nice. One configuration
that I liked and that seemed good value was 2x 15k SAS plus 2x,
4x or 6x 10k SAS (or 10k/7.2k "enterprise SATA" with ERC). BTW
I much prefer using Linux MD to hardware RAID for several
reasons, among them the ability to move disks to boxes with
different hardware IO cards; also, I have had terrible
experiences with some 3ware cards, and other people have had
them with many other types of RAID card (as previously
mentioned on this list).
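The portability point is quite concrete: MD keeps its metadata
on the member disks themselves, so after moving them to a box
with a different controller reassembly is just (a sketch):

  # scan all disks for MD superblocks and reassemble arrays,
  # whatever controller the disks are now hanging off
  mdadm --examine --scan >> /etc/mdadm.conf
  mdadm --assemble --scan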
I had to deal with a setup with VM images on NFS, with several
virtual disks allocated as growable, and I found that it was
(very) painful on the non-grid side. On the grid side it worked
better, but it was still something that I would not have done.
One of the major issues was backups: with backups running,
that is tree-walking (rsync) inside the VMs, the peak load
(especially IOPS) climbs much higher than average, and the VM
overheads (rsync networking and rsync reading heavily) can be
huge.
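One mitigation I found useful is to rate-limit the backup
itself so its peaks do not dominate (a sketch with made-up
paths; --bwlimit is in KiB/s):

  # throttled backup from inside the VM, capping the IO and
  # network peaks at the cost of a longer backup window
  rsync -aH --bwlimit=10000 /var/ backuphost:/backups/ce02/var/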
> we have our VMs storage on an old 14 drive supermicro disk
> server, and that seems able to cope too (it's not running the
> VMs though, so all its memory is disk cache).
If that is a SAN with virtual disks allocated as chunks of the
SAN it seems viable to me; if it is a NAS (NFS) it seems a lot
less of a good idea.
If one has to use NFS, a workaround I liked is to put the
relevant subtree on NFS and mount it inside the VM, instead of
putting it inside a virtual disk image and accessing the
virtual image over NFS. This allows backing up the tree without
going through the VM overhead, and network VM overheads are
often less expensive than virtual disk ones (and there are
other reasons).
So where possible I had small virtual disk images (4GiB, so
relatively quick to back up or duplicate as a whole, after
quiescing the VM) containing just the OS, with all data mounted
via NFS, *even from the same host* (that is, the NFS server was
the VM host itself, and the traffic went over 'lo'). Not
optimal, but better, as the VM's virtual disk accesses are then
almost only syslogging.
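Concretely, that setup is just an export on the VM host plus a
mount inside the guest (a sketch; all names, paths and
addresses are made up):

  # on the VM host, in /etc/exports: export data to the guests
  /srv/vmdata  192.168.122.0/24(rw,sync,no_root_squash)

  # in the guest's /etc/fstab: mount it from the host over the
  # virtual network (which on the host side is effectively 'lo')
  vmhost:/srv/vmdata  /data  nfs  defaults,hard,intr  0 0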
> typical small server setup of two basic SATA disks in a RAID
> mirror though, you won't have enough IO capacity to go round,
> particularly for the CREAM CEs.
In my experience that actually sort of worked, choosing nice
disks and a nice compact layout, but in some cases it was only
just sufficient.
Also, on a host with a few GiB of RAM the whole dataset ends
up residing in memory. After all a CE may have a load
characterized by a job turnover rate of a few per second, and
manage perhaps a thousand jobs, so the total (active) data
needed to represent them should fit in a few GiB, and writes
should not be that frequent.
But there are indeed sources of disk arm contention, like OS
logging and, critically, the backups mentioned above, so a nice
RAID10 of 4 disks looks better to me than a RAID1 of 2 disks.
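Creating such an array with MD is a one-liner (a sketch, the
device names are made up):

  # 4-disk RAID10: roughly twice the random-IO capacity of a
  # 2-disk RAID1, which helps with seek-heavy loads like backups
  mdadm --create /dev/md0 --level=10 --raid-devices=4 \
      /dev/sd[abcd]1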
Even the DPM SE seems to require relatively small memory and
disk footprints (but then this site is not doing analysis):
http://ganglia.dur.scotgrid.ac.uk/ganglia/graph.php?g=mem_report&z=large&c=Grid%20Servers&h=se01.dur.scotgrid.ac.uk&m=&r=year&s=descending&hc=4&st=1316641624
but then while a big analysis site may have lots more files
registered in the DPM, the number of metadata queries against
the SE is really proportional to the number of jobs (and I
think that even analysis jobs open very few files), not to the
total number of files stored.