XEN HA HOWTO
Introduction
1) Basic concepts
2) The actual installation
3) Basic commands
4) Configuration & installation
6) Links
1) Basic concepts
XEN, HA, LVM, DRBD
XEN is an open-source hypervisor project that lets you run several different operating systems at the same time
on a single computer. At present these are Linux with a 2.4 or 2.6 kernel, FreeBSD, NetBSD and Sun Solaris.
The XEN architecture looks roughly like this:
----------------------
| DomU | DomU | DomU |
----------------------
|       Dom 0        |
----------------------
|   XEN hypervisor   |
----------------------
|         HW         |
----------------------
HA stands for High Availability. For Linux
there are many projects that focus on HA (see Links). We will use
the software from the linux-ha.org project, which is called heartbeat.
LVM is a layer that sits on top of the physical disk and lets me, while the system
is running, grow and shrink partitions and disks for the DomUs,
or stripe or mirror data across several disks.
The LVM architecture looks roughly like this:
--------------
|    LVM     |
| partitions |
--------------
|    LVM     |
--------------
|    Disk    |
--------------
2) The actual installation
The installation consists of two servers, skylla and charybda, connected directly
with a network cable (it is recommended to make this link redundant, e.g. UTP plus a serial cable or similar).
Each server runs one domain, and that domain mirrors the contents of its disk
to the other server.
From the HA point of view this is an active/active cluster: both servers perform critical
tasks, but each one also watches whether the other is up and, if it is not, takes over its work.
A necessary condition for such a setup is shared storage,
e.g. a SAN or (as a cheap Linux solution) a drbd disk.
If the cluster has been running with only one node, then once
the second node starts, that server automatically takes back responsibility for
its own tasks and the cluster carries on.
   skylla                       charybda
------------                  ------------
|          |   drbd raid 1    |          |
| domain1  |=================>| domain1  |
|  disk    |     domain1      |  disk    |
|----------|                  |----------|
|          |10.0.1.1  10.0.1.2|          |
|          |<=================|          |
|----------|   drbd raid 2    |----------|
| domain2  |<=================| domain2  |
|  disk    |     domain2      |  disk    |
------------                  ------------
Installing LVM
On each server we create a single volume group (VG), named lvm, on top of the disk.
The resulting layout, as reported by vgdisplay and lvdisplay:
--- Volume group ---
VG Name lvm
System ID
Format lvm2
Metadata Areas 2
Metadata Sequence No 12
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 3
Open LV 1
Max PV 0
Cur PV 2
Act PV 2
VG Size 67.55 GB
PE Size 4.00 MB
Total PE 17293
Alloc PE / Size 14258 / 55.70 GB
Free PE / Size 3035 / 11.86 GB
VG UUID 0DxUer-knvn-IOUI-I3G4-I6D7-qBBz-5ktO13
--- Logical volume ---
LV Name /dev/lvm/home
VG Name lvm
LV UUID 14poHQ-kHwZ-GO4z-VYJn-Uvi2-Qq6P-AtkfhZ
LV Write Access read/write
LV Status available
# open 1
LV Size 35.00 GB
Current LE 8960
Segments 2
Allocation inherit
Read ahead sectors 0
Block device 253:0
--- Logical volume ---
LV Name /dev/lvm/domain1
VG Name lvm
LV UUID fmOgOW-rVv0-Xr3l-2TrT-LJsR-V3Sp-J5rZsZ
LV Write Access read/write
LV Status available
# open 0
LV Size 10.35 GB
Current LE 2649
Segments 4
Allocation inherit
Read ahead sectors 0
Block device 253:1
--- Logical volume ---
LV Name /dev/lvm/domain2
VG Name lvm
LV UUID Vg4U5z-4zV1-UOKg-lTlN-321N-bPPN-T5wJ5E
LV Write Access read/write
LV Status available
# open 0
LV Size 10.35 GB
Current LE 2649
Segments 2
Allocation inherit
Read ahead sectors 0
Block device 253:2
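The sizes in the listing above follow directly from the extent counts: every physical extent (PE) is 4 MB, so a size is just the number of extents times 4 MB. A small sketch of that arithmetic, using only numbers from the output above (the function name is my own, for illustration):

```python
# Convert LVM extent counts into sizes, using the values from the
# vgdisplay/lvdisplay output above.
PE_SIZE_MB = 4.0  # "PE Size 4.00 MB"


def extents_to_gb(extents, pe_size_mb=PE_SIZE_MB):
    """Return the size in GB (1 GB = 1024 MB) of the given number of extents."""
    return extents * pe_size_mb / 1024.0


# Extent counts copied from the listing above.
LAYOUT = [
    ("VG total", 17293),
    ("allocated", 14258),
    ("free", 3035),
    ("home", 8960),
    ("domain1", 2649),
    ("domain2", 2649),
]

for label, extents in LAYOUT:
    print("%-10s %6d PE = %6.2f GB" % (label, extents, extents_to_gb(extents)))
```

The same check shows that the three logical volumes (8960 + 2649 + 2649 extents) account for all 14258 allocated extents.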
3) Basic commands
XEN: in userspace, Xen consists of the xend server, which is started right after system
boot from the init.d scripts, and the xm utility for managing and
monitoring running domains.
xm list [-l for long output] lists the virtual domains currently running on the system.
xm console [domain name] attaches a virtual console to the console of the running DomU
named [domain name].
xm top something like top, but for domains instead of processes.
xm create [-c] [path to domain config] creates a new domain and starts it as defined
in the config file. By default the config files are in /etc/xend/.
xm destroy immediately destroys the virtual domain.
xm shutdown, xm reboot do what their names say :)
DRBD has its config file in /etc/drbd.conf.
To check the status of the drbd module we use /proc/drbd, which holds the current information.
!! Warning: never work directly with the device underneath drbd while drbd
is serving that disk, i.e. never try to do anything with the LVM partition domain1 itself,
only go through drbd.
drbdadm is the utility for managing drbd disks:
drbdadm state [drbd resource name||all] prints the status of the drbd raid.
drbdadm cstate [drbd resource name||all] prints the connection status of the drbd raid.
drbdadm primary [drbd resource name||all] !!dangerous: sets the drbd resource on
this machine to primary.
drbdadm secondary [drbd resource name||all] !!dangerous: sets the drbd resource on
this machine to secondary; this is not possible while the drbd disk is mounted.
drbdadm connect [drbd resource name||all] connects the nodes together.
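If you want to watch /proc/drbd from a script rather than by eye, something like the following sketch may help. The sample text is only my assumption of what a DRBD 0.7-style status line looks like (cs: for the connection state, st: for local/peer role); the exact fields differ between DRBD versions, so check the output on your own machine first. The function names are made up for this example.

```python
import re

# Assumed sample in the style of DRBD 0.7; not copied from a real system.
SAMPLE = """version: 0.7.x (sample)
 0: cs:Connected st:Primary/Secondary ld:Consistent
 1: cs:WFConnection st:Secondary/Unknown ld:Consistent
"""

# minor number, connection state, local role, peer role
LINE_RE = re.compile(r"^\s*(\d+):\s+cs:(\S+)\s+st:(\w+)/(\w+)")


def parse_proc_drbd(text):
    """Return {minor: {'cstate', 'local', 'peer'}} for every device line found."""
    devices = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            minor, cstate, local, peer = match.groups()
            devices[int(minor)] = {"cstate": cstate, "local": local, "peer": peer}
    return devices


def read_status(path="/proc/drbd"):
    """Parse the live status file (only works on a node with drbd loaded)."""
    with open(path) as handle:
        return parse_proc_drbd(handle.read())


for minor, info in sorted(parse_proc_drbd(SAMPLE).items()):
    print("drbd%d: %s, %s here / %s on the peer"
          % (minor, info["cstate"], info["local"], info["peer"]))
```

On a real node you would call read_status() instead of parsing SAMPLE.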
4) Configuration & installation
I will not describe the configuration of the xend server here; there are already countless manuals
covering that for various Linux distributions. So we start in medias res :D.
Once LVM is configured on our server exactly as shown above
(the names matter, not the sizes), we can move on to configuring drbd as the shared storage for
our domains. DRBD is distributed as a kernel module, so you have to get it into your
favourite distribution somehow.
The DRBD config file is /etc/drbd.conf:
resource domain2 {
# transfer protocol to use.
# C: write IO is reported as completed, if we know it has
# reached _both_ local and remote DISK.
# * for critical transactional data.
# B: write IO is reported as completed, if it has reached
# local DISK and remote buffer cache.
# * for most cases.
# A: write IO is reported as completed, if it has reached
# local DISK and local tcp send buffer. (see also sndbuf-size)
# * for high latency networks
#
#**********
# uhm, benchmarks have shown that C is actually better than B.
# this note shall disappear, when we are convinced that B is
# the right choice "for most cases".
# Until then, always use C unless you have a reason not to.
# --lge
#**********
#
protocol C;
# what should be done in case the cluster starts up in
# degraded mode, but knows it has inconsistent data.
#incon-degr-cmd "echo '!!DRBD!! raid was started in degradated mode see /etc/drbd.conf line 90 ' | wall ; sleep 60 ; halt -f";
startup {
# Wait for connection timeout.
# The init script blocks the boot process until the resources
# are connected. This is so when the cluster manager starts later,
# it does not see a resource with internal split-brain.
# In case you want to limit the wait time, do it here.
# Default is 0, which means unlimited. Unit is seconds.
#
wfc-timeout 20;
# Wait for connection timeout if this node was a degraded cluster.
# In case a degraded cluster (= cluster with only one node left)
# is rebooted, this timeout value is used.
#
degr-wfc-timeout 100; # 100 seconds.
}
disk {
# if the lower level device reports io-error you have the choice of
# "pass_on" -> Report the io-error to the upper layers.
# Primary -> report it to the mounted file system.
# Secondary -> ignore it.
# "panic" -> The node leaves the cluster by doing a kernel panic.
# "detach" -> The node drops its backing storage device, and
# continues in disk less mode.
#
on-io-error detach;
# In case you only want to use a fraction of the available space
# you might use the "size" option here.
#
# size 10G;
}
net {
# this is the size of the tcp socket send buffer
# increase it _carefully_ if you want to use protocol A over a
# high latency network with reasonable write throughput.
# defaults to 2*65535; you might try even 1M, but if your kernel or
# network driver chokes on that, you have been warned.
sndbuf-size 512k;
timeout 30; # 3 seconds (unit = 0.1 seconds)
connect-int 6; # 6 seconds (unit = 1 second)
ping-int 6; # 6 seconds (unit = 1 second)
# Maximal number of requests (4K) to be allocated by DRBD.
# The minimum is hardcoded to 32 (=128 kb).
# For high performance installations it might help if you
# increase that number. These buffers are used to hold
# datablocks while they are written to disk.
#
max-buffers 8192;
# The highest number of data blocks between two write barriers.
# If you set this < 10 you might decrease your performance.
max-epoch-size 10240;
# if some block send times out this many times, the peer is
# considered dead, even if it still answers ping requests.
# ko-count 4;
# if the connection to the peer is lost you have the choice of
# "reconnect" -> Try to reconnect (AKA WFConnection state)
# "stand_alone" -> Do not reconnect (AKA StandAlone state)
# "freeze_io" -> Try to reconnect but freeze all IO until
# the connection is established again.
# on-disconnect reconnect;
}
syncer {
# Limit the bandwidth used by the resynchronisation process.
# default unit is KB/sec; optional suffixes K,M,G are allowed
#
rate 500M;
# All devices in one group are resynchronized parallel.
# Resynchronisation of groups is serialized in ascending order.
# Put DRBD resources which are on different physical disks in one group.
# Put DRBD resources on one physical disk in different groups.
#
group 2;
# Configures the size of the active set. Each extent is 4M,
# 257 Extents ~> 1GB active set size. In case your syncer
# runs @ 10MB/sec, all resync after a primary's crash will last
# 1GB / ( 10MB/sec ) ~ 102 seconds ~ One Minute and 42 Seconds.
# BTW, the hash algorithm works best if the number of al-extents
# is prime. (To test the worst case performance use a power of 2)
al-extents 257;
}
on skylla {
device /dev/drbd1;
disk /dev/lvm/domain2;
address 10.0.1.1:7789;
meta-disk internal;
# meta-disk is either 'internal' or '/dev/ice/name [idx]'
#
# You can use a single block device to store meta-data
# of multiple DRBD's.
# E.g. use meta-disk /dev/hde6[0]; and meta-disk /dev/hde6[1];
# for two different resources. In this case the meta-disk
# would need to be at least 256 MB in size.
#
# 'internal' means, that the last 128 MB of the lower device
# are used to store the meta-data.
# You must not give an index with 'internal'.
}
on charybda {
device /dev/drbd1;
disk /dev/lvm/domain2;
address 10.0.1.2:7789;
meta-disk internal;
}
}
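Several numbers in this config only make sense once the units are worked out. The sketch below redoes two of those calculations: the size of the active set for al-extents 257 (4 MB per extent, as the comment in the syncer section says) with the resync time at the 10 MB/sec used in that comment, and the net timeout, which is given in tenths of a second. The function names are my own; only the numbers come from the config.

```python
EXTENT_MB = 4  # "Each extent is 4M"


def active_set_mb(al_extents):
    """Size of the activity-log active set in MB."""
    return al_extents * EXTENT_MB


def resync_seconds(al_extents, rate_mb_per_s):
    """Time to resync the whole active set after a primary crash."""
    return active_set_mb(al_extents) / float(rate_mb_per_s)


def net_timeout_seconds(timeout_tenths):
    """The 'timeout' option in the net section is in units of 0.1 s."""
    return timeout_tenths / 10.0


print("active set: %d MB" % active_set_mb(257))
print("resync at 10 MB/s: %.1f s" % resync_seconds(257, 10))
print("net timeout 30 -> %.1f s" % net_timeout_seconds(30))
```

So with al-extents 257 the active set is about 1 GB, and the value 30 for timeout corresponds to 3 seconds.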
XEND: the config file for xend is in /etc/xend;
the config file for each domain is in /etc/xend/domain?
#----------------------------------------------------------------------------
# Kernel image file.
kernel = "/boot/vmlinuz-2.6.1?-xenU"
# Optional ramdisk.
#ramdisk = "/boot/initrd.gz"
# The domain build function. Default is 'linux'.
#builder='linux'
# Initial memory allocation (in megabytes) for the new domain.
memory = 256
# A name for your domain. All domains must have different names.
name = "domain1"
# List of which CPUS this domain is allowed to use, default Xen picks
#cpus = "" # leave to Xen to pick
cpus = "0" # all vcpus run on CPU0
#cpus = "0-3,5,^1" # run on cpus 0,2,3,5
# Number of Virtual CPUS to use, default is 1
vcpus = 1
#----------------------------------------------------------------------------
# Define network interfaces.
# Number of network interfaces. Default is 1.
#nics=1
# Optionally define mac and/or bridge for the network interfaces.
# Random MACs are assigned if not given.
vif = [ '' ]
#----------------------------------------------------------------------------
# Define the disk devices you want the domain to have access to, and
# what you want them accessible as.
# Each disk entry is of the form phy:UNAME,DEV,MODE
# where UNAME is the device, DEV is the device name the domain will see,
# and MODE is r for read-only, w for read-write.
disk = [ 'phy:/dev/drbd0,sda1,w' ]
# Set the hostname.
#hostname= "vm%d" % vmid
# Set root device.
root = "/dev/sda1 rw"
# Sets runlevel 3.
extra = "3"
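The disk entry is the line that ties the domain to its drbd device, so it is worth getting right. As a rough illustration of the phy:UNAME,DEV,MODE format described in the comments above, here is a small helper that takes such an entry apart and rejects an invalid mode. The helper is hypothetical and not part of Xen.

```python
def parse_disk_spec(spec):
    """Split a 'phy:UNAME,DEV,MODE' entry into its three parts."""
    uname, dev, mode = spec.split(",")
    if not uname.startswith("phy:"):
        raise ValueError("expected a 'phy:' prefix, got %r" % uname)
    if mode not in ("r", "w"):
        raise ValueError("MODE must be 'r' or 'w', got %r" % mode)
    return {
        "uname": uname[len("phy:"):],  # device in Dom 0, here the drbd device
        "dev": dev,                    # device name the domain will see
        "mode": mode,                  # r = read-only, w = read-write
    }


print(parse_disk_spec("phy:/dev/drbd0,sda1,w"))
```

Note that the backing device is /dev/drbd0 and not the LVM volume underneath it, in line with the warning in the commands section.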
When configuring ha.cf, the timeout settings that decide when a machine is declared
dead because it no longer answers the heartbeat are very important.
haresources lists the applications that heartbeat starts from
/etc/heartbeat/init.d or from /etc/init.d. In our case the file could look roughly like this:
skylla drbddisk /dev/lvm/domain1
skylla haxendomains domain1
charybda drbddisk /dev/lvm/domain2
charybda haxendomains domain2
(!! This is only an example. The drbddisk and haxendomains scripts are not my property, so
I cannot publish them; I wrote them for a company, so that is how it is :D. haxendomains is otherwise a simple
start script that stops or starts the given domain depending on its input.
drbddisk just switches the given domain to Primary/* mode on the given node.)
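Since the real haxendomains script cannot be published, here is a purely hypothetical sketch of what such a start/stop wrapper could look like. It is not the original script. It assumes the script is called with the domain name followed by start or stop, that the domain config lives in /etc/xend/ under the domain's name, and it uses only the xm create and xm shutdown commands from section 3.

```python
import subprocess

CONFIG_DIR = "/etc/xend"  # assumption: one config file per domain, named after it


def build_command(domain, action):
    """Map a start/stop request onto the matching xm command line."""
    if action == "start":
        return ["xm", "create", "%s/%s" % (CONFIG_DIR, domain)]
    if action == "stop":
        return ["xm", "shutdown", domain]
    raise ValueError("unsupported action: %r" % action)


def run(domain, action):
    """Execute the xm command and return its exit status."""
    return subprocess.call(build_command(domain, action))


# Show what would be executed, without touching any domain.
print(" ".join(build_command("domain1", "start")))
print(" ".join(build_command("domain1", "stop")))
```

A real resource script would also have to make sure the drbd device is Primary on this node before starting the domain, which is what the drbddisk entry in haresources is for.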
6) Links:
LVM
http://www.tldp.org/HOWTO/LVM-HOWTO/
XEN
http://tx.downloads.xensource.com/downloads/docs/user/
Heartbeat & drbd
http://www.linux-ha.org/GettingStartedV2
http://www.linux-ha.org/DataRedundancyByDrbd
ha.cf
http://linux-ha.org/ha.cf
http://linux-ha.org/ha.cf/DefaultValues
haresources
http://linux-ha.org/haresources
Comments
hypervisor
Just curious: how many of you have a CPU that supports VMX/SVM, so that you could run a HW hypervisor on it at all?
no support needed
VMX/VT support is not required; an ordinary x86/x86-64/ia64 CPU is enough, plus some other CPUs I am not sure about.
At home I run xen3 on NetBSD, and my CPU is an old 400 MHz Sun :)
xen rules
By the way, guys, I don't know if you noticed, but there is a very powerful tool available as a free download, xen enterprise. It is only a 30-day trial, but hopefully that is not a problem ;)
______________________________
my life is better than sci-fi
good
good, thanks...
---------------------------------------
a talent for learning is a gift;
the ability to learn is a skill;
the willingness to learn is a choice;
(0_-)
pretty good
Is LVM similar to RAID?
Is LVM similar to RAID, or have I misunderstood it? Anyway, a meaty article. gj.
-------
I'm lowkey like seashells.
RAID vs. LVM
As far as I know, RAID was designed to increase a system's redundancy against disk failure (raid 1...5) and to increase the performance of slow disks (raid 0).
LVM is designed as a layer that hides from the user things such as partitions and the disk a given partition lives on. Resizing those is a really nasty job, whereas resizing an LVM partition is child's play.
I can have several disks in a so-called VG (volume group), and when I create an LV (logical volume) in that VG, the LV is spread across the whole VG and I have no control over where my data ends up when I write to the LV.
:)
P.S. Definitely give LVM a try, it is a really great thing :D